1
Microcontact Printing of Proteins Emmanuel Delamarche IBM Research, Z¨urich Research Laboratory, R¨uschlikon, Switzerland
Originally published in: Nanobiotechnology. Edited by Christof M. Niemeyer and Chad A. Mirkin. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30658-9
1 Introduction
Biomolecules on surfaces have applications that range from medical diagnostics, analytical chemistry, and culturing and studying cells on surfaces, to synthesizing or engineering DNA, carbohydrates, polypeptides, or proteins. Defining patterns of biomolecules – and of proteins in particular – on surfaces is no simple task considering how complex and fragile these molecules can be. Photolithography – the art of structuring surfaces at lateral scales of less than 1 micrometer – was used recently to create DNA microarrays. Photolithography affords the capability of synthesizing strands of DNA using lithographic masks and photochemistry, but it is unlikely that a similar approach would permit the fabrication of arrays of proteins, which cannot be synthesized block-by-block at present. In photolithography, UV light, organic solvents, photoresists, and resist developers can compromise the structure and function of even a simple protein. For these reasons, novel approaches to patterning proteins include defining regions on surfaces that attract, bind, or repel proteins from solution, or in a more direct manner, by delivering small volumes of a solution of protein to a surface using drop-on-demand systems or microfluidic devices [1–3]. This chapter describes the patterning of proteins on surfaces by means of microcontact printing (µCP), where proteins are applied like ink to the surface of a stamp and transferred to a substrate by printing. Microcontact printing was originally developed by Whitesides and coworkers at Harvard for printing alkanethiols on gold with spatial control [4]. Many variants of µCP were later developed and collectively termed “soft lithography” [5]. The central element in µCP is the stamp. This is a silicone-based elastomer that is microstructured by curing liquid prepolymers of poly(dimethylsiloxane) (PDMS) Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Microcontact Printing of Proteins
on a lithographically fabricated master (or mold). Once cured, the stamp is peeled off the mold by hand; the stamp then bears an inverted pattern of that of the mold. One mold can be used to replicate many stamps. The relative softness of the stamp, compared to that of a lithographic mask, allows it to follow the contours of surfaces onto which it is applied. It is the work of adhesion between the stamp and the substrate that drives the spreading of the initial zones of contact at the expense of an elastic adaptation of the stamp [6]. In µCP – and soft lithography in general – the contact between the elastomer and a substrate occurs at the molecular scale and is termed “conformal”; it ensures the homogeneous transfer of ink from the stamp to the printed areas of the substrate [4]. PDMS materials have the following features. They are transparent to optical light and even UV down to ∼240 nm, resistant to many chemicals and pH environments, good electrical insulators, thermally stable, nontoxic, and can have tailored mechanical properties using various degrees of crosslinking and amounts of resin fillers [7]. A PDMS stamp can be a simple piece cut from a PDMS slab or accurately molded and mounted on a stiff backplane [8, 9]. It can be composed of PDMS layers having different mechanical characteristics, or shaped like a paint roller [10]. Handling stamps with tweezers and printing by hand is sufficient for most needs of experimentalists. Mounting a stamp on a printing tool, however, provides the ability to vary and control the pressure applied during printing, and to align the stamp with the substrate. PDMS stamps have advancing and receding contact angles with water of ∼115◦ and ∼95◦ and thus are hydrophobic and promote the spontaneous deposition of proteins from solution [11]. This deposition is nonspecific and self-limiting to a monolayer of proteins if the stamp is rinsed after the inking step. An important difference between microcontact printing proteins on surfaces and alkanethiols on noble metals is the limited amount of proteins on the stamp. Alkanethiols can diffuse inside a PDMS stamp in sufficient amounts for multiple prints, whereas reinking stamps with proteins is necessary after each print unless hydrogel “stampers” are used [12]. These stamps can carry a large reserve of protein solution used for the contact processing (CP) of substrates. PDMS stamps derivatized with biomolecules provide the basis for selective inking strategies and lead to the affinity-contact printing (αCP) technique [13]. Figure 1 delineates the operations that CP, µCP, and αCP involve. These techniques are described in detail in the following sections.
2 Strategies for Printing Proteins on Surfaces 2.1 Contact Processing with Hydrogel Stamps
Contact processing (left-hand panel in Figure 1) mimics the deposition of proteins from an aqueous environment to a surface by utilizing a hydrogel swollen with a solution of protein [12–14]. The proteins can diffuse through this hydrophilic matrix and adsorb onto the substrate without uncontrolled spreading. The stamp
2 Strategies for Printing Proteins on Surfaces
Fig. 1 Three related methods can pattern proteins from a stamp to a surface. In contact processing (left), a hydrogel stamp mediates the diffusion of proteins from its bulk to a surface. Microcontact printing (center) utilizes an elastomeric stamp inked with
proteins to print the proteins on a substrate without having a liquid. The stamp in affinity contact printing (right) is derivatized with capture proteins, which allows it to be selectively inked with target proteins released to a substrate during printing.
consists of two parts. The first is a reservoir above the hydrogel containing proteins dissolved in a biological buffer. The second is the hydrogel that makes contact with the substrate and mediates the transport of proteins to the substrate. A stamp that has a hydrogel made of poly(6-acryloyl-β-O-methyl-galactopyranoside), for example, embedded in a fine capillary can pattern proteins with a resolution of ∼20 µm [14]. Hydrogels having a refined composition and a greater degree of crosslinking exhibited better mechanical resistance, and were patterned by replication of a mold [15]. The latter approach should allow a protein to be patterned on a surface with a resolution better than 20 µm. CP based on hydrogel stamps has interesting features. First, biomolecules remain in a biological buffer until the stamp is removed and the substrate dried. Denaturation of proteins in this case should be minimal, and may be similar to that of proteins adsorbed from solution onto polystyrene microtiter plates. Second, it is straightforward to reuse such stamps for multiple CP experiments [14]. 2.2 Microcontact Printing
Microcontact printing of proteins uses PDMS stamps replicated from a mold (middle panel in Figure 1). Inking the stamp with proteins is simple, and analogous to depositing a layer of capture antibody (Ab) on polystyrene for conducting a solidphase immunoassay. The duration of inking and the concentration of protein in the
3
4 Microcontact Printing of Proteins
ink solution determine the coverage of protein obtained on the stamp [16]. Inking a stamp can be local and/or involve multiple types of proteins when the stamp is locally exposed using a microfluidic network (µFN) or microcontainers to one or more solutions of protein [17]. The transfer of proteins can be remarkably homogeneous and effective, depending on the wetting properties of the substrate [18]. The large area of interaction of proteins with substrates and their high molecular weight account for the high-resolution potential of µCP of proteins. At the limit, single protein molecules can be printed as arrays on a surface [16], whereas the diffusion of alkanethiols on noble metals or the reactivity of silanes with themselves limit the practical resolution achieved for microcontact printing self-assembled monolayers on surfaces. Microcontact printing proteins on surfaces appears to be limited by the resolution and mechanical stability of the patterns on the stamp. Stamps made of Sylgard 184 and using masters prepared using rapid prototyping or photolithography can have micrometer-sized patterns on fields even larger than 10 cm2 [19]. Microcontact printing proteins with arbitrary patterns and submicrometer resolution benefits from the use of a PDMS elastomer stiffer than Sylgard 184 and masters patterned using electron-beam lithography [8]. 2.3 Affinity-Contact Printing
Tailoring the surface chemistry of stamps to ink a particular type of biomolecule is crucial for αCP (right-hand panel in Figure 1). The chemical stability of silicone elastomers is both an advantage for preparing chemically resistant stamps and an obstacle to modifying the surface of PDMS stamps. Exposing PDMS to an oxygen-based plasma forms a glassy silica-like surface layer [20]. The oxidized layer is a few nanometers thick and contains silanol groups ( Si OH), which are useful for anchoring organosilanes [21]. Oxidized PDMS can thus be derivatized similarly to glass or SiO2 in a few chemical steps using silane monolayers and with crosslinkers for proteins [22]. Affinity-contact printing is the technique of covalently immobilizing ligand biomolecules onto a PDMS stamp, and using them to ink a stamp selectively with receptor molecules. A stamp for αCP is roughly analogous to a chromatography column due to its ability to extract proteins selectively from a mixture, although releasing them involves printing them onto a surface [13]. Biomolecules that are naturally present in crude solutions and have a function on a surface are ideal candidates for applications of αCP. Cell adhesion molecules is one example that has already been demonstrated, but αCP could well be extended to a large variety of biomolecules for which ligands exist. Stamps in αCP are reusable and may include sites of different affinity to capture and print multiple types of protein in parallel [23]. 3 Microcontact Printing Polypeptides and Proteins
Many different types of proteins can be inked from an aqueous solution onto a hydrophobic silicon rubber such as PDMS [24]. Hydrophobic polymers in general
3 Microcontact Printing Polypeptides and Proteins
promote the deposition of proteins from solution through a variety of interactions, and slight or pronounced conformational changes of the protein structure can accompany this adsorption process. The kinetics of formation of a layer of protein on hydrophobic surfaces is often compared to a Langmuir-type isotherm: the rate of deposition of the protein molecules scales with their concentration in the bulk of the solution and reaches a plateau when all sites on the substrate become occupied [25]. Hydrophobic substrates, as a general rule, have stronger interactions with hydrophobic proteins, and their adsorption process is less influenced by the pH and ionic strength of the solution and by the isoelectric point of the protein than when polar or charged substrates are employed [25]. The size of the protein does not seem to play an important role on their inking behavior. A wide range of proteins in terms of structure and functions has been microcontact printed, which includes cytochrome c (12.5 kDa) [11], streptavidin and bovine serum albumin (BSA; ∼60 kDa) [26–28], protein A and immunoglobulins G (150 kDa) [11], glucose oxidase (160 kDa) [29], laminin (∼210 kDa) [30], and fibronectin (440 kDa) [31]. It is sometimes necessary to employ stamps with a hydrophilic surface to ink hydrophilic polypeptides such as polylysine (with MW ranging from 38 to 135 kDa) [32] or lipid bilayers [33]. In other cases, small biomolecules such as amino-derivatized biotins were inked and printed onto surfaces reactive to amino groups [34, 35]. In general, the derivatization of biomolecules with thiol groups allows the printing of biomolecules on gold substrates [36], where patterning by printing can be complemented by the adsorption of other types of molecules from solution. The chemisorption of small biomolecules on surfaces might be necessary for efficient transfer from the stamp and to prevent rinsing the printing molecules during subsequent steps. 3.1 Printing One Type of Biomolecule
Immunoglobulins G (IgGs) are interesting candidate molecules for µCP: these Abs are useful on surfaces for heterogeneous immunoassays. Their numerous disulfide linkages make them robust, they adsorb from biological buffers to PDMS in a nonreversible manner [24], and they can be conjugated to fluorescent centers, metal particles, enzymes, or ligands such as biotin. Fluorescence microscopy is a versatile method to follow the results of microcontact printing IgGs onto a glass surface (Figure 2). There, TRITC-labeled anti-chicken Abs were inked everywhere on the stamp but transferred to glass only in the regions of contact [11]. The patterns on the glass are accurate and correspond to zones of the stamp where the inked proteins are missing. The contrast of the 1 µm-wide features in the pattern is high and accurate and, as no fluorescence above background is measured in the nonprinted regions, it is clear that no transfer of Ab occurred in the recessed areas of the stamp. This might not always be the case, because small features have limited mechanical stability [6]. Demolding the stamp from the mold, capillary effects during inking and drying the stamp, and the printing itself may compromise the mechanical stability of patterns [37]. Implementing support structures in the design of the pattern, controlling the forces exerted during printing and affixing a
5
6 Microcontact Printing of Proteins
Fig. 2 Microcontact printing proteins on glass. Fluorescence microscopy images revealing TRITC-labeled chicken Abs on a stamp after inking and accurately transferred in the regions of contact to a glass substrate. Reproduced with permission from Ref [11]. (Copyright 1998 American Chemical Society.)
stiff backplane to the stamp improve the stability of patterns. Stamps can be very large and have features measuring from micrometers to centimeters, making it possible to print proteins of one kind on large substrates to pattern cells indirectly. Examples include microcontact printing fibronectin [31], polylysine [30, 38, 39], laminin [40], and adhesion peptides [41]. 3.2 Substrates
Substrates for biomolecules cover a wide range of materials, from simple glass slides to complex functional microelectronic devices or sensors. Having conformal contact between the stamp and substrate during printing is the first requirement for microcontact printing biomolecules. For this reason, the substrate should not be too rough [6], or have too prominent structures [39]. Polystyrene, poly(styrene terephthalate), glass, amphiphilic comb polymers, Si wafers, and substrates covered with a thin evaporated metal and/or a self-assembled monolayer can be microcontact printed with proteins and stamps made of Sylgard 184 [11, 29, 34, 42]. The printing time does not seem to play a role, and takes the few seconds necessary to propagate the initial contact to the entire substrate. The details of how and why proteins transfer from a stamp to a surface were intriguing until the recent discovery that the difference in wettability by water between the stamp and the surface determines whether transfer occurs [18]. Proteins tend to transfer when the substrate is more wettable, or has a higher work of adhesion for water, than the stamp. In this respect, the chemical composition of the surface does not seem to play a particular role other than defining the wettability of the surface (Figure 3). The surface of the stamp can be derivatized with fluorinated silanes to raise the wettability threshold
3 Microcontact Printing Polypeptides and Proteins
7
8 Microcontact Printing of Proteins
of the substrate below which transfer remains effective, for example. A remarkable incidence of printing proteins occurs with poly(ethylene glycol) (PEG)-derivatized surfaces [18]. Surfaces covered with a sufficient density of PEGs resist the deposition of proteins from solution because in order to interact with the PEG layer, proteins have to remove water solvat-ing EG repeat units and reduce the number of possible conformations of the PEG chains [43]. Both of these requirements are energetically unfavorable to the deposition of proteins from solution onto PEG-treated surfaces [44]. The mechanism accounting for the transfer of protein in µCP might thus involve the dry state of the PEG layer during printing [43], the local pressure exerted by the stamp at the line front propagation of the conformal contact [45], or some contamination of the PEG layer by low-molecular-weight silicone residues from the stamp. The deposition of proteins from solution or by printing exhibits antagonistic behaviors: proteins are more difficult to print on a hydrophobic surface than on a hydrophilic one whereas the opposite situation generally occurs in solution with, as an extreme case, PEG surfaces, which are protein-repellant [46]. Printing proteins is not limited to the patterning of planar substrates but is possible on curved surfaces, structured surfaces, and over large areas [5]. A stamp can be molded directly curved or planar and then curved and rolled over a surface [10]. Large stamps (≥ 10 cm) can be molded with a pattern having an accuracy of better than 1 mm [47]. The mechanical properties of stamps can be varied from 1 MPa (Young’s modulus) to over 30 MPa by adjusting the formulation of the polymer with respect to its average molecular weight between junctions, the junction functionality, and the density and size of filler particles added to the polymer [8]. The hardness of a stamp, its work of adhesion with the substrate, the pressure applied during printing, and the topography and work adhesion of the substrate all determine whether conformal contact will occur. The stability of features on the stamp might be compromised, however, when the stamp is made too soft and pressed too hard during printing [6, 9, 48]. 3.3 Resolution and Contrast of the Patterns
High resolution in lithography refers to patterning features of arbitrary shape at a length scale where it becomes crucial to optimize all parameters of the technique ← Fig. 3 Influence of the wettability of the substrate by water on the degree of transfer of proteins from a PDMS stamp to a Au surface. (A) The substrate is derivatized with SAMs comprising variable mole fractions of two constituents having different end-groups. (B) Proteins are adsorbed from solution onto the stamp. (C) The resulting printed patterns are analyzed using fluorescence microscopy. The fluorescence micrograph corresponds to fluorescently la-
beled proteins printed onto a 100% COOHterminated SAM. The transfer of proteins followed on SAMs having hydrophilic components functionalized with (D) COOH, (E) OH, or (F) EG correlates with the wettability of the mixed SAM (G). Figure kindly provided by J. L. Tan, J. Tien and C.S. Chen, and reprinted with permission from Ref. [18]. (Copyright 2002 American Chemical Society.)
3 Microcontact Printing Polypeptides and Proteins
(e. g., conditions for exposing and developing the resist, transfer of the resist pattern onto the substrate). Electron-beam lithography has a high-resolution regime for making features < 100 nm, photolithography for features < 250 nm, and µCP for features < 500 nm. In conventional lithography, shrinking the dimensions of patterns is driven by the necessity to improve the performance of integrated circuits at invariant or lower cost. The resolution of lithographic techniques limits the smallest sizes of components made today. It will be the physics of tomorrow’s devices that will ultimately be the limiting obstacle to further integration. Patterning biomolecules has a different paradigm for the resolution limit than conventional lithography because single functional elements, an enzyme for example, are available but do not have to be constructed. Microcontact printing meets several requirements that are necessary to place single proteins at predefined positions on a surface: (i) it is possible to fabricate Si molds with features as small as 40 nm using electron-beam lithography [16]; (ii) PDMS-replicated structures can be 80 nm and even smaller [9, 47]; (iii) proteins remain in the areas of contact, unlike alkanethiols and monolayerforming molecules which generally diffuse away from the initial printed zones on the substrate when an excess ink is present on the stamp; and (iv) the solution of protein used to ink the stamp can be diluted to limit the number of proteins inked per feature on the stamp [16]. Figure 4 shows high-resolution patterns of Abs on Si and glass and how a highresolution stamp can look. Each feature in the atomic-force microscopy (AFM) image in Figure 4A comprises ∼1000 Abs of the same type that were printed on a Si wafer using a PDMS stamp made of Sylgard 184 [17]. The structures have a width of 500 nm and an edge resolution better than 50 nm. The contrast of the patterns seems perfect because no Ab is present outside of the printed areas. An excellent contrast, together with specific binding events between printed ligands and receptors from solution, are desirable for high-sensitivity biological assays. The photography in Figure 4B shows a 8 × 4 cm2 stamp composed of a 30 µm-thick layer of PDMS attached to a flexible glass backplane 100 µm thick [8]. The PDMS layer of this stamp has numerous fields with 250 nm-wide lines, is about five times harder than Sylgard 184, but is also more brittle. The patterns are consequently more stable against collapse, and the glass backplane contributes significantly to the long-range accuracy of the pattern while making the stamp simple to handle, mount and align on a printer tool [47]. The surface tension of the polymer is an important limiting factor for the resolution of µCP. Features as small as 5 nm can be written by electron-beam lithography in PMMA and developed. PMMA is brittle, however, and hence not soft enough to form a reliable contact over surfaces. Unmolding even relatively hard PDMS stamps (having a Young’s modulus > 12 MPa) from a master structured with 40 nm-wide lines results in their broadening by 20 nm owing to surface tension effects [16]. Such a broadening is relatively less important for stamps with features ≥ 100 µm, and can be compensated in the electron-beam lithography layout. The stability of small features on stamps limits the freedom of design for high-resolution patterns. Dots, lines, and meshes do not have all the same mechanical stability against pressure; some recessed areas may collapse during printing. Incorporating support structures with micrometer dimensions around the high-resolution fields can remedy these problems [6]. Another strategy
9
10 Microcontact Printing of Proteins
Fig. 4 Microcontact printing proteins at high resolution. (A) AFM images showing that only ∼1000 chicken Abs were printed onto a Si wafer in each element of this pattern. (B) High-resolution :CP is best done using stamps harder than Sylgard 184 and
having a flexible glass backplane. (C) AFM images showing rabbit Abs printed on glass as a mesh comprising 100 nm-wide lines (left) or on regularly spaced areas that have none, one or a few Ab molecules (right).
4 Activity of Printed Biomolecules
is to transfer the resist pattern into the Si master with a reactive ion etching, where the etch rates depend on the geometry of the features. When large structures are made deeper in the master than smaller structures, a larger part of the load during printing is exerted on the larger structures without inducing collapse of the smallest features [16]. The AFM images in Figure 4C correspond to Ab molecules microcontact printed from a PDMS stamp (material B) [8] onto glass using a mesh of 100-nm-wide lines (left image) and 100-nm hemispherical posts [16]. Both the posts and the lines were 60 nm high. The detail of a mesh reveals that two to four Ab molecules define the width of the lines. In the case of posts, one to three Ab molecules occupy each visible printed site, and the statistical analysis of larger printed zones revealed that sites could have none, one or a few printed Ab molecules [16]. There, the resolution limit for microcontact printing a single molecule is reached while still leaving space for improvement to form homogeneous arrays having only one biomolecule per site. A high concentration of protein in the ink, a long inking time, a further reduction of the dimensions of the posts, and a substrate with a high work of adhesion could help printing large arrays of single protein.
4 Activity of Printed Biomolecules
Many studies emphasize that while the adsorption of a protein on a surface is simple to perform, it is nevertheless a complicated phenomenon in which the biological activity of the immobilized biomolecule might be lost or significantly altered [25]. Microcontact printing biomolecules harbors this risk twice: when proteins are inked onto the stamp, and when they are printed. In principle, the deposition of proteins from solution onto PDMS should be analogous to that of proteins on hydrophobic surfaces [24]. The second concern is more difficult to weigh. Transferring a protein by printing implies that the adhesion of the protein with the substrate overcomes that of the protein with the stamp. At the limit, this might create a mechanical stress on the protein and could lead to irreversible conformational changes. It might be interesting to evaluate the yield of transfer as a function of the peeling rate to better characterize the transfer mechanism [49]. Comparing the activity of different types of biomolecules printed or adsorbed onto polystyrene suggests that enzymes are more susceptible to denaturation during printing than during adsorption from solution [11]. A layer of printed proteinase K displayed half the activity of a layer deposited from solution. Abs are more robust against loss of function; the ability of printed polyclonal Abs to capture antigens from solution was similar to that where the captured Abs were adsorbed. Printed monoclonal Abs seemed to have a ∼10% loss of capture efficiency compared to ones adsorbed from solution. The surface activity of microcontact printed biomolecules belonging to three important biological classes is illustrated in Figure 5. Cell adhesion molecules are ideal candidates for printing biomolecules because these molecules are useful on surfaces as they can direct the adhesion and growth of cells to specific regions of a substrate. Moreover, many are “simple” polypeptides. Polylysine [30, 32, 38, 39],
11
12 Microcontact Printing of Proteins
Fig. 5 Microcontact printed proteins preserve a sufficiently high degree of activity for (A) promoting cell adhesion, (B) performing immunoassays, and (C) performing surface enzymatic catalysis. (A) The adhesion peptide PA22-2, which was printed onto a thiol-reactive surface, was immunostained (left-hand fluorescence micrograph) in two steps using an anti-PA22-2 Ab and a secondary Ab labeled with fluorescein, and is useful for attaching hippocampal neurons with spatial control (phase-contrast micrograph on the right). (B) These AFM images
illustrate three steps of an immunoassay in which chicken Abs were printed on a Si wafer (left), BSA was adsorbed from solution during the blocking step (middle), and the printed Abs were recognized by anti-chicken Abs. (C) This fluorescence image shows the deposition from solution of a fluorescent product made insoluble by printed alkaline phosphatase. The images ¨ in (A) were kindly provided by Offenhausser et al. and reprinted from Ref. [41]. (Copyright 2000 with permission from Elsevier Science.)
laminin [50], polylysine fused with laminin [40], fibronectin [31], specific adhesion peptides [41, 42], and neuron-glial cell adhesion molecules [13] (NgCAM), for example, were microcontact printed to promote or guide the attachment of cells to surfaces. In other examples, Ab–cell interactions were used to pattern cells on patterns of printed Abs [51, 52]. In some instances, the stamp was made hydrophilic and the substrate activated with a crosslinker to increase, respectively, the inking
5 Printing Multiple Types of Proteins
and transfer efficiency. The left-hand fluorescence image in Figure 5A corresponds to the immunostaining of the adhesion peptide PA22-2 that was microcontact printed onto a glass surface activated with amine-reactive crosslinkers [41]. The phase-contrast image (right) shows that the printed pattern of peptide was suitable to grow viable hippocampal neurons in the printed regions specifically These results illustrate well the conservation of the function of printed adhesion molecules. The capability of printing a pattern in registry with structures predefined on a substrate opens the way to placing cells wherever desired on a complex surface to study their function and to form networks of immobilized cells [39, 50, 53]. The next example (Figure 5B) is a printed polyclonal Ab, which serves as antigen to bind polyclonal Abs from solution [11]. AFM reveals the printed regions of a Si surface, each of which comprises ∼1000 molecules of chicken Ab molecules. Blocking the free Si surface with BSA is the next step necessary to prevent nonspecific deposition of proteins during the recognition step. After the blocking step, the Si surface is either covered with BSA or printed Abs; the prints are no longer visible. Recognition of the printed chicken Abs by anti-chicken Abs faithfully reflects the printed pattern. This experiment is an example of a highly miniaturized surface immunoassay in which the printed antigens were recognized by their specific Abs. Enzymes are probably more fragile than Abs and suffer more from a random orientation on a surface with respect to their activity than antigens for polyclonal Abs. The activity of printed enzymes can be evaluated using flat stamps, polystyrene substrates and colorimetric measurements [11]. It can be useful, however, to keep enzymatic products near their sites of production and to assess the activity of the enzymes with high spatial resolution. This is possible by using precipitating fluorescent products that accumulate in the regions of the substrate having enzymatic activity (Figure 5C) [17]. Real-time analysis of the development of fluorescence on the printed sites is even possible with this method. At least a part of the alkaline phosphatases printed on the glass surface in Figure 5C are active. This suggests that enzyme-linked immunosorbent assays can be performed using captured Abs that are printed. Interestingly, the same type of reporter enzyme can unambiguously reveal an ensemble of binding events, which are discernible through their localization. The challenge in this case remains to derivatize a surface with several types of protein.
5 Printing Multiple Types of Proteins 5.1 Additive and Subtractive Printing
An obvious application for microcontact printing proteins is the preparation of protein microarrays that can be used to screen different analytes in parallel while conserving reagents and still obtaining high-quality signals [54–56]. The simplest method to place two types of protein on a surface is to print one and adsorb the other from solution, as has been done with two different types of Abs [11]. This strategy requires that the printed layer of protein be complete enough to limit the
13
14 Microcontact Printing of Proteins
adsorption of Abs from solution into the printed regions. The fabrication of arrays comprising n types of protein is possible using additive patterning steps: once a substrate is microcontact-printed, more proteins can be printed next to or over the previously printed ones (Figure 6A) [17, 57]. The transfer of proteins from a hydrophobic stamp to a more wettable substrate accounts for this finding because surfaces covered with printed proteins are more hydrophilic than PDMS stamps. This ability to stack proteins on top of each other is peculiar to µCP and might be useful for constructing protein-based architectures. It is also possible to place a variety of proteins on a substrate without the need for precise alignment during printing. Stamps with parallel lines can, for example, be inked and printed with a rotation between each print [17, 57]. Subtractive approaches are also possible (Figure 6B). One strategy is to ink a flat stamp homogeneously and to remove a subset of the proteins by printing them onto a structured surface. The sites on the stamp made free for adsorption are then covered by proteins from solution and the operation can be repeated [17]. An original way to pattern a surface with multiple types of protein is to fabricate a three-dimensional stamp and ink the different layers of the stamp with different types of proteins. Applying increasing pressure to the stamp brings each layer of the stamp successively in contact with the substrate [58]. The different patterns of protein are inherently aligned, and the difficulty of fabricating and inking the stamp might be compensated for by the relatively simple printing operation. 5.2 Parallel Inking and Printing of Multiple Proteins
Serial methods are simple, but probably not suitable, to pattern substrates with a large number of different proteins. A parallel inking approach of a stamp using a µFN can solve the problem of inking a stamp with different types of proteins (Figure 6C) [17, 59]. With such a strategy, a µFN having an ensemble of independent flowing zones is placed on a flat PDMS stamp [60, 61]. Sealing the microchannels results from the conformal contact between the µFN and the stamp. When solutions of proteins are flushed through the microchannels, proteins deposit in the areas of the stamp exposed to the conduits. Filling a µFN can be done serially, or with an array of dispensing heads or tips. The deposition of protein on the PDMS might be as fast as a few seconds when it is not limited by the mass transport of proteins from solution or the depletion of proteins from the channels. There, inking the stamp with or without a µFN could involve pin spotting, drop-on-demand, or microinjection techniques. Prefilling individual wells of a structured surface and applying it to a PDMS surface is also suited to locally derivatize a stamp with different types of proteins [23]. 5.3 Affinity-Contact Printing
The inking of a stamp with a large number of different types of protein before each print can quickly become cumbersome when it is desirable to print a large series
5 Printing Multiple Types of Proteins
Fig. 6 Fluorescence microscopy images illustrating three strategies for printing several types of protein on a surface. (A) Two fluorescently labeled proteins were printed subsequently on a glass surface. Proteins on the stamp transferred during the second print both to glass and to the lines of proteins already patterned. (B) This pattern includes two fluorescently labeled protein and BSA printed simultaneously from a flat stamp. First, BSA was inked homogeneously by adsorption from solution onto
the stamp and patterned by subtractive printing. Proteins labeled with fluoroscein isothiocyanate (FITC) were then adsorbed in the regions complementary to the BSA pattern. This was repeated to remove BSA and the second protein along lines that were filled with TRITC-labeled protein adsorbed from solution. (C) A stamp was inked with different lines of proteins using independent channels of a :FN and printed onto a polystyrene surface in one step. Reproduced from Ref. [17].
15
16 Microcontact Printing of Proteins
of substrate with the same pattern of protein. The ability to attach biomolecules to PDMS covalently yields the opportunity to define zones on a stamp that can actively bind a target molecule from an ink. Crosslinking capture molecules (e.g., Abs or antigens) to a stamp allows inking the stamp by exposing it to a solution containing targets for the surface-bound capture molecules. This inking strategy, termed αCP, is analogous to capturing a protein on a column for affinity chromatography, although release of the captured species can occur during printing (Figure 7A) [13]. The idealized view of αCP is to have an affinity stamp (α-stamp) with multiple sites for capturing target molecules from solution in parallel. This could be used for many cycles of capture and printing. The fluorescence image in Figure 7B corresponds to the detection of fluorescently labeled Abs that were inked onto an α-stamp and printed on a glass substrate. This stamp had two types of binding sites (antigens) that were crosslinked to PDMS with spatial control by means of subtractive printing and ordinary printing [23]. The α-stamp used to print the Abs in Figure 7C was prepared by attaching capture antigens to a stamp activated with a crosslinker for proteins using a µFN. The yellow line corresponds to the printing of TRITC and FITC-labeled Abs that were simultaneously captured on a line of protein A immobilized on the stamp. Affinity contact printing, in particular when it employs stamps having multiple affinity sites, is both powerful and challenging. It is powerful because the ink can be a complex solution of biomolecules, the stamp is reusable, and patterns can have high resolution, as in µCP. The difficulty of αCP lies in the preparation of the α-stamp because the PDMS surface must be derivatized with crosslinkers and the quality of the patterns may degrade when preparation involves a large number of steps. The capture and release of radioactive or fluorescent proteins using αCP demonstrated the specificity of the capture event and the reusability of the α-stamp for at least 10 cycles [13]. Neuron-glial cell adhesion molecules (NgCAM), which are 200 kDa transmembrane proteins present at a concentration of ∼1 mg mL−1 in membrane homogenates of chicken brain, were captured by monoclonal Abs of an α-stamp and patterned on a polystyrene surface. The patterned surface appeared to be suitable for the attachment and growth of dorsal root glial neurons (Figure 7D and 7E), which was not the case where polystyrene was exposed directly to a nonpurified source of NgCAM [13]. Among all the printing methods reviewed here, αCP might be the most powerful method owing to the selective inking step and the reusability of α-stamps. It also has, in principle, the potential to print biomolecules with a defined orientation on a surface.
6 Methods 6.1 Molds and Stamps
Molds for preparing stamps are most often Si wafers patterned with a combination of photolithography and reactive ion etching. Reactive ion etching Si, instead of
6 Methods
Fig. 7 Affinity contact printing. (A) Stamps for "CP are prederivatized with capture sites and alternatively inked selectively and used for releasing the captured molecules on a substrate during the printing step. The fluorescence microscopy images correspond to (B) fluorescently labeled Abs co-captured and co-printed on glass using an "-stamp that had two types of capture sites; (C)
fluorescently labeled proteins captured on lines of an "-stamp and printed on glass; (D) the immunofluorescent detection of NgCAM that was patterned on polystyrene using an "-stamp decorated with lines of anti-NgCAM mAbs; and (E) the staining by immunofluorescence of neurons which attached to and developed on the printed pattern of NgCAM.
17
18 Microcontact Printing of Proteins
using directly the pattern of photoresist as the mold, prolongs the lifetime of molds. Si molds can be washed and cleaned; passivation of the Si surface is necessary before pouring PDMS with a release layer. Passivation can be done in situ after reactive ion etching or using a simple dessicator that can be evacuated. Typical release agents are fluorinated silanes. High-resolution molds are Si wafers patterned using electron-beam lithography. The density and resolution of the high-resolution features determine the price of these molds, which easily reaches $1000 per written cm2 for features having a critical dimension of 100 nm or less. Rapid prototyping is suitable for the preparation of stamps with a resolution of ∼25 µm [62] and only requires access to a high-resolution printer (5000 dpi). Replicating molds to make stamps and other operations such as inking stamps and printing are best done in a clean room or using a laminar flow bench to minimize the contamination of surfaces by dust particles. Sylgard 184 (Dow Corning) is used to prepare stamps in many cases, and comprises two prepolymers which, once mixed at a ratio of 1:10 (catalyst and hydridosiloxanes:vinyl-functionalized siloxanes), are poured on the master and cured at 60◦ C overnight. The formulation of harder, mechanically more stable PDMS is sometimes necessary when stamps have tall, isolated features [8]. Stamps can be cured on a backplane made of a thin steel plate or glass sheet [47], or on another PDMS layer [63]. Backplanes are typically flexible, but nevertheless give dimensional and long-range stability to stamps, which can then be mounted on a mask aligner, a printer, or handled by hand more conveniently. Design rules to make stamps are described jointly with the analytical description of the formation of conformal contact between stamps and substrates [6, 48]. This information is valuable to estimate the mechanical stability of stamps against pressure and to predict whether conformal contact will occur on rough surfaces or surfaces having topography. Hydrogel stamps are composed of a polymer that is crosslinked to the desired value (2–4%) and embedded in a microcapillary [12] or patterned by photocuring the hydrogel precursor sandwiched between a slide and a mold [15]. Both the inking and printing with hydrogel stamps rely on the diffusion of proteins through the hydrogel medium. 6.2 Surface Chemistry of Stamps
PDMS is hydrophobic and promotes the adsorption of proteins from solution in a manner analogous to polystyrene. Modifying the surface chemistry of stamps is necessary in two cases. Polypeptides and homogeneously polar biomolecules require stamps to have a hydrophilic surface for inking [38]. Affinity stamps must be derivatized with a ligand specific for biomolecule targets [13]. Exposing a PDMS stamp to an O2 -based plasma creates a silica-like layer on PDMS in a self-limiting manner. Stamps should be freshly inked (within ∼5 min) after the plasma treatment to prevent the recovery of their hydrophobic character [20, 22, 46]. This hydrophobic recovery originates from the migration of low-MW silicone residues from the bulk to
6 Methods 19
the surface. The plasma-induced scission of some polymer chains might also create mobile residues. It is impractical to extract these residues from the prepolymer components or after polymerization. Instead, plasma-treated stamps can be kept under water for long periods of time (more than days). Anchoring crosslinkers for proteins on plasma-treated stamps in one or more steps permits attaching covalently ligands onto the stamps [23]. Unreacted crosslinkers can be quenched with chemicals or reacted with noninterfering proteins such as BSA. 6.3 Inking Methods
The time to ink a stamp with a full monolayer of protein can be relatively long: up to 30 min at room temperature to obtain a monolayer of Ab using a concentration in phosphate-buffered saline (PBS) of 1 mg mL−1 , and 45 min with a solution of 5 µg mL−1 [11, 16]. The stamp is rinsed and dried after the inking step and then placed in contact with a substrate. The inking of hydrogel stamps with 1 mg mL−1 solutions of Abs in PBS takes similar times, and might even be faster if the gel is initially dry [12]. Shortening the inking time of PDMS stamps and localizing the inking is possible using µFNs [59]. In this case, the adsorption of proteins to the stamp might not be mass transport-limited, and is local in the regions of the stamp exposed to the channels. Localized inking is equally possible using microcontainers. These are small reservoirs microfabricated in Si, for example, and filled with the same, or different, solutions of proteins by hand or using pipetting robots [23]. Subtractive inking corresponds to inking entirely a flat PDMS stamp with proteins and transferring a subset by printing on a structured target. This strategy can remove proteins from areas of the stamp that become free for the inking of other proteins from solution [17, 23, 57]. This method can be repeated to form patterns with different types of protein next to each other but it requires an alignment step. Inking a stamp for αCP is analogous to linking a protein to a chromatography column via NH2 residues. Crosslinking protocols are usually well detailed by chemical suppliers or reviewed elsewhere [64]. 6.4 Treatments of Substrates
Surfaces more wettable by water than PDMS stamps are, in principle, suited for printing proteins [18]. Otherwise, they can be derivatized appropriately using plasma deposition techniques, oxidizing methods, or by grafting ultrathin organic layers. The wettability criteria may not apply when hydrophilic stamps are used to print certain polar biomolecules. In this case, derivatization of the substrate might also be necessary. Polylysine has been printed on glass directly [38] and on glass derivatized with glutaraldehyde [32], biotin on amine-reactive substrates [34, 35, 65, 66], and the adhesion peptide PA22-2 on a thiol-reactive surface [41]. In principle, crosslinkers can be attached to many types of substrates to bind proteins from stamps with high efficiency.
20 Microcontact Printing of Proteins
6.5 Printing
Handling a stamp with tweezers is the simplest approach to microcontact print proteins on surfaces both at low and high resolution. Typically, the stamp is brought close to the surface at an angle and set down gradually to ensure that conformal contact propagates from the initial contact areas without trapping air. Occasionally, (dust) particles or topography on the substrate are an obstacle to the propagating contact; applying gentle pressure to the stamp helps spread the contact to the rest of the substrate. Hybrid stamps are convenient to mount on printing tools [50] such as modified mask aligners, or on home-built printers using step motors to print substrates (up to 40 cm in lateral dimensions) with curved stamps while controlling the pressure applied to the stamp during printing [67]. Alignment of the stamp to preexisting structures on the substrate is desirable to print cell adhesion molecules on electrodes or to pattern substrates with multiple types of proteins, for example [39, 50]. Other methods already employed in soft lithography might be applicable for printing proteins. One example is printing surfaces and curved surfaces with cylindrical stamps [5, 10].
6.6 Characterization of the Printed Patterns
Surface-sensitive techniques such as ellipsometry, contact angle microscopy, and X-ray photoelectron spectroscopy can provide chemical information about surfaces printed with proteins. Patterns are best characterized using: (i) AFM, for which no labeling of the proteins is necessary; (ii) fluorescence and scanning confocal fluorescence microscopy; (iii) scanning electron microscopy, provided that the protein layer attenuates the emission of secondary electrons sufficiently to yield fair contrast [68]; or (iv) time-of-flight secondary ion mass spectroscopy [69]. AFM yields rich data concerning the contrast and resolution of the patterns, as well as the appearance and height of the printed layer, but suffers from the difficulty in localizing small printed areas [16]. Fluorescence microscopy is conversely effective in localizing signals from fluorescently labeled proteins, albeit with much less resolution than AFM. Colocalization of fluorophores having different spectral properties allows the detection of successively different types of proteins forming complex patterns. Optimization of the fluorescent signals is greatly facilitated by a` priori knowledge of the geometry of a printed pattern. Detection of unlabeled proteins is possible by immunoassays with detecting Abs either fluorescently labeled or conjugated with a reporter enzyme. In the latter case, the enzymatic conversion of chromogenic precursor into a precipitating fluorescent product keeps the fluorescent signal localized to the printed areas [17]. Staining using electroless deposition also keeps signals local [23]. Microcontact printing proteins onto diffraction gratings [51] or surfaces suited for plasmon resonance [26] offers the exciting capability of following binding events in real time and over sites in parallel.
7 Acknowledgments 21
7 Conclusion
The possibilities of microcontact printing biomolecules on a variety of surfaces with spatial control and resolution down to single protein molecules is unprecedented. Many fields could benefit in principle from these achievements. Surfaces could be decorated with high-quality patterns of microcontact-printed proteins for diagnostic applications. Very small volumes and quantities of reagents would be necessary for this purpose, using inking strategies based on microfluidic networks. In this case, numerous different types of protein could be patterned next to each other and used for surface ligand assays [70]. Printing proteins with tools that control the pressure during printing – hybrid stamps that are accurate and mechanically stable – and in alignment with predefined structures on the substrate has been demonstrated. The next steps are to build on these concepts, and mass fabricate high-quality arrays of proteins on surfaces such as glass slides or polystyrene surfaces for diagnostic applications. Stamps of various types can be devised. Some incorporate proteins in solution in a hydrogel-based reservoir, some have sites capable of binding target molecules from a complex ink, and others display numerous inked sites of micrometer lateral dimensions. Wherever biomolecules can be used on a surface, they might be placed advantageously by means of µCP. This is clearly the case for the growth of cells on surfaces, for which it becomes possible to construct hybrid architectures with cells connected to parts of electronic devices [53]. Positioning biomolecules on a biosensor surface is equally interesting. Microcontact printing can pattern areas of a sensing element with great precision and contrast, enabling the real-time monitoring of binding events on a surface. Possibly, the interaction between a large number of analytes and multiple printed sites could be screened for multianalyte immunoassays or drug screening. Microcontact printing is also well suited for preparing samples to investigate the biophysical properties of single biomolecules. Arrays of single biomolecules provide the advantage of having multiple sites to study each immobilized molecule with easy localization, without suffering from averaging effects, and with minimal signal degradation (photobleaching). It appears that although µCP was originally developed as a tool for applications in lithography [5], it is such a versatile technique that chemists and biologists have diverted it towards many more purposes. Microcontact printing clearly has unique, impressive features to manipulate and pattern biomolecules on surfaces, and it will be interesting to see how these will translate into firmly established applications for diagnostics, and biology in general.
Acknowledgments
I am very grateful to my colleagues Bruno Michel for his indefectible support of the work, to Andr´e Bernard and Jean Philippe Renault for having pioneered and carried out challenging experiments on microcontact printing proteins, and to Sergei Amontov, Helen Berney, Hans Biebuyck, Alexander Bietsch, Hans
22 Microcontact Printing of Proteins
Rudolf Bosshard, Isabelle Caelen, Dora Fitzli, Matthias Geissler, Bert Hecht, David Juncker, Max Kreiter, Heinz Schmid, Peter Sonderegger, Richard Stutz, Heiko Wolf, and Marc Wolf for their close collaboration on our biopatterning activities.
References 1 See for example the proceedings on µTAS: Micro Total Analysis Systems 2002: Y. BABA, S. SHOJI, A. van den BERG (eds), Kluwer Academic Publishers, Dordrecht, The Netherlands. 2 R. S. KANE, S. TAKAYAMA, E. OSTUNI, D. E. INGBER, G. M. WHITESIDES, Biomaterials 1999, 20, 2363–2376. 3 (a) D. R. REYES, D. IOSSIFIDIS, P.-A. AUROUX, A. MANZ, Anal. Chem. 2002, 74, 2623–2636; (b) P.-A. AUROUX, D. IOSSIFIDIS, D. R. REYES, A. MANZ, Anal. Chem. 2002, 74, 2637–2652. 4 A. KUMAR, H. A. BIEBUYCK, G. M. WHITESIDES, Langmuir 1994, 10, 1498–1511. 5 Y. XIA, G. M. WHITESIDES, Angew. Chem., Int. Ed. Engl. 1998, 37, 550–575. 6 A. BIETSCH, B. MICHEL, J. Appl. Phys. 2000, 88, 4310–4318. 7 S. J. CLARSON, J. A. SEMLYEN (eds), Siloxane Polymers, PTR Prentice Hall, Englewood Cliffs, NJ, 1993. 8 H. SCHMID, B. MICHEL, Macromolecules 2000, 33, 3042–3049. 9 T. W. ODOM, J. C. LOVE, D. B. WOLFE, K. E. PAUL, G. M. WHITESIDES, Langmuir 2002, 18, 5314–5320. 10 Y. XIA, D. QIN, G. M. WHITESIDES, Adv. Mater. 1996, 8, 1015–1017. 11 A. BERNARD, E. DELAMARCHE, H. SCHMID, B. MICHEL, H. R. BOSSHARD, H. A. BIEBUYCK, Langmuir 1998, 14, 2225–2229. 12 M. A. MARKOWITZ, D. C. TURNER, B. D. MARTIN, B. P. GABER, Appl. Biochem. Biotechnol. 1997, 68, 57–68. 13 A. BERNARD, D. FITZLI, P. SONDEREGGER, E. DELAMARCHE, B. MICHEL, H. R. BOSSHARD, H. A. BIEBUYCK, Nature Biotechnol. 2001, 19, 866–869. 14 B. D. MARTIN, B. P. GABER, C. H. PATTERSON, D. C. TURNER, Langmuir 1998, 14, 3971–3975.
15 B. D. MARTIN, S. L. BRANDOW, W. J. DRESSICK, T. L. SCHULL, Langmuir 2000, 16, 9944–9946. 16 J. P. RENAULT, A. BERNARD, A. BIETSCH, B. MICHEL, H. R. BOSSHARD, E. DELAMARCHE, M. KREITER, B. HECHT, U. P. WILD, J. Phys. Chem. B 2003, 107, 703–711. 17 A. BERNARD, J.-P. RENAULT, B. MICHEL, H. R. BOSSHARD, E. DELAMARCHE, Adv. Mater. 2000, 12, 1067–1070. 18 J. L. TAN, J. TIEN, C. S. CHEN, Langmuir 2002, 18, 519–523. 19 A. KUMAR, G. M. WHITESIDES, Appl. Phys. Lett. 1993, 63, 2002–2004. 20 H. HILLBORG, U. W. GEDDE, IEEE Trans. Dielectrics and Electrical Insulation 1999, 6, 703–717. 21 G. S. FERGUSON, M. K. CHAUDHURY, H. A. BIEBUYCK, G. M. WHITESIDES, Macromole-cules 1993, 26, 5870–5875. 22 C. DONZEL, M. GEISSLER, A. BERNARD, H. WOLF, B. MICHEL, J. HILBORN, E. DELAMARCHE, Adv. Mater. 2001, 13, 1164–1167. 23 J. P. RENAULT, A. BERNARD, D. JUNCKER, B. MICHEL, H. R. BOSSHARD, E. DELAMARCHE, Angew. Chem., Int. Ed. Engl. 2002, 41, 2320–2323. 24 B. R. YOUNG, W. G. PITT, S. L. COOPER, J. Colloid Interface Sci. 1988, 124, 28–43. 25 J. D. ANDRADE, V. HLADY, Adv. Polym. Sci. 1986, 79, 1–63. 26 H. B. LU, J. HOMOLA, C. T. CAMPBELL, G. G. NENNINGER, S. S. YEE, B. D. RATNER, Sensors and Actuators B 2001, 74, 91–99. 27 L. A. KUNG, L. KAM, J. S. HOVIS, S. G. BOXER, Langmuir 2000, 16, 6773–6776. 28 D. STAMOU, C. DUSCHL, E. DELAMARCHE, H. VOGEL, Angew. Chem., Int. Ed. 2003, 42, 5580–5583. 29 D. LOSIC, J. G. SHAPTER, J. J. GOODING, Langmuir 2001, 17, 3307–3316.
References 30 D. W. BRANCH, B. C. WHEELER, G. J. BREWER, D. E. LECKBAND, IEEE Trans. Biomed. Eng. 2000, 47, 290–300. 31 M. NISHIZAWA, K. TAKOH, T. MATSUE, Langmuir 2002, 18, 3645–3649. 32 D. W. BRANCH, J. M. COREY, J. A. WEYHENMEYER, G. J. BREWER, B. C. WHEELER, Med. Bio. Engineer. Comput. 1998, 36, 135–141 33 J. S. HOVIS, S. G. BOXER, Langmuir 2000, 16, 894–897. 34 Z. YANG, A. CHILKOTI, Adv. Mater. 2000, 12, 413–417. 35 J. LAHANN, I. S. CHOI, J. LEE, K. F. JENSEN, R. LANGER, Angew. Chem., Int. Ed. Engl. 2001, 40, 3166–3169. 36 A. T. A. JENKINS, N. BODEN, R. J. BUSHBY, S. D. EVANS, P. F. KNOWLES, R. E. MILES, S. D. OGIER, H. SCHO¨ NHERR, G. J. VANCSO, J. Am. Chem. Soc. 1999, 121, 5274–5280. 37 E. DELAMARCHE, H. A. BIEBUYCK, H. SCHMID, B. MICHEL, Adv. Mater. 1997, 9, 741–746. 38 C. D. JAMES, R. C. DAVIS, L. KAM, H. G. CRAIGHEAD, M. ISAACSON, J. N. TURNER, W. SHAIN, Langmuir 1998, 14, 741–744. 39 C. D. JAMES, R. D. DAVIS, M. MEYER, A. TURNER, S. TURNER, G. WITHERS, L. KAM, G. BANKER, H. CRAIGHEAD, M. ISAACSON, J. TURNER, W. SHAIN, IEEE Trans. Biomed. Eng. 2000, 47, 17–21. 40 L. KAM, W. SHAIN, J. N. TURNER, R. BIZIOS, Biomaterials 2001, 22, 1049–1054. 41 M. SCHOLL, C. SPRO¨ SSLER, M. DENYER, M. KRAUSE, K. NAKAJIMA, A. MAELICKE, W. KNOLL, A. OFFENHA¨ USSER, J. Neurosci. Methods 2000, 104, 65–75. 42 J. HYUN, H. MA, P. BANERJEE, J. COLE, K. GONSALVES, A. CHILKOTI, Langmuir 2002, 18, 2975–2979. 43 P. HARDER, M. GRUNZE, R. DAHINT, G. M. WHITESIDES, P. E. LAIBINIS, J. Phys. Chem. B 1998, 102, 426–436. 44 K. L. PRIME, G. M. WHITESIDES, J. Am. Chem. Soc. 1993, 115, 10714–10721. 45 S. R. SHETH, D. LECKBAND, Proc. Natl. Acad. Sci. USA 1997, 94, 8399–8404. 46 E. DELAMARCHE, C. DONZEL, F. KAMOUNAH, H. WOLF, M. GEISSLER, R. STUTZ, P. SCHMIDT-WINKEL, B. MICHEL, H. J. MATHIEU, K. SCHAUMBURG, Langmuir 2003, 19, 8749–8758.
47 B. MICHEL, A. BERNARD, A. BIETSCH, E. DELAMARCHE, M. GEISSLER, D. JUNCKER, H. KIND, J.-P. RENAULT, H. ROTHUIZEN, H. SCHMID, P. SCHMIDT-WINKEL, R. STUTZ, H. WOLF, IBM J. Res. Develop. 2001, 45, 697–719. 48 C. Y. HUI, A. JAGOTA, Y. Y. LIN, E. J. KRAMER, Langmuir 2002, 18, 1394–1407. 49 F. SCHWESINGER, R. ROS, T. STRUNZ, D. ANSELMETTI, H.-J. G¨UNTHERODT, A. HONEGGER, L. JERMUTUS, L. TIEFENAUER, A. PLU¨ CKTHUN, Proc. Natl. Acad. Sci. USA 2000, 97, 9972–9977. 50 L. LAUER, S. INGEBRANDT, K. SCHOLL, A. OFFENHA¨ USSER, IEEE Trans. Biomed. Eng. 2001, 48, 838–842. 51 P. M. St. JOHN, R. DAVIS, N. CADY, J. CZAJKA, C. A. BATT, H. G. CRAIGHEAD, Anal. Chem. 1998, 70, 1108–1111. 52 J. LAHANN, M. BALCELLS, T. RODON, J. LEE, I. S. CHOI, K. F. JENSEN, R. LANGER, Langmuir 2002, 18, 3632–3638. 53 A. OFFENHA¨ USSER, K. VOGT in C. M. NIEMEYER, C. A. MIRKIN (eds.): Nanobiotechnology, Chapter 5 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim 2004. 54 S. C. LIN, F. G. TSENG, H. M. HUANG, C. Y. HUANG, C. C. CHIENG, Fresenius J. Anal. Chem. 2001, 371, 202–208. 55 L. G. MENDOZA, P. MCQUARY, A. MONGAN, R. GANGADHARAN, S. BRIGNAC, M. EGGERS, BioTechniques 1999, 27, 778–788. 56 J. B. DELEHANTY, F. S. LIGLER, Anal. Chem. 2002, 74, 5681–5687. 57 H. D. IRENOWICZ, S. HOWELL, F. E. REGNIER, R. REIFENBERGER, Langmuir 2002, 18, 5263–5268. 58 J. TIEN, C. M. NELSON, C. S. CHEN, Proc. Natl. Acad. Sci. 2002, 99, 1758–1762. 59 M. GEISSLER, A. BERNARD, A. BIETSCH, H. SCHMID, B. MICHEL, E. DELAMARCHE, J. Am. Chem. Soc. 2000, 122, 6303–6304. 60 E. DELAMARCHE, A. BERNARD, H. SCHMID, B. MICHEL, H. A. BIEBUYCK, Science 1997, 276, 779–781. 61 D. JUNCKER, H. SCHMID, U. DRECHSLER, H. WOLF, M. WOLF, B. MICHEL, N. de ROOIJ, E. DELAMARCHE, Anal. Chem. 2002, 74, 6139–6144. 62 D. QIN, Y. XIA, G. M. WHITESIDES, Adv. Mater. 1996, 8, 917–919.
23
24 Microcontact Printing of Proteins
63 T. W. ODOM, V. R. THALLADI, J. C. LOVE, G. M. WHITESIDES, J. Am. Chem. Soc. 2002, 124, 12112–12113. 64 G. T. HERMANSON, A. K. MALLIA, P. K. SMITH (eds), Immobilized Affinity Ligand Techniques, Academic Press Inc., San Diego, CA, 1992. 65 Z. YANG, A. M. BELU, A. LIEBMANNVINSON, H. SUGG, A. CHILKOTI, Langmuir 2000, 16, 7482–7492. 66 J. HYUN, Y. ZHU, A. LIEBMANN-VINSON, Jr., T. P. BEEBE, A. CHILKOTI, Langmuir 2001, 17, 6358–6367.
67 E. DELAMARCHE, J. VICHICONTI, S. A. HALL, M. GEISSLER, W. GRAHAM, B. MICHEL, R. NUNES, Langmuir 2003, 19, 6567–6569. 68 G. P. LOPEZ, H. A. BIEBUYCK, R. HAERTER, A. KUMAR, G. M. WHITESIDES, J. Am. Chem. Soc. 1993, 115, 10774–10781. 69 A. M. BELU, Z. YANG, R. ASLAMI, A. CHILKOTI, Anal. Chem. 2001, 73, 143–150. 70 R. P. EKINS, H. BERGER, F. W. CHU, P. FINKCH, F. KRAUSE, Nanobiology 1998, 4, 197–220.
1
Bacteriorhodopsin and Its Potential in Technical Applications Norbert Hampp
Philipps-Universit¨at Marburg, Marburg, Germany
Dieter Oesterhelt Max-Planck Institute for Biochemistry, Planegg-Martinsried, Germany
Originally published in: Nanobiotechnology. Edited by Christof M. Niemeyer and Chad A. Mirkin. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30658-9
1 Introduction
In biotechnology, the production of biological macromolecules for technical processes is state-of-the art. The biosynthetic capabilities of cells go far beyond those of organic chemistry. Materials such as functional biomolecules, enzymes, antibodies, and hormones are indispensable in the food industry, cleaning, medical diagnosis, pharmacy, and therapy, and consequently many companies supplying systems such as DNA chips and readers, and high-throughput screening platforms were established to meet demands. The biotechnology industry is booming – indeed, it is very likely to become the most important high-tech industry within the next few decades. Why is “bio” the coming technology? The advantages and the need for enduring miniaturizations are an increasing challenge as long as conventional lithographic methods are employed. The utilization of self-organization principles and bioengineering of functional biological structures seems to be a promising alternative approach. Re-engineering of biomolecules in order to realize technically desirable functions has become possible. However, there remains another problem – namely, the communication between classical microsystems (in particular electronic systems) and the nanoscaled biomolecules. This interface remains the major challenge in the realization of “cross-technology” products. Last – but not least – there is the problem of stability, as biomaterials are less stable than organic and semiconductor structures. This is of course a problem
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Bacteriorhodopsin and Its Potential in Technical Applications
today, but in future technologies, where repair mechanisms like those in living organisms may be implemented, this obstacle may be overcome. Today, we have the need to seek methods for stabilizing the structures of biomaterials before they can be considered for technical applications. Bacteriorhodopsin has been studied over the past two decades as a material for technical applications. Its stability is adequate, it has several technically interesting functions, tools for both its modification and production in technical quantities have been developed, and it offers various interface principles, whether optical, electrical, or chemical. In the following sections, the activities for technical applications of bacteriorhodopsin will be reviewed and future developments will be discussed.
2 Overview: The Molecular Properties of Bacteriorhodopsin
In the first section of this chapter, the organisms in which bacteriorhodopsin is found are described, after which the processes used to modify this protein and to produce it in large quantities will be outlined. Finally, the current technical applications of bacteriorhodopsin will be summarized. 2.1 Haloarchaea and their Retinal Proteins
Archaea form, together with Bacteria and Eucarya, the three domains in life. Archaea are unicellular organisms thriving in a variety of habitats. Most of the archaea so far isolated and cultivated prefer extreme environments, although recent investigations have revealed archaea to be a standard component in biomasses of terrestrial and marine environments [1, 2]. Archaea are split into the crenarchaeotal and the euryarchaeotal branches. Members of the crenarchaeotal family comprise the full temperature range of life, from habitats at −1.8◦ C in the Antarctic to hyperthermal springs where Pyrolobus fumarii holds the world record in optimal temperatures for growth at 113◦ C [3]. Typical of the euryarchaeota are the methanogenic archaea. These occur ubiquitously in locations where organic matter is decomposed under strictly anaerobic conditions, such as in aquatic sediments, in marshes, or in the rumen of herbivores, and they contribute heavily to biogenic methane production. A second branch of the euroarchaeota comprises extreme halophilic organisms, the so-called halobacteria; these are ubiquitous on earth wherever salt occurs in solute concentrations close to saturation (Figure 1). Archaea and their proteins won commercial interest for the unusual physicochemical conditions under which they live and reproduce. The best example is bioleaching (biohydrometallurgy) of ore rubble, which was first carried out during the 1950s for copper and is now used to produce a variety of metals [4]. Heatstable enzymes are valuable molecular biological tools; examples include heat-stable catalysts for the polymerase chain reaction, or use in washing powders. Further
2 Overview: The Molecular Properties of Bacteriorhodopsin 3
Fig. 1 Halobacterium salinarum is found in nature in concentrated salt solutions as they occur in salines. A purple color is caused by bacteriorhodopsin, and this is the key protein of the photosynthetic capabilities
of H. salinarum. The proton pathway with the amino acids involved and the lysinebound retinylidene residue are shown in the structural model of bacteriorhodopsin (foreground).
applications of archaeal proteins which are halophilic or especially resistant to extreme pH values might become apparent in the near future. A high potential for applications in nanotechnology lies in the fabrication of devices on the basis of archaeal surface-layer proteins [5]. These readily form two-dimensional crystals, and actually occur on the cell surface as natural crystals with pores of precisely defined size. To date, six genera of halophilic archaea have been identified [6], and retinal proteins occur ubiquitously among these. In two genera, the extreme halophilicity is connected to a second extreme condition of life, alkalophilism, and these archaea are found in large masses in salty natron lakes. Halophilic archaea in general grow on organic substrates and have optimized their bioenergetics during evolution. With a sufficient supply of organic nutrients and oxygen, they respire in standard fashion. However, it must be noted that oxygen solubility in saturated salt solution is about five times lower than in water. Under the anaerobic conditions that commonly occur in their habitats, these organisms have acquired three alternative means for energy conversion: either they use nitrate, dimethylsulfoxide or trimethylaminoxide as end electron acceptors [7, 8] or they ferment arginine to carbamoyl phosphate for the production of ATP [9–11]. The third choice is a very powerful system, and the second route to photosynthesis in nature. This system does not use chlorophyll-based reaction centers (as do bacteria and plants) but rather
4 Bacteriorhodopsin and Its Potential in Technical Applications
relies on retinal as a photon absorber and retinal-containing proteins as energy transducers [12]. The light-driven proton pump bacteriorhodopsin drives a proton circuit across the cell membrane, and ATP is produced via photophosphorylation as the chemical energy source for cell growth. The process is supported by the chloride pump halorhodopsin, which converts light energy via chloride transport into electrochemical energy used for the maintenance of osmotic balance. Halobacteria in nature grow very slowly, due to the limited supply of organic nutrients in hypersaline lakes. (The reader is referred to Refs. [7, 8] for a detailed description of their ecophysiology and the natural cycle of their “blooms” in salt lakes.) An example of such massive growth density is shown in Figure 1; an indication of heavy growth is the presence of an intense reddish to purple color. These blooms may occur only once each year (or even less frequently), but under optimization of nutrient supply in the laboratory generation times are usually in the order of hours. For biotechnological use, these two facts have to be considered: (i) a salt concentration of 25% is required to prepare peptone media, and this may be detrimental even to stainless steel fermentors; and (ii) peptone media are usually expensive to produce. These two obstacles are respectively overcome by using salt-resistant fermenting units and low-priced yeast extracts to replace the peptone. Another point to consider is the genetic stability of strains. Here, extensive progress has been made in the availability of stable standard strains used in biotechnology. In particular, strains which produce bacteriorhodopsin can be maintained stably by the use of phototrophic selection procedures [13]. Halophilic archaea, among them the genus Halobacteria, are unique in the sense that they are the only group of archaea that contains retinal proteins. While originally thought to occur only in higher animals capable of vision, retinal proteins recently have also been found in unicellular plant organisms [14], in fungi [15], in bacteria [16] and, most diverse in function, in halophilic archaea [12]. While all other groups seem to harbor either one of the two principal functions of retinal proteins – that is, a sensory function (eucaryotes) or a presumed energy-converting function (bacteria) – only halophilic archaea have developed a set of four retinal proteins, two of which serve a sensory function, while two convert light energy to chemical energy. Retinal, or vitamin-A aldehyde, originates from β-carotene by oxidative cleavage in the center of the molecule. The aldehyde in the free state is a chemically labile molecule with five conjugated double bonds. It is oxygen-sensitive and shows lightinduced isomerization around all double bonds. Light and oxygen together (photooxidation) destroy the free retinal easily. All known proteins containing retinal protect the molecule against photooxidation and select specific photoisomerization reactions, e. g., from 11-cis retinal to all-trans retinal in visual pigments and all-trans retinal to 13-cis retinal in the haloarchaeal proteins. All retinal proteins known are intrinsic membrane proteins and possess a transmembrane helical topography. Retinal always binds to the ε-amino group of a lysine residue of the seventh transmembrane helix, and a protonated Schiff base results; this becomes embedded in a cage of amino acids, which in turn drastically modifies the spectroscopic, chemical, and photochemical properties. For example visual pigments cover the color range
2 Overview: The Molecular Properties of Bacteriorhodopsin 5
Fig. 2 Bioenergetics of Halobacterium salinarum. Bacteriorhodopsin acts as a light-driven, outward-directed proton pump. The generated proton gradient over the cell membrane drives a membrane-bound ATPase. These two proteins together form the simplest photosynthetic system known.
of the entire visible spectrum, and bacteriorhodopsin is chemically more stable than most proteins. Moreover, its light fastness is greater than that of organic dye molecules. Four molecular structures of retinal proteins are presently known: the visual pigment rhodopsin [17]; bacteriorhodopsin [18], halorhodopsin [19]; and sensory rhodopsin II [20]. Halobacterium salinarum, for example, makes extensive physiological use of retinal proteins (Figure 2). Bacteriorhodopsin drives the above-mentioned photosynthetic process for the production of ATP in light, and is also coupled to a highcapacity energy storage system in the form of a molar potassium gradient [21]. In order to maintain osmotic balance during growth – when a volume increase occurs under conditions of isoosmolar conditions, both inside and outside the cell – halorhodopsin is used as a light-driven anion pump [22]. This transports chloride ions into the cells, against the existing electrochemical potential, and allows a net salt accumulation to occur during the volume increase. This is a second way of avoiding the extensive use of respiratory energy at the expense of organic nutrients, by using light energy instead. Finally, two other retinal proteins occur in the cell which monitor the intensity and wavelength of the environmental illumination. These two photoreceptors – sensory rhodopsin I and II – receive orange light as an attractive stimulant, and blue light and near UV-light as repulsive stimulants (for a review, see Ref. [23]). Both photoreceptors signal to a two-component system of the cell, thereby regulating the frequency of the flagellar motor’s changes in rotational sense for directing halobacterial cells into an environment of optimal conditions. These photoreceptors are only two among a total of 18 receptors, all of
6 Bacteriorhodopsin and Its Potential in Technical Applications
which convert chemical and physical signals of the environment to direct the cell into areas of optimal growth. The structures of all four halobacterial retinal proteins are very similar, and the molecular structures of two – bacteriorhodospin and halorhodopsin – have been elucidated to the atomic level. Moreover, based on details of the sequence alignment of the two ion pumps and the two sensors, together with the details of some two dozen other structures of archaeal retinal proteins, a tree has been created for this unique family of proteins [24]. Of particular interest in biophysical and biochemical terms is that the functions of the proton pump, the chloride pump, and the sensors are largely inter-convertible, either by varying the physical conditions under which the molecules operate, or by introducing minor genetic modifications. As an example, a point mutation in bacteriorhodopsin will convert this proton pump into a chloride pump [25, 26]. 2.2 Structure and Function of Bacteriorhodopsin
Bacteriorhodopsin is by far the best-studied archaeal retinal protein. It is naturally overproduced in the cells under conditions of illumination and limited aeration of a growing cell culture, but constitutive overproducers have been isolated. These cell lines produce up to 300 000 copies per cell, covering about 80 % of the cell surface as patches of two-dimensional natural crystals of bacteriorhodopsin with specific lipidic molecular species forming the so-called purple membrane [18]. Being a paradigm of light-driven proton pumps and seven-transmembrane helical proteins, the intense studies on this molecule over the past three decades by dozens of laboratories have produced a large body of knowledge on its biophysics, biochemistry, and molecular biology. Thus, the biotechnology of this “myoglobin” of membrane proteins is well founded on detailed knowledge about this molecule (for a recent review, see Ref. [27]). Like the other archaeal retinal proteins, bacteriorhodopsin is an intrinsic membrane protein with the common seven-transmembrane helix topology (see Figure 1) and an approximate molecular weight of 26 kDa. The seven helices are arranged in two arcs: an inner arc with helices B, C, and D; and an outer arc with helices E, F, G, and A. A transmembrane pore is formed mainly between helices B, C, F, and G. The retinal is bound to Lys216 in helix G as a protonated Schiff base, which interrupts the pore and separates an extracellular (EC) half channel (Figure 1, downward oriented) from a cytoplasmic (CP) half channel (Figure 1, upward oriented). The retinal side chain in the binding pocket is closely packed between four tryptophan residues, and the positively charged Schiff base interacts electrostatically with the protein environment. The chromophore is defined as the retinylidene moiety and the side chains in contact with it. Its color is tuned by electrostatic interactions specifically with the Schiff base and a complex counterion consisting of several amino acids. Light absorption causes photoisomerization of the retinal (all-trans to 13-cis and vice versa) and energy storage which includes ion-affinity shifts (H+
2 Overview: The Molecular Properties of Bacteriorhodopsin 7
Fig. 3 Photocycle of bacteriorhodopsin. (A) Upon absorption of a photon, the initial $-state of bacteriorhodopsin is converted photochemically to the J-state from where a series of thermal steps leads back to the initial state. The proton transport is intimately coupled to the photocycle, which is observed as a sequence of intermediates which are represented by the common single-letter code with their absorption maxima given as subscripts. In the dark, bacteriorhodopsin relaxes thermally to the D-
state which has 13-cis configuration. The resulting mixture of $- and D-states is called dark-adapted bacteriorhodopsin. From the O-state, a photochemical conversion of alltrans to 9-cis retinal is possible which is not thermally reisomerized to the initial state. (B) The proton transport and related retinal configurations as well as accessibility of the nitrogen in the Schiff-base linkage between retinal and Lys216 are indicated. This sequence represents several of the molecular changes involved in the proton transport.
in bacteriorhodopsin and Cl− in halorhodopsin) of the Schiff base and internal binding sites as well as conformational changes as central elements of the catalytic cycle. This cycle may formally be represented as a sequence of six steps which are indispensable for transport: the all-trans to 13-cis photoisomerization and its thermal reversal (isomerization, I); a reversible change in accessibility (switch, S) of the Schiff base for ions in the EC and CP channel respectively; and ion transfer (T) reactions to and from the Schiff base active center (Figure 3). Specifically in
8 Bacteriorhodopsin and Its Potential in Technical Applications
wild-type bacteriorhodopsin the order of these elementary steps is photoisomerization, proton transfer from the Schiff base to the acceptor aspartic acid (D) 85 in the EC channel, change in accessibility of the Schiff base from EC to CP, proton transfer from the donor aspartic acid (D) 96 in the CP channel to the Schiff base and thermal reisomerization, followed by reset of the Schiff base accessibility from CP to EC. The two proton transfer steps are intimately linked to cooperative changes in the EC and CP channels where hydrogen networks exist [28], and proton conduction over the total distance of about 48 Å is very likely based on a Grotthuss-type mechanism [29]. Structural key players in the EC channel are, besides D85, the arginine residue 82 and the two glutamic acid residues 194 and 204. They are connected by a water bridge to form a dyad which, by all likelihood, is the proton-release unit on the EC surface [18]. Other water molecules in the EC channel also play an important role [30, 31] in addition to amino acid side chains. One water molecule is located between D85 and the Schiff base, and disappears in the structure of the key intermediate M [32]. Three more water molecules below D85 are interconnected with side chains to form an extended hydrogen network as suggested by early experiments [33]. The CP channel from which the Schiff base later in the catalytic cycle receives the proton back is functionally dominated by the protonated D96. The distance between D96 and the nitrogen is 15 Å, and the hydrophobic nature of the space between them could provide an insulating layer against the membrane potential of 280 mV in the ground state. One water molecule occurs in contact with D96, and one water molecule is located between D96 and the Schiff base. Again, changes of this part of the molecule are observed in the key intermediate M [32]. At the proton entrance of the CP channel, aspartic acid D38 plays a role in the refeeding mechanism of protons [34]. The most dramatic effect of mutations on the structure and function of bacteriorhodopsin have the two carboxylates at positions 85 and 96. The removal of the proton acceptor D85 prevents the deprotonation of the Schiff base and almost completely annihilates the proton transport function (for a detailed description, see Ref. [35]). The lack of the proton donor D96 slows the catalytic cycle by a factor of up to several hundred, depending on temperature, humidity, azide concentration, etc. [36]. The reason is that the lack of the internal proton donor renders the rate of reprotonation of the Schiff base depending on external pH. Another important feature of the M-intermediate is its capacity to absorb blue light, and by this to reconvert photochemically into the initial state. Thus, in mutants lacking D96 the life-time of the M-intermediate and therefore the speed of color changes can be regulated either by physico-chemical parameters such as pH, temperature, and humidity, or alternatively by simultaneous application of two photons of different quality, for example green and blue photons. The three most important features, which eventually lead to biotechnological use of bacteriorhodopsin on the basis of its catalytic cycle are: 1. The color changes, which can be used for any type of information processing and storage process.
2 Overview: The Molecular Properties of Bacteriorhodopsin 9
2. The photoelectric events which are due to the changing geometry of the Schiff base upon photoisomerization and the movement of the proton. Such electric changes occur from the picosecond to the millisecond time regime. 3. The pH change between the inside and the outside of bacteriorhodopsincontaining membrane systems as the net result of proton translocation.
So far, the color changes are under most intensive investigation because of the velocity of light-triggered reactions in bacteriorhodopsin and the possibility of regulating the speed of the color changes over a very wide range. 2.3 Genetic Modification of Bacteriorhodopsin
The molecular biology and genetics of halophilic archaea are developing at a slow, but constant, pace. The development of a transformation system was a major breakthrough, as this opened the door to site-specific mutagenesis and gene deletion or replacement [37, 38]. The basic process consists of a protoplast (spheroblast) preparation by incubation of cells with EDTA to remove magnesium ions. These are necessary for the integrity of the surface layer formed by the glycoprotein in the cell wall of archaea. Spheroblasts are incubated with a mixture of vector DNA carrying an antibiotic resistance and polyethyleneglycol. After a curing period in complex medium the cells are plated with antibiotic to select transformed clones [39]. Not many antibiotics have been reported as efficient agents against halophiles, and only two resistance genes have been cloned and inserted into suitable transformation vectors. Mevinolin inhibits the β-hydroxymethylglutaryl CoA (HMG) reductase and thus prevents isoprenoid synthesis of archaea. As lipids of halophilic archaea are diphytanoyl ethers of substituted glycerol, mevinolin completely blocks growth [40]. Novobiocin prevents DNA replication by inhibiting Gyrase B. The two selection marker genes used are hmg and gyrB; these were isolated from Haloferax volcanii and should have highly homologous genes in most other halophilic archaea [41]. The creation of site-specific retinal protein mutants requires appropriate host strains and vectors. The very stable strain H. (halobium) salinarum L33 [42] was introduced as a host for bacteriorhodopsin muteins [39]. It was found as a spontaneous mutation, carries an insertion element in the bop gene, and does not revert to a wild-type phenotype under any condition tested. Further strains were selected after either spontaneous or induced mutation which lack one or more of the retinal proteins, and many more strains with various phenotypes bearing favorable properties for investigations on the bioenergetics and signal transduction of halophiles have been reported. These are not covered in this chapter. More recently, systematic deletions of retinal protein genes have been described; one such example is the strain SNOB (S9 without bop) which is derived from the bacteriorhodopsinoverproducing strain S9 and has the bop gene deleted [43]. In consecutive rounds of a deletion procedure which uses the same antibiotic for selection repeatedly, all four retinal protein genes have been deleted to reach the strain NAOMI (now all
10 Bacteriorhodopsin and Its Potential in Technical Applications
opsins missing; M. Otsuka and D. Oesterhelt, unpublished results). This strain has the advantage that complementation with opsin genes allows the combination of their physiological function at will. Vectors have been introduced first on the basis of a replicative element and a mevinolin-resistant determinant from a halophilic cell, together with a replicon and an ampicillin resistance from E. coli to serve as shuttle vectors [44]. In many laboratories, a plethora of vectors was subsequently designed with a size smaller than 10 kb, and without halobacterial origin of replication (suicide vectors) to enforce homologous recombination and therefore stable integration into the halobacterial genome. For interruption of a functional bop gene, the resistance gene used for selection is usually inserted into the coding sequence and remains stably associated with it permanently. Site-specific mutated genes can be favorably introduced into wildtype (S9) or mutated (L33) background in the following way. The resistance gene is placed next to the mutant bop gene on the suicide vector DNA. After transformation, antibiotic-resistant clones are selected which must result from a single crossover genetic event leading to the integration of vector DNA into the chromosome. These clones are allowed to grow without selection pressure, and are then replica plated with and without antibiotic. Among several genetic results is one which removes the vector DNA with the resistance gene and the original version of bop gene and leaves the mutated bop gene in the correct place stably integrated. These clones are found at various frequencies (usually one among several hundred) as the phenotype which does not grow on the antibiotic-containing plate but on its antibiotic-free counterpart. Besides genetic stability, the procedure provides the additional advantage of repeated use of the same antibiotic in further rounds of genetic alterations. Mutations of bacteriorhodopsin have been produced by the hundred and, once published, are available from the various laboratories. While the genetic background into which the mutated bop genes are introduced is often different, the mutagenesis procedure presently is usually the overlapping PCR method which replaced the preceding approach of gapped duplex DNA and is easily and quickly applied. Recently, a β-galactosidase gene as a reporter gene has been shown to act as an indicator gene for promoter strength and for blue-white selection procedures [45–47]. In conclusion, the production of bacteriorhodopsin muteins has become a routine method, but the maintenance of stable strains overproducing these muteins requires much care. In addition, the maximal level of mutein production can be very different, is unpredictable, and whether a given mutein will produce the crystalline arrangement of the purple membrane is also in doubt. 2.4 Biotechnological Production of Bacteriorhodopsins
The biotechnology of bacteriorhodopsin is based on simplicity of its isolation and chemical and photochemical stability when in the form of purple membranes. As
3 Overview: Technical Applications of Bacteriorhodopsin 11
mentioned, bacteriorhodopsin forms 2D crystals in vivo, and these can be isolated as purple membranes. The isolation is facilitated by two facts: (i) halobacterial cells are unstable in water and cell constituents are released upon lysis; (ii) the cell membrane is fragmented, and for unknown reasons the crystalline patches of the purple membrane are set free as fragments of largest size and highest buoyant density. These two specific features are used in the isolation procedure, either in a combination of sedimentation and isopycnic gradient centrifugation leading to a product of highest purity, or by a filtration procedure. The purification step yields a product which is 95–100% pure depending on the conditions and the mutein under consideration. Although these methods have not yet been exposed to economic competition, it is expected that the biotechnological production of purple membranes might in the future represent a competitive biomaterial in information technology. Certain limits of this material should be mentioned, however. As a membrane protein, bacteriorhodopsin cannot be produced via inclusion bodies as although the refolding and reconstitution of the active chromoprotein into membranes is possible in principle, it is certainly not feasible on an economic basis. Halobacterial cells do not form invaginations of their cell membrane like phototrophic bacteria (e. g., Rhodopseudomonas), which concentrate their reaction centers for photosynthesis in units called chromatophores. Although the purple membrane may occupy about 80 % of the cell membrane area, the amount of purple membranes obtained from cells is comparably low. It is needless to point out that the secretion of the integral membrane protein bacteriorhodopsin from cells into the medium has no biochemical basis. In practical terms, the current production of bacteriorhodopsin is at a level of 25 g m−3 nutrient broth. A 25-g quantity of bacteriorhodopsin would produce 50 L of suspension with an optical density of 1 (at a layer thickness of 1 cm this would allow the passage of only 10 % of light; i.e. it is an intense color). The molecular biology or production of bacteriorhodopsin variants has been established over the years, and at present the genetics is well known, including the sequence of the entire genome [48]. One problem in the molecular genetics of the halophilic archaea, especially H. salinarum, is that of insertion elements, as this sometimes causes instability of strains. However, with increasing knowledge on transformation procedures, vector construction and the creation of new antibiotics for the cells, these problems may be removed, at least for industrial purposes. It should be mentioned again, however, that cells producing bacteriorhodopsin variants do not always overproduce bacteriorhodopsin as do wild-type cells.
3 Overview: Technical Applications of Bacteriorhodopsin
The remarkable physico-chemical properties of bacteriorhodopsin and its numerous technically attractive molecular functions have led to many potential technical uses for this material [49–51]. In most of these applications, bacteriorhodopsin is used in the purple membrane (PM) form because its deliberation from the
12 Bacteriorhodopsin and Its Potential in Technical Applications
crystalline package reduces both the chemical and thermodynamic stability of bacteriorhodopsin to a significant degree. Bacteriorhodopsin comprises three basic molecular functions which may be used in technical applications: photoelectric, photochromic, and proton transport properties. The molecular mechanisms responsible for each of these properties have been already discussed, and here we will focus briefly on the related applications.
3.1 Photoelectric Applications
The use of bacteriorhodopsin as a molecular level photoelectric conversion element is one of the fields where technical applications of the material have been examined. Upon illumination, a photovoltage up to 250 mV per single bacteriorhodopsin layer is generated, and this may be used either as an indicator or control element for various applications. Triggered by the absorption of a photon, the bacteriorhodopsin molecule undergoes a series of very rapid molecular changes, one of which is the generation of a photovoltage caused by changes in the orientation of molecular dipole moments that are triggered by the isomerization of the retinal on the femtosecond scale (Figure 4A). The proton released through the outer proton half channel may be either transferred to the outer medium or conducted along the surface of the PM [52]. The proton conductivity along the PM in H. salinarum cells supports delocalization of protons on the surface and proton conduction to the membrane-bound ATPase molecules. In most of the photoelectric applications of PM, the water content of the films is reduced, and this in turn causes proton conduction along the membrane surface. A single PM sheet contains several thousands of unidirectionally oriented BR molecules. Upon illumination, a number of bacteriorhodopsin molecules proportional to the intensity of light will be excited to accomplish a proton transport process. As all of the bacteriorhodopsin molecules in a single PM patch are oriented in the same direction, the voltage generated over a single membrane is independent of the number of active molecules (Figure 4B), but the proton current generated is proportional to the light intensity. The photovoltage generated can be easily measured by embedding the PM layer between two transparent electrodes. The light-triggered photovoltage induces a compensating polarization voltage in the outer electrodes, and this in turn may be detected as a high impedance voltage signal (Figure 4C). In a perfectly capacitive coupled system of this type, an induced photovoltage with the characteristics shown in Figure 4C is observed. A voltage is induced only during the light intensity change. The polarity of the signal is different for the OFF =⇒ ON and the ON =⇒ OFF transition. This is called the “differential responsivity” of PM layers. There are two major issues to be solved for an attractive use of bacteriorhodopsin in photoelectric processing. The first is that a high degree of orientation of the PMs is obtained because counter-oriented PMs cancel out each others’ photoelectric effects. The second is that a coupling of the light-dependent proton-motive force
3 Overview: Technical Applications of Bacteriorhodopsin 13
Fig. 4 Photoelectric properties of bacteriorhodopsin. (A) Absorption of a photon by bacteriorhodopsin causes molecular changes which lead on a pico- to nanosecond timescale to the generation of a photovoltage over the molecule. The bacteriorhodopsin molecules in a single purple membrane (PM) patch are unidirectionally oriented. The light-driven vectorial proton transport of all the bacteriorhodopsin molecules is switched in parallel. This means that, over a single PM patch, it is not the voltage but the proton current which is proportional to the light intensity. The protons transported through the bacteriorhodopsin molecules can either be released to the outer medium or move along the surface of the PM patch due to proton conduction. (B) Oriented bacteriorhodopsin molecules, each represented by an arrow, sandwiched in a capacitor structure can be used as a photoelectric indicator cell. Upon illumination, a charge separation over the bacteriorhodopsin layer is generated which
induces a proportional charging of the outer electrode layers. No electric conduction between the bacteriorhodopsin layer and the electrodes is required. The electric field of the charge distribution induced in the electrodes compensates the electric field caused by bacteriorhodopsin. (C) Depending on the type of outer circuitry, either the induced voltage or the induced charge motion can be measured. In the latter case, a signal is recorded which corresponds to the first derivative of the temporal change of the light. (D) In a pixelated structure which is coated with oriented bacteriorhodopsin, photovoltages are measured only in those spots where a change in the light intensity occurs; hence, this is called novelty filtering. (E) In a volume (e. g., in 3-D data storage) the detection of the photovoltage in an outer capacitor structure was considered for readout. If the bacteriorhodopsin in the point of excitation is in the $-state, the absorption of a photon will lead to a photoinduced voltage, but not in the M-state.
14 Bacteriorhodopsin and Its Potential in Technical Applications
of bacteriorhodopsin to the electromotive forces used in conventional electronics needs to be achieved. 3.1.1 Preparation of Oriented PM Layers The photoelectric signal of PM is high enough so that a single layer only of PM is needed for applications where the PM acts as a photoelectric indicator molecule. The Langmuir-Blodgett technique is often used for the preparation of such devices. The main problem is to obtain a high degree of orientation of the PM patches. The PM patches are thin (5 nm), large (up to micrometers), and flexible, and for this reason a mechanical orientation is difficult to obtain. The physical properties of the two sides of the PM are definitively different, but the differences are not so large that orientation (e. g., in an electric field) achieves more than a preorientation. Any preorientation obtained needs to be made permanent, for example, by covalent crosslinking or by polymer embedding of the PM patches. The only reliable method described so far is the coating of a surface with monoclonal antibodies against bacteriorhodopsin [53]. Since the antibodies selectively react with the cytoplasmic or extracellular side of the PM, a highly oriented monolayer of PM can be obtained using this method, although unfortunately it is restricted to a single PM layer. 3.1.2 Interfacing the Proton-Motive Force The other problem in photoelectric applications is that under light exposure bacteriorhodopsin transports protons, but not electrons. The interface between the biological component bacteriorhodopsin and its proton-motive force and the electromotive forces required for conventional electronics needs to be considered in the design of devices. Interfacing the proton-motive force of BR with the electron-conducting outer electrodes requires an electrolyte layer which couples both “worlds”. The balancing between the electronic circuitry and the photoelectric properties of a PM layer is crucial, and is the reason why results from different laboratories are so incomparable. The more than complete review published by Hong [54] is recommended for the reader who seeks much more detail on this subject. 3.1.3 Application Examples Ultrafast photodetection was one of the earliest proposals for the use of bacteriorhodopsin in a technical application [55, 56]. Two-dimensional photoelectric arrays, as artificial retinas or as control elements for a liquid crystal spatial light modulator, were developed later. Last, but not least, the photoelectric properties of bacteriorhodopsin can be used for indicator purposes in 3-D memories. 3.1.3.1 Artificial retinas Most devices which make use of the photoelectric properties of bacteriorhodopsin are called “artificial retinas” [57–59], the reason being that they offer certain preprocessing features known from the retina, including edge detection and novelty filtering. The physical background is called the “differential response” of PM layers. A device (see Figure 4D) comprising an electrode array which is covered with one or more oriented layers of PM may be used for
3 Overview: Technical Applications of Bacteriorhodopsin 15
novelty filtering. Each of the electrodes is connected to an amplifier electronics. First, assume that the dark rectangular structure shown in Figure 4D prevents a set of electrodes from being exposed to light. Then, if this structure is moved over the light-sensitive sensor area (see arrow), it causes a voltage to be induced in each of the pixel electrodes, with a sign proportional to the light change. Because only those electrodes “fire” where a change of the illumination occurs, this type of sensor is called an “artificial retina”, and this type of preprocessing is called “novelty filtering”. Due to the differential response of bacteriorhodopsin, the polarity of the electrode signal carries the information whether a pixel was switched to “ON” or to “OFF”, and in turn the direction of the movement of the object can be derived. 3.1.3.2 Electro-optically controlled spatial light modulators Another use of the artificial retina described above may be in optically addressed spatial light modulators (SLMs). The amplifier electronics which detects the photovoltages induced in the electrode pixels may be connected to a SLM device. In particular, liquid crystal (LC)-based SLMs are state-of-the-art. In this case, the electrodes of the bacteriorhodopsin-based artificial retina are connected one-to-one to the pixels of a LC-SLM. An advanced version omits the wiring and amplifiers. The LC-layer of the SLM is controlled directly by the artificial retina device [60, 61]. 3.1.3.3 Readout in 3-D Memories Another application of the photoelectric properties of PM was investigated for the readout of volume storage units with bacteriorhodopsin. The basis is a cube of oriented PM patches in either the purple initial state or the yellow M state. Upon illumination of a PM patch in the initial state (which may be addressed by actinic light), a photovoltage signal is induced. A PM patch in the M state would not respond to the actinic light, but two electrodes on the outer surfaces of the bacteriorhodopsin cube could detect the photovoltage generated, and by this method the 3-D distribution of the photochemical states of PM patches could be read out. This is a prerequisite for a 3-D memory based on bacteriorhodopsin (Figure 4E). 3.2 Photochromic Applications
Most applications currently investigated utilize the photochromic properties of bacteriorhodopsin [50, 62]. During the photocycle, bacteriorhodopsin cycles through a pair of spectroscopically distinguishable intermediates (see Figure 3A), all of which have an absorption maximum which is different from that of the initial B state. However, in most applications the photochromism of bacteriorhodopsin is used in connection with the purple to yellow absorption change which is related to M state formation. Upon acidification, a blue membrane is formed which has a significantly different photocycle. The formation of a 9-cis retinal-containing state is observed, and this is thermally stable. In contrast, 13-cis retinal is isomerized by the bacteriorhodopsin molecule to all-trans retinal at room temperature, and the isomerization of 9-cis
16 Bacteriorhodopsin and Its Potential in Technical Applications
Fig. 5 Photochromism of bacteriorhodopsin and its application. (A) The retinylidene residues are strongly anisotropic. (B) Upon illumination with polarized light, the retinylidene residues which are in parallel to the electric field vector of the actinic light are preferentially excited and isomerized. (C)
Transient photochromic change of bacteriorhodopsin between the initial purple state and the yellowish M-state (middle). (D) The photochemical formation of 9-cis retinal may be utilized for photochromic long-term storage. (E) Permanent storage of information in bacteriorhodopsin.
retinal is not catalyzed. This pathway opens the route to long-term storage materials based on bacteriorhodopsin. 3.2.1 Photochromic Properties of Bacteriorhodopsin Isomerization from all-trans to 13-cis is the first occurrence after the photochemical excitation of bacteriorhodopsin, and this causes significant transient shifts in the absorption spectrum. In addition to the isomerization change, deprotonation of the chromophoric group is observed. In the L to M transition, a proton from the Schiff base nitrogen group is transferred to Asp85, and this deprotonation causes a drastic blue shift of the absorption to 410 nm. The photochromism of bacteriorhodopsin is dominated by the intermediate which has the longest life-time; hence, this forms a “bottleneck” in the photocycle. The retinylidene residue inside bacteriorhodopsin which forms part of the photochromic group in the molecule is strongly anisotropic. A PM layer may be considered as a crystalline arrangement of chromophoric groups which are oriented angles of 120◦ between them (Figure 5A). However, in most applications where a statistic number of PM patches is used, the angular chromophore distribution appears anisotropic. Due to the anisotropy of the retinylidene groups, excitation of a random distribution of PM patches with polarized light causes the chromophores to become oriented in parallel to the actinic light polarization such that a
3 Overview: Technical Applications of Bacteriorhodopsin 17
preferentially converted (and in turn photoinduced) anisotropy is obtained (Figure 5B). In solutions containing PM this is masked by diffusion, but in bacteriorhodopsin-films where the PM patches are fixed, the photo-induced anistropy can be easily observed and utilized. Today, three types of photochromic changes in bacteriorhodopsin have been described which enable different applications. The first is the photochromic shift between the B and M states (Figure 5C), and this is used mainly for optical processing tasks. The second is photoerasable data storage using 9-cis-containing states of blue membrane or suitably modified BR-variants (Figure 5D). However, the very low quantum efficiency for recording of far below 1 % is a major limitation. And last, but not least, permanent photochromic changes obtained through two-photon absorption in bacteriorhodopsin are suitable for long-term data storage (Figure 5E). 3.2.2 Preparation of Bacteriorhodopsin Films Optical films are prepared from bacteriorhodopsin by polymer embedding. Optically clear, water-soluble polymers are suitable for this purpose (e. g., polyvinylalcohol, gelatin). The film formation is usually carried out by mixing the polymers with PMs and additives in aqueous solution, this being cast on a glass support. The water is generally removed by drying in air, but the films may also be sealed with a second glass plate. 3.2.3 Interfacing the Photochromic Changes The reason why photochromic applications of bacteriorhodopsin are more developed than others is because of the ease with which an interface can be implemented between the bacteriorhodopsin films and any type of optical system. The bacteriorhodopsin film is completely sealed, the only interface being the light which transports energy and information simultaneously. 3.2.4 Application Examples Many applications based on the photochromism of bacteriorhodopsin have been suggested, and some are described briefly here as an indication of the wide range of potential uses. 3.2.4.1 Photochromic color classifier As the different rhodopsins in the eye enable color perception, the use of bacteriorhodopsins with different absorption maxima would allow a biomimetic system for color perception to be set up. The photoelectric response of three sensor elements coated with three different bacteriorhodopsin types (i. e., wild-type and two containing retinal analogues) are combined and coupled to a simple neural network for color recognition properties. This functions quite reliably, though it is unclear whether it has any technical advantages over conventional color sensors. Two limitations can be identified in this system. First, the absorption maxima of the three BR types used today are relatively similar, and they do not span the visual wavelength range as well as the human rhodopsins, notably in the blue region. Second, the conventional systems which typically comprise three different color filters and semiconductor light-sensitive elements reliably
18 Bacteriorhodopsin and Its Potential in Technical Applications
supply the required information. At present, it is difficult to identify any advantages of the bacteriorhodopsin-based systems over conventional systems [63]. 3.2.4.2 Photochromic inks Another development is the preparation of photochromic inks. These inks differ from the polymer films for optical recording by their rheological properties. Depending on the method of application (e. g., screen printing, offset printing), the viscosity and surface tension must be considered. The basic photochromic properties are quite similar to those of the optical films, but auxiliaries in the compositions adjust the required application dependent properties. A major problem is to identify suitable compositions which do not interfere with the photochemical properties of the bacteriorhodopsin embedded [64]. 3.2.4.3 Electrochromic inks The main color shift in bacteriorhodopsin is due not to a primary photochemical reaction but to the protonation change of specific groups, in particular the Schiff base linkage and the Asp85 residue. As protons are charged particles, their removal from the binding position by electric fields should be possible. Indeed, this can be demonstrated, though the speed and efficiency of the decoloration/coloration process is quite low. Nonetheless, the basic principle was successfully demonstrated [65], indicating a potential development of electrochromic paper using bacteriorhodopsin. 3.2.4.4 Photochromic photographic film It is well known that a permanent bleaching of bacteriorhodopsin may be achieved with hydroxylamine. The chemical reaction of hydroxylamine with the retinal binding site occurs in an intermediate state only, and no reaction of BR in the β-state is observed. Due to this finding, which dates back to the early retinal extraction experiments, it is possible to fabricate nonreversible optical films from bacteriorhodopsin [66]. These films behave quite similarly to photographic films, except that the compounds needed for the chemical development are already contained in the film. The reason why this process has not been used technically is that, after image formation, the nonreacted hydroxylamine must be removed from the film, or fading of the contrast will occur when stored in light. 3.2.4.5 Long-term photorewriteable storage of information Much interest has centered on photorewriteable optical storage with bacteriorhodopsin. For this purpose, the conventional all-trans to 13-cis conversion is not suitable because of the retinal reisomerization at room temperature in bacteriorhodopsin. In blue membrane, a photochemical conversion from all-trans to 9-cis retinal, which appears pink in bacteriorhodopsin, may be induced by high light intensities. 9-cis retinal is thermally stable and requires photochemical excitation for reconversion of all-trans. The main disadvantage of blue membrane is that its formation from PM requires either acidification or the removal of divalent cations, as both cause destabilization of the bacteriorhodopsin molecule. Aggregation of the PM patches is also seen. The requirement is for bacteriorhodopsin variants which have a similar photochemistry but at ambient pH value and without removal of cations. BR variants with alterations
3 Overview: Technical Applications of Bacteriorhodopsin 19
in position 85 show such desired properties, and one of the first reported for this purpose was D85N. Another approach is to use wild-type BR at neutral pH values and to switch the material photochemically to the 9-cis state from the so-called O-state [67]. This scheme was named branched photocycle memory [68]. 3.2.4.6 Neural networks Due to the fact that its absorption state may be shifted with blue and yellow light in different directions, bacteriorhodopsin is also a suitable material for neural networks. The output from an absorptive bacteriorhodopsin-cell is used to control the absorption state of another such cell. This has been demonstrated in principle [69], but the bacteriorhodopsin has a “fan-out” of much less than one and for this reason is not really a suitable material for the implementation of optical neural networks. (“Fan-out” is a term from electronics which characterizes the signal power output of a device compared to the required signal power input the same device requires.) A “fan-out” of 10 means that the output terminal of the device supplies enough power that 10 input terminals of identical devices could be supplied with enough energy to signal them the input state reliably. In photochromic devices where no amplification occurs, the “fan-out” is generally less than 1. Without external amplification, it is almost impossible to set up control loops. 3.2.4.7 3-D information storage The use of bacteriorhodopsin in 3-D information storage has been investigated for some time. Bacteriorhodopsin shows an astonishingly high two-photon absorption cross-section of the initial B state. This is used in a two-photon absorption setup to address the absorption state of bacteriorhodopsin in three dimensions [70]. The recording process is quite well handled, but the readout is an intrinsic problem. The advantage of such a memory device is its tolerance towards electromagnetic radiation. 3.2.4.8 Nonlinear optical filtering The strongly nonlinear optical response of bacteriorhodopsin towards the incident light intensity has given rise to many applications which use the nonlinear response for image processing purposes (e. g., edge enhancement, noise reduction). The response curve of the bacteriorhodopsin can be tuned over several orders of magnitude by changing the lifetime of the M-state. This may be accomplished by changing the pH, as well as using modified bacteriorhodopsins. 3.2.4.9 Holographic pattern recognition and interferometry The use of bacteriorhodopsin as a short-term memory has been tested in several applications. The most challenging was the construction of a real-time holographic pattern recognition system which operates at video frame rate. In this system, the holographic comparison of images allows similarities between objects used for identification purposes for complex images to be quantified [71, 72]. In the bacteriorhodopsinfilm, holograms are recorded, read-out and erased at video frame rate. In the early 1990s, when the system was first developed, it was unchallenged by computer systems, but as computing power has advanced the bacteriorhodopsin-film method has been abandoned.
20 Bacteriorhodopsin and Its Potential in Technical Applications
Typical photochromic applications include a holographic real-time correlator (Figure 6A) and a holographic camera for nondestructive testing (Figure 6B). Examples of photochromic inks made from bacteriorhodopsin are shown in Figure 6C, with the initial colored (purple) and bleached, yellowish inks in the foreground and background, respectively. An ID card sample with a bacteriorhodopsin-based optical storage in the purple-colored strip is shown in Figure 6D. 3.3 Applications in Energy Conversion
The use of bacteriorhodopsin as a light energy-converting device seems to be first choice as with regard to technical applications of the molecule [73, 74]. All applications of this type have as the key element in common a bacteriorhodopsin-driven charge separation over the PM. Although this is the basic function of bacteriorhodopsin embedded in PM, until now it has not been possible to prepare artificial PMs with technically relevant dimensions. Methods devised to overcome this problem have been unable to provide PMs capable of producing passive proton transport, and research is continuing in this area.
4 Methods
The methods required for this type of research span from biochemistry and bioengineering to various printing and optical techniques. All have been well documented, with the methods of cultivating halobacteria and isolating PMs being common to all applications. A comprehensive laboratory manual on Archaea was produced in 1995 [40]. For specific conditions of phototrophic growth [13], laboratory protocols of cultivation and media composition, the reader is referred to Ref. [40] (pages 13–21 and 225–230), while for the isolation of PMs from halobacteria, the reader is referred to Ref. [75]. The genetic modification of halobacteria is described in Refs. [47, 76]. The very first publication on the discovery of the bacteriorhodopsin [77], as well as the latest volume on its applications [78], should also be mentioned in this context. Information on specific processing steps for the wide range of applications is taken best from the relevant patent applications, and a summary of patents related to applications of bacteriorhodopsin may be found in Ref. [51].
5 Conclusions
Bacteriorhodopsin is today the biological photochromic material for which technical applications in optical information processing are much more developed than for any other biomaterial. The entire sequence, from an analysis of molecular
5 Conclusions 21
Fig. 6 Photochromic applications of bacteriorhodopsin. (A) Holographic correlator; (B) holographic camera for interferometric testing; (C) photochromic inks for security applications; (D) optical storage.
22 Bacteriorhodopsin and Its Potential in Technical Applications
function to its controlled modification and the development of suitable applications, has been demonstrated with this molecule. Ideas of utilizing the evolutionary optimized functions of biological molecules in technical processes by using genetic engineering to produce tailor-made modified versions of natural molecules with improved technical properties have been demonstrated with bacteriorhodopsin for the first time. Biomaterials as blueprints for technical materials with nanoscale functions form the basis of the concept of nanobionics – and bacteriorhodopsin was the first such example.
References 1 E. DELONG, Science 1998, 280, 542–543. 2 E. F. DELONG, K. Y. WU, B. B. PREZELIN, R. V. M. JOVINE, Nature 1994, 371, 695–697. 3 E. BLO¨ CHL, R. RACHEL, S. BURGGRAF, D. HAFENBRADL, H. W. JANNASCH, K. O. STETTER, Extremophiles 1997, 1, 14–21. 4 R. AMILS, A. BALLESTER (eds.), Biohydrometallurgy and the environment toward the mining of the 21st century. . Elsevier, Amsterdam, Lausanne, New York, Oxford, Shannon, Singapore, Tokyo, 1999. 5 U. B. SLEYTR, B. SCHUSTER, D. PUM, IEEE Eng. Med. Biol. 2003, 22, 140–150. 6 W. D. GRANT, H. LARSEN, in: Bergey’s Manual of Systematic Bacteriology, N. PFENNIG (ed.), Williams Wilkins, Baltimore 1989, pp. 2216–2233. 7 A. OREN, FEMS Microbiol. Rev. 1994, 13, 415–440. 8 A. OREN, H. G. TRU¨ PER, FEMS Microbiol. Lett. 1990, 70, 33. 9 R. HARTMANN, H.-D. SICKINGER, D. OESTERHELT. Proc. Natl. Acad. Sci. USA 1980, 77, 3821–3825. 10 A. RUEPP, H. M¨ULLER, F. LOTTSPEICH, J. SOPPA, J. Bacteriol. 1995, 177, 1129–1136. 11 L. I. HOCHSTEIN, F. LANG, Arch. Biochem. Biophys. 1991, 288, 80. 12 D. OESTERHELT, Curr. Opinion Struct. Biol. 1998, 8, 489–500. 13 D. OESTERHELT, G. KRIPPAHL, Ann. Microbiol. Inst. Pasteur 1983 134B, 137–150. 14 K. W. FOSTER, J. SARANAK, N. PATEL, G. ZARILLI, M. OKABE, T. KLINE, K. NAKANISHI, Nature 1984, 311, 756–759.
15 J. A. BIESZKE, E. N. SPUDICH, K. L. SCOTT, K. A. BORKOVICH, J. L. SPUDICH, Biochemistry 1999, 38, 14138–14145. 16 O. B´EJA` , L. ARAVIND, E. V. KOONIN, M. T. SUZUKI, A. HADD, L. P. NGUYEN, S. B. JOVANOVICH, C. M. GATES, R. A. FELDMAN, J. L. SPUDICH, E. N. SPUDICH, E. F. DELONG, Science 2000, 289, 1902–1906. 17 K. PALCZEWSKI, T. KUMASAKA, T. HORI, C. A. BEHNKE, H. MOTOSHA, B. A. FOX, I. L. TRONG, D. C. TELLER, T. OKADA, R. E. STENKAMP, M. YAMAMOTO, M. MIYANO, Science 2000, 289, 739–745. 18 L.-O. ESSEN, R. SIEGERT, W. D. LEHMANN, D. OESTERHELT, Proc. Natl. Acad. Sci. USA 1998, 95, 11673–11678. 19 M. KOLBE, H. BESIR, L.-O. ESSEN, D. OESTERHELT, Science 2000, 288, 1390–1396. 20 V. I. GORDELLY, J. LABAHN, R. MOUKHAMETHLANOC, R. EFREMOV, J. GRANZIN, R. SCHLESINGER, G. B¨ULDT, T. SAVOPOL, A. J. SCHELDIG, J. P. KLARO, M. ENGELHARDT, Nature 2002, 439, 464–465. 21 G. WAGNER, R. HARTMANN, D. OESTERHELT, Eur. J. Biochem. 1978, 89, 169–179. 22 D. OESTERHELT, Israel J. Chem. 1995, 35, 475–494. 23 W. MARWAN, D. OESTERHELT. ASM-News 2000, 66, 83–90. 24 K. IHARA, T. UMEMURA, I. KATAGIRI, T. KITAJIMA-IHARA, Y. SUGIYAMA, Y. KIMURA, Y. MUKOHATA, J. Mol. Biol. 1999, 285, 163–174. 25 J. SASAKI, L. S. BROWN, Y. S. CHON, H. KANDORI, A. MAEDA, R. NEEDLEMAN, J. K. LANYI, Science 1995, 269, 73–75.
References 26 J. TITTOR, U. HAUPTS, Ch. HAUPTS, D. OESTERHELT, A. BECKER, E. BAMBERG, J. Mol. Biol. 1997, 271, 405–416. 27 U. HAUPTS, J. TITTOR, D. OESTERHELT, Annu. Rev. Biophys. Biomol. Struct. 1999, 28, 367–399. 28 R. RAMMELSBERG, G. HUHN, M. L¨UBBEN, K. GERWERT, Biochemistry 1998, 37, 5001–5009. 29 M. WIKSTRO¨ M, Curr. Opin. Struct. Biol. 1998, 8, 480–488. 30 N. A. DENCHER, H. J. SASS, G. B¨ULDT, Biochim. Biophys. Acta 2000, 1460, 192–203. 31 J. HEBERLE, Biochim. Biophys. Acta 2000, 1458, 135–147. 32 H. LUECKE, B. SCHOBERT, H.-T. RICHTER, J.-P. CARTAILLER, J. K. LANYI, Science 1999, 286, 255–260. 33 J. le COUTRE, J. TITTOR, D. OESTERHELT, K. GERWERT, Proc. Natl. Acad. Sci. USA 1995, 92, 4962–4966. 34 J. RIESLE, D. OESTERHELT, N. A. DENCHER, J. HEBERLE, Biochemistry 1996, 35, 6635–6643. 35 U. HAUPTS, J. TITTOR, E. BAMBERG, D. OESTERHELT, Biochemistry 1997, 36, 2–7. 36 A. MILLER, D. OESTERHELT, Biochim. Biophys. Acta 1990, 1020, 57–64. 37 R. L. CHARLEBOIS, W. L. LAM, S. W. CLINE, W. F. DOOLITTLE, Proc. Natl. Acad. Sci. USA 1987, 84, 8530–8534. 38 S. W. CLINE, W. L. LAM, R. L. CHARLEBOIS, L. S. SCHALKWYK, W. F. DOOLITTLE, Can. J. Microbiol. 1989, 35, 148–152. 39 B. NI, M. CHANG, A. DUSCHL, J. LANYI, R. NEEDLEMAN, Gene 1990, 90, 169–172. 40 F. T. ROBB, A. R. PLACE, K. R. SOWERS, H. J. SCHREIER, S. DASSARMA, E. M. FLEISCHMANN (eds), Archaea. A Laboratory Manual. Cold Spring Harbor Laboratory Press, 1995. 41 M. L. HOLMES, S. D. NUTTALL, M. L. DYALL-SMITH, J. Bacteriol. 1991, 173, 3807–3813. 42 G. WAGNER, D. OESTERHELT, G. KRIPPAHL, J. K. LANYI, FEBS Lett. 1981, 131, 341–345. 43 M. PFEIFFER, T. RINK, K. GERWERT, D. OESTERHELT, H.-J. STEINHOFF, J. Mol. Biol. 1999, 287, 163–171. 44 W. L. LAM, W. F. DOOLITTLE, Proc. Natl. Acad. Sci. USA 1989, 86, 5478–5482.
45 D. GREGOR, F. PFEIFER, MICROBIOLSGM 2001, 147, 1745–1754. 46 N. PATENGE, A. HAASE, H. BOLHUIS, D. OESTERHELT, Mol. Microbiol. 2000, 36, 105–113. 47 M. DYALL-SMITH, The Halohandbook: Protocols for halobacterial genetics, 2001. http://www.microbiol.unimelb.edu.au/ staff/mds/HaloHandbook/index.html 48
[email protected] 49 R. R. BIRGE, Annu. Rev. Phys. Chem. 1990, 41, 683–733. 50 N. HAMPP, Chem. Rev. 2000, 100, 1755–1776. 51 N. HAMPP, Appl. Microbiol. Biotechnol. 2000, 53, 633–639. 52 B. GABRIEL, J. TEISSIE, Proc. Natl. Acad. Sci. USA 1996, 93, 14521–14525. 53 K. KOYAMA, N. YAMAGUCHI, T. MIYASAKA, Science 1994, 265, 762–765. 54 F. T. HONG, Prog. Surf. Sci. 1999, 62, 1–237. 55 H. W. TRISSL, M. MONTAL, Nature 1977, 266, 655–657. 56 F. T. HONG, BioSystems 1995, 35, 117–121. 57 T. MIYASAKA, K. KOYAMA, I. ITOH, Kagaku to Kogyo (Tokyo) 1993, 46, 247–249. 58 Z. CHEN, R. R. BIRGE, Trends Biotechnol. 1993, 11, 292–300. 59 T. MIYASAKA, K. KOYAMA, I. ITOH, Science 1992, 255, 342–344. 60 C. RITZEL, K. MEERHOLZ, C. BRA¨ UCHLE, D. OESTERHELT, Mol. Cryst. Liq. Cryst. Sci. Technol. A 1998, 315, 443–448. 61 C. RITZEL, K. MEERHOLZ, C. BRA¨ UCHLE, D. OESTERHELT, Proc. SPIE-Int. Soc. Opt. Eng. 1998, 3475, 49–55. 62 N. HAMPP, C. BRA¨ UCHLE, D. OESTERHELT, Biophys J. 1990, 58, 83–93. 63 M. FRYDRCH, P. SILFSTEN, S. PARKKINEN, J. PARKINNEN, T. JAASKELAINEN, Biosystems 2000, 54, 131–140. 64 N. HAMPP, T. FISCHER, M. NEEBE, Proc. SPIE 2002, 4677, 121–128. 65 P. KOLODNER, E. LUKASHEV, Y.-C. CHING, I. ROUSSO, Proc. Natl. Acad. Sci. USA 1996, 53, 6012–6015. 66 N. N. VSEVOLODOV, T. V. DYUKOVA, Trends Biotechnol. 1994, 12, 81–88. 67 A. POPP, M. WOLPERDINGER, N. HAMPP, C. BRA¨ UCHLE, D. OESTERHELT, Biophys. J. 1993, 65, 1449–1459.
23
24 Bacteriorhodopsin and Its Potential in Technical Applications
68 N. B. GILLESPIE, K. J. WISE, L. REN, J. A. STUART, D. L. MARCY, Q. LI, L. RAMOS, K. JORDAN, L. REN, R. R. BIRGE, J. Phys. Chem. B 2002, 106, 13352–13361. 69 D. HARONIAN, A. LEWIS, Appl. Opt. 1991, 30, 597–608. 70 R. R. BIRGE, R. B. GROSS, M. B. MASTHAY, J. A. STUART, J. R. TALLENT, C. F. ZHANG, Mol. Cryst. Liq. Cryst. Sci. Technol. 1992, B3, 133–148. 71 R. THOMA, N. HAMPP, Opt. Lett. 1994, 19, 1364–1366. 72 R. THOMA, M. DRATZ, N. HAMPP, Opt. Eng. 1995, 34, 1345–1351.
73 M. EISENBACH, C. WEISSMANN, G. TANNY, S. R. CAPLAN, FEBS Lett. 1977, 81, 77–80. 74 D. OESTERHELT, FEBS Lett. 1976, 64, 20–22. 75 OESTERHELT, D. STOECKENIUS, W. Methods Enzymol. 1974, 31, 667–678. 76 K. J. WISE, N. B. GILLESPIE, J. A. STUART, M. P. KREBS, R. R. BIRGE, Trends Biotechnol. 2002, 20, 387–394. 77 D. OESTERHELT, W. STOECKENIUS, Nature New Biol. 1971, 233, 149–152. 78 N. VSEVOLODOV, Biomolecular Electronics, Birkh¨auser, Boston, Basle, Berlin, 1998.
1
Biomolecular Motors Operating in Engineered Environments Stefan Diez, Jonne H. Helenius, and Jonathon Howard Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
Originally published in: Nanobiotechnology. Edited by Christof M. Niemeyer and Chad A. Mirkin. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30658-9
1 Overview
Recent advances in understanding how biomolecular motors work has raised the possibility that they might find applications as nanomachines. For example, they could be used as molecule-sized robots that: Ĺ work in molecular factories where small, but intricate structures are made on tiny assembly lines; Ĺ construct networks of molecular conductors and transistors for use as electrical circuits; Ĺ or that continually patrol inside “adaptive” materials and repair them when necessary.
Thus, biomolecular motors could form the basis of bottom-up approaches for constructing, active structuring and maintenance at the nanometer scale. We will review the current status of the operation of biomolecular motors in engineered environments, and discuss possible strategies aimed at implementing them in nanotechnological applications. We cite reviews whenever possible for the biochemical and biophysical literature, and include primary references to the nanotechnological literature. Biomolecular motors are the active workhorses of cells [1]. They are complexes of two or more proteins that convert chemical energy – usually in the form of the high-energy phosphate bond of ATP – into directed motion. The most familiar motor is the protein myosin which moves along filaments, formed from the protein actin, to drive the contraction of muscle. In fact, all cells – not just specialized
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Biomolecular Motors Operating in Engineered Environments
muscle cells – contain motors that move cellular components such as proteins, mitochondria, and chromosomes from one part of the cell to another. These motors include relatives of muscle myosin (that also move along actin filaments), as well as members of the kinesin and dynein families of proteins. The latter motors move along another type of filament called the microtubule. The reason that motors are necessary in cells is that diffusion is too slow to transport molecules efficiently from where they are made (which typically is near the nucleus) to where they are used (which is often at the periphery of the cell). For example, the passive diffusion of a small protein to the end of a 1 meter-long neuron would take approximately 1000 years, yet kinesin moves it in a week. This corresponds to a speed of 1–2 µm s−1 , which is typical for biomolecular motors [2]. Actin filaments and microtubules form a network of highways within cells, and localized cues are used to target specific cargoes to specific sites in the cell [3]. By using filaments and motors, cells build highly complex and active structures on the molecular (nanometer) scale. Little imagination is needed to envisage employing biomolecular motors to build molecular robots [4]. Biomolecular motors are unusual machines that do what no man-made machines do: they convert chemical energy to mechanical energy directly rather than via an intermediate such as heat or electrical energy. This is essential because the confinement of heat, for example, on the nanometer scale is not possible because of its high diffusivity in aqueous solutions [2]. As energy converters, biomolecular machines are highly efficient. The chemical energy available from the hydrolysis of ATP is 100 × 10−21 J = 100 pN nm−1 (under physiological conditions, where the ATP concentration is 1 mM and the concentrations of the products ADP and phosphate are 0.01 mM and 1 mM, respectively). With this energy, a kinesin molecule is able to perform an 8-nm step against a load of 6 pN [2]. The energy efficiency is therefore almost 50%. For the rotary motor F1 F0 -ATPase synthase which uses the electrochemical gradient across mitochondrial and bacterial membranes to generate ATP, the efficiency is reported to be between 80 and 100% [5, 6]. This high efficiency demonstrates that, like other biological systems, the operation of biological motors has been optimized through evolution. High efficiency is but one feature that makes biomolecular motors attractive for nanotechnological applications. Other features are: 1. They are small and can therefore operate in a highly parallel manner. 2. They are easy to produce and can be modified through genetic engineering. 3. They are extremely cheap. For example, 20 × 109 kinesin motors can be acquired for 1 US cent from commercial suppliers (1 mg = 3.3 × 1015 motors cost $1500; Cytoskeleton, Inc., Colorado, USA) and the price could be significantly decreased if production were scaled up. 4. A wide array of biochemical tools have been developed to manipulate these proteins outside the cell.
1 Overview
This review focuses on two broad categories of molecular motors: Ĺ Linear motors generate force as they move along intracellular filaments. In addition to myosin and kinesin mentioned above, linear motors also include enzymes that move along DNA and RNA. Ĺ Rotary motors generate torque via the rotation of a central core within a larger protein complex. They include ATP synthase, mentioned above, as well as the motor that drives bacterial motility.
Representatives of both categories have been used to manipulate molecules and nanoparticles. Mechanical and structural properties of relevant filaments are contained in Table 1, and those of several associated motors in Table 2. The general set-ups for studying motor proteins outside cells – the so-called motility assays – are depicted in Figure 1. In the gliding assay, the motors are immobilized on a surface and the filaments glide over the assembly (Figure 1A). In the stepping assay, the filaments are laid out on the surface where they form tracks for the motors to move along (Figure 1B). In both assays, movement is observed under the light microscope using fluorescence markers or highcontrast techniques. Variations on these assays have been used to reconstitute linear motility on the four types of filaments – actin filaments, microtubules, DNA, and RNA. The gliding motility assay has provided detailed data on the directionality, speed, and force generation of purified molecular motors [2, 7]. However, for use in nanotechnological applications, the movement of gliding filaments must be controllable in space and time. For example, a simple application would be to employ a moving filament to pick up cargo at point A, move it along a user-defined path to point B, and then release it. A number of methods for the spatial and temporal control of filament movement have been developed. Spatial control has been achieved using topographical features [8–11], chemical surface modifications [10, 12–14], and a combination of both [15–18]. Electrical fields [19–21] and hydrodynamic flow [22, 23] have also been used to direct the motion of gliding filaments. An example from our laboratory of gliding microtubules that are guided by channels is shown in Figure 2. Temporal control has been achieved by manipulating the ATP concentration [9, 24]. In addition to these basic techniques for controlling motion, some simple applications of the gliding assay have been demonstrated. These include the transport of streptavidin-coated beads [9], the transport and stretching of individual DNA molecules [25], the measurement of forces in the pN range [26], and the imaging of surfaces [27]. The stepping assay opens up additional possibilities. Initially, micrometer-sized beads were coated with motor proteins and visualized as they moved along filaments. The movement of beads can be tracked with nanometer precision to determine the speed and step size [2], and the use of optical tweezers allows forces to
3
13 2
25 nm 2 nm
2 nm
RNA
2
2
6 nm
Actin filament Microtubule DNA
Strands per filament
Diameter
Filament
0.34 nm
8 nm 0.34 nm
5.5 nm
Repeat length
75 nm
5 mm 50 nm
10 µm
Persistence length
1.5 GPa
2 GPa 1 GPa
2 GPa
Young’s modulus
30 µm
10 cm 100 mm
100 µm
Maximum length
Kinesin, Dynein RNA polymerase, DNA helicase, topoisomerase Ribosome
Myosin
Motors
80
78 79
77
Reference
where k is the Boltzmann constant and T is absolute temperature. Young’s modulus (E) is calculated assuming that the filament is homogenous and isotropic. The repeat length describes the periodicity along a strand of the filament
Table 1 Physical attributes of actin filaments, microtubules, DNA, and RNA. The persistence length (Lp) is related to the flexural rigidity (EI) by: Lp = EI/kT,
4 Biomolecular Motors Operating in Engineered Environments
The sizes refer to the motor domains.
120◦
8 × 14 45
*
0.34 up to 43 nm/turn 0.34
15
DNA DNA DNA pilus NA NA
0.34
5 36 8
16 24 6 24
Actin Actin Microtubule Microtubule DNA
Myosin II Myosin V Conventional kinesin Dynein T7 DNA polymerase (exonuclease activity) RNA polymerase Topoisomerase Bacteriophage portal motor Type IV pilus retraction motor F1 -ATPase Flagellar motor
Step size [nm]
Filament
Motor
Size* [nm]
5 − 100 bps 1000 8 rps 300 rps
30 000 300 800 6400 >100 bps
Maximum speed [nm s−1 ]
57 pN 110 pN 100 pN nm 550 pN nm
25 pN
10 pN 1.5 pN 6 pN 6 pN 34 pN
Maximum force [pN]
80
NA NA
NA
50 50 50
Efficiency [%]
Table 1. The sizes refer to the motor domains. Dynamic parameters were determined by in-vitro experiments at high ATP concentration
87, 88 89, 90 91 92, 93 6 94
2, 81 82 2, 83 84, 85 86
Reference(s)
Table 2 Values characterizing the operation of several important biomolecular motors. The filaments along which the linear motors operate are indicated in
1 Overview 5
6 Biomolecular Motors Operating in Engineered Environments
Fig. 1 Biomolecular motor systems currently applicable for nanotechnological developments. (A) Linear transport of filaments by surface bound motor molecules (gliding assay). (B) Linear movement of motor proteins along filaments (stepping assay). (C) Rotation generated by a rotary motor.
1 Overview
Fig. 2 (A) Directed movement of gliding microtubules along microstructured polyurethane channels on the surface of a coverslip. The initial positions of the microtubules are shown in orange, while the paths they traveled over the subsequent 12 s are shown in green. (B) Scanning electron microscopy image of the polyurethane channels. The channels are a replica mold of a
Si-master (channel width 500 nm, periodicity 1000 nm, depth 300 nm) produced using a poly(dimethylsiloxane) (PDMS) stamp as an intermediate. Note, that the ridges have been “undercut”. This probably aids the guiding of the microtubules in the channels. (Silicon master provided by T. Pompe, Institute of Polymer Research, Dresden, Germany.)
be measured [28]. In addition to beads, 10 (µm-diameter glass particles [29] and Simicrochips [30] have been transported and membrane tubes have been pulled [32]a along filaments. In another variation, high-sensitivity fluorescence microscopy is used to visualize individual motor molecules as they step along filaments [31, 32]. An example from our laboratory of a single kinesin motor fused to the green fluorescent protein moving along a microtubule is shown in Figure 3. Despite the power of single-molecule techniques, they have yet to be exploited for nanotechnological applications. Rotary motors can be studied in vitro by fixing the stator to a surface and following the movement of the rotor (see Figure 1C). Rotation can be visualized under the light microscope by attaching a fluorescent label or a microscopic marker to the rotor. Both techniques have been used to investigate the stepwise rotation generated by F1 -ATPase, which is a component of the F1 F0 -ATP synthesis machinery [5, 33]. Individual motors have been integrated into nanoengineered environments by arraying them on a nanostructured surface and using them to rotate fluorescent microspheres [34] or to drive Ni-nanopropellers [6].
7
8 Biomolecular Motors Operating in Engineered Environments
Fig. 3 Movement of a single kinesin molecule (labeled with the green fluorescent protein) along a microtubule (red). Micrographs were acquired at the indicated times using total-internal-reflection fluorescence microscopy.
2 Methods
There are many challenges in applying biomolecular motors to nanotechnology. Motility must be robust, it must be controlled both spatially and temporally, and the motors must be hitched to and unhitched from their cargoes. This section summarizes key techniques towards these ends. 2.1 General Conditions for Motility Assays
Motility assays are performed in aqueous solutions that must fulfill a number of requirements. We will illustrate these requirements with the kinesin/microtubule system. Kinesin uses ATP as its fuel; the maximum speed is reached at ∼0.5 mM, approximately equal to the cellular concentration. Other nucleotides such as GTP, TTP, and CTP can substitute for ATP, but the speed is lower [35]. Motility also
2 Methods 9
requires divalent cations, with magnesium preferred over calcium, and strontium and barium unable to substitute [36]. Optimal motility, assessed by gliding speed, occurs over a range of pH, between 6 and 9 [35, 37], and over a range of ionic strengths, between 50 mM and 300 mM [37]. The speed increases with temperature, doubling for each 10◦ C between 5◦ C and 50◦ C [24, 38]; motility fails at higher temperatures. The force is independent of temperature between 15◦ C and 35◦ C [39]. When assays are performed in the middle of these ranges, motility is robust and only a small drop in the mean velocities is seen after 3 hours [24, 37]. If fluorescent markers are used, then an oxygen-scavenging enzyme system must be present in order to prevent photodamage. Many experimental details, including a discussion of the densities of the motors, can be found in Ref. [7].
2.2 Temporal Control
Motors can be reversibly switched off and on by regulating the concentration of fuel, or by adding and removing inhibitors. The ATP concentration can be rapidly altered by flowing in a new solution. In such a setup, the kinesin-dependent movement of microtubules can be stopped within 1 s and restarted within 10 s (unpublished data from our laboratory). Similarly, inhibitors such as AMP–PNP (a non-hydrolyzable analogue of ATP [40]), adocia-sulfate-2 (a small molecule isolated from sponge [41]) and monastrol [42] can be perfused to stop motility. An alternative method to control energy supply is to use photoactivatable ATP. In this method, a flash of UV light is used to release ATP from a derivatized, nonfunctional precursor; an ATP-consuming enzyme is also present to return the ATP concentration to low levels following release. Using such a system, microtubule movement has been repeatedly started and stopped [9], though the start-up and slow-down times were slow, on the order of minutes. The advantage of this method is that the solution in the flow cell does not have to be exchanged. Fortuitously, many proteins possess natural regulatory mechanisms and, once understood, these might offer additional means to regulate the motors in vitro. Examples include the regulation of myosins by phosphorylation and calcium/calmodulin [43] and the inhibition of kinesin by its cargo-binding “tail” domain [44]. Because such natural controls might not always be applicable in a synthetic environment, there is strong interest in the development of artificial control mechanisms for motor proteins. Towards this end, metalion binding sites have been genetically engineered into the F1 -ATPase motor. The binding of ions at the engineered site immobilizes the moving parts of the motor, thus inhibiting its rotation [45]. ATP-driven rotation can be restored by the addition of metal ion chelators. Clever genetic engineering of motors could provide temporal control mechanisms that may be switched by temperature, light, electrical fields, or buffer composition.
10 Biomolecular Motors Operating in Engineered Environments
2.3 Spatial Control
In order to control the path along which filaments glide – a process that we call “guiding” – it is necessary to restrict the location of active motors to specific regions of a surface. This can be done by coating a glass or silicon surface with resist polymers such as PMMA, SU–8, or SAL601 and using UV, electron beam or soft lithography to remove resist from defined regions [12–19]. The motor-containing solution is then perfused across the surface. By choosing appropriate properties of this solution [e. g., the concentration of motors, salts, other blocking proteins such as casein and bovine serum albumin (BSA), and detergents such as Triton X-100], motility can be restricted to either the unexposed, resist surface or to the exposed, underlying substrate. For example, it has been found that myosin motility is primarily restricted to the more hydrophobic resist surfaces while kinesin motility is primarily restricted to the more hydrophilic non-resist surfaces. However, the detailed interactions of the motors with these surfaces are not well understood. One limitation of this approach to binding proteins to surfaces is that the motors tend to bind everywhere, so it is difficult to attain good contrast. A proven method to prevent motor binding is to coat a surface with polyethylene oxide (PEO) [10, 46]. Techniques to bind motors and filaments via affinity tags to surfaces are summarized in section 2.4. While chemical patterning can restrict movement of filaments to areas with a high density of active motors, walking off the trails is not prevented. This was demonstrated by Hess et al. [10], who showed that microtubules move straight across a boundary between high motor density (non-PEO) and low motor density (PEO), where they dissociate from the surface. The problem with a purely chemical pattern is that if a rigid filament is propelled by several motors along its length, there is nothing to stop the motors at the rear from pushing the filament across a boundary into an area of low motor density. The behavior of microtubules colliding with the walls of channels imprinted in polyurethane has been studied by Clemmens et al. [11]. They found that the probability of a filament being guided by the walls decreased as the approach angle increased. At high incident angles, guiding was not observed and instead the microtubules climbed the walls. Combining chemical and topographic features – as occurs in the lithographic studies described above – leads to more efficient guiding. For example, in the study of Moorjani et al. [18], filaments remained at the bottom of the channels formed in the SU–8 even when they collided with the walls at angles above 80◦ . When the leading end of the microtubule hits the wall, the motors at the rear force the microtubule to bend into the region of high motor density, and in this way the motion is guided by the boundary (see Figure 4, unpublished results from our laboratory). While it is possible to use chemical and topographical patterning to guide filaments – that is, to restrict their movement to particular paths – it is more difficult to control the direction of movement along the path. The difficulty arises because the orientation in which motors bind to a uniform surface is not controlled. Some motors will be oriented so that they propel filaments in one direction along the path, whereas others will propel filaments in the opposite direction. The reason
2 Methods
Fig. 4 Sequence of fluorescent images showing the kinesin-driven, unidirectional movement of a rhodamine-labeled microtubule (red) along chemically and topographically structured Si-chip. The bottom of the channels (green), the depths of which
are 300 nm, is coated with kinesin. The surrounding regions are blocked by polyethylene glycol. (Research in collaboration with R. M.M. Smeets, M.G.L. van den Heuvel, and C. Dekker, Delft University of Technology, The Netherlands.)
that motors do not counteract each other is that filaments are polar structures: the orientation of the proteins that form up the filaments is maintained all along the length of the filament (see Figure 1). Because the motors bind stereospecifically to the filament, they will exert force in only one direction. Thus, the orientation of the filament determines its direction of motion; one end always leads. The direction of filament gliding can be controlled by the application of external forces. Actin filaments and microtubules both possess negative net charges and, consequently, in the presence of a uniform electric field, will experience a force directed towards the positive electrode. It is possible to apply high enough electric fields to steer motor-driven filaments in a specified direction [19, 21]. Because the refractive index of protein differs from that of water, filaments become electrically polarized in the presence of an electric field, and consequently in a nonuniform field they move in the direction of highest field strength. This so-called dielectrophoretic force has been used to direct the gliding of actin filaments on a myosin-coated substrate [20]. It is even possible to manipulate a microtubule using optical gradients produced by focusing a laser beam (i. e., an optical tweezers) [47]. Directional control of microtubule gliding has also been achieved using hydrodynamic flow fields [23, 29]. An alternative approach to directionality relies on more sophisticated guiding concepts. For example, unidirectional movement of filaments can be achieved if
11
12 Biomolecular Motors Operating in Engineered Environments
guiding geometries based on arrow and ratchet structures are employed [10, 15]. An example of the unidirectional movement of a microtubule on a topographically and chemically structured silicon chip is depicted in Figure 4 To control the direction of motion in stepping assays, the orientation of the filaments on the surface must be controlled. Towards this end, the generation of isopolar filament arrays has been achieved by binding specific filament ends to a surface, and using hydrodynamic flow to align the filaments along the surface to which they are subsequently adhered to [30, 48–50]. Alternatively, moving filaments can be aligned in a particular orientation by a flow field prior to fixation by glutaraldehyde [23, 29], which has been shown not to interfere with kinesin motility [51]. Fluid flow has also been used to align microtubules binding to patterned silane surfaces, though the orientation of the microtubules was not controlled [52]. 2.4 Connecting to Cargoes and Surfaces
Cargoes can be attached to filaments using several different approaches. The prospective cargo can be coated with an antibody to the filament [53] or to a filament-binding protein such as gelsolin [54]. A clever refinement of this technique is genetically to fuse gelsolin with a cargo protein, thereby generating a dual-functional protein [55]. Alternatively, the cargo can be coated with streptavidin which binds to filaments that have been derivatized with biotin [56]. There are many other possibilities which have not yet been realized. Analogous methods can be used to couple motors to surfaces. For example, the motor can be fused with the bacterial biotin-binding protein [57] and in this way bound to streptavidin-coated cargoes or surfaces. There any many peptide tags that can be fused to proteins to aid their purification [58, 59]. These tags can be used to couple these proteins to surfaces coated with the complementary ligand. A popular tag is the hexahistidine tag which binds Ni2+ and other metals that are chelated to nitriloamines (NTA). A nice approach is to couple the NTA to the terminal ethyleneoxides of triblock copolymers containing PEO. In principle, this provides specific binding of a histagged motor (or another protein) to a surface while the PEO groups block nonspecific binding [46, 60]. Controlled unloading of cargo has not been demonstrated, but ought to be feasible. For example, there are biotins that can be irreversibly cleaved by light and reversibly cleaved by reducing agents, and the histidine-Ni2+ -NTA connection can be broken by sequestering the Ni2+ with EDTA.
3 Conclusions
Although the first steps have been made towards the operation of biomolecular motors in engineered environments, many advances are necessary before these
3 Conclusions
motors can be used in nanotechnological applications such as working in molecular factories and building circuits. An immediate task is to improve the spatial and temporal control over the motors. By combining improved surface techniques with the application of external electric, magnetic, and/or optical fields it should be possible, in the near future, to stretch and collide single molecules, to control cargo loading and unloading, and to sort and pool molecules. Another goal is to control the position and orientation of motors with molecular precision. This means placing motors with an accuracy of ∼10 nm on a surface and controlling their orientation within a few degrees. In this way both the location and the direction of motion of filaments can be controlled. One approach to molecular patterning is to “decorate” filaments with stereospecifically bound motors. Once aligned along the filament matrix, the motors can be transferred to another surface. This approach was taken by Spudich et al. [48, 61] and should be followed up. A further development of this idea is to directly produce (perhaps by stamping a mold made with a filament) surfaces that have structures functionally similar to motor-binding sites. An alternative approach is to use dippen lithography or other AFM techniques [62] to directly pattern motors on surfaces. The robustness of motors must be increased. Motors operate only in aqueous solutions and under a restricted range of solute concentrations and temperatures. While it is inconceivable that protein-based motors could operate in a nonaqueous environment, two approaches to increasing their robustness can be envisaged. First, motors could be purified from thermophilic or halophilic bacteria, some of which grow at temperatures up to 112◦ C and salinities above 5 M. There are also extreme eukaryotes that grow at up to 62◦ C or 5 M NaCl. This approach has already been taken for ATP synthase [63], but not with linear motors because no obvious homologues of myosins or kinesins have been found in bacteria. Second, a genetic screening approach might reveal mutations that allow motors to operate in less restrictive or different conditions. A longer-term goal is to use the design principles learnt from the study of biomolecular motors to build purely artificial nanomotors that can operate in air or vacuum. This is a daunting prospect however, and it is not even clear what fuel(s) might be used. A potential way forward is to use chemical energy from a surface: for example, it was demonstrated that tin particles slide across copper surfaces driven by the formation of bronze alloy [64], this being analogous to paraffin-driven toy boats. Besides the motor systems discussed so far, other biomechanical assemblies are good candidates for nanotechnological applications. In addition to providing paths along which motors move, active biological filaments on their own might find use in nanotechnological applications. The pushing and pulling forces generated by the polymerization and depolymerization of actin filaments and microtubules provide an alternative method of moving molecules [2, 65]. This ability is of particular interest because bacteria possess actin- [66] and microtubule-like [67] filaments and, as mentioned above, the proteins of extremophilic bacteria function in extreme environmental conditions. Filaments and motors can also self-organize under certain conditions [68–71]. On a side note, the flagellar filament in conjunction
13
14 Biomolecular Motors Operating in Engineered Environments
with the flagellar motors allow the bacteria to move in three-dimensional liquid space [72]. In addition to the motors that we have described so far, cells contain numerous biomolecular machines that can also be thought of as motors (for example, see Ref. [3]). These machines use chemical energy to replicate DNA (DNA polymerases) and process it (recombinases, topoisomerases and endonucleases), to produce RNA (RNA polymerases) and splice it (spliceosomes), to make proteins (ribosomes) and fold them (chaperones) and move them across membranes (translocases), and finally destroy them (proteasomes). The energy is provided by another group of machines that generate the electrochemical gradients (electron transport system, bacteriorhodopsin) used by the F1 F0 -ATP synthase to make ATP or by flagellar motors to propel bacteria. All these machines are candidates for nanotechnological applications, and a recent report of the use of chaperones to maintain nanoparticles in solution [73] is a step in this general direction. We finish up by pointing out that the high order and nanometer-scale periodicity of DNA, actin filaments and microtubules make them ideal scaffolds on which to erect three-dimensional nanostructures. While these features have been exploited to make DNA-based structures [74], the use of DNA motors to address specific sites (based on nucleotide sequence) has not, to our knowledge, been realized. Some years ago it was proposed that the regular lattice of microtubules might serve as substrates for molecular computing and information storage [75, 76]. While these ideas seem crazy in the context of the living organism, they may be realizable for biomolecular motors operating in engineered environments. At the moment, anything is possible!
Acknowledgements
The authors thank U. Queitsch for help with experiments on guiding microtubules, T. Pompe, R. M.M. Smeets, M.G.L. van den Heuvel, and C. Dekker for fabricating microstructured channels, and F. Friedrich for assistance with the illustrations.
References 1 B. ALBERTS, Cell 1998, 92, 291. 2 J. HOWARD, Mechanics of Motor Proteins and the Cytoskeleton, Sinauer Associates, Sunderland, MA, 2001. 3 B. ALBERTS, Molecular biology of the cell, 4th edition, Garland Science, New York, 2002. 4 M. CRICHTON, Prey, HarperCollins, Toronto, 2002. 5 K. KINOSITA, Jr., R. YASUDA, H. NOJI, S. ISHIWATA, M. YOSHIDA, Cell 1998, 93, 21.
6 R. K. SOONG, G. D. BACHAND, H. P. NEVES, A. G. OLKHOVETS, H. G. CRAIGHEAD, C. D. MONTEMAGNO, Science 2000, 290, 1555. 7 J. M. SCHOLEY, Motility assays for motor proteins, Academic Press, San Diego, 1993. 8 J. R. DENNIS, J. HOWARD, V. VOGEL, Nanotechnology 1999, 10, 232. 9 H. HESS, J. CLEMMENS, D. QIN, J. HOWARD, V. VOGEL, Nano Lett. 2001, 1, 235.
References 10 H. HESS, J. CLEMMENS, C. M. MATZKE, G. D. BACHAND, B. C. BUNKER, V. VOGEL, Appl. Physics A (Materials Science Processing) 2002, A75, 309. 11 J. CLEMMENS, H. HESS, J. HOWARD, V. VOGEL, Langmuir 2003, 19, 1738. 12 H. SUZUKI, A. YAMADA, K. OIWA, H. NAKAYAMA, S. MASHIKO, Biophys. J. 1997, 72, 1997. 13 D. V. NICOLAU, H. SUZUKI, S. MASHIKO, T. TAGUCHI, S. YOSHIKAWA, Biophys. J. 1999, 77, 1126. 14 J. WRIGHT, D. PHAM, C. MAHANIVONG, D. V. NICOLAU, M. KEKIC, C. G. DOS REMEDIOS, Biomed. Microdevices 2002, 4, 205. 15 Y. HIRATSUKA, T. TADA, K. OIWA, T. KANAYAMA, T. Q. P. UYEDA, Biophys. J. 2001, 81, 1555. 16 C. MAHANIVONG, J. P. WRIGHT, M. KEKIC, D. K. PHAM, C. DOS REMEDIOS, D. V. NICOLAU, Biomed. Microdevices 2002, 4, 111. 17 R. BUNK, J. KLINTH, L. MONTELIUS, I. A. NICHOLLS, P. OMLING, S. TAGERUD, A. MANSSON, Biochem. Biophys. Res. Commun. 2003, 301, 783. 18 S. G. MOORJANI, L. JIA, T. N. JACKSON, W. O. HANCOCK, Nano Lett. 2003, 3, 633. 19 D. RIVELINE, A. OTT, F. JULICHER, D. A. WINKELMANN, O. CARDOSO, J. J. LACAPERE, S. MAGNUSDOTTIR, J. L. VIOVY, L. GORRE-TALINI, J. PROST, Eur. Biophys. J. Biophys. Lett. 1998, 27, 403. 20 S. B. ASOKAN, L. JAWERTH, R. L. CARROLL, R. E. CHENEY, S. WASHBURN, R. R. SUPERFINE, Nano Lett. 2003, 3, 431. 21 R. STRACKE, K. J. BOHM, L. WOLLWEBER, J. A. TUSZYNSKI, E. UNGER, Biochem. Biophys. Res. Commun. 2002, 293, 602. 22 P. STRACKE, K. J. BOHM, J. BURGOLD, H. J. SCHACHT, E. UNGER, IOP Publishing. Nanotechnology 2000, 11, 52. 23 I. PROTS, R. STRACKE, E. UNGER, K. J. BOHM, Cell Biol. Int. 2003, 27, 251. 24 K. J. BOHM, R. STRACKE, M. BAUM, M. ZIEREN, E. UNGER, FEBS Lett. 2000, 466, 59. 25 S. DIEZ, C. REUTHER, C. DINN, R. SEIDEL, M. MERTIG, W. POMPE, J. HOWARD, Nano Lett. 2003, 3, 1251–1254. 26 H. HESS, J. HOWARD, V. VOGEL, Nano Lett. 2002, 2, 1113.
27 H. HESS, J. CLEMMENS, J. HOWARD, V. VOGEL, Nano Lett. 2002, 2, 113. 28 A. D. MEHTA, M. RIEF, J. A. SPUDICH, D. A. SMITH, R. M. SIMMONS, Science 1999, 283, 1689. 29 K. J. BOHM, R. STRACKE, P. MUHLIG, E. UNGER, IOP Publishing. Nanotechnology 2001, 12, 238. 30 L. LIMBERIS, R. J. STEWART, Proceedings of Spie: the International Society for Optical Engineering, SPIE Int. Soc. Opt. Eng. 1998, 3515, 66. 31 R. D. VALE, T. FUNATSU, D. W. PIERCE, L. ROMBERG, Y. HARADA, T. YANAGIDA, Nature 1996, 380, 451. 32 A. YILDIZ, J. N. FORKEY, S. A. MCKINNEY, T. HA, Y. E. GOLDMAN, P. R. SELVIN, Science 2003, 300, 2061. [a] A. ROUX, G. CUPPELLO, J. CARTAUD, J. PROST, B. GOUD, P. BASSEREAU, Proc. Natl. Acad. Sci. USA, 2002, 99, 5394–5399. 33 H. NOJI, R. YASUDA, M. YOSHIDA, K. KINOSITA, Jr., Nature 1997, 386, 299. 34 C. MONTEMAGNO, G. BACHAND, Nanotechnology 1999, 10, 225. 35 S. A. COHN, A. L. INGOLD, J. M. SCHOLEY, J. Biol. Chem. 1989, 264, 4290. 36 K. J. BOHM, P. STEINMETZER, A. DANIEL, M. BAUM, W. VATER, E. UNGER, Cell. Motil. Cytoskeleton 1997, 37, 226. 37 K. J. BOHM, R. STRACKE, E. UNGER, Cell Biol. Int. 2000, 24, 335. 38 K. KAWAGUCHI, S. ISHIWATA, Cell. Motil. Cytoskeleton 2001, 49, 41. 39 K. KAWAGUCHI, S. ISHIWATA, Biochem. Biophys. Res. Commun. 2000, 272, 895. 40 B. J. SCHNAPP, B. CRISE, M. P. SHEETZ, T. S. REESE, S. KHAN, Proc. Natl. Acad. Sci. USA 1990, 87, 10053. 41 R. SAKOWICZ, M. S. BERDELIS, K. RAY, C. L. BLACKBURN, C. HOPMANN, D. J. FAULKNER, L. S. GOLDSTEIN, Science 1998, 280, 292. 42 T. U. MAYER, T. M. KAPOOR, S. J. HAGGARTY, R. W. KING, S. L. SCHREIBER, T. J. MITCHISON, Science 1999, 286, 971. 43 J. R. SELLERS, H. V. GOODSON, Protein Profile 1995, 2, 1323. 44 D. L. COY, W. O. HANCOCK, M. WAGENBACH, J. HOWARD, Nature Cell Biol. 1999, 1, 288. 45 H. LIU, J. J. SCHMIDT, G. D. BACHAND, S. S. RIZK, L. L. LOOGER, H. W. HELLINGA, C. D. MONTEMAGNO, Nature Mater 2002, 1, 173.
15
16 Biomolecular Motors Operating in Engineered Environments
46 M. J. DECASTRO, C. H. HO, R. J. STEWART, Biochemistry 1999, 38, 5076. 47 H. FELGNER, R. FRANK, J. BIERNAT, E. M. MANDELKOW, E. MANDELKOW, B. LUDIN, A. MATUS, M. SCHLIWA, J. Cell Biol. 1997, 138, 1067. 48 J. A. SPUDICH, S. J. KRON, M. P. SHEETZ, Nature 1985, 315, 584. 49 L. LIMBERIS, J. J. MAGDA, R. J. STEWART, Nano Lett. 2001, 1, 277. 50 L. LIMBERIS, R. J. STEWART, IOP Publishing. Nanotechnology 2000, 11, 47. 51 T. B. BROWN, W. O. HANCOCK, Nano Lett. 2002, 2, 1131. 52 D. TURNER, C. Y. CHANG, K. FANG, P. CUOMO, D. MURPHY, Anal. Biochem. 1996, 242, 20. 53 D. C. TURNER, C. Y. CHANG, K. FANG, S. L. BRANDOW, D. B. MURPHY, Biophys. J. 1995, 69, 2782. 54 Y. WADA, T. HAMASAKI, P. SATIR, Mol. Biol. Cell 2000, 11, 161. 55 C. VEIGEL, L. M. COLUCCIO, J. D. JONTES, J. C. SPARROW, R. A. MILLIGAN, J. E. MOLLOY, Nature 1999, 398, 530. 56 J. YAJIMA, M. C. ALONSO, R. A. CROSS, Y. Y. TOYOSHIMA, Curr. Biol. 2002, 12, 301. 57 F. GITTES, E. MEYHOFER, S. BAEK, J. HOWARD, Biophys. J. 1996, 70, 418. 58 E. BERLINER, H. K. MAHTANI, S. KARKI, L. F. CHU, J. E. CRONAN, Jr., J. GELLES, J. Biol. Chem. 1994, 269, 8610. 59 J. W. JARVIK, C. A. TELMER, Annu. Rev. Genet. 1998, 32, 601. 60 K. TERPE, Appl. Microbiol. Biotechnol. 2003, 60, 523. 61 J. P. BEARINGER, S. TERRETTAZ, R. MICHEL, N. TIRELLI, H. VOGEL, M. TEXTOR, J. A. HUBBELL, Nature Mater. 2003, 2, 259. 62 F. JIANG, K. KHAIRY, K. POOLE, J. HOWARD, D. MULLER, submitted 2003. 63 A. HAZARD, C. MONTEMAGNO, Arch. Biochem. Biophys. 2002, 407, 117. 64 A. K. SCHMID, N. C. BARTELT, R. Q. HWANG, Science 2000, 290, 1561. 65 M. DOGTEROM, B. YURKE, Science 1997, 278, 856. 66 F. van den ENT, J. MOLLER-JENSEN, L. A. AMOS, K. GERDES, J. LOWE, EMBO J. 2002, 21, 6935. 67 J. LOWE, L. A. AMOS, Nature 1998, 391, 203.
68 F. J. NEDELEC, T. SURREY, A. C. MAGGS, S. LEIBLER, Nature 1997, 389, 305. 69 T. SURREY, M. B. ELOWITZ, P. E. WOLF, F. YANG, F. NEDELEC, K. SHOKAT, S. LEIBLER, Proc. Natl. Acad. Sci. USA 1998, 95, 4293. 70 K. KRUSE, F. JULICHER, Phys. Rev. Lett. 2000, 85, 1778. 71 D. HUMPHREY, C. DUGGAN, D. SAHA, D. SMITH, J. KAS, Nature 2002, 416, 413. 72 W. S. RYU, R. M. BERRY, H. C. BERG, Nature 2000, 403, 444. 73 D. ISHII, K. KINBARA, Y. ISHIDA, N. ISHII, M. OKOCHI, M. YOHDA, T. AIDA, Nature 2003, 423, 628. 74 N. C. SEEMAN, Nature 2003, 421, 427. 75 S. R. HAMEROFF, R. C. WATT, J. Theoret. Biol. 1982, 98, 549. 76 R. PENROSE, Ann. N. Y. Acad. Sci. 2001, 929, 105. 77 P. SHETERLINE, J. CLAYTON, J. SPARROW, Protein Profile 1995, 2, 1. 78 E. NOGALES, Annu. Rev. Biochem. 2000, 69, 277. 79 C. BUSTAMANTE, Z. BRYANT, S. B. SMITH, Nature 2003, 421, 423. 80 P. J. HAGERMAN, Annu. Rev. Biophys. Biomol. Struct. 1997, 26, 139. 81 C. RUEGG, C. VEIGEL, J. E. MOLLOY, S. SCHMITZ, J. C. SPARROW, R. H. FINK, News Physiol. Sci. 2002, 17, 213. 82 A. MEHTA, J. Cell Sci. 2001, 114, 1981. 83 F. J. KULL, Essays Biochem. 2000, 35, 61. 84 C. SHINGYOJI, H. HIGUCHI, M. YOSHIMURA, E. KATAYAMA, T. YANAGIDA, Nature 1998, 393, 711. References 199 85 S. A. BURGESS, M. L. WALKER, H. SAKAKIBARA, P. J. KNIGHT, K. OIWA, Nature 2003, 421, 715. 86 G. J. WUITE, S. B. SMITH, M. YOUNG, D. KELLER, C. BUSTAMANTE, Nature 2000, 404, 103. 87 M. D. WANG, M. J. SCHNITZER, H. YIN, R. LANDICK, J. GELLES, S. M. BLOCK, Science 1998, 282, 902. 88 N. R. FORDE, D. IZHAKY, G. R. WOODCOCK, G. J. WUITE, C. BUSTAMANTE, Proc. Natl. Acad. Sci. USA 2002, 99, 11682. 89 J. J. CHAMPOUX, Annu. Rev. Biochem. 2001, 70, 369. 90 T. R. STRICK, G. CHARVIN, N. H. DEKKER, J. F. ALLEMAND, D. BENSIMON, V. CROQUETTE, Comptes Rendus Physique 2002, 3, 595.
References 91 D. E. SMITH, S. J. TANS, S. B. SMITH, S. GRIMES, D. L. ANDERSON, C. BUSTAMANTE, Nature 2001, 413, 748. 92 A. J. MERZ, M. SO, M. P. SHEETZ, Nature 2000, 407, 98.
93 B. MAIER, L. POTTER, M. SO, H. S. SEIFERT, M. P. SHEETZ, Proc. Natl. Acad. Sci. USA 2002, 99, 16012. 94 D. J. DEROSIER, Cell 1998, 93, 17.
17
1
Biocatalysis in Non-conventional Media Andreas Sebastian Bommarius
Georgia Institute of Technology, Atlanta, USA
Bettina R. Riebel Emory University School of Medicine, Atlanta, USA
Originally published in: Biocatalysis. Andreas Sebastian Bommarius and Bettina R. Riebel. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30344-1
Introduction
Contrary to expectations that enzymes are only active in aqueous solution, activity in almost anhydrous organic solvents was already demonstrated in the 1930s and rediscovered in 1977. It was not water-miscible hydrophilic solvents such as methanol or acetone that proved to be the best reaction media, but hydrophobic water-immiscible solvents such as toluene or cyclohexane. Supposedly, the cause is the partitioning of water between the enzyme surface and the bulk phase of the organic solvent. As comparably hydrophilic solvents such as methanol or acetone can take up basically infinite amounts of water, they strip the remaining water molecules off the enzyme surface. As a consequence, the enzyme is no longer active because it requires a small but measurable amount of water for developing its activity. The advantages of organic media for enzyme reactions are: (i) an enhancement of the solubility of reactants; (ii) a shift of equilibria in organic media; (iii) easier separation of organic solvents than water; (iv) enhanced stability of enzymes in organic solvents; and (v) altered selectivity, including substrate specificity, enantioselectivity, prochiral selectivity, regioselectivity, and chemoselectivity, of enzymes in organic solvents. In water (with no organic phase present) the yield of the ester is about 0.01%, whereas in a biphasic water–water-immiscible organic solvent system consisting of porous glass impregnated with aqueous buffer solution it is practically 100%. Upon dehydration, many enzymes become extremely thermostable, up to and beyond 100◦ C. Most categories of enzymes have been found to be active in organic solvents, not just lipases but also proteases, dehydrogenases, peroxidases, and several others. Enzymatic reactions in organic solvents follow saturation kinetics; mechanisms Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Biocatalysis in Non-conventional Media
and active sites have been found to be the same as in aqueous solution. Kinetic constants, however, do not correlate with a few simple solvent parameters such as hydrophobicity, dielectric constant, or dipole moment; instead, case-by-case correlations are found. While enzymes in organic solvents often display activities that are orders of magnitude of lower than in water, there are several activation mechanisms which each yield about an order of magnitude of improvement: (i) sufficient hydration in organic solvent (between 1% v/v and saturation); (ii) lyophilization at the pH of maximum activity in water; (iii) addition of lyoprotectants such as polyols; (iv) addition of hydrophobic binding pocket protectors such as phenols or anilines; and (v) lyophilization in strong salts such as KCl (if present at > 90% w/w). Whereas “protein engineering” covers knowledge of improving the biocatalyst, the complementary “medium engineering” optimizes the reaction medium, resulting in improved prochiral, regio-, and enantioselectivity. Many such changes have been observed, demonstrating that not only the catalyst but also the medium exerts a decisive influence on the outcome of an enzymatic reaction.
1 Enzymes in Organic Solvents
Conventional wisdom says that enzymes are active in water and in water only, just as in nature; according to these ideas, enzymes immediately deactivate in organic solvents. This picture is wrong for several reasons: Ĺ in nature, enzymes are not active in water but in the proximity of cell membranes, so a double layer of lipid molecules or a micelle would be a more suitable environment to emulate than an aqueous bulk phase; Ĺ enzymes do not necessarily deactivate in organic solvents – other phenomena such as inaccessibility owing to insolubility or denaturation during lyophilization can account for lack of activity.
Despite these arguments and classic papers demonstrating activity of enzymes in partially or wholly organic solvents [1], results from the Lomonosov University in Moscow, Russia, at the end of the 1970s came somewhat as a surprise: Klibanov et al. demonstrated esterification of N-acetyl-L-tryptophan with ethanol in chloroform [2]. As porous glass was impregnated with aqueous buffer solution, Klibanov et al. termed this system a biphasic water–water-immiscible organic solvent system. They considered the system to consist of two phases, the organic phase and the aqueous phase on the glass bead, noted the shift of equilibria for water-forming reactions in nearly anhydrous media and provided equations modeling the situation. In water (no organic phase) the yield of the ester is about 0.01%, whereas in the biphasic system it is practically 100%. However, it was not water-miscible hydrophilic solvents such as methanol or acetone that proved to be the best reaction media but, on the contrary, hydrophobic
2 Evidence for the Perceived Advantages of Biocatalysts in Organic Media 3
water-immiscible solvents such as toluene or cyclohexane. Supposedly, the cause is the partitioning of water between the enzyme surface and the bulk phase of the organic solvent. As comparably hydrophilic solvents such as methanol or acetone can take up basically infinite amounts of water, they strip the remaining water molecules off the enzyme surface. As a consequence, the enzyme is no longer active because it requires a small but measurable amount of water for developing its activity. What are the advantages of organic solvents as media for biocatalytic reactions? 1. The main advantage is an enhanced solubility of many, mostly hydrophobic, reactants insoluble or only sparingly soluble in water. Even for reactants with a moderate solubility in water, that solubility can often limit the attainable reaction rate. 2. In nearly anhydrous media, equilibria of hydrolytic reactions can be shifted towards condensation. In this fashion, reactions are possible in organic media whose equilibrium constants favor the reactants to such a degree that they alone are present in water. 3. As the enzyme in most cases is not dissolved in hydrophobic media but only suspended, it can be separated relatively easily after the reaction is finished. As the enzyme is not immobilized, connected disadvantages such as low activity yields or mass transfer limitations are avoided. 4. Enzymes are sometimes more stable in anhydrous organic solvents than in water. Lysozyme in water at 100◦ C was demonstrated to be 50% deactivated after 30 s (pH 8) or after 100 min (pH 4), but after 140 h in cyclohexane or even after 200 h as a dry powder [3]. 5. Owing to the influence of the medium, enzymes should feature different substrate specificities and selectivities than water. This would allow for tuning of these parameters under the control of the operator. 6. Organic solvents can be recycled more easily than water with its known high enthalpy of vaporization.
2 Evidence for the Perceived Advantages of Biocatalysts in Organic Media 2.1 Advantage 1: Enhancement of Solubility of Reactants
The rates of asymmetric sulfoxidation of thioanisole in nearly anhydrous (99.7%) isopropyl alcohol and methanol catalyzed by horseradish peroxidase (HRP) were determined to be tens to hundreds of times faster than in water under otherwise identical conditions [4]. Similar effects were observed with other hemoproteins. This dramatic activation is due to a much higher substrate solubility in organic solvents than in water and occurs even though the intrinsic reactivity of HRP in isopropyl alcohol and in methanol is hundreds of times lower than in water.
4 Biocatalysis in Non-conventional Media
In addition, the rates of spontaneous oxidation of the model prochiral substrate thioanisole in several organic solvents was observed to be some 100- to 1000-fold slower than in water. This renders peroxidase-catalyzed asymmetric sulfoxidations synthetically attractive. 2.2 Advantage 2: Shift of Equilibria in Organic Media 2.2.1 Biphasic Reactors Multi-phase reactors are of interest in biocatalytic reactions if one or several components of the reaction are insoluble or insufficiently soluble in aqueous phases but if an aqueous phase has to be kept, if only for the biocatalyst. However, a two-phase system can be utilized advantageously for the shifting of an equilibrium; this is demonstrated below. We analyze the simple reaction A ⇐⇒ B in an organic-aqueous two-phase system with the assumption that reactant A and product B partition between the two phases. The partition coefficients Pw and Porg are defined by Eq. (1).
PX = [X]org /[X]w
(1)
In both phases, equilibrium exists between A and B [Eq. (2)]. K w = [B]w /[A]w and K org = [B]org /[A]org
(2)
A biphasic equilibirum constant also can be defined as in Eq. (3). K biphas = [B]tot /[A]tot
(3)
[X]tot can be linked to [X]org and [X]w via a mass balance [Eq. (4)]. [X]tot Vtot = [X]w Vw + [X]org Vorg
(4)
In addition, the sum V tot = V w + V org is valid, so insertion of Eq. (4) into Eq. (3) yields Eq. (5). K biphas = {[B]w Vw + [B]org Vorg {/{[A]w Vw + [A]org Vorg }
(5)
Upon division by [A]w V w and introduction of the phase ratio α (α = V org /V w ) we finally obtain Eq.: K biphas = K w · {(1 + α PB )/(1 + α PA )}
(6)
It can be seen from this equation that the effective equilibrium constant K bisphas is a function of the phase ratio α. Within limits, the equilibrium can be shifted in the
2 Evidence for the Perceived Advantages of Biocatalysts in Organic Media 5
Fig. 1 Dependence of the effective equilibrium constant K biphas on the phase ratio " (from Chaplin, 1990).
desired direction by controlling and shifting α. If the reaction is of the form A + B ⇐⇒ C + D, Eq. (7) is equivalent to Eq. (6). K biphas = K w · {(1 + α PC )(1 + α PD )/(1 + α PA )(1 + α PB )}
(7)
Figure 1 further elucidates the connection between K biphas and α: 2.3 Advantage 3: Easier Separation
In comparison to the high enthalpy of evaporation of water (57.36 kJ mol−1 ), the respective values are much lower for organic solvents, which are thus much easier to separate than water. Table 1 lists some boiling points and enthalpies of evaporation for common organic solvents. 2.4 Advantage 4: Enhanced Stability of Enzymes in Organic Solvents
Porcine pancreatic lipase catalyzes the transesterification reaction between tributyrin and various primary and secondary alcohols in a 99% organic medium [5]. Upon further dehydration, the enzyme becomes extremely thermostable. Not only
6 Biocatalysis in Non-conventional Media Table 1 Boiling points and enthalpies of evaporation for common organic solvents Solvent MTBE Acetone CH3 OH THF Hexane DIPE B.p. [◦ C] 55.2 29.70 Hvap [kJ mol−1 ]
56.2 32.00
65.0 39.26
67.0 28.76
69.0 31.93
EA
C2 H5 OH C6 H12 Toluene DMF
69.0 77.1 32.56 34.75
78.5 40.50
80.7 32.79
110.6 149 39.23 60.45
MTBE: methyl t-butyl ether; THF: tetrahydrofuran; EA: ethyl acetate; C6 H12: cyclohexane; DIPE: diisopropyl ether; DMF: dimethylformamide.
can the dry lipase withstand heating at 100 ◦ C for many hours, but it exhibits a high catalytic activity at that temperature. Reduction in water content also alters the substrate specificity of the lipase: in contrast to its wet counterpart, the dry enzyme does not react with bulky tertiary alcohols. During transesterification of amino acid esters with alcohols in the presence of chymotrypsin and subtilisin, both thermal and storage stabilities of chymotrypsin in non-aqueous solvents are observed to be greatly enhanced compared to water. 2.5 Advantage 5: Altered Selectivity of Enzymes in Organic Solvents
One of the most profound influences a solvent system can have on a reaction is a change of selectivity. Enzymes in organic solvents have been discovered in many cases to feature altered selectivity, including substrate specificity, enantioselectivity, prochiral selectivity, regioselectivity, and chemoselectivity [6, 7].
3 State of Knowledge of Functioning of Enzymes in Solvents 3.1 Range of Enzymes, Reactions, and Solvents
Not only do enzymes work in anhydrous organic media, but in this unnatural milieu they acquire remarkable properties such as enhanced stability, altered substrate and enantiomeric specificities, molecular memory, and the ability to catalyze unusual reactions [8]. Regarding the latter point, hydrolases, such as lipases, catalyze not only transesterifications in organic media but also other types of reactions, including esterification, aminolysis, thiotransesterification, and oximolysis. As all of these reactions compete with hydrolyses, which tend to dominate in aqueous media, some of them proceed to an appreciable extent only in non-aqueous solvents. Soon after the initial discovery, it became apparent that neither the source of the enzyme, nor the type of enzyme, nor the type of solvent seem to constrain the
3 State of Knowledge of Functioning of Enzymes in Solvents 7 Table 2 A selection of enzymes active in organic solvents
Class
E.C. number Enzyme
Reaction
Reference
yeast ADH horse-liver ADH alcohol oxidase HR peroxidase PP lipase
ethanol oxidation alcohol oxidation ethanol oxidation sulfoxidation transesterification
Deetz [17] Zaks [19] Zaks [19] Dai [4] Zaks [5]
3.2.1.17 3.4.21.1 3.4.21.62 3.4.21.4 3.4.24.27 3.5.1.5 3.5.1.11
lysozyme α-chymotrypsin subtilisin trypsin thermolysin urease Pen G acylase
transesterification transesterification transesterification dipeptide synthesis hydrolysis of urea resolution of β-amino esters
Zaks [3] Zaks [16] many van Unen [36] Bedell [4] Ghatorae [29] Roche [67]
4.1.2.10 4.2.1.84
(R)-oxynitrilase HCN addition to C O nitrile hydratase H2 O addition to nitriles
Oxido-reductases 1.1.1.1 1.1.1.1 1.1.3.13 1.11.1. Hydrolases 3.1.1.3
Lyases
Wajant [80] Thomas [29]
HR peroxidase: horseradish peroxidase; PP lipase: porcine pancreatic lipase.
use of organic solvents [3]. Various types of enzymes, such as lipases, proteases (chymotrypsin, subtilisin), oxidoreductases (alcohol dehydrogenase, oxidases, and peroxidases), and others, react in organic solvents. A selection of enzymes found to be active is listed in Table 2. Lipases from three different sources, porcine pancreatic, yeast, and mold, act vigorously as catalysts in a number of nearly anhydrous organic solvents. The range of solvents encompasses the whole spectrum from hydrophilic ones such as methanol, isopropanol, and acetonitrile via THF to the hydrophobic ones such as dialkyl ethers, toluene, hexane, cyclohexane, and heptane. 3.2 The Importance of Water in Enzyme Reactions in Organic Solvents 3.2.1 Exchange of Water Molecules between Enzyme Surface and Bulk Organic Solvent Early after the (re)discovery of enzyme activity in organic solvents, the higher level of activity in very hydrophobic solvents, such as hydrocarbons, in comparison with rather hydrophilic solvents, such as alcohols, was noted. Soon the hypothesis was raised that water molecules partition between the enzyme surface and the bulk organic solvent, and they are “stripped off” the enzyme surface in dry organic solvents. This idea explained why hydrophilic solvents, which strip water off an enzyme surface more readily than hydrophobic ones, deactivate enzymes much more readily than the latter. The hypothesis was proven correct by exchange of water on the enzyme surface by tritiated water, T2 O: suspension of the tritiated enzyme in the organic solvent, filtration of the enzyme, and measurement of the
8 Biocatalysis in Non-conventional Media
amount of tritiated water in the solvent by scintillation counting demonstrated that several different enzymes (α-chymotrypsin, Subtilisin Carlsberg, and horseradish peroxidase) yielded water molecules immediately after suspension in the solvent. Depending on the solubility of water in the solvent, different amounts of T2 O were desorbed, from 0.4% in hexane to 62% in methanol [9]. The amount of water required by chymotrypsin and subtilisin for catalysis in organic solvents is several hundred molecules per protein molecule, less than required to form a monolayer on the surface. While subtilisin and α-chymotrypsin act as catalysts in a variety of dry organic solvents, the vastly different catalytic activities in these organic solvents are partly due to stripping of the essential water from the enzyme by more hydrophilic solvents and partly due to the solvent directly affecting the enzymatic process. Enzymes as different as yeast alcohol oxidase, mushroom polyphenol oxidase, and horse-liver alcohol dehydrogenase demonstrated vastly increased enzymatic activity in several different solvents upon an increase in the water content, which always remained below the solubility limit [10]. While much less water was required for maximal activity in hydrophobic than in hydrophilic solvents, relative saturation seems to be most relevant to determining the level of catalytic activity. Correspondingly, miscibility of a solvent with water is not a decisive criterion: upon transition from a monophasic to a biphasic solvent system, no significant change in activity level was observed [11]. Therefore, the level of water essential for activity depends more on the solvent than on the enzyme.
3.2.2 Relevance of Water Activity While water level itself seems to be of no significance for enzyme activity in organic solvents, the relative amount of water with respect to total water solubility seems to be very influential. Correspondingly, instead of using concentration cw or mole fraction xw of water as a measure of the amount of water, water activity aw should be preferred. [Activity, linked to concentration by the activity coefficient γ w , aw = γ w · cw , written here for the case of water, is the thermodynamically rigorous way to consider amounts per volume in thermodynamic formulae such as mass action laws.] At the solubility limit in any solvent, water activity is always equal to unity, regardless of the mole fraction. The transesterification of vinyl butyrate with n-octanol with Pseudomonas cepacia lipase yielded increasing K M values with rising water activity; however, vmax values demonstrated a maximum at water activity levels aw between 0.11 and 0.38 [7]. Activity maxima at certain aw values have also been found with many other enzymes, such as optima at aw = 0.55 for Mucor miehei lipase-catalyzed reaction in several solvents of different polarity from hexane to 3-pentanone [13]. In comparing five different lipases, considerable variations not just of the optima of aw but also the dependence of activity on aw itself were found [14]. Moreover, in polar solvents, correlations of activity with aw often cannot be found at all, probably owing to solvent effects beyond those controlling hydration of the enzyme [5].
3 State of Knowledge of Functioning of Enzymes in Solvents 9
Summarizing the findings above, control of water activity aw during enzyme reactions in organic solvents is extremely important, as water activity exerts a crucial influence, and enzyme reactivity crucially depends on it. 3.3 Physical Organic Chemistry of Enzymes in Organic Solvents 3.3.1 Active Site and Mechanism In organic solvents, enzymes react at the same active site as in water; covalent modification of the active center of the enzymes by a site-specific reagent renders them catalytically inactive in organic solvents [16]. Upon replacement of water with octane as the reaction medium, the specificity of chymotrypsin towards competitive inhibitors reverses. Early results with enzymes in organic solvents demonstrated that enzymes work best in organic solvents if there are lyophilized from aqueous solution at or near the optimum pH value in water (see Section 3, below). Any specific pH effect of organic solvents, however, can be dismissed: it was found that the pKa of amino, carboxylic, and phenolic compounds does not differ by more than 0.3 units in the aqueous and lyophilized states [13]. Enzymatic transesterifications in organic solvents follow Michaelis-Menten kinetics, and in quite a few cases, the values of v/K M roughly correlate with the solvent’s hydrophobicity [16]. The dependence of the catalytic activity of the enzyme in organic media on the pH of the aqueous solution from which it was recovered is bell-shaped, with the maximum coinciding with the pH optimum of the enzymatic activity in water. Hammett analysis and finding of corresponding LFERs suggests that the mechanism of an enzyme reaction in organic solvents is the same as in aqueous solution [18]. 3.3.2 Flexibility of Enzymes in Organic Solvents From experiments on (i) replacement of water with other hydrogen bond forming additives and (ii) titration of enzyme amino groups in an organic medium, as well as the literature data on dehydrated enzymes, it is concluded that the water required by enzymes in non-aqueous solvents provides them with sufficient conformational flexibility for catalysis [19]. The activity of lyophilized HL-ADH in several solvents increased by orders of magnitude upon addition of small amounts of water to solvents of dielectric constant ε from 1.9 to 36 [20]. Enzyme flexibility, as measured by electron spin resonance (ESR) spectroscopy with a spin label at the active site, did not depend on water content but on ε of the pure solvent: HL-ADH turns less flexible with decreasing ε. Correlating positively with the hydrophobicity of the solvent, different fractions of inactivated active centers were measured in different solvents with solid-state NMR spectroscopy (13 C-cross-polarization/magic angle spinning (MAS) NMR) [9]. Just as with tritiated water (see above), immediate desorption of water molecules from the protein surface was observed after addition to the organic solvent.
10 Biocatalysis in Non-conventional Media
In some instances, a relationship between enzyme enantioselectivity and flexibility has been found: while there are some positive examples in the literature [8], other investigations either found a slightly negative correlation or did not find any correlation at all [23]. 3.3.3 Polarity and Hydrophobicity of Transition State and Binding Site As enzyme active sites commonly have to be totally anhydrous to allow chemical catalysis to occur, substrates have to be desolvated when transferring from solvent to active site. The effect of six solvents ranging from water to hexane on the standard test reaction in organic solvents, the transesterification of N-acetyl-L-Phe-OEt with n-PrOH, with the help of Subtilisin Carlsberg, has been interpreted by taking into account substrate desolvation, as measured by relative substrate solubility [24]. Analysis of a thermodynamic cycle between ground (E + S) and transition state (ES= ) in the solvent and in acetone yields Eq. (8). =
=
=
G obs = G intr + G s = G ES −G E
(8)
Equation (8) states that the apparent and intrinsic differential activation energies G= obs and G*intr differ by the differential Gibbs free solvation energy Gs , with Gs relating the saturation solubilities of N-Ac-L-Phe-OEt in a solvent to acetone as the standard state [Eq. (9)], so that, with G = H – T · S, Eq. (10) follows. G s = −RT ln([S]solvent /[S]acetone ) G
=
=
obs
= H
intr −T
(9) =
· S
intr
−RT ln([S]solvent /[S]acetone )
(10)
While G= intr failed to yield a straight-line correlation, an LFER, with log ε (manifesting solvent polarity) and G= obs in fact did feature such a correlation. The results indicate that substrate desolvation is important and that the intrinsic activation energy of subtilisin catalysis is lowest in polar organic solvents, which may be due to transition-state stabilization of the enzyme’s polar transition state for transesterification. Furthermore, the results strongly point towards the same transition state in water and organic solvents across a wider range of solvent polarity ε. The dramatic activation of serine proteases in non-aqueous media owing to prior lyophilization in the presence of KCl could be due in principle to the relaxation of potential substrate diffusional limitations in clumps of heterogeneous, undissolved enzyme in organic solvent. This hypothesis was checked with a classic experiment for diffusional limitation: Subtilisin Carlsberg was lyophilized with KCl and phosphate buffer in varying proportions ranging from 1 to 99% (w/w) enzyme in the final lyophilized solids [4]. The active enzyme content of a given biocatalyst preparation was controlled by mixing native subtilisin with subtilisin preinactivated with the serine protease inhibitor PMSF and lyophilizing the enzyme mixture in the presence of different fractions of KCl and phosphate buffer. Initial catalytic rates, normalized per weight of total enzyme in the catalyst material, were measured as
3 State of Knowledge of Functioning of Enzymes in Solvents 11
a function of active enzyme for biocatalyst preparations containing different ratios of active to inactive enzyme. Plots of initial reaction rates as a function of percentage active subtilisin in the biocatalyst were found to be linear for all biocatalyst preparations. Thus, enzyme activation, as high as 3750-fold in hexane for the transesterification of N-Ac-L-PheOEt with n-PrOH, is a manifestation of intrinsic enzyme activation and not of relaxation of diffusional limitations resulting from diluted enzyme preparations. Thus, the above-mentioned hypothesis was refuted. As similar results were found for the metalloprotease thermolysin, activation due to lyophilization in the presence of KCl may be a general phenomenon. 3.4 Correlation of Enzyme Performance with Solvent Parameters
Given the multitude of potential organic solvents for enzyme reactions, design rules or at least heuristics were sought early on for picking organic solvents without trying a large number of them experimentally. Organic solvents can be described and classified according to several criteria, such as volatility (melting and boiling points, Tm and T boil ), size (molar mass, M [g mol−1 ]), accommodation of charge (dielectric constant ε), hydrophobicity (water/octanol partition coefficient, log P), distribution of charge (dipole moment µ), or polarizability (polarization constant α); for data, see Table 3. The water/octanol partition coefficient log P of a component A is defined by Eq. (11), where A is in dilute solution below the solubility limit. logP = log{[A]water /[A]n−octanol }
(11)
n-Octanol is chosen as a reference because it mimics biological membranes with a hydrophobic tail and a hydrophilic head capable of hydrogen bonding.
Table 3 Some properties of common organic solvents Solvent MTBE Acetone CH3 OH THF Hexane DIPE B.p. [◦ C] M [g/mol] ε[−] log P µ [D]
55.2 88.15 4.5 1.15 1.23
56.2 65.0 67.0 58.08 32.04 72.10 20.7 32.63 7.58 − 0.23 − 0.76 0.49 2.70 1.71 1.74
EA
69.0 69.0 77.1 86.17 102.17 88.10 1.89 3.23 6.02 3.5 1.9 0.68 0.0 1.24 1.83
C2 H5 OH C6 H12 Toluene DMF 78.5 46.07 24.3 − 0.24 1.74
80.7 110.6 153 84.16 92.13 73.09 2.02 2.38 36.7 3.2 2.5 − 1.0 0.0 0.30 3.24
MTBE: methyl t-butyl ether; THF: tetrahydrofuran; EA: ethyl acetate; C6 H12 : cyclohexane; DIPE: diisopropyl ether; DMF: dimethylformamide.
12 Biocatalysis in Non-conventional Media
Fig. 2 Correlation of activity data with solvent parameters [27].
3.4.1 Control through Variation of Hydrophobocity: log P Concept In an influential early investigation, correlation of biocatalytic activity data of aerobic and anaerobic whole-cell biocatalysis with solvent properties resulted in the strongest correlation for the partition coefficient log P, whereas both Hildebrand’s solubility parameter δ and the dielectric constant ε showed either a weak correlation with activity data or none at all [26, 27] (Figure 2). This simple picture, unfortunately, did not hold for all enzyme reactions in organic solvents. 3.4.2 Correlation of Enantioselectivity with Solvent Polarity and Hydrophobicity The enantioselectivity of the transesterification of vinyl butyrate with secphenylethanol was influenced greatly by the solvent; however, not log P (hydrophobicity) but both the dielectric constant ε and the dipole moment µ of the solvent correlated best with the E values [27]. Table 4 demonstrates the correlation of µ with E. Table 4 Correlation of the enantioselectivity E of subtilisin in the transtesterification of
racemic 1-phenylethanol in various organic solvents, and the dielectric constants ε of the solvents [28]
Solvent Dioxane Benzene Triethylamine Tetrahydrofuran pyridine
er
E
Solvent
2.2 2.3 2.4 7.6 12.9
61 54 48 40 31
Dimethylformamide Nitromethane Acetonitrile Methylacetamide
er
E
36.7 35.9 35.9 191.3
9 5 3 3
4 Optimal Handling of Enzymes in Organic Solvents 13 Table 5 Effects of the hydrophobicity (log P) of solvents on the prochiral selectivity (S) of
Pseudomonas sp. lipase in the monohydrolysis of 2-(1-naphthoylamino)trimethylene dibutyrate [29])
Solvent
log P
Acetonitrile Nitrobenzene Acetone Cyclohexanone Butanone 2-Pentanone Chloroform Tetrahydrofuran 2-Hexanone Dioxane t-Butyl acetate t-Butyl alcohol t-Amyl alcohol Triethylamine Toluene Benzene Carbon tetrachloride
−0.33 1.8 −0.23 0.96 0.29 0.80 2.0 0.49 1.3 −1.1 1.7 0.80 1.4 1.6 2.5 2.0 3.0
S >30 >30 18 18 16 16 9.9 9.1 8.8 5.4 5.3 4.9 4.8 4.7 3.5 3.2 2.6
In a third example, the prochiral selectivity results of the Pseudomonas sp. lipasecatalyzed monohydrolysis of 2-(1-naphthoylamino)trimethylene dibutyrate correlate inversely with hydrophobicity, as expressed by log P (Table 5; [29]). The three examples provided in this section demonstrate that each case has to be considered individually. None of the solvent parameters has been found to correlate reliably with enzyme activity or selectivity data, except in individual cases; predictions at this point are not possible.
4 Optimal Handling of Enzymes in Organic Solvents
While different enzyme reactions yield different results in different solvents, thus rendering comparison difficult, a procedure akin to a “standard protocol” for conducting enzyme reactions in organic solvents has been developed. It calls for Ĺ lyophilization at optimum pH in aqueous solution; Ĺ meeting the requirement for water for enzyme function; and Ĺ testing the enzyme with a standard reaction under standard conditions.
In the case of subtilisin, the standard reaction is the transesterification of Nacetyl-L-phenylalanine ethyl ester with n-propanol [Eq. (12)]. N−Ac−L−Phe−OEt + n−PrOH → NAc−L−Phe−OPr + EtOH
(12)
14 Biocatalysis in Non-conventional Media
More data have been obtained with this reaction on enzymes in organic solvents than with any other. 4.1 Enzyme Memory in Organic Solvents
Repeatedly, researchers have found the phenomenon that enzyme molecules seem to “remember” the aqueous conditions from which they were prepared [30]. As mentioned above, early results with enzymes in organic solvents demonstrated that enzymes work best in organic solvents if there are lyophilized from aqueous solution at or near the optimum pH value in water. No specific pH effect of organic solvents on enzyme reactions was found: the pK a of amino, carboxylic, and phenolic compounds does not differ by more than 0.3 units in the aqueous and lyophilized state [13]. Catalytic activities of α-chymotrypsin and Subtilisin Carlsberg in various hydrous organic solvents were measured as a function of how the enzyme suspension had been prepared [31]. Direct suspension of the lyophilized enzyme in the solvent containing 1% water was compared with precipitation of the same enzyme from its aqueous solution by a 100-fold dilution with anhydrous solvent. The reaction rate in a given non-aqueous enzymatic system was found to depend on the nature of both enzyme and solvent, but to depend strongly on the mode of enzyme preparation. The common mode of suspending lyophilized enzymes in organic solvents containing very little water results in just a low specific activity. Much higher specific activity can be achieved if the enzymes are dissolved in water first and then diluted with anhydrous organic solvent to the same water content [14]; the lower the water content of the medium, the greater the discrepancy. The mechanism of this phenomenon, termed “lyophilization-induced inactivation”, was found to arise from reversible denaturation of the oxidases on lyophilization: because of its conformational rigidity, the denatured enzyme exhibits very limited activity when directly suspended in largely non-aqueous media but renatures and thus yields much higher activity if first redissolved in water. Lyophilizationinduced inactivation could be minimized, at least for the case of four oxidases employed, by at least two strategies, both involving the addition of excipients to the aqueous enzyme solution before lyophilization: Ĺ addition of lyoprotectants such as polyols or polyethylene glycol to preserve the overall enzyme structure during dehydration; Ĺ addition of phenolic and aniline substrates that presumably bind to the hydrophobic pocket of the enzyme active site.
Not only was the effect of these excipients found to extend activity by up to orders of magnitude but the effect of the two excipient groups was found to be additive, affording up to complete protection against lyophilization-induced inactivation when representatives of the two are present together.
4 Optimal Handling of Enzymes in Organic Solvents 15
4.2 Low Activity in Organic Solvents Compared to Water
One important issue in the discussion of the biotechnological opportunities afforded by non-aqueous enzymology and its impact on biocatalysis is the often drastically diminished enzymatic activity in organic solvents compared to that in water. The loss of comparative specific activity can be as high as 103 −105 . Several recent studies have addressed the issue and have made great strides towards both elucidating causes of the phenomenon of activity loss and designing strategies that systematically enhance activity by multiple orders of magnitude [33]. The goal ultimately is to bring the activity level of enzymes in organic solvents to aqueous-like level. Dipeptide synthesis in acetonitrile is found to be enhanced 425-fold in the α-chymotrypsin-catalyzed reaction between the 2-chloroethyl ester of N-acetyl-Lphenylalanine and L-phenylalaninamide upon lyophilization of the enzyme in the presence of 50 equivalents of 18-crown-6 [34]. Acceleration is observed in different solvents and for various peptide precursors. The enhancement of enzyme activity upon addition of 18-crown-6 to the organic solvent can be reconciled with a mechanism in which macrocyclic interactions of 18-crown-6 with the enzyme play an important role [35]. Macrocyclic interactions (e.g., complexation with lysine ammonium groups of the enzyme) can lead to a reduced formation of inter- and intramolecular salt bridges and, consequently, to lowering of the kinetic conformational barriers, enabling the enzyme to refold into thermodynamically stable, catalytically (more) active conformations. This assumption is supported by the observation that the crown-ether-enhanced enzyme activity is retained after removal of the crown by washing with a dry organic solvent. Much stronger crown ether activation is observed when 18-crown-6 is added prior to lyophilization, and this can be explained by a combination of two effects: the beforementioned macrocyclic complexation effect, and a less specific, non-macrocyclic, lyoprotecting effect. The magnitude of the total crown ether effect depends on the polarity and thermodynamic water activity of the solvent, the activation being highest in dry and apolar media, where kinetic conformational barriers are highest. By determination of the specific activity of crown-ether-lyophilized enzyme as a function of the enzyme concentration, the macrocyclic crown ether (linearly dependent on the enzyme concentration) and the non-macrocyclic lyoprotection effect (not dependent on the enzyme concentration) could be separated. These measurements reveal that the contribution of the non-macrocyclic effect is significantly larger than the macrocyclic refolding effect. Immobilization in a sol–gel matrix accelerated the propanolysis of N-acetyl-Lphenylalanine ethyl ester in cyclohexane for several serine proteases compared to the non-immobilized lyophilized enzymes: 31-fold for Subtilisin Carlsberg, 43-fold for α-chymotrypsin, and 437-fold for trypsin [36]. The activity yield upon immobilization was 90% (α-chymotrypsin). The rate enhancement effect of immobilization on the enzyme activity is highest in hydrophobic solvents.
16 Biocatalysis in Non-conventional Media Table 6 Approximate activation factors for enzymes in organic solvents for various strategies
Action
Approx. activation factor
Sufficient hydration in organic solvent (≥ 1%, < saturation) Lyophilization at pH of maximum activity in water Lyoprotectants such as polyols Hydrophobic binding pocket protectors (phenols, anilines) Strong salts such as KCl (if present at > 90% w/w)
10 10 10 10 10
total:
105
The activation mechanisms, with the corresponding activation ratios by which the maximum attainable rate constants of enzymes in organic solvents can be enhanced, are summarized in Table 6. Activation by a factor of 105 closes most of the gaps between specific activity levels in water and organic solvents. The state of affairs regarding the preparation of highly active enzyme formulations for use in non-aqueous media is summarized by Lee and Dordick [37]. Improved mechanistic understanding of enzyme function and activation in dehydrated environments will lead to the development of a broad array of techniques for generating more active, stable, and enantioselective and regioselective tailored enzymes for synthetically relevant transformations. This, in turn, should result in an increase in the opportunities for enzymatic processes to be developed on a commercial scale. 4.3 Enhancement of Selectivity of Enzymes in Organic Solvents
Enzymatic enantioselectivity in organic solvents can be markedly enhanced by temporarily enlarging the substrate via salt formation [38]. In addition to its size, the stereochemistry of the counterion can greatly affect the enantioselectivity enhancement [39]. In the Pseudomonas cepacia lipase-catalyzed propanolysis of phenylalanine methyl ester (Phe-OMe) in anhydrous acetonitrile, the E value of 5.8 doubled when the Phe-OMe/(S)-mandelate salt was used as a substrate instead of the free ester, and rose sevenfold with (R)-mandelic acid as a Brønsted-Lewis acid. Similar effects were observed with other bulky, but not with petite, counterions. The greatest enhancement was afforded by 10-camphorsulfonic acid: the E value increased to 18 ± 2 for a salt with its R-enantiomer and jumped to 53 ± 4 for the S. These effects, also observed in other solvents, were explained by means of structure-based molecular modeling of the lipase-bound transition states of the substrate enantiomers and their diastereomeric salts. The relatively meager stereoselectivity of horseradish peroxidase in the sulfoxidation of thioanisole in 99.8% (v/v) methanol is vastly improved when the enzyme forms a complex with benzohydroxamic acid [16]. The generality of the observed “ligand-induced stereoselectivity enhancement” is demonstrated with other hydrophobic hydroxamic acids, as well as with additional thioether
5 Novel Reaction Media for Biocatalytic Transformations 17
substrates, and rationalized by means of molecular dynamics simulations and energy minimization.
5 Novel Reaction Media for Biocatalytic Transformations 5.1 Substrate as Solvent (Neat Substrates): Acrylamide from Acrylonitrile with Nitrile Hydratase
One of the best examples for discussing biotransformations in neat solvents is the enzymatic hydrolysis of acrylonitrile, a solvent, to acrylamide. For several applications of acrylamide, such as polymerization to polyacrylamide, very pure monomer is required, essentially free from anions and metals, which is difficult to obtain through conventional routes. In Hideaki Yamada’s group (Kyoto University, Kyoto, Japan), an enzymatic process based on a nitrile hydratase was developed which is currently run on a commercial scale at around 30 000–40 000 tpy with resting cells of third-generation biocatalyst from Rhodococcus rhodochrous J1. Characteristic of this process are its extremely high concentration level and space-time-yield. Solid nitrile substrates render precipitating product up to a solid medium at high degrees of conversion; liquid nitriles can be run as neat substrate. Figure 3 illustrates the connection between substrate concentration up to 15 M (!) and achievable degree of conversion for the example of the transformation of 3-cyanopyridine to nicotinamide.
Fig. 3 Relationship of degree of conversion with substrate concentration for the reaction of cyanopyridine to form nicotinamide [64].
18 Biocatalysis in Non-conventional Media
5.2 Supercritical Solvents
Supercritical solvents combine extremely low viscosity as well as diffusivity of gases with the high solubility of compounds in solution. In addition, products can be separated by simply depressurizing the reaction vessel and boiling off the solvent. After having been tested for chemical reactions, supercritical solvents were also tested for enzymatic reactions in around 1985. Owing to the expected dependence of the reaction parameters on temperature and pressure, supercritical solvents should allow the investigation of that influence. Supercritical CO2 (SCCO2 ), especially, with its accessible critical parameters (T crit = 31◦ C, pcrit = 50 bar) offers access to this technique. However, results demonstrated that SCCO2 does not show good results for all reactions. It was found for the synthesis of acrylates by transesterification of methylacrylate with lipase from Candida rugosa that owing to its hydrophilic character SCCO2 was inferior to more hydrophobic supercritical solvents such as propane, ethane, or even SCSF6 with respect to reactivity [41]. A comparison between SCCO2 and n-hexane for the esterification of oleic acid showed that in SCCO2 inhibition by ethanol was less important but that vmax in nhexane was about an order of magnitude higher. Enzyme stability was comparable and satisfactory in both cases [27]. For a review of the field of biocatalysis in supercritical solvents, see Mesiano [85]. 5.3 Ionic Liquids
Ionic liquids are composed of organic cations with a compatible anion; examples are [BMIM][PF6 ] and [MMIM][MeSO4 ] (Figure 4). Unlike conventional organic solvents, ionic liquids possess no vapor pressure but good solvent properties, and a wide range of thermal stability over a wide range of liquid state (≤ 300◦ C); also, they can be used to form two-phase systems with many solvents [42].
Fig. 4 Some ionic liquids [42]: a) 1-butyl-3-methylimidazolium hexafluorophosphate, [BMIM][PF6 ]; b) 1-butyl-3-methylimidazolium bis((trifluoromethyl)sulfonyl)amide, [BMIM][CF3 SO2 ]2 N;
c) 1-methyl-3-methylimidazolium methylsulfate, [MMIM][MeSO4 ]; d) 1-butyl-3-methylimidazolium triflate, [BMIM][CF3 SO3 ]; e) 1-butyl-4-methylpyridium tetrafluoroborate, [MBPy][BF4 ].
5 Novel Reaction Media for Biocatalytic Transformations 19
Given these properties, they can extend the application of solvent engineering to biocatalytic reactions. Instead of just serving as a replacement for organic solvents, they often lead to improved process performance. The first report on thermolysincatalyzed Z-aspartame synthesis from Z-L-aspartate and L-phenylalanine methyl ester · HCl in 1-butyl-3-methylimidazolium hexafluorophosphate [BMIM][PF6 ] achieved an initial rate (1.2 nmol (min−1 (mg protein)−1 ) comparable to rates in organic solvents, and remarkable enzyme stability obviating immobilization [25]. Since then, a range of enzymatic reactions and whole-cell processes have achieved remarkable yields [42, 78], (enantio) selectivity [45, 46], or enzyme stability [42, 47]. 5.4 Emulsions [Manufacture of Phosphatidylglycerol (PG)]
Fatty acids are important raw materials which are gained from naturally occurring triglycerides. However, thermal processes to obtain fatty acids result in discoloration, so alternative processes have been sought. Lipases (E.C. 3.1.1.3.) catalyze the hydrolysis of lipids at an oil/water interface. In a membrane reactor, the enzymes were immobilized both on the side of the water phase of a hydrophobic membrane as well as on the side of the organic phase of a hydrophilic membrane. In both cases, no other means for stabilization of the emulsion at the membrane were required. The synthesis reaction to n-butyl oleate was achieved with lipase from Mucor miehei, which had been immobilized at the wall of a hollow fiber module. The degree of conversion reached 88%, but the substrate butanol decomposed the membrane before the enzyme was deactivated. Phosphatidylglycerol (PG), which is useful as a lung surfactant for the treatment of respiratory illnesses, can be obtained from phosphatidylcholine (PC) in a transphosphatidylation reaction [Eq. (13)] catalyzed by phospholipase D (PLD, E.C. 3.1.4.4.), with the hydrolysis to phosphatidic acid (PA) as a side reaction [Eq. (14)]. PC + glycerol → PG + choline
(13)
PC + H2 O → PA + choline
(14)
Commonly, the reaction is conducted in an emulsion stabilized by surfactants and containing the substrate (PC). In a microporous membrane reactor, PLD stability could be increased sevenfold by the addition of ethers to the chamber side. Additionally, no surfactants were required as the product could be separated in simple fashion; 20% product was formed, compared with 4% in the simple emulsion system. 5.5 Microemulsions
If the objective is to keep the enzyme active and stable in an aqueous phase but otherwise to use as much organic phase as possible, microemulsions are an option as
20 Biocatalysis in Non-conventional Media
Fig. 5 Microemulsion droplets and possible average location of enzyme molecules [59].
a reaction medium. In contrast to ordinary emulsions they are thermodynamically stable and, at a particle diameter of 1–20 nm, accommodate most often only one enzyme molecule (Figure 5). The microemulsion droplets communicate rapidly and exchange their contents through elastic collisions. The boundary between microemulsions and reversed micelles is not clearly delineated, and the two notions are often used interchangeably. Enzyme of almost all classes and structures have been solubilized in microemulsion systems and used for reactions [48]. Activity and stability are often comparable to values in aqueous media. Many substrates which cannot be made to react in water or in pure organic solvents such as hexane owing to lack of solubility can be brought to reaction in microemulsions. Whereas enzyme structure and mechanism do not seem to change upon transition from water to the microemulsion phase [6], partitioning effects often are very important. Besides an enhanced or diminished concentration of substrates in the vicinity of microemulsion droplets and thus of enzyme molecules, the effective pH values in the water pool of the droplets can be shifted in the presence of charged surfactants. Frequently, observed acceleration or deceleration effects on enzyme reactions can be explained with such partitioning effects [41]. 5.6 Liquid Crystals
As described above, microemulsions feature the disadvantage that the reaction product can be separated only with difficulty from other components of the system. This disadvantage can be compensated by employing highly viscous to solid liquidcrystalline phases which consist of the same components as microemulsions (oil, surfactant, water). Many enzymes display activity in several phases of a threecomponent mixture (Figure 6) [51]. At the beginning of the investigation a surfactant/water/oil phase diagram has to be prepared in case it has not been done already. Subsequently, one can generate the reaction medium by mixing corresponding amounts of components according to the phase diagram; in most cases, several phases will form. In cubic inverse phases, which are mechanically very stable, a series of enzymatic reactions have been conducted [51]. Often, stability of enzymes has been found to be significantly
5 Novel Reaction Media for Biocatalytic Transformations 21
Fig. 6 Liquid-crystalline phases and catalytic activity of peroxidase in ternary water/surfactant/organic solvent phases [59]. Phases: a = normal micelles; b = reversed micelles; c = lamellar aggregates; d = hexagonal aggregates.
higher in such liquid-crystalline phases than in pure isotropic continuous phases and media. 5.7 Ice-Water Mixtures
Even more than 30 years ago, it had been found that hydroxylaminolysis of amino acid esters with the help of trypsin catalysis is accelerated in frozen aqueous phases at −23◦ C in comparison to liquid aqueous phases at 1◦ C [34]. For kinetically controlled peptide synthesis with different serine or cysteine proteases, strongly enhanced yields of di- and oligopeptides have been observed [53] (Table 7; Figure 7). The strength of nucleophiles seems to be much higher in ice/water phases than in the purely liquid phase, as even unprotected dipeptides render good yields and even unprotected amino acid render at least moderate yields (Table 7; Figure 7). The reverse action of a trypsin-free elastase isolated from porcine pancreas was studied in frozen aqueous systems and was found to catalyze peptide bond formation more effectively than in solution at room temperature [36]. The acceptance of free amino acids as nucleophilic amino components indicates a changed specificity of the endoprotease in frozen reaction mixtures. In elastase-catalyzed formation of Ser-, Ile-, and Val-X-bonds in frozen aqueous reaction mixtures, peptide yields obtained depended on the P1 amino acid and the acyl donor chain length. The highest yields have been found at temperatures between −20 and −10◦ C and at pH values where more than 90% of the nucleophile was present as free
22 Biocatalysis in Non-conventional Media Table 7 Yields of α-chymotrypsin-catalyzed peptide synthesis in water and ice using maleoyl-
l-tyrosine methyl ester as acyl donor and various peptides, amino acid amides, and amino acids as nucleophiles , [53] Peptide yield [%] Amino component of the reaction H-Gly-Ala-OH H-Gly-Gly-OH H-Gly-Gly-Gly-OH H-D-Leu-NH2 H-L-Leu-NH2 H-Arg-OH H-Lys-OH H-β-Ala-Gly-OH
25◦ C 5.8 2.6 5.1 9.9 79.1 <2 <2 <2
−25◦ C 94.7 95.4 91.2 73.0 91.8 32.7 43.8 78.8
∗ [amino component] = 50 mM (25 mm as free base); [Maleoyl-L-Tyr-OMe] = 2 mM; [α-chymotrypsin] = 0.15 mm at 25◦ C and 0.3 µm at −25◦ C.
base. Beyond a tenfold excess of nucleophile over electrophile, no further increase in yield could be observed. In a typical case, for the α-chymotrypsin (E.C. 3.4.21.1)catalyzed reaction of maleyl-Phe-OMe with H-Leu-NH2 investigated under a range of reaction conditions, a variety of results were demonstrated [28]:
Fig. 7 Left: dependence of the yield of the "-chymotrypsin-catalyzed synthesis of the peptides maleoyl-Tyr-d-Leu-NH2 (◦) and maleoyl-Tyr-$-Ala-Gly-OH () in ice upon the relative amount of nucleophile free base (for conditions, see Table 7). Right: influence of temperature on the yield of the "-
chymotrypsin-catalyzed peptide synthesis in ice using maleoyl-l-Tyr methyl ester as acyl donor and H-d-Leu-NH2 (◦), H-Gly-Gly-GlyOH (•), and H-Ala-Ala-OH () as amino components (for conditions, see Table 7) [53].
5 Novel Reaction Media for Biocatalytic Transformations 23
Ĺ the peptide yield is independent of the method of shock-freezing; Ĺ the optimal reaction temperature is 248–263 K (−25 to −10 ◦ C), and lower temperatures result in clearly retarded reactions; Ĺ addition of DMSO leads to decreasing peptide yields; and Ĺ peptide bond formation is catalyzed by the active enzyme, since unspecific protein surface catalysis gave no peptide yields at all.
The best explanation of the good results for peptide syntheses in ice-water mixtures are based on the freeze-concentration-model, which just provides for a volume-reducing function for the ice while the liquid aqueous part is still the only relevant phase for the reaction. All observed enhancements of reaction rate would then have to be attributed to an increase in effective concentration. 1 HNMR relaxation time measurements have been used to determine the amount of unfrozen water in partially frozen systems, thus quantifying the extent of the “freeze-concentration effect” [56]. Comparative studies in ice and at room temperature verify the importance of freeze-concentration which, however, is not sufficient for a complete understanding of the observed effects. 5.8 High-Density Eutectic Suspensions
Enzymatic catalysis in heterogeneous eutectic mixtures, i.e., at comparable mole fractions of each substrate (in the case of peptide synthesis, electrophile and nucleophile) and water, offers advantages over conventional reactions conducted in organic solvents. Such benefits include the avoidance of bulk solvents and the attainment of greatly improved productivities. A wide range of proteases have been shown to retain their catalytic activity in eutectic mixtures of substrates, and have been used to synthesize various model and bioactive oligopeptides [30]. In similar fashion, the results of an initial study of enzymatic catalysis in metastable supersaturated solutions of carbohydrates show that such solutions, formed in the presence of small amounts of water and alcohol as plasticizers, are sufficiently stable under ambient conditions to enable enzymatic transformations of substrates [58]. A partial phase diagram for a system consisting of glucose, water, and polyethylene glycol was constructed to identify the regions which are most suitable for biotransformations (Figure 8). It was confirmed that the glass transition in this system occurred below the reaction temperature at any given composition of the constituent components. Several glycosidases were found to be catalytically active in this medium and the activity of β-glucosidase from almond was determined at several compositions of the reaction mixture and related to the corresponding regions of the phase diagram. The synthetic utility of the system was illustrated by glucosylation of several α,ω-alkyldiols, short-chain polyethylene glycols, and hydroxyalkyl and glyceryl monoacrylates. The influence of eutectic media on the kinetics and productivity of biocatalysts has yet to be fully elucidated. Syntheses in eutectic suspensions have been scaled up to the pilot scale in a rotating drum reactor. The bioactive peptide Nα-Cbz-L-Lys(N-Cbz)-Gly-L-Asp(OAll)-L-Glu(OAll)OEt was synthesized via a
24 Biocatalysis in Non-conventional Media
Fig. 8 Phase diagram for a binary eutectic system [30]. a) E: eutectic point; TA: melting point of component A; TB: melting point of component B; CE: eutectic composition. b) Phase diagram of the system N-Z-l-Tyr-OEt + l-Leu-NH2 .
Curve A: binary mixture of substrates in the absence of adjuvant; curve B: ternary mixture with 3% (w/w) water and 6% (w/w) ethanol; curve C: ternary phase diagram containing 10% (w/w) triethylene glycol dimethyl ether.
sequential N-to-C strategy in a heterogeneous solid-liquid mixture of the substrates in the presence of chymopapain and subtilisin as well as 16–20% (w/w) water and ethanol [31]. At substrate concentrations of around 1 M, yields of 67–74% per step at product concentrations of 0.36, 0.49, and 0.48 kg kg−1 were achieved. The corresponding space–time yields were between 0.30 and 0.64 kg (kg d)−1 and biocatalyst reuse provided productivities of 166–312 kg product (kg enzyme)−1 . 5.9 High-Density Salt Suspensions
A measure as simple as adding certain inorganic salts to aqueous enzyme solutions prior to lyophilization can result in dramatic activation of the dried powder in organic media relative to enzyme lyophilized without added salt [60]. Upon salt-induced lyophilization of Subtilisin Carlsberg, the optimum specificity constant Kcat /K M for transesterification in hexane exceeded the value for saltfree enzyme 20 000-fold, substantially more than the previously reported enhancement of 3750-fold. As for pure enzyme, salt-activated enzyme seems to exhibit the greatest activity when lyophilized from a solution of pH equal to the pH for optimal activity in water. The activation ratio and active-site content in the presence of 98% KCl (1% phosphate buffer and 1% enzyme) are highly sensitive to the lyophilization time and water content of the sample, over a range of up to 13fold for Subtilisin Carlsberg and 11-fold for a lipase from Mucor javanicus. The activity and water content were found to correlate directly with the kosmotropicity of the activating salt (kosmotropic salts bind water molecules strongly relative to the strength of water–water interactions in bulk solution) [60]. Combinations of
5 Novel Reaction Media for Biocatalytic Transformations 25
kosmotropic salts with known lyoprotectants such as polyethylene glycol (PEG) and sugars did not yield an appreciably more active catalyst. However, the combination of the kosmotropic sodium acetate with the strongly buffering sodium carbonate activated the enzyme more than the individual additives alone. Enzyme activity was enhanced further by the addition of small amounts of water to the organic solvent. Substrate selectivity experiments suggest that a mechanism other than selective partitioning of substrate into the enzyme-salt matrix is responsible for salt-induced activation of enzymes in organic solvents. Under optimal conditions, the enzyme activity in hexane was improved over 27 000-fold relative to the salt-free enzyme, reaching a catalytic efficiency that was within one order of magnitude of kcat /K M for hydrolysis of the same substrate in aqueous buffer. 5.10 Solid-to-Solid Syntheses
When one is using proteases in a direct reversal of their normal hydrolytic function, the equilibrium position is very important in limiting the attainable yield in equilibrium-controlled enzymatic peptide synthesis. If both reactants and products are largely undissolved in the reaction medium as suspended solids, thermodynamic analysis of such a system shows the reaction will proceed until at least one reactant has dissolved completely, towards either products or reactants (“switchlike” behavior). In case of a favorable equilibrium for synthesis, the yield is maximized in the solvent of least solubility for the starting materials [37]. Thermolysin-catalyzed reactions of X-Phe-OH (X = formyl, Ac, Z) with Leu-NH2 yielded X-Phe-Leu-NH2 with equilibrium yields > 90% over a range of solvents. Some predictions, such as a linear decrease in yield with the reciprocal of the initial reactant concentrations, could be verified [37]. This approach enables high peptide yields in equilibrium-controlled peptide synthesis in high-density aqueous media with an equimolar supply of substrates. Scale-up to molar amounts verified the concepts as well as demonstrate the synthetic utility of this approach: Z-His-Phe-NH2 and Z-Asp-Phe-OMe, precursors for cyclo-[His-Phe-] and the low-calorie sweetener Aspartame, respectively, were synthesized in preparative yields of 84–88% [21]. For a review of the field of peptide synthesis in unusual media, see Jakubke [40]. In the thermolysin-catalyzed solid-to-solid dipeptide synthesis of equimolar amounts of Z-Gln-OH and H-Leu-NH2 as model substrates, the water content was varied from 0 to 600 mL water (mol substrate)−1 and enzyme concentration in the range 0.5–10 g (mol substrate)−1 to achieve 80% yield and initial rates of 5–20 mmol (s kg)−1 [22]. When the water content is decreased from the 1.6-molal lowest substrate concentration, the initial rate increases tenfold to a pronounced optimum at 40 mL water (mol substrate)−1 and falls to much lower values in a system with no added water, and to zero in a rigorously dried system. The behavior at a higher water content was demonstrated through variation of the enzyme content to be caused by mass transfer limitations; at low water levels, the effects reflect the stimulation of the enzymatic activity by water. Preheating of the substrates or ultrasonic treatment had no significant effect on the system.
26 Biocatalysis in Non-conventional Media
During melting point experiments, the formation of a third compound, the salt of the two substrates, was discovered and coincided with a very strong dependence of kinetics on the exact substrate ratio: at 60% Leu-NH2 and 40% Z-Gln the rate was twice as high as at an equimolar substrate ratio [23]. The reaction rate of both Z-GlnLeu-NH2 and Z-Asp-Phe-OMe formation is strongly dependent on the addition of basic salts to the reaction system, pointing to the importance of acid-base effects in solid-to-solid systems [25]. If either KHCO3 or K2 CO3 is raised to 2.25-fold the equimolar amount of the Phe-OMe · HCl starting material, the rate increases 20fold and remains at that level with further addition of KHCO3 , but it drops sharply when further K2 CO3 is added. Beyond the addition of 1.5 equivalents of basic salt, the final yield of the reaction is affected negatively. These effects can be rationalized using a model estimating the pH of these systems, taking into account the possible formation of up to ten different solid phases. Together with influences of different mixing and water distribution methods, the pH model seems to explain most of the experimental results. Speeding up of enzymatic conversions of substrate in aqueous suspensions is often attempted by raising the temperature or adding organic solvents to promote dissolution of the substrate. Results of an α-chymotrypsin-catalyzed hydrolysis of dimethylbenzyl methyl malonate demonstrated that, upon addition of organic cosolvents, longer process times were actually required, even though the substrate solubility increased severalfold as expected; however, on raising the temperature from 25 to 37 ◦ C, the substrate solubility, the substrate dissolution rate, and the enzymatic reaction rate all increased, leading to shorter process times [66]. A simple relationship for the prediction of the overall process time was established by evaluating the time constants for three sub-processes involved: substrate dissolution, enzymatic conversion, and enzyme deactivation [Eqs. (15)–(17)]. Substate dissolution:τdis
= 1/(kL a)0
(15)
Enzymatic conversion:τenz = K M /(kcat · [E]0 )
(16)
Enzyme deactivation:τdeact = 1/(kd + [E]0 )
(17)
With the assumptions of second-order deactivation and first-order reaction, the regimes are characterized as follows: ([A]solid,0 = initial concentration of solid substrate, [A]lim = substrate solubility): Ĺ In the deactivation-limited regime, if τ deact ≈ [A]lim · τ enz /(0.4 · [A]solid,0 ), the process time is totally dominated by deactivation. If [A]solid,0 > [A]lim · τ enz /(0.4 · τ deact ) the enzyme is deactivated before the reaction is complete. Ĺ In the dissolution-limited regime, if deactivation is insignificant, τ deact [A]lim · τ enz /(0.4 · [A]solid,0 ), overall product formation rate r P and the overall process time tpro are given by Eqs. (18).
r p = kL a · [A]lim and tpro ≈2τdis ([A]solid,0 /[A]lim )
(18)
6 Solvent as a Parameter for Reaction Optimization (“Medium Engineering”)
Ĺ In the reaction-limited regime, τ deact [A]lim · τ enz /(0.4 · [A]solid,0 ) and τ enz 2 · τ dis , and r P and tpro are then given by Eq. (19).
r p = (kcat /K M ) · [E]0 · [A]lim and tpro ≈τenz ([A]solid,0 /[A]lim )
(19)
In this case, the concentration of dissolved substrate equals [A]lim . The subprocesses, such as dissolution/crystallization of substrates and products, enzymatic synthesis of the product(s), undesired enzymatic hydrolysis of substrates and/or products, and deactivation, often influence the pH value and are influenced in turn by it as the reactants are weak electrolytes. Investigating the kinetically controlled synthesis of the β-lactam antibiotic amoxicillin from 6-aminopenicillanic acid and D-p-hydroxyphenylglycine methyl ester in a solid suspension system in which the reaction nevertheless occurred in the liquid phase, Diender et al. found that the pH value and dissolved concentrations took a very different course at different initial substrate amounts [18]. These results were described reasonably well by the model based on mass and charge balances, pH-dependent solubilities of the reactants, and enzyme kinetics. Whereas enzymatic kinetic resolutions in a solid substrate suspension are usually treated as conventional kinetic resolutions of a fully dissolved substrate, a modeling study has demonstrated that there are important differences leading to much better performance in a solid suspension case [68]. In the suspension processes the liquid-phase concentration of substrate enantiomer that should be converted can be kept close to the maximum value, i.e., the solubility, when process conditions are properly chosen, whereas in a conventional process this concentration gradually decreases. Calculations show that this leads to a productivity that is about sixfold higher in the suspension processes. Also, for enzymes with a low enantioselectivity, a severalfold increase in yield of remaining enantiopure substrate is predicted, compared to the conventional kinetic resolution of dissolved enantiomers. Other potential advantages of using suspension reactions are that the initial substrate concentration may be higher [up to 25% (w/w)] and that the desired remaining substrate may be recovered by simply filtering off the solid crystals.
6 Solvent as a Parameter for Reaction Optimization (“Medium Engineering”)
The term “medium engineering” was coined as a complement to protein engineering [6]. Whereas the latter changes the catalyst, the former changes the reaction medium with resulting changes in prochiral, regio-, and enantioselectivity. Many such changes have been observed in the literature [6, 10, 11], however, only a few examples are given here to demonstrate that not only the catalyst but also the medium exerts a decisive influence on the outcome of an enzymatic reaction.
27
28 Biocatalysis in Non-conventional Media Table 8 Substrate specificity S (kcat /K M )Ser /(kcat /K M )Phe , in the transesterification of N-Ac-
l-amino acid ethyl ester with n-propanol, catalyzed by Subtilisin Carlsberg in various anhydrous organic solvents [69]
Solvent Dichloromethane Chloroform Toluene Benzene N,N-Dimethylformamide tert-Butyl acetate N-Methylacetamide Diethylether Carbon tetrachloride Ethyl acetate
S 8.2 5.5 4.8 4.4 4.3 3.7 3.4 3.2 3.2 2.6
Solvent tert-Butyl methyl ether Octane Isopropyl acetate Acetonitrile Dioxane Acetone Pyridine tert-Amyl alcohol tert-Butyl alcohol tert-Butylamine
S 2.5 2.5 2.2 1.7 1.2 1.1 0.53 0.27 0.19 0.12
6.1 Change of Substrate Specificity with Change of ReactionM: Specificity of Serine Proteases
When switching from water to an organic solvent, or switching between organic solvents, the substrate specificity can change. In the example of the standard reaction, transesterification of N-acetyl-L-phenylalanine ethyl ester with n-propanol by Subtilisin Carlsberg, which has been mentioned several times in this chapter already, the relative specificity between the rather hydrophobic phenylalanine compound and its more hydrophilic analog N-acetyl-L-serine ethyl ester varies with the solvent (Table 8) [69]. Partitioning of the two substrates between the bulk organic solvent and the binding pocket of the enzyme active site can act as an explanation. It is known that any binding pocket of enzymes can be identified with a value for the hydrophobicity. A hydrophobic substrate in water partitions readily into the binding pocket, but the more hydrophobic the solvent becomes the worse is the partitioning that results. From Table 8 it can be discerned, however, that this monocausal explanation does not suffice, as specificity S does not follow hydrophobicity very well. 6.2 Change of Regioselectivity by Organic Solvent Medium
Ortho-substituted phenolic diesters, such as bis-p-(1,4-butyroyl)-2octylhydroquinone (Figure 9), can be hydrolyzed in the more hindered 1-position or in the less-hindered 4-position. The regioselectivity of hydrolysis to the monoester, as expressed by the ratio of kcat /K M values, depends approximately linearly on log P over the range from acetonitrile (log P: −0.33) to cyclohexane (log P: 3.2), and is thus controlled by the hydrophobicity of the solvent (Carrea, 2000).
6 Solvent as a Parameter for Reaction Optimization (“Medium Engineering”)
Fig. 9 Regioselectivity of the hydrolysis of hydroquinone bis-esters [? ].
In another case, 4-alkylphenols can be transformed with the help of the flavoenzyme vanillyl alcohol oxidase (VAO) to either (R)-1-(4 -hydroxyphenyl) alcohols or to 1-(4 -hydroxyphenyl)alkenes. Both products pass through a common intermediate, the p-quinone methide, which then either is attacked by water or rearranges. The product spectrum can be controlled by medium engineering: in organic solvents such as acetonitrile and toluene, more cis-alkene but not transalkene, and less alcohol, are produced [71]. A similar shift in cis/trans-alkene was achieved by the addition of monovalent anions that bind specifically close to the active site.
Fig. 10 Asymmetric hydrolysis of nifedipine diesters; influence of solvent on stereochemical preferences [11, 38].
29
30 Biocatalysis in Non-conventional Media
6.3 Solvent Control of Enantiospecificity of Nifedipines
The probably most striking example so far of solvent influence on enzyme selectivity has been found by Yoshihiko Hirose and co-workers at the Amano Corp. in Nagoya, Japan, studying nifedipines. Nifedipines, structurally 1-substituted dihydropyridine mono- or diesters, are used in cardiovascular therapy, where they are R termed calcium antagonists [32]; the most prominent representative is Adalat from Bayer (Leverkusen, Germany). While the simplest achiral dihydropyridines can be synthesized by simple mixing and heating of benzaldehydes, ammonia, and alkyl ac-etoacetate, the asymmetric – and thus difficult to synthesize – dihydropyridines are therapeutically the most valuable today. The Amano researchers found not just drastically different enantiomeric excesses in the Pseudomonas sp. (Amano AH) lipase-catalyzed hydrolysis of methyleneoxypropionyl or -pivaloyl diesters, but a near-total reversal of specificity. Whereas in cyclohexane the different esters yielded half-esters with 88.8–91.4% e.e. R-specificity for a triple mutant (“FVL”) of Amano PS lipase, the same transformation with the same enzyme in diisopropyl ether (DIPE) yielded between 68.1 and > 99% e.e. of the S-product (Figure 10) [7, 38].
Suggested Further Reading G. CARREA, G. OTTOLINA, and S. RIVA, Role of solvents in the control of enzyme selectivity in organic media, Trends Biotechnol. 1995, 13, 63–70. G. CARREA and S. RIVA, Properties and synthetic applications of enzymes in
organic solvents, Angew. Chem. Int. Ed. 2000, 39, 2226–2254. A. M. KLIBANOV, Improving enzymes by using them in organic solvents, Nature 2001, 409, 241–246.
References 1 E. A. SYM, The action of esterase in the presence of organic solvents, Biochem. J. 1936, 30, 609–617. 2 A. M. KLIBANOV, G. P. SAMOKHIN, K. MARTINEK, and I. V. BEREZIN, 1977, A new approach to preparative enzymatic synthesis, Biotechnol. Bioeng. 19, 1351–1361, reprinted in Biotechnol. Bioeng 2000, 67, 737–747. 3 A. ZAKS and A. M. KLIBANOV, Enzyme-catalyzed processes in organic solvents, Proc. Natl. Acad. Sci. USA 1986a, 82, 3192–3196. 4 L. DAI and A. M. KLIBANOV, Peroxidase-catalyzed asymmetric
sulfoxidation in organic solvents versus in water, Biotechnol. Bioeng. 2000, 70, 353–357. 5 A. ZAKS and A. M. KLIBANOV, Enzymatic catalysis in organic media at 100 degrees C, Science 1984, 224, 1249–1251. 6 C. R. WESCOTT and A. M. KLIBANOV, The solvent dependence of enzyme specificity, Biochim. Biophys. Acta 1994, 1206, 1–9. 7 Y. HIROSE, K. KARIYA, Y. NAKANISHI, Y. KURONO, and K. ACHIWA, Inversion of enantioselectivity in hydrolysis of 1, 4-di-hydropyridines by point mutation of lipase PS, Tetrahedon Lett. 1995, 36, 1063–1066.
References 8 A. M. KLIBANOV, 1989, Enzymatic catalysis in anhydrous organic solvents, Trends Biochem. Sci. 14, 141–144. 9 L. A. S. GORMAN and J. S. DORDICK, Organic solvents strip water off enzymes, Biotechnol. Bioeng. 1992, 39, 392–397. 10 A. ZAKS and A. M. KLIBANOV, Substrate specificity of enzymes in organic solvents vs. water is reversed, J. Am. Chem. Soc. 1986b, 108, 2767–2768. 11 V. S. NARAYAN and A. M. KLIBANOV, Are water-immiscibility and apolarity of the solvent relevant to enzyme efficiency? Biotechnol. Bioeng. 1993, 41, 390–393. 12 R. BOVARA, G. CARREA, G. OTTOLINA, and S. RIVA, Effects of water activity on V max and K M of lipase catalyzed transesterification in organic media, Biotechnol. Lett. 1993, 15, 937–942. 13 R. H. VALIVETY, P. J. HALLING, and A. R. MACRAE, Reaction rate with suspended lipase catalyst shows similar dependence on water activity in different organic solvents, Biochim. Biophys. Acta 1992a, 1118, 218–222. 14 R. H. VALIVETY, P. J. HALLING, A. D. PEILOW, and A. R. MACRAE, Lipases from different sources vary widely in dependence of catalytic activity on water activity, Biochim. Biophys. Acta 1992b, 1122, 143–146. 15 G. BELL, A. E. M. JANSSEN, and P. J. HALLING, Water activity fails to predict critical hydration level for enzyme activity in polar organic solvents: interconversion of water concentrations and activities, Enzyme Microb. Technol. 1997, 20, 471–477. 16 A. ZAKS and A. M. KLIBANOV, Enzymatic catalysis in nonaqueous solvents, J. Biol. Chem. 1988a, 263, 3194–3201. 17 H. R. COSTANTINO, K. GRIEBENOW, R. LANGER, and A. M. KLIBANOV, On the pH memory of lyophilized compounds containing protein functional groups, Biotechnol. Bioeng. 1997, 53, 345–348. 18 L. T. KANERVA and A. M. KLIBANOV, Hammett analysis of enzyme action in organic solvents, J. Am. Chem. Soc. 1989, 111, 6864–6865. 19 A. ZAKS and A. M. KLIBANOV, The effect of water on enzyme action in organic media, J. Biol. Chem. 1988b, 263, 8017–8021.
20 R. M. GUINN, P. S. SKERKER, P. KAVANAUGH, and D. S. CLARK, Activity and flexibility of alcohol dehydrogenase in organic solvents, Biotechnol. Bioeng. 1991, 37, 303–308. 21 P. A. BURKE, R. G. GRIFFIN, and A. M. KLIBANOV, Solid-state NMR assessment of enzyme active center structure under nonaqueous conditions, J. Biol. Chem. 1992, 267, 20057–20064. 22 J. BROOS, A. J. W. G. VISSER, J. F. J. ENGBERSEN, W. VERBOOM, A. van HOEK, and D. N. REINHOUDT, Flexibility of enzymes suspended in organic solvents probed by time-resolved fluorescence anisotropy. Evidence that enzyme activity and enantioselecticity are directly related to enzyme flexibility, J. Am. Chem. Soc. 1995, 117, 12657–12663. 23 R. V. RARIY and A. M. KLIBANOV, On the relationship between enzymatic enantio-selectivity in organic solvents and enzyme flexibility, Biocat. Biotransf. 2000, 18, 401–407. 24 J. KIM, D. S. CLARK, and J. S. DORDICK, 2000, Intrinsic effects of solvent polarity on enzymic activation energies, Biotechnol. Bioeng. 67, 112–116. 25 B. A. BEDELL, V. V. MOZHAEV, D. S. CLARK, and J. S. DORDICK, Testing for diffusion limitations in salt-activated enzyme catalysts operating in organic solvents, Biotechnol. Bioeng. 1998, 58, 654–657. 26 C. LAANE, S. BOEREN, and K. VOS, On optimizing organic solvents in multi-liquid-phase biocatalysis, Trends Biotechnol. 1985, 3, 251–252. 27 C. LAANE, S. BOEREN, K. VOS, and C. VEEGER, Rules for optimization of biocatalysis in organic solvents, Biotechnol. Bioeng. 1987, 30, 81–87. 28 P. A. FITZPATRICK, D. RINGE, and A. M. KLIBANOV, Computer-assisted modeling of subtilisin enantioselectivity in organic solvents, Biotechnol. Bioeng. 1992, 40, 735–742. 29 F. TERRADAS, M. TESTON-HENRY, P. A. FITZPATRICK, and A. M. KLIBANOV, Marked dependence of enzyme prochiral selectivity on the solvent, J. Am. Chem. Soc. 1993, 115, 390–396. S. M. THOMAS, R. DICOSINO, and V. NAJARAJAN, Biocatalysis: applications and potentials
31
32 Biocatalysis in Non-conventional Media
30
31
32
33
34
35
36
37
38
39
40
41
for chemical industry, Trends Biotechnol. 2002, 20, 238–242. A. M. KLIBANOV, 1995, Enzyme memory. What is remembered and why?, Nature, 374, 596. T. KE and A. M. KLIBANOV, 1998, On enzymatic activity in organic solvents as a function of enzyme history, Biotechnol. Bioeng. 57, 746–750. L. DAI and A. M. KLIBANOV, Striking activation of oxidative enzymes suspended in nonaqueous media, Proc. Natl. Acad. Sci. 1999, USA 96, 9475–9478. A. M. KLIBANOV, 1997, Why are enzymes less active in organic solvents than in water? Trends Biotechnol. 15, 97–101. D. J. van UNEN, J. F. ENGBERSEN, and D. N. REINHOUDT, 1998, Large acceleration of alpha-chymotrypsin-catalyzed dipeptide formation by 18-crown-6 in organic solvents, Biotechnol. Bioeng. 59, 553–556. D. J. van UNEN, J. F. ENGBERSEN, and D. N. REINHOUDT, 2002, Why do crown ethers activate enzymes in organic solvents? Biotechnol. Bioeng. 77, 248–255. D. J. van UNEN, J. F. ENGBERSEN, and D. N. REINHOUDT, 2001, Sol-gel immobilization of serine proteases for application in organic solvents, Biotechnol. Bioeng. 75, 154–158. M. Y. LEE and J. S. DORDICK, 2002, Enzyme activation for nonaqueous media, Curr. Opin. Biotechnol. 13, 376–384. T. KE and A. M. KLIBANOV, Markedly enhancing enzymic enantioselectivity in organic solvents by forming substrate salts, J. Am. Chem. Soc. 1999, 121, 3334–3340. J. S. SHIN, S. LUQUE, and A. M. KLIBANOV, 2000, Improving lipase enantioselectivity in organic solvents by forming substrate salts with chiral agents, Biotechnol. Bioeng. 69, 577–583. P. K. DAS, J. M. CAAVEIRO, S. LUQUE, and A. M. KLIBANOV, Binding of hydrophobic hydroxamic acids enhances peroxidase’s stereoselectivity in nonaqueous sulf-oxidations, J. Am. Chem. Soc. 2002, 124, 782–787. S. KAMAT, J. BARRERA, E. J. BECKMAN, and A. J. RUSSELL, Biocatalytic synthesis of acrylates in organic solvents and supercritical fluids: I. Optimization of
42
43
44
45
46
47
48
49
50
51
enzyme environment, Biotechnol. Bioeng. 1992a, 40, 158–166. U. KRAGL, M. ECKSTEIN, and N. KAFTZIK, 2002, Enzyme catalysis in ionic liquids, Curr. Opin. Biotechnol. 13, 565–571. M. ERBELDINGER, X. NI, and P. J. HALLING, Kinetics of enzymatic solid-to-solid peptide synthesis: synthesis of Z-aspartame and control of acid-base conditions by using inorganic salts, Biotechnol. Bioeng. 2001, 72, 69–76. N. KAFTZIK, P. WASSERSCHEID, and U. KRAGL, Use of ionic liquids to increase the yield and enzyme stability in the β-galactosidase catalyzed synthesis of N-acetyllactosamine, Org. Proc. Res. Dev. 2002, 6, 553–557. S. PARK and R. J. KAZLAUSKAS, Improved preparation and use of room-temperature ionic liquids in lipase-catalyzed enantioand regioselective acylations, J. Org. Chem. 2001, 66, 8395–8401. M. ECKSTEIN, P. WASSERSCHEID, and U. KRAGL, Enhanced enantioselectivity of lipase from Pseudomonas sp. at high temperatures and fixed water activity in the ionic liquid, 1-butyl-3-methylimidazolium bis[(trifluoromethyl)sulfonyl]amide, Biotechnol. Lett. 2002b, 24, 763–767. P. LOZANO, T. de DIEGO, J.-P. GUEGAN, M. VAULTIER, and J. L. IBORRA, Stabilization of α-chymotrypsin by ionic liquids in transesterification reactions, Biotechnol. Bioeng. 2001, 75, 563–569. J. W. SHIELD, H. D. FERGUSON, A. S. BOMMARIUS, and T. A. HATTON, Enzymes in reversed micelles as catalysts for organic-phase reactions, Ind. Eng. Chem. Fundam. 1986, 25, 603–612. A. S. BOMMARIUS, D. I. C. WANG, and T. A. HATTON, Xanthine oxidase reactivity in reversed micellar systems: a contribution to the prediction of enzymatic activity in organized media, J. Am. Chem. Soc. 1995, 117, 4515–4523. D. JOBE, in: Reactions in Compartmentalized Liquids, W. KNOCHE and R. SCHOMA¨ CKER (eds.), Springer Verlag, Heidelberg, 1989, pp. 39–51. N. L. KLYACHKO, A. V. LEVASHOV, A. V. PSHEZHETSKII, N. G. BOGDANOVA, I. V. BEREZIN, and K. MARTINEK, Catalysis by
References
52
53
54
55
56
57
58
59
60
61
enzymes entrapped into hydrated surfactant aggregates having lamellar or cylindrical (hexagonal) or ball-shaped (cubic) structure in organic solvents, Eur. J. Biochem. 1986, 161, 149–154. N. H. GRANT and H. E. ALBURN, Acceleration of enzyme reactions in ice, Nature 1966, 212, 194. M. SCHUSTER, A. AAVIKSAAR, and H.-D. JAKUBKE, Enzyme-catalyzed peptide synthesis in ice, Tetrahedron 1990, 24, 8093–8102. M. HAENSLER, N. WEHOFSKY, S. GERISCH, J. D. WISSMANN, and H.-D. JAKUBKE, Reverse catalysis of elastase from porcine pancreas in frozen aqueous systems, Biol. Chem. Hoppe Seyler 1998, 379, 71–74. S. GERISCH, G. ULLMANN, K. STUBENRAUCH, and H.-D. JAKUBKE, Enzymatic peptide synthesis in frozen aqueous systems: influence of modified reaction conditions on the peptide yield, Biol. Chem. Hoppe Seyler, 1994, 375, 825–828. G. ULLMANN, M. HAENSLER, W. GRUENDER, M. WAGNER, H. J. HOFMANN, and H.-D. JAKUBKE, Influence of freezeconcentration effect on proteinase-catalysed peptide synthesis in frozen aqueous systems, Biochim. Biophys. Acta 1997, 1338, 253–258. I. GILL and E. VULFSON, Enzymic catalysis in heterogeneous eutectic mixtures of substrates, Trends Biotechnol. 1994, 12, 118–122. A. MILLQVIST-FUREBY, I. S. GILL, and E. N. VULFSON, 1998, Enzymatic transformations in supersaturated substrate solutions: I. A general study with glycosidases, Biotechnol. Bioeng. 60, 190–196. I. GILL and R. VALIVETY, Pilot-scale enzymatic synthesis of bioactive oligopeptides in eutectic-based media, Org. Proc. Res. Dev. 2002, 6, 684–691. M. T. RU, J. S. DORDICK, J. A. REIMER, and D. S. CLARK, 1999, Optimizing the salt-induced activation of enzymes in organic solvents: effects of lyophilization time and water content, Biotechnol. Bioeng. 63, 233–141. P. J. HALLING, U. EICHHORN, P. KUHL, and H. D. JAKUBKE, Thermodynamics of
62
63
64
65
66
67
68
69
solid-to-solid conversion and application to enzymic peptide synthesis, Enzyme Microb. Technol. 1995, 17, 601– 606. U. EICHHORN, A. S. BOMMARIUS, K. DRAUZ, and H.-D. JAKUBKE, Synthesis of dipeptides by suspension-to-suspension conversion via thermolysin catalysis: from analytical to preparative scale, J. Pept. Sci. 1997, 3, 245–251. H. D. JAKUBKE, U. EICHHORN, M. HANSLER, and D. ULLMANN, Non-conventional enzyme catalysis: application of proteases and zymogens in biotransformations, Biol. Chem. Hoppe Seyler 1996, 377, 455–464. M. ERBELDINGER, X. NI, and P. J. HALLING, Effect of water and enzyme concentration on thermolysin-catalyzed solid-to-solid peptide synthesis, Biotechnol. Bioeng. 1998, 59, 68–72. M. ERBELDINGER, X. NI, and P. J. HALLING, Kinetics of enzymatic solid-to-solid peptide synthesis: intersubstrate compound, substrate ratio, and mixing effects, Biotechnol. Bioeng. 1999, 63, 316–321. A. WOLFF, L. ZHU, Y. W. WONG, A. J. STRAATHOF, J. A. JONGEJAN, and J. J. HEIJNEN, Understanding the influence of temperature change and cosolvent addition on conversion rate of enzymatic suspension reactions based on regime analysis, Biotechnol. Bioeng. 1999a, 62, 125–134. M. B. DIENDER, A. J. STRAATHOF, T. van der DOES, M. ZOMERDIJK, and J. J. HEIJNEN, Course of pH during the formation of amoxicillin by a suspension-to-suspension reaction, Enzyme Microb. Technol. 2000, 27, 576–582. A. WOLFF, V. van ASPEREN, A. J. STRAATHOF, and J. J. HEIJNEN, Potential of enzymatic kinetic resolution using solid substrates suspension: improved yield, productivity, substrate concentration, and recovery, Biotechnol. Prog. 1999b, 15, 216–227. C. R. WESCOTT and A. M. KLIBANOV, Solvent variation inverts substrate specificity of an enzyme, J. Am. Chem. Soc. 1993, 115, 1629–1631.
33
34 Biocatalysis in Non-conventional Media
70 E. RUBIO, A. FERNANDEZ-MAYORALES, and A. M. KLIBANOV, Effect of the solvent on enzyme regioselectivity, J. Am. Chem. Soc. 1991, 113, 695–696. 71 R. H. H. van den HEUVEL, J. PARTRIDGE, C. LAANE, P. J. HALLING, and W. J. H. van BERKEL Tuning of the product spectrum of vanillyl-alcohol oxidase by medium engineering, FEBS Lett. 2001, 503, 213–216. 72 S. GOLDMANN and J. STOLTEFUSS, 1, 4-Dihydro-pyridines: effect of chirality and conformation on the calcium-antagonistic and -agonistic effects, Angew. Chem. 1991, 103, 1587–605; Angew. Chem. Int. Ed. Engl. 1991, 30, 1559–1578. 73 Y. HIROSE, K. KARIYA, I. SASAKI, Y. KURONO, H. EBIIKE, and K. ACHIWA, Drastic solvent effect on lipase-catalyzed enantioselective hydrolysis of prochiral 1, 4-dihydropyridines, Tetrahedron Lett. 1992, 33, 7157–7160. 74 G. CARREA, G. OTTOLINA, and S. RIVA, Role of solvents in the control of enzyme selectivity in organic media, Trends Biotechnol. 1995, 13, 63–70. 75 G. CARREA and S. RIVA, Properties and synthetic applications of enzymes in organic solvents, Angew. Chem. Int. Ed. 2000, 39, 2226–2254. 76 M. F. CHAPLIN and C. BUCKE, Enzyme Technology, Cambridge University Press, Cambridge, UK, 1990, p. 229. 77 J. S. DEETZ and J. D. ROZZELL, Enzymic catalysis by alcohol dehydrogenases in organic solvents, Ann. N. Y. Acad. Sci. 1988, 542( Enzyme Eng. 9), 230–234. 78 M. ECKSTEIN, M. SESING, U. KRAGL, and P. ADLERCREUTZ, At low water activity α-chymotrypsin is more active in an ionic liquid than in non-ionic organic solvents, Biotechnol. Lett. 2002a, 24, 867–872. 79 M. ERBELDINGER, A. J. MESIANO, and A. J. RUSSELL, Enzymatic catalysis of formation of Z-aspartame in ionic liquid – an alternative to enzymatic catalysis in organic solvents, Biotechnol. Prog. 2000, 16, 1129–1131.
80 P. A. FITZPATRICK and A. M. KLIBANOV, How can the solvent affect enzyme enantioselectivity? J. Am. Chem. Soc. 1991, 113, 3166–3171. 81 A. S. GHATORAE, G. BELL, and P. J. HALLING, Inactivation of enzymes by organic solvents: new technique with well-defined interfacial area, Biotechnol. Bioeng. 1994, 43, 331–336. 82 S. KAMAT, E. J. BECKMAN, and A. J. RUSSELL, Role of diffusion in nonaqueous enzymology. 1. Theory, Enz. Microb. Technol. 1992b, 14, 265–271. 83 K. MARTINEK, A. V. LEVASHOV, N. KLYACHKO, Yu. L. KHMEL’NITSKII, and I. V. BEREZIN, Micellar enzymology, Eur. J. Biochem. 1986, 155, 453–468. 84 A. MARTY, W. CHULALAKSANANUKUL, J. S. CONDORET, R. M. WILLEMOT, and G. DURAND, Comparison of lipase-catalyzed esterification in supercritical carbon dioxide and in n-hexane, Biotechnol. Lett. 1990, 12, 11–16. 85 A. J. MESIANO, E. J. BECKMAN, and A. J. RUSSELL, 1999, Supercritical biocatalysis, Chem. Rev. 99, 623–634. 86 K. OYAMA, S. IRINO, T. HARADA, and N. HAGI, Enzymic production of aspartame, Ann. N. Y. Acad. Sci. (Enzyme Eng. 7) 1984, 434, 95–98. 87 D. ROCHE, K. PRASAD, and O. REPIC, Enantio-selective acylation of β-aminoesters using penicillin G acylase in organic solvents, Tetrahedron Lett. 1999, 40, 3665–3668. 88 M. T. RU, K. C. WU, J. P. LINDSAY, J. S. DORDICK, J. A. REIMER, and D. S. CLARK, 2001, Towards more active biocatalysts in organic media: increasing the activity of salt-activated enzymes, Biotechnol. Bioeng. 75, 187–196. 89 H. WAJANT, S. FORSTER, A. SPRAUER, F. EFFENBERGER, and K. PFIZENMAIER, Enantioselective synthesis of aliphatic (S)-cyanohydrins in organic solvents using hydroxynitrile lyase from Manihot esculenta, Ann. N. Y. Acad. Sci. 1996, 799, 771–776.
1
Photoproteins as Reporters in Whole-cell Sensing Jessika Feliciano, Patrizia Pasini, Sapna K. Deo, and Sylvia Daunert University of Kentucky, Lexington, USA
Originally published in: Photoproteins in Bioanalysis. Edited by Sylvia Daunert and Sapna K. Deo. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31016-6
1 Introduction
Reporter genes code for proteins that produce a detectable signal, which can be differentiated in a mixture of endogenous intra- or extracellular proteins [1, 2]. They have found applications in studies of transcription control, gene expression, cell-signaling mechanisms, drug target discovery, cell-based analyte biosensing, and environmental monitoring of specific microorganisms [3], among others. Photoproteins can serve as reporters in two different ways: via genetically engineered systems that are selective toward specific analytes or by using a constitutively bioluminescent/fluorescent cell line that measures overall toxicity. This chapter will focus on the applications of photoproteins as reporters in genetically engineered whole-cell sensing systems.
1.1 Biosensors Using Intact Cells
A biosensor combines a biological component (the sensing element), which is responsible for the selectivity of the device, with a detection system (the transducer) for measuring the reaction of the biological component with the substance (analyte) being monitored. The biological component can be an enzyme, an antibody, a receptor, or whole cells, while the biological reaction that is monitored can be the activity of an enzyme, the binding of an analyte to an antibody or receptor, the induction of gene expression within cells, or even cell death [4, 5]. A number of detection systems have been employed to monitor these biological events, including electrochemical, optical, and piezoelectric systems [6, 7]. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Photoproteins as Reporters in Whole-cell Sensing
There are a number of advantages in using intact microorganisms, rather than isolated biological components, in biosensors. Microorganisms are usually more tolerant of assay conditions that would be detrimental to an isolated enzyme or protein. Because microorganisms have mechanisms that help them regulate their internal environment, they are also more tolerant of suboptimal pH and temperatures than purified enzymes and are less likely to be inhibited by solutes in the sample being assayed. Microorganisms are cheaper to use in biosensors because the active biological component does not have to be isolated and unlimited quantities can be prepared relatively inexpensively [8–10]. Because an analyte, such as a pollutant, must be taken up by the microorganism before a response is produced, using intact microorganisms can provide information about the bioavailability of the analyte. Classical analytical methods of detection cannot distinguish between bioavailable pollutants from those that are unavailable. In that regard, sensing systems using intact cells complement the more conventional sensors where isolated biological components are being used. These systems may provide physiologically relevant data and give an indication of the bioavailability of a pollutant, which is especially important when decisions regarding remediation of contaminated environments have to be made [9]. While only the bioavailable amount can be obtained, cell-based biosensors are preferred over other biological biosensors because one can acquire other types of information such as toxicity, mutagenicity, and pharmacological activity. Because of all these characteristics, cell-based biosensing systems have found applications in biotechnology, pharmacology, molecular biology, and clinical and environmental chemistry. Among the advances over the years is the constant wish to improve the systems. Improved high-throughput assays for gene function, automation to enhance screening and capacity, and miniaturization to reduce costs have been developed. Systems for the correction of changes in signal that are due to the physiological state of the cell have also been developed. For a good review on how the bioreporter design can be improved by mutagenizing the sensing and the regulatory proteins involved, readers are encouraged to review Ref. [9]. Several reviews have been written on cell-based biosensors, including by our group in 2000 [10]. This chapter will present a general view of the photoproteins with an emphasis on their application specifically to bacterial whole-cell sensing systems. The most advanced technology from our 2000 review will be highlighted, as well as advances in reporter gene technology in cell-based assays since then. 1.2 Reporter Genes in Genetically Engineered Whole-cell Sensors
Genetically engineered whole-cell sensing systems can be developed by coupling a sensing element, which recognizes the analyte and confers selectivity to the system, with a reporter gene, which produces the detectable signal and determines the sensitivity of the system. In this chapter whole-cell sensors that employ receptors as sensing elements will not be discussed. Such biosensing systems are usually based
1 Introduction 3
Fig. 1 Schematic representation of a whole-cell sensing system based on reporter genes.
on eukaryotic cells, either yeast or mammalian cells, and are routinely used in drug screening assays and for the environmental detection of dioxin- and estrogen-like compounds [11–13]. Instead, we will focus on inducible operon-based whole-cell sensors, in which the sensing element is composed of regulatory proteins that are capable of molecular recognition and promoter regions (O/P) of DNA. In a whole-cell sensing system, the regulatory protein controls the transcription of the fused genes. When the target analyte is present, the regulatory protein is capable of recognizing the analyte and will unbind itself from the operator/promoter region, causing transcription to occur. Thus, when the induction takes place in the presence of the target analyte, the reporter gene is co-expressed along with the other genes of the operon. Consequently, the concentration of the inducer can be quantified by measuring the signal generated by the reporter protein (Fig. 1). When a reporter is being selected, one has to take into consideration which type of application the reporter is needed for, including how sensitive the assay should be, in what concentration range one needs to detect, and the nature of the target analyte. In the case of sensitivity, when the reporter gene codes for an enzyme, internal amplification cascades can be used to increase the sensitivity of the device. The stability of the reporter plays a very important role as well. For instance, green fluorescent protein variants that are unstable have different halflives and are more appropriate for kinetic studies than the red fluorescent protein, which takes days to maturate. A good reporter should be environmentally safe, easy to use, highly sensitive, relatively small in size (for inclusion in the vector), and nontoxic, so that expression is related only to the gene under study and not to overall toxicity (biologically inert and benign). Additionally, the reporter should produce a wide dynamic response, its signal should be distinguishable over background, and the expressed product should be enough so that it can be detected directly or
4 Photoproteins as Reporters in Whole-cell Sensing Table 1 Advantages and disadvantages of photoproteins in whole-cell sensing applications
Reporter protein
Advantages
Disadvantages
Bacterial luciferase
High sensitivity. Does not require addition of a substrate. No endogenous activity in mammalian cells. No light source requirement.
Heat labile and therefore of limited use in mammalian cells. Narrow linear range.
Firefly luciferase
High sensitivity. Broad linear range. No endogenous activity in mammalian cells. High quantum yield. Spectral variants.
Sea pansy luciferase
No endogenous activity in bacterial and mammalian cells. No light source requirement. Its substrate, coelenterazine, is membrane permeable. High sensitivity. No endogenous activity in mammalian cells. No light source requirement. Detection down to subattomole levels. Autofluorescent and thus does not require addition of a substrate or cofactors. Spectral variants. No endogenous homologues in most systems. Stable at biological pH.
Requires addition of a substrate. Requires an aerobic environment and ATP. Requires addition of solubilizers for the substrate to permeate the cell. Flash-type emission. Requires addition of a substrate.
Aequorin
Green fluorescent protein
Requires addition of a substrate and the presence of Ca2+ . Flash-type emission. Moderate sensitivity (no signal amplification). Requires post-translational modification. Background fluorescence from biological systems may interfere. Potential cytotoxicity in some cell types.
indirectly. Table 1 summarizes the advantages and disadvantages of the reporter photoproteins. Details about specifics on each of the reporter proteins have been the subject of reviews elsewhere and will not be discussed here. The following section covers the specific characteristics of each photoprotein when used as a reporter in a cell-based assay.
2 The Luciferases
Bioluminescence is the term used to describe the light produced by chemical excitation within living organisms. In many cases light emission is based on luciferasemediated oxidation reactions. Luciferase is a generic name for an enzyme that catalyzes a light-emitting reaction [14]. Luciferases can be found in both terrestrial and aquatic organisms, including bacteria, algae, fungi, jellyfish, insects, shrimps, squids, and fireflies [15]. The most widely used luciferase genes in reporter gene
2 The Luciferases Table 2 Chemical reactions involving photoproteins
Reporter protein
Catalyzed reaction
Bacterial luciferase
FMNH2 + R-CHO + O2 → FMN + H2 O + RCOOH + hν (490 nm)
Firefly luciferase
Firefly luciferin + O2 + ATP → oxyluciferin + AMP + Pi + hν (550–575 nm)
Renilla luciferase
Renilla luciferase + Coelenterazine + O2 → coelenteramide + CO2 + hν (482 nm)
Aequorin
Coelenterazine + O2 + Ca2+ → coelenteramide + CO2 + hν (469 nm)
Green fluorescent protein
Post-translational formation of an internal chromophore
assays are the bacterial luciferase genes (lux genes of terrestrial Photorhabdus luminescens and marine Vibrio harveyi bacteria) and the eukaryotic luc and ruc genes from firefly (Photinus pyralis) and sea pansy (Renilla reniformis). 2.1 Bacterial Luciferases
Bacterial luciferases have found applications in gene transfer and expression, promoter analysis, gene delivery, imaging of gene expression [16], drug discovery and signaling pathways, etc. However, because they are heat-labile dimeric proteins, their applications in mammalian cell-based systems have been limited [1]. Bacterial luciferase catalyzes the oxidation of a reduced flavin mononucleotide (FMNH2 ) and a long-chain aldehyde to FMN and the corresponding long-chain carboxylic acid, emitting light at 490 nm (Table 2). The blue-green light produced in this reaction has a quantum yield between 0.05 and 0.15 [17]. The lux genes from bacteria luciferases have been isolated and extensively employed in the construction of bioreporters. They are five genes that are organized in the luxCDABE operon: luxA and luxB code for the α and β subunits of bacterial luciferase, while luxC, luxD, and luxE code for the synthesis enzymes for the long-chain aldehyde substrate. These five genes are conserved in all bacterial species identified to date. Depending on the combination of the genes employed, different types of bioluminescent sensing systems can be developed [18]. 2.1.1 luxAB Bioreporters These bioreporters contain only the luxA and luxB genes, which are sufficient for generating the bioluminescent signal; however, because the genes that code for the production of the substrate are not present, the substrate must be supplied to the cell. Typically, decanal is added at some point during the assay. This system has the advantage that one can control the moment when the reaction is triggered and the measurement be taken right after, dramatically reducing the basal
5
6 Photoproteins as Reporters in Whole-cell Sensing
bioluminescence resulting from “leaky” promoters. Various luxAB sensing systems have been developed in bacterial, yeast, insect, nematode, plant, and mammalian cell sensing systems. Nevertheless, the addition of a substrate makes it impractical for deployable applications, and platforms for biosensors have utilized the entire luxCDABE cassette instead because expression of all five genes has the advantage of not requiring addition of a substrate. 2.1.2 luxCDABE Bioreporters These bioreporters contain the full lux operon cassette, thereby allowing for a completely independent bioluminescent system that requires no external addition of a substrate. In this biosensing system, the bioreporter is exposed to the target analyte and a quantitative change in bioluminescence is detected, often within an hour. The rapidity of the assay, its ease of use, and the substrate-free characteristic of this system make luxCDABE bioreporters ideal for real-time, online, field, and highthroughput applications. For that reason, luxCDABE-based bacterial luciferases are the most widely used photoproteins in whole-cell sensing applications. They have been incorporated into a diverse range of platforms and have found applications in environmental and clinical applications, among others. 2.1.3 Naturally Luminescent Bioreporters One of the earliest applications of bacterial luciferase as a reporter was in the Microtox assay [19, 20]. This assay relies on the changes in light production of the natural luminescence of Photobacterium phosphoreum and is used to measure overall toxicity. Similarly, toxin-sensitive cells can be used in biosensors for the nonspecific detection of toxins. When the cells are exposed to critical amounts of toxins, either the metabolic activity of the cells decreases or the cells die. In either case, as the concentration of toxins in the environment increases there is a corresponding decrease in the signal produced by a reporter protein expressed constitutively within the cells. It should be noted that this type of biosensor will respond to any type of substance that is toxic to the cells. Based on this principle, biosensors have been developed for determining the biotoxicity of a sample [21, 22]. 2.2 Eukaryotic Luciferases 2.2.1 Firefly Luciferase Firefly luciferase, encoded by the luc gene, was first isolated from the North American firefly Photinus pyralis [23, 24]. The structure of this luciferase is different from that of bacterial luciferase, and the resulting bioluminescent reaction exhibits other characteristics as well. The oxygen-dependent bioluminescent reaction converts the substrate luciferin to oxyluciferin based on energy transfer from ATP to the substrate to yield AMP, carbon dioxide, and light in the 550–575 nm range. The bioluminescence produced by firefly luciferase is characterized by a peak after 300 ms, after which it exhibits decay. This decay is produced by the product oxyluciferin, which associates with the enzyme and inhibits light production. This can be
3 Aequorin
overcome by addition of coenzyme A in the reaction mix, which will render dissociation of oxyluciferin from the enzyme and prolong emission over several minutes [23, 24]. The bioluminescence resulting from firefly luciferase exhibits the highest quantum yield known for a bioluminescent reaction (0.88, which is about 10 times larger than that of bacterial luciferase) [1]. Like bacterial luciferase, mammalian organisms do not exhibit endogenous firefly luciferase activity. However, the genes required to synthesize the substrate (luciferin) are not present; thus, addition of an exogenous substrate is always required for bioluminescence to occur. There has been controversy in this matter because the substrate cannot be permeated into the cell [25]. The addition of solubilizers such as DMSO has been required to facilitate entry into the cell. An interesting feature of some eukaryotic luciferases is that they can exhibit light in different wavelengths, ranging from 547 nm to 593 nm [26]. This characteristic has been employed only in colony detection on the same plate, but it can be further exploited in multi-analyte detection by fusing each mutant to a different regulatory gene. Firefly luciferase has found applications in whole-cell sensing systems for the detection of heavy metals and aromatic organics. Its high sensitivity (subattomole level) and broad dynamic range (eight orders of magnitude) make this reporter a favorite in mammalian cell-based sensing systems. 2.2.2 Sea Pansy Luciferase The sea pansy Renilla reniformis is a coelenterate that, unlike other bioluminescent organisms, emits a bright green light. In vivo, Renilla luciferase oxidizes its substrate coelenterazine, and after energy transfer to the green fluorescent protein, light is emitted at 509 nm. In vitro, it is represented by the Rluc gene and produces a blue light of 482 nm [27]. Like other luciferases, Renilla luciferase has no endogenous activity in bacteria and thus can find applications in bacteria-based sensing systems. Studies have shown a sensitivity and detection range similar to the firefly luciferase. Despite its major advantage over firefly luciferase that coelenterazine is membrane permeable, it has limited applications in analytical chemistry. This is probably due to a flash half-life of several seconds. The main application of this eukaryotic luciferase has been in combination with firefly luciferase, in a dual luciferase-based reporter system [28]. Here, both eukaryotic luciferases show a linear response to the target analyte: five orders of magnitude for Renilla luciferase and seven orders of magnitude for firefly luciferase. In both cases, substrate addition makes it impractical for field applications.
3 Aequorin
Aequorin is a calcium-binding photoprotein isolated from the jellyfish Aequorea victoria. This protein is comprised of three components: apoaequorin (the precursor of the photoprotein aequorin), a coelenterazine chromophore, and molecular oxygen [29]. Within the protein, it contains three Ca2+ -binding sites that, once occupied, cause the protein to undergo a conformational change, which in turn triggers the
7
8 Photoproteins as Reporters in Whole-cell Sensing
oxidation of coelenterazine to a highly unstable intermediate, coelenteramide. Like Renilla luciferase, it catalyzes the oxidation of coelenterazine to yield a flash blue light of 469 nm that lasts five seconds. This flash-type light emission is accompanied by the release of CO2 , has a quantum yield between 0.15 and 0.20, and requires a Ca2+ concentration of 0.1–10 µM to induce light emission [30]. To date, aequorin is one of the most extensively studied calcium-binding photoproteins. It can be regenerated by dialyzing with EDTA to remove Ca2+ and adding fresh coelenterazine. Among the advantages of using aequorin in reporter gene assays are its high sensitivity and low background. Aequorin can be detected at subattomole levels and is not toxic to various types of cells [31]. Furthermore, a complete absence of endogenous aequorin in mammalian cells can find applications in this type of cellbased sensing systems. Despite these advantages, aequorin is not a popular reporter in cell-based assays. This is probably due to its flash-type light emission (lasting less than five seconds), which requires a luminometer that can initiate and detect the reaction rapidly. In addition, the exogenous requirement of coelenterazine makes this impractical for out-of-the-laboratory applications. Currently, there are two ways of assessing aequorin bioluminescence: detection of aequorin itself or detection of the luminescence triggered by the addition of Ca2+ as the product of the signaling cascade. The latter is more commonly used. Aequorin is widely used as a label in immunoassays and in nucleic acid probe assays, and, because it is a Ca2+ -binding protein, it is used to monitor intracellular levels of free calcium in both prokaryotes and eukaryotes. However, to our knowledge there is only one application in cell-based sensing assays. Rider et al. developed the CANARY (Cellular Analysis and Notification of Antigen Risks and Yields) assay for the detection of pathogenic bacteria and viruses [32, 33]. In this assay, B cells were genetically engineered to express membrane-form antibodies and apoaequorin. The cells were patterned on glass substrates using photolithography and packed into a flow chamber of a microfabricated device. Once the antigen binds, an intracellular signaling cascade is triggered, causing the release of Ca2+ and the activation of the luminescent reaction of aequorin. Detection of pathogenic bacteria and viruses was achieved in less than 1 min with high specificity and sensitivity. An advantage of this assay is that it can be engineered for any target analyte and can be specific to antigens of different pathogens.
4 Fluorescent Proteins
Fluorescent proteins absorb light at a certain maximum wavelength and emit it at a different maximum wavelength, normally a higher one. The green fluorescent protein (GFP) is by far the most popular fluorescent reporter protein in whole cellbased assays. GFP variants and the red fluorescent protein (DsRed) also have found applications in this field. Spectral properties of fluorescent proteins are reported in Table 3.
4 Fluorescent Proteins Table 3 Excitation and emission maxima of fluorescent photoproteins
Reporter protein
Green fluorescent protein (GFP) Enhanced green fluorescent protein (EGFP) Green fluorescent protein UV (GFPuv) Blue fluorescent protein (BFP/EBFP) Cyan fluorescent protein (CFP/ECFP) Yellow fluorescent protein (YFP/EYFP) Red fluorescent protein (DsRed)
Excitation kmax (nm)
Emission kmax (nm)
395 488 395 380 433 513 558
509 509 509 440 475 527 583
4.1 Green Fluorescent Protein
The green fluorescent protein, encoded by the gfp gene, has been isolated and cloned from the jellyfish Aequorea victoria. In vivo the role of GFP is to shift the blue bioluminescence of aequorin to green fluorescence by means of a radiationless energy transfer from aequorin to GFP [34]. The wild-type GFP has an excitation maximum at 395 nm, with a minor peak at 475 nm, and an emission maximum at 509 nm with a small shoulder at 540 nm [35]. The main advantage of GFP as a reporter protein is that it does not require the addition of exogenous substrates or cofactors to produce light; therefore, its use is not limited by the accessibility of substrates [1]. Detection of GFP requires only irradiation by light, which makes GFP the reporter of choice for in situ detection of ana lytes. Induced bacteria expressing GFP can be detected at a single-cell level by using techniques such as epifluorescence microscopy, flow cytometry, or con focal laser scanning microscopy [36]. GFP is a very stable protein, which is advantageous because long-lasting GFP produced from weak promoters or under conditions of low metabolic activity can accumulate in the biosensor cells over time. GFP can be easily expressed in a wide range of organisms, in particular bacteria. This enabled extensive mutational studies of the native protein and the functions of individual amino acids, with the final goal of altering GFP stability and spectral properties (Fig. 2). Variants with improved fluorescence intensity, thermostability, and chromophore folding have been obtained. Shifts within the excitation and emission spectra have also been achieved, thus making available mutants such as the blue, cyan, and yellow variants that can be used for multi-analyte detection purposes [37]. Different strategies have been used to generate variants with shorter half-lives that are suitable for time-dependent induction studies [38]. Limitations to the use of GFP as a reporter include lower detectability compared to other reporter systems. This is due in part to lack of the amplification step occurring with enzyme reaction-based reporters. Additionally, despite the absence of endogenous GFP homologues in most target organisms, the sensitivity of GFP may be compromised by the presence of other fluorescent molecules found in
9
10 Photoproteins as Reporters in Whole-cell Sensing
Fig. 2 Expression of fluorescent reporter proteins in E. coli. Reprinted in part from Ref. [10] with permission from the American Chemical Society.
biological systems, which increase background signal [2]. The need for O2 for folding and maturation also limits application to aerobic conditions. Overall, GFP has unique characteristics that have led to its successful application as a reporter for various analytes [10, 38]. 4.2 Red Fluorescent Protein
The dsRed gene encodes a red fluorescent protein (DsRed) with homology to GFP. The red fluorescent protein has been cloned from coral Discosoma sp. [39]. It has an excitation peak at 558 nm and an emission peak at 583 nm. These spectral properties are suitable for its use as a reporter protein. In fact, cellular autofluorescence is supposed to be lower at longer wavelengths, which should result in lower background signal with DsRed as compared to GFP [40]. However, the use of DsRed as a reporter is somewhat limited by the fact that it has a maturation period of several hours to days [41]. In one study the performance of dsRed was compared to that of other reporter genes (gfp, luc, and luxCDABE) in identical constructs in which the reporter genes were inserted under the control of mercury-responsive regulatory units (mer) [42]. As expected, both fluorescent proteins gave a slow response, with DsRed being much slower than GFP. DsRed was expected to give better sensitivity than GFP because of lower autofluorescence; however, that did not happen with the mer–dsRed construct used in the study.
5 Multiplexing
Many of these reporters have been used in combination. Multiple reporter genes are commonly utilized to provide multi-analyte detection or as viability control. Various combinations have been employed, including using different spectral variants of the same photoprotein and using a combination of a fluorescent photoprotein
6 Applications
and bioluminescent one either within the same construct in the same cell [43, 44], within separate constructs within the same cell, or in different cell lines [45, 46]. For instance, the click beetle has four different luciferases that produce bioluminescence emission at different emission maxima (548 nm, 560 nm, 578 nm, and 590 nm) [26, 47]. If their emission spectra do not overlap, the genes encoding the 548-nm and 590-nm emission could be used in a dual-detection system. Not only would detection of more than one analyte be feasible, but also the assay would provide a much simpler system because only one substrate would be required. When using bioluminescent photoproteins, the signals are simultaneously generated and independently measured because of either different kinetics (half-life) or different emission spectrum. In contrast, the fluorescent photoproteins have to be independently excited and/or measured because of different excitation and emission spectra. We developed a dual-analyte sensing system for the simultaneous detection of β-lactose and L-arabinose as model analytes. Two variants of the green fluorescent protein, BFP2 and GFPuv, were used as the signal-generating reporter proteins. The cells were excited at a given wavelength, and emission for the two reporters was collected at wavelengths characteristic of each reporter. The fluorescence emission thus obtained could be correlated with the amount of sugars present in the sample. The system proved to be highly selective for the two analytes [48]. Dual-reporter reagents using Renilla and firefly luciferase reporters are commercially available. For viability control, we developed a fluorescent dual-reporter system that utilizes two fluorescent proteins with different emission spectra (GFPuv and YFP) within the same genetic construct. One of the fluorescent proteins was used for the analytical signal, while the other was the internal reference of general cell viability. Consequently, any environmental conditions that resulted in a nonspecific effect on the analytical signal could be corrected. Such a system with internal response correction is useful for environmental samples whose complex matrices can affect cell viability [49].
6 Applications
Luciferases are the most widely used of the reporter proteins in whole-cell sensing systems. Unlike fluorescent photoproteins, there is no light scattering from other components in the mixture, as well as no background emission from the media because the media itself is not luminescent. This low background expression along with the enzymatic nature and high quantum efficiency of bioluminescent reactions account for the higher and faster detectability of bioluminescence when compared to fluorescence. Because luminometers do not contain a light source, whole-cell sensing systems based on bioluminescence, specifically bacterial luciferases, have been more widely used in real-sample analysis and field applications. The small number of instrument components required has made possible the construction
11
12 Photoproteins as Reporters in Whole-cell Sensing Table 4 Whole-cell sensing systems using photoproteins as reporters able to respond to dif-
ferent stress factors Stress inducer
Promoter
Stress response
Reporter genes
References
Protein damage Oxidative damage Oxidative damage Membrane synthesis DNA damage
GrpE KatG MicF FabA RecA
Heat shock Hydrogen peroxide Superoxide Fatty acid synthesis SOS
luxCDABE luxCDABE luxCDABE luxCDABE luxCDABE gfpuv
[44, 51–56] [43, 52–54, 56, 57] [52] [51–57] [43, 51–55, 57–60]
of cheaper and more compact devices that are suitable for portability in contrast to fluorescence devices. 6.1 Stress Factors and Genotoxicants
Recombinant bacterial whole-cell sensing systems for the detection of general stress, oxidative stress, and genotoxic compounds have been developed. In these systems, reporter genes are fused to promoters of different stress-response regulons, which are activated by a broad range of compounds. A panel of these bacterial strains is usually employed in environmental monitoring studies [50]. Numerous bacterial strains have been developed that are able to respond to different stressful conditions with the production of a reporter photoprotein. Selected examples of such strains are reported in Table 4. 6.2 Environmental Pollutants
A variety of bacterial whole-cell sensing systems based on the promoter–reporter gene concept have been developed for the specific or selective detection of single pollutants or classes of pollutants. Many microorganisms have evolved the ability to survive in suboptimal conditions, including contaminated environments. Such ability usually relies on genetically encoded resistance systems. In the presence of toxic compounds such as heavy metals and metalloids, particular bacteria can synthesize specific proteins that confer resistance to those substances. The mechanisms of resistance vary. Some microorganisms develop efflux pumps, loss-of-uptake systems, or chemical detoxification systems [61, 62]. Other bacterial strains living in contaminated sites can degrade organic xenobiotics and utilize them as carbon and energy sources [63]. Many of these resistance pathways are inducible, meaning that protein synthesis occurs only as required by the presence of given compounds, which makes their use possible for analytical purposes. The gene sequences of numerous regulatory systems have been cloned and plasmid reporter vectors have been constructed that feature the regulatory sequence in gene fusion with a reporter gene. The pioneering work of Sayler and coworkers, who constructed a whole cell-based
6 Applications Table 5 Whole-cell sensing systems using photoproteins as reporters for the detection of
environmental inorganic and organic analytes Analyte
Reporter genes
Regulatory unit
References
Arsenite, arsenate, antimonite
luc luxAB gfp gfpuv
ars
[67–71]
Cadmium, lead
luc luxAB
cad
[72]
Cadmium, lead, zinc
luc luxCDABE rs-GFP
znt
[73, 74]
Chromate
luxCDABE
chr
[75]
Copper
luxCDABE
copA
[73, 76, 77]
Iron
gfp
pvd
[78]
Mercury, organomercurials
luc
mer
[79–84]
luxAB luxCDABE gfp Nickel
luxCDABE
cnr
[85]
Silver
luxCDABE
copA
[77]
Alkanes
luxAB
alk
[86]
Aromatic compounds
luxCDABE
sep
[87]
Benzene, toluene, ethylbenzene, xylene (BTEX)
luc gfp luxCDABE
xyl tbu tod
[88–90]
Naphtalene, salicylate
luxCDABE
nah
[91]
Phenolic compounds
luxCDABE luxAB
mop dmp
[92] [93]
Polychlorinated biphenyls
luxCDABE
bph
Nitrate
gfp
nar
[94, 95]
Phosphate
luxCDABE
pho
[96]
sensing system for the detection of naphthalene and salicylate [64], prompted the development of a plethora of constructs responsive to organic and inorganic pollutants and including different reporter genes. A number of these constructs, along with the application of whole cell-based biosensors to different environmental samples, have been presented and discussed in recent reviews [65, 66]. Many whole-cell sensing systems for the detection of metals and metalloids have been developed, while smaller numbers of biosensing systems for organic pollutants are available. In Table 5 we list inorganic and organic analytes for which whole cell-based biosensors have been developed, along with the regulatory units and reporter genes used. Whole-cell sensing systems are suitable for the detection of metabolites that are formed along the pollutants degradation pathways. As an example, a whole-cell
13
14 Photoproteins as Reporters in Whole-cell Sensing
biosensor for the detection of hydroxylated polychlorinated biphenyls in biological and environmental samples is currently under development in our laboratory. Although whole-cell sensing systems have been mainly developed for environmental monitoring purposes, detection of biologically relevant molecules, such as sugars, has benefited from the analytical performance of reporter gene-based whole-cell biosensors [48, 97, 98]. Whole-cell sensing systems for the detection of quorumsensing signaling molecules and antibiotics have been also developed. These are described in more detail in the following sections of this chapter. 6.3 Quorum-sensing Signaling Molecules
Quorum sensing is a phenomenon that enables certain bacteria to communicate with each other by producing and responding to secreted signaling molecules in proportion to cell density [99]. This process allows a population of bacteria to regulate expression of specialized genes in response to changes in its size. When cell population reaches a critical size, certain genes are expressed, hence the term quorum sensing. This phenomenon is responsible for many processes, including expression and secretion of virulence factors, formation of biofilms, competence, conjugation, antibiotic production, motility, sporulation, symbiosis, and bioluminescence [99]. Several quorum-sensing regulatory systems have been identified so far. The best-characterized are those of the LuxI/LuxR type found in most gramnegative bacteria. These quorum-sensing circuits use N-acyl-homoserine lactones (AHLs) as signaling molecules and contain regulatory proteins with homology to the regulatory proteins LuxI and LuxR, which control bioluminescence in the marine bacterium Vibrio fischeri [100]. The gene sequences of some of these regulatory systems have been cloned and inserted in plasmid reporter vectors, thus allowing the development of whole-cell sensing systems for the detection of quorum-sensing signaling molecules. Whole cell-based sensors that contain the luxCDABE cassette as a reporter in gene fusion with quorum-sensing regulatory sequences have been employed to study structure-activity relationships of AHLs [101]. Such biosensing systems have limits of detection in the nanomolar range. They have been applied successfully to the study of AHLs in sputum of cystic fibrosis patients with bacterial lung infections [102]. Whole-cell sensing systems may also represent a tool to identify AHL-producing bacteria by detecting AHLs in their cell-free culture supernatants. Using this approach, it has been shown that Porphyromonas gingivalis, a gramnegative bacterium implicated in the etiology of human periodontal disease, actually regulates the production of virulence factors by quorum sensing but does not utilize AHL-mediated systems [103]. In another study, it has been reported that plant growth-promoting Pseudomonas putida produces cyclic dipeptides that potentially cross talk with the LuxI/LuxR quorum-sensing system [104]. In general, these biosensors can be used as a preliminary screening tool to identify active samples, which can be then subjected to mass spectrometric and NMR analysis in order to determine the chemical identity of individual molecules.
6 Applications
Recently, the molecular mechanisms of bacterial quorum sensing have been proposed as new drug target. In fact, the pharmacological inhibition of quorum sensing for the treatment of bacterial infections may be an alternative to traditional antibiotic treatments, whose therapeutic efficacy is currently decreasing as a result of the occurrence of multiple antibiotic-resistant pathogenic bacteria [105]. Although physical-chemical methods for the detection of AHLs have been developed [106], detailed investigations of quorum-sensing inhibition by methods such as HPLC, GC-MS, or TLC are not practical for routine screening. Whole-cell sensing systems are capable of easily, rapidly, and inexpensively investigating large numbers of compounds and thus are suitable for high-throughput screening of quorum-sensing inhibitors as potential antimicrobial drugs. A GFP-based whole-cell sensing system was developed to perform single-cell analysis and online studies of AHL-mediated communication among bacteria [107]. In this application, a gfp mutant that encodes a protein with a relatively short half-life was used in order to monitor variations in AHL concentrations in real time. Such a biosensor represents a powerful tool for in vivo studies of cell communication and for in vivo efficacy studies of quorum-sensing inhibitors. Indeed, it has been used to detect the production of AHLs from Pseudomonas aeruginosa in a mouse infection model [108]. A similar biosensor allowed detection of AHL production in laboratorybased P. aeruginosa biofilms and showed the ability of a synthetic compound to penetrate microcolonies and block quorum-sensing signaling in most biofilm cells [109]. GFP-based whole-cell sensing systems combined with flow cytometry analysis have been successfully employed for in situ detection of AHLs in soil [110]. 6.4 Antibiotics
Antibiotics belonging to the tetracycline family are extensively used in the therapy and prophylactic control of bacterial infections in human and veterinary medicine and as food additives for growth promotion in the farming industry. Intensive use of tetracyclines has led to widespread antibiotic resistance in bacterial species. The resistance mechanism genes are located in plasmids that can be efficiently transferred from one strain to another. In addition to the development of drug-resistant pathogens, the use of tetracycline has been associated with problems such as unacceptable levels of drug residues in food products for human consumption and release of drugs into the environment. Control of usage in animal farming is possible by monitoring antibiotic residues in different biological samples. Conventional methods for the detection of tetracycline residues include microbial inhibition tests, immunoassays, and chromatographic methods. Recently, whole-cell sensing systems have been proposed as a sensitive, simple, fast, and inexpensive method for measuring tetracyclines in biological fluids and food samples [111]. A bioluminescent bacterial strain containing the bacterial luciferase operon luxCDABE under the control of the tetracycline-responsive element, which is part of the regulation unit of the tetracycline resistance factor, has been developed. It allows for the specific detection of different clinically relevant tetracyclines, with limits of
15
16 Photoproteins as Reporters in Whole-cell Sensing
detection at picomole levels and an induction time of 90 min [112]. Moreover, freeze-dried cells can detect tetracyclines as sensitively as freshly cultivated cells, thus envisioning the possibility of on-site use. The developed whole-cell sensing system has been tested on different biological samples. It has proved to be suitable to screen veterinary serum samples for tetracycline residues in real time [113] and to detect tetracyclines in raw bovine milk samples below the official limits set by the European Union and the U.S. Food and Drug Administration [114]. The biosensing strain was also applied to the detection of traces of tetracyclines in fish, after simple sample preparation. In this case, tetracycline residues were detected below official limits, and the results correlated with those obtained by conventional HPLC [115]. In another study, a GFP-based whole-cell sensor system for in situ detection of tetracyclines was developed. The biosensor was used for qualitative detection of oxytetracycline production by the bacterium Streptomyces rimosus in soil microcosms [116]. The tetracycline-induced GFP-producing biosensor cells were detected by using flow cytometry and fluorescence-activated cell-sorting (FACS) analysis. This study showed the biosensor potential for microbial ecology studies. The same sensor strain was employed in a separate study for in vivo detection and quantification of tetracyclines in rat intestine [117]. Bioavailable tetracycline concentration within the bacterial growth habitat of the intestine proved to be proportional to the intake concentration, but significantly lower. This finding, made possible by the use of the whole-cell sensing system, may help to clarify and optimize antimicrobial therapy in the intestinal environment.
7 Technological Advances
An emerging trend of research in luminescent whole-cell biosensors is moving toward integration of the bacterial reporter strain into the appropriate detection device. This goal is achieved mainly by immobilizing the cells onto optical fiber tips. Different immobilization methods that utilize membranes or alginate, solid agar, and sol-gel matrices have been developed and optimized. The obtained fiber-optic, whole-cell sensors have been proposed as self-contained, disposable biomonitoring devices for environmental analysis [59, 118–121]. When coupled to a portable photon-counting system, they allow performance of measurements in the field. Additionally, the use of multiple fibers enables multiple detection [122]. An interesting application of optical fiber led to the development of a high-density cell assay platform [123]. Here, fluorescent reporter cells were placed into ordered microwell arrays that were fabricated on an optical imaging fiber. Fluorescence signals from individual cells were detected and spatially located using an imaging detector. The main advantage of this cell array technology is the simultaneous analysis of multiple responses from a large number of different strains, thus showing potential for high-throughput screening applications. A different approach to integration is the implementation of chip-based systems. An example of a whole-cell biochip is the bioluminescent bioreporter
References
integrated circuit (BBIC), in which recombinant bioreporter cells are interfaced with an integrated circuit. In this application, the cells are either immobilized or encapsulated to integrate them into the circuit. Alternatively, a flow system is employed to bring the cell bioreporters in proximity with the photodiode-containing chip of a microluminometer optimized for very low-level luminescence detection [124, 125]. Further developments in whole-cell sensing aim toward system miniaturization, with consequent implementation of high-density assay platforms, including 96and 384-well microtiter plates, microarrays, and microfluidics devices. The main advantage of these analytical formats is that they allow for high-throughput screening of samples. Not only can large numbers of samples be analyzed, but also large amounts of information from each sample can be assessed simultaneously and in a relatively short time. Additionally, miniaturized systems allow decreased reagent consumption, sample volume, and analysis time. Several whole cell-based assay systems that use multi-well platforms have been developed. Reporter cells can be suspended in culture media within the wells [126], immobilized in agar matrices at the bottom of single wells [127], or encapsulated in sol-gel matrices that form thick silicate films [128]. The main advantage of immobilized cells is that, while maintaining the analytical characteristics of free cells, they can be used under continuous flow conditions for on-line monitoring and as multiple-use sensing elements. In that regard, the optimization of deposition and immobilization techniques can contribute significantly to the advancement of whole-cell sensing technology [129]. An example of whole-cell microarray is the Lux-Array, which includes hundreds of E. coli reporter strains, high-density printed to membranes on agar plates [130]. The system was developed for genome-wide transcriptional analysis by fusing a collection of E. coli functional promoters to a reporter gene. However, it shows the potential for assembling an array of reportersensing strains to be used for analytical purposes. Recently, we took advantage of microfluidics to develop a miniaturized whole cell-based sensing system [71]. A recombinant reporter strain for the detection of arsenite/antimonite was adapted to a microcentrifugal microfluidics platform in the shape of a compact disk. Centrifugal force controlled valving, reagent flow, and mixing and drove fluids through microfabricated channels and reservoirs. The force was generated by a motor that spun the disk at programmed velocities. Fluorescence from the reporter was detected using a fiber-optic probe. Despite miniaturization, the biosensing system retained its analytical performance and was able to perform quantitative analysis within short response times. Moreover, the developed system has the potential for portability and applicability in the field.
Acknowledgments
This work was supported in part by the National Institute of Environmental Health Sciences (Grant P42 ES 007380) and the National Science Foundation (Grant CHE0416553).
17
18 Photoproteins as Reporters in Whole-cell Sensing
References 1 NAYLOR, L. H. Reporter gene technology: the future looks bright. Biochem. Pharmacol. 1999, 58, 749–757. 2 WOOD, K. V. Marker proteins for gene expression. Curr. Opin. Biotechnol. 1995, 6, 50–58. 3 JANSSON, J. K. Marker and reporter genes: illuminating tools for environmental microbiologists. Curr. Opin. Microbiol. 2003, 6, 310–316. 4 BOUSSE, L. Whole cell biosensors. Sens. Actuators B Chem. 1996, 34, 270–275. 5 PANCRAZIO, J. J., WHELAN, J. P., BORKHOLDER, D. A., MA, W., STENGER, D. A. Development and application of cell-based biosensors. Ann. Biomed. Eng. 1999, 27, 697–711. 6 Biosensors Fundamentals and Applications (TURNER, A. P. F., KARUBE, I., WILSON, G. S., Eds.), Oxford University Press: Oxford, UK, 1987. 7 Biosensor Principles and Applications (BLUM, L. J., COULET, P. R., Eds.), Marcel Dekker: New York, 1991. 8 D’SOUZA, S. F. Microbial biosensors. Biosens. Bioelectron. 2001, 16, 337–353. 9 van der MEER, J. R., TROPEL, D., JASPERS, M. Illuminating the detection chain of bacterial bioreporters. Environ. Microbiol. 2004, 6, 1005–1020. 10 DAUNERT, S., BARRETT, G., FELICIANO, J. S., SHETTY, R. S., SHRESTHA, S., SMITH-SPENCER, W. Genetically engineered whole-cell sensing systems: coupling biological recognition with reporter genes. Chem. Rev. 2000, 100, 2705–2738. 11 DURICK, K., NEGULESCU, P. Cellular biosensors for drug discovery. Biosens. Bioelectron. 2001, 16, 587–592. 12 WINDAL, I., DENISON, M. S., BIRNBAUM, L. S., Van WOUWE, N., BAEYENS, W., GOEYENS, L. Chemically activated luciferase gene expression (CALUX) cell bioassay analysis for the estimation of dioxin-like activity: critical parameters of the CALUX procedure that impact assay results. Environ. Sci. Technol. 2005, 39, 7357–7364. 13 ROGERS, J. M., DENISON, M. S. Recombinant cell bioassays for
14
15
16
17 18
19
20
21
22
23
endocrine disruptors: development of a stably transfected human ovarian cell line for the detection of estrogenic and anti-estrogenic chemicals. In Vitr. Mol. Toxicol. 2000, 13, 67–82. BALDWIN, T. O., CHRISTOPHER, J. A., RAUSHEL, F. M., SINCLAIR, J. F., ZIEGLER, M. M., FISHER, A. J., RAYMENT, I. Structure of bacterial luciferase. Curr. Opin. Struct. Biol. 1995, 5, 798–809. MEIGHEN, E. A. Molecular biology of bacterial bioluminescence. Microbiol. Rev. 1991, 55, 123–142. RUTTER, G. A., KENNEDY, H. J., WOOD, C. D., WHITE, M. R. H., TAVARE, J. M. Real-time imaging of gene expression in single living cells. Chem. Biol. 1998, 5, R285–R290. MEIGHEN, E. Bioluminescence, Bacterial. Encyclo. Microbiol. 1992, 1, 309–319. BILLARD, P., DUBOW, M. S. Bioluminescence-based assays for detection and characterization of bacteria and chemicals in clinical laboratories. Clin. Biochem. 1998, 31, 1–14. BULICH, A. A practical and reliable method for monitoring the toxicity of aquatic samples. Process Biochem. 1982, 17, 45–47. HERMENS, J., BUSSER, F., LEEUWANGH, P., MUSCH, A. Quantitative structure-activity relationships and mixture toxicity of organic chemicals in Photobacterium phosphoreum: the Microtox test. Ecotoxicol. Environ. Saf. 1985, 9, 17–25. GUPTA, R. K., PATTERSON, S. S., RIPP, S., SIMPSON, M. L., SAYLER, G. S. Expression of the Photorhabdus luminescens lux genes (luxA, B, C, D, and E) in Saccharomyces cerevisiae. FEMS Yeast Res. 2003, 4, 305–313. KELLY, C. J., TUMSAROJ, N., LAJOIE, C. A. Assessing wastewater metal toxicity with bacterial bioluminescence in a bench-scale wastewater treatment system. Water Res. 2004, 38, 423–431. DEWET, J. R., WOOD, K. V., DELUCA, M., HELINSKI, D. R., SUBRAMANI, S. Firefly luciferase gene – structure and
References
24
25
26
27
28
29
30
31
32
expression in mammalian cells. Mol. Cell Biol. 1987, 7, 725–737. DEWET, J. R., WOOD, K. V., HELINSKI, D. R., DELUCA, M. Cloning of firefly luciferase cDNA and the expression of active luciferase in Escherichia coli. Proc. Natl. Acad. Sci. USA 1985, 82, 7870–7873. CRAIG, F. F., SIMMONDS, A. C., WATMORE, D., MCCAPRA, F., WHITE, M. R. Membrane-permeable luciferin esters for assay of firefly luciferase in live intact cells. Biochem. J. 1991, 276 (Pt 3), 637–641. WOOD, K. V., LAM, Y. A., SELIGER, H. H., MCELROY, W. D. Complementary DNA coding click beetle luciferases can elicit bioluminescence of different colors. Science 1989, 244, 700–702. LORENZ, W. W., MCCANN, R. O., LONGIARU, M., CORMIER, M. J. Isolation and expression of a cDNA encoding Renilla reniformis luciferase. Proc. Natl. Acad. Sci. U S A 1991, 88, 4438–4442. STABLES, J., SCOTT, S., BROWN, S., ROELANT, C., BURNS, D., LEE, M. G., REES, S. Development of a dual glow-signal firefly and Renilla luciferase assay reagent for the analysis of g-protein coupled receptor signalling. J. Recept. Signal Transduct. Res. 1999, 19, 395–410. INOUYE, S., NOGUCHI, M., SAKAKI, Y., TAKAGI, Y., MIYATA, T., IWANAGA, S., TSUJI, F. I. Cloning and sequence analysis of cDNA for the luminescent protein aequorin. Proc. Natl. Acad. Sci. USA 1985, 82, 3154–3158. OHMIYA, Y., HIRANO, T. Shining the light: the mechanism of the bioluminescence reaction of calcium-binding photo-proteins. Chem. Biol. 1996, 3, 337–347. SHIMOMURA, O., JOHNSON, F. H., SAIGA, Y. Extraction, purification and properties of aequorin, a bioluminescent protein from the luminous hydromedusan, Aequorea. J. Cell Comp. Physiol. 1962, 59, 223–239. RIDER, T., PETROVICK, M., YOUNG, A., SMITH, L., MATHEWS, R., PALMACCI, S., HOLLIS, M., CHEN, J. NASA/NCI Biomedical Imaging Symposium, Bethesda, MD, 1999.
33 RIDER, T. H., PETROVICK, M. S., NARGI, F. E., HARPER, J. D., SCHWOEBEL, E. D., MATHEWS, R. H., BLANCHARD, D. J., BORTOLIN, L. T., YOUNG, A. M., CHEN, J. Z., HOLLIS, M. A. A B cell-based sensor for rapid identification of pathogens. Science 2003, 301, 213–215. 34 PHILLIPS, G. N. Jr. Structure and dynamics of green fluorescent protein. Curr. Opin. Struct. Biol. 1997, 7, 821–827. 35 WELSH, S., KAY, S. A. Reporter gene expression for monitoring gene transfer. Curr. Opin. Biotechnol. 1997, 8, 617–622. 36 HANSEN, L. H., SØRENSEN, S. J. The use of whole-cell biosensors to detect and quantify compounds or conditions affecting biological systems. Microb. Ecol. 2001, 42, 483–494. 37 LEWIS, J. C., DAUNERT, S. Photoproteins as luminescent labels in binding assays. Fresenius J. Anal. Chem. 2000, 366, 760–768. 38 MARCH, J. C., RAO, G., BENTLEY, W. E. Biotechnological applications of green fluorescent protein. Appl. Microbiol. Biotechnol. 2003, 62, 303–315. 39 MATZ, M. V., FRADKOV, A. F., LABAS, Y. A., SAVITSKY, A. P., ZARAISKY, A. G., MARKELOV, M. L., LUKYANOV, S. A. Fluorescent proteins from nonbioluminescent Anthozoa species. Nat Biotechnol. 1999, 17, 969–973. 40 DAVEY, H. M., KELL, D. B. Flow cytometry and cell sorting of heterogeneous microbial populations: the importance of single-cell analyses. Microbiol. Rev. 1996, 60, 641–696. 41 GROSS, L. A., BAIRD, G. S., HOFFMAN, R. C., BALDRIDGE, K. K., TSIEN, R. Y. The structure of the chromophore within DsRed, a red fluorescent protein from coral. Proc. Natl. Acad. Sci. USA 2000, 97, 11990–11995. 42 HAKKILA, K., MAKSIMOW, M., KARP, M., VIRTA, M. Reporter genes lucFF, luxCDABE, gfp, and dsred have different characteristics in whole-cell bacterial sensors. Anal. Biochem. 2002, 301, 235–242. 43 MITCHELL, R. J., GU, M. B. Construction and Characterization of Novel Dual Stress-Responsive Bacterial Biosensors. Biosens. Bioelectron. 2004, 19, 977–985.
19
20 Photoproteins as Reporters in Whole-cell Sensing
44 MOLINA, A., CARPEAUX, R., MARTIAL, J. A., MULLER, M. A transformed fish cell line expressing a green fluorescent protein-luciferase fusion gene responding to cellular stress. Toxicol. In Vitro 2002, 16, 201–207. 45 SAGI, E., HEVER, N., ROSEN, R., BARTOLOME, A. J., PREMKUMAR, J. R., ULBER, R., LEV, O., SCHEPER, T., BELKIN, S. Fluorescence and bioluminescence reporter functions in genetically modified bacterial sensor strains. Sens. Actuators B Chem. 2003, 90, 2–8. 46 BAUMSTARK-KHAN, C., KHAN, R. A., RETTBERG, P., HORNECK, G. Bacterial Lux-Fluoro test for biological assessment of pollutants in water samples from urban and rural origin. Anal. Chim. Acta 2003, 487, 51–60. 47 WOOD, K. V. The chemical mechanism and evolutionary development of beetle bioluminescence. Photochem. Photobiol. 1995, 62, 662–673. 48 SHRESTHA, S., SHETTY, R. S., RAMANATHAN, S., DAUNERT, S. Simultaneous Detection of Analytes Based on Genetically Engineered Whole Cell Sensing Systems. Anal. Chim. Acta. 2001, 444, 251–260. 49 MIRASOLI, M., FELICIANO, J., MICHELINI, E., DAUNERT, S., RODA, A. Internal response correction for fluorescent whole-cell biosensors. Anal. Chem. 2002, 74, 5948–5953. 50 PEDAHZUR, R., POLYAK, B., MARKS, R. S., BELKIN, S. Water toxicity detection by a panel of stress-responsive luminescent bacteria. J. Appl. Toxicol. 2004, 24, 343–348. 51 GU, M. B., GIL, G. C., KIM, J. H. Enhancing the sensitivity of a two-stage continuous toxicity monitoring system through the manipulation of the dilution rate. J. Biotechnol. 2002, 93, 283–288. 52 PREMKUMAR, J. R., LEV, O., MARKS, R. S., POLYAK, B., ROSEN, R., BELKIN, S. Antibody-based immobilization of bioluminescent bacterial sensor cells. Talanta 2001, 55, 1029–1038. 53 KIM, B. C., GU, M. B. A bioluminescent sensor for high throughput toxicity classification. Biosens. Bioelectron. 2003, 18, 1015–1021.
54 CHOI, S. H., GU, M. B. A portable toxicity biosensor using freeze-dried recombinant bioluminescent bacteria. Biosens. Bioelectron. 2002, 17, 433–440. 55 GU, M. B., GIL, G. C. A multi-channel continuous toxicity monitoring system using recombinant bioluminescent bacteria for classification of toxicity. Biosens. Bioelectron. 2001, 16, 661–666. 56 CHOI, S. H., GU, M. B. Toxicity biomonitoring of degradation byproducts using freeze-dried recombinant bioluminescent bacteria. Anal. Chim. Acta 2003, 481, 229–238. 57 KIM, B. C., PARK, K. S., KIM, S. D., GU, M. B. Evaluation of a high throughput toxicity biosensor and comparison with a Daphnia magna bioassay. Biosens. Bioelectron. 2003, 18, 821–826. 58 GU, M. B., MIN, J., LAROSSA, R. A. Bacterial bioluminescent emission from recombinant Escherichia coli harboring a reca::luxcdabe fusion. J. Biochem. Biophys. Methods 2000, 45, 45–56. 59 POLYAK, B., BASSIS, E., NOVODVORETS, A., BELKIN, S., MARKS, R. S. Bioluminescent whole cell optical fiber sensor to genotoxicants: system optimization. Sens. Actuators B Chem. 2001, 74, 18–26. 60 BARTOLOME, A. J., ULBER, R., SCHEPER, T., SAGI, E., BELKIN, S. Genotoxicity monitoring using a 2D-spectroscopic GFP whole cell biosensing system. Sens. Actuators B Chem. 2003, 89, 27–32. 61 SILVER, S. Genes for all metals – a bacterial view of the Periodic Table. The 1996 Thom Award Lecture. J. Ind. Microbiol. Biotechnol. 1998, 20, 1–12. 62 ROSEN, B. P. Transport and detoxification systems for transition metals, heavy metals and metalloids in eukaryotic and prokaryotic microbes. Comp. Biochem. Physiol. A Mol. Integr. Physiol. 2002, 133, 689–693. 63 PARALES, R. E., HADDOCK, J. D. Biocatalytic degradation of pollutants. Curr. Opin. Biotechnol. 2004, 15, 374–379. 64 BURLAGE, R. S., SAYLER, G. S., LARIMER, F. Monitoring of naphthalene catabolism by bioluminescence with nah-lux transcriptional fusions. J. Bacteriol. 1990, 172, 4749–4757.
References 65 BELKIN, S. Microbial whole-cell sensing systems of environmental pollutants. Curr. Opin. Microbiol. 2003, 6, 206–212. 66 GU, M. B., MITCHELL, R. J., KIM, B. C. Whole-cell-based biosensors for environmental biomonitoring and application. Adv. Biochem. Engin./Biotechnol. 2004, 87, 269–305. 67 RAMANATHAN, S., SHI, W., ROSEN, B. P., DAUNERT, S. Sensing antimonite and arsenite at the subattomole level with genetically engineered bioluminescent bacteria. Anal. Chem. 1997, 69, 3380–3384. 68 TAURIAINEN, S., KARP, M., CHANG, W., VIRTA, M. Recombinant luminescent bacteria for measuring bioavailable arsenite and antimonite. Appl. Environ. Microbiol. 1997, 63, 4456–4461. 69 TAURIAINEN, S., VIRTA, M., CHANG, W., KARP, M. Measurement of firefly luciferase reporter gene activity from cells and lysates using Escherichia coli arsenite and mercury sensors. Anal. Biochem. 1999, 272, 191–198. 70 ROBERTO, F. F., BARNES, J. M., BRUHN, D. F. Evaluation of a gfp reporter gene construct for environmental arsenic detection. Talanta 2002, 58, 181–188. 71 ROTHERT, A., DEO, S. K., MILLNER, L., PUCKETT, L. G., MADOU, M. J., DAUNERT, S. Whole-cell-reporter-gene-based biosensing systems on a compact disk microfluidics platform. Anal. Biochem. 2005, 342, 11–19. 72 TAURIAINEN, S., KARP, M., CHANG, W., VIRTA, M. Luminescent bacterial sensor for cadmium and lead. Biosens. Bioelectron. 1998, 13, 931–938. 73 RIETHER, K. B., DOLLARD, M. A., BILLARD, P. Assessment of heavy metal bioavailability using Escherichia coli zntAp::lux and copAp::lux-based biosensors. Appl. Microbiol. Biotechnol. 2001, 57, 712–716. 74 SHETTY, R. S., DEO, S. K., SHAH, P., SUN, Y., ROSEN, B. P., DAUNERT, S. Luminescence-based whole-cell-sensing systems for cadmium and lead using genetically engineered bacteria. Anal. Bioanal. Chem. 2003, 376, 11–17. 75 PEITZSCH, N., EBERZ, G., NIES, D. H. Alcaligenes eutrophus as a bacterial
76
77
78
79
80
81
82
83
84
85
chromate sensor. Appl. Environ. Microbiol. 1998, 64, 453–458. VULKAN, R., ZHAO, F. J., BARBOSA-JEFFERSON, V., PRESTON, S., PATON, G. I., TIPPING, E., MCGRATH, S. P. Copper speciation and impacts on bacterial biosensors in the pore water of copper-contaminated soils. Environ. Sci. Technol. 2000, 34, 5115–5121. STOYANOV, J. V., MAGNANI, D., SOLIOZ, M. Measurement of cytoplasmic copper, silver, and gold with a lux biosensor shows copper and silver, but not gold, efflux by the CopA ATPase of Escherichia coli. FEBS Lett. 2003, 546, 391–394. JOYNER, D. C., LINDOW, S. E. Heterogeneity of iron bioavailability on plants assessed with a whole-cell GFP-based bacterial biosensor. Microbiology 2000, 146(Pt 10) 2435–2445. SELIFONOVA, O., BURLAGE, R., BARKAY, T. Bioluminescent sensors for detection of bioavailable Hg(II) in the environment. Appl. Environ. Microbiol. 1993, 59, 3083–3090. VIRTA, M., LAMPINEN, J., KARP, M. A luminescence-based mercury biosensor. Anal. Chem. 1995, 67, 667–669. LYNGBERG, O. K., STEMKE, D. J., SCHOTTEL, J. L., FLICKINGER, M. C. A single-use luciferase-based mercury biosensor using Escherichia coli HB101 immobilized in a latex copolymer film. J. Ind. Microbiol. Biotechnol. 1999, 23, 668–676. HANSEN, L. H., SØRENSEN, S. J. Versatile biosensor vectors for detection and quantification of mercury. FEMS Microbiol. Lett. 2000, 193, 123–127. RASMUSSEN, L. D., SORENSEN, S. J., TURNER, R. R., BARKAY, T. Application of a mer-lux biosensor for estimating bioavailable mercury in soil. Soil Biol. Biochem. 2000, 32, 639–646. IVASK, A., HAKKILA, K., VIRTA, M. Detection of organomercurials with sensor bacteria. Anal. Chem. 2001, 73, 5168–5171. TIBAZARWA, C., CORBISIER, P., MENCH, M., BOSSUS, A., SOLDA, P., MERGEAY, M., WYNS, L., van der LELIE, D. A Microbial
21
22 Photoproteins as Reporters in Whole-cell Sensing
86
87
88
89
90
91
92
93
biosensor to predict bioavailable nickel in soil and its transfer to plants. Environ. Pollut. 2001, 113, 19–26. STICHER, P., JASPERS, M. C. M., STEMMLER, K., HARMS, H., ZEHNDER, A. J. B., van der MEER, J. R. Development and characterization of a whole-cell bioluminescent sensor for bioavailable middle-chain alkanes in contaminated ground water samples. Appl. Environ. Microbiol. 1997, 63, 4053–4060. PHOENIX, P., KEANE, A., PATEL, A., BERGERON, H., GHOSHAL, S., LAU, P. C. K. Characterization of a new solvent-responsive gene locus in Pseudomonas putida F1 and its functionalization as a versatile biosensor. Environ. Microbiol. 2003, 5, 1309–1327. KOBATAKE, E., NIIMI, T., HARUYAMA, T., IKARIYAMA, Y., AIZAWA, M. Biosensing of benzene derivatives in the environment by luminescent Escherichia coli. Biosens. Bioelectron. 1995, 10, 601–605. APPLEGATE, B. M., KEHRMEYER, S. R., SAYLER, G. S. A chromosomally based tod-luxCDABE whole-cell reporter for benzene, toluene, ethybenzene, and xylene (BTEX) sensing. Appl. Environ. Microbiol. 1998, 64, 2730–2735. STINER, L., HALVERSON, L. J. Development and characterization of a green fluorescent protein-based bacterial biosensor for bioavailable toluene and related compounds. Appl. Environ. Microbiol. 2002, 68, 1962–1971. HEITZER, A., MALACHOWSKY, K., THONNARD, J. E., BIENKOWSKI, P. R., WHITE, D. C., SAYLER, G. S. Optical biosensor for environmental on-line monitoring of naphthalene and salicylate bioavailability with an immobilized bioluminescent catabolic reporter bacterium. Appl. Environ. Microbiol. 1994, 60, 1487–1494. ABD-EL-HALEEM, D., RIPP, S., SCOTT, C., SAYLER, G. S. A luxCDABE-based bioluminescent bioreporter for the detection of phenol. J. Ind. Microbiol. Biotechnol. 2002, 29, 233–237. LAYTON, A. C., MUCCINI, M., GHOSH, M. M., SAYLER, G. S. Construction of a bioluminescent reporter strain to detect
94
95
96
97
98
99
100
101
102
103
polychlorinated biphenyls. Appl. Environ. Microbiol. 1998, 64, 5023–5026. MBEUNKUI, F., RICHAUD, C., ETIENNE, A. L., SCHMID, R. D., BACHMANN, T. T. Bioavailable nitrate detection in water by an immobilized luminescent cyanobacterial reporter strain. Appl. Microbiol. Biotechnol. 2002, 60, 306–312. TAYLOR, C. J., BAIN, L. A., RICHARDSON, D. J., SPIRO, S., RUSSELL, D. A. Construction of a whole-cell gene reporter for the fluorescent bioassay of nitrate. Anal. Biochem. 2004, 328, 60–66. DOLLARD, M. A., BILLARD, P. Whole-cell bacterial sensors for the monitoring of phosphate bioavailability. J. Microbiol. Methods 2003, 55, 221–229. SHETTY, R. S., RAMANATHAN, S., BADR, I. H. A., WOLFORD, J. L., DAUNERT, S. Green fluorescent protein in the design of a living biosensing system for L-arabinose. Anal. Chem. 1999, 71, 763–768. ˜ ONES, MILLER, W. G., BRANDL, M. T., QUIN B., LINDOW, S. E. Biological sensor for sucrose availability: relative sensitivities of various reporter genes. Appl. Environ. Microbiol. 2001, 67, 1308–1317. MILLER, M. B., BASSLER, B. L. Quorum sensing in bacteria. Annu. Rev. Microbiol. 2001, 55, 165–199. FUQUA, C., PARSEK, M. R., GREENBERG, E. P. Regulation of gene expression by cell-to-cell communication: acylhomoserine lactone quorum sensing. Annu. Rev. Genet. 2001, 35, 439–468. WINSON, M. K., SWIFT, S., FISH, L., THROUP, J. P., JØRGENSEN, F., CHHABRA, S. R., BYCROFT, B. W., WILLIAMS, P., STEWART, G. S. A. B. Construction and analysis of luxCDABE-based plasmid sensors for investigating N-acyl homoserine lactone-mediated quorum sensing. FEMS Microbiol. Lett. 1998, 163, 185–192. MIDDLETON, B., RODGERS, H. C., C´AMARA, M., KNOX, A. J., WILLIAMS, P., HARDMAN, A. Direct detection of N-acylhomoserine lactones in cystic fibrosis sputum. FEMS Microbiol. Lett. 2002, 207, 1–7. BURGESS, N. A., KIRKE, D. F., WILLIAMS, P., WINZER, K., HARDIE, K. R., MEYERS, N. L., ADUSE-OPOKU, J., CURTIS, M. A.,
References
104
105
106
107
108
109
110
C´AMARA, M. LuxS-dependent quorum sensing in Porphyromonas gingivalis modulates protease and haemagglutinin activities but is not essential for virulence. Microbiology 2002, 148, 763–772. DEGRASSI, G., AGUILAR, C., BOSCO, M., ZAHARIEV, S., PONGOR, S., VENTURI, V. Plant growth-promoting Pseudomonas putida WCS358 produces and secretes four cyclic dipeptides: cross-talk with quorum sensing bacterial sensors. Curr. Microbiol. 2002, 45, 250–254. HENTZERM, M., GIVSKOV, M. Pharmacological inhibition of quorum sensing for the treatment of chronic bacterial infections. J. Clin. Invest. 2003, 112, 1300–1307. MORIN, D., GRASLAND, B., VALLE´ E-R´EHEL, K., DUFAU, C., HARAS, D. On-line high-performance liquid chromatography-mass spectrometric detection and quantification of N-acyl-homoserine lactones, quorum sensing signal molecules, in the presence of biological matrices. J. Chromatogr. A 2003, 1002, 79–92. ANDERSEN, J. B., HEYDORN, A., HENTZER, M., EBERL, L., GEISENBERGER, O., CHRISTENSEN, B. B., MOLIN, S., GIVSKOV, M. gfp-Based N-acyl homoserine-lactone sensor systems for detection of bacterial communication. Appl. Environ. Microbiol. 2001, 67, 575–585. WU, H., SONG, Z., HENTZER, M., ANDERSEN, J. B., HEYDORN, A., MATHEE, K., MOSER, C., EBERL, L., MOLIN, S., HØIBY, N., GIVSKOV, M. Detection of N-acylhomoserine lactones in lung tissues of mice infected with Pseudomonas aeruginosa. Microbiology 2000, 146, 2481–2493. HENTZER, M., RIEDEL, K., RASMUSSEN, T. B., HEYDORN, A., ANDERSEN, J. B., PARSEK, M. R., RICE, S. A., EBERL, L., MOLIN, S., HØIBY, N., KJELLEBERG, S., GIVSKOV, M. Inhibition of quorum sensing in Pseudomonas aeruginosa biofilm bacteria by a halogenated furanone compound. Microbiology 2002, 148, 87–102. BURMØLLE, M., HANSEN, L. H., OREGAARD, G., SØRENSEN, S. J. Presence of N-acyl homoserine lactones in soil detected by a
111
112
113
114
115
116
117
118
119
whole-cell biosensor and flow cytometry. Microb. Ecol. 2003, 45, 226–236. HANSEN, L. H., SØRENSEN, S. J. Detection and quantification of tetracyclines by whole cell biosensors. FEMS Microbiol. Lett. 2000, 190, 273–278. KORPELA, M. T., KURITTU, J. S., KARVINEN, J. T., KARP, M. T recombinant Escherichia coli sensor strain for the detection of tetracyclines. Anal. Chem. 1998, 70, 4457–4462. KURITTU, J., KARP, M., KORPELA, M. Detection of tetracyclines with luminescent bacterial strains. Luminescence 2000, 15, 291–297. KURITTU, J., L¨ONNBERG, S., VIRTA, M., KARP, M. A group-specific microbiological test for the detection of tetracycline residues in raw milk. J. Agric. Food Chem. 2000, 48, 3372–3377. PELLINEN, T., BYLUND, G., VIRTA, M., NIEMI, A., KARP, M. Detection of traces of tetracyclines from fish with a bioluminescent sensor strain incorporating bacterial luciferase reporter genes. J. Agric. Food Chem. 2002, 50, 4812–4815. HANSEN, L. H., FERRARI, B., SØRENSEN, A. H., VEAL, D., SØRENSEN, S. J. Detection of oxytetracycline production by Streptomyces rimosus in soil microcosms by combining whole-cell biosensors and flow cytometry. Appl. Environ. Microbiol. 2001, 67, 239–244. BAHL, M. I., HANSEN, L. H., LICHT, T. R., SØRENSEN, S. J. In vivo detection and quantification of tetracycline by use of a whole-cell biosensor in the rat intestine. Antimicrob. Agents Chemother. 2004, 48, 1112–1117. IKARIYAMA, Y., NISHIGUCHI, S., KOYAMA, T., KOBATAKE, E., AIZAWA, M., TSUDA, M., NAKAZAWA, T. Fiber-optic-based biomonitoring of benzene derivatives by recombinant E. coli bearing luciferase gene-fused TOL-plasmid immobilized on the fiber-optic end. Anal. Chem. 1997, 69, 2600–2605. POLYAK, B., BASSIS, E., NOVODVORETS, A., BELKIN, S., MARKS, R. S. Optical fiber bioluminescent whole-cell microbial biosensors to genotoxicants. Water Sci. Technol. 2000, 42, 305–311.
23
24 Photoproteins as Reporters in Whole-cell Sensing
120 GIL, G. C., KIM, Y. J., GU, M. B. Enhancement in the sensitivity of a gas biosensor by using an advanced immobilization of a recombinant bioluminescent bacterium. Biosens. Bioelectron. 2002, 17, 427–432. 121 SUN, Y., ZHOU, T., GUO, J., LI, Y. Dark variants of luminous bacteria whole cell bioluminescent optical fiber sensor to genotoxicants. J. Huazhong Univ. Sci. Technolog. Med. Sci. 2004, 24, 507–509. 122 HAKKILA, K., GREEN, T., LESKINEN, P., IVASK, A., MARKS, R., VIRTA, M. Detection of bioavailable heavy metals in EILATox-Oregon samples using whole-cell luminescent bacterial sensors in suspension or immobilized onto fibre-optic tips. J. Appl. Toxicol. 2004, 24, 333–342. 123 BIRAN, I., WALT, D. R. Optical imaging fiber-based single live cell arrays: a high-density cell assay platform. Anal. Chem. 2002, 74, 3046–3054. 124 SIMPSON, M. L., SAYLER, G. S., PATTERSON, G., NIVENS, D. E., BOLTON, E. K., ROCHELLE, J. M., ARNOTT, J. C., APPLEGATE, B. M., RIPP, S., GUILLORN, M. A. An integrated CMOS microluminometer for low-level luminescence sensing in the bioluminescent bioreporter integrated circuit. Sens. Actuators B Chem. 2001, 72, 134–140.
125 BOLTON, E. K., SAYLER, G. S., NIVENS, D. E., ROCHELLE, J. M., RIPP, S., SIMPSON, M. L. Integrated CMOS photodetectors and signal processing for very low-level chemical sensing with the bioluminescent bioreporter integrated circuit. Sens. Actuators B Chem. 2002, 85, 179–185. 126 RODA, A., PASINI, P., MIRASOLI, M., GUARDIGLI, M., RUSSO, C., MUSIANI, M., BARALDINI, M. Sensitive determination of urinary mercury(II) by a bioluminescent transgenic bacteria-based biosensor. Anal. Letters 2001, 34, 29–41. 127 KIM, B. C., GU, M. B. A bioluminescent sensor for high throughput toxicity classification. Biosens. Bioelectron. 2003, 18, 1015–1021. 128 PREMKUMAR, J. R., ROSEN, R., BELKIN, S., LEV, O. Sol-gel luminescence biosensors: encapsulation of recombinant E. coli reporters in thick silicate films. Anal. Chim. Acta 2002, 462, 11–23. 129 BARRON, J. A., ROSEN, R., JONES-MEEHAN, J., SPARGO, B. J., BELKIN, S., RINGEISEN, B. R. Biological laser printing of genetically modified Escherichia coli for biosensor applications. Biosens. Bioelectron. 2004, 20, 246–252. 130 Van DYK, T. K., DEROSE, E. J., GONYE, G. E. LuxArray, a high-density, genomewide transcription analysis of Escherichia coli using bioluminescent reporter strains. J. Bacteriol. 2001, 183, 5496–5505.
1
Catalytic Antibody Technology Donald Hilvert Swiss Federal Institute of Technology, Z¨urich, Switezerland
Originally published in: Catalytic Antibodies. Edited by Ehud Keinan. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30688-6
1 Introduction
Antibody molecules elicited with rationally designed transition-state analogs catalyze numerous reactions, including many that cannot be achieved by standard chemical methods. Although relatively primitive when compared with natural enzymes, these catalysts are valuable tools for probing the origins and evolution of biological catalysis. Mechanistic and structural analyses of representative antibody catalysts, generated with a variety of strategies for several different reaction types, suggest that their modest efficiency is a consequence of imperfect hapten design and indirect selection. Development of improved transition-state analogs, refinements in immunization and screening protocols, and elaboration of general strategies for augmenting the efficiency of first-generation catalytic antibodies are identified as evident, but difficult, challenges for this field. Rising to these challenges and more successfully integrating programmable design with the selective forces of biology will enhance our understanding of enzymatic catalysis. Further, it should yield useful protein catalysts for an enhanced range of practical applications in chemistry and biology.
2 Exploiting Antibodies as Catalysts
Some three decades ago, Jencks suggested that stable molecules resembling the transition state of a reaction might be used as haptens to elicit antibodies with tailored catalytic activities and selectivities [1]. Implementation of this clever idea was made possible by the development of monoclonal antibodies, viable transition-state analogs, and versatile screening assays, and more than 100 reactions have now been Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Catalytic Antibody Technology
successfully accelerated by antibodies [2, 3]. These include pericyclic processes, group transfer reactions, additions and eliminations, oxidations and reductions, aldol condensations, and miscellaneous cofactor-dependent transformations. Because selectivities in these systems generally reflect the structure of the hapten and can rival those of natural enzymes, transformations that cannot be achieved efficiently or selectively via more traditional chemical methods are the subject of much current research [4]. Total synthesis of the natural product epothilone from a chiral intermediate prepared by antibody catalysis [5] and activation of a prodrug in vivo [6] are two recent accomplishments that illustrate the potential of this technology. Nevertheless, practical applications are still the exception rather than the rule. Low catalytic efficiency, in particular, appears to be a significant limitation. While modest rate accelerations are easily achieved, enzyme-like activity remains elusive. The goal of this review is to address this problem in the light of recent structural and mechanistic work. After first considering the intertwined issues of catalytic efficiency and hapten design, a few well-characterized examples of antibodies promoting a diverse set of reactions are used to illustrate how antibody binding energy is exploited for catalysis and to identify factors that limit overall efficiency. Finally, possible strategies for improving these systems are discussed.
3 Catalytic Efficiency
By almost any criterion, natural enzymes are incredibly efficient catalysts. The fastest enzymes are limited by the rate at which they encounter substrate, and even those that have not achieved this level of evolutionary perfection typically have apparent bi-molecular rate constants (kcat /K m ) between 106 and 108 M−1 s−1 , irrespective of the rate of the corresponding uncatalyzed reaction [7]. Rate accelerations over background (kcat /kuncat ) are also very high, in the range 106 to 1012 , and can even reach 1017 in some special cases [7, 8]. These extraordinary effects have been explained by an enzyme’s ability to bind transition states more tightly than ground states [9]. When enzymatic and nonenzymatic reactions occur by the same mechanism and chemistry is rate determining, a simple thermodynamic cycle based on transition-state theory can be used to show that kcat /kuncat = K m /K TS , whereK m andK TS are the dissociation constants for substrate and transition state from the enzyme [10, 11]. The chemical proficiency of an enzyme, as defined by Wolfenden [7], is then given as the ratio (kcat /K m )/kuncat or 1/K TS . This term represents the lower limit of a protein’s affinity for the transition state and varies from 108 to 1023 M−1 for a series of natural enzymes [7, 8]. Although true transition-state affinity may be underestimated if chemistry is not clearly rate limiting or if the enzyme uses a different mechanism than the uncatalyzed reaction, this term provides a useful measure of catalytic power for the purpose of discussion.
3 Catalytic Efficiency
One practical consequence of the application of transition-state theory to enzymes has been the design of potent inhibitor molecules through mimicry of structural and electronic features of otherwise ephemeral transition states (for recent reviews, see [12–14]). Another has been the use of such compounds to generate catalytic antibodies [2, 3]. The latter strategy has been found to be broadly useful, as any chemical transformation compatible with a biological milieu is potentially amenable to antibody catalysis so long as an appropriate transition-state analog can be devised. Broad scope and high, programmable catalyst selectivity certainly make this route to tailored enzymes one of the most promising to emerge in the last two decades. In terms of efficiency, however, catalytic antibodies do not yet match their natural counterparts. Published kcat /K m values for the best catalysts are only in the range 102 to 104 M−1 s−1 , well below the limit for diffusion-controlled processes, and rate accelerations are usually between 103 - to 105 -fold over background [15, 16]. Smaller effects are often found, but higher efficiencies are extremely rare. Differences between antibodies and enzymes are readily apparent in Fig. 1, where rate acceleration is plotted against chemical proficiency for a representative set of reactions. Antibody-catalyzed reactions cluster in the lower left quadrant of the plot, corresponding to the lowest activities, whereas enzymatic reactions lie above and to the right, spanning a much greater range of efficiency. Because K m values are roughly comparable in all of these systems, an excellent correlation between rate acceleration and proficiency is observed. In essence, better transitionstate binding translates directly into higher activity. Although the best antibodies approach the efficiency of the least efficient enzymes, it should be noted that the
Fig. 1 Comparison of chemical proficiency, -log (K Ts ), and rate acceleration, kcat /kuncat , for a series of reactions catalyzed by enzymes (filled symbols) and antibodies (open symbols). Rearrangements (, ), hydrolyses (,), decarboxylations (, ), de-
protonations (•,◦) and retroaldol reactions (♦, ) are included. Enzyme data are from [7, 112, 131, 132] and antibody data were taken from [25, 26, 38, 75, 91, 110, 113, 115, 121, 127, 200–202].
3
4 Catalytic Antibody Technology
corresponding reactions often involve conversions of relatively activated substrates (e.g., the hydrolysis of aryl esters). In contrast, enzymes specialize in accelerating extremely slow reactions like the cleavage of amides (t1/2 = 7.3 years) and phosphate diesters (t1/2 = 130 000 years), few of which have yielded to significant antibody catalysis. In a complementary analysis, Stewart and Benkovic [15] found a weak correlation between rate acceleration and the ratio of the equilibrium binding constants of the reaction substrate and the hapten: kcat /kuncat = K m /K TS ≈ K m /K i . In other words, affinity for the transition-state analog roughly approximates affinity for the true transition state, in accord with the basic premise underlying the production of catalytic antibodies. Practically speaking, this means that the search for highperformance catalysts can often be reduced to a search for the best hapten binders – provided the hapten is a reasonably good transition-state mimic. However, association constants for low molecular weight haptens (1/K i = 106 –1010 M−1 ) are generally much smaller than chemical proficiencies of the most effective enzymes. Given K m values of ca. 10−4 M, these affinities suggest that kcat /kuncat will seldom exceed ca. 106 for first generation catalytic antibodies, as found experimentally. For reactions with modest activation barriers, effects of this magnitude may be useful. If product inhibition and protein production pose no further technical hurdles, kinetic resolutions can be achieved, the fate of reactive intermediates controlled and so on. For truly difficult reactions, however, such effects are almost certainly inadequate.
4 Hapten Design
Before turning to a description of specific antibodies, a few words about hapten design are in order. Transition states themselves have fleeting lifetimes and cannot be isolated. Synthesis of effective analogs must therefore draw on our chemical intuition about the conformational, stereochemical, and electronic properties of the reaction under study. Because no stable molecule can reproduce all the characteristics of an actual transition state, design efforts have tended to focus on salient features distinguishing transition state from ground state [12–14]. For instance, changes in hybridization or charge occurring as a reaction proceeds can be mimicked by incorporating different elements or charged groups at appropriate sites within the analog. Conformational constraint can help approximate the reactive geometry of a flexible substrate, while multisubstrate analogs are used to imitate the relative disposition of reactants in a bimolecular process. For reactions that require catalytic functionality within the antibody pocket, more sophisticated strategies appear to be needed. In the “bait-and-switch” approach, charge complementarity between hapten and antibody is exploited to induce appropriately positioned acids, bases and nucleophiles. Alternatively, catalytic residues can be selected directly by irreversible chemical modification when mechanismbased inhibitors [17, 18] are employed as haptens. The latter strategy, dubbed
5 Representative Catalytic Antibodies
“reactive immunization” [19], has the virtue of allowing rational engineering of covalent catalysis. For each new reaction, hapten design must be optimized to maximize the probability of finding an antibody catalyst. Because subtle differences between even the best transition-state analogs and actual transition states almost certainly contribute to lower efficiencies of antibodies compared with enzymes, it is important to understand how instructions implicit in any given hapten design are realized in the complementary immunoglobulin binding pocket. Characterization of successful antibody catalysts at the atomic level currently provides the most useful insights into how binding energy is exploited for catalysis.
5 Representative Catalytic Antibodies 5.1 Proximity Effects
Utilization of binding energy to constrain flexible molecules into reactive conformations or to preorganize reactants for bimolecular reaction is a potentially powerful strategy for accelerating reactions with unfavorable entropies of activation [20, 21]. To test whether antibodies might serve as entropy traps [21], concerted pericyclic reactions requiring neither nucleophilic nor acid-base catalysis have been investigated. 5.1.1 Sigmatropic Rearrangements The biologically important Claisen rearrangement of chorismate to prephenate (Fig. 2) and the abiological oxy-Cope rearrangement (Fig. 3) are typical [3,3]-sigmatropic processes. They proceed via highly ordered, entropically unfavorable, cyclic transition states involving the simultaneous formation of a carbon–carbon bond and cleavage of either a carbon-oxygen or another carbon-carbon bond. For the chorismate rearrangement, the conformationally locked oxabicyclic dicarboxylic acid 1 [22] proved to be a successful hapten (Fig. 2). It has a chair-like geometry very similar to that of the presumed transition state and is an effective inhibitor of natural chorismate mutases [22, 23]. Antibodies that bind 1 catalyze the chorismate rearrangement enantioselectively with rate accelerations (kcat /kuncat ) of 102 to 104 over background [24–27]. For comparison, enzymes accelerate this reaction by a factor of ca. 106 [28]. Spectroscopic and X-ray studies of 1F7 (which achieves a 200-fold rate enhancement) have provided insights into the origins of catalysis in these systems. Transferred nuclear Overhauser effects show that 1F7 binds the flexible chorismate molecule in the diaxial conformation specified by the transition-state analog [29]. Crystallographic data confirm that the induced binding pocket faithfully reflects hapten design [30]. Compound 1 is deeply buried in the complex, and the active site’s overall shape and charge are complementary to a single hapten
5
6 Catalytic Antibody Technology
Fig. 2 (Top) The Claisen rearrangement of chorismate to prephenate catalyzed by antibody 1F7; the flexible chorismate adopts an extended pseudo-diequatorial conformation in solution and must undergo a conformational change to populate the less
stable pseudodiaxial conformer in order for reaction to proceed. (Bottom) The conformationally restricted dicarboxylic acid 1 [22] mimics the transition state of the Claisen rearrangement and was used to elicit catalytic antibody 1F7 [25].
Fig. 3 Antibody-catalyzed oxy-Cope rearrangement [36]. The aldehyde product is trapped with hydroxylamine to give an oxime to prevent time dependent inactivation of the catalyst. The 2,5-diaryl cyclohexanol derivative 2 was used to imitate the structure of the pericyclic transition state.
5 Representative Catalytic Antibodies
Fig. 4 (a) Active site of antibody 1F7 [30], showing interactions with hapten 1. The light and heavy chains are pink and blue, respectively; the hapten is yellow. Note that ArgH95 forms a salt bridge with the ligand’s secondary carboxylate but is too far away to form a hydrogen bond with the ether oxygen. (b) Active site of E. coli chorismate
mutase [203]. Bound 1 is completely inaccessible to solvent and makes numerous contacts with protein residues; hydrogen bonds from Lys39 and Gln88 to the ligand’s ether oxygen are essential for high activity [28]. Graphics were prepared with the programs BobScript [204] and Raster3D [205].
enantiomer (Fig. 4a). Consequently, only the corresponding (–)-isomer of chorismate binds in a conformation appropriate for reaction. The subsequent rearrangement of bound substrate then occurs by the same concerted mechanism as that deduced for the uncatalyzed reaction and for natural chorismate mutases. Nevertheless, 1F7 is likely to be a much poorer entropy trap than mutase enzymes. It exploits many fewer hydrogen bonds and electrostatic interactions for ligand recognition (Fig. 4a,b). It also appears to accommodate charge separation in the transition state less effectively. Isotope effects show that the transition state for the rearrangement is highly polarized, with C–O bond cleavage preceding C–C bond formation [31, 32]. The enzymes stabilize this species electrostatically by placing a cationic residue (either Arg or Lys) near the partially negatively-charged ether oxygen of the breaking C–O bond [28, 33, 34], but 1F7 lacks an analogous feature (Fig. 4a,b). These differences presumably explain the antibody’s 104 -fold lower efficiency. They can be attributed, in large part, to shortcomings in hapten design. While 1 reproduces the geometry of the actual transition state reasonably well, it mimics the polarized character of this high energy species poorly [35]. Shortcomings in hapten design are also likely to account for the modest activity of antibody AZ-28. This antibody was raised against cyclohexanol derivative 2
7
8 Catalytic Antibody Technology
(K d = 17 nM) and catalyzes the oxy-Cope rearrangement of the corresponding 2,5diaryl-3-hydroxy-1,5-hexadiene with a kcat /kuncat of 5300 (Fig. 3) [36]. In this case, disposition of the aryl substituents in the transition state is imitated imperfectly in the stable hapten. Like the chorismate mutase antibody, AZ-28 has been shown by TRNOE measurements to preorganize the normally extended hexadiene substrate into a cyclic conformation so that its termini are in close proximity [37]. In this case, ligand recognition is mediated by extensive van der Waals contacts, π-stacking interactions with the aromatic rings, and hydrogen bonding interactions with the alcohol, all evident in the X-ray structure of the antibody-hapten complex (Fig. 5) [38]. However, the conformation adopted by the substrate at the AZ-28 active site is unlikely to be optimal for reaction. The two aryl substituents at C2 and C5 are key recognition elements (Fig. 5) but are oriented very differently in the hapten, where they are sp3 hybridized and equatorial to the plane of the cyclohexane ring, and the transition state, where they are sp2 hybridized and conjugated with the reacting olefins (Fig. 3). Thus, even though the hapten-induced pocket brings together the ends of the hexadiene substrate, binding energy directed to the peripheral aryl groups
Fig. 5 (a) Stereoview of the AZ-28 active site showing hapten-contacting residues [38]. The 5-phenyl ring of 2 sits deep in the hydrophobic pocket, while the 2-phenyl substituent and linker are near the surface. The hapten’s alcohol forms a hydrogen bond
with HisH96 . Two water molecules in the pocket are depicted as red spheres. (b) GRASP [206] surface representation of AZ28 showing the high degree of shape complementarity between ligand and protein.
5 Representative Catalytic Antibodies
almost certainly imposes physical constraints that preclude effective alignment of the reacting [4π + 2σ ] orbitals. Alterations in antibody structure leading to improved orbital overlap should therefore result in significant increases in catalytic efficiency. This inference is supported by studies of AZ-28’s germline precursor. This antibody binds hapten 2 with 40-fold lower affinity than the mature AZ-28 but is a substantially better catalyst, achieving a 163 000-fold rate acceleration over background [38]. Mutagenesis experiments showed that replacement of SerL34 in the germline sequence with Asn is responsible for both effects [38]. In the crystal structure of AZ28, AsnL34 interacts directly with the cyclohexyl ring of the hapten and is therefore in a position to influence the conformation of the substrate at the active site. Although designed as entropy traps, and despite evident restriction in the conformational freedom of their substrates, neither 1F7 nor AZ-28 lowers the entropy of activation for its reaction. Both have S‡ values that are 10–20 cal K−1 mol−1 less favorable than the corresponding uncatalyzed reactions [25, 37]. This contrasts with some natural chorismate mutases which do reduce the entropy barrier to reaction significantly [39]. Mechanistic interpretations of activation parameters are necessarily uncertain [40], but the unfavorable S‡ values are consistent with the need for substantial conformational change in the bound substrate as the reaction proceeds. However, other factors, including changes in solvation or conformation associated with the antibody, cannot be excluded. More generally, these two examples show how the chemical instructions implicit in hapten structure, including deficiencies with respect to transition-state mimicry, are accurately imprinted on an antibody binding site. Improved transition-state analogs should therefore yield much better catalysts. To obtain more efficient chorismate mu-tase antibodies, for example, haptens containing additional negative charges might be used to elicit the catalytically essential cation in the vicinity of the substrate’s ether oxygen. Similarly, haptens in which the aryl substituents are coplanar with the cy-clohexyl ring should increase the probability of identifying faster catalysts for the oxy-Cope rearrangement of Fig. 3. The sensitivity of the latter reaction to anionic substituent effects [41] could also be drawn on. Haptens containing an appropriately positioned ammonium group might induce an antibody residue capable of deproto-nating the substrate alcohol. Fine-tuning of the first-generation antibodies is also likely to yield substantially better catalysts. Identification of a second chorismate mutase antibody possessing significantly higher activity than 1F7 [26] supports the feasibility of such an undertaking. Plausible strategies for optimizing activity include site-directed mutagenesis or random mutagenesis coupled with in vivo selection. The 1F7 Fab fragment’s ability to replace the missing enzyme in a chorismate mutase-deficient yeast cell line [42] is a promising indication that it will be amenable to directed evolution in the laboratory [43]. Overall, there are many ways to bind any given transition-state analog, only some of which will be effective for catalysis. Indeed, a significant fraction of hapten binders in any given experiment is usually found to be inactive. Hapten affinity rather than catalytic activity drives maturation of the immune response, so mutations can arise that favor tighter hapten binding but are deleterious for catalysis,
9
10 Catalytic Antibody Technology
as seen for AZ-28. Broad screening of antibodies raised to each hapten is therefore necessary to guarantee a representative sampling of the immune response. In the present instance, antibodies other than 1F7 and AZ-28 might be obtained that ultimately prove to be better starting points for optimization. 5.1.2 Cycloadditions Loss of both translational and rotational degrees of freedom should make bimolecular reactions particularly sensitive to proximity effects [20]. Diels-Alder reactions between dienes and dienophiles have been used to test this notion. They typically have high activation entropies in the range −30 to −40 cal K−1 mol−1 [44], reflecting the low probability of bringing together two substrates in an orientation optimal for reaction. The transition state for these concerted cycloadditions is highly ordered and resembles the boat form of the cyclohexene product more closely than is does the starting materials. Antibodies raised against bicyclic compounds that mimic the transition-state geometry have displayed a range of useful catalytic effects, including control over reaction pathway and absolute stereochemistry [45–49]. The general approach to catalysis is exemplified by antibody 1E9, which promotes the [4+2] cycloaddition cycloadditionand N-ethylmaleimide (Fig. 6) with multiple turnovers [45]. The initially formed adduct 3 spontaneously eliminates sulfur dioxide to give N-ethyl tetrachlorophthalimide as the final product after oxidation in situ. The endo hexachloronorbornene derivative 4, an excellent mimic of the intermediate and its flanking transition states, served as the hapten (K d = 2 nM). Because the planar product is structurally so different from 4, it binds 105 -fold less tightly to the induced antibody, effectively minimizing product inhibition. In this case, catalytic efficiency can be estimated as an effective molarity (EM) [50]. EM is the ratio of the pseudo-first-order rate constant for the antibody reaction (kcat ) to the second-order rate constant for the uncatalyzed process (kuncat ). This ratio
Fig. 6 Diels-Alder condensation of tetrachlorothiophene dioxide (TCTD) and N-ethyl maleimide (NEM) yields a high-energy intermediate (3) which eliminates sulfur dioxide. The initially formed product is subsequently
oxidized in situ. The transition state for cycloaddition and chelotropic elimination of SO2 closely resemble the hexachloronorbornene derivative 4 used as a hapten to elicit antibody 1E9 [45].
5 Representative Catalytic Antibodies
gives the nominal concentration of one reactant needed to convert the spontaneous bimolecular reaction into a pseudo-first-order process with a rate equivalent to that achieved in the antibody ternary complex. It is usually interpreted as the entropic advantage of a unimolecular over a bimolecular process, with an upper limit of about 108 M for 1 M standard states [51]. For 1E9, the EM is ca. 103 M [52]. Although much lower than the theoretical limit, this value is significantly higher than EMs reported for other antibody Diels-Alderases, which rarely exceed 10 M [46–49], making 1E9 the most efficient such catalyst described to date. Recent structural work has shown that the 1E9 active site is exactly what one might expect of an antibody that functions as an entropy trap [52]. Extensive van der Waals contacts, π-stacking interactions, and a strategically-placed hydrogen bond to one of the succinimide carbonyl groups create a pocket that is highly complementary to the hapten (Fig. 7a). When complexed, the ligand (excluding the hexanoate linker) is 86% buried. Its fit to the protein is so snug that no interfacial cavities are detectable, even when a probe of 1.2 Å radius is used. Thus, the 1E9
Fig. 7 The active site of Diels-Alderase 1E9 provides a snug, complementary binding surface for its hexachloronorbornene hapten [52]. With the exception of a hydrogen bond between the deeply buried carbonyl group of the ligand and the side chain of AsnH35 (not shown), most of the contacts are
hydrophobic in nature. (b) Antibody 39A11 (see Fig. 8) binds its hapten much more loosely. Note that the “dienophilelike” N-aryl succinimide is significantly better packed than the bicyclo[2.2.2]octene moiety, which serves as a diene surrogate.
11
12 Catalytic Antibody Technology
binding pocket appears ideally suited to the task of preorganizing its diene and dienophile substrates in a reactive complex that closely approximates the transitionstate geometry. Nevertheless, a simple entropy trap mechanism does not appear to be operative. The temperature dependence of kcat and kuncat shows that catalysis by 1E9 is achieved entirely by reducing the enthalpy of activation; the solution and the antibody processes are equally unfavorable entropically, with S‡ values of −22 cal K−1 mol−1 [52]. Nor is the rate acceleration due to a simple medium effect associated with the apolar binding cavity, since the uncatalyzed reaction is 10 times slower in acetonitrile than in water. Catalysis by 1E9 can be explained by enthalpic stabilization of the transition state through an unusually close fit to the apolar binding surface of the antibody active site and a strong hydrogen bond between the side chain of AsnH35 and the maleimide carbonyl. Quantum mechanical calculations indicate that the numerous, energetically favorable van der Waals interactions provide the driving force for binding [52]. Although these interactions distinguish poorly between ground and transition state, they hold the substrates against a relatively unfavorable electrostatic field that becomes substantially more favorable as the transition state is approached because of the increased strength of the hydrogen bond to the dienophile carbonyl. Comparison of 1E9 with another Diels-Alderase, 39-A11 [47], dramatically illustrates the importance of close packing for high efficiency. Antibody 39-A11 was generated with the substituted bicyclo[2.2.2]octene derivative 5, which contains an ethano bridge locking the cyclohexene ring into the requisite boat conformation (Fig. 8). It catalyzes the Diels-Alder reaction between an electron-rich acyclic diene and an N-aryl maleimide to give a cyclohexene derivative, albeit with a relatively low effective molarity of 0.35 M.
Fig. 8 Antibody 39-A11 catalyzes a Diels-Alder reaction between an electron-rich acylic diene and an N-aryl maleimide. It was elicited with the bicyclo[2.2.2]octene hapten 5. The ethano bridge in 5 locks the cyclohexane ring into a boat conformation but has no counterpart in the substrate itself.
5 Representative Catalytic Antibodies
Considering the low dissociation constant reported for the antibody-hapten complex (K d = 10 nM) [47], the fit of the bicyclo[2.2.2]octene to 39-A11 is surprisingly loose (Fig. 7b) [53]. Only 66% of the hapten surface area is buried in the complex. Poor complementarity is indicated by the large cavity volume of 117 Å3 detected between ligand and antibody. The portion of the hapten corresponding to the reacting [4+2] system is particularly poorly packed, whereas peripheral substituents of 5, especially the aryl side chain, appear to be important recognition elements. Moreover, the hapten’s ethano bridge, which has no counterpart in the substrates or the transition state, carves out additional unwanted space within the pocket. Consequently, the bound substrates – particularly the diene, which must bind in the least complementary region of the pocket – are likely to retain considerable degrees of freedom. Low catalytic efficiency is therefore unsurprising. Consistent with this idea, introducing large aromatic groups at positions L91 and L96 to improve packing interactions with the kinetically favored endo transition state results in 5 to 10-fold higher kcat values [54]. Relatively non-polar as it is, the environment of the 39-A11 active site may further erode catalytic efficiency. Like 1E9, 39-A11 provides a hydrogen bond (also from AsnH35 ) to the dienophile carbonyl, but its reaction involves a strong donor diene and acceptor dienophile rather than two electron-deficient addends. The corresponding transition state should therefore be more polar and hence more sensitive to transfer from water than that of the 1E9-catalyzed reaction. Unfortunately, mutagenesis experiments to augment activity by providing additional hydrogen bonds to the dienophile have not been successful [54]. Structurally distinct haptens notwithstanding, 1E9 and 39-A11 are unexpectedly closely related in primary sequence and tertiary structure [52, 53, 55]. Both belong to a family of polyspecific antibodies that exhibit extensive cross-reactivity for hydrophobic ligands containing one or two polar groups. For example, 39-A11 and its germline precursor accommodate a range of structurally diverse compounds [53], while 1E9 and the related progesterone-binding antibody DB3 [56, 57] bind each other’s ligands with affinities only 25-fold to 50-fold lower than their own [55]. The side chain of AsnH35 and conserved hydrophobic interactions seem to be particularly important for achieving recognition. However, docking experiments [52] with 1E9 and DB3 suggest that non-cognate molecules bind randomly in the apolar cavity, whereas specific ligands adopt a single, well-defined binding mode similar to that seen in crystal structures of the corresponding antibody-hapten complexes. Specificity in these systems appears to be conferred by a small number of mutations to the shared scaffold. In the case of 39A11, for instance, substituting Val for Ser at position L91 in the germline sequence accounts for almost all the 40-fold increase in hapten affinity achieved during affinity maturation [53]. Similarly, a somatic mutation in the L3 CDR loop (SerL89 → Phe) and a rare mutation in the antibody framework region (TrpH47 → Leu) substantially alter the shape of the 1E9 combining pocket compared to that of 39-A11 or DB3 [52]. These changes are primarily responsible for its virtually perfect shape complementarity to the transition-state analog. This complementarity appears crucial for reactivity, since DB3 does not detectably accelerate the 1E9 reaction despite its affinity for hapten 4.
13
14 Catalytic Antibody Technology
Given the immune system’s enormous combinatorial diversity (more than 108 antibodies are available in the primary response [58]), it is surprising that similar antibodies were generated in separate immunization experiments with 4, 5 and progesterone. Nor, as will be shown below, is this finding unique. To what extent does utilization of a few restricted sets of antibody germline genes limit the catalytic potential of the immune system? Do these frequently selected scaffolds represent local minima from which it will be difficult to evolve more active catalysts? Poor shape complementarity between 39-A11 and the bicyclo[2.2.2]octene is a caseinpoint. However, site-directed mutagenesis experiments demonstrate that the mature antibody is not an evolutionary dead end. The same ValL91 Tyr change that improves catalysis 10-fold also increases hapten affinity 2.5 times [54], and it should be possible to find additional mutations that further tighten the structure. Because only ten antibodies were screened for activity, we cannot know whether analogous mutations were present in the antibody population induced in response to hapten 5. It is conceivable that they never emerged: a 10 nM dissociation constant for the 39-A11-hapten complex is more than adequate for the immune system’s purposes, so there may be little or no selection pressure to increase affinity beyond that point. As shown by 1E9, the combining pocket can be molded remarkably well under favorable circumstances to achieve nearly perfect shape complementarity with a ligand. Superior fit is not necessarily manifest in tighter binding, since 1E9 and 39-A11 have similar hapten affinities (K d = 2 nM versus 10 nM). Compound 4 is more highly optimized than 5 with respect to transition-state mimicry, however, and important binding interactions in 1E9 are concentrated where they are needed for catalysis, rather than loosely dispersed as in 39-A11. A high degree of complementarity at the site of reaction thus appears to pay off in this case in terms of a more efficient catalytic outcome. Even in the case of 1E9, though, much higher efficiency should be attainable. An analogous enzyme is unavailable for direct comparison (see [59] for evidence regarding possible Diels-Alderases in Nature), but 1E9’s chemical proficiency, defined as (kcat /K diene K dienophile )/kuncat = 1.4 × 107 , is far from what is expected of a fully evolved catalyst. Given an already excellent fit between protein and ligand, it is unlikely that mutation of residues lining the binding cavity will substantially improve complementarity. To optimize electrostatic interactions with the transition state and reduce any remaining degrees of freedom available to the bound substrates, residues distant from the active site will have to be modified. Because such mutations will be difficult to identify by inspection of the protein structure, combinatorial mutagenesis and an efficient screening assay [60] or selection protocol will be needed if 1E9 variants with enhanced properties are to be developed. 5.2 Strain
Substrate destabilization through strain has been proposed as another mechanism for achieving rate accelerations with enzymes [20, 61]. Binding energy can be used to strain molecules in various ways. Destabilization can involve geometric
5 Representative Catalytic Antibodies
distortion of the substrate, electrostatic repulsion between groups of like charge, or desolvation effects. If substrate destabilization is relieved at the transition state, significant reductions can result in the free energy of activation for reaction. Rate accelerations obtained in this way are potentially quite large, limited only by the amount of binding energy available to force the substrate into the destabilizing environment. As with entropic effects, it was predicted that strain mechanisms might be readily exploited for antibody catalysis. 5.2.1 Ferrochelatase Mimics Ferrochelatase is an example of an enzyme that is believed to exploit geometric distortion for catalysis. As the terminal enzyme in heme biosynthesis, it promotes complexation of Fe2+ by protoporphyrin IX [62, 63]. Early work suggested that fer-rochelatase functions by distorting the substrate porphyrin from its preferred planar conformation into a bent structure [62]. This distortion exposes the nitrogen of one of the pyrroles to solvent, thereby facilitating metal ion complexation. In fact, non-planar N-methylated porphyrins are known to chelate metal ions 3 to 5 orders of magnitude faster than their non-alkylated counterparts [62]. They are also potent inhibitors of ferrochelatase [64]. When used as haptens, they have yielded antibodies [65] that catalyze insertion of divalent metal ions into mesoporphyrins with kcat values approaching those of the enzyme (Fig. 9). Ferrochelatase antibody 7G12 has been characterized in some detail. It promotes incorporation of Zn2+ and Cu2+ into mesoporphyrin IX with kcat values of 0.022 s−1 and 0.0069 s−1 , respectively [65]. By way of comparison, the kcat value for recombinant Bacillus subtilis ferrochelatase is ca. 0.4 s−1 for metalation with Fe2+ , Zn2+ and Cu2+ [66]. K m values for the porphyrin are also comparable (50 µM for the antibody and 8 µM for the enzyme). However, whereas substrate metal ions bind tightly to the enzyme (20–170 µM), no evidence for saturation of the antibody has been observed up to concentrations up to 2.5 mM, suggesting that binding of metal ions to the antibody does not contribute to catalysis. Severe inhibition of the antibodycatalyzed reaction by the metalloporphyrin product is a further point of contrast. Mechanistic and structural studies have clarified how the antibody chelatase functions. The crystal structure of 7G12 complexed with its hapten has been solved [67]. The N-methylated porphyrin binds at the junction of the heavy and light chains
Fig. 9 N-methyl mesoporphyrin and the mesoporphyrin metalation reaction catalyzed by antibody 7G12 [65].
15
16 Catalytic Antibody Technology
Fig. 10 Stereoview of the 7G12 binding site [67], showing how bound porphyrin packs against the heavy chain (blue) and exposes one face to solvent. The side chains of heavy chain residues AspH96 and ArgH96 are visible under the surface behind the porphyrin. Interactions with the light chain
(pink) are largely restricted to contacts with the methyl and ethyl substituents of the A and B pyrrole rings of the porphyrin. Although a mixture of N-alkylated porphyrins was used, only the derivative with the A ring methylated appears to bind to the antibody.
(Fig. 10). One face of the ligand makes extensive contacts with VH , while the other is relatively exposed to solvent. Packing interactions from light chain Tyr residues may reinforce the distortion from planarity of pyrrole ring A, which bears the N-methyl group. Replacement of these residues with Ala caused large increases in K m and 10-fold to 40-fold decreases in kcat /K m [67], suggesting that they may play a similar role with the non-methylated substrate. In analogy with the enhanced reaction rates achieved by porphyrin alkylation, a catalytic mechanism involving binding and distortion of the porphyrin by the protein, followed by direct chelation of metal ions from solution, seems plausible. Although its precise role is still unclear, amino acid AspH96 is evidently required for this process [67]. Its carboxylate side chain is directed from the VH domain toward the center of the porphyrin ring with one oxygen roughly equidistant from the porphyrin’s four pyrrole nitrogens (Fig. 10). The other oxygen is fixed in place through a hydrogen bond to ArgH95 . It is conceivable that the carboxylate acts as a base which shuttles protons from the porphyrin during metal ion exchange. This residue is also probably responsible, at least in part, for product inhibition, since axial coordination to the metal ion will anchor the metaloporphyrin to the antibody. Direct experimental evidence for porphyrin distortion by 7G12 has been obtained with resonance Raman spectroscopy [68]. Spectral data show that the antibody
5 Representative Catalytic Antibodies
induces an alternating up-and-down tilting of the pyrrole rings very similar to the distortion produced by porphyrin alkylation. In contrast, yeast ferrochelatase apparently causes all four pyrrole rings to tilt in the same direction in a domed fashion. The enzymatic reaction is regulated allosterically by a metal-dependent protein conformational change. Since the antibody has no metal binding site, the distortion it induces must be brought about entirely by binding interactions between porphyrin and protein. An atomic-level explanation of this effect will require elucidation of the antibody-substrate complex structure. The broad lesson to be derived from these experiments is that substrate destabilization can be a very successful approach to antibody catalysis. Although 7G12 and ferrochelatase perturb the porphyrin structure in different ways, both kinds of distortion appear to be effective for metal ion chelation, yielding kcat values within a factor of 10 of each other. That said, destabilization mechanisms are expected to have little to no effect on kcat /K m [69], the steady-state parameter generally optimized through evolution. By this criterion, 7G12 is substantially more primitive than its natural counterpart. Because chelatases catalyze a bimolecular reaction, flux through the catalyst is limited by the least favorably processed substrate. For both enzyme and antibody, this is the metal ion. Hence, rate enhancements are given by [(kcat /K M2+ )/kuncat ]. Since K M2+ values for 7G12 are at least 103 times larger than those for natural ferrochelatases (assuming the metal ion binds at all), the antibody suffers an equivalent rate disadvantage under practical operating conditions, despite its favorable kcat . Similar considerations apply to the chemical proficiencies of the two catalysts [(kcat /K M2+ K porphyrin )/kuncat ]. The antibody’s transition-state affinity is at least four orders of magnitude lower than that of the enzyme. To improve 7G12, a suitable binding site for metal ions should be constructed, perhaps by extending light chain CDR loops that are near the exposed face of the bound porphyrin. The challenge will be to bring the metal ion into close proximity with the porphyrin without further increasing the active site’s affinity for product. As for the chorismate mutase antibody discussed above, an in vivo selection strategy is feasible. Ferrochelatase-deficient yeast auxotrophs have been reported [70]. Complementing this metabolic deficiency with the antibody catalyst would provide a means of identifying more efficient variants. 5.2.2 Other Systems Large rate accelerations are also expected when charged reactants are enthalpically destabilized relative to a charge delocalized transition state by desolvation [20]. Such a mechanism may contribute to the efficacy of enzymes that promote biologically relevant decarboxylations [71–74]. Antibody catalysis of model reactions, such as the solvent-sensitive decarboxylation of 3-carboxy-benzisoxazoles to salicylonitriles [75, 76] and the difficult decarboxylation of orotate to uracil [77], has been used to probe the role of medium effects in a tailored binding pocket. Significant rate enhancements have been achieved, and structural work characterizing the properties of successful active sites will be instructive. Given the well-established and often dramatic sensitivity to solvent change of many reaction types, medium effects are likely to be pervasive in antibody and enzymatic catalysis. Reactions that display
17
18 Catalytic Antibody Technology
large changes in charge localization, including decarboxylations, nucleophilic substitutions, and aldol condensations, should be especially amenable to such effects. 5.3 Electrostatic Catalysis 5.3.1 Acyl Transfer Reactions Enzymes frequently utilize hydrogen bonding and charged groups to stabilize polar transition states electrostatically [69]. When haptens containing positive and negative charges are used, electrostatic interactions can also be exploited for antibody catalysis. For example, anionic phosphonates and phosphonamidates, originally designed as potent inhibitors of hydrolytic enzymes [12, 78], have been useful in the production of antibodies that hydrolyze esters, carbonates and (more rarely) amides [79]. Such analogs resemble the transition state for hydrolysis in a number of ways, including tetrahedral geometry, negative charge, and increased bond lengths. How these features are reflected inthe induced binding pockets has been deduced through structural studies of six different esterase families [80–90]. Antibody 48G7 is typical of this class of catalyst [91]. It was generated against p-nitrophenyl phosphonate (6) and accelerates the hydrolysis of the corresponding activated ester 7a and carbonate 7b by factors of 1.6 × 104 and 4 × 104 , respectively (Fig. 11). Both the antibody and its germline precursor, with and without bound hapten, have been characterized to provide insight into the origins and evolution of its catalytic effects [85–87]. The mature antibody has a deep, well-defined combining site rich in aromatic residues (Fig. 12a). It binds the hapten in an extended conformation with the aryl group at the bottom of a hydrophobic cleft formed between CDRs L1 and H3. The negatively charged phosphonate moiety lies near the pocket entrance, where it forms multiple interactions with charged and neutral antibody residues. The pro-R phosphonyl oxygen hydrogen bonds with the TyrH33 phenolic hydroxyl group and forms a salt bridge with the ArgL96 guanidinium group, while the pro-S
Fig. 11 Hydrolysis of activated aryl esters 7a and carbonates 7b proceeds via an anionic and tetrahedral intermediate (in brackets). Hydrolytic antibody 48G7 was elicited with an aryl phosphonate derivative (6) that mimics this high-energy species and its flanking transition states [91].
5 Representative Catalytic Antibodies
oxygen hydrogen bonds to the HisH35 -imino group and the TyrH96 backbone NH. Comparison of the mature antibody with and without the transition-state analog shows no major conformational changes [86], suggesting a simple lock-and-key mechanism for hapten binding (see, however, [92]). Antibody 48G7 has a 30 000-fold higher affinity for the phosphonate hapten and 20-fold greater catalytic efficiency than its germline precursor. Nine somatic mutations, all lying outside the combining site, are responsible for these effects [85, 87, 93]. Although not in direct contact with bound ligand, the mutated residues appear to preorganize the pocket for binding and catalysis. Improved packing and secondary hydrogen-bonding interactions help limit side-chain and backbone flexibility inherent in the germline protein. In fact, germline 48G7 (Fig. 12b) is conformationally much more flexible than the mature antibody and undergoes significant reorganization upon hapten binding [87]. The hapten itself binds quite differently in the two complexes (Fig. 12a,b). The phosphonate moiety occupies essentially the same location in both, but it cannot form a hydrogen bond with TyrH33 in the germline complex because of an altered conformation of the CDR H1 loop. Further, the p-nitrophenyl group is rotated away from the position it adopts in the mature antibody to occupy a hydrophobic cleft constructed from framework residues. This second apolar pocket is present, but empty, in mature 48G7. The alternative binding mode is made possible by removal of an otherwise repulsive interaction with the nitro group of the hapten in the mature antibody by the somatic mutation SerL34 Gly. Concordant with hapten design, the crystallographic data suggest that 48G7 is a relatively simple catalyst which promotes ester hydrolysis by direct attack of hydroxide on the scissile carbonyl. Although the two hapten binding modes seen in the germline and mature 48G7 complexes create some ambiguity about the substrate’s preferred orientation, the side chains of TyrH33 , HisH35 and ArgL96 and the backbone amide of TyrH96 apparently stabilize the tetrahedral and anionic transition states electrostatically through hydrogen-bonding and ionic interactions (Fig. 12). An analogy between this anion binding site and the well-characterized “oxyanion hole” of serine proteases is apparent. Individually, however, the antibody residues are not very effective oxyanion stabilizers. Mutations at positions H33, H35 and L96 cause only 3 to 30-fold reductions in kcat [85]. Loss of the hydrogen bond between TyrH33 and the transition state probably contributes to the germline’s 20fold lower efficiency, as well. In contrast, replacement of a single asparagine in the oxyanion hole of the protease subtilisin results in 102 to 103 -fold losses in specific activity [94, 95]. High solvent accessibility and/or conformational mobility of the antibody residues may account for their limited efficacy. The 48G7 active-site structure and hapten-recognition properties are strikingly similar to those of other, independently-derived anti-phosphonate antibodies [96]. These include three catalysts (CNJ206 [80, 81], 17E8 [89] and 29G11 [90]) for the cleavage of p-nitrophenyl esters and three (D2.3, D2.4 and D2.5 [83, 84]) for the energetically more demanding hydrolysis of p-nitrobenzyl esters. Although many of these esterolytic antibodies derive from different germline sequences, they appear to have in common a deep hydrophobic pocket in the framework region into
19
20 Catalytic Antibody Technology
Fig. 12 Stereoview of the active site of mature 48G7 (a) and its germline precursor (b), showing the different orientation of the hapten in the two complexes [85–87]. The phosphonate moiety of 6 binds in roughly the same location in both, although the network of hydrogen-bonding interactions that
constitute the oxyanion hole is slightly different. The deeply bound aryl group binds in one of two available hydrophobic clefts at the bottom of the pocket depending on whether the residue at position L34 is Ser (germline) or Gly (mature).
which the leaving group’s aryl/benzyl ring binds (Fig. 13). All exploit multiple hydrogen bonds and salt bridges near the mouth of the cavity for recognition of the phosphonate moiety. Consequently, it is likely that all employ the same basic hydrolytic mechanism as 43G7 [96]. Given this, their comparative efficiency, which ranges over two orders of magnitude, must reflect differing abilities to stabilize the hydrolytic transition state relative to the bound ground state. Examination of the hapten complexes suggests that the rate enhancement (kcat /kuncat = 103 – 105 ) roughly parallels the number of hydrogen bonds to the phosphonate. Flexibility in the active site (as seen in the 48G7 germline antibody) also correlates with low efficiency [80, 81, 87]. In some cases, the antibodies may exploit binding energy to
5 Representative Catalytic Antibodies
Fig. 13 Overlay of three hydrolytic antibodies, 48G7 (green), 17E8 (blue), and CNJ206 (red), shows remarkable structural convergence in these independently-generated active sites [96]. Only haptencontacting side chains are illustrated; the purple balls indicate the position of the phosphorus of the respective bound hapten.
distort the substrate ester from its thermodynamically favorable Z-conformation, making it easier to reach the tetrahedral transition state during catalysis [82, 85]. For example, this factor may come into play in 48G7, depending on whether the substrate binds like the hapten in the mature or the germline complex (Fig. 12). The remarkable degree of structural convergence observed in antibodies selected for tight binding to aryl phosphonate transition-state analog finds parallels in a number of other systems. For example, the immune responses to phosphorylcholine [97], p-azophenylarsonates [98], and 2-phenyloxazolones [99] are dominated by specific combinations of heavy and light chain variable regions. Interestingly, when a phenyloxazolone-binding antibody was found with a unique VH domain, its structure showed conservation of important antigen binding residues [100]. These results point to the immune system’s utilizing a relatively limited number of mechanisms to recognize any given type of antigen. Apparently, strong selective pressure reduces the broad diversity initially present in the primary repertoire to a small set of “best” solutions that can be further optimized by somatic mutation. As discussed above for the Diels-Alderases, structural convergence is even evident in antibodies raised against unrelated haptens. Rather than possessing an infinite variety of differently configured active sites, the immune system appears to play with a limited deck. Variations within the general theme do arise. Another esterase that has been structurally characterized, antibody 6D6, catalyzes the hydrolysis of a chloramphenical monoester with relatively low efficiency (kcat /kuncat = 900) [101]. It shows specific
21
22 Catalytic Antibody Technology
differences from the other antibodies [82]. In particular, the hapten is bound more shallowly, although the stacked aromatic rings of the leaving group and the acyl side chain are the most deeply buried portions of the molecule. The tetrahedral phos-phonate is highly solvent exposed, forming only one hydrogen bond to the antibody, presumably explaining 6D6’s relatively low efficiency. At this point,itisnot clear howmuch activity can be obtained from antiphosphonate antibodies. Over 50 esterolytic antibodies have been showntohave properties roughly comparable to those discussed here: simple hydrolytic mechanisms, rate accelerations up to 105 over background, and chemical proficiencies of 107 − 108 M−1 [15, 16]. While notable, particularly considering concomitant high and predictable selectivity [3], such effects are still orders of magnitude lower than those achieved by analogous enzymes. The immunological approach appears to have hit a ceiling in catalytic efficiency. It seems likely that these activities reflect what can be achieved with an “oxyanion hole” mechanism alone. Recent attempts to augment activity by rational mutagenesis or affinity selection with phage-displayed antibody fragments have met with only limited success [102–105]. Negative results should not be overinterpreted, but theydo raise concerns that the aryl phosphonate binding pocket common to these catalysts is not an intermediate that can be further refined but an evolutionary dead end. Alternative antigen presentation strategies and in vitro selection methods may provide access to different subsets of the immune repertoire more amenable to optimization. For example, antibodies that bind tetrahedral anions not at the entrance to the combining site but deep within their active site, as seen for enzymes that promote hydrolyses, would be of interest. Of course, highly evolved hydrolytic enzymes are more than simple oxyanion holes. They exploit arrays of catalytic groups (and, often, metal cofactors) to catalyze energetically demanding reactions such as amide hydrolysis. Induction of several precisely aligned functional groups in a single immunization step is extremely improbable, however. Such arrays are unlikely to be present in the primary repertoire of the immune system, and, depending on the basic immunoglobulin scaffold, may not be accessible through somatic hypermutation. Occasionally, though, serendipitous mutations that open up new opportunities for catalysis can and do occur. Although phosphonate transition-state analogs specify a simple hydrolytic mechanism, some anti-phosphonate antibodies have been identified that exploit more complex mechanisms. For example, the lipase-like antibody 21H3, generated against a typical benzyl phosphonate ester [106], unexpectedly accelerates ester hydrolysis by a two-step mechanism involving transient acylation of an amino acid within the binding pocket [107]. Its efficiency at hydrolyzing esters is no greater than that of the esterolytic antibodies discussed above, but use of covalent catalysis makes possible stereoselective transesterification reactions that cannot be carried out in water without the antibody [107, 108]. Similarly, considerable biochemical evidence has been adduced for a two-step sequence involving an acyl-antibody intermediate in hydrolytic reactions catalyzed by antibody 43C9 [109]. The latter was generated against phosphonamidate 8 [110], rather than a phosphonate, and it is
5 Representative Catalytic Antibodies
Fig. 14 Phosphonamidate 8, which is a transition-state analog for amide hydrolysis, yielded antibody 43C9 [110]. This antibody cleaves structurally related p-nitroanilides 9 and esters (not shown).
unique in its ability to promote the hydrolysis of activated amides as well as esters (Fig. 14). A 2.5 × 105 -fold rate enhancement (chemical proficiency = 5 × 108 M−1 ) achieved in the cleavage of p-nitroanilide 9 makes 43C9 one of the most efficient hydrolytic antibodies known. The crystal structures of free 43C9 and its complex with p-nitrophenol were recently solved [88]. Although detailed understanding of interactions specific to ligand recognition and catalysis await characterization of the hapten complex, likely participants in the reaction are identifiable upon inspection of the binding pocket (Fig. 15). As with other hydrolytic antibodies, several residues are available for stabilizing the negative charge that develops in the transition state, many of which are shared with 48G7. These include HisH35 and ArgL96 , whose importance for catalysis is supported by the results of site-directed mutagenesis experiments [103]. Residue AsnH33 , like 48G7’s TyrH33 , may also play a role in transition-state stabilization. What distinguishes 43C9 from other esterolytic antibodies, though, is the presence of a second histidine at position L91 . HisL91 is directed into the pocket toward the region where substrate must bind (Fig. 15). Docking experiments have suggested
Fig. 15 Stereoview of the active site of amidase 43C9 with bound p-nitrophenol [88]. The imidazole side chain of HisL91 (green) points into the pocket toward the region normally occupied by the phosphonate in
other hydrolytic antibodies. AsnH33 , HisH35 and ArgL96 are potential oxyanion stabilizers. Two active-site water molecules are shown as red spheres.
23
24 Catalytic Antibody Technology
a plausible orientation of the substrate, placing its scissile carbonyl in an excellent position for nucleophilic attack by this residue’s imidazole side chain [88]. Detailed mechanistic inferences should be considered tentative in the light of likely perturbations to the binding pocket caused by packing interactions between the active sites of adjacent molecules in the crystal [88]. Nevertheless, mutagenesis of HisL91 to Gln decreases catalytic efficiency >50-fold with little effect on ligand binding, supporting its role as the nucleophile that is transiently acylated during catalysis [103]. Consistent with this possibility, the acyl intermediate detected by electrospray mass spectrometry at pH 5.9 is not observed with the HisL91 Gln variant [111]. Covalent catalysis by imidazole is well-established in non-enzymatic reactions of carboxylic acid derivatives [1]. Its efficiency is a consequence of imidazole’s high nucleophilicity and the relative instability of the acyl-imidazole intermediate. The presence of HisL91 can thus explain why 43C9, but not 48G7 or other esterolytic antibodies, cleaves an amide. Context is clearly relevant, since lipase 21H3 lacks amidase activity though it also exploits nucleophilic catalysis [106, 107]. The fact that 43C9’s mechanism is unspecified by its hapten’s design also underscores the importance of serendipity in these experiments. The rarity of such occurrences is reflected in many failed experiments to generate amidases with phosphonamidate haptens. Despite its relative mechanistic sophistication, 43C9 is still a primitive amidase when compared with a typical protease. Subtilisin, which also exploits nucleophilic catalysis, catalyzes the cleavage of succinyl-Ala-Ala-Pro-Phe-p-nitroanilide 104 times more efficiently than 43C9. Its rate acceleration is 3.9 × 109 over background and its chemical proficiency is 2.2 × 1013 M−1 [112]. Moreover, subtilisin can cleave unacti-vated amides, whereas 43C9 is restricted to substrates with leaving groups of pK a < ca. 12 [113]. The high efficiency of covalent catalysis in serine proteases like subtilisin derives from cooperative action of several functional groups in addition to the active site nucleophile. Acid-base chemistry, in particular, is used to activate the serine nucleophile, facilitate proton transfers, and stabilize the amide leaving group. Removal of any single component of the protease’s catalytic triad results in 104 to 106 -fold decreases in activity [112]. Unsurprisingly, 43C9 lacks an analogously complex catalytic machinery. Although it has proven difficult to install additional functional groups by mutagenesis [103], more productive changes may become obvious when the structure of the hapten complex is known. Lessons from the immunological evolution of 48G7 [53, 87] suggest that mutagenesis of residues outside the binding pocket will be required to fine-tune critical synergies between participants in catalysis. 5.4 Functional Groups
Functional groups – acids, bases, and even exogenous cofactors — can clearly extend the capabilities of catalytic antibodies. They will certainly be needed if reactions with large activation barriers are to be significantly accelerated. As we have seen in the case of amidase 43C9, unplanned but useful residues can appear by chance in the
5 Representative Catalytic Antibodies
immunoglobulin pocket during affinity maturation. To increase the probability of eliciting such functional groups where they are needed, several strategies have been devised. Charge complementarity between an antibody and its ligand is the easiest principle to exploit in generating functionalized binding pockets. Cationic haptens have been used, for instance, to elicit negatively charged carboxylates that can serve as bases or nucleophiles or as general acids when protonated. Elimination reactions, epoxide ring openings, cationic cyclizations, and hydrolyses of esters, ketals, and enol ethers have been successfully catalyzed by this approach [114]. In favorable cases, very large catalytic effects have been achieved. For example, antibodies raised against a protonated benzimidazolium derivative use an active-site Asp or Glu to deprotonate substituted benzisoxazoles with effective molarities of 40 000 M and rate accelerations in excess of 108 over the acetate-promoted background reaction [115]. Nevertheless, it is unlikely that a single hapten will ever elicit arrays of residues as sophisticated as those present in highly evolved enzymes. Heterologous immunization has been explored as a method of circumventing this limitation [116]. In this procedure, two molecules, each containing different functional groups, are serially used as haptens to elicit the immune response. Ideally, a subset of the resulting antibodies will possess multiple catalytic groups induced in response to both tem-plating molecules and have, as a result, enhanced activities. An important advantage of this strategy is that simplified haptens can be used, reducing the need for laborious synthesis. Although generality must still be established, results from initial studies indicate that antibody esterases generated by heterologous immunization are more efficient than those generated in response to the individual haptens [116, 117]. Mechanism-based enzyme inhibitors [17, 18] are potentially of even greater utility than haptens [19, 118–122]. Such molecules exploit a protein’s ability to initiate a cascade of events, ultimately leading to its own covalent modification. This irreversible chemical reaction thus provides a means of selecting immunoglobulins in vivo and/or in vitro on the basis of their activity. When the selected antibody is challenged with substrate rather than hapten, the same group(s) responsible for protein modification can be used to promote the desired chemical transformation. Covalent catalysis can thus be specified through hapten design. The potential of reactive immunization is perhaps best illustrated by the production of aldolase antibodies [120]. These catalysts not only utilize a complex chemical mechanism but are among the most efficient catalytic antibodies described to date.
5.4.1 Aldolases Aldol condensations are broadly useful carbon-carbon bond-forming reactions in organic synthesis. They pose special difficulties for biocatalysts, however, because they proceed via a series of consecutive transition states, each requiring acid-base catalysis. Class I aldolases solve these problems by using a reactive lysine to activate the ketone donor through Schiff base formation at the active site [123–125]. Deprotonation of the Schiff base yields an enamine that then adds stereoselectively
25
26 Catalytic Antibody Technology
Fig. 16 $-Diketone 10, used as a hapten to raise antibodies 38C2 and 33F12, traps a reactive, active-site amine to form a stable, chromophoric vinylogous amide [120]. These antibodies promote diverse aldol
condensations. In the example shown (bottom), the two products are formed in a ca. 1:1 ratio with the indicated diastereoselectivities.
to the acceptor aldehyde to form a new carbon-carbon bond. Subsequent hydrolysis of the Schiff base releases product and regenerates the active catalyst. β-Diketones inhibit class I adolases by forming a Schiff base with the active-site lysine and then rearranging to a more stable vinylogous amide. When β-diketone 10 was used as a hapten instead of a more conventional transition-state analog [120], two analogously modifiable antibodies (33F12 and 38C2) were obtained (Fig. 16). The reactive group on the antibodies is a lysine with an anomolously low pK a (5.5 for 33F12 and 6.0 for 38C2 [123]). The vinylogous amide it forms with the hapten (λmax 316 nm, 15 000 M−1 cm−1 ) can be irreversibly trapped by reduction with sodium cyanoborohydride. These same antibodies also mimic the activity of class I aldolases [120]. Their reactive lysine reacts with a wide range of ketones to form enamine adducts. These condense with diverse aldehydes to form aldol products. The structure of antibody aldolase 33F12 [123] shows the catalytic lysine (LysH93 ) buried at the bottom of a hydrophobic pocket (Fig. 17). It is not hydrogen bonded with other amino acids and there are no charged residues within 7Å. The absence of such interactions must be responsible for this group’s unusual reactivity: the hydrophobic microenvironment lowers the amine’s pK a by disfavoring the protonated state. In contrast, class I aldolases are believed to increase the acidity of their catalytically essential lysine through proximity to several positively charged residues [126]. Unfortunately, the unliganded antibody provides few clues about the interactions that stabilize the transition states for formation and breakdown of the carbinol amine, deprotonation to afford the enamine, and creation of the new carbon-carbon bond. Sequestered water molecules or the hydroxyl groups of Tyr or Ser residues
5 Representative Catalytic Antibodies
Fig. 17 The binding pocket of antibody 33F12 [123] is seen through a slice in the molecular surface calculated with a sphere of 1.4 Å radius [206]. Only the g-amino group of LysH93 contacts the molecular surface at the bottom of the antigen binding site.
that constitute the pocket walls may be involved. For instance, a simple rotation of the Lys side chain from its position in the unliganded antibody would bring its amino group near SerH100 , which lies on the opposite side of the pocket. Structures of enamine adducts will be important for clarifying these points. An unanticipated feature of the aldolase antibodies is their promiscuity. Over 100 different reactions, including aldehyde-aldehyde, ketone-aldehyde and ketoneketone condensations, are subject to catalysis [123]. Because the 33F12 pocket is relatively hydrophobic, polyhydroxylated aldehydes, such as glyceraldehyde, glucose, and ribose, which are good acceptors for natural aldolases, are not substrates for the antibodies. Aside from this restriction, a wide range of donors and acceptors is tolerated. Non-specific van der Waals interactions likely provide the driving force for sequestering the first substrate. Once bound, it encounters the reactive lysine and forms the nucleophilic enamine. Provided there are no steric clashes, similar interactions should allow the aldehyde acceptor to bind and undergo aldol addition. The binding site is 11 Å deep and quite capacious, much larger in fact than the β-diketone that induced it, accounting for this broad specificity. These properties have been rationalized as a consequence of the reactive immunization process itself [123]: capture of the antibody by a covalent chemical event early in the process of affinity maturation may obviate the need for further refinement of the binding pocket by somatic mutation. Although induced with an achiral hapten and despite their broad substrate specificity, these aldolase antibodies are surprisingly stereoselective (Fig. 16). When acetone is the donor substrate, addition preferentially occurs on the si-face of the aldehyde acceptor; with hydroxy acetone, attack occurs on the re-face. In most cases, enantiomeric excesses >95% are found [127]. Enantioselective Robinson annulations [128], resolution of tertiary aldols [129], and preparation of chiral intermediates for the total synthesis of brevicomins [130] and epothilones [5] illustrate the synthetic utility of these catalysts. Kinetic resolution of the epothilone precursor
27
28 Catalytic Antibody Technology
was carried out on a gram scale using 0.06 mol% of antibody 38C2 (corresponding to 0.5 g of IgG). The reaction proceeded with good yield (37%) and high enantiomeric excess (90%). More recently, antibody 38C2 has been used to activate prodrugs [6]. It catalyzes the selective removal of generic drug-masking groups via sequential retro-aldol and retro-Michael reactions. This advance could prove useful in the development of selective chemotherapeutic strategies. How do these remarkable antibody aldolases compare to natural aldolases? To address this question, it is easiest to consider representative retroaldol reactions. For such transformations, the antibodies achieve turnover numbers ranging from 0.0003 to 0.08 s−1 [127]. One of the best antibody substrates is 4-(4 isobutyramidophenyl)-4-hydroxy-2-butanone. In the presence of 38C2, it undergoes retroaldolization with kcat and kcat /K m values of 0.083 min−1 and 3.3 × 103 M−1 s−1 , respectively. The rate constant for the background reaction is 1.4 × 10−9 s−1 (aqueous buffer, pH 7 and 25 ◦ C). The antibody thus accelerates this reaction by a factor of 2 × 107 -fold; its chemical proficiency is 6 × 1010 M−1 . These are impressive effects for a catalytic antibody, and may be contrasted with activity accruing to FDP aldolase and KDPG aldolase, typical class I enzymes involved in sugar metabolism. FDP aldolase catalyzes the interconversion of fructose-1,6-diphosphate (FDP) to give dihydroxyacetone phosphate and glyceraldehyde-3-phosphate. Its steady state kinetic parameters in the cleavage direction are kcat = 48 s−1 and kcat /K m = 1.6 × 107 M−1 s−1 [131]. KDPG aldolase, which cleaves 2-keto-3-deoxy-6-phosphogluconate (KDPG) into pyruvate and glyceraldehyde-3-phosphate, is even more active, having kcat and kcat /K m values of 290 s−1 and 4.0 × 10 M−1 s−1 , respectively [132]. Although rate constants for the corresponding uncatalyzed reactions are unavailable, taking 1.4 × 10−9 s−1 as a conservative estimate [127] yields rate enhancements of ca. 1010 − 1011 and chemical proficiencies of ca. 1015 –1016 M−1 . By either criterion, the natural enzymes are several orders of magnitude more efficient than the antibody aldolases. Natural aldolases are more restrictive in their substrate requirements than the antibodies, though they could lose considerable activity and still be competitive. For example, removal of a single phosphate in FDP causes a 50-fold loss in rate with FDP aldolase [131]. The extensive interactions that confer such specificity are almost certainly coupled to the enzyme’s high efficiency. By further refining the antibody pocket tothe precise steric and electronic demandsofaparticular aldol reaction, while maintaining the active-site lysine’s high reactivity, enzyme levels of activity may be attainable. Narrowed scope may be the unavoidable cost of truly high efficiency, however. 6 Perspectives 6.1 General Lessons from Comparisons of Enzymes and Antibodies
Structural and mechanistic work reviewed here reveals many notable parallels between antibodies and their more highly evolved counterparts. Not only are the
6 Perspectives
sizes and shapes of their active sites comparable, but antibodies and enzymes utilize the same set of molecular interactions to bind their respective ligands and stabilize transition states. Although antibodies tend to be less extensively functionalized than enzymes, the basic mechanistic strategies they employ to lower kinetic barriers are strikingly similar. As more primitive catalysts, however, they provide an alternative vantage point for examining the relationship between binding energy and catalysis. In this regard, simplicity is a virtue. Rather than working backwards from a fully evolved enzyme, uncomplicated, tailored model systems can be constructed to illuminate specific mechanistic questions. As multiple mechanistic strategies are combined to augment efficiency, valuable insight into the evolution of catalytic function can be gained. Functional analysis of the antibody intermediates that arise during affinity maturation [38, 53, 85, 87] also sheds light on these issues. Currently, antibodies appear less successful than enzymes intheir ability to achieve the fine level of recognition required for optimal discrimination between transition states and ground states. Their modest efficiencies appear to be a direct consequence of the simple strategy used to generate them. While the process of natural selection optimizes enzymes on the basis of their catalytic activity, the immune system’s microevolutionary mechanisms select antibodies for increased affinity to an imperfect transition-state analog. It is unrealistic toexpect that proteins engineered to recognize such haptens will provide an ideal steric and electrostatic environment for chemical transformation. Even with a perfect transition-state analog, the chances of obtaining a fully evolved catalyst through immunization would be low. As noted above, there is generally insufficient selection pressure to attain the high binding energies that characterize complexes between true enzymes and their transition states. On both micro and macro levels, mechanistic improvements arise as a function of time, so differences in time scales for the evolution of enzymes and antibodies – millions of years versus weeks or months – also come in to play. Although Nature uses a wide variety of different protein scaffolds to build enzyme active sites [133], she does not seem to have adopted the immunoglobulin fold. It is therefore conceivable that antibody structure itself places intrinsic limitations on the kind of reactions amenable to catalysis and on attainable efficiencies. In general, though, structural studies show excellent shape and chemical complementarity between antibodies and their ligands. Depending on the hapten, deep pockets, clefts, grooves, and flatter, more undulating surfaces can be created [134, 135]. Because certain classes of haptens tend to be recognized in the same way [52, 53, 83, 96], structural diversity must be considerably more restricted than might have been expected given 108 variants available in the primary immune repertoire [136, 137], but whether these consensus sites significantly restrict the catalytic capabilities of antibodies is still unclear. Conformational flexibility is another potential concern. Protein conformational changes in enzymes provide a means of excluding water from the active site and enable the catalyst to adjust tochanges in substrate as the reaction coordinate is traversed [138, 139]. Antibodies are known to undergo a comparable range of ligandinduced conformational changes, including alterations in side chain rotamers, segmental movements of hypervariable loops and changes in the relative disposition
29
30 Catalytic Antibody Technology
of the VH and VL domains [134]. Without direct selection for activity, however, these dynamic effects will be difficult to exploit deliberately for catalysis. In fact, conformational flexibility in catalytic antibodies, when observed [80, 81, 87], usually results in lower rather than higher efficiency. 6.2 How efficient does catalysis need to be?
Enzymes represent an extraordinarily high standard against which to judge new catalysts that are rationally designed from simple principles. From the perspective of the chemist, one exciting aspect of catalytic antibody technology is its ability to deliver tailored catalysts for reactions which are difficult to carry out selectively using existing methods or for which natural enzymes do not exist. Must such systems attain enzyme-like efficiency to be useful? Because antibodies are biocompatible and have long serum half-lives, many in vivo applications would be conceivable if sufficient activity were available. In fact, existing catalytic antibodies have already achieved significant effects in biological systems. When expressed at high levels, they have been shown to be competent (if inefficient) catalysts in metabolism, replacing essential enzymes in amino acid [42] and pyrimidine biosynthesis [77] pathways. Therapeutically relevant concentrations of the aldolase antibodies discussed earlierhave beenusedtoactivate prodrugsand kill colon and prostate cell lines [6]. Similarly, an esterolytic antibody has been employed as a cocaine antagonist, protecting rats from cocaine-induced seizures and sudden death [140, 141]. Many chemical reactions cannot proceed in the absence of catalysts because competing pathways have lower energies. In several instances, antibody binding energy has been successfully utilized to alter the course of such reactions by selectively stabilizing the less favorable transition state. For example, antibody catalysts have been developed for normally disfavored syn-eliminations [142], exo rather than endo Diels-Alder cycloadditions [46, 48], and 6-endo-tet ring closures of epoxy alcohols [143, 144]. Antibodies have also been used to control the fate of high-energy intermediates, allowing them to partition along only one of several possible pathways, as in the case of conversion of an enol ether to a cyclic ketal in water [145]. Formation of a strained cyclopropane derivative in a cationic olefin cyclization is another such example [146]. Binding energies up to 5.5 kcal/mol are typically available for achieving such discrimination, and even more energy may be available in favorable cases [15]. Such selectivities could be of great utility in organic synthesis. Assuming an antibody is available for any given transformation, its turnover and cost will ultimately determine whether it is used in practice. Presently, the steady state parameters of typical catalysts necessitate high antibody concentrations (≥ 10 µM = 1.6 mg/ml) and long reaction times to achieve useful conversions [12]. Preparative applications of several antibodies show that gram-scale reactions are feasible, particularly if antibody selectivity is high and competing reactions are substantially slower than the desired transformation [5, 147, 148]. Costs associated with high-volume antibody production are certainly an issue, but some antibodies
6 Perspectives
are now produced on a large scale for diagnostic and medical applications. They are readily obtained in good yield through ascites production [149] or by fermentation in hollow fiber reactors [150, 151]. Their Fab and Fv fragments can often be produced efficiently in plants [152, 153] or microorganisms [154]. Technical advances in microbial fermentation can be expected to make antibody production even more economically favorable in the future. In short, current levels of activity may be adequate for some laboratory applications, but higher efficiencies would certainly be beneficial. A 103 -fold increase in turnover would mean that 103 -fold less catalyst is needed to achieve useful levels of performance. Enhanced proficiency will certainly be necessary if energetically more demanding reactions are to be tackled. The creation of antibody equivalents of site-specific proteases, glycosidases, and nucleases, for example, remains a significant yet unrealized goal. The use of antibodies to synthesize or modify structurally complex and biologically important macromolecules will depend on solving this basic problem. 6.3 Strategies for Optimizing Efficiency
If imperfect design and indirect selection for binding rather than function are the primary reasons for low catalytic efficiency, creation of substantially better antibody catalysts will be feasible. Conceptually, two approaches can be envisaged: (1) refining methods for producing first generation catalysts, and (2) developing new strategies to optimize existing active sites. Improved transition-state analogs and more effective screening of the immune response address the first point. Rational reengineering and directed evolution methods are relevant to the second. These strategies have already been discussed in the context of specific antibody catalysts but are summarized in more general terms below. 6.3.1 Better Haptens The rewards of good – and the penalties of deficient – design are evident in the properties of catalytic antibodies characterized to date. In general, however, it is not clear which structural features of a transition state are most important to mimic to elicit maximally effective catalysts. Statistically meaningful correlation of different hapten types and the properties of their complementary active sites are needed to optimize analogs for each type of reaction. Incorporating design features that maximize transition-state affinity while minimizing ground-state stabilization remains a major challenge. For example, aryl groups are constituents of many haptens. Binding energy directed toward them, while increasing hapten affinity, may be useless or even harmful for catalysis, since these elements are common to both ground and transition state. Inefficient utilization of intrinsic binding energy in this way may help to explain the modest activities seen in the oxy-Cope [38] and Diels-Alder reactions [53] discussed above. The finding that catalytic activity for a series of polyclonal esterases correlates inversely with the size and hydrophobicity of the haptenic aryl phosphates [155] also illustrates this
31
32 Catalytic Antibody Technology
problem. When binding energy is used to recognize parts of the substrate distant from the site of reaction, product inhibition becomes another concern. Strategies to facilitate product release must therefore be considered integral to hapten design. Antigen presentation is a further issue. Small molecules are not immunogenic and mustbe coupledto carrier proteins. Variation of the tether site maybe a useful means of focusing immunorecognition to a hapten’s catalytically relevant epitopes [141]. Linkage of chemical reactivity with the immune system’s selection and amplification processes may mitigate some limitations in design. For this reason, the reactive immunization strategy [19, 118–122] merits increased attention. Many additional examples will be needed to establish its true scope and limitations. Given their importance in natural enzymes[156], metal ions and exogenous organic cofactors should considerably extend the properties of antibody catalysts, as well. Although versatile hybrid catalysts that combine the intrinsic reactivity of the metal ion or coenzyme with the tailored binding specificity of the antibody can be readily envisaged, this strategy has been surprisingly underutilized. Metal ion binding sites have been engineered into antibody binding sites [157], creating a sensitive Zn(II) sensor in one case [158], but catalysis has not been realized. An alternative, seemingly promising strategy using metal chelates for peptide cleavage has received little attention since first reported [159]. Modest activities have been described for other miscellaneous cofactor-dependent reactions [158, 160–163], but more work is obviously needed. Such strategies will be very important for promoting reactions with high kinetic barriers and reactions that cannot be carried out with protein residues alone. 6.3.2 Screening Because unusual germline sequences or fortuitous mutations may be necessary for high activity, the best antibody catalysts are also potentially the rarest. For this reason, more extensive screening of the immune response may dramatically increase the probability of finding highly active clones. Usually, small panels of antibodies chosen for their ability to bind the transition-state analog are purified and tested individually for catalytic activity. This procedure is necessarily indirect and slow. Sensitive chemical [164], biological [42, 77, 91, 165] and immunological assays [60, 166, 167] can facilitate the screening of thousands of candidates directly for catalysis and thereby accelerate the preliminary evaluation process. One practical problem associated with broad screening is that some antigens yield many hapten binders, others relatively few. To increase the size and diversity of the antibody population available for testing, multiple fusions can be performed and several mouse strains utilized for immunization. Mice prone to autoimmunity have been shown to yield unusually large numbers of esterolytic antibodies and may prove more generally useful for expanding the repertoire of catalytic clones elicited by a single transition-state analog [168]. Significant progress has also been made in copying the combinatorial processes of the immune system in vitro. Libraries of antibody fragments containing more than 106 members can be constructed and produced in microorganisms or
6 Perspectives
displayed on phage particles [169–172]. These systems are attractive vehicles for exploring the catalytic potential of different subsets of the primary immunological repertoire. Binders can be selected from these libraries on the basis of hapten affinity and subsequently screened for catalytic activity [169]. Clever strategies for capturing active clones based on their activity should be even more effective [173– 177]. Alternatively, catalysts can be obtained directly by selection in vivo using yeast or bacterial auxotrophs [42, 77, 91]. In these approaches, iterative rounds of mutagenesis and reselection replace somatic mutation as a means of refining initial hits [170]. Domain swapping [178] and powerful DNA shuffling methods [179, 180] have been developed to speed up this process. 6.3.3 Engineering The upper limit on activity that can be achieved with antibodies is unknown and may be reaction dependent. It is therefore important to push several test cases as far as possible. Site-directed mutagenesis is an attractive tool for improving catalytic power, particularly given the availability of increasing numbers of high-resolution structures. In general, it will probably be easiest to reengineer the poorest catalysts, since changes that improve packing or provide missing but critical interactions may be relatively obvious. The mutational study leading to an order of magnitude increase in activity of the Diels-Alderase antibody 39-A11 is a case in point [54]. The fact that relatively few changes are needed to tailor the properties of germline structure during affinity maturation [53, 85, 93] is also encouraging. Pinpointing subtle changes needed to optimize more active clones is likely to be more difficult, however. The obstacles encountered in augmenting the activity of hydrolytic antibodies sound a cautionary note [103–105]. Ultimately, our understanding of will determine what can be achieved in this way. 6.3.4 Selection Enzymes have been brought to peak efficiency over millions of years by the process of natural selection. An analogous process in the laboratory, involving recursive cycles of mutagenesis and genetic selection for function, may provide the ultimate test of the capabilities of antibody catalysts. Evolution of antibodies can be accomplished perhaps most directly by complementation of auxotrophic yeast or bacterial strains. The chorismate mutase antibody 1F7 [42] and an orotate decarboxylase antibody [77] have been shown to confer a significant growth advantage under selective conditions to host cells lacking the corresponding enzymes. Though the experiments are technically difficult because of poor antibody expression in microorganisms, preliminary results with the chorismate mutase antibody have demonstrated the feasibility of selecting antibodies with novel properties [43]. Recent work showing that cytoplasmic production of antibody fragments is optimizable by selection [181] augers well for these efforts. Selection systems are available for many transformations, including the ferrochelatase and metabolic reactions already mentioned [42, 70, 77]. A generalized selection scheme for hydrolytic reactions has also been reported [91]; analogous assays could be developed to exploit the ability of a catalyst to synthesize or destroy nutrients, drugs, hormones, or toxins. In addition to
33
34 Catalytic Antibody Technology
providing clues about the perfectibility of catalytic antibodies, such experiments may yield fundamental insights into structure-activity relationships and the evolution of molecular function. 6.3.5 Other Scaffolds The immune system was originally tapped as a source of catalysts as a matter of convenience: it is unrivaled in its ability to fashion high affinity protein receptors – the antibodies – to virtually any natural or synthetic molecule, essentially on demand [58, 136, 137]. However, now that the combinatorial processes of the immune system can be mimicked in vitro and libraries of macromolecules can be generated relatively easily using the tools of molecular biology, there is no compelling need to restrict Jencks’s original strategy to a single protein fold or even to a single class of macromolecule. Indeed, in analogy to catalytic antibody experiments, catalytic RNAs and DNAs have been obtained from large libraries of nucleic acids by selection for binding to transition-state analogs. Catalysts for porphyrin metalation obtained in this way [182–184] have activity comparable to that of the ferrochelatase antibody 7G12 discussed above, but an RNA rotamase is 30 times less effective than its antibody counterpart [185]. The lower activity of the latter largely reflects the RNA’s lower affinity for the hapten used (K d = 7 µM, compared with 0.21 µM for the antibody). A similar explanation has been invoked for RNAs that bind the 1E9 hapten (compound 4, Fig. 6) but fail to catalyze the corresponding Diels-Alder reaction [186]. Nucleic acids are likely to be intrinsically more limited than proteins in their capacity for high affinity molecular recognition of structurally diverse ligands as well as for catalysis [187]. Direct selection for function rather than transition-state analog binding has proven to be a much more powerful approach for obtaining nucleic acids with novel catalytic properties (for a recent comprehensive review, see [188]). RNA and have been prepared in this way for a variety of reactions, including a Diels-Alder cycloaddition [189]. Although the resulting catalysts often have relatively modest efficiency, phosphoryl transfers, which are difficult to achieve with antibodies because of the dearth of stable analogs of the pentacoordinate transition state, have been particularly amenable to catalysis. One of the most impressive accomplishments in this regard is the selection of highly efficient ribozymes capable of a self-ligation reaction from large pools of random sequence [190]. The number of starting molecules in these experiments was huge (ca. 1015 ), dwarfing the diversity of the primary immune repertoire (ca. 108 molecules), allowing even extremely rare catalysts to be found. One of the ribozymes obtained by selection was subsequently reengineered to function as a true catalyst; it promotes an intermolecular ligation with multiple turnovers and a rate acceleration approaching 109 [191]. This impressive activity is higher than any seen for most antibody catalysts, and provides an impressive demonstration of the power of direct selection. In principle, in vitro selection of peptides and proteins from vast combinatorial libraries is now possible as well using ribosome display [192–194], mRNAprotein fusion methods [195, 196] and more established phage display formats [172].
References
Coupled with efficient ways of linking genotype with phenotype [174–176], these methods can be expected to facilitate the production of proteins with novel properties and functions. As such, they powerfully complement other efforts to harness the power of evolution to redesign the structures and activities of existing enzymes [197–199, 207, 208].
7 Conclusions
Catalytic antibody technology combines programmable design with the combinatorial diversity of the immune system. This fusion has allowed the field to progress in relatively short order from simple model reactions to complex multistep processes, but much remains to be learned. Early efforts focused largely on defining the scope and limitations of this technology. Now that the approach is well established, attention must be paid to strategies for optimizing catalytic efficiency and for promoting more demanding transformations. In many ways, these are far greater challenges than identifying first-generation catalysts with modestactivity. Continued mechanistic and structural analysis of these systems will inform such endeavors. In addition, learning how to create, manipulate and evolve large combinatorial libraries of proteins outside the immune system should help to automate the processes of catalyst discovery and optimization.
Acknowledgements
The author is indebted to Kinya Hotta for creating the graphics for this article and to the National Institutes of Health, the ETH Zurich, the Swiss National Science Foundation, and Novartis Pharma for generous support.
References 1 JENCKS, W. P., Catalysis in Chemistry and Enzymology, McGraw Hill, New York 1969 2 SCHULTZ, P. G., LERNER, R. A., Science 269 (1995), p. 1835–1842 3 HILVERT, D., In Topics in Stereochemistry, ed. S. E. DENMARK 22, John Wiley & Sons, Inc., New York 1999, p. 83– 135 4 SCHULTZ, P. G., LERNER, R. A., Acc. Chem. Res. 26 (1993), p. 391–395 5 SINHA, S. C., BARBAS III, C. F., LERNER, R. A., Proc. Natl. Acad. Sci. USA 95 (1998), p. 14603–14608
6 SHABAT, D., RADER, C., LIST, B., LERNER, R. A., BARBAS III, C. F., Proc. Natl. Acad. Sci. USA 96 (1999), p. 6925–6930 7 RADZICKA, A., WOLFENDEN, R., Science 267 (1995), p. 90–93 8 WOLFENDEN, R., LU, X. D., YOUNG, G., J. Am. Chem. Soc. 120 (1998), p. 6814–6815 9 PAULING, L., Am. Sci. 36 (1948), p. 51–58 10 KURZ, J. L., J. Am. Chem. Soc. 85 (1963), p. 987–991 11 WOLFENDEN, R., Nature 223 (1969), p. 704–705 12 MADER, M. M., BARTLETT, P. A., Chem. Rev. 97 (1997), p. 1281–1301
35
36 Catalytic Antibody Technology
13 RADZICKA, A., WOLFENDEN, R., Methods Enzymol. 249 (1995), p. 284–312 14 LOLIS, E., PETSKO, G. A., Annu. Rev. Biochem. 59 (1990), p. 597–630 15 STEWART, J. D., BENKOVIC, S. J., Nature 375 (1995), p. 388–391 16 THOMAS, N. R., Appl. Biochem. Biotechnol. 47: (1994), p. 345–372 17 ABELES, R. H., MAYCOCK, A. L., Acc. Chem. Res. 9 (1976), p. 313–319 18 WALSH, C., Tetrahedron 38 (1982), p. 871–909 19 WIRSCHING, P., ASHLEY, J. A., LO, C.-H. L., JANDA, K. D., LERNER, R. A., Science 270 (1995), p. 1775–1782 20 JENCKS, W. P., Adv. Enzymol. 43 (1975), p. 219–410 21 WESTHEIMER, F. H., Adv. Enzymol. 24 (1962), p. 441–482 22 BARTLETT, P. A., NAKAGAWA, Y., JOHNSON, C. R., REICH, S. H., LUIS, A. J. Org. Chem. 53 (1988), p. 3195–3210 23 GRAY, J. V., EREN, D., KNOWLES, J. R., Biochemistry 29 (1990), p. 8872–8878 24 HILVERT, D., NARED, K. D., J. Am. Chem. Soc. 110 (1988), p. 5593–5594 25 HILVERT, D., CARPENTER, S. H., NARED, K. D., AUDITOR, M.-T. M., Proc. Natl. Acad. Sci. USA 85 (1988), p. 4953–4955 26 JACKSON, D. Y., JACOBS, J. W., SUGASAWARA, R., REICH, S. H., BARTLETT, P. A., SCHULTZ, P. G., J. Am. Chem. Soc. 110 (1988), p. 4841–4842 27 JACKSON, D. Y., LIANG, M. N., BARTLETT, P. A., SCHULTZ, P. G., Angew. Chem. Int. Ed. Engl. 31 (1992), p. 182–183 28 LEE, A. Y., STEWART, J. D., CLARDY, J., GANEM, B., Chem. Biol. 2 (1995), p. 195–203 29 CAMPBELL, P. A., TARASOW, T. M., MASSEFSKI, W., WRIGHT, P. E., HILVERT, D., Proc. Natl. Acad. Sci. USA 90 (1993), p. 8663–8667 30 HAYNES, M. R., STURA, E. A., HILVERT, D., WILSON, I. A., Science 263 (1994), p. 646–652 31 ADDADI, L., JAFFE, E. K., KNOWLES, J. R., Biochemistry 22 (1983), p. 4494– 4501 32 GUSTIN, D. J., MATTEI, P., KAST, P., WIEST, O., LEE, L., CLELAND, W. W., HILVERT, D., J. Am. Chem. Soc. 121 (1999), p. 1756–1757
33 KAST, P., ASIF-ULLAH, M., JIANG, N., HILVERT, D., Proc. Natl. Acad. Sci. USA 93 (1996), p. 5043–5048 34 LIU, D. R., CLOAD, S. T., PASTOR, R. M., SCHULTZ, P. G., J. Am. Chem. Soc. 118 (1996), p. 1789–1790 35 WIEST, O., HOUK, K. N., J. Org. Chem. 59 (1994), p. 7582–7584 36 BRAISTED, A. C., SCHULTZ, P. G., J. Am. Chem. Soc. 116 (1994), p. 2211–2212 37 DRIGGERS, E. M., CHO, H. S., LIU, C. W., KATZKA, C. W., BRAISTED, A. C., ULRICH, H. D., WEMMER, D. E., SCHULTZ, P. G., J. Am. Chem. Soc. 120 (1998), p. 1945–1958 38 ULRICH, H. D., MUNDORFF, E., SANTARSIERO, B. D., DRIGGERS, E. M., STEVENS, R. C., SCHULTZ, P. G., Nature 389 (1997), p. 271–275 39 G¨ORISCH, J., Biochemistry 17 (1978), p. 3700–3705 40 KAST, P., ASIF-ULLAH, M., HILVERT, D., Tetrahedron Lett. 37 (1996), p. 2691–2694 41 STEIGERWALD, M. J., GODDARD, W. A., EVANS, D. A., J. Am. Chem. Soc. 101 (1979), p. 1994–1997 42 TANG, Y., HICKS, J. B., HILVERT, D., Proc. Natl. Acad. Sci. USA 88 (1991), p. 8784–8786 43 TANG, Y., Evolutionary studies with a catalytic antibody, Ph.D., The Scripps Research Institute 1996 44 SAUER, J., SUSTMAN, R., Angew. Chem. Int. Ed. Engl. 19 (1980), p. 779–807 45 HILVERT, D., HILL, K. W., NARED, K. D., AUDITOR, M.-T. M., J. Am. Chem. Soc. 111 (1989), p. 9261–9262 46 GOUVERNEUR, V. E., HOUK, K. N., PASCUAL-TERESA, B., BENO, B., JANDA, K. D., LERNER, R. A., Science 262 (1993), p. 204–208 47 BRAISTED, A. C., SCHULTZ, P. G., J. Am. Chem. Soc. 112 (1990), p. 7430–7431 48 YLI-KAUHALUOMA, J. T., ASHLEY, J. A., LO, C.-H., TUCKER, L., WOLFE, M. M., JANDA, K. D., J. Am. Chem. Soc. 117 (1995), p. 7041–7047 49 RESMINI, M., MEEKEL, A. A. P., PANDIT, U. K., Pure Appl. Chem. 68 (1996), p. 2025–2028 50 KIRBY, A. J., Adv. Phys. Org. Chem. 17 (1980), p. 183–278 51 PAGE, M. I., JENCKS, W. P., Proc. Natl. Acad. Sci. USA 68 (1971), p. 1678–1683
References 52 XU, J., DENG, Q., CHEN, J., HOUK, K. N., BARTEK, J. D. H., WILSON, I. A., Science (1999), in press 53 ROMESBERG, F. E., SPILLER, B., SCHULTZ, P. G., STEVENS, R. C., Science 279 (1998), p. 1929–1933 54 ROMESBERG, F. E., SCHULTZ, P. G., Bioorg. Med. Chem. Lett. 9 (1999), p. 1741–1744 55 HAYNES, M. R., LENZ, M., TAUSSIG, M. J., WILSON, I. A., HILVERT, D., Israel J. Chem. 36 (1996), p. 151–159 56 AREVALO, J. H., TAUSSIG, M. J., WILSON, I. A., Nature 365 (1993), p. 859–863 57 AREVALO, J. H., HASSIG, C. A., STURA, E. A., SIMS, M. J., TAUSSIG, M. J., WILSON, I. A., J. Mol. Biol. 241 (1994), p. 663–690 58 KABAT, E. A., Structural Concepts in Immunology and Immunochemistry. Holt, Rinehart, and Winston, Inc., New York 1976 59 LASCHAT, S., Angew. Chem. Int. Ed. Engl. 35 (1996), p. 289–291 60 MACBEATH, G., HILVERT, D., J. Am. Chem. Soc. 116 (1994), p. 6101–6106 61 HALDANE, J. B. S., Enzymes. Longmans, Green and Co., London 1930 62 LAVALEE, D. K., Mol. Struct. Energ. 9 (1988), p. 279–314 63 AL-KARADAGHI, S., HANSSON, M., STANISLAV, N., JOHNSON, B., HEDERSTEDT, L., Structure 5 (1997), p. 1501–1510 64 DAILEY, H. A., FLEMING, J. E., J. Biol. Chem. 258 (1983), p. 11453–11459 65 COCHRAN, A. G., SCHULTZ, P. G., Science 249 (1990), p. 781–783 66 HANSSON, M., HEDERSTEDT, L. Eur. J. Biochem. 220 (1994), p. 201–208 67 ROMESBERG, F. E., SANTARSIERO, B. D., SPILLER, B., YIN, J., BARNES, D., SCHULTZ, P. G., STEVENS, R. C., Biochemistry 37 (1998), p. 14404–14409 68 BLACKWOOD Jr., M. E., RUSH III, T. S., ROMESBERG, F., SCHULTZ, P. G., SPIRO, T. G., Biochemistry 37 (1998), p. 779–782 69 WARSHEL, A., J. Biol. Chem. 273 (1998), p. 27035–27038 70 GORA, M., GRZYBOWSKA, E., RYTKA, J., LABBE-BOIS, R., J. Biol. Chem. 271 (1996), p. 11810–11816 71 CROSBY, J., STONE, R., LIENHARD, G. E., J. Am. Chem. Soc. 92 (1970), p. 2891–2900 72 O’LEARY, M. H., PIAZZA, G. J., Biochemistry 20 (1981), p. 2743–2748
73 ALSTON, T. A., ABELES, R. H., Biochemistry 26 (1987), p. 4082–4085 74 SUN, S. X., DUGGLEBY, R. G., SCHOWEN, R. L., J. Am. Chem. Soc. 117 (1995), p. 7317–7322 75 LEWIS, C., KRA¨ MER, T., ROBINSON, S., HILVERT, D., Science 253 (1991), p. 1019–1022 76 TARASOW, T. M., LEWIS, C., HILVERT, D., J. Am. Chem. Soc. 116 (1994), p. 7959– 7963 77 SMILEY, J. A., BENKOVIC, S. J., Proc. Natl. Acad. Sci. USA 91 (1994), p. 8319–8323 78 BARTLETT, P. A., MARLOW, C. K., Biochemistry 22 (1983), p. 4618–4624 79 STEWART, J. D., LIOTTA, L. J., BENKOVIC, S. J., Acc. Chem. Res. 26 (1993), p. 396–404 80 CHARBONNIER, J.-B., CARPENTER, E., GIGANT, B., GOLINELLI-PIMPANEAU, B., ESHHAR, Z., GREEN, B. S., KNOSSOW, M., Proc. Natl. Acad. Sci. USA 92 (1995), p. 11721–11725 81 GOLINELLI-PIMPANEAU, B., GIGANT, B., BIZEBARD, T., NAVAZA, J., SALUDJIAN, P., ZEMEL, R., TAWFIK, D. S., ESHHAR, Z., GREEN, B. S., KNOSSOW, M., Structure 2 (1994), p. 175–183 82 KRISTENSEN, O., VASSYLYEV, D. G., TANAKA, F., MORIKAWA, K., FUJII, I., J. Mol. Biol. 281 (1998), p. 501–511 83 CHARBONNIER, J.-B., GOLINELLI-PIMPANEU, B., GIGANT, B., TAWFIK, D. S., CHAP, R., SCHINDLER, D. G., KIM, S.-H., GREEN, B. S., ESHHAR, Z., KNOSSOW, M., Science 275 (1997), p. 1140–1142 84 GIGANT, B., CHARBONNIER, J.-B., ESHHAR, Z., GREEN, B. S., KNOSSOW, M., Proc. Natl. Acad. Sci. USA 94 (1997), p. 7857–7861 85 PATTEN, P. A., GRAY, N. S., YANG, P. L., MARKS, C. B., WEDEMAYER, G. J., BONIFACE, J. J., STEVENS, R. C., SCHULTZ, P. G., Science 271 (1996), p. 1086–1091 86 WEDEMAYER, G. J., WANG, L. H., PATTEN, P. A., SCHULTZ, P. G., STEVENS, R. C., J. Mol. Biol. 268 (1997), p. 390–400 87 WEDEMAYER, G. J., PATTEN, P. A., WANG, L. H., SCHULTZ, P. G., STEVENS, R. C., Science 276 (1997), p. 1665–1669 88 THAYER, M. M., OLENDER, E. H., ARVAI, A. S., KOIKE, C. K., CANESTRELLI, I. L., STEWART, J. D., BENKOVIC, S. J., GETZOFF, E. D., ROBERTS, V. A., J. Mol. Biol. 291 (1999), p. 329–345
37
38 Catalytic Antibody Technology
89 ZHOU, G. W., GUO, J., HUANG, W., FLETTERICK, R. J., SCANLAN, T. S., Science 265 (1994), p. 1059–1064 90 GUO, J., HUANG, W., ZHOU, G. W., FLETTERICK, R. J., SCANLAN, T. S., Proc. Natl. Acad. Sci. USA 92 (1995), p. 1694–1698 91 LESLEY, S. A., PATTEN, P. A., SCHULTZ, P. G., Proc. Natl. Acad. Sci. USA 90 (1993), p. 1160–1165 92 LINDNER, A. B., ESHHAR, Z., TAWFIK, D. S., J. Mol. Biol. 285 (1999), p. 421–430 93 TOMLINSON, I. M., WALTER, G., JONES, P. T., DEAR, P. H., SONNHAMMER, E. L., WINTER, G., J. Mol. Biol. 256 (1996), p. 813–817 94 WELLS, J. A., CUNNINGHAM, B. C., GRAYCAR, T. P., ESTELL, D. A., Philos. Trans. R. Soc. Lond. (A) 317 (1986), p. 415–423 95 BRYAN, P., PANTOLIANO, M. W., QUILL, S. G., HSIAO, H.-Y., POULOS, T., Proc. Natl. Acad. Sci. USA 83 (1986), p. 3743– 3745 96 MACBEATH, G., HILVERT, D., Chem. Biol. 3 (1996), p. 433–445 97 PADLAN, E. A., COHEN, G. H., DAVIES, D. R., Ann. Inst. Pasteur Immunol. C 136 (1985), p. 271–276 98 STRONG, R. K., PETSKO, G. A., SHARON, J., MARGOLIES, M. N., Biochemistry 30 (1991), p. 3749–3757 99 GRIFFITHS, G. M., BEREK, C., KAARTINEN, M., MILSTEIN, C., Nature 312 (1984), p. 271–275 100 ALZARI, P. M., SPINELLI, S., MARIUZZA, R. A., BOULOT, G., POLJAK, R. J., JARVIS, J. M., MILSTEIN, C., EMBO J. 9 (1990), p. 3807–3814 101 MIYASHITA, H., KARAKI, Y., KIKUCHI, M., FUJII, I., Proc. Natl. Acad. Sci. USA 90 (1993), p. 5337–5340 102 ROBERTS, V. A., STEWART, J., BENKOVIC, S. J., GETZOFF, E. D., J. Mol. Biol. 235 (1994), p. 1098–1116 103 STEWART, J. D., ROBERTS, V. A., THOMAS, N. R., GETZOFF, E. D., BENKOVIC, S. J., Biochemistry 33 (1994), p. 1994–2003 104 BACA, M., SCANLAN, T. S., STEPHENSON, R. C., WELLS, J. A., Proc. Natl. Acad. Sci. USA 94 (1997), p. 10063–10068 105 ARKIN, M. R., WELLS, J. A., J. Mol. Biol. 284 (1998), p. 1083–1094
106 JANDA, K. D., BENKOVIC, S. J., LERNER, R. A., Science 244 (1989), p. 437–440 107 BENKOVIC, S. J., ADAMS, J. A., BORDERS, C. L., JANDA, K. D., LERNER, R. A., Science 250 (1990), p. 1135–1139 108 FERNHOLZ, E., SCHLOEDER, D., LIU, K. K.-C., BRADSHAW, C. W., HUANG, H., JANDA, K. D., LERNER, R. A., WONG, C.-H., J. Org. Chem. 57 (1991), p. 4756–4761 109 STEWART, J. D., KREBS, J. F., SIUZDAK, G., BERDIS, A. J., SMITHRUD, D. B., BENKOVIC, S. J., Proc. Natl. Acad. Sci. USA 91 (1994), p. 7404–7409 110 JANDA, K. D., SCHLOEDER, D., BENKOVIC, S. J., LERNER, R. A., Science 241 (1988), p. 1188–1191 111 KREBS, J. F., SIUZDAK, G., DYSON, H. J., STEWART, J. D., BENKOVIC, S. J., Biochemistry 34 (1995), p. 720–723 112 CARTER, P., WELLS, J. A., Nature 332 (1988), p. 564–568 113 GIBBS, R. A., BENKOVIC, P. A., JANDA, K. D., LERNER, R. A., BENKOVIC, S. J., J. Am. Chem. Soc. 114 (1992), p. 3528–3534 114 LERNER, R. A., BENKOVIC, S. J., SCHULTZ, P. G., Science 252 (1991), p. 659–667 115 THORN, S. N., DANIELS, R. G., AUDITOR, M.-T. M., HILVERT, D., Nature 373 (1995), p. 228–230 116 SUGA, H., ERSOY, O., WILLIAMS, S. F., TSUMURAYA, T., MARGOLIES, M. N., SINSKEY, A. J., MASAMUNE, S., J. Am. Chem. Soc. 116 (1994), p. 6025–6026 117 TSUMURAYA, T., SUGA, H., MEGURO, S., TSUNAKAWA, A., MASAMUNE, S., J. Am. Chem. Soc. 117 (1995), p. 11390–11396 118 JANDA, K. D., LO, C.-H. L., LI, T., BARBAS, C. F., WIRSCHING, P., LERNER, R. A. Proc. Natl. Acad. Sci. USA 91 (1994), p. 2532–2536 119 JANDA, K. D., LO, L.-C., LO, C.-H. L., SIM, M.-M., WANG, R., WONG, C.-H., LERNER, R. A., Science 275 (1997), p. 945–948 120 WAGNER, J., LERNER, R. A., BARBAS, C. F., Science 270 (1995), p. 1797–1800 121 LO, C.-H. L., WENTWORTH Jr., P., JUNG, K. W., YOON, J., ASHLEY, J. A., JANDA, K. D., J. Am. Chem. Soc. 119 (1997), p. 10251–10252 122 GAO, Cs., LAVEY, B. J., LO, C.-H. L., DATTA, A., WENTWORTH Jr., P., JANDA, K. D., J. Am. Chem. Soc. 120 (1998), p. 2211–2217
References 123 BARBAS III, C. F., HEINE, A., ZHONG, G., HOFFMANN, T., GRAMATIKOVA, S., BJO¨ RNESTEDT, R., LIST, B., ANDERSON, J., STURA, E. A., WILSON, I. A., LERNER, R. A., Science 278 (1997), p. 2085–2092 124 HORECKER, B. L., TSOLAS, O., LAI, C.-Y., In The Enzymes, ed. P. D. BOYER, 7, Academic Press, New York 1975, p. 213–258 125 MARSH, J. J., LEBHERZ, H. G., Trends Biochem. Sci. 17 (1992), p. 110–113 126 HIGHBARGER, L. A., GERLT, J. A., KENYON, G. L., Biochemistry 35 (1996), p. 41–46 127 HOFFMANN, T., ZHONG, G., LIST, B., SHABAT, D., ANDERSON, J., GRAMATIKOVA, S., LERNER, R. A., BARBAS III, C. F., J. Am. Chem. Soc. 120 (1998), p. 2768–2779 128 ZHONG, G., HOFFMANN, T., LERNER, R. A., DANISHEFSKY, S., BARBAS, III C. F., J. Am. Chem. Soc. 119 (1997), p. 8131–8132 129 LIST, B., SHABAT, D., ZHONG, G., TURNER, J. M., LI, A., BUI, T., ANDERSON, J., LERNER, R. A., BARBAS III, C. F., J. Am. Chem. Soc. 121 (1999), p. 7283–7291 130 LIST, B., SHABAT, D., BARBAS III, C. F., LERNER, R. A., Chem. Eur. J. 4 (1998), p. 881–885 131 PENTOET E. E., KOCHMAN, M., RUTTER, W. J., Biochemistry 8 (1969), p. 4396–4402 132 HAMMERSTEDT, R. H., M¨OHLER, H., DECKER, K. A., ERSFELD, D. E., WOOD, W. A., Methods Enzymol. 42C (1975), p. 258–264 133 BRANDEN, C., TOOZE, J., Introduction to Protein Structure, Garland Publishing, Inc., New York 1991 134 WILSON, I. A., STANFIELD, R. L., Curr. Opin. Struct. Biol. 3 (1993), p. 113–318 135 DAVIES, D. R., PADLAN, E. A., SHERIFF, S., Annu. Rev. Biochem. 59 (1990), p. 439–473 136 ALT, F. W., BLACKWELL, T. K., YANCOPOULOS, G. D., Science 238 (1987), p. 1079–1087 137 RAJEWSKY, K., F¨ORESTER, I., CUMANG, A., Science 238 (1987), p. 1088–1094 138 TSOU, C. L., Ann. N. Y. Acad. Sci. 864 (1998), p. 1–8 139 POST, C. B., RAY, W. J., Jr., Biochemistry 34 (1995), p. 15881–15885 140 METS, B., WINGER, G., CABRERA, C., SEO, S., JAMDAR, S., YANG, G., ZHAO, K., BRISCOE, R. J., ALMONTE, R., WOODS,
141
142
143 144
145
146 147
148 149
150 151
152
153 154
155 156
157
J. H., LANDRY, D. W., Proc. Natl. Acad. Sci. USA 95 (1998), p. 10176– 10181 YANG, G., CHUN, J., ARAKAWA-URAMOTO, H., WANG, X., GAWINOWICZ, M. A., ZHAO, K., LANDRY, D. W., J. Am. Chem. Soc. 118 (1996), p. 5881–5890 CRAVATT, B. F., ASHLEY, J. A., JANDA, K. D., BOGER, D. L., LERNER, R. A., J. Am. Chem. Soc. 116 (1994), p. 6013–6014 JANDA, K. D., SHEVLIN, C. G., LERNER, R. A., Science 259 (1993), p. 490–493 JANDA, K. D., SHEVLIN, C. G., LERNER, R. A., J. Am. Chem. Soc. 117 (1995), p. 2659–2660 SHABAT, D., ITZHAKY, H., REYMOND, J.-L., KEINAN, E., Nature 374 (1995), p. 143– 146 LI, T., JANDA, K. D., LERNER, R. A., Nature 379 (1996), p. 326–327 REYMOND, J.-L., REBER, J.-L., LERNER, R. A., Angew. Chem. Int. Ed. Engl. 33 (1994), p. 475–477 SINHA, S. C., KEINAN, E., J. Am. Chem. Soc. 117 (1995), p. 3653–3654 HARLOW, E., LANE, D., Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory 1988 LOWRY, D., MURPHY, S., GOFFE, R. A., J. Biotechnol. 36 (1994), p. 35–38 JACKSON, L. R., TRUDEL, L. J., FOX, J. G., LIPMAN, N. S., J. Immunol. Methods 189 (1996), p. 217–231 MA, J. K. C., HIATT, A., HEIN, M., VINE, N. D., WANG, F., STABILA, P., Van DOLLEWEERD, C., MOSTOV, K., LEHNER, T., Science 268 (1995), p. 716–719 HIATT, A., MA, J. K. C., FEBS Lett. 307 (1992), p. 71–75 CARTER, P., KELLEY, R. F., RODRIGUES, M. L., SNEDECOR, B., COVARRUBIAS, M., VELLIGAN, M. D., WONG, W. L. T., ROWLAND, A. M., KOTTS, C. E., CARVER, M. E., YANG, M., BOURELL, J. H., SHEPHARD, H. M., HENNER, D., BioTechnology 10 (1992), p. 163–167 WALLACE, M. B., IVERSON, B. L., J. Am. Chem. Soc. 118 (1996), p. 251–252 WALSH, C., Enzymatic Reaction Mechanisms, W.H. Freeman and Company, New York 1979 IVERSON, B. L., IVERSON, S. A., ROBERTS, V. A., GETZHOFF, E. D., TAINER, J. A.,
39
40 Catalytic Antibody Technology
158
159 160
161 162
163
164
165
166
167
168
169 170
171
172 173
174
BENKOVIC, S. J., LERNER, R. A., Science 249 (1990), p. 659–662 STEWART, J. D., ROBERTS, V. A., CROWDER, M. W., GETZOFF, E. D., BENKOVIC, S. J., J. Am. Chem. Soc. 116 (1994), p. 415–416 IVERSON, B. L., LERNER, R. A., Science 243 (1989), p. 1184–1188 SHOKAT, K. M., LEUMANN, C. J., SUGASAWARA, R., SCHULTZ, P. G., Angew. Chem. Int. Ed. Engl. 27 (1988), p. 1172–1174 NIMRI, S., KEINAN, E., J. Am. Chem. Soc. 121 (1999), in press COCHRAN, A. G., SCHULTZ, P. G., J. Am. Chem. Soc. 112 (1990), p. 9414–9415 COCHRAN, A. G., PHAM, T., SUGASAWARA, R., SCHULTZ, P. G., J. Am. Chem. Soc. 113 (1991), p. 6670–6672 EISENTHAL, R., DANSON, M. J., Enzyme Assays A Practical Approach. IRL Press, Oxford 1993 FENNIRI, H., JANDA, K. D., LERNER, R. A., Proc. Natl. Acad. Sci. USA 92 (1995), p. 2278–2282 LEE, I., STEWART, J. D., ZHONG, W., BENKOVIC, S. J., Anal. Biochem. 230 (1995), p. 62–67 TAWFIK, D. S., GREEN, B. S., CHAP, R., SELA, M., ESHHAR, Z., Proc. Natl. Acad. Sci. USA 90 (1993), p. 373–377 TAWFIK, D. S., CHAP, R., GREEN, B. S., SELA, M., ESHHAR, Z., Proc. Natl. Acad. Sci. USA 92 (1995), p. 2145–2149 CHISWELL, D. J., MCCAFFERTY, J., Trends Biotechnol. 10 (1992), p. 80–84 MARKS, J. D., HOOGENBOOM, H. R., GRIFFITHS, A. D., WINTER, G., J. Biol. Chem. 267 (1992), p. 16007–16010 HUSE, W. D., SASTRY, L., IVERSON, S. A., KANG, A. S., ALTING-MEES, M., BURTON, D. R., BENKOVIC, S. J., LERNER, R. A., Science 246 (1989), p. 1275–1281 SMITH, G. P., PETRENKO, V. A., Chem. Rev. 97 (1997), p. 391–410 JESTIN, J. L., KRISTENSEN, P., WINTER, G., Angew. Chem. Int. Ed. Engl. 38 (1999), p. 1124–1127 GAO, C., LIN, C. H., LO, C.-H. L., MAO, S., WIRSCHING, P., LERNER, R. A., JANDA, K. D., Proc. Natl. Acad. Sci. USA 94 (1997), p. 11777–11782
175 PEDERSEN, H., HOLDER, S., SUTHERLIN, D. P., SCHWITTER, U., KING, D. S., SCHULTZ, P. G., Proc. Natl. Acad. Sci. USA 95 (1998), p. 10523–10528 176 TAWFIK, D. S., GRIFFITHS, A. D., Nat. Biotechnol. 16 (1998), p. 652–656 177 SOUMILLION, P., JESPERS, L., BOUCHET, M., MARCHAND-BRYNAERT, J., WINTER, G., FASTREZ, J., J. Mol. Biol. 237 (1994), p. 415–422 178 MILLER, G. P., POSNER, B. A., BENKOVIC, S. J., Bioorg. Med. Chem. 5 (1997), p. 581–590 179 STEMMER, W. P. C., Nature 370 (1994), p. 389–391 180 CRAMERI, A., CWIRLA, S., STEMMER, W. P. C., Nat. Med. 2 (1996), p. 100– 102 181 MARTINEAU, P., JONES, P., WINTER, G., J. Mol. Biol. 280 (1998), p. 117–127 182 CONN, M. M., PRUDENT, J. R., SCHULTZ, P. G., J. Am. Chem. Soc. 118 (1996), p. 7012–7013 183 LI, Y. F., SEN, D., Nat. Struct. Biol. 3 (1996), p. 743–747 184 LI, Y. F., SEN, D., Biochemistry 36 (1997), p. 5589–5599 185 PRUDENT, J. R., UNO, T., SCHULTZ, P. G., Science 264 (1994), p. 1924–1927. 186 MORRIS, K. N., TARASOW, T. M., JULIN, C. M., SIMONS, S., HILVERT, D., GOLD, L., Proc. Natl. Acad. Sci. USA 91 (1994), p. 13028–13032 187 NARLIKAR, G. J., HERSCHLAG, D., Annu. Rev. Biochem. 66 (1997), p. 19–59 188 WILSON, D. S., SZOSTAK, J. W., Annu. Rev. Biochem. 68 (1999), p. 611–647 189 TARASOW, T. M., TARASOW, S. L., EATON, B. E., Nature 389 (1997), p. 54–57 190 BARTEL, D. P., SZOSTAK, J. W., Science 261 (1993), p. 1411–1418 191 EKLAND, E. H., SZOSTAK, J. W., BARTEL, D. P., Science 269 (1995), p. 364–370 192 HANES, J., PLU¨ CKTHUN, A., Proc. Natl. Acad. Sci. USA 94 (1997), p. 4937– 4942 193 HANES, J., JERMUTUS, L., WEBER-BORNHAUSER, S., BOSSHARD, H. R., PLU¨ CKTHUN, A., Proc. Natl. Acad. Sci. USA 95 (1998), p. 14130– 14135 194 HE, M. Y., TAUSSIG, M. J., Nucleic Acids Res. 25 (1997), p. 5132–5134
References 195 ROBERTS, R. W., SZOSTAK, J. W., Proc. Natl. Acad. Sci. USA 94 (1997), p. 12297–12302 196 NEMOTO, N., MIYAMOTO-SATO, E., HUSIMI, Y., YANAGAWA, H., FEBS Lett. 414 (1997), p. 405–408 197 MACBEATH, G., KAST, P., HILVERT, D., Science 279 (1998), p. 1958–1961 198 MINSHULL, J., STEMMER, W. P. C., Curr. Opin. Chem. Biol. 3 (1999), p. 284–290 199 ARNOLD, F. H., Acc. Chem. Res. 31 (1998), p. 125–131 200 SHOKAT, K. M., LEUMANN, C. J., SUGASAWARA, R., SCHULTZ, P. G., Nature 338 (1989), p. 269–271 201 TAWFIK, D. S., LINDNER, A. B., CHAP, R., ESHHAR, Z., GREEN, B. S., Eur. J. Biochem. 244 (1997), p. 619–626
202 ZEMEL, R., SCHINDLER, D. G., TAWFIK, D. S., ESHHAR, Z., GREEN, B. S., Mol. Immunol. 31 (1994), p. 127–137 203 LEE, A. Y., KARPLUS, A. P., GANEM, B., CLARDY, J., J. Am. Chem. Soc. 117 (1995), p. 3627–3628 204 ESNOUF, R. M., J. Mol. Graph. Model. 15 (1997), p. 132–134, 112–113 205 MERRITT, E. A., MURPHY, M. E. P., Acta Crystallogr. D 50 (1994), p. 869–873 206 NICHOLLS, A., SHARP, K. A., HONIG, B., Proteins Struct. Funct. Genet. 11 (1991), p. 281–296 207 JOO, H., LIN, Z. L., ARNOLD, F. H., Nature 399 (1999), p. 670–673 208 TAYLOR, S. V., KAST, P., HILVERT, D., Angew. Chem. Int. Ed. 40 (2001), 3310–3335
41
1
Structure and Function of Catalytic Antibodies Nicholas A. Larsen Harvard Medical School, Boston, USA
Ian A. Wilson The Scripps Research Institute, La Jolla, USA
Originally published in: Catalytic Antibodies. Edited by Ehud Keinan. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30688-6
1 Introduction
An explosion of antibody structures has emerged in the last 15 years [1, 2]. A subset of these antibodies is catalytic, and the latest version of the Protein Data Bank (PDB) (03/2003) contains 56 such structures representing 27 different antibodies (Table 1). Catalytic antibodies are elicited purposely by immunization with a transition state analog. In principle, antibodies that bind such an analog may also stabilize the true transition state of a chemical reaction relative to the ground state, thereby effecting catalysis [3–7]. Remarkably, the immunoglobulin fold possesses the inherent plasticity to harness the usual combination of electrostatic, van der Waals, hydrophobic, hydrogen bond, and cation-π interactions to generate tailor-made catalysts, which, in rare instances, even rival their natural enzyme counterparts. This general plasticity has been exploited to create model systems that explore a number of common and controversial themes in classical enzymology. Thus, substrate strain, approximation, control of the microenvironment, general acid-base chemistry, induced fit, and covalent catalysis have all been systematically examined in the context of the antibody binding pocket. The main objective of structural analysis is to develop hypotheses about key interactions in the binding pocket for binding substrate, transition state (TS), and product, and to propose plausible mechanisms for catalysis. For some antibodies, these hypotheses have been examined rigorously by biochemistry and mutagenesis (Table 1). Mutagenesis studies allow further quantification of the key interactions and provide an avenue for exploring the routes through the energy landscape that confer catalytic potential to the germline antibody precursor. Structural studies of
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
∗∗)
for diene and dienophile, respectively, equivalent to the so-called effective molarity. ∗∗∗) for substrate and cofactor (periodate), respectively.
∗)
0.28 0.30 0.34 0.08 0.64 0.46 0.05 0.39 0.05 0.004 0.22 3, 10*) 2, 30*) 1, .7* 0.17 0.26 0.27 .04, .25 ***) 0.4 0.36 0.32 0.23 0.21 0.15 0.074 0.051 1.3
3.6 1.0 0.07 0.4 60 220 1500 5.5 0.12 0.11 2.2 0.001 13 40 0.24 1.5 1.4 490 19.2 1.7 0.02 0.02 0.003 0.41 0.02 0.07 0.0016
D2.3 D2.4 D2.5 CNJ206 29G11 17E8 43C9 48G7 6D9 7C8 15A10 13G5 1E9 39-A11 9D9 10F11 33F12 28B4 21D8 5C8 19A4 4C6 1D4 7G12 AZ-28 1F7 4B2
Ester hydrol. pH = 8.3 Ester hydrol. pH = 8.3 Ester hydrol. pH = 8.3 Ester hydrol. pH = 8.0 Ester hydrol. pH = 9.5 Ester hydrol. pH = 9.5 Ester hydrol. pH = 9.3 Ester hydrol. pH = 8.2 Ester hydrol. pH = 8.0 Ester hydrol. pH = 8.0 Ester hydrol. pH = 8.0 Diels-Alder Diels-Alder Diels-Alder Retro-Diels-Alder Retro-Diels-Alder Retro-aldol condensation Sulfide oxidation Decarboxylation Cyclization Cationic cyclization Cationic cyclization Syn-elimination Metal chelatase Oxy-Cope Chorismate mutase Allylic isomerization
kcat (min-1 ) KM(mM)
Antibody Reaction
Table 1 Summary of catalytic antibody structures
2.00 – 2.40 2.15 1.90–2.80 1.61 – 2.10 2.00 – 2.50 2.70 1.30–2.45 1.80–1.85 1.60–2.60 2.00 – 2.80 3.00 1.87
1.85 – 2.10 3.10 2.20 3.20 2.30 2.50 2.20–2.30 1.95–2.70 1.80 2.20 2.35 1.95 1.90 2.10–2.40
x9 1YEC x1 1YED x1 1YEE x1 1KNO x1 1A0Q x1 1EAP x2 43CA x4 1GAF x2 1HYX x1 1CT8 x1 1NJ9 x1 1A3L x1 1C1E x2 1A4K x1 x4 1LO0 x1 1AXT x4 1KEL x2 1C5C x3 25C8 x1 1CF8 x2 1ND0 x2 1JGV x5 3FCT x4 1AXS x1 1FIG x1 1F3D
Yes – – – – Yes Yes Yes – – Yes – – Yes – – Yes Yes Yes – – – – Yes Yes – –
1.3 × 105 3.6 × 104 2.5 × 103 1.6 × 103 2.2 × 103 8.3 × 103 2.7 × 104 1.6 × 104 9.0 × 102 7.1 × 102 2.3 × 104 7 M**) 103M**) 0.4 M**) 4.0 × 102 2.5 × 103 1.7 × 107 9 M**) 2.0 × 104 ND ND ND ND – 5.3 × 103 2.5 × 102 1.5 × 103
1.1 × 105 3.3 × 104 1.3 × 103 0.7 × 102 2.4 × 104 9.0 × 102 6.8 × 104 8.7 × 104 9.0 × 102 1.2 × 10 2.2 × 104 – – – 4.1 × 102 2.5 × 103 – – 7.5 × 103 – – – – – 4.4 × 103 8.5 × 101 4.3 × 102
PDB ID
kcat kuncat -1 KM KDTSA −1 Mutant analysis Res. (Å)
[28] [28] [28] [26] [23] [23, 24] [27] [130] [20, 131] [131] [132, 133] [91] [94] [92, 93] [88, 134] [88, 134] [115–117, 135] [66, 68, 69] [14, 62, 65, 136, 137] [138] [16, 109] [17, 107] [100, 101] [70, 111, 113] [80, 81] [74, 77] [71, 72]
Refs.
2 Structure and Function of Catalytic Antibodies
2 Electrostatic Complementarity
non-catalytic germline precursors further illustrate how the antibodies evolve their catalytic potential by affinity maturation. Such a thorough understanding of these interactions is necessary if improvements are to be made to the antibody catalyst by either mutation or redesign of the immunizing hapten. Importantly, comparison of antibody structures has revealed common structural motifs that are elicited by different haptens during the immune response. Thus, convergent and divergent evolution has been observed directly within the immune system. Most of the catalytic antibody structures found in the are complexes with their respective transition state analog, and provide detailed views of the interactions that contribute to catalysis. In nearly all cases, a high degree of electrostatic and shape complementarity is observed for the corresponding transition state analog (TSA), and, in rarer instances, putative reactive residues that initiate covalent catalysis have been proposed. Consequently, this chapter summarizes antibody-catalyzed reactions in the general context of these three broad headings and concludes with a section on future challenges in the abzyme field for structural biologists and chemists alike.
2 Electrostatic Complementarity
Many antibody-TSA complexes exhibit a high degree of electrostatic complementarity. Recurring binding motifs for anions (e.g., phosphate, arsenate, sulfate) have been identified, indicative of structural convergence from a variety of germline sequences [8–14]. In contrast, general motifs for binding cations (e.g., primary, secondary, tertiary amines) have not yet been identified. Eliciting an Asp/Glu carboxylate using hapten baiting strategies has met with mixed success. Frequently, cations are buried preferentially in hydrophobic pockets but surrounded by aromatic side chains, which are believed to form stabilizing cation-π interactions [15–17].
2.1 Anionic Binding Motifs and Ester Hydrolysis
Nearly half of the catalytic antibody structures found in the PDB correspond to various hydrolytic antibodies in complex with substrate analogs and products (Table 1). Thus, hydrolytic antibodies are by far the most thoroughly studied catalytic antibodies from both a structural and a biochemical vantage, and have already been reviewed extensively in the literature [18, 19]. These hydrolytic antibodies have been raised generally against either phosphonate or phosphonamidate transition state analogs (Scheme 1). The phosphonate moiety is tetrahedral and carries a negative charge, thereby mimicking the transition state for solvent-mediated hydrolysis that involves nucleophilic attack by a water molecule. With the exception of 6D9 [20] 7C8, and 15A10, these hydrolytic antibodies catalyze the hydrolysis of benzyl or
3
4 Structure and Function of Catalytic Antibodies
Scheme 1 Hydrolytic Antibodies 48G7, 17E8, 29G11, CNJ206, 43C9, D2.3, D2.4, D2.5
activated 4-nitrophenol or 4-nitrobenzyl esters (Scheme 1), which are particularly convenient model substrates for both synthetic manipulation and sensitivity during hybridoma screening. 2.2 Structural Motifs
The first structural evidence that antibodies recognize antigens by electrostatic complementarity originates from the pioneering study of mAb McPC603, which binds phosphocholine [8, 9]. Further data have indicated that electronically-related pheny-larsonate compounds elicit a conserved anionic binding motif from the immune repertoire [10–13]. Likewise, crystal structures of catalytic antibodies 48G7, 17E8, 29G11, CNJ206, and 43C9 reveal a recurring motif that coordinates the phosphonate moiety of the hapten (Fig. 1a, b) [18, 19]. In general, the constellation of side-chains derived from L96 (CDR3), H33, H35 (CDR1), and the backbone amide of H96 (CDR3) constitute an anionic binding site, arising from structural convergence within the antibody binding pocket (Fig. 1a, b) [18, 19]. For example, antibody 48G7 utilizes ArgL96, TyrH33, HisH35, and the backbone amide of H96 for phosphonate binding (Fig. 1a) [21, 22]. Antibodies 17E8 and 29G11 are related to each other (92% sequence identity [23]) and use the same residues for phosphonate binding [24, 25] as 48G7, except that a hydrogen bond from the LysH93-Nz amine substitutes for the observed TyrH33 hydroxyl in 48G7. Antibody CNJ206 also similarly uses hydrogen bonds from HisH35 and the backbone amides of H96 and H97 to bind the phosphonate [26]. Although a crystal structure of 43C9 in complex with the transition state analog is not available, it is proposed that HisL91, ArgL96, AsnH33, and HisH35 likewise contribute to the anionic binding site (Fig. 1b) [27]. 2.3 Mechanistic Considerations
Clearly, aryl phosphonate/arsonate/sulfonate-like haptens preferentially elicit electrostatic complementarity from the immune response and may explain the commonly observed binding motif observed in 48G7, 17E8, 29G11, CNJ206, 43C9 and other non-catalytic antibodies. For the hydrolytic antibodies, the significance of the
2 Electrostatic Complementarity Fig. 1. Top view of hydrolytic antibodies. The light chain is represented in pink and the heavy chain in blue; side chains are brown and the ligand is yellow. Hydrogen bonds are represented as dashed green lines. A The hapten complex of 48G7 illustrates the prototypic anionic binding motif composed, for the most part, of heavy chain residues. B 43C9 likely has a similar motif as 48G7. The only available structure is a complex with the product p-nitrophenol. HisL91 is proposed to form an acylated intermediate at the N*1 atom. However, the interpretation of the kinetic and structural data is not fully convincing. C D2.3 has a unique anionic binding motif located on the light chain side of the antibody. The Tyr is proposed to form the key interaction responsible for oxyanion stabilization. Catalytic antibodies D2.3–5, 6D9, and 7C8 differ from this general theme. In D2.3, phosphonate binding arises from hydrogen bonds to AsnL34, TrpH95, and TyrH100D (Fig. 1c) [28]. Thus, the anionic binding site is located on the opposite side of the binding pocket and is composed of different residues. Finally, the phosphonate binding motifs in antibody 6D9 and 7C8 are completely different from those seen in any of the other antibodies and consist of a single, solvent-exposed, hydrogen bond donated from a single HisL27D or TyrH95, respectively [20, 29, 30]. Not surprisingly, these antibodies exhibit lower catalytic activity than any other structurallydetermined hydrolytic antibodies (Table 1). Nevertheless, 6D9 has been recently evolved to have 20-fold higher activity with the addition of a second hydrogen bond to the phosphonate [31].
anionic binding site is underscored by the general agreement between transition state stabilization relative to the ground state, K M K DTS −1 , and the observed rate acceleration, kcat kuncat −1 [32] (Table 1). This agreement implies that hapten-binding interactions observed in the crystal structures correspond to the interactions found in the transition state, and that binding energy is converted into catalysis with near 100% efficiency. Thus, the simplest possible mechanism for ester hydrolysis in this family of antibodies is oxyanion stabilization within the anionic binding site. Oxyanion stabilization has a differential impact on catalysis for various natural esterase and protease enzymes. For example, biochemical studies in a bacterial cocaine esterase have demonstrated that the oxyanion hole is critical, contributing at least 5 kcal mol−1 to catalysis [33]. Oxyanion stabilization is estimated to contribute 2–3 kcal mol−1 to catalysis in papain [34] and cutinase [35], 3-4 kcal mol−1
5
6 Structure and Function of Catalytic Antibodies
in prolyloligopeptidase [36] kcal mol−1 in subtilisin [37–39], and 5–7 kcal mol−1 in acetylcholinesterase [40] and Zn2+ peptidase [41]. These values are within the range ascribed to the oxyanion contribution in the hydrolytic antibodies. For example, a contribution of 5–7 kcal mol−1 from oxyanion stabilization was estimated for the D2.3 hydrolytic antibody [42]. Here, the oxyanion was formally assigned to TyrH100D from pH rate profile and the inactivity of a Tyr → Phe mutant (Fig. 1c) [42]. The alternative candidate, AsnL34, was excluded as the Asn → Gly mutant retained full activity [42]. The activity of this antibody is significantly enhanced by spiking the assay buffer with potent α-nucleophiles, such as peroxide [42]. Thus, the catalytic potential of D2.3 is far from realized, and could be increased by a carefully positioned general acid/base that could more effectively activate water for nucleophilic attack. Even with the more potent nucleophile, the activity of this antibody is apparently limited by product inhibition [42] and conformational isomerization [43]. A largely overlooked ramification of the known anionic binding sites is the asymmetry of the coordinating residues (Fig. 1a–c). This asymmetry means that attack of the substrate carbonyl carbon and oxyanion stabilization has enantiofacial selectivity [44], since the phosphonate oxygens are non-equivalent (prochiral). Thus, the lack of 18 O incorporation in the substrates of hydrolytic antibodies should not be regarded as evidence for an acyl enzyme intermediate [45], but rather that the oxygen atoms are not racemized in the stereoselective binding pocket after hydroxide attack. This enantiofacial selectivity has been noted in other naturally-occurring hydrolytic enzymes [46, 47]. 2.4 Mechanistic Pitfalls
There is an obvious temptation to ascribe complex mechanismsto hydrolytic antibodies to elevate them to the same status as their natural enzyme counterparts. In the case of 17E8, for example, the proximity of a serine and histidine to the phosphonate in the crystal structure led to the proposal that the mechanism involved a catalytic dyad, analogous to the familiar catalytic triad [24]. This structure-based hypothesis was further bolstered by hydroxylamine partitioning data and pH rate profiles that suggested an acyl enzyme intermediate [48]. Initially, this interpretation was criticized because the refined serine rotamer was not within hydrogen-bonding distance to the histidine in the putative dyad [18, 49], but alternative rotamers of a serine have been noted in other serine hydrolases [47, 50, 51]. Eventually, the acyl enzyme mechanism was effectively discounted by a Ser → Ala mutant, which led to twofold improvement in kcat over the wild-type [52]. Likewise, the hydrolytic mechanism for 43C9 was proposed to involve an unprecedented nucleophilic attack by the Nδx atom of HisL91 on the substrate to form an acylated histidine intermediate [53]. In serine hydrolases, the nucleophilicity of histidine N2 atom is enhanced by a buried aspartate that H-bonds to Nδ1 stabilizing the developing positive charge on the imidazole. In the 43C9 pocket, HisL91 N2 H-bonds to TyrL36, but the nearest charged residue is ArgL96 (Fig. 1b). Hence, the available structural data [27] do not
2 Electrostatic Complementarity
yet provide sufficient evidence for such a mechanism. Therefore, the interpretation of the kinetic, mutagenesis, and structural data [45, 53–56] should be carefully reexamined or, at the very least, regarded with skepticism until a direct observation of an acylated histidine intermediate is demonstrated in a crystal structure. 2.5 Structural and Functional Considerations for Hapten Design
The initial use of a phosphonate analog as the transition state for solvent-mediated hydrolysis (Scheme 1) provided the initial breakthrough in the catalytic antibody field [6, 7]. However, the persistent and ubiquitous use of such analogs to elicit hydrolytic antibodies has now become somewhat of a red herring. Phosphonate compounds are not transition state analog inhibitors for serine hydrolases [57]. Thus, it would be impossible to elicit a serine hydrolase-like antibody using a phosphonate analog. Nevertheless, the elicited antibodies are inevitably compared to serine hydrolases [24], or to hydrolase mutants where the entire catalytic triad has been mutated to alanine [58]. Fluorophosphonate compounds (nerve gas reagents, for example) are potent mechanistic inhibitors of serine esterases because of covalent modification of serine with phosphate and concomitant expulsion of fluoride – a good leaving group. Phosphonate and phosphonamidate compounds do not form the same modification in serine esterases, since O− is a poor leaving-group. In contrast, phosphonate compounds are potent transition state analog inhibitors for Zn2+ proteases [59], which utilize a completely different mechanism from that of serine hydrolases and probably do not involve covalent acylation by the enzyme [60]. Rather, Zn2+ is believed to facilitate water-mediated attack of the ester/amide carbonyl carbon with arginine stabilization of the oxyanion [41, 60]. Unfortunately, no design feature in a phosphonate analog would be likely to elicit Zn2+ binding in an antibody combining site. Thus, the likelihood of eliciting hydrolytic antibodies with either serine hydrolase or Zn2+ peptidase-like features would seem unlikely without some innovative new hapten design. 2.6 Anionic Binding Motifs and Control of the Microenvironment in Decarboxylation
Catalytic antibody 21D8 catalyzes a classic decarboxylation reaction (Scheme 2) with a charge-delocalized transition state. This reaction exhibits high sensitivity to the surrounding medium, where the first-order rate constant in aqueous solution is enhanced 106 -fold in aprotic dipolar solvents [61]. Thus, this reaction is a particularly useful model for studying medium effects and control of the microenvironment in a heterogeneous binding pocket [62, 63]. Antibody 21D8 accelerates the decarboxylation reaction by 2 × 104 (Table 1) [62, 64]. The hapten design incorporated two sulfonate moieties to mimic the carboxylate anion of the substrate. Not surprisingly, this antibody has an anionic binding site consisting of HisH35 and ArgL96 [14]. An additional anionic binding site was observed for the second sulfonate group on the opposite side of the binding pocket. However, assigning which of these anionic
7
8 Structure and Function of Catalytic Antibodies
Scheme 2 Decarboxylation Antibody 21D8
binding sites was relevant for binding to the substrate carboxylate was hindered because of the unfortunate symmetry of the hapten. Moreover, attempts at cocrystallization with the substrate were unsuccessful [14]. However, mutagenesis studies support the hypothesis that HisH35 and ArgL96 are indeed key substratebinding residues [65]. It is proposed that the antibody shields the substrate from solvent, but the tetrahedral hydrogen-bonding pattern of the anionic binding site is not fully optimized for binding a planar carboxylate [14, 65]. The pocket is polar with apparently non-optimal hydrogen bonds to the transition state, which may be reminiscent of a dipolar aprotic solvent. Such a model would explain, in part, how the antibody and other enzymes could control the microenvironment to facilitate catalysis. 2.7 Anionic Binding Motifs and the Periodate Cofactor in Sulfide Oxygenation
Catalytic antibody 28B4 catalyzes a bimolecular reaction between sodium periodate (NaIO4 ) and a sulfide to form a sulfoxide (Scheme 3, Table 1) [66]. Oxygenation reactions in naturally-occurring enzymes normally require flavin, heme, or other cofactors as well as NADPH to regenerate the cofactor [67]. Here, periodate is exploited as an artificial chemical cofactor, readily available at high excess and obviating the need for cofactor recycling. The oxygenation reaction is believed to involve nucle-
Scheme 3 Sulfide oxygenation Antibody 28B4
2 Electrostatic Complementarity
ophilic attack of a sulfur lone pair of electrons on a periodate oxygen through a polar transition state (Scheme 3). The amine in the hapten mimics the partial positive charge that develops on the sulfur, while the phosphate mimics the tetrahedral periodate. The crystal structure reveals an anionic binding site similar to that seen in the phosphocholine antibody McPC603 [8, 9] that coordinates periodate (Fig. 2a) [68]. However, the baiting strategy employed to stabilize the developing positive charge on the sulfur was not as effective as the postulated cation-π interaction with a nearby tyrosine does not have optimal geometry (Fig. 2a). Extensive mutagenesis has been performed on this antibody, and the putative germline antibody was also cloned and crystallized in complex with hapten [69]. Here, the p-nitrophenyl portion of the hapten adopts a completely different orientation in the binding pockets of the and affinity-maturated antibodies [69]. In addition, the free and bound forms of the germline antibody structure show significant conformational differences in CDRH3 and CDRL1, while comparison of the free and bound mature antibody shows no significant changes on binding [69]. Thus, the germline is postulated to exhibit wider structural plasticity than the mature antibody [69]. The loss of induced fit binding during the affinity maturation process has been noted in at least two other antibodies [22, 70], but does not rule out the importance of induced fit in affinity-matured antibodies [1].
Fig. 2. Top and side views of sulfide oxygenation and allylic isomerization antibodies, respectively. A Antibody 28B4 has an anion-binding pocket that binds the cofactor periodate in a biomolecular reaction with sulfide to form sulfoxide. This pocket was elicited by a phosphate functionality in the immunizing hapten. B Antibody 4B2 catalyzes an isomerization reaction of a $-( unsaturated ketone. GluL34 acts as general acid/base in the reaction by abstracting the proton from the substrate. Here, GluL34 interacts with the amidinium part of the hapten.
9
10 Structure and Function of Catalytic Antibodies
Scheme 4 Allylic isomerization Antibody 4B2
2.8 Cation Stabilization and Allylic Isomerization
Several antibodies have been elicited that are believed to have mechanistically important, negatively charged carboxylates in their binding pocket. In principle, such carboxylates may be elicited from a baiting strategy that incorporates positivelycharged substituents in the immunizing hapten. Few structures of this type have been determined, however, and for at least one of these antibodies (1D4), the baiting strategy failed. Antibody 4B2 catalyzes an allylic rearrangement of a β–γ unsaturated ketone (Scheme 4, Table 1). Formally, the reaction would be a [1, 3] sigmatropic migration of hydrogen. An amidinium group was incorporated in the hapten design to elicit a carboxylate general acid/base in the binding pocket (Scheme 4) [71]. The crystal structure of 4B2 shows that this baiting strategy was successful, as the two NH groups of the amidinium form H-bonds to each oxygen of the GluL34 carboxylate (Fig. 2b) [72]. The pH rate profile is bell shaped with optimal activity at pH 4.5, the approximate pK a of glutamic acid. An enol intermediate was proposed, based on lack of any antibody-catalyzed 2 H exchange between α and γ carbons in a deuterated substrate, implying that the reaction is not concerted, but involves exchange with solvent [71]. This is a reasonable hypothesis, although it is unlikely that any antibody residue would be positioned to stabilize this transient enol intermediate [72].
3 Shape Complementarity and Approximation
Shape complementarity simply refers to the similarity between the contours of the binding pocket molecular surface and that of its corresponding ligand. Certainly, it is possible to conclude from most of the catalytic antibody structural studies that exquisite shape complementarity is sufficient to explain the stereoselectivity for the majority of these catalysts. Thus, when the catalysts are challenged with chiral (or prochiral) substrates, chiral products are obtained, typically in high enantiomeric excess (ee). In addition, every reaction requires, to some extent, approximation to bring reactive centers together to either break or make new bonds. The antibody molecule is ideally tailored for this role, as it may be programmed to bind flexible substrates in a particular conformation that would otherwise be disfavored in solution, and can bring two substrates together in bimolecular reactions to increase the effective molarity of the substrates relative to one another. In principle,
3 Shape Complementarity and Approximation
approximation lowers the entropy barrier for activation and can effect catalysis. Thus, electrostatic and overall shape complementarity of the binding pocket for the transition state is a key feature for any catalyst and is one of the most important features cited for many catalytic antibody structures. However, approximation alone probably does not fully explain catalysis in all of these systems. Solvent accessibility, control of the microenvironment dielectric, and hydrogen-bonding potential are also important parameters – some reactions are accelerated simply by changing the solvent system. Indeed, not surprisingly, nonspecific interactions with bovine serum albumin (BSA) can accelerate some reactions [73]. In addition, key catalytic residues form specific interactions essential for the reaction to proceed. Catalytic residues have been elicited with mixed success by TSA baiting strategies, reactive immunization, and sometimes by pure chance. Notwithstanding, approximation will be a recurrent theme throughout the remaining sections. 3.1 Unimolecular Rearrangements 3.1.1 Antibody 1F7 Antibody 1F7 catalyzes the [3,3] sigmatropic Claisen rearrangement of chorismate to prephenate (Scheme 5, Table 1) [74]. This reaction is normally catalyzed by chorismate mutase in the biosynthesis of aromatic amino acids. The enzymatic reaction proceeds through an ordered chair-like transition state (S‡ ≈ −13 cal mol−1 K − 1 ) with a rate acceleration of 3 × 106 or four orders of magnitude better than the antibody [75]. Remarkably, even with this modest activity, over-expression of 1F7 complements growth of yeast auxotrophs [76]. The original structure was determined at 3.0 Å resolution from data merged from two crystals collected at room temperature [77]. More recent (unpublished) data at 2.4 Å show some differences in the detailed interactions between the TSA and Fab [77]. However, the general placement of the hapten on the heavy chain side of the antibody is maintained and illustrates a clear and interesting difference in the orientation of the TSA in the antibody pocket as opposed to the enzyme active site [78]. For the antibody, the hapten-binding orientation is determined by the placement of the linker used for covalent attachment to the immunogen, while the enzyme has no such restriction (Scheme 5). The enzyme also utilizes more H-bond interactions than the antibody, which are important for stabilizing the dipolar transition state [77]. Thus, the reaction is probably being catalyzed by preferentially binding the substrate in the
Scheme 5 Chorismate mutase Antibody 1F7
11
12 Structure and Function of Catalytic Antibodies
requisite reactive conformation, while poorly optimized hydrogen bond interactions may account for the inefficiency of the antibody catalyst. This theme recurs in which the substrate in an antibody is inserted upside down compared to that in the natural enzyme [47, 79]. 3.1.2 Antibody AZ-28 AZ-28 catalyzes a related [3,3]sigmatropic oxy-Cope rearrangement (Scheme 6, Table 1) [80]. The chair-like transition state is highly ordered, requiring overlap of the rearranging 4π+ 2σ orbitals of the diene (S‡ ≈ −15 cal mol−1 K−1 ). Antibodies raised against the appropriate, conformationally-restricted TSA would be expected to catalyze the reaction by acting as an “entropy trap.” In addition, appropriate alignment of the 2- and 5-phenyl rings are believed to contribute stereoelectronically to the reaction through hyperconjugation with the 3-hydroxyl (Scheme 6) [81]. This hyperconjugation may accelerate the oxy-Cope rearrangement by an anionic substituent effect. Indeed, the crystal structure shows that packing interactions with the 2- and 5-phenyl rings constrain the cyclohexyl group in the required chairlike transition state (Fig. 3). However, the cyclohexyl ring of the TSA is rotated ca. 80◦ out of plane of the 2- and 5-phenyl rings, so that the phenyl π-orbitals do not align appreciably with the 3-hydroxyl (Fig. 3) [81]. The orientation of the cyclohexyl group is determined by packing interactions, as well as by hydrogen bonds to the 3-hydroxyl (Fig. 3). Surprisingly, biochemical analysis revealed that six mutations occurred during affinity maturation, one of which dramatically decreases the catalytic efficiency [81]. Thus, the evolution of an antibody binding pocket with increased affinity to the TSA does not always correlate with improved catalysis. The germline-encoded antibody structure was determined and revealed an induced-fit mode of binding, as opposed to a lock-and-key type binding in the mature antibody. This apparent conformational flexibility and corresponding decrease in affinity is interpreted as evidence for possible torsional flexibility in the 2-phenyl substituent [82]. This flexibility, in turn, is postulated to facilitate π-overlap and explain the
Scheme 6 Oxy-Cope Antibody AZ-28
3 Shape Complementarity and Approximation Fig. 3. Side view of oxy-Cope antibody AZ-28. The cyclohexyl ring of the TSA is rotated out of plane with the 2- and 5-phenyl rings because of packing interactions and hydrogen bonds to the 3-hydroxyl. The consequent lack of hyperconjugation in the rearranging B-orbitals may explain the decreased rate acceleration in the affinity mature antibody, as opposed to the germline Fab.
improved catalysis. Studies to characterize the dynamics of the antibody catalyzed reaction would provide further evidence to support this hypothesis. 3.2 Bimolecular Rearrangements and the Diels-Alder Reaction
The D-A reaction is one of the most important carbon-carbon bond-forming reactions that is available for organic syntheses of complex natural products and therapeutic agents [83, 84]. Surprisingly, the D-A reaction has not been exploited widely in nature as only a handful of natural D-A enzymes are known [85]. The reaction is a pericyclic [4π+ 2π] cycloaddition of a diene and dienophile. An optimal D-A reaction occurs between an electron-rich diene (HOMO - highest occupied molecular orbital) and an electron-deficient dienophile (LUMO - lowest unoccupied molecular orbital). These abzyme-catalyzed bimolecular reactions are typically much slower and have much higher K M s than esterolytic antibodies (Table 1). The transition state for the D-A reaction is highly ordered for which S‡ is typically -30 to -40 entropy units [86]. Consequently, the recurring theme for these D-A antibodies is shape complementarity; in general, the antibodies with higher shape complementarity to the transition state are better catalysts. Thus, these antibodies may be considered as model systems for studying the effects of approximation in catalysis. 3.2.1 Retro-Diels-Alder Reaction Antibody 10 F11 catalyzes a unimolecular retro-D-A reaction that expels the dieneophile nitroxyl and diene anthracene (Scheme 7, Table 1). The dihedral angles between the phenyl rings in the substrate (110◦ ) and product (180◦ ) differ by 70◦ [87]. A TSA was designed with an intermediate dihedral angle of 140◦ [87]. Four crystal structures were determined including substrate analog (SA), transition state analog (TSA), and product analog (PA) [88]. The kinetics of the reaction suggest that binding energy is converted to catalysis with high efficiency (Table 1), with a clear hierarchy in the shape complementarity index between the antibody and ligands, ranked as TS > SA > PA [88]. This ranking supports the hypothesis that the antibody exerts strain on the substrate to preferentially stabilize the transition state.
13
14 Structure and Function of Catalytic Antibodies
Scheme 7 Retro Diels-Alder Antibody 10F11
Antibody 9D9 catalyzes the same reaction with slower kinetics, which could result from poorer overall shape complementarity [88]. The CDRH3 loop also undergoes a slight induced-fit con-formational change from binding the substrate to the transition state [88]. Finally, a water-mediated hydrogen bond was observed between the antibody backbone and the oxygen that is released from the substrate in the nitroxyl dieneophile product [88]. A second hydrogen bond is formed directly between the antibody backbone and the same oxygen of the substrate. These hydrogen bonds are postulated to act as Lewis acids that further stabilize the LUMO energy of the dieneophile and contribute to the regioselectivity of the retro-D-A reaction [88]. 3.2.2 Disfavored exo Diels-Alder Reaction For antibody 13G5, a novel hapten design was examined (Scheme 8, Table 1). Rather than using a conformationally-restricted TSA, an unrestrained ferrocene derivative was used to test whether the immune system would select a conformer that mimics one of the Diels-Alder transition states (Scheme 8) [89]. In principle, a panel of catalysts would be generated, each of which would selectively catalyze the formation of one of the four possible ortho-diastereomers. In fact, of all the antibodies examined, only one enantiomer was formed, indicative of possible structural bias in the immune response [90]. The catalyzed reaction proceeds with high regio-, diastereo-, and enantioselectivity to form the ortho-exo (S,S) product with ee > 95% after the background exo D-A reaction is subtracted [90]. The background endo:exo product distribution is 85:15 in PBS and 66:34 in toluene, so the exo-transition state is slightly disfavored by ca. 1 kcal mol−1 . The antibody rate acceleration is modest with a kcat(exo) kuncat(exo) −1 of 6.9 M and a kcat(exo) kuncat(endo) −1 of only 1.7 M [89, 91].
Scheme 8 Diels-Alder Antibody 13G5
3 Shape Complementarity and Approximation
Thus, the background reaction to form unwanted ortho-endo product will dominate the final product distribution of the antibody-catalyzed reaction unless sufficiently high concentrations of antibody catalyst are used [90]. The crystal structure of 13G5 shows the cyclopentadienyl ferrocene deeply buried in the antibody combining site [91]. Three H-bonds are observed between the TSA and TyrL36, AspH50, and AsnL91. TyrL36 was initially hypothesized to act as a Lewis acid, activating the dienophile for nucleophilic attack, while AsnL91 and AspH50 form H-bonds to the carboxylate side chain that substitutes for the diene substrate [91]. However, the limited rate accelerations indicate that catalytic residues in the binding pocket have a modest affect, if any, on the kinetics. Rather, the observed diastereo- and enantioselectivity of the antibody could be explained simply by exclusion of solvent (e.g., note toluene product distribution) and specific van der Waals interactions and hydrogen bonds that anchor the substrates in a particular orientation that favors ortho-exo (S,S) product formation due to approximation [86]. The performance of this catalyst may be explained by the relatively poor shape complementarity between the antibody and the true transition state of the reaction, which differs substantially from the hapten as the distance between the cyclopentadienyl rings in the ferrocene structure is ca. 3.3 Å. But what is truly remarkable is that a specific antibody could be generated against such a flexible hapten. 3.2.3 Endo Diels-Alder Reactions Antibody 39-A11 was elicited from immunization with a constrained bicyclic hapten (Scheme 9), and selectively accelerates formation of the kinetically-favored endo product [92]. The acceleration over the background or effective molarity is modest (Table 1). The X-ray structure of the hapten complex indicates that the antibody would bind the diene and dienophile in a reactive conformation, as preprogrammed in the hapten design strategy. The structure also suggests that the catalyst is enantioselective, although an ee has not been determined [93]. AsnH35 forms a hydrogen bond with the dienophile, which could make it more electron deficient and, thereby, more reactive [93]. Interestingly, any evident shape complementarity is lacking between antibody and the endo transition state; the binding pocket can accommodate either the endo or exo transition states (Fig. 4a). Mutations were designed to improve the packing interactions around the kinetically-favored endo transition state, and three of six mutants led to a 5- to 10-fold improvement in kcat [93]. This successful structure-based engineering is a significant and a rare accomplishment in this field. Also, only two somatic mutations were required to generate this catalytic antibody from the germline precursor. The most relevant of these two mutations is SerL91Val (CDR3), located in the binding pocket. The K DTSA for the germline antibody is 380 nM compared to 130 nM in the mature 39-A11. As expected, the catalytic proficiency of the germline antibody is also slightly decreased [93]. In order to circumvent substrate or product inhibition, more sophisticated hapten design strategies incorporate extraneous chemical groups as a negative design, while still promoting high affinity interactions to the transition state (Scheme 10) [94]. 1E9 was originally developed to evaluate the role of proximity
15
16 Structure and Function of Catalytic Antibodies
Scheme 9 Diels-Alder Antibody 39-A11
effects in catalysis, and is the most efficient D-A catalytic antibody yet developed (Table 1) with an effective molarity (EM) of 103 M, where the theoretical limit is 108 M [95]. Two structural features can explain the mechanism of the observed antibody catalysis. First is the conserved H-bond formed between AsnH35 and the succinimide carbonyl oxygen of the hapten, which represents the carbonyl oxygen of the dienophile substrate (Scheme 10). Since an H-bond to the carbonyl oxygen should cause the dieneophile to be electron deficient, it is more susceptible to the attack by the diene. Second, the binding pocket features almost perfect van der
Fig. 4 Molecular surface representation of Diels-Alder antibodies. Antibody 39-A11 (A) and 1E9 (B) are derived from the same germline precursor, yet exhibit a large difference in activity. This difference is attributed to the poor shape complementarity of the
antibody for the transition state in 39-A11 compared to 1E9. Mutations were designed in 39-A11 to improve the shape complementarity, and several improved catalysts were identified.
4 Shape Complementarity and Control of the Reaction Coordinate
Scheme 10 Diels-Alder Antibody 1E9
Waals complementarity to the hapten, providing a clear structural demonstration of approximation in bimolecular catalysis (Fig. 4b) [96]. Interestingly, 1E9, antiprogesterone antibody DB3, and the D-A antibody 39A11 are derived from the same germline gene [93, 97–99]. Structures of these antibodies thus represent possible snapshots in the evolution of substrate binding and catalytic efficiency [94]. Subtle differences in the evolution of their binding pockets through somatic mutations have resulted in antibodies that specifically bind progesterone or catalyze a D-A reaction. The exceptional efficiency of 1E9 is hypothesized to arise from a rare somatic mutation that significantly deepens the active site, creating exquisite complementarity to the transition state [94].
4 Shape Complementarity and Control of the Reaction Coordinate
In many reactions, several products may be formed, but one product predominates because it is favored kinetically. The energy of activation is much lower for the favored product, which reduces the likelihood of forming the disfavored product. If the activation barrier for the reverse reaction is sufficiently large, then the reaction is essentially irreversible, or under kinetic control. Similarly, for some reactions, many possible products may result (e.g., the D-A reaction or cationic cyclization), and complex mixtures result. The antibody catalyst can steer a reaction through a complex reaction coordinate to alter the final reaction profile, often forming exclusively a single product, and, in some cases, a kinetically-disfavored one. 4.1 Syn-Elimination
Antibody 1D4 is a model system for studying the selective catalysis of a disfavored syn-elimination reaction toform a cis product (Scheme 11, Table 1) [100]. The TSA design incorporated a rigid bicyclic ring structure to constrain the two phenyl groups of the substrate in the eclipsed conformation that characterizes the transition state (Scheme 11). In addition, the α-keto proton that is abstracted during elimination was replaced with a positively-charged amine in the hapten to elicit a complementary general base in the antibody combining site. Although
17
18 Structure and Function of Catalytic Antibodies
Scheme 11 Syn-elimination Antibody 1D4
the catalyzed reaction proceeds slowly (Table 1) [100], the background elimination reaction to the cis-product is undetectable in the absence of antibody; the competing anti-elimination reaction (k = 2.5 × 10−4 min−1 ) dominates the background reaction, such that the product distribution is entirely trans. Crystal structures of free and bound 1D4 show a conformational change in CDR H2, indicative of induced fit [101]. The pocket exhibits high shape complementarity to the TSA, which accounts for the selectivity for the eclipsed transition-state of the reaction (Fig. 5) [101]. Hence, the antibody combining site likely utilizes the majority of its binding energy toward overcoming the extreme steric barriers, and, hence, stabilizing the eclipsed transition state. The key mechanistic issue for any βelimination process is whether the reaction is E1, E2, or E1cB [102]. Unfortunately, a crystal structure alone cannot resolve such a question. A water molecule was modeled into electron density that is near the hapten amine [101] and within
Fig. 5 Electrostatic potential mapped to the molecular surface of 1D4. The exquisite shape complementarity for the phenyl rings that characterize the eclipsed transition state of the reaction accounts for the selectivity of the catalyst.
4 Shape Complementarity and Control of the Reaction Coordinate
hydrogen-bonding distance to a histidine. Thus, HisH58 is proposed to promote catalysis through interaction with the water [101]. 4.2 Disfavored Ring Closure
Antibody 5C8 catalyzes the disfavored endo-tet cyclization reaction inviolation of Baldwin’s rules for ring closure (Scheme 12) [15]. The N-methylpiperidinium hapten was anticipated to elicit negative charge in the binding pocket that would stabilize the developing positive charge at the oxirane carbon of the substrate. Several changes were observed between the bound and unbound structures, indicative of induced fit. The largest rearrangement was observed in the backbone and side chain orientations of CDR H3 with the largest differences seen for the two tyrosine residues at the tip of the loop, which fold into the binding site, creating a solvent-inaccessible cavity. AspH95 and AspH101 bind the positive charge of the quaternary amine like a pair of tweezers (Fig. 6) [15]. In addition, the positive charge appears to be stabilized by a cation-π interaction with TyrL91. AspH95 and HisL89 were hypothesized to effect rudimentary general acid/base catalysis (Fig. 6). In the proposed mechanism, AspH95 forms an H-bond to the epoxide-oxygen, favoring the oxirane ring opening, while HisL89 forms an H-bond with the alcohol to promote nucleophilic SN 2-like attack to form the disfavored six-membered ring [15]. This stabilization of the positive charge that develops along the reaction coordinate appears to be an important factor for the rate enhancement and for directing the reaction along the otherwise disfavored pathway [15]. 4.3 Selective Control of a Reactive Carbocation: Antibodies 4C6 and 19A4
Cationic cyclization proceeds via formation of a reactive carbocation, and may be separated into three steps: initiation, propagation, and termination. A carbocation may be formed either by ionization or by electrophilic addition to an unsaturated olefin, while cyclization is terminated by nucleophilic addition or elimination of a proton. The stereochemistry of such transformations should follow the StorkEschenmoser hypothesis [103, 104]. Natural enzymes catalyze polycyclization of squalene through a reactive carbocation intermediate to form a variety of cycloisoprenoids (e.g., cholesterol). For such enzymes, initiation occurs either by protonation of a double bond or by release of pyrophosphate [105]. Structural studies have shown that the enzyme active sites are typically lined with aromatic residues that
Scheme 12 Disfavored cyclization Antibody 5C8
19
20 Structure and Function of Catalytic Antibodies
Fig. 6 Disfavored ring closure antibody 5C8. AspH95 is proposed to form an Hbond with the epoxide-oxygen, favoring oxirane ring opening, with HL89 forming an H-bond with the alcohol to promote SN2-
like attack to form the disfavored ring. The piperidinium moiety was used as bait to elicit the desired negative charge in the binding pocket, which should stabilize the developing positive charge in the actual ring closure reaction.
stabilize the carbocation through cation-π interactions. Catalytic antibodies 4C6 and 19A4 also contain a number of aromatic residues in the binding pocket, which are postulated to play a similar role. Thus, the enzymes and antibodies exhibit similar features. Catalytic antibody 4C6 catalyzes a model reaction that had been shown under other conditions (98% formic acid) to proceed through a carbocation [106]. This reactivity was elicited with an N-oxide hapten [107] that is structurally related to the 5C8 hapten. Structurally analogous N-oxide TSAs are known inhibitors for natural squalene cyclases, validating this design strategy [108]. The oxide was included to promote stabilization of the negative charge that develops in the sulfonic acid leaving group, while the nitrogen would elicit residues that increase the electrophilicity at C1 (Scheme 13, Table 1). The hapten also includes a silane moiety that could stabilize the carbocation via hyperconjugation [107]. In addition, conjugation with silane makes the olefin more electron rich, which facilitates attack of C1 . Carbocations are highly reactive species, yet the antibody selectively catalyzes the formation of only one product in mild conditions where no background reaction is detectable (Scheme 13) [107]. The crystal structure shows several aromatic residues in the active site that could stabilize the reactive carbocation and shield it from solution [17]. In the proposed mechanism, TyrL96 and/or TyrH50 initiate carbocation formation at C1 by promoting SN 1-like departure of the acetamidobenzenesulfonic acid leaving group. Electrophilic attack of the C5 -C6 olefin by the developing C1 carbocation closes the ring and shifts the carbocation to C5 , where the reaction terminates by nucleophilic addition of water [17]. This mechanism, however, does not explain why a related compound (Si exchanged for C) has a ca.100-fold reduction in
4 Shape Complementarity and Control of the Reaction Coordinate
Scheme 13 Cationic cyclization Antibody 4C6
kcat , but similar K M for the reaction, nor why the related substrate that lacks the C5 C6 double bond exhibits no reaction (kinetics monitor formation of leaving group). Together, these data suggest that anchimeric assistance of the electron-rich double bond is the central feature of the mechanism [107]. Thus, the reaction could be initiated by backside attack of C1 by the electron-rich olefin, followed by termination by nucleophilic addition of water at C5 . This attack is facilitated by approximation due to the shape complementarity of the binding pocket. Additional kinetics and mutagenesis could help resolve the details of how the reaction is initiated. Catalytic antibody 19A4 catalyzes a more complicated, tandem-cationic cyclization to form bridge-methylated decalins (Scheme 14, Table 1) [109]. The hapten design was similar to that for 4C6, with the same N-oxide functionality to elicit residues that would promote initiation of the cyclization cascade by expulsion of the sulfonic acid leaving group. In contrast to 4C6, this reaction is terminated by elimination. The antibody does not possess the exquisite control over the reaction that was observed for 4C6, as three different decalins are formed accounting for only 50% of the overall products [109]. Nevertheless, the catalysis is remarkable, as no bicyclic products are formed in the uncatalyzed reaction. The crystal structure shows a hydrophobic binding pocket lined with several aromatic residues that may stabilize the carbocation [16]. In addition, the binding pocket has high shape complementarity to the productive chair-chair conformation of the substrate that is required for effective cyclization (Scheme 14). Here, a hydrogen bond from
Scheme 14 Cationic cyclization Antibody 19A4
21
22 Structure and Function of Catalytic Antibodies
AsnH35a to the sulfonic acid leaving group is believed to facilitate SN 1-like departure with concerted anchimeric assistance from the C5-C6 π bond, yielding the first ring with the carbonium ion at the tertiary C5 carbon [16]. In the next step, the C9-C10 π bond undergoes electrophilic attack by the C5 carbonium to form the second ring. Although an epoxide was included in the original hapten to elicit a catalytic residue in the antibody, there is no catalytic base to direct the regiochemistry of the final termination step of the reaction [16]. Thus, the reaction results in a mixture of elimination products. Structure-based engineering would afford a unique opportunity to control the product distribution for this particular reaction.
5 Shape Complementarity and Substrate Strain
Many catalytic antibodies impose varying degrees of strain on the substrate to achieve reactive conformations that would otherwise be disfavored in solution (e.g., 1D4, 4C6, 19A4, 13G5, 10F11, AZ-28). The enzyme ferrochelatase, however, is a particularly attractive model system for studying substrate strain. Ferrochelatase inserts Fe2+ into porphyrin in heme biosynthesis. In this reaction, the enzyme is proposed to distort the porphyrin ring system to expose the pyrrole nitrogen lone pair electrons, facilitating metal ion complexation. Indeed, N-methyl protoporphyrin – a distorted porphyrin analog – is a potent inhibitor of ferrochelatase [110]. Antibodies elicited against this compound catalyze insertion of Cu2+ or Zn2+ into mesoprotoporphyrin IX, supporting the hypothesis that the hapten is a true transition state analog for the enzyme-catalyzed reaction (Scheme 15, Table 1) [111]. Thus, this reaction is a model system for studying the effects of substrate strain in catalysis [4]. In the antibody-catalyzed reaction, no saturation kinetics were observed for the metal ion, suggesting that metal binding to the antibody is not a prerequisite of catalysis, in contrast to ferrochelatase, which has a K M of 32 µM for iron [112]. The crystal structure, however, reveals that an Asp carboxylate is directed toward
Scheme 15 Metal chelation antibody 7G12
6 Reactive Amino Acids and the Possibility of Covalent Catalysis
Fig. 7 Superposition of free (green) and TSA bound (blue) 7G12. (A) The germline 7G12 shows an induced-fit mode of binding, as demonstrated by the conformation change in CDRH3. (B) The affinity mature
antibody, in contrast, has a lock-and-key mode of binding. The maturation of a preformed binding pocket is associated with higher affinity for the transition state and overall improved catalytic proficiency.
the center of the protoporphyrin (Fig. 7) and is proposed to either guide metal complexation or act as a proton shuttle [113]. The residue is clearly important as Asp → Asn and Asp → His mutations are inactive [113]. In addition, the crystal structure shows that packing interactions between the antibody and substrate induce strain in the Michaelis complex [70]. Although accurate modeling and exact refinement of any distortions in the Michaelis complex at 2.6 Å resolution are not reasonable, the electron density does provide clear qualitative indications of deviation from ideal planarity [70]. The free and hapten-bound structures of the germline and mature antibodies were compared and nicely illustrate the evolution from an induced fit mode of binding to lock-and-key (Fig. 7a,b) [70]. The evolution of this lock-and-key binding results in tighter binding to the TSA (due to faster association and slower dissociation rates), and correlates with a ca. 100-fold improvement in kcat K M −1 [70]. Together, these structural and biochemical results demonstrate the importance of substrate strain in antibody-mediated catalysis.
6 Reactive Amino Acids and the Possibility of Covalent Catalysis
The recently developed reactive immunization strategy [114, 115] has shown that it is possible to create antibodies that catalyze aldol reactions, one of the most important carbon-carbon bond-forming reactions in chemistry and biology. The mechanism for the natural class I aldolase enzyme involves the key formation of a Schiff base between the substrate ketone (aldol donor) and a reactive lysine, followed by condensation with an aldehyde (aldol acceptor) and subsequent hydrolysis to release product (Scheme 16). The binding pocket of aldolase antibody 33F12 is an elongated cleft more than 11 Å deep [116]. The only lysine residue in the active site
23
24 Structure and Function of Catalytic Antibodies
Scheme 16 Previously proposed retro-aldol mechanism for aldolase antibody 33F12
is located at the bottom of the hydrophobic pocket (Fig. 8a). This environment was hypothesized to perturb the pK a of the lysine, making it amore potent nucleophile [116]. The antibody can then use the reactive -amino group to form an enamine with the corresponding substrates, analogous to the natural class I aldolase. The LysH93 → Ala mutant is inactive, which confirms the importance of this residue [117]. An additional residue that may be of mechanistic interest is TyrL36, which forms a water-mediated hydrogen bond with the -amino group of lysine H93 (Fig. 8a). Importantly, a water molecule was postulated to play a key role shuttling protons in the mechanism of a class I aldolase [118]. Interestingly, the antibody exhibits exceptionally broad substrate specificity, implying that the binding pocket is unrefined for a particular substrate presumably due to the lack of further affinity maturation due to the proposed covalent nature of the hapten complex [116]. Antibodies 40F12, 84G3, 93F3 and others were generated with a sulfone β-diketone hapten [119]. Notably, antibodies 93F3 and 84G3 show reversed enantioselectivity compared to 33F12 [119]. Sequence analysis and our unpublished structural data [120] reveal a lysine at position L89 on the opposite side of the binding pocket, which could explain the reversed selectivity of this antibody. The relatively unrefined binding pocket of 33F12 contrasts with that of class I aldolase enzymes, which catalyze the reversible condensation of aldehydes and ketones with high stereospecificity (121,122). The bacterial 2-deoxyribose-5-phosphate aldolase (DERA) (123,124) catalyzes the reversible aldol reaction of acetaldehyde and D-glyceraldehyde-3-phosphate to form 2-deoxyribose-5-phosphate. The ultra high resolution structure of DERA reveals a β-barrel with two lysines in the active site, one of which forms the covalent Schiff base intermediate (Fig. 8b) [118]. Compared to the hydrophobic antibody binding pocket, a number of charged/polar residues form a sophisticated hydrogen-bonding network within the active site, which is likely important for tuning the reactivity of the lysine and the substrate specificity [118]. The mechanism by which Lys167 is rendered nucleophilic in the enzyme is unclear – the pK a was proposed to be depressed by a neighboring Lys201 (Fig. 8b) [118], but mutation of Lys201 → Leu clearly does not hinder formation of the Schiff
6 Reactive Amino Acids and the Possibility of Covalent Catalysis
base [118]. Surprisingly, inhibition of natural aldolases by diketones has not been fully characterized. A lingering challenge is to determine the mode of binding of the aldolase family of antibodies to substrate-related inhibitors, as well as the diketone compound originally used as a hapten to elicit the antibody response (Scheme 16). Despite our best efforts, we have obtained no structural evidence demonstrating the formation
Fig. 8 Comparison of aldolase antibody and enzyme active sites. Red spheres represent water molecules. A The aldolase antibody contains a lysine buried at the bottom of hydrophobic gorge, which has been postulated to form covalent intermediates with an aldol donor (ketone) followed by condensation with an aldol acceptor (aldehyde). TyrL36 may form a water-mediated hydrogen bond with the reactive lysine ,amino group, which may be of importance in the reaction mechanism. B The bacte-
rial 2-deoxyribose-5-phosphate aldolase (DERA) likewise contains a reactive Lys167, which was observed to form a covalent intermediate with the substrate in the crystal structure. Lys167 is probably rendered nucleophilic by the sophisticated hydrogenbonding network in the active site, which includes nearby Asp102 and Lys201. A nearby water molecule forms key stabilizing interactions to the substrate, and is believed to play a key role in the catalytic mechanism.
25
26 Structure and Function of Catalytic Antibodies
of a covalent bond in the antibody binding pocket for the aldolase antibody. This contrasts with the relative ease by which we and others have obtained covalent adducts in enzyme systems [118]. Surprisingly, the Lys167Leu DERA mutant catalyzes the retro aldol reaction for its optimal substrate (3.6 min−1 ) [118] three times faster than the antibody for its optimal substrate (1.4 min−1 ) [125]. Clearly, catalysis of the retro-aldol condensation may occur in the absence of covalent intermediates. Alternative mechanisms for 33F12 and 38C2 were never seriously considered [116], but could involve non-covalent stabilization of an enolate or aldol donor that subsequently attacks the aldol acceptor (analogous to a class II aldolase). Such a mechanism would be more consistent with our structural observations. In any event, the mechanistic supposition that a lysine could be rendered nucleophilic in the hydrophobic environment of the antibody binding pocket led, in part, to the appreciation that secondary amines, e.g., proline, could catalyze the aldol reaction in organic and, to lesser extent, aqueous solvent [126]. Thus, structural and biochemical characterization of aldolase antibodies has led to the development of important new synthetic methodologies.
7 Conclusions
Classically, antibodies are effector molecules that link the recognition elements of the Fab to the destructive elements of the immune system. Recently, this paradigm has been challenged by the discovery that antibodies also possess the intrinsic capability to catalyze the formation of hydrogen peroxide from singlet oxygen and water in the absence of any other metal ions or cofactors [127, 128]. This catalytic capability would potentially link recognition and oxidative killing within the same molecule. This remarkable chemistry is likewise catalyzed by the T-cell receptor (TCR), but not β 2 -microglobulin, which shares the immunoglobulin fold [128]. Very little is understood about where this reaction occurs on the antibody or TCR, and docking studies have not resolved this issue [129]. Assaying light chain dimers and camel antibodies (heavy chain only) would provide some indication of where the chemistry occurs. We have grappled with this problem indirectly by pressurizing numerous antibody crystals under a xenon atmosphere [128]. Our data reveal several hydrophobic pockets in the core of the immunoglobulin fold. Since xenon is larger than oxygen (O2 ), oxygen could potentially diffuse into the same hydrophobic cavities. However, the mechanistic relevance of these hydrophobic cavities, if any, is not yet resolved. Unfortunately, a more direct experimental approach is extremely challenging; distinguishing unambiguously between water, singlet oxygen, hydrogen peroxide, ozone, and other reactive oxygen intermediates (which are believed to be short lived) from an electron density map is not feasible, even at ultra high resolution. Moreover, discriminating between radical chemistry derived from the antibody or synchrotron radiation is non-trivial and requires numerous control experiments. However, we have observed generally that soaking antibody crystals in hydrogen
References
peroxide and exposing them to UV light (source of singlet oxygen) has little effect on the integrity of the antibody [128], although oxidative modifications are observed on TrpLIG3 in the interfacial region between variable and constant domains [139]. Current endeavors aim to trace the origins of antibody-generated oxidants within the immunoglobulin fold.
References 1 WILSON, I. A., STANFIELD, R. L. Curr. Op. Struct. Biol. 4 (1994), p. 857–867 2 PADLAN, E. A., Adv.Prot.Chem. 49 (1996), p. 57–133 3 PAULING, L., Am.Sci. 36 (1948), p. 51–58 4 HALDANE, J. B. S., Enzymes, Longmans Green, London 1930 5 JENCKS, W. P., Catalysis in Chemistry and Enzymology, McGraw-Hill, New York 1969 6 TRAMONTANO, A., JANDA, K. D., LERNER, R. A., Science 234 (1986), p. 1566–1570 7 POLLACK, S. J., JACOBS, J. W., SCHULTZ, P. G., Science 234 (1986), p. 1570–1573 8 SEGAL, D. M., PADLAN, E. A., COHEN, G. H., RUDIKOFF, S., POTTER, M., DAVIES, D. R., Proc. Natl. Acad. Sci. U.S.A. 71 (1974), p. 4298–4302 9 SATOW, Y., COHEN, G. H., PADLAN, E. A., DAVIES, D. R., J. Mol. Biol. 190 (1986), p. 593–604 10 JESKE, D. J., JARVIS, J., MILSTEIN, C., CAPRA, J. D., J. Immunol. 133 (1984), p. 1090–1092 11 ROSE, D. R., STRONG, R. K., MARGOLIES, M. N., GEFTER, M. L., PETSKO, G. A., Proc. Natl. Acad. Sci. U.S.A. 87 (1990), p. 338–342 12 STRONG, R. K., CAMPBELL, R., ROSE, D. R., PETSKO, G. A., SHARON, J., MARGOLIES, M. N., Biochemistry 30 (1991), p. 3739–3748 13 STRONG, R. K., PETSKO, G. A., SHARON, J., MARGOLIES, M. N., Biochemistry 30 (1991), p. 3749–3757 14 HOTTA, K., LANGE, H., TANTILLO, D. J., HOUK, K. N., HILVERT, D., WILSON, I. A., J. Mol. Biol. 302 (2000), p. 1213–1225 15 BALDWIN, J. E., J.Chem.Soc.,Chem. Commun., (1976), p. 734–736 16 PASCHALL, C. M., HASSERODT, J., JONES, T., LERNER, R. A., JANDA, K. D.,
17
18
19 20
21
22
23
24
25
26
27
28
CHRISTIANSON, D. W. Angew. Chem. Int. Ed. 38 (1999), p. 1743–1747 ZHU, X., HEINE, A., MONNAT, F., HOUK, K. N., JANDA, K. D., WILSON, I. A. J. Mol. Biol. 329 (2003), p. 69–83 MACBEATH, G., and HILVERT, D., Chem. Biol. 3 (1996), p. 433–445 TANTILLO, D. J., HOUK, K. N. Chem. Biol. 8 (2001), p. 535–545 KRISTENSEN, O., VASSYLYEV, D. G., TANAKA, F., MORIKAWA, K., FUJII, I., J. Mol. Biol. 281 (1998), p. 501–511 WEDEMAYER, G. J., WANG, L. H., PATTEN, P. A., SCHULTZ, P. G., STEVENS, R. C., J. Mol. Biol. 268 (1997), p. 390–400 WEDEMAYER, G. J., PATTEN, P. A., WANG, L. H., SCHULTZ, P. G., STEVENS, R. C., Science 276 (1997), p. 1665–1669 GUO, J., HUANG, W., ZHOU, G. W., FLETTERICK, R. J., SCANLAN, T. S., Proc. Natl. Acad. Sci. U.S.A. 92 (1995), p. 1694–1698 ZHOU, G. W., GUO, J., HUANG, W., FLETTERICK, R. J., SCANLAN, T. S., Science 265 (1994), p. 1059–1064 BUCHBINDER, J. L., STEPHENSON, R. C., SCANLAN, T. S., FLETTERICK, R. J., J. Mol. Biol. 282 (1998), p. 1033–1041 CHARBONNIER, J. B., CARPENTER, E., GIGANT, B., GOLINELLI-PIMPANEAU, B., ESHHAR, Z., GREEN, B. S., KNOSSOW, M., Proc. Natl. Acad. Sci. U.S.A. 92 (1995), p. 11721–11725 THAYER, M. M., OLENDER, E. H., ARVAI, A. S., KOIKE, C. K., CANESTRELLI, I. L., STEWART, J. D., BENKOVIC, S. J., GETZOFF, E. D., ROBERTS, V. A., J. Mol. Biol. 291 (1999), p. 329–345 CHARBONNIER, J. B., GOLINELLI-PIMPANEAU, B., GIGANT, B.,
27
28 Structure and Function of Catalytic Antibodies
29
30
31
32 33
34
35
36 37
38 39
40
41
42
43 44 45
TAWFIK, D. S., CHAP, R., SCHINDLER, D. G., KIM, S. H., GREEN, B. S., ESHHAR, Z., KNOSSOW, M., Science 275 (1997), p. 1140–1142 MIYASHITA, H., HARA, T., TANIMURA, R., FUKUYAMA, S., CAGNON, C., KOHARA, A., FUJII, I., J. Mol. Biol. 267 (1997), p. 1247–1257 GIGANT, B., TSUMURAYA, T., FUJII, I., KNOSSOW, M., Structure 7 (1999), p. 1385–1393 TAKAHASHI, N., KAKINUMA, H., LIU, L., NISHI, Y., FUJII, I., Nat. Biotech. 19 (2001), p. 563–567 STEWART, J. D., BENKOVIC, S. J., Nature 375 (1995), p. 388–391 TURNER, J. M., LARSEN, N. A., BASRAN, A., BARBAS III, C. F., BRUCE, N. C., WILSON, I. A., LERNER, R. A., Biochemistry 41 (2002), p. 12297–12307 MENARD, R., CARRIERE, J., LAFLAMME, P., PLOUFFE, C., KHOURI, H. E., VERNET, T., Biochemistry 35 (1991), p. 8924–8928 NICOLAS, A., EGMOND, M., VERRIPS, C. T., de VLIEG, J., LONGHI, S., CAMBILLAU, C., MARTINEZ, C., Biochemistry 35 (1996), p. 398–410 SZELTNER, Z., RENNER, V., POLGAR, L., Protein Sci. 9 (2000), p. 353–360 BRYAN, P., PANTOLIANO, M. W., QUILL, S. G., HSIAO, H.-Y., POULOS, T., Proc. Natl. Acad. Sci. U.S.A. 83 (1986), p. 3743–3745 CARTER, P., WELLS, J. A., Proteins: Struct. Funct. Genet. 7 (1990), p. 335–342 O’CONNELL, T. P., DAY, R. M., TORCHILIN, E. V., BACHOVCHIN, W. W., MALTHOUSE, J. G., Biochem. J. 326 (1997), p. 861–866 HAREL, M., QUINN, D. M., NAIR, H. K., SILMAN, I., SUSSMAN, J. L., J. Am. Chem. Soc. 118 (1996), p. 2340–2346 PHILLIPS, M. A., FLETTERICK, R., RUTTER, W. J., J. Biol. Chem. 265 (1990), p. 20692–20698 LINDLER, A. B., KIM, S. H., SCHINDLER, D. G., ESHAR, Z., TAWFIK, D. S., J. Mol. Biol. 329 (2002), p. 559–572 LINDLER, A. B., ESHAR, Z., TAWFIK, D. S., J. Mol. Biol. 285 (1999), p. 421–430 TANTILLO, D. J., HOUK, K. N., J. Comput. Chem. 23 (2002), p. 84–95 JANDA, K. D., ASHLEY, J. A., JONES, T. M., MCLEOD, D. A., SCHLOEDER, D. M.,
46 47
48 49 50
51
52
53
54
55
56
57 58 59 60
61 62
WEINHOUSE, M. I., LERNER, R. A., GIBBS, R. A., BENKOVIC, P. A., HILHORST, R., BENKOVIC, S. J., J. Am. Chem. Soc. 113 (1991), p. 291–297 DEREWENDA, Z. S., WEI, Y., J.Am. Chem. Soc. 117 (1995), p. 2104–2105 LARSEN, N. A., TURNER, J. M., STEVENS, J., ROSSER, S. J., BASRAN, A., LERNER, R. A., BRUCE, N. C., WILSON, I. A., Nat. Struct. Biol. 9 (2002), p. 17–21 GUO, J., HUANG, W., SCANLAN, T. S., J. Am. Chem. Soc. 116 (1994), p. 6062–6069 HAYNES, M. R., HEINE, A., WILSON, I. A., Isr. J. Chem. 36 (1996), p. 133–142 GHOSH, D., SAWICKI, M., LALA, P., ERMAN, M., PANGBORN, W., EYZAGUIRRE, J., GUTIERREZ, R., JORNVALL, H., THIEL, D. J. J. Biol. Chem. 276 (2001), p. 11159–11166 ZHU, X., LARSEN, N. A., BASRAN, A., BRUCE, N. C., WILSON, I. A., J. Biol. Chem. 278 (2003), p. 2008–2014 BACA, M., SCANLAN, T. S., STEPHENSON, R. C., WELLS, J. A., Proc. Natl. Acad. Sci. U.S.A. 94 (1997), p. 10063–10068 STEWART, J. D., KREBS, J. F., SIUZDAK, G., BERDIS, A. J., SMITHRUD, D. B., BENKOVIC, S. J., Proc. Natl. Acad. Sci. U.S.A. 91 (1994), p. 7404–7409 BENKOVIC, S. J., ADAMS, J. A., BORDERS Jr, C. L., JANDA, K. D., LERNER, R. A. Science 250 (1990), p. 1135–1139 GIBBS, R. A., BENKOVIC, P. A., JANDA, K. D., LERNER, R. A., BENKOVIC, S. J., J. Am. Chem. Soc. 114 (1992), p. 3528–3534 KREBS, J. F., SIUZDAK, G., DYSON, H. J., STEWART, J. D., BENKOVIC, S. J., Biochemistry 34 (1995), p. 720–723 SAMPSON, N. S., BARTLETT, P. A., Biochemistry 30 (1991), p. 2255 CARTER, P., WELLS, J. A., Nature 332 (1988), p. 564 BARTLETT, P. A., MARLOWE, C. K., Biochemistry 22 (1983), p. 4618–4624 CHRISTIANSON, D. W., LIPSCOMB, W. N., Acc. Chem. Res. 33 (1989), p. 62–69 KEMP, D. S., PAUL, K. G., J.Am. Chem. Soc. 97 (1975), p. 7305–7312 LEWIS, C., KRAMER, T., ROBINSON, S., HILVERT, D., Science 253 (1991), p. 1019–1022
References 63 TARASOW, T. M., LEWIS, C., HILVERT, D., J. Am. Chem. Soc. 116 (1994), p. 7959–7963 64 HOTTA, K., KIKUCHI, K., HILVERT, D., in Helv. Chim. Acta 83 (2000), p. 2183–2191 65 HOTTA, K., WILSON, I. A., HILVERT, D., Biochemistry 41 (2002), p. 772–779 66 HSIEH, L. C., STEPHANS, J. C., SCHULTZ, P. G., J. Am. Chem. Soc. 116 (1994), p. 2167–2168 67 HOLLAND, H. L., Chem.Rev. 88 (1988), p. 473–485 68 HSIEH-WILSON, L. C., SCHULTZ, P. G., STEVENS, R. C., Proc. Natl. Acad. Sci. U.S.A. 93 (1996), p. 5363–5367 69 YIN, J., MUNDORFF, E. C., YANG, P. L., WENDT, U., HANWAY, D., STEVENS, R. C., SCHULTZ, P. G., Biochemistry 40 (2001), p. 10764–10773 70 YIN, J., ANDRYSKI, S. E., BEUSCHER IV, A. E., STEVENS, R. C., SCHULTZ, P. G., Proc. Natl. Acad. Sci. U.S.A. 100 (2003), p. 856–861 71 GONCALVES, J., DINTINGER, T., LEBRETON, J., BLANCHARD, D., TELLIER, C., Biochem. J. 346 (2000), p. 691–698 72 GOLINELLI-PIMPANEAU, B., GONCALVES, O., DINTINGER, T., BLANCHARD, D., KNOSSOW, M., TELLIER, C., Proc. Natl. Acad. Sci. U.S.A. 97 (2000), p. 9892–9895 73 HOLLFELDER, F., KIRBY, A. J., TAWFIK, D. S., Nature 383 (1996), p. 60–62 74 HILVERT, D., CARPENTER, S. H., NARED, K. D., AUDITOR, M.-T. M., Proc. Natl. Acad. Sci. U.S.A. 85 (1988), p. 4953–4955 75 ANDREWS, P. R., SMITH, G. D., YOUNG, I. G., Biochemistry 12 (1973), p. 3492 76 TANG, Y., HICKS, J. B., HILVERT, D., Proc. Natl. Acad. Sci. U.S.A. 88 (1991), p. 8784–8786 77 HAYNES, M. R., STURA, E. A., HILVERT, D., WILSON, I. A., Science 263 (1994), p. 646–652 78 CHOOK, Y. M., KE, H., LIPSCOMB, W. N. Proc. Natl. Acad. Sci. U.S.A. 90 (1993), p. 8600–8603 79 LARSEN, N. A., ZHOU, B., HEINE, A., WIRSCHING, P., JANDA, K. D., WILSON, I. A., J. Mol. Biol. 311 (2001), p. 9–15 80 BRAISTED, A. C., SCHULTZ, P. G., J. Am. Chem. Soc. 116 (1994), p. 2211–2212
81 ULRICH, H. D., MUNDORFF, E., SANTARSIERO, B. D., DRIGGERS, E. M., STEVENS, R. C., SCHULTZ, P. G., Nature (1997), p. 271–275 82 MUNDORFF, E. C., HANSON, M. A., VARVAK, A., ULRICH, H. D., SCHULTZ, P. G., STEVENS, R. C. Biochemistry 39 (2000), p. 627–632 83 SAUER, J., Angew.Chem.Int.Ed.Engl. 5 (1966), p. 211–230 84 SCHMIDT, R. R., Acc.Chem.Res. 19 (1986), p. 250–259 85 POHNERT, G., Chembiochem 2 (2001), p. 873–875 86 PAGE, M. I., JENCKS, W. P., Proc.Natl. Acad. Sci. U.S.A. 68 (1971), p. 1678-1683 87 BAHR, N., GUELLER, R., REYMOND, J.-L., LERNER, R. A., J. Am. Chem. Soc. 118 (1996), p. 3550–3555 88 HUGOT, M., BENSEL, N., VOGEL, M., REYMOND, M. T., STADLER, B., REYMOND, J.-L., BAUMANN, U., Proc. Natl. Acad. Sci. U.S.A. 99 (2002), p. 9674–9678 89 YLI-KAUHALUOMA, J. T., ASHLEY, J. A., LO, C.-H., TUCKER, L., WOLFE, M. M., JANDA, K. D., J. Am. Chem. Soc. 117 (1995), p. 7041–7047 90 CANNIZZARO, C. E., ASHLEY, J. A., JANDA, K. D., HOUK, K. N., J. Am. Chem. Soc. 125 (2003), p. 2489–2506 91 HEINE, A., STURA, E. A., YLI-KAUHALUOMA, J. T., GAO, C., DENG, Q., BENO, B. R., HOUK, K. N., JANDA, K. D., WILSON, I. A., Science 279 (1998), p. 1934–1940 92 BRAISTED, A. C., SCHULTZ, P. G., J. Am. Chem. Soc. 112 (1990), p. 7430–7431 93 ROMESBERG, F. E., SPILLER, B., SCHULTZ, P. G., STEVENS, R. C., Science 279 (1998), p. 1929–1933 94 XU, J., DENG, D., CHEN, J., HOUK, K. N., BARTEK, J., HILVERT, D., WILSON, I. A., Science 286 (1999), p. 2345–2348 95 PAGE, M. I., JENCKS, W. P., Proc. Natl. Acad. Sci. U.S.A. 68 (1971), p. 1678–1683 96 HILVERT, D., HILL, K. W., Methods Enzymol. 203 (1991), p. 352–369 97 ARE´ VALO, J. H., STURA, E. A., TAUSSIG, M. J., WILSON, I. A., J. Mol. Biol. 231 (1993), p. 103–118
29
30 Structure and Function of Catalytic Antibodies
98 ARE´ VALO, J. H., TAUSSIG, M. J., WILSON, I. A. Nature 365 (1993), p. 859–863 99 ARE´ VALO, J. H., HASSIG, C. A., STURA, E. A., SIMS, M. J., TAUSSIG, M. J., WILSON, I. A. J. Mol. Biol. 241 (1994), p. 663–690 100 CRAVATT, B. F., ASHLEY, J. A., JANDA, K. D., BOGER, D. L., LERNER, R. A., J. Am. Chem. Soc. 116 (1994), p. 6013–6014 101 LARSEN, N. A., HEINE, A., CRANE, L., CRAVATT, B. F., LERNER, R. A., WILSON, I. A., J. Mol. Biol. 314 (2001), p. 93–102 102 MARCH, J. Advanced organic chemistry, John Wiley & Sons, New York 1992 103 STORK, G., BURGSTAHLER, A. W. J.Am. Chem. Soc. 77 (1955), p. 5068 104 ESCHENMOSER, A., RUZICKA, L., JEGER, O., ARIGONI, D. Helv. Chim. Acta 38 (1955), p. 1890 105 LESBURG, C. A., CARUTHERS, J. M., PASCHALL, C. M., CHRISTIANSON, D. W., Curr. Opin. Struct. Biol. 8 (1998), p. 695–703 106 JOHNSON, G., J.Am.Chem.Soc. 86 (1964), p. 1959 107 LI, T., JANDA, K. D., ASHLEY, J. A., LERNER, R. A., Science 264 (1994), p. 1289–1293 108 ABE, I., ROHMER, M., PRESTWICH, G. D., Chem. Rev. 93 (1993), p. 2189–2206 109 HASSERODT, J., JANDA, K. D., LERNER, R. A., J. Am. Chem. Soc. 119 (1997), p. 5993–5998 110 DAILEY, H. A., FLEMING, J. E., J.Biol. Chem. 258 (1983), p. 11453 111 COCHRAN, A. G., SCHULTZ, P. G. Science 249 (1990), p. 781–783 112 TAKETANI, S., TOKUNAGA, R., Eur.J. Biochem. 127 (1982), p. 443–447 113 ROMESBERG, F. E., SANTARSIERO, B. D., SPILLER, B., YIN, J., BARNES, D., SCHULTZ, P. G., STEVENS, R. C. Biochemistry 37 (1998), p. 14404–14409 114 WIRSCHING, P., ASHLEY, J. A., LO, C.-H. L., JANDA, K. D., LERNER, R. A., Science 270 (1995), p. 1775–1782 115 WAGNER, J., LERNER, R. A., BARBAS III, C. F., Science 270 (1995), p. 1797–1800 116 BARBAS III, C. F., HEINE, A., ZHONG, G., HOFFMANN, T., GRAMATIKOVA, S., BJORNESTEDT, R., LIST, B., ANDERSON, J., STURA, E. A., WILSON, I. A., LERNER, R. A. Science 278 (1997), p. 2085–2092
117 KARLSTROM, A., ZHONG, G., RADER, C., LARSEN, N. A., HEINE, A., FULLER, R., LIST, B., TANAKA, F., WILSON, I. A., BARBAS III, C. F., LERNER, R. A., Proc. Natl. Acad. Sci. U.S.A. 97 (2000), p. 3878–3883 118 HEINE, A., DESANTIS, G., LUZ, J. G., MITCHELL, M., WONG, C.-H., WILSON, I. A., Science 294 (2001), p. 369–374 119 ZHONG, G., LERNER, R. A., BARBAS, C. F., III. Angew. Chem. Int. Ed. Engl. submitted (1999) 120 ZHU, X., WILSON, I. A., personal communication (2003) 121 FESSNER, W.-D., Curr.Opin.Chem. Biol. 2 (1998), p. 85–97 122 CHEN, L., DUMAS, D. P., WONG, C.-H., J. Am. Chem. Soc. 114 (1992), p. 741–748 123 RACKER, E., J.Biol.Chem. 196 (1952), p. 347–365 124 BARBAS III, C. F., WANG, Y.-F., WONG, C.-H., J. Am. Chem. Soc. 112 (1990), p. 2013–2014 125 ZHONG, G., SHABAT, D., LIST, B., ANDERSON, J., SINHA, S. C., LERNER, R. A., BARBAS III, C. F., Angew. Chem. Int. Ed. 37 (1998), p. 2481–2484 126 LIST, B., Tetrahedron 58 (2002), p. 5573–5590 127 WENTWORTH, A. D., JONES, L. H., WENTWORTH Jr., P., JANDA, K. D., LERNER, R. A., Proc. Natl. Acad. Sci. U.S.A. 97 (2000), p. 10930–10935 128 WENTWORTH Jr., P., JONES, L. H., WENTWORTH, A. D., ZHU, X., LARSEN, N. A., WILSON, I. A., XU, X., GODDARD, W. A., JANDA, K. D., ESCHENMOSER, A., LERNER, R. A., Science 293 (2001), p. 1806–1811 129 DATTA, D., VAIDEHI, N., XU, X., GODDARD III, W. A., Proc. Natl. Acad. Sci. U.S.A. 99 (2002), p. 2636–2641 130 PATTEN, P. A., GRAY, N. S., YANG, P. L., MARKS, C. B., WEDEMAYER, G. J., BONIFACE, J. J., STEVENS, R. C., SCHULTZ, P. G., Science 271 (1996), p. 1086–1091. 131 FUJII, I., TANAKA, F., MIYASHITA, H., TANIMURA, R., KINOSHITA, K., J. Am. Chem. Soc. 117 (1995), p. 6199–6209 132 YANG, G., CHUN, J., ARAKAWA-URAMOTO, H., WANG, X., GAWINOWICZ, M. A., ZHAO, K., LANDRY, D. W., J. Am. Chem. Soc. 118 (1996), p. 5881–5890
References 133 LARSEN, N. A., HEINE, A., de PRADA, P., REDWAN, R. M., YEATES, T. O., LANDRY, D. W., WILSON, I. A., Acta Crystallogr. D58 (2002), p. 2055–2059 134 BENSEL, N., BAHR, N., REYMOND, M. T., SCHENKELS, C., REYMOND, J.-L., Helv. Chim. Acta 82 (1999), p. 44–52 135 ZHONG, G., SHABAT, D., LIST, B., ANDERSON, J., SINHA, S. C., LERNER, R. A., BARBAS III, C. F., Angew. Chem. Int. Ed. 37 (1998), p. 2481–2484 136 LEWIS, C., PANETH, P., O’LEARY, M. H., HILVERT, D., J. Am. Chem. Soc. 115
(1993), p. 1410–1413 137 TARASOW, T. M., LEWIS, C., HILVERT, D. J. Am. Chem. Soc. 116 (1994), p. 7959–7963 138 GRUBER, K., ZHOU, B., HOUK, K. N., LERNER, R. A., SHEVLIN, C. G., WILSON, I. A., Biochemistry 38 (1999), p. 7062–7074 139 ZHU, X., WENTWORTH, P., WENTWORTH, A. D., ESCHENMOSER, A., LERNER, R. A., WILSON, I. A., Proc. Natl. Acad. Sci. U.S.A. 101 (2004), p. 2247–2252
31
1
N-terminal Ubiquitination Aaron Ciechanover
Technion — Israel Institute of Technology, Haifa, Israel
Originally published in: Protein Degradation, Volume 1. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30837-7 Abstract
The ubiquitin–proteasome system (UPS) is involved in selective targeting of innumerable cellular proteins via a complex pathway that plays important roles in a broad array of processes. An important step in the proteolytic cascade is specific recognition of the substrate by one of many ubiquitin ligases, E3s, that is followed by generation of the polyubiquitin degradation signal. For most substrates, it is believed, though it has not been shown directly, that the first ubiquitin moiety is conjugated, via its C-terminal Gly76 residue, to an ε-NH2 group of an internal lysine residue. Recent findings indicate that for an increasing number of proteins, the first ubiquitin moiety is fused linearly to the α -NH2 group of the N-terminal residue. An important biological question relates to the evolutionary requirement for an alternative mode of ubiquitination.
1 Background
Two distinct structural elements play a role in the ubiquitination of a target protein: (i) the E3 recognition site and (ii) the anchoring residue of the polyubiquitin chain. In most cases, it is believed, though it has been shown for only a few proteins, that the first ubiquitin moiety is transferred to an ε-NH2 group of an internal lysine residue in the substrate. The N-terminal domain of the target protein has attracted attention both as an E3 recognition domain and, recently, as a ubiquitination site. As for specific recognition, in certain rare cases, the stability of a protein is a direct function of its N-terminal residue, which serves as a binding site for the ubiquitin ligase E3α (Ubr1 in yeast; ‘N-end-rule’; [1, 2]). Accordingly, two types of N-terminal residues have been defined, “stabilizing” and “destabilizing”. For the
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 N-terminal Ubiquitination
Mos protein, it was found that its stability is governed primarily by the penultimate proline residue and by a phosphorylation/dephosphorylation cycle of serine3 [3]. A mechanistic explanation for the role of the Pro and Ser residues is still missing. As for the lysine residue targeted, there is no consensus as to its specificity. In some cases distinct lysines are required, while in others there is little or no specificity. Thus signal-induced degradation of IκBα involves two particular lysine residues, 21 and 22 [4]. In the case of Gcn4, lysine residues in the vicinity of a specific PEST degradation signal serve as ubiquitin attachment sites [5]. Mapping of ubiquitination sites of the yeast iso-2-cytochrome c has revealed that the polyubiquitin chain is synthesized almost exclusively on a single lysine [6]. In two other examples, that of Mos (see above; [3]) and the model “N-end rule” substrate X-β-gal (where X is a short fused peptide not encoded by the native molecule [7]), one and two lysines, respectively, that reside in proximity to the degradation signal are required for ubiquitination. In striking contrast, ubiquitination of the ζ chain of the T-cell receptor is independent of any particular lysine residue and proceeds as long as one residue is present in the cytosolic tail of the molecule [8]. Similarly, no single specific lysine residue is required for ubiquitination of either c-Jun [9] or cyclin B [10]: any single lysine residue, even artificially inserted, can serve as a ubiquitin acceptor. Important in this context is that only in a handful of cases it has been shown directly, via chromatographic or mass spectrometric analyses, that ubiquitin is indeed anchored to a lysine residue (see for example Refs. [11, 12]). In most cases studied, and there are not too many, the assumption that an internal lysine serves as the polyubiquitin chain anchor is indirect and based on mutational analyses. One interesting case involves the artificial fusion protein ubiquitin-Pro-X-βgalactosidase. In this chimera, the ubiquitin moiety was fused to the N-terminal Pro residue of the protein. Unlike other ubiquitin-B-X-β-galactosidase species (where B is any of the remaining 19 amino acid residues), here ubiquitin is not removed by isopeptidases and serves as a degradation signal following generation of a polyubiquitin chain that is anchored to Lys48 of the artificially fused ubiquitin moiety [13]. However as noted, in this case the ubiquitin moiety was fused to the N-terminal residue artificially. The first substrate that was identified in which the N-terminal residue serves as a ubiquitination target was MyoD. The basic helix–loop–helix (bHLH) protein MyoD is a tissue-specific transcriptional activator that acts as a master switch for muscle development. MyoD forms heterodimers with other proteins belonging to the bHLH group, such as the ubiquitously expressed E2A, E12 and E47. These dimers are probably the transcriptionally active forms of the factor. Association of MyoD with HLH proteins of the Id family (inhibitors of differentiation that lack the basic domain) inhibits its DNA-binding and biological activities. MyoD is a short-lived protein with a half-life of ∼45 min [14, 15]. Degradation of MyoD is mediated by the ubiquitin system both in vitro and in vivo. Furthermore, the process is inhibited by its consensus DNA-binding site. In contrast, addition of Id1 destabilizes the MyoD–E47–DNA complex and renders the protein susceptible to degradation [15].
2 Results
2 Results
To analyze specific ubiquitination sites in MyoD, we used site-directed mutagenesis to substitute systematically all the lysine residues with arginines [16]. The protein contains nine lysine residues, most of them located within the N-terminal domain of the molecule. The nine residues are in positions 58, 99, 102, 104, 112, 124, 133, 146 and 241. The various proteins were generated either by expression in bacteria followed by purification, or by in vitro translation in reticulocyte lysate in the presence of [35 S]methionine. Conjugation and degradation of the proteins were monitored in a reconstituted cell-free system or in cells. Proteins were detected by either Western blot analysis or PhosphorImaging. Surprisingly, even a MyoD species that lacked all lysine residues was still degraded efficiently in an ATPdependent manner in vitro. To demonstrate involvement of the ubiquitin system in the process, we followed the degradation of wild-type (WT) and lysine-less (LL) MyoD in the absence and presence of ubiquitin. Similar to the degradation of the WT protein, degradation of the LL MyoD was completely dependent upon the addition of exogenous ubiquitin to an extract that does not contain it (Fraction II). Furthermore, addition of methylated ubiquitin, which cannot form polyubiquitin chains and serves as a chain terminator [17], inhibited the degradation of LL MyoD. The inhibition could be alleviated by the addition of excess of free ubiquitin. These results strongly suggested that polyubiquitination of LL MyoD is necessary for degradation of the protein. Furthermore, they implied that the polyubiquitin chain is synthesized on internal lysine residues of ubiquitin. To demonstrate directly polyubiquitinated LL MyoD, we used in-vitro-translated 35 S-labeled protein in a partially reconstituted system. We demonstrated that LL MyoD generates high molecular mass ubiquitinated adducts. It should be noted, however, that these conjugates are of somewhat lower molecular mass than those of the WT MyoD. This can be attributed to the role that the internal lysine residues also play in the process (see also below). To investigate the physiological relevance of the observations in the cell-free system, we followed the fate of the different MyoD lysine-mutated proteins in vivo, using pulse-chase labeling experiments in COS-7 cells that were transiently transfected with the different MyoD cDNAs. In agreement with our in vitro data, the lysine-less MyoD protein is degraded efficiently in cells as well. However, we could observe a progressive increase in the half-life of the proteins of up to ∼2-fold with the gradual substitution of the lysine residues. While the half-life of WT MyoD was ∼50 min, that of LL MyoD was ∼2 h. Interestingly, we found that the stability of MyoD is not affected by the substitution of any specific lysine residue, and it is the total number of these residues that determines the half-life of the protein. To identify the system involved in the destruction of LL MyoD in vivo, transfected cells were incubated in the presence of inhibitors of proteasomal and lysosomal degradation. Chloroquine, a general inhibitor of lysosomal proteolysis, and E-64, a cysteine protease inhibitor that affects lysosomal, but also certain cytosolic proteases, had no effect on the stability of the LL MyoD. In striking contrast, the proteasomal
3
4 N-terminal Ubiquitination
inhibitors MG132 and lactacystin blocked degradation of the LL protein significantly. To demonstrate the intermediacy of ubiquitin conjugates in the degradation of LL MyoD, we incubated COS-7 cells, transiently transfected with either WT or LL MyoD cDNAs, with MG132, and followed generation of ubiquitin-MyoD adducts. Immunoprecipitation with anti-MyoD antibody followed by Western blot analysis with anti-ubiquitin antibody revealed accumulation of high molecular mass compounds in cells transfected with either WT or LL MyoD. A similar analysis of mock-transfected cells clearly demonstrated the specificity of both the anti-MyoD and anti-ubiquitin antibodies. Based on these results, it was clear that polyubiquitination is essential for targeting MyoD for degradation. The lack of internal lysine residues, the only known targets for ubiquitin modification, made it important to identify the functional group that can serve as an attachment site for ubiquitin. Chemically, several groups can generate covalent bonds with ubiquitin. Ser and Thr can participate in ester bond formation, while Cys can generate a thiol ester bond. However, these bonds are unstable and are hydrolyzed in either high pH (Ser and Thr) or high concentration of -SH groups (Cys). The stability of the MyoD-ubiquitin adducts under these conditions made it highly unlikely that any of these modifications is the one we observed. A likely candidate, however, was the free amino group of the N-terminal residue of the protein, which can generate a stable peptide bond with the C-terminal Gly residue of ubiquitin. Edman degradation of the N-terminal residue of bacterially expressed, in-vitro-translated and cellularly expressed MyoDs, has revealed that the ubiquitin attachment site can be the free, unmodified initiator methionine: the proteins were not modified and the N-terminal residue was not acetylated. To demonstrate a role for the free N-terminal amino group in the degradation of MyoD, we chemically modified this group. Initially, we blocked this group in the LL MyoD protein by reductive methylation. While this procedure blocks all amino groups in a protein in a non-discriminatory manner, in this case, it could have been only the α-NH2 group, which is the only free amino group left in the MyoD molecule. The modification stabilized the protein completely. Whereas a free α-NH2 appears to be sufficient for degradation (probably following ubiquitination) of LL MyoD, it is not clear whether it also plays a physiological role in targeting the WT molecule, which has nine available lysine residues. In order to investigate the role and biological relevance of the free α-NH2 group in the targeting of WT MyoD, we selectively blocked it by carbamoylation with potassium cyanate at low pH. This procedure does not modify ε-NH2 groups of internal lysine residues. Automated Edman degradation along with fuorescamine determination of the extent of remaining free NH2 groups confirmed that the modification affected only the N-terminal group. The modified protein was subjected to in vitro degradation and conjugation in cell extract. In contrast to LL MyoD, the N-terminally carbamoylated protein could not be ubiquitinated and was stable. Thus, a free and exposed NH2 terminus of MyoD appears to be an essential site for degradation, most probably because it serves as an attachment site for the first ubiquitin moiety. As an additional control, we selectively modified the internal lysine residues of WT MyoD by guanidination
2 Results
with O-methylisourea. The modification, which does not affect the N-terminal group, generates a protein that is essentially the chemically modified counterpart of the LL MyoD that was generated by site-directed mutagenesis. Similar to the LL protein, this MyoD derivative is degraded efficiently in the cell-free system in a ubiquitin-and an ATP-dependent mode. To analyze the role of the N-terminal residue of MyoD as a ubiquitination site, we fused to WT MyoD, upstream to the N-terminal residue, a 6 × Myc tag, and monitored the stability of the tagged protein. We showed that it is stable both in vitro and in vivo. It should be noted that the two first N-terminal residues of the Myc tag, methionine and glutamate, are identical to the first two N-terminal residues in MyoD. In addition, the Myc tag also contains a lysine residue. Thus, altogether, six additional lysine residues were added to WT MyoD in addition to its own nine native residues. Nevertheless, the tag stabilizes it, probably by blocking access to a specific N-terminal residue, and, as became clear later (see below), to its neighboring domain. Taken together, these findings strongly suggested that MyoD is first ubiquitinated at its N-terminal residue, and the polyubiquitin chain is synthesized on this first conjugated ubiquitin moiety. Internal lysine residues also play a role, probably by serving as additional anchoring sites, whose ubiquitination accelerates degradation. Yet they are not essential for proteolysis to occur. In contrast, ubiquitination of the N-terminal residue plays a critical role in governing the protein’s stability [16] (Figure 1). Using a similar, though not a complete, set of experiments, 12 additional proteins have been identified recently that appear to undergo N-terminal ubiquitination: (i) the human papilloma virus-16 (HPV-16) E7 oncoprotein (18), (ii) the latent membrane protein 1 (LMP1)(19) and (iii) 2A (LMP2A)(20) of the Epstein Barr virus (EBV), involved in viral activation from latency, (iv) the cell cycle-dependent kinase (CDK) inhibitor p21 (21,22), (v) the extracellular signal-regulated kinase 3, ERK3 (22), the inhibitors of differentiation (vi) Id2 (23) and (vii) Id1 (24), two pro-proliferative Helix-Loop-Helix proteins, (viii) hydroxymethyglutaryl-Coenzyme A reductase (HMG-CoA reductase), the first and key regulatory enzyme in the cholesterol biosynthetic pathway (25), (ix) p19ARF , the mouse Mdm2 inhibitor and (x) p14ARF , its human homologue (26), (xi) the HPV-58 E7 oncoprotein, and (xii) the cell cycle regulator p16INK4a (27). As for HMG-CoA reductase, several specific internal lysine residues have also been shown to be important for its targeting, and therefore the essentiality of the N-terminal residue in the process has to be further substantiated. The case of p21 requires further investigation, as one study reported that its degradation by the proteasome does not require ubiquitination [28], while an independent study has demonstrated a role for Mdm2 in targeting p21, also without a requirement for ubiquitination [29]. As we noted for MyoD, substitution of the internal lysines inhibited slightly (up to twofold) both conjugation and degradation of HPV-16, LMP1, and Id2, suggesting that these residues, probably also by serving as ubiquitin anchors, can modulate the stability of these proteins. It is possible to suppose that N-terminal ubiquitination and modification of internal lysines is catalyzed by different ligases that may be even located in different subcellular
5
6 N-terminal Ubiquitination
NH
A
Lys48 - based polyubiquitin chain
K48 G76
C=O NH Ubiquitin K48 G76 C=O NH NH2
X
Kn
COOH
Protein target
B
NH
Lys48 – based polyubiquitin chain
K48 G76
C=O NH NH2
K48 Ubiquitin G76
C NH O
X
Fig. 1 Ubiquitination on an internal lysine and on the N-terminal residue of the target substrate. (A) The first ubiquitin moiety is conjugated, via its C-terminal Gly76 residue, to the g-NH2 group of an internal lysine residue of the target substrate (Kn ). (B) The first ubiquitin moiety is conjugated to a free
Kn
COOH
Protein target "-NH2 group of the N-terminal residue, X. In both cases, successive addition of activated ubiquitin moieties to internal Lys48 on the previously conjugated ubiquitin moiety leads to the synthesis of a polyubiquitin chain which serves as the degradation signal for the 26S proteasome.
compartments (e.g. the nucleus and cytosol). Because of the role that internal lysines play in modulating the stability of these proteins, and in order to better understand the physiological significance of this novel mode of modification, it was important to identify proteins whose degradation is completely dependent on N-terminal ubiquitination. An important group of potential substrates for Nterminal ubiquitination is that of naturally occurring lysine-less proteins – NOLLPs. Since these proteins cannot use the “canonical” lysine conjugation pathway, in order to be targeted by the ubiquitin system they must use, an alternative site for their tagging. Searching the database, we were able to identify 177 eukaryotic NOLLPs, 14 of which occur in humans. In addition, we have identified 111 viral NOLLPs. We have shown that two of the proteins mentioned above, the human tumor supressor p16INK4a and the viral oncoprotein HPV-58 E7 are degraded via
2 Results
the N-terminal ubiquitination pathway [27]. Interestingly, we demonstrated that p16INK4a is ubiquitinated and degraded only in sparse cells, and is stable in dense cells. Similar findings were reported for the NOLLPs p19ARF and p14ARF (26). For E7-16 [18], LMP1 [19], and MyoD (unpublished), it has been shown that truncation of a short N-terminal segment of 10–20 residues stabilized the proteins, suggesting that the entire domain beyond the single N-terminal residue plays a role in governing the stability of these proteins. Such a segment can allow the mobility/flexibility necessary for the N-terminal residue to serve as a ubiquitin acceptor. It can also serve as a recognition domain for the cognate E3. There is no homology between the N-terminal domains of these three proteins, suggesting that if the three N-terminal domains serve as recognition motifs, they recognize different components of the ubiquitin system. Interestingly, the LMP2A E3 was identified as a member of the NEDD4 family of HECT domain ligases, AIP4 and/or WWP2 [20]. A PY motif in LMP2A is recognized by the E3. It resides in the Nterminal domain of the molecule, supporting the hypothesis that in these proteins the E3-binding domain may reside in close proximity to the N-terminal residue ubiquitination site. Is there any direct evidence for N-terminal ubiquitination? All the different and independent lines of evidence in the various studies strongly suggest that ubiquiti-nation occurs on the N-terminal residue, and any other scenario is highly unlikely. Yet, the only direct evidence must be demonstration of a fusion peptide between the C-terminal domain of ubiquitin and the N-terminal domain of the target substrate. The study on p21 [21] and E7-58 [27] brought us a little closer. Bloom and colleagues [21] transfected cells with N-terminally His-tagged ubiquitin and N-terminally HA-tagged p21 that contained a Factor X proteolytic site immediately after the HA tag and upstream of the p21 reading frame. They then immunoprecipitated and resolved the cell-generated ubiquitin conjugates of p21 and treated the mono-ubiquitin–p21 adduct with Factor-X protease. This treatment released a smaller species of p21 (lacking His-ubiquitin and the HA-tag-Factor X site) and His-ubiquitin-HA-Factor X site, thus demonstrating that the HA-tag-Factor X site, which was previously part of p21, had now become part of the Factor X-cleaved ubiquitin. A similar experimental evidence was brought for the NOLLP HPV E7-58 [27]. Here, Ben-Saadon and colleagues generated two species of the protein containing the eight amino acid sequence of the Tobacco Etch Virus (TEV) protease cleavage site inserted either 21 amino acid residues after the iMet [E7-58-TEV(21)] or immediately after the iMet [E7-58-TEV(1)]. The prediction from this experiment was that if ubiquitin is indeed attached to the N-terminal residue of E7-58, TEV protease-catalyzed cleavage will generate an extended ubiquitin molecule that will also contain the respective N-terminal domain of E7-58 [21 residues or 1 residue, respectively, dependent upon whether the substrate of the reaction is E7-58-TEV(21) or E7-58-TEV(1), and the six amino acids derived from the TEV cleavage site]. Such extended ubiquitin moieties were indeed generated following incubation of the substrate with labeled methylated ubiquitin (which generated mostly the mono-ubiquitin adduct of the E7-58 protein), followed by cleavage of the adduct with TEV. The only conclusion that can be derived from these experiments
7
8 N-terminal Ubiquitination
is that the ubiquitin moiety was fused to any of the amino acid residues of the HA tag-Factor X site at the N-terminal domain of p21, or to any of the first 21 amino acids of E7 or the TEV site (part of it; the protease cleaves after the sixth amino acids out of eight in the complete site). Such internal modification is unlikely, however, as it must require a novel chemistry since none of the residues in the tags, the protease sites or the E7 N-terminal fragment, is lysine. Yet, formally, it is still possible that such a modification occurs. The HA tag contains, for example, three Tyr residues. Thus, the evidence provided by these two experiments clearly limits an unlikely non-peptide bond ubiquitination, such as esterification, to a much smaller zone in the N-terminal domain of the tagged p21 or the TEV-containing E7-58, but does not demonstrate directly that the modification occurs indeed on the N-terminal residue. As noted, only identification of a fusion peptide between the C-terminal domain of ubiquitin and the N-terminal domain of the target protein will constitute such an evidence. Ben-Saadon and colleagues have recently isolated the long sought after fusion peptide [27]: mass spectrometric analysis of a tryptic digest of the isolated mono-ubiquitin adduct of HPV-58 E7 revealed a peptide of 11 amino acids, GG-MHGNNPTLR which represents the last two C-terminal amino acids of ubiquitin, GlyGly, and the first nine residues of E7, MetHisGlyAsnAsnProThrLeuArg (MHGNNPTLR). It should be noted that WT E7-58 contains an Arg residue in position 2. It was necessary to substitute this Arg with His, since otherwise the digesting enzyme, trypsin, would have generated a tetrapeptide, GG-MR, that would have been difficult, if not impossible, to identify in the MS analysis. MS/MS analysis of the 11-mer, verified its internal sequence. Coulombe and colleagues were also able to isolate and identify the sequence of a fusion peptide between the C-terminal domain of ubiquitin and the N-terminal domain of HA-tagged p21 that also contained, downstream of the tag, a stretch of residues derived from the N-terminal domain of the native substrate, but without the iMet (which was removed during the construction of the tagged protein) [22].
3 Discussion
N-terminal ubiquitination is a novel pathway, clearly distinct from the N-end rule pathway [30]. In the latter, the N-terminal residue serves as a recognition and binding motif to the ubiquitin ligase, E3α; however, ubiquitination occurs on an internal lysine(s). In contrast, in the N-terminal ubiquitination pathway, modification occurs on the N-terminal residue, whereas recognition probably involves a downstream motif. It should be mentioned that in yeast, using the model fusion protein ubiquitin-Pro-X-β-galactosidase (where X is a short sequence derived from the λ repressor), a new proteolytic pathway has been described, designated the UFD (ubiquitin fusion degradation) pathway [31]. The stably fused ubiquitin moiety (note that in this exceptional case, with Pro and not any other amino acid residue as the linking residue, the ubiquitin moiety is not cleaved off by ubiquitin C-terminal
3 Discussion
hydrolases), functions as a degradation signal, where its Lys48 serves as an anchor for the synthesized polyubiquitin chain. This pathway involves several enzymes, UFD 1–5, some of which appear to be unique and are not part of the “canonical” UPS. It is possible that N-terminal ubiquitination is the most upstream event in the UFD pathway – which was discovered using an artificial chimeric ubiquitin–protein model substrate: the N-terminal ubiquitination pathway can function by providing substrates to the UFD pathway. The physiological significance of N-terminal ubiquitination is still obscure. Naturally occurring lysine-less proteins, NOLLPs, that are degraded by the ubiquitin system must traverse this pathway. Many such proteins, mostly viral, can be found in the database (see above and in Refs. [26, 27]). We believe that many additional lysine-containing proteins, will be discovered to be targeted via this novel mode of modification. Of note is that all the proteins that are N-terminally ubiquitinated must contain a free, unmodified N-terminal residue. Such proteins constitute approximately 25% of all cellular proteins, while the remaining 75% are Nα-acetylated. Whether a protein will be acetylated is dependent on the structure of the N-terminal domain of the mature protein. This is determined by the combined activities of methionine aminopeptidases (MAPs) and N-terminal acetyltransferases (NATs), which are dependent on the specific sequence of up to the first four N-terminal residues of the target protein substrates (reviewed in Ref. [32]). Thus, it is possible to predict which proteins will be potential substrates of the N-terminal ubiquitination pathway. Internal C-terminal fragments of Nα-acetylated proteins can also be modified by ubiquitin at their “new” N-terminal residue following limited processing. Many proteins, such as the NF-κB precursors p105 and p100 or caspase substrates are processed initially in a limited manner, generating a C-terminal fragment with a newly exposed N-terminal residue. For all lysine-containing proteins, the intact free N-termini as well as the products of processing, the assumption is that their internal lysines are not easily accessible, for whatever reason, for ubiquitination, and it is only the N-terminal residue that can be modified. Interestingly, most of the substrates identified thus far have a few lysine residues that might not be accessible to the E3s: For example, MyoD has nine (out of 319), E7 two (out of 97), LMP1 has a single lysine residue (out of 440), LMP2A has three (out of 497), Id2 nine (out of 134) and p21 six (out of 164). From the random discovery of thirteen N-terminally ubiquitinated proteins, it appears that their number could well be larger and that many more will be discovered, which will help in the unraveling of the unique characteristics that distinguish this group of substrates.
Acknowledgments
Research in the laboratory of A.C. is supported by grants from Prostate Cancer Foundation Israel - Centers of Excellence Program, the Israel Science Foundation – Centers of Excellence Program, a Professorship funded by the Israel Cancer Research Fund (ICRF) and the Foundation for Promotion of Research in the Technion administered by the Vice President of the Technion for Research. Infrastructural
9
10 N-terminal Ubiquitination
equipment was purchased with the support of the Wolfson Charitable Fund Center of Excellence for studies on Turnover of Cellular Proteins and its Implications to Human Diseases.
References 1 BACHMAIR, A., FINLEY, D. and VARSHAVSKY, A. In vivo half-life of a protein is a function of its amino-terminal residue. Science, 1986, 234, 179–186. 2 VARSHAVSKY, A. The N-end rule: functions, mysteries, uses. Proc. Natl Acad. Sci. USA, 1996, 93, 12142–12149. 3 NISHIZAWA, M., FURUNO, N., OKAZAKI, K., TANAKA, H., OGAWA, Y. and SAGATA, N. Degradation of Mos by the N-terminal proline (Pro2)-dependent ubiquitin pathway on fertilization of Xenopus eggs: possible significance of natural selection for Pro2 in Mos. EMBO J., 1993, 12, 4021–4027. 4 SCHERER, D. C., BROCKMAN, J. A., CHEN, Z., MANIATIS, T. and BALLARD, D. W. Signal-induced degradation of IkBα requires site-specific ubiquitination. Proc. Natl Acad. Sci. USA, 1995, 92, 11259–11263. 5 KORNITZER, D., RABOY, B., KULKA, R. G. and FINK, G. R. Regulated degradation of the transcription factor Gcn4. EMBO J., 1994, 13, 6021–6030. 6 SOKOLIK, C. W. and COHEN, R. E. The structures of ubiquitin conjugates of yeast Iso-2-cytochrome c. J. Biol. Chem., 1991, 266, 9100–9107. 7 CHAU, V., TOBIAS, J. W., BACHMAIR, A., MARRIOTT, D., ECKER, D. J., GONDA, D. K. and VARSHAVSKY, A. A multi ubiquitin chain is confined to specific lysine in a targeted short-lived protein. Science, 1989, 243, 1576–1583. 8 HOU, D., CENCIARELLI, C., JENSEN, J. P., NGUYGEN, H. B. and WEISSMAN, A. M. Activation-dependent ubiquitination of a T cell antigen receptor subunit on multiple intracellular lysines. J. Biol. Chem., 1994, 269, 14244–14247. 9 TREIER, M., STASZEWSKI, L. and BOHMANN, D. Ubiquitin-dependent c-Jun degradation in vivo is mediated by the δ-domain. Cell, 1994, 78, 787–798.
10 KING, R. W., GLOTZER, M. and KIRSCHNER, M. W. Mutagenic analysis of the destruction signal of mitotic cyclins and structural characterization of ubiquitinated intermediates. Mol. Biol. Cell, 1996, 7, 1343–1357. 11 GOLDKNOPF, I. L. and BUSCH, H. Isopeptide linkage between nonhistone and histone 2A polypeptides of chromosomal conjugate-protein A24. Proc. Natl. Acad. USA, 1977, 74, 864–868. 12 GRONROOS, E., HELLMAN, U., HELDIN, C. H. and ERICSSON, J. Control of Smad7 stability by competition between acetylation and ubiquitination. Mol. Cell, 2002, 10, 483–493. 13 JOHNSON, E. S., BARTEL, B., SEUFERT, W. and VARSHAVSKY, A. Ubiquitin as a degradation signal. EMBO J. 1992, 11, 497–505. 14 THAYER, M. J., TAPSCOTT, S. J., DAVIS, R. L., WRIGHT, W. E., LASSAR, A. B. and WEINTRAUB, H. Positive autoregulation of the myogenic determination gene MyoD1. Cell, 1989, 58, 241–248. 15 ABU-HATOUM, O., GROSS-MESILATY, S., BREITSCHOPF, K., HOFFMAN, A., GONEN, H., CIECHANOVER, A. and BENGAL, E. Degradation of the myogenic transcription factor MyoD by the ubiquitin pathway in vivo and in vitro: regulation by specific DNA binding. Mol. Cell. Biol., 1998, 18, 5670–5677. 16 BREITSCHOPF, K., BENGAL, E., ZIV, T., ADMON, A. and CIECHANOVER, A. A novel site for ubiquitination: The N-terminal residue and not internal lysines of MyoD is essential for conjugation and degradation of the protein. EMBO J., 1998, 17, 5964–5973. 17 HERSHKO, A. and HELLER, H. Occurrence of a polyubiquitin structure in ubiquitin-protein conjugates. Biochem. Biophys. Res. Commun. 1985, 128, 1079–1086.
References 18 REINSTEIN, E., SCHEFFNER, M., OREN, M., SCHWARTZ, A. L. and CIECHANOVER, A. Degradation of the E7 human papillomavirus oncoprotein by the ubiquitin-proteasome system: 2 N-terminal Ubiquitination: No Longer Such a Rare Modification targeting via ubiquitination of the N-terminal residue. Oncogene, 2000, 19, 5944–5950. 19 AVIEL, S., WINBERG, G., MASSUCCI, M. and CIECHANOVER, A. Degradation of the Epstein-Barr virus latent membrane protein 1 (LMP1) by the ubiquitin-proteasome pathway: targeting via ubiquitination of the N-terminal residue. J. Biol Chem., 2000, 275, 23491–23499. 20 IKEDA, M., IKEDA, A. and LONGNECKER, R. Lysine-independent ubiquitination of the Epstein-Barr virus LMP2A. Virology, 2002, 300, 153–159. 21 BLOOM, J., AMADOR, V., BARTOLINI, F., DEMARTINO, G. and PAGANO, M. Proteasome-mediated degradation of p21 via N-terminal ubiquitinylation. Cell, 2003, 115, 1–20. 22 COULOMBE, P., RODIER, G., BONNEIL, E., THIBAULT, P. and MELOCHE, S. N-Terminal ubiquitination of extracellular signal-regulated kinase 3 and p21 directs their degradation by the proteasome. Mol. Cell Biol., 2004, 24, 6140–6150. 23 FAJERMAN, I., SCHAWARTZ, A. L. and CIECHANOVER, A. Degradation of the Id2 developmental regulator: Targeting via N-Terminal Ubiquitination. Biochem. Biophys. Res. Commun., 2004, 314, 505–512. 24 AZAR-TRAUSCH, J. S., LINGBECK, J., CIECHANOVER, A. and SCHWARTZ, A. L. Ubiquitin-proteasome-mediated degradation of Id1 is modulated by MyoD. J. Biol. Chem., 2004, 279, 32614–32619. 25 DOOLMAN, R., LEICHNER, G. S., AVNER, R. and ROITELMAN, J. Ubiquitin is conjugated by membrane ubiquitin ligase to three
26
27
28
29
30
31
32
sites, including N-terminus, in transmembrane region of mammalian 3-hydroxy-3- methylglutaryl coenzyme a reductase: Implications for sterol-regulated enzyme degradation. J. Biol. Chem., 2004, 279, 38184–38193. KUO, M. L., DEN BESTEN, W., BERTWISTLE, D., ROUSSEL, M. F., and SHERR, C. J. N-terminal polyubiquitination and degradation of the Arf tumor suppressor. Genes and Dev., 2004, 18, 1862–1874. BEN-SAADON, R., FAJERMAN, I., ZIV, T., HELLMAN, U., SCHWARTZ, A. L., and CIECHANOVER, A. The Tumor Suppressor Protein p16INK4a and the Human Papillomavirus oncoprotein E7-58 are Naturally Occurring Lysine-Less Proteins that are Degraded by the Ubiquitin System: Direct Evidence for Ubiquitination at the N-Terminal Residue. J. Biol. Chem., 2004, 279, 41414–41421. SHEAFF, R. J., SINGER, J. D., SWANGER, J., SMITHERMAN, M., ROBERTS, J. M. and CLURMAN, B. E. Proteasomal turnover of p21Cip1 does not require p21Cip1 ubiquitination. Mol. Cell, 2000, 5, 403–410. JIN, Y., LEE, H., ZENG, S. X., DAI, M. S. and LU, H. Mdm2 promotes p21waf cip1 proteasomal turnover independently of ubiquitylation. EMBO J., 2003, 22, 6365–6377. VARSHAVSKY, A. The N-end rule: Functions, mysteries, uses. Proc. Natl. Acad. Sci. USA, 1996, 93, 12142–12149. JOHNSON, E. S., MA, P. C., OTA, I. M. and VARSHAVSKY, A. A proteolytic pathway that recognizes ubiquitin as a degradation signal. J. Biol. Chem., 1995, 270, 17442–17456. POLEVODA, B. and SHERMAN, F. N-terminal acetyltransferases and sequence requirements for N-terminal acetylation of eukaryotic proteins. J. Mol. Biol., 2003, 325, 595–622.
11
1
The COP9 Signalosome and Its Role in the Ubiquitin System Dawadschargal Bech-Otschir
Western General Hospital, Edinburgh, United Kingdom
Barbara Kapelari Max-Planck-Institute for Biochemistry, Martinsried, Germany
Wolfgang Dubiel Charit´e-Universit¨atsmedizin Berlin, Berlin, Germany
Originally published in: Protein Degradation, Volume 1. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30837-8
1 Introduction
The COP9 signalosome (CSN) is a multimeric, highly-conserved protein complex [1]. Just like the ubiquitin system it occurs in all studied eukaryotic cells. Following its 1994 discovery in plant cells the complex was postulated to function in signal transduction [2]. Originally described as a regulator of light-dependent growth in plants [3, 4], identification and characterization of the CSN from mammalian cells led to the discovery of sequence homologies between CSN subunits and subunits of the 26S proteasome lid complex [5, 6] as well as subunits of the translation-initiation complex eIF3 [7]. Significant progress has been made towards understanding its structure and function by analyzing different eukaryotic organisms. The complex is involved in developmental processes of plants [8] and Drosophila [9] and is essential for embryogenesis in mice [10]. It seems to participate in processes such as DNA repair [11], cell-cycle regulation [12] and angiogenesis [13]. At the moment the pleiotropic effects of the CSN can be explained by its regulatory impact on the ubiquitin system. Here we provide a summary of current knowledge of CSN function in the ubiquitin system.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The COP9 Signalosome and Its Role in the Ubiquitin System
2 Discovery of the CSN
Deng and co-workers discovered the CSN in Arabidopsis thaliana when they characterized mutants of light-dependent development, and they called it the COP9 complex [2]. Morphogenesis of germinating seedlings is light-dependent. Light triggers a developmental process called photomorphogenesis. A number of mutations in the Arabidopsis COP/DET/FUS loci result in the loss of the COP9 complex accompanied with cop phenotypes in which germinating seedlings exhibit light-independent expression of light-induced genes [3]. Therefore the complex was originally hypothesized to be a repressor of photomorphogenesis [14]. The mammalian CSN complex was independently isolated during preparations of the 26S proteasome and called the JAB1-containing signalosome [15]. The same studies identified proteins such as JAB1 [15] and TRIP15 [16] as components of the complex and revealed homologies between subunits of the CSN and components of the 26S proteasome lid complex. Purification and analysis of the complex from Arabidopsis, pork spleen [6, 17] and human red blood cells [5, 18] led to the conclusion that each subunit of the CSN has its paralog subunit in the 26S proteasome lid complex. These data suggested a common origin for the two complexes during evolution. Because they have similar architectures, the two complexes have been postulated to perform related functions (see below). Unfortunately there is only limited information on the structure or function of the eIF3 complex, and its relationship to the CSN and the lid is not well understood [7]. Studies have revealed that the CSN possesses both intrinsic and extrinsic (associated) activities, which will be reviewed in detail below. Historical gene names of the CSN have been summarized before [1]. In this article we use the unified nomenclature of the CSN [1].
3 Architecture of the CSN 3.1 CSN Subunit–Subunit Interactions
The CSN is composed of eight subunits called CSN1 to CSN8, which are highly conserved in eukaryotes, although only six of them occur in fission yeast. Two hybrid screens and biochemical methods such as far westerns, pull downs and co-precipitation defined a number of CSN subunit–subunit interactions. Figure 1 illustrates known subunit–subunit interactions. Initial insight into the architecture of CSN came from the first 2D electron microscopic analysis of purified CSN from human red blood cells [19] (see also Figure 2 below). The CSN architecture shows similarity to that of the lid. Both complexes have an asymmetric arrangement of their subunits and exhibit a central groove structure [19]. Whether the structural similarity of the two complexes is connected with
3 Architecture of the CSN 3
Fig. 1 Subunit–subunit interactions of the CSN and interactions of CSN subunits with other proteins. Subunits are numbered according to the unified nomenclature [1]. CSN subunit–subunit interactions have been published before [19]. Darker shad-
ing indicates subunits with MPN domains and lighter those with PCI domains. Known phosphorylated subunits are indicated. Details on CSN subunit interactions with other proteins can be found in the text.
similar functions remains unclear. The exact arrangement of CSN and lid subunits within their complexes remains uncertain in the absence of high-resolution crystal structures for the two complexes. Interestingly, the occurrence of smaller CSN sub-complexes apart from the large 500-kDa CSN complex has been described in different species such as Arabidopsis, Drosophila, Schizosaccharomyces pombe and mammalian cells (for a review see Ref. [20]). At the moment the physiological function of CSN sub-complexes is unclear. It can be speculated that a controlled equilibrium exists between the large and small CSN complexes. The small complexes may have a function in shuttling between nucleus and cytoplasma and/or between large multi-subunit complexes such as the 26S proteasome and cullin-based Ub ligases. 3.2 CSN-subunit Interactions With Other Proteins
Apart from subunit–subunit interactions within the CSN, a considerable number of cellular proteins interact with CSN subunits (see Figure 1). Although the
4 The COP9 Signalosome and Its Role in the Ubiquitin System
physiological relevance of many of the identified interactions is questionable, most of them might be attributed to a role of the CSN complex in signal transduction and ubiquitin-dependent proteolysis. CSN1 formerly called Gps1 was first described as a signal transduction repressor in Arabidopsis [21]. Over-expression of CSN1 suppresses the activated JNK signaling pathway and also inhibits UV- and serum-induced c-fos expression as well as MEKKactivated AP1-activity in mammalian cells [21, 22]. It remains unclear whether overexpressed CSN1 plays a role as dominant negative regulator when in the CSN complex. Whereas the N-terminal region of CSN1 is sufficient for repression, the C-terminal region is necessary for its integration into the complex and for the stability of the CSN complex [22]. Curiously, the N-terminal region of CSN1 may be not required for the CSN-associated deneddylation of cullin 1 (CUL1) and cullin 3 (CUL3), components of cullin-based E3 Ub ligases in Arabidopsis, although it appears to be one of the binding sites of the CSN for cullin-based complexes [23]. Moreover, CSN1 is the receptor site for the interaction of the CSN with inositol 1,3,4-trisphosphate 5/6 kinase [24]. In addition, CSN1 represents the interactor for a subunit of the translation-initiation factor 3, eIF3c/NIP1 [25], and for the 26S proteasome non-ATPase subunit Rpn6 [26]. Possible functions of these interactions are discussed later. CSN2 also known as alien [27] is perhaps an important regulatory subunit of the CSN. Firstly, CSN2 was identified as Trip15 (thyroid hormone receptor-interacting protein) using a yeast-two-hybrid screen [16]. The binding site of CUL1 and CUL2 is located at the N-terminal region of CSN2. This interaction is important for the CSNmediated deneddylation of cullin-based complexes that regulate their Ub-ligase activity [28]. Additionally, CSN2 binds to the transcription factor, ICSBP (interferon consensus sequence binding protein), which modulates interferon-directed gene expression [29]. Moreover it interacts with the nuclear receptors DAX-1, COUP1TF1 and ecdysone receptor [27, 30]. Interestingly, CSN2 is phos-phorylated by the CSN-associated kinases CK2 and PKD [19, 31]. However, the phosphorylation sites and their physiological function remain unclear. The CSN3 subunit interacts with IKKγ , a component of the IκB-kinase complex controlling NF-κB activity [32]. Additionally, it is the binding site for the CSNassociated kinases CK2 and PKD [31]. The subunit of the translation-initiation factor 3 complex, Int6/eIF3e, and the ubiquitin-conjugating enzyme variant, COP10, have been identified as other cellular interactors [33, 34]. Also the HIV-1 Tat protein interacts with CSN3 (our unpublished data). CSN4 is a poorly studied subunit of the CSN. Only one interactor of CSN4 has been identified, the ubiquitin-conjugating enzyme COP10 [34]. CSN5 appears to be a most important subunit both in terms of interactions with other cellular proteins and because it is a component with intrinsic metalloprotease activity (see below). The binding of CSN5 to cellular proteins including the transcription factors p53 [35] and c-Jun [15], the cell-cycle regulator protein p27 [36], rLHR (lutropin/choriogonadotropin receptor precursor) [37], Smad4 (TGF-β signaling pathway common effector) [38] and HIF1α (hypoxia-inducible factor 1) [39] appears to regulate their metabolic stability. In many cases the CSN5-interacting
3 Architecture of the CSN 5
proteins are phosphorylated by the CSN-associated kinases, which determines the speed of their destruction [40]. In contrast, Id1 and Id3 binding to the CSN complex via CSN5 leads to their stabilization, not to their phosphorylation [41]. The interaction of CSN5 to the member of the IκB multigene family Bcl3, the progesterone receptor PR, and the steroid receptor co-activator SRC-1, leads to stabilization of Bcl3-p50 and PR-SRC-1 complexes and enhances transcriptional activity [42, 43]. Whereas AP-1 activity is stimulated by interaction of CSN5 with the integrin adhesion receptor LFA-1 [44], the opposite effect was reported in the case of the cytokine migration inhibitor factor, MIF [45]. Additionally, there are other published interactors of CSN5 including the membrane-associated RING-finger Ub ligase TRC8 [46], hepatopoietin (HPO) [47], germ-line RNA helicases (GLHs) [48], and the ubiquitin C-terminal hydrolase PGP9.5 [49]. However, the exact role of these interactions remains unclear. Several groups reported the occurrence of a free CSN5 subunit [50] or CSN5 as a component of a smaller complex [51], although the exact physiological function of the different CSN5 forms is so far unclear. It is also unknown whether the occurrence of the different CSN5 forms is regulated. Moreover, little is known about CSN5 interactions in vivo, how they are regulated and under what circumstances they take place. CSN6 like CSN8 exists in eukaryotes except in S. pombe [52]. There are only a few published interactions of CSN6 with other cellular proteins. It binds to the HIV-1 Vpr protein affecting cell-cycle-associated signaling [53] and to the RING-finger protein of the SCF-complex, Rbx1 [54, 55]. In addition, CSN6 is another binding site for Int-6/eIF3e [33]. Interestingly, in mammalian cells two homologs of CSN7, CSN7a and CSN7b, have been found [6]. S. pombe contains only one form of CSN7 whereas Arabidopsis contains two alternative splicing variants, CSN7i and CSN7ii [52, 56]. CSN7 interacts with the polyamine-modulated factor PMF-1 [57]. Interestingly, CSN7 also binds the protein kinase CK2, one of the CSN-associated kinases, which phosphorylates CSN7 [31]. Whether the phosphorylated form of CSN7 is necessary for CSN complex assembly or for other regulatory events is unclear. Little is known of CSN8 interactions. CSN8 like CSN3 and CSN4 binds to COP10 [34]. 3.3 PCI and MPN
Six of the CSN subunits contain PCI (proteasome, COP9 signalosome, initiation factor 3) domains and two contain MPN (Mpr-Pad1-N-terminal) domains [58]. These two characteristic domains have been found in three protein complexes: the CSN, the 26S proteasome lid complex (lid) and the eIF3 complex. The two domains are composed of about 150 to 200 amino acids at the N- or C-terminus of the CSN subunits. Apparently, the PCI domain has been shown to be important for interactions between CSN subunits. Thus, it might have a scaffolding function [22, 59].
6 The COP9 Signalosome and Its Role in the Ubiquitin System
The CSN subunit CSN5 has been shown to contain a metalloprotease motif localized on its MPN domain, which is essential for the cleavage of the ubiquitinlike modifier NEDD8 from cullins [60] (see below). Apart from the catalytic activity of the MPN domain of CSN5 it appears to be the receptor for different cellular proteins associated with the CSN complex (see above and Figure 1). Interestingly, an MPN domain similar to that of CSN5 is located in the N-terminal region of CSN6. However, this MPN domain has no deneddylation catalytic center like CSN5. The function of the CSN6 MPN domain remains obscure.
4 Biochemical Activities Associated With the CSN 4.1 Deneddylation Activity
Studies in fission yeast and Arabidopsis have revealed that the CSN has a role in the cleavage of NEDD8 from cullins [54, 55, 61]. The MPN domain of CSN5, like its paralog subunit Rpn11 of the 26S proteasome lid complex, possesses a highly conserved pattern of four charged amino acid residues: one glutamate, two histidines and one aspartate. This pattern represents a new type of metalloprotease motif called the JAMM (Jab1/MPN domain metalloenzyme) or MPN+ motif [62, 63]. In CSN5 the catalytic region is important for the cleavage of the ubiquitinlike modifier NEDD8 from its targets. Mutations in the conserved histidine and asparxstate residues of CSN5 led to suppression of its deneddylation activity [60]. Crystal-structure analysis obtained with bacterial CSN5/MPN+ domain-containing AF2198 protein confirmed the metal-ion-dependent hydrolytic activity of CSN5, although it was inhibited by the alkylating agent NEM, an inhibitor of cysteine proteases [64]. NEDD8 is activated by a heterodimeric complex of APP-B1 and Ubα3 and is conjugated to target proteins by the conjugating enzyme Ubc12. So far, the only known targets are cullin-family proteins (CUL1–5), which are components of the cullin-based E3 ligase complexes. The covalent linkage of NEDD8 to cullins in vivo is thought to activate Ub-ligase complex activity by facilitating ubiquitin-conjugating enzyme E2 recruitment [65]. Deneddylation of cullins inactivates ubiquitination in vitro, but seems to stimulate the Ub E3 ligase complex activity in vivo [66, 67]. In cell lysates only a small fraction of CUL1 is neddylated, but in csn deletion cells 100% of CUL1 is modified by NEDD8. The purified CSN complex is able to deneddylate, although recombinant CSN5 protein cannot. Obviously CSN5 deneddylation activity is dependent on its assembly into the CSN complex [28, 54]. The fact that null mutants in most CSN subunits lack the deneddylation activity in the presence of excess CSN5 supports the fact that CSN5 alone is inactive in deneddylation [61]. So far, the exact role of deneddylation is questionable (see below).
4 Biochemical Activities Associated With the CSN 7
Fig. 2 Association of the CSN complex with enzymes. The Figure shows an electronmicroscopy image of purified CSN complex from human erythrocytes. As indicated by arrows the CSN is associated with the Ubspecific protease Ubp12, the proteasome, presumably with most of the cullin-based
Ub-ligase complexes, with a number of kinases, and with subunits of the translation initiation complex eIF3. In addition, subunit CSN5 has an intrinsic metalloprotease activity, which deneddylates cullins and also removes Ub conjugated to other proteins (for details see text).
4.2 Protein Kinases
The CSN is associated with enzymes such as kinases, proteases and Ub ligases, which perhaps, besides the intrinsic deneddylation activity, determine the specific function of the CSN in the Ub system. Here we summarize the associated (extrinsic) activities of the CSN shown in Figure 2. 4.2.1 Associated Protein Kinases Originally, a protein kinase was the first enzyme identified with the CSN purified from human erythrocytes. The CSN-associated kinase activity phosphorylated several serine and threonine residues in the N-terminal region of c-Jun [5] resulting in stabilization of c-Jun and increased AP-1 transcriptional activity. The pathway responsible for this c-Jun stabilization/activation was called CSN-directed c-Jun signaling [68]. It was subsequently shown that the CSN-directed c-Jun signaling pathway controls most of the VEGF (vascular endothelial growth factor) production in tumor cells [13]. VEGF is essential for tumor angiogenesis (see below).
8 The COP9 Signalosome and Its Role in the Ubiquitin System
In contrast to c-Jun, phosphorylation of the tumor suppressor p53 by CSNassociated kinases targets the protein for degradation by the Ub system [35]. For p53 stability, modification on Thr155 is most important as shown by mutational analysis [35] and by using different p53 peptides [31]. Mutation of Thr155 to Val led to stabilization of the transiently expressed p53 mutant in HeLa as well as in HL60 cells [35]. Inhibitors of CSN-associated kinases such as curcumin [18] caused stabilization of cellular p53 followed by massive cell death [35]. In addition to p53 and c-Jun, p27, ICSBP (interferon consensus sequence binding protein) and IκBα were identified as substrates of the CSN-associated kinases (for a review see Ref. [40]). Similar to p53, the phosphorylation of p27 results in accelerated degradation of the cyclin-dependent kinase inhibitor p27 by the Ub system (our unpublished data). In the case of ICSBP and IκBα, it is still unclear whether CSNmediated phosphorylation influences their stability. Interestingly, two of the CSN subunits, CSN2 and CSN7, are phosphorylated by the associated kinases [19, 69]. The physiological relevance of these modifications is currently obscure. 4.2.1.1 Identification of associated protein kinases Based on phosphopeptide analyses it became clear that associated kinases modify principally serine and threonine residues. Moreover, the analysis of putative phosphorylation-specific consensus sequences of p53, c-Jun, p27, ICSBP and IκBα revealed that the protein kinase CK2 and a member of the protein kinase C family might be associated with the CSN. It has been shown by immunoblotting that CK2 and the protein kinase Cµ (also called protein kinase D, PKD) co-purify with the CSN from human erythrocytes [31]. In addition, the two kinases co-immunoprecipitated together with the CSN from HeLa cells. Interaction of CK2 as well as PKD with the CSN is mediated by CSN3, as is the interaction between CK2 and the CSN7 subunit [31]. Interestingly, CSN7 itself is phosphorylated. Majerus and co-workers have published work on the co-purification of inositol 1,3,4-trisphosphate 5/6-kinase (5/6-kinase) with the CSN from bovine brain [24, 70]. Although the 5/6-kinase was not detected in the final preparation of the CSN from human erythrocytes [31], it cannot be excluded that the enzyme is associated with another pool of CSN particles. The enzyme phosphorylates c-Jun, IκBα as well as p53 and is sensitive to curcumin. These characteristics are very similar to those described for CK2 and PKD. It has been shown that the 5/6-kinase interacts with CSN1 and that over-expression of CSN1 inhibits its activity [24]. It might be that it interacts with the N-terminal part of CSN1, which has been shown to suppress activation of an AP-1 promoter [22]. Future studies will show whether additional kinases besides 5/6-kinase, CK2 and PKD can interact with the CSN under certain circumstances. For example, an interaction of CSN3 with IKKγ , a component of the IKK kinase complex, has been published [32]. 4.2.1.2 Functions of associated protein kinases Phosphorylation of a number of Ub-dependent substrates by CSN-associated kinases regulates the stability of the proteins towards the Ub system [40], presumably by promoting substrate ubiquitination. Most of the proteins bind to the CSN via CSN5, are phosphorylated and
4 Biochemical Activities Associated With the CSN 9
subsequently channeled to the associated Ub ligase for ubiquitination (see below). Modification of p53 induces a conformational change of the tumor suppressor, which leads to tighter binding to the Ub ligase [35]. In addition, there is evidence that phosphorylation might directly affect Ub-ligase activity. The transcriptional regulator Id3 interacts with the CSN, but is not phosphorylated. Nevertheless, inhibitors of CSN-associated kinases induce ubiquitination and degradation of the Id3 protein [41]. Because of associated kinases and their function in ubiquitination the CSN has been described as a complex “at the interface between signal transduction and ubiquitin-dependent proteolysis” [40]. This becomes even more significant if upstream regulation of the associated kinases is taken into account. Unfortunately at the moment little is known about the receptors or signal-transduction pathways leading to modification of the CSN and its associated kinase activities. It is also unclear whether there are interactions between the kinases and other associated activities of the CSN. 4.3 Deubiquitinating Enzymes
To date two deubiquitinating activities associated with the CSN have been identified. By mutational analysis one deubiquitinating activity has been mapped to the metalloprotease motif His–X–His–X10–Asp of the JAMM or MPN+ domain of CSN5 [11]. The conserved Asp residue of that motif was mutated and the mutant Flag-CSN5 was integrated into the CSN. The mutated CSN lost its ability to remove ubiquitin from the isopeptide bond of a mono-ubiquitinated conjugate [11]. Obviously the same intrinsic metalloprotease activity is responsible for deneddylation of mono-neddylated cullins [60], which is not surprising because of the homologies between Ub and NEDD8. It would be interesting to test whether the lid subunit Rpn11, which exhibits a deubiquitinating MPN+ domain [62], is able to deneddylate mono-neddylated proteins. In addition, another deubiquitinating activity associated with the CSN disassembles poly-Ub chains [11, 71]. This activity is catalyzed in fission yeast by the Ub-specific protease Ubp12, which is a CSN-associated enzyme [71]. The interaction of Ubp12 with the CSN is required for Ubp12 transport to the nucleus. Presumably in the nucleus, S. pombe Pcu1- and Pcu3-based Ub-ligase activities are inhibited by Ubp12 enzyme, since the deubiquitinating enzyme protects a specific adapter protein, Pop1p, from autocatalytic destruction [71]. Thus it seems that the CSN has dual activity in suppressing cullin-based Ub-ligase reactions: one is the intrinsic deneddylation and the other is deubiquitination via associated Ubp12; both reactions serve to inhibit cullin-based Ub ligases in vitro. Data have accumulated showing that CSN-associated deneddylation and deubiquitination are required for Ub-ligase activity in vivo. It has been hypothesized, therefore, that CSN-mediated inhibition of cullin-based ubiquitination might be necessary for the assembly of new cullin-based Ub-ligase complexes. After release from the CSN the new cullin-based complex would be active. It has to return to the
10 The COP9 Signalosome and Its Role in the Ubiquitin System
CSN for re-assembly or is degraded after auto-ubiquitination [71, 72]. For example, p27 has to be degraded at the transition from G1 to S phase. In a first step p27 may bind to the CSN which signals, perhaps by phosphorylation, the assembly of the required SCF complex containing the specific F-box protein Skp2. After formation of the p27-specific SCF complex both p27 and the Ub ligase might be released from the CSN perhaps again by phosphorylation which then results in ubiquitination and complete degradation of p27. Finally the Skp2-containing SCF complex is auto-ubiquitinated and degraded unless additional substrate appears. 4.4 Ubiquitin Ligases
Data have been accumulated demonstrating interactions of the CSN with Ub ligases, in particular with the cullin-based Ub ligases. Cullins 1 to 7 (CUL1–CUL7) form a protein family detected in all eukaryotic cells, which is involved in protein ubiquitination. It is known that CUL1 to CUL5 interact with the RING-domain protein Rbx1, the Ub ligase of the cullin-based complexes. So far it has been shown that the CSN interacts with CUL1 to CUL4 [11, 12, 54, 55, 61, 73]. Binding studies with CUL1 and with CUL2 revealed that the two cullin proteins bind via CSN2 to the complex [28, 54, 55]. In addition, Rbx1 seems to interact with CSN6 [54, 55]. Moreover, CUL1 interacts with Skp1, which makes the connection to a substrate-specific F-box protein. Therefore, CUL1-based Ub ligases are called SCF complexes (Skp1CDC53/CUL1-F-box protein) (for a review see Ref. [74]). CUL2 can be linked to the substrate-adapter protein the von Hippel–Lindau tumor suppressor via elongin C and elongin B forming the so called VCB (von Hippel–Lindau–elongin C–elongin B) complex (for a review see Ref. [74]). BTB/POZ-domain proteins have been identified as possible substrate-specific adaptors of CUL3-based Ub ligases [73, 75, 76]. There are more than 200 putative BTB/POZ-domain proteins expressed in mammalian cells and together with the large number of possible F-box proteins one can estimate that several hundreds of different cullin-based Ub-ligase complexes with different substrate specificities can be formed. The CUL4–Rbx1 complex has been characterized, and seems to be important for checkpoint control [12], DNA repair [11] and ubiquitination of c-Jun [77]. Most likely all cullin-based complexes interact with the CSN. In other words, the CSN is associated with ubiquitinating activity (see Figure 2). There are just a few data on interactions of the CSN with other Ub ligases besides the cullin-based complexes. For example, Mdm2, the RING domain Ub ligase of the tumor suppressor p53, binds to the CSN and is modified by CSN-associated kinases (our unpublished data). Whether Mdm2 is also modified by other CSN-associated activities has to be tested in the future. In addition, COP1, a putative RING-domain Ub ligase, which probably cooperates with the COP1-interacting protein 8 (CIP8) also binds to the CSN (for a review see Ref. [4]). However, some data indicate that COP1 is associated with a CUL4A complex in which it acts together with DET1 as a heterodimeric substrate adapter [77]. In this complex the CSN interacts with both the CUL4A and the COP1.
5 Association of the CSN With Other Protein Complexes 11
5 Association of the CSN With Other Protein Complexes 5.1 The eIF3 Complex
MPN and PCI domains have been also found in subunits of the eIF3 complex. Because MPN and PCI domains are most likely involved in protein–protein interactions (see above), it is not surprising that there are also cross-interactions between subunits of the CSN, the eIF3 and the lid. It has been reported that eIF3e/INT6 possessing a PCI domain interacts with CSN7 [33, 78]. Another eIF3 subunit eIF3c/p105 co-immunoprecipitated with eIF3e/INT6, eIF3b, CSN1 and CSN8 [78]. eIF3e/INT6 was used as bait in a two-hybrid screen that revealed possible interactions with the 26S proteasome ATPase Rpt4, CSN3 and CSN6 but also with CSN7 [33]. Interestingly, the subunit of the CSN-like complex in Saccharomyces cerevisiae Pci8/CSN11 [79] seems also to be a subunit of the budding yeast eIF3 complex and perhaps plays a similar role to eIF3e/INT6 in eukaryotic cells [80]. It has been speculated that these interactions allow the CSN to control translation. Interactions between eIF3e/INT6 or eIF3i with the 26S proteasome have also been described [33, 81]. It has been shown that eIF3e/INT6 interacts with Rpn5 of the lid complex. This has an impact on 26S proteasome activity/localization, presumably affecting cell division and mitotic fidelity [82]. Perhaps there exists a network of “PCI complexes” as suggested [83], which shares polypeptides and communicates via proteins such as eIF3e/INT6. 5.2 The Proteasome
In 1998 it was reported that the CSN co-fractionates with the 26S proteasome from human cells [5]. A yeast two-hybrid screen revealed that the C-terminal domain of the Arabidopsis atCSN1 subunit interacts with atRpn6 of the 26S proteasome lid [26]. Recently gel-filtration size-fractionation of material from Arabidopsis in the presence of ATP and phosphatase inhibitors indicated that the CSN1 and CSN6 subunits co-elute in the same fractions as subunits of the 26S regulatory complex [84]. Based on these data it has been speculated that the CSN might be an alternative lid of the 26S proteasome [85]. The “alternative lid hypothesis”, however, makes little sense if the CSN interacts with the 26S proteasome via the lid component Rpn6 [26]. CSN pull-down experiments and subsequent mass-spectrometry analysis of co-precipitated proteins also revealed the presence of proteasome subunits in the precipitate [73]. However, since proteasome subunits are very abundant in cells, one has to be cautious with this type of data. So far there is no systematic binding study showing physical interaction of the CSN with sub-complexes of the 26S proteasome. Moreover, up to now there is no functional evidence for such a CSN/26S proteasome interaction.
12 The COP9 Signalosome and Its Role in the Ubiquitin System
6 Biological Functions of the CSN 6.1 Regulation of Ubiquitin Conjugate Formation
In general, and including all its activities, intrinsic as well as associated, the CSN seems to be a regulator of ubiquitination. Deneddylation, deubiquitination as well as CSN-mediated phosphorylation (at least with c-Jun and Id3 as substrates) cause inhibition of ubiquitination. It is likely that suppression of ligase activity is an essential step in the dynamic process of specific E3 complex assembly/reassembly. According to the model of Wolf et al. [72] cullin-based Ub-ligase complexes might assemble/reassemble in a protected environment produced by the CSN. In the CSN-associated-state, binding of any E2 to the Ub ligase is prevented, perhaps by deneddylation [65], self-ubiquitination is blocked by continuous deubiquitination [71] and substrate binding could be inhibited by phosphorylation [31]. Only under these conditions can the Ub ligase reassemble without itself being destroyed. For example, an SCF complex might associate with another F-box protein, or a CUL3Ub ligase with another BTB/POZ-domain protein, as an adaptation to the next phase of cell cycle or signal transduction upon the appearance of a new substrate, which has to be degraded. Following this argument a major question arises. How does the substrate signal the assembly of the required Ub ligase performing its ubiquitination? Is it by binding to the CSN and subsequent signaling via specific kinases? In the case of the SCF complexes, another protein called CAND1/Tip120A seems to be involved in the dynamic assembly/reassembly process of the E3 [86]. CAND1 binds to the deneddylated CUL1 and inhibits Ub-ligase activity by competing for the Skp1–F-box-protein unit of the SCF complex [87]. After the release of CAND1, a new Skp1–F-box-protein unit can dock to the CUL1-Rbx1 unit to form an SCF complex possessing the necessary substrate specificity. Now the freshly formed Ub ligase has to be released from the CSN to become active. At the moment it is unclear how the Ub ligase might be released from the CSN. The attractive model of CSN-assisted Ub-ligase-complex assembly has to be tested in the future. In this model the CSN would function as a platform for Ub-ligase assembly. Interestingly, there are no reports of interactions between the 26S proteasome lid complex and Ub ligases. Known E3s directly interacting with the 26S proteasome seem to bind via base ATPases [88, 89]. This is an interesting functional difference between the CSN and the lid developed during evolution. In an alternative model the CSN might be the platform for complete proteolysis. It forms supercomplexes consisting of both the ubiquitinating and the proteolytic machineries. According to this model, the substrate first binds to the CSN, is then ubiquitinated by the associated Ub ligase and finally directly channeled into the 26S proteasome. Deneddylation, deubiquitination and phosphorylation are necessary to maintain the supercomplex, to protect the intermediates and to stimulate proteolysis.
6 Biological Functions of the CSN 13
6.1.1 Cell-cycle and Checkpoint Control Initial insight of the role of CSN in cell-cycle control came from the finding that csn1 and csn2 deletion S. pombe strains have an S-phase delay [52]. Interestingly, this effect did not occur in strains missing other CSN subunits. The S-phase delay was caused by the accumulation of the cell-cycle inhibitor Spd1 (S-phase delayed 1), which is involved in the misregulation of the ribonucleotide reductase (RNR). RNR catalyzes the production of deoxyribonucleotides for DNA synthesis and is composed of four subunits including Suc22. Activation of RNR is regulated by nuclear export of Suc22, which is suppressed by Spd1 [12]. Upon DNA damage or during S phase Spd1 is rapidly degraded, presumably leading to the RNRdependent production of dNTPs. However, in csn1 and csn2 deletion mutants, Spd1 accumulates, causing Suc22-dependent suppression of RNR connected with the S-phase delay and DNA-damage sensitivity [12, 15]. In mammalian cells, binding of HIV-1 Vpr-protein to the CSN6 results in cellcycle arrest at the G2/M phase [53]. Additionally, CSN is involved in the cell cycle via the nuclear export of cell-cycle kinase inhibitor p27kip1 (p27). CSN5 binds to p27 and promotes its nuclear export followed by its proteasome-dependent degradation. The over-expression of CSN5 in mouse fibroblasts counteracts cell-cycle arrest induced by serum depletion [36, 51]. Microinjection of the purified CSN complex into synchronized G1 cells blocks the S-phase entry in a deneddylation-dependent manner [28]. Furthermore, the reduction of CSN subunit expression by RNAi in Caenorhabditis elegans causes the failure of Mei-1 degradation by regulation of its specific Ub-ligase CUL3-based complex, which leads to severe effects during mitotic cell division [76]. Moreover, the CSN is involved in checkpoint control. The double deletions of csn1 and csn2 mutants crossed with checkpoint pathway mutants such as rad3, chk, and cds1 are synthetically lethal in S. pombe [52]. Cds1 kinase is constitutively activated in csn1 mutants. Similarly, loss of csn5 in Drosophila results in activation of Mei-41, one of the ATM/ATR family kinases involved in meiotic checkpoint upon DNA damage [90]. 6.1.2 DNA Repair Two papers have assigned the CSN a function in DNA repair. One study reports on the existence of two different complexes containing human CSN and either one of the two nucleotide-excision-repair proteins, DDB2 or CSA. DDB2 is involved in the global genome-repair pathway (GGR) and CSA functions in the transcriptioncoupled repair pathway (TCR). Additionally, these complexes possess Ub-ligase activity and contain cullin-based Ub-ligase components such as CUL4 and Rbx1/Roc1, and DDB1, a UV-damage DNA-binding protein [11]. However, so far their targets remain unclear. CSN differentially regulates the ubiquitin-ligase activity of the DDB2- and CSA-containing complexes in response to UV irradiation. In support of direct involvement of the CSN is the finding that knockdown of CSN5 with RNAi causes a failure in NER mechanisms [11]. Similarly, CSN in combination with the CUL4–Rbx1 complex is involved in Ub-dependent degradation of CDT1, a licensing factor of the pre-replication complex (preRC), after UV- or γ -irradiation.
14 The COP9 Signalosome and Its Role in the Ubiquitin System
Knockdown of CSN completely suppresses CDT1 degradation, causing a defect G1 checkpoint in response to DNA damage [91]. 6.1.3 Developmental Processes Although CSN is not essential in yeast, the csn1 and csn2 S. pombe deletion mutants display slow growth and sensitivity to UV- and γ -irradiation [52]. Other csn mutants did not show significant phenotypes apart from the loss of cullin’s deneddylation activity [61]. In mutants of CSN-like complexes in the budding yeast S. cerevisiae the sensitivity to the DNA-damage reagents is not affected [92]. In some S. cerevisiae mutants such as csn5, csn9 and csn12 deletions, increased mating efficiency and enhanced pheromone response has been observed [63]. In Drosophila, mutations of CSN causes lethality in early larval stages and defects during oogenesis or photoreceptor R cell differentiation [9, 90, 93, 94]. More specifically, lack of CSN5 leads to the activation of a DNA double-strand-break-dependent checkpoint mediated by Mei-41. This effect is caused by CSN5-dependent inhibition of gurken (Grk) protein translation [90]. In C. elegans, knockdown of CSN5 by RNAi resulted in a sterile phenotype, which could be explained by CSN interaction with germ-line RNA helicases [48]. The best studied physiological role of the CSN in developmental processes is derived from studies on Arabidopsis. Csn mutants can survive embryogenesis, but they die soon after germination. The csn mutants exhibit a defect in photomorphogenesis, a light-dependent developmental process of germinating seedlings. Even in total darkness the mutants display a light-dependent morphology and signalindependent expression of light-induced genes [3, 14, 95]. One key mechanism is the CSN-dependent regulation of the stability of the transcription factor HY5, a positive regulator of light-induced genes. In the dark it is degraded by the Ub system [8]. It has been suggested that in darkness the RING-finger protein COP1 ubiquitinates HY5 and triggers its degradation by the 26S proteasome (for a review see Ref. [4]). In the light, COP1 is relocated to the cytoplasm allowing expression of genes through HY5. Although the exact mechanism remains unclear, the CSN may be required for relocation of COP1 from cytoplasm to the nucleus in darkness. Identical phenotypes caused by different csn mutants in Arabidopsis could be explained by a role of the CSN as a whole complex (for a review see Ref. [20]). There is accumulating evidence for cooperation of the CSN and cullin-based complexes in specific developmental processes [96]. First insight has been provided by studies on auxin response where the CSN interacts with SCFTIR1 , modulating its activity [55]. Similarly, binding of the CSN to other cullin-based complexes regulates their activity in mediating various developmental processes such as flower development, and plant defense responses [97, 98]. 6.2 Tumor Angiogenesis
Tumor angiogenesis is the vascularization of solid tumors, an essential requirement for tumor growth and metastasis. After a solid tumor has reached a size
6 Biological Functions of the CSN 15
of approximately 2 mm3 , it needs nutrient supply from blood vessels, otherwise it dies from necrosis. Many tumor cells are able to induce angiogenesis. In an initiation phase the tumor cells produce large amounts of pro-angiogenic factors such as vascular endothelial growth factor, VEGF. During proliferation and invasion VEGF stimulates migration of endothelial cells. Finally, after a maturation phase, vascularization of solid tumors is completed. Now the tumor can grow, and some tumor cells penetrate through vessel membranes and spread via the circulation. Therefore, inhibition of tumor angiogenesis has become an important strategy in tumor therapy. There is functional cooperation between the CSN and the Ub system in tumor angiogenesis [13]. It has been known for some time that curcumin is an inhibitor of angiogenesis [99]. However, only in 2001 did it become clear that it acts via inhibition of CSN-associated kinases [13]. It has been demonstrated that over-expression of CSN2 subunit leads to elevated amounts of de novo assembled CSN complex connected with increased c-Jun levels and enhanced AP-1 transactivation activity [68]. This c-Jun activation/stabilization is independent of the JNK and the MAP kinase pathway and is called CSN-directed c-Jun signaling [68]. This process can be inhibited by curcumin or other inhibitors of CSN-associated kinases (Figure
Fig. 3 The CSN-directed c-Jun signaling pathway. (A) The active CSN-directed cJun signaling pathway is shown. In case of active CSN-associated kinases c-Jun is phosphorylated, which stabilizes the transcription factor towards the Ub system. In addition, phosphorylation of the responsible E3 might inactivate the enzyme. In this situation Id1 and Id3 are also stabilized. Stable/active c-Jun causes enhanced AP-1 transactivation connected with an increase of VEGF production by tumor cells (see
text). VEGF is a major pro-angiogenic factor produced by many tumor cells. Id1 and Id3 are transcriptional regulators essential for tumor angiogenesis. (B) In the presence of curcumin or other kinase inhibitors the responsible Ub ligase is most likely active and ubiquitinates both c-Jun and Id3. In addition, unphosphorylated c-Jun might have higher affinity to its Ub ligase. This leads to quick degradation of the proteins by the Ub system.
16 The COP9 Signalosome and Its Role in the Ubiquitin System
3). The CSN-directed c-Jun signaling controls up to 75% of VEGF production in tumor cells [13]. In addition, Id1 and Id3 are also essential factors of tumor angiogenesis [100] and are degraded in the presence of CSN-associated kinase inhibitors in an Ub-dependent manner just like c-Jun [41]. Therefore, specific inhibition of CSN-associated kinases might become important for tumor therapy. The application of CSN-associated kinase inhibitors in tumor therapy could be beneficial owing to another effect of curcumin-like compounds, namely they stabilize cellular p53 and, at least in tumors with wild-type p53 protein, massive cell death can be observed [35].
7 Conclusions
The CSN is a regulatory complex of the Ub system. Physically it interacts with the proteasome and with Ub ligases. Although the exact mechanism remains obscure, the CSN regulates ubiquitination of important cell-cycle factors and transcriptional regulators. Its intrinsic deneddylating as well as the associated kinase and deubiquitinating activities seem to be required for determining protein stability towards the Ub system. As a major regulator of the Ub system the CSN is involved in processes such as DNA repair, cell-cycle progression and development. Its role in tumor angiogenesis makes the complex attractive for future tumor therapies.
Acknowledgment
We thank Rasmus Hartmann-Petersen for critical reading of the manuscript. The work was supported by research grants from the Deutsche Forschungsgemeinschaft and from the German-Israeli Foundation to W. D.
References 1 DENG, X. W., DUBIEL, W., WEI, N., HOFMANN, K., MUNDT, K., COLICELLI, J., KATO, J., NAUMANN, M., SEGAL, D., SEEGER, M., CARR, A., GLICKMAN, M., CHAMOVITZ, D. A. Unified nomenclature for the COP9 signalosome and its subunits: an essential regulator of development. Trends Genet. 2000, 16, 202–203. 2 WEI, N., CHAMOVITZ, D. A., DENG, X. W. Arabidopsis COP9 is a component of a novel signaling complex mediating light control of development. Cell 1994, 78, 117–124.
3 WEI, N., DENG, X. W. Making sense of the COP9 signalosome. A regulatory protein complex conserved from Arabidopsis to human. Trends Genet. 1999, 15, 98–103. 4 SERINO, G., DENG, X. W. The COP9 signalosome: regulating plant development through the control of proteolysis. Annu. Rev. Plant Biol. 2003, 54, 165–182. 5 SEEGER, M., KRAFT, R., FERRELL, K., BECH-OTSCHIR, D., DUMDEY, R., SCHADE, R., GORDON, C., NAUMANN, M., DUBIEL, W. A novel protein complex involved in
References
6
7
8
9
10
11
12
13
signal transduction possessing similarities to 26S proteasome subunits. FASEB J. 1998, 12, 469–478. WEI, N., TSUGE, T., SERINO, G., DOHMAE, N., TAKIO, K., MATSUI, M., DENG, X. W. The COP9 complex is conserved between plants and mammals and is related to the 26S proteasome regulatory complex. Curr. Biol. 1998, 8, 919–922. GLICKMAN, M. H., RUBIN, D. M., COUX, O., WEFES, I., PFEIFER, G., CJEKA, Z., BAUMEISTER, W., FRIED, V. A., FINLEY, D. A subcomplex of the proteasome regulatory particle required for ubiquitin-conjugate degradation and related to the COP9-signalosome and eIF3. Cell 1998, 94, 615–623. SCHWECHHEIMER, C., DENG, X. W. COP9 signalosome revisited: a novel mediator of protein degradation. Trends Cell Biol. 2001, 11, 420–426. FREILICH, S., ORON, E., KAPP, Y., NEVO-CASPI, Y., ORGAD, S., SEGAL, D., CHAMOVITZ, D. A. The COP9 signalosome is essential for development of Drosophila melanogaster. Curr. Biol. 1999, 9, 1187–1190. LYKKE-ANDERSEN, K., SCHAEFER, L., MENON, S., DENG, X. W., MILLER, J. B., WEI, N. Disruption of the COP9 signalosome Csn2 subunit in mice causes deficient cell proliferation, accumulation of p53 and cyclin E, and early embryonic death. Mol. Cell. Biol. 2003, 23, 6790–6797. GROISMAN, R., POLANOWSKA, J., KURAOKA, I., SAWADA, J., SAIJO, M., DRAPKIN, R., KISSELEV, A. F., TANAKA, K., NAKATANI, Y. The ubiquitin ligase activity in the DDB2 and CSA complexes is differentially regulated by the COP9 signalosome in response to DNA damage. Cell 2003, 113, 357–367. LIU, C., POWELL, K. A., MUNDT, K., WU, L., CARR, A. M., CASPARI, T. Cop9/signalosome subunits and Pcu4 regulate ribonucleotide reductase by both checkpoint-dependent and -independent mechanisms. Genes Dev. 2003, 17, 1130–1140. POLLMANN, C., HUANG, X., MALL, J., BECH-OTSCHIR, D., NAUMANN, M.,
14
15
16
17
18
19
20
21
DUBIEL, W. The constitutive photomorphogenesis 9 signalosome directs vascular endothelial growth factor production in tumor cells. Cancer Res. 2001, 61, 8416–8421. OSTERLUND, M. T., WEI, N., DENG, X. W. The roles of photoreceptor systems and the COP1-targeted destabilization of HY5 in light control of Arabidopsis seedling development. Plant Physiol. 2000, 124, 1520–1524. CLARET, F. X., HIBI, M., DHUT, S., TODA, T., KARIN, M. A new group of conserved coactivators that increase the specificity of AP-1 transcription factors. Nature 1996, 383, 453–457. LEE, J. W., CHOI, H. S., GYURIS, J., BRENT, R., MOORE, D. D. Two classes of proteins dependent on either the presence or absence of thyroid hormone for interaction with the thyroid hormone receptor. Mol. Endocrinol. 1995, 9, 243–254. WEI, N., DENG, X. W. Characterization and purification of the mammalian COP9 complex, a conserved nuclear regulator initially identified as a repressor of photomorphogenesis in higher plants. Photochem. Photobiol. 1998, 68, 237–241. HENKE, W., FERRELL, K., BECH-OTSCHIR, D., SEEGER, M., SCHADE, R., JUNGBLUT, P., NAUMANN, M., DUBIEL, W. Comparison of human COP9 signalsome and 26S proteasome lid’. Mol. Biol. Rep. 1999, 26, 29–34. KAPELARI, B., BECH-OTSCHIR, D., HEGERL, R., SCHADE, R., DUMDEY, R., DUBIEL, W. Electron microscopy and subunit–subunit interaction studies reveal a first architecture of COP9 signalosome. J. Mol. Biol. 2000, 300, 1169–1178. WEI, N., DENG, X. W. The COP9 signalosome. Annu. Rev. Cell Dev. Biol. 2003, 19, 261–286. SPAIN, B. H., BOWDISH, K. S., PACAL, A. R., STAUB, S. F., KOO, D., CHANG, C. Y., XIE, W., COLICELLI, J. Two human cDNAs, including a homolog of Arabidopsis FUS6 (COP11), suppress G-protein- and mitogen-activated protein kinase-mediated signal transduction in
17
18 The COP9 Signalosome and Its Role in the Ubiquitin System
22
23
24
25
26
27
28
29
yeast and mammalian cells. Mol. Cell. Biol. 1996, 16, 6698–6706. TSUGE, T., MATSUI, M., WEI, N. The subunit 1 of the COP9 signalosome suppresses gene expression through its N-terminal domain and incorporates into the complex through the PCI domain. J. Mol. Biol. 2001, 305, 1–9. WANG, X., KANG, D., FENG, S., SERINO, G., SCHWECHHEIMER, C., WEI, N. CSN1 N-terminal-dependent activity is required for Arabidopsis development but not for Rub1/Nedd8 deconjugation of cullins: a structure-function study of CSN1 subunit of COP9 signalosome. Mol. Biol. Cell 2002, 13, 646–655. SUN, Y., WILSON, M. P., MAJERUS, P. W. Inositol 1,3,4-trisphosphate 5/6-kinase associates with the COP9 signalosome by binding to CSN1. J. Biol. Chem. 2002, 277, 45759–45764. KARNIOL, B., YAHALOM, A., KWOK, S., TSUGE, T., MATSUI, M., DENG, X. W., CHAMOVITZ, D. A. The Arabidopsis homologue of an eIF3 complex subunit associates with the COP9 complex. FEBS Lett. 1998, 439, 173–179. KWOK, S. F., STAUB, J. M., DENG, X. W. Characterization of two subunits of Arabidopsis 19S proteasome regulatory complex and its possible interaction with the COP9 complex. J. Mol. Biol. 1999, 285, 85–95. DRESSEL, U., THORMEYER, D., ALTINCICEK, B., PAULULAT, A., EGGERT, M., SCHNEIDER, S., TENBAUM, S. P., RENKAWITZ, R., BANIAHMAD, A. Alien, a highly conserved protein with characteristics of a corepressor for members of the nuclear hormone receptor superfamily. Mol. Cell. Biol. 1999, 19, 3383–3394. YANG, X., MENON, S., LYKKE-ANDERSEN, K., TSUGE, T., DI, X., WANG, X., RODRIGUEZ-SUAREZ, R. J., ZHANG, H., WEI, N. The COP9 signalosome inhibits p27(kip1) degradation and impedes G1-S phase progression via deneddylation of SCF Cul1. Curr. Biol. 2002, 12, 667–672. COHEN, H., AZRIEL, A., COHEN, T., MERARO, D., HASHMUELI, S., BECH-OTSCHIR, D., KRAFT, R., DUBIEL, W., LEVI, B. Z. Interaction between interferon consensus sequence-binding
30
31
32
33
34
35
36
37
protein and COP9/signalosome subunit CSN2 (Trip15). A possible link between interferon regulatory factor signaling and the COP9/signalosome. J. Biol. Chem. 2000, 275, 39081–39089. ALTINCICEK, B., TENBAUM, S. P., DRESSEL, U., THORMEYER, D., RENKAWITZ, R., BANIAHMAD, A. Interaction of the corepressor Alien with DAX-1 is abrogated by mutations of DAX-1 involved in adrenal hypoplasia congenita. J. Biol. Chem. 2000, 275, 7662–7667. UHLE, S., MEDALIA, O., WALDRON, R., DUMDEY, R., HENKLEIN, P., BECH-OTSCHIR, D., HUANG, X., BERSE, M., SPERLING, J., SCHADE, R., DUBIEL, W. Protein kinase CK2 and protein kinase D are associated with the COP9 signalosome. EMBO J. 2003, 22, 1302–1312. HONG, X., XU, L., LI, X., ZHAI, Z., SHU, H. CSN3 interacts with IKKgamma and inhibits TNF- but not IL-1-induced NF-kappaB activation. FEBS Lett. 2001, 499, 133–136. HOAREAU ALVES, K., BOCHARD, V., RETY, S., JALINOT, P. Association of the mammalian proto-oncoprotein Int-6 with the three protein complexes eIF3, COP9 signalosome and 26S proteasome. FEBS Lett. 2002, 527, 15–21. SUZUKI, G., YANAGAWA, Y., KWOK, S. F., MATSUI, M., DENG, X. W. Arabidopsis COP10 is a ubiquitin-conjugating enzyme variant that acts together with COP1 and the COP9 signalosome in repressing photomorphogenesis. Genes Dev. 2002, 16, 554–559. BECH-OTSCHIR, D., KRAFT, R., HUANG, X., HENKLEIN, P., KAPELARI, B., POLLMANN, C., DUBIEL, W. COP9 signalosome-specific phosphorylation targets p53 to degradation by the ubiquitin system. EMBO J. 2001, 20, 1630–1639. TOMODA, K., KUBOTA, Y., KATO, J. Degradation of the cyclin-dependent-kinase inhibitor p27Kip1 is instigated by Jab1. Nature 1999, 398, 160–165. LI, S., LIU, X., ASCOLI, M. p38JAB1 binds to the intracellular precursor of the
References
38
39
40
41
42
43
44
45
lutropin/choriogonadotropin receptor and promotes its degradation. J. Biol. Chem. 2000, 275, 13386–13393. WAN, M., CAO, X., WU, Y., BAI, S., WU, L., SHI, X., WANG, N. Jab1 antagonizes TGF-beta signaling by inducing Smad4 degradation. EMBO Rep. 2002, 3, 171–176. BAE, M. K., AHN, M. Y., JEONG, J. W., BAE, M. H., LEE, Y. M., BAE, S. K., PARK, J. W., KIM, K. R., KIM, K. W. Jab1 interacts directly with HIF-1alpha and regulates its stability. J. Biol. Chem. 2002, 277, 9–12. BECH-OTSCHIR, D., SEEGER, M., DUBIEL, W. The COP9 signalosome: at the interface between signal transduction and ubiquitin-dependent proteolysis. J. Cell Sci. 2002, 115, 467–473. BERSE, M., BOUNPHENG, M. A., HUANG, X., CHRISTY, B. A., POLLMANN, C., DUBIEL, W. Ubiquitin-dependent degradation of Id1 and Id3 is mediated by the COP9 signalosome. J. Mol. Biol. 2004, 343, 361–370. DECHEND, R., HIRANO, F., LEHMANN, K., HEISSMEYER, V., ANSIEAU, S., WULCZYN, F. G., SCHEIDEREIT, C., LEUTZ, A. The Bcl-3 oncoprotein acts as a bridging factor between NF-kappaB/Rel and nuclear coregulators. Oncogene 1999, 18, 3316–3323. CHAUCHEREAU, A., GEORGIAKAKI, M., PERRIN-WOLFF, M., MILGROM, E., LOOSFELT, H. JAB1 interacts with both the progesterone receptor and SRC-1. J. Biol. Chem. 2000, 275, 8540–8548. BIANCHI, E., DENTI, S., GRANATA, A., BOSSI, G., GEGINAT, J., VILLA, A., ROGGE, L., PARDI, R. Integrin LFA-1 interacts with the transcriptional co-activator JAB1 to modulate AP-1 activity. Nature 2000, 404, 617–621. KLEEMANN, R., HAUSSER, A., GEIGER, G., MISCHKE, R., BURGER-KENTISCHER, A., FLIEGER, O., JOHANNES, F. J., ROGER, T., CALANDRA, T., KAPURNIOTU, A., GRELL, M., FINKELMEIER, D., BRUNNER, H., BERNHAGEN, J. Intracellular action of the cytokine MIF to modulate AP-1 activity and the cell cycle through Jab1. Nature 2000, 408, 211–216.
46 GEMMILL, R. M., BEMIS, L. T., LEE, J. P., SOZEN, M. A., BARON, A., ZENG, C., ERICKSON, P. F., HOOPER, J. E., DRABKIN, H. A. The TRC8 hereditary kidney cancer gene suppresses growth and functions with VHL in a common pathway. Oncogene 2002, 21, 3507–3516. 47 LU, C., LI, Y., ZHAO, Y., XING, G., TANG, F., WANG, Q., SUN, Y., WEI, H., YANG, X., WU, C., CHEN, J., GUAN, K. L., ZHANG, C., CHEN, H., HE, F. Intracrine hepatopoietin potentiates AP-1 activity through JAB1 independent of MAPK pathway. FASEB J. 2002, 16, 90–92. 48 SMITH, P., LEUNG-CHIU, W. M., MONTGOMERY, R., ORSBORN, A., KUZNICKI, K., GRESSMAN-COBERLY, E., MUTAPCIC, L., BENNETT, K. The GLH proteins, Caenorhabditis elegans P granule components, associate with CSN-5 and KGB-1, proteins necessary for fertility, and with ZYX-1, a predicted cytoskeletal protein. Dev. Biol. 2002, 251, 333–347. 49 CABALLERO, O. L., RESTO, V., PATTURAJAN, M., MEERZAMAN, D., GUO, M. Z., ENGLES, J., YOCHEM, R., RATOVITSKI, E., SIDRANSKY, D., JEN, J. Interaction and colocalization of PGP9.5 with JAB1 and p27(Kip1). Oncogene 2002, 21, 3003–3010. 50 CHAMOVITZ, D. A., SEGAL, D. JAB1/CSN5 and the COP9 signalosome. A complex situation. EMBO Rep. 2001, 2, 96–101. 51 TOMODA, K., KUBOTA, Y., ARATA, Y., MORI, S., MAEDA, M., TANAKA, T., YOSHIDA, M., YONEDA-KATO, N., KATO, J. Y. The cytoplasmic shuttling and subsequent degradation of p27Kip1 mediated by Jab1/CSN5 and the COP9 signalosome complex. J. Biol. Chem. 2002, 277, 2302–2310. 52 MUNDT, K. E., PORTE, J., MURRAY, J. M., BRIKOS, C., CHRISTENSEN, P. U., CASPARI, T., HAGAN, I. M., MILLAR, J. B., SIMANIS, V., HOFMANN, K., CARR, A. M. The COP9/signalosome complex is conserved in fission yeast and has a role in S phase. Curr. Biol. 1999, 9, 1427–1430. 53 MAHALINGAM, S., AYYAVOO, V., PATEL, M., KIEBER-EMMONS, T., KAO, G. D., MUSCHEL, R. J., WEINER, D. B. HIV-1 Vpr
19
20 The COP9 Signalosome and Its Role in the Ubiquitin System
54
55
56
57
58
59
60
61
62
interacts with a human 34-kDa mov34 homologue, a cellular factor linked to the G2/M phase transition of the mammalian cell cycle. Proc. Natl. Acad. Sci. USA 1998, 95, 3419–3424. LYAPINA, S., COPE, G., SHEVCHENKO, A., SERINO, G., TSUGE, T., ZHOU, C., WOLF, D. A., WEI, N., DESHAIES, R. J. Promotion of NEDD-CUL1 conjugate cleavage by COP9 signalosome. Science 2001, 292, 1382–1385. SCHWECHHEIMER, C., SERINO, G., CALLIS, J., CROSBY, W. L., LYAPINA, S., DESHAIES, R. J., GRAY, W. M., ESTELLE, M., DENG, X. W. Interactions of the COP9 signalosome with the E3 ubiquitin ligase SCFTIRI in mediating auxin response. Science 2001, 292, 1379–1382. FU, H., REIS, N., LEE, Y., GLICKMAN, M. H., VIERSTRA, R. D. Subunit interaction maps for the regulatory particle of the 26S proteasome and the COP9 signalosome. EMBO J. 2001, 20, 7096–7107. WANG, Y., DEVEREUX, W., STEWART, T. M., CASERO, R. A., Jr. Polyamine-modulated factor 1 binds to the human homologue of the 7a subunit of the Arabidopsis COP9 signalosome: implications in gene expression. Biochem. J. 2002, 366, 79–86. HOFMANN, K., BUCHER, P. The PCI domain: a common theme in three multiprotein complexes. Trends Biochem. Sci. 1998, 23, 204–205. KIM, T., HOFMANN, K., VON ARNIM, A. G., CHAMOVITZ, D. A. PCI complexes: pretty complex interactions in diverse signaling pathways. Trends Plant Sci. 2001, 6, 379–386. COPE, G. A., SUH, G. S., ARAVIND, L., SCHWARZ, S. E., ZIPURSKY, S. L., KOONIN, E. V., DESHAIES, R. J. Role of predicted metalloprotease motif of Jab1/Csn5 in cleavage of Nedd8 from Cul1. Science 2002, 298, 608–611. ZHOU, C., SEIBERT, V., GEYER, R., RHEE, E., LYAPINA, S., COPE, G., DESHAIES, R. J., WOLF, D. A. The fission yeast COP9/signalosome is involved in cullin modification by ubiquitin-related Ned8p. BMC BioChem. 2001, 2, 7. VERMA, R., ARAVIND, L., OANIA, R., MCDONALD, W. H., YATES, J. R., 3rd,
63
64
65
66
67
68
69
70
KOONIN, E. V., DESHAIES, R. J. Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome. Science 2002, 298, 611–615. MAYTAL-KIVITY, V., REIS, N., HOFMANN, K., GLICKMAN, M. H. MPN+, a putative catalytic motif found in a subset of MPN domain proteins from eukaryotes and prokaryotes, is critical for Rpn11 function. BMC BioChem. 2002, 3, 28. TRAN, H. J., ALLEN, M. D., LOWE, J., BYCROFT, M. Structure of the Jab1/MPN domain and its implications for proteasome function. Biochemistry 2003, 42, 11460–11465. KAWAKAMI, T., CHIBA, T., SUZUKI, T., IWAI, K., YAMANAKA, K., MINATO, N., SUZUKI, H., SHIMBARA, N., HIDAKA, Y., OSAKA, F., OMATA, M., TANAKA, K. NEDD8 recruits E2-ubiquitin to SCF E3 ligase. EMBO J. 2001, 20, 4003–4012. PODUST, V. N., BROWNELL, J. E., GLADYSHEVA, T. B., LUO, R. S., WANG, C., COGGINS, M. B., PIERCE, J. W., LIGHTCAP, E. S., CHAU, V. A Nedd8 conjugation pathway is essential for proteolytic targeting of p27Kip1 by ubiquitination. Proc. Natl. Acad. Sci. USA 2000, 97, 4579–4584. READ, M. A., BROWNELL, J. E., GLADYSHEVA, T. B., HOTTELET, M., PARENT, L. A., COGGINS, M. B., PIERCE, J. W., PODUST, V. N., LUO, R. S., CHAU, V., PALOMBELLA, V. J. Nedd8 modification of cul-1 activates SCF(beta(TrCP))-dependent ubiquitination of IkappaBalpha. Mol. Cell. Biol. 2000, 20, 2326–2333. NAUMANN, M., BECH-OTSCHIR, D., HUANG, X., FERRELL, K., DUBIEL, W. COP9 signalosome-directed c-Jun activation/stabilization is independent of JNK. J. Biol. Chem. 1999, 274, 35297–35300. KARNIOL, B., MALEC, P., CHAMOVITZ, D. A. Arabidopsis FUSCA5 encodes a novel phosphoprotein that is a component of the COP9 complex. Plant Cell 1999, 11, 839–848. WILSON, M. P., SUN, Y., CAO, L., MAJERUS, P. W. Inositol 1,3,4-trisphosphate 5/6-kinase is a
References
71
72
73
74
75
76
77
78
79
protein kinase that phosphorylates the transcription factors c-Jun and ATF-2. J. Biol. Chem. 2001, 276, 40998–41004. ZHOU, C., WEE, S., RHEE, E., NAUMANN, M., DUBIEL, W., WOLF, D. A. Fission yeast COP9/signalosome suppresses cullin activity through recruitment of the deubiquitylating enzyme Ubp12p. Mol. Cell 2003, 11, 927–938. WOLF, D. A., ZHOU, C., WEE, S. The COP9 signalosome: an assembly and maintenance platform for cullin ubiquitin ligases? Nat. Cell Biol. 2003, 5, 1029–1033. GEYER, R., WEE, S., ANDERSON, S., YATES, J., WOLF, D. A. BTB/POZ domain proteins are putative substrate adaptors for cullin 3 ubiquitin ligases. Mol. Cell 2003, 12, 783–790. TYERS, M., JORGENSEN, P. Proteolysis and the cell cycle: with this RING I do thee destroy. Curr. Opin. Genet. Dev. 2000, 10, 54–64. XU, L., WEI, Y., REBOUL, J., VAGLIO, P., SHIN, T. H., VIDAL, M., ELLEDGE, S. J., HARPER, J. W. BTB proteins are substrate-specific adaptors in an SCF-like modular ubiquitin ligase containing CUL-3. Nature 2003, 425, 316–321. PINTARD, L., WILLIS, J. H., WILLEMS, A., JOHNSON, J. L., SRAYKO, M., KURZ, T., GLASER, S., MAINS, P. E., TYERS, M., BOWERMAN, B., PETER, M. The BTB protein MEL-26 is a substrate-specific adaptor of the CUL-3 ubiquitin-ligase. Nature 2003, 425, 311–316. WERTZ, I. E., O’ROURKE, K. M., ZHANG, Z., DORNAN, D., ARNOTT, D., DESHAIES, R. J., DIXIT, V. M. Human De-etiolated-1 regulates c-Jun by assembling a CUL4A ubiquitin ligase. Science 2004, 303, 1371–1374. YAHALOM, A., KIM, T. H., WINTER, E., KARNIOL, B., VON ARNIM, A. G., CHAMOVITZ, D. A. Arabidopsis eIF3e (INT-6) associates with both eIF3c and the COP9 signalosome subunit CSN7. J. Biol. Chem. 2001, 276, 334–340. MAYTAL-KIVITY, V., PICK, E., PIRAN, R., HOFMANN, K., GLICKMAN, M. H. The COP9 signalosome-like complex in S. cerevisiae and links to other PCI
80
81
82
83
84
85
86
87
88
complexes. Int. J. Biochem. Cell Biol. 2003, 35, 706–715. SHALEV, A., VALASEK, L., PISE-MASISON, C. A., RADONOVICH, M., PHAN, L., CLAYTON, J., HE, H., BRADY, J. N., HINNEBUSCH, A. G., ASANO, K. Saccharomyces cerevisiae protein Pci8p and human protein eIF3e/Int-6 interact with the eIF3 core complex by binding to cognate eIF3b subunits. J. Biol. Chem. 2001, 276, 34948–34957. DUNAND-SAUTHIER, I., WALKER, C., WILKINSON, C., GORDON, C., CRANE, R., NORBURY, C., HUMPHREY, T. Sum1, a component of the fission yeast eIF3 translation initiation complex, is rapidly relocalized during environmental stress and interacts with components of the 26S proteasome. Mol. Biol. Cell 2002, 13, 1626–1640. YEN, H. C., GORDON, C., CHANG, E. C. Schizosaccharomyces pombe Int6 and Ras homologs regulate cell division and mitotic fidelity via the proteasome. Cell 2003, 112, 207–217. VON ARNIM, A. G., CHAMOVITZ, D. A. Protein homeostasis: a degrading role for Int6/eIF3e. Curr. Biol. 2003, 13, R323–325. PENG, Z., SHEN, Y., FENG, S., WANG, X., CHITTETI, B. N., VIERSTRA, R. D., DENG, X. W. Evidence for a physical association of the COP9 signalosome, the proteasome, and specific SCF E3 ligases in vivo. Curr. Biol. 2003, 13, R504–505. LI, L., DENG, X. W. The COP9 signalosome: an alternative lid for the 26S proteasome? Trends Cell Biol. 2003, 13, 507–509. COPE, G. A., DESHAIES, R. J. COP9 signalosome: a multifunctional regulator of SCF and other cullin-based ubiquitin ligases. Cell 2003, 114, 663–671. ZHENG, J., YANG, X., HARRELL, J. M., RYZHIKOV, S., SHIM, E. H., LYKKE-ANDERSEN, K., WEI, N., SUN, H., KOBAYASHI, R., ZHANG, H. CAND1 binds to unneddylated CUL1 and regulates the formation of SCF ubiquitin E3 ligase complex. Mol. Cell 2002, 10, 1519–1526. XIE, Y., VARSHAVSKY, A. UFD4 lacking the proteasome-binding region catalyses ubiquitination but is impaired in
21
22 The COP9 Signalosome and Its Role in the Ubiquitin System
89
90
91
92
93
94
proteolysis. Nat Cell Biol. 2002, 4, 1003–1007. CORN, P. G., MCDONALD, E. R., 3rd, HERMAN, J. G., EL-DEIRY, W. S. Tat-binding protein-1, a component of the 26S proteasome, contributes to the E3 ubiquitin ligase function of the von Hippel-Lindau protein. Nat. Genet. 2003, 35, 229–237. DORONKIN, S., DJAGAEVA, I., BECKENDORF, S. K. CSN5/Jab1 mutations affect axis formation in the Drosophila oocyte by activating a meiotic checkpoint. Development 2002, 129, 5053–5064. HIGA, L. A., MIHAYLOV, I. S., BANKS, D. P., ZHENG, J., ZHANG, H. Radiation-mediated proteolysis of CDT1 by CUL4-ROC1 and CSN complexes constitutes a new checkpoint. Nat Cell Biol. 2003, 5, 1008–1015. WEE, S., HETFELD, B., DUBIEL, W., WOLF, D. A. Conservation of the COP9/signalosome in budding yeast. BMC Genet. 2002, 3, 15. ORON, E., MANNERVIK, M., RENCUS, S., HARARI-STEINBERG, O., NEUMAN-SILBERBERG, S., SEGAL, D., CHAMOVITZ, D. A. COP9 signalosome subunits 4 and 5 regulate multiple pleiotropic pathways in Drosophila melanogaster. Development 2002, 129, 4399–4409. SUH, G. S., POECK, B., CHOUARD, T., ORON, E., SEGAL, D., CHAMOVITZ, D. A., ZIPURSKY, S. L. Drosophila JAB1/CSN5
95
96
97
98
99
100
acts in photoreceptor cells to induce glial cells. Neuron 2002, 33, 35–46. MA, L., ZHAO, H., DENG, X. W. Analysis of the mutational effects of the COP/DET/FUS loci on genome expression profiles reveals their overlapping yet not identical roles in regulating Arabidopsis seedling development. Development 2003, 130, 969–981. SCHWECHHEIMER, C., SERINO, G., DENG, X. W. Multiple ubiquitin ligase-mediated processes require COP9 signalosome and AXR1 function. Plant Cell 2002, 14, 2553–2563. FENG, S., MA, L., WANG, X., XIE, D., DINESH-KUMAR, S. P., WEI, N., DENG, X. W. The COP9 signalosome interacts physically with SCF COI1 and modulates jasmonate responses. Plant Cell 2003, 15, 1083–1094. WANG, X., FENG, S., NAKAYAMA, N., CROSBY, W. L., IRISH, V., DENG, X. W., WEI, N. The COP9 signalosome interacts with SCF UFO and participates in Arabidopsis flower development. Plant Cell 2003, 15, 1071–1082. ARBISER, J. L., KLAUBER, N., ROHAN, R., van LEEUWEN, R., HUANG, M. T., FISHER, C., FLYNN, E., BYERS, H. R. Curcumin is an in vivo inhibitor of angiogenesis. Mol. Med. 1998, 4, 376–383. BENEZRA, R., RAFII, S., LYDEN, D. The Id proteins and angiogenesis. Oncogene 2001, 20, 8334–8341.
1
Molecular Chaperones and the Ubiquitin–Proteasome System Cam Patterson
University of North Carolina, Chapel Hill, USA
J¨org H¨ohfeld Rheinische Friedrich-Wilhelms-Universit¨at Bonn, Bonn, Germany
Originally published in: Protein Degradation, Volume 2. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31130-0
1 Introduction
The biological activity of a protein is defined by its unique three-dimensional structure. Attaining this structure, however, is a delicate process. A recent study suggests that up to 30% of all newly synthesized proteins never reach their native state [1]. As protein misfolding poses a major threat to cell function and viability, molecular mechanisms must have evolved to prevent the accumulation of misfolded proteins and thus aggregate formation. Two protective strategies appear to be followed. Molecular chaperones are employed to stabilize nonnative protein conformations and to promote folding to the native state whenever possible. Alternatively, misfolded proteins are removed by degradation, involving, for example, the ubiquitin–proteasome system. For a long time molecular chaperones and cellular degradation systems were therefore viewed as opposing forces. However, recent evidence suggests that certain chaperones (in particular members of the 70- and 90-kDa heat shock protein families) are able to cooperate with the ubiquitin–proteasome system. Protein fate thus appears to be determined by a tight interplay of cellular protein-folding and protein-degradation systems.
2 A Biomedical Perspective
The aggregation and accumulation of misfolded proteins is now recognized as a common characteristic of a number of degenerative disorders, many Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Molecular Chaperones and the Ubiquitin–Proteasome System
of which have neurological manifestations [2, 3]. These diseases include prionopathies, Alzheimer’s and Parkinson’s diseases, and polyglutamine expansion diseases such as Huntington’s disease and spinocerebellar ataxia. At the cellular level, these diseases are characterized by the accumulation of aberrant proteins either intracellularly or extracellularly in specific groups of cells that subsequently undergo death. The precise association between protein accumulation and cell death remains incompletely understood and may vary from disease to disease. In some cases, misfolded protein accumulations may themselves be toxic or exert spatial constraints on cells that affect their ability to function normally. In other cases, the sequestering of proteins in aggregates may itself be a protective mechanism, and it is the overwhelming of pathways that consolidate aberrant proteins that is the toxic event. In either case, lessons learned from genetically determined neurodegenerative diseases have helped us to understand the inciting events of protein aggregation that ultimately lead to degenerative diseases. Mutations resulting in neurodegenerative diseases fall into two broad classes. The first class comprises mutations that affect proteins, irrespective of their native function, and cause them to misfold. The classic example of this is Huntington’s disease [4, 5]. The protein encoded by the huntingtin gene contains a stretch of glutamine residues (or polyglutamine repeat), and the genomic DNA sequence that codes for this polyglutamine repeat is subject to misreading and expansion. When the length of the polyglutamine repeat in huntingtin reaches a critical threshold of approximately 35 residues, the protein becomes prone to misfolding and aggregation [6]. This appears to be the proximate cause of neurotoxicity in this invariably fatal disease [7, 8]. A number of other neurodegenerative diseases are caused by polyglutamine expansions [9, 10]. For example, spinocerebellar ataxia is caused by polyglutamine expansions in the protein ataxin-1 [11]. In other diseases, protein misfolding occurs due to other mutations that induce misfolding and aggregation; for example, mutations in superoxide dismutase-1 lead to aggregation and neurotoxicity in amyotrophic lateral sclerosis [12, 13]. Other mutations that result in neurodegenerative diseases are instructive in that they directly implicate the ubiquitin–proteasome system in the pathogenesis of these diseases [14]. For example, mutations in the gene encoding the protein parkin are associated with juvenile-onset Parkinson’s disease [15, 16]. Parkin is a RING finger–containing ubiquitin ligase, and mutations in this ubiquitin ligase cause accumulation of target proteins that ultimately result in the neurotoxicity and motor dysfunction associated with Parkinson’s disease [17–20]. Repressor screens of neurodegeneration phenotypes in animal models have also linked the molecular chaperone machinery to neurodegeneration [21–24]. Taken together, the pathophysiology of neurodegenerative diseases provides a compelling demonstration of the importance of the regulated metabolism of misfolded proteins and provides direct evidence of the role of both molecular chaperones and the ubiquitin–proteasome system in guarding against protein misfolding and its consequent toxicity.
3 Molecular Chaperones: Mode of Action and Cellular Functions 3
3 Molecular Chaperones: Mode of Action and Cellular Functions
Molecular chaperones are defined by their ability to bind and stabilize nonnative conformations of other proteins [25, 26]. Although they are an amazingly diverse group of conserved and ubiquitous proteins, they are also among the most abundant intracellular proteins. The classical function of chaperones is to facilitate protein folding, inhibit misfolding, and prevent aggregation. These folding events are regulated by interactions between chaperones and ancillary proteins, the co-chaperones, which in general assist in cycling unfolded substrate proteins on and off the active chaperone complex [25, 27, 28]. In agreement with their essential function under normal growth conditions, chaperones are ubiquitously expressed and are found in all cellular compartments of the eukaryotic cell (except for peroxisomes). In addition, cells greatly increase chaperone concentration as a response to diverse stresses, when proteins become unfolded and require protection and stabilization [29]. Accordingly, many chaperones are heat shock proteins (Hsps). Four main families of cytoplasmic chaperones can be distinguished: the Hsp70 family, the Hsp90 family, the small heat shock proteins, and the chaperonins. 3.1 The Hsp70 Family
The Hsp70 proteins bind to misfolded proteins promiscuously during translation or after stress-mediated protein damage [26, 30]. Members of this family are highly conserved throughout evolution and are found throughout the prokaryotic and eukaryotic phylogeny. It is common for a single cell to contain multiple homologues, even within a single cellular compartment; for example, mammalian cells express two inducible homologues (Hsp70.1 and Hsp70.3) and a constitutive homologue (Hsc70) in the cytoplasm. These homologues have overlapping but not totally redundant cellular functions. Members of this family are typically in the range of 70 kDa in size and contain three functional domains: an amino-terminal ATPase domain, a central peptide-binding cleft, and a carboxyl terminus that seems to form a lid over the peptide-binding cleft [28] (Figure 1). The chaperones recognize short segments of the protein substrate, which are composed of clusters of hydrophobic amino acids flanked by basic residues [31]. Such binding motifs occur frequently within protein sequences and are found exposed on nonnative proteins. In fact, mammalian Hsp70 binds to a wide range of nascent and newly synthesized proteins, comprising about 15–20% of total protein [32]. This percentage is most likely further increased under stress conditions. Hsp70 proteins apparently prevent protein aggregation and promote proper folding by shielding hydrophobic segments of the protein substrate. The hydrophobic segments are recognized by the central peptide-binding domain of Hsp70 proteins (Figure 1). The domain is composed of two sheets of β strands that together with connecting loops form a cleft to accommodate extended peptides of about seven amino acids in length, as revealed
4 Molecular Chaperones and the Ubiquitin–Proteasome System
Fig. 1 Schematic presentation of the domain architecture and chaperone cycle of Hsp70. Hsp70 proteins display a characteristic domain structure comprising an amino-terminal ATPase domain (ATP), a peptide-binding domain (P), and a carboxylterminal domain (C) that is supposed to form a lid over the peptide-binding domain. In the ATP-bound conformation, the binding
pocket is open, resulting in a low affinity for the binding of a chaperone substrate. ATP hydrolysis induces stable substrate binding through a closure of the peptide-binding pocket. Substrate release is induced upon nucleotide exchange. ATP hydrolysis and nucleotide exchange are regulated by diverse co-chaperones.
in crystallographic studies of bacterial Hsp70 [33]. In the obtained crystal structure, the adjacent carboxyl-terminal domain of Hsp70 folds back over the β sandwich, suggesting that the domain may function as a lid in permitting entry and release of protein substrates (Figure 1). According to this model, ATP binding and hydrolysis by the amino-terminal ATPase domain of Hsp70 induce conformational changes of the carboxyl terminus, which lead to lid opening and closure [28]. In the ATP-bound conformation of Hsp70, the peptide-binding pocket is open, resulting in rapid binding and release of the substrate and consequently in a low binding affinity (Figure 1). Stable holding of the protein substrate requires closing of the binding pocket, which is induced upon ATP hydrolysis and conversion of Hsp70 to the ADP-bound conformation. The dynamic association of Hsp70 with nonnative polypeptide substrates thus depends on ongoing cycles of ATP binding, hydrolysis, and nucleotide exchange. Importantly, ancillary co-chaperones are employed to regulate the ATPase cycle [27, 30]. Co-chaperones of the Hsp40 family (also termed J proteins due to their founding member bacterial DnaJ) stimulate the ATP hydrolysis step within the Hsp70 reaction cycle and in this way promote substrate binding [34] (Figure 1). In contrast, the carboxyl terminus of Hsp70-interacting protein CHIP attenuates ATP hydrolysis [35]. Similarly, nucleotide exchange on Hsp70 is under the control of stimulating and inhibiting co-chaperones. The Hsp70-interacting protein Hip slows down nucleotide exchange by stabilizing the ADP-bound conformation of the chaperone [36], whereas nucleotide exchange is stimulated by the co-chaperone BAG-1 (Bcl-2-associated athanogene 1), which assists substrate unloading from Hsp70 [37–39]. By altering the AT-Pase cycle, the co-chaperones directly modulate the folding activity of Hsp70. In addition to chaperone-recognition motifs, co-chaperones often possess other functional
3 Molecular Chaperones: Mode of Action and Cellular Functions 5
Fig. 2 Domain architecture of diverse cochaperones of Hsp70. DnaJ: domain related to the bacterial co-chaperone DnaJ; TPR: tetratricopeptide repeat; Sti1: domain related to the yeast co-chaperone Sti1; CC: coiled-coil domain; U box: E2-interacting domain present in certain ubiquitin ligases;
PG: polyglycine region; ARM: armadillo repeat; TRSEEX: repeat motif found at the amino terminus of BAG-1 isoforms; ubl: ubiquitin-like domain; BAG: Hsp70-binding domain present in BAG proteins; WW: protein interaction domain.
domains and therefore link chaperone activity to distinct cellular processes [27, 40] (Figure 2). Indeed, as discussed below, the co-chaperones BAG-1 and CHIP apparently modulate Hsp70 function during protein degradation. 3.2 The Hsp90 Family
The 90-kDa cytoplasmic chaperones are members of the Hsp90 family, and in mammals two isoforms exist: Hsp90α and Hsp90β. The Hsp70 and Hsp90 families exhibit several common features: both possess ATPase activity and are regulated by ATP binding and hydrolysis, and both are further regulated by ancillary co-chaperones [41–48]. Unlike Hsp70, however, cytoplasmic Hsp90 is not generally involved in the folding of newly synthesized polypeptide chains. Instead it plays a key role in the regulation of signal transduction networks, as most of the known substrates of Hsp90 are signaling proteins, the classical examples being steroid hormone receptors and signaling kinases. On a molecular level, Hsp90 binds
6 Molecular Chaperones and the Ubiquitin–Proteasome System
Fig. 3 Cooperation of Hsp70 and Hsp90 during the regulation of signal transduction pathways. The inactive signaling protein, e.g., a steroid hormone receptor, is initially recognized by Hsp40 and delivered to Hsp70. Subsequently, a multi-chaperone complex assembles that contains the Hsp70 co-chaperone Hip and the Hsp70/Hsp90organizing protein Hop. Hop stimulates recruitment of an Hsp90 dimer that accepts the substrate from Hsp70. At the final stage
of the chaperone pathway, Hsp90 associates with p23 and diverse cyclophilins (cycloph.) to mediate conformational changes of the signaling protein necessary to reach an activatable state. Upon activation, i.e., hormone binding in the case of the steroid receptor, the signaling protein is released from Hsp90. In the absence of an activating stimulus, the signaling protein folds back to the inactive state when released and enters a new cycle of chaperone binding.
to substrates at a late stage of the folding pathway, when the substrate is poised for activation by ligand binding or associations with other factors. Consequently, Hsp90 accepts partially folded conformations from Hsp70 for further processing. In the case of the chaperone-assisted activation of the glucocorticoid hormone receptor and also of the progesterone receptor, the sequence of events leading to attaining an active conformation is fairly well understood [49–53]. It appears that the receptors are initially recognized by Hsp40 and are then delivered to Hsp70 [54] (Figure 3). Subsequent transfer onto Hsp90 requires the Hsp70/Hsp90-organizing protein Hop, which possesses non-overlapping binding sites for Hsp70 and Hsp90 and therefore acts as a coupling factor between the two chaperones [55]. In conjunction with p23 and different cyclophilins, Hsp90 eventually mediates conformational changes that enable the receptor to reach a high-affinity state for ligand binding. On other signaling pathways Hsp90 serves as a scaffolding factor to permit interactions between kinases and their substrates, as is the case for Akt kinase and endothelial nitric oxide synthase [56]. Since many of the Hsp90 substrate proteins are involved in regulating cell proliferation and cell death, it is not surprising that the chaperone recently emerged as a drug target in tumor therapy [57–59]. The antibiotics
3 Molecular Chaperones: Mode of Action and Cellular Functions 7
Fig. 4 Alteration of chaperone action during signal transduction induced by Hsp90 inhibitors such as geldanamycin and radicicol. In the presence of the inhibitors the activation pathway is blocked, and signaling
proteins are targeted to the proteasome for degradation in a process that involves the co-chaperone CHIP and other E3 ubiquitin ligases that remain to be identified.
geldanamycin and radicicol specifically bind to Hsp90 in mammalian cells and inhibit the function of the chaperone by occupying its ATP-binding pocket [60–63]. Drugs based on these compounds are now being developed as anticancer agents, as they potentially inactivate multiple signaling pathways that drive carcinogenesis. Remarkably, drug-induced inhibition of Hsp90 blocks the chaperone-assisted activation of signaling proteins and leads to their rapid degradation via the ubiquitin–proteasome pathway [64–69] (Figure 4). Hsp90 inhibitors therefore have emerged as helpful tools to study chaperone- proteasome cooperation. 3.3 The Small Heat Shock Proteins
The precise functions of small heat shock proteins (sHsps) including Hsp27 and the eye-lens protein αB-crystallin are incompletely understood. However, they seem to play a major role in preventing protein aggregation under conditions of cellular stress [70–73]. All members investigated so far form large oligomeric complexes of spherical or cylindrical appearance [74, 75]. Complex formation is independent of ATP binding and hydrolysis, but appears to be regulated by temperature and phosphorylation. The structural analysis of wheat Hsp16.9 suggested that the oligomeric complex acts as a storage form rather than an enclosure for substrates, as the active chaperone appears to be a dimer [75]. In agreement with this notion, dissociation of
8 Molecular Chaperones and the Ubiquitin–Proteasome System
the oligomeric complex formed by yeast Hsp26 was found to be a prerequisite for efficient chaperone activity [76]. Subsequent refolding may occur spontaneously or may involve cooperation with other chaperones such as Hsp70 [77]. 3.4 Chaperonins
The chaperone proteins best understood with regard to their mode of action are certainly the so-called chaperonins, which are defined by a barrel-shaped, doublering structure [25, 28]. Members include bacterial GroEL, Hsp60 of mitochondria and chloroplasts, and the TriC–CCT complex localized in the eukaryotic cytoplasm. Based on their characteristic ring structure, a central cavity is formed, which accommodates nonnative proteins via hydrophobic interactions. Conformational changes of the chaperonin subunits induced through ATP hydrolysis change the inner lining of the cavity from a hydrophobic to a hydrophilic character [78–80]. As a consequence the unfolded polypeptide is released into the central chamber and can proceed on its folding pathway in a protected environment [81]. The chaperonins are therefore capable of folding proteins such as actin that cannot be properly folded via other mechanisms [82].
4 Chaperones: Central Players During Protein Quality Control
Due to their ability to recognize nonnative conformations of other proteins, molecular chaperones are of central importance during protein quality control. This was elegantly revealed in studies on the influence of the Hsp70 chaperone system on polyglutamine diseases using the fruit fly Drosophila melanogaster as a model organism (reviewed in Refs. [23] and [83]). Hallmarks of the polyglutamine disease spinocerebellar ataxia type 3 (SCA3), for example, were recapitulated in transgenic flies that expressed a pathological polyQ tract of the ataxin-3 protein in the eye disc [84]. Transgene expression caused formation of abnormal protein inclusions and progressive neuronal degeneration. Intriguingly, co-expression of human cytoplasmic Hsp70 suppressed polyQ-induced neurotoxicity. In a similar experimental approach, Hsp40 family members protected neuronal cells against toxic polyQ expression [22]. Enhancing the activity of the Hsp70/Hsp40 chaperone system apparently mitigates cytotoxicity caused by the accumulation of aggregationprone proteins. These findings obtained in Drosophila were confirmed in a mouse model of spinocerebellar ataxia type 1 (SCA1) [85, 86]. Unexpectedly, however, the Hsp70 chaperone system was unable to prevent the formation of protein aggregates in these models of polyglutamine diseases and upon polyQ expression in yeast and mammalian cells [84, 85, 87–89]. Elevating the cellular levels of Hsp70 and of some Hsp40 family members affected the number of protein aggregates and their biochemical properties, but did not inhibit the formation of polyQ aggregates. Notably, Hsp70 and Hsp40 profoundly modulated the aggregation process of
5 Chaperones and Protein Degradation
polyQ tracts in biochemical experiments; this led to the formation of amorphous, SDS-soluble aggregates, instead of the ordered, SDS-insoluble amyloid fibrils that form in the absence of the chaperone system [88]. These biochemical data were confirmed in yeast and mammalian cells [88, 90]. Although unable to prevent the formation of protein aggregates, the Hsp70 chaperone system apparently prevents the ordered oligomerization and fibril growth that is characteristic of the disease process. In an alternate but not mutually exclusive model to explain their protective role, the chaperones may cover potentially dangerous surfaces exposed by polyQcontaining proteins during the oligomerization process or by the final oligomers. Intriguingly, elevated expression of Hsp70 also suppresses the toxicity of the nonpolyQ-containing protein α-synuclein in a Drosophila model of Parkinson’s disease without inhibiting aggregate formation [24]. Hsp70 may thus exert a rather general function in protecting cells against toxic protein aggregation. This raises the exciting possibility that treatment of diverse forms of human neurodegenerative diseases may be achieved through upregulation of Hsp70 activity. The mentioned examples illustrate that one does not have to evoke the refolding of an aberrant protein to the native state in order to explain the protective activity of Hsp70 observed in models of amyloid diseases. In some cases it might be sufficient for Hsp70 to modulate the aggregation process or to shield interaction surfaces of the misfolded protein to decrease cytotoxic effects. Another option may involve presentation of the misfolded protein to the ubiquitin–proteasome system for degradation.
5 Chaperones and Protein Degradation
Hsp70 and Hsp90 family members as well as small heat shock proteins have all been implicated to participate in protein degradation. For example, the small heat shock protein Hsp27 was recently shown to stimulate the degradation of phosphorylated IκBα via the ubiquitin–proteasome pathway, which may account for the antiapoptotic function of Hsp27 [91]. Similarly, Hsp27 facilitates the proteasomal degradation of phosphorylated tau, a microtubule-binding protein and component of protein deposits in Alzheimer’s disease [92]. Hsp70 participates in the degradation of apolipoprotein B100 (apoB), which is essential for the assembly and secretion of very low-density lipoproteins from the liver [93]. Under conditions of limited availability of core lipids, apoB translocation across the ER membrane is attenuated, resulting in the exposure of some domains of the protein into the cytoplasm and their recognition by Hsp70. This is followed by the degradation of apoB via the ubiquitin–proteasome pathway. Elevating cellular Hsp70 levels stimulated the degradation of the membrane protein, suggesting that the chaperone facilitates sorting to the proteasome. Genetic studies in yeast indicate that cytoplasmic Hsp70 may fulfill a rather general role in the degradation of ER-membrane proteins that display large domains into the cytoplasm [94]. In agreement with this notion, Hsp70 also takes part in the degradation of immaturely glycosylated and aberrantly
9
10 Molecular Chaperones and the Ubiquitin–Proteasome System
folded forms of the cystic fibrosis transmembrane conductance regulator (CFTR) [95–98]. CFTR is an ion channel localized at the apical surface of epithelial cells. Its functional absence causes cystic fibrosis, the most common fatal genetic disease in Caucasians [99, 100]. The disease-causing allele, F508, which is expressed in more than 70% of all patients, drastically interferes with the protein’s ability to fold, essentially barring it from functional expression in the plasma membrane. However, wild-type CFTR also folds very inefficiently, and less than 30% of the protein reaches the plasma membrane [99]. While trafficking from the endoplasmic reticulum (ER) to the Golgi apparatus, immature forms of CFTR are recognized by quality-control systems and are eventually directed to the proteasome for degradation [101–104]. A critical step during CFTR biogenesis is the inefficient folding of the first of two cytoplasmically exposed nucleotide-binding domains (NBD1) of the membrane protein [105, 106]. The disease-causing F508 mutation localizes to NBD1 and further decreases the folding propensity of this domain. During the co-translational insertion of CFTR into the ER membrane, cytoplasmic Hsp70 and its co-chaperone Hdj-2 bind to NBD1 and facilitate intramolecular interactions between the domain and another cytoplasmic region of CFTR, the regulatory R-domain [96, 107]. However, Hsp70 is also able to present CFTR to the ubiquitin–proteasome system [97], and heterologous expression of CFTR in yeast revealed an essential role of cytoplasmic Hsp70 in CFTR turnover [98]. Hsp70 is thus a key player in the cellular surveillance system that monitors the folded state of CFTR at the ER membrane. Interestingly, CFTR and the disease form F508 are deposited in distinct pericentriolar structures, termed aggresomes, upon overexpression or proteasome inhibition [108]. Subsequent studies established that aggresomes are induced upon ectopic expression of many different aggregation-prone proteins (reviewed in Refs. [109] and [110]). Aggresomes form near the microtubule-organizing center in a manner dependent on the microtubule-associated motor protein dynein, and are surrounded by a “cage” of filamentous vimentin [108, 111]. Aggresome formation is apparently a specific and active cellular response when production of misfolded proteins exceeds the capacity of the ubiquitin–proteasome system to tag and remove these proteins. They likely serve to protect the cell from toxic “gain-of-function” activities acquired by misfolded proteins. Aggresomes are also of clinical relevance as they share remarkable biochemical and structural features, for example, with Lewy bodies, the cytoplasmic inclusion bodies found in neurons affected by Parkinson’s disease [112]. The pathways that regulate aggresome assembly are only now being explicated. Histone deacetylase 6 (HDAC6) appears to be a key regulator of aggresome assembly [113]. HDAC6 is a microtubule-associated deacetylase that has the capacity to bind both multi-ubiquitinated proteins and dynein motors and is believed to recruit misfolded proteins to the pericentriolar region for aggresome assembly. Deletion of HDAC6 prevents aggresome formation and sensitizes cells to the toxic effects of misfolded proteins, which supports the hypothesis that aggresomes sequester misfolded proteins to protect against their toxic activities. Components of the ubiquitin–proteasome system and chaperones such as Hsp70 are abundantly present in and are actively recruited to aggresomes [114–116]. Furthermore, elevating cellular Hsp70 levels can reduce aggresome formation by
5 Chaperones and Protein Degradation
stimulating proteasomal degradation [117]. It appears that these subcellular structures are major sites of chaperone – proteasome cooperation to mediate the metabolism of misfolded proteins. The formation of aggresome-like structures is also observed in dendritic cells that present foreign antigens to other immune cells [118]. Immature dendritic cells are located in tissues throughout the body, including skin and gut. When they encounter invading microbes, the pathogens are endocytosed and processed in a manner that involves the generation of antigenic peptides by the ubiquitin–proteasome system. Upon induction of dendritic cell maturation, ubiquitinated proteins transiently accumulate in large cytosolic structures that resemble aggresomes and were therefore termed DALIS (dendritic cell aggresome-like induced structures). It was speculated that DALIS formation may enable dendritic cells to regulate antigen processing and presentation. DALIS contain components of the ubiquitin–proteasome machinery as well as Hsp70 and the co-chaperone CHIP [118, 119]. Again, an interplay of molecular chaperones and the ubiquitin–proteasome system during regulated protein turnover is suggested. The cellular function of molecular chaperones is apparently not restricted to mediating protein folding; instead, chaperones emerge also as vital components on protein-degradation pathways. Remarkably, the balance between folding and degradation activities of chaperones can be manipulated. In cells treated with Hsp90 inhibitors, for example, with geldanamycin (see above), the chaperone-assisted activation of signaling proteins is abrogated and chaperone substrates such as the protein kinases Raf-1 and ErbB2 are rapidly degraded by the ubiquitin–proteasome system [64–69, 120]. This appears to be due, in part, to transfer of the substrates back to Hsp70 and progression toward the ubiquitin-dependent degradation pathway. Substrate interactions with chaperones – and consequently their commitment either toward the folding pathway or to their degradation via the ubiquitin– proteasome machinery – apparently serve as an essential post-translational protein quality-control mechanism within eukaryotic cells. The partitioning of proteins to either one of these mutually exclusive pathways is referred to as “protein triage” [121]. Although some misfolded proteins may be directly recognized by the proteasome [122], specific pathways within the ubiquitin–proteasome system are probably relied on for the degradation of most misfolded and damaged proteins. For example, E2 enzymes of the Ubc4/5 family selectively mediate the ubiquitylation of abnormal proteins as revealed in genetic studies in Saccharomyces cerevisiae [123]. It is well accepted that chaperones play a central role in the triage decision; however, less well understood are the events that lead to the cessation of efforts to fold a substrate, and the diversion of the substrate to the terminal degradative pathway. It is possible that chaperones and components of the ubiquitin–proteasome pathway exist in a state of competition for these substrates and that repeated cycling of a substrate on and off a chaperone maintains the substrate in a soluble state and increases, in a stochastic fashion, its likelihood of interactions with the ubiquitin machinery (Figure 5A). However, some data argue for a more direct role of the chaperones in the degradation process. Hsp70 plays an active and necessary role in the ubiquitylation of some substrates [124]; this activity of Hsp70
11
12 Molecular Chaperones and the Ubiquitin–Proteasome System
Fig. 5 Interplay of molecular chaperones with the ubiquitin–proteasome system. (A) Chaperones and the degradation machinery (i.e., ubiquitylation systems) compete with each other in the recognition of folding intermediates. Interaction with the chaperones directs the substrate towards folding. However, when the protein substrate is unable to attain a folded conformation, the chaperones maintain the folding intermediate in a soluble state that can be recog-
nized by the degradation machinery. (B) The chaperones are actively involved in protein degradation. Through an association with certain components of the ubiquitin conjugation machinery (degrading partner), the chaperones participate in the targeting of protein substrates to the proteasome. A competition between degrading partners and folding partners determines chaperone action and the fate of the protein substrate.
requires its chaperone function, indicating that conformational changes within substrates may facilitate recognition by the ubiquitylation machinery. Plausible hypotheses to explain these observations include direct associations between the chaperone and ubiquitin–proteasome machinery to facilitate transfer of a substrate from one pathway to the other, or conversion of the chaperone itself to a ubiquitylation complex (Figure 5B). It is also entirely possible that several quality-control pathways may exist and that the endogenous triage decision may involve aspects of each of these hypotheses.
6 The CHIP Ubiquitin Ligase: A Link Between Folding and Degradation Systems
Major insights into molecular mechanisms that underlie the cooperation of molecular chaperones with the ubiquitin–proteasome system were obtained through the functional characterization of the co-chaperone CHIP (reviewed in Ref. [40]). CHIP was initially identified in a screen for proteins containing tetratricopeptide repeat (TPR) domains, which are found in several co-chaperones – including Hip, Hop, and the cyclophilins – as chaperone-binding domains [27, 55] (Figure 2). CHIP
6 The CHIP Ubiquitin Ligase: A Link Between Folding and Degradation Systems 13
contains three TPR domains at its amino terminus, which are used for binding to Hsp70 and Hsp90 [35, 125]. Besides the TPR domains, CHIP possesses a Ubox domain at its carboxyl terminus [35] (Figure 2). U-box domains are similar to RING finger domains, but they lack the metal-chelating residues and instead are structured by intramolecular interactions [126]. The predicted structural similarity suggests that U boxes, like RING fingers, may also play a role in targeting proteins for ubiquitylation and subsequent proteasome-dependent degradation, and this possibility is borne out in functional analyses of U box–containing proteins [127, 128]. The TPR and U-box domains in CHIP are separated by a central domain rich in charged residues. The charged domain of CHIP is necessary for TPRdependent interactions with Hsp70 [35] and is also required for homodimerization of CHIP [129]. The tissue distribution of CHIP supports the notion that it participates in protein folding and degradation decisions, as it is most highly expressed in tissues with high metabolic activity and protein turnover: skeletal muscle, heart, and brain. Although it is also present in all other organs, including pancreas, lung, liver, placenta, and kidney, the expression levels are much lower. CHIP is also detectable in most cultured cells, and is particularly abundant in muscle and neuronal cells and in tumor-derived cell lines [35]. Intracellularly, CHIP is primarily localized to the cytoplasm under quiescent conditions [35], although a fraction of CHIP is present in the nucleus [97]. In addition, cytoplasmic CHIP traffics into the nucleus in response to environmental challenge in cultured cells, which may serve as a protective mechanism or to regulate transcriptional responses in the setting of stress [130]. CHIP is distinguished among co-chaperones in that it is a bona fide interaction partner with both of the major cytoplasmic chaperones Hsp90 and Hsp70, based on their interactions with CHIP in the yeast two-hybrid system and in vivo binding assays [35, 125]. CHIP interacts with the terminal-terminal EEVD motifs of Hsp70 and Hsp90, similar to other TPR domain–containing co-chaperones such as Hop [55, 131, 132]. When bound to Hsp70, CHIP inhibits ATP hydrolysis and therefore attenuates substrate binding and refolding, resulting in inhibition of the “forward” Hsp70 substrate folding/refolding pathway, at least in in vitro assays [35]. The cellular consequences of this “anti-chaperone” function are not yet clear, and in fact CHIP may actually facilitate protein folding under conditions of stress, perhaps by slowing the Hsc70 reaction cycle [130, 133]. CHIP interacts with Hsp90 with approximately equivalent affinity to its interaction with Hsp70 [125]. This interaction results in remodeling of Hsp90 chaperone complexes, such that the co-chaperone p23, which is required for the appropriate activation of many, if not all, Hsp90 client proteins, is excluded. The mechanism for this activity is unclear – p23 and CHIP bind Hsp90 through different sites – yet the consequence of this action is predictable: CHIP should inhibit the function of proteins that require Hsp90 for conformational activation. The glucocorticoid receptor is an Hsp90 client that undergoes activation through a well-described sequence of events that depend on interactions of the glucocorticoid receptor with Hsp90 and various Hsp90 co-chaperones, including p23, making it an excellent model to test this prediction. Indeed, CHIP inhibits glucocorticoid receptor substrate binding and
14 Molecular Chaperones and the Ubiquitin–Proteasome System
steroid-dependent transactivation ability [125]. Surprisingly, this effect of CHIP is accompanied by decreased steady-state levels of glucocorticoid receptor, and CHIP induces ubiquitylation of the glucocorticoid receptor in vivo and in vitro, as well as subsequent proteasome-dependent degradation. This effect is both U-box- and TPR-domain-dependent, suggesting that CHIP’s effects on GR require direct interaction with Hsp90 and direct ubiquitylation of GR and delivery to the proteasome. These observations are not limited to the glucocorticoid receptor. ErbB2, another Hsp90 client, is also degraded by CHIP in a proteasome-dependent fashion [120]. Nor are they limited to Hsp90 clients. For example, CHIP cooperates with Hsp70 during the degradation of immature forms of the CFTR protein at the ER membrane and during the ubiquitylation of phosphorylated forms of the microtubule-binding protein tau, which is of clinical importance due to its role in the pathology of Alzheimer’s disease [97, 134]. The effects of CHIP are dependent on both the TPR domain, indicating a necessity for interactions with molecular chaperones, and the U box, which suggests that the U box is most likely the “business end” with respect to ubiquitylation. The means by which CHIP-dependent ubiquitylation occurs is not clear. In the case of ErbB2, ubiquitylation depends on a transfer of the client protein from Hsp90 to Hsp70 [120], indicating that the final ubiquitylation complex consists of CHIP, Hsp70 (but not Hsp90), and the client protein. In any event, the studies are consistent in supporting a role for CHIP as a key component of the chaperone-dependent quality-control mechanism. CHIP efficiently targets client proteins, particularly when they are partially unfolded (as is the case for most Hsp90 clients when bound to the chaperone) or frankly misfolded (as is the case for most proteins binding to Hsp70 through exposed hydrophobic residues). Once the ubiquitylation activity of CHIP was recognized, it was logical to speculate that its U box might function in a manner analogous to that of RING fingers, which have recently been appreciated as key components of the largest family of ubiquitin ligases. If CHIP is a ubiquitin ligase, then its ability to ubiquitylate a substrate should be reconstituted in vitro when a substrate is added in the presence of CHIP, E1, an E2, and ubiquitin. Indeed, this is the case [135–137] (Figure 6). CHIP is thus the first described chaperone-associated E3 ligase. The ubiquitin ligase activity of CHIP depends on functional and physical interactions with a specific family of E2 enzymes, the Ubc4/Ubc5 family, which in humans comprises the E2s UbcH5a, UbcH5b, and UbcH5c. Of interest is the fact that the Ubc4/Ubc5 E2s are stress-activated, ubiquitin-conjugating enzymes [123]. CHIP can therefore be seen as a co-chaperone that, in addition to inhibiting traditional chaperone activity, converts chaperone complexes into chaperone-dependent ubiquitin ligases. Indeed, the chaperones themselves seem to act as the main substrate-recognition components of these ubiquitin ligase complexes, as efficient ubiquitylation of chaperone substrates by CHIP depends on the presence of Hsp70 or Hsp90 in reconstituted systems [136, 137] (Figure 6). The chaperones apparently function in a manner analogous to F-box proteins, which are required as substrate recognition modules in many RING finger–containing ubiquitin ligase complexes [138–140]. Recently, another surprising function for CHIP has been identified, that of activation of the stress-responsive transcription factor heat shock factor-1 (HSF1)
7 Other Proteins That May Influence the Balance
Fig. 6 Characterization of CHIP as a chaperone-associated ubiquitin ligase. Purified CHIP, UbcH5b, the ubiquitinactivating enzyme E1, ubiquitin, and the Hsp70–Hsp40 chaperone system were incubated with the bacterially expressed protein kinase Raf-1 (for details, see Ref. [137]). Raf-1 and ubiquitylated forms of the kinase
(ub(n) -Raf-1) were detected by immunoblotting using a specific anti-Raf-1 antibody. Efficient ubiquitylation of Raf-1 through the CHIP conjugation machinery depends on the recognition of the chaperone substrate by Hsp70, which presents the kinase to the conjugation machinery.
[130]. Through this association, CHIP regulates the expression of chaperones such as Hsp70 independently of its ability to modify their function through direct interactions. The mechanisms through which CHIP activates HSF1 are not entirely clear, but they are dependent in part on the induction of HSF1 trimerization, which is required for nuclear import and DNA binding. In addition, activation of HSF1 by CHIP seems to be independent of CHIP’s ubiquitin ligase activity. The consequences of this activation are important for the response to stress, in that cells lacking CHIP are prone to stress-dependent apoptosis and mice deficient in CHIP (through homologous recombination) succumb rapidly to thermal challenge. These data indicate that CHIP plays a heretofore unsuspected role in coordinating the response to stress, not only by serving as a rate-limiting step in the degradation of damaged proteins but also by increasing the buffering capacity of the chaperone system to guard against stress-dependent proteotoxicity.
7 Other Proteins That May Influence the Balance Between Chaperone-assisted Folding and Degradation
CHIP is ideally suited to mediate chaperone-proteasome cooperation, as it combines a chaperone-binding motif and a domain that functions in ubiquitindependent degradation within its protein structure (Figure 2). Some other cochaperones display a similar structural arrangement [40]. For example, BAG-1 contacts Hsp70 through a BAG-domain located at its carboxyl terminus and, in
15
16 Molecular Chaperones and the Ubiquitin–Proteasome System
Fig. 7 Schematic presentation of the BAG1–Hsp70–CHIP complex. BAG-1 associates with the ATPase domain of Hsp70, while CHIP is bound to the carboxyl terminus. BAG-1 mediates an association of Hsp70 with the proteasome via its ubiquitin-like
domain (ubl), whereas CHIP acts in conjunction with Ubc4/5 as a chaperoneassociated ubiquitin ligase to mediate the attachment of a polyubiquitin chain to the chaperone substrate.
addition, possesses a central ubiquitin-like domain that is used for binding to the proteasome [141] (Figure 2). The co-chaperone thus belongs to a family of ubiquitin domain proteins (UDPs), many of which were shown to be associated with the proteasome [142]. This domain architecture enables BAG-1 to provide a physical link between Hsp70 and the proteasome, and elevating the cellular levels of BAG-1 results in a recruitment of the chaperone to the proteolytic complex. Notably, BAG-1 and CHIP occupy distinct domains on Hsp70 (Figure 7). Whereas BAG-1 associates with the amino-terminal ATPase domain, CHIP binds to the carboxylterminal EEVD motif of Hsp70 [35, 37]. Ternary complexes that comprise both co-chaperones associated with Hsp70 can be isolated from mammalian cells, suggesting a cooperation of BAG-1 and CHIP in the regulation of Hsp70 activity on certain degradation pathways. In fact, BAG-1 is able to stimulate the CHIPinduced degradation of the glucocorticoid hormone receptor [137]. A cooperation of diverse co-chaperones apparently provides additional levels of regulation to alter chaperone-assisted folding and degradation pathways. Interestingly, BAG-1 and also Hsp70 and Hsp90 are themselves substrates of the CHIP ubiquitin ligase [135, 143] (J.H. unpublished). Yet, CHIP-mediated ubiquitylation of the chaperones and the co-chaperone does not induce their proteasomal degradation. Instead, it seems to provide additional means to regulate the association of the chaperone systems with the proteasome. In the case of BAG-1, ubiquitylation mediated by CHIP indeed stimulates the binding of the co-chaperone to the proteasome [143]. It remains to be elucidated, however, why Hsp70 and BAG-1 are not degraded when sorted to the proteasome through CHIP-induced ubiquitylation, in contrast to chaperone substrates such as the glucocorticoid hormone receptor. Possibly, the folded state of the proteins may serve to distinguish targeting factors and substrates doomed for degradation. Efficient ubiquitylation of BAG-1 mediated by CHIP is dependent on the formation of the ternary BAG-1–Hsp70–CHIP complex [143]. The formed chaperone complex would thus expose multiple signals for sorting to the proteasome, e.g., the integrated ubiquitin-like domain of BAG-1 and polyubiquitin chains attached to BAG-1, Hsp70, and the bound protein substrate. Such a redundancy of sorting information might be considered unnecessary. Intriguingly, however, several
7 Other Proteins That May Influence the Balance
subunits of the regulatory 19S particle of the proteasome are currently thought to act as receptors for polyubiquitin chains and integrated ubiquitin-like domains, including Rpn1, Rpn2, Rpt5, and Rpn10. The Rpn10 subunit was initially identified as a polyubiquitin chain receptor and was later shown to also bind integrated ubiquitin-like domains presented by UDPs [144–146]. Rpn10 possesses two distinct ubiquitin-binding domains, of which only one is used for UDP recognition [145–147]. However, conflicting data exist as to whether the subunit acts as a ubiquitin receptor in the context of the assembled 19S complex [148, 149]. More recently, Rpn1 was identified as a receptor for integrated ubiquitin-like domains [149], and a similar function may be fulfilled by the Rpn1-related subunit Rpn2 [150]. Polyubiquitin chains seem to be recognized by the Rpt5 subunit, one of the AAA ATPases present in the ring-like base of the regulatory 19S complex [151]. Its receptor function was revealed when tetraubiquitin was cross-linked to intact proteasomes [148]. Multiple docking sites for ubiquitin-like domains and polyubiquitin chains are apparently displayed by the regulatory particle of the proteasome. This may provide a structural basis for the recognition of multiple sorting signals exposed by the CHIP–chaperone complex (Figure 8). A similar mechanism involving multiple-site binding at the proteasome was recently proposed based on the
Fig. 8 The co-chaperone network that determines folding and degradation activities of Hsp70. BAG-1 and CHIP associate with Hsp70 to induce the proteasomal degradation of a Hsp70-bound protein substrate. When BAG-1 is displaced by binding of HspBP1 to the ATPase domain of Hsp70, the ubiquitin ligase activity of CHIP is attenuated in the formed complex, enabling
CHIP to modulate Hsp70 activity without inducing degradation. The ATPase domain can also be occupied by Hip, which stimulates the chaperone activity of Hsp70 and participates in the Hsp70/Hsp90-mediated regulation of signal transduction pathways. At the same time, Hop displaces CHIP from the carboxyl terminus of Hsp70 and recruits Hsp90 to the chaperone complex.
17
18 Molecular Chaperones and the Ubiquitin–Proteasome System
observation that two unrelated yeast ubiquitin ligases associate with specific subunits of the 19S regulatory complex [152]. In these cases substrate delivery involves interactions of proteasomal subunits with the substrate-bound ubiquitin ligase, with the polyubiquitin chain attached to the substrate, and with the substrate itself. Multiple-site binding may function to slow down dissociation of the substrate from the proteasome and to facilitate transfer into the central proteolytic chamber through ATP-dependent movements of the subunits of the 19S particle. Human cells contain several BAG-1-related proteins: BAG-2, BAG-3 (CAIR-1; Bis), BAG-4 (SODD), BAG-5, and BAG-6 (Scythe, BAT3) [153] (Figure 2). It appears that BAG proteins act as nucleotide-exchange factors to induce substrate unloading from Hsp70 on diverse protein folding, assembly, and degradation pathways. Notably, BAG-6 is another likely candidate for a co-chaperone that regulates protein degradation via the ubiquitin–proteasome pathway. Similar to BAG-1, BAG-6 contains a ubiquitin-like domain that is possibly used for proteasome binding [154]. However, experimental data verifying a role of BAG-6 in protein degradation remain elusive so far. The cooperation of diverse co-chaperones not only may allow promotion of chaperone-associated degradation but also may provide the means to confine the destructive activity of CHIP. The Hsp70-binding protein 1 (HspBP1) seems to fulfill such a regulatory function [155]. HspBP1 was initially identified in a screen for proteins that associate with the ATPase domain of Hsp70 and was shown to stimulate nucleotide release from the chaperone [156, 157]. Notably, association of HspBP1 with the ATPase domain blocks binding of BAG-1 to Hsp70 and at the same time promotes an interaction of CHIP with Hsp70’s carboxyl terminus. In the formed ternary HspBP1–Hsp70–CHIP complex, the ubiquitin ligase activity of CHIP is attenuated and Hsp70 as well as a chaperone substrate are no longer efficiently ubiquitylated [155]. By interfering with CHIP-mediated ubiquitylation, HspBP1 stimulates the maturation of CFTR and promotes the sorting of the membrane protein to the cell surface. HspBP1 apparently functions as an antagonist of the CHIP ubiquitin ligase to regulate Hsp70-assisted folding and degradation pathways (Figure 8). The HspBP1-mediated inhibition of the ubiquitin ligase activity may enable CHIP to modulate the Hsp70 ATPase cycle without inducing degradation. In fact, degradation-independent functions of CHIP have recently emerged [130, 133, 158, 159]. CHIP was shown to regulate the chaperone-assisted folding and sorting of the androgen receptor and of endothelial nitric oxide synthase without inducing degradation [158, 159]. Moreover, CHIP fulfills an essential role in the chaperone-mediated regulation of the heat shock transcription factor, independent of its degradation-inducing activity [130]. It remains to be seen, however, whether HspBP1 cooperates with CHIP in these instances, as HspBP1 displayed a certain specificity with regard to chaperone substrates. The co-chaperone interfered with the degradation of CFTR, but did not influence the CHIP-mediated turnover of the glucocorticoid hormone receptor. Such a client specificity may arise in part from the fact that HspBP1 inhibits the ubiquitin ligase activity of CHIP in a complex with Hsc70, but leaves Hsp90-associated
8 Further Considerations
ubiquitylation unaffected [155]. In addition, direct interactions between HspBP1 and a subset of chaperone substrates may contribute to substrate selection. In any case, the cooperation of CHIP with other co-chaperones apparently provides a means to regulate chaperone-assisted protein degradation. It is likely that there are multiple degradation pathways for misfolded proteins in the eukaryotic cytoplasm. Although CHIP participates in the degradation of chaperone substrates induced by applying Hsp90 inhibitors to cell cultures (see above), drug-induced degradation is not abrogated in cells that lack the CHIP ubiquitin ligase [120]. Furthermore, CHIP cooperates with Hsp70 in the ER-associated degradation of CFTR, but the Hsp70-assisted degradation of apoB at the cytoplasmic face of the ER membrane does not involve CHIP [97]. Taken together, these data strongly argue for the existence of other, yet to be identified, ubiquitin ligases that are able to target chaperone substrates to the proteasome. A likely candidate in this regard is Parkin, a RING finger ubiquitin ligase, whose activity is impaired in juvenile forms of Parkinson’s disease [17]. Hsp70 and CHIP were found to be associated with Parkin in neuronal cells, suggesting an involvement of Parkin in the proteasomal degradation of chaperone substrates [160]. Interestingly, α-synuclein, the main component of protein deposits observed in dopaminergic neurons of Parkinson patients, and synphilin, a protein that binds α-synuclein and induces deposit formation, both associate with yet other ubiquitin ligases: Siah-1 and Siah-2 [161, 162]. In the case of Siah-1, a link to cytoplasmic chaperone systems is suggested by the finding that the Hsp70 co-chaperone BAG-1 is a binding partner of the ubiquitin ligase and suppresses some of the cellular activities of Siah-1 [163]. Taken together, it is tempting to speculate about a role of Parkin and Siah on chaperone-assisted degradation pathways; yet, this remains to be explored in detail.
8 Further Considerations
Although the appreciation of interplay between molecular chaperones and ubiquitin-dependent proteolysis has greatly expanded over the past decade, a number of critical issues remain to be resolved. It is not entirely clear what determines whether a misfolded protein will undergo repeated attempts at misfolding versus diversion to the ubiquitin–proteasome pathway. Recruitment of CHIP into chaperone complexes appears to be a critical component of this reaction, which therefore begs the question as to what regulates this step. Since this step in protein quality control must be both rapidly activated and easily reversible, it is likely that regulation occurs at the post-translational level rather than through changes in steady-state protein levels. The precise sorting mechanisms for ubiquitinated proteins are also unclear. BAG-1 is a player, and it is also likely that overlap exists to some extent for sorting of the cytoplasmic and endoplasmic reticulum quality-control pathways. Nevertheless, much remains to be learned about these steps. From a broader perspective, it is now also imperative to understand the pathophysiological roles of cytoplasmic quality-control mechanisms regulated by
19
20 Molecular Chaperones and the Ubiquitin–Proteasome System
chaperone-proteasome interactions. As mentioned previously, there is a strong association between chaperone dysfunction and accumulations of misfolded proteins that characterizes genetic neurodegenerative diseases. An imbalance between protein folding and degradation may also contribute to some features of senescence and organismal aging. The link between chaperone systems and aging is based on increasing appreciation that modified, misfolded, and aggregated proteins accumulate with age [164]. Dysregulation of chaperone expression has been observed with aging and is therefore implicated in aging-related changes [165]; in general, it is accepted that induction of the major chaperones is impaired with aging, a fact confirmed by recent gene-profiling experiments in vivo [166], although given the diversity of chaperones it is probably not surprising that age-related changes in expression are fairly complicated [167]. The mechanism underlying this dysregulation is not entirely clear, but seems to be due in part to impaired activation of the stress-responsive transcription factor HSF1. Overexpression of heat shock proteins in yeast, C. elegans, and Drosophila leads to increased longevity [168–170]. More recently, conclusive genetic evidence from C. elegans indicates that mutation of HSF1 causes a dramatic and significant reduction in lifespan [170, 171], further implicating the accumulation of misfolded proteins in age-related phenotypes.
9 Conclusions
The associations between molecular chaperones and the ubiquitin–proteasome system represent a critical step in the response to proteotoxic damage. Whether attempts should be made to refold damaged proteins (thus conserving cellular resources) or degrade them instead (to prevent the possibility of protein aggregation and concomitant toxicity) requires a consideration of cellular economy. Defects in the quality-control mechanisms may have enormous consequences even if only slight imbalances occur between protein folding and degradation, as these imbalances can cause accumulated toxicity over time. The relationship between chaperone–proteasome interactions and pathophysiological events is only now being unraveled. Modulation of this system may provide a unique therapeutic target for degenerative diseases and pathologies associated with aging. References 1 U. SCHUBERT, L. C. ANTON, J. GIBBS, C. C. NORBURY, J. W. YEWDELL, J. R. BENNINK, Rapid degradation of a large fraction of newly synthesized proteins by proteasomes, Nature 404 (2000) 770–774. 2 J. P. TAYLOR, J. HARDY, K. H. FISCHBECK, Toxic proteins in neurodegenerative disease, Science 296 (2002) 1991–1995.
3 A. HORWICH, Protein aggregation in disease: a role for folding intermediates forming specific multimeric interactions, J. Clin. Invest. 110 (2002) 1221–1232. 4 E. SCHERZINGER, R. LURZ, M. TURMAINE, L. MANGIARINI, B. HOLLENBACH, R. HASENBANK, G. P. BATES, S. W. DAVIES, H. LEHRACH, E. E. WANKER,
References
5
6
7
8
9
10
11
12
Huntingtin-encoded polyglutamine expansions form amyloid-like protein aggregates in vitro and in vivo, Cell 90 (1997) 549–558. E. SCHERZINGER, A. SITTLER, K. SCHWEIGER, V. HEISER, R. LURZ, R. HASENBANK, G. P. BATES, H. LEHRACH, E. E. WANKER, Self-assembly of polyglutamine-containing huntingtin fragments into amyloid-like fibrils: implications for Huntington’s disease pathology, Proc. Natl. Acad. Sci. USA 96 (1999) 4604–4609. A. KAZANTSEV, E. PREISINGER, A. DRANOVSKY, D. GOLDGABER, D. HOUSMAN, Insoluble detergent-resistant aggregates form between pathological and nonpathological lengths of polyglutamine in mammalian cells, Proc. Natl. Acad. Sci. USA 96 (1999) 11404–11409. D. MARTINDALE, A. HACKAM, A. WIECZOREK, L. ELLERBY, C. WELLINGTON, K. MCCUTCHEON, R. SINGARAJA, P. KAZEMI-ESFARJANI, R. DEVON, S. U. KIM, D. E. BREDESEN, F. TUFARO, M. R. HAYDEN, Length of huntingtin and its polyglutamine tract influences localization and frequency of intracellular aggregates. Nat. Genet. 18 (1998) 150–154. I. SANCHEZ, C. MAHLKE, J. YUAN, Pivotal role of oligomerization in expanded polyglutamine neurodegenerative disorders, Nature 421 (2003) 373–379. M. F. PERUTZ, Glutamine repeats and neurodegenerative diseases: molecular aspects, Trends Biochem. Sci. 24 (1999) 58–63. H. Y. ZOGHBI, H. T. ORR, Glutamine repeats and neurodegeneration, Annu. Rev. Neurosci. 23 (2000) 217–247. H. T. ORR, M. Y. CHUNG, S. BANFI, T. J. Jr. KWIATKOWSKI, A. SERVADIO, A. L. BEAUDET, A. E. MCCALL, L. A. DUVICK, L. P. RANUM, H. Y. ZOGHBI, Expansion of an unstable trinucleotide CAG repeat in spinocerebellar ataxia type 1, Nat. Genet. 4 (1993) 221–226. D. R. ROSEN, T. SIDDIQUE, D. PATTERSON, D. A. FIGLEWICZ, P. SAPP, A. HENTATI, D. DONALDSON, J. GOTO, J. P. O’REGAN, H. X. DENG, et al. Mutations in Cu/Zn
13
14
15
16
17
18
19
20
superoxide dismutase gene are associated with familial amyotrophic lateral sclerosis. Nature 362 (1993) 59–62. J. A. JOHNSTON, M. J. DALTON, M. E. GURNEY, R. R. KOPITO, Formation of high molecular weight complexes of mutant Cu, Zn-superoxide dismutase in a mouse model for familial amyotrophic lateral sclerosis, Proc. Natl. Acad. Sci. USA 97 (2000) 12571–12576. B. I. GIASSON, V. M. LEE, Are ubiquitination pathways central to Parkinson’s disease? Cell 114 (2003) 1–8. T. KITADA, S. ASAKAWA, N. HATTORI, H. MATSUMINE, Y. YAMAMURA, S. MINOSHIMA, M. YOKOCHI, Y. MIZUNO, N. SHIMIZU, Mutations in the parkin gene cause autosomal recessive juvenile parkinsonism, Nature 392 (1998) 605–608. C. B. L¨UCKING, A. DURR, V. BONIFATI, J. VAUGHAN, G. De MICHELE, T. GASSER, B. S. HARHANGI, G. MECO, P. DENEFLE, N. W. WOOD, Y. AGID, A. BRICE, Association between early-onset Parkinson’s disease and mutations in the parkin gene, N. Engl. J. Med. 342 (2000) 1560–1567. H. SHIMURA, N. HATTORI, S. KUBO, Y. MIZUNO, S. ASAKAWA, S. MINOSHIMA, N. SHIMIZU, K. IWAI, T. CHIBA, K. TANAKA, T. SUZUKI, Familial Parkinson disease gene product, parkin, is a ubiquitin-protein ligase, Nat. Genet. 25 (2000) 302–305. K. K. CHUNG, Y. ZHANG, K. L. LIM, Y. TANAKA, H. HUANG, J. GAO, C. A. ROSS, V. L. DAWSON, T. M. DAWSON, Parkin ubiquitinates the α-synuclein-interacting protein, synphilin-1: implications for Lewy-body formation in Parkinson disease, Nat. Med. 7 (2001) 1144–1150. H. SHIMURA, M. G. SCHLOSSMACHER, N. HATTORI, M. P. FROSCH, A. TROCKENBACHER, R. SCHNEIDER, Y. MIZUNO, K. S. KOSIK, D. J. SELKOE, Ubiquitination of a new form of α-synuclein by parkin from human brain: implications for Parkinson’s disease, Science 293 (2001) 263–269. Y. IMAI, M. SODA, H. INOUE, N. HATTORI, Y. MIZUNO, R. TAKAHASHI, An unfolded putative transmembrane polypeptide, which can lead to endoplasmic
21
22 Molecular Chaperones and the Ubiquitin–Proteasome System
21
22
23
24
25
26
27
28
29
30
31
reticulum stress, is a substrate of Parkin, Cell 105 (2001) 891–902. P. FERNANDEZ-FUNEZ, M. L. NINO-ROSALES, B. de GOUYON, W. C. SHE, J. M. LUCHAK, P. MARTINEZ, E. TURIEGANO, J. BENITO, M. CAPOVILLA, P. J. SKINNER, A. MCCALL, I. CANAL, H. T. ORR, H. Y. ZOGHBI, J. BOTAS, Identification of genes that modify ataxin-1-induced neurodegeneration, Nature 408 (2000) 101–106. P. KAZEMI-ESFARJANI, S. BENZER, Genetic suppression of polyglutamine toxicity in Drosophila, Science 287 (2000) 1837–1840. N. M. BONINI, Chaperoning brain degeneration, Proc. Natl. Acad. Sci. USA 99 Suppl. 4 (2002) 16407–16411. P. K. AULUCK, H. Y. CHAN, J. Q. TROJANOWSKI, V. M. LEE, N. M. BONINI, Chaperone suppression of α-synuclein toxicity in a Drosophila model for Parkinson’s disease, Science 295 (2002) 865–868. F. U. HARTL, M. HAYER-HARTL, Molecular chaperones in the cytosol: from nascent chain to folded protein, Science 295 (2002) 1852–1858. J. FRYDMAN, Folding of newly translated proteins in vivo: the role of molecular chaperones, Annu. Rev. Biochem. 70 (2001) 603–647. J. FRYDMAN, J. H¨ohfeld, Chaperones get in touch: the Hip-Hop connection, Trends Biochem. Sci. 22 (1997) 87–92. B. BUKAU, A. L. HORWICH, The Hsp70 and Hsp60 chaperone machines, Cell 92 (1998) 351–366. R. I. MORIMOTO, Regulation of the heat shock transcriptional response: cross talk between a family of heat shock factors, molecular chaperones, and negative regulators. Genes Dev. 12 (1998) 3788–3796. M. P. MAYER, D. BREHMER, C. S. G¨ASSLER, B. BUKAU, Hsp70 chaperone machines, Adv. Protein Chem. 59 (2001) 1–44. S. R¨UDIGER, L. GERMEROTH, J. SCHNEIDER-MERGENER, B. BUKAU, Substrate specificity of the DnaK chaperone determined by screening cellulose-bound peptide libraries, EMBO J. 16 (1997) 1501–1507.
32 V. THULASIRAMAN, C. F. YANG, J. FRYDMAN, In vivo newly translated polypeptides are sequestered in a protected folding environment, EMBO J. 18 (1999) 85–95. 33 X. ZHU, X. ZHAO, W. F. BURKHOLDER, A. GRAGEROV, C. M. OGATA, M. E. GOTTESMAN, W. A. HENDRICKSON, Structural analysis of substrate binding by the molecular chaperone DnaK, Science 272 (1996) 1606–1614. 34 Y. MINAMI, J. H¨OHFELD, K. OHTSUKA, F. U. HARTL, Regulation of the heat-shock protein 70 reaction cycle by the mammalian DnaJ homolog, Hsp40, J. Biol. Chem. 271 (1996) 19617–19624. 35 C. A. BALLINGER, P. CONNELL, Y. WU, Z. HU, L. J. THOMPSON, L. Y. YIN, C. PATTERSON, Identification of CHIP, a novel tetratricopeptide repeat-containing protein that interacts with heat shock proteins and negatively regulates chaperone functions, Mol. Cell. Biol. 19 (1999) 4535–4545. 36 J. H¨OHFELD, Y. MINAMI, F. U. HARTL, Hip, a novel cochaperone involved in the eukaryotic Hsc70/Hsp40 reaction cycle, Cell 83 (1995) 589–598. 37 J. H¨OHFELD, S. JENTSCH, GrpE-like regulation of the hsc70 chaperone by the anti-apoptotic protein BAG-1, Embo J. 16 (1997) 6209–6216. 38 H. SONDERMANN, C. SCHEUFLER, C. SCHNEIDER, J. H¨OHFELD, F. U. HARTL, I. MOAREFI, Structure of a Bag/Hsc70 complex: convergent functional evolution of Hsp70 nucleotide exchange factors, Science 291 (2001) 1553– 1557. 39 C. S. G¨ASSLER, T. WIEDERKEHR, D. BREHMER, B. BUKAU, M. P. MAYER, Bag-1M accelerates nucleotide release for human Hsc70 and Hsp70 and can act concentration-dependent as positive and negative cofactor, J. Biol. Chem. 276 (2001) 32538–32544. 40 J. H¨OHFELD, D. M. CYR, C. PATTERSON, From the cradle to the grave: molecular chaperones that may choose between folding and degradation, EMBO Rep. 2 (2001) 885–890. 41 W. M. OBERMANN, H. SONDERMANN, A. A. RUSSO, N. P. PAVLETICH, F. U. HARTL, In
References
42
43
44
45
46
47
48
49
50
vivo function of Hsp90 is dependent on ATP binding and ATP hydrolysis, J. Cell Biol. 143 (1998) 901–910. J. BUCHNER, Hsp90 & Co. – a holding for folding, Trends Biochem. Sci. 24 (1999) 136–141. J. C. YOUNG, I. MOAREFI, F. U. HARTL, Hsp90: a specialized but essential protein-folding tool, J. Cell Biol. 154 (2001) 267–273. L. H. PEARL, C. PRODROMOU, Structur, function and mechanism of the Hsp90 molecular chaperone, Adv. Protein Chem. 59 (2001) 157–186. B. PANARETOU, G. SILIGARDI, P. MEYER, A. MALONEY, J. K. SULLIVAN, S. SINGH, S. H. MILLSON, P. A. CLARKE, S. NAABY-HANSEN, R. STEIN, R. CRAMER, M. MOOAPOUR, P. WORKMAN, P. W. PIPER, L. H. PEAL, C. PRODROMOU, Activation of the ATPase activity of hsp90 by the stress-regulated cochaperone aha1, Mol. Cell 10 (2002) 1307–1318. J. C. YOUNG, N. J. HOOGENRAAD, F. U. HARTL, Molecular chaperones Hsp90 and Hsp70 deliver preproteins to the mitochondrial import receptor Tom70, Cell 112 (2003) 41–50. A. BRYCHZY, T. REIN, K. F. WINKLHOFER, F. U. HARTL, J. C. YOUNG, W. M. OBERMANN, Cofactor Tpr2 combines two TPR domains and a J domain to regulate the Hsp70/Hsp90 chaperone system, EMBO J. 22 (2003) 3613–3623. S. M. ROE, M. M. ALI, P. MEYER, C. K. VAUGHAN, B. PANARETOU, P. W. PIPER, C. PRODROMOU, L. H. PEARL, The mechanism of Hsp90 regulation by the protein kinase-specific cochaperone p50 (cdc37), Cell 116 (2004) 87–98. K. D. DITTMAR, W. B. PRATT, Folding of the glucocorticoid receptor be the reconstituted Hsp90-based chaperone machinery. The initial hsp90.p60.hsp70 dependent step is sufficient for creating the steroid binding conformation, J. Biol. Chem. 272 (1997) 13047–13054. K. D. DITTMAR, D. R. DEMADY, L. F. STANCATO, P. KRISHNA, W. B. PRATT, Folding of the glucocorticoid receptor by the heat shock protein (hsp) 90-based chaperone machinery. The role of p23 is to stabilize receptor.hsp90
51
52
53
54
55
56
57
58
59
heterocomplexes formed by hsp90.p60.hsp70, J. Biol. Chem. 272 (1997) 21213–21220. S. CHEN, V. PRAPAPANICH, R. A. RIMERMAN, B. HONORE, D. F. SMITH, Interactions of p60, a mediator of progesterone receptor assembly, with heat shock proteins hsp90 and hsp70. Mol. Endocrinol. 10 (1996) 682–693. H. KOSANO, B. STENSGARD, M. C. CHARLESWORTH, N. MCMAHON, D. TOFT, The assembly of progesterone receptor-hsp90 complexes using purified proteins, J. Biol. Chem. 273 (1998) 32973–32979. W. B. PRATT, D. O. TOFT, Regulation of signaling protein function and trafficking by the hsp90/hsp70-based chaperone machinery, Exp. Biol. Med. 228 (2003) 111–133. M. P. HERNANDEZ, A. CHADLI, D. O. TOFT, Hsp40 binding is the first step in the Hsp90 chaperoning pathway for the progesterone receptor, J. Biol. Chem. 277 (2002) 11873–11881. C. SCHEUFLER, A. BRINKER, G. BOURENKOV, S. PEGORARO, L. MORODER, H. BARTUNIK, F. U. HARTL, I. MOAREFI, Structure of TPR domain-peptide complexes: critical elements in the assembly of the Hsp70-Hsp90 multichaperone machine, Cell 101 (2000) 199–210. J. FONTANA, D. FULTON, Y. CHEN, T. A. FAIRCHILD, T. J. MCCABE, N. FUJITA, T. TSURUO, W. C. SESSA, Domain mapping studies reveal that the M domain of hsp90 serves as a molecular scaffold to regulate Akt-dependent phosphorylation of endothelioal nitric oxide synthase and NO release, Circ. Res. 90 (2002) 866–873. D. B. SOLIT, H. I. SCHER, N. ROSEN, Hsp90 as a therapeutic target in prostate cancer, Semin. Oncol. 30 (2003) 709–716. L. NECKERS, S. P. IVY, Heat shock protein 90, Curr. Opin. Oncol. 15 (2003) 419– 424. A. KAMAL, L. THAO, J. SENSINTAFFAR, L. ZHANG, M. F. BOEHM, L. C. FRITZ, F. J. BURROWS, A high-affinity conformation of Hsp90 confers tumour selectivity on Hsp90 inhibitors, Nature 425 (2003) 407–410.
23
24 Molecular Chaperones and the Ubiquitin–Proteasome System
60 C. PRODROMOU, S. M. ROE, R. O’BRIEN, J. E. LADBURY, P. W. PIPER, L. H. PEARL, Identification and structural characterization of the ATP/ADP-binding site in the Hsp90 molecular chaperone, Cell 90 (1997) 65–75. 61 C. E. STEBBINS, A. A. RUSSO, C. SCHNEIDER, N. ROSEN, F. U. HARTL, N. P. PAVLETICH, Crystal structure of an Hsp90-geldanamycin complex: targeting of a protein chaperone by an antitumor agent, Cell 89 (1997) 239–250. 62 S. V. SHARMA, T. AGATSUMA, H. NAKANO, Targeting of the protein chaperone, Hsp90, by the transformation suppressing agent, radicicol, Oncogene 16 (1998) 2639–2645. 63 T. W. SCHULTE, S. AKINAGA, T. MURAKATA, T. AGATSUMA, S. SUGIMOTO, H. NAKANO, Y. S. LEE, B. B. SIMEN, Y. ARGON, S. FELTS, D. O. TOFT, L. M. NECKERS, S. V. SHARMA, Interaction of radicicol with members of the heat shock protein 90 family of molecular chaperones, Mol. Endocrinol. 13 (1999) 1435–1448. 64 C. SCHNEIDER, L. SEPP-LORENZINO, E. NIMMESGERN, O. OUERFELLI, S. DANISHEFSKY, N. ROSEN, F. U. HARTL, Pharmacologic shifting of a balance between protein refolding and degradation mediated by Hsp90, Proc. Natl. Acad. Sci. USA 93 (1996) 14536–14541. 65 E. G. MIMNAUGH, C. CHAVANY, L. NECKERS, Polyubiquitination and proteasomal degradation of the p185c-erbB-2 receptor protein-tyrosine kinase induced by geldanamycin, J. Biol. Chem. 271 (1996) 22796–22801. 66 M. G. MARCU, T. W. SCHULTE, L. NECKERS, Novobiocin and related coumarins and depletion of heat shock protein 90-dependent signaling proteins, J. Natl. Cancer Inst. 92 (2000) 242–248. 67 W. XU, E. MIMNAUGH, M. F. ROSSER, C. NICCHITTA, M. MARCU, Y. YARDEN, L. NECKERS, Sensitivity of mature Erbb2 to geldanamycin is conferred by its kinase domain and is mediated by the chaperone protein Hsp90, J. Biol. Chem. 276 (2001) 3702–3708.
68 D. B. SOLIT, F. F. ZHENG, M. DROBNJAK, P. N. MUNSTER, B. HIGGINS, D. VERBEL, G. HELLER, W. TONG, C. CORDON-CARDO, D. B. AGUS, H. I. SCHER, N. ROSEN, 17Allylamino-17-demethoxygeldanamycin induces the degradation of androgen receptor and HER-2/neu and inhibits the growth of prostate cancer xenografts, Clin. Cancer Res. 8 (2002) 986–993. 69 A. CITRI, I. ALROY, S. LAVI, C. RUBIN, W. XU, N. GRAMMATIKAKIS, C. PATTERSON, L. NECKERS, D. W. FRY, Y. YARDEN, Drug-induced ubiquitylation and degradation of ErbB receptor tyrosine kinases: implications for cancer therapy, Embo J. 21 (2002) 2407–2417. 70 R. van MONTFORT, C. SLINGSBY, E. VIERLING, Structure and function of the small heat shock protein/alpha-crystallin family of molecular chaperones, Adv. Protein Chem. 59 (2001) 105–156. 71 M. HASLBECK, N. BRAUN, T. STROMER, B. RICHTER, N. MODEL, S. WEINKAUF, J. BUCHNER, Hsp42 is the general small heat shock protein in the cytosol of Saccharomyces cerevisiae, EMBO J. 23 (2004) 638–649. 72 T. STROMER, E. FISCHER, K. RICHTER, M. HASLBECK, J. BUCHNER, Analysis of the regulation of the molecular chaperone Hsp26 by temperature-induced dissociation: the N-terminal domain is important for oligomer assembly and the binding of unfolding proteins. J. Biol. Chem. 279 (2004) 11222–11228. 73 E. BASHA, G. J. LEE, L. A. BRECI, A. C. HAUSRATH, N. R. BUAN, K. C. GIESE, E. VIERLING, The identity of proteins associated with a small heat shock protein during heat stress in vivo indicates that these chaperones protect a wide range of cellular functions, J. Biol. Chem. 279 (2004) 7566–7575. 74 K. K. KIM, R. KIM, S. H. KIM, Crystal structure of a small heat-shock protein, Nature 394 (1998) 595–599. 75 R. L. van MONTFORT, E. BASHA, K. L. FRIEDRICH, C. SLINGSBY, E. VIERLING, Crystal structure and assembly of a eukaryotic small heat shock protein. Nat. Struct. Biol. 8 (2001) 1025–1030. 76 M. HASLBECK, S. WALKE, T. STROMER, M. EHRNSPERGER, H. E. WHITE, S. CHEN, H.
References
77
78
79
80
81
82
83
84
85
86
R. SAIBIL, J. BUCHNER, Hsp26: a temperature-regulated chaperone, EMBO J. 18 (1999) 6744–6751. G. J. LEE, A. M. ROSEMAN, H. R. SAIBIL, E. VIERLING, A small heat shock protein stably binds heat-denatured model substrates and can maintain a substrate in a folding-competent state, EMBO J. 16 (1997) 659–671. Z. XU, A. L. HORWICH, P. B. SIGLER, The crystal structure of the asymmetric GroEL-GroES-(ADP)7 chapero-nin complex, Nature 388 (1997) 741–750. H. R. SAIBIL, N. A. RANSON, The chaperonin folding machine, Trends Biochem. Sci. 27 (2002) 627–632. A. S. MEYER, J. R. GILLESPIE, D. WALTHER, I. S. MILLET, S. DONIACH, J. FRYDMAN, Closing the folding chamber of the eukaryotic chaperonin requires the transition state of ATP hydrolysis, Cell 113 (2003) 369–381. V. THULASIRAMAN, C. F. YANG, J. FRYDMAN, In vivo newly translated polypeptides are sequestered in a protected folding environment, EMBO J. 18 (1999) 85–95. S. A. LEWIS, G. TIAN, I. E. VAINBERG, N. J. COWAN, Chaperonin-mediated folding of actin and tubulin, J. Cell Biol. 132 (1996) 1–4. H. SAKAHIRA, P. BREUER, M. K. HAYER-HARTL, F. U. HARTL, Molecular chaperones as modulators of polyglutamine protein aggregation and toxicity, Proc. Natl. Acad. Sci. USA 99 Suppl. 4 (2002) 16412–16418. J. M. WARRICK, H. Y. CHAN, G. L. GRAY-BOARD, Y. CHAI, H. L. PAULSON, N. M. BONINI, Suppression of polyglutamine-mediated neuro-degeneration in Drosophila by the molecular chaperone Hsp70. Nat. Genet. 23 (1999) 425–428. C. J. CUMMINGS, M. A. MANCINI, B. ANTALFFY, D. B. DEFRANCO, H. T. ORR, H. Y. ZOGHBI, Chaperone suppression of aggregation and altered subcellular proteasome localization imply protein misfolding in SCA1, Nat. Genet. 19 (1998) 148–154. C. J. CUMMINGS, Y. SUN, P. OPAL, B. ANTALFFY, R. MESTRIL, H. T. ORR, W. H.
87
88
89
90
91
92
93
DILLMANN, H. Y. ZOGHBI, Over-expression of inducible HSP70 chaperone suppresses neuropathology and improves motor function in SCA1 mice, Hum. Mol. Genet. 10 (2001) 1511–1518. S. KROBITSCH, S. LINDQUIST, Aggregation of huntingtin in yeast varies with the length of the polyglutamine expansion and the expression of chaperone proteins, Proc. Natl. Acad. Sci. USA 97 (2000) 1589–1594. P. J. MUCHOWSKI, G. SCHAFFAR, A. SITTLER, E. E. WANKER, M. K. HAYER-HARTL, F. U. HARTL, Hsp70 and hsp40 chaperones can inhibit self-assembly of polyglutamine proteins into amyloid-like fibrils, Proc. Natl. Acad. Sci. USA 97 (2000) 7841–7846. J. Z. CHUANG, H. ZHOU, M. ZHU, S. H. LI, X. J. LI, C. H. SUNG, Characterization of a brain-enriched chaperone, MRJ, that inhibits Huntingtin aggregation and toxicity independently, J. Biol. Chem. 277 (2002) 19831–19838. A. SITTLER, R. LURZ, G. LUEDER, J. PRILLER, H. LEHRACH, M. K. HAYER-HARTL, F. U. HARTL, E. E. WANKER, Geldanamycin activates a heat shock response and inhibits huntingtin aggregation in a cell culture model of Huntington’s disease, Hum. Mol. Genet. 10 (2001) 1307–1315. A. PARCELLIER, E. SCHMITT, S. GURBUXANI, D. SEIGNEURIN-BERNY, A. PANCE, A. CHANTOME, S. PLENCHETTE, S. KHOCHBIN, E. SOLARY, C. GARRIDO, Hsp27 is a ubiquitin-binding protein involved in IκBα proteasomal degradation, Mol. Cell Biol. 23 (2003) 5790–5802. H. SHIMURA, Y. MIURA-SHIMURA, K. S. KOSICK, Binding of tau to heat shock protein 27 leads to decreased concentration of hyperphosphorylated tau and enhanced cell survival, J. Biol. Chem. 279 (2004) 17957–17962. V. GUSAROVA, A. J. CAPLAN, J. L. BRODSKY, E. A. FISHER, Apoprotein B degradation is promoted by the molecular chaperones hsp90 and hsp70, J. Biol. Chem. 276 (2001) 24891– 24900.
25
26 Molecular Chaperones and the Ubiquitin–Proteasome System
94 C. TAXIS, R. HITT, S. H. PARK, P. M. DEAK, Z. KOSTOVA, D. H. WOLF, Use of modular substrates demonstrates mechanistic diversity and reveals differences in chaperone requirement of ERAD, J. Biol. Chem. 278 (2003) 35903–35913. 95 Y. YANG, S. JANICH, J. A. COHN, J. M. WILSON, The common variant of cystic fibrosis transmembrane conductance regulator is recognized by hsp70 and degraded in a pre-Golgi nonlysosomal compartment, Proc. Natl. Acad. Sci. USA 90 (1993) 9480–9484. 96 G. C. MEACHAM, Z. LU, S. KING, E. SORSCHER, A. TOUSSON, D. M. CYR, The Hdj-2/Hsc70 chaperone pair facilitates early steps in CFTR biogenesis, Embo J. 18 (1999) 1492–1505. 97 G. C. MEACHAM, C. PATTERSON, W. ZHANG, J. M. YOUNGER, D. M. CYR, The Hsc70 co-chaperone CHIP targets immature CFTR for proteasomal degradation, Nat. Cell Biol. 3 (2001) 100–105. 98 Y. ZHANG, G. NIJBROEK, M. L. SULLIVAN, A. A. MCCRACKEN, S. C. WATKINS, S. MICHAELIS, J. L. BRODSKY, Hsp70 molecular chaperone facilitates endoplasmic reticulum-associated protein degradation of cystic fibrosis transmembrane conductance regulator in yeast, Mol. Biol. Cell 12 (2001) 1303–1314. 99 R. R. KOPITO, Biosynthesis and degradation of CFTR, Physiol. Rev. 79 (1999) S167–173. 100 J. R. RIORDAN, Cystic fibrosis as a disease of misprocessing of the cystic fibrosis transmembrane conductance regulator glycoprotein, Am. J. Hum. Genet. 64 (1999) 1499–1504. 101 C. L. WARD, S. OMURA, R. R. KOPITO, Degradation of CFTR by the ubiquitin proteasome pathway, Cell 83 (1995) 121–127. 102 T. J. JENSEN, M. A. LOO, S. PIND, D. B. WILLIAMS, A. L. GOLDBERG, J. R. RIORDAN, Multiple proteolytic systems, including the proteasome, contribute to CFTR processing, Cell 83 (1995) 129–135. 103 M. S. GELMAN, E. S. KANNEGAARD, R. R. KOPITO, A principal role for the
104
105
106
107
108
109
110
111
112
113
proteasome in endoplasmic reticulum-associated degradation of misfolded intracellular cystic fibrosis transmembrane conductance regulator, J. Biol. Chem. 277 (2002) 11709–11714. Z. KOSTOVA, D. H. WOLF, For whom the bell tolls: protein quality control of the endoplasmic reticulum and the ubiquitin proteasome connection, EMBO J. 22 (2003) 2309–2317. S. SATO, C. L. WARD, M. E. KROUSE, J. J. WINE, R. R. KOPITO, Glycerol reverses the misfolding phenotype of the most common cystic fibrosis mutation, J. Biol. Chem. 271 (1996) 635–638. B. H. QU, P. J. THOMAS, Alteration of the cystic fibrosis transmembrane conductance regulator folding pathway, J. Biol. Chem. 271 (1996) 7261–7264. E. STRICKLAND, B. H. QU, L. MILLEN, P. J. THOMAS, The molecular chapeorne Hsc70 assists the in vitro folding of the N-terminal nucleotide-binding domain of the cystic fibrosis transmembrane conductance regulator, J. Biol. Chem. 272 (1997) 25421–25424. J. A. JOHNSTON, C. L. WARD, R. R. KOPITO, Aggresomes: a cellular response to misfolded proteins, J. Cell Biol. 143 (1998) 1883–1898. R. R. KOPITO, Aggresomes, inclusion bodies and protein aggregation, Trends Cell Biol. 10 (2000) 524–530. S. WAELTER, A. BOEDDRICH, R. LURZ, E. SCHERZINGER, G. LUEDER, H. LEHRACH, E. E. WANKER, Accumulation of mutant huntingtin fragments in aggresome-like inclusion bodies as a result of insufficient protein degradation, Mol. Biol. Cell 12 (2001) 1393–1407. R. GARCIA-MATA, Z. BEBOK, E. J. SORSCHER, E. S. SZTUL, Characterization and dynamics of aggresome formation by a cytosolic GFP-chimera, J. Cell Biol. 146 (1999) 1239–1254. K. S. MCNAUGHT, P. SHASHIDHARAN, D. P. PERL, P. JENNER, C. W. OLANOW, Aggresome-related biogenesis of Lewy bodies, Eur. J. Neurosci. 16 (2002) 2136–2148. Y. KAWAGUCHI, J. J. KOVACS, A. MCLAURIN, J. M. VANCE, A. ITO, T. P. YAO, The deacetylase HDAC6 regulates
References
114
115
116
117
118
119
120
121
122
aggresome formation and cell viability in response to misfolded protein stress, Cell 115 (2003) 727–738. R. GARCIA-MATA, Z. BEBOK, E. J. SORSCHER, E. S. SZTUL, Characterization and dynamics of aggresome formation by a cytosolic GFP-chimera, J. Cell Biol. 146 (1999) 1239–1254. W. C. WIGLEY, R. P. FABUNMI, M. G. LEE, C. R. MARINO, S. MUALLEM, G. N. DEMARTINO, P. J. THOMAS, Dynamic association of proteasomal machinery with the centrosome, J. Cell Biol. 145 (1999) 481–490. A. WYTTENBACH, J. CARMICHAEL, J. SWARTZ, R. A. FURLONG, Y. NARAIN, J. RANKIN, D. C. RUBINSZTEIN, Effects of heat shock, heat shock protein 40 (HDJ-2), and proteasome inhibition on protein aggregation in cellular models of Huntington’s disease, Proc. Natl. Acad. Sci. USA 97 (2000) 2898–2903. J. L. DUL, D. P. DAVIS, E. K. WILLIAMSON, F. J. STEVENS, Y. ARGON, Hsp70 and antifibrillogenic peptides promote degradation and inhibit intracellular aggregation of amyloidogenic light chains, J. Cell Biol. 152 (2001) 705–716. H. LELOUARD, E. GATTI, F. CAPPELLO, O. GRESSER, V. CAMOSSETO, P. PIERRE, Transient aggregation of ubiquitinated proteins during dendritic cell maturation, Nature 417 (2002) 177–182. H. LELOUARD, V. FERRAND, D. MARGUET, J. BANIA, V. CAMOSSETO, A. DAVID, E. GATTI, P. PIERRE, Dendritic cell aggresome-like induced structures are dedicated areas for ubiquitination and storage of newly synthesized defective proteins, J. Cell Biol. 164 (2004) 667–675. W. XU, M. MARCU, X. YUAN, E. MIMNAUGH, C. PATTERSON, L. NECKERS, Chaperone-dependent E3 ubiquitin ligase CHIP mediates a degradative pathway for c-ErbB2/Neu, Proc. Natl. Acad. Sci. USA 99 (2002) 12847–12852. S. WICKNER, M. R. MAURIZI, S. GOTTESMAN, Posttranslational quality control: folding, refolding, and degrading proteins, Science 286 (1999) 1888–1893. C.-W. LIU, M. J. CORBOY, G. N. DEMARTINO, P. J. THOMAS,
123
124
125
126
127
128
129
130
131
Endoproteolytic activity of the proteasome, Science 299 (2003) 408–411. W. SEUFERT, S. JENTSCH, Ubiquitin-conjugating enzymes UBC4 and UBC5 mediate selective degradation of short-lived and abnormal proteins, EMBO J. 9 (1990) 543–550. B. BERCOVICH, I. STANCOVSKI, A. MAYER, N. BLUMENFELD, A. LASZLO, A. L. SCHWARTZ, A. CIECHANOVER, Ubiquitin-dependent degradation of certain protein substrates in vitro requires the molecular chaperone Hsc70, J. Biol. Chem. 272 (1997) 9002–9010. P. CONNELL, C. A. BALLINGER, J. JIANG, Y. WU, L. J. THOMPSON, J. H¨OHFELD, C. PATTERSON, The co-chaperone CHIP regulates protein triage decisions mediated by heat-shock proteins, Nat. Cell Biol. 3 (2001) 93–96. L. ARAVIND, E. V. KOONIN, The U box is a modified RING finger – a common domain in ubiquitination, Curr. Biol. 10 (2000) R132–134. S. HATAKEYAMA, M. YADA, M. MATSUMOTO, N. ISHIDA, K. I. NAKAYAMA, U box proteins as a new family of ubiquitin-protein ligases, J. Biol. Chem. 276 (2001) 33111–33120. E. PRINGA, G. MARTINEZ-NOEL, U. M¨ULLER, K. HARBERS, Interaction of the ring finger-related U-box motif of a nuclear dot protein with ubiquitin-conjugating enzymes, J. Biol. Chem. 276 (2001) 19617– 19623. R. NIKOLAY, T. WIEDERKEHR, W. RIST, G. KRAMER, M. P. MAYER, B. BUKAU, Dimerization of the human E3 ligase CHIP via a coiled-coil domain is essential for its activity, J. Biol. Chem. 279 (2004) 2673–2678. Q. DAI, C. ZHANG, Y. WU, H. MCDONOUGH, R. A. WHALEY, V. GODFREY, H. H. LI, N. MADAMANCHI, W. XU, L. NECKERS, D. CYR, C. PATTERSON, CHIP activates HSF1 and confers protection against apoptosis and cellular stress, EMBO J. 22 (2003) 5446–5458. J. DEMAND, J. L¨UDERS, J. H¨OHFELD, The carboxyl-terminal domain of Hsc70 provides binding sites for a distinct set
27
28 Molecular Chaperones and the Ubiquitin–Proteasome System
132
133
134
135
136
137
138
139
140
of chaperone cofactors, Mol. Cell. Biol. 18 (1998) 2023–2028. A. BRINKER, C. SCHEUFLER, F. VON DER MULBE, B. FLECKENSTEIN, C. HERRMANN, G. JUNG, I. MOAREFI, F. U. HARTL, Ligand discrimination by TPR domains. Relevance and selectivity of EEVD-recognition in Hsp70 xHop xHsp90 complexes, J. Biol. Chem. 277 (2002) 19265–19275. H. H. KAMPINGA, B. KANON, F. A. SALOMONS, A. E. KABAKOV, C. PATTERSON, Overexpression of the cochaperone CHIP enhances Hsp70-dependent folding activity in mammalian cells, Mol. Cell. Biol. 23 (2003) 4948–4958. H. SHIMURA, D. SCHWARTZ, S. P. GYGI, K. S. KOSIK, CHIP-Hsc70 complex ubiquitinates phosphorylated tau and enhances cell survival, J. Biol. Chem. 279 (2004) 4869–4876. J. JIANG, C. A. BALLINGER, Y. WU, Q. DAI, D. M. CYR, J. H¨OHFELD, C. PATTERSON, CHIP is a U-box-dependent E3 ubiquitin ligase: identification of Hsc70 as a target for ubiquitylation, J. Biol. Chem 276 (2001) 42938–42944. S. MURATA, Y. MINAMI, M. MINAMI, T. CHIBA, K. TANAKA, CHIP is a chaperone-dependent E3 ligase that ubiquitylates unfolded protein, EMBO Rep. 2 (2001) 1133–1138. J. DEMAND, S. ALBERTI, C. PATTERSON, J. H¨ohfeld, Cooperation of a ubiquitin domain protein and an E3 ubiquitin ligase during chaperone/proteasome coupling, Curr. Biol. 11 (2001) 1569–1577. P. K. JACKSON, A. G. ELDRIDGE, E. FREED, L. FURSTENTHAL, J. Y. HSU, B. K. KAISER, J. D. REIMANN, The lore of the RINGs: substrate recognition and catalysis by ubiquitin ligases, Trends Cell. Biol. 10 (2000) 429–439. P. K. JACKSON, A. G. ELDRIDGE, The SCF ubiquitin ligase: an extended look, Mol. Cell 9 (2002) 923–925. N. ZHENG, B. A. SCHULMAN, L. SONG, J. J. MILLER, P. D. JEFFREY, P. WANG, C. CHU, D. M. KOEPP, S. J. ELLEDGE, M. PAGANO, R. C. CONAWAY, J. W. CONAWAY, J. W. HARPER, N. P. PAVLETICH, Structure of the Cul1-Rbx1-Skp1-F boxSkp2 SCF
141
142
143
144
145
146
147
148
149
ubiquitin ligase complex, Nature 416 (2002) 703–709. J. L¨UDERS, J. DEMAND, J. H¨ohfeld, The ubiquitin-related BAG-1 provides a link between the molecular chaperones Hsc70/Hsp70 and the proteasome, J. Biol. Chem. 275 (2000) 4613–4617. S. JENTSCH, G. PYROWOLAKIS, Ubiquitin and its kin: how close are the family ties? Trends Cell. Biol. 10 (2000) 335– 342. S. ALBERTI, J. DEMAND, C. ESSER, N. EMMERICH, H. SCHILD, J. H¨OHFELD, Ubiquitylation of BAG-1 suggests a novel regulatory mechanism during the sorting of chaperone substrates to the proteasome, J. Biol. Chem. 277 (2002) 45920–45927; Erratum in J. Biol. Chem. 278 (2003) 15702–15703. Q. DEVERAUX, V. USTRELL, C. PICKART, M. RECHSTEINER, A 26 S protease subunit that binds ubiquitin conjugates, J. Biol. Chem. 269 (1994) 7059–7061. P. YOUNG, Q. DEVERAUX, R. E. BEAL, C. M. PICKART, M. RECHSTEINER, Characterization of two polyubiquitin binding sites in the 26 S protease subunit 5a, J. Biol. Chem. 273 (1998) 5461–5467. H. HIYAMA, M. YOKOI, C. MASUTANI, K. SUGASAWA, T. MAEKAWA, K. TANAKA, J. H. HOEIJMAKERS, F. HANAOKA, Interaction of hHR23 with S5a. The ubiquitin-like domain of hHR23 mediates interaction with S5a subunit of 26 S proteasome, J. Biol. Chem. 274 (1999) 28019–28025. K. J. WALTERS, M. F. KLEIJNEN, A. M. GOH, G. WAGNER, P. M. HOWLEY, Structural studies of the interaction between ubiquitin family proteins and proteasome subunit S5a, Biochemistry 41 (2002) 1767–1777. Y. A. LAM, T. G. LAWSON, M. VELAYUTHAM, J. L. ZWEIER, C. M. PICKART, A proteasomal ATPase subunit recognizes the polyubiquitin degradation signal, Nature 416 (2002) 763–767. S. ELSASSER, R. R. GALI, M. SCHWICKART, C. N. LARSEN, D. S. LEGGETT, B. MULLER, M. T. FENG, F. TUBING, G. A. DITTMAR, D. FINLEY, Proteasome subunit Rpn1 binds ubiquitin-like protein domains, Nat. Cell Biol. 4 (2002) 725–730.
References 150 Y. SAEKI, T. SONE, A. TOHE, H. YOKOSAWA, Identification of ubiquitin-like protein-binding subunits of the 26S proteasome, Biochem. Biophys. Res. Commun. 296 (2002) 813–819. 151 M. H. GLICKMAN, D. M. RUBIN, O. COUX, I. WEFES, G. PFEIFER, Z. CJEKA, W. BAUMEISTER, V. A. FRIED, D. FINLEY, A subcomplex of the proteasome regulatory particle required for ubiquitin-conjugate degradation and related to the COP9-signalosome and eIF3, Cell 94 (1998) 615–623. 152 Y. XIE, A. VARSHAVSKY, UFD4 lacking the proteasome-binding region catalyses ubiquitination but is impaired in proteolysis, Nat. Cell Biol. 4 (2002) 1003–1007. 153 S. TAKAYAMA, J. C. REED, Molecular chaperone targeting and regulation by BAG family proteins, Nat. Cell Biol. 3 (2001) E237–241. 154 K. THRESS, J. SONG, R. I. MORIMOTO, S. KORNBLUTH, Reversible inhibition of Hsp70 chaperone function by Scythe and Reaper, EMBO J. 20 (2001) 1033–1041. 155 S. ALBERTI, K. B¨OHSE, V. ARNDT, A. SCHMITZ, J. H¨ohfeld, The co-chaperone HspBP1 inhibits the CHIP ubiquitin ligase and stimulates the maturation of the cystic fibrosis transmembrane conductance regulator, Mol. Biol. Cell (2004), in press. 156 D. A. RAYNES, V. GUERRIERO, Inhibition of Hsp70 ATPase activity and protein renaturation by a novel Hsp70-binding protein, J. Biol. Chem. 273 (1998) 32883–32888. 157 M. KABANI, C. MCLELLAN, D. A. RAYNES, V. GUERRIERO, J. L. BRODSKY, HspBP1, a homologue of the yeast Fes1 and Sls1 proteins, is an Hsc70 nucleotide exchange factor, FEBS Lett. 531 (2002) 339–342. 158 C. P. CARDOZO, C. MICHAUD, M. C. OST, A. E. FLISS, E. YANG, C. PATTERSON, S. J. HALL, A. J. CAPLAN, C-terminal Hsp-interacting protein slows androgen receptor synthesis and reduces its rate of degradation, Arch. Biochem. Biophys. 410 (2003) 134–140. 159 J. JIANG, D. CYR, R. W. BABBITT, W. C. SESSA, C. PATTERSON,
160
161
162
163
164
165
166
167
Chaperone-dependent regulation of endothelial nitric-oxide synthase intracellular trafficking by the co-chaperone/ubiquitin ligase CHIP, J. Biol. Chem. 278 (2003) 49332– 49341. Y. IMAI, M. SODA, S. HATAKEYAMA, T. AKAGI, T. HASHIKAWA, K. I. NAKAYAMA, R. TAKAHASHI, CHIP is associated with Parkin, a gene responsible for familial Parkinson’s disease, and enhances its ubiquitin ligase activity, Mol. Cell 10 (2002) 55–67. Y. NAGANO, H. YAMASHITA, T. TAKAHASHI, S. KISHIDA, T. NAKAMURA, E. ISEKI, N. HATTORI, Y. MIZUNO, A. KIKUCHI, M. MATSUMOTO, Siah-1 facilitates ubiquitination and degradation of synphilin-1, J. Biol. Chem. 278 (2003) 51504–51514. E. LIANI, A. EYAL, E. AVRAHAM, R. SHERMER, R. SZARGEL, D. BERG, A. BORNEMANN, O. RIESS, C. A. ROSS, R. ROTT, S. ENGELENDER, Ubiquitylation of synphilin-1 and alpha-synuclein by SIAH and its presence in cellular inclusions and Lewy bodies imply a role in Parkinson’s disease, Proc. Natl. Acad. Sci. USA 101 (2004) 5500–5505. S. MATSUZAWA, S. TAKAYAMA, B. A. FROESCH, J. M. ZAPATA, J. C. REED, p53-inducible homologue of Drosophila seven in absentia (Siah) inhibits cell growth: suppression by BAG-1, EMBO J. 17 (1998) 2736–2747. C. SOTI, P. CSERMELY, Aging and molecular chaperones, Exp. Gerontol. 38 (2003) 1037–1040. J. FARGNOLI, T. KUNISADA, A. J. Jr. FORNACE, E. L. SCHNEIDER, N. J. HOLBROOK, Decreased expression of heat shock protein 70 mRNA and protein after heat tretament in cells of aged rats, Proc. Natl. Acad. Sci. USA 87 (1990) 846–850. C. K. LEE, R. G. KLOPP, R. WEINDRUCH, T. A. PROLLA, Gene expression profile of aging and its retardation by caloric restriction, Science 285 (1999) 1390–1393. S. X. CAO, J. M. DHAHBI, P. L. MOTE, S. R. SPINDLER, Genomic profiling of shortand long-term caloric restriction effects
29
30 Molecular Chaperones and the Ubiquitin–Proteasome System
in the liver of aging mice, Proc. Natl. Acad. Sci. USA 98 (2001) 10630–10635. 168 M. TATAR, A. A. KHAZAELI, J. W. CURTSINGER, Chaperoning extended life, Nature 390 (1997) 30. 169 G. MORROW, R. M. TANGUAY, Heat shock proteins and aging in Drosophila melanogaster, Semin. Cell. Dev. Biol. 14 (2003) 291–299.
170 J. F. MORLEY, R. I. MORIMOTO, Regulation of longevity in Caenorhabditis elegans by heat shock factor and molecular chaperones, Mol. Biol. Cell 15 (2004) 657–664. 171 A. L. HSU, C. T. MURPHY, C. KENYON, Regulation of aging and age-related disease by DAF-16 and heat-shock factor, Science 300 (2003) 1142–1145.
1
Ubiquitin-conjugating Enzymes Michael J. Eddins Johns Hopkins School of Medicine, Baltimore, USA
Cecile M. Pickart Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
Originally published in: Protein Degradation, Volume 1. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30837-8
1 Introduction
In this chapter we review the biochemical, structural, and biological properties of ubiquitin-conjugating enzymes (also called E2 enzymes). Because length restrictions preclude a comprehensive treatment, we focus on key findings that have revealed important general insights and principles. Throughout the piece we try to point out important unanswered questions concerning the E2 enzyme family. A few words about nomenclature are necessary. The yeast E2 genes were numbered in the order of their discovery, but the situation is more complicated in mammals. There are currently three naming systems in use for human E2s: one based on protein molecular mass (e.g. E225K is a 25-kD E2), one based on temporal order of gene cloning (e.g. UbcH10 is specified by the tenth E2 gene cloned in humans), and one based on relationship to yeast E2s (e.g. HR6A is one of two human homologs of yeast Rad6/Ubc2). The second system is the least ambiguous, but also the least informative. In this chapter, we generally name mammalian E2s according to their relationship to yeast E2s. When this is not possible, we use one of the published names.
2 Historical Background
Ubiquitin’s best-understood function is that of a protein cofactor in an intracellular protein-degradation pathway that terminates with the destruction of Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Ubiquitin-conjugating Enzymes
Fig. 1 The ubiquitin-conjugation pathway. Steps in ubiquitin activation and substrate modification. E1, ubiquitin activating enzyme; E2, ubiquitin-conjugating enzyme; E3, ubiquitin-protein ligase. Atoms involved in the thiol ester and amide bonds are shown.
ubiquitin-tagged substrates by 26S proteasomes [1]. The discovery in 1980 that this 76-residue protein is conjugated to proteolytic substrates through the formation of a peptide-like bond, and in an ATP-dependent manner, suggested that ubiquitin activation would be part of the conjugation process [2, 3]. A ubiquitin activating enzyme (E1) was soon identified and shown to employ an aminoacyl-tRNA-synthetase-like mechanism [4]. E1 first catalyzes the addition of an adenylate moiety to the carboxyl group of ubiquitin’s C-terminal residue, G76. The AMP-bound ubiquitin is then transferred to a cysteine residue in the E1 active site, concomitant with the formation of a new molecule of ubiquitin adenylate. The thiol-linked ubiquitin is the proximal source of activated ubiquitin for downstream steps. From a chemical point of view, the E1/ubiquitin thiol ester should be competent to donate ubiquitin to a substrate amino group. In fact, aminoacyl-enzyme thiol esters are used in exactly this way in non-ribosomal polypeptide synthesis, a process that was discovered around the same time as ubiquitin–protein conjugation [5]. In spite of the attractive simplicity of this model, however, biochemical reconstitution studies showed that besides E1 two additional fractions were required to conjugate ubiquitin to a model substrate. They were called ubiquitin carrier protein (E2) and ubiquitin-protein ligase (E3), respectively, since the respective factors seemed to act sequentially [6]. Interestingly, the E2 factor apparently formed a thiol ester with ubiquitin. Based on these results, Hershko and co-workers proposed the “ubiquitin conjugation cascade” (Figure 1). Multiple thiol ester-forming proteins were present in the E2 fraction [6], but only the smallest of them reconstituted substrate ubiquitination catalyzed by the thenknown E3 [7]. This result suggested that there could be multiple E2s with distinct functional properties. Confirmation of this hypothesis came with the cloning of the first two E2 genes, RAD6/UBC2 and CDC34/UBC3, which indeed encoded homologous yeast proteins with a signature cysteine-containing active-site motif [8, 9]. The two E2s functioned in distinct biological processes – DNA damage tolerance [8, 10] and cell-cycle control [9] – providing the first hint that ubiquitination might regulate a broad range of biological processes. A family of E2 enzymes naturally suggested that there would also be a family of E3 enzymes. This prediction has since been strikingly confirmed. We now know that specific E2/E3 complexes function to modify specific substrates with ubiquitin.
3 What is an E2? 3
3 What is an E2?
A protein can be identified as an E2 enzyme according to several different criteria. Functionally, the E2 occupies an intermediate position in the conjugation cascade – that is, it acts between the E1 and the E3 (Figure 1). This property accounts for the original name of ubiquitin carrier protein, which drew an analogy to the acyl carrier proteins used in fatty acid biosynthesis and non-ribosomal peptide synthesis [6]. Subsequently, with the recognition that E2 enzymes often play an active role in conjugation, the conjugating enzyme name gained favor. Mechanistically, the E2 first participates in a thiol ester transfer reaction, in which the activated ubiquitin is moved from the active-site cysteine of E1 to that of the E2 (Figure 1). The E2/ubiquitin thiol ester intermediate is strictly required for downstream steps, as shown by ablation of substrate ubiquitination following mutation of the active-site cysteine residues of different E2s (see, for example, Refs. [11, 12]). The ubiquitin is then transferred from the E2 active site to the ε-amino group of the substrate’s lysine residue, forming an isopeptide bond. The conjugation site can also be a specific lysine on a previously conjugated ubiquitin, which leads to polyubiquitin chain elongation; chains linked through K48 are the principal signal for targeting substrates to proteasomes [1]. Transfer of ubiquitin to the substrate requires the assistance of the E3 [1, 6]. If this enzyme belongs to the HECT domain family (Homologous to E6AP C-Terminus), the ubiquitin is first transferred to an active-site cysteine residue of the E3; if the E3 belongs to the RING domain family (Really Interesting New Gene), ubiquitin is transferred directly to the substrate’s amino group (Section 6.3). Collectively, these properties constitute the biochemical definition of an E2 enzyme: it is a protein that accepts ubiquitin in thiol ester linkage from E1, and cooperates with an E3 enzyme to deliver this ubiquitin to the substrate. The functional specialization of individual E2s (Section 4) reflects the specificity of interaction of each E2 with its cognate E3(s), in conjunction with the E3’s substrate specificity. Therefore an E2 enzyme can also be defined according to the cognate E3(s) with which it interacts. The E3 partners of many of the eleven ubiquitin-dedicated E2s in Saccharomyces cerevisiae are conserved in higher organisms (Section 4). However, both the E2 and E3 families are much larger in higher organisms than in yeast. Present accounting suggests that there are 50–70 E2s, and hundreds of E3s, in mammals [13, 14]. The amino acids surrounding the thiol ester-forming cysteine residue are particularly highly conserved, but there is sequence similarity throughout the 150-residue E2 core domain (Figure 2). This bioinformatic definition makes it easy to identify E2 genes in sequenced genomes [13, 14]. In fact, many E2s consist of just this core domain (Figure 2). The fact that such E2s are often functionally distinct from one another indicates that modest sequence variation within the core domain can be highly significant. Structural biology has begun to shed light on this structure/function correlation (Section 6). Other E2s display N- and C-terminal extensions to the core
4 Ubiquitin-conjugating Enzymes
3 What is an E2? 5
Fig. 2 Ubc13 (1JBB). Canonical "/$ E2 fold with the active-site cysteine shown in ball-and-stick.
domain (Figure 2), which may play a role in E3 and/or substrate specificity (see Refs. [15–19]). Structural biology provides a final way to define an E2 enzyme. As expected from the strong sequence conservation, the E2 core domain adopts a conserved fold. At the time this article was being prepared, twelve different E2 structures had been deposited in the Protein Data Bank. The average root-mean-square deviation of the 150 Cα positions of these structures is less than 2Å. E2s are α/β proteins containing a central anti-parallel four-stranded β-sheet (S1–S4), four α-helices (H1, H3, H4, H5), and a small 310 helix (H2) (Figure 3) [20, 21]. The cysteine is located on an extended loop after β-strand 4 and immediately before the short 310 helix H2. The active-site cysteine sits in a shallow groove composed of residues from the H3–H4 and S4–H2 loops. The canonical α/β E2 fold is highly versatile, allowing E2 enzymes to associate with several different proteins in the ubiquitin conjugation cascade without any perturbation of the E2’s tertiary structure (Section 6). Residues occupying the face opposite the active site are less conserved than those surrounding the cysteine [21]. Sequence variation in this region contributes to the functional diversity of the E2 family by permitting specific interactions of individual E2s with cognate E3s and (perhaps) substrates.
6 Ubiquitin-conjugating Enzymes
4 Functional Diversity of Ubiquitin-conjugating Enzymes
The functional range of the ubiquitin-conjugating enzyme family is easily appreciated by considering the family members in a single organism. Table 1 summarizes key properties of the complete set of E2s in the yeast S. cerevisiae, including notable structural features, known cognate E3(s) and their key substrates, and biological functions (see also [22, 23]). We cannot give a comprehensive review of E2 functions in higher organisms, but we do comment on some notable instances of functional conservation, expansion, and divergence (see also [23]). 4.1 Functions Related to Proteasome Proteolysis
In many cases, the specific function(s) of a given E2 enzyme reflect its role in targeting one or more substrates for degradation by 26S proteasomes. The scope of this function varies considerably between E2 family members, however. At one extreme, the functionally redundant enzymes Ubc4 and Ubc5 are necessary for the turnover of many substrates, as shown by a marked reduction in the rate of turnover of endogenous short-lived and abnormal proteins in a ubc4ubc5 yeast strain [24]. The slow growth and stress sensitivity of this strain [24, 25] can also be ascribed to inhibition of proteasomal proteolysis since these phenotypes are characteristic of proteasome mutants [22, 26]. Despite the important role of Ubc4/5 in proteasome degradation, few E3 partners relevant to this function have been identified. One is Ufd4, a HECT-domain E3 that mediates the degradation of linear ubiquitin fusion proteins [27]. A ufd4 strain grows normally, however, indicating that Ubc4 has other cognate E3s. Rsp5, an essential HECT-domain E3, is one likely candidate since this E3 partners with Ubc4 in other pathways (see below). UBC1 is essential for viability in the ubc4ubc5 strain, suggesting that Ubc1 shares substantial functional overlap with Ubc4/5 in directing substrates to proteasomes for degradation [28]. The Ubc4/5 sub-family of E2s is much larger in mammals, where it includes both constitutively and selectively expressed enzymes. Notable human E2s in this group are UbcH5a/b/c, UbcH7, and UbcH8 (see Ref. [23]). The expansion is likely to reflect the much larger size of the E3 family in higher organisms. However, while the results of in vitro conjugation assays and protein-protein interaction studies suggest that certain E3s partner specifically with individual Ubc4/5 sub-family members, the degree of E3 (hence, functional) selectivity in the cellular setting remains quite uncertain (discussed in Refs. [23, 29]). RNA interference studies and mouse knockout models may be helpful in addressing this question in the future. Ubc3/Cdc34 supports the proteasome-mediated proteolysis of numerous substrates through its role as the specific E2 partner of a large family of multi-subunit RING E3s called SCF E3s (Skp-Cullin-F-box, Section 6.3). This role is preserved in higher organisms [30]. Yeast ubc3 mutants arrest in G1 phase of the cell cycle because they fail to degrade Sic1 [31], an inhibitor of the G1/S transition that is
4 Functional Diversity of Ubiquitin-conjugating Enzymes 7 Table 1 E2 enzymes of the yeast Saccharomyces cerevisiae
Gene
Amino acids
UBC1
215 [28]
Cognate E3
Unknown
Hrd1 [39] UBC2 (RAD6)
172 [8]
Ubr1 [156]
Ubr1 [52]
Rad18 [73], [157] Bre1 [67, 69]
UBC3 (CDC34)
295 [9]
SCF E3s
UBC4
148 [24]
Unknown Doa10 Rsp5
UBC5
148 [24]
UBC6
250 [46]
See UBC4
Unknown
Doa10 [42] Doa10 [42] UBC7
165 [25] Hrd1 [39]
Doa10 [42] UBC8
206 [160]
Unknown
UBC10
165 [75]
Unknown
Functions and substrates Short C-terminal tail harbors ubiquitin-associated (UBA) domain [155] Essential in ubc4ubc5 genetic background, suggesting a redundant role with Ubc4/5 in proteasomal turnover of short-lived and abnormal proteins [28] Role in ERAD that is not fulfilled by Ubc4/5 [39, 40] Proteasomal degradation of N-end rule [158] substrates, including cohesin fragment [51] Proteasomal degradation of Cup9 transcriptional repressor regulates peptide import DNA-damage tolerance [8] via monoubiquitination of PCNA [65] (non-proteolytic function) Ubiquitination of histone H2B [68] regulates gene transcription and silencing [71] (non-proteolytic function) Essential gene; long C-terminal tail; targets diverse substrates for proteasomal degradation [30, 34, 36]; regulation of cell-cycle progression Proteasomal degradation of diverse shortlived proteins [24] Proteasomal degradation of MATα2 transcriptional repressor [49] Endocytosis of membrane proteins [60, 61]; protein trafficking (see Ref. [63]) 92% identical to Ubc4; functionally redundant [24] C-terminal tail provides anchoring to ER membrane [46] Together with Ubc7, proteasomal degradation of some ERAD substrates [47, 48, 159] Proteasomal turnover of Ubc6 is Ubc6-, Ubc7-, and Doa10-dependent [42, 50] In conjunction with Ubc7, proteasomal turnover of MATα2 Localized to ER membrane via Cue1 [37] Role in proteasomal degradation via ERAD [38, 48] confers resistance to cadmium and other ER stresses [25] In conjunction with Ubc7, proteasomal turnover of MATα2 Glucose-induced proteasome degradation of fructose-1,6-bisphosphatase [57] Also called Pas2/Pex10. Peroxisome biogenesis [75]; Pex10 is a candidate E3 [78]
8 Ubiquitin-conjugating Enzymes Table 1 (continued)
Gene
Amino acids
Cognate E3
Functions and substrates
UBC11
156 [84]
Unknown
UBC13
153
Unknown; similar to clam E2-C (E2-C functions in mitotic cyclin turnover [79], but Ubc11 is dispensable for this process in yeast [84]) Heterodimerizes with Mms2 (UEV) [72] DNA-damage tolerance [72] via polyubiquitination of PCNA [65] (non-proteolytic function) Essential gene; E2 dedicated to Smt3 (SUMO) [163] Septin modification E2 dedicated to Rub1 (Nedd8) Modification of specific cullin lysine residue activates cullin-based E3s [165]
Rad5 [73]
UBC9
157 [161]
UBC12
188 [164]
Siz1/2 [162] SCF E3s
recognized and polyubiquitinated by a specific SCF E3 [32, 33]. Although this is the only essential function of yeast Ubc3 [31], this E2 partners with many other SCF E3s to target diverse substrates for degradation by proteasomes (see Refs. [30, 34–36]). Although studies in yeast suggest that Cdc34 is the main E2 partner of SCF E3s, some SCF E3s seem to partner with Ubc4/5-type E3s (see Refs. [23, 36]). Ubc3 has a long C-terminal tail (Table 1), making it the most distinctive yeast E2 in terms of primary structure. A chimeric E2 in which the Ubc3 tail is appended to the core domain of Ubc2 fulfills the essential function of Ubc3 in yeast, suggesting that the tail of Ubc3 is necessary for key interactions with the E3 or Sic1 [15, 16]. Ubc7 is localized to the endoplasmic reticulum (ER) through an interaction with a partner protein, Cue1 [37], and plays a major role in proteasome degradation. Ubc7 acts on misfolded proteins of the ER, which are ejected from this compartment as a prelude to ubiquitination at the cytosolic face of the ER membrane and degradation by cytosolic proteasomes [38]. Ubc7’s role in ERAD (ER Associated Degradation) explains why a ubc7 strain is conditionally sensitive to agents that induce protein misfolding in the ER [25, 39–41]. Ubc7 frequently partners with Hrd1, an ERlocalized RING E3 [39], but some ERAD substrates of Ubc7 seem to be recognized in cooperation with a different ER-localized RING E3, Doa10 [42]. Consistent with Ubc7’s prominent role in ERAD, the yeast UBC7 and CUE1 genes are induced as part of the Unfolded Protein Response (UPR) and there is a synthetic lethal relationship between certain ERAD and UPR genes [40, 41]. Yeast Ubc1 also plays a significant, but poorly-defined, role in ERAD [39, 40]. Mammalian Ubc7 also functions in ERAD [43–45]. Ubc6 localizes to the ER through its own C-terminal membrane anchor [46]. Although Ubc6 plays a role in ERAD, its function in this process is less conspicuous than that of Ubc7 [43, 47, 48]. Interestingly, Ubc6 and Ubc7 both contribute to the Doa10-dependent degradation of a soluble nuclear protein [42, 49], and Ubc6 is
4 Functional Diversity of Ubiquitin-conjugating Enzymes 9
itself rapidly degraded by proteasomes in a manner that depends on its own activesite cysteine, its C-terminal membrane anchor, functional Ubc7, and Doa10 [42, 50]. The purpose of this instability remains mysterious. Ubc2 functions rather selectively in proteasome proteolysis. In yeast, two specific E3 partners are known, leading to proteasome degradation events that regulate chromosome stability [51], peptide import [52], and homing endonuclease stability [53]. Mammals have two closely-related Ubc2 isoforms, each of which complements most of the functions of the yeast ubc2 strain [54]. But the mammalian Ubc2 isoforms also have specialized functions – one of them is required for spermatogenesis in the mouse [55] and at least one of them can be inferred to be necessary for cardiovascular development [56]. So far, Ubc8 has been implicated in the regulated turnover of just one substrate, and its E3 partner(s) remain unknown [57]. Interestingly, the closest mammalian relative of yeast Ubc8 is expressed with a restricted tissue specificity and (in some tissues) in a regulated manner [58, 59]. 4.2 Endocytosis and Trafficking
Just because an E2 functions in proteasome proteolysis does not mean that its functions are limited to this pathway. This is because the E2’s functional range is largely determined by the substrate specificity of its E3 partner(s). For example, yeast Ubc4 and Ubc5 play a prominent role in proteasome degradation, but they also cooperate with a HECT E3, Rsp5, to mono-ubiquitinate certain plasma membrane receptors [60–62]. This modification signals receptor endocytosis, leading to degradation in the vacuole (equivalent to the mammalian lysosome). Ubc1 is partially redundant with Ubc4/5 in endocytosis [61, 62], as seen in ubiquitination reactions leading to proteasome degradation (above). Thus, Ubc1, 4, and 5 may be able to substitute for one another in complexes involving many different E3s, probably reflecting the strong conservation of the core domain between Ubc1 and Ubc4/5. Ubc4 and Ubc5 may also act with Rsp5 to regulate protein trafficking downstream of endocytosis (reviewed in Ref. [63]). 4.3 Non-proteolytic Functions
As mentioned in Section 2, Ubc2 is the defining player in a conserved DNA damagetolerance pathway [8, 10, 64]. Here Ubc2 partners with a RING E3, Rad18, to modify a DNA polymerase processivity factor with a single ubiquitin [65]. This modification signals error-prone bypass of DNA lesions [66]. Ubc2 partners with a different RING E3 (Table 1) to mono-ubiquitinate histone H2B [67–69]. This modification promotes histone methylation, which in turn regulates transcription and silencing [70, 71]. Ubc13 participates in the same DNA damage-tolerance pathway as Ubc2 [72]. Ubc13 collaborates with two enzyme partners (Table 1) to modify the DNA polymerase cofactor (above) with a K63-linked polyubiquitin chain, which signals
10 Ubiquitin-conjugating Enzymes
error-free lesion bypass [65, 72, 73]. In higher organisms, Ubc13 also helps to synthesize K63-linked polyubiquitin chains in a second non-proteolytic signaling pathway (see Ref. [74]). The function and mechanism of Ubc13 are discussed in more detail below (Section 7). 4.4 E2s of Uncertain Function
Ubc10 is required for the biogenesis of the peroxisome, an oxidative organelle [75]. This E2 plays a role in peroxisomal protein import [76] and is recruited to the peroxisomal membrane through an interaction with a partner protein [77]. Membrane-localized Ubc10 also seems to be spatially proximal to Pex10, which has a RING-like domain [78]. Whether Pex10 is a cognate E3 of Ubc10 remains to be determined, as does the mechanistic role of ubiquitin conjugation in peroxisome biogenesis. The function of the remaining yeast E2, Ubc11, remains uncertain. Ubc11 is very similar to E2-C, a clam E2 that acts with an essential multi-subunit RING E3, the anaphase promoting complex (APC) or cyclosome, to ubiquitinate mitotic cyclins [79]. This reaction leads to cyclin degradation by proteasomes, which drives exit from mitosis [35, 80]. Mitotic cyclin ubiquitination can be reconstituted in vitro with apparent amphibian and fission yeast orthologs of either Ubc11 or Ubc4 [81, 82] and other data implicate both E2s in this process in higher cells [82, 83]. However, mitotic cyclins are efficiently degraded in budding yeast ubc4 and ubc11 strains, indicating that other E2 enzymes can support this essential function in S. cerevisiae [84]. 4.5 E2 Enzymes and Disease
There are now several striking examples of disease-related defects in ubiquitin conjugation, but most of them involve E3s rather than E2s. This is not surprising given the paramount role of E3s in substrate selection and the corresponding intensity of research effort that has been focused on E3s. Still, there are several hints that defects at the E2 level of the conjugation cascade can also contribute to disease. Many viruses subvert the ubiquitin system to evade the host cell’s defenses or modulate the cellular environment so as to promote viral replication (see Refs. [85, 86] and Section 7). The genome of African swine fever virus encodes an E2 enzyme that is somewhat similar to yeast Ubc3 [87, 88]. This enzyme might alter the activity or specificity of the host cell’s conjugation cascade so as to benefit the virus, or it could act on specific viral proteins. Herpes Simplex Virus-1 (HSV-1) encodes an E3 enzyme that specifically binds the host cell’s Ubc3/Cdc34 enzyme and targets this E2 for ubiquitination and (presumably) degradation – events that may help to stabilize specific cyclins and promote viral replication [89]. A different kind of relationship between an E2 enzyme and disease is exemplified by the finding that the Alzheimer’s amyloid-β peptide induces the expression of
6 The Biochemistry of E2 Enzymes
E225K , a mammalian relative of yeast Ubc1 [90]. E225K was found to play a major role in amyloid-β-dependent neuronal cell killing. This effect may be related to the E225K -dependent production of aberrant polyubiquitin chains, leading to the inhibition of proteasomes [90, 91]. Other studies showed that the human homolog of yeast Ubc11 is over-expressed in numerous cancer cell lines and primary tumors and that forced over-expression of this E2 in cultured cells can drive proliferation and transformation [92]. Similarly, transformation and chromosomal abnormalities were observed following over-expression of human Ubc2b [93]. Such disease-related over-expression effects could arise in two different ways. The higher E2 concentration could lead to a relaxation of specificity – that is, pairings with non-cognate E3s – leading to inappropriate ubiquitination events. Alternatively, specificity could be maintained, but an inappropriately high flux through the normal E2/E3 pathway could lead to the excessive ubiquitination of cognate substrates.
5 E2 Enzymes Dedicated to Ubiquitin-like Proteins (UbLs)
Ubiquitin is just one member of a family of protein modifiers that share a common fold and a common mechanism of isopeptide tagging [70, 94, 95]. Like ubiquitin, individual UbLs are activated at a C-terminal glycine residue by a specific E1 enzyme. Often, the next step is transfer to a specific E2 enzyme. Certain UbL-specific E2 enzymes are so similar to ubiquitin-conjugating enzymes that they were initially thought to be members of the ubiquitin-conjugating enzyme family. This was true of yeast Ubc9 and Ubc12, which are dedicated to Smt3/SUMO and Rub1/Nedd8, respectively (Table 1). SUMO modifies numerous cellular proteins and has a broad functional range [94], but the only known target of Nedd8 is a specific lysine residue in one subunit (the cullin) of SCF E3s. Nedd8 modification activates these E3s (see Ref. [70]). The reader should consult earlier reviews [70, 94, 95] for a detailed discussion of UbL biology and biochemistry. There are two important points for the current discussion. First, the conjugation cascades of UbLs differ from that of ubiquitin chiefly in terms of complexity – there is one conjugating enzyme per UbL, and many fewer E3s. Second, because modifier proteins (including ubiquitin) do not interact strongly with their dedicated E2s (Section 6.1), it is believed that E1 enzymes play the major role in matching E2s with the correct modifier protein (see Ref. [96]).
6 The Biochemistry of E2 Enzymes 6.1 E1 Interaction
An E2 needs to associate with several different proteins in the course of the ubiquitin conjugation cascade, with the first being the E1. Mutational studies conducted with
11
12 Ubiquitin-conjugating Enzymes
the SUMO-specific E2 Ubc9 suggest that free Ubc9 associates with its free cognate E1 through a surface of Ubc9 that includes the C-terminal residues of α-helix H1 and residues in a loop between β-strands S1 and S2 (Figure 3) [97]. This surface of Ubc9 is also important for thiol ester bond formation [97]. Several residues in α-helix H1, particularly the C-terminal residues, are poorly conserved among E2s, and Ubc9 contains a five-residue insertion in the loop between β-strands S1 and S2 (Figure 2). Thus, this region of E2s may contribute to specificity for their cognate E1s. Consistent with this idea, the N-terminal helix (H1) of a ubiquitin E2 was found to be important for E2/ubiquitin thiol ester formation [98]. One cautionary note is that Ubc9 displays a substantial affinity for its free E1 [97], whereas ubiquitin E2s bind tightly to their E1 only after it has been loaded with ubiquitin [6, 99, 100]. The structural basis of this effect remains to be determined. 6.2 Interactions with Thiol-linked Ubiquitin
As a consequence of interacting with ubiquitin-loaded E1, the E2 accepts the activated ubiquitin at its active-site cysteine residue. This thiol ester complex, although biochemically detectable [6], has not been crystallized because it is labile in comparison to the requirements of structural biology. However, NMR chemical shift perturbations have been used to map the binding surface of ubiquitin onto human Ubc2b [101], yeast Ubc1 [102], and human Ubc13 [103]. All three models map the ubiquitin-binding surface of the E2 to a common area that includes parts of α-helix H3, the loop between α-helices H3 and H4, and residues around the active-site cysteine in the extended S4-H2 loop, including part of the 310 helix H2 (Figures 2 and 4). The C-terminus of ubiquitin extends around part of the E2 and is constrained in a cleft [102, 103]. Although this contact surface is detectable in the thiol ester, free E2s display a negligible affinity for free ubiquitin [6]. Thus, the covalent E2/ubiquitin bond enables the formation of these non-covalent contacts. 6.3 E3 Interactions
After E2/ubiquitin thiol ester formation, the ubiquitin must be transferred to the substrate, which is sometimes another ubiquitin. An E3 is usually required for this reaction in vitro, and is always required in vivo. There are three known types of E3s: the RING domain, HECT domain, and U-box (UFD2 homology) families. RING and U-box E3s act as bridging factors for E2s and substrates, but HECT E3s use a different mechanism, adding an extra step to the pathway (Section 6.3.3). 6.3.1 RING E3/E2 Interactions The small RING domain coordinates two zinc ions in a cross-brace arrangement [104]. The domain is defined by the presence of eight zinc-binding groups (cysteines and histidines) with a conserved spacing, such that the distance between the two zincs is conserved at 14 Å [104]. Sequence conservation between RING domains is
6 The Biochemistry of E2 Enzymes
Fig. 3 Ubc1/ubiquitin thiol ester complex model (1FXT). The surface of Ubc1 is shown with residues implicated in ubiquitin binding colored purple and the active-site cysteine colored yellow. Ubiquitin is colored green.
otherwise minimal. The RING-domain fold consists of a central α-helix and several small β-strands separated by loops with variable lengths [105]. RING E3s can be either single-subunit or multi-subunit enzymes. The crystal structure of UbcH7 complexed to a single-subunit RING E3, c-Cbl, shows that the RING domain is the main site of contact, although there are a few intermolecular hydrogen bonds to a non-RING helix of the E3 [106] (Figure 5). The structure of the E2 in the c-Cbl/UbcH7 complex is unchanged relative to free E2 structures. The main basis for the interaction is the packing of several hydrophobic residues of UbcH7, notably F63, into a shallow groove on the RING domain surface. These residues come from the S3–S4 and H2–H3 loops (Figure 3). The α-helix H1 of UbcH7 makes the hydrogen-bond contacts to the non-RING α-helix. Interestingly, even though the c-Cbl/UbcH7 structure undoubtedly shows conserved RING/E2 contacts, this complex is catalytically inactive (cited in Ref. [107]). Therefore additional E2/RING contacts may be needed for catalytic competence. The surface of UbcH7 that contacts c-Cbl does not overlap with the E2 surface that contacts ubiquitin (Figures. 4 and 5; see also Ref. [108]), confirming that the
13
14 Ubiquitin-conjugating Enzymes
Fig. 4 UbcH7/c-Cbl complex (1FBV). The surface of UbcH7 is shown with residues interacting with the c-Cbl RING domain shown in red and the active-site cysteine shown in yellow. c-Cbl is colored green.
E2/ubiquitin thiol ester can associate with a RING E3. The E2 surface that contacts cCbl does, however, overlap the E2 surface implicated in E1/E2 interactions (Section 6.1). Thus, the E1 may have to depart from the E2/ubiquitin complex before E2/E3 interactions can take place. The closest approach of a RING-domain residue to the active-site cysteine of UbcH7 is about 15 Å, arguing against a role for RING E3s in chemical catalysis [106]. Instead, RING E3s have been proposed to facilitate ubiquitination by inducing physical proximity of the E2/ubiquitin thiol ester and the substrate [23, 30, 106, 109]. Catalysis would result from the increased local concentrations of the two reactants (discussed further below). RING/E2 interactions have also been studied with BRCA1. This E3 is unique in that it must heterodimerize with a second RING-domain protein, BARD1, in order to display maximal E3 activity [110]. Even though the heterodimer interface leaves both RING domains available for interaction [110], UbcH5c binds exclusively to the BRCA1 RING [107]. The interacting surface of UbcH5c is homologous to the surface of UbcH7 that contacts the c-Cbl RING domain in that several residues of UbcH5c pack into a cleft on the BRCA1 RING domain. But UbcH5c also makes several
6 The Biochemistry of E2 Enzymes
contacts with the C-terminus of the BRCA1 RING domain and with a non-RING region of the heterodimer [107]. These extra contacts are not observed when UbcH7 binds to the BRCA1/BARD1 complex [107]. Because the BRCA1/BARD1/UbcH7 complex is inactive, the extra contacts observed with UbcH5c may help to create a competent E2/E3 complex. Despite the greater complexity of multi-subunit RING E3s, a common theme is evident – all SCF E3s, as well as several other types of cullin-based E3s, utilize a common RING-domain subunit, the small protein Rbx1 (reviewed in Refs. [30, 111]). Four subunits compose the minimal SCF E3 ligase complex: a cullin scaffold, Rbx1, an adaptor protein (Skp1), and a substrate-binding subunit that connects to the adaptor through a conserved domain called the F-box (see Ref. [30]). The cullin acts as a scaffold, with Rbx1 binding to one end to form a cullin/Rbx1 subcomplex that recruits the E2 and, in many cases, displays a substrate-independent ubiquitinligase activity (see Refs. [23, 30]). The crystal structure of the mammalian SCFSkp2 (Skp1/Cul1/F-boxSkp2 /Rbx1) E3 ligase shows a remarkably rigid, elongated complex [112]. The Cul1 scaffold contains three cullin-repeat motifs that span 110Å, with Rbx1 binding to a discrete C-terminal α/β domain. The Skp1/F-boxSkp2 complex binds to the opposite (Nterminal end) of the cullin. Rbx1 displays a hydrophobic groove, as seen previously in the c-Cbl RING domain [106, 112]. In c-Cbl, this groove provides an interaction surface for UbcH7 and it is reasonable to assume a similar mode of interaction in the case of Rbx1. Interestingly, the site where Nedd8 modifies the cullin is close to where the E2 binds, consistent with data which suggest that neddylation modulates E2 binding or activity [113, 114]. A model of the full SCF/E2 complex [112] shows that the end of Skp2 which binds the substrate is pointed toward the Rbx1-bound E2, with a 50-Å gap between the two. Models based on two other SCF structures show similar distances between the F-box protein and the E2 [109, 115]. Whether this gap can be bridged by the bound substrate is currently unclear. It has been suggested that the E2 may bind to Rbx1 somewhat differently than UbcH7 is observed to bind in the c-Cbl RING/UbcH7 complex, but it is not obvious that this can lead to a 20 Å movement of the E2 toward the bound substrate as suggested [109]. One could also imagine that the bound substrate and E2 “meet” through conformational changes of the SCF complex. However, the rigid separation produced by the Cul1 scaffold seems to be important for activity – introducing a flexible linker into the center of Cul1 produced a protein that could still bind an E2, but did not catalyze substrate ubiquitination [112]. An interesting study established a positive correlation between the rate of dissociation of the Ubc3/ubiquitin intermediate from the RING domain and the rate of Sic1 ubiquitination catalyzed by SCFCdc4 [116]. The authors proposed that the role of the RING domain is to bring the charged E2 into the vicinity of the SCF-bound substrate, but that release of the charged E2 is important to bridge the gap and enable multiple substrate lysines to be targeted. However, although these mechanisms may place the substrate’s lysine residue in the general vicinity of the E2’s active site, it is unclear that they can establish an effective orientation of the lysine residue and the thiol ester bond. Since
15
16 Ubiquitin-conjugating Enzymes
there is no known consensus site for ubiquitination [23, 117, 118], it is unlikely that specific molecular contacts in the vicinity of the substrate’s lysine residue are used to position this attacking group. Overall, it remains unclear how the substrate’s lysine residue approachs the E2 active site. 6.3.2 U-box E3/E2 Interactions The U-box family of E3s bind E2s through the small U-box domain [119]. Some U-box E3s do not seem to have their own cognate substrates, but instead promote polyubiquitination of the substrates of other E3s [120]. Other U-box E3s have defined cognate substrates and behave in a canonical manner [121, 122]. An NMR structure [123] confirmed an earlier prediction [124] that the Ubox domain has a RING-domain-like fold. Remarkably, the U-box domain uses hydrogen-bonding networks in place of zinc coordination to support the characteristic cross-brace arrangement. These interactions stabilize a globular fold consisting of a central α-helix surrounded by several β-strands, which are separated by loops of variable length [123]. There is a shallow groove in the surface located in a position homologous to the E2-interacting surface of RING domains. Mutational studies have linked E3 ligase activity to some of the residues in the surface groove [123, 125]. Since these mutations do not disrupt the U-box fold, they are likely to abrogate E2 binding. Although it is likely that E2s bind similarly to the U-box and RING domains, no E2/U-box structure has been reported so far. 6.3.3 HECT E3/E2 Interactions HECT-domain E3s are defined by the presence of a domain of 350 residues that is homologous to the C-terminus of the founding family member, E6AP (E6 Associated Protein [126]). E6AP is known for its role in binding the E6 protein of oncogenic human papilloma viruses and targeting the p53 tumor suppressor for ubiquitination and degradation [127]. HECT-domain E3s possess an active-site cysteine residue positioned 35 residues upstream of the C-terminus; a thiol ester with ubiquitin is formed at this site and is required for substrate ubiquitination [128]. The crystal structure of an E6AP/UbcH7 complex showed that the HECT domain is L-shaped, with a large, mostly α-helical, N-terminal lobe and a small C-terminal lobe with an α/β structure [108] (Figure 6). UbcH7 binds to the end of the Nterminal lobe and somewhat parallel to the C-terminal lobe, forming an overall U-shaped complex. UbcH7 binds in a large hydrophobic groove in the N-terminal lobe [108]. As seen in other E2/E3 structures, neither the E2 enzyme nor the HECT domain changes its overall fold upon binding. UbcH7 contacts its binding groove with residues from the S3–S4 loop and the H2–H3 loop (Figure 3). A few contacts are also made with the C-terminal portion of α-helix H1. Remarkably, these are the same two loops and helix that bind the RING domain in the c-Cbl/UbcH7 structure [106]. That two different E3s contact a largely similar surface on UbcH7 (Figures 5 and 6) can be explained through the nature of key side-chain contacts. The S3–S4 loop seems to provide most of the specificity, as it contains the F63 residue that is present in all E2s that are known to bind both
6 The Biochemistry of E2 Enzymes
Fig. 5 UbcH7/E6AP (1C4Z). The surface of UbcH7 is shown with residues interacting with the E6AP HECT domain shown in blue and the active-site cysteine shown in yellow. E6AP is colored green with the active-site cysteine between the N- and C-lobe shown in yellow.
HECT E3s and c-Cbl [106, 108]. In c-Cbl, the main contacts for F63 are isoleucine and tryptophan residues located in the RING groove [106]. Both F63 and its contact site in c-Cbl are seen to vary in other E2/RING E3 pairs, suggesting that interactions between these three specific residues are needed, but that the nature of the contact can vary [106]. In other words, a different E2 could bind to a different E3 with a similar geometry, but through different types of side-chain contacts. UbcH7-F63 makes specific contacts with six E6AP residues, so this interaction is likely to be important for all HECT/E2 pairs [108]. The H2-H3 loop that makes the other major contacts with the HECT domain is part of the more variable E2 surface (Section 3). The specific contacts made between the H2-H3 loop of UbcH7 and the E6AP HECT domain could be used to correctly predict the E2 preferences of E6AP and Rsp5 [108]. Thus, residues in the S3-S4 and H2-H3 loops play an important role in determining the specificity of E2/E3 interactions. Positioned near the bend between the two lobes of the HECT domain is its activesite cysteine [108]. This residue is 41 Å away from the active-site cysteine of UbcH7 (Figure 6), suggesting that a large conformational change is needed to bring about
17
18 Ubiquitin-conjugating Enzymes
transfer of ubiquitin from E2 to E3. Such a mechanism was confirmed in the crystal structure of another HECT domain [129]. The WWP1-HECT domain resembles the E6AP HECT domain in having two lobes, but their relative positions differ. In the WWP1-HECT domain the two lobes form an inverted T-shape as opposed to the Lshape seen in the E6AP HECT structure [108, 129]. This conformational change can be brought about by a rotation around three residues in a hinge loop connecting the two lobes. Modeling in the E2 in a position homologous to that seen in the E6AP/UbcH7 structure, the distance between the two active-site cysteines decreases to 16 Å [129]. With additional rotation around the hinge loop the WWP1 activesite cysteine can be brought within 5 Å of the E2’s active-site cysteine. Mutational studies suggested that the flexibility of this hinge loop is indeed important for ligase activity [129]. This flexibility requirement points to possible models for ubiquitin transfer and polyubiquitin chain elongation. One possibility is that the HECT domain adds ubiquitin to target substrates one at a time. This would imply that the E3 changes specificity – from recognizing the substrate to recognizing the ubiquitin – following transfer of the first ubiquitin to the substrate. A different possibility is that the chain is built up by the HECT E3 first, and then transferred as a unit to the substrate. This would require two active sites, one to hold the growing polyubiquitin chain, and the other to hold the next ubiquitin to be added. HECT E3/E2 complexes would satisfy this condition. RING E3/E2 complexes cannot, and thus would have to utilize another mechanism, presumably building on their rigid architecture. In an attractive model [129], the HECT cysteine could hold the first ubiquitin (and later the growing chain), while the C-lobe could rotate to position the first ubiquitin’s target lysine near the thiol ester bond of the bound E2/ubiquitin intermediate. Subsequent rounds of ubiquitin addition to the chain terminus would require the C-lobe to keep rotating, ultimately ending in steric problems for the chain which could favor its transfer to the substrate. A third general model for polyubiquitin chain extension is that the initiation and elongation phases of the reaction involve different E2 enzymes, different E3 enzymes, or different E2/E3 complexes. The modification of a substrate with a noncanonical polyubiquitin chain follows the third model. The Rad6/Rad18 complex ligates the first ubiquitin, while the Mms2/Ubc13/Rad5 complex extends the chain [65, 73]. The extension of K48-linked chains from ubiquitin fusion proteins seems to involve the sequential action of two different E3s with the same E2 [120]. In another possible example, two E2s (orthologs of Ubc11 and Ubc4) have been suggested to act sequentially with the APC in the polyubiquitination of mitotic cyclins in fission yeast [82]. 6.4 E2/Substrate Interactions
With HECT domain E3s, all of the chemistry of isopeptide bond formation occurs at the E3 active site. With RING and U-box E3s, however, the E2 participates directly in this chemical reaction, so the substrate’s lysine must closely approach
6 The Biochemistry of E2 Enzymes
the E2’s active site. A crystal structure of the SUMO E2 Ubc9, complexed with a large fragment of RanGAP1 (an efficient sumoylation substrate), reveals the specificity of this interaction [130]. Unlike ubiquitination, sumoylation is site-specific. The target lysine for sumoylation lies within a tetrapeptide sequence motif -K-X-D/E, where is a hydrophobic residue, K is the target lysine, and X is any residue. Ubc9 makes specific interactions with each of these consensus-motif residues in a manner that places the lysine ε-amino group within 3.5 Å of the Ubc9 active-site cysteine [130]. The lysine approaches the cysteine from what is expected to be its unencumbered (by SUMO) side [102]. The interacting surface on Ubc9 involves α-helix H4, the loop preceding it, and the extended S4-H2 loop, including the active-site cysteine [130] (Figure 3). This surface does not overlap with the presumptive binding surface for SUMO. This mode of interaction is unlikely to hold with ubiquitin E2s, since no general consensus site for ubiquitination is known. A model for ubiquitin E2/substrate interactions has also been proposed for the special case in which the substrate is ubiquitin [131]. The crystal structure of the Mms2/Ubc13 complex led to the modeling of an E2/UEV/ubiquitin (donor)/ubiquitin (acceptor) model. As discussed in Section 7, UEV (Ubiquitin E2 Variant) proteins such as Mms2 are homologous to E2s, but lack the active-site cysteine residue. Known UEV/Ubc13 complexes act as E2 enzymes specialized for the synthesis of K63-linked polyubiquitin chains [72, 132]. In the model [131], Ubc13 is bound to the donor ubiquitin through a thiol ester bond in a manner that agrees well with inferences from NMR analysis of the Ubc1/ubiquitin thiol ester [102] (Figure 7). The position of the non-covalently bound acceptor ubiquitin is determined by the orientation of Mms2 on Ubc13 (Figure 7). The acceptor ubiquitin has its K63 side chain placed to enter the active site of Ubc13 to form a diubiquitin conjugate. The model suggests that K63 of ubiquitin is selected as the conjugation site through steric exclusion of other lysines, as determined by an interaction between Mms2 and a region of ubiquitin that is distant from K63 [131]. Recent NMR studies have confirmed and refined this model [103]. Thus, the substrate lysine is presented to the active-site cysteine through an indirect mechanism, in contrast to the Ubc9/RanGAP1 example in which the E2 interacts directly with the lysine residue itself [130]. Unlike most ubiquitination reactions, the modification of ubiquitin itself is often site-specific. The Mms2/Ubc13/ubiquitin model can help to explain this phenomenon. 6.5 E2 Catalysis Mechanism
Chemical catalytic mechanisms in the ubiquitin conjugation cascade have proved difficult to decipher. The reactions leading to E2/ubiquitin thiol ester and isopeptide bond formation would be facilitated by electrostatic stabilization of the oxyanion and deprotonation of the attacking amino group (isopeptide bond formation) by a general base [133, 134]. However, while the sequence conservation around the E2 active site is very high (Figure 2), all E2 structures show a lack of candidate catalytic residues close to the cysteine (see Refs. [23, 135]). Although catalytic residues could
19
20 Ubiquitin-conjugating Enzymes
Fig. 6 Ubc13 interaction surface. The interacting surfaces have been mapped onto Ubc13. The active-site cysteine is shown in yellow. Colored surfaces contact: covalently bound ubiquitin (purple); RING domains (red); E1 (presumptive, green); acceptor ubiquitin involved in K63-linked polyubiqutin-chain synthesis (blue).
be contributed by other enzymes in the cascade or by the E2 backbone, structural data argue strongly against a chemical catalytic contribution where it may be needed most – in reactions involving RING and U-box domain E3s. Recent studies [136] addressed the role of a strictly conserved asparagine positioned just upstream of the active-site cysteine (N79 in Ubc1 numbering, Figure 2). In existing E2 structures the asparagine is hydrogen-bonded to the backbone or a side chain. It is distant from the E3 contact surface and, as expected, it is dispensable for E2 binding to RING domain E3s. However, the asparagine is critical for E2-catalyzed and RING E3/E2-catalyzed ubiquitin conjugation reactions. The similar effect of asparagine mutation on the two types of conjugation reactions is reasonable given that E2s do not experience structural perturbations upon binding to E3s (above). The data suggest that an intrinsic catalytic role of the asparagine side chain is brought into play through RING-mediated recruitment of the catalytically competent E2/ubiquitin thiol ester. The asparagine is dispensable for upstream and downstream thiol transfer reactions, suggesting that catalytic residues for these reactions may be located in the E1 and HECT E3 active sites. A specific proposal for the role of the asparagine was developed in a model which breaks the hydrogen bonds to the backbone and rotates the asparagine toward the active-site cysteine [136]. Molecular modeling suggested that the asparagine can be positioned to donate a hydrogen bond to the oxyanion (Figure 8) [136]. Many
7 Functional Diversification of the E2 Fold 21
Fig. 7 Model for catalytic role of E2 activesite asparagine. The side chain of the asparagine in the conserved “HPN” motif (Figure 2) stabilizes the oxyanion that
forms when the substrate’s lysine attacks the E2/ubiquitin thiol ester bond. N79 is numbering for Ubc1 (Figure 2).
cysteine proteases, including deubiquitinating enzymes, use an amide side chain in this manner [134, 137–139]. Structural studies of a deubiquitinating enzyme have shown that the entry of ubiquitin into the active site causes a histidine and an asparagine to shift their positions so that the histidine becomes the general base and the asparagine provides the oxyanion hole [137]. Similarly, ubiquitin binding in the E2 active site could be a trigger that repositions the asparagine. So far, no general base is evident in E2s, but this group may not be needed due to the lability of the thiol ester bond.
7 Functional Diversification of the E2 Fold
Increasing evidence suggests that evolution has used (and is using) the E2 fold for new purposes. In one apparent example of functional expansion, E2 core domains have been observed to be embedded within much larger polypeptide chains [140, 141]. The functional properties of these massive E2s remain poorly characterized, and it is likely that more of them will be discovered. But the clearest case of functional diversification is provided by the UEV proteins. UEVs are related to E2s in their primary, secondary, and tertiary structures, but they lack an active-site cysteine residue and therefore cannot function as canonical E2s [142]. Nonetheless they play several different roles in ubiquitin-dependent pathways. Mms2 and its close (mammalian) relative Uev1a form heterodimers with Ubc13 [72, 132]. Each complex plays a key role in the synthesis of K63-linked chains, which act as non-proteolytic signals in different cellular pathways. The Mms2/Ubc13 complex participates in the UBC2/RAD6-dependent DNA damage tolerance pathway by polyubiquitinating the DNA polymerase processivity factor called PCNA (Proliferating Cell Nuclear Antigen) [65, 72]. To be activated for this pathway, PCNA is first mono-ubiquitinated by the Rad6/Rad18 complex, and then modified
22 Ubiquitin-conjugating Enzymes
with a K63-linked polyubiquitin chain by the Mms2/Ubc13/Rad5 complex (Rad18 and Rad5 are RING E3s) [65, 73]. The Mms2/Ubc13 complex has a core ubiquitin polymerization activity [72]. Rad5 might stimulate this activity [132] or target the Uev/E2 complex to PCNA [73], or both. The related human Uev1a/Ubc13 complex is involved in NFκB signal transduction [132]. It plays an intermediate role in the signaling cascade that starts with a proinflammatory cytokine signal and culminates in the nuclear translocation of the active NFκB transcription factor. In this pathway the Ubc13/Uev1a complex modifies a RING E3, Traf6, with K63-linked polyubiquitin chains [132]. This modification is linked to Traf6 oligomerization. It instigates a cascade of kinase reactions ultimately cause the ubiquitination and degradation of NFκB’s inhibitory partner, IκBα [74, 143]. The crystal structure of Mms2 has been solved alone and in complex with Ubc13 [131, 144]. The overall fold is similar to that in E2s, containing a central fourstranded anti-parallel β-sheet surrounded by α-helices. Differences include the absence of the C-terminal α-helix H5 in the shorter Mms2 protein. The helical Nterminus of Mms2 is also extended compared to Ubc13, and this region plays the major role in heterodimer formation. Two ubiquitins can bind to the heterodimer (Section 6.4) and the surfaces they contact do not overlap with the surface contacted by Rad5 [131, 145]. Ubiquitin plays a crucial role in a protein-trafficking pathway that delivers specific cargo proteins to regions of the late endosome membrane that invaginate into the lumen, thereby targeting these proteins to the vacuole/lysosome (reviewed in Ref. [146]). A different UEV protein, called Tsg101 in humans (Tumor Susceptibility Gene) and Vps23 in yeast (Vacuolar Protein Sorting), is part of a large complex that plays a critical role in the sorting step. Cargo proteins are selected based on their conjugation to mono-ubiquitin; the specific role of the UEV protein is to bind the cargo-linked mono-ubiquitin moiety [147]. HIV-1 and certain other viruses subvert this function of Tsg101 in order to bud from the plasma membrane [148– 150]. Mechanistically, Tsg101 is recruited to the virus budding sites by binding to a tetrapeptide “PTAP” motif in the late domain of viral proteins such as HIV1-GAG. Tsg101 is essential for virus budding from the plasma membrane [148], so it is possible that the endocytic budding machinery is hijacked to the plasma membrane via the Tsg101/GAG interaction [85]. The solution structure of the Tsg101 UEV domain has been solved alone and in complex with a PTAP-containing peptide [151, 152]. Human Tsg101 contains 390 amino acids, with the UEV domain located at the N-terminus [152]. The UEV domain is the minimal region needed to bind HIV-1 GAG, and is also the domain involved in mono-ubiquitin recognition and binding [153]. The overall fold of the UEV domain is similar to that of E2s. One notable difference is the presence of an extra N-terminal α-helix on Tsg101 [152]. The other major difference is the absence in Tsg101 UEV of the two C-terminal α-helices of the E2s – a truncation that was also seen in the Mms2 structures. This truncation appears to be a special trait of UEV proteins [154]. In Tsg101 UEV, the absence of the C-terminal helices helps to create the binding site for the PTAP peptide [151].
8 Conclusions
When aligning the structures of a canonical E2, Mms2, and the Tsg101 UEV domain, the hydrophobic core and the region surrounding the vestigial active site are quite similar, but the Tsg101 UEV domain differs from Mms2 and canonical E2s in the positions of the first two β-strands [152]. In Tsg101 they are elongated and shifted toward the N-terminus, forming a β-hairpin that extends 11 residues outside the main body of the domain [152]. This loop is important for ubiquitin binding by Tsg101 [152]. As determined by chemical shift mapping and mutagenesis studies, the Tsg101/ubiquitin binding interface involves the bottom half of the four-stranded β-sheet, including the β-hairpin (loop S1–S2). The binding interface for ubiquitin on Tsg101 is distinct from the surface that Mms2 uses to position the acceptor ubiquitin within the Ubc13/Mms2 complex [131, 144]. Thus, two different UEVs bind ubiquitin in two different ways. The structural biology and biochemistry of UEVs illustrates how modest changes to an E2-like module can create new, functionally important interaction sites. The UEV domain is just one of a growing set of small domains that can endow other protein domains with ubiquitin-binding capability (reviewed in Ref. [63]). Such binding elements are likely to play important roles in transducing ubiquitin signals in diverse cellular pathways.
8 Conclusions
We have emphasized the biochemical properties of E2s, particularly interactions with other factors in the conjugation cascade, because these properties are central to the biological actions of E2s. We have tried to give a flavor of the “creativity and economy” [103] with which E2s have evolved to maximize the interaction potential of a relatively small and conserved surface (Figure 7). Owing to the large scope of the relevant literature and the limited length of this chapter, we have not done full justice to the biological breadth of the E2 enzyme family. For example, we have focused on yeast and mammalian enzymes, but ubiquitin conjugation is increasingly being studied in other model organisms, including flies, worms, and plants. These systems offer powerful tools to address outstanding questions about ubiquitin-dependent pathways in general and E2 enzymes in particular. What are some of those questions? Significant uncertainties remain concerning E2 catalysis and mechanism, as discussed in Section 6. Another important question has been largely ignored in this review – exactly why are there so many E2s? One appealing model is that the identity of the E2 can modulate the substrate specificity of the E3, but experimental evidence for this model remains sparse. Another possibility is that the E2 has little or no influence on substrate choice, but rather helps to control the flux of activated ubiquitin to its cognate E3. In view of the remarkable developments in ubiquitin biology over the last decade, we should be prepared for both interesting and unexpected answers to these (and other) questions.
23
24 Ubiquitin-conjugating Enzymes
References 1 HERSHKO, A. and CIECHANOVER, A. The ubiquitin system. Annu. Rev. Biochem. 1998, 67, 425–479. 2 CIECHANOVER, A., HELLER, H., ELIAS, S., HAAS, A. L., and HERSHKO, A. ATP-dependent conjugation of reticulocyte proteins with the polypeptide required for protein degradation. Proc. Natl. Acad. Sci. USA 1980, 77, 1365–1368. 3 HERSHKO, A., CIECHANOVER, A., HELLER, H., HAAS, A. L., and ROSE, I. A. Proposed role of ATP in protein breakdown: conjugation of proteins with multiple chains of the polypeptide of ATP-dependent proteolysis. Proc. Natl. Acad. Sci. USA 1980, 77, 1783–1786. 4 HAAS, A. L., WARMS, J. V. B., HERSHKO, A., and ROSE, I. A. Ubiquitin-activating enzyme. Mechanism and role in protein-ubiquitin conjugation. J. Biol. Chem. 1982, 257, 2543–2548. 5 CANE, D. E. and WALSH, C. T. The parallel and convergent universes of polyketide synthases and nonribosomal peptide synthetases. Chem. Biol. 1999, 6, R319–R325. 6 HERSHKO, A., HELLER, H., ELIAS, S., and CIECHANOVER, A. Components of ubiquitin-protein ligase system. J. Biol. Chem. 1983, 258, 8206–8214. 7 PICKART, C. M. and ROSE, I. A. Functional heterogeneity of ubiquitin carrier proteins. J. Biol. Chem. 1985, 260, 1573–1781. 8 JENTSCH, S., MCGRATH, J. P., and VARSHAVSKY, A. The yeast DNA repair gene RAD6 encodes a ubiquitin-conjugating enzyme. Nature 1987, 329, 131–134. 9 GOEBL, M. G., YOCHEM, J., JENTSCH, S., MCGRATH, J. P., VARSHAVSKY, A., and BYERS, B. The yeast cell cycle gene CDC34 encodes a ubiquitin-conjugating enzyme. Science 1988, 241, 1331–1335. 10 REYNOLDS, P., WEBER, S., and PRAKASH, L. RAD6 gene of Saccharomyces cerevisiae encodes a protein containing a tract of 13 consecutive aspartates. Proc. Natl. Acad. Sci. USA 1985, 82, 168–172.
11 SUNG, P., PRAKASH, S., and PRAKASH, L. Stable ester conjugate between the Saccharomyces cerevisiae RAD6 protein and ubiquitin has no biological activity. J. Mol. Biol. 1991, 221, 745–749. 12 BANERJEE, A., DESHAIES, R. J., and CHAU, V. Characterization of a dominant negative mutant of the cell cycle ubiquitin-conjugating enzyme Cdc34. J. Biol. Chem. 1995, 270, 26209–26215. 13 SEMPLE, C. A. The comparative proteomics of ubiquitination in mouse. Genome Res 2003, 13, 1389–1394. 14 WONG, B. R., PARLATI, F., QUK, K., DEMO, S., PRAY, T., HUANG, J., PAYAN, D. G., and BENNETT, M. K. Drug discovery in the ubiquitin regulatory pathway. Drug Discov. Today 2003, 8, 746–754. 15 KOLMAN, C. J., TOTH, J., and GONDA, D. K. Identification of a portable determinant of cell cycle function within the carboxyl-terminal domain of the yeast CDC34 (UBC3) ubiquitin conjugating (E2) enzyme. EMBO J. 1992, 11, 3081–3090. 16 SILVER, E. T., GWOZD, T. J., PTAK, C., GOEBL, M. G., and ELLISON, M. J. A chimeric ubiquitin conjugating enzyme that combines the cell cycle properties of CDC34 (UBC3) and the DNA repair properties of RAD6 (UBC2): implications for the structure, function, and evolution of the E2s. EMBO J. 1992, 11, 3091–3098. 17 PTAK, C., PRENDERGAST, J. A., HODGINS, R., KAY, C. M., CHAU, V., and ELLISON, M. J. Functional and physical characterization of the cell cycle ubiquitin-conjugating enzyme CDC34 (UBC3). J. Biol. Chem. 1994, 269, 26539–26545. 18 HODGINS, R., GWOZD, C., ARNASON, T., CUMMINGS, M., and ELLISON, M. J. The tail of a ubiquitin-conjugating enzyme redirects multi-ubiquitin chain synthesis from the lysine 48-linked configuration to a novel nonlysine-linked form. J. Biol. Chem. 1996, 271, 28766–28771. 19 MATHIAS, N., STEUSSY, C. N., and GOEBL, M. G. An essential domain within Cdc34p is required for binding to a
References
20
21
22
23
24
25
26
27
28
29
complex containing Cdc4p and Cdc53p in Saccharomyces cerevisiae. J. Biol. Chem. 1998, 273, 4040–4045. COOK, W. J., JEFFREY, L. C., SULLIVAN, M. L., and VIERSTRA, R. D. Three-dimensional structure of a ubiquitin-conjugating enzyme (E2). J. Biol. Chem. 1992, 267, 15116–15121. COOK, W. J., JEFFREY, L. C., XU, Y., and CHAU, V. Tertiary structures of class I ubiquitin-conjugating enzymes are highly conserved: crystal structure of yeast Ubc4. Biochemistry 1993, 32, 13809–13817. HOCHSTRASSER, M. Ubiquitin-dependent protein degradation. Annu. Rev. Genet. 1996, 30, 405–439. PICKART, C. M. Mechanisms underlying ubiquitination. Annu. Rev. Biochem. 2001, 70, 503–533. SEUFERT, W. and JENTSCH, S. Ubiquitin-conjugating enzymes UBC4 and UBC5 mediate selective degradation of short-lived and abnormal proteins. EMBO J. 1990, 9, 543–550. JUNGMANN, J., REINS, H.-A., SCHOBERT, C., and JENTSCH, S. Resistance to cadmium mediated by ubiquitin-dependent proteolysis. Nature 1993, 361, 369–371. HEINEMEYER, W., KLEINSCHMIDT, J. A., SAIDOWSKY, J., ESCHER, C., and WOLF, D. H. Proteinase yscE, the yeast proteasome/multicatalyticmultifunctional proteinase: mutants unravel its function in stress induced proteolysis and uncover its necessity for cell survival. EMBO J. 1991, 10, 555–562. JOHNSON, E. S., MA, P. C., OTA, I. M., and VARSHAVSKY, A. A proteolytic pathway that recognizes ubiquitin as a degradation signal. J. Biol. Chem. 1995, 270, 17442–17456. SEUFERT, W., MCGRATH, J. P., and JENTSCH, S. UBC1 encodes a novel member of an essential subfamily of yeast ubiquitin-conjugating enzymes involved in protein degradation. EMBO J. 1990, 9, 4535–4541. GONEN, H., BERCOVICH, B., ORIAN, A., CARRANO, A., TAKIZAWA, C., YAMANAKA, K., PAGANO, M., IWAI, K., and
30
31
32
33
34
35
36
37
38
CIECHANOVER, A. Identification of the ubiquitin carrier proteins, E2s, involved in signal-induced conjugation and subsequent degradation of IκBα. J. Biol. Chem. 1999, 274, 14823–14830. DESHAIES, R. J. SCF and cullin/RING H2-based ubiquitin ligases. Annu. Rev. Cell Devel. Biol. 1999, 15, 435–467. SCHWOB, E., BOHM, T., MENDENHALL, M. D., and NASMYTH, K. The B-type cyclin kinase inhibitor p40SIC1 controls the G1 to S transition in S. cerevisiae Cell 1994, 79, 233–244. FELDMAN, R. M. R., CORRELL, C. C., KAPLAN, K. B., and DESHAIES, R. J. A complex of Cdc4p, Skp1p, and Cdc53p/cullin catalyzes ubiquitination of the phosphorylated CDK inhibitor Sic1p. Cell 1997, 91, 221–230. SKOWYRA, D., CRAIG, K. L., TYERS, M., ELLEDGE, S. J., and HARPER, J. W. F-box proteins are receptors that recruit phosphorylated substrates to the SCF ubiquitin-ligase complex. Cell 1997, 91, 209–219. PATTON, E. E., WILLEMS, A. R., SA, D., KURAS, L., THOMAS, D., CRAIG, K. L., and TYERS, M. Cdc53 is a scaffold protein for multiple Cdc34/Skp1/F-box protein complexes that regulate cell division and methionine biosynthesis in yeast. Genes Dev. 1998, 12, 692–705. JACKSON, P. K., ELDRIDGE, A. G., FREED, E., FURSTENTHAL, L., HSU, J. Y., KAISER, B. K., and REIMANN, J. D. R. The lore of the RINGs: substrate recognition and catalysis by ubiquitin ligases. Trends Cell Biol. 2000, 10, 429–439. KUS, B., CALDON, C. E., ANDORN-BROZA, R., and EDWARDS, A. M. Functional interactions of 13 yeast SCF complexes with a set of yeast E2 enzymes in vitro. Proteins: Structure, Function, and Bioinformatics 2004, 54, 455–467. BIEDERER, T., VOLKWEIN, C., and SOMMER, T. Role of Cue1p in ubiquitination and degradation at the ER surface. Science 1997, 278, 1806–1809. KOSTOVA, Z. and WOLF, D. H. For whom the bell tolls: protein quality control of the endoplasmic reticulum and the ubiquitin-proteasome connection. EMBO J. 2003, 22, 2309–2317.
25
26 Ubiquitin-conjugating Enzymes
39 BAYS, N. W., GARDNER, R. G., SEELIG, L. P., JOAZEIRO, C. A., and HAMPTON, R. Y. Hrd1p/Der3p is a membrane-anchored ubiquitin ligase required for ER-associated degradation. Nature Cell Biol. 2000, 3, 24–29. 40 FRIEDLANDER, R., JAROSCH, E., URBAN, J., VOLKWEIN, C., and SOMMER, T. A regulatory link between ER-associated protein degradation and the unfolded-protein response. Nature Cell Biol. 2000, 2, 379–384. 41 TRAVERS, K. J., PATIL, C. K., WODICKA, L., LOCKHART, D. J., WEISSMAN, J. S., and WALTER, P. Functional and genomic analyses reveal an essential coordination between the unfolded protein response and ER-associated degradation. Cell 2000, 101, 249–258. 42 SWANSON, R., LOCHER, M., and HOCHSTRASSER, M. A conserved ubiquitin ligase of the nuclear envelope/endoplasmic reticulum that functions in both ER-associated and Matα2 repressor degradation. Genes Dev. 2001, 15, 2660–2674. 43 TIWARI, S. and WEISSMAN, A. M. Endoplasmic reticulum (ER)-associated degradation of T cell receptor subunits. J. Biol. Chem. 2001, 276, 16193–16200. 44 WEBSTER, J. M., TIWARI, S., WEISSMAN, A. M., and WOJCIKIEWICZ, R. J. H. Inositol 1,4,5-trisphosphate receptor ubiquitination is mediated by mammalian Ubc7, a component of the endoplasmic reticulum-associated degradation pathway, and is inhibited by chelation of intracellular Zn2+ . J. Biol. Chem. 2003, 278, 38238–38246. 45 FANG, S., FERRONE, M., YANG, C., JENSEN, J. P., TIWARI, S., and WEISSMAN, A. M. The tumor autocrine motility factor receptor, gp78, is a ubiquitin protein ligase implicated in degradation from the endoplasmic reticulum. Proc. Natl. Acad. Sci. USA 2001, 98, 14422–14427. 46 SOMMER, T. and JENTSCH, S. A protein translocation defect linked to ubiquitin conjugation at the endoplasmic reticulum. Nature 1993, 365, 176–179. 47 BIEDERER, T., VOLKWEIN, C., and SOMMER, T. Degradation of subunits of the Sec61p complex, an integral component of the
48
49
50
51
52
53
54
55
ER membrane, by the ubiquitin-proteasome pathway. EMBO J. 1996, 15, 2069–2076. HILLER, M. M., FINGER, A., SCHWEIGER, M., and WOLF, D. H. ER degradation of a misfolded luminal protein by the cytosolic ubiuqitin-proteasome pathway. Science 1996, 273, 1725–1728. CHEN, P., JOHNSON, P., SOMMER, T., JENTSCH, S., and HOCHSTRASSER, M. Multiple ubiquitin-conjugating enzymes participate in the in vivo degradation of the yeast MATα2 repressor. Cell 1993, 74, 357–369. WALTER, J., URBAN, J., VOLKWEIN, C., and SOMMER, T. Sec61p-independent degradation of the tail-anchored ER membrane protein Ubc6p. EMBO J. 2001, 20, 3124–3131. RAO, H., UHLMANN, F., NASMYTH, K., and VARSHAVSKY, A. Degradation of a cohesin subunit by the N-end rule pathway is essential for chromosome stability. Nature 2001, 410, 955–959. BYRD, C., TURNER, G. C., and VARSHAVSKY, A. The N-end rule pathway controls the import of peptides through degradation of a transcriptional repressor. EMBO J. 1998, 17, 269–277. KAPLUN, L., IVANTSIV, Y., KORNITZER, D., and RAVEH, D. Functions of the DNA damage response pathway target Ho endonuclease of yeast for degradation via the ubiquitin-26S proteasome pathway. Proc. Natl. Acad. Sci. USA 2000, 97, 10077–10082. KOKEN, M. H. M., HOOGERBRUGGE, J. W., JASPERS-DEKKER, I., de WIT, J., WILLEMSEN, R., ROEST, H. P., GROOTEGOED, J. A., and HOEIJMAKERS, J. H. J. Expression of the ubiquitin-conjugating DNA repair enzymes HHR6A and B suggests a role in spermatogenesis and chromatin modification. Dev. Biol. 1996, 173, 119–132. ROEST, H. P., van KLAVEREN, J., de WIT, J., van GURP, C. G., KOKEN, M. H. M., VERMEY, M., van ROIJEN, J. H., HOOGERBRUGGE, J. W., VREEBERG, M. T. M., BAARENDS, W. M., BOOTSMA, D., GROOTEGOED, J. A., and HOEIJMAKERS, J. H. J. Inactivation of the HR6B
References
56
57
58
59
60
61
62
63
64
ubiquitin-conjugating DNA repair enzyme in mice causes male sterility associated with chromatin modification. Cell 1996, 86, 799–810. KWON, Y. T., KASHINA, A. S., DAVYDOV, I. V., HU, R.-G., AN, J. Y., SEO, J. W., DU, F., and VARSHAVSKY, A. An essential role of N-terminal arginylation in cardovascular development. Science 2002, 297, 96–99. SCHULE, T., ROSE, M., ENTIAN, K.-D., THUMM, M., and WOLF, D. H. Ubc8p functions in catabolite degradation of fructose-1,6-bisphosphatase in yeast. EMBO J. 2000, 19, 2161–2167. LI, Y.-P., LECKER, S. H., CHEN, Y., WADDELL, I. D., GOLDBERG, A. L., and REID, M. B. TNF-α increases ubiquitin-conjugating activity in skeletal muscle by up-regulating UbcH2/E220K . FASEB J. 2003, 17, 1048–1057. WEFES, I., MASTRANDREA, L. D., HALDEMAN, M., KOURY, S. T., TAMBURLIN, J., PICKART, C. M., and FINLEY, D. Inducation of ubiquitin-conjugating enzymes during terminal erythroid differentiation. Proc. Natl. Acad. Sci. USA 1995, 92, 4982–4986. GITAN, R. S., SHABABI, M., KRAMER, M., and EIDE, D. J. A cytosolic domain of the yeast Zrt1 zinc transporter is required for its post-translational inactivation in response to zinc and cadmium. J. Biol. Chem. 2003, 278, 39558–39564. HICKE, L. and RIEZMAN, H. Ubiquitination of a yeast plasma membrane receptor signals its ligand-stimulated endocytosis. Cell 1996, 84, 277–287. DUNN, R. and HICKE, L. Domains of the Rsp5 ubiquitin-protein ligase required for receptor-mediated and fluid-phase endocytosis. Mol. Biol. Cell 2001, 12, 421–435. HICKE, L. and DUNN, R. Regulation of membrane protein transport by ubiquitin and ubiquitin-binding proteins. Annu. Rev. Cell Devel. Biol. 2003, 19, 141–172. ULRICH, H. D. Degradation or maintenance: actions of the ubiquitin system on eukaryotic chromatin. Eukaryotic Cell 2002, 1, 1–10.
65 HOEGE, C., PFANDER, B., MOLDOVAN, G.-L., PYROWOLAKIS, G., and JENTSCH, S. RAD6-dependent DNA repair is linked to modification of PCNA by ubiquitin and SUMO. Nature 2002, 419, 135–141. 66 STELTER, P. and ULRICH, H. D. Control of spontaneous and damage-induced mutagenesis by SUMO and ubiquitin conjugation. Nature 2003, 425, 188–191. 67 HWANG, W. W., VENKATASUBRAHMANYAM, S., IANCULESCU, A. G., TONG, A., BOONE, C., and MADHANI, H. D. A conserved RING finger protein required for histone H2B monoubiquitination and cell size control. Mol. Cell 2003, 11, 261–266. 68 ROBZYK, K., RECHT, J., and OSLEY, M. A. Rad6-dependent ubiquitination of histone H2B in yeast. Science 2000, 287, 501–504. 69 WOOD, A., KROGAN, N. J., DOVER, J., SCHNEIDER, J., HEIDT, J., BOATENG, M. A., DEAN, K., GOLSHANI, A., ZHANG, Y., GREENBLATT, J. F., JOHNSTON, M., and SHILATIFARD, A. Bre1, an E3 ubiquitin ligase required for recruitment and substrate selection of Rad6 at a promoter. Mol. Cell 2003, 11, 267–274. 70 SCHWARTZ, D. C. and HOCHSTRASSER, M. A superfamily of protein tags: ubiquitin, SUMO and related modifiers. Trends Biochem. Sci. 2003, 28, 321–328. 71 SUN, Z.-W. and ALLIS, C. D. Ubiquitination of histone H2B regulates H3 methylation and gene silencing in yeast. Nature 2002, 418, 104–108. 72 HOFMANN, R. M. and PICKART, C. M. Noncanonical MMS2-encoded ubiquitin-conjugating enzyme functions in assembly of novel polyubiquitin chains for DNA repair. Cell 1999, 96, 645–653. 73 ULRICH, H. D. and JENTSCH, S. Two RING finger proteins mediate cooperation between ubiquitin-conjugating enzymes in DNA repair. EMBO J. 2000, 19, 3388–3397. 74 SUN, L. and CHEN, Z. J. The novel functions of ubiquitination in signaling. Curr. Opin. Cell Biol. 2004, 16, 119–126. 75 WIEBEL, F. F. and KUNAU, W. H. The Pas2 protein essential for peroxisome biogenesis is related to
27
28 Ubiquitin-conjugating Enzymes
76
77
78
79
80
81
82
83
84
ubiquitin-conjugating enzymes. Nature 1992, 359, 73–76. van der KLEI, I. J., HILBRANDS, R. E., KIEL, J. A. K. W., RASMUSSEN, S. W., CREGG, J. M., and VEENHUIS, M. The ubiquitin-conjugating enzyme Pex4p of Hansenula polymorpha is required for efficient functioning of the PTS1 import machinery. EMBO J. 1998, 17, 3608–3618. KOLLER, A., SNYDER, W. B., FABER, K. N., WENZEL, T. J., RANGELL, L., KELLER, G. A., and SUBRAMANI, S. Pex22p of Pichia pastoris, essential for peroxisomal matrix protein import, anchors the ubiquitin-conjugating enzyme, Pex4p, on the peroxisomal membrane. J. Cell Biol. 1999, 146, 99–112. ECKERT, J. H. and JOHNSSON, N. Pex10p links the ubiquitin conjugating enzyme Pex4p to the protein import machinery of the peroxisome. J. Cell Sci. 2003, 116, 3623–3634. ARISTARKHOV, A., EYTAN, E., MOGHE, A., ADMON, A., HERSHKO, A., and RUDERMAN, J. V. E2-C a cyclin-selective ubiquitin carrier protein required for the destruction of mitotic cyclins. Proc. Natl. Acad. Sci. USA 1996, 93, 4294–4299. PETERS, J. M. The anaphase-promoting complex: proteolysis in mitosis and beyond. Mol. Cell 2002, 9, 931–943. YU, H., KING, R. W., PETERS, J.-M., and KIRSCHNER, M. W. Identification of a novel ubiquitin-conjugating enzyme involved in mitotic cyclin degradation. Curr. Biol. 1996, 6, 455–466. SEINO, H., KISHI, T., NISHITANI, H., and YAMAO, F. Two ubiquitin-conjugating enzymes, UbcP1/Ubc4 and UbcP4/Ubc11, have distinct functions for ubiquitination of mitotic cyclin. Mol. Cell. Biol. 2003, 23, 3497–3505. TOWNSLEY, F. M., ARISTARKHOV, A., BECK, S., HERSHKO, A., and RUDERMAN, J. V. Dominant-negative cyclin-selective ubiquitin carrier protein E2-C/UbcH10 blocks cells in metaphase. Proc. Natl. Acad. Sci. USA 1997, 94, 2362–2367. TOWNSLEY, F. M. and RUDERMAN, J. V. Functional analysis of the Saccharomyces cerevisiae UBC11 gene. Yeast 1998, 14, 747–757.
85 PORNILLOS, O., GARRUS, J. E., and SUNDQUIST, W. I. Mechanisms of enveloped RNA virus budding. Trends Cell Biol. 2002, 12, 569–579. 86 FURMAN, M. H. and PLOEGH, H. L. Lessons from viral manipulation of protein disposal pathways. J. Clin. Invest. 2002, 110, 875–879. 87 HINGAMP, P. M., ARNOLD, J. E., MAYER, R. J., and DIXON, L. K. A ubiquitin conjugating enzyme encoded by African swine fever virus. EMBO. J. 1992, 11, 361–366. 88 RODRIGUEZ, J. M., SALAS, M. L., and VINUELA, E. Genes homologous to ubiquitin-conjugating proteins and eukaryotic transcription factor SII in African swine fever virus. Virology 1992, 186, 40–52. 89 HAGGLUND, R. and ROIZMAN, B. Characterization of the novel E3 ubiquitin ligase encoded in exon 3 of herpes simplex virus-1-infected cell protein 0. Proc. Natl. Acad. Sci. USA 2002, 99, 7889–7894. 90 SONG, S., KIM, S.-Y., HONG, Y.-M., JO, D.-G., LEE, J.-Y., SHIM, S. M., CHUNG, C.-W., SEO, S. J., YOO, Y. J., KOH, J.-Y., LEE, M. C., YATES, A. J., ICHIJO, H., and JUNG, Y.-K. Essential role of E2–25K/Hip-2 in mediating amyloid-β toxicity. Mol. Cell 2003, 12, 553–563. 91 LAM, Y. A., PICKART, C. M., ALBAN, A., LANDON, M., JAMIESON, C., RAMAGE, R., MAYER, R. J., and LAYFIELD, R. Inhibition of the ubiquitin-proteasome system in Alzheimer’s disease. Proc. Natl. Acad. Sci. USA 2000, 97, 9902–9906. 92 OKAMOTO, Y., OZAKI, T., MIYAZAKI, K., AOYAMA, M., MIYAZAKI, M., and NAKAGAWARA, A. UbcH10 is the cancer-related E2 ubiquitin-conjugating enzyme. Cancer Res. 2003, 63, 4167–4173. 93 SHEKHAR, M. P. V., LYAKHOVICH, A., VISSCHER, D. W., HENG, H., and KONDRAT, N. Rad6 over-expression induces multinucleation, centrosom amplification, abnormal mitosis, aneuploidy, and transformation. Cancer Res. 2002, 62, 2115–2124. 94 MULLER, S., HOEGE, C., PYROWOLAKIS, G., and JENTSCH, S. SUMO, ubiquitin’s
References
95
96
97
98
99
100
101
102
103
mysterious cousin. Nature Rev. Mol. Cell Biol. 2001, 2, 202–210. HOCHSTRASSER, M. Evolution and function of ubiquitin-like protein-conjugation systems. Nature Cell Biol. 2000, 2, E153–E157. WALDEN, H., PODGORSKI, M. S., and SCHULMAN, B. A. Insights into the ubiquitin transfer cascade from the structure of the E1 for NEDD8. Nature 2003, 422, 330–334. BENCSATH, K. P., PODGORSKI, M. S., PAGALA, V. R., SLAUGHTER, C. A., and SCHULMAN, B. A. Identification of a multifunctional binding site on Ubc9p required for Smt3p conjugation. J. Biol. Chem. 2002, 277, 47938–47945. SULLIVAN, M. L. and VIERSTRA, R. D. Cloning of a 16-kDa ubiquitin carrier protein from wheat and Arabidopsis thaliana. Identification of functional domains by in vitro mutagenesis. J. Biol. Chem. 1991, 266, 23878–23885. HAAS, A. L., BRIGHT, P. M., and JACKSON, V. E. Functional diversity among putative E2 isozymes in the mechanism of ubiquitin-histone ligation. J. Biol. Chem. 1988, 163, 13268–13275. SIEPMANN, T. J., BOHNSACK, R. N., TOKGOZ, Z., BABOSHINA, O., and HAAS, A. L. Protein interactions within the N-end rule ubiquitin ligation pathway. J. Biol. Chem. 2003, 278, 9448–9457. MIURA, T., KLAUS, W., GSELL, B., MIYAMOTO, C., and SENN, H. Characterization of the binding interface between ubiquitin and class I human ubiquitin-conjugating enzyme 2b by multidimensional heteronuclear NMR spectroscopy in solution. J. Mol. Biol. 1999, 290, 213–228. HAMILTON, K. S., ELLISON, M. J., BARBER, K. R., WILLIAMS, R. S., HUZIL, J. T., MCKENNA, S., PTAK, C., GLOVER, M., and SHAW, G. S. Structure of a conjugating enzyme-ubiquitin thiolester intermediate reveals a novel role for the ubiquitin tail. Structure 2001, 9, 897–904. MCKENNA, S., MORAES, T., PASTUSHOK, L., PTAK, C., XIAO, W., SPYRACOPOULOS, L., and ELLISON, M. J. An NMR-based model of the ubiquitin-bound human
104
105
106
107
108
109
110
111
112
ubiquitin conjugation complex Mms2.Ubc13. The structural basis for lysine 63 chain catalysis. J. Biol. Chem. 2003, 278, 13151–13158. BORDEN, K. L. RING domains: master builders of molecular scaffolds? J. Mol. Biol. 2000, 295, 1103–1112. BORDEN, K. L. RING fingers and B-boxes: zinc-binding protein-protein interaction domains. Biochem. Cell Biol. 1998, 76, 351–358. ZHENG, N., WANG, P., JEFFREY, P. D., and PAVLETICH, N. P. Structure of a c-Cbl-UbcH7 complex: RING domain function in ubiquitin-protein ligases. Cell 2000, 102, 533–539. BRZOVIC, P. S., KEEFFE, J. R., NISHIKAWA, H., MAYAMOTO, K., FOX, D., FUKUDA, M., OHTA, T., and KLEVIT, R. Binding and recognition in the assembly of an active BRCA1-BARD1 ubiquitin ligase complex. Proc. Natl. Acad. Sci. USA 2003, 100, 5646–5651. HUANG, L., KINNUCAN, E., WANG, G., BEAUDENON, S., HOWLEY, P. M., HUIBREGTSE, J. M., and PAVLETICH, N. P. Structure of an E6AP-UbcH7 complex: insights into ubiquitination by the E2-E3 enzyme cascade. Science 1999, 286, 1321–1326. WU, G., XU, G., SCHULMAN, B. A., JEFFREY, P. D., HARPER, J. W., and PAVLETICH, N. P. Structure of a β-TrCP1-Skp1-β-catenin complex: destruction motif binding and lysine specificity of the SCFβ-TrCP1 ubiquitin ligase. Mol. Cell 2003, 11, 1145–1456. BRZOVIC, P. S., RAJAGOPAL, P., HOYT, D. W., KING, M.-C., and KLEVIT, R. E. Structure of a BRCA1-BARD1 heterodimeric RING-RING complex. Nature Struct. Biol. 2001, 8, 833–837. CONAWAY, R. C. and CONAWAY, J. W. The von Hippel-Lindau tumor suppressor complex and regulation of hypoxia-inducible transcription. Adv. Cancer Res. 2002, 85, 1–12. ZHENG, N., SCHULMAN, B. A., SONG, L., MILLER, J. J., JEFFREY, P. D., WANG, P., CHU, C., KOEPP, D. M., ELLEDGE, S. J., PAGANO, M., CONAWAY, R. C., CONAWAY, J. W., HARPER, J. W., and PAVLETICH, N. P. Structure of the Cul1-Rbx1-Skp1-F
29
30 Ubiquitin-conjugating Enzymes
113
114
115
116
117
118
119
120
121
boxSkp2 SCF ubiquitin ligase complex. Nature 2002, 416, 703–709. KAWAKAMI, T., CHIBA, T., SUZUKI, T., IWAI, K., YAMANAKA, K., MINATO, N., SUZUKI, H., SHIMBARA, N., HIDAKA, Y., OSAKA, F., OMATA, M., and TANAKA, K. NEDD8 recruits E2-ubiquitin to SCF E3 ligase. EMBO J. 2001, 20, 4003–4012. WU, K., CHEN, A., TAN, P., and PAN, Z.-Q. The Nedd8-conjugated ROC1-CUL1 core ubiquitin ligase utilizes Nedd8 charged surface residues for efficient polyubiquitin chain assembly catalyzed by Cdc34. J. Biol. Chem. 2002, 277, 516–527. ORLICKY, S., TANG, X., WILLEMS, A., TYERS, M., and SICHERI, F. Structural basis for phosphodependent substrate selection and orientation by the SDFCdc4 ubiquitin ligase. Cell 2003, 112, 243–256. DEFFENBAUGH, A. E., SCAGLIONE, K. M., BURANDA, T., SKLAR, L. A., and SKOWYRA, D. Release of ubiquitin-charged Cdc34-S∼Ub from the RING domain is essential for ubiquitination of the SCFCdc4 -bound substrate Sic1. Cell 2003, 114, 611–622. PETROSKI, M. D. and DESHAIES, R. J. Context of multiubiquitin chain attachment influences the rate of Sic1 degradation. Mol. Cell 2003, 11, 1435–1444. PENG, J., SCHWARTZ, D., ELIAS, J. E., THOREEN, C. C., CHENG, D., MARSISCHKY, G., ROELOFS, J., FINLEY, D., and GYGI, S. P. A proteomics approach to understanding protein ubiquitination. Nature Biotechnol. 2003, 21, 921–926. PRINGA, E., MARTINEZ-NOEL, G., MULLER, U., and HARBERS, K. Interaction of the ring finger-related U-box motif of a nuclear dot protein with ubiquitin-conjugating enzymes. J. Biol. Chem. 2001, 276, 19617–19623. KOEGL, M., HOPPE, T., SCHLENKER, S., ULRICH, H. D., MAYER, T. U., and JENTSCH, S. A novel ubiquitination factor, E4, is involved in multiubiquitin chain assembly. Cell 1999, 96, 635–644. JIANG, J., BALLINGER, C. A., WU, Y., DAI, Q., CYR, D. M., HOHFELD, J., and PATTERSON, C. CHIP is a
122
123
124
125
126
127
128
129
130
U-box-dependent E3 ubiquitin ligase: identification of Hsc70 as a target for ubiquitylation. J. Biol. Chem. 2001, 276, 42938–42944. MURATA, S., MINAMI, Y., MINAMI, M., CHIBA, T., and TANAKA, K. CHIP is a chaperone-dependent E3 ligase that ubiquitylates unfolded protein. EMBO Rep. 2001, 2, 1133–1138. OHI, M. D., VANDER KOOI, C. W., ROSENBERG, J. A., CHAZIN, W. J., and GOULD, K. L. Structural insights into the U-box, a domain associated with multi-ubiquitination. Nature Struct. Biol. 2003, 10, 250–255. ARAVIND, L. and KOONIN, E. V. The U box is a modified RING finger – a common domain in ubiquitination. Curr. Biol. 2000, 10, R132–R224. HATAKEYAMA, S., YADA, M., MATSUMOTO, M., ISHIDA, N., and NAKAYAMA, K. I. U box proteins as a new family of ubiquitin-protein ligases. J. Biol. Chem. 2001, 276, 33111–31120. HUIBREGTSE, J. M., SCHEFFNER, M., BEAUDENON, S., and HOWLEY, P. M. A family of proteins structurally and functionally related to the E6-AP ubiquitin-protein ligase. Proc. Natl. Acad. Sci. USA 1995, 92, 2563–2567. HUIBREGTSE, J. M., SCHEFFNER, M., and HOWLEY, P. M. A cellular protein mediates association of p53 with the E6 oncoprotein of human papillomavirus types 16 or 18. EMBO J. 1991, 10, 4129–4135. SCHEFFNER, M., NUBER, U., and HUIBREGTSE, J. M. Protein ubiquitination involving an E1-E2-E3 enzyme ubiquitin thioester cascade. Nature 1995, 373, 81–83. VERDECIA, M. A., JOAZEIRO, C. A. P., WELLS, N. J., FERRER, J.-L., BOWMAN, M. E., HUNTER, T., and NOEL, J. P. Conformational flexibility underlies ubiquitin ligation mediated by the WWP1 HECT domain E3 ligase. Mol. Cell 2003, 11, 249–259. BERNIER-VILLAMOR, V., SAMPSON, D. A., MATUNIS, M. J., and LIMA, C. D. Structural basis for E2-mediated SUMO conjugation revealed by a complex between ubiquitin-conjugating enzyme
References
131
132
133
134
135
136
137
138
139
140
Ubc9 and RanGAP1. Cell 2002, 108, 345–356. VANDEMARK, A. P., HOFMANN, R. M., TSUI, C., PICKART, C. M., and WOLBERGER, C. Molecular insights into polyubiquitin chain assembly: crystal structure of the Mms2/Ubc13 heterodimer. Cell 2001, 105, 711–720. DENG, L., WANG, C., SPENCER, E., YANG, L., BRAUN, A., YOU, J., SLAUGHTER, C., PICKART, C., and CHEN, Z. J. Activation of the IκB kinase complex by TRAF6 requires a dimeric ubiquitin-conjugating enzyme complex and a unique polyubiquitin chain. Cell 2000, 103, 351–361. CARTER, P. and WELLS, J. A. Dissecting the catalytic triad of a serine protease. Nature 1988, 332, 564–568. RAWLINGS, N. D. and BARRETT, A. J. Families of cysteine peptidases. Meth. Enzymol. 1994, 244, 461–486. TONG, H., HATEBOER, G., PERRAKIS, A., BERNARDS, R., and SIXMA, T. K. Crystal structure of murine/human Ubc9 provides insight into the variability of the ubiquitin-conjugating system. J. Biol. Chem. 1997, 272, 21381–21387. WU, P.-Y., HANLON, M., EDDINS, M., TSUI, C., ROGERS, R., JENSEN, J. P., MATUNIS, M. J., WEISSMAN, A. M., WOLBERGER, C., and PICKART, C. M. A conserved catalytic residue in the E2 enzyme family. EMBO J. 2003, 22, 5241–5250. HU, M., LI, P., LI, M., LI, W., YAO, T., WU, J.-W., GU, W., COHEN, R. E., and SHI, Y. Crystal structure of a UBP-family deubiquitinating enzyme in isolation and in complex with ubiquitin aldehyde. Cell 2002, 111, 1041–1054. MOSSESSOVA, E. and LIMA, C. D. Ulp1-SUMO crystal structure and genetic analysis reveal conserved interactions and a regulatory element essential for cell growth in yeast. Mol. Cell 2000, 5, 865–876. JOHNSTON, S. C., RIDDLE, S. M., COHEN, R. E., and HILL, C. P. Structural basis for the specificity of ubiquitin C-terminal hydrolases. EMBO J. 1999, 18, 3877–3887. BERLETH, E. S. and PICKART, C. M. Mechanism of ubiquitin conjugating
141
142
143
144
145
146
147
148
enzyme E2–230K: catalysis involving a thiol relay? Biochemistry 1995, 35, 1664–1671. HAUSER, H.-P., BARDROFF, M., PYROWOLAKIS, G., and JENTSCH, S. A giant ubiquitin-conjugating enzyme related to IAP apoptosis inhibitors. J. Cell Biol. 1998, 141, 1415–1422. SANCHO, E., VILA, M. R., SANCHEZ-PULIDO, L., LOZANO, J. J., PACIUCCI, R., NADAL, M., FOX, M., HARVEY, C., BERCOVICH, B., LOUKILI, N., CIECHANOVER, A., LIN, S. L., SANZ, F., ESTIVILL, X., VALENCIA, A., and THOMSON, T. M. Role of UEV-1, an inactive variant of the E2 ubiquitin-conjugating enzymes, in in vitro differentiation and cell cycle behavior of HT-29-M6 intestinal mucosecretory cells. Mol. Cell Biol. 1998, 18, 576–589. WANG, C., DENG, L., HONG, M., AKKARAJU, G. R., INOUE, J.-I., and CHEN, Z. J. TAK1 is a ubiquitin-dependent kinase of MKK and IKK. Nature 2001, 412, 346–351. MORAES, T. F., EDWARDS, R. A., MCKENNA, S., PASTUSHOK, L., XIAO, W., GLOVER, J. N., and ELLISON, M. J. Crystal structure of the human ubiquitin conjugating enzyme complex, hMms2-hUbc13. Nature Struct. Biol. 2001, 8, 669–673. ULRICH, H. D. Protein-protein interactions in an E2-RING finger complex: implications for ubiquitin-dependent DNA damage repair. J. Biol. Chem. 2003, 278, 7051–7058. KATZMANN, D. J., ODORIZZI, G., and EMR, S. D. Receptor downregulation and multivesticular-body sorting. Nature Rev. Mol. Cell Biol. 2002, 3, 893–905. KATZMANN, D. J., BABST, M., and EMR, S. D. Ubiquitin-dependent sorting into the multivesicular body pathway requires the function of a conserved endosomal protein sorting complex, ESCRT-I. Cell 2001, 106, 145–155. GARRUS, J. E., VON SCHWEDLER, U. K., PORNILLOS, O. W., MORHAM, S. G., ZAVITZ, K. H., WANG, H. E., WETTSTEIN, D. A., STRAY, K. M., COTE, M., RICH, R. L., MYSZKA, D. G., and SUNDQUIST, W. I.
31
32 Ubiquitin-conjugating Enzymes
149
150
151
152
153
154
155
156
157
Tsg101 and the vaculolar protein sorting pathway are essential for HIV-1 budding. Cell 2001, 107, 55–65. MARTIN-SERRANO, J., ZANG, T., and BIENIASZ, P. D. HIV-1 and Ebola virus encode small peptide motifs that recruit Tsg101 to sites of particle assembly to facilitate egress. Nature Med. 2001, 7, 1313–1319. CARTER, C. A. Tsg101: HIV-1’s ticket to ride. Trends Microbiol. 2002, 10, 203–205. PORNILLOS, W., ALAM, S. L., DAVIS, D. R., and SUNDQUIST, W. I. Structure of the Tsg101 UEV domain in complex with the PTAP motif of the HIV-1 p6 protein. Nature Struct. Biol. 2002, 9, 812–817. PORNILLOS, O., ALAM, S. L., RICH, R. L., MYSZKA, D. G., DAVIS, D. R., and SUNDQUIST, W. I. Structure and functional interactions of the Tsg101 UEV domain. EMBO J. 2002, 21, 2397–2406. GOFF, A., EHRLICH, L. S., COHEN, S. N., and CARTER, C. A. Tsg101 control of human immunodeficiency virus type 1 Gag trafficking and release. J. Virol. 2003, 77, 9173–9182. KOONIN, E. V. and ABAGYAN, R. A. TSG101 may be the prototype of a class of dominant negative ubiquitin regulators. Nature Genet. 1997, 16, 330–331. HOFMANN, K. and BUCHER, P. The UBA domain: a sequence motif present in multiple enzyme classes of the ubiquitination pathway. Trends Biochem. Sci. 1996, 21, 172–173. DOHMEN, R. J., MADURA, K., BARTEL, B., and VARSHAVSKY, A. The N-end rule is mediated by the UBC2(RAD6) ubiquitin-conjugating enzyme. Proc. Natl. Acad. Sci. USA 1991, 88, 7351–7355. BAILLY, V., LAMB, J., SUNG, P., PRAKASH, S., and PRAKASH, L. Specific complex
158
159
160
161
162
163
164
165
formation between yeast RAD6 and RAD18 proteins: a potential mechanism for targeting RAD6 ubiquitin-conjugating activity to DNA damage sites. Genes Dev. 1994, 8, 811–820. VARSHAVSKY, A. The N-end rule pathway of protein degradation. Genes Cells 1997, 2, 13–28. BRAUN, S., MATUSCHEWSKI, K., RAPE, M., THOMS, S., and JENTSCH, S. Role of the ubiquitin-selective CDC48UFD1/NPL4 chaperone (segregase) in ERAD of OLE1 and other substrates. EMBO J. 2002, 21, 615–621. QIN, S., NAKAJIMA, B., NOMURA, M., and ARFIN, S. M. Cloning and characterization of a Saccharomyces cerevisiae gene encoding a new member of the ubiquitin-conjugating protein family. J. Biol. Chem. 1991, 266, 15549–15554. SEUFERT, W., FUTCHER, B., and JENTSCH, S. Role of a ubiquitin-conjugating enzyme in degradation of S- and M-phase cyclins. Nature 1995, 373, 78–81. JOHNSON, E. S. and GUPTA, A. A. An E3-like factor that promotes SUMO conjugation to the yeast septins. Cell 2001, 106, 735–744. JOHNSON, E. S. and BLOBEL, G. Ubc9p is the conjugating enzyme for the ubiquitin-like protein Smt3p. J. Biol. Chem. 1997, 272, 26799–26802. LIAKOPOULOS, D., DOENGES, G., MATUSCHEWSKI, K., and JENTSCH, S. A novel protein modification pathway related to the ubiquitin system. EMBO J. 1998, 17, 2208–2214. LAMMER, D., MATHIAS, N., LAPLAZA, J. M., JIANG, W., LIU, Y., CALLIS, J., GOEBL, M., and ESTELLE, M. Modification of yeast Cdc53p by the ubiquitin-related protein Rub1p affects function of the SCFCdc4 complex. Genes Dev. 1998, 12, 914–926.
1
The Deubiquitinating Enzymes
Nathaniel S. Russell and Keith D. Wilkinson Emory University, Atlanta, USA
Originally published in: Protein Degradation, Volume 1. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30837-8
1 Introduction
In the mid-1970s ubiquitin was found to be a covalent modifier of proteins [1]. At the time, it was quite surprising to find a protein that covalently modified another protein. Since then, the reversible covalent modification of proteins by other proteins is known to be commonplace and ubiquitin is used to covalently modify hundreds of proteins, often for the purpose of targeting them to the proteasome for degradation. Protein degradation through the ubiquitin–proteasome system is facilitated by covalently linking ubiquitin to the ε-amino group of a lysine of a substrate protein through its C-terminal glycine [2]. A polyubiquitin (polyUb) chain is formed by linking subsequent ubiquitins to the lysine 48 residue of the preceding ubiquitin in the chain. A chain of four ubiquitins is sufficient for the targeted protein to be recognized and degraded by the proteasome [3]. Conjugation of ubiquitin to other proteins is catalyzed by a three-enzyme cascade [4]. Conjugation begins by activation of ubiquitin by an E1, or Ub-activating enzyme, forming a high-energy thiol ester bond in an ATP-dependent reaction. The ubiquitin is transferred to an E2, or Ub-conjugating enzyme, which then usually pairs with an E3, or Ub-ligase enzyme, to conjugate the ubiquitin to a specific target protein. The usefulness of ubiquitin conjugation is not limited to the ubiquitin– proteasome pathway. Mono- and polyubiquitin are used as signals in various pathways including endocytosis, DNA repair, apoptosis, and transcriptional regulation [5–8]. Polyubiquitin chains can be formed using lysine residues other than K48, the linkage required for proteasomal degradation [9–11]. In addition, there are a number of other ubiquitin-like proteins that also behave as signaling molecules although they are not involved directly in proteasomal degradation. This group Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Deubiquitinating Enzymes
includes SUMO (small ubiquitin-related modifier), Nedd8 (neural precursor cell expressed, developmentally down-regulated 8), ISG15 (interferon-stimulated gene 15), and others [12–14]. These proteins are conjugated to substrates in a similar manner to ubiquitin, using an E1, E2, and E3 cascade of enzymes specific for the particular ubiquitin-like protein involved [15–17]. Soon after it was shown that ubiquitin is conjugated to proteins, it was determined that this was a reversible process and deubiquitinating enzymes, or DUBs, could remove ubiquitin from ubiquitinated proteins [18, 19]. As the genes for ubiquitin and ubiquitin-like proteins were identified it became clear that all ubiquitin family members were synthesized as proproteins and processed to reveal the C-terminal glycylglycine of the active proteins [20]. Based on this information, DUBs were defined as proteases that cleave at the C-terminus of ubiquitin or ubiquitin-like proteins to reverse conjugation to target proteins and also process the proproteins. Over 100 DUBs have been identified (see Table 1 for a list of the DUBs whose roles are known or suspected) and they are used to regulate ubiquitin and ubiquitin-like protein metabolism. Since cells utilize a combination of mono-ubiquitin, polyubiquitin, and ubiquitin-like proteins for a multitude of reasons and conjugate them to thousands of proteins, a regulatory system has evolved that is exceedingly complex and must be exquisitely regulated (see Figure 1). DUBs help regulate this system by processing proubiquitin into a mature form, recycling free polyubiquitin chains into monomeric Ub, assisting the degradation of proteasomal substrates, and regulating the ubiquitination levels of proteins in cellular pathways other than proteolysis. Thus, DUBs play crucial roles in determining the cellular fates of many proteins and regulating cellular function.
Fig. 1 DUB families and substrate specificity. DUBs can be classified by genetic relationships (family) or by substrate specificities (activity). Arrows point towards the substrates that members of each family can
process in a physiologically relevant way. Each family is capable of processing multiple substrates and each activity can be catalyzed by members of more than one family.
1 Introduction 3 Table 1 Physiological roles of DUBs revealed by deletion or knockdown experiments
DUB
Organism
Deletion/knockdown phenotype
Ubiquitin C-terminal hydrolases (UCHs) UCH-L1 mouse gracile axonal dystrophy UCH-L3
mouse
no detectable phenotype
UCH37
human
unknown
BAP1 YUH1
human S. cerevisiae
unknown cannot process proRUB1
Functional role
predominant neuronal UCH [112] undetermined neuronal function [113] edits polyubiquitin chains at proteasome [91] tumor suppressor? [22] processes proRUB1, Ub-adducts [83]
Ubiquitin specific processing proteases (UBPs) UBP1 S. cerevisiae no phenotype detected undetermined [97] UBP2 S. cerevisiae no phenotype detected undetermined [96] UBP3/ S. cerevisiae, polyubiquitin accumulation transcriptional silencing inhibitor, USP10 human regulates membrane transport [66, 118] UBP6 S. cerevisiae low levels of free ubiquitin processes polyubiquitin chains at proteasome [94] UBP8 S. cerevisiae increase in ubiquitinated transcriptional regulator [88] histone H2B UBP14/IsoT S. cerevisiae, increased polyubiquitin recycles free polyubiquitin to human levels, proteasome defects mono-ubiquitin [33, 86] UBP16 S. cerevisiae no phenotype detected undetermined function at mitochondria [70] DOA4 S. cerevisiae Ub-depletion, defective recycles Ub and polyubiquitin Ub recycling adducts [71, 84] Unp/USP4 mouse, human unknown undetermined USP7 human indirect p53 activation regulates p53 ubiquitination [36] UBPy/USP8 human increase in protein cell-growth regulation [119] ubiquitination USP14 mouse (UBP6 ataxia regulating synaptic activity plus homolog) proteasome [109] USP21 human unknown process Ub and Nedd8 conjugates, growth regulator? [32] USP25 human unknown over-expression has possible role in Down’s Syndrome [120] UBP41 human unknown promotes apoptosis [6] UBP43 mouse accumulation of Isg15 processes Isg15, regulates Isg15 conjugates conjugate levels [31] CYLD human cylindromatosis regulates K63 polyubiquitination of substrates in NF-κB pathway [7, 103, 104] fat facets Drosophila defective germ cell regulates specific developmental specification, eye formation processes [74, 75] DUB1, 2, 2A mouse unknown cytokine specific growth regulators VDU1 human unknown regulation of Ub-proteasome pathway? [64]
4 The Deubiquitinating Enzymes Table 1 (Continued)
DUB
Organism
Deletion/knockdown phenotype
Functional role
JAMM Isopeptidases Rpn11 Yeast, human Csn5 Yeast, Drosophila
processes polyubiquitin chains at proteasome [46, 47] defects in SCF E3s in yeast, regulation of cullin neddylation lethal in Drosophila levels [45]
OTU DUBs cezanne
human
unknown
A20
mouse
otubain1 otubain2 VCIP 135
human human human
severe inflammation, premature death unknown unknown unknown
lethal
Ubiquitin-like proteases (ULPs) Ulp1 S. cerevisiae lethal Ulp2
S. cerevisiae
Den1/SENP8 mouse SENP6 human
increased SUMO conjugates, DNA repair defective unknown unknown
Others Apg4B
mouse
unknown
ataxin-3
human
unknown
negative regulation of NF-κB pathway [43] negative regulation of NF-κB pathway [44, 121] editing DUB? [42, 122] undetermined [42] membrane fusion after mitosis [72] regulates cell cycle progression [39] desumoylating enzyme [123]
deneddylates cullins [26, 37] involved in reproductive function? [124] processes autophagy-related UbLs [116] processes polyubiquitin chains? [125]
The organism listed for each DUB refers either to where it was discovered or where the work characterizing the deletion strain and function was performed. DUBs with multiple organism identifiers are either highly similar in sequence in each organism or functional homologs. An unknown deletion phenotype indicates that a deletion, knockout, or knockdown of a particular DUB has yet to be generated. No detectable phenotype indicates that a deletion strain has been made, but no phenotypes were observed.
The study of DUBs has moved at a rapid rate since their initial discovery in the early 1980s. Yet despite all the progress, the total number of DUBs and the substrate specificity of most DUBs are still undetermined. The discovery of novel DUB families including the JAMM isopeptidases and OTU DUBs has highlighted that there may be still more unidentified DUBs. Because of the large number of potential DUB substrates and the exquisite specificity that some individual DUBs exhibit (see Figure 1) study of these enzymes has often been challenging. A burgeoning amount of structural data and recent technical advances are being used to address this challenge. The goal of this chapter is to highlight recent developments in the
2 Structure and In Vitro Specificity of DUB Families 5
DUB field by giving an overview of DUB families, including DUBs that act on ubiquitin-like proteins, to discuss how DUBs achieve their specificity, and to show how the physiological roles of DUBs and their substrates are being elucidated.
2 Structure and In Vitro Specificity of DUB Families 2.1 Ubiquitin C-terminal Hydrolases (UCH)
The first class of DUBs discovered, the ubiquitin C-terminal hydrolases (UCHs), is a relatively small class with only four members in humans and one in budding yeast. UCHs are cysteine proteases related to the papain family of cysteine proteases. Most UCHs consist entirely of a catalytic core that has a molecular mass of about 25 kDa, although Bap1 and UCH37 have C-terminal extensions [21, 22]. All UCHs have a highly conserved catalytic triad consisting of the active-site cysteine, histidine, and aspartate residues that are absolutely required for function [23]. In vitro studies have determined that UCHs have significant activity in removing small adducts from the C-terminus of ubiquitin, including short peptides, ethyl ester groups, and amides [24]. They are also very efficient at cotranslationally processing the primary gene products (proubiquitin or Ub-ribosomal subunit fusions) to expose the C-terminal gly–gly motif required for conjugation of ubiquitin and ubiquitin-like proteins to substrates. However, UCHs are unable to cleave the isopeptide bond between ubiquitins in a polyubiquitin chain or to act on ubiquitin conjugated to a folded protein. They are similarly inefficient in acting upon small peptide substrates based on the sequence of the ubiquitin Cterminus [25]. As described below, the binding of ubiquitin is required for optimal UCH activity. Nedd8, a closely related ubiquitin-like protein, is also a substrate for human UCH-L3, albeit with three orders of magnitude less efficiency than ubiquitin [26]. The crystal structures of human and yeast UCHs have been solved, the latter in complex with the inhibitor ubiquitin aldehyde [27, 28]. The UCH fold is closely related to that of the papain family of cysteine proteases. Ubiquitin is bound in a cleft on a surface that is highly conserved in all UCHs. NMR studies on the binding of ubiquitin to human UCH-L3 show a similar mode of interaction and define three regions on the surface of ubiquitin involved in this binding [29]. As noted below and in Figure 2, the same surface of the ubiquitin fold is also involved in binding to USP7 and ULP1. This is remarkable as these DUBs are from different families and not significantly homologous in sequence or structure. A second feature of UCHs is the presence of a “blocking loop” spanning the active site and limiting the size of substrates that can be accommodated. Yuh1 and other UCH DUBs contain a mobile, ∼20-residue loop that is disordered in the unliganded protein, but becomes ordered upon substrate binding. The loop passes directly over the active site and the leaving group attached to the gly–gly at
6 The Deubiquitinating Enzymes
Fig. 2 Substrate binding by DUBs revealed by X-ray crystal structures. In the ribbon diagrams, the DUB is represented in white and the substrate in color. The ubiquitin (yellow or green) or SUMO (red) substrates are shown in the same orientation to highlight the similarity of substrate binding by different DUB classes. (A) Ubiqui-
tin (yellow) bound to YUH1. (B) Ubiquitin (green) bound to USP7. (C) SUMO (red) bound to ULP1. (D) Superimposition of substrates from A-C. The regions of each substrate that are within 3.5Å of the DUB when bound are highlighted in color to demonstrate the conserved regions that are recognized by the different DUB classes.
the ubiquitin C-terminus has to pass directly through this loop in order to access the catalytic cysteine. The maximum diameter of this loop was calculated to be ∼15Å, too small for any folded substrate save a single helix [27]. The loop thus allows small substrates to be efficiently cleaved, but excludes larger Ub–protein conjugates. This loop explains, at least in part, the preference of UCHs for small or disordered leaving groups.
2 Structure and In Vitro Specificity of DUB Families 7
2.2 Ubiquitin-specific Processing Proteases (UBP/USP)
The ubiquitin specific processing proteases (referred to as UBPs in yeast and USPs in human and mouse) were the second class of DUBs discovered. Catalytically, the UBPs are very similar to the UCHs in that they also utilize the catalytic triad of an active-site cysteine and a conserved histidine and aspartate. The UBP catalytic core of about 400 amino acids contains blocks of conserved sequences (Cys and His boxes) around these catalytic residues [23]. The UBPs are generally larger and more variable in size than the UCH class, ranging from 50 to 300 kDa. Nterminal extensions to the catalytic core account for most of the increased size although a few UBPs have C-terminal extensions. These N-terminal extensions are highly divergent in sequence, unlike the conserved regions of the catalytic core. The sequence and size variations of these extensions are thought to aid in determining UBP localization and substrate specificity. There are 16 UBPs in yeast and more than 50 USPs identified in humans, making the UBP/USP family much larger than the UCH family [30]. UBPs also process a wider variety of substrates than UCH DUBs, including proubiquitin, free polyubiquitin chains of various linkages, and mono-or polyubiquitin conjugated to target proteins in vitro and in vivo. In addition, some family members can act on ubiquitin-like proteins. UBP43 has been demonstrated to act on ISG15 while USP21 cleaves conjugated Nedd8 [31, 32]. The diversity of the UBPs and breadth of substrates they act upon, makes them useful in a wide variety of cellular pathways and locations. UBPs regulate apoptosis, DNA repair, endocytosis, and transcription in addition to the ubiquitin–proteasome pathway (see below). The same diversity presents a challenge in determining the specificity of UBPs and with the exception of Isopeptidase T (UBP14/USP5), there have been few quantitative studies of in vitro specificity [33, 34]. In general, specificity has been described with qualitative “yes or no” assays that are not particularly useful in suggesting in vivo roles. The structure of one UBP catalytic domain has been solved, that of USP7 complexed to ubiquitin aldehyde [35]. The data may be applicable to the way in which other UBPs function because the catalytic core of many UBPs is highly conserved. USP7 (also called HAUSP) is a human ubiquitin-specific protease that regulates the turnover of p53 [36]. USP7 consists of four structural domains; an N-terminal domain known to bind p53 and EBNA1, a catalytic domain, and two C-terminal domains. The 40-kDa catalytic domain exhibits a three-part architecture comprising Fingers, Palm, and Thumb (see Figure 2). The leaving ubiquitin moiety is specifically coordinated by the Fingers, with its C-terminus placed in a deep cleft between the Palm and Thumb where the catalytic residues are located. The domains form a pocket ideal for binding ubiquitin. Residues in the structure important for the above functions are conserved amongst UBPs, indicating that many UBPs may utilize the Fingers, Palm, and Thumb architecture to bind and cleave ubiquitinated substrates.
8 The Deubiquitinating Enzymes
Another interesting structural observation is that water molecules cushion ubiquitin in the binding pocket. This is necessary because the binding surfaces of ubiquitin are uncharged, and the USP7 binding pocket is made up of predominantly acidic amino acid residues. These water molecules form extensive networks of hydrogen bonds with the bound ubiquitin and USP7. It is possible that they contribute to USP7’s substrate specificity by allowing the protein to provide for relatively weak binding of ubiquitin and forcing itself to interact with the target protein to achieve specificity. This seems to be borne out by the fact that ubiquitin does not form a tightly bound complex with USP7 [35]. 2.3 Ubiquitin-like Specific Proteases (ULP)
The ubiquitin-like specific proteases (ULPs) are a third class of DUB first thought to act only on SUMO-related ubiquitin-like proteins. There are two yeast ULPs and seven human ULPs (also called sentrin specific proteases, or SENPs). Further analysis determined that ULPs have little or no activity on ubiquitin substrates, but one (SENP8) acts on Nedd8 [26, 37, 38]. Despite acting on non-ubiquitin substrates, ULPs are still classified as DUBs because the function and mechanism of catalysis is so similar to those of the DUBs that act on ubiquitin. ULPs lack significant sequence homology to other DUBs and are more closely related to viral proteinprocessing proteases [39]. In addition to the lack of sequence homology, ULPs have little structural homology to other DUB classes except in the active site. The structure of ULP1 (see Figure 2) in complex with the C-terminal aldehyde of yeast SUMO (SMT3) illustrates that, like most other DUBs, ULPs are thiol proteases, utilizing a conserved catalytic triad consisting of an active-site cysteine, histidine, and aspartate [40]. Also, they require a gly–gly motif at the C-terminus of their UbL substrate for tight binding. The SUMO binding pocket of ULP1 recognizes SUMO through a number of polar and charged-residue interactions, including multiple salt bridges that are not present in the USP7 ubiquitin-binding site, and does not utilize water molecules or a “blocking loop”. 2.4 OTU DUBs
A class of DUBs only identified since 2002 is the OTU (ovarian tumor protein) DUB class. The OTU domain was originally identified in an ovarian tumor protein from Drosophila melanogaster, and over 100 proteins from organisms ranging from bacteria to humans are annotated as having an OTU domain. The members of this protein superfamily were annotated as cysteine proteases, but no specific function had been demonstrated for any of these proteins. The first hint of a role for OTU proteins in the ubiquitin pathway was afforded by the observation that an OTUdomain-containing protein, HSPC263, reacted with ubiquitin vinyl sulfone (an active-site-directed irreversible inhibitor of DUBs) [41].
3 DUB Specificity 9
Then two groups almost simultaneously discovered that several OTU-containing proteins have DUB activity. Two human OTU DUBs were identified by purification with Ub-aldehyde (a reversible DUB inhibitor) affinity resin [42]. These proteins, named otubain1 and 2 (OTU-domain Ub-aldehyde binding protein) have a mass of approximately 35 kDa and are able to cleave polyubiquitin chains in vitro. However, the cleavage mechanism and their true substrates in vivo have yet to be determined. The other OTU DUB found was Cezanne, a 100-kDa protein that is similar to the A20 negative regulator of NF-κB [43]. Like A20, Cezanne plays a role in regulating NF-κB signaling pathways and has general DUB activity [44]. These OTU DUBs have highly conserved catalytic cysteine and histidine residues, implying that they utilize a catalytic triad to catalyze cleavage of polyubiquitin. It is unclear if most proteins containing OTU domains are DUBs, as analysis of the OTU family for DUB activity is only just beginning. 2.5 JAMM Isopeptidases
JAMM isopeptidases also constitute a recently identified class of DUBs. The members of this interesting class of DUBs were the first non-cysteine protease DUBs identified. Two JAMM isopeptidases have been confirmed as DUBs: Rpn11, which acts on ubiquitin conjugates, and Csn5, which acts on Nedd8 conjugates [45–47]. A number of other eukaryotic proteins have been annotated as containing the JAMM motif, but whether they have DUB activity has yet to be determined. Instead of cysteine proteases, they are metalloproteases belonging to a family of proteins that contain the Jab1/Csn5 and MPN domains [48]. Their activity depends on the JAMM motif (EXn HS/THX7 SXXD) in the JAMM domain. The two histidines and an aspartic acid act as ligands to bind a metal ion, presumably zinc although this has not been proven, to achieve catalysis through polarization of a bound water molecule. A glutamic acid serves as a general acid-base catalyst. The crystal structure of a JAMM metalloprotease from Archaeoglobolus fulgidus bacteria has been recently been solved, but no structures of a JAMM isopeptidase with DUB activity are yet available [49, 50].
3 DUB Specificity
Why are there so many DUBs and how do they achieve specificity? The numerous DUBs identified to date suggest that DUBs have specifically evolved to act on distinct cellular substrates rather than to have general deubiquitinating activity (see Figure 1). We can ask what common features of these enzymes define them as DUBs and what differences allow specific DUBs to act on mono- vs. polyubiquitin? How have they evolved to cleave only ISG15 or SUMO-modified substrates, for instance? A body of data has been accumulated that at least partially answers these questions.
10 The Deubiquitinating Enzymes
3.1 Recognition of the Ub-like Domain
All DUBs appear to recognize the body of the ubiquitin fold. UCH-L3, for example, makes contact with three regions of ubiquitin; residues 6–12, 41–48, and 69–74 [29]. These surfaces are highly conserved in Nedd8, but divergent in ISG15 and SUMO. Correspondingly, UCH-L3 can cleave ubiquitin and Nedd8 adducts but not those of the other ubiquitin-like proteins [26]. The same regions appear to be important for interactions of ubiquitin with many other DUBs (see Figure 2) and Ub-binding proteins. Importantly, all ubiquitinbinding domains examined utilize these same surfaces in binding ubiquitin. Recognition of a ubiquitin domain can be accomplished by ubiquitin-associated domains (UBA), which are present in many proteins, including some DUBs, and interact with polyubiquitin up to 1000-fold better than mono-ubiquitin [51]. However, other binding domains such as UIM (ubiquitin-interacting motif) and CUE (coupling of ubiquitin conjugation to ER degradation) domains utilized in endocytic pathways prefer binding mono-ubiquitin [52–54]. Polyubiquitin-binding proteins recognize a subset of this binding surface of ubiquitin, often described as the hydrophobic patch. It is a group of three amino acids, Leu8, Ile44, and Val70, which are oriented in the ubiquitin molecule to form a small hydrophobic patch [55]. Polyubiquitin chains incorporating ubiquitins with mutations at residues 8 and 44 were unable to be disassembled by DUBs present in the 19S subunit of the proteasome [56]. In addition to providing a recognition site for DUBs, the patch is also important in determining the quaternary structure of polyubiquitin, another feature utilized by DUBs in substrate recognition. One UbL protein, ISG15, consists of a fusion of two ubiquitin domains. The crude mimicking of an Ub-dimer could potentially contribute to its specific recognition by deISGylating enzymes. Polyubiquitin chains linked through all seven lysines in ubiquitin have been detected in vivo, and these poorly characterized forms of non-K48-linked polyubiquitin are also likely to have significant roles in the cell [57]. Polyubiquitin chains that are linked through different lysines are expected to be different enough in structure that individual DUBs could distinguish between them. K63-linked polyubiquitin is a well characterized alternative linkage and unlike K48-linked polyubiquitin, is not involved in proteolytic degradation [58, 59]. Structural data confirms the idea that these two types of polyubiquitin can have different structures [60, 61]. The structures of these dimers were solved by NMR analysis and they were found to have quite different conformations. Indicative of this, non-hydrolyzable ubiquitindimer analogs containing different linkages have markedly different effectiveness when used to inhibit the enzymatic activity of Isopeptidase T [62]. Isopeptidase T binds and cleaves polyubiquitin linked through at least four of the seven possible chain linkages found in vivo, although the catalytic efficiency of these activities is not known [10]. It is interesting to speculate that Isopeptidase T utilizes its two UBA domains to regulate binding of different polyubiquitin substrates. Mutational analysis of the UBA domains and structural data are needed to determine if this is the case and whether it is applicable to other DUBs as well.
3 DUB Specificity 11
3.2 Recognition of the gly–gly Linkage
The central feature that defines all DUBs is that they recognize and act at the C-terminus of the ubiquitin or ubiquitin-like domain. All mature ubiquitin and ubiquitin-like proteins have a C-terminal gly–gly motif and DUB cleavage releases leaving groups attached to the carboxyl group of the C-terminal glycine. With the exception of the JAMM metalloproteases, DUB catalysis starts with the nucleophilic attack of the catalytic cysteine on the carbonyl carbon of the scissile bond to form the tetrahedral intermediate. This is converted to an acyl-enzyme intermediate by expelling the C-terminal leaving group. Attack by a water molecule allows regeneration of the free thiol on the catalytic cysteine and releases free ubiquitin. The JAMM isopeptidases appear to use a classical metalloprotease mechanism [50]. DUB structures have evolved to recognize this C-terminal glycylglycine with exquisite specificity. Analysis of ubiquitin-fusion proteins lacking the gly–gly motif has clearly shown that they are not cleaved efficiently by DUBs [63]. All DUBs exclude larger amino acids at the C-terminus of the ubiquitin domain by having a deep cleft in their respective structures that is only large enough to hold two glycines. The narrowest region of USP7’s catalytic cleft sterically excludes amino acids with any type of side chain, enforcing specificity for ubiquitin conjugates [35]. However, the end of the cleft is open, which allows USP7 to act on large ubiquitin conjugates like its substrate, ubiquitinated p53. ULP1 uses a similar type of cleft to recognize the gly–gly motif except that it uses a tryptophan residue to restrict access to the catalytic site when a substrate is bound to the enzyme [40]. UCHs have a similarly constrained cleft and also use the previously described “blocking loop” to assist in specifically recognizing the C-terminus of ubiquitin. 3.3 Recognition of the Leaving Group
In principle, DUBs might also recognize the leaving group to which ubiquitin is attached. In fact, such a mechanism seems likely as several DUBs have little affinity for ubiquitin and several have been shown to bind the un-ubiquitinated target protein (see Table 2). Interactions between DUBs and putative substrates have been shown for the mammalian DUBs VDU1, USP11, and UBPy, as well as UBP3 from yeast and fat facets from Drosophila [64–68]. In other cases, DUBbinding proteins may serve as scaffolds or adaptors that localize DUBs (discussed below). 3.4 Substrate-induced Conformational Changes
DUBs are not general hydrolases for cleaving after a gly–gly sequence even though they recognize the gly–gly motif at the C-terminus of ubiquitin and ubiquitin-like proteins. What is so special about these particular gly–gly sequences that DUBs will only recognize and act on them and not others? The answer comes from the fact
12 The Deubiquitinating Enzymes Table 2 Identification of DUBs and DUB-binding partners through physical and genetic inter-
action screens DUB
Affinity
Characterized by MS
UCHs UCH37 UCH-L3 UCH-L1 UBPs DOA4 UBP3
Yeast two-hybrid
Interaction partner(s) Synthetic lethal
X X X
X
UBP6 UBP8 USP5 X USP7 USP11 CYLD fat facets UBPy OTU DUBs cezanne otubain1 and 2 X VCIP 135 X JAMM Isopeptidases Rpn11 Others ataxin 3
X
S14, UIP1 [21] Nedd8 [126] JAB1, p27 [127] X X
X X
X X X X X
ubiquitin [43] ubiquitin aldehyde [42] VCP/P47 [72] X
X
SLA1, SLA2 [128] SIR4, Bre5, Stu1 [66, 118, 129] 19S proteasome [130] SAGA, SLIK acetyl transferases [88] ubiquitin [131] ataxin [132] RanBPM [67] NEMO [7] Vasa [75] CDC25(Mm) [133]
UBP6 [92] RAD23, HHR23A, HHR23B [134]
that DUBs interact with the rest of the ubiquitin or ubiquitin-like substrate, and this interaction causes conformational changes in the DUB that are necessary to achieve catalysis. These changes result in rapid and efficient cleavage of only the particular substrate that the DUB is equipped to bind. It also explains why peptides with a gly–gly in them are not susceptible to cleavage by DUBs as they are lacking the substrate-binding domains that cause the DUB conformational change required for cleavage. The different DUB classes utilize a number of conformational changes that are induced upon substrate binding to assist in promoting efficient cleavage. UCH DUBs have been the most thoroughly analyzed. Comparison of the ubiquitin—UCH complex with unliganded UCH shows two significant conformational differences that contribute to keeping the unliganded enzyme in an inactive state. First, the previously described “blocking loop” becomes ordered as it interacts with ubiquitin. Invariant residues form hydrogen bonds with the ubiquitin substrate and other UCH residues, indicating that the loop has functional importance during substrate binding [27]. Second, the side chain of L9 in UCH-L3 intrudes into
4 Localization of DUBs 13
the substrate-binding cleft, occluding the catalytic cysteine and preventing binding of peptide substrates [29]. When ubiquitin binds to the UCH-L3, an interaction between ubiquitin and UCH-L3 repositions L9, allowing access to the active site cleft. Thus, the energy of ubiquitin binding is required to activate UCH-L3, allowing its cleavage. This type of selectivity (where ubiquitin binding is required for activity) may be necessary to prevent deleterious cleavage of other protein substrates by UCHs. A similar situation was observed when the crystal structure of USP7 was solved in the absence of substrate [35]. The catalytic cysteine of the unliganded protein is not in an orientation that would allow catalysis to take place. The histidine residue needed to interact with the active-site cysteine is too distant for a catalytic-triad mechanism to function. Binding of ubiquitin aldehyde induces a significant conformational change that realigns the catalytic triad residues so the hydrogen bonding required for catalysis can take place. Thus, like UCH DUBs, the unliganded protease is inactive and only becomes catalytically active when it is binding substrates. ULP1 also uses conformational changes to “clamp down” on the gly–gly motif when a SUMO substrate is bound. Trp448 lies directly above the active site and interacts with the SUMO C-terminus by Van der Waals interactions, sandwiching the gly–gly motif between Trp448 and the active-site cysteine when SUMO binds [40]. Despite the various methods utilized, all DUBs require a conformational change triggered by binding of a specific substrate to catalyze cleavage. These required conformational changes are driven by the energy of interaction between the DUB and the body of the ubiquitin domain.
4 Localization of DUBs
While many DUBs are cytoplasmic, localization of DUBs is also known to be important in regulating DUB specificity. The localization of ULP1, for example, is important in determining its substrates. The N-terminal domain of ULP1 is known to localize the enzyme to the nuclear envelope, and truncation mutations lacking this domain remain in the cytoplasm [69]. When the truncated protein is expressed in ULP1 yeast strains, the cells grow at wild-type levels, and the truncated protein is able to cleave SUMO substrates in vitro. However, analysis of ULP1 cells expressing this truncation shows an accumulation of SUMO conjugates. Apparently, the localization of ULP1 to the nuclear envelope is necessary in order for it to act on specific nuclear-envelope-localized substrates. The localization helps constrain ULP1 isopeptidase activity so ULP1 does not inappropriately act on cytoplasmic substrates. Other examples of DUB activity regulated by localization include UBP6, which is fully active only when bound to the proteasome (see below) and UBP16 residence on the outer membrane of the mitochondria, although its function there is undetermined [70]. Other DUBs have been found to associate with membranes and regulate membrane-associated cellular processes, although they appear not to be membrane
14 The Deubiquitinating Enzymes
anchored like UBP16. The ability of DOA4 to remove ubiquitin from membranebound endocytic substrates promotes their degradation in the vacuole or lysosome [71]. DUBs are also important for membrane fusion events as shown by the fact that an OTU domain DUB, VCIP135 (VCP/p47 complex-interacting protein of 135kd), is necessary for p97-p47-mediated Golgi cisternae reassembly after mitosis [72]. Also, a neuronal DUB, synUSP, was found to localize to post-synaptic lipid rafts (membrane microdomains involved in membrane trafficking and signal transduction) [73]. However, its function at that location has yet to be characterized. A well-studied example of a tissue-specific DUB activity is fat facets, a UBP originally found in Drosophila[74]. It is important in eye development and germcell specification and is active only in specific cell types during certain stages of development [65, 75]. The lack of fat facets results in defective posterior patterning, germ-cell specification, and eye formation. Fat facets activity is required to prevent the inappropriate degradation of vasa and liquid facets. In this case, the role of the DUB appears to be defined by the restricted expression of its known substrates. Temporal regulation of DUB expression also appears important. D’Andrea and colleagues first described a small family of DUBs that are induced as immediate early gene products of cytokine stimulation [76]. Different cytokines were shown to induce different DUBs and the expression of these enzymes was short-lived [77]. It appears that these DUBs may be involved in down-regulating cytokine receptors, perhaps by removing the ubiquitin involved in sorting of the receptor at the early endosome. Likewise, UBP43, the short-lived processing protein for ISG15, is present at very low levels in normal cells and highly expressed upon interferon induction [78].
5 Probable Physiological Roles for DUBs 5.1 Proprotein Processing
One important function of DUBs is the processing of ubiquitin or ubiquitin-like proteins to their mature forms. Ubiquitin is expressed in cells as either linear polyubiquitin or N-terminally fused to certain ribosomal proteins [79, 80]. These gene products are processed by DUBs to separate the ubiquitin into monomers and expose the gly–gly motif at the C-terminus. Many DUBs process linear polyubiquitin or Ub-fusion proteins in vitro, but this processing appears to take place cotranslationally in vivo and is extremely rapid. This makes analysis difficult and leaves unanswered the question of which DUBs actually perform this function in vivo. Multiple DUBs may be able to perform this processing at a physiologically relevant level since DUB deletions rarely shows processing defects [81]. Ubiquitin-like proteins are also expressed as proproteins with a short C-terminal extension of a few amino acids that must be removed to make the UbL available for conjugation to target proteins. All ULPs have been shown to metabolize their
5 Probable Physiological Roles for DUBs 15
respective proprotein to an active form in vitro although again it is unclear which ULPs are responsible for this activity in vivo. An exception to the confusion is the finding that RUB1 (the yeast homolog of Nedd8) is processed by YUH1 in Saccharomyces cerevisiae. Conjugation of RUB1 to Cdc53 is required for efficient assembly of certain SCF (skp1, cullin, F-box) E3 ubiquitin ligases [82]. Yuh1 deletion strains do not process Rub1 or modify Cdc53 with Rub1 [83]. Modification of Cdc53 by Rub1 could be reconstituted in a Yuh1 strain by expressing a mature Rub1 construct lacking the C-terminal asparagine normally removed by processing. This demonstrated that Yuh1 processes RUB1 proprotein into the mature form in vivo. It is not known which DUB performs the processing of proNedd8. 5.2 Salvage Pathways: Recovering Mono-ubiquitin Adducts and Recycling Polyubiquitin
It has been speculated that without UCH function, all ubiquitin in the cell would be conjugated with glutathione or other cellular amines and therefore unavailable for conjugation. This would quickly result in the cessation of the ubiquitin–proteasome system function and cell death due to lack of active ubiquitin to conjugate to substrates. The effectiveness of UCH DUBs in liberating ubiquitin from other small adducts makes them likely candidates to act on these particular adducts. In addition, the cell must regenerate mono-ubiquitin from polyubiquitin and various mono-ubiquitinated proteins to maintain levels of mono-ubiquitin for conjugation. Doa4 appears to remove small peptides attached to mono- and di-ubiquitin intermediates resulting from proteasomal degradation as well as removing ubiquitin from proteins targeted for endocytosis [84, 85]. Loss of Doa4 function in yeast results in depleted levels of mono-ubiquitin and increased cell death during stationary phase. Another function for DUBs is regenerating free ubiquitin from unanchored polyubiquitin chains removed from proteasome substrates or proteins targeted for other pathways. Polyubiquitin inhibits the proteasome and lowers the amount of free ubiquitin available for conjugation to proteins. Thus, these chains need to be processed to mono-ubiquitin to prevent polyubiquitin accumulation and inhibition of the proteasome. This type of DUB activity has been well characterized in vivo and in vitro and Isopeptidase T appears to be the DUB that is responsible for the majority of this activity. Deletion of UBP14 in yeast is not lethal, although large amounts of polyubiquitin build up in the cell and proteasome function is impaired [86]. Isopeptidase T seems to serve as a general DUB for regenerating mono-ubiquitin as it cleaves polyubiquitin containing various linkages [59]. 5.3 Regulation of Mono-ubiquitination
DUBs have increasingly been found to be important in regulating the ubiquitination level of proteins not targeted to the proteasome for degradation. Some DUBs are active participants in the regulation of mono-ubiquitin (or mono-UbL) conjugation and others can regulate the conjugation of multiple types of ubiquitin or UbLs
16 The Deubiquitinating Enzymes
to a single substrate. For instance, deneddylating enzymes may regulate the neddylation of cullin proteins both by processing proNedd8 and by removing Nedd8 from neddylated cullins [26, 38]. As a component of the SCF E3 ligase complexes, cullins require neddylation in order for their cognate E3 ligase to be efficiently assembled [82]. Regulation of this modification indirectly regulates the ubiquitination of a subset of proteins. Defects in deneddylation could lead to inappropriate ubiquitination of substrates owing to inappropriate recruitment of E2s to the SCF E3s [87]. Regulating mono-ubiquitination of proteins by DUBs is important in histone modification where ubiquitination is thought to modulate chromatin structure and transcriptional activity. Normally, about 10% of the histone core octomers contain ubiquitinated histones and the ubiquitin is removed at mitosis by DUB activity. UBP8 has been demonstrated to regulate the ubiquitination of histone H2B, which is important in transcriptional activation of many genes [88]. Many cell-surface receptors are ubiquitinated upon internalization and the ubiquitin is removed by DUBs at the early endosome. Properly sorted receptors are then shuttled to the lysosome for degradation. In the absence of Doa4, the ubiquitin is not removed upon sorting and instead is co-degraded in the vacuole, resulting in ubiquitin depletion [84]. Another DUB, UBP3 assists Golgi-ER retrograde transport by deubiquitinating B -COP, thus preventing its degradation [66]. Mono-ubiquitinated B -COP cannot be assembled into the COP1 complex without UBP3/Bre5 complex DUB activity. Disruption of the complex in Bre5 strains reduces the efficiency of Golgi-ER transport and facilitates the polyubiquitination and degradation of B -COP by the proteasome. One fascinating observation is that PCNA (proliferating cell nuclear antigen) can be modified by multiple forms of ubiquitin, demonstrating that DUBs with different specificities can act at the same location on a specific substrate. PCNA can be modified by mono-ubiquitin, 63-linked polyubiquitin, or SUMO at K164 [89]. Modification of PCNA by mono- or polyubiquitin determines whether it is utilized in translesion synthesis or error-free DNA repair, respectively. SUMO modification prevents PCNA function in DNA repair and instead promotes DNA replication. It is probable that multiple DUBs, as yet unidentified, are required to regulate PCNA modification. 5.4 Processing of Proteasome-bound Polyubiquitin
DUBs play a crucial role in regulating the function of the proteasome. For a long time it was unclear what happens to polyubiquitin conjugated to a proteasome substrate when that substrate is at the proteasome ready for degradation. Was the conjugated polyubiquitin processed by the proteasome and degraded or was it removed by a DUB and released from the proteasome? The small 13-Å diameter entrance to the 20S catalytic core of the proteasome requires all substrates to be fed through as unfolded polypeptides [90]. A branched polypeptide such as a ubiquitin–protein conjugate apparently has difficulty fitting through the pore,
6 Finding Substrates and Roles for DUBs 17
greatly reducing proteasome efficiency [47]. Thus, it seemed likely that DUBs must remove polyubiquitin from proteasome substrates before they enter the 20S catalytic core of the proteasome. To date, three DUBs are known to perform this function and all are components of the 19S lid of the mammalian proteasome. The first described was UCH37, although its exact function is still unclear [91]. UCH37 is thought to be an editing DUB that assists in clearing the proteasome of ubiquitinated proteins. UCH37 slowly cleaves one ubiquitin at a time from the distal end of the polyubiquitin chain. If chain trimming is faster than the degradation process, loss of the polyubiquitin signal could result in partial degradation or release of proteins from the proteasome. UCH37 activity could also be necessary to recover proteasomes that are having difficulty degrading ubiquitinated proteins. UCH37 is only found in higher eukaryotes, but the other two proteasome-bound DUBs, RPN11 and UBP6 (USP14), are found in all eukaryotes. RPN11 and UBP6 remove polyubiquitin from substrates that are committed to degradation by the proteasome [92]. The mechanisms for this, and exactly what role each DUB plays in removing ubiquitin, are not fully understood. Interestingly, the Rpn11 DUB activity was first detected over 10 years ago when 26S proteasome DUB activity was inhibited by o-phenanthroline, a metal chelator [93]. The metalloprotease DUB activity was not identified until recently [46]. Rpn11 is thought to remove polyubiquitin chains from proteasome substrates before they are degraded, allowing the unfolded substrate to enter the pore of the 20S subunit of the proteasome. It has been proposed that Rpn11 removes most of the polyubiquitin chain attached to a proteasome substrate and then UBP6 acts to remove the remaining one or two ubiquitin residues. Despite the lack of mechanistic understanding, the DUBs are clearly required for efficient proteasomal degradation to take place. The Rpn11 deletion is lethal in yeast and temperature-sensitive mutants show massive accumulation of polyubiquitin conjugates [46, 47]. UBP6 is approximately 300 times more active when it is associated with the proteasome than in its purified form [94]. The UBP6 deletion is not lethal in yeast, but a large decrease in the cellular pool of mono-ubiquitin occurs, indicating that ubiquitin is fed into the proteasome and degraded rather than being released from the proteasome and recycled [95].
6 Finding Substrates and Roles for DUBs
Surprisingly, little is known about the in vivo substrate specificity of DUBs. Difficulty in defining the substrate specificity of individual DUBs often arises from a lack of observable phenotypes in deletion strains. Deletion studies in yeast where up to 4 of the 17 DUBs have been deleted in a single strain have not produced significant phenotypes [96]. It is unclear if this is due to the subtle nature of the phenotypes or if the remaining DUBs compensate for the missing ones. However,
18 The Deubiquitinating Enzymes
several tactics have been fruitful in defining the physiological roles of DUBs. First, definition of in vitro specificities can be useful in focusing genetic screens. For example, the first UBPs were cloned and analyzed after it was discovered that they could cleave ubiquitin-fusion proteins [96, 97]. Second, directed screening of deletions or knockdown studies to identify roles for DUBs have also been successful (see Table 1 for DUB-deletion phenotypes). Study of DUB deletions, including UCH-L1, UCH-L3, and USP14 (see below), in the mouse have demonstrated their importance in neuronal function. Third, potential roles for DUBs have also been identified by physical and genetic interaction screens. Table 2 shows in more detail the interaction screens that have been used in discovering DUBs and characterizing their in vivo roles by identifying novel binding partners. For example, Cezanne was suggested to be a DUB after two-hybrid studies demonstrated its interaction with ubiquitin and UBP6 was identified as a component of the 19S subunit of the proteasome by mass spectrometry.
7 Roles of DUBs Revealed in Disease 7.1 NF-jB Pathway
NF-κB is a transcription factor that can be activated by a number of cellular signals, including stress, inflammation (via tumor necrosis factor) and antigen receptors among others [98, 99]. After receptor stimulation, a cascade ensues that results in the release of NF-κB from its inhibitor IκB. Released NF-κB translocates to the nucleus and activates transcription of a number of genes. Ubiquitin metabolism plays a significant regulatory role in the NF-κB pathway. For NF-κB release from IκB and nuclear translocation to take place, IκB is phosphorylated by IκB kinases, resulting in K48-linked polyubiquitination and proteasomal degradation of IκB [100]. A number of other proteins involved in this pathway such as NEMO, IKKγ , and TRAF6 have K63-linked polyubiquitin chains conjugated to them [101, 102]. It is not clear what purpose the K63-linked chains serve, but they appear to be a regulatory component of the NF-κB pathway. Most of the DUB activity characterized in the NF-κB pathway appears to act on K63-linked polyubiquitin, suggesting that modulation of K63-linked polyubiquitination by DUBs is important for control of the NF-κB pathway. CYLD, a tumor suppressor gene, has been confirmed as a DUB [7, 103, 104]. Loss of CYLD function leads to cylindromatosis, a syndrome characterized by large benign tumors on the face and neck. This is one of the few examples where a defective DUB has been defined as the direct cause of a specific disease. Preferred in vivo substrates of CYLD are believed to be 63-linked polyubiquitin-protein conjugates of NEMO, TRAF6, and TRAF2 components of the NF-κB pathway, but the exact in vivo regulatory role of CYLD is still unknown.
7 Roles of DUBs Revealed in Disease
OTU family DUBs such as Cezanne and A20 also play significant roles as negative regulators of the NF-κB pathway [43, 44]. A20 can cleave K48- and K63-linked polyubiquitin chains in vitro while Cezanne has only been tested on K48-linked chains. Although these DUBs are known to be part of the NF-κB pathway, their in vivo substrates are unknown. It is also unclear as to how these DUBs negatively regulate the NF-κB pathway. 7.2 Neural Function
DUBs, specifically UCHs, appear to play significant roles in neurodegenerative diseases such as Parkinson’s, Alzheimer’s, Huntington’s, and others [105]. A mutant form of UCH-L1 with reduced enzymatic activity has been found in a small family of Parkinson’s patients and the S18Y allele of UCH-L1 has been associated with a reduced risk of sporadic Parkinson’s disease [106, 107]. Many of the inclusion bodies found in patients with Parkinson’s are known to contain high amounts of UCH-L1, ubiquitin, and ubiquitinated proteins, as determined by immunostaining [108]. This suggests that defects in some DUBs or their regulation can cause significant harm to the neuronal system, resulting in disease. DUBs have also been implicated in the formation of other neural inclusion bodies. In addition to the case for their involvement in Parkinson’s disease it has been shown that the mutation of USP14 (the mammalian homolog of yeast UBP6) results in Ataxia in the mouse [109]. Many neurological diseases, including Ataxia, result in damaged or mutated proteins aggregating as polyubiquitinated forms at the microtubule organizing center (MTOC) to form inclusions called aggresomes [110]. An adapter, the tubulin deacetylase HDAC6 (histone deacetylase 6), has recently been shown to bind these polyubiquitinated proteins and tether them to the microtubules where they are then transported to the MTOC [111]. The classic Lewy Body of Parkinson’s disease has all the hallmarks of such an aggresome. The formation of an aggresome is thought to be protective and in its absence the aggregated proteins can trigger apoptosis. Thus, the dynamics of ubiquitination and aggregate formation are important responses to this type of cellular stress and several DUBs can modulate this process. Deletion of UCH-L1 and/or UCH-L3 in mice has demonstrated that they are both involved in neuronal regulation, but have separate functions. The GAD (gracile axonal dystrophy) mouse has been shown to lack UCH-L1, the predominant neuronal UCH [112]. These mice show a unique neuronal “dying back” phenotype that results in paralysis of the limbs due to death of nerves originating in the gracile nucleus. Mice lacking UCH-L3 have no obvious abnormalities or defects [113]. However, the double deletion mouse shows more severe defects including reduced weight, a more severe gracile axonal dystrophy than the L1 deletion, and earlier lethality caused by a loss of the ability to swallow resulting in starvation [114]. This demonstrates that the two DUBs are not redundant and have separate neuronal functions.
19
20 The Deubiquitinating Enzymes
8 New Tools for DUB Analysis
Despite all the DUB structures and substrates previously described, in most cases the in vivo substrate for a particular DUB is unknown. Structural and localization data can provide clues to determine in vivo DUB specificity, especially if one knows what ubiquitin or ubiquitin-like protein it acts upon. Genomic databases have helped, but many annotated DUBs have never been tested for DUB activity and some DUBs thought to act on one type of substrate (based on their homology) are found to act on another when tested. The characterization of hundreds of potential DUBs is a daunting task and in vivo characterization is even more difficult. To make headway, novel tools are needed to conclusively identify potential DUBs and their substrates to help direct appropriate in vivo studies.
8.1 Active-site-directed Irreversible Inhibitors and Substrates
The most promising tools developed for this sort of analysis are active-site-directed irreversible inhibitors of DUBs. These inhibitors are ubiquitin or ubiquitin-like proteins chemically modified at the C-terminus by an electrophilic moiety such as a Michael acceptor or alkyl halide. The modified ubiquitin can be incubated with a purified DUB or a cell lysate containing DUB activity. Ubiquitin vinyl sulfone (UbVS) is one such irreversible inhibitor because the vinyl sulfone moiety reacts with the active-site cysteine of the DUB, forming a thioether linkage. The covalent adduct is stable and can be detected in a variety of ways. Labeling of DUBs is specific, as only a DUB active-site cysteine will efficiently react with the vinyl sulfone moiety. To create these inhibitors, an N-terminally tagged ubiquitin or ubiquitin-like protein (lacking the C-terminal glycine) is expressed using the intein expression system (New England Biolabs). Briefly, in this system a fusion protein consisting of a ubiquitin or a ubiquitin-like protein lacking the C-terminal glycine, an intein linker, and a chitin-binding domain (CBD) is expressed in E. coli. Clarified cell lysate is incubated with chitin beads to bind the ubiquitin-fusion protein. The ubiquitin is then cleaved from the CBD and intein linker by adding mercaptoethanesulfonic acid (MESNA). After MESNA elution, the resulting truncated ubiquitin C-terminal thioester is reacted with glycine vinyl methyl sulfone to create the Ub or UbL vinyl sulfone derivative. The N-terminal tag on the ubiquitin molecule allows analysis of DUB labeling by immunoprecipitation and Western blotting. This labeling has been used with success in yeast where 6 of the 17 known DUBs were labeled with UbVS [115]. Incomplete labeling likely results from DUBs that do not act on mono-ubiquitin or where the UbVS could not access the active site. The labeling has also been used with great success in mammalian cell lysates to identify novel ubiquitin DUBs [41]. A novel deneddylating enzyme and a novel DUB that acts on autophagy-related UbL proteins have also been identified using vinyl sulfone labeled probes [26, 37, 116].
8 New Tools for DUB Analysis 21
This ubiquitin intein system can also be utilized to make a DUB substrate rather than inhibitors by attaching a C-terminal fluorescent tag such as 7amidomethylcoumarin (AMC) instead of vinyl sulfone. DUBs cleave the ubiquitin derivative and release the fluorescent tag, a process that can be followed fluorometrically. Fluorometric assays can then be used to determine a particular DUB’s preferred substrate or to quantitate DUB activity in crude lysates. AMC substrates have turned out to be excellent tools for identifying the substrates of individual DUBs. Den1, for example, was shown to cleave Nedd8-AMC 60 000-fold faster than it cleaves ubiquitin–AMC, and the ratio was even higher when compared to SUMO–AMC [26]. Clearly, these reagents are powerful tools for identifying novel DUBs and identifying potential DUB substrates.
8.2 Non-hydrolyzable Polyubiquitin Analogs
Other modified ubiquitin reagents that are useful in analyzing DUBs are nonhydrolyzable polyubiquitin analogs. These analogs are polyubiquitin chains where the ubiquitins are linked by cross-linking reagents. To synthesize them, one ubiquitin is mutated to cysteine at the C-terminal glycine and another has cysteine introduced at a particular lysine residue. These ubiquitins can then be linked through their cysteine residues with a bifunctional thiol reagent such as dichloroacetone (DCA). As the native ubiquitin sequence contains no cysteines, the ubiquitins will only be linked through the introduced cysteine residues. The result is a ubiquitin dimer analog that mimics physiological dimers. The isopeptide bond is replaced by a DCA linkage, but the ubiquitin subunits retain the appropriate spatial orientation. Thus, DUBs should bind these dimers, but will be unable to cleave them because they cannot hydrolyze the DCA linkage. To make longer polyubiquitin chains, cysteine residues or sulfhydryl groups must be introduced at the desired lysine and the C-terminal glycine on the same ubiquitin molecule. These chain analogs have been used to characterize DUBs in two fashions. First, they can be used as inhibitors of DUBs [62]. Cleavage of Ub–AMC by Isopeptidase T, a polyubiquitin-binding DUB, was inhibited by the addition of differently linked dimer analogs and kinetic inhibition constants were determined. The K i values of dimer analogs were all much lower than the K i for mono-ubiquitin. Further, there was considerable selectivity, as inhibition constants varied depending on the linkage present in the dimer [62]. This demonstrated that the analogs act as faithful mimics of native polyubiquitin. The other way these chain analogs are used is to synthesize them on affinity supports and analyze cell lysates for DUBs that bind a specific type of polyubiquitin chain. These affinity resins have proven useful in identifying a number of binding proteins, including DUBs, from yeast cell lysates [117]. Analogs with different linkages bind a different subset of proteins, perhaps suggesting a way to identify DUBs acting upon polyubiquitin with linkage specificity. Thus, these analogs are excellent tools for characterizing the substrate preferences of known DUBs and discovering novel ones.
22 The Deubiquitinating Enzymes
9 Conclusion
DUBs are clearly an essential cellular component needed for a variety of pathways including protein degradation, DNA repair, apoptosis, membrane trafficking, stress response, and transcriptional regulation. Not only do they act on various ubiquitin substrates, but they are also needed to process ubiquitin-like substrates. Over 100 DUBs from five major families have been identified and the number is likely to increase. Factors that enhance DUB specificity are the presence of a binding pocket that only accommodates the physiological substrate, the requirement for a substrate-induced conformational change that prevents undesired catalysis, and the recognition of the ubiquitous C-terminal gly–gly motif of all DUB substrates. Subcellular localization to a specific organelle or protein complex and tissue-specific as well as temporal expression are also important components of DUB specificity and function. In spite of all that has been learned about DUBs and their function, much remains to be discovered. Future studies are likely to focus on identifying in vivo DUB substrates, novel DUBs, DUB-binding partners, and phenotypes of DUB deletions. Further development of new reagents, such as the non-hydrolyzable polyubiquitin analogs and active-site-directed inhibitors or substrates will help greatly. Directed proteomics studies should assist in identifying DUBs, loss-offunction phenotypes, and potential binding partners. Despite the major gaps that remain in our understanding of DUBs, our knowledge of their roles and importance has progressed amazingly rapidly. Novel DUB gene families have been identified, new ubiquitin-like DUB substrates have been uncovered, and structural data has been analyzed to elucidate how DUBs perform catalysis and specifically recognize their substrates. In vivo substrates of DUBs are beginning to be identified and the tools and techniques needed to search for novel DUBs and analyze known ones for their specificity are rapidly being created. With so much discovered, and yet so much remaining to be found, deubiquitinating enzymes are a vibrant field of study.
References 1 SCHLESINGER, D. H., GOLDSTEIN, G., and NIALL, H. D. The complete amino acid sequence of ubiquitin, an adenylate cyclase stimulating polypeptide probably universal in living cells, Biochemistry, 1975, 14, 2214–2218. 2 HOCHSTRASSER, M. Ubiquitin-dependent protein degradation, Annu Rev Genet, 1996, 30, 405–439. 3 THROWER, J. S., HOFFMAN, L., RECHSTEINER, M., and PICKART, C. M. Recognition of the polyubiquitin
proteolytic signal, Embo J, 2000, 19, 94–102. 4 WILKINSON, K. D. Ubiquitination and deubiquitination: targeting of proteins for degradation by the proteasome, Semin Cell Dev Biol, 2000, 11, 141– 148. 5 WANG, Q., GOH, A. M., HOWLEY, P. M., and WALTERS, K. J. Ubiquitin recognition by the DNA repair protein hHR23a, Biochemistry, 2003, 42, 13529–13535.
References 6 GEWIES, A. and GRIMM, S. UBP41 is a proapoptotic ubiquitin-specific protease, Cancer Res, 2003, 63, 682–688. 7 KOVALENKO, A., CHABLE-BESSIA, C., CANTARELLA, G., ISRAEL, A., WALLACH, D., and COURTOIS, G. The tumour suppressor CYLD negatively regulates NF-kappaB signalling by deubiquitination, Nature, 2003, 424, 801–805. 8 HICKE, L. A new ticket for entry into budding vesicles-ubiquitin, Cell, 2001, 106, 527–530. 9 MASTRANDREA, L. D., YOU, J., NILES, E. G., and PICKART, C. M. E2/E3-mediated assembly of lysine 29-linked polyubiquitin chains, J Biol Chem, 1999, 274, 27299–27306. 10 YOU, J. and PICKART, C. M. A HECT domain E3 enzyme assembles novel polyubiquitin chains, J Biol Chem, 2001, 276, 19871–19878. 11 BABOSHINA, O. V. and HAAS, A. L. Novel multiubiquitin chain linkages catalyzed by the conjugating enzymes E2EPF and RAD6 are recognized by 26 S proteasome subunit 5, J Biol Chem, 1996, 271, 2823–2831. 12 MAHAJAN, R., DELPHIN, C., GUAN, T., GERACE, L., and MELCHIOR, F. A small ubiquitin-related polypeptide involved in targeting RanGAP1 to nuclear pore complex protein RanBP2, Cell, 1997, 88, 97–107. 13 KAMITANI, T., KITO, K., NGUYEN, H. P., and YEH, E. T. Characterization of NEDD8, a developmentally down-regulated ubiquitin-like protein, J Biol Chem, 1997, 272, 28557–28562. 14 NARASIMHAN, J., POTTER, J. L., and HAAS, A. L. Conjugation of the 15-kDa interferon-induced ubiquitin homolog is distinct from that of ubiquitin, J Biol Chem, 1996, 271, 324–330. 15 WALDEN, H., PODGORSKI, M. S., HUANG, D. T., MILLER, D. W., HOWARD, R. J., MINOR, D. L., Jr., HOLTON, J. M., and SCHULMAN, B. A. The structure of the APPBP1-UBA3-NEDD8-ATP complex reveals the basis for selective ubiquitin-like protein activation by an E1, Mol Cell, 2003, 12, 1427–1437.
16 JOHNSON, E. S., SCHWIENHORST, I., DOHMEN, R. J., and BLOBEL, G. The ubiquitin-like protein Smt3p is activated for conjugation to other proteins by an Aos1p/Uba2p heterodimer, Embo J, 1997, 16, 5509–5519. 17 HEMELAAR, J., BORODOVSKY, A., KESSLER, B. M., REVERTER, D., COOK, J., KOLLI, N., GAN-ERDENE, T., WILKINSON, K. D., GILL, G., LIMA, C. D., PLOEGH, H. L., and OVAA, H. Specific and covalent targeting of conjugating and deconjugating enzymes of ubiquitin-like proteins, Mol Cell Biol, 2004, 24, 84–95. 18 ANDERSEN, M. W., GOLDKNOPF, I. L., and BUSCH, H. Protein A24 lyase is an isopeptidase, FEBS Lett, 1981, 132, 210–214. 19 PICKART, C. M. and ROSE, I. A. Ubiquitin carboxyl-terminal hydrolase acts on ubiquitin carboxyl-terminal amides, J Biol Chem, 1985, 260, 7903–7910. 20 JENTSCH, S. and PYROWOLAKIS, G. Ubiquitin and its kin: how close are the family ties?, Trends Cell Biol, 2000, 10, 335–342. 21 LI, T., DUAN, W., YANG, H., LEE, M. K., BTE MUSTAFA, F., LEE, B. H., and TEO, T. S. Identification of two proteins, S14 and UIP1, that interact with UCH37, FEBS Lett, 2001, 488, 201–205. 22 JENSEN, D. E., PROCTOR, M., MARQUIS, S. T., GARDNER, H. P., HA, S. I., CHODOSH, L. A., ISHOV, A. M., TOMMERUP, N., VISSING, H., SEKIDO, Y., MINNA, J., BORODOVSKY, A., SCHULTZ, D. C., WILKINSON, K. D., MAUL, G. G., BARLEV, N., BERGER, S. L., PRENDERGAST, G. C., and RAUSCHER, F. J., 3rd. BAP1: a novel ubiquitin hydrolase which binds to the BRCA1 RING finger and enhances BRCA1-mediated cell growth suppression, Oncogene, 1998, 16, 1097–1112. 23 WILKINSON, K. D. Regulation of ubiquitin-dependent processes by deubiquitinating enzymes, Faseb J, 1997, 11, 1245–1256. 24 LARSEN, C. N., KRANTZ, B. A., and WILKINSON, K. D. Substrate specificity of deubiquitinating enzymes: ubiquitin C-terminal hydrolases, Biochemistry, 1998, 37, 3358–3368.
23
24 The Deubiquitinating Enzymes
25 DANG, L. C., MELANDRI, F. D., and STEIN, R. L. Kinetic and mechanistic studies on the hydrolysis of ubiquitin C-terminal 7-amido-4-methylcoumarin by deubiquitinating enzymes, Biochemistry, 1998, 37, 1868–1879. 26 GAN-ERDENE, T., KOLLI, N., YIN, L., WU, K., PAN, Z. Q., and WILKINSON, K. D. Identification and characterization of DEN1, a deneddylase of the ULP family, J Biol Chem, 2003, 31, 28892–28900. 27 JOHNSTON, S. C., RIDDLE, S. M., COHEN, R. E., and HILL, C. P. Structural basis for the specificity of ubiquitin C-terminal hydrolases, Embo J, 1999, 18, 3877–3887. 28 JOHNSTON, S. C., LARSEN, C. N., COOK, W. J., WILKINSON, K. D., and HILL, C. P. Crystal structure of a deubiquitinating enzyme (human UCH-L3) at 1.8 A resolution, Embo J, 1997, 16, 3787–3796. 29 WILKINSON, K. D., LALELI-SAHIN, E., URBAUER, J., LARSEN, C. N., SHIH, G. H., HAAS, A. L., WALSH, S. T., and WAND, A. J. The binding site for UCH-L3 on ubiquitin: mutagenesis and NMR studies on the complex between ubiquitin and UCH-L3, J Mol Biol, 1999, 291, 1067–1077. 30 WING, S. S. Deubiquitinating enzymes – the importance of driving in reverse along the ubiquitin-proteasome pathway, Int J Biochem Cell Biol, 2003, 35, 590–605. 31 MALAKHOV, M. P., MALAKHOVA, O. A., KIM, K. I., RITCHIE, K. J., and ZHANG, D. E. UBP43 (USP18) specifically removes ISG15 from conjugated proteins, J Biol Chem, 2002, 277, 9976–9981. 32 GONG, L., KAMITANI, T., MILLAS, S., and YEH, E. T. Identification of a novel isopeptidase with dual specificity for ubiquitin- and NEDD8-conjugated proteins, J Biol Chem, 2000, 275, 14212–14216. 33 WILKINSON, K. D., TASHAYEV, V. L., O’CONNOR, L. B., LARSEN, C. N., KASPEREK, E., and PICKART, C. M. Metabolism of the polyubiquitin degradation signal: structure, mechanism, and role of isopeptidase T, Biochemistry, 1995, 34, 14535–14546. 34 MELANDRI, F., GRENIER, L., PLAMONDON, L., HUSKEY, W. P., and STEIN, R. L.
35
36
37
38
39
40
41
42
43
Kinetic studies on the inhibition of isopeptidase T by ubiquitin aldehyde, Biochemistry, 1996, 35, 12893–12900. HU, M., LI, P., LI, M., LI, W., YAO, T., WU, J. W., GU, W., COHEN, R. E., and SHI, Y. Crystal structure of a UBP-family deubiquitinating enzyme in isolation and in complex with ubiquitin aldehyde, Cell, 2002, 111, 1041–1054. LI, M., CHEN, D., SHILOH, A., LUO, J., NIKOLAEV, A. Y., QIN, J., and GU, W. Deubiquitination of p53 by HAUSP is an important pathway for p53 stabilization, Nature, 2002, 416, 648–653. WU, K., YAMOAH, K., DOLIOS, G., GAN-ERDENE, T., TAN, P., CHEN, A., LEE, C. G., WEI, N., WILKINSON, K. D., WANG, R., and PAN, Z. Q. DEN1 is a dual function protease capable of processing the C-terminus of Nedd8 deconjugating hyper-neddylated CUL1, J Biol Chem, 2003, 31, 28882–28891. MENDOZA, H. M., SHEN, L. N., BOTTING, C., LEWIS, A., CHEN, J., INK, B., and HAY, R. T. NEDP1, a highly conserved cysteine protease that deNEDDylates Cullins, J Biol Chem, 2003, 278, 25637–25643. LI, S. J. and HOCHSTRASSER, M. A new protease required for cell-cycle progression in yeast, Nature, 1999, 398, 246–251. MOSSESSOVA, E. and LIMA, C. D. Ulp1-SUMO crystal structure and genetic analysis reveal conserved interactions and a regulatory element essential for cell growth in yeast, Mol Cell, 2000, 5, 865–876. BORODOVSKY, A., OVAA, H., KOLLI, N., GAN-ERDENE, T., WILKINSON, K. D., PLOEGH, H. L., and KESSLER, B. M. Chemistry-based functional proteomics reveals novel members of the deubiquitinating enzyme family, Chem Biol, 2002, 9, 1149–1159. BALAKIREV, M. Y., TCHERNIUK, S. O., JAQUINOD, M., and CHROBOCZEK, J. Otubains: a new family of cysteine proteases in the ubiquitin pathway, EMBO Rep, 2003, 4, 517–522. EVANS, P. C., SMITH, T. S., LAI, M. J., WILLIAMS, M. G., BURKE, D. F., HEYNINCK, K., KREIKE, M. M., BEYAERT,
References
44
45
46
47
48
49
50
51
52
53
R., BLUNDELL, T. L., and KILSHAW, P. J. A novel type of deubiquitinating enzyme, J Biol Chem, 2003, 278, 23180–23186. EVANS, P. C., OVAA, H., HAMON, M., KILSHAW, P. J., HAMM, S., BAUER, S., PLOEGH, H. L., and SMITH, T. S. A20, a regulator of inflammation and cell survival, has deubiquitinating activity, Biochem J, 2004, Pt 3, 727–734. COPE, G. A., SUH, G. S., ARAVIND, L., SCHWARZ, S. E., ZIPURSKY, S. L., KOONIN, E. V., and DESHAIES, R. J. Role of predicted metalloprotease motif of Jab1/Csn5 in cleavage of Nedd8 from Cul1, Science, 2002, 298, 608–611. VERMA, R., ARAVIND, L., OANIA, R., MCDONALD, W. H., YATES, J. R., 3rd, KOONIN, E. V., and DESHAIES, R. J. Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome, Science, 2002, 298, 611–615. YAO, T. and COHEN, R. E. A cryptic protease couples deubiquitination and degradation by the proteasome, Nature, 2002, 419, 403–407. BERNDT, C., BECH-OTSCHIR, D., DUBIEL, W., and SEEGER, M. Ubiquitin system: JAMMing in the name of the lid, Curr Biol, 2002, 12, R815–R817. TRAN, H. J., ALLEN, M. D., LOWE, J., and BYCROFT, M. Structure of the Jab1/MPN domain and its implications for proteasome function, Biochemistry, 2003, 42, 11460–11465. AMBROGGIO, X. I., REES, D. C., and DESHAIES, R. J. JAMM: A Metalloprotease-Like Zinc Site in the Proteasome and Signalosome, PLoS Biol, 2004, 2, E2. MUELLER, T. D., KAMIONKA, M., and FEIGON, J. Specificity of the interaction between UBA domains and ubiquitin, J Biol Chem, 2004, 12, 11926–11936. KANG, R. S., DANIELS, C. M., FRANCIS, S. A., SHIH, S. C., SALERNO, W. J., HICKE, L., and RADHAKRISHNAN, I. Solution structure of a CUE-ubiquitin complex reveals a conserved mode of ubiquitin binding, Cell, 2003, 113, 621–630. PRAG, G., MISRA, S., JONES, E. A., GHIRLANDO, R., DAVIES, B. A.,
54
55
56
57
58
59
60
61
62
HORAZDOVSKY, B. F., and HURLEY, J. H. Mechanism of ubiquitin recognition by the CUE domain of Vps9p, Cell, 2003, 113, 609–620. FISHER, R. D., WANG, B., ALAM, S. L., HIGGINSON, D. S., ROBINSON, H., SUNDQUIST, W. I., and HILL, C. P. Structure and ubiquitin binding of the ubiquitin interacting motif, J Biol Chem, 2003, 31, 28976–28984. BEAL, R. E., TOSCANO-CANTAFFA, D., YOUNG, P., RECHSTEINER, M., and PICKART, C. M. The hydrophobic effect contributes to polyubiquitin chain recognition, Biochemistry, 1998, 37, 2925–2934. LAM, Y. A., DEMARTINO, G. N., PICKART, C. M., and COHEN, R. E. Specificity of the ubiquitin isopeptidase in the PA700 regulatory complex of 26 S proteasomes, J Biol Chem, 1997, 272, 28438–28446. PENG, J., SCHWARTZ, D., ELIAS, J. E., THOREEN, C. C., CHENG, D., MARSISCHKY, G., ROELOFS, J., FINLEY, D., and GYGI, S. P. A proteomics approach to understanding protein ubiquitination, Nat Biotechnol, 2003, 21, 921–926. SPENCE, J., SADIS, S., HAAS, A. L., and FINLEY, D. A ubiquitin mutant with specific defects in DNA repair and multiubiquitination, Mol Cell Biol, 1995, 15, 1265–1273. HOFMANN, R. M. and PICKART, C. M. In vitro assembly and recognition of Lys-63 polyubiquitin chains, J Biol Chem, 2001, 276, 27936–27943. VARADAN, R., WALKER, O., PICKART, C., and FUSHMAN, D. Structural properties of polyubiquitin chains in solution, J Mol Biol, 2002, 324, 637–647. VARADAN, R., ASSFALG, M., HARIRINIA, A., RAASI, S., PICKART, C., and FUSHMAN, D. Solution conformation of Lys63-linked di-ubiquitin chain provides clues to functional diversity of polyubiquitin signaling, J Biol Chem, 2003, 8, 7055–7063. YIN, L., KRANTZ, B., RUSSELL, N. S., DESHPANDE, S., and WILKINSON, K. D. Nonhydrolyzable diubiquitin analogues are inhibitors of ubiquitin conjugation and deconjugation, Biochemistry, 2000, 39, 10001–10010.
25
26 The Deubiquitinating Enzymes
63 JOHNSON, E. S., MA, P. C., OTA, I. M., and VARSHAVSKY, A. A proteolytic pathway that recognizes ubiquitin as a degradation signal, J Biol Chem, 1995, 270, 17442–17456. 64 LI, Z., NA, X., WANG, D., SCHOEN, S. R., MESSING, E. M., and WU, G. Ubiquitination of a novel deubiquitinating enzyme requires direct binding to von Hippel-Lindau tumor suppressor protein, J Biol Chem, 2002, 277, 4656–4662. 65 CHEN, X., ZHANG, B., and FISCHER, J. A. A specific protein substrate for a deubiquitinating enzyme: Liquid facets is the substrate of Fat facets, Genes Dev, 2002, 16, 289–294. 66 COHEN, M., STUTZ, F., and DARGEMONT, C. Deubiquitination, a new player in Golgi to endoplasmic reticulum retrograde transport, J Biol Chem, 2003, 278, 51989–51992. 67 IDEGUCHI, H., UEDA, A., TANAKA, M., YANG, J., TSUJI, T., OHNO, S., HAGIWARA, E., AOKI, A., and ISHIGATSUBO, Y. Structural and functional characterization of the USP11 deubiquitinating enzyme which interacts with the RanGTP-associated protein RanBPM, Biochem J, 2002, 367, 87–95. 68 KATO, M., MIYAZAWA, K., and KITAMURA, N. A deubiquitinating enzyme UBPY interacts with the Src homology 3 domain of Hrs-binding protein via a novel binding motif PX(V/I) (D/N)RXXKP, J Biol Chem, 2000, 275, 37481–37487. 69 LI, S. J. and HOCHSTRASSER, M. The Ulp1 SUMO isopeptidase: distinct domains required for viability, nuclear envelope localization, and substrate specificity, J Cell Biol, 2003, 160, 1069–1081. 70 KINNER, A. and KOLLING, R. The yeast deubiquitinating enzyme Ubp16 is anchored to the outer mitochondrial membrane, FEBS Lett, 2003, 549, 135–140. 71 DUPRE, S. and HAGUENAUER-TSAPIS, R. Deubiquitination step in the endocytic pathway of yeast plasma membrane proteins: crucial role of Doa4p ubiquitin
72
73
74
75
76
77
78
79
80
81
isopeptidase, Mol Cell Biol, 2001, 21, 4482–4494. WANG, Y., SATOH, A., WARREN, G., and MEYER, H. H. VCIP135 acts as a deubiquitinating enzyme during p97-p47-mediated reassembly of mitotic Golgi fragments, J Cell Biol, 2004, 164, 973–978. TIAN, Q. B., OKANO, A., NAKAYAMA, K., MIYAZAWA, S., ENDO, S., and SUZUKI, T. A novel ubiquitin-specific protease, synUSP, is localized at the post-synaptic density and post-synaptic lipid raft, J Neurochem, 2003, 87, 665–675. FISCHER-VIZE, J. A., RUBIN, G. M., and LEHMANN, R. The fat facets gene is required for Drosophila eye and embryo development, Development, 1992, 116, 985–1000. LIU, N., DANSEREAU, D. A., and LASKO, P. Fat facets interacts with vasa in the Drosophila pole plasm and protects it from degradation, Curr Biol, 2003, 13, 1905–1909. ZHU, Y., PLESS, M., INHORN, R., MATHEY-PREVOT, B., and D’ANDREA, A. D. The murine DUB-1 gene is specifically induced by the betac subunit of interleukin-3 receptor, Mol Cell Biol, 1996, 16, 4808–4817. ZHU, Y., LAMBERT, K., CORLESS, C., COPELAND, N. G., GILBERT, D. J., JENKINS, N. A., and D’ANDREA, A. D. DUB-2 is a member of a novel family of cytokine-inducible deubiquitinating enzymes, J Biol Chem, 1997, 272, 51–57. LOEB, K. R. and HAAS, A. L. The interferon-inducible 15-kDa ubiquitin homolog conjugates to intracellular proteins, J Biol Chem, 1992, 267, 7806–7813. FINLEY, D., OZKAYNAK, E., and VARSHAVSKY, A. The yeast polyubiquitin gene is essential for resistance to high temperatures, starvation, and other stresses, Cell, 1987, 48, 1035–1046. FINLEY, D., BARTEL, B., and VARSHAVSKY, A. The tails of ubiquitin precursors are ribosomal proteins whose fusion to ubiquitin facilitates ribosome biogenesis, Nature, 1989, 338, 394–401. AMERIK, A. Y., LI, S. J., and HOCHSTRASSER, M. Analysis of the
References
82
83
84
85
86
87
88
89
90
deubiquitinating enzymes of the yeast Saccharomyces cerevisiae, Biol Chem, 2000, 381, 981–992. LAMMER, D., MATHIAS, N., LAPLAZA, J. M., JIANG, W., LIU, Y., CALLIS, J., GOEBL, M., and ESTELLE, M. Modification of yeast Cdc53p by the ubiquitin-related protein rub1p affects function of the SCFCdc4 complex, Genes Dev, 1998, 12, 914–926. LINGHU, B., CALLIS, J., and GOEBL, M. G. Rub1p processing by Yuh1p is required for wild-type levels of Rub1p conjugation to Cdc53p, Eukaryot Cell, 2002, 1, 491–494. SWAMINATHAN, S., AMERIK, A. Y., and HOCHSTRASSER, M. The Doa4 deubiquitinating enzyme is required for ubiquitin homeostasis in yeast, Mol Biol Cell, 1999, 10, 2583–2594. AMERIK, A. Y., NOWAK, J., SWAMINATHAN, S., and HOCHSTRASSER, M. The Doa4 deubiquitinating enzyme is functionally linked to the vacuolar protein-sorting and endocytic pathways, Mol Biol Cell, 2000, 11, 3365–3380. AMERIK, A., SWAMINATHAN, S., KRANTZ, B. A., WILKINSON, K. D., and HOCHSTRASSER, M. In vivo disassembly of free polyubiquitin chains by yeast Ubp14 modulates rates of protein degradation by the proteasome, Embo J, 1997, 16, 4826–4838. KAWAKAMI, T., CHIBA, T., SUZUKI, T., IWAI, K., YAMANAKA, K., MINATO, N., SUZUKI, H., SHIMBARA, N., HIDAKA, Y., OSAKA, F., OMATA, M., and TANAKA, K. NEDD8 recruits E2-ubiquitin to SCF E3 ligase, Embo J, 2001, 20, 4003–4012. DANIEL, J. A., TOROK, M. S., SUN, Z. W., SCHIELTZ, D., ALLIS, C. D., YATES, J. R., 3rd, and GRANT, P. A. Deubiquitination of histone H2B by a yeast acetyltransferase complex regulates transcription, J Biol Chem, 2004, 279, 1867–1871. HOEGE, C., PFANDER, B., MOLDOVAN, G. L., PYROWOLAKIS, G., and JENTSCH, S. RAD6-dependent DNA repair is linked to modification of PCNA by ubiquitin and SUMO, Nature, 2002, 419, 135–141. GROLL, M., DITZEL, L., LOWE, J., STOCK, D., BOCHTLER, M., BARTUNIK, H. D., and HUBER, R. Structure of 20S proteasome
91
92
93
94
95
96
97
98
99
100
from yeast at 2.4 A resolution, Nature, 1997, 386, 463–471. LAM, Y. A., XU, W., DEMARTINO, G. N., and COHEN, R. E. Editing of ubiquitin conjugates by an isopeptidase in the 26S proteasome, Nature, 1997, 385, 737–740. GUTERMAN, A. and GLICKMAN, M. H. Complementary roles for Rpn11 and Ubp6 in deubiquitination and proteolysis by the proteasome, J Biol Chem, 2004, 279, 1729–1738. EYTAN, E., ARMON, T., HELLER, H., BECK, S., and HERSHKO, A. Ubiquitin C-terminal hydrolase activity associated with the 26 S protease complex, J Biol Chem, 1993, 268, 4668–4674. LEGGETT, D. S., HANNA, J., BORODOVSKY, A., CROSAS, B., SCHMIDT, M., BAKER, R. T., WALZ, T., PLOEGH, H., and FINLEY, D. Multiple associated proteins regulate proteasome structure and function, Mol Cell, 2002, 10, 495–507. CHERNOVA, T. A., ALLEN, K. D., WESOLOSKI, L. M., SHANKS, J. R., CHERNOFF, Y. O., and WILKINSON, K. D. Pleiotropic effects of Ubp6 loss on drug sensitivities and yeast prion are due to depletion of the free ubiquitin pool, J Biol Chem, 2003, 278, 52102–52115. BAKER, R. T., TOBIAS, J. W., and VARSHAVSKY, A. Ubiquitin-specific proteases of Saccharomyces cerevisiae. Cloning of UBP2 and UBP3, and functional analysis of the UBP gene family, J Biol Chem, 1992, 267, 23364–23375. TOBIAS, J. W. and VARSHAVSKY, A. Cloning and functional analysis of the ubiquitin-specific protease gene UBP1 of Saccharomyces cerevisiae, J Biol Chem, 1991, 266, 12021–12028. KARIN, M. and BEN-NERIAH, Y. Phosphorylation meets ubiquitination: the control of NF-[kappa]B activity, Annu Rev Immunol, 2000, 18, 621–663. ZHOU, H., WERTZ, I., O’ROURKE, K., ULTSCH, M., SESHAGIRI, S., EBY, M., XIAO, W., and DIXIT, V. M. Bcl10 activates the NF-kappaB pathway through ubiquitination of NEMO, Nature, 2004, 427, 167–171. YARON, A., GONEN, H., ALKALAY, I., HATZUBAI, A., JUNG, S., BEYTH, S.,
27
28 The Deubiquitinating Enzymes
101
102
103
104
105
106
107
MERCURIO, F., MANNING, A. M., CIECHANOVER, A., and BEN-NERIAH, Y. Inhibition of NF-kappa-B cellular function via specific targeting of the I-kappa-B-ubiquitin ligase, Embo J, 1997, 16, 6486–6494. WANG, C., DENG, L., HONG, M., AKKARAJU, G. R., INOUE, J., and CHEN, Z. J. TAK1 is a ubiquitin-dependent kinase of MKK and IKK, Nature, 2001, 412, 346–351. TANG, E. D., WANG, C. Y., XIONG, Y., and GUAN, K. L. A role for NF-kappaB essential modifier/IkappaB kinase-gamma (NEMO/IKKgamma) ubiquitination in the activation of the IkappaB kinase complex by tumor necrosis factor-alpha, J Biol Chem, 2003, 278, 37297–37305. TROMPOUKI, E., HATZIVASSILIOU, E., TSICHRITZIS, T., FARMER, H., ASHWORTH, A., and MOSIALOS, G. CYLD is a deubiquitinating enzyme that negatively regulates NF-kappaB activation by TNFR family members, Nature, 2003, 424, 793–796. BRUMMELKAMP, T. R., NIJMAN, S. M., DIRAC, A. M., and BERNARDS, R. Loss of the cylindromatosis tumour suppressor inhibits apoptosis by activating NF-kappaB, Nature, 2003, 424, 797– 801. CIECHANOVER, A. and BRUNDIN, P. The ubiquitin proteasome system in neurodegenerative diseases: sometimes the chicken, sometimes the egg, Neuron, 2003, 40, 427–446. LIU, Y., FALLON, L., LASHUEL, H. A., LIU, Z., and LANSBURY, P. T., Jr. The UCH-L1 gene encodes two opposing enzymatic activities that affect alpha-synuclein degradation and Parkinson’s disease susceptibility, Cell, 2002, 111, 209–218. LEROY, E., BOYER, R., AUBURGER, G., LEUBE, B., ULM, G., MEZEY, E., HARTA, G., BROWNSTEIN, M. J., JONNALAGADA, S., CHERNOVA, T., DEHEJIA, A., LAVEDAN, C., GASSER, T., STEINBACH, P. J., WILKINSON, K. D., and POLYMEROPOULOS, M. H. The ubiquitin pathway in Parkinson’s disease, Nature, 1998, 395, 451–452.
108 LOWE, J., MCDERMOTT, H., LANDON, M., MAYER, R. J., and WILKINSON, K. D. Ubiquitin carboxyl-terminal hydrolase (PGP 9.5) is selectively present in ubiquitinated inclusion bodies characteristic of human neurodegenerative diseases, J Pathol, 1990, 161, 153–160. 109 WILSON, S. M., BHATTACHARYYA, B., RACHEL, R. A., COPPOLA, V., TESSAROLLO, L., HOUSEHOLDER, D. B., FLETCHER, C. F., MILLER, R. J., COPELAND, N. G., and JENKINS, N. A. Synaptic defects in ataxia mice result from a mutation in Usp14, encoding a ubiquitin-specific protease, Nat Genet 2002, 32, 420–425. 110 GARCIA-MATA, R., GAO, Y. S., and SZTUL, E. Hassles with taking out the garbage: aggravating aggresomes, Traffic, 2002, 3, 388–396. 111 KAWAGUCHI, Y., KOVACS, J. J., MCLAURIN, A., VANCE, J. M., ITO, A., and YAO, T. P. The deacetylase HDAC6 regulates aggresome formation and cell viability in response to misfolded protein stress, Cell, 2003, 115, 727–738. 112 SAIGOH, K., WANG, Y. L., SUH, J. G., YAMANISHI, T., SAKAI, Y., KIYOSAWA, H., HARADA, T., ICHIHARA, N., WAKANA, S., KIKUCHI, T., and WADA, K. Intragenic deletion in the gene encoding ubiquitin carboxy-terminal hydrolase in gad mice, Nat Genet 1999, 23, 47–51. 113 KURIHARA, L. J., SEMENOVA, E., LEVORSE, J. M., and TILGHMAN, S. M. Expression and functional analysis of Uch-L3 during mouse development, Mol Cell Biol, 2000, 20, 2498–2504. 114 KURIHARA, L. J., KIKUCHI, T., WADA, K., and TILGHMAN, S. M. Loss of Uch-L1 and Uch-L3 leads to neuro-degeneration, posterior paralysis and dysphagia, Hum Mol Genet, 2001, 10, 1963–1970. 115 BORODOVSKY, A., KESSLER, B. M., CASAGRANDE, R., OVERKLEEFT, H. S., WILKINSON, K. D., and PLOEGH, H. L. A novel active site-directed probe specific for deubiquitylating enzymes reveals proteasome association of USP14, Embo J, 2001, 20, 5187–5196. 116 HEMELAAR, J., LELYVELD, V. S., KESSLER, B. M., and PLOEGH, H. L. A single protease,
References
117
118
119
120
121
122
123
124
Apg4B, is specific for the autophagy-related ubiquitin-like proteins GATE-16, MAP1-LC3, GABARAP, and Apg8L, J Biol Chem, 2003, 278, 51841–51850. RUSSELL, N. S. and WILKINSON, K. D. Identification of a novel 29-linked polyubiquitin binding protein, ufd3, using polyubiquitin chain analogues, Biochemistry, 2004, 43, 4844–4854. MOAZED, D. and JOHNSON, D. A deubiquitinating enzyme interacts with SIR4 and regulates silencing in S. cerevisiae, Cell, 1996, 86, 667–677. NAVIGLIO, S., MATTECUCCI, C., MATOSKOVA, B., NAGASE, T., NOMURA, N., DI FIORE, P. P., and DRAETTA, G. F. UBPY: a growth-regulated human ubiquitin isopeptidase, Embo J, 1998, 17, 3241–3250. VALERO, R., BAYES, M., FRANCISCA SANCHEZ-FONT, M., GONZALEZ-ANGULO, O., GONZALEZ-DUARTE, R., and MARFANY, G. Characterization of alternatively spliced products and tissue-specific isoforms of USP28 and USP25, Genome Biol, 2001, 2, RESEARCH0043. LEE, E. G., BOONE, D. L., CHAI, S., LIBBY, S. L., CHIEN, M., LODOLCE, J. P., and MA, A. Failure to regulate TNF-induced NF-kappaB and cell death responses in A20-deficient mice, Science, 2000, 289, 2350–2354. SOARES, L., SEROOGY, C., SKRENTA, H., ANANDASABAPATHY, N., LOVELACE, P., CHUNG, C. D., ENGLEMAN, E., and FATHMAN, C. G. Two isoforms of otubain 1 regulate T cell anergy via GRAIL, Nat Immunol 2004, 5, 45–54. LI, S. J. and HOCHSTRASSER, M. The yeast ULP2 (SMT4) gene encodes a novel protease specific for the ubiquitin-like Smt3 protein, Mol Cell Biol, 2000, 20, 2367–2377. KIM, K. I., BAEK, S. H., JEON, Y. J., NISHIMORI, S., SUZUKI, T., UCHIDA, S., SHIMBARA, N., SAITOH, H., TANAKA, K., and CHUNG, C. H. A new SUMO-1-specific protease, SUSP1, that is highly expressed in reproductive
125
126
127
128
129
130
131
132
133
organs, J Biol Chem, 2000, 275, 14102–14106. BURNETT, B., LI, F., and PITTMAN, R. N. The polyglutamine neuro-degenerative protein ataxin-3 binds polyubiquitylated proteins and has ubiquitin protease activity, Hum Mol Genet, 2003, 12, 3195–3205. WADA, H., KITO, K., CASKEY, L. S., YEH, E. T., and KAMITANI, T. Cleavage of the C-terminus of NEDD8 by UCH-L3, Biochem Biophys Res Commun, 1998, 251, 688–692. CABALLERO, O. L., RESTO, V., PATTURAJAN, M., MEERZAMAN, D., GUO, M. Z., ENGLES, J., YOCHEM, R., RATOVITSKI, E., SIDRANSKY, D., and JEN, J. Interaction and colocalization of PGP9.5 with JAB1 and p27(Kip1), Oncogene, 2002, 21, 3003–3010. FIORANI, P., REID, R. J., SCHEPIS, A., JACQUIAU, H. R., GUO, H., THIMMAIAH, P., BENEDETTI, P., and BJORNSTI, M. A. The deubiquitinating enzyme Doa4p protects cells from DNA topoisomerase I poisons, J Biol Chem, 2004, 20, 21271–21281. BREW, C. T. and HUFFAKER, T. C. The yeast ubiquitin protease, Ubp3p, promotes protein stability, Genetics, 2002, 162, 1079–1089. VERMA, R., CHEN, S., FELDMAN, R., SCHIELTZ, D., YATES, J., DOHMEN, J., and DESHAIES, R. J. Proteasomal proteomics: identification of nucleotide-sensitive proteasome-interacting proteins by mass spectrometric analysis of affinity-purified proteasomes, Mol Biol Cell, 2000, 11, 3425–3439. HADARI, T., WARMS, J. V., ROSE, I. A., and HERSHKO, A. A ubiquitin C-terminal isopeptidase that acts on polyubiquitin chains. Role in protein degradation, J Biol Chem, 1992, 267, 719–727. HONG, S., KIM, S. J., KA, S., CHOI, I., and KANG, S. USP7, a ubiquitin-specific protease, interacts with ataxin-1, the SCA1 gene product, Mol Cell Neurosci, 2002, 20, 298–306. GNESUTTA, N., CERIANI, M., INNOCENTI, M., MAURI, I., ZIPPEL, R., STURANI, E., BORGONOVO, B., BERRUTI, G., and
29
30 The Deubiquitinating Enzymes
MARTEGANI, E. Cloning and characterization of mouse UBPy, a deubiquitinating enzyme that interacts with the ras guanine nucleotide exchange factor CDC25(Mm)/ Ras-GRF1, J Biol Chem, 2001, 276, 39448–39454.
134 WANG, G., SAWAI, N., KOTLIAROVA, S., KANAZAWA, I., and NUKINA, N. Ataxin-3, the MJD1 gene product, interacts with the two human homologs of yeast DNA repair protein RAD23, HHR23A and HHR23B, Hum Mol Genet, 2000, 9, 1795–1803.
1
The 26S Proteasome Martin Rechsteiner University of Utah Medical School, Salt Lake City, USA
Originally published in: Protein Degradation, Volume 1. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30837-8 The 26S proteasome is a large ATP-dependent protease composed of more than 30 different polypeptide chains. Like the ribosome, the 26S proteasome is assembled from two “subunits”, the 19S regulatory complex and the 20S proteasome. The 19S regulatory complex confers the ability to recognize and unfold protein substrates, and the 20S proteasome provides the proteolytic activities needed to degrade the substrates. The 26S proteasome is the only enzyme known to degrade ubiquitylated proteins, and it also degrades intracellular proteins that have not been marked by ubiquitin. The 26S proteasome is located in the nucleus and cytosol of eukaryotic cells, where the enzyme is responsible for the selective degradation of a vast number of important cellular proteins. Because rapid proteolysis is a pervasive regulatory mechanism, the 26S proteasome is essential for the proper functioning of many physiological processes.
1 Introduction
It has become apparent since the mid-1990s that the ubiquitin-proteasome system (UPS) plays a major regulatory role in eukaryotic cells. The UPS helps to control such important physiological processes as the cell cycle [1, 2], circadian rhythms [3], axon guidance [4], synapse formation [5] and transcription [6–8], to name just a few. In view of the growing family of ubiquitin-like proteins [9, 10], it is possible that covalent attachment of ubiquitin and its relatives will even surpass phosphorylation as a regulatory mechanism. Although ubiquitin serves non-proteolytic roles, such as histone modification [11, 12] the activation of cell signaling components [13], endocytosis [14], or viral budding [15], the protein’s principal function appears to be targeting proteins for destruction [16]. To do this, the C-terminus of ubiquitin is activated by an ATP-consuming enzyme (E1) and transferred to one of several
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The 26S Proteasome
Fig. 1 Schematic representation of the Ubiquitin–Proteasome pathway. Ubiquitin molecules are activated by an E1 enzyme (shown green at 1/3 scale) in an ATPdependent reaction, transferred to a cysteine residue (yellow) on an E2 or Ub carrier protein and subsequently attached to
amino groups (NH2 ) on a substrate protein (lysozyme shown in purple) by an E3 or ubiquitin ligase, (the multicolored SCF complex). Note that chains of Ub are generated on the substrate, and these are recognized by the 26S proteasome depicted in the upper right at 1/20 scale.
dozen or more small carrier proteins (E2s) in the form of a reactive thiol ester. The E2s collaborate with members of several large families of ubiquitin ligases or E3s, and ubiquitin is transferred once again to lysine amino groups on the proteolytic substrates (S) and to itself, thereby generating chains of ubiquitin. The substrate bearing the ubiquitin chains is subsequently recognized and degraded by the 26S proteasome, a large, complex ATP-dependent protease (see Figure 1). Eukaryotic genomes contain information for more than 20 E2s and hundreds of E3s. In contrast to the wealth of components devoted to marking protein substrates for destruction, only one enzyme, the 26S proteasome, has been found to degrade ubiquitylated proteins. However, there is complexity here as well, since the 26S proteasome is an assemblage of at least 30 different subunits. Moreover, there is
2 The 20S Proteasome
a growing list of proteins that act as proteasome activators, adapters, or accessory factors. In this article I focus on basic biochemical and physiological properties of the 26S proteasome, drawing occasionally from findings on structurally similar prokaryotic, ATP-dependent proteases [17].
2 The 20S Proteasome 2.1 Structure
We know the molecular anatomy of archaebacterial, yeast and bovine proteasomes in great detail since high-resolution crystal structures have been determined for all three enzymes [18–20]. The archaebacterial proteasome is composed of two kinds of subunits, called α and β. Each subunit forms heptameric rings that assemble into the 20S proteasome by stacking four deep on top of one another to form a “hollow” cylinder. Catalytically inactive α rings form the ends of the cylinder while proteolytic β subunits occupy the two central rings. The quaternary structure of the 20S proteasome can therefore be described as α7β7β7α7. The active sites of the β-subunits face a large central chamber about the size of serum albumin. The αrings seal off the central proteolytic chamber and two smaller antechambers from the external solvent. Archaebacterial proteasomes, with their fourteen identical β subunits, preferentially hydrolyze peptide bonds following hydrophobic amino acids and are therefore said to have chymotrypsin-like activity [21]. Eukaryotic proteasomes maintain the overall structure of the archaebacterial enzyme, but they exhibit a more complicated subunit composition. There are seven different α-subunits and at least seven distinct β-subunits arranged in a precise order within their respective rings. Although current evidence indicates that only three of its seven β-subunits are catalytically active, the eukaryotic proteasome cleaves a wider range of peptide bonds, containing, as it does, two copies each of trypsin-like, chymotrypsin-like and post-glutamyl-hydrolyzing subunits. For this reason it is capable of cleaving almost any peptide bond, having difficulty only with proline–X, glycine–X and to a lesser extent with glutamine–X bonds [22]. 2.2 Enzyme Mechanism and Proteasome Inhibitors
Whereas standard proteases use serine, cysteine, aspartate, or metals to cleave pep-tide bonds, the proteasome employs an unusual catalytic mechanism. Nterminal threonine residues are generated by self-removal of short peptide extensions from the active β-subunits and act as nucleophiles during peptide-bond hydrolysis [23]. Given its unusual catalytic mechanism, it is not surprising that there are highly specific inhibitors of the proteasome. The fungal metabolite lactacystin
3
4 The 26S Proteasome
and the bacterial product epoxomicin covalently modify the active-site threonines and inhibit the enzyme [24, 25]. Other inhibitors include vinylsulfones [26] and various peptide aldehydes, which are generally less specific. A peptide boronate inhibitor of the proteasome, Velcade, has been approved for the treatment of multiple myeloma [27]. 2.3 Immunoproteasomes
Interferon γ (IFNγ ) is an immune cytokine that increases expression of a number of cellular components involved in Class I antigen presentation [28]. Among the IFNγ -inducible components are three catalytically active β-subunits of the proteasome, called LMP2, LMP7 and MECL1 [29]. Each replaces its corresponding constitutive subunit resulting in altered peptide-bond cleavage preferences of 20S immunoproteasomes. For example, immunoproteasomes exhibit much reduced cleavage after acidic residues and enhanced hydrolysis of peptide bonds following branch-chain amino acids such as isoleucine or valine. Class I molecules preferentially bind peptides with hydrophobic or positive C-termini, and proteasomes generate the vast majority of Class I peptides [28, 30]. Hence, the observed β-subunit exchanges are well suited for producing peptides able to bind Class I molecules.
3 The 26S Proteasome 3.1 The Ubiquitin–Proteasome System
Bacteria express as many as five ATP-dependent proteases, all of which contain nucleotide-binding domains that belong to the AAA+ family of ATPases [31]. By contrast, the 26S proteasome is the only ATP-dependent protease discovered so far in the nuclear and cytosolic compartments of eukaryotic cells. Because the 20S proteasome’s internal cavities are inaccessible to intact proteins, openings must be generated in the enzyme’s outer surface for proteolysis to occur. A number of protein complexes have been found to bind the proteasome and stimulate peptide hydrolysis (see Figure 2). The most important of the proteasome-associated components is the 19S regulatory complex (RC) for it is a major part of the 26S ATP-dependent enzyme that degrades ubiquitin-tagged proteins in eukaryotic cells. In Figure 2 the 20S proteasome is shown binding only one 19S RC although doublycapped 26S proteasomes also exist (see Figure 3 below). The 20S proteasome also binds activators such as PA28 or PA200. Each of these activators can be present in 26S proteasome complexes forming what are called hybrid proteasomes. Finally a protein called Ecm29p has been found associated with the 26S proteasome. Ecm29p is thought to act as an adapter coupling the 26S enzyme to secretory vesicles.
3 The 26S Proteasome
Fig. 2 Interaction of the 20S proteasome with other cellular components.
Fig. 3 Electron-microscopic reconstructions of the 26S proteasome. Three images of a doubly-capped 26S proteasome are presented to illustrate the positions of the lid and base subcomplexes of the 19S RC and to identify the most probable location of the RC ATPases.
5
6 The 26S Proteasome
3.2 Ultrastructure of the 26S Proteasome and Regulatory Complex
Electron micrographs of purified 26S proteasomes by Baumeister and colleagues [32] reveal a dumbbell-shaped particle approximately 40 nm in length in which the central 20S proteasome cylinder is capped at one or both ends by asymmetric RCs looking much like Chinese dragonheads (Figure 3). In doubly capped 26S proteasomes the regulatory complexes face in opposite directions, indicating that contact between the proteasome’s α-rings and the RC is highly specific. However, the contacts may not be especially tight since image analysis of Drosophila 26S proteasomes suggests movement of the RCs relative to the 20S proteasome [33]. Electron microscopy (EM) images of the 26S proteasome from several organisms appear similar, indicating that the overall architecture of the enzyme has been conserved in evolution. This conclusion is also supported by sequence conservation among RC subunits (see below). A yeast mutant lacking the RC subunit Rpn10 contains a salt-labile RC that dissociates into two subcomplexes called the lid and the base [34]. The base contains nine RC subunits, which include six ATPases described below, the two largest RC subunits S1 and S2, and S5a; the lid contains the remaining RC subunits. Thus the RC is composed of two sub-complexes separated on one side by a cavity, i.e. the dragon’s mouth (see Figure 3). Ultrastructural studies have also been performed on the lid and on a related protein complex called the COP9 signalosome [35]. Both particles lack obvious symmetry. Some particles exhibit a negative stainfilled, central groove; other classes of particles exhibit seven or eight lobes in a disclike arrangement. Since both the lid and the COP9 signalosome are composed of eight subunits, the lobes may represent individual subunits. 3.3 The 19S Regulatory Complex
The regulatory complex is also called the 19S cap, PA700, and the µ particle. As its most commonly used name suggests, the 19S RC is roughly the same size as the 20S proteasome. In fact it is a more complicated protein assembly containing 17 different subunits ranging in size from 25 kDa to about 110 kDa. In animal cells the subunits are designated S1 through S15. Homologs for each of these subunits are present in budding yeast where an alternate nomenclature has been adopted (see Table 1). Sequences for the 17 RC subunits permit their classification into a group of 6 ATPases and another group containing the 11 non-ATPases. 3.4 ATPases of the RC
The six ATPases belong to the rather large family of AAA ATPases (for ATPases Associated with a variety of cellular Activities) whose eukaryotic members include the motor protein dynein, the membrane fusion factor NSF, and the chaperone VCP/Cdc48 and whose prokaryotic members include five ATP-dependent pro-
3 The 26S Proteasome Table 1 Subunits of the 19S regulatory complex
Mammalian nomenclature Yeast nomenclature Function
Motifs
S1 S2 S3 p55 S4 S5a S5b S6 S6 S7 S8 S9 S10a S10b S11 S12 S13 S14
Leu-rich repeats, KEKE Leu-rich repeats, KEKE PCI PCI AAA nucleotidase UIM, KEKE
Rpn2 Rpn1 Rpn3 Rpn5 Rpt2 Rpn10 none Rpt3 Rpt5 Rpt1 Rpt6 Rpn6 Rpn7 Rpt4 Rpn9 Rpn8 Rpn11 Rpn12
Ub/UbL binding Ub/UbL binding ? ? ATPase polyubiquitin binding ? ATPase ATPase ATPase ATPase ? ? ATPase ? ? Isopeptidase ?
AAA nucleotidase AAA nucleotidase AAA nucleotidase AAA nucleotidase PCI PCI, KEKE AAA nucleotidase PCI MPN, KEKE MPN PCI
teases [31]. The six ATPases, denoted S4, S6, S6 , S7, S8, and S10b in mammals, are about 400 amino acids in length and homologous to one another. Based on their sequences, one can distinguish three major regions: (1) A central nucleotidebinding domain of about 200 amino acids, which is roughly 60% identical among members of the RC subfamily; (2) the C-terminal region, approximately 100 amino acids in length and with a lesser, though significant, degree of conservation (40%); and (3) a highly divergent N-terminal region (<20% identity) around 120 amino acids in length; this region contains heptad repeats characteristic of coiled-coil proteins. Despite sequence differences among RC ATPases within an organism, each ATPase has been conserved during evolution with specific subunits being almost 75% identical between yeast and humans. The high degree of conservation encompasses the entire sequence, making it likely that even the divergent N-terminal regions play an important role in RC function. Conceivably they are used to select substrates for degradation by the 26S proteasome [36]. Alternatively the variable N-terminal regions in the RC ATPases may promote assembly of the RC by the specific placement of the ATPase subunits within the complex. In this regard, the six ATPases associate with one another in highly specific pairs: S4 binds S7, S6 binds S8, and S6 binds to S10b. Moreover, the N-terminal regions of RC ATPases are required for partner-specific binding [37]. Staining patterns of two-dimensional gels show the six RC ATPases to be present at comparable levels, and affinity capture of yeast 26S proteasomes indicate the presence of one copy of each in the regulatory complex. Mutational analysis in yeast demonstrates that the ATPases are not functionally redundant since mutation of yeast S4 has a particularly profound effect on peptide hydrolysis [38]. It is probable that the ATPases form a hexameric ring like other members of the AAA
7
8 The 26S Proteasome
family of ATPases such as NSF or VCP/Cdc48. But this assumption has not been experimentally verified, and pentameric or heptameric arrangements of AAA ATPase complexes have been reported [39, 40]. Finally it is quite likely that the ATP-ases directly bind the α-ring of the 20S proteasome (see Figure 3). Evidence favoring this arrangement comes from chemical cross-linking experiments [41] and the presence of the ATPases in the base subcomplex of the yeast 26S proteasome [34]. 3.5 The non-ATPase Subunits
Whereas the six RC ATPases are homologous and relatively uniform in size, the non-ATPases are heterogeneous in size and sequence. Nonetheless, they can be grouped on the basis of their location, on the presence or absence of certain sequence motifs, and on their affinity for ubiquitin chains or ubiquitin-like proteins. Eight RC subunits are found in the lid subcomplex, each being homologous to a subunit in a separate protein complex called the COP9 Signalosome [42–44]. One of the lid subunits, S13, is a metalloisopeptidase that removes ubiquitin chains from the tagged substrate prior to its translocation into the proteasome for degradation. Six of the eight lid subunits contain PCI domains, stretches of about 200 residues so named from their occurrence in Proteasome, Cop9 signalosome, and the eukaryotic Initiation factor 3 subunits. The PCI domains are thought to mediate subunit-subunit interactions. Two lid subunits contain 140 amino acid-long MPN domains (Mpr1p and Pad1p N-terminal regions) with one of these subunits being the S13 isopeptidase. Although several models have been proposed [45–48], the arrangement of subunits within the lid subcomplex is not known. Functionally, only S13 stands out because of its isopeptidase activity. The presence of S13 in the lid explains why the lid is necessary for degradation of ubiquitylated proteins even though the RC base complex supports the ATP-dependent degradation of some small non-ubiquitylated proteins [34]. Interestingly the COP9 signalosome also exhibits isopeptidase activity that removes the ubiquitin-like protein, NEDD8, from certain ubiquitin ligases [49]. Biochemical activity has not been assigned to any of the remaining seven lid subunits, although two lid subunits, S3 and S9, are critical for the degradation of specific substrates [50, 51]. In addition to the six ATPases the base sub-complex contains the two largest RC subunits (S1, S2) and a smaller subunit called S5a [52]. Besides their common location, these three subunits share the property of binding polyubiquitin chains or ubiquitin-like (UbL) domains. S5a binds polyubiquitin chains even after it has been transferred from SDS-PAGE gels and displays many features that match polyubiquitin recognition by the 26S proteasome [52]. However, S5a cannot be the only ubiquitin-recognition component in the 26S proteasome because deletion of the gene encoding yeast S5a (Rpn10) has only a modest impact on proteolysis [53]. This strongly suggests that there are other ubiquitin-recognition components in the RC, with S1 and S2 being prime candidates. S1 and S2 display significant homology to each other, and both can be modeled as α-helical toroids [54]. They have been shown to bind the UbL domains of RAD23 and Dsk2, adapter proteins
3 The 26S Proteasome
that target ubiquitylated proteins to the 26S proteasome [55, 56]. It has also been found that the S6 ATPase can be cross-linked to ubiquitin [57]. Currently then it appears that the RC contains three or possibly four subunits able to recognize ubiquitin or UbL proteins. As discussed below there are other ways in which the RC can select proteins for destruction. 3.6 Biochemical Properties of the Regulatory Complex 3.6.1 Nucleotide Hydrolysis Both the 26S proteasome and the RC hydrolyze all four nucleotide triphosphates, with ATP and CTP preferred over GTP and UTP [58]. Although ATP hydrolysis is required for conjugate degradation, the two processes are not strictly coupled. Complete inhibition of the peptidase activity of the 26S proteasome by calpain inhibitor I has little effect on the ATPase activity of the enzyme. The nucleotidase activities of the RC and the 26S proteasome closely resemble those of E. coli Lon protease, which is composed of identical subunits that possess both proteolytic and nucleotidase activities in the same polypeptide chain. Like the regulatory complex and 26S proteasome, Lon hydrolyzes all four ribonucleotide triphosphates, but not ADP or AMP [18]. 3.6.2 Chaperone-like Activity AAA nucleotidases share the common property of altering the conformation or association state of proteins, so it is not surprising that the RC has been shown to prevent aggregation of several denatured proteins including citrate synthase and ribonuclease A [59–61]. The chaperone activity of the RC may explain why the RC plays a role in transcription apparently in the absence of an attached 20S proteasome [62]. 3.6.3 Proteasome Activation The 20S proteasome is a latent protease owing to the barrier imposed by the αsubunit rings on peptide entry. Consequently, a readily measured activity of the RC is activation of fluorogenic peptide hydrolysis by the 20S proteasome. The extent of activation is generally found to be in the range 3- to 20-fold [63]. Activation is relatively uniform for all three proteasome catalytic subunits and presumably reflects opening by the attached RC of a channel leading to the proteasome’s central chamber. 3.6.4 Ubiquitin Isopeptide Hydrolysis The channel through the proteasome α-ring into the central chamber measures 1.3 nm in diameter, a size too small to permit passage of a folded protein, even one as small as ubiquitin. This consideration, coupled with the fact that ubiquitin is recycled intact upon substrate degadation, requires an enzyme to remove the polyubiquitin chain prior to or concomitant with proteolysis. Several isopeptidases that remove ubiquitin from substrates have been found associated with the 26S
9
10 The 26S Proteasome
proteasome [64–66]. Of these the ATP-stimulated metalloisopeptidase S13 is an integral component of the enzyme. 3.6.5 Substrate Recognition It is clear that the RC plays a predominant role in selecting proteins for degradation. This important topic is covered below in the context of substrate recognition by both 20S and 26S proteasomes.
4 Substrate Recognition by Proteasomes 4.1 Degradation Signals (Degrons)
The discovery that proteins possess built-in signals targeting them to specific locations within cells was a major success of twentieth-century cell biology [67]. Selective proteolysis can be considered targeting out of existence, and a number of short peptide motifs have been discovered to confer rapid destruction on proteins that bear them. These motifs include PEST sequences [68], the N-terminal amino acid [69], and destruction and KEN boxes [70]. These motifs are recognized by one or more ubiquitin ligases that mark the substrate protein by addition of a polyubiquitin chain. However some proteins are degraded by the 26S proteasome without prior marking by ubiquitin [71]. Denatured proteins are also selectively degraded by both 26S and 20S proteasomes [72]. It is not clear what features of denatured proteins are recognized by proteasomes or by components of the ubiquitin proteolytic system. 4.2 Ubiquitin-dependent Recognition of Substrates
Most well characterized substrates of the 26S proteasome are ubiquitylated proteins so our discussion starts with them. Ubiquitin contains seven lysine residues, and proteomic studies in yeast indicate that chains (or dimers) can be formed using any of them [73]. In higher eukaryotes polyubiquitin chains formed via Lys6, Lys27, Lys29, Lys48, and Lys63 have been observed. Lys6 chains are formed by BRCA/BARD heterodimers where they presumably play a role in DNA repair [74]. Ubiquitin monomers linked to each other through Lys63 are involved in endocytosis and DNA repair, but not apparently in targeting proteins to the 26S proteasome [75, 76]. Lys27 chains have been found on the co-chaperone BAG1, and they target degradation of misfolded proteins bound by the Hsp70 chaperone to the 26S proteasome [77]. Lys29 and Lys48 chains form directly on proteolytic substrates and target them for destruction [78, 79]. Efficient proteolysis of ubiquitylated proteins by the 26S proteasome requires a chain containing at least four ubiquitin monomers [80]. This matches well the ubiquitin-binding characteristics of the RC subunit S5a. It too selectively binds
4 Substrate Recognition by Proteasomes
ubiquitin polymers composed of four or more ubiquitin moieties and exhibits increased affinity for longer chains [52]. S5a molecules from a number of higher eukaryotes contain two independent polyubiquitin-binding sites; each is approximately 30 residues long and characterized by five hydrophobic residues that consist of alternating large and small hydrophobic residues, e.g. Leu-Ala-Leu-Ala-Leu [81]. Similar motifs have been found in other proteins of the ubiquitin system and are now called UIMs, (Ubiquitin Interacting Motifs) [82]. Two recent NMR studies have demonstrated direct interaction between UIMs and the hydrophobic patch on ubiquitin [83–85]. Whereas S5a provides for direct recognition of polyubiquitylated substrates, a second mechanism involves adapter proteins possessing both a UbL domain and one or more UbA (ubiquitin associated) domains [86]. UbA domains are polyubiquitin-binding domains found in several presumed adapter proteins of the ubiquitin system. The adapter proteins include RAD23 and Dsk2 in yeast and recruit substrates to the 26S proteasome. The UbL domains of these proteins bind to 26S proteasome subunits while their UbA domains bind substrate-tethered Ub chains. In yeast the RC subunits S1 and S2 serve as UbL-binding components [55, 87]; in mammals S5a serves this purpose [88–90]. The co-chaperone BAG1 illustrates a third way in which polyubiquitin can target substrate proteins to the 26S proteasome. In this case the substrate is not polyubiquitylated; rather it is bound to the chaperone Hsp70. A polyubiquitin chain, linked through Lys27, is attached to the Hsp70-associated co-chaperone BAG1 [77]. Apparently the Lys27 chain promotes association of the chaperone–substrate complex with the 26S proteasome, after which the substrate is degraded while BAG1, Hsp70, and ubiquitin are recycled. Direct interaction between E3 ubiquitin ligases and RC subunits can also deliver ubiquitylated substrates to the protease. The yeast E3 ligase called UFD4 binds RC ATPases [91] and UFD4-mediated delivery of substrates bypasses the requirement for S5a. In what appears to be a similar delivery system, the mammalian E3 Parkin uses a UbL domain to bind the 26S proteasome [89], and the E3 component pVHL binds a 26S proteasome ATPase [92]. Mutational analyses in yeast have shown that whereas deletion of either S5a or Rad23 has a mild impact on proteolysis, loss of both proteins produces a severe phenotype [93]; furthermore, yeasts lacking S5a, RAD23 and Dsk2 are not viable, indicating that direct delivery by E3 ligases cannot compensate for the absence of all three proteins.
4.3 Substrate Selection Independent of Ubiquitin
The 26S proteasome also degrades non-ubiquitylated proteins [71]. The short-lived enzyme ornithine decarboxylase (ODC) and the cell-cycle regulator p21Cip provide well documented examples of ubiquitin-independent proteolysis by the 26S enzyme [94, 95]. ODC degradation is stimulated by antizyme, a polyamine-induced protein that binds both ODC and the 26S proteasome. Antizyme functions as an adapter much like RAD23 and Dsk2 except that polyubiquitin chains are not involved. However, free ubiquitin chains do compete with antizyme–ODC for degradation
11
12 The 26S Proteasome
[96]. Other potential adapters, such as gankyrin, may target proteins to the 26S proteasome in the absence of ubiquitin marking [97]. The CDK inhibitor p21Cip is degraded in a nonubiquitin-dependent reaction, as clearly demonstrated by substitution of arginines for all the lysine residues in p21Cip. These modifications prevented ubiquitylation of p21Cip, but the lysineless protein was still degraded in human fibroblasts by the proteasome [95]. The Cterminal region of p21Cip binds to the proteasome α-subunit C8, and in vitro p21Cip is degraded by the 20S proteasome alone [98, 99]. Direct binding of p21Cip to the 20S proteasome may open a channel through the α-ring allowing the loosely structured protein to enter the central proteolytic chamber. c-Jun, “aged” calmodulin, troponin C, and p53 are other proteins that can be degraded by the 26S proteasome absent marking by ubiquitin [71]. Thus, other 20S proteasome substrates, in vitro at least, include oxidized proteins, small, denatured proteins and loosely folded proteins such as casein. Whether the 20S proteasome degrades proteins within cells is an unresolved problem.
5 Proteolysis by the 26S Proteasome 5.1 Presumed Mechanism
Proteolysis of ubiquitylated proteins by the 26S proteasome can be thought to consist of seven steps: (1) chaperone-mediated substrate presentation; (2) substrate association with RC subunits; (3) substrate unfolding; (4) detachment of polyubiquitin from the substrate; (5) translocation of the substrate into the 20S proteasome central chamber; (6) peptide bond cleavage; and (7) release of peptide products as well as polyubiquitin (see Figure 4). Step 1 is optional depending on the substrate, and in principle steps 3 and 4 could occur in either order. Step 4 is unnecessary for substrates like ODC that are not ubiquitylated. The other steps almost have to occur as presented. Although it is easy to conceptualize the reaction sequence, few experimental findings bear directly on any of the proposed subreactions, and virtually nothing is known about molecular movements within the 26S proteasome. However several studies on prokaryotic ATP-dependent proteases permit some informed speculation, and it has been shown that step 6 is not required for sequestration of ODC by the 26S proteasome [100]. 5.2 Contribution of Chaperones to Proteasome-mediated Degradation
Chaperones are connected to proteasomes in at least four ways. First, chaperones can deliver substrates to the proteasome as described above for the co-chaperone BAG1 [77]. In a similar fashion the chaperone VCP/Cdc48 is required for the degradation of several ubiquitin-pathway substrates. VCP, a member of the AAA
5 Proteolysis by the 26S Proteasome
Fig. 4 Hypothetical reaction cycle for the 26S proteasome. A polyubiquitylated substrate is delivered to a 26S hybrid proteasome in some cases by chaperones such as VCP/p97 (step 1). Substrate is bound by polyubiquitin recognition components within the regulatory complex (RC) until the polypeptide chain is engaged by the ATPases (step 2). As the polypeptide chain
is unfolded and presumably pumped down the central pore of the proteasome, a signal is conveyed to the S13 metallo-isopeptidase to remove the polyubiquitin chain (step 3). The unfolded polypeptide is eventually degraded within the inner chamber of the proteasome (step 4) and peptide fragments exit the enzyme.
family of ATPases, is a large hexameric ATPase that appears to function as a protein separase able to remove ubiquitylated monomers from multisubunit complexes [101–104]. In some cases the liberated proteins are degraded by the 26S proteasome; in other cases the separated proteins may change their intracellular location. The proteasome also degrades proteins embedded in the endoplasmic reticulum (ER) membrane. If these ER membrane proteins possess a large cytoplasmic domain, their proteasomal degradation can require Hsps 40, 70, and 90 as well as VCP [105, 106]. Hsp90 is required to assemble and stabilize the yeast 26S proteasome providing a third connection between chaperones and proteasomes; Hsp90 is also able to bind and suppress peptide hydrolysis by the 20S proteasome (see below).
13
14 The 26S Proteasome
Finally both chaperones and proteasomes are induced by the accumulation of denatured proteins within eukaryotic cells. 5.2.1 Substrate Binding to the 26S Proteasome Although chaperones may provide for the recognition of some 26S proteasome substrates [107], there is little doubt that the 26S proteasome recognizes substrates directly. As mentioned, ODC is the best-characterized substrate recognized by the 26S proteasome in the absence of a polyubiquitin chain [96]. Which RC subunits actually recognize ODC–antizyme complexes has not been discovered. Presumably one or more subunits in the RC recognize both the C-terminal degron in ODC and some feature of antizyme. The apparent dual recognition of elements in ODC and antizyme may reflect a general mechanism by which ATP-dependent proteases process substrates. For example a number of studies on substrate recognition by E. coli ATP-dependent proteases indicate that substrate adapters provide one recognition site while the substrate provides another [108–110]. Similarly, for the 26S proteasome one recognition element, the attached polyubiquitin chains, may be seen by RC subunits S5a or S2, while the substrate’s N- or C-termini are recognized by RC ATPases. Interestingly, the location of the polyubiquitin chain on the substrate can affect rates of degradation as much as five-fold [111] perhaps by altering the rates at which the RC ATPases engage substrate termini. 5.2.2 Translocation of the Polypeptide Substrate to the Central Proteolytic Chamber Translocation is thought to proceed by the six ATPases threading the polypeptide through a channel in the 20S proteasome’s α-ring. It is also thought that the RC ATPases processively unravel substrates from degrons within the polypeptide chain and are able to “pump” the polypeptide chain in either the N-terminal to C-terminal direction or the opposite [112–114]. Several studies have estimated that hundreds of ATP molecules are needed to degrade small to medium-sized proteins [115, 116]. Rates of translocation range between 10 and 30 amino acids per second [116, 117]. These values are similar to DNA helicases where rates of 50 bases per second and 1ATP per base have been reported [118]. Current models would suggest that the RC ATPases hydrolyze ATP in a sequential rotary fashion essentially screwing the polypeptide chain into the 20S proteasome [119, 120], but the possibility that convulsive movements transfer the substrate has not been ruled out. The ATPases may even be capable of transferring loops into the 20S enzyme. However, a proteomic screen for ClpXP substrates revealed that degrons were either C-terminal or N-terminal [121], so it is likely that the RC ATPases usually engage polypeptide termini. The peptide fragments generated in the central chamber are generally 5 to 10 residues in length, but fragments as long as 35 amino acids can be present [64]. How these fragments exit the central chamber is not known. They could diffuse back through the RC, through small side panels in the 20S proteasome or in the case of hybrid proteasomes (see below) through the end capped by a proteasome activator.
6 Proteasome Biogenesis
5.3 Processing by the 26S Proteasome
In some cases the 26S proteasome partially degrades the substrate protein, releasing processed functional domains. The best-studied example of processing involves the transcriptional activator NFκB. The C-terminal half of a 105-kDa precursor is degraded by the 26S proteasome to yield a 50-kDa N-terminal domain that is the active transcription component [122]. A glycine-rich stretch of amino acids at the C-terminal boundary of p50 is an important factor in limiting proteolysis [123]. It is possible that polypeptide translocation by the RC starts at the Gly-rich region and proceeds in only one direction owing to the presence of the tightly folded N-terminal domain. Or the RC may start translocation at the C-terminus and stop when the ATPases encounter the Gly-rich region. Insertion of a Gly-Ala stretch as small as seven residues into the C-terminal degron of ODC is sufficient to prevent complete destruction of ODC leading instead to partial processing of the enzyme. This has led Coffino and colleagues to suggest that the Gly-Ala stretch impairs substrate transfer by the RC ATPases [124]. Another example of partial processing involves SPT23, a yeast protein embedded in the endoplasmic reticulum membrane [125]. SPT23 controls unsaturated fatty acid levels, and membrane fluidity regulates 26S proteasomal generation of a freely diffusible transcription factor from the SPT23 precursor. Partial processing may be a more widespread regulatory mechanism than is currently thought.
6 Proteasome Biogenesis 6.1 Subunit Synthesis
The synthesis of proteasome subunits is markedly affected by proteasome function. For example, inhibition of proteasome activity by lactacystin induces coordinate expression of both RC and 20S proteasome subunits [66]. Likewise, impaired synthesis of a given RC subunit results in over-expression of all RC subunits [126, 127]. Proteasome subunit synthesis in yeast is controlled by Rpn4p, a short-lived positive transcription factor that binds PACE elements upstream of proteasome genes [128]. Rpn4p is a substrate of the 26S proteasome suggesting that the transcription factor functions in a feedback loop in which proteasome activity limits its concentration thereby regulating proteasome levels [129]. To date an Rpn4-like factor has not been identified in higher eukaryotes, but such a factor is likely to exist. 6.2 Biogenesis of the 20S Proteasome
Proteasome β-subunits are synthesized with N-terminal extensions and are inactive because a free N-terminal threonine is required for peptide-bond hydrolysis [130].
15
16 The 26S Proteasome
The precursor β-subunits assemble with α-subunits to form half proteasomes composed of one α- and one β-ring, which then dimerize to form the 20S particle [131]. The N-terminal extensions are removed thereby generating a new unblocked N-terminal threonine in the catalytically active β-subunits. A small accessory protein called Ump1 in yeast or proteassemblin in mammalian cells assists in the final assembly of the 20S proteasome [132]. Interestingly Ump1/POMP is apparently trapped in the proteasome’s central chamber and degraded upon maturation of the enzyme [133]. 6.3 Biogenesis of the RC
Assembly pathways for the RC are virtually unknown. Asmentioned above, the ATP-ases interact with one another and complexes containing all six S4 subfamily members have been observed following in vitro synthesis. Impaired synthesis of the yeast lid subunit Rpn6 results in the absence of the entire lid [134], so presumably lid and base subcomplexes assemble independently and associate in the final stages of RC formation cells. In mammalian cells, 26S proteasomes assemble from preformed regulatory complexes and 20S proteasomes [135]. 6.4 Post-translational Modification of Proteasome Subunits
Proteasome and RC subunits are subjected to a variety of post-translational modifications including phosphorylation, acetylation, myristoylation, and even Oglycosylation [136–140]. In yeast all seven α-subunits are acetylated as well as two β-subunits. Since acetylation of the N-terminal threonine in an active β-subunit would poison catalysis, it has been suggested that the propeptide extensions function to prevent acetylation [130]. Three members of the S4 ATPase subfamily (S4, S6, and S10b) and two 20S α-subunits (C8 and C9) are known to be phosphorylated. Phosphorylation appears to be particularly important for 26S proteasome assembly and stability. The kinase inhibitor staurosporine reduces 26S proteasome levels in mouse lymphoma cells [135] and interferon γ results in reduced phosphorylation of 20S proteasome α-subunits and decreased 26S proteasome levels [141]. 6.5 Assembly of the 26S Proteasome
The RC and 20S proteasome associate to form the 26S proteasome in the presence of ATP [63]. Comparison of the cross-linking patterns of RC and assembled 26S proteasomes indicates that this association is accompanied by subunit rearrangement [142]. In yeast two proteins play a special role in 26S proteasome assembly or stability. Nob1p is a nuclear protein required for biogenesis of the 26S proteasome and is degraded following assembly of the 26S enzyme. Thus Nob1p suffers the same fate as Ump1 does following 20S maturation [143]. In fission yeast the protein Yin6 regulates both the nuclear localization and the stability of the 26S proteasome
7 Proteasome Activators
[144]. In budding yeast the chaperone Hsp90 also plays a role in the assembly and maintenance of yeast 26S proteasomes, since functional loss of Hsp90 results in 26S proteasome dissociation, indeed even dissociation of the lid subcomplex [145]. The 26S proteasome also dissociates into RC and 20S proteasomes when budding yeast is subjected to long periods of starvation [146].
7 Proteasome Activators
In addition to the RC there are two protein complexes, REGαβ and REGγ , and a single polypeptide chain, PA200, that bind the 20S proteasome and stimulate peptide hydrolysis but not protein degradation. Like the RC, proteasome activators bind the ends of the 20S proteasome and, importantly, they can form mixed or hybrid 26S proteasomes in which one end of the 20S proteasome is associated with a 19S RC and the other is bound to a proteasome activator [147–150]. This latter property raises the possibility that proteasome activators serve to localize the 26S proteasome within eukaryotic cells. 7.1 REGs or PA28s 7.1.1 REGs There are three distinct REG subunits called αβγ [151, 152]. REGsα and β form donut-shaped heteroheptamers found principally in the cytoplasm, whereas REGγ forms a homoheptamer located in the nucleus. REGαβ is abundantly expressed in immune tissues, while REGγ expression is highest in brain. The REGs also differ in their activation properties. REGαβ activates all three proteasome active sites; REGγ only activates the trypsin-like subunit. There is reasonably solid evidence that REGαβ plays a role in Class I antigen presentation [153], but we have little knowledge concerning REGγ function since REGγ knockout mice have almost no phenotype [154]. The crystal structure of REGα reveals that the seven subunits form a donut-shaped structure with a central aqueous channel, and the structure of a REG-proteasome complex provides important insight into the mechanism by which REGα activates the proteasome. The carboxyl tail on each REG subunit fits into a corresponding cavity on the α-ring of the proteasome. Loops on the REG subunits displace N-terminal strands on several proteasome α-subunits reorienting them upward into the aqueous channel of the REG heptamer thereby opening a continuous channel from the exterior solvent to the proteasome central chamber [155]. 7.1.2 PA200 A new proteasome activator called PA200 was recently purified from bovine testis [156]. Human PA200 is a nuclear protein of 1843 amino acids that activates all three catalytic subunits with some preference for the PGPH active site. Homologs
17
18 The 26S Proteasome
of PA200 are present in budding yeast, worms, and plants. A single chain of PA200 can bind each end of the proteasome and, when bound, PA200 molecules look like volcanos in negatively stained EM images. PA200 is thought to play a role in DNA repair, perhaps by recruiting proteasomes to DNA double-strand breaks. 7.1.3 Hybrid Proteasomes As the α-rings at each end of the 20S proteasome are equivalent, the 20S proteasome is capable of binding two RCs, two PA28s, two PA200s, or combinations of these components. In fact 20S proteasomes simultaneously bound to RC and PA28 or PA200 have been observed, and are called hybrid proteasomes [147, 148, 150]. In HeLa cells the levels of hybrid proteasomes containing PA28 at one end and an RC at the other are two-fold higher than 26S proteasomes capped at both ends by 19S RCs [149]. Hybrid 26S proteasomes containing PA200 appear to be much less abundant in HeLa cells [156]. 7.2 ECM29
Another proteasome-associated protein, called Ecm29, has been identified in several proteomic screens [157, 158]. Ecm29p clearly associates with 26S proteasomes; whether it activates proteasomal peptide hydrolysis is currently unknown. Ecm29p is reported to stabilize the yeast 26S proteasome by clamping the RC to the 20S cylinder [159]. However, in mammalian cells Ecm29p is found mainly associated to secretory and endocytic organelles, a location suggesting a role in secretion rather than 26S proteasome stability. Moreover, the levels of Ecm29p vary markedly among mouse organs, so if mammalian Ecm29p serves as a clamp, some tissues either do not require a clamped 26S proteasome or other proteins function as RC-20S clamps.
8 Protein Inhibitors of the Proteasome
A number of proteins have been found to suppress proteolysis by the proteasome. One of these is PI31 [160, 161]; another is the abundant cytosolic chaperone Hsp90 [162], and a third is a proline/arginine-rich 39-residue peptide called PR39 [163]. Both PI31 and Hsp90 may affect how the proteasome functions in Class I antigen presentation. PI31 is a 30-kDa proline-rich protein that inhibits peptide hydrolysis by the 20S proteasome and can block activation by both RC and REGαβ. Although surveys of various cell lines show PI31 to be considerably less abundant than RC or REGαβ, when over-expressed, PI31 is reported to inhibit Class I antigen presentation by interfering with the assembly of immuno-proteasomes [161]. A number of studies have shown that Hsp90 can bind the 20S proteasome and inhibit its chymotrypsin-like and PGPH activities. Interestingly inhibition by Hsp90 is observed with constitutive but not with immuno-proteasomes, a finding consistent
9 Physiological Aspects 19
with proposals that Hsp90 shuttles immuno-proteasome-generated peptides to the endoplasmic reticulum for Class I presentation [164]. PR39 was originally isolated from bone marrow as a factor able to induce angiogenesis and inhibit inflammation. Two hybrid screens showed that PR39 binds the 20S proteasome. Apparently PR39 affects angiogenesis and inflammation by inhibiting respectively the degradation of HIF1 or IκBα, the latter being an inhibitor of NFκB. Finally HIV’s Tat protein inhibits the 20S proteasome’s peptidase activity [165]. Tat also competes with REGαβ for proteasome binding and, by doing so, Tat can inhibit Class I presentation of certain epitopes [166].
9 Physiological Aspects 9.1 Tissue and Subcellular Distribution of Proteasomes
Proteasomes are found in all organs of higher eukaryotes, but the degree to which the composition of proteasomes and its activators varies among tissues is largely unexplored territory. Proteasomes are very abundant in testis which contains almost five-fold more 20S subunits than skeletal muscle [167]. At the cellular level there are about 800 000 proteasomes in a HeLa cell and roughly 20000 proteasomes in a yeast cell. At the subcellular level, 26S proteasomes are present in cytosol and nucleus where they appear to be freely diffusible [168, 169]. They are not usually found in the nucleolus [170] and have not been reported in membrane-bound organelles other than the nucleus. When large amounts of misfolded proteins are synthesized by a cell, the aberrant polypeptides often accumulate around the centrosome in what are called “aggresomes” [171]. Under these conditions 26S proteasomes, chaperones, and proteasome activators also redistribute to the aggresomes, presumably to refold and/or degrade the misfolded polypeptides [172].
9.2 Physiological Importance
Deletion of yeast genes encoding 20S proteasome and 19S RC subunits is usually lethal, indicating that the 26S proteasome is required for eukaryotic cell viability. Known substrates of the 26S proteasome include transcription factors, cell-cycle regulators, protein kinases, etc., essentially most of the cell’s important regulatory proteins. Even proteins secreted into the endoplasmic reticulum are returned to the cytosol for degradation by the 26S proteasome [173, 174]. Given the scope of its substrates it is hardly surprising that in higher eukaryotes the ubiquitin–proteasome system contributes to the regulation of a vast array of physiological processes ranging from cell-cycle traverse to circadian rhythms to learning.
20 The 26S Proteasome
10 Conclusions
The 20S proteasome was discovered in 1980 and the 26S proteasome six years later. Research since the mid-1980s has made it abundantly clear that the ubiquitinproteasome system is of central importance in eukaryotic cell physiology. Yet there is much more to discover. A crystal structure of the 19S RC, or better still the 26S proteasome, would surely provide insight into the mechanism by which the 26S proteasome degrades its substrates. How the 26S proteasome is itself regulated and the extent to which proteasomal components vary among tissues in higher eukaryotes are other important unresolved problems. Hopefully, these and other unanswered questions will spark further interest in the proteasome.
References 1 REED, S. I. Ratchets and clocks: the cell cycle, ubiquitylation and protein turnover. Nat Rev Mol Cell Biol 2003, 4, 855–864. 2 BASHIR, T. and PAGANO, M. Aberrant ubiquitin-mediated proteolysis of cell cycle regulatory proteins and oncogenesis. Adv Cancer Res 2003, 88, 101–144. 3 VIERSTRA, R. D. The ubiquitin/26S proteasome pathway, the complex last chapter in the life of many plant proteins. Trends Plant Sci 2003, 8, 135–142. 4 CAMPBELL, D. S. and HOLT, C. E. Chemotropic responses of retinal growth cones mediated by rapid local protein synthesis and degradation. Neuron 2001, 32, 1013–1026. 5 HEGDE, A. N. and DIANTONIO, A. Ubiquitin and the synapse. Nat Rev Neurosci 2002, 3, 854–861. 6 MURATANI, M. and TANSEY, W. P. How the Ubiquitin–Proteasome system controls transcription. Nat Rev Mol Cell Biol 2003, 4, 192–201. 7 CONAWAY, R. C., BROWER, C. S., and CONAWAY, J. W. Emerging roles of ubiquitin in transcription regulation. Science 2002, 296, 1254–1258. 8 LIPFORD, J. R. and DESHAIES, R. J. Diverse roles for ubiquitin-dependent proteolysis in transcriptional activation. Nat Cell Biol 2003, 5, 845–850.
9 SEELER, J. S. and DEJEAN, A. Nuclear and unclear functions of SUMO. Nat Rev Mol Cell Biol 2003, 4, 690–699. 10 SCHWARTZ, D. C. and HOCHSTRASSER, M. A superfamily of protein tags: ubiquitin, SUMO and related modifiers. Trends Biochem Sci 2003, 28, 321–328. 11 JASON, L. J., MOORE, S. C., LEWIS, J. D., LINDSEY, G., and AUSIO, J. Histone ubiquitination: a tagging tail unfolds? Bioessays 2002, 24, 166–174. 12 ZHANG, Y. Transcriptional regulation by histone ubiquitination and deubiquitination. Genes Dev 2003, 17, 2733–2740. 13 DENG, L. et al. Activation of the IkappaB kinase complex by TRAF6 requires a dimeric ubiquitin-conjugating enzyme complex and a unique polyubiquitin chain. Cell 2000, 103, 351–361. 14 HICKE, L. and DUNN, R. Regulation of membrane protein transport by ubiquitin and ubiquitin-binding proteins. Annu Rev Cell Dev Biol 2003, 19, 141–172. 15 PORNILLOS, O., GARRUS, J. E., and SUNDQUIST, W. I. Mechanisms of enveloped RNA virus budding. Trends Cell Biol 2002, 12, 569–579. 16 HERSHKO, A., CIECHANOVER, A., and VARSHAVSKY, A. Basic Medical Research Award. The ubiquitin system. Nat Med 2000, 6, 1073–1081.
References 17 KESSEL, M. et al. Homology in structural organization between E. coli ClpAP protease the eukaryotic 26S proteasome. J Mol Biol 1995, 250, 587–594. 18 LOWE, J. et al. Crystal structure of the 20S proteasome from the archaeon T. acidophilum at 3.4 A resolution. Science 1995, 268, 533–539. 19 GROLL, M. et al. Structure of 20S proteasome from yeast at 2.4 A resolution. Nature 1997, 386, 463–471. 20 UNNO, M. et al. The structure of the mammalian 20S proteasome at 2.75 A resolution. Structure (Camb) 2002, 10, 609–618. 21 DAHLMANN, B. et al. The multicatalytic proteinase (prosome, proteasome): comparison of the eukaryotic and archaebacterial enzyme. Biomed Biochim Acta 1991, 50, 465–469. 22 HARRIS, J. L., ALPER, P. B., LI, J., RECHSTEINER, M., and BACKES, B. J. Substrate specificity of the human proteasome. Chem Biol 2001, 8, 1131–1141. 23 SEEMULLER, E. et al. Proteasome from Thermoplasma acidophilum: a threonine protease. Science 1995, 268, 579–582. 24 FENTEANY, G. et al. Inhibition of proteasome activities and subunit-specific amino-terminal threonine modification by lactacystin. Science 1995, 268, 726–731. 25 MENG, L. et al. Epoxomicin, a potent and selective proteasome inhibitor, exhibits in vivo anti-inflammatory activity. Proc. Natl. Acad. Sci. USA 1999, 96, 10403–10408. 26 BOGYO, M. et al. Covalent modification of the active site threonine of proteasomal β subunits and the Escherichia coli homolg HsIV by a new class of inhibitors. Proc. Natl. Acad. Sci. USA 1997, 94, 6629–6634. 27 KANE, R. C., BROSS, P. F., FARRELL, A. T., and PAZDUR, R. Velcade: U.S. FDA approval for the treatment of multiple myeloma progressing on prior therapy. Oncologist 2003, 8, 508–513. 28 YEWDELL, J. W. and HILL, A. B. Viral interference with antigen presentation. Nat Immunol 2002, 3, 1019–1025.
29 GOLDBERG, A. L., CASCIO, P., SARIC, T., and ROCK, K. L. The importance of the proteasome and subsequent proteolytic steps in the generation of antigenic peptides. Mol Immunol 2002, 39, 147–164. 30 RAMMENSEE, H.-G., FALK, K., and R¨OTZSCHKE, O. Peptides naturally presented by MHC class I molecules. Annu. Rev. Immunol. 1993, 11, 213–244. 31 DOUGAN, D. A., MOGK, A., ZETH, K., TURGAY, K., and BUKAU, B. AAA+ proteins and substrate recognition, it all depends on their partner in crime. FEBS Lett 2002, 529, 6–10. 32 PETERS, J. M., CEJKA, Z., HARRIS, J. R., KLEINSCHMIDT, J. A., and BAUMEISTER, W. Structural features of the 26S proteasome complex. J Mol Biol 1993, 234, 932–937. 33 WALZ, J. et al. 26S proteasome structure revealed by three-dimensional electron microscopy. J Struct Biol 1998, 121, 19–29. 34 GLICKMAN, M. H. et al. A subcomplex of the proteasome regulatory particle required for ubiquitin-conjugate degradation and related to the COP9-signalosome and elF3. Cell 1998, 94, 615–623. 35 KAPELARI, B. et al. Electron microscopy and subunit-subunit interaction studies reveal a first architecture of COP9 signalosome. J Mol Biol 2000, 300, 1169–1178. 36 RECHSTEINER, M., HOFFMAN, L., and DUBIEL, W. The multicatalytic and 26S proteases. J. Biol. Chem. 1993, 268, 6065–6068. 37 RICHMOND, C., GORBEA, C., and RECHSTEINER, M. Specific interactions between ATPase subunits of the 26S protease. J. Biol. Chem. 1997, 272, 13403–13411. 38 RUBIN, D. M., GLICKMAN, M. H., LARSEN, C. N., DHRUVAKUMAR, S., and FINLEY, D. Active site mutants in the six regulatory particle ATPases reveal multiple roles for ATP in the proteasome. EMBO J. 1998, 17, 4909–4919. 39 DAVEY, M. J., JERUZALMI, D., KURIYAN, J., and O’DONNELL, M. Motors and switches: AAA+ machines within the
21
22 The 26S Proteasome
40
41
42
43
44
45
46
47
48
49
50
replisome. Nat Rev Mol Cell Biol 2002, 3, 826–835. LEE, S. Y. et al. Regulation of the transcriptional activator NtrC1: structural studies of the regulatory and AAA+ ATPase domains. Genes Dev 2003, 17, 2552–2563. HARTMANN-PETERSEN, R., TANAKA, K., and HENDIL, K. B. Quaternary structure of the ATPase complex of human 26S proteasomes determined by chemical cross-linking. Arch Biochem Biophys 2001, 386, 89–94. BERNDT, C., BECH-OTSCHIR, D., DUBIEL, W., and SEEGER, M. Ubiquitin system: JAMMing in the name of the lid. Curr Biol 2002, 12, R815–R817. COPE, G. A. and DESHAIES, R. J. COP9 signalosome: a multifunctional regulator of SCF and other cullin-based ubiquitin ligases. Cell 2003, 114, 663–671. LI, L. and DENG, X. W. The COP9 signalosome: an alternative lid for the 26S proteasome? Trends Cell Biol 2003, 13, 507–509. FERRELL, K., WILKINSON, C. R., DUBIEL, W., and GORDON, C. Regulatory subunit interactions of the 26S proteasome, a complex problem. Trends Biochem Sci 2000, 25, 83–88. FU, H., REIS, N., LEE, Y., GLICKMAN, M. H., and VIERSTRA, R. D. Subunit interaction maps for the regulatory particle of the 26S proteasome and the COP9 signalosome. Embo J 2001, 20, 7096–7107. CAGNEY, G., UETZ, P., and FIELDS, S. Two-hybrid analysis of the Saccharomyces cerevisiae 26S proteasome. Physiol Genomics 2001, 7, 27–34. DAVY, A. et al. A protein-protein interaction map of the Caenorhabditis elegans 26S proteasome. EMBO Rep 2001, 2, 821–828. COPE, G. A. et al. Role of predicted metalloprotease motif of Jab1/Csn5 in cleavage of Nedd8 from Cul1. Science 2002, 298, 608–611. BAILLY, E. and REED, S. I. Functional characterization of rpn3 uncovers a distinct 19S proteasomal subunit requirement for ubiquitin-dependent proteolysis of cell cycle regulatory
51
52
53
54
55
56
57
58
59
60
proteins in budding yeast. Mol Cell Biol 1999, 19, 6872–6890. FONG, A., ZHANG, M., NEELY, J., and SUN, S. C. S9, a 19 S proteasome subunit interacting with ubiquitinated NF-kappaB2/p100. J Biol Chem 2002, 277, 40697–40702. DEVERAUX, Q., USTRELL, V., PICKART, C., and RECHSTEINER, M. A 26S protease subunit that binds ubiquitin conjugates. J. Biol. Chem. 1994, 269, 7059–7061. van NOCKER, S. et al. The Multiubiquitin-Chain-Binding Protein McB1 Is a Component of the 26S Proteasome in Saccharomyces cerevisiae and Plays a Nonessentail, Substrate-Specific Role in Protein Turnover. Mol. Cell Biol. 1996, 16, 6020–6028. KAJAVA, A. V. What curves alphasolenoids? Evidence for an alpha-helical toroid structure of Rpn1 and Rpn2 proteins of the 26S proteasome. J Biol Chem 2002, 277, 49791–49798. ELSASSER, S. et al. Proteasome subunit Rpn1 binds ubiquitin-like protein domains. Nat Cell Biol 2002, 4, 725–730. SAEKI, Y., SONE, T., TOHE, A., and YOKOSAWA, H. Identification of ubiquitin-like protein-binding subunits of the 26S proteasome. Biochem Biophys Res Commun 2002, 296, 813–819. LAM, Y. A., LAWSON, T. G., VELAYUTHAM, M., ZWEIER, J. L., and PICKART, C. M. A proteasomal ATPase subunit recognizes the polyubiquitin degradation signal. Nature 2002, 416, 763–767. HOFFMAN, L. and RECHSTEINER, M. Activation of the multicatalytic protease. The 11 S regulator and 20S ATPase complexes contain distinct 30-kilodalton subunits. J. Biol. Chem. 1994, 269, 16890–16895. BRAUN, B. C. et al. The base of the proteasome regulatory particle exhibits chaperone-like activity. Nat Cell Biol 1999, 1, 221–226. STRICKLAND, E., HAKALA, K., THOMAS, P. J., and DEMARTINO, G. N. Recognition of misfolding proteins by PA700, the regulatory subcomplex of the 26S
References
61
62
63
64
65
66
67 68
69
70
71
proteasome. J Biol Chem 2000, 275, 5565–5572. LIU, C. W. et al. Conformational remodeling of proteasomal substrates by PA700, the 19 S regulatory complex of the 26S proteasome. J Biol Chem 2002, 277, 26815–26820. GONZALEZ, F., DELAHODDE, A., KODADEK, T., and JOHNSTON, S. A. Recruitment of a 19S proteasome subcomplex to an activated promoter. Science 2002, 296, 548–550. HOFFMAN, L., PRATT, G., and RECHSTEINER, M. Multiple forms of the 20 S multicatalytic and the 26S ubiquitin/ATP-dependent proteases from rabbit reticulocyte lysate. J. Biol. Chem. 1992, 267, 22362–22368. VERMA, R. et al. Proteasomal proteomics: identification of nucleotide-sensitive proteasome-interacting proteins by mass spectrometric analysis of affinity-purified proteasomes. Mol Biol Cell 2000, 11, 3425–3439. GUTERMAN, A. and GLICKMAN, M. H. Complementary roles for rpn11 and ubp6 in deubiquitination and proteolysis by the proteasome. J Biol Chem 2004, 279, 1729–1738. MEINERS, S. et al. Inhibition of proteasome activity induces concerted expression of proteasome genes and de novo formation of Mammalian proteasomes. J Biol Chem 2003, 278, 21517–21525. BLOBEL, G. Protein targeting (Nobel lecture). Chembiochem 2000, 1, 86–102. RECHSTEINER, M. and ROGERS, S. W. PEST sequences and regulation by proteolysis. Trends Biochem. Sci. 1996, 21, 267–271. VARSHAVSKY, A. The N-end rule: Functions, mysteries, uses. Proc. Natl. Acad. Sci. USA 1996, 93, 12142–12149. ZUR, A. and BRANDEIS, M. Timing of APC/C substrate degradation is determined by fzy/fzr specificity of destruction boxes. Embo J 2002, 21, 4500–4510. ORLOWSKI, M. and WILK, S. Ubiquitin-independent proteolytic functions of the proteasome. Arch Biochem Biophys 2003, 415, 1–5.
72 GRUNE, T., MERKER, K., SANDIG, G., and DAVIES, K. J. Selective degradation of oxidatively modified protein substrates by the proteasome. Biochem Biophys Res Commun 2003, 305, 709–718. 73 PENG, J. et al. A proteomics approach to understanding protein ubiquitination. Nat Biotechnol 2003, 21, 921–926. 74 WU-BAER, F., LAGRAZON, K., YUAN, W., and BAER, R. The BRCA1/BARD1 heterodimer assembles polyubiquitin chains through an unconventional linkage involving lysine residue K6 of ubiquitin. J Biol Chem 2003, 278, 34743–34746. 75 SPENCE, J., SADIS, S., HAAS, A. L., and FINLEY, D. A ubiquitin mutant with specific defects in DNA repair and multiubiquitination. Mol Cell Biol 1995, 15, 1265–1273. 76 SPRINGAEL, J. Y., GALAN, J. M., HAGUENAUER-TSAPIS, R., and ANDRE, B. NH4+-induced down-regulation of the Saccharomyces cerevisiae Gap1p permease involves its ubiquitination with lysine-63-linked chains. J Cell Sci 1999, 112 (Pt 9), 1375–1383. 77 ALBERTI, S. et al. Ubiquitylation of BAG-1 suggests a novel regulatory mechanism during the sorting of chaperone substrates to the proteasome. J Biol Chem 2002, 277, 45920–45927. 78 CHAU, V. et al. A multiubiquitin chain is confined to specific lysine in a targeted short-lived protein. Science 1989, 243, 1576–1583. 79 JOHNSON, E. S., MA, P. C., OTA, I. M., and VARSHAVSKY, A. A proteolytic pathway that recognizes ubiquitin as a degradation signal. J. Biol. Chem. 1995, 270, 17442–17456. 80 THROWER, J. S., HOFFMAN, L., RECHSTEINER, M., and PICKART, C. M. Recognition of the polyubiquitin proteolytic signal. EMBO J. 2000, 19, 94–102. 81 YOUNG, P., DEVERAUX, Q., BEAL, R., PICKART, C., and RECHSTEINER, M. Characterization of two polyubiquitin binding sites in the 26S protease subunit 5a. J. Biol. Chem. 1998, 273, 5461–5467. 82 HOFMANN, K. and FALQUET, L. A ubiquitin-interacting motif conserved in
23
24 The 26S Proteasome
83
84
85
86
87
88
89
90
91
92
components of the proteasomal and lysosomal protein degradation systems. Trends Biochem Sci 2001, 26, 347–350. SWANSON, K. A., KANG, R. S., STAMENOVA, S. D., HICKE, L., and RADHAKRISHNAN, I. Solution structure of Vps27 UIM-ubiquitin complex important for endosomal sorting and receptor downregulation. Embo J 2003, 22, 4597–4606. MUELLER, T. D. and FEIGON, J. Structural determinants for the binding of ubiquitin-like domains to the proteasome. Embo J 2003, 22, 4634–4645. RYU, K. S. et al. Binding surface mapping of intra- and interdomain interactions among hHR23B, ubiquitin, and polyubiquitin binding site 2 of S5a. J Biol Chem 2003, 278, 36621–36627. HARTMANN-PETERSEN, R., SEEGER, M., and GORDON, C. Transferring substrates to the 26S proteasome. Trends Biochem Sci 2003, 28, 26–31. SAEKI, Y., SAITOH, A., TOHE, A., and YOKOSAWA, H. Ubiquitin-like proteins and Rpn10 play cooperative roles in ubiquitin-dependent proteolysis. Biochem Biophys Res Commun 2002, 293, 986–992. HIYAMA, H. et al. Interaction of hHR23 with S5a. The ubiquitin-like domain of hHR23 mediates interaction with S5a subunit of 26S proteasome. J Biol Chem 1999, 274, 28019–28025. SAKATA, E. et al. Parkin binds the Rpn10 subunit of 26S proteasomes through its ubiquitin-like domain. EMBO Rep 2003, 4, 301–306. KLEIJNEN, M. F., ALARCON, R. M., and HOWLEY, P. M. The ubiquitin-associated domain of hPLIC-2 interacts with the proteasome. Mol Biol Cell 2003, 14, 3868–3875. XIE, Y. and VARSHAVSKY, A. UFD4 lacking the proteasome-binding region catalyses ubiquitination but is impaired in proteolysis. Nat Cell Biol 2002, 4, 1003–1007. CORN, P. G., MCDONALD, E. R., 3rd, HERMAN, J. G., and EL-DEIRY, W. S. Tat-binding protein-1, a component of the 26S proteasome, contributes to the
93
94
95
96
97
98
99
100
101
102
103
E3 ubiquitin ligase function of the von Hippel-Lindau protein. Nat Genet 2003, 35, 229–237. LAMBERTSON, D., CHEN, L., and MADURA, K. Pleiotropic defects caused by loss of the proteasome-interacting factors Rad23 and Rpn10 of Saccharomyces cerevisiae. Genetics 1999, 153, 69–79. MURAKAMI, Y. et al. Ornithine decarboxylase is degraded by the 26S proteasome without ubiquitination. Nature 1992, 360, 597–599. SHEAFF, R. J. et al. Proteasome turnover of p21Cip1 does not require p21Cip1 ubiquitination. Molec. Cell. 2000, 5, 403–410. ZHANG, M., PICKART, C. M., and COFFINO, P. Determinants of proteasome recognition of ornithine decarboxylase, a ubiquitin-independent substrate. Embo J 2003, 22, 1488–1496. REZVANI, K. et al. Proteasomal interactors control activities as diverse as the cell cycle and glutaminergic neurotransmission. Biochem Soc Trans 2003, 31, 470–473. TOUITOU, R. et al. A degradation signal located in the C-terminus of p21WAF1/CIP1 is a binding site for the C8 alpha-subunit of the 20S proteasome. Embo J 2001, 20, 2367–2375. COLEMAN, M. L., MARSHALL, C. J., and OLSON, M. F. Ras promotes p21(Waf1/Cip1) protein stability via a cyclin D1-imposed block in proteasome-mediated degradation. Embo J 2003, 22, 2036–2046. MURAKAMI, Y., MATSUFUJI, S., HAYASHI, S. I., TANAHASHI, N., and TANAKA, K. ATP-Dependent inactivation and sequestration of ornithine decarboxylase by the 26S proteasome are prerequisites for degradation. Mol Cell Biol 1999, 19, 7216–7227. BAYS, N. W. and HAMPTON, R. Y. Cdc48-Ufd1-Npl4: stuck in the middle with Ub. Curr Biol 2002, 12, R366–R371. BRUNGER, A. T. and DELABARRE, B. NSF and p97/VCP: similar at first, different at last. FEBS Lett 2003, 555, 126–133. RABINOVICH, E., KEREM, A., FROHLICH, K. U., DIAMANT, N., and BAR-NUN, S. AAA-ATPase p97/Cdc48p, a cytosolic
References
104
105
106
107
108
109
110
111
112
chaperone required for endoplasmic reticulum-associated protein degradation. Mol Cell Biol 2002, 22, 626–634. WOJCIK, C., YANO, M., and DEMARTINO, G. N. RNA interference of valosin-containing protein (VCP/p97) reveals multiple cellular roles linked to ubiquitin/proteasome-dependent proteolysis. J Cell Sci 2004, 117, 281–292. IMAMURA, T. et al. Involvement of heat shock protein 90 in the degradation of mutant insulin receptors by the proteasome. J Biol Chem 1998, 273, 11183–11188. TAXIS, C. et al. Use of modular substrates demonstrates mechanistic diversity and reveals differences in chaperone requirement of ERAD. J Biol Chem 2003, 278, 35903–35913. TOKUMOTO, T. et al. Regulated interaction between polypeptide chain elongation factor-1 complex with the 26S proteasome during Xenopus oocyte maturation. BMC Biochem 2003, 4, 6. GONCIARZ-SWIATEK, M. et al. Recognition, targeting, and hydrolysis of the lambda O replication protein by the ClpP/ClpX protease. J Biol Chem 1999, 274, 13999–14005. STUDEMANN, A. et al. Sequential recognition of two distinct sites in sigma(S) by the proteolytic targeting factor RssB and ClpX. Embo J 2003, 22, 4111–4120. NEHER, S. B., SAUER, R. T., and BAKER, T. A. Distinct peptide signals in the UmuD and UmuD subunits of UmuD/D mediate tethering and substrate processing by the ClpXP protease. Proc Natl Acad Sci USA 2003, 100, 13219–13224. PETROSKI, M. D. and DESHAIES, R. J. Context of multiubiquitin chain attachment influences the rate of Sic1 degradation. Mol Cell 2003, 11, 1435–1444. LEE, C., SCHWARTZ, M. P., PRAKASH, S., IWAKURA, M., and MATOUSCHEK, A. ATP-dependent proteases degrade their substrates by processively unraveling them from the degradation signal. Mol Cell 2001, 7, 627–637.
113 HOSKINS, J. R., YANAGIHARA, K., MIZUUCHI, K., and WICKNER, S. ClpAP and ClpXP degrade proteins with tags located in the interior of the primary sequence. Proc Natl Acad Sci USA 2002, 99, 11037–11042. 114 LEE, C., PRAKASH, S., and MATOUSCHEK, A. Concurrent translocation of multiple polypeptide chains through the proteasomal degradation channel. J Biol Chem 2002, 277, 34760–34765. 115 BENAROUDJ, N., ZWICKL, P., SEEMULLER, E., BAUMEISTER, W., and GOLDBERG, A. L. ATP hydrolysis by the proteasome regulatory complex PAN serves multiple functions in protein degradation. Mol Cell 2003, 11, 69–78. 116 KENNISTON, J. A., BAKER, T. A., FERNANDEZ, J. M., and SAUER, R. T. Linkage between ATP consumption and mechanical unfolding during the protein processing reactions of an AAA+ degradation machine. Cell 2003, 114, 511–520. 117 REID, B. G., FENTON, W. A., HORWICH, A. L., and WEBER-BAN, E. U. ClpA mediates directional translocation of substrate proteins into the ClpP protease. Proc Natl Acad Sci USA 2001, 98, 3768–3772. 118 DILLINGHAM, M. S., WIGLEY, D. B., and WEBB, M. R. Demonstration of unidirectional single-stranded DNA translocation by PcrA helicase: measurement of step size and translocation speed. Biochemistry 2000, 39, 205–212. 119 LLORCA, O. et al. The ‘sequential allosteric ring’ mechanism in the eukaryotic chaperonin-assisted folding of actin and tubulin. Embo J 2001, 20, 4065–4075. 120 LASKEY, R. A. and MADINE, M. A. A rotary pumping model for helicase function of MCM proteins at a distance from replication forks. EMBO Rep 2003, 4, 26–30. 121 FLYNN, J. M., NEHER, S. B., KIM, Y. I., SAUER, R. T., and BAKER, T. A. Proteomic discovery of cellular substrates of the ClpXP protease reveals five classes of ClpX-recognition signals. Mol Cell 2003, 11, 671–683.
25
26 The 26S Proteasome
122 PALOMBELLA, V. J., RANDO, O. J., GOLDBERG, A. L., and MANIATIS, T. The Ubiquitin–Proteasome pathway is required for processing the NF-kappa B1 precursor protein and the activation of NF-kappa B. Cell 1994, 78, 773–785. 123 CIECHANOVER, A. et al. Mechanisms of ubiquitin-mediated, limited processing of the NF-kappaB1 precursor protein p105. Biochimie 2001, 83, 341–349. 124 ZHANG, M. and COFFINO, P. Repeat sequence of Epstein-Barr virus EBNA1 protein interrupts proteasome substrate processing. J Biol Chem 2004, 279, 8635–8641. 125 HOPPE, T. et al. Activation of a membrane-bound transcription factor by regulated ubiquitin/proteasomedependent processing. Cell 2000, 102, 577–586. 126 WOJCIK, C. and DEMARTINO, G. N. Analysis of Drosophila 26S proteasome using RNA interference. J Biol Chem 2002, 277, 6188–6197. 127 SZLANKA, T. et al. Deletion of proteasomal subunit S5a/Rpn10/p54 causes lethality, multiple mitotic defects and overexpression of proteasomal genes in Drosophila melanogaster. J Cell Sci 2003, 116, 1023–1033. 128 MANNHAUPT, G., SCHNALL, R., KARPOV, V., VETTER, I., and FELDMANN, H. Rpn4p acts as a transcription factor by binding to PACE, a nonamer box found upstream of 26S proteasomal and other genes in yeast. FEBS Lett 1999, 450, 27–34. 129 XIE, Y. and VARSHAVSKY, A. RPN4 is a ligand, substrate, and transcriptional regulator of the 26S proteasome: a negative feedback circuit. Proc Natl Acad Sci USA 2001, 98, 3056–3061. 130 ARENDT, C. S. and HOCHSTRASSER, M. Eukaryotic 20S proteasome catalytic subunit propeptides prevent active site inactivation by N-terminal acetylation and promote particle assembly. Embo J 1999, 18, 3575–3585. 131 NANDI, D., WOODWARD, E., GINSBURG, D. B., and MONACO, J. J. Intermediates in the formation of mouse 20S proteasomes: implications for the
132
133
134
135
136
137
138
139
140
assembly of precursor beta subunits. Embo J 1997, 16, 5363–5375. GRIFFIN, T. A., SLACK, J. P., MCCLUSKEY, T. S., MONACO, J. J., and COLBERT, R. A. Identification of proteassemblin, a mammalian homologue of the yeast protein, Ump1p, that is required for normal proteasome assembly. Mol Cell Biol Res Commun 2000, 3, 212–217. RAMOS, P. C., HOCKENDORFF, J., JOHNSON, E. S., VARSHAVSKY, A., and DOHMEN, R. J. Ump1p is required for proper maturation of the 20S proteasome and becomes its substrate upon completion of the assembly. Cell 1998, 92, 489–499. SANTAMARIA, P. G., FINLEY, D., BALLESTA, J. P., and REMACHA, M. Rpn6p, a proteasome subunit from Saccharomyces cerevisiae, is essential for the assembly and activity of the 26S proteasome. J Biol Chem 2003, 278, 6687–6695. YANG, Y., FRU¨ H, K., AHN, K., and PETERSON, P. A. In vivo assembly of the proteasomal complexes, implications for antigen processing. J. Biol. Chem. 1995, 270, 27687–27694. MASON, G. G., HENDIL, K. B., and RIVETT, A. J. Phosphorylation of proteasomes in mammalian cells. Identification two phosphorylated subunits and the effect of phosphorylation on activity. Eur. J. Biochem. 1996, 238, 453–462. MASON, G. G., MURRAY, R. Z., PAPPIN, D., and RIVETT, A. J. Phosphorylation of ATPase subunits of the 26S proteasome. FEBS Lett 1998, 430, 269–274. KIMURA, Y. et al. N(alpha)-acetylation and proteolytic activity of the yeast 20 S proteasome. J Biol Chem 2000, 275, 4635–4639. KIMURA, Y. et al. N-Terminal modifications of the 19S regulatory particle subunits of the yeast proteasome. Arch Biochem Biophys 2003, 409, 341–348. SUMEGI, M., HUNYADI-GULYAS, E., MEDZIHRADSZKY, K. F., and UDVARDY, A. 26S proteasome subunits are O-linked N-acetylglucosamine-modified in Drosophila melanogaster. Biochem Biophys Res Commun 2003, 312, 1284–1289.
References 141 BOSE, S., STRATFORD, F. L., BROADFOOT, K. I., MASON, G. G., and RIVETT, A. J. Phosphorylation of 20S proteasome alpha subunit C8 (alpha7) stabilizes the 26S proteasome and plays a role in the regulation of proteasome complexes by gamma-interferon. Biochem J Pt 2004, 378, 177–184. 142 KURUCZ, E. et al. Assembly of the Drosophila 26S proteasome is accompanied by extensive subunit rearrangements. Biochem J 2002, 365, 527–536. 143 TONE, Y. and TOH, E. A. Nob1p is required for biogenesis of the 26S proteasome and degraded upon its maturation in Saccharomyces cerevisiae. Genes Dev 2002, 16, 3142–3157. 144 YEN, H. C., GORDON, C., and CHANG, E. C. Schizosaccharomyces pombe Int6 and Ras homologs regulate cell division and mitotic fidelity via the proteasome. Cell 2003, 112, 207–217. 145 IMAI, J., MARUYA, M., YASHIRODA, H., YAHARA, I., and TANAKA, K. The molecular chaperone Hsp90 plays a role in the assembly and maintenance of the 26S proteasome. Embo J 2003, 22, 3557–3567. 146 BAJOREK, M., FINLEY, D., and GLICKMAN, M. H. Proteasome disassembly and downregulation is correlated with viability during stationary phase. Curr Biol 2003, 13, 1140–1144. 147 HENDIL, K. B., KHAN, S. and TANAKA, K. Simultaneous binding of PA28 and PA700 activators to 20 S proteasomes. Biochem. J. 1998, 332, 749–754. 148 KOPP, F., DAHLMANN, B., and KUEHN, L. Reconstitution of hybrid proteasomes from purified PA700–20 S complexes and PA28alphabeta activator: ultrastructure and peptidase activities. J Mol Biol 2001, 313, 465–471. 149 TANAHASHI, N. et al. Hybrid proteasomes. Induction by interferon-gamma and contribution to ATP-dependent proteolysis. J. Biol. Chem. 2000, 275, 14336–14345. 150 CASCIO, P., CALL, M., PETRE, B. T. W., and GOLDBERG, A. L. Properties of the hybrid form of the 26S proteasome containing
151
152
153
154
155
156
157
158
159
160
161
162
both 19S and PA28 complexes. EMBO J. 2002, 21, 2636–2645. RECHSTEINER, M., REALINI, C., and USTRELL, V. The proteasome activator 11S REG (PA28) and class I antigen presentation. Biochem. J. 2000, 345, 1–15. HILL, C. P., MASTERS, E. I., and WHITBY, F. G. The 11S regulators of 20S proteasome activity. Curr Top Microbiol Immunol 2002, 268, 73–89. MURATA, S. et al. Immunoproteasome assembly and antigen presentation in mice lacking both PA28alpha and PA28beta. Embo J 2001, 20, 5898–5907. MURATA, S. et al. Growth retardation in mice lacking the proteasome activator PA28gamma. J Biol Chem 1999, 274, 38211–38215. WHITBY, F. G. et al. Structural basis for the activation of 20 S proteasomes by 11 S regulators. Nature 2000, 408, 115–120. USTRELL, V., HOFFMAN, L., PRATT, G., and RECHSTEINER, M. PA200, a nuclear proteasome activator involved in DNA repair. EMBO J. 2002, 21, 3403–3412. HO, Y., GRUHLER, A., HEILBUT, A., BADER, G. D., and MOORE, L. Systematic identification of protein complexes in Saccharomyces cerevisiaex by mass spectrometry. Nature 2002, 415, 180–183. GAVIN, A. C. et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415, 141–147. LEGGETT, D. S. et al. Multiple associated proteins regulate proteasome structure and function. Mol Cell 2002, 10, 495–507. MCCUTCHEN-MALONEY, S. L. et al. cDNA cloning, expression, and functional characterization of PI31, a proline-rich inhibitor of the proteasome. J Biol Chem 2000, 275, 18557–18565. ZAISS, D. M., STANDERA, S., KLOETZEL, P. M., and SIJTS, A. J. PI31 is a modulator of proteasome formation and antigen processing. Proc Natl Acad Sci USA 2002, 99, 14344–14349. LU, D. C. et al. A second cytotoxic proteolytic peptide derived from amyloid beta-protein precursor. Nat. Medicine 2000, 6, 397–404.
27
28 The 26S Proteasome
163 GACZYNSKA, M., OSMULSKI, P. A., GAO, Y., POST, M. J., and SIMONS, M. Prolineand arginine-rich peptides constitute a novel class of allosteric inhibitors of proteasome activity. Biochemistry 2003, 42, 8663–8670. 164 YAMANO, T. et al. Two distinct pathways mediated by PA28 and hsp90 in major histocompatibility complex class I antigen processing. J Exp Med 2002, 196, 185–196. 165 APCHER, G. S. et al. Human immunodeficiency virus-1 Tat protein interacts with distinct proteasomal alpha and beta subunits. FEBS Lett 2003, 553, 200–204. 166 HUANG, X. et al. The RTP site shared by the HIV-1 Tat protein and the 11S regulator subunit alpha is crucial for their effects on proteasome function including antigen processing. J Mol Biol 2002, 323, 771–782. 167 FAROUT, L. et al. Distribution of proteasomes and of the five proteolytic activities in rat tissues. Arch Biochem Biophys 2000, 374, 207–212. 168 BROOKS, P. et al. Subcellular localization of proteasomes and their regulatory
169
170
171
172
173
174
complexes in mammalian cells. Biochem. J. 2000, 346, 155–161. REITS, E. A., BENHAM, A. M., PLOUGASTE, B., NEEFJES, J., and TROWSDALE, J. Dynamics of proteasome distribution in living cells. EMBO J. 1997, 16, 6087–6094. ARABI, A., RUSTUM, C., HALLBERG, E., and WRIGHT, A. P. Accumulation of c-Myc and proteasomes at the nucleoli of cells containing elevated c-Myc protein levels. J Cell Sci 2003, 116, 1707–1717. KOPITO, R. R. Aggresomes, inclusion bodies and protein aggregation. Trends Cell Biol 2000, 10, 524–530. WOJCIK, C. and DEMARTINO, G. N. Intracellular localization of proteasomes. Int J Biochem Cell Biol 2003, 35, 579–589. HAMPTON, R. Y. ER-associated degradation in protein quality control and cellular regulation. Curr Opin Cell Biol 2002, 14, 476–482. MCCRACKEN, A. A. and BRODSKY, J. L. Evolving questions and paradigm shifts in endoplasmic-reticulum-associated degradation (ERAD). Bioessays 2003, 25, 868–877.
1
Molecular Machines for Protein Degradation Matthias Bochtler
Warszawa, Poland
Michael Groll Ludwig-Maximilians-Universit¨at M¨unchen, M¨unchen, Germany
Hans Brandstetter Max-Planck-Institute for Biochemistry, Martinsried, Germany
Tim Clausen Institute for Molecular Pathology, Vienna, Austria
Robert Huber Max-Planck-Institute for Biochemistry, Martinsried, Germany
Originally published in: Protein Degradation, Volume 1. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30837-8
1 Introduction
The action of intracellular proteolytic enzymes is tightly controlled to avoid destruction of properly folded and functional proteins essential for cell viability and to restrict their activity towards sick molecules or/and those marked for destruction. The four (five) proteases discussed in this chapter display different regulatory mechanisms but show sequestration of their active sites inside molecular cages as a common structural principle, albeit being assembled from different building blocks in different shapes and with varying symmetries. The chapter focuses on structural, functional, and mutational studies from our laboratory. We are aware of other cage-forming proteases and their regulatory components that have been structurally characterized and which are mentioned in brief later in the context of our studies. The chapter is arranged in four main sections focused on the proteases HslVU, proteasome, tricorn (DPPIV), and DegP.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Molecular Machines for Protein Degradation
2 The ATP-dependent Protease HslVU
ATP-dependent proteases are complex proteolytic machines. They are present in eubacteria, archaebacteria, in eukaryotic organelles and, as the 20S or 26S proteasome, in the eukaryotic cytosol and nucleoplasm. The activators of all known ATP-dependent proteases are related. They all contain an AAA(+) ATPase domain as a module [1] and are thought to assemble into hexameric particles or, in the case of 26S proteasomes, are present in six variants in the 19S activators [2]. Like the ATPases, the proteolytic components of the ATP-dependent proteases form higher order complexes, but unlike for the ATPases, the symmetry of the protease assemblies varies, and the folds of the subunits need not be related. ClpP is a serine protease, FtsH a metalloprotease, and HslV and the proteasomes from archaebacteria and eubacteria are threonine proteases. Although extensive biochemical data on both the bacterial and eukaryotic ATPdependent proteases are available, the characterization of these proteolytic machines at atomic resolution has proven difficult, because of both the large size of these complexes and their lability to proteolysis and dissociation. No structural data at all are currently available for Lon and the mitochondrial ATP-dependent proteases. In the case of the cytosolic, membrane-integrated bacterial protease FtsH, atomic resolution data are available only for the ATPase domain [3, 4]. In contrast, the ATP-dependent activators of the ClpAP and ClpXP proteolytic machines have so far resisted crystallization. Atomic resolution data are available only for the proteolytic component ClpP [5], and separately for a ClpX monomer [6] and a ClpA monomer [7]. The bacterial protease HslVU is unique in two respects: at present, it is the only ATP-dependent protease to have atomic coordinates of the full complex determined; secondly, and in contrast to all other bacterial ATP-dependent proteases, it contains a proteolytic core that is related to the 20S proteolytic core of archaebacterial and eukaryotic proteasomes. The following sections summarize our understanding of HslVU biochemistry, crystallography, and enzymology and end with some speculation on the implications of these results for other ATP-dependent proteases. 2.1 HslVU Physiology and Biochemistry
In E. coli and most, but not all, other bacteria, the HslU (ATPase) and HslV (protease) genes are found in one operon under the control of a heat-shock promoter. The operon was first found [8] and sequenced [9] in the course of a search for new heat-shock genes. It was later independently isolated again in screens for proteins that could down-regulate the heat-shock response [10] and for suppressors of the SOS-mediated inhibition of cell division in E. coli [11]. The observed biological responses in HslVU over-expression or deletion strains result from a decrease or increase in the steady levels of HslVU substrates [12]. HslVU, itself a heat-shock protein, affects the heat-shock response by degradation of the heat-shock factor σ 32
2 The ATP-dependent Protease HslVU
[10, 12] and the SOS response via the degradation of the cell-division inhibitor SulA [13, 14]. The physiological role of HslVU seems to be limited, probably because of overlapping substrate specificity with other ATP-dependent proteases in bacteria. The E. coli HslVU deletion strain has no phenotype at standard growth temperature, and it appears that HslVU is required for normal growth only at very high temperatures [12]. According to the protease database MEROPS, some bacterial species appear to lack an HslVU–type peptidase altogether [15]. Therefore, it came as a surprise that some HslV and HslU homologs were recently found in primordial eukaryotes where they appear to be simultaneously present with genuine 20S proteasomes [16]. Low expression levels and the lability of the HslVU complex make work with proteins from wild-type strains difficult. Gratifyingly, the active protease can be reconstituted in vitro from over-expressed and purified components [17]. It requires ATP for the degradation of folded substrates and ATP or some of its analogs for the purification of small chromogenic peptides. As expected, ATP-hydrolysis and proteolysis activities are mutually dependent [18]. In addition, the peptidase activity was found to depend in complex ways on the presence of various cations, especially K+ in the buffers [19]. 2.2 HslV Peptidase
On the sequence level, HslV shows sequence similarity with the β-subunits of arch-aebacterial and eukaryotic proteasomes, a fact that was immediately noticed when the E. coli gene was sequenced [9] and was later shown to extend to other related eubacterial sequences [20]. Electron microscopy (EM) of recombinant HslV subsequently suggested that the particle formed a dimer of hexamers that appeared to enclose only one central cavity without antechambers as in proteasomes [21]. The unexpected six-fold symmetry of HslV and the similarity in subunit fold with eukaryotic proteasomes were subsequently confirmed by X-ray crystallography [8]. The crystallographic data also showed that the contracted ring compared to proteasomes resulted from small changes to the subunit–subunit interface, not from an entirely new mode of oligomerization [8] (see Figure 1). All 12 active sites of HslV are located on the inner walls of the hollow particle. In the E. coli particle, each active site has neighboring active sites 28 Å away on the same ring and 22 Å and 26 Å away on the opposite ring. The environment of the nucleophilic Thr1 looks similar to that in proteasomes, and the presence of a (putatively protonated) lysine residue near the active site probably helps to lower the pK a of the N-terminal α-amino group so that it is present in the unprotonated form, which can act as the general base to accept a proton from Thr1. Since the determination of the HslV crystal structure, two additional crystal structures from other species have been determined. The highest resolution structure available to date is the crystal structure of the Haemophilus influenzae enzyme [23]. Intriguingly, this structure showed the presence of cation-binding sites near the active centers [23], a finding that could subsequently be confirmed for the
3
4 Molecular Machines for Protein Degradation
Fig. 1 E. coli HslV vs. T. acidophilum proteasome. Superposition of one hexameric ring of E. coli HslV (red) and of one heptameric ring of T. acidophilum proteosome $-subunits (green) in stereo representation.
The subunits at the “top” of the ring have been overlayed optimally. Note that the “tails” of the HslV subunits that point radially outwards are histidine tags and thus cloning artefacts.
Thermotoga maritima enzyme [24] and that explains, at least in qualitative terms, the dependence of HslVU activity on various cations in solution. Overall, the crystal structures of the enzymes from H. influenzae [23] and T. maritima [24] are very similar to the original structure of the E. coli enzyme. Therefore, it came as a surprise that the HslV homolog known as CodW from Bacillus subtilis behaves rather differently. Although the enzyme contains a threonine residue that aligns with the active site threonine of the E. coli, H. influenzae, and T. maritima enzymes, it does not contain the glycine at the C-terminus of the profragment that is believed to be required for efficient autocatalytic processing, and indeed the polypeptide chain is processed five residues upstream of the conserved threonine that is the active-site nucleophile in other species to expose an N-terminal serine residue [25]. Whether this implies that the serine is in the spatial position normally filled by threonine, implying a discrepancy between the sequence-based and structure-based alignments, or whether it means that the accessory catalytic residues are either dispensable or anchored elsewhere on the sequence is currently not clear. 2.3 HslU ATPase
Based on the sequence, HslU can be easily classified as an ATPase by the presence of conventional Walker A (phosphate binding loop or P-loop) and Walker B (magnesium binding) motifs. Beyond this simple classification, two competing models for HslU were proposed, classifying the enzyme either as a PDZ-domain
2 The ATP-dependent Protease HslVU
containing ATPase [26] or alternatively as a AAA(+)-type ATPase [1]. The crystal structure settled the issue in favour of the AAA(+) model [9]. AAA(+)-ATPases consist essentially of two structural domains that are connected through a short linker. A nucleotide binds at the interface of the two domains. As first observed with HslU, the presence or absence of a nucleotide induces different relative orientations between the two domains [9]. With the availability of many different nucleotide states of HslU, the model was later refined to include a dependence on the state of hydrolysis of the nucleotide [28]. The nucleotide is in a strategic position both at the interface of the N- and C-domains of one subunit and at the interface of adjacent subunits. A combination of mostly conserved residues from the two subunits around the nucleotide creates a highly polar environment [9]. Two arginine residues have attracted particular attention: R393 of E. coli HslU is thought to act as the “sensor” that transmits information on the presence or absence of nucleotide, and possibly on its identity, to the C-domain and thus controls the relative orientation of N- and C-domains in HslU. Another conserved arginine residue, R325, is anchored on the subunit that makes fewer contacts with the nucleotide and is the homolog of the proposed “arginine finger” in FtsH [29]. Although the term “arginine finger” (taken from small GTP-binding proteins Ras and Rho) implies a direct catalytic role for this residue, its distance from the nucleotide phosphates argues more for an indirect role. A similar conclusion has since been reached for ClpA [7]). Experimentally, mutation of either of the two arginine residues abolishes all ATP-dependent proteolysis activity [30]. Loss of subunit interactions plays a major role in the loss of function: The “arginine sensor” mutant R325E is fully and the “arginine finger” mutant R393 is partially dissociated in gel-filtration experiments in the presence of salt [30]. A very complex picture has emerged from biochemical and crystallographic studies designed to characterize the substrate-binding sites in HslU. An essential role for the C-terminus of HslU was first suggested on the basis of experimental studies that were designed based on the prediction of PDZ-like domains at the C-terminus of HslU [26]. Although the prediction of PDZ-like domains later turned out to be in error, the conclusion about the role of the C-terminus of HslU in substrate recognition was later corroborated with the definition of a biochemically defined sensor and substrate discrimination domain (SSD) [31] that is also present in other AAA(+) ATPases and was suggested to act as the “hook” for substrates [32]. When the crystal structure of HslU became available, the SSD domain turned out to coincide with its C-domain. This finding is remarkable and not fully understood, since in all crystal structures of HslU the C-domain primarily mediates oligomerization contacts between HslU subunits. Its solvent-accessible regions are found far on the periphery of the HslU ring, far outside the cavity that is formed by the protruding I-domains (see Figure 2). From the crystal structure [9], it would appear likely that the protruding I-domains rather than the SSD domains act as the “hook”, although the ill-defined tertiary structure of the I-domains makes specific interactions unlikely (see Figure 2). Consistent with this model, it was found experimentally that the I-domains are essential for the degradation of the folded substrate MBP-sulA [30]. A recent two-hybrid study is consistent with both points of view. It confirms
5
6 Molecular Machines for Protein Degradation
Fig. 2 HslU surface colored according to domain. (A) View along the six-fold axis, seen from the side opposite to the I-domains. (B) View along the six-fold axis, seen from the side of the I-domains. Every second subunit of the ring is colored according to domain (N-domain yellow, I-domain blue, C-domain or SSD-domain red), the other subunits are colored in green. The diagram is based on the trig-
onal crystals of E. coli HslU that contain nucleotide in every other subunit. This asymmetry and crystallographic packing effects are responsible for the broken six-fold symmetry of the I-domains. Note that the I-domains of three subunits at the top of the figure have been cut away in (B) to allow a view on the globular N- and C-domains.
the essential role of the C-domain (SSD-domain) in oligomerization, but also supports a role for the I-domain and the SSD-domain in substrate degradation [33]. Presumably, if substrates are translocated through the central pore in HslU as EM data suggest for ClpXP [34], both the I-domains and the globular part of the ring would come into contact with substrate during substrate translocation, although the location of the C-domains (SSD-domains) on the periphery of the HslU ring then still needs to be reconciled with this model. The precise mode of recognition of substrates is even less clear for CodX, the HslU homolog from B. subtilis. In the absence of detergent, the I-domains of two hexameric CodX rings contact each other, leading to a head-to-head stacking of CodX rings and presumably the formation of a central cavity loosely surrounded by I-domains. As the dimer of CodX rings can associate with CodW protease on either side, repetitive, chain-type structures with alternating double rings of the peptidase CodW and the ATPase CodX can be formed [35]. The physiological significance of these high molecular weight assemblies is currently not clear. 2.4 The HslVU–Protease Complex
Over the years, a key theme in ATP-dependent proteolysis has been the issue of “symmetry-matched” vs. “symmetry-mismatched” complexes. In the light of the
2 The ATP-dependent Protease HslVU
clearly established symmetry mismatch of the ClpAP [36] and ClpXP [37] complexes, the very clear EM data on the six-fold symmetry of HslV [21, 38] and reports about a predominant species of HslU with six-fold symmetry [21, 38] came as a surprise because they implied that a “ratcheting mechanism” of ATP-dependent proteolysis, if it existed at all, could not be operating in the HslVU system. This conclusion has since been confirmed by all HslU and HslVU crystal structures [9, 39–41]. In all cases, HslU is hexameric and matches the oligomerization state of HslV. For the first crystal structure of an HslU-HslV co-crystal, a controversial I-domainmediated contact between HslU and HslV was reported [10, 40, 43]. The contact was suspicious from the very beginning because of poor contact area, but seemed compatible with the known low affinity between HslV and HslU and appeared to explain how the symmetry mismatched ClpXP and ClpAP complexes could be formed. Although a crystallographic reinterpretation of our original data that attributed this docking mode to overlooked twinning [40, 43] turned out to be itself in error [10], it is now clear from the combined results of cryoelectron microscopy [44], small-angle scattering [39] and several additional crystal structures of the complex [39, 40] that the physiological mode of interaction between HslU and HslV is with HslU I-domains distal to HslV (see Figure 3). 2.4.1 Allosteric Activation In the absence of a nucleotide, HslVU has residual activity at best, but the presence of several non-hydrolyzable ATP-analogs is sufficient to stimulate HslVU–driven proteolysis activity against substrates that do not require unfolding, suggesting an allosteric effect of nucleotide on HslU and via HslU on HslV. This was further corroborated by the observation that a peptide vinyl sulfone formed a covalent complex with HslV only in the presence of HslU and a nucleotide [12]. The details of this allosteric mechanism emerged from the crystal structure of HslVU from H. influenzae. In this case, but not in other crystal structures of the HslVU complex, the normally buried C-termini of HslU distend and insert into active-site clefts in HslV to reach out almost to the HslV active centers [39] (see Figure 4). The crystal structure of HslVU in complex with a peptide vinyl sulfone inhibitor [46] and two independent biochemical studies [47, 48] that demonstrated the activatory properties of the C-terminal tails of HslU further corroborated this mechanism. In the light of these data, it is remarkable that wild-type HslU in the presence of ADP does not act as an activator for HslV, not even against unfolded or chromogenic substrates. A possible, but experimentally untested explanation could be that the C-termini of HslU are available for HslV binding only in the presence of activatory nucleotides. Whatever the details of the allosteric activation mechanism, it is already clear that HslU affects primarily the conformation of HslV active sites and not the accessibility of the HslV proteolytic chamber. Two independent lines of in vitro evidence support this conclusion. Firstly, an HslV mutant with a widened entrance channel does not show increased proteolytic activity in the absence of HslU, although it can still be activated like wild-type HslV by the presence of HslU [47]. Secondly, the crystal structure of the H. influenzae asymmetric HslVU protease in complex with an
7
8 Molecular Machines for Protein Degradation
Fig. 3 HslVU complex. Originally reported (A) and physiologically relevant (B) docking mode between HslV and HslU. In the physiologically relevant docking mode, the I-domains of HslU point away from the HslV.
inhibitory peptide vinyl sulfone has the inhibitor bound only in HslV subunits that are in contact with HslU [41], strongly arguing against accessibility of the proteolytic chamber as the rate-limiting factor at least under the experimental conditions. 2.5 A Comparison of HslVU with ClpXP and ClpAP
The protease core particles HslV and ClpP are assembled from subunits of entirely different fold and catalytic mechanism. ClpP is a serine protease that belongs to the crotonase superfamily of enzymes, a large class of enzymes that catalyze a variety of chemical reactions that all require the stabilization of an intermediate by an oxyanion hole [3]. HslV belongs to the family of Ntn-hydrolases that share the fold and use the N-terminal residue as the nucleophile [15]. Both ClpP [5] and HslV
2 The ATP-dependent Protease HslVU
Fig. 4 HslVU activation mechanism. (A) Stereo view (C" -trace) of the superposition of the C-domain of an HslU subunit (red) from the original E. coli HslVU complex onto that of the H. influenzae HslU subunit (green) from its complex. Two HslV subunits (pink and blue) from the H. influenzae complex are also shown to illustrate the binding of the C-terminal segment of H. influenzae HslU to the pocket between the HslV subunits (indicated also by a black
curved arrow). (B) Stereo view of the closeup of the C-terminal residues of an E. coli HslU subunit (red) from the complex. The carboxylate of the terminal leucine residue forms salt bridges with R394 of the same subunit and with R329 of an adjacent (yellow) HslU subunit that is not illustrated in (A) for clarity. (C) Stereo view of the closeup of the C-terminal residues of an HslU subunit from the H. influenzae HslVU complex.
9
10 Molecular Machines for Protein Degradation
subunits assemble into large oligomers that enclose a central proteolytic chamber, although the symmetry is different, since HslV is a dimer of hexamers and ClpP is a dimer of heptamers. In contrast to the protease components, which have varying symmetry, all known Clp ATPases are assembled from six identical subunits. These subunits contain either one (HslU and ClpX) or two (ClpA) copies of the AAA(+) module. In addition, the Clp ATPases contain additional domains that are unique for each ATPase. In HslU, a mostly helical I-domain is inserted into the AAA(+) module. ClpX contains an N-terminal domain that was shown to bind zinc [4] and act as a dimerization module [24]. Based on the latter result, a model of ClpX as a trimer of dimers was proposed, with N- and C-domains of ClpX forming a regular hexamer and the zinc-binding modules pairing into dimers [24]. In this context, it is remarkable that freshly isolated HslU behaves as a hexamer, but migrates with the apparent molecular weight of a dimer or trimer in gel filtration after a freeze–thaw cycle [7]. Like ClpX, ClpA contains a non-AAA(+) domain at its N-terminus, and like the N-terminal domain in ClpX, this domain also has the capability to bind zinc [47]. However, unlike the N-domain of ClpX, the N-domain of ClpA is almost entirely helical and consists of two four-helix tandem motifs. In both ClpA and ClpX, the N-terminal non-AAA(+) motifs serve as “docking modules” for accessory proteolysis factors that modulate or change the activity of the proteolytic complex itself [25]. The N-domain of ClpX interacts specifically with the adapter protein SspB that stimulates the degradation of SsrA-tagged proteins [27]. The tag is jointly recognized by the SspB-ClpX complex, where SspB interacts with the N-terminal and central region of the SsrA tag and leaves the C-terminal region for interaction with ClpX [57, 58]. The C-terminal region of SspB shares considerable homology with the corresponding region in RssB, another ClpX adapter protein [27]. It appears that RssB promotes the degradation of a specific substrate, namely a subunit of RNA polymerase known as σ S [59]. It has been shown biochemically that in this case again the adapter protein and the ATPase recognize distinct sites in the substrate [60]. ClpA has its own adapter protein, ClpS. At least in vitro, ClpS switches ClpAP activity away from SsrA tagged towards heat-aggregated proteins [26]. The independent crystal structures of ClpS in complex with the N-domain of ClpA are available [47, 62]. They explain the specificity of ClpS for ClpA over other related Clp proteins, especially ClpB [62]. HslU lacks a domain upstream of the AAA(+) module. Consistent with this, no adapter proteins for HslU have been found so far, to the best of our knowledge. Mechanistically, there are important differences between the protease activation mechanisms of ClpXP/AP and HslVU. Most importantly, HslVU is a symmetrymatched proteolytic machine. In contrast, the definitive seven-fold symmetry of the ClpP protease and the well-established six-fold symmetry of ClpA and ClpX imply that ClpXP is a symmetry-mismatched system. It is hard to imagine how such an arrangement would be compatible with an HslVU–style activation mechanism with insertion of the C-termini of all activator subunits into clefts in protease. Moreover, the C-termini of ClpX particles from various species are very poorly conserved, arguing against their involvement in any allosteric activation mechanism [47].
2 The ATP-dependent Protease HslVU
Experimental evidence implicates an internal loop of ClpX in ClpP binding [63]. This loop is required for ClpXP proteolytic activity and may well be the functional equivalent of the C-terminus of HslU. If so, then the symmetry mismatch in ClpXP would suggest that only a subset of ClpX loops could insert into ClpP clefts at any given time. 2.6 HslVU Peptidase as a Model for the Eukaryotic 26S Proteasome?
On the sequence level, HslV shows sequence similarity with the β-subunits of arch-aebacterial and eukaryotic proteasomes. The crystal structure of E. coli HslV confirmed that individual subunits share the Ntn-hydrolase fold with Thr1 at the N-terminus as the nucleophile, just as in proteasomes. Despite these similarities, there are substantial differences between bacterial HslVU and archaebacterial and eukaryotic 20S proteasomes. In contrast to HslVU, 20S proteasomes are assembled from four rings of seven subunits each, that build up a central proteolytic chamber and two flanking antechambers. The essential role of the C-terminus of HslU has its direct counterpart in the essential role of the C-terminus of the ATP-dependent proteasome activator PA28 [64]. A complex of the yeast 20S proteasome with PA26, the Trypanosoma brucei homolog of PA28 has been crystallized and shows that the C-termini of PA28 insert into clefts in the 20S core particle [65], leading to an opening of the gates in the antechambers. So far, there is no evidence for allosteric activation in the 20S proteasome-PA26 complex or the 26S proteasome, where channel “gating” appears to be important as discussed below in Section 3 on the yeast 20S proteaseome. Currently, high-resolution EM image reconstructions for the 26S proteasome [66], but no atomic-resolution crystallographic data are available for any complex of 20S proteasomes with ATP-dependent activators. The expected assembly of PAN, the archaebacterial AAA(+) activator of proteasomes [67] into hexamers suggests a symmetry-mismatched complex in archaebacteria. The ATP-dependent proteasome activators of eukaryotic proteasomes known as PA700 and the 19S cap also contain six AAA(+) ATPases in a subcomplex of the 19S complex known as the “base” [2]. A priori, one would expect the six ATPases to form a ring with pseudo six-fold symmetry similar to the six-fold ring seen in bacterial AAA(+) activators, but two-hybrid experiments have suggested alternative models [68]. It is currently not clear whether the C-termini of the AAA(+) ATPases in the 19S cap play a similar role as in the ATP-independent PA28-20S proteasome complex. There is no consensus in the C-terminal sequences of different proteasomal AAA(+) ATPases of any particular species, but consensus sequences for the C-termini of any particular subunit from different species can be defined. Unfortunately, the overall sequence similarity of homologous sequences from different species is too high to infer a functional role of the AAA(+) C-termini from sequence similarity. As discussed above, the eubacterial HslVU is distantly related in structure to the proteasome found in archaea and eukaryotes. Surprisingly, however, the structural
11
12 Molecular Machines for Protein Degradation
relationship is not reflected in the regulatory properties as will be described in Section 3, which focuses on structural studies of the yeast 20S proteasome and its activation, activity, and inhibition.
3 The Yeast 20S Proteasome
The most elaborate version of the proteasome core particle (CP) is found in eukaryotes as shown by the crystal structure of the yeast 20S proteasome [40] (see Figure 5B). Here, the α- and β-subunits have diverged into seven different subunits each as compared to the archaeal enzyme which mostly consists of two components [70]. The subunits are present in two copies and occupy precisely defined positions within the 20S complex. As in the archaeal proteasome, the α-subunits are inactive and contribute to the antechambers of the particle, whereas the β-subunits set up the inner hydrolytic chamber. Remarkably, four of the seven different eukaryotic β-subunits lack residues that are essential for pro-peptide autolysis and are therefore proteolytically inactive. As in the archaeal CP, the remaining three subunits mature autoproteolytically to active threonine proteases but with a caspase-like (β1), trypsin-like (β2), and chymotrypsin-like (β5) activity, they exhibit different cleavage potentials [40]. The crystal structure of the bovine 20S proteasome [71] demonstrated that yeast and mammalian CPs are highly homologous in their structural architecture, quaternary assembly, and active-site geometry. However, in mammalian cells, three additional non-essential subunits, β1i, β2i and β5i, respectively, can replace their constitutive counterparts upon induction by the cytokine γ -interferon. The interchange of active subunits modifies the CPs peptidase specificity and is important for the function of the immunoproteasome. The specificity of the S1 pockets of the induced subunits increases the yield of peptides favored for binding to MHC class I molecules and antigen presentation. The bovine proteasome structure provides an explanation of how the constitutive and inducible β-subunits can be mutually interchanged. In the yeast proteasome, these subunits are held in place by several specific interactions, which are absent in the mammalian homolog [71]. 3.1 The Proteasome, a Threonine Protease
As mentioned in Section 3, proteasomes are threonine proteases. Accordingly, a proton acceptor is required to activate the hydroxylic group of Thr1. Although there are several potential acidbase catalysts around Thr1, activation seems to proceed by its own terminal amino group [2, 41]. A positively charged side chain of Lys33 lowers the pK a of the Thr1 amino group (see Figure 6). The assignment of the N-terminus as the catalytic base in proteolysis is further supported by the fact that all Ntnhydrolases share a common fold, but generally do not display any kind of active-site consensus. A second essential factor for proteolysis is a catalytic water molecule that has been observed in the high-resolution structures of CPs [23, 40, 46, 71]. The
3 The Yeast 20S Proteasome
Fig. 5 Molecular surface of the archaeal (A), the eukaryotic 20S (B) and the HslV proteasome (C). The accessible surface is colored in blue, the clipped surface (along the cylinder axis) in white. To mark the position of the active sites, the complexes are shown with the bound inhibitor calpain (yellow). (A) The disorder of the first Nterminal residues in the archaeal "-subunits generates a channel in the structure of the CP, (B) whereas the asymmetric but welldefined arrangement of the " N-terminal tails seals the chamber in eukaryotic CPs. (C) The eubacterial “miniproteasome” has an open channel through which unfolded proteins and small peptides can access the proteolytic sites. (D) Ribbon plot of the free "-ring from A.fulgidus focusing on
the defined N-termini (red). Tyr8 of each N-terminal part makes hydrogen bonds to Asp9 of the adjacent "-subunit (yellow), Arg10 (red) points toward the channel, generating a 13-Å entrance. The final 2FO -FC electron density map, contoured at 1F is shown for the YDR-motif (E, F) Electron density maps of the yeast core particle from wild type and "3N mutant, respectively The individual N-terminal tails of the "subunits are drawn in different colours. Asp9 of subunit "3 plays a key role in stabilizing the closed state ofthe channel and is marked with a black arrow. In the "3N mutant, an open axial channel is visible, whose dimensions are comparable to those ofthe archaeal CP channel.
13
14 Molecular Machines for Protein Degradation
Fig. 6 Proteolytic site of the yeast $1 subunit. The protein backbone is drawn as a white coil with the active-site residues Thr1, Asp17, Lys33, Ser129, Asp166 and Ser169 shown in ball-and-stick mode. Owing to the salt-bridge with Asp17, the amino group of Lys33 should be positively charged and thus
be able to lower the pK a of Thr1O( electrostatically. The other active-site residues Ser129, Asp166 and Ser169 define the orientation of Thr1. A water molecule (green sphere) is located properly to shuttle protons between the terminal amino group and Thr1O( during proteolysis.
solvent molecule is ideally positioned to shuttle protons between the Thr1Oγ and the N-terminus. Furthermore it could act as the base for the cleavage of the acyl ester intermediate, thereby releasing Thr1Oγ for the next catalytic cycle [23]. In general, the cleavage products of the proteasome vary in length between 3 and 25 amino acids with an average length of 7 to 8 amino acids. Several models have been suggested for the “molecular ruler” that determines fragment length. On the basis of structural studies of mutants unable to autolyse, we suggest that it is determined by the substrate-binding clefts designed for peptides 7–9 amino acids long [41]. The likelihood of substrate cleavage depends on the mean residence time at the proteolytic sites, which is maximal if all binding sites are filled [22, 77]. The active subunits in eukaryotic CPs differ mainly in their binding pockets, yielding different cleavage specificities. However, it must be emphasized that the proteasome complex does not represent a simple collection of chymotrypsin-like, trypsin-like, and caspase-like enzymes. In fact it is the structural architecture of the proteolytic chamber that determines specificity. The local structure around each active site imposes a physical constraint on the peptide substrates, whereas the selectivity of the S1 pockets is less relevant. The mechanistic importance of the inner chamber can also be seen from the fact that substrate residues other than P1 influence degradation [13, 17, 45] and that neighboring subunits interfere with
3 The Yeast 20S Proteasome
the functions of the catalytic subunits [41, 49, 53]. However, inhibitor binding and mutational studies indicate that allosteric interactions between individual subunits are insignificant [41, 45, 49, 53, 83] contrasting the allosteric activation of HslV by HslU described in the previous section. 3.2 Inhibiting the Proteasome
Proteasome inhibitors have been instrumental in identifying numerous protein substrates and in elucidating the importance of the proteasome/ubiquitin pathway in many biological processes. Initially, non-specific cell-penetrating peptide aldehydes were used for this purpose. More recently, it became possible to synthesize compounds with increased potency and selectivity [1, 30]. Furthermore, based on the crystal structure of the yeast and bovine liver CP [40, 71], molecular modeling can now be used to engineer improved inhibitors. Besides the synthetic inhibitors, a variety of natural compounds is known to inhibit the CP. One of these natural inhibitors, lactacystin, was discovered by its ability to induce neurite outgrowth in a murine neuroblastoma cell line. Incubation of cells in the presence of radioactive lactacystin leads to the labelling of the β5 subunit [32] and to irreversible inhibition of the CP. As shown by X-ray analysis, the inhibitor is covalently attached to subunit β5 by an ester bond with the N-terminal Thr1Oγ [40] (see Figure ??A). The subunit selectivity of lactacystin can be attributed to its dimethyl group, which mimics a valine or a leucine side chain and closely interacts with Met45 in the hydrophobic S1 pocket of subunit β5. Epoxomicin, an α ,β -epoxyketone peptide, is a natural compound that potently and irreversibly inhibits the catalytic activity of the CP [83]. Unlike most other proteasome inhibitors, epoxomicin is highly specific for the proteasome and does not inhibit any other protease. The crystal structure of epoxomicin bound to the yeast CP explained the unique selectivity of the inhibitor (see Figure ??B). Adduct formation yields an unexpected morpholino ring, which is formed between the Thr1Oγ , the N-terminus, and the epoxy group of the inhibitor [43]. However it should be noted that proteasome inhibitors that covalently bind to the active βsubunits, usually cause apoptosis and cell death [66]. They are therefore cytotoxic and thus may not be pharmaceutically relevant. Recently, it was shown that certain natural products from Apiospora montagnei, TMC-95s, block the proteolytic activity of the CP selectively and reversibly in the low nanomolar range [67, 69]. The TMC-95s represent a novel class of proteasome inhibitors consisting of modified amino acids, which form a heterocyclic ring system. The crystal structure of the yeast CP in complex with TMC-95A shows the inhibitor non-covalently bound to all active sites [44] (see Figure ??C). TMC-95A was anchored by several specific hydrogen bonds, which are formed with main-chain atoms and strictly conserved residues of the β-subunits. The structures of TMC95s contain a crosslink between a tyrosine and an oxoindol side chain, resulting in a strained conformation that fits ideally to the CP active site. Thus the entropic penalty of binding is lower than for more-flexible ligands, which in turn explains
15
16 Molecular Machines for Protein Degradation
3 The Yeast 20S Proteasome
the specificity and high affinity of the TMC-95s. Modeling studies indicate that it is possible to generate a TMC95 scaffold with a variety of functional groups attached to target the proteasomal S1 and S3 pockets and generate subunit specificity [54, 78]. All agents that specifically inhibit the proteasome are potentially of great pharmacological interest [80]. As the CP plays a dominant role in generating antigenic peptides, which are subsequently bound by MHC I molecules, compounds that block this activity might serve as a basis for the development of immunosuppressive drugs. Attempts are being made to design synthetic proteasome inhibitors using the discussed natural inhibitors as lead structures. In addition, proteasomal inhibitors represent powerful tools in molecular biology and can be utilized to identify novel cellular roles of the proteasome. 3.3 Access to the Proteolytic Chamber
In the Thermoplasma acidophilum and Archaeoglobus fulgidus CP, two narrow entry ports of 13-Å diameter exist at both ends of the cylinder, which prevent folded proteins entering [46, 70] (see Figure 5A). Many archaebacteria, such as Methanococcus jannaschii, contain a gene named PAN (proteasome-activating nucleotidase), which is highly homologous to the six ATPases in the 19S-component of the eukaryotic 26S proteasome [67]. It was shown that PAN selectively stimulates the degradation of unfolded proteins, whereas the digestion of small peptides is not enhanced. The threading of specific protein substrates into the lumen of the CP requires the action of an ATPase. The corresponding translocation process catalyzed by PAN follows ATP-dependent unfolding [6, 86]. However, it still remains to be clarified whether complex formation is a prerequisite for the cooperation between PAN and CP or whether the two systems work independently. In contrast to the archaeal CPs, the hydrolytic chamber of the eukaryotic 20S proteasome is tightly sealed (see Figure 5B). The N-termini of the α-subunits project down and across the axial pore and block the entrances by several layers of interdigitating side chains, which form ← Fig. 7 Inhibitor binding to individual active sites of the yeast 20S proteasome. The inhibitors lactacystin (A), epoxomicin (B) and TMC95A (C) are colored green and are shown in stereo mode together with their unbiased electron densities. The active-site Thr1 is highlighted in black. (A) Covalent binding of the Streptomyces metabolite lactacystin to the active site of $5. The S1 pockets of the active subunits $1 and $2 differ from that of $5 and are not suitably constructed to bind the inhibitor. As discussed in the text, Met45 (black), which is located at the bottom of the $5-S1 pocket, makes the difference for inhibitor binding.
(B) Covalent binding of the proteasome inhibitor epoxomycin to $2. The electron density reveals the presence of a unique six-membered ring. The morpholino derivative results from adduct formation between epoxomycin and the proteasomal Thr1O( and amino terminus (pink sticks) and explains the specificity of the inhibitor towards Ntn-hydrolases. (C) Noncovalent binding of the specific proteasome inhibitor TMC-95A from Apiospora montagnei to $2. TMC-95A binds near the proteolytic centre in all active subunits in the extended substrate binding site.
17
18 Molecular Machines for Protein Degradation
a lattice-like structure [40] (see Figure 5E). Thus activation of the eukaryotic CPs requires substantial structural rearrangements of the N-terminal tails to open the molecular gate. This regulatory principle has been confirmed by a yeast CP mutant, in which the first nine amino acids of subunit α3 were deleted (α3N-mutant) [42]. The α3-N-terminal tail was chosen for deletion because it traverses the pore of the CP and contacts all other N-termini that are involved in the structural organization of the plug [40, 71]. In the crystal structure of the mutant, open axial pores were observed that were equivalent in size to those seen in the archaebacterial CP (see Figure 5F). Several points of evidence indicate that opening of the gate is indeed essential for catalytic activation, as all proteolytically active sites are simultaneously activated and no significant structural changes can be seen between mutant and wild-type CP excluding allosteric effects. Furthermore, addition of the synthetic α3-N-terminal peptide to the α3N mutant restores wild-type behavior. An alanine scan of this peptide revealed that α3-Asp9 is essential for stabilizing the closed state of the channel. This aspartate residue closely interacts with Tyr8 and Arg10 of the neighboring subunit α4 (see Figure 5E). The strict conservation of this YDR motif and of other α-N-terminal residues suggests a universal mechanism for opening gates in eukaryotic CPs that has been conserved during evolution. We suggest that binding of regulatory proteins to the CP triggers the rearrangement of the α3-tail and thus opens the gate. This notion was further confirmed by the crystal structure of a 20S/11S heterologous complex between the yeast 20S-proteasome and the Trypanosoma 11Sregulator [65]. This approach was justified by the ability of 11S regulators to activate 20S proteasomes from widely divergent species. The structure of the chimeric complex showed that one cylindrical regulator was bound at each end of the 20S barrel structure. Unlike the uncomplexed proteasome, all of the seven α-subunit Nterminal tails extend away from the CP in the complex and project towards the pore of the regulator. This rearrangement provides access to the proteolytic chamber and is basically achieved by two features: Firstly, the 11S-“activation loops” impose a more stringent seven-fold symmetry on the CP thereby straightening out the asymmetrically oriented α-tails and removing them from the entrance/exit gates. Secondly, the high-affinity binding between CP and 11S is accomplished by the C-terminal sequences of the regulator, which insert into pockets formed between the 20S α-subunits. The major contact is observed between the C-terminal mainchain carboxylate of the 11S regulator and the entry of an internal helix of the CP α-subunit. The strength of this interaction is amplified by the heptameric assembly of the 20S/11S complex [65]. Activation of the CP by the 19S-complex is also regulated by controlling access to the proteolytic chamber, but the gating mechanism differs from that seen in the 11S/20S-complex. The 19S RP consists of two subcomplexes termed lid and base [35]. The base appears to form a ring like structure, including six conserved ATPase subunits (Rpt1–6), and Rpn1 and Rpn2, which are located proximal to the CP’s α-ring. Mutation studies indicated that the ATPase domain of Rpt2 plays a major role in regulating peptidase activity and that the 19S RP opens the gate to the protease in an ATP-dependent manner [68]. No detailed structural information
4 The Tricorn Protease and its Structural and Functional Relationship with Dipeptidyl Peptidase IV 19
is as yet available that could provide further insight into how the 19S RP controls proteasomal activity. In some but not all archaea the proteasome is accompanied by another large cage-forming protease, the tricorn protease. Tricorn functionally interacts with the proteasome by cleaving the proteasomal peptide products into smaller peptides, which are further degraded into single amino acids by associated factors. Structural and functional aspects of tricorn are described in Section 4. The unexpected relationship between tricorn and the eukaryotic dipeptidyl peptidase IV revealed by these structural studies is also discussed in brief.
4 The Tricorn Protease and its Structural and Functional Relationship with Dipeptidyl Peptidase IV
Each living cell is a complex system and needs to continuously clear unnecessary or defective components. Within this context, the importance of the proteasome is well established (see Section 2). It predominantly carries out the degradation of cytosolic proteins and generates peptides varying in length between 3 and 25 amino acids. In order to be useful resources to the cell, these products need to be further degraded to eventually yield single amino acids. In the model organism Thermoplasma acidophilum a proteolytic system has been identified that does indeed perform this processing [101]. Based on the crystal structure of the tricorn protease [14], we provide evidence of how the tricorn protease accomplishes efficient turnover of the proteasome-generated peptides. The structure of tricorn reveals a complex mosaic protein whereby five domains combine to form one of six subunits, which further assemble to form the D3 symmetric core protein. The structure shows how the individual domains coordinate the specific steps of substrate processing, including channeling of both the substrate to and the product from the catalytic site. Moreover, the structure shows how accessory protein components might additionally contribute to an even more complex protein machinery that efficiently collects the tricorn-released products. 4.1 Architecture of the Tricorn Protease
The hexameric D3-symmetric tricorn protein is assembled by two perfectly staggered and interdigitating trimeric rings with every subunit of one ring forming contacts almost exclusively with the two subunits of the other ring related by the molecular diads. The toroid structure has the shape of a distorted hexagon formed by a trimer of dimers (see Figure 8). The overall dimensions of the molecule are 160 Å within the plane normal to the three-fold axis and 88 Å parallel to it. The conically shaped central pore connects with additional cavities formed by the individual subunits like spokes of a wheel (see Figure 8). A single subunit is further divided into five sequential sub-domains, namely the N-terminal six-bladed
20 Molecular Machines for Protein Degradation
Fig. 8 Surface representation of the tricorn protease with the ribbon model of one subunit superimposed. The two orthogonal views are along the molecular two-fold and three-fold axis, respectively. The six solid spheres indicate the active-site positions.
β-propeller (β6) followed sequentially by a seven-bladed β-propeller (β7). Both β6 and β7 are topologically unclosed, an extremely rare feature observed only in the prolyl oligopeptidase (POP) [33] and DPIV protease [31, 96]. A PDZ-like domain (R761-D855) is interspersed between the two C-terminal mixed α-β domains. These C-terminal domains harbor the catalytic residues and exhibit the α-β hydrolase fold again underlining the relationship of tricorn with DPIV and POP. 4.2 Catalytic Residues and Mechanism
To elucidate the amino acids crucial for its catalytic activity, we have co-crystallized tricorn with a series of chloromethyl ketone-based inhibitors, including TLCK and TPCK for which we have confirmed inhibitory efficacy. For all of these inhibitors, we have observed continuous electron density connecting to the side chain of S965 which was unambiguously fitted by the respective inhibitor. S965 is positioned at the entrance to helix H3 within sub-domain C2. The uncapped amino group of D966 forms, together with that of G918, the oxyanion hole, which is occupied by a water molecule in the uninhibited structure. H746 is ideally positioned to activate the catalytic S965 at a hydrogen-bonding distance of 2.7 Å. However, in none of the inhibitor complexes could we observe a covalent linkage between H746 and the inhibitor, as observed in the trypsin-like serine proteases [11]. Tricorn is related
4 The Tricorn Protease and its Structural and Functional Relationship with Dipeptidyl Peptidase IV 21
to the cysteine proteinases in this respect [29]. We confirmed that both residues are crucial for catalysis by constructing the single-site mutants S965A and H746A, both of which are amidolytically inactive. The H746 is correctly oriented by the Oγ of S745 which in turn is polarized by E1023. The arrangement of S965, H746, and the oxyanion hole suggests that the classical steps of peptide-bond hydrolysis follow the sequence of the trypsin-like serine proteases, namely the formation of the tetrahedral adduct, the acyl-enzyme complex, and hydrolysis. Tricorn has been shown to exhibit both tryptic and chymotryptic specificities [101]. The X-ray structure reveals that specificity for basic P1 residues is conferred by D936 which is provided by the diad-related subunit (see Figures 9 and 10). In this way, the previously described structural linkage (trimer of dimers) is translated into functional cooperativity within the dimers. Intriguingly, in the uninhibited high-resolution crystal structure, the acidic S1 specificity-determinant residue D936 was mobile. Consistent with this, the side chain of D936 in the TPCK complex structure adopts an alternative rotamer to allow the TPCK phenyl ring to freely access the hydrophobic niche formed by Y946, I969, V991, and F1013. D936 thus serves as a substrate-specificity switch accommodating both hydrophobic and basic P1 residues. The SO2 group of TPCK and TLCK interacts with the NH moiety of I994, thereby already suggesting the strand E993–P996 as the unprimedsubstrate docking site. These substrate-recognition sites are rather unrestricted in accord with tricorn’s broad substrate specificity [101, 108]. This situation is contrasted by the length restriction of the primed-substrate recognition site. A prominent cluster of basic residues (R131, R132) delineates the binding site of the substrate C-terminus. These basic residues, positioned on a flexible loop as discussed in detail below, together with the primed-site topology, clearly mark tricorn as a carboxypeptidase. The geometric dimensions explain
Fig. 9 Stereo view of a 13-mer chloromethyl ketone bound to the active site. The electron density of the peptide directs to the $7 propeller.
22 Molecular Machines for Protein Degradation
Fig. 10 Detailed active-site view and substrate recognition as deduced from experimental complex structures. The substrate C-terminus is anchored by R131 and R132.
tricorn’s preferential di- and tri-carboxypeptidase activity, while the cleavage of longer peptides will require some conformational rearrangement and is energetically less favorable. By contrast, single amino acids cannot be cleaved off a substrate, because the P1 residue is unable to anchor its carboxylate-group on the basic backstop residues (see Figures 9 and 10). A negative charge was not tolerated at positions P3, P4, and P5 of a synthetic fluorogenic AMC-substrate [108]. The crystal structure did not indicate any steric or electrostatic conflicts, if a canonical binding mode of these substrates was assumed. Owing to their lack of a free C-terminus, the charge polarity of N-terminally succinylated fluorogenic substrates is inverted with respect to an unmodified peptide substrate and may lead to unproductive binding with inverted strand polarity. Each of the three C-terminal domains (C1, PDZ, C2) is remarkably similar to the respective domains (A, B, and C) found in the D1-processing protease (D1P) of photosystem II. The rms deviations between the Cα positions of these domains are 2.2, 2.3, and 2.7 Å with 84, 86, and 135 matching amino acids, respectively. A weak homology between these domains is recognizable in the primary sequences
4 The Tricorn Protease and its Structural and Functional Relationship with Dipeptidyl Peptidase IV 23
(11, 19, and 20% identities). The relative arrangement of these domains, however, differs very much between tricorn and D1P. With the C2 domain aligned to the C domain of D1P, the orientation of the C1 domain differs from that of the D1P A-domain by 35◦ . Analogously, the required transformation to align the PDZ-like domains includes a 96◦ rotation. The rotation axes of these transformations are unrelated to each other. In addition, proper alignment of the PDZ domain requires a 30-Å translation. The catalytic serine residues (S965 and S372, respectively) are positioned on topologically equivalent positions at the helix entrance in the C2 (C) domain (D1P). Further, the amides forming the oxyanion hole (G918, D966, and G318, A373 in tricorn and D1P, respectively) superimpose to within 1 Å. As in other Tsp-like proteases, the residue serving as general base in D1P is a lysine (K372) residing within the C domain of D1P, while it is a histidine in tricorn (H746) which resides on tricorn’s C1 domain. The relative arrangement of the C1 and C2 domains in tricorn must, for that reason, remain very restricted to allow for proper catalysis. One role of the PDZ domain in substrate recognition has been shown for Tsp [5] and was analogously suggested for the tricorn protease [93]. While the GLGF substrate recognition element is structurally conserved (R764 IAC767 in tricorn), as pointed out earlier, it appears for a number of reasons unlikely that the tricorn PDZ will participate in substrate recognition in the same way as suggested for D1P [77]: (1) The putative substrate-binding site as defined by the crystal structures of the C-terminal peptides complexed with PDZ domains [16, 28] is partly occupied by outer strands of blade 3 of β6 within the same subunit; (2) the generally conserved arginine (R247) involved in recognition of the carboxylate of the peptide C-terminus corresponds to a hydrophobic residue in tricorn (I851); (3) the orientation and position of tricorn PDZ differs so strongly from that seen in D1P that any analogy based on the sequential domain arrangement is invalidated on the basis of their respective three-dimensional domain arrangement. Instead, the PDZ domain mainly serves to scaffold the sub-domains as described earlier and, in addition, might be involved in recognition of associating component proteins.
4.3 Substrate Access and Product Egress Through β-propellers
The comparison with POP, including the open Velcro-topology [33], suggests an important role of the β-propellers for substrate access to and product exit from the active site [31] (see Figure 11). Both the β6 and β7 propeller axes are directed towards the active site of the protein, almost intersecting near S965. The arginine anchor (R131, R132) obstructs the otherwise direct connection from the active-site chamber to the exterior through the β6 propeller. Given these observations, we propose that the β6-propeller channel represents one, if not the major, rear exit from the catalytic chamber. This is consistent with the point mutation L184C, positioned within the β6 propeller. The introduced thiol group was modified with maleimide, partially blocking the β6 propeller. The activity of this mutant enzyme towards fluorogenic
24 Molecular Machines for Protein Degradation
Fig. 11 Cartoon of the electrostatically driven processive substrate turnover.
substrates is significantly reduced (<50%) compared with the wild-type protein [14, 64]. The substrate entrance and product exit paths are indicated in Figure 11. The chloromethyl ketone-based inhibitor-complex crystal structures suggested the strand E993–P996 as a recognition strand for the unprimed-substrate residues. This strand extrapolates towards the β7 channel (see Figure 9). The channel through the β7 propeller provides a significantly shorter route from the catalytic chamber to the outside of the protein (60 Å) as compared to the alternative route through the central pore (83 Å). The latter path to the active site has multiple branchings and dead ends. Therefore, the β7 channel might be utilized by the enzyme for the preferred substrate passage to the active site. It is wide open but capped on its outside by four basic residues (R369, R414, R645, K646) which are only partially charge-compensated by one acidic residue (D456). This locally positive lid to the β7 propeller channel is encircled by acidic residues (D333, D335, D372, D456, D506, D508, E592, and E663). Except for E663, which is located on the hairpin connecting strand 3 and 4 of blade 7, all these charged amino acids are positioned between strands 1 and 2 of the respective β7 blades. The resulting charge distribution mimics an electrostatic lens, whereby peptides are pre-oriented with their C-termini towards the central basic propeller lid. Once the entrance to the β7 channel is opened by a concerted side-chain movement of R369, R414, R645, and K646, and possibly assisted by main-chain movements of A643–K646 (blade 7), a peptide is able to enter the channel in an extended conformation where it will find multiple docking sites at the unsaturated inner strands of the β7 propeller blades. A similar substrate-gating filter mechanism through a seven-bladed β propeller has been suggested for the prolyl oligopeptidase [33, 34], and there is precedent for a β-hairpin binding into a seven-bladed propeller [52]. The preferred substrate entry through the β7 propeller channel is in line with the point mutation R414C, located in the β7 channel. Derivatization of this introduced thiol group with maleimide markedly reduced the fluorogenic activity of this mutant to about 50% of the wild-type activity [14].
4 The Tricorn Protease and its Structural and Functional Relationship with Dipeptidyl Peptidase IV 25
Tricorn cleaves substrates in a processive mode [64], indicating that only completely digested products will leave the inner protein chambers while larger products will be retained and processed as preferred substrates. The structure suggests several mechanisms to maintain “one way” processing. Basic lids (R414, R645, K646 and R131, R132) are present at the entrances to the β6 and β7 channels. The topology and size of the inner cavities favor an extended conformation of the substrate and the C-terminus of the substrate will be attracted to the basic β6 lid, thereby presenting the substrate’s scissile bond at the active site S965 for proteolyis. In one possible scenario, the primed product residues are released by the enzyme through the “rear exit” to the active site formed by the β6 propeller, which is gated by R131–R132. The arginine gate is located on a helical loop containing three glycines (G126, G130, G139) and not restrained to its position via any protein contacts. These glycines might function as hinge residues allowing the gate to move into a sufficiently voluminous cavity of mixed polarity (see Figure 10). The unprimed side of the substrate is held in place by a series of interactions with the protein. In addition to the observed ionic (D936) or hydrophobic S1 interaction site (Y946, I969, V991, F1013), the P1 main chain is held by its interaction with the oxyanion hole (G918, D966). P2–P4 residues will presumably utilize unsaturated main-chain hydrogen bonds at the strand I994–P996 (see Figure 10) and further interactions might occur in the β7 propeller channel as described in galactose oxidase [52]. The modeling studies and suggested substrate binding at the primed and unprimed sides are fully experimentally confirmed by crystal-structural studies using C- and N-terminally extended covalently bound inhibitors [64]. Tricorn reportedly cooperates with three additional proteins, termed interacting factors F1, F2, and F3, to degrade oligopeptides sequentially to yield free amino acids [37] . F1 is a prolyl iminopeptidase with 14% sequence identity to the catalytic domain of prolyl oligopeptidase POP, which has an additional propeller domain [33, 37]. Guided by this structural scaffold of the latter structure, we speculate that F1 docks onto the six-bladed β-propeller of the tricorn core protein. As in POP, substrate would enter F1 through the propeller channel in this model. While a physical interaction of F1 with tricorn has been suggested [117], the exact mode of interaction of tricorn with F1, F2, and F3 has not been detailed so far. Similarly, there is evidence for functional but not physical interaction of tricorn with the proteasome [117]. A physical interaction between these molecules by aligning their respective central pores would imply a symmetry mismatch. While such a physical interaction would be consistent with the geometric dimensions of both molecules, its existence needs to be experimentally confirmed and characterized. 4.4 Structural and Functional Relationship of Tricorn and DPIV
The situation in the tricorn protease is closely resembled by dipeptidyl peptidase IV (DPIV) where an eight-bladed topologically open β propeller and a side opening provide entrance to and exit from the active site (see Figure 12). Similar to tricorn, DPIV is a serine protease with low but significant structural homology to the
26 Molecular Machines for Protein Degradation
Fig. 12 Ribbon representation of the tetrameric DPIV.
family of α/β-hydrolases. We superimposed the catalytic core elements, including the active-site serine and histidine, the strictly conserved helix following the active-site serine (Ser630-Ala642 and Ser965-Leu977, respectively), and tricorn’s five-stranded parallel β-sheet onto the equivalent strands of the eight-stranded DPIV-sheet. Both sheets have identical polarity. Significantly, both tricorn propellers come to superimpose onto the two DPIV-openings, the tricorn β7 propeller onto the DPIV β8 propeller, and the tricorn β6 propeller onto the side exit, as schematically indicated in Figure 13. This similarity suggests that the β8 propeller provides substrate access to, and the side opening product release from, the DPIV active site. This tricorn-derived model is able to explain the high substrate selectivity critical for DPIV-function to activate or inactivate regulatory peptides. Passage through the β propeller tunnel requires the substrates to unfold thereby providing their “fingerprint” to DPIV. Once the amino terminus of the peptide approaches the active site, it is still held in place by its C-terminus interacting with the β propeller which may contribute to conformationally activate the substrate for cleavage. After the nucleophilic attack the acyl-enzyme intermediate forms, while the primed product is directly released through the side exit. This explains why degradation of glucagon by DPIV is not processive, but occurs sequentially in two independent steps (glucagon 3–29, glucagon 5–29) [94]. Clearly, the final determination of the functional roles of the DPIV openings awaits further experiments.
4 The Tricorn Protease and its Structural and Functional Relationship with Dipeptidyl Peptidase IV 27
Fig. 13 Schematic representation of the active-site access and product egress in tricorn and DP IV.
4.5 The DegP Protease Chaperone: A Molecular Cage with Bouncers
In this section we describe DegP, a bacterial cage-forming protease, which has ho-mologs in all kingdoms of life. It is distinguished in essential ways from the previous systems by exhibiting extreme flexibility and potential to change its overall shape and its internal structure. Structural flexibility is translated into function and the unique property of DegP to act predominantly as a chaperone or as a protease dependent on temperature. DegP is a Janus-faced molecule appearing as helper or killer as cells need it. Cells have developed a sophisticated system of molecular chaperones and proteases to reduce the amount of unfolded or aggregated proteins [120]. Chaperones recognize hydrophobic stretches of polypeptides that become surface exposed as a consequence of misfolding or unfolding. If refolding attempts fail, irreversibly damaged polypeptides are removed by proteases. E. coli contains several intracellular proteases that recognize and degrade abnormally folded proteins. The biochemical and structural features of these ATPdependent proteases have been studied extensively (see Section 2). However, relatively little is known about proteases that are responsible for the degradation of non-native proteins in the periplasmic compartment of gram-negative bacteria. Such function has been attributed to the heat-shock protein DegP, also commonly referred to as HtrA or Protease Do. While most factors involved in protein quality control are ATP-dependent heat-shock proteins [38], DegP fulfills this role without consuming chemical energy [79]. DegP homologs are found in bacteria, fungi, plants, and mammals. Some, but not all, are classical heat-shock proteins. They are localized in extracytoplasmic compartments and have a modular architecture composed of an N-terminal segment believed to have regulatory functions, a conserved trypsin-like protease domain and one or two PDZ domains at the C-terminus [20].
28 Molecular Machines for Protein Degradation
PDZ domains are protein modules that mediate specific protein–protein interactions and bind preferentially to the C-terminal 3–4 residues of the target protein [104]. Prokaryotic DegPs have been attributed to the tolerance against thermal, osmotic, oxidative, and pH stress as well as to pathogenicity [91]. A number of DegP substrates are known. These are either largely unstructured proteins such as casein, small proteins that tend to denature, hybrid proteins, or proteins that entered a non-productive folding pathway [70, 79, 113]. Stably folded proteins are normally not degraded. In addition to its protease activity, DegP has a general chaperone function. The dual functions switch in a temperature-dependent manner, the protease activity being most apparent at elevated temperatures [113]. The ability to switch between refolding and degradation activity and the large variety of known substrates make DegP a key factor controlling protein stability and turnover. 4.6 The DegP Protomer, a PDZ Protease
DegP from E. coli was crystallized at low temperatures in its chaperone conformation and analyzed [71]. The protomer can be divided into three functionally distinct domains, namely a protease and two PDZ domains, PDZ1 and PDZ2 (see Figure 14). Like other members of the trypsin family, the protease domain of DegP has two perpendicular β-barrel lobes with a C-terminal helix. The catalytic triad is located in the crevice between the two lobes. While the core of the protease domain is highly conserved, there are striking differences in the surface loops L1, L2, and L3 (for nomenclature see Perona and Craik [92], which are important for the adjustment of the catalytic triad (Asp105, His135, Ser210) and the specificity pocket S1. The enlarged loop LA protrudes into the active site of a molecular neighbor, where it intimately interacts with loops L1 and L2. The resulting loop triad LA*-L1-L2 completely blocks the substrate-binding cleft and results in a severe deformation of the proteolytic site abolishing formation of the catalytic triad, the oxyanion hole, and the S1 specificity pocket. Thus the protease domain of the DegP chaperone is present in an inactive state, in which substrate binding as well as catalysis is prevented [71]. The structure of the PDZ domains of DegP is similar to PDZ domains of bacterial origin [77]. Compared to the canonical 4+2 PDZ β-sandwich [16], the DegP PDZ domains show a circularly permuted secondary structure, in which the Nand C-terminal strands are exchanged. Furthermore, they contain a 20-residue insertion following the first β-strand (including helix f) that is important for interand intramolecular contacts within the oligomer. In analogy to other PDZ domains, PDZ1 and/or PDZ2 should be involved in substrate binding. PDZ1 contains a deep binding cleft for substrate, which is mainly constructed by strand 14, its N-terminal loop (the so-called carboxylate-binding loop) and helix h. The carboxylate-binding loop is located in a highly positively charged region and is formed by an E-L-G-I motif, which is similar to the frequently observed G-L-G-F motif [16]. Binding specificity is mainly conferred by the specific configuration of the 0, −2, and −3 binding pockets [109], where pocket 0 anchors the side chain of the
4 The Tricorn Protease and its Structural and Functional Relationship with Dipeptidyl Peptidase IV 29
Fig. 14 Structure of DegP protomer. Ribbon presentation of the monomer, in which the individual domains are colored differently. Residues of the catalytic triad are shown in a ball-and-stick model. The nomenclature of secondary structure elements, the termini of the protein and regions that were not defined by electron density are indicated.
C-terminal residue. In PDZ1, all pockets are built by mainly hydrophobic residues. The thermal motion factors point to the flexibility of strand 14 and its associated carboxylate-binding loop, indicating the plasticity of the binding site. Thus PDZ1 seems to be well adapted to bind various stretches of hydrophobic peptide ligands. Different from PDZ1, the occluded binding site of PDZ2 is unlikely to be involved in substrate recognition. 4.7 The Two Forms of the DegP Hexamer
In the crystallographic asymmetric unit, two DegP molecules (A and B) were observed, which build up two distinct hexamers (see Figure 15). Both hexamers are formed by staggered association of two trimeric rings. Hexamer A is a largely open
30 Molecular Machines for Protein Degradation
Fig. 15 Structure of the DegP hexamer. Ribbon presentation of the monomer, in which the individual domains are colored differently. Residues of the catalytic triad are shown in a ball-and-stick model. The nomenclature of secondary structure elements, the termini of the protein and regions that were not defined by electron density are indicated.
structure with a wide lateral passage penetrating the entire complex (see Figure 15A), whereas hexamer B corresponds to the closed form, in which a cylindrical 45-Å cavity containing the proteolytic sites is completely shielded from solvent (see Figure 15B). In both cases, the top and bottom of the DegP cage are constructed by the six protease domains, whereas the twelve PDZ domains generate the mobile sidewalls. The height of the cavity is determined by three molecular pillars, which
4 The Tricorn Protease and its Structural and Functional Relationship with Dipeptidyl Peptidase IV 31
are formed by the enlarged LA loops of the protease domain. The PDZ domains are able to adopt different conformations and represent side doors that may open. This en-bloc mobility enables the PDZ domains to function as tentacular arms capturing substrates and delivering them to the inner cavity. This structural organization is strikingly different from the other cage-forming proteases, where substrates enter the central cavity through narrow axial or lateral pores as described in Sections 2 to 4. 4.8 DegP, a Chaperone
E. coli DegP has the ability to stabilize and support the refolding of several non-native proteins in vivo and in vitro [84, 113]. Possible binding sites for misfolded proteins are located within the inner cavity (see Figure 16). The solvent-accessible height of this chamber is 15 Å at its center and increases to 18 Å near the outer entrance. Owing to these geometric constrictions, substrates must be partially unfolded to reach the active site (see Figure 15). As in other chaperones of known structure, the DegP cavity is lined by hydrophobic residues. Two major hydrophobic grooves can be distinguished, which are mainly constructed by residues of loop LA and L2. Notably, the hydrophobic binding sites of the PDZ1 domains are properly oriented to augment the number of potential binding patches. The alternating arrangement of polar and hydrophobic surfaces, both within one trimeric ring and between trimeric rings, should allow the binding of exposed hydrophobic side chains as well as of the peptide backbone of substrates. Taken together, the ceilings of the DegP cavity represent docking platforms for partially misfolded proteins. Both platforms are structurally flexible and should thus allow binding of diverse polypeptides.
Fig. 16 Properties of the inner cavity. Half cut presentations of molecule A (left: side view, center and right: top views) with cut regions shown in dark gray. (Left) Surface representation of the internal tunnel illustrating its molecular-sieve character. Access is restricted to single secondary structure elements as shown by the modeled polyalanine helix, which is colored yellow. (Center)
Top view on the ceiling of the inner cavity with mapped thermal motion factors to show its plasticity. Flexible regions are colored red, rigid regions are blue. (Right) Formation of the hydrophobic binding patches within the cavity. Hydrophobic residues of the protease domain are shown in cyan, and the non-polar peptide-binding groove of PDZ1 in green.
32 Molecular Machines for Protein Degradation
4.9 The Protease Form
The protease conformation of DegP is still elusive as crystallization of a substratelike inhibitor complex has failed and maintenance of a stably folded protein precludes long-term experimentation at elevated temperatures where it displays protease activity. We propose a profound rearrangement of the LA* –L1–L2 loop triad into the canonical conformation of active serine proteases competent for substrate binding. This may be initiated by a collapse of the hydrophobic LA platforms and an enlargement of the hydrophobic contacts caused at high temperature. 4.10 Working Model for an ATP-independent Heat-shock Protein
Cage-forming proteases and chaperones can be ATP-dependent or -independent. In the former group, ATPase activity is important for recognition of target proteins, their dissociation and unfolding, their translocation within the complex, and for various gating mechanisms. The present crystal structure indicates why these functions are not relevant for DegP. DegP preferably degrades substrates, which are per se partially unfolded and which might accumulate under extreme conditions [79, 114, 116]. Alternatively, threading of substrate through the inner chamber could promote unfolding into an extended conformation. Removal of higher-order structural elements may reinitiate substrate folding after exit from DegP. By binding to the C-terminus or a β-hairpin loop of a protein, the PDZ domains could properly position the substrate for threading it into the central cavity. After accessing this chamber, the fate of the unfolded protein depends on the interplay and structural organization of loops LA, L1, and L2. Recruitment of PDZ domains for the gating mechanism should permit a direct coupling of substrate binding and translocation within the DegP particle. This two-step binding process is similar to that of other cageforming proteins such as the proteasome or the Clp proteases. Here, two binding sites (chambers) exist, the first of which primarily determines substrate specificity.
Acknowledgment
M.B. thanks Dr. Ravishankar Ramachandran, Molecular and Structural Biology division, Central Drug Research Institute, Lucknow, India for a critical reading of his contribution to the manuscript.
References 1 NEUWALD, A. F., ARAVIND, L., SPOUGE, J. L., and KOONIN, E. V. AAA+: A class of chaperone-like ATPases associated with
the assembly, operation, and disassembly of protein complexes. Genome Res 1999, 9, 27–43.
References 2 GLICKMAN, M. H., RUBIN, D. M., FU, H., LARSEN, C. N., COUX, O., WEFES, I., PFEIFER, G., CJEKA, Z., VIERSTRA, R., BAUMEISTER, W., et al. Functional analysis of the proteasome regulatory particle. Mol. Biol. Rep. 1999, 26, 21–28. 3 KRZYWDA, S., BRZOZOWSKI, A. M., VERMA, C., KARATA, K., OGURA, T., and WILKINSON, A. J. The crystal structure of the AAA domain of the ATP-dependent protease FtsH of Escherichia coli at 1.5 A resolution. Structure (Camb) 2002, 10, 1073–1083. 4 NIWA, H., TSUCHIYA, D., MAKYIO, H., YOSHIDA, M., and MORIKAWA, K. Hexameric ring structure of the ATPase domain of the membrane-integrated metalloprotease FtsH from Thermus thermophilus HB8. Structure (Camb) 2002, 10, 1415–1423. 5 WANG, J., HARTLING, J. A., and FLANAGAN, J. M. The structure of ClpP at 2.3 A resolution suggests a model for ATP-dependent proteolysis. Cell 1997, 91, 447–456. 6 KIM, D. Y. and KIM, K. K. Crystal structure of ClpX molecular chaperone from Helicobacter pylori. J. Biol. Chem. 2003, 278, 50664–50670. 7 GUO, F., MAURIZI, M. R., ESSER, L., and XIA, D. Crystal structure of ClpA, an Hsp100 chaperone and regulator of ClpAP protease. J. Biol. Chem. 2002b, 277, 46743–46752. 8 CHUANG, S. E., and BLATTNER, F. R. Characterization of twenty-six new heat shock genes of Escherichia coli. J Bacteriol 1993, 175, 5242–5252. 9 CHUANG, S. E., BURLAND, V., PLUNKETT, G., DANIELS, D. L., and BLATTNER, F. R. Sequence analysis of four new heat shock genes constituting the hslu and hslv operons in Escherichia coli. Gene 1993, 134, 1–6. 10 MISSIAKIS, D., SCHWAGER, F., BETTON, J.-M., GEORGOPOULOS, C., and RAINA, S. Identification and characterizaton of HslV HslU (ClpQ ClpY) proteins involved in overall proteolysis of misfolded proteins in Escherichia coli. EMBO J. 1996, 15, 6899–6909. 11 KHATTAR, M. M. Overexpression of the hslVU operon suppresses SOS-mediated
12
13
14
15
16
17
18
19
20
inhibition of cell division in Escherichia coli. FEBS Lett. 1997, 414, 402–404. KANEMORI, M., NISHIHARA, K., YANAGI, H., and YURA, T. Synergistic roles of HslVU and other ATP-dependent proteases in controlling in vivo turnover of σ 32 and abnormal proteins in Escherichia coli. J. Bact. 1997, 179, 7219–7225. KANEMORI, M., YANAGI, H., and YURA, T. The ATP-dependent HslVU/ClpQY protease participates in turnover of cell division inhibitor SulA in Escherichia coli. J Bacteriol 1999, 181, 3674–3680. SEONG, I. S., OH, J. Y., YOO, S. J., SEOL, J. H., and CHUNG, C. H. ATP-dependent degradation of SulA, a cell division inhibitor, by the HslVU protease in Escherichia coli. FEBS Lett. 1999, 456, 211–214. RAWLINGS, N. D., O’BRIEN, E., and BARRETT, A. J. MEROPS: the protease database. Nucleic Acids Res 2002, 30, 343–346. COUVREUR, B., WATTIEZ, R., BOLLEN, A., FALMAGNE, P., LE RAY, D., and DUJARDIN, J. C. Eubacterial HslV and HslU subunits homologs in primordial eukaryotes. Mol Biol Evol 2002, 19, 2110–2117. ROHRWILD, M., COUX, O., HUANG, H. C., MOERSCHELL, R. P., SOON, J. Y., SEOL, J. H., CHUNG, C. H., and GOLDBERG, A. L. HslV-HslU: A novel ATP-dependent protease complex in Escherichia coli related to the eukaryotic proteasome. Proc. Natl. Acad. Sci. USA 1996, 93, 5808–5813. SEOL, J. H., YOO, S. J., SHIN, D. H., SHIM, Y. K., KANG, M.-S., GOLDBERG, A. L., and CHUNG, C. H. The heat shock protein HslVU from Escherichia coli is a protein activated ATPase as well as an ATP-dependent proteinase. Eur. J. Biochem. 1997, 247, 1143–1150. HUANG, H.-C., and GOLDBERG, A. L. Proteolytic activity of the ATP-dependent protease HslVU can be uncoupled from ATP-hydrolysis. J. Biol. Chem. 1997, 272, 21364–21372. LUPAS, A., ZWICKL, P., and BAUMEISTER, W. Proteasome sequences in eubacteria. Trends Biochem. Sci. 1994, 19, 533–534.
33
34 Molecular Machines for Protein Degradation
21 ROHRWILD, M., PFEIFER, G., SANTARIUS, U., M¨ULLER, S. A., HUANG, H. C., ENGEL, A., BAUMEISTER, W., and GOLDBERG, A. L. The ATP-dependent HslVU protease from Escherichia coli is a four-ring structure resembling the proteasome. Nat. Struct. Biol 1997, 4, 133–139. 22 BOCHTLER, M., DITZEL, L., GROLL, M., and HUBER, R. Crystal structure of heat shock locus V (HslV) from Escherichia coli. Proc. Natl. Acad. Sci. USA 1997, 94, 6070–6074. 23 SOUSA, M. C. and MCKAY, D. B. Structure of Haemophilus influenzae HslV protein at 1.9 A resolution, revealing a cation-binding site near the catalytic site. Acta Crystallogr D Biol Crystallogr 2001, 57, 1950–1954. 24 SONG, H., BOCHTLER, M., AZIM, M., HARTMANN, C., HUBER, R., and RAMACHANDRAN, R. Isolation and characterization of the prokaryotic proteasome homolog HslVU (ClpQY) from Thermotoga maritima and the crystal structure of HslV. Biophys Chem. 2003, 100, 437–452. 25 KANG, M. S., LIM, B. K., SEONG, I. S., SEOL, J. H., TANAHASHI, N., TANAKA, K., and CHUNG, C. H. The ATP-dependent CodWX (HslVU) protease in Bacillus subtilis is an N-terminal serine protease. EMBO J. 2001, 20, 734–742. 26 LEVCHENKO, I., SMITH, C. K., WALSH, N. P., SAUER, R. T., and BAKER, T. A. PDZ-like domains mediate binding specificity in the Clp/Hsp100 family of chaperones and protease regulatory subunits. Cell 1997, 91, 939–947. 27 BOCHTLER, M., HARTMANN, C., SONG, H. K., BOURENKOV, G. P., BARTUNIK, H. D., and HUBER, R. The structures of HsIU and the ATP-dependent protease HsIU-HsIV. Nature 2000, 403, 800–805. 28 WANG, J., SONG, J. J., SEONG, I. S., FRANKLIN, M. C., KAMTEKAR, S., EOM, S. H., and CHUNG, C. H. Nucleotide-dependent conformational changes in a protease-associated ATPase HsIU. Structure (Camb) 2001b, 9, 1107–1116. 29 KARATA, K., INAGAWA, T., WILKINSON, A. J., TATSUTA, T., and OGURA, T. Dissecting the role of a conserved motif (the second
30
31
32
33
34
35
36
37
region of homology) in the AAA family of ATPases. Site-directed mutagenesis of the ATP-dependent protease FtsH. J. Biol. Chem. 1999, 274, 26225– 26232. SONG, H. K., HARTMANN, C., RAMACHANDRAN, R., BOCHTLER, M., BEHRENDT, R., MORODER, L., and HUBER, R. Mutational studies on HslU and its docking mode with HslV. Proc. Natl. Acad. Sci. USA 2000, 97, 14103–14108. SMITH, C. K., BAKER, T. A., and SAUER, R. T. Lon and Clp family proteases and chaperones share homologous substrate-recognition domains. Proc. Natl. Acad. Sci. USA 1999, 96, 6678–6682. WICKNER, S. and MAURIZI, M. R. Here’s the hook: similar substrate binding sites in the chaperone domains of Clp and Lon. Proc. Natl. Acad. Sci. USA 1999, 96, 8318–8320. LEE, Y. Y., CHANG, C. F., KUO, C. L., CHEN, M. C., YU, C. H., LIN, P. I., and WU, W. F. Subunit oligomerization and substrate recognition of the Escherichia coli ClpYQ (HslUV) protease implicated by in vivo protein-protein interactions in the yeast two-hybrid system. J. Bacteriol. 2003, 185, 2393–2401. ORTEGA, J., SINGH, S. K., ISHIKAWA, T., MAURIZI, M. R., and STEVEN, A. C. Visualization of substrate binding and translocation by the ATP-dependent protease, ClpXP. Mol Cell 2000, 6, 1515–1521. KANG, M. S., KIM, S. R., KWACK, P., LIM, B. K., AHN, S. W., RHO, Y. M., SEONG, I. S., PARK, S. C., EOM, S. H., CHEONG, G. W., et al. Molecular architecture of the ATP-dependent CodWX protease having an N-terminal serine active site. EMBO J. 2003, 22, 2893–2902. KESSEL, M., MAURIZI, M. R., KIM, B., KOCSIS, E., TRUS, B. L., SINGH, S. K., and STEVEN, A. C. Homology in structural organization between E. coli ClpAP protease and the eukaryotic 26S proteasome. J. Mol. Biol 1995, 250, 587–594. GRIMAUD, R., KESSEL, M., BEURON, F., STEVEN, A. C., and MAURIZI, M. R. Enzymatic and structural similarities
References
38
39
40
41
42
43
44
45
between the Escherichia coli ATP-dependent proteases, ClpXP and ClpAP. J. Biol. Chem. 1998, 273, 12476–12481. KESSEL, M., WHU, W. F., GOTTESMAN, S., KOCSIS, E., STEVEN, A. C., and MAURIZI, M. R. Six-fold rotational symmetry of ClpQ, the E. coli homolog of the 20S proteasome, and its ATP-dependent activator, ClpY. FEBS Lett. 1996, 398, 274–278. SOUSA, M. C., TRAME, C. B., TSURUTA, H., WILBANKS, S. M., REDDY, V. S., and MCKAY, D. B. Crystal and solution structures of an HslUV protease-chaperone complex. Cell 2000, 103, 633–643. WANG, J., SONG, J. J., FRANKLIN, M. C., KAMTEKAR, S., IM, Y. J., RHO, S. H., SEONG, I. S., LEE, C. S., CHUNG, C. H., and EOM, S. H. Crystal structures of the HslVU peptidase-ATPase complex reveal an ATP- dependent proteolysis mechanism. Structure (Camb) 2001a, 9, 177–184. KWON, A. R., KESSLER, B. M., OVERKLEEFT, H. S., and MCKAY, D. B. Structure and reactivity of an asymmetric complex between HslV and I-domain deleted HslU, a prokaryotic homolog of the eukaryotic proteasome. J. Mol. Biol. 2003, 330, 185–195. BOCHTLER, M., SONG, H. K., HARTMANN, C., RAMACHANDRAN, R., and HUBER, R. The quaternary arrangement of HslU and HslV in a cocrystal: a response to Wang, Yale. J Struct Biol 2001, 135, 281–293. WANG, J. A second response in correcting the HslV-HslU quaternary structure. J Struct Biol 2003, 141, 7–8. ISHIKAWA, T., MAURIZI, M. R., BELNAP, D., and STEVEN, A. C. Docking of components in a bacterial complex. Nature 2000, 408, 667–668. BOGYO, M., MCMASTER, J. S., GACZINSKA, M., TORTORELLA, D., GOLDBERG, A. L., and PLOEGH, H. Covalent modification of the active site threonine of proteasomal β-subunits and the Escherichia coli homolog HslV by a new class of inhibitors. Proc. Nat. Acad. Sci. USA 1997, 94, 6629–6634.
46 SOUSA, M. C., KESSLER, B. M., OVERKLEEFT, H. S., and MCKAY, D. B. Crystal structure of HslUV complexed with a vinyl sulfone inhibitor: corroboration of a proposed mechanism of allosteric activation of HslV by HslU. J. Mol. Biol. 2002, 318, 779–785. 47 RAMACHANDRAN, R., HARTMANN, C., SONG, H. K., HUBER, R., and BOCHTLER, M. Functional interactions of HslV (ClpQ) with the ATPase HslU (ClpY). Proc. Natl. Acad. Sci. USA 2002, 99, 7396–7401. 48 SEONG, I. S., KANG, M. S., CHOI, M. K., LEE, J. W., KOH, O. J., WANG, J., EOM, S. H., and CHUNG, C. H. The C-terminal tails of HslU ATPase act as a molecular switch for activation of HslV peptidase. J. Biol. Chem. 2002, 277, 25976–25982. 49 BABBITT, P. C. and GERLT, J. A. Understanding enzyme superfamilies. Chemistry as the fundamental determinant in the evolution of new catalytic activities. J. Biol. Chem. 1997, 272, 30591–30594. 50 BRANNIGAN, J. A., DODSON, G., DUGGLEBY, H. J., MOODY, P. C. E., SMITH, J. L., TOMCHICK, D. R., and MURZIN, A. G. A protein catalytic framework with an N-terminal nucleophile is capable of self-activation. Nature 1995, 378, 416–419. 51 BANECKI, B., WAWRZYNOW, A., PUZEWICZ, J., GEORGOPOULOS, C., and ZYLICZ, M. Structure-function analysis of the zinc-binding region of the Clpx molecular chaperone. J. Biol. Chem. 2001, 276, 18843–18848. 52 DONALDSON, L. W., WOJTYRA, U., and HOURY, W. A. Solution structure of the dimeric zinc binding domain of the chaperone ClpX. J. Biol. Chem. 2003, 278, 48991–48996. 53 BOCHTLER, M., Ph. D. Thesis, TU M¨unchen 1999. 54 GUO, F., ESSER, L., SINGH, S. K., MAURIZI, M. R., and XIA, D. Crystal structure of the heterodimeric complex of the adapter, ClpS, with the N-domain of the AAA+ chaperone, ClpA. J. Biol. Chem. 2002a, 277, 46753–46762. 55 DOUGAN, D. A., MOGK, A., ZETH, K., TURGAY, K., and BUKAU, B. AAA+
35
36 Molecular Machines for Protein Degradation
56
57
58
59
60
61
62
63
64
65
proteins and substrate recognition, it all depends on their partner in crime. FEBS Lett. 2002a, 529, 6–10. DOUGAN, D. A., WEBER-BAN, E., and BUKAU, B. Targeted delivery of an ssrA-tagged substrate by the adapter protein SspB to its cognate AAA+ protein ClpX. Mol Cell 2003, 12, 373–380. LEVCHENKO, I., GRANT, R. A., WAH, D. A., SAUER, R. T., and BAKER, T. A. Structure of a delivery protein for an AAA+ protease in complex with a peptide degradation tag. Mol Cell 2003, 12, 365–372. SONG, H. K. and ECK, M. J. Structural basis of degradation signal recognition by SspB, a specificity-enhancing factor for the ClpXP proteolytic machine. Mol Cell 2003, 12, 75–86. ZHOU, Y., GOTTESMAN, S., HOSKINS, J. R., MAURIZI, M. R., and WICKNER, S. The RssB response regulator directly targets sigma(S) for degradation by ClpXP. Genes Dev 2001, 15, 627–637. STUDEMANN, A., NOIRCLERC-SAVOYE, M., KLAUCK, E., BECKER, G., SCHNEIDER, D., and HENGGE, R. Sequential recognition of two distinct sites in sigma(S) by the proteolytic targeting factor RssB and ClpX. EMBO J. 2003, 22, 4111–4120. DOUGAN, D. A., REID, B. G., HORWICH, A. L., and BUKAU, B. ClpS, a substrate modulator of the ClpAP machine. Mol Cell 2002b, 9, 673–683. ZETH, K., RAVELLI, R. B., PAAL, K., CUSACK, S., BUKAU, B., and DOUGAN, D. A. Structural analysis of the adapter protein ClpS in complex with the N-terminal domain of ClpA. Nat. Struct. Biol. 2002, 9, 906–911. KIM, Y. I., LEVCHENKO, I., FRACZKOWSKA, K., WOODRUFF, R. V., SAUER, R. T., and BAKER, T. A. Molecular determinants of complex formation between Clp/Hsp100 ATPases and the ClpP peptidase. Nat. Struct. Biol. 2001, 8, 230–233. WILK, S. and CHEN, W.-E. Synthetic peptide based activators of the proteasome. Mol. Biol. Rep. 1997, 24, 119–124. WHITBY, F. G., MASTERS, E. I., KRAMER, L., KNOWLTON, J. R., YAO, Y., WANG, C. C., and HILL, C. P. Structural basis for
66
67
68
69
70
71
72
73
74
the activation of 20S proteasomes by 11S regulators. Nature 2000, 408, 115–120. WALZ, J., ERDMANN, A., KANIA, M., TYPKE, D., KOSTER, A. J., and BAUMEISTER, W. 26S proteasome structure revealed by three-dimensional electron microscopy. J Struct Biol 1998, 121, 19–29. ZWICKL, P., NG, D., WOO, K. M., KLENK, H. P., and GOLDBERG, A. L. An archaebacterial ATPase, homologous to ATPases in the eukaryotic 26 S proteasome, activates protein breakdown by 20 S proteasomes. J. Biol. Chem. 1999, 274, 26008–26014. RICHMOND, C., GORBEA, C., and RECHSTEINER, M. Specific interactions between the ATPase subunits of the 26S protease. J. Biol. Chem. 1997, 272, 13403–13411. GROLL, M., DITZEL, L., L¨OWE, J., STOCK, D., BOCHTLER, M., BARTUNIK, H. D., and HUBER, R. Structure of 20S proteasome from yeast at 2.4 Å resolution. Nature 1997, 386, 463–471. L¨OWE, J., STOCK, D., JAP, B., ZWICKL, P., BAUMEISTER, W., and HUBER, R. Crystal structure of the 20S proteasome from the archaeon T. acidophilum at 3.4 Å resolution. Science 1995, 268, 533– 539. UNNO, M., MIZUSHIMA, T., MORIMOTO, Y., TOMISUGI, Y., TANAKA, K., YASUOKA, N., and TSUKIHARA, T. The Structure of the Mammalian 20S Proteasome at 2.75 A Resolution. Structure 2002, 10 (5), 609–618. ARENDT, C. S. and HOCHSTRASSER, M. Eukaryotic 20S proteasome catalytic subunit propeptides prevent active site inactivation by N-terminal acetylation and promote particle assembly. EMBO J. 1999, 18, 3575–3585. GROLL, M., HEINEMEYER, W., JAGER, S., ULLRICH, T., BOCHTLER, M., WOLF, D. H., and HUBER, R. The catalytic sites of 20S proteasomes and their role in subunit maturation: a mutational and crystallographic study. Proc. Natl. Acad. Sci. USA 1999, 96, 10976–10983. GROLL, M., BRANDSTETTER, H., BARTUNIK, H., BOURENKOW, G., and HUBER, R. Investigations on the maturation and regulation of archaebacterial
References
75
76
77
78
79
80
81
82
proteasomes. J Mol Biol. 2003, 327, 75–83. DITZEL, L., HUBER, R., MANN, K., HEINEMEYER, W., WOLF, D. H., and GROLL, M. Conformational constraints for protein self-cleavage in the proteasome. J. Mol. Biol. 1998, 279, 1187–1191. DICK, T. P., NUSSBAUM, A. K., DEEG, M., HEINEMEYER, W., GROLL, M., SCHIRLE, M., KEILHOLZ, W., STEVANOVIC, S., WOLF, D. H., HUBER, R., et al. Contribution of proteasomal beta-subunits to the cleavage of peptide substrates analyzed with yeast mutants. J. Biol. Chem. 1998, 273, 25637–25646. NUSSBAUM, A. K., DICK, T. P., KEILHOLZ, W., SCHIRLE, M., STEVANOVIC, S., DIETZ, K., HEINEMEYER, W., GROLL, M., WOLF, D. H., HUBER, R., et al. Cleavage motifs of the yeast 20S proteasome beta subunits deduced from digests of enolase 1. Proc. Natl. Acad. Sci. USA 1998, 95, 12504–12509. CARDOZO, C., VINITSKY, A., MICHAUD, C., and ORLOWSKI, M. Evidence that the nature of amino acid residues in the P3 position directs substrates to distinct catalytic sites of the pituitary multicatalytic proteinase complex (proteasome). Biochemistry 1994, 33, 6483–6489. BOGYO, M., SHIN, S., MCMASTER, J. S., and PLOEGH, H. L. Substrate binding and sequence preference of the proteasome revealed by active-site-directed affinity probes. Chem Biol 1998, 5, 307–320. GROLL, M., NAZIF, T., HUBER, R., and BOGYO, M. Probing structural determinants distal to the site of hydrolysis that control substrate specificity of the 20S proteasome. Chem. Biol. 2002, 9, 1–20. HEINEMEYER, W., FISCHER, M., KRIMMER, T., STACHON, U., and WOLF, D. H. The active sites of the eukaryotic 20S proteasome and their involvement in subunit precursor processing. J. Biol. Chem. 1997, 272, 25200–25209. J¨AGER, S., GROLL, M., HUBER, R., WOLF, D. H., and HEINEMEYER, W. Proteasome beta-type subunits: unequal roles of propeptides in core particle maturation
83
84
85
86
87
88
89
90
91
and a hierarchy of active site function. J. Mol. Biol. 1999, 291, 997–1013. WENZEL, T., ECKERSKORN, C., LOTTSPEICH, F., and BAUMEISTER, W. Existence of a molecular ruler in proteasomes suggested by analysis of degradation products. FEBS Lett. 1994, 349, 205–209. ADAMS, J., BEHNKE, M., CHEN, S., CRUICKSHANK, A. A., DICK, L. R., GRENIER, L., KLUNDER, J. M., MA, Y.-T., PLAMONDON, L., and STEIN, R. L. Potent and selective inhibitors of the proteasome: dipeptidyl boronic acids. Bioorg. Med. Chem. Lett. 1998, 8, 333–338. ELOFSSON, M., SPLITTGERBER, U., MYUNG, J., MOHAN, R., and CREWS, C. M. Towards subunit-specific proteasome inhibitors: synthesis and evaluation of peptide alpha , beta -epoxyketones. Chem Biol 1999, 6, 811–822. FENTEANY, G., STANDAERT, R. F., LANE, W. S., CHOI, S., COREY, E. J., and SCHREIBER, S. L. Inhibition of proteasome activities and subunit-specific amino-terminal threonine modification by lactacystin. Science 1995, 268, 726–731. MENG, L., MOHAN, R., KWOK, B. H., ELOFSSON, M., SIN, N., and CREWS, C. M. Epoxomicin, a potent and selective proteasome inhibitor, exhibits in vivo antiinflammatory activity. Proc. Natl. Acad. Sci. USA 1999, 96, 10403–10408. GROLL, M., KIM, K. B., KAIRIES, N., HUBER, R., and CREWS, C. M. Crystal structure of epoxomicin: 20S proteasome reveals a molecular basis for selectivity of alpha , beta -epoxyketone proteasome inhibitors. J. Am. Chem. Soc. 2000b, 122, 1237–1238. KLOETZEL, P. Antigen processing by the proteasome. Nat Rev Mol Cell Biol. 2001, 2, 179–187. KOGUCHI, Y., KOHNO, J., NISHIO, M., TAKAHASHI, K., OKUDA, T., OHNUKI, T., and KOMATSUBARA, S. TMC-95A, B, C, and D, novel proteasome inhibitors produced by Apiospora montagnei Sacc. TC 1093. Taxonomy, production, isolation, and biological activities. J Antibiot (Tokyo) 2000, 53, 105–109. KOHNO, J., KOGUCHI, Y., NISHIO, M., NAKAO, K., KURODA, M., SHIMIZU, R.,
37
38 Molecular Machines for Protein Degradation
92
93
94
95
96
97
98
99
OHNUKI, T., and KOMATSUBARA, S. Structures of TMC-95A-D: novel proteasome inhibitors from Apiospora montagnei sacc. TC 1093. J Org Chem 2000, 65, 990–995. GROLL, M., KOGUCHI, Y., HUBER, R., and KOHNO, J. Crystal structure of the 20 S proteasome: TMC-95A complex: a non-covalent proteasome inhibitor. J. Mol. Biol. 2001, 311, 543– 548. KAISER, M., GROLL, M., RENNER, C., HUBER, R., and MORODER, L. The core structure of TMC-95A is a promising lead for reversible proteasome inhibition. Angew. Chem. Int. Ed. 2002, 41, 780–783. LIN, S. and DANISHEFSKY, S. The total synthesis of proteasome inhibitors TMC-95A and TMC-95B: discovery of a new method to generate cis-propenyl amides. Angew Chem Int Ed Engl. 2002, 41, 512–515. LOIDL, G., GROLL, M., MUSIOL, H. J., HUBER, R., and MORODER, L. Bivalency as a principle for proteasome inhibition. Proc. Natl. Acad. Sci. USA 1999, 96, 5418–5422. NAVON, A., and GOLDBERG, A. L. Proteins are unfolded on the surface of the ATPase ring before transport into the proteasome. Mol Cell 2001, 8 (6), 1339–1349. BENAROUDJ, N., ZWICKL, P., SEEMULLER, E., BAUMEISTER, W., and GOLDBERG, A. L. ATP hydrolysis by the proteasome regulatory complex PAN serves multiple functions in protein degradation. Mol Cell 2003, 11, 69–78. GROLL, M., BAJOREK, M., K¨OHLER, A., MORODER, L., RUBIN, D. M., HUBER, R., GLICKMAN, M. H., and FINLEY, D. A gated channel into the proteasome core particle. Nat. Struct. Biol. 2000a, 7, 1062–1067. GLICKMAN, M. H., RUBIN, D. M., COUX, O., WEFES, I., PFEIFER, G., CJEKA, Z., BAUMEISTER, W., FRIED, V. A., and FINLEY, D. A subcomplex of the proteasome regulatory particle required for ubiquitin-conjugate degradation and related to the COP9-signalosome and eIF3. Cell 1998, 94, 615–623.
100 K¨OHLER, A., CASCIO, P., LEGGETT, D. S., WOO, K. M., GOLDBERG, A. L., and FINLEY, D. The axial channel of the proteasome core particle is gated by the Rpt2 ATPase and controls both substrate entry and product release. Molecular Cell 2001, 7, 1143–1152. 101 TAMURA, T., TAMURA, N., CEJKA, Z., HEGERL, R., LOTTSPEICH, F., and BAUMEISTER, W. Tricorn Protease – The Core of a Modular Proteolytic System. Science 1996a, 274, 1385–1389. 102 BRANDSTETTER, H., KIM, J.-S., GROLL, M., and HUBER, R. The crystal structure of the tricorn protease reveals a protein disassembly line. Nature 2001, 414, 466–469. 103 F¨ULO¨ P, V., B¨OCSKEI, Z., and POLGA´ R, L. Prolyl Oligopeptidase: An Unusual β-Propeller Domain Regulates Proteolysis. Cell 1998, 94, 161–170. 104 ENGEL, M., HOFFMANN, T., WAGNER, L., WERMANN, M., HEISER, U., KIEFERSAUER, R., HUBER, R., BODE, W., DEMUTH, H.-U., and BRANDSTETTER, H. The crystal structure of dipeptidyl peptidase IV (CD26) reveals its functional regulation and enzymatic mechanism. Proc. Natl. Acad. Sci. USA 2003, 100, 5063–5068. 105 RASMUSSEN, H. B., BRANNER, S., WIBERG, F. C., and WAGTMANN, N. Crystal structure of human dipeptidyl peptidase IV/CD26 in complex with a substrate analog. Nat. Struct. Biol. 2003, 9. 106 BODE, W., MAYR, I., BAUMANN, U., HUBER, R., STONE, S. R., and HOFSTEENGE, J. The refined 1.9 A crystal structure of human alpha-thrombin: interaction with D-Phe-Pro-Arg chloromethylketone and significance of the Tyr-Pro-Pro-Trp insertion segment. EMBO J. 1989, 8, 3467–3475. 107 EICHINGER, A., BEISEL, H. G., JACOB, U., HUBER, R., MEDRANO, F. J., BANBULA, A., POTEMPA, J., TRAVIS, J., and BODE, W. Crystal structure of gingipain R: an Arg-specific bacterial cysteine proteinase with a caspase-like fold. EMBO J. 1999, 18, 5453–5462. 108 TAMURA, T., TAMURA, N., LOTTSPEICH, F., and BAUMEISTER, W. Tricorn protease (TRI) interacting factor 1 from Thermoplasma acidophilum is a proline
References
109
110
111
112
113
114
115
116
117
iminopeptidase. FEBS Lett. 1996b, 398, 101–105. BEEBE, K. D., SHIN, J., PENG, J., CHAUDHURY, C., KHERA, J., and PEI, D. Substrate Recognition through a PDZ Domain in Tail-Specific Protease. Biochemistry 2000, 39, 3149–3155. PONTING, C. P. and PALLEN, M. J. β-Propeller repeats and a PDZ domain in the tricorn protease: predicted self-compartmentalisation and C-terminal polypeptide-binding strategies of substrate selection. FEMS Microbiol. Lett. 1999, 179, 447–451. LIAO, D. I., QIAN, J., CHISHOLM, D. A., JORDAN, D. B., and DINER, B. A. Crystal structures of the photosystem II D1 C-terminal processing protease. Nat. Struct. Biol. 2000, 7, 749–753. CABRAL, J. H. M., PETOSA, C., SUTCLIFFE, M. J., RAZA, S., BYRON, O., POY, F., MARFATIA, S. M., CHISHTI, A. H., and LIDDINGTON, R. C. Crystal structure of a PDZ domain. Nature 1996, 382, 649–652. DOYLE, D. A., LEE, A., LEWIS, J., KIM, E., SHENG, M., and MACKINNON, R. Crystal structures of a complexed and peptide-free membrane protein-binding domain: Molecular basis of peptide recognition by PDZ. Cell 1996, 85, 1067–1076. KIM, J.-S., GROLL, M., MUSIOL, H. J., BEHRENDT, R., KAISER, M., MORODER, L., HUBER, R., and BRANDSTETTER, H. Navigation inside a protease: substrate selection and product exit in the tricorn protease from Thermoplasma acidophilum. J. Mol. Biol. 2002, 324, 1041–1050. F¨ULO¨ P, V., SZELTNER, Z., and POLGAR, L. Catalysis of serine oligopeptidases is controlled by a gating filter mechanism. EMBO Reports 2000, 1, 277–281. ITO, N., PHILLIPS, S. E. V., STEVENS, C., OGEL, Z. B., MCPHERSON, M. J., KEEN, J. N., YADAV, K. D. S., and KNOWLES, P. F. Novel thioether bond revealed by a 1.7 A crystal structure of galactose oxidase. Nature 1991, 35, 87–90. GOETTIG, P., GROLL, M., KIM, J.-S., HUBER, R., and BRANDSTETTER, H. Crystal structures of the tricorn interacting aminopeptidase F1 with various ligands
118
119
120
121
122
123
124
125
126
127
reveal its catalytic mechanism. EMBO J. 2002, 21. TAMURA, N., LOTTSPEICH, F., BAUMEISTER, W., and TAMURA, T. The role of tricorn protease and its aminopeptidaseinteracting factors in cellular protein degradation. Cell 1998, 95, 637–648. POSPISILIK, J. A., HINKE, S. A., PEDERSON, R. A., HOFFMANN, T., ROSCHE, F., SCHLENZIG, D., GLUND, K., HEISER, U., MCINTOSH, C. H., and DEMUTH, H. Metabolism of glucagon by dipeptidyl peptidase IV (CD26). Regul Pept 2001, 96, 133–141. WICKNER, S., MAURIZI, R., and GOTTESMAN, S. Posttranslational Quality Control: Folding, Refolding, and Degrading Proteins. Science 1999, 286, 1888–1893. GOTTESMAN, S., WICKNER, S., and MAURIZI, M. R. Protein quality control: Triage by chaperones and proteases. Genes & Development 1997, 11, 815– 823. LIPINSKA, B., ZYLICZ, M., and GEORGOPOULOS, C. The Htra (Degp) Protein, Essential For Escherichia-Coli Survival At High-Temperatures, Is an Endopeptidase. J. Bacteriol. 1990, 172, 1791–1797. CLAUSEN, T., SOUTHAN, C., and EHRMANN, M. The HtrA family of proteases: implications for protein composition and cell fate. Mol Cell 2002, 10, 443–455. SHENG, M. and SALA, C. PDZ domains and the organization of supramolecular complexes. Annu. Rev. Neurosci. 2001, 24, 1–29. PALLEN, M. J. and WREN, B. W. The HtrA family of serine proteases. Molec. Microbiol. 1997, 26, 209–221. KOLMAR, H., WALLER, P. R. H., and SAUER, R. T. The DegP and DegQ periplasmic endoproteases of Escherichia coli: Specificity for cleavage sites and substrate conformation. J. Bacteriol. 1996, 178, 5925–5929. SPIESS, C., BEIL, A., and EHRMANN, M. A temperature-dependent switch from chaperone to protease in a widely conserved heat shock protein. Cell 1999, 97, 339–347.
39
40 Molecular Machines for Protein Degradation
128 KROJER, T., GARRIDO-FRANCO, M., HUBER, R., EHRMANN, M., and CLAUSEN, T. Crystal structure of DegP (HtrA) reveals a new protease-chaperone machine. Nature 2002, 416, 455–459. 129 PERONA, J. J. and CRAIK, C. S. Structural basis of substrate specificity in the serine proteases. Protein Sci 1995, 4, 337– 360. 130 SONGYANG, Z., FANNING, A. S., FU, C., XU, J., MARFATIA, S. M., CHISHTI, A. H., CROMPTON, A., CHAN, A. C., ANDERSON, J. M., and CANTLEY, L. C. Recognition of unique carboxyl-terminal motifs by distinct PDZ domains. Science 1997, 275, 73–77. 131 MISRA, R., CASTILLOKELLER, M., and DENG, M. Overexpression of protease-deficient DegP(S210A) rescues the lethal phenotype of Escherichia coli
OmpF assembly mutants in a degP background. J. Bacteriol. 2000, 182, 4882–4888. 132 SWAMY, K. H. S., CHIN, H. C., and GOLDBERG, A. L. Isolation and Characterization of Protease Do From Escherichia-Coli, a Large Serine Protease Containing Multiple Subunits. Arch. Biochem. Biophys. 1983, 224, 543–554. 133 STRAUCH, K. L., JOHNSON, K., and BECKWITH, J. Characterization of Degp, a Gene Required For Proteolysis in the Cell-Envelope and Essential For Growth of Escherichia-Coli At High-Temperature. J. Bacteriol. 1989, 171, 2689–2696. 134 WANG, J. A corrected quaternary arrangement of the peptidase HslV and atpase HslU in a cocrystal structure. J Struct Biol 2001, 134, 15–24.
1
Protein Sorting to the Lysosome
John McCullough, Michale J. Clague, and Sylvie Urb´e University of Liverpool, Liverpool, United Kingdom
Originally published in: Protein Degradation, Volume 3. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31435-0
1 Introduction
The eukaryotic cell contains two major proteolytic systems that mediate protein turnover. The 26S proteasome governs the degradation of most intracellular proteins, whereas the lysosome is responsible for downregulation of cell surface receptors and for the degradation of exogenous proteins that have been internalized. The discovery of ubiquitin in the 1970s led to the identification of the 26S proteasome, which supplanted the notion of the lysosome as the principal player in intracellular protein degradation [1]. There is therefore a nice irony that, as the molecular mechanisms of lysosomal sorting have been unravelled, covalent attachment of ubiquitin has been shown to play a key role in routing proteins through the endomembrane system for lysosomal degradation. Appendage of K48-linked polyubiquitin chains targets proteins for proteasomal destruction. However, polyubiquitin chains linked through any of six other internal lysine residues are also represented within eukaryotic cells and their functions are largely unknown [2]. Emerging roles for ubiquitin have been accompanied by an appreciation of ubiquitination as a dynamic modification, which can be used to govern the assembly and disassembly of macromolecular complexes, much like phosphorylation, through interaction with specific ubiquitin-binding domains. Accordingly, additional roles can be proposed for ubiquitin modification at endosomal membranes, such as cell signalling. A receptor normally enters the endosomal system through incorporation into Clathrin-coated vesicles (CCVs) and delivery to a tubulo-vesicular compartment known as the early or sorting endosome (Figure 1) [3]. From here receptors may recycle to the plasma membrane, or be selected for lysosomal sorting, by incorporation into small vesicles that bud away from the cytosol into the vacuolar lumen Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Sorting to the Lysosome
Fig. 1 Protein sorting in the endocytic pathway. Upon ligand activation (not shown), growth factor receptors (e.g., EGF receptor) are internalized via Clathrin-coated vesicles (CCV) and delivered to the tubulo-vesicular early endosome. Here, ubiquitinated receptors are sequestered into the lumenal
vesicles of the multi-vesicular body (MVB), which delivers its content to the lysosome. Other plasma membrane proteins that are not ubiquitinated (e.g., Transferrin receptor) enter the pathway via the same route but are recycled back to the plasma membrane.
to generate multi-vesicular bodies (MVBs). MVBs may then fuse directly with lysosomes or deliver material to late endosomes, both of which contain acid-dependent proteases. Classic electron microsocopy studies established this paradigm, largely by following the itinerary of epidermal growth factor receptors (EGFR), subsequent to ligand-induced internalization [4–6]. This endocytic pathway has been conserved in yeast, with the yeast vacuole being functionally equivalent to the lysosome. Thus, much of our recent knowledge has been driven by yeast genetic studies of mutants defective in vacuolar transport (vps mutants) [7], which has led to the identification of the MVB sorting machinery. In this article we shall focus first on the work that led to an appreciation of a role for ubiquitin on the endocytic pathway, before discussing the complex machinery that engages with ubiquitin for sorting towards the lysosome. We shall then consider the various E3-ligases and de-ubiquitinating enzymes (DUBs), which have
2 Identification of Ubiquitin as an Endosomal Sorting Signal
been mooted to regulate the ubiquitin status of endosomal proteins, before finally considering the role of different polyubiquitin chains at the endosome.
2 Identification of Ubiquitin as an Endosomal Sorting Signal
The first hints that ubiquitin might play a role in endocytosis emerged in the mid-1980s, when it was found that a number of plasma membrane receptors, including PDGF receptor and GH receptor, were modified with ubiquitin [8, 10]; reviewed in [151]. In the early 1990s, studies revealed that ubiquitination of cell surface receptors that signal through tyrosine kinase activation occurs in response to binding of ligands. The first example of such was the T-cell receptor (TCR). Multiple TCR subunits were found to be ubiquitinated on cytosolic lysine residues in response to receptor occupancy [9]. It required yeast studies to clarify a function for conjugation of ubiquitin to membrane proteins. A link between ubiquitin and endocytosis was established using three different approaches. Firstly, Kolling and Hollenberg analysed the downregulation of Ste6p, the ATP-binding cassette (ABC) transporter for secretion of the pheromone α-factor [11]. They observed the accumulation of ubiquitinated forms of Ste6p in plasma membrane fractions prepared from a mutant Saccharomyces cerevisiae strain with impaired endocytosis. Second, by analysing yeast strains lacking ubiquitin-conjugating enzymes, Hicke and Riezman showed that ubiquitination of the yeast G protein-coupled α-factor receptor Ste2p marked this plasma membrane protein for proteolysis [12]. Using mutant strains, defective for either vacuolar hydrolases or proteasome function, they were able to establish that ubiquitination of Ste2p was required for the vacuolar degradation pathway. A third line of evidence for a link between ubiquitin and endocytosis in yeast was obtained by genetic analysis of the ammonium-induced downregulation of amino acid permeases. Npi1 was isolated as a nitrogen permease inactivator gene in 1983 [13]; Hein et al., subsequently showed that the gene encodes an E3-ligase, Rsp5p [14]. In contrast to proteasomal degradation pathways, appendage of polyubiquitin chains to substrate proteins is not required for ubiquitin-dependent endocytosis in yeast. Ste2p is efficiently internalized under conditions where polyubiquitin chain formation is suppressed by the use of ubiquitin mutants lacking internal lysines [15]. Fusion of a single ubiquitin in-frame to the stable plasma membrane protein Pma1p stimulates endocytosis of this protein [16]. For other proteins, specifically yeast permeases, such as uracil permease and the general amino acid permease, Gap1p, maximal internalization rates require the formation of diubiquitin chains linked through Lys63 of ubiquitin [17, 18]. In mammalian cells, the smeary appearance of ubiquitinated receptors on Western blots was initially interpreted as reflecting polyubiquitination. However, it was not clear how polyubiquitinated receptors would escape proteasomal targeting, and instead be degraded in the lysosome. This issue has since been resolved by using a combination of monoclonal antibodies, which allow discrimination between
3
4 Protein Sorting to the Lysosome
mono- and polyubiquitinated proteins. EGFR, PDGFR and Met tyrosine kinase receptors are all in fact monoubiquitinated at multiple lysine residues within the receptors [19–21]. Furthermore, Haglund et al., showed that fusion of ubiquitin to the C-terminus of EGFR results in constitutive endocytosis, which cannot be further enhanced by EGF stimulation. However, data from the Sorkin laboratory has shown that the dopamine transporter can be polyubiquitinated and contains a mixture of K11, K48 and K63-linked ubiquitin [22]. In addition to promoting the internalization of receptors from the plasma membrane, ubiquitination also promotes sorting of receptors towards late endosomes and the vacuole. Mutation of lysine 6 in the cytoplasmic tail of the yeast vacuolar protein Phm5p inhibits its sorting to lumenal vesicles, but this can be restored by the biosynthetic addition of a single ubiquitin to create a Ub-Phm5p fusion protein [23]. In mammalian cells, transferrin receptor normally recycles back to the plasma membrane after internalization to sorting endosomes [24]. Fusion of ubiquitin to the C-terminus of this receptor prevents its recycling, through interaction with the endosomal protein Hrs (Section 3.2; [25]).
3 Ubiquitin-mediated Sorting at the Endosome: The MVB Sorting Machinery
Our appreciation of the complexity of the molecular machinery responsible for MVB formation and receptor sorting to the lysosome, owes everything to the comprehensive characterization of vps mutant strains in yeast [26]. These vps mutants can be subdivided into classes based on their characteristic phenotype, and it is Class E mutants that present with an enlarged, swollen prevacuolar compartment, indicative of a defect at the stage of MVB formation and inward vesiculation. The sorting events at the prevacuolar or early endosome that lead to the translocation of receptors into internal vesicles are thought to be mediated by a succession of at least four multiprotein complexes, which each have the ability to recognize ubiquitinated cargo through ubiquitin-interacting domains: the Hrs/STAM complex and the ESCRT complexes I, II and III. A fifth component that is essential for this process is an ATPase of the AAA family called Vps4 or SKD1. 3.1 Endosome-associated Ubiquitin Interacting Domains: Structure and Function
Several of the class E vps genes contain domains that are predicted to interact with ubiquitin. A feature common to all ubiquitin-binding motifs is their low affinity (100–500 µM, see Table 1). This makes sense in the face of the high concentration of free ubiquitin in the cytosol which has been estimated at 10 µM [31]; too high an affinity would mean that these modules would be permanently plugged with ubiquitin and unable to dynamically interact with ubiquitinated cargo. The small size of these domains has made them amenable to structural analysis.
3 Ubiquitin-mediated Sorting at the Endosome: The MVB Sorting Machinery Table 1 Ubiquitin-binding domains in endocytic proteins
Name
Length
Examples
K d (l M)
Structure
UIM
∼20 ∼40 ∼45–50 ∼60 ∼150 var.
NZF
∼25–30
Vps36
Hrs:∼300 Vps27:∼100–300 STAM:∼200 n.d. Vps9: 20 µM Tsg101:∼500 n.d. yGGA: 100–400 Vps36:∼200
α-helix
UBA CUE UEV VHS GAT
Hrs/Vps27, STAM/Hsel, Eps15, Epsin Cbl Tollip Tsg101/Vps23 STAM GGA3
Triple α-helix Triple α-helix α-helix/β-sheets Eight α-helices Triple α-helix Zn finger
A large variety of ubiquitin-binding motifs are found in proteins involved in endocytic membrane traffic. The average length of the motif is given in amino acid residues. Kd values correspond to measured affinities for free ubiquitin. n.d.: not determined. Refer to text for references and abbreviations.
The ubiquitin-interacting motif (UIM) was first identified in an unbiased bioinformatic screen based on homology to the ubiquitin-interacting region in the regulatory subunit S5a (Rpn10 in yeast) of the 26S proteasome [27]. It became immediately clear that proteins involved in endocytic trafficking were highly represented amongst the emerging list of UIM proteins. UIM domains are found in Hrs, STAM and Epsin as well as in their yeast counterparts Vps27, HseI and Ent1/2. The UIM is characterized by a short 20 amino acid motif with a highly conserved stretch xxAxxxSxxAc, where denotes a large hydrophobic, and Ac an acidic, residue, and which is preceded by a block of four mostly acidic residues [27]. Solution structures of the UIMs of Hrs and Vps27, as well as a crystal structure of the second UIM of Vps27, are now available and indicate that the motif folds as a short amphipathic helix and interacts with the Leu8-Ile44-Val70 hydrophobic patch in ubiquitin [28–30]. The structures indicate that UIM binding to a monoubiquitinated protein would occlude Lys48 of ubiquitin, rendering it unavailable for ubiquitin chain extension through this residue. This may provide one mechanism that protects ubiquitinated receptors from proteasomal degradation. Ubiquitin-associated (UBA) domains [32, 33] and CUE-domains (named after the founding member Cue1: coupling of ubiquitin conjugation to ER degradation) span 40 to 50 amino acids and share a three-helix bundle structure [34–36]. Two highresolution crystal structures of the CUE domain show that binding to ubiquitin, as in the case of the UIM, is through hydrophobic surfaces, and that Lys48 of ubiquitin is likewise occluded by the interaction [34, 36]. Ubiquitin E2 variant (UEV) domains have homology to the E2 conjugating enzymes that ligate ubiquitin to substrates, but they lack the catalytic cysteine. In contrast to UIM, UBA and CUE domains, the crystal structure of the ESCRT I component Tsg101 UEV domain complexed with ubiquitin indicates that both Lys 48 and Lys 63 of ubiquitin remain fully accessible [37, 38].
5
6 Protein Sorting to the Lysosome
Many UIM, UBA and CUE domains promote self-ubiquitination of their host protein. This can promote a network of proteins held together by ubiquitin interactions, or may regulate the availability of the ubiquitin-binding domain in the host protein [39, 40]. 3.2 The Hrs–STAM Complex and the Endosomal Clathrin Coat
Initial engagement between the sorting machinery and ubiquitinated cargo is mediated through interaction with UIM domains in hepatocyte growth factor tyrosine regulated substrate (Hrs) and signal transducing adapter molecule (STAM), also called Hrs-binding protein (Hbp). In yeast these proteins correspond to Vps27 and Hse1, respectively. Mutations in the UIM of either of these proteins result in specific defects in the sorting of ubiquitinated proteins into the vacuole lumen [41, 42]. Association of Hrs with endosomal membranes is mediated through binding of its FYVE domain to the inositol lipid PtdIns3P [43, 44]. Hrs fulfils an adapter function through direct interaction with ubiquitinated cargo and with the terminal domain (TD) of clathrin heavy chain [45, 46]. Both Hrs and claxsthrin are components of an unusual coat structure that assembles on the vacuolar surface of sorting endosomes [47]. Typically, the coat presents as an extended flat surface, which gives the impression of opposing the natural curvature of the membrane. It comprises two relatively electron-dense layers separated by a thin electron-lucent layer. This particular clathrin coat does not form clathrin-coated vesicles, but instead may provide a matrix capable of trapping and concentrating ubiquitinated receptors such as EGFR [45]. On the other hand, the recycling transferrin receptor, whilst free to diffuse into this region, is not retained within it [47]. A central role of Hrs/Vps27 in both receptor sorting and MVB formation has been proposed. Vps27 is a class E vps mutant defective in lumenal vesicle formation and Hrs knock-out in Drosophila provides a similar phenotype and inhibits EGFR downregulation [48]. siRNA knock-down of Hrs in mammalian cells partially inhibits Met receptor and EGFR downregulation [49, 50]. A role for Hrs in both receptor sorting and lumenal vesicle formation suggests that the two processes may be tightly coupled, with consequent advantages with respect to the loading efficiency. Both receptor sorting and lumenal vesicle formation can also be inhibited by overexpression of Hrs [51]. This inhibition of internal vesicle formation is contingent on an intact Hrs UIM domain, suggesting that it may play both positive (receptor sorting) and negative (vesicle formation) roles in the pathway leading to lumenal vesicle budding. Yet another UIM-domain-containing protein, Eps15, has been reported as an additional subunit of the Hrs–STAM complex [49]. Eps15 has previously been shown to play a role in the CCV-mediated internalization of activated EGFR [52, 53]. The fact that Eps15 clearly functions early on in endocytosis has made it difficult to address whether it is an essential component at a later stage of endosomal sorting. All three proteins, Hrs, STAM and Eps15, are tyrosine phosphorylated in response to growth factors, which could conceivably regulate further associations
3 Ubiquitin-mediated Sorting at the Endosome: The MVB Sorting Machinery
with components of the sorting machinery [44, 54, 55] or with other signalling pathways [56]. 3.3 GGA and Tom1: Alternative Sorting Adapters?
Hrs and STAM are founder members of the family of proteins with a VHS (Vps27p/Hrs/STAM) domain. Striking similarities have now been found with other members, which share the ability to link ubiquitin and Clathrin: GGA (Golgiassociated g-adaptin homologues, Arf-binding) and Tom1/Tom1L1 (target of Myb1) (Figure 2) [57–60]. GGA proteins have previously been described as monomeric adapters that are involved in clathrin-coated vesicular transport between the Golgi and the endosome as well as the endosome and the vacuole/lysosome [150]. Ubiquitin binding is conferred by the GGA and Tom (GAT) domain and by the VHS domain, whereas clathrin is recruited by the Hinge-region [57, 62–64]. The GAT domain is also responsible for binding the small GTPase Arf [65]. In the case of Tom1 and Tom1L1, the GAT domain binds ubiquitin and a CUE-domain protein called Tollip in a mutually exclusive way [60, 66]. Tom1, but not Tom1L1 also binds to a FYVE-domain protein, Endofin, strengthening the analogy with the HrsSTAM complex [67]. Endofin or Tollip co-expression are required for recruitment of Tom1 to endosomes [66, 67]. Finally, all three VHS-domain sorting complexes, Hrs-STAM, GGA and Tom1L1, can independently recruit the internal vesicle formation machinery through a P(S/T)AP motif which interacts with the ESCRTI component tumour suppressor gene 101 (Tsg101, see below) [58, 61, 68]. A yeast two-hybrid assay has also suggested that the VHS domains of GGA1 and 3, as well as of Tom1 are capable of binding to Hrs [58]. It is unclear if the existence of multiple sorting complexes is an indication of redundancy in lysosomal sorting or whether the multitude of adapters provides a higher level of cargo selectivity to the sorting process. An analogy may be made with the clathrin-coated vesicle internalization pathway where a variety of adapters operate to provide both redundant and specific sorting mechanisms [69]. 3.4 The ESCRT Machinery
Yeast studies have defined three distinct multimeric Vps protein complexes, named endosomal sorting complex required for transport (ESCRT) I, II and III, which are proposed to be sequentially recruited and activated at the endosome (Figure 3) [26]. These proteins constitute the core of the MVB formation machinery downstream of the Hrs–STAM complex. At least two of these complexes bind ubiquitin, and the ubiquitinated receptors may be passed along from one complex to another. The first complex, ESCRTI (∼350 kDa) is composed of Vps23, Vps28 and Vps37. The ubiquitin-binding site of this complex is found in Vps23, which has a UEV domain. The mammalian homologue of Vps23, Tsg101 (tumour suppressor gene 101) was originally identified in a tissue culture screen for genes whose disruption
7
8 Protein Sorting to the Lysosome
Fig. 2 Ubiquitin-binding adapter complexes implicated in MVB sorting. Three endosomal adapter complexes have been implicated in the sorting of proteins from the endosome to the lysosome. Each of these complexes interacts with clathrin, which decorates the endosomal membrane in discrete patches. Ubiquitin binding is conferred by UIM (Hrs, STAM), VHS (STAM, GGA), or GAT domains (GGA, Tom1) and might
provide a means to concentrate ubiquitinated receptors in the endosomal clathrincoated microdomains. The PS/TAP motif in the Hrs, GGA and Tom1 subunits may then recruit the ESCTI complex via direct interaction with Tsg101. It is as yet unclear whether these sorting complexes act independently or in concert with each other. Black rectangle: receptor cargo; PI3P: PtdIns3P; Ub: ubiquitin.
3 Ubiquitin-mediated Sorting at the Endosome: The MVB Sorting Machinery
causes cell transformation [70]. Tsg101 as well as hVps28 disruption by RNAi and antibody injection, respectively, clearly interfere with EGFR downregulation in human cells and cause a marked accumulation of ubiquitin on endosomes [71, 72]. The mammalian homologue of Vps37 has only recently been identified and is the least conserved member of ESCRTI [73, 74]. ESCRTI is recruited to endosomal membranes through binding of the UEV domain of TSG101/Vps23 to a conserved PT/(S)AP motif within Hrs–Vps27 [68, 75, 76]. Hrs and TSG101 are present in both cytosolic and membrane fractions, but only associate at the membrane [75]. How can this ordered ESCRT complex assembly be attained specifically at the membrane? Hrs is localized in part at the membrane through interaction of its FYVE domain with the inositide lipid PtdIns3P that is concentrated at early endosomes [43, 44], where it can then bind to ubiquitinated receptors. It is possible that PtdIns3P or ubiquitin binding could induce a conformational change in Hrs that unmasks a TSG101 binding site. TSG101 itself has a PTAP motif, which may interact with its own UEV domain – competitive binding by Hrs may then release TSG101 into a relaxed conformation permissive for ESCRT II recruitment [68, 76, 77]. ESCRTII was described as a cytosolic complex (∼155 kDa) composed of the class E vps proteins Vps22/Eap30, Vps25/Eap25 and Vps36/Eap45 that transiently associates with endosomal membranes in an ESCRTI-dependent manner (Figure 3). Two 3.6-Å resolution crystal structures of an ESCRTII complex containing one molecule of Vps22, the carboxy-terminal domain of Vps36 and two molecules of Vps25 suggest that the complex has the shape of a capital letter ‘Y’, of which the sub-complex Vps22 and Vps36 form one branch. A flexible linker extending from the tip of this branch would lead to two consecutive NZF motifs, the second of which is believed to bind ubiquitinated cargo [78–80]. However, this domain was not solved owing to proteolysis during crystallization. The authors suggest that this structure could provide a “long swinging arm” for transfer of cargo over substantial distances. The NZF motif is lacking in Eap45, the mammalian orthologue of Vps36, but this is compensated for by the inclusion of a GLUE domain, which has been found to have ubiquitin-binding properties [81]. The ESCRTII complex is required for the membrane recruitment of the Snf7Vps20 ESCRTIII sub-complex via an interaction between Vps25 and Vps20 [80, 82], and this in turn is a prerequisite for the recruitment of the other two ESCRTIII proteins Vps24 and Vps2 to the membranes [83, 84]. These last four class E Vps proteins belong to a structurally related “family” of small, highly charged, coiled-coil proteins that in mammalian cells are referred to as CHMPs (charged multivesicular proteins) [85]. Membrane association of this complex may also be conferred by myristoylation of Vps20–CHMP6 and the ability of Vps24–CHMP3 to bind to the phosphatidyl-inositol lipid PtdIns(3,5)P2 [86]. The mammalian homologues of Vps2 and Snf7 are called CHMP2(a and b) and CHMP4(a, b and c), respectively. One highly connected protein is AIP1 or ALIX (ALG2-interacting protein) previously implicated in apoptosis. AIP1 shows interaction with both ESCRTI (Tsg101) and ESCRTIII complexes (Snf7–CHMP4(a, b and c)) and may therefore act as a bridge between these two [87]. The yeast homologue of AIP1, Bro1, plays a role
9
10 Protein Sorting to the Lysosome
Fig. 3 Interactions of the MVB sorting machinery. The first point of engagement of ubiquitinated proteins with the MVB sorting machinery is the Hrs–STAM complex. Hrs (Vps27 in yeast) associates with the endosomal membrane through interaction of its FYVE-domain with PtdIns3P. Both Hrs and STAM (HseI in yeast) bind ubiquitin via their UIM domains. Hrs–Vps27 recruits the ESCRTI complex (composed of Vps23–Tsg101, Vps28 and Vps37), through interaction of its PS/TAP motif with Tsg101–Vps23, which in turn binds ESCRTII (Vps22, Vps25 and Vps36; Eap30, Eap25 and Eap45 in mammalian cells). Both ESCRTI and ESCRTII complexes are able to bind ubiquitin via the UEV domain in Tsg101, and the GLUE and NZF domains in Eap45 and Vps36, respectively. ESCRTII recruits and activates the last
complex in this cascade, ESCRTIII. An additional connection between ESCRTI and ESCRTIII complexes is mediated through AIP1, which in yeast (Bro1) has also been implicated in recruiting the DUB Doa4 to the prevacuolar compartment. ESCRTIII is composed of two subcomplexes, Snf7Vps20 and Vps2–Vps24. These proteins belong to a family of highly charged proteins called CHMPs in mammalian cells. Membrane association of the ESCRTIII complex may be mediated by binding of Vps24 to PtdIns(3,5)P2 and a myristoyl moiety attached to Vps20. The ESCRTIII complex interacts with the AAA-ATPase Vps4 (SKD1 in mouse), which is thought to dissociate the components of the sorting machinery. Black bar: receptor, e.g., EGFR; PI3P: PtdIns3P; PI3,5P2 : PtdIns(3,5)P2 ; Ub: ubiquitin.
4 Ubiquitin Ligases and Endosomal Sorting
in recruiting the deubiquitinating enzyme Doa4 [88, 89], possibly through stabilization of Snf7 at the endosome, with which the enzyme interacts directly [90]. Doa4 is not essential for MVB formation or sorting [23, 91], but acts to recycle ubiquitin from cargo that has progressed beyond ubiquitin-dependent steps in the MVB sorting pathway [92]. 3.5 Vps4–SKD1
The ESCRT-III component Snf7, recruits a Class E Vps protein belonging to the AAA-ATPase family, called Vps4 in yeast, and SKD1 in mouse cells [26, 93–95]. The last resolved step in the MVB formation cascade is the dissociation of the ESCRTmachinery powered by Vps4–SKD1 hydrolysis of ATP. This is inferred from the dramatic phenotype of ATPase-defective Vps4–SKD1 mutants in yeast and mammalian cells, respectively. Vps4 yeast deletion strains show a typical Class E swollen prevacuolar compartment on which the entire upstream sorting machinery accumulates [71, 89, 92, 96]. Overexpression of a catalytically inactive Vps4 mutant in mammalian cells recapitulates this phenotype by promoting the formation of enlarged endosomes on which an “Hrs-clathrin coat”, as well as the ESCRTI machinery accumulates [72, 95, 97, 98]. At the ultrastructural level, cells expressing mutant Vps4 also show a depletion of internal vesicles, and membrane proteins destined for the lysosome accumulate at the peripheral membrane of this abnormal compartment [98]. This suggests that dissociation of the sorting machinery and the formation of internal vesicles are tightly coupled to allow efficient recycling of ESCRT proteins.
4 Ubiquitin Ligases and Endosomal Sorting
The specificity of ubiquitination is largely due to cognate interactions with E3 ligases, of which there are probably in excess of 600 in mammals. By recruiting specific E2 enzymes they may also determine the topology of the ubiquitin extension. E3s are generally split into two major classes: the RING (really interesting new gene) ligases have a catalytic domain based on a double zinc-finger whilst the HECT (homologous to E6-AP carboxyl terminus) ligases contain a 350 amino acid C-terminal domain within which lies a conserved catalytic cysteine. HECT ligases recognize their substrate via WW domains that interact with various proline-rich sequences. They form a thiolester intermediate with ubiquitin, whereas RING ligases promote the direct transfer of ubiquitin from the E2 to the substrate [99]. RING ligases can be further subdivided into those in which the substrate binding site and the RING domain are encoded within a single polypeptide (e.g. c-Cbl) and those in which they are contained within different proteins of a larger complex (e.g. SCF and APC/Cyclosome). We will briefly review the major E3 ligases associated with endocytic trafficking.
11
12 Protein Sorting to the Lysosome
4.1 Nedd4 Family
HECT domain proteins of the Nedd4 family regulate the trafficking of a variety of biosynthetic and endosomal cargo. They have a common architecture consisting of several WW domains and a C-terminal HECT domain. Most members also have an amino terminal C2 domain, which commonly bind to phosphoinositides [100]. The sole member of the Nedd4 family in S. cerevisiae is Rsp5, which seems to be the only ligase required for ubiquitination of cell surface proteins. Early studies demonstrated its involvement in constitutive ubiquitination of the uracil permease Fur4p [101] and the general amino acid permease Gap1p [102]. It is located and functions at multiple sites on the endocytic pathway [103]. Accumulation at the prevacuolar compartment in vps4 cells suggests a function in the MVB sorting pathway. Mutation of the C2 domain disrupts this localization and inhibits sorting of the biosynthetic pathway-derived cargo carboxypeptidase S, but not the endocytosed receptor Ste2, for which ubiquitin can be appended at the plasma membrane [104, 105]. Some transporters such as Fur4p are downregulated both by endocytosis from the cell surface and by diversion from the biosynthetic pathway at the Golgi to the MVB pathway, without reaching the surface. Both pathways require Rsp5 function at the plasma membrane and prevacuole respectively [106]. Drosophila Nedd4 regulates endocytosis of Notch and suppresses its ligandindependent activation [107]. In mammalian cells several family members, including Nedd4 and AIP4/Itch, have been implicated in endocytic trafficking. Nedd4 directly mediates ubiquitination of the epithelial Na+ transporter ENaC, targeting it for downregulation [108]. AIP4/Itch is disrupted in nonagouti lethal or itchy mice, which are characterized by abnormal immune responses and constant itching [109]. It interacts with and ubiquitinates mammalian Notch and the G-protein-coupled chemokine receptor CXCR4 [110, 111]. Salient substrates also include components of the endocytic machinery. Eps15 and Hrs ubiquitination are mediated by Nedd4 or by AIP4 [110, 112–114]. Ubiquitination of these proteins depends on their UIM and it is thought that the HECT ligase is partially recruited through an interaction between its covalently attached ubiquitin and the UIM domain. Once the protein is ubiquitinated, its UIM may be occupied by its own ubiquitin and no longer able to recruit another HECT ligase. In this way, polyubiquitination and subsequent targeting of the endocytic machinery to the proteasome may be prevented [39]. 4.2 c-Cbl
In mammalian cells, the major E3 ligase implicated in endocytic trafficking of RTKs is the cellular proto-oncogene Cbl [115]. It is recruited via its SH2 domain to phosphorylated RTKs. The viral oncogene v-cbl lacks the RING finger motif, and displaces endogenous Cbl, to allow growth factor receptors to escape from downregulation. Similarly, loss of Cbl binding ability through mutation is a recurring theme in oncogenic deregulation of RTKs [116]. The ubiquitination of the Met
5 Endosomal DUBs
receptor is a well-characterized example. Cbl is recruited to activated Met via Grb2, then subsequent binding of its TKB domain to the juxtamembrane autophosphorylated Tyr1003 is proposed to elicit a conformational change within Cbl necessary for activation. A Y1003F mutation inhibits ligand-dependent ubiquitination of Met and leads to cell transformation [117]. Overexpression of c-Cbl, but not mutants lacking ligase activity, dramatically stimulate the degradation of EGFR, apparently without affecting the rate of EGFR internalization [118, 119]. This observation was interpreted to reflect ubiquitindependent lysosomal sorting [118]. Further evidence that Cbl might be required for a later step than internalization has come from work done in Cbl-/- mouse embryonic fibroblasts. This work indicated that EGFR is internalized to endosomes at a normal rate in the absence of Cbl, but its degradation was inhibited [120]. However, the exact site of Cbl-action is still much debated and various reports have since proposed a positive role for Cbl-dependent ubiquitination on the EGFR internalization step [121, 122]. Cbl also acts as a multivalent adapter for at least 40 proteins and some of its roles correspond to this adapter function rather than to its ligase activity. For example, a Cbl–Cin85–endophilin complex positively regulates both Met receptor and EGFR endocytosis independently of E3 ligase activity [123, 124].
5 Endosomal DUBs
Deubiquitinating enzymes (DUBs) regulate the ubiquitin status of endosomal proteins in opposition to E3 ligases. Deubiquitination is not considered to be an obligate step on the MVB sorting pathway as biosynthetic production of chimeric proteins incorporating ubiquitin results in efficient targeting to MVBs [23]. DUB activity at the endosome may be necessary to maintain the pool of free ubiquitin required for endosomal sorting. On the other hand it can negatively regulate lysosomal protein degradation if it acts on ubiquitinated receptors prior to their commitment to the lysosomal pathway (Figures 4 and 5). DUBs implicated in endosomal sorting include the yeast proteins Ubp1, Ubp2 and Doa4, and the mammalian proteins UBPY (USP8) and AMSH. 5.1 Ubp1 and Ubp2
Overexpression of a soluble form of Ubp1 is able to stabilize the ABC-transporter Ste6 and the α-factor receptor Ste2, which are transported to the vacuole for degradation [125]. The stabilization effect of Ste6 was shown not to be due to deubiquitination of Ste6 itself, suggesting that the target of Ubp1 may be a component of the protein-transport machinery. Ubp2 shows specificity for K63 over K48-linked polyubiquitin chains and can antagonize Rsp5 E3 ligase activity [126]. Interestingly
13
14 Protein Sorting to the Lysosome
Fig. 4 Sorting of ubiquitinated receptors at the endosome. Ubiquitination of growth factor receptors (e.g., EGF receptor (EGFR)), by an E3 ligase (e.g. Cbl) promotes their interaction with Hrs in a Clathrin-coated microdomain. DUBs (e.g., AMSH and UBPY) may counteract Cbl activity and oppose MVB sorting and favour recycling. Hrs recruits the ESCRTI complex, which activates and assembles ESCRTII and III. This in-
duces the translocation of the ubiquitinated receptor into internal vesicles of the MVB. Ubiquitin itself is recycled by a DUB (e.g., Doa4) just before, or in parallel with, the disassembly of the ESCRT-machinery by the AAA-ATPase Vps4, which irreversibly seals the sorting process. Note that this illustration combines elements from yeast and mammalian cells.
Ubp2 co-purifies with Rsp5 and is physically linked to Rsp5 through an adapter protein Rup1. 5.2 Doa4
The yeast protein Doa4 was originally shown to interact with the 26S proteasome and proposed to promote proteolysis through removal of ubiquitin from proteolytic intermediates on the proteasome [127, 128]. It has also been proposed to maintain free ubiquitin levels by recycling ubiquitin from cargo molecules that have been committed to the lysosomal sorting pathway, prior to their sequestration away from the cytosol [92, 129]. In common with ESCRT complex components, Doa4 accumulates on the endosome following inactivation of the ATPase Vps4 [92]. Deletion of ESCRT III-complex components blocks this localization [88, 92] and
5 Endosomal DUBs
Fig. 5 The endocytic pathway in yeast and mammalian cells. Endocytosed proteins (e.g., Ste2 and EGFR) are sorted at the early endosome by multiple components of the multivesicular body (MVB) sorting machinery (Hrs–STAM, Vps27–HseI, ESCRTI–III). From here, cargo can be either recycled
back to the plasma membrane or sorted to the MVB or prevacuolar compartments. Sorting to the lysosome and vacuole respectively is enhanced by the activity of E3-ligases (Rsp5, Cbl, Nedd4, AIP4), which may be opposed by DUBs (Ubp1, AMSH, UBPY).
a direct interaction with the ESCRTIII component Snf7 has been shown through two-hybrid screens [90]. 5.3 UBPY
UBPY (USP8), a member of the ubiquitin-specific processing protease (UBP) family, displays the highest similarity to Doa4 amongst mammalian DUBs. It accumulates upon growth stimulation of starved human fibroblasts and downregulates in response to growth arrest induced by cell–cell contact [130]. A link to endosomal protein sorting was first suggested when UBPY was identified in a far-Western screen for Hbp–STAM binding partners [131]. Mutagenic analysis identified a consensus sequence PX(V/I)(D/N)RXXKP as a binding module for interaction with the Hbp-STAM-SH3 domain-binding motif [131, 132]. This represents a novel SH3 binding motif lacking the canonical PXXP sequence. UBPY can hydrolyse both K48and K63-linked chains as well as monoubiqitinated EGFR. Overexpression of UBPY retards EGFR degradation whilst conflicting data have been reported for the effect
15
16 Protein Sorting to the Lysosome
of siRNA-mediated UBPY knock-down, which was found to either enhance [133] or inhibit EGFR downregulation [134]; Row, P.E., McCullough, J., Clague, M.J. and Urb´e, S., unpublished observation). UBPY can exist as a trimolecular complex with Otubain 1, a member of the OTU family of DUBs, and GRAIL, an E3 ligase crucial for the induction of CD4 T cell anergy [135]. UBPY was shown to deubiquitinate GRAIL in vitro and Otubain 1 opposed this action by limiting the DUB activity of UBPY. UBPY-dependent deubiquitination has also been suggested to prevent degradation of the E3 ligase Nrdp1, which plays a role in regulating steady-state levels of ErbB3 and ErbB4 [136] Gnesutta et al. [137] demonstrated an in vitro interaction between mouse UBPY and the N-terminal half of CDC25Mm, a Ras nucleotide exchange factor. This interaction may be functional in cells, since ubiquitination of CDC25Mm in HEK293 cells is diminished following co-expression of mUBPY. 5.4 AMSH
AMSH was originally isolated as a novel adapter molecule from human T cells that interacts with the SH3 domain of STAM (associated molecule with the SH3 domain) [138]. Endogenous STAM was pulled down when AMSH was immunoprecipitated from IL-2 cell lysates in the presence of cross-linkers and both proteins were shown to interact when co-transfected into cells. Intriguingly, AMSH shares the PX(V/I)(D/N)RXXKP STAM/Hbp-SH3 domain-binding motif of UBPY [131]. Thus, AMSH presumably competes with UBPY for binding to the SH3 domain of STAM. AMSH belongs to the JAMM (JAB1/MPN/Mov34) metallo-enzyme family of deubiquitinating enzymes, of which the Rpn11-POH1 subunit of the 19S proteasome lid was the first described representative [139, 140]. No DUB activity was observed when Rpn11 was isolated from its multisubunit complex. In contrast, purified AMSH cleaves ubiquitin chains in vitro, and is hence the first JAMM-domain DUB to exhibit activity in isolation [141]. In common with Ubp2, AMSH displays specificity for K63-over K48-linked polyubiquitin chains [141]. AMSH localizes to endosomes and an inactivating mutation in the JAMM domain of AMSH was shown to promote the accumulation of ubiquitin on endosomes [141]. Concomitantly, this mutant stabilizes an ubiquitinated form of STAM, which is contingent on an intact UIM within STAM. This led us to suggest that ubiquitin, which is appended to STAM in a UIM-dependent fashion and which would normally be removed by either AMSH or UBPY, may provide an additional binding site for enzymatically inactive AMSH. Hence, the inactive mutant of AMSH could act as a “substrate trap” mutant. Ubiquitinated EGFR provides a substrate for AMSH in vitro and siRNA-mediated knock-down of AMSH enhances the degradation rate of EGFR [141]. We have proposed a model for the role of AMSH on endosomes in which AMSH can counteract the E3 ligase activity of c-Cbl on EGFR, before the ubiquitinated receptor has been committed to the lysosomal sorting pathway (Figure 4). AMSH activity
6 Polyubiquitin Linkages and Endocytosis 17
will therefore favour recycling of the receptor. Note that existing data suggest that either AMSH or UBPY can fulfil this role (Figure 5). AMSH also associates with a number of signalling molecules. Signalling defects in cells derived from AMSH-deficient mice were not obvious, although the mice die at about three weeks of age [142]. It was originally hypothesized that AMSH may play a role in cytokine-mediated signalling through its interaction with STAM [138]. A novel Grb2 family member, Gads/Grf40, also associates with AMSH [143]. Gads has been shown to be involved in T-cell receptor (TCR) signalling; Gads knock-out or Gads SH2 transgenic mice show impairment in pre-T-cell development. In addition, AMSH also interacts with inhibitory Smads (I-Smads), and this association negatively regulates their function, and thereby promotes bone morphogenetic protein (BMP)-mediated signalling [144]. Li and Seth [145] have demonstrated that AMSH itself is ubiquitinated by the E3 ligase Smurf2. This is contingent on both proteins binding to the adaptor molecule RFN11 and leads to a reduction in steady-state levels of AMSH by proteasomal degradation. We can now see that association of E3 ligase activity with DUB activity is a recurring theme common to UBP2, UBPY and AMSH.
6 Polyubiquitin Linkages and Endocytosis
Although monoubiquitination may represent a minimal requirement for endosomal sorting and RTKs do not seem to incorporate polyubiquitin chains, there is a large body of work suggesting polyUb involvement in some sorting events. 6.1 Proteasome Involvement in Endocytic Sorting
The downregulation of a sub-set of RTKs and other receptors requires proteasomal activity. In most cases studied so far this requirement does not reflect proteasomal degradation of the receptor per se. Rather, this activity is permissive for receptor sorting towards the lysosomal degradation pathway. Thus receptor downregulation can be sensitive to both proteasome inhibitors (e.g. lactacystin) and inhibition of lysosomal acidification (e.g. concanamycin). K48-linked polyubiquitin chains specify proteasomal degradation and are therefore indirectly implicated in the lysosomal sorting process. Well-characterized examples include interleukin-2 receptors [146], Growth Hormone Receptor (GHR) [147] and the RTK Met [50, 148]. In each case, inhibition of the proteasome promotes recycling of internalized receptors at the expense of sorting to lysosomes. Interestingly, ubiquitination of GHR itself appears to be dispensable for downregulation, which has led to a model in which polyubiquitination and proteasomal degradation of an unidentified accessory factor is required [147]. An example of such a scenario may be found in neurons, where the endocytosis of AMPA-type glutamate receptors also requires ubiquitination and proteasomal degradation of the scaffolding protein PSD-95 [149].
18 Protein Sorting to the Lysosome
Proteasome inhibition could reduce free ubiquitin levels leading to a block in endocytosis. In the case of Met receptor the inhibitory effect of lactacystin can be overcome by overexpression of ubiquitin, but not by a form of ubiquitin unable to form K48 linkages (K48R) [21]. It is baffling that a K48-linkage dependence is observed in the presence of a proteasomal inhibitor. Does this represent an entirely novel function for K48-linked ubiquitin, which has previously been uniquely associated with a proteasomal targeting signal? One scenario may be that the critical step for lysosomal sorting consists of the sequestration of a polyubiquitinated accessory factor away from Met, which is followed by incidental proteasomal degradation. In this model, proteasome activity is only required to generate free ubiquitin (possibly locally), which can contribute to K48-chain formation.
6.2 K63-linked Ubiquitin
Pioneering studies in yeast have provided evidence for a role of K63-linked polyubiquitin in vacuolar sorting. Gap1p and Fur4p have both been shown to be modified with short (2–3) K63-linked polyubiquitin chains [17, 18, 106]. In doa4 cells, which show low levels of endocytosis due to limited availabilty of ubiquitin, the endocytosis rate can be restored by overexpression of wild type or K48R ubiquitin but not by K63R ubiquitin, which can participate in monoubiquitination events but cannot form K63-linked chains. K63-linked chains are well represented in the pool of total cellular polyubiquitin [2]. It may well be that in the excitement over monoubiquitination, we have underestimated the role of K63-linked chains in lysosomal sorting. After all, these can provide higher affinity interactions with ubiquitin-binding domains, whilst still avoiding proteasomal degradation.
7 Conclusions
Now that most of the core components of the ubiquitin-dependent MVB sorting pathway have been identified, the challenge lies in elucidating the choreography underlying this complex process. How is cargo passed between ESCRT complexes or is the whole process more co-operative? The significance of the ubiquitin-chain topology is likely to receive more attention and more ubiquitin-binding domains with distinct specificities remain to be identified. Allied to this one may consider that ubiquitin may not simply be a tag for sorting at the endosome but may coordinate spatially segregated signalling events through recruitment of adapter proteins and enzymes with ubiquitin-binding domains. The endosome may thus provide a “hot spot” for ubiquitin-dependent signalling.
References
Acknowledgements
We thank the Wellcome Trust and North West Cancer Research Fund for funding our research in this area. SU is a Wellcome Career Development Fellow and JM is the recipient of a Wellcome Trust prize studentship.
References 1 CIECHANOVER, A. (2005). Proteolysis: from the lysosome to ubiquitin and the proteasome. Nat. Rev. Mol. Cell Biol. 6, 79–86. 2 PENG, J., SCHWARTZ, D., ELIAS, J. E., THOREEN, C. C., CHENG, D., MARSISCHKY, G., ROELOFS, J., FINLEY, D., and GYGI, S. P. (2003). A proteomics approach to understanding protein ubiquitination. Nat. Biotechnol. 21, 921–926. 3 CLAGUE, M. J. (1998). Molecular aspects of the endocytic pathway. Biochem. J. 336, 271–282. 4 FELDER, S., MILLER, K., MOEHREN, G., ULLRICH, A., SCHLESSINGER, J., and HOPKINS, C. R. (1990). Kinase activity controls the sorting of the epidermal growth factor receptor within the multivesicular body. Cell 61, 623–634. 5 GORDEN, P., CARPENTIER, J. L., COHEN, S., and ORCI, L. (1978). Epidermal growth factor: morphological demonstration of binding, internalization, and lysosomal association in human fibroblasts. Proc. Natl. Acad. Sci. USA 75, 5025–5029. 6 HAIGLER, H. T., MCKANNA, J. A., and COHEN, S. (1979). Direct visualization of the binding and internalization of a ferritin conjugate of epidermal growth factor in human carcinoma cells A-431. J. Cell Biol. 81, 382–395. 7 BANKAITIS, V. A., JOHNSON, L. M., and EMR, S. D. (1986). Isolation of yeast mutants defective in protein targeting to the vacuole. Proc. Natl. Acad. Sci. USA 83, 9075–9079. 8 LEUNG, D. W., SPENCER, S. A., CACHIANES, G., HAMMONDS, R. G., COLLINS, C., HENZEL, W. J., BARNARD, R., WATERS, M. J., and WOOD, W. I. (1987). Growth hormone receptor and serum binding protein: purification, cloning and expression. Nature 330, 537–543.
9 YARDEN, Y., ESCOBEDO, J. A., KUANG, W. J., YANG-FENG, T. L., DANIEL, T. O., TREMBLE, P. M., CHEN, E. Y., ANDO, M. E., HARKINS, R. N., FRANCKE, U. et al. (1986). Structure of the receptor for platelet-derived growth factor helps define a family of closely related growth factor receptors. Nature 323, 226–232. 10 BONIFACINO, J. S., and WEISSMAN, A. (1998). Ubiquitin and the control of protein fate in the secretory and endocytic pathways. Annu. Rev. Cell. Dev. Biol. 14, 19–57. 11 CENCIARELLI, C., HOU, D., HSU, K. C., RELLAHAN, B. L., WIEST, D. L., SMITH, H. T., FRIED, V. A., and WEISSMAN, A. M. (1992). Activation-induced ubiquitination of the T cell antigen receptor. Science 257, 795–797. 12 KOLLING, R. and HOLLENBERG, C. P. (1994). The ABC-transporter Ste6 accumulates in the plasma membrane in a ubiquitinated form in endocytosis mutants. Embo J. 13, 3261–3271. 13 HICKE, L. and RIEZMAN, H. (1996). Ubiquitination of a yeast plasma membrane receptor signals its ligand-stimulated endocytosis. Cell 84, 277–287. 14 GRENSON, M. (1983). Inactivation-reactivation process and repression of permease formation regulate several ammonia-sensitive permeases in the yeast Saccharomyces cerevisiae. Eur. J. Biochem. 133, 135–139. 15 HEIN, C., SPRINGAEL, J. Y., VOLLAND, C., HAGUENAUER-TSAPIS, R., and ANDRE, B. (1995). NPl1, an essential yeast gene involved in induced degradation of Gap1 and Fur4 permeases, encodes the Rsp5 ubiquitin-protein ligase. Mol. Microbiol. 18, 77–87.
19
20 Protein Sorting to the Lysosome
16 TERREL, J., SHIH, S., DUNN, R., and HICKE, L. (1998). Mol. Cell 1, 193–202. 17 SHIH, S. C., SLOPER-MOULD, K. E., and HICKE, L. (2000). Monoubiquitin carries a novel internalization signal that is appended to activated receptors. Embo J. 19, 187–198. 18 GALAN, J. M. and HAGUENAUER-TSAPIS, R. (1997). Ubiquitin lys63 is involved in ubiquitination of a yeast plasma membrane protein. Embo J. 16, 5847–5854. 19 SPRINGAEL, J. Y., GALAN, J. M., HAGUENAUER-TSAPIS, R., and ANDRE, B. (1999). NH4ρ-induced down-regulation of the Saccharomyces cerevisiae Gap1p permease involves its ubiquitination with lysine-63-linked chains. J. Cell Sci. 112 (Pt 9), 1375–1383. 20 HAGLUND, K., SIGISMUND, S., POLO, S., SZYMKIEWICZ, I., DI FIORE, P. P., and DIKIC, I. (2003). Multiple monoubiquitination of RTKs is sufficient for their endocytosis and degradation. Nat. Cell Biol. 5, 461–466. 21 MOSESSON, Y., SHTIEGMAN, K., KATZ, M., ZWANG, Y., VEREB, G., SZOLLOSI, J., and YARDEN, Y. (2003). Endocytosis of receptor tyrosine kinases is driven by monoubiquitylation, not polyubiquitylation. J. Biol. Chem. 278, 21323–21326. 22 CARTER, S., URBE´ , S., and CLAGUE, M. J. (2004). The met receptor degradation pathway: Requirement for K48-linked polyubiquitin independent of proteasome activity. J. Biol. Chem. 281, 5094–5105. 23 MIRANDA, M., WU, C. C., SORKINA, T., KORSTJENS, D., and SORKIN, A. (2005). Enhanced ubiquitylation and accelerated degradation of the dopamine transporter mediated by protein kinase C. J. Biol. Chem. 280, 35617–35624. 24 REGGIORI, F. and PELHAM, H. R. (2001). Sorting of proteins into multivesicular bodies: ubiquitin-dependent and -independent targeting. EMBO J. 20, 5176–5186. 25 HOPKINS, C. R. (1983). Intracellular routing of transferrin and transferrin receptors in epidermoid carcinoma A431 cells. Cell 35, 321–330.
26 RAIBORG, C., BACHE, K. G., GILLOOLY, D. J., MADSHUS, I. H., STANG, E., and STENMARK, H. (2002). Hrs sorts ubiquitinated proteins into clathrin-coated microdomains of early endosomes. Nature Cell Biol. 4, 394–398. 27 KATZMANN, D. J., ODORIZZI, G., and EMR, S. D. (2002). Receptor downregulation and multivesicular-body sorting. Nat. Rev. Mol. Cell Biol. 3, 893–905. 28 HAAS, A. L. and BRIGHT, P. M. (1987). The dynamics of ubiquitin pools within cultured human lung fibroblasts. J. Biol. Chem. 262, 345–351. 29 HOFMANN, K., and FALQUET, L. (2001). A ubiquitin-interacting motif conserved in components of the proteasomal and lysosomal protein degradation systems. TIBS. 26, 347–350. 30 FISHER, R. D., WANG, B., ALAM, S. L., HIGGINSON, D. S., ROBINSON, H., SUNDQUIST, W. I., and HILL, C. P. (2003). Structure and ubiquitin binding of the ubiquitin-interacting motif. J. Biol. Chem. 278, 28976–28984. 31 SHEKHTMAN, A. and COWBURN, D. (2002). A ubiquitin-interacting motif from Hrs binds to and occludes the ubiquitin surface necessary for polyubiquitination in monoubiquitinated proteins. Biochem. Biophys. Res. Commun. 296, 1222–1227. 32 SWANSON, K. A., KANG, R. S., STAMENOVA, S. D., HICKE, L., and RADHAKRISHNAN, I. (2003). Solution structure of Vps27 UIM-ubiquitin complex important for endosomal sorting and receptor downregulation. Embo J. 22, 4597–4606. 33 HOFMANN, K. and BUCHER, P. (1996). The UBA domain: a sequence motif present in multiple enzyme classes of the ubiquitination pathway. Trends Biochem. Sci. 21, 172–173. 34 VADLAMUDI, R. K., JOUNG, I., STROMINGER, J. L., and SHIN, J. (1996). p62, a phosphotyrosine-independent ligand of the SH2 domain of p56lck, belongs to a new class of ubiquitin-binding proteins. J. Biol. Chem. 271, 20235–20237. 35 KANG, R. S., DANIELS, C. M., FRANCIS, S. A., SHIH, S. C., SALERNO, W. J., HICKE, L., and RADHAKRISHNAN, I. (2003). Solution structure of a CUE-ubiquitin complex
References
36
37
38
39
40
41
42
43
44
reveals a conserved mode of ubiquitin binding. Cell 113, 621–630. MUELLER, T. D. and FEIGON, J. (2002). Solution structures of UBA domains reveal a conserved hydrophobic surface for protein-protein interactions. J. Mol. Biol. 319, 1243–1255. PRAG, G., MISRA, S., JONES, E. A., GHIRLANDO, R., DAVIES, B. A., HORAZDOVSKY, B. F., and HURLEY, J. H. (2003). Mechanism of ubiquitin recognition by the CUE domain of Vps9p. Cell 113, 609–620. PORNILLOS, O., ALAM, S. L., RICH, R. L., MYSZKA, D. G., DAVIS, D. R., and SUNDQUIST, W. I. (2002). Structure and functional interactions of the Tsg101 UEV domain. Embo J. 21, 2397–2406. SUNDQUIST, W. I., SCHUBERT, H. L., KELLY, B. N., HILL, G. C., HOLTON, J. M., and HILL, C. P. (2004). Ubiquitin recognition by the human TSG101 protein. Mol. Cell 13, 783–789. DI FIORE, P. P., POLO, S., and HOFMANN, K. (2003). When ubiquitin meets ubiquitin receptors: a signalling connection. Nat. Rev. Mol. Cell Biol. 4, 491–497. HICKE, L. and DUNN, R. (2003). Regulation of membrane protein transport by ubiquitin and ubiquitin-binding proteins. Annu. Rev. Cell Dev. Biol. 19, 141–172. BILODEAU, P. S., URBANOWSKI, J. L., WINISTORFER, S. C., and PIPER, R. C. (2002). The Vps27p Hse1p complex binds ubiquitin and mediates endosomal protein sorting. Nat. Cell Biol. 4, 534–539. SHIH, S. C., KATZMANN, D. J., SCHNELL, J. D., SUTANTO, M., EMR, S. D., and HICKE, L. (2002). Epsins and Vs27p/Hrs contain ubiquitin-binding domains that function in receptor endocytosis. Nature Cell Biol. 4, 389–393. GILLOOLY, D. J., MORROW, I. C., LINDSAY, M., GOULD, R., BRYANT, N. J., GAULLIER, J. M., PARTON, R. G., and STENMARK, H. (2000). Localization of phosphatidylinositol 3-phosphate in yeast and mammalian cells. EMBO J. 19, 4577–4588.
45 URBE´ , S., MILLS, I. G., STENMARK, H., KITAMURA, N., and CLAGUE, M. J. (2000). Endosomal localization and receptor dynamics determine tyrosine phosphorylation of hepatocyte growth factor-regulated tyrosine kinase substrate. Mol. Cell Biol. 20, 7685– 7692. 46 CLAGUE, M. J. (2002). Membrane transport: a coat for ubiquitin. Curr. Biol. 12, R529–R531. 47 RAIBORG, C., GRONVOLD BACHE, K., MEHLUM, A., STANG, E., and STENMARK, H. (2001). Hrs recruits clathrin to early endosomes. EMBO J. 20, 5008–5021. 48 SACHSE, M., URBE´ , S., OORSCHOT, V., STROUS, G. J., and KLUMPERMAN, J. (2002). Bilayered clathrin coats on endosomal vacuoles are involved in protein sorting toward lysosomes. Mol. Biol. Cell 13, 1313–1328. 49 LLOYD, T. E., ATKINSON, R., WU, M. N., ZHOU, Y., PENNETTA, G., and BELLEN, H. J. (2002). Hrs regulates endosome membrane invagination and tyrosine kinase receptor signaling in Drosophila. Cell 108, 261–269. 50 BACHE, K. G., RAIBORG, C., MEHLUM, A., and STENMARK, H. (2003). STAM and Hrs are subunits of a multivalent ubiquitin-binding complex on early endosomes. J. Biol. Chem. 278, 12513–12521. 51 HAMMOND, D. E., CARTER, S., MCCULLOUGH, J., URBE´ , S., VANDE WOUDE, G., and CLAGUE, M. J. (2003). Endosomal dynamics of met determine signaling output. Mol. Biol. Cell 14, 1346–1354. 52 URBE´ , S., SACHSE, M., ROW, P. E., PREISINGER, C., BARR, F. A., STROUS, G., KLUMPERMAN, J., and CLAGUE, M. J. (2003). The UIM domain of Hrs couples receptor sorting to vesicle formation. J. Cell Sci. 116, 4169–4179. 53 CONFALONIERI, S., SALCINI, A. E., PURI, C., TACCHETTI, C., and Di FIORE, P. P. (2000). Tyrosine phosphorylation of Eps15 is required for ligand-regulated, but not constitutive, endocytosis. J. Cell Biol. 150, 905–912. 54 TORRISI, M. R., LOTTI, L. V., BELLEUDI, F., GRADINI, R., SALCINI, A. E.,
21
22 Protein Sorting to the Lysosome
55
56
57
58
59
60
61
62
63
CONFALONIERI, S., PELICCI, P. G., and Di FIORE, P. P. (1999). Eps15 is recruited to the plasma membrane upon epidermal growth factor receptor activation and localizes to components of the endocytic pathway during receptor internalization. Mol. Biol. Cell 10, 417–434. CLAGUE, M. J. and URBE´ , S. (2001). The interface of receptor trafficking and signalling. J. Cell Sci. 114, 3075– 3081. FAZIOLI, F., MINICHIELLO, L., MATOSKOVA, W. T., WONG, W. T., and Di FIORE, P. P. (1993). Eps15, a novel tyrosine kinase substrate exhibits transforming activity. Mol. Cell Biol. 13, 5814–5828. ROW, P. E., CLAGUE, M. J., and URBE, S. (2005). Growth factors induce differential phosphorylation profiles of the Hrs-Stam complex: a common node in signaling networks with signal specific properties. Biochem. J. 389, 629–636. BILODEAU, P. S., WINISTORFER, S. C., ALLAMAN, M. M., SURENDHRAN, K., KEARNEY, W. R., ROBERTSON, A. D., and PIPER, R. C. (2004). The GAT domains of clathrin-associated GGA proteins have two ubiquitin-binding motifs. J. Biol. Chem. 279, 54808–54816. PUERTOLLANO, R. (2005). Interactions of TOM1L1 with the multivesicular body sorting machinery. J. Biol. Chem. 280, 9258–9264. BONIFACINO, J. S. (2004). The GGA proteins: adaptors on the move. Nat. Rev. Mol. Cell Biol. 5, 23–32. SCOTT, P. M., BILODEAU, P. S., ZHDANKINA, O., WINISTORFER, S. C., HAUGLUND, M. J., ALLAMAN, M. M., KEARNEY, W. R., ROBERTSON, A. D., BOMAN, A. L., and PIPER, R. C. (2004). GGA proteins bind ubiquitin to facilitate sorting at the trans-Golgi network. Nat. Cell Biol. 6, 252–259. YAMAKAMI, M., YOSHIMORI, T., and YOKOSAWA, H. (2003). Tom1, a VHS domain-containing protein, interacts with tollip, ubiquitin, and clathrin. J. Biol. Chem. 278, 52865–52872. PUERTOLLANO, R., RANDAZZO, P. A., PRESLEY, J. F., HARTNELL, L. M., and BONIFACINO, J. S. (2001). The GGAs
64
65
66
67
68
69
70
71
72
promote ARF-dependent recruitment of clathrin to the TGN. Cell 105, 93–102. SHIBA, Y., KATOH, Y., SHIBA, T., YOSHINO, K., TAKATSU, H., KOBAYASHI, H., SHIN, H. W., WAKATSUKI, S., and NAKAYAMA, K. (2004). GAT (GGA and Tom1) domain responsible for ubiquitin binding and ubiquitination. J. Biol. Chem. 279, 7105–7111. ZHU, Y., DORAY, B., POUSSU, A., LEHTO, V. P., and KORNFELD, S. (2001). Binding of GGA2 to the lysosomal enzyme sorting motif of the mannose 6-phosphate receptor. Science 292, 1716–1718. DELL’ANGELICA, E. C., PUERTOLLANO, R., MULLINS, C., AGUILAR, R. C., VARGAS, J. D., HARTNELL, L. M., and BONIFACINO, J. S. (2000). GGAs: a family of ADP ribosylation factor-binding proteins related to adaptors and associated with the Golgi complex. J. Cell Biol. 149, 81–94. KATOH, Y., SHIBA, Y., MITSUHASHI, H., YANAGIDA, Y., TAKATSU, H., and NAKAYAMA, K. (2004). Tollip and Tom1 form a complex and recruit ubiquitin-conjugated proteins onto early endosomes. J. Biol. Chem. 279, 24435–24443. SEET, L. F., LIU, N., HANSON, B. J., and HONG, W. (2004). Endofin recruits TOM1 to endosomes. J. Biol. Chem. 279, 4670–4679. PORNILLOS, O., HIGGINSON, D. S., STRAY, K. M., FISHER, R. D., GARRUS, J. E., PAYNE, M., HE, G. P., WANG, H. E., MORHAM, S. G., and SUNDQUIST, W. I. (2003). HIV Gag mimics the Tsg101-recruiting activity of 4 Ubiquitin and Protein Sorting to the Lysosome the human Hrs protein. J. Cell Biol. 162, 425–434. PUERTOLLANO, R. and BONIFACINO, J. S. (2004). Interactions of GGA3 with the ubiquitin sorting machinery. Nat. Cell Biol. 6, 244–251. TRAUB, L. M. (2003). Sorting it out: AP-2 and alternate clathrin adaptors in endocytic cargo selection. J. Cell Biol. 163, 203–208. LI, L. and COHEN, S. N. (1996). Tsg101: a novel tumor susceptibility gene isolated
References
73
74
75
76
77
78
79
80
81
by controlled homozygous functional knockout of allelic loci in mammalian cells. Cell 85, 319–329. BABST, M., ODORIZZI, G., ESTEPA, E. J., and EMR, S. D. (2000). Mammalian tumour susceptibility gene 101 (TSG101) and the yeast homologue, Vps23p, both function in late endosomal trafficking. Traffic 1, 248–258. BISHOP, N., HORMAN, A., and WOODMAN, P. (2002). Mammalian class E vps proteins recognize ubiquitin and act in the removal of endosomal protein-ubiquitin conjugates. J. Cell Biol. 157, 91–101. BACHE, K. G., SLAGSVOLD, T., CABEZAS, A., ROSENDAL, K. R., RAIBORG, C., and STENMARK, H. (2004). The growth-regulatory protein HCRP1/hVps37A is a subunit of mammalian ESCRT-I and mediates receptor down-regulation. Mol. Biol. Cell 15, 4337–4346. STUCHELL, M. D., GARRUS, J. E., MULLER, B., STRAY, K. M., GHAFFARIAN, S., MCKINNON, R., KRAUSSLICH, H. G., MORHAM, S. G., and SUNDQUIST, W. I. (2004). The human endosomal sorting complex required for transport (ESCRT-I) and its role in HIV-1 budding. J. Biol. Chem. 279, 36059–36071. BACHE, K. G., BRECH, A., MEHLUM, A., and STENMARK, H. (2003). Hrs regulates multivesicular body formation via ESCRT recruitment to endosomes. J. Cell Biol. 162, 435–442. LU, Q., HOPE, L. W., BRASCH, M., REINHARD, C., and COHEN, S. N. (2003). TSG101 interaction with HRS mediates endosomal trafficking and receptor down-regulation. Proc. Natl. Acad. Sci. USA 100, 7626–7631. CLAGUE, M. J. and URBE´ , S. (2003). Hrs function: viruses provide the clue. Trends Cell Biol. 13, 603–606. ALAM, S. L., SUN, J., PAYNE, M., WELCH, B. D., BLAKE, B. K., DAVIS, D. R., MEYER, H. H., EMR, S. D., and SUNDQUIST, W. I. (2004). Ubiquitin interactions of NZF zinc fingers. Embo J. 23, 1411–1421. HIERRO, A., SUN, J., RUSNAK, A. S., KIM, J., PRAG, G., EMR, S. D., and HURLEY, J. H. (2004). Structure of the ESCRT-II
82
83
84
85
86
87
88
89
endosomal trafficking complex. Nature 431, 221–225. TEO, H., PERISIC, O., GONZALEZ, B., and WILLIAMS, R. L. (2004). ESCRT-II, an endosome-associated complex required for protein sorting: crystal structure and interactions with ESCRT-III and membranes. Dev. Cell 7, 559–569. SLAGSVOLD, T., AASLAND, R., HIRANO, S., BACHE, K. G., RAIBORG, C., TRAMBAIANO, D., WAKATSUKI, S., and STENMARK, H. (2005). Eap45 in mammalian ESCRT-II binds ubiquitin via a phosphoinositide-interacting GLUE domain. J. Biol. Chem. 280, 19600–19606. VON SCHWEDLER, U. K., STUCHELL, M., MULLER, B., WARD, D. M., CHUNG, H. Y., MORITA, E., WANG, H. E., DAVIS, T., HE, G. P., CIMBORA, D. M., SCOTT, A., KRANSSLICH, H. G., KAPLAN, J., MORHAM, S. G., and SUNDQUIST, W. J. (2003). The protein network of HIV budding. Cell 114, 701–713. BABST, M., KATZMANN, D. J., ESTEPA-SABAL, E. J., MEERLOO, T., and EMR, S. D. (2002). Escrt-III: an endosome-associated heterooligomeric protein complex required for mvb sorting. Dev. Cell 3, 271–282. BABST, M., KATZMANN, D. J., SNYDER, W. B., WENDLAND, B., and EMR, S. D. (2002). Endosome-associated complex, ESCRT-II, recruits transport machinery for protein sorting at the multivesicular body. Dev. Cell 3, 283–289. HOWARD, T. L., STAUFFER, D. R., DEGNIN, C. R., and HOLLENBERG, S. M. (2001). CHMP1 functions as a member of a newly defined family of vesicle trafficking proteins. J. Cell Sci. 114, 2395–2404. WHITLEY, P., REAVES, B. J., HASHIMOTO, M., RILEY, A. M., POTTER, B. V., and HOLMAN, G. D. (2003). Identification of mammalian Vps24p as an effector of phosphatidylinositol 3,5-bisphosphate-dependent endosome compartmentalization. J. Biol. Chem. 278, 38786–38795. KATOH, K., SHIBATA, H., SUZUKI, H., NARA, A., ISHIDOH, K., KOMINAMI, E., YOSHIMORI, T., and MAKI, M. (2003). The
23
24 Protein Sorting to the Lysosome
90
91
92
93
94
95
96
97
ALG-2-interacting protein Alix associates with CHMP4b, a human homologue of yeast Snf7 that is involved in multivesicular body sorting. J. Biol. Chem. 278, 39104–39113. LUHTALA, N. and ODORIZZI, G. (2004). Bro1 coordinates deubiquitination in the multivesicular body pathway by recruiting Doa4 to endosomes. J. Cell Biol. 166, 717–729. ODORIZZI, G., KATZMANN, D. J., BABST, M., AUDHYA, A., and EMR, S. D. (2003). Bro1 is an endosome-associated protein that functions in the MVB pathway in Saccharomyces cerevisiae. J. Cell Sci. 116, 1893–1903. BOWERS, K., LOTTRIDGE, J., HELLIWELL, S. B., GOLDTHWAITE, L. M., LUZIO, J. P., and STEVENS, T. H. (2004). Protein-protein interactions of ESCRT complexes in the yeast Saccharomyces cerevisiae. Traffic 5, 194–210. URBANOWSKI, J. L. and PIPER, R. C. (2001). Ubiquitin sorts proteins into the intralumenal degradative compartment of the late endosome/vacuole. Traffic 2, 622–630. AMERIK, A. Y., NOWAK, J., SWAMINATHAN, S., and HOCHSTRASSER, M. (2000). The DoA4 deubiquitinating enzyme is functionally linked to the vacuolar protein-sorting and endocytic pathways. Mol. Biol. Cell 11, 3365–3380. LIN, Y., KIMPLER, L. A., NAISMITH, T. V., LAUER, J. M., and HANSON, P. I. (2005). Interaction of the mammalian ESCRT-III protein hSnf7 1 with itself, membranes, and the AAAρ ATPase SKD1. J. Biol. Chem. 280, 12799–12809. PERIER, F., COULTER, K. L., LIANG, H., RADEKE, C. M., GABER, R. F., and VANDENBERG, C. A. (1994). Identification of a novel mammalian member of the NSF/CDC48p/Pas1p/TBP-1 family through heterologous expression in yeast. FEBS Lett. 351, 286–290. YOSHIMORI, T., YAMAGATA, F., YAMAMOTO, A., MIZUSHIMA, N., KABEYA, Y., NARA, A., MIWAKO, I., OHASHI, M., OHSUMI, M., and OHSUMI, Y. (2000). The mouse SKD1, a homologue of yeast Vps4p, is required for normal endosomal trafficking and morphology
98
99
100
101
102
103
104
105
106
in mammalian cells. Mol. Biol. Cell 11, 747–763. BABST, M., WENDLAND, B., ESTEPA, E. J., and EMR, S. D. (1998). The Vps4p AAA ATPase regulates membrane association of a Vps protein complex required for normal endosome function. Embo J. 17, 2982–2993. FUJITA, H., YAMANAKA, M., IMAMURA, K., TANAKA, Y., NARA, A., YOSHIMORI, T., YOKOTA, S., and HIMENO, M. (2003). A dominant negative form of the AAA ATPase SKD1/VPS4 impairs membrane trafficking out of endosomal/lysosomal compartments: class E vps phenotype in mammalian cells. J. Cell Sci. 116, 401–414. SACHSE, M., STROUS, G. J., and KLUMPERMAN, J. (2004). ATPase-deficient hVPS4 impairs formation of internal endosomal vesicles and stabilizes bilayered clathrin coats on endosomal vacuoles. J. Cell Sci. 117, 1699–1708. DUPRE, S., URBAN-GRIMAL, D., and HAGUENAUER-TSAPIS, R. (2004). Ubiquitin and endocytic internalization in yeast and animal cells. Biochim. Biophys. Acta 1695, 89–111. INGHAM, R. J., GISH, G., and PAWSON, T. (2004). The Nedd4 family of E3 ubiquitin ligases: functional diversity within a common modular architecture. Oncogene 23, 1972–1984. GALAN, J. M., MOREAU, V., ANDRE, B., VOLLAND, C., and HAGUENAUER-TSAPIS, R. (1996). Ubiquitination mediated by the Npi1p/Rsp5p ubiquitin-protein ligase is required for endocytosis of the yeast uracil permease. J. Biol. Chem. 271, 10946–10952. SPRINGAEL, J. Y. and ANDRE, B. (1998). Nitrogen-regulated ubiquitination of the Gap1 permease of Saccharomyces cerevisiae. Mol. Biol. Cell 9, 1253–1263. WANG, G., MCCAFFERY, J. M., WENDLAND, B., DUPRE, S., HAGUENAUER-TSAPIS, R., and HUIBREGTSE, J. M. (2001). Localization of the Rsp5p ubiquitin-protein ligase at multiple sites within the endocytic pathway. Mol. Cell Biol. 21, 3564–3575. DUNN, R., KLOS, D. A., ADLER, A. S., and HICKE, L. (2004). The C2 domain of the
References
107
108
109
110
111
112
113
114
Rsp5 ubiquitin ligase binds membrane phosphoinositides and directs ubiquitination of endosomal cargo. J. Cell Biol. 165, 135–144. MORVAN, J., FROISSARD, M., HAGUENAUER-TSAPIS, R., and URBAN-GRIMAL, D. (2004). The ubiquitin ligase Rsp5p is required for modification and sorting of membrane proteins into multivesicular bodies. Traffic 5, 383– 392. BLONDEL, M. O., MORVAN, J., DUPRE, S., URBAN-GRIMAL, D., HAGUENAUER-TSAPIS, R., and VOLLAND, C. (2004). Direct sorting of the yeast uracil permease to the endosomal system is controlled by uracil binding and Rsp5p-dependent ubiquitylation. Mol. Biol. Cell 15, 883–895. SAKATA, T., SAKAGUCHI, H., TSUDA, L., HIGASHITANI, A., AIGAKI, T., MATSUNO, K., and HAYASHI, S. (2004). Drosophila Nedd4 regulates endocytosis of notch and suppresses its ligand-independent activation. Curr. Biol. 14, 2228–2236. ROTIN, D., KANELIS, V., and SCHILD, L. (2001). Trafficking and cell surface stability of ENaC. Am. J. Physiol. Renal Physiol. 281, F391–F399. PERRY, W. L., HUSTAD, C. M., SWING, D. A., O’SULLIVAN, T. N., JENKINS, N. A., and COPELAND, N. G. (1998). The itchy locus encodes a novel ubiquitin protein ligase that is disrupted in a18H mice. Nat. Genet. 18, 143–146. MARCHESE, A., RAIBORG, C., SANTINI, F., KEEN, J. H., STENMARK, H., and BENOVIC, J. L. (2003). The E3 ubiquitin ligase AIP4 mediates ubiquitination and sorting of the G protein-coupled receptor CXCR4. Dev. Cell 5, 709–722. QIU, L., JOAZEIRO, C., FANG, N., WANG, H. Y., ELLY, C., ALTMAN, Y., FANG, D., HUNTER, T., and LIU, Y. C. (2000). Recognition and ubiquitination of Notch by Itch, a hect-type E3 ubiquitin ligase. J. Biol. Chem. 275, 35734–35737. ANGERS, A., RAMJAUN, A. R., and MCPHERSON, P. S. (2004). The HECT domain ligase itch ubiquitinates endophilin and localizes to the trans-Golgi network and endosomal system. J. Biol. Chem. 279, 11471–11479.
115 KATZ, M., SHTIEGMAN, K., TAL-OR, P., YAKIR, L., MOSESSON, Y., HARARI, D., MACHLUF, Y., ASAO, H., JOVIN, T., SUGAMURA, K., and YARDEN, Y. (2002). Ligand-independent degradation of epidermal growth factor receptor involves receptor ubiquitylation and Hgs, an adaptor whose ubiquitin-interacting motif targets ubiquitylation by Nedd4. Traffic 3, 740–751. 116 POLO, S., SIGISMUND, S., FARETTA, M., GUIDI, M., CAPUA, M. R., BOSSI, G., CHEN, H., De CAMILLI, P., and DI FIORE, P. P. (2002). A single motif responsible for ubiquitin recognition and monoubiquitination in endocytic proteins. Nature 416, 451–455. 117 THIEN, C. B. F. and LANGDON, W. Y. (2001). Cbl: many adaptions to regulate protein tyrosine kinases. Nature Rev. 2, 294–305. 118 PESCHARD, P. and PARK, M. (2003). Escape from Cbl-mediated downregulation: a recurrent theme for oncogenic deregulation of receptor tyrosine kinases. Cancer Cell 3, 519–523. 119 PESCHARD, P., FOURNIER, T. M., LAMORTE, L., NAUJOKAS, M. A., BAND, H., LANGDON, W. Y., and PARK, M. (2001). Mutation of the c-Cbl TKB domain binding site on the Met receptor tyrosine kinase converts it into a transforming protein. Mol. Cell 8, 995–1004. 120 LEVKOWITZ, G., WATERMAN, H., ZAMIR, E., KAM, Z., OVED, S., LANGDON, W. Y., BEGUINOT, L., GEIGER, B., and YARDEN, Y. (1998). c-Cbl/Sli-1 regulates endocytic sorting and ubiquitination of the epidermal growth factor receptor. Genes Dev. 12, 3663–3674. 121 THIEN, C. B. F., WALKER, F., and LANGDON, W. Y. (2001). Ring finger mutations that abolish c-Cbl directed polyubiquitination and down regulation of the EGF receptor are insufficient for cell transformation. Mol. Cell 7, 355–365. 122 DUAN, L., MIURA, Y., DIMRI, M., MAJUMDER, B., DODGE, I. L., REDDI, A. L., GHOSH, A., FERNANDES, N., ZHOU, P., MULLANE-ROBINSON, K., RAO, N., DONOGHUE, S., ROGERS, R. A., BOWTELL, D., NARAMURA, M., GU, H., BAND, V.,
25
26 Protein Sorting to the Lysosome
123
124
125
126
127
128
129
130
and BAND, H. (2003). Cbl-mediated ubiquitinylation is required for lysosomal sorting of epidermal growth factor receptor but is dispensable for endocytosis. J. Biol. Chem. 278, 28950–28960. de MELKER, A. A., van der HORST, G., and BORST, J. (2004). Ubiquitin ligase activity of c-Cbl guides the epidermal growth factor receptor into clathrin-coated pits by two distinct modes of Eps15 recruitment. J. Biol. Chem. 279, 55465–55473. HUANG, F. and SORKIN, A. (2005). Grb2-mediated Recruitment of the RING Domain of Cbl to the EGF Receptor Is Essential and Sufficient to Support Receptor Endocytosis. Mol. Biol. Cell. 16, 1268–1281. PETRELLI, A., GILESTRO, G. F., LANZARDO, S., COMOGLIO, P. M., MIGONE, N., and GIORDANO, S. (2002). The endophilin-CIN85-Cbl complex mediates ligand-dependent downregulation of c-Met. Nature 416, 187–190. SOUBEYRAN, P., KOWANETZ, K., SZYMKIEWICZ, I., LANGDON, W. Y., and DIKIC, I. (2002). Cbl-CIN85-endophilin complex mediates ligand-induced downregulation of EGF receptors. Nature 416, 183–187. SCHMITZ, C., KINNER, A., and KOLLING, R. (2005). The deubiquitinating enzyme Ubp1 affects sorting of the ABC-transporter Ste6 in the endocytic pathway. Mol. Biol. Cell. 16, 1319– 1329. KEE, Y., LYON, N., and HUIBREGTSE, J. M. (2005). The Rsp5 ubiquitin ligase is coupled to and antagonized by the Ubp2 deubiquitinating enzyme. Embo J. 24, 2414–2424. PAPA, F. R., AMERIK, A. Y., and HOCHSTRASSER, M. (1999). Interaction of the Doa4 deubiquitinating enzyme with the yeast 26S proteasome. Mol. Biol. Cell 10, 741–756. PAPA, F. R. and HOCHSTRASSER, M. (1993). The yeast DOA4 gene encodes a deubiquitinating enzyme related to a product of the human tre-2 oncogene. Nature 366, 313–319.
131 SWAMINATHAN, S., AMERIK, A. Y., and HOCHSTRASSER, M. (1999). The Doa4 deubiquitinating enzyme is required for ubiquitin homeostasis in yeast. Mol. Biol. Cell 10, 2583–2594. 132 NAVIGLIO, S., MATTECUCCI, C., MATOSKOVA, B., NAGASE, T., NOMURA, N., DI FIORE, P. P., and DRAETTA, G. F. (1998). UBPY: a growth-regulated human ubiquitin isopeptidase. Embo J. 17, 3241–3250. 133 KATO, M., MIYAZAWA, K., and KITAMURA, N. (2000). A de-ubiquitinating enzyme UBPY interacts with the SH3 domain of Hrs binding protein via a novel binding motif Px(V/I)(D/N)RxxKP. J. Biol. Chem. 275, 37481–37487. 134 KANEKO, T., KUMASAKA, T., GANBE, T., SATO, T., MIYAZAWA, K., KITAMURA, N., and TANAKA, N. (2003). Structural insight into modest binding of a non-PXXP ligand to the signal transducing adaptor molecule-2 Src homology 3 domain. J. Biol. Chem. 278, 48162–48168. 135 MIZUNO, E., IURA, T., MUKAI, A., YOSHIMORI, T., KITAMURA, N., and KOMADA, M. (2005). Regulation of EGF receptor down-regulation by UBPY-mediated deubiquitination at endosomes. Mol. Biol. Cell 16, 5163–5174. 136 BOWERS, K., PIPER, S. C., EDELING, M. A., GRAY, S. R., OWEN, D. J., LEHNER, P. J., and LUZIO, J. P. (2005). Degradation of endocytosed epidermal growth factor and virally-ubiquitinated MHC class I is independent of mammalian ESCRTII. J. Biol. Chem. 281, 5094–5105. 137 SOARES, L., SEROOGY, C., SKRENTA, H., ANANDASABAPATHY, N., LOVELACE, P., CHUNG, C. D., ENGLEMAN, E., and FATHMAN, C. G. (2004). Two isoforms of otubain 1 regulate T cell energy via GRAIL. Nat. Immunol. 5, 45–54. 138 WU, X., YEN, L., IRWIN, L., SWEENEY, C., and CARRAWAY, K. L., 3rd. (2004). Stabilization of the E3 ubiquitin ligase Nrdp1 by the deubiquitinating enzyme USP8. Mol. Cell Biol. 24, 7748–7757. 139 GNESUTTA, N., CERIANI, M., INNOCENTI, M., MAURI, I., ZIPPEL, R., STURANI, E., BORGONOVO, B., BERRUTI, G., and MARTEGANI, E. (2001). Cloning and
References
140
141
142
143
144
145
characterization of mouse UBPy, a deubiquitinating enzyme that interacts with the ras guanine nucleotide exchange factor CDC25(Mm)/Ras-GRF1. J. Biol. Chem. 276, 39448–39454. TANAKA, N., KANEKO, K., ASAO, H., KASAI, H., ENDO, Y., FUJITA, T., TAKESHITA, T., and SUGAMURA, K. (1999). Possible involvement of a novel STAM-associated molecule “AMSH” in intracellular signal transduction mediated by cytokines. J. Biol. Chem. 274, 19129–19135. MAYTAL-KIVITY, V., REIS, N., HOFMANN, K., and GLICKMAN, M. H. (2002). MPN+, a putative catalytic motif found in a subset of MPN domain proteins from eukaryotes and prokaryotes, is critical for Rpn11 function. BMC Biochem. 3, 28. VERMA, R., ARAVIND, L., OANIA, R., MCDONALD, W. H., YATES, J. R., 3rd, KOONIN, E. V., and DESHAIES, R. J. (2002). Role of Rpn11 metalloprotease in deubiquitination and degradation by the 26S proteasome. Science 298, 611–615. MCCULLOUGH, J., CLAGUE, M. J., and URBE´ , S. (2004). AMSH is an endosome-associated ubiquitin isopeptidase. J. Cell Biol. 166, 487–492. ISHII, N., OWADA, Y., YAMADA, M., MIURA, S., MURATA, K., ASAO, H., KONDO, H., and SUGAMURA, K. (2001). Loss of neurons in the hippocampus and cerebral cortex of AMSH-deficient mice. Mol. Cell Biol. 21, 8626–8637. ASADA, H., ISHII, N., SASAKI, Y., ENDO, K., KASAI, H., TANAKA, N., TAKESHITA, T.,
146
147
148
149
150
151
TSUCHIYA, S., KONNO, T., and SUGAMURA, K. (1999). Grf40, A novel Grb2 family member, is involved in T cell signaling through interaction with SLP-76 and LAT. J. Exp. Med. 189, 1383–1390. ITOH, F., ASAO, H., SUGAMURA, K., HELDIN, C. H., ten DIJKE, P., and ITOH, S. (2001). Promoting bone morphogenetic protein signaling through negative regulation of inhibitory Smads. Embo J. 20, 4132–4142. LI, H. and SETH, A. (2004). An RNF11: Smurf2 complex mediates ubiquitination of the AMSH protein. Oncogene 23, 1801–1808. ROCCA, A., LAMAZE, C., SUBTIL, A., and DAUTRY-VARSAT, A. (2001). Involvement of the ubiquitin/proteasome system in sorting of the Interleukin 2 receptor b chain to late endocytic compartments. Mol. Biol. Cell 12, 1293–1301. van KERKHOF, P., GOVERS, R., ALVES DOS SANTOS, C. M., and STROUS, G. J. (2000). Endocytosis and degradation of the growth hormone receptor are proteasome-dependent. J. Biol. Chem. 275, 1575–1580. HAMMOND, D. E., URBE´ , S., VANDE WOUDE, G. F., and CLAGUE, M. J. (2001). Downregulation of MET, the receptor for hepatocyte growth factor. Oncogene 20, 2761–2770. COLLEDGE, M., SNYDER, E. M., CROZIER, R. A., SODERLING, J. A., JIN, Y., LANGEBERG, L. K., LU, H., BEAR, M. F., and SCOTT, J. D. (2003). Ubiquitination regulates PSD-95 degradation and AMPA receptor surface expression. Neuron 40, 595–607.
27
1
Regulation of Proteolysis at the Proteasome Portal Monika Bajorek, and Michael H. Glickman
Technion — Israel Institute of Technology, Haifa, Israel
Originally published in: Protein Degradation, Volume 2. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31130-0 Abstract
The proteolytic active sites of the 26S proteasome are sequestered within the central chamber of its 20S catalytic core particle. Access to this chamber is through a narrow channel defined by the outer alpha subunits. An intricate lattice of interactions anchors the N-termini of these alpha subunits, blocking access to the channel in free 20S core particles of eukaryotes. Entry of substrates can be enhanced by attachment of activators or regulatory particles to the proteolytic 20S core. Regulatory particles rearrange the blocking residues to form an open pore and promote substrate entry into the proteolytic chamber. Channel gating is apparently partially rate limiting for proteasome activity, as facilitating substrate entry in the open channel state leads to enhanced overall proteolysis rates. Interestingly, some substrates, particularly hydrophobic ones, can activate gate opening themselves, thus facilitating their own destruction. Properties of channel gating and the interactions required to maintain stable closed and open conformations and their consequences for proteasome function are discussed.
1 Background
In eukaryotes, the 26S proteasome hydrolyzes most nuclear, cytoplasmic, and endoreticulum (ER) proteins into peptides of varying lengths. Normally, substrates destined for elimination are first covalently attached to multiple molecules of ubiquitin (Ub) – a process that is executed by a cascade of ubiquitinating enzymes specific for each class of substrate – and then recognized by the 26S proteasome, unfolded, translocated into the proteolytic chamber, and irreversibly degraded [1–3]. The somewhat simpler 20S proteasomes found in archaea and some bacteria (actinomycetes) can degrade and remove non-ubiquitinated proteins in a very similar Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Regulation of Proteolysis at the Proteasome Portal
manner, though recognition of substrates and anchoring them to the proteasome while they are prepared for degradation probably differ due to lack of the ubiquitin tag in these organisms [4, 5]. In either case, substrates are threaded through a narrow channel leading into the isolated internal chamber where the proteolytic active sites are located. A gated porthole at the entrance to this channel may play a defined role in controlling both the nature of preferred substrates and the rate at which they enter the proteasome lumen where they are irreversibly hydrolyzed (Figure 1). This chapter will focus on this gate and mechanisms of maintaining opened or closed states. The 26S proteasome is composed of two sub-complexes: the 20S core particle of the proteasome (CP) where proteolysis takes place and a 19S regulatory particle (RP) that prepares substrates for entry into the CP. A detailed description of proteasome structure and associated activities can be found in other chapters in this volume, as well as in many detailed reviews [6–15]. Pertinent to understanding regulation of substrate entry, the 20S CP is a cylindrical structure composed of four stacked heptameric rings engendering a sequestered proteolytic chamber. Each of the two outer rings is composed of seven α subunits, and each of the two identical inner rings is formed from seven β subunits. The β rings contain the proteolytic active sites, while the outer α rings define the channel leading into the internal proteolytic chamber [16–19]. Overall, the 20S CP forms a α 7 β 7 β 7 α 7 barrel structure [6]. The α and β rings of archaeal 20S proteasomes are made up of seven copies each of a single α or β subunit, giving the structure as a whole a sevenfold symmetry along the central axis. In contrast, there are seven distinct subunits in each ring of the 20S CP from eukaryotes. When visualized from outside, these subunits are labeled counterclockwise α1 through α7 and β1 through β7, respectively. Although the seven α or β subunits within the 20S CP of each species are structurally almost superimposable, their sequence identities are usually in the 20–40% range (see Ref. [7] and references therein). These differences are probably of significance as they are well maintained; sequences of orthologous subunits from different species are more than 55% identical [20, 21]. The purified 20S CP in its so-called latent form can slowly hydrolyze short or unstructured polypeptides as well as some proteins with hydrophobic or misfolded patches [22–25]. There is increasing evidence that degradation of unstructured nonubiquitinated proteins by the 20S CP may play biological roles in vivo as well [22, 26– 31]. To degrade ubiquitinated substrates, attachment of the ATPase-containing 19S RP to the surface of the α ring of the 20S CP is required [25, 32]. Attachment of 19S RP enhances the basal peptidase activity of proteasomes as well [33, 34]. In archaea and select prokaryotes, ATPase rings such as the proteasome-activating nucleotidase (PAN) complex or the AAA ATPase ring complex (ARC) serve as rudimentary regulatory particles that enhance proteolysis by 20S proteasomes but probably do not affect peptidase rates [35–38]. One manner by which such ATPase-containing regulatory particles activate proteolysis is by unfolding substrates and translocating them into the proteolytic chamber [36, 39–45]. However, other regulatory particles can also influence the proteolytic activity of the proteasome in an ATP-independent manner. For example, a number of non-ATPase activators – such as PA28, PA26,
1 Background
3
4 Regulation of Proteolysis at the Proteasome Portal
and 11S Reg complexes – attach to the 20S CP and enhance peptidase, but not proteolysis, rates [46–53]. Other conditions such as exposure to low ionic strength, sodium dodecyl sulfate (SDS), or small hydrophobic peptides can also activate hydrolysis rates and give rise to an activated 20S CP [18, 22, 24, 34, 54, 55]. These results (together with other observations) suggest that the latent 20S CP is found in a self-imposed repressed state and that it is possible to activate the intrinsic peptidase activity in an ATP-independent manner. Structural analysis revealed the nature of this auto-inhibition by identifying a gated channel leading substrates into the proteolytic chamber [17–19]. Alleviation of inhibition by opening the substrate channel for traffic is a basic feature of proteasomes, a key property necessary to understand their function.
2 The Importance of Channel Gating
In order to protect cells from mistaken degradation of random proteins, entry into the proteasome lumen must be strictly regulated. Indeed, access to the channel in ←−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− Fig. 1 A porthole into the proteasome. Top view presentations showing the surface structure of the " ring of free 20S CPs from various preparations: (A) mammalian, (B) mammalian in theoretical open state, (C) yeast (S. cerevisiae), (D) yeast in theoretical fully open state, (E) the "3N mutant from yeast, (F) yeast in open conformation imposed by attachment of the PA26S activator, (G) archaea (T. acidophilum), (H) bacteria (Rhodococcus erythropolis). Structures depicted in A, C, E, F, G, and H are the actual 2D determinations extracted from the crystal structure deposited in the Protein Data Bank (PDB) and visualized as a surface view with the Viewerlite program. Acidic residues are colored in red, alkali in blue, and hydrophobic in white. Note the dominance of acidic residues in the pore region of the closed state in yeast and mammalian 20S CPs. These structures indicate that the gross surface structure of eukaryotic proteasomes is remarkably similar, with slight structural divergence apparent in the archaeal and bacterial versions. The N-terminus upstream of threonine 13 in the archaeal complex (the gray region shown in Figure 2) is disordered and thus does not show up in the electron density map, giving the appearance of an open confor-
mation (G). Structures shown in B and D are models in which the electron density of the corresponding tail residues in each " subunit was deleted to mimic an open conformation. Complete removal of these tail segments unveils a pore in the open state. The actual crystal structure determination of the "3N 20S CP indicates that "3 is a pivotal subunit in controlling channel gating. Deletion of the "3 tail alone (E) causes disordering in neighboring subunits (up to the red arrow in Figure 2); however, some obstruction remains when compared with the theoretical fully open state (D). Deletion of the two opposing tails from the "3 and "7 subunits generates a fully activated complex, apparently due to removal of this residual obstruction [22]. Interestingly, attachment of PA26 to the "-ring surface rearranges the tail regions of each subunit, coupled with significant disordering (up to the blue arrow in Figure 2), thus imposing an open conformation (F). By deleting the electron density of the PA26 chains and focusing only on the structure of the 20S subunits, we depict in panel F the actual situation that occurs upon PA26 binding to 20S from yeast. Additional details can be found in the original publications [16–19, 55, 59].
2 The Importance of Channel Gating 5
the latent 20S CP of eukaryotes is restricted by the N-termini of the seven α subunits [17–19]. Each tail assumes a unique conformation while pointing inwards to the center of the ring [7, 9, 10]. Figures 1A and 1C describe the surface of the latent 20S CP as it may appear to an approaching substrate; the interlaced residues in latent proteasomes form a sealed surface obstructing passage of proteins and even short peptides. This property is a common feature of eukaryotic proteasomes and is well conserved between yeast and mammalian complexes. The blocking residues at the entrance to the proteolytic channel probably account for the repressed proteolytic activity of the latent 20S CP of eukaryotes [18, 56]. The behavior of archaeal proteasomes is somewhat different from that of the proteasomes of eukaryotes. In contrast to the sealed chamber in the eukaryotic complex, crystal structures of 20S proteasomes obtained from archaeal organisms distinctly find the proteasome in an open state. Disorder in the conformation of the first 12 amino acids of each α subunit creates a pore in the center of the α ring contiguous with the channel leading into the lumen (Figure 1G). This open state accounts for enhanced peptidase rates measured for this complex [16, 36, 57]. Despite the appearance of an open pore in the archaeal 20S CP (Figure 1), dynamic conformations of the α-subunit tails may partially restrict passage of intact proteins through the pore, necessitating mechanisms for activation of protease activity. For instance, by locking these residues in a stable, open conformation, regulatory complexes found in these organisms (such as PAN) are able to accelerate proteolysis of full-sized proteins [36, 43, 55, 57, 58]. Certain conditions might promote switching of the N-termini from an unstructured into a structured, open conformation without requiring a regulatory complex [57]. Apparently, a similar situation occurs for 20S proteasomes present in bacteria [59]. The N-termini encompassing the first eight amino acids of the subunits in the α ring of the Rhodococcus proteasome are disordered (Figure 2), creating the appearance of an open pore (Figure 1H). The ARC regulatory complex found in these organisms probably stimulates proteolytic activity by anchoring these tails into a static, open conformation, similarly to the role of PAN in archaeal organisms. Eukaryotic 20S CPs can also be found in an open conformation. Repeated freezethawing of purified 20S CPs; mild chemical treatments such as exposure to low ionic strength, to low levels of sodium dodecyl sulfate (SDS), or to short hydrophobic peptides; and attachment of regulatory complexes and even mutations in residues near the central pore all activate hydrolysis rates [18, 22, 24, 34, 54, 55]. Presumably, all these treatments lead to disordering in pore residues involved in gating the channel, thus opening up a porthole into the 20S CP (Figure 1B,D). Rearrangement of channel-blocking residues results in facilitated substrate access into the proteolytic chamber and activation of proteasomes. Activated 20S CPs can easily hydrolyze unfolded or hydrophobic proteins, in some instances more rapidly even than intact 26S holoenzymes [22, 23]. It should be noted that free 20S CPs can spontaneously switch between the latent “closed” and activated “open” conformations [60]; however, under physiological conditions it appears that the majority of free 20S CPs from eukaryotes are found in a closed and latent state.
6 Regulation of Proteolysis at the Proteasome Portal
Fig. 2 Sequence alignment of " subunit Ntermini. N-termini of the seven different " subunits from S. cerevisiae compared to archaeal and bacterial homologues. The seven a subunits ("1-"7) of yeast (Sc) are shown along with the single " subunit found in bacteria (Re) and archaea (Ta). In normal text, to the left of the bold region lie the tail sequences, which differ from one " subunit to the other. Residue numbers for the yeast and bacteria " subunits are assigned based on the alignment to archaea. Residues in the N-terminus to the arrow are disordered in "3N mutant (red arrow) and upon attachment of PA26S (blue arrow). The conserved residues forming the so-called canonical cluster (YDR-P-Y) documented
to play a role in stabilizing the closed or open conformations are underlined. The Nterminal sequence up to the first " helical structure of each CP " subunit (as determined by the crystal structure of the yeast CP [17, 18]) is shown in light gray. This region is homologous to the disordered segments in the N-terminal regions of the " subunit from T. acidophilum [16, 57]. The YD(R) motif in each tail is underlined. Note that in all subunits from yeast a tyrosine residue is present in the same location. In six of the seven subunits, an aspartate follows, and in three, an arginine completes the YDR motif. The remainder of the tail region is divergent between subunits.
Similar to the situation described above for archaeal proteasomes, a distinction can be made between the appearance of an open channel due to disordering in residues lining the pore region and one in which the blocking tails are secured in an open structure. Structural determination of a 20S CP–PA26 complex depicts the α subunit N-termini pointing away from the center of the ring, opening up an unobstructed porthole into the channel [55] (see also Figure 1). The conformation differences between the closed and open states of N-termini could explain how regulatory particles activate proteolytic activity by rearranging the blocking residues to facilitate substrate entry [43, 54, 55]. Regulatory complexes such as the 19S RP participate in channel gating and facilitate substrate entry. This activates proteolysis and allows the resulting proteasome holoenzyme to fulfill its role in regulated protein degradation [7]. Indeed, under standard growth conditions, the majority of proteasomes in yeast cells are found as 26S holoenzymes. That said, 20S CPs are abundant in certain cases and are found
3 A Porthole into the Proteasome 7
regularly in mammalian tissue. Given that 20S core particles are slower than 26S holoenzymes at hydrolyzing most test substrates and are unable to proteolyze polyubiquitinated proteins, it is unclear what the biological function of the 20S CP is, or whether it is an important player in cellular protein breakdown. Abating bulk protein degradation upon proteasome disassembly may be a requirement for survival under certain stress conditions. For instance, prolonged starvation, oxidative stress, and severe heat-induced damage result in dissociation of proteasome holoenzymes and elevated levels of 20S CPs [22, 28, 29, 61]. There is some evidence that the catalytically repressed 20S CP can serve as a reservoir of proteasome components for reassembly when resumption of proteolysis is needed. Nevertheless, free 20S CPs may play limited roles in degradation of unstructured or non-ubiquitinated proteins under both normal growth and stress conditions [22, 26, 28–31].
3 A Porthole into the Proteasome 3.1 The Closed State
The first 12 residues in the archaeal complex, right up to the first α helix in the protein (Figure 2), are naturally disordered, giving the impression of an open channel (Figure 1G). Similar disordering is found in the equivalent sections of α subunits in the bacterial 20S proteasome purified from Rhodococcus (Figure 1H). In comparison, the corresponding N-termini of the seven α subunits in eukaryotic complexes point towards the center of the ring, sealing the entrance to the proteolytic channel (Figure 1A,C). In eukaryotes, the paralogous subunits within the α ring show structural and sequence similarities over the bulk of the protein, yet diverge at their amino-terminal region in both sequence and relative length (Figure 2). Precisely at the center of the ring, each tail accepts a unique conformation, stabilizing a well-defined and non-symmetric closed configuration [7] (see also Figure 3). The tail of α3 is somewhat distinct from the others in that it points directly across the surface of the α ring towards the center, maintaining close contact to every other α subunit. The importance of these tail regions is highlighted by their extreme conservation across eukaryotes; while each tail is highly conserved in different species, the corresponding regions are divergent from one subunit to another [7]. These properties suggest that the N-termini play a critical structural role that has been maintained in core particles in all eukaryotes, and it is precisely their differences that are integral to their function. Low-energy bonds formed between specific tail residues are critical for stabilizing the open or closed conformations. For example, in latent 20S CPs, aspartate at position 9 in the N-terminus of α3 contacts both tyrosine 8 and arginine 10 in neighboring α4 [18]. A salt bridge is formed between the carboxylate group of aspartate 9 in α3 and the guanidinium group of arginine 10 in α4, simultaneous with a hydrogen bond linking aspartate 9 of α3 with tyrosine 8 of α4 [18]. Similar
8 Regulation of Proteolysis at the Proteasome Portal
Fig. 3 A gate-and-latch system determines open and closed states of the substrate channel. Yeast 20S CP in a latent closed conformation (A) and in an open state imposed by the PA26S activator (B). At left a top view of the seven-member " ring of the 20S CP is shown. All " subunits are color coded, starting with "1 (light blue) on top and running counterclockwise to "7 (red). On the right, a side view focusing only on two opposing subunits – "3 (pink) and "7 (red) – highlights the conformational
switch that occurs in the N-termini (black) between the closed state (A) to an open state (B) upon binding of PA26. Note that the N-terminal tails of "3 and "7 adopt different conformations in the closed state and point inwards to block access through the channel (A). In contrast, in the open state induced by PA26S, the tails adopt a new ordered conformation, pointing away from the channel region. A portion of the N-terminus (up to the blue arrow in Figure 2) is disordered and invisible in the crystal structure.
bonds probably link the analogous residues in mammalian 20S CPs [19]. Embedded within the N-terminal segments of most α subunits is a short consensus sequence: Tyr8-Asp9-Arg10 or “the YDR motif ” (Figure 2). Conservation of tyrosine at position 8 is absolute among subunits in yeast (and is invariable at this location in most subunits of other organisms as well); aspartate at position 9 is present in the tail of all yeast subunits except for α2, while conservation of arginine as residue number 10 is less strict. The direct contacts formed between these residues in adjacent subunits may explain their correlated evolutionary conservation. Interactions involving YDR residues could be critical for maintaining distinct open and closed conformations of 20S proteasomes. Finding YDR residues in all subunits that form the α ring makes it somewhat puzzling that crystal structure determination did not pick out similar contacts between other neighboring subunits in the pore region. This raises the possibility that the interaction between α3 and α4 plays a unique and central role in maintaining the closed conformation of the proteasome. Support for the pivotal role of α3 in gating the 20S CP channel was provided upon truncation of the tail region of α3. Truncation of the N-terminus of the α3 subunit in yeast (the α3N mutant) resulted in a purified 20S CP that was found in the open pore conformation
3 A Porthole into the Proteasome 9
(Figures 1 and 3). Furthermore, removal of the N-terminus of α3 caused disordering in neighboring subunits concomitant with stimulation of 20S CP peptidase activity. These results indicate that the tail of α3 is important for stabilizing neighboring tails in the closed conformation. Moreover, an aspartate-to-alanine substitution in the YDR motif of α3 (the α3 D9A mutant) appears to increase peptidase activity of purified 20S CP in vitro, on par with the activation observed upon deletion of the entire tail region in α3N [18]. Both mutations associated with α3 point to a functional significance of the YDR motif in stabilization of the closed state of the gate. Interestingly, the YDR sequence is found intact in the α subunits from various archaea, even though the tail regions are not anchored in the latent state of 20S proteasomes from these organisms [16, 57]. This observation points to a wider role for α-subunit tail interactions in defining proteasome conformation. 3.2 The Open State
In order for substrates to enter the proteolytic chamber, and most likely for products to exit as well, the blocking N-terminal residues of the α subunits in the closed state must be rearranged. Rearrangement obviously necessitates breaking of the interactions that anchor the tails in the closed conformation, while forming competing interactions to stabilize them in an open conformation. For example, removal of the first nine residues at the N-terminus of the α3 subunit in yeast breaks stabilizing interactions with neighboring subunits, causing significant disordering in neighboring tails (Figure 2) and leading to an open pore roughly 13 Å across at the center of the α-ring surface (Figure 1F). Disordering alone is insufficient to allow unobstructed entry through the pore, as no enhancement of protein degradation rates has been observed so far with proteasomes purified from this mutant [18, 22]. Each α subunit plays a unique role in gating. Deletion of the equivalent N-terminal tail of α7 does not significantly increase the peptidase activity compared to wild type, pointing to a significant role for α3 [22]. Because of its peripheral location at the α-ring surface, truncation of α7 alone may not result in sufficient loss of order in neighboring tails to generate an opening wide enough for entry of small peptides. However, deletion of tails of two opposing α subunits (the α3α7N strain; [22]) may act synergistically to relieve hindrance of entry of proteins into the proteolytic chamber. Thus, gating of the proteolytic channel emerges as a pivotal property in regulating proteolysis rates. In fact, activated proteasomes may require more than just a disordered state of channel gating residues. Blocking residues may need to be removed (as in the truncation studies described in the preceding paragraph) or anchored into a stable, open conformation to relieve obstruction of the channel. Studies on archaeal proteasomes point to interactions involving YDR residues as influencing channel opening rather than stabilizing the closed state. While the N-terminal tails (12 amino acids) of each subunit of the α ring are disordered, they nevertheless occupy the pore region and impose a partial barrier to passage of protein substrates [16, 57]. The lack of defined electron density reflects that under the experimental conditions
10 Regulation of Proteolysis at the Proteasome Portal
mutual interactions were not strong enough to anchor these tails into a single stable conformation, resulting in multiple possible conformations of these segments between individual molecules in the sample. Evidence that the YDR motif may play a role in stabilizing the open state was provided by a combined mutagenesis and biochemical study of archaeal proteasomes. Purified archaeal 20S proteasomes slowly hydrolyze GFP tagged with a short C-terminal extension, GFP-ssrA. The rate is accelerated upon addition of the archaeal proteasome activator PAN. However, PAN was unable to activate proteolysis of GFP-ssrA by 20S proteasomes from the archaeon T. acidophilum that were mutated at positions tyrosine 8 or arginine 9 (of the YDR motif in the N-tail regions) of their α subunit [55]. It was suggested that these residues are required to stabilize the tails in an open conformation that is preferred upon attachment of PAN. A stably open conformation was observed in proteasome assembly precursors as well. During proteasome biogenesis, the seven α subunits form an intermediate homomeric ring, the α 7 ring, which only then interacts with β subunits to yield the mature complex with α 7 β 7 composition [62]. In contrast to mature archaeal 20S CPs, structure determination of such an α 7 ring precursor from the archaeon Archaeoglobus fulgidus found the N-terminal segments anchored in a stable, open state [57]. In this conformation, the tail regions that contain the YDR motif adopted a helical structure motif and pointed away from the ring surface: tyrosine 8 of each subunit made a hydrogen bond with aspartate 9 of the preceding α subunit, whereas arginine 10 pointed inwards towards the central channel and did not partake in anchoring neighboring tails. The region N-terminal to tyrosine 8 of each subunit was disordered, creating the appearance of a pore roughly 13 Å in diameter contiguous with the substrate channel. In mature archaeal 20S CPs, the interactions between the N-termini of the α subunits appear to be broken, causing disordering in a greater portion of the tail regions up to residue number 12 (inclusive). These disordered tails, which are not stably anchored, partially block the pore and interfere with passage of proteins into the proteasome lumen. Eukaryotic proteasomes can be found in an open state as well. Crystallography analysis of a PA26–20S CP complex depicted the α ring in an ordered, open, symmetric conformation [55]. Attachment of PA26 induces all seven of the α subunit N-terminal tails to adopt an ordered conformation for residues 7–12 away from the center of the ring. A cluster of four highly conserved residues, Tyr8 and Asp9 (part of the YDR motif in the tail region) together with downstream residues Pro17 and Tyr26 (in the first stable alpha helix; HO), is critical for the open state. Attachment of PA26 to the surface of the α ring repositions Pro17, which in turn induces a conformational change in tail segments to lift up and away from the center of the ring (Figure 2). In stark contrast to the closed state, the open state is remarkably symmetric, with all tails conforming to a similar structure held in place by similar interactions. In this open state, Asp9 forms a hydrogen bond with Tyr26 of the same subunit. Cooperativity between subunits is communicated via an additional hydrogen bond linking Asp9 of one subunit and Tyr8 of the preceding (clockwise) tail [55]. Consequently, the hydrogen bond that holds Asp9 and Tyr8 of α3 and α4, respectively, in the closed state must be broken and rearranged to allow for the open state in which
4 Facilitating Traffic Through the Gated Channel 11
Asp9 of α3 now interacts with Tyr8 of α2. The term “proteasome gating” refers to switching between these two conformations. A principal role of regulatory particles is to promote channel opening by stabilizing one conformation over the other. The interactions occurring between different subunits in the pore region of the eukaryotic PA26–20S CP complex are quite similar to those found in the α 7 -ring precursor complex of archaea, which is also found in a stable, open conformation [57]. Asp9 in each subunit interacts with Tyr8 of the preceding (clockwise) tail in α 7 rings, leading to a symmetric, open structure. Asp9 of each α subunit also interacts with the downstream Tyr26 residue of the same subunit, repositioning the tail away from the center of the ring and up into the cavity of the docking proteasome activator complex (PA26 in this case). This conformation can be seen clearly in a side view of the 20S CP complex shown in Figure 3. The mesh of internal interactions among α subunits described in the preceding paragraph and the apparent lack of stable interactions between α-subunit tail residues with the proteasome activator suggest that a stable, open conformation is an intrinsic property of proteasomes and may explain how, under some conditions, 20S core particles can spontaneously adopt the activated form without need for attachment of an activator complex [60]. Nevertheless, comparative studies show that the 20S CP from Rhodococcus is also found in an open conformation [59], even though the α subunits in this organism do not contain any of the signature YD or PY residues in the tail or HO regions (Figures 1, 2). Apparently, the semblance of an open state does not absolutely require interactions among these residues, suggesting additional mechanisms for stabilizing the closed or open states and switching between them. Additional studies will have to be completed for a clear understanding of gating mechanisms.
4 Facilitating Traffic Through the Gated Channel 4.1 Regulatory Complexes
As mentioned above, a number of ATP-independent activators are known to attach to the 20S CP and activate its peptidase and protease activities. These include the 11S Reg/PA28, PA26, and PA200 [46–53]. Attachment of PA28, for example, increases Vmax for hydrolysis of certain peptides by the 20S CP by up to 100-fold, but in contrast to ATPase-containing activators (such as the 19S RP or PAN), PA28 does not promote protein degradation by the 20S CP [46, 63]. Activation of peptidase activity by non-ATPase-containing regulators can be attributed to imposing an open-channel conformation and facilitated substrate entry upon attachment of the regulatory complex. For example, activation has been documented for the PA26–20S hybrid complex, formed in vitro from 20S CP from S. cerevisiae and PA26 from T. brucei [55]. S. cerevisiae apparently lacks natural homologues of this class of activators, but it is assumed that attachment of other symmetric activators induces
12 Regulation of Proteolysis at the Proteasome Portal
a similar conformational change. For example, the symmetric PAN complex found in archaea appears to drive formation of the open conformation of archaeal proteasomes in much the same manner [55]. Attachment of PAN expedites proteolysis by wild-type archaeal 20S proteasomes, yet is unable to activate proteasomes mutant in any of the channel cluster residues (Tyr8, Asp9, Pro17 and Tyr26; see Section 3.2). This result suggests that each residue in the conserved YD-P-Y cluster is critical for stabilizing the open conformation in activated proteasomes. Mutations in these residues may lead to disordered tails that are unable to “lock” into an open conformation even upon PAN attachment, thus impinging on substrate entry. Activation is not achieved merely by realigning the residues that obstruct traffic through the alpha ring in the closed state, but necessitates locking the α tails into a stable, open conformation. So far, it is unclear whether asymmetric activators, such as the 19S RP, open the channel in a manner similar to that of the symmetric examples given by PA26 and PAN. However, evidence linking the 19S RP to gating can be deduced from a substitution mutation in the ATP-binding site of a single ATPase (RPT2) that severely affects peptidase activity of the proteasome, probably due to hampering the ability of the RP to properly gate the channel into the CP [56, 64]. This observation indicates that even the entry of small peptides – which do not need to be unfolded – can be controlled by the RP. Furthermore, a constitutively open-channel 20S CP generated upon deletions of tail residues in the α3 and α7 subunits (α3α7N mutant) exhibits activated peptidase activity, similar to that measured for 26S proteasome holoenzymes [22]. The implication is that attachment of the 19S RP realigns the α-subunit tails to facilitate passage of substrates, thus enhancing proteolysis rates. It should be emphasized, however, that in contrast to the homomeric rings found in PA26 or PAN, the 19S RP is a heterogeneous complex that contains six different ATPases. The two classes of regulators may induce different conformational changes on the sevenfold symmetry of the α ring. Furthermore, it has not yet been verified that the six ATPases indeed form a six-member ring at the base of the 19S RP. A limited subset of Rpt subunits have been found to come in direct contact with an α subunit (α2–Rpt4, α2–Rpt5, α4–Rpt4, α7–Rpt4, α1–Rpt6, α2–Rpt6, α4–Rpt2, α6–Rpt4; [65– 68]). The pair Rpt2–α3 has been shown to be involved in gating the channel into the CP [18, 56, 64], though gating may be controlled by additional Rpt–α subunit interactions. As there does not appear to be simple pairing of each Rpt with a single α subunit, it is possible that the conformation adopted by α-subunit tails in the 26S proteasome holoenzyme will differ from the “symmetric” open state observed for the PA26–20S CP hybrid. 4.2 Substrate-facilitated Traffic
Recent studies show that some natively disordered proteins can enter the proteasome without assistance of ATP-dependent activators [26–28, 30]. The ability of latent 20S CP to catalyze cleavage of some unfolded proteins suggests that they may directly interact with the α ring to promote gating to facilitate their own entry.
5 Conclusions 13
This mechanism is not general: latent 20S CPs degrade most substrates slower than activated proteasomes [22]. Nevertheless, some substrates with unfolded domains or hydrophobic patches are degraded rapidly by latent 20S CP, faster even than by 26S holoenzymes [23]. Presumably, sequence motifs in the substrate interact with channel-gating residues in α subunits and aid in channel opening. For example, p21 and α-synuclein facilitate their own degradation and, when fused to stable and hard-to-degrade proteins such as GFP, promote their degradation as well indicating that gating sequences are transferable. Support for such a mechanism can be deduced from certain peptides that interact with the proteasome in a noncompetitive way to modulate the proteolytic activity of the proteasome [24, 69]. This stimulation was not observed for open-channel proteasomes (such as α3N or the PA26–20S CP complex), suggesting that they specifically interact with channel-gating residues and promote channel opening. Whether these interactions involve the YDR motif or the YD-P-Y cluster in α subunits, similar to the manner by which regulatory particles activate proteasome activity, has not been elucidated. Interestingly, the pore region of the 20S CP, in the closed state, exposes only hydrophobic or negatively charged side chains. No positively charged groups are present on the surface of the α ring in eukaryotes (Figure 1). Thus, as a substrate approaches the surface of the α ring, it “sees” a predominantly negatively charged surface. This may explain how the latent 20S CP discriminates between substrates and why certain unstructured substrates with hydrophobic or positively charge stretches may interact with gating residues in the pore region and facilitate their own translocation inwards, whereas others do not [23]. For example, hydrolysis of casein – which is highly phosphorylated and carries multiple negatively charged groups – is remarkably slow by latent 20S CP, yet can be dramatically accelerated upon channel opening possibly by removal of tail residues as in Figure 1E [22].
5 Conclusions
It appears that proteasome-dependent proteolysis is a regulated process that can be enhanced or inhibited under certain conditions. There are reports that the pro-teasome itself can be a target of such regulation [22, 29, 61, 70]. Indeed, enhancement of overall in vivo proteolysis rates observed in the open-channel mutant indicates that the proteasome may be partially rate limiting in the overall cascade of ubiquitin-dependent protein degradation [22]. Polyubiquitinated substrates must be stable enough, even if only transiently, to allow for competition between degradation and reversal of fate. Channel gating within 26S holoenzymes may participate in the delicate balance between proteolysis and rescue. A function of a gated channel leading into the CP is to impose inhibition during assembly of the mature CP. In the final stage of CP assembly, selfcompartmentalization is achieved by the association of two α 7 β 7 half-CPs at the β-β interface. These half-CPs are inactive due to propeptides in the critical β subunits that mask their active site. As these half-CPs are joined, inhibition
14 Regulation of Proteolysis at the Proteasome Portal
by β-subunit N-termini is relieved by autolysis [17, 62], while inhibition by the blocking N-termini of the α subunits is imposed. Binding of the regulatory particles relieves this inhibition by opening the channel and thus activating proteolysis. There is increasing evidence that (at least in yeast) certain stress constitutions such as prolonged starvation or severe heat shock naturally promote proteasome dissociation into separate 20S CP and 19S RP subcomponents [22, 29, 61, 70]. These conditions may require repressed proteasome-dependent degradation for survival. One manner by which proteasome activity could be downregulated is by reinstating autoinhibition of the dissociated 20S CP. Indeed, the open-channel mutant that lacks the ability to enter the closed conformation exhibits low viability under conditions that promote proteasome disassembly [22]. An additional reason for a gated channel could be to regulate exit of products from the proteasome. It is possible that under normal conditions product release is slowed down by a gated channel in order to increase processivity or to decrease average peptide length. Most of these short peptides are quickly removed from the cytoplasm. Under certain conditions (such as during immune response) it might be beneficial to produce peptides with other lengths or properties. For example, upon interferon-γ induction, attachment of PA28/11S Reg plays a role in antigen processing by altering the makeup of peptides generated by the hybrid proteasome 19S RP–20S CP–11S Reg complexes [52, 53, 71, 72]. In analogy to the distantly related PA26, PA28 probably attaches to the α-ring surface and rearranges the blocking N-termini, promoting the open-channel conformation. It is possible that the open state increases the exit rate of peptides generated in the proteolytic chamber and alters their makeup to fit better antigen-presentation requirements.
References 1 GLICKMAN, M. H. and A. CIECHANOVER, The Ubiquitin-proteasome Proteolytic Pathway: Destruction for the sake of construction. Physiol. Rev., 2002. 82: p. 373–428. 2 PICKART, C. M., Mechanisms Underlying Ubiquitination. Annu. Rev. Biochem., 2001. 70: p. 503–533. 3 WEISSMAN, A. M., Themes and variations on ubiquitylation. Nature Rev. Cell Mol. Biol., 2001. 2 (3): p. 169–179. 4 MAUPIN-FURLOW, J. A., et al., Proteasomes in the archaea: from structure to function. Front. Biosci., 2000. 5: p. D837–D865. 5 De MOT, R., et al., Proteasomes and other self-compartmentalizing proeases in prokaryotes. Trends Microbiol., 1999. 7 (2): p. 88–92. 6 PICKART, C. M. and R. E. COHEN, Proteasomes and their kin: Proteases in
7
8
9
10
11
the machine Age. Nat. Rev. Mol. Cell Biol., 2004. 5 (3): p. 177–187. BAJOREK, M. and M. H. GLICKMAN, Proteasome regulatory particles: Keepers of the gates. Cell. Mol. Life Sci., 2004. 61 (13): p. 1579–1588. GUTERMAN, A. and M. H. GLICKMAN, Deubiquitinating enzymes are IN(trinsic to proteasome function). Curr. Prot. Pep. Sci., 2004. 5 (3): p. 201–210. GROLL, M. and T. CLAUSEN, Molecular shredders: how proteasomes fulfill their role. Current Opinion in Structural Biology, 2003. 13 (6): p. 665–673. GROLL, M. and R. HUBER, Substrate access and processing by the 20S proteasome core particle. Int. J. Biochem. Cell Biol., 2003. 35 (5): p. 606–616. GLICKMAN, M. H. and V. MAYTAL, Regulating the 26S proteasome., in Curr.
References
12
13
14
15
16
17
18
19
20
21
22
Top. Microbiol. and Immunol., P. ZWICKL and W. BAUMEISTER, Editors. 2002, Springer-Verlag: Germany. p. 43–72. ZWICKL, P. and E. SEEMUELLER, 20S proteasomes, in Curr. Topics Microbiol. Immunol., P. ZWICKL and W. BAUMEISTER, Editors. 2002, Springer-Verlag: Germany. p. 23–41. HEINEMEYER, W. and D. H. WOLF, Active sites and assembly of the 20S proteasome., in Proteasomes: The world of regulatory proteolysis, D. H. WOLF and W. HILT, Editors. 2000, Eurekah.com/ LANDES BIOSCIENCE Publishing Company: Georgetown, Texas USA. p. 48–70. BOCHTLER, M., et al., The Proteasome. Annu. Rev. Biophys. Biomol. Struct., 1999. 28: p. 295–317. VOGES, D., P. ZWICKL, and W. BAUMEISTER, THE 26S PROTEA-SOME: A Molecular Machine Designed for Controlled Proteolysis. Annu. Rev. Biochem, 1999. 68: p. 1015–1068. LOEWE, J., et al., Crystal structure of the 20S proteasome from the archeon T. acidophilum at 3.4 Angstrom resolution. Science, 1995. 268: p. 533–539. GROLL, M., et al., Structure of 20S proteasome from yeast at a 2.4 Angstrom resolution. Nature, 1997. 386: p. 463–471. GROLL, M., et al., A gated channel into the core particle of the proteasome. Nat. Struct. Biol., 2000. 7: p. 1062–1067. UNNO, M., et al., The structure of the mammalian 20S proteasome at 2.75A resolution. Structure, 2002. 10: p. 609–618. VOLKER, C. and A. LUPAS, Molecular Evolution of Proteasomes, in Curr. Top. Microbiol. and Immunol., P. ZWICKL and W. BAUMEISTER, Editors. 2002, Springer-Verlag: Berlin Heidelberg. p. 1–22. GILLE, C., et al., A Comprehensive View on Proteasomal Sequences: Implications for the Evolution of the Proteasome. Journal of Molecular Biology, 2003. 326 (5): p. 1437–1448. BAJOREK, M., D. FINLEY, and M. H. GLICKMAN, Proteasome Disassembly and Downregulation Is Correlated with Viability during Stationary Phase. Current Biology, 2003. 13 (13): p. 1140–1144.
23 LIU, C.-W., et al., Endoproteolytic activity of the proteasome. Science, 2003. 299: p. 408–411. 24 KISSELEV, A. F., D. KAGANOVICH, and A. L. GOLDBERG, Binding of hydrophobic peptides to several non-catalytic sites promotes peptide hydrolysis by all active sites of 20 S proteasomes. Evidence for peptide-induced channel opening in the alpha-rings. J. Biol. Chem., 2002. 277: p. 22260–22270. 25 GLICKMAN, M. H., et al., A subcomplex of the proteasome regulatory particle required for ubiquitin-conjugate degradation and related to the COP9/Signalosome and eIF3. Cell, 1998. 94 (5): p. 615–623. 26 ORLOWSKI, M. and S. WILK, Ubiquitin-independent proteolytic functions of the proteasome. Archives of Biochemistry and Biophysics, 2003. 415 (1): p. 1–5. 27 FORSTER, A. and C. P. HILL, Proteasome degradation: enter the substrate. Trends in Cell Biology, 2003. 13 (11): p. 550–553. 28 GRUNE, T., et al., Selective degradation of oxidatively modified protein substrates by the proteasome. Biochemical and Biophysical Research Communications, 2003. 305 (3): p. 709–718. 29 SHRINGARPURE, R., et al., Ubiquitin Conjugation Is Not Required for the Degradation of Oxidized Proteins by Proteasome. J. Biol. Chem., 2003. 278 (1): p. 311–318. 30 ASHER, G., et al., Mdm-2 and ubiquitin-independent p53 proteasomal degradation regulated by NQO1. PNAS, 2002. 99 (20): p. 13125–13130. 31 HOYT, M. A. and P. COFFINO, Ubiquitin-free routes into the proteasome. Cell. Mol. Life Sci., 2004. 61 (13): p. 1596–1600. 32 GUTERMAN, A. and M. H. GLICKMAN, Complementary roles for Rpn11 and Ubp6 in deubiquitination and proteolysis by the proteasome. J. Biol. Chem., 2004. 279 (3): p. 1729–1738. 33 ADAMS, G. M., et al., Formation of proteasome-PA700 complexes directly correlates with activation of peptidase activity. Biochem., 1998. 37: p. 12927–12932.
15
16 Regulation of Proteolysis at the Proteasome Portal
34 GLICKMAN, M. H., et al., The regulatory particle of the S. cerevisiae proteasome. Mol. Cell. Biol., 1998. 18: p. 3149– 3162. 35 FROHLICH, K. U., An AAA family tree. J. Cell Sci., 2001. 114: p. 1601–1602. 36 ZWICKL, P., et al., An archaebacterial ATPase, homologous to ATPases in the eukaryotic 26 S proteasome, activates protein breakdown by 20 S proteasomes. J. Biol. Chem., 1999. 274: p. 26008– 26014. 37 WOLF, S., et al., Characterization of ARC, a divergent member of the AAA ATPase family from Rhodococcus erythropolis. J. Mol. Biol., 1998. 277: p. 13–25. 38 WOLLENBERG, K. and J. C. SWAFFIELD, Evolution of proteasomal ATPases. Mol. Biol. Evol., 2001. 18 (6): p. 962–974. 39 BRAUN, B. C., et al., The base of the proteasome regulatory particle exhibits chaperone-like activity. Nat. Cell Biol., 1999. 1: p. 221–226. 40 STRICKLAND, E., et al., Recognition of misfolding proteins by PA700, the regulatory subcomplex of the 26S proteasome. J. Biol. Chem., 2000. 275: p. 5565–5572. 41 NAVON, A. and A. L. GOLDBERG, Proteins are unfolded on the surface of the ATPase ring before transport into the proteasome. Mol. Cell, 2001. 8 (6): p. 1339–1349. 42 LIU, C.-W., et al., Conformational remodeling of proteasomal substrates by PA700, the 19S regulatory complex of the 26S Proteasome. J. Biol. Chem., 2002. 277 (30): p. 26815–26820. 43 BENAROUDJ, N., et al., ATP hydrolysis by the proteasome regulatory complex PAN serves multiple functions in protein degradation. Mol. Cell, 2003. 11: p. 69–78. 44 OGURA, T. and K. TANAKA, Dissecting various ATP-dependent steps involved in proteasomal degradation. Mol. Cell, 2003. 11 (1): p. 3–5. 45 KENNISTON, J. A., et al., Studies with a prokaryotic protease show that the cost of translocation consumes much of the energy that is used in degradation. Cell, 2003. 114: p. 511–520. 46 MA, C. P., C. A. SLAUGHTER, and G. N. DEMARTINO, Identification, purification,
47
48
49
50
51
52
53
54
55
56
57
and characterization of a protein activator (PA28) of the 20S proteasome. J. Biol. Chem., 1992. 267: p. 10515–10523. REALINI, C., et al., Characterization of recombinant REGα, REGβ, and REGγ proteasome activators. J. Biol. Chem., 1997. 272: p. 25483–25492. HENDIL, K. B., S. KHAN, and K. TANAKA, Simultaneous binding of PA28 and PA700 activators to 20S proteasomes. Biochem. J., 1998. 332: p. 749–754. YAO, Y., et al., Structural and functional characterizations of the proteasome-activating protein PA26 from Trypanosoma brucei. J. Biol. Chem., 1999. 274 (48): p. 33921–33930. MCCUTCHEN-MALONEY, S. L., et al., cDNA cloning, expression, and functional characterization of PI31, a proline-rich inhibitor of the proteasome. J. Biol. Chem., 2000. 275: p. 18557– 18565. USTRELL, V., et al., PA200, a nuclear proteasome activator involved in DNA repair. EMBO J., 2002. 21 (13): p. 3516–3525. RECHSTEINER, M., C. REALINI, and V. USTRELL, The proteasome activator 11S REG (PA28) and class I antigen presentation. Biochem. J., 2000. 345: p. 1–15. STOHWASSER, R., et al., Kinetic evidences for facilitation of peptide channeling by the proteasomal activator PA28. Eur. J. Biochem., 2000. 267: p. 6221–6230. KOHLER, A., et al., The substrate translocation channel of the proteasome. Biochimie, 2001. 83(3–4): p. 325–332. FORSTER, A., F. G. WHITBY, and C. P. HILL, The pore of activated 20S proteasomes has an ordered 7-fold symmetric conformation. EMBO J., 2003. 22 (17): p. 4356–4364. KOEHLER, A., et al., The axial channel of the proteasome core particle is gated by the Rpt2 ATPase and controls both substrate entry and product release. Mol. Cell, 2001. 7: p. 1143–1152. GROLL, M., et al., Investigations on the Maturation and Regulation of Archaebacterial Proteasomes+ Journal of Molecular Biology, 2003. 327(1 SU): p. 75–83.
References 58 WILSON, H. L., et al., Biochemical and physical properties of the M. jannaschii 20S proteasome and PAN, a homolog of the ATPase (Rpt) subunits of the eucaryal 26S proteasome. J. Bacteriol., 2000. 186 (6): p. 1680–1692. 59 KWON, Y. D., et al., Crystal structures of the Rhodococcus proteasome with and without its pro-peptides: implications for the role of the pro-peptide in proteasome assembly. J. Mol. Biol., 2004. 335: p. 233–245. 60 OSMULSKI, P. A. and M. GACZYNSKA, Nanoenzymology of the 20S proteasome: Proteasomal actions are controlled by the allosteric transition. Biochem., 2002. 41: p. 7047–7053. 61 IMAI, J., et al., The molecular chaperone Hsp90 plays a role in the assembly and maintenance of the 26S proteasome. EMBO J., 2003. 22 (14): p. 3557–3567. 62 CHEN, P. and M. HOCHSTRASSER, Autocatalytic subunit processing couples active site formation in the 20S proteasome to completion of assembly. Cell, 1996. 86: p. 961–972. 63 DUBIEL, W., et al., Purification of an 11S regulator of the multicatalytic protease. J. Biol. Chem., 1992. 267 (31): p. 22369–22377. 64 RUBIN, D. M., et al., Active site mutants in the six regulatory particle ATPases reveal multiple roles for ATP in the proteasome. EMBO J., 1998. 17 (17): p. 4909–4919.
65 COUX, O., An interaction map of proteasome subunits. Biochem. Soc. Trans., 2003. 31: p. 465–469. 66 DAVY, A., et al., A protein-protein map of the C. elegans 26S proteasome. EMBO Rep., 2001. 2 (9): p. 821–828. 67 FU, H. Y., et al., Subunit interaction maps for the regulatory particle of the 26s proteasome and the cop9 signalosome reveal a conserved core structure. EMBO J., 2001. 20 (24): p. 7096–7107. 68 FERRELL, K., et al., Regulatory subunit interactions of the 26S proteasome, a complex problem. Trends Biochem. Sci., 2000. 25 (2): p. 83–88. 69 PAPAPOSTOLOU, D., O. COUX, and M. REBOUD-RAVAUX, Regulation of the 26S proteasome activities by peptides mimicking cleavage products*1. Biochemical and Biophysical Research Communications, 2002. 295 (5): p. 1090–1095. 70 ZHANG, F., et al., O-GlcNAc modification is an endogenous inhibitor of the proteasome. Cell, 2003. 115 (6): p. 715–725. 71 GROETTRUP, M., et al., A role for the proteasome regulator PA28a in antigen presentation. Nature, 1996. 381: p. 166–168. 72 CASCIO, P., et al., Properties of the hybrid form of the 26S proteasome containing both 19S and PA28 complexes. EMBO J., 2002. 21 (11): p. 2636–2645.
17
1
Tailoring Protein Scaffolds for Ligand Recognition A. Skerra
Originally published in: Protein-Ligand Interactions.Edited by Hans-Joachim B¨ohm, Gisbert Schneider. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30521-6
1 Introduction
A major goal of medicinal chemistry is the design of low-molecular-weight ligands that bind to target proteins in a tight and specific manner. In the case of enzymes, these ligands act as inhibitors or allosteric effectors, while in the case of transmembrane receptors, they serve as agonistic or antagonistic signaling molecules. Ligands of these types have conventionally been derived from natural compound libraries and, more recently, via combinatorial synthesis. The quickly growing number of proteins with known three-dimensional structure and the significant methodological improvements in the structural elucidation of proteins during the past decade [1] – employing X-ray crystallography or nuclear magnetic resonance (NMR) techniques – have strongly promoted the computer-aided drug-design approach. Especially enzyme inhibitors can now be readily constructed on the basis of structural information about the target macromolecule [2]. Nevertheless, in the case of receptor targets, the rational prediction of cognate compounds is still hampered due to the inherent difficulties associated with their crystallization or NMR study. An inverse task is given when there is demand for a macromolecule that specifically binds a small ligand. This question has only recently been addressed by peptide chemistry. For example, antiparallel bundles of four α-helices, which were assembled on a cyclic peptide structure as template, have been used to create hydrophobic cavities for heme as a low-molecular-weight compound [3]. The specific complexation of FeIII ·protoporphyrin IX was facilitated by the proper positioning of liganding His residues. While this approach could be interesting from the perspective of rational protein design, it may be limited to special applications, and detailed structural information about the complex is not yet available. Deeper mechanistic insight into the molecular recognition of small molecules has been gained from antibodies, a class of natural proteins that have traditionally Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Tailoring Protein Scaffolds for Ligand Recognition
served as specific binding agents for a variety of “hapten” ligands [4]. Numerous practical applications exist for such antibodies in the fields of medical diagnostics as well as bioanalytics, where so-called immunochemical methods provide a quick, inexpensive, and reliable method for the sensitive detection of metabolites and even xenobiotic compounds [5]. There are also examples of hapten-binding antibodies with clinical use, e.g., in the therapeutic treatment of poisoning with cardiac steroids like digoxin. In this case an Fab fragment of an antibody with high affinity towards the small molecule is administered, which removes the free compound from blood circulation, prevents it from binding to the cell surface receptor, and makes it amenable to renal filtration or degradation in the liver [6]. With respect to the generation of cognate ligand-receptor proteins, one disadvantage is that low-molecular-weight compounds as such cannot be directly used for the immunization of animals. Rather, these haptens must be conjugated to macromolecular carriers in order to elicit an effective immune response. Nevertheless, attempts to generate antibodies with high affinities and specificities against small ligands have often remained unsuccessful. Two potential problems need to be considered in this context. First, when antibodies are to be raised against metabolically occurring substances, they might interfere with physiological processes. Furthermore, if the compounds are toxic, immunization may not be possible at all. Second, and more generally, antibodies were probably evolved by the immune system mainly for the recognition of proteins or other macromolecular targets (like nucleic acids or oligosaccharides) rather than low-molecular-weight compounds. This notion is supported by the crystal structures of various antibody fragments in complex with either antigens or haptens. In the first case an extended interface is formed between the antigen-binding site of the antibody (the paratope) and the macromolecular target. Typically, a surface of approximately 800 Å2 is buried and at least five of the six hypervariable loops (complementarity-determining regions, CDRs) – possibly even together with residues from the structurally conserved framework regions of the antibody variable domains – are involved [7, 8]. The shape of the combining site is often flat but also can be slightly concave or convex. In contrast, in the case of haptens the mode of interaction with the paratope is much more restricted because a pocket for the ligand needs to be formed in order to provide a sufficient number of interactions that ensure tight complex formation. This pocket is usually located at the interface between the pair of variable domains from the light and heavy chains of the antibody. Hence, a cleft must be formed whose shape is mainly determined by the two CDR-3 loops protruding from the VH and VL domains, which are related by a pseudo C2 -symmetry axis. Because a minimal hydrophobic contact area between VH and VL is required in order to maintain the non-covalent domain association, the size of this pocket is limited, and many ligands therefore become just partially buried when bound to an antibody (see Section 4). In addition to this structural consideration, there are empirical observations from recombinant antibody technology indicating that it is difficult to generate an antibody fragment with exquisite specificity towards a hapten using combinatorial libraries cloned from unimmunized donor gene pools or derived from synthetic genes. In contrast, the in vitro selection of high-affinity antibodies against proteins is
1 Introduction 3
Fig. 1 Three types of scaffolds, with a convex, flat, and concave interface, respectively. (Left) Camel VH H domain, here presenting its extended CDR-3 loop towards the active site of an enzyme (PDB entry 1MEL). (Middle) Protein A with its interface made of two "-helices directed against an Ig constant domain (PDB entry 1BDD). (Right) Bilin-
binding protein with its pocket formed by four loops at the open end of the $-barrel structure for the complexation of biliverdin IX( , (PDB entry 1BBP). Loops or amino acid positions that are important for the molecular recognition of the respective target and that may be amenable to side chain exchanges are colored.
nowadays a routine procedure. Consequently, the recruitment of alternative protein classes for the generation of small ligand receptors has attracted attention [4]. To this end, the concept of using a scaffold – which means a protein architecture with high intrinsic stability – to create a binding site for the specific interaction with the target molecule has gained interest. An appropriate protein scaffold should provide a rigid folding unit that spatially brings together several exposed loops that form a continuous and extended interface such that multiple interaction with the target and hence tight binding are ensured. Ideally, such a scaffold should have structurally partitioned the generic information and stability of its polypeptide fold on the one hand and the local shape and molecular recognition function of its active site on the other (Fig. 1). Initially, this approach has had remarkable success in the generation of artificial binding proteins towards “protein antigens” (for a general review, see [9]). Several single domain proteins that belong to the generic immunoglobulin (Ig) fold, thus supporting a set of two or three hypervariable peptide loops on one end of a sandwich of β-sheets, have proven to be suitable for the recognition of such macromolecular targets. Prominent examples include an individual fibronectin III domain [10] as well as certain VH domains derived from camel or llama Ig [11], which constitute soluble globular proteins even in the absence of a cognate VL domain. Typically, these scaffold proteins exhibit a wedge-shaped structure with the set of variable loops located at the tip in close mutual neighborhood (Fig. 1). Therefore, they seem to be particularly suited for complex formation at a groove on the surface of the target protein. In many cases this corresponds to the active site, and, indeed, effective enzyme inhibitors have been generated using the camel VH -domain approach. In contrast, the structural complexation of small molecules is difficult to achieve with this scaffold. So far only cameloid antibodies recognizing rather large azo dye compounds as haptens have been described [12].
4 Tailoring Protein Scaffolds for Ligand Recognition
Another type of scaffold that has been successfully used for the recognition of prescribed target proteins originates from the bacterial immunoglobulin receptor protein A. So-called “affibodies” were obtained by reshaping the natural Ig-binding interface of the Z domain of protein A, which is formed by a side-by-side pair of α-helices [13]. An essentially flat surface is generated in this manner (Fig. 1), which can probably pack against a patch with low curvature on the target protein. Compared with the generation of recombinant receptor proteins against macromolecular targets, the recognition of small, hapten-like compounds obviously poses a greater challenge. To this end, a pocket with complementary shape must be created in order to enable the burial of a significant area of hydrophobic surface and to provide a sufficient number of protein-ligand interactions – van der Waals contacts, hydrogen bonds, and possibly salt bridges – such that practically useful dissociation constants in the nanomolar range result. In fact, these stringent demands still make it rather difficult to apply rational design principles to the creation of cognate receptor proteins; instead, their construction has to rely on the powerful methodology of combinatorial biochemistry that is available today. Nevertheless, there exist only a few protein families in nature whose function lies in the plain complex formation with small molecules – as opposed to their biochemical conversion by enzymes or to the triggering of cellular signals via membrane or nuclear receptors. One example is given by the periplasmic nutrient-binding proteins that are found in Escherichia coli and other Gram-negative bacteria, comprising a variety of proteins with specificities for sugars, amino acids, and essential inorganic ions such as phosphate and sulfate [14]. These proteins serve for the transient complexation of their cognate compounds, followed by controlled delivery to transporter proteins that reside in the inner bacterial plasma membrane. Yet, despite their similarity in function, their sizes vary considerably and the mechanism of ligand complexation usually involves several distinct globular domains. Similarly, streptavidin from Streptomyces avidinii – and also its eukaryotic counterpart avidin, which occurs in chicken egg white – has evolved only in order to tightly complex biotin, a small vitamin compound [15]. In this case the complexation is kinetically almost irreversible, which makes sense for its role as a bacterial antibiotic protein and has led to its widespread use as a biochemical reagent (for references, see [16]). The lipocalins constitute another family of secretory ligand-binding proteins, which are typical for higher organisms. Initially, they were discovered in vertebrates, such as the retinol-binding protein (RBP) in man [17], but in fact lipocalins are found in a variety of eukaryotes and even in bacteria [18–20]. Generally, they serve for the transport or storage of poorly soluble or chemically sensitive compounds. Although their primary structures mostly lack detectable homology, structural analyses revealed a common fold for these proteins, comprising a rigid β-barrel as the central element of the lipocalin architecture [21]. The ligand is bound at the open end of this supersecondary structure, where a set of four loops forms the entrance to a structurally well-defined pocket. Consequently, lipocalins have emerged as an attractive scaffold with potential for the engineering of artificial ligand-binding proteins.
2 Lipocalins: A Class of Natural Compound Carriers
2 Lipocalins: A Class of Natural Compound Carriers
The first lipocalin whose 3-D structure was solved and refined at high resolution was the human plasma retinol-binding protein (RBP) [22, 23]. RBP acts as a natural transporter of vitamin A (retinol) in the blood of vertebrates. Upon complexation in a hydrophobic cavity with complementary shape, the poorly soluble terpenoid alcohol becomes packaged by the protein and protected from oxidation or double-bond isomerization. RBP is synthesized in the liver and directly loaded with the ligand in the hepatocyte, where retinol is stored. Furthermore, the holo-RBP forms a structurally defined ternary complex with transthyretin [24], also known as prealbumin. After delivery of the retinol ligand to a target tissue, the complex decomposes and the monomeric apo-RBP becomes filtered out by the kidney and degraded. In the crystal structure, RBP folds into a single globular domain of approximately 40 Å in diameter (Fig. 2) whose central part is made of an eight-stranded, up- anddown β-barrel. At the amino-terminal end, the β-sheet region is flanked by a coiled peptide segment, and at the carboxy-terminal end, it is followed by an α-helix and an amino acid stretch in a more or less extended conformation. Within the β-barrel the anti-parallel strands (assigned A to H) are arranged in a (+1)7 topology. They wind in a right-handed and conical manner around a central axis such that part of the first strand A is hydrogen bonded via its backbone to the last strand H again. One end of the β-barrel is closed by the amino-terminal peptide segment that runs across its bottom between the two short loops connecting strands B/C and F/G, respectively, before it enters into β-strand A. Dense packing of side chains in this region and within the adjacent interior of the barrel structure leads to the formation of a hydrophobic core. The other end of the β-barrel is open to the solvent and forms a characteristic pocket. In the case of RBP, retinol is encapsulated as a ligand and protrudes into the barrel by almost half of its depth. The entrance to the pocket is formed by a set of four loops, which connect the eight antiparallel β-strands in a pairwise fashion. Because of the chalice-like shape of the protein (Fig. 2) and since many members of this family complex lipophilic compounds, the term “lipocalins” was proposed [25]. Several other lipocalins whose tertiary structures have been elucidated adopt a very similar fold. These were dubbed “prototypic” lipocalins [21] in order to distinguish them from more distantly related members of the family [18]. Within this subset, especially the β-barrel with the attached α-helix is highly conserved. In contrast, the four loops that form the entrance to the ligand pocket vary considerably in sequence, conformation, and length, thus effecting the differing ligand specificities (Fig. 2). However, not all lipocalins need to complex a small ligand in order to fulfill their physiological role. In aphrodisin, for example, which acts as a strong pheromone on male hamsters, the polypeptide itself seems to be responsible for the biological activity, thus requiring transfer of the non-volatile macromolecule by physical contact [26]. Even though a potential ligand pocket was recently detected in its crystal
5
6 Tailoring Protein Scaffolds for Ligand Recognition
Fig. 2 Generic structure of lipocalins. (Top) Ribbon representation of the retinol-binding protein with the bound vitamin A (PDB entry 1RBP). The four loops are shown in dark red at the open end of the $-barrel, and the three characteristic disulfide bonds of the
RBP are highlighted. (Bottom) Superposition of six natural lipocalins with diverse ligand specificities (PDB entries 1BBP, 1BEB, 1BJ7, 1EPA, 1MUP, 1RBP). The $-barrel is colored black (for detailed description, see [21]).
2 Lipocalins: A Class of Natural Compound Carriers
structure [27], attempts to identify a cognate low-molecular-weight molecule have failed. For some other lipocalins promiscuous binding of hydrophobic ligands was assumed [18, 28]. In the case of human apolipoprotein D (ApoD), for example, which occurs as a peripheral protein of the high-density lipoprotein (HDL) complex in serum, a whole series of potential ligands has been discussed [29]. Yet, thorough binding studies with the rigorously purified recombinant protein revealed just two ligands to be complexed with approximately micromolar dissociation constants: progesterone and arachidonic acid [30]. Because progesterone is well discriminated by ApoD against related steroids, such as pregnenolone and testosterone, it seems likely that arachidonic acid is recognized at a different binding site. Lipocalins are typical secretory proteins containing disulfide bonds. Human RBP possesses the maximal number of three disulfide cross-links that was observed so far. One of them joins the carboxy-terminal end of the polypeptide chain to the βbarrel (Cys70 –Cys174 ). Another one fixes the amino-terminal segment of the protein to the carboxy-terminal end of the α-helix (Cys4 –Cys160 ). The third disulfide bond (Cys120 –Cys129 ) links the two neighboring strands G and H just underneath loop # 4 at the open end of the β-barrel (cf Fig. 2). The latter two disulfide bridges are characteristic for RBP. Although a Cys residue close to the amino-terminus is also found in several other lipocalins, it usually forms a disulfide bond with a Cys residue in strand G of the β-barrel, as in the bilin-binding protein (BBP), which carries two disulfide bonds [31, 32]. The C-terminal disulfide bond is obviously conserved in the lipocalin family, especially in those members that possess just one of them, e.g., the human neutrophil gelatinase-associated lipocalin (hNGAL) [33, 34]. Yet, there are certain local deviations as in the BBP, where this link is made to a Cys residue in strand B instead of strand D, as in RBP. Some lipocalins do not possess disulfide bonds at all, e.g., the bacterial lipocalin [20]. Hence, it seems that stabilization of the lipocalin architecture does not generally necessitate disulfide cross-links, contrasting with the immunoglobulin fold [35]. Many lipocalins are abundant in serum or tissue fluids. However, their glycosylation status varies. Human RBP, for example, is not glycosylated, whereas ApoD from human plasma was shown to be glycosylated at both of its potential N-glycosylation sites, Asn45 and Asn78 [36] – possibly in contrast with other tissues where this lipocalin is also expressed. Nevertheless, when synthesized as a recombinant protein in E. coli, via secretion into the bacterial periplasm, the unglycosylated ApoD can be isolated as a soluble and functional protein [30]. Similarly, hNGAL, which is normally glycosylated at a single position, can be obtained as an unglycosylated protein from E. coli and adopts its proper tertiary structure, as elucidated by NMR analysis [33]. Finally, many lipocalins exist as soluble monomers. RBP, for example, which participates in a reversible complex formation with transthyretin, can be isolated as a fully stable monomer, either in complex with retinol or in the absence of the ligand [37]. Interestingly, several mammalian lipocalins appear to be linked via a disulfide bond to other functional protein complexes. For example, human ApoD carries
7
8 Tailoring Protein Scaffolds for Ligand Recognition
a fifth Cys residue in addition to those giving rise to its two intra-chain disulfide bonds. Its unpaired thiol side chain is connected to a Cys residue of apolipoprotein A-II, which is an integral lipoprotein of the HDL particle. When this residue is removed by site-directed mutagenesis, the recombinant ApoD can be isolated in a soluble monomeric state [30]. Moreover, human ApoD is naturally produced as an individual protein in some other tissue fluids [29], and in other organisms, such as rabbits, the unpaired Cys residue is even missing. Similarly, hNGAL normally occurs cross-linked with gelatinase B (known as matrix metalloproteinase 9), but also as a monomeric or homodimeric serum protein, and it was successfully produced in a monomeric state for structural studies [33]. Taken together, lipocalins provide attractive candidates in order to engineer novel ligand specificities. Features like their small size (typically between 150 and 180 residues), monomeric polypeptide composition, dispensable posttranslational modification, and robust protein fold not only facilitate protein-engineering studies but also provide advantages for practical applications.
3 Anticalins: Lipocalins Reshaped via Combinatorial Biotechnology
In a first attempt to tailor the ligand pocket of a lipocalin, the bilin-binding protein (BBP) served as a biochemically well-characterized model protein. The BBP originally occurs as a secretory protein in the butterfly Pieris brassicae, where it complexes biliverdin IXγ , a metabolic oxidation product of protoporphyrin IX. Hence, it serves for coloration as well as photoprotection, especially at the larval state. Natural BBP is found in two isoforms [32]. BBP-I forms a dimer in solution, whereas BBP-II, which likely arises from deamidation of the amino-terminal Asn residue of BBP-I, adopts a stable monomeric state. Genetic analysis revealed that only BBP-I is encoded on the insect chromosome [38]. After fusion of the mature part of the polypeptide chain to a bacterial leader peptide, the apo-BBP could be produced in the periplasm of E. coli as a recombinant protein in a functional state. For this purpose the amino-terminal Asn residue was directly exchanged by Asp at the genetic level and a monomeric BBP with full ligand-binding capability was obtained [38]. The crystal structure of the natural holo-BBP was elucidated at high resolution [31, 32]. It revealed the characteristic β-barrel fold with the tetrapyrrole ligand bound in a helical conformation (Fig. 3). Compared with human RBP, the four loops give room to a wider and shallower pocket for biliverdin. Consequently, BBP appeared to be a promising candidate for the reshaping of its ligand-binding site towards a variety of target compounds. To this end, the methodology of combinatorial biochemistry was applied (Fig. 3), comprising steps of (1) directed random mutagenesis of the loop regions in order to create a molecular library and (2) selection of cognate binding proteins from this library against a prescribed ligand. Based on the 3-D structure of BBP, a total of 16 amino acid positions were identified within the four loop segments – as well as adjoining regions of the β-strands – that dominate the interface with the natural ligand (Fig. 3). These
3 Anticalins: Lipocalins Reshaped via Combinatorial Biotechnology
Fig. 3 Generation of anticalins by randomization and selection. (Top) Randomized amino acid positions (corresponding side chains are shown in light blue) in the BBP, depicted here with the bound biliverdin in dark blue. (Bottom) Principle of the molecular library selection.
positions were chosen to fulfill two criteria: first, they could be expected to tolerate both small and large side chain substitutions, and second, they appeared to reach as deeply as possible into the binding pocket. Hence, one could assess whether the hydrophobic core in the deeper part of the β-barrel would still be functional with respect to the stable folding of corresponding mutants. The 16 positions in the cloned BBP cDNA [38] were subjected to site-directed random mutagenesis using a two-step assembly polymerase chain reaction (PCR) with the help of primer oligodeoxynucleotides that were synthesized with mixed bases at the mutagenic codon positions. In order to introduce unique restriction
9
10 Tailoring Protein Scaffolds for Ligand Recognition
sites at both ends of the amplified central fragment of the BBP structural gene (both for BstXI, but with mutually non-compatible overhangs), two amino acids had to be exchanged at positions belonging to the β-barrel: Asn21 → Gln and Lys135 → Met. In addition, the recombinant BBP carried the mutation Asn1 → Asp mentioned above and Lys87 → Ser, which was introduced in order to remove a proteolytic cleavage site [39]. Thus, there were altogether four fixed amino acid replacements in addition to the randomized side chains. The mutagenized gene cassette was then inserted into an appropriate E. coli vector and a genetic library comprising 3.7 × 108 variants was prepared [40]. The phagemid-display technique [41] was employed in order to select BBP variants with novel binding specificities from the resulting library [39]. For this purpose the BBP variants were produced as fusion proteins with a bacterial signal peptide at the amino-terminus and with the Strep-tag II, followed by a truncated pIII phage coat protein, at the carboxy-terminus [40]. In this case the amino acids 217 to 406 of the gene III product from filamentous phage M13 were used. The whole fusion gene was cloned on a phasmid vector under the tight transcriptional control of the chemically inducible tetracycline promoter [42] so that phagemid particles displaying BBP variants on their surface were efficiently produced under appropriate conditions. Fluorescein, a well known immunological hapten [43] with many applications in biochemistry and biophysics and a collection of commercially available derivatives, served as the prescribed ligand for BBP variants in the first selection study. The phagemid random library was used for panning on a plastic surface coated with a covalent conjugate of fluorescein with bovine serum albumin (BSA). After six cycles of adsorption, acid elution, and phagemid re-amplification, the specific enrichment of a mutant phagemid fraction was observed. From DNA sequence analysis of 10 arbitrarily chosen clones, it became apparent that just four different BBP variants were still present in this population. Three of them – dubbed FluA, FluB, and FluC – gave rise to strong signals for the binding of several fluorescein conjugates when produced as soluble proteins and investigated in an ELISA. In each of these variants, all 16 randomized amino acids had been exchanged when compared with the wild-type BBP [40]. Interestingly, four substitutions were identical among the three selected variants: Arg58 , Arg95 , Arg116 , and His127 . The corresponding preponderance of positively charged side chains was in agreement with the several negative charge centers of the fluorescein derivative that was used in the selection. A similar effect had been observed in attempts to select recombinant antibody fragments against the same hapten from a semi-synthetic combinatorial library [44]. The BBP variant FluA was subjected to detailed biochemical characterization. The engineered lipocalin could be produced at high yield in the periplasm of E. coli (9.1 mg per 2 L culture compared with 1.2 mg for wild-type BBP) and isolated to homogeneity in one step via the Strep-tag method [45]. According to the relative shift in electrophoretic mobility between oxidized and reduced state of the protein, the two disulfide bonds of the BBP scaffold were correctly formed. Furthermore, the far UV circular dichroism spectrum of FluA revealed no marked differences
3 Anticalins: Lipocalins Reshaped via Combinatorial Biotechnology
from the wild-type protein. Consequently, the BBP had tolerated 16 amino acid exchanges within its ligand pocket – plus the four rationally introduced mutations mentioned above – without losing its folding properties as a lipocalin. The novel ligand-binding activity of FluA was studied in ELISA experiments with fluorescein coupled to different carrier proteins. In each case steep saturation curves were observed with half-maximal concentrations in the nanomolar range, so that the recognition of the hapten appeared to be independent of the macromolecular context. Thermodynamic dissociation constants for the complex formation between FluA and fluorescein and some related compounds were determined by fluorescence titration in solution, measuring the emission of the protein’s Tyr and Trp residues. As a result, fluorescein was bound slightly stronger than its two derivatives 4-aminofluorescein and 4-glutarylamidofluorescein, the compounds that actually had been used in the synthesis of the protein conjugates for the selection experiments. In contrast, pyrogallol red, a chemically similar triphenylmethane compound, was bound two orders of magnitude less tightly, while no binding at all could be detected for the related dyes phenolphthalein or rhodamine B. Interestingly, when the titration was performed such that the hapten’s own characteristic fluorescence was measured, almost complete quenching was observed. From this very accurate titration experiment, a K D value of 35.2 ± 3.2 nM was determined for the FluA·fluorescein complex [40]. Using the same spectroscopic effect, the association kinetics between fluorescein and FluA could be measured by rapid mixing, yielding a K on value of 5.28 ±0.05 × 106 M−1 s−1 (G. Beste and A. Skerra, unpublished). The phenomenon of almost complete fluorescence quenching upon complex formation between fluorescein and the engineered lipocalin FluA was elucidated in a series of time-resolved light-absorption measurements after pulse activation of the bound ligand [46]. These experiments revealed an ultrafast electron transfer between the fluorescein and an aromatic side chain (Tyr or Trp) in its close proximity. The excited fluorescein dianion within the ligand pocket abstracts an electron from the neighboring amino acid at a rate of 400 fs. The resulting radical trianion is deactivated in a radiationless process – with a larger time constant of 4 ps – to the spectroscopic ground state of fluorescein by back transfer of the electron. The observed monoexponentiality in the formation of the excited state, and also of its subsequent decay, points toward a high structural definition of the hapten-binding site and explains the highly efficient quenching effect that becomes apparent under stationary conditions. Clearly, this spectroscopic phenomenon was a serendipitous event because there was no corresponding selection applied during the generation of FluA. Also, the other mutants that were selected along with it did not show fluorescence quenching to the same high extent. Nevertheless, it is remarkable that such an efficient electron transfer process, which is even faster (by a factor 3–4) than the one measured between bacteriochlorophyll and bacteriopheophytin in the bacterial reaction center of Rhodobacter sphaeroides, can be achieved by combinatorial protein design (see the discussion in [46]). The fluorescence-quenching effect observed with FluA is also significantly more pronounced than for antibodies that were raised against
11
12 Tailoring Protein Scaffolds for Ligand Recognition
fluorescein by immunization. Hence, the engineered lipocalin may be of interest as a reagent in biophysical studies where the specific masking of fluorescein groups is desired. The same random library of BBP variants was used in selection experiments with several other haptens. The cardiac steroid digoxigenin served as a molecular target of practical relevance [47]. In this case the library was screened by combining phagemid display with a filter-sandwich colony-screening assay in order to rapidly identify individual BBP variants with corresponding ligand-binding activity. As a result, one variant with specificity for digoxigenin was isolated whose dissociation constant was determined to be 295 ± 37 nM by means of protein fluorescence titration [47]. In an attempt to further improve the ligand affinity of this engineered lipocalin, dubbed DigA, an in vitro affinity maturation was performed. Inspection of the primary sequence of the BBP variant revealed that the first of the four loops mainly had charged side chains acquired during the selection. However, these amino acids did not appear to be optimal for the complexation of the hydrophilic, though uncharged, steroid. In this respect it should be noted that the BBP random library with its complexity of 3.7 × 108 was by far too small to represent all possible combinations of the 16 randomized amino acids. Therefore, in principle, considerable room remained for further sequence optimization in parts of the binding pocket, once molecular recognition of a specific ligand was achieved. Consequently, six amino acid positions within loop # 1 of DigA were selectively subjected to oligodeoxynucleotide-directed random mutagenesis, again followed by selection for the binding of digoxigenin groups via phagemid display and colony screening. In this way, the variant DigA16 was obtained, which binds digoxigenin significantly tighter, with a K D value of 30.2 ± 3.6 nM [47]. Remarkably, the glycosylated natural compound digoxin, which has three sugar molecules attached to C-3 of the steroid system [48], is bound with precisely the same affinity. Likewise, digoxigenin conjugates with several different carrier proteins – bovine serum albumin, ovalbumin, or ribonuclease A – that were covalently linked via an aliphatic spacer to the same steroid ring position gave rise to indistinguishable binding signals in solid-phase assays. Thus, the BBP variant DigA16 recognized the digoxigenin group as a true hapten, without detectable context dependence. Subsequent ligand-binding studies [47] revealed that a chemically similar cardiac glycoside, digitoxin, which differs from digoxin just by a single missing hydroxyl group, is bound stronger still, with a K D value equal to 3.2 ±0.54 nM. In contrast, complex formation with the related steroid ouabain, which often shows cross-reactivity with antibodies raised against digoxin [48], was not detectable. In addition, no complex formation was observed with the steroid testosterone or with 4-aminofluorescein, the ligand that had served before in the selection of the BBP variant FluA. Hence, DigA16 represents an engineered lipocalin with high affinity and pronounced specificity towards a rather hydrophilic steroid. The dramatic alteration in the ligand-binding function of this lipocalin was achieved by exchanging a total of 17 amino acids [47], which form most of the pocket in the natural BBP, together
4 Structural Aspects of Ligand Recognition by Engineered Lipocalins
with the four site-directed amino acid replacements that were introduced into the scaffold in order to make it better amenable to protein-engineering experiments (see above). Attempts were made to raise the affinity of DigA16 for the digoxigenin group even further by applying additional cycles of targeted random mutagenesis at loops# 3 and # 4 [49]. The resulting variant DigA16/19, which carries several new mutations in loop# 4, exhibits improved affinity for digoxigenin, with K D =12.4±1.3 nM. In addition, DigA16/19 possesses enhanced ligand specificity and also recognizes part of the linker that was used for fixing the steroid group to the carrier protein. During these experiments the randomized residues were still restricted to the original set of positions chosen within the four loops. However, from recent structural analyses (see Section 4) it appeared that there are additional, so far nonmutated amino acids that contribute to the shape of the ligand pocket and may therefore govern affinity and specificity for the steroids. Thus, future improvement in molecular recognition by this engineered lipocalin may be guided by rational principles. Nevertheless, the successful construction of a digoxigenin-binding lipocalin provides a novel and useful tool in biochemistry. Digoxigenin and digitoxigenin are medically important compounds, either as potentially poisonous substances or as drugs with therapeutic value – as long as they are applied at a precisely adjusted dose [50]. Furthermore, the digoxigenin group has acquired recent popularity in biochemistry as a non-radioactive label for a variety of biomolecules. Several chemically activated derivatives are available for the selective labeling of proteins or nucleic acids so that digoxigenin can be used independently from the commonly employed biotin group, with the advantage of very low background-staining activity [51]. The generation of BBP variants with novel binding specificities for fluorescein or digoxigenin, respectively, has demonstrated for the first time that a lipocalin can be tailored to recognize non-natural ligands. In order to illustrate the antibody-like binding function of the engineered lipocalins, this new class of proteins was termed “anticalins” [40].
4 Structural Aspects of Ligand Recognition by Engineered Lipocalins
The 3-D structures of the fluorescein- and digoxigenin-binding BBP variants have recently been analyzed by X-ray crystallography and compared with the original bilin-binding protein. The crystal structures were determined in different space groups and, for one variant, in both the presence and absence of the hapten, thus giving insight into the structural mechanisms of specific ligand recognition by the engineered lipocalins. In the case of the fluorescein-binding variant FluA, crystals were obtained in the presence of the ligand at pH 8.1 with two FluA·fluorescein complexes in the asymmetric unit, which were refined to a resolution of 2.0 Å [52]. The two molecules
13
14 Tailoring Protein Scaffolds for Ligand Recognition
Fig. 4 Crystal structure of the anticalin FluA with the bound fluorescein (green). C" positions of the 16 randomized amino acids are shown as gray spheres. Trp129 is depicted with its side chain in magenta.
were highly similar in structure, with a root mean square difference (rmsd) of 0.33 Å for 173 mutually superimposed Ca positions. The overall topology of the βbarrel with the α-helix attached to it, both of which are characteristic features of the lipocalin architecture (see Section 2), was found to be conserved (Fig. 4). Both disulfide bonds of the BBP scaffold, one between Cys18 and Cys115 and one between Cys42 and Cys47 , were also clearly visible. Upon superposition with the BBP crystal structure (molecule A from the Protein Data Base entry 1BBP [32]), an rmsd of 1.2 Å was calculated for 159 superimposed Cα positions. The largest structural differences were seen at the four loops that form the entrance to the binding site. The most prominent conformational changes occurred in loops# 1 and # 2. Loop# 1 had adopted a more extended conformation and moved away from the center of the ligand-binding site towards the bulk phase of the solvent. The Cα position of the mutated residue Asn36 (Val in BBP) at its tip was concomitantly displaced by approximately 8 Å. Loop# 3, which was also involved in the contacts with the structural neighbor in the asymmetric unit, had moved away from the barrel axis, thus opening the cavity for the bound ligand, with the Cα position o the non-mutated residue Gly92 at its tip shifted by about 6.6 Å. Fluorescein is bound at the bottom of the cleft that harbors biliverdin IXγ , in the wild-type BBP structure (Fig. 2). Its xanthenolone moiety is located close to the center of the β-barrel, while the carboxyphenyl group is oriented towards the entrance of the pocket. The para ring position (with respect to the central carbon
4 Structural Aspects of Ligand Recognition by Engineered Lipocalins
atom of the triphenylmethane dye), which carried the linker group during the selection experiments for this anticalin [40], is accessible from the solvent via a narrow channel. An area of 454 Å2 , corresponding to 91% of the solvent-accessible surface of fluorescein, became buried in the complex. Approximately 50% of the buried area from the side of the protein is contributed by 6 of the 16 residues that were mutated in the generation of FluA from BBP. The remaining buried surface belongs to 10 residues that were not mutated. Unexpectedly, when compared with the complexation of biliverdin by BBP, fluorescein was found to be inserted even more deeply into the hydrophobic core of the β-barrel. In the central region of the protein, where no mutations had been introduced, the necessary space was created by the movement of loop# 3 and by rearrangement of several side chains. In particular, there is a ladder of residues comprising His86 , Phe99 , His127 , and Trp129 , which have undergone a concerted reorientation of their aromatic side chains. Of those, only His127 , which participates in a packing interaction with one of the phenolic rings of fluorescein, was mutated in the generation of FluA from BBP. The non-mutated residue Trp129 is located directly underneath. It has shown major side chain reorientation and gives rise to an extended π stacking interaction at the middle of the xanthenolone ring system of fluorescein via coplanar arrangement in van der Waals distance. It seems that the introduction of the imidazole side chain at position 127 has triggered the whole movement, including those of the non-mutated residues His86 and Phe99 at the bottom of the pocket. As a result of these changes, original residues of BBP form a significant part of the reshaped ligand pocket. Hence, it appears that in addition to the loop region, the hydrophobic core of the lipocalin displays considerable plasticity as well. Residues close to the hydrophobic core that had been thought to be crucial for proper folding of the protein have adopted completely new side chain orientations in order to allow for the binding of the new ligand. Future design of anticalins with even higher affinities for small haptens may therefore also include residues from the central part of the β-barrel. When considering the side chains that make up the binding site for fluorescein, both mutated and non-mutated, it is found that they are predominantly polar in nature. Thus, the lipocalin pocket is by no means restricted to lipophilic ligands, as was anticipated before and as the name of this protein family may suggest. Furthermore, the crystal structure confirms the crucial role of the mutated basic residues Arg58 and His127 in FluA for the tight binding of fluorescein, which was previously demonstrated by site-directed mutagenesis experiments [40]. However, the mechanism of interaction is different from the earlier assumption that was based on the crystal structure of the anti-fluorescein antibody 4–4–20 [53]. There, the xanthenolone group is oriented such that an Arg and a His residue (with Cα distance similar to BBP) are each in contact with one of the phenolic moieties and form hydrogen bonds. In contrast, Arg58 and His127 of FluA contact the same phenolic group, but from opposite sides. This arrangement is made possible by the extensive structural reorganization at the open end of the β-barrel (see above). Three Arg residues at positions 88, 95, and 116, which were shown to have a
15
16 Tailoring Protein Scaffolds for Ligand Recognition
favorable effect on the ligand affinity via their positively charged side chains, do not form direct contacts with the bound fluorescein and should therefore mainly exert an electrostatic influence. Yet, the reasons for their peculiar arrangement at the entrance to the ligand pocket and their strong conservation among the several selected fluorescein-binding BBP variants [40] are not obvious at present. Taken together, the crystallographic analysis of the anticalin FluA, which was generated by combinatorial design from a prototypic lipocalin, reveals that mutated residues within the loop region and adjoining parts of the β-barrel can give rise to three different effects. First, they can contribute direct contacts with the bound ligand or at least provide an appropriate electrostatic environment. Second, they may induce novel backbone conformations in the loops and thus lead to the formation of a pocket with generic shape complementarity with the prescribed ligand. Finally, there are certain amino acids that influence the side chain conformations of neighboring residues and thus reshape the pocket in an indirect manner. Similar phenomena are known from antibodies where, apart from amino acids that contact the antigen or hapten, key residues within the hypervariable loops are responsible for their canonical backbone conformations [54] and framework residues indirectly fine-tune the shape of the combining site [55]. The crystal structure of the FluA·fluorescein complex also provides a structural explanation for the strong quenching effect of this particular anticalin that was mentioned before. Tight coplanar packing of the indole ring of Trp129 against the xanthenolone system of the bound fluorescein was observed (Fig. 4). Consequently, this aromatic residue is the likely candidate for the highly efficient electron-transfer process and is optimally positioned in this respect. In the case of the anticalin DigA16, crystal structures were solved not only for the bound digoxigenin but also for the complex with the related steroid digitoxigenin and for the uncomplexed apo-protein [56]. The crystals, which were grown at pH 7.6–8.0 and whose structures were refined to resolutions between 1.8 and 1.9 Å, were essentially isomorphous and contained one monomer per asymmetric unit. In addition, crystals were obtained in another space group for the uncomplexed original BBP variant DigA, although with poorer diffraction quality. Again, the overall topology of the lipocalin, comprising the β-barrel with the α-helix attached to it, remained conserved, whereas the set of four loops at the entrance to the ligand pocket revealed clear structural differences in comparison with the BBP. The most prominent conformational change was observed for loop# 1, where an α-helical segment of seven amino acids appeared in DigA16 (residues 33 through 39), in both the presence and absence of the steroid ligand (Fig. 5). Notably, the non-mutated residue Tyr39 within the new helix – which faces the solvent in the BBP – has shifted in all three DigA16 structures such that it packs with its side chain against the bound steroid, if present. This dramatic conformational change results in a displacement of its Cα atom by 9 Å with respect to the position in the BBP. The α-helical loop conformation seems to be essentially stabilized by two specific interactions. First, the side chains of the mutated residues Arg58 and Ser60 in loop# 2 form hydrogen bonds with the carbonyl oxygen of Tyr39 at the carboxy-terminal end
4 Structural Aspects of Ligand Recognition by Engineered Lipocalins
Fig. 5 Molecular recognition of haptens by engineered lipocalins versus antibodies. (Top) Crystal structure of the anticalin DigA16 with the bound digoxigenin (yellow). The arrow points to the hydroxyl substituent of steroid ring A, which had served for covalent attachment – via a flexible spacer – to a solid support during the selection process for this BBP variant. The C" posi-
tions of the 16 initially randomized amino acids are shown as gray spheres. (Bottom) Crystal structure of the anti-digoxigenin Fab fragment 26–10 (PDB entry 1IGJ) with the VL and VH domains colored cyan and magenta, respectively. The bound di-goxigenin group is shown in yellow, while the digitoxose sugar attached to it (at the same ring position as above) is colored light gray.
17
18 Tailoring Protein Scaffolds for Ligand Recognition
of the helix. Second, the newly introduced side chain of His35 at the amino-terminal end of the helix becomes packed between this Tyr residue and the side chain of Leu127 in loop# 4, which also resulted from the mutagenesis. Thus, a small pocket for the imidazole group is formed, which is closed at the bottom by the side chains of Trp129 (already present in the BBP) and Gln28 . The latter residue was introduced during the affinity maturation from DigA to DigA16, leading to a 10-fold improved ligand affinity (see Section 3). In the complex with DigA16, digoxigenin is bound at the bottom of the cleft that otherwise harbors biliverdin IXγ , in the wild-type BBP, and it roughly replaces the space previously occupied by one of the tetrapyrrole rings [32]. An area of 514 Å, corresponding to 95% of the solvent-accessible surface of digoxigenin have thus become buried. Approximately half of the buried surface is contributed by 9 of the 17 residues that were mutated in the generation of DigA16, while the remainder is due to 10 residues that have not been mutated. Similarly, as deduced for the anticalin FluA, specific recognition of the steroid ligand is achieved by preformed shape complementarity of the ligand pocket in apo-DigA16 as a result of (1) side chain replacements and (2) indirect effects of mutated positions on wild-type residues. The latter effect can especially be seen for the shifted residue Tyr39 , which is part of the newly formed helix in loop# 1, and for altered side chain conformations of Phe99 and Trp129 at the bottom of the ligand pocket. These two conserved residues have rotated their aromatic side chains by approximately 120◦ compared with BBP, thus enabling accommodation of the new bulky steroid ligand. Their conformation appears to be similar in all three DigA16 structures, even in the absence of the ligand. Apparently, the side chain rotation of Trp129 is in concert with the extensive rearrangement in the upper part of the binding cleft, particularly within loop# 1 due to the mutated residues. The steroid ligand is bound mainly via van der Waals interactions but also via hydrogen bonds with its polar substituents. Several water molecules have become buried in the ligand pocket as well and participate in hydrogen bond interactions with the hydroxyl and lactone groups of the hydrophilic steroid. Of the buried protein surface in the digoxigenin complex, 34% is provided by non-hydrocarbon groups (for comparison, 33% in the BBP). Hence, the binding site has considerable polar character, which is in contrast with the almost entirely hydrophobic nature of the ligand pocket in other lipocalins, such as the retinol-binding protein (with a corresponding value of 16%). Similarly, the steroid ligand is almost fully trapped within the binding site, with a remaining accessible surface of 5%. Indeed, there is a small gap in the protein shell that permits accessibility of the steroid position C-3, which carried the linker group during the selection procedure [47]. Thus, as a result of the combinatorial protein design experiment, the corresponding steroid derivative must have been almost perfectly shielded from solvent. From a comparison between the DigA16 complexes with digoxigenin versus digitoxigenin, the structural mechanism of the fine specificity in the steroid recognition – and discrimination between these closely related ligands – became apparent. The non-mutated residue His86 at the bottom of the ligand pocket, which is close to the hydrophobic core within the β-barrel, plays a crucial role in this
4 Structural Aspects of Ligand Recognition by Engineered Lipocalins
respect and is involved in an induced fit during ligand complexation. Upon binding of digoxigenin, the side chain of His86 , which points into the empty cavity in apo-DigA16, is displaced by the ligand and forms a hydrogen bond with the hydroxyl group HO-12 of the steroid. Thus, it becomes rotated towards Tyr22 , which is part of a loop at the closed end of the β-barrel. The side chain of Tyr22 itself rotates away from its original hydrogen-bonding partner Thr104 in apo-DigA16 into the direction of His86 and forms a hydrogen bond with the imidazole side chain instead. Digitoxigenin, which is bound essentially at the same position and with the same orientation as digoxigenin, lacks the OH group at the steroid position C-12. Consequently, a hydrogen bond between His86 and digitoxigenin is missing in the complex with DigA16, and its imidazole side chain packs closer to the steroid ring system. Thus, compared with the digoxigenin complex it is partially rotated back into the position that it has assumed in the apo-protein. Accordingly, the side chain of Tyr22 still forms a hydrogen bond with Thr104 . Instead, a water molecule appears at the position that is occupied by the phenolic hydroxyl group of Tyr22 in the DigA16·digoxigenin complex, which is weakly hydrogen-bonded to His86 in the case of the bound digitoxigenin. The set of available crystal structures also provides an explanation for the effect of the affinity maturation that led from the original DigA anticalin to the DigA16 mutant. During this step, several amino acids were randomized in loop# 1 [47]. While most of the corresponding side chains are solvent exposed in the tertiary structure, two residues are probably relevant for the loop conformation and improved ligand affinity of DigA16. First, His35 remained conserved with respect to the DigA sequence, which underlines its role in the helix conformation as described above, together with the invariant residue Tyr39 . Second, Glu28 was consistently replaced by Gln (in several mutants that were selected along with DigA16; see [47]). In the apo-DigA structure, the side chain of Glu28 adopts a different conformation compared with Gln28 in apo-DigA16 and forms a salt bridge with the Arg58 guanidinium group. As a consequence, the carboxylate moiety of Glu28 may sterically interfere with the position of the lactone substituent at ring D of the bound steroid ligand. In addition, due to a corresponding shift of the Arg side chain, the helix – which covers the bound ligand – probably undergoes minor repositioning. Hence, it seems that the introduction of Gln28 is mainly responsible for the 10-fold enhanced affinity of DigA16. Notably, the mechanism of molecular recognition by the anticalin is different from the manner in which digoxigenin is bound by immunoglobulins. Two crystal structures have been described for Fab fragments of monoclonal antibodies that were raised against digoxin as a hapten: the Fab “26–10” in complex with digoxin (PDB accession code 1IGJ; [57]) and the Fab “40–50” in complex with ouabain (PDB accession code 1IBG; [58]). In both cases only between 60% and 70% of the steroid became buried from solvent during complex formation (Fig. 5). The ring system is covered by protein residues mostly from one side, probably due to the limited capacity of the Ig architecture to create a deep pocket at the interface of the VH and VL domains. Even though high affinities have been achieved by these antibodies, their specificities are poor because both significantly cross-react with ouabain, a
19
20 Tailoring Protein Scaffolds for Ligand Recognition
Fig. 6 Plasticity of the engineered lipocalin-binding site: superposition of the natural BBP with its variants FluA and DigA16 (with loops colored blue, green, and yellow, respectively).
cardiac steroid related to digoxin [48, 58]. In contrast, the engineered lipocalin DigA16 clearly distinguishes between these two compounds [47], thus providing a functional advantage. In summary, a phenomenon of pronounced structural plasticity was observed in the engineered BBP variants FluA and DigA or DigA16, which means that the backbone conformation of the lipocalin loop region was strongly influenced by the side chain replacements (Fig. 6). However, in the context of a given amino acid sequence, the conformational flexibility of these loops seems to be rather low because no significant differences were observed in the two independently refined FluA·fluorescein complexes or when comparing the DigA16 structures in the absence or presence of the ligand. This effect was not entirely expected a priori, because from a superposition of natural lipocalins with known tertiary structures [21], it was not clear to which extent the loop conformation is governed by its own distinct sequence versus individually variable features of the β-barrel structure that provides the support.
5 Prospects and Future Applications of Anticalins
The functional and structural data that have been gathered during the engineering of lipocalins for the recognition of two unrelated low-molecular-weight ligands
5 Prospects and Future Applications of Anticalins 21
clearly demonstrate the potential of this scaffold for the generation of artificial receptor proteins with high affinity and specificity for prescribed target molecules. Our findings confirm that the β-barrel architecture of lipocalins constitutes a remarkably stable scaffold. Even though amino acids were replaced in at least 20 different positions, most of them within the binding site of the BBP, the overall topology and the β-barrel structure itself were retained. Structural changes were merely observed at a local level and essentially restricted to the loop regions. It seems that lipocalins indeed provide a partitioned protein architecture, wherein the β-barrel – together with the fixed loops at its closed end and the α-helix attached to it – provides a rigid framework that is structurally conserved among the “prototypic” members of this family [21], while the set of four loops at its open end can be hypervariable. This situation is reminiscent of antibodies, where a set of six CDRs presented on top of a largely constant framework region is responsible for the specific binding of the antigen or hapten. However, compared with recombinant antibody fragments, engineered lipocalins should provide significant benefits because they are composed of one instead of two polypeptide chains, they have a much smaller size, and their set of four loops can be easily manipulated simultaneously at the genetic level. Consequently, lipocalins provide a promising alternative to antibody fragments for the engineering of artificial ligand-binding proteins, therefore called “anticalins,” using the methods of combinatorial biochemistry [21, 40]. The most striking property of the lipocalin scaffold is its ability to provide a well-defined and conformationally rigid cavity for the ligand. The ligand pocket is significantly deeper than the hapten-binding sites found in antibodies and may even reach down into the hydrophobic core of the lipocalin, as was seen in the case of the fluorescein-binding anticalin FluA. A similar mode of complexation would not be possible for an antibody fragment because of the detrimental effect on the non-covalent association between VH and VL . Hence, the lipocalin-bound ligand can be trapped from the solvent and becomes almost fully surrounded by protein residues, which explains the pronounced specificity, especially observed for the steroid ligands. Thus, an extended linker structure, which should also include hydrophilic groups, seems to be required for the functional immobilization of the target compound during the selection procedure for the cognate anticalin. It would be nice to devise selection techniques that no longer necessitate the covalent fixation of the target because under such conditions anticalins might be generated that fully encapsulate their ligands. Remarkably, the anticalins obtained up to now recognize their low-molecularweight ligands independently from the carrier – usually a stable globular protein like BSA or RNAse – that was employed for target display. This was shown for both the fluorescein group [40] and digoxigenin [47]. In this respect, anticalins distinguish themselves from many antibodies, especially when derived from synthetic libraries [4], and also from different protein scaffolds that have been tested for similar purposes. For example, when mutants of cytochrome b562 with two randomized loop regions were selected against an organic target compound, the hapten was only recognized – with a weak micromolar affinity – as long as it remained linked to the original BSA carrier [59].
22 Tailoring Protein Scaffolds for Ligand Recognition
The well-defined binding properties for the small ligand in combination with the lack of cross-reactivity with the macromolecular conjugate partner are probably due to the choice of the randomized positions in the lipocalin within an inner zone at the open end of the β-barrel but below the exposed tips of the loop region (Fig. 3). Consequently, just a few of the randomized residues are potentially accessible from outside the pocket, most likely only after significant changes in the backbone conformation – as observed for loop # 1, which adopts an α-helical structure in DigA16 (see Section 4). However, the scope of molecular recognition by engineered lipocalins should not be restricted to low-molecular-weight compounds. Indeed, the more exposed side chains of the four loops may be specifically randomized for the generation of another sort of anticalin libraries that could be useful for selection towards macromolecular targets. Preliminary data from our laboratory suggest that mutants of the BBP can be obtained in this manner, which exhibit specific binding activity for prescribed proteins with dissociation constants in the nanomolar range. The proof of concept for the generation of anticalins as a novel class of receptor proteins with defined ligand-binding properties was realized using the BBP as a model lipocalin. It has been shown that, because of their simple and robust architecture, these anticalins provide several practical advantages. For example, they are remarkably stable against denaturation. Thermal unfolding studies revealed a melting temperature of 61.3 ◦ C for the recombinant BBP and an even higher T m value of 72.8 ◦ C for the anticalin DigA, although this variant had not been selected for enhanced folding stability [49]. Another advantage of the lipocalin architecture relates to the fact that both ends of the polypeptide chain are sterically accessible at the outside of the β-barrel and should normally not interfere with the structure of the ligand-binding site. Thus, anticalins are amenable to the construction of functional fusion proteins at both their amino- and carboxy-termini. This was demonstrated in the case of DigA16 for alkaline phosphatase, which could serve directly as a reporter enzyme for the detection of digoxigenin groups after fusion with either end of the anticalin [47]. Anticalins may even be fused to each other, leading to so-called “duocalins,” a novel class of bifunctional ligand-binding proteins [60]. The insights that were gained so far from the anticalin approach illustrate once again the huge potential of polypeptides to adopt diverse molecular shapes, seen here for the ligand pockets of engineered lipocalins. Given the high plasticity of the loop region, which probably constitutes a specific functional advantage of this protein family, the rational prediction of the influence of amino acid substitutions on the structure of the binding site will probably remain difficult in the near future. But even when applying the powerful methods of combinatorial biochemistry, one should be aware of the still limited options for the realization of novel functional active sites, which is caused by the vast number of possible sequence combinations on the one hand and the restricted number of molecules that can be physically generated and applied to a selection experiment on the other. We have tried to address this generic problem in protein engineering by making a careful choice of amino acids for random mutagenesis – in fact, just less then half of the positions that one could actually consider – in order to reduce the combinatorial complexity and thus create a potent molecular library in the functional sense. The fact that specific hapten-binding activities were immediately derived from
References
this library confirms the validity of this concept. Nevertheless, additional steps of affinity maturation may be needed, as demonstrated for DigA16 [47, 49], in order to fine-tune the shape complementarity of the binding site after initial ligand recognition property for the ligand was imprinted. The rational choice of positions to be modified, in combination with repeated cycles of targeted randomization and selection – corresponding to a kind of molecular evolution – is probably the best general strategy for obtaining novel proteins with well-defined ligand-binding function, at least for the moment. In conclusion, specifically engineered anticalins open numerous areas of application as ligand-binding proteins not only in bioanalytics and separation technology but also in medical diagnostics and possibly even therapy. Especially for the latter purpose, it could be advantageous to generate anticalins based on a human lipocalin framework [21]. Hence, immunogenic side effects will be minimized upon repeated administration to patients. The preparation of appropriate fusion proteins should permit the introduction of useful effector functions – as already demonstrated with enzymes or certain binding modules, such as the albumin-binding domain [47]. The field of engineered protein scaffolds for molecular recognition has rapidly emerged during the past few years (for reviews, see [9, 61]). Among the several scaffold structures that are currently being exploited, immunoglobulins and lipocalins certainly stand out. Both families are utilized by nature itself in order to provide specific binding proteins based on a stable tertiary fold that supports hypervariable loops. In the case of immunoglobulins, hundreds of millions of different antibodies are constantly being created in each individual’s immune system using mechanisms of genetic recombination and somatic hypermutation. In contrast, lipocalins are much smaller in number and have been stably evolved in many organisms in order to serve more specialized physiological functions. Whereas antibodies must predominantly recognize macromolecular antigens – such as proteins and carbohydrates – in their defense against microbial or viral pathogens, lipocalins seem to mainly serve for the transport and storage of low-molecular-weight compounds. Especially in this respect, lipocalins are distinct from other protein scaffolds that have been subject to protein engineering and successfully used for the generation of binding modules against protein targets. The promising results obtained here from the tailoring of a natural lipocalin for the recognition of hapten-like ligands emphasizes the unique potential of this protein family for the generation of corresponding receptor proteins. Apart from the interesting practical applications, this field of research will also offer conceptual insight into the mechanisms of molecular recognition between proteins and small molecules in general.
References 1 R. C. STEVENS, S. YOKOYAMA, I. A. WILSON, Science 2001, 294, 89–92. 2 H. GOHLKE, G. KLEBE, Angew. Chem. Int. Ed. 2002, 41, 2644–2676.
3 H. K. RAU, N. DEJONGE, W. HAEHNEL, Angew. Chem. Int. Ed. Engl. 2000 39, 250–253.
23
24 Tailoring Protein Scaffolds for Ligand Recognition
4 G. A. WEISS, H. B. LOWMAN, Chem. Biol. 2000, 7, R177–R184. 5 B. HARRIS, Trends Biotechnol. 1999, 17, 290–296. 6 M. R. UJHELYI, S. ROBERT, Clin. Pharmacokinet. 1995, 28, 483–493. 7 D. R. DAVIES, S. CHACKO, Acc. Chem. Res. 1993, 26, 421–427. 8 E. A. PADLAN, Mol. Immunol. 1994, 31, 169–217. 9 A. SKERRA, J. Mol. Recognit. 2000, 13, 167–187. 10 A. KOIDE, C. W. BAILEY, X. HUANG, S. KOIDE, J. Mol. Biol. 1998, 284, 1141–1151. 11 S. MUYLDERMANS, M. LAUWEREYS, J. Mol. Recognit. 1999, 12, 131–140. 12 S. SPINELLI, M. TEGONI, L. FRENKEN, C. van VLIET, C. CAMBILLAU, J. Mol. Biol. 2001, 311, 123–129. 13 K. NORD, E. GUNNERIUSSON, J. RINGDAHL, S. STAHL, M. UHLE´ N, P.-Å. NYGREN, Nature Biotechnol. 1997, 15, 772–777. 14 F. A. QUIOCHO, P. S. LEDVINA, Mol. Microbiol. 1996, 20, 17–25. 15 Y. LINDQUIST, G. SCHNEIDER, Curr. Opin. Struct. Biol. 1996, 6, 798–803. 16 M. WILCHEK, E. A. BAYER, Biomol. Eng. 1999, 16, 1–4. 17 M. E. NEWCOMER, D. E. ONG, Biochim. Biophys. Acta 2000, 1482, 57–64. 18 D. R. FLOWER, Biochem. J. 1996, 318, 1–14. 19 G. GUTIE´ RREZ, M.D. GANFORNINA, D. S´ANCHEZ, Biochim. Biophys. Acta 2000, 1482, 35–45. 20 R. E. BISHOP, Biochim. Biophys. Acta 2000, 1482, 73–83. 21 A. SKERRA, Biochim. Biophys. Acta 2000, 1482, 337–350. 22 M. E. NEWCOMER, T. A. JONES, J. ÅQVIST, J. SUNDELIN, U. ERIKSSON, L. RASK, P. A. PETERSON, EMBO J. 1984, 3, 1451–1454. 23 S. W. COWAN, M. E. NEWCOMER, T. A. JONES, Proteins: Struct. Funct. Genet. 1990, 8, 44–61. 24 H. L. MONACO, M. RIZZI, A. CODA, Science 1995, 268, 1039–1041. 25 S. PERVAIZ, K. BREW, Fed. Am. Soc. Exp. Biol. 1987, 1, 209–214. 26 W. J. HENZEL, H. RODRIGUEZ, A. G. SINGER, J. T. STULTS, F. MACRIDES, W. C. AGOSTA, H. NIALL, J. Biol. Chem. 1988, 263, 16682–16687.
27 F. VINCENT, D. L¨OBEL, K. BROWN, S. SPINELLI, P. GROTE, H. BREER, C. CAMBILLAU, M. TEGONI, J. Mol. Biol. 2001, 305, 459–469. 28 D. R. FLOWER, J. Mol. Recognit. 1995, 8, 185–195. 29 E. RASSART, A. BEDIRIAN, S. DO CARMO, O. GUINARD, J. SIROIS, L. TERRISSE, R. MILNE, Biochim. Biophys. Acta 2000, 1482, 185–198. 30 M. VOGT, A. SKERRA, J. Mol. Recognit. 2001, 14, 79–86. 31 R. HUBER, M. SCHNEIDER, O. EPP, I. MAYR, A. MESSERSCHMIDT, J. PFLUGRATH, H. KAYSER, J. Mol. Biol. 1987, 195, 423–434. 32 R. HUBER, M. SCHNEIDER, I. MAYR, R. M¨ULLER, R. DEUTZMANN, F. SUTER, H. ZUBER, H. FALK, H. KAYSER, J. Mol. Biol. 1987, 198, 499–513. 33 M. COLES, T. DIERCKS, B. MUEHLENWEG, S. BARTSCH, V. Z¨OLZER, H. TSCHESCHE, H. KESSLER, J. Mol. Biol. 1999, 289, 139–157. 34 D. H. GOETZ, S. T. WILLIE, R. S. ARMEN, T. BRATT, N. BORREGAARD, R. K. STRONG, Biochemistry 2000, 39, 1935–1941. 35 R. GLOCKSHUBER, T. SCHMIDT, A. PLU¨ CKTHUN, Biochemistry 1992, 31, 1270–1279. 36 P. A. SCHINDLER, C. A. SETTINERI, X. COLLET, C. J. FIELDING, A. L. BURLINGAME, Protein Sci. 1995, 4, 791–803. 37 H.N. M¨ULLER, A. SKERRA, J. Mol. Biol. 1993, 230, 725–732. 38 F. S. SCHMIDT, A. SKERRA, Eur. J. Biochem. 1994, 219, 855–863. 39 A. SKERRA, Rev. Mol. Biotechnol. 2001, 74, 257–275. 40 G. BESTE, F. S. SCHMIDT, T. STIBORA, A. SKERRA, Proc. Natl. Acad. Sci. USA 1999, 96, 1898–1903. 41 G. P. SMITH, V. A. PETRENKO, Chem. Rev. 1997, 97, 391–410. 42 A. SKERRA, Gene 1994, 151, 131–135. 43 E. W. VOSS, Jr., Fluorescein Hapten: An Immunological Probe, CRC Press, Boca Raton, FL, 1984. 44 C. F. BARBAS III, J.D. BAIN, D. M. HOEKSTRA, R. A. LERNER, Proc. Natl. Acad. Sci. USA 1992, 89, 4457–4461. 45 A. SKERRA, T. G. M. SCHMIDT, Methods Enzymol. 2000, 326, 271–304.
References 46 M. G¨OTZ, S. HESS, G. BESTE, A. SKERRA, M. E. MICHEL-BEYERLE, Biochemistry 2002, 41, 4156–4164. 47 S. SCHLEHUBER, G. BESTE, A. SKERRA, J. Mol. Biol. 2000, 297, 1105–1120. 48 M. MUDGETT HUNTER M. N. MARGOLIES, A. JU, E. HABER, J. Immunol. 1982, 192, 1165–1172. 49 S. SCHLEHUBER, A. SKERRA, Biophys. Chem. 2002, 96, 213–228. 50 A. R. HASTREITER, E. G. JOHN, R. L. van der HORST, Clin. Perinatol. 1988, 15, 491–522. 51 T. MCCREERY, Mol. Biotechnol. 1997, 7, 121–124. 52 I. KORNDO¨ RFER, G. BESTE, A. SKERRA, in preparation. 53 M. WHITLOW, A. J. HOWARD, J. F. WOOD, E. W. VOSS, Jr., K. D. HARDMAN, Protein Eng. 1995, 8, 749–761.
54 C. CHOTHIA, A. M. LESK, J. Mol. Biol. 1987, 196, 901–917. 55 J. FOOTE, G. WINTER, J. Mol. Biol. 1992, 224, 487–499. 56 I. KORNDO¨ RFER, S. SCHLEHUBER, A. SKERRA, in preparation. 57 P. D. JEFFREY, R. K. STRONG, L. C. SIEKER, CY. Y. CHANG, R. L. CAMPBELL, G. A. PETSKO, E. HABER, M. N. MARGOLIES, S. SHERIFF, Proc. Natl. Acad. Sci. USA 1993, 90, 10310–10314. 58 P. D. JEFFREY, J. F. SCHILDBACH, CY. Y. CHANG, P. H. KUSSIE, M. N. MARGOLIES, S. SHERIFF, J. Mol. Biol. 1995, 248, 344–360. 59 J. KU, P. G. SCHULTZ, Proc. Natl. Acad. Sci. USA 1995, 92, 6552–6556. 60 S. SCHLEHUBER, A. SKERRA, Biol. Chem. 2001, 382, 1335–1342. 61 P.-Å. NYGREN, M. UHLE´ N, Curr. Opin. Struct. Biol. 1997, 7, 463–469.
25
1
Evolutionary Generation of Enzymes with Novel Substrate Specificities Uwe T. Bornscheuer
Ernst-Moritz-Arndt-Universit¨at, Greifswald, Germany
Originally published in: Directed Molecular Evolution of Proteins. Edited by Susanne Brakmann, Kai Johnsson. Copyright ľ 2002 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30423-1
1 Introduction
The application of enzymes is now well documented. They are used in organic chemistry – mainly for the production of optically pure compounds – synthesis of detergents and emulsifiers, lipid modification, laundry formulations, food and beverage production etc. [1–6]. Due to their high chemo-, regio- and stereoselectivity, activity at ambient temperatures and pH, biocatalysts are often superior to chemical ones and this might explain that a considerable number of industrial bioprocesses have been established in recent years [2, 7]. Traditional methods to identify new enzymes are based on screening of e.g. soil samples or strain collections by enrichment cultures. Once a suitable biocatalyst is identified, strain improvement as well as cloning and expression of the encoding gene(s) enable production on a large scale. Unfortunately, not all microorganisms can be cultured using common fermentation technology and the number of accessible microorganisms from a sample is estimated to 0.001–1% depending on their origin [8]. In addition, not all enzymes found in nature are suitable for a specific synthetic problem and the useful properties mentioned above are not always satisfactory in practice. Until recently, these limitations were overcome by i) screening for alternative biocatalysts, ii) changes of the reaction system (e.g., from hydrolysis to esterification) or iii) by rational protein design followed by site-directed mutagenesis, if the gene encoding the enzyme and its three-dimensional structure are available [9]. All these methods are very time-consuming and in case of rational protein design also information-intensive. A very promising alternative for the generation of enzymes with improved or altered properties is the strategy of directed evolution (also called evolutive biotechnology or molecular evolution) which came to the fore in the early 90s. In most Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Evolutionary Generation of Enzymes with Novel Substrate Specificities
Fig. 1 Principle of directed evolution.
cases, directed evolution is comprised of a random mutagenesis of the gene encoding the catalyst by error-prone polymerase chain reaction (epPCR) resulting in large libraries of mutants. Another method is based on DNA- (or gene-) shuffling and related techniques. Libraries thus created are then assayed by using high-throughput technologies to identify improved variants (Fig. 1). Further details on the principles and applications of directed evolution can be found elsewhere in this book and in a considerable number of recent reviews dealing with various aspects of directed evolution [8–21]. This chapter focuses on the creation of novel substrate specificities of enzymes.
2 General Considerations
During natural evolution, a broad variety of enzymes has been developed, which are classified according to the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB). Thus, for each type of
2 General Considerations Table 1 Examples for the evolutionary generation of enzymes with novel substrate
specificities Wild-type activity
Method
Assay
Main Results
Ref.
Pseudomonas fluorescent esterase E. coli IGPS
MS
growth/pHindicator
[22, 24, 26]
SDM
growth assay
E. coli ProFARI
shuffling
growth assay
E. coli β-galactosidase
DNA-shuffling
p-/o-nitro-phenyl glycosides
Bacillus megaterium P450 MO Pseudomonas SY77 glutaryl acylase
epPCR/SM
visual inspection
active double mutant acting on sterically hindered substrate evolved PRAI activity from IGPS scaffold evolved PRAI activity from ProFARI 300–1000-fold increased specificity for fucosides generation of indigo/indirubin
epPCR/SM
growth assay
increased Vmax /Km for adipyl-7-ADCA
[34]
Pseudomonas putida P450 MO
epPCR/StEP
digital image screening
[35, 37]
Staphylococcus aureus lipase
epPCR/shuffling
chromogenic assay
enhanced naphthalene hydroxylation/formation of dihydroxy compounds 12-fold increase in phospholipase activity
Bacillus thermocatenulatus lipase
epPCR/SDM
visual inspection
17-fold increase in phospholipase activity
[39]
Saccharomyces cerevisiae peroxidase E. coli aldolase
shuffling
visual inspection
300-fold increased activity for guaiacol
[40]
epPCR/shuffling
UV assay
acceptance of phosphate free substrates
[41]
[28]
[29]
[31]
[33]
[38]
Abbreviations: ADCA, adipyl-cephalosporic acid; epPCR, error-prone PCR; IGPS, indole-3-glycerol phosphate synthase; MO, monooxygenase; MS, mutator strain; PRAI, phosphoribosylanthranilate isomerase; ProFARI, N -[5-phosphoribosyl)formimino]-5-aminoimidazole-4-carboxylamide ribonucleotide isomerase; SDM, site-directed mutagenesis, StEP, staggered extension process
characterized enzyme an EC (Enzyme Commission) number has been provided (see: www.expasy.ch/enzyme/). For instance, all hydrolases have EC number 3 and further subdivisions are provided by three additional digits, e.g. all lipases (official name: triacylglycerol lipases) have the EC number 3.1.1.3 and are thus distinguished from esterases (official name: carboxyl esterases) having the EC number
3
4 Evolutionary Generation of Enzymes with Novel Substrate Specificities
3.1.1.1. This classification is based on the substrate (and cofactor) specificity of an enzyme only, however often very similar amino acid sequences and also related three-dimensional structures can be observed. This suggests that the creation of novel enzyme specificity has a high probability if a closely related catalytic machinery and/or architecture is present for both types of activities, i.e. both substrates should be converted by the Ser-His-Asp triad present in lipases and amidases if an amidase substrate should be accepted by a lipase or vice versa. The chance for creating activity is usually enhanced if both enzymes share high sequence and/or structural homology. In other words, most readers will agree that it will be rather impossible to convert an esterase into a monooxygenase as these two enzymes differ substantially in their architecture, substrate and cofactor-requirement and consequently mode of action. With this in mind, it is not surprising that the examples reported in the literature (Table 1) for novel substrate specificities by directed evolution can be subdivided into the following three groups (Scheme 1): 1. Conversion of substrates, which in principle should be acceptable by the enzyme under investigation, but do not react due to e.g. steric hindrance, electronic effects etc.
Scheme 1 Selected examples illustrating possible targets for novel substrate specificities of enzymes. Esterases do not act on esters of tertiary alcohols and this problem thus belongs to group 1. Galactosidases only accept galactosides as substrates, but not glucosides and therefore this example
resembles group 2. Some lipases – e.g from Rhizopus sp., Bacillus sp. or Staphylococcus sp. – exhibit low activity towards the sn1-group of phospholipids. However, this activity is significantly lower compared to phospholipase A1 (PLA1 ) and consequently it can be regarded as a group 3 target.
3 Examples
2. Conversion of substrates which are usually converted by another (sometimes closely related) enzyme. 3. The compounds are already converted by the enzyme of interest, but at very low activity compared to well accepted (natural) substrates.
It should be noted that a clear-cut distinction between these groups is not always feasible.
3 Examples 3.1 Group 1
The first example in which a substrate not accepted by the wild-type enzyme could be converted by a mutant obtained by directed evolution was published by my group [22]. A sterically hindered 3-hydroxy ester (Scheme 2, 1a, b) resembling a building block in the envisaged total synthesis of epothilones was not accepted by 20 wild-type hydrolases (18 lipases, 2 esterases). Although closely related 3-hydroxy esters could be stereoselectively converted by a range of commercial lipases, this specific compound bearing two methyl groups at C4 did not react in hydrolysis of the corresponding ethyl ester as well as in acylation attempts at the 3-hydroxy group using vinyl acetate in toluene [23]. As the basic structure of the 3-hydroxy ester was set by the epothilone scaffold, the most promising alternative to obtain the compound in optically pure form was the creation of a hydrolase mutant exhibiting activity towards this substrate, preferentially in a stereoselective manner. Thus, a mutant library of an esterase from Pseudomonas fluorescens (PFE) was created using the mutator strain Epicurian coli XL1-Red (available from Stratagene). Key to the identification of improved variants from these libraries was an agar plate assay system based on pH indicators, thus leading to a change in color upon hydrolysis of the ethyl ester 1a (Scheme 3). Parallel assaying of replica-plated colonies on agar plates supplemented with the glycerol derivative 1b was used to refine the identification, because only E. coli colonies producing active esterases had access to the carbon source glycerol, thus leading to enhanced growth and in turn larger colonies [22, 24]. By this strategy, a double mutant (Leu181Val, Ala209Asp) was obtained, which indeed hydrolyzed the sterically hindered substrate. This was confirmed by preparative scale reactions followed by gas chromatographic analysis to calculate conversion and enantiomeric excess as well as by determination of the optical rotation by polarimetry. The best mutant gave an enantiomeric excess of 25% ee which corresponds to an enantioselectivity of E ∼5.0, which is in accordance with the, in general, rather low selectivity of PFE. Several other mutants were identified, but enantiomeric excess was usually around 3–4 % ee. Interestingly, the activity towards acetic acid ethyl ester – the standard substrate to determine esterase activity by a pH-stat assay – was decreased considerably for various mutants. Whereas
5
6 Evolutionary Generation of Enzymes with Novel Substrate Specificities
Scheme 2 Structures of compounds used (1a,b, 4) and obtained (2, 3) by directed evolution of biocatalysts aiming at novel substrate specificities.
wild-type PFE showed an activity of 6 Umg−1 , the activity of the Leu181Val/ Ala209Asp mutant was 2.3 Umg−1 and some variants showed only activities <1.0Umg−1 . The 3D-structure of PFE is unknown, however it was possible to perform homology modeling based on the solved 3D-structure of a haloperoxidase from Streptomyces aureofaciers [25]. This revealed that both mutations are not close to the binding pocket but solely at the periphery of the enzyme [24]. Although this method was useful to turn an inactive wild-type enzyme into an active esterase, the best mutant showed only modest enantioselectivity. Further work showed that the enantioselectivity of PFE could be increased from E = 3.5 to E = 5.2–6.6 in a single round of mutation [26]. Libraries were created by epPCR as well as by using the mutator strain from Epicurian coli. An extremely accurate determination of the enantioselectivity was achieved by using resorufin esters of (R)- or (S)-3-phenylbutyric acid, which allowed measurement of fluorescence in microtiter plates (MTP) avoiding problems with interfering compounds present in the culture medium. A further increase in enantioselectivity up to E = 12 was achieved by saturation mutagenesis at all three positions identified for the first
Scheme 3 Principle of the assay system used to identify active variants of esterase from Pseudomonasfuorescens (PFE) acting on the sterically hindered 3-hydroxy ester 1. Both substrates 1a,b yield the free acid leading to a color change; 1b also releases the carbon source glycerol leading to enhanced growth of esterase mutant producing E. coli clones.
3 Examples
Fig. 2 Mutants obtained by directed evolution and saturation mutagenesis showing enhanced enantioselectivity in the resolution of 3-phenylbutyric acid derivatives. MTP: E-values (Eapp ) determined in microtiter plates using the corresponding (R)- and (S)-resorufin ester; GC:E-values (Etrue ) cal-
culated using the equation E = [ln(1-c)*(1ees )]/[ln(1-c)*(1+ ees )] from data determined by gas chromatographic analysis on a chiral column from samples obtained after esterase-catalyzed hydrolysis of (R,S)-3phenyl butyric acid ethyl ester [24].
generation mutants (E. Henke and U.T. Bornscheuer, unpublished). However, combinations of best variants by site-directed mutagenesis gave no further improvement (Fig. 2). It should be noted, that Eapp values determined in MTPs were quite close to Etrue values, which have been determined by gas chromatographic analysis of samples from resolutions of racemic mixtures of the corresponding ethylesters of 3-phenylbutyric acids. Here, also competition between the two enantiomers is included, whereas the MTP-assay delivers only apparent E values. Significantly higher increases in enantioselectivity were achieved for the resolution of methyl 3-bromo-2-methylpropionate. Using the homology model of PFE, Trp29 and Phe199 were identified as positions for random mutagenesis. Screening of libraries using Quick E [27] identified a mutant Trp29Leu exhibiting an E = 90 compared to E = 12 for wild-type PFE. This is the highest selectivity ever achieved up to now by directed evolution techniques (U.T. Bornscheuer and R.J. Kazlauskas, unpublished), and allows the synthesis of optically pure substrate and product of this compound.
7
8 Evolutionary Generation of Enzymes with Novel Substrate Specificities
Scheme 4 Substrate specificities of ProFARI (involved in histidine biosynthesis) and PRAI (involved in tryptophan biosynthesis).
The knowledge about mutations affecting enantioselectivity and altered substrate specificity derived from these directed evolution studies of PFE is currently used as a basis to design variants combining both properties. In an excellent contribution, Fersht and co-workers evolved phosphoribosylanthranilate isomerase (PRAI) activity from the (βα)8 -barrel scaffold of indole-3glycerol phosphate synthase (IGPS) [28]. Their success provides a striking example for testing the “conserved scaffold” hypothesis of enzyme evolution. Careful design of the basic PRAI loop system into IGPS created the scaffold for directed evolution by DNA-shuffling and StEP. Active variants were identified by an agar plate assay using an E. coli strain that does not grow in the absence of tryptophan, which in turn is the product of PRAI activity. The mutant obtained had 28% identity to PRAI and 90% identity to IGPS. The same hypothesis that several different enzymes having the (βα)8 -barrel scaffold evolved from a common ancestor was used as a basis for the alteration of the substrate specificity of another isomerase having only 10% sequence homology [29]. This showed that a single round of gene-shuffling turned ProFARI (N -[5-phosphoribosyl)formimino]-5-aminoimidazole-4-carboxylamide ribonucleotide isomerase) – involved in histidine biosynthesis – into a mutant with PRAI activity catalyzing the same reaction in tryptophan biosynthesis (Scheme 4). In vivo selection of mutants in an E. coli strain lacking PRAI activity yielded two variants each with only a single amino acid substitution. One mutant catalyzed both reactions, although with only low residual wild-type activity (<4%). Both examples have been already discussed extensively in a recent review [30]. 3.2 Group 2
DNA-shuffling was successfully used to turn an E. coli lacZ β-galactosidase into a fucosidase [31]. Screening of ∼10 000 colonies per round based on chromogenic fucose substrates resulted in the identification of evolved fucosidases with 10–20-fold increased kcat /K m for the fucose substrate compared to the wild-type. Compared to the native β-galactosidase, the final evolved enzyme showed a 1000-fold increased substrate specificity for o-nitrophenyl fucopyranoside versus o-nitrophenyl
3 Examples
galactopyranoside and a 300-fold increase for the corresponding p-nitrophenol derivatives. In a related manner, attempts were made to convert the E. coli lacZ β-galactosidase into a β-glucosidase, however so far only minor changes in the substrate specificity have been reported [32]. In another example, some mutants of the P450-monooxygenase from Bacillus megaterium (P450 BM-3, expressed in E. coli) obtained by directed evolution using epPCR were found to hydroxylate indole at two different positions (Scheme 2, 2a, b) which upon air oxidation and dimerization yield indigo and indirubin [33]. So far, both compounds were known to be formed by the action of naphthalene dioxygenases or styrene monooxygenases, but are not produced by wild-type P450 BM3 and are thus a result of altered substrate specificity. Sequencing of several individual clones revealed that three positions (Phe87, Leu 188, Ala74) contained mutations. Next, simultaneous saturation mutagenesis at all three positions was performed to identify best “replacements” which led to the identification of a triple mutant (Phe87Val, Leu188Gln, Ala74Gly) enabling formation of indigo and indirubin in milligram amounts. In a very recent paper the activity of a glutaryl acylase from Pseudomonas SY77 towards adipyl-7-ADCA – an important precursor in the synthesis of semi-synthetic cephalosporins – was altered by directed evolution [34]. Random mutagenesis of the α-subunit, growth selection of active variants followed by saturation mutagenesis identified three mutants showing decreased Vmax /Km values for glutaryl substrates and higher values towards the adipyl analog. The authors used a growth selection based on adipyl-leucine or adipyl serine to identify acylase mutants acting on adipic acid instead of glutaric acid side-chains in cephalosporins. 3.3 Group 3
A P450 monooxygenase from Pseudomonas putida primarily catalyzes the hydroxylation of camphor (P450cam ) utilizing the cofactor NADPH. In contrast, hydroxylation of naphthalene (Scheme 2, 3) occurs only to a minor extent. Coexpression of P450cam with horseradish peroxidase in E. coli in the presence of hydrogen peroxide allowed hydroxylation in the absence of NADPH via a “peroxide shunt” pathway. Moreover, under these reaction conditions, hydroxylated naphthalenes form fluorescent dimers amenable to digital image screening [35]. Thus, approximately 200 000 mutants obtained by epPCR and the staggered extension process (StEP) [36] were easily screened for increased fluorescence suggesting enhanced activity of P450cam . In addition, mutations could be identified having altered regiospecificities. Thus variants of P450cam were generated showing ∼20-fold higher activity compared to the wild-type as well as mutants catalyzing the formation of dihydroxy compounds [37]. Phospholipase activity was introduced into a Staphylococcus aureus lipase mutant previously created by site-directed mutagenesis, which displayed very low phospholipase activity. Next, directed evolution using epPCR and gene shuffling led to a substantial increase in the phospholipase/lipase activity ratio [38]. Mutants
9
10 Evolutionary Generation of Enzymes with Novel Substrate Specificities
were first screened in a qualitative agar plate assay followed by high-throughput chromogenic assay based on a thio-derivative of a phospholipid. Upon phospholipase activity, the free SH-group was reacted with 5,5 -dithiobis(2-nitrobenzoic acid) yielding 5-thio-2-nitrobenzoic acid amenable to spectrophotometric quantification. The best variant contained six mutations and displayed a 11.6-fold increase in phospholipase activity and a 11.5-fold increased phospholipase/lipase ratio compared to the starting mutant. Recently, similar findings were reported for a lipase from Bacillus thermocatenulatus expressed in E. coli [39]. A single round of random mutagenesis followed by screening of 6000 transformants on egg-yolk identified three variants with ∼10–12-fold phospholipase activity. Further rounds of random mutagenesis gave no improvement, but site-directed mutagenesis (Leu353Ser) gave a 17-fold enhanced phospholipase activity. The substrate specificity of a cytochrom c peroxidase (CCP) from Saccharomyces cerevisiae towards guaiacol (Scheme 2, 4) was increased 300-fold by means of DNAshuffling [40]. Screening was performed by spreading colonies onto nitrocellulose membranes. Mutants showing enhanced activity towards guaiacol were identified by visual inspection of fast-staining colonies due to a brownish color based on the formation of tetraguaiacol. Interestingly, sequencing revealed that the – in the super-family of peroxidase – fully conversed residue Arg48 was changed to His. The authors believe that this mutation plays a key role in the increase in activity toward the phenolic substrate guaiacol. Wild-type aldolase from E. coli poorly accepts non-phosphorylated D-2-keto-3deoxygluconate (KDG). Several rounds of epPCR and shuffling identified mutants with significantly enhanced activity towards KDG [41]. The best mutant showed a 70fold increased catalytic efficiency with concomitant decrease in the activity towards the phosphorylated analog. Assaying was performed in microtiter plates based on reduction of released pyruvate by lactate dehydrogenase yielding a decrease in NADH concentration. Furthermore, library screening identified mutants accepting L-glyceraldehyde as substrate, which is converted by the wild-type aldolase to only a negligible extent.
4 Conclusions
The examples provided in this Chapter demonstrate that directed evolution resembles a very useful tool to create enzyme activities hardly accessible by means of rational protein design (Table 1). Even if the desired substrate specificity is known from other biocatalysts – e.g. phospholipase A1 activity – the advantage of the directed evolution approach resides in the already achieved functional expression of a particular protein. Thus bottlenecks arising from the identification of enzymes by traditional screening and cultivation methods can be circumvented. In addition, directed evolution can dramatically reduce the time required for the provision of a suitable tailor-made enzyme, also because cloning and functional expression of the biocatalyst has already been achieved.
References
The striking examples by the Fersht and Sterner groups for the evolution of βα 8 -barrel enzymes also demonstrate that even functional and structural barriers between different enzymes can be overcome by means of directed evolution. Acknowledgment
I am especially grateful to Prof. Romas Kazlauskas (McGill University, Canada) and Dipl.-Chem. Erik Henke (Greifswald University) for their support. References 1 BORNSCHEUER, U. T., KAZLAUSKAS, R. J. (1999), Hydrolases in organic synthesis -regio- and stereoselective biotransformations, Weinheim: Wiley-VCH. 2 BORNSCHEUER, U. T. (2000), Enzymes in Lipid Modification, Weinheim: Wiley-VCH. 3 DRAUZ, K., WALDMANN, H. (1995), Enzyme catalysis in organic synthesis, Vol. 1&2 Weinheim: VCH. 4 FABER, K. (2000), Biotransformations in Organic Chemistry, Berlin: Springer. 5 PATEL, R. N. (2000), Stereoselective Biocatalysis, New York: Marcel Dekker. 6 WONG, C.-H., WHITESIDES, G. M. (1994), Enzymes in synthetic organic chemistry, Oxford: Pergamon Press. 7 LIESE, A., SEELBACH, K., WANDREY, C. (2000), Industrial Biotransformations, Weinheim: Wiley-VCH. 8 MILLER, C. A. (2000), Advances in enzyme discovery, Inform 11, 489–495. 9 BORNSCHEUER, U. T., POHL, M. (2001), Improved biocatalysts by directed evolution and rational protein design, Curr. Opin. Chem. Biol. 5, 137–143. 10 ARNOLD, F. H. (1998), Design by directed evolution, Acc. Chem. Res. 31, 125–131. 11 ARNOLD, F. H., VOLKOV, A. A. (1999), Directed evolution of biocatalysts, Curr. Opin. Chem. Biol. 3, 54–59. 12 ARNOLD, F. H. (1999), Unnatural selection: molecular sex for fun and profit, Engineering Science 40–50. 13 BORNSCHEUER, U. T. (1998), Directed evolution of enzymes, Angew. Chem. Int. Ed. 37, 3105–3108. 14 ARNOLD, F. H., MOORE, J. C. (1997), Optimizing industrial enzymes by
15
16
17
18
19
20
21
22
23
directed evolution, Adv. Biochem. Eng./Biotechnol. 58, 1–14. SCHMIDT-DANNERT, C., ARNOLD, F. H. (1999), Directed evolution of industrial enzymes, Trends Biotechnol. 17, 135–136. REETZ, M. T., JAEGER, K.-E. (1999), Superior biocatalysts by directed evolution, Topic Curr. Chem. 200, 31–57. PETROUNIA, LP., ARNOLD, F. H. (2000), Designed evolution of enzymatic properties, Curr. Opin. Biotechnol. 11, 325–330. TOBIN, M. B., GUSTAFSSON, C., HUISMAN, G. W. (2000), Directed evolution: the “rational” basis for “irrational”; design, Curr. Opin. Struct. Biol. 10, 421–427. SUTHERLAND, J. D. (2000), Evolutionary optimisation of enzymes, Curr. Opin. Chem. Biol. 4, 263–269. JAEGER, K. E., REETZ, M. T. (2000), Directed evolution of enantioselective enzymes for organic chemistry, Curr. Opin. Chem. Biol. 4, 68–73. REETZ, M. T. (2001), Combinatorial and evolution-based methods in the creation of enantioselective catalysts, Angew. Chem. Int. Ed. 40, 284–310. BORNSCHEUER, U. T., ALTENBUCHNER, J., MEYER, H. H. (1998), Directed evolution of an esterase for the stereoselective resolution of a key intermediate in the synthesis of Epothilones, Biotechnol. Bioeng. 58, 554–559. BORNSCHEUER, U., HERAR, A., KREYE L. WENDEL, V., CAPEWELL, A., MEYER, H. H., SCHEPER T. KOLISIS, F. N. (1993), Factors affecting the lipase catalyzed transesterification reactions of 3-hydroxy
11
12 Evolutionary Generation of Enzymes with Novel Substrate Specificities
24
25
26
27
28
29
30
31
32
esters in organic solvents, Tetrahedron: Asymmetry 4, 1007–1016. BORNSCHEUER, U. T., ALTENBUCHNER, J., MEYER, H. H. (1999), Directed evolution of an esterase: Screening of enzyme libraries based on pH-indicators and a growth assay, Bioorg. Med. Chem. 7, 2169–2173. HECHT, H. J., SOBEK, H., HAAG, T., PFEIFER, O., P´EE, K. H. v. (1994), The metal-ion-free oxidoreductase from Streptomyces aureofiaciens has an a/B hydrolase fold, Nature Struct. Biol. 1, 532–537. HENKE, E., BORNSCHEUER, U. T. (1999), Directed evolution of an esterase from Pseudomonas fluorescens. Random mutagenesis by error-prone PCR or a mutator strain and identification of mutants showing enhanced enantioselectivity by a resorufm-based fluorescence assay, Biol. Chem. 380, 1029–1033. JANES, L. E., KAZLAUSKAS, R. J. (1997), Quick E. A fast spectrophotometric method to measure the enantioselectivity of hydrolases, J. Org. Chem. 62, 4560–4561. ALTAMIRANO, M. M., BLACKBURN, J. M., AGUAYO C., FERSHT, A. R. (2000), Directed evolution of new catalytic activity using the a/B-barrel scaffold, Nature 403, 617–622. J¨URGENS, C., STROM, A., WEGENER, D., HETTWER, S., WILMANNS, M., STERNER, R. (2000), Directed evolution of a (αβ)8 -barrel enzyme to catalyze related reactions in two different metabolic pathways, Proc. Natl. Acad. Sci. USA 97, 9925–9930. STEVENSON, J. D., LUTZ, S., BENKOVIC, S. J. (2001), Retracing enzyme evolution in the (αβ)8 -barrel scaffold, Angew. Chem. Int. Ed. 113, 1906–1908. ZHANG, J. H., DAWES, G., STEMMER, W. P. (1997), Directed evolution of a fucosidase from a galactosidase by DNA shuffling and screening, Proc. Natl. Acad. Sci. USA 94, 4504–4509. STEFAN, A., RADEGHIERI, A., GONZALEZ VARAY RODRIGUEZ, A., HOCHKOEPPLER, A.
33
34
35
36
37
38
39
40
41
(2001), Directed evolution of β-galactosidase from Escherichia coli by mutator strains defective in the 3 →5 exonuclease activity of DNA polymerase III, FEBS Lett. 493, 139–143. LI, Q.-S., SCHWANEBERG, U., FISCHER, P., SCHMID, R. D. (2000), Directed evolution of the fatty acid hydroxylase P450 BM-3 into an indole-hydroxylating catalyst, Chem. Eur. J. 6, 1531–1536. SIO, C. F., LAAN, J. M. v. d., RIEMENS, A. M., QUAX, W. J. (2000), Altering substrate specificity of Pseudomonas SY77 glutaryl acylase towards adipyl-7-ADCA, Biotechnology 2000 meeting, Berlin JOO, H., ARISAWA, A., LIN, Z., ARNOLD, F. H. (1999), A high-throughput digital imaging screen for the discovery and directed evolution of oxygenases, Chem. Biol. 6, 699–706. ZHAO, H., GIVER L. AFFHOLTER, J. A., ARNOLD, F. H. (1998), Molecular evolution by staggered extension process (StEP) in vitro recombination, Nature Biotechnol. 16, 258–261. JOO, H., LIN, Z., ARNOLD, F. H. (1999), Laboratory evolution of peroxide-mediated cytochrome P450 hydroxylation, Nature 399, 670–673. KAMPEN, M. D. v., EGMOND, M. R. (2000), Directed evolution: From a staphylococcal lipase to a phospholipase, Eur. J. Lipid Sci. Technol. 102, 717–726. KAUFFMANN, I., SCHMIDT-DANNERT, C. (2001), Conversion of Bacillus thermocatenulatus lipase into an efficient phospholipase with increased activity towards long chain fatty acyl substrates by directed evolution and rational design, Protein Eng 14, 919–928. IFFLAND, A., TAFELMEYER, P., SAUDAN C. JOHNSSON, K. (2000), Directed molecular evolution of cytochrome c peroxidase, Biochemistry 39, 10790–10798. FONG, S., MACHAJEWSKI, T. D., MAK, C. C., WONG, C.-H. (2000), Directed evolution of D-2-keto-3-deoxy-6-phosphogluconate aldolase to new variants for the efficient synthesis of D- and L-sugars, Chem. Biol. 7, 873–883.
1
Design of Restriction Endonucleases with Novel Specificity Thomas Lanio, Albert Jeltsch, and Alfred Pingoud
Originally published in: Directed Molecular Evolution of Proteins. Edited by Susanne Brakmann, Kai Johnsson. Copyright ľ 2002 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30423-1
1 Introduction 1.1 Biology of restriction/modification systems
Restriction/modification systems (RM systems) can be regarded as an “immune” system to protect the host against foreign DNA, in particular bacteriophage DNA [1]. RM systems comprise two different enzyme activities directed against the same short, often palindromic, DNA sequence – (i) a methyltransferase activity that protects the host DNA by methylation of the recognition sequences and (ii) a restriction endonuclease activity, that cuts unmethylated recognition sequences usually present in foreign DNA (Fig. 1). Since erroneous cleavage of the host DNA would be highly toxic for the cell, restriction enzymes have evolved a remarkable specificity for their canonical recognition site. They are known to cleave “star” sites, which are DNA sequences differing only in one base pair from the canonical sequence, and also methylated recognition sites, by at least 5–6 orders of magnitude more slowly than their canonical recognition sequence [2–6]. With this specificity, restriction endonucleases are among the most specific enzymes known that do not employ an energy-dependent proofreading step. RM systems are very effective against bacteriophages and are ubiquitously distributed over the eubacterial and archaeal kingdoms. The observation that genes encoding RM systems may comprise up to 4% of the genome of a prokaryotic organism emphasizes the importance of these enzymes for their hosts [7]. In addition to their protective functions, the involvement of RM systems in recombination and transposition is still being discussed [8–10], as well as the concept that genes encoding these systems could be regarded as selfish genetic elements [11]. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Design of Restriction Endonucleases with Novel Specificity
Fig. 1 Biological function of restriction/modification systems: RM systems recognize and act on short palindromic DNA sequences. While the host genome is protected by the methyltransferase activity of the system, invading phage DNA is cleaved by the endonuclease activity.
1.2 Biochemical properties of type II restriction endonucleases
RM systems are usually divided into three different types (I, II and III), and then into subtypes, according to certain properties like cofactor requirement, subunit composition and mode of DNA cleavage. Among these three types, the type II restriction enzymes are the most important ones for molecular biology and biotechnology. Today, more than 3300 restriction endonucleases with more than 200 different specificities are known and their number is steadily increasing [12]. These enzymes have been studied over the last 30 years as model systems for specific protein-DNA interactions, in general, and enzymes acting on DNA, in particular (for reviews see [13, 14]). Besides many biochemical studies that were carried out to analyze the mechanism of DNA recognition and cleavage by these enzymes, crystal structures of 12 different type II restriction enzymes have been determined, in part in complex with DNA allowing for a detailed insight into the structure and function of this class of enzyme (for a review see [15]). The orthodox type II restriction endonuclease is a homodimer of around 2 × 30 kDa. It recognizes a palindromic sequence of 4–8 base pairs and cleaves both strands of its DNA substrate in the presence of Mg2+ ions within or directly adjacent to the recognition site, leading to 5 phosphate and 3 OH ends. Depending on the enzyme employed, the cleavage reaction results either in blunt ends (for example EcoRV), or sticky ends with a 5 overhang (for example EcoRI) or a 3 overhang (for example BglI, Fig. 2). An interesting subgroup is represented by the type IIS restriction endonucleases. These enzymes recognize an asymmetric site and cleave this sequence at a defined distance after dimerizing on the substrate (for example FokI). Restriction enzymes constitute one of the largest families of related enzymes known so far and therefore exemplify how natural evolution has generated many different specificities from one (or a few) ancestral protein(s). A detailed comparison
1 Introduction 3
Fig. 2 Cleavage of the specific recognition sites by the type II restriction endonucleases EcoRV, EcoRI and BglI: The cleavage reaction, which requires Mg2+ as cofactor, leads to 5 phosphate and 3 OH ends. While EcoRV cleavage results in blunt ends, EcoRI and BglI generate sticky ends with a 5 and 3 overhang, respectively.
of the structures and functions of these enzymes will not only improve our understanding of natural evolution but also guide design approaches. Information regarding all known RM systems are available at REBASE (www.rebase.neb.com). 1.3 Applications for type II restriction endonucleases
The discovery of restriction endonucleases was one of the key events leading to and being a prerequisite for the development of modern gene technology. These enzymes are indispensable tools for DNA analysis and cloning as well as for medical applications where the study of restriction fragment length polymorphisms (RFLPs) provided first insights into the genetic diversity of man as well as into the genetic basis of some diseases. Recently the use of restriction enzymes has been partly replaced by the PCR technique for certain applications but restriction enzymes still are the work horses in the molecular biology laboratory with a worldwide market of approximately 100–200 million US$. Today, an almost full set of enzymes with different specificities is commercially available for the 4 base pair and 6 base pair cutters (16 or 64, respectively), but only 9 out of 256 possible 8 base pair cutters were found. This under-representation of restriction enzymes with long recognition sites can be understood, because palindromic sites comprising 8 base pairs statistically occur only once every 65 536 base pairs and phage genomes usually are short (104 –105 base pairs). Therefore, a restriction enzyme with an 8 or even 10 base pair recognition site could not fulfill its biological role efficiently.
4 Design of Restriction Endonucleases with Novel Specificity
Restriction endonucleases having a recognition sequence of 8 or more base pairs, therefore, in general provide only a limited selective advantage for the bacterial host and in consequence to date only a few 8 base pair cutters were isolated from natural sources and most likely not many more will be found by screening bacterial strains. On the other hand these “rare cutters” with recognition sites longer than 8 base pairs would be very helpful for the manipulation of large DNA fragments such as human genes which are often larger than 5kb. The availability of such “rare cutters” would facilitate the generation of artificial chromosomes and perhaps make the manipulation of such constructs as easy as plasmid cloning is today. Moreover, such enzymes could be extremely useful for gene targeting and gene replacement, not only for experimental genetics but also for biomedical applications [16–19]. Currently, the only strategy available to introduce single cuts in large DNA fragments is the Achilles’ heel cleavage procedure. In this method, an enzyme is converted into a “rare cutter” by allowing specific cleavage only at one recognition site within a large substrate by protecting all other specific sites via methylation. This is achieved by incubating the DNA substrate with an oligodeoxyribonucleotide, which forms a triple-helix with the recognition site to be cleaved, thereby protecting the site. In the next step the DNA is incubated with the methyltransferase corresponding to the restriction endonuclease to be employed. The methyltransferase methylates all specific recognition sites of the plasmid except the single site protected by triple-helix formation. After inactivation of the methyltransferase the triple-helix is disrupted and the substrate is washed extensively. The single nonmethylated site is then cut by the restriction endonuclease. More recently instead of oligodeoxyribonucleotides, PNAs (peptide nucleic acids) have been used to improve triple-helix formation by sequence-specific strand invasion even at shorter recognition sequences. With PNA-assisted rare cleavage [20] a restriction enzyme was demonstrated to cut genomic DNA into a relatively small number of fragments in the range of Mbp. Instead of triplex-forming oligonucleotides or PNAs also other DNA binding protein (like transcription factors) may be used to protect one site from methylation. Employing this method it was for example, possible to cleave a single site in the yeast genome [21]. A major disadvantage of this method is the requirement for complete methylation of the substrate and, moreover, to prevent methylation of the single site of interest. In consequence the Achilles’ heel approach, in the absence of better methods, might be quite useful for DNA analysis, but not for the manipulation of whole genomes. 1.4 Setting the stage for protein engineering of type II restriction endonucleases
From what has been said above it is obvious that there is a great demand for restriction enzymes with long recognition sequences of 8 base pairs and more. In the last few years several projects were initiated towards achieving this goal by protein engineering, taking advantage of the fact that restriction endonucleases have not only been studied over the last 30 years but also introduced as model
2 Design of Restriction Endonucleases with New Specificities
systems for protein engineering in the late 1980s [22]. Whereas initial efforts were devoted to rational protein design, more recently the focus has shifted to directed evolution to change (or extend) the specificity of restriction endonucleases. In the present review we will summarize what has been achieved so far using these different approaches and present an outlook on future developments.
2 Design of Restriction Endonucleases with New Specificities 2.1 Rational design 2.1.1 Attempts to employ rational design to change the specificity of restriction enzymes A prerequisite for the rational design of proteins is detailed knowledge about the structure, or, even better, about the apoenzyme, the enzyme in the specific complex with its cognate substrate, and the enzyme in complex with competitive inhibitors (transition state analogs) (Fig. 3). Until now 12 different type II restriction endonucleases have been studied crystallographically (for a review see [15]). Of special interest are those studies which reveal the differences between the structures of the enzyme bound to its specific DNA, the enzyme bound to non-specific DNA, and the free enzyme. Even more valuable are crystallographic studies of enzymes complexed with slightly different (with respect to sequences flanking the recognition site) cognate and near-cognate (with respect to the recognition site) substrates. The restriction endonuclease EcoRW (recognition site: GATATC) is one of the best characterized type II restriction enzymes. For this enzyme the structure of the free enzyme [23, 24], the structure of the enzyme bound non-specifically to DNA [23], and the structure of the enzyme bound to various cognate substrates [23, 25–27] as well as the structure of a product complex [25] are known. In addition, structures of EcoRV variants bound to specific substrates [28] and of EcoRV bound to modified substrates [29] have been determined. All these different structures, and those of other restriction endonucieases, show that the recognition of the specific site and the discrimination between the specific and non-specific sites is achieved by multiple contacts between enzyme and substrates. These contacts are not only formed between the enzyme and the bases of the recognition sequence (direct readout) [30] but also between the enzyme and the phosphodiester backbone of the DNA (indirect readout) [31]. With detailed structural knowledge available it was suggested that it should be possible to produce enzymes with new specificities by altering some of these contacts. However, all mutagenesis studies carried out in particular with EcoRI and EcoRV revealed that none of the mutants generated by rational design displayed the desired change in specificity (for a review see [32]). Instead, the substitution of amino acid residues involved in substrate recognition usually resulted in a severe loss of activity. Obviously, a complex interwoven network in the protein-DNA interface is formed cooperatively, leading to the accurate
5
6 Design of Restriction Endonucleases with Novel Specificity
Fig. 3 Correlation of required mechanistic information, required structural information, importance of screening and number of possible enzyme variants in protein engineering by rational protein design and directed evolution.
positioning of the catalytic amino acid residues of both enzyme subunits, which explains the very high level of specificity and accuracy. Any manipulation in this network seems to lead inevitably to a strong reduction in the catalytic rate [33, 34] and/or a relaxation of specificity [35], as shown for EcoRI. The observation that it was not possible to change the substrate specificity of any restriction enzyme might be attributed to our limited understanding of the process of DNA recognition by enzymes for two reasons: (i) we do not fully understand the dynamic structures of the specific enzyme-DNA complexes, which means that we have no reasonable estimates for the energies associated with any of the observed contacts and, more importantly, we have no quantitative information regarding the influence of one contact on all other contacts, i.e. the cooperativity of the interactions, (ii) We do not have sufficient knowledge of alternative conformations of the enzyme-DNA complexes that might be adopted if one or more positions of the target sequence have been exchanged. Only a detailed understanding of these alternative conformations, which do not support catalytic activity, will allow the elucidation of the enthalpic and entropic origins of the extreme specificity of restriction endonucleases. However, the lack of success of rational design might also in part be due to the biology of these systems: restriction endonucleases, like all proteins, are synthesized by protein biosynthesis on the ribosomes in the cell. This process has a limited accuracy: about one wrong amino acid residue is incorporated per 103 –104 residues [36–38]. Assuming that there are 10 copies of a restriction enzyme in the cell (2 × 300 amino acid residues) more than 99 % of all cells will contain at least one restriction enzyme variant with at least one amino acid
2 Design of Restriction Endonucleases with New Specificities
exchange. Given the possibility that even one copy of an enzyme with an altered or relaxed specificity will cause severe problems, there must have been a strong evolutionary pressure to make restriction enzymes not only accurate but also secure in the sense that variants produced by errors in protein biosynthesis do not cleave an altered DNA sequence, i.e. a sequence unprotected by the methyltransferase of the given RM system. Rational design was applied with notable success only to the generation of variants which recognize and cleave modified substrates. For example, an EcoRI variant was obtained that had lost the ability to discriminate between thymine and uracil at the forth position of the EcoRI recognition sequence (GAAUTC and GAATTC) [39]. An EcoRV variant was produced in which a single enzyme-substrate contact was altered by site-directed mutagenesis leading to a 220-fold higher selectivity for GATAUC over GATATC modified substrates [40]. The largest alteration of the specificity of a restriction enzyme achieved to date was obtained with another EcoRV variant. This variant cleaves a modified substrate in which the GATATpC phosphate group is replaced by a methylphosphonate, with the same rate as the wild-type enzyme, which cleaves modified and non-modified substrate at almost the same rate. The variant, however, prefers the modified over the non-modified substrates by more than three orders of magnitude [41]. As none of the modifications targeted by these variants occurs in regular DNA, the described variants can only serve as the proof of principle that new enzyme-substrate contacts can be generated by rational design. Changing the target specificity of a restriction enzyme, however, would require the reengineering of 3–4 contacts of the enzyme to the DNA (co-crystal structure analyses of specific restriction enzyme-DNA complexes tell us that on average each base pair of the recognition sequence is involved in four specificity-determining contacts) against the evolutionary pressure that exists to prevent the generation of variants with new specificities by chance! 2.1.2 Changing the substrate specificity of type IIs restriction enzymes by domain fusion Type IIs restriction endonucleses are composed of a DNA recognition domain and a DNA cleavage domain. Whereas the DNA cleavage domain of FokI, a type IIs enzyme, has a typical restriction endonuclease fold [42], the structure of the DNA recognition domain is completely unrelated and resembles that of the catabolite activator protein [43]. The separation of recognition and cleavage in two different subunits can be employed to generate a chimeric nuclease with altered specificity by genetically fusing other non-related DNA binding domains to the FokI catalytic domain [44]. The most successful class of chimeric nucleases is based on the linkage of the cleavage domain from FokI and a DNA binding domain consisting of three zinc fingers [45]. Since each of the zinc fingers binds to three base pairs, the length and sequence of the DNA to be recognized can be addressed by modulation of the number and nature of the amino acids residues in contact with the DNA. Similar fusions have been prepared with transcription factors of the homeodomain [46] and Gal4-type [47]. Efficient double strand cleavage is only observed if two copies of the recognition sequence are in close proximity and in the correct orientation,
7
8 Design of Restriction Endonucleases with Novel Specificity
reflecting the requirement for dimerization of the cleavage domain of FokI [48]. While native FokI cleaves 9 and 13 base pairs away from its recognition site the dimerized cleavage domains sometimes produce cuts with overhangs other than four base pairs, which could be explained by the uncoupling of substrate recognition and catalysis. A potential application of these chimeric nucleases could be the stimulation of homologous recombination in vivo by introducing a double strand break and thereby producing recombinogenic ends [49]. Similar approaches have been explored for other non-specific nucleases, like DNasel. However, none of the alternative chimeric nucleases showed a similar degree of specificity as the FokI chimera although it was possible to change the cleavage preferences of DNasel significantly [50]. The reason for this difference might be that DNasel is a very active nuclease whereas the catalytic domain of FokI only has a low catalytic activity. Therefore, with DNasel DNA cleavage always occurs also at non-targeted sites, which does not happen as easily with FokI. 2.1.3 Rational design to extend specificities of type II restriction enzymes A much more promising approach than to employ rational design for the change of a specificity is the extension of existing specificities. As we know from the published co-crystal structures and biochemical studies, restriction endonucleases contact not only their recognition sequence but also some additional base pairs. In the case of EcoRV it was shown that, besides the contacts to the nucleotides of the recognition sequence, the protein interacts non-specifically with the backbone and the bases upstream and downstream of the GATATC site [23]. These contacts are largely water-mediated and are considered to be responsible for the dependence of the cleavage rate on the sequences flanking the recognition sequence. This flanking sequence effect suggested a promising starting point for a rational protein design project. It seemed to be possible to engineer new specific protein-DNA contacts outside of the GATATC site enabling the nuclease to recognize an extended site of up to 10 base pairs. This idea was put forward and tested by Schottler et al. [51]: they exchanged an amino acid residue involved in a water-mediated contact to the base on the 5 -side of the GATATC recognition sequence against all naturally occurring amino acid residues. Some variants, having amino acid residues with long or bulky side chains at that position show altered preferences, namely an extended specificity for PuGATATCPy (Fig. 4). More recently, on the basis of crystal structures obtained with EcoRV and oligonucleotides with a canonical site but different flanking sequences, amino acid residues which are involved in contacts to the flanking sequences were identified, and substitutions suggested which could possibly discriminate between different sequences [28]. However, the detailed biochemical characterization of the mutants led to disappointing results, because only one had a considerably altered selectivity, which, however, was not the predicted one [52]. Obviously it is very difficult to predict amino acid substitutions that would lead to an extended specificity of a restriction enzyme. This does not necessarily mean that additional specific contacts cannot be designed, but more likely that the starting point of the design is not well-defined. In most of the co-crystals obtained so far,
2 Design of Restriction Endonucleases with New Specificities
Fig. 4 Contacts of EcoRV to the base pairs flanking the recognition sequence. Exchange of Ala 181 (top) to Lys (bottom) leads to a variant with a strong preference for purine residues 5 to the recognition sequence [51].
the enzymes cannot be activated by Mg2+ ions soaked into the crystal. Thus, the structures do not really represent the specific complex as it is formed before DNA cleavage. Moreover, no structural information on the transition state is available with respect to the specificity determining step and in consequence, predictions regarding the effects of amino acid residue substitutions are unreliable. The lack of our understanding of the key steps of the different conformational changes that are required to couple recognition to catalysis, has so far prevented successful rational design, not only of enzyme variants with an altered specificity but also of those with an extended specificity. 2.2 Evolutionary design of extended specificities
A possible way to overcome these limitations might be a “non-rational” approach by repertoire selection methods which have been shown to be extremely useful for developing new enzyme specificities. With a combination of random mutagenesis and selection it was possible to change enzyme specificities for a wide range of different purposes, even in the absence of detailed structural information (Fig. 3). One of the most successful techniques proved to be the shuffling method which was
9
10 Design of Restriction Endonucleases with Novel Specificity
introduced by Stemmer in 1994 [53] leading to the molecular breeding approach, which was published a few years later by the same group [54]. In the last few years a variety of enzymes was improved regarding stability, solubility, substrate specificity and other properties using this and other methods of directed evolution [55], which all make use of large libraries of up to 109 –1010 different variants. To find the variants of interest, namely those with the desired improved or altered specificity, a reliable high-throughput assay or a stringent in vivo assay must be available (Fig. 3). Even more important is the coupling of phenotype to genotype in order to facilitate recovery of the genes of interest. This can be achieved in vivo, because in the cell the synthesis of each protein is instructed by the cellular DNA, or in vitro by using phage display [56], ribosome display [57] or coupled transcription/translation in water-oil emulsions [58]. The application of directed evolution approaches for the change or extension of the specificity of a restriction enzyme is hampered by the fact that an in vivo selection assay is not available and that examination of endonuclease activity in vitro usually requires purification of the enzymes to avoid background activity by other nucleases prevalent in all cells. This means that an altered or extended specificity can only be observed with sufficiently purified protein preparations, thereby unfortunately separating genotype and phenotype. As neither a reliable in vivo test nor the secretion of restriction endonucleases followed by a colony assay could be established in the past, experiments with larger libraries of these enzymes require a solid clone management, a fast and uncomplicated, nevertheless specific, purification scheme and a fast and robust assay method. Such an approach was chosen to extend the recognition sequence of EcoRV, for example from GATATC to GGATATCC or to AGATATCT. For this enzyme, the expression of His6 -tagged fusion proteins is established, enabling protein purification via affinity chromatography on a Ni2+ -NTA column or Ni2+ -NTA covered magnetic beads, and assays have been worked out to test for extended specificities using substrates with two differently flanked GATATC sites. As randomization of a whole gene results in libraries far too large to be screened in vitro, for EcoRV a structure guided semi-rational approach was employed. The cocrystal structure of EcoRV reveals different regions of the enzyme which are in close proximity (<0.5 mm) to the nucleotide directly flanking the recognition site on the 5 end. The three regions identified are between four and eight amino acid residues in length and located in different parts of the enzyme (Fig. 5). To limit the number of possible variants and to make an evolutionary approach possible, the amino acid residues in the three regions were randomized and screened for the desired specificity in three subsequent rounds of mutagenesis and selection, employing the genes of the selected variants of the first cycle as template for the second mutagenesis step and so forth (Fig. 6). For that purpose, a mutagenesis protocol was developed to ensure that each mutation cycle resulted in additional mutations instead of only accumulating variants that originated from the template pool [59]. Employing this method and screening more than 500 different variants, generated in three subsequent cycles, several variants were found which cleave differently flanked EcoRV recognition sites with more than 10-fold different rates. Compared to the wild-type enzyme, the
2 Design of Restriction Endonucleases with New Specificities
Fig. 5 Structure of a specific EcoRV-DNA complex. The regions subjected to mutagenesis as described in Lanio et al [59] are highlighted in red (amino acid residues 95 to 104), green (amino acid residues 180 to 184) and blue (amino acid residues 219 to 226). In B only the regions subjected to mutagenesis are displayed, together with the DNA.
11
12 Design of Restriction Endonucleases with Novel Specificity
Fig. 6 Schematic overview of the in vitro evolution cycle employed. The locations of the regions subjected to mutagenesis are shown in Fig. 5. The substrate employed to assay the activity and preference of the variants was a prelinearized plasmid harboring two differently flanked EcoRV
recognition sites (CGGGATATCGTC and AAAGATATCTTT). Both sites are cleaved by wild-type EcoRV with identical rates, such that two cleavage intermediates are formed. Mutants that prefer one of the sites produce both intermediates in unequal amounts.
selectivity for sites with differently flanked sequences has been increased by two orders of magnitude [60]. During the experiment not only the desired preferences increased, but also the number of variants displaying this preference. To further improve these results, the genes of the most successful variants were mixed and cleaved with two restriction endonucieases, thereby separating the three mutated regions. Religation of the fragments yielded recombinants with new combinations of beneficial mutations. Interestingly, some of the resulting variants had lost their extended specificity, indicating that individual beneficial substitutions can neutralize each other. However, this last shuffling step also resulted in EcoRV variants with further improved properties, namely variants which differ in their activity towards differently flanked GATATC sites by more than two orders of magnitude: the non-preferred site is only attacked after complete cleavage of the preferred AGATATCT site (Fig. 7). A careful look at the wild-type enzyme cocrystal structure did not reveal an obvious explanation for the effects of the different amino acid substitutions, neither for a single substitution itself, nor for the combination of several different substitutions. As these substitutions most probably affect the transition state whose formation is presumed to be the specificity determining step, even a cocrystal structure of EcoRV variants with differently flanked substrates might not really improve our ideas about how to extend specificity.
2 Design of Restriction Endonucleases with New Specificities
Fig. 7 Cleavage experiments with prelinearized substrate and the EcoRV variant with the most pronounced preference for AT flanked recognition sites (Y95H; K98E; E99V; N100T; S183A; Q224K). Substrate (4nM) (see Fig. 6) was incubated with enzyme (3 nM). Lane 0 = length marker, 1–13 = 1 , 3 , 5 , 6 , 10 , 30 , 60 , 90 , 120 , 150 , 170 —addition of new substrate to
exclude that the variant lost activity during the extended incubation time – 180 , 190 . Note that the substrate ABC is cleaved very rapidly into the two intermediate products AB and C – the CC-flanked recognition site in intermediate AB is only cleaved after complete cleavage of the preferred ATflanked recognition site, to give the final products A, B and C.
Although in the course of the experiments described only single and double mutants were introduced per cycle, the number of variants screened was several orders of magnitude lower than the number of variants generated. Since the bottleneck of the whole procedure was the isolation and purification of the enzyme variants, a high-throughput process was developed in order to allow parallel protein expression, purification and subsequent characterization of 96 protein variants in an automated protocol in micro-plate format [61] (Fig. 8). To test the reliability of the system three codons of the EcoRV gene were randomized using a spiked oligodeoxynucleotide yielding a library of about 1200 different clones. 1056 different clones, corresponding to an almost complete representation of the library, were inoculated and expressed in 96-well blocks. The purification of the proteins after cell lysis is based on the interaction between the His6 -tagged EcoRV variants and Ni2+ -NTA coated microplate allowing for the purification of 10 pmol protein per well. The functional characterization of the variants was performed with the assays previously described (Figs. 6 and 7). Since the whole process is done automatically by a pipetting robot it is possible to characterize 103 different clones within a week and 104 –105 clones in a reasonable period of time. In the relatively small library described several variants displayed a more than 10-fold extended specificity against differently flanked EcoRV recognition sites [61]. In the future, similar approaches will be made easier by the availability of an assay that is suited for high-throughput sample analysis [62]. Rather than using macromolecular DNA substrates and analyzing their cleavage by electrophoresis,
13
14 Design of Restriction Endonucleases with Novel Specificity
Fig. 8 Purification of 96 different enzyme variants in a highthroughput approach. Cells are grown, harvested and lysed in microplates. His6 -tagged variants are isolated by transferring the lysate to Ni-NTA coated microplates from which they are eluted after washing. Specific enzyme activity can be checked by suitable assays.
a 96-well microplate-based fluorescence assay has been developed employing two oligodeoxynucleotides with differently flanked recognition sites and labeled with different fluorescence markers. Before cleavage the fluorescence is quenched. Upon cleavage, the fluorescence of the preferentially cleaved substrate increases. Recording the fluorescence of the two differently labeled substrates allows a rapid automated analysis of variants with an extended specificity (Fig. 9).
3 Conclusions 15
Fig. 9 Schematic overview of a fluorescence-based cleavage assay for determination of cleavage preferences of restriction enzymes. The two substrates employed harbor differently flanked recognition
sites and carry different labels whose fluorescence is quenched before cleavage. Upon cleavage the fluorescence of the preferentially cleaved substrate increases.
3 Conclusions
In the past two decades different strategies have been employed to change or to extend the specificity of restriction enzymes. Since substrate recognition is highly redundant and recognition itself coupled to catalysis via a multitude of intra- and intermolecular interactions, which are not really understood, all rational design projects, by and large, have failed so far. The rare positive results concerned variants with preferences for artificial substrates. Taking these results into account rational design turns out to be ill-suited for the alteration of substrate specificity of restriction enzymes at present. Promising results were obtained with chimeric nucleases, in which combination of the cleavage domain of the type IIs restriction endonuclease FokI was fused to different DNA binding proteins. This technique was successfully employed to initiate homologous recombination in vivo, but since the uncoupling of recognition and catalysis leads to different single strand overhangs these chimeric endonucleases are not suitable for cloning experiments which require a precise double strand break at a defined position with defined overhangs. As this can only be achieved by restriction enzymes, the Achilles’ heel approach is the superior technique, especially for cloning experiments. The requirement for complete methylation of the DNA at all sites that should not be cleaved by the restriction enzyme, prevents the usage of this for DNA manipulation in vivo. The extension of recognition sequences of type II restriction endonucleases also turned out to be a more than difficult job. The function of the different contacts within the protein DNA interface is still poorly understood, as is the mechanism of how substrate recognition is coupled to catalysis. Presumably, the most serious deficit for attempts to alter the specificity of restriction enzymes is the lack of detailed knowledge of the transition state of the DNA cleavage reaction. In
16 Design of Restriction Endonucleases with Novel Specificity
consequence, one does not really know what the specificity determining step looks like and how the catalytic centers are activated during the recognition process. This lack of information could be bypassed by an evolutionary design approach which does not presuppose this information. Unfortunately, for directed evolution large libraries of clones have to be generated, which demands a stringent selection of the clones coding for the gene product with the desired property. As screening for restriction enzyme variants with extended cleavage specificities is not yet possible in vivo, a protocol for expression, purification and characterization of the different variants is required. This limits the size of the library. It was shown recently, however, that by employing nucleoside analogues in an error-prone PCR, even relatively small libraries of variants that carry many amino acid exchanges could yield the same variants that are otherwise only generated by gene shuffling [63]. Libraries comprising up to 105 different clones could be expressed and screened with recently established methods [61] and the high-throughput generation and characterization of restriction enzyme variants with extended specificities could be further improved. Acknowledgements
Research on restriction enzymes in the authors’ laboratory was supported by the Deutsche Forschungsgemeinschaft, the Bundesministerium fur Bildung, Wissenschaft, Forschung und Technologie, the European Union, and by the Fonds der Chemischen Industrie.
References 1 T. A. BICKLE, D. H. KRUGER, Microbiol. Rev. 1993, 57, 434. 2 D. R. LESSER, M. R. KURPIEWSKI, L. JEN-JACOBSON, Science 1990, 250, 776. 3 V. THIELKING, J. ALVES, A. FLIESS, G. MAASS, A. PINGOUD, Biochemistry 1990, 29, 4682. 4 J. ALVES, U. SELENT, H. WOLFES, Biochemistry 1995, 34, 11191. 5 M. NELSON, E. RASCHKE, M. MCCLELLAND, Nucleic Acids Res. 1993, 21, 3139. 6 L. JEN-JACOBSON, L. E. ENGLER, D. R. LESSER, M. R. KURPIEWSKI, C. YEE, B. MCVERRY, Emb J 1996, 25, 2870. 7 L. F. LIN, J. POSFAI, R. J. ROBERTS, H. KONG, Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 2740. 8 J. HEITMAN, Genet. Eng. 1993, 15, 57. 9 M. MCKANE, R. MILKMAN, Genetics 1995, 139, 35. 10 K. CARLSON, L. D. KOSTURKO, Mol. Microbiol. 1998, 27, 671.
11 T. NAITO, K. KUSANO, I. KOBAYASHI, Science 1995, 267, 897. 12 R. J. ROBERTS, D. MACELIS, Nucleic Acids Res. 2001, 29, 268. 13 R. J. ROBERTS, S. E. HALFORD, in Nucleases, 2nd edition (Eds.: S. M. LINN, R. S. LLOYD, R. J. ROBERTS), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 1993, pp. 35. 14 A. PINGOUD, A. JELTSCH, Eur. J. Biochem. 1997, 246, 1. 326 13 Evolutionary Generation versus Rational Design of Restriction Endonucleases with Novel Specificity 15 A. PINGOUD, A. JELTSCH, Nucleic. Acids Res. 2001, 29, 3705. 16 A. CHOULIKA, A. PERRIN, B. DUJON, J. F. NICOLAS, Mol. Cell. Biol. 1995, 15, 1968. 17 B. ELLIOTT, C. RICHARDSON, J. WINDERBAUM, J. A. NICKOLOFF, M. JASIN, Mol. Cell. Biol. 1998, 18, 93.
References 18 M. COHEN-TANNOUDJI, S. ROBINE, A. CHOULIKA, D. PINTO, F. EL MARJOU, C. BABINET, D. LOUVARD, F. JAISSER, Mol. Cell. Biol. 1998, 18, 1444. 19 C. RICHARDSON, M. E. MOYNAHAN, M. JASIN, Genes Dev. 1998, 12, 3831. 20 K. I. IZVOLSKY, V. V. DEMIDOV, N. O. BUKANOV, M. D. FRANK-KAMENETSKII, Nucleic Acids Res. 1998, 26, 5011. 21 M. KOOB, W. SZYBALSKI, Science 1990, 250, 271. 22 J. ROSENBERG, B. WANG, C. FREDERICK, N. REICH, P. GREENE, J. GRABLE, J. MCCLARIN, in Protein Engineering, (Eds.: D. OXENDER, C. F. FOX), AR. Liss Inc., 1987, pp. 237. 23 F. K. WINKLER, D. W. BANNER, C. OEFNER, D. TSERNOGLOU, R. S. BROWN, S. P. HEATHMAN, R. K. BRYAN, P. D. MARTIN, K. PETRATOS, K. S. WILSON, EMBO J. 1993, 12, 1781. 24 J. J. PERONA, A. M. MARTIN, J. Mol. Biol. 1997, 273, 207. 25 D. KOSTREWA, F. K. WINKLER, Biochemistry 1995, 34, 683. 26 N. C. HORTON, J. J. PERONA, J. Mol. Biol. 1998, 277, 779. 27 N. C. HORTON, J. J. PERONA, Proc Natl. Acad. Sci. U. S. A. 2000, 97, 5729. 28 N. C. HORTON, J. J. PERONA, J. Biol. Chem. 1998, 273, 21721. 29 N. C. HORTON, M. D. SAM, B. A. CONNOLLY, J. J. PERONA, Biophys. J. 2000, 78, 417A. 30 N. C. SEEMAN, J. M. ROSENBERG, A. RICH, Proc Natl. Acad. Sci. U. S. A. 1976, 73, 804. 31 Z. OTWINOWSKI, R. W. SCHEVITZ, R. G. ZHANG, C. L. LAWSON, A. JOACHIMIAK, R. Q. MARMORSTEIN, B. F. LUISI, P. B. SIGLER, Nature 1988, 335, 321. 32 A. JELTSCH, C. WENZ, W. WENDE, U. SELENT, A. PINGOUD, Trends Biotechnol. 1996, 14, 235. 33 J. ALVES, T. RTITER, R. GEIGER, A. FLIESS, G. MAASS, A. PINGOUD, Biochemistry 1989, 28, 2678. 34 R. GEIGER, T. RTITER, J. ALVES, A. FLIESS, H. WOLFES, V. PINGOUD, C. URBANKE, G. MAASS, A. PINGOUD, A. DUSTERHOFT, M. KROGER, Biochemistry 1989, 28, 2667. 35 H. FLORES, J. OSUNA, J. HEITMAN, X. SOBERON, Gene 1995, 157, 295. 36 R. B. LOFTFIELD, D. VANDERJAGT, Biochem. J. 1972, 128, 1353.
37 P. EDELMANN, J. GALLANT, Cell 1977, 10, 131. 38 J. PARKER, T. C. JOHNSTON, P. T. BORGIA, G. HOLTZ, E. REMAUT, W. FIERS J. Biol. Chem. 1983, 258, 10007. 39 A. JELTSCH, J. ALVES, T. OELGESCHLAGER, H. WOLFES, G. MAASS, A. PINGOUD J. Mol. Biol. 1993, 229, 221. 40 C. WENZ, U. SELENT, W. WENDE, A. JELTSCH, H. WOLFES, A. PINGOUD, Biochim. Biophys. Acta 1994, 1219, 73. 41 T. LANIO, U. SELENT, C. WENZ, W. WENDE, A. SCHULZ, M. ADIRAJ, S. B. KATTI, A. PINGOUD, Protein Eng. 1996, 9, 1005. 42 D. A. WAH, J. A. HIRSCH, L. F. DORNER, I. SCHILDKRAUT, A. K. AGGARWAL, Nature 1997, 388, 97. 43 S. C. SCHULTZ, G. C. SHIELDS, T. A. STEITZ, Science 1991, 253, 1001. 44 S. CHANDRASEGARAN, J. SMITH, Biol. Chem. 1999, 380, 841. 45 J. SMITH, M. BIBIKOVA, F. G. WHITBY, A. R. REDDY, S. CHANDRASEGARAN, D. CARROLL, Nucleic Acids Res. 2000, 28, 3361. 46 Y. KIM, S. CHANDRASEGARAN, Proc Natl. Acad. Sci. U. S. A. 1994, 91, 883. 47 Y.-G. KIM, J. SMITH, M. DURGESHA, S. CHANDRASEGARAN, Biol. Chem. 1998, 379, 489. 48 J. BITINAITE, D. A. WAH, A. K. AGGARWAL, I. SCHILDKRAUT, Proc Natl. Acad. Sci. U. S. A. 1998, 95, 10570. 49 M. BIBIKOVA, D. CARROLL, D. J. SEGAL, J. K. TRAUTMAN, J. SMITH, Y. G. KIM, S. CHANDRASEGARAN, Mol. Cell. Biol. 2001, 21, 289. 50 S. CAL, K. L. TAN, A. MCGREGOR, B. A. CONNOLLY, EMBO J. 1998, 17, 7128. 51 S. SCHOTTLER, C. WENZ, T. LANIO, A. JELTSCH, A. PINGOUD, Eur. J. Biochem. 1998, 258, 184. 52 T. LANIO, A. JELTSCH, A. PINGOUD, Protein Eng. 2000, 13, 275. 53 W. P. STEMMER, Nature 1994, 370, 389. 54 A. CRAMERI, S. A. RAILLARD, E. BERMUDEZ, W. P. STEMMER, Nature 1998, 391, 288. 55 F. H. ARNOLD, P. L. WINTRODE, K. MIYAZAKI, A. GERSHENSON, Trends Biochem. Sci. 2001, 26, 100. 56 T. CLACKSON, H. R. HOOGENBOOM, A. D. GRIFFITHS, G. WINTER, Nature 1991, 352, 624.
17
18 Design of Restriction Endonucleases with Novel Specificity
57 J. HANES, A. PLU¨ CKTHUN, Proc. Natl. Acad. Sci. U. S. A. 1997, 94, 4937. 58 D. S. TAWFIK, A. D. GRIFFITHS, Nat. Biotechnol. 1998, 16, 652. 59 T. LANIO, A. JELTSCH, Biotechniques 1998, 25, 958. 60 T. LANIO, A. JELTSCH, A. PINGOUD, J. Mol. Biol. 1998, 283, 59.
61 T. LANIO, A. JELTSCH, A. PINGOUD, Biotechniques 2000, 29, 338. 62 K. EISENSCHMIDT, Diploma Thesis, Justus-Liebig Universitat (Giessen), 2000. 63 M. ZACCOLO, E. GHERARDI J. Mol. Biol. 1999, 285, 775.
1
S. Dedeblatt
Stefan Lutz, and Stephen J. Benkovic* The Pennsylvania State University, University Park, USA
Orignally published in: Directed Molecular Evolution of Proteins. Edited by Susanne Brakmann, Kai Johnsson. Copyright ľ 2002 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30423-1
1 Introduction
Proteins are involved in virtually every aspect of a living organism, yet they are composed of only twenty natural amino acids. The protein’s tremendous diversity originates from its structural and functional versatility, achieved through polymerization of the chemically distinct amino acid building blocks into sequences of varying composition and length. Upon folding into complex three-dimensional structures, these macromolecules turn into molecular machines with the ability to accelerate even the most complex chemical reactions by many orders of magnitudes with unsurpassed selectivity and specificity. The superior performance of proteins in respect to functional diversity, specificity, and efficiency has attracted scientists and engineers alike to create tailormade catalysts by protein engineering. Customizing a protein towards practical applications such as industrial catalysts, biosensors, and therapeutics offers many economic incentives. At the same time, protein engineering can address fundamental questions of protein design in respect to their structure and function. This can then lead to the development of new technologies facilitating the more directed exploration of protein sequence space. A readily available and abundant resource for examples on how to engineer proteins is Nature itself. The mechanisms of natural protein evolution have repeatedly succeeded in adapting protein function to ever changing environments. Understanding the principles and motives whereby new protein structures and functions emerge can provide useful guidelines for protein engineering. This chapter is directed towards the mechanisms of natural protein evolution, their implications on structure and function, and their experimental implementation in vitro.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 S. Dedeblatt
2 Mechanisms of Protein Evolution in Nature
Two fundamental approaches can be employed to generate novel proteins; i) the de novo arrangement of small, structured peptides or ii) the remodeling of an existing protein framework. In computational simulations, the de novo assembly pathway demonstrated its effectiveness for the creation of novel functionality in proteins [1]. However, successful in vitro experiments are lacking and evidence for the occurrence of such a pathway in Nature has been limited to a family of antifreeze glycoproteins, found in arctic and antarctic fish ([2] and references therein). Having an Ala-Ala-Thr tripeptide (with minor sequence variations) as the fundamental building blocks, the polymeric structures presumably evolved by repeated duplication of a 9-basepair segment. Although antifreeze proteins are believed to have emerged during the geologically recent glaciation period, de novo assembly per se lost its significance after the early days of protein evolution. The sparse distribution of functionality in sequence space makes the repeated de novo creation of complex structures very inefficient and extremely difficult if not impossible within the given time constrains (the last 4 billion years). Instead, remodeling of existing protein scaffolds represents a more efficient way to evolve novel functions and properties. Minor changes to explore the local sequence space by mutagenesis of individual positions within an established structure have proven highly successful in accommodating small environmental changes and in optimizing existing functions. The efficiency of such an approach is exemplified by the fine-tuning of antibody binding specificity by hyper-mutation of a set of precursors. Similarly, rationally designed single-amino acid changes in isocitrate dehydrogenase [3], lactate dehydrogenase from Trichomonas vaginalis [4], and T4 lysozyme [5] each resulted in a specific alteration in substrate specificity. Finally, a several-fold improvement of thermostability of subtilisin was observed upon introduction of two point mutations [6]. On the other hand, major rearrangements of proteins through duplication, circular permutation, fusion, as well as deletion and insertion of large fragments create entirely novel structural combinations with potentially novel function [7] (Fig. ??). Such radical reorganization of genetic material results in the exploration of extended sequence space and accelerated evolution. 2.1 Gene duplication
Reorganization, optimization, and development of novel function by duplication of genetic material is one of the fundamental mechanisms of protein evolution. Evidence for gene duplication can be found in small repetitive structures such as the previously discussed antifreeze proteins up to entire chromosomes and genomes. The unexpectedly high estimates of duplication frequency in eukaryotes, based on the analysis of several genomic databases, underline its significance in evolutionary terms [8]. Mechanistically, gene duplication can originate during chromosomal DNA replication or as a result of unequal crossing-over during
2 Mechanisms of Protein Evolution in Nature
Fig. 1 Schematic overview of mechanisms for major structural reorganization during protein evolution. (Adapted from [49])
meiosis [9]. Depending on the mechanism, either a tandem repeat or two isolated copies of the parental sequences are generated (Fig. ??). In both cases, the copies are confronted with three possible outcomes: one of the two genes is i) inactivated by accumulation of mutation, ii) released from functional constrains which facilitates the accumulation of local mutations and leads to functional differentiation, or iii) both gene products maintain wild-type activity [8]. While gene inactivation has no obvious beneficial effect for the host organism, the presence of multiple functional copies of the same gene can amplify its expression level which may be advantageous by accelerating metabolic flux or increasing production of defensive compounds [10]. Finally, negotiating the fine line between complete inactivation and conservation of wild-type activity, duplication followed by functional differentiation can lead towards the evolution of novel function. However, considering the sensitivity of protein structure and function to mutagenesis, divergence of the duplicate by random drift most likely results in enzyme inactivation. The acquisition of a different function in contrast would be a relatively rare event. A recent modeling study, however, suggested that the accumulation of complementary degenerative mutations in both gene duplicates could actually benefit the host by facilitating the long-term preservation of the duplicated genes. The extended conservation time of both copies increases the probability of evolving novel function [11]. An alternative view of the evolution of functionally novel proteins includes generecruitment [12] or gene sharing [13]. In this model, a period of bifunctionality of a single gene product precedes the duplication event upon which one copy maintains the ancestral function while the other adopts the new function. The
3
4 S. Dedeblatt
various aspects of bifunctionality can hereby range from conservation of structural properties as demonstrated for crystallins [14] to what O’Brian and Herschlag [15] call promiscuous activity in ancestral structures. Independent of the actual mechanism, functional divergence between two proteins with a common ancestor is exemplified in the βα-barrel structures of N-(phosphoribosyl-formimino)-aminoimidazole-carboxamide ribonucleotide isomerase (HisA) and imidazole glycerol phosphate synthase (HisF) of the histidine biosynthetic pathway [16]. The hypothesis of a common origin of the two enzymes was formulated based on sequence comparison [17] and has recently found support by the identification of extensive structural similarities [18]. 2.2 Tandem duplication
In contrast to locally separated duplication products, tandem duplication of a gene results in the expression of two or more copies of the gene product within a single polypeptide chain (Fig. ??). Such multiplicity has, as reflected by the high frequency of its natural occurrence, clear advantages [7]: Ĺ Ĺ Ĺ Ĺ Ĺ
increased stability, new cooperative functions, formation of binding sites in a newly-formed cleft growth of long repetitive structures such as fibrous proteins, and production of multiple binding sites in series that produce either more efficient or more specific binding effects.
Furthermore, tandem repeats can undergo circular permutation and thereby reorganize the structural and functional integrity of the parental sequence. 2.2.1 Proteases Often-cited examples for tandem duplications are pepsin, a member of the aspartate protease family [19], and chymotrypsin, representative of the trypsin-like protease family [20]. Structurally, both enzymes consist of two homologous, stacked domains, β-sheets for pepsin and β-barrels for chymotrypsin, with the active site residues located at the domain interface. Despite extensive sequence divergence between the individual domain pairs, their overall backbone structures are mutually superimposable, strongly suggesting a single-domain homodimeric ancestor [21]. Direct evidence for the feasibility of such a primordial enzyme is provided by homodimeric aspartic proteases, found in retroviruses [22]. Similarly, the homodimeric β-barrel structure of the serine protease from cytomegalovirus, a member of the herpes virus family, may resemble an evolutionary precursor of modern serine proteases. Although crystallographic analysis clearly indicates no direct relation to the latter as revealed by significant differences in structural arrangement and the catalytic triad [23], the underlying design concept of using a two-barrel structure to form a functional enzyme is identical. Interestingly, more thorough analysis of the
2 Mechanisms of Protein Evolution in Nature
individual domains for both model structures have recently led to the suggestion that they themselves originated from multiple duplication events traceable back to five- and six-residue peptides [20]. 2.2.2 βα-barrels Another important structure that presumably arose by tandem duplication is the (βα)8 -barrel. Being the most commonly used structural motif in nature, it is found in approximately 10% of all known enzyme structures [24]. The prototypical (βα)8 barrel consists of eight serial strand-turn-helix-turn repeats, the β-strands forming an interior parallel β-sheet through a continuous hydrogen bonding network, and the helices forming the surface. Studying the symmetric nature of the barrel itself, multiple sequence alignments of the previously introduced gene products of hisA and hisF suggested the presence of an internal repetition within the fold [17]. Based on this data, the authors proposed that the (βα)8 -domain arose by tandem duplication of a (βα)4 -unit (Fig. ??). The validity of such a gene duplication hypothesis was recently examined from a structural perspective [18] and experimentally on HisF from Thermotoga maritima [25]. After bisection of the enzyme into two symmetrical (βα)4 -domains and expression of these in E. coli, the two subunits folded independently into their correct conformations. Compelling evidence in support of the hypothesis was found when coexpression of the two (βα)4 -units resulted in self-association into a functional heterodimer with activities similar to wild type. Although speculative, it has been suggest that the (βα)4 -barrel itself may have originated from even smaller elements such as individual (βα)-units ([26] and references therein). Experimentally, the rearrangement and substitution of (βα)-units in the context of the barrel structure are tolerated [27, 28].
Fig. 2 Schematic illustration of the ($")8 barrel structure. The highly symmetrical structure presumably arose by multiple gene duplication events from a $"-fragment. This hypothesis was tested using reverse engi-
neering. Bisection of the barrel into two halves yielded correctly folded fragments which, upon heterodimerization, regained parental function. However, further fragmentation awaits experimental exploration.
5
6 S. Dedeblatt
2.3 Circular permutation
Circular permutation of a protein results in the relocation of its N- and C-termini within the existing structural framework. Initiated by a tandem duplication of a precursor gene, one mechanistic model proposes an in-frame fusion of the original termini, followed by the generation of a new start codon in the first repeat and a termination site in the second. In support of the model, tandem duplications are observed in prosaposins [29] and DNA methyltransferases [30], both genes for which circular permutated variants are also known. Initial evidence for the structural diversification of the βα-barrel fold by circular permutation was found in sequence alignments of members of the α-amylase super-family [31]. More recently, analysis of crystallographic data of transaldolase B from E. coli suggested that the enzyme was derived from circular permutation of a class I aldolase [32]. In either case, the shift of the two N-terminal βα-repeats (plus the β-strand of the third subunit for amylases) onto the C-terminus resulted in no apparent functional changes. In general, the evolutionary advantages of circular permutation are not clear. However, in vitro experiments have revealed that although the final structure of a circular permuted protein is by and large the same, the rearrangement of the subunits changes the folding pathway and can affect the protein’s conformational stability [33]. The potential benefit of such structural rearrangements from an engineering perspective was demonstrated by the increased anti-tumor activity of a circular permuted interleukin 4-Pseudomonas exotoxin fusion construct, resulting from decreased steric interference between the toxin and the interleukin-binding site [34]. 2.4 Oligomerization
Noncovalent association of homo- and heterologous proteins plays an important role in enzyme function [35]. Oligomerization can modulate a protein’s activity as demonstrated in allosteric enzyme regulation of hemoglobins [36] or by the dimer activation of transcription factors [37]. In addition, the agglomeration of structures at exposed hydrophobic surfaces increases the thermodynamic stability of the participating elements and can create cavities and grooves at the protein-protein interface that can accommodate substrate or regulator binding sites. Along the same line, oligomerization can be viewed as a primitive way of substrate shielding in an enzyme’s active site. A special case of oligomerization is the formation of homodimers by 3D-domain swapping, a term easily confused with “domain swapping”. As shown in Fig. ??, the latter is a synonym for domain recruitment and describes the engineering of proteins by substitution of either gene fragments that encode for individual domains or, in the case of inteins, discrete protein fragments. This involves the breakage and reformation of covalent bonds in the peptide backbone. In contrast, 3D-domain swapping describes the structural reorganization between identical
2 Mechanisms of Protein Evolution in Nature
Fig. 3 Graphical distinction between domain swapping/recruitment and 3D-domain swapping.
multi-domain proteins without breakage of covalent bonds. In the process, a portion of one protein is swapped with the same unit of an identical protein, forming an intertwined dimer or higher order oligomers. Several examples of naturally occurring 3D-domain swapping have been reviewed by Schlunegger et al. [38]. The exchanged structure elements range from secondary structures to entire domains and, although the advantages of 3D-domain swapping are not fully understood, one can envision regulatory purposes as well as increased structural stability [39].
2.5 Gene fusion
Fusion of entire genes, which translates into covalently linked multifunctional enzyme complexes, is a more conclusive commitment towards cooperativity between individual enzymes than non-covalent oligomerization. Often found for enzymes that catalyze subsequent steps in biosynthetic pathways, gene fusions can provide stability, advanced regulatory function, substrate channeling, the ability to direct concerted expression, and entropic advantages, giving the host organism a selective advantage. Multifunctional gene fusion constructs in both the pyrimidine and purine biosynthetic pathways have been discussed in detail elsewhere [40]. Additional examples have been identified in the biosynthetic pathways for histidine [16] and the aromatic amino acids. In the latter, chorismic acid is the key intermediate from which tryptophan, as well as tyrosine and phenylalanine biosynthesis diverges. In E. coli, two bifunctional complexes catalyze the conversion of chorismic acid via the prephenate intermediate into the respective α-ketoacid precursor of tyrosine and phenylalanine. While the tyrosine precursor is synthesized by a chorismate mutase (CM) fused to prephenate dehydrogenase (CM-PDG), phenylpyruvate is produced by a CM-prephenate dehydratase complex (CM-PDT). Either complex is reportedly susceptible to feedback inhibition by its final product [41, 42]. After bisection of the fusion construct, the individual monomers remain functional, excluding the necessity of gene fusion for the purpose of substrate channeling. Instead, allosteric control seems the driving force behind dimer formation [41, 43]. These
7
8 S. Dedeblatt
findings agree with experimental data from the bisection of N-acetylglucosamino1-phosphate uridyltransferase, an enzyme complex that catalyzes the consecutive acetylation and uridylation of glucosamino-1-phosphate. While both enzymes are essential in vivo, their cleavage shows no apparent effect on kinetics of the individual enzymes, the pathway’s overall flux, nor the organism’s phenotype [44]. In contrast, enhanced substrate processing was observed for rationally designed β-galactosidase – galactose dehydrogenase and galactose dehydrogenase-bacterial luciferase fusion constructs [45, 46]. 2.6 Domain recruitment
The generation of functional versatility in proteins by reorganization of individual protein subunits or domains is summarized under the term “domain recruitment”. The idea behind domain recruitment has been vividly depicted by Ostermeier and Benkovic’s adaptation of the clich´e “a thousand monkeys typing at a thousand typewriters would eventually reproduce the works of Shakespeare” [40]. In reference to domain recruitment, these authors suggest that the writing would obviously work much more efficiently if the monkeys, once managing to write entire words or sentences, could duplicate these with a single keystroke. Evidence for domain recruitment has been identified in a wide variety of proteins [47], mechanistically ranging from simple N- or C-terminal fusion to multiple internal insertions and possibly circular permutations [48]. A recent analysis of proteins in the protein structure database (PDB) has further indicated that structural rearrangements as a result of domain shuffling have significantly contributed to today’s functional diversity [49]. A brief overview of the various modes of domain recruitment and their effects on function, is presented on examples of βα-barrel structures. Substrate specificity The family of glycosyl hydrolases utilizes a variety of oligo- and polysaccharides. Preserved across the family, the active site is located in a wide-open groove at the C-terminus of the βα-barrel core structure. Substrate specificity between the family members is modulated through additional domains, which effectively close down the active site, adjusting its size and shape towards the individual substrates. In lacZ from E. coli, four “added” domains were identified, orchestrated around the central βα-barrel [50]. These add-ons all represent individual folding units with distinct structures; jelly-roll barrel (domain 1), fibronectin III fold (domain 2 & 4), and β-sandwich (domain 5). Functionally, domain 1 was found to directly participate in shaping the active site pocket while domain 2 & 4 act as domain linker to correctly position domain 1 and 5. Although the role of domain 5 is less clear, it seems to contribute to substrate binding and may be relevant to oligomerization. In contrast to the above examples, substrate specificity can also be acquired by deleterious modification of the protein framework. The absence of two consecutive α-helices in the barrel structure of endo-β-N-acetylglucosaminidase F1 (Endo F1 )
2 Mechanisms of Protein Evolution in Nature
was first reported by Van Roey et al. [51]. While not affecting the structural or functional integrity of the protein, the unusual feature was interpreted as important to binding of the bulky glycoprotein substrates and their subsequent deglycosylation. Similar alterations of the βα-barrel core structure, although to a lesser extent, were observed in family relatives of Endo F1 and could also be attributed to substrate binding [52].
Regulation The catalytic activity of porcine pancreatic α-amylase, another member of the glucosyl hydrolase family, is metal cofactor-dependent [53]. A calcium ion-binding site is located at the interface of an antiparallel β-sheet, inserted in one of the loop regions of the βα-barrel, and the core structure. In the calcium-bound state, the insertion stabilizes the substrate-binding site, and indirectly constrains part of the active site in a catalytically competent conformation. In phosphoenolpyruvate carboxylase, structural analysis indicates the arrangement of the total 40 helices around the active site in the βα-barrel core structure [54]. The additional domains create a binding site for aspartate, an allosteric inhibitor of the enzyme. Furthermore, the highly conserved hydrophobic C-terminal extension of the enzyme presumably is critical for oligomerization.
Chemistry Flavocytochrome b2 from Saccharomyces cerevisiae, a member of the FMNdependent oxidoreductase superfamily, catalyzes the two-electron oxidation of lactate to pyruvate with subsequent electron-transfer to cytochrome c via the bound flavin [55]. What distinguishes the enzyme from other family members is the N-terminal fusion of a heme-binding domain to the βα-barrel structure, which hosts the primary active site. Rather than dumping the electrons from the reduced flavin hydroquinone onto molecular oxygen, they are transferred intramolecularly to the heme-binding domain and from there in a second intermolecular step to cytochrome c.
2.7 Exon shuffling
Structure analysis of several proteases involved in blood coagulation and fibrinolysis reveals a diverse, sometimes repetitive, assembly of discrete protein modules (Fig. ??) [56]. While these modules represent independent structural units with individual folding pathways, their concerted action contributes to function and specificity in the final protein product. On the genetic level, these individual modules are encoded in separate exons. Over the course of modular protein evolution, new genes are created by duplication, deletion, and rearrangement of these exons. Mechanistically, the exon shuffling actually takes place in the intervening intron sequences (intronic recombination – for further details see [10]).
9
10 S. Dedeblatt
Fig. 4 Schematic illustration of modular protein evolution, exemplified on proteases, involved in blood coagulation and fibrinolysis. Varying type and number of modules, fused onto a common protease unit, generates a family of highly specific hydrolytic enzymes. (Adapted from [56])
The significance of exon shuffling to protein evolution, in particular in respect to the development of multicellularity, is signified by a short inventory of processes involving proteins created by modular assembly. Exon shuffling facilitates the construction of proteins involved in regulation of blood coagulation, fibrinolysis, and complement activation, plus most constituents of the extracellular matrix, cell adhesion proteins, and receptor proteins [10, 57].
3 Engineering Genes and Gene Fragments
In recent years, the protein engineering field has developed a series of in vitro methods that mimic natural processes in an attempt to improve existing enzymes and develop novel functions and properties. Particularly attractive in that respect have been modular approaches reminiscent of domain recruitment, gene fusion, and exon shuffling. By identifying and characterizing these individual modules or building blocks, protein dissection through fragmentation has substantially contributed to the fundamental understanding of protein design. Combined with rational and
3 Engineering Genes and Gene Fragments
combinatorial methods to facilitate the recombination of these building blocks, the exploration of structural and functional properties of proteins has not only produced many improved biocatalysts of industrial importance but also has provided insight to protein function and evolution. The following sections introduce a variety of protein engineering techniques and discuss their applications. 3.1 Protein fragmentation
Methods for protein fragmentation were initially developed to study protein folding mechanisms and pathways [58, 59]. Since, their application has been extended towards the elucidation of structural organization in proteins and their evolutionary origins, as well as towards the design of novel two-hybrid systems. The analysis of a protein’s structural organization by protein fragment complementation facilitates the identification of locations in the peptide backbone, permissible for cleavage without loss in function. Knowledge of such sites is valuable since protein engineering frequently requires the substitution and insertion of modules such as secondary structure elements and domains into an existing protein scaffold. Intuitively, one would assume that the manipulation of a scaffold in regions of lower organization such as loops and linkers is better tolerated by the system. Indeed, the analysis of laboratory data and the Protein Data Bank indicated that the majority of insertions and deletions are actually located in loops [60, 61]. Nevertheless some experimental systems have demonstrated surprising flexibility in the accommodation of insertion in defined structural elements [62] and tolerance towards bisection of the peptide backbone in functionally critical regions [63–65]. Experimentally, the search for permissive sites by protein fragmentation can be attempted by various methods. In the absence of structural data, random fragmentation patterns of proteins can be generated by limited proteolysis. Although biased towards surface loop regions, the partial digestion by chemicals, as well as serine and cysteine proteases has been applied to a variety of proteins [66]. More recently, combinatorial approaches that enable a comprehensive mapping of the entire protein structure were described in the literature. Manoli and Bailey reported the transposon-mediated random insertion of a 31 amino acid fragment into a protein scaffold [67]. Alternatively, a protein scaffold can be sampled for permissive sites by circular permutation [68, 69]. Finally, N- and C-terminal truncation of overlapping DNA regions, using Bal31 exonuclease [70, 71] or exonuclease III [72] can be used to search for bisection points. Furthermore, protein fragmentation has successfully been employed to investigate the evolutionary origin of protein structures and folds. Alternatively to evolutionary studies that focus on the formation of complex protein structures, the disassembly of an existing structural framework may provide valuable insight into the nature of evolutionary significant building blocks (Fig. ??). Termed reverse engineering of proteins, the method calls for the fragmentation of an existing monomeric protein into pieces that ideally possess distinct structural properties, meaning that they fold independently, and exhibit affinity towards oligomerization.
11
12 S. Dedeblatt
Probably one of the most extensively studied folds in that respect is the (βα)8 -barrel. Earlier introduced in the context of tandem duplication, protein fragmentation experiments have provided data in support of a (βα)4 -domain precursor [25]. More extensive digestion experiments of the (βα)8 -barrel were reported by Vogel et al. [73] and Ray et al. [66]. In both cases, digestion of triosephosphate isomerases with subtilisin resulted in multiple fragments which maintained proper folding and could be religated by the protease under nonaqueous conditions, reconstituting the catalytic activity. While providing valuable insight towards a better understanding of protein evolution, the bisection of proteins into functional heterodimers has also found practical applications to study protein-protein interactions. Heterodimeric variants of dihydrofolate reductase (DHFR), green fluorescent protein (GFP), GAR transformylase, and phosphotransferases were constructed to work in two-hybrid systems [64]. 3.2 Rational swapping of secondary structure elements and domains
Rational protein engineering aims to exchange and fuse secondary structure elements and domains at precise locations, based on predictions through rational means, to form a functional hybrid structure. Despite the great complexity involved in rational protein engineering, two outstanding experimental successes were recently reported for β-barrel and β/α-barrel structures. Engineering members of the serine protease family which structurally consist of two homologous, stacked β-barrels, Hopfner and coworkers designed a hybrid protease by fusing the N-terminal β-barrel from trypsin to the C-terminal β-barrel of factor Xa [74]. The location of the fusion point in the linker region between the two β-barrels was chosen following close inspection of the X-ray structures of trypsin and factor Xa. The resulting hybrid was shown to be fully functional as it hydrolyzed a broader but distinct spectrum of peptide substrates in comparison to the parental enzymes. Using rational protein engineering in combination with DNA shuffling on the βα-barrel scaffold, Altamirano et al. [75] demonstrated the successful conversion of indole-3-glycerol phosphate synthase (IGPS) to N-(5 -phosphoribosyl)anthranilate isomerase (PRAI). The two enzymes, catalyzing consecutive reactions in the tryptophan biosynthetic pathway, are thought to have evolved by gene duplication of a common ancestor. Although they share binding affinity towards a common intermediate, their sequence identity (22%) and chemistry is significantly different and no cross-reactivity is detectable. Based on initial modeling studies of the structural differences between the two enzymes, the authors deleted, substituted, and inserted gene fragments corresponding to secondary structure elements into the IGPS scaffold. While the resulting rational hybrid enzymes were virtually inactive, two subsequent rounds of fine-tuning” the structural arrangements by DNA shuffling resulted in hybrid enzymes with 90% sequence identity to IGPS, but PRAI-activity ten-fold higher than wild-type PRAI. The work is truly remarkable since neither of the two approaches alone could have generated the final
3 Engineering Genes and Gene Fragments
hybrid protein. Instead, the authors successfully combined two methods into a new design strategy, rationally rough-drafting the protein scaffold and combinatorially fine-tuning it. Generally, the rational construction and reorganization of hybrid enzymes relies on a comprehensive knowledge of structure and function of the proteins involved. The availability of such information for only a small percentage of proteins has so far limited the effective rational design on a wider range of targets. 3.3 Combinatorial gene fragment shuffling
Alternatively to rational design, combinatorial approaches resembling Darwinian protein evolution through random mutagenesis, recombination, and selection for the desired properties can be employed for protein engineering. These approaches do not require extensive structural and biochemical knowledge and their implementation has resulted in successful modifications and improvements of a wide diversity of targeted properties. In the last decade, a rapidly increasing number of genetic methods for generating combinatorial libraries of proteins have been reported. The early in vivo systems, using chemical and UV-induced random mutagenesis and E. coli mutator strains, have largely been replaced by in vitro methods. With the introduction of errorprone PCR [76], the first technique for rapidly generating large families of related sequences of a specific target gene became available. The next milestone was the successful demonstration of in vitro evolution of proteins by sexual PCR or DNA shuffling [77, 78]. DNA shuffling The directed evolution of proteins by random mutagenesis and DNA shuffling has proven extremely powerful to modify and optimize functional and physicochemical properties of existing proteins [6, 79–85] (for an up-dated list on targets see: www.che.caltech.edu/groups/fha/Enzyme/directed.html). Experimentally, the method starts off with the random fragmentation of a target gene(s) by DNasel treatment. The resulting mixture of double-stranded oligonucleotides, on average 20–100 basepairs in length, is then taken through several rounds of heat-denaturing and reannealing in the presence of a DNA polymerase. This iterative process generates a library of polynucleotides, consisting of fragments from various parental sequences. Today, multiple variations of Stemmer’s original protocol have been reported in the literature [86–90]. Despite the many successful applications of DNA shuffling, the methodology has one serious limitation. Sequence-dependent reassembly, the fundamental mechanism behind DNA shuffling, only works efficiently between sequences with >70 % sequence identity [91]. Below this threshold, DNA shuffling generates libraries biased towards regions of locally higher identity and reassembly of parental sequences becomes a dominant factor.
13
14 S. Dedeblatt
Fig. 5 Principle of family DNA shuffling, demonstrated on the identity network of three sequences. The thickness of the arrows corresponds to the pairwise sequence identities. The similarity of sequence A with C is low, disqualifying them from regu-
lar DNA shuffling. Addition of B acts as a bridging intermediate, having sufficient identity with either A or C to accommodate successful recombination of fragments from all three parents.
Family DNA shuffling or “molecular breeding” of multiple parental sequences lowers the approach’s boundaries of sequence-dependence [82, 83, 92]. In contrast to traditional DNA shuffling, this strategy is based on homologous recombination of multiple parental sequences, some with pairwise DNA sequence identities as low as 50%. The fundamental idea behind the approach is outlined in Fig. ?? on three sequences. Each member of the initial gene pool has a sufficiently high sequence identity to at least one other member. The homology between two specific sequences may be low, yet the remaining members of the pool, each of them with higher identities to either of the two, act as bridging elements and allocate the shuffling of elements from all parents. The experimental implementation of family DNA shuffling on the four amp C lactamases from Citrobacter freudii, Klebsiella pneumoniae, Enterobacter cloacea, and Yersinia enterocolitica was highly successful, resulting in the identification of hybrid proteins with up to 540-fold increased activity [82]. Interestingly, a detailed sequence analysis of the functional hybrids revealed the absence of amp C fragments from the Yersinia enterocolitica. Contrary to the argument that its presence may not be required under the applied selection conditions, a recent computational analysis of the system suggested that a lack of homology in critical regions of amp C from Yersinia prevented the sequence from participating in the shuffling in the first place [93]. This alternative explanation highlights the importance of careful design of family DNA shuffling experiments with particular emphasis not just on global but also local DNA sequence homologies of all participants. In recent years, a series of alternative methods to shuffle sequences of lowhomology has been proposed. Kikuchi et al. [89] took advantage of multiple restriction sites distributed along the gene sequence, which in combination with DNA shuffling added crossovers independent of sequence identity to these region. Others suggested the spiking of the DNase-treated fragment mixture with artificial, designed intermediates; chemically synthesized oligonucleotides that introduce mutations or crossovers at specific location in a sequence [94]. Homology-independent fragment swapping The limitations of DNA shuffling, arising from its dependency on sequence identity, have motivated the development of new technologies to facilitate an unbiased sampling of sequence space. At present, the reported approaches for creating
3 Engineering Genes and Gene Fragments
sequence-independent hybrid libraries between two parental sequences include a family of methods collectively known as Incremental Truncation for the Creation of Hybrid enzYmes (ITCHY) [72, 95, 96] and Sequence-Homology Independent Protein RECom-bination (SHIPREC) [91]. Although the final products of all methods are the same, comprehensive singlecrossover hybrid libraries that includes insertion and deletion sequences, the approaches employ different strategies of library generation (Fig. ??). The original ITCHY method [72] involves separate exonuclease III-treatment of the two parental genes (Fig. ??). The reaction conditions are chosen whereby the nuclease has a sufficiently low processivity (∼10 nucleotides/min), allowing the generation of libraries of every single-base pair deletion by quenching of aliquots over time. Following removal of the single-stranded portion of the DNA fragments, the N-terminal and the C-terminal fragment libraries are mixed and ligated to N
C C
N
a N
b
C
C N
N
C
C
N
C N C N C N N
C N
C
N N
C
N
N
C C C C N
N
C N
C N
C N
C N
C N
C N
C
parental size
Fig. 6 Fundamental mechanisms of homology-independent fragment swapping by a) ITCHY and b) SHIPREC. In the ITCHY protocol, the two parental genes are truncated independently, creating two separate libraries that correspond to the N- and Cterminal fragments. Ligation of the mixed
libraries then generates a comprehensive pool of hybrid sequences. SHIPREC on the other hand starts by limited DNasel digestion of the fused genes. Intramolecular fusion, followed by restriction digest at the original fusion point, results in sequence inversion and generates the hybrid library.
15
16 S. Dedeblatt
generate the single-crossover library. Following the same fundamental idea but being experimentally significantly faster and less laborious is the use of nucleotide analogs such as α-phosphothioate nucleotides to create ITCHY libraries [96]. Furthermore, a variation of ITCHY to bias truncation libraries towards parental sizes has been developed [95]. SHIPREC, an alternative approach to create sequence-independent singlecrossover libraries has been reported by Arnold and coworkers [91]. Utilizing an inverted fusion construct of the parental genes, partial digestion by DNase I cleaves N- and C-terminal fragments at random. Following a size selection of parental size fragments, the original termini are restored by sequence inversion (Fig. ??b). The potential value of these techniques becomes apparent upon analysis of structure diversity and functional selection of ITCHY and SHIPREC libraries. For the model system of glycinamide ribonucleotide transformylase from E. coli (PurN) and human (hGART), comparison of the diversity of the naive libraries obtained by ITCHY with DNA shuffling show the expected even distribution of crossovers over the sampled sequence for the former but a clustering in the central, highly homologous region for the latter [72]. Functional analysis of the ITCHY and SHIPREC libraries proves that interesting and catalytically active solutions for protein engineering can be found outside regions of high sequence identity. The main drawback of these described methods is their limitation to a single crossover per hybrid sequence. In contrast, DNA shuffling and related methods have produced up to 14 intersects per nucleotide sequence [90]. Increasing numbers of crossovers are desirable because they have been shown to improve the evolution of proteins with desirable characteristics. 3.3.1 Homology-independent fragment shuffling The creation of hybrid libraries with multiple sequence-independent crossovers through a combination of ITCHY and DNA shuffling (Fig. ??) was first proposed by Ostermeier et al. [97] and has recently been implemented experimentally [98]. Using the PurN/hGART model system, two complementary incremental truncation libraries (purN-hGART and hGART-purN) were generated (Fig. ??). In preparation for the subsequent DNA shuffling, these libraries were then pre-selected for in-frame and parental-size hybrid sequences (Fig. ??). Without in-frame selection, the probability of SCRATCHY library members encoding for nonsense sequences would increase exponentionally with the number of introduced crossovers. Furthermore, hybrid enzymes with large insertions are prone to protein folding problems while deletions of extended portions of a protein are likely to terminate enzymatic activity. The resulting libraries resemble large artificial “families” of intermediates, each carrying a particular crossover between the two parents. DNase I treatment of these families and subsequent reassembly by PCR results in the creation of naive hybrid libraries comprising crossovers from multiple “family” members (Fig. ??c). In experiments, analysis of the naive libraries revealed about 20 % of the members to have two and three crossovers. Their locations were randomly distributed over the entire expected sequence range and functional complementation of an auxotrophic
3 Engineering Genes and Gene Fragments N
C C
N
N
N
C
N N
C
N
N
C
N
C
N
C
N
a
C
C
N
C C
C
N
C N
N
C N
C N
C N
C N
C
C N
parental size
C
parental size
b N
C
N
C
N
C
N
C
N
C
N
C
N
C
N
C
c N C
C
N
N
C
N
C
N
C
N
C
Fig. 7 Homology-independent fragment shuffling (SCRATCHY) is a combination of ITCHY and DNA shuffling, a) The approach starts by creating two complementary ITCHY libraries. b) To maximize the functional competence of the SCRATCHY library, hybrid sequences of approximately
parental size that are in the correct reading frame are isolated, c) Finally the two ITCHY libraries are mixed, fragmented, and reassembled. The significant portion of the resulting hybrid libraries consist of members with multiple crossovers at locations independent of sequence identity.
17
18 S. Dedeblatt
E. coli strain provided evidence for the catalytic competence of multi-crossover hybrid constructs. In respect to catalytic performance of these hybrid enzymes, the method lags behind DNA shuffling approaches. This however can mostly be attributed to issues of protein folding rather than chemistry itself. In the following section, the aspects of protein folding and its increasing importance to protein engineering at low sequence identities will be discussed. 3.4 Modular recombination and protein folding
The biological activity of proteins generally depends on a unique three-dimensional conformation, which in turn is inherently linked to its primary sequence. Protein folding, the conversion of the translated polypeptide chain into the native state of a protein, is the critical link between gene sequence and three-dimensional structure. Mechanistically, folding is believed to proceed through a predetermined and ordered pathway, either via kinetic intermediates or by direct transition from the unfolded to the native state [99]. In both cases, local and non-local interactions alike stabilize transient structures along the pathway and funnel the intermediates towards the native state. In the context of protein engineering, these folding and stability issues are of major importance. Modular protein engineering in general and homologyindependent methods in particular can introduce significant variation into a system’s primary sequence, impacting the network of interactions that ultimately controls the folding trajectory. Proteins have generally demonstrated amazing flexibility and tolerance towards changes of individual amino acids and small peptide fragments, accommodating them by reorganization of the neighboring environment [62]. In the context of domain swapping and modular engineering, the situation is often more complex. Affecting mostly non-local interactions, this remodeling can disturb long-range interactions crucial to folding and structural integrity. Furthermore, possible steric and physicochemical interferences are expanded to entire surfaces of a protein module. Although the disruption of a network of stabilizing interactions between parental structures is possible on all levels of homology, the probability for structural inconsistencies that lead to misfolding rapidly increases with decreasing sequence similarity. When exploring distant regions in sequence space by homology-independent methods as discussed earlier, deleterious folding effects on hybrid protein libraries upon exchange of subunits of low homology can quickly nullify the potential benefits of structure-based protein engineering. The problem of incorrect folding of hybrid proteins is illustrated by a system investigated in our laboratory. Attempting the substitution of domains from structures with very low sequence homology, incremental truncation libraries between the E. coli proteins GAR transformylase (PurN) and tRNA(fMet)-formyltransferase (FMT) were generated (Marc Ostermeier and Steve Tizio – unpublished data). The N-terminal portions of the two proteins share only 33 % sequence identity, yet their backbone tertiary structures are almost perfectly superimposable (Fig. ??). In addition, they both catalyze the same chemistries; the formyl group-transfer
3 Engineering Genes and Gene Fragments
Fig. 8 a) Structural overlay of glycinamide ribonucleotide transformylase (PurN) [155] and the N-terminal domain of tRNA(fMet)formyltransferase (FMT) [156] from E. coli. Despite the low sequence identity (33 %), the two structures are almost perfectly superimposable. b) Structure-based multiple sequence alignment of PurN and FMT with
human glycinamide ribonucleotide transformylase (hGART) and formyltetrahydrofolate hydrolase (PurU). Conserved residues are labeled with an asterisk and homologous positions are marked with a dot. The three active site residues are highlighted in shaded boxes.
19
20 S. Dedeblatt
from N10 -formyl-tetrahydrofolate onto their respective substrates. In previous experiments, analogous substitution of fragments between the functionally related PurN and N10 -formyl-tetrahydrofolate hydrolase (PurU) from E. coli (60 % DNA sequence identity), as well as PurN and hGART (50% DNA sequence identity) had yielded functional hybrid enzymes [63, 100]. A potential indicator for protein folding problems was the identification of several thermosensitive constructs of PurN/hGART, which failed to complement in an auxotrophic host under regular growth conditions at 37◦ C, yet showed sufficient activity when incubated at ambient temperatures [95, 96]. Presumably, the perturbance of the network of non-local interactions at or near the fusion constructs intraface destabilizes the global structure and prevents proper folding at higher temperatures. In contrast, expression of the PurN/FMT hybrid library at either temperature did not yield any functional chimeras. While subsequent analysis of the libraries on the genetic level indicated the expected diversity, Western blots revealed that the expressed protein was found exclusively in the insoluble fraction of the cells (Steve Tizio – unpublished data). The same difficulties were reported for a series of rationally designed hybrid constructs between ornithine and aspartate transcarbamoylases, two structure homologs that share only 32 % sequence identity [101]. We hypothesize that the observed instability of hybrid constructs is caused by two mechanisms: i) steric and functional clashes at the module surfaces and ii) disruption of folding pathways. For the former, interference at the module’s intraface is illustrated by the “English muffin-experiment”. Tearing two muffins in half and trying to fit the bottom of the first and the top of the second back together is not likely to match up precisely. Instead, the contact surface will have friction points and empty spaces. A similar picture can be drawn for the two fused protein fragments, causing the observed destabilization of the constructs. Furthermore, dissimilarities of module surfaces can result in the exposure of hydrophobic patches, increasing the tendency to agglomerate or misfold. While future studies will have to address these issues in detail, a practical solution to circumvent the problems may be site-directed or random mutagenesis. Mutagenesis of only few positions significantly improved protein solubility and as a result functionality in a variety of systems [102–106]. While the rationale for most changes is unclear, the majority of mutations were located on the protein surface and involved the substitution of hydrophobic amino acids with charged constituents, minimizing hydrophobic patches that would otherwise be prone to aggregation. Furthermore, fundamental differences in the folding pathway between the two parental sequences could affect proper structural alignment in a chimera. This hypothesis is supported by recent results from two studies of evolutionary distant structural homologs. The first study focused on two members of the globin family, soybean leghemoglobin and myoglobin, two proteins with similar threedimensional structure but only 13 % sequence identity [107]. Extensive analysis and comparison of the folding process revealed overall similar, compact helical folding intermediates. These intermediates however differ significantly in detail, consisting of helix A, G, H, and parts of helix B for myoglobin but helix E, G, and H for leghemoglobin. In the second study, the folding mechanisms of two
3 Engineering Genes and Gene Fragments
proteins in the family of intracellular lipid binding proteins were investigated [108]. As in the previous report, the two proteins share very similar structure yet their sequences have diverged to only 23 % sequence identity. The spectroscopic analysis of the folding intermediates indicates very distinct folding pathways leading to the same final structures. Upon substitution of individual parts of two such structures, it seems reasonable to assume that neither of the two original folding pathways will be suitable to orchestrate the conformational organization of the structure as a whole. Instead, the system will likely follow a third, unknown trajectory with uncertain outcome. From an experimental standpoint, the issue can be addressed from two sides. As discussed earlier, mutagenesis can provide the means to make the necessary adjustments to restore solubility and functionality. In a recent report, protein-folding problems due to a C-terminal truncation in chloramphenicol acetyltransferase (CAT) could be corrected by the introduction of additional secondary site mutations [109]. Random mutagenesis of a deleterious, insoluble CAT variant followed by selection for chloramphenicol resistance in E. coli yielded a functional mutant carrying a single point mutation. The beneficial effect of the conservative leucine to phenylalanine substitution, located in the hydrophobic core of the protein, clearly could be attributed to an increase in folding efficiency. An alternative concept can be the fusion of domains in the context of independent folding units, exemplified by the modular assembly of proteases involved in blood coagulation and fibrinolysis. Modules are commonly associated with domains and, on a genetic level, with exons [110–112] although the correlation is not dominant [113]. Recent work with zinc finger binding domains in conjunction with nucleases or regulatory elements demonstrates the successful design of hybrids enzymes based on modular engineering.
3.5 Rational domain assembly – engineering zinc fingers
A very popular target for exploring and exploiting the modular design of proteins is the zinc finger family. Zinc fingers consist of a ββα domain, stabilized by a zinc metal ion with tetrahedral planar coordination to two cysteine and two histidine side-chains. Interacting with the major groove in DNA, the zinc finger domain specifically recognizes three-nucleotide stretches. Serial assembly of individual finger domains can therefore create protein complexes with an extended DNA recognition pattern, a highly desirable property in the post genomic area. Repeatedly, studies have indicated that fusion of finger modules through their natural linker sequence is, however, limited to three domains. Each insertion of a zinc finger into the major groove of DNA causes a slight bending of the latter, putting any module past the third domain out of reach for its DNA recognition sequence. To specifically target a unique site in even a relatively small genome such as E. coli, the 9 nucleotide-recognition pattern of the triple finger is not sufficient. Furthermore, the repertoire of natural zinc finger recognition sequences, the diversity of the “codons”, is limited to a few combinations, leaving only few sites
21
22 S. Dedeblatt
that comprise the correct sequence and length. Recent advances in the field have addressed and provided insight to these questions and are discussed below. Focusing on the limitations of codon diversity, random mutagenesis of existing zinc finger domains followed by screening for variations in the recognition sequence by phage display technology has led to the identification of domains recognizing all 16 possible 5 -GNN-3 combinations [114]. The ability to specifically target desired sites in vivo, using the expanded set of finger domains, was demonstrated. Fusing either three or six modules (two three-finger constructs, linked by a pentapeptide) to transcription factor domains, the expression of a luciferase reporter gene could be up or down-regulated [115]. An alternative approach to overcome the limitations of triplet diversity has been reported by Klug and coworkers [116, 117]. By introducing structured linkers between the individual binding domains, stretches of up to 10 nucleotides could be bridged without compromising binding affinities. Rather than trying to overcome the limitation of the number of naturally occurring codons in zinc fingers, their tactic is based on extending the length of the target sequence that can be covered by the DNA recognition domains. Having a set of linkers with flexible length allows the custom design of highly specific binding complexes using only the modules at hand. Combining the two approaches may even further increase the versatility of the zinc finger domains. Finally, the fusion of such engineered multi-module zinc finger subunits with a variety of catalytic subunits can provide very powerful tools for the engineering of entire genomes as demonstrated with transcriptional regulators [118] or with endonuclease domains [119].
3.6 Combinatorial domain recombination – exon shuffling
Advances in molecular biology and high-throughput screening, as well as the projected flexibility and diversity of modular assembly in natural protein evolution has inspired the development of various techniques to implement combinatorial domain recombination, better known as exon shuffling, in vitro. Two early proposals focused on the exploitation of self-splicing introns to catalyze the fusion of RNA libraries. Fisch and coworkers [120] generated large 25-residue peptide libraries by constructing two random 30-base pair exon libraries, flanked either on their 5 - or 3 -end by a group I intron sequence with an inserted loxCre recombination site. Following electroporation of the first plasmid-based exon library into E. coli and transformation of the second library through phage infection, in vivo recombination of the two libraries by Cre-recombinase produced a repertoire of ∼1.6 × 1011 members. In a final step, the lox-Cre recombination site between the two exon-libraries was excised by the adjacent intron sequences, retaining a 15-base pair stretch at the points of fusion. Screening of the resulting peptide library, comprising two hyper variable 10-amino acid regions with an intervening five-amino acid residue spacer, by phage display identified several sequences with up to nanomolar binding affinities against various protein targets.
3 Engineering Genes and Gene Fragments
Simultaneously, Mikheeva and Jarrell [121] proposed in vitro exon shuffling directly by engineered trans-splicing group II intron ribozymes. The method was implemented on a model system consisting of two fragments from human tissue plasminogen activator. RNA analogs were generated by separate reverse transcription of the two exons, fused to their corresponding intron sequences. A mixture of the two transcripts was then trans-spliced. Although the ligation efficiency was low, gel analysis clearly indicated product formation. In contrast to the previous approach, the engineered ribozymes facilitate the seamless fusion of the two exons. An elegant alternative to the intron-catalyzed recombination has been derived from DNA shuffling [122]. For initial experiments on a human single-chain Fv framework, six complementarity-determining regions (CDRs) within the structure (which could be viewed as individual domains in a general sense) were identified. Fragments of the DNase-treated parental gene were then spiked with synthetic oligonucleotides comprising randomly mutated derivatives of these six CDRs and the mixture was reassembled by PCR. The analysis of the resulting naive library indicated a random mix of wild-type and mutant CDRs, evenly distributed over all six regions. As discussed in a recent review [57], a modified version of the approach now assembles libraries from mixed pools of homologous exons. The proposed method gives great flexibility to customized libraries based on the design of the chimeric oligonucleotide linkers. Presumably, the size and sequence of the linkers, as well as the concentration of individual exons in each pool and the mixing ratio of the different pools can be used to control the reassembly process (Fig. ??). More hypothetically in the context of exon shuffling, post-translational proteinfragment recombination by trans-splicing inteins can be employed to recombine protein modules [123]. Trans-splicing inteins were first discovered in Synechocystis sp. PCC6803, catalyzing the specific and directed religation of the
Fig. 9 Exon shuffling: combinatorial libraries of multidomain enzymes can be created through recombination of individual modules or domains (I–IV) from multiple parents (A–D) in an ordered fashion, maintaining the parental size (left diagram) or by random reassembly. This allows he
insertion, deletion, substitution, or duplication of modules in a single step. The black and white hexagons, stars, and circles indicate unique complementary elements (complementary DNA strands or pairs of trans-splicing inteins).
23
24 S. Dedeblatt
Fig. 10 Ligation of two protein fragments by trans-splicing inteins: a) on the DNA level, the two fragments can be encoded in separate ORFs and are flanked by either a 3 -extension encoding the N-intein (Nfragment) or 5 -extension encoding the C-
intein (C-fragment). b) Following transcription and translation, the intein domains dimerize and c) excise themselves, fusing the N- and C-fragment by a peptide backbone bond.
genetically separately encoded protein fragments that make up the replicative DNA polymerase III catalytic α-subunit (DnaE) [124]. Consisting of short protein sequences at the C-terminus of the first protein fragment and the N-terminus of the second protein fragment, the inteins, following translation, dimerize in vivo and catalyze the peptide-bond formation between the two protein fragments while simultaneously excising themselves (Fig. ??). Using the same basic approach for the successful random fusion of domains and modules of proteins on a post-translational rather than the genetic level would depend on two fundamental requirements. First, multiple pairs of trans-splicing inteins would have to be available and second, the cross-reactivity between the individual pairs of inteins would have to be either minimal or non-existent. So far, a single pair of naturally occurring trans-splicing inteins has been reported in the literature and an additional four pairs have been engineered from related cis-splicing inteins [125]. Assuming that the number of engineered intein pairs will further increase over time, the first requirement should not be a major limitation to the approach. Cross-reactivity between N- and C-termini of trans-splicing inteins from different pairs is the other primary concern. Recently, Otomo et al. [126] reported the simultaneous in vitro recombination of three protein fragments, tagged with two separate pairs of trans-splicing inteins. The ligation, followed by in vitro refolding, generated a correctly assembled, functional full-length protein. Similar experiments in our laboratory, using heterodimeric-GFP, indicate that correct intein-mediated in
4 Gene Fusion – From Bi- to Multifunctional Enzymes 25
vivo assembly of functional protein in the presence of multiple trans-inteins is also possible (C.P. Scott – personal communication).
4 Gene Fusion – From Bi- to Multifunctional Enzymes
While the rationale behind gene consolidation in natural systems is still disputed, protein engineering has made extensive use of the idea of linking genes. Some of the applications may be as simple as attaching a reporter module onto the N- or C-terminus of the target. Alternatively, joining parental genes by insertion of one sequence into the other has created bifunctional hybrids with, in some cases, allosteric properties. Finally, the assembly of multiple functional units into entire protein machineries has been directed towards the multi-step synthesis of complex organic molecules. 4.1 End-to-end gene fusions
Rationally designed end-to-end fusions of genes are employed in the purification, detection, selection, and localization of targeted gene products. Probably the widest-known applications of covalently connecting two proteins are for purification purposes. Fusion constructs between a target protein and a tagging gene such as the maltose-binding protein, chitin-binding domain, thioredoxin, β-galactosidase, or glutathione S-transferases are frequently employed to aid in purification and in some cases actually help stabilize and increase solubility of target proteins [127, 128]. In addition, fusion of a reporter gene such as GFP or bacterial luciferase can be used to identify and localize a protein of interest in vivo and in vitro. Issues of protein solubility and folding can also be addressed by gene fusions in vivo. C-terminal extensions of a target gene with GFP, chloramphenicol acetyltransferase, or β-galactosidase have successfully been applied to select for solubility and in-frame constructs [129–131]. Additional applications of fusion constructs include co-expression with transporter proteins that direct and anchor the target protein to the outer membrane, effectively displaying the protein on the host’s surface. Implemented in phages using the pIII and pVIII coat proteins as fusion partners, as well as in bacterial hosts and yeast using surface-exposed loops or outer-membrane proteins [132–134], these systems are particularly useful for the screening of protein libraries. 4.2 Gene insertions
While most of the gene fusions discussed so far focused on N or C-terminal extensions, the insertion of an entire gene into a second gene is possible and may be advantageous. In a rather intriguing experiment, Betton et al. [135] in-
26 S. Dedeblatt
serted an entire β-lactamase gene (bla, 708 nucleotides) into loop regions of the maltodextrin-binding protein (MalE). Inserting bla at position 130 or 303 in malE which were previously identified as permissive to smaller insertions and dissection [136–138], the authors were able to isolate fully active bifunctional fusion constructs. In addition, MalE was shown to increase the stability and modulate the catalytic activity of the fused lactamase. Similar experiments were reported, exploring the tolerance of the phosphoglycerate kinase scaffold towards insertion of dihydrofolate reductase or β-lactamase and its effect on folding [139], as well as creating bifunctional fusion proteins between 1,3-1,4-β-glucanase and 1,4-β-xylanase [140]. Further expanding the published protocol, Baird et al. [141] circularly permuted an internal fusion construct between the enhanced yellow fluorescent protein (EYFP) and a zinc finger domain or calmodulin. Conformational changes upon metal binding at the insertion domain are translated into structural rearrangements of EYFP. These structural changes in turn affect the fluorometric properties of EYFP; making these fusion constructs in vivo biosensors to monitor Zn2+ and Ca2+ concentration in cells. An allosterically regulated, GFP-based biosensor, sensitive to antibiotics, was engineered by inserting β-lactamase into one of the surface loops of GFP followed by random mutagenesis and selection for antibioticdependent changes in the fluorescence of the fusion libraries [142]. 4.3 Modular design in multifunctional enzymes
One of the most exciting areas in the field of multifunctional enzyme systems is the synthesis of a wide array of organic molecules by polyketide and nonribosomal protein synthetases. These enzymes are generally characterized by multiple subunits which themselves consist of individual domains with distinct enzymatic activities (Fig. ??a). The range of natural products synthesized by these mega-synthetases includes a considerable number of important antibiotics, antifungals, antitumor and cholesterol-lowering compounds, immunosuppressants, and siderophores. Three categories of synthetases are distinguished, based on their substrate specificity and mode of product synthesis. The two known types of polyketide synthetases (PKSs) (Type I and II) utilize acyl-coenzyme A (CoA) monomers while nonribosomal peptide synthetases (NRPSs) use amino acids and their analogs as substrates. Type I PKS and NRPS oligomerize these building blocks by a modular assemblyline arrangement while type II PKS iteratively assembles monomeric units. The modular design of PKS and NRPS renders them convenient for protein engineering to generate hybrid enzymes that can synthesize a range of natural and unnatural metabolites. By independently manipulating what Cane et al. [143] call the four degrees of freedom in polyketide synthesis: variation of chain length, choice of ACP domains (substrate), degree of β-carbon modification, and stereochemistry, product libraries of extremely high diversity can hypothetically be synthesized. Likewise in NPRSs, the variation of chain length, type and stereochemistry of the substrates, and their level of post-synthetic derivatization can potentially yield
4 Gene Fusion – From Bi- to Multifunctional Enzymes 27
Fig. 11 Organizational overview and summary of protein engineering efforts on the polyketide synthetase (PKS) and nonribosomal peptide synthetase (NRPS) framework, a) In both systems, the complete synthetases consist of multiple subunits encoded on individual genes. The subunits themselves are divided in modules which each catalyze the addition of one acyl-building block (PKS) or amino acid (NRPS). The minimal module for PKSs is made up of three domains – a keto synthase (KS), an acyl transferase (AT), and an acyl carrier protein (ACP). In addition, modules can contain up to three modifying domains to derivatize $-carbons on the
polyketide chain; a keto reductase (KR), a dehydratase (DH), and an enoylreductase (ER). Flanking the entire complex is a loading module (LM) and a termination module (TE). The minimal module for NRPSs consists of an adenylation (A), athiolation (T), and a condensation (C) domain. Additional modifications on the peptide chain can be introduced by epimerization (E) or N-methylation (NM) domains. The loading module in NRPS consists of an adenylation/thiolation (AT) domain while the synthesis is complete by a thioesterase (TS). b) Protein engineering of PKSs and NRPSs, organized by structural objectives.
an almost unlimited number of compounds with potentially novel or improved properties. Initial efforts to rationally engineer PKS and NRPS have focused on the alteration of the biosynthetic pathway of metabolites by deletion, insertion, and substitution of individual domains (summarized in Fig. ??). Although these experiments provided valuable information about the structure and function of the individual domains and modules, formation of functional hybrid enzymes proved difficult at times. In contrast, taking entire modules and rearranging them in the context of an enzyme complex was shown to be a more successful route towards the creation of functional hybrid enzymes. Of particular interest were data indicating the significance of the linker regions between covalently as well as non-covalently attached modules. By restoring the correct inter- or intra-polypeptide linker sequences, Khosla and coworkers swapped module positions in the polyketide synthetase 6deoxyerythronolide B synthase (DEBS) and introduced modules from other PKSs, generating an array of functional hybrid enzymes [144–146]. Beyond the swapping of whole modules, efforts to prepare libraries of compounds have turned to randomization of the proteins within each module. With
28 S. Dedeblatt
respect to these truly combinatorial engineering efforts, modular designed type I PKSs have so far been most amenable. Shortly after the report of the successful but elaborate generation of a combinatorial library by cassette replacement of domains in DEBS [147], a relatively simple and elegant multi-plasmid approach was described [148]. The three genes of DEBS were cloned in separate plasmid vectors, followed by introduction of mutations to diversify each individual gene. Upon triple transformation into a heterologous host, a small library of related polyketides was produced based on the synthesis by random members of the three vectors. Although these results are very exciting, the method is very much limited by the number of different plasmids that can be present in a single host cell. The efficiency of triple transformation is low and plasmid stability may be a major problem. The engineering of iterative (type II) PKSs is considerably more difficult due to their repetitive use of a single module in the biosynthesis of natural products. So far, the only combinatorial libraries of polyketides generated with these types of PKS are created not by genetically encoded diversity but by construction of minimal PKS assemblies. Minimal PKSs are created by removal of all accessory proteins such as cyclases, ketoreductases, and aromatases, reducing the functional enzyme complex to three essential domains: a ketosynthase, a chain length factor, and an acyl carrier protein. Studying the minimal version of whiE (spore pigment) PKS, Moore and coworkers found more than two dozen unique polyketides, generated by a single enzyme multiplex [149]. The diversity was generated by premature product dissociation from the enzyme complex at various stages of the oligomerization, resulting in a variable product chain length and regiospecificity of the cyclizationevent.Similar variations in the natural product palette of PKS from Streptomyces maritimus suggest that the same mechanismsare employed by Nature to explore potentially beneficial effects of these secondary metabolites [150]. In contrast, Nature’s mechanism of directing biosynthesis towards particular metabolites is accomplished by accessory proteins such as the cyclase in the natural whiE system. By non-covalently binding to the minimal PKS complex, the accessory protein stabilizes the product-enzyme intermediates, ensuring formation of full-length product and correct cyclization. While such a system is valuable for the generation of polyketide libraries, problems with reproducibility of library composition due to the unknown factors that influence the generation of the diversity may cause problems. Furthermore, upon identification of interesting or useful products in the library, no direct approach exists for resynthesis of these particular compounds. Instead, additional rational engineering is required to attempt the construction of a hybrid that exclusively produces the desired compound. Finally, nonribosomal peptide synthetases represent the least explored members in this family of multienzyme complexes. They represent an important and versatile class of megasynthetases, involved in the biosynthesis of a wide spectrum of peptidylic secondary metabolites including penicillins, vancomycins, bleomycin, and cyclosporin A. The biosynthesis of these compounds is achieved by utilizing not only natural amino acid but also a wide range of D-configured and N-methylated amino acids and hydroxy acids. Variations in the peptide backbone structure result in linear, cyclic, and branched cyclic molecules and these structures can be
5 Conclusions 29
modified by acylation, heterocyclization, reduction, and glycosylate. Very similar to the modular design of type I PKS, nonribosomal peptide synthetases consist of a linear arrangement of multiple modules linked either covalently or noncovalently (Fig. ??a). The individual modules are responsible for the incorporation of a specific amino acid in the growing peptide chain, therefore dictating the product sequence. The modules themselves are formed by three domains, an adenylation-domain (A-domain; activates incoming amino acid), a thiolation-domain (T-domain; peptidyl carrier protein), and a condensation-domain (C-domain; facilitates the reaction between peptidyl moiety and activated amino acid). As for PKSs, rational engineering experiments have focused on the exchange of individual domains within a module [151]. More recently, Marahiel and coworkers [152] reported the successful rearrangement and fusion of intact modules, generating novel peptide products based on the amino acid specificity of the individual building blocks. Combinatorial approaches towards the creation of peptidylic libraries by NPRSs have not yet been reported. Considering the very similar overall design of NPRSs to modular PKSs, one can imagine applying similar methods as discussed above. In addition, the recently described site-directed mutagenesis of the substraterecognizing A-domain of NRPSs, resulting in the alteration or relaxation of its substrate specificity, may be employed to generate a pool of heterogeneous products [153]. In summary, the structural and functional work on PKSs and NRPSs has provided a basic understanding of secondary metabolite biosynthesis. At the same time, these systems have proven extremely valuable to protein engineering efforts. However, the current methodology is not capable of coping with the demands of a large-scale combinatorial approach to generate polyketide- and peptide libraries. Although all the individual elements are readily available, the limitations of an effective shuffling and recombination protocol currently prevents the creation of libraries with >100 individual members.
5 Conclusions
Studies of protein evolution and protein engineering are progressing hand-in-hand. Advances in our understanding of protein evolution have unveiled strategies that have led to the development of novel methods in protein engineering. In return, mimicking natural processes in the laboratory has revealed some of the robust but at the same time delicate nature of complex protein structures, furthering our understanding of the fundamentals in protein evolution. The industrial aspects of protein engineering, as well as the exploration of natural protein evolution, can both profit from one another providing further insight into the fascinating world of proteins. From a methodology standpoint, no single approach provides a universal solution to protein engineering. The nature of the individual target, the objective of the engineering project, and downstream factors such as the availability of
30 S. Dedeblatt
a high-throughput screen or a selection procedure dictates the technique or techniques most suitable to address the problem at hand. While the development of fundamentally novel methods will progress, a particularly attractive approach has become the combination of individual techniques as demonstrated by Fersht and coworkers [75] using rational design and DNA shuffling, as well as by our laboratory [98] using ITCHY and DNA shuffling. Such pairwise approaches not only overcome the limitations of the individual methods but also provide the means for addressing increasingly complex issues of protein engineering. Beyond the advances in experimental design, the development of in silico protein engineering is growing increasingly important. Progress in computational approaches now provide algorithms that simulate the nucleic acid fragment reassembly process during DNA shuffling [93] and assess the tolerance of individual amino acid substitution in the context of the global protein structure [154]. While the days of computer-based protein design are in the far future, the predictions from either modeling system can already provide guidance towards more effective experimental design. Acknowledgments
The authors would like to thank their colleagues in the laboratory and in particular Dr. Walter Fast for many helpful discussion and critical review of the manuscript. We would also like to acknowledge Dr. Marc Ostermeier and Steve Tizio for providing the unpublished data on the PurN/FMT hybrid system.
References 1 L. D. BOGARAD, M. W. DEEM, Proc. Natl. Acad. Sci. USA 1999, 96, 2591–2595. 2 C. H. CHENG, L. CHEN, Nature 1999, 401, 443–444. 3 S. A. DOYLE, S. Y. FUNG, D. E. KOSHLAND, Jr., Biochemistry 2000, 39, 14348–14355. 4 G. WU, A. FISER, B. TER KUILE, A. SALI, M. MULLER, Proc. Natl. Acad. Sci. USA 1999, 96, 6285–90. 5 R. KUROKI, L. H. WEAVER, B. W. MATTHEWS, Proc. Natl. Acad. Sci. USA 1999, 96, 8949–54. 6 K. MIYAZAKI, F. H. ARNOLD, J. Mol. Evol. 1999, 49, 716–720. 7 A. D. MCLACHLAN, Cold Spring Harb. Symp. Quant. Biol. 1987, 52, 411– 420. 8 B. M. LYNCH, J. S. CONERY, Science 2000, 290, 1151–55. 9 S. OHNO, Evolution by Gene Duplication, Springer Verlag, New York 1970.
10 L. PATTHY, Protein Evolution, Blackwell Science, Maiden, MA 1999. 11 A. FORCE, M. LYNCH, F. B. PICKETT, A. AMORES, Y. L. YAN, J. POSTLETHWAIT, Genetics 1999, 151, 1531–45. 12 R. A. JENSEN, Annu. Rev. Microbiol. 1976, 30, 409–425. 13 A. L. HUGHES, Proc. R. Soc. Land. B Biol. Sci. 1994, 256, 119–124. 14 J. PIATIGORSKY, G. WISTOW, Science 1991, 252, 1078–1079. 15 P. J. O’BRIEN, D. HERSCHLAG, Chem. Biol. 1999, 6, R91–R105. 16 R. FANI, P. LIO, A. LAZCANO, J. Mol. Evol. 1995, 41, 760–774. 17 R. FANI, P. LIO, I. CHIARELLI, M. BAZZICALUPO, J. Mol. Evol. 1994, 38, 489–495. 18 D. LANG, R. THOMA, M. HENN-SAX, R. STERNER, M. WILMANNS, Science 2000, 289, 1546–50.
References 19 J. TANG, M. N. JAMES, I. N. HSU, J. A. JENKINS, T. L. BLUNDELL, Nature 1978, 271, 618–621. 20 A. M. BAPTISTA, P. H. JONSON, E. HOUGH, S. B. PETERSEN, J. Mol. Evol. 1998, 47, 353–362. 21 A. D. MCLACHLAN, Eur. J. Biochem. 1979, 100, 181–187. 22 J. TANG, in A. J. BARRETT, N. D. RAWLINGS, J. F. WOESSNER (Eds.): Handbook of Proteolytic Enzymes, Vol. 272, Academic Press, San Diego CA, 1998, 805–814. 23 X. Y. QIU, J. S. CULP, A. G. DILELLA, B. HELLMIG, S. S. HOOG, C. A. JANSON, W. W. SMITH, S. S. ABDELMEGUID, Nature 1996, 383, 275–279. 24 D. REARDON, G. K. FARBER, FASEB J. 1995, 9, 497–503. 25 B. HOCKER, S. BEISMANN-DRIEMEYER, S. HETTWER, A. LUSTIG, R. STERNER, Nat. Struct. Biol. 2001, 8, 32–36. 26 S. JANECEK, S. BALAZ, J. Protein Chem. 1993, 12, 509–514. 27 K. LUGER, U. HOMMEL, M. HEROLD, J. HOFSTEENGE, K. KIRSCHNER, Science 1989, 243, 206–210. 28 V. MAINFROID, K. GORAJ, F. RENTIER-DELRUE, A. HOUBRECHTS, A. LOISEAU, A. C. GOHIMONT, M. E. NOBLE, T. V. BORCHERT, R. K. WIERENGA, J. A. MARTIAL, Protein Eng. 1993, 6, 893– 900. 29 C. P. PONTING, R. B. RUSSELL, Trends Biochem. Sci. 1995, 20, 179–180. 30 A. JELTSCH J. Mol. Evol. 1999, 49, 161–164. 31 E. A. MACGREGOR, H. M. JESPERSEN, B. SVENSSON, FEBS Lett. 1996, 378, 263–266. 32 J. JIA, W. HUANG, U. SCHORKEN, H. SAHM, G. A. SPRENGER, Y. LINDQVIST, G. SCHNEIDER, Structure 1996, 4, 715–724. 33 K. WIELIGMANN, B. NORLEDGE, R. JAENICKE, E. M. MAYR. J. Mol. Biol. 1998, 280, 721–729. 34 R. J. KREITMAN, R. K. PURI, I. PASTAN, Cancer Res. 1995, 55, 3357–3363. 35 G. D’ALESSIO, Prog. Biophys. Mol. Biol. 1999, 72, 271–298. 36 M. F. PERUTZ, Nature 1970, 228, 726– 739. 37 K. A. LEE, J. Cell Sci. 1992, 103, 9–14.
38 M. P. SCHLUNEGGER, M. J. BENNETT, D. EISENBERG, Adv. Protein Chem. 1997, 50, 61–122. 39 G. MACBEATH, P. KAST, D. HILVERT, Protein Sci. 1998, 7, 1757–1767. 40 M. OSTERMEIER, S. J. BENKOVIC, Adv. Protein Chem. 2000, 55, 29–77. 41 S. ZHANG, G. POHNERT, P. KONGSAEREE, D. B. WILSON, J. CLARDY, B. GANEM, J. Biol. Chem. 1998, 273, 6248–53. 42 J. TURNBULL, J. F. MORRISON, W. W. CLELAND, Biochemistry 1991, 30, 7783–7788. 43 R. M. ROMERO, M. F. ROBERTS, J. D. PHILLIPSON, Phytochemistry 1995, 39, 263–276. 44 F. POMPEO, Y. BOURNE, J. van HEIJENOORT, F. FASSY, D. MENGIN-LECREULX, J. Biol. Chem. 2001, 276, 3833–3839. 45 P. LJUNGCRANTZ, H. CARLSSON, M. O. MANSSON, P. BUCKEL, K. MOSBACH, L. BUELOW, Biochemistry 1989, 28, 8786–92. 46 C. LINDBLADH, M. PERSSON, L. BUELOW, K. MOSBACH, Eur. J. Biochem. 1992, 204, 241–247. 47 R. B. RUSSELL, Protein Eng. 1994, 7, 1407–1410. 48 M. NARDINI, B. W. DIJKSTRA, Curr. Opin. Struct. Biol. 1999, 9, 732–737. 49 A. E. TODD, C. A. ORENGO, J. M. THORNTON, J. Mol. Biol. 2001, 307, 1113–1143. 50 D. H. JUERS, R. E. HUBER, B. W. MATTHEWS, Protein Sci. 1999, 8, 122–136. 51 P. Van ROEY, V. RAO, Th. PLUMMER, Jr., A. L. TARENTINO, Biochemistry 1994, 33, 13989–96. 52 A. C. TERWISSCHA VAN SCHELTINGA, M. HENNIG, B. W. DIJKSTRA, J. Mol. Biol. 1996, 262, 243–57. 53 G. BUISSON, E. DUEE, R. HASER, F. PAYAN, EMBO J. 1987, 6, 3909–16. 54 Y. KAI, H. MATSUMURA, T. INOUE, K. TERADA, Y. NAGARA, T. YOSHINAGA, A. KIHARA, K. TSUMURA, K. IZUI, Proc. Natl. Acad. Sci. USA 1999, 96, 823–828. 55 Z. X. XIA, N. SHAMALA, P. H. BETHGE, L. W. LIM, H. D. BELLAMY, N. H. XUONG, F. LEDERER, F. S. MATHEWS, Proc. Natl. Acad. Sci. USA 1987, 84, 2629–33. 56 L. PATTHY, Cell 1985, 41, 657– 663.
31
32 S. Dedeblatt
57 J. A. KOLKMAN, W. P. STEMMER, Nat. Biotechnol. 2001, 19, 423–428. 58 M. L. TASAYCO, J. CAREY, Science 1992, 255, 594–597. 59 A. G. LADURNER, L. S. ITZHAKI, G. de PRAT GAY, A. R. FERSHT, J. Mol. Biol. 1997, 273, 317–329. 60 S. PASCARELLA, P. ARGOS, J. Mol. Biol. 1992, 224, 461–471. 61 N. DOI, H. YANAGAWA, Cell. Mol. Life Sci. 1998, 54, 394–404. 62 M. SAGERMANN, W. A. BAASE, B. W. MATTHEWS, Proc. Natl. Acad. Sci. USA 1999, 96, 6078–83. 63 M. OSTERMEIER, A. E. NIXON, J. H. SHIM, S. J. BENKOVIC, Proc. Natl. Acad. Sci. USA 1999, 96, 3562–67. 64 S. W. MICHNICK, I. REMY, F.X. CAMPBELL-VALOIS, A. VALLEE-BELISLE, J. N. PELLETIER, Methods Enzymol. 2000, 328, 208–230. 65 P. T. BEERNINK, Y. R. YANG, R. GRAF, D. S. KING, S. S. SHAH, H. K. SCHACHMAN, Protein Sci. 2001, 10, 528–537. 66 S. S. RAY, H. BALARAM, P. BALARAM, Chem. Biol. 1999, 6, 625–637. 67 C. MANOIL, J. BAILEY, J. Mol. Biol. 1997, 267, 250–263. 68 R. GRAF, H. K. SCHACHMAN, Proc. Natl. Acad. Sci. USA 1996, 93, 11591–96. 69 J. HENNECKE, P. SEBBEL, R. GLOCKSHUBER, J. Mol. Biol. 1999, 286, 1197–1215. 70 R. J. EUSTANCE, S. A. BUSTOS, R. F. SCHLEIF, J. Mol. Biol. 1994, 242, 330– 338. 71 D. M. NGUYEN, R. F. SCHLEIF, J. Mol. Biol. 1998, 282, 751–759. 72 M. OSTERMEIER, J. H. SHIM, S. J. BENKOVIC, Nat. Biotechnol. 1999, 17, 1205–1209. 73 K. VOGEL, J. CHMIELEWSKI, J. Am. Chem. Soc. 1994, 116, 11163–11164. 74 K. P. HOPFNER, E. KOPETZKI, G. B. KRESSE, W. BODE, R. HUBER, R. A. ENGH, Proc. Natl. Acad. Sci. USA 1998, 95, 9813–18. 75 M. M. ALTAMIRANO, J. M. BLACKBURN, C. AGUAYO, A. R. FERSHT, Nature 2000, 403, 617–622. 76 R. C. CADWELL, G. F. JOYCE, PCR Methods Appl. 1994, 3, S136–S140. 77 W. P. STEMMER, Nature 1994, 370, 389–391.
78 W. P. STEMMER, Proc. Natl. Acad. Sci. USA 1994, 91, 10747–10751. 79 K. CHEN, F. H. ARNOLD, Proc. Natl. Acad. Sci. C/SA 1993, 90, 5618–22. 80 J. C. MOORE, F. H. ARNOLD, Nat. Biotechnol. 1996, 14, 458–467. 81 A. CRAMERI, G. DAWES, E. RODRIGUEZ, S. SILVER, W. P. STEMMER, Nat. Biotechnol. 1997, 15, 436–438. 82 A. CRAMERI, S. A. RAILLARD, E. BERMUDEZ, W. P. STEMMER, Nature 1998, 391, 288–291. 83 J. E. NESS, M. WELCH, L. GIVER, M. BUENO, J. R. CHERRY, T. V. BORCHERT, W. P. STEMMER, J. MINSHULL, Nat. Biotechnol. 1999, 17, 893–896. 84 C. SCHMIDT-DANNERT, D. UMENO, F. H. ARNOLD, Nat. Biotechnol. 2000, 18, 750–753. 85 S. J. ALLEN, J. J. HOLBROOK, Protein Eng. 2000, 13, 5–7. 86 Z. SHAO, H. ZHAO, L. GIVER, F. H. ARNOLD, Nucleic Acids Res. 1998, 26, 681–683. 87 H. ZHAO, L. GIVER, Z. SHAO, J. A. AFFHOLTER, F. H. ARNOLD, Nat. Biotechnol. 1998, 16, 258–261. 88 Y. ZHANG, F. BUCHHOLZ, J. P. MUYRERS, A. F. STEWART, Nat. Genet. 1998, 20, 123–128. 89 M. KIKUCHI, K. OHNISHI, S. HARAYAMA, Gene 1999, 236, 159–167. 90 W. M. COCO, W. E. LEVINSON, M. J. CRIST, H. J. HEKTOR, A. DARZINS, P. T. PIENKOS, C. H. SQUIRES, D. J. MONTICELLO, Nat. Biotechnol. 2001, 19, 354–359. 91 V. SIEBER, C. A. MARTINEZ, F. H. ARNOLD, Nat. Biotechnol. 2001, 19, 456–460. 92 N. W. SOONG, L. NOMURA, K. PEKRUN, M. REED, L. SHEPPARD, G. DAWES, W. P. STEMMER, Nat. Genet. 2000, 25, 436–439. 93 G. L. MOORE, C. D. MARANAS, S. LUTZ, S. J. BENKOVIC, Proc. Natl. Acad. Sci. USA 2001, 98, 3226–31. 94 J. MINSHULL, W. P. STEMMER, Curr. Opin. Chem. Biol. 1999, 3, 284–290. 95 M. OSTERMEIER, S. J. BENKOVIC, Biotech. Lett. 2001, 23, 303–310. 96 S. LUTZ, M. OSTERMEIER, S. J. BENKOVIC, Nucleic Acids Res. 2001, 29, E16. 97 M. OSTERMEIER, A. E. NIXON, S. J. BENKOVIC, Bioorg. Med. Chem. 1999, 7, 2139–2144.
References 98 S. LUTZ, M. OSTERMEIER, G. MOORE, C. MARANAS, S. J. BENKOVIC, Proc. Natl. Acad. Sci. USA 2001, 98, 11248–11253. 99 S. E. RADFORD, Trends Biochem. Sci. 2000, 25, 611–618. 100 A. E. NIXON, M. OSTERMEIER, S. J. BENKOVIC, Trends Biotechnol. 1998, 16, 258–264. 101 L. B. MURATA, H. K. SCHACHMAN, Protein Sci. 1996, 5, 719–728. 102 R. WETZEL, L. J. PERRY, C. VEILLEUX, Biotechnology (NY) 1991, 9, 731–737. 103 Z. LIN, T. THORSEN, F. H. ARNOLD, Biotechnol. Prog. 1999, 15, 467– 471. 104 L. NIEBA, A. HONEGGER, C. KREBBER, A. PLUCKTHUN, Protein Eng. 1997, 10, 435–444. 105 G. E. DALE, C. BROGER, H. LANGEN, A. D’ARCY, D. STUBER, Protein Eng. 1994, 7, 933–939. 106 M. MURBY, E. SAMUELSSON, T. N. NGUYEN, L. MIGNARD, U. POWER, H. BINZ, M. UHLEN, S. STAHL, Eur. J. Biochem. 1995, 230, 38–44. 107 C. NISHIMURA, S. PRYTULLA, H. JANE DYSON, P. E. WRIGHT, Nat. Struct. Biol. 2000, 7, 679–686. 108 P. M. DALESSIO, I. J. ROPSON, Biochemistry 2000, 39, 860–871. 109 J. Van der SCHUEREN, J. ROBBEN, G. VOLCKAERT, Protein Eng. 1998, 11, 1211–1217. 110 M. G. ROSSMANN, P. ARGOS, Annu. Rev. Biochem. 1981, 50, 497–532. 111 C. C. BLAKE, Nature 1979, 277, 598. 112 M. GO, Nature 1981, 291, 90–92. 113 T. TSUJI, K. YOSHIDA, A. SATOH, T. KOHNO, K. KOBAYASHI, H. YANAGAWA, J. Mol. Biol. 1999, 286, 1581–96. 114 D. J. SEGAL, B. DREIER, R. R. BEERLI, C. F. BARBAS, 3rd, Proc. Natl. Acad. Sci. USA 1999, 96, 2758–63. 115 R. R. BEERLI, D. J. SEGAL, B. DREIER, C. F. BARBAS, 3rd Proc. Natl. Acad. Sci. USA 1998, 95, 14628–33. 116 M. MOORE, Y. CHOO, A. KLUG, Proc. Natl. Acad. Sci. USA 2001, 98, 1432–1436. 117 M. MOORE, A. MUG, Y. CHOO, Proc. Natl. Acad. Sci. USA 2001, 98, 1437–1441. 118 R. R. BEERLI, B. DREIER, C. F. BARBAS, 3rd Proc. Natl. Acad. Sci. USA 2000, 97, 1495–1500.
119 J. SMITH, J. M. BERG, S. CHANDRASEGARAN, Nucleic Acids Res. 1999, 27, 674–681. 120 I. FISCH, R. E. KONTERMANN, R. FINNERN, O. HARTLEY, A. S. SOLER-GONZALEZ, A. D. GRIFFITHS, G. WINTER, Proc. Natl. Acad. Sci. USA 1996, 93, 7761–7766. 121 S. MIKHEEVA, K. A. JARRELL, Proc. Natl. Acad. Sci. USA 1996, 93, 7486– 7490. 122 A. CRAMERI, S. CWIRLA, W. P. STEMMER, Nat. Med. 1996, 2, 100–102. 123 F. B. PERLER, Trends Biochem. Sci. 1999, 24, 209–211. 124 H. WU, Z. HU, X. Q. LIU, Proc. Natl. Acad. Sci. USA 1998, 95, 9226–31. 125 F. B. PERLER, Nucleic Acids Res. 2000, 28, 344–345. 126 T. OTOMO, N. ITO, Y. KYOGOKU, T. YAMAZAKI, Biochemistry 1999, 38, 16040–44. 127 N. SHEIBANI, Prep. Biochem. Biotechnol. 1999, 29, 77–90. 128 R. B. KAPUST, D. S. WAUGH, Protein Sci. 1999, 8, 1668–74. 129 L. MAXWELL, A. K. MITTERMAIER, J. D. FORMAN-KAY, A. R. DAVIDSON, Protein Sci. 1999, 8, 1908–1911. 130 S. WALDO, B. M. STANDISH, J. BERENDZEN, T. C. TERWILLIGER, Nat. Biotechnol. 1999, 17, 691–95. 131 C. WIGLEY, R. D. STIDHAM, N. M. SMITH, J. F. HUNT, P. J. THOMAS, Nat. Biotechnol. 2001, 19, 131–136. 132 GEORGIOU, C. STATHOPOULOS, P. S. DAUGHERTY, A. R. NAYAK, B. L. IVERSON, R. CURTISS, Nat. Biotechnol. 1997, 15, 29–34. 133 T. LATTEMANN, J. MAURER, E. GERLAND, T. F. MEYER, J. Bacterial. 2000, 182, 3726–3733. 134 T. BODER, K. D. WITTRUP, Methods Enzymol. 2000, 328, 430–444. 135 M. BETTON, J. P. JACOB, M. HOFNUNG, J. K. BROOME-SMITH, Nat. Biotechnol. 1997, 15, 1276–79. 136 MARTINEAU, J. G. GUILLET, C. LECLERC, M. HOFNUNG, Gene 1992, 113, 35–46. 137 J. M. BETTON, P. MARTINEAU, W. SAURIN, M. HOFNUNG, FEBS Lett. 1993, 325, 34–38. 138 J. M. BETTON, M. HOFNUNG, EMBO J. 1994, 13, 1226–1234.
33
34 S. Dedeblatt
139 B. COLLINET, M. HERVE, F. PECORARI, P. MINARD, O. EDER, M. DESMADRIL J. Biol. Chem. 2000, 275, 17428–33. 140 J. AY, F. GOTZ, R. BORRISS, U. HEINEMANN, Proc. Natl. Acad. Sci. USA 1998, 95, 6613–18. 141 G. S. BAIRD, D. A. ZACHARIAS, R. Y. TSIEN, Proc. Natl. Acad. Sci. USA 1999, 96, 11241–46. 142 N. DOI, H. YANAGAWA, FEBS Lett. 1999, 453, 305–307. 143 D. E. CANE, C. T. WALSH, C. KHOSLA, Science 1998, 282, 63–68. 144 R. S. GOKHALE, S. Y. TSUJI, D. E. CANE, C. KHOSLA, Science 1999, 284, 482–485. 145 R. S. GOKHALE, C. KHOSLA, Curr. Opin. Chem. Biol. 2000, 4, 22–27. 146 S. Y. TSUJI, N. WU, C. KHOSLA, Biochemistry 2001, 40, 2317–25. 147 R. MCDANIEL, A. THAMCHAIPENET, C. GUSTAFSSON, H. FU, M. BETLACH, G. ASHLEY, Proc. Natl. Acad. Sci. USA 1999, 96, 1846–1851. 148 Q. XUE, G. ASHLEY, C. R. HUTCHINSON, D. V. SANTI, Proc. Natl. Acad. Sci. USA 1999, 96, 11740–45. 149 Y. SHEN, P. YOON, T. W. YU, H. G. FLOSS, D. HOPWOOD, B. S. MOORE, Proc. Natl. Acad. Sci. USA 1999, 96, 3622–27. 150 J. PIEL, C. HERTWECK, P. R. SHIPLEY, D. M. HUNT, M. S. NEWMAN, B. S. MOORE, Chem. Biol. 2000, 7, 943–955. 151 P. J. BELSHAW, C. T. WALSH, T Stachelhaus, Science 1999, 284, 486– 489. 152 H. D. MOOTZ, D. SCHWARZER, M. A. MARAHIEL, Proc. Natl. Acad. Sci. USA 2000, 97, 5848–5853. 153 T. STACHELHAUS, H. D. MOOTZ, M. A. MARAHIEL, Chem. Biol. 1999, 6, 493– 505. 154 C. A. VOIGT, S. L. MAYO, F. H. ARNOLD, Z. G. WANG, Proc. Natl. Acad. Sci. USA 2001, 98, 3778–83. 155 R. J. ALMASSY, C. A. JANSON, C. C. KAN, Z. HOSTOMSKA, Proc. Natl. Acad. Sci. USA 1992, 89, 6114–18. 156 E. SCHMITT, S. BLANQUET, Y. MECHULAM, EMBO J. 1996, 15, 4749–58. 157 R. MCDANIEL, C. M. KAO, S. J. HWANG, C. KHOSLA, Chem. Biol. 1997, 4, 667– 674.
158 R. MCDANIEL, S. EBERT-KHOSLA, D. A. HOPWOOD, C. KHOSLA, Nature 1995, 375, 549–554. 159 D. BEDFORD, J. R. JACOBSEN, G. LUO, D. E. CANE, C. KHOSLA, Chem. Biol. 1996, 3, 827–831. 160 OLIYNYK, M. J. BROWN, J. CORTES, J. STAUNTON, P. F. LEADLAY, Chem. Biol. 1996, 3, 833–839. 161 H. SYMMANK, W. SAENGER, F. BERNHARD, J. Biol. Chem. 1999, 274, 21581–21588. 162 S. DOEKEL, M. A. MARAHIEL, Chem. Biol. 2000, 7, 373–384. 163 C. W. CARRERAS, D. V. SANTI, Curr. Opin. Biotech. 1998, 9, 403–411. 164 L. TANG, H. FU, R. MCDANIEL, Chem. Biol. 2000, 7, 77–84. 165 A. RANGANATHAN, M. TIMONEY, M. BYCROFT, J. CORTES, I. P. THOMAS, B. WILKINSON, L. KELLENBERGER, U. HANEFELD, I. S. GALLOWAY, J. STAUNTON, P. F. LEADLAY, Chem. Biol. 1999, 6, 731–741. 166 C. M. KAO, G. L. LUO, L. KATZ, D. E. CANE, C. KHOSLA, J. Am. Chem. Soc. 1995, 117, 9105–06. 167 C. M. KAO, G. L. LUO, L. KATZ, D. E. CANE, C. KHOSLA, J. Am. Chem. Soc. 1996, 118, 9184–85. 168 J. CORTES, K. E. WIESMANN, G. A. ROBERTS, M. J. BROWN, J. STAUNTON, P. F. LEADLAY, Science 1995, 268, 1487–89. 169 A. M. GEHRING, E. DEMOLL, J. D. FETHERSTON, I. MORI, G. F. MAYHEW, F. R. BLATTNER, C. T. WALSH, R. D. PERRY, Chem. Biol. 1998, 5, 573–586. 170 D. TILLETT, E. DITTMANN, M. ERHARD, H. VON DOHREN, T. BORNER, B. A. NEILAN, Chem. Biol. 2000, 7, 753–764. 171 Y. PAITAN, G. ALON, E. ORR, E. Z. RON, E. ROSENBERG, J. Mol. Biol. 1999, 286, 465–474. 172 J. F. APARICIO, I. MOLNAR, T. SCHWECKE, A. KONIG, S. F. HAYDOCK, L. E. KHAW, J. STAUNTON, P. F. LEADLAY, Gene 1996, 169, 9–16. 173 A. F. MARSDEN, B. WILKINSON, J. CORTES, N. J. DUNSTER, J. STAUNTON, P. F. LEADLAY, Science 1998, 279, 199–202. 174 S. KUHSTOSS, M. HUBER, J. R. TURNER, J. W. PASCHAL, R. N. RAO, Gene 1996, 183, 231–236.
1
Engineered Nanopores Hagan Bayley, and Orit Braha
University of Oxford, Oxford, United Kingdom Stephen Cheley, and Li-Qun Gu Texas A&M University, College Station, USA
Originally published in: Nanobiotechnology. Edited by Christof M. Niemeyer and Chad A. Mirkin. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30658-9
1 Overview
Engineered nanopores have potential applications in many areas of technology, including separation science, sensing, and drug delivery. In this chapter, we focus on recent developments regarding pores that span membranes and have diameters of 10 nm or less. We emphasize pores into which new properties have been engineered, highlighting our own recent investigations into α-hemolysin. 1.1 What is a Nanopore?
Nanopores occur in nature, and in the biological literature they are simply known as pores when they are more than 1 nm or so in diameter, or channels when they are narrower. For certain applications in technology, biological pores are being engineered directly. In other cases, new types of pores are being constructed that are based on biological structures [1]. The biological pores that concern us here are transmembrane proteins. Cells also contain soluble pores, such as chaperonins and enzymes that handle nucleic acids, which may be of interest in the future in the context of engineering catalysis into nanopores. Transmembrane proteins fall into two main classes: helix bundles, and β barrels [2]. In general, helix bundles constitute various classes of receptors and channels. Natural examples of these proteins are relatively difficult to engineer: (i) because the helices are usually in close contact and alterations to the side chains disrupt the structure of the protein; and (ii) because they have evolved highly specialized functions, which tend to be difficult to Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Engineered Nanopores
Fig. 1 Examples of various classes of nanopore. (a) Helix bundle in the form of a template-assembled synthetic protein (TASP) [116]. (b) A single subunit of the trimeric porin OmpF (inner diameter ∼2 nm) [117]. (c) The "-hemolysin pore (inner diameter of barrel ∼2 nm) [13]. (d) Nanotube formed from stacked cyclic peptides (inner diameter ∼0.75 nm) [16]. (e) Syn-
thetic barrel based on octiphenyl staves with attached peptides (inner diameter ∼1.5 nm) [20]. (f) Electron micrograph of silica nanotubes made in a porous alumina template (inner diameter, e.g., ∼20 nm). The template was subsequently dissolved [1]. (g) Electron micrograph of a nanopore in silicon nitride sculpted with an Ar ion beam (inner diameter ∼2 nm) [27].
change. Nevertheless, some of the earliest attempts to build membrane proteins de novo involved transmembrane helices. For example, bundles of individual synthetic helices were conceived by the groups of DeGrado and Lear [3], and templateassembled synthetic proteins (TASP) were used initially by Montal, Vogel and colleagues (Figure 1a) [4]. Some progress has been made on de novo single-chain bundles [5, 6], but as yet this work lacks a solid structural basis. By contrast, β-barrels are open structures, permitting dramatic alterations of natural scaffolds [7]. So far, the β-barrel fold has been found in all the outer membrane proteins of Gram-negative bacteria that have been examined, as well as in certain poreforming toxins. Importantly, the side chains that project into a β barrel do not interact strongly with one another, and can be altered to manipulate the properties of the pores. Recently, progress has been made on barrel-like structures formed from helices [8, 9], and we can expect more work in this area in the future. Among the natural channels that have been engineered as nanopores, the most prominent have been porins, proteins that control the permeability of the bacterial outer membrane, and α-hemolysin (αHL), a pore-forming toxin secreted by Staphylococcus aureus. Work on both these systems has been aided by highresolution crystal structures (Figure 1b,c). At this point, the engineering of porins
1 Overview
has been limited, but they do constitute a promising system [10–12]. The αHL pore is a heptameric structure (Figure 1c) [13]. The transmembrane β barrel is topped by a cap domain that contains a large internal cavity. Both the cap and barrel have been engineered by various means (see below). Our laboratory has developed methods for producing pure heteromeric pores, most usefully of the form WT6 ENG1 with just one altered subunit. The engineered subunit (ENG) is chemically or genetically tagged to confer an altered charge. The wild-type (WT) and engineered subunits are then co-assembled and the desired heteromer isolated by preparative sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) [14]. αHL belongs to a family of toxins, which includes the leukocidins. The latter form pores comprising two different, but related, polypeptide chains. We have shown that one leukocidin pore is an octamer, rather than a heptamer, containing four subunits of each class [15]. The fact that both heptamers and octamers exist suggests that it may be possible to control diameter in this family of pores by engineering the subunit–subunit interface. Designed pores containing β structure have been developed by the groups of Ghadiri and Matile. The Ghadiri group pioneered this area by introducing stacked cyclic peptides containing alternating D- and L-amino acids, which form nanotubes that span lipid bilayers (Figure 1d) [16]. The structures resemble gramicidin, a naturally occurring peptide antibiotic that forms transmembrane pores from two head-to-head β spirals. The nanotubes can be stabilized by stapling them together in a metathesis reaction between olefinic amino acid side chains [17]. The diameter of the nanotubes can be adjusted by altering the ring size of the peptides [18], and β-amino acids can form similar structures [19]. Matile’s group has developed synthetic barrels from octiphenyl staves to which all L-amino acid peptides are attached (Figure 1e) [20]. In principle, the structure of the octiphenyl barrels is highly manipulatable. The diameter can be changed by altering the length of the attached peptides, and the number of staves (which depends upon the steric and electrostatic interactions between the peptides) [21]. The length of the barrel can be controlled by adjusting the length of the stave [22]. Finally, heteromeric pores containing two types of staves have been assembled [21]. While limited structural information has been obtained on Ghadiri’s nanotubes [23], the structures of the octiphenyl barrels are based on model building and a substantial body of indirect evidence. Several groups have made nanopores from materials with no connection to biology [1]. For example, commercial track-etched polymer membranes are made by bombarding thin films with nuclear fission particles. The damage tracks are then etched, with base for example in the case of polycarbonate, to form uniform pores as small as 10 nm in diameter, which may be several µm in length. To make smaller pores of far shorter effective length, asymmetric etching techniques have been developed to generate cone-shaped structures [24]. For example, poly(ethylene terephthalate) through which a single high-energy Au ion had passed was treated with concentrated sodium hydroxide from one side, with a stop solution of formic acid on the other side. The membrane was in an electrical cell so that breakthrough could be registered and etching stopped at that point. Further, the applied potential was such that hydroxide ion moved away from the etch site after
3
4 Engineered Nanopores
breakthrough (electro-stopping). Individual pores formed in this way exhibit many of the properties of biological pores, such as ion selectivity, and open and closed states [25]. The chemically etched pores will probably be difficult to make and derivatize in a reproducible way. Fortunately, Martin’s group has developed sophisticated approaches for generating nanotubules with modifiable internal surfaces within the pores of track-etched polycarbonate filters. For example, a gold film is deposited on the surface of the pores by electroless plating [26]. This serves both to control the diameter of the nanotubules, which can be made as small as 1 nm, and to provide a reactive surface (see below). Individual nanopores have also been made by using a low-energy beam of Ar ions to “sculpt” a hole in a thin layer of silicon nitride [27]. Surprisingly, it was demonstrated that the beam can close as well as open a hole, depending on the beam flux and duty cycle and the sample temperature. Because the size of the hole could be monitored during sculpting by counting the transmitted Ar ions, it was possible to produce nanopores of known diameter by feedback control of the beam, and in this way diameters below 1.8 nm were obtained (Figure 1g). 1.2 Engineering Nanopores
Protein pores are engineered by using mutagenesis and targeted chemical modification. The primary interest lies in engineering the interior (lumen) of the pores. Mutagenesis can be used to introduce any of the twenty natural amino acids, which provide a variety of side chains of differing size, shape, polarity, and reactivity. Nonnatural amino acid mutagenesis can also be used to provide side chains that do not occur in nature, such as ketones, alkenes, and azides. Originally, non-natural amino acids were introduced by using preloaded suppressor tRNAs in an in-vitro translation system, a technology that is sufficiently demanding to prevent its routine use. Currently, Escherichia coli strains are under development that provide the mutant suppressor tRNAs and the engineered synthetases required to load them. Indeed, it is even possible to have the bacteria make an amino acid [28]! In targeted chemical modification, an amino acid side chain (usually cysteine) is selectively modified with a reagent. This again adds to the diversity of functionalities that can be incorporated into a protein. Given a high-resolution structure, these approaches to protein engineering allow modifications to be placed at well-defined locations. If more than one modification is made, the geometrical relationship between the new side chains is known. Our group recently implemented a new concept in protein engineering – that of non-covalent modification. Specifically, cyclodextrins and other host molecules (adapters) were allowed to bind to positions within the lumen of the transmembrane barrel of the αHL pore that could be defined by mutagenesis (Figure 2) [29–32]. The adapters remain capable of binding guests and therefore noncovalent modification can introduce new binding sites within the pore. The vast literature on host– guest interactions supports the enormous potential of this approach.
1 Overview
Fig. 2 "-Hemolysin pore modified with noncovalent adapters. The illustration is based on the superposition of structures of "HL and cyclodextrins. The positions of the cyclodextrins were surmised from
mutagenesis experiments. In the structure shown, sites for two different cyclodextrins were engineered. An organic molecule can be trapped in the cavity between the two sites [31].
Designed pores containing β structure, such as those of Ghadiri and Matile, can be engineered using chemical synthesis with a variety of amino acids, and of course there is no requirement for using the natural side chains. The side chains in Ghadiri’s nanotubes all project outwards, which makes it difficult to engineer function, although progress has been made by using functionalized peptides to cap the structures [33]. In Matile’s synthetic barrels, just as in a natural β barrel, the side chains of the amino acids project alternately outwards and into the lumen, permitting the introduction of functional groups within the pore. However, although it is in principle possible, the exquisite precision of protein engineering, in which individual changes are made at specific positions, has not yet been achieved with the synthetic barrels. At present, for example, 32 or 64 histidine residues are introduced at one time. As expected, the internal surfaces of Martin’s gold-plated nanotubules are amenable to derivatization with a variety of thiols that vary in size, shape, and polarity [34]. Interestingly, the surfaces of the gold-plated nanotubules can also be charged electrostatically by using an applied potential [35]. Recently, the idea of electromodulation has been demonstrated in which electrostatic charging is used to attract an amphiphilic surfactant to the wall of the nanotubule and thereby render the surface hydrophobic [36]. By using silane chemistry, proteins have been attached to the internal surface of another class of nanotubules made of silica in an alumina template (see Figure 1f) [37]. Asymmetrically etched poly(ethylene terephthalate) can also be chemically modified. For example, the free carboxyl groups can be neutralized by methylation with diazomethane [25] and additional chemistry that would alter surface properties can readily be conceived. Nevertheless, in these
5
6 Engineered Nanopores
cases, the inability to control the number of modified sites or organize them in space remains a severe drawback. 1.3 What Can a Nanopore Do?
In Arthur Karlin’s succinct words: “Ion channels open, conduct ions selectively, and close” (Figure 3) [38]. The nanopores we are considering also conduct ions, and even though the diameters of the pores are larger than the typical ion channel, they do exhibit at least modest ion selectivity. Importantly, because they are large they can also transport large molecules across a membrane or bilayer. Natural pores, such as gap junctions, can gate (open and close in response to a stimulus), but so far little attention has been given to this issue with engineered nanopores. More important has been the generation of sites within the pores at which blockers can bind (Figure 3d), by analogy with drugs such as verapamil, a Ca-channel blocker. These properties of pores can be elaborated and enhanced by various engineering techniques, and we describe selected examples here. Emphasis is given to investigations on the αHL pore, for which such studies are relatively advanced. The ion selectivity of porins [39] and the αHL pore [32] can be altered by the replacement of residues within the lumen. In general, positively charged residues produce anion selectivity, and negatively charged residues cation selectivity. Similarly, in the case of Matile’s artificial barrels, the amino acid side chains that project into the lumen can be used to control selectivity [40]. Ghadiri’s nanotubes remain cationselective in the presence of positively charged cyclic peptide caps, but the selectivity is substantially reduced [33]. When the neutral adapter molecule β-cyclodextrin is lodged within the lumen of the αHL pore, the weak anion selectivity of the pore is enhanced. By contrast, when the anionic hepta-6-sulfato-β-cyclodextrin is used, the pore becomes cation-selective [41]. Ion selectivity can also be controlled with a tightly adsorbed but noncovalently bound internal surface layer. Matile and colleagues have
Fig. 3 Basic properties of channels and pores. (a) Ion conduction; (b) ion selectivity; (c) gating; (d) channel block.
1 Overview
shown that phosphate tightly bound to positively charged residues in the lumen of artificial barrels renders them cation-selective, while Mg2+ ions bound to internal carboxylates render the barrels anion-selective [40]. Underivatized Au nanotubes are unselective in KF solutions, but cation-selective in KCl and KBr because of adsorption of anions to the Au surface [35]. The selectivity of the Au nanotubes can also be controlled by direct electrostatic charging of the metal surface [35]. There is still a great deal to learn about the transport of larger molecules through nanopores. For rigid molecules there is molecular mass cut-off, although transport will be reduced before the cut-off is reached because of an increased barrier to entry (hard-sphere model) and interactions with the walls during transport. In one example, glucose was shown to pass through nanotubes made from cyclic peptides of ten amino acids, but not through those of eight amino acids [18]. Au nanotubules can also mediate size separation [26]. Several examples of facilitated transport have been demonstrated with various derivatized inorganic nanotubules. In the case of small organic molecules, a high degree of selectivity has been obtained after reacting the walls of Au nanotubules with various alkane thiols. For example, nanotubules with a low diameter of 1.5 nm that had been reacted with hexadecanethiol transported toluene more than 100 times faster than pyridine in aqueous solution [34]. It seems likely that the derivatized nanotubules do not fill with water, and that the toluene diffuses through an internal hydrophobic phase. Electromodulation of the hydrophobicity of the interior of Au nanotubules is a recent innovation [36]. The enantiomers of chiral molecules have been separated in nanotubules [37, 42]. For example, a racemate can be passed though silica nanotubules within which a stereospecific antibody has been attached [37]. It is likely that this separation works by a facilitated transport mechanism in which the permeant molecules partition strongly into nanotubules containing a high density of cognate binding sites. For facilitated transport to occur, these sites must exchange the permeant molecules rapidly. In the case of the αHL pore, the focus has been on the transport of macromolecules. The cut-off for a rigid spherical molecule based on the width of the narrowest constriction in the lumen is less than 1000 Da, but this has not been tested in rigorous experiments. Elongated polymers of much higher mass such as single-stranded DNA [43] and poly(ethylene glycol) (PEG) [44] can pass through the αHL pore. For a highly flexible molecule such as PEG there is a high entropic cost to entry into the pore, which greatly reduces the rate of transport [45]. A great deal of experimental [46, 47] and theoretical effort ([48] and references therein) continues to be devoted to this aspect of nanopore research because of the interest in making sensors for various polymers. α-Helical ion channels are gated (opened and closed) by voltage, ligands, or mechanical force. β-Barrel pores gate naturally by mechanisms that are poorly understood and depend upon variables such as the pH and applied potential. Attempts are being made to engineer voltage gating into porins. For example, positively charged peptide loops have been inserted into turns of a porin. The applied potential pushes the charged insert into the barrel at one polarity and removes it at the other [11], which is akin to the so-called ball-and-chain inactivation of K+ channels [49]. Although weakly asymmetric conduction has been achieved with the porin, more
7
8 Engineered Nanopores
work is required to obtain clear-cut gating. Certain derivatives of gramicidin also appear to gate in a ball-and-chain fashion [50]. There are several other attractive possibilities for controlling the activity of pores. For example, attempts have been made to open and close pores with light, and in such an attempt Woolley and colleagues attached an azobenzene near the mouth of the gramicidin pore [51]. But again, allor-none gating has not yet been achieved. Interestingly, control of the access of organic molecules to cavities within mesoporous silica was recently achieved through reversible photochemical dimerization of covalently attached coumarins [52]. A variety of blocker sites have been engineered into the αHL pore (Figure 4), in this case with a view towards sensor technology (see below). The simplest sites involve mutagenesis with natural amino acids. For example, the introduction of histidine residues at sites within the lumen of the αHL pore allows block by various divalent metal cations [14, 53], while the introduction of arginine residues permits the binding of anionic phosphate esters [54] (Figure 4a). Sites for organic molecules can be formed from noncovalent adapters (Figure 4b) [29–31, 55], although in principle other forms of engineering might be used. Interestingly, the cap domain of the αHL pore, which contains a 45 Å-diameter cavity, is large enough to contain a short covalently attached oligonucleotide, an 8-mer for example (Figure 4c). The tethered oligonucleotide can act as a binding site for complementary strands [56–58]. Most proteins are too large to be accommodated within the lumen of the αHL pore. Nevertheless, the attachment of a protein to an external ligand can be transmitted to the interior and hence affect the conductance of the pore. This has been achieved by attaching the ligand to the end of a molecular fishing line, namely a long PEG chain [59] (Figure 4d). When the protein is bound, the mobility of the part of the chain that lies within the pore is altered, which produces a characteristic change in the electrical signal during bilayer recording. In addition to the fundamental properties of pores discussed above, there are additional features that might be engineered. For example, many membrane proteins form two-dimensional crystals, which have contributed to structural studies. Engineered arrays of protein nanopores might be exploited in the same way as bacterial S-layers. S-layers are porous, two-dimensional arrays of a single protein species that envelope a variety of bacteria and are being explored for various applications
Fig. 4 Blocker and related sites in the "HL pore. (a) Genetically engineered site for phosphate anions [54]. (b) Site for organic molecules formed with a noncovalent adapter [29]. (c) Covalently attached oligonucleotide for duplex formation [57]. (d) Ligand attached to a long PEG chain for the capture of protein analytes [59].
1 Overview
including ultrafiltration, the geometrically defined immobilization of macromolecules, and acting as templates for the formation of arrays of nanocrystals [60]. 1.4 What are the Potential Applications of Nanopores?
There are many exciting prospects for the utilization of engineered nanopores, and in some cases practical applications are emerging. For example, as described above, Martin and colleagues have effected separations with various modified inorganic nanopores. The separations have been based on several properties including molecular mass [26], charge [35], hydrophobicity [34, 36], and even stereochemistry [37, 42]. In principle, similar separations might be carried out with engineered protein nanopores, but with the exception of S-layers with which size separation has been accomplished [60], proteins have not yet been exploited for this purpose. Another area of application of protein nanopores is in cell permeabilization. Because of its importance, the utility of reversible plasma membrane permeabilization in cell and tissue preservation has received the most attention. A mutant αHL pore, H5, which can be closed with divalent metal ions, can be used to introduce small molecules into cells [61]. Once the cells are loaded, efflux can be prevented by Zn(II) ions at the micromolar concentrations normally found in plasma. This procedure has been used to introduce sugars such as sucrose and trehalose into cells (Figure 5) [62]. Cells treated in this way have far higher survival rates after cryopreservation [63] and desiccation [64, 65] than those that have not been treated. Various therapeutic approaches might also be mediated by nanopores. First, nanopores might be used in direct attacks on infectious or malignant cells. For example, Ghadiri’s group has demonstrated that selected cyclic peptides have a rapid bactericidal action on both Gram-negative and Gram-positive species [66]. They suggest that the nanotubes formed from the peptides act by assembling at target membranes and forming “carpets”, thereby rendering them leaky, rather than by acting as individual pores. Malignant mammalian cells might repair themselves
Fig. 5 Loading mammalian cells with trehalose by using "HL-H5 pores, which can be blocked completely by Zn(II) ions [63].
9
10 Engineered Nanopores
after permeabilization, and it has been suggested that the effect of pore formation might be enhanced in the presence of cytotoxic molecules that would not normally enter target cells. Selectivity might be achieved by directing poreforming proteins to target cells with antibodies or lectins. Additional selectivity might be achieved by activation of the poreforming protein at the target cell surface. Along these lines, αHL polypeptides that can be activated by a tumor protease, cathepsin B, have been obtained by screening a library of inactive two-chain αHL mutants containing a combinatorial cassette encoding thousands of potential protease recognition sites [67]. Nanopores have excellent prospects in sensor technology. For example, the ionic current flowing through a pore in a transmembrane potential and the modulation of the current by analytes that act, for instance, as blockers can be monitored. In this way, sensors based on either many pores (macroscopic currents) or single pores are being devised. Here, we focus on single pores and the reader is referred to reviews, where key papers on macroscopic approaches can be located [68–70]. Single pores are used in stochastic sensing where individual interactions between the pores and analyte molecules are detected [47, 70, 71]. In stochastic sensing, the frequency of occurrence of individual binding events reveals the concentration of an analyte, while the analyte can be identified from the signature of the binding events (the extent and duration of current block, for example) (Figure 6). As indicated in the previous section, a variety of blocker sites have been engineered into the αHL pore, and they can be used to detect a wide variety of analytes ranging from small cations and anions, to organic molecules, to macromolecules including DNA and proteins. Polymers can be detected by nanopores in a second mode in which their transit through the pores is registered [46, 47]. This was first demonstrated with single-stranded DNA [43], but there is no reason why the procedure cannot be adapted to other polymers. Here, the nanopore is acting as more than a simple Coulter counter, because limited information about the structure of the polymer is
Fig. 6 Basis of stochastic sensing [71]. Individual interactions between single pores and analyte molecules are detected. (a) Engineered nanopore in the absence of analyte. A flat current trace is seen. (b) Engineered nanopore in the presence of analytes. In-
dividual binding events are manifested as transient reductions in current. The frequency of occurrence of binding events reveals the concentration of an analyte. Further, an analyte can be identified from the signatures of the events.
1 Overview
revealed by the changes in current during transit and the transit velocity. It has been proposed that DNA might be sequenced by this approach, but this seems to be an unlikely proposition in its present formulation [47]. However, protein engineering has not been applied to this idea in earnest, and it might be of help. Finally, engineered nanopores have applications in basic science. For example, the ability to introduce molecules into cells is an important tool in cell biology, and pore-forming proteins are one of the devices that have been used for this purpose [72]. Obviously, the size of the pore determines what goes in and, just as importantly, what comes out. For example, αHL has often been used to bring small molecules (up to ∼1000 Da) into cells, while streptolysin O (which forms pores ∼35 nm in diameter) can be used to transfer macromolecules of up to 100 kDa. Prolonged permeabilization of cells is lethal, and so a means of closing the pores is required. In some cases, wild-type lesions can be eliminated. For example, resealing of the pores formed by streptolysin O occurs by an unknown mechanism that is promoted by Ca2+ ions [73]. Engineered pores can also be used and αHL-H5, which can be closed with divalent metal ions [61], has found several application in the laboratory. For example, Palczewski and colleagues used αHL-H5 to load nucleotides into retinal rod cells [74]. In our own laboratory, it is becoming clear that the αHL pore can be used as a nano-reactor for the investigation of both noncovalent and, most recently, covalent interactions at the single molecule level. Single molecule studies are capable of revealing information that is obscured in ensemble measurements [75]. These studies stem from our attempts to make sensor elements from the αHL pore (see above). In the case of sensors, it does not matter whether the kinetics seen in bulk solution are maintained inside or close to the pore. However, our studies suggest that several binary noncovalent interactions occur with similar kinetics and thus possess similar Kd values to those observed in bulk solvent. These include host–guest interactions [29], DNA duplex formation [56], and protein-ligand interactions [59]. While we can say that association and dissociation rate constants are within a factor of ten of those in bulk solvent, more detailed comparisons will be required to ascertain just how close they are. One partner in each of these interactions was a small molecule. Where a macromolecule must assume a compact state to enter the pore, overall Kd values are increased [45, 76]. In recent studies, we have investigated a variety of covalent chemistries within the αHL pore at the single molecule level, and again the observed kinetic constants are close to those seen in bulk solvent [77, 78]. 1.5 Keeping Nanopores Happy
One issue of great importance in sensor technology is the search for more robust platforms for existing nanopores (Figure 7), or simply more robust nanopores [71]. This is an area that is gaining from several energing aspects of nanotechnology. The lipid bilayers used in the laboratory are usually formed across apertures in polymer films, such as Teflon. These bilayers are mechanically unstable, and a great deal of work is aimed at obtaining a storable and shippable alternative [79–82]. One opportunity lies with improved apertures, and several based on silicon
11
12 Engineered Nanopores
Fig. 7 Different classes of robust bilayer. (a) Bilayer across a small aperture formed in silicon nitride [83]. (b) Bilayer on a porous support comprising a bacterial S-layer [84]. (c) Tethered supported bilayer on a smooth gold surface [86].
substrates have been reported. For example, Peterman and colleagues recently showed that bilayers can be formed across 25-µm apertures in silicon nitride made by established fabrication technology [83] (Figure 7a). The capacitance of the device was reduced by oxidation of exposed silicon substrate and the use of a polyimide surface coating. The bilayers were more robust than those on Teflon apertures, as judged by their ability to withstand high potentials. Another promising area is the use of porous supports, such as nanoporous glass, agarose, etc. One interesting example, with which steady improvements are being made, is the use of S-layers [84], the porous two-dimensional protein crystals from bacterial envelopes (Figure 7b). Recent investigations have demonstrated that S-layer-supported bilayers have an increased resistance to hydrostatic pressure [85]. The exploration of bilayers on solid supports continues. In general, these bilayers have been too leaky for single-channel current recording. Recently, tethered supported bilayers on ultrasmooth Au surfaces have been obtained with specific capacitance and impedance values rivaling those of planar lipid bilayers [86] (Figure 7c). Therefore, if sufficiently small surface areas are used, it should be possible to examine single nanopores in such systems [87]. However, a second serious issue is the high resistance of the volume between the bilayer and the support, which is a consequence of the modest reservoir of mobile ions, at least with tethered bilayers [88, 89]. It may also be possible to replace lipid bilayers completely. For example, polymerizable block copolymers appear to be a promising substitute [90]. In short, progress has been made, but we do not yet have a bilayer that can be slipped into a briefcase.
2 Methods
In this article we have covered a wide variety of types of nanopores, and it would be presumptuous of us to discuss the methodology involved in areas where we do not work directly. Instead, we present a summary of the methods that have been crucial for the progress that has been made with the αHL pore. 2.1 Protein Production
The monomeric form of αHL and its mutants are obtained in abundance and high purity by expression in Staphylococcus [91] or E. coli [92]. When necessary, the
2 Methods 13
purifications can be aided with C-terminal His tags. Conveniently, small amounts of αHL are available by in-vitro transcription and translation, which produces around 50 µg mL−1 [92]. For single-channel recording in the laboratory, we use only nanograms of protein. Other applications likewise require modest amounts; for example, a 1 m2 , two-dimensional crystal of αHL pores would contain only 3.5 mg of protein. The in-vitro production technology is readily scaled up to provide milligrams and even grams [93]. While assembly into heptamers can be facilitated by receptors on target cells, such as rabbit red blood cells [94], the αHL pore self-assembles on artificial lipid bilayers [95]. The heptameric pore is robust and remains stable in the denaturing detergent SDS at up to 65◦ C [96]. This stability is the basis of the separation by SDS-PAGE of heteroheptamers, pores containing various combinations of engineered and WT subunits. The availability of pure heteroheptameric species has been a key aspect of our work on the αHL pore. To achieve the electrophoretic separation, the mutant subunits are tagged; originally, the tag was a targeted chemical modification with a disulfonate near the C terminus [14], but more recently we have used a genetically encoded oligoaspartate tail [57]. 2.2 Protein Engineering
With the benefit of high-resolution structural information, protein engineering [97] allows substitute amino acid side chains or chemical modifications to be placed in a protein at defined sites. Engineering studies on membrane proteins have been limited compared to other systems, largely because the proteins can be tricky to handle and because, until quite recently, there has been a lack of structural information. As noted earlier, de-novo design is a difficult task, and very few studies have been conducted on genetically encoded designed membrane proteins. “Redesign” is more readily accomplished, and since 1996 it has been aided in the case of αHL by the availability of a 1.9 Å crystal structure of the heptameric pore. More recently, structural studies of LukF, a related toxin, have provided information about the monomeric form of αHL [98, 99]. Therefore, the impact of structural changes on both the pore and its ability to assemble can be appraised. The αHL pore is a “blank slate” in terms of engineering, and the goal has been to introduce function rather than change function – which has been a difficult task when attempted in proteins more specialized than αHL. In αHL, most desired changes are made on the internal surface of the protein, which is an unusual challenge. Nevertheless, the open structure of the αHL lumen is more readily manipulated than a narrow channel. For example, we have replaced the transmembrane β barrel with a retro (reversed) amino acid sequence and obtained a functional pore [7]. In practice, redesign is done both by rational approaches (e. g., the examination of molecular models) and hit-and-miss methods (e.g., the screening of libraries of mutants). The approaches are often combined. For example, structural studies may suggest a region for intensive mutagenesis. This approach has been applied to the transmembrane region of αHL by using a semi-synthetic gene in which selected
14 Engineered Nanopores
residues are encoded by a replaceable cassette [7]. Nowadays, most forms of sitedirected mutagenesis are greatly accelerated by tools such as the polymerase chain reaction and ligation-free recombination of DNA fragments. Targeted chemical modification is an invaluable adjunct to direct mutagenesis. In general, modifications are carried out at specific introduced cysteine residues, and in the case of αHL all manner of additions have been made, including the attachment of small molecules with and without poly(ethylene glycol) linkers, peptides, and oligonucleotides [56, 59]. Finally, as noted above, it is possible to make heteromeric αHL pores, which are most commonly molecules with six unaltered subunits and one engineered subunit. 2.3 Electrical Recording
While ensembles of pores can be useful in many circumstances, one notable capability is the means to examine function in intricate detail by single-channel recording. Single-channel recording is also the basis of stochastic sensing. Both monomeric and heptameric αHL pores can assemble or insert into lipid bilayers in a single orientation. Unlike many other pores, the αHL pore remains open indefinitely at moderate transmembrane potentials. Under typical conditions (in 1 M NaCl), the pore has a high conductance of 700 pS, and changes of a few percent in the current carried by a single pore can be measured under optimal conditions. Many measurements can be carried out on a timescale of tens of microseconds. Single-channel recording allows the determination of the basic properties of a nanopore with confidence, including conductance values, ion selectivity, stability, and uniformity. Further, the noncovalent and covalent interactions of exogenous molecules with nanopores can be examined with exquisite sensitivity. 2.4 Other Systems
Analogous methods to those described above will have to be developed for other nanopore systems, if they are to reach the state of sophistication that, in the case of the αHL pore, has been aided by structural studies, protein engineering, and singlechannel recording. It is not at all clear how the structures of many of the alternative systems might be examined at atomic resolution. Nor is it clear how they might be modified at discrete sites with defined stoichiometry. If the current passing through single nanopores is to be recorded, they will have to be assembled into systems that do not suffer from current leaks and are of low intrinsic capacitance.
3 Outlook
As suggested by the preceding discussion, many improvements might be made to existing nanopore technology.
3 Outlook 15
3.1 Rugged Pores
There are limits to the stability of proteins, and therefore several alternative approaches have been taken to the preparation of nanopores, as described earlier. In the long term, whichever techniques are used must be amenable to mass production and compatible with array technology (see below). Of course, a major advantage of protein nanopores is the ability to carry out functionalization with precision by replacing amino acid side chains or carrying out targeted chemical modification. While alternative nanopores, made from polymers, carbon nanotubes, inorganic materials, etc., can be randomly derivatized, it is not yet clear how defined patterns of functionalization or single-site alterations might be made. One compromise would be to immobilize a single adapter molecule (as we have used with the αHL pore) at the entrance of a mechanically and thermally stable nanopore, such as an Au nanotubule. 3.2 Supported Bilayers
To accommodate protein nanopores, it is clear that additional work is needed on supported bilayers and related areas. The formation of defect-free bilayers appears to require ultrasmooth surfaces [86]. “Edge effects” – that is, leaks at the perimeter of the bilayers – must also be tackled. In addition, improved electrical properties demand a substantial reservoir of electrolyte between the bilayer and the support surface [89]. Finally, a reasonable fraction of the incorporated nanopores should be active. Pores with an appreciable extramembranous domain that is located between the bilayer and the underlying surface, might be denatured by the support unless precautions such as polymer cushions are used. In this area, a near-term goal is to observe single channels in supported bilayers [87]. 3.3 Membrane Arrays
Arrays of lipid bilayers would be useful for screening the properties of membrane proteins and for monitoring their interactions with other molecules (Figure 8). The same technology could be used for array-based sensors. Several assays for poreforming molecules have been developed in which liposomes are used in microtiter formats [100–102]. When they are more fully developed, addressable bilayer arrays containing pores will be far more versatile. Several procedures for generating bilayer arrays have been developed, including the formation of corrals by scratching off or blotting the bilayer, or by patterning the support in various ways, for example by depositing the bilayer with a poly-dimethylsiloxane (PDMS) stamp or by making corrals with various barriers including metals, metal oxides, and proteins [103]. Recently, barriers have been formed by the polymerization of diacetylenic lipids [104]. All corrals can be filled with a single type of bilayer by exposure to a vesicle
16 Engineered Nanopores
Fig. 8 Membrane array on a glass surface. The pattern was formed by photolithography of a glass slide coated with photoresist. Each corral contains a bilayer doped with either 1% NBD-phosphatidyl ethanolamine or 1% Texas Red-phosphatidyl ethanolamine.
A false-color image was generated from epifluorescence images recorded through a home-built macroscope. (E. T. Castellana and P.S. Cremer, Texas AM University, unpublished data.)
suspension, or individual corrals can be filled with a micropipette [105, 106] or by flow patterning [103]. Two goals of further development are to address each corral electrically and to wash each one individually by using a microfluidic system. It should also become possible to arrange other classes of pores, such as those fabricated with ion beams, in addressable formats. 3.4 Alternative Protein Pores
Advances in protein folding and structural biology will drive the search for alternative protein pores. While research on αHL has been extremely fruitful, the pore is a heptamer and therefore intricate to handle. A single-chain, monomeric pore would be ideal, and we have begun to explore the monomeric porin OmpG [107, 108]. It is likely that yet more sophisticated engineering techniques will be applied to protein nanopores than have been in the past. For example, multistep syntheses on the lumen wall might be used to produce complex host molecules within a pore. Because highly stable bilayers with desirable properties have been difficult to obtain, the possibility of directly housing a protein nanopore inside a mechanically stable pore of slightly greater diameter should be considered. The surfaces of both the protein nanopore and the recipient pore (e.g., an Au nanotubule) could be tailored for compatibility (Figure 9).
3 Outlook 17
Fig. 9 Possible means for stabilizing a single protein pore. The nanopore (here the "HL pore) is placed inside a mechanically stable pore (here a Au nanotubule). The surface of the nanopore and the "HL pore are tailored for compatibility. (C. R. Martin, University of Florida.)
3.5 Pores with New Attributes and Applications
Several advanced applications of nanopore engineering are being initiated. For example, the αHL pore can be used as a “nanoreactor” for investigating both noncovalent and covalent interactions. The study of noncovalent interactions elaborates upon the work on sensors. Single-channel current traces contain kinetic information. For example, for a simple binary system, such as a cyclodextrin-guest interaction, kon , koff and Kd can be readily obtained. However, while the values are close to those in bulk solution, further work is required to determine whether there are significant differences. Considerations include steric effects, transmembrane and local potentials, solvent confinement, and electroosmosis. In pursuit of ideas for sensing reactive molecules, we have recently begun to explore covalent interactions at the single molecule level with αHL as a nanoreactor, and related considerations come into play here. In addition, it might soon be possible to build various catalytic activities into the pores. This has a precedent in nature, as several enzymes contain a series of active sites located inside an internal channel [109]. Substrates enter the tunnel at one end and products leave at the other. These channels span up to 100 Å, which is the length of the αHL pore. Esterase and ribonuclease activity in a synthetic barrel lined with histidine residues has been observed by Matile and colleagues [110, 111]. By using the variety of substitutions available to protein engineering, and the ability to place them with known geometry, it may be possible to improve turnover and couple catalysis to transmembrane transport.
18 Engineered Nanopores
3.6 Theory
With a continually expanding output of experimental data, much of which is purely descriptive, there is a need for theory and computation in studies of nanopores. Of course, the applications of calculations and simulations to membrane channels and pores is well developed, though far from perfected. By combining aspects of results from structural studies, molecular dynamics simulations, Brownian dynamics and electrodiffusion theory, estimates of fundamental properties of channels and pores have been obtained, such as conductance values, the shape of current-voltage curves, and the magnitude of ion selectivity [112, 113]. The topics covered in this chapter suggest additional areas for exploration such as the properties of water [114] and polymers [48] under confinement, and electroosmosis in pores of nanometer dimensions [115, 118]. It should go without saying that all of this must be coupled to experimentation. Acknowledgments
The authors thank several colleagues for help with the figures, and acknowledge grants from the U.S. Department of Energy, the Multidisciplinary University Research Initiative (ONR 1999), the National Institutes of Health, the Office of Naval Research and DARPA (MOLDICE program). H.B. is the holder of a Royal SocietyWolfson Research Merit Award.
References 1 C. R. MARTIN, P. KOHLI, Nature Rev. Drug Disc. 2003, 2, 29–37. 2 G. E. SCHULZ, Biochim. Biophys. Acta 2002, 1565, 308–317. 3 K. S. ÅKERFELDT, J. D. LEAR, Z. R. WASSERMAN, L. A. CHUNG, W. F. DEGRADO, Acc. Chem. Res. 1993, 26, 191–197. 4 H. BAYLEY, Curr. Opin. Biotechnol. 1999, 10, 94–103. 5 S. LEE, T. KIYOTA, T. KUNITAKE, E. MATSUMOTO, S. YAMASHITA, K. ANZAI, G. SUGIHARA, Biochemistry 1997, 36, 3782–3791. 6 E. MATSUMOTO, T. KIYOTA, S. LEE, G. SUGIHARA, S. YAMASHITA, H. MENO, Y. ASO, H. SAKAMOTO, H. M. ELLERBY, Biopolymers 2001, 56, 96–108. 7 S. CHELEY, O. BRAHA, X. LU, S. CONLAN, H. BAYLEY, Protein Sci. 1999, 8, 1257–1267.
8 A. J. WALLACE, T. J. STILLMAN, A. ATKINS, D. J. JAMIESON, P. A. BULLOUGH, J. GREEN, P. J. ARTYMIUK, Cell 2000, 100, 265–276. 9 B. NORTH, C. M. SUMMA, G. GHIRLANDA, W. F. DEGRADO, J. Mol. Biol. 2001, 311, 1081–1090. 10 G. E. SCHULZ, Curr. Opin. Struct. Biol. 1996, 6, 485–490. 11 M. BANNWARTH, G. E. SCHULZ, Prot. Eng. 2002, 15, 799–804. 12 S. TERRETTAZ, W.-P. ULRICH, H. VOGEL, Q. HONG, L. G. DOVER, J. H. LAKEY, Prot. Sci. 2002, 11, 1917–1925. 13 L. SONG, M. R. HOBAUGH, C. SHUSTAK, S. CHELEY, H. BAYLEY, J. E. GOUAUX, Science 1996, 274, 1859–1865. 14 O. BRAHA, B. WALKER, S. CHELEY, J. J. KASIANOWICZ, L. SONG, J. E. GOUAUX, H. BAYLEY, Chem. Biol. 1997, 4, 497–505. 15 G. MILES, L. MOVILEANU, H. BAYLEY, Protein Sci. 2002, 11, 894–902.
References 16 J. D. HARTGERINK, T. D. CLARK, M. R. GHADIRI, Chem. Eur. J. 1998, 4, 1367–1372. 17 T. D. CLARK, M. R. GHADIRI, J. Am. Chem. Soc. 1995, 117, 12364–12365. 18 J. R. GRANJA, M. R. GHADIRI, J. Am. Chem. Soc. 1994, 116, 10785–10786. 19 T. D. CLARK, L. K. BUEHLER, M. R. GHADIRI, J. Am. Chem. Soc. 1998, 120, 651–656. 20 S. MATILE, Chem. Rec. 2001, 1, 162–172. 21 G. DAS, N. SAKAI, S. MATILE, Chirality 2002, 14, 18–24. 22 G. DAS, S. MATILE, Chirality 2001, 13, 170–176. 23 M. R. GHADIRI, J. R. GRANJA, R. A. MILLIGAN, D. E. MCREE, N. KHAZANOVICH, Nature 1993, 366, 324–327. 24 P. Y. APEL, Y. E. KORCHEV, Z. SIWY, R. SPOHR, M. YOSHIDA, Nucl. Intr. Meth. Physics Res. B 2001, 184, 337–346. 25 C. A. PASTERNAK, G. M. ALDER, P. Y. APEL, C. L. BASHFORD, D. T. EDMONDS, Y. E. KORCHEV, A. A. LEV, G. LOWE, M. MILOVANOVICH, C. W. PITT, T. K. ROSTOVTSEVA, N. I. ZHITARIUK, Radiation Measurements 1995, 25, 675–683. 26 K. B. JIRAGE, J. C. HULTEEN, C. R. MARTIN, Science 1997, 278, 655–658. 27 J. LI, D. STEIN, C. MCMULLAN, D. BRANTON, M. J. AZIZ, J. A. GOLOVCHENKO, Nature 2001, 412, 166–169. 28 R. A. MEHL, J. C. ANDERSON, S. W. SANTORO, L. WANG, A. B. MARTIN, D. S. KING, D. M. HORN, P. G. SCHULTZ, J. Am. Chem. Soc. 2003, 125, 935–939. 29 L.-Q. GU, O. BRAHA, S. CONLAN, S. CHELEY, H. BAYLEY, Nature 1999, 398, 686–690. 30 J. SANCHEZ-QUESADA, M. R. GHADIRI, H. BAYLEY, O. BRAHA, J. Am. Chem. Soc. 2000, 122, 11758–11766. 31 L.-Q. GU, S. CHELEY, H. BAYLEY, Science 2001, 291, 636–640. 32 L.-Q. GU, S. CHELEY, H. BAYLEY, J. Gen. Physiol. 2001, 118, 481–494. 33 J. S´ANCHEZ-QUESADA, M. P. ISLER, M. R. GHADIRI, J. Am. Chem. Soc. 2002, 124, 10004–10005. 34 J. C. HULTEEN, K. B. JIRAGE, C. R. MARTIN, J. Am. Chem. Soc. 1998, 120, 6603– 6604.
35 M. NISHIZAWA, V. P. MENON, C. R. MARTIN, Science 1995, 268, 700–702. 36 S. B. LEE, C. R. MARTIN, J. Am. Chem. Soc. 2002, 124, 11850–11851. 37 S. B. LEE, D. T. MITCHELL, L. TROFIN, T. K. NEVANEN, H. S¨ODERLUND, C. R. MARTIN, Science 2002, 296, 2198–2200. 38 J. M. PASCUAL, A. KARLIN, J. Gen. Physiol. 1998, 111, 717–739. 39 K. SAXENA, V. DROSOU, E. MAIER, R. BENZ, B. LUDWIG, Biochemistry 1999, 38, 2206–2212. 40 N. SAKAI, N. SORDE´ , G. DAS, P. PERROTTET, D. GERARD, S. MATILE, Org. Biomol. Chem. 2003, 1, 1226–1231. 41 L.-Q. GU, M. DALLA SERRA, J. B. VINCENT, G. VIGH, S. CHELEY, O. BRAHA, H. BAYLEY, Proc. Natl. Acad. Sci. USA 2000, 97, 3959–3964. 42 B. B. LAKSHMI, C. R. MARTIN, Nature 1997, 388, 758–760. 43 J. J. KASIANOWICZ, E. BRANDIN, D. BRANTON, D. W. DEAMER, Proc. Natl. Acad. Sci. USA 1996, 93, 13770–13773. 44 L. MOVILEANU, S. CHELEY, S. HOWORKA, O. BRAHA, H. BAYLEY, J. Gen. Physiol. 2001, 117, 239–251. 45 L. MOVILEANU, H. BAYLEY, Proc. Natl. Acad. Sci. USA 2001, 98, 10137– 10141. 46 S. M. BEZRUKOV, J. Membr. Biol. 2000, 174, 1–13. 47 H. BAYLEY, C. R. MARTIN, Chem. Rev. 2000, 100, 2575–2594. 48 M. MUTHUKUMAR, J. Chem. Phys. 2003, 118, 5174–5184. 49 M. ZHOU, J. H. MORAIS-CABRAL, S. MANN, R. MACKINNON, Nature 2001, 411, 643–644. 50 G. A. WOOLLEY, V. ZUNIC, J. KARANICOLAS, A. S. I. JAIKARAN, A. V. STAROSTIN, Biophys. J. 1997, 73, 2465–2475. 51 L. LIEN, D. C. J. JAIKARAN, Z. H. ZHANG, G. A. WOOLLEY, J. Am. Chem. Soc. 1996, 118, 12222–12223. 52 N. K. MAL, M. FUJIWARA, Y. TANAKA, Nature 2003, 421, 350–353. 53 O. BRAHA, L.-Q. GU, L. ZHOU, X. LU, S. CHELEY, H. BAYLEY, Nature Biotechnol. 2000, 17, 1005–1007. 54 S. CHELEY, L.-Q. GU, H. BAYLEY, Chem. Biol. 2002, 9, 829–838.
19
20 Engineered Nanopores
55 L.-Q. GU, H. BAYLEY, Biophys. J. 2000, 79, 1967–1975. 56 S. HOWORKA, L. MOVILEANU, O. BRAHA, H. BAYLEY, Proc. Natl. Acad. Sci. USA 2001, 98, 12996–13001. 57 S. HOWORKA, S. CHELEY, H. BAYLEY, Nature Biotechnol. 2001, 19, 636–639. 58 S. HOWORKA, H. BAYLEY, Biophys. J. 2002, 83, 3202–3210. 59 L. MOVILEANU, S. HOWORKA, O. BRAHA, H. BAYLEY, Nature Biotechnol. 2000, 18, 1091–1095. 60 M. S´ARA, U. SLEYTR, J. Bacteriol. 2000, 182, 859–868. 61 M. J. RUSSO, H. BAYLEY, M. TONER, Nature Biotechnol. 1997, 15, 278–282. 62 J. P. ACKER, X. LU, V. YOUNG, H. CHELEY, H. BAYLEY, A. FOWLER, M. TONER, Biotechnol. Bioeng. 2003, 82, 525–532. 63 A. EROGLU, M. J. RUSSO, R. BIEGANSKI, A. FOWLER, S. CHELEY, H. BAYLEY, M. TONER, Nature Biotechnol. 2000, 18, 163–167. 64 T. CHEN, J. P. ACKER, A. EROGLU, S. CHELEY, H. BAYLEY, A. FOWLER, M. TONER, Cryobiology 2001, 43, 168–181. 65 J. P. ACKER, A. FOWLER, B. LAUMAN, S. CHELEY, M. TONER, Cell Preserv. Technol. 2002, 1, 129–140. 66 S. FERNANDEZ-LOPEZ, H.-S. KIM, E. C. CHOI, M. DELGADO, J. R. GRANJA, A. KHASANOV, A. KRAEHENBUEHL, G. LONG, D. A. WEINBERGER, K. M. WILCOXEN, M. R. GHADIRI, Nature 2001, 412, 452– 455. 67 R. G. PANCHAL, E. CUSACK, S. CHELEY, H. BAYLEY, Nature Biotechnol. 1996, 14, 852–856. 68 H. A. FISHMAN, D. R. GREENWALD, R. N. ZARE, Annu. Rev. Biophys. Biomol. Struct. 1998, 27, 165–198. 69 C. ZIEGLER, W. G¨OPEL, Curr. Opin. Chem. Biol. 1998, 2, 585–591. 70 H. BAYLEY, O. BRAHA, L.-Q. GU, Adv. Mater. 2000, 12, 139–142. 71 H. BAYLEY, P. S. CREMER, Nature 2001, 413, 226–230. 72 S. BHAKDI, U. WELLER, I. WALEV, E. MARTIN, D. JONAS, M. PALMER, Med. Microbiol. Immunol. 1993, 182, 167–175. 73 I. WALEV, S. C. BHAKDI, F. HOFMANN, N. DJONDER, A. VALEVA, K. AKTORIES, S. BHAKDI, Proc. Natl. Acad. Sci. USA 2001, 98, 3185–3190.
74 A. E. OTTO-BRUC, R. N. FARISS, J. P. Van HOOSER, K. PALCZEWSKI, Proc. Natl. Acad. Sci. USA 1998, 95, 15014–15019. 75 X. S. XIE, H. P. LU, J. Biol. Chem. 1999, 274, 15967–15970. 76 L. MOVILEANU, S. CHELEY, H. BAYLEY, Biophys. J. 2003, 85, 897–910. 77 T. LUCHIAN, S.-H. SHIN, H. BAYLEY, Angew. Chem. Int. Ed. 2003, 42, 1926–1929. 78 S.-H. SHIN, T. LUCHIAN, S. CHELEY, O. BRAHA, H. BAYLEY, Angew. Chem. Int. Ed. 2002, 41, 3707–3709. 79 F. S. LIGLER, T. L. FARE, K. D. SEIB, B. SEE, J. W. SMUDA, A. SINGH, P. AHL, M. E. AYERS, A. DALZIEL, P. YAGER, Medical Instrumentation 1988, 22, 247– 256. 80 E. SACKMANN, M. TANAKA, Trends Biotechnol. 2000, 18, 58–64. 81 E.-K. SINNER, W. KNOLL, Curr. Opin. Chem. Biol. 2001, 5, 705–711. 82 J. T. GROVES, Curr. Opin. Drug Discov. Devel. 2002, 5, 606–612. 83 M. C. PETERMAN, J. M. ZIEBARTH, O. BRAHA, H. BAYLEY, H. A. FISHMAN, D. A. BLOOM, Biomed. Microdevices 2002, 4, 231–236. 84 B. SCHUSTER, D. PUM, O. BRAHA, H. BAYLEY, U. B. SLEYTR, Biochim. Biophys. Acta 1998, 1370, 280–288. 85 B. SCHUSTER, U. B. SLEYTR, Biochim. Biophys. Acta 2002, 1563, 29–34. 86 S. M. SCHILLER, R. NAUMANN, K. LOVEJOY, H. KUNZ, W. KNOLL, Angew. Chem. Int. Ed. Engl. 2003, 42, 208–211. 87 G. WIEGAND, K. R. NEUMAIER, E. SACKMANN, Rev. Sci. Instrum. 2000, 71, 2309–2320. 88 B. A. CORNELL, V. L. B. BRAACH-MAKSVYTIS, L. G. KING, P. D. J. OSMAN, B. RAGUSE, L. WIECZOREK, R. J. PACE, Nature 1997, 387, 580–583. 89 G. KRISHNA, J. SCHULTE, B. A. CORNELL, R. J. PACE, P. D. OSMAN, Langmuir 2003, 19, 2294–2305. 90 W. MEIER, C. NARDIN, M. WINTERHALTER, Angew. Chem. Int. Ed. Engl. 2000, 39, 4599–4602. 91 A. VALEVA, A. WEISSER, B. WALKER, M. KEHOE, H. BAYLEY, S. BHAKDI, M. PALMER, EMBO J. 1996, 15, 1857– 1864.
References 92 B. J. WALKER, M. KRISHNASASTRY, L. ZORN, J. J. KASIANOWICZ, H. BAYLEY, J. Biol. Chem. 1992, 267, 10902–10909. 93 T. LAMLA, W. STIEGE, V. A. ERDMANN, Mol. Cell. Proteomics 2002, 1.6, 466–472. 94 A. HILDEBRAND, M. POHL, S. BHAKDI, J. Biol. Chem. 1991, 266, 17195–17200. 95 G. MENESTRINA, J. Membr. Biol. 1986, 90, 177–190. 96 B. WALKER, H. BAYLEY, Protein Eng. 1995, 8, 491–495. 97 W. F. Ed. DEGRADO, Chem. Rev. 2001, 101, special issue. 98 R. OLSON, H. NARIYA, K. YOKOTA, Y. KAMIO, E. GOUAUX, Nature Struct. Biol. 1999, 6, 134–140. 99 J.-D. P´EDELACQ, L. MAVEYRAUD, G. PRE´ VOST, L. BABA-MOUSSA, A. GONZA´ LEZ, E. COURCELLE, W. SHEPARD, H. MONTEIL, J.-P. SAMAMA, L. MOUREY, Structure 1999, 7, 277–288. 100 S. KOLUSHEVA, L. BOYER, R. JELINEK, Nature Biotechnol. 2000, 18, 225–227. 101 J. M. RAUSCH, W. C. WIMLEY, Anal. Biochem. 2001, 293, 258–263. 102 G. DAS, P. TALUKDAR, S. MATILE, Science 2002, 298, 1600–1602. 103 J. T. GROVES, S. G. BOXER, Acc. Chem. Res. 2002, 35, 149–157. 104 K. MORIGAKI, T. BAUMGART, A. OFFENHA¨ USSER, W. KNOLL, Angew. Chem. Int. Ed. Engl. 2001, 40, 172–174. 105 P. S. CREMER, T. YANG, J. Am. Chem. Soc. 1999, 121, 8130–8131. 106 T. YANG, E. E. SIMANEK, P. S. CREMER, Anal. Chem. 2000, 72, 2587–2589.
107 S. CONLAN, Y. ZHANG, S. CHELEY, H. BAYLEY, Biochemistry 2000, 39, 11845–11854. 108 M. BEHLAU, D. J. MILLS, H. QUADER, W. K¨UHLBRANDT, J. VONCK, J. Mol. Biol. 2001, 305, 71–77. 109 X. HUANG, H. M. HOLDEN, F. M. RAUSHEL, Annu. Rev. Biochem. 2001, 70, 149–180. 110 B. BAUMEISTER, N. SAKAI, S. MATILE, Org. Lett. 2001, 3, 4229–4232. 111 B. BAUMEISTER, S. MATILE, Macromolecules 2002, 35, 1549–1555. 112 M. S. P. SANSOM, I. H. SHRIVASTAVA, K. M. RANATUNGA, G. R. SMITH, Trends Biochem. Sci. 2000, 25, 368– 374. 113 B. ROUX, Curr. Opin. Struct. Biol. 2002, 12, 182–189. 114 U. RAVIV, P. LAURAT, J. KLEIN, Nature 2001, 413, 51–54. 115 A. T. CONLISK, J. MCFERRAN, Z. ZHENG, D. HANSFORD, Anal. Chem. 2002, 74, 2139–2150. 116 G. G. KOCHENDOERFER, J. M. TACK, S. CRESSMAN, Bioconjug. Chem. 2002, 13, 474–480. 117 S. W. COWAN, R. M. GARAVITO, J. N. JANSONIUS, J. A. JENKINS, R. KARLSSON, N. K¨ONIG, E. F. PAI, R. A. PAUPTIT, P. J. RIZKALLAH, J. P. ROSENBUSCH, G. RUMMEL, T. SCHIRMER, Structure 1995, 3, 1041–1050 118 L.-Q. GU, S. CHELEY, H. BAYLEY, Proc. Natl. Acad. Sci. USA, 2003, 100, 15498–15503
21
1
Design and Stability of α-Helices
Andrew J. Doig, Neil Errington, and Teuku M. Iqbalsyah UMIST, Manchester, United Kingdom
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Proteins are built of regular local folds of the polypeptide chain called secondary structure. The α-helix was first described by Pauling, Corey, and Branson in 1950 [1], and their model was quickly supported by X-ray analysis of hemoglobin [2]. Irrefutable proof of the existence of the α-helix came with the first protein crystal structure of myoglobin, where most secondary structure is helical [3]. α-Helices were subsequently found in nearly all globular proteins. It is the most abundant secondary structure, with ≈30% of residues found in α-helices [4]. In this review, we discuss structural features of the helix and their study in peptides. Some earlier reviews on this field are in Refs [5–11].
2 Structure of the α-Helix
A helix combines a linear translation with an orthogonal circular rotation. In the α-helix the linear translation is a rise of 5.4 Å per turn of the helix and a circular rotation is 3.6 residues per turn. Side chains spaced i, i + 3, i, i + 4, and i, i + 7 are therefore close in space and interactions between them can affect helix stability. Spacings of i, i + 2, i, i + 5, and i, i + 6 place the side chain pairs on opposite faces of the helix, avoiding any interaction. The helix is primarily stabilized by i, i + 4 hydrogen bonds between backbone amide groups. The conformation of a polypeptide can be described by the backbone dihedral angles φ and ψ. Most φ, ψ combinations are sterically excluded, leaving only the broad β region and narrower α region. One reason why the α-helix is so stable Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Design and Stability of "-Helices
is that a succession of the sterically allowed α, φ, ψ angles naturally position the backbone NH and CO groups towards each other for hydrogen bond formation. It is possible that a succession of the most stable conformation of an isolated residue in a polymer with alternative functional groups could point two hydrogen bond donors or acceptors towards each other, making secondary structure formation unfavourable. One reason why polypeptides may have been selected as the polymer of choice for building functional molecules is that the sterically most stable conformations also give strong hydrogen bonds. The residues at the N-terminus of the α-helix are called N -N-cap-N1-N2-N3N4 etc., where the N-cap is the residue with nonhelical φ, ψ angles immediately preceding the N-terminus of an α-helix and N1 is the first residue with helical , ψ angles [12]. The C-terminal residues are similarly called C4-C3-C2-C1-C-cap-C etc. The N1, N2, N3, C1, C2, and C3 residues are unique because their amide groups participate in i, i + 4 backbone-backbone hydrogen bonds using either only their CO (at the N-terminus) or NH (at the C-terminus) groups. The need for these groups to form hydrogen bonds has powerful effects on helix structure and stability [13]. The following rules are generally observed for N-capping in α-helices: Thr and Ser N-cap side chains adopt the gauche− rotamer, hydrogen bond to the N3 NH and have ψ restricted to 164 ± 8◦ . Asp and Asn N-cap side chains either adopt the gauche− rotamer and hydrogen bond to the N3 NH with ψ = 172 ± 10◦ , or adopt the trans rotamer and hydrogen bond to both the N2 and N3 NH groups with ψ = 107 ± 19◦ . With all other N-caps, the side chain is found in the gauche+ rotamer so that the side chain does not interact unfavorably with the N-terminus by blocking solvation and ψ is unrestricted. An i, i + 3 hydrogen bond from N3 NH to the N-cap backbone C=O is more likely to form at the N-terminus when an unfavorable N-cap is present [14]. Bonds between sp3 hybridized atoms show preferences for dihedral angles of +60◦ (gauche− ), −60◦ (gauche+ ) or 180◦ (trans). Within amino acid side chains the preferences for these rotamers varies between secondary structures [15–18]. Within the α-helix, rotamer preferences vary greatly between different positions of N-cap, N1, N2, N3, and interior [14, 19]. At helix interior positions, the gauche+ rotamer is usually most abundant (64%), followed by trans (33%) with gauche− very rare (3%), though considerable variations are seen. For example, Ser and Thr are gauche− 19% and 14% of the time, respectively, as their hydroxyl groups can hydrogen bond to the helix backbone in this conformation. The β-branched side chains Val, Ile, and Thr are the most restricted with gauche+ very common. 2.1 Capping Motifs
The amide NH groups at the helix N-terminus are satisfied predominantly by side chain hydrogen bond acceptors. In contrast, carbonyl CO groups at the C-terminus are satisfied primarily by backbone NH groups from the sequence following the helix [13]. The presence of such interactions would therefore stabilize helices. A well-known interaction is helix capping, defined as specific patterns found at or
2 Structure of the "-Helix Table 1 The most common capping motifs at α-helix termini
Sequence Related position Designation N-capping p-XXp
N-cap → N3
hp-XXph
N → N4
Expanded capping box/Hydrophobic staple
hP-XXVh
N → N4
Pro-box motif
C-capping hXp-XGh
C3 → C
Schellman motif
hXp-XGp
C2 → C
α L motifs
X-Pro
C-cap → C
Capping box
Pro-capping motif
Interactions
References
[24, 25] Reciprocal H-bonds between residues at N-cap and N3, where S/T at N-cap and E at N3 are the most frequently observed residues Reciprocal H-bonds between [26, 250] residues at N-cap and N3 accompanied by hydrophobic interactions between residues at N and N4 [28] N-cap Pro residues are usually associated with Ile and Leu at N , Val at N3 and a hydrophobic residue at N4 H-bonds between CNH and the C3 CO at and between C NH and C2 CO, respectively accompanied by hydrophobic interaction between C3 and C As The Schellman motif but H-bond between C NH and C3 CO where Cpolar A stabilizing electrostatic interaction of the residues at positions C-cap and Pro at C with the helix macrodipole
[31, 223]
[32]
[34]
p, polar amino acids; h,hydrophobic amino acids;X,any amino acids; P, proline; G, glycine.
near the ends of helices [12, 20–24]. The patterns involve hydrogen bonding and hydrophobic interactions as summarized in Table 1. A common pattern of capping at the helix N-terminus is the capping box. Here, the side chain of the N-cap forms a hydrogen bond with the backbone of N3 and, reciprocally, the side chain of N3 forms a hydrogen bond with the backbone of the N-cap [25]. The definition of the capping box was expanded by Seale et al. [26] to include an associated hydrophobic interaction between residues N and N4 and is also known as a “hydrophobic staple” [27]. A variant of the capping box motif, termed the “big” box with an observed hydrophobic interaction between nonpolar side chain groups in residues N4 and N (not N ) [26]. The Pro-box motif involves three hydrophobic residues and a Pro residue at the N-cap [28]. The N-cap Pro residue is usually associated with Ile or Leu at position N, Val at position N3, and a hydrophobic residue at position N4. This motif is arguably very destabilizing since it discourages helix elongation due to the low
3
4 Design and Stability of "-Helices
N-cap preference of Pro [29] and the poor helix propagation parameter value of Val [30]. This suggests that the Pro-box motif will not specially contribute to protein stability but to the specificity of its fold. The two primary capping motifs found at helix C-termini are the Schellman and the α L motifs [31–33]. The Schellman motif is defined by a doubly hydrogenbonded pattern between backbone partners, consisting of hydrogen bonds between the amide NH at C and the carbonyl CO at C3 and between the amide NH at C and the carbonyl CO at C2, respectively. The associated hydrophobic interaction is between C3 and C. In a Schellman motif, polar residues are highly favored at the C1 position and the C residue is typically glycine. If C is polar, the alternative α L motif is observed. The α L motif is defined by a hydrogen bond between the amide NH at C and the carbonyl CO at C3. As in the Schellman motif, the C residue is typically glycine, which adopts a positive value of dihedral angle. However, the hydrophobic interaction in an α L is heterogeneous, occurring between C3 and any of several residues external to the helix (C3 , C4 , or C5 ) [32]. A statistical analysis of the protein crystal structures database detected another possible C-cap local motif, designated Pro C-capping [34]. Certain combinations of X-Pro pairs, in which residue X is the C-cap and the Pro is at position C , are more abundant than expected. The aliphatic (Ile, Val, and Leu) and aromatic residues (Phe, Tyr, and Trp), together with Asn, His, and Cys are the most favored residues accompanying Pro at position C . A notable difference between the N- and C-terminal motifs is that at the Nterminus, helix geometry favors side chain-to-backbone hydrogen bonding and selects for compatible polar residues [14, 19]. Accordingly, the N-terminus promotes selectivity in all polar positions, especially N-cap and N3 in the capping box. In contrast, at the C-terminus, side chain-to-backbone hydrogen bonding is disfavored. Backbone hydrogen bonds are satisfied instead by posthelical backbone groups. The C-terminus need only select for C residues that can adopt positive values of the backbone dihedral angle , most notably Gly [32]. 2.2 Metal Binding
In general, the stabilization of folded forms can be achieved by reducing the entropy by cross-linking, such as via metal ions. Another way to stabilize helix conformations, especially in short peptides, is to introduce an artificial nucleation site composed of a few residues fixed in a helical conformation. For example, the calcium-binding loop from EF-hand proteins saturated with a lanthanide ion promotes a rigid short helical conformation at its C-terminus region [35]. Metal ions play a key structural and functional role in many proteins. There is therefore interest in using peptide models containing metal complexes. In stabilizing proteins, the rule of thumb in metal–ligand binding is that “hard metals” prefer “hard ligands”. For example Ca and Mg prefer ligands with oxygen as the coordinating atoms (Asp, Glu) [36]. In contrast, soft metals, such as Cu and Zn,
2 Structure of the "-Helix
bind mostly to His, Cys, and Trp ligands and sometimes indirectly via water molecules [37]. In studies of helical peptide models, soft ligands have been mostly used. In the presence of Cd ions, a synthetic peptide containing Cys-His ligands i, i + 4 apart at the C-terminal region promoted helicity from 54% to 90%. The helicity of a similar peptide containing His-His ligands increased by up to 90% as a result of Cu and Zn binding [38]. The addition of a cis-Ru(III) ion to a 6-mer peptide, Ac-AHAAAHANH2 , changed the peptide conformation from random coil to 37% helix [39]. An 11-residue peptide was converted from random coil to 80% helix content by the addition of Cd ions, although the ligands used were not natural amino acids but aminodiacetic acids [40]. As(III) stabilizes helices when bound to Cys side chains spaced i, i + 4 by −0.7 to −1.0 kcal mol−1 [41]. 2.3 The 310 -Helix
310 -Helices are stabilized by i, i + 3 hydrogen bonds instead of the i, i + 4 bonds found in α-helices, making the cylinder of the 310 -helix narrower than α and the hydrogen bonds nonlinear. About 3–4% of residues in crystal structures are in 310 -helices [4, 42], making the 310 -helix the fourth most common type of secondary structure in proteins after α-helices, β-sheets, and reverse turns. Most 310 -helices are short, only 3 or 4 residues long, compared with a mean of 10 residues in α-helices [4]. 310 -Helices are commonly found as N- or C-terminal extensions to an α-helix [4, 43, 44]: Strong amino acid preferences have been observed for different locations within the interiors [42] and N- and C-caps [14] of 310 -helices in crystal structures. The 310 -helix is being recognized as of increasing importance in isolated peptides and as a possible intermediate in α-helix formation [45, 46]. It has been proposed that peptides made of the standard 20 L-amino acids can form 310 -helices [47], or at least populate a significant amount of 310 -helix at their N- and C-termini. The fraction of 310 -helix present in a helical peptide is strongly sequence dependent [48, 49]. 310 -helix formation can be induced by the introduction of a Cα,α -disubstituted α-amino acid, of which α-aminoisobutyric acid (AIB) is the prototype. For most amino acids, the α-helical geometry (φ = −57◦ , ψ −70◦ ) is of lower energy than the 310 geometry (φ = −49◦ , ψ = −26◦ ). There is no disallowed region between the α and 310 conformations in the Ramachandran plot, and a peptide can therefore be gradually transformed from one helix to the other [50]. 2.4 The p -Helix
In contrast to the widely occurring α- and 310 -helices, the π-helix is extremely rare. The π-helix is unfavorable for three reasons: its dihedral angles are energetically unfavorable relative to the α-helix [51, 52], its three-dimensional structure has a 1Å hole down the center that is too narrow for access by a water molecule, resulting in the loss of van der Waals interactions, and a higher number of residues (four) must
5
6 Design and Stability of "-Helices
be correctly oriented before the first i, i + 5 hydrogen bond is formed, making helix initiation more entropically unfavorable than for α- or 310 -helices. The π-helix may be of more than theoretical interest [53–57].
3 Design of Peptide Helices
The first protein crystal structures showed an abundance of α-helices, leading to speculation whether peptide fragments of the helical sequences could be stable in isolation. Since isolated helices lack hydrophobic cores, this question could be rephrased to ask whether the amide-amide hydrogen bond is strong enough to oppose the loss of conformational entropy arising from restricting the peptide into a helical structure [58]. Early estimates of these terms suggested that hydrogen bonds were not strong enough and this was confirmed by peptide studies of helices from myoglobin [59] and staphylococcal nuclease [60], which found no helix formation. Early estimates of helix/coil parameters from a host–guest system (see Section 3.1) suggested that polypeptides would need to be hundreds of residues long to form stable helices. Hence, the earliest work on peptide helices was on long homopolymers of Glu or Lys, which show coil-to-helix transitions on changing the pH from charged to neutral. The neutral polypeptides are metastable and prone to aggregation, ultimately to β-sheet amyloid [61]. A conflicting result was found by Brown and Klee in 1971 [62], who reported that the C-peptide of ribonuclease A, which contains the first 13 residues of the protein and which forms a helix in the protein, was helical at 0◦ C. This observation was not followed up for 10 years until extensive work on the sequence features responsible for helix formation in this peptide, and in the larger S-peptide, was performed by Baldwin and coworkers [63–73], while NMR studies by Rico and coworkers precisely defined the helical structure [74–82]. Some important features responsible for the helicity of the C-peptide that emerged from this work included an i, i + 4 Phe-His interaction, an i, i + 3 Glu-His salt bridge, an unusual i, i + 8 Glu-Arg salt bridge that spanned two turns of the helix and a helix termination signal at Met. The stabilizing effects of salt bridges were inferred from pH titrations and circular dichroism (CD), where a decrease in helicity was seen when a residue participating in a salt bridge was neutralized. Quantification of these features in terms of a free energy was not possible at that time, as helix/coil theory that included side-chain interactions and termination signals (caps) had not been developed. Perhaps the most interesting result from the C- and S-peptide work was the importance of Ala. The replacement of interior helical residues with Ala was stabilizing, indicating that a major reason why this helix was folded in isolation was the presence of three successive alanines from positions 4–6. This led to the successful design of isolated, monomeric helical peptides in aqueous solution, first containing several salt bridges and a high alanine content, based on (EAAAK)n [83, 84] and then a simple sequence with a high alanine content solubilized by several lysines [85]. These “AK peptides” are based on the sequence (AAKAA)n , where n is typically
3 Design of Peptide Helices 7
2–5. The Lys side chains are spaced i, i + 5 so they are on opposite faces of the helix, giving no charge repulsion and may be substituted with Arg or Gln to give a neutral peptide. Hundreds of AK peptides have been studied, giving most of the results on helix stability in peptides (see below). The Alanines in the (EAAAK)n type peptides may be removed entirely; E4 K4 peptides, with sequences based on (EEEEKKKK)n or EAK patterns are also helical, stabilized by large numbers of salt bridges [8, 86–88]. 3.1 Host–Guest Studies
Extensive work from the Scheraga group has obtained helix/coil parameters using a host–guest method. Long random copolymers were synthesized of a water-soluble, nonionic guest (poly[N 5 -(3-hydroxypropyl)-L-glutamine] (PHPG) or poly[N 5 -(4-hydroxybutyl)-L-glutamine] (PHBG)), together with a low (10–50%) content of the guest residue. Using the s and σ Zimm-Bragg helix/coil parameters (see below) for the host homopolymer, it was possible to calculate those for the guest using helix/coil theory as a function of temperature. The results from the host–guest work are in disagreement with most of those from short peptides of fixed sequence (see below). 3.2 Helix Lengths
Helix formation in peptides is cooperative, with a nucleation penalty. Helix stability therefore tends to increase with length, in homopolymers at least. As the length of a homopolymer increases, the mean fraction helix will level off below 100% as long helices tend to break in two. In heteropolymers, observed lengths are highly sequence dependent. As helices are at best marginally stable in monomeric peptides in aqueous solution, they are readily terminated by the introduction of a strong capping residue or a residue with a low intrinsic helical preference. The length distribution of helices in peptides is very different to proteins [4]. Most helices are short, with 5–14 residues most common. There is a general trend for a decrease in frequency as the length increases beyond 13 residues. Helix lengths longer than 25 are rare. This is a consequence of the organization of proteins into domains of similar size, rather than showing different rules for stability; helices do not extend beyond the boundaries of the domain and so terminate. There is also a preference to have close to an integral number of turns so that their N- and C-caps are on the same side of the helix [89]. 3.3 The Helix Dipole
The secondary amide group in a protein backbone is polarized with the oxygen negatively charged and hydrogen positively charged. In a helix the amides are
8 Design and Stability of "-Helices
all oriented in the same direction with the positive hydrogens pointing to the N-terminus and negative oxygens pointing to the C-terminus. This can be regarded as giving a positive charge at the helix N-terminus and a negative charge at the helix C-terminus [90–92]. In general, therefore, negatively charged groups are stabilizing at the N-terminus and positive at the C-terminus, as shown by numerous titrations that measure helix content as a function of pH and amino acid preferences for helix terminal positions. An alternative interpretation of these results is that favored side chains are those that can make hydrogen bonds to the free amide NH groups at N1, N2, and N3 or free CO groups at C1, C2, and C3 [93]. Charged groups can form stronger hydrogen bonds than neutral groups, thus providing an alternative rationalization of the pH titration results. These hypotheses are not mutually exclusive, as a charged side chain can also function as a hydrogen bond acceptor or donor. A free energy simulation of Tidor [94] suggested that helix-stabilizing interactions, as a result of a Tyr → Asp substitution at an N-cap site, arose from hydrogen-bonding interactions from its direct hydrogen-bonding partners, and from more distant electrostatic interactions with groups within the first two turns of the helix. Measurements of the amino acid preferences for the N-cap, N1, N2, and N3 positions in the helix allow a comparison to be made of the relative importance of helix-dipole and hydrogen-bonding interactions [29, 95–97]. The helix-dipole model implies that the side chains most favored at the helix N-terminus are those with a negative charge while positive charges are disfavored. A pure hydrogen-bonding model implies that favored side chains are those that can make hydrogen bonds to the free amide NH groups at N1, N2, and N3. In general, the N-cap results suggest that hydrogen bonding is more important than helix-dipole interactions; the best N-caps are Asn, Asp, Ser, and Thr [29], which can accept hydrogen bonds from the N2 and N3 NH groups [14]. Glu has only a moderate N-cap preference despite its negative charge. In contrast, N1, N2, and N3 results suggest that helix-dipole interactions are more important. The contrasting results between the different helix N-terminal positions can be rationalized by considering the geometry of the hydrogen bonds. N-cap hydrogen bonds are close to linear [14] and so are strong, while N1 and N2 hydrogen bonds are close to 90◦ [19], making them much weaker. Helix-dipole effects are likely to be present at all sites, as also shown by every pH titration where a more negative side chain is favored over a more positive side chain, but this can be overwhelmed by strong hydrogen bonds, as at the N-cap. In the absence of strong hydrogen bonds, helix-dipole effects dominate. Hydrogen bonds can therefore make a substantial contribution to protein stability, but only if their geometry is close to linear (as it is for backbone to backbone hydrogen bonds in α-helices and β-sheets). 3.4 Acetylation and Amidation
A simple, yet effective, way to increase the helicity of a peptide is to acetylate its N-terminus [22, 98]. This is readily done with acetic anhydride/pyridine, after completion of a peptide synthesis, but before cleavage of the peptide from the resin
3 Design of Peptide Helices 9
and deprotection of the side chains. Acetylation removes the positive charge that is present at the helix terminus at low or neutral pH; this would interact unfavorably with the positive helix dipole and free N-terminal NH groups. The extra CO group from the acetyl group can form an additional hydrogen bond to the NH group, putting the acetyl at the N-cap position. This has a strong stabilizing effect by approximately 1.0 kcal mol−1 compared with alanine [29, 30, 99]. The acetyl group is one of the best N-caps. We found only Asn and Asp to be more stabilizing [29]. Amidation of the peptide C-terminus is achieved by using different types of resin in solid-phase peptide synthesis, resulting in the replacement of COO− with CONH2 . This is similar structurally to acetylation: the helix is extended by one hydrogen bond and an unfavorable charge–charge repulsion with the helix dipole is removed. The energetic benefit of amidation is rather smaller, however, with the amide group being no better than Ala and in the middle if the C-cap residues are ranked in order of stabilization effect [29]. As most helical peptides studied to date are both acetylated and amidated, and acetylation is more stabilizing than amidation, the helicity of peptides is generally skewed so that residues near the N-terminus are more helical than those near the C-terminus. Acetylation and amidation are often also beneficial in removing a pH titration that can obscure other effects. 3.5 Side Chain Spacings
Side chains in the helix are spaced at 3.6 residues per turn of the helix. Side chains spaced at i, i + 3, i, i + 4, and i, i + 7 are therefore close in space and interactions between them can affect helix stability. Spacings of i, i + 2, i,i + 5, and i, i + 6 place the side chain pairs on opposite faces of the helix, avoiding any interaction. Care must therefore be taken to avoid unwanted interactions when designing helices. For example, we studied these peptides to determine the energetics of the Phe-Met interaction [100]: F7M12 Ac-YGAAKAFAAKAMAAKAA-NH2 F8M12 Ac-YGAAKAAFAKAMAAKAA-NH2 The control peptide F7M12 has Phe and Met spaced i, i + 5 where they cannot interact; F8M12 moves the Phe one place towards the C-terminus so that Phe and Met are spaced i, i + 4 where they can form a noncovalent interaction. The increase in helix content was used to derive the Phe-Met interaction energy. These sequences are not ideal, however. F7M12 contains a Phe-Lys i, i + 3 interaction that is replaced by a Lys-Phe i, i + 3 interaction in F8M12. If either of these are stabilizing or nonstabilizing the Phe-Met i, i + 4 energy will be in error. While this problem is likely to be small, in this case at least, our more recent designs try to minimize such potential problems. For example, the following peptides, designed to measure Ile-Lys i, i + 4 energies, do not introduce or remove an unwanted i, i + 3 spacing [101]: IKc Ac-AKIAAAAKIAAAAKAKAGY-NH2 IKi Ac-AKAIAAAKAIAAAKAKAGY-NH2
10 Design and Stability of "-Helices
3.6 Solubility
Lack of peptide solubility can be a problem. Peptides designed to be helices can even become highly insoluble amyloid with a high β-sheet content. We have occasionally found that alanine-based peptides designed to be helical instead form amyloid, though this may only take place after several years storage in solution. The measurement of interactions in a helix will be compromised by peptide oligomerization so it is generally essential to check that the peptides are monomeric. This can be done rigorously by sedimentation equilibrium, which determines the oligomeric state of a molecule in solution. This is difficult, however, with the short peptides often used, as their molecular weights are at the lower limit for this technique. A simpler method is to check a spectroscopic technique that depends on peptide structure, most obviously CD, as a function of concentration. If the signal depends linearly on peptide concentration across a large range, including that used to study the peptide structure, it is safe to assume that the peptide is monomeric. For example, if helicity measurements are made at 10 µM, CD spectra can be acquired from 5 µM to 100 µM. An oligomer that does not change state, such as a coiled-coil, across the concentration range cannot be excluded, however. Light scattering can detect aggregation. A monomeric peptide should have a flat baseline in a UV spectrum outside the range of any chromophores in the peptide. In stock solutions of a peptide with a single tyrosine isolated from the helix region by Gly should have A300 /A275 < 0.02 and A250 < A275 < 0.2 [102]. Consideration of solubility is essential when designing helical peptides. While Ala has the highest helix propensity and would provide an ideal theoretical background for substitutions, poly(Ala) is insoluble. Solubility can be achieved most easily by including polar side chains spaced i, i + 5 in the sequence where they cannot interact. Lys, Arg, and Gln are used most often for this purpose. Gln may be preferred if unwanted interactions with charged Lys or Arg may be a problem, but some AQ peptides lack sufficient solubility and AQ peptides are less helical. We have found that it is not easy to predict peptide solubility. For example, 20 peptides with sequence Ac-XAAAAQAAAAQAAGY-NH2, where X is all the amino acids encoded by the genetic code, were all soluble [95]. However, when the homologous peptide Ac-AAAAAAQAAAAQAAAAQGY-NH2 was synthesized to study the N2 position it tended to aggregate. The Gln side chains were therefore replaced with Lys, giving no further problems [96]. The spacing of side chains in the helix is best visualized with a helical wheel, to ensure that the designed helix does not have a nonpolar face that may lead to dimerization. The following web pages provide useful resources for this: http://www.site.uottawa.ca/∼turcotte/resources/HelixWheel/ http://marqusee9.berkeley.edu/kael/helical.htm
3 Design of Peptide Helices 11
3.7 Concentration Determination
An accurate measurement of helix content depends on an accurate spectroscopic measurement and, equally importantly, peptide concentration. This is usually achieved by including a Tyr side chain at one end of the peptide. The extinction coefficient of Tyr at 275 nm is 1450 M−1 cm−1 [103]. If Trp is present, measurements at 281 nm can be used where the extinction coefficient of Trp is 5690 M−1 cm−1 and Tyr 1250 M−1 cm−1 [104]. Phe absorbance is negligible at this wavelength. These UV absorbances are ideally made in 6 M GuHCl, pH 7.0, 25 ◦ C, though we have found very little variation with solvent so measurements in water are identical within error. The main source of error is in pipetting small volumes; this is typically around 2%. Pipetting larger volumes with well-maintained fixed volume pipettors can help minimize this error, which may well be the largest when measuring a percentage helix content by CD. Though the inclusion of aromatic residues is required for concentration determination, this can have the unwanted side effect of perturbing a CD spectrum, leading to an inaccurate measure of helix content. A simple solution to this is to separate the terminal Tyr from the rest of the sequence by one or more Gly residues [105]. If the aromatic residues must be included within the helical region, the CD spectrum should be corrected to remove this perturbation (see later). 3.8 Design of Peptides to Measure Helix Parameters
Numerous peptides have been studied to measure the forces responsible for helix formation (see below). In general, one or more peptides are synthesized that contain the interaction of interest, while all other terms that can contribute to helix stability are known. The helix content of the peptide is measured and the statistical weight of the interaction is varied until predictions from helix/coil theory match experiment. The weight of the interaction (and hence its free energy, as -RT ln(weight)) is then known. In practice it is wise to also synthesize a control peptide that lacks the interaction, but is otherwise very similar. The helix content of this peptide should be predictable from helix/coil theory using known parameters. For example, Table 2 gives sequences of control and interaction peptides used to measure Phe-Lys i, i + 4 side-chain interaction energies [106]. The FK control peptide has no i, i + 4 side-chain interactions and all the helix/coil weights for these residues are known. The helicity predicted by SCINT2 [30] for this peptide is in close agreement with experiment. This is an important confirmation that the helix/coil model and the parameters used are accurate. The discrepancy between prediction and experiment for the FK interaction peptide is because this peptide contains two Phe-Lys i, i + 4 side-chain interactions whose weights are assumed to be 1 (i.e., G = 0). If the Phe-Lys weight is changed to 1.3, the prediction agrees with experiment. Hence G (Phe-Lys) = - RT ln 1.3 = −0.14 kcal mol−1 .
12 Design and Stability of "-Helices
Table 2 Peptides designed to measure the energetics of the Phe-Lys interaction
Peptide name
Sequence
FK control Ac-AAAKFAAAAKFAAAAKAKAGY-NH2 FK interaction Ac-AAAKAFAAAKAFAAAKAKAGY-NH2
Predicted nelicity Phe-Lys weight G (Phe-Lys) Experimental helicity (no side-chain interaction) to fit experiment kcal mol−1 35.4% 41.1%
35.6 34.6
N/A 1.3
N/A −0.14
3 Design of Peptide Helices 13
It is important to minimize the error in determining helix/coil parameters. Errors can be calculated by assuming an error in measurement of % helix (±3% is reasonable) and refitting the results across this experimental error range. For the case above, G (Phe-Lys) was calculated for 38.1% and 44.1% helix, giving a range of −0.05 to −0.18 kcal mol−1 . This is a remarkably small error range, resulting from the high sensitivity of helix content to a small change in helix stability. This sensitivity can be maximized by including multiple identical interactions; here the FK interaction peptide has two Phe-Lys interactions, for example. The helix contents of the control and interaction peptides should be close to 50%. This may be difficult to achieve if the residues being used have low helix preferences. The best way to maximize sensitivity is to calculate it in advance using possible sequences, helix/coil theory, an error of ±3% and guessing a sensible value for the interaction energy. The interaction of interest can be placed at various positions in the peptide, its length can be changed by adding further AAKAA sequences, terminal residues such as a Tyr can be moved, and solubilizing side chains can be changed. Peptide design can be lengthy, considering sensitivity and solubility, but is a valuable process. The complex nature of the helix/coil equilibrium with frayed conformations highly populated means that considerable variations can be seen for apparently small changes, such as moving an interaction towards one terminus of a sequence.
3.9 Helix Templates
A major penalty to helix formation is the loss of entropy arising from the requirement to fix three consecutive residues to form the first hydrogen bond of the helix. Following this nucleation, propagation is much more favored as only a single residue need be restricted to form each additional hydrogen bond. A way to avoid this barrier is to synthesize a template molecule that facilitates helix initiation, by fixing hydrogen bond acceptors or donors in the correct orientation for a peptide to bond in a helical geometry. The ideal template nucleates a helix with an identical geometry to a real helix. Kemp’s group applied this strategy and synthesized a proline-like template that nucleated helices when a peptide chain was covalently attached to a carboxyl group (Figure 1a) [107–111]. The template is in an equilibrium between cis and trans isomers of the proline-like part of the molecule. Only the trans isomer can bond to the helix so helix content is determined by measuring the cis/trans ratio by NMR. Bartlett et al. reported on a hexahydroindol-4-one template (Figure 1b) [112] that induced 49–77% helicity at 0 ◦ C, depending on the method of determination, in an appended hexameric peptide. Several other templates were less successful and could only induce helicity in organic solvents [113–115]. Their syntheses are often lengthy and difficult, partly due to the challenging requirement of orienting several dipoles to act as hydrogen bond acceptors or donors.
14 Design and Stability of "-Helices
Fig. 1 Helix templates. a) Kemp, b) Bartlett.
3.10 Design of 310 -Helices
Peptides can be induced to form 310 -helix by the incorporation of a significant amount of disubstituted Cα amino acids, of which the simplest is α-aminoisobutyric acid (Aib) [116–119]. The presence of steric interactions from the two methyl groups on the α-carbon in AIB results in the 310 geometry being energetically favored over the α. Peptides rich in Cα, α-disubstituted α-amino acids are readily crystallized and many of their structures have been solved (reviewed in Refs [50] and [120]). Aib is conformationally restricted so that shorter Aib-based helices are more stable than Ala-based α-helices [121]. Many examples of helix stabilization or 310 -helix formation by Aib have been reported. The α/310 equilibrium has been studied in peptides. Yokum et al. [122] synthesized a peptide composed of Cα, α-disubstituted α-amino acids that forms mixed 310 -/α-helices in mixed aqueous/organic solvents. Millhauser and coworkers have studied the 310 /α equilibrium in peptides by electron spin resonance (ESR) and nuclear magnetic resonance (NMR) [47, 49, 123, 124] and argued that mixed 310 -/α-helices are common in polyalanine-based helices. Yoder et al. [125] studied a series of L-(αMe)-Val homopolymers and showed that their peptides can form coil, 310 -helix, or α-helix depending on concentration, peptide length, and solvent. Kennedy et al. [126] used Fourier transform infrared (FTIR) spectroscopy to monitor 310 - and α-helix formation in poly(Aib)-based peptides. Hungerford et al. [127] synthesized peptides that showed an α to 310 transition upon heating. 310 -Helices have also been proposed as thermodynamic intermediates where helical peptides of moderate stability exist as a mixture of α- and 310 -structures [45, 128].
4 Helix Coil Theory
Guidelines for the inclusion of the 20 natural amino acids in 310 -helices can be taken from their propensities in proteins, both at interior positions [42] and at N-caps [14]. These are similar, but not identical to α preferences. Side-chain interactions in the 310 -helix are spaced i, i + 3, with i, i + 4 on opposite sides of the helix. This offers scope to preferentially stabilize 310 over α, by including stabilizing i, i + 3 interactions and i, i + 3 repulsions. As the 310 - and α-helix structures are so similar, with only a small change in backbone dihedral angle, peptides designed to form 310 -helix are likely to form a mixture of 310 and α, with a central α segment and 310 at the helix termini common. This complex equilibrium, and the spectro-scopic similarity of α and 310 makes the analysis of these peptides difficult. 3.11 Design of p -helices
To our knowledge, no peptide has yet been made that forms π-helix. There are 4.4 side chains per turn of the π-helix, so i, i + 5 interactions may weakly stabilize πhelix over α. Given the other strongly destabilizing features of the π-helix, however, we doubt whether this effect will ever be strong enough. We expect that no peptide composed of the 20 natural amino acids will ever fold to a monomeric π-helix in aqueous solution. We are willing to back up this prediction, but in view of the out-come of the Paracelsus challenge [129, 130], we are taking the advice of George Rose [131] and offering only an “I Met the π-helix Challenge” T-shirt as a prize.
4 Helix Coil Theory
Peptides that form helices in solution do not show a simple two-state equilibrium between a fully folded and fully unfolded structure. Instead they form a complex mixture of all helix, all coil or, most frequently, central helices with frayed coil ends. In order to interpret experiments on helical peptides and make theoretical predictions on helices it is therefore essential to use a helix/coil theory that considers every possible location of a helix within a sequence. The first wave of work on helix/coil theory was in the late 1950s and early 1960s and this has been reviewed in detail by Poland and Scheraga [132]. In 1992, Qian and Schellman [133] reviewed current understanding of helix/coil theories. Our review covered the extensive development of helix/coil theory since this date [134]. The simplest way to analyze the helix/coil equilibrium, still occasionally seen, is the two-state model where the equilibrium is assumed to be between a 100% helix conformation and 100% coil. This is incorrect and its use gives serious errors. This is because helical peptides are generally most often found in partly helical conformations, often with a central helix and frayed, disordered ends, rather than in the fully folded or fully unfolded states.
15
16 Design and Stability of "-Helices
Fig. 2 Zimm-Bragg and Lifson-Roig codes and weights for the "-helix.
4.1 Zimm-Bragg Model
The two major types of helix/coil model are (i) those that count hydrogen bonds, principally Zimm-Bragg (ZB) [135] and (ii) those that consider residue conformations, principally Lifson-Roig [136]. In the ZB theory the units being considered are peptide groups and they are classified on the basis of whether their NH groups participate in hydrogen bonds within the helix. The ZB coding is shown in Figure 2. A unit is given a code of 1 (e.g., peptide unit 5 in Figure 2) if its NH group forms a hydrogen bond and 0 otherwise. The first hydrogen-bonded unit proceeding from the N-terminus has a statistical weight of σ s, successive hydrogen bonded units have weights of s and nonhydrogen-bonded units have weights of 1. The s-value is a propagation parameter and σ is an initiation parameter. The most fundamental feature of the thermodynamics of the helix/coil transition is that the initiation of a new helix is much more difficult than the propagation of an existing helix. This is because three residues need to be fixed in a helical geometry to form the first hydrogen bond while adding an additional hydrogen bond to an existing helix require that only one residue is fixed. These properties are thus captured in the ZB model by having σ smaller than s. The statistical weight of a homopolymeric helix of a N hydrogen bonds is σ sN−1 . The cost of initiation, σ , is thus paid only once for each helix while extending the helix simply multiplies its weight by one additional s-value for each extra hydrogen bond. 4.2 Lifson-Roig Model
In the LR model each residue is assigned a conformation of helix (h) or coil (c), depending on whether it has helical , ψ angles. Every conformation of a peptide of N residues can therefore be written as a string of N c’s or h’s, giving 2N conformations in total. Residues are assigned statistical weights depending on their conformations and the conformations of surrounding residues. A residue in an h conformation with an h on either side has a weight of w. This can be thought of as an equilibrium constant between the helix interior and the coil. Coil residues are used as a reference and have a weight of 1. In order to form an i, i + 4 hydrogen bond in a helix three successive residues need to be fixed in a helical
4 Helix Coil Theory
conformation. M consecutive helical residues will therefore have M–2 hydrogen bonds. The two residues at the helix termini (i.e., those in the center of chh or hhc conformations) are therefore assigned weights of v (Figure 2). The ratio of w to v gives the approximately the effect of hydrogen bonding (1.7:0.036 for Ala [30] or - RT ln(1.7/0.036) = −2.1 kcal mol−1 ). A helical homopolymer segment of M residues has a weight of v2 wM-2 and a population in the equilibrium of v2 wM-2 divided by the sum of the weights of every conformation (i.e., the partition function). In this way the population of every conformation is calculated and all properties of the helix/coil equilibrium evaluated. The LR model is easier to handle conceptually for heteropolymers since the w and v parameters are assigned to individual residues. The substitution of one amino acid at a certain position thus changes the w- and vvalues at that position. In the ZB model the initiation parameter σ is associated with several residues and s with a peptide group, rather than a residue. It is therefore easier to use the LR model when making substitutions. Indeed, most recent work has been based on this model. A further difference is that the ZB model assigns weights of zero to all conformations that contain a chc or chhc sequence. This excludes a very large number of conformations that contain a residue with helical φ,ψ angles but with no hydrogen bond. In LR theory, these are all considered. The ZB and LR weights are related by the following formulae [133]: s = w/(1 + v); σ = v2 /(1 + v)4 . The complete helix/coil equilibrium is handled by determining the statistical weight for every possible conformation that contains a helix plus a reference weight of 1 for the coil conformation. Each conformation considered in the helix/coil equilibrium is given a statistical weight. This indicates the stability of that conformation, with the higher the weight, the more probable the conformation. Weights are defined relative to the all coil conformation, which is given a weight of 1. The statistical weight of a conformation can thus be regarded as an equilibrium constant relative to the coil; a weight >1 indicates the conformation is more stable than coil, <1 means less stable and 1 means equally stable. The population of each conformation is given by the statistical weight of that conformation divided by the sum of the statistical weights for every conformation (i.e., the partition function). Thus the greater the statistical weight, the more stable the conformation. The key to using helix/coil theory is the partition function. All the properties of a system at equilibrium are contained within the partition function, which makes it very valuable. Partition functions are extremely powerful concepts in statistical thermodynamics since they include all properties of an equilibrium. Any property of the equilibrium can be extracted from the partition function by applying the appropriate mathematical function. This is analogous to quantum mechanics where any property can be determined from a system by applying an operator function to a wavefunction. In this case the properties could be the mean number of hydrogen bonds, the mean helix length, the probability that each residue is within a helix etc. In particular, the mean number of residues with a weight x is given by ∂ ln Z/∂ ln x. Circular dichroism is commonly used to give the mean helix content of a helical peptide, namely the fraction of residues that have a weight of w. LR-based models can thus be related to experimental data by equating the measured mean
17
18 Design and Stability of "-Helices
helix content to (∂ ln Z/∂ ln x)/N, where N is the number of residues in the peptide. Statistical weights can be regarded as equilibrium constants for the equilibrium between coil and the structure (as the reference coil weight is defined as 1). They can therefore be converted to free energies as - RT ln (weight). Equations for partition functions are determined as follows: Write a table of the conformations and assigned statistical weights for residues. For the Lifson-Roig model, the weights of a residue depend on its conformation and the weights of its two neighboring residues as follows: Conformation Weight of central residue in triplet ccc cch chc chh hcc hch hhc hhh
1 1 v v 1 1 v w
The same information is rewritten in the form of a matrix (M):
h h¯ M = h c¯ c h¯ c c¯
¯ hc ¯ c¯h c¯c hh wv00 0 0 1 1 v v 0 0 0 011
For each triplet, the state of the left-most residue is shown at the start of each row in the matrix. The state of the right-most residue in each triplet is shown at the end of the top of each column. The state of the center residue of each triplet is shown as the barred residue in both the rows and columns. When the states of the center residues differ (i.e., one is c while the other is h), the entry in the matrix is zero. Otherwise, the matrix gives the weight, taken from the table. This apparently simple change in presentation now (amazingly) allows the generation of the partition function (Z) for a polypeptide of N residues as: 0 1 Z = ( 0 0 1 1 )M N 1 1 The end vectors are present to ensure that the first and last residues in the peptide cannot have a w-weighting. A w-weighting indicates that the residue is between two residues that hydrogen bond to each other and this is impossible for terminal residues. Further extensions to helix/coil theory are dealt with in the same way, by defining residue weight assignments, rewriting as a matrix and determining end
4 Helix Coil Theory
vectors. The work may be made easier by combining any identical columns. For example, the Lifson-Roig matrix above has identical third and fourth columns so can be simplified to a 3 × 3 matrix:
M = h h¯
h c¯ c(h∪c)
¯ hc ¯ c¯(h ∪ c) hh wv0 0 0 1 v v0
Here, h ∪ c shows the residues being combined. The matrix entries for the combined positions are the entries for the noncombined entries, with zero being discounted if combined with a nonzero entry. New end vectors are now required, as their lengths must be the same as the order of the square matrix. 4.3 The Unfolded State and Polyproline II Helix
The treatment of peptide conformations is based on Flory’s isolated-pair hypothesis [137]. This states that while and ψ for a residue are strongly interdependent, giving preferred areas in a Ramachandran plot, each , ψ pair is independent of the , ψ angles of its neighbors. Pappu et al. examined the isolated-pair hypothesis in detail by exhaustively enumerating the conformations of poly(Ala) chains [138]. Each residue was considered to populate 14 mesostates, defined by ranges of , ψ values. By considering all 14 N mesostate strings, all conformations were considered for up to seven alanines. The number of allowed conformations was found to be considerably fewer than the maximum, thus showing that the isolatedpair hypothesis is invalid. The chains mostly populate extended or helical conformations as many partly helical conformations are sterically disallowed. Such effects are not included in helix/coil theories, thus presenting a considerable challenge for the future. Helix/coil theories assign the same weight (1) to every coil residue; steric exclusion means that these should vary and be lower than 1 in many cases. The polyproline II helix may well be an important conformation for unfolded proteins [139–148]. In particular, denatured alanine-rich peptides may form polyproline II helix [143, 148, 149]. It may therefore be valid to consider residues in helical peptides to be in three possible states (helix, coil, or polyproline II), rather than two (helix or coil). No current helix/coil model takes this into account. A scale of amino acid preferences for the polyproline II helix has recently been published [150]. 4.4 Single Sequence Approximation
Since helix nucleation is difficult, conformations with multiple helical segments are expected to be rare in short peptides. In the one-, or single-, helical sequence approximation, peptide conformations containing more than one helical segment are assumed not to be populated and are excluded from the partition function (i.e.,
19
20 Design and Stability of "-Helices
assigned statistical weights of zero). As peptide length increases, the approximation is no longer valid since multiple helical segments can be long enough to overcome the initiation penalty. The single sequence approximation will also break down when a sequence with a high preference for a helix terminus is within the middle of the chain. The error from using the single sequence approximation will therefore show a wide variation with sequence and could be potentially serious if a sequence has a high preference to populate more than one helix simultaneously. Conformations with two or more helices may also often include helix-helix tertiary interactions that are ignored in all helix/coil models. 4.5 N- and C-Caps
In the original LR model weights are assigned to residues in the center of hhh triplets (a weight of w for propagation) or in the center of chh or hhc triplets (a weight of v for initiation). Residues in all other triplets have weights of 1. LR-based models have been extended by assigning weights to additional conformations. Ncapping can therefore be included in LR theory be assigning a weight of n to the central residue in a cch triplet [99]. Similarly, the C-cap is the first residue in a nonhelical conformation (c) at the C-terminus of a helix. C-cap weights (cvalues) are assigned to central residues in hcc triplets. Application of the model to experimental data where the C-terminal amino acid of a helical peptide was varied, allowed the determination of the c-values and hence free energies of C-capping (as -RT ln c) of all the amino acids [29]. A problem with the original definitions of the capping weights above is that they apply to isolated h or hh conformations that are best regarded as part of the random coil. A helical hydrogen bond can only form when a minimum of three consecutive h residues are present. Andersen and Tong [151] and Rohl et al. [9] therefore changed the definition of the N-cap to apply only to the c residue in a chhh quartet. The N-cap residues in chc or chhc conformations have weights of zero in the Andersen-Tong model and 1 in the Rohl et al. model. 4.6 Capping Boxes
The N-terminal capping box [25] includes a side chain-backbone hydrogen bond from N3 to the N-cap (i, i−3). This is included in the LR model by assigning a weight of w*r to the chhh conformation, where r is the weight for the Ser backbone to Glu side chain bond [9]. 4.7 Side-chain Interactions
As helices have 3.6 residues per turn, side chains spaced i, i + 3 or i, i + 4 are close in space. Side-chain interactions are thus possible when four or five consecutive
4 Helix Coil Theory
residues are in a helix. They are included in the LR-based model by giving a weight of w*q to hhhh quartets and w*p to hhhhh quintets. The side-chain interaction is between the first and last side chains in these groups; the w weight is maintained to keep the equivalence between the number of residues with a w weighting and the number of backbone helix hydrogen bonds [100]. Scholtz et al. [152] used a model based on the one-helical-sequence approximation of the LR model to quantitatively analyze salt bridge interactions in alanine-based peptides. Only a single interaction between residues of any spacing was considered, though this was appropriate for the sequences they studied. Shalongo and Stellwagen [153] also proposed incorporating side-chain interaction energies into the LR model, using a clever recursive algorithm. 4.8 N1, N2, and N3 Preferences
The helix N-terminus shows significantly different residue frequencies for the N-cap, N1, N2, N3, and helix interior positions [12, 19, 154, 155]. A complete theory for the helix should therefore include distinct preferences for the N1, N2, and N3 positions. In the original LR model, the N1 and C1 residues are both assigned the same weight, v. Shalongo and Stellwagen [153] separated these as vN and vC . Andersen and Tong [151] did the same and derived complete scales for these parameters from fitting experimental data, though some values were tentative. The helix initiation penalty is vN *vC and so vN and vC values are all small (≈ 0.04). We added weights for the N1, N2, and N3 (n1, n2, and n3) positions as follows [156]: The n1 value is assigned to a helical residue immediately following a coil residue. The penalty for helix initiation is now n1.v, instead of v2 , as v remains the C1 weight. An N2 helical residue is assigned a weight of n2.w, instead of w. The weight w is maintained in order to keep the useful definition of the number of residues with a w weighting being equal to the number of residues with an i, i + 4 main chain-main chain hydrogen bond. The n2 value is an adjustment to the weight of an N2 residue that takes into account the structures that can be adopted by side chains uniquely at this position. Similarly, an N3 residue is now assigned the weight n3.w, instead of w. 4.9 Helix Dipole
Helix-dipole effects were added to the LR model by Scholtz et al. [152], though they used the one-sequence approximation so that only one or no dipoles in total are present. In LR models helix-dipole effects are subsumed within other energies. For example, N-cap, N1, N2, and N3 energies will include a contribution from the helixdipole interaction so the energy of interaction of charged groups at this position with the dipole should not be counted in addition.
21
22 Design and Stability of "-Helices
4.10 310 - and p -Helices
The Lifson-Roig formalism can easily be adapted to describe helices of other cooperative lengths [157]. The fundamental difference between a 310 -helix and an α-helix is that the 310 -helix has an i, i + 3 hydrogen-bonding pattern rather than the i, i + 4 pattern characteristic of the α-helix. For a given number of units in helical conformations, a 310 -helix will consequently have one more hydrogen bond than an α-helix. To include this difference in the 3 10 -helix theory, one of the αhelical-initiating residues (i.e., the central unit of either the hα hα c or the chα hα triplet) must become a 310 -helix-propagating residue. We arbitrarily chose to assign the propagating statistical weight, wτ , to the central unit of the hτ hτ c triplet such that helix propagating unit i is associated with the hydrogen bond formed between the CO of peptide i−2 and the NH of peptide i+1. The remainder of the statistical weights applicable to the α-helix/coil theory are maintained. The models described above for the α-helix/coil and 310 -helix/coil transitions can be combined to describe an equilibrium including pure α-helices, pure 310 -helices, and mixed α-/310 -helices [157]. In this model, three conformational states are possible, 310 -helical (hτ ), α-helical (hα ), and coil (c). The pure 310 -helix and the mixed α-/310 -helix models were subsequently extended to include side-chain interactions [158]. Sheinerman and Brooks [128] independently produced a model for the α/310 /coil equilibrium, based on the ZB formalism, rather than the LR model. They similarly extend the classification of conformations from α/coil to α/310 /coil. In a π-helix, formation of an i, i + 5 hydrogen bond requires that four units be constrained to the π-helical conformation, hπ . The π subscript designates the conformation and weights describing the π-helix, whose dihedral angles are distinct from α- and 310 -helices. Assigning statistical weights to individual units requires consideration of the conformations of the unit itself and its three nearest neighbors [157]. The initiating statistical weight, vπ , is assigned to a helical unit when one or more of its two N-terminal and nearest C-terminal neighbors are in the coil conformation. The definition of helix-initiating units as the two N-terminal and one C-terminal units of each helical stretch is again arbitrary. Units in a π-helical conformation with three helical neighbors are assigned the propagating statistical weight, wπ . A π-helix propagating residue, i, is thus associated with the hydrogen bond between the NH of residue i + 2 and the CO of residue i −3. 4.11 AGADIR
AGADIR is an LR-based helix/coil model developed by Serrano, Mu˜ noz and coworkers. The original model [159] included parameters for helix propensities excluding backbone hydrogen bonds (attributed to conformational entropy), backbone hydrogen bond enthalpy, side-chain interactions, and a term for coil weights at the end of helical sequences (i.e., caps). The single-sequence approximation was used. The original partition function assumed that many helical conformations did not exist, as all conformations in which the residue of interest is not part of a helix were
4 Helix Coil Theory
excluded [100, 159]. These were corrected in a later version, AGADIRms, which considers all possible conformations [160]. If AGADIR and LR models are both applied to the same data, to determine a side-chain interaction energy, for example, the results are similar, showing that the models are now not significantly different [160, 161]. The treatment of the helix/coil equilibrium differs in a number of respects from the ZB and LR models and these have been discussed in detail in by Mu˜ noz and Serrano [160]. The minimal helix length in AGADIR is four residues in an h conformation, rather than three. The effect of this assumption is to exclude all helices which contain a single hydrogen bond; only helices with two or more hydrogen bonds are allowed. In practice, this probably makes little difference as chhhc conformations are usually unfavorable and hence have low populations. Early versions of AGADIR considered that residues following an acetyl at the N-terminus or preceding an amide at the C-terminus were always in a c conformation; this was changed to allow these to be helical [162]. The latest version of AGADIR, AGADIR1s-2 [162], includes terms for electrostatics [162], the helix dipole [162, 163], pH dependence [163], temperature [163], ionic strength [162], N1, N2, and N3 preferences [164], and capping motifs such as the capping box, hydrophobic staple, Schellman motif, and Pro-capping motif [162]. The free energy of a helical segment, Ghelical-segment , is given by Ghelical-segment + GInt + GHbond + GSD + Gdipole +GnonH + Gelectrost , which are terms for the energy required to fix a residue in helical angles (with separate terms for N1, N2, N3, and N4), backbone hydrogen bonding, side-chain interactions excluding those between charged groups, capping, and helix-dipole interactions, respectively. Electrostatic interactions are calculated with Coulomb’s equation. Helix-dipole interactions were all electrostatic interactions between the helix dipole or free N- and Ctermini and groups in the helix. Interactions of the helix dipole with charged groups located outside the helical segment were also included. pH-dependence calculations considered a different parameter set for charged and uncharged side chains and their pK a values. The single sequence approximation (see above) is used again, unlike in AGADIRms. This means that it must not be used for full protein sequences, though this has been done, even if they do not have any tertiary interactions. AGADIR is at present the only model that can give a prediction of helix content for any peptide sequence, thus making it very useful. It can also predict NMR chemical shifts and coupling constants. In order to do this it must include estimates of all the terms that contribute to helix stability, notably the 400 possible i, i + 4 side-chain interactions. Since only a few of these interactions have been measured accurately, the terms used cannot be precise. Further determination of energetic contributions to helix stability is therefore still needed. 4.12 Lomize-Mosberg Model
Lomize and Mosberg also developed a thermodynamic model for calculating the stability of helices in solution [165]. Interestingly, they extended it to consider helices in micelles or a uniform nonpolar droplet to model a protein core environment. Helix stability in water is calculated as the sum of main chain interactions, which is the
23
24 Design and Stability of "-Helices
free energy change for transferring Ala from coil to helix, the difference in energy when replacing an Ala with another residue, hydrogen bonding and electrostatic interactions between polar side chains and hydrophobic side-chain interactions. An entropic nucleation penalty of two residues per helix is included. Different energies are included for N-cap, N1–N3, C1–C3, C-cap, hydrophobic staples, Schellman motifs, and polar side-chain interactions, based on known empirical data at the time (1996). Hydrophobic interactions were calculated from decreases in nonpolar surface area when they are brought in contact. Helix stability in micelles or nonpolar droplets are found by calculating the stability in water then adding a transfer energy to the nonpolar environment. 4.13 Extension of the Zimm-Bragg Model
Following the discovery of short peptides that form isolated helices in aqueous solution, V´asquez and Scheraga extended the ZB model to include helix-dipole and side-chain interactions [166]. The model is very general as it can include interactions of any spacing within a single helix. It was applied to determine i, i + 4 and i, i + 8 interactions. Long-range interactions, beyond the scope of LR models, can thus be included. Roberts [167] and Gans et al. [88] also refined the ZB model to include side-chain interactions. 4.14 Availability of Helix/Coil Programs
Some Lifson-Roig based helix/coil models are available at http://www.bi. umist.ac.uk/users/mjfajdg/HC.htm. We recommend SCINT2 for peptides with side-chain interactions (freely available from us). If no side-chain interactions are present, CAPHELIX is identical and simpler to run. If N1, N2, or N3 preferences are important in your sequence N1N2N3 can be used. If the peptide contains side-chain interactions that have not been well characterized (which is the case for the great majority of sequences) AGADIR can be run at http://www. embl-heidelberg.de/Services/serrano/agadir/agadir-start.html. In our experience, SCINT2 or CAPHE-LIX are more accurate at predicting helix contents of alaninebased control peptides than AGADIR (see Refs [101] and [106] for examples), though AGADIR can be applied to many more sequences. The Andersen version of an LRbased helix/coil is at http://faculty.washington.edu/nielshan/helix/. 5 Forces Affecting α-helix Stability 5.1 Helix Interior
Since the advent of crystal structures of proteins it had been noticed that some amino acids appeared frequently in α-helices and others less frequently [168, 169].
5 Forces Affecting "-helix Stability 25
For example alanine and leucine are abundant, whereas proline and glycine appear rarely. As more of this kind of information became available a helix propensity scale was derived [170] and eventually allowed prediction of the location of α-helices (and other structures) in folded proteins from their sequence [171]. Different approaches have been used in order to determine the helical propensity or preference of individual amino acids. Scheraga and coworkers used a host–guest strategy (see Section 3.1) to derive values for the helical preference of various amino acid residues [172, 173]. This has been carried out for all 20 naturally occurring amino acids [174]. This work has been criticized as the host side chains can interact with each other [175]. The introduction of a guest residue thus removes host–host interactions and replaces them with PHBG–guest or PHPG– guest side-chain interactions that may obscure the intrinsic helix propensities. We believe that helix propensities are best evaluated in an alanine background, where no side-chain interactions can affect the helix stability. Rohl et al. [30] used many alanine-based peptides with the general sequences Ac(AAKAA)m Y-NH2 (or with Q instead of K) to measure interior helix propensities. Substitutions in the helix interior and subsequent measures of helicity using CD spectroscopy in both water and 40% (v/v) trifluoroethanol (TFE) allowed both the calculation of the Lifson-Roig w parameter and stabilization energy for all 20 amino acids (see Table 3). Kallenbach and coworkers also used synthetic peptides of the form succinyl-YSEEEEKAKKAXAEEAEKKKK-NH2 where substitutions at X
Table 3 Helix propagation propensities and free energies of amino acids in water
Amino acid Ala Arg+ Leu Met Lys+ Gln Glu Ile Trp Ser Tyr Phe Val His Asn Thr Cys Asp Gly Pro From Ref. [30].
G◦ (helix) (kcal mol−1 )
w value
−0.27 −0.052 0.095 0.25 0.019 0.28 0.21 0.44 0.69 0.52 0.42 0.73 0.77 0.57 0.69 0.95 0.64 0.52 1.7 >3.8
1.70 1.14 0.87 0.65 1.00 0.62 0.70 0.46 0.29 0.40 0.48 0.27 0.25 0.36 0.29 0.18 0.32 0.40 0.048 <0.001
26 Design and Stability of "-Helices
allowed determination of helix stabilising energies for common amino acids [87]. Stellwagen and coworkers made substitutions in position 9 of Ac-Y(EAAAK)3 ANH2 [84]. These agree well with the alanine-based peptide work described previously [30, 176]. Other groups have investigated helical propensities and stabilization using whole protein methods. Blaber et al. [177–179] used mutagenesis in the helices of phage T4 lysozyme to study the structural effects of substitutions of amino acids. With the exception of substituting proline, they found that no substitutions significantly distorted the helix backbone. G values correlated well (71–93%) with model peptide studies and with studies on the frequency of amino acid occurrence in protein structures [171]. Fersht and coworkers used a similar method with barnase to study the effect of replacing Ala32 of the second helix in this protein with the other 19 naturally common amino acids [180]. They used reversible urea denaturation to measure free energies of unfolding. O’Neil and DeGrado [181] used substitutions into an α-helical two-stranded coiled-coil system to deduce helix-forming tendencies of common amino acids, through the design of a peptide that forms a noncovalent α-helical dimer, which is in equilibrium with a randomly coiled monomeric state. The α-helices in the dimer contain a single solvent-exposed site that is surrounded by small, neutral amino acid side chains. Each of the commonly occurring amino acids was substituted into this guest site, and the resulting equilibrium constants for the monomer– dimer equilibrium were determined to provide a list of free energy difference (G◦ ) values. Again these values show good agreement with those of other groups working in different model systems. In 1998 Pace and Scholtz [182] gathered information from many different sources and derived a scale for the propensity of each amino acid in the helix interior. This is summarized in Table 4. The values are in (G) relative to alanine as zero. Alanine was taken as zero as it is generally (though not universally) agreed that this amino acid has the highest helical propensity. Shown for comparison in Table 5 are the s values from the Zimm-Bragg model as derived by the Scheraga group [174]. Both tables are ranked in terms of descending helical propensity. As can be seen from Tables 4 and 5, the two series do not agree on even the relative helical propensities of the amino acid residues. The data summarized by Pace and Scholtz shows alanine having the highest helix propensity and all other residues having lower values (a positive (G) value relative to Ala). Both series show proline and glycine to have the lowest helical propensity (highest (G) value or lowest s value. The most controversial of these differences over the years has been that of alanine. Host–guest analysis showing alanine to be effectively helix-neutral has been supported by data from some other groups, notably the templated helices of Kemp and coworkers [107]. Several efforts have been made to try to explain this discrepancy, including implication of the charged groups used to solubilize the alanine-based peptides [183, 184], but little reconciliation has been achieved. The use of template-nucleated helices has been criticized by Rohl et al. [185] who argued that the low apparent helix propensity of alanine is a consequence of properties of the template–helix junction. However, recent work by Kemp and
5 Forces Affecting "-helix Stability 27 Table 4 Summary of other experimental helix propensities (relative to alanine)
Amino acid Ala Arg+ Leu Met Lys+ Gln Glu Ile Trp Ser Tyr Phe Val His Asn Thr Cys Asp Gly Pro
Helix propensity (G)(kcal mol−1 ) 0.00 0.21 0.21 0.24 0.26 0.39 0.40 0.41 0.49 0.50 0.53 0.54 0.61 0.61 0.65 0.66 0.68 0.69 1.00 3.16
From Ref. [182]. Table 5 Helical propensity values from host–guest studies
Amino acid Glu Met Leu Ile Trp Phe Ala Arg+ Tyr Cys Gln Val Lys+ His Thr Asn Asp Ser Gly Pro From Ref. [174].
s (20◦ C) 1.35 1.2 1.14 1.14 1.11 1.09 1.07 1.03 1.02 0.99 0.98 0.95 0.94 0.85 0.82 0.78 0.78 0.76 0.59 0.19
28 Design and Stability of "-Helices
coworkers [186] may at last settle the controversy. Using templates to investigate the helix-forming tendency of polyalanine, these workers extended the length of the polyalanine beyond the previous limit of six residues. Below six residues, both this group and Scheraga’s had low helix propensities for alanine (see above) but when the limit of six was exceeded, a dramatic increase in helix propensity was observed. For chains with less than six alanines, w = 1:03, in agreement with both Kemp and Scheraga’s earlier experimental results. For chains with 6–9 alanines, w = 1:15 and for more than 10 Alanines, w is 1.26. This indicates that there is a length-dependent term in the helicity of polyalanine and that the charged groups are not having the effect previously ascribed to them. These values for longer polyalanine sequences are also much more consistent with values published by other groups. 5.2 Caps
Serrano and Fersht [20] explored the capping preferences at the N-cap by mutating Thr residues at the N-cap of two helices in barnase. They found that negatively charged residues were favored at the N-cap with a rank order of Asp > Thr > Glu > Ser > Asn > Gly > Gln > Ala > Val. Interestingly, their result conflicted with the statistical survey result that Asn is one of the most frequently found N-caps in proteins [12]. Experimentally Asn destabilized the helices by 1.3 kcal mol−1 relative to Thr. The rank order of amino acids N-cap preferences in T4 lysozyme was found to be Thr > Ser > Asn > Asp > Val = Ala > Gly [21]. They suggested that Asn can be inherently as good an N-cap as Ser or Thr, but it requires a change in backbone dihedral angles of N-cap residues, which might be altered in native proteins as the results of tertiary contacts. Indeed Asn is the most stabilizing residue at N-cap in a peptide model in the absence of tertiary contacts and other side-chain interactions (see below). The Kallenbach group [187] substituted several amino acids at the N-cap position in peptide models in the presence of a capping box. They found that Ser and Arg are the most stabilizing residues with G relative to Ala of −0.74 and −0.58 kcal mol−1 , respectively, whilst Gly and Ala are less stabilizing. The results are in agreement with the results of Forood et al. [23], who found that the trend in α-helix inducing ability at the N-cap is Asp > Asn > Ser > Glu > Gln > Ala. A more comprehensive work to determine the preferences for all 20 amino acids at the N-cap position used peptides with a sequence of NH2 -XAKAAAAKAAAAKAAGYCONH2 [22, 29, 99]. N-Capping free energies ranged from Asn (best) to Gln (worst) (Table 6). We have used a similar approach using peptide models to probe the preferences at N1 [95], N2 [96], and N3 [97] using peptides with sequences of CH3 COXAAAA-QAAAAQAAGY-CONH2 , CH3 CO-AXAAAAKAAAAKAAGY-CONH2 and CH3 CO-AAXAAAAKAAAAKAGY-CONH2 , respectively. The results have given N1, N2 and N3 preferences for most amino acids for these positions (Table 6) and these agree well with preferences seen in protein structures, with the interesting exception of Pro at N1. Petukhov et al. similarly obtained N1, N2, and N3
5 Forces Affecting "-helix Stability 29 Table 6 Amino acid propensities at N- and C-terminal positions of the helix
G relative to Ala for transition from coil to the position (kcal mol−1 )
Residue
A C◦ C− D◦ D− E◦ E− F G H◦ H+ I K+ L M N P Q R+ S T V W Y
N-cap
N1
29
95 164;188
0
0
−1.4 1.0 0.5 −1.6 0 1.0 −0.7 0.1 −0.7 1.4 −1.2 1.0 −0.7 0.7 −0.5 0.1 −0.7 −0.3 −1.7 −0.4 2.5 −0.1 −1.2 −0.7 −0.1 −1.3 −0.9
0.5 0.7 0.4 0.5 0.6 0.5 0.7 0.4 0.5 0.6 0.4
N2
0
96
N3
C3
164;188 97 164;188 189 0
0
0.9 0.7 −0.2 −0.2 −0.4 0.9 0.7
0
0
0
C2 190 0
C1 C-cap 189 189 0
0
29
0.6 1.3 0.4
0 0.2
0
0.2
0.3
−0.4 −0.5
0.3
0.8
2.1
0.6 1.0 0.6
0.4
0.5
0.2
0.2 0.4
0.5
0.5
0.1 0.1 0.7 0.4
−0.1 −0.1 −0.3 0.3 0.1
0.1
0.2 0.1 0.6 0.5 0.3 0.4 0.5
0.5 0.8 0.7 0.5 0.5 0.8
0.1 −1.1
2.6 −0.2
0.6 0.9 0.5 0.7 1.7
251
1.1
0.8 0.4
C
0.5 0.5 0.3 0.7
0.7 0.9 0.8 0.7
0.4 0.4 0.7
0.3
1.2
0.2
0.5 0.5 0.4
1.1 1.2
0.6 0.6 0.4
4.0 1.2
0.2 −0.02 0.2 0.05 −0.5 −0.4 0.6 0.5 0.7 0.5 0.8 0.8 0.6 0.5 0.8 0.3 0.4 0.7 0.6 0.9
−0.9 1.5 −0.1 0.1 −0.4 1.2 −0.1 −0.2 0.3 1.1 1.6 0.7
−2.2
NH2 −XAKAAAAKAAAAKAAGY-CONH2 CH3 CO-XAAAAQAAAAQAAGY-CONH2 c CH3 CO-XAAAAAAARAAARGGY-NH2 d CH3 CO-AXAAAAKAAAAKAAGY-CONH2 e CH3 CO-AXAAAAAARAAARGGY-NH2 f CH3 CO-AAXAAAAKAAAAKAGY-CONH2 g CH3 CO-AAXAAAAARAAARGGY-NH2 h NH2 −YGGSAKEAAARAAAAXAA-CONH2 i Substitution of residue 32 (C2 position) of α-helix of ubiquitin. j NH2 −YGGSAKEAAARAAAAAXA-CONH2 k NH2 −YGGSAKEAAARAAAAAAX-CONH2 l CH3 CO-YGAAKAAAAKAAAAKAX-COOH m Substitution of residue 35 (C position) of α-helix of ubiquitin. a b
preferences for nonpolar and uncharged polar residues by applying AGADIR to experimental helical peptide data, and found almost identical results [164, 188]. The complete sequences of peptides used can be seen in the table footnote. In general, at N1, N2, and N3, Asp and Glu as well as Ala are preferred, presumably because negative side chains interact favorably with the helix dipole or NH groups while Ala has the strongest interior helix preference.
30 Design and Stability of "-Helices
Although it is also unique in terms of the presence of unsatisfied backbone hydrogen bonds, the C-terminal region is less explored experimentally. The Cterminus of the α-helix tends to fray more than the N-terminus, making C-terminal measurements less accurate. Preferences at the C-cap position differ from those at the N-cap. At the N-terminus, the helix geometry favors side chain-to-backbone hydrogen bonding, so polar residues are preferred [14, 19]. At the C-terminus unsatisfied backbone hydrogen bonds are fulfilled by interactions with backbone groups upstream of the helix. Zhou et al. [48] found that Asn is the most favored residue at the C-cap followed by Gln > Ser∼Ala > Gly∼Thr. Forood et al. [23] tested a limited number of amino acids at the C-terminus (C1) finding a rank order of Arg > Lys > Ala. Doig and Baldwin [29] determined the C-capping preferences for all 20 amino acids in α-helical peptides. The thermodynamic propensities of some amino acids at C, C-cap, C1, C2, and C3 are also included in Table 6 [189, 190]. 5.3 Phosphorylation
Phosphoserine is destabilizing compared with serine at interior helix positions [191, 192]. We investigated the effect of placing phosphoserine at the N-cap, N1, N2, N3, and interior position in alanine-based α-helical peptides, studying both the −1 and −2 phosphoserine charge states [193]. Phosphoserine stabilizes at the N-terminal positions by as much as 2.3 kcal mol−1 , while it destabilizes in the helix interior by 1.2 kcal mol−1 , relative to serine. The rank order of free energies relative to serine at each position is N2 > N3 > N1 > N-cap > interior. Moreover, −2 phosphoserine is the most preferred residue known at each of these N-terminal positions. Experimental pK a values for the −1 to −2 phosphoserine transition are in the order N2 < N-Cap < N1 < N3 < interior. Phosphoserine can form stabilizing salt bridges to arginine [192]. 5.4 Noncovalent Side-chain Interactions
Many studies have been performed on the stabilizing effects of interactions between amino acid side chains in α-helices. These studies have identified a number of types of interaction that stabilize the helix including salt bridges [83, 86, 88, 152, 194–198], hydrogen bonds [152, 198–200], hydrophobic interactions [100, 201–203], basic/aromatic interactions [106, 204], and polar/nonpolar interactions [101]. The stabilizing energies of many pairs in these categories have been measured, though some have only been analyzed qualitatively. As described earlier, residue side chains spaced i, i + 3 and i, i + 4 are on the same face of the α-helix, though it is the i, i + 4 spacing that receives most attention in the literature, as these are stronger. A summary of stabilizing energies for side-chain interactions is given in Table 7. We give only those that have been measured in helical peptides with the side-chain interaction energies determined by applying helix/coil theory. Almost all are attractive, with the sole exception of the Lys-Lys repulsion.
5 Forces Affecting "-helix Stability 31 Table 7 Summary of side-chain interaction energies from literature
Interaction Ile-Lys (i, i + 4) Val-Lys (i, i + 4) Ile-Arg (i, i + 4) Phe-Met (i, i + 4) Met-Phe (i, i + 4) Gln-Asn (i, i + 4) Asn-Gln (i, i + 4) Phe-Lys (i, i + 4) Lys-Phe (i, i + 4) Phe-Arg (i, i + 4) Phe-Orn (i, i + 4) Arg-Phe (i, i + 4) Tyr-Lys (i, i + 4) Glu-Phe (i, i + 4) Asp-Lys (i, i + 3) Asp-Lys (i, i + 4) Asp-His (i, i + 3) Asp-His (i, i + 4) Asp-Arg (i, i + 3) Glu-His (i, i + 3) Glu-His (i, i + 4) Glu-Lys (i, i + 3) Glu-Lys (i, i + 4) Phe-His (i, i + 4) Phe-Met (i, i + 4) His-Asp (i, i + 3) His-Asp (i, i + 4) His-Glu (i, i + 3) His-Glu (i, i + 4) Lys-Asp (i, i + 3) Lys-Asp (i, i + 4) Lys-Glu (i, i + 3) Lys-Glu (i, i + 4) Lys-Lys (i, i + 4) Leu-Tyr (i, i + 3) Leu-Tyr (i, i + 4) Met-Phe (i, i + 4) Gln-Asp (i, i + 4) Gln-Glu (i, i + 4) Trp-Arg (i, i + 4) Trp-His (i, i + 4) Tyr-Leu (i, i + 3) Tyr-Leu (i, i + 4) Tyr- Val (i, i + 3) Tyr- Val (i, i + 4) Arg (i, i + 4) Glu (i, i + 4) Arg Arg (i, i + 3) Glu (i, i + 3) Arg Arg (i, i + 3) Glu (i, i + 4) Arg Arg (i, i + 4) Glu (i, i + 3) Arg Phosphoserine-Arg (i, i + 4)
G (kcal mol−1 )
Reference
−0.22 −0.25 −0.22 −0.8 −0.5 −0.5 −0.1 −0.14 −0.10 −0.18 −0.4 −0.1 −0.22 −0.5 −0.12 −0.24 >−0.63 >−0.63 −0.8 −0.23 −0.10 −0.38 −0.44 −1.27 −0.7 −0.53 −2.38 −0.45 −0.54 −0.4 −0.58 −0.38 −0.46 +0.17 −0.44 −0.65 −0.37 −0.97 −0.31 −0.4 −0.8 −0.02 −0.44 −0.13 −0.31 −1.5 −1.0 −0.3 −0.1 −0.45
[101] [101] [101] [100] [100] [200] [200] [106] [106] [106] [204] [106] [106] [252] [227] [227] [253] [253] [254] [227] [227] [152] [152] [198] [203] [198] [241] [227] [227] [227] [227] [227] [227] [196] [153] [153] [203] [199] [152] [252] [161] [153] [153] [153] [153] [255] [255] [255] [255] [192]
32 Design and Stability of "-Helices
5.5 Covalent Side-chain interactions
Lactam (amide) bonds formed between NH3 + and CO2 − side chains can stabilize a helix, acting in a similar way to disulfide bridges in a protein by constraining the side chains to be close, reducing the entropy of nonhelical states [205]. Lactam bridges between Lys-Asp, Lys-Glu, and Glu-Orn spaced i, i + 4 have been introduced into analogs of human growth hormone releasing factor [206], and proved to be stabilizing with Lys-Asp most effective. The same Lys-Asp i, i + 4 lactam was stabilizing in other helical peptide systems [207–210], while Lys-Glu i, i + 4 lactam bridges were less effective [211]. Two overlapping Lys-Asp lactams were even more stabilizing [212]. The effect of the ring size formed by the lactam was investigated by replacing Lys with ornithine or (S)-diaminopropionic acid. A ring size of 21 or 22 atoms was most stabilizing (a Lys-Asp i, i + 4 lactam is 20 atoms) [206]. Lactams between side chains spaced i, i + 7 [213, 214] or i, i + 3 [214]; [215], spanning two or one turns of the helix have also been reported. i, i + 7 disulfide bonds have been introduced into alanine-based peptides, using (D)- and (L)-2-amino-6-mercaptohexanoic acid derivatives [216]. Strongly stabilizing effects were observed. Some interesting recent work has shown that helix formation can be reversibly photoregulated. Two cysteine residues are cross-linked by an azobenzene derivative which can be photoisomerized from trans to cis, causing a large increase or decrease in the helix content of the peptide, depending on its spacing [217–219].
5.6 Capping Motifs
Although the N-terminal capping box sequence stabilizes helices by inhibiting N-terminal fraying, it does not necessarily promote elongation unless accompanied by favorable hydrophobic interactions as in a “hydrophobic staple” motif [220, 221]. The nature of the capping box stabilizing effect thus not only arises from reciprocal hydrogen bonds between compatible residues, but also from local interactions between side chains, helix macrodipole-charged residue interactions and solvation [222]. Despite statistical analyses revealing that Schellman motifs are observed more frequently that expected at the helix C-terminus, this motif populates only transiently in aqueous solution but it is formed in 30% TFE [223]. This might be due to the C-terminus being very frayed and the increase of helical content contributed from this motif is small. Energetically this motif is not very favorable due to the entropic cost of fixing a Gly residue at the position C . The Schellman motif is believed to be a consequence of helix formation and does not involve α-helix nucleation [224]. The α L motif seems to be more stable than the alternative Schellman motif [221].
5 Forces Affecting "-helix Stability 33
5.7 Ionic Strength
Electrostatic interactions between charged side chains and the helix macrodipole can stabilize the helix [92, 102, 225]. The interactions are potentially quite strong, but are alleviated by the screening effects of water, ions, and nearby protein atoms. In theory, increasing ionic strength of the solvent (up to 1.0 M) should stabilize the helix through interactions with α-helix dipole moments by shifting the equilibrium between α-helix and random coil, which has a random orientation of the peptide dipoles [226]. The energetics of the interaction between fully charged ion pairs can be diminished by added salt and completely screened at 2.5 M NaCl [197, 227]. In peptides containing side chain-to-side-chain interactions, the effect of ion pairs and charge/helix-dipole interactions cannot be clearly separated. There are, however, indications that the interactions of charged residues with the helix macrodipole are less affected than those between charged side chains [227, 228]. In coiled-coil peptides, salt also affects hydrophobic residues by strengthening their interactions at the coiled-coil interface. This can be explained through alterations of the peptide-water interactions at high salt concentration. However, this requires a strong kosmotropic anion to accompany the screening cation [229]. 5.8 Temperature
Thermal unfolding experiments show that the helix unfolds with increasing temperature [230–232]. There is no sign of cold denaturation, as seen with proteins. Enthalpy and entropy changes for the helix/coil transition are difficult to determine as the helix/coil transition is very broad, precluding accurate determination of highand low-temperature baselines by calorimetry [230]. Nevertheless, isothermal titration calorimetric studies of a series of peptides that form helix when binding a nucleating La3+ , find H for helix formation to be −1.0 kcal mol−1 [135, 233], in good agreement with the earlier work. 5.9 Trifluoroethanol
Peptides with sequences of helices in proteins usually show low helix contents in water. An answer to this problem is to add TFE (2,2,2-trifluoroethanol) to induce helix formation [234–237]. For many peptides, the concentration of TFE used to increase the helix content is only up to 40% [234–236, 238, 239]. TFE may act by shielding CO and NH groups from the water solvent while leading to hydrogen bond formation between them. The conformational equilibrium thus shifts toward more compact structures, such as the α-helical conformation [184]. The mechanism involves interaction between TFE and water with several interpretations. One view suggests that TFE indirectly disrupts the solvent shell on α-helices [240, 241]. Another view proposes that TFE destabilizes the unfolded species and thereby
34 Design and Stability of "-Helices
indirectly enhances the kinetics and thermodynamics of folding of the coiled coil [242]. A more compromising view suggests that TFE forms clusters in water solution, which at lower concentration pulls the water molecules from the surface of proteins. At higher concentration, TFE clusters associate with appropriate hydrophobic side chains reducing their conformational entropy and switch the conformation at TFE concentration > 40% [243]. The propagation propensities of all amino acids increase variably in 40% TFE relative to water. The propagation propensities of the nonpolar amino acids increase greatly in 40% TFE whilst other amino acids propensity increase uniformly. However, glycine and proline are strong helix breakers in both in water and 40% TFE solvents [30]. In addition, 40% TFE dramatically alters electrostatic (and polar) interactions and increases the dependence of helix propensities on the sequence [244]. 5.10 pK a Values
Evaluation of pK a values of titrable amino acids in a peptide sequence can be used to analyze the strength of the possible interaction they form in water. pK a shifts of charged residues at the helix termini are significant because they can potentially interact with unsatisfied hydrogen bonds of the NH groups and CO group at the N-terminus and C-terminus, respectively, or the helix dipole. The pK a values can be measured accurately from the change in ellipticity across a broad range of pH. The asymptotic values of the ellipticities for the different protonation states are fitted to a Henderson-Hasselbach equation to calculate the pK a . In general, the pK a values of Glu and His at N1–N3 are normal compared with those in model compounds. In contrast, Asp and Cys have shifted pK a to lower values [95, 96, 102, 152, 225, 245–248]. An exception for negatively charged residues at the N-cap is that they have a lower pK a [29]. This may be because side chains at the N-cap can form strong hydrogen bonds to NH groups of N2 and N3, while the bonds formed by side chains at N1, N2, and N3 are much weaker [14, 19]. The negatively charged residues at higher pH destabilize helices when at the Ccap [29]. The increased pK a may result from an unfavorable electrostatic interaction with the C-terminal dipole or partial negative charges on the terminal CO groups. 5.11 Relevance to Proteins
Many of the features studied in peptide helices are also applicable to proteins and can be used to rationally modify protein stability or to design new helical proteins. Helices in proteins are often found on the surface with one face exposed to solvent and the other buried in the protein core. Helix propensities and sidechain interactions measured in peptides are thus directly applicable to the solventexposed face. Substitutions at buried positions are much more complex and tertiary interactions also make major contributions to stability. Tertiary interactions at helix
References
termini are rare; nearly all side-chain interactions are local [14]. Preferences for capping sites and the first and last turn of the helix are therefore applicable to most protein helices. The feature of protein helices of amphiphilicity, reflected in possession of a hydrophobic moment [249], is irrelevant to monomeric isolated helices. Acknowledgments
We thank all our coworkers in this field, namely Avi Chakrabartty, Carol Rohl, Buzz Baldwin, Tod Klingler, Ben Stapley, Jim Andrew, Eleri Hughes, Simon Penel, Duncan Cochran, Nicoleta Kokkoni, Jia Ke Sun, Jim Warwicker, Gareth Jones, and Jonathan Hirst. Current work in our lab on helices is supported by the Wellcome Trust (grant references 057318 and 065106). TMI thanks the Indonesian government for a scholarship.
References 1 PAULING, L., COREY, R. B., and BRANSON, H. R. (1951). The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl Acad. Sci. USA 37, 205–211. 2 PERUTZ, M. F. (1951). New X-ray evidence on the configuration of polypeptide chains. Nature 167, 1053–1054. 3 KENDREW, J. C., DICKERSON, R. E., STRANDBERG, B. E. et al. (1960). Structure of myoglobin. Nature 185, 422–427. 4 BARLOW, D. J. and THORNTON, J. M. (1988). Helix geometry in proteins. J. Mol. Biol. 201, 601–619. 5 SCHOLTZ, J. M. and BALDWIN, R. L. (1992). The mechanism of α-helix formation by peptides. Annu. Rev. Biophys. Biomol. Struct. 21, 95–118. 6 BALDWIN, R. L. (1995). α-helix formation by peptides of defined sequence. Biophys. Chem. 55, 127–135. 7 CHAKRABARTTY, A. and BALDWIN, R. L. (1995). Stability of α-Helices. Adv. Protein Chem. 46, 141–176. 8 KALLENBACH, N. R., LYU, P., and ZHOU, H. (1996). CD spectroscopy and the helix-coil transition in peptides and polypeptides. In Circular Dichroism and the Conformational Analysis of Biomolecules G. D. FASMAN, editor, Plenum Press, New York, 201–259.
9 ROHL, C. A. and BALDWIN, R. L. (1998). Deciphering rules of helix stability in peptides. Meth. Enzymol. 295, 1–26. 10 ANDREWS, M. J. I. and TABOR, A. B. (1999). Forming stable helical peptides using natural and artificial amino acids. Tetrahedron 55, 11711–11743. 11 SERRANO, L. (2000). The relationship between sequence and structure in elementary folding units. Adv. Protein Chem. 53, 49–85. 12 RICHARDSON, J. S. and RICHARDSON, D. C. (1988). Amino acid preferences for specific locations at the ends of α helices. Science 240, 1648–1652. 13 PRESTA, L. G. and ROSE, G. D. (1988). Helix signals in proteins. Science 240, 1632–1641. 14 DOIG, A. J., MACARTHUR, M. W., STAPLEY, B. J., and THORNTON, J. M. (1997). Structures of N-termini of helices in proteins. Protein Sci. 6, 147–155. 15 MACGREGOR, M. J., ISLAM, S. A., and STERNBERG, M. J. E. (1987). Analysis of the relationship between side-chain conformation and secondary structure in globular proteins. J. Mol. Biol. 198, 295–310. 16 SWINDELLS, M. B., MACARTHUR, M. W., and THORNTON, J. M. (1995). Intrinsic ψ propensities of amino acids derived from the coil regions of known
35
36 Design and Stability of "-Helices
structures. Nat. Struct. Biol. 2, 596–603. 17 DUNBRACK, R. and KARPLUS, M. (1994). Conformational analysis of the backbone-dependent rotamer preferences of protein side-chains. Nat. Struct. Biol. 1, 334–339. 18 DUNBRACK, R. L. (2002). Rotamer libraries in the 21(st) century. Curr. Opin. Struct. Biol. 12, 431–440. 19 PENEL, S., HUGHES, E., and DOIG, A. J. (1999). Side-chain structures in the first turn of the α-helix. J. Mol. Biol. 287, 127–143. 20 SERRANO, L. and FERSHT, A. R. (1989). Capping and α-helix stability. Nature 342, 296–299. 21 BELL, J. A., BECKTEL, W. J., SAUER, U., BAASE, W. A., and MATTHEWS, B. W. (1992). Dissection of helix capping in T4 lysozyme by structural and thermodynamic analysis of six amino acid substitutions at Thr 59. Biochemistry 31, 3590–3596. 22 CHAKRABARTTY, A., DOIG, A. J., and BALDWIN, R. L. (1993). Helix capping propensities in peptides parallel those in proteins. Proc. Natl Acad. Sci. USA 90, 11332–11336. 23 FOROOD, B., FELICIANO, E. J., and NAMBIAR, K. P. (1993). Stabilization of α-helical structures in short peptides via end capping. Proc. Natl Acad. Sci. USA 90, 838–842. 24 DASGUPTA, S. and BELL, J. A. (1993). Design of helix ends. Amino acid preferences, hydrogen bonding and electrostatic interactions. Int. J. Pept. Protein Res. 41, 499–511. 25 HARPER, E. T. and ROSE, G. D. (1993). Helix stop signals in proteins and peptides: the capping box. Biochemistry 32, 7605–7609. 26 SEALE, J. W., SRINIVASAN, R., and ROSE, G. D. (1994). Sequence determinants of the capping box, a stabilizing motif at the N-termini of α-Helices. Protein Sci. 3, 1741–1745. ˜ OZ, V. and SERRANO, L. (1995). The 27 MUN hydrophobic-staple motif and a role for loop residues in α-helix stability and protein folding. Nat. Struct. Biol. 2, 380–385.
28 VIGUERA, A. R. and SERRANO, L. (1999). Stable proline box motif at the N-terminal end of α-Helices. Protein Sci. 8, 1733–1742. 29 DOIG, A. J. and BALDWIN, R. L. (1995). Nand C-capping preferences for all 20 amino acids in α-helical peptides. Protein Sci. 4, 1325–1336. 30 ROHL, C. A., CHAKRABARTTY, A., and BALDWIN, R. L. (1996). Helix propagation and N-cap propensities of the amino acids measured in alanine-based peptides in 40 volume percent trifluoroethanol. Protein Sci. 5, 2623–2637. 31 SCHELLMAN, C. (1980). The αL conformation at the ends of helices. Protein Folding 53–61. 32 AURORA, R., SRINIVASAN, R., and ROSE, G. D. (1994). Rules for α-helix termination by glycine. Science 264, 1126–1130. 33 AURORA, R. and ROSE, G. D. (1998). Helix capping. Protein Sci. 7, 21–38. 34 PRIETO, J. and SERRANO, L. (1997). C-capping and helix stability: the Pro C-capping motif. J. Mol. Biol. 274, 276–288. 35 SIEDLECKA, M., GOCH, G., EJCHART, A., STICHT, H., and BIERZYNSKI, A. (1999). α-Helix nucleation by calcium-binding peptide loop. Proc. Natl Acad. Sci. USA 96, 903–908. 36 JERNIGAN, R., RAGHUNATHAN, G., and BAHAR, I. (1994). Characterisation of interactions and metal-ion binding-sites in proteins. Curr. Opin. Struct. Biol. 4, 256–263. 37 ALBERTS, I. L., NADASSY, K., and WODAK, S. J. (1998). Analyisis of zinc binding sites in protein crystal structures. Protein Sci. 7, 1700–1716. 38 GHADIRI, M. R. and CHOI, C. (1990). Secondary structure nucleation in peptides -transition-metal ion stabilized α-Helices. J. Am. Chem. Soc. 112, 1630–1632. 39 KISE, K. J. and BOWLER, B. E. (2002). Induction of helical structure in a heptapeptide with a metal cross-link: Modification of the Lifson-Roig Helix-Coil theory to account for covalent cross-links. Biochemistry 41, 15826–15837.
References 40 RUAN, F., CHEN, Y., and HOPKINS, P. B. (1990). Metal ion enhanced helicity in synthetic peptides containing unnatural, metal-ligating residues. J. Am. Chem. Soc. 112, 9403–9404. 41 CLINE, D. J., THORPE, C., and SCHNEIDER, J. P. (2003). Effects of As(III) binding on α-helical structure. J. Am. Chem. Soc. 125, 2923–2929. 42 KARPEN, M. E., De HASET, P. L., and NEET, K. E. (1992). Differences in the amino acid distributions of 310 -helices and α-Helices. Protein Sci. 1, 1333–1342. 43 BAKER, E. N. and HUBBARD, R. E. (1984). Hydrogen bonding in globular proteins. Prog. Biophys. Mol. Biol. 44, 97–179. 44 N´EMETHY, G., PHILLIPS, D. C., LEACH, S. J., and SCHERAGA, H. A. (1967). A second right-handed helical structure with the parameters of the Pauling-Corey α-helix. Nature 214, 363–365. 45 MILLHAUSER, G. L. (1995). Views of helical peptides-a proposal for the position of 310 -helix along the thermodynamic folding pathway. Biochemistry 34, 3872–3877. 46 BOLIN, K. A. and MILLHAUSER, G. L. (1999). α and 310 : The split personality of polypeptide helices. Acc. Chem. Res. 32, 1027–1033. 47 MIICK, S. M., MARTINEZ, G. V., FIORI, W. R., TODD, A. P., and MILLHAUSER, G. L. (1992). Short alanine-based peptides may form 310 -helices and not α-Helices in aqueous solution. Nature 359, 653–655. 48 ZHOU, H. X. X., LYU, P. C. C., WEMMER, D. E., and KALLENBACH, N. R. (1994). Structure of C-terminal α-helix cap in a synthetic peptide. J. Am. Chem. Soc. 116, 1139–1140. 49 FIORI, W. R. and MILLHAUSER, G. L. (1995). Exploring the peptide 310 -helix/α-helix equilibrium with double label electron spin resonance. Biopolymers 37, 243–250. 50 TONIOLO, C. and BENEDETTI, E. (1991). The polypeptide 310 -helix. Trends Biochem. Sci. 16, 350–353. 51 RAMACHANDRAN, G. N. and SASISEKHARAN, V. (1968). Conformation of polypeptides and proteins. Adv. Protein Chem. 23, 283–437.
52 LOW, B. W. and GRENVILLE-WELLS, H. J. (1953). Generalized mathematical relations for polypeptide chain helixes. The coordinates for the π helix. Proc. Natl Acad. Sci. USA 39, 785–801. 53 SHIRLEY, W. A. and BROOKS, C. L. (1997). Curious structure in “canonical” alanine based peptides. Proteins: Struct. Funct. Genet. 28, 59–71. 54 LEE, K. H., BENSON, D. R., and KUCZERA, K. (2000). Transitions from α to π helix observed in molecular dynamics simulations of synthetic peptides. Biochemistry 39, 13737–13747. 55 WEAVER, T. M. (2000). The π -helix translates structure into function. Protein Sci. 9, 201–206. 56 MORGAN, D. M., LYNN, D. G., MILLER-AUER, H., and MEREDITH, S. C. (2001). A designed Zn2+-binding amphiphilic polypeptide: Energetic consequences of π -helicity. Biochemistry 40, 14020–14029. 57 FODJE, M. N. and AL-KARADAGHI, S. (2002). Occurrence, conformational features and amino acid propensities for the π -helix. Protein Eng. 15, 353–358. 58 BALDWIN, R. L. (2003). In search of the energetic role of peptide hydrogen bonds. J. Biol. Chem. 278, 17581–17588. 59 EPAND, R. M. and SCHERAGA, H. A. (1968). Biochemistry 7, 2864–2872. 60 TANIUCHI, J. H. and ANFINSEN, C. B. (1969). J. Biol. Chem. 244, 2864–2872. 61 SPEK, E. J., GONG, Y., and KALLENBACH, N. R. (1995). Intermolecular interactions in a helical oligo- and poly(L-glutamic acid) at acidic pH. J. Am. Chem. Soc. 117, 10773–10774. 62 BROWN, J. E. and KLEE, W. A. (1971). Biochemistry 10, 470–476. 63 BIERZYNSKI, A., KIM, P. S., and BALDWIN, R. L. (1982). A salt bridge stabilizes the helix formed by isolated C-peptide of ribonuclease A. Proc. Natl Acad. Sci. USA 79, 2470–2474. 64 KIM, P. S., BIERZYNSKI, A., and BALDWIN, R. L. (1982). A competing salt-bridge suppresses helix formation by the isolated C-peptide carboxylate of Ribonuclease A. J. Mol. Biol. 162, 187–199.
37
38 Design and Stability of "-Helices
65 KIM, P. S. and BALDWIN, R. L. (1984). A helix stop signal in the isolated S-peptide of ribonuclease A. Nature 307, 329–334. 66 SHOEMAKER, K. R., KIM, P. S., BREMS, D. N. et al. (1985). Nature of the charged-group effect on the stability of the C-peptide helix. Proc. Natl Acad. Sci. USA 82, 2349–2353. 67 STREHLOW, K. G. and BALDWIN, R. L. (1989). Effect of the substitution Ala-Gly at each of 5 residue positions in the C-peptide helix. Biochemistry 28, 2130–2133. 68 OSTERHOUT, J. J., BALDWIN, R. L., YORK, E. J., STEWART, J. M., DYSON, H. J., and WRIGHT, P. E. (1989). H-1 NMR studies of the solution conformations of an analog of the C-peptide of ribonuclease-A. Biochemistry 29, 7059–7064. 69 SHOEMAKER, K. R., FAIRMAN, R., SCHULTZ, D. A. et al. (1990). Sidechain interactions in the C-peptide helix -Phe8-His12+ . Biopolymers 29, 1–11. 70 FAIRMAN, R., SHOEMAKER, K. R., YORK, E. J., STEWART, J. M., and BALDWIN, R. L. (1990). The Glu2-Arg10+ side-chain interaction in the C-peptidce helix of ribonuclease A. Biophys. Chem. 37, 107–119. 71 STREHLOW, K. G., ROBERTSON, A. D., and BALDWIN, R. L. (1991). Proline for alanine subsitutions in the C-peptide helix of ribonuclease A. Biochemistry 30, 5810–5814. 72 FAIRMAN, R., ARMSTRONG, K. M., SHOEMAKER, K. R., YORK, E. J., STEWART, J. M., and BALDWIN, R. L. (1991). Position effect on apparent helical propensities in the C-peptide helix. J. Mol. Biol. 221, 1395–1401. 73 BLANCO, F. J., JIMENEZ, M. A., RICO, M., SANTORO, J., HERRANZ, J., and NIETO, J. L. (1992). The homologous angiogenin and ribonuclease N-terminal fragments fold into very similar helices when isolated. Biochem. Biophys. Res. Commun. 182, 1491–1498. 74 RICO, M., NIETO, J. L., SANTORO, J., BERMEJO, F. J., and HERRANZ, J. (1983). H1-NMR parameters of the N-terminal 13-residue C-peptide of ribonuclease in
75
76
77
78
79
80
81
82
83
aqueous-solution. Org. Magn. Res. 21, 555–563. GALLEGO, E., HERRANZ, J., NIETO, J. L., RICO, M., and SANTORO, J. (1983). H1-NMR parameters of the N-terminal 19-residue S-peptide of ribonuclease in aqueous-solution. Int. J. Pept. Protein Res. 21, 242–253. RICO, M., NIETO, J. L., SANTORO, J., BERMEJO, F. J., HERRANZ, J., and GALLEGO, E. (1983). Low temperature H1-NMR evidence of the folding of isolated ribonuclease S-peptide. FEBS Lett. 162, 314–319. RICO, M., GALLEGO, E., SANTORO, J., BERMEJO, F. J., NIETO, J. L., and HERRANZ, J. (1984). On the fundamental role of the Glu2∼ . . . Arg10+ salt bridge in the folding of isolated ribonuclease A S-peptide. Biochem. Biophys. Res. Commun. 123, 757–763. NIETO, J. L., RICO, M., JIMENEZ, M. A., HERRANZ, J., and SANTORO, J. (1985). Amide H1-NMR study of the folding of ribonuclease C-peptide. Int. J. Biol. Macromol. 7, 66–70. NIETO, J. L., RICO, M., SANTORO, J., and BERMEJO, J. (1985). NH resonanaces of ribonuclease S-peptide in aqueous solution - low temperature NMR study. Int. J. Pept. Protein Res. 25, 47–55. SANTORO, J., RICO, M., NIETO, J. L., BERMEJO, J., HERRANZ, J., and GALLEGO, E. (1986). C13 NMR spectral assignment of ribonuclease S-peptide - Some new structural information about its low temperature folding. J. Mol. Struct. 141, 243–248. RICO, M., HERRANZ, J., BERMEJO, J. et al. (1986). Quantitative interpretation of the helix coil transition in RNase A S-peptide. J. Mol. Struct. 143, 439–444. RICO, M., SANTORO, J., BERMEJO, J. et al. (1986). Thermodynamic parameters for the helix coil thermal transition of ribonuclease S-peptide and derivatives from H1 NMR data. Biopolymers 25, 1031–1053. MARQUSEE, S. and BALDWIN, R. L. (1987). Helix stabilization by Glu− . . . Lys+ salt bridges in short peptides of de novo design. Proc. Natl Acad. Sci. USA 84, 8898–8902.
References 84 PARK, S. H., SHALONGO, W., and STELLWAGEN, E. (1993). Residue helix parameters obtained from dichroic analysis of peptides of defined sequence. Biochemistry 32, 7048–7053. 85 MARQUSEE, S., ROBBINS, V. H., and BALDWIN, R. L. (1989). Unusually stable helix formation in short alanine based peptides. Proc. Natl Acad. Sci. USA 86, 5286–5290. 86 LYU, P. C., MARKY, L. A., and KALLENBACH, N. R. (1989). The role of ion-pairs in α-helix stability -2 new designed helical peptides. J. Am. Chem. Soc. 111, 2733–2734. 87 LYU, P. C., LIFF, M. I., MARKY, L. A., and KALLENBACH, N. R. (1990). Side-chain contributions to the stability of α-helical structure in peptides. Science 250, 669–673. 88 GANS, P. J., LYU, P. C., MANNING, P. C., WOODY, R. W., and KALLENBACH, N. R. (1991). The helix-coil transition in heterogeneous peptides with specific side chain interactions: Theory and comparison with circular dichroism. Biopolymers 31, 1605–1614. 89 PENEL, S., MORRISON, R. G., MORTISHIRE-SMITH, R. J., and DOIG, A. J. (1999). Periodicity in α-helix lengths and C-capping preferences. J. Mol. Biol. 293, 1211–1219. 90 WADA, A. (1976). The α-helix as an electric macro-dipole. Adv. Biophys. 9, 1–63. 91 HOL, W. G. J., van DUIJNEN, P. T., and BERENDSEN, H. J. C. (1978). The α-helix dipole and the properties of proteins. Nature 273, 443–446. 92 AQVIST, J., LUECKE, H., QUIOCHO, F. A., and WARSHEL, A. (1991). Dipoles located at helix termini of proteins stabilize charges. Proc. Natl Acad. Sci. USA 88, 2026–2030. 93 ZHUKOVSKY, E. A., MULKERRIN, M. G., and PRESTA, L. G. (1994). Contribution to global protein stabilization of the N-capping box in human growth hormone. Biochemistry 33, 9856–9864. 94 TIDOR, B. (1994). Helix-capping interaction in λ Cro protein: a free energy simulation analysis. Proteins: Struct. Funct. Genet. 19, 310–323.
95 COCHRAN, D. A. E., PENEL, S., and DOIG, A. J. (2001). Contribution of the N1 amino acid residue to the stability of the α-helix. Protein Sci. 10, 463–470. 96 COCHRAN, D. A. E. and DOIG, A. J. (2001). Effects of the N2 residue on the stability of the α-helix for all 20 amino acids. Protein Sci. 10, 1305–1311. 97 IQBALSYAH, T. M. and DOIG, A. J. (in press). Effect of the N3 residue on the stability of the α-helix. Protein Sci. 13, 32–39. 98 DECATUR, S. M. (2000). IR spectro-scopy of isotope-labeled helical peptides: Probing the effect of N-acetylation on helix stabiliTy. Biopolymers 180–185. 99 DOIG, A. J., CHAKRABARTTY, A., KLINGLER, T. M., and BALDWIN, R. L. (1994). Determination of free energies of N-capping in α-Helices by modification of the Lifson-Roig helix-coil theory to include N- and C-capping. Biochemistry 33, 3396–3403. 100 STAPLEY, B. J., ROHL, C. A., and DOIG, A. J. (1995). Addition of side chain interactions to modified Lifson-Roig helix-coil theory: application to energetics of phenylalanine-methionine interactions. Protein Sci. 4, 2383–2391. 101 ANDREW, C. D., PENEL, S., JONES, G. R., and DOIG, A. J. (2001). Stabilising non-polar/polar side chain interactions in the α-helix. Proteins: Struct. Funct. Genet. 45, 449–455. 102 HUYGHUES-DESPOINTES, B. M., SCHOLTZ, J. M., and BALDWIN, R. L. (1993). Effect of a single aspartate on helix stability at different positions in a neutral alanine-based peptide. Protein Sci. 2, 1604–1611. 103 BRANDTS, J. R. and KAPLAN, K. J. (1973). Derivative spectroscopy applied to tyrosyl chromophores. Studies on ribonuclease, lima bean inhibitor, and pancreatic trypsin inhibitor. Biochemistry 10, 470–476. 104 EDELHOCH, H. (1967). Spectroscopic determination of tryptophan and tyrosine in proteins. Biochemistry 6, 1948–1954. 105 CHAKRABARTTY, A., KORTEMME, T., PADMANABHAN, S., and BALDWIN, R. L. (1993). Aromatic side-chain contribution
39
40 Design and Stability of "-Helices
106
107
108
109
110
111
112
to far-ultraviolet circular dichroism of helical peptides and its effect on measurement of helix propensities. Biochemistry 32, 5560–5565. ANDREW, C. D., BHATTACHARJEE, S., KOKKONI, N., HIRST, J. D., JONES, G. R., and DOIG, A. J. (2002). Stabilising interactions between aromatic and basic side chains in α-helical peptides and proteins. Tyrosine effects on helix circular dichroism. J. Am. Chem. Soc. 124, 12706–12714. KEMP, D. S., BOYD, J. G., and MUENDEL, C. C. (1991). The helical s-constant for alanine in water derived from template-nucleated helices. Nature 352, 451–454. KEMP, D. S., ALLEN, T. J., and OSLICK, S. L. (1995). The energetics of helix formation by short templated peptides in aqueous solution. 1. Characterization of the reporting helical template Ac-HE1(1). J. Am. Chem. Soc. 117, 6641–6657. GROEBKE, K., RENOLD, P., TSANG, K. Y., ALLEN, T. J., MCCLURE, K. F., and KEMP, D. S. (1996). Template-nucleated alanine-lysine helices are stabilized by position-dependent interactions between the lysine side chain and the helix barrel. Proc. Natl Acad. Sci. USA 93, 4025–4029. KEMP, D. S., OSLICK, S. L., and ALLEN, T. J. (1996). The structure and energetics of helix formation by short templated peptides in aqueous solution. 3. Calculation of the helical propagation constant s from the template stability constants t/c for Ac-Hel(1)-Ala(n)-OH, n = 1–6. J. Am. Chem. Soc. 118, 4249–4255. KEMP, D. S., ALLEN, T. J., OSLICK, S. L., and BOYD, J. G. (1996). The structure and energetics of helix formation by short templated peptides in aqueous solution. 2. Characterization of the helical structure of Ac-Hel(1)-Ala(6)-OH. J. Am. Chem. Soc. 118, 4240–4248. AUSTIN, R. E., MAPLESTONE, R. A., SEFLER, A. M. et al. (1997). Template for stabilization of a peptide α-helix: Synthesis and evaluation of conformational effects by circular dichroism and NMR. J. Am. Chem. Soc. 119, 6461–6472.
113 ARRHENIUS, T. and SATTHERTHWAIT, A. C. (1990). Peptides: Chemistry, Structure and Biology: Proceedings of the 11th American Peptide Symposium, ESCOM, Leiden. 114 MULLER, K., OBRECHT, D., KNIERZINGER, A. et al. (1993). Perspectives in Medicinal Chemistry, pp. 513–531, Verlag Chemie, Weinheim. 115 GANI, D., LEWIS, A., RUTHERFORD, T. et al. (1998). Design, synthesis, structure and properties of an α-helix cap template derived from N-[(2S)-2-chloropropionyl](2S)-Pro-(2R)-Ala-(2S,4S)-4-thioPro-OMe which initiates α-helical structures. Tetrahedron 54, 15793–15819. 116 ALEMAN, C. (1997). Conformational properties of α-amino acids disubstituted at the α-carbon. J. Phys. Chem. 101, 5046–5050. 117 SACCA, B., FORMAGGIO, F., CRISMA, M., TONIOLO, C., and GENNARO, R. (1997). Linear oligopeptides. 401. In search of a peptide 310 -helix in water. Gazz. Chim. Ital. 127, 495–500. 118 TANAKA, M. (2002). Design and conformation of peptides containing α,α-disubstituted α-amino acids. J. Synth. Org. Chem. Japan 60, 125–136. 119 LANCELOT, N., ELBAYED, K., RAYA, J. et al. (2003). Characterization of the 310 -helix in model peptides by HRMAS NMR spectroscopy. Chem. Eur. J. 9, 1317–1323. 120 KARLE, I. L. and BALARAM, P. (1990). Structural characteristics of α-helical peptide molecules containing Aib residues. Biochemistry 29, 6747–6756. 121 BANERJEE, R. and BASU, G. (2002). A short Aib/Ala-based peptide helix is as stable as an Ala-based peptide helix double its length. ChemBiochem 3, 1263–1266. 122 YOKUM, T. S., GAUTHIER, T. J., HAMMER, R. P., and MCLAUGHLIN, M. L. (1997). Solvent effects on the 310 -/α-helix equilibrium in short amphipathic peptides rich in α,α-disubstituted amino acids. J. Am. Chem. Soc. 119, 1167–1168. 123 MILLHAUSER, G. L., STENLAND, C. J., HANSON, P., BOLIN, K. A., and van De VEN, F. J. M. (1997). Estimating the relative populations of 310 -helix and α-helix in Ala-rich peptides. A hydrogen
References
124
125
126
127
128
129
130
131
132
133
exchange and high field NMR study. J. Mol. Biol. 267, 963–974. FIORI, W. R., LUNDBERG, K. M., and MILLHAUSER, G. L. (1994). A single carboxy-terminal arginine determines the amino-terminal helix conformation of an alanine-based peptide. Nat. Struct. Biol. 1, 374–377. YODER, G., POLESE, A., SILVA, R. A. G. D. et al. (1997). Conformation characterization of terminally blocked L-(αMe)Val homopeptides using vibrational and electronic circular dichroism. 310 -Helical stabilization by peptide-peptide interaction. J. Am. Chem. Soc. 119, 10278–10285. KENNEDY, D. F., CRISMA, M., TONIOLO, C., and CHAPMAN, D. (1991). Studies of peptides forming 310 - and α-Helices and β-bend ribbon structures in organic solution and in model biomembranes by Fourier transform infrared spectroscopy. Biochemistry 30, 6541–6548. HUNGERFORD, G., MARTINEZ-INSUA, M., BIRCH, D. J. S., and MOORE, B. D. (1996). A reversible transition between an α-helix and a 310 -helix in a fluorescence-labelled peptide. Angew. Chem. Int. Ed. Engl. 35, 326–329. SHEINERMAN, F. B. and BROOKS, C. L. (1995). 310 -Helices in peptides and proteins as studied by modified Zimm-Bragg theory. J. Am. Chem. Soc. 117, 10098–10103. ROSE, G. D. and CREAMER, T. P. (1994). Protein-folding -predicting predicting. Proteins: Struct. Funct. Genet. 19, 1–3. DALAL, S., BALASUBRAMANIAN, S., and REGAN, L. (1997). Protein alchemy: Changing β-sheet into α-helix. Nat. Struct. Biol. 4, 548–552. ROSE, G. D. (1997). Protein folding and the paracelsus challenge. Nat. Struct. Biol. 4, 512–514. POLAND, D. and SCHERAGA, H. A. (1970). Theory of Helix-Coil Transitions in Biopolymers. Academic Press, New York and London. QIAN, H. and SCHELLMAN, J. A. (1992). Helix/coil theories: A comparative study for finite length poly-peptides. J. Phys. Chem. 96, 3987–3994.
134 DOIG, A. J. (2002). Recent advances in helix-coil theory. Biophys. Chem. 101–102, 281–293. 135 ZIMM, B. H. and BRAGG, J. K. (1959). Theory of the phase transition between helix and random coil in polypeptide chains. J. Chem. Phys. 31, 526–535. 136 LIFSON, S. and ROIG, A. (1961). On the theory of the helix-coil transition in polypeptides. J. Chem. Phys. 34, 1963–1974. 137 FLORY, P. J. (1969). Statistical Mechanics of Chain Molecules. Wiley, New York. 138 PAPPU, R. V., SRINIVASAN, R., and ROSE, G. D. (2000). The Flory isolated-pair hypothesis is not valid for polypeptide chains: Implications for protein folding. Proc. Natl Acad. Sci. USA 97, 12565–12570. 139 TIFFANY, M. L. and KRIMM, S. (1968). New chain conformations of poly(glutamic acid) and polylysine. Biopolymers 6, 1379–1382. 140 KRIMM, S. and TIFFANY, M. L. (1974). The circular dichroism spectrum and structure of unordered polypeptides and proteins. Israel J. Chem. 12, 189–200. 141 WOODY, R. W. (1992). Circular dichroism and conformations of unordered polypeptides. Adv. Biophys. Chem. 2, 37–79. 142 BARRON, L. D., HECHT, L., BLANCH, E. W., and BELL, A. F. (2000). Solution structure and dynamics of biomolecules from Raman optical activity. Prog. Biophys. Mol. Biol. 73, 1–49. 143 BLANCH, E. W., MOROZOVA-ROCHE, L. A., COCHRAN, D. A. E., DOIG, A. J., HECHT, L., and BARRON, L. D. (2000). Is polyproline II helix the killer conformation? A Raman optical activity study of the amyloidogenic prefibrillar intermediate of human lysozyme. J. Mol. Biol. 301, 553–563. 144 SMYTH, E., SYME, C. D., BLANCH, E. W., HECHT, L., VASAK, M., and BARRON, L. D. (2000). Solution structure of native proteins irregular folds from Raman optical activity. Biopolymers 58, 138–151. 145 SYME, C. D., BLANCH, E. W., HOLT, C. et al. (2002). A Raman optical activity study of the rheomorphism in caseins,
41
42 Design and Stability of "-Helices
146
147
148
149
150
151
152
153
154
155
156
synucleins and tau. Eur. J. Biochem. 269, 148–156. RUCKER, A. L. and CREAMER, T. P. (2002). Polyproline II helical structure in protein unfolded states: Lysine peptides revisited. Protein Sci. 11, 980–1059. SHI, Z. S., WOODY, R. W., and KALLENBACH, N. R. (2002). Is polyproline II a major backbone conformation in unfolded proteins? Adv. Protein Chem. 62, 163–240. PAPPU, R. V. and ROSE, G. D. (2002). A simple model for polyproline II structure in unfolded states of alanine-based peptides. Protein Sci. 11, 2437–2455. SHI, Z., OLSON, C. A., ROSE, G. D., BALDWIN, R. L., and KALLENBACH, N. R. (2002). Polyproline II structure in a sequence of seven alanine residues. Proc. Natl Acad. Sci. USA 99, 9190–9195. RUCKER, A. L., PAGER, C. T., CAMPBELL, M. N., QUALIS, J. E., and CREAMER, T. P. (2003). Host-guest scale of left-handed polyproline II helix formation. Proteins: Struct. Funct. Genet. 53, 68–75. ANDERSEN, N. H. and TONG, H. (1997). Empirical parameterization of a model for predicting peptide helix/coil equilibrium populations. Protein Sci. 6, 1920–1936. SCHOLTZ, J. M., QIAN, H., ROBBINS, V. H., and BALDWIN, R. L. (1993). The energetics of ion-pair and hydrogen-bonding interactions in a helical peptide. Biochemistry 32, 9668–9676. SHALONGO, W. and STELLWAGEN, E. (1995). Incorporation of pairwise interactions into the Lifson-Roig model for helix prediction. Protein Sci. 4, 1161–1166. ARGOS, P. and PALAU, J. (1982). Amino acid distribution in protein secondary structures. Int. J. Pept. Protein Res. 19, 380–393. KUMAR, S. and BANSAL, M. (1998). Dissecting α-Helices: position-specific analysis of α-Helices in globular proteins. Proteins 31, 460–476. SUN, J. K., PENEL, S., and DOIG, A. J. (2000). Determination of α-helix N1 energies by addition of N1, N2 and N3
157
158
159
160
161
162
163
164
165
166
preferences to helix-coil theory. Protein Sci. 9, 750–754. ROHL, C. A. and DOIG, A. J. (1996). Models for the 310 -helix/coil, π -helix/coil, and α-helix/310 -helix/coil transitions in isolated peptides. Protein Sci. 5, 1687–1696. SUN, J. K. and DOIG, A. J. (1998). Addition of side-chain interactions to 310 -helix/coil and α-helix/310 -helix/coil theory. Protein Sci. 7, 2374–2383. ˜ OZ, V. and SERRANO, L. (1994). MUN Elucidating the folding problem of helical peptides using empirical parameters. Nat. Struct. Biol. 1, 399–409. ˜ OZ, V. and SERRANO, L. (1997). MUN Development of the multiple sequence approximation within the AGADIR model of α-helix formation: comparison with Zimm-Bragg and Lifson-Roig formalisms. Biopolymers 41, 495–509. FERNA´ NDEZ-RECIO, J., V´ASQUEZ, A., CIVERA, C., SEVILLA, P., and SANCHO, J. (1997). The tryptophan/histidine interaction in α-Helices. J. Mol. Biol. 267, 184–197. LACROIX, E., VIGUERA, A. R., and SERRANO, L. (1998). Elucidating the folding problem of α-Helices: local motifs, long-range electrostatics, ionic-strength dependence and prediction of NMR parameters. J. Mol. Biol. 284, 173–191. ˜ OZ, V. and SERRANO, L. (1995). MUN Elucidating the folding problem of helical peptides using empirical parameters. II. Helix macrodipole effects and rational modification of the helical content of natural peptides. J. Mol. Biol. 245, 275–296. ˜ OZ, V., YUMOTO, N., PETUKHOV, M., MUN YOSHIKAWA, S., and SERRANO, L. (1998). Position dependence of nonpolar amino acid intrinsic helical propensities. J. Mol. Biol. 278, 279–289. LOMIZE, A. L. and MOSBERG, H. I. (1997). Thermodynamic model of secondary structure for α-helical peptides and proteins. Biopolymers 42, 239–269. V´ASQUEZ, M. and SCHERAGA, H. A. (1988). Effect of sequence-specific interactions on the stability of helical
References
167
168
169
170
171
172
173
174
175
conformations in polypeptides. Biopolymers 27, 41–58. ROBERTS, C. H. (1990). A hierarchical nesting approach to describe the stability of α helices with side chain interactions. Biopolymers 30, 335–347. GUZZO, A. V. (1965). The influence of amino acid sequence on protein structure. Biophys. J. 5, 809–822. DAVIES, D. R. (1964). A correlation between amino acid composition and protein structure. J. Mol. Biol. 9, 605–609. CHOU, P. Y. and FASMAN, G. D. (1974). Conformational parameters for amino acids in helical, b-sheet and random coil regions calculated from proteins. Biochemistry 13, 211–221. CHOU, P. Y. and FASMAN, G. D. (1978). Empirical predictions of protein conformation. Annu. Rev. Biochem. 47, 251–276. VON DREELE, P. H., POLAND, D., and SCHERAGA, H. A. (1971). Helix-coil stability constants for the naturally occurring amino acids in water. I. Properties of copolymers and approximate theories. Macromolecules 4, 396–407. VON DREELE, P. H., LOTAN, N., ANANTHANARAYANAN, V. S., ANDREATTA, R. H., POLAND, D., and SCHERAGA, H. A. (1971). Helix-coil stability constants for the naturally occurring amino acids in water. II. Characterization of the host polymers and application of the host-guest technique to random poly-(hydroxypropylglutamine-cohydroxybutylglutamine). Macromolecules 4, 408–417. WOJCIK, J., ALTMANN, K. H., and SCHERAGA, H. A. (1990). Helix-coil stability constants for the naturally occurring amino acids in water. XXIV. Half-cysteine parameters from random poly-(hydroxybutylglutamine-co-Smethylthio-L-cysteine). Biopolymers 30, 121–134. PADMANABHAN, S., YORK, E. J., GERA, L., STEWART, J. M., and BALDWIN, R. L. (1994). Helix-forming tendencies of amino acids in short (hydroxybutyl)-L-glutamine peptides: An
176
177
178
179
180
181
182
183
184
185
evaluation of the contradictory results from host-guest studies and short alanine-based peptides. Biochemistry 33, 8604–8609. CHAKRABARTTY, A., KORTEMME, T., and BALDWIN, R. L. (1994). Helix propensities of the amino acids measured in alanine-based peptides without helix-stabilizing side-chain interactions. Protein Sci. 3, 843–852. BLABER, M., ZHANG, X.-J., and MATTHEWS, B. W. (1993). Structural basis of amino acid α-helix propensity. Science 260, 1637–1640. BLABER, M. W., BAASE, W. A., GASSNER, N., and MATTHEWS, B. W. (1993). Structural basis of amino acid α-helix propensity. Science 260, 1637–1640. BLABER, M., ZHANG, X. J., LINDSTROM, J. D., PEPIOT, S. D., BAASE, W. A., and MATTHEWS, B. W. (1994). Determination of α-helix propensity within the context of a folded protein: sites 44 and 131 in bacteriophage-T4 lysozyme. J. Mol. Biol. 235, 600–624. HOROVITZ, A., MATTHEWS, J. M., and FERSHT, A. R. (1992). A-Helix stability in proteins. II. Factors that influence stability at an internal position. J. Mol. Biol. 227, 560–568. O’NEIL, K. T. and DEGRADO, W. F. (1990). A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids. Science 250, 646–651. PACE, C. N. and SCHOLTZ, J. M. (1998). A helix propensity scale based on experimental studies of peptides and proteins. Biophys. J. 75, 422–427. VILA, J., WILLIAMS, R. L., GRANT, J. A., WOJCIK, J., and SCHERAGA, H. A. (1992). The intrinsic helix-forming tendency of L-alanine. Proc. Natl Acad. Sci. USA 89, 7821–7825. VILA, J. A., RIPOLL, D. R., and SCHERAGA, H. A. (2000). Physical reasons for the unusual α-helix stabilization afforded by charged or neutral polar residues in alanine-rich peptides. Proc. Natl Acad. Sci. USA 97, 13075–13079. ROHL, C. A., FIORI, W., and BALDWIN, R. L. (1999). Alanine is helix-stabilizing in both template-nucleated and standard
43
44 Design and Stability of "-Helices
186
187
188
189
190
191
192
193
194
peptide helices. Proc. Natl Acad. Sci. USA 96, 3682–3687. KENNEDY, R. J., TSANG, K.-Y., and KEMP, D. S. (2002). Consistent helicities from CD and template t/c data for N-templated polyalanines: progress toward resolution of the alanine helicity problem. J. Am. Chem. Soc. 124, 934–944. LYU, P. C., ZHOU, H. X. X., JELVEH, N., WEMMER, D. E., and KALLENBACH, N. R. (1992). Position-dependent stabilizing effects in α-Helices -N-terminal capping in synthetic model peptides. J. Am. Chem. Soc. 114, 6560–6562. PETUKHOV, M., UEGAKI, K., YUMOTO, N., YOSHIKAWA, S., and SERRANO, L. (1999). Position dependence of amino acid intrinsic helical propensities II: Non-charged polar residues: Ser, Thr, Asn, and Gln. Protein Sci. 8, 2144–2150. PETUKHOV, M., UEGAKI, K., YUMOTO, N., and SERRANO, L. (2002). Amino acid intrinsic α-helical propensities III: Positional dependence at several positions of C-terminus. Protein Sci. 11, 766–777. ERMOLENKO, D. N., RICHARDSON, J. M., and MAKHATADZE, G. I. (2003). Noncharged amino acid residues at the solvent-exposed positions in the middle and at the C terminus of the α-helix have the same helical propensity. Protein Sci. 12, 1169–1176. SZALIK, L., MOITRA, J., KRYLOV, D., and VINSON, C. (1997). Phosphorylation destabilizes α-helices. Nat. Struct. Biol. 4, 112–114. LIEHR, S. and CHENAULT, H. K. (1999). A comparison of the α-helix forming propensities and hydrogen bonding properties of serine phosphate and α-amino-g-phosphonobutyric acid. Bioorg. Med. Chem. Lett. 9, 2759–2762. ANDREW, C. D., WARWICKER, J., JONES, G. R., and DOIG, A. J. (2002). Effect of phosphorylation on α-helix stability as a function of position. Biochemistry 41, 1897–1905. HOROVITZ, A., SERRANO, L., AVRON, B., BYCROFT, M., and FERSHT, A. R. (1990). Strength and co-operativity of contributions of surface salt bridges to
195
196
197
198
199
200
201
202
203
204
protein stability. J. Mol. Biol. 216, 1031–1044. MERUTKA, G. and STELLWAGEN, E. (1991). Effect of amino acid ion pairs on peptide helicity. Biochemistry 30, 1591–1594. STELLWAGEN, E., PARK, S.-H., SHALONGO, W., and JAIN, A. (1992). The contribution of residue ion pairs to the helical stability of a model peptide. Biopolymers 32, 1193–1200. HUYGHUES-DESPOINTES, B. M., SCHOLTZ, J. M., and BALDWIN, R. L. (1993). Helical peptides with three pairs of Asp-Arg and Glu-Arg residues in different orientations and spacings. Protein Sci. 2, 80–85. HUYGHUES-DESPOINTES, B. M. and BALDWIN, R. L. (1997). Ion-pair and charged hydrogen-bond interactions between histidine and aspartate in a peptide helix. Biochemistry 36, 1965–1970. HUYGHUES-DESPOINTES, B. M., KLINGLER, T. M., and BALDWIN, R. L. (1995). Measuring the strength of side-chain hydrogen bonds in peptide helices: the Gln.Asp (i,i + 4) interaction. Biochemistry 34, 13267–13271. STAPLEY, B. J. and DOIG, A. J. (1997). Hydrogen bonding interactions between Glutamine and Asparagine in α-helical peptides. J. Mol. Biol. 272, 465–473. PADMANABHAN, S. and BALDWIN, R. L. (1994). Tests for helix-stabilizing interactions between various nonpolar side chains in alanine-based peptides. Protein Sci. 3, 1992–1997. PADMANABHAN, S. and BALDWIN, R. L. (1994). Helix-stabilizing interaction between tyrosine and leucine or valine when the spacing is i,i + 4. J. Mol. Biol. 241, 706–713. VIGUERA, A. R. and SERRANO, L. (1995). Side-chain interactions between sulfur-containing amino acids and phenylalanine in α-helices. Biochemistry 34, 8771–8779. TSOU, L. K., TATKO, C. D., and WATERS, M. L. (2002). Simple cation-p interaction between a phenyl ring and a protonated amine stabilizes an α-helix in water. J. Am. Chem. Soc. 124, 14917–14921.
References 205 TAYLOR, J. W. (2002). The synthesis and study of side-chain lactam-bridged peptides. Biopolymers 66, 49–75. 206 CAMPBELL, R. M., BONGERS, J., and FELXI, A. M. (1995). Rational design, synthesis, and biological evaluation of novel growth-hormone releasing-factor analogs. Biopolymers 37, 67–68. 207 CHOREV, M., ROUBINI, E., MCKEE, R. L. et al. (1991). Cyclic parathyroid-hormone related protein antagonists -lysine 13 to aspartic acid 17 [I to (I + 40] side-chain to side-chain lactamization. Biochemistry 30, 5968–5974. 208 OSAPAY, G. and TAYLOR, J. W. (1992). Multicyclic polypeptide model compounds. 2. Synthesis and conformational properties of a highly α-helical uncosapeptide constrained by 3 side-chain to side-chain lactam bridges. J. Am. Chem. Soc. 114, 6966–6973. 209 BOUVIER, M. and TAYLOR, J. W. (1992). Probing the functional conformation of Neuropeptide-T through the design and study of cyclic analogs. J. Med. Chem. 35, 1145–1155. 210 KAPURNIOTU, A. and TAYLOR, J. W. (1995). Structural and conformational requirements for human calcitonin activity–design, synthesis, and study of lactam-bridged analogs. J. Med. Chem. 38, 836–847. 211 OSAPAY, G. and TAYLOR, J. W. (1990). Multicyclic polypeptide model compounds. 1. Synthesis of a tricyclic amphiphilic α-helical peptide using an oxime resin, segment-condensation approach. J. Am. Chem. Soc. 112, 6046–6051. 212 BRACKEN, C., GULYAS, J., TAYLOR, J. W., and BAUM, J. (1994). Synthesis and nuclear-magnetic-resonance structure determination of an α-helical, bicyclic, lactam-bridged hexapeptide. J. Am. Chem. Soc. 116, 6431–6432. 213 CHEN, S. T., CHEN, H. J., YU, H. M., and WANG, K. T. (1993). Facile synthesis of a short peptide with a side-chain-constrained structure. J. Chem. Res. (S) 6, 228–229. 214 ZHANG, W. T., and TAYLOR, J. W. (1996). Efficient solid-phase synthesis of peptides with tripodal side-chain bridges
215
216
217
218
219
220
221
222
223
224
and optimization of the solvent conditions for solid-phase cyclizations. Tetrahedron Lett. 37, 2173–2176. LUO, P. Z., BRADDOCK, D. T., SUBRAMANIAN, R. M., MEREDITH, S. C., and LYNN, D. G. (1994). Structural and thermodynamic characterization of a bioactive peptide model of apolipoprotein-E-side-chain lactam bridges to constrain the conformation. Biochemistry 33, 12367–12377. JACKSON, D. Y., KING, D. S., CHMIELEWSKI, J., SINGH, S., and SCHULTZ, P. G. (1991). General approach to the synthesis of short α-helical peptides. J. Am. Chem. Soc. 113, 9391–9392. KUMITA, J. R., SMART, O. S., and WOOLLEY, G. A. (2000). Photo-control of helix content in a short peptide. Proc. Natl Acad. Sci. USA 97, 3803–3808. FLINT, D. G., KUMITA, J. R., SMART, O. S., and WOLLEY, G. A. (2002). Using an azobenzene cross-linker to either increase or decrease peptide helix content upon trans-to-cis photoisomerization. Chem. Biol. 9, 391–397. KUMITA, J. R., FLINT, D. G., SMART, O. S., and WOOLLEY, G. A. (2002). Photo-control of peptide helix content by an azobenzene cross-linker: steric interactions with underlying residues are not critical. Protein Eng. 15, 561–569. ˜ OZ, V., RICO, M., JIME´ NEZ, M. A., MUN and SERRANO, L. (1994). Helix stop and start signals in peptides and proteins. The capping box does not necessarily prevent helix elongation. J. Mol. Biol. 242, 487–496. KALLENBACH, N. R. and GONG, Y. X. (1999). C-terminal capping motifs in model helical peptides. Bioorg. Med. Chem. 7, 143–151. PETUKHOV, M., YUMOTO, N., MURASE, S., ONMURA, R., and YOSHIKAWA, S. (1996). Factors that affect the stabilization of α-helices in short peptides by a capping box. Biochemistry 35, 387–397. VIGUERA, A. R. and SERRANO, L. (1995). Experimental analysis of the Schellman motif. J. Mol. Biol. 251, 150–160. GONG, Y., ZHOU, H. X., GUO, M., and KALLENBACH, N. R. (1995). Structural
45
46 Design and Stability of "-Helices
225
226
227
228
229
230
231
232
233
analysis of the N- and C-termini in a peptide with consensus sequence. Protein Sci. 4, 1446–1456. ARMSTRONG, K. M. and BALDWIN, R. L. (1993). Charged histidine affects α-helix stability at all positions in the helix by interacting with the backbone charges. Proc. Natl Acad. Sci. USA 90, 11337–11340. SCHOLTZ, J. M., YORK, E. J., STEWART, J. M., and BALDWIN, R. L. (1991). A neutral water-soluble, α-helical peptide: The effect of ionic strength on the helix-coil equilibrium. J. Am. Chem. Soc. 113, 5102–5104. SMITH, J. S. and SCHOLTZ, J. M. (1998). Energetics of polar side-chain interactions in helical peptides: Salt effects on ion pairs and hydrogen bonds. Biochemistry 37, 33–40. LOCKHART, D. J. and KIM, P. S. (1993). Electrostatic screening of charge and dipole interactions with the helix backbone. Science 260, 198–202. JELESAROV, I., DURR, E., THOMAS, R. M., and BOSSHARD, H. B. (1998). Salt effects on hydrophobic interaction and charge screening in the folding of a negatively charged peptide to a coiled coil (leucine zipper). Biochemistry 37, 7539–7550. SCHOLTZ, J. M., MARQUSEE, S., BALDWIN, R. L. et al. (1991). Calorimetric determination of the enthalpy change for the α-helix to coil transition of an alanine peptide in water. Proc. Natl Acad. Sci. USA 88, 2854–2858. YODER, G., PANCOSKA, P., and KEIDERLING, T. A. (1997). Characterization of alanine-rich peptides, Ac-(AAKAA)n-GY-NH2 (n) 1–4), using vibrational circular dichroism and fourier transform infrared conformational determination and thermal unfolding. Biochemistry 36, 5123–5133. HUANG, C. Y., KLEMKE, J. W., GETAHUN, Z., DEGRADO, W. F., and GAI, F. (2001). Temperature-dependent helix-coil transition of an alanine based peptide. J. Am. Chem. Soc. 123, 9235–9238. GOCH, G., MACIEJCZYK, M., OLESZCZUK, M., STACHOWIAK, D., MALICKA, J., and BIERZYNSKI, A. (2003). Experimental
234
235
236
237
238
239
240
241
242
243
investigation of initial steps of helix propagation in model peptides. Biochemistry 42, 6840–6847. NELSON, J. W. and NELSON, N. R. K. (1986). Stabilization of the ribonuclease S-peptide α-helix by trifluoroethanol. Proteins: Struct. Funct. Genet. 1, 211–217. NELSON, J. W. and NELSON, N. R. K. (1989). Persistence of the α-helix stop signal in the S-peptide in trifluoroethanol solutions. Biochemistry 28, 5256–5261. SONNICHSEN, F. D., van EYK, J. E., HODGES, R. S., and SYKES, B. D. (1992). Effect of trifluoroethanol on protein secondary structure: an NMR and CD study using a synthetic actin peptide. Biochemistry 31, 8790–8798. WATERHOUS, D. V. and JOHNSON, W. C. (1994). Importance of environment in determining secondary structure in proteins. Biochemistry 33, 2121–2128. JASANOFF, A. and FERSHT, A. R. (1984). Quantitative determination of helical propensities. Analysis of data from trifluoroethanol titration curves. Biochemistry 33, 2129–2135. ALBERT, J. S. and HAMILTON, A. D. (1995). Stabilization of helical domains in short peptides using hydrophobic interactions. Biochemistry 34, 984–290. WALGERS, R., LEE, T. C., and CAMMERS-GOODWIN, A. (1998). An indirect chaotropic mechanism for the stabilization of helix conformation of peptides in aqueous trifluoroethanol and hexafluoro-2-propanol. J. Am. Chem. Soc. 120, 5073–5079. LUO, P. and BALDWIN, R. L. (1999). Interaction between water and polar groups of the helix backbone: An important determinant of helix propensities. Proc. Natl Acad. Sci. USA 96, 4930–4935. KENTSIS, A. and SOSNICK, T. R. (1998). Trifluoroethanol promotes helix formation by destabilizing backbone exposure: Desolvation rather than native hydrogen bonding defines the kinetic pathway of dimeric coiled coil folding. Biochemistry 37, 14613–14622. REIERSEN, H. and REES, A. R. (2000). Trifluoroethanol may form a solvent
References
244
245
246
247
248
249
250
matrix for assisted hydrophobic interactions between peptide sidechains. Protein Eng. 13, 739–743. MYERS, J. K., PACE, N., and SCHOLTZ, J. M. (1998). Trifluoroethanol effects on helix propensity and electrostatic interactions in the helical peptide from ribonuclease T1. Protein Sci. 7, 383–388. NOZAKI, Y. and TANFORD, C. (1967). Intrinsic dissociation constants of aspartyl and glutamyl carboxyl groups. J. Biol. Chem. 242, 4731–4735. KYTE, J. (1995). Structure in Protein Chemistry, Garland Publishing, New York and London. KORTEMME, T. and CREIGHTON, T. E. (1995). Ionisation of cysteine residues at the termini of model α-helical peptides. Relevance to unusual thiol pKa values in proteins of the thio-redoxin family. J. Mol. Biol. 253, 799–812. MIRANDA, J. J. (2003). Position-dependent interactions between cysteine and the helix dipole. Protein Sci. 12, 73–81. EISENBERG, D., WEISS, R. M., and TERWILLIGER, T. C. (1982). The helical hydrophobic moment: a measure of the amphiphilicity of a helix. Nature 299, 371–374. ˜ OZ, V. and SERRANO, L. (1995). MUN Analysis of i; i + 5 and i; i + 8 hydrophobic interactions in a helical
251
252
253
254
255
model peptide bearing the hydro-phobic staple motif. Biochemistry 34, 15301–15306. THOMAS, S. T., LOLADZE, V. V., and MAKHATADZE, G. I. (2001). Hydration of the peptide backbone largely defines the thermodynamic propensity scale of residues at the C position of the C-capping box of α-Helices. Proc. Natl Acad. Sci. USA 98, 10670–10675. SHI, Z., OLSON, C. A., BELL, A. J., and KALLENBACH, N. R. (2002). Non-classical helix-stabilizing interactions: C–H. . .O H-bonding between Phe and Glu side chains in α-helical peptides. Biophys. Chem. 101–2, 267–279. LUO, R., DAVID, L., HUNG, H., DEVANEY, J., and GILSON, M. K. (1999). Strength of solvent-exposed salt-bridges. J. Phys. Chem. B 103, 366–380. MARQUSEE, S. and SAUER, R. T. (1994). Contributions of a hydrogen bond/salt bridge network to the stability of secondary and tertiary structure in lambda repressor. Protein Sci. 3, 2217–2225. SHI, Z., OLSON, C. A., BELL, A. J., and KALLENBACH, N. R. (2001). Stabilization of α-helix structure by polar side-chain interactions: Complex salt bridges, cation π interactions and C–H . . . O–H bonds. Biopolymers 60, 366–380.
47
1
Design and Stability of Peptide β-sheets Mark S. Searle University of Nottingham, Nottingham, United Kingdom
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The pathway by which the polypeptide chain assembles from the unfolded, “disordered” state to the final active folded protein has been the subject of intense investigation. Hierarchical models of protein folding emphasize the importance of local interactions in restricting the conformational space of the polypeptide chain in the search for the native state. These nuclei of structure promote interactions between different parts of the sequence leading ultimately to a cooperative ratelimiting step from which the native state emerges [1–4]. Designed peptides that fold autonomously in water (α-helices and β-sheets) have proved extremely valuable in probing the relationship between local sequence information and folded conformation (the stereochemical code) in the absence of the tertiary interactions found in the native state of proteins. This has allowed intrinsic secondary structure propensities to be investigated in isolation, and enabled the nature and strength of the weak interactions relevant to a wide range of molecular recognition phenomena in chemistry and biology to be put on a quantitative footing. While the literature is rich in studies of α-helical peptides [5–8], water-soluble, nonaggregating monomeric β-sheets have emerged relatively recently [9–11]. For reasons of design and chemical synthesis, these are almost exclusively antiparallel β-sheets, although others have used nonnatural linkers to engineer parallel strand alignments [12–14]. Here we focus on contiguous antiparallel β-sheet systems. Autonomously folding β-hairpin motifs, consisting of two antiparallel β-strands linked by a reverse β-turn (Figure 1), represent the simplest systems for probing weak interactions in β-sheet folding and assembly, although more recently a number of three- and four-stranded β-sheet structures have been described. From this growing body of data, key factors have come into focus that are important in rational Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Design and Stability of Peptide $-sheets
Fig. 1 a) Beta-strand alignment and interstrand hydrogen bonds in an antiparallel $-hairpin peptide; R groups represent amino acid side chains, main chain N and angles are shown. N-terminal $-hairpin of ubiquitin in which the native TLTGK turn se-
quence has been replaced by the sequence NPDG. b) Native strand alignment giving rise to a type I NPDG turn. c) Nonnative strand alignment with a G-bulged type I turn (NPDGT).
design. The following will be considered: the role of the β-turn in promoting and stabilizing antiparallel β-sheet formation; the role of intrinsic backbone φ, ψ propensities in preorganizing the extended conformation of the polypeptide chain; the role of cooperativity in β-sheet folding and stability and in the propagation of multistranded β-sheets, and quantitative approaches to estimating the energetics of β-sheet stability and folding. 2 β-Hairpins Derived from Native Protein Sequences
The early focus on peptides excised from native protein structures provided the first insights into autonomously folding β-hairpins, preceding the more rational
3 Role of $-turns in Nucleating $-hairpin Folding
approach to β-sheet design. Peptides derived from tendamistat [15], B1 domain of protein G [16, 17], ubiquitin [18, 19], and ferredoxin [20] showed that these sequences could exist in the monomeric form without aggregating, but that in most cases they showed a very limited tendency to fold in the absence of tertiary contacts. The use of organic co-solvents appeared to induce native-like conformation [17, 18, 20]. The study by Blanco et al. [20] of a peptide derived from the B1 domain of protein G provided the first example of native-like folding in water of a fully native peptide sequence. In contrast, the studies of hairpins isolated from the proteins ubiquitin and ferredoxin, which are structurally homologous to the G B1 domain (all form an α/β-roll fold) showed these to be unfolded in water [18, 20]. More recent studies of the native ubiquitin peptide have now revealed evidence for a small population of the folded state in water [21], while a β-turn mutation has been shown to significantly enhance folding [22]. The apparent lack of evidence for folding of the peptide derived from residues 15–23 of tendamistat (YQSWRYSQA) [15], and from the N-terminal sequence of ubiquitin (MQIFVKTLTGKTITLEV) [19] led to partial redesign of the sequence to enhance folding through modification to the β-turn sequence by introducing an NPDG type I turn, which is the most common type I turn sequence in proteins. Thus, in the former peptide SWRY was replaced by NPDG [15], and in the latter the G-bulged type I turn TLTGK was replaced by NPDG [19]. By introducing this tight two-residue loop across PD it was envisaged that both hairpins would be stabilized. This was certainly the case, however, the most striking observation was that both peptides folded into nonnative conformations with a three-residue G-bulged type I turn (PDG) reestablished across the turn (Figure 1b and 1c). These initial studies in rational redesign led to strikingly irrational results, prompting a much more systematic approach to β-hairpin design. The above results revealed that the βturn sequence, which dictates the preferred backbone geometry, appears to be an important factor in dictating the β-strand alignment. From the protein folding viewpoint, as demonstrated by the redesigned ubiquitin hairpin sequence, it is evident that one important role of the native turn sequence may be to preclude the formation of nonnative conformations that may be incompatible with formation of the native state.
3 Role of β-turns in Nucleating β-hairpin Folding
The systematic classification of β-turns in proteins reveals a wide variety of geometries and sizes of loop [2, 23, 24]. For the design of β-hairpins, the emphasis has been on incorporation of the smallest turn sequence possible to limit the entropic destabilization effects. While two-residue type I and type II turns are generally common in β-turns, the backbone conformation (φ and torsion angles) of a two-residue type I or type II turn results in a local left-handed twist, which is not compatible with the right-handed twist found in protein β-sheets. Consequently, these turns are less commonly found between contiguous antiparallel β-strands. However, the diastereomeric type I and II turns have the φ and and angles complementary
3
4 Design and Stability of Peptide $-sheets
to the right-handed twisted orientation of the β-strands. These conclusions appear to rationalize, at least in part, the above observations with β-turn modifications introduced into the β-hairpins of tendamistat and ubiquitin. The introduced NPDG type I turn is not compatible with the right-handed twist of the β-strands resulting in a refolding to a more flexible G-bulged type I turn. The work of Gellman et al. [25] has shown the importance of backbone φ, angle preferences for the residues in the turn sequence by comparing the stability of a number of β-hairpin peptides derived from the ubiquitin sequence (MQIFVKSXXKTITLVKV) containing either XX = L-Pro-Xaa or D-Pro-Xaa. Replacing L-Pro with D-Pro switches the twist from left-handed (type I or II) to right-handed (type I or II ), making the latter compatible with the right-handed twist of the two β-strands. NMR data (Hα shifts and long-range nuclear Overhauser effects (NOEs)) indicate that each of the D-Pro containing peptides showed a significant degree of folding, whereas the L-Pro analogs appeared to be unfolded. Similar conclusions were drawn from studies of a series of 12-mers containing XG turn sequences, with X = L-Pro or D-Pro [26]. A number of natural L-amino acids are commonly found in the α L region of conformational space and are compatible with the type I or II turn conformation. Statistical analyses from a number of groups have identified Xaa-Gly as a favored type I turn. A number of studies have used the Asn-Gly sequence to design βhairpin motifs and have demonstrated a high population of the folded structure with the required turn conformation and strand alignment [27–32]. The work by de Alba et al. [28] identified two possible conformations of the peptide ITSYNGKTYGR. The NOE data appear to be compatible with rapidly inter-converting conformations involving an YNGK type I turn and a NGKT type II turn, each giving a distinct pattern of cross-strand NOEs. Ramirez-Alvarado et al. [27] also described an NGcontaining 12-mer (RGITVXGKTYGR; X = N), which they subsequently extended to a series of hairpins to examine the correlation between hairpin stability and the database frequency of occurrence of residues in position X [29]. Using nuclear magnetic resonance (NMR) and circular dichroism (CD) measurements they concluded that X = Asn > Asp > Gly > Ala > Ser in promoting hairpin folding, in agreement with the intrinsic φ, preferences of these residues. Despite extensive analysis of NG type I turns in the protein database (PDB), it is still not entirely clear why Asn at the first position is so effective in promoting turn formation. There is no evidence for specific side chain to main chain hydrogen bonds that might stabilise the desired backbone conformation, although specific solvation effects cannot be ruled out [30]. Evidence that the NG turn is able to nucleate folding in the absence of cross-strand interactions was demonstrated in a truncated analog of one designed 16-residue β-hairpin sequence KKYTVSINGKKITVSI (β1 in Figure 2), in which the sequence was shortened to SINGKKITVSI, lacking the N-terminal six residues [30]. Evidence from NOE data showed that the turn was significantly populated with interactions observed between Ser6 Hα ↔ Lys11 Hα and Ile7 NH ↔ Lys10 NH (Figure 2b) that are only compatible with a folded type I turn around NG. Two destabilized β-hairpin mutants (KKYTVSINGKKITKSK with electrostatic repulsion between the N- and
3 Role of $-turns in Nucleating $-hairpin Folding
Fig. 2 Structure of the 16-residue $-hairpin sequence $1, and mutants $2;,$3 and $4 in which cross-strand salt bridges (Lys-Glu) have been introduced at position X2 -X15 and X6 -X11 . In b), hairpin $1 has been truncated by removing the N-terminal hydrophobic
residues (1–5). The resulting peptide still shows the ability to fold around the turn sequence as evident from medium range NOEs (indicated by arrows) that show that the INGK sequence is adopting a type I NG turn.
C-terminal Lys residues, and KKATASINGKKITVSI with the loss of key hydrophobic residues in one strand) showed no evidence from NMR chemical shift data for cross-strand interactions; however, careful examination of NOE data revealed evidence for NG turn nucleation [30]. Titration with organic co-solvent showed both peptides to fold significantly, indicating that the turn sequence probably already predisposes the peptide to form a β-hairpin but that favorable cross-strand interactions are required for stability. The work of de Alba et al. convincingly illustrated this principle in a series of six hairpin sequences (10-mers) where strand residues were conserved but turn sequences varied [28]. Using a number of NMR criteria they were able to show that changes in turn sequence could result in a variety of turn conformations including two residue 2:2 turns, 3:5 turns and 4:4 turns (see earlier nomenclature [22–24]), with different pairings of amino acid side chains. As with the earlier examples cited with the turn modification described for hairpins derived from tendamistat and ubiquitin, the bulged-type I turn (3:5 turn) appears to be an intrinsically stable turn with the necessary right-handed twist. Together these data strongly support a model
5
6 Design and Stability of Peptide $-sheets
for hairpin folding in which the turn sequence strongly dictates its preferred conformation, and that strand alignment, cross-strand interactions and subsequently conformational stability are dictated by the specificity of the turn.
4 Intrinsic φ, Propensities of Amino Acids
Statistical analyses of high-resolution structures in the PDB have provided significant insights into residue-specific intrinsic backbone φ; ψ preferences in polypeptide chains. A novel approach presented by Swindells et al., was to determine φ, ψ propensities of different residues in nonregular regions of protein structure where backbone geometry is free of interactions associated with regular hydrogen bonded β-sheet or α-helical secondary structure [33]. The striking observation is that in this context φ and ψ angles (see Figure 3) are far from randomly distributed, and that most occupy regions of Ramachandran space associated with regular secondary structure. The observed φ; ψ distributions for individual residues has been taken as representative of those found in denatured states of proteins providing the basis of a “random coil” state from which residue-specific NMR parameters (3 JNH-H α and NOE intensities) can be derived as a reference state for folding studies [34, 35].
Fig. 3 Ramachandran plots of residue backbone N and angles taken from a database of 512 high-resolution protein Xray structures showing: a) residues in regular $-sheet (N, −120◦ , 120◦ ) and "-helix (N;, −60◦ , 60◦ ), and b) residues in the irregular coil regions of the same structures
(N, 60◦ , 0◦ is the "L region of conformationa space mainly occupied by Gly). The distribution in b) shows that residues have a natura propensity to occupy the " and $ regions of conformational space even when they are not involved in regular protein secondary structure. Taken from Ref [36].
4 Intrinsic N, Propensities of Amino Acids 7
While β-propensity is found to vary significantly from one residue to the next, context-dependent effects also appear to play an important part [36]. While V, I, F, and Y, for example, have a high intrinsic preference to be in the βregion of the Ramachandran plot where steric interactions with flanking residues are minimized, their conformation is relatively insensitive to the nature of the flanking residues. In contrast, small or unbranched side chains have a higher preference for the α-helical conformation, however, this preference can be significantly modulated by its neighbors through a combination of steric and hydrophobic interactions, as well as both repulsive and attractive electrostatic interactions. General effects of flanking residues (grouped as α-like or β-like, reflecting φ propensities) on the central residue of a XXX triplet are illustrated in Figure 4 as an average over all residues at the central position. More specific effects are also shown for
Fig. 4 Effects of neighboring residues on residue $-propensities within the triplets XNX calculated from the data in Figure 3. The $-propensity of residue N is a measure of the number of times a particular residues is found in the $-region of the Ramachandran plot as a fraction of the total distribution between "- and $-space [$/(" + $)]. The context dependence of the $-propensity is estimated by considering the nature of the neighboring residue (X) (X = any residue, " is a residue that prefers the "-helical region – Asp, Glu, Lys, or Ser, $ is
a residue that prefers the $-sheet region – Ile, Val, Phe, or Tyr). The effects of neighboring residues on the averages-propensity is shown in (a), specific effects on Ser (b), Val (c) and Lys (d) are also shown. While Val is relatively insensitive to the nature of the flanking residues, the smaller Ser residue can be forced to adopt a higher $propensity if it has bulky neighbors. Thus, the intrinsic $-propensity of a particular residue is highly context dependent. Taken in part from Ref. [36].
8 Design and Stability of Peptide $-sheets
Ser, Val, and Lys (Figure 4). With Ser, for example, having bulky flanking residues either side with high β-propensity (denoted βSβ, where β could be V, I, F or Y), significantly increases the β-propensity of the Ser residue to minimize the steric repulsion between the two bulky neighboring residues [36]. Thus, intrinsic structural propensities appear to be highly context dependent. This statistical framework has been extended to a number of experimental systems to examine the extent to which isolated β-strand sequences (in the absence of secondary structure interactions) are predisposed by the primary sequence to adopt an extended β-like conformation. The isolated 8-mer (GKKITVSI), corresponding to the C-terminal β-strand of the hairpin β1 (Figure 2; KKYTVSINGK-KITVSI), was examined by NMR analysis of 3 JNH-Hα values and backbone NOE intensities. Surprisingly, many of these parameters are similar to those for the folded hairpin despite the monomeric nature of the 8-mer [32, 36]. In an analogous study of the C-terminal strand of the ubiquitin hairpin described above [21, 37], similarly large deviations of coupling constants and NOE intensities from random coil values suggested that the isolated β-strands are partially preorganized into an extended conformation supporting a model for hairpin folding which may not require a significant further organisation of the peptide backbone, a factor that may contribute significantly to hairpin stability. Several studies of denaturated states of proteins have also highlighted the influence of neighboring residues in modulating main chain conformational preferences [38–41], and the importance of residual structure in the unfolded state in guiding the conformational search to the native state [42, 43].
5 Side-chain Interactions and β-hairpin Stability
There is a strong case, at least in the context of isolated peptide fragments, that the origin of the specificity of β-hairpin folding is largely dictated by the conformational preferences of the turn sequence. However, the stability of the folded state has been attributed to interstrand hydrogen bonding and/or hydrophobic interactions, though which dominates is still a matter of debate. Ramirez-Alvarado et al. [27] reported that the population of the folded state of the hairpin RGITVNGKTYGR was significantly diminished by replacing residues on the N-terminal strand, and then the C-terminal strand, by Ala. The loss of stability was attributed firstly to a reduction in hydrophobic surface burial, but also due to the intrinsically lower β-propensity of Ala, the latter contributing through an adverse conformational entropy term. To compensate for this de Alba et al. [44] described a family of hairpins derived from the sequence IYSNSDGTWT. The effects of residue substitutions in the first three positions was examined while maintaining the overall β-character of the two strands. Several favorable cross-strand pair-wise interactions were identified that were apparent in earlier, and more recent, PDB analysis of β-sheet interactions [45, 46]. For example, Thr-Thr and Tyr-Thr cross-strand pairs produced stabilizing interactions, whilst Ile-Thr and Ile-Trp had a destabilizing effect. Other studies of
5 Side-chain Interactions and $-hairpin Stability
Ala substitution in one β-strand have similarly highlighted hydrophobic burial as a key factor in conformational stability [30], while the observation of large numbers of side chain NOEs have been used as evidence for hydrophobic stabilization in water (see below) [19, 25–27, 30]. 5.1 Aromatic Clusters Stabilize β-hairpins
The first example of a natural β-hairpin sequence that folded autonomously in water (residues 41–56 of the B1 domain of protein G) identified an aromatic-rich cluster of residues that appears to impart considerable stability [17]. The interstrand pairing of Trp/Val and Tyr/Phe has subsequently been exploited in the design of a number of model hairpin systems, in particular to examine the relationship between the position of this stabilizing cluster and the β-turn sequence. The separation between the loop sequence and cluster strongly influences stability and the extent of participation in β-sheet forming interactions (Figure 5a). In an isomeric family of 20-mers (peptides 1, 2, and 3), all of which contain exactly the same residue composition, the most stable hairpin is that in which the smallest cost in conformational entropy is paid to bring the cluster together, i.e., where the cluster is closest to the loop sequence [47]. This arrangement results in the largest Hα chemical shift deviations, indicating a well-formed core; however, the terminal residues show a lower propensity to fold. Thus, there exists a strong interplay between the two key stabilizing components of the hairpin. A statistical model was developed to rationalize the experimental observations and estimate the free energy change versus the number of peptide hydrogen bonds formed (Figure 5b). It is a well-known phenomenon that α-helical peptides become more stable as the length increases, reflecting the fact that while helix nucleation is energetically unfavorable, the propagation step has a small net increase in stability. In β-hairpin systems, Stanger et al. [48] suggest that this may not be the case. There is some evidence for an increase in stability as strands lengthen from five to seven residues, however, further extension (to nine) does not lead to a further stability increment, suggesting that there may be an intrinsic limit to strand length. Since the choice of sequence extension in this study was limited to an all Thr extension or an alternating Ser-Thr (ST)n extension the conclusions should be viewed with caution. Since crossstrand Ser/Ser and Thr/Thr pairings do not bury very much hydrophobic surface area, it is not surprising perhaps that cross-strand side-chain interactions may only just compensate for the entropic cost of organizing the peptide backbone. Studies with other more favorable pairings may be enlightening. Undoubtedly the most successfully designed structural motif to date has been the tryptophan zipper (trpzip) motif [49], whose stability exceeds substantially all those already described. The design is based around stabilizing nonhydrogen-bonded cross-strand Trp-Trp pairs (Table 1). The most successful designs involved two such Trp-Trp pairs which NMR structural analysis reveals are interdigitated in a zipper-like manner stabilized through face-face offset π-stacking giving a compact structure (Figure 6). This arrangement
9
10 Design and Stability of Peptide $-sheets
Fig. 5 Interplay between hydrophobic cluster and turn position in a family of isomeric $-hairpins. The distance of the WVYF cluster from the turn is shown in (a) for peptides 1, 2, and 3. A statistical mechanical model was used to estimate the free energy of folding for the three peptides according to the number of peptide hydrogen bonds formed and the relative position of the stabilizing hydrophobic cluster (b). Thus, peptide 1, which has the hydrophobic cluster
closest to the turn, has the smallest energetic cost of forming the cluster which then significantly stabilizes this core motif, but leads to fraying of the N- and C-termini. In contrast, in the case of peptide 3, there is a large energy penalty in ordering the peptide backbone to bring the residues of the cluster together. Overall the core hairpin is less stable but more residues are involved in ordered structure. Adapted from Ref. [47].
5 Side-chain Interactions and $-hairpin Stability Table 1 trpzip β-hairpin peptides, sequences and stability
β-hairpin
Sequence
Turn type
Tm
H (kJ mol−1 )
trpzip1 trpzip2 trpzip3
SWTWEGNKWTWK SWTWENGKWTWK SWTWED PNKWTWK
(Type II turn) (Type I turn) (Type II turn)
323 345 352
−45.4 −70.6 −54.8
Fig. 6 NMR structures of trpzips 1 and 2. A) Ensemble of 20 structures of trpzip1 (residues 2–11) showing the relative orientations of the indole rings. B) Overlay of trpzip 1 and 2 aligned to the peptide backbone of residues 2–5 and 8–11 (top),
and rotated by 90◦ for the end-on view of the indole rings. The backbone carbonyl of residue 6 is labeled to illustrate the different turn geometries of the two hairpins (type II and type I ). Taken from Ref [49].
11
12 Design and Stability of Peptide $-sheets
of the indole rings results in a pronounced signature in the CD spectrum with intense exciton-coupling bands at 215 and 229 nm indicative of interactions between aromatic chromophores in a highly chiral environment (see Figure 7). The high sensitivity of the CD bands permits thermal denaturation curves to be determined with high sensitivity, in contrast to other β-sheet systems where only small changes in the CD spectrum at 216 nm are evident, resulting in poor signal-to-noise. The thermal unfolding curves are sigmoidal and reversible, fitting to a two-state folding model (Figure 7). The trpzip hairpins 1, 2, and 3 show high T m values (see Table 1) with folding strongly enthalpy driven. Changing the turn sequence (GN versus NG versus DPN) has a significant impact on stability. The unfolding curve for trpzip2 with the NG type I turn gives the most cooperative unfolding transition and appears to be the most stable of the three at room temperature, despite other studies that suggest that the unnatural D-PN (type II ) turn is the most stabilizing [25]. Clearly, context-dependent factors are at work. Also, the strongly enthalpy-driven signature for folding is different to that reported previously for other systems [32], reflecting differences in the nature of the stabilizing interactions which in the case of the trpzip peptides involve π-π stacking interactions. Despite the fact that the trpzip hairpins represent a highly stabilized motif, no such examples have been found in the protein structure database. The authors suggest that on steric grounds it may be difficult to accommodate the trpzip motif within a multistranded β-sheet or through packing against other structural elements. 5.2 Salt Bridges Enhance Hairpin Stability
The high abundance of salt bridges on the surface of hyperthermophilic proteins [50, 51], together with examples of rational enhancement of protein stability through redesign of surface charge, has strongly implicated ionic interactions as a stabilizing force [52, 53]. In model peptides that are only weakly folded in water, the relative importance of ionic versus hydrophobic interactions in stabilizing local secondary structure has been less well investigated in terms of a detailed quantitative description. To this end, peptide β1 (Figure 2) was mutated to introduce Lys-Glu salt bridges at two positions within the hairpin involving substitution of Ser → Glu: one salt bridge is postioned adjacent to the β-turn, while the other involves residues close to the N- and C-termini (see Figure 2) [54]. Using NMR chemical shift data from Ha resonances we have estimated the net contribution to stability from these two interactions. Although the contribution to stability in each case is small, Hα chemical shifts are extremely sensitive to small shifts in the population of the folded state, as evident from the δHα data in Figure 8. On this basis, the individual contributions of these two interactions was estimated (K2-E15, peptide β2, and E6-K11, peptide β3) compared with their K2-S15 and S6-K11 counterparts (peptide β1) and found to be similar (−1.2 and −1.3 kJ mol−1 at 298K). When the two salt bridges are introduced simultaneously into the hairpin sequence (β4), the energetic contribution of the two interactions together (−3.6 kJ mol−1 ) is
5 Side-chain Interactions and $-hairpin Stability
Fig. 7 Folding data for trpzips 1 to 3 determined from near- and far-UV CD spectra. a) CD spectrum of trpzip 1, with inset of nearUV CD showing buried aromatic residues (10-fold expansion). b) Reversible thermal
denaturation of trpzip 1 monitored by CD at 229 nm (unfolding and refolding curves overlayed). c) Temperature dependence of folding for trpzips 1 to 3 plotted as fraction folded. Taken from Ref. [49].
13
14 Design and Stability of Peptide $-sheets
Fig. 8 Effects of salt bridges (Lys-Glu) on $-hairpin stability. H" chemical shift deviation from random coil values (*H" values) for $-hairpin peptides $2 to $4 compared with those of the reference hairpin
protein $1 containing Lys-Ser cross-strand pairs at the X2 -X15 and X6 -X11 mutation sites (see Figure 2). a) $1 versus $2; b) $1 versus $3, and c) $1 versus $4. All data at 298K and pH 5.5. Taken from Ref. [54].
5 Side-chain Interactions and $-hairpin Stability
significantly greater than the sum of the individual interactions. This effect is readily apparent from the large increase in -δ-Hα values in Figure 8, indicating that the contribution of a given interaction appears to depend on the relative stability of the system in which the interaction is being measured. Similar observations have been reported using a disulfide cyclized β-hairpin scaffold [55]. The indication is
Fig. 9 a) Thermodynamic cycle showing the context-dependent energetic contribution to hairpin stability of each electrostatic interaction in the $1 to $4 family of hairpins (Figure 2) by comparing relative hairpin stabilities at 298K determined from NMR chemical shift data; G values are shown for Lys-Glu interactions at pH 5.5. The stabilizing contribution of each salt bridge is context dependent, showing some degree
of cooperativity between the two mutation sites according to the degree of preorganization of the hairpin structure. b) Schematic illustration of the abundance of cross-strand NOEs in hairpin peptide $4. c) Family of five NMR structures of hairpin $4 showing the peptide backbone alignment, with some fraying at the N- and C-termini. Taken from Ref. [54].
15
16 Design and Stability of Peptide $-sheets
that the strength of the interaction appears to depend on the degree of preorganization of the β-hairpin template that pays varying degrees of the entropic cost in bringing the pairs of side chains together. This is more clearly illustrated by the schematic representation shown in Figure 9 that shows the thermodynamic cycle indicating the effects on stability of the introduction of each mutation. Thus, the energetic contribution of the S15/E15 mutation could be measured either from β1 ←→ β2 or from β3 ←→ β4. The values obtained are quite different, the latter suggesting that the interaction is more favorable (−1.2 versus −2.3 kJ mol−1 ). This appears to correlate with the fact that the stability of β3 is greater than β1 and that the degree of preorganization of the former determines the energetic contribution of the interaction. The same principle is evident when determining the energetics of the E6-K11 interaction. This can be estimated from β1 ←→ β3 or β2 ←→ β4. Again, the energetics are quite different (−1.3 versus −2.4 kJ mol−1 ) with the larger contribution from the β2 ←→ β4 pair, where the intrinsic stability of the β2 reference state is higher. The NMR data show that the folded conformation of β4 is highly populated (> 70%) giving rise to an abundance of cross-strand NOEs (Figure 9) [55]. On the basis of 173 restraints (NOEs and torsion angle restraints from 3 JNH-Hα values) we have calculated a family of structures compatible with the NMR data (Figure 9). The large number of van der Waals contacts between hydrophobic residues (V, I, and Y)
Fig. 10 CD melting curves for $4 at various concentrations (% v/v) of MeOH, as indicated. The nonlinear least squares fit is shown in each case from which thermodynamic data for folding were determined
(see Ref. [54]). The change in heat capacity on folding in 30% and 50% aqueous methanol is assumed to be zero; there is evidence for cold-denaturation in water and 10% methanol. Taken from Ref. [54].
7 Quantitative Analysis of Peptide Folding 17
evident from the NOE data leads us to conclude that hydrophobic interactions still provide the overall driving force for folding with structure and stability further consolidated by additional Coulombic interactions from the salt bridges. CD melting curves for hairpin β4 in water and various concentrations of methanol (Figure 10) show the same characteristics described above from temperature-dependent NMR data for β1: a shallow melting curve for β4 in water with evidence for cold denaturation is consistent with the hydrophobic interaction again providing the dominant driving force for folding (H = +11:9 kJ mol−1 and S = +38 J K−1 mol−1 ), while folding becomes strongly enthalpy driven in 50% aqueous methanol (H = −39:8 kJ mol−1 and S = −106 J K−1 mol−1 ).
6 Cooperative Interactions in β-sheet Peptides: Kinetic Barriers to Folding
Folding kinetics for a β-hairpin derived from the C-terminus of the B1 domain of protein G have been described from measurements of tryptophan fluorescence following laser-induced temperature-jump [56, 57]. Kinetic analysis of this peptide (and a dansylated analogue) reveals a single exponential relaxation process with time constant 3.7 ± 0:3 µs. The data indicate a single kinetic barrier separating folded and unfolded states, consistent with a two-state model for folding. Subsequently, the authors developed a statistical mechanical model based on these observations, describing the stability in terms of a minimal numbers of parameters: loss of conformational entropy, backbone stabilization by hydrogen bonding and formation of a stabilizing hydrophobic cluster between three key residues. This model seems sufficient to reproduce all of the features observed experimentally, with a rough, funnel-like energy landscape dominated by two global minima representing the folded and unfolded states. The formation of the hydrophobic cluster appears to be a key folding event. Nucleation by the turn seems most likely, consistent with experimental measurements of loop formation on the timescale of ≈1 µs [58]. However, simulation studies by others suggest that folding may proceed by hydrophobic collapse followed by rearrangement to form the hydrophobic cluster, with hydrogen bonds then propagating outward from the cluster in both directions [59]. Such a model does not appear to require a turn-based nucleation event.
7 Quantitative Analysis of Peptide Folding
Quantitative analysis of the population of folded β-sheet structures in solution still presents a challenge, largely as a consequence of uncertainties in limiting spectroscopic parameters for the fully folded state. Far UV-CD has been considered to be unreliable as a consequence of the complicating influence of the β-turn conformation and possibly aromatic residues, where present [10]. Added to this, the CD spectrum of β-sheet is intrinsically weak compared with α-helical secondary structure.
18 Design and Stability of Peptide $-sheets
The trpzip peptides [49] represent the exception to the rule, as discussed above. The use of NMR parameters (Hα chemical shifts, 3 JNH-Hα values and NOE intensities) to quantify folded populations has been discussed [9–11, 32, 60]. NMR offers the advantage that several independent parameters can be used in quantitative analysis to provide a consensus picture of the folded state. There still appear to be significant discrepancies between CD analysis and NMR, with peptides that appear to be significantly folded by NMR giving rise to a largely random coil CD spectrum. One interpretation of this observation is that in aqueous solution the peptides fold as a collapsed state with an ill-defined hydrogen bonding network dominated by sidechain interactions. Interestingly, in many cases the addition of organic co-solvents changes the CD spectrum dramatically. The interpretation of co-solvent-induced folding is also subject to some uncertainty. Does trifluorethanol or methanol actually significantly perturb the equilibrium between the folded and unfolded states (induce folding), or do these solvents exert their influence by changing the nature of the folded state such as to stabilize interstrand hydrogen bonding interactions without significantly changing the folded population? The latter hypothesis would appear to more readily account for the observation of solvent-induced effects on the CD spectrum, and finds some support from studies of cyclic β-hairpin analogs where the folded population is fixed, but whose CD spectrum undergoes large solvent-induced changes [60]. The use of cyclic β-hairpin analogs has been exploited in a number of studies to generate a fully folded NMR reference state for comparison with the folding of acyclic analogs [61, 62]. Backbone cyclization through amide bond formation or through a disulfide bridge seem to work equally well. Such an approach has been used effectively to measure the thermodynamics of folding of a short hairpin carrying a motif of aromatic residues [62]. The cyclic analogs show a much higher stability than their acyclic counterparts, including significant protection from amide H/D exchange due to enforced interstrand hydrogen bonding. While peptide cyclization seems a worthwhile approach to defining the fully folded state, there may also be some caveats to this approach that have not been tested. When conformational constraints (β-turns) are imposed at both ends of the structure this may affect the intrinsic twist of the two β-strands, resulting in a more pronounced twist in the cyclic analog than in the acyclic hairpin. Further, the latitude for conformational dynamics in the fully folded state will also be different. Both of these factors are likely to influence Hα shifts chemical shifts to some degree.
8 Thermodynamics of β-hairpin Folding
The number of β-hairpin model systems is expanding rapidly with a greater focus now on quantitative analysis and thermodynamic characterization. The thermodynamic signature for the folding of β1 (Figure 2) presents an insight into the nature of the stabilizing weak interactions in various solvent milieu [30, 32]. β1 exhibits
8 Thermodynamics of $-hairpin Folding
Fig. 11 a) Temperature-dependent stability of the hairpin peptide $1 (see Figure 2) at pH 5.5 in water, 20% methanol and 50% methanol. Temperature is plotted against the RMS value for the deviation of H" chemical shifts from random coil values, assuming a two-state folding model. In water the peptide shows “hot” and “cold” denaturation, but in 50% methanol the folded population increases at low temperature. The best fits to the three sets of data are shown; in water folding is slightly entropydriven with a large negative C◦ p value, while in 50% methanol folding is strongly
enthalpy-driven with C◦ p close to zero. In 20% methanol the values are in between. (Reproduced with permission from the Journal of the American Chemical Society [32]). b) Variable-temperature FTIR spectra of i) 2 mM aqueous solution of the 8-mer peptide GKKITVSI (C-terminal $-strand of hairpin $1); ii) 2 mM solution of $-hairpin peptide $1; iii) 10 mM solution of $1, all in the temperature range 278–330 K and in D2 O solution, pH 5.0 (uncorrected). In iii) the band that appears around 1620 cm−1 is formed irreversibly indicative of peptide aggregation. Reproduced from Ref [64].
the property of cold denaturation, with a maximum in the stability curve occurring at 298K as judged by changes in Hα chemical shift (Figure 11). Such pronounced curvature is clear evidence for entropy-driven folding accompanied by a significant change in heat capacity. Both of these thermodynamic signatures are hallmarks of the hydrophobic effect contributing strongly to hairpin stability. More recent studies by Tatko and Waters [63] have also shown that cold denaturation effects can be observed for simple model peptides, consistent with substantial changes in heat capacity on folding. Further examination of folding in the presence of methanol co-solvent shows that the signature changes such that folding becomes strongly enthalpy driven, and that in 50% aqueous methanol the temperaturestability profile is indicative of the absence of any significant contribution of the hydrophobic effect to folding [32]. The population of the folded state appears to be enhanced by methanol, reflecting similar observations in helical peptides where the phenomenon has been attributed to the effects of the co-solvent destabilizing
19
20 Design and Stability of Peptide $-sheets
the unfolded peptide chain so promoting hydrogen bonding interactions in the folded state. It is unlikely that the folded state of a model β-hairpin peptide resembles a β-sheet in a native protein, with the former sampling a much larger number of conformations of similar energy stabilized by a fluctuating ensemble of transient interactions. IR analysis of the amide I band of β1 does not show significant differences in the region expected for β-sheet formation (≈ 1630 cm−1 ) from data on a nonhydrogen-bonded short reference peptide [64]. However, this band does appear under aggregating conditions (Figure 11). In other cases, IR spectral features characteristic of β-hairpins have been shown to form reversibly at relatively high peptide concentrations [65]. Similarly, β-hairpins that appear to be well folded on the basis of various NMR criteria seem to be weakly folded by CD analysis [60]. This discrepancy has been attributed largely to weak interstrand hydrogen bonding interactions in the folded state. Indeed, molecular dynamics simulations using ensemble-averaging approaches or time-averaged NOEs tend to de-emphasize the role of hydrogen bonding between the peptide backbone of the two strands, but emphasize the role of hydrophobic side chains interactions [30, 60, 67]. In all cases described, side chain NOEs across the β-strands support such interactions, however, cross-strand backbone NOEs only imply the possibility of hydrogen bonds since weakly populated folded states lead to only small NH ←→ ND protection factors in water. In studies (both calorimetric and NMR) of the folding of the C-terminal hairpin from the B1 domain of protein G, enthalpy-driven folding is observed in water [68]. In contrast with the above data, where the stabilizing hydrophobic interactions involved aliphatic side chains, here an aromatic cluster is responsible for folding. A strongly enthalpy-driven transition is also apparent for the trpzip motifs described above [49] all of which is consistent with π-π interactions stabilizing β-hairpin structures through fundamentally electrostatic interactions rather than through only solvophobic effects. Studies of a designed β-hairpin system, also carrying the same motif of three aromatic residues, demonstrate qualitatively similiar enthalpydriven folding [62], while the thermodynamics of the N-terminal hairpin component of a designed three-stranded antiparallel β-sheet enables similar conclusions to be drawn (see further below) [69]. The difference in the thermodynamic signature for aliphatic versus aromatic side-chain interactions has been suggested to have its origins in enthalpy-entropy compensation effects, such that enthalpy-driven interactions may be fundamentally a consequence of tighter interfacial interactions, giving rise to stronger electrostatic (enthalpic) interactions [62]. More recent work by Tatko and Waters [70] probing the energetics of aromatic-aliphatic interactions in hairpins suggests that there is a preference for the self-association of aromatics over aromatic-aliphatic interactions, and that the unique nature of this interaction may impart some degree of sequence selectivity. The limited data available from such systems suggest that the thermodynamic driving force for β-hairpin folding is highly dependent on the nature of the side-chain interactions involved. Further quantitative analysis of peptide folding is required to substantiate these hypotheses.
9 Multistranded Antiparallel $-sheet Peptides
9 Multistranded Antiparallel β-sheet Peptides
The natural extension of the earlier studies on β-hairpin peptides was to design three- and four-stranded antiparallel β-sheets using the design principles already discussed, focusing on the importance of turn sequence in defining stability and strand alignment, and employing motifs of interacting side chains already identified to impart stability. An overriding question concerns the extent to which cooperative interactions perpendicular to the strand direction are important in stabilizing these structures (see Figure 12); in other words, how good is a pre-organized βhairpin motif at templating the interaction of a third strand. Several studies have attempted to address this important question. The earliest study described a 24residue peptide incorporating two NG turns (KKFTLSINGKKYTISNGKTYITGR) that showed little evidence for folding in water but was significantly stabilised in aqueous methanol solutions [71, 72]. By comparison with a 16-residue β-hairpin analog consisting of the same C-terminal sequence (GKKYTISNGKTYITGR), it was possible to show that the Hα shift perturbations for the C-terminal β-hairpin were greater in the presence of the interactions of the third strand. Subsequent design strategies, incorporating the D-Pro-Gly loop, together with other Asn-Glycontaining turn sequences have illustrated that it is possible to design structures that fold in water [69, 73–75], some of which show a degree of cooperative stabilization between different strands. Other studies have reported peptides that fold to three- (and four-) stranded β-sheets in organic solvents [76], but a quantitative dissection of stabilizing interactions between β-strands has not been presented. Redesign of the three-stranded Betanova, following earlier work [74], reported the use of an automated approach using the algorithm PERLA (protein engineering rotamer library algorithm) to define stabilizing and destabilizing single and multipleresidue mutations producing incremental increases in stability over the original design of up to ≈4 kJ mol−1 [77]. Schenck and Gellman [73] demonstrated cooperative interactions using their D-ProGly to L-ProGly switch, the latter destabilizing one hairpin component selectively. From chemical shift analysis they showed that the individual β-hairpins are cooperatively stabilized by the presence of the third strand. De Alba et al. [75] also reported a designed β-sheet system, but were unable to find convincing evidence for cooperative stabilization of either hairpin by the third strand. It is clear that folding is not a highly cooperative process in these peptide systems. Thus, the folded three-stranded β-sheet is more than likely in equilibrium with populations of the individual hairpins and the unfolded state (see Figure 12). One quantitative study of cooperative interactions between the strands of a threestranded β-sheet [69] was based on a designed system incorporating a previously studied β-hairpin with the third strand capable of forming a stabilizing motif of aromatic residues (Figure 13), similar to that already described [17, 62]. The earlier study of the isolated C-terminal β-hairpin showed cold denaturation [30], approximating to two-state unfolding. In the designed three-stranded system, the same hairpin component shows the same cold denaturation profile, however, the N-
21
22 Design and Stability of Peptide $-sheets
Fig. 12 a) Models for the folding of $hairpin and three-stranded $-sheet peptides illustrating the possibility of cooperative interactions being propagated parallel i) and perpendicular ii) to the $-strand direction. b) Four-state model for the folding of a three-stranded antiparallel $-sheet peptide showing the presence at equilibrium of the
unfolded state and partially folded states in which the C-terminal $-hairpin is formed but the N-terminus is disordered, the Nterminal $-hairpin is formed but the Cterminus is disordered, and the fully folded state. The preformed hairpins can act as templates for the folding of the third strand.
terminal hairpin (sharing a common central strand) showed increased folding at low temperature. While the first process is characterized by entropy-driven folding, the latter is enthalpy-driven (Figure 13). Clearly, two different thermodynamic profiles are not consistent with a single two-state folding model, but the data could be rationalized in terms of a four-state model in which the individual hairpins with an unfolded tail are also populated (Figure 12). Examination of the folding of the isolated C-terminal hairpin, and comparison of the data with that of the three-
9 Multistranded Antiparallel $-sheet Peptides
Fig. 13 a) Mean NMR structure of the fully folded state of the three-stranded antiparallel $-sheet structure of sequence KGEWTFVNG9 KYTVSING17 KKITVSI showing the core cluster of hydrophobic residues (underlined). b) Temperature-dependent stability profiles for the various $-hairpin components of the three-stranded sheet using the H" splitting of Gly9 and Gly17 in the two $-turns: Gly9 (filled circles), Gly17 (open circles). Gly17 (open squares)
in the isolated C-terminal hairpin peptide (G9 KYTVSING17 KKITVSI) is also shown. A larger Gly H" splitting indicates a higher population of the folded hairpin. Different profiles for Gly9 and Gly17 in the threestranded sheet show that the peptide cannot be folding via a simple two-state model involving only random coil and fully folded peptide. The data have been fitted assuming the four-state model shown in Figure 12. Reproduced from Ref. [69].
23
24 Design and Stability of Peptide $-sheets
stranded analog, shows good evidence that the C-terminal hairpin is cooperatively stabilized by the interaction of the third strand, even though overall folding is not highly cooperative. The data in Figure 13 show the temperature dependence of the splitting of the Gly Hα resonances in the two NG turns. Larger values indicate a higher folded population. The splitting for G17 is greater in the three-stranded sheet than for the isolated C-terminal hairpin, while many Hα resonances are further downfield shifted than in the isolated hairpin. The temperature dependence of the stability shows the C-terminal hairpin in both cases to undergo cold denaturation. The N-terminal hairpin carrying the aromatic motif of residues increases in population at low temperature; fitting the data shows the former to be entropy driven and the latter enthalpy driven [69]. Entropic factors seem to be the likely explanation for the small cooperative stabilization effect on the C-terminal hairpin (< 2 kJ mol−1 ), with each hairpin providing a possible template against which the third strand can interact. With one strand preorganized, the entropic
Fig. 14 Structure of the four-stranded $-sheet peptide D PD PD P-cc (a), and the cyclic reference peptide c(D P)2 -II (b) used for estimating limiting values for the fully folded state of the C-terminal hairpin of D PD PD P-cc. Adapted from the work of Ref. [78].
10 Conclusions
cost of association of an additional strand is largely confined to the associating strand [73, 78]. The nature of the folded state is unlikely to compare with that of a β-sheet in a native protein, more likely, hydrophobic contacts between side chains stabilize a collapsed conformation where interstrand hydrogen bonds may play a minor stabilizing role. These “loosely” defined interactions between side chains, rather than a native-like “crystalline” array of hydrogen bonds, may explain why cooperative interactions have only a small effect on overall stability because only a small energy barrier separates the folded from partially or fully unfolded states. Several of the above studies have attempted to address the issue of whether cooperative interactions are propagated orthogonal to the strand direction as the number of β-strands increases. Gellman and coworkers have examined the influence of strand number on antiparallel β-sheet stability in designed three- and four-stranded structures (see Figure 14), using the D-Pro-Gly β-turn sequence to define β-turn position and strand length. The results are not dissimilar to those described above; a third strand stabilizes an existing two-stranded β-sheet by ≈2 kJ mol−1 , while a fourth strand seems to confer a similar small stability increment [79].
10 Conclusions
In contrast to the highly cooperative folding behavior characteristic of native globular proteins, simple model β-sheet systems, which lack defining tertiary interactions, do not possess this characteristic. The limited number of three- and fourstranded β-sheet peptides so far described confirms this, although some evidence for cooperative interactions between β-strands have been presented and rationalized on the basis of the entropic benefits associated with adding an additional β-strand to a preformed template. In all cases to date, it seems that designed threeand four-stranded β-sheet structures are in equilibrium with their partially folded β-hairpin components. This is an indication that the energy landscape is relatively flat unlike that for most native proteins where there is a significant energy difference between the single native structure and other conformations. It is interesting to look at examples of the smallest known β-sheet proteins and make comparisons with the designed systems described above. The WW domains [80–82] form a single folded motif consisting of three strands of antiparallel β-sheet but with well-defined tertiary interactions arising from folding back of the N- and C-termini to form a small compact hydrophobic core. This defining feature appears to allow the WW domains to fold co-operatively despite their small size (some as small as 35 residues) (Figure 15). Apart from the desire to be able to design molecules to order with specific tailored properties, the model βsheet systems described have enabled considerable progress to be made in testing our understanding of the basic design principles of β-sheets and fundamental aspects of weak interactions.
25
26 Design and Stability of Peptide $-sheets
Fig. 15 NMR structure of the WW domain of the formin binding protein (PDB code: 1EOI), consisting of a three-stranded antiparallel $sheet motif; the side chains of conserved residues are shown with tertiary contacts evident between W8 and P33.
References 1 HONIG, B. J. Mol. Biol. 1999; 293, 283–293. 2 BROCKWELL, D. J., SMITH, D. A., RADFORD, S. E. Curr. Opin. Struct. Biol. 2000; 10, 16–25. 3 BALDWIN, R. L., ROSE, G. Trends Biochem. Sci. 1999; 24, 26–76. 4 BALDWIN, R. L., ROSE, G. Trends Biochem. Sci. 1999; 24, 77–83. 5 CHAKRABARTTY, A., BALDWIN, R. Adv. Protein Chem. 1995; 46, 141–176. 6 MUNOZ, V., SERRANO, L. J. Mol. Biol. 1995; 245(3), 275–296. 7 O’NEIL, K. T., DEGRADO, W. F. Science 1990; 250(4981), 646–651. 8 HILL, R., DEGRADO, W. J. Am. Chem. Soc. 1998; 120(6), 1138–1145. 9 GELLMAN, S. H. Curr. Opin. Chem. Biol. 1998; 2, 717–725. 10 RAMIREZ-ALVARADO, M., KORTEMME, T., BLANCO, F. J., SERRANO, L. Bioorg. Med. Chem. 1999; 7, 93–103. 11 SEARLE, M. S. J. Chem. Soc. Perkin Trans. 2001; 2, 1011–1020. 12 KEMP, D. S., BOWEN, B. R., MEUNDEL, C. C. J. Org. Chem. 1990; 55, 4650–4657.
13 NOWICK, J. S., SMITH, E. M., PAIRISH, M. Chem. Soc. Rev. 1996; 401–415. 14 DIAZ, H., TSANG, K. Y., CHOO, D., ESPINA, J. R., KELLY, J. W. J. Am. Chem. Soc. 1993; 115, 3790–3791. 15 BLANCO, F. J., JIMENEZ, M. A., HERRANZ, J., RICO, M., SANTORO, J., NIETO, J. L. J. Am. Chem. Soc. 1993; 115, 5887–5889. 16 BLANCO, F. J., RIVAS, G., SERRANO, L. Nat. Struct. Biol. 1994; 1, 584–590. 17 BLANCO, F. J., JIMENEZ, M. A., PINEDA, A., RICO, M., SANTORO, J., NIETO, J. L. Biochemistry 1994; 33, 6004–6014. 18 COX, J. P. L., EVANS, P. A., PACKMAN, L. C., WILLIAMS, D. H., WOOLFSON, D. N. J. Mol. Biol. 1993; 234, 483–492. 19 SEARLE, M. S., WILLIAMS, D. H., PACKMAN, L. C. Nat. Struct. Biol. 1995; 2, 999–1006. 20 SEARLE, M. S., ZERELLA, R., WILLIAMS, D. H., PACKMAN, L. C. Protein Eng. 1996; 9, 559–565. 21 ZERELLA, R., EVANS, P. A., IONIDES, J. M. C. et al. Protein Sci. 1999; 8, 1320– 1331.
References 22 ZERELLA, R., CHEN, P. Y., EVANS, P. A., RAINE, A., WILLIAMS, D. H. Protein Sci. 2000; 9, 2142–2150. 23 SIBANDA, B. L., BLUNDELL, T. L., THORNTON, J. M. J. Mol. Biol. 1989; 206, 759–777. 24 HUTCHINSON, E. G., THORNTON, J. M. Protein Sci. 1994; 3, 2207–2216. 25 HAQUE, T. S., GELLMAN, S. H. J. Am. Chem. Soc. 1997; 119, 2303–2304. 26 STANGER, H. E., GELLMAN, S. H. J. Am. Chem. Soc. 1998; 120, 4236–4237. 27 RAMIREZ-ALVARADO, M., BLANCO, F. J., SERRANO, L. Nat. Struct. Biol. 1996; 3, 604–612. 28 De ALBA, E., JIMENEZ, M. A., RICO, M. J. Am. Chem. Soc. 1997; 119, 175–183. 29 RAMIREZ-ALVARADO, M., BLANCO, F. J., NIEMANN, H., SERRANO, L. J. Mol. Biol. 1997; 273, 898–912. 30 GRIFFITHS-JONES, S. R., MAYNARD, A. J., SEARLE, M. S. J. Mol. Biol. 1999; 292, 1051–1069. 31 MAYNARD, A. J., SEARLE, M. S. J. Chem. Soc. Chem. Commun. 1997; 1297–1298. 32 MAYNARD, A. J., SHARMAN, G. J., SEARLE, M. S. J. Am. Chem. Soc. 1998; 120, 1996–2007. 33 SWINDELLS, M. B., MACARTHUR, M. W., THORNTON, J. M. Nat. Struct. Biol. 1995; 2, 596–603. 34 FIEBIG, K. M., SCHWALBE, H., BUCK, M., SMITH, L. J., DOBSON, C. M. J. Phys. Chem. 1996; 100, 2661–2666. 35 SMITH, L. J., BOLIN, K. A., SCHWALBE, H., MACARTHUR, M. W., THORNTON, J. M., DOBSON, C. M. J. Mol. Biol. 1996; 255, 494–506. 36 GRIFFITHS-JONES, S. R., SHARMAN, G. J., MAYNARD, A. J., SEARLE, M. S. J. Mol. Biol. 1998; 284, 1597–1609. 37 JOURDAN, M., GRIFFITHS-JONES, S. R., SEARLE, M. S. Eur. J. Biochem. 2000; 267, 3539–3548. 38 NERI, D., BILLETER, M., WIDER, G., WUTHRICH, K. Science 1992; 257, 1559–1563. 39 FRANK, M., CLORE, G. M., GRONENBORN, A. Protein Sci. 1995; 4, 2605–2615. 40 BUCKLER, D., HAAS, E., SHERAGA, H. Biochemistry 1995; 34, 15965–15978. 41 SHORTLE, D. FASEB J. 1996; 10, 27–34.
42 KLEIN-SEETHARAMAN, J., OIKAWA, M., GRIMSHAW, S. B. et al. Science 2002; 295, 1719–1722. 43 SHORTLE, D., ACKERMAN, M. S. Science 2001; 293, 487–489. 44 De ALBA, E., RICO, M., JIMENEZ, M. A. Protein Sci. 1997; 6, 2548–2560. 45 WOUTERS, M. A., CURMI, P. M. G. Protein Struct. Funct. Genet. 1995; 22, 119–131. 46 HUTCHINSON, E. G., SESSIONS, R. B., THORNTON, J. M., WOOLFSON, D. N. Protein Sci. 1998; 7, 2287–2300. 47 ESPINOSA, J. F., MUNOZ, V., GELLMAN, S. H. J. Mol. Biol. 2001; 306, 397–402. 48 STANGER, H. E., SYUD, F. A., ESPINOSA, J. F., GIRIAT, I., MUIR, T., GELLMAN, S. H. Proc. Natl Acad. Sci. USA 2001; 98, 12105–12120. 49 COCHRAN, A. G., SKELTON, N. J., STAROVASNIK, M. A. Proc. Natl Acad. Sci. USA 2001; 98, 5578–5583. 50 KARSHIKOFF, A., LADENSTEIN, R. Trends Biochem. Sci. 2001; 26, 550–556. 51 KUMAR, S., NUSSINOV, R. Cell. Mol. Life Sci. 2001, 58, 1216–1233. 52 TAKANO, K., TSUCHIMORI, K., YAMAGATA, Y., YUTANI, K. Biochemistry 2000, 39, 12375–12381. 53 LOLADZE, V. V., IBARRA-MOLERO, B., SANCHEZ-RUIZ, J., MAKHATADZE, G. I. Biochemistry 1999, 38, 16419–16423. 54 CIANI, B., JOURDAN, M., SEARLE, M. S. J. Am. Chem. Soc. 2003; 125, 9038–9347. 55 RUSSELL, S. J., BLANDL, T., SKELTON, N. J., COCHRAN, A. G. J. Am. Chem. Soc. 2003; 125, 388–395. 56 MUNOZ, V., THOMPSON, P. A., HOFRICHTER, J., EATON, W. A. Nature 1997; 390, 196–199. 57 MUNOZ, V., HENRY, E. R., HOFRICHTER, J., EATON, W. A. Proc. Natl Acad. Sci. USA 1998; 95, 5872–5879. 58 HAGEN, S. J., HOFRICHTER, J., SZABO, A., EATON, W. A. Proc. Natl Acad. Sci. USA 1996; 93, 11615–11617. 59 DINNER, A. R., LAZARIDIS, T., KARPLUS, M. Proc. Natl Acad. Sci. USA 1999; 96, 9068–9073. 60 LACROIX, E., KORTEMME, T., LOPEZ DE LA PAZ, M., SERRANO, L. Curr. Opin. Struct. Biol. 1999; 487–493.
27
28 Design and Stability of Peptide $-sheets
61 SYND, F. A., ESPINOSA, J. F., GELLMAN, S. H. J. Am. Chem. Soc. 1999; 121, 11577–11578. 62 ESPINOSA, J. F., GELLMAN, S. H. Angew. Chem. 2000; 39, 2330–2333. 63 TATKO, C. D., WATERS, M. L. J. Am. Chem. Soc. 2004; 126, 2018–2034. 64 COLLEY, C. S., GRIFFITHS-JONES, S. R., GEORGE, M. W., SEARLE, M. S. J. Chem. Soc. Chem. Commun. 2000; 593–594. 65 HILARIO, J., KUBELKA, J., KEIDERLING, T. A. J. Am. Chem. Soc. 2003; 125, 7562–7574. 66 MA, B., NUSSINOV, R. J. Mol. Biol. 2000; 296, 1091–1104. 67 WANG, H., SUNG, S.-S. Biopolymers 1999; 50, 763–776. 68 HONDA, S., KOBAYASHI, N., MUNEKATA, E. J. Mol. Biol. 2000; 295, 269–278. 69 GRIFFITHS-JONES SEARLE, M. S. J. Am. Chem. Soc. 2000; 122, 8350–8356. 70 TATKO, C. D., WATERS, M. L. J. Am. Chem. Soc. 2002; 124, 9372–9373. 71 SHARMAN, G. J., SEARLE, M. S. J. Chem. Soc. Chem. Commun. 1997; 1955–1956. 72 SHARMAN, G. J., SEARLE, M. S. J. Am. Chem. Soc. 1998; 120, 5291–5300.
73 SCHENCK, H. L., GELLMAN, S. H. J. Am. Chem. Soc. 1998; 120, 4869–4870. 74 KORTEMME, T., RAMIREZ-ALVARADO, M., SERRANO, L. Science 1998; 281, 253–256. 75 De ALBA, E., SANTORO, J., RICO, M., JIMENEZ, M. A. Protein Sci. 1999; 8, 854–865. 76 DAS, C., RAGHOTHAMA, S., BALARAM, P. J. Am. Chem. Soc. 1998; 120, 5812–5813; DAS, C., RAGHOTHAMA, S., BALARAM, P. Chem. Commun. 1999; 967–968. 77 LOPEZ DE LA PAZ, M., LACROIX, E., RAMIREZ-ALVARADO, M., SERRANO, L. J. Mol. Biol. 2001; 312, 229–246. 78 FINKELSTEIN, A. V. Proteins 1991; 9, 23–27. 79 SYUD, F. A., STANGER, H. E., SCHENCK MORTELL, H. et al. J. Mol. Biol. 2003; 326, 553–568. 80 MACIAS, M. J., GERVAIS, V., CIVERA, C., OSCHKINAT, H. Nat. Struct. Biol. 2000; 7, 375–379. 81 KOEPF, E. K., PETRASSI, H. M., RATNASWAMY, G., HUFF, M. E., SUDOL, M., KELLY, J. W. Biochemistry 1999; 38, 14338–14351. 82 KANELIS, V., ROTIN, D., FORMAN-KAY, J. D. Nat. Struct. Biol. 2001; 8, 407–412.
1
Engineering Proteins for Stability and Efficient Folding Bernhard Schimmele, and Andreas Pl¨uckthun Universit¨at Z¨urich, Z¨urich, Switzerland
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The industrial, biotechnological, and medical applications of proteins are often limited by an insufficient protein stability or related problems. Such applications commonly require that proteins be produced on a large scale and remain stable enough to fulfill their functions for a reasonable length of time, often under harsh conditions. However, natural proteins are typically only marginally stable, and it is thus a major challenge for protein engineers to optimize stability and folding efficiency. The approaches that have been successfully employed to achieve this goal are rational design, semi-rational strategies based on sequence comparisons, and the methods of directed protein evolution. Of course, these methods are not mutually exclusive and can be combined to solve practical problems. All studies employing these methods have revealed important rules for protein engineering and at the same time shed light on the principles and mechanisms responsible for the folding and stability of proteins. Recent advances in stability engineering have demonstrated that merely small changes in a given protein sequence can have profound effects on its biophysical properties. The major challenge is therefore to correctly identify and remedy these shortcomings. It is the goal of this chapter to summarize the biophysical principles and technological approaches useful in improving the biophysical properties of proteins through sequence modification. Considering the enormous array of technologies involved in this endeavor, ranging from computer algorithms to selection technologies, it is not possible to give detailed experimental protocols in this chapter; instead, we will guide the reader to the cited literature.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Engineering Proteins for Stability and Efficient Folding
2 Kinetic and Thermodynamic Aspects of Natural Proteins 2.1 The Stability of Natural Proteins
Evolution does not per se provide proteins with high stability. In fact, stability is just one of many evolutionary constraints on proteins. Proteins have to fold to a defined structure with adequate yield in a reasonable time and then have to be just stable enough to perform their function over a certain period. There is no evolutionary incentive to make a protein any “better” than what is needed to fulfill its cellular functions. In contrast, the use of a protein in a formulation at high concentration, its prolonged activity at 37 ◦ C, and its large-scale expression and crystallization, just to name a few conditions, may put far higher demands on the protein than its natural environment. Thus, the natural sequence may not be able to provide these properties, but a mutant sequence may. Proteins exist and have evolved in order to fulfill a given function, and evolution drives the structural properties of a protein mainly towards increased functionality [1]. In fact, most proteins are only marginally stable, with Gfolding in the range of −20 to −60 kJ mol−1 . It is still a matter of debate whether this marginal stability is actually a “design feature,” e.g., to allow degradation at a certain rate, whether it is caused by the selection pressure towards higher functionality that may not be compatible with high stability, or whether it is just a side effect of the lack of selection for high stability. Function is often linked to higher structural flexibility in certain regions of a protein. Lower stability as a result of this higher local structural flexibility might therefore simply represent an adaptation to increased functionality [2]. If this were generally true, however, stability engineering would fail in most cases, as it would not be able to reconcile stability with preserved protein function. An alternative, more optimistic view for the protein engineer is that marginal stability can be interpreted as a result of genetic drift [3]. In other words, lower stability is not intended; but it simply does not matter, provided that function is maintained. Random mutations occurring during evolution are more likely to destabilize the structure of a given parental protein sequence than to stabilize it or be neutral. However, as long as this stability decrease is not sufficient to render the protein nonfunctional, these destabilizing mutations are likely to accumulate in the sequence. This tendency has also been referred to as “sequence entropy” [4]. As a consequence, stability engineering could be interpreted as the art of identifying these unfavorable mutations in order to reestablish a more stable sequence. The concept discussed above also sets the basis for the consensus approach to stability engineering, which is discussed in Section 3.1 Based on the physical principles of protein folding and a structure-based analysis of the interactions between amino acid residues, rational design can give hints as to which residues need to be altered to achieve a desired effect, and this is discussed in Section 3.2. The third focus will be set on the methods of directed evolution, which mimic the
2 Kinetic and Thermodynamic Aspects of Natural Proteins
mechanisms of Darwinian evolution to evolve proteins with enhanced folding and stability properties (see Section 4). 2.2 Different Kinds of “Stability”
Before discussing in detail different strategies for rendering a protein more stable, it is worth taking a closer look at some basic features associated with protein stability and folding properties. The term “stability” itself is rather vague, and its precise meaning has often been adapted depending on the problem being addressed. This leads to different definitions of “stability.” We will now briefly analyze the definitions and differences between thermodynamic, kinetic, and thermal stability, as well as folding efficiency, which is also sometimes discussed in this context. Even though these properties are interconnected, they are not equivalent. This has important implications for protein engineering, as it is difficult to predict in advance how a given mutation will influence each of these properties. These influences will in fact be different for any protein under investigation according to the free energies of the native and unfolded states and the folding intermediates, as well as the folding pathway and the respective rate constants. 2.2.1 Thermodynamic Stability Thermodynamics describes the global unfolding behavior of a protein. The corresponding thermodynamic stability G describes the differences between the free energies of the native (N) and the unfolded (U) states. Importantly, G is an equilibrium property for a reaction involving two or more states. The simplest model of an unfolding reaction is the equilibrium of a two-state unfolding reaction:
where kunfolding and kfolding are the rate constants of the respective unfolding and folding reactions. As a consequence, values for G can be deduced only if the described process is fully reversible and no intermediate states are populated to a significant extent, unless they are explicitly known and measured. G describes to what degree the two states are populated at a given temperature according to G unfolding = −RT In K
(1)
where K is the equilibrium constant of the unfolding process. The treatment of experimental data is much simplified in such a model, as stability is affected only by the free energies of the folded and unfolded states. Because G also provides an overall measure of all energetic contributions of interactions occurring upon protein folding, it is a convenient quantity for comparing the energetic effects of single amino acid replacements in a given folded structure. Because of the great scientific interest in deducing the principles of protein folding on a quantitative
3
4 Engineering Proteins for Stability and Efficient Folding
basis, many studies have been carried out with carefully selected model systems in which these strict requirements that allow the determination of G are fulfilled. Moreover, G represents an intrinsic quantity, at a given temperature in a given buffer, that is independent of the experimental setup and therefore reproducible in any case. The major problem in such studies, however, is confirmation that the observed transition can in fact be described as a fully reversible process, and denaturant-induced transition data alone are often not sufficient to allow reliable judgment. Only if data derived from measurements on the basis of different spectral probes (or better yet, additionally by differential scanning calorimetry, yielding the model-independent enthalpy change of unfolding) agree with each other are the deduced G values likely to be correct. Nevertheless, many useful conclusions can be drawn about the effects of mutations even if these strict requirements cannot be completely fulfilled in all cases. Another fact complicates the use of G as a measure for protein stability. Mostly as a result of the fact that the hydrophobic effect is the major driving force of protein folding [5], G itself is a characteristic, curved function of temperature. It is defined as G(T ) = G U −G N = H(T )−T S(T ) = −RT In K
(2)
emphasizing especially that the enthalpy change H is itself a function of temperature. This change of H with T can be described by the heat capacity change C p =
∂H(T ) ∂T
(3) p
Although often ignored, the temperature dependence of G should thus not be left out of the account if one is dealing with the stability engineering of proteins. The large heat capacity change upon the transfer of nonpolar solutes to water, which is the basis of the hydrophobic effect, results in a curved function of G versus temperature. By using the definition of Cp , G can be approximated as a function of T, the melting temperature T m , the enthalpy change at T m H(T m ), and Cp . G(T ) = (1−
T T )H(Tm ) + (T −Tm )C p −T C p In Tm Tm
(4)
If G(T) is plotted as a function of T, the curve increases at low temperatures and decreases at high temperatures (Figure 1). The temperature at which folded and unfolded states are equally populated, and thus G = 0, is called the melting temperature T m . The respective curvature of G versus temperature is strongly dependent on the change of heat capacity Cp upon unfolding. A mutation may change Cp , H(T m ), and T m in any combination, thereby altering the shape and the position of this curve. Higher thermodynamic stability at a given temperature (G(T)) can
2 Kinetic and Thermodynamic Aspects of Natural Proteins
Fig. 1 The complex relationship between the melting temperature and the free energy of folding. Schematic representations of the free energy difference between the folded and unfolded states of a protein, GN , as a function of temperature, T. A typical protein is shown in curve 1 (–––). The shape of this curve changes if a mutation affects C p ; H(T m ), and/or T m of the protein. In the examples shown, not just one but several of these parameters are changed. Higher thermodynamic stability at a given temperature can phenomenologically be the result of a curve upshift (. . .. . .. . .). Right-shifting the
curved function (----) results in a right-shift of the maximum of the GN function as well as in a shift of the melting temperature T m towards higher temperatures (T m ). If the GN function is flattened because of a small C p (. . .), a higher melting temperature (T m ) can result, even though the thermodynamic stability is actually decreased over a broad temperature range. All depicted curves represent extreme cases, and typically a combination of the described alterations will occur upon mutating a protein.
thus be achieved, e.g., by an upshift of this curve with constant maximum. Rightshifting of the curve will decrease G at lower temperatures but increase it at higher temperatures, which goes along with an increase of T m . Flattening of the curve due to a lower change of heat capacity upon unfolding may cause lower thermodynamic stability at most temperatures, even though T m is increased. These considerations contain important implications for the analysis of engineered mutants. First, a measured decrease of G at a certain temperature does not necessarily mean that at higher temperatures the thermodynamic stability might not have been increased. Second, by determining the change of G, no conclusions can be drawn about a potential change of the melting temperature T m . While they are typically related, a low G at a given temperature does not necessarily mean that T m will be decreased. T m and G must therefore be regarded as different properties. It should also be recalled that thermodynamic parameters, while easily reproduced, give no information about how long it will take until equilibrium is reached or, in the case of a nonreversible reaction, until a certain fraction of proteins is inactivated. Thus, thermodynamic stability does not necessarily provide information about whether a protein will meet the stability requirements for an intended application. T m itself often serves as a measure of protein stability, more precisely of thermal stability. In the literature, the expression “thermal stability” is again used with different meanings. In the described case, it represents the melting temperature for a reversible process. However, a complete thermodynamic analysis of the vast majority of proteins is not possible, because either intermediate folding states are
5
6 Engineering Proteins for Stability and Efficient Folding
populated or folding in the absence of denaturants that solubilize the unfolded state is not a fully reversible process. As will be discussed in the following section, thermal stability can nevertheless serve as a very practical means of describing protein stability, defined either as the transition temperature of an irreversible process or as the half-life of a protein under a given set of conditions. 2.2.2 Kinetic Stability In most cases, outside the biophysical research lab, protein unfolding is an irreversible reaction. Initially, the aggregation of unfolded molecules or of folding intermediates prevent the back-reaction of folding, and this is followed, after prolonged times at high temperature, by chemical inactivation or, in impure samples, proteolysis. In a simple model, kinetic inactivation can be described as
As discussed in the Introduction, one evolutionary constraint on proteins is that their three-dimensional structure remains viable for a certain period of time. In the case of non-equilibrium conditions and irreversible reactions, reaction kinetics becomes the important parameter. The folded state can resist high temperatures for a considerably long time if either the rate constant of inactivation or the rate constant of unfolding is sufficiently low. The rate constants are determined by the free energy of activation, e.g., the difference in energy between the folded state and the transition state of unfolding. Proteins can thus be kinetically stabilized by increasing this activation barrier. This kinetic stability is different in some crucial aspects from the thermodynamic stability mentioned above. Engineering of proteins for enhanced stability has to deal with both aspects. The best mutations for enhancing kinetic stability will not necessarily be the best mutations for enhancing thermodynamic stability, and not every mutation that increases thermodynamic stability will automatically have a positive effect on protein half-life. Kinetic stabilization is a common theme in nature, and there are several indications that many proteins from thermophilic organisms are indeed stabilized kinetically rather than thermodynamically [6, 7]. Another important example is that of proteins from the coats of viruses and phages that have to protect their genetic material under very adverse conditions [8]. In extreme cases the native state of a protein can even be less stable than its denatured state, but the native fold can still be kinetically trapped, and large kinetic unfolding barriers can provide the protein with an extremely long half-life [9]. There are different ways of describing and determining kinetic stability. Even if the reaction proceeds in an irreversible way, a practical “melting curve” can still be determined, and the observed transition is cooperative. The midpoint of this transition can serve as a practical means to compare the thermal resistance of different protein variants. Alternatively, one can use the half-life of the protein at a given temperature as an empirical means of stability. However, one has to be aware
2 Kinetic and Thermodynamic Aspects of Natural Proteins
that this will not reflect equilibrium conditions. The observed values are actually kinetic values and thus are not independent of the exact experimental conditions, such as the protein concentration, the heating rate, etc. A more thorough analysis of kinetic stabilization must include kinetic measurements to determine the respective unfolding rates at different temperatures [10, 11]. For medical applications, engineered proteins usually have to fulfill defined stability requirements such as the absence of aggregation in the formulation used, long-term stability, and prolonged activity at 37 ◦ C. The mere analysis of thermodynamic and kinetic parameters under defined reaction conditions in vitro is not always a reliable indicator of protein behavior under in vivo conditions. Mechanisms other than the intrinsic properties of the protein, such as proteolysis or aggregation with other proteins, can affect half-life. For practical utility, the half-life can therefore also be determined by measuring the percentage of molecules that retain their function after incubation under the respective conditions, e.g., in human serum at 37 ◦ C for several days [12]. 2.2.3 Folding Efficiency A different issue of kinetics is related not to unfolding but to the folding of the protein in vivo. More precisely, the question is, which percentage of a protein will actually fold to the native state, as opposed to going to misfolded states, soluble aggregates, or inclusion bodies? Even though the folding efficiency in vivo is not directly related to stability, correlations can often be observed [13]. Additionally, the efficiency of protein folding in vivo is usually the predominant factor influencing the expression yield and is therefore also crucial for large-scale production of functionally intact proteins. Importantly, many mammalian proteins with a high potential for medical applications, especially those secreted or expressed on the cell surface, can rely on the complex folding machinery of the eukaryotic cell to reach their final native state, and the secretory quality-control system of eukaryotic cells allows discrimination of proteins by their folding behavior [14]. Moreover, they usually do not need to be expressed in high amounts in their native physiological context. The resulting lack of selection pressure on their efficiency of folding during evolution is likely to be one of the causes for the difficulties often observed when attempting their overexpression. Unfortunately, these tend to be the proteins of greatest pharmacological utility. Despite the fact that thermodynamic stability underlies folding efficiency, the kinetic partitioning into productive folding or aggregation is influenced by many different factors. For illustrative purposes, we can again describe this by a very simple scheme:
Folding intermediates are often the source of aggregation, and the overall folding efficiency will therefore depend mainly on the nature of these intermediates. This includes the free energy of the folding intermediates themselves, as well as their
7
8 Engineering Proteins for Stability and Efficient Folding
half-lives and any efficient pathways to aggregation. in vivo, the situation becomes much more complex, as additional parameters such as interactions with cellular components and chaperones or degradation by host cell proteases come into play. The final output of a properly folded protein will therefore depend on all of these kinetic competitions [15]. Protein expression in the bacterial cytoplasm in many cases shows correlations between soluble expression yield and thermodynamic stability of the protein [16, 17]. Additional complications can arise from transport steps. For example, the expression of proteins in the periplasm of E. coli is dependent on the prior transport of the polypeptide chain through the inner membrane, and the folding yield is subsequently influenced by the folding and aggregation reaction in the periplasmic space as well as by interactions with periplasmic factors such as chaperones and proteases. In some cases, mutations that show positive effects on in vivo folding yield have no influence on the overall stability of the protein [18]. Conversely, mutations that strongly increase thermodynamic stability sometimes result in lower folding yields [19]. Nevertheless, many mutations act synergistically on both properties, because they are likely to reduce the free energy of the folded state as well as the free energy of folding intermediates and thereby lower the energetic activation barrier to folding [13].
3 The Engineering Approach 3.1 Consensus Strategies 3.1.1 Principles As discussed in the preceding sections, marginal protein stability is likely to be a side effect of “sequence entropy” occurring during natural evolution, because the major driving force of evolution is positive selection towards an enhanced functional property, while stability has to be maintained at only a minimum level to secure function. Mutations are likely to occur in a random fashion during this process; the probability that a mutation will have a stabilizing effect on the protein is very low, whereas the probability that the mutation will have destabilizing effects is very high. However, as long as the remaining amino acid sequence is still able to fold into a given structure and the overall domain stability does not fall below a certain threshold, the resulting protein sequences will not be eliminated during the course of evolution [20]. Destabilizing mutations are therefore often selectively neutral and thus accumulate in a given parental sequence. The same should also be true for folding efficiencies. Most proteins are not needed at high concentrations or may even become harmful to the organism in such a case. Similar to stability, folding yield is selectively neutral, provided that the minimum level for cellular function is maintained. This sets the basis for a semi-rational approach to protein stabilization, which is called the consensus approach [21] and is based on sequence statistics. Because mutations occur randomly, the distribution of amino acids at a given position in
3 The Engineering Approach
a set of homologous proteins can be described, in a very crude approximation, by Boltzmann’s law. The consensus approach assumes that at a specific position in a sequence alignment of homologous proteins, the contribution of the respective consensus amino acid to the stability of the protein is on average higher than the contribution of any non-consensus amino acid. Replacement of all non-consensus amino acids in a sequence by the respective consensus amino acid should therefore increase the overall stability of the protein. Obvious advantages of the consensus approach are that it is comparatively simple and is not strictly dependent on structural information at high resolution. The prerequisite for building a non-biased consensus is the availability of sequences homologous to the protein under investigation. The number of sequences should be large enough to make the sequence statistics reliable and to exclude bias in the resulting consensus sequences. Figure 2 shows an alignment of homologous sequences of single repeat modules, the smallest structural entity of a class of proteins known as leucine-rich repeat (LRR) proteins. Because the length of these modules varies among the different classes of LRR proteins – influencing their topology – only repeat modules of a length of 24 amino acid residues have been used for this alignment. The probability of each amino acid occurring at a given position is calculated to derive a consensus sequence, representing the most frequently occurring amino acid residue at each position. The distribution of residues at each position can provide information on structurally forbidden residues and allows weighing the consensus with respect to variability. In most cases, the consensus will contain the residues important for defining the structure of the proteins. In the case of an enzyme family, it will also include the “functional” residues, i.e., those of the active site. In the case of binding proteins, such as antibodies or repeat proteins, the “functional” residues (those involved in binding) are not conserved but are different for each individual molecule, which has to adapt to its target. Although at first sight no sophisticated structural analysis seems to be required, this is true in only the simplest of cases, where a single family of related sequences can be represented by a single consensus. Frequently, multiple families have emerged that use mutually incompatible solutions of packing. A good example is that of antibodies for which subgroup-specific consensus building has been very fruitful [22]. Averaging over all families would simply yield the consensus of the most-represented family and, if they are equally represented, may result in mutually incompatible residues. Thus, structural analysis can be very helpful in deciding whether an “averaging” of different sequence families is permissible or not. Because of this problem of interacting residues, a simple averaging may lead to incompatible pairs; therefore, these residues should be changed only as groups. The danger of disrupting these interactions by substitutions with consensus residues is especially high in cases where a very broad set of sequences is used for the alignment. To minimize this risk, the sequence statistics can be extended by analysis of covariance in order to derive probabilities that describe the joint occurrence of amino acid residues at two defined positions [23]. As explained above, before deriving a consensus, the aligned sequences can be divided into subclasses, which are likely to contain interacting pairs or groups of residues. Certain
9
10 Engineering Proteins for Stability and Efficient Folding
Fig. 2 Example of the analysis for deriving and analyzing a consensus sequence. (a) From a sequence alignment of 3077 sequences of 24 amino acid LRR motifs of the Pfam database (http://www.sanger.ac.uk/Software/Pfam/),
the relative frequencies of amino acid residues at each position are calculated. (b) Based on the relative frequencies calculated from the alignment, a consensus sequence can be derived representing the most frequently occurring residue at each position.
3 The Engineering Approach
variations of residues involved in salt bridges, distinct hydrogen-bonding patterns, or packing of the hydrophobic core are characterized by complementary changes between these subclasses, with mutations to a certain residue at one position being compensated by a mutation to a complementary residue at another position. The subclasses can be built either by being based on sequence homology alone or by including structural information if available. Even though the definition of these subclasses is always dependent on the homology cutoff set by the investigator, a simple dendrogram analysis can be used to group the complete set of sequences into distinct families. Additionally, by building the consensus sequence of each family separately, followed by a comparison of these consensus sequences, distinct structural features of each group can be recognized in some cases. The consensus concept has been applied successfully to a large variety of proteins and structural motifs to date. Important lessons for an effective application of the consensus approach can be learned from these studies, and we will therefore give a few examples to briefly discuss some of the advantages and limitations of the method. 3.1.2 Examples There is always the concern that the stabilizing and destabilizing effects of introduced mutations will counterbalance each other and that the overall change in protein stability will be small. Steipe et al. [20] applied the method for the first time on immunoglobulin VL domains and predicted 10 potentially stabilizing mutations. Six mutations were indeed stabilizing, three had no effect, and one was destabilizing. When applied to GroEL mini-chaperones, 34 predicted amino acid replacements were individually checked, out of which 13 were stabilizing, five showed no effect, and 16 were destabilizing [24]. Lehmann et al. [25] extended the approach to an entire protein. In a first set of experiments, 13 homologous sequences of
←-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Fig. 2 (Continued) The color code is given on the right hand side of the panel. Residues occurring in more than 80% of all sequences at the respective position are colored in red. Based on these values, consensus sequences with a given homology threshold can be derived (not shown), i.e., with a higher threshold, more positions will be “undefined.” (c) Preferences at a given position are reflected not only by the absolute frequency but also by the total number of different amino acid types occurring at each position (residue variability). As an example, the relative frequencies of each amino acid are shown for the highly conserved position 2 and the highly variable position 4. By plotting the relative frequencies of amino acids at a particular position,
preferences for certain amino acid types as well as “forbidden” residues can be identified. At position 2, a strong preference for leucine can be observed, and the occurrence of other residues is restricted mostly to hydrophobic side chains. At position 4, all residues are “allowed” except for proline, which is “disallowed” due to secondary structure propensity reasons. (d) By normalizing the relative frequency of the consensus amino acid with the number of “allowed” residues at a given position, a modified consensus sequence can be deduced. The variability V is calculated according to V = 100×N/F, where N is the total number of different residue types occurring at each position and F is the frequency of the most frequent residue in percentage.
11
12 Engineering Proteins for Stability and Efficient Folding
a fungal phytase were used to build the consensus, and the resulting consensus enzyme showed an increase of 15–22 degrees in unfolding temperature and an increase of the temperature optimum for catalysis of 16–26 degrees compared with each of its parents. In a second set of experiments, additional mutations were predicted by simply adding more sequences to the alignment. By checking the effects of the individual mutations on thermal stability and combining mutations with positive effects, the unfolding temperature could be increased by an additional 21 degrees to 90 ◦ C. No loss of catalytic activity of the enzyme was observed in any case. This work showed that the number of sequences used for the alignment is indeed an important factor because it might help to optimize ambiguous positions. In addition, the observed large increase in thermal stability could not be attributed to the effect of one single amino acid substitution but rather to the synergistic effects of many replacements. Even though these and other studies show clearly that the consensus approach allows one to predict stabilizing mutations with a rather high success rate in a rapid way, the effect of each predicted mutation carries some uncertainty, and it is possible that some may contribute destabilizing effects that can counterbalance the stabilizing ones. However, the effects of stabilizing mutations were often found to be additive. Therefore, instead of examining each mutation individually, it is often useful to combine groups of “rather certain” mutations and others that are more speculative. The application of the consensus concept to families of repeat proteins [23, 26–28] represents, in some respects, a special case due to a number of favorable features. It can nevertheless illustrate the importance of some principles of the approach. The non-globular fold of repeat proteins consists of repeated motifs of 20–40 amino acids. Several results indicate that consensus repeat proteins are indeed much more stable than natural repeat proteins [29, 30]. Repeat proteins might represent an extreme case in which the principles of the consensus approach become very apparent. The structural entities (the repeat modules) are small and thus each protein contributes several repeats to the databases; the number of available sequences consequently becomes very large compared with other proteins. Therefore, a consensus sequence for a single repeat module can easily be assigned and ambiguous positions will occur with lower frequency in the statistical output (Figure 2). In addition, interactions that are present within or between several repeats will add to the free energy of folding multiple times, while problem spots would equally be potentiated. Therefore, effects on stability are likely to be consecutively added by introducing additional modules to the array. Even though more-detailed analyses are still needed, initial results pinpoint some of the structural reasons for the stability gains observed upon building a consensus in each of these studies. The regular arrangement of structural motifs gives rise to a more regular H-bonding pattern with a higher number of inter- and intra-repeat H bonds [30]. Loop insertions in natural repeat proteins that are likely to result in more flexible local regions are removed, thereby eliminating local centers of unfolding. In the left-handed helical and disallowed regions of the Ramachandran plot, glycine residues are always present in the consensus proteins, while they are avoided in other places by this design, where their flexibility is not needed and may be harmful to stability.
3 The Engineering Approach
3.2 Structure-based Engineering
Structure-based engineering relies on a detailed analysis of 3D structures, followed by site-directed mutagenesis. We avoid the term “rational” engineering, as it would elicit expectations of perfect predictability and implicitly suggest that all other approaches are free of logical reasoning. In structure-based engineering, positions have to be identified at which suboptimal amino acids in the original sequences lead to a loss of stability. Subsequently it needs to be specified which amino acids should be introduced as a replacement. The ab initio prediction of protein structure, however, is still not a feasible task due to the multitude of potential interactions within the protein and between protein and solvent, which leads to an extremely high number of possible conformations and intermediates of similar energy [31]. Hence, a prerequisite for structure-based engineering is, next to experimentally determined structures, usually the existence of a large experimental data-set within a group of structural homologues that can be used as a basis for predictions. High-resolution structures are necessary to allow the estimation of possible conformational, energetic, and steric influences upon replacement of particular amino acid side chains and can thus help to avoid unfavorable strain in the resulting mutants. Because of the present efforts in the field of structural genomics, these structure-based approaches are likely to become even more important in the future. With this structural information in hand, the goal of designing more stable variants is then to pinpoint particular regions and positions associated with possible stability defects and to subsequently find a better solution to the problem. In contrast, semi-rational approaches like the previously discussed consensus concept are rather crude methods for introducing stabilizing features. Structural and energetic analysis can be used to reexamine the changes proposed by the consensus approach and to fine-tune the system to reintroduce structural features that might have been lost in the averaging process. Proteins of hyperthermophilic organisms have been of special interest for examining the structural mechanisms of thermostabilization and have been contemplated as guides for the engineering of “problem” proteins for better properties. From a phenomenological point of view, the basis of increased thermostability is frequently set by a flattened G-versus-T curve that is due to a smaller change of heat capacity upon unfolding (Figure 1) or by a kinetic stabilization that is due to a strong decrease in the rate of unfolding. The crucial question is, however, what the molecular differences are that give these proteins their favorable properties. Genome-wide comparisons between hyperthermophilic and mesophilic organisms with respect to amino acid composition did not yield any obvious common rules of how these effects are achieved [32]. Therefore, hope was placed on the increasing number of pairwise high-resolution structure comparisons of thermophilic proteins with their mesophilic counterparts. While they provided a more differentiated picture, a “global” rule still could not be derived. A few highly specific mutations are often enough to provide considerably stabilizing effects, but the additive effect of many small contributions, none of them dramatic by itself, may be the usual case.
13
14 Engineering Proteins for Stability and Efficient Folding
Moreover, rather than relying on one universal strategy, nature utilizes a variety of strategies for the thermal adaptation of proteins [33]. In fact, the list of stabilizing structural features in hyperthermophilic proteins reflects the diverse principles of protein stability and folding that protein engineers try to exploit and that will be discussed in this section. High-resolution structures of thermophilic proteins can thus provide a detailed view of how nature implements these principles to create proteins of higher stability [34]. However, the lack of a unifying “rule” and the multitude of strategies nature uses provide an important lesson for protein engineers. Depending on the protein under investigation, the strategy of choice can be different, and even for a given protein there may be more than one optimal solution to the problem. Before choosing from the available set of strategies, the focus should therefore be on identifying potential “weak points” responsible for stability defects in a given protein structure. We will now discuss some structural features associated with protein stability as well as strategies for altering these features towards more favorable biophysical properties. The given list of course lays no claim to completeness but should point out some important principles. Any replacement in a given sequence may have multiple effects on protein stability, and destabilizing effects can often outweigh the stabilizing ones. An assessment of potential destabilizing effects is therefore crucial. Wherever possible, references are made to the different forms of “stability” discussed in the Introduction. In the case study provided at the end of this section, an example will be given to demonstrate how consensus approaches and structural analysis can be combined to yield useful results. 3.2.1 Entropic Stabilization An obvious strategy for increasing the free-energy difference between the folded and unfolded states, and thus the thermodynamic stability, is to decrease the entropy of the unfolded state. The underlying concept is to decrease the flexibility of the polypeptide chain, usually by introducing an additional intrachain linkage. Such entropic stabilization has become a common strategy for protein engineers. The prerequisite for success is that the mutations rendering the unfolded protein less flexible do not introduce unfavorable strain in the folded three-dimensional structure or result in any steric incompatibilities [35]. We now discuss several ways to achieve an entropic stabilization. 3.2.1.1 Introduction of Disulfide Bridges The introduction of additional disulfide bridges is a straightforward way of establishing an intrachain linkage to reduce the entropy of the unfolded state [36, 37]. The magnitude of the entropic effect is thought, as a crude approximation, to be proportional to the logarithm of the number of residues between the two bridged cysteines [38]. The spatial distance between the residues to be replaced with cysteines has to be evaluated with care in the model of the folded protein in order to prevent perturbations of the native structure upon formation of the disulfide bridge. It should be noted, however, that the energetic effect of additional disulfide bridges is not only entropic but also of a far more complex nature, giving rise to entropic as well as enthalpic contributions
3 The Engineering Approach
to the change in the free energy of folding [39, 40]. For example, an additional decrease in the free energy of folding can result from the reduced solvation energy of the unfolded state [40]. In contrast, a reduced solvation energy of the folded state would have the opposite effect, while residual structure in the denatured state would again push the equilibrium to the side of the folded protein. Because the disulfide bond itself is hydrophobic in nature, it is often engineered into the interior of the protein. This is not an easy task, as it can negatively affect core packing. Even though there are several examples of successful protein stabilization by introducing artificial disulfide bridges [41, 42], the complex energetic effects can also cause a destabilization of the protein [43]. Furthermore, the introduction of additional cysteines often results in a rather drastic decrease in folding efficiency, because incorrect and intermolecular disulfide formation can remove large portions of expressed protein by aggregation. Since disulfide formation does not occur in the cytoplasm, secretion to the bacterial periplasm or to the eukaryotic ER is required for functional expression, usually associated with lower yields than for the production of cytoplasmic proteins. Alternatively, if the protein is produced by refolding, redox conditions have to be adjusted, which can be difficult if the native protein also contains free cysteines. 3.2.1.2 Circularization An alternative approach with the same underlying concept is the circularization of proteins by fixing the loose N- and C-termini via a peptide bond. In addition to the entropic effect, the fixing of the loose ends can prevent local unfolding events occurring at the termini and thereby kinetically stabilize the native structure. With the discovery of inteins, which mediate protein-splicing reactions, a tool that allows the directed formation of peptide bonds between ends fused to different parts of the intein became available. Intein-mediated protein ligation has been used to covalently link the termini of β-lactamase, a protein that is especially amenable to this strategy due to the close proximity of its N- and C-termini [44]. In accordance with polymer theory, the thermal stability of the protein was enhanced by about 6 degrees, from 45 ◦ C to 51 ◦ C. For circularized DHFR [45] an increased half-life at elevated temperature was observed. The close proximity of the termini is of course a prerequisite for this procedure, and the stabilizing effect is likely to become marginal if the loose ends are linked via long unstructured loops. Similar to the situation upon introduction of artificial disulfide bridges, destabilizing enthalpic effects may negate the favorable entropic contribution [46]. In addition, low protein ligation efficiencies and difficulties in separating circular from linear forms of the protein often cause additional technical challenges. It remains to be seen whether this technology is robust enough for biotechnological or biomedical applications. 3.2.1.3 Shortening Solvent-exposed Loops Short, solvent-exposed loops are rather fixed in the native state, but a comparably large number of additional conformations become accessible in the unfolded state, while long loops have a large number of conformations also available in the native state. Thus, the shortening of loops should in principle lead to a relative decrease in the loss of conformational entropy upon folding. Conversely, increasing the loop lengths by insertion of glycine residues into
15
16 Engineering Proteins for Stability and Efficient Folding
the loops of the four-helix bundle protein Rop has indeed resulted in a strong and continuous decrease in thermodynamic stability [47]. In addition, loop shortening can have the effect of abolishing hot spots of local unfolding events and may result in kinetic stabilization. Even though it has become obvious that loop shortening or tying down of loops by external interactions is a common theme in thermostable proteins of thermophilic organisms [48, 49], the strategy is often hard to realize for a given protein target, as the danger of introducing additional strain in the native state is high, and solvent-exposed loops are often important with respect to function. 3.2.1.4 Reduction of Chain Entropies By considering the conformational entropies of amino acid side chains, another strategy for decreasing the entropy of the unfolded state becomes apparent. Because of the five-membered-ring nature of the proline side chain, it not only restricts the possible conformations of the preceding residue but also can adopt only a few conformations itself. It therefore has the lowest conformational entropy of all amino acids [50]. In contrast, glycine, which has no side chain, has the highest conformational entropy. Substitutions of non-glycine residues with proline or the replacement of glycines by other residues should therefore reduce the entropy of the unfolded state. Positions that allow substitutions with proline are, however, very rare. Because proline is poorly compatible with α-helices and incompatible with β-strands, the position of a new proline must not be part of these secondary structure elements. At most positions in the native structure, the respective torsion angles will be incompatible, and the mutations are thus very much restricted to loop and turn regions. Again, care should be taken not to remove any favorable interactions of the replaced amino acid side chain [51]. In order to examine in advance whether the respective site is permissive for a substitution with proline, the dihedral angles of the site can be checked and should lie in the range of φ/ −50 to −80/120 to 180 or, alternatively, −50 to −70/−10 to −50. Similar restrictions apply for the replacement of glycines with any other residue. In many cases this will create steric overlaps, and such negative structural crowding effects can outweigh the positive energetic benefits. 3.2.2 Hydrophobic Core Packing Exposed residues are often directly involved in ligand or substrate binding and therefore often play a functional role. In contrast, the residues of a protein’s interior usually play mostly a structural role, and the associated hydrophobic effect is thought to be the main driving force of protein folding and thermodynamic stability. In known structures the core residues fill almost the entire interior space, provide many favorable van der Waals interactions, and maximize hydrophobic stabilization by exclusion of the solvent. In principle, an increase in thermodynamic stability of 4–8 kJ mol−1 can be achieved for each additional methylene group buried [52]. Paradoxically, the importance of the hydrophobic effect for folding and stability of proteins simultaneously limits its applicability for protein engineering. Because
3 The Engineering Approach
the hydrophobic core is already densely packed in almost all native proteins, most changes here will create over-packing or packing defects, causing an overall destabilization rather than an improvement in stability. In addition, even subtle changes of core residues can lead to a rearrangement of external residues and thereby alter the functional properties of the protein [53]. Care should also be taken to avoid the introduction of conformational strain by the mutation of core residues, as destabilizing effects from a strained conformation can sometimes compensate the energetic gain of an increase in buried hydrophobic volume [54]. Improvement of core packing must therefore be based on analyses made from high-resolution structural information in combination with sequence comparisons. This allows one to specifically look for cavities in the core that indicate imperfect packing. If the hydrophobic surface area around the cavity is large, additional van der Waals interactions can be provided by the introduction of sterically fitting alkyl or aryl groups from hydrophobic side chains, thereby decreasing the size of the cavity [55, 56]. 3.2.3 Charge Interactions Oppositely charged amino acid residues, if appropriately positioned, have the potential to form salt bridges, whereas like-charged residues lead to repulsions. The magnitude of the effect of charge-charge interactions on overall protein stability is still a matter of discussion [57]. In the case of ionic interactions between side chains buried in the hydrophobic core, the high energetic cost of transferring charged ions from aqueous solution to the low-dielectric interior of the protein also has to be taken into account. If a single charge were buried in a protein, which would be extremely rare in a natural protein, the design of an ion pair would be very attractive. However, if a hydrophobic pair were to be replaced by an ion pair, the resulting energy would have to be higher than the loss of the previous pair plus the cost of burying the charge. Nevertheless, the high contribution of buried salt bridges to the overall stability of the native protein structure underlines their potential for introducing additional stability [58]. The optimal spatial arrangement of the interacting side chains and their respective charges is, however, crucial. Moreover, buried charged side chains are often not only part of interacting charge pairs but also part of complex charge clusters built from many side chains, which are able to magnify the effect. Similar rules apply for the interactions of charged residues on the protein surface. Only the perfect arrangement of charges seems to be able to make up for the desolvation penalty that has to be paid upon formation of a salt bridge. The effect on thermal stability, however, can be drastic. Increasing the free energy of unfolding by the changing of charges includes maximizing the number of salt bridges and, equally important, the removal of repulsive interactions [59], which are not uncommon in natural proteins. Predictions on a structural basis can be difficult due to the often higher flexibility of side chains on the protein surface, but simple models that allow predictions about potential stabilizing and destabilizing surface charges can be used [60]. The key is to consider not only nearest neighbors but also a whole network of charges that have to optimally interact and avoid repulsions.
17
18 Engineering Proteins for Stability and Efficient Folding
Because the surface charge distribution can have a huge impact on stability, but is defined by many residues at different positions, these residues are also a valuable target to be combined with selection techniques as discussed in Section 4. A special case of electrostatic interaction is the “helix dipole.” By reducing the net partial charges at the helical ends through placement of side-chains, which productively interact with the helix dipole, the helical structure is stabilized [61]. Introduction of negatively charged residues at the N-terminal end and positively charged residues at the C-terminal end leads to this stabilizing effect. The provided stability gain is, however, marginal (less than 4 kJ mol−1 ). 3.2.4 Hydrogen Bonding There was no initial reason to believe that intrachain hydrogen bonds in the native state would be more energetic than those of the unfolded chain to water [5]. By including terms of entropy change of the solvent and additional van der Waals interactions upon polar group burial, however, the positive contribution of hydrogen bonding to protein stability has become generally accepted [58, 62]. Despite this ongoing discussion, hydrogen-bonding patterns are a highly valuable target for the stability engineering of proteins. Because engineering deals with improving folded proteins, the major concern has to be how to satisfy the existing hydrogen-bonding network in a structural context. The basis for this endeavor is structural information of high resolution, and the most lucrative goal is to identify potential residues that represent buried but unsatisfied donors of hydrogen bonds. Site-directed mutagenesis of a nearby residue to provide a hydrogen-bonding acceptor can cause a stability gain in the range of 2–10 kJ mol−1 , depending on the geometry and other compensating effects [63, 64]. A special structural context that can provide significant additional stabilization by either hydrogen bonds or ionic interactions is the anchoring of relatively loose structural elements like loop structures or the N- and C-termini, thereby tightening “hot spots” of local unfolding [49, 65]. 3.2.5 Disallowed Phi-Psi Angles The stereochemistry of the polypeptide backbone can be defined by the dihedral angles φ and , and any individual residue in a structure is defined by a single set of φ, values. For conformational analysis of protein structures, the Ramachandran plot representing the dihedral angle space is an excellent starting point [66]. The φ, values of amino acid residues in protein structures usually reside in three preferred or “allowed” regions of the Ramachandran plot, called right-handed helical, extended, and left-handed helical. The right-handed and extended conformations correspond to α-helix and β-strand secondary structures, respectively, and the vast majority of non-glycine residues lie within these two regions. The left-handed region corresponds to structural features at the termini of secondary structure elements and describes regions involved in the reversal of the polypeptide chain. There is a high preference in this region for glycine residues, as the β-carbons of non-glycine residues can sterically interact with the polypeptide backbone, resulting in unfavorable energies. In some cases, the substitution of non-glycine residues
3 The Engineering Approach
by glycines in the left-handed helical region increases thermodynamic stability (up to 8 kJ mol−1 in RNase H) [67]. Although the strict introduction of glycines in such positions is an important point to consider for de novo protein design, it does not represent a general rule for stabilizing the native states of proteins. In some cases, the energy penalty for the accommodation of unfavorable strain can be offset by lost unfavorable or new favorable local interactions, such as hydrogen bonding or hydrophobic interactions. Some replacements of this kind can therefore even lead to destabilization rather than stabilization [68]. The same rules apply in principle to the disallowed regions of the Ramachandran plot. Steric clashes result in a high energetic cost in the folded structure, especially for non-glycine residues in all disallowed regions. Stabilizing mutations to glycine have been introduced with energy gains of up to 18 kJ mol−1 [69]. It has been noted, however, that certain non-glycine residues also have propensities to occur in the disallowed regions – such as Asn and Ala in the type II turn region– and the energetic cost of their occurrences is often low [70]. Other residues found in the disallowed regions are small polar residues [71] that compensate for the energy cost by making additional hydrogen bonds. Unfavorable conformations often occur in very short loops [72], where the rest of the structure may constrain the loop efficiently, and interactions with the solvent may also offset the energy costs. Conformational analysis by the Ramachandran plot therefore provides a convenient and fast way to assess possible conformational strain in the tertiary structure associated with particular target residues. However, a close inspection of possible side chain interactions is required, and the analysis should be extended to identify potential compensating features of the residues to be replaced. 3.2.6 Local Secondary Structure Propensities Effects on the overall stability of a protein can also be influenced by the respective secondary structure propensities of amino acid residues for the α-helical and βsheet conformations. However, these effects are usually marginal. Nevertheless, even at the expense of other favorable trends, such as avoiding the exposure of hydrophobic side chains to the solvent, a given residue is often favored at a certain position due to its secondary structure propensity [73]. If a particular secondary structure element does not form efficiently, many interactions between this element and other parts of the protein can be lost. The energetic consequences and the influences on folding efficiency are, however, hard to predict at this point. 3.2.7 Exposed Hydrophobic Side Chains The removal of exposed hydrophobic side chains increases the polar surface of a protein. Such mutations are not likely to affect thermodynamic stability but presumably do affect the folding efficiency by influencing the rate of aggregation of intermediates. Interestingly, they also do not affect the solubility per se (the amount of native protein that can be dissolved in buffer) and seem to act mostly on folding intermediates [74]. Because of lateral interactions, e.g., between neighboring loops, partially exposed hydrophobic amino acids can even increase stability. It must be kept in mind however, that not only the aggregation pathway, but also the stabilities
19
20 Engineering Proteins for Stability and Efficient Folding
of folding intermediates are an important parameter for kinetic stability, which can possibly be influenced by the existence of exposed hydrophobic side chains. Moreover, hydrophobic cavities at the protein surface are often involved in specific binding functions of the protein, and the removal of these “functional hot spots” has to be avoided. Therefore, the hydrophobicity of the protein surface must be carefully balanced. 3.2.8 Inter-domain Interactions In proteins consisting of more than one domain, additional principles come into play. The overall stability not only reflects the intrinsic stabilities of the single domains but is also influenced by the stability of the interface of the domains in cases where they are interacting with each other. Stabilization of this interface can be achieved mainly by increasing and optimizing the hydrophobic surface area of the interface. Because hydrophobic side chains at the interface are usually exposed during folding and transient opening of the domain interface, a tradeoff between interface stability and folding efficiency is often observed. Additional dramatic effects on stability can be observed in cases where the two domains exhibit very different intrinsic stabilities [75]. In such a scenario, the unfolding of the protein and the loss of function are strongly related to the unfolding of the less stable domain [76]. An important aspect of kinetic stabilization can be observed in twodomain proteins, where one domain can slow down the unfolding, and therefore the aggregation, of the other. Thus, the native state becomes kinetically stabilized in the assembly and a stable domain interface reduces the extent of its transient openings and, thereby, the resulting exposure of hydrophobic patches that would favor aggregation. Alternatively, covalent cross-links (e.g., disulfide bonds) between the interfaces of multimeric proteins can be introduced. For example, the introduction of disulfide bonds between the interfaces of Lactobacillus thymidylate synthase not only increases their thermal stability but also leads to reversible thermal unfolding [77]. 3.3 Case Study: Combining Consensus Design and Rational Engineering to Yield Antibodies with Favorable Biophysical Properties
The following example illustrates how consensus approaches, rational design, and experimental data can be combined in a synergistic fashion to iteratively optimize biophysical properties. The smallest form of an antibody able to bind the antigen in the same manner as the whole IgG consists of two domains, the variable domain of the heavy chain (VH ) and the variable domain of the light chain (VL ). Both domains interact with each other via an interfacial region of highly hydrophobic character. In single-chain Fv (scFv) antibody fragments, the two domains are covalently linked by a flexible linker region of typically 15–20 amino acids (Figure 3a). The binding site for the antigen usually involves three loop regions in each of the domains, named complementarity-determining regions (CDRs). Antibodies that are based
3 The Engineering Approach
Fig. 3 (a) Structure of an antibody Fv fragment consisting of the variable heavy chain (VH ) and the variable light chain (VL ). Each domain VH and VL is characterized by three hydrophobic core regions (upper [green],
central [yellow], and lower [orange] core) and a charge cluster at the base of each domain (red). Even though the residues defining these regions are conserved within the same germ-line family, these sequence
21
22 Engineering Proteins for Stability and Efficient Folding
on human antibody sequences possess great potential for many medical applications, either directly as an antibody fragment or after reconstructing an IgG [78]. Antibody fragments can be expressed in a convenient manner in E. coli, thereby providing rapid access to these proteins [79]. Nevertheless, there are often drastic differences between individual antibodies concerning their expression yield and their stabilities. Ideally, the recombinant antibodies would all provide favorable biophysical properties. The starting point for such recombinant antibodies is a library. One fully synthetic library of this kind, the human combinatorial antibody library (HuCAL), was designed based on the consensus concept [22]. Based on human antibody germline sequences, several sequence families were created. Importantly, instead of averaging over all possible human antibody sequences, the consensus sequences were built for each family separately, resulting in seven consensus frameworks for VH (VH 1a, VH 1b, VH 2, VH 3, VH 4, VH 5, and VH 6) and seven frameworks for VL (Vκ 1, Vκ 2, Vκ 3, Vκ 4, Vλ 1, Vλ 2, and Vλ 3). This diversity is important, as the use of different frameworks allows a variety of non-CDR contacts to the target, thereby greatly increasing the range of targets being recognized. By using this strategy, each human VH and VL subfamily that is frequently used during an immune response is represented by one consensus framework, and thus the immune response is closely mimicked. The consensus building was further restricted to the framework regions, while the CDRs were diversified in a manner guided by structure. Thereby, functional diversity is maintained in optimized sets of frameworks. Instead of fully relying on the statistical output of the consensus building, structural modeling was employed in order to decrease the risk of disrupting interactions between certain amino acid residues that might be in contact with each other in the three-dimensional structure. Because many structures of antibody domains are available, the modeled structures of each framework family could be compared with the respective natural template structures. The models were checked according to several principles, which have been outlined in the preceding section. At this point ←-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Fig. 3 (Continued) motifs differ between different germ-line families. (b) Arrangement of the residues defining the charge cluster of VH 3 and Vk 3 domains [13]. Importantly, the charge cluster consists of a network of buried charges and hydrogen bonds rather than pairwise interactions between individual residues. (c) Furthermore, subtype-dependent packing differences occur. As an example, the residues defining the upper hydrophobic core region of different human VH subtypes are shown as structural superpositions. Structural alignments are shown of VH 4 (PDB entry 1DHV), VH 1a (1DHA), and VH 5 (1DHW) [22], with the most stable framework, VH 3
(1DHU) (left panel). While the upper core residues of VH 3 are densely packed, cavities occur for the other subtypes. In the least stable subtypes, VH 4 and VH 6, the bulky aromatic residues Phe29 and Phe31 are replaced by smaller residues, and the created space is only partly filled up by compensating residues Trp41 and Val25. The loss of the phenyl ring by replacement with Gly in VH 1a (middle panel) as well as the substitution of Leu89 by Ala are not compensated for by larger side chains at other positions, thereby creating hydrophobic cavities. The same is true for position 89 in VH 5 (right panel). Adapted from Ewert et al. [13].
3 The Engineering Approach
the models were mostly checked for whether the interactions in natural antibody domains were correctly recreated. The consensus sequence models were inspected for any unfavorable strain in the structures, represented by unfavorable regions of the Ramachandran plot or any obvious cavities in the hydrophobic core. Moreover, the sequences were checked for the existence of residues already known to be involved in conserved intra-domain interaction patterns – such as conserved charge clusters and hydrogen-bonding patterns – as well as for exposed hydrophobic side chains known to decrease expression levels. Nevertheless, differences between the natural subclasses became apparent in the models. Empirical results had suggested that certain natural framework types display more favorable stabilities and expression levels than others. These differences are already existing in the original natural human germ-line sequences. The intrinsic differences in terms of biophysical properties for each subtype were then experimentally explored in a systematic fashion, first on single domains, then on scFv fragments [13]. The experiments confirmed the observed trends, showing that VH 3 displays the highest thermodynamic stability and soluble expression level, when expressed as an individual domain, among all VH subtypes. In contrast, VH 2, VH 4, and VH 6 display the least favorable properties in terms of stability, folding yield, and the tendency to aggregate. For the VL domains, members of the Vκ subtypes showed slightly higher stabilities and expression yields than the Vλ subtypes, but the behavior was much more homogenous overall. In order to trace back these differences to the structural level, the model structures of the different subtypes were compared with each other. Several structural features can be invoked to explain the extraordinary stability of VH 3 domains in comparison with the even-numbered VH subtypes. First, differences in the hydrogen-bonding networks have an influence on the thermodynamic stabilities. Long-range interactions involving several residues are concentrated in a charge cluster at the base of VH domains to establish a complex interaction network. In VH 3 domains the ionic and hydrogen-bonding interactions within this charge cluster are very well satisfied, whereas fewer interactions are observed at the analogous positions in other subtypes (Figure 3b). Corresponding to the subclass, different hydrogen-bonding networks are formed in the charge cluster of VH domains. Some of these networks are less extended and contain a smaller number of interactions than in VH 3. Additionally, based on the residues at three different positions in the first β-strand of the VH domains, the domains can be classified into four different structural subtypes with respect to their conformations in the first framework region [80]. Mutations bringing together incompatible residues and thus “mixing” subtypes have previously been shown to have a large unfavorable effect on the stability of the whole scFv [81]. Also, clear differences can be observed for hydrophobic core packing of the family subtypes. The upper core region of VH 3 is densely packed, whereas cavities can be identified in VH 4, VH 1a, and VH 5 on the basis of structural alignments (Figure 3c). In the lower core, two of the stable domains have an aromatic residue, while the others do not.
23
24 Engineering Proteins for Stability and Efficient Folding
Finally, a comparison of the Ramachandran plots showed additional non-glycine residues with positive φ-angles and residues with higher secondary structure propensity at certain positions for the even-numbered subtypes compared with the odd-numbered ones. The immediate question was, therefore, whether the results of this structural trouble-shooting could be used to project favorable properties of VH 3 domains onto the less stable subtypes and thereby add another step to the optimization of antibody sequences while maintaining the structural diversity of the immune system. Instead of using the stable VH 3 framework exclusively, resulting in a loss of diversity, some point mutations might be enough to correct some of the shortcomings of the less stable domains. Based on these comparisons, the mutation of six residues in a scFv containing a VH 6 framework led to an overall increase in thermodynamic stability of 20 kJ mol−1 and a fourfold increase in soluble expression yield [73], indeed bringing this framework to the level of VH 3. The effects of the single mutations on stability were almost fully additive, while the effects of folding efficiency (soluble expression yield) were only qualitatively additive. The most dramatic effects on thermodynamic stability were obtained by mutations removing an unsatisfied H-bond donor in the hydrophobic core and introducing glycines at positions with positive -angles. Individually, all mutations, except the one in the hydrophobic core, led to slight increases in soluble expression yield. Interestingly, one mutation that removed a hydrophilic, solvent-exposed glutamine residue by a replacement with hydrophobic valine on the basis of higher secondary structure propensity also significantly increased the expression yield, possibly since valine secures this stretch to be in β-sheet structure. In antibodies, disulfide bond engineering has also been investigated. Optimized disulfide bonds engineered between VH and VL indeed significantly increased the half-life of an Fv fragment at 37 ◦ C [82]. This strategy was subsequently extended to interchain disulfides in generic framework positions [83]. Even more stable proteins can be obtained by combining the single-chain Fv approach with the engineered disulfide [84, 85]. It should be pointed out, however, that the additional inter-domain disulfide significantly reduces the yield of folded protein when produced in the bacterial periplasm, and such proteins have to be prepared by in vitro refolding. Therefore, the additional disulfides are not ideal for antibody libraries; instead, optimized frameworks have shown the greatest promise for combining diversity with stability. In summary, the consensus concept provides a convenient tool for proceeding with large steps in sequence space and a high probability of accumulating features that are favorable for higher stability. In many cases, structural analysis can serve as a trouble-shooting tool to identify shortcomings that might have been created by the consensus-building process or, as in the case of the antibodies described here, that are inherent to the natural sequence. In addition, it serves to rank the potential mutations identified by the consensus approach, keeping the mutational load on the target molecule to a minimum. Experimental data are important not only to validate these results but also to give important hints for future design approaches.
4 The Selection and Evolution Approach 25
Many entries in the table of experimental antibody stability data linked to mutations have come from directed evolution experiments. By combining semi-rational and rational approaches with experimental data, optimization of biophysical properties can be achieved in an iterative fashion. This interplay of experiment and structural analysis can therefore be an effective way to probe the vast sequence space in a systematic manner in order to find the valleys of free energy.
4 The Selection and Evolution Approach 4.1 Principles
The previous section may have led to the impression that mutations enhancing the biophysical properties of proteins can rapidly be identified. In the highlighted case of antibody domains, only the wealth of structural data and empirical measurements available allowed predictions with a high probability of success. Some of the empirical data used successfully for structure-based engineering have come, paradoxically, from directed evolution experiments (see Section 3.3). Despite the rapidly increasing amount of structural data and the better understanding of folding mechanisms of proteins, the effects of introduced mutations still cannot be predicted with a high degree of accuracy in most cases. The main reason is that even slight alterations in the primary sequence can lead to profound conformational changes in tertiary or quaternary structure; consequently, structural predictions have to be very accurate and usually must be backed up by empirical data. Because rational engineering uses site-directed mutagenesis followed by biophysical investigation to probe the effects of specific amino acid substitutions, it becomes very labor-intensive if many mutants have to be checked individually and if no additional hints are available. Additionally, as soon as small synergistic effects of several mutations need to be checked, the combinatorial explosion rapidly exceeds the sample number that can be handled efficiently. More importantly, the restriction to certain target residues automatically excludes alternative solutions to a given problem that may not have been obvious by the initial analysis. For example, affinity improvement of a protein to its binding partner is often achieved more efficiently by slight spatial adjustments of residues directly involved in binding, rather than by substitution of these residues. This kind of spatial adjustment can be caused by mutation of residues whose location in the native structure is further away from the actual binding site (so-called “second-sphere mutations”) [86]. Today, it is almost impossible to predict this kind of mutation by rational means. When it comes to stability engineering of proteins, the problems associated with rational design procedures are even intensified. First, because the energetic contributions of single-site mutations are usually small, the need to sample multiple mutants – in which synergistic effects of several mutations are combined – is more acute. Second, the factors and principles responsible for the overall stability of a
26 Engineering Proteins for Stability and Efficient Folding
native protein are still far from being completely understood, and a multitude of different forces and interactions contribute to it. A complete analysis will have to consider not only the interaction network of the native protein but also the effects on the denatured state. Additionally, potential aggregation pathways will have to be considered. Even simple substitutions like the ones discussed in the previous sections often make contributions to the entropy as well as the enthalpy of the folded and unfolded states, including the solvent in either state. Third, because rational approaches always rely on the current theoretical knowledge, new principles underlying protein stability will rarely be uncovered. In any case, rational engineering requires a clear definition of the problem by the investigator in order to find a solution. Ironically, in the field of protein engineering, the exact definition of the problem is often the problem itself. Thus, to overcome these limitations, an experimental setup is needed that allows the creation of a vast number of variants of a given protein and that subsequently can identify “superior” molecules that best fulfill a desired property. Nature samples the vast sequence space by the strategy of Darwinian evolution, a cyclic iteration of randomization and selection. Nature thereby adapts proteins to fulfill a function under the given environmental conditions. Recent developments in molecular biology have made it possible to mimic Darwinian evolution in a reasonable time in vitro. Not only has this “evolutionary approach” become the most powerful method to date for engineering proteins towards a desired property, but it also provides new insights into the mechanisms and principles that are responsible for this property. However, as has been illustrated in the case study on antibodies, such experiments can be used not only to solve a particular problem but also to gather information about which residues tend to become enriched in particular positions. This again provides a database for rational engineering. Many different selection and evolution strategies have been developed in recent years, but all of them have several features in common that reflect the principles of Darwinian evolution. The starting point is the generation of a genetic library of mutants derived from the wild-type sequence of the protein under examination. Several methods exist to create sequence diversity. In error-prone PCR, the error rate of polymerases is increased by performing the PCR reaction in the presence of deoxynucleotide analogues or in the presence of other metal ions. By using bacterial mutator strains, which are characterized by deficient DNA repair systems, random mutations are introduced during DNA amplification in the bacterial cell, albeit usually at a lower frequency [87]. In DNA shuffling, which mimics the natural process of sexual recombination, genes are randomly fragmented by nuclease digestion and reassembled by a PCR reaction in which homologous fragments act as primers for each other [88] (Figure 4). The staggered extension process [89] is another possibility to obtain recombined genes in vitro. Here, the polymerase-catalyzed extension of template sequences is extremely abbreviated, and repeated cycles of denaturation and extension lead to several template switches – thereby recombining elements from different genes – before the extension finally yields full-length products. Additionally, techniques are available to focus the diversity to certain regions on the whole gene, such as the use of degenerate primers in PCR or the “doping” of a
4 The Selection and Evolution Approach 27
Fig. 4 Methods to create genetic diversity by biochemical means. On the left, errorprone PCR (see text) is depicted schematically. Two types of mutations are shown: favorable ones (open squares) and unfavorable ones (closed circles). In successive cycles of PCR, more mutations of each are introduced, and usually molecules will contain some of either type. Thus, the beneficial effect of the favorable mutations can be completely obscured by the presence of unfavorable ones, if the error rate is too high.
Therefore, this method is most successful if it is not used at too high an error rate. On the right, DNA shuffling according to Stemmer [88] is shown. A short DNAse digestion breaks up the DNA into small pieces, and PCR is used to reassemble the gene. Thereby, mutations are “crossed” and genes with largely favorable mutations can be obtained that can be enriched by selection. Nevertheless, successful evolution experiments have been carried out with either method.
shuffling reaction with degenerate primers. Depending on the problem and the sequence under investigation, each of these methods has its advantages and disadvantages. In any case, the generated library should obey two major criteria: it must be diverse enough to contain individual sequences with beneficial mutations, and it should be of high enough quality to reduce the experimental “noise” (such as sequences with stop codons or frameshifts) in the subsequent selection experiment [90]. A primary prerequisite of any selection system is the coupling of the genotype (the gene sequence) and phenotype (the respective protein displaying the properties) of any individual library member. Briefly, this can be done by two strategies. Either the gene and the protein need to be compartmentalized in cells or artificial compartments, such that the “improved” phenotype stays connected to the altered
28 Engineering Proteins for Stability and Efficient Folding
Fig. 5 Two principal strategies to link phenotype and genotype are depicted. The genetic material must be connected in a unique way to the protein, which defines the phenotype, such that the gene encoding the valuable mutation can be selectively amplified. A collection of two mutant proteins and mutant genes is shown (light gray and dark gray). (a) This connection can be realized through a direct link (e.g., in phage display or ribosome display), as shown on the top. In this case, all assemblies can freely diffuse in the same volume and the selection must filter out those proteins with the desired function, e.g., by binding to a ligand. (b) Alternatively, gene and protein
must be in the same compartment. In nature, this is realized in cells, and microbial cells can be manipulated so that each takes up one variant of a mutant collection. The key is to identify the improved phenotype. This can be done by screening (individual assays on cells) or selection (giving cells with the desired property of the protein a growth advantage). Rather than natural cells, water-in-oil emulsions can be used to create artificial compartments of small water droplets in an oil phase. Usually, the emulsion must be broken and the proteins must be selected by binding to identify the one with the desired phenotype. For details, see text.
gene, or the protein has to be physically coupled to the gene, such that they can be isolated as a particle containing both gene and protein (Figure 5). By selecting the proteins displaying the desired properties, the linked gene sequence can be inherited and amplified subsequently. The various selection technologies differ mainly in the way this physical linkage or compartmentalization is achieved. This will be discussed in more detail in the following section. The library must then be screened or subjected to selection for a certain function, and a defined selection pressure can be applied to direct the selection towards a molecular quantity of interest. It is necessary at this point to discriminate between “screening” and “selection.” Screening methods examine individual members of a library for a given property (e.g., catalytic activity or solubility). For certain properties, screening is often the only way to go. The number of mutants to be screened in a reasonable time depends on the versatility of the screening method. Despite constant progress in automation and miniaturization, even the best screens to date usually do not allow assessment of more than 106 variants in a reasonable time. In contrast, selection methods force the single library members to compete with each other, and members that best fulfill the specified criteria are enriched. Often, the selection
4 The Selection and Evolution Approach 29
is based on the binding of particular variants to an immobilized ligand. Note that the binding is only a “surrogate quantity” of the real property to be improved. The basis for a successful enrichment is an efficient counter-selection of variants that do not possess properties fulfilling these criteria. Especially during the selection of proteins for higher stability, an efficient counter-selection, in addition to the correct choice of the applied selection pressure, is one of the major experimental challenges [91]. The basic rule of screening and selection technology describes the importance of assigning the correct selection pressure. This has been succinctly phrased: “You only get what you screen for!” [92]. An analogous statement can be made for selection. Even though the rule sounds plausible, it is often difficult to translate this statement into an experimental setup, because an additional complication is introduced by the fact that an explicit selection pressure towards just one property is impossible to realize. Depending on the selection procedure used, more than one molecular property will have an influence on the enrichment process, including, for example, affinity or catalytic activity, thermodynamic stability, folding efficiency, and toxicity of the respective molecule. The outcome of the selection experiment will therefore always be a “compromise solution” with respect to the weighting of many properties in a given experimental setup. Assigning the right selection pressure therefore means biasing the selection towards a certain property, rather than exclusively altering this property. For these reasons we will discuss the application of the various selection technologies with an emphasis on the available selection pressures for stability engineering. Some of the described methods and examples will be based on mere screening of library members, but the main focus will be on the selection for favorable protein variants. Choosing the appropriate selection pressure is just one side of the coin. The proper adjustment of its strength is another important factor. If the selection pressure on the system is too low, molecules with the desired properties will be lost in the background noise of the experiment – i.e., they are not enriched. On the other hand, if the selection pressure is too stringent, even the best variants might not pass the “survival threshold.” The genes of variants that survive the applied selection pressures are subsequently amplified and subjected to another so-called “round of selection.” This either can be performed in the absence of further mutagenesis to simply enrich the best members of the initial library (which is constant) or, to completely mimic the Darwinian principle, the selected members can be subjected to alternating rounds of randomization or recombination before subsequent selection, leading to an adaptation of the library from round to round. However, as most of the mutations will be non-beneficial, care has to be taken that the mutation rate is low enough to allow successful enrichment of improved members and not to extinguish the whole population. We use the terms “combinatorial selection” for a process in the absence of such mutations and “evolutionary selection” for a process that includes such mutations. It has been pointed out that the method of DNA family shuffling is in some respects similar to the consensus approach because it combines gene fragments
30 Engineering Proteins for Stability and Efficient Folding
Fig. 6 Common display systems used for selection for enhanced biophysical properties. In in vitro display technologies (a), the proteins are produced by in vitro translation, and protein production does not rely on a host organism. The displayed proteins are linked to RNA either by stabilized ternary complexes (ribosome display, upper panel)
or by a covalent puromycin cross-linker (mRNA display, lower panel), thereby establishing the genotype-phenotype linkage. (b) In contrast, partial in vitro display systems rely on cells to produce the displayed protein, but selections can be performed in vitro. In phage display (upper panel), the protein is displayed on the
4 The Selection and Evolution Approach 31
from homologous sources in a random fashion. For mere statistical reasons, the probability of replacing a residue with a consensus residue by gene shuffling is also higher than replacing it with a non-consensus residue [93]. There is, however, a crucial difference: when recombining genes in a random fashion, a selection or screening step is needed subsequently to identify the “fittest” members of the resulting collection. Rather than being based on theoretical assumptions, gene-shuffling methods mimic the natural process of evolution by identifying the members exhibiting a desired property by explicitly subjecting them to selective pressure for this desired property. In practice, however, even with the largest of libraries, a full “re-equilibration” of residues will not take place. In contrast, the consensus approach is based on the explicit assumption that the statistical preferences in a given (often limited or biased) set of sequences indeed reflect the energetic preferences. However, there may be other reasons that certain sequences are prominent.
4.2 Screening and Selection Technologies Available for Improving Biophysical Properties
As described above, any selection technology relies on four major steps: (1) generation of a genetic library, (2) establishment of the link between genotype and phenotype upon translation into protein, (3) subsequent screening or selection under defined conditions, and (4) re-amplification of selected members. Depending on how these steps are performed, selection technologies can be subdivided into in vitro methods, partial in vitro methods, and in vivo methods. While in vivo systems rely on a host organism to express the protein and to carry the respective genetic information, all in vitro display technologies have in common that the protein production and the selection process are performed entirely in vitro, i.e., that the protein is obtained by cell-free translation. In the partial in vitro methods, the genetic information is introduced in cells, where protein production occurs, while the selection process is performed in vitro. All techniques differ in the way the physical linkage between the protein and its genetic information is established. The principles of linking genotype and phenotype are shown in Figure 6.
←-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Fig. 6 (Continued) surface of filamentous phages, usually fused to the CT domain of the minor coat protein g3p. The three domains (N1, N2, and CT) of the minor coat protein g3p are depicted. The phage particle also carries the gene encoding the displayed protein. In other partial in vitro technologies, the proteins are displayed on the surface of the expressing host cell itself, which are either bacterial cells (bacterial surface display) or yeast cells (yeast surface display)
(lower panel) that also harbor the respective plasmid DNA. (c) in vivo screening or selection systems use the properties of fused reporter proteins for screening (upper panel) or split proteins for intracellular selection (lower panel) in which cellular growth and therefore amplification of the genetic material occurs only if the two protein halves are reconstituted upon interaction of the fused library protein with its target.
32 Engineering Proteins for Stability and Efficient Folding
4.2.1 In Vitro Display Technologies in vitro display technologies started off with the selection of peptides [94] and were then made efficient enough to select for functional proteins [95]. A number of technologies have been developed, which will be described briefly: ribosome display [95, 96], RNA-peptide fusions [97, 98], and water-in-oil emulsions [99, 100]. Water-in-oil emulsions constitute artificial compartments, rather than a physical link, and thereby create a phenotype-genotype coupling and can also use a selection step for interaction with a target. In ribosome display, the genetic library, usually in the form of a PCR product, is directly used to produce mRNA by in vitro transcription. This mRNA contains a ribosome-binding site for the subsequent translation of the protein and several features that stabilize it against degradation. The encoded protein variants are expressed in a cell-free translation reaction by a stoichiometric number of ribosomes. The essential linkage of the translated proteins to their respective mRNA molecule is achieved by eliminating the stop codon and stabilizing the ribosomal complexes, which prevents the release of the translated protein from the ribosome. In the related method of mRNA-peptide fusions, additional steps are used after this stoichiometric translation to covalently couple a linker between the end of the mRNA and the protein. The ribosome is then removed and the complexes are purified. These complexes are then used to bind to an immobilized target. In all these methods, after the selection process the re-amplification of genetic material can be performed solely by biochemical means, using reverse transcriptase to first synthesize RNA-DNA hybrids, which are then subsequently amplified in a PCR reaction. The resulting DNA then serves as a template for the production of mRNA used for the next round of selection (Figure 7). In vitro display technologies offer several advantages over in vivo selection systems. First, much larger library sizes are accessible because the creation of large molecular diversity on the genetic level does not pose a great challenge, and diversities up to 1013 are easily achieved. For in vivo systems, the critical step that limits the size of the displayed library is the transformation efficiency of the respective host organism. Only a small fraction of the initially generated pool will actually enter the cells, and, depending on the host organism, library sizes of about 107 variants in yeast or up to 1011 variants in bacteria can be achieved. In contrast, for in vitro selection systems such a transformation step is not necessary and the diversity of the library is defined by the number of different RNA molecules added to the cell-free translation reaction or by the number of functional ribosomes, whichever number is smaller. For a library of 1014 , a 10-mL translation reaction is required. A second advantage is that, because translation and protein folding take place in vitro, reagents can be added during protein synthesis, which can either promote protein folding or minimize aggregation of the displayed proteins (e.g., chaperones), or substances that exert certain selection pressures can be added. in vitro translation encompasses an additional advantage: many proteins are not compatible with in vivo selection systems because the wild type or at least some of the mutants are toxic to the host organism, undergo severe degradation, or cannot be expressed functionally in the respective cellular environment. Last but not least, a major advantage
Fig. 7 Selection cycle of ribosome display (left panel) and phage display (right panel). For ribosome display, all necessary steps are performed in vitro, whereas for phage display protein translation, folding, and phage assembly take place in the bacterial host cell. Other selection criteria can be applied to the protein of interest (see Figure 8) before selecting for binding to a target, thereby introducing selection pressures for higher stability. It has to be guaranteed, however, that the selection conditions are not too harsh to destroy the linkage of genotype and phe-
notype. Importantly, the genetic pool of library members in phage display remains constant in the course of several selection rounds, whereas in ribosome display, the genetic pool undergoes continuous modifications during the in vitro amplification steps by the limited accuracy of the polymerases, resulting in a true Darwinian evolution. In addition, this pool can deliberately and conveniently be altered by introducing additional mutations or recombination events such as DNA shuffling.
4 The Selection and Evolution Approach 33
34 Engineering Proteins for Stability and Efficient Folding
results from the amplification process after the selection has been performed. Because all enzymes used to convert and amplify the genetic material possess an intrinsic error rate, the genetic pool is never constant and is continuously modified from round to round. Therefore, even without special measures to increase the error rate, usually some “evolution” is observed. The error rates can be additionally increased by performing error-prone PCR or by recombining favorable mutations by means of DNA shuffling as explained in the previous section. 4.2.2 Partial in vitro Display Technologies We define partial in vitro display technologies as those methods in which host organisms are employed to carry and amplify the genetic information and to produce the proteins from this genetic library, usually encoded on plasmids. The selection step, however, is performed in vitro. Therefore, selection pressures similar to those in the in vitro methods can be applied. The array of available techniques includes phage display [101], bacterial surface display [102], and surface display on yeast [103]. Phage display is still the most popular selection technique, and due to its robustness, most of the studies that aim for enhanced biophysical properties of proteins have utilized this technique. Originally, phage display was applied as a method for the identification and selection of peptides or proteins binding to a specific target [101]. More recently, several developments paved the way for its application as a tool for studying protein folding, and as a result, it is now used for the selection of proteins with improved biophysical properties. We will briefly review the main principles of phage display in its most common format. Variants of the protein of interest are fused to the minor coat protein g3p of the filamentous bacteriophage, which consists of three domains (namely, N1, N2, and CT) that are connected by glycine-rich linkers. The fusion protein is usually encoded on a phagemid vector, expressed in the E. coli host and assembled. Typically, the other proteins necessary to produce an intact phage particle are provided by the helper phage. The assembled phages are secreted by the cell, which does not lyse, and the phages can be collected. While the protein is displayed on the exterior of the phage (typically as an N-terminal fusion to either N1 or CT), the gene of interest encoding a library member is packed into the phage particle upon its assembly in the bacterial host. The phages can then be subjected to the selection process. Phages compete with each other for binding to an immobilized target, using the displayed library members for interaction, and can thereby be captured. Optionally, before capturing, the phages can be subjected to harsh environmental conditions. The captured phages are subsequently amplified by re-infecting bacteria, which then produce phages for a new round of selection (Figure 7). In the ideal case, only phages that specifically recognize the target would be amplified. Due to reasons such as nonspecific “sticking” of phages to the surface, the efficiency of selection is in practice at most a 1000- to 10 000-fold enrichment over nonspecific molecules and can even be very much lower. Several aspects are important when phage display is used as a means of evolving proteins with improved biophysical properties. The correct folding of the fusion
4 The Selection and Evolution Approach 35
protein during phage morphogenesis is an important parameter determining the frequency of incorporation of correctly folded proteins displayed on the phage. Folding intermediates that have a strong aggregation tendency lead to aggregation of a particular library member during the assembly process, and rather than the g3p fusion, the g3p wild-type protein from the helper phage will be incorporated. Despite this inherent selection pressure for the folding efficiency of displayed proteins, the enrichment of improved variants is slow [104], and the utility of phage display to study and improve folding of a given protein was initially not obvious [105]. Combined with a selection for protein functionality, it nevertheless sets the basis for selecting proteins according to their folding properties, as will be discussed in Section 4.3.2. Since the selection procedure is performed in vitro, a large variety of external selection pressures can be applied to the phage particles and be combined with functional selection. If function is omitted as a direct indicator for the native state, other criteria that directly correlate with the native state have to be translated into a selectable feature. This opens the door to select for proteins, which do not bind a ligand, albeit with the caveat that it is not assured that the native state is being selected. Such general selection approaches for physical properties take advantage of the fact that unfolded proteins are more susceptible to proteolytic cleavage than are compactly folded ones [106]. One variation on this theme makes use of the modular nature of the g3p protein, thereby linking phage infectivity directly to the proteolytic susceptibility of the target protein. This system has been called “Proside” (protein stability increase by directed evolution) [107] and will be discussed in Section 4.4.4. 4.2.3 In Vivo Selection Technologies In vivo selection technologies such as the yeast two-hybrid system [108, 109] or other split protein complementation assays [110, 111] are valuable tools for studying protein-protein interactions in living cells. They commonly employ reporter systems, in which a covalent link between the protein of interest and another so-called reporter protein is established. Cell growth or a colorimetric reaction is dependent on either a specific protein-protein interaction or the solubility of a critical component. The reporter protein transduces certain properties of the host protein, most importantly its native fold and its ability to interact and/or its solubility or its resistance to cellular proteases, to a screenable or selectable feature of the fused reporter protein itself [112]. Examples of reporter proteins are transcription factors [108], critical metabolic enzymes [111], or proteins that can easily be assayed [113]. Several facts, however, limit the applicability of in vivo systems for stability engineering. First, as the viability of the host organism has to be guaranteed, external selection pressures cannot easily be applied. Thermophilic organisms represent in some cases an interesting alternative, but conditions allowing selections are hard to establish. Additionally, the host environment as well as the fused reporter proteins possibly perturb the characteristics of the target protein. Furthermore, control experiments must ensure that growth is really dependent on the interaction
36 Engineering Proteins for Stability and Efficient Folding
of interest and that mutations in the host have not “short-circuited” the selection strategy. 4.3 Selection for Enhanced Biophysical Properties
As has been explained in the Introduction, desirable biophysical parameters include solubility, stability, and folding efficiency. Even though the various selection methods and conditions usually do not improve one of these properties exclusively, they are often biased towards one of them. We will therefore discuss different ways to exert selection pressure on the system with respect to the property that is likely to be changed. 4.3.1 Selection for Solubility Solubility, correctly defined as the maximal concentration of the native protein that can be kept in solution, is usually not a property to be selected, as most proteins – with the exception of membrane proteins – are sufficiently soluble. The word “solubility” in the context of screening is often inaccurately used to refer to “soluble expression yield,” and thus it usually mirrors the efficiency of folding in the cell. Even though soluble expression yield is not necessarily equivalent to the folding properties and even less to the stability of a protein, a correlation can often be observed, and soluble expression yield is comparatively easy to screen for. In some cases, a function for a given protein cannot be assigned, or it is difficult and laborious to screen for, and then this property becomes especially important. The fluorescence yield of bacterial colonies expressing proteins or protein domains fused to the green fluorescent protein (GFP) correlates over a wide range with the soluble expression yield of the fused domain [114]. Screening for fluorescence intensity is a versatile method, and, combined with directed evolution, has the potential to select variants of proteins that are less aggregation-prone than their progenitors [115, 116]. In a similar setup, reporter proteins can be used as selection markers instead, such as by fusing the protein of interest to chloramphenicol acetyltransferase [117]. Assays exploiting protein-protein interactions can also be used for this purpose and they have the additional advantage that the fused reporter sequence can be much smaller and thereby minimize the risk of a perturbing influence of the reporter itself on the domain of interest [113]. To completely exclude such perturbing influences, reporter systems might possibly be established that do not need any fusion at all and exploit instead the stress response of the expression host cell. As the overexpression of proteins often activates particular stress response genes by the accumulation of insoluble aggregates, the respective gene promoters can be employed to activate reporter genes instead [118]. In special cases, reporter systems can also be combined with other selection methods to select for folding properties. Many intrinsically stable proteins contain permissive sites in loops, into which polypeptide chains of variable length can be inserted without loss of function of the host protein. The higher the number of residues inserted, the larger the entropic cost of ordering these residues will be.
4 The Selection and Evolution Approach 37
Consequently, the overall stability of the host protein should decrease. However, the entropic cost of inserting a folded sequence will be lower than that of an unfolded sequence of the same length. The probability of the host protein reaching the native (functional) state should thus correlate directly with the ability of the inserted sequence to fold into a compact structure. If host proteins with various inserted sequences are displayed on phages and if the host protein allows a selection for “foldedness” by means of binding to a target, folded sequence insertions are enriched. This system has been termed “loop entropy reduction phage display selection” [119]. While attractive in theory, it was found experimentally that mostly sequences that keep the hybrid protein soluble are enriched. Protein solubility, as used here, can also be interpreted as a lower degree of exposed hydrophobic residues, and folded proteins usually display fewer hydrophobic residues on their surface than do unfolded proteins. Conversely, the exposure of hydrophobic residues is usually a sign of non-native states. Display technologies can thus, for example, use the interaction with hydrophobic surfaces to select against more hydrophobic, i.e., less “folded,” proteins [120]. In summary, protein solubility can reflect in many cases the folding properties of proteins and thus be used as a selection criterion. Moreover, soluble expression in vivo is often a major requirement for the large-scale production and the convenient in vitro handling of proteins. In any case, solubility should never be confused with protein stability. Even though both properties might correlate in some cases, the governing principles are often of a different nature. 4.3.2 Selection for Protein Display Rates The in vivo folding efficiency of proteins represents an intrinsic selection pressure when using phage display. Because correct folding is a prerequisite for both incorporation into the phage coat and binding to the target, the subsequent selection of phages that are able to bind to the target is partly influenced by the folding properties of the displayed molecule. However, this does not automatically imply that the selection is driven towards superior folding properties. As mentioned above, the enrichment factors of proteins with superb folding behavior over the poorly folding members are low [104]. Moreover, even though the binding of a given protein to its target is a very direct way to monitor and screen for its proper folding, the selection criterion “folding” is not decoupled from the selection criterion “binding affinity.” If the enrichment factor for one property (folding) is low, the selection is more likely to be driven towards the other (affinity). As a result, selected members will represent a compromise between folding properties, which only have to be sufficient under the given experimental conditions, and the binding affinity to their target. By randomizing exclusively the residues that build the hydrophobic core of the IgG binding domain of peptostreptococcal protein L, which are not involved in ligand binding but determine the stability and folding kinetics of this small protein, Gu et al. [121] could unambiguously demonstrate the utility of phage display for studying the stability and folding efficiency of proteins. A more extensive randomization effort with subsequent characterization of the selected variants pointed out
38 Engineering Proteins for Stability and Efficient Folding
some important aspects of selection for folding [122]. First, it showed that folding kinetics is not the critical parameter for the selection but rather the overall stability of the mutants, a tendency which could be confirmed by more advanced selection approaches (see Section 4.4.2). Second, the authors demonstrated the utility of selection methods to identify certain residues that are important for the folding mechanism. Third, the selected pools were highly diverse and the thermodynamic stabilities of all variants were lower compared to the wild type. In fact, most of the selected proteins denatured just above room temperature. These results illustrate the principle that any selection – be it natural or in the test tube – simply continues until the minimum requirements are met, in this case, functionality at the selection temperature. Consequently, this suggests that additional, more stringent selection pressures are necessary to really accomplish a directed evolution for improved properties. 4.3.3 Selection on the Basis of Cellular Quality Control Additional selection pressure can be provided by the host organism itself. For example, the secretory quality-control system of eukaryotic cells discriminates proteins according to their folding behavior in an efficient way. This is based on mechanisms that lead to retention of misfolded proteins in the endoplasmic reticulum (ER), followed by degradation of these proteins. In yeast, the Golgi complex can reroute misfolded proteins, which have escaped ER retention, to the vacuole for degradation, thereby constituting an additional important quality-control pathway [14]. In combination with yeast surface display, these mechanisms can be employed to bias functional selections towards enhanced folding efficiency. Several studies have shown that the surface display rate of proteins strongly correlates with their ther-mostability and their soluble secretion efficiency [123, 124]. As an example, by making use of elevated temperatures during expression as an additional selection pressure, improved T-cell receptor fragments, whose thermostability exceeded by far the expression temperatures, could be obtained [125]. Even though the low transformation efficiency of yeast restricts the accessible library size, one obvious advantage of the method is the applicability to glycosylated eukaryotic proteins, which generally are not amenable to yeast two-hybrid or phage display methodologies [126]. 4.4 Selection for Increased Stability 4.4.1 General Strategies The key to all selections for stability is to introduce a threshold that separates the molecule with desired properties from the starting molecules. If the population initially lacks functionality, while only a few members are above the selection threshold, the selection system can distinguish between these slight energetic differences. This is the starting situation if the target protein is initially of very low stability and should be brought to “average” properties.
4 The Selection and Evolution Approach 39
If, however, the starting protein is already of considerable stability, but should be brought to even higher stability, it can be advantageous to intentionally destabilize it prior to selection in order to find stabilizing mutations that reconstitute its functionality. This principle can be applied to very different kinds of proteins, provided that mutations are known that destabilize the protein to an extent that will subsequently allow the selection for alternative stabilizing mutations (Figure 8a). Because the effects of independent stabilizing mutations are often additive, the deliberately introduced destabilizing mutations can be reverted in the context of the additional newly selected ones, thereby rendering the molecule far more stable than the original one (Figure 8b). In principle, all reagents and conditions known to destabilize proteins can decrease the number of library members populating the folded state and can therefore be used to exert increasing selection pressure on the system. Increased temperature and denaturing agents are obvious methods that are useful for selecting proteins of higher thermodynamic stability. The major problem is presented by the compatibility of the conditions with the selection method used. 4.4.2 Protein Destabilization An example of the stabilization of a naturally unstable domain by using phage display was a selection performed with the prodomain of the protease subtilisin BPN [127]. At room temperature the prodomain folds into a stable conformation only upon binding to subtilisin. By randomizing positions that are not directly in contact with subtilisin and subsequent selection on subtilisin, a mutant was selected that showed an increase of Gunfolding by 25 kJ mol−1 , from −8 kJ mol−1 to 17 kJ mol−1 , despite the fact that the library size was comparatively small. Intriguingly, the predominant energetic contribution was mediated by a selected disulfide bond. Previously, a similar strategy had been used to select for thermodynamically favored β-turns of the B1 domain of protein G [128]. Based on considerations described above and on the fact that the replacement of amino acids on the surface of proteins would be predicted to have only moderate effects on stability, the authors reasoned that such a selection could be successful only for proteins of marginal stability. Most of the substitutions would then lead to a positive free energy of folding, and thus the molecule would fail to fold into a functional form at all. In fact, selected turn sequences showed clear sequence preferences only if less stable host proteins were used to accept the turn. In this case, the selected sequences either resembled the wild-type turn or reflected the statistical preferences of turn sequences in the databases and stabilized the protein by 12–20 kJ mol−1 , compared to random sequences. Moreover, increased temperature was used as an additional selection pressure during the phage selection. As discussed in the previous section, the inability of the wild-type target protein to fold is the prerequisite for performing an efficient positive selection for functionality by additional mutations. The internal disulfide bond of immunoglobulin domains significantly contributes to the stability of antibodies [129]. The dramatic loss of free energy of folding upon their removal usually renders antibodies nonfunctional. By first destabilizing a scFv antibody fragment through replacement
Fig. 8 Principle of selection for proteins with improved stability. (a) By random mutagenesis of a given sequence, many mutations are obtained. Some of these mutations are favorable for the stability of the native protein and others are unfavorable. As a result, a diverse pool of proteins with a distribution of different energies is created. Upon exposure of the pool to an external selection pressure, only mutants with stabilities exceeding the selection threshold can be recovered and amplied. However, the selection threshold has to be set high enough to allow an efficient selection of “improved”
members. (b) Destabilization of the wild-type sequence prior to selection allows reducing the necessary selection threshold. Thereby, alternative mutations that stabilize the native fold of the protein can be identited and the selection process becomes more efficient. Moreover, additional stability gains can be achieved by removing the deliberately introduced destabilizing mutations after the selection. The initially lost energy is therefore regained, resulting in mutants of higher stability than the wild-type protein. Adapted ¨ et al. [76]. from Worn
40 Engineering Proteins for Stability and Efficient Folding
4 The Selection and Evolution Approach 41
of cysteines with other residues in both variable domains separately, a completely disulfide-free antibody was identified after several cycles of functional selection and recombination by DNA shuffling [17]. One globally stabilizing mutation was found to compensate for the initial stability loss. This study illustrates some important features of how several parameters exert influence on different stages of the selection process. Interestingly, the stabilizing mutation was already selected during the first rounds, showing that the loss of thermodynamic stability was the primary problem that had to be overcome. Because the stability gain of this mutation was large enough to shift the free energy of folding above the required threshold, all successive rounds did not affect protein stability but led to a fine-tuning with respect to the improvement of folding yield [17]. To illustrate the principle of additivity, reintroduction of the disulfide bridge into the selected variant yielded a scFv antibody of very high stability and superior expression yields [84]. 4.4.3 Selections Based on Elevated Temperature in vivo selection systems using thermophilic expression hosts have been employed for the stabilization of enzymes [130]. However, a rather large set of requirements must be met to apply such a selection system. In order to use stability of the enzyme at elevated temperatures as the selection criterion, its enzymatic activity must be vital to the thermophilic organism, and the corresponding gene of the thermophilic host has to be deleted. Randomized versions of a mesophilic enzyme can then be screened by means of metabolic selection. Although the approach is very powerful, the utility of these systems is restricted to special cases. To evolve enzymes with altered thermal stability, conventional screening for enzymatic activity in vitro after randomization or recombination of the respective gene and subsequent expression of the enzyme is still the most widely used method. An activity screen rapid and sensitive enough to identify slightly improved members from a vast pool of mutants is the key feature of this directed evolution approach [131]. However, individual screens have to be developed for each specific class of enzymes. Even though screening limits the explorable sequence space considerably compared to selection, mesophilic enzymes such as subtilisin E [132] and p-nitrobenzyl esterase [133] could be converted into mutants functionally equivalent to thermophilic enzymes by only a few rounds of directed evolution. Similar to observations discussed before, only very few mutations were necessary to increase the melting temperatures by more than 14 degrees. An important feature of screening compared to selection is that the much smaller library size is partially compensated for by directly measuring the quantity of interest: enzymatic activity at elevated temperature. In contrast, most selection systems use a surrogate measure where “false positives” can lead to the phenotype by mechanisms different from the ones desired. Elevated temperatures not only can be used in screening but also can be combined with selection technologies. While ribosome display is not an option – because low temperatures are essential to keep the ternary complexes of RNA, ribosome, and nascent polypeptide intact and thus ensure the coupling of genotype and phenotype – phage display has proved to be a very suitable method for harsh conditions
42 Engineering Proteins for Stability and Efficient Folding
due to the robustness of the phage particles. Nevertheless, future improvements of in vitro technologies may allow their application under more stringent conditions. As of today, in the case of in vitro technologies, it is vital to destabilize the protein first (see Section 4.4.2); then, very significant stability improvements can be selected [134]. The upper temperature limit for selections using filamentous M13 phages is approximately 60 ◦ C [135]. Above this temperature, re-infection titers of the phages are severely decreased, presumably due to the irreversible heat denaturation of phage coat proteins. Phages displaying the protein of interest can be incubated up to this temperature, and proteins still able to function can be selected. It should be noted that the exposure of phages to higher temperature after phage assembly exerts a somewhat different stress on the proteins than in the methods described before. While in the previous examples the functionality of the proteins was influenced mainly by in vivo folding efficiency and the protein stability during the panning procedure, it is the irreversible unfolding reaction at a given temperature that now becomes an additional parameter. The fraction of unfolded molecules will therefore reflect the rate of unfolding and thus kinetic stability. 4.4.4 Selections Based on Destabilizing Agents The use of protein destabilizing agents for biasing the selection pressure towards stability is limited to in vitro and partial in vitro display methods. Like temperature, the concentrations of denaturing agents can be controlled precisely and can be varied from round to round, allowing a gradual increase of stringency. Even though phages are quite resistant to denaturing agents [136], one should be aware of possible general problems when selections are based on the chemical denaturation properties of the displayed proteins. However, if chemical denaturation is combined with a selection for binding, high concentrations of denaturant can prevent binding to the target, even if the protein is not yet unfolded: as the forces governing ligand binding are very similar to those responsible for protein stability, both are disturbed by chemical denaturants. Conversely, because chemical denaturation is in fact often reversible, removal or dilution of the denaturant prior to ligand binding will often result in refolding on the phage and thus release of selection pressure. Moreover, ionic denaturants such as guanidinium chloride weaken electrostatic interactions and strengthen hydrophobic ones. This might impair the selection of mutations that introduce additional ionic interactions on the protein surface [59] and may favor additional hydrophobic interactions, which is not necessarily desired. If the protein is known to be stabilized by disulfide bridges, a strategy similar to the one used by Proba et al. [17] described above may be applicable. To identify globally stabilizing mutations, which compensate for the stability loss upon removal of disulfide bridges, selection can be performed in the presence of reducing agents such as DTT. Thus, the disulfide-forming cysteines do not have to be removed in advance. However, one should be aware of the fact that some disulfide bridges in proteins, once formed, are often hard to reduce, especially if they are buried within the protein core. Thus, it is advantageous to add the reducing agent at a time
4 The Selection and Evolution Approach 43
when the protein is not yet folded. Jermutus et al. [134] stabilized a scFv antibody fragment by using ribosome display. In contrast to the phage display method, the synthesis of the targeted protein occurs in vitro, which makes the polypeptide chain accessible to reagents during its synthesis on the ribosome and before folding has taken place. DTT was added during translation of the scFv, and its concentration was continuously increased from round to round. When using ribosome display, the population undergoes slight changes due to mutations occurring during PCR, resulting in an iterative adjustment of the selected variants towards tolerating the increasingly stringent conditions. 4.4.5 Selection for Proteolytic Stability In each of the examples cited above, selection for stability was based on a functional selection. Therefore, only those proteins for which a specific function can be assigned could be targeted; in addition, this function has to be screenable or, better yet, selectable. It would be highly advantageous to completely uncouple function from stability in the selection process to extend the range of problems to which this can be applied. A more general approach would thus be an invaluable tool for engineering any given protein and for selecting stable folds from a pool of de novo designed proteins. For functional selections, an additional problem arises from the fact that after a certain stability threshold is reached, allowing enough proteins to populate the folded state, the selection is likely to run towards improved binding properties instead. On the other hand, certain mutations might be stabilizing but might result in slight structural rearrangements that affect functionality in a negative way. Thus, it will not be possible to recover these mutations [106]. A general approach that has proved to be powerful for stability engineering is based on the concept that compactly folded proteins are much more resistant to proteolysis than are partly folded or unfolded proteins [137]. Several variations on this theme have been applied to phage display selections. In principle, it is only necessary that the phages displaying proteins, which can be cleaved by the protease, can be efficiently separated from the phages displaying protease-resistant proteins. This can be achieved in a physical way by providing the displayed protein with an N-terminal tag sequence allowing capture of only phages with non-cleaved proteins [106]. Alternatively, the ability of phages to re-infect bacteria can be directly linked to the protease resistance of the protein of interest [107, 136]. This second alternative takes advantage of the modular structure of the protein domains responsible for phage infection. If the displayed domains are inserted between the carboxy-terminal domain of gp3 and the amino-terminal domains N1 and N2, which are required for phage infectivity, proteolytic cleavage of the displayed domain renders the phage noninfectious. This was inspired by the so-called selectively infective phage method (SIP), where it was shown that the g3p domains could be interrupted by additional domains and even an interacting pair [138]. By employing a phage system derived from the SIP technique, which lacks any wild-type g3p, it is assured that infectivity is completely abolished upon proteolytic cleavage [107] (Figure 9). Alternatively, phages can be engineered to make the remaining wild-type g3p itself susceptible to the proteolytic attack [136]. The principle of protease selection can also be
44 Engineering Proteins for Stability and Efficient Folding
Fig. 9 Strategies for selecting for improved protein stability and folding by phage display. (a) Methods utilizing the binding to a given target (affinity selection) as a means for selecting members with improved folding behavior and higher stability from a protein library. Several types of selection pressure can be applied in order to recover mutants with enhanced stability. Destabilizing mutations are deliberately introduced (top) to render the protein “nonfunctional,” and alternative stabilizing mutations are identified by selecting variants whose functionality is regained. Alternatively (bottom), elevated temperatures or denaturing agents can be used to render most of the protein variants “nonfunctional,” allowing the selection of simply the “fittest” members. (b) The resistance of the target protein to a protease can be combined with the ability of phages
to re-infect bacteria. The protein library is inserted between the C-terminal domain and the N-terminal domain in all copies of the g3p protein (cloned into the phage genome), which are needed for reinfection of bacterial cells. By proteolytic cleavage of the inserted protein, these domains are cut off and phage infectivity is lost. In alternative approaches, an N-terminal tag sequence is fused to the protein of interest instead of the N-terminal g3p domains. Upon proteolytic cleavage, the tag is lost. In contrast, phages presenting proteins that are resistant to proteolytic cleavage can subsequently be captured on an affinity matrix binding to the tag sequence. In order to further increase the selection threshold, the shown selection strategies can also be combined.
5 Conclusions
applied to in vitro selection techniques in which the displayed protein is freely accessible [120]. Even though proteolysis seems at first glance to be less correlated to the stability of proteins than temperature, it is suitable for optimizing packing of the hydrophobic core [139] as well as for optimizing electrostatic interactions on the protein surface [59]. One reason for the strength of this approach is based on the fact that proteolytic cleavage is an irreversible reaction, which is not necessarily the case for denaturation by temperature or denaturing agents. Furthermore, protease resistance monitors the flexibility of the polypeptide chain rather than complete denaturation. It is therefore capable not only of selecting against completely unfolded variants but also of detecting local unfolding events. Because sites of local unfolding often initiate the global unfolding process, it can be advantageous to remove such sites. Nevertheless, the effective cleavage of flexible parts of the protein restricts the method to proteins that do not have extended flexible regions in the native state. Furthermore, a selection against the primary recognition sequence of the protease is clearly a possible outcome of such experiments. None of the described methods can stand completely on its own. Many methods can easily be combined to increase the stringency of a selection. A combination of temperature stress and increasing amounts of denaturing agents may, for example, lead to higher flexibility in certain regions of the protein, thereby increasing the sensitivity of the subsequent proteolytic attack. Additionally, temperature and the concentration of denaturing agents allow a much tighter control of the selection pressure than do increasing concentrations of protease and thus open the possibility of a well-controlled gradual increase of selection stringency.
5 Conclusions
Several different approaches are now available for engineering proteins for enhanced biophysical properties. In many cases, few and specific mutations are sufficient to provide proteins of marginal stability with considerably stabilizing features. Thus, the challenge for protein engineering is to identify these positions in a given sequence among the vast number of possible changes and to correctly alter them. Each of the applied techniques has its own merits and bottlenecks. Instead of playing them off against each other, the future challenge will be to find ways to synergistically use them to improve a given molecular property. Evolutionary methods have the advantage of being much less biased by theoretical assumptions or working hypotheses. Additionally, proteins stable in new environments may be evolved, e.g., proteins that fulfill a given function in nonaqueous solutions or high concentrations of detergents. Because the biophysical principles in such environments are of a different nature, rational design has to rely on a much smaller empirical dataset. While rational approaches are likely to become more important as more structural and experimental data become available, notably also those from selection experiments, the large number of variations
45
46 Engineering Proteins for Stability and Efficient Folding
that are potentially able to improve the biophysical properties of a protein often still exceeds the experimentally accessible number. Moreover, because rational engineering has to rely on the available dataset, mutations that lie off the beaten track will rarely be identified. Nevertheless, selection experiments often identify the same “key” mutations in proteins. It is then useful to exploit this information and directly introduce such mutations. A “rational” analysis can also help to recombine important “key” mutations in selected clones and to reduce the effects of a selection-neutral genetic drift. Another combination of rational and combinatorial methods is the creation of “smart” libraries of variants. Library design that is based on such structural considerations and principles will therefore allow more accurate focusing of the selection on specific regions of interest and thereby increase the chances for success. Acknowledgements
The authors thank Drs. Daniela R¨othlisberger, Casim Sarkar and Annemarie Honegger for critical reading of the manuscript. B. S. was the recipient of a Kekule fellowship from the Fonds der Chemischen Industrie. References 1 TAVERNA, D. M. & GOLDSTEIN, R. A. (2002). Why are proteins marginally stable?. Proteins 46, 105–109. 2 ZAVODSZKY, P., KARDOS, J., SVINGOR & PETSKO, G. A. (1998). Adjustment of conformational flexibility is a key event in the thermal adaptation of proteins. Proc. Natl. Acad. Sci. U.S.A. 95, 7406–7411. 3 ARNOLD, F. H., GIVER, L., GERSHENSON, A., ZHAO, H. & MIYAZAKI, K. (1999). Directed evolution of mesophilic enzymes into their thermophilic counterparts. Ann. N.Y. Acad. Sci. 870, 400–403. 4 GOVINDARAJAN, S. & GOLDSTEIN, R. A. (1995). Searching for Foldable Protein Structures Using Optimized Energy Functions. Biopolymers 36, 43–51. 5 DILL, K. A. (1990). Dominant forces in protein folding. Biochemistry 29, 7133–7155. 6 PAPPENBERGER, G., SCHURIG, H. & JAENICKE, R. (1997). Disruption of an ionic network leads to accelerated thermal denaturation of D-glyceraldehyde-3-phosphate
7
8
9
10
11
dehydrogenase from the hyperthermophilic bacterium Thermotoga maritima. J. Mol. Biol. 274, 676–683. STERNER, R. & LIEBL, W. (2001). Thermophilic adaptation of proteins. Crit. Rev. Biochem. Mol. Biol. 36, 39– 106. ZLOTNICK, A. & STRAY, S. J. (2003). How does your virus grow? Understanding and interfering with virus assembly. Trends Biotechnol. 21, 536–542. JASWAL, S. S., SOHL, J. L., DAVIS, J. H. & AGARD, D. A. (2002). Energetic landscape of alpha-lytic protease optimizes longevity through kinetic stability. Nature 415, 343–346. FERSHT, A. R., BYCROFT, M., HOROVITZ, A., KELLIS, J. T., Jr., MATOUSCHEK, A. & SERRANO, L. (1991). Pathway and stability of protein folding. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 332, 171–176. J¨AGER, M. & PLU¨ CKTHUN, A. (1999). Domain interactions in antibody Fv and scFv fragments: effects on unfolding kinetics and equilibria. FEBS Lett. 462, 307–312.
References 12 WILLUDA, J., HONEGGER, A., WAIBEL, R., SCHUBIGER, P. A., STAHEL, R., ZANGEMEISTER-WITTKE, U. & PLU¨ CKTHUN, A. (1999). High thermal stability is essential for tumor targeting of antibody fragments: engineering of a humanized anti-epithelial glycoprotein-2 (epithelial cell adhesion molecule) single-chain Fv fragment. Cancer Res. 59, 5758– 5767. 13 EWERT, S., HUBER, T., HONEGGER, A. & PLU¨ CKTHUN, A. (2003). Biophysical properties of human antibody variable domains. J. Mol. Biol. 325, 531–553. 14 ELLGAARD, L., MOLINARI, M. & HELENIUS, A. (1999). Setting the standards: quality control in the secretory pathway. Science 286, 1882–1888. 15 MARTINEAU, P. & BETTON, J. M. (1999). in vitro folding and thermodynamic stability of an antibody fragment selected in vivo for high expression levels in Escherichia coli cytoplasm. J. Mol. Biol. 292, 921–929. 16 SCHULER, B. & SECKLER, R. (1998). P22 tailspike folding mutants revisited: effects on the thermodynamic stability of the isolated beta-helix domain. J. Mol. Biol. 281, 227–234. 17 PROBA, K., W¨ORN, A., HONEGGER, A. & PLU¨ CKTHUN, A. (1998). Antibody scFv fragments without disulfide bonds made by molecular evolution. J. Mol. Biol. 275, 245–253. 18 KNAPPIK, A. & PLU¨ CKTHUN, A. (1995). Engineered turns of a recombinant antibody improve its in vivo folding. Protein Eng. 8, 81–89. 19 W¨ORN, A. & PLU¨ CKTHUN, A. (1999). Different equilibrium stability behavior of ScFv fragments: identification, classification, and improvement by protein engineering. Biochemistry 38, 8739–8750. 20 STEIPE, B., SCHILLER, B., PLU¨ CKTHUN, A. & STEINBACHER, S. (1994). Sequence statistics reliably predict stabilizing mutations in a protein domain. J. Mol. Biol. 240, 188–192. 21 LEHMANN, M. & WYSS, M. (2001). Engineering proteins for thermostability: the use of sequence alignments versus rational design and
22
23
24
25
26
27
28
29
30
directed evolution. Curr. Opin. Biotechnol. 12, 371–375. KNAPPIK, A., GE, L., HONEGGER, A., PACK, P., FISCHER, M., WELLNHOFER, G., HOESS, A., WOLLE, J., PLU¨ CKTHUN, A. & VIRNEKA¨ S, B. (2000). Fully synthetic human combinatorial antibody libraries (HuCAL) based on modular consensus frameworks and CDRs randomized with trinucleotides. J. Mol. Biol. 296, 57–86. MOSAVI, L. K., MINOR, D. L., Jr. & PENG, Z. Y. (2002). Consensus-derived structural determinants of the ankyrin repeat motif. Proc. Natl. Acad. Sci. U.S.A. 99, 16029–16034. WANG, Q., BUCKLE, A. M., FOSTER, N. W., JOHNSON, C. M. & FERSHT, A. R. (1999). Design of highly stable functional GroEL minichaperones. Protein Sci. 8, 2186–2193. LEHMANN, M., LOCH, C., MIDDENDORF, A., STUDER, D., LASSEN, S. F., PASAMONTES, L., van LOON, A. P. & WYSS, M. (2002). The consensus concept for thermostability engineering of proteins: further proof of concept. Protein Eng. 15, 403–411. BINZ, H. K., STUMPP, M. T., FORRER, P., AMSTUTZ, P. & PLU¨ CKTHUN, A. 39 Engineering Proteins for Stability and Efficient Folding (2003). Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J. Mol. Biol. 332, 489–503. MAIN, E. R., XIONG, Y., COCCO, M. J., D’ANDREA, L. & REGAN, L. (2003). Design of stable alpha-helical arrays from an idealized TPR motif. Structure (Camb) 11, 497–508. STUMPP, M. T., FORRER, P., BINZ, H. K. & PLU¨ CKTHUN, A. (2003). Designing repeat proteins: modular leucine-rich repeat protein libraries based on the mammalian ribonuclease inhibitor family. J. Mol. Biol. 332, 471–487. FORRER, P., BINZ, H. K., STUMPP, M. T. & PLU¨ CKTHUN, A. (2004). Consensus design of repeat proteins. Chembiochem 5, 183–189. KOHL, A., BINZ, H. K., FORRER, P., STUMPP, M. T., PLU¨ CKTHUN, A. & GRU¨ TTER, M. G. (2003). Designed to be
47
48 Engineering Proteins for Stability and Efficient Folding
31
32
33
34
35
36
37
38
39
40
41
stable: crystal structure of a consensus ankyrin repeat protein. Proc. Natl. Acad. Sci. U.S.A. 100, 1700–1705. LEVITT, M., GERSTEIN, M., HUANG, E., SUBBIAH, S. & TSAI, J. (1997). Protein folding: the endgame. Annu. Rev. Biochem. 66, 549–579. BOHM, G. & JAENICKE, R. (1994). Relevance of sequence statistics for the properties of extremophilic proteins. Int. J. Pept. Protein Res. 43, 97–106. JAENICKE, R. & BOHM, G. (1998). The stability of proteins in extreme environments. Curr. Opin. Struct. Biol. 8, 738–748. VIEILLE, C. & ZEIKUS, G. J. (2001). Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability. Microbiol. Mol. Biol. Rev. 65, 1–43. van den BURG, B. & EIJSINK, V. G. (2002). Selection of mutations for increased protein stability. Curr. Opin. Biotechnol. 13, 333–337. MATSUMURA, M., SIGNOR, G. & MATTHEWS, B. W. (1989). Substantial increase of protein stability by multiple disulphide bonds. Nature 342, 291–293. PERRY, L. J. & WETZEL, R. (1984). Disulfide bond engineered into T4 lysozyme: stabilization of the protein toward thermal inactivation. Science 226, 555–557. PACE, C. N., GRIMSLEY, G. R., THOMSON, J. A. & BARNETT, B. J. (1988). Conformational stability and activity of ribonuclease T1 with zero, one, and two intact disulfide bonds. J. Biol. Chem. 263, 11820–11825. ZHANG, T., BERTELSEN, E. & ALBER, T. (1994). Entropic effects of disulphide bonds on protein stability. Nat. Struct. Biol. 1, 434–438. DOIG, A. J. & WILLIAMS, D. H. (1991). Is the hydrophobic effect stabilizing or destabilizing in proteins? The contribution of disulphide bonds to protein stability. J. Mol. Biol. 217, 389–398. MARTENSSON, L. G., KARLSSON, M. & CARLSSON, U. (2002). Dramatic stabilization of the native state of human carbonic anhydrase II by an engineered
42
43
44
45
46
47
48
49
50
51
disulfide bond. Biochemistry 41, 15867–15875. IVENS, A., MAYANS, O., SZADKOWSKI, H., JURGENS, C., WILMANNS, M. & KIRSCHNER, K. (2002). Stabilization of a (βα)8 -barrel protein by an engineered disulfide bridge. Eur. J. Biochem. 269, 1145–1153. BETZ, S. F. & PIELAK, G. J. (1992). Introduction of a disulfide bond into cytochrome c stabilizes a compact denatured state. Biochemistry 31, 12337–12344. IWAI, H. & PLU¨ CKTHUN, A. (1999). Circular beta-lactamase: stability enhancement by cyclizing the backbone. FEBS Lett. 459, 166–172. SCOTT, C. P., ABEL-SANTOS, E., WALL, M., WAHNON, D. C. & BENKOVIC, S. J. (1999). Production of cyclic peptides and proteins in vivo. Proc. Natl. Acad. Sci. U.S.A. 96, 13638–13643. CAMARERO, J. A., FUSHMAN, D., SATO, S., GIRIAT, I., COWBURN, D., RALEIGH, D. P. & MUIR, T. W. (2001). Rescuing a destabilized protein fold through backbone cyclization. J. Mol. Biol. 308, 1045–1062. NAGI, A. D. & REGAN, L. (1997). An inverse correlation between loop length and stability in a four-helix-bundle protein. Fold. Des. 2, 67–75. RUSSELL, R. J., FERGUSON, J. M., HOUGH, D. W., DANSON, M. J. & TAYLOR, G. L. (1997). The crystal structure of citrate synthase from the hyperthermophilic archaeon pyrococcus furiosus at 1.9 Å resolution. Biochemistry 36, 9983–9994. MACEDO-RIBEIRO, S., DARIMONT, B., STERNER, R. & HUBER, R. (1996). Small structural changes account for the high thermostability of 1[4Fe-4S] ferredoxin from the hyperthermophilic bacterium Thermotoga maritima. Structure 4, 1291–1301. MATTHEWS, B. W., NICHOLSON, H. & BECKTEL, W. J. (1987). Enhanced protein thermostability from site-directed mutations that decrease the entropy of unfolding. Proc. Natl. Acad. Sci. U.S.A. 84, 6663–6667. SRIPRAPUNDH, D., VIEILLE, C. & ZEIKUS, J. G. (2000). Molecular determinants of xylose isomerase thermal stability and
References
52
53
54
55
56
57
58
59
60
61
activity: analysis of thermozymes by site-directed mutagenesis. Protein Eng. 13, 259–265. PACE, C. N. (1992). Contribution of the hydrophobic effect to globular protein stability. J. Mol. Biol. 226, 29–35. LIM, W. A., HODEL, A., SAUER, R. T. & RICHARDS, F. M. (1994). The crystal structure of a mutant protein with altered but improved hydrophobic core packing. Proc. Natl. Acad. Sci. U.S.A. 91, 423–427. VENTURA, S., VEGA, M. C., LACROIX, E., ANGRAND, I., SPAGNOLO, L. & SERRANO, L. (2002). Conformational strain in the hydrophobic core and its implications for protein folding and design. Nat. Struct. Biol. 9, 485–493. OHMURA, T., UEDA, T., OOTSUKA, K., SAITO, M. & IMOTO, T. (2001). Stabilization of hen egg white lysozyme by a cavity-filling mutation. Protein Sci. 10, 313–320. CHEN, J. & STITES, W. E. (2001). Higher-order packing interactions in triple and quadruple mutants of staphylococcal nuclease. Biochemistry 40, 14012–14019. HENDSCH, Z. S. & TIDOR, B. (1994). Do salt bridges stabilize proteins? A continuum electrostatic analysis. Protein Sci. 3, 211–226. PACE, C. N. (2001). Polar group burial contributes more to protein stability than nonpolar group burial. Biochemistry 40, 310–313. MARTIN, A., SIEBER, V. & SCHMID, F. X. (2001). In-vitro selection of highly stabilized protein variants with optimized surface. J. Mol. Biol. 309, 717–726. LOLADZE, V. V., IBARRA-MOLERO, B., SANCHEZ-RUIZ, J. M. & MAKHATADZE, G. I. (1999). Engineering a thermostable protein via optimization of charge-charge interactions on the protein surface. Biochemistry 38, 16419–16423. SHOEMAKER, K. R., KIM, P. S., YORK, E. J., STEWART, J. M. & BALDWIN, R. L. (1987). Tests of the helix dipole model for stabilization of alpha-helices. Nature 326, 563–567.
62 PACE, C. N., SHIRLEY, B. A., MCNUTT, M. & GAJIWALA, K. (1996). Forces contributing to the conformational stability of proteins. FASEB J. 10, 75–83. 63 LOLADZE, V. V., ERMOLENKO, D. N. & MAKHATADZE, G. I. (2002). Thermodynamic consequences of burial of polar and non-polar amino acid residues in the protein interior. J. Mol. Biol. 320, 343–357. 64 MCDONALD, I. K. & THORNTON, J. M. (1994). Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 238, 777–793. 65 HENNIG, M., DARIMONT, B., STERNER, R., KIRSCHNER, K. & JANSONIUS, J. N. (1995). 2.0 Å structure of indole-3-glycerol phosphate synthase from the hyperthermophile Sulfolobus solfataricus: possible determinants of protein stability. Structure 3, 1295– 1306. 66 RAMACHANDRAN, G. N., RAMAKRISHNAN, C. & SASISEKHARAN, V. (1963). Stereochemistry of polypeptide chain configurations. J. Mol. Biol. 7, 95–99. 67 KIMURA, S., KANAYA, S. & NAKAMURA, H. (1992). Thermostabilization of Escherichia coli ribonuclease HI by replacing left-handed helical Lys95 with Gly or Asn. J. Biol. Chem. 267, 22014–22017. 68 TAKANO, K., YAMAGATA, Y. & YUTANI, K. (2001). Role of amino acid residues in left-handed helical conformation for the conformational stability of a protein. Proteins 45, 274–280. 69 STITES, W. E., MEEKER, A. K. & SHORTLE, D. (1994). Evidence for strained interactions between side-chains and the polypeptide backbone. J. Mol. Biol. 235, 27–32. 70 VEGA, M. C., MARTINEZ, J. C. & SERRANO, L. (2000). Thermodynamic and structural characterization of Asn and Ala residues in the disallowed II region of the Ramachandran plot. Protein Sci. 9, 2322–2328. 71 GUNASEKARAN, K., RAMAKRISHNAN, C. & BALARAM, P. (1996). Disallowed Ramachandran conformations of amino acid residues in protein structures. J. Mol. Biol. 264, 191–198.
49
50 Engineering Proteins for Stability and Efficient Folding
72 PAL, D. & CHAKRABARTI, P. (2002). On residues in the disallowed region of the Ramachandran map. Biopolymers 63, 195–206. 73 EWERT, S., HONEGGER, A. & PLU¨ CKTHUN, A. (2003). Structure-based improvement of the biophysical properties of immunoglobulin VH domains with a generalizable approach. Biochemistry 42, 1517–1528. 74 NIEBA, L., HONEGGER, A., KREBBER, C. & PLU¨ CKTHUN, A. (1997). Disrupting the hydrophobic patches at the antibody variable/constant domain interface: improved in vivo folding and physical characterization of an engineered scFv fragment. Protein Eng. 10, 435–444. 75 BRANDTS, J. F., HU, C. Q., LIN, L. N. & MOS, M. T. (1989). A simple model for proteins with interacting domains. Applications to scanning calorimetry data. Biochemistry 28, 8588–8596. 76 W¨ORN, A. & PLU¨ CKTHUN, A. (2001). Stability engineering of antibody single-chain Fv fragments. J. Mol. Biol. 305, 989–1010. 77 GOKHALE, R. S., AGARWALLA, S., FRANCIS, V. S., SANTI, D. V. & BALARAM, P. (1994). Thermal stabilization of thymidylate synthase by engineering two disulfide bridges across the dimer interface. J. Mol. Biol. 235, 89–94. 78 HUDSON, P. J. (1999). Recombinant antibody constructs in cancer therapy. Curr. Opin. Immunol. 11, 548–557. 79 PLU¨ CKTHUN, A., KREBBER, A., KREBBER, C., HORN, U., KNU¨ PFER, U., WENDEROTH, R., NIEBA, L., PROBA, K. & RIESENBERG, D. (1996). Producing antibodies in Escherichia coli: From PCR to fermentation. In Antibody Engineering (MCCAFFERTY, J., HOOGEN-BOOM, H. R. & CHISWELL, D. J., eds.), pp. 203–252. IRL Press, Oxford. 80 HONEGGER, A. & PLU¨ CKTHUN, A. (2001). The influence of the buried glutamine or glutamate residue in position 6 on the structure of immunoglobulin variable domains. J. Mol. Biol. 309, 687–699. 81 JUNG, S., SPINELLI, S., SCHIMMELE, B., HONEGGER, A., PUGLIESE, L., CAMBILLAU, C. & PLU¨ CKTHUN, A. (2001). The importance of framework residues H6, H7 and H10 in antibody heavy chains:
82
83
84
85
86
87
88
89
90
91
experimental evidence for a new structural subclassification of antibody VH domains. J. Mol. Biol. 309, 701–716. GLOCKSHUBER, R., MALIA, M., PFITZINGER, I. & PLU¨ CKTHUN, A. (1990). A comparison of strategies to stabilize immunoglobulin Fv-fragments. Biochemistry 29, 1362–1367. JUNG, S. H., PASTAN, I. & LEE, B. (1994). Design of interchain disulfide bonds in the framework region of the Fv fragment of the monoclonal antibody B3. Proteins 19, 35–47. W¨ORN, A. & PLU¨ CKTHUN, A. (1998). Mutual stabilization of VL and VH in single-chain antibody fragments, investigated with mutants engineered for stability. Biochemistry 37, 13120–13127. YOUNG, N. M., MACKENZIE, C. R., NARANG, S. A., OOMEN, R. P. & BAENZIGER, J. E. (1995). Thermal stabilization of a single-chain Fv antibody fragment by introduction of a disulphide bond. FEBS Lett. 377, 135–139. ARKIN, M. R. & WELLS, J. A. (1998). Probing the importance of second sphere residues in an esterolytic antibody by phage display. J. Mol. Biol. 284, 1083–1094. LOW, N. M., HOLLIGER, P. H. & WINTER, G. (1996). Mimicking somatic hypermutation: affinity maturation of antibodies displayed on bacteriophage using a bacterial mutator strain. J. Mol. Biol. 260, 359–368. STEMMER, W. P. (1994). Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, 389–391. ZHAO, H., GIVER, L., SHAO, Z., AFFHOLTER, J. A. & ARNOLD, F. H. (1998). Molecular evolution by staggered extension process (StEP) in vitro recombination. Nat. Biotechnol. 16, 258–261. AMSTUTZ, P., FORRER, P., ZAHND, C. & PLU¨ CKTHUN, A. (2001). in vitro display technologies: novel developments and applications. Curr. Opin. Biotechnol. 12, 400–405. FORRER, P., JUNG, S. & PLU¨ CKTHUN, A. (1999). Beyond binding: using phage
References
92
93
94
95
96
97
98
99
100
101
display to select for structure, folding and enzymatic activity in proteins. Curr. Opin. Struct. Biol. 9, 514–520. ZHAO, H. & ARNOLD, F. H. (1997). Combinatorial protein design: strategies for screening protein libraries. Curr. Opin. Struct. Biol. 7, 480–485. LEHMANN, M., PASAMONTES, L., LASSEN, S. F. & WYSS, M. (2000). The consensus concept for thermostability engineering of proteins. Biochim. Biophys. Acta 1543, 408–415. MATTHEAKIS, L. C., BHATT, R. R. & DOWER, W. J. (1994). An in vitro polysome display system for identifying ligands from very large peptide libraries. Proc. Natl. Acad. Sci. U.S.A. 91, 9022–9026. HANES, J. & PLU¨ CKTHUN, A. (1997). in vitro selection and evolution of functional proteins by using ribosome display. Proc. Natl. Acad. Sci. U.S.A. 94, 4937–4942. HE, M. & TAUSSIG, M. J. (1997). Antibody-ribosome-mRNA (ARM) complexes as efficient selection particles for in vitro display and evolution of antibody combining sites. Nucleic Acids Res 25, 5132–5134. ROBERTS, R. W. & SZOSTAK, J. W. (1997). RNA-peptide fusions for the in vitro selection of peptides and proteins. Proc. Natl. Acad. Sci. U.S.A. 94, 12297–12302. KURZ, M., GU, K., AL-GAWARI, A. & LOHSE, P. A. (2001). cDNA – protein fusions: covalent protein-gene conjugates for the in vitro selection of peptides and proteins. ChemBioChem 2, 666–672. TAWFIK, D. S. & GRIFFITHS, A. D. (1998). Man-made cell-like compartments for molecular evolution. Nat. Biotechnol. 16, 652–656. DOI, N. & YANAGAWA, H. (1999). STABLE: protein-DNA fusion system for screening of combinatorial protein libraries in vitro. FEBS Lett. 457, 227–230. SMITH, G. P. (1985). Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 228, 1315–1317.
102 GEORGIOU, G., STATHOPOULOS, C., DAUGHERTY, P. S., NAYAK, A. R., IVERSON, B. L. & CURTISS, R., 3rd. (1997). Display of heterologous proteins on the surface of microorganisms: from the screening of combinatorial libraries to live recombinant vaccines. Nat. Biotechnol. 15, 29–34. 103 BODER, E. T. & WITTRUP, K. D. (1997). Yeast surface display for screening combinatorial polypeptide libraries. Nat. Biotechnol. 15, 553–557. 104 JUNG, S. & PLU¨ CKTHUN, A. (1997). Improving in vivo folding and stability of a single-chain Fv antibody fragment by loop grafting. Protein Eng. 10, 959– 966. 105 HOESS, R. H. (2001). Protein design and phage display. Chem. Rev. 101, 3205–3218. 106 FINUCANE, M. D., TUNA, M., LEES, J. H. & WOOLFSON, D. N. (1999). Core-directed protein design. I. An experimental method for selecting stable proteins from combinatorial libraries. Biochemistry 38, 11604–11612. 107 SIEBER, V., PLU¨ CKTHUN, A. & SCHMID, F. X. (1998). Selecting proteins with improved stability by a phage-based method. Nat. Biotechnol. 16, 955–960. 108 FIELDS, S. & SONG, O. (1989). A novel genetic system to detect protein-protein interactions. Nature 340, 245–246. 109 BAI, C. & ELLEDGE, S. J. (1996). Gene identification using the yeast two-hybrid system. Methods Enzymol. 273, 331–347. 110 JOHNSSON, N. & VARSHAVSKY, A. (1994). Split ubiquitin as a sensor of protein interactions in vivo. Proc. Natl. Acad. Sci. U.S.A. 91, 10340–10344. 111 MICHNICK, S. W. (2001). Exploring protein interactions by interaction-induced folding of proteins from complementary peptide fragments. Curr. Opin. Struct. Biol. 11, 472–477. 112 WALDO, G. S. (2003). Genetic screens and directed evolution for protein solubility. Curr. Opin. Chem. Biol. 7, 33–38. 113 WIGLEY, W. C., STIDHAM, R. D., SMITH, N. M., HUNT, J. F. & THOMAS, P. J. (2001). Protein solubility and folding monitored in vivo by structural
51
52 Engineering Proteins for Stability and Efficient Folding
114
115
116
117
118
119
120
121
122
123
complementation of a genetic marker protein. Nat. Biotechnol. 19, 131–136. WALDO, G. S., STANDISH, B. M., BERENDZEN, J. & TERWILLIGER, T. C. (1999). Rapid protein-folding assay using green fluorescent protein. Nat. Biotechnol. 17, 691–695. PEDELACQ, J. D., PILTCH, E., LIONG, E. C., BERENDZEN, J., KIM, C. Y., RHO, B. S., PARK, M. S., TERWILLIGER, T. C. & WALDO, G. S. (2002). Engineering soluble proteins for structural genomics. Nat. Biotechnol. 20, 927–932. WURTH, C., GUIMARD, N. K. & HECHT, M. H. (2002). Mutations that reduce aggregation of the Alzheimer’s Aβ42 peptide: an unbiased search for the sequence determinants of Aβ amyloidogenesis. J. Mol. Biol. 319, 1279–1290. MAXWELL, K. L., MITTERMAIER, A. K., FORMAN-KAY, J. D. & DAVIDSON, A. R. (1999). A simple in vivo assay for increased protein solubility. Protein Sci. 8, 1908–1911. LESLEY, S. A., GRAZIANO, J., CHO, C. Y., KNUTH, M. W. & KLOCK, H. E. (2002). Gene expression response to misfolded protein as a screen for soluble recombinant protein. Protein Eng. 15, 153–160. MINARD, P., SCALLEY-KIM, M., WATTERS, A. & BAKER, D. (2001). A “loop entropy reduction” phage-display selection for folded amino acid sequences. Protein Sci. 10, 129–134. MATSUURA, T. & PLU¨ CKTHUN, A. (2003). Selection based on the folding properties of proteins with ribosome display. FEBS Lett. 539, 24–28. GU, H., YI, Q., BRAY, S. T., RIDDLE, D. S., SHIAU, A. K. & BAKER, D. (1995). A phage display system for studying the sequence determinants of protein folding. Protein Sci. 4, 1108–1117. KIM, D. E., GU, H. & BAKER, D. (1998). The sequences of small proteins are not extensively optimized for rapid folding by natural selection. Proc. Natl. Acad. Sci. U.S.A. 95, 4982–4986. KOWALSKI, J. M., PAREKH, R. N., MAO, J. & WITTRUP, K. D. (1998). Protein folding stability can determine the efficiency of
124
125
126
127
128
129
130
131
132
133
escape from endoplasmic reticulum quality control. J. Biol. Chem. 273, 19453–19458. SHUSTA, E. V., KIEKE, M. C., PARKE, E., KRANZ, D. M. & WITTRUP, K. D. (1999). Yeast polypeptide fusion surface display levels predict thermal stability and soluble secretion efficiency. J. Mol. Biol. 292, 949–956. SHUSTA, E. V., HOLLER, P. D., KIEKE, M. C., KRANZ, D. M. & WITTRUP, K. D. (2000). Directed evolution of a stable scaffold for T-cell receptor engineering. Nat. Biotechnol. 18, 754–759. BODER, E. T. & WITTRUP, K. D. (2000). Yeast surface display for directed evolution of protein expression, affinity, and stability. Methods Enzymol. 328, 430–444. RUAN, B., HOSKINS, J., WANG, L. & BRYAN, P. N. (1998). Stabilizing the subtilisin BPN pro-domain by phage display selection: how restrictive is the amino acid code for maximum protein stability?. Protein Sci. 7, 2345–2353. ZHOU, H. X., HOESS, R. H. & DEGRADO, W. F. (1996). in vitro evolution of thermodynamically stable turns. Nat. Struct. Biol. 3, 446–451. GOTO, Y. & HAMAGUCHI, K. (1979). The role of the intrachain disulfide bond in the conformation and stability of the constant fragment of the immunoglobulin light chain. J. Biochem. (Tokyo). 86, 1433–1441. AKANUMA, S., YAMAGISHI, A., TANAKA, N. & OSHIMA, T. (1999). Further improvement of the thermal stability of a partially stabilized Bacillus subtilis 3-isopropylmalate dehydrogenase variant by random and site-directed mutagenesis. Eur. J. Biochem. 260, 499–504. FARINAS, E. T., BULTER, T. & ARNOLD, F. H. (2001). Directed enzyme evolution. Curr. Opin. Biotechnol. 12, 545–551. ZHAO, H. & ARNOLD, F. H. (1999). Directed evolution converts subtilisin E into a functional equivalent of thermitase. Protein Eng. 12, 47–53. GIVER, L., GERSHENSON, A., FRESKGARD, P. O. & ARNOLD, F. H. (1998). Directed evolution of a thermostable esterase.
References Proc. Natl. Acad. Sci. U.S.A. 95, 12809–12813. 134 JERMUTUS, L., HONEGGER, A., SCHWESINGER, F., HANES, J. & PLU¨ CKTHUN, A. (2001). Tailoring in vitro evolution for protein affinity or stability. Proc. Natl. Acad. Sci. U.S.A. 98, 75–80. 135 JUNG, S., HONEGGER, A. & PLU¨ CKTHUN, A. (1999). Selection for improved protein stability by phage display. J. Mol. Biol. 294, 163–180. 136 KRISTENSEN, P. & WINTER, G. (1998). Proteolytic selection for protein folding using filamentous bacteriophages. Fold. Des. 3, 321–328.
137 FONTANA, A., POLVERINO DE LAURETO, P., De FILIPPIS, V., SCARAMELLA, E. & ZAMBONIN, M. (1997). Probing the partly folded states of proteins by limited proteolysis. Fold. Des. 2, R17–26. 138 KREBBER, C., SPADA, S., DESPLANCQ, D. & PLU¨ CKTHUN, A. (1995). Co-selection of cognate antibody-antigen pairs by selectively-infective phages. FEBS Lett. 377, 227–231. 139 FINUCANE, M. D. & WOOLFSON, D. N. (1999). Core-directed protein design. II. Rescue of a multiply mutated and destabilized variant of ubiquitin. Biochemistry 38, 11613–11623.
53
1
Protein-based Artificial Enzymes Ben Duckworth and Mark D. Distefano
University of Minnesota, Minneapolis, USA
Originally published in: Artificial Enzymes. Edited by Ronald Breslow. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31165-1
1 Introduction
The design of catalysts that rival the specificity and speed of natural enzymes is a challenging objective. Various approaches have been employed to create such molecules. Polypeptides are a logical choice of possible framework for catalyst design because they provide a simple means for generating complex structures. The catalyst structure can be changed by altering the corresponding gene and expressing the desired protein in bacteria. However, chemical methods can also be incorporated into such an approach to greatly expand the range of possible reactions that can be studied. In such a chemogenetic strategy, a small molecule-based catalyst is coupled to a protein scaffold devoid of catalytic activity. The small molecule provides intrinsic reactivity while the protein component controls specificity; the protein can also be used to tune or augment the reactivity of the small molecule catalyst. Such an approach provides enormous flexibility in the design of new catalytic materials. Early efforts in this field focused on using chemical modification to alter the type of reaction catalyzed by existing enzymes or to change their substrate specificity. Thiosubtilisin [1, 2], selenosubtilisin [3] and flavopapain [4] are all examples of proteins that manifest altered reactivity resulting from chemical modification. Other work with sta-phylococcal nuclease [5] and subtilisin [6, 7] employed chemical modification to modulate substrate selectivity. While that pioneering work served as the foundation for protein-based catalyst design, these efforts were, notably, based on protein frameworks that possessed native catalytic activity. Such proteins already contained highly evolved substrate binding sites and/or catalytic machinery. The present chapter focuses on the design of protein-based catalysts starting with polypeptides that have no intrinsic catalytic activity. While this is an enormous challenge when compared with approaches starting with existing enzymes, it opens Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein-based Artificial Enzymes
up a plethora of exciting possibilities; if it could be accomplished, the ability to create catalysts with enzyme-like properties would have dramatic effects that would extend well beyond the field of chemistry. Several recent articles provide a more comprehensive review of the field of protein design, including work with existing enzymes [8–19]. 2 Artificial Nucleases Based on DNA and RNA Binding Proteins 2.1 Introduction
The process of cleaving the phosphodiester bond of RNA and DNA is a vital and widespread phenomenon in nature. Scission of the nucleic acid backbone plays crucial roles in myriad in vivo and in vitro processes, including mutational repair, cell death, cloning, and sequencing. Much attention has been given to the discovery and development of nucleases, both natural and artificial, which might benefit the fields of biotechnology and pharmacology [20]. Several chemical agents possess nuclease activity. However, such compounds have low affinity for oligonucleotides and may not be optimally positioned for attack and scission of the backbone. Therefore, if a chemical nuclease could be incorporated into a DNA or RNA binding protein, substrate binding and nuclease activity would be enhanced. 2.2 Artificial Nucleases from Native Protein Scaffolds
To create an artificial nuclease, Sigman and co-worker enhanced the nuclease specificity of 1,10-phenanthroline(OP)-copper by covalently linking the moiety to the Escherichia coli trp repressor (TrpR) [21]. OP is a robust chemical nuclease that cleaves the phosphodiester bonds of nucleic acids. A copper-oxo species attacks the C-1H of the deoxyribose in the minor grove, yielding 3 - and 5 -phosphomonoesters, free base, and 5-methylene furanone (5-MF) (Figure 1) [22, 23]. This makes it possible to convert DNA-binding proteins into nucleases by conjugation with OPcopper. Since most DNA/RNA binding proteins bind the major groove, it is crucial for the OP-copper group to interact solely with the minor groove to avoid disruption of key binding interactions. To conjugate E. coli trp repressor with OP-copper, the four lysine amino groups were first converted into sulfhydryl groups by reaction with 2-iminothiolane hydrochloride in the presence of the corepressor L-tryptophan (Figure 2) [21, 24, 25]. The thiols were then alkylated with 5-(iodoacetamido)-1,10-phenanthroline (IOP, Figure 2-1), yielding the TrpR-OP conjugate. DNase I footprinting experiments indicated that the DNA-binding ability of TrpR was not jeopardized after modification with OP. Early experiments involved the incubation of trpEDCBA operator with OP-TrpR copper complex for 20 h, which resulted in double-strand scission as well as single-strand nicks in 50% of the labeled strands [21].
2 Artificial Nucleases Based on DNA and RNA Binding Proteins
Fig. 1 Shortened chemical mechanism for DNA cleavage by 1, 10-phenanthroline-copper complexes.
Fig. 2 (A) Procedure for the attachment of 1,10-phenanthroline to proteins via lysine derivatization. (B) 5-(iodoacetamido)-1, 10-phenanthroline (IOP, 1) and 5-(iodoacetylglycylamido)-1, 10-phenanthroline (IAOP, 2).
3
4 Protein-based Artificial Enzymes
2.3 OP Nuclease Design by Mutagenesis and Chemical Modification
An additional approach to artificial nuclease design is cysteine mutagenesis followed by conjugation with the desired catalytic group. This avoids the need to convert amino acid residues into reactive thiols, while allowing precise control over the placement of the reagent within the DNA-binding protein. Sigman and coworkers applied this strategy to create an artificial nuclease using the bacteriophage λCro protein [26]. When complexed with DNA, the C-terminal arm binds within the minor groove and is close to the C-1 hydrogen of the deoxyribose on either DNA strand [26–30]. Therefore, an alanine close to the C-terminus was mutated to a cysteine and derivatized with IOP. After incubation with the Cro A66C-OP conjugate, 40% of the 17 base pair OR-3 operator site was cleaved within 10 min [11]. Thus, placement of the phenanthroline group close to the DNA substrate ensured efficient nuclease activity. Recently, the carboxy terminal domain of NarL, NarLC , was modified with 1,10phe-nanthroline. NarL is a response regulatory protein of E. coli and binds to a heptameric consensus sequence [31]. Two residues of the C-terminal domain, which were mutated and conjugated with IOP (NarLC K201C-IOP, NarLC K211CIOP), showed high site-specific cleavage efficiency of the top strand (Figure 3). When the mutated NarLC was modified with IAOP (Figure 2-2), a similar DNA
Fig. 3 Model of the NarLC -DNA complex showing the locations of amino acids 201 and 211. Two molecules of NarLC are positioned above the major groove of the DNA. Color scheme: Protein secondary structure (green), DNA (white), residue 211 (pink), residue 201 (orange).
2 Artificial Nucleases Based on DNA and RNA Binding Proteins
cleavage pattern was observed on the bottom strand. This IAOP conjugate contains a 4 Å longer linker arm than does the previous IOP conjugate, which allows the OP-Cu complex to extend and cleave the adjacent bases of the DNA substrate. These artificial cleavage enzymes could ultimately be used to identify the number, position and orientation of NarL monomers on several promoters and may be extended to other regulatory proteins. These results highlight the promise of chemical nucleases as tools for biological investigation. 2.4 Additional Applications for OP Conjugates
Other research groups have also created artificial nucleases by OP conjugation. Johnson and co-workers created an artificial nuclease, which aided in the understanding of DNA conformational changes upon protein binding [32]. E. coli Fis (factor for inversion stimulation) was chosen as the protein scaffold and binds to DNA with a low sequence specificity. X-Ray crystallography and electrophoretic mobility studies indicated significant bending of the DNA upon Fis binding [33, 34]. To aid in the study of DNA-Fis complexes, OP-Cu was conjugated to four separate mutated cysteines. Of these four conjugates, N98C-OP and N78C-OP displayed nuclease activity. Scission patterns from these two nucleases not only confirmed earlier studies of DNA bending, but also revealed that the DNA wrapped around the Fis dimer. The E. coli catabolite gene activator protein (CAP) binds to a 22 base-pair, twofold symmetric DNA recognition site with high affinity (K A = 4 × 1010 M−1 ) [35]. Ebright and co-workers successfully converted CAP into a site-specific DNA cleavage enzyme. In their first attempt, OP was conjugated to the only solvent accessible cysteine (C178 ) of CAP. The catalytic group was placed at this position so that intra-CAP interactions would not be disrupted, CAP-DNA contacts would not be comprised, and OP would be in close proximity for a favorable attack on the phosphodiester bond. The binding constant for this conjugate was 1 × 108 M−1 , indicating that the DNA binding ability of CAP was not significantly lost due to conjugation with OP. The CAP178-OP conjugate cleaved the DNA site at four adjacent nucleotides (Figure 4).
Fig. 4 DNA cleavage sites produced by the CAP178–1, 10-phenanthroline conjugate. Longer arrows indicate sites where greater cleavage efficiency occurred.
5
6 Protein-based Artificial Enzymes
Earlier work indicated that in the specific CAP-DNA complex the DNA was bent 90◦ away from the protein but was not distorted in the nonspecific complex. (CAP bound to its noncognate DNA site) [3, 36]. This DNA bending phenomenon was exploited to construct a new CAP-OP nuclease capable of single site cleavage [37]. The crystal structure of the specific CAP-DNA complex revealed that amino acids 24–26 and 89–91 of CAP were close to the DNA substrate [38]. With this in mind, residue 26 was mutated to a cysteine and modified with IOP. While the catalytic group neighbors the DNA in the specific complex, OP is too far from the substrate in the nonspecific complex, and thus cannot cleave the DNA. Ultimately, conjugation of OP at position 26 yielded an artificial nuclease that proceeded to ∼90% cleavage with no detectable nonspecific cleavage at distal sites.
2.5 A Fe-EDTA Artificial Nuclease
An alternative DNA-cleaving agent was used to identify the σ subunit DNA contact sites of E. coli RNA polymerase [39]. Iron-EDTA protein conjugates have previously been shown to cleave DNA by producing hydroxyl radicals, which ultimately attack the deoxyribose backbone [40]. To create this artificial nuclease, Minchin and co-workers mutated several residues within the σ subunit to cysteines. These residues were conjugated with (S)-1-[p-(bromoacetamido)benzyl]-EDTA (BABE) (Figure 5). By identifying the cleavage products of several related promoters, this Fe-BABE conjugate confirmed the location of the σ subunit-promoter DNA contact sites.
2.6 Concluding Remarks
Through direct conjugation of chemical nucleases to DNA binding proteins, several groups have successfully created artificial nuclease from proteins with no
Fig. 5 Procedure for the derivatization of proteins with (S)-1[p-(bromoacetamido)benzyl]-EDTA (BABE).
3 Catalysts Based on Hollow Lipid-binding Proteins 7
native nuclease ability. These nucleases can utilize the favorable protein–DNA binding elements and can position the hydrolytic metal close to the DNA backbone for efficient cleavage [41]. These nucleases have provided useful biological insights and a better understanding of the molecular details involved in protein–DNA interactions.
3 Catalysts Based on Hollow Lipid-binding Proteins 3.1 Lipid-binding Proteins
Lipid-binding proteins are a class of molecules found in eukaryotic cells involved in the transport of fatty acids and other types of hydrophobic compounds. The protein structure consists of two orthogonal planes of β-sheets that form a cup-shaped cavity that is capped off with a helix-turn-helix element. Considerable structural data, obtained from X-ray and NMR analysis, exists for this family of proteins [42, 43]. Of particular interest is the presence of a large, solvent sequestered, cavity whose overall volume varies between 500 and 1000 A3 depending on the exact identity of the protein; with the exception of certain membrane-bound channel proteins, there are few other examples of macromolecules that possess such a cavity. This structural feature serves as the ligand binding site in the protein cavity for a diverse range of molecules. Such a large, solvent sequestered, binding site also provides a useful scaffold for the design of catalysts. Formally, this cavity can be viewed as a protein equivalent to the cyclodextrin template used in much of the pioneering work of Breslow and co-workers in their development of enzyme mimics [18]. 3.2 Initial Work
In their initial work Distefano and co-workers used the protein ALBP (adipocyte lipid-binding protein) for their protein scaffold. This protein contains a unique cysteine residue a position 117 that can be selectively modified using reagents that capitalize on the unique reactivity of the thiol side chain. Kuang and co-workers developed a reagent, TP-PX (6-1) that contained an activated disulfide suitable for protein derivatization. This molecule was used to incorporate a PX moiety into ALBP at position 117, resulting in the formation of a construct denoted ALBP-PX (Figure 6, 6-2) [44]. This semisynthetic biocatalyst aminated reductively various α-keto acids (7-2) to ami-no acids (7-4) with 0 to 94% ee (Figure 7). Because these reactions were performed in the absence of any additional amine source, only single turnovers were obtained. The reaction rates were not, however, significantly faster than those involving free pyridoxamine (7-1b). This suggested that the protein
8 Protein-based Artificial Enzymes
Fig. 6 Derivatization reagents for the preparation of pyridoxamine conjugates of fatty acid binding proteins.
cavity functions as a chiral environment that controls the facial selectivity of the protonation of the aldimine intermediate without forming specific interactions with the bound pyridoxamine cofactor, which could accelerate the reaction, as confirmed by a X-ray crystal structure [45]. Modeling of the Schiff base complexes with several amino acids indicated that one face of the putative aldimine intermediate was protected against the approach of the solvent or buffer molecules that must be the proton source for the reaction given the lack of suitable functional groups within the cavity. This structural data provided a rationale for explaining the enantioselectivity observed in the ALBP-PX system. 3.3 Exploiting the Advantage of a Protein-based Scaffold
The next step in developing these protein-based catalysts was to take advantage of the synthetic flexibility provided by a protein scaffold. Since the protein is encoded by a DNA sequence, changes can be made in the gene that will result in modifications in the protein. Mutagenesis and other genetic methods can produce alterations in the protein framework much more rapidly than the organic synthesis
Fig. 7 Single turnover, reductive amination, reactions promoted by fatty acid binding protein–pyridoxamine conjugates.
3 Catalysts Based on Hollow Lipid-binding Proteins 9
required to make changes in synthetic scaffolds. Initially, the position of catalyst attachment within the cavity was varied to probe its effect on reaction rate and selectivity. For these experiments, a different fatty acid binding protein (IFABP) was employed. The wild-type IFABP contains no cysteine residues in its primary sequence and is, hence, a useful template for the introduction of single cysteine residues at different positions [46]. Cysteine residues were introduced at several positions, including V60 , L72 , and A104 , using site-directed mutagenesis. Figure 8 gives the locations of these mutations within the protein structure. The three different mutant proteins described above were purified and the corresponding conjugates were prepared using the TP-PX reagent (6-1). The conjugates were then evaluated in single turnover conditions (Figure 7) with several α-keto acids to produce amino acids in enantiomerically enriched form. This collection of catalysts exhibited various differences in reactivity and selectivity. Compared to ALBP-PX, IFABP-PX60 reacted at least 9.4-fold more rapidly, while IFABP-PX72 displayed opposite enantioselectivity, and IFABP-PX104 showed a clear selectivity preference for unbranched substrates [47]. From these experiments, the position of cofactor attachment is clearly an important parameter in modulating the reactivity and specificity of these biocatalysts. Moreover, these results underscore the utility of site-directed mutagenesis and the power of using a protein-based scaffold for catalyst development.
Fig. 8 Stereo view showing positions 60, 72, 104 and 117 in IFABP that have been used for the attachment of pyridoxamine and other catalytic groups. From top to bottom: A104, Y117, L72 and V60. Color scheme: Protein secondary structure (green), carbon (white), oxygen (red), nitrogen (blue).
10 Protein-based Artificial Enzymes
Fig. 9 Multiple turnover reactions that produce chiral amino acids from achiral keto acids.
3.4 Catalytic Turnover with Rate Acceleration
Due to its intriguingly more rapid reaction rate than ALBP-PX and free pyridoxamine, IFABP-PX60 was subsequently studied further [48]. Under single turnover conditions, it converted α-keto glutarate (Figure 9, 9-1) into glutamic acid (7-4) 62-fold faster than free pyridoxamine (7-1b); this was determined by evaluating the extent of conversion at much shorter reaction times. This more rapid reaction rate under single turnover conditions suggested that this construct might be capable of catalytic transamination. Using tyrosine or phenylalanine (9-2) as the amine source to recycle the cofactor from the pyridoxal form (7-3a) back to the pyridoxamine form (7-1), L-glutamate (9-3) was formed with an enantiomeric purity of 93% ee. As many as 50 turnovers were observed with long reaction times (14 days). Both kcat and K M for the reaction were determined by kinetic analysis (Table 1). Comparison with similar parameters obtained from reactions with free pyridoxamine indicated that IFABP-PX60 catalyzed transamination some 200 times more efficiently. Analysis of the specific kinetic constants kcat and K M indicated that the observed rate acceleration was due mostly to an increase in substrate binding Table 1 Kinetic constants and catalytic efficiencies for semi-synthetic transaminases based
on fatty acid binding proteins
PX MPX IFABP-PX60 IFABP-MPX60 hsIFABP-PX60 IFABP-PxK38 IFABP-MPxK38 IFABP-PxK51 IFABP-MPxK51 IFABP-Px126 IFABP-MPx126 IFABP-Px126/14 IFABP-MPx126/14
KM (mM)
kcat (h−1 )
kcat /KM (h−1 mM−1 )
73 38.7 1.8 6.8 10.2 0.81 13.7 0.24 8.9 5.5 40 6.0 48
0.032 0.031 0.29 0.23 0.22 0.44 1.12 0.44 0.52 0.18 1.10 0.11 0.78
4.4 × 10−4 8.0 × 10−4 0.16 0.034 0.022 0.54 0.08 1.83 0.06 0.03 0.03 0.02 0.02
3 Catalysts Based on Hollow Lipid-binding Proteins 11
(50-fold), with a smaller effect on the maximal rate (4-fold). While this is an impressive result, the absolute magnitude of kcat /K M (0.02 s−1 M−1 ) makes it clear that this catalyst is still quite primitive compared to natural enzyme systems that occasionally operate with catalytic efficiencies near the diffusion limit. 3.5 Modulation of Cofactor Reactivity with Metal Ions
Model studies with pyridoxamine and related analogs have shown that transamination reactions promoted by these catalysts can be accelerated by the addition of metal ions, including Zn(II), Cu(II), Ni(II) and Al(III). Hypothetically, these cations accomplish this by stabilizing the formation of Schiff base intermediates and by increasing the acidity of the protons that must be removed in the reaction. Interestingly, the addition of metal ions to reactions catalyzed by ALBP- and IFABP-pyridoxamine conjugates resulted in both positive and negative reaction rate perturbations [49]. IFABP-PX104 reacted 4.7-fold faster in the presence of Cu(II) whereas IFABP-PX60 reacted 4.4-fold slower – both rate effects were accompanied by a decrease in reaction enan-tioselectivity. Little change was observed for the reaction catalyzed by IFABP-PX72. UV/Vis spectroscopy experiments that monitored the formation of metal-aldimine intermediates suggested that IFABP-PX60 and IFABP-PX104 but not IFABP-PX72 formed a complex with Cu(II). Thus, metal ions may be used to increase semisynthetic enzyme efficiency, although this did not occur in all cases. These studies also noted that the reactions rates were sensitive to changes in buffer; the use of imidazole resulted in a lower rate than with similar reactions performed in HEPES buffer. These observations raise the possibility that buffer molecules may actually enter the protein cavity and directly participate in the reaction. This makes sense because several proton transfers must occur in a complete transamination reaction and yet there are no functional groups within the cavity that could do this. 3.6 Chemogenetic Approach
In the catalytic mechanism of pyridoxamine-based reactions, the pyridine nitrogen atom must be protonated; this protonation gives the pyridine ring a net positive charge that increases the acidity of the nearby benzylic protons and serves to stabilize a developing negative charge. One way protonation can be effectively enforced is by introducing a permanent positive charge via N-methylation. In some model systems, an N-methylated species accelerates the transamination rate up to 20-fold. In contrast, after reconstitution with N-methylpyridoxamine, alanine aminotransferase lost its activity almost completely (>99.8%). Therefore, N-quaternization may have both positive and negative influences on the reaction rate. To allow a similar modification to be made in these fatty acid binding protein systems, a new reagent containing N-methylpyridoxamine and an activated disulfide (TP-MPX, 6-3) was prepared. A conjugate, IFABP-MPX60 (6-4) was prepared [50] and, under
12 Protein-based Artificial Enzymes
catalytic conditions, significant reaction (2 turnovers, 41% ee) was observed using α-keto glutarate and phenylalanine as substrates. Kinetic analysis of the reaction showed that the MPX-containing conjugate gave higher K M (3.8-fold higher than the earlier PX construct) but similar kcat . This indicated that N-methylation had no positive effects on the reaction catalyzed by conjugates based on IFABP-V60 C. However, the MPX cofactor proved to be considerably more useful in subsequent experiments with IFABP variants containing lysine mutations (see below for further discussion of those proteins). In those studies, efforts were made to increase their catalytic efficiency by employing a genetic method to enforce N-protonation. Taking a cue from the crystal structure of AATase (aspartate amino transferase), an enzyme that catalyzes transamination using a pyridoxamine cofactor, carboxylatecontaining amino acid residues were introduced close to the pyridine nitrogen locus in two IFABP mutant proteins. It was thought, based on functional studies of AATase, that the positioning of anionic side chains near the pyridine nitrogen center would enforce protonation via an ion pairing mechanism; the presence of the proximal negative charge would favor the protonated pyridine. Several carboxylatecontaining IFABP mutants were prepared for this purpose. Unfortunately, none of these mutants were sufficiently stable for conjugate preparation – they precipitated during purification, and efforts to refold the denatured protein were unsuccessful. However, in separate experiments, the TP-MPX reagent (6-3) was used to prepare MPX-containing constructs instead (see 6-4 for a generic structure). Interestingly, these conjugates showed enhanced catalytic activity compared to their PX progenitors [51]. Kinetic studies indicated that the kcat (1.12 h−1 for IFABP-MPxK38 and 0.52 h−1 for IFABP-MPxK51) and turnover numbers (12.2 turnovers by IFABPMPxK38 and 5.7 by IFABP-MPx51 in 24 h) observed with these constructs under standard conditions are the highest achieved in this system (see Table 1 for a summary of kinetic parameters). The success of these constructs, prepared using a combination of chemical modification of the catalyst structure and genetic manipulation of the protein scaffold, highlights the enormous power and flexibility of this chemogenetic approach for catalyst development. 3.7 Adding Functional Groups within the Cavity
A major goal of research involving these protein scaffolds was to determine whether the flexibility of using a protein scaffold could be fully capitalized upon by using rational design in concert with knowledge of chemical mechanism to improve catalytic efficiency. Taking a cue from Nature, based on biochemical experiments with AATase, lysine residues were introduced into the protein cavity to enable Schiff base formation and to serve as general acids and bases in the reaction cycle. Figure 10 shows these functions in an abbreviated transamination mechanism. Based on the crystal structure of IFABP, molecular modeling was employed to identify possible positions where lysine residues could be introduced to perform the functions noted above. Mutants L38 K,V60 C and E51 K,V60 C were prepared and used to create pyridoxamine conjugates [52]. Figure 11 gives a model of IFABP-PXK51.
3 Catalysts Based on Hollow Lipid-binding Proteins 13
Fig. 10 Abbreviated mechanism for transamination reactions promoted by pyridoxal-based catalysts.
The resulting assemblies IFABP-PxK38 and IFABP-PxK51 showed improved kcat and K M . The overall catalytic efficiency (kcat /K M ) of IFABP-PxK51 increased 4200-fold compared to unliganded pyridoxamine phosphate and was 12-fold greater than for IFABP-PX60 while maintaining comparable enantioselectivity (83–94% ee). The principal effect on the kinetic constants for the reactions catalyzed by these mutants was on K M . Each conjugate showed a significant decrease in
Fig. 11 Stereo representation of a model of IFABP-PXK51. The overall protein structure is shown with C60 and the pyridoxal linked via a disulfide bond. The pyridoxal aldehyde group is bonded through a Schiff base to K51 . Color scheme: Protein secondary structure (green), carbon (white), oxygen (red), nitrogen (blue), sulfur (orange).
14 Protein-based Artificial Enzymes
K M (2.2- and 7.5-fold) and a small increase in kcat (1.5-fold). UV/Vis spectroscopy, fluorescence and electron-spray mass spectrometry verified the catalytic function (Schiff base formation) of the introduced lysine residue in the reaction process. Significantly, in a study of rate versus pH, IFABP-PX60 gave a rate that decreased monotonically with increasing pH while the lysine mutants exhibited a bell-shaped profile with a maximum rate near pH 7.5. Taken together, these results provide compelling evidence that the lysines participate directly in the reaction and that features of enzymatic processes, including covalent catalysis, can be mimicked successfully. Computer modeling together with mutagenesis was also used to identify existing residues in the protein cavity that contributed to the substrate specificity of the IFABP-PX60 catalyst. R126 and Y14 were identified as two important residues that interacted with the γ -carboxylate group of α-keto glutarate when bound to the enzyme via a Schiff base with the PX moiety (Figure 12). Mutants IFABPV60 C,R126 M and -V60 C,R126 M,Y14 F were prepared and pyridoxamine was attached to each mutant [53]. Of particular note, IFABP-PxM126 had a K M three-fold higher than IFABP-PX, while the second mutation at position 14 had no significant effect. The kcat s for both conjugates were 2-fold lower than for the original IFABP-PX60. N-methylated pyridoxamine conjugates were also introduced into these two mutants. Compared to the original IFABP-MPX60 conjugate described above, the
Fig. 12 Putative binding site for the sidechain carboxylate of the substrate "-keto glutarate in IFABP-PX60. Present are Y14 (upper left), R126 (upper right) and the Schiff base complex between "-keto glutarate and the pyridoxamine catalyst. (lower
center). Dashed lines indicate putative hydrogen bonds between the (-carboxylate of "-keto glutarate and the side-chain phenol of Y14 and the guanidinium group from R126 . Color scheme: Carbon (white), oxygen (red), nitrogen (blue), sulfur (orange).
3 Catalysts Based on Hollow Lipid-binding Proteins 15
mutations in positions 126 and 14 increased K M as much as seven-fold. IFABPMPxM126 gave a particularly significant result, manifesting a 5-fold higher kcat . While neither the experiments with the lysine mutants nor those with the carboxylate binding site mutants produced catalysts that rival the efficiency of natural enzymes, incremental improvements were observed with many of the conjugates. Moreover, these advances came from constructs that combined mutations in the protein scaffold with synthetic alterations in cofactor structure, thus highlighting the power of the chemogenetic approach. 3.8 Scaffold Redesign
While site-directed mutagenesis for the incorporation of new functional groups into the protein cavity is the most obvious type of genetic modification that can be performed with these systems, more global changes can be made in the scaffold using similar methods. For example, as noted above, the IFABP architecture consists of a β-barrel structure caped off with a α-helical lid. To examine whether entry or exit of substrates or products into or out of this closed cavity was limiting catalytic turnover, a helixless (hs) IFABP mutant was prepared by deleting 17 helical residues (15–31) from the N-terminus and replacing them with a dipeptide linker (Ser-Gly) using site-directed mutagenesis [54]. A previous NMR study showed that this mutant preserved the original β-sheet secondary structure without the α-helical lid and was still relatively stable [55]. The structures of IFABP and hsIFABP are compared in Figure 13; except for the absence of the α-helical lid, the overall protein fold is largely intact. Studies that investigated reactions with this conjugate (hsIFABP-PX60) revealed that in 24 h, approximately two turnovers were observed with a selectivity of 93% ee. A kinetics analysis indicated that kcat s were comparable for both IFABP-PX60 (0.20 h−1 ) and hsIFABP-PX60 (0.22 h−1 ), while the former exhibited a 4-fold increase in K M [56]. After introducing the MPX cofactor into the cavity of helixless protein, 4.9 turnovers in 24 h with 35% ee was observed for the production of Glu under the same conditions. Taken together, these results suggested that removal of the α-helical lid did not affect the maximal rate and enantioselectivity of the respective constructs. However, deletion of this structural element did decrease the substrate binding affinity, as evidenced by an increase in K M . Thus, it appears that while the hsIFABP scaffold was not useful in increasing the efficiency of PX-promoted transamination, these experiments did show that stable constructs could be prepared based on the helixless scaffold. This may have greater utility in reactions with larger cofactor catalysts or substrates. 3.9 Hydrolytic Reactions
Fatty acid binding proteins have also been used as scaffolds for constructing hydrolytic catalysts. Phenanthroline was attached to Cys117 of ALBP using
16 Protein-based Artificial Enzymes
Fig. 13 Comparison of the structures of IFABP and the helixless variant hsIFABP. Top: Stereo view of IFABP. Bottom: Stereo view of hsIFABP.
iodoacetamido 1,10-phenanthroline and metallated to produce a coordinated Cu(II) ion encapsulated within a chiral protein cavity (Figure 14). The resulting conjugate, ALBP-Phen-Cu(II) was able to promote the hydrolysis of several unactivated amino acid esters under mild conditions (pH 6.1, 25 ◦ C) at rates 32- to 280-fold above the background rate in buffered aqueous solution; these reactions also showed modest stereoselectivity [57]. In 24 h incubations, 0.70–7.6 turnovers were obtained with enantiomeric excesses ranging from 31 to 86% ee. ALBPPhen-Cu(II) could also catalyze the hydrolysis of an activated amide (picolinic
Fig. 14 Reaction used to prepare ALBP-Phen.
3 Catalysts Based on Hollow Lipid-binding Proteins 17
acid methyl nitroanilide, PMNA), under slightly more vigorous conditions (37 ◦ C) and after longer incubation times. The rate of amide hydrolysis was 1.6 × 104 fold higher than the background rate [57]. However, kcat obtained with ALBPPhen-Cu(II) was still significantly lower than that obtained with a related Cu(II) bipyridine complex [58]. This rate decrease may reflect a non-optimal, perhaps nonplanar, conformation for PMNA binding within the ALBP cavity. Of related interest, an X-ray crystal structure of ALBP-Phen was obtained. Inspection of this structure showed that the protein could not accommodate the phenanthroline and PMNA within a planar conformation without significant distortion of the protein backbone [45]. This may account for the lower than expected rate of PMNA hydrolysis promoted by ALBP-Phen-Cu(II) compared with non-proteinaceous model complexes. In analogy to the above experiments with pyridoxamine complexes, the 1,10phenan-throline ligand was introduced at several alternative sites (positions 60, 72 and 104, see Figure 8) using the protein scaffold IFABP [59]. Using alanine isopropyl ester as a substrate, IFABP-Phen60 catalyzed ester hydrolysis with less selectivity than ALBP-Phen, while Phen72 promoted the same reaction with higher selectivity. In contrast, hydrolysis of tyrosine methyl ester was catalyzed with higher selectivity by Phen60 and more rapidly by Phen104. These results indicated that the rate enhancement and substrate selectivity of hydrolysis reactions catalyzed by phenanthroline conjugates depended largely on the orientation and environment of the metal ligand within the protein cavity.
3.10 A Flavin-containing Conjugate
Reactions catalyzed by a flavin analog incorporated into IFABP have also been studied by alkylating a Cys residue within the cavity of the helixless variant (hsIFABP) [60]. The conjugate hsIFABP-FL catalyzed the oxidation of several dihydronicotinamides. These experiments were performed mainly to compare results obtained with flavin-IFABP conjugates with similar results acquired with flavopapains; these latter proteins were among the first semisynthetic enzymes produced. Interestingly, while hsIFABP-FL and flavopapain gave comparable rate accelerations (kcat /K M ) for dihydronicotinamide oxidation, hsIFABP-FL manifested much higher K M and kcat than did flavopapain. The low K M observed with flavopapain indicates that it primarily accelerates the reaction rate by enhancing substrate binding, whereas the higher kcat obtained with hsIFABP-FL suggests that its major mode of rate acceleration involves enhancing flavin reactivity. Molecular modeling of hsIFABP-FL indicates that Gln27 is close to N(3)H and O(4) of the flavin (Figure 15). Interaction between the carboxamide of Gln27 and the flavin is likely to alter the redox potential of the isoalloxazine and hence augment its reactivity. Taken together, these results with flavopapain and hsIFABP-FL highlight the diverse ways in which protein scaffolds can be used to modulate catalyst activity.
18 Protein-based Artificial Enzymes
Fig. 15 Hydrogen bonding to the flavin in hsIFABP-FL.
3.11 Some Limitations
While the examples described above highlight the impressive results that can be achieved with protein as scaffolds for catalyst design, this approach is not without problems. Firstly, there is the issue of cavity size. It would be useful to incorporate large metal–ligand systems into this protein system to generate catalysts for a plethora of interesting reactions. Unfortunately, the enclosed cavity of fatty acid binding proteins (and even the somewhat open cavity of hsIFABP) limits the size of what can be introduced into the protein interior. Efforts to incorporate a heme group or a Mn(III)-salen complex into hsIFABP were not successful. While it was possible to attach these moieties to the protein scaffold under denaturing conditions and refold them, the resulting conjugates were unstable and underwent aggregation and precipitation upon storage. In future work, it will be necessary to identify other protein structures that can serve as effective scaffolds but that are not subject to these size limitations. Secondly, work with fatty acid binding proteins is limited to aqueous systems. This creates two types of problems. The first is that catalytic systems that are sensitive to water cannot be employed. The Mn(III)salen complex described above underwent Schiff base hydrolysis and subsequent ligand decomposition upon prolonged storage in aqueous buffer. While the rate of this hydrolytic degradation can be decreased by employing mixtures of water and organic solvents, fatty acid binding proteins have limited solubility in such solvents. In addition, the inability to work in such solvents also limits the substrate concentrations that can be used and also precludes the use of these catalysts with many types of hydrophobic substrates. However, given the extensive interest and the large body of literature concerning the use of enzymes in organic media, it should be possible to identify other types of proteins that can serve as scaffolds for catalyst design in such solvent systems.
4 Myoglobin as a Starting Point for Oxidase Design 4.1 Artificial Metalloproteins and Myoglobin
Metalloproteins represent a major faction of scaffold proteins in artificial enzyme design. The introduction of a wide range of metals and redox cofactors into protein scaffolds can greatly increase both the diversity of enzyme catalysis and the
4 Myoglobin as a Starting Point for Oxidase Design
application of these artificial metalloproteins. To sequester artificial redox cofactors into protein scaffolds, which would otherwise not bind these chemical catalysts, one may either use non-covalent or covalent attachment [61]. Myoglobin is a wellstudied protein that contains a non-covalently bound heme for oxygen transport. The large heme-binding site provides substantial room to accommodate other ligand systems, and for these reasons serves as a useful starting point for catalyst design. 4.2 Non-covalent Attachment of a Redox Center
Watanabe and co-workers pursued an approach involving the non-covalent placement of Mn (III) and Cr (III) salophen complexes into apo-myoglobin [61]. In this artificial metalloprotein, two residues required mutation to improve the binding affinity of the cofactor and to allow increased access of substrates. The sulfoxidation activity of this Cr(III)-salophen myoglobin complex was studied and showed a sixfold rate increase over free Cr(III)-salophen in solution. The enantioselectivity of free salophen was 0%, while that of the myoglobin catalyst was 13%. While the ee is low for this artificial metalloprotein, Wantabe and co-workers were successful in demonstrating that asymmetric reactions can be accomplished using chiral protein cavities. 4.3 Dual Anchoring Strategy
To increase the enantioselectivity of these myoglobin metalloenzymes, Lu and coworkers have successfully utilized a covalent linkage approach [62]. In an earlier attempt a Mn(III)-salen complex was incorporated into apo-myoglobin by mutating residue 103 to cysteine, followed by modification with a methane thiosulfonate derivative of Mn(III)(salen) (Figure 16). This catalyst showed sulfoxidation activity; however, the ee was only 12%. As such a low ee might be a result of the ability of the bound ligand to exist in multiple conformations within the protein cavity, it was hypothesized that the rotational freedom of the salen complex could be limited if it was anchored at two separate locations, unlike the previous single-point attachment conjugate. The Mn(III)(salen) derivative was modeled into myoglobin to identify two positions that would yield the best fit into the heme pocket (Figure 17). In
Fig. 16 A Mn(III)(salen) derivatizing reagent used to prepare proteinMn(III)(salen) conjugates.
19
20 Protein-based Artificial Enzymes
Fig. 17 Structure of myoglobin complexed with a heme group. The two residues chosen for mutation, Y103 and L72 , are shown. Left: Side view of myoglobin. Right: Top view of myoglobin. Color scheme: Carbon (white), oxygen (red), nitrogen (blue), iron (pink).
addition to an increase in the rate of sulfoxidation of the substrate thioanisole, the enantioselectivity increased to 51% ee. These results indicate that the dual anchoring strategy may be applied to the design of other artificial metalloenzymes to obtain better enantioselective control. It may also be possible in future work to modulate substrate specificity as well. 5 Antibodies as Scaffolds for Catalyst Design 5.1 Antibodies as Specificity Elements
The immune system of higher eukaryotes produces antibody molecules that recognize a broad range of small molecules. By immunizing animals with specific antigens, produced by conjugating organic compounds to carrier proteins, antibodies that specifically recognize the original small molecule can be generated. This allows protein scaffolds that have binding sites tailored for a specific organic molecule to be produced. Figure 18 shows the hapten binding site of MOPC 315; an impressive number of specific contacts between the antibody and the nitrophenyl amide containing ligand can be seen. While the immunoglobulin-derived scaffolds are quite large, the ability to create high-affinity substrate binding sites without numerous iterations of design makes this an attractive approach for catalyst design. 5.2 Incorporation of an Imidazole Functional Group into an Antibody for Catalysis
The capability of antibodies to provide tailor-made binding sites for catalyst design has been exploited for the preparation of hydrolytic catalysts. The antibody MOPC 315 binds substituted 2,4-dinitrophenyl-containing compounds with association
5 Antibodies as Scaffolds for Catalyst Design 21
Fig. 18 Stereo view of the structure of the antibody MOPC 315 complexed with an amide-containing hapten. Only the region near the hapten binding site is shown. The hapten and Y34 are shown as larger sticks with the remaining residues in the bind-
ing site as smaller sticks. Residues shown include Y34 , W93 and W98 (from VL ) and W333 , H335 , Y339 , R350 , K359 , Y399 , Y401 and S405 (VH ). Color scheme: Carbon (white), oxygen (red) and nitrogen (blue).
constants ranging from 5 × 104 to 1 × 106 M−1 [63]. Imidazole was incorporated into this antibody via a thiol group introduced by chemically modifying K52H in the active site via a disulfide linkage (Figure 19). Using a series of coumarin esters as substrates, multiple (>10) hydrolytic turnovers were observed with no loss of activity. The catalytic efficiency, kcat /K M , for this reaction was over 103 times higher than for a model reaction employing 4-methylimidazole.
Fig. 19 Introduction of imidazole into the ligand binding site of MOPC 315 through chemical modification and site-directed mutagenesis.
22 Protein-based Artificial Enzymes
5.3 Comparison of Imidazole-containing Antibodies Produced by Chemical Modification and Site-directed Mutagenesis
Since an imidazole group can be incorporated into a protein through site-directed mutagenesis, such an approach was used to prepare a MOPC 315 mutant containing a histidine residue at the corresponding site. In brief, a hybrid Fv fragment of MOPC 315 was constructed by reconstituting a recombinant variable light chain (VL ) produced in E. coli with a variable heavy chain (VH ) from the antibody [64]. Imidazole was introduced into the combining site by substituting Y34 of VL with His using site-directed mutagenesis (see Y34 in Figure 18). This His mutant Fv catalyzed the hydrolysis of the 7-hydroxycoumarin esters of 5-(2,4-dinitrophenyl)aminopentanoic acid 90 000-fold faster than the reaction in the presence of 4-methyl imidazole at pH 7.8. Using that substrate, Fv(Y34HL ) turned over at least eleven times and retained 44% of its activity. The loss of activity probably resulted from the accumulation of the inhibitory reaction product. The mutant Fv bound ε-2, 4-DNP-L-lysine only six-fold less tightly than wild-type protein, suggesting that the substituted Tyr residue is not involved in the DNP recognition process. Compared to the imidazole-catalytic antibody generated by tethering an imidazole chemically, a sixteen-fold greater rate increase was observed for the reactions catalyzed by His mutant Fv protein under similar conditions. This rate enhancement is likely due to the fewer degrees of the freedom of the imidazole moiety inside the combining site (cf. 19-1 and 19-2). While these results alone indicate that the construct prepared by mutagenesis functioned more efficiently than the one obtained from chemical modification, this particular example does not exemplify the most powerful feature of chemical modification, i.e., the ability to incorporate non-natural functionality or spacers into complex protein structures. Considerable use will probably be made of this strategy in future catalyst design.
6 Conclusions
The examples described in this chapter reveal the breadth of methods now used to create protein-based catalysts. Structural analysis via X-ray and NMR techniques has proved to be critical for providing atomic structures that serve as the starting point for design. Computer modeling has become a powerful method in both the planning of new designs and in the interpretation of experimental results. Recombinant DNA techniques allow chemists to make mutations or more global changes in protein scaffolds. Together, these tools have allowed researchers to assemble new protein-based catalysts that can be used to probe biological systems or catalyze useful chemical transformations. While these catalysts do not in general rival the efficiency of natural enzymes, impressive rate accelerations have been obtained in some cases. New methods including phage display [65] and antibody production promise to provide greater access to ligand-specific scaffold development.
References
Peptide/protein ligation techniques offer the possibility of increasing the scope of functionality that can be incorporated into protein structures [66]. As the understanding of how enzymes promote chemical reactions with high efficiency increases, so too will the ability to design more efficient protein-based catalysts. Given the enormous power obtained from combining chemical and genetic methods it is likely that the field of protein-based artificial enzyme design will continue to grow in the near future.
References 1 K. E. NEET, D. E. KOSHLAND, Jr., Proc. Natl. Acad, Sci. U.S.A. 1966, 56, 1606–1611. 2 L. POLGAR, M. L. BENDER, J. Am. Chem. Soc. 1966, 88, 3153–3154. 3 Z. P. WU, D. HILVERT, J. Am. Chem. Soc. 1989, 111, 4513–4514. 4 H. L. LEVINE, Y. NAKAGAWA, E. T. KAISER, Biochem. Biophys. Res. Commun. 1977, 76, 64–70. 5 D. R. COREY, P. G. SCHULTZ, Science 1987, 238, 1401–1403. 6 P. BERGLUND, M. R. STABILE, M. GOLD, J. B. JONES, Bioorg. Med. Chem. Lett. 1996, 6, 2507–2512. 7 P. BERGLUND, G. DESANTIS, M. R. STABILE, X. SHANG, M. GOLD, R. R. BOTT, T. P. GRAYCAR, T. H. LAU, C. MITCHINSON, J. B. JONES, J. Am. Chem. Soc. 1997, 119, 5265–5266. 8 E. T. KAISER, D. S. LAWRENCE, Science 1984, 226, 505–511. 9 E. T. KAISER, D. S. LAWRENCE, S. E. ROKITA, Annu. Rev. Biochem. 1985, 54, 565–595. 10 C. F. MEARES, T. G. WENSEL, Acc. Chem. Res. 1984, 17, 202–209. 11 D. S. SIGMAN, T. W. BRUICE, A. MAZUMDER, C. L. SUTTON, Acc. Chem. Res. 1993, 26, 98–104. 12 I. M. BELL, D. HILVERT, Perspect. Supramol. Chem. 1994, 1, 73–88. 13 M. D. DISTEFANO, H. KUANG, D. QI, A. MAZHARY, Curr. Opin. Struct. Biol. 1998, 8, 459–465. 14 J. B. JONES, G. DESANTIS, Acc. Chem. Res. 1999, 32, 99–107. 15 D. HARING, P. SCHREIER, Naturwissenschaften 1999, 86, 307–312. 16 G. DESANTIS, J. B. JONES, Curr. Opin. Biotecnol. 1999, 10, 324–330.
17 C.-M. TANN, D. QI, M. D. DISTEFANO, Curr. Opin. Chem. Biol. 2001, 5, 696–704. 18 R. BRESLOW, S. D. DONG, Chem. Rev. 1998, 98, 1997–2011. 19 D. QI, C. M. TANN, D. HARING, M. DISTEFANO, Chem. Rev. 2001, 101, 3081–3111. 20 M. A. ZENKOVA 2004, Artificial Nucleases, Springer, Germany. 21 C. H. CHEN, D. S. SIGMAN, Science 1987, 237, 1197–1201. 22 M. M. MEIJLER, I. ZELENKO, D. S. SIGMAN, J. Am. Chem. Soc. 1997, 119, 1135–1136. 23 T. E. GOYNE, D. S. SIGMAN, J. Am. Chem. Soc. 1987, 109, 2846–2848. 24 R. JUE, J. M. LAMBERT, L. R. PIERCE, R. R. TRAUT, Biochemistry 1978, 17, 5399– 5406. 25 D. N. ARVIDSON, C. BRUCE, R. P. GUNSALUS, J. Biol. Chem. 1986, 261, 238–243. 26 T. W. BRUICE, J. G. WISE, D. S. E. ROSSER, D. S. SIGMAN, J. Am. Chem. Soc. 1991, 113, 5446–5447. 27 D. H. OHLENDORF, W. F. ANDERSON, R. G. FISHER, Y. TAKEDA, B. W. MATTHEWS, Nature 1982, 298, 718–723. 28 R. G. BRENNAN, Y. TAKEDA, J. KIM, W. F. ANDERSON, B. W. MATTHEWS, J. Mol. Biol. 1986, 188, 115–118. 29 R. G. BRENNAN, S. L. RODERICK, Y. TAKEDA, B. M. MATTHEWS, Proc. Natl. Acad. Sci. U.S.A., 1990, 87, 8165–8169. 30 P. LEIGHTON, P. LU, Biochemistry 1987, 26, 7262–7271. 31 G. XIAO, D. L. COLE, R. P. GUNSALUS, D. S. SIGMAN, Protein Sci. 2002, 11, 2427–2436. 32 C. Q. PAN, J. A. FENG, S. E. FINKEL, R. LANDGRAF, D. SIGMAN, R. C. JOHNSON,
23
24 Protein-based Artificial Enzymes
33
34
35
36
37 38 39
40 41 42
43
44
45
46
47
Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 1721–1725. D. KOSTREWA, J. GRANZIN, C. KOCH, H. W. CHOE, S. RAGHUNATHAN, W. WOLF, J. LABAHN, R. KAHMANN, W. SAENGER, Nature 1991, 349, 178–180. H. S. YUAN, S. E. FINKEL, J. A. FENG, M. KACZOR-GRZESKOWIAK, R. C. JOHNSON, R. E. DICKERSON, Proc. Natl. Acad. Sci. U.S.A. 1991, 88, 9558–9562. R. H. EBRIGHT, Y. W. EBRIGHT, P. S. PENDERGRAST, A. GUNASEKERA, Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 2882–2886. R. H. EBRIGHT, Y. W. EBRIGHT, A. GUNASEKERA, Nucl. Acids Res. 1989, 17, 10295–10305. P. S. PENDERGRAST, Y. W. EBRIGHT, R. H. EBRIGHT, Science 1994, 265, 959–962. S. C. SCHULTZ, G. C. SHIELDS, T. A. STEITZ, Science 253, 1001–1007. J. A. BOWN, J. T. OWENS, C. F. MEARES, N. FUJITA, A. ISHIHAMA, S. J. W. BUSBY, S. D. MINCHIN, J. Biol. Chem. 1999, 274, 2263–2270. T. D. TULLINS, B. A. DOMBROSKI, Proc. Natl. Acad. Sci. U.S.A. 1986, 83, 5469–5473. C. LIU, M. WANG, T. ZHANG, H. SUN, Coord. Chem. Rev. 2004, 248, 147–168. L. BANASZAK, N. WINTER, Z. XU, D. A. BERNLOHR, S. COWAN, T. A. JONER, Adv. Protein Chem. 1994, 45, 89–151. M. E. HODSDON, J. W. PONDER, D. P. CISTOLA, J. Mol. Biol. 1996, 264, 585–602. H. KUANG, M. L. BROWN, R. R. DAVIES, E. C. YOUNG, M. D. DISTEFANO, J. Am. Chem. Soc. 1996, 118, 10702–10706. J. J. ORY, A. MAZHARY, H. KUANG, R. R. DAVIES, M. D. DISTEFANO, L. J. BANASZAK, Protein Eng. 1998, 11, 253–261. H. KUANG, R. R. DAVIES, M. D. DISTEFANO, Bioorg. Med. Chem. Lett. 1997, 7, 2055–2060. H. KUANG, M. D. DISTEFANO, Protein Eng. 1997, 10, 25.
48 H. KUANG, M. D. DISTEFANO, J. Am. Chem. Soc. 1998, 120, 1072–1073. 49 D. QI, H. KUANG, M. D. DISTEFANO, Bioorg. Med. Chem. Lett. 1998, 8, 875–880. 50 H. KUANG, D. HARING, D. QI, A. MAZHARY, M. D. DISTEFANO, Bioorg. Med. Chem. Lett. 2000, 10, 2091–2095. 51 D. HARING, H. KUANG, L. J. BANASZAK, M. D. DISTEFANO, Protein Eng. 2002, 15, 603–610. 52 D. HARING, M. D. DISTEFANO, Bioconj. Chem. 2001, 12, 385–390. 53 D. HARING, M. D. DISTEFANO, Bioorg. Med. Chem. 2001, 9, 2461–2466. 54 K. KIM, D. P. CISTOLA, C. FRIEDEN, Biochemistry 1996, 35, 7553–7558. 55 R. A. STEELE, D. A. EMMERT, J. KAO, M. E. HODSDON, C. FRIEDEN, D. P. CISTOLA, Protein Sci. 1998, 7, 1332–1339. 56 D. QI 2003, Chemistry PhD Thesis, University of Minnesota, 140. 57 R. R. DAVIES, M. D. DISTEFANO, J. Am. Chem.Soc. 1997, 119, 11643–11652. 58 L. M. SAYRE, K. V. REDDY, A. R. JACOBSON, W. TANG, Inorg. Chem. 1992, 31, 935–937. 59 R. R. DAVIES, H. KUANG, D. QI, A. MAZHARY, E. MAYAAN, M. D. DISTEFANO, Bioorg. Med. Chem. Lett. 1999, 9, 79–84. 60 C-M. TANN 2003, Chemistry PhD Thesis, University of Minnesota, 201. 61 M. OHASHI, T. KOSHIYAMA, T. UENO, M. YANASE, H. FUJJI, Y. WATANABE, Angew. Chem. Int. Ed. 2003, 42, 1005–1008. 62 J. R. CAREY, S. K. MA, T. D. PFISTER, D. K. GARNER, H. K. KIM, J. A. ABRAMITE, Z. WANG, Z. GUO, Y. LU, J. Am. Chem. Soc. 2004, 126, 10812–10813. 63 S. J. POLLACK, P. G. SCHULTZ, J. Am. Chem. Soc. 1989, 111, 1929–1931. 64 E. BALDWIN, P. G. SCHULTZ, Science 1989, 245, 1104–1107. 65 R. H. HOESS, Chem. Rev. 2001, 101, 3205–3218. 66 P. E. DAWSON, T. W. MUIR, I. CLARK-LEWIS, S. B. H. KENT, Science 1994, 266, 776–779.
1
Force Generation in Molecular Motors George Oster University of California, Berkeley, USA
Hongyun Wang University of California, Santa Cruz, USA
Originally published in: Molecular Motors. Edited by Manfred Schliwa. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52730594-0
1 Introduction
Imagine living in a world where a Richter 9 earthquake raged continuously. In such an environment, engines would be unnecessary. You would not need to even pedal your bicycle: you would simply attach a ratchet to the wheel preventing it from going backwards and shake yourself forwards! At the scale of proteins, Brownian motion is even more furious, and proteins evolved to take advantage of this enormous supply of energy. Feynman showed that the familiar mechanical ratchet could not work in an isothermal environment, lest it violate the Second Law of Thermodynamics [1], and so motor proteins must employ a different strategy to convert random thermal fluctuations into a directed force: they use chemical energy via intermolecular forces to capture ‘favorable’ configurations. The way in which proteins do this is dictated by three factors: their size, the strength and range of intermolecular bonds at physiological conditions and the magnitude of the Brownian fluctuations that constitute their thermal environment. These determine the energy, length and time scales on which protein motors can operate. Roughly speaking, motor proteins trap thermal fluctuations in two ways: by biasing diffusion of small, angstrom-sized, steps (‘small ratchets’), and by rectifying nanometer-sized or larger, diffusional displacements (‘big ratchets’). For reasons that will become clear, biasing a sequence of small Brownian fluctuations is generally called a power stroke, while rectifying a large thermal displacement is called a Brownian ratchet. Said another way, Brownian ratchets move down free energy landscapes in steps much larger than kB T, while power strokes move in steps
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Force Generation in Molecular Motors
comparable to or smaller than kB T1) . The distinction is imprecise, but useful, since it delineates two extremes in the general mechanism by which proteins use intermo-lecular attractions to convert chemical energy into mechanical work.2) There are only a few motors that can be regarded as being pure power stroke motors or pure ratchets; most protein motors employ a combination of the two strategies. However, these ‘thoroughbreds’ are good illustrations of the principles. In fact, evolution has designed one protein that joins both ratchet and power stroke motors into one remarkable device: F 0 F 1 ATPase, or ATP synthase, the machine that manufactures the fuel that powers many other protein machines. We will use this protein as our running example. Our discussion will be mostly qualitative and heuristic. However, it is important to realize that the explanatory cartoons we use are supported by extensive calculations. Omitting them is akin to leaving out the ‘Materials and Methods’ section in an experimental paper: assertions without authority are simply opinions. The supporting quantitative arguments are, perforce, contained in the citations.
2 A Brief Description of ATP Synthase Structure
In order to discuss the workings of the F0 and F1 motors we give a brief account of their structure, which is summarized in Fig. 1. ATP synthase comprises two rotary motors acting in opposition, each operating on an entirely different physical mechanism. The F0 motor is contained in the membrane-bound portion and employs as its energy source a transmembrane electromotive force. The F1 motor is contained in the soluble portion and is driven by the hydrolysis of ATP. The F1 motor is constructed of a coiled-coil shaft (the γ -subunit) surrounded by a hexamer composed of alternating α and β subunits. Nucleotide binding sites nestle in the cleft between subunits; however, only three of the sites are catalytically active, the other three bind, but do not hydrolyze, ATP. The catalytic sites are contained mostly in the β-subunit, with a few—but crucial—residues contributed by the α subunit. The catalytic sites hydrolyze sequentially and drive the rotation of the γ shaft. Movies showing the motion of the hexamer and the rotation of the γ -shaft can be downloaded from a web site given in [1, 2]. The F0 motor is composed of two portions. Between 10 and 14 c-subunits (depending on the species and/or conditions) are assembled into a cylinder attached to the γ -shaft and ε-subunit. It interfaces with a second transmembrane assembly consisting of the a and b subunits. By convention, the γ -cn -ε assemblage is called the ‘rotor’, and the remainder of the protein (the α 3 β 3 hexamer and the δ-b2 -a complex) comprises the ‘stator’, although each rotates in the opposite direction. 1) The quantity kB T measures the thermal energy of Brownian motion; its value is 1 kB T ∼ 4.1 pN nm = 4.1 × 10−21 J at 298 K (25 ◦ C).
2) We have included a short Appendix with several simple examples that illustrate the difference between a ‘power stroke’ and ‘Brownian ratchet’.
3 The F1 Motor: A Power Stroke 3
Fig. 1 The structure of ATP synthase. The panel on the left shows a composite based on the pdb coordinates of the known subunits [23]; the right-hand panel is the corresponding structure represented schemat-
ically showing the relative rotation of the F0 and F1 motors and the direction of ion flow through the a-cn subunit interface. Subscripts denote the subunit stoichiometries.
3 The F1 Motor: A Power Stroke
The study of protein ATPases has led to a few generalizations that help us understand the mechanism by which these molecular motors convert the energy residing in the nucleotide γ -phosphate bond into a directed mechanical force. Ĺ At physiological conditions, the free energy of hydrolysis of one ATP is ∼ 20–24 kB T; of this, about 8–9 kB T is enthalpic, the balance being entropic. Ĺ Almost all nucleotide binding sites are nestled in the cleft between protein subunits. The nucleotide is grasped by loops emanating from a parallel β-sheet. Ĺ In many motors, the force-generating step is associated with the binding of nucleotide to the catalytic site. We propose that this is true of all ATPase motors. In particular, for the F1 motor this is the only way to reconcile all of the biochemical and mechanical measurements with its high mechanical efficiency. Ĺ After the force has been generated by ATP binding, an ATP is tightly bound in the catalytic site. The role of hydrolysis is to break the tightly bound ATP into two products and weaken the binding so the products can be released and the force-generating cycle can be repeated.
In the F1 motor, the ATP binding site lies asymmetrically in the cleft between the β and α subunits, the majority of the catalytic sites residing in the β subunit. The power stroke is accomplished by a hinge bending motion that swings the top of the β subunit down towards the bottom portion. The bending motion of the β subunit can be measured by the motion of helices B and C that emanate from the β sheet of
4 Force Generation in Molecular Motors
Fig. 2 The F1 power strokes are accomplished by the hinge bending motions of $ subunits, which are driven by ATP binding to the catalytic sites. (a) During the hinge bending motion, the top part of $ rotates about 30◦ toward the bottom part. This rotation closes the angle between helices B
and C. (b) The hinge bending motion of each $ subunit pushes on the off-axis section of the ( shaft. The rotation of the y shaft is driven by the coordinated hinge bending motions of all three $ subunits [1, 2, 5].
the catalytic site, as shown in the ribbon diagram in Fig. 2a. The bending of the (3 subunit by about 30◦ rotates the central γ shaft by pushing on its off-axis section, much like turning a crankshaft (Fig. 2b). The energy for the power stroke derives from the nucleotide hydrolysis cycle, which consists of four major steps: Binding
Hydrolysis
1
2 ADP Release
CS + ATP CS • ATP PI Release
3
CS • ADP + PI
CS • ADP • PI
CS + ADP
4
(1)
The nucleotide binding step 1 is the force-generating step and should more properly be expressed as a sequence of binding steps: CS + ATP CS • ATP . . . CS • ATP
(2)
Binding Transition
Here the progression from weak to tight binding is symbolized by the increasing size of the bonding symbol indicating the progressive annealing of 15–20 hydrogen bonds (and some hydrophobic interactions at the sugar end). The mechanism driving the hinge bending motion of the β is illustrated schematically in Fig. 3. Immediately after it diffuses into an open catalytic site, the ATP is bound only weakly. The catalytic site wraps around the ATP by forming more bonds which tightens the catalytic site and pulls down the top part of β toward the bottom part. The bonds between the ATP and the catalytic site are formed more or less sequentially, the formation of each bond corresponding to a small drop in binding
3 The F1 Motor: A Power Stroke 5
Fig. 3 The ATP binding transition from weak to tight proceeds as the catalytic site grasps the ATP in a grip of hydrogen bonds (the ‘binding zipper’). As binding progresses, the catalytic site closes up and pulls the top part of the $ subunit toward the bottom part. In this way, the binding free energy is converted to a power stroke
with nearly a constant force. During the power stroke some of the binding energy is stored in the elastic deformation of the p-sheet, which acts ike a spring. This energy is released during the unbending motion to aid product release and return the subunit to its open state.
free energy, that drives a small fraction of the hinge bending motion [1, 3, 4]. When all the bonds have been formed, the ATP is tightly bound in the catalytic site. The overall process from weak to tight binding is called the ‘binding transition’. During the binding transition, the ATP binding free energy is utilized efficiently to generate a bending motion with a nearly constant torque about the hinge region near the β-sheet. The binding transition has two important features [1, 4, 5]: Ĺ The binding free energy decreases (the binding becomes stronger) nearly monotonically and smoothly during the binding transition. This drives the hinge bending motion of the β subunit and consequently the γ shaft rotation in the hydrolysis cycle. In the reverse synthesis cycle, the unbending of the β subunit is driven by the γ shaft rotation in the opposite direction, powered by the F0 motor. When the top of the β subunit is forced up and away from the bottom portion, the
6 Force Generation in Molecular Motors
binding free energy increases (the binding becomes weaker). This progressively releases the nucleotide from the catalytic site [6]. Ĺ By the end of the binding transition, approximately 6–10 kB T of elastic energy is stored in the β-sheet whose loops grasp the nucleotide. Note that the catalytic site should be flexible but not elastic, lest it dissipate too much of the binding free energy by elastic recoil. To summarize, the power stroke is driven by progressive capturing of relatively small Brownian motions that anneal the nucleotide into the catalytic site. Product release is accomplished by using the free energy of hydrolysis to weaken the product binding so that thermal fluctuations can knock them out of the catalytic site. Thus Brownian motion drives both the power and exhaust strokes.
4 The F0 Motor: A Brownian Ratchet
The F0 motor is driven by the ion-motive force, µc across a bacterial or mitochondrial inner membrane. The ion-motive force consists of an ion concentration gradient and an electrical potential difference. In most organisms, the ion is a proton. However, much information has been gleaned from anaerobic bacteria whose F0 is driven by a sodium ion motive force. In either case, the transmembrane chemical potential difference in millivolts is given by: µc + = 2.3
kB T e
log10 [c + ] p − log10 [c + ]c +ψ
[mV ]
(3)
pH
where e is the electronic charge, [c+ ]p the ion (sodium or proton) concentration of the periplasm expressed in molars, [c+ ]c the ion concentration of the cytoplasm and ψ is the electrical potential difference across the membrane. Equation 3 is a thermodynamic relationship that implies that a concentration difference is equivalent to an electrical potential at equilibrium. Since motors operate out of equilibrium, this turns out not to be true in general. However, Eq. 3 points out the need for a mechanism for transforming both an entropic and electrical potential into mechanical work. The basic principle that accomplishes this can be illustrated by the ‘toy’ model shown in Fig. 4. 4.1 A Pure Brownian Ratchet
First, consider a rod with a linear array of negative charges (e. g. a DNA strand, etc.) that can freely diffuse through a hydrophilic pore embedded in a membrane as shown in Fig. 4a. If a potential, , is imposed across the membrane, then the rod will be propelled to the left and can perform work against a load force, FL . This can
4 The F0 Motor: A Brownian Ratchet
Fig. 4 Principle of the F0 motor. (a) A pure power stroke. A rod of negative electrica charges passes through a polar transmembrane pore. An electrical potential, imposed across the membrane drives the line of charges to the left. (b) A pure Brownian ratchet. The charged rod passes through a hydrophobic membrane pore. A high concentration of counterion charges (Na+ or H+ ) on the right can bind to the negative sites and neutralize them so that they can pass through the pore. A second positive ion on the left (not shown) neutralizes the membrane potential. The low concentration of counterions on the left ensures that
the bound charges dissociate but do not quickly rebind, so that the bare site cannot re-enter the pore. The attractive bonds between the rod charge and the aqueous solvent molecules rectify the rod’s diffusion. We call this a Brownian ratchet. (c) The power stroke and ratchet can be combined using an L-shaped ‘stator’. An aqueous input channel (colored white) connects via a polar channel (to the right) to the output reservoir The body of the stator is hydrophobic so that ions cannot leak across the membrane. With this design, the motor has both power stroke and ratchet components.
7
8 Force Generation in Molecular Motors
be considered as nearly a pure power stroke. The motion of the rod can be viewed as being driven by a driving potential, φ, tilted to the left in Fig. 4a. The slope of the driving potential gives the motor force driving the rod to the left: FM = −dφ/dx. The rod will stall when the motor force equals the load force: FM = FL . Because the rod is very small, its motion is stochastic (indicated in Fig. 4 by the Gaussian force labeled kB T). In fact, the net drift of the rod driven by the potential is deeply buried in its random motion: at any time the rod is only a tiny bit more likely to move to the left than to the right, and the mean square of the instantaneous velocity is several orders of magnitude larger than the net drift velocity. So the net movement down the potential gradient can only be detected by looking at correlations over many steps. This is illustrated by example 3 in the Appendix. 4.2 A Pure Power Stroke
Next, consider the situation in Fig. 4b where the rod passes through a hydrophobic pore. Moreover, there is a difference in the concentration of a positive counterion (e. g. sodium or protons) between the two sides of the membrane. The counterion can bind to and neutralize the negative charges (binding sites) on the rod. A difference in the concentration of a second ion (e.g. potassium), which cannot bind to the sites on the rod, neutralizes the membrane potential. A bare binding site is negatively charged and cannot move into the hydrophobic pore, for that would entail shedding its hydration shell at a considerable energetic cost.3) Because of the high concentration on the right side, the binding sites will be largely neutralized and so the rod can diffuse freely to the left through the pore. However, once a neutralized site has emerged from the pore to the left, the bound positive charge will quickly dissociate and leave the binding site unoccupied. Because of the low concentration on the left side, the binding site is likely to remain unoccupied, which prevents it from moving back into the pore. Thus diffusion to the right is rectified by the ion concentration difference, and the rod moves stochastically to the left, driven by a pure Brownian ratchet. The driving potential, φ, in this case looks like a staircase. When the concentration on the left is much lower than that on the right, each step of the staircase potential is much larger than kB T, so that reverse steps are unlikely. This is illustrated by example 4 in the Appendix. The load force, FL , opposing the motion has the effect of ‘tilting’ the driving potential so that the rod must diffuse uphill, against the load force. The two driving forces can be combined as shown in Fig. 4c. Here the membrane is horizontal and so the motor must be augmented by a fixed ‘stator’ assembly. The stator body is hydrophobic with two exceptions. First, there is a large aqueous channel that permits the ions from the high concentration reservoir to access the binding sites. Second, there is a polar channel connecting the aqueous channel 3) The energy cost of moving a negative charge into the pore is approximately G ∼ 45 kB T [22].
4 The F0 Motor: A Brownian Ratchet
Fig. 5 Operation of the F0 motor [7] (a) Simplified geometry of the sodium driven F0 motor showing the path of ions through the stator. This is the same arrangement as in Fig. 4c, but with the charge array wrapped around a cylinder that is free to rotate in the plane of the membrane with respect to the stator. In addition, a ‘blocking charge’ has been added to the stator to prevent the leakage of charge from the high concentration reservoir (periplasm) to the low concentration reservoir (cytoplasm). (b) Free energy diagram of one rotor site as it passes through the rotor–stator interface. Step 1 → 2, the rotor diffuses to the left, bringing the empty (negatively charged) site into the attractive field of the positive stator charge. 2 → 3, once the site is captured,
the membrane potential biases the thermal escape of the site to the left (by tilting the potential and lowering the left edge). 3 → 4, the site quickly picks up an ion from the input channel neutralizing the rotor. 4 → 5, an occupied site being nearly electrically neutral can pass through the dielectric barrier. If the occupied site diffuses to the right, the ion quickly dissociates back into the input channel as it approaches the stator charge. 5 → 6, upon exiting the stator the site quickly loses its ion. The empty (charged) site binds solvent and cannot pass backwards into the low dielectric of the stator. The cycle decreases the free energy of the system by an amount equal to the electromotive force.
to the low concentration reservoir. Since the aqueous channel is isopotential with the high concentration reservoir, this arrangement converts the transmembrane potential drop from vertical to horizontal. In this way the membrane potential and the ion concentration difference act in tandem to move the rod to the left, the former driving a power stroke and the latter driving a Brownian ratchet. The actual F0 motor works somewhat differently from the idealized version in Fig. 4c. Fig. 5a shows two more modifications that are required to make the arrangement resemble the sodium driven F0 motor of the bacterium P. modestum [7, 8]. The linear array of charged binding sites is first wrapped around a cylinder. This cylinder consists of 10–14 c-subunits, depending on the organisms and/or conditions. Second, a ‘blocking charge’ (R232) is present on the stator that prevents leakage of ions between the two reservoirs that would dissipate the ion gradient unproductively. The presence of this blocking charge alters the picture of the rotation of the rotor with respect to the stator substantially. Fig. 5b shows the potential experienced by one binding site of the F0 motor. The presence of the blocking charge creates an electrostatic potential well that attracts the rotor binding site as soon as it diffuses into the polar channel. Once trapped in the well, the rotor
9
10 Force Generation in Molecular Motors
depends on Brownian fluctuations to escape. However, the membrane potential biases escape by lowering the left side of the electrostatic well so that the rotor charge is much more likely to escape to the left than to the right. Once it escapes into the aqueous input channel, it is quickly neutralized by the positive ions so that it can move freely across the hydrophobic barrier. Note that the membrane potential only biases the Brownian fluctuations to the left, but does not actually drive a power stroke as in Fig. 4a. In summary, the F0 motor qualifies as a Brownian ratchet since it rectifies large thermal fluctuations (or long diffusions). The energy for rectification derives from the ion concentration gradient via the short range interactions between ions and the binding sites (binding and unbinding). It also uses the membrane potential to bias thermal fluctuations that release the rotor site from the attraction of the stator blocking charge: it takes less energy to hop out to the left than to the right. This might be thought of as a partial power stroke, so we see that the classification into power stroke and ratchet is largely a question of definition. An interesting class of ratchet motors is those that use the principle of trapping Brownian fluctuations during their assembly to perform a ‘1-shot’ motor task. Examples include the acrosome of Limulus and Thyone, the spasmoneme of Vorticella [9], and the polymerizaton of actin that propels la-mellipodial protrusion and certain intracellular parasites, such as Listeria and Shigella [10, 11]. An important corollary of the ratchet principle, and one that dramatically distinguishes molecular motors from other machines, is that using thermal fluctuations to go uphill in free energy amounts cools off the immediate environment. If the ratchet shown in Fig. 4b is coupled to work against a conservative load force, then the process is endothermic: heat is absorbed from surrounding fluid to increase the potential energy of the external agent that exerts the load force. By contrast, if the motor shown in Fig. 4a is coupled to work against a viscous load, then the process is exothermic: energy from the electrical potential goes to produce the drift velocity of the rod, which in turn is converted to heat by viscous friction. These microscopic thermal energy transactions lead to some surprising properties of molecular motors. For example, it is possible for the motor to perform more work on a viscous load than the free energy it derives from a reaction cycle! These counterintuitive properties are discussed in more detail in the references [5, 12, 13].
5 Coupling and Coordination of Motors
Most ATPase motors have more than one catalytic site. During the motor operation, each catalytic site hydrolyzes ATP and contributes to force generation. These catalytic sites generally do not operate independently, but act in concert with other catalytic sites. Two heads of a kinesin dimer are coordinated with each other to generate unidirectional motion and to ensure processivity. ATP synthase has three catalytic sites. AAA motors are hexamers of ATPases, sometimes stacks of two hexamers. The chaperonin Groel is a stacked pair of heptameric rings, each with seven
5 Coupling and Coordination of Motors
catalytic sites and the portal protein may even be a dodecameric ring of 12 ATPases. Generally, the catalytic sites must act in concert, either firing sequentially, as in F1 , or as in Groel, simultaneously in each ring, but alternating between the rings. This coordination is necessary for the proper operation of the motors, but how is it accomplished? In all cases, motor ATPase catalytic sites are too far apart to communicate in any other way than via elastic strain through the intervening protein structure. Although the details of strain coordination differ, a clue can be found in the correlation between the ADP release at one catalytic site and the ATP binding at another. In F1 , the strain-induced release of ADP arises from two possible sources. First, the γ shaft is asymmetric, so that at every rotational position it strains each catalytic site differently. Second, the intrinsic asymmetry of the protein structure allows the catalytic site to radiate strain differently to the two neighbor sites. The catalytic sites are located in the cleft between adjacent α and β subunits, with the majority of the binding residues in the β subunit [14, 15] This asymmetry allows ATP binding at one catalytic site to propagate a conformational change to the β portion of next catalytic site in the direction of motor rotation, but propagates a different conformational change to the β portion of the previous catalytic site. Thus, one catalytic site can affect the reactions on two neighboring catalytic sites differently. The conformational change directed to the β portion of the next catalytic site can lower the free energy barrier for ADP dissociation, readying it for the next hydrolysis cycle. Indeed this may be the primary structural feature determining the direction of rotation. Because of the unequal symmetry between the F0 and F1 motors, elastic coupling plays an additional role. The F1 motor has three catalytic sites and rotates in three major 120◦ steps, each with a brief pause at 90◦ . On the other hand, the F0 motor has a rotational symmetry that varies between 10 and 14, depending on the source. Therefore, there is no unique ‘stoichiometry’ between the two rotational motions. This symmetry mismatch is not a problem for the motors since the γ -shaft that couples them is torsionally flexible. In synthesis, this allows the stochastic F0 motor to deliver a smooth torque to F1 , which minimizes dissipation as the nucleotide is unzipped from the catalytic site [1, 16]. Thus elastic coupling between subunits of a protein motor provides the means for both coordinating the catalytic cycles and buffering the independent Brownian motions of the subunits. In walking motors the determination of directionality depends on asymmetric structural features of the heads that alternate depending on whether the head is leading or trailing. However, the situation is more complicated since each head has two binding partners: nucleotide and track. One proposal is that strain is generated by binding of the forward head to the track, triggers release of product from that head [17]. Also strain is relayed to the rear head via the connecting structure to lower the energy barrier for a particular step in the reaction cycle (for example, hydrolysis, or product release). This particular reaction step, in turn, triggers the release of rear head from the track [18]. Thus, binding to and release from the track are correlated with the relative positions of the heads. The alternating phases of the two hydrolysis cycles generate unidirectional motion [19].
11
12 Force Generation in Molecular Motors
6 Measures of Efficiency
We have discussed the molecular principles by which protein motors convert chemical bond energy into mechanical work. However, while the general principles may be the same for all motors, the detailed mechanisms are quite diverse. Therefore, it is frequently useful to determine the efficiency of a motor to provide clues as to its mechanism. The most common mechanical measurement performed on protein motors is to vary the load (the force resisting the motion) and measure the speed. In general, two kinds of load experiments are carried out on protein motors in order to determine their load–velocity behavior. In one type, a load is applied to resist the motor’s progress using a laser trap or the elastic stylus of an atomic force microscope. In this case the motor is working against a conservative force (i.e. derivable from a potential energy function) that depends only on the displacement of the motor, f = −∂φ/∂ξ ). A second, and generally more experimentally convenient method, is to vary the viscous resistance of the fluid environment of the motor. This is a dissipative force that depends on the motor velocity. The information gleaned from the two kinds of measurements yield different information [1, 12, 13]. The thermodynamic efficiency, ηTD is generally defined as the ratio of the work done by the motor to the energy input: ηT D =
f ·δ −G
(4)
where G is the free energy drop in one reaction cycle (e.g. from hydrolyzing one ATP, or passing one proton through the motor), and f ·δ. is the reversible work done by the motor against the conservative load force, f , in one reaction cycle. Here δ is the average distance covered per reaction cycle, sometimes called the ‘step size’. f ·δ is the energy output from the motor because it goes to increasing the potential energy of the external agent that exerts the conservative force. Thus, the thermodynamic efficiency is the ratio of energy output to energy input and it measures the energy conversion efficiency when the motor is operating reversibly i.e. ‘infinitely slowly’. For a motor working against a constant force (e.g. a laser trap force clamp), Eq. (4) can be generalized to the steady state [1, 13] by defining an efficiency: η≡
f ·v −G·r
(5)
Here r is the average reaction rate (e.g. hydrolysis cycles per second or average proton flux) and (v) is the average velocity. Strictly speaking, Eq. (5) is not a thermodynamic quantity since the steady state need not be the equilibrium state. Nevertheless, it is a well-defined quantity that is less than unity.
6 Measures of Efficiency
In general, the average step size, δ depends on the load force, f . When δ is independent of f we say the motor is tightly coupled. In that case, each reaction cycle is, on average, coupled to a fixed displacement (one motor step) regardless of the load force. A tightly coupled motor has two properties: Ĺ When the motor is stalled the chemical reaction is also stopped. Ĺ Increasing the load beyond the stall force drives the motor in the opposite direction and also reverses the chemical reaction.
For a tightly coupled motor, one can show that the stall force is given by f stall = −G/δ. At stall, the thermodynamic efficiency is 100%. When the motor is operating close to stall, the thermodynamic efficiency approaches 100%, regardless of the motor mechanism. Conversely, a high thermodynamic efficiency suggests only that the motor motion and the chemical reaction are tightly coupled [12]. Equation (5) applies only to the situation where the motor is working against a constant load. For macroscopic motors, inertia is important and the effect of Brownian fluctuations is negligible. Therefore, they tend to move at roughly a constant velocity (at least on short time scales), and so the friction force on the motor is approximately constant. In this case, the friction force can be treated effectively as a conservative force and the efficiency given by Eq. (5) is well defined. Thus, for a macroscopic motor, we generally do not have to worry whether it is working against a conservative force or a friction force. That is, for a macroscopic motor moving with roughly a constant velocity, both a viscous drag and a conservative load simply oppose the motor motion and they have approximately the same effect on the motor. For a protein motor, the situation is very different. Because protein motors are very small, the effect of inertia is negligible and the motor is driven mostly by the random Brownian force. The instantaneous velocity changes direction very rapidly and its absolute value is several orders of magnitude larger than the average velocity. Consequently, the drag force on the motor is stochastic and cannot be treated as a conservative force. The effect of a conservative load force on the motor is different from that of a viscous drag force that simply opposes the motor motion in any direction and whose magnitude increases with the velocity.4) Therefore, for protein motors, it is necessary to distinguish the case where the motor is loaded with a conservative force and the case where the motor is loaded with a viscous drag. When a protein motor works against a viscous load, the thermodynamic efficiency defined above does not apply. A commonly employed measure of efficiency in this case is the Stokes efficiency, defined by replacing f in Eq. (5) with the average viscous 4) A conservative load force tends to drive the motor backwards (opposite to the positive motor direction) and its magnitude is independent of the velocity. A viscous drag simply damps the motor velocity, while the Brownian force and/or the chemical reaction excite the veloc-
ity. If the chemical reaction is halted, the viscous drag will relax the motor to thermal equilibrium with the surrounding fluid, while the conservative force will drive the motor in the opposite direction.
13
14 Force Generation in Molecular Motors
drag force, f D = ζ v. Here ζ = kB T/D is the drag coefficient and D is the diffusion coefficient, which can be computed or measured independently [12]: η Stokes =
ζ vδ , −G
or
η Stokes =
ζ v2 −G·r
(6)
Although ζ v2 has the dimension of energy per unit time, ζ v2 is not the rate of the work done by the motor motion on the fluid medium5) . Indeed, it is not clear what energy per unit time ζ v2 measures. However, since ζ v2 increases with v, the quantity ζ v2 does measure some aspect of the motor’s mechanical performance. So the Stokes efficiency is the ratio of this ‘mechanical performance’ index to the energy supply. Of course, for Eq. (6) to be a valid measure of efficiency, it has to satisfy ηStokes ≤ 100 %. This is true, but the proof is not trivial. One can show that Eq. (6) measures how close the motor comes to delivering a constant force [1, 12, 13]. Given the free energy supply — G, the maximum average velocity is achieved if – G is utilized to generate a roughly constant driving force (independent of motor position). When the driving force is constant, the Stokes efficiency is 100%. When the Stokes efficiency is close to 100%, the driving force is close to a constant force [13]. A high Stokes efficiency for the F1 motor was one of the factors that implicated the progressive binding of ATP as the force-generating process, for only in this fashion could a constant torque be generated [1]. To summarize, if the thermodynamic efficiency, ηTD , is close to 1, this means that the motor motion is tightly coupled to the chemical reaction. However, when the Stokes efficiency, ηStokes , is close to 1, this means that the motor is delivering a nearly constant force. Since the Stokes and thermodynamic efficiencies measure different properties of the motor, it is worthwhile to measure both efficiencies experimentally.
7 Discussion
The basic principle underlying force generation in molecular motors seems at first glance pretty simple: proteins use short range attractive intermolecular forces to bias small thermal fluctuations or to rectify larger diffusions. The former we call ‘power strokes’, the latter ‘Brownian ratchets’. But the details are all important, and they are devilishly diverse. At some level of abstraction the operation of a molecular motor can be viewed as the motion of a point moving down a multidimensional free energy surface [1, 20, 21]. Indeed, many models start from this viewpoint and try and deduce what the laws of physics can say about the general properties of 5) ζ v2 should not be confused with ζ v2 . By the equipartition theorem of statistical meachanics, ζ v2 equals ζ kB T m−1 at equilibrium,
where m is the motor mass. ζ v2 is several orders of magnitude larger than ζ v2 .
A1 Example Models to Illustrate the Difference between Ratchets and Power Strokes 15
the surface. While useful from the viewpoint of theory, models at this level of abstraction are likely to be unsatisfying to biologists who seek a more mechanistic understanding, akin to how an engineer would understand the design and operation of an automobile motor. This involves dealing with the details of protein structure. The study of molecular motors, more than most other areas in biology, draws on a diversity of fields. Biochemistry elucidates the kinetics of the energy supplying reactions and thermodynamics establishes constraints on the energetic transactions. Mutation studies isolate the key functional amino acids and mechanical measurements provide criteria that circumscribe possible mechanisms. Finally, microscopy and X-ray crystallography provides the sine qua non structures, for it is nearly impossible to deduce how a machine works without knowing what it looks like. However, all of these studies combined cannot produce a mechanistic theory of how motor proteins work; this requires the unifying power of mathematical modeling. For until the operation of a motor can be formulated as equations and solved, our knowledge is, at best, qualitative and uncertain, hardly better than a plausible cartoon that may, or may not, obey the laws of chemistry and physics and in any event cannot be compared quantitatively with experiments.
Appendix
A1 Example Models to Illustrate the Difference between Ratchets and Power Strokes
Here we discuss four model examples to illustrate the role of Brownian fluctuations and the classification of power strokes and Brownian ratchets. For simplicity, we consider the one-dimensional motion of an object. A1.1 Example 1: A power stroke without Brownian fluctuations
Suppose the object is deterministically moved forward a fixed unit distance, x, per unit time, t (e.g. 0.01 nm in 1 µs). The trajectory of the object is shown in Figure A1a. The object moves forward uniformly with a velocity of (x)/(t). This model example is a pure power stroke without Brownian fluctuations. It is mostly relevant for macroscopic motors. Because of the large inertia, macroscopic motors tend to move at a roughly constant velocity and the effect of Brownian fluctuations is negligible. For protein motors, because of the small size, at room temperature the motion is dominated by Brownian fluctuations. One may argue that this is a hypothetical example of a power stroke protein motor at zero temperature.
16 Force Generation in Molecular Motors
Fig. A1 Trajectory of the object for model 1. The object moves forward uniformly with an average velocity of v = x/t = x/1. (b) Trajectory of the object for model 2. The object is driven by a constant force and is subject to Brownian fluctuations. The average velocity of the object is v = x/t. (c) Trajectory of the object for model 3. The object is moved by Brownian fluctuations, and the net drift comes from biasing fluctuations. The average velocity of the object is v = 0.92·x/t. The free energy consumption per unit length is the same as
that in Example 2. Examples 2 and 3 have similar statistical behaviors, and so it is experimentally difficult to distinguish between them. (d) Trajectory of the object for model 4. The object is moved by Brownian fluctuations, and the net drift comes from rectifying large fluctuations. The free energy consumption per unit length is the same as that in Examples 2 and 3 but the average velocity of the object is v = 0.1 x/t, significantly lower than that of Examples 2 and 3.
Since protein motors can only function in a certain temperature range and in a certain fluid medium, this model is not really relevant for protein motors, and so we must modify it to incorporate Brownian fluctuations. A1.2 Example 2: A power stroke with Brownian fluctuations
Suppose, in addition to the deterministic unit forward displacement per unit time, in each unit time the object makes n = 10 independent random moves. Each random move is one unit displacement either forward or backward with equal probability, p (e. g. flipping a fair coin: p = 0.5). This random motion is added to simulate
A1 Example Models to Illustrate the Difference between Ratchets and Power Strokes 17
Brownian fluctuations. The trajectory of the object is shown in Fig. A1b. The average velocity of the object is v = x/t, but the motion is stochastic. If we view the displacement in each unit time as the ‘instantaneous velocity’ for that unit time, then the instantaneous velocity is much larger than the average velocity. The inset shows the details of the trajectory; it is evident that the deterministic drift is buried in the random motions and can be detected only by looking at long time correlations. The number of random moves per unit time is proportional to the diffusion coefficient, D, of the object: 2Dt = n(x)2 . This allows us to calculate the diffusion coefficient from which the drag coefficient, ζ , is computed from the Einstein relation: ζ = kB T/D. In this example, the constant force driving the deterministic forward motion is f = ζ v and the free energy consumption per unit length is δvx. For n = 10, the free energy consumption per unit length is 0.2 kB T. A model such as this is appropriate for describing a charged object, such as a DNA strand, driven through a fluid medium by an electrical potential gradient. A1.3 Example 3: A Brownian ratchet that biases fluctuations
In the two examples above, the net drift is caused directly by a driving force (e. g. an electric potential gradient), and the Brownian fluctuations do not affect the net drift. Next, we consider situations where the net drift is actually caused by biasing or rectifying Brownian fluctuations. In the absence of other influences, the forward and backward fluctuations have the same probability. However, if internal barriers are established to block, partially or completely, the backward fluctuations, then the object is more likely to fluctuate forward. Suppose in each unit time, t, the object makes 10 independent random moves (fluctuations). Each random move is 1 unit displacement, x, either forward or backward with equal probability (p = 0.5). Suppose that each time the object passes a multiple of 5 × x, a barrier is established at that location. At a barrier, the object can fluctuate forward in two ways: a forward Brownian fluctuation and a backward Brownian fluctuation that is reflected by the barrier. The object can move back past the barrier only if a backward Brownian movement ‘breaks’ the barrier. The probability of breaking a barrier depends on the strength of the barrier. Let pf be the probability of forward fluctuation at the barrier and pb be that of backward fluctuation. G/kB T = log(pf /pb ) gives the free energy (in units of kB T) required to break the barrier. If we use pf = 0.73 and pb = 0.27, the corresponding free energy drop at the barrier is 1 kB T Since the barriers are separated by 5 × x, the free energy consumption per unit length is 1/5 = 0.2 kB T, the same as that used in Example 2. The trajectory of the object is shown in Fig. 1Ac. The average velocity of the object is v = 0.92 × x/t, similar to Example 2. Because of the small free energy drop (1 kB T) associated with each barrier, it only partially blocks backward fluctuations. The inset shows the details of the trajectory. This model is an example of a Brownian ratchet that biases small fluctuations. The motion of the object is stochastic and indistinguishable
18 Force Generation in Molecular Motors
from that of Example 2 where the object is driven by a constant force and subject to Brownian fluctuations. In this example, the object is directly moved by Brownian fluctuations. The net drift results from biasing fluctuations. For this reason, this model can be classified as a Brownian ratchet. However, it has the same phenomenological behavior as the power stroke motor in Example 2, and so it is very difficult — and unnecessary — to experimentally distinguish it from a power stroke motor. A1.4 Example 4: A Brownian ratchet that rectifies fluctuations
Finally, consider the model Example 3, but now suppose that in each unit time, t, the object makes 10 independent random moves (fluctuations). Each random move is one unit displacement, x, either forward or backward with equal probability (p = 0.5). Now suppose that each time the object passes a multiple of 100 × x, a barrier of 20 kB T is established at that location. The barrier is very high: the probability of a forward fluctuation is pf = 1–3.8 × 10−11 ≈1 and the probability of a backward fluctuation that surmounts the barrier is pb = 3.8 × 10−11 ≈ 0. Since the barriers are 100 × x apart, the free energy consumption per unit length is 20/100 = 0.2 kB T, the same as that in Examples 2 and 3. The trajectory of the object is shown in Fig. A1d. The average velocity of the object is v = 0.1x/t, significantly lower than that of Examples 2 and 3. Because of the large free energy drop associated with each barrier, it almost completely blocks backward fluctuations. Thus, this model is a Brownian ratchet that rectifies large fluctuations. The stochastic motion of the object is different from that of Examples 2 and 3. The object advances slowly in large ratchet steps: once it passes a multiple of 100x, it almost never goes back. One can add a ‘load’ to the above examples by decreasing the probability of a forward fluctuation and increasing that of a backward fluctuation at every location. One can also combine the power stroke and ratchet by adding both a deterministic motion and larger barriers, or alternatively by adding both small barriers and larger barriers. But this would complicate the model and obscure the simplicity of the distinctions we are trying to illustrate. However, we should point out that there is a way to use the time series data itself to estimate how much of the motor driving force can be ascribed to ratchet and power stroke contributions. This involves constructing an Effective Driving Potential from the time series data of the motor; this is discussed in Wang and Oster [12].
A2 A Closer Look at Binding Free Energy
The simple description given in the text of ATP binding to the catalytic site of F1 or of ions binding to the rotor of F0 , conceals a great deal of complexity because it neglects the role of solvent effects. Charges in aqueous solution are always hydrated, surrounded by a shifting cohort of hydrogen bonded waters. Before ATP can bind
A2 A Closer Look at Binding Free Energy
Fig. A2 Enthalpic and entropic changes during desolvation and binding can be followed by plotting—H, vs. TS. In these coordinates, the free energy change, G, is plotted as linear contour lines decreasing up and to the right. The formation of one hydrogen bond between the enzyme and nucleotide can be represented schematically as a reversible path showing a single desolvation and binding process. In state 1, the enzyme and nucleotide sites are hydrated (solid circles). Removing a water molecule from one site entails an enthalpic increase, H12 . This is followed by an entropic increase, TS23 , as the water escapes into solution. Finally, the removed water hydrogen bonds with other waters resulting in an enthalpic decrease, H34 . Similar changes accompany the release of a water molecule
from the other site during the transition from state 4 to state 7. Now the two empty sites must be brought close together, entailing an entropic decrease, TS78 , and an enthalpic decrease, H89 as the sites bind. Thus the overall free energy change, G19 , has enthalpic and entropic components (black bars) that depend on many factors, especially whether the water binds more strongly to the sites than to other waters. Note that the sequence 1 → 9 is only meant to show the enthalpic and entropic changes. It does not represent the actual sequence of what occurs during the desolvation and binding process. In particular, 2 → 3 (the removed water diffusing into solution) and 3 → 4 (the removed water bonding with other waters) occur simultaneously and cannot be separated.
to the catalytic site to initiate the hydrolysis cycle (or sodium binding to the rotor charge in the F0 motor cycle), both must shed their water coats. The shedding of the water coats is progressive as the hydrogen bonds form between ATP and the catalytic site. Before each hydrogen bond can form, the hydration water molecules must be shed from the donor and receptor just before they form the bond (otherwise they will be re-hydrated quickly). Consider the overall process of the formation of one hydrogen bond between ATP and the catalytic site. This entails a number of energetic and entropic transactions. We can plot the process schematically as the path shown in Fig. A2. (This path does not represent the actual non-equilibrium process, but rather a ‘reversible work’ path to illustrate the separate entropic and enthalpic transactions). We see that, even in the simplest case, a single association event entails four enthalpic and four entropic changes when water molecules break their hydrogen bonds to charged sites, escape into solution, re-bond with other
19
20 Force Generation in Molecular Motors
waters, and finally two sites associate. Clearly, solvent effects can tip the free energy balance, but it is seldom easy to compute how. The binding transition that generates the power stroke as embodied in Eq. (2) and Fig. 3 is also an oversimplified description. The hydrogen bonds are not arrayed in a linear sequence (as suggested by the term ‘binding zipper’), nor are they either ‘on’ (zipped) or ‘off’ (unzipped). The bonding surfaces are complex, hydrogen bonds have a finite range and angular dependence, and thermal motions create a stochastic pattern of graded bonding interactions. Nevertheless, molecular dynamics studies demonstrate that the free energy changes gradually and nearly linearly as the nucleotide unbinds or binds to the catalytic site.
References 1 OSTER, G. and H. WANG. 2000a. Reverse engineering a protein: The mechanochemistry of ATP synthase. Biochim. Biophys. Acta 1458: 482–510. 2 WANG, H. and G. OSTER. 1998. Energy transduction in the F1 motor of ATP synthase. Nature 396: 279–282. 3 BOCKMANN, R. 2002. Nanoseconds molecular dynamics simulation of primary mechanical energy transfer steps in F1 -ATP synthase. Nat. Struct. Biol. 9: 198–202. 4 SUN, S., et al. 2003. Elastic energy storage in beta sheets with application to F1 -ATPase. Eur. Biophys. J. 32: 676–683. 5 OSTER, G. and H. WANG. 2000b. Why is the efficiency of the F1 ATPase so high? J. Bioenerg. Biomembr. 332: 459–469. 6 ANTES, I., et al. 2003. The unbinding of ATP from F1 -atpase. Biophys. J.. 85: 695–706. 7 DIMROTH, P., et al. 1999. Energy transduction in the sodium F-ATPase of Propionigenium modestum. Proc. Natl. Acad. Sci. USA 96: 4924–4929. 8 OSTER, G., et al. 2000. How F0 -ATPase generates rotary torque. Proc. Roy. Soc. 355: 523–528. 9 MAHADEVAN, L. and P. MATSUDAIRA. 2000. Motility powered by supramolecular springs and ratchets. Science 288: 95–99. 10 MOGILNER, A. and G. OSTER. 1996a. Cell motility driven by actin polymerization. Biophys.J. 71: 3030–3045. 11 MOGILNER, A. and G. OSTER. 1996b. The physics of lamellipodial protrusion. Euro. Biophs. J. 25: 47–53.
12 WANG, H. and G. OSTER. 2002a. Ratchets, power strokes, and molecular motors. Appl. Phys. A 75: 315–323. 13 WANG, H. and G. OSTER. 2002b. The Stokes efficiency for molecular motors and its applications. Europhys. Lett. 57: 134–140. 14 MENZ, R., et al. 2001. Structure of bovine mitochondrial F1 -ATPase with nucleotide bound to all three catalytic sites: implications for the mechanism of rotary catalysis. Cell 106: 331–341. 15 STOCK, D., et al. 2000. The rotary mechanism of ATP synthase. Curr. Opin. Struct. Biol. 10: 672–679. 16 JUNGE, W., et al. 2001. Inter-subunit rotation and elastic power transmission in F0 F1 -ATPase. FEBS Lett. 251: 1–9. 17 UEMURA, S., et al. 2002. Kinesin–microtubule binding depends on both nucleotide state and loading direction. Proc. Natl Acad. Sci. USA 99: 5977–5981. 18 HANCOCK, W. and J. HOWARD. 1999. Kinesin’s processivity results from mechanical and chemical coordination between the ATP hydrolysis cycles of the two motor domains. Proc. Natl Acad. Sci. USA 96: 13147–13152. 19 PESKIN, C. S. and G. OSTER. 1995. Coordinated hydrolysis explains the mechanical behavior ofkinesin. Biophys.J. 68: 202s–210s. 20 BUSTAMANTE, C., et al. 2001. The physics of molecular motors. Acc. Chem. Res. 34: 412–420.
References 21 WANG, H., et al. 1998. Force generation in RNA polymerase. Biophys.J. 74: 1186–1202. 22 ISRAELACHVILI, J. and B. NINHAM. 1977. Intermolecular forces the long and short of it. J. Colloid Interface Sci. 58: 14–25.
23 PEDERSEN, P., et al. 2000. ATP Synthases in the year 2000: evolving views about the structures of these remarkable enzyme complexes. J. Bioenerget. Biomembr. 32: 325–332.
21
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
1
Enzyme Catalysis Andreas Sebastian Bommarius Georgia Institute of Technology, Atlanta, USA
Bettina R. Riebel Emory University School of Medicine, Atlanta, USA
Orginally published in: Biocatalysis. Andreas Sebastian Bommarius and Bettina R. Riebel. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30344-1
1 Introduction
Enzymes are a class of macromolecules with the ability both to bind small molecules and to effect reaction. Stabilizing forces such as hydrophobic effects only slightly dominate destabilizing forces such as Coulombic forces of equal polarity; thus the Gibbs free enthalpy of formation of proteins, Gformation , is only weakly negative. In an enzyme reaction, initially free enzyme E and free substrate S in their respective ground states initially combine reversibly to an enzyme–substrate (ES) complex. The ES complex passes through a transition state, Gtr,cat , on its way to the enzyme–product (EP) complex and then on to the ground state of free enzyme E and free product P. From the formulation of the reaction sequence, a rate law, properly containing only observables in terms of concentrations, can be derived. In enzyme catalysis, the first rate law was written in 1913 by Michaelis and Menten; therefore, the corresponding kinetics is named the Michaelis–Menten mechanism. The rate law according to Michaelis–Menten features saturation kinetics with respect to substrate (zero order at high, first order at low substrate concentration) and is first order with respect to enzyme. Important milestones in the rationalization of enzyme catalysis were the lockand-key concept [1], Pauling’s postulate (1944) and induced fit [2]. Pauling’s postulate claims that enzymes derive their catalytic power from transition-state stabilization; the postulate can be derived from transition state theory and the idea of a thermodynamic cycle. The Kurz equation, kcat /kuncat ≈ KS /KT , is regarded as the mathematical form of Pauling’s postulate and states that transition states in the case of successful catalysis must bind much more tightly to the enzyme than Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
2 Enzyme Catalysis
ground states. Consequences of the Kurz equation include the concepts of effective concentration for intramolecular reactions, cooperativity of numerous interactions between enzyme side chains and substrate molecules, and diffusional control as the upper bound for an enzymatic rate. Enzymes demonstrate rate accelerations (kcat /kuncat ratios) of ≤ 1.4 · 1017 and proficiencies [kcat /(kuncat K M ) (= 1/K T )], of ≤ 10 M−1 . Surprisingly, for a range of enzymatic reactions kcat is within two orders of magnitude whereas kuncat varies by more than six orders of magnitude; the highest rate accelerations are observed for reactions with low kuncat . Every (bio)catalyst can be characterized by the three basic dimensions of merit – activity, selectivity and stability – as characterized by turnover frequency (tof) (= 1/kcat ), enantiomeric ratio (E value) or purity (e.e.), and melting point (T m ) or deactivation rate constant (kd ). The dimensions of merit important for determining, evaluating, or optimizing a process are (i) product yield, (ii) (bio)catalyst productivity, (iii) (bio)catalyst stability, and (iv) reactor productivity. The pertinent quantities are turnover number (TON) (= [S]/[E]) for (ii), total turnover number (TTN) (= mole product/mole catalyst) for (iii) and space–time yield [kg (L · d)−1 ] for iv). Threshold values for good biocatalyst performance are kcat > 1 s−1 , E > 100 or e.e. > 99%, TTN > 104 –105 , and s.t.y. > 0.1 kg (L · d)−1 .
2 Characterization of Enzyme Catalysis 2.1 Basis of the Activity of Enzymes: What is Enzyme Catalysis?
Enzymes are a class of multifunctional, multivalent macromolecules with the ability to bind small molecules and, much more importantly, subsequently to effect reaction. Stabilizing forces such as hydrophobic effects [3] only slightly dominate destabilizing forces such as Coulombic forces of equal polarity; thus the Gibbs free enthalpy of formation, Gformation , is weakly negative. Without exception, enzymes are proteins and consist of one or more linear chains of, with rare exceptions, the common 20 proteinogenic amino acids. The fairly frequent disulfide bonds between cysteines are formed through crosslinking of side-chain interactions but do not constitute branching of the backbone chain. Extremely rarely, an additional genetically coded 21st amino acid is found, such as selenocysteine in formate dehydrogenase or glutathione peroxidase [4, 5], or pyrrolysine, the “22nd” amino acid, in methanogens belonging to the domain of archaea [6, 7], or p-fluorophenylalanine in a redesigned E. coli cell [8]. All three examples used a stop codon to code for the unusual amino acid. Non-proteinogenic amino acids are found much more commonly in non-enzyme proteins. One prominent example is collagen, from gelatin, which mainly consists of triplets of hydroxyproline–glycine–proline, in which the hydroxyproline is generated via posttranslational hydroxylation. Other proteins apart from enzymes
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
2 Characterization of Enzyme Catalysis 3
exist that can bind substrates and catalyze reactions, such as catalytic antibodies [9, 10] or ribozymes [11, 12]. On the other hand, there are many proteins that do not catalyze any reaction, and thus are not enzymes, but function as oxygen storage and shuttle units (hemoglobin and myoglobin), receptors (signal transduction proteins such as the EGF-receptor) or as structural proteins (myelin). 2.2 Enzyme Reaction in a Reaction Coordinate Diagram
The idea of an enzyme reaction consisting of both a binding step and a subsequent reaction step is embodied in a free enthalpy-reaction coordinate (G-ξ ) diagram depicting the change in Gibbs free enthalpy G over the extent of reaction ξ (zeta) (Figure 1). In an uncatalyzed reaction, one or more substrates or reactants initially are in their respective ground states; during the reaction, the reactants pass the point of maximum free enthalpy Gtr,uncat , termed the “transition state”, and continue to the ground state of the product(s). In an enzyme reaction, initially free enzyme E and free substrate S likewise are in their respective ground states. From the ground state, enzyme and substrate combine reversibly to an enzyme–substrate (ES) complex. If [S] > K M , the ES complex forms an “intermediate”, i.e., a local minimum of free enthalpy. (If [S] < K M , the free enthalpy of the ES complex is higher than the enzyme’s ground state; at [S] = K M , both free enthalpy levels are the same.) The ES complex passes through another transition state, Gtr,cat , on its way to the enzyme–product (EP) complex and then on to the ground state of free enzyme E and free product P. The maximum of the Gibbs free enthalpy between the ground states of substrate and product forms the Gibbs free enthalpy of activation with the energy difference G= , which determines the rate constant of the reaction. Like every catalyst, an enzyme decreases the value of G= and thus accelerates the reaction. (An agent increasing the value of G= is termed an “anti-catalyst”.)
Fig. 1 Free enthalpy reaction coordinate diagram for an enzyme reaction.
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
4 Enzyme Catalysis
2.3 Development of Enzyme Kinetics from Binding and Catalysis
From the idea of enzyme kinetics as a binding and a reaction step with the corresponding course of the energy curve in the Gibbs free enthalpy–reaction coordinate (G – ξ ) diagram, the reaction scheme represented by Eq. (1) can be drawn. E+S⇔ES→EP→(⇔)E+P
(1)
Among the several ways of verifying or disproving such a reaction scheme, the derivation of a rate law linking a product formation rate or substrate consumption rate with pertinent concentrations of reactants, products, and auxiliary agents such as catalysts probably has the greatest utility, as conversion to product can be predicted. A proper rate law contains only observables, and no intermediates or other unobservable parameters. In enzyme catalysis, the first rate law was written in [13] by Michaelis and Menten (the corresponding kinetics is therefore aptly named the Michaelis–Menten (MM) mechanism). The kinetic schemes for the description of enzyme reactions are similar to those for homogeneous and heterogeneous catalysis and were developed roughly contemporaneously. In heterogeneous catalysis, a mechanism postulating a reaction of a mobile substrate molecule on a solid surface is modeled by a rate law named after Langmuir and Hinshelwood, and later also after Hougen and Watson [14] (LHHW). LHHW and MM both assume a limiting number of active sites which can be saturated with reacting molecules from a continuous phase. The kinetic scheme according to Michaelis–Menten for a one-substrate reaction [13] assumes three possible elementary reaction steps: (i) formation of an enzyme–substrate complex (ES complex), (ii) dissociation of the ES complex into E and S, and (iii) irreversible reaction to product P. In this scheme, the product formation step from ES to E + P is assumed to be rate-limiting, so the ES complex is modeled to react directly to the free enzyme and the product molecule, which is assumed to dissociate from the enzyme without the formation of an enzyme–product (EP) complex [Eq. (2)]. E+S⇔ES→E+P
(2)
In the derivation according to Michaelis and Menten, association and dissociation between free enzyme E, free substrate S, and the enzyme–substrate complex ES are assumed to be at equilibrium, K S = [ES]/([E] · [S]). [The Briggs–Haldane derivation (1925), based on the assumption of a steady state, is more general.] With this assumption and a mass balance over all enzyme components ([E]total = [E]free + [ES]), the rate law in Eq. (3) can be derived. v=vmax [S]/(K M +[S])=kcat [E]total [S]/(K M +[S])
(3)
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
3 Sources and Reasons for the Activity of Enzymes as Catalysts
In contrast to [E]free , [E]total is observable. Eq. (3) is written with K M , the Michaelis constant, instead of the equilibrium binding constant K S ; unless the enzyme reaction is very fast (Section 4.3.); i.e., in almost all cases, K M ≈ K S . In Eq. (3), the reaction rate is traditionally denoted by v [concentration/time] and kcat is the reaction rate constant [time−1 ]. The equation describes a two-parameter kinetics, with a monotonically rising reaction rate with respect to substrate concentration and saturation at high substrate concentration. The maximum reaction rate at saturation is denoted by vmax , with vmax = kcat [E]. The K M value corresponds to the substrate concentration at half saturation (vmax /2) and is a measure of binding affinity of the substrate to the enzyme: a high K M value corresponds to loose binding, and a low value to tight binding between enzyme and substrate. At low substrate concentration, more precisely if [S] K M , Eq. (3) simplifies to v = vmax [S]/K M and consequently is first order with respect to the substrate concentration [S]. In contrast, at high substrate concentration, more precisely if [S] K M , Eq. (3) simplifies to v = vmax and consequently is zeroth order with respect to [S]. In all situations, v is proportional (first-order with respect) to [E]. 3 Sources and Reasons for the Activity of Enzymes as Catalysts 3.1 Chronology of the Most Important Theories of Enzyme Activity
Just one year after Ostwald’s hypothesis about the existence of catalysts in 1893, when nobody yet had a clear idea of the structure and composition of enzymes, Emil Fischer voiced the idea for the first time that a substrate molecule fits into the pocket of an enzyme, the “lock-and-key hypothesis” [1]. Both the lock (enzyme) as well as the key (substrate) were regarded as rigid. This hypothesis was modified later in many ways. According to Haldane [15], catalysis of a reaction occurs only if a catalyst in the active center is complementary to the transition state of the substrate during the reaction. Therefore, the transition state between substrate and products fits best into a pocket close to the enzyme. The substrate molecule is subject to strain upon binding to the active center and changes its conformation to fit into the active center; the key (the substrate) does not fit completely into the lock but is strained and bent. In a further modification of this concept, a reaction is accelerated if a catalyst stabilizes the transition state; in contrast, stabilization of the ground state leads to a deceleration of the reaction. This concept of transition-state stabilization, formulated first by Haldane, was expanded later by Linus Pauling [16, 17]. The notion of lowering of G= attributed to the stabilization of the transition state by the catalyst or to the destabilization of the ground state in comparison to the transition state is generally accepted nowadays. Instead of assuming a “solid” enzyme, in which active center the substrate molecule is “bent” (the concept of substrate strain), the idea was developed [2, 18]
5
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
6 Enzyme Catalysis
that enzymes can embrace the substrate molecule flexibly in the active center and effect reaction by the formation of specific interactions, the so-called “induced fit”. This picture is especially appropriate with allosterically activated enzymes or in situations in which part of the enzyme molecule has to turn or move over longer distances to effect catalysis (hinge movement), as for instance with most NAD(P)(H)dependent enzymes [19]. An equation for the acceleration of the reaction rate by a deliberate catalyst and the relative strength of the binding to the transition state and ground state was formulated in 1963 by Kurz [20], who combined the ideas of the thermodynamic cycle and transition-state theory and whose result, unrecognized by Kurz, can be regarded as a quantification of the Pauling postulate [20]. Just a few years later Jencks originated the idea of transition-state analogs as inhibitors and cited references in the literature [21]. Finally, it was observed [22] that the best inhibitors for serine proteases such as subtilisin, chymotrypsin, trypsin or elastase all led to geometrically similar transition states for the hydrolysis reaction and all bound in the complementary oxyanion hole. All four proteases belonged to the small number of enzymes whose three-dimensional structure was known by then. 3.2 Origin of Enzymatic Activity: Derivation of the Kurz Equation
The key equation resulting from the application of two known theories, the Born–Haber cycle and the transition-state theory, was formulated for any catalyst and without reference to enzymes [20]; a good derivation can be found in the article by Kraut [23]. Transition-state theory is based on two assumptions, the existence of both a dynamic bottleneck and a preceding equilibrium between a transition-state complex and reactants. Eq. (4) results, where k denotes the observed reaction rate constant, κ the transmission coefficient, and ν the mean frequency of crossing the barrier. k=κ·ν·K =
(4)
All correction factors such as tunneling, back-crossing of the barrier, and solvent frictional effects are captured by κ, which is between 0.1 and 1 for the reactions in solution in question here. The equilibrium constant K = is expressed by partition functions, where the mode normal to the reaction coordinate ν is approximated by the term kB T/hν. In the resulting Eq. (5), ν cancels out.
k=κ·(k B T/ h)·K =
(5)
Note that K = is not the thermodynamic equilibrium constant and kB T/h is not a universal frequency for the decomposition of the transition-state complexes into products. The thermodynamic cycle relevant for further discussion is shown in Figure 2.
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
3 Sources and Reasons for the Activity of Enzymes as Catalysts E +
S
E + S≠
KS
KT ES
E + P
[ES] ≠
Fig. 2 Thermodynamic (Born–Haber)-cycle.
If one compares the first-order rate constant for the enzyme-catalyzed singlesubstrate elementary reaction ke with the one for the uncatalyzed reaction ku , Eq. (6) is obtained with Eq. (4). ke /ku =(κe ·νe ·K e= )/(κu ·νu ·K u= )
(6)
In the case of a simple elementary enzyme reaction, ke is identical to kcat . By utilizing the thermodynamic cycle the quotient of the constants of formation of the transition state K e = /Ku = can be equated with the quotient of the dissociation constants for substrate KS and transition state K T [Eq. (7)]. kcat /ku =(κe ·νe ·K S )/(κu ·νu ·K T )
(7)
Although only a few data are available for the comparison of κ e ·ν e , and κ u ·ν u , it is assumed that the two terms do not differ much in magnitude, so that Eq. (8) holds approximately. kcat /ku ≈K S /K T
(8)
This is the equation derived by Kurz, expressed here for the case of an enzyme reaction. The ratio of enzyme-catalyzed to uncatalyzed reaction kcat /ku or kcat /kuncat , i.e., the rate acceleration, often reaches orders of magnitude of 1010 to 1012 [24], and in the best case so far 1.4 · 1017 (orotidine monophosphate decarboxylase, OMPD), which translates into a proficiency kcat /(kuncat · K M ) of 1023 M−1 [25]. This underscores the quality of enzymes as catalysts. In comparison, the best value for the rate acceleration of catalytic antibodies is 2.3 · 108 [26], and typical values are 103 to 105 [27]. Radzicka [24] found that kcat for the reactions in question is often within one or two orders of magnitude; interestingly, the highest efficiencies are observed with reactions of very low kuncat . Thus, Eq. (8) means that transition states in the case of successful catalysis must bind much more tightly to the enzyme than ground states; in this way, Eq. (8) is the mathematical form of the Pauling postulate in 1944 [17]. It is emphasized again that the action of enzymes through the stabilization of the transition state is not an independent concept but is derived from transition-state theory and the idea of a thermodynamic cycle. 3.3 Consequences of the Kurz Equation
The Kurz equation [Eq. (8)] has simplified and channeled many discussions in enzymology Many other explanations of enzyme action and many phenomena
7
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
8 Enzyme Catalysis
observed with enzyme reactions can be reduced to the validity of Eq. (8), as shown in the following paragraphs. Ĺ The principle of transition state stabilization can explain why many enzymes show higher activity against larger, sterically more demanding substrate molecules than against smaller ones. Paradoxically, the cause of the enhanced activity was an increased kcat , not a decreased K M value, i.e., an enhanced maximum rate and not a tighter substrate binding. This phenomenon led to the postulation of “induced fit”. But if the enzyme template binds the transition state more tightly than the ground state, it can be expected that peripheral parts of the substrate molecule have an important role for the binding in the transition state but not necessarily for the binding in the ground state. Conformational changes of the enzyme template do not need to be postulated. Frequently, however, conformational changes during binding and catalysis are observed which are then also called “induced fit” by several authors. Ĺ The exact loci of binding and catalysis cannot be distinguished exactly. In subtilisin, the amino acids forming the catalytic triad, Ser221–His64– were replaced in all combinations by Ala. Even the triple mutant without any amino acid from the original catalytic center displayed a 1000-fold higher reaction rate than the uncatalyzed reaction, and the remainder of the enzyme molecule bound the substrate better in the transition state than in the ground state [28]. Ĺ The numerous interactions between enzyme side chains and substrate molecules act synergistically, so the fit is quite exact. This cooperativity enhances substrate specificity, for a small change of substrate can effect a large change in the binding of the transition state. Triose phosphate isomerase is only 1/1000 as effective if the residue Glu165 is moved away from the substrate by even 1 Å, as was demonstrated with the exchange of Glu165 for Asp165 [29]. The change of only one amino acid of the triad in subtilisin causes an activity decrease of 10−6 , whereas the exchange of all three causes a decrease of only 7 × 10−7 and not of 10−18 [28]. The results seem to be explained by an all-or-nothing situation rather than additivity Ĺ Intramolecular reactions proceed much more rapidly than intermolecular ones owing to entropic stabilization. Page and Jencks have calculated the entropies of translation, rotation, and vibration [30]. The entropy loss of the reactants already realized in the enzyme–substrate (ES) complex allows for a much faster reaction than in the situation without complexation, binding, or geometric orientation. Introduction of an effective concentration [S]eff alleviates the trouble of comparing results of intermolecular catalysis with rate constants of dimension [concentration · time]−1 (second-order reaction) with those of intramolecular catalysis with constants of dimension [time]−1 . The high values often achieved for [S]eff (often > 10 M) reflect the efficiency of intramolecular catalysis. In this context, enzymes are often referred to as “entropy traps”, because many contributions to entropy are “frozen in” after binding of the substrate to the enzyme molecule. Ĺ Many enzymatic reaction mechanisms are accelerated or even only made possible by the absence of water or other solvents. The enzyme supports desolvation
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
3 Sources and Reasons for the Activity of Enzymes as Catalysts Table 1 Wholly or partially diffusion-controlled enzyme reactions adapted from [52]
Substrate
Enzyme
k cat [s−1 ]
K M [M]
kcat /KM [s−1 · M−1 ]
CO2 ⇔ HCO3 − H2 O2 fumarate benzylpenicillin
Carbonic anhydrase Catalase Fumarase β-Lactamase
1 × 106 4 × 107 800 2 × 103
0.012 1.1 5 × 10 −6 2 × 10−5
8.3 × 107 4 × 107 1.6 × 108 1 × 108
by redirecting part of the possible binding energy liberated from the binding between enzyme and substrate. Ĺ Upper boundary of rate constant of an enzyme reaction: diffusion control The upper boundary of the reaction rate is reached when every collision between substrate and enzyme molecules leads to reaction and thus to product. In this case, the Boltzmann factor, exp(−E a /RT), is equal to 1 in the transition-state theory equations and the reaction is diffusion-limited or diffusion-controlled (owing to the difference in mass, the reaction is controlled only by the rate of diffusion of the substrate molecule). The reaction rate under diffusion control is limited by the number of collisions, the frequency Z of which can be calculated according to the Smoluchowski equation [Smoluchowski, 1915; Eq. (9)]. −dc/dt=−8π RD·c 2
(9)
R is the normalized radius of both participating particles (1/r 1 + 1/r 2 ), D the normalized diffusion coefficient (1/D1 + 1/D2 ). The collision frequency Z is then calculated from Eq. (10). Z=2RT/(3000η·[(r 1 +r 2 )2 /r 1 r 2 ])
(10)
As the enzyme molecule is much larger than the substrate or product molecule (r E r S ) and thus diffuses much more slowly (DE DS ), Eqs. (9) and (10) can be simplified to Eqs. (11) and (12). −dc/dt=−8π RDE ·c 2
(11)
Z=2RT/(3000η·rE )
(12)
In the case of a highly viscous solution the influence of viscosity η can dominate [31]. For simple, uncharged particles in water at 25 ◦ C the second-order rate constant is 3.2 × 10 (M · s)−1 [32, 33]. Some cases of wholly or partially diffusion-controlled enzyme reactions are listed in Table 1. Rearrangement of Eq. (7) results in Eq. (13). kcat /K S =(ku ·κe ·ve )/(κu ·vu ·K T )
(13)
9
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
10 Enzyme Catalysis
Eq. (13) is the manifestation of the second-order rate constant for a reaction between free enzyme and free substrate to give free enzyme and free product; thus, the rate constant cannot exceed the maximum value for kdiff , i.e., 3.2 × 109 M−1 · s−1 . A perfectly evolved enzyme therefore features a decreased binding constant K s , which means a stronger binding in the transition state, possibly up to the thermodynamic limit. Limitation of a reaction by translational diffusion in solution is a rather rare case. Much more frequently the limitation of the observed overall reaction rate is by external mass transfer (through a laminar film around a solid macroscopic carrier) or internal mass transfer (diffusion of substrate or product through the pores of a solid carrier or a gel network to an enzyme molecule in the interior of the carrier). 3.4 Efficiency of Enzyme Catalysis: Beyond Pauling’s Postulate
Let us return one more time to the Kurz equation [Eq. (8)], which is regarded as the quantification of Pauling’s postulate that transition-state stabilization, i.e., the tighter binding between enzyme and transition state, expressed by K T , as compared to binding between enzyme and ground state, expressed by K S , is the source of catalytic rate enhancement. kcat /kuncat ≈K S /K T
(8)
Identifying K S with K M , K T approximately equals the ratio (kuncat · K M )/kcat , which is the inverse of termed the “proficiency”, which has been determined experimentally for a number of systems (see Section 4.4 below). If the Kurz equation captured the whole essence of enzyme catalysis, K T should be proportional to (kuncat · K M )/kcat . However, Figure 3 reveals that the correlation of K T (≡ “K TS ”) for some enzyme reactions is much better with kuncat (≡ “knon ”) than with kcat [34]. This seeming contradiction to Pauling’s postulate and the notion of the transitionstate energy and conformation as the sole source of enzymatic catalytic power cast the geometry of the substrate ready to interact with the enzyme, i.e., substrate preorganization, into a new and more important role. While the discussion on this subject is far from settled, a novel concept based on molecular mechanics calculations of the stability of different substrate conformations, termed substrate conformers, has emerged which emphasized substrate preorganization. The substrate conformers which allow reactions to occur most easily clearly resemble the transition state and are called “near-attack conformers” (NACs) [34, 35]. The relative rate constants of anhydride formation krel from mono-p-bromophenyl esters were found to correlate with the mole fractions (P) of ground-state conformers that could be classified as NACs according to Eq. (14) [8]. log krel =0.94log P+7.48 (r 2 =0.915) The results are depicted in Figure 4.
(14)
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
4 Performance Criteria for Catalysts, Processes, and Process Routes
Fig. 3 Comparison of the relationship of K TS with the enzymatic reaction rate constant, kcat , and with the uncatalyzed solution or reference reaction rate constant knon [34].
4 Performance Criteria for Catalysts, Processes, and Process Routes 4.1 Basic Performance Criteria for a Catalyst: Activity, Selectivity and Stability of Enzymes
Every catalyst, and thus also every biocatalyst, can be characterized by the three basic dimensions of merit, namely activity, selectivity, and stability. Additional, but not frequently employed, performance criteria beyond the basic dimensions are discussed in Section 4.3. 1. Biocatalysts often feature much better selectivity than non-biological catalysts, so they are developed for use because of their selectivity, be it enantioselectivity, chemo- or regioselectivity. 2. As activity is straightforward to measure and necessary to know for even the basic experimental protocol, enzyme activity is often well studied. In contrast to other catalysts, most enzymes are only active and stable in a very limited temperature and pH range (mostly between 15 and 50 ◦ C or pH 5 and 10). 3. Unlike activity, stability of enzymes is often interpreted simplistically as thermal stability, i.e., a temperature beyond which the enzyme loses stability. Although this quantity is important, first every statement of stability at a certain temperature depends on exposure time and thus is often ambiguous; and second, for biocatalytic process applications, a more important quantity is the process or operational stability, which is the long-term stability under specified conditions.
11
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
12 Enzyme Catalysis
Fig. 4 Log of the relative rate constants of anhydride formation krel from mono-p-bromophenyl esters vs. the log of the probability (P) of NAC formation of each monophenyl ester of the various dicarboxylic acids [34].
4.1.1 Activity The value of the (overall) enzyme activity is usually provided in “International Units” or “Units”:
1 Unit≡1 IU≡1 U≡1 µmol min−1 In most cases a value for the overall activity of enzyme is not very interesting. Much more important are both the specific activity, scaled to the mass of catalyst, and the volumetric activity, based on the activity per unit volume: 1 U (mg protein)−1 =1µmol (min mg protein)−1 ≡specific activity and 1 U mL−1 =1µmol (min mL)−1 ≡volumetric activity All activity data are meaningless without a specification of the conditions of measurement. Connonen, the activity of enzymes is provided at nearly physiological
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
4 Performance Criteria for Catalysts, Processes, and Process Routes
conditions at 30 ◦ C and pH 7.5, unless specified otherwise. Any enzyme sold commercially comes with information about the detailed assay conditions used to arrive at the specific (solid preparation) or volumetric (liquid preparation) activity. To judge the quality of a biocatalyst, its specific activity can serve as a guideline: the threshold for a useful biocatalyst is 1 U (mg pure protein)−1 ; the threshold for a biocatalyst developed for plant-scale level, however, lies much higher, rather closer to 100 U (mg pure protein)−1 . The volumetric activity of a (bio)catalyst can be enhanced by simple addition of more catalyst to a system. Specific activity, however, must be improved through optimization of the reaction conditions, or through variation of the structure of the carrier or even of the enzyme. A mass-independent quantity of activity, superior to specific activity, is the turnover frequency (tof), defined as: turnover frequency (tof)=
number of catalytic events time×number of active sites
The turnover frequency allows performance comparison between different catalyst systems, biological and/or non-biological. Its threshold is at 1 event per second per active site. According to the definition, a turnover frequency can be determined only if the number of active sites is known. For an enzyme reaction obeying Michaelis–Menten kinetics, Eq. (15) holds. tof=1/kcat
(15)
4.1.2 Selectivity The notion of selectivity needs to be specified further: point selectivity is the incremental selectivity, usually towards product, at a specific degree of conversion, whereas integral selectivity is the overall average selectivity at the same specific degree of conversion. Owing to the importance of enantiomeric purity of target product molecules in life science applications and the pre-eminent position biocatalysts enjoy with respect to the achievement of that goal, enantioselectivity is the most important kind of selectivity in the context of biocatalysis. The enantiomeric ratio or E value [36] serves as a measure of enantio-selectivity at a certain degree of conversion [Eq. (16)].
E =(kcat /K M )A /(kcat /K M )B =ln([A]/[A]0 )/([B]/[B]0 )
(16)
Although the E value is measured at a specific degree of conversion, nevertheless it is an integral selectivity measure. This is even more obvious in the case of the other measure commonly employed, the enantiomeric excess, e.e. [Eq. (17)], which reflects overall selectivity up to the point of isolation of the product. e.e.=([A]−[B])/([A]+[B])×100%
(17)
13
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
14 Enzyme Catalysis
The threshold for an enantioselective (bio)catalyst is either at E = 100 or at 99% e.e., depending on whether one is evaluating either the reaction itself or a reaction product. [These values are linked to the requirement by the Food and Drug Administration of the United States (FDA), the most important body in the world for setting guidelines for both procedures for novel drug approval and drug performance evaluation, that a separate toxicological study be conducted for any impurity, including an enantiomer of the active drug, exceeding a concentration level of 0.5%.] Besides enantioselectivity, regioselectivity and to a lesser extent, chemoselectivity are also important issues of enzymatic reaction selectivity. 4.1.3 Stability Stability of an enzyme is usually understood to mean temperature stability, although inhibitors, oxygen, an unsuitable pH value, or other factors such as mechanical stress or shear can decisively influence stability. The thermal stability of a protein, often employed in protein biochemistry, is characterized by the melting temperature T m , the temperature at which a protein in equilibrium between native (N) and unfolded (U) species, N ⇔ U, is half unfolded. The melting temperature of a protein is influenced on one hand by its amino acid sequence and the number of disulfide bridges and salt pairs, and on the other hand by solvent, added salt type, and added salt concentration. Protein structural stability was found to correlate also with the Hofmeister series [37–39] [Eq. (18)].
Tm =T0 +K [S]
(18)
In Eq. (18), T 0 is the melting temperature in the absence of salt, [S] the salt concentration, and K the corresponding coefficient. Storage stability over time under fixed conditions of temperature, pH value and concentration of additives often can be expressed by a first-order decay law (analogous to radioactive decay) [Eq. (19)]. [E ]t =[E]0 exp(−kd ·t)
(19)
The validity of a first-order decay law over time for the activity of enzymes according to Eq. (19), with [E]t and [E]0 as the active enzyme concentration at time t or 0, respectively, and kd as the deactivation rate constant, is based on the suitability of thinking of the deactivation process of enzymes in terms of Boltzmann statistics. These statistics cause a certain number of active protein molecules to deactivate momentarily with a rate constant proportional to the amount of active protein [for evidence for such a “catastrophic” decomposition, see Craig [40]]. The relevant stability criteria for process applications (the process stability or operating stability) is discussed in the next section.
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
4 Performance Criteria for Catalysts, Processes, and Process Routes
4.2 Performance Criteria for the Process
In evaluating a biocatalyst for a given processing task, there are performance criteria to be met not only for the biocatalyst but also for the process. The dimensions of merit important when determining, evaluating, or optimizing a route for a process are (i) product yield, (ii) (bio)catalyst productivity, (iii) (bio)catalyst stability, and (iv) reactor productivity. 4.2.1 Product Yield Chemical yield to product is most important for the economics of the process. Product yield is inversely proportional to the amount of reactants required per unit of product output. As the key substrate and other raw materials in most mature processes make up more than 50% of the variable cost, a high product yield is indispensable for an economic process. The yield of product y is linked to selectivity σ and the degree of conversion x by Eq. (20).
y=x·σ
(20)
so that the product concentration [P] can be calculated from Eq. (21). [P]=[S]0 ·y=[S]0 ·x·σ
(21)
Yields much less than 100% are no longer acceptable either economically or ecologically. In the separation of a racemate, where the yield per run is limited to 50% per pass, this means that internal or external racemization is necessary unless either the substrate is inexpensive enough to lose up to 50% or the co-product can be marketed in similar amounts. Both of the latter situations are highly unlikely. A threshold for a sufficiently high yield is hard to define, as the threshold value seems to correlate inversely with the unit value of the product. While for basic, large-volume chemicals yields of 98 or 99% are absolutely essential, the situation in fine chemicals calls for 90–95% yield, and in the initial stage of production of extreme performance chemicals, such as pharmaceuticals, yields of > 80% are very acceptable; sometimes values down to 50% have to be encountered. Acceptable yields depend on the number of process steps, including product isolation. If all the steps are assumed to fetch 90% yield, the overall yield depends on the number of steps n as in Eq. (22). yoverall =(ystep )n =(0.9)n
(22)
so that for a one-step process yoverall,1 = 90%, for three-step process yoverall,3 = 73%, and for a ten-step process yoverall,10 = 35%!
15
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
16 Enzyme Catalysis
4.2.2 (Bio)catalyst Productivity If a large amount of (bio)catalyst is added to a substrate solution better results are achieved than if just a small amount of a highly productive catalyst is used. To even call an agent a catalyst, one condition is that the agent be added, as a minimum, in substoichiometric amounts. The smaller the amount of catalyst that has to be added for the same result, the better its performance. The relevant criterion is the dimensionless turnover number, TON [Eq. (23)].
TON=
substrate amount of product = =[S]/[C] amount of catalyst catalyst ratio
(23)
The turnover number is not used frequently in biocatalysis, possibly as the molar mass of the biocatalyst has to be known and taken into account to obtain a dimensionless number, but it is the decisive criterion, besides turnover frequency and selectivity, for evaluation of a catalyst in homogeneous (chemical) catalysis and is thus quoted in almost every pertinent article. Another reason for the low popularity of the turnover number in biocatalysis, apart from the challenge of dimensionality, is the focus on reusability of biocatalysts and the corresponding greater emphasis on performance over the catalyst lifetime instead of in one batch reaction as is common in homogeneous catalysis [41]. For biocatalyst lifetime evaluation, see Section 4.2.3. The turnover number of a catalyst does not refer to the timescale of activity of that catalyst in a process. If the timescale of activity is taken into consideration, the productivity of the catalyst is recovered, expressed as a productivity number, PN, defined in Eq. (24). PN=
mass of product unit catalyst×time
(24)
4.2.3 (Bio)catalyst Stability As in all catalytic processes, catalyst stability is a key process criterion. In contrast to mere temperature or storage stability, which refer to the catalyst independently of a process, the operating stability or process stability is the relevant and decisive dimension of merit. It is determined by comparing the amount of product generated with the amount of catalyst spent. The relevant quantity, also sometimes found in homogeneous catalysis, is the total turnover number (TTN) [Eq. (25)].
TTN=
moles of product produced mole of catalyst spent
(25)
Quite common in applied biocatalysis, where the purity of biocatalyst often is not known, is the expression of biocatalyst stability as an the enzyme consumption number (e.c.n.) [Eq. (26)]. e.c.n.=
g of enzyme amount of enzyme preparation spent = amount of product generated kg (or lb) of product
(26)
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
4 Performance Criteria for Catalysts, Processes, and Process Routes
The e.c.n. value depends on process parameters such as temperature, pH value, and concentrations of substrate(s) and product(s). With the molar masses of enzyme and product, e.c.n. and TTN can be interconverted [Eq. (27)]. TTN=(1000/e.c.n.)×(MWenzyme /MWproduct )
(27)
It should be emphasized that the TTN is not a completely suitable quantity for the evaluation of operating stability because the number of moles of biocatalyst is not a suitable reference for the complexity and cost of its manufacture. However, the values for both numerator and denominator in Eq. (27) are usually known and can be expressed in monetary terms. For checking the application of such a biocatalyst, the contribution of the biocatalyst to the overall cost can be assessed readily. The relevant parameter for studies of operating stability of enzymes is the product of active enzyme concentration [E]active and residence time τ , [E]active · τ . In a continuously stirred tank reactor (CSTR) the quantities [E]active and τ are linked by Eq. (28), where [S0 ] denotes the initial substrate concentration, x the degree of conversion and r(x) the conversion-dependent reaction rate [42, 43]. ([E]active ·τ )/[S0 ]=x/r (x)
(28)
For a comparison and discussion of the concepts of stability at rest [Eq. (19)] and operating stability [Eq. (28)]. The threshold value for sufficient biocatalyst stability depends on the application, as does the value for product yield. For any application in synthesis the TTN should exceed 10 000, and for large-scale processing a value of > 1 000 000 is preferred. 4.2.4 Reactor Productivity For the assessment of reactor productivity that is independent of catalyst, the spacetime yield (s.t.y) [Eq. (29)]. is considered, as in any chemical process.
s.t.y.=mass of product generated/(reactor volume×time)[kg(L·d)−1 ]
(29)
An increase in s.t.y. in the same reactor is equivalent to an increase in reaction rate. When considering the Michaelis–Menten equation [Eq. (3): v = vmax [S]/(K M + [S]) = kcat [E][S]/(K M + [S])], there are three possible ways to achieve a maximum reaction rate: 1. High substrate concentration (increase of space-yield). Enzyme reactions can be accelerated by increasing the substrate concentration up to the limit of saturation (≈ 10K M ). If [S] K M , the enzyme is saturated and Eq. (3) is reduced to Eq. (30).
v=vmax =kcat ·[E]
(30)
17
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
18 Enzyme Catalysis
At substrate saturation, the reaction is zeroth-order with respect to substrate. With a loosely binding substrate, i.e., a high K M value, the enzyme is often saturated only above the solubility limit of the substrate. To improve the reaction rate or the s.t.y. further, according to Eq. (29) the yield per unit time must be increased, which means enhancing either the biocatalyst concentration and/or the time constant kcat . 2. High enzyme concentration. The reaction rate and s.t.y. can be enhanced by increasing the catalyst concentration [E]; in practice, however, in contrast to the formalism of Eq. (3), owing to an either excessive viscosity increase or excess of deactivated protein in the reactor, a maximum limit of enzyme concentration is reached. 3. Increase in the time constant. According to Eq. (3), kcat signifies the time constant of the enzyme reaction [time−1 ], and the corresponding reaction is performed over a time scale 1/kcat [time]. An increase can be effected through the temperature (Arrhenius behavior) but also for instance through changing a protecting group in peptide synthesis [44]. An increase in kcat also increases the acceleration ratio kcat /kuncat [24] of the enzyme catalyst, in contrast to case 2) (see Section 4.3).
A minimum threshold value for reactor productivity can be set at a space-timeyield of about 100 g (L · d)−1 , a value which tends to be compromised more by lack of substrate solubility than by biocatalyst reactivity. Well-developed biocatalytic process often feature space–time yields of > 500 g (L · d)−1 or even > 1 kg (L · d)−1 [44–46]. 4.3 Links between Enzyme Reaction Performance Parameters 4.3.1 Rate Acceleration While users of biocatalysts are often concerned first and foremost about a high maximum rate vmax , a high kcat value, and possibly also a low K M value, a good (bio)catalyst is one that enhances the chemical background rate by as high a factor as possible, at least 105 , possibly 1010 or even more. The dimensionless ratio of the enzyme catalytic rate constant over the uncatalyzed (i.e., the chemical background) rate constant, kcat /kuncat , is called the rate acceleration. An alternative nomenclature used in the literature, the efficiency or proficiency, should be reserved for the term kcat /(kuncat · K M ) quantifying the total energetic cost of enzymatic catalysis [47]. Surprisingly, when analyzing the source of good rate accelerations, i.e., high kcat /kuncat values, or the source of high proficiencies, i.e., high kcat /(kuncat · K M ) values, kcat was found to be within two orders of magnitude for a variety of enzymes whereas kuncat varied by several orders of magnitude, irrespective of the value of kcat (Figures 5 and 6; [24, 48]). The data provides clear guidance that the biggest improvement in enzyme catalysts can be achieved for reactions with very low chemical background rate constants and not by optimizing rate constants or specificities which are already fairly high.
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
4 Performance Criteria for Catalysts, Processes, and Process Routes
10 5 rate constant (e–1)
enzyme rate constants (kcat)
catalyzed
1 t 1/2 1 minute
10 –5 t1/2 1 week uncatalyzed
t 1/2 100 years 10
–10
t 1/2 1 million years 10–15 t 1/2 4 billion years
nonenzymatic rate constants (knon)
Fig. 5 Logarithmic scale comparison of kcat and kuncat (= knon ) for some representative reactions at 25 ◦ C. The length of each vertical bar represents the rate enhancement. [48]. ADC: arginine decarboxylase; ODC: orotidine 5 -phosphate decarboxylase; STN:
staphylococcal nuclease; GLU: sweet potato $-amylase; FUM: fumarase; MAN: mandelate racemase; PEP: carboxypeptodase B; CDA: E. coli cytidine deaminase; KSI: ketosteroid isomerase; CMU: chorismate mutase; CAN: carbonic anhydrase.
4.3.2 Ratio between Catalytic Constant kcat and Deactivation Rate Constant kd The rate equation for a deactivating enzyme in a batch reactor [Eq. (31)] reads:
x·[S]0 −K M ln(1−x)=[E]0 ·(kcat /kd )·{1−exp(−kd ·t)}
(31)
The influence of deactivation depends linearly on the dimensionless ratio kcat /kd , which might serve as a ratio to assess quickly the potential of a deactivating enzyme for synthesis. 4.3.3 Relationship between Deactivation Rate Constant kd and Total Turnover Number TTN Specific enzyme consumption [U per kg product], a hands-on parameter, can be obtained from Eq. (32) [49], where avol,0 [U L−1 ] is the initial volumetric activity).
specific enzyme consumption=(avol,0 ·kd )/(s.t.y.)[U(kg product)−1 ]
(32)
A high value for s.t.y. and low values for [E] (a modified enzyme concentration [g enzyme L−1 ]), avol,0 , or kd all lead to a favorable, i.e., low, specific enzyme consumption. To convert the specific enzyme consumption into the total turnover number TTN, the specific enzyme consumption, sp.e.c. [U (kg product)−1 ] [Eq. (32)] first has to be
19
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
20 Enzyme Catalysis 1010 k cat / K m
enzyme efficiencies (k cat /K m)
(M –1 s–1)
catalyzed
10 5
1 t 1/2 1 minute
10 –5 t1/2 1 week uncatalyzed
t 1/2 100 years 10 –10 rate constant (s–1)
k non (s–1)
t 1/2 1 million years 10 –15 t 1/2 4 billion years
nonenzymatic reaction constants (k non)
Fig. 6 Logarithmic scale comparison of kcat /K M and kuncat (≡ knon ) for some representative reactions at 25◦ C. The length of each vertical bar represents the transition-state affinity or catalytic proficiency (its reciprocal). For abbreviations, see Figure 5 [48].
converted into the enzyme consumption number e.c.n. [g enzyme (kg product)−1 ] [Eq. (26)].
e.c.n.=(sp.e.c.·[E] )/avol,0 [g enzyme(kg product)−1 ]
(33)
If Eq. (32) is inserted into Eq.(33) and the resulting equation into Eq. (27),the result is Eq.(34), which with [E] /MWenzyme = [E]0 [mol L−1 ] becomes Eq. (35). TTN[mol product(mol catalyst)−1 ] =(MWenzyme /MWproduct )(100×s.t.y.)/([E] ·kd )
(34)
TTN=(1000×s.t.y.)/MWproduct ·[E]0 ·kd
(35)
A favorable, i.e., high, TTN is achieved by driving up the s.t.y. of a pertinent experiment while keeping both [E]0 and kd low. (The conversion factor of 1000 does not appear if s.t.y. is expressed in [g (L · d)−1 ] and not [kg (L · d)−1 ].)
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
4 Performance Criteria for Catalysts, Processes, and Process Routes
4.4 Performance Criteria for Process Schemes, Atom Economy, and Environmental Quotient
Roger Sheldon, of TU Delft (Delft, The Netherlands), has compiled a list of trends for chemical and biochemical process technology [50]; they are towards Ĺ Ĺ Ĺ Ĺ Ĺ
catalytic reactions instead of stoichiometric ones, reactions with 100% selectivity at 100% conversion, high substrate concentrations, no detrimental solvents, and no change of solvent during the process, enhanced use of solid or volatile acids and bases as well as pH-stat techniques.
We recognize the importance of high conversion, high selectivity, and thus high product yields; and of high substrate concentration, often leading to high s.t.y. and TTN. However, there are several additional guidelines which help to optimize a complete process. Some of those criteria have been discussed above in the context of the catalyst or the process, such as enantioselectivity or diastereoselectivity instead of simple selectivity to product, and catalyst performance data such as turnover frequency (tof), turnover number (TON), total turnover number (TTN), and space–time-yield (s.ty). Others have to deal with inputs and outputs of the process apart from reactants, catalysts, and products, such as: 1. the atom economy, i.e., the fraction of carbon (or other elements) of the substrate that is utilized or recovered in the product 2. the specific consumption (consumption per kg product) of solvents, salts, and other auxiliaries such as work-up materials (active carbon, filter aids), and 3. the degree of recovery of all materials, ranging from main process components to solvents, salts, and traces as well as contents of trace streams, often termed ancillary losses.
ad 1) Atom economy The degree of utilization of inputs in the final product is termed the atom economy [51]. An atom-economic process uses as many atoms as possible in the reaction stoichiometry Examples of simple, but atom-economic, chemical processes are the manufacture of maleic anhydride by air-oxidation of butane/butene instead of benzene, or of propylene oxide by air-oxidation of propylene (propene) instead of by ring-closing substitution of epichlorohydrin (with loss of HCl). An example of a process with low atom economy is any synthesis with protection/deprotection steps, such as those that occur during peptide synthesis (while possibly still featuring very high product yields). ad 2) Specific consumption of solvents, salts, and auxiliaries This can be captured by evaluating the environmental quotient, EQ [50], consisting of a relative mass of by-product versus target product [kg kg−1 ] (or for our American readers, lb lb−1 )
21
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
22 Enzyme Catalysis Table 2 Comparison of conditions for laboratory- and industrial-scale processes
Medium Substrate pH control Time Temperature, pH gradients
Laboratory-scale processes
Industrial-scale processes
dilute solutions often natural with buffer sufficient for reaction none
(highly) concentrated media often unnatural titration with acid/base; solid acids/bases fast reactions desired heat and mass transfer influence reaction
multiplied by a measure of the environmental unfriendliness. Water or even NaCl might carry a factor of 1 or close to 1, whereas mercury, halogenated solvents, or other toxic by-products might carry a factor of 10 or even 100. ad 3) Recovery of all materials Eco-balances have become increasingly popular with regulators to gauge the impact of discharges. Increasingly, such instruments replace simple levying of costs on discharges or setting inflexible upper limits for concentrations of a range of ecologically unfriendly agents. When aiming for scale-up or when actually already running a process on a pilot or a full industrial scale instead of trying to gain knowledge about a problem under controlled laboratory conditions, one should ensure that the different goals are reflected in different measures taken to obtain data or to develop a catalyst or process for an ultimate large scale. Table 2 contrasts typical laboratory practices with conditions prevailing on production scale. It is prudent, however, for processes to be practiced on a large scale to be developed accordingly, even during the test phase on lab scale. Suggested Further Reading Alan FERSHT, Structure and Mechanism in Protein Science, Freeman, New York, 1999.
William P. JENCKS, Mechanisms in Chemistry and Enzymology, Dover, New York, 1975.
References 1 E. FISCHER, Einfluss der Configuration auf die Wirkung der Enzyme, Ber. dtsch. chem. Ges. 1894, 27, 2985–2993. 2 D. E. KOSHLAND Jr., Application of a theory of enzyme specificity to protein synthesis, Proc. Natl. Acad. Sci. U.S.A., 1958, 44, 98–104. 3 C. TANFORD, Physical Chemistry of Macro-molecules, Wiley, New York, 1961. 4 I. CHAMBERS, J. FRAMPTON, P. GOLDFARB, N. AFFARA, W. MCBAIN, and P. R. HARRISON, The structure of the mouse
glutathione peroxidase gene: the selenocysteine in the active site is encoded by the ‘termination’ codon, TGA, EMBO J. 1986, 5, 1221–1227. 5 F. ZINONI, A. BIRKMANN, T. C. STADTMAN, and A. BOECK, Nucleotide sequence and expression of the selenocysteinecontaining polypeptide of formate dehydrogenase (formate-hydrogenlyase-linked) from Escherichia coli, Proc. Natl. Acad. Sci. USA 1986, 83, 4650–4654.
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
References 6 B. HAO, W. GONG, T. K. FERGUSON, C. M. JAMES, J. A. KRZYCKI, and M. K. CHAN, A new UAG-encoded residue in the structure of a methanogen methyltrans-ferase, Science 2002, 296, 1462–1466. 7 G. SRINIVASAN, C. M. JAMES, and J. A. KRZYCKI, Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA, Science 2002, 296, 1459–1462. 8 R. A. MEHL, J. C. ANDERSON, S. W. SANTORO, L. WANG, A. B. MARTIN, D. S. KING, D. M. HORN, and P. G. SCHULTZ, Generation of a bacterium with a 21 amino acid genetic code, J. Am. Chem. Soc. 2003, 125, 935–939. 9 S. J. POLLACK, J. W. JACOBS, and P. G. SCHULTZ, Selective chemical catalysis by an antibody, Science 1986, 234, 1570–1573. 10 A. TRAMONTANO, K. D. JANDA, and R. A. LERNER, Chemical reactivity at an antibody binding site elicited by mechanistic design of a synthetic antigen, Proc. Natl. Acad. Sci. USA 1986, 83, 6736–6740. 11 T. R. CECH, RNA as an enzyme, Sci. Am. 1986, 255(5), 64–75. 12 T. R. CECH, RNA. Fishing for fresh catalysts, Nature, 1993, 365, 204–205. 13 L. MICHAELIS and M. L. MENTEN, Die Kinetik der Invertinwirkung, Biochem. Z. 49, 1913, 333–369. 14 O. A. HOUGEN and K. M. WATSON, Chemical Process Principles, Part III, Wiley, New York. 1947. 15 J. B. S. HALDANE, Enzymes, M.I.T. Press, Cambridge/MA, USA, 1965. 16 L. PAULING, Molecular architecture and biological reactions, Chem. Eng. News 1946, 24, 1375–1377. 17 L. PAULING, Chemical achievement and hope for the future, Am. Sci. 1948, 36, 51–58. 18 D. E. KOSHLAND Jr., G. NEME´ THY, and D. FILMER, Comparison of experimental binding data and theoretical models in proteins containing subunits, Biochemistry 1966, 5, 365–385. 19 T. J. STILLMAN, A. M. B. MIGUEIS, X.-G. WANG, P. J. BAKER, L. BRITTON, P. C. ENGEL, and D. W. RICE, Insights into the mechanism of domain closure and
20
21
22
23 24
25
26
27
28
29
30
31
substrate sof glutamate dehydrogenase from Clostridium symbosium, J. Mol. Biol. 1999, 285, 875–885. J. L. KURZ, Transition state characterization for catalyzed reactions, J. Am. Chem. Soc. 1963, 85, 987–991. W. P. JENCKS, in: Catalysis in Chemistry and Enzymology, McGraw-Hill, New York, 1969, pp. 288–291. J. KRAUT, Serine proteases: structure and mechanism of catalysis, Ann. Rev. Biochem. 1977, 46, 331–358. J. KRAUT, How do enzymes work?, Science 1988, 242, 533–540. A. RADZICKA and R. WOLFENDEN, A proficient enzyme, Science 1995, 267, 90–93. J. K. LEE and K. N. HOUK, A proficient enzyme revisited: the predicted mechanism for orotidine monophosphate decarboxy-lase, Science 1997, 276, 942–945. G. ZHONG, R. A. LERNER, and C. F. BARBAS III, Enhancement of the repertoire of catalytic antibodies with aldolase activity by combination of reactive immunization and transition state theory, Angew. Chem. Int. Ed. 1999, 111, 3957–3960. J. HASSERODT, Organic synthesis supported by antibody catalysis, Synlett 1999, 12, 2007–2022. P. CARTER and J. A. WELLS, Dissecting the catalytic triad of a serine protease, Nature, 1988, 332, 564–568. D. JOSEPH-MCCARTHY, L. E. ROST, E. A. KOMIVES, and G. A. PETSKO, Crystal structure of the mutant yeast triosephos-phate isomerase in which the catalytic base glutamic acid 165 is changed to aspartic acid, Biochemistry 1994, 33, 2824–2829. M. I. PAGE and W. P. JENCKS, Entropic contributions to rate accelerations in enzymic and intramolecular reactions and the chelate effect, Proc. Natl. Acad. Sci. USA 1971, 68, 1678–1683. U. EICHHORN, A. S. BOMMARIUS, K. DRAUZ, and H.-D. JAKUBKE, Synthesis of dipeptides by suspension-to-suspension conversion via thermolysin catalysis – from analytical to preparative scale, J. Peptide Sci. 1997, 3, 245–251.
23
c02 (JWBK148-Bommarius)
January 17, 2008
18:57
Char Count=
24 Enzyme Catalysis
32 H. GERISCHER, J. F. HOLZWARTH, D. SEIFERT, and L. STROHMAIER, Flow method with integrating observation for very fast irreversible reactions, Ber. Bunsenges. Physik. Chem. 1969, 73, 952–955. 33 C. CREUTZ and N. SUTIN, Vestiges of the inverted region for highly exergonic electron-transfer reactions, J. Am.Chem. Soc. 1977, 99, 241–243. 34 T. C. BRUICE and S. J. BENKOVIC, Chemical basis for enzyme catalysis, Biochemistry 2000, 39, 6267–6274. 35 T. C. BRUICE, A View at the Millenium: the efficiency of enzymatic catalysis, Acc. Chem. Res. 2002, 35, 139–148. 36 C. S. CHEN, Y. FUJIMOTO, G. GIRDAUKAS, and C. SIH, Quantitative analyses of biochemical kinetic resolutions of enantiomers, J. Am. Chem. Soc. 1982, 104, 7294–7299. 37 F. HOFMEISTER, Zur Lehre von der Wirkung der Salze. Zweite Mitteilung, Arch. Exp. Pathol. Pharmakol. 1888, 24, 247–260. 38 P. H. VON HIPPEL and K.-Y. WONG, Neutral salts: the generality of their effects on the stability of macromolecular conformation, Science 1964, 145, 577–580. 39 J. K. KAUSHIK and R. BHAT, A mechanistic analysis of the increase in the thermal stability of proteins in aqueous carboxylic acid salt solutions, Protein Sci. 1999, 8, 222–233. 40 D. B. CRAIG, E. A. ARRIAGA, J. C. Y. WONG, H. LU, and N. J. DOVICHI, Studies on single alkaline phosphatase molecules: reaction rate and activation energy of a reaction catalyzed by a single molecule and the effect of thermal denaturation – the death of an enzyme, J. Am.Chem. Soc. 1996, 118, 5245–5253. 41 H. U. BLASER, F. SPINDLER, and M. STUDER, Enantioselective catalysis in fine chemicals production, Appl. Catal. A: General 2001, 221, 119–143. 42 C. WANDREY, Reaction engineering studies on enzyme catalysts for the development of continuous processes, 913: Habilitation Thesis, TH Hannover, Germany, 1977. 43 A. S. BOMMARIUS, K. DRAUZ, U. GROEGER, and C. WANDREY, 1992, Membrane
44
45
46
47
48
49
50
51
52
53
54
bio-reactors for the production of enantiomer ically pure α-Amino acids, in: Chirality in Industry, A. N. COLLINS, G. N. SHELDRAKE, and J. CROSBY (eds.), Wiley & Sons Ltd., London, 930: Chapter 20, pp. 371–397. A. FISCHER, A. S. BOMMARIUS, K. DRAUZ, and C. WANDREY, A novel approach to enzymatic peptide synthesis using highly solubilizing Nα -protecting groups of amino acids, Biocatalysis 1994, 8, 289–307. J. D. ROZZELL, Biocatalysis at commercial scale: myths and realities, Chimica Oggi 1999, 951: (6/7), 42–47. A. S. BOMMARIUS, M. SCHWARM, and K. DRAUZ, Comparison of different chemo-enzymatic process routes to enantiomeri-cally pure amino acids, Chimia 2001, 55, 50–59. E. A. TAYLOR, D. R. J. PALMER, and J. A. GERLT, The lesser ‘burden borne’ by o-succinyl-benzoate synthase: an ‘easy’ reaction involving a carboxylate carbon acid, J. Am. Chem. Soc. 2001, 123, 5824–5825. R. WOLFENDEN and M. J. SNYDER, The depth of chemical time and the power of enzymes as catalysts, Acc. Chem. Res. 2001, 34, 938–945. U. KRAGL, D. VASIC-RACKI, and C. WANDREY, Continuous production of L-tert-leucine in series of two enzyme membrane reactors, Bioproc. Eng. 1996, 14, 291–297. R. A. SHELDON, Consider the environmental quotient, CHEMTECH 1994, 3, 38–47. B. M. TROST, On inventing reactions for atom economy, Acc. Chem. Res. 2002, 35, 695–705. A. FERSHT, Enzyme Structure and Mechanism, 2nd edition, Freeman, New York, 1985. P. T. ANASTAS and T. C. WILLIAMSON, Frontiers in green chemistry, Green Chemistry 1998, 1–26. M. VON SMOLUCHOWSKI, Versuch einer mathematischen Theorie der Koagulations-kinetik kolloidaler L¨osungen, Z. physik. Chem. 1917, 92, 129–168.
1
Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes Ariel B. Lindner, Zelig Eshhar, and Dan S. Tawfik Weizmann Institute of Science, Rehovot, Israel
Originally published in: Catalytic Antibodies. Edited by Ehud Keinan. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30688-6
Abbreviations
TS – transition state; TSA – TS analog; OAS – oxyanion stabilization; OAH – oxyanion hole; AChE – acetylcholinesterase; D-Abs – D-Antibodies (esterolytic antibodies D2.3, D2.4, and D2.5).
1 Introduction
During the past 16 years, knowledge gained from studies of natural hydrolases and the mechanism of hydrolytic reactions led to a variety of strategies for generating antibodies that catalyze hydrolytic reactions. In particular, ester hydrolysis became the most intensely studied antibody-catalyzed reaction. Esterolytic antibodies provided the first ‘proof of principle’ of antibody-based catalysis, spearheading the wide spectrum of antibody catalysts detailed in the chapters of this book. Although esterolytic antibodies do not, in general, exhibit enzyme-like rates, knowledge may be extracted from the many mechanistic and structural studies of hydrolytic antibodies concerning the mechanism of action of hydrolytic enzymes. Comparing hydrolytic antibodies to hydrolytic enzymes is the underlying theme of this chapter. Specific-base-(hydroxyl)-catalyzed and general-base-catalyzed is the mechanism by which most natural esterases act (Fig. 1). The reaction proceeds via relatively low activation barriers (the addition TS being higher than the elimination TS [1]), and a well-defined singular tetrahedral, oxyanionic intermediate (I1 , Fig. 1a). Inherent in hydroxyl-catalyzed ester hydrolysis is the linear of the hydrolytic rates. In implementing these basic mechanisms, which are also seen in solution, enzymes apply complex mechanisms, combining a variety of catalytic forces, as exemplified Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
Fig. 1 Mechanisms of ester hydrolysis. (a) Hydroxide-catalyzed ester hydrolysis. The rate-limiting step consists of an exogenous hydroxide attack on the ester’s carbonyl, forming, via the addition TS1, a tetrahedral oxyanionic intermediate (I1 ). The latter collapses, via the elimination TS2, to form the carboxylate and alcohol products. Top formula: a representative phosphonate TSA, where, for the D-antibodies, R = glutaryl glycine and R = p-nitrobenzyl [24]. The numbers correspond to bond lengths de-
rived from ab initio calculations [1]. (b) The endogenous nucleophile mechanism of Serine hydrolases. The nucleophilic attack is carried out by the active site’s seryl alkoxide, activated by an active-site general-base residue (B:) to form a tetrahedral oxyanionic intermediate (I2 ). This is followed by a rate-limiting deacylation step mediated by a hydroxide ion generated in situ by the basecatalyzed proton elimination of an activesite water molecule, releasing the products via a second intermediate (I3 ) [2].
by three major families of hydrolytic enzymes: seryl/cysteyl hydrolases (Fig. 1b), aspartyl hydrolases (Fig. 1c), and zinc hydrolases (Fig. 1d). In essence, two major catalytic forces are utilized by these enzymes [2]: i) A nucleophilic attack is mediated by a precisely positioned nucleophile, either an endogenous general base (seryl alkoxide (Fig. 1b, step I)) or an explicitly bound specific base (a hydroxyl generated via general-base-catalyzed
1 Introduction
Fig. 1 (c) The general base – general acid (“push-pull”) mechanism of Aspartic hydrolases. An active-site bound water molecule is activated by a general-base carboxylate residue, whereas the oxyanion tetrahedral intermediate (I4 ) is stabilized by a proton donation from a second aspartyl residue. Residues maintaining the carboxylates’ protonation level [135] are not shown for clarity. An alternative “symmetric” model can be found in [136]. (d) The metal-mediated
mechanism of Zinc hydrolases. The nucleophilic attack is carried out by a zinc-bound water molecule, activated by a general-base Glu residue. The formed oxyanionic intermediate (I5 ) is stabilized by its coordination to the metal ion. Though a similar fivecoordinated zinc intermediate has recently been observed [137], a four-coordinated intermediate was previously suggested [2, 138].
deprotonation of an active-site water molecule; Fig. 1a, step II, 1c and 1d), to yield a tetrahedral, oxyanionic intermediate. ii) Oxyanionic stabilization (OAS) of the tetrahedral, oxyanionic intermediate and the TSs leading to its formation and breakdown. The negative charge developed on the carbonyl oxygen is stabilized by hydrogen bonding (Fig. 1b, c) or proton transfer (general-acid catalysis), or by a positive charge (Fig. 1d). iii) Protonation of the alkoxide leaving group by general-acid catalysis.
The relative positioning of various active-site residues involved in the above mechanisms provides the recognition and stabilization of the tetrahedral transition
3
4 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
state (TS). It is therefore expected that, although described as separate forces or contributions, their effect on catalysis is coupled. This coupling is indeed encoded in the enzymes’ active-site structure – for example, the seryl residue (Fig. 3), the aspartyl residues (positioned by a shared a hydrogen bond; Fig. 1b) and the zinc ion (Fig. 1c) all provide both the endogenous nucleophile as well as a hydrogen bond to the TS oxyanion. In addition to mechanism above, enzymes take advantage of a combination of other forces, including medium (solvation/desolvation effects), substrate (groundstate) destabilization, remote recognition, as well as co-ordinated conformational changes [2, 3]. Thus, the remarkable catalytic efficiency of natural enzymes, at times limited only by the rate of diffusion, is the outcome of many forces acting in concert. The separation of these factors and the assessment of their independent contributions to catalysis is far from trivial, as exemplified by various mutagenesis studies (e.g., see [4, 5]). Catalytic antibodies, raised to mimic single facets of the complex enzymatic regimes, provide a unique opportunity to examine and quantify enzymatic forces and mechanisms of action individually. 2 Chapter Overview
The first and most widely applicable strategy for generating esterolytic antibodies is by immunizing with analogs of the tetrahedral, oxyanionic intermediates of ester hydrolysis (Fig. 1a), which are generally referred to as transition state analogs (TSAs). Analysis of the structure, mechanism and catalytic efficiency of these antibodies provides the opportunity to compare and assess oxyanion stabilization (OAS) as the primary source of catalysis in both esterolytic antibodies and natural, hydrolytic enzymes (Section 2.1). Surprisingly, several antibodies directed towards OAS (e.g., 43C9 [6]) have incorporated, haphazardly, an endogenous nucleophile as part of their mechanism of action. These antibodies, together with studies attempting to directly obtain antibodies with an endogenous nucleophile (e.g., reactive immunization [7] and chemical modification strategies [8, 9]), form the basis for our discussion on nucleophilic reactivity (Section 2.2). This is accompanied by an analysis of “chemical rescue” experiments, where strong exogenous nucleophiles supplement the initial OAS activity of various antibodies [10–13] (Section 2.3). Several studies, attempting at the generation of esterolytic antibodies with general-acid/base mechanism either by a “bait and switch” strategy (Section 2.4) or metal incorporation (Section 2.5), are then qualitatively assessed. Finally, the roleof conformational changes, suggested in a number of catalytic antibodies [14–20], is described vis-a-vis known contributions of conformational changes to natural enzymes (Section 2). 2.1 Catalysis by Oxyanion Stabilization (OAS)
The most commonly used TSA haptens for the generation of esterolytic antibodies are phosphonates, although other phosphate derivatives such as phosphinates,
2 Chapter Overview
phos-phonoamidates, and phosphothioates have also been used. This indirect approach for the generation of biocatalysts was first formulated by Jencks [21]. Immunization with these analogs generates antibodies that catalyze ester hydrolysis via preferential binding of the reaction’s tetrahedral, oxyanionic TS compared to the planar, uncharged ester substrate (Fig. 1a). This approach relies on optimal representation of the reaction’s TSs by the TSA. It has been shown, for example, that substrate analogs completely fail to generate esterolytic antibodies [22]. Thus, as discussed below, TSA design is a primary issue. 2.1.1 Fidelity of TSA design Phosphate derivatives that are potent inhibitors of hydrolases were shown to exhibit TSA characteristics, albeit with some important discrepancies ([23] and references therein). Ab initio calculations suggest that these TSAs better present the elimination TS2 , vis-a-vis the addition TS1 , the higher and thus the rate-limiting barrier of the two TSs [1] (Fig. 1a). The TSA P-O bonds overestimate the corresponding lengths of the TS, apart from their being significantly shorter (1.48 Å) than the 2.2 Å distance in the addition TS between the incoming hydroxyl nucleophile and the ester’s carbonyl carbon (Fig. 1a). The symmetrical charge partition between the phosphonates’ oxygens overestimates the charge formed in the actual TS and of the incoming hydroxyl nucleophile. Both of the phosphonates’ oxygens are expected to bait antibodies with H-donor residues. This is crucial to stabilizing the developing oxyanionic charge, yet “baiting” a basic residue would be preferable in the position representing the nucleophilic attack. The latter may act as a general base to activate the incoming nucleophile or as an endogenous nucleophile. Nonetheless, despite the above-described infidelities, phosphonate TSAs (and related derivatives) are fairly close mimics of the TSs of ester hydrolysis and hence elicit antibodies in which OAS is the primary, or often the sole, source of catalysis. 2.1.2 Esterolytic Antibodies Based Solely on Oxyanion Stabilization Our comparison of OAS in esterolytic antibodies with natural enzymes is based on a number of antibodies in which OAS was shown to be the only source of the catalytic activity (group I antibodies). This group includes antibodies D2.3, D2.4 and D2.5 [10, 24–26], 48G7 [27, 28], 29G11 [29, 30], and 6D9 [31–33] (Group I; Fig. 2), which show most if not all of the following characteristics: i) The observed rate acceleration correlates well with the differential affinity toward the TSA vs the substrate: K TSA /K S ≥ kcat /kuncat (Table 1) ii) A linear phase is observed in the pH-rate profile iii) Crystal structure indicating active-site residues that comprise the oxyanion hole iv) Mutation of OAS residues eliminates catalysis (Table 2).
Other antibodies (Group II; Fig. 2) for which structural data is available, CNJ206 [14, 34], 17E8 [29, 30, 35], and 7C8 [11] were all raised against similar or identical TSAs to those of Group I, and exhibit K TSA /K S ≤ kcat /kuncat . This suggests that forces additional to or other than OAS take part (Sections 2.2 and 2.4). Nonetheless, their catalytic efficiencies may be taken as an upper limit to the contribution of OAS to
5
6 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
Fig. 2 The “oxyanion hole” of esterolytic antibodies. Oxyanion-stabilizing residues are depicted from the TSA complexes of antibodies CNJ206 (1KNO in PDB), 17E8 (1EAP), D2.3 (1YEC), 48G7 (1AJ7), 6D9
(1HYX), and 7C8 (1CT8), viewing from the oxyanion representing the incoming hydroxyl nucleophile. CPK color codes: gray: carbon, red: oxygen, blue: nitrogen, orange: phosphate, and green: chlorine.
their activity. Antibody 43C9, where an endogenous nucleophile was shown to be involved [6], is not considered, as the two mechanisms that affect its catalysis (OAS and nucleophilic catalysis) are difficult to separate (Section 2.2). Detailed comparisons of the hydrolytic antibodies’ crystal structures in the presence of the respective TSAs are available in several reviews [36–37] . A common motif (“canonical binding array”) of interactions with the TSA oxyanionic core was
2 Chapter Overview Table 1 Oxyanion stabilization by hydrolytic antibodies
kcat /kuncat G‡ cat-uncat G◦ TSA-S Catalytic Antibody × 103 Kcal/mol Kcal/mol OAH residuesa)
Secondary OAH residuesb)
Group I D2.3 D2.4 D2.5 48G7 29G11 6D9
110 36 1.9 16 2.2 0.9
7.0 6.3 4.5 5.8 4.6 4.1
7.0 6.3 4.3 6.3 6.0 4.1
Tyr100dH, Asn34L Tyr100eH, Asn34L Tyr100eH, Ser34L-H2 O His35H, (Tyr96H) Lys93H, (Tyr96H) His27d
Trp95H, Tyr96L-H2 O Trp95H, Tyr96L-H2 O Trp95H, Tyr96L-H2 O Arg96L, Tyr33H His35H, Tyr96L –
a) Residues in hydrogen-bond distance from the phosphonate’s oxyanion representing the ester substrate carbonyl (Fig. 1a), assigned by comparison of the antibodies’ phosphonate- and substrate analog-bound structures (D2.3, D2.4 and D2.5) [128], mutagenesis analysis (see Table 2), pH-dependency analysis (D-antibodies [10], CNJ206 [7, 129]C8 [11]), and docking (48G7, 29G11, and 17E8 [39]). b) Residues in hydrogen-bond distance from the phosphonate’s oxyanion representing the incoming hydroxide nucleophile (Fig. 1a) (see text for assignment).
revealed, identifying the residues contributing to the binding of the two oxyanions (Table 1, Fig. 2). Convergent evolution is evident by the sequence similarities as well as structural homology. Antibodies CNJ206 and 17E8 exhibit only 41–52% differences in the primary sequence of the heavy- and light-variable regions, yet share 8 out of 10 contact residues with the hapten. Antibody 48G7 shares the same light chain with CNJ206 and has 7 of 10 of the heavy chain’s CDR residues in common with CNJ206 (for the detailed analysis see [36–38]). Another example of convergence is seen in the structures of D2.3 and D2.4, sharing the same germline genes with 16 positions difference in their variable regions. In particular, the CDR3 H loop of D2.4 differs in 4 positions and includes an amino acid insertion compared to D2.3, as a result of different D-J segments’ junction. Consequently, their CDR3 H loops adopt Table 2 Oxyanion hole mutants of hydrolytic antibodies
Antibody
OAH residue mutant
k cat /kuncat wt = 1.0
D2.3/4b)
AsnL34 → Gly TyrH100d → Phe TyrH100d → Gly TyrH100d → Lys TyrH100d → Ser His35H → Glu His35H → Gln His27d → Ala
1.1 n.d. n.d. n.d. 0.08 0.04 0.6 n.d.
48G7a) 6D9 a) b)
Mutants expressed as recombinant Fab fragments [28]. Mutants expressed as scFv chimera of D2.4 VL /D2.3 VH [10]. n.d. – catalytic activity not detected.
7
8 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
distinctly different structures yet place the critical OAH residue Tyr100d (D2.3) and Tyr100e (D2.4) at the exact same position with respect to the TSA’s oxyanion [26]. The TSAs of Group I antibodies share two epitopes: an aryl or benzyl leaving group, and a phosphonate monoester (Fig. 2). While aromatic moieties are well known as immunogenic epitopes (hence their common use in TSA haptens), Tantillo and Houk suggested that the determining epitope may be the bidentate “transition state epitope” oxyanionic moiety. This is supported by the identification of similar motif in crystal structures of arsonates, sulfonates and DNA binding antibodies. Interestingly, the origin of many of the residues in contact with the TSA phosphonates can be tracked down to the germline genes (e.g. TyrH33 of 48G7, HisH35 of CNJ206 and 48G7, His27d of 6D9). Exceptions do occur though, as in the structures of 6D9 and 7C8 raised against a secondary p-nitrophenyl phosphonate. In contrast to other Group I TSAs, in antibodies 6D9 and 7C8, the linker via which the TSA hapten is linked to a carrier protein, is located on the leaving group and not on the phosphonate (Fig. 2). The active site of 6D9 and 7C8 is shallower, the phosphonate is less buried than in other Group I antibodies, and the active-site residues in contact with the TSA are fewer and different [11]. Despite the evident convergence between Group I’s active sites, the immune system provided some variations within the interactions forming the oxyanion hole (Table 1). The distinction between the position of the oxyanion hole and that of the entering hydroxy nucleophile was determined by the respective structure in the presence of substrate analogs (as in the D-antibodies), by pH profiles pointing to key hydrogen-bond donor residues (D-Abs, CNJ206), and by mutagenesis (D-Abs, 48G7, 17E8), and was complemented by docking in silico (Table 1, Fig. 2) [39]. 2.1.3 Antibody Oxyanion Holes The D-antibodies (D2.3 and D2.4), the most efficient antibodies of Group I, are taken here as a model for examining the contribution of active-site residues acting OAH. Two such residues (Tyr100dH and Asn34L) were suggested to form the crystal structures of the D-antibodies in complex with the TSA and a substrate analog (Fig. 2). In the crystal structure of D2.3 with an amide substrate analog (1YEF in PDB), 3.8 Å is the distance between the aspargine’s δ-amide proton donor and the scissile bond carbonyl-oxygen (compared to the 2.6 Å distance from the oxygen of TyrH100d), suggesting that Asn34L interaction may be specific to the transition state. In addition, D2.5, lacking Asn34L, has 20–50 fold lower catalytic efficiency than D2.3 and D2.4 (Table 1). However, mutating Asn34L to Glycine did not affect the rate of catalysis (Table 2), leaving Tyr100d the primary functional OAH residue. This was further supported by several Tyr100d mutants (Table 2) and pH profiles of binding and catalysis [10]. The pH-binding profiles revealed a 1.6–1.9 units difference in pK a between the substrate-antibody and the TSA-antibody complexes. Thus, the negative charge marking the difference between the substrate and the TSA (and hence the actual TSs of the reaction) dramatically strengthens the tyrosyloxyanion H-bond, thus rendering it the key residue in the D-Abs oxyanion hole [10]. The specific free energies, 6.3 and 7 Kcal/mol for the complexation of D2.4 and D2.3
2 Chapter Overview
with the TSA G◦ TSA –G◦ s ) are identical to the reduction in the activation energy barrier derived for the D-Abs from their rate acceleration (GC‡ cat –G‡ uncat ) (Table 1). This tight correlation suggests that the D-Abs active sites convert binding energy into catalysis with close to 100% efficiency, corresponding to the contribution of the tyrosyl OAH. Judging from Tables 1 and 2, tyrosine is the most efficient single residue for OAS. Further, a hierarchy may be derived where Tyr (contributing up to 7.0 Kcal/mol) > His (4.1 Kcal/mol, assumed from 6D9 single His OAH and48G7 His35HGlu mutant) ≥ Lys (∼3 Kcal/mol, 29G11 Lys-main chain amide OAH combination is weaker than 48G7) > backbone amide (2.2 Kcal/mol, judged by comparison of CNJ206 to wt48G7; 48G7 HisH35Glu mutant’s activity, which is derived from Tyr97H backbone amide, is approximately half that of CNJ206 with two backbone amides). Though highly speculative, this order of activities may be justified by the oxyanion pK a s found for serine hydrolase substrates, ranging between 7 and 10 [40, 41]. It is expected that hydrogen bonds between donors and acceptors with matching pK a would form the strongest hydrogen bonds [42]. Tyrosine’s hydroxyl (pK a ≈10) may afford the closest match to ester’s TS pK a . It is expected that this pK a would be considerably lower than the pK a of the TSs, leading, for example, to amide hydrolysis. The former is often stabilized in nature by OAH comprised of amide bonds that have a pK a much higher than the hydroxyl of tyrosine (see below) [5, 43]. Matching pK a values may also explain the significantly stronger interaction of the D-Abs’ tyrosine (∼7.0 Kcal/mol) compared to the average binding energy of a charged group via a single hydrogen bond (3.5–4.5 Kcal/mol) [44]. 2.1.4 Oxyanion holes–Antibodies vs. Enzymes The versatility of OAS solutions adopted by the immune system is comparable to, or perhaps even higher than, that observed in natural hydrolytic enzymes. Excluding metal hydrolases, which are not easily mimicked by antibodies (Section 2.5), the overall majority of enzymes use a combination of two (e.g., chymotrypsin, trypsin) or three (e.g., AChE, Fig. 3) backbone amides to form their OAHs. Their contribution to the catalytic activity cannot be addressed directly by mutation analysis or separated according to the individual contribution of each backbone NH. OAS could only be assessed by attributing the residual activity of the triad mutants to OAS [e.g., trypsin [45], acetylcholinesterase (AChE) [46]; Table 3]. A minority of enzymes were found to utilize a combination of a backbone NH groups as H-donor together with a side-chain NH group [e.g., Asn in subtilisin (Fig. 3), Gln, or Ser (Table 3)]. Interestingly, the recently characterized prolyl oligopeptidase-like family [43, 47, 48] and the structurally similar cocE esterase (Fig. 3) [49] utilize a tyrosine side chain (Table 3) together with the backbone NH of the seryl nucleophile. Indeed, the efficient OAS by catalytic antibodies such as the D-antibodies may indicate that tyrosine could be as efficient as or even more efficient than NH groups. The contribution of the OAH side chain to the overall catalytic rates of these enzymes was assessed by mutation of the putative side chain (Table 3). Such studies are complicated by mutation-induced structural changes of the active site as clearly
9
10 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
Fig. 3 The ‘oxyanion hole’ of hydrolytic enzymes. Oxyanion-stabilizing residues are depicted from the following inhibitor-enzyme crystal structures: acetylcholinesterase (AChE) with m(N,N,N-trimethylammonio)-
2,2,2-trifluoroacetophenone (TMTFA) (1AMN in PDF), cocaine esterase I (cocE) with phenyl boronic acid (1JU3), and subtilisin with D-p-chlorophenyl-1-acetamidoboronic acid (1AVT).
seen in the cocE Tyr44Phe mutant’s structure. As a result of a loss of H-bonding to the mutated tyrosine, three Trp residues change their position, and the Phe ring is tilted by 20◦ compared to the wild-type tyrosyl ring [5]. In addition, given the average bacterial translation error rate of 5 × 10−4 per amino acid [50], wild-type revertants may significantly affect these single-mutants’ analyses. Histidine-containing OAH (as in antibody 6D9; Fig. 2) have not yet been found in nature. The lower efficiency of His in OAS is seen in hydrolytic antibodies (see above). The latter may be due to the relatively low pK a of the His side chain or the more demanding desolvation of the charged His residue compared to the more hydrophobic tyrosine, aspargine, or serine. Interestingly, several structural studies (NMR of trypsin-leupeptin complex [51]; X-ray of proteaseA-chymostatin [52]) identified the oxygen carbonyl of the aldehyde inhibitors pointing toward the active-site’s His, suggesting that the latter has comparable affinity to the oxyanion with respect to the original OAH two backbone NH groups. This observation is in agreement with the OAS efficiency scale postulated from the catalytic antibodies’ data. The estimated values for the individual residue’s contribution to OAS are generally lower for enzymes (Table 3) than for the D-antibodies and most other esterolytic antibodies (Tables 1 and 2). It may be that the enzymes’ mutagenic studies underestimate the OAH side chain’s contribution, as the remaining H-bond by the backbone NH at the OAH is strengthened in the absence of its side-chain partner, hence compensating for its absence. Such compensation may explain the mild effect of 48G7 His35HGlu mutation (40-fold reduction in catalytic rate), compared to 6D9 His27dAla mutation. The latter lacks a backbone NH, which is present in 48G7’s OAH (Table 2). The only natural OAH suggested to have a similar contribution to catalysis as that of the antibodies described above is of AChE, which has three hydrogen donors (Fig. 3) (compared to 1–2 in most antibodies). Did enzyme evolution miss out on an efficient OAS strategy that is easily implemented by antibodies? Few plausible answers come to mind. The OAH may not have evolved independently in nature, but paralleled (or followed) the appearance of a catalytic triad that takes care of nucleophilic and proton-transfer catalysis (Fig. 1b). Once the latter was established,
2 Chapter Overview Table 3 Contribution of oxyanion stabilization to enzymes, mutagenesis studies
Enzyme
OAH
Subtilisin Asn155, Thr220, (Ser221)
CarboxypeptidaseA Trypsin
Gln19
(Ser42) Prolyloligopeptidase Oligo-peptidaseB CocE
AChE
Arg127 (Gly193) (Ser195) (Ser195) Gln19Ala (Cys25) Ser42 Ser42Ala (Gln121) Tyr473 (Asn555) Tyr452 (?)e) Tyr44 (Tyr114) (Gly118) (Gly119) (Ala201)
OASa) ‡ G cat−uncat Kcal/mol
Ref.
Asn155Gly Asn155Ala Thr220Ala Asn155Ala Thr220Ala Ser221Alab) Arg127Alac) Ser195Ala
3 4.2 1.8 6.0
[4] [130] [130] [130]
4.8 6.0 4.2
[4] [131] [45]
3
[132]
3.6
[133]
Tyr473Phe
3.7
[43, 47]
Tyr452Phe Tyr44Phe
3.6 > 4.4d)
[48] [5]
-f)
5–7
[46]
Mutation
a) Contribution to oxyanion stabilization derived from OAH-residue mutation’s effect on catalytic rates or: b) Contribution to oxyanion stabilization derived from residual activity after mutation of the enzyme’s endogenous nucleophile. c) Though Thr220 is only within 4.0 Å of the oxyanion, dynamic simulations suggest this residue’s participation in subtilisin’s OAS. d) Limited by detection of mutant’s activity. Significant structural perturbation of mutant’s active site e) As crystal structure is unavailable, the identity of an assumed second OAH residue (analogous to the homologous Asn555 of prolyloligopeptidase (see separate entry) is unknown. f) Estimated by the authors based on AChE-TSA crystal structure and accumulative data.
together with other mechanisms such as substrate destabilization [46] [52], the hydrolytic activity may have been sufficiently high (diffusion-limited at times [46]) to make further improvements in OAS unnecessary. Alternatively, it was recently suggested that the charged imidazole of the subtilisin triad’s histidine has a major role of lowering the pK a of the TS oxyanion, thereby diminishing the OAH binding energy [40]. In such a scenario, the major role of the OAH is substrate alignment rather than TS stabilization as was initially suggested [53]. It remains to be seen whether these results would extend to other natural enzymes.
11
12 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
In conclusion, hydrolytic antibodies validate the fundamental role of OAS in enzyme catalysis. They do so with similar or perhaps better efficiency than enzymes. OAS measured for antibodies could be thus used as an upper limit to the contribution of OAS in natural hydrolytic enzymes, providing a reference value for the contribution of their other mechanistic features (see below).
2.1.5 Antibody Affinity and Rate Acceleration Limitation Despite the limitations of TSA design, our analysis suggests that many antibodies do convert TSA binding energies to catalysis with nearly 100% efficiency (Table 1). This may suggest a facile route to increased catalytic efficiency via antibodies with higher affinity. However, it appears that antibodies with affinities higher than 1 nM may not be easily obtained. Data collected on the affinity of a large number of antibodies [54] suggest an antibody of 0.1–1 nM [55]. This is derived from the upper range for the association rate (< 5 × 10 M−1 s−1 ) and a dissociation rate (koff ) that is sufficient to trigger B-cell activation (10−3 –10−4 s−1 ). These studies suggest (see [54, 55] and references therein) that 0.1–1nM affinities are sufficient for mounting an efficient immunological response, arguing that the immune system is unlikely to produce higher affinity antibodies. By inference, with a low-average K M (≈KS ) in the mM range [56–58] a “catalytic ceiling” of K TS /KS = kcat /kuncat = 106 − 107 may be derived for catalytic antibodies raised against TSA haptens even if these happen to perfectly mimic the reaction TSs. Thus, the turnover number (kcat ) of antibody-mediated hydrolysis of esters commonly targeted by catalytic antibodies (kuncat 10−6 –10−5 s−1 ) is expected to be well under 10s−1 and the specificity constant (kcat /K M ) ≤ 104 M−1 s−1 . Under the immunological constraints limiting K TSA , a lower K M (or higher substrate affinity) would result in a lower rate acceleration, without lowering the specificity constant. Indeed, a survey of the published catalytic antibodies conforms with the above theoretical limitation (Fig. 4). Moreover, in some instances, not all of the TSA binding energy is directed towards TS stabilization, but rather to non-contributing epitopes of the TSA hapten. For example, antibody 48G7 was shown to bind its TSA with 30 000-fold higher affinity compared to its germline counterpart, resulting in only ∼80-fold improvement of kcat /K M [16]. Although independent values of kcat and K M were not reported, it is likely that a significant part of the increase in binding energy upon maturation was directed to both the substrate and the TSA, and therefore led to no increase in rate acceleration. Similarly, attempts to improve the rate of esterolytic antibodies by directed evolution aimed toward higher TSA affinities have shown that major increases in affinity may result in minor rate improvements [59, 60]. Improving the rates of esterolytic antibodies is not a purely academic challenge. The limitations in rate hinder at least two of the most promising biomedical applications of hydrolytic antibodies – cocaine detoxification [61] and pro-drug therapy [62]. Catalytic forces other than OAS may help in closing the gap of ∼100-fold reactivity [5, 62] required for their successful implementation. These are discussed below.
2 Chapter Overview
Fig. 4 Rate enhancements exhibited by catalytic antibodies: a compilation of rate accelerations exhibited by all catalytic antibodies [139]. The theoretical threshold is set by the affinity ceiling of antibodies as described
in the text. The only antibody to surpass the theoretical threshold dictated by the immune system is 38C2 [140], achieved by reactive immunization (see text for details).
2.1.6 Nucleophilic Catalysis The vast majority of esterolytic antibodies make use of a solution (exogenous) hydroxide ion to generate the tetrahedral, oxyanionic intermediate that leads to ester hydrolysis. This is not particularly effective, as at or close to neutral pH, hydroxide ion concentrations are very low. The enzymatic solutions to this problem are basically two: (i) nucleophilic catalysis by an active-site residue (an endogenous nucleophile) via an acyl-enzyme intermediate; (ii) a general-base residue and a bound water molecule leading to the in situ generation of an appropriately positioned hydroxide ion. Inspired by natural hydrolytic enzymes that utilize a covalent catalytic mechanism (Fig. 1b), and with the goal of catalyzing the more demanding reaction of amide hydrolysis, several attempts were made to introduce an endogenous nucleophile into antibodies’ combining sites. But, in fact, the more effective cases of endogenous nucleophilic catalysis seem to be in antibodies in which the endogenous nucleophile was incorporated haphazardly. Antibodies in which endogenous nucleophilic catalysis plays a role are discussed in Section 2.2 below. In the same vein, attempts were made to generate antibodies in which the nucleophilic hydroxide is generated in situ by general-base catalysis (discussed in Section 2.4 below). Alternatively, the efficiency of the exogenous nucleophilic attack can be enhanced by applying nucleophiles more potent than hydroxide (“chemical rescue”; Section 2.3). 2.2 Endogenous Nucleophiles in Hydrolytic Antibodies 2.2.1 Reactive immunization (RI) Several attempts were made to introduce endogenous nucleophiles into antibody combining sites, either by chemical modification [9, 63] or by immunization. The
13
14 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
Scheme 1
latter was aimed at “reactive immunization”, whereby the labile phosphonate diester 1 (Scheme 1), a known covalent inhibitor of hydrolases, was used as a hapten [7, 64]. During the immunization process, an RI hapten should covalently trap antibodies with a nucleophilic residue, thereby forming a highly stable complex that is selectable by the immune system. There is, however, a major caveat to this methodology. Hydrolysis of the activated p-methylsulfonate ester substrate via nucleophilic catalysis will result in an that is significantly more stable than the original substrate. For effective catalysis to occur, an would be necessary to activate the hydrolysis of the acyl-enzyme intermediate (Fig. 1b). Such a residue is counter-selected in this system, as it would also hydrolyze the covalent phosphonyl hapten-antibody bond, thus eliminating the selectable advantage of the reactive immunization. Moreover, hapten 1 is labile (estimated half-life mice’s serum is 24 h), resulting in a mixture of haptens, 1 and 2, (the bi- and mono-phosphonate esters, respectively). Unexpectedly, the affinity to the “baiting” hapten 1 is rather low (in the micromolar range) compared to the high nanomolar affinity of antibodies raised against hapten 2. The kinetic parameters of anti-1 antibodies, suggest that OAS (represented by hapten 2) does not account for the RI-derived antibodies’ catalytic efficiencies (e.g., for antibody 15G12, K TSA(2) /K M = 2; kcat /kuncat = 6.6 × 105 ). However, there is no other evidence to support nucleophilic catalysis in the hydrolytic antibodies raised against RI hapten 1 (e.g., burst kinetics, trapping of the covalent intermediate, non-competitive inhibition by hapten 1). Some of the RI antibodies were reportedto catalyze the dephosphonylation of their cognate hapten 1, albeit with a low turnover (< 3), because of product inhibition [7] (such single turnover rate enhancement by
2 Chapter Overview
ester-hydrolytic antibodies, raised against the ground state hapten, was reported as early as 1980 [65]). Schowen has recently suggested that TS flexibility, where the classical bipyramidal dephosphonylation TS may adopt a more tetrahedral-like conformation, renders it available for stabilization by the anti-phosphonate antibodies [66]. It should be mentioned that, although RI achieved a limited success with ester hydrolysis, it has proven highly effective for reactions such as aldol condensation, retro-aldol and Michael addition reactions that proceed via a covalent Schiff-base enzyme-substrate intermediate [67, 68]. 2.2.2 Serendipity and 43C9 Antibody It has been suggested that several hydrolytic antibodies raised against phosphonate TSAs include an endogenous nucleophile as part of their mechanism of action. Unfortunately, in most cases, only partial evidence is available. In general, proving a nucleophilic mechanism is not a trivial matter, and several independent pieces of evidence need to provided, including the direct identification of an acyl-enzyme intermediate, before a definite statement can be made [2]. A “ping pong” mechanism by antibody PCP21H3 (kcat /kuncat not determined) in a transesterification reaction of a vinyl ester has been suggested. The antibody exhibits burst kinetics for the hydrolysis of a related p-nitrophenyl ester [17]. Product inhibition, common to many catalytic antibodies, was detected as well, yet this does not outweigh the evidence for the covalent mechanism. A similar burst in product formation interpolated to a 1:1 molar ratio of product to antibody was reported for antibody 6F11 (kcat /kuncat = 3.4 × 103 ) [69]. Chemical modification oftyrosyl residues resulted incomplete activity loss. Antibody 20G9 (kcat /kuncat = 500) exhibits “burst kinetics”, and this led to the suggestion of nucleophilic catalysis [70–72]. However, the kinetic analysis is complicated by strong product inhibition, and no evidence apart from the “burst” is available. Incubation of the antibody with N-cbz-glycine -O-phenyl ester resulted in multiple acylation of the antibody and a significant decrease in its activity, yet acylation of an active-site residue such as lysine that is not directly involved in catalysis may account for this result [73]. In addition, the reported acid-limb pH profile fitting a tyrosyl pK a (9.6), as well as the loss of activity when chemically modifying tyrosyl residues with tetranitromethane (TNM), could well be explained by this residue’s participation in OAS [10, 73, 74]. Antibody 7C8 (kcat /kuncat = 3.4 × 103 ) also exhibits a pH profile with an acidlimb with a pK a ≈ 9.0. The rate is also insensitive to the addition of a strong nucle-ophile (hydroxylamine; see also Section 2.3 below). These led to the suggestion of nucleophilic catalysis by an active-site tyrosyl hydroxyl [11]. However, D2.3 and D2.4 antibodies also manifest the above characteristics [10], yet their active site tyrosine (H100d) was shown to serve as an OAH rather than a nucleophile. Similarly, 7C8’s crystal structure reveals a sole residue that may serve as an OAH, Tyr95H. Another clear demonstration of the difficulties associated with proving a nucleophilic mechanism is antibody 17E8. Its crystal structure and hydroxylamine partitioning experiments suggested the existence of a Ser-His dyad, reminiscent of natural serine hydrolases [13, 35]. However, direct evidence of an acyl-enzyme
15
16 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
intermediate was not provided, and the proposed mechanism involved the disruption of a hydrogen bond between His35H and TrpH47 that is conserved in all known antibody structures [36]. Indeed, nucleophilic catalysis was ruled out when a variant of 17E8 in which the Ser, assumed to act as nucleophile, was replaced by Gly exhibited a similar rate enhancement [30]. Most significantly, analysis of the antibodies described above suggests that even if an is indeed part of their mechanism, the resulting rate enhancements are 2–3 orders of magnitude lower than is achieved with antibodies raised against the same haptens and that act by OAS only (see above). Antibody 43C9, raised against the phosphonoamidate 3 (Scheme 1) [75, 76] stands out from all the above examples, not only because of its high reactivity (it catalyzes the hydrolysis of both a p-nitrophenyl ester and a p-nitroanilide and, although its rate enhancement is within the range of other known esterolytic antibodies, its specificity constant is so far unrivalled (kcat /KM = 2.8 × 107 M−1 s−1 )), but also because of the unambiguous assignment of a nucleophilic mechanism. In the case of antibody 43C9, there is ample evidence to prove a nucleophilic mechanism. The pH-rate profile suggests a change in the rate-limiting step: while at pH > 9 product release is limiting, at more acidic pH an acylation step determines the rate [77]. This was demonstrated by pre-steady-state kinetics, where a burst of antibody-equivalent product release kinetics was observed [77]. Finally, the effect of p-substituents of the phenolic leaving group on the rate of catalysis [78] and the identification by electrospray mass spectroscopy of an acyl-enzyme intermediate (with HisL91 as the acylated residue) unambiguously support the proposed mechanism [6]. A detailed catalytic mechanism based on 43C9 crystal structure was put forward where a nucleophilic attack, carried out by HisL91, is accompanied by OAS via the side chains of HisH35 and ArgL96 (Fig. 5) [79]. The serendipitous selection of a His side chain as the nucleophilic catalyst of 43C9 is quite interesting. Not only is the His’s imidazole side chain a better nucleophile toward activated esters than the seryl alcohol moiety, but also its pK a is optimal for catalysis and theresulting acyl-enzyme intermediate is much more labile than an ester intermediate. Thus, His makes an ideal nucleophilic catalyst. Nevertheless, His was never recorded as a nucleophile in natural hydrolytic enzymes. One possible explanation is that with non-activated ester substrates, imidazole prefers acting as a general base rather than a nucleophile, hinting at nature’s preference for other nucleophiles as alcohols and thiols [80]. The OAH of 43C9 is equivalent to other known antibodies (Table 1, Fig. 2), suggesting a comparable contribution to its rate enhancement. Thus, the nucleophilic mechanism does not seem to contribute much to the rate acceleration of ester hydrolysis. This can be explained by the absence of a general base residue (Fig. 1b) that would catalyze the rate-limiting de-acylation step, thus replacing an activated p-nitrophenyl ester with a less active acyl-imidazolium moiety. However, nucleophilic catalysis may contribute significantly to anilide hydrolysis, where formation of the intermediate, rather than its breakdown, is rate-determining. The nucleophilic mechanism may also release 43C9 from having a kcat /K M limited by low affinity to the ground state substrate, as prescribed for efficient catalytic antibodies utilizing solely the energy gained
2 Chapter Overview
Fig. 5 The mechanism of catalysis by antibody 43C9 [79]. (i) A nucleophilic attack by an endogenous nucleophile – HisL91 side chain. The nucleophilic imidazole is maintained in the correct tautomer by a network of H-bonds with TyrL36 and TyrH95 side chains. (ii) The resulting oxyanionic intermediate is stabilized by H-bonding to His35H and ArgL96. Bond rearrangement, coupled with a proton transfer cascade, re-
sults in the acyl intermediate and the release of the aniline leaving group. The aniline release is mediated by protonation by a water molecule (not shown). (iii) An exogenous nucleophilic attack by a hydroxide ion on the acyl-enzyme intermediate. (iv) Bond rearrangement results in the release of the carboxylate product and the free antibody (see hapten 3 structure for R and R ).
from preferential TS vis-a-vis substrate binding, yielding ca. 2 orders of magnitude higher specificity constant, in parity with natural enzymes’ values. Quantifying the individual contribution of the nucleophilic attack is further complicated by the fact that a HisL91Gln mutation of 43C9 results in total loss of hydrolytic activity, yet retains wild-type levels of affinity to its hapten and substrates [81]. Thus, in agreement with natural enzymes, even in a somewhat simplified model, different catalytic mechanisms seem to be coupled. Importantly, the nucleophilic mechanism endows 43C9 with the ability to hydrolyze the more stable p-nitroanilide substrate, unattainable by the antibodies using OAS as their only source of reactivity. Unlike natural amidases, 43C9 does not catalyze the hydrolysis of non-activated amides. The rate-limiting step in their hydrolysis involves the protonation of the amine-leaving group via a general acid residue (as in natural amidases, Fig. 1b), not found in 43C9 active site.
17
18 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
Fig. 6 The tetrahedral intermediate formed by various exogenous nucle-ophiles.
2.3 Exogenous Nucleophiles and Chemical Rescue in Hydrolytic Antibodies
The nucleophilic attack of a solution’s (exogenous) hydroxide anion seems to constitute the rate-limiting step of all hydrolytic antibodies. In most cases, the hydroxide reacts directly with the ester substrate and, in rarer cases, with an acyl-antibody intermediate. This step therefore merits careful examination. One of the most useful analytical tools is the replacement of hydroxide by alternative nucleophiles. Negatively charged, tetrahedral intermediates, homologous to the hydroxyl-mediated catalytic mechanism (Fig. 1a) can be formed by the attack on the ester’s carbonyl by other nucleophiles (Fig. 6). Thus, it may be expected that small nucleophiles, stronger than the hydroxyl anion, may be active in the antibodies’ active sites, resulting in higher catalytic rates. A similar strategy was previously used to “chemically rescue” hydrolytic enzymes where the nucleophilic residue was mutated. The mutants’ catalytic activity can often be restored to rates only 100- to 300-fold lower than the intact enzyme [82–84]. One example of effective “chemical rescue” of hydrolytic antibodies is with the peroxide anion, a ∼200-fold stronger nucleophile than hydroxide, that was found to enhance the turnover rates of D2.3 and D2.4 antibodies to a similar extent (Fig. 7a, Table 4) [10]. The concentration-dependent rate acceleration exhibits a rate ceiling at a kcat that is 500- to 2000-fold higher than that of the antibodies’ hydroxidemediated catalysis. The rate ceiling is due to a change in the rate-limiting step observed at high peroxide concentrations – a conformational change becomes rate limiting instead of the nucleophilic attack. The observed rate enhancement of the D-antibodies in the presence of peroxide compares favorably with the estimated contribution of the deacylation rate-limiting step of various natural hydrolases (e.g., ca. 200-fold for cocaine esterase cocE [10, 49]. The D-Abs’ maximal rate of 6 s−1 is also comparable to many natural esterolytic enzymes, such as cocE [5], thioesterases [85], and carboxylesterases [86], and is within the lower range of typical rates of (1–200 s−1 ). This suggests that a potent exogenous nucleophile can match, in the framework of efficient OAS antibodies, the role of the enzyme’s general-base deacylation mechanism (e.g., the His-Asp dyad in Fig. 1b). The secondorder rate for peroxide anion reactivity with the D-antibodies was calculated to be 6 × 105 M−1 s−1 . This is significantly lower than the directly measured second-order rate for water-mediated deacylation of chymotrypsin (< 4M−1 s−1 ) and compares favorably with its deacylation by an exogenous amine nucleophile (tyrosinamide: 1.7 × 104 M−1 s−1 ), a direct result of the enzyme’s specificity for the leaving group
2 Chapter Overview
Fig. 7 The effect of exogenous nucleophiles on the rate of antibody-mediated ester hydrolysis. Rate acceleration factors are the ratios of net initial rates observed in the presence of the nucleophile at a given con-
centration and in its absence. (a) Effect of peroxide anion on the hydrolysis of pnitrophenyl ester by D2.3 and D2.4 [10]. (b) Effect of hydroxylamine on chloramphenicol ester hydroysis (data derived from [11]).
Table 4 Nucleophile effect on D-Abs’ catalysis
Nucleophilea) (× 106 M−1 )b) − OH −O
2H
NH2 NH2 NH2 OH N3 −
pK a) (× 106 M−1 )b)
Buffer (× 106 M−1 )b)
Antibody D2.3
Antibody D2.4
15.4 11.6 8.1 6.0 4.0
13 2800 0.24 0.45 1 × 10−3
12 1270 0 0 n.d.
14 3200 0.008 0.006 0.01 × 10−3
a) A linear correlation links the nucleophilicity of most nucleophiles to their pK a s. However, the reactivity of the α-nucleophiles cited above (i.e. excluding OH− ) is far higher than predicted from their pK a (e.g. 103 -fold for hydroxylamine). b) Results are given in “molar reactivity” – the rate acceleration factor at 1 M nucleophile as depicted in [10].
19
20 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
Scheme 2
[87]. Importantly though, the D-Abs’ rate accelerations (kcat /kuncat ) are relatively unchanged when peroxide replaces hydroxide, suggesting that the D-Abs are equally complementary to the respective TSs of both hydroxide and peroxide. The D-Abs exhibit absolute preference toward small, negatively charged nucleophiles (i.e., hydroxylate and peroxylate) for the displacement of the ester’s alcohol moiety, whilst other nucleophiles (e.g., hydroxylamine) do not affect the rate (Table 4). We believe that this specificity is a result of (i) the selection, by the negatively charged phosphonate anion, of antibodies with a cluster of positively charged residues, creating a channel to the antibody active site, and (ii) the snug fit of the antibody to its hapten and therefore to the TS, preventing the entry of bulky nucleophiles such as azide [10]. These results suggest that it may be possible to optimize the electrostatic properties of the channel through which the nucleophile approaches the active site, perhaps by engineering positively charged residues in and around this channel without interfering with the properties of the active-site core [88], thereby increasing the rate of catalysis. The rate of hydrolysis by antibody 6D9 (raised against hapten 4, Scheme 2) is also increased by the presence of the α-nucleophile hydroxylamine. Unlike the D-abs, both the rate (kcat , or kcat /K M ) and the rate acceleration (kcat /kuncat ) increase, albeit to a minor extent (up to ∼2 fold at 100 mM hydroxylamine) (Fig. 7b) [11]. The source of recognition of hydroxylamine by 6D9 was not addressed, yet the Lys32L residue [11] is located in a position suitable for general-base activation of the nucleophile. Alternatively, it may act as an H-bond donor to orient the nucleophile toward attacking the chloramphenicol ester’s carbonyl. It would thus be interesting to study the effect of Lys32L mutants on the hydroxylamine-mediated reaction, as well as attempting the hydrolysis of the more demanding amide hydrolysis in this system. Another reported case of “chemical rescue” concerns antibody 14–10. Using the heterologous immunization approach, where mice were challenged with both haptens 5 and 6 (Scheme 2), Masamune and coworkers attempted the generation of a p-nitroanilide hydrolyzing antibody that makes use of OAS, general-base catalysis
2 Chapter Overview
Scheme 3
(by “baiting” with a positively-charged amine hapten) and an exogenous phenol nucleophile. The best antibody achieved, 14–10, catalyzed both the lactonization of substrate 7 (Scheme 3) via an intramolecular phenol nucleophilic attack (kcat = 1.5 × 10−2 min−1 , K M = 1.27 mM [89]) and the intermolecular attack of added phenol nucleophile to substrate 8 to produce phenyl propionate, 9 (kcat = 1.3 × 10−4 min−1 , K M(2) = 0.37 mM, K M(phenol) = 0.14 mM [12]). In both cases the phenol does not seem to participate in the background buffercatalyzed reactions. Thus, the antibody catalyzes kinetically “disfavored” reactions, resulting in kcat /kuncat = 2 × 104 and 630 for the intramolecular and intermolecular reactions, respectively (the uncatalyzed rates serving as an upper limit of the background rates). Interestingly, 14–10 does not hydrolyze the ester product 9. This could be only partly explained by the lack of a negatively charged TSA to elicit an OAH, as similar, uncharged haptens (e.g., phosphono-amidates [62], and β-amino alcohols [90]) were shown to elicit efficient hydrolytic antibodies. Nonetheless, as the buffer-mediated hydrolysis rate of 9 proceeds twice as fast as the antibodymediated acyl transfer, the net reaction can be formally considered as the activated amide’s hydrolysis. Future studies may reveal whether the phenol nucleophilicity is indeed enhanced by a general-base residue, as implied by 14–10’s pH optimum (8.0 compared with the phenol’s solution pK a of 10). 2.4 General Acid/Base Mechanisms in Hydrolytic Antibodies
The most notable example of hydrolytic enzymes that make use of general acid and base catalysis is aspartic peptidases. This family of hydrolases utilizes the concerted action of two carboxylate residues (Fig. 1c). The first carboxylate serves as a general base to align and activate the water nucleophile, and subsequently as a general acid to protonate the leaving group. The second serves as the OAH and may be regarded as a general acid, donating a proton to the formed carboxylate. In order to mimic such mechanisms with catalytic antibodies, several “bait and switch”
21
22 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes Table 5 Kinetic parameters of “bait and switch”-raised antibodies
Antibody
Hapten
kcat a) (min−1 ) pH = 7.8
27A6 30C8 2D5
10 11 12
0.004 0.005 3.5
K M (mM)
kcat /KM M−1 s−1 )
kcat /kuncat b)
Ref.
0.25 1.1 0.5
0.7 0.07 120
1.4 × 105 1.2 × 105 1.3 × 105
[134] [90] [69]
a)
Values for antibodies 30C8 and 27A6 were derived from pH-rate profiles. Although the same substrate and reaction are involved, a ∼1000-fold difference exists between the uncatalyzed rates reported of one group [134] and those of the other [69]. b)
haptens were devised with the aim of “baiting” oppositely charged residues in the antibody’s active site that may serve in a general acid/base mechanism. An example of this strategy (haptens 10–12) and the kinetic rates of the resulting antibodies is summarized in Table 5. Haptens 10–12 (Scheme 4) were designed to bait different types of residues in the elicited antibodies’ binding sites. The positively-charged hapten 11 was designed to bait a general-base residue (e.g., a negatively-charged carboxylate), whereas hapten 10 aims at baiting a general-acid residue that would facilitate the rate-determining protonation of the amine-leaving group of substrate 14. Nonetheless, the catalytic activity of the best antibodies raised against haptens 10 and 11 is remarkably similar (Table 5), suggesting that even if “baiting” did occur, it does not effect to a significant extent the antibodies’ catalytic efficiency. Moreover, whilst the antibodies raised against haptens 10–12 all catalyze the hydrolysis of the ester substrate, 13, with similar kinetic parameters, the amide substrate 14 is hydrolyzed by none
Scheme 4
2 Chapter Overview
of these antibodies, not even by those antibodies raised against hapten 12. The effectiveness of the “bait and switch” strategy in raising hydrolytic antibodies is further questioned by the fact that hapten 12 was designed to bait a general-base catalyst while maintaining the phosphonate tetrahedral negatively charged TSA core for the generation of OAH residues. On the other hand, in haptens 10 and 11, a planar, uncharged carbon represents the tetrahedral TS. The catalytic rate of the best anti-12 antibody, 2D5, is also in agreement with rates of the above-described OAS antibodies, suggesting that even if an active-site general base has been generated, its contribution to catalysis is not very significant. By comparing the catalytic rates of antibodies 30C6 and 27A6 to 2D5, one could conclude that the tetrahedral phosphonate is a better mimic of the TSs of ester hydrolysis than the β-alcohol moiety, or, in other words, the phosphonate TSA “baits” a ∼500-fold more efficient OAH. Other attempts to utilise the “bait and switch” strategy yielded similar rate enhancements to the examples described above, with kcat /kuncat values in the range 102 − 104 (see, for example [91–93]). Future studies of antibodies raised by the “bait and switch” strategy may shed more light on their mechanism and may allow an improvement in their rates. Notably, as is the case with reactive immunization, the “bait and switch” approach was applied to non-hydrolytic reactions with greater success. This is evident, for example, in the Glu residue acting as a general base in antibody 4B2’s mediated catalysis of an allylic rearrangement reaction (PDB code 1F3D [94]). In contrast, the crystal structure of antibody 1D4, raised against a charged β-amino phosphate, does not show the expected positioning of a catalyzing general-base residue in its active site [95]. Finally, it is encouraging to note that a structural model of the chemiluminescent anti-phosphonate 7F11 antibody (kcat /kuncat = 1.2 × 103 , kcat /KM = 5.9 × 105 M−1 s−1 [96]) indicates the presence of two aspartic acid residues in positions resembling the natural aspartyl protease. Based on this model, the authors suggested that the 7F11 mechanism is similar to the natural counterpart, but that its much inferior catalytic efficiency may be due to the imperfect positioning of the apartyl residues. Antibody 7F11 also suffers from severe product inhibition, enabling an estimated turnover of ca. 5 cycles [97]. Nevertheless, if its active-site structure does prove similar to its natural counterpart, it may serve as a starting point for the generation of more potent hydrolases. 2.5 Metal-Activated Catalytic Antibodies
Several studies have attempted to raise catalytic antibodies with metal ion-mediated hydrolysis, following the zinc-protease mechanism (Fig. 1d). Many enzyme models based on zinc have been created that exhibit high rate enhancements [98], suggesting that the generation of metallo-antibodies with enzymatic activities is a promising avenue. However, this approach has thus far been explored with limited success. Lerner and coworkers reckoned that the “bait and switch” hapten 11 may elicit a carboxylate residue in the antibodies’ active site that would serve as a metal-binding ligand. Furthermore, the pyridinium’s methyl group will induce a
23
24 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
Scheme 5
cavity for the metal ion. Indeed, in the presence of Zn+ , antibody 84A3 catalyzes with high specificity the hydrolysis of a pyridinium ester (kcat /kuncat = 1200 [99]), the structure of which is related to the immunizing hapten. No catalysis was observed in the absence of the metal ion or with other, structurally similar ester substrates. The antibody suffers from severe substrate inhibition, probably due to non-productive binding of the substrate, probably in a mode not involving a zinc ion, as judged by the affinity difference to the substrate in the presence or absence of Zn+ (3.5 µM and 840 µM, respectively). In an attempt to overcome this hurdle, the mercury(II)-phosphorodithioate unsaturated complex 15 (Scheme 5) was used to elicit Hg-dependent hydrolysis of ester 16 [100]. The best anti-15 antibody, 38G2, exhibited Michaelis-Menten kinetics (kcat /kuncat = 300, K M(Hg) = 87 µM and K M(16) = 345 µM), multiple turnovers and absolute dependency on Hg(II) for catalysis. While far from enzyme-like reactivity, these studies widen the spectra of possible mechanisms that could be tailored by antibody mimics. 2.6 Substrate Destabilization
Remote interactions, away from the ester or amide carbonyl reaction core, play an important role in many natural enzymes. Such interactions may contribute to catalysis via substrate destabilization (for a recent review see [101] and references therein). For example, Sussman and coworkers postulated that as much as 5 × 105 (corresponding to TS stabilization of 8 kcal/mol) of AchE’s spectacular rate enhancement (1013 ) is derived from orienting the substrate to a highly unfavorable conformation, perfectly aligning the substrate for the nucleophilic attack to take place. This is achieved by strong binding to the choline ammonium charged moiety [46]. Several examples of the identification and possible role of substrate destabilization by hydrolytic antibodies are given below. The docking of the activated anilide substrate in the crystal structure of 43C9 suggests that the substrate’s amide bond is twisted by 140◦ out of planarity, thus destabilizing the bond’s resonance and facilitating its hydrolysis [79]. Similarly, 6D9 antibody’s structure suggests that the antibody locks its ester substrate in the sin unfavorable conformation, resulting in substrate destabilization. In both the above examples, a quantitative assessment of the contribution of substrate destabilization to the rate acceleration is not available. A 5-fold increase in the catalytic rate accompanied by a smaller effect of higher K M (2-fold) was seen between the catalysis of the glutaryl-glycine nitrophenylester
3 The Role of Conformational Changes in Catalytic Antibodies
substrate and a caproyl-elongated substrate by antibody D2.3, suggesting a remote effect of the substrate’s linker moiety. Crystal structure analysis revealed different interactions with the two linkers that may account for the kinetic differences [102]. When the linker terminal glycine was replaced by either the D- or L-Ala enantiomers, a ∼20-fold difference in D2.3’s specificity constant (kcat /K M ) was observed in favor of the l-Ala enantiomer [103] (for the enantioselectivity potential of the esterolytic active sites of various antibodies see [39]). Another example is set by antibody 17E8 catalyzing the hydrolysis of n-formylnorleucine (nLeu) phenyl ester. The specificity factors (kcat /K M ) observed with a series of substrates where the nLeu side chain was replaced by different linear and branched alkyl side chains extend over 2 orders of magnitude. But rate enhancements higher than those with the original substrate (namely with a Leu side chain) were not obtained. These results suggested that 17E8 interactions with the side chain are used to stabilise both the TS and the ground state. However, the replacement of nLeu side chain with a more hydrophilic ethyl-cysteyl side chain resulted in a 20-fold increase in kcat (43 vs 2.1 s−1 ) accompanied by a similar decrease in ground state binding (K M = 4 vs 0.18 mM with nLeu). This rendered the new substrate as efficient as the original substrate (kcat /K M ≈ 1.2 × 104 M−1 s−1 ) and suggested that substrate destabilization may play a key role in the mechanism of 17E8 with the ethyl-cysteyl substrate [104].
3 The Role of Conformational Changes in Catalytic Antibodies
Structural dynamics are part of the nature of proteins, though their identification on the molecular level and their mechanistic roles are still a subject of much research. In enzymes, the identification of conformational changes is generally elusive, as they are typically much faster than the chemical step and product release [2]. Nonetheless, structural fluctuations were observed in serine proteases when comparing their free and boronic acid inhibitor-complexed structures. Such plasticity may be an integral part of the catalytic cycle, mediating, amongst other steps, substrate binding, TS formation, stabilization, and product release (for a recent example see [105, 106]). Several hydrolytic enzymes were also shown to utilize an induced-fit mechanism to tune their specificity, either to a broad (e.g., alpha lytic protease) or extremely narrow (e.g., factor D) substrate spectrum (see [101] and reference therein). In addition, a recent kinetic study suggests that conformational changes control the aperture of AChE’s active site and are coupled to its catalytic activity [107, 108]. There had been several speculations regarding the role and effect that conformational changes may have on catalysis by antibodies. Padlan, for example, speculated that the of antibodies may hinder their catalytic efficiency [109], whilst Benkovic and Stewart suggested that antibodies elicited against TSA might be limited by a “lock-and-key” rigidity and will lack the conformational isomerism that characterizes enzymatic catalytic cycles [20]. In a more general context, the issue of
25
26 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
Fig. 8. Schematic representation of the two mechanisms of conformational changes linked to antibody-ligand complex formation. Ab and Ab’ represent two antibody isomers exhibiting low and high affinity, respectively.
conformational isomerism in antibodies has been a subject of much research and debate. In the past, kinetic (e.g., [110, 111]) and structural (e.g., [112] and references therein) studies provided strong evidence of the dominance of structural changes in antibodies. Conformational changes can be generally described by the inducedfit model (the ligand binding a low-affinity form of the antibody and inducing a subsequent conformational change that leads to a high affinity complex) or by preequilibrium between two different conformational species of the same antibody, only one of which binds the ligand with high affinity. The two models can be combined in one scheme (Fig. 8). While kinetic evidence supports the existence of both mechanisms, structural evidence has been put forward thus far for induced fit only. However, a recent study combining pre-steady-state kinetics in solution and X-ray crystallography provides conclusive evidence for pre-equilibrium in an antibody [113]. The identification of conformational changes in catalytic antibodies and their possible role in catalysis are discussed in this section.
4 Conformational changes in catalytic antibodies
The advent of catalytic antibodies has brought a renewed interest in antibody structure and evolution. In particular, structures of catalytic antibodies underlined the understanding of their mechanism at the molecular level (over 50 published structures of catalytic antibodies in PDB as of end of 2002). As is the case with many ordinary antibodies, conformational changes are observed with catalytic antibodies as well – in several catalytic antibodies, the structure of the TSA-bound antibody differs from the uncomplexed structure. But, in the absence of kinetic studies in solution, the interpretation of the crystallographic data regarding the mechanism of these con-formational changes is ambiguous. Further complications from the comparison of bound vs unbound antibody crystal structures arise from the effect of packing forces (e.g., the antibodies’ arrangement in the crystal in antibody 1D4 [95], CDR1 packed inside the active site of antibody 43C9, blocking the diffusion of the hapten [79]), and the buffer used for crystallization (e.g., antibody D2.3, see below and [15]). A recurrent structurally observed conformational change is a relative movement between the heavy- and light-chains’ variable regions [114], as exemplified by antibody CNJ206 where a 7◦ rotation and a ∼1 Å translation change were found between the TSA unbound and bound structures. In addition, a distinct isomerization was noted in the active site, where a dramatic change of the entire H3 CDR loop – from an open state to a closed state – is observed.
4 Conformational changes in catalytic antibodies
This changes the binding site completely – from a shallow form in uncomplexed antibody to a deep and narrow cavity in the TSA-bound state. This is assumed to be a TSA-induced conformational change, but in fact the TSA-bound conformation may be pre-existing, as neither the oxyanion hole nor the part of the active site that binds the phenolic leaving group is formed in the free form [14]. Induced-fit movement of CDR-H3 residues (H99-H102) was also suggested to enhance surface complementarity to the TSA vs the substrate and product of the catalyzed retroDiels-Alder reaction, thus implying that better rate enhancement is gained by the conformational changes (antibody 10F11, [19]). Crystal structures of antibody 1D4, catalyzing a syn elimination reaction, indicated conformational changes limited to CDR-H2. The conformational change increases the contacts with the TSA (notably residue Arg58), and may contribute to catalysis by constraining the substrate to an elimination-favorable orientation [95]. The free and TSA-complexed structures of several catalytic antibodies were solved, alongside the structures the putative germline antibodies from which these catatylic antibodies diverged [27, 115–117]. A general trend has been suggested, whereby the germline antibodies exhibit a different free and bound structure, whilst the mature antibodies exhibit the same structure. The observed isomerizations between the free and bound antibodies seem to occur mainly at CDR3H residues. In antibody 48G7’s precursor, tyrosines H98 and H99 shift by up to 5 Å upon TSA binding together, with a 4.5◦ 4 change in the V-regions’ orientation, resulting in better packing of the TSA’s p-nitrophenyl ring by TyrH99 [27, 117]. The hapten of antibody AZ-28, catalyzing an oxy-Cope reaction, induces a rotation of CDR-H3 away from the binding site to yield better π–π stacking with the hapten’s phenyl ring as well as enhanced orbital overlap to the reaction’s TS [115]. In the precursor of the oxidating, periodate-dependent antibody 28B4, better packing of the TSA phenyl ring is also achieved by movement of residues H95–H99 [116]. As judged by crystallography, the matured antibodies related to these germline antibodies exhibit the same structure when free and bound to the TSA. Interestingly, in antibodies 48G7 and 28B4, the TSA’s p-nitrophenyl ring adopts a different conformation in the mature vs the germline-complexed structures, locking them in a different binding pocket. This is surprising, as, in the case of the mature 48G7, the germline’s pocket remains unchanged. The above studies [118] promoted the hypothesis that the germline antibodies exhibit high conformational diversity, and that, during affinity maturation, the hapten binding conformation becomes dominant [110]. However, structural differences between the bound and the free state are commonly observed in mature antibodies ([112] and references therein), and at least one study has shown similar structures of the mature Diels-Alderase antibody 39-A11 and its putative germline precursor in both bound and unbound states [119]. The crystallographic studies were not accompanied by kinetics in solution that indicate the existence of a pre-equilibrium between different unbound conformations, in either the mature or the germline antibodies. Indeed, a kinetic study of mature antibodies that showed no structural differences between their free and TSA-bound states clearly indicated a pre-equilibrium between two antibody conformers, one exhibiting low affinity
27
28 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
toward the TSA and another high affinity and catalytic activity. The slow, minutescale isomerization between the two conformers resulted in hysteretic, “memorylike” behavior [15]. Moreover, it has also been shown that the crystallization buffer shifts the equilibrium toward the active conformer, resulting in both the free and TSA-bound antibodies yielding the same structure [15]. Also of interest are the mutations that lead to affinity maturation and their effect on TSA binding and rate of catalysis. In antibodies 48G7 and AZ-28 the mutations occur away from the active site. The importance of remote residues on catalytic efficiency was also noted in mutants of antibody 17E8 selected by phage display for better TSA binding. A double mutant was achieved, with ten-fold improvement of kcat /K M [120]. In antibody 28B4, however, three of the somatic mutations (nine in total), are indirect contact with the hapten [116]. In all three antibodies, somatic mutations increased the TSA-binding affinity (up to 40 000-fold in 48G7 [117]). Interestingly, the catalytic activity of the mature 48G7 was increased compared to its germline counterpart (see Section 2.1), whereas that of antibody AZ-28 was diminished, hinting at the role of structural flexibility in this antibody’s catalytic activity (the catalytic activity of 28B4 germline was not reported). Structural flexibility may also explain the increase in the catalytic activity of the ArgH100aGln mutant of antibody 43C9 (a CDR-H3 mutation remote from the active site), which reduces product inhibition [81]. 4.1 The contribution of structural isomerization to catalysis
The structural studies described above do not indicate a direct role for conformational changes in antibody catalysis, nor do they provide a measure of their contribution to TS stabilization and rate of catalysis. In the first quantitative study of conformational changes in catalytic antibodies, two independent kinetic measurements (one related to TSA binding and another directly to catalysis) performed with the D-Abs (D2.3, D2.4, D2.5) were used to reveal and quantify the contribution of conformational changes to catalysis [10, 15]. Pre-steady-state TSA-binding kinetics of D2.3, D2.4 and D2.5 antibodies, based on the change of intrinsic tryptophan fluorescence upon hapten binding, indicated the formation of a low micromolar affinity primary complex followed by a conformational change toward the final nanomolar affinity complex (Fig. 9a,b and Table 6 [15]). The rate of isomerization (6 s−1 ) revealed by fluorescence quenching measurements with both the 1b and 1p TSAs is significantly higher than the rate of hydrolysis of the D-Abs (0.05 – 0.22 s−1 ) and is therefore invisible in the catalytic, productrelease assay. However by increasing the turnover rate by means of a strongly reactive nucleophile (i.e., peroxide anion, see Section 2.3 above), a “rate ceiling” equal to the isomerization rate was observed (Fig. 9c) [10]. The isomerization provides 25 to 140fold higher affinity to the TSA (Table 6) and a concomitant increase to the D-Abs catalytic rates. This is yet another manifestation of the true representation of the actual TS by the TS analog. The “induced-fit” isomerization contributes not only to rate enhancement but also to specificity. A second,
5 Conclusions
Fig. 9 Induced fit exhibited by the Dantibodies. (a) Two-step kinetics reflected by the stopped-flow trace of fluorescence quenching of antibody D2.4 in the presence of TSA. The fast phase (shaded) is enlarged in the inset. Each phase was fitted to a single exponential rate equation. (b) The effect
of TSA concentration on the slow bimolecular binding step and on the fast isomerization step [15]. (c) D2.4 (full circles) and D2.3 (empty circles) catalytic rates dependency on the peroxide anion exogenous nucleophile’s concentration [10].
minute-scale, isomerization step is unique to antibody D2.3 [15]. This very slow conformational change increases the affinity to the TSA 1b antigen ∼7 fold yet decreases its affinity to TSA 1p by ∼2 fold (Table 6). These results may suggest that the immune system may select high-affinity and high-specificity antibodies that still exhibit a considerable degree of conformational change, and that affinity maturation does not necessarily involve a rigid “lock-and-key” binding mechanism [27, 37].
5 Conclusions
The examples given in this review emphasize the ability of catalytic antibodies to mimic multiple aspects of enzyme catalysis. In addition, several other enzyme characteristics are faithfully mimicked by hydrolytic antibodies but are beyond the scope of this review. Hydrolytic antibodies can exhibit both a fine substrate specificity that rivals natural enzymes (e.g., 2H6 and 21H3 [121], the D-Abs [25]
29
0.24 ± 0.02 0.20 ± 0.02 0.15 ± 0.01 0.3 ± 0.03 0.20 ± 0.03 0.20 ± 0.02 0.6 0.7 0.1 0.6 0.4 0.5
5±1 6±1 1 ± 0.1 6±1 7±2 4±1
0.2 ± 0.1 0.10 ± 0.05 0.01 ± 0.003 0.04 ± 0.01 0.2 ± 0.1 0.08 ± 0.03
Kfast+ s−1b) kslow+ s−1 kslow0.04 0.02 0.01 0.007 0.03 0.02
0.013 ± 0.002 0.001 ± 3 × 104 – – – –
Kslow s−1 kslowest+ s−1
0.002 ± 3 × 104 0.0023 ± 3 × 104 – – – –
kslowest-
0.15 2.3 – – – –
3±1 25 ± 3 1 ± 0.4 4±1 10 ± 2 10 ± 2
Kslowest nM KD c)
b)
Because of high error in its determination kfast+ was back-calculated from the overall dissociation constant (K d ) after measuring all the other kinetic parameters. Calculated from K d . c) Measured by fluorescence equilibrium titration.
a)
4 × 105 3 × 105 2 × 106 5 × 105 5 × 105 4 × 105
TSA µM−1 s−1 kfast+ s−1a) kfast- µM
D2.3 1b 1p D2.4 1b 1p D2.5 1b 1p
Ab
Table 6 Kinetic parameters of D-Abs’ binding to p-nitro-phenyl (1p)- and benzyl (1b)-phosphonate TSAs
30 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
References
and 12F12 [93] antibodies), as well as very broad specificity that may be induced by bulky TSA linker moieties (e.g., [122, 123]). In addition, high enantioselectivity was reported for several ester hydrolytic antibodies (e.g., [124, 125]). Metal (Pd)-mediated allosterity was also described [126]. Rivaling the rates of enzyme catalysis, by catalytic antibodies or any other enzyme mimic, is a daunting challenge. Antibodies, like all other enzyme mimics, are still much inferior to natural enzymes in their catalytic efficiency. Yet, despite the limited rates achievable by hydrolytic antibodies, their generation and the associated research effort have provided unique insights into enzyme catalysis and antibody diversity, evolution and structure. The clear message from 16 years of research into hydrolytic antibodies is that it is possible to mimic individual forces and contributions to enzymic catalysis with enzyme-like efficiencies. However, it is the combination of the different catalytic forces and contributions described here that may yield antibodies with efficiencies rivaling natural enzymes. Moreover, a significant enhancement in rate may require the incorporation of several catalytic elements and a parallel improvement of the antibody’s dynamics. These complex structural and mechanistic elements are not likely to be induced by designing better haptens or engineering existing antibodies. Thus, the most general and effective way of improving the catalytic potency of antibodies would be to select esterolytic antibodies directly for higher rates and turnover ([127] and references therein). Such selections can affect all the factors concerned, including those linked directly with the active site (e.g., inducing endogenous active-site nucleophiles) and those peripheral to it (e.g., improving the approach of exogenous nucleophiles or the rate of conformational isomerism). The existing hydrolytic antibodies described above that exhibit low to medium rate accelerations and effective turnover may serve as the ideal starting point for such an evolutionary process. References 1 TANTILLO, D. J., HOUK, K. N., J. Org. Chem. 64 (1999), p. 3066–3076 2 FERSHT, A. R., Enzyme structure and mechanism, W.H. Freeman, New-York 1985 3 HAMMES, G. G., Biochemistry 41 (2002), p. 8221–8228 4 CARTER, P., WELLS, J. A. Proteins 7 (1990), p. 335–342 5 TURNER, J. M., LARSEN, N. A., BASRAN, A., BARBAS, C. F. 3rd, BRUCE, N. C., WILSON, I. A., LERNER, R. A., Biochemistry 41 (2002), p. 12297–12307 6 KREBS, J. F. SIUZDAK, G. DYSON, H. J. STEWART, J. D., BENKOVIC, S. J., Biochemistry 34 (1995), p. 720–723 7 WIRSCHING, P., ASHLEY, J. A., LO, C. H., JANDA, K. D., LERNER, R. A., Science 270 (1995), p. 1775–1782
8 BALDWIN, E., SCHULTZ, P. G., Science 245 (1989), p. 1104–1107 9 POLLACK, S. J., NAKAYAMA, G. R., SCHULTZ, P. G., Science 242 (1988), p. 1038–1040 10 LINDNER, A. B., KIM, S. H., SCHINDLER, D. G., ESHHAR, Z., TAWFIK, D. S., J. Mol. Biol. 320 (2002), p. 559–572 11 GIGANT, B., TSUMURAYA, T., FUJII, I., KNOSSOW, M., Struct. Fold Des. 7 (1999), p. 1385–1393 12 ERSOY, O. FLECK, R. SINSKEY, A. MASAMUNE, S., J. Am. Chem. Soc. 120 (1998), p. 817–818 13 GUO, J. HUANG, W. SCANLAN, T. S., J. Am. Chem. Soc. 116 (1994), p. 6062–6069 14 CHARBONNIER, J. B., CARPENTER, E., GIGANT, B., GOLINELLI-PIMPANEAU, B., ESHHAR, Z., GREEN, B. S., KNOSSOW, M.,
31
32 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
15 16
17
18
19
20 21
22
23 24
25
26
27
28
29
30
Proc. Natl. Acad. Sci. U S A 92 (1995), p. 11721–11725 LINDNER, A. B., ESHHAR, Z., TAWFIK, D. S., J. Mol. Biol. 285 (1999), p. 421–430 WEDEMAYER, G. J., PATTEN, P. A., WANG, L. H., SCHULTZ, P. G., STEVENS, R. C., Science 276 (1997), p. 1665–1669 WIRSCHING, P., ASHLEY, J. A., BENKOVIC, S. J., JANDA, K. D., LERNER, R. A., Science 252 (1991), p. 680–685 MUNDORFF, E. C., HANSON, M. A., VARVAK, A., ULRICH, H., SCHULTZ, P. G., STEVENS, R. C., Biochemistry 39 (2000), p. 627–632 HUGOT, M., BENSEL, N., VOGEL, M., REYMOND, M. T., STADLER, B., REYMOND, J. L., BAUMANN, U., Proc. Natl. Acad. Sci. U S A 99 (2002), p. 9674–9678 STEWART, J. D., BENKOVIC, S. J., Nature 375 (1995), p. 388–391 JENCKS, W. P., Catalysis in chemistry and enzymology, McGraw-Hill Book Company, New-York 1969 TAWFIK, D. S., CHAP, R., GREEN, B. S., SELA, M., ESHHAR, Z., Proc. Natl. Acad. Sci. U S A 92 (1995), p. 2145–2149 MADER, M. M., BARTLETT, P. A., Chem. Rev. 97 (1997), p. 1281–1302 TAWFIK, D. S., GREEN, B. S., CHAP, R., SELA, M., ESHHAR, Z., Proc. Natl. Acad. Sci. U S A 90 (1993), p. 373–377 TAWFIK, D. S., LINDNER, A. B., CHAP, R., ESHHAR, Z., GREEN, B. S., Eur. J., Biochem. 244 (1997), p. 619–626 CHARBONNIER, J. B., GOLINELLI, P. B., GIGANT, B., TAWFIK, D. S., CHAP, R., SCHINDLER, D. G. KIM, S. H., GREEN, B. S., ESHHAR, Z., KNOSSOW, M., Science 275 (1997), p. 1140–1142 WEDEMAYER, G. J., WANG, L. H., PATTEN, P. A., SCHULTZ, P. G., STEVENS, R. C., J. Mol. Biol. 268 (1997), p. 390–400 PATTEN, P. A., GRAY, N. S., YANG, P. L., MARKS, C. B., WEDEMAYER, G. J., BONIFACE, J. J. STEVENS, R. C. SCHULTZ, P. G., Science 271 (1996), p. 1086–1091 BUCHBINDER, J. L., STEPHENSON, R. C., SCANLAN, T. S., FLETTERICK, R. J., J. Mol. Biol. 282 (1998), p. 1033–1041 GUO, J., HUANG, W., ZHOU, G. W., FLETTERICK, R. J., SCANLAN, T. S., Proc. Natl. Acad. Sci. U S A 92 (1995), p. 1694–1698
31 MIYASHITA, H., KARAKI, Y., KIKUCHI, M., FUJII, I., Proc. Natl. Acad. Sci. U S A 90 (1993), p. 5337–5340 32 MIYASHITA, H., HARA, T., TANIMURA, R., FUKUYAMA, S., CAGNON, C., KOHARA, A., FUJII, I., J. Mol. Biol. 267 (1997), p. 1247–1257 33 O. KRISTENSEN, VASSYLYEV, D. G., TANAKA, F., MORIKAWA, K., FUJII, I., J. Mol. Biol. 281 (1998), p. 501–511 34 R. ZEMEL, SCHINDLER, D. G., TAWFIK, D. S., ESHHAR, Z., GREEN, B. S., Mol. Immunol. 31 (1994), p. 127–137 35 ZHOU, G. W., GUO, J., HUANG, W., FLETTERICK, R. J., SCANLAN, T. S., Science 265 (1994), p. 1059–1064 36 MACBEATH, G., HILVERT, D., Chem. Biol. 3 (1996), p. 433–445 37 TANTILLO, D. J., HOUK, K. N., Chem. Biol. 8 (2001), p. 535–545 38 GOLINELLI-PIMPANEAU, B., J. Immunol. Methods 269 (2002), p. 157–171 39 TANTILLO, D. J., HOUK, K. N., J. Comput. Chem. 23 (2002), p. 84–95 40 O’CONNELL, T., DAY, P. R. M., TORCHILIN, E. V., BACHOVCHIN, W. W., MALTHOUSE, J. G., BioChem. J. 326 (1997), p. 861–866 41 O’SULLIVAN, D. B., O’CONNELL, T. P., MAHON, M. M., KOENIG, A., MILNE, J. J., FITZPATRICK, T. B., MALTHOUSE, J. P., Biochemistry, (1999) 38 p. 6187–6194 42 GERLT, J. A., GASSMAN, P. G., Biochemistry 32 (1993), p. 11943–11952 43 SZELTNER, Z., RENNER, V., POLGAR, L., Protein Sci. 9 (2000), p. 353–360 44 FERSHT, A. R., SHI, J. P., KNILL-JONES, J., LOWE, D. M., WILKINSON, A. J., BLOW, D. M. BRICK, P., CARTER, P., WAYE, M. M., WINTER, G., Nature 314 (1985), p. 235–238 45 COREY, D. R., CRAIK, C. S., J. Am. Chem. Soc. 114 (1992), p. 1784–1790 46 HAREL, M., QUINN, D. M., NAIR, H. K., SILMAN, I., SUSSMAN, J. L., J. Am. Chem. Soc. 118 (1996), p. 2340–2346 47 SZELTNER, Z., REA, D., RENNER, V., FULOP, V., POLGAR, L., J. Biol. Chem. 277 (2002), p. 42613–42622 48 JUHASZ, T., SZELTNER, Z., RENNER, V., POLGAR, L., Biochemistry 41 (2002), p. 4096–4106 49 LARSEN, N. A., TURNER, J. M., STEVENS, J., ROSSER, S. J., BASRAN, A., LERNER, R. A.,
References
50 51
52 53 54
55 56 57 58 59
60
61
62
63 64
65
66 67 68
69
BRUCE, N. C., WILSON, I. A., Nat. Struct. Biol. 9 (2002), p. 17–21 PARKER, J., MicroBiol. Rev. 53 (1989), p. 273–298 ORTIZ, C., TELLIER, C., WILLIAMS, H., STOLOWICH, N. J., SCOTT, A. I., Biochemistry 30 (1991), p. 10026–10034 DELBAERE, L. T., BRAYER, G. D., J. Mol. Biol. 183 (1985), p. 89–103 HENDERSON, R., J. Mol. Biol. 54 (1970), p. 341–354 CHAPPEY, O., DEBRAY, M., NIEL, E., SCHERRMANN, J. M., J. Immunol. Methods 172 (1994), p. 219–225 FOOTE, J., EISEN, H. N., Proc. Natl. Acad. Sci. USA 92 (1995), p. 1254–1256 THOMAS, N. R., Appl. BioChem. Biotechnol. 47 (1994), p. 345–372 THOMAS, N. R., Nat. Prod. Rep. 13 (1996), p. 479–511 STEVENSON, J. D., THOMAS, N. R., Nat. Prod. Rep. 17 (2000), p. 535–577 BACA, M., SCANLAN, T. S., STEPHENSON, R. C., WELLS, J. A., Proc. Natl. Acad. Sci. U S A 94 (1997), p. 10063–10068 TAKAHASHI, N., KAKINUMA, H., LIU, L., NISHI, Y., FUJII, I., Nat. Biotechnol. 19 (2001), p. 563–567 DENG, S. X., de PRADA, P., LANDRY, D. W., J. Immunol Methods 269 (2002), p. 299–310 WENTWORTH, P., DATTA, A., BLAKEY, D., BOYLE, T., PARTRIDGE, L. J., BLACKBURN, G., Proc. Natl. Acad. Sci. USA 93 (1996), p. 799–803 POLLACK, S. J., JACOBS, J. W., SCHULTZ, P. G., Science 234 (1986), p. 1570–1573 DATTA, A., WENTWORTH Jr., P., SHAW, J. P., SIMEONOV, A., JANDA, K. D., J. Am. Chem. Soc 121 (1999), p. 10461–10467 KOHEN, F., KIM, J. B., LINDNER, H. R., ESHHAR, Z., GREEN, B., FEBS Lett 111 (1980), p. 427–431 SCHOWEN, R. L., J. Immunol. Methods 269 (2002), p. 59–65 WAGNER, J., LERNER, R. A., BARBAS III, C. F., Science 270 (1995), p. 1797–1800 ZHONG, G. F., LERNER, R. A., BARBAS, C. F., Angew. Chem. Int. Ed. 38 (1999), p. 3738–3741 TSUMURAYA, T., TAKAZAWA, N., TSUNAKAWA, A., FLECK, R., MASAMUNE, S., Chem. Eur. J. 7 (2001), p. 3748–3755
70 ANGELES, T. S., SMITH, R. G., DARSLEY, M. J., SUGASAWARA, R., SANCHEZ, R. I., KENTEN, J. SCHULTZ, P. G., MARTIN, M. T., Biochemistry 32 (1993), p. 12128–12135 71 ANGELES, T. S., MARTIN, M. T., BioChem. Biophys. Res. Commun. 197 (1993), p. 696–701 72 MARTIN, M. T., NAPPER, A. D., SCHULTZ, P. G., REES, A. R., Biochemistry, (1991), 30, p. 9757–9761 73 GIGANT, B., CHARBONNIER, J. B., GOLINELLI-PIMPANEAU, B., ZEMEL, R. R., ESHHAR, Z., GREEN, B. S., KNOSSOW, M., Eur J., BioChem. 246 (1997), p. 471–476 74 TAWFIK, D. S., CHAP, R., ESHHAR, Z., GREEN, B. S., Protein Eng. 7 (1994), p. 431–434 75 STEWART, J. D., KREBS, J. F., SIUZDAK, G., BERDIS, A. J., SMITHRUD, D. B., BENKOVIC, S. J., Proc. Natl. Acad. Sci. USA 91 (1994), p. 7404–7409 76 JANDA, K. D., SCHLOEDER, D., BENKOVIC, S. J., LERNER, R. A., Science 241 (1988), p. 1188–1191 77 BENKOVIC, S. J., ADAMS, J. A., BORDERS, Jr., C. L., JANDA, K. D., LERNER, R. A., Science 250 (1990), p. 1135–1139 78 GIBBS, R. A., BENKOVIC, P. A., JANDA, K. D., LERNER, R. A., BENKOVIC, S. J., J. Am. Chem. Soc 114 (1992) 79 THAYER, M. M., OLENDER, E. H., ARVAI, A. S., KOIKE, C. K., CANESTRELLI, I. L., STEWART, J. D., BENKOVIC, S. J., GETZOFF, E. D., ROBERTS, V. A., J. Mol. Biol. 291 (1999), p. 329–345 80 JENCKS, W. P., Catalysis in chemistry and enzymology, McGraw-Hill Book Company, New-York 1969, p. 67–72 81 STEWART, J. D., ROBERTS, V. A., THOMAS, N. R., GETZOFF, E. D., BENKOVIC, S. J., Biochemistry 33 (1994), p. 1994–2003 82 KIMBALL, A. S., LEE, J., JAYARAM, M., TULLIUS, T. D., Biochemistry 32 (1993), p. 4698–4701 83 VILADOT, J. L., de RAMON, E., DURANY, O., PLANAS, A., Biochemistry 37 (1998), p. 11332–11342 84 ADMIRAAL, S. J., SCHNEIDER, B., MEYER, P., JANIN, J., VERON, M., DEVILLE-BONNE, D., HERSCHLAG, D., Biochemistry 38 (1999), p. 4701–4711
33
34 Catalytic Antibodies as Mechanistic and Structural Models of Hydrolytic Enzymes
85 LEE, Y. L., CHEN, J. C., SHAW, J. F., BioChem. Biophys. Res. Commun. 231 (1997), p. 452–456 86 STOOPS, J. K., HORGAN, D. J., RUNNEGAR, M. T., De JERSEY, J., WEBB, E. C., ZERNER, B., Biochemistry 8 (1969), p. 2026–2033 87 FASTREZ, J., FERSHT, A. R., Biochemistry 12 (1973), p. 2025–2034 88 SELZER, T., ALBECK, S., SCHREIBER, G., Nat. Struct. Biol. 7 (2000), p. 537–541 89 ERSOY, O., FLECK, R., SINSKEY, A., MASAMUNE, S., J. Am. Chem. Soc. 118 (1996), p. 13077–13078 90 JANDA, K. D., WEINHOUSE, M. I., SCHLOEDER, D., LERNER, R. A., J. Am. Chem. Soc. 112 (1990), p. 1274–1275 91 SUGA, H., ERSOY, O., TSUMURAYA, T., LEE, J., SINSKEY, A. J., MASAMUNE, S., J. Am. Chem. Soc. 116 (1994), p. 487–494 92 GRYNSZPAN, F., KEINAN, E., P.-, A., Chem. Commun. 21 (1998) 93 IWABUCHI, Y., KURIHARA, S., ODA, M., FUJII, I., Tetrahedron Lett. 40 (1999), p. 5341–5344 94 GOLINELLI-PIMPANEAU, B., GONCALVES, O., DINTINGER, T., BLANCHARD, D., KNOSSOW, M., TELLIER, C., Proc. Natl. Acad. Sci. USA 97 (2000), p. 9892–9895 95 LARSEN, N. A., HEINE, A., CRANE, L., CRAVATT, B. F., LERNER, R. A., WILSON, I. A., J. Mol. Biol. 314 (2001), p. 93–102 96 CROSS, S. S., BRADY, K., STEVENSON, J. D., SACKIN, J. R., KENWARD, N., DIETEL, A., THOMAS, N. R., J. Immunol. Methods 269 (2002), p. 173–195 97 STEVENSON, J. D., DIETEL, A., THOMAS, N. R., Chem. Commun. 1999 (1999), p. 2105–2106 98 WILLIAMS, N. H., TAKASAKI, B., WALL, M., CHIN, J., Acc. Chem. Res. 32 (1999), p. 485–493 99 WADE, W. S., ASHLEY, J. A., JAHANGIRI, G. K., MCELHANY, G., JANDA, K. D., LERNER, R. A., J. Am Chem. Soc 115 (1993), p. 4906–4907 100 BRUMMER, O., HOFFMAN, T. Z., JANDA, K. D., Bioorg. Med. Chem. 9 (2001), p. 2253–2257 101 HEDSTROM, L., Chem. Rev. 102 (2002), p. 4501–4524 102 GIGANT, B., CHARBONNIER, J. B., ESHHAR, Z., GREEN, B. S., KNOSSOW, M., J. Mol. Biol. 284 (1998), p. 741–750
103 D’SOUZA, L. J., GIGANT, B., KNOSSOW, M., GREEN, B. S., J. Am. Chem. Soc. 124 (2002), p. 2114–2115 104 WADE, H., SCANLAN, T. S., J. Am. Chem. Soc., (1999), 121, p. 11935–11941 105 FINER-MOORE, J. S., SANTI, D. V., STROUD, R. M., Biochemistry, (2003), 42, p. 248–256 106 STROUD, R. M., FINER-MOORE, J. S., Biochemistry, (2003), 42, p. 239–247 107 SHI, J., RADIC, Z., TAYLOR, P., J. Biol. Chem., (2002), 277, p. 43301–43308 108 SHI, J., BOYD, A. E., RADIC, Z., TAYLOR, P., J. Biol. Chem. 276 (2001), p. 42196–42204 109 PADLAN, E. A., Mol. Immunol. 31 (1994), p. 169–217 110 FOOTE, J., MILSTEIN, C., Proc. Natl. Acad. Sci. USA 91 (1994), p. 10370–10374 111 LANCET, D., PECHT, I., Proc. Natl. Acad. Sci. USA 73 (1976), p. 3549–3553 112 WILSON, I. A., STANFIELD, R. L., Curr. Opin. Struct. Biol. 4 (1994), p. 857–867 113 JAMES, L. C., ROVERSI, P., TAWFIK, D. S., Science 299 (2003), p. 1362–1367 114 PELLEQUER, J. L., CHEN, S., ROBERTS, V. A., TAINER, J. A., GETZOFF, E. D., J. Mol. Recognit. 12 (1999), p. 267–275 115 ULRICH, H. D., MUNDORFF, E., SANTARSIERO, B. D., DRIGGERS, E. M., STEVENS, R. C., SCHULTZ, P. G., Nature 389 (1997), p. 271–275 116 YIN, J., MUNDORFF, E. C., YANG, P. L., WENDT, K. U., HANWAY, D., STEVENS, R. C., SCHULTZ, P. G., Biochemistry 40 (2001), p. 10764–10773 117 YANG, P. L., SCHULTZ, P. G., J. Mol. Biol. 294 (1999), p. 1191–1201 118 SCHULTZ, P. G., LERNER, R. A., Nature 418 (2002), p. 485 119 ROMESBERG, F. E., SPILLER, B., SCHULTZ, P. G., STEVENS, R. C., Science 279 (1998), p. 1929–1933 120 ARKIN, M. R., WELLS, J. A., J. Mol. Biol. 284 (1998), p. 1083–1094 121 JANDA, K. D., BENKOVIC, S. J., LERNER, R. A., Science 244 (1989), p. 437–440 122 TANAKA, F., KINOSHITA, K., TANIMURA, R., FUJII, I., J. Am. Chem. Soc. 118 (1996), p. 2332–2339 123 KURIHARA, S., TSUMURAYA, T., SUZUKI, K., KURODA, M., LIU, L., TAKAOKA, Y., FUJII, I., Chem. Eur. J. 6 (2000), p. 1656–1662
References 124 SHI, Z. D., YANG, B. H., ZHAO, J. J., WU, Y. L., JI, Y. Y., YEH, M., Bioorg. Med. Chem. 10 (2002), p. 2171–2175 125 WADE, H., SCANLAN, T. S., J. Am. Chem. Soc. 118 (1996), p. 6510–6511 126 FINN, M. G., LERNER, R. A., BARBAS, C. F., J. Am. Chem. Soc. 120 (1998), p. 2963–2964 127 GRIFFITHS, A. D., TAWFIK, D. S., Curr. Opin. Biotechnol. 11 (2000), p. 338–353 128 GIGANT, B., CHARBONNIER, J. B., ESHHAR, Z., GREEN, B. S., KNOSSOW, M., Proc. Natl. Acad. Sci., USA 94 (1997), p. 7857–7861 129 CHARBONNIER, J. B., GOLINELLI-PIMPANEAU, B., GIGANT, B., GREEN, B. S., KNOSSOW, M., Isr. J., Chem. 36 (1996), p. 171–176 130 BRAXTON, S., WELLS, J. A., J. Biol. Chem. 266 (1991), p. 11797–11800 131 PHILLIPS, M. A., FLETTERICK, R., RUTTER, W. J., J. Biol. Chem. 265 (1990), p. 20692–20698 132 MENARD, R., STORER, A. C., Biol. Chem. Hoppe-Seyler, 373 (1992), p. 393–400
133 NICOLAS, A., EGMOND, M., VERRIPS, C. T., DEVLIEG, J., LONGHI, S., CAMBILLAU, C., MARTINEZ, C., Biochemistry 35 (1996), p. 398–410 134 JANDA, K. D., WEINHOUSE, M. I., DANON, T., PACELLI, K. A., SCHLOEDER, D., J. Am. Chem. Soc. 113 (1991), p. 5427–5434 135 ANDREEVA, N. S., RUMSH, L. D., Protein Sci. 10 (2001), p. 2439–2450 136 NORTHROP, D. B., Acc. Chem. Res. 34 (2001), p. 790–797 137 KLEIFELD, O., FRENKEL, A., MARTIN, J. M. L., SAGI, I., Nat. Struct. Biol. 10 (2003), p. 98–103 138 PENNER-HAHN, J. E., Nat. Struct. Biol. 10 (2003), p. 75–77 139 LINDNER, A. B., Ph.D. Thesis, Feinberg 2002, graduate school, Weizmann Institute of Science, Rehovot, Israel 140 HOFFMANN, T., ZHONG, G. F., LIST, B., SHABAT, D., ANDERSON, J., GRAMATIKOVA, S., LERNER, R. A., BARBAS, C. F., J. Am. Chem. Soc. 120 (1998), p. 2768–2779
35
1
Chemical Microarrays for Small Molecule Ligand Screening G. Metz, H. Ottleben, and D. Vetter
Graffinity Pharmaceutical Design GmbH, Heidelberg, Germany
Originally published in: Protein-Ligand Interactions. Edited by Hans-Joachim B¨ohm, Gisbert Schneider. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30521-6
1 Introduction
Since their introduction about a decade ago, combinatorial chemistry and highthroughput screening (HTS) have become indispensable tools in the drug-discovery process. The possibility to synthesize ever-increasing numbers of molecules through novel chemistries and automation is stimulating the development of higher screening capabilities through miniaturization and robotics. Robust biochemical assay development is providing the basis for large-scale screening of biological targets. While in the early days much effort and hope were directed towards managing a numbers game, the focus is shifting from quantity towards quality. For instance, the screening of compound mixtures is being replaced by screening of individual substances in a one-well–one-compound fashion. The design of general-purpose screening libraries as well as corresponding follow-up strategies has become a key aspect in small molecule discovery and optimization. Hits from high-throughput screening enter a selection process to become the subject of medicinal chemistry approaches in lead optimization. Strategies are employed to improve potency, selectivity, and physicochemical profile. If possible, several compound series are generated to allow for alternative routes in case of failure of one. Syntheses of analogues for further exploration are guided by a combination of medicinal chemistry knowledge and intuition, as well as quantitative structure-activity relationships, if available. Such studies help to define particular pharmacophoric features within the hit or lead molecule that constitute the underlying molecular recognition motifs between the ligand and its target and provide a hypothesis for its mode of action. The focus in screening for biological activity assays is on detecting hits with activities in the low micromolar range. Compounds exhibiting this level of activity tend to be of molecular weight in the range of 300–600 Da and of substantial functional Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Chemical Microarrays for Small Molecule Ligand Screening
complexity. A conceptually different strategy can be envisioned that – instead of trying to identify and keep relevant features in a rather complex hit compound – aims at a stepwise discovery starting from molecular fragments. Such fragments need to be screened with techniques suitable for the detection of presumably weak interactions. Guided by early stage structure-affinity information, more potent compounds can then be assembled either from a combination of low affinity binders or by chemical modification of the initial fragments. This article describes the underlying philosophy of fragment-based discovery as well as the experimental approaches suitable for this promising discovery concept. While fragment-based ligand discovery was first adapted in computational methods, several experimental techniques have been introduced recently. Biophysical methods such as nuclear magnetic resonance (NMR) and X-ray crystallography have been successfully applied to fragment discovery. A novel screening technique based on chemical microarrays in combination with label-free affinity detection has emerged and will be discussed in detail.
2 Fragment Approaches 2.1 Conceptual Ideas
Primary screening efforts in drug discovery aim at the identification of hit molecules with the necessary characteristics to be developed into a promising lead molecule. The definition of favorable properties of the starting screening compounds has gained much attention. The design of libraries with drug-like characteristics generally follows the so-called “rule of five” which has been established by retrospective analysis of known drugs and allows a quick assessment based on simple properties, namely, preferred ranges for molecular weight and clogP as well as the number of hydrogen bond donors and acceptors [1]. However, it has been proposed that the ideal profile for hit or lead compounds is different from that of the final drug molecule [2], in particular because hit or lead compounds must be amenable for further optimization. Three categories of lead compounds have been defined based on their physicochemical properties and typical affinities. First, it has been pointed out that hits from drug-like libraries rarely show high (<100 nM) affinities. Typical optimization schemes tend to increase both molecular weight and lipophilicity. Therefore, if applied to already large compounds, they would fall out of the preferred drug-like ranges. Second, non-drug-like hits with very high affinity at an early stage, e.g., natural compounds, usually are rather complex, and optimization towards drug-likeness is difficult. The third category, namely, small (100–350 Da) molecules with low clogP (1–3) is being described as a favorable type of lead compound, as it still allows exploitation of additional (hydrophobic) interactions during combinatorial optimization schemes. Overall, the value of active screening compounds is judged considering affinity relative to units of molecular weight and
2 Fragment Approaches
lipophilicity. Teague et al. suggest that the quality of hits emerging from screening could be improved by tailoring screening libraries towards such lead-like characteristics [2]. In the case of hit identification, various techniques of library design and compound selection are applied to maximize the likelihood of discovery. The relationship between the probability of discovering molecules in a screening assay and their structural complexity was studied using a model of receptor-ligand interactions [3]. The ligand and the active site are represented by a linear string of binary (“+” and “−”) features, and a good fit is indicated by complementarity of the aligned patterns. While molecular recognition is determined by a delicate interplay of physicochemical and steric complementarity, the apparent simplification still allows one to address some key aspects. Varying degrees of relative complexity between the active site and the ligand are represented by different lengths of the binary patterns. By calculating matching probabilities for varying ligand complexities, it is observed that the likelihood of finding any fit exponentially decreases with ligand complexity and that the curve describing the chance of finding just one unique match between ligand and active site features reaches a distinct maximum. The theoretical complexity of such single-mode binding was found to peak at a binary pattern length of three when compared to a site complexity of 12 features. For less complex ligands, multiple binding modes start to dominate the probability curve. While this finding cannot be directly translated into simple molecular features, it points towards a higher overall chance of hit discovery for relatively small molecules. However, smaller molecules will probably exhibit weaker binding, and therefore chances to experimentally detect binding will depend on the screening technology. Taking this into account, the peak of preferred ligand complexity would shift towards somewhat larger compounds. In summary, the existence of an optimal ligand complexity and its relation to the detection probability defines a “range of useful events” (Fig. 1) and leads to the postulation that screening simpler molecules is advantageous from a probabilistic point of view [3]. Molecular recognition addresses the aspect of not only affinity but also specificity, and the question of unique binding modes becomes fundamental for smaller molecules. In general, the multitude of energetically similar but structurally different binding modes increases with fewer interaction features of the ligand. The corresponding free energy of binding landscape is termed “frustrated,” and both native and non-native binding modes of small molecules must be considered [4]. Ideally, the native binding mode is separated from alternate positions in the binding site by an energy “stability gap,” which ensures that a rather specific recognition motif is present (Fig. 2). Small fragment-like compounds exhibiting such a preferred binding mode, so-called molecular anchors, may be more suitable to combinatorial optimization than stronger binders with iso-energetic multiple binding modes [5]. With anchors serving as a receptor-specific recognition motif, a native binding mode would imply that addition of structural features to a binding fragment will not dramatically alter its orientation. Such modular approaches rely on the observation that additivity if not synergy is obtained when preselected fragments are combined. A point in case is a detailed structural comparison of enzyme-ligand binding which
3
4 Chemical Microarrays for Small Molecule Ligand Screening
Fig. 1 The theoretical probability of molecular recognition based on a simple interaction model indicates that the likelihood of a unique binding mode decreases with increasing ligand complexity (A). The probability to experimentally detect a binding event is estimated to increase with com-
plexity (B). The product probability for a so-called useful event, namely, the detection of a ligand with a unique binding mode, reaches a maximum at a medium ligand complexity (C). (See citation in text for details and discussion).
showed that binding modes for individual fragments of the thymidylate synthase substrate correspond well with the whole substrate in the active site [6]. The relationship between the potential affinity of a given molecule and its actually measured binding strength to a given target has been addressed by analyzing individual functional group contributions to drug-receptor interactions [7]. Average binding energies of 10 common functionalities have been estimated based on a dataset of 200 drugs and enzyme inhibitors with known free energies of binding. The average binding energy for a given ligand, often also referred to as Andrews energy, can then be calculated by adding up the average contribution of the functional
Fig. 2. Concept of a stability gap between different binding modes of a fragment within an active site. The assumed binding energy for two different fragments is shown as a simple scheme. Fragment A qualifies as a molecular anchor, as there is a difference in free energy of binding of the best placement to alternative placements. Such a stability gap is not observed for the second fragment B. Originally, these considerations were made based on docked placements of fragments and free energies as estimated from a scoring function (see citation in text).
2 Fragment Approaches
groups present in the molecule and taking into consideration an entropic correction term. The comparison of experimental and theoretical binding energy may then be used to classify the binding as better or worse than average, in other words as an indicator of a good or bad ligand-receptor fit. Poor binders were found to be large, flexible, and rich in polar groups. A similar finding was reported for 3000 screening compounds, where a large proportion exhibited lower observed affinities when compared to the average Andrews energy based on the number of functionalities present. The general non-linearity of binding affinity and ligand size has been pointed out in comparing binding affinities relative to the number of non-hydrogen atoms in known protein-ligand complexes [8]. While a correlation of binding energy with size was found for smaller ligands, there also is evidence for an upper limit of observed affinities for larger ligands that therefore under-perform according to their Andrews energy. The authors note that as few as 7–10 atoms allow for nanomolar binding constants and also suggest to assess relative affinity during a drug-discovery process using expansion approaches. In a more pictorial phrase, it is the “bang per dalton” that has to be kept in mind when evaluating the outcome of hit and lead finding efforts. This ratio tends to be better for smaller-sized compounds. 2.2 Choice of Screening Fragments
The quality and quantity of fragment collections are critical issues for all fragmentbased discovery methods. Some considerations are related to deconvolution strategies based on unique NMR shifts or unique mass or shape, while others are sensitive to the detection limits and noise level of the respective experiment. Besides these technical aspects, the goal is to identify those fragments that best suit the concept of molecular anchors or recognition motifs. On the one hand, a certain size and complexity make a defined binding mode more likely. On the other hand, in order to qualify as a fragment, screening compounds would be limited to a molecular weight range between 100 and 300 Da. The total number of fragments in a screening collection is related to the experimental throughput and the necessity to cover the “accessible fragment space” in order to increase the chances of finding a hit in the initial screen. The sufficient presence of functional groups in the screening fragments is of importance for subsequent hit expansion strategies such as combining or decorating the initial hits. Systematic procedures for the identification of suitable fragments or substructures have been implemented that aim to analyze active molecules in order to identify biologically relevant motifs. Because the field is expanding, a diverse nomenclature is evolving; synonyms for chemical fragments are needles, molecular anchors, biophores, molecular frameworks, MULBITs, or base fragments. In a procedure called RECAP [9], fragmentation is limited to predefined bond types leading to virtual fragments with chemical functionalities that allow them to be used as building blocks for combinatorial chemistry. The method was applied to a number of databases, including the World Drug Index (WDI). By examination of fragment distribution across different therapeutic areas, specific motifs were identified for use in target-biased screening libraries. Typically, the molecular weight of most
5
6 Chemical Microarrays for Small Molecule Ligand Screening
fragments was around 200 Da and the cleavage rules kept ring systems intact. While the approach mainly addressed the question of building block selection for combinatorial libraries, the authors also pointed out its usefulness for generating a fragment database for (computational) ligand buildup approaches. Different variations of the basic approach have been implemented [10–12]. In a related work, databases of known drugs were the basis for breaking down molecules into rings, linkers, and side chains [13, 14]. In this way both common molecular frameworks and side chains were identified. Interestingly, it was found that rather few frameworks and side chains represent the majority of compounds in drug-related compound collections such as the Comprehensive Medicinal Chemistry (CMC) database. Incorporating this finding into a general-purpose screening library resulted in the so-called SHAPES library for NMR-based screening [15]. Natural product databases were also used for the search of interesting molecular scaffolds [11]. It has long been pointed out [16] that certain substructural motifs, named privileged structures, are capable of serving as a starting point for ligands for more than one protein. Systematic modifications of such structures with substitution patterns have been successfully applied in medicinal chemistry [17]. Often these attractive scaffolds are rigid polycyclic heteroatomic systems that allow us to present binding elements in various directions within the binding site. The concept has been adopted as a design strategy for combinatorial libraries [18], and corresponding building blocks are being marketed under the name “optimers” [19]. Experimental screening of compounds enriched with motifs based on such privileged fragments is thought to increase the chances of identifying promising hits for medicinal chemistry programs. Even more, fragment-based screening would allow us to directly use known privileged fragments or close analogues thereof. On the other hand, screening fragments that are not necessarily related to known privileged structures might help us to discover novel chemical motifs displaying properties that qualify them to be called “privileged” [20]. In summary, fragment collections for the experimental screening techniques that will be outlined below either fall into the category of “diversity-oriented” collections or are selected with some bias. The focus is on “drug-relevant” substructures or structures that are targeted against a certain class of proteins where prior knowledge exists. In addition to finding novel leads based on fragment screening, the technique can also be applied in the optimization of a known binder where a particular substructure needs to be replaced by a bio-isoster. A prime example is the quest for alternatives to charged basic groups occupying the thrombin S1 binding pocket in order to enhance bioavailability [21]. 2.3 Experimental Approaches
Fragment-based discovery methods have long been implemented in computational de novo drug design [22, 23]. In the multiple fragment-positioning methods, various functional groups are first placed within the active site, and, after preferred
2 Fragment Approaches
placements have been identified, linking of the fragments provides molecules that can be ranked according a chosen scoring function. Alternatively, sequential buildup strategies start from one placed fragment and successively add functional groups in order to “grow” the ligand, guided by the target structure and a suitable energetic scoring function. A similar strategy is also used in docking programs using an incremental construction algorithm that first performs a fragmentation of the ligand and then, starting with a base fragment placement, continuously builds up the ligand in the binding site. Clearly, these techniques are very much related to the molecular anchor concept discussed above. Besides problems associated with de novo computational approaches such as the synthesizability of proposed virtual compounds, the most critical issue is the fragment ranking, which is calculated by either energy-based methods or rule-based scoring functions. These predictive limitations in computationally derived binding conformation make experimental methods that are guided by either observed activity or binding affinity very attractive. In a hybrid approach, termed “biased needle screening,” in silico prescreening of molecular fragments is followed by a high-concentration bioassay, biophysical hit validation, and structure-based optimization [24]. Virtual pharmacophore screening of 350,000 compounds resulted in the identification of a 3000-member subset with molecular weights below 300 Da. These needle compounds were tested in an activity assay customized to pick up even weak binders. Several structural series could be confirmed, and a 3-D structure-guided optimization based on NMR and X-ray data gave novel, potent inhibitors of DNA gyrase. The authors state that initial HTS on this target did not deliver suitable lead structures. This finding supports the concepts outlined above and demonstrates the usefulness of needle screening as a new entry point to explore the chemical space. The so-called “target-guided ligand assembly strategy” starts from a library of possible binding fragments where each member possesses a common chemical linkage group [25]. Monomer screening against the target is then performed in a bioassay at high concentrations in order to detect even weak binders that then can be connected with a set of flexible linkers for a second round of screening. The utility of the method was demonstrated by using the tyrosine kinase c-Src and a microtiter-based ELISA assay. After screening at two concentrations (1 mM and 500 µM) and inspection of the hits, 37 reagents were selected out of 305 O-methyl oximes in the primary screening library. Using five different linkers, homo- and heterodimers of these reagents were synthesized in single wells, resulting in mixtures of compounds with different linker length. After identification of wells showing inhibition in a second screening assay at higher concentration, deconvolution of the mixtures by single-compound re-synthesis allowed to identify individual active substances and highlighted the importance of the linker length. The most potent compound identified in this manner exhibits an IC50 of 64 nM, a very large increase compared with IC50 values around 40 µM for the individual reagents alone. As with other methods described above, the synergy achieved and the lower number of compounds needed for screening demonstrate the advantage of modular approaches in general.
7
8 Chemical Microarrays for Small Molecule Ligand Screening
In order to discover weakly bound, low-molecular-weight (approx. 250 Da) ligands, a “tethering” strategy has been suggested that relies on the formation of a disulfide bond between the ligand and a cysteine residue on the protein [26]. This cysteine would either be present in the wild type or be genetically engineered in order to target a specific site in the protein. A library of disulfide-containing molecules was prepared and a mixture of a few substances (8–15) was incubated with the target protein under conditions that allow reversible reactions. It is expected that the formation of disulfide bonds to the protein is entropically stabilized for those compounds with an inherent affinity to the protein in proximity to the cysteine. Tethered complexes could be identified using mass spectrometry, provided that the library had been designed to contain molecules with unique molecular mass. The method was applied to thymidylate synthase containing a cysteine in the active site. Several library members showed binding corresponding to millimolar inhibition constants, as determined in an enzymatic assay. From a series of related ligands displaying different binding behavior, qualitative early SAR information was obtained. The exact binding mode was determined with the same ligand attached at different nearby sites through cysteine mutations; interestingly, the location of the tethered molecule was conserved, indicating little influence of the tether on the binding mode. Structure-based modifications in analogy to known substrates improved the initial hit to an inhibitor in the sub-micromolar range. Techniques such as X-ray crystallography and NMR, which combine low affinity screening capability with structural information, are very powerful tools. NMR spectroscopy plays an increasingly prominent role among the biophysical screening methods, and the experimental schemes are continuously being improved to widen the scope of applications [27–32]. Different detection strategies based either on ligand or target resonance signals have evolved. The approaches share the ability to detect weak yet specific binders. Design principles for NMR screening compounds have been reviewed [33]. An early and frequently cited fragment-based experimental screening approach was termed “SAR by NMR” [34, 35]. Here, mixtures of small organic molecules together with 15 N-labeled protein were subjected to 2-D 1 H- 15 N NMR measurements. Protein chemical shift variations relative to spectra of the protein alone indicated a binding event. Based on the predetermined assignment of chemical shifts, both the location of the binding site in the protein and the binding molecule were identified. This initial binder was then used to saturate the protein in order to find a second small molecule binding to a proximal site in a new round of screening. After neighboring small molecules were identified and optimized through an analogue approach, combinations with various linkers were synthesized and assayed. The linker design was supported by the 3-D structure of the protein-ligand complex. As a result of a successful linkage, binders were obtained with high affinities even exceeding the product of the binding constants of the individual fragments due to linker-mediated entropic enhancement. The method has been successfully applied in a number of studies [36–40] elegantly combining the fragment-based strategy with use of structural information. However, the original method can be applied only if sufficient (∼200 mg) N-labeled, soluble (at 2 mM) protein of limited size
2 Fragment Approaches
(< 40 kDa) is available, and it requires chemical shift assignment before the actual screening. The necessity for labeled proteins and the size limitation can be overcome by techniques that monitor not protein NMR signals but ligand resonances, either through line-broadening experiments, transferred NOE measurements, or relaxation- and diffusion-edited methods using pulsed field gradients. The NMR SHAPES strategy utilizes 1-D line broadening and 2-D transferred NOE measurements to identify binders in a mixture of compounds [15]. Potential weakly binding scaffolds (µM to mM) are selected based on an analysis of known drugs and represent molecular frameworks recurrently found in active therapeutic molecules. In contrast to the SAR by NMR method, the strategy does not aim to find highly potent ligands by NMR screening but instead empirically provides a basis for compound selection. In this way, libraries aimed at HTS screening may be biased by filtering against the target-specific chemical motifs identified by the SHAPES approach. The use of X-ray crystallography for primary screening is conceptually related to the SAR by NMR technique. Again, the sensitivity of the method makes it well positioned for fragment-based discovery. Because organic solvent molecules contain functional groups representative of those found in screening compounds, it has been suggested to co-crystallize proteins with different solvents and to experimentally determine preferred locations of small organic molecules [41]. The position of several organic molecules was thought to provide initial templates for selection of screening compounds or rational ligand design efforts. In yet another fragment strategy termed CrystaLead [42], the electron density map of the protein was determined to identify protein and solvent densities in the unbound state. Next, the crystal was exposed to a mixture of small organic molecules in a soaking experiment, and binders were identified by their appearing electron densities. It was crucial for the identification of the binding compounds in the screening mixture that they were of diverse and unique molecular shapes. Weak binding compounds (up to high µM) were detected and then optimized in a structure-directed process. The CrystaLead technology was demonstrated in a urokinase screen of 6–8 compound mixtures against 9 crystals. In this way, 61 fragments were exposed to the crystal soaking experiment, and 5 binders with affinities in the low and high µM range were discovered. One compound was further optimized using additional structural knowledge from a known inhibitor and resulted in an optimized lead of 370 nM affinity to urokinase. Crystallographic screening has gained significant interest in the commercial setting where high-throughput crystallography laboratories and technological advances in both hardware and software have increased the rate of protein-ligand crystal structure determination. The advent of structural genomics initiatives has spurred the development of robotics and automated data interpretation [43, 44]. The latest developments have recently been reviewed [45–47]. These technological advances are now being paired with fragment-based approaches. Results have been reported [46] where defined binding could be elucidated for very small fragments (< 200 Da), supporting the fact that despite small size and presumably weak interactions, fragments are able to make key interactions in order to act as molecular
9
10 Chemical Microarrays for Small Molecule Ligand Screening
anchor [4]. In a study with a protein kinase, such a weak initial fragment was optimized into a nanomolar inhibitor by the aid of structure-based design [45].
3 Chemical Microarrays 3.1 Background
Chemical microarrays can be defined as collections of chemical compounds covalently immobilized on a carrier in a 2-D pattern. After the widespread adoption of DNA-based microarrays in basic and commercial research, there is a growing interest to extend the array concept beyond genetics applications. Exploratory work is underway to lay out proteins and cells in array formats [48–51]. In parallel, various routes are being taken to realize arrays of small molecules such as synthetic chemicals, peptides, or natural products. The principle of a regular 2-D arrangement of chemical diversity is well known from high-throughput screening, which currently serves as the paradigm in de novo small molecule discovery. In this highly industrialized undertaking, compounds are solubilized and deposited in wells of microtiter plates as either single compounds or mixtures. A recent modification of HTS is the deposition of chemical-containing droplets on flat surfaces [52] to reduce sample consumption. In the latter, the target protein is captured in a hydrogel that is subsequently added to cover the surface and resolubilize the screening compounds. It is an inherent problem of HTS that small molecule solubility can vary greatly and can hardly be accounted for. Therefore, the concentration of any given compound in the screening mixture is difficult to predict and essentially unknown at the primary screening stage. This limitation is especially apparent in a fragment-based screening regime. Because higher compound concentrations are employed to facilitate the detection of binders below the usual micromolar cutoff, DMSO tolerance of the target protein or insolubility of the small molecules restricts this approach to robust assay systems and more hydrophilic fragment diversity. When we set out to develop a platform for chemical microarrays, compatibility with fragment-oriented diversity was a major design principle. We saw great promise in also solubilizing hydrophobic compounds by hydrophilic or amphiphilic spacer moieties. Such spacer groups are required for covalent tethering of the ligands to the array surface and for access to protein-binding pockets. The amphiphilic nature of the spacer aids hydration of small molecules associated with high clogP values. The flexibility and the length of the spacer chain allow for the ligands to reach even so-called deep protein-binding sites. Covalent tethering of small molecules is generally seen as an obstacle for deriving structure-activity relationships. The compounds are not free to orient themselves for optimal fit with the protein receptor, the spacer might sterically clash with the protein surface, and the spacer attachment itself might affect the electronic properties of the ligands. Consequently, in order to generate covalent modifications
3 Chemical Microarrays
of known drugs or other compounds known to be bioactive, the decision for regioselective spacer attachment should be based on a receptor-ligand complex structure if possible. Nevertheless, for de novo discovery of small molecule fragments, the spacer-mediated, structure-independent compound solubilization outweighs the above-mentioned potential disadvantage of restricted orientation. Furthermore, fragments can be conjugated to the spacer moiety by more than one type of coupling chemistry and through various functional groups present on the fragment itself. A third aspect of the small molecule “display” format is the additional information generated through knowledge of the tether site. The fact that possible orientations of a tethered compound within a binding site are more limited than in a homogeneous assay can facilitate building a hypothesis for a mode of action. The directionality can be used either in ligand-alignment procedures for SAR-type studies or as a bias in docking algorithms in order to evaluate possible binding modes. Other advantages of using chemical microarrays of peptides and organic compounds are in the minimization of biological sample consumption and the resulting increase in screening throughput. Array approaches hold the promise of mass production and industrialization. A simple format for reliably storing and reading chemical diversity is attractive compared to the current cumbersome fashion of operating refrigerated warehouse-type storage and dispensing systems. Chemical microarrays promise to enable function-blind screening of large numbers of novel targets. In general, chemical microarray approaches open up the opportunity to map interactions and discover small molecule binders for a given protein of interest even before understanding the protein’s function. Once the arrays are produced, the only prerequisite for array-based screening is the preparation of purified, homogeneous, and soluble protein. Advances in protein-production and -purification methods, such as expression systems optimization, protein folding, and affinity-tag development, complement array-based screening. It can be safely estimated that several thousands of novel, non-membrane-bound, putative target proteins will be derived from the knowledge of the human genome [53]. Chemical microarrays provide a potentially powerful alternative to high-throughput screening for small molecule de novo discovery on this wealth of novel targets. 3.2 On-array Synthesis
Chemical diversity for chemical microarrays can be accessed in two ways: importing compounds onto the chip surface or synthetically creating molecules directly on the support. In the pioneering work by Fodor and coworkers, a combination of solid-phase organic synthesis and photolithographic techniques was applied to the in situ creation of oligonucleotide and peptide microarrays [54, 55]. Technological challenges associated with this approach have limited it to the generation of oligonucleotide arrays. DNA consists of only four different nucleotides, and differences in reactivities among the four nucleotides on glass surfaces are well understood, and the solid-phase synthesis is highly optimized. Hence, highly
11
12 Chemical Microarrays for Small Molecule Ligand Screening
miniaturized synthesis of a vast diversity of different single-strand probes became feasible, and corresponding commercial products derived from photolithography are in widespread use. In contrast, the application of on-array synthesis techniques is more demanding for the generation of peptidic and even more challenging for combinatorial chemical libraries. In these cases, a much larger number of building blocks with greater ranges of reactivities is required. The greatest hurdle to highdensity, in situ array synthesis on glass is quality control of the numerous products generated. Techniques to infer physicochemical characteristics of monomolecular films in micrometer-sized areas are just beginning to evolve and are still far away from becoming a routine application such as high-pressure liquid chromatographycoupled mass spectrometry (LC-MS), which is used for conventional combinatorial compound library analysis. A different way to produce chemical microarrays in situ is spot synthesis of combinatorial libraries on cellulose sheets [56]. Spot synthesis is configured as an open system to be operated at room temperature. Despite attempts to replace cellulose with polypropylene as a synthesis support [57], cellulose is still the support of choice for spot synthesis, and reaction conditions have to be compatible with the restricted chemical stability of cellulose. Due to the acid lability of such membranes, the diversity content of these arrays was initially restricted to the synthesis of peptides. Recently, a method was described that could widen the scope of spot synthesis arrays. Germeroth and coworkers [57] succeeded in the assembly of a library of 8000 cellulose-bound 1,3,5-triazines under mild reaction conditions. They employed a strategy that took advantage of a temperature-dependent, successive displacement of cyanuric chlorides by different nucleophiles in a first report of the synthesis of small organic compounds on cellulose sheets. In addition to the limited range of cellulose-compatible synthesis protocols, two further drawbacks remain that are inherent to on-array approaches. Because synthesis takes place directly on the surface that is subsequently used for screening, quality control of the synthesis products is restricted to the surface-bound molecules. Standard cleave-and-characterize procedures involving LC-MS analytical techniques are not practical. This is a severe problem even for the synthesis of peptides where well-established protocols are available and becomes more pronounced when novel chemistries have to be employed. Secondly, each array produced is unique, which renders the production rather costly and prevents the generation of numerous copies of the same array for high-throughput applications. 3.3 Off-array Synthesis and Spotting
Some of the problems associated with on-array in situ synthesis can be overcome by a technology recently published by Stuart Schreiber’s group. In this work, the compound diversity was generated in solution or solid-phase formats different from the array layout, and chemical microarrays were subsequently produced by spotting pre-synthesized molecules [58–60]. In a proof-of-principle experiment, three
3 Chemical Microarrays
different organic compounds were immobilized on a chip surface [58]. For this purpose a silanated glass slide was derivatized to give a surface densely functionalized with thiol-reactive maleimide groups. Onto these surfaces, a high-precision robot delivered approximately 1 nL of a solution of the three different organic compounds, which were pre-synthesized to contain a spacer with a thiol group. In this way, an array of 10,800 spots each 200–250 µm in diameter was produced on a 2.5 × 7.5 cm glass slide. The chips were afterwards probed with respective fluorescence-labeled proteins for selectively binding their ligands. The fluorescence intensity recorded on the spot reflected the binding affinity of the respective ligand. This finding demonstrates the feasibility of semi-quantitative measurements of ligands with binding affinities in the low micro- to nanomolar range. In a second communication a similar technical setup was combined with a different chemical immobilization strategy [59]. The same set of organic probe molecules was presynthesized, this time containing primary alcohol groups, and spotted directly onto thionylchloride-activated glass slides. A small combinatorial library comprising 78 compounds was prepared in analogous fashion and probed with a target protein, and a new “hit” was identified in addition to the positive controls. Although the initial results of this approach were very encouraging, discovering new ligands for broader protein diversity would require a larger number of organic compounds. Moreover, the surface chemistries employed in both papers did not allow a tight control of the number and density of the compounds on each spot of every chip. This might become limiting when a more quantitative analysis of the results is required or when lower affinity interactions are to be analyzed. In the approach cited above, both the nanoliter droplet deposition technique and the surface design of the array support were adopted from DNA microarray fabrication [61]. The presentation of oligonucleotides or DNA on surfaces as well as the readout of such chips in hybridization experiments are facilitated by the fact that both ligand and receptor molecules are relatively similar with respect to solubility, charge density, and pI. The physicochemical properties of nucleotides make it possible to design surfaces that show relatively little background binding under various conditions. However, the situation is quite different if proteins are the target receptors and protein binding rather than hybridization is of interest. Proteins differ dramatically in stability, solubility, and hydrophobicity and can be basic or acidic. The challenge here is to design a surface that resists the unspecific binding of a wide range of different proteins. In addition, probing a DNA microarray by hybridization is relatively simple, as DNA is rather stable and the binding constant for a double-strand formation is relatively high, allowing one to subject the molecules to stringent hybridization conditions. The interactions of small organic molecules or peptides with proteins are often much weaker and range mainly from milli- to micromolar binding constants. This requires high-performance surfaces to minimize unspecific binding. A common drawback of glass slides as supports for chemical microarrays, as well as of cellulose or polypropylene sheets, is the protein compatibility of their respective surface chemistries. It is inherently difficult to render polypropylene or
13
14 Chemical Microarrays for Small Molecule Ligand Screening
silanated glass resistant to unspecific protein adsorption and is even more challenging to control a critical parameter such as ligand density on these polymers. For instance, engineering for biocompatibility of resins used in organic synthesis has become a major challenge for “bead-binding assay” screens [62]. In contrast, self-assembling monolayers (SAMs) of thiols on gold are not only among the best-characterized and best-defined synthetic surfaces but also can be designed to exhibit very low background protein binding. SAMs give excellent control over surface properties at the molecular level. The basic principles of SAM formation and applications in protein binding to immobilized ligands have been reviewed [63, 64]. In general, an alkane thiol is chemisorbed onto a gold surface, and the packing of the hydrocarbon chains creates a dense monolayer on the gold. Attaching an oligo(ethylene oxide) to the hydrocarbon chain confers excellent resistance to nonspecific protein and DNA binding, biocompatibility, and nonfouling properties to the surface [65–67]. Reactive SAMs form uniform layers that contain reactive groups such as amines or carboxylic acids at their surface [68]. The ratio of alkane thiols in a mixture of molecules with and without a reactive group allows one to control the presentation of ligands in a defined surface density (Fig. 3). Carboxylic acid-containing surfaces can be activated by pentafluorophenyl or N-hydroxy succinimide esters. Reactive SAM technology has recently been used for peptide [69] and protein arrays [70] as well. Using the immobilized ligand benzene sulfonamide together with carboanhydrase, it was demonstrated that the unspecific binding can be minimized and
Fig. 3 Creating of a chemical microarray occurs in two steps. A self-assembled monolayer is formed on gold-coated glass from a diluent and an anchor molecule carrying a reactive group. Spotting of ligands attached
to a ChemTag linker results in covalent immobilization of organic molecules. The uniform density of the ligands at each spot is defined by the ratio of anchor and diluent molecules.
3 Chemical Microarrays
that there was no irreversible adsorption. The measurements also were not complicated by mass transport [71, 72]. However, when rate constants for association and dissociation were analyzed, some lateral steric effects affected the binding kinetics. Ligands displayed on mixed self-assembled monolayers ligands in well-defined surface densities offer control over critical parameters to minimize unspecific or irreversible binding or deviation of the binding kinetics from those observed in solution. In our laboratories, we developed a proprietary SAM technology comprising a functionalized anchor molecule diluted with unfunctionalized spacer molecules. The chemical functionality is used to capture spotted ligand molecules, which themselves are carrying a linear spacer and functionalization molecule, named ChemTag. The system was tested against numerous well-known pairs of interacting molecules such as receptor:ligand, enzyme:co-factor/inhibitor, and antigen:antibody for detection of specific interactions. Using a variety of proteins differing in hydrophobicity and molecular weight, the results showed that — irrespective of the protein chosen and even at high concentration — unspecific binding was minimized. These experiments proved that interactions between proteins and ligands of low molecular weight (<200 Da) as well as weak affinities up to the millimolar range can be specifically detected, thus fulfilling a major requirement for the use of chemical microarrays for fragment-oriented screening applications. Building on the advantages provided by reactive mixed self-assembled monolayers, we designed both a compatible spacer chemistry and a miniaturized format for highly parallel solid-phase synthesis. Compounds are synthesized through combinatorial chemistry on a solid phase that is preloaded with the ChemTag spacer molecule. This linear spacer carries two functional groups, located at alpha and omega sites. The alpha group is used for transient conjugation to the synthesis support through a standard linker and remains protected in this fashion during the synthesis cycles. The omega group is available for ligand synthesis and can be functionalized in various ways to allow for different coupling chemistries. Cleavage from the solid phase by splitting the linker-alpha group bond yields free, spacermodified compounds in solution, with the alpha group free to be conjugated to the reactive sites on the microarray SAM surface. The common spacer group acts as a chemical shuttle to transport substances from one solid phase to another (Fig. 4). After solid-phase assembly, products are separated from the synthesis resin and stored in mother microtiter plates. At this stage, aliquots are subjected to quality control using LC-MS. After quality assessment, nanoliter amounts are transferred from the mother plates onto the surfaces of microarrays coated with the reactive mixed SAM. High-precision spotting of the substances in a custom-built automated environment ensures reproducible immobilization of the compounds onto the microarrays. A great variety of organic molecules can be attached through one of the different omega-site chemistries of the ChemTag spacer. The array content of our chemical microarrays ranges from the immobilization of single fragments to combinatorial libraries. As an example for a fragment array, up to 1536 individual monomers have been attached to the ChemTag and have been immobilized on a single array. Using
15
16 Chemical Microarrays for Small Molecule Ligand Screening
Fig. 4 The chemical microarray production starts by assembling combinatorial libraries or single fragments (A) to the ChemTag, which itself is attached to a solid-phase resin (B) resulting in compounds attached to the resin (C). After cleavage from the
resin, the tagged compounds are stored in mother plates (D) and LC/MS quality control can be performed (E). The final step is spotting of the compounds onto a prefabricated microarray carrying a reactive selfassembled monolayer (F).
different chemistries, multifunctional fragments can be immobilized in more than one way. In addition, binary libraries with a batch size of 10,000 compounds have been synthesized on a nanoscale level (40 nmoles/compound) and then spotted on a single array. This rather low synthesis scale of chemical products is sufficient for the production of hundreds of ready-to-screen arrays (Fig. 5).
Fig. 5 Chemical microarrays as they are used at Graffinity. The array is based on a microtiter plate footprint and carries up to 10,000 individual compounds spotted in rows and columns onto an optical microstructure.
4 Screening on Microarrays
4 Screening on Microarrays 4.1 Detection Technology
Chemical microarrays that are based on glass or cellulose or other polymer surfaces were typically interrogated for protein binding by fluorescence or chemiluminescence, respectively. Fluorophore-labeled protein samples were incubated with chemical microarrays, washed, and subsequently scanned by commercial microarray readers. The use of orthogonally labeled proteins was also described [58, 59]. Compound libraries on cellulose sheets are not readily subjected to fluorescence imaging, but a multi-step process of target protein1 and target-directed antibody together with enzyme-conjugated secondary antibody binding serves to obtain high-sensitivity images of protein-binding patterns. In our approach towards chemical microarray screening, we wanted to exploit the fact that self-assembled monolayers readily lend themselves to label-free detection based on surface plasmon resonance (SPR). The key to this approach is a thin gold metal film on a glass support that allows for both the physical effect of plasmon formation and the formation of high-quality self-assembled monolayers. Label-free imaging for chemical microarray readout is attractive because it reduces unspecific binding compared to chemiluminescence, as antibody and enzyme labels are not brought into contact with the array. The approach also allows us to observe the direct binding pattern of the unmodified target protein, as covalent modifications such as fluorophore conjugations are obviated. Molecular recognition of an immobilized ligand and its solubilized binding partner can be detected by a physical phenomenon called surface plasmon resonance [73]. In general, the sensor chip comprises a glass prism covered with gold and coated with a SAM presenting the potential ligands (Fig. 6). Protein solution is added to a reservoir on top. Upon binding, an affinity-dependent mass change occurs at the interface of the detector surface and the liquid above. This influences the dielectric properties of the gold layer. The readout is achieved by measuring the exact resonance condition for an energy transfer from the photons of a light beam to the electrons in the gold layer placed on top of a glass prism. An incident beam from the bottom is reflected at the gold surface and three factors, namely, the angle of incidence, the wavelength, and the refractive index at the interface, determine the resonance condition. For a fixed angle, the wavelength-dependent minimum of the reflected light is quantitatively related to the mass change due to binding. Likewise, an angular dependence at a defined wavelength can be recorded. Optical biosensors and in particular those based on SPR have gained importance in many areas. The technique itself, biological applications, and the availability of commercial instruments have been reviewed [74]. The most common application in the area of protein-ligand interactions is the detailed study of binding kinetics
17
18 Chemical Microarrays for Small Molecule Ligand Screening
Fig. 6 Basic setup (A) for surface plasmon resonance-based detection of molecular recognition comprises a gold-covered glass prism and the observation of the reflected light intensity in dependence of the angle and wavelength. Binding of a soluble protein to an immobilized ligand influences
the resonance condition at the gold interface. In the case of high-density chemical microarrays (B), the actual sensor fields are arranged in rows and columns to allow the parallel detection of up to 10,000 individual binding events using imaging technologies.
[75–79]. It relies on immobilization of one of the binding partners on a surface, and complex formation is optically detected in real time by adding the solubilized partner, e.g., by means of a microfluidic flow chamber. The sensitivity of the technique allowed the detection of small molecules binding to immobilized protein receptors in a number of cases. However, sensitivity is at the limits and throughput is low if the protein is immobilized and the surface is exposed to individual ligands in a sequential fashion. These restrictions do not apply for the chemical microarray format, where a high number of small molecules are immobilized and binding to the target protein is detected in parallel (Fig. 6). Because SPR is sensitive to mass change, the binding of large proteins increases sensitivity as compared to the setup where the protein is immobilized. The need for higher throughput has led to the development of instruments that are capable of working either sequentially or parallel on several sensor fields [80]. However, parallel detection comes into play only when the sensor technology is combined with 2-D arrays opening the technique for screening applications. We designed and built a number of Plasmon Imager devices to fulfill this need and to enable us to perform label-free, simultaneous binding detection for up to 10,000 immobilized small organic molecules against a macromolecular receptor in solution. The instruments are based on imaging technology where the reflected light from the entire array is captured by a CCD camera. The resonance condition for surface plasmons is determined by stepwise variation of the wavelength of the incident light. For each sensor field on the array, the light intensity (in percent of 100) of the reflected light is recorded against the wavelength (in nm), and for each spot a nm value for minimal light intensity is obtained. In order to generate an output that corresponds to protein-binding pattern recognition, two measurements are necessary. The difference between a measurement with buffer alone and after adding the target protein defines the SPR signal, i.e., the nanometer shift of the resonance condition.
4 Screening on Microarrays
4.2 Protein Affinity Fingerprints
Chemical microarray screening data are conveniently displayed in a 2-D format of squares arranged in rows and columns. This representation reflects the actual spatially encoded ligand positions on the grid layout of the microarray. Therefore, each square represents a single spot on the physical array and thus also a single array compound. The SPR shift that is obtained after exposing the chemical microarray to a protein can be shown as a grayscale or in color codes. We developed a proprietary visualization tool for point-and-click interrogation of each data point for both the chemical structure and the measured affinity associated with it. The visualized dataset is called an affinity fingerprint for the protein of interest (Fig. 7). In the case
Fig. 7 Affinity fingerprint of a target protein probed against a chemical microarray presenting 9216 immobilized binary compounds. The color range goes from red to orange to yellow to green to blue to code for decreasing SPR signal. Rows and columns each represent one of two monomers of the binary library and clearly
a pattern of prominent rows and columns points to the presence of certain monomer fragments being highly populated among the hits. The pop-up structure displays an array compound example, with the ChemTag attachment site indicated by a grayfilled circle.
19
20 Chemical Microarrays for Small Molecule Ligand Screening
Fig. 8 SPR signal of a chemical microarray presenting 1200 immobilized fragments shown as a function of molecular weight (A). Even small structures around 150 Da show detectable signal. The correspond-
ing affinity fingerprint of the target protein probed against the chemical fragment microarray is shown as a grayscale, with stronger binders being darker (B).
of binary libraries, constructed from two sets of monomers, visualization of the affinity fingerprints easily deconvolutes the presence of building blocks in hits. By arranging the compounds in rows and columns corresponding to the presence of a single monomer, the occurrence of many strong signals in one column indicates the significant contribution or even dominance of a monomer in the combinatorial hit compound. Besides such prominent fragments, indicated by “strong” rows or columns, hits also can be found for certain binary combinations where each monomer has little or even no “tolerance” for the second building block in order to show affinity. Typically, both types of hits are found. Overall, affinity fingerprints are the large-scale documentation of patterns of molecular recognition between small organic molecules and proteins. Immobilization of a collection of commercially available monomers on chemical microarrays represents the equivalent of the single fragment screening methods described above. Fig. 8 shows the affinity fingerprint of a chemical microarray containing 1200 individual fragments. The immobilized compounds all qualify as needle compounds, with molecular weights from 80 to 350 Da. Fragments can be selected that display diversity in pharmacophoric motifs and/or are enriched in drug-related chemical motifs as outlined above. The relationship between molecular weight and obtained SPR signal shows that the smallest fragments with detectable signals have a molecular weight of around 150 Da. Binding constants of chemical microarray hit fragments are typically observed to be in the micromolar range. For example, a well-known needle-type molecule, namely, amino-benzamidine with a K i of 80 µM against thrombin, is detectable within a fragment affinity fingerprint. After identification of binders from the array screening, the ChemTag can be replaced by a series of small substructures, leading to soluble compounds based
5 Conclusion 21
around the original array hit structure. This so-called tag-replacement strategy makes use of the tether site to expand the array hits. The functional group originally connected to the ChemTag provides both a readily available linking chemistry and a bias towards an attachment point, with a higher chance of pointing towards additional interaction sites or subpockets within the protein. With tag replacement, areas are explored that are not yet occupied by the remainder of the ligand, and attachment sites are likely avoided that might result in clashes with the protein. Case studies showed that the affinity of micromolar array hit compounds could be increased by one or two orders of magnitude. Such tag-replacement series link chemical microarray screening with follow-up medicinal chemistry efforts.
5 Conclusion
A number of technological advances in the area of surface chemistry and biophysical detection technologies have made fragment-based screening an attractive and promising tool for the early stages in drug discovery. The approach provides a new entry point for lead discovery, as it aims at finding drug fragments in a first round of screening and using this information for an iterative buildup of chemical complexity. Such a procedure promises to address a number of issues encountered in the discovery and development of larger, more-complex screening compounds. Fragment-based screening allows us to investigate the relative contribution of the monomers in a reagent-based SAR evaluation, helping to avoid combinatorial explosion. Despite combinatorial chemistry, the number of accessible products by far exceeds the screening capabilities, and library diversity analysis tries to help reduce redundancy in compound collections to achieve a better coverage of “chemicals space.” Prescreening of monomers helps to better cover diversity by focusing combinatorial design around fragments that fared well in an early stage. Then, either smaller screening libraries for optimization can be synthesized based on these preferred monomers, or next rounds of testing can be based on fragment analogues. The experimental techniques that allow the detection of small and weak binders either are based on high-concentration bioassays or fall into the category of affinity screening. Biological assays require careful setup and might not be available in general. NMR and crystallography have been used for some time now for fragmentscreening purposes, and elegant techniques have been developed to discover small fragments binding to target proteins. These techniques open the possibility of detailed insight into the binding mode of ligands, and their attraction lies in the seamless integration of structure-based optimization. Nevertheless, the experimental procedures are technically demanding and are limited either by crystallization conditions or NMR-related protein requirements. A novel emerging technology, label-free screening on chemical microarrays, brings together SPR, a well-known technique for studying molecular interactions, with self-assembled monolayer surface chemistry to display small organic
22 Chemical Microarrays for Small Molecule Ligand Screening
molecules in a miniaturized format. Progress in the area of label-free imaging now enables the parallel label-free affinity fingerprinting of 10,000 immobilized organic molecules against a target protein. The sensitivity of the method allows for the detection of weak interactions and makes routine, empirical fragment-based discovery a reality. The high-throughput, standardized screening format opens the possibility for chemical genomics applications aiming to move chemistry upstream in the discovery process. The goal is the early integration of chemical information either for target validation or for rapid identification of novel leads on a genomic scale. Acknowledgments
The authors are grateful for discussions with the colleagues of Graffinity Pharmaceuticals and want to emphasize the team effort that made SPR based screening on chemical microarrays possible. References 1 LIPINSKI, C. A., LOMBARDO, F., DOMINY, B. W., and FEENEY, P. J., Adv. Drug Deliv. Rev. 2001, 46, 3–26. 2 TEAGUE, S. J., DAVIS, A. M., LEESON, P. D., and OPREA, T., Angew. Chem. Int. Ed. 2001, 38, 3743–3748. 3 HANN, M. M., LEACH, A. R., and HARPER, G. J. Chem. Inf. Comput. Sci. 2001, 41, 856–864. 4 REJTO, P. A. and VERKHIVKER, G. M., Proc. Natl. Acad. Sci.U. S. A 1996, 93, 8945–8950. 5 REJTO, P. A. and VERKHIVKER, G. M., Pac. Symp. Biocomput. 1998, 362–373. 6 STOUT, T. J., SAGE, C. R., and STROUD, R. M., Structure. 1998, 6, 839–848. 7 ANDREWS, P. R., CRAIK, D. J., and MARTIN, J. L., J. Med. Chem. 1984, 27, 1648–1657. 8 KUNTZ, I. D., CHEN, K., SHARP, K. A., and KOLLMAN, P. A., Proc. Natl. Acad. Sci. U.S.A 1999, 96, 9997–10002. 9 LEWELL, X. Q., JUDD, D. B., WATSON, S. P., and HANN, M. M., J. Chem. Inf. Comput. Sci. 1998, 38, 511–522. 10 MAKINO, S., KAYAHARA, T., TASHIRO, K., TAKAHASHI, M., TSUJI, T., and SHOJI, M. J. Comput. Aided Mol. Des 2001, 15, 553–559. 11 SCHNEIDER, G., LEE, M. L., STAHL, M., and SCHNEIDER, P. J. Comput. Aided Mol. Des 2000, 14, 487–494. 12 LEWELL, X. Q. and SMITH, R. J. Mol. Graph. Model. 1997, 15, 43–46.
13 BEMIS, G. W. and MURCKO, M. A., J. Med. Chem. 1996, 39, 2887–2893. 14 BEMIS, G. W. and MURCKO, M. A., J. Med. Chem. 1999, 42, 5095–5099. 15 FEJZO, J., LEPRE, C. A., PENG, J. W., BEMIS, G. W., AJAY MURCKO, M. A., and MOORE, J. M., Chem. Biol. 1999, 6, 755–769. 16 EVANS, B. E., RITTLE, K. E., BOCK, M. G., DIPARDO, R. M., FREIDINGER, R. M., WHITTER, W. L., LUNDELL, G. F., VEBER, D. F., ANDERSON, P. S., CHANG, R. S., J. Med. Chem. 1988, 31, 2235–2246. 17 PATCHETT, A. and NARGUND, R., Annual Reports in Medicinal Chemistry 2000, 35, 289–297. 18 NICOLAOU, K., PFEFFERKORN, J., ROECKER, A., CAO, G., BARLUENGA, S., and MITCHELL, H. J. Am. Chem. Soc. 2000, 122, 9939–9953. 19 www.arraybiopharma.com. 2002. 20 HAJDUK, P. J., BURES, M., PRAESTGAARD, J., and FESIK, S. W., J. Med. Chem. 2000, 43, 3443–3447. 21 LUMMA, W. C., WITHERUP, K. M., TUCKER, T. J., BRADY, S. F., SISKO, J. T., NAYLER-OLSEN, A. M., LEWIS, S. D., LUCAS, B. J., and VACCA, J. P., J. Med. Chem. 1998, 41, 1011–1013. 22 APOSTOLAKIS, J. and CAFLISCH, A., Comb. Chem. High Throughput. Screen. 1999, 2, 91–104.
References 23 MURCKO, M. A. In: CHARIFSON, P. S., Editor, Practical Application of Computer-Aided Drug Design, Dekker, 1997. 24 BOEHM, H. J., BOEHRINGER, M., BUR, D., GMUENDER, H., HUBER, W., KLAUS, W., KOSTREWA, D., KUEHNE, H., LUEBBERS, T., MEUNIER-KELLER, N., and MUELLER, F., J. Med. Chem. 2000, 43, 2664–2674. 25 MALY, D. J., CHOONG, I. C., and ELLMAN, J. A. Proc. Natl. Acad. Sci.U. S. A 2000, 97, 2419–2424. 26 ERLANSON, D. A., BRAISTED, A. C., RAPHAEL, D. R., RANDAL, M., STROUD, R. M., GORDON, E. M., and WELLS, J. A., Proc. Natl. Acad. Sci.U. S. A 2000, 97, 9367–9372. 27 PELLECCHIA, M., SEM, D., and WUETHRICH, K., Nat. Rev. Drug Discov. 2002, 1, 211–218. 28 van DONGEN, M., WEIGELT, J., UPPENBERG, J., SCHULTZ, J., and WIKSTROM, M., Drug Discov. Today 2002, 7, 471–478. 29 DIERCKS, T., COLES, M., KESSLER, H., Curr. Opin. Chem. Biol. 2001, 5, 285–291. 30 BRADLEY, D., Modern Drug Discovery 2001, 29–34. 31 ROBERTS, G. C., Drug Discov. Today 2000, 5, 230–240. 32 MOORE, J. M., Curr. Opin. Biotechnol. 1999, 10, 54–58. 33 LEPRE, C. A., Drug Discov. Today 2001, 6, 133–140. 34 SHUKER, S. B., HAJDUK, P. J., MEADOWS, R. P., and FESIK, S. W., Science 1996, 274, 1531–1534. 35 HAJDUK, P. J., MEADOWS, R. P., and FESIK, S. W., Science 1997, 278, 497–499. 36 HAJDUK, P. J., DINGES, J., MIKNIS, G. F., MERLOCK, M., MIDDLETON, T., KEMPF, D. J., EGAN, D. A., WALTER, K. A., ROBINS, T. S., SHUKER, S. B., HOLZMAN, T. F., and FESIK, S. W., J. Med. Chem. 1997, 40, 3144–3150. 37 HAJDUK, P. J., ZHOU, M. M., and FESIK, S. W., Bioorg. Med. Chem. Lett. 1999, 9, 2403–2406. 38 HAJDUK, P. J., DINGES, J., SCHKERYANTZ, J. M., JANOWICK, D., KAMINSKI, M., TUFANO, M., AUGERI, D. J., PETROS, A., NIENABER, V., ZHONG, P., HAMMOND, R., COEN, M., BEUTEL, B., KATZ, L., and FESIK, S. W., J. Med. Chem. 1999, 42, 3852–3859.
39 HAJDUK, P. J., BOYD, S., NETTESHEIM, D., NIENABER, V., SEVERIN, J., SMITH, R., DAVIDSON, D., ROCKWAY, T., and FESIK, S. W., J. Med. Chem. 2000, 43, 3862–3866. 40 HAJDUK, P. J., GOMTSYAN, A., DIDOMENICO, S., COWART, M., BAYBURT, E. K., SOLOMON, L., SEVERIN, J., SMITH, R., WALTER, K., HOLZMAN, T. F., STEWART, A., MCGARAUGHTY, S., JARVIS, M. F., KOWALUK, E. A., and FESIK, S. W., J. Med. Chem. 2000, 43, 4781–4786. 41 ALLEN, K., BELLAMAINA, C., DING, X., JEFFERY, C., MATTOS, C., PETSKO, G., and RINGE, D., J. Phys. Chem. 1996, 100, 2605–2611. 42 NIENABER, V. L., RICHARDSON, P. L., KLIGHOFER, V., BOUSKA, J. J., GIRANDA, V. L., and GREER, J., Nat. Biotechnol. 2000, 18, 1105–1108. 43 ANDERSON, S. and CHIPLIN, J., Drug Discov. Today 2002, 7, 105–107. 44 GOODWILL, K., TENNANT, M., and STEVENS, R., Drug Discov. Today 2001, 6, S113–S118. 45 CARR, R. and JHOTI, H., Drug Discov. Today 2002, 7, 522–527. 46 BLUNDELL, T., JHOTI, H., and ABELL, C., Nat. Rev. Drug Discov. 2002, 1, 45–54. 47 STEWART, L., CLARK, R., and BEHNKE, C., Drug Discov. Today 2002, 7, 187–196. 48 BUSSOW, K., CAHILL, D., NIETFELD, W., BANCROFT, D., SCHERZINGER, E., LEHRACH, H., and WALTER, G., Nucleic Acids Res. 1998, 26, 5007–5008. 49 ARENKOV, P., KUKHTIN, A., GEMMELL, A., VOLOSHCHUK, S., CHUPEEVA, V., and MIRZABEKOV, A., Anal. Biochem. 2000, 278, 123–131. 50 ZHU, H., BILGIN, M., BANGHAM, R., HALL, D., CASAMAYOR, A., BERTONE, P., LAN, N., JANSEN, R., BIDLINGMAIER, S., HOUFEK, T., MITCHELL, T., MILLER, P., DEAN, R. A., GERSTEIN, M., and SNYDER, M., Science 2001, 293, 2101–2105. 51 MACBEATH, G. and SCHREIBER, S. L., Science 2000, 289, 1760–1763. 52 WARRIOR, U., BURNS, D., and KOFRON, J., Arrayed Compound Screening (ARCS): A Well-less Screening Platform with Advantages of Miniaturization. 2001. Eighth Annual HTT EXPO Advancing Drug Development 2001.
23
24 Chemical Microarrays for Small Molecule Ligand Screening
53 VENTER, J. C. et al., Science 2001, 291, 1304–1351. 54 FODOR, S. P.A., LEIGHTON, R. J., PIRRUNG, M. C., STRYER, L., LU, A. T., and SOLAS, D., Science 1991, 251, 767–773. 55 JACOBS, J. W. and FODOR, S. P., Trends Biotechnol. 1994, 12, 19–26. 56 FRANK, R., Tetrahedron 1992, 48, 9217–9232. 57 SCHARN, D., WENSCHUH, H., REINEKE, U., SCHNEIDER-MERGENER, J., and GERMEROTH, L., J. Comb.Chem. 2000, 2, 361–369. 58 MACBEATH, G., KOEHLER, A., and SCHREIBER, S. L., J. Am. Chem. Soc. 1999, 121, 7967–7968. 59 HERGENROTHER, P. J., DEPEW, K., and SCHREIBER, S. L., J. Am. Chem. Soc. 2000, 122, 7849–7850. 60 KURUVILLA, F. G., SHAMJI, A. F., STERNSON, S. M., HERGENROTHER, P. J., and SCHREIBER, S. L., Nature 2002, 416, 653–657. 61 SCHENA, M., SHALON, D., DAVIS, R. W., and BROWN, P. O., Science 1995, 270, 467–470. 62 LAM, K. S., LEBL, M., and KRCHNAK, V., Chem.Rev. 1997, 97, 411–448. 63 ULMAN, A., Chem. Rev. 1996, 96, 1533–1554. 64 MRKSICH, M. and WHITESIDES, G. M., Trends Biotechnol. 1995, 13, 228–235. 65 SIGAL, G. B., BAMDAD, C., BARBERIS, A., STROMINGER, J., and WHITESIDES, G. M., Anal. Chem. 1996, 68, 490–497.
66 MRKSICH, M., Journal of American Chemical Society 1995, 117, 12009–12010. 67 JUNG, L. et al., Langmuir 2000, 16, 9421–9432. 68 TSANG, S. K., CHEH, J., ISAACS, L., JOSEPH-MCCARTHY, D., CHOI, S. K., PEVEAR, D. C., WHITESIDES, G. M., and HOGLE, J. M., Chem. Biol. 2001, 8, 33–45. 69 HOUSEMAN, B. T., HUH, J. H., KRON, S. J., and MRKSICH, M., Nat. Biotechnol. 2002, 20, 270–274. 70 HODNELAND, C. D., LEE, Y. S., MIN, D. H., and MRKSICH, M., Proc. Natl. Acad. Sci.U. S. A 2002, 99, 5048–5052. 71 LAHIRI, J., ISAACS, L., GRZYBOWSKI, B., CARBECK, J., and WHITESIDES, G. M., Langmuir 1999, 15, 7186–7198. 72 LAHIRI, J., ISAACS, L., TIEN, J., and WHITESIDES, G. M., Anal. Chem. 1999, 71, 777–790. 73 MCDONNELL, J. M., Curr. Opin. Chem. Biol. 2001, 5, 572–577. 74 BAIRD, C. L. and MYSZKA, D. G., J. Mol. Recognit. 2001, 14, 261–268. 75 ABERY, J., Modern Drug Discovery 2001, 35–40. 76 RICH, R. L. and MYSZKA, D. G., J. Mol. Recognit. 2001, 14, 223–228. 77 RICH, R. L. and MYSZKA, D. G., J. Mol. Recognit. 2001, 14, 273–294. 78 MYSZKA, D. G., J. Mol. Recognit. 1999, 12, 390–408. 79 MORTON, T. A. and MYSZKA, D. G., Methods Enzymol. 1998, 295, 268–294. 80 COOPER M. A., Nat. Rev. Drug Disc. 2002, 1, 515–528.
1
Single Molecule Measurements of Molecular Motors Yoshiharu Ishii, and Toshio Yanagida
Single Molecule Processes Project, ICORP, JST, Osaka, Japan
Originally published in: Molecular Motors. Edited by Manfred Schliwa. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52730594-0 Abstract
Single molecule detection was developed to measure the unique operation of molecular motors. The function of molecular motors has been ascribed to the movement of a single motor molecule along a partner protein track. Manipulating single protein tracks and single motors and imaging single molecules has allowed the behaviour of such unitary machines to be measured. The mechanical responses to the energy input from the hydrolysis of ATP molecules have been determined and the underlying mechanism explored. The results indicate that the movement of molecular motors is driven by thermal motion rather than structural changes occurring in the motor molecules. Thermal Brownian motion must be biased in one direction. In this article, we summarize the methods and results of single molecule measurements of molecular motors. 1 Introduction
Almost 30 years after the proposal of the sliding filament theory of muscle, the movement of actin filaments over myosin immobilized on coverslips was first visualized under a microscope [1, 2]. This was accomplished by visualizing actin filaments labeled with phalloidin–tetramethylrhodamine by fluorescence microscopy [3]. Phalloidin, a mushroom toxin, binds stoichiometrically to actin molecules and serves to stabilize the actin molecules in the filament form at low concentrations, at which point the motility assay and the single molecule measurements are performed. The in vitro assay has proven to be a model system in which both biochemical and mechanical properties of the interaction of myosin and actin can be studied using purified proteins. This system contrasts with conventional measurement systems which use single muscle fibers and proteins in solution. The mechanical properties can be measured in a muscle fiber but the system is not Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Single Molecule Measurements of Molecular Motors
defined biochemically. The biochemical properties can be measured in solution, while the mechanical properties cannot be studied. The sliding movement of actin and myosin can be basically ascribed to the operation of a unit machine that contains a single myosin molecule and an actin filament. In the in vitro assay, the smooth movement of an actin filament involves interactions with many myosin molecules. Thus, it is not possible to determine which myosin molecules interact with the actin filament and when and where ATP molecules are hydrolyzed. By manipulating single actin filaments and single myosin molecules, it has been possible to study the interaction between myosin and actin in a controlled manner. Examining the single unit machine offers many advantages over ensemble measurements. Myosin and actin function dynamically. Myosin undergoes a series of chemical reactions during ATP hydrolysis. It changes the interaction with actin cyclically in association with these chemical reactions. In addition, all biomolecules are inevitably exposed to thermal perturbations. These processes are stochastic. In ensemble measurements involving large number of molecules, only average values can be determined and the dynamic changes, which are important in relation to function, are hidden and obscured in the averaging process. Single molecule measurements have overcome these difficulties and have allowed the dynamic behavior of protein molecules to be measured [4].
2 Manipulation of Actin Filaments
Actin filaments can be captured and manipulated by a laser trap [5] or a glass microneedle [6]; Fig. 1. A laser trap is a technique used to manipulate protein molecules using the laser in a non-invasive manner. When a small bead is irradiated by a focused laser, the incident laser beam is deflected on the surface and radiation pressure is exerted. The total radiation pressure exerted on the entire sphere is always directed towards the focal point of the laser, so the beads are attracted to the focal point and follow the focal position of the laser when it is moved (laser trap). The beads also behave as if they are connected via a spring; i. e. the total trap force exerted on the beads is proportional to the distance from the focal point when their position deviates from the focal point (nanometry). Utilizing these properties of the laser trap, an actin filament can be manipulated via two beads attached at either end by a double trap, and the movement generated by myosin can be measured [5]; Fig. 1a. For the mechanical measurements, the actin filament is suspended taut and brought into contact with myosin molecules sparsely fixed on the glass surface. The actin filaments with the beads at their ends move against the trap force in the presence of ATP. Thus, the displacement of the actin filaments generated by myosin can be measured by monitoring the changes in the position of the beads attached to them. The force generated can be estimated if the stiffness of the laser trap is known. The movement generated by single myosin molecules could be detected if the number of myosin molecules on the glass surface was reduced to one. The stiffness of the laser trap is dependent on the power of the laser. For a
3 Nanometry of Actin Filaments 3
Fig. 1 Manipulation of actin filaments. (a) An actin filament is manipulated via two beads attached at each end. (b) A microneedle is attached at one end of an actin filament to is attached at one end of an actin filament to manipulate it. For nanom-
etry, the displacement of the actin filament produced by single myosin molecules can be measured by recording the displacement of the beads or the tip of a microneedle with nanometer accuracy.
bead 1 µm in diameter, the trap force is several tens of pN and the stiffness is ∼0.1 pN nm−1 when the power of the laser is several hundred mW. This stiffness is sufficient to measure the movement caused by a single myosin molecule. The displacement generated with a single ATP molecule is approximately 10 nm. Thus, the force will be less than 8 pN because the free energy released from the hydrolysis of a single ATP molecule is ∼80 pN nm−1 . Microneedles are another method used to manipulate actin filaments. Thin glass microneedles can be prepared and attached to the ends of the actin filaments [6] Fig. 1b. The premise of the measurement using microneedles is the same as that for the laser trap. When actin filaments move as a result of the interaction with myosin, the microneedle bends and at the same time a restoring force is applied to the microneedle resisting the movement. The strength of the force is proportional to the displacement, with the stiffness of the microneedle as a proportional constant. When the force caused by myosin is balanced with the restoring force applied by the microneedle, the movement ceases. The displacement of the actin filaments can be measured and the force can be estimated when the stiffness of the microneedle is known. The stiffness of a microneedle is dependent on its shape. Long, thin glass microneedles are more flexible. A microneedle 50 µm in length and 0.3 µm in diameter, which is the typical length used for the measurements, gives a stiffness of 1 pN nm−1 .
3 Nanometry of Actin Filaments
Given that the displacement of the molecular motors is in the order of nanometers, the change in the position of the beads or the tip of the microneedle must be measured with nanometer accuracy. The image of the bead or the tip of the microneedle
4 Single Molecule Measurements of Molecular Motors
can be enlarged approximately 1000 times and projected onto a pair of photodiodes. The change in the position of the bead or tip of the microneedle results in a change in the difference of the intensities of each photodiode. Therefore, the difference in signals between multiple photodiodes is measured to estimate the displacement of the bead or microneedle. Thus the displacement can be determined with nanometer accuracy, which is much less than the diffraction limit. The diffraction limit is the minimum distance that two spots can be apart to be resolved as separate in the light microscope. As demonstrated in these measurements, it is possible to trace a moving spot with an accuracy greater than the diffraction limit. In these measurements, the displacement of the actin filaments is determined by measuring the displacement of a marker such as a bead or microneedle. If the connection between the myosin molecules and the beads or the microneedle is not tight and some compliant linkages exist between them, damping of the actual displacement of myosin and thermal fluctuations can occur. In addition, the position of the bead or microneedle is subject to thermal fluctuation. The mean square displacement x2 of a bead or microneedle can be related to the stiffness of the laser trap or microneedle, K, by the equation K x2 /2 = kB T/2. According to this equation, the root mean square distance is 6.43 nm for a stiffness of 0.1 pN nm−1 and 2.03 nm for a stiffness of 1 pN nm−1 . To measure the displacement with nanometer accuracy, it is important to make the system as stiff as possible. The beads or microneedle may respond slowly to the fast movements of the protein molecules. This response determines the time resolution. The response time of the system, τ , is related to the friction coefficient β and the stiffness K as τ = β/K. The friction coefficient depends on the shape and size of the probe. For a long and flexible glass microneedle with a diameter of 1 µm and a length of 1 cm, (stiffness = 0.01 pN nm−1 ), for example, the temporal resolution is only 1 s. To increase the temporal resolution, a short thin glass microneedle is necessary. For a microneedle with a length of 50 µm and a diameter of 0.3 µm (which gives a stiffness of 1 pN nm−1 ), the temporal resolution can be increased to almost 1 ms. In contrast, the viscous drag on a bead is small because a bead is smaller than a microneedle. For a bead 1 µm in diameter, the temporal resolution is as good as 0.1 ms. In addition to this advantage of time resolution, the laser trap experiments are much easier to carry out than microneedle experiments. The stiffness of the laser trap can be easily changed by changing the laser power. In the microneedle measurements, microneedles must be individually handcrafted and the stiffness of each individual microneedle must be determined.
4 Movement of Actin Filaments Caused by Single Myosin Molecules
The actin filament repeats steps and returns to its original position, while myosin repeats the binding to and dissociation from the actin filament (Fig. 2). The profile of the displacement trace is dependent on the ATP concentration. At high concentrations of ATP (in the mM range) the actin filaments showed rapid transient
4 Movement of Actin Filaments Caused by Single Myosin Molecules 5
Fig. 2 Time course of the displacement of actin filaments produced by single myosin molecules. Upper trace: the white line represents the raw data and the black line the same data after filtration with a lowpath filter (20 Hz bandwidth). The stiffness was calculated from the variance of bead fluctuations (bottom panel).
movement. As the ATP concentration decreased to the order of µM, the duration of the displacement became longer, while the size of the displacement remained the same. At very low ATP concentrations (in the nM range), individual events caused by a single myosin and ATP molecule can be identified. The duration time of the displacement increased from ∼0.2 s at 1 µM ATP to ∼20 s at 10 nM ATP. The ATP concentration-dependent trace of the displacement has been interpreted as the mechanical step in relation to the chemical reaction of ATP hydrolysis. The ATP binding step is dependent on the ATP concentration. The time for ATP molecules to bind is longer at low ATP concentrations and much shorter at high ATP concentrations. This is thought to be the mechanism by which single molecules ‘learn’ the ATP concentration of the system. The binding of ATP to myosin triggers the dissociation of myosin from the actin filament resulting in the bead returning to its original position. Myosin actually moves the actin filament when the products of ATP hydrolysis, ADP and Pi, are released. The relationship between the mechanical events and the chemical reaction has been confirmed directly by simultaneously measuring the movement of the motor and the turnover of ATP molecules, as outlined below. One important parameter that the mechanical measurements of myosin have provided is the size of the myosin step (step size) caused during the hydrolysis of a single ATP molecule [7, 8]. The position of the bead fluctuated thermally around a mean position, which was dependent on the stiffness of the trap as mentioned above [9]. Once myosin was strongly bound to the actin filament, the stiffness of the system increased and the thermal motion of the beads greatly decreased. By measuring the variance of the displacement trace, it is therefore possible to determine when myosin interacts with actin. However, the size of the step could
6 Single Molecule Measurements of Molecular Motors
not be determined for individual displacements, because the starting position of the displacements cannot be measured. The displacement relative to the mean position of the bead has a wide distribution, reflecting the thermal fluctuation of the bead. Using this technique only mean displacements can be obtained. The size of the displacement caused by a single myosin molecule upon ATP hydrolysis varies among research groups, the reported values ranging from 5 to 20 nm [5, 9–11]. Many experimental factors can affect the results of the step size. In the measurements, myosin molecules are immobilized on the glass surface. These immobilization procedures may affect the activity of myosin. Direct attachment of the myosin head to the glass surface may alter its activity as suggested by the results of in vitro motility assays [12]. To avoid the direct artificial attachment of the head portion of the myosin to the glass surface, the head is placed in a myosin–myosin rod co-filament, which can be prepared so as to contain only one or two heads in a single filament 5–8 µm in length. The use of such filaments has an additional advantage. It has been reported that the step size is dependent on the angle between the myosin and the actin filament [11]. The step size is maximal when the two filaments are parallel, but approaches zero when the two filaments are positioned at right angles. In the myosin–myosin rod filament, the orientation of the myosin head relative to the actin filament is known. In the situation where the myosin head molecules are placed in a random orientation, the displacement of actin filaments would range between the maximum value at an angle of 0◦ and the minimum value at an angle of 90◦ [13]. The average value would be less than the maximum value.
5 Visualization of Single Molecules
Biomolecules including protein molecules and small chemical compounds such as ATP are much smaller than the diffraction limit. It is impossible to visualize them using conventional light microscopy. In order to visualize such small molecules they are labeled with fluorescent probes and then the fluorescence is monitored. In 1995 a single fluorophore attached to a biomolecule in aqueous solution was first visualized after the background noise was greatly reduced by eliminating the noise from different parts of the microscope [14]. Many microscopes have also been developed to decrease the background noise. The basic concept behind these microscopes is that only local regions are illuminated to excite the molecules of interest. To visualize the behavior or motion of biomolecules at work, realtime imaging in aqueous solution is also a prerequisite. Total internal reflection fluorescence microscopy (TIRFM) is a technique which illuminates only the surface area between the solution and the coverslip, allowing both local illuminating and real-time imaging [14]; Fig. 3). When incident laser light is above the critical angle, the light is totally reflected at the interface with the aqueous solution, and an evanescent field is created 100 to 200 nm from the surface of the aqueous solution. Thus, only the fluorescent molecules positioned near the glass surface are excited
6 Visualization of ATP Turnover and Mechano-chemical Coupling
Fig. 3 Total internal reflection fluorescence microscopy. An evanescent field is created within 100–200 nm from the interface between the glass surface and the solution when the laser light strikes at an angle greater than the critical angle. Only molecules in the vicinity of the glass surface are excited and emit fluorescence.
and emit fluorescence. Large numbers of fluorescent molecules outside the region are not excited and therefore the background noise is greatly reduced. A single fluorophore is imaged as a spot, but the size of the spot appears to be enlarged to several hundreds of nanometers due to diffraction, even though the size of the molecule is in the order of a few nanometers. Because of this, spatial resolution is limited. While the fluorophores are continuously illuminated, single fluorophores are repeatedly excited and emit fluorescence within nanoseconds. A single spot therefore represents the integration of a large number of photons emitted from a single fluorophore. Single fluorophores can suddenly cease to emit fluorescence after a certain amount of exposure to irradiation, because they become irreversibly damaged as a result of the photochemical reaction. The average time for a photobleaching process to occur depends on the power of the laser. When a laser with higher power is used, photobleaching occurs in a short time, although the fluorescence is initially brighter. The single stepwise drop of the fluorescence is a good test to confirm that the fluorescent spot was from a single fluorophore. Fluorescence intensity can also be used to determine whether the fluorescence spots arise from single molecules. Histograms of the fluorescence intensity from single spots show a large peak corresponding to multiples of the minimum intensity of a single molecule.
6 Visualization of ATP Turnover and Mechano-chemical Coupling
Fluorescent spots remain in the same position when single biomolecules are immobilized on a glass surface and their motion can be traced only when they move slowly (not greater than tens of µm s−1 ). Rapidly moving fluorophores cannot be
7
8 Single Molecule Measurements of Molecular Motors
Fig. 4. Imaging of the turnover of individual fluorescent ATP molecules produced by the interaction of single myosin molecules in a myosin filament with a single actin filament. (Top) a picture for an actin filament trapped by a laser via two beads attached at both ends and a myosin–myosin-rod co-filament. (Second) fluorescence image of myosin and actin filament. The lower panels show sequential images of an association and dissociation event of ATP (ADP) with the same myosin head.
traced and only contribute to the background noise. The binding of fluorescent ATP to the protein molecules can be visualized in single molecule imaging, because the mobility of bound fluorescent ATP is different from that free in solution [14]; Fig. 4). ATP can be chemically modified with a fluorescent probe. The fluorescent ATP molecules can be visualized as spots when bound to immobilized myosin. However, fluorescent molecules cannot be observed when they are free in solution due to the rapid Brownian movement. The time span between binding and dissociation has a characteristic distribution, reflecting stochastic behavior. The histogram of the time course can be fitted to an exponential curve and the decay time (dissociation rate) can be compared with ATPase rates measured in solution at saturating concentrations of ATP. This assay allows the measurement of ATP binding at the single molecule level. However, the range of concentrations of fluorescent molecules is limited, because all fluorescent ATP molecules in solution contribute to the background noise even when evanescent illumination is used. Typically fluorescent ATP is used at concentrations between 10 and 20 nM. Of course, the binding affinity must be high enough to observe binding at such low concentrations. Given that both mechanical and chemical processes can be measured at the single molecule level, the coupling between them can be directly determined by simultaneous measurements [15]. The direct determination of the coupling is possible
6 Visualization of ATP Turnover and Mechano-chemical Coupling
Fig. 5 Direct determination of the coupling between chemical and mechanical events. The ATP turnover monitored by the fluorescence of Cy3-ATP (bottom panel) and the movement of myosin (top panel) were simultaneously measured at the single molecule level. The dissociation of myosin
from actin filaments (decrease in the displacement) is coupled with the binding of ATP (increase in fluorescence) and the generation of movement (increase in the displacement) is coupled with the dissociation of the products of the ATP hydrolysis reaction (decrease in fluorescence).
only when single molecule detection techniques are used, as individual chemical and mechanical events are not averaged. The data of the simultaneous measurements of the biochemical and mechanical events at low time resolution (∼30 ms) have shown that the chemical reaction and the mechanical events appear to be coupled (Fig. 5). When ATP binds, myosin dissociates from actin and myosin generates the step movement upon ADP dissociation. The coupling between the binding of ATP and the dissociation of myosin has been confirmed at higher time resolution (< 10 ms). However, a time difference between the ADP release and the generation of movement became apparent. This difference had a particular distribution at higher time resolution. In half the cases, the generation of the displacement occurred at the same time as the release of ADP, while in the other half the mechanical displacement was delayed by < 0.5 s after the release of ADP. It should be noted that it is not possible to experimentally distinguish photobleaching from the dissociation of fluorescent ADP in a single event. In both cases, the fluorescent spots disappear suddenly. The laser power can be decreased to
9
10 Single Molecule Measurements of Molecular Motors
decrease the rate of photobleaching and the probability that the photobleaching occurs before the dissociation takes place becomes lower. In these experiments the turnover of fluorescent ATP by myosin and the decay time for the photobleaching of Cy3-ATP either directly attached to the glass or bound to myosin in the presence of vanadate, can be compared with the turnover time. Thus, the probability that photobleaching occurs prior to nucleotide release can be reduced to a few percent. The free energy released at the step where the products of ATP hydrolysis are dissociated, must be stored until the generation of the displacement. The storage of the energy, or a prolonged memory effect, can be explained with the existence of two or more different conformational states, between which the protein undertakes slow transitions. Evidence for these multiple conformational states and spontaneous slow transitions between them has come from the studies of single molecule spectroscopy for many proteins including myosin [16]. Protein molecules have several local minimal states with high-energy barriers between them.
7 Visualization of the Movement of Single Kinesin Motors
The visualization of single molecules also allows the observation of the movement of single fluorescently-labeled motor molecules along a protein track. For visualization, the motor molecules must move for long distances without dissociating. Kinesin, a microtubule-based motor, is one such processive motor. The movement of single molecules of fluorescently-labeled kinesin along microtubules has been visualized [17]. A single kinesin molecule moves for ∼500 nm in 1 ∼2 s. The velocity is similar to that of microtubules moving over kinesin molecules attached to a coverslip. The processive movement of kinesin has been explained on the basis of its double-headed structure; one of the two heads moves one step while the other head stays attached to the microtubule. The two heads operate cooperatively and the steps alternate. If this is the case, then single-headed kinesin should not be able to move for long distances. However, single molecule imaging showed that some of the single-headed kinesin molecules did in fact move along the microtubules for long distances, indicating that two heads are not essential for processive movement (Fig. 6). The data analysis of the time course of the displacement has shown that the movement of single-headed kinesin resulted from diffusion and was biased in one direction [18, 19]. For these single-headed kinesin molecules it has been suggested that additional interactions with microtubules occur so that the motor protein molecules are kept on the track of microtubules during movement. KIF1A, a single-headed motor of the kinesin superfamily, has a lysine-rich positivelycharged loop (the K loop) that is thought to interact with negative charges on the C-terminus of microtubules. Deletion mutants of single-headed kinesin also showed diffusion-driven movement and only traveled small distances of 50 nm in less than 1 s. The travel distance and interaction time were enhanced to 500 nm and 2 s, respectively, when they were fused with a peptide called biotin-dependent
7 Visualization of the Movement of Single Kinesin Motors 11
Fig. 6 Movement of single kinesin molecules along microtubules. The trace of the fluorescently-labeled kinesin was recorded and the displacement along microtubules is plotted as a function of time. K351 is a kinesin construct which lacks the
C-terminal coiled-coil region (single headed kinesin) and biotin-dependent transcarboxylase (BDTC) is fused to K351 (K351BDTC). K411 is a double headed kinesin which shows processive movement over a long distance in one direction.
transcarboxylase (BDTC) which interacts with microtubules. BDTC does not affect the movement of two-headed kinesin. In the images of the movement of single kinesin molecules, we have frequently observed more than two kinesin molecules moving simultaneously along a single microtubule. The images have shown that there is a tendency for several kinesin molecules to move together suggesting that there is some communication between kinesin molecules via the microtubules [20]. Microtubules may be ‘activated’ by the binding of a single kinesin molecule to assume a high affinity state, resulting in cooperative binding of kinesins. However, although the movement of single molecules has been visualized, it has not been possible to observe the unitary movement of kinesin due to limitations in the accuracy of single molecule imaging techniques. The experimental accuracy of imaging is restricted to several hundred nm due to the diffraction limits and the video rate of 1/30 of a second. These limitations have been overcome using a laser trap system where it has been possible to observe the stepwise movement
12 Single Molecule Measurements of Molecular Motors
of kinesin. The use of fluorescent beads instead of single fluorophores enhanced the contrast of the images. The laser trap ‘cooled’ the thermal motion of the beads and the use of a pair of photodiodes allowed the measurement of the displacement to be undertaken with nanometer accuracy [21]. Kinesin attached to a bead was brought into contact with microtubules in the presence of ATP. The number of kinesin molecules attached to a bead was estimated statistically. The distribution of kinesin molecules on beads was consistent with Poisson statistics as indicated by the probability of a moving bead at various molar ratios of kinesin to beads. The average number of kinesin molecules on a bead used for the experiments and the probability that more than two kinesin molecules attached to a bead can be estimated. Kinesin moves along microtubules in regular steps of 8 nm (Fig. 7). The steps are primarily in the forward direction and only occasionally in the backward direction. As kinesin molecules moved against an increased load produced by the laser, the average time of a step as well as the number of the backward steps increased. Eventually, at a stall force of 7–8 pN the number of steps in the forward and backward directions became equal. When this occurred the molecule dissociated from the microtubules and returned to its original position. Both forward and backward movements occurred at the same chemical state; the single cycle of the ATP hydrolysis was coupled to either forward or backward steps and the direction could be determined stochastically. Thus, the thermal movements, which have been clearly observed for single-headed kinesin, are the driving force even for the stepwise movement of kinesin.
Fig. 7 Stepwise movement of kinesin attached to beads. The displacement of kinesin can be monitored by the laser trap method. The displacement trace shows 8-nm steps. Backward movement was observed at high load.
8 Visualization of the Processive Movement of Single Myosin Motors
Substeps of the 8-nm step of kinesin have been uncovered using laser trap measurements with small beads [22]. Decreasing the size of the beads improved the time resolution, although the images of the bead on the detector became fainter. Dark field illumination was used to brighten the images. The 8-nm step could be resolved into fast and slow substeps, each corresponding to a displacement of ∼4 nm.
8 Visualization of the Processive Movement of Single Myosin Motors
In the case of muscle myosin, it was impossible to detect the sliding movement with the imaging techniques described above, because muscle myosin readily dissociates from actin filaments at every cycle of the ATPase reaction. Recently, different types of myosin behavior have been discovered. Myosin V, VI and IX were found to move processively and it has been possible to visualize the movement of single myosin molecules when they are fused to green fluorescent protein (GFP; Fig. 8). Myosin VI, for example, moves 240 nm along a single actin filament in 0.44 s [23]. Interestingly, the directionality of myosin VI is opposite to the direction of other myosins [24]. One of the ends of the actin filament (pointed end) was fluorescently marked and the direction of the movement of myosin could be confirmed for individual myosin molecules.
Fig. 8 Processive movement of myosin V and myosin VI. The movement of myosins V and VI was visualized by means of green fluorescent protein (GFP). Sequential images indicate the movement of these
myosin molecules. Myosin V and myosin VI move in opposite directions. The pointed end of the actin filament was fluorescently labeled.
13
14 Single Molecule Measurements of Molecular Motors
Given that myosin V and VI are processive motors, displacement records were obtained by trapping these myosins like kinesin, as well as by double trapping the actin filaments. A consecutive step movement was observed which was consistent with other processive motors. The advantage of processive myosins is that it is possible to determine the size of individual steps without disturbance from thermal fluctuations, with the exception of the first step. Measuring the step size of myosin V and VI allowed the testing of a model developed on the basis of structural studies of myosin heads. In this so-called lever arm model, a long α-helix (the neck domain) connecting the motor domain and the tail domain is thought to rotate relative to the motor domain, thus acting as a lever [25]. This model predicts that the size of the displacement would be proportional to the length of the neck domain, or lever arm. Myosin V has a long neck domain, which contains six repeats of the calmodulin or light chain binding site as compared to the two binding sites in muscle myosin. The step size of myosin V was shown to be 36 nm, while that of muscle myosin is 5 ∼20 nm, a finding in agreement with the lever-arm model [26]. However, a deletion mutant, in which the neck domain is shortened to only one calmodulin-binding site, performed the same large step as the original myosin V motor [27]; Fig. 9. A similar result has been obtained using myosin VI. Myosin VI in its natural form has only one calmodulin-binding site and also produces a large step [23, 28]. These observations of large steps in myosins with a short lever arm cannot be explained by the lever-arm model. A step size of 36 nm matches the helical pitch of the two-stranded actin filament, indicating that the architecture
Fig. 9 The step movement of myosin V and its deletion mutant. The displacement was measured by a double trapping method. (Top) wild type myosin V. (Bottom) the mutant was obtained by deleting a large part of the neck domain.
9 Manipulation of Single Myosin Molecules with a Scanning Probe and Nanometry 15
of actin filaments plays an important role in the movement of myosin. Thus, the relationship between structure and function can be studied by combining single molecule detection with cell biology and protein engineering. Even if the structural change, i. e. swinging of the lever arm, occurs as predicted based on the X-ray crystallographic studies, these studies indicate that the structural change does not drive the mechanical movement. The structural change of the neck domain may play another role, for example in regulating the kinetic steps as a strain sensor.
9 Manipulation of Single Myosin Molecules with a Scanning Probe and Nanometry
In order to measure the details of the displacement of motors, we have developed a new technique in which single muscle myosin molecules instead of actin filaments are captured and manipulated [29]; Fig. 10. A single myosin head fragment, S1, was attached to a scanning probe and the displacement of the probe along immobilized actin filaments was measured. The probe was brought into contact with actin bundles. The use of actin bundles instead of filaments makes the manipulation for the interaction easier and the system stiffer. The stiffness of this system was 1 pN nm−1 compared to a stiffness of 0.2 pN nm−1 for the laser trap system, given the root mean square displacement of 2.0 nm for the scanning probe system as compared to 4.5 nm for the laser trap system. The response time was < 0.4 ms. This improvement was critical for detecting the details of the movement of single myosin molecules. To capture single myosin S1 molecules, they are marked by attaching a fluorescent dye to them. The fluorescently-labeled S1 molecules were sparsely placed on a glass slide and scanned using a probe. As a scanning probe, a whisker made from ZnO and 10 nm in diameter with sharp edges, was attached to the tip of a thin glass needle. A single myosin S1 molecule was caught by the tip of the scanning probe.
Fig. 10. Measurement of 5.5-nm steps in the displacement generated by myosin S1. A single myosin S1 molecule was attached to the tip of a cantilever and brought into contact with actin filaments. Given that this system is stiffer than the laser trap system, steps in the rising phase of the displacement record are identified. Backward steps have been occasionally observed, as indicated by an arrow.
16 Single Molecule Measurements of Molecular Motors
Confirmation that only a single S1 molecule had been captured, was achieved by monitoring the fluorescence intensity and the photobleaching behavior. The possibility that non-fluorescent myosin molecules and molecules which were labeled but had already been photobleached, were attached in addition to a single fluorescent myosin molecule cannot be ruled out. However, the probability that additional non-fluorescent molecules were attached was estimated to be marginal based on the labeling ratio of myosin after all experimental precautions had been taken. Myosin S1 was attached tightly to a probe via the biotin–avidin system. Given that this molecular glue was attached to the light chain of S1, which is a considerable distance from both the catalytic domain and the binding domain for actin on the myosin head, the attachment to the probe is thought not to interfere with motor functions. The displacement record of muscle myosin measured with this system was indistinguishable from the trace of actin filaments obtained from double trap measurements, if the data were plotted on the same time scale (Fig. 10). The rising phase of the displacement could be expanded, because the time resolution was considerably better for the scanning probe method. Within a single displacement during the hydrolysis of a single ATP molecule, there was stepwise movement. In a single displacement, there were one to five regular steps, giving rise to a total displacement of ∼5 to 30 nm. The average number of steps in a single displacement was three, corresponding to a total displacement of 13 nm. The size of a single step was 5.5 nm, coinciding with the interval between adjacent actin monomers on one strand of an actin filament.
10 Biased Brownian Movement
In addition to the variation in the total number of steps in a single displacement, evidence has been obtained to suggest that the rising phase of the displacement is stochastic. The interval time between steps also varied and was independent of the ATP concentration, whereas the time between the displacements was ATP concentration-dependent. The dwell time between steps was dependent on the temperature so that the 5.5-nm steps could be distinguished more clearly at lower temperatures. Approximately 10% of the total number of steps was in the backward direction. Backward movements increased with an increase in the load. In the experimental system used, the load applied could be increased by increasing the stiffness of the microneedle. At high loads, the number of steps in a single displacement was smaller and the interval time between the steps was larger, while the size of the steps remained the same. Thus, the reciprocal of the interval time between the steps decreased with an increase in the load. This relationship is basically similar to the force–velocity relationship in muscle, indicating that the unit machine consisting of a single myosin molecule and a single actin filament is sufficient to explain the mechanical properties of muscle.
11 Conclusions 17
The available data indicate that the sliding movement of myosin is driven by thermal movement. However, thermal Brownian motion is random. The direction of the movement must be biased and the energy released from ATP hydrolysis is used to bias it. Otherwise, the second law of thermodynamics would be violated. In both myosin and kinesin, the directionality of the movement can be explained by a difference in the activation energy between forward and backward movement. The probability of movement is proportional to exp(−/kB T) when is the activation energy, kB is Boltzmann’s constant, and T is the absolute temperature. The ratio of the probability for forward and backward movement r = nforward /nbackward can be written by the difference in the activation energy between forward and backward movement, as r = exp(−/kB T), where = forward – backward . Analysis of the data shows that the difference in the activation energy was 2 ∼ 3 kB T for myosin and 5 kB T for kinesin at zero load. A model simulation further shows that the directionality ratio can reach 9:1 when the potential difference between neighboring actin monomers is only 10% of the energy barrier between neighboring actin monomers. The question arises as to how this direction-dependent potential profile is created. One interesting idea is that the binding of a myosin molecule may activate several neighboring actin monomers in a direction-dependent manner, so that the myosin molecule is attracted more to one direction than to the other. In the case of the kinesin–microtubule system, such cooperativity in the binding was demonstrated based on the imaging of the binding of single kinesin molecules along microtubules, as described above. The binding of kinesin molecules was induced to a greater extent in front of the moving kinesin molecule. In this model, a statistically biased potential is assumed to cause the biased motion. Another explanation for biased Brownian movement is based on dynamic fluctuations of the potential. Myosin and actin dynamically change their structure and the interactions that occur between them. It has been shown that the movement of Brownian particles under the asymmetrical potential is biased by fluctuating potential unless the fluctuation is random. There is an optimum frequency at which the movement is biased. In the model where a structural change results in movement, different structures of myosin are assumed to correspond to the different states of the ATPase cycle and the structural change is associated with the transition between these states. In contrast, in the biased Brownian model, the movement is thought to occur in one state (ADP.Pi state) in the cycle of ATP hydrolysis. Thus, it is possible to simulate the movement of myosin.
11 Conclusions
New findings in the field of biosciences have been frequently initiated by the development of new techniques. Single molecule detection has allowed us to monitor the dynamic behavior of biomolecules, which is hidden in the average values of
18 Single Molecule Measurements of Molecular Motors
ensemble measurements where it is not possible to measure the sequence of the mechanical and biochemical events. Single molecule detection techniques have completely changed the style of research in this field. Combining this technique with the techniques of protein engineering and cell biology, the possibility for new types of measurements has increased. However, many unanswered questions still remain and additional new questions have been raised. There is still room for further improvement of established techniques and the development of new ones. For example, fluorescent dyes with greater stablity would be very useful. In addition, experimental systems that have increased stiffness would increase the experimental resolution for nanometry. Molecular motors are typical molecular machines. Although protein molecules of nanometer size are exposed to thermal agitation, they function very efficiently. There is increasing evidence to suggest that these molecules do not operate against thermal perturbation, but rather harness thermal energy to perform their functions. We have started measuring the behavior of molecular machines in which the biomolecules assemble and interact. Single molecule detection will be the technique of choice to unveil the underlying mechanisms.
References 1 HARADA Y., K. SAKURADA, T. AOKI, D. D. THOMAS, and T. YANAGIDA. 1990. Mechanochemical coupling in actomyosin energy transduction studied by in vitro movement assay J. Mol. Biol. 216: 49–68. 2 KRON, S. J. and J. A. SPUDICH. 1986. Fluorescent. actin filaments move on mysin fixed to a glass surface. Proc. Natl. Acad. Sci. USA 83: 6272–6276. 3 YANAGIDA T., M. NAKASE, K. NISHIYAMA, and F. OOSAWA. 1984. Direct observation of motion of single F-actin filaments in the presence of myosin. Nature 307: 58–60. 4 ISHII Y. and T. YANAGIDA. 2000. Single molecule detection in life science Single Mol. 1: 5–16. 5 FINER J. T., R. M. SIMMONS, and J. A. SPUDICH. 1994. Single myosin molecule mechanics: Piconewton forces and nanometre steps. Nature 368: 113–119. 6 ISHIJIMA, A., H. KOJIMA, H. HIGUCHI, Y. HARADA, T. FUNATSU, and T. YANAGIDA. 1996. Multiple and single-molecule analysis of the actomyosin motor by nanometer-piconewton manipulation
7
8 9
10
11
12
with a microneedle: Unitary steps and forces. Biophys. J. 70: 383–400. ISHII, Y., S. ESAKI, and T. YANAGIDA. 2002. Ex perimental studies of the myosin–actinmotor. Appl. Phys. A 75: 325–330. SPUDICH J. A. 1994. How molecular motors work. Nature 372: 515–518. MOLLOY J. E., J. E. BURNS, J. KENDRICK-JONES, R. T. TREGEAR, and D. S. C. WHITE. 1995. Movement and force produced by a single myosin head. Nature 378: 209–212. GUILFORD, W. H., D. E. DUPUIS, G. KENNEDY, J. B. PATLACK, and D. M. WARSHAW. 1997. Smooth muscle and skeletal muscle myosin produce similar unitary forces and displacement fully equipped with factors essential for motor function. Biochem. Biophys. Res. Commun. 230: 76–80. TANAKA H., A. ISHIJIMA, M. HONDA, K. SAITO, and T. YANAGIDA. 1998. Orientation dependence of displacements by a single one-headed myosin relative to the actin filament. Biophys. J. 75: 1886–1894. IWANE, A. H., K. KITAMURA, M. TOKUNAGA, and T. YANAGIDA. 1997. Myosin
References
13
14
15
16
17
18
19
20
21
subfragment-1 is Biophys. J. 72: 1006–1021. YANAGIDA, T., K. KITAMURA, H. TANAKA, A. H. IWANE, A. ISHIJIMA, and S. ESAKI. 2000. Single molecule analysis of the actomyosin motor. Curr. Opin. Cell Biol. 12: 20–25. FUNATSU T., Y. HARADA, M. TOKUNAGA, K. SAITO, and T. YANAGIDA. 1995. Imaging of Single Fluorescent Molecules and individual ATP turnovers by single myosin molecules in aqueous solution. Nature 374: 555–559. ISHIJIMA A., H. KOJIMA, T. FUNATSU, M. TOKUNAGA, H. HIGUCHI, and T. YANAGIDA. 1998. Simultaneous measurement of chemical and mechanical reaction. Cell 92: 161–171. WAZAWA T., Y. ISHII, T. FUNATSU, and T. YANAGIDA. 2000. Spectral sluctuation of a single fluorophore confugated to a protein molecule. Biophys. J. 78: 1561–1569. VALE R. D., T. FUNATSU, D. W. PIERCE, L. ROBMERG, Y. HARADA, and T. YANAGIDA. 1996. Direct observation of single kinesin molecules moving along microtubules Nature 380: 451–453. INOUE Y., A. H. IWANE, T. MIYAI, E. MUTO, and T. YANAGIDA. 2001 Motility of single one-headed kinesin molecules along microtubules. Bio phys. J. 81: 2838–2850. OKADA Y. and N. HIROKAWA. 1999. A processive single-headed motor: Kinesin superfamily protein KIF1A. Science 283: 1152–1157. MUTO, E. 2001. Is microtubule an active participant in the mechanism of motility? Biophys. J. 80: 513a. SVOBODA, K., C. F. SCHMIDT, B. J. SCHNAPP, and S. M. BLOCK. 1993. Direct observation of kinesin stepping by optical trapping interferometry. Nature 365 721–727.
22 NISHIYAMA, M., E. MUTO, Y. INOUE, T. YANAGIDA, and H. HIGUCHI. 2001. Substeps within the 8-nm step of the ATPase cycle of single kenisin molecules. Nature Cell Biol. 3: 425–428. 23 NISHIKAWA S., K. HOMMA, Y. KOMORI, M. IWAKI, T. WAZAWA, A. H. IWANE, J. SAITO, R. IKEBE, E. KATAYAMA, T. YANAGIDA, and M. IKEBE. 2002. Class VI myosin moves processively along actin filaments backward with large steps. Biochem. Biophys. Res. Commun. 290: 311–317. 24 WELLS A. L., A. W. LIN, L.-Q. CHEN, D. SATER, S. M. CALN, T. HASSON, B. O. CARRAGHER, R. A. MILLIGAN, and H. L. SWEENEY. 1999. Myosin VI is an actin-based motor that moves backwards. Nature 401: 505–508. 25 RAYMENT I., W. R. RYPNIEWSKI, K. SCHMIDS-BASE, R. SMITH, D. R. TOMCHICK, M. M. BENNING, WINKELMANN, D., A. G. WESENBERG, and H. M. HOLDEN. 1993. Three-dimensional structure of myosin subfragment-1: a molecular motor. Science 261: 50–58. 26 MEHTA, M., R. S. ROCK, A. D. M. RIEF J. A. SPUDICH, R. S. MOOSEKER, and R. E. CHENEY. 1999. Myosin-V is a processive actin-basedmotor. Nature 400: 590–593. 27 TANAKA H., K. HONMA, A. H. IWANE, E. KATAYAMA, R. IKEBE, J. SAITO, T. YANAGIDA, and M. IKEBE. 2001. The motor somain determines the large step of myosin-V Nature 415: 192–195. 28 ROCK R. S., S. E. RICE, A. L. WELLS, T. J. PURCELL, J. A. SPUDICH, and H. L. SWEENEY. 2001. Myosin VI is a processive motor with a large step size. Proc. Natl. Acad. Sci. USA 98: 3655–13659. 29 KITAMURA K., M. TOKUNAGA, A. H. IWANE, and T. YANAGIDA. 1999. A single myosin head moves along an actin filament with regular steps of 5.3 nanometres. Nature 397: 129–134.
19
1
Analysis of Receptor-Ligand Interaction M. M. H¨ofliger, and A. G. Beck-Sickinger Universit¨at Leipzig, Leipzig, Germany
Originally published in: Protein-Ligand Interactions. Edited by Hans-Joachim B¨ohm, Gisbert Schneider. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30521-6
1 Receptors
Receptors are the trigger molecules regulating a great variety of metabolic and physiological processes in cells as well as in more complex organisms. In the pharmacological sense, receptors are transducer proteins that selectively and reversibly bind an endogenous signaling molecule or its synthetic analogue, undergo a conformational change, and modify a cellular response as a consequence. This very general definition of the term receptor will be specified more precisely if we look at the different types of receptors that are present in mammalian cells. On one hand, there are the intracellular receptors of the nuclear receptor superfamily such as the steroid hormone receptors, the retinoid receptors, and the thyroid hormone receptors. The second category of receptor proteins is that of the membrane-bound receptors, which are by far more diverse. Those receptors serve the cells to transform a signal from the outside into a cellular answer. The superfamily comprises the ion-channel-coupled receptors, the kinase-coupled receptors, and the large family of G-protein-coupled receptors. In this chapter we focus on the characterization of the interaction between Gprotein-coupled receptors and their ligands. The presented techniques for the examination of ligand binding and receptor activation, however, are not limited to the analysis of this receptor group. 1.1 The G-Protein-Coupled Receptors
The G-protein-coupled receptors (GPCRs) are integral membrane proteins with the common characteristic of seven membrane-spanning helices. Their endogenous Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Analysis of Receptor-Ligand Interaction
ligands can be monoamine messengers (epinephrine, acetylcholine, serotonin, histamine, dopamine, etc.), lipids (prostaglandins, endogenous lipids, etc), neuropeptides (neuropeptide Y [NPY], substance P [SP], cholecystokinin [CCK], opioids, etc.), and peptide hormones (glucagon, angiotensin, bradykinin), as well as small proteins (chemokines) and large proteins (glycoprotein hormones, thrombin, etc.). All GPCRs transduce their signals to the interior of the cell through the interaction with heterotrimeric G-proteins. Additionally, important sensory proteins such as rhodopsin and the olfactory receptors belong to this family. The length of the different GPCRs varies considerably, from less than 300 amino acids for the smallest representatives, such as the adrenocorticotropin receptor, to more than 1100 amino acids for the metabotropic glutamate receptors. The receptors carry different posttranslational modifications such as glycosylation, which often leads to higher molecular weights of the receptors than expected from the amino acid sequence. Other modifications include disulfide bridges and palmitoylation at specific sites. Most frequently, the GPCRs are classified by primary sequence homology and subfamilies are named after well-characterized members. While only low homology is found in the loop segments, the seven transmembrane helices containing 20–25 hydrophobic amino acids show a higher degree of conservation [1]. So far our insight into the three-dimensional structure of GPCRs is rather limited, as it is very difficult and time-consuming to crystallize such complex integral membrane proteins. Therefore, our knowledge about the 3-D structure of GPCRs is mainly based upon the crystal structures of bacteriorhodopsin and rhodopsin, the only homologous proteins with an elucidated crystal structure. Mainly, the crystal structure of bovine rhodopsin, solved by Palczewski et al. in 2000 [2], in combination with molecular modeling provides valuable knowledge for the understanding of other GPCRs. Computer models based on the sequence homology between rhodopsin and other receptor proteins can now be calculated. However, these models have to be investigated experimentally and crystal structures do not necessarily represent a receptor’s native state, which is to be considered as a dynamic equilibrium rather than a single solid state.
2 Ligand-binding Theory
Binding of a ligand can lead to agonistic, antagonistic, or inverse agonistic effects. Those effects are related to conformational changes in the receptor proteins and subsequent activation or deactivation of signal transduction cascades (Fig. 1). Due to a conformational change, receptors in the active state gain the ability to bind heterotrimeric G-proteins, a variety of which are displayed in Tab. 1. The G-proteins that are present in their inactive GDP-binding form then exchange GDP to GTP. Smaller subunits of the heterotrimeric G-protein complex dissociate. The GTPbinding subunit also dissociates from the receptor and activates a specific signal transduction cascade. This can mean activation or deactivation of the adenylate cyclase, ion channels, phospholipase C, or phosphodiesterase.
2 Ligand-binding Theory
Fig. 1 General scheme of the activation and signal transduction cascade by a GPCR. After binding of a ligand, the receptor is stabilized in the active conformation. It then can bind to the heterotrimeric G-protein. GDP bound by the "-subunit is exchanged
by GTP. The "-subunit dissociates from the $(-complex. The signal can be transduced by the activated "-subunit as well as by the $(-complex, which both then interact with their respective effector molecules.
So far, mainly two general models have been suggested that describe the interaction between a ligand and its receptor. The first one, also known as the induced fit or conformational induction hypothesis, postulates the receptor to be there in an inactive conformation. Agonist and antagonist are thought to bind to the receptor in a similar way. Binding of an agonist induces a conformational change in the receptor molecule, which then leads to the binding of G-proteins and activates the signal transduction cascade. In this model, antagonists bind to the same binding site as agonists, but do not induce the conformational change. The model can explain the action of agonistic or antagonistic ligands. However, some ligands have inverse agonistic effects, and, in fact, more and more substances formerly considered as antagonists are now classified as inverse agonists [3]. Their binding to a receptor not only inhibits the corresponding signal transduction pathway but even Table 1 G-protein subfamilies classified by their α-subunits
Subfamily
Effector protein
Gs
Adenylate cyclase ↑ Ca2+ channels↑ K+ channels↑ Ca2+ channels↓ cGMP specific phosphodiesterase↑ Adenylate cyclase↓ Phospholipase Cβ4 -
Gi
Gq G12
3
4 Analysis of Receptor-Ligand Interaction
reduces it below its basic level. For example, this is the case for some therapeutic drugs such as cimetidine (histamine H2 receptors), haloperidol (dopamine D2 receptors), prazosin (α 1 -adrenoreceptors), timolol (β 2 -adrenoreceptors), and clozapine (D2 receptors and 5-HT2c receptors) [4–8]. Furthermore, the phenomenon of constitutively active receptors has been observed where receptors are in an active state without ligand binding. This is, for instance, the case for the human β 2 adrenergic receptor [9] and the human calcitonin receptor [10]. GPCRs have been expressed at different levels in cell culture, and a direct relationship between the level of expression and basal signaling could be shown [11]. Constitutive receptor activity has been found mostly as a result of mutational changes in the sequence. Both inverse agonism and constitutive activity suggest that not all receptor-ligand interactions can be explained by the induced fit hypothesis. The second model is that of conformational selection, which has been developed more recently. According to this hypothesis, GPCRs exist in at least two conformations. At least one of them binds to G-proteins and therefore is considered as the active state (R*). In other states the receptor cannot bind to G-proteins and is therefore referred to as the inactive, uncoupled receptor. There is an equilibrium between active and inactive conformations R* ↔ R. In the absence of a ligand, usually the inactive receptors represent the majority in this equilibrium. If a ligand is added, it may prefer a special conformation of the receptor for binding. In the case of an agonist, this will be the active state, whereas inverse agonists will prefer the inactive state. Binding of inverse agonists will therefore lower the level of basal signaling. Ligands that bind to active and inactive conformations of a receptor with the same affinity will not influence the equilibrium. They are competitive antagonists, as they can displace other ligands from the binding site but do not lead to changes in basal signaling. Ligands with only little preference for the active state will lead only to a small shift in the equilibrium towards the active conformation. The consequence is an only moderate increase in signaling, also referred to as low efficacy. Those compounds are partial agonists (Fig. 2). Between R and R* there are only small differences in energy. This can be seen by single amino acid mutations leading to constitutive receptor activity [12–14]. Conformational selection and conformational induction, however, are not necessarily in contradiction. Both can be seen as two parts of one mechanism in which the active conformation of a receptor is induced during the binding process, whereas the ligand preferably binds to the active conformation [15].
3 Characterization of the Receptor-Ligand Interaction
The characterization of new receptors usually starts with the discovery of a new receptor-ligand pair. This often starts with knowledge about a pharmacologically active compound, the mode of action of which is unknown, as was the case for neuropeptide Y [16], a 36-amino-acid neuropeptide extracted from porcine brain in 1982. On the other hand, it is also possible that a receptor is discovered for which no ligand is known (orphan receptor). This was the case for the orexin receptors
4 Receptor Material
Fig. 2 Model of the receptor-ligand interaction. Active (R*) and inactive (R) receptors are in a state of equilibrium. Whereas an agonist binds to the active conformation, an antagonist binds to the inactive conformation of the receptor. By stabilizing one
conformation, agonists and antagonists may shift the equilibrium to the respective side. Competitive antagonists will bind to both conformations with the same affinity and therefore not influence the equilibrium.
OX1 and OX2 [17]. For neuropeptide Y (NPY) in humans, until now at least four functional receptors have been identified, the NPY-Y1 , NPY-Y2 , NPY-Y4 , and NPYY5 receptors. (For the NPY-Y4 receptor, the endogenous ligand is the pancreatic polypeptide.) In the case of the orexin receptors, we know two peptide ligands, orexin A and orexin B, that differ significantly in their structure [18]. Once such a set of ligands and their corresponding receptors are known, the characterization of their interaction is necessary. On the one hand, it is desirable to understand how a ligand like NPY can distinguish between its receptors and whether there are differences in the binding mode. This will be especially important if it comes to the design of subtype-specific, pharmacologically active compounds. On the other hand, in cases like the orexin receptors — where we have the two ligands orexin A and B both binding to the OX1 and the OX2 receptor, respectively, but with different affinities [17] — it is important to find the causal connection between the ligand structure and its receptor selectivity.
4 Receptor Material
Usually, G-protein-coupled receptors and their ligands are first identified from primary tissue. However, it is difficult and also ethically questionable to get large enough amounts of primary tissue samples to closely investigate them. Furthermore, the stability of those samples and the accessibility of the embedded receptors can be a problem. There are different ways to circumvent those difficulties. Probably the easiest way is the isolation and cultivation of cancer cell lines that endogenously express the receptors that are of interest. This has been done with the SK-N-MC [19] cell line and the SMS-KAN cell line [20], which express the NPY-Y1 and the NPY-Y2 receptor, respectively. Another commonly used method is the cloning of
5
6 Analysis of Receptor-Ligand Interaction
the receptors into either cancer cell lines or bacterial or yeast cells. Cancer cells have the advantage that they come closest to mammalian tissue cells, are able to make posttranslational modifications, and do not have a cell wall like yeast cells do. A number of such cell lines have been established, such as BHK (baby hamster kidney), CHO (Chinese hamster ovary), HEK (human embryonic kidney), or COS (SV40 transformed African green monkey kidney cells) cells. Those cell lines can be transfected with the DNA of GPCRs of interest [21–24]. The advantage is that this method allows studies with genetically modified receptors in which single or different amino acids are mutated or in which a receptor is fused to another protein [25]. The expression of fusion proteins is often used in combination with fluorescent proteins, such as green fluorescent protein (GFP) [26], to make receptors visible and to perform fluorometric assays (see Section 8). Usually, such transfections are transient. To make them stable, one can combine the receptor encoding DNA sequence with a sequence encoding for an antibiotic resistance. Cultivation of the respective cell line in an antibiotic-containing medium over a longer period of time then can lead to a stable transfection [27–29]. 5 Binding Studies
For new receptor-ligand pairs, usually the determination of binding parameters is the first step. Also, for new, artificial compounds designed as new ligands for a receptor, it is important to determine whether they are binders. Therefore, a variety of binding assays have been established that can be divided into two groups: separation assays and direct assays [30]. In separation assays either whole cells or membrane preparations containing the receptor of interest are used. They are incubated with a radioactive- or fluorescent-labeled ligand until the state of equilibrium is reached. The receptor-ligand complex is then separated by centrifugation or filtration and the amount of bound ligand is determined. The more recently developed direct assays, based on surface plasmon resonance (SPR) [31] or fluorescence correlation spectroscopy (FCS) [32], measure the receptor-ligand interaction in real time. 6 Binding Kinetics
Interactions between G-protein-coupled receptors and their ligands are reversible with the exception of rhodopsin. The binding parameters of a ligand can therefore be determined in a competition-binding assay with a labeled ligand. In such an assay, the displacement of labeled ligand from the receptor is measured in the presence of different concentrations of the examined ligand. The general equation for a bimolecular association between a receptor (R) and a labeled ligand (L*) is k12
R + L ∗ RL ∗ k21
(1)
6 Binding Kinetics 7
where k12 is the association rate (on-rate) and k21 is the dissociation rate (off-rate) of the receptor-ligand complex. The association constant K (not to be confused with the association rate k12 ) of the ligand binding reaction is K=k12 /k21 , whereas the dissociation constant K d is K d =1/K = k21 /k12 . When L*=K d =1/K, then RL* = Rt /2, with Rt being the total receptor concentration. In other words, if the free concentration of the labeled ligand L* reaches the value of K d , the receptor-binding sites will be half-saturated with ligand. The value of K d is one-half of the maximal specific binding Bmax . The free ligand concentration at 50% receptor saturation, the IC50 concentration, is a measure of K d (or 1/K). The K d and Bmax values of a ligand are determined in a saturation analysis. Cell membranes or whole cells are incubated with different concentrations of a labeled ligand. The resulting curve consists of specific binding to the receptorbinding sites and non-specific, non-receptor-binding sites. Each concentration of the labeled ligand should therefore be displaced by a 1000-fold excess of unlabeled ligand to distinguish between specific and non-specific binding. To get the values for the specific binding, the values for non-specific binding are subtracted from those for total binding. The resulting curve for specific binding should be saturable at sufficiently high concentrations and represent a hyperbolic function (Fig. 3a). As the K d value is equal to the concentration of labeled ligand occupying 50% of the Bmax value, it can be determined from the saturation curve. The data was formerly analyzed by Scatchard analysis where it is summarized in the Scatchard plot. The amount of bound ligand divided by the amount of free ligand in solution (y-axis) is plotted against the amount of bound ligand (x-axis) (Fig. 3b). In the case of a bimolecular interaction, this should lead to a straight line with a negative slope. The intercept point with the x-axis represents the Bmax value, whereas the absolute value of the slope represents the K d value (Eq. (2)). More recently, the Scatchard analysis is more and more replaced by computational non-linear regression analysis, a method that makes it possible to directly solve the equation of one-site ligand binding. y=
Bmax · x Kd + x
(2)
From the B max value, the number of binding sites per milligram membrane preparation or per cell, which corresponds to the number of receptors per cell, can be calculated. The IC50 concentration of an unlabeled ligand is determined in a competition assay with a labeled ligand L*. This has the advantage that not every new or uncharacterized ligand has to be labeled but can be tested against an already labeled and characterized compound. A constant concentration of L*, usually at or below its K d , is displaced by increasing concentrations of the unlabeled ligand. For a bimolecular reaction this results in a sigmoidal competition curve. The IC50 value is the concentration of unlabeled ligand, which displaces 50% of the specifically bound labeled ligand from the receptor-binding site (Fig. 4). The IC50 value, however,
8 Analysis of Receptor-Ligand Interaction
Fig. 3 (A) Saturation analysis for a radiolabeled ligand. The x-axis represents the concentration of the labeled ligand. The y-axis shows the dpm-values (decays per minute) for total binding, non-specific binding, and specific binding. The values for specific binding are obtained by subtraction of non-specific binding and total binding. (B) Scatchard plot of a saturation analysis
experiment. The x-axis shows the amount of bound ligand, whereas the y-axis represents the amount of bound ligand divided by the amount of free ligand in solution. Bmax is determined from the intercept point with the x-axis. K d is the absolute value of the slope (both graphs present artificial, idealized data).
is dependent on the concentration of labeled ligand and may vary between experiments performed under different conditions. With the constant value K i , which is independent from the concentration of the labeled ligand, it is easier to compare datasets from different experiments. K i is calculated according to the equation of Cheng and Prusoff [33]:
Ki =
IC50 1 + KLd
(3)
7 Binding Assays 9
Fig. 4 Competition analysis (idealized data). The labeled ligand, which is held at a constant concentration at or below its K d , is displaced by increasing concentrations of the examined ligand. The x value of the curve’s inflection point represents log IC50 .
7 Binding Assays 7.1 Separation Assays
In separation assays the receptors, usually in the form of membrane preparations or whole cells, are incubated with a labeled ligand. The incubation should be long enough to reach the state of equilibrium, which can be anything from seconds to hours. For thermodynamic reasons, the incubation should take place at room temperature or body temperature, which allows the receptors as well as the ligands to change conformation and therefore to interact with each other. As soon as the state of equilibrium is reached, the receptor-bound fraction of the labeled ligand is separated physically from the unbound fraction, which is still in solution. This can be done by either centrifugation or filtration. To avoid disturbance of the established equilibrium during the separation process and following washing steps, the sample should be cooled during these procedures. This significantly reduces the association and dissociation rate. Then the amount of receptor-bound labeled ligand can be determined [30]. 7.2 Radioligand-binding Assay
One of the most common separation assays to test for a ligand’s binding to a receptor is the radioligand-binding assay. Radioactively labeled compounds are used in this type of assay. The high sensitivity of the method allows for the detection and quantification of very small amounts of ligand. Tritium (3 H) and iodine-125 (125 I) are the most commonly used isotopes. The selection of the respective label should, besides the availability, consider the radiochemical and safety properties. Tritium has a long half-life period (more than 12 years), which means that no correction is needed for the decay during the experiment and a labeled tracer can be used over a long period of time (provided that it is chemically stable). Furthermore, its incorporation
10 Analysis of Receptor-Ligand Interaction
into a molecule has no or only few sterical effects. Its properties as a beta radiator with a low radiant energy and only short radiant distance (0.6 cm in air) make it more convenient for handling than the gamma emitter iodine-125. However, the lower efficiency in measurement of 3 H (only 40% in liquid scintillation counting) may make it unsuitable for some applications. 125 I, with its higher radiant energy, higher maximum specific activity, and higher efficiency of measurement (75–90% in gamma counting), may be more suitable for assays in which very high sensitivity is required. One major disadvantage is, however, the size of the isotope 125 I, which can cause sterical changes in the tracer molecule and thereby influence its binding properties. Also, the low half-life period of 125 I makes it necessary to re-determine the specific activity of the labeled ligand before every assay. The radioactive binding assay can be used in both saturation and competition assays (see Section 6). In the saturation analysis, a variety of ligand concentrations are used to get a saturation curve. Each concentration is then competed by a 1000fold excess of the unlabeled ligand to determine the level of non-specific binding. In the competition assay, a single concentration of the radioligand is used. This concentration should preferably be at or below the K d value of the respective ligand. It is then competed by a variety of concentrations of the unlabeled ligand. The concentrations of the unlabeled competitor should be chosen in a way that the resulting competition curve covers the complete range from no displacement at all to complete displacement of the labeled ligand from the receptor [30, 34].
8 Fluorometric Assays 8.1 Fluorescence Labels
Examples of fluorescence labels for ligands are carboxyfluorescein, Cy3TM , a commercially available fluorescent marker based on a cyanine dye or tetramethylrhodamine. They are chemically introduced into a ligand. As with the radioactive labels, a possible influence of the labels on the binding behavior of the labeled ligands has to be considered, especially as the fluorescent dyes are complex molecules. Furthermore, the receptors themselves can be fluorescent labeled, which is done recombinantly. The respective receptors are expressed as fusion proteins with fluorescent proteins, e.g., green fluorescent protein (GFP) from Aequorea victoria, one of its mutant variants, or DSRed from Discosoma striata [26, 35]. 8.2 Fluorescence Correlation Spectroscopy (FCS)
For a number of assays, fluorescent-labeled analogues of a ligand are used. Some of those assays are of a quantitative nature, such as in fluorescence correlation spectroscopy (FCS), where the fluorescent-labeled analogue is used to determine
8 Fluorometric Assays 11
Fig. 5 Schematic setup of a fluorescence correlation spectrometer. A beam of parallel laser light is focused by an immersion objective onto a fluorescent sample. The fluorescence of molecules traversing the focus is collected by the same objective. It is filtered, focused, and detected. Signal autocorrelation is carried out by the computer (PC).
binding kinetics (Fig. 5) [36]. FCS allows the direct detection of molecular interactions in solution. FCS monitors the random motion of a fluorescent molecule in a defined volume (1 fL). Thereby, the diffusion rate of a particle can be detected, which is directly dependent on the particle’s mass. Therefore, FCS can quantify the bound and the free fraction of the fluorescent-labeled ligand and can be used to determine binding parameters. In contrast to the radioligand-binding assay, this has the advantage that the receptor-ligand interaction can be directly monitored over a longer period of time. Recently, FCS was applied to study the receptor diversity of the neuropeptide galanin in cultured cells [37]. In this case three different diffusion times for rhodamine-labeled galanin (Rh-GAL) could be detected: one short diffusion time for unbound Rh-GAL and two different longer diffusion times for membrane-bound Rh-GAL. Those findings suggest that FCS allows one not only to determine the amounts of bound and unbound ligand for a receptor-ligand complex but also to distinguish between different receptor subpopulations or different receptor subtypes. 8.3 Fluorescence Microscopy
Other fluorometric assays will be of a more qualitative nature, if it comes to microscopic studies to characterize receptor expression on cells [38, 39] (Fig. 6) or receptor internalization upon the binding of a ligand [40]. 8.4 Fluorescence Resonance Energy Transfer (FRET)
The use of either pairs of differently fluorescent-labeled receptors or fluorescentlabeled receptors and fluorescent-labeled peptides allows fluorescence resonance
12 Analysis of Receptor-Ligand Interaction
Fig. 6 Fluorescence microscopy image of BHK cells expressing a human NPY-Y2 receptor-GFP fusion protein.
Fig. 7. Principle of FRET. A donor fluorophore is excited at a certain wavelength and transfers its radiation energy non-radiatively to a closely located acceptor fluorophore. Emission at the acceptor’s wavelength can therefore be measured to determine whether the two chromophores are co-localized or not.
energy transfer (FRET) studies. The prerequisite is that the used chromophores form a FRET pair. This means, when in close proximity, that the so-called donor, excited at a certain wavelength, transfers its radiation energy non-radiatively to the closely located acceptor chromophore. Emission at the acceptor’s emission wavelength can be detected (Fig. 7). A receptor-ligand FRET pair can be used to study the binding of a ligand to its receptor [41]. The co-transfection of cells with a receptor-receptor FRET pair can help one to decide whether a receptor is present in the cell membrane as a monomer or dimer [42, 43]. If two different types of receptors are used in a FRET assay, it may even help one to decide whether the receptors are present as heterodimers.
9 Surface Plasmon Resonance
Another technique used for the analysis of receptor-ligand interaction is surface plasmon resonance (SPR), with its first commercially available application in the R BIAcore instruments [44] (Fig. 8). Like FCS, it allows the determination of kinetics by monitoring the association and dissociation of a receptor-ligand complex in real
9 Surface Plasmon Resonance 13
Fig. 8 Schematic setup of the surface plasmon resonance (SPR) detection unit in a R Bia-core instrument. One interaction partner is immobilized on a modified gold surface, whereas the other flows by in solution. On the other side of the sensor chip, a beam of polarized light is reflected by the gold film. The optical phenomenon SPR leads to a reduction in the intensity of re-
flected light at a certain angle (A). This effect is dependent on the refractive index at the sensor surface and thereby on the mass bound to the chip. Therefore, the angle changes if a ligand is bound by the molecules immobilized on the chip (B). The response values are then displayed in resonance units (RU).
time. The interaction partners do not necessarily have to be labeled, which is an advantage of the technique. The principle of SPR measurements is based on an optical phenomenon. The core unit in this technique is a sensor chip consisting of a thin gold film with a modified surface attached on one side. One reactant is attached to the modified sensor surface, whereas the other reaction partner flows past this surface in solution. When the two interaction partners form a complex, the mass on the sensor chip surface increases; when the complex dissociates, the mass falls. This leads to changes in the refractive index in the aqueous layer close to the sensor chip surface that is measured by an optical detection unit on the dry side of the chip. The signal measured in arbitrary resonance units (RU) is approximately proportional to the change in mass with 1 RU=1 pg/mm2 for proteins [45]. A variety of sensor chips with modified surfaces is commercially available, most of them based on a carboxymethyldextrane matrix. This can be used to directly couple one interaction partner via its functional groups by defined coupling chemistries for thiol, amine, or aldehyde coupling, or it can be further modified for capturing biotinylated or histidine-tagged interaction partners or for capturing liposomes. Finally, there is also a hydrophobic sensor surface available, composed of long-chain alkanethiol molecules, that can be used to construct lipid bilayers as membrane-like environments [46]. As recently described, liposome-capturing sensor chips can also be used to reconstitute G-protein-coupled receptors on them [47]. This was done with rhodopsin as the most frequently used model for G-protein-coupled receptors in general. Rhodopsin was immobilized and reconstituted in mixed micelles on the sensor chip surface. If the technique turned out to be suitable for other Gprotein-coupled receptors as well, it might become a valuable tool to test for the functionality of solubilized receptors, as well as to screen for new ligands and to determine binding kinetics.
14 Analysis of Receptor-Ligand Interaction
10 Molecular Characterization of the Receptor-Ligand Interaction 10.1 Antibodies
Antibodies can be used for a variety of applications in the molecular characterization of receptors and receptor-ligand interactions. Antibodies can be used for the detection of receptors in tissue slices, Western blot experiments [48], or ELISAs (enzyme-linked immunosorbent assays) [49]. They can also be used in competition experiments to map the binding epitope of a ligand [50]. Even though the use of antibodies is routine, there is no general protocol for their generation. Polyclonal antibody sera contain many different antibodies with different affinities for the epitopes of an antigen. They are generated by immunization of animals, mostly chickens or rabbits [51], with the antigen (Fig. 9b). To get good immunization results, the animal species should not be too closely related to the species where the antigen comes from. In the case of a G-protein-coupled receptor, immunization may be done with the whole protein or peptide segments of the receptor. Immunization with a whole receptor protein will create a serum with a high diversity of antibodies directed against all parts of the molecule — extracellular, intracellular, or transmembrane segments [51]. The production of sufficient amounts of purified receptor is often difficult and is connected with considerable loss of material. One possibility is the separation on a polyacrylamide gel, followed by blotting onto a nitrocellulose membrane. The membrane can then be introduced under the skin of the animal. Even more drastic is the direct injection of a portion of polyacrylamide gel containing the receptor. However, these methods not only are very stressful for the animal but also are applicable only if other methods to identify the respective receptors are known. As antibodies are often generated against not very well characterized proteins, this may be a problem. The immunization of animals with synthetic peptides is a way to circumvent these problems. Furthermore, it has the advantage that it leads to antibodies directed against a specific and known part of the protein, a benefit in some applications. Such antibodies can recognize parts of a receptor involved in ligand binding or serve as ligands with agonistic [52, 53] or antagonistic properties themselves. They can also displace other ligands, which might help to localize their binding site at the receptor. The peptide sequences are available from GenBank entries of the respective receptor or can be determined from the coding gene sequence. To allow good accessibility in cell or membrane assays, the peptide segments should preferably be chosen from parts of the protein exposed to the surface, especially the extracellular domain [49]. For previously unknown proteins, this information can be estimated from hydropathicity blots. Antibodies generated against intracellular parts of the receptor can be used to explore signal transduction processes or Gprotein binding [54]. To act as an antigen, the mass of a protein should be higher than 5–10 kDa. This often is not the case for the peptides of a receptor loop, so they have to be coupled to
10 Molecular Characterization of the Receptor-Ligand Interaction 15
Fig. 9 (A) Model of an immunoglobulin molecule. The molecule consists of the Fc portion, containing two disulfide bridges and two Fab portions with one disulfide bridge each. The Fab portions carry the variable domains of the antibody. (B) Immunization scheme for rabbits. Before administration of the antigen, a preliminary bleed should be taken. The animal is then immunized with the antigen. After 14 and 28 days, the first and second booster immunizations are given. After 35 days a first
bleed is taken from the animal. The third booster immunization follows after 56 days. Final bleeding is taken after 63 days. (C) Phage display for the generation of monoclonal antibodies. Different antibodies are displayed on the phages’ surfaces. They are then selected via an antigen. The phages that display antibodies with affinity for the antigen are amplified in E. coli. Finally, the sequences of the respective antibodies are identified on the DNA level [51].
16 Analysis of Receptor-Ligand Interaction
a larger protein carrier. Because the immunogenic reaction will be directed against parts of the carrier as well, it should not be relevant for future assays. Bovine serum albumin (BSA) or keyhole limpet hemocyanin (KLH) is commonly used. Because BSA is often used as a blocking agent, for instance in Western blotting, KLH might be a better choice in many cases. Antibody sera obtained from immunized animals can in some cases be directly used for further experiments. Further purification and enrichment, however, are often necessary to increase the specificity and overall affinity of the purified antibodies. Simple enrichment of the antibodies can be achieved by precipitation with ammonium sulfate [55] or by chromatography with protein A [56] or protein G, bacterial cell wall proteins that specifically bind to the Fc portion of immunoglobulins. A more specific enrichment is achieved by affinity chromatography with an antigen column. An alternative to the rather complex mixtures of polyclonal antibodies are monoclonal antibodies [51]. In the first step, as for polyclonal antibodies, an animal is immunized. The antibody-secreting lymphocytes are then isolated from lymphoid tissue and fused with plasmacytoma cells of a similar differentiation state. The new hybridoma cells can then be cultivated and selected, and supernatants can be screened for activity against the antigen. This procedure can be repeated until a clone is found, which produces antibodies with good affinity. Monoclonal antibodies have the advantage that they can theoretically be reproduced infinitely. Another possibility for the generation of monoclonal antibodies is phage display [51] (Fig. 9c), an evolutionary technique in which antibody V genes are cloned for display of assembled heavy- and light-chain variable domains into filamentous bacteriophage. Phages binding to the antigen are selected and soluble antibody fragments are expressed by infected bacteria. 10.2 Applications of Antibodies 10.2.1 Receptor and Ligand Detection Because there are a huge variety of applications for antibodies, in this section we focus on those concerning receptor characterization and receptor-ligand interaction. Antibodies represent a valuable tool to detect receptors as well as their ligands in tissue samples by direct staining of tissue slices. Another method is the separation of cell membranes by SDS polyacrylamide gel electrophoresis (SDS-PAGE), followed by Western blotting onto a nitrocellulose or PVDF membrane (Fig. 10). In both cases the receptors are incubated with the antibodies, non-bound antibodies are washed away, and a second antibody specific for the Fc fragment of the first antibody is added. The second antibody is labeled, either for direct detection with a fluorescent dye or radioactivity or for detection by a staining reaction, with enzymes like alkaline phosphatase or horseradish peroxidase. Often nothing is known about new receptors but their amino acid sequence. In such cases antibodies can be used to screen for their production in various tissues either directly in tissue slices or after immunoblotting of membrane preparations
10 Molecular Characterization of the Receptor-Ligand Interaction 17
Fig. 10 Receptor characterization by SDSPAGE and Western blotting. A cell membrane sample is separated by SDS polyacrylamide gel electrophoresis. The proteins are then blotted onto a nitrocellulose membrane. The membrane is incubated with an antibody that is specific for the receptor. Af-
ter incubation with a second enzyme-linked antibody, the receptor band is stained with an enzymatic dye reaction. The parallel separation of a standard protein mixture allows one to determine the molecular mass of the receptor (MP: marker proteins; CM: cellular membrane) [51].
[48, 49]. Such immunohistochemical studies were performed for the angiotensin II receptor subtypes AT1 and AT2 in rat adrenal gland [48] and in the heart of rabbits [57], for the metabotropic glutamate receptor in rat piriform cortex [58], for serotonin 5-HT2A and 5-HT3 receptors in inhibitory circuits of the primate cerebral cortex [59], and for the dopamine D1 and D2 as well as M4 muscarinic receptor in striatonigral neurons [60]. 10.2.2 Receptor Characterization The detected receptor masses in immunoblotting experiments often differ significantly from those calculated from the amino acid sequence [48, 49]. Sometimes even several bands of different molecular masses are detected for one receptor. Higher receptor masses and different bands may be due to posttranslational modifications like glycosylation or lipid residues. However, such observations are often
18 Analysis of Receptor-Ligand Interaction
difficult to explain and have to be checked carefully in each case. To distinguish whether a receptor double band is due to differences in the degree of glycosylation, due to different splice variants, or simply due to receptor degradation during the work-up process, a deglycosylation experiment can be performed. For this the receptors are treated with enzymes that release oligosaccharides from glycosylation sites, followed by immunoblotting and staining with antibodies. If a receptor is glycosylated, this should result in a single band, which should be detectable at the calculated molecular weight of the receptor. The most frequent posttranslational modifications for GPCRs are N-glycosylation at the N-terminus and external loop Asn-X-Ser/Thr sequences (human calcitonin receptor-like receptor [61] and α 1B -adrenergic receptor [62]), palmitoylation (human dopamine D1 receptor [63]), and phosphorylation (β 2 adrenergic receptor [64]). For many receptors different subtypes with a high degree of sequence homology are known. In cases such as the neuropeptide Y receptor family – with the Y1 , Y2 , Y4 , and Y5 receptors functionally expressed in humans — or the angiotensin II receptor family — with the receptors AT1A , AT1B , and AT2 in human — antibodies that distinguish between different receptor subtypes are a valuable tool to monitor the receptor expression on the protein level [48, 49]. 10.2.3 Functional Characterization of the Receptor-Ligand Interaction Antibodies generated against a peptide segment of a receptor can be used to map the binding site of the endogenous ligand at the respective receptor. This is especially the case if a variety of antibodies generated against different receptor peptides are available. The receptors are then simultaneously incubated with ligand and antibodies. In an ELISA the amount of antibody bound in the presence or absence of the ligand can be determined. Competition is expected for antibodies that have binding sites at the receptor that are overlapping with the ligand’s binding site. In addition, competition assays with radiolabeled ligand can be performed. In that case the amount of ligand bound to the receptor is determined after simultaneous incubation with the antibody. The assay is performed in the same way as the radioactive competition assay (see Section 7.2). Finally, if photoactivatable analogues (see Section 10.5) of the ligand are available, they can be cross-linked to the receptor. Afterwards, the cross-linked receptor-ligand complex can be incubated with the anti-receptor antibodies to see whether their binding site is blocked. A combination of these techniques was used to characterize the binding site of NPY at the Y 1 receptor [50]. 10.3 Aptamers
Aptamers (Latin aptus=fit) are RNA or DNA molecules isolated from combinatorial nucleic acid libraries by in vitro selection experiments termed SELEX: systematic evolution of ligands by exponential enrichment. A few binders with good affinities for the target are selected from up to 1015 different oligonucleotide sequences. The selection is performed by column chromatography or other enrichment techniques
10 Molecular Characterization of the Receptor-Ligand Interaction 19
that are suitable to separate binders from non-binders. Non-binders are washed away, whereas the binders are regained, amplified, and subjected to a new cycle of selection. After several cycles the stringency of the binding conditions is increased, which allows the selection of good binders only [65]. Developed in 1990 [66, 67], until now aptamers have been isolated against more than 150 target molecules, among them small organic molecules, amino acids, peptides, and proteins [68]. The mechanism of target recognition by aptamers is adaptive [69]. Whereas they are predominantly unstructured in solution, aptamers fold upon association with their ligands. The ligand becomes an intrinsic part of the nucleic acid structure, and the 3-D structures of aptamer complexes form highly optimized scaffolds for specific ligand recognition. Thus, aptamers, like receptors, seem to be able to distinguish not only between different ligand molecules but also between different conformations of a single molecule. This was nicely demonstrated for an aptamer selected against the 36-amino-acid peptide NPY [70]. NPY is the endogenous ligand for the Y1 , Y2 , and Y5 receptors. Receptor-selective, conformationally constrained, synthetic analogues of the peptide are known, suggesting that NPY binds to its receptors in different conformations (see Section 10.4.2). The generated aptamer was tested against a variety of these peptides and showed preference for Y2 -receptorselective analogues of NPY. Furthermore, a competition experiment was performed in which aptamer and radiolabeled NPY were simultaneously incubated with the respective receptors. Again, the aptamer showed higher competition for NPY at the Y2 receptor. Recently, intramers, aptamers that can be expressed inside cells and retain their function [71], were developed. Such intramers might also be generated against biomolecules that are part of a signal transduction pathway. In the future they might be used to elucidate signal transduction cascades triggered by a receptor. 10.4 Receptor Mutation and Ligand Modification
A way to gain insight into the interaction between a receptor and its ligand is to mutate single or several positions in the receptor sequence or to modify the ligand. Whereas in the receptor molecule an amino acid is usually substituted by another, the modifications of the ligand can be various. Different ligands, endogenous or synthetic, for a receptor do not necessarily have the same chemical structure. They do not even have to be members of the same chemical class; therefore, the determination of a pharmacophore (defined orientation of functional groups being the basis of biological action) may be difficult. Another problem is the determination of the ligand’s binding site at the receptor, especially because membrane receptors are difficult to crystallize and our structural models of G-protein-coupled receptors are based mainly on the X-ray structure of rhodopsin. 10.4.1 Receptor Mutagenesis One way to locate domains of a receptor that are involved in ligand binding is the creation of chimeric receptors. Chimeric receptors are fusion proteins in which
20 Analysis of Receptor-Ligand Interaction
one part originates from one receptor and the other part originates from another receptor. In most cases, chimeric receptors are used to obtain an initial picture of the location of interesting segments involved in ligand binding [72]. Usually, such chimeric constructs are made of two closely related receptors, such as the neurokinin NK1 and NK2 receptor [73], to obtain functional receptors. There are, however, examples of chimeric constructs from two distantly related receptors, such as the muscarinic M3 and the adrenergic α 2 receptor [74, 75]. In the respective chimeric constructs, the amino terminal five transmembrane domains (TMs) originated from the α 2 adrenergic receptor and the carboxy terminal two TMs originated from the M3 receptor (or vice versa). Whereas the single chimeric constructs were not functional, the co-expression of both constructs leads to functional activity corresponding to both an adrenergic and a muscarinic receptor. Another way to gain more detailed information on important residues for ligand recognition is site-directed mutagenesis of single amino acid residues of a receptor sequence. Before receptor mutations are introduced, one has to select amino acids that are likely to be of relevance for receptor function and ligand binding. One approach is to search for conserved residues of a receptor. This can be done by alignment either of sequences of the same receptor from different species or of sequences of different subtypes of a receptor family. Conserved amino acids, especially in the extracellular regions but also in the transmembrane regions, are likely to play a role in the mechanism of ligand binding. Amino acids in a transmembrane helix often interact with amino acids from another helix in a way that stabilizes the inactive conformation of a receptor [76]. Disruption of such an interaction by an agonistic ligand or by mutation of one interaction partner leads to a change in conformation and to activation of the receptor. This was demonstrated for the α 1B -adrenergic receptor in which residues Asp125 in transmembrane domain 3 and Lys331 in transmembrane domain 7 apparently form a salt bridge that holds the receptor in the inactive conformation [77]. The mutation of Lys331 to Ala led to a six-fold increase in recognition of the endogenous ligand epinephrine without influencing the binding behavior of selective antagonists. Furthermore, the mutation led to an increased level of basal receptor signaling. This explains why some receptors upon mutation of one single amino acid residue are constitutively active. In combination with modeling studies, an assumption for the second interaction partner can be made, which then has to be confirmed by creating the respective mutant. Conserved residues in the intracellular parts of a receptor are more likely to be involved in G-protein binding and signal transduction. This is the case for the DRY motif, a highly conserved amino acid sequence that, with the exception of some conservative mutations, is present in all rhodopsin-like receptors [1]. Another way to identify residues that might be important for ligand binding is to take amino acids from the putative binding region of a receptor and to search for possible interaction partners in the ligand. This approach was performed with the human NPY-Y1 receptor, where a number of negatively charged aspartic acid and glutamic acid residues in the extracellular domains of the NPY-Y1 receptor were systematically mutated to alanines [78]. Those residues are possible partners for electrostatic interactions with positively charged amino acids (arginine, lysine,
10 Molecular Characterization of the Receptor-Ligand Interaction 21
and histidine) in the N- and C-termini of NPY. In radioligand-binding assays, a number of residues were shown to be essential for ligand binding (D104, D194, D200, D287), whereas others had no or only moderate effects. 10.4.2 Ligand Modification To investigate which residues of the ligand are important for receptor recognition, analogues of a known ligand often are synthesized and tested for binding at the respective receptors. This can be done by rational or combinatorial approaches or by a mixture of both. In combinatorial methods the ligand is modified systematically, and up to several hundred slightly different compounds are prepared. Such methods have become possible with the introduction of robotic techniques for chemical synthesis that allow the simultaneous synthesis of many compounds at the same time. Such a set of chemical entities is also called a library. The single compounds are then tested for binding and for their biological activities. Thus, more important residues can be distinguished from less important ones. If analogues with antagonistic properties are detected, structural comparison to the endogenous ligand may help to detect elements that are necessary for receptor activation. For instance, aromatic side chain residues like Tyr, Trp, His, and Phe can be found to play an important role in receptor activation [79]. Also, the reduction of backbone amide bonds converts some peptide agonists to antagonists. This was demonstrated for the C-terminal tetrapeptide of gastrin, where the respective pseudo-peptide analogues had antagonistic properties [80]. Other approaches are more rational. For peptide ligands, a fast way to locate the binding site on the ligand is to create truncated analogues, representing only parts of the original molecule. The truncation does not necessarily have to be N- or C-terminal; also, centrally truncated analogues have been reported to act as ligands. In the case of NPY, a highly Y2 -receptor-selective analogue was reported in which the central amino acids 5–24 have been replaced by a single aminohexa-noic acid (Ahx) molecule [81]. This leads to the assumption that a conformation in which the N- and the C-termini of the peptide are closely associated is relevant for binding at this receptor. As with the receptors, for peptide ligands, chimera can be created in which one part of the molecule originates from one peptide, whereas the other part originates from a second peptide. If the original peptides are ligands of the same receptor family, the chimera can be used to determine structural characteristics that are the precondition for subtype specificity. Chimeric peptides have been used to study the interaction between galanin and its receptors [82]. A more rational approach is the alanine scan (Fig. 11), a method frequently used to screen the amino acid sequence of peptide ligands for the contribution of each residue to the receptor-ligand interaction. In an alanine scan, every single amino acid of a peptide’s natural sequence is replaced by L-alanine. Alanine residues in the natural sequence are usually replaced by glycine. For example, a complete alanine scan of neuropeptide Y revealed residues that are important for NPY’s interaction with the NPY-Y1 and NPY-Y2 receptors [83]. Testing of the analogues at the respective receptors revealed that parts of the ligand, such as the C-terminal pentapeptide amide and especially R33, were necessary for the recognition of both
22 Analysis of Receptor-Ligand Interaction
Fig. 11 l-Alanine scan of CGRP Y0 -28–37, a selective antagonist for the human calcitonin gene-related peptide 1 receptor (CGRP1 ). The analogues were tested in binding-competition assays against (125 Iiodohistidyl)-CGRP. Whereas the replace-
ment of some residues leads to complete loss of binding (P29, T30, V32, and G33), other replacements have a more moderate effect on the receptor recognition (n.d.=no displacement; (A36) was not replaced) [100].
receptors. Other residues were needed only for the binding to one receptor (P2, the NPY loop and R33 for the Y1 receptor and the C-terminal helix, and Y36 for the Y2 receptor), whereas their exchange was tolerated by the other receptor. Based on the findings from an alanine scan, further peptides can be synthesized in which residues involved in the receptor recognition are substituted by homologous amino acids. Interesting in this context is also the use of conformationally constrained analogues in which amino acid residues such as the helix breaker proline or the turn-inducing motive alanine-aminoisobutyric acid are introduced into the natural peptide sequence. In the case of NPY, a ligand that has a distinct conformation in solution [84, 85], it is questionable whether the observed effects are due to direct interactions between residues of the ligand and side chains of the receptor or whether they are caused by an altered conformation of the ligand. Thus, instead of solely determining the binding affinities of modified ligands, one has to investigate their conformation as well. Usually, it is easier by far to gain structural information on the ligands than on G-protein-coupled receptors. Many ligands can be synthesized in sufficient quantity and purity, most of them are water-soluble, and some of them can also be crystallized. A number of techniques allow structural insight into ligand conformation, such as X-ray crystallography, solid-state and solution NMR, and circular dichroism studies for peptides. Well-characterized sets of ligands, especially when containing agonists as well as antagonists for a specific receptor, can be used for computer modeling and structure affinity relationship studies. Ligands for receptors that are not members of the same chemical class and therefore do not share structural similarities at first sight are especially helpful in creating a pharmacophore hypothesis.
10 Molecular Characterization of the Receptor-Ligand Interaction 23
10.4.3 Combination of Receptor Mutation and Ligand Modification Some amino acids of a receptor’s sequence, when mutated, may have effects on the recognition of an agonist but not on the recognition of an antagonist and vice versa, a fact that supports the conformational selection hypothesis for the interaction between ligands and their receptors (see Section 2). Based on the mentioned mutagenesis study at the NPY-Y1 receptor and the results of the alanine scan of NPY, a model for the interaction of NPY with the hY1 receptor was designed [86]. In the meantime, the non-peptide compound BIBP 3226 was created and shown to act as a competitive, specific, and selective Y1 receptor antagonist [87, 88]. A second receptor mutation study at the NPY-Y1 receptor, based on those findings, showed that the agonist NPY and the antagonist BIBP 3226 share an overlapping, but not identical, binding site [89]. Whereas some mutations affected the binding of the endogenous ligand NPY only, others lead to decreased binding of both NPY and BIBP 3226, and one mutation affected the binding of the antagonist BIBP 3226 only. A similar approach was done with the NK-1 receptor and its peptide ligand substance P [90]. Important residues for the recognition of substance P were identified in the N-terminal extension, just outside the transmembrane domain 1 (TM-I), in the first extracellular loop outside TM-II and at the top of TM-III. Substitutions of these residues lead to a dramatic loss of binding of substance P but did not affect non-peptide antagonist binding. The residues important for non-peptide antagonist binding were identified in the outer parts of TM-IV and TM-V. As in the case of the NPY-Y1 receptor, substance P and the antagonists are competitors at the NK-1 receptor but seem to bind differently. Strikingly, one of the antagonist compounds CP96,345, seems to share no interaction point at the receptor with substance P, despite being a competitive antagonist. This again may be explained by the hypothesis of conformational selection, with the antagonists stabilizing an inactive conformation of the receptor and thereby inhibiting the recognition of substance P. The introduction of some mutations interestingly did not affect the ability of radiolabeled substance P to bind to the mutant receptors but impaired its ability to compete with the radiolabeled non-peptide antagonists. The combination of receptor mutagenesis and ligand modification helps to elucidate specific interactions between residues of the ligand and residues within the receptor sequence. 10.5 Cross-linking
Modification of the ligand and receptor mutagenesis studies are indirect approaches for the analysis of molecular interactions between a receptor and its ligand. It can be difficult to determine whether a change in affinity is caused by an altered conformation of either the receptor or the ligand or whether it is due to changed direct molecular interactions. An alternative method to circumvent this problem is to covalently link a ligand to its receptor after incubation. The interaction site can then be determined. In principle two different proceedings are known for
24 Analysis of Receptor-Ligand Interaction
Fig. 12 (A) Model of photo-cross-linking of a peptide ligand to its receptor by the photoactivatable amino acid p-(3trifluoromethyl)-diazirinophenylalanine (Tmd(Phe)). In this example Tmd(Phe) is used to replace its homologue natural
amino acid tyrosine. (B) Reaction scheme of Tmd(Phe). After irradiation with UV light, Tmd(Phe) forms a reactive carbene, which then reacts with residues of the receptor sequence that are closely located in the receptor-ligand complex.
cross-linking. One is the use of an additional bifunctional photoactivatable linker molecule. This has the advantage that the ligand does not have to be modified. However, a limitation of this method is that bifunctional reagents often cross-link the ligand with the receptor at 14–16 Å from the binding site [1]. One way to circumvent this problem is to introduce photoactivatable groups into the ligand itself (Fig. 12a). For peptide ligands a number of photoactivatable amino acids are known that upon irradiation with UV light form highly reactive species, for example, a carbene, a nitrene, or a diradical [1]. The binding behavior of the respective photoactivatable analogues of a native ligand of course has to be characterized before performing a cross-linking experiment, e.g., in a radioligandbinding assay [91]. Provided the photoactivatable ligand shows binding behavior similar to the native ligand, it can be used for cross-linking studies. To enable detection of successful cross-linking, the introduction of a second label is favorable [92]. This can be a radioactive isotope, which has the advantage of a very low detection limit, a fluorescent
10 Molecular Characterization of the Receptor-Ligand Interaction 25
dye, or a group with a defined interaction pattern such as biotin or a histidine-tag. The latter ones have the advantage that they allow the specific purification of the cross-linked receptor-ligand complex [93]. A standard procedure is to incubate the photoactivatable ligand with the receptor as in a standard binding assay [91]. The ligand’s photoactivatable group is then activated by irradiation with UV-light and thus is covalently cross-linked to close residues in the receptor sequence [91]. At this point it is of crucial importance to check for the specificity of the cross-linking, as most photoactivatable groups will react with any residue in their close environment [92]. Therefore, displacement assays with unlabeled ligands should be performed in parallel [92]. If the cross-linking was specific, it should be significantly competed by an unlabeled ligand. Because photoaffinity labeling studies are usually performed with whole cells or membrane preparations, purification of the cross-linked complex is necessary before further characterization of the interaction site is possible. This can be done by SDS-PAGE or via specific interactions of additional labels such as biotin or a histidine-tag. If radio or fluorescent labeling is used for detection, the cross-linked receptors will be directly identified from the gel by phosphorimaging [94]. If labels such as biotin or antibodies are used for detection, the gel will have to be blotted onto a membrane first, followed by staining with enzyme-coupled streptavidin or second antibodies (see Section 10.2.1). The corresponding bands on a silver or coomassie stained gel can be identified via their mass. The identified bands of the cross-linked receptors can then be excised from the gel, de-stained, and submitted to in-gel digestion by a proteolytic enzyme with specific cleavage sites, for example, trypsin [95, 96]. Trypsin specifically cleaves after lysine and arginine and therefore leads to defined peptide fragments of the photoaffinity-labeled receptors. Before trypsin digestion, sulfur atoms of the receptor protein can be reduced and alkylated, e.g., with iodoacetamide, to avoid the formation of peptide dimers via disulfide bonds. Among the digestion fragments should then be a peptide that is cross-linked to the ligand. It can be identified by mass spectrometry (MS) analysis, either of the crude peptide mixture (MALDI-TOF-MS) [95] or after further purification (ESI-MS), [97] or other methods. To facilitate the identification of digestion products, a number of online devices for in silico digestion are available (e.g., at www.expasy.org). The experimentally found fragments can then be compared to the theoretical ones. The cross-linked fragment can be identified by its mass, which is the mass of one receptor fragment plus that of the label. A number of photoactivatable ligands have been used to identify binding sites of ligands at their receptors [92]. The choice of the photoactivatable group is strongly dependent on the ligand into which it is introduced. It should alter the binding properties and the conformation of the ligand as little as possible. Examples of photoactivatable amino acids are p-(3-trifluoromethyl)diazirinophenylalanine (Tmd(Phe)) and p-benzoylphenylalanine (Bpa), two analogues of the amino acid phenylalanine [98]. They can be used to replace Phe or its homologue Tyr, as was shown in cross-linking experiments with the NPY-Y2 receptor [91]. Whereas Tmd(Phe) reacts via formation of a carbene, Bpa forms a biradical upon irradiation with UV light (Fig. 12b).
26 Analysis of Receptor-Ligand Interaction
Examples for the identification of G-protein-coupled receptor binding sites by photoaffinity labeling are the renal V2 vasopressin receptor [99] and the human brain cholecystokininB receptor [97]. 11 Conclusion
Despite the lack of crystallographic structural data for all G-protein-coupled receptors except rhodopsin, the information collected from the various experiments described here can be used to create models how GPCRs function and how the interaction between a ligand and its receptor takes place on the molecular level. The processing of these data not only requires careful evaluation by the scientist but also would be impossible without advanced computing technology. In combination, they lead to an evolving process of the formulation of new hypotheses and their experimental proofs. Nevertheless, direct structural determination of the receptor-ligand interaction remains a great aim for the future. The crystallization of rhodopsin was one step in this direction, and most likely other receptors’ crystallization will follow. Maybe other techniques for the elucidation of transmembrane receptors’ structures will be developed. However, as the interaction between a receptor and its ligand is always a dynamic process, techniques that monitor different aspects of this interaction will retain their importance. References 1 BECK-SICKINGER, A. G., Drug Discovery Today, 1996, 1, 502–513. 2 PALCZEWSKI, K., KUMASAKA, T., HORI, T., BEHNKE, C. A., MOTOSHIMA, H., FOX, B. A., LE TRONG, I., TELLER, D. C., OKADA, T., STENKAMP, R. E., YAMAMOTO, M. and MIYANO, M., Science, 2000 289, 739–745. 3 STRANGE, P. G., Trends Pharmacol. Sci., 2002, 23, 89–95. 4 SMIT, M. J., LEURS, R., ALEWIJNSE, A. E., BLAUW, J., Van NIEUW AMERONGEN, G. P., Van De VREDE, Y., ROOVERS, E. and TIMMERMAN, H., Proc. Natl. Acad. Sci. USA, 1996, 93, 6802–6807. 5 HALL, D. A. and STRANGE, P. G., Br. J. Pharmacol., 1997, 121, 731–736. 6 ROSSIER, O., ABUIN, L., FANELLI, F., LEONARDI, A. and COTECCHIA, S., Mol. Pharmacol., 1999, 56, 858–866. 7 CHIDIAC, P., HEBERT, T. E., VALIQUETTE, M., DENNIS, M. and BOUVIER, M., Mol. Pharmacol., 1994 45, 490–499.
8 WESTPHAL, R. S. and SANDERS-BUSH, E., Mol. Pharmacol., 1994, 46, 937–942. 9 MILANO, C. A., ALLEN, L. F., ROCKMAN, H. A., DOLBER, P. C., MCMINN, T. R., CHIEN, K. R., JOHNSON, T. D., BOND, R. A. and LEFKOWITZ, R. J., Science, 1994, 264, 582–586. 10 COHEN, D. P., THAW, C. N., VARMA, A., GERSHENGORN, M. C. and NUSSENZVEIG, D. R., Endocrinology, 1997, 138, 1400–1405. 11 ARVANITAKIS, L., GERAS-RAAKA, E. and GERSHENGORN, M. C., Trends Endocrinol. Metabol., 1998, 9, 27–31. 12 TURNER, P. R., BAMBINO, T. and NISSENSON, R. A., Mol. Endocrinol., 1996, 10, 132–139. 13 REN, Q., KUROSE, H., LEFKOWITZ, R. J. and COTECCHIA, S., J. Biol. Chem., 1993, 268, 16483–16487. 14 WESTPHAL, R. S., BACKSTROM, J. R. and SANDERS-BUSH, E., Mol. Pharmacol., 1995, 48, 200–205.
References 15 TUNG, M. F., Cellular Signalling, 1996, 8, 217–224. 16 TATEMOTO, K., CARLQUIST, M. and MUTT, V., Nature, 1982, 296, 659–660. 17 SAKURAI, T., Regul. Pept., 1999, 85, 25–30. 18 S¨OLL, R. and BECK-SICKINGER, A. G., J. Pept. Sci., 2000, 6, 387–397. 19 BIEDLER, J. L., HELSON, L. and SPENGLER, B. A., Cancer Res., 1973, 33, 2643–2652. 20 REYNOLDS, C. P., BIEDLER, J. L., SPENGLER, B. A., REYNOLDS, D. A., ROSS, R. A., FRENKEL, E. P. and SMITH, R. G., J. Natl. Cancer Inst., 1986, 76, 375–387. 21 ALALUF, S., MULVIHILL, E. R. and MCILHINNEY, R. A. J., J. Neurochem., 1995, 64, 1548–55. 22 JORDAN, B. A. and DEVI, L. A., Nature, 1999, 399, 697–700. 23 SAITO, Y., NOTHACKER, H. P., WANG, Z., LIN, S. H., LESLIE, F. and CIVELLI, O., Nature, 1999, 400, 265–269. 24 DUROCHER, Y., PERRET, S., THIBAUDEAU, E., GAUMOND, M.-H., KAMEN, A., STOCCO, R. and ABRAMOVITZ, M., Analytical Biochemistry, 2000, 284, 316–326. 25 GUJER, R., ALDECOA, A., BUEHLMANN, N., LEUTHAEUSER, K., MUFF, R., FISCHER, J. A. and BORN, W., Biochemistry, 2001, 40, 5392–5398. 26 MILLIGAN, G., Br. J. Pharmacol. 1999, 128, 501–510. 27 FUKUNAGA, K., ISHII, S., ASANO, K., YOKOMIZO, T., SHIOMI, T., SHIMIZU, T. and YAMAGUCHI, K., J. Biol. Chem. 276, 43025–43030. 28 TARASOVA, N. I., STAUBER, R. H., CHOI, J. K., HUDSON, E. A., CZERWINSKI, G., MILLER, J. L., PAVLAKIS, G. N., MICHEJDA, C. J. and WANK, S. A., J. Biol. Chem., 1997, 272, 14817–14824. 29 MOSER, C., BERNHARDT, G., MICHEL, J., SCHWARZ, H. and BUSCHAUER, A., Can. J. Physiol. Pharmacol., 2000, 78, 134–142. 30 CHO, W., BITTOVA, L. and STAHELIN, R. V., Anal. Biochem., 2001, 296, 153–161. 31 JOENSSON, U., FAEGERSTAM, L., IVARSSON, B., JOHNSSON, B., KARLSSON, R., LUNDH, K., LOEFAAS, S., PERSSON, B., ROOS, H., SJOLANDER, S., STAHLBERG, R., STENBERG, E., URBANICZKY, C., MALMQUIST, M.,
32
33 34 35
36
37
38
39
40
41
42 43 44 45
46
47 48
OSTLIN, H. and RONNBERG, I., Bio-Techniques, 1991, 11, 620–627. HESS, S. T., HUANG, S., HEIKAL, A. A. and WEBB, W. W., Biochemistry, 2002, 41, 697–705. CHENG, Y. and PRUSOFF, W. H., Biochem. Pharmacol., 1973, 22, 3099–3108. LAZARENO, S., J. Recept. Signal Transduct. Res., 2001, 21, 139–165. SCHULZ, R., WEHMEYER, A. and SCHULZ, K., J. Pharmacol. Exp. Ther., 2002, 300, 376–384. RIGLER, R., PRAMANIK, A., JONASSON, P., KRATZ, G., JANSSON, O. T., NYGREN, P., STAHL, S., EKBERG, K., JOHANSSON, B., UHLEN, S., UHLEN, M., JORNVALL, H. and WAHREN, J., Proc. Natl. Acad. Sci. USA, 1999, 96, 13318–13323. PRAMANIK, A., OLSSON, M., LANGEL, U., BARTFAI, T. and RIGLER, R., Biochemistry, 2001, 40, 10839–10845. INGENHOVEN, N. and BECK-SICKINGER, A. G., J. Recept. Signal Transduct. Res., 1997, 17, 407–418. FABRY, M., CABRELE, C., HOCKER, H. and BECK-SICKINGER, A. G., Peptides, 2000, 21, 1885–1893. FABRY, M., LANGER, M., ROTHEN-RUTISHAUSER, B., WUNDERLI-ALLENSPACH, H., HOCKER, H. and BECK-SICKINGER, A. G., Eur. J. Biochem., 2000, 267, 5631–5637. TURCATTI, G., NEMETH, K., EDGERTON, M. D., MESETH, U., TALABOT, F., PEITSCH, M., KNOWLES, J., VOGEL, H. and CHOLLET, A., J. Biol. Chem., 1996, 271, 19991–19998. OVERTON, M. C. and BLUMER, K. J., Current Biology, 2000, 10, 341–344. DEVI, L. A., Trends Pharmacolog. Sci., 2000, 21, 324–326. MYSZKA, D. G., Curr. Opin. Biotechnol., 1997, 8, 50–57. STENBERG, E., PERSSON, B., ROOS, H. and URBANICZKY, C., J. Colloid Interface Sci., 1991, 143, 513–526. PLANT, A. L., BRIGHAM-BURKE, M., PETRELLA, E. C. and O’SHANNESSY, D. J., Anal. Biochem., 1995, 226, 342–348. KARLSSON, O. P. and LOFAS, S., Anal. Biochem., 2002, 300, 132–138. FREI, N., WEISSENBERGER, J., BECK-SICKINGER, A. G., H¨OFLIGER, M.,
27
28 Analysis of Receptor-Ligand Interaction
49
50
51 52
53
54
55 56
57
58
59 60 61 62
63
64
65
WEIS, J. and IMBODEN, H., Regul. Pept., 2001, 101, 149–155. ECKARD, C. P., BECK-SICKINGER, A. G. and WIELAND, H. A., J. Recept. Signal Transduct. Res., 1999, 19, 379–394. WIELAND, H. A., ECKARD, C. P., DOODS, H. N. and BECK-SICKINGER, A. G., Eur. J. Biochem., 1998, 255, 595–603. ECKARD, C. P. and BECK-SICKINGER, A. G., Curr. Med. Chem., 2000, 7, 897–910. LEBESGUE, D., WALLUKAT, G., MIJARES, A., GRANIER, C., ARGIBAY, J. and HOEBEKE, J., Eur. J. Pharmacol., 1998, 348, 123–133. LEIBER, D., HARBON, S., GUILLET, J. G., ANDRE, C. and STROSBERG, A. D., Proc. Natl. Acad. Sci. USA, 1984, 81, 4331–4334. KUSAKABE, Y., ABE, K., TANEMURA, K., EMORI, Y. and ARAI, S., Chem. Senses, 1996, 21, 335–340. DUNBAR, B. S. and SCHWOEBEL, E. D., Methods Enzymol., 1990, 182, 663–670. EY, P. L., PROWSE, S. J. and JENKIN, C. R., Immunochemistry, 1978, 15, 429–436. FU, M. L., SCHULZE, W., WALLUKAT, G., ELIES, R., EFTEKHARI, P., HJALMARSON, A. and HOEBEKE, J., Receptors and Channels, 1998, 6, 99–111. BENITEZ, R., FERNANDEZ-CAPETILLO, O., LAZARO, E., MATEOS, J. M., OSORIO, A., ELEZGARAI, I., BILBAO, A., LINGENHOEHL, K., Van der PUTTEN, H., HAMPSON, D. R., KUHN, R., KNOPFEL, T. and GRANDES, P., J. Comp. Neurol., 2000 417, 263–274. JAKAB, R. L. and GOLDMAN-RAKIC, P. S., J. Comp. Neurol., 2000, 417, 337–348. INCE, E., CILIAX, B. J. and LEVEY, A. I., Synapse, 1997, 27, 357–366. KAMITANI, S. and SAKATA, T., Biochim. Biophys. Acta, 2001, 1539, 131–139. BJOERKLOEF, K., LUNDSTROEM, K., ABUIN, L., GREASLEY, P. J. and COTECCHIA, S., Biochemistry, 2002, 41, 4281–4291. JIN, H., ZASTAWNY, R., GEORGE, S. R. and O’DOWN, B. F., Eur. J. Pharmacol., 1997, 324, 109–116. FRASER, I. D. C., CONG, M., KIM, J., ROLLINS, E. N., DAAKA, Y., LEFKOWITZ, R. J. and SCOTT, J. D., Current Biology, 2000, 10, 409–412. KLUG, S. J. and FAMULOK, M., Mol. Biol. Rep., 1994, 20, 97–107.
66 TUERK, C. and GOLD, L., Science, 1990, 249, 505–510. 67 ELLINGTON, A. D. and SZOSTAK, J. W., Nature, 1990, 346, 818–822. 68 FAMULOK, M. and MAYER, G., Curr. Top. Microbiol. Immunol., 1999, 243, 123–136. 69 HERMANN, T. and PATEL, D. J., Science, 2000, 287, 820–825. 70 PROSKE, D., H¨OFLIGER, M., S¨OLL, R. M., BECK-SICKINGER, A. G. and FAMULOK, M., J. Biol. Chem., 2002, 277, 11416–11422. 71 FAMULOK, M., BLIND, M. and MAYER, G., Chem. Biol., 2001, 8, 931–939. 72 SCHWARTZ, T. W., Curr. Opin. Biotechnol., 1994, 5, 434–444. 73 YOKOTA, Y., AKAZAWA, C., OHKUBO, H. and NAKANISHI, S., EMBO J., 1992, 11, 3585–3591. 74 MAGGIO, R., VOGEL, Z. and WESS, J., FEBS Lett., 1993 319, 195–200. 75 MAGGIO, R., VOGEL, Z. and WESS, J., Proc. Natl. Acad. Sci. USA, 1993, 90, 3103–3107. 76 LEURS, R., SMIT, M. J., ALEWIJNSE, A. E. and TIMMERMAN, H., Trends Biochem. Sci., 1998, 23, 418–422. 77 PORTER, J. E., HWA, J. and PEREZ, D. M., J. Biol. Chem., 1996, 271, 28318–28323. 78 WALKER, P., MUNOZ, M., MARTINEZ, R. and PEITSCH, M. C., J. Biol. Chem., 1994, 269, 2863–2869. 79 MARSHALL, G. R., Biopolymers, 2001, 60, 246–277. 80 MARTINEZ, J., BALI, J. P., RODRIGUEZ, M., CASTRO, B., MAGOUS, R., LAUR, J. and LIGNON, M. F., J. Med. Chem., 1985, 28, 1874–1879. 81 CABRELE, C. and BECK-SICKINGER, A. G., J. Pept. Sci., 2000, 6, 97–122. 82 BARTFAI, T., LANGEL, U., BEDECS, K., ANDELL, S., LAND, T., GREGERSEN, S., AHREN, B., GIROTTI, P., CONSOLO, S., CORWIN, R. et al., Proc. Natl. Acad. Sci. USA, 1993, 90, 11287–11291. 83 BECK-SICKINGER, A. G., WIELAND, H. A., WITTNEBEN, H., WILLIM, K. D., RUDOLF, K. and JUNG, G., Eur. J. Biochem., 1994, 225, 947–958. 84 MONKS, S. A., KARAGIANIS, G., HOWLETT, G. J. and NORTON, R. S., J. Biomol. NMR, 1996, 8, 379–390.
References 85 BADER, R., BETTIO, A., BECK-SICKINGER, A. G. and ZERBE, O., J. Mol. Biol., 2001, 305, 307–329. 86 SAUTEL, M., MARTINEZ, R., MUNOZ, M., PEITSCH, M. C., BECK-SICKINGER, A. G. and WALKER, P., Mol. Cell. Endocrinol., 1995, 112, 215–222. 87 RUDOLF, K., EBERLEIN, W., ENGEL, W., WIELAND, H. A., WILLIM, K. D., ENTZEROTH, M., WIENEN, W., BECK-SICKINGER, A. G. and DOODS, H. N., Eur. J. Pharmacol., 1994, 271, R11– R13. 88 DOODS, H. N., WIENEN, W., ENTZEROTH, M., RUDOLF, K., EBERLEIN, W., ENGEL, W. and WIELAND, H. A., J. Pharmacol. Exp. Ther., 1995, 275, 136–142. 89 SAUTEL, M., RUDOLF, K., WITTNEBEN, H., HERZOG, H., MARTINEZ, R., MUNOZ, M., EBERLEIN, W., ENGEL, W., WALKER, P. and BECK-SICKINGER, A. G., Mol. Pharmacol., 1996, 50, 285–292. 90 ROSENKILDE, M. M., CAHIR, M., GETHER, U., HJORTH, S. A. and SCHWARTZ, T. W., J. Biol. Chem., 1994, 269, 28160– 28164.
91 INGENHOVEN, N., ECKARD, C. P., GEHLERT, D. R. and BECK-SICKINGER, A. G., Biochemistry, 1999, 38, 6897–6902. 92 DORMAN, G. and PRESTWICH, G. D., Trends Biotechnol., 2000, 18, 64–77. 93 BAYER, E. A. and WILCHEK, M., J. Chromatogr., 1990, 510, 3–11. 94 BOLT, M. W. and MAHONEY, P. A., Anal. Biochem., 1997, 247, 185–192. 95 SHEVCHENKO, A., WILM, M., VORM, O. and MANN, M., Anal. Chem., 1996, 68, 850–858. 96 DIHAZI, H., KESSLER, R. and ESCHRICH, K., Anal. Biochem., 2001, 299, 260–263. 97 ANDERS, J., BLUGGEL, M., MEYER, H. E., KUHNE, R., TER LAAK, A. M., KOJRO, E. and FAHRENHOLZ, F., Biochemistry, 1999, 38, 6043–6055. 98 WEBER, P. J. and BECK-SICKINGER, A. G., J. Pept. Res., 1997, 49, 375–383. 99 KOJRO, E., EICH, P., GIMPL, G. and FAHRENHOLZ, F., Biochemistry, 1993, 32, 13537–13544. 100 RIST, B., ENZEROTH, M. and BECK-SICKINGER, A. G., J. Med. Chem., 1998, 41, 117–123.
29
1
Protein Microarrays: Surface Chemistry and Coupling Schemes Michael Sch¨aferling Universit¨a Regensburg, Regensburg, Germany
Dev Kambhampati Alameda, USA
Originally published in: Protein Microarray Technology. Edited by Dev Kambhampati. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30597-1
1 Introduction
This review will focus on various strategies for constructing protein microarrays on a wide range of substrates. The way in which proteins are immobilized on solid substrates will determine the functional properties of a protein microarray. While protein interactions are currently conducted on a wide variety of formats, it is still a major challenge to obtain high quality biomolecular interaction data that can be used to predict how proteins behave in their biological environments. Two approaches for developing protein microarrays are currently underway. In the first approach the properties of the microarray surface are tailored to functionalize unmodified (pristine) biomolecules and in the second biomolecules are modified with tags to effectively anchor them to existing microarray or sensor substrates. Irrespective of which approach is chosen for a particular application, fundamental challenges still remain when it comes to interpreting the data. Extrapolation of protein immobilization strategies from one system to another for different classes of proteins and applications is difficult since proteins have a wide range of structural, chemical, and biological properties. Even within a single protein array application system, normalization studies (corresponding to probe surface concentration and activity) of a wide range of probes with different functional properties have to be considered. Sample preparation and quality control issues plague almost all microarray formats that are currently available. These processes are extremely time consuming and counteract the overall claims and advantages of such microarrays in functional high throughput screening.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Microarrays: Surface Chemistry and Coupling Schemes
A “one-approach-fit-all” scheme for screening all the various types of proteins is improbable. The current (and the most practical) approach is to generate surfaces that can be used for broad subsets of protein families without compromising their functional properties. In this article, protein chips constructed by using different immobilization chemistries on glass, gold, polymer and other special formats (semiconductor surfaces and lab-on-a-chip) will be discussed in detail. Various strategies for increasing the degree of probe immobilization and preventing protein denaturation by the use of effective tags will also be described. 1.1 Background and Current State of Biomolecule Libraries
Most large-scale biotechnology, pharmaceutical and research laboratories currently have existing libraries of both pristine proteins and those conjugated with various tags (biotin, His-Tag, Strep-Tag, Avi-Tag, etc.). Since these molecules have been extensively characterized, such laboratories will be reluctant to alter them to suit the surface immobilization requirements of microarray platforms. As a result, surface immobilization strategies on microarrays have to be designed on the basis of the existing state of biomolecule libraries where there is limited flexibility for operation. Most of the microarray schemes that will be described here will focus on the abovementioned existing biomolecule libraries. Almost all kinds of proteins have reactive functional groups in their side chains (Table 1) which can be used as a starting point for anchoring these molecules on microarrays. In the case of biomolecules conjugated with tags, the immobilization approach proceeds with a higher surface coupling efficiency (when compared to pristine probes). The drawback in this case is that the inherent biomolecular interaction properties of the conjugated probes Table 1 Various methods of coupling proteins on a microarray platform
Functional side group on peptide
Available surface for derivatization
Type of binding
Natural –COOH (carboxylic acid) Asp
Amino
Electrostaticy Covalent amide (after carboxy activation) Electrostatic
–NH2 (amino) Lys, Gln, Arg –SH (thiol) Cys –OH Ser, Thr Synthetic His-Tag Strep-Tag Biotin
Carboxylic acid, active ester, epoxy, aldehyde Maleimide
Covalent amide Covalent thio ether
Epoxy
Covalent ether
Ni-NTA complex Strep-Tactin Streptavidin
Coordination complex Supramolecular complex Supramolecular complex
2 Microarrays Based on Glass Substrates
might be altered, thereby affecting their kinetic and thermodynamic properties. Irrespective of the nature of the probes, the overall goal is to target reactive sites which do not denature the molecules or affect the complementarity-determining regions (CDRs) of the probes during surface immobilization. Numerous surface chemistry approaches are currently available to address such issues.
2 Microarrays Based on Glass Substrates
Most of the current analysis of protein interactions on microarrays are based on surfaces and formats that have been used for DNA arrays. Glass slides coated by epoxy- or aldehyde-terminated alkylsilanes or by polylysine can also be used to immobilize proteins, antibodies and antigens. Such activated glass (microscope slide format) slides can be spotted with biomolecules by the end-user in any format and density. The detection mode for such systems is based mainly on optical techniques such as fluorescence. Table 2 lists some examples of commercially available glass slides with reactive surfaces. 2.1 Surface Modification
The reactive coatings of the glass surface are mainly based on self-assembly techniques. Self-assembled mono-layers (SAMs) of alkylchlorosilanes or alkylalkoxysilanes require hydroxylated surfaces as substrates for their formation. The driving force for this self-assembly is the in-situ formation of polysiloxane, which is connected to surface silanol groups (SiOH) via Si-O-Si bonds (Scheme 1). These thin organic films enhance the biocompatibility of the surface and protect the proteins from denaturation and structural changes during the immobilization step. A great variety of silanization conditions pertaining to temperature and reaction time, which are also strongly influenced by the stability of the terminal functional groups have been described in the literature [1]. Monolayers of aminopropylsilanes, Table 2 Examples of commonly available glass-based microarray platforms
Coating
Supplier
Advantages
Disadvantages
Poly-L-Iysine
Sigma, Fisher
Cheap, easy to handle
Only electrostatic absorption, low stability
Highly reactive, great versatility
Low spot homogeneity, susceptible to contamination
Aminopropylsilane Corning, Erie, Xomics Aldehyde-, NHSGeneScan, Xomics ester- or epoxysilane GeneScan, Xomics Genetix NoAb BioDiscoveries Telechem, Quantifoil
3
4 Protein Microarrays: Surface Chemistry and Coupling Schemes
Scheme 1. 2D schematic description of a polysiloxane monolayer on a glass surface (X = terminal functional)
for example, can be formed under mild conditions (room temperature for 30 min) [2]. Scheme 2 shows some commonly-used silane-reagents used for the derivatization of glass surfaces [3]. Substrates on which these organosilicon monolayers have been successfully prepared include silicon oxide, aluminium oxide, quartz, glass, mica, zinc selenide and germanium oxide [4, 5]. Silicon substrates can also be covered with silane monolayers [6], resulting in a new approach for tuning the electronic properties of semiconductor chips [7] and for challenging new interfaces in the field of bioelectronics.
Scheme 2 Reagents for derivatization of glass surfaces. 1 APTES = aminopropyltriethoxysilane; 2 MPTS = 3mercaptopropyltrimethoxysilane; 3 GPTS = glycidoxypropyltrimethoxysilane; 4 TETU = triethoxysilane
undecanoic acid; 5 HE-APTS = bis(hydroxyethyl)aminopropyltriethoxysilane); 6 4-trimethoxysilylbenzaldehyde; 7 GPTS/HEG = glycidoxypropyltrimethoxysilanehexaethylene glycol; 8 poly(lysine).
2 Microarrays Based on Glass Substrates
2.2 Current State of Glass-Based Protein Microarrays
As in the case of DNA-arrays, complex issues pertaining to spot morphology, homogeneity of signal across the spot, microarray smearing and non-specific adsorption are not adequately addressed in the case of protein microarrays. In addition to the above challenges, it is difficult to compare the experimental results obtained in two different laboratories using the same set of arrays. Reproducibility of data is a major concern and if this current trend continues, it will become extremely difficult to accurately validate the biomolecular interactions that form the basis of many drug discovery and research studies. Preliminary studies [8] pertaining to the use of polylysine arrays in monitoring the interactions of 115 antigen/antibody pairs using complex protein mixture solutions was recently reported. While the success rate of the interaction data was between 20 and 50%, concerns pertaining to microarray smearing and background still remain (Figure 1a). Microarray smearing is a common artifact found in most glass-based arrays and it is one of the methods by which cross-talk between the adjacent probes occurs, leading to false results. In addition to microarray smearing, the electrostatic charges present on the glass can
Fig. 1 (a) Smearing effects on antibody-immobilized polylysine arrays [8]. (b) Snapshot of microarray spots with poor morphology and surface chemistry (within the individual spots). Differences in spot homogeneity can result in varying signal intensities within the same spot [9].
5
6 Protein Microarrays: Surface Chemistry and Coupling Schemes
also lead to the denaturation of proteins, although the exact level of electrostatic charge is difficult to estimate. In another study, large-scale protein–protein interaction studies were also successfully conducted on aldehyde-coated glass slides [9]. While the slides were successful in discriminating between the specific and non-specific interactions, issues pertaining to “quenching” or neutralizing the remaining reactive groups on the slides still remain. BSA was used to effectively block the free surface of the chip to minimize non-specific interactions. This could pose a problem on such slides if small molecules are used which can potentially be masked by the bulky BSA molecules. While spot morphology is not a major concern in normal laboratoryscale research studies, it becomes critical when investigating several thousands of protein samples spotted on a single microarray slide. Fluorescence intensities emanating from the spots (Figure 1b) and their adjoining areas (if the spots are not homogeneous) can affect the data acquisition process. In addition, if the surface chemistry within the spots is not homogeneous, varying intensities of fluorescence signal can be observed within the same spot. Unless the data analysis software is highly sophisticated, such spots can lead to poor averaging effects during the analysis step which will in turn result in misleading protein expression levels for the protein of interest. In another study, Joos et al. [10] reported the direct parallel detection of autoantibodies in human sera by using a single immunoassay experiment in which 18 different antigens were spotted onto aldehyde-coated slides. After activation of a 96-well teflon-coated glass plate (Erie Scientific) with aminopropylsilane, protein antigens were attached to the pre-defined spots to form a high-throughput microarray-based enzyme-linked immunosorbent assay (ELISA) [11]. Inside the wells, the authors spotted arrays of 144 (4 36) elements each using a 36-capillary based print head. This array format thus appears as a compromise between high-density microarrays and conventional micro-titerplates. Antibodies can also be arrayed on aminosilated glass surfaces using a micro-molded hydrogel stamper [12]. Many approaches use micro-contact printing with PDMS stamps [13] or photoreactions in combination with masks [14] for the patterning of glass surfaces with proteins. The profiling of cancer cells from a cellular lysate was carried out with aldehyde- or polylysine-coated glass microarray containing 146 distinct antibodies. The target proteins were labeled with Cy5 or Cy3 dyes and no amplification methods such as ELISA were needed for monitoring the interactions [15]. In another interesting study, Schweitzer et al. designed microarrays containing 75 antibodies against cytokines on chemically-derivatized glass slides to study cytokine secretion from human dendritic cells induced with lipopolysaccharide or tumor necrosis factor-α [16]. The superiority of nickel complex-coated surfaces over conventional aldehyde slides was reported by Snyder in his study on a proteome chip containing 6566 GST yeast proteins spotted onto a glass slide (Figure 2). In this particular case, the fusion proteins attach via their His-Tags and it is believed that the binding sites orient away from the surface [17]. This example shows that the use of tags can help to control the orientation of the immobilized biomolecules on the surface.
3 Microarrays based on Gold Substrates
Fig. 2 GST yeast protein analysis. (A) 60 samples were examined by immunoblot analysis using anti-GST; 19 representative samples are shown. (B) 6566 protein samples representing 5800 unique proteins were
spotted in duplicate onto a single nickelcoated microscope slide which was then probed with anti-GST. (C) Enlarged image of one of the 48 blocks.
3 Microarrays based on Gold Substrates
The use of gold-based substrates for monitoring protein interactions has a relatively long history. Instead of using polycrystalline gold surfaces, thin gold films (10– 200 nm) can easily be deposited on solid supports similar to glass or silicon wafers by evaporation or sputtering in an ultrahigh vacuum (UHV) [18]. Since the adhesion of gold on glass or silicon is poor, pre-coating of a thin layer of Cr or Ti (1–10 nm) is generally required prior to gold deposition. Since the work of Allara [19] and Whitesides [20] in the mid-1980s, the spontaneous adsorption of organosulfur compounds onto gold has been a widely used method for the preparation of self-assembled monolayers (SAMs). If the alkyl chain is of sufficient length, the resulting molecular architecture is very stable and orients itself nearly perpendicular to the surface [21]. The close proximity of the adsorbed molecules causes van der Waals forces to play an important role in intermolecular stabilization. The thickness of these monolayers is typically in the nanometer range in the vertical direction and as a result, they can be termed nanoassemblies. It is important to note however, that in the horizontal plane, i.e. over the substrate surface, these monolayers can extend to a macroscopic length. Although SAMs have been constructed on a wide range of surfaces, monolayers based on alkanethiols on gold have been investigated extensively for the following reasons: (1) ease of fabrication, (2) degree of perfection, (3) chemical stability under laboratory
7
8 Protein Microarrays: Surface Chemistry and Coupling Schemes Table 3 Examples of commercially available gold-based microarray platforms
Coating
Supplier
Advantages
Disadvantages
Functionalized alkanethiols
Zyomyx
Good control over surface quality, good biocompatibility
More expensive than glass or polymer arrays
Graffinity
conditions, (4) availability of materials, and (5) flexibility in introducing various chemical functional groups to give different surface properties. They provide excellent control over the surface properties by individually tailoring/engineering the building blocks. Using these principles, customized surfaces with good biocompatibility properties can be developed to meet the complex needs of proteomic studies. Unlike glass- or polymer-based arrays, SAMs on gold also have the added advantage of the surface properties being efficiently characterized by optical, mechanical and electrochemical analytical tools. Many commercial suppliers offer gold-coated microscope slides with sufficiently high surface quality. In spite of the above advantages, gold-based microarrays still are not in widespread use due to their cost and stability (chemical and thermal). A major limitation to the use of SAMs as an immobilization matrix is their low temperature stability due to the weak bonding (interactions) between sulfur and gold. While this is not a concern in the case of protein arrays (where experiments are conducted at low temperatures), such microarrays are found to be unsuitable for DNA array applications where high stringent wash and temperature conditions are employed. Because of the excellent electronic properties of gold, these surfaces can be used in more advanced and label-free detection technologies such as Surface Plasmon Resonance (SPR, Graffnity, Table 3. A reference to Biacore technology is provided in Section 4.1) and mass spectrometry (refer to Section 5). 3.1 Surface Modifications
Natural proteins contain a wide range of functional groups on the side chains of their polypeptide backbone (refer to Table 1). In principle, all of them can be used during the direct chemical coupling reaction on specially prepared SAM surfaces with appropriate terminal reactive groups. Scheme 3 shows the various terminal functional groups of alkanethiols that are potentially available for biosensor applications. Hydroxy terminal alkanethiols (oligo ethylene glycol units [22] and mannitol groups [23]) undergo low non-specific interactions with proteins and enzymes, and as a result are excellent candidates for making mixed monolayer-based protein array platforms. For other surface modifications which resist protein adsorption, refer to the extensive review by Whitesides [24].
3 Microarrays based on Gold Substrates
Scheme 3 Schematic respresentation of long-chain alkanethiol monolayers (e.g. 16 mercapto-hexadecanoic acid) on gold with different terminal functional groups. 1, biotin; 2, NHS-ester (NHS, Nhydroxysuccinimide); 3, epoxy-ethylene glycol; 4, maleimide; 5, diaminohexane; 6 oligo (ethylene glycol).
3.2 Current State of Gold-based Protein Microarrays
The most widely used gold-based protein microarrays are based on biotin–streptavidin technology. As a result, the following discussion will focus primarily on this technology. One of the reasons for the rapid growth in the application of biotin–streptavidin technology is the ease with which it is possible to biotinylate various biomolecules. The following two premises are, however, inherent when we talk about biotin-conjugated molecules: (a) the biochemical and physical properties of the biomolecules are not affected significantly by the biotinylation step and (b) the properties of the biotin moiety are not affected by the derivatized biomolecule. The relative ease of attachment of biotin to organic molecules substituted with a thiol makes it possible to use gold substrates as the underlying platform for constructing protein arrays. The streptavidin surface (formed on top of the underlying biotin-terminated SAM) can be used for immobilizing a wide range of biotinylated molecules (nucleic acids, proteins, saccharides, etc.). Single component SAMs containing biotin tethered at the terminal end are known to generate poor streptavidin monolayers due to steric constraints imposed
9
10 Protein Microarrays: Surface Chemistry and Coupling Schemes
by the tight biotin layer. The close-packed biotin layer shields the individual biotin units and also prevents the effective docking of the bulky streptavidin molecules. This problem can be obviated by using either spacer molecules or via surface coupling schemes that involve lower yields. The research groups of Knoll [25] and others [26] have actively investigated the interaction of biotin and streptavidin monolayers on a variety of sensor formats. By exploring many different combinations of biotinylated thiols and various spacer molecules, Knoll and co-workers have shown that an optimum monolayer of streptavidin can be generated by using a 1:9 mole ratio of biotin and hydroxyl-terminated thiols. The latter have a dual function: first, they act as efficient spacers between the biotinylated components and second, they minimize the non-specific interaction of streptavidin with the sensor substrate. Other research groups [27], using various streptavidin mutants, have indicated that an optimum streptavidin monolayer can be obtained by using approximately 10–25% biotin on the surface. Another method of generating a well-packed streptavidin surface is by using a surface reaction between a hydrophobic interface (Figure 3) comprising carboxy groups [28] and biotin-terminated linker units attached to amino groups. A denselypacked streptavidin surface is obtained when the surface coupling yield of biotin is between 25 and 30%. This value is in the same range as that of the mixed SAM layers. An application of this surface in screening patients affected with Lyme Borreliosis is shown in Figure 4. No blocking solutions were required during this assay and highly reproducible results with high signal-to-noise ratios were observed because the assay surface was quite homogeneous (i.e. uniform probe packing density) [29].
Fig. 3 Schematic respresentation of a steptavidin sensor surface assembled on a reaction-controlled biotinylated SAM [28].
3 Microarrays based on Gold Substrates
Fig. 4 Applacation of stretavidin sensor platform (refer to Figure 3) for the diagnosis of Lyme Borreliosis. a-e, different patients suffering from the disease; f and g, control experiments using healthy subjects. The difference in the above biomolecular in-
teraction curves can be attribued to patients at various stages of infection. Irrespective of the stage of infection, good signal-to-noise ratios can be observed between infacted patients and healthy subjects using the streptavidin microarray platform [29].
While both the above methods can be used in assembling a streptavidin surface, the mixed SAM approach is relatively easy to construct and can thus accelerate the production process. The drawback is that these molecules are not readily available and as a result, need to be specially synthesized. The synthesis of these molecules is not trivial and the synthesis steps usually have poor success (yield) rates. These problems can be overcome by using the readily available carboxy-terminated thiols for constructing the streptavidin sensor surface. It is important to note that both these techniques result in a high quality streptavidin surface. The biocompatibility of the SAM interface is likely to make such sensor surfaces interesting candidates for the screening of antibodies, patient sera and other complex proteins. Several reports dealing with surface patterning of proteins on gold-coated surfaces using particle lithography [30], photoimmobilization [31], chemical selectivity of mixed monolayers [32] and microcontact printing have been published [33]. Atomic Force Microscopy (AFM) [[33], 26b] or Scanning Tunneling Microscopy (STM) [34] techniques can also be used for direct visualization of such protein patterns on gold surfaces. In the latter case, a thiolated protein was directly adsorbed onto the surface. Issues pertaining to protein denaturation still remain as the study involved the direct contact of the proteins with the metal surface. Gold-coated electrodes can also serve as amperometric biosensors in the form of enzyme electrodes. The enzymes on such biosensors are immobilized via functionalized SAMs [35], conducting polymers such as polypyrrole or polythiophenes [36] or by direct absorption onto the electrode surface [37]. Some prominent examples include the glucose sensor based on the electrochemical detection of enzymaticallyproduced hydrogen peroxide [38] or the screening of neurotoxins with immobilized acetylcholinesterase in the form of competitive assays (using acetyl-thiocholine)
11
12 Protein Microarrays: Surface Chemistry and Coupling Schemes
[39]. However, none of these approaches is likely to be used in a microarray format for HTP studies in the near future.
4 Microarrays based on Polymer Substrates
Most of the above discussions have focused primarily on two-dimensional (2-D) surface architectures. Since the functional properties of a protein are highly dependent on its conformation and structure, it is possible that delicate proteins may not be able to survive the harsh conditions of 2-D surfaces. As a result, other formats based on three-dimensional (3-D) surfaces which provide a fluidic environment that closely mimics the biological environments of various proteins, have been proposed. Some prominent examples of 3-D-based systems include hydrogels and polymer membranes. The basic underlying tenet of such 3-D systems is that a fluidic environment preserves the conformation and, as a result, the activity of the protein. Additional benefits of 3-D architecture include larger surface area and greater probe density. Polymer membranes such as aminated nylon [40] or nitrocellulose [41] were the first commonly used substrates for the preparation of microarrays. For over two decades, these platforms have been widely employed in microbiology laboratories as blotting supports. Detection on these supports has been carried out mainly by radioimaging, although a combination of other detection technologies such as fluorescence is also possible. Table 4 provides an overview of the polymer microarray platforms that are currently available on the market. Gel-based studies in combination with downstream operations such as mass spectrometry, dominate most proteomic investigations. Since it is possible to transfer proteins and antibodies directly onto the blotting membranes after electrophoresis, these platforms definitely have certain advantages when it comes to sample preparation and in terms of the overall cost. Hydrogel-based arrays based on gold substrates have dominated the field of real-time protein interaction studies for the past decade. This has been mainly possible due to the widespread application of SPR Table 4 Examples of common polymer and hydrogel substrates
Polymer substrate
Supplier
Nylon (aminated) membrane
Pall, Macherey & Nagel Cheap, versatile, high biocompatibility, applicable to blotting Schleicher and Schuell Pierce, Bio World Nunc BiaCore Perkin Elmer Prolinx
Nitrocellulose membrane Polystyrene Dextran-Hydrogels Polyacrylamide Polymer brush
Overall advantages
General limitations Only electrostatic absorption, low spot homogeneity and reproducibility, mass transport limitations, high background smearing and non-specific interactions
4 Microarrays based on Polymer Substrates
technology by Biacore. Polymer brushes and hydrogels on glass substrates (conventional microarray format) have also been developed (refer to Table 4), although their usage is still limited. 4.1 Surface Modifications
The immobilization of biomolecules on functionalized polymer membranes such as nylon or cellulose occurs via electrostatic absorption. This is also the case for hydrogels based on polyacrylamide platforms. As a result, it is difficult to obtain reliable, high quality data from such systems for potential application in microarray studies. During the past decade, self-assembled monolayers modified with dextran have become the system of choice for many biomolecular interaction studies. The widespread use of this 3-D architecture was mainly a result of the enormous success of the commercial SPR instrument maker, Biacore. Well-established protocols of self-assembly techniques on gold were utilized by researchers at Biacore to synthesize these dextran surfaces [42]. Epoxy-terminated thiol monolayers were used for coupling dextran polymers to the sensor surface. The immobilization of ligands was achieved by using carboxy-methyl-modified dextran containing extensive reactive handles for activation and covalent attachment of the molecules of interest. The resulting layer is a hybrid material composed of a 2-D monolayer on gold and a 3-D hydrogel which is highly flexible, non-crosslinked and extends 100–200 nm from the coupling surface under physiological buffer conditions (Scheme 4). While such 3-D surfaces have numerous benefits, they are still plagued by problems relating to mass transport effects and high background signals emanating from non-specific interactions. The 3-D dextran surfaces are known to impose mass transfer limitations that can significantly alter the inherent kinetics of the interaction. Diffusion effects play an important role within these 3-D structures, and can lead to the formation of several local concentration gradients. During realtime kinetic studies using surface plasmon resonance measurements, such local gradients can result in the calculation of false kinetic rate constants. Protein immobilization with the help of functionalized polymer brushes or by photochemically-activated polymers will be discussed in Sections 4.2 and 6. 4.2 Current State of Polymer-based Microarrays
Numerous protein chip surfaces can be created by activating the above-mentioned dextran surface. Using the dextran as the base, surfaces incorporating thiol, amine, (strept)-avidin or aldehyde groups can be generated (Scheme 5). These surface modifications have resulted in the commercial success of numerous sensor chips [43] which are routinely used in a plethora of applications ranging from HIV analysis using various monoclonal antibody–antigen interactions to the area of small molecule drug discovery operations and quality control. Since these sensor chips are quite robust and can withstand stringent acidic or basic regeneration
13
14 Protein Microarrays: Surface Chemistry and Coupling Schemes
Scheme 4 Reaction schemes for preparing a hydrogel dextran matrix [41].
steps, they can be re-used several times without any substantial loss of the surface density/activity of the ligands. Another new approach that combines self-assembled monolayer techniques with a 3-D microenvironment is the synthesis of functionalized dendrimers on solid supports such as glass. The activated dendritic structures are terminated by NHS esteror isothiocyanate- groups [44]. Emerging technologies use the (co)polymerization of methacrylate, styrene or vinyl alcohol monomers on the surface of a glass chip followed by subsequent functionalization using photografting [45]. The optical properties of the chip surface can be tailored to the reflectance of light or scattering effects. The tuning of the refractive index of the surface can also be accomplished by effective implementation of high-refractive index materials (TiO2 ) or low-refractive layers (polymers, SiO2 ) [46]. This approach has been used to make highly reflective microarray surfaces at the applied wavelength (emission and excitation) for potential applications in highly sensitive luminescent-based assay systems. A comparative study involving the screening of 11 different array surfaces on polymer gel and silane-coated glass slides was carried out by applying five different monoclonal antibodies in a 20 × 18 matrix. All the important aspects of the procedure are discussed in this work, including ease of chip preparation and handling, storage, signal intensities, reproducibility and detection limits [47]. Based on these investigations, reflective slides with a mirror-like surface (3-aminopropyltriethoxysilane, Amersham Biosciences [48]) were found to provide high levels of
5 Special Formats: Microfluidic Devices and Integrated Semiconductor Chips
Scheme 5 Reaction schemes for generating different protein chip surfaces using the dextran matrix as the underlying platform.
uniformity and reproducibility, whilst the polyacrylamide-coated slides were characterized by their high levels of sensitivity. The authors conclude by stating that the selection of the array support material strongly depends upon the special requirements of the experiment and the properties of the probe biomolecules. Currently, many groups have reported on the fabrication of custom-made antibody arrays on nitrocellulose or nylon membranes for the detection of cytokines in conditioned media or human serum using a multiplexed sandwich assay [49].
5 Special Formats: Microfluidic Devices and Integrated Semiconductor Chips
The lab-on-chip technology involving microfluidics has evolved rapidly over the last decade. This technology allows for the parallelization and integration of multiple sample processing approaches in the field of diagnostics or proteomic research. These devices are however, more complex than conventional microarrays (Figure 5), and while they are suitable for small-scale sample preparation applications, their throughput is still relatively low. Nevertheless, as these systems continue to move from academic laboratories to commercialized technology, the user-friendliness of such microfluidic systems is slowly improving [50]. In spite of the above limitations, this technology has certain advantages such as the use of very small volumes of human sera or urine samples for conducting rapid diagnostic assays. This technology is poised to make rapid advances in the area of nucleic acid-based testing, a field that is set to explode during this decade. The lab-on-a-chip SNP genotyping systems by
15
16 Protein Microarrays: Surface Chemistry and Coupling Schemes
Fig. 5. Schematic diagram of a microfluide system for the analysis of DNA samples. Image by courtesy of US DOE Genomes to Life Program
Caliper [51] and Nanogen [52] are clear illustrations of the scope of such technologies. Whilst most of the early microfluidic formats have focussed primarily on DNA interactions, such systems can also be used for the detection of protein–protein or antibody–antigen interactions if appropriate surface immobilization strategies are employed. While Nanogen uses a plastic cartridge in their silicon wafer biochip unit, many microfluidic systems are based on glass or silicon using classical etching and processing technologies of the semiconductor industry for the structuring of chips with microchannel systems. The planar waveguide glass chips from Zeptosens use integrated microfluidics combined with high refractive index material (Ta2 O5 ) deposited on the glass support for monitoring biomolecular interactions using highly sensitive fluorescence detection methods [53]. The Flow-thru ChipTM from MetriGenix is also based on a three-dimensional microchannel glass chip [54]. CombiMatrix [55] attaches a proprietary porous three-dimensional reaction layer to the active side of semiconductor surfaces for the immobilization of biomolecules. Yet another microfluidic protein biochip platform based on silicon is currently under development at Zyomyx [56]. Microchannels in polymeric materials such as poly-(methyl methacrylate) (PMMA) can also be used to make microfluidic devices and at the same time the methyl ester groups on the surface of the PMMA can be chemically derivatized with reactive groups to immobilize biomolecules [57].
6 Chemical Immobilization Techniques for Proteins
Some of the potential advantages of miniaturization can also be combined with other detection technologies for effective analysis of protein purification and interactions. Two examples of this are the coupling of microfluidics with ionization mass spectroscopy (ESI-MS) [58] or with micro-capillary electrophoresis [59]. Another technology that has been borrowed from the semiconductor industry in the field of integrated microarrays is the fabrication of microelectrode structures on silicon wafers. These gold, silver or platinum electrodes can be coated with functionalized monolayers or conducting polymers that are biocompatible and reactive towards biomolecules. On such electrodes, a voltage can be applied for the selective immobilization of the physiologically charged biomolecules. The microarray system from Nanogen is one example of such an approach, while the chip readout is still undertaken using conventional optical techniques [60]. Other approaches are one step ahead and use electrochemical detection of DNA hybridization in combination with redoxactive labels [61] (such as Motorola’s eSensorTM DNA detection system) or intercalators [62]. Currently these technologies are primarily focused on DNA arrays although in future, additional solutions for monitoring protein–protein or antibody–antigen interactions can also be envisaged by using such techniques. Technologies such as impedance spectroscopy or surface capacitance measurements [63] can potentially be the key to these application (e.g. at BIONCHIP [64]). In analogy to luminescence-based assays, electrochemical detection methods can also be amplified by ELISA techniques. Enzymes such as horseradish peroxidase or alkaline phospatase can be conjugated to the target molecules, and the reaction products can be measured by impedance spectroscopy [65] or with amperometic methods [66] if a suitable substrate has been added. A fully integrated electronic CMOS (Complementary Metal Oxide Silicon) biochip (Figure 6) based on enzyme-linked sensor techniques has been recently launched by Infineon [67].
6 Chemical Immobilization Techniques for Proteins
Generally, all the functional surface coatings described above can act as biocompatible interfaces during the chemical coupling of proteins to any microarray substrate, either via covalent binding, complex coordination chemistry or supramolecular interactions. In addition to the above methods, proteins also have a tendency to interact with the surface via electrostatic interactions. This principle is used in the case of simple electrostatic adsorption of proteins onto nylon, nitrocellulose, polyL-lysine or aminopropylsilane-coated surfaces. These interactions constitute what is widely known as non-specific interactions. Special SAMs based on polyethylene glycols [68] or poly(L-lysine)-g-poly(ethylene glycol) [69] are known to resist such interactions and are thus widely used in the design of most biosensor and microarrays. In the following sections, various strategies for coupling proteins using covalent, photochemical tags and site-directed immobilization approaches will be addressed.
17
18 Protein Microarrays: Surface Chemistry and Coupling Schemes
Fig. 6 CMOS-based biochip platform with integrated electronic microstructure (gold on silicon) suitable for an electrochemical or optical chip read-out by Infineon (top) and Bionchip (bottom). Images by coutesy of Infineon and Bionchip BV.
6.1 Covalent Chemical Coupling
In order to achieve covalent chemical coupling to microarray substrates, proteins should have the following chemical functionalities (see also Table 1) in the side chains of their polypeptide backbone: SH (Cysteine), NH2 (Lysine), COOH (aspartic acid, glutamic acid), OH (Serine), Ph-OH (Ph = phenyl, Tyrosine), imidazole (Histidine). In principle, all of them can be used during the direct chemical coupling reaction on specially prepared SAM surfaces. NHS (N-hydroxysuccinimide ester), aldehyde or epoxy groups form stable amide bonds with the free amino groups of the peptide (Scheme 6). The activation of carboxyl groups with EDC (N-3-dimethylaminopropyl-N-ethylcarbodiimide) and NHS, as well as further coupling schemes with NHS-modified surfaces are shown in Scheme 5. In another approach, the carboxylic acid groups in the side chains of the polypeptide can be activated by carbodiimides or NHS and can be covalently coupled to free amino groups on the sensor surface.
6 Chemical Immobilization Techniques for Proteins
Scheme 6 Surface coupling reaction of NHS-esters with the amino residues of the side-chains of polypeptides (lysine units). R, protein.
Other chemicals which are currently available for the covalent coupling of biomolecules to microarray surfaces include maleimides, pyridyl disulfide and vinyl sulfone, all of which will react with thiol groups [70] (Scheme 7). Epoxymodified surfaces can also be used for coupling hydroxy groups (see Scheme 4) and isothiocyanates for binding with amino groups [71] (Scheme 8). 6.2 Photochemical Cross-Coupling
In principle, there are many approaches available for the photochemical crosscoupling of biomolecules to specially functionalized surfaces. For example, the binding of biomolecules to epoxy- or aldehyde-coated slides can also be induced by irradiation with UV light. But there are also some typical photoreactive groups
Scheme 7 Surface modifications for the attachment of thiol residues (which are present in the side chains) of polypeptides (cysteine units). R, protein, 1, maleimide; 2, disulfides; 3, vinyl sulfone.
19
20 Protein Microarrays: Surface Chemistry and Coupling Schemes
Scheme 8 Immobilization of proteins and enzymes on aminoterminated monolayers using the bifunctional reagent DIDS (4,4 diisothiocyanato-trans-2,2-disulfonic acid disodium salt).
that are currently available. A prominent example is the azidophenyl group. Photoinduced crosslinking of proteins with azidophenyl-functionalized layers can be achieved by irradiation with UV or visible light. The excitation wavelength of this reaction can be changed by the use of additional substituents on the azidophenyl ring [72] (Scheme 9). Another strategy for the photochemical attachment of biomolecules has been developed by researchers at the commercial firm, Exiqon. They have taken advantage of an anthrachinone linker that can be coupled to a wide variety of solid polymeric supports (e.g. polystyrene, polyethylene, polypropylene, PMMA, nylon) by irradiation (Scheme 10) and can be conjugated with either biomolecules, ligands or reactive groups (e.g. carboxylic acids or amines) [73]. Yet another versatile method for the photochemical cross-linking of proteins to surfaces is the photoaptamer technique used by SomaLogic. However, in contrast to the above methods where the capture probes are photochemically immobilized, here the molecular recognition event is stabilized by consecutive photocrosslinking. Photoaptamers are chemically synthesized single-stranded DNA molecules that can be labeled at their 5 -end with acrydite, amine, thiol or biotin. These labels facilitate the covalent attachment of the photoaptamers to solid supports
Scheme 9 Photo-induced cross-linking of proteins R and azidophenylfunctionalized layers (8 = 265–275 mm; if the hydroxyl substituent is replaced with a nitro group, then 8 = 300–460 mm).
6 Chemical Immobilization Techniques for Proteins
Scheme 10 Exiqon’s patented anthrachinone (AQ) covalent attachment photochemical method. R, polymeric solid; P, specific ligand (biomolecule) or reactive group.
including beads, multi-well plates and activated slides. Photoaptamers derived from the PhotoSELEXTM process which substitutes a brominated deoxyuridine (BrdU) for the thymidine base (T) normally found in DNA, are high-affinity ssDNA compounds that can function as highly specific capture agents for proteins. The photocross-linking of the BrdU of the photoaptamer to an electron-rich amino acid side chain of the protein adds a second dimension of specificity to the normal protein binding event. Since photoaptamers can be covalently bound to their target analytes before signal detection, such arrays can be vigorously washed to remove non-specifically bound proteins, yielding superior signal-to-noise ratios [74]. The Versalinx Protein Microarray Technology by Prolinx utilizes a synthetic chemical affinity (Scheme 11) based on the reversible complexation of phenyldiboronic acid (PDBA) with salicylhydroxamic acid (SHA). The microarray surface is then coated with 3-D polymer brushes which have SHA as their terminal units [75]. This surface permits the reversible binding of PDBA-conjugated proteins. 6.3 Tagged Proteins
To prevent protein denaturation, it is preferable to design surface modifications that target protein functional groups which are furthest from their CDRs. It is important that the epitope mapping domains of the proteins are not affected or modified during the surface immobilization step. By understanding the chemical nature of the proteins of interest, smart surfaces can be constructed for use in highquality protein expression studies. Several protein or antibody samples currently exist in conjugated forms having so-called tags such as biotin, His-Tag, Strep-Tag, Avi-Tag, etc. Tags (Table 5) enhance the efficiency of the surface-coupling step but at the same time have a tendency to alter the properties of pristine proteins. Most research laboratories have tools that are adapted to these tags and if protein
21
22 Protein Microarrays: Surface Chemistry and Coupling Schemes
Scheme 11 Reversible formation of a complex of PDBA (phenyldiboronic acid)-conjugated proteins with SHA (salicyl hydroxymid acid)terminated polymer brushes.
chips have to be integrated into routine test procedures, then they also have to be compatible with such tagged molecules. The use of biotin–streptavidin technology in generating biocompatible surfaces has been highlighted in a large number of research articles [76]. Streptavidin surfaces can be used for immobilizing a wide range of biotinylated molecules (nucleic acids, proteins, saccharides, etc.) and biotin–streptavidin architectures have been widely employed as sensor supports for studying numerous biomolecular interactions. One of the reasons for the rapid growth in the application of biotin–streptavidin technology is the ease with which it is possible to biotinylate various biomolecules. In addition to conventional procedures, new developments are continuously being established in the area of biotinylation of biomolecules. In this area, it is worth mentioning a new method for labeling proteins with biotin by using the so-called AviTagTM technology [77]. Many commercial suppliers offer both biotinylated microwell plates/microarray slides and biotin-conjugated biomolecules Table 5 Examples of commercially available tagged proteins and their corresponding immo-
bilization systems Protein tag
Manufacturer of tagged proteins or antibodies Immobilization matrix
Biotin
Roche, MorphoSys; Biotinylation kits by Molecular Probes or Avidity IBA Novagen, ClonTech
Strep-Tag His-Tag
Streptavidin (Greiner, Perkin Elmer, Xenopore) Strep-Tactin (IBA) Ni-NTA complex (Greiner, Qiagen, Xenopore)
6 Chemical Immobilization Techniques for Proteins
Scheme 12 (A) Structure of N-(5-amino-1-carboxypentyl)iminodiacetic acid. (B) The quadridentate nitrilotriacetic acid (NTA) ligand forms a complex with four binding sites on the nickel metal which is present in the center. The two remaining binding sites can be coordinated with histidine ligands (L).
for biomolecular screening experiments. Another interesting tag for coupling to streptavidin-related surfaces is the Strep-Tag. The Strep-Tag consists of eight amino acids and has a high specific affinity for Strep-Tactin, a synthetically-engineered streptavidin derivative. Recombinant proteins or enzymes can be labeled with such a Strep-Tag. This technology is currently used for protein purification, although microplates and microarrays coated with Strep-Tactin can also be used as a platform for high-throughput screening assays [78]. A well-known label in protein chemistry is the widely used His-Tag. Proteins with six histidine amino acid units at the C- or N-terminus of the peptide chain can be expressed in E. coli. By analogy to affinity chromatography columns, nickel chelate complexes [79] can be used as immobilization matrices on biochip surfaces for His-tagged proteins. An example of a chelate complex is Ni-NTA (NTA = nitrilotriacetic acid). For this purpose, N-(5-amino-1-carboxypentyl)iminodiacetic acid can be synthesized by methods similar to those reported in the literature [80] or by solid phase peptide synthesis coupled to an activated carboxylic acid monolayer and subsequent treatment with a Ni2+ solution. The two free binding sites of the central nickel metal can be coordinated to two histidine units of the tag (Scheme 12). One of the main advantages of such systems is their re-usability. The immobilized proteins can be removed after the screening experiments by the addition of a stong nickel-complexing agent such as EDTA. Although the binding of a histidine tag to nickel is relatively weak compared to the biotin–streptavidin system, Snyder et al. [17] have obtained relatively good results from their nickel-coated protein chips. There are several commercial suppliers who provide Ni-based products for biomolecular screening. Nickel chelate-coated glass slides are, for example, provided by Greiner [80]. As part of their LiquiChip System, Qiagen also offers NiNTA-coated beads. This bead-based microplate platform can be used for a wide range of protein-based assays for proteomics and drug discovery [81]. Whitesides and others have also been granted a patent that covers the use of alkanethiol SAMs
23
24 Protein Microarrays: Surface Chemistry and Coupling Schemes
Scheme 13 Coupling of aldehyde residues of glycoproteins to hydrazide-terminated monolayers.
on gold surfaces which are terminated with Ni-chelate complexes for the coordination of polyamino acid-tagged biomolecules [82]. 6.4 Site-Specific Immobilization of Antibodies
Antibodies are known to bind reversibly to protein A-coated surfaces. Antibodies of the IgG type can be attached to protein A surfaces for high-throughput immunoassay applications by means of their Fc-domains of the heavy chains. Under such conditions, the Fab-domain (fragment of antigen binding) is oriented away from the surface and is easily available for the antigens in the sample solution. In situations where monoclonal IgG antibodies do not bind to protein A surfaces, protein G can be used for antibody immobilization. A review of various currently used immunoassay strategies for protein interactions has been recently reported by MacBeath [83]. Another covalent-linking approach involves the use of hydrazide-terminated monolayers. These functionalities bind exclusively to aldehyde groups that appear only in the glycosylated part of the heavy chains of antibodies, but not in the protein chains of the Fab-domains (Scheme 13).
7 Conclusions
This article has reviewed the numerous strategies which are currently available for immobilizing proteins and related biomolecule families. Since proteins are not as homogeneous as nucleic acids, a wide range of challenges will continue
References
to arise depending upon the nature of the end application. Protein denaturation will continue to be the rate-limiting step in most protein microarray platforms. However, selection of smart chemistries coupled with a sound knowledge of the structural and chemical properties of the probes (which are to be immobilized) will ultimately result in the development of reliable strategies which can be used in the near future to develop functional screening platforms for biomolecules.
References 1 [x1] A. ULMAN, Chem. Rev. 1996, 96, 1533–1554. 2 N. ZAMMATTEO, L. JEANMART, S. HAMELS, S. COURTOIS, P. LHEVESI, J. REMACLE, Anal. Biochem. 2000, 280, 143–150. 3 M. C. PIRRUNG, Angew. Chem. Int. Ed. 2002, 41, 1276–1289. 4 A. ULMANN, Chem. Rev. 1996, 1533–1554. 5 J. GUN, J. SAGIV, J. Colloid. Interface Sci. 1986, 112, 457. 6 K. BIERBAUM, M. GRUNZE, Adhes. Soc. 1994, 213. 7 R. COHEN, N. ZENOU, D. CAHEN, S. YITZCHAIK, Chem. Phys. Lett. 1997, 279, 270–274. 8 B. B. HAAB, M. J. DUNHAM, P. O. BROWN, Genome Biol. 2001, 2, 4.1–13. 9 G. MACBEATH, S. L. SCHREIBER, Science 2000, 289, 1760–1763. 10 T. O. JOOS, M. SCHRENK, P. H¨OPFL, K. KRO¨ GER, U. CHOWDHURY, D. STOLL, D. SCHO¨ RNER, M. D¨URR, K. HERRICK, S. RUPP, K. SOHN, H. H¨AMMERLE Electrophoresis 2000, 21, 2641–2650. 11 L. G. MENDOZA, P. MCQUARY, A. MONGAN, R. GANGHADHARAN, S. BRIGNAC, M. EGGERS, BioTechniques 1999, 27, 778–788. 12 B. D. MARTIN, B. P. GABER, C. H. PATTERSON, D. C. TURNER, Langmuir, 1998, 14, 3971–3975. 13 ] A. BERNARD. E. DELAMARCHE, H. SCHMID, B. MICHEL, H. R. BOSSHARD, H. BIEBUYCK, Langmuir 1998, 14, 2225–2229. 14 S. K. BHATIA, J. L. TEIXEIRA, M. ANDERSON, L. C. SHRIVER-LAKE, J. M. CALVERT, J. H. GEORGER, J. J. HICKMAN, C. S. DULCEY, P. E. SCHOEN, F. S. LIGLER, Anal. Biochem. 1993, 208, 197–205. 15 A. SREEKUMAR, M. K. NYATI, S. VARAMBALLY, T. R. BARRETTE, D. GHOSH, T. S. LAWRENCE, A. M. CHINNAIYAN, Cancer Res. 2001, 61, 7585–7593.
16 B. SCHWEITZER, S. ROBERTS, B. GRIMWADE, W. SHAO, M. WANG, Q. FU, Q. SHU, I. LAROCHE, Z. ZHOU, V. T. TCHERNEV, J. CHRISTIANSEN, M. VELLECA, S. F. KINGSMORE, Nature Biotechnol. 2002, 20, 359–365. 17 H. ZHU, M. BILGIN, R. BANGHAM, D. HALL, A. CASAMAYOR, P. BERTONE, N. LAN, R. JANSEN, S. BIDLINGMAIER, T. HOUFEK, T. MITCHELL, P. MILLER, R. A. DEAN, M. GERSTEIN, M. SNYDER, Science 2001, 293, 2101–2105. 18 L. BERTILSSON, B. LIEDBERG, Langmuir 1993, 9, 141–149. 19 a) R. G. NUZZO, D. L. ALLARA, J. Am. Chem. Soc. 1983, 105, 4481–4483. b) M. D. PORTER, T. B. BRIGHT, D. L. ALLARA, C. D. CHIDSEY, J. Am. Chem. Soc. 1987, 109, 3559–3568. 20 a) G. M. WHITESIDES, C. D. BAIN, Science 1988, 240, 62–63. b) L. STRONG, G. M. WHITESIDES, Langmuir 1988, 4, 546–558. 21 a) L. H. DUBOIS, B. R. ZEGARSKI, R. G. NUZZO, Proc. Natl. Acad. Sci. USA 1987, 84, 4739–4742. b) C. D. BAIN, G. M. WHITESIDES, J. Am. Chem. Soc. 1988, 110, 3665–3666. 22 K. L. PRIME, G. M. WHITESIDES, Science 1991, 252, 1164–1167. 23 Y. Y. LUK, M. KATO, M. MRKSICH, Langmuir 200, 16, 9604–9608. 24 E. OSTUNI, R. G. CHAPMAN, R. E. HOLMLIN, S. TAKAYAMA, G. M. WHITESIDES, Langmuir, 2001, 17, 5605–5620. 25 W. KNOLL, M. LILEY, D. PISCEVIC, J. J. SPINKE, M. J. TARLOV, Adv. Biophys. 1997, 34, 1–15. 26 a) L. S. JUNG, K. E. NELSON, C. T. CAMPBELL, P. S. STAYTON, S. S. YEE, V. PEREZ-LUNA, G. LOPEZ, Sensors Actuators B 1998, 54, 137–144. b) M. SCHA¨ FERLING, M. KRUSCHINA, F. ORTIGAO, M. RIEPL, K.
25
26 Protein Microarrays: Surface Chemistry and Coupling Schemes
27
28 29
30 31
32 33
34
35 36 37
38 39
40 41 42 43 44
ENANDER, B. LIEDBERG, Langmuir 2002, 18, 7016–7023. L. S. JUNG, K. E. NELSON, P. S. STAYTON, C. T. CAMPBELL, Langmuir 2000, 16, 9421–9432. M. MECKLENBURG, B. DANIELSSON, F. WINQVIST, Patent SE, PCT/EP 97/03317. P. PAVLICKOVA, N. MEJLHEDE JENSEN, H. PAUL, M. SCHA¨ FERLING, C. GIAMMASI, M. KRUSCHINA, W.-D. DU, M. THEISEN, M. IBBA, F. ORTIGAO, D. KAMBHAMPATI, J. Prot. Res. 2002, 1, 227–231. J. C. GARNO, N. A. WADU-MESTHRIGE, G.-Y. LIU, Langmuir 2002, 18, 8186–8192. E. DELAMARCHE, G. SUNDARABABU, H. BIEBUYCK, B. MICHEL, C. GERBER, H. SIGRIST, H. WOLF, H. RINGSDORF, N. XANTHOPOULOS, H. J. MATHIEU, Langmuir 1996, 12, 1997–2006. M. VEISEH, M. H. ZAREIE, M. ZHANG, Langmuir 2002, 18, 6671–6678. F. MORHARD, J. PIPPER, R. DAHINT, M. GRUNZE, Sensors Actuators B 2000, 70, 232–242. G. J. LEGGETT, C. J. ROBERTS, P. M. WILLIAMS, M. C. DAVIES, D. E. JACKSON, S. J. B. TENDLER, Langmuir 1993, 9, 2356–2362. I. WILLNER, E. KATZ, Angew. Chem. Int. Ed. 2000, 39, 1180–1218. W. SCHUHMANN, Mikrochim. Acta 1995, 121, 1–29. a) T. TATSUMA, T. WATANABE, Anal. Chem. 1991, 63, 1580–1585. b) M. MCRIPLEY, R. A. LINSENMEIER, J. Electroanal.Chem. 1996, 414, 235–246. A. RIKLIN, I. WILLNER, Anal. Chem. 1995, 67, 4118–4126. a) S. ANDREESCU, L. BARTHELMBS, J.-L. MARTY, Anal. Chim. Acta 2002, 464, 171–180. b) G. JEANTY, A. WOJCIECHOWSKA, J.-L. MARTY, M. TROJANOWICZ, Anal. Bioanal. Chem. 2002, 373, 691–695. J. GERSHONI, G. PALADE, Anal. Biochem. 1982, 124, 396–405. H. TOWBIN, T. STAEHELIN, J. GORDIN, Proc. Natl. Acad. Sci., USA 1979, 76, 4350–4354. B. JOHNSSON, S. LOFAS, G. LINDQUIST, Anal. Biochem. 1991, 198, 268–277. http://www.biacore.com R. BENTERS, C. M. NIEMEYER, D. W¨OHRLE, ChemBioChem 2001, 2, 686–694.
45 B. de BOER, H. K. SIMON, M. P. L. WERTS, E. W. van der VEGTE, G. HADZIIOANNOU, Macromolecules 2000, 33, 349–356. b) C. PREININGER, H. CLAUSEN-SCHAUMANN, A. AHLUWALIA, D. DEROSSI, Talanta 2000, 52, 921–930. 46 a) M. MENNIG, P. W. OLIVERA, H. SCHMIDT, Thin Solid Films 1999, 351, 99–102. b) W. E. VARGAS, J. Appl. Phys. 2000, 88, 4079–4084. c) W. QUE, Z. SUN, Y. L. CHAN, C. H. KAM, Thin Solid Films 2000, 359, 177–183. 47 P. ANGENENDT, J. GLO¨ KLER, D. MURPHY, H. LEHRACH, D. J. CAHILL, Anal. Biochem. 2002, 309, 253–260. 48 http://www.apbiotec.com 49 a) R. P. HUANG, R. HUANG, Y. FAN, Y. LIN, Anal. Biochem. 2001, 294, 55–62. b) R. WIESE, Y. BELOSLUDTSEV, T. POWDRILL, P. THOMPSON, M. HOGAN, Clin. Chem. 2001, 47, 1451–1457. c) S. W. TAM, R. WIESE, S. LEE, J. GILMORE, K. D. KUMBLE, J. Immunol. Methods 2002, 261, 157–165. 50 D. FIGEYS, Proteomics, 2002, 2, 373–382. 51 http://www.calipertech.com 52 http://www.nanogen.com 53 M. PAWLAK, E. SCHICK, M. A. BOPP, M. J. SCHNEIDER, P. OROSZLAN, M. EHRAT, Proteomics, 2002, 2, 383–393. 54 V. BENOIT, A. STEEL, M. TORRES, Y.-Y. YU, H. YANG, J. COOPER, Anal. Chem. 2001, 73, 2412–2420. 55 http://www.combimatrix.com 56 http://.www.zyomyx.com 57 S. A. SOPER, A. C. HENRY, B. VAIDYA, M. GALLOWAY, M. WABUYELE, R. L. MCCARLEY, Anal. Chim. Acta 2002, 470, 87–99. 58 a) D. FIGEYS, S. P. GYGI, G. MCKINNON, R. AEBERSOLD, Anal. Chem. 1998, 70, 3728–3734. b) C.-H. CHIOU, G.-B. LEE, H.-T. HSU, P.-W. CHEN, P.-C. LIAO, Sensors and Actuators B 2002, 86, 280–286. 59 a) H. BECKER, M. ARUNDELL, A. HARNISCH, D. H¨ULSENBERG, Sensors and Actuators B 2002, 86, 271–279. b) D. FIGEYS, Y. ZHANG, R. AEBERSOLD, Electrophoresis 1998, 19, 2338–2347. 60 P. SWANSON, R. GELBART, E. ATLAS, L. YANG, T. GROGAN, W. F. BUTLER, D. E. ACKLEY, E. SHELDON, Sensors Actuators B 2000, 64, 22–30. 61 C. J. YU, Y. WAN, H. YOWANTO, J. LI, C. TAO, M. D. JAMES, C. L. TAN, G. F.
References
62
63
64 65
66
67 68 69
BLACKBURN, T. J. MEADE J. Am. Chem. Soc. 2001, 123, 11155–11161. E. M. BOON, D. M. CERES, T. G. DRUMMOND, M. G. HILL, J. K. BARTON, Nature Biotech. 2000, 18, 1096– 1100. a) M. DIJKSMA, B. KAMP, J. C. HOOGLVLIET, W. P. BENNEKOM, Anal. Chem. 2001, 73, 901–907. b) M. RIEPL, V. M. MIRSKY, I. NOVOTNY, V. TVAROZEK, V. REHACEK, O. S. WOLFBEIS, Anal. Chim. Acta 1999, 392, 77–84. c) C. BERGGREN, B. BJARNASON, G. JOHANSSON, Biosens. Bioelectron. 1998, 13, 1061–1068. http://www.bionchip.com a) F. PATOLSKY, E. KATZ, A. BARDEA, I. WILLNER, Langmuir 1999, 15, 3703–3706. b) C. RUAN, L. YANG, Y. LI, Anal. Chem. 2002, 74, 4814–4820. C. N. CAMPBELL, D. GAL, N. CRISTLER, C. BANDITRAT, A. HELLER, Anal. Chem. 2002, 74, 158–162. http://www.infineon.com A. G. FRUTOS, J. M. BROCKMAN and R. M. CORN, Langmuir 2000, 16, 2192–2197. L. A. RUIZ-TAYLOR, T. L. MARTIN, F. G. ZAUGG, K. WITTE, P. INDERMUHLE, S. NOCK, and P. WAGNER Proc. Natl Acad. Sci. USA. 2001, 98, 852–857.
70 http://www.piercenet.com 71 a) I. WILLNER, A. RIKLIN, B. SHOHAM, D. RIVENZON, E. KATZ, Adv. Mater. 1993, 5, 912–915. b) K. LINDROOS, U. LILJEDAHL, M. RAITIO, A.-C. SYVa¨nen, Nucl. Acids Res. 2001, 29, e69. 72 a) J. RINKE, M. MEINKE, R. BRIMACOMBE, G. FINK, W. ROMMEL, H. FASOLD, J. Mol. Biol. 1980, 137, 301–304. b) J. CIESIOLKA, P. GORNICKI, J. OFENGAND, Biochemistry 1985, 24, 4931–4938. 73 http://www.exiqon.com 74 http://www.somalogic.com 75 http://www.prolinx.com 76 M. Sch¨aferling, S. SCHILLER, H. PAUL, M. KRUSCHINA, P. PAVLICKOVA, M. MEERKAMP, C. GIAMMASI, D. KAMBHAMPATI, Electrophoresis 2002, 23, 3097–3105. 77 http://www.avidity.com 78 http://www.iba-go.de 79 E. HOCHULI, H. D¨OBELI, A. SCHACHER, J. Chromat. 1987, 411, 177–184. 80 http://www.greinerbioone.com 81 http://qiagen.com 82 C. C. BAMDEN, G. B. SIGAL, J. L. STROMINGER, G. M. WHITESIDES, USP 6,197,515, March 2001. 83 G. MACBEATH, Nature Genet. 2002, 32, 526–532.
27
1
Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays Greta J. Wegner, Hye Jin Lee, and Robert M. Corn University of Wisconsin, Madison, USA
Originally published in: Protein Microarray Technology. Edited by Dev Kambhampati. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30597-1
1 Introduction
Multi-component arrays of immobilized biomolecules are rapidly becoming essential tools for the screening of bioaffinity interactions. Biomolecular array technology is especially amenable to proteomics research, since much of a protein’s function is accomplished through specific interactions with DNA, peptides, and other proteins. Array-based assay formats have been developed both for the detection of proteins in a sample [1], and for the identification of new protein–protein interactions [2, 3]. Arrays are also now regularly employed for fundamental studies of protein binding interactions. For example, peptide arrays have been used to study the binding interactions of antibodies [4, 5] and to characterize the sequence-specific activity of enzymes [6, 7]. Arrays of proteins have also been used for the high-throughput screening of protein–protein interactions and to identify novel biochemical activities [2, 3]. The majority of array-based assays currently employ fluorescent, enzymatic or radiolabeled biomolecules. Further advancement of analytical systems that do not require labeled biological molecules to detect affinity binding interactions is needed. Surface plasmon resonance (SPR) imaging is emerging as a label-free surface sensitive optical technique that measures the adsorption of solution phase molecules to arrays of biomolecules on thin gold films by changes in the local index of refraction. Label-free detection is particularly advantageous for the assay of small molecule-binding to proteins, since labels can interfere with the biological activity of the target molecule. To date, SPR imaging has been used to study reversible protein binding to DNA, peptide, protein, and carbohydrate arrays [8–10]. This review summarizes the recent advances that we have made in the implementation of SPR imaging to study DNA, RNA, and protein interactions with Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
DNA, peptide and protein probes immobilized in an array format. After a brief description of the SPR imaging technique in Section 2, two surface attachment chemistries used for the covalent immobilization of thiol-modified DNA, peptides, and carbohydrates through the modification of self-assembled monolayers of alkanethiols are discussed in Section 3. In Section 4, an array fabrication method based on chemical protection and photopatterning is described. Several examples of SPR imaging experiments using photopatterned arrays are presented including the study of DNA–DNA hybridization and protein–DNA binding interactions. We will conclude this article in Section 5 with a second array fabrication method based on microfluidic stenciling and address its application to peptide–protein and protein–protein binding interactions.
2 Surface Plasmon Resonance Imaging
Surface plasmon resonance (SPR) measurements are surface-sensitive optical methods used to characterize organic and biological layers on gold or noble metal thin films. These measurements utilize the optical field enhancement that occurs at a metal/dielectric interface when surface plasmons are created. Surface plasmons are electromagnetic waves, excited by p-polarized light, that propagate parallel to the gold surface. The optical fields decay exponentially from the surface of the metal and have a maximum decay length of about 200 nm. Within this region, the optical fields are sensitive to changes in the index of refraction caused by the adsorption or desorption of molecules onto the surface. As a consequence, SPR experiments have been amenable for the label-free study of reversible biomolecule binding interactions on gold films in real-time using two different methods: (1) scanning angle SPR and (2) SPR imaging techniques. Angle shift measurements are the most commonly undertaken SPR technique and have been popularized by the commercial Biacore instrument. In a scanning angle experiment, the percentage of light reflected off of a thin gold film optically coupled to a prism is measured as a function of incident angle. A prism or grating coupling arrangement is required because surface plasmons cannot be excited directly at planar air/metal or water/metal interfaces because momentum-matching conditions are not satisfied. A hemispherical prism is used to tightly focus light from a laser onto a single region of a gold thin film. Figure 1 shows three SPR curves and their theoretical fits where the percent reflectivity of light is measured as a function of the incident angle for light with excitation wavelengths of 1152 (), 814 (), and 633 nm (◦) [11]. For the SPR curve measured at an excitation wavelength of 814 nm, increasing the incident angle from 50 to 50.8◦ causes nearly 100% of the light to be reflected as the critical angle is reached and total internal reflection occurs. Further increase in the angle results in a steep drop in percent reflectivity as light is incorporated into surface plasmons until a minimum is reached at 53.5◦ , the surface plasmon angle. The position of the SPR angle is sensitive to changes in the index of refraction and/or thickness of the adsorbed film causing
2 Surface Plasmon Resonance Imaging
Fig. 1 SPR reflectivity curves showing the relationship between percentage reflectivity and angle of incidence for excitation wavelengths of 1152 nm (), 814 nm (), and 633 nm (◦) for an SF-10 glass/Au/water (in situ) assembly. Solid lines show the theoretical fit obtained by three-phase Fresnel calculations. Sharper SPR curves are obtained with increasing wavelength and can be applied to study thicker films with angle shift measurements and to enhance the contrast of SPR imaging experiments. The angle at which SPR imaging measurements are ob-
tained is marked by a vertical line on the SPR reflectivity curve obtained at an excitation wavelength of 814 nm. Inset shows the relationship between the change in percentage reflectivity (%R) and the change in index of refraction (nf ) at a fixed incident angle of 56.33◦ . A linear relationship is observed for %R less than 10%. Reprinted
with permission from Analytical Chemistry, 71 3928–3934, copyright 1999 American Chemical Society and Analytical Chemistry, 73 1–7, copyright 2001 American Chemical Society.
the minimum to shift. Shifts in angle position can be correlated to changes in thickness or index of refraction using N-phase Fresnel calculations as described by Hansen [12]. SPR measurements can be enhanced by using nearinfrared excitation wavelengths between 800 and 1100 nm. Mid-IR light results in sharper SPR curves than those obtained at an excitation wavelength of 633 nm from a HeNe source (refer to Figure 1). This allows a more precise determination of the SPR angle and can be used to measure thicker films. While scanning angle experiments are used to study a single region on a gold surface, SPR imaging is used to monitor the spatially resolved adsorption of biomolecules onto a multicomponent array. This technique is especially important for biological experiments where the simultaneous measurement of multiple interactions on a single chip is desired. SPR imaging measurements of the change in percent reflectivity are performed at a fixed angle near the SPR angle, as shown by a vertical line passing through the SPR reflectivity curve measured using an excitation wavelength of 814 nm as shown in Figure 1. Figure 2 shows a schematic representation of the SPR imaging instrument used to detect the adsorption of biopolymers in solution to surface-immobilized biomolecules such as DNA and peptides [11]. A collimated white light source is coupled with a narrow band pass
3
4 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
Fig. 2 Schematic diagram of the surface plasmon imager apparatus. P-polarized white light impinges on a prism sample apparatus at a fixed optimal angle, near the surface plasmon angle. The light is then passed through a narrow band pass filter and images are collected by a CCD camera. The analyte is delivered to the array fabricated on a gold thin film using a simple
500 :L flow cell. An SPR difference image of the sequence-specific hybridization of 18mer DNA to a two-component DNA array is shown in the bottom left corner of the figure. A maximum of 160 array elements can be studied on one SPR chip with a total surface area of 0.8 cm2 using square elements with 500 :m widths.
filter centered at 830 nm and used in place of a laser source. By using this configuration, laser fringes which can interfere with image quality are avoided and the enhanced contrast obtained using an NIR source is maintained. Images with lateral resolutions on the order of 50 µm can be obtained using an excitation wavelength of 830 nm. The light is passed through a polarizer in a direction incident to a high index glass prism/sample assembly. Here, light impinges on the back of a sample consisting of a gold thin film chemically modified with an array of biomolecules. Analytes are delivered to the array by either a 500-µL flow cell or through a set of parallel microfluidic channels from PDMS requiring 1 µL of sample per channel. Images are collected using a CCD camera after the light reflected off the sample is passed through a narrow band pass filter. The light interacts with the biomolecules immobilized on the thin gold film to create surface plasmons, inducing attenuation in the light reflected from the surface. As biomolecules from solution adsorb onto the array, the local index of refraction changes causing an increase in the percentage of incident light reflected off each array element of the sample, at a fixed optimal angle. The lower left corner of Figure 2 contains a typical SPR difference image showing sequence-specific adsorption of complementary DNA onto a two-component DNA array. Like the scanning angle SPR measurements, quantitative information can be obtained from SPR imaging measurements. The inset in Figure 1 shows the relationship between the percentage reflectivity and the index of refraction for changes in percentage reflectivity under 10% [13]. This relationship can also be applied to determine the surface coverage of biomolecules adsorbed onto a surface. SPR imaging measurements have been used for the quantitative evaluation of the interactions between biomolecules in solution and DNA or peptide probes immobilized
3 Surface Attachment Chemistries
on the array. In the case of biomolecules that have a high molecular weight, the number of immobilized surface target sites can be decreased in order to stay within this linear %R region [10]. The sensitivity of the SPR imaging approach is about 10 femtomoles (1 µL of a 10 nM solution) for 18-mer single-stranded DNA hybridized to a DNA array [14], and 1 femtomole (1 µL of a 1 nM solution) for specific antibody adsorption onto a peptide array [10]. Although SPR imaging measurements do not match the sensitivity of fluorescence or radioactive methods, it is more than adequate for many applications including the study of protein–protein interactions and has the advantage of employing direct detection of binding interactions.
3 Surface Attachment Chemistries
The development of well-characterized surface chemistries for the attachment of biological molecules onto gold thin films in an array format is necessary for SPR imaging measurements of biomolecular affinity interactions. Noble metal thin films are required for the propagation of surface plasmons. As a consequence, commercially available DNA arrays made on glass substrates, such as those produced by Affymetrix [15–18], or peptide arrays prepared by the SPOT synthesis technique on cellulose membranes [19–21], are not viable options for SPR imaging investigations. Instead, we have employed self-assembled monolayers of long chain alkanethiols that are ω-terminated with an amine functional group as the foundation of the array [8, 9, 11, 22]. Chemical modification of the self-assembled monolayers is used to tether biological molecules to the surface. In this section, two methods using the bifunctional linkers SSMCC (sulfosuccinimidyl 4-(N-maleimidomethyl) cyclohexane-1-carboxylate) and SATP (N-Succinimidyl S-acetyl-thiopropionate) are described for the covalent attachment of thiol-containing DNA, peptide, carbohydrate and capture probe molecules. 3.1 SSMCC Attachment Chemistry
In the first reaction scheme, the heterobifunctional linker SSMCC, which contains both an N-hydroxysulfosuccinimide (NHSS) ester and a maleimide functionality, is used to covalently link the amine-terminated self-assembled monolayer of 11mercaptoundecylamine (MUAM) to a thiol-modified probe molecule (Figure 3a) [8, 22]. First, the NHSS ester moiety of SSMCC is reacted with the free amines of the alkanethiol monolayer to form amide bonds terminating with maleimide groups. Next, thiol-modified DNA or cysteine-containing peptides are introduced to the thiol-reactive maleimide monolayer and attached by the creation of a thioether bond. DNA molecules attached by this method have a surface coverage of 1.0 × 1012 molecules cm−2 [13].
5
6 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
Fig. 3 Surface attachment chemistry for the immobilization of thiol-modified DNA and cysteine-containing peptides: (a) The linker SSMCC is reacted with a wellpacked self-assembled monolayer of 11mercaptoundecylamine (MUAM) to create a maleimide-modified surface. The maleimide surface is then used to covalently attach thiol-modified DNA or cysteine-containing
peptides. (b) In the second approach, SATP is reacted with MUAM to create a protected thiol surface. Upon deprotection with a basic solution containing DTT, the free sulfhydryl is reacted with dipyridyl disulfide to create a pyridyl disulfide surface. Thioldisulfide exchange reactions are used to couple thiol-containing biomolecules to the surface.
3.2 SATP Attachment Chemistry
An alternative reaction scheme, more amenable for generating peptide arrays, involves the attachment of thiol-containing biomolecules onto gold surfaces through a disulfide linkage [10, 23]. The reaction scheme for this surface modification process is shown in Figure 3b. In this approach, a self-assembled monolayer of MUAM is reacted with the molecule SATP, a bifunctional linker containing an NHS ester and a protected sulfhydryl. The NHS ester reacts with the amines present on the surface to form a stable amide bond resulting in a protected sulfhydryl surface. Exposure of the surface to an alkaline solution containing DTT removes the protecting acetyl group revealing an active sulfhydryl surface. Next, the sulfhydryl surface is reacted with 2,2 -dipyridyl disulfide to form a pyridyl disulfide surface. Thiol–disulfide exchange reactions are then performed in order to attach thiol-modified DNA or cysteine-containing peptides to the substrate, with a surface coverage of 1013 molecules cm−2 . This type of surface attachment chemistry has the advantage of being reversible, as the disulfide bond can be cleaved in the presence of DTT to regenerate the sulfhydryl-terminated surface.
4 SPR Imaging Experiments Using Photopatterned Arrays
4 SPR Imaging Experiments Using Photopatterned Arrays
Arrays used for SPR imaging experiments contain multiple individually addressable components and are prepared by a combination of self-assembly, surface attachment chemistry, and array patterning using either photopatterning or PDMS microfluidic channels. This section will explore the first array fabrication methodology developed within our laboratories for the creation of robust DNA and peptide arrays using chemical protection and deprotection and photopatterning. This multistep fabrication procedure is outlined in Figure 4 [8]. The first step of the fabrication method is the creation of a temporary hydrophobic background using the bulky amine protecting group, 9-fluorenylmethoxycarbonyl (Fmoc), frequently used in solid phase peptide synthesis. The N-hydroxysuccinimide ester of Fmoc (FmocNHS) is reacted with the terminal amines of a packed-self assembled alkanethiol
Fig. 4 Multi-step array fabrication process using protection/deprotection chemistry and photopatterning techniques to prepare DNA and peptide arrays. The bottom left corner of the figure shows individually addressable hydrophilic drops of DNA pinned to the surface by a hydrophobic background. First, a bare gold surface is modified with a self-assembled monolayer of 11-mercaptoundecylamine (MUAM). This amine-terminated surface is reacted with the hydrophobic protecting group Fmoc.
UV-light is used to break the gold–thiol bond and create bare gold pads on the surface. These pads are subsequently filled with MUAM. A bifunctional linker is used to attach DNA or peptide probes to create an array. Finally, the Fmoc is removed with base and replaced with a polyethylene derivative to prevent the non-specific adsorption of biopolymers to the background. Reprinted
with permission from Journal of the American Chemical Society, 121, 8044–8051. Copyright 1999 American Chemical Society.
7
8 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
monolayer to form a stable carbamate linkage. The surface is then exposed to UV light through a patterned quartz mask to create arrays composed of 500 by 500 µm square elements with 500 µm spacing between the elements. The gold–thiol bond is cleaved in the regions where the light shines through the mask resulting in a patterned gold surface that consists of bare gold pads surrounded by Fmoc. Next, the surface is immersed in an ethanolic solution of MUAM. The amine-terminated alkanethiols self-assemble in the bare gold pads producing a surface containing reactive hydrophilic MUAM pads surrounded by the hydrophobic Fmoc background. The lower left-hand corner of Figure 4 contains an image of an array showing the individually addressable hydrophilic spots surrounded by a hydrophobic background. Chemical protection allows droplets of hydrophilic DNA or peptides to be “pinned”, without contaminating other sequences contained on neighboring pads on the array, and covalently attached using the linkers SSMCC or SATP. These arrays can be easily used with a mechanical array spotter to introduce multiple peptide or DNA elements onto one chip for high-throughput studies. For example, arrays composed of more than 300 elements can be produced using square elements with 250-µm widths. The array fabrication is then completed by removing the hydrophobic Fmoc background and replacing it with polyethylene glycol (PEG) to reduce the nonspecific adsorption of biopolymers. Since Fmoc is a base-labile protecting group, the original amine-terminated surface can be easily regenerated by exposure to a secondary amine. An NHS derivative of polyethylene glycol (PEG) can react with the terminal amines of the SAM to form a stable amide linkage. The formation of the temporary Fmoc surface is necessary, since PEG is too hydrophilic to prevent contamination between DNA or peptide droplets. The DNA arrays can then be analyzed by SPR imaging using a 500 µL flow cell to introduce biopolymer analytes to the array surface. DNA arrays fabricated using SSMCC attachment chemistry can be used for more than 25 hybridization cycles [13] while the SATP chemistry can be used for more than 30 cycles without degradation of the array [23]. 4.1 DNA–DNA Hybridization
The hybridization of complementary DNA to a five-component DNA array fabricated by photopatterning and SSMCC chemistry was examined by SPR imaging. All surface-immobilized DNA probes were tested by hybridization to respective perfect-match DNA complements. The probes were immobilized in a distinct geometric pattern to easily identify interacting sequences. The DNA sequences used in this experiment are listed in Table 1, and denoted as A–E. The surface was first exposed to a 100 nM solution of the 16-mer DNA oligonucleotide complement to probe A for 15 min (Figure 5a). The SPR difference image shown in this figure was produced by subtracting the images taken before and after hybridization. Regeneration of the surface was achieved by rinsing with 8 M urea. Repeated cycles of hybridization and denaturation were used to obtain Figure 5b–e.
4 SPR Imaging Experiments Using Photopatterned Arrays Table 1 DNA sequences used in Figure 5
Surface-bound probe DNA
DNA sequences (5 −3 )
Probe A Probe B Probe C Probe D Probe E
TAC TCA CCC GTC CGC C CTT TTA TGT TTG AAC CAT GCG GTG GCC GAT CAC CCT CTC CGT GGC TTT CTG GTT A CTT TGA GTT TCA GTC
Probe DNA is modified with a 5 -thiol modifier and a 15-T spacer is appended to the 5 end of the probe DNA.
Excellent sequence-specific adsorption of the complementary DNA was observed for all probes and with little non-specific adsorption to the background. The approximately equal SPR signal under identical experimental conditions indicates that the probes are equally accessible for hybridization to target molecules in solution. 4.2 Mismatch Binding Protein, MutS
Mismatch binding proteins, such as MutS in E. coli are used by cells to identify and repair mismatches, short insertions, and deletions that are caused by replication
Fig. 5 SPR difference images showing the hybridization of perfect-match DNA complements to a DNA array containing five different probes. The sequences were immobilized on 500 by 500 :m array elements in the pattern shown in the figure. (a) First, the array was exposed to a 100 nM solution of the DNA complement to probe A for 15 min. Hybridization adsorption onto the ar-
ray is indicated by a change in the percentage reflectivity of incident light. (b) After rinsing with 8 M urea to regenerate the surface, the experiment was repeated with the DNA complement of probe B. Successive rounds of denaturation and hybridization of the remaining DNA complement probes resulted in images c–e. The same region of the array is shown for all images.
9
10 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
errors. The ability to identify mismatches in DNA is necessary for the technology used to detect mutations and single nucleotide polymorphism in double-stranded DNA. Such assays may be important in the future for correlating genomic information from an individual with the development of diseases [24–26]. As an example, SPR imaging was used to show the interaction of MutS with mismatch DNA sequences (A–D) immobilized onto an array [9, 27]. The sequences and additional experimental information are reported elsewhere [9]. This photopatterned array was exposed to a solution containing the sequence Z. Probe A did not hybridize with sequence Z and remained single stranded. Sequence Z formed a duplex with probe B containing a two-base insertion, a perfectly matched duplex with probe C, and a duplex containing a one-base G/T mismatch with probe D. Figure 6 shows an SPR difference image after MutS was allowed to bind to the DNA array. Adsorption of MutS was observed to both duplex B containing the insertion and duplex D containing the G/T mismatch, which has the strongest mismatch interaction with MutS [28]. Minimal binding occurred to the perfect match or the single-stranded DNA, showing that it was possible to study the sequence-specific interactions of MutS to DNA using SPR imaging measurements of DNA arrays. The utility of mismatch binding proteins to identify single-base mismatches, as well as short insertions and
Fig. 6 SPR imaging measurements of E. coli mismatch binding protein MutS adsorption onto a DNA array. This array was created with DNA probes A through D immobilized on 500 by 500 :m array elements in the pattern shown on the left of the figure. The array was then exposed to a solution containing the sequence Z. Z does not bind at all to probe A leaving it single strandedbut binds to probe B to create a duplex containing a two-base insertion, to probe C in a perfectly complementary manner, and to probe D to form a duplex containing a
single-base mismatch. An SPR difference image of the binding of MutS to the array is shown on the right of the figure. The image shown is the difference between two images collected before and after exposure of the surface to MutS. Only 12 array elements from the total patterned surface area of 0.8 cm2 are presented in these images. Reprinted with permission from An-
nual Review of Physical Chemistry 51, pp 41–63. Copyright 2000 by Annual Reviews www.annualreview.org.
5 SPR Imaging Experiments of Arrays Created by Microfluidic Stencils 11
deletions in double-stranded DNA will lead to the application of mismatch binding proteins in mutation and single nucleotide polymorphism detection assays. 4.3 Bacterial Response Regulators, OmpR and VanR
Response regulators are DNA binding proteins that are essential components of the two-component signal transduction phosphorylation relay system used by bacteria to respond to environmental stimuli and to adapt accordingly through transcriptional activation of certain genes [29]. Since few response regulators have been studied in depth, DNA arrays are a potential tool to determine the DNA sequences recognized by a particular response regulator. As an example, the sequence-specific binding of two response regulators, OmpR and VanR, to DNA probe molecules was studied [30]. OmpR is one of the most well-characterized response regulators and is important in bacterial response to environmental osmotic pressure [31], while VanR is involved in the development of antibiotic resistance [32]. A photopatterned array was prepared containing six different immobilized DNA probes known to interact with either OmpR or VanR. The DNA array was then exposed to solutions of corresponding complementary DNA to create surfaceimmobilized double-stranded DNA. Figure 7a shows an SPR difference image after the array was exposed to a 100 nM solution of OmpR. OmpR primarily absorbed to the DNA sequences OmpF1, F1F2, and C2 which were known to interact only with OmpR. The array was then regenerated with urea and exposed to complementary DNA before VanR (500 nM) was introduced to the array. VanR primarily bound to the DNA sequences VanH1, H2, and R1 (Figure 7b). Little nonspecific adsorption of either VanR or OmpR to the other sequences on the chip was observed. DNA probe sequences for VanR and OmpR response regulators can be found in the literature [30]. In addition to sequence specificity, SPR imaging measurements of DNA arrays can also be used to monitor inhibitor reactions that obstruct the binding of response regulators to DNA controlling gene expression in the bacteria. Figure 8 shows the effect of the inhibitor CpdA on VanR binding to the DNA sequences VanH1, H2, and R1 as measured by SPR imaging. The changes in percentage reflectivity were measured as a function of VanR concentration with and without 5 µM of CpdA. Integration of the line profile was performed to determine the average change in percentage reflectivity for the DNA sequences VanH1, H2, and R1. As can be seen in Figure 8, more adsorption is observed in the absence of the inhibitor than when CpdA is included with the VanR solution.
5 SPR Imaging Experiments of Arrays Created by Microfluidic Stencils
We have recently developed a second fabrication and detection approach based on the coupling of microfluidic networks to SPR imaging measurements with
12 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
Fig. 7 Surface plasmon imaging difference images of response regulator adsorption to double-stranded DNA immobilized on a photopatterned array composed of 500 by 500 :m array elements. (A) Specific adsorption of a 100 nM solution of OmpR to the DNA sequences OmpF1, F1F2, and C2. (B) VanR (500 nM) adsorbs to the DNA sequences VanH1, H2, and R1. OmpR is known to bind to the DNA sequences
OmpF1, F1F2, and C2 and VanR is known to bind VanH1, H2, R1. There is little nonspecific adsorption of the response regulators to the other sequences on the chip. DNA probe sequences for VanR and OmpR response regulators can be found in the literature. Reprinted with permission from Langmuir, Vol. 19, 1486–1492. Copyright 2003 American Chemical Society.
Fig. 8 Plot showing the change in percentage reflectivity observed when VanRphosphate (•) and VanR-phosphate plus 5 mM CpdA (◦) adsorb to VanR DNA promoter sequences as a function of protein concentration. Integration of the line profile of VanR binding to DNA sequences VanH1, H2, and R1 was used to determine the av-
erage percentage reflectivity at each protein concentration. Less VanR-phosphate adsorbs to the DNA probes in the presence of the inhibitor CpdA than in its absence.
Reprinted with permission from Langmuir, Vol. 19, 1486–1492. Copyright 2003 American Chemical Society.
5 SPR Imaging Experiments of Arrays Created by Microfluidic Stencils 13
the specific aim of lowering detection limits, reducing analysis time, and decreasing chemical consumption and sample volume [14]. This approach relies on the construction of microfluidic networks within the polymer polydimethylsiloxane (PDMS), which are physically sealed to chemically-modified gold surfaces. These PDMS microchannels were used in two ways: (i) to fabricate “1-D” arrays consisting of lines of immobilized peptides and DNA probes and (ii) to create “2-D” detection arrays in which a second set of PDMS microchannels were placed perpendicular to a 1-D line array in order to deliver target samples with small volume [14]. 5.1 1-D Peptide Array for Antibody Binding Measurements
The fabrication of 1-D line arrays on gold surfaces is as follows. First, a set of parallel microchannels from PDMS were created by replication from 3-D silicon wafer masters that were created photolithographically from a 2-D chrome mask pattern (Figure 9) [33, 34]. The microchannels were physically attached to a gold substrate modified with MUAM. A simple differential pumping system was used to introduce the chemical linkers SATP or SSMCC into the microchannels. Next, the DNA or peptide probe molecules were reacted within the microchannels. Once the immobilization was complete, the channels were removed and the alkanethiolterminated monolayer was treated with PEG-NHS to prevent non-specific adsorption to the surface. Since the PDMS microchannels physically define where the probe molecules are immobilized, protection and deprotection chemistry is not required.
Fig. 9 Schematic representation of the microfabrication strategy used to create DNA, peptide and protein linear arrays. A self-assembled monolayer of MUAM was formed on a clean gold surface. Polydimethyl siloxane (PDMS) microchannels were used to deliver bifunctional linkers and
probe biomolecules to the surface. The microchannels were removed and the background was protected with a polyethylene glycol derivative. Adapted with permission from Analytical Chemistry, 73 5525–5531. Copyright 2001 American Chemical Society.
14 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
Fig. 10 SPR difference images showing the sequence-specific binding interactions of antibodies to a peptide linear array containing two probes A and B corresponding to the peptide sequences Myc and HA. The peptides were immobilized in the pattern shown in the figure. (a) First, the array was
exposed to a 25 nM solution of anti-Myc. Specific adsorption of anti-Myc to probe A is observed. (b) After rinsing with pH 11.5 buffer to regenerate the surface, the experiment was repeated 100 nM anti-HA. AntiHA primarily bound to probe B, with little non-specific interaction with probe A.
By changing the spacing of the channels, up to 100 different peptide or DNA sequences can be immobilized on one chip. The array can then be coupled with a large volume flow cell (500 µL) to introduce analytes to the line array. SPR imaging measurements were used to monitor the sequence-specific adsorption of antibodies to a two-component peptide array. The linear array contained the peptide sequences, Myc and HA denoted A and B, respectively (Figure 10). The introduction of a 25 nM solution of anti-Myc though a 500 µL flow cell resulted in specific adsorption to probe A. The anti-Myc was removed from the surface using pH 11.5 buffer prior to exposing the array to a solution of 100 nM anti-HA. Anti-HA primarily bound to probe B, although some non-specific adsorption probe A was observed. 5.2 2-D DNA Array for RNA Hybridization
In a second approach, microfluidic channels were used as a small volume flow cell during SPR imaging measurements to reduce the required analyte and to simultaneously introduce multiple analytes. First, a 1-D array composed of DNA elements was prepared and the background protected with PEG. Then a second set of PDMS channels was placed perpendicular to the immobilized DNA lines, forming a 2-D array, and used to introduce analytes to the array surface during SPR imaging
5 SPR Imaging Experiments of Arrays Created by Microfluidic Stencils 15
Fig. 11 (a) Schematic diagram of parallel polydimethyl siloxane (PDMS) microchannels used to deliver small volumes of analyte (1 :L) to peptide and DNA linear arrays. (b) Schematic representation of the
SPR imaging experimental set-up incorporating PDMS microfluidics to deliver a small volume of target sample. This configuration allows different analytes to be delivered through each channel.
experiments (Figure 11a). Figure 11b shows the schematic diagram of an SPR imager set-up where the large flow cell (500 µL) has been replaced with microfluidic channels constructed in PDMS reducing the required sample volume to 1 µL. An example of an SPR difference image of a three-component 2-D hybridization DNA array exposed to GUS gene ssRNA through PDMS microchannels is shown in Figure 12. About 20 femtomoles of RNA in a total solution volume of 1 µL was delivered through each channel. The DNA sequences and more information are reported elsewhere [14]. Varying SPR signal intensities were observed for the three different probes in Figure 12 and can be attributed to differences in the binding efficiency of the GUS gene ssRNA to the different surface-bound ssDNA probes. The ability to detect 20 femtomoles of ssRNA is sufficient for a number of biological applications, including the direct detection of messenger RNA (mRNA) from
16 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
Fig. 12 An SPR difference image showing adsorption of GUS gene ssRNA onto DNA probes. Each channel was 300 :m wide, 35 :m deep, and 1.2 :m long with 900 :m spacing between channels. Reprinted with permission from Analytical Chemistry, 73 5525–5531. Copyright 2001 American Chemical Society.
highly expressed genes and the detection of ribsomal RNA (rRNA) from complex biological samples. 5.3 2-D Peptide Array for Antibody Binding Measurements
SPR imaging technology integrating microfluidics was applied to monitor the sequence-specific interactions of antibodies to peptide motifs for epitope mapping applications. For example, peptide arrays were used to study the residues essential to the binding motif of anti-FLAG M2, an antibody commonly used for the purification of fusion proteins [35]. SPR imaging is currently used to study cell adhesion processes and to investigate the enzymatic modifications of peptides by sequence-specific phophatases, kinases, and proteases. Ultimately, these techniques could be used to identify the important residues in previously uncharacterized protein–protein interactions based on peptide recognition motifs. Sequence-specific adsorption of anti-FLAG M2 to four different peptide epitopes based on the FLAG peptide tag was studied with SPR imaging [10]. Peptide line arrays were created using a microfluidic fabrication process and coupled to a second microchannel with a wraparound design (Figure 13a). The “worm” microchannel has a total sample volume of 5 µL and is used to deliver antibody to the peptide array. Figure 13b shows a difference image obtained by subtracting images of a fourcomponent peptide array before and after the introduction of a solution of 100 nM anti-FLAG M2. A line profile taken across the channels indicated that the greatest amount of antibody adsorption is observed at elements where the original sequence, F1, was immobilized. The least amount of binding was observed to the peptide sequences F3 and F4. These peptides differed from the original sequence by the substitution of an amino acid important to the binding motif by
5 SPR Imaging Experiments of Arrays Created by Microfluidic Stencils 17
Fig. 13 (a) A worm microchannel is used to deliver 5 :l aliquots of antibody solution to the peptide linear array. This microfluidic configuration is suitable for the delivery of a small volume of a single analyte to the array with a detection area equivalent to a configuration using a large volume flow cell. The worm channel was 500 :m wide, 35 :m deep, and 12 mm long with 500 :m spacing between channels. (b) SPR difference image showing the adsorption of 100 nM anti-FLAG to a peptide array composed
of four epitopes, differing by a single amino acid, based on the FLAG peptide sequence. The line profile reveals that the greatest adsorption occurs at elements containing the original sequence, F1. Diminished adsorption is observed at sequences, F3 and F4, containing alanine substitutions for essential residues of the binding motif. Peptide probe sequences can be found in the literature. Reprinted with permission from Analytical Chemistry, 74, 5161–5168. Copyright 2002 American Chemical Society.
an alanine residue. In contrast to the F3 and F4 peptides, the F2 sequence had a substantially higher signal, since this peptide contained an alanine substitution for a non-essential residue in the binding motif. These data demonstrate that SPR imaging can be used to compare the amount of protein binding to peptides differing by a single amino acid and to determine the importance of each residue for a peptide–protein interaction.
18 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
Fig. 14 Langmuir isotherms fitted to the adsorption of anti-FLAG M2 onto an array containing the peptides F1 (•) and F2 (◦). The change in percentage reflectivity was determined by integrating the line profiles for the F1 and F2 peptides at varying anti-FLAG M2 concentrations. The adsorp-
tion constants were calculated by the fit of the Langmuir isotherms, to be 1.5 × 108 M−1 and 2.8 × 107 M−1 for F1 and F4 respectively. Reprinted with permission
from Analytical Chemistry, 74, 5161–5168. Copyright 2002 American Chemical Society.
Quantitative SPR imaging measurements can also be used to simultaneously determine adsorption constants for the interactions of antibodies with multiple peptide motifs on a single chip [10]. The change in percentage reflectivity was measured for the peptide sequences F1 and F2 at increasing antibody concentrations. This data is shown in Figure 14 where the SPR signal resulting from antibody adsorption is plotted as a function of anti-FLAG M2 concentration, ranging from 1 to 150 nM. These data points were fitted with a Langmuir isotherm. The adsorption constants were found to be 1.5 × 108 M−1 for F1 and 2.8 × 107 M−1 for F2, showing that the original sequence had a stronger antibody peptide interaction. The detection limit for the F1 peptide with anti-FLAG M2 was 0.5 nM. 5.4 2-D Protein Array
With the sequencing of many genomes complete, it has become clear that highthroughput methods for studying gene products is necessary in order to elucidate the vast amount of information encoding biological function in an organism. Such studies would be enhanced by the addition of label-free analytical tools such as SPR imaging. The utility of SPR imaging for the study of protein–peptide interactions demonstrated the feasibility of this technique for measuring the adsorption of biomolecules to immobilized proteins. The first step for adapting SPR imaging measurements to the study of protein–protein interactions is the creation of a robust reproducible, biologically-active protein array. Most protein arrays are constructed
5 SPR Imaging Experiments of Arrays Created by Microfluidic Stencils 19
by random attachment strategies, using naturally occurring lysines and terminal amines for surface attachment [36]. However, such approaches have the potential to interfere with the binding site of the protein, negatively affecting biological activity. To target such problems, we have established a general methodology to create oriented arrays of fusion proteins attached to gold thin films. In an oriented fusion protein array, the fusion protein or tag will interact with a capture agent immobilized on the gold thin film leaving the protein of interest free to interact with proteins in solution. We are currently investigating several fusion protein/capture agent pairs including the histidine peptide tag with NTA [37–39], glutathione Stransferase (GST) with glutathione or anti-GST [40, 41], and maltose binding protein (MBP) with maltose [42, 43]. Since these tags and capture agents are commonly used for the purification of fusion proteins, they are well-characterized and the proteins can be released to follow up SPR imaging measurements with further off-chip solution phase experiments. Protein arrays fabricated using NTA-modified gold films to capture histagged proteins are a promising tool for the study of protein–protein interactions using SPR imaging (see Figure 15a). A worm microchannel was used to pattern a selfassembled alkanethiol monolayer on a gold thin film with NTA capture agents. This microchannel was then removed and the whole surface was reacted with a polyethylene glycol derivative to prevent the non-specific adsorption of proteins
Fig. 15 (a) Schematic representation of a general approach for the creation of an oriented fusion protein array on gold surfaces. (b) The NTA capture agent was immobilized through a worm microchannel. Next, a parallel set of PDMS channels was used to simultaneously deliver multiple histidinetagged analytes to the NTA elements. (c) An example of an SPR difference image
showing the immobilization of histidinetagged ubiquitin B and polyhistidine D to NTA-modified monolayers in the presence of nickel ions. In the absence of nickel ions there is little non-specific adsorption of histagged ubiquitin A or polyhistidine C to the NTA surface, suggesting that a specific chelation interaction is occurring.
20 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
to the background around the NTA capture agents. Next a second set of parallel microchannels oriented perpendicular to the immobilized NTA lines was used to introduce proteins to the surface (Figure 15b). This fabrication method provides a convenient and fast way to create arrays composed of one protein per channel. As shown in the SPR difference image in Figure 15c, both polyhistidine and histidinetagged ubiquitin interact with the NTA surfaces in the presence of nickel ions, while little adsorption of these biomolecules is observed when nickel ions are not present, showing the specificity of the interaction. This protein fabrication strategy has the advantage that SPR imaging measurements can be used to evaluate protein immobilization onto the array prior to using the chip to study protein–protein interactions. Our current efforts are focused on the application of SPR imaging measurements to study the interactions of protein, DNA, and peptides to protein chips. Initial results indicate the SPR imaging techniques will become an innovative tool for proteomics research. 6 Conclusions
SPR imaging is emerging as a promising tool for the study of bioaffinity interactions in an array format. This review presented various recent applications of SPR imaging measurements for the study of DNA, RNA, and protein interactions with biomolecular arrays. The key components for these experiments are a wellcharacterized chemical modification of gold thin films to immobilize biomolecular probe molecules coupled with a robust array fabrication method based on photopatterning or microfluidics. The development of two different methods of array fabrication has increased the versatility of the SPR imaging technique; the linear arrays created from microfluidics are particularly efficient in terms of time and materials for arrays requiring only a small number (< 100) of different surfacebound biomolecules, whereas the generalized photopatterned array fabrication methodology is more applicable to systems where the number of surface-attached biomolecules is greater than 100. The successful application of DNA, peptide, and carbohydrate arrays to study protein binding interactions demonstrates the wide variety of bioaffinity interactions that can be observed with SPR imaging. The study of protein–protein interactions by SPR imaging is still in its infancy, but we predict that this technology will have a significant role to play in the high-throughput screening of protein–protein interactions, the study of synthetic chemical libraries, the binding and inhibition of cell receptors, and the elucidation of regulatory pathways in biological systems. 7 Acknowledgments
This research is funded by the National Institute of Health (3R01 GM59622-03S 1 and 8 ROI EB00269-02) and the National Science Foundation (CHE-0133151).
References
References 1 MILLER, J. C., BUTLER, E. B., TEH, B. S., HAAB, B. B. The application of protein microarrays to serum diagnostics: Prostate cancer as a test case. Disease Markers 2001, 17, 225–234. 2 ZHU, H., KLEMIC, J. F., CHANG, S., BERTONE, P., CASAMAYOR, A., KLEMIC, K. G., SMITH, D., GERSTEIN, M., REED, M. A., SNYDER, M. Analysis of yeast protein kinases using protein chips. Nature Genet. 2000, 26, 283–289. 3 ZHU, H., BILGIN, M., BANGHAM, R., HALL, D., CASAMAYOR, A., BERTONE, P., LAN, N., JANSEN, R., BIDLINGMAIER, S., HOUFEK, T., MITCHELL, T., MILLER, P., DEAN, R. A., GERSTEIN, M., SNYDER, M. Global analysis of protein activities using proteome chips. Science 2001, 293, 2101–2105. 4 LAUNE, D., MOLINA, F., MANI, J.-C., DEL RIO, M., BOUANANI, M., PAU, B., GRANIER, C. Dissection of an antibody paratope into peptides discloses the idiotope recognized by the cognate anti-idiotypic antibody. J. Immun. Methods 2000, 239, 63–73. 5 UTHAIPIBULL, C., AUFIERO, B., SYED, S. E. H., HANSE, B., GUEVARA PATINO, J. A., ANGOV, E., LING, I. T., FEGERDING, K., MORGAN, W. D., OCKENHOUSE, C., BIRDSALL, B., FEENEY, J., LYON, J. A., HOLDER, A. A. Inhibitory and blocking monoclonal antibody epitopes on merozite surface protein 1 of the malaria parasiste plasmodium falciparum. J. Mol. Biol. 2001, 307, 1381–1394. 6 HOUSEMAN, B. T., HUH, J. H., KRON, S., J. MRKSICH, M. Peptide chips for the quantitative evaluation of protein kinase activity. Nature Biotech. 2001, 20, 270–274. 7 KILPERT, K., HANSEN, G., WESSNER, H., SCHNEIDER-MERGENER, J., HOHNE, W. Characterizing and optimizing protease/peptide inhibitor interactions, a new application for spot synthesis. J. Biochem. 2000, 128, 1051–1057. 8 BROCKMAN, J. M., FRUTOS, A. G., CORN, R. M. A multi-step chemical modification procedure to create DNA arrays on gold surfaces for the study of protein–DNA interactions with surface plasmon resonance imaging. J. Am. Chem. Soc. 1999, 121, 8044–8051.
9 FRUTOS, A. G., BROCKMAN, J. M., CORN, R. M. Reversible protection and reactive patterning of amine- and hydroxyl-terminated self-assembled monolayers on gold surfaces for the fabrication of biopolymer arrays. Langmuir 2000, 16, 2192–2197. 10 WEGNER, G. J., LEE, H. J., CORN, R. M. Characterization and optimization of peptide arrays for the study of epitope–antibody interactions using surface plasmon resonance imaging. Anal. Chem. 2002, 74, 5161–5168. 11 NELSON, B. P., FRUTOS, A. G., BROCKMAN, J. M., CORN, R. M. Near-infrared surface plasmon resonance measurements of ultrathin films. 1. Angle shift and SPR imaging experiments. Anal. Chem. 1999, 71, 3928–3934. 12 Hansen, W. N. ELECTRIC fields produced by the propagation of plane coherent electromagnetic radiation in a stratified medium. J. Opt. Soc. Am. 1968, 58, 380–390. 13 NELSON, B. P., GRIMSRUD, T. E., LILES, M. R., GOODMAN, R. M., CORN, R. M. Surface plasmon resonance imaging measurements of DNA and RNA hybridization adsorption onto DNA microarrays. Anal. Chem. 2001, 73, 1–7. 14 LEE, H. J., GOODRICH, T. T., CORN, R. M. SPR imaging measurements of 1-D and 2-D DNA microarrays created from microfluidic channels on gold thin films. Anal. Chem. 2001, 73, 5525–5531. 15 PEASE, A. C., SOLAS, D., SULLIVAN, E. J., CRONIN, M. T., HOLMES, C. P., FODOR, S. P. A. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc. Natl Acad. Sci. USA 1994, 91, 5022–5026. 16 SCHENA, M., SHALON, D., DAVIS, R. W., BROWN, P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270, 467–470. 17 BRENNER, S., JOHNSON, M., BRIDGHAM, J., GOLDA, G., LLOYD, D. H., JOHNSON, D., LUO, S., MCCURDY, S., FOY, M., EWAN, M., ROTH, R., GEORGE, D., ELETR, S., ALBRECHT, G., VERMAAS, E., WILLIAMS, S. R., MOON, K., BURCHAM, T., PALLAS, M.,
21
22 Surface Plasmon Resonance Imaging Measurements of Biomolecular Arrays
18
19
20
21
22
23
24
25
DUBRIDGE, R. B., KIRCHNER, J., FEARON, K., MAO, J., CORCORAN, K. Gene expression analysis by massively parallel signature sequencing (MPS) on microbead arrays. Nature Biotech. 2000, 18, 630–634. WINZELER, E. A., RICHARDS, D. R., CONWAY, A. R., GOLDSTEIN, A. L., KALMAN, S., MCCULLOUGH, M. J., MCCUSKER, J. H., STEVENS, D. A., WODICKA, L., LOCKHART, D. J., DAVIS, R. W. Direct allelic variation scanning of the yeast genome. Science 1998, 281, 1194–1197. WENSCHUH, H., VOLKMER-ENGERT, R., SCHMIDT, M., SCHULZ, M., SCHNEIDER-MERGENER, J., REINEKE, U. Coherent membrane supports for parallel microsynthesis and screening of bioactive peptides. Biopolymers 2000, 55, 188–206. FRANK, R. Spot-synthesis: An easy technique for the positionally addressable, parallel chemical synthesis on a membrane support. Tetrahedron 1992, 48, 9217–9232. REINEKE, U., VOLKMER-ENGERT, R., SCHNEIDER-MERGENER, J. Applications of peptide arrays prepared by the SPOT-technology. Curr. Opin. Biotech. 2001, 12, 59–64. FREY, B., CORN, R. M. Covalent attachment and derivitization of poly(L-lysine) monolayers on gold surfaces as characterized by polarization-modulation FT-IR spectroscopy. Anal. Chem. 1996, 68, 3187–3193. SMITH, E., WANAT, M. J., CHENG, Y., BARREIRA, S. V. P., FRUTOS, A. G., CORN, R. M. Formation, spectroscopic characterization, and applications of sulfhydryl-terminated alkanethiol monolayers for the chemical attachment of DNA onto gold surfaces. Langmuir 2001, 17, 2502–2507. HIRSCHHORN, J. N., SKLAR, P., LINDBLAD-TOH, K., LIM, Y. M., RUIZ-GUTIERREZ, M., BOLK, S., LANGHORST, B., SCHAFFER, S. E., WINCHESTER, E., LANDLER, E. S. SBE-TAGS: An array-based method for efficient single-nucleotide polymorphism genotyping. Proc. Natl Acad. Sci. USA 2000, 97, 12164–12169. REICH, D. E., CARGILL, M., BOLK, S., IRELAND, J., SABETI, P. C., RICHTER, D. J., LAVERY, T., KOUOUMJIAN, R., FARHADIAN,
26 27
28
29
30
31
32
33
34
35
S. F., WARD, R., LANDER, E. S. Linkage disequilibrium in the human genome. Nature 2001, 411, 199–204. Brookes, A. J. THE essence of SNPs. Gene 1999, 234, 177–186. BROCKMAN, J. M., NELSON, B. P., CORN, R. M. Surface plasmon resonance imaging measurements of ultrathin organic films. Annu. Rev. Phys. Chem. 2000, 51, 41–63. SU, S. S., LAHUE, R. S., AU, K. G., MODRICH, P. Mispair Specifity of Methy-directed DNA Mismateh Correction in vitro. J. Biol. Chem. 1987, 263, 6829–6835. Hoch, J. A. TWO component and phosphorelay signal transduction. Curr. Opin. Microbiol. 2000, 3, 165–170. SMITH, E. A., ERICKSON, M. G., ULIJASZ, A. T., WEISBLUM, B., CORN, R. M. Surface plasmon resonance imaging of transcription factor proteins: interactions of bacterial response regulators with DNA arrays on gold films. Langmuir 2003, 19, 1486–1492. MARTINEZ-HACKERT, E., STOCK, A. M. Structural relationships in the OmpR family of winged-helix transcription factors. J. Mol. Biol. 1997, 269, 301–312. HALDIMANN, A., FISHER, S. L., DANIELS, L. L., WALSH, C. T., WANNER, B. L. Transcriptional regulation of the Enterococcus faecium BM4147 vancomycin resistance gene cluster by the VanS-VanR two-component regulatory system in Escerichia coli K-12. J. Bacteriol. 1997, 179, 5903–5913. JACKMAN, R. J., DUFFY, D. C., OSTUNI, E., WILLMORE, N. D., WHITESIDES, G. M. Fabricating large arrays of microwells with arbitrary dimensions and filling them using discontinuous de-wetting. Anal. Chem. 1998, 70, 2280–2287. DUFFY, D. C., MCDONALD, J. C., SCHUELLER, O. J. A., WHITESIDES, G. M. Rapid prototyping of microfluidic systems in poly(dimethylsiloxane). Anal. Chem. 1998, 70, 4974–4984. SLOOTSTRA, J. W., KUPERUS, D., PLUCKTHUN, A., MELOEN, R. H. Identification of new tag sequences with differential and selective recognization properties for the anti-FLAG monoclonal
References
36
37
38
39
40
antibodies M1, M2, and M5. Mol. Divers. 1996, 2, 156–164. SCHREIBER, S. L., MACBEATH, G. Printing proteins as microarrays for high-throughput function determination. Science 2000, 289, 1760–1763. BORNHORST, J. A., FALKE, J. J. Purification of proteins using polyhistidine affinity tags. Methods Enzymol. 2000, 326, 245–254. SCHMITT, J., HESS, J., STUNNENBERG, H. G. Affinity purification of histidine-tagged proteins. Mol. Biol. Rep. 1993, 18, 223–230. HO, C. H., LIMBERIS, L., CALDWELL, K. D., STEWART, R. J. A meta-chelating pluronic for immobilization of histidine-tagged proteins at interfaces: Immoblilization of firefly luciferase on polystyrene beads. Langmuir 1998, 14, 3889–3894. SIMONS, P. C., VANDER JAGT, D. L. Purification of glutathione S-transferase from human liver by glutathione-affinity
chromatography. Anal. Biochem. 1977, 82, 334–341. 41 WANG, C., CASTRO, A. F., WILKES, D. M., ALTENBERG, G. A. Expression and purification of the first nucleotide-binding domain and linker region of human multidrug resistance gene product: Comparison of fusions to glutathione S-transferase, thioredoxin and maltose-binding protein. Biochem. J. 1999, 338, 77–81. 42 CATTOLI, F., SARTI, G. C. Separation of MBP fusion proteins through affinity membranes. Biotechnol. Prog. 2002, 18, 94–100. 43 BACH, H., MAZOR, Y., SHAKY, S., SHOHAM-LEV, A., BERDICHEVSKY, Y., GUTNICK, D. L., BENHAR, I. Escherichia coli maltose-binding protein as a molecular chaperone for recombinant intracellular cytoplasmic single-chain antibodies. J. Mol. Biol. 2001, 312, 79–93.
23
1
Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors Gary Franklin, and Alan McWhirter Biacore AB, Uppsala, Sweden
Originally published in: Protein Microarray Technology. Edited by Dev Kambhampati. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30597-1
1 Introduction
The phenomenon of SPR (surface plasmon resonance) was first shown to be applicable to biological molecules as long ago as 1983 [1] and has since played an increasingly important role in the study of biomolecular interactions. Biacore, the pioneering company in the optical biosensor field, combined SPR detection with an advanced system of microfluidics and specially developed sensor surface coupling chemistries to generate a unique technology for generating high quality, information-rich data on molecular binding events. The technology is label-free, highly flexible in terms of the range of biomolecules that it can accommodate and can deliver a wide range of data types, including analyses of concentration, specificity, affinity and kinetics. The technology has been used very successfully in a number of areas, of which the three major areas have been academic life science research, commercial pharmaceutical and biotechnology applications and within the food industry. The instruments available cover a wide assortment of user requirements, ranging from manual systems designed for simple, flexible basic research applications, to automated, high-throughput systems aimed at the drug development industry, as well as systems designed to function within the GxP (Good Laboratory Practice, Good Manufacturing Practice, Good Clinical Practice etc.) environment now required by many regulatory authorities. Biacore users have taken advantage of the inherent flexibility of the technology to study many diverse types of molecules, although proteins have consistently represented the principle focus. As the genomics era begins to give way to the age of proteomics, the need for high quality data on the molecular interactions of proteins has never been greater. While Biacore, along with many other technologies, Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
was making an important contribution towards the understanding of protein interactions long before the term “proteomics” ever existed, the widely anticipated proteomics explosion will certainly present new challenges and opportunities to all working within this field. 1.1 The Proteomics Revolution
During the last decade of the 20th century, biomedical science was characterized by the technological advances that powered the genomics revolution. The race to complete the sequencing of the human genome was probably the most absorbing scientific development of its time. We are now in possession of a staggering amount of information regarding the nucleic acid sequence of our own, and many other species. The total number of human genes (which has not yet been totally covered by the “completion” of the human genome project) is currently estimated at around 27 000 [2]. When compared to genomic sequencing projects carried out in other species however, it is clear that there is no real correlation between gene number and biological complexity among different organisms and that higher levels of genetic organization and control of expression of the genetic material must be involved. The primary challenge now is to relate this vast store of sequence information to its biological functions. Since the vast majority of genes are currently thought to function at the protein level, the next logical step is to re-examine the genome in terms of the proteins it produces. This will require an even larger-scale research effort however, since it is becoming increasingly clear that the number of distinctly functional human proteins produced is much higher than the number of genes that exist. This provides a likely explanation for the “gene number versus complexity” paradox referred to above. The genes of higher eukaryotes are complex in structure, often containing multiple promoters that direct alternative transcripts, and recent evidence suggests that around 50 % of all genes may be subject to alternative splicing [3]. In addition to multiple transcripts and splice variants, it has also been estimated that on average, each human gene may produce three different post-translationally modified protein variants, all with distinct functions [4]. Taken together, these variables mean that there are multiple proteins produced from each gene, with current estimates ranging anywhere between 3 and 20 proteins per gene. The term “proteome” was first coined in the mid-1990s to introduce the concept of a protein-level equivalent of “genome”, and can be loosely defined as “the entire protein complement that is encoded by a genome”. The proteome concept is considerably more complex than its genomic predecessor however, since the protein complement varies enormously between different cells within the organism and can change from minute to minute within an individual cell, depending on the internal and external signals received. Another factor is that the functional roles of proteins in the proteome involve interaction on a much greater scale of complexity than for nucleic acids. In addition, identification of a protein on the basis of its amino acid composition alone may not provide the crucial information in all cases, since the post-translational modifications alluded to previously can play a critical role in physiological and pathological functions (reviewed in [4]).
2 Biacore Technology
Consequently, there are multiple challenges for the much-vaunted “new era” of proteomics. Very high throughput discovery programs will be required to identify all the proteins produced by the genome and to determine where and when these proteins are actually expressed in very complex, dynamic combinations. In addition, entirely different, lower throughput approaches will be required to provide the detailed characterization of interactions between functionally connected biomolecules. This last aspect in particular (embodied in the branch known as “interaction proteomics” or “functional proteomics”) illustrates the magnitude of the problems ahead. It should be clear from the application examples presented here (see Section 3) that understanding biomolecular interactions in the context of biological function requires a far higher level of detail than “do they bind?” and that the ability to describe interactions in terms of affinity and (particularly) rate constants will be a prerequisite for real progress in the future. Biomolecular interactions are transient events and their kinetic characteristics are vitally important for biological function. A full understanding of how proteins function on a mechanistic level cannot, therefore, rely on steady-state measurements alone [5–8]. This temporal aspect of functional proteomic studies adds yet another level of complexity. Many different technologies will doubtless play important roles in the development of proteomics, and Biacore is ideally placed to make a significant contribution. In fact, even before the term “proteomics” existed, Biacore has always been used in approaches that fundamentally belong to the proteomics field. The ability of Biacore to measure protein concentrations, assess binding specificity, to characterize binding interactions in terms of affinity and rate constants and (as described in Section 4) provide identification by linking SPR to MS, make this a key proteomics technology. Moving from the study of isolated proteins or individual binding partners into more industrialized proteomics applications does not, therefore, represent a fundamental paradigm shift but is merely an issue of scale, as will be discussed later. This article will now describe how Biacore’s approach to SPR technology functions on a technical level and how these features contribute to the study of biomolecular interactions. This will be followed by selected examples of how the technology has been successfully applied to a number of important questions, with relevance to both academic biomedical research and the pharmaceutical and biotechnology industries. Finally, current and future developments will be discussed in the context of the evolving areas of proteomics and protein microarrays.
2 Biacore Technology
Biacore’s optical biosensors are built around three principal components: Ĺ an optical system responsible for the generation and detection of the SPR signal that is triggered by biomolecular interaction events Ĺ a sensor chip with a biospecific surface where biomolecular interactions take place Ĺ a liquid handling system with precision pumps and an integrated microfluidic cartridge (IFC) for controlled flow of sample over the sensor surface
3
4 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
Fig. 1 Ligand, analyte and capturing molecule in relation to the sensor surface, as defined in a Biacore assay.
These three cornerstones of the technology will be described in detail, along with a discussion of their practical and theoretical consequences. In the interests of clarity, some specialized terminology relating to the interactants used in a Biacore assay should also be pointed out. The interactant immobilized on the sensor surface is referred to as the ligand and should not be confused with the use of this term in the biological context of ligands/receptors. The ligand can be attached either directly to the sensor surface, or indirectly via an immobilized capturing molecule, and the interactant in free solution is referred to as the analyte. This is shown schematically in Figure 1. 2.1 Surface Plasmon Resonance
The detection principle of the technology depends upon the phenomenon of surface plasmon resonance (SPR), which occurs in thin conducting films at an interface between media of different refractive index [9]. In Biacore systems, the media are the glass of the sensor chip and the sample solution, and the conducting film is a gold layer on the sensor chip surface. Under conditions of total internal reflection, the light leaks an electric field intensity called an evanescent wave field across the interface into the medium of lower refractive index, without actually losing net energy. The amplitude of the evanescent field wave decreases exponentially with distance from the surface, and the effective penetration depth is about half the wavelength of the incident light. In Biacore instruments, the light source is a near-infrared LED of peak intensity wavelength 760 nm and the penetration distance of the evanescent wave is about 300 nm. At a certain combination of angle of incidence and energy (wavelength), the incident light excites plasmons (electron charge density waves) in the gold film. As a result, a characteristic absorption of energy via the evanescent wave field occurs and SPR is seen as a drop in the intensity of the reflected light at a specific incident angle (the SPR angle). These principles are illustrated in Figure 2. The reduced intensity of reflected light is not caused by light absorption in the sample in the conventional (transmission spectroscopy) sense. The light is totally internally reflected inside the optical unit, and it is the evanescent wave that
2 Biacore Technology
Fig. 2 The principle of SPR detection in Biacore instruments. Schematic drawing showing the optical interface, sensor chip and microfluidic components that comprise the main technology cornerstones in Biacore’s approach to SPR biosensors. As drawn here, the ligand is immobilized on the lower (dark grey) surface of the sensor chip and the analyte is passed over the surface under continuous flow conditions, via the integrated microfluidics system. An evanescent wave is created at the chip surface by the action of total internal reflection of the incident light, and this penetrates the sample side of the interface. As indicated in the enlarged view, the amplitude of
this wave decays exponentially with distance from the surface. Polarized components of the evanescent wave field excite electromagnentic surface plasmon waves within the gold film, enhancing the evanescent wave and causing a characteristic drop in the reflected light intensity. Changes in refractive index at the flow cell side of the interface caused by changes in mass concentration at the sensor surface, result in a shift in the angle of incidence required to create surface plasmon resonance. This SPR angle is measured as a change in the detector position for the reflected intensity dip that results from the SPR phenomenon.
penetrates the sample. The significance of this feature is that measurements can be made on turbid or even opaque solutions, without interference from conventional light absorption or scattering by the sample. When the detecting molecule is attached to the sensor chip or when analyte binds to the detecting molecule, the concentration at the sensor chip surface increases leading to an SPR response, observed as a change in the SPR angle. The specific refractive index contribution (i.e. the change in refractive index produced by a unit change in concentration) is very similar for different proteins regardless of composition, and values for other macromolecules are of the same order of magnitude. Consequently, the response measured is related to the mass of analyte bound and is largely independent of the nature of the analyte. Refractive index contributions for different molecules are additive, so that the amount of detecting molecule attached and the amount of analyte bound can both be measured with the same detection principle.
5
6 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
2.2 The Sensor Surface and Immobilization Chemistry
As outlined in the previous section, SPR measurements with Biacore are based on the interaction of an analyte in solution with a ligand attached to the sensor surface. At the heart of this process is the sensor chip, which provides both the physical conditions necessary to generate the SPR signal, and the surface on which the interaction being studied occurs. The specificity of an SPR analysis is determined through the nature and properties of the molecule attached to the sensor surface. Broadly speaking, biomolecules may be attached to the surface of the sensor chip using three different approaches: 1. Covalent immobilization, where the molecule is attached to the surface through a covalent chemical link. 2. High affinity capture, where the molecule of interest is attached by non-covalent interaction with another molecule (which in turn is usually attached using covalent immobilization). 3. Hydrophobic adsorption, which exploits non-specific hydrophobic interactions between the molecule of interest (or a hydrophobic carrier such as a lipid monolayer or bilayer) and the sensor chip surface. 2.2.1 Sensor chips The sensor chip has two essential features: Ĺ A glass surface covered with a thin gold layer, creating the conditions required for generating an SPR signal. Ĺ A coating of some sort on the gold layer, providing a means for attaching the ligand and an environment where the interaction being studied will occur. The coating does not contribute directly to the SPR effect, and varies between different types of chips.
Although SPR can be generated in thin films made from different materials, all Biacore chips are coated with a uniform thin layer of gold. Gold gives a well-defined reflectance minimum with easily handled visible light wavelengths, and is also amenable to covalent bonding of surface matrix layers, and at the same time the metal is largely inert in physiological buffer conditions. Unmodified gold is not generally a suitable surface environment for biomolecular interactions however, and is covered with a covalently bonded monolayer of alkanethiol molecules. This serves to protect the biological samples from contact with the gold and at the same time provides a means of attachment of a surface matrix, which further enhances the usefulness of the surface in biomolecular contexts. On most sensor chips, the surface is covered with a “hydrogel” matrix of carboxymethylated dextran, a flexible unbranched carbohydrate polymer forming a surface layer of approximately 25–100 nm thick. The dextran is covalently attached to an epoxide-modified alkanethiol monolayer. Surfaces for various applications have been tailored by the size of the dextran, differing in the range of 10 kDa to
2 Biacore Technology
1 MDa [10]. The dextran matrix imparts several important properties to the sensor surface: Ĺ It provides a hydrophilic environment favorable to most solution-based biomolecular interactions. Ĺ It provides a defined chemical basis for covalent attachment of biomolecules to the surface using a wide range of well-defined chemistries (as will be described shortly). Ĺ It increases the surface capacity many-fold in comparison with a flat surface. Ĺ It extends the surface detection principle to encompass a thin layer with an extension of the same order of magnitude as the penetration depth of the evanescent wave responsible for the detection principle. The sensitivity of the detection is thereby increased considerably in comparison with a flat surface.
Dextran is also an inert molecule in the context of biomolecular interactions, and the flexibility of the unbranched polymer chains allows the “surface-attached” biomolecules to move with relative freedom within the surface layer. Dextranbased surface matrices therefore provide an excellent environment for biomolecular interactions [10]. Furthermore, the most frequently utilized carboxymethylated derivative of dextran improves the hydrophilicity and serves as an excellent starting point for various immobilization alternatives. The negatively charged polymer may give rise to unwanted electrostatic interactions with the immobilized ligands and thus to a risk of denaturation. These potential effects are normally eliminated by working under physiological salt conditions, where the electrostatic effects are effectively shielded. The basic structure of a sensor chip and its surface is shown in Figure 3.
Fig. 3 Schematic illustration of the structure of the sensor chip surface.
7
8 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
While this is not the forum for a detailed description of specific sensor chips manufactured by Biacore, a number of variations exist on the basic structure described which are designed for a range of specific application areas. Some of these specific application-related formats will be discussed later, where appropriate. 2.2.2 Immobilization Chemistry As mentioned previously, ligands can be attached to the sensor chip using a variety of chemical methods, allowing a broad range of different types of biomolecules to be analyzed by SPR. The gold layer and dextran matrix on the sensor surface are stable under a wide range of conditions, including extremes of pH and many organic solvents. Once the ligand has been immobilized, the stability of the sensor surface is determined primarily by the stability of the attached ligand itself. The carboxymethylated dextran matrix on the sensor surface is amenable to a range of chemistries for ligand immobilization which exploit different groups on the ligand molecule. Immobilization approaches may be directed towards amine, carboxyl, thiol or hydroxyl groups on the ligand, or may use specific tags introduced into the ligand by either chemical modification or recombinant techniques. Amine coupling chemistry is the most widely applicable approach for attaching biomolecules covalently to the sensor surface. With this method, the dextran matrix on the sensor chip surface is first activated with a mixture of 1-ethyl-3(3-dimethylaminopropyl) carbodiimide (EDC and N-hydroxysuccinimide (NHS) to give reactive succinimide esters [11]. The ligand is then passed over the surface and the esters react spontaneously with amino groups or other nucleophilic groups to link the ligand covalently to the dextran (see Figure 4). Most proteins contain several amine groups so that efficient attachment can be achieved without seriously affecting the biological activity of the ligand. In some instances, however, amine coupling may involve groups at or near the active site or binding site of the ligand, with the result that attachment is accompanied by loss of activity. In such cases, the ligand can be attached using alternative coupling chemistry or a capturing approach. Thiol coupling [12] utilizes exchange reactions between thiol and active disulfide groups. The active disulfide moiety may be introduced either on the dextran matrix (to exchange with a thiol group on the ligand, referred to as the ligand thiol approach) or on the ligand molecule (to exchange with a thiol group introduced on the dextran
Fig. 4 Amine coupling of ligand to the sensor chip surface.
2 Biacore Technology
Fig. 5 Ligand thiol (A) and surface thiol (B) coupling of the ligand to the sensor surface.
matrix, referred to as the surface thiol approach). The reagent 2-(2-pyridinyldithio) ethaneamine (PDEA) can be used for introducing active disulfide groups. The amine group in PDEA can be used to attach the molecule to activated carboxyl groups on either the surface or the ligand. The thiol approach can help to immobilize ligands in a defined orientation, since the number of potential attachment sites is often less than with amine coupling, and in many cases is reduced to one single site. Surface thiol coupling is also valuable for acidic proteins, since the substitution with PDEA raises the isoelectric point of the protein, improving the electrostatic interaction of ligand and surface matrix prior to coupling. The two different thiol coupling approaches are illustrated in Figure 5. Aldehyde coupling is a third chemical immobilization method [12], and is useful for ligands containing aldehyde groups, which can be either native or introduced by oxidation of cis-diols. In this case, immobilization requires activation of the surface with hydrazine or carbohydrazide. Aldehyde coupling provides an alternative approach for immobilizing glycoproteins and other glycoconjugates. The method is particularly suitable for ligands containing sialic acid, since these residues are very easily oxidized to aldehydes. The chemistry of aldehyde coupling is summarized in Figure 6.
2.2.3 General Capture Methods Immobilization of ligand by capturing involves high affinity binding of the ligand to an immobilized capturing molecule. In general, the regeneration step in the assay procedure removes ligand along with any remaining bound analyte, so that new ligand needs to be captured for each assay cycle. The capturing molecule is
9
10 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
Fig. 6 Aldehyde coupling of ligand to the sensor surface.
attached to the surface using one of the covalent chemical approaches described previously. Capturing approaches can provide an alternative to covalent immobilization in situations where it is difficult to find chemical methods that give satisfactory results, or where the assay requires that the ligand on the surface can be changed. Capturing can also provide a ligand purification step, for example, when tagged recombinant ligands are captured from partially purified material through binding of the tag to a specific capturing molecule. The basic requirement for successful capturing is a robust high affinity interaction between the capturing molecule and the ligand. Monoclonal antibodies are frequently used as capturing agents. 2.2.4 Specialized Capture Methods Although one of the major advantages of SPR technology is label-free detection of binding interactions, commonly used labeling techniques employed for purification or other purposes can also be taken advantage of for the purposes of ligand immobilization. Biotin-labeled molecules for example, can be captured directly onto sensor chips pre-immobilized with streptavidin [12]: while this is technically a form of capture, the affinity of the streptavidin–biotin interaction (10−15 M) makes this a permanent immobilization of the ligand in practice. This option is particularly useful when using nucleic acid ligands, but is also used for other applications, such as biotinylated liposomes [13] and biotinylated glycosaminoglycans such as heparin [14]. Histidine (his)-tagged recombinant proteins may also be conveniently captured onto a specialized sensor chip surface on which nitrilotriacetic acid (NTA) has been covalently linked to the carboxymethylated dextran matrix. This surface is activated by a simple pulse of NiCl2 , which forms a chelating complex with NTA that binds polyhistidine peptides [15]. This is a useful generic method for working with his-tagged proteins and the captured ligands are easily removed by a pulse of EDTA. 2.2.5 Lipids and Membrane Proteins Hydrophobic molecules such as lipids present specialized requirements for SPR analysis, specifically in terms of sensor surface attachment, since they cannot be directly coupled to a standard carboxymethylated dextran surface for interaction studies with other molecules. This is an important issue, particularly since membrane-bound receptors are also a very important class of molecules for
2 Biacore Technology
biomedical research and these are most appropriately studied in the context of a lipid environment. To facilitate these types of applications, specific sensor chips are available. These either have a flat hydrophobic surface, suitable for adsorbed lipid monolayers, or a dextran surface modified with lipophilic compounds, which enables liposomes to maintain a membrane-like lipid bilayer structure on the sensor surface. In addition to these specific sensor chips, several other inventive approaches to lipid-related applications have also been demonstrated [13, 16–19]. Taken together, the range of available surface chemistry options enables the study of almost any conceivable ligand, from low molecular weight substances to whole cells. In the event of immobilization difficulties with a particular ligand, the option of using a capture molecule is also available (see Figure 1). 2.3 The Microfluidics System
The method of delivery of samples to the sensor surface is also a critical function and can vary considerably among the different commercial variants of SPR technology. The solution employed by Biacore involves a flexible microfluidics system based around an integrated microfluidic cartridge, or IFC [20]. This consists of a series of micro-channels and membrane valves encased in a plastic housing, and serves to control delivery of liquid to the sensor chip surface. Samples are transferred through a needle and into the IFC, which connects with the detector flow cells. These are formed by the sensor chip pressing against molded channels in the IFC. The precise construction of the IFC and the number of flow cells available varies amongst the different instrument series, but the common purpose is to allow analyte to pass over the sensor surface in a continuous, pulse-free and controlled flow that maintains a constant analyte concentration at the sensor surface during sample injection. An example of a microfluidics system is presented in Figure 7.
Fig. 7 The microfluidics system and flow cells
11
12 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
In the first Biacore instruments, samples were addressed in separate flow cells formed by the IFC. More recently, a technique of hydrodynamic addressing has been introduced whereby solutions are directed over different detection spots in a single flow cell, allowing more precise comparisons between spots. The construction is also designed to provide an almost instantaneous switch from sample to buffer, with minimum dispersion between the injections. These are important features of the technology, as they permit the SPR measurement of rapid events and enable, for example, very high-resolution kinetic studies to be undertaken. A more detailed discussion of the microfluidics system from the assay perspective will be taken up in Section 2.2.5. 2.4 Biacore Assay Basics
The definitions of basic assay formats and interaction partners are given at the beginning of Section 2, but it may also be useful to define some additional fundamental Biacore terminology at this point: Ĺ The resonance unit (RU) is the unit of measurement of the SPR response and is directly proportional to the concentration of biomolecules on the sensor surface. Ĺ The real-time SPR response is presented as a sensorgram, which plots response against time. Sensorgrams may be analyzed to provide information on the rates of interaction between molecules. Ĺ Report points are sensorgram responses taken at specific times, averaged over a short time window, or taken as the slope of the sensorgram over a defined time window. Ĺ Regeneration is the process of removing bound analyte from the surface after an analysis cycle, in preparation for a new cycle.
SPR studies, as with most advanced, high technology molecular techniques, require care and planning from the user in order to provide reliable, high quality results. Proper instrument maintenance, intelligent planning of assays and careful experimental technique all are important factors. As mentioned in several places within this chapter, Biacore systems are designed to enable all appropriate controls to be incorporated into assays. While care should always be exercised when carrying out assays, particular attention needs to be paid when carrying out affinity and kinetic assays. These approaches involve the simultaneous evaluation of multiple sensorgrams derived from concentration series of the analyte. It is very important here that the concentration dilutions are accurate and that the different solutions are carefully matched in order to calculate correct binding constants. 2.4.1 Correlation of SPR Response with Surface Binding While the SPR response is proportional to the concentration of bound analyte at the sensor surface, changes in the refractive index caused by the bulk solution
2 Biacore Technology
Fig. 8 Schematic sensorgram, illustrating the contribution of the bulk refractive index to the SPR signal.
will also contribute to the measured response. In practice, the bulk response is usually irrelevant in terms of data evaluation, since the binding response can be processed to take account of this. The absolute response measures the total SPR signal (including bulk contribution) and is normally recorded for both ligand and reference spots. The relative response subtracts a specified baseline value from the total SPR signal obtained. Depending upon the particular system and assay set-up used, a reference-subtracted response corresponding to the difference between ligand and reference spots can also be presented. To a first approximation, this eliminates the contribution from the bulk solution and corresponds to the amount of ligand and analyte on the surface. A schematic sensorgram which indicates the bulk contribution effect is shown in Figure 8. The precise relationship between SPR signal and the amount of surface-bound analyte depends to some degree on the nature of the molecule involved, but is linear for any given molecule over experimentally relevant ranges. Direct studies using various proteins, for example, have shown a linear response and established an approximate conversion factor of 1000 RU ng−1 protein mm2 , using a standard carboxymethylated dextran sensor chip [21]. Taking a matrix depth of 100 nm, a 1000-RU response approximates to an equivalent solution protein concentration of 10 mg mL1 . This relationship varies for other types of molecules and it is known for example that nucleic acids have a slightly higher specific refractive index increment than proteins. In terms of detection limits, these vary depending on the molecules involved and the assay format used (see below), but the specified limits quoted, based on antibody detection of myoglobin, vary between 50 pM to less than 1 pM depending on the specific type of Biacore instrument. The sensorgram represents one of the key strengths of this technology, since it provides real-time, continuous monitoring throughout all phases of a biomolecular interaction. SPR responses are taken at a frequency of 1 to 10 s−1 , producing
13
14 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
an information-rich data format than enables evaluation of complex interaction properties, but which also gives immediate visual feedback on each interaction. SPR responses at defined points during analysis cycles are taken from the sensorgram in the form of report points that are automatically assigned by the control software and/or assigned by the user. 2.4.2 Assay Design: Assay Formats In addition to direct binding assays (whether directly to an immobilized or captured ligand), a number of other assay formats are possible. Sandwich assays (direct binding with enhancement) are an extension to the direct assay in which a secondary “enhancement molecule” is allowed to interact with the bound analyte in order to increase the level of the SPR response. This approach can also be used to confirm the identity of the analyte when working with complex mixtures for example, since two specific binding interactions must take place in order to see an enhanced response. Inhibition assays (or solution competition assays) exploit the ability of the analyte to inhibit the binding of a high MW detecting molecule to the surface. Typically, the analyte or a derivative thereof is attached to the surface as the ligand, while the detecting molecule (added to the samples at a fixed, known concentration) is a macromolecule that binds specifically to the analyte. The mixture is incubated for a defined time, and then injected over the sensor surface to measure the remaining free detecting molecules. The amount of free detecting molecule is then inversely related to the concentration of analyte in the sample. Inhibition assays are most commonly used for low MW analytes. They may however, also be used as an alternative to direct binding level assays for macromolecular analytes, in situations where immobilization or regeneration of the detecting molecule on the sensor surface is not satisfactory. In surface competition assays, a binding partner to the analyte is used as the ligand, and a high molecular weight analog to the analyte (typically analyte conjugated to a carrier protein) is added in constant amount to the samples to be measured. The basis of the assay is competition between analyte and the high MW analog for binding to the ligand. The measured response is the sum of the contributions from the analyte and the high MW molecule: as with the solution competition approach, the response is inversely related to the amount of analyte in the sample. The surface competition approach can have advantages over the more common inhibition assay format in situations where immobilization of the analyte on the sensor chip surface presents problems. The four basic assay strategies are shown in Figure 9. The sandwich and two competition-format assays are used widely for making concentration measurements, whereas affinity and kinetic measurements principally employ direct binding assays. 2.4.3 Assay Design: Surface Preparation The range of surface coupling chemistry approaches supported by Biacore gives the user great freedom in designing an assay and choosing which interaction partner to immobilize on the sensor surface. In many cases, this choice can be made purely on the basis of experimental efficiency and convenience. In an ideal
2 Biacore Technology
Fig. 9 Schematic illustration of the four principle Biacore assay formats.
situation, the ligand and analyte identities of the biomolecules can be assayed in reciprocal experiments, confirming that surface coupling does not affect the observed binding behavior. As described in Section 2.2, certain types of ligand may necessitate using specific sensor chips, although capture-type approaches can always be considered. However, it may be preferable to use certain molecules as ligand or analyte specifically, such as in the case of RNA molecules, which are generally more likely to be used as ligands. Their strong negative charge can produce electrostatic repulsion effects with the surface matrix at physiological pH if used as analyte which might of course affect apparent binding and kinetic properties [22]. It is often convenient in any case to capture nucleic acid ligands onto a streptavidin surface, given the frequent and convenient use of 5 biotin labeling of oligonucleotides. It is also worth remembering that the SPR signal is related to the mass concentration of bound molecule on the surface, so that the molecular weight (MW) of the interaction partners may also be an important factor. For a given number of interaction events, the SPR response is proportional to the MW of the analyte. Conversely, for a given level of ligand immobilization, a higher MW ligand will provide proportionately fewer binding sites. These issues are normally expressed in terms of Rmax , which defines the theoretical maximum binding capacity of the surface for a defined pair of ligand and analyte molecules: Rmax =
MWanalyte ×Rligand ×n MWligand
(1)
15
16 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
where: Rmax is the maximum analyte binding capacity in RU; MWanalyte and MWligand are the molecular weights of analyte and ligand respectively; Rligand is the amount of immobilized ligand in RU and n is the stoichiometry of analyte binding to ligand. For example, with ligand molecular weight = 50,000 Da, analyte molecular weight = 10,000 Da and ligand immobilized level = 10,000 RU, the theoretical maximum binding capacity assuming 1:1 binding of analyte to ligand is given by Rmax =
10, 000 ×10, 000=2, 000RU 50, 000
(2)
In addition to aiming for immobilization levels in relation to the expected concentration and MW of the analyte, the optimal Rmax can vary considerably depending on the particular assay. Concentration assays, for example often use a relatively high Rmax of several hundred to several thousand RU, whereas for kinetic rate constant determinations, much lower Rmax values of the order of 10–50 RU are recommended (reviewed in [23]). 2.4.4 Regeneration The process of removing bound analyte from the sensor chip surface after analysis of a sample in preparation for the next analysis cycle (regeneration) is another important feature of Biacore’s approach to SPR. This enables sensor surfaces to be prepared with immobilized ligand and re-used for multiple rounds of analysis. The number of times a sensor surface can be regenerated depends on the nature of the attached ligand, but normally ranges from fifty to several hundred. When the ligand is attached directly to the surface, the aim of regeneration is to remove the analyte from the ligand without destroying the ligand activity. When a capturing approach is used, regeneration generally removes both ligand and analyte from the capturing molecule, and the stability of the ligand under regeneration conditions is irrelevant. The optimal regeneration solution varies depending on the nature of the ligand–analyte interaction and the stability of the two molecules, but can often be achieved by simple exposure to acidic (dilute HCl or glycine-HCl) or basic (NaOH) solutions. Many other alternatives can be used, however, and it is possible to use sequential exposure to two different regeneration solutions and to vary the contact time for the solutions on the sensor surface. Efficient regeneration is important for successful assays. Incomplete regeneration or loss of the binding activity from the surface impairs the performance of the assay and the useful lifetime of the sensor chip can be shortened. Suboptimal regeneration can be tolerated in some simple types of yes/no binding assay, since a certain loss of ligand activity can be adjusted for using suitable controls. The demands on efficient regeneration are much stricter in the case of affinity or kinetic assays however, since it is important to maintain a constant surface activity between cycles within the concentration series for each sample.
2 Biacore Technology
2.5 Theoretical and Practical Implications of Technology Design for Biacore Assays
The technological approaches taken by Biacore are designed to optimize SPR analysis of biomolecular interactions. Some of the consequences of these technology implementations have been referred to in the previous sections, but attention will now be focused on the more detailed theoretical and practical implications for designing and running SPR assays with Biacore. David Myszka has also written a review that covers many of these issues [23]. 2.5.1 Label-free Detection The label-free detection principle that forms the basis of SPR technology represents one of its strongest advantages over more traditional methods. In addition to being inconvenient and time consuming, labeling procedures may also have adverse effects on the behavior of the labeled molecule, by changing the properties of, or occluding the binding site for example. The label itself can also introduce complications: fluorophores are usually hydrophobic and are frequently associated with background problems for example, and radiolabeling suffers from drawbacks connected with safety, isotopic decay and radiolysis-induced instability of labeled compounds. Labeling techniques are of course more sensitive in certain instances, but this must be weighed against the disadvantages of time-consuming and laborious procedures and high sample consumption. One important aspect of label-free SPR technology however, is the generic nature of the detection principle. The SPR response from any given analysis is derived from changes in the total mass concentration at the sensor surface and does not distinguish between potentially different molecular identities. This raises an important issue with regard to specificity and experimental design. Non-specific binding of analytes to the native sensor surface can be easily controlled for by incorporating a “blank” flow cell in the analysis. This is rarely a problem however, and the most likely source of non-specific binding problems lies with the ligand chosen. For this reason, it is important to use very pure preparations of the intended ligand and to ensure that it is a sufficiently specific molecule for the intended assay, since this will largely determine the chances of success. Biacore’s ongoing development of combining SPR with mass spectrometry (MS, see Section 4.1) represent an interesting approach to provide molecular identification along with binding information. This promises to be particularly useful in the case of ligand fishing-type experiments, where potentially unknown binding partners may be involved. 2.5.2 The Optical Detection System and Bulk Effects As mentioned previously, the optical detection system in Biacore will detect all surface changes in refractive index, including those contributed by different bulk solutions. In most situations, using relative-responses and careful reference-subtraction will take care of these effects. Particular care is needed however, when working in conditions of low analyte responses, high surface ligand density and organic solvents with large bulk refractive index effects. These conditions most commonly arise
17
18 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
when working with low molecular weight drug compounds, which often require 1–10 % dimethylsulphoxide (DMSO) in order to maintain solubility. Under these conditions, the excluded volume effect between ligand-bound and blank reference surfaces (allowing more of the solvent to access to the blank surface) can produce bulk subtraction errors that are of the same order of magnitude as the expected analyte responses. This requires great care when preparing samples, since small differences in DMSO concentration will produce a significant bulk response difference. Fortunately, this complication can be overcome by normalizing responses against a solvent correction curve prepared by running a concentration series of DMSO-containing buffer solutions over the ligand and reference surfaces [24]. 2.5.3 Surface-based Versus Solution-based Analysis SPR, as the name implies, is an intrinsically surface-based technique. Despite the many advantages that this provides, potential effects on molecular behavior caused by coupling one of the interaction partners to the sensor surface per se must also be considered. The possibility always exists for example, that chemical coupling of a ligand to the sensor surface may interfere with a critical analytebinding region (labeling procedures also carry an analogous risk, as discussed previously). This possibility can usually be tested for by carrying out an activity test on the immobilized sensor surface, using a positive control sample. In the unlikely event of such problems occurring, the range of coupling chemistry options available provides alternative immobilization strategies, as well as the possibility of reversing the experimental design and using the molecule in question as the analyte. Another question that has been raised is whether meaningful binding data can really be addressed by any surface-based method. The argument is that in contrast to solution-based assays, the restrictions imposed upon the interaction partner which is immobilized onto a surface will produce experimental artifacts. This is an over-simplification of course, and interactions in simple solution assays are unlikely to precisely reflect many biomolecular interactions that occur in vivo. Furthermore, the carboxymethyl dextran hydrogel on Biacore sensor chips presents a flexible matrix for ligand binding and probably allows a considerable degree of molecular movement [10]. Formal proof that surface-based methods are in fact highly appropriate for biomolecular interaction studies has come from studies designed to directly compare solution-based techniques with SPR. These have demonstrated that affinity, kinetic and thermodynamic constants obtained using Biacore are extremely consistent with those obtained using a range of solutionbased techniques [25]. 2.5.4 The Microfluidic Design and Mass Transport Effects The liquid flow system in Biacore’s approach to SPR lies at the heart of the design philosophy and contributes greatly to its powerful kinetics capabilities. As with all surface-based analysis methods however, there is one complicating factor that needs to be considered for both system and assay design, namely the issue of mass transport [26]. For the analyte to be able to bind to the sensor surface, the molecules must be transported from the bulk solution to the surface. This is a
2 Biacore Technology
diffusion-controlled process and under the laminar flow conditions that exist in the IFC, the transport rate is proportional to the analyte concentration in the bulk solution, with a proportionality constant called the mass transport coefficient (km ). This constant is proportional to the cube root of the liquid flow rate and depends on flow cell dimensions and the diffusion properties of the analyte in the sample solution [27]. If mass transport is much faster than the association of the analyte with the ligand, the observed binding will be determined by the interaction constants. Conversely, if mass transport is much slower than association, the binding will be wholly limited by the mass transport process and kinetic data for the interaction will not be obtained. In practice, many experimental systems will actually lie between these extremes, and the binding will be determined by contributions from both mass transport and interaction kinetics. This may be represented in a reaction scheme as: Abulk km Asurface + B ka AB km
kd
(3)
where km is the rate constant for mass transport to and from the surface and ka and kd are the rate constants for the formation and dissociation of the complex. It is important to understand the experimental parameters that determine whether the association rate is limited by mass transport or interaction. For a given analyte and flow cell dimensions, only two factors are significant: 1. The flow rate, which affects the mass transport rate constant km but has no effect on the interaction kinetics. Increasing the flow rate increases km , so that mass transport is less likely to be limiting. This effect is however relatively small, since km is proportional to the cube root of the flow rate. 2. The concentration of surface binding sites, which affects the absolute interaction rate but has no effect on mass transport. This is the most important parameter. Increasing the surface binding capacity increases the initial interaction rate, so that mass transport is more likely to be limiting.
As an interaction proceeds, the concentration of available surface binding sites falls and interactions which are limited by mass transport in the initial phase can therefore, become interaction-limited as the amount of free ligand sites decreases. The mass transport rate is also influenced by the diffusion coefficient of the analyte, which in turn depends on the molecular weight. A large analyte has a lower diffusion coefficient, and will therefore be more subject to mass transport limitations than a small analyte with the same kinetic parameters. The very same parameters that govern the influence of mass transport during the association phase also apply during dissociation: released analyte can either diffuse into the bulk buffer flow and be removed, or may in theory re-bind to the ligand. In this context too, a high flow rate and low surface binding capacity will minimize any such effects. It is important to realize that mass transport is not a “problem” in SPR analysis, but a physical property of the system which can be readily accounted for in data
19
20 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
interpretation [28] and which can also be manipulated to some advantage during assay design. When carrying out concentration measurements, it can be desirable to perform assays under conditions where the interaction is entirely controlled by mass transport [29], since under such conditions, the binding rate is entirely dependent on the analyte concentration. This makes the assay independent of affinity effects and therefore more robust for analysis of multiple analytes. These conditions are best achieved by using low flow rates and immobilizing the ligand at a high surface-binding capacity. For assays designed to measure kinetic rate constants on the other hand, the requirements are reversed. If the rate of mass transport is much faster than the association of analyte with ligand, the observed binding will be determined entirely by the interaction constants. The optimal conditions to achieve this are a combination of high flow rate and a low surface binding capacity [23]. The height of the flow cell also affects the phenomenon [20] and some Biacore systems are designed with a lower flow cell that helps to improve mass transport. Given that in practice assays can be governed by a combination of interaction constants and mass transport rate, the fitting algorithms used to determine rate constants from sensorgram data are designed to incorporate a mass transport component that ensures correct analysis [30]. In the extreme case of a completely mass transport-limited interaction however, the sensorgrams will not contain any kinetic information and the calculated rate constants will be meaningless. Fortunately, software tools are now designed to deal with this potential problem and can identify binding interactions that are entirely controlled by mass transport. 3 Biacore Applications in Basic and Clinical Research 3.1 Introduction
The publication of around 400 papers each year covering a wide range of biological and clinical research fields is testament to the versatility of Biacore’s SPR technology, with examples of quantitative analyses of biomolecular interactions to be found in virtually all disciplines within the life sciences. New innovative applications are continually appearing in the literature and demonstrate that the technology has the potential to replace and often improve on the speed and data quality of established techniques. This section will review a selection of recently published applications with particular emphasis on protein:protein interactions and protein:nucleic acid interactions in cancer research and neuroscience, two areas within which Biacore has become firmly established in recent years. 3.2 Applications of Biacore in Cancer Research
The design of targeted therapeutic reagents to cellular signaling events that initiate oncogenic transformation or maintain a tumorigenic phenotype in a cell requires
3 Biacore Applications in Basic and Clinical Research
that the specificity and kinetics of the molecular interactions in the signaling cascades are accurately determined. Possibly the most accessible interactions against which intervention may be directed are those that occur at the cell surface between growth factors and constitutively active receptors. In one approach, peptide mimetics were designed based on a functionally inhibitory antibody to HER2, an oncogenic member of the EGF family of tyrosine kinase receptors which is commonly expressed in human breast, ovarian and colon cancer and which is associated with a poor prognosis [31]. A Biacore assay was used to screen and select peptide mimetics from a range of candidates that bound with high affinity and with favorable kinetics, most importantly those with slow dissociation rates (Figure 10). Those with the slowest dissociation rates were also most effective at inducing functions such as reduced tyrosine phosphorylation of downstream signaling molecules in cell-based assays. Most encouragingly, cells treated with these mimetics were hypersensitive to radiation-induced apoptosis suggesting that the mimetics may be used clinically in conjunction with radiotherapy. Simultaneous competition studies using Biacore showed that the mimetics and the parent antibody were directed against the same epitopes on HER2 [32]. These studies also suggested that the receptor was disabled because the mimetics inhibited receptor dimerization [33], a prerequisite for the propagation of signals into the cytoplasm following engagement with a cognate ligand. A strategy of structure-based targeting of the dimerization interfaces of tyrosine kinase receptors may therefore be generally applicable to the design of antagonists of oncogenically transformed tyrosine kinase receptors. It is important to note that in this study, although the affinity of the mimetic for HER2 was lower than the parent antibody, the dissociation rate constant, kd , was similar, indicating that the complex was stable. Potential therapeutic reagents may therefore be missed during a screening process that relies on affinity data alone and both association and dissociation rate constants (ka and kd , respectively) may reveal effective binders (see [8] and [7] for detailed studies on the treatment of ka and kd data in Biacore assays to select effective binders from a large base of candidate drugs). Although growth factors generally bind with high affinity to one favored cellular receptor, they may nevertheless also bind alternative receptors. Platelet-derived growth factor (PDGF), for example, exists both as a homodimer (AA or BB) and a heterodimer (AB). While PDGF-AA binds only the PDGF-α receptor, PDGF-BB and PDGF-AB can bind both the PDGF-α and -β receptors. In addition, it was recently shown in a Biacore study that the pleiotropic scope of PDGF-BB is wider still and binds low-density lipoprotein receptor-related protein [34]. It does not always follow, however, that the functional consequences of binding alternative receptors are easily differentiated. The functions of two different cell membrane receptors, lymphotoxin-β receptor (LTβR) and herpes virus entry mediator (HveA), both of which bind a common ligand, have now been defined using Biacore. Peptides bearing selected point mutations were generated from LIGHT, a pro-apoptotic TNF ligand superfamily member. Mutants were selected that bound one, but not the other binding partner [35]. One selected mutant that bound HveA, but not LTβR in the Biacore assay, failed to induce apoptosis in target cells. The cell-based assays showed that a dominant negative mutant of TRAF3, an intracellular protein
21
22 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
Fig. 10 Screening a series of peptide mimetics using an SPR-based kinetic assay. (A) Dose dependence curves for binding of AHNP to the immobilized HER2 receptor. AHNP-Y was injected at concentrations of 0.5 (a), 1 (b), 2 (c), 4 (d) and 8 mM (e) at a flow rate of 20 :L min. The sensorgrams show binding of AHNP-Y to the immobilized HER2 (the first 300 s) followed by peptide dissociation from the receptor surface
(the final 240 s). (B) Biacore analysis of the inhibitory effect of AHNP-Y on mAb 4D5 binding to immobilized HER2. Sensorgrams show binding of 1 nM mAb 4D5 to HER2 after pre-injection of 0 (a), 0.3 (b), 1.5 (c), 6 (d), 20 (e), and 60 mM (f) AHNP-Y. Inset: the correlation between the initial rate of 4D5 binding to HER2 and concentration of pre-injected AHNP-Y.
3 Biacore Applications in Basic and Clinical Research
that binds the cytosolic domain of LTβR and transmits signals leading to cell death, blocked LIGHT-mediated apoptosis. These functional cell-based assays confirmed the Biacore binding data, indicating that LTβR but not HveA propagates LIGHTmediated apoptosis. Although mouse monoclonal antibodies have been raised to many human tumorspecific markers [36–39] their use in cancer treatment is greatly limited by the fact that the patient inevitably generates a potentially harmful immune response. Attempts to address these limitations have been made by the generation of singlechain Fv (scFv) molecules that retain specificity without presenting the highly antigenic constant regions of the whole antibody molecule to the host. In one variation of this strategy, a fully human, gene-fused bi-specific antibody (bs-Ab) was created with specificity both to IgG Fc receptors (CD16) on the surface of cytotoxic immune cells and to HER2 on tumor cells [40]. The hypothesis was that this bsAb might function as a bridge and actively recruit host cytotoxic immune cells to tumor cells. In separate Biacore experiments in which either CD16 or HER2 were immobilized on a sensor chip it was demonstrated that the gene fusion process did not greatly alter the binding kinetics of either of the component scFvs in the bs-Ab. Confirmation of “conservation-of-function” after conjugation is an important stage in the development of any carrier system in which an active molecule is conjugated to a carrier. The function of the bs-Ab was also tested in vitro, where it increased lysis of human ovarian cancer cells which over-expressed HER2 when presented together with peripheral blood lymphocytes. While whole monoclonal antibodies are potentially useful reagents for diagnostic applications, human antibody fragments obtained via phage antibody technology are a more attractive alternative for human cancer therapy since they might be expected to penetrate tumors more efficiently than whole antibodies [41–43]. Fibronectin is a ubiquitous extracellular matrix protein, one variant of which contains a novel splice-generated domain (ED-B). This isoform is found in the stroma of fetal and neoplastic tissues and in the walls of developing neoplastic blood vessels, but is absent from normal adult tissues. Reasoning that this protein is a suitably generic tumor-specific target, a number of single chain antibody fragments and dimeric variants to ED-B were generated and an optimal binding fragment was selected for clinical trials [44]. Biacore’s SPR technology was used to screen antibody fragment libraries and to compare detailed binding properties to immobilized ED-B. The affinity dissociation constants (KD ) of the antibody fragments for ED-B varied widely but those that bound ED-B with higher affinity than the parent fragment were shown in most cases to do so as a result of slower dissociation rates. When fragments were assessed for tumor targeting in vivo using real-time photodetection of fluorescently-labeled antibodies, the efficiency of tumor homing correlated closely with the slower off-rates of ED-B binding detected in the Biacore assays. Biacore’s SPR technology was therefore used to select optimal antibody fragments for diagnostic imaging and may therefore be useful in the preclinical stage of selecting reagents for therapeutic strategies targeted at the tumor vascula-ture. Cell surface receptors bind to their cognate ligands and to recruited intracellular second messengers at a hydrophobic interface: the external and internal faces of the
23
24 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
Fig. 11 Signal transduction occurs in the plasma membrane. Ras is anchored in the plasma membrane via one farnesyl and one or two palmitoyl modifications at its C-terminus, where activation by GEFproteins (here Sos) and recruitment of downstream effectors such as Raf-kinase occurs.
plasma membrane, respectively. Proteins assume markedly different structures in hydrophobic and aqueous microenvironments. Even if the transmembrane domain of the receptor is the only part of the molecule that sits directly within a lipid bilayer, structural disturbances in this region of the protein may alter the protruding intraand extracellular domains. It is thus reasonable to assume that interactions between proteins may yield a very different picture depending on the microenvironment in which the interaction is studied. A lipid microenvironment may be vital to allow immobilized receptors to assume a configuration that reasonably mimics that found in situ and thereby present a functional binding interface for interacting molecules. For example, in order to interact with the serine/threonine kinase, Raf , the proto-oncogene, Ras, must insert into a biological membrane to transduce signals from activated receptor tyrosine kinases to the intracellular MAP-kinase pathway (Figure 11). In order to be able to make kinetic measurements of the interaction between Raf and Ras, the plasma membrane was thus simulated in vitro on a hydrophobic sensor chip by loading the chip with liposomes prior to molecular interaction analysis [45]. Biacore experiments showed that GTP-bound Ras bound specifically to Raf only if Ras was immobilized on the liposome-loaded chip by prior fusion with a synthetic lipid tail (Figure 12). In functional assays, cells that were microinjected with nominally oncogenic truncated Ras without a lipid tail did not transform into tumor cells. These cells, in contrast and in full agreement with the Biacore data, only transformed if truncated Ras was co-injected along with a lipid anchor. To be able to grow beyond a volume of about 1 mm3 , a tumor, once initiated, requires its own microvasculature. Inhibition of angiogenesis, or the process of neovascularization, is an attractive strategy in the intervention of tumor spreading. Members of the vascular endothelial cell growth factor (VEGF) family are principal effectors of angiogenesis as they influence the division and migration of endothe-
3 Biacore Applications in Basic and Clinical Research
Fig. 12 Insertion of lipid-modified Ras protein into an artificial membrane on Sensor Chip HPA. Sensor Chip HPA was treated with POPC (palmitoyl oleoyl phosphatidyl choline)-containing lipid vesicles and conditioned by treatment with NaOH and bovine
serum albumin at high flow rates. Farnesylated Ras is inserted into the artificial membrane, while all non-covalently attached molecules are easily displaced from the lipid surface by the detergent octylglycoside.
lial cells by activating cognate transmembrane receptors, among them VEGFR-1 and VEGFR-2. Recombinant, soluble variants of these receptors that lack transmembrane domains can be used to sequester VEGF in solution and block binding to native cell-bound receptors. However, while cell-based approaches indicate that heparin is necessary to facilitate the functional presentation of VEGF to the receptors, cell-free assays suggest that this is not the case for the isolated soluble receptors. Kinetic analysis using Biacore showed that heparin actually inhibited the interaction with soluble receptors [46] and that this was due to a reduction in association rate. In this way, Biacore analysis therefore enabled baseline parameters to be established for the binding of soluble VEGFR-2 to VEGF and to eliminate the potential problem of a heparin requirement in therapeutic applications. Intervention in the deregulated cell cycle is another potential therapeutic anticancer strategy. Progression from the G1 (gap) phase to the S phase (DNA synthesis) of the cell division cycle is regulated by a family of cyclin-dependent protein kinases (CDKs) which are regulated in their turn by many other proteins including members of the cyclin family. In one study, binding between PCNA (proliferating cell nuclear antigen) and mutants of p21, a CDK inhibitor, was measured using both a yeast two-hybrid assay and Biacore [47]. Although these two approaches gave similar qualitative data, only Biacore provided information on binding affinity and further showed that trimerization of PCNA is a prerequisite for p21 binding. The technology was also used to screen for short peptides that inhibited the p21–PCNA interaction in competition assays. One p21-like peptide was found that abolished this interaction beyond the level of detection.
25
26 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
Cyclin G is regulated by p53 in response to growth factor stimulation and its overexpression has been shown to inhibit the proliferation of human sarcoma cells. Although this implies a role for cyclins in cell cycle control, the physiological functions of cyclin G have not been characterized. A cDNA clone encoding a potential cyclin G binding partner has been isolated [48]. This novel serine/threonine protein kinase was named cyclin G-associated kinase (GAK). Immunoprecipitation/Western blotting experiments corroborated GAK binding to cyclin G and indicated that both proteins interact weakly with CDK2. Since such cell-based experiments do not rule out indirect association of proteins within a complex, Biacore’s SPR technology was used to confirm that GAK binds directly to the N-terminus of cyclin G. The kinetic data provided by the Biacore analysis also showed that this association was stable. When the weak interaction between cyclin G and CDK2 suggested by immunoprecipitation was also investigated by SPR using recombinant proteins, no binding was detected, suggesting that any in vivo association between these two proteins either requires post-translational modifications, or that it is indirect. Biacore’s SPR technology therefore identified the primary molecular targets of cyclin G, providing the basis for future investigations into this system and its role in cancer. Cell cycle arrest and apoptosis are often demonstrably co-regulated. For example, survivin, a factor expressed during the G2/M phase of the cell cycle, co-localizes in centrosomes with the apoptotic protein, caspase-3. Although available data suggest that survivin allows cells to progress from G2 to M by binding caspase-3 and thus diverting cells from apoptosis, this has been difficult to prove. Biacore was applied to find out if survivin binds directly to caspase proteins and if this inhibits caspase function [49]. Survivin was immobilized on the sensor chip and was shown to bind caspase-3 and caspase-7 with high affinity. In order to try to explain the discrepancies between Shin et al.’s caspase inhibition results and previous studies that failed to demonstrate binding, the authors noted that whereas their own experiments were carried out with all components pre-warmed to 25 ◦ C, the survivin used in other studies was kept on ice until immediately before the assay. Using the temperature control available with the instrument, it was demonstrated that compared to experiments carried out at 25 ◦ C, the affinity of survivin for caspase-3 was reduced 100-fold at 4 ◦ C and may explain why the caspase:survivin interaction remained undetected in earlier assays. 3.3 Applications of Biacore in Clinical Research
Possibly the most attractive advantage of Biacore as a means of assessing binding partners in clinical research is the lack of a need for sample preparation or cleanup prior to running the experiment. The following examples are from reports in the literature on the analysis of clinical samples such as tissue homogenates, -plasma and urine. During the preclinical stage of the development of potentially therapeutic antibodies, promising alternatives are tested for their affinity to the target [50, 51]. In one study, chimeric and humanized murine antibodies to CD44, a marker for head
3 Biacore Applications in Basic and Clinical Research
and neck cancer, were affinity-ranked using Biacore and then assessed for biodistribution and efficacy in mice bearing head-and-neck squamous cell carcinoma xenografts [52]. Based on both these sets of data, the important finding was made that lower affinity antibodies to CD44 were superior to higher-affinity versions in their ability to target tumors. While tumor uptake of low affinity binders was up to 50 % higher than that of their high affinity counterparts, blood levels and uptake into non-cancerous tissue was unaffected. Biacore has recently made its debut in the analysis of adverse events arising from the use of humanized antibodies in phase I clinical trials [53]. The monoclonal antibody, A33 recognizes a 43-kDa cell surface glycoprotein that is expressed in cancerous human colonic epithelium and which is absent from most normal tissues. A rapid and reliable means is needed to detect and quantify any host immune response to the humanized antibody which could possibly be harmful to the patient. A Biacore assay was used in which human immunoglobulin was immobilized on a sensor chip and whole patient serum was used as the analyte to detect any potentially harmful anti-human antibody response. This study showed that Biacore is sufficiently robust and consistent between sample runs to be used as an on-site method in clinical trials. The detection and measurement of antibody markers in human plasma is a common prognostic and diagnostic means of patient evaluation, as well as in monitoring clinical changes in response to therapy. A Biacore assay has been used to detect antibodies to gangliosides found in patients with neuropathies suspected of having an autoimmune etiology [54]. By immobilizing gangliosides on a sensor chip, crude diluted patient sera could then be used to detect the presence of antiganglioside autoantibodies. The assay was faster than ELISA partly due to the lack of sample preparation and further, had a superior limit of detection that could be even further improved through signal amplification by using isotype-specific antibodies after serum injection (Figure 13). Several further examples exist of Biacore’s SPR technology in clinical evaluation including the detection of fucosylation of α1-Acid Glycoprotein (AGP), an acute phase event in inflammation. Antibodies in this case were firstly immobilized on a sensor chip to sequester AGP directly from human serum and the level of fucosylation was determined by the injection of a fucose-binding lectin [55]. The technique was used to follow the inflammatory status of severe burns patients. Other examples of the measurement of autoantibodies directly from patient plasma with little sample preparation include the detection of anti-DNA antibodies [56] and anti-protein S antibodies [57] in systemic lupus erythematosis and antifactor XII antibodies in patients with anti-phospholipid syndrome [58]. Biacore in combination with mass spectrometry (SPR-MS) has been shown in a proof-of-principle study to be applicable to the clinical situation [59]. Human urine or plasma were spiked with physiological levels of cystatin C, β2-microglobulin, urinary protein-1 and retinol-binding protein and injected over specific antibodies immobilized on a sensor chip. The captured analytes were then digested and identified by mass spectrometry. SPR-MS was used to detect simultaneously β2microglobulin and cystatin C in a single spiked plasma sample by injecting over a
27
28 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
Fig. 13 Use of secondary antibodies for identification of the anti-GM1 autoantibody isotype. (A) Analysis of a serum sample from a patient with multifocal motor neuropathy with anti-IgM and anti-IgG antibodies, indicates that the antibodies in the
serum are IgM. (B) A similar analysis of a sample from a patient with Guillain–Barr´e syndrome indicates the presence of IgG antibodies. (Reprinted with the permission
of Dr Armin Alaedini, Department of Neurology & Neuroscience, Cornell University).
chip with different antibodies immobilized in two different flow cells. Additionally the method was able to detect protein complexes such as that between retinolbinding protein (RBP) and transthyretin using an anti-RBP antibody. A mass spectrum from the chip surface showed traces of both species enabling detection of in vivo assembled protein complexes using SPR-MS. 3.4 Applications of Biacore in Neuroscience
Alzheimer’s disease is associated with the deposition of polymerized amyloid βpeptide (Aβ) protein in lesions in the brain parenchyma. However, Aβ protein is expressed by almost all nucleated cells and is not secreted at elevated levels in patients compared with healthy subjects. Why the protein should be deposited to form lesions exclusively in tissues of the central nervous system is unknown. The most common plasma proteins are also present in cerebrospinal fluid (CSF), but
3 Biacore Applications in Basic and Clinical Research
at much lower concentrations. Biacore’s SPR technology was used in a screening strategy to find out whether any proteins common to both plasma and CSF could bind Aβ protein and thus potentially inhibit deposition [60]. Two forms of Aβ protein are known; one with 40 amino acids and a longer variant that contains two extra C-terminal residues (Aβ1–42). The latter polymerizes more readily into plaque-forming fibrils called β-amyloid. Two parallel immobilization strategies were followed to monitor the binding of plasma/CSF proteins to both forms of Aβ protein. Plasma and CSF proteins were tested for their capacity to inhibit Aβ protein polymerization in a microtitre assay in which Aβ protein or Aβ1–42 were immobilized as templates to which further Aβ protein was added in the presence of the test protein. This revealed that the most inhibitory substances were albumin, α1-antitrypsin, IgA and IgG, proteins also found at a lower concentration in CSF. In Biacore experiments, albumin was run over flow cells prepared as indicated above and was shown to bind only to immobilized, polymerized Aβ protein. The authors suggest that albumin, at concentrations found in plasma, may bind and prevent the deposition of polymerized Aβ protein. In inflammation, levels of albumin in CSF may be sufficiently reduced to abrogate this protective function, enabling the formation of β-amyloid. In addition to Aβ protein, serum amyloid P component (SAP) is also found in amyloid deposits in Alzheimer’s disease and, like Aβ protein, tends to selfassociate in the presence of calcium. Calumenin is one of a family of endoplasmic reticulum proteins that coordinate calcium in maintaining the structural integrity of helix–loop–helix motifs in certain proteins. One study elegantly demonstrated how Biacore could be used to define the interaction and characterize the conditions necessary for it to occur [61]. An unidentified binding partner for calumenin was first detected when a crude placental extract was run over recombinant calumenin immobilized on an affinity column in the presence of calcium. After elution and PAGE, one 30-kDa band was isolated and the protein was internally sequenced by tryptic digestion and identified as SAP. Calumenin was then immobilized as a ligand on a sensor chip. The results confirm that SAP is a ligand for calumenin and that the formation and stability of the complex requires calcium. It is therefore possible that given the tendency of both SAP and calumenin to form insoluble complexes, calumenin may participate in amyloidosis, the pathological process by which amyloid is deposited in the CNS of patients with Alzheimer’s disease. Spinal muscular atrophy (SMA) is a disease that is unequivocally linked to a gene called survival of motor neurons (SMN) [62, 63]. The disease is severely disabling and is characterized by the loss of motor neurons from the lower spinal cord. There is a strong correlation between the severity of SMA and levels of SMN protein. The function of SMN protein is unclear although it is thought to be involved in mRNA splicing. To determine a possible mechanism to explain why mis-sense mutations in SMN should compromise protein function and hence cause disease, Biacore was used to discover how native peptide fragments of SMN, defined by whole, discrete exons, interact with each other [64]. To identify and select the tools necessary to map these protein interactions, a panel of monoclonal antibodies was raised to full-length SMN protein. The antibodies were then tested using Biacore’s SPR
29
30 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
technology for reactivity against the peptides encoded by each exon in the cloned gene. The antibodies were immobilized on a sensor chip and screened with all the fragments. Antibodies were thereby identified which specifically recognized certain peptides within the protein and which could then be used to characterize intra-chain interactions in SMN that may determine conformational settings of functional importance in the full-length protein. By then immobilizing each peptide fragment and screening with exon-defined peptides, the authors defined the fragments of SMN which possessed the potential to interact with each other. This information, combined with epitope mapping, enabled an experiment to be constructed which gave a wealth of information about the structural intramolecular events behind the function of a protein. One monoclonal antibody that was shown to recognize the peptide product of exon “A” alone was immobilized on a sensor chip and subsequent injections of different peptides showed that the capture of peptide A enabled peptides “B1” and “B2” to bind sequentially. These data show not only that peptide A appears to form a scaffold to allow the assembly of B1 and B2, but that the functional protein product is a result of the creation of a novel epitope formed by the combination of B1 and B2. This study is recommended as a model of how Biacore’s SPR technology can be used to test the molecular mechanisms underlying the expression of disease. Alzheimer’s disease is a neurodegenerative disease in which Aβ protein participates in neurotoxic aggregates which localize to sites of cell death and neuronal apoptosis. Toxicity may also be caused by the localized release of activated inflammatory cytokines in the formation of so-called senile plaques. Aβ protein has been shown to induce the release of pro-inflammatory cytokines by cultured cells and this function was shown to be inhibited by ganglioside GM1 [65]. Biacore’s SPR technology was used to demonstrate direct binding of Aβ protein and liposomeimmobilized GM1, suggesting the mechanism behind this neuropathologic function. Interaction was shown to depend on the presence of sialic acid. As Aβ protein is known to be neurotoxic in a β-pleated conformation, the authors suggest that one function of GM1 may be to maintain Aβ protein in an inactive state and thus prevent the initiation of the inflammatory response of neurons in Alzheimer’s disease. In a modified approach to the liposome capture strategy of Ariga and Yu [65] described above, liposomes were immobilized on a sensor chip via an antibody [16]. In addition to ganglioside GM1, lipopolysaccharide (LPA) was incorporated into the liposome construction in this study, enabling capture by an anti-LPA antibody and mild regeneration conditions. A rigorous investigation into the immobilization of gangliosides has been carried out in an effort to develop a simple assay for anti-ganglioside antibodies [66]. A requirement of the technique is to present the glycolipid in an appropriate orientation for antibody characterization i.e. in a manner in which the sugar moiety presented in tumor cells in vivo is available to the analyte and which is thus a potential target for therapeutic monoclonal antibodies. The gangliosides were immobilized via the lipid tail, as using the more common aldehyde coupling procedure via the carbohydrate domain masks the epitope. The authors present a simple and rapid protocol, yielding a stable surface that can be readily reconditioned and reused. Gangliosides
3 Biacore Applications in Basic and Clinical Research
were injected over the unmodified surface of a sensor chip, with ganglioside GD3 covalently immobilized by photoactivation for comparison. Although overall immobilization levels were low compared with those achieved by covalent linkage, the molar level of binding (specific binding) was much higher. Phase I clinical trials, in which chimeric anti-ganglioside antibodies were tested in children with neuroblastomas [67], have shown that a host immune response is still induced to the murine component of the antibody. For this reason, engineered humanized antibodies wele synthesizd that retain the murine specificity-determining CDR regions, but from which the flanking V-region sequences have been replaced with human counterparts. This produces a theoretically less immunogenic protein in which the murine component is reduced from 30 to 10 %. Nakamura et al. [67] based the characterization of the interaction between these humanized antibodies and immobilized gangliosides GD2 and GD3 on the protocol of Catimel et al. [66] above. Their Biacore experiments revealed that humanization changed the kinetic characteristics of the antibodies in ways that are otherwise difficult if not impossible to predict. Biacore’s SPR technology thus provides information about the molecular mechanisms underlying functional changes in humanized antibodies, a critical part of the development of novel antibodies for human immunotherapy. Nerve growth factor (NGF), a member of the neurotrophin family, is implicated in the differentiation and survival functions of neurons [68, 69]. Biacore’s SPR technology has been used to characterize many types of receptor:ligand interactions, including that between NGF and the Trk family of receptors containing intracellular tyrosine kinase domains. Shc is a so-called adaptor protein comprising a C-terminal SH2 domain and an N-terminal phosphotyrosine-binding (PTB) domain. One of its known functions is to link ligand-activated growth factor receptors, including the human NGF receptor TrkA, to the Ras signaling pathway. This functions through its participation in the multi-molecular signaling complex, Shc-Grb2-Sos. The SH2 domain recognizes phosphotyrosine indirectly via adjacent C-terminal amino acids, while the PTB domain directly binds phosphotyrosine located within a specific amino acid motif. Biacore’s SPR technology was used to study the specificity and kinetics of the interaction between the PTB domain of Drosophila Shc and phosphopeptides derived from an EGF-receptor homolog, containing structural features shared by other receptors including TrkA, such as conserved hydrophobic amino acid residues [70]. This system is therefore likely to be a suitable model for other receptor:ligand interactions. Initial experiments in which a peptide containing phosphotyrosine flanked by 11 N-terminal and five C-terminal residues was immobilized on a sensor chip, showed that the association and dissociation rates of the Shc PTB domain were somewhat slower than typical, previously determined SH2 interactions. This may imply that in the interaction between Shc and an activated receptor, Shc is likely to bind rapidly to its target via its SH2 domain while its PTB domain limits the rate of dissociation. By measuring the ability of truncated or point-mutated phosphopeptides to inhibit the binding of the Shc PTB domain to native phosphopeptide, the specificity of the target sequence surrounding phosphotyrosine in the growth factor receptor was determined in a Biacore competition assay. Phosphopeptides were
31
32 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
truncated at various positions N-terminal to phosphotyrosine, incubated in solution with Shc and passed over native phosphopeptide immobilized on a sensor chip. C-terminal deletions had little effect, but deleting N-terminal amino acids revealed that the presence of hydrophobic residues at positions −5 and −7 were critical for Shc to effectively bind the receptor. As expected, nanomolar concentrations of unmodified peptide efficiently inhibited binding of Shc while an interaction could still be detected even in the presence of high concentrations of deletion-peptides. In a study of some relevance to Biacore assays in the receptor:ligand interaction field in general, the receptor for mammalian TrkA was characterized [71] in which the extracellular ligand-binding domain of TrkA was over-expressed in insect cells, purified and immobilized on a sensor chip. The association rate of NGF-binding to recombinant TrkA extracellular domain was close to previously reported values using the full-length receptor. These results suggest that recombinant, purified receptors may be used as a model in the analysis of NGF:TrkA interactions using Biacore’s SPR technology and that the strategy may be generally applicable to a wider range of receptor:ligand interactions. The physico-chemical composition of the microenvironment of the binding site on a receptor for its cognate ligand is critical. Ionic strength and pH, for example, affect non-covalent bonds and so by varying the buffer composition, judgments can be made on the nature of the interaction. Biacore’s SPR technology was used to discover whether local variations in pH changed the capacity of Zn2+ and Cu2+ to modulate NGF binding to its receptor [72]. To do this, the cations were included in running buffers with pH varying from 5.5 to 7.4, mimicking local conditions in cerebral acidosis, a condition frequently arising after stroke or traumatic insult. This pH range also covers the pKa value of histidine, the position and electrostatic status of which are known to influence cation binding. The authors used the strategy described by Woo et al. [71] to prepare and immobilize recombinant TrkA. The dissociation equilibrium constant, KD, for the NGF:TrkA interaction was calculated over a range of pH values covering the pKa value of histidine. This revealed that Zn2+ , but not Cu2+ , lost its ability to bind NGF and inhibit its interaction with TrkA under acidic conditions. The physiological consequences of this depend on the cell type and context; TrkA-expressing cells in an acidic environment and needing NGF for survival may well benefit from the inactivation of Zn2+ , whereas the effects might be detrimental if NGF, in contrast, initiates signals leading to cell death. NGF functions both within and outside the nervous system, for example it is released from mast cells during allergic inflammation and is implicated in the pathophysiology of asthma. Using in vitro studies it was shown that a novel protein called FIZZ1, isolated from the lavage fluid of inflamed murine bronchoalveoli, dampened NGF-mediated survival and gene expression in neurons [73]. Biacore was used to show that an antagonistic mechanism, which is not based on competition for a single receptor, operates to inhibit NGF function in TrkA-bearing cells. The data suggested that rather than inhibiting the interaction of NGF with its receptor, FIZZ1 functions indirectly by binding to a different and as yet unknown receptor, possibly regulating TrkA activity by “inside-to-out” signaling.
3 Biacore Applications in Basic and Clinical Research
The rationale for one Biacore study on Alzheimer’s disease [74] was to identify small peptides that can inhibit aggregation and therefore the toxic neurodegenerative effects of β-amyloid (Aβ) in the diseased brain. Although Aβ aggregates are present in neurotoxic lesions, it is not known if they are causative. Short peptides directed to the self-recognition motifs in Aβ were synthesized and were used to screen the affinities of the interactions. Solution-based methods for screening Aβ inhibitors consume large amounts of the target and they are limited by the tendency of Aβ to aggregate in solution and thus present multiple target forms. This problem is avoided in Biacore assays because the target is immobilized on a planar sensor chip surface at a density and in an orientation that can be carefully controlled by the user. A 26-amino acid fragment of Aβ comprising amino acids 10 to 35 with a Cterminal cysteine was immobilized at low density on a sensor chip thus presenting a uniform surface. Further, inter-assay variability was low as a single chip surface could be regenerated several times and re-used for all test interactions. A range of concentrations of each peptide, based on variations of a five-amino acid sequence corresponding to the central hydrophobic domain of Aβ known to be responsible for self-association, was injected over Aβ on the chip surface for 500 s and the response at equilibrium (Req ) was plotted against time. As the binding was weak, only relative affinities were measured, with the equilibrium response determined after 90 % of the contact time at each concentration. Of several peptides based on multiple length C-terminal extensions of a pentapeptide in the central hydrophobic domain of Aβ (which generally bound more strongly than the original pentapeptide to Aβ), a pattern emerged in which the affinity for Aβ was sensitive to site-specific positioning of positively-charged lysine residues. For example, it was previously determined that the addition of six lysine residues directly adjacent to the basic KLVFF sequence generated a more effective inhibitor of cytotoxicity in cell-based assays. In the Biacore assay, peptides in which three of these residues were separated from the basic sequence by three negativelycharged residues had a 14-fold lower affinity for Aβ, while the affinity of a peptide of similar size bearing three directly adjacent lysine residues was almost unchanged. The peptides were then tested in cytotoxicity microtitre assays by incubating Aβ with neuroblastoma cells in the presence of peptides and compared with the Biacore affinity data (Table 1). High affinity binders protected against toxicity by greater than 80 % in viability assays while low affinity binders were less effective. It is hoped that these peptides will be useful as probes of the mechanisms underlying amyloid plaque formation and in the design of drugs. Biacore’s SPR technology, therefore, is proposed as a reliable means to measure the affinity of small peptides for Aβ, a protein which is notoriously difficult to handle in solution-based assays. Such data may help predict the suitability of candidate peptides as inhibitors of Aβ toxicity in Alzheimer’s disease. The glutamate receptor δ2 (GluRδ2) [75, 76] is exclusively expressed on the dendritic synapses of Purkinje cells [77], structurally unique cells that are among the most organizationally complex neurons in the mammalian nervous system. Although they receive an enormous amount of data from neurons of the spinal cord and elsewhere, they are nevertheless able to convert this input into interpretable
33
34 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors Table 1 Peptides based on C-terminal extensions to a central KLVFF fragment derived from
the central hydrophobic domain of Aβ were tested for their affinity to Aβ immobilized on Pioneer Chip B1. Affinity (KD ) is related to the ability of the same peptide to inhibit Aβmediated cell cytotoxicity in a cell-based assay Peptide sequence*
KD (l M)
Viability (%)+
KLVFFRRRRRR KLVFFKKKKKK KLVWWKKKKKK KLVFFKKKK KLVFWKKKKKK KLVFFKK KLVFFKKKEEE KKKKLVFF KLVFFEKEKEK KKKKKK KLVFFEEEKKK
40 40 40 37 65 80 90 180 300 400 1300
> 90 80–90 80–90 70–80 70–80 70–80 60–70† 60–70† 60–70† 60–70† 60–70†
+ Cellular viability of human neuroblastomas cells was assessed in a microtiter assay using an MTT assay in which cells were exposed to Aβ in the presence or absence of peptides. ∗ K, lysine; L, leucine; V, valine; F, phenylalanine; R, arginine; W, tryptophan; E, glutamic acid. † This is a similar level of viability as seen after treatment with negative controls.
information and are, in fact, the sole means of output in the cerebellar cortex. That GluRδ2 is limited to these cells possibly implies a function in this extraordinary feat of information processing. Biacore analysis showed that an intracellular protein called delphilin co-localizes with GluRδ2 at parallel fiber (input)–Purkinje cell (output) synapses [78] and that recombinant delphilin PDZ domain binds to the C terminus of GluRδ2 with moderate affinity, mostly accounted for by rapid dissociation. The authors also showed that delphilin bound profilin, an actin-binding protein, and dissociated very slowly. It is possible therefore that delphilin acts as a molecular bridge linking GluRδ2 to the actin cytoskeleton via profilin with rapid turnover of the GluRδ2:delphilin interaction contributing to the capacity of the Purkinje cell to process a very large amount of input data. Spinal muscular atrophy (SMA) is predominantly a childhood disease of motor neurons in the spinal cord, affecting the voluntary muscles and causing the impediment of functions such as locomotion and head and neck control. The disease is strongly linked to mutations in the SMN1 (survival motor neuron 1) gene. It has recently been suggested that the clinical symptoms of the disease may be due to increased motor neuron death through the inability of mutated SMN protein to dimerize, and thereby sequester and inhibit the pro-apoptotic function of the tumor suppressor protein, p53 [79]. An elegant surface plasmon resonance strategy was followed in which Biacore was used initially to confirm conventional immunoprecipitation (i.p.) data using an antibody to p53 showing that SMN associated with p53. This finding was substantiated using Biacore in sequential binding experiments that demonstrated how the technology might be used as an alternative to immunoprecipitations and may represent a more convenient tool for detecting molecular interactions in whole cell
3 Biacore Applications in Basic and Clinical Research
Fig. 14 Comparison between immunoprecipitation/blotting and Biacore in the detection of molecular complexes. Both assays reveal that SMN protein derived from whole cell extracts is associated with p53 in the cell. Results achieved by immunoprecipitation are available within 1 day whereas the Biacore assay takes minutes.
extracts (Figure 14). Briefly, anti-p53 antibody was coupled to a sensor chip. Next, whole cell extracts from SMN-transfected cells (treated with a protease inhibitor that protects the labile p53) were passed over the chip, followed by exposure to anti-SMN antibody. Cell extracts that did not contain detectable levels of p53 were used in parallel and demonstrated the specificity of the interaction. SMA patient-derived mutations were also tested for p53 binding. Two SMN proteins carrying C-terminal mutations found in patients with severe type I SMA and a peptide from a patient with less severe type II SMA were immobilized on a sensor chip and exposed to p53. The magnitude in reduction of SMN peptide binding strongly correlated with the clinical subset. This defect is likely to be attributable to the inability of the SMN mutant proteins to efficiently self-associate. Young et al. [79] then applied the sequential Biacore approach to show that this very inability to self-associate may prevent SMN from sequestering and inhibiting p53 function. In these experiments, p53 failed to bind to immobilized monomeric SMN and was only induced to bind when the same chip was supplemented with dimeric SMN.
35
36 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
The significance of these findings is supported in the same body of work by results from indirect confocal immunofluorescence experiments showing that SMN and p53 co-localize in the Cajal bodies (sites of small nuclear ribonucleoprotein particles involved in RNA splicing) of normal fibroblasts, but not in those derived from SMA patients. 3.5 Drug: Plasma Protein Interactions
The clinical potential of drug compounds is greatly affected by the nature of their interactions with circulating plasma proteins, such as human serum albumin (HSA) and α1-acid glycoprotein (AGP). The effects of plasma protein binding by drugs are varied and can influence factors such as the free concentration of drug compound in the circulation, transport and distribution around the body and the duration of drug action. Biacore offers the user the ability to run rapid, automated assays with high sensitivity and relatively low sample consumption. The sensitivity of the technology varies from system to system but association rate constants (ka ) of up to 5 × 106 M−1 s−1 and dissociation rate constants (kd ) of between 10−4 and 1 s−1 are accurately and reproducibly attainable with the latest available systems. Lead compounds can be analyzed from sub-nanomolar to millimolar concentrations. R Biacore’s latest model, Biacore S51, is specifically designed for drug discovery applications from simple ranking, to detailed characterization of the interactions. Hit selection assays may be employed for ranking of lead compounds for binding to one or two plasma proteins, with a throughput of 96 compounds in 12 h. The data produced from single concentration analyses permit the ranking of compounds in terms of binding response to single or multiple plasma proteins. Based on this ranking assay, compounds for which a high-resolution analysis is required can then be selected. These are then taken for detailed compound characterization assays which use concentration series of compounds to generate data sets from which KD values can be calculated using appropriate fitting models. The KD values may then be transformed to give percentage-bound figures, which can be calculated for single or combined plasma protein binding, as appropriate. It may also be possible to maximize the level of the compound characterization to include kinetic analyses of drug–plasma protein interactions in some cases. 3.6 Drug:DNA Interactions
A number of drugs bind cellular DNA and inhibit targeted protein–DNA interactions. A prerequisite for the design of new types of compounds is rapid penetration of cells and interaction with a specific sequence of DNA. Such compounds, structurally predicted from the gene sequence and functionally tested using Biacore’s SPR technology, could have many therapeutic and biotechnological uses. Relatively small molecules can be remarkably specific in their sequence recognition. A “prodrug” of furamidine, for example, binds strongly to the minor groove of DNA and
3 Biacore Applications in Basic and Clinical Research
is selective for AT-rich sequences [80]. Biacore has been used in combination with DNA footprinting methods [81], which are employed to quickly characterize compound interactions with defined sequences of various sizes to discover new DNAbinding drugs. Biacore is then used to resolve questions of stoichiometry, relative affinities and binding kinetics. The characterization of DNA sequence recognition and DNA binding of indolocarbazole antitumor drugs is described elegantly in two studies [81, 82]. Thermodynamic characteristics of binding of organic cations to AT and mixed sequence DNA sites are shown by isothermal calorimetry and Biacore to be very dependent on structure, solvation and sequence of the DNA binding site (see also [83] and [84]). Recently Biacore was shown to be a rapid and reliable means to detect DNA base pair mismatches by using a synthetic low molecular weight ligand that specifically binds G:G mismatches. This work has far-reaching implications for SPR in genome analysis [85]. The technique depends on the development of reagents that specifically recognize certain types of aberrations in native DNA. Today, single nucleotide polymorphisms (SNPs) are routinely detected using methods that rely largely on the separation of DNA fragments by gel electrophoresis or HPLC, neither of which are readily suited to high-throughput analysis. The sheer volume of sequence information from the human genome project demands the development of a tool for rapidly scanning DNA for SNPs and Sando et al. [85] have used Biacore’s SPR technology as an alternative means to this end. They designed and synthesized a low molecular weight ligand that specifically binds double-stranded DNA containing G:G mismatches arising from native C:G pairings and which has minimal reactivity with any of the other seven possible mismatch combinations (A:A, A:C, A:G, T:T, T:C, T:G or C:C). The ligand, naphthyridine, is modeled to recognize the characteristic open bulge that is created between DNA strands at the site of a G:G mismatch, at which no complementary base (cytosine) is present with which to form a hydrogen-bonded base pair. Naphthyridine is synthesized as a dimer and is therefore able to form simultaneous hydrogen bonds with both guanines on each DNA strand. Experiments have shown that the melting temperature, i.e. that at which half of the double-stranded DNA in a sample dissociates into single chains and which is a measure of DNA stability, increased in the presence of dimeric, but not monomeric naphthyridine, to a much higher threshold than that of native DNA or DNA of an otherwise identical composition containing other mismatches. The synthetic naphthyridine ligand contains an aminoalkyl chain and can therefore be covalently linked by amine coupling to carboxyl groups on a sensor chip. 27-mer doublestranded DNA was passed over the prepared chips and binding was seen only if the DNA contained G:G mismatches. DNA containing other SNPs or native DNA did not bind to the ligand. The sensitivity of the assay was shown by a linear increase in response with concentration of analyte from 125 to 1000 nM, a range that was optimized by changes in salt concentration. The intensity of response was a function of both DNA concentration and length. Other non-gel based mismatch screening methods such as melting curve analysis or MutS capture of mismatched fragments rely on detection systems such as
37
38 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
labeling with a fluorescent dye or the use of labeled primers. Sando et al. [85] showed that Biacore’s SPR technology is sensitive, specific, requires no primers or probes and can be adapted for high throughput analysis. This is the first report in which Biacore has been used for DNA mutation analysis and while the design and synthesis of ligands that are specific for other SNPs will be difficult, the principle of Sando et al.’s work is a promising development in accelerating DNA mutation analysis (see also [86] for a commentary on this exciting work). One further study shows that a second class of small molecule, pyrrole-imidazole-containing polyamides, can also function as DNA mismatch recognition agents [87]. 3.7 Nucleic Acid Structure and Analysis
In addition to identifying prospective binders to DNA in the search for novel drugs, Biacore has also been used to investigate the effects of deliberate alterations in the structure of nucleic acids on their function. For example, it has been shown that the substitution of the sugar-phosphodiester backbone of DNA to create so-called pyrrolidine-amide oligonucleotide mimics (POM) [88] enabled the synthetic singlestranded DNA analogs to bind to complementary native strands with higher affinity than did other native strands. Additionally it was shown that POMs have a much faster association rate for binding to RNA than for binding to DNA. Simple cations and polyamines are also shown to affect the binding affinity of ssDNA in duplex formation and the rate of DNA duplex formation and dissociation. Affinity and kinetic analyses have been performed on several different types of DNA structures including perfect linear matches, bulged loops, mismatched duplexes and hairpin duplexes. Of possibly critical importance in cancer, Biacore studies have shown that polyamines (commonly found in tumor cells) may stabilize mismatched duplexes via increasing the thermal stability of the duplex and causing a reduction in kd [89]. Protein:DNA interactions have also been tested by immobilizing a range of different structural forms of DNA on a sensor chip surface and testing affinity and kinetic parameters with interacting proteins [90]. This conceptually interesting work shows that DNA polymerase binds quantitatively and kinetically more efficiently to primed DNA than to bluntend dimers. “Morpholinos” (MORFs) are a recently developed class of synthetic nucleotide oligomers that carry nucleotides in the same defined sequences as their parent nucleic acid but in which the sugar backbone is replaced with a phosphorodiamidate backbone [91]. MORFs are reported to be more nuclease-resistant, more soluble and more rigid than native DNA and thus potentially may be highly efficacious as carriers of sequence-targeted radiopharmaceuticals in cancer treatment [92]. Anti-sense DNA is a frequently investigated strategy aimed at the targeted inhibition of the translation of oncogenic protein genes in vivo. The efficacy of such treatment, however, is limited by the difficulty of ensuring delivery of antisense oligonucleotides into cells. Anti-sense oligonucleotides were conjugated with cell-penetrating peptides and Biacore was used to confirm that the binding affinities of these oligonucleotides for their targets after conjugation were similar to unconjugated oligonucleotides [93]. An extension of the anti-sense approach is a strategy
3 Biacore Applications in Basic and Clinical Research
in which double-stranded DNA is targeted, the so-called anti-gene approach. Here, a DNA triple helix is transiently formed between a specific dsDNA and a singlestranded oligonucleotide and which may potentially inhibit transcription of specific genes. These transient structures, however, are unstable in vivo and Biacore was used in one study to monitor the stability of the complexes formed between dsDNA and chemically-stabilized oligonucleotides [94]. Interactions between small deoxyribozymes and RNA are also open to Biacore on-chip analysis. In one interesting study, the role of higher order RNA structure in self-association has been investigated [95]. The authors propose that mRNA transcribed from mutated genes can form structural anomalies that are regulatory parameters in themselves i.e. that the shape of the RNA in addition to its sequence is important. The implication is that even if a mutation in a codon does not result in an amino acid change in the translated protein, the structural alteration of the transcript may nevertheless have functional consequences. Accurate mRNA splicing for example, is sensitive to the maintenance of RNA structure in order to allow interactions with other cellular components. The authors immobilized short sequences of RNA which recognized and cleaved RNA sequences containing a loop resulting from one such mutation. The cleavage function was shown to be both loop size- and sequence-dependent and further, the reaction could be monitored in real time on the chip surface. Most real-time “on-chip” reactions have been concerned with monitoring realtime changes in molecular weight of an immobilized substrate during passage of an enzyme analyte although DNA synthesis during the passage of DNA polymerase has also been studied [96]. For selected reading on monitoring enzyme activity on Biacore chips through real-time alterations of substrate molecular weight, see also [97] (Design of helical proteins for real-time endoprotease assays); [98] (Proteolysis of the exodomain of recombinant protease-activated receptors: prediction of receptor activation or inactivation by MALDI mass spectrometry); [99] (Monitoring of the refolding process for immobilized firefly luciferase with a biosensor based on surface plasmon resonance); and [100] (Peptide chips for the quantitative evaluation of protein kinase activity). 3.8 Protein:RNA Interactions
The association of specific proteins with RNA molecules begins immediately as the nascent RNA chain is produced during transcription. From this point onwards, a series of different proteins are bound to and released from the RNA and are involved in co- and post-transcriptional RNA processing events such as splicing, export to the cytoplasm, regulation of RNA stability and translation. These interactions are highly dynamic and are critical in the regulation of the various steps involved in RNA processing. In addition, these individual RNA:protein interactions occur in the environment of extremely large and complex nucleic acid–protein complexes such as the transcriptosome and spliceosome. Until recently, however, studies on RNA-binding proteins have relied heavily on techniques which examine binding under equilibrium conditions (such as electrophoretic mobility shift assays). Although this type of
39
40 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
affinity-based data is valuable in many instances, it does not address the key issue of the dynamics of the RNA:protein interactions. This requires a true kinetic analysis that can deliver individual association and dissociation rate constants (ka and kd ). Biacore’s SPR technology can be used to provide detailed kinetic data on the interaction of RNA-binding proteins with their specific RNA targets and this information can provide invaluable insights into their functions. In one study, detailed kinetic analysis was used to elucidate a novel two-stage binding interaction between the spliceosomal protein, U1A and its RNA target [6]. Although the range of surface chemistry approaches that are compatible with Biacore’s technology enables great flexibility in selecting which interaction partner should be immobilized on the sensor surface, the nucleic acid is most often immobilized in RNA:protein studies. One reason for this is the relative ease with which RNA oligonucleotides can be chemically synthesized with a terminal biotin modification which enables them to be immobilized in an oriented fashion on a commercially-available streptavidin surface (Sensor Chip SA). In one particular study on binding of the neuronal protein, HuD to RNA recognition motifs (RRMs), individual RRMs that had remained “hidden” during previous equilibrium analysis-based studies were shown to make important kinetic contributions to the overall binding function (see [22] and [101]). Many RNA-binding proteins contain multiple RRM regions and the results from the HuD study may therefore represent a general model for how investigation into this type of protein should be approached. In the same study, the analysis of spliceosomal protein U1A binding to its target RNA revealed a hitherto unknown two-stage mechanism of binding. This data was also able to identify key amino acid and nucleotide residues involved in the individual interaction stages and provided evidence for the types of forces involved. This model may also prove to have a much wider relevance, since a number of other RNA-binding proteins are known to contain similar distributions of basic amino acids around their RNA binding sites. In both cases, crucial new information regarding the mechanism of the protein:RNA interaction was obtained. These studies relied heavily on the ability of Biacore’s SPR technology to provide high-quality kinetic data. In addition to furthering our understanding of the many vital biological interactions between proteins and RNAs, kinetic approaches may also be invaluable in designing therapeutic small molecule inhibitors.
4 Current Developments and Future Perspectives 4.1 SPR-MS
The advent of the proteomics era has placed a new emphasis on techniques that enable the identification of proteins. Of these, mass spectrometry has been the most significant, with MALDI-TOF (matrix-assisted laser desorption/ionizationtime of flight) and ESI (electrospray ionization) approaches dominating the field. MS analysis is commonly used in combination with traditional separation
4 Current Developments and Future Perspectives 41
techniques, predominantly two-dimensional gel electrophoresis, in order to combine expression profiling with protein identification. In contrast to the array technology approaches used in expression profiling at the mRNA level, analogous developments within proteomics will be greatly complicated by the extensive and complex interactions among the proteins expressed. In practice, this means that simple expression data will be of extremely limited value without a detailed understanding of the expressed proteins’ interaction partners. Combining SPR with MS would therefore provide a potentially powerful marriage of technologies within the proteomics field, with the possibility of providing protein identification which is selected on the basis of functional binding criteria. The painful lesson learned from previous genomic studies is that the expression of a gene in a particular biological context can often prove to be a meaningless observation in itself in the absence of direct functional data. A number of approaches to combining SPR and MS have been taken thus far. It is possible, for example, to physically remove the sensor chip from a Biacore instrument following binding of the analyte to the ligand and use it directly in a MALDI-TOF analysis [102, 103]. This approach produces minimal analyte loss, since all of the analyte molecules are bound within the area of the flow cell which is used as the MALDI target. An alternative method is to elute captured analyte from the sensor surface, using a procedure designed to recover the sample in a small volume after SPR analysis [59, 104]. In addition to being able to use the same sensor surface for repeated recoveries, another advantage of this approach is that it is also possible to enzymatically digest the bound analyte on the sensor surface prior to or following recovery [105], which greatly facilitates MS identification by allowing alignment of MS profiles against various databases. A disadvantage to this approach is the risk of contamination, which must always be guarded against. The SPR route into MS analysis holds a great deal of promise, especially in the field of interaction proteomics, and as testified by the recently published work just described, much has already been achieved. Future developments will doubtless further improve the efficiency and automation of the technique. The small sample volume requirement and high sensitivity features of SPR technology, which are major advantages in terms of interaction analysis, mean that the amount of material that it is possible to obtain from a single SPR experiment for MS analysis remains somewhat restricted. This reflects the fact that the design of SPR instruments has hitherto been optimized very much from the viewpoint of the interaction analysis itself, rather than placing any emphasis on preparative purposes. As SPRMS continues to be developed as a combined technology concept however, it can be anticipated that the two constituent technologies will adapt to each other to provide an optimal solution. 4.2 The Challenges of High-throughput Protein Analysis and Array Technologies
The range of currently available technologies for studying proteins all appear to be restricted in the throughput that can be easily achieved. This suggests that the main
42 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
challenge for the next phase of development is principally one of scale. As interest in proteins shifts gradually away from considering isolated or restricted groups of molecules and more towards understanding how multiple proteins interact in complex, functionally interrelated networks, it is inevitable that increasingly higher throughput systems will be desirable. While protein microarray systems will certainly be developed over the next few years, it is worth considering some of the realities relating to the challenges involved. It would be optimistic to imagine that in the near future proteomics will be conducted on a genomics-scale, since the complexity of working with proteins as compared to nucleic acids presents more of a challenge for developments in automation. There are no current proteomics technologies analogous in performance to the PCR, high-throughput sequencing and SNP analysis technologies which have enabled such astonishingly rapid progress in genomics-related disciplines. Protein array chips will certainly be important tools in proteomics research, but the scale with which they will be used will depend on the information they provide. The word “array” may itself have very different meanings and interpretations. To the genomics field, familiar with chip-based cDNA arrays for expression profiling for example, “array” conjures up an expectation of anything from several hundred to many tens of thousands of individual DNA sequences on a single analysis surface. Despite the highly specific nucleic acid sequence hybridization principle on which such systems are based and the simple, semi-quantitative nature of the data output, this type of array technology experienced difficulties in terms of reliability, reproducibility and not least, data analysis. The challenges for large-scale protein arrays are likely to be considerably more difficult than for their DNA counterparts, with complications such as the difficulties in expressing and purifying large numbers of correctly folded proteins with the appropriate post-translational modifications, protein instability, much greater potential for cross-reactivity, greater difficulty in standardizing binding conditions and much more complex data handling. These fundamental challenges must be dealt with in some way by all the technologies that hope to contribute to the development of protein array approaches. In practice, the development of protein array chips will almost certainly involve a trade-off between relatively low-information screening and more detailed analysis approaches, in terms of the total number of simultaneously analyzable protein “spots” that are practical to use. Protein arrays will doubtless be applied in both the areas of “discovery proteomics” and “interaction proteomics” (see Section 1.1), but are likely to be adapted in quite different ways to meet these distinct challenges. While it may be possible for example to spot many thousands of proteins for semiquantitative expression profiling applications, it is highly unlikely that high-quality kinetic analysis of protein interactions will be carried out on the same scale in the foreseeable future (or even that there would be a demand for such an application). The nature and size of protein microarrays may vary greatly therefore, depending on the intended application. Currently, it is possible to simultaneously analyze a maximum of four immobilized proteins using Biacore’s SPR systems and this can be viewed as a very small-scale microarray system. The strengths of this technology are largely targeted towards producing high-quality, information-rich data (such
References
as high resolution kinetics) and this may position it more towards the high-data content, rather than ultra-high throughput end of the protein array scale. It is important to bear in mind that each data “spot” in a Biacore assay produces a real-time analysis of its interaction with the analyte over the entire course of binding and dissociation. In high-resolution kinetics analysis, this may involve SPR measurements taken at a rate of up to 10 s−1 over the course of many minutes. The amount of data produced by each interaction monitored is therefore very high and while this is one of the cornerstones of the success of Biacore’s technology in a broad range of different applications, it is something that must be considered in the context of larger array situations. As was experienced by the pioneers of large-scale DNA arrays, it may be that data analysis will provide a more important practical limitation to the scope of protein arrays than the developments in technological hardware which are involved. The technical feasibility of using SPR technology to detect interactions in a larger scale array format was announced to the media by Biacore several years ago however, and it is likely that future developments will lead to SPR array systems that are tailored in a format that is appropriate for useful applications within proteomics research, as well as in other applications in life science research and drug development. Precisely where the final balance between array size and data/information content will fall remains to be seen for all the technologies that may contribute to the development of protein microarrays, but this may well vary considerably among them and may in the end be determined more forcefully by application requirements rather than by what may be technically possible to achieve.
References 1 LIEDBERG, B., NYLANDER, C., and LUNDSTRO¨ M, I. Surface plasmon resonance for gas detection and biosensing. Sensors Actuators. 1983, 4, 299–304. 2 VENTER, J. C., et al. The sequence of the human genome. Science 2001, 291, 1304–1351. 3 BRETT, D., POSPISIL, H., VALCARCEL, J., REICH, J., and BORK, P. Alternative splicing and genome complexity. Nature Genet. 2002, 30, 29–30. 4 BANKS, R. E., DUNN, M. J., HOCHSTRASSER, D. F., SANCHEZ, J. C., BLACKSTOCK, W., PAPPIN, D. J., and SELBY, P. J. Proteomics: new perspectives, new biomedical opportunities. Lancet 2000, 356, 1749–1756. 5 PARK, S., MYSZKA, D. G., YU, M., LITTLER, S. J., and LAIRD-OFFRINGA, I. A. HuD RNA recognition motifs play distinct
roles in the formation of a stable complex with AU-rich RNA. Mol. Cell Biol. 2000, 20, 4765–4772. 6 KATSAMBA, P. S., MYSZKA, D. G., and LAIRD-OFFRINGA, I. A. Two functionally distinct steps mediate high affinity binding of U1A protein to U1 hairpin II RNA. J. Biol. Chem. 2001, 276, 21476–21481. 7 De GENST, E., ARESKOUG, D., DECANNIERE, K., MUYLDERMANS, S., and ANDERSSON, K. Kinetic and affinity predictions of a protein–protein interaction using multivariate experimental design. J. Biol. Chem. 2002, 277, 29897–29907. 8 CHOULIER, L., ANDERSSON, K., HAMALAINEN, M. D., Van REGENMORTEL, M. H., MALMQVIST, M., and ALTSCHUH, D. QSAR studies applied to the prediction of antigen–antibody interaction kinetics as measured by
43
44 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
9
10
11
12
13
14
15
16
BIACORE. Prot. Eng. 2002, 15, 373– 382. KRETSCHMANN, E. Die Bestimmung optischer Konstanten von Mtallen durch Anregung von Oberfl¨achen plasmonenschwingungen. Zeitschrift Physik. 1971, 241, 313–324. L¨ofas, ˚ S., and JOHNSSON, B. A novel hydrogel matrix on gold surfaces in surface plasmon resonance sensors for fast and efficient covalent immobilization of ligands. J. Chem. Soc. Chem. Comm. 1990, 21, 1526–1528. JOHNSSON, B., and L¨OFAS ˚ , S. Immobilization of proteins to a carboxymethyldextran modified gold surface for biospecific interaction analysis in surface plasmon resonance. Anal. Biochem. 1991, 198, 268–277. JOHNSSON, B., L¨OFAS ˚ , S., LINDQUIST, G., EDSTRO¨ M, Å., M¨ULLER HILLGREN, R.-M., and HANSSON, A. Comparison of methods for immobilization to carboxymethyl dextran sensor surfaces by analysis of the specific activity of monoclonal antibodies. J. Mol. Recog. 1995, 8, 125–131. TOKARSKA-SCHLATTNER, M., WALLIMANN, T., and SCHLATTNER, U. Multiple interference of anthracyclines with mitochondrial creatine kinases: preferential damage of the cardiac isoenzyme and its implications for drug cardiotoxicity. Mol. Pharmacol. 2002, 61, 516–523. CALDWELL, E. E., ANDREASEN, A. M., BLIETZ, M. A., SERRAHN, J. N., VANDERNOOT, V., PARK, Y., YU, G., LINHARDT, R. J., and WEILER, J. M. Heparin binding and augmentation of C1 inhibitor activity. Arch. Biochem. Biophys. 1999, 361, 215–222. NIEBA, L., NIEBA-AXMANN, S. E., PERSSON, A., H¨AMA¨ LA¨ INEN, M., EDEBRATT, F., HANSSON, A., LIDHOLM, J., MAGNUSSON, K., KARLSSON, Å.F., and PLU¨ CKTHUN, A. BIACORE analysis of histidine-tagged proteins using a chelating NTA sensor chip. Anal. Biochem. 1997, 252, 217–228. MACKENZIE, C. R., HIRAMA, T., LEE, K. K., ALTMAN, E., and YOUNG, N. M. Quantitative analysis of bacterial toxin affinity and specificity for glycolipid
17
18
19
20
21
22
23 24
25
26
receptors by surface plasmon resonance. J. Biol. Chem. 1997, 272, 5533–5538. SEVIN-LANDAIS, A., RIGLER, P., TZARTOS, S., HUCHO, F., HOVIUS, R., and VOGEL, H. Functional immobilisation of the nicotinic acetylcholine receptor in tethered lipid membranes. Biophys. Chem. 2000, 85, 141–152. KARLSSON, O. P., and LOFAS, S. Flow-mediated on-surface reconstitution of g-protein coupled receptors for applications in surface plasmon resonance biosensors. Anal. Biochem. 2002, 300, 132–138. Cooper, M. A. OPTICAL biosensors in drug discovery. Nature Rev. Drug Discov. 2002, 1, 515–528. SJO¨ LANDER, S., and URBANICZKY, C. Integrated fluid handling system for biomolecular interaction analysis. Anal. Chem. 1991, 63, 2338–2345. STENBERG, E., PERSSON, B., ROOS, H., and URBANICZKY, C. Quantitative determination of surface concentration of protein with surface plasmon resonance by using radiolabeled proteins. Coll. Interface Sci. 1991, 143, 513–526. KATSAMBA, P. S., PARK, S., and LAIRD-OFFRINGA, I. A. Kinetic studies of RNA–protein interactions using surface plasmon resonance. Methods 2002, 26, 95–104. Myszka, D. G. IMPROVING biosensor analysis. J. Mol. Recognit. 1999, 12, 1–6. KARLSSON, R., KULLMAN-MAGNUSSON, M., HAMALAINEN, M. D., REMAEUS, A., ANDERSSON, K., BORG, P., GYZANDER, E., and DEINUM, J. Biosensor analysis of drug–target interactions: direct and competitive binding assays for investigation of interactions between thrombin and thrombin inhibitors. Anal. Biochem. 2000, 278, 1–13. DAY, Y. S., BAIRD, C. L., RICH, R. L., and MYSZKA, D. G. Direct comparison of binding equilibrium, thermodynamic, and rate constants determined by surface- and solution-based biophysical methods. Prot. Sci. 2002, 11, 1017–1025. MYSZKA, D. G., MORTON, T. A., DOYLE, M. L., and CHAIKEN, I. M. Kinetic analysis of a protein antigen antibody
References
27
28
29
30
31
32
33
34
interaction limited by mass transport on an optical biosensor. Biophys. Chem. 1997, 64, 127–137. MATSUDA, H. Theory of the steady-state current potential curves of redox electrode reactions in hydrodynamic voltammetry. II Laminar pipe- and channel flow. J. Electroanal. Chem. 1967, 15, 325–336. MYSZKA, D. G., HE, X., DEMBO, M., MORTON, T. A., and GOLDSTEIN, B. Extending the range of rate constants available from BIACORE: Interpreting mass transport-influenced binding data. Biophys. J. 1998, 75, 583–594. SIKAVITSAS, V., NITSCHE, J. M., and MOUNTZIARIS, T. J. Transport and kinetic processes underlying biomolecular interactions in the BIACORE optical biosensor. Biotechnol Prog. 2002, 18, 885–897. RODEN, L. D., and MYSZKA, D. G. Global analysis of a macromolecular interaction measured on BIAcore. Biochem. Biophys. Res. Comm. 1996, 225, 1073–1077. PARK, B. W., ZHANG, H. T., WU, C., BEREZOV, A., ZHANG, X., DUA, R., WANG, Q., KAO, G., O’ROURKE, D. M., GREENE, M. I., and MURALI, R. Rationally designed anti-HER2/neu peptide mimetic disables P185HER2/neu tyrosine kinases in vitro and in vivo. Nature Biotechnol. 2000, 18, 194–198. BEREZOV, A., ZHANG, H. T., GREENE, M. I., and MURALI, R. Disabling erbB receptors with rationally designed exocyclic mimetics of antibodies: structure–function analysis. J. Med. Chem. 2001, 44, 2565–2574. BEREZOV, A., CHEN, J., LIU, Q., ZHANG, H. T., GREENE, M. I., and MURALI, R. Disabling receptor ensembles with rationally designed interface peptidomimetics. J Biol Chem. 2002, 277, 28330–28339. LOUKINOVA, E., RANGANATHAN, S., KUZNETSOV, S., GORLATOVA, N., MIGLIORINI, M. M., LOUKINOV, D., ULERY, P. G., MIKHAILENKO, I., LAWRENCE, D. A., and STRICKLAND, D. K. Platelet-derived growth factor (PDGF)-induced tyrosine phosphorylation of the low density lipoprotein receptor-related protein
35
36
37
38
39
40
41
(LRP). Evidence for integrated co-receptor function betwenn LRP and the PDGF. J. Biol. Chem. 2002, 277, 15499–15506. ROONEY, I. A., BUTROVICH, K. D., GLASS, A. A., BORBOROGLU, S., BENEDICT, C. A., WHITBECK, J. C., COHEN, G. H., EISENBERG, R. J., and WARE, C. F. The lymphotoxin-beta receptor is necessary and sufficient for LIGHT-mediated apoptosis of tumor cells. J. Biol. Chem. 2000, 275, 14307–14315. STEPLEWSKI, Z., LUBECK, M. D., and KOPROWSKI, H. Human macrophages armed with murine immunoglobulin G2a antibodies to tumors destroy human cancer cells. Science 1983, 221, 865–867. VALONE, F. H., KAUFMAN, P. A., GUYRE, P. M., LEWIS, L. D., MEMOLI, V., DEO, Y., GRAZIANO, R., FISHER, J. L., MEYER, L., MROZEK-ORLOWSKI, M. et al. Phase Ia/Ib trial of bispecific antibody MDX-210 in patients with advanced breast or ovarian cancer that overexpresses the proto-oncogene HER-2/neu. J. Clin. Oncol. 1995, 13, 2281–2292. WEINER, L. M., CLARK, J. I., RING, D. B., and ALPAUGH, R. K. Clinical development of 2B1, a bispecific murine monoclonal antibody targeting c-erbB-2 and Fc gamma RIII. J. Hematother. 1995, 4, 453–456. WEINER, L. M., ALPAUGH, R. K., AMOROSO, A. R., ADAMS, G. P., RING, D. B., and BARTH, M. W. Human neutrophil interactions of a bispecific monoclonal antibody targeting tumor and human Fc gamma RIII. Cancer Immunol. Immunother. 1996, 42, 141–150. MCCALL, A. M., ADAMS, G. P., AMOROSO, A. R., NIELSEN, U. B., ZHANG, L., HORAK, E., SIMMONS, H., SCHIER, R., MARKS, J. D., and WEINER, L. M. Isolation and characterization of an anti-CD16 single-chain Fv fragment and construction of an anti-HER2/neu/anti-CD16 bispecific scFv that triggers CD16-dependent tumor cytolysis. Mol. Immunol. 1999, 36, 433–445. YOKOTA, T., MILENIC, D. E., WHITLOW, M., and SCHLOM, J. Rapid tumor penetration of a single-chain Fv and
45
46 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
42
43
44
45
46
47
48
49
comparison with other immunoglobulin forms. Cancer Res. 1992, 52, 3402–3408. CASEY, J. L., KEEP, P. A., CHESTER, K. A., ROBSON, L., HAWKINS, R. E., and BEGENT, R. H. Purification of bacterially expressed single chain Fv antibodies for clinical applications using metal chelate chromatography. J. Immunol. Methods 1995, 179, 105–116. BEGENT, R. H., VERHAAR, M. J., CHESTER, K. A., CASEY, J. L., GREEN, A. J., NAPIER, M. P., HOPE-STONE, L. D., CUSHEN, N., KEEP, P. A., JOHNSON, C. J., HAWKINS, R. E., HILSON, A. J., and ROBSON, L. Clinical evidence of efficient tumor targeting based on single-chain Fv antibody selected from a combinatorial library. Nature Med. 1996, 2, 979–984. NERI, D., CARNEMOLLA, B., NISSIM, A., LEPRINI, A., QUERZE, G., BALZA, E., PINI, A., TARLI, L., HALIN, C., NERI, P., ZARDI, L., and WINTER, G. Targeting by affinity-matured recombinant antibody fragments of an angiogenesis associated fibronectin isoform. Nature Biotechnol. 1997, 15, 1271–1275. BADER, B., KUHN, K., OWEN, D. J., WALDMANN, H., WITTINGHOFER, A., and KUHLMANN, J. Bioorganic synthesis of lipid-modified proteins for the study of signal transduction. Nature 2000, 403, 223–226. HUANG, X., GOTTSTEIN, C., BREKKEN, R. A., and THORPE, P. E. Expression of soluble VEGF receptor 2 and characterization of its binding by surface plasmon resonance. Biochem. Biophys. Res. Commun. 1998, 252, 643–648. KNIBIEHLER, M., GOUBIN, F., ESCALAS, N., , Z. O., MAZARGUIL, H., JONSSON ´ H¨UBSCHER, U., and DUCOMMUN, B. Interaction studies between the p21Cip1/Waf1 cyclin-dependent kinase inhibitor and proliferating cell nuclear antigen (PCNA) by surface plasmon resonance. FEBS Lett. 1996, 391, 66–70. KANAOKA, Y., KIMURA, S. H., OKAZAKI, I., IKEDA, M., and NOJIMA, H. GAK: a cyclin G associated kinase contains a tensin/auxilin-like domain. FEBS Lett. 1997, 402, 73–80. SHIN, S., SUNG, B. J., CHO, Y. S., KIM, H. J., HA, N. C., HWANG, J. I., CHUNG, C.
50
51
52
53
54
55
56
W., JUNG, Y. K., and OH, B. H. An anti-apoptotic protein human survivin is a direct inhibitor of cas-pase-3 and -7. Biochemistry 2001, 40, 1117–1123. HEIDER, K. H., SPROLL, M., SUSANI, S., PATZELT, E., BEAUMIER, P., OSTERMANN, E., AHORN, H., and ADOLF, G. R. Characterization of a high-affinity monoclonal antibody specific for CD44v6 as candidate for immunotherapy of squamous cell carcinomas. Cancer Immunol. Immunother. 1996, 43, 245–253. Van HAL, N. L., Van DONGEN, G. A., TEN BRINK, C. B., HEIDER, K. H., RECH-WEICHSELBRAUN, I., SNOW, G. B., and BRAKENHOFF, R. H. Evaluation of soluble CD44v6 as a potential serum marker for head and neck squamous cell carcinoma. Clin. Cancer Res. 1999, 5, 3534–3541. VEREL, I., HEIDER, K. H., SIEGMUND, M., OSTERMANN, E., PATZELT, E., SPROLL, M., SNOW, G. B., ADOLF, G. R., and Van DONGEN, G. A. Tumor targeting properties of monoclonal antibodies with different affinity for target antigen CD44V6 in nude mice bearing head-and-neck cancer xenografts. Int. J. Cancer 2002, 99, 396–402. RITTER, G., COHEN, L. S., WILLIAMS, C., Jr., RICHARDS, E. C., OLD, L. J., and WELT, S. Serological analysis of human anti-human antibody responses in colon cancer patients treated with repeated doses of humanized monoclonal antibody a33. Cancer Res. 2001, 61, 6851–6859. ALAEDINI, A., and LATOV, N. A surface plasmon resonance biosensor assay for measurement of anti-GM(1) antibodies in neuropathy. Neurology 2001, 56, 855–860. LILJEBLAD, M., RYDEN, I., OHLSON, S., LUNDBLAD, A., and PAHLSSON, P. A lectin immunosensor technique for determination of alpha(1)-acid glycoprotein fucosylation. Anal. Biochem. 2001, 288, 216–224. EIVAZOVA, E. R., MCDONNELL, J. M., SUTTON, B. J., and STAINES, N. A. Specificity and binding kinetics of murine lupus anti-DNA monoclonal
References
57
58
59
60
61
62
63
64
antibodies implicate different stimuli for their production (In Process Citation). Immunology 2000, 101, 371–377. GUERMAZI, S., REGNAULT, V., GORGI, Y., AYED, K., LECOMPTE, T., and DELLAGI, K. Further evidence for the presence of anti-protein S autoantibodies in patients with systemic lupus erythematosus (In Process Citation). Blood Coagul. Fibrinolysis. 2000, 11, 491–498. JONES, D. W., NICHOLLS, P. J., DONOHOE, S., GALLIMORE, M. J., and WINTER, M. Antibodies to factor XII are distinct from antibodies to prothrombin in patients with the anti-phospholipid syndrome. Thromb. Haemost. 2002, 87, 426–430. NEDELKOV, D., and NELSON, R. W. Analysis of human urine protein biomarkers via biomolecular interaction analysis mass spectrometry. Am. J Kidney Dis. 2001, 38, 481–487. BOHRMANN, B., TJERNBERG, L., KUNER, P., POLI, S., LEVET-TRAFIT, B., NASLUND, J., RICHARDS, G., HUBER, W., DOBELI, H., and NORDSTEDT, C. Endogenous proteins controlling amyloid beta-peptide polymerization. Possible implications for beta-amyloid formation in the central nervous system and in peripheral tissues. J. Biol. Chem. 1999, 274, 15990–15995. VORUM, H., JACOBSEN, C., and HONORE, B. Calumenin interacts with serum amyloid P component. FEBS Lett. 2000, 465, 129–134. LEFEBVRE, S., BURGLEN, L., REBOULLET, S., CLERMONT, O., BURLET, P., VIOLLET, L., BENICHOU, B., CRUAUD, C., MILLASSEAU, P., ZEVIANI, M. et al. Identification and characterization of a spinal muscular atrophy-determining gene. Cell 1995, 80, 155–165. MONANI, U. R., LORSON, C. L., PARSONS, D. W., PRIOR, T. W., ANDROPHY, E. J., BURGHES, A. H., and MCPHERSON, J. D. A single nucleotide difference that alters splicing patterns distinguishes the SMA gene SMN1 from the copy gene SMN2. Hum. Mol. Genet. 1999, 8, 1177–1183. YOUNG, P. J., MAN, N., LORSON, C. L., LE, T. T., ANDROPHY, E. J., BURGHES, A. H., and MORRIS, G. E. The exon 2b region of the spinal muscular atrophy protein,
65
66
67
68
69
70
71
72
73
SMN, is involved in self-association and SIP1 binding. Hum. Mol. Genet. 2000, 9, 2869–2877. ARIGA, T., and YU, R. K. GM1 inhibits amyloid beta-protein-induced cytokine release. Neurochem Res. 1999, 24, 219–226. CATIMEL, B., SCOTT, A. M., LEE, F. T., HANAI, N., RITTER, G., WELT, S., OLD, L. J., BURGESS, A. W., and NICE, E. C. Direct immobilization of gangliosides onto gold-carboxymethyldextran sensor surfaces by hydrophobic interaction: applications to antibody characterization. Glycobiology 1998, 8, 927–938. NAKAMURA, K., TANAKA, Y., SHITARA, K., and HANAI, N. Construction of humanized anti-ganglioside monoclonal antibodies with potent immune effector functions. Cancer Immunol. Immunother. 2001, 50, 275–284. Snider, W. D. FUNCTIONS of the neurotrophins during nervous system development: what the knockouts are teaching us. Cell 1994, 77, 627–638. IP, N. Y., and YANCOPOULOS, G. D. The neurotrophins and CNTF: two families of collaborative neurotrophic factors. Annu. Rev. Neurosci. 1996, 19, 491–515. LI, S.-C., LAI, K.-M. V., GISH, G. D., PARRIS, W. E., Van der GEER, P., FORMAN-KAY, J., and PAWSON, T. Characterization of the phospho-tyrosine-binding domain of the Drosophila Shc protein. J. Biol. Chem. 1996, 271, 31855–31862. WOO, S. B., WHALEN, C., and NEET, K. E. Characterization of the recombinant extracellular domain of the neurotrophin receptor TrkA and its interaction with nerve growth factor (NGF). Prot. Sci. 1998, 7, 1006–1016. ROSS, G. M., SHAMOVSKY, I. L., WOO, S. B., POST, J. I., VRKLJAN, P. N., LAWRANCE, G., SOLC, M., DOSTALER, S. M., NEET, K. E., and RIOPELLE, R. J. The binding of zinc and copper ions to nerve growth factor is differentially affected by pH: implications for cerebral acidosis. J. Neurochem. 2001, 78, 515–523. HOLCOMB, I. N., et al. FIZZ1, a novel cysteine-rich secreted protein associated with pulmonary inflammation, defines a
47
48 Real-time Characterization of Biomolecular Interactions using Biacore’s Optical Biosensors
74
75
76
77
78
79
80
81
new gene family. EMBO J. 2000, 19, 4046–4055. CAIRO, C. W., STRZELEC, A., MURPHY, R. M., and KIESSLING, L. L. Affinity-based inhibition of beta-amyloid toxicity. Biochemistry 2002, 41, 8620–8629. YAMAZAKI, M., MORI, H., ARAKI, K., MORI, K. J., and MISHINA, M. Cloning, expression and modulation of a mouse NMDA receptor sub-unit. FEBS Lett. 1992, 300, 39–45. LOMELI, H., SPRENGEL, R., LAURIE, D. J., KOHR, G., HERB, A., SEEBURG, P. H., and WISDEN, W. The rat delta-1 and delta-2 subunits extend the excitatory amino acid receptor family. FEBS Lett. 1993, 315, 318–322. ARAKI, K., MEGURO, H., KUSHIYA, E., TAKAYAMA, C., INOUE, Y., and MISHINA, M. Selective expression of the glutamate receptor channel delta 2 subunit in cerebellar Purkinje cells. Biochem. Biophys. Res. Commun. 1993, 197, 1267–1276. MIYAGI, Y., YAMASHITA, T., FUKAYA, M., SONODA, T., OKUNO, T., YAMADA, K., WATANABE, M., NAGASHIMA, Y., AOKI, I., OKUDA, K., MISHINA, M., and KAWAMOTO, S. Delphilin: a novel PDZ and formin homology domain-containing protein that synaptically colocalizes and interacts with glutamate receptor delta 2 subunit. J. Neurosci. 2002, 22, 803–814. YOUNG, P. J., DAY, P. M., ZHOU, J., ANDROPHY, E. J., MORRIS, G. E. and LORSON, C. L. A direct interaction between the survival motor neuron protein and p53 and its relationship to spinal muscular atrophy. J. Biol. Chem. 2002, 277, 2852–2859. LAUGHTON, C. A., TANIOUS, F., NUNN, C. M., BOYKIN, D. W., WILSON, W. D., and NEIDLE, S. A crystallographic and spectroscopic study of the complex between d(CGCGAATTCGCG)2 and 2,5-bis(4-guanylphenyl)furan, an analogue of berenil. Structural origins of enhanced DNA-binding affinity. Biochemistry 1996, 35, 5655–5661. CARRASCO, C., FACOMPRE, M., CHISHOLM, J. D., Van VRANKEN, D. L., WILSON, W. D., and BAILLY, C. DNA sequence recognition by the indolo-carbazole
82
83
84
85
86
87
88
89
90
antitumor antibiotic AT2433-B1 and its diastereoisomer. Nucleic Acids Res. 2002, 30, 1774–1781. CARRASCO, C., VEZIN, H., WILSON, W. D., REN, J., CHAIRES, J. B., and BAILLY, C. DNA binding properties of the indolocarbazole antitumor drug NB-506. Anticancer Drug Des. 2001, 16, 99–107. WANG, L., BAILLY, C., KUMAR, A., DING, D., BAJIC, M., BOYKIN, D. W., and WILSON, W. D. Specific molecular recognition of mixed nucleic acid sequences: an aromatic dication that binds in the DNA minor groove as a dimer. Proc. Natl Acad. Sci. USA 2000, 97, 12–16. DAVIS, T. M., and WILSON, W. D. Determination of the refractive index increments of small molecules for correction of surface plasmon resonance data. Anal. Biochem. 2000, 284, 348–353. SANDO, S., SAITO, I., and NAKATANI, K. Scanning of guanine–guanine mismatches in DNA by synthetic ligands using surface plasmon resonance. Nature Biotechnol. 2001, 19, 51–55. Kwok, P. Y. REFLECTIONS on a DNA mutation scanning tool. Nature Biotechnol. 2001, 19, 18–19. LACY, E. R., LE, N. M., PRICE, C. A., LEE, M., and WILSON, W. D. Influence of a terminal formamido group on the sequence recognition of DNA by polyamides. J. Am. Chem. Soc. 2002, 124, 2153–2163. HICKMAN, D. T., KING, P. M., SLATER, J. M., COOPER, M. A., and MICKLEFIELD, J. Kinetically selective binding of single stranded RNA over DNA by a pyrrolidine-amide oligonucleotide mimic (POM). Nucleosides Nucleotides Nucleic Acids 2001, 20, 1169–1172. HOU, M. H., LIN, S. B., YUANN, J. M., LIN, W. C., WANG, A. H., and Kan LS, L. Effects of polyamines on the thermal stability and formation kinetics of DNA duplexes with abnormal structure. Nucleic Acids Res. 2001, 29, 5121– 5128. TSOI, P. Y., and YANG, M. Kinetic study of various binding modes between human DNA polymerase beta and different DNA substrates by
References
91
92
93
94
95
96
97
98
surface-plasmon-resonance biosensor. Biochem. J. 2002, 361, 317–325. SUMMERTON, J., and WELLER, D. Morpholino antisense oligomers: design, preparation, and properties. Antisense Nucleic Acid Drug Dev. 1997, 7, 187–195. MANG’ERA, K. O., LIU, G., YI, W., ZHANG, Y., LIU, N., GUPTA, S., RUSCKOWSKI, M., and HNATOWICH, D. J. Initial investigations of 99mTc-labeled morpholinos for radiopharmaceutical applications. Eur. J. Nucl. Med. 2001, 28, 1682–1689. ASTRIAB-FISHER, A., SERGUEEV, D., FISHER, M., SHAW, B. R., and JULIANO, R. L. Conjugates of antisense oligonucleotides with the Tat and antennapedia cell-penetrating peptides: effects on cellular uptake, binding to target sequences, and biologic actions. Pharm. Res. 2002, 19, 744–754. BATES, P., REDDOCH, J., HANSAKUL, P., ARROW, A., DALE, R., and MILLER, D. Biosensor detection of triplex formation by modified oligonucleotides. Anal. Biochem. 2002, 307, 235. OKUMOTO, Y., OHMICHI, T., and SUGIMOTO, N. Immobilized small deoxyribozyme to distinguish RNA secondary structures. Biochemistry 2002, 41, 2769–2773. NILSSON, P., PERSSON, B., UHLE´ N, M., and NYGREN, P.-Å. Real-time monitoring of DNA manipulations using biosensor technology. Anal. Biochem. 1995, 224, 400–408. STEINRUCKE, P., ALDINGER, U., HILL, O., HILLISCH, A., BASCH, R., and DIEKMANN, S. Design of helical proteins for real-time endoprotease assays. Anal. Biochem. 2000, 286, 26–34. LOEW, D., PERRAULT, C., MORALES, M., MOOG, S., RAVANAT, C., SCHUHLER, S., ARCONE, R., PIETROPAOLO, C., CAZENAVE, J. P., Van DORSSELAER, A., and LANZA, F.
99
100
101
102
103
104
105
Proteolysis of the exodomain of recombinant protease-activated receptors: prediction of receptor activation or inactivation by MALDI mass spectrometry. Biochemistry 2000, 39, 10812–10822. ZAKO, T., HARADA, K., MANNEN, T., YAMAGUCHI, S., KITAYAMA, A., UEDA, H., and NAGAMUNE, T. Monitoring of the refolding process for immobilized firefly luciferase with a biosensor based on surface plasmon resonance. J. Biochem. (Tokyo). 2001, 129, 1–4. HOUSEMAN, B. T., HUH, J. H., KRON, S. J., and MRKSICH, M. Peptide chips for the quantitative evaluation of protein kinase activity. Nature Biotechnol. 2002, 20, 270–274. KATSAMBA, P. S., BAYRAMYAN, M., HAWORTH, I. S., MYSZKA, D. G., and LAIRD-OFFRINGA, I. A. Complex role of the beta 2–beta 3 loop in the interaction of U1A with U1 hairpin II RNA. J. Biol. Chem. 2002, 277, 33267–33274. NELSON, R. W., NEDELKOV, D., and TUBBS, K. A. Biosensor chip mass spectrometry: a chip-based proteomics approach. Electrophoresis 2000, 21, 1155–1163. KRONE, J. R., NELSON, R. W., DOGRUEL, D., WILLIAMS, P., and GRANZOW, R. BIA/MS: interfacing biomolecular interaction analysis with mass spectrometry. Anal. Biochem. 1997, 244, 124–132. S¨ONKSEN, C. P., NORDHOFF, E., JANSSON, ¨ MALMQVIST, M., and ROEPSTORFF, P. O., Combining MALDI mass spectrometry and biomolecular interaction analysis using a biomolecular interaction analysis instrument. Anal. Chem. 1998, 70, 2731–2736. WILLIAMS, I., and ADDONA, T. A. The integration of SPR biosensors with mass spectrometry: possible applications for proteome analysis (In Process Citation). Trends Biotechnol. 2000, 18, 45–48.
49
1
Cellular Assays and Their Application in Drug Discovery
Hugo Albrecht, Daniela Brodbeck-Hummel, Michael Hoever, Beatrice Nickel, and Urs Regenass Discovery Partners International AG, Allschwil, Switzerland
Originally published in: Molecular Biology in Medicinal Chemistry. Edited by Theodor Dingermann, Dieter Steinhilber and Gerd Folkers. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30431-8
1 Introduction 1.1 Positioning Cellular Assays
Cell-based assays allow to study the function of pharmaceutical or disease targets within complex environments and in the overall biological context. The application of techniques in molecular biology together with the introduction of new tools and analytical devices has moved readouts of cellular assays from phenotypic endpoints such as cell proliferation, cell death, respiration and functional differentiation to the cellular analysis of functional states of specific signaling molecules and metabolic components. This transition of cellular assays from “black box” systems to specific targetassociated, mechanistic measurements allows the identification of chemical compounds with target-specific or closely associated modulatory function. It can therefore be assumed that cellular assays will play an increasingly important role in early drug discovery, in particular in discovering compounds for target validation (“chemical genetics”; [1] for review) and in hit-to-lead selection as well as in lead optimization. In contrast to biochemical testing of chemicals for modulatory activity on molecular targets, cellular systems can deliver additional information with respect to drug transport, metabolism and stability. Most importantly, pharmacological targets remain in their natural environment when probed for modulation and data should therefore have a higher predictive value (Fig. 1).
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Cellular Assays and Their Application in Drug Discovery
Tight target coupled and phenotypic cellular assays
Tight target coupled cellular assays
Identification of potential targets
Assay Development
Hit Finding
Lead Finding
Target validation Disease linkage, Functional studies, Chemogenomics
Lead driven Target selection Lead Optimization Drug Candidate Selection
Fig. 1 Chemistry-driven target selection. The increased number of potential pharmaceutical targets emerging from the functional annotation of the human genome requires an efficient process to select those that are valid for modulating diseaserelevant pathways and phenotypes. Cellular assays which are tightly coupled in the readout with the target of interest allow us
•Tight target coupled assays •Phenotypic assays •Loose target coupled or uncoupled assays: Systems effects •Intracellular pharmacology to qualify the hits identified in cellular or biochemical high-throughput assays with respect to functionality in a physiologically relevant environment. Cellular assays which are not tightly coupled to the target will indicate system effects of compounds and can be interpreted as a cellular “safety” classification. Cellular assays can be used for target and compound prioritization.
However, it remains to be said that, in order to use cellular assays in high throughput, cells need to be taken out of their natural habitat and put into culture. This will inevitably lead to alterations in the repertoire of gene and protein expression, and therefore in their phenotype. Cells used in assays should therefore be characterized with respect to the functionality of their intracellular networks, responses to external stimuli, at least for the signaling pathway of interest, and compared with responses and signaling events at the tissue or organ level. In many cases this is unfortunately not yet fully possible. 1.2 Impact on Drug Discovery
Many compounds are lost during the discovery and development process due to undesirable effects or toxicity and lack of required efficacy. A large part of these effects can be related to interactions of the potential drug with targets other than the
1 Introduction 3
one initially selected to modulate disease. Cellular test systems have the potential to elucidate the multiplicity of interactions of drugs, and therefore contribute to a better understanding of drug action and drug selection. The cell is organized in metabolic and signal transduction cascades. Most of these cascades are complex and nonlinear [2]. In addition, different cascades communicate with each other, and form intracellular circuits and domains. Most diseases cannot be explained by one altered step in a pathway, but are due to several alterations that affect networks in a complex way [3, 4]. Similarly, drugs, whether they interact specifically with one single target or with several molecules, will disturb signaling and metabolic cascades in a complex fashion. In addition, since many cell types make use of the same types of signaling molecules, in a different context, many cell types might be affected to different extents by the same drug. The recent introduction of high-content screening, high-throughput gene and protein expression, as well as metabolite analysis [5, 6], offers the possibility to study system interactions in a multiparallel fashion, and allows a better understanding of the degree and location of interference. Cell-based assays therefore provide a solution to study the modulation of a pharmaceutical target within its complex natural network as well as to assess potential effects on other networks. Human diseases are characterized by one or several alterations in metabolic and signaling pathways important to maintain cellular homeostasis and the differentiated functions. Cellular assays can offer a representation of such alterations and therefore become relevant functional models to assess drug action. Nowadays, cellular assays offer the opportunity to identify the intervention points that lead to phenotypic or endpoint alterations. Often, multiple intervention sites are necessary to alter the outcome of biological networks. Similar rules might apply for the number of drug targets that need to be modulated; in particular, for complex diseases. The right targets and drug combinations can be selected only in the case where each single component of a particular pathway can be individually studied, and this is only rendered possible by the use of cellular systems. The combination of compound that lead to the desired endpoint can be identified by analyzing multiple parameters. The analysis can, moreover, facilitate the assessment of the effect of individual compounds on multiple pathways. This can be taken as a measure for “unwanted effects” (safety profile), but can also lead to the identification of applications in new indications. The analysis of targeted and focused chemical libraries in cellular systems may allow a rapid selection of compounds with various and desired profiles. Hierarchical, spatio-temporal readouts will generate system-level insight into mechanisms of drug action (intracellular pharmacology) and allow us to predict systemic effects (Fig. 2). The development and application of computational tools as well as models of intracellular pathways are a prerequisite to secure data evaluation and interpretation [7, 8]. System-level understanding is only possible by seeking experimental results at the cellular level or beyond. The recent developments of cellular high-throughput systems, which allow specific measurements of molecular events, render applications in primary screening, and in hit-to-lead selection and lead optimization, in particular, possible. This will
4 Cellular Assays and Their Application in Drug Discovery
Extracellular space
• Receptors • Transporters • Ion channels
•pH •pO2
Legend Receptor ligands Phosphorylation Ions, metabolites Translocstions
Cytoplasm
Proteins
with Organelles
Proteins
Sensors fast response
m RNA
medium response slow response
Nucleus with DNA and transcription machinery
+++
+++
Fig. 2 Schematic cellular circuit and cell systems analysis concept. Cells connect to their environment with a wealth of receptors, channels and transporters. Intracellular signaling and metabolic pathways and networks react to intra- and extracellular stimuli in a spatio-temporal manner. Different cell types can react differently to the same stimuli. It can be envisaged that different cell types, cells from normal or pathologi-
cal tissue, or genetically manipulated cells, can be exposed to different stimuli with and without drugs and the cellular reaction can be assessed by sensor systems in a spatiotemporal fashion, from fast responses to phenotypic changes. Multiparametric analysis will identify compounds with targetspecific, target-unrelated, reversible or irreversible effects on cellular homeostasis.
enable the development of structure–activity relationships at the system level as a new paradigm in compound selection and decision-making. 1.3 Classification of Cellular Assays
Cellular assays have the advantage that the pharmacological targets do not need to be purified. However, in many instances, cellular assays require cell manipulation to allow for a specific signal readout. Overexpression of cell-surface receptors or intracellular proteins can have a substantial effect on the cell physiology and have to be taken into account when using the systems for the discovery of novel drugs. Early reactions to hormones, growth factors and neurotransmitters are often mediated via second-messenger systems or rapid influx and efflux of ions. Posttranslational protein modifications and translocation of proteins are the next steps that alter the physiological state of cells. Such changes are able to lead to flow changes in metabolic pathways, and to alterations in gene and protein expression, ultimately leading to new cellular phenotypes, including cell proliferation and cell death.
1 Introduction 5 Table 1 Classification of functional cellular assay types (see text for references)
Second messengers and channels
Protein dynamics
Gene expression
Protein expression
Metabolites profiling Metabolic and phenotypic responses
Ca2+ levels ion channels transporters cyclic nucleotides posttranslational modifications (e.g. phosphorylation) protein translocations protein–protein interaction chip technologies branched DNA technology oligonucleotide sensors reporter gene technology ELISA assays protein chip arrays quantitative analytical profiling cell proliferation, cytotoxicity, cell death/apoptosis respiration, acidification cell differentiation
Cellular assays can be classified by the temporal events occurring when a cell is exposed to alterations in the extra- and intracellular environment (Tab. 1). Fast responses indicated by changes in ion or second-messenger concentrations ultimately lead to functional and phenotypic changes. Some of the cellular events can be recapitulated in so-called model systems, which might be easier to handle and to manipulate genetically than mammalian cells. Yeast has turned out to be an organism of choice, where metabolic signals and protein–protein interactions relevant for signal transduction can be translated into easily measurable growth response (for review, see [9, 10]. Several modifications of the yeast two-hybrid system have shown utility in finding compounds that functionally interfere with specific intracellular mechanisms. Most recently, a sensor system for drug discovery based on conformational activation of a proliferationrelevant enzyme upon drug binding has been described [11]. However, compared to mammalian cells, the yeast cell has some trade-offs, which include the cell wall, lack of the full complement of mammalian genes and alterations in its metabolic systems. The use of yeast two-hybrid and yeast “tribrid” systems [12] are not discussed further in this article. 1.4 Progress in Tools and Technologies for Cellular Compound Profiling
The refinement of analytical devices [13] and reader systems today enables investigators to obtain broad profiles of alterations in gene and protein expression [14] as well as metabolites [5, 15]. Unless a complete system-wide analysis is required, the present technologies allow a reasonable throughput for drug profiling. For gene expression studies, reporter gene assays can substitute for chip-based arrays, at least in selected cases [16, 17].
6 Cellular Assays and Their Application in Drug Discovery
Several cellular imaging or high-throughput microscopy and laser-scanning systems are now becoming available together with the development of new fluorescent tags and specific antibodies. This combination allows us to study protein movements in cells. The study of protein dynamics and interactions has been largely facilitated by the identification of fluorescent proteins [6, 18, 19] and other fluorescent dyes [20] in combination with the development of fluorescence resonance energy transfer (FRET) applications. Other technologies such as multipole coupling spectroscopy (MCS) [21] and biochips for multiparametric cell monitoring [22] are just becoming available. The FLIPRTM (FLuorometric Imaging Plate Reader; Molecular Devices, Sunnyvale, CA) system has been available for several years to study rapid cellular second-messenger responses, such as Ca2+ release, and has been recently upgraded for the analysis of ion channels. The FLIPR applications are discussed in detail below. In the following sections, we will describe a selected set of applications in more detail. The discussion is grouped according to Tab. 1 into fast responses or secondmessenger systems, protein dynamics, alterations in gene and protein expression, and metabolic as well as phenotypic readouts.
2 Membrane Proteins and Fast Cellular Responses
A great variety of membrane proteins have been expressed by using recombinant DNA technology in different eukaryotic cells (e.g. mammalian cell lines, insect cells and yeast) for functional studies of cell-surface receptors, ion channels or transport proteins. 2.1 Receptors
Cell-surface receptors typically mediate cell signaling from the cell surface to the cytoplasm and/or nucleus involving two major principles, i.e. signaling via enzymelinked receptors or trimeric GTP-binding proteins (G-proteins). The enzymelinked receptors include the receptor guanylyl cyclases, receptor tyrosine kinases, tyrosine-kinase-associated receptors, receptor tyrosine phosphatases and receptor serine/threonine kinases [23]. In all cases, a cascade of events is initiated upon activation, which alters receptor status and the concentration of one or more small intracellular signaling molecules. Here, we will discuss assays monitoring signals mediated by G-proteins, which are activated by G-protein-coupled receptors (GPCRs) upon binding of a specific ligand. Subsequent to stimulation, changes of intracellular cyclic AMP (cAMP) or Ca2+ concentrations are induced. These two signaling molecules are known to be involved in regulatory processes of the cardiovascular and nervous systems, immune mechanisms, cell growth and differentiation, and general metabolism. Both messengers act as allosteric effectors on
2 Membrane Proteins and Fast Cellular Responses
specific target proteins in the cell including, but not limited to, kinases (e.g. protein kinase A and C), lipases (e.g. phospholipase Cβ), Ca2+ -binding proteins (e.g. calmodulin) and ion channels (e.g. ionotropic glutamate receptors). The cytoplasmic concentrations of the second messengers are regulated via plasma membrane enzymes – in the cAMP pathway the enzyme adenylate cyclase directly produces cAMP and in the Ca2+ pathway lipases are stimulated to produce the soluble messenger inositol triphosphate (IP3 ), which in turn induces the release of Ca2+ from the endoplasmic reticulum. In some cases, ion channels are activated by direct interaction with G-protein subunits. Different technologies have been developed for intracellular monitoring of second messengers in living cells. Detection of Ca2+ is generally based on fluorescent indicators, which have the property of being able to easily penetrate cell membranes and alter their fluorescent emission when bound to Ca2+ . For measurement of a typical fast and transient response, the binding reaction to Ca2+ must be reversible, therefore providing important kinetic data which allow us to distinguish between a typically transient signal due to specific receptor activation and nonspecific signals affecting Ca2+ homeostasis or membrane permeability. There are currently many different systems available to detect cAMP in cellbased assays. In most cases cells are lysed after exposure to a specific ligand and the amount of cAMP is determined with a competitive immunoassay (PerkinElmer Life Sciences, Boston, MA; Molecular Devices; Biomol Research Laboratories, Plymouth Meeting, CA) or a competitive enzyme complementation assay (DiscoveRx, Fremont, CA). Typically, GPCRs are overexpressed in eukaryotic cell lines to monitor the concentration of second messengers upon stimulation with a specific ligand [24]. In many cases, the co-expression of a suitable G-protein subunit or chimeric G-proteins is necessary in order to elicit a signal sufficient for detection [25, 26].
2.1.1 FLIPR Technology for Detection of Intracellular Calcium Release The FLIPR system provides readouts for high-throughput, cell-based assays that represent information-rich kinetic data in 96- or 384-well format. In brief, cells are loaded with a fluorescent indicator (e.g. Fluo-4 AM for Ca2+ ), which shows an absorption spectrum compatible with excitation at 488 nm with an argon-ion laser source, leading to maximum emission at 516 nm in the presence of Ca2+ . Fluorescence emission is induced simultaneously in all wells of a 96- or 384-well plate and kinetics of intracellular Ca2+ release, triggered by exposure to a specific ligand, can be monitored at a maximum of 60 sequential measurements per minute. The FLIPR system rapidly discriminates between full agonists, partial agonists and antagonists to accelerate primary and secondary screening. Figure 3 illustrates a typical experiment conducted with a mammalian cell line stably transfected with an expression vector encoding a GPCR gene. Positive compounds depicted in Fig. 3b were subjected to verification in duplicate measurement and confirmed hits were further tested for dose-response relationships in EC50 /IC50 experiments.
7
8 Cellular Assays and Their Application in Drug Discovery
2.1.2 Competitive Immunoassay for Detection of Intracellular cAMP Most assays designed for detection of intracellular cAMP are based on monoclonal antibodies specifically recognizing cAMP. Typically, cell lysates are mixed with exogenously added biotinylated cAMP to form a heterotrimer with a specific antibody and streptavidin. This complex formation is competed by endogenous non-biotinylated cAMP. Streptavidin and antibodies can be linked covalently to a donor and an acceptor fluorophore (e.g. europium cryptate and allophycocyanin), forming an interacting FRET pair upon complex formation. A decreased signal is observed with an increase in intracellular cAMP produced, due to competition for binding to the specific antibody. The AlphaScreenTM technology (PerkinElmer Life Sciences) is an alternative technology that allows quantitative detection of a broad range of cellular components,
(a)
20000
Max2
17900
Max1
15800
Fluorescence Change (Counts)
13700 11600 9500 7400 5300 3200 1100 Min -1000 00
20
Compound addition
40
60 80 100 Time (seconds)
120
140
160
180
200
Ligand addition
Fig. 3 Determination of intracellular Ca2+ with the FLIPR system. (a) A fluorescent trace of a typical Ca2+ kinetic is shown and the measured parameters used for calculation of maximal fluorescence signals are depicted. In this example the maxima were measured before and after ligand addition (Max1 and Max2 respectively) and standardized by dividing by the minima measured
before compound addition (Min = baseline). The average of the positive controls was set as 100% control and the average of the negative controls was set as 0% control. For all wells, relative activities with respect to the controls were calculated and the cutoff criterion for both agonists and antagonists was 50%.
2 Membrane Proteins and Fast Cellular Responses
(b)
Agonist
1234567891011121314151617181920212223
151617181920212223
Seed GPCR over-expressing cells into384-well plates
24
A B C D E 1234567891011121314151617181920212223 A
F
B
G
C
H
D
I
E
J
F
24
Incubate over night
K
G
L
H
M
I
N
J
O
K
P
Load cells with Fluo-4 - AM dye
L M N
Screen chemical library
O P
Antagonist
Agonists
NO
Stop
NO
YES
YES
Verification in duplicate
Agonists
Antagonists
NO
YES
EC50 Determination Fig. 3 (Continue)(b) A flow diagram for a typical experiment is shown. Genetically modified cells overexpressing the target GPCR were seeded into 384-well plates overnight. The cells were washed with suitable buffer prior to adding buffer containing the Fluo-4 AM dye, followed by incubation for 45 min. After an additional wash step, the cell plates were transferred into the FLIPR machine and the chemical com-
Verification in duplicate
Stop
NO
Antagonists YES
IC50 Determination pounds were added followed by ligand addition. Fluorescence emission was monitored starting before compound addition and continued until a substantial decay of the signal was observed. Two 384-well plates from a typical screen are included. Examples of an agonist and an antagonist are shown as enlarged graphs. The 100% and 0% controls are overlaid as red- and blue-hatched lines, respectively.
9
10 Cellular Assays and Their Application in Drug Discovery
including cAMP. AlphaScreen relies on the use of “donor” and “acceptor” beads that are coated with a layer of hydrogel providing functional groups for bioconjugation. When the beads are brought into proximity, a cascade of chemical reactions is initiated to produce a greatly amplified signal. Upon laser excitation, a photosensitizer in the “donor” bead converts ambient oxygen to a more excited singlet state. The singlet state oxygen molecules diffuse across to react with a chemiluminescent in the “acceptor” bead that further activates fluorophores contained within the same bead. The fluorophores subsequently emit light at 520–620 nm. In the absence of a specific biological interaction, the singlet state oxygen molecules produced by the “donor” bead go undetected without the close proximity of the “acceptor” bead. AlphaScreen has been used in GPCR functional assays for detection of endogenous cAMP production. In all the above-mentioned systems, the cells have to be subjected to lysis prior to measurement, adding not only an extra step to the assay procedure, but more importantly making kinetic measurements impossible because data can only be generated from endpoints. In addition, whole-cell FRET methods have been developed to determine the temporal and spatial dynamics of cyclic nucleotides (see Section 4.4). 2.1.3 Enzyme Fragment Complementation (EFC) Technology EFC is based on genetically modified enzymes consisting of two fragments, i.e. the enzyme acceptor (EA) and the enzyme donor (ED). The separated fragments are inactive and only spontaneous formation of heterodimers will lead to an enzymatically active form [27]. The HitHunterTM technology (DiscoveRx) is based on a modified version of the β-galactosidase enzyme forming an EA/ED pair. The technology has been adapted for use in many different assays including, but not limited to, kinase, progesterone receptor, estrogen receptor and cAMP assays. Here, we only discuss specific application for intracellular detection of cAMP. In brief, an ED–cAMP peptide conjugate is utilized, in which cAMP is recognized by a specific antibody. This ED fragment is able to complement EA to form an active complex, but only in the absence of the specific antibody. In the assay, anti-cAMP antibody is titrated for complete inhibition of enzyme formation. Exposure of a defined antibody aliquot to cell lysate will lead to antibody–cAMP complex formation, therefore reducing the free amount of antibody. Subsequently, addition of ED conjugate in the following assay step will lead to reduced antibody–ED complex formation, therefore promoting complementation of active enzyme. Free ED–conjugate in the assay is proportional to the concentration of intracellular cAMP and the same applies for the measured β-galactosidase activity (see www.discoverx.com for additional information). 2.2 Membrane Transport Proteins
A great number of substances are transported in and out of complex organisms, in some cases using passive transport, driven solely by diffusion down a concentration
2 Membrane Proteins and Fast Cellular Responses
gradient either through the paracellular or transcellular route. Absorption and distribution of many food constituents such as ions, sugars, amino acids and nucleotides in the body are mediated/facilitated by specific membrane transport proteins, which form channels across cellular membranes. Similar mechanisms are needed for active removal of xenobiotics, waste products and cell metabolites from body tissue. These specific proteins are divided into two classes, i.e. uniporters and coupled transporters, of which the former transport a single solute from one side of the membrane to the other and, for the latter, the transfer of one solute depends on the simultaneous or sequential transport of a second solute. Coupled transporters are further divided into symporters moving compounds in the same direction and antiporters moving the compounds in opposite directions. Some specific transporters act in a passive manner moving molecules down a concentration gradient, as is the case with many nutrients such as glucose, which are found at high concentrations in the extracellular space of tissues. A similar situation is found with ligand- and voltage-gated ion channels of the nervous system, which specifically pass ions down a concentration gradient upon stimulation by neurotransmitters or membrane depolarization, respectively. In contrast, active transporters are linked to hydrolysis of ATP and move compounds in a highly specific manner against a gradient. The largest known and most diverse group of active transport proteins is the ABC transporter superfamily [28]. Over 50 ABC transporters have been described and the variety of substrates transported is large, including amino acids, sugars, inorganic ions, polysaccharides, peptides and even large proteins. One of the most cited ABC transporters, the multidrug-resistance (MDR1) protein [29], has been in the limelight of many drug discovery efforts in recent years, since overexpression in human cancer cells can render these cells simultaneously resistant to a variety of chemically unrelated drugs that are widely used in cancer chemotherapy. Other important phenomena linked to ABC transporters include resistance to chloroquine in malaria parasites, presentation of antigenic peptides on immune cells and the genetic disease known as cystic fibrosis [30–32]. 2.2.1 Ion Channels Most cells maintain an electropotential across the membrane which has been most extensively studied in neurons and muscle cells where the potential is particularly high, and reaches a value of about −70 mV. The membrane potential is maintained by the ubiquitous Na+ -K+ -ATPase, which operates as an antiporter actively pumping Na+ out of the cell, while pumping K+ into the cell. For every molecule of ATP hydrolyzed, three Na+ are pumped out and two K+ pumped in, therefore creating a negative net charge inside the cell. Ion channel proteins form pores in cell membranes, allowing certain ions to pass through driven by a concentration gradient. A characteristic feature of ion channels is a gating mechanism that controls the movement of ions. Here we focus on ion channels, which are expressed in neurons or muscle cells, and therefore play important roles in neuronal signaling and muscle contraction. Changes of the membrane potential can be induced in a spatially limited fashion by ligand-gated Na+ channels
11
12 Cellular Assays and Their Application in Drug Discovery
sensitive to neurotransmitters. Subsequent opening of voltage-gated Na+ channels induces self-propagation of the depolarization along the cell membrane (action potential), finally inducing the opening of voltage-gated Ca2+ channels typically at the synapse (end plate). Influx of Ca2+ induces the release of neurotransmitters into the synaptic cleft, further propagating the signal on to the post-synaptic cell. Malfunctions of ion channels are the basis of many ailments, including muscle and cardiac disorders, epilepsy, migraine, depression, and ataxia. Opening of ion channels is in most cases monitored by measuring changes in the membrane potential. The gold standard for this purpose is the patch clamp technology, offering very high sensitivity, good temporal resolution and measurement of single channel conductance at a precise voltage. To date this technology provides the best data for functional assessment of ion channels; however, the massive disadvantages include requirement of skilled operators, laborious work flow and low throughput. Currently, different companies are involved in designing novel devices to achieve automation of the patch clamping procedure. One of the most advanced tools is the IonWorksTM station (Molecular Devices, Sunnyvale, CA), which uses integrated fluidics for automatic positioning of single cells on microapertures on specially fabricated supports [33]. Single ion-channel recordings were demonstrated at a throughput of up to 2000 assay points per day. Higher throughputs can be achieved by using solvatochromatic, cell-permeable dyes like DiBAC (Molecular Probes, Eugene, OR) or the more recently developed FMP dye (Molecular Devices). Combination with a FLIPR system allows monitoring of ion channel function in true high-throughput mode surpassing analysis of 30,000 compounds per day (for more details, see the following paragraph). More details including other well-established ion-channel assay technologies were described in a recent review article [34].
2.2.1.1 FLIPR Technology and Ion Channel Assays To identify compounds that modulate ion channel activity, it is imperative that rapid and economical evaluation of biological activities in a high-throughput manner is available. One of the fluorescence-based assays of membrane potential changes uses a potentiometric fluorescent probe, bis-(1,3-dibutylbarbituric acid) trimethine oxonol [DiBAC4 (3)], that partitions between the intracellular and extracellular compartments depending on the electropotential across the membrane. Upon depolarization of the cells, the dye moves into the cell, and subsequent binding to intracellular hydrophobic sites results in enhanced quantum yield and in an increase in fluorescence intensity. Conversely, membrane hyperpolarization triggers dye extrusion from the cells and consequently decreases fluorescence, because the quantum yield of DiBAC4 (3) is low in an aqueous environment. The partitioning of DiBAC4 (3) is a slow process and therefore the technology is not amenable to measure channels with fast-gating kinetics or with rapid desensitization. Significant improvement was achieved with the recent development of the FMP dye [35, 36], which has faster partitioning kinetics. Figure 4 shows a typical experiment conducted with differentiated myotubes (C2C12 cell line).
2 Membrane Proteins and Fast Cellular Responses
Agonist
1
2
3
4
5
6
7
8
9
13
Seed C2C12 Cells into 384-well Plates
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
A B C D
Differentiate into Myotubes
E F G H I J K L M
Load Cells with FMP-dye
N O P
Screen LOPAC Library Antagonist
Agonists
NO
YES EC50 Determination And Specificity Check Decreasing conc. of agonist
Pacuronium addition
Fig. 4 Ion-channel activity in differentiated myotubes. The murine C2C12 muscle cell line was used to validate the FMP dye. C2C12 cells were seeded into 384-well plates and differentiated into myotubes by exposure to horse serum. Differentiated C2C12 cells strongly express nAChR and depolarization of the cells can be induced
Stop
NO
Antagonists YES
IC50 Determination
Decreasing conc. of antagonist
Carbachol addition
with specific receptor agonists, such as carbachol, which was applied in the outlined experiment. A 384-well plate from a typical screen for agonists and antagonists is shown. Example of an agonist and an antagonist are depicted as enlarged graphs. The 100% and 0% controls are overlaid as red- and blue-hatched lines, respectively.
14 Cellular Assays and Their Application in Drug Discovery
The LOPAC library (Sigma-Aldrich, St Louis, MO) of pharmacologically active compounds was screened for identification of compounds exerting agonistic or antagonistic effects on the nicotinic acetylcholine receptor (nAChR). In brief, the cells were washed with buffer and loaded with the FMP dye for 30 min at room temperature. Six hundred and forty compounds from the LOPAC library were added in the FLIPR followed by stimulation with carbachol. In contrast to a typical Ca2+ signal, the membrane depolarization does not show a transient kinetic. However data evaluation and hit identification can be carried out as described in Fig. 3. Identification of agonists and antagonists is possible in one experiment. Positives were subjected to EC50 /IC50 determination and, in the case of agonists, the specific AChR antagonist pancuronium was used to show reversal of the carbachol signal (see fluorescence traces shown for the EC50 determination, Fig. 4). Initially 21 compounds were identified as agonists and 12 compounds showed specificity for the AChR. 2.2.1.2 Voltage Ion Probe Reader (VIPR)TM Technology and Ion Channel Assays VIPR technology (Aurora Bioscience, San Diego, CA) is based on FRET and uses two fluorescent molecules. The first molecule, oxonol, is a highly fluorescent, negatively charged, hydrophobic ion that can rapidly redistribute between two binding sites on opposite sides of the plasma membrane in response to changes in membrane potential. The second fluorescent molecule, coumarin lipid, binds specifically to one face of the plasma membrane and functions as a FRET donor to the voltagesensing oxonol acceptor molecule. When the oxonol moves to the intracellular plasma membrane binding site upon depolarization, FRET is decreased, and this results in an increase in the donor fluorescence and a decrease in the oxonol emission. This technology enables research and assay development in the field of ion channel biology. VIPR technology can deliver subsecond, real-time readouts from assays in a microplate format (see www.aurorabio.com/ for additional information). 2.2.2 MDR Proteins MDR is mainly linked to overexpression of proteins of the ABC transporter superfamily. The most extensively studied member is the MDR1 gene product (Pglycoprotein), which is expressed in the intestine, kidneys, liver and blood-brain barrier, as well as in many tumor cells. In the latter case, high overexpression is one of the hallmarks of cell resistance to cancer chemotherapy. This observation is a result of anticancer drugs being actively pumped out of cells, resulting in decreased intracellular drug levels. Small molecules capable of reversing efflux can restore drug sensitivity in resistant tumor cells. The most commonly used assays to identify MDR pump inhibitors are based on fluorescent P-glycoprotein substrates such as Hoechst 33342, rhodamine 123 and rhodamine 6G. The cells are loaded with the substrate and extracellular dye is removed by washing prior to lysis with detergents [37]. Subsequent measurement of fluorescence emission provides a quantitative readout, which is proportional to the intracellular accumulation of the substrate. Exposure of the cells to MDR pump inhibitors will reduce efflux of the substrate
3 Gene and Protein Expression Profiling in High-throughput Formats
and therefore leads to an increased fluorescence readout. However, these assays are not homogeneous and involve washing steps after loading of the cells with the corresponding dye. A novel MDR pump substrate was recently developed to run fully homogeneous assays on a FLIPR system. The FLIPR Multidrug Resistance Assay (Molecular Devices) is a fully homogeneous application needing only 15 min of incubation at room temperature with a proprietary dye, which is transported by the MDR1 gene product and changes its fluorescent properties only after modification by intracellular enzymes. This allows performance of live-cell kinetic assays for screening MDR pump inhibitors in high-throughput mode. The elimination of wash steps and temperature control means that the assay is highly amenable to automation [37].
3 Gene and Protein Expression Profiling in High-throughput Formats
Gene and protein expression profiling can be applied in drug discovery for at least two different purposes. The first application will be for the discovery of new potential drug leads. In this case, the target of interest is either a component of the expression machinery itself or monitoring of expression serves as an indicator for an upstream target and reflects the coupled cellular response [16]. To make high-throughput screening (HTS) possible, the promoter sequences of interest, that have been shown to be coupled to upstream cellular events, are linked to so-called reporter genes, which serve as indicators for transcriptional and translational activation. The second application is the use of expression profiling to obtain information on the specificity of a drug candidate at the cellular level or for the identification of surrogate markers for drug action. In this case, gene or protein expression in response to a potential drug candidate of as many genes and proteins as possible should be investigated. Expression patterns can be used as an indicator for the number of interactions within the cell. Such data could be of relevance for getting an understanding of the broader impact of modulating a potential drug target on the whole cellular system. Gene expression chips have been introduced recently for the measurement of hundreds of simultaneously expressed genes at the mRNA level [38]. The utility of the approach has been recently validated in yeast, where expression profiles identified pathways of previously uncharacterized mutations or identified new potential drug targets [39]. Similarly, protein chip arrays have been introduced to capture a particular fraction of proteins and allow analysis with a reasonable sample throughput [40]. In addition, the technology to analyze the cellular proteome, or parts of it, has been recently introduced [14]. The analysis of thousands of samples becomes a prerequisite if expression profiling is to be used in hit identification. Hit-to-lead selection and lead optimization will require a throughput which is at the level of hundreds to thousands of compounds. Both can be achieved by focusing on selected indicator genes or their products,
15
16 Cellular Assays and Their Application in Drug Discovery
indicative of the discovery target of interest, rather than on broad expression profiling. Several targets can be addressed at once by running established single-readout assays in parallel or by using multiplexing as a tool to combine several readouts in one assay step [41]. In the following section, we will focus on examples that can be applied in early drug discovery and which allow a high sample throughput. Gene expression chips [38, 42] and proteomic technologies [43] are the subject of many recent reviews. However, due to their still limited sample throughput, they will only be mentioned briefly here. 3.1 Reporter Gene Assays in Lead Finding
Transcription factors can be classified according to their structural nature, but also according to their behavior to respond to modulation of signal transduction pathways. Such a classification, which is the basis for reporter gene technology and expression profiling, has recently been published by Brivanlou and Darnell [44]. According to this classification, transcription factors can be grouped into two large classes – those who are constitutively activated, and those whose activity is regulated in a spatio-temporal, quantitative and qualitative manner. Among the regulated transcription factors is the large class of signal-dependent factors, which are activated upon an intra- or extracellular signal. Genes regulated via the latter are attractive as reporter genes since they couple the signaling pathways with a transcriptional response. Extracellular signals, after interacting with appropriate receptor molecules at the cell surface, initiate a cascade of intracellular events which often leads to alterations in the expression of a set of genes. One or several of these genes can be used as an indicator for the modulation of targets within specific signaling pathways. The activation of a certain gene can be easily identified by “tagging” the particular gene with a so-called reporter gene. For this, plasmids need to be constructed which contain the promoter or regulatory region of the gene of interest linked to the reporter gene. These plasmids are then introduced into a cell line which expresses the pathway of interest containing the pharmacological targets and couples the reporter readout with the particular pathway of interest. A number of genes have been described that can serve as reporter genes [16, 45]. Expression can be monitored by fluorescence, e.g. by using green fluorescent protein (GFP) as a reporter, or through the detection with fluorescently labeled antibodies. Alternatively, the enzymatic activity of the reporter is used, as it is the case for luciferase, which converts the substrate to a luminescent product, or the reporter is detected with well-established and characterized secondary readouts producing fluorescence, luminescence or colored reactions. Reporter gene technology has been widely used to establish high-throughput screens for a whole variety of targets of interest in drug discovery (for a review, see [45]). For many targets, such as transmembrane receptors for which other screening methods were difficult to establish, reporter gene technology became the method of
3 Gene and Protein Expression Profiling in High-throughput Formats
choice for finding leads. Selected examples include the use of reporter gene assays to find agonists and antagonists for a whole variety of GPCRs [17] stably transfected into Chinese hamster ovary (CHO) cells with a 6 × CRE–luciferase reporter construct. This cell line served as an indicator for GPCRs, such as the melanocortin-1 receptor, which upon stimulation modulates cAMP levels in the cell. The assay can be performed in 384-well plates, therefore offering significant savings in cost, time and cell culture efforts. Along the same lines, a HTS system was established for the metabotropic glutamate receptor mGluR7 [46]. The mGluR7 receptor is negatively coupled to adenylyl cyclase and its stimulation decreases cAMP levels. CHO cells were stably transfected with rat mGluR7 cDNA and an AMP-responsive luciferase reporter gene. Suitability for HTS was demonstrated in a line where cAMP levels induced by forskolin (an activator of adenylyl cyclase) were significantly reduced by receptor agonists. A direct method to assay receptors negatively coupled to adenylyl cyclase has been described by Xing et al. [47]. The stably transfected CHO reporter cell line contained three separate expression plasmids: (i) a chimeric Gq/i -protein, which redirects a negative Gi signal into a positive Gq signal; (ii) a Gi -coupled GPCR, in this case the µ-opioid receptor or the 5-hydroxytryptamine receptor, and (iii) the 3 × NFAT-β-lactamase reporter. Thus activation of the Gi -coupled GPCR translated into NFAT activation via the Gq -mediated signaling pathway, allowing the monitoring of β-lactamase expression in a HTS format. A wide variety of different cell lines have been used to establish reporter gene assays, a selection of which is indicated here as examples. Human embryonic kidney (HEK) 293-EBNA cells grown in suspension culture were used to assay expression of prostanoid receptors coupled to a CRE-SEAP (secreted human placental alkaline phosphatase) reporter [48]. ECV304 cells, a spontaneously transformed immortal line derived from human umbilical vein endothelial cells, were transfected with an ICAM-1 reporter to screen for compounds which inhibit ICAM-1 expression upon stimulation with interleukin (IL)-1β [49]. The human breast cancer cell line MDA-MB-453 was used as a transfection host to establish stable reporters under the control of a mouse mammary tumor virus (MMTV) promoter to study androgen and glucocorticoid receptor agonists. MDA-MB cells express both the androgen and the glucocorticoid receptors, and therefore serve as an ideal host cell for the reporter constructs [50]. As a practical example, data obtained with a reporter gene assay under HTS conditions is presented in Fig. 5 and Tab. 2. Cells stably transfected with a luciferase reporter gene construct were seeded in 384-well plates and dosed with serial dilutions of a compound known to activate the promoter of the reporter gene construct. After an incubation time of 24 h, the cells were washed, lysed and the luciferase subR ; PerkinElmer Life Sciences) was added. The amount of luciferase strate (LucLite protein expressed correlated directly with the amount of substrate metabolized to a luminescent compound, which was detected in a luminometric plate reader R NXT; PerkinElmer Life Sciences). The data presented here provided (TopCount information as to the minimal concentration of positive reference compound required as standard on each plate for the HTS of 100,000 compounds. To assess the quality of each individual assay plate during HTS, a criterion called the Z factor
17
18 Cellular Assays and Their Application in Drug Discovery 45000 40000
Luminometric counts
35000 30000 25000 20000 15000 10000 5000 0 0
0,1
0,3
1
3
10
30
Concentration of inducer (µM)
Fig. 5 Dose-dependent induction of luciferase reporter gene activity by an inducing substance subsequently used as an agonist control in HTS. Each value represents the average of eight individual determinations, the error bars indicate 1 SD. The mean of luminometric counts measured
with TopCount NXT (PerkinElmer Life Science) is plotted against the concentration of the inducing substance (in :M). Counts obtained in the absence of compound refer to the basal expression of the luciferase reporter gene and constitute the assay background.
is calculated from the dynamic range of the signal (difference between the means of the background and the positive control wells) and the standard deviation (SD) of the control wells on each plate [51]. Since the Z factor takes into account both the signal intensity and the inherent variance, it gives a measure of the stability of an assay. Screening is generally considered possible if 0 < Z < 1 [51]; however, in practice, the minimum value to be met by the Z factor is often set higher. As shown in Fig. 5 and in the values calculated in Tab. 2, a concentration of at least 3 µM of the agonist in the positive control would be required to obtain a Z > 0.5. In effect, the positive control was used at 10 µM during HTS. Table 2 Luciferase reporter gene activation and assay performance
Concentration (l M)
Mean of 8 wells
SD
Z factor
0 0.1 0.3 1 3 10 30
6364 6347 9028 12006 20436 31292 38343
641 314 458 509 1476 1181 2027
n.d. -0.238 0.389 0.549 0.781 0.750
3 Gene and Protein Expression Profiling in High-throughput Formats
Reporter gene assays can be highly informative if the gene product of interest and the pathway of induction are well characterized. It needs to be borne in mind that the reporter gene will be modulated by any type of stimulus influencing any step of the coupled signal transduction cascade. Therefore, the effects on reporter gene expression observed in response to test compounds should always be studied further in assay panels addressing specific steps in the pathway, to rule out mechanisms not directly associated with the target of interest. This has been shown recently by inhibiting the CXCR1-agonist-stimulated mitogen-activated protein (MAP) kinase reporter gene by pertussis toxin (acting at the G-protein level) and inhibitors of one MAP kinase subfamily, but not by inhibitors of a second subfamily [52]. Even though expression of the reporter genes generally does not influence homeostasis of the cell, a certain artificial component is introduced into the system. First and foremost, it needs to be established whether the signal transduction pathway leading to activation of a particular promoter is present and functionally active in the cells in question. The introduction of the reporter gene itself into cells can disturb normal transcription patterns, due to scavenging of particular transcription factors or through positional effects due to integration of the plasmid during the generation of stable cell clones. Thus, great care has to be taken in choosing the promoter and the cell line in which the reporter gene assay is to be established. Limitations can occur regarding the cell lines that can be used, the transfection methods that should be employed and the efficiency of expression of the reporter gene. Nevertheless, even though the generation of well-characterized reporter gene cell lines is time-consuming and labor intensive, they constitute a highly efficient system for HTS programmes.
3.2 Reporter Gene Assays in Lead Optimization
The use of gene expression arrays to monitor simultaneously the expression of hundreds or thousands of genes has been mentioned before. Expression arrays are useful tools not only to characterize the expression phenotypes of cells and tissues for their classification [53], but they also allow us to monitor the impact of mutations [54] or drugs on the cellular phenotype [55] at a systems level. Arrays are widely used in toxico- and pharmacogenomic studies [56, 57]. Furthermore, the large data sets obtained with gene expression arrays have been used to construct predictive models for diagnostic purposes [58, 59]. The use of expression arrays requires the production of probes suitable for hybridization, which can be fastidious to produce. Furthermore, analysis of the large data sets obtained requires sophisticated software, and a broad special knowledge as to the significance and relevance of the multiplicity of the responses observed. In some instances, only a particular fraction of all genes measured is informative for the target in question. Thomas et al. [60] have recently analyzed the expression of 1200 liver transcripts using cDNA arrays, after dosing mice with 24 different
19
20 Cellular Assays and Their Application in Drug Discovery
treatments falling into five toxicological categories. A set of 12 transcripts was identified and considered diagnostic for classification of the treatments into the toxicological classes. Interestingly, addition of further transcripts lead to a decline in the predictive accuracy, and the “individuality” of the specific treatment profile began to take over. In a similar study, Bartosiewicz et al. [61] have used a set of 260 genes involved in phase I and II drug metabolism to test the effect of 10 toxicants on expression profiles in different tissues and for different exposure times in mice. Distinct groups of chemical classes could be correlated to the specific expression profile of a relatively low number of transcripts. Similar expression profiling studies performed on HepG2 cells [62] and on livers of treated rats [63] show that even with a highly focused set of genes, separate classes of chemicals can be identified through shared modulation of expression of the selected gene set. Today, the method of choice to assess gene expression profiles in vivo consists in the use of oligonucleotide or cDNA microarrays or gene chips. However, due to the high cost involved and the labor-intensive sample preparation, which generally preclude a high-throughput method, these studies are done mostly in a late phase of the drug discovery process. Thus, one or a few compounds are analyzed for the induced expression patterns of many genes. To reduce late-phase attrition of potential drug candidates due to unforeseen side-effects, expression-profiling systems which are amenable to HTS are needed. An approach making use of reporter gene assays has been described [64, 65]. Albano et al. [65] selected a set of oxidative stress-responsive promoters from Escherichia coli genes to generate GFP–reporter gene constructs, as a tool for mechanistic screening of antitumor drugs in 96-well format. Zhu and Fahl [66] generated stable HepG2 cell clones expressing GFP under the control of the electrophile response element (EpRE), which allow HTS for induction of phase II drug-metabolizing enzymes. In addition to high-throughput, a further advantage of GFP–reporter gene constructs is that the expression can be detected in live cells, allowing easy monitoring of time courses and reversibility of induction. The use of reporter gene cell lines for expression profiling has further been documented by Farr and Todd [64], who established a panel of cell lines containing the bacterial chloramphenicol N-acetyltransferase (CAT) reporter gene under the control of promoters indicative of specific cellular stress responses. Induction of reporter constructs by chemical compounds can give an indication concerning the action of compounds at the subcellular level. Thus, promoter–reporter gene constructs expressed in well-characterized cell lines can provide useful model systems to profile compounds with respect to activation of canonical signal transduction pathways associated with defined cell fates or phenotypes. The assay system will detect compounds which modulate reporter gene expression directly and those which interfere with the expression elicited by external stimuli such as growth factors or cytokines. When a panel of reporter gene assays is used for compound profiling, similar expression patterns can point to comparable mechanisms of action of test compounds in intact cells. Changes of expression patterns in response to specific signaling pathways and in different cell lines can provide information on the specificity of test compounds. These systems can easily be adapted to HTS, and the analysis of time
3 Gene and Protein Expression Profiling in High-throughput Formats
course and dose dependency of induction. Therefore, the profiling of hit collections or compound sets resulting from lead optimizations might allow the clustering of compounds into groups with pharmacologically similar mechanistic responses. For example, activation of reporter genes coupled to the p53-responsive element (p53RE) [67] would indicate that the test compound leads to DNA damage, and activation of reporters under the control of the xenobiotic response element (XRE) [68, 69] would point to the transcriptional induction of drug-metabolizing enzymes, such as cytochrome P450s and others. Thus, in vitro reporter gene profiling with a well-defined and validated set of genes can contribute not only to compound classification, but also to the identification of possible side-effect profiles of the compounds tested. More recently, with the advance of novel and more sensitive techniques of monitoring gene expression, an entirely new panel of assays has been brought forward. While reporter gene assays were devised with the idea of visualizing a particular gene expression event through the coupling to a protein which can be easily monitored, the novel techniques directly monitor gene expression by quantitation of the mRNA levels for several to up to thousands of genes simultaneously. Furthermore, these techniques do not require the prior introduction of a detection system, allowing unlimited choice with respect to cell type or tissue derived from organs. Table 3 gives an overview of a number of techniques developed for HTS applications that are currently available. This list is by no means complete, but should serve as a general reference to the different approaches taken. With respect to the HTS suitability of these novel technologies, two major categories can be differentiated. There are methodologies like serial analysis of gene expression (SAGE) [70], GeneCalling (transcript profiling by restriction endonuclease fingerprinting) [71] and total gene expression analysis (TOGA) [72], where initial sample preparation is very fastidious, but where high-speed parallel detection of many different genes can be achieved. These techniques are collectively called “open” systems – no prior sequence information is required and virtually all genes induced by a particular compound can be detected. Open systems challenge the DNA microarray technology, providing an alternative to the high cost and high demand of starting material associated with the latter. In practice, they will prove useful if only a few samples need to be analyzed, but where the high throughput (HT) refers to the detection of the different gene products. Quantitative reverse transcription-polymerase chain reaction (RT-PCR) [73], the branched DNA technology promoted by Bayer (Emeryville, CA) [74, 75] and the high-performance signal amplification (HPSATM [76]) system promoted by Chromagen (San Diego, CA) can detect one or a few mRNAs at a time. They are called “closed” system technologies, as are the DNA and oligonucleotide microarrays and the reporter gene technology, since they can be applied only to genes with known sequence. However, these methodologies can be used for HTS, since the assays can be run in a microtiter plate format, allowing the simultaneous processing of many samples. Moreover, they provide an advantage to the traditional reporter gene
21
22 Cellular Assays and Their Application in Drug Discovery Table 3 Differential gene expression technologies (adapted from [155])
Technology
Methodology
Target genes
Open systems: no prior sequence information required HT-compatible SAGE transcript capture by “fingerprint” SAGE tags, identification and quantitation by sequencing of serial concatemers 96 reactions GeneCalling restriction (transcript profiling) endonuclease “fingerprinting” of cDNA, confirmation by competitive PCR TOGA PCR-based cDNA 256 reactions display, identification of mRNAs with 3 poly(A) tail by sequencing Closed systems: analysis of genes with known sequence information HT-compatible hybridization of total DNA microarray/oligoarrays RNA to gene probes arrayed on silicone chips or glass slides Quantitative RT-PCR PCR-mediated release few probes (TaqmanTM ) of gene-specific fluorogenic probes; threshold of detection is inversely proportional to concentration Branched DNA 1 probe direct mRNA (QuantigeneTM ) detection by hybridization with gene-specific linker and enzyme-linked amplification probes 2 probes HPSATM direct mRNA detection by hybridization with gene-specific capture and enzyme-linked amplification probes
Samples
References
few
70
few
71
few
72
few
156, 157
96-well format
73
96-well format
74, 75
96-well format
158
format, because they are more flexible and can be performed on virtually any source material, and the detection of a novel gene requires only the design and synthesis of gene-specific probes. These novel technologies promise increased sensitivity, with a readout not compromised by secondary effects due to the reporter gene, but they are not yet as widely established as the reporter gene systems.
4 Spatio-temporal Assays and Subpopulation Analysis 23
4 Spatio-temporal Assays and Subpopulation Analysis
Cellular assays addressing protein expression levels, spatial and temporal distribution of proteins within the cell, and intracellular signaling dynamics are currently exploited to better understand pharmaceutical target functions in cells. To date, there is an increasing number of labeling reagents and fluorescent probes for imaging applications, including target-specific antibodies, complementing enzymes [77, 78], dyes (e.g. Alexa Fluor dye series; Molecular Probes), affinity reagents [79] and genetically encoded fluorescent proteins [18, 80, 81]. Systems that allow high-throughput analysis of cells include flow cytometrybased devices, such as the recently described high-throughput pharmacological system [82]. This system allows multifactorial characterization of cells, but does not allow us to quantify spatial rearrangements inside cells. Recently developed microscopic imaging technologies have broad applications. They allow analysis of a wide range of cellular functions including cell motility, cell morphology, protein movement inside cells, protein localization, intracellular second messengers and endpoint measurements of signaling cascades. The development of automated imaging systems has made compound screening in high-content format possible. The great advantage of automated imaging and scanning techniques is the tremendous increase in sample throughput compared to manual microscopy. In the case of fluorescence microscopy, multiple fluorescent probes can be detected and quantified simultaneously. R The ArrayScan II system was introduced in 1999 by Cellomics [83] (http://www .cellomics.com) and was one of the first automated imaging instruments for highcontent screening on the market. Meanwhile, other instruments, software and hardware were developed by several companies including Acumen-Bioscience (http://www.acusite.co.uk), Amersham Bioscience (http://www.amersham.com), Applied Biosystems (http://www.appliedbiosystems.com), Q3DM (http://www .q3dm.com), CompuCyte (http://www.compucyte.com), Kinetics Imaging (http:// www.kinetikimaging.com), Universal Imaging (http://www.imagel.com), Axon Instruments (http://www.axon.com) and Evotec OAI (http://www.evotecoai.com). The rapid progress in the development of analytical devices for subcellular analysis in combination with labeling reagents for specific identification of the molecules of interest will allow us to analyze specific signaling and metabolic steps in a highthroughput mode. 4.1 Phosphorylation Stage-specific Antibodies
Of particular interest for automated imaging are activity-dependent reagents, such as phosphospecific antibodies that allow selective analysis of signaling pathways within cells. Phosphorylation and dephosphorylation of proteins by kinases and phosphatases, respectively, are part of a complex signaling network of tightly regulated dynamic processes. Protein phosphorylation frequently results in sub-cellular
24 Cellular Assays and Their Application in Drug Discovery
redistribution of key signaling molecules, which is critical for their biological activity [84–86]. The spatial distribution of two key signaling proteins upon phosphorylation, extracellular signal-regulated kinase (Erk) and p38 MAP kinase, was analyzed with phosphotyrosine-specific antibodies recognizing only the phosphorylated form of the protein [87]. Exposure of cells to a potent Cdc25 phosphatase inhibitor resulted in a concentration-dependent nuclear accumulation of phospho-Erk and phospho-p38, but not nuclear factor-κB (NF-κB). These experiments revealed that Cdc25 inhibition increased Erk phosphorylation and nuclear accumulation, and indicated that the phosphorylation status of Erk is regulated by a Cdc25 phosphatase. A phospho-histone H3 antibody that targets the phosphorylation site on Ser10 of histone H3 and a phospho-histone H1 antibody were used to detect mitogenand oncogene-stimulated phosphorylation in small nucleosomal regions that are sensitive to hyperacetylation [88, 89]. Active kinase states have been analyzed with a panel of phosphospecific monoclonal anti-kinase antibodies. The specificity of the antibodies for the specific kinase has been verified by Western blot analysis and with specific inhibitors for upstream kinases. Using multiparameter flow cytometry, Perez and Nolan [90] could identify lymphocyte subsets with as many as four different intracellular active kinases. This technique allows us for the first time to investigate simultaneously different active kinases in one individual cell. While this technology can be best applied to cells in suspension, high-throughput microscopy is the technology of choice for adherent cells. In addition microscopic applications allow analysis of intracellular spatial distributions which is not possible with flow cytometry. Mitotic cells are characterized by an increase in phosphorylated nucleolin, a nuclear protein phosphorylated in cells entering mitosis. Using a monoclonal antibody that specifically recognizes phosphorylated nucleolin, Mayer et al. [91] have developed a high-throughput whole-cell immunoblot assay. Compounds arresting cells in mitosis by novel modes of action could be identified after sorting compounds with secondary assays. Other protein modifications like acetylation can be detected with specific antibodies as well. The organization of transcriptionally active chromatin and activity of nuclear histone acetyltransferase and deacetylase was investigated by using an acetyl-histone H3-specific antibody [92, 93]. Compounds affecting ligandstimulated changes in chromatin structure can be screened with such antibodies by automated imaging techniques.
4.2 Target-protein-specific Antibodies
Many proteins play important roles in both the cytoplasm and nucleus. Therefore, they need to move from one compartment to the other in order to perform their functions. Typical examples are latent cytoplasmic transcription factors [44] or nuclear transport factors [94]. Assays for nuclear transport factors using heterokaryon formation in combination with fluorescent proteins have been described [95].
4 Spatio-temporal Assays and Subpopulation Analysis 25
Nuc-CytoInten Difference
800 Exp.1
700
Exp.2
600 500 400 300 200 100 0 -100 0
20
40
60
80
Fig. 6 Time course of NF-6B nuclear translocation. HeLa cells were stimulated with 1 ng/ml IL-1" for the indicated times at 37◦ C. Mean values of 100 counted cells per well of two different experiments (Exp.1 and Exp.2) are shown. Nuc-CytoInten = nuclear minus cytoplasmic fluorescence intensity.
The movement of latent cytoplasmic transcription factors upon activation to the nucleus can be monitored and quantified by automated fluorescence imaging. IL-1induced and tumor necrosis factor-α-induced translocation of NF-κB to the nucleus has been quantified with the application of the Cellomics ArrayScan II technology [96]. Unstimulated cells contain inactive NF-κB distributed within the cytoplasm that can be detected by a NF-κB-specific antibody combined with a fluorescently labeled secondary antibody. In IL-1α-stimulated cells, most NF-κB is activated and translocates to the nucleus in a time- and dose-dependent manner (Figs. 6 and 7). Upon IL-1α stimulation, HeLa cells were analyzed with the ArrayScan II system using the specific software that allows quantitative analysis of fluorescence in the nucleus and in the adjacent cytoplasm. Cells are identified as objects by the software through the nuclear staining with Hoechst 33324 or DAPI, and are overlaid with masks for nuclear and cytoplasm fluorescence measurements (Fig. 8). 4.3 Protein–GFP Fusions
Tagging technologies have become popular to monitor the movement or transport of molecules into cells and within cells. Extracellular ligands can be conjugated to biotin and later visualized with enzyme-conjugated streptavidin. A HTS assay
Nuc-CytoInten Difference
26 Cellular Assays and Their Application in Drug Discovery
1000 900 800 700 600 500 400 300 200 100 0 -100 0.0001
Exp.1 Exp.2
0.001
0.01
0.1
1
10
ng/ml IL-1α Fig. 7 Dose-response relationship of IL-1"-induced NF-6B nuclear translocation. HeLa cells were stimulated with indicated concentrations of IL-1" for 30 min at 37◦ C. Mean values of 100 counted cells per well of one block two different experiments (Exp.1 and Exp.2) are shown. Nuc-CytoInten = nuclear minus cytoplasmic fluorescence intensity.
has been described on this basis for epidermal growth factor (EGF) internalization [97]. Fluorescent proteins significantly contribute to the understanding of protein dynamics within the cell. They represent powerful tools and ideal tags to assess realtime analysis of molecular events in living cells in the form of reporters or fusion proteins. The most prominent fluorescent protein is GFP, originally isolated and cloned from the jelly fish Aequorea victoria [98]. A number of GFP variants have been developed with different excitation and emission spectra, including enhanced blue fluorescent protein (eBFP), enhanced yellow fluorescent protein (eYFP) and enhanced cyan fluorescent protein (eCFP). The nature of their structures renders these proteins extremely stable and resistant to degradation by most proteases [99]. This high level of stability does not necessarily represent an advantage for cellular applications. If changes of gene expression with transcription reporter assays are to be monitored, a shorter half-life of the reporter is desired. Therefore, short half-life forms of eGFP were developed in order to be used as transcription reporter proteins and these are termed destabilized eGFP [80]. Meanwhile, other fluorescent proteins from marine invertebrates have been cloned, like the fluorescent proteins from reef corals [81]. The combination of distinct spectral variants of fluorescent proteins
4 Spatio-temporal Assays and Subpopulation Analysis 27
Fig. 8 IL-1"-induced NF-6B translocation in HeLa cells. Cells were stained with rabbit anti-p65 NF-6B antibody and goat antirabbit Alexa Fluor 488 antibody. Nuclear DNA was stained with Hoechst 33342. NF6B is quantified by measurement of green
fluorescence intensity; nuclei appear blue. Unstimulated cells (left) show most green fluorescence in the cytoplasm, whereas cells stimulated with 2 ng/ml of IL-1" for 30 min at 37◦ C (right) show translocation of NF-6B to the nucleus.
enables a wide range of applications, such as the simultaneous analysis of multiple gene expression cascades and localization or translocation of different proteins. Measurement of co-internalization of β-arrestin bound to GPCRs can be used to monitor receptor activation and recycling. Barak et al. [100] demonstrated, using confocal microscopy, the translocation of β-arrestin-2–GFP fusion proteins from the cytoplasm to membrane-anchored GPCRs. This was demonstrated for more than 15 different ligand-activated GPCRs in HEK-293 cells. In the absence of receptor activation, β-arrestins are distributed throughout the cytosol and are absent in the nucleus. Upon addition of ligand to the cells, β-arrestin-1–GFP fusion proteins translocate to the plasma membrane and further to endocytic vesicles [101]. The accumulation of the GFP-tagged proteins can then be quantified using a fluorescence imaging technology. With a β-arrestin-1–GFP fusion protein, co-internalization of arrestin with thyrotropin-releasing hormone receptor 1 in response to agonist was monitored and imaged [102, 103]. NORAK Biosciences (Research Triangle Park, NC; http://www.norakbio.com) exploits arrestin-GFP movement under the term TransfluorTM technology as a broadly applicable, stable HTS platform to identify compounds modulation GPCR signaling. Receptor
28 Cellular Assays and Their Application in Drug Discovery
internalization and trafficking was also exploited by the use of fluorescently labeled ligands such as ligand of transferrin receptor [104]. High-content screening was conducted on the ArrayScan II system, and internalization and recycling of the transferrin receptor with bound ligand could be monitored and quantified. GPCR activation and internalization has also been monitored using GPCR-GFP fluorescent protein fusions [104, 105]. Conway et al. [105] have used the ArrayScan high-content screening system to monitor parathyroid hormone receptor and β 2 adrenergic receptor internalization upon stimulation by specific ligands. Other examples where fusions with fluorescent proteins were used to quantitate protein movement include nuclear translocation of calmodulin [106] or the membrane-bound transcriptional regulator tubby after phosphoinositide hydrolysis [107]. In other applications, translocation domains such as the Ca2+ -sensitive C2 domain of protein kinase Cδ have been fused to YFP and expressed in rat leukemia cells. Evanescent wave single-cell array technology has been applied as an imaging technology to monitor protein movement upon addition of platelet-activating factor [108]. Simultaneous analysis of mixed cell populations can be performed by transfecting each cell line with a single fluorescent fusion protein. Vectors of BFP and YFP were used to stably transfect two different colon cancer lines, the parental line with a mutant K-ras gene and a variant where the mutant gene has been deleted. To screen for compounds that selectively kill only the tumorigenic variant with the mutant ras gene, both populations were mixed and simultaneously assayed for cell survival using fluorescence intensity measurements [109]. Compounds that preferentially affect the cells with the mutant ras allele could be identified. 4.4 Fluorescence Resonance Energy Transfer (FRET)
FRET has become a popular method to study protein dynamics in cells [110]. FRET is based on the transfer of energy absorbed by one fluorescent molecule to another, causing excitation of this second molecule. To achieve FRET, the two fluorescent molecules with different excitation/emission spectra need to come into close proximity of each other. The most general strategy for creating such indicators is to sandwich the conformationally sensitive domain of interest between two mutants of GFP. In case of ligand binding or modification of the domain, the two fluorescent moieties get into close proximity, which enables FRET. In a pioneering study, Mahajan et al. [111] used plasmid constructs that expressed the fusion proteins GFP-Bax and BFP-Bcl-2. Bax and Bcl-2 are two key modulators of apoptosis, but their direct interaction had then never been shown at the cellular level. Using FRET, the direct interaction of the two proteins in mitochondria could be demonstrated. FRET has also been applied to determine kinase activities in living cells. For this purpose, reporters were constructed, consisting of fusions of CFP, a phosphoamino acid-binding domain, a consensus substrate for the relevant kinase of interest and YFP. In cells transfected with the reporter constructs and stimulated
4 Spatio-temporal Assays and Subpopulation Analysis 29
for the particular kinase activity, the fusion protein after phosphorylation underwent a conformational change, and allowed FRET between the YFP and CFP. The change in the ratio of yellow to cyan emission could be quantified, and temporally as well as spatially determined. This technology has been successfully applied for protein kinase A [112], and the three tyrosine kinases Abl, Scr and EGF-receptor (EGF-R) [113]. In a similar approach, Honda et al. [114] demonstrated the use of a genetically encoded fluorescent indicator to investigate the dynamics of guanosine 3 ,5 -cyclic monophosphate (cGMP) in single cells. The cGMP indicator was constructed by bracketing deletion mutants of the cGMP-dependent protein kinase between cyan and yellow mutants of GFP. Binding of cGMP to the fusion protein indicator altered FRET and the ratio of cyan to yellow emissions selectively over cAMP. By using this method, the temporal and spatial dynamics of intracellular cGMP levels in rat fetal lung fibroblasts and in primary rat Purkinje neurons was monitored upon addition of different stimuli. Similar sensors for cAMP have also been published [115]. FRET-based sensors were also used to demonstrate the spatio-temporal behavior of GTB-binding proteins after cellular stimulation with growth factors [116]. A fusion protein was engineered which contained H-Ras, the Ras-binding domain of Raf, and was flanked by a pair of yellow and cyan mutants of GFP. H-Ras was replaced in a second construct by the analogue protein Rap1. Activation of the Ras/Rap pathway by growth factors induced FRET. Spatio-temporal localization of the fusion protein indicated that EGF stimulation of COS-1 cells activated Ras at the peripheral plasma membrane, but Rap-1 in the perinuclear region. In addition, this study showed that growth factors activate only a subpopulation of Ras and Rap, and only at very restricted areas in the cells. This technology in combination with cellular imaging will greatly increase the sensitivity and facilitate cellular HTS approaches for specific intracellular signaling events. 4.5 GPCR Activation using Bioluminescence Resonance Energy Transfer (BRET)
BRET is a naturally occurring process. The jelly fish A. victoria or the sea pansy Renilla reniformes create green fluorescence by BRET. In the former blue-lightemitting case, aequorin associates with GFP, whereas in the latter case, the energy generated by degradation of coelanterazine by luciferase is transferred to GFP. This transfer, however, only occurs if luciferase and GFP form a heterodimer. BRET has been used to demonstrate the interaction of circadian clock proteins from cyanobacteria expressed in E. coli [117]. More recently, BRET has been applied to study GPCR activation by several groups [118–120]. To demonstrate β 2 -adrenergic receptor (β 2 AR) dimerization, β 2 AR-Renilla luciferase and β 2 AR-red-shifted GFP (YFP) constructs were coexpressed in HEK-293 cells. Occurrence of BRET was an indication of receptor dimerization. In the same study, β-arrestin, which is known to associate with the intracellular domain of activated G protein-coupled receptors, was expressed as an arrestin-YFP construct together with β 2 AR-Renilla luciferase. BRET between the two proteins occurred, but was entirely dependent upon receptor
30 Cellular Assays and Their Application in Drug Discovery
activation and indicated the association of arrestin with the intracellular domain of the β 2 AR in the active state of the receptor. This assay has been developed as a HTS assay technology by PerkinElmer Life Sciences. 4.6 Protein Fragment Complementation Assays (PCA)
The methods described above to study protein-protein interactions are associated with certain shortcomings. The yeast two-hybrid system is based on transcription, which requires proteins to be available in the nucleus. In FRET, fluorescent proteins need to be expressed in cells at relatively high levels and the interaction must be such that the proteins are in close proximity to allow energy transfer. Enzyme complementation and its use in studying protein-protein interactions have been developed for E. coli β-galactosidase [121], dihydrofolate reductase (DHFR) [122] and most recently for β-lactamase [123]. In the case of β-galactosidase, two weakly complementing deletion mutants are each fused to the two proteins to be probed for interaction. Only if the two proteins interact are the two parts of the galactosidase forced to complement each other and exert enzymatic activity. The system has been developed by demonstrating FRAP and FKBP12 interaction in the presence of rapamycin. More recently, β-galactosidase complementation was used to establish a HTS system to screen for antagonists of EGF-mediated EGF-R dimerization [77, 124]. For this purpose, the complementing deletion mutants of β-galactosidase were fused to the intracellular domains of the EGF-R and cotransfected into C2C12 mouse myoblasts. Enzyme activity was reconstituted after adding EGF to the cells in culture. With this assay adapted to a 384-well format, primary hits identified in the EGF-R assay were screened for specificity with two complementing deletion mutants of β-galactosidase fused to human β 2 -adrenoceptor and β-arrestin-2. By the use of sensitive chemiluminescent and fluorescent galactosidase reagents, this technology enables the measurement of protein-protein interactions in living cells for HTS applications. The reassembly of active DHFR from rationally designed fragments was established by Pelletier et al. in a prokaryotic system [122]. E. coli DHFR is selectively inhibited by the antifolate trimethoprim. Murine DHFR can rescue E. coli cells. Complementing fragments of murine DHFR fused to homo- or heterodimerizing proteins were used to transform E. coli, which were then grown in the presence of trimethoprim. Survival was taken as a measure for protein interactions and enzyme complementation. Besides GCN4 leucine zipper fusions, raf-ras interaction, and rapamycin-induced interaction of FKBP and TOR2 were used in survival studies to demonstrate interaction and complementation. In another study, protein partners fused to complementing fragments of murine DHFR were introduced into DHFR-negative mammalian cells grown in the absence of nucleotides. Cells where the proteins associate and DHFR can complement are able to survive under these conditions [125]. As an alternative, protein interaction and DHFR complementation can be monitored by feeding cells fluorescein-conjugated methotrexate. Methotrexate only binds
5 Phenotypic Assays 31
to DHFR when the complementary fragments are coexpressed and reassembled. Fluorescence signals can then be monitored by fluorescence microscopy, FACS or by spectroscopical devices used in HTS. The FKBP-rapamycin-FRAP complex and the activation of the erythropoietin receptor were used as model systems [125]. Most recently, DHFR-PCA assays were used to analyze whole biochemical networks in living cells [78]. Cells expressing associating protein pairs were selected and fluorescein-conjugated methotrexate, which binds to reconstituted DHFR, was applied to localize interacting protein partners and to study inhibitors preventing the interaction. The same method has been successfully applied in plant protoplasts expressing complementary fragments of DHFR fused to interacting proteins. The binding of fluorescein-conjugated methotrexate as measure of protein–protein interaction was monitored by fluorescence microscopy [126].
5 Phenotypic Assays 5.1 Proliferation/Respiration/Toxicity
Many cytotoxicity and proliferation assays for mammalian cells rely on radioactivity and measure the incorporation of radiolabeled nucleotides such as [3 H]thymidine into cellular DNA. Although very sensitive, the [3 H]thymidine assay [127] is relatively expensive and labor intensive, and requires handling and disposal of the radioactive waste. Other cytotoxicity assays, such as MTT [3-(4,5-dimethylthiazol-2-yl)2,5-diphenyltetrazolium bromide] [128], Alamar Blue [129], lactate dehydrogenase [130] and ATP content [131], require the addition of specific reagents to generate a signal. Among those, the most commonly used is MTT [132]. This tetrazolium salt is reduced within the mitochondria of metabolically active cells to form a colored formazan dye precipitate. The broad utility of MTT is limited by the fact that it may be susceptible to interference from drugs, either by reacting with reducing groups on the drugs or by scatter of absorbance resulting from the precipitation of drugs absorbing light in the visible spectrum. Cell viability measurement by the MTT assay relies on its reaction with mitochondrial succinate dehydrogenase, which may itself perturb the cells. Furthermore, the test itself is not reversible and, because it is an endpoint reading, each time point reading requires a separate assay. It has to be pointed out that toxicity and proliferation assays are used separately and independently of each other as well as in combination. The combination of both assays allows a thorough investigation of a compounds effect on a particular cell line. There is a considerable need for assays that are homogeneous and reversible. The Oxygen Biosensor System (OBSTM ; BD Biosciences, Bedford, MA) can be seen as a prototype representative. Unlike dye conversion methods, there is no need to add any reagent in order to generate a signal. The OBS is a microplate-based assay platform that enables kinetic monitoring of dissolved oxygen. The oxygen-sensitive dye is embedded in a silicone matrix,
32 Cellular Assays and Their Application in Drug Discovery
16000
Relative fluorescence units
14000 12000 16µM Sodium azide
10000
80µM Sodium azide 400µM Sodium azide
8000
2000µM Sodium azide 10000µM Sodium azide
6000 4000 2000 0
3
30
50
73
102
126
Time after drug addition [h] Fig. 9 Oxygen consumption of HeLa cells after exposure to sodium azide. Cells (1 × 105 ), growing on beads, were seeded in individual wells of an oxygen biosensor plate and treated with sodium azide at the concentrations indicated. Cells were cultivated and fluorescence was measured at the times indicated. Low micromolar concentrations of sodium azide did not affect
the growth of HeLa cells, as judged from the increased oxygen consumption during cells growth. On the other hand, millimolar concentrations of sodium azide are toxic and the relative fluorescence is maintained at a constant (background) level. The decrease of oxygen consumption at 126 h after drug addition indicates that the cells became confluent and stopped proliferating.
which is permanently attached to the bottom of each well of a microplate. The gas-permeable matrix allows oxygen in the vicinity of the dye to be in equilibrium with the oxygen in the liquid medium. Oxygen quenches the ability of the dye to fluoresce in a predictable, concentration-dependent manner. Hence, as oxygen is depleted from a well (e.g. as cells grow and consume oxygen in the surrounding medium), the concentration of oxygen in the matrix decreases and the fluorescence increases. The amount of fluorescence correlates directly to oxygen consumption in the well. This, in turn, relates to the number or relative viability of cells in this particular well. Any reaction that can be linked to oxygen consumption or generation can potentially be measured by using this system (Fig. 9). The signal is dynamic and is read in real-time. Samples may be read as frequently and as many times as desired. The signal is completely reversible, and will rise as the cells proliferate and then fall as they die, giving a more complete picture of proliferation and toxicological kinetics. 5.2 Apoptosis
Apoptosis, or programmed cell death, is one of the most complex cellular processes. Early and late events occurring in cells undergoing apoptosis comprise activation
5 Phenotypic Assays 33
of different caspases, changes in the cytoskeleton, changes of the mitochondrial membrane potential and mitochondrial mass [133], nuclear condensation or fragmentation [134], flipping of phosphatidylserine to the outer membrane layer and blebbing of the plasma membrane. Changes in nuclear morphology, actin reorganization, and mitochondrial potential and mass can be detected and quantified by a multiparameter apoptosis assay [135]. Increase or shrinkage of the nuclear area and chromatin distribution within the nucleus is analyzed by staining cells with a fluorescent DNA dye, such as DAPI or Hoechst 33342. Bundling of intracellular F-actin filaments and increase of the actin content in apoptotic cells [136, 137] is detected by the use of fluorescent phalloidin. This phallotoxin, isolated from the mushroom Amanita phalloides, binds highly specifically to F-actin and can be labeled with any fluorescent dye. However, increase in F-actin content is not observed in all cases of apoptotic cells, depending on the cell type or inducer [138, 139]. The mitochondrionR Red, which accumulates in active mitochondria specific cationic dye Mitotracker is used to measure mitochondrial membrane potential and mass. Apoptotic cells can exhibit loss of mitochondrial membrane potential or increase of total mitochondrial mass [140]. In both cases, the quantity of mitochondrial stain is a measure for the degree of apoptosis. In the case of the multiparameter apoptosis application the ArrayScan II algorithm (see Section 4.2) not only quantifies the label, but also takes into account the distribution of fluorescent dyes within cellular compartments, and calculates nuclear fragmentation and mitochondrial mass increase. This algorithm has been successfully applied in detection of compound-induced apoptosis [135] and can be used for high-throughput compound screening. 5.3 Differentiation
Human stem cells of various types have the ability to differentiate into a number of mature cell types [141], and such stem cells are currently used in bone marrow transplants and in drug research. New drugs such as erythropoietin and granulocyte colony-stimulating factor have been developed that act directly on such cells, and are essential to the success of bone marrow transplantation [142]. The expansion of bone marrow progenitors is extremely sensitive to a wide range of chemotherapeutics and other toxic agents. In some cases, the toxic side-effects of drugs are specific to particular hematopoietic lineages [143]. In the past, use of hematopoietic progenitors for high-throughput drug discovery and toxicology screening was limited by tissue availability. Given the current availability of high-quality primary human hematopoietic progenitors, the current bottleneck in the high-throughput use of these cells is the assay of their differentiation phenotypes. The most commonly used in vitro assay in order to determine the differentiation response of hematopoietic progenitor cells is the colony assay [144]. The colony assay uses selected progenitor populations from human bone marrow, umbilical cord blood or mobilized peripheral blood in clonogenic assays in a semi-solid medium. Colonies of differentiating blood cells appear after a period of 14 days and
34 Cellular Assays and Their Application in Drug Discovery
are counted manually. Different types of colonies are identified by their morphology (e.g. myeloid and erythroid) or cytochemical staining techniques (megakaryocytes). Although the parameters of these assays have been optimized to obtain quantitative data, the extremely low throughput and subjective nature of this assay cause some problems when it is used for chemical compound library screening, early drug discovery and high-throughput toxicity screening. Differentiative responses can also be assayed by the transfection of a target cell with a receptor gene tied to the expression of a differentiation-specific marker [145]. It requires, however, the transfection of a cell containing the many intact signal transduction cascades characteristic of hematopoietic progenitors. Another approach was described by Warren et al. [146]. The authors described a cell-based immunoassay in which the cells were transferred to a fixative-treated assay plate, and fixed in place prior to incubations with primary and secondary antibodies. Although this assay has been used successfully, transfer of cells from the culture plate to the assay plate is time-consuming and not amenable to HTS. Recently [147], this assay design was revised and improved in order to reduce the amount of manipulations required. On termination of the culture period, cell culture supernatant is removed via vacuum filtration and a labeled specific antibody is added. In order to obtain the sensitivity required, the monoclonal antibodies were conjugated to a DELFIA europium chelate [148]. In addition to its sensitivity, the relatively long emission time of the DELFIA europium chelate allowed the use of time-resolved fluorescence (TRF) to avoid the endogenous background fluorescence that is often a significant problem in biological assays. Removal of the cell culture supernatant, unbound antibodies and subsequent wash steps are performed by using a 96-well filter plate. Because there is little carryover from one wash cycle to the next, removal of unbound antibody is very efficient. Repeated washes without cell loss are possible. For signal detection, an enhancement solution was added and europium fluorescence (excitation at 340 nm and emission at 615 nm) was measured over a 400-µs time period after an initial delay of 400 µs using a suitable spectrofluorometer.
5.4 Monitoring Cell Metabolism
Direct determination of metabolites at the cellular or in vivo level provides a sensitive system for assessing drug action because metabolites represent end products of cellular regulatory processes. In addition, intracellular metabolites are generally far less abundant than proteins. In a cellular environment, signal transduction, initiation of gene expression, protein synthesis and metabolic responses to a stress factor must be essentially sequential. The fields of metabolomics – which investigates metabolic regulation and fluxes in individual cells or cell types – and metabonomics – the determination of systemic biochemical profiles and regulation of function in whole organisms by analyzing biofluids and tissues – have therefore been developed recently [15, 149].
5 Phenotypic Assays 35
Studying the effects of drugs on cells or whole organisms by metabonomics relies on multiparametric measurements of alterations in metabolism over time in response to a stress factor. Samples are mapped according to their biochemical composition [150]. Metabolites proved particularly useful in the characterization of mutant phenotypes as shown recently in plants [151] and in the characterization of so-called “silent mutations” in yeast [6]. Many genes are “silent”, which means that if mutated they do not lead to an easily visible phenotypic change, such as alterations in growth rate. The growth rate may not be changed because the concentrations of different intracellular metabolites are adapted in a way that compensates for the effect of the mutation. Accordingly, metabolome analysis can reveal a phenotype at the level of metabolites in many mutants previously scored as silent. In addition, quantifying the relative changes of metabolite concentrations caused by a mutation makes it possible to identify the site of action of the altered gene product as demonstrated by Raamsdonk et al. [6]. By using nuclear magnetic resonance or mass spectrometry (MS)/liquid chromatography/MS applications, a reasonable sample throughput is possible. This will allow the use of metabolome and metabonome analysis in drug discovery as a sensitive measure of drug action [15]. An emerging approach, which allows the analysis of the metabolic state in intact cells, is the functional online analysis of living cells in physiologically controlled environments for extended periods of time (“biochips”). Efforts are being directed towards the parallel development, adaptation and integration of different microelectronic sensors into miniaturized biochips for a multiparametric cellular monitoring (“Cell Monitoring System” [22]). Some possible readouts and the corresponding cellular activities are summarized in Tab. 4. Monitoring of cellular oxygen exchange [152] can be employed for the assessment of mitochondrial activities. Other sinks or sources of oxygen do not contribute significantly in most cell types. Mitochondria, apart from supplying the cell with energy, are suggested to be involved in cell ageing accelerated by reactive oxygen species, in apoptosis and in the toxic mechanisms of numerous drugs. Monitoring mitochondrial activity by lipophilic dyes such as rhodamine 123 suffers from the
Table 4 Readouts for multiparametric cell monitoring systems
Readout
Corresponding cellular activity
Changes in oxygen level Acidification of the medium Changes in impedance
mitochondrial activity (energy metabolism; cell ageing; apoptosis) global indicator of metabolic activity; fast changes: cellular activation events; slow changes: cell growth or death cell adhesion: matrix attachment, tumor metastasis; cell outgrowth, e.g. neurite differentiation influx and effluxofCa2+ , Na+ , K+ e.g. glucose uptake; lactate excretion e.g. induction of reporter gene expression coupled to the generation of bioluminescence
Ion flow Metabolites Light emission
36 Cellular Assays and Their Application in Drug Discovery
risk of potential artifacts, and therefore, nontoxic and noninvasive techniques are particularly valuable. The OBS has been described above as one of the first noninvasive techniques to monitor oxygen consumption. One limitation of the OBS, however, is the requirement for a large number of cells. The system needs at least 50,000 cells per well in order to produce viable results (less cells do not consume enough oxygen in order to (over)compensate the diffusion of oxygen back from the outside environment to the oxygen biosensor). In comparison, microsensor-based technologies are so small in scale that, under ideal circumstances, signals can be obtained from a single cell. Microsensor-based pH recording [153] is predominantly used for determinations of extracellular acidification rates. Several metabolic pathways contribute to extracellular acidification and, thus, can be regarded as a global indicator of metabolic activity. In this sense, fast changes (within minutes) tend to reflect cellular activation events (e.g. receptor-mediated signaling), while slow changes (several hours) are usually attributable to cell growth or cell death [154], respectively. Monitoring of cell adhesion (e.g. tumor metastasis) or outgrowth (e.g. for neurites) can be accomplished by growing the cells directly on pairs of interdigitated electrodes. The cellular impedance signal results from insulation by the cell membranes. If cells are placed on the electrodes, they block the current flow in a passive way and the impedance increases. Therefore, the impedance of the system gives insight into the adhesive behavior of the cells. This methodology is based on on-line data acquisition using multiparametric microsensor devices. A fluid system for the supply of culture media/drug solutions and for the realization of metabolic measurements is connected to the chip. The precise maintenance of in vitro microenvironmental conditions is also necessary for the cellular specimen. The cellular parameters currently amenable by sensor chip readings are metabolic activity (rates of extracellular acidification and cellular respiration), morphologic properties and some ion fluxes. Small-scale sensor chip areas require very small amounts of cellular specimen. This may allow the utilization use of primary cells. Methods have been developed which allow testing of dozens of compounds, but 96-well formats might become available soon [22]. 5.5 Other Phenotypic Assays
There is an increased need for real-time, label-free observations of proteins and cells under their natural (unperturbed) conditions due to some limitations associated with the currently existing technologies. A major drawback of today’s technologies is the need to label and overexpress the protein of interest. Due to its overexpression and modification, the protein’s biochemical activity and localization may not reflect the natural (unperturbed) conditions, and therefore the results have to be interpreted with caution. One technology that addresses those issues uses radiofrequency waves [21] to probe the physiological state of a protein in the context of the whole cell. MCS uses microwaves, which correspond to the natural frequency of
References
protein dynamics, allowing the measurement of electrodynamic properties of proteins and cells. MCS does not use any tags or markers and may be performed in any environment, from simple aqueous solutions to very complex mixtures (i.e. cells), thereby allowing a direct approach to determine changes in cellular phenotypes (i.e. differentiation). Isolated proteins in physiological buffers exposed to a range of microwave frequencies, demonstrate two properties: the spectral response (the proteins “signature”) and the biophysical profile. The first of these determines the raw response of each protein over a range of frequencies. The biophysical profile describes the way in which a protein interacts with the environment – molecular volume, complexation of water, interaction with other compounds/proteins – as well as how it compares to other proteins belonging to similar classes. By scanning a range of frequencies and evaluating different environments (e.g. buffer conditions, presence of cofactors), this analysis defines a multidimensional signature to each protein even in complex (cellular) environments. This would allow the development of cell-based assays where the biophysical profile of target proteins can be analyzed in an environment that closely mimics the situation in primary disease tissues.
Acknowledgments
We thank Dr. Guy Marais for carefully reading and reviewing the manuscript.
References 1 TAN D. S. 2002. Sweet surrender to chemical genetics. Nat Biotechnol 20, 561–3. 2 STEPHANOPOULOS G., J. KELLEHER. 2001. Biochemistry. How to make a superior cell. Science 292, 2024–5. 3 HANAHAN D., R. A. WEINBERG. 2000. The hallmarks of cancer. Cell 100, 57–70. 4 SOMOGYI R., L. D. GRELLER. 2001. The dynamics of molecular networks: applications to therapeutic discovery. Drug Discov Today 6, 1267–77. 5 GLASSBROOK N., C. BEECHER, J. RYALS. 2000. Metabolic profiling on the right path. Nat Biotechnol 18, 1142–3. 6 RAAMSDONK L. M., B. TEUSINK, D. BROADHURST, N. ZHANG, A. HAYES, M. C. WALSH, J. A. BERDEN, K. M. BRINDLE, D. B. KELL, J. J. ROWLAND, H. V. WESTERHOFF, K. van DAM, S. G. OLIVER. 2000. A functional genomics strategy
7
8
9
10
11
12
that uses metabolome data to reveal the phenotype of silent mutations. Nat Biotechnol 19, 45–50. GIFFORD D. K. 2002. Blazing pathways through genetic mountains. Science 293, 2049–51. KARP P. D. 2001. Pathway databases: a case study in computational symbolic theories. Science 293, 2040–4. WHITE M. A. 1996. The yeast two-hybrid system: forward and reverse. Proc Natl Acad Sci USA 93, 10001–3. ITO T., T. CHIBA, M. YOSHIDA. 2001. Exploring the protein interactome using comprehensive two-hybrid projects. Trends Biotechnol 19, S23–7. TUCKER C. L., S. FIELDS. 2001. A yeast sensor of ligand binding. Nat Biotechnol 19, 1042–6. OSBORNE M. A., S. DALTON, J. P. KOCHAN. 1995. The yeast tribrid system – genetic
37
38 Cellular Assays and Their Application in Drug Discovery
13
14
15
16
17
18
19
20
21
22
23
detection of trans-phosphorylated ITAM–SH2-interactions. Biotechnology 13, 1474–8. STRAUB B., E. MEYER, P. FROMHERZ. 2001. Recombinant maxi-K channels on transistor, a prototype of ionoelectronic interfacing. Nat Biotechnol 19, 121–124. CELIS J. E., M. KRUHOFFER, I. GROMOVA, C. FREDERIKSEN, M. OSTERGRAAD, T. THYKJAER, P. GROMOV, J. VU, H. PALSDOTTIR , N. MAGNUSSON, T. F. ` ORTNTOFT. 2000. Gene expression profiling: monitoring transcription and translation products using DNA micro-arrays and proteomics. FEBS Lett 480, 2–16. NICHOLSON J. K.g, J. CONNELLY, C. LINDON, E. HOLMES. 2002. Metabonomics: a platform for studying drug toxicity and gene function. Nat Rev Drug Discov 1, 153–61. Broach JR, J. THORNER. 1996. High throughput screening for drug discovery. Nature 384, 14–6. GOETZ A. S., J. L. ANDREWS, T. R. LITTLETON, D. M. IGNAR. 2000. Development of a facile method for high throughput screening with reporter gene assays. J Biomol Screen 5, 377–84. REMINGTON S. J. 2002. Negotiating the speed bumps to fluorescence. Nat Biotechnol 20, 28–9. SCHAUFELE F. 2001. Shedding light on molecular timing. Nat Biotechnol 19, 21–2. MITCHELL P. 2001. Turning the spotlight on cellular imaging. Nat Biotechnol 19, 1013–7. HEFTI J. P., A. KUMAR. 1999. Sensitive detection method of dielectric dispersions in aqueous based, surface bound macromolecular structures using microwave spectroscopy. Appl Phys Lett 75, 1802–4. BAUMANN W., M. LEHMANN, M. BITZENHOFER, A. SCHWINDE, M. BIRSCHWEIN, R. EHRET, B. WOLF. 1999. Microelectronic sensor system for microphysiological application on living cells. Sensors and Acuators B 55, 77–89. SCHLESSINGER J. 1993. Cellular signaling by receptor tyrosine kinases. Harvey Lect 89, 105–23.
24 SULLIVAN E., E. M. TUCKER, I. L. DALE. 1999. Measurement of [Ca2+ ] using the Fluorometric Imaging Plate Reader (FLIPR). Methods Mol Biol 114, 125–33. 25 MILLIGAN G., S. REES. 1999. Chimaeric G alpha proteins: their potential use in drug discovery. Trends Pharmacol Sci 20, 118–24. 26 NEER E. J. 1995. Heterotrimeric G proteins: organizers of transmembrane signals. Cell 80, 249–57. 27 ZABIN I. 1982. A model of protein interaction. Mol Cell Biochem 49, 87–96. 28 DEAN M., Y. HAMON, G. CHIMINI. 2001. The human ATP-binding cassette (ABC) transporter super-family. J Lipid Res 42, 1007–17. 29 BRINKMANN U., M. EICHELBAUM. 2001. Polymorphisms in the ABC drug transporter gene MDR1. Pharmacogenomics 1, 59–64. 30 HWANG L. Y., P. T. LIEU, P. A. PETERSON, Y. YANG. 2001. Functional regulation of immunoproteasomes and transporter associated with antigen processing. Immunol Res 24, 245–72. 31 KO Y. H., P. L. PEDERSEN. 2001. Cystic fibrosis: a brief look at some highlights of a decade of research focused on elucidating and correcting the molecular basis of the disease. J Bioenerg Biomembr 33, 513–21. 32 OUELLETTE M., D. LEGARE, B. PAPADOPOULOU. 2001. Multidrug resistance and ABC transporters in parasitic protozoa. J Mol Microbiol Biotechnol 3, 201–6. 33 SCHMIDT C., M. MAYER, H. VOGEL. 2000. A chip-based biosensor for the functional analysis of single ion channels. Angew Chem Int 39, 3137–40. 34 XU J. Z., X. WANG, B. ENSIGN, M. LI, L. WU, A. GUIA, J. XU. 2001. Ion-channel assay technologies: quo vadis? Drug Discov Today 6, 1278–87. 35 BAXTER D. F., M. KIRK, A. F. GARCIA, A. RAIMONDI, M. H. HOLMQVIST, K. K. FLINT, D. BOJANIC, P. S. DISTEFANO, R. CURTIS, Y. XIE. 2002. A novel membrane potential-sensitive fluorescent dye improves cell-based assays for ion channels. J Biomol Screen 7, 79–85.
References 36 WHITEAKER K. L., S. M. GOPALAKRISHNAN, D. GROEBE, C. C. SHIEH, U. WARRIOR, D. J. BURNS, M. J. COGHLAN, V. E. SCOTT, M. GOPALAKRISHNAN. 2001. Validation of FLIPR membrane potential dye for high throughput screening of potassium channel modulators. J Biomol Screen 6, 305–12. 37 SARVER J. G., W. A. KLIS, J. P. BYERS, P. W. ERHARDT. 2002. Microplate screening of the differential effects of test agents on Hoechst 33342, rhodamine 123, and rhodamine 6G accumulation in breast cancer cells that overexpress P-glycoprotein. J Biomol Screen 7, 29–34. 38 EPSTEIN C. B., R. A. BUTOW. 2000. Microarray technology–enhanced versatility, persistent challenge. Curr Opin Biotechnol 11, 36–41. 39 HUGHES T. R., M. J. MARTON, A. R. JONES, C. J. ROBERTS, R. STOUGHTON, C. D. ARMOUR, H. A. BENNETT, E. COFFEY, H. DAI, Y. D. HE, M. J. KIDD, A. M. KING, M. R. MEYER, D. SLADE, P. Y. LUM, S. B. STEPANIANTS, D. D. SHOEMAKER, D. GACHOTTE, K. CHAKRABURTTY, J. SIMON, M. BARD, S. H. FRIEND. 2000. Functional discovery via a compendium of expression profiles. Cell 102, 109–26. 40 WEINBERGER S. R., E. A. DALMASSO, E. T. FUNG. 2002. Current achievements using ProteinChip Array technology. Curr Opin Chem Biol 6, 86–91. 41 GOETZ A. S., J. LIACOS, J. YINGLING, D. M. IGNAR. 1999. A combination assay for simultaneous assessment of multiple signaling pathways. J Pharmacol Toxicol Methods 42, 225–35. 42 FREEMAN T. 2000. High throughput gene expression screening: its emerging role in drug discovery. Med Res Rev 20, 197–202. 43 PANDEY A., M. MANN. 2000. Proteomics to study genes and genomes. Nature 405, 837–46. 44 BRIVANLOU A. H., J. E. DARNELL, Jr. 2002. Signal transduction and the control of gene expression. Science 295, 813–8. 45 NAYLOR L. H. 1999. Reporter gene technology: the future looks bright. Biochem Pharmacol 58, 749–57.
46 TERSTAPPEN G. C., A. GIACOMETTI, E. BALLINI, L. ALDEGHERI. 2000. Development of a functional reporter gene HTS assay for the identification of mGluR7 modulators. J Biomol Screen 5, 255–62. 47 XING H., H. C. TRAN, T. E. KNAPP, P. A. NEGULESCU, B. A. POLLOK. 2000. A fluorescent reporter assay for the detection of ligands acting through Gi protein-coupled receptors. J Recept Signal Transduct Res 20, 189–210. 48 DUROCHER Y., PERRET S., THIBAUDEAU E., GAUMOND M. H., KAMEN A., STOCCO R., ABRAMOVITZ M. 2000. A reporter gene assay for high-throughput screening of G-protein-coupled receptors stably or transiently expressed in HEK293 EBNA cells grown in suspension culture. Anal Biochem 284, 316–26. 49 DAVIS G. F., T. R. DOWNS, J. A. FARMER, C. R. PIERSON, J. T. ROESGEN, E. J. CABRERA, S. L. NELSON. 2002. Comparison of high throughput screening technologies for luminescence cell-based reporter screens. J Biomol Screen 7, 67–77. 50 WILSON V. S., K. BOBSEINE, C. R. LAMBRIGHT, L. E. GRAY, Jr. 2002. A novel cell line, mda-kb2, that stably expresses an androgen- and glucocorticoid-responsive reporter for the detection of hormone receptor agonists and antagonists. Toxicol Sci 66, 69–81. 51 ZHANG J. H., T. D. CHUNG, K. R. OLDENBURG. 1999. A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen 4, 67–73. 52 REES S., D. P. MARTIN, S. V. SCOTT, S. H. BROWN, N. FRASER, C. O’SHAUGHNESSY, I. J. BERESFORD. 2001. Development of a homogeneous MAP kinase reporter gene screen for the identification of agonists and antagonists at the CXCR1 chemokine receptor. J Biomol Screen 6, 19–27. 53 POMEROY S. L., P. TAMAYO, M. GAASENBEEK, L. M. STURLA, M. ANGELO, M. E. MCLAUGHLIN, J. Y. KIM, L. C. GOUMNEROVA, P. M. BLACK, C. LAU, J. C. ALLEN, D. ZAGZAG, J. M. OLSON, T.
39
40 Cellular Assays and Their Application in Drug Discovery
54
55
56
57
58
59
60
CURRAN, C. WETMORE, J. A. BIEGEL, T. POGGIO, S. MUKHERJEE, R. RIFKIN, A. CALIFANO, G. STOLOVITZKY, D. N. LOUIS, J. P. MESIROV, E. S. LANDER, T. R. GOLUB. 2002. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415, 436–42. IDEKER T., V. THORSSON, J. A. RANISH, R. CHRISTMAS, J. BUHLER, J. K. ENG, R. BUMGARNER, D. R. GOODLETT, R. AEBERSOLD, L. HOOD. 2001. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292, 929–34. ULRICH R. G., S. H. FRIEND. 2002. Toxicogenomics and drug discovery: will new technologies help us produce better drugs? Nat Rev Drug Discov 1, 84–5. NUWAYSIR E. F., M. BITTNER, J. TRENT, J. C. BARRETT, C. A. AFSHARI. 1999. Microarrays and toxicology: the advent of toxicogenomics. Mol Carcinog 24, 153–9. ROTHBERG B. E.. 2001. The use of animal models in expression pharmacogenomic analyses. Pharma-cogenomics 1, 48–58. SHIPP M. A., K. N. ROSS, P. TAMAYO, A. P. WENG, J. L. KUTOK, R. C. AGUIAR, M. GAASENBEEK, M. ANGELO, M. REICH, G. S. PINKUS, T. S. RAY, M. A. KOVAL, K. W. LAST, A. NORTON, T. A. LISTER, J. MESIROV, D. S. NEUBERG, E. S. LANDER, J. C. ASTER, T. R. GOLUB. 2002. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 8, 68–74. VAN’T VEER L. J., H. DAI, M. J. van de VIJVER, Y. D. HE, A. A. HART, M. MAO, H. L. PETERSE, K. K. van DER, M. J. MARTON, A. T. WITTEVEEN, G. J. SCHREIBER, R. M. KERKHOVEN, C. ROBERTS, P. S. LINSLEY, R. BERNARDS, S. H. FRIEND. 2002. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–6. THOMAS R. S., D. R. RANK, S. G. PENN, G. M. ZASTROW, K. R. HAYES, K. PANDE, E. GLOVER, T. SILANDER, M. W. CRAVEN, J. K. REDDY, S. B. JOVANOVICH, C. A. BRADFIELD. 2001. Identification of toxicologically predictive gene sets using
61
62
63
64
65
66
67
68
69
cDNA microarrays. Mol Pharmacol 60, 1189–94. BARTOSIEWICZ M. J., D. JENKINS, S. PENN, J. EMERY, A. BUCKPITT. 2001. Unique gene expression patterns in liver and kidney associated with exposure to chemical toxicants. J Pharmacol Exp Ther 297, 895–905. BURCZYNSKI M. E., M. MCMILLIAN, J. CIERVO, L. LI, J. B. PARKER, R. T. DUNN, S. HICKEN, S. FARR, M. D. JOHNSON. 2000. Toxicogenomics-based discrimination of toxic mechanism in HepG2 human hepatoma cells. Toxicol Sci 58, 399–415. WARING J. F., R. CIURLIONIS, R. A. JOLLY, M. HEINDEL, R. G. ULRICH. 2001. Microarray analysis of hepatotoxins in vitro reveals a correlation between gene expression profiles and mechanisms of toxicity. Toxicol Lett 120, 359–68. FARR S. B., M. D. TODD. 1998. Methods and kits for eukaryotic gene profiling. President and Fellows of Harvard College (Cambridge, MA, Xenometrix, Inc.). US Patent 5,811,231. ALBANO C. L., W. E. BENTLEY, G. RAO. 2001. High throughput studies of gene expression using green fluorescent protein–oxidative stress promoter probe constructs. J Biomol Screen 6, 421–8. ZHU M., W. E. FAHL. 2000. Development of a green fluorescent protein microplate assay for the screening of chemopreventive agents. Anal Biochem 287, 210–7. ZHAN Q., F. CARRIER, A. J. FORNACE, Jr. 1993. Induction of cellular p53 activity by DNA-damaging agents and growth arrest. Mol Cell Biol 13, 4242–50. DELESCLUSE C., N. LEDIRAC, R. LI, M. P. PIECHOCKI, R. N. HINES, X. GIDROL, R. RAHMANI. 2001. Induction of cytochrome P450 1A1 gene expression, oxidative stress, and genotoxicity by carbaryl and thiabendazole in transfected human HepG2 and lymphoblastoid cells. Biochem Pharmacol 61, 399–407. WATKINS R. E., G. B. WISELY, L. B. MOORE, J. L. COLLINS, M. H. LAMBERT, S. P. WILLIAMS, T. M. WILLSON, S. A. KLIEWER, M. R. REDINBO. 2001. The
References
70
71
72
73
74
75
76
77
78
human nuclear xenobiotic receptor PXR: structural determinants of directed promiscuity. Science 292, 2329–33. BERTELSEN A. H., V. E. VELCULENSCU. 1998. High-throughput gene expression analysis using SAGE. Drug Discov Today 3, 152–9. SHIMKETS R. A., D. G. LOWE, J. T. TAI, P. SEHL, H. JIN, R. YANG, P. F. PREDKI, B. E. ROTHBERG, M. T. MURTHA, M. E. ROTH, S. G. SHENOY, A. WINDEMUTH, J. W. SIMPSON, J. F. SIMONS, M. P. DALEY, S. A. GOLD, M. P. MCKENNA, K. HILLAN, G. T. WENT, J. M. ROTHBERG. 1999. Gene expression analysis by transcript profiling coupled to a gene database query. Nat Biotechnol 17, 798–803. SUTCLIFFE J. G., P. E. FOYE, M. G. ERLANDER, B. S. HILBUSH, L. J. BODZIN, J. T. DURHAM, K. W. HASEL. 2000. TOGA: an automated parsing technology for analyzing expression of nearly all genes. Proc Natl Acad Sci USA 97, 1976–81. HEID C. A., J. STEVENS, K. J. LIVAK, P. M. WILLIAMS. 1996. Real time quantitative PCR. Genome Res 6, 986–94. TODD M. D., X. LIN, L. F. STANKOWSKI, Jr., M. DESAI, G. H. WOLFGANG. 1999. Toxicity screening of a combinatorial library: correlation of cytotoxicity and gene induction to compound structure. J Biomol Screen 4, 259–68. WARRIOR U., Y. FAN, C. A. DAVID, J. A. WILKINS, E. M. MCKEEGAN, J. L. KOFRON, D. J. BURNS. 2000. Application of QuantiGene nucleic acid quantification technology for high throughput screening. J Biomol Screen 5, 343–52. CONRAD M. J.. 1999. Measurement of specific gene transcripts in 96-well plate assays. Genet Engng News 19. GRAHAM D. L., N. BEVAN, P. N. LOWE, M. PALMER, S. REES. 2001. Application of beta-galactosidase enzyme complementation technology as a high throughput screening format for antagonists of the epidermal growth factor receptor. J Biomol Screen 6, 401–11. REMY I., S. W. MICHNICK. 2001. Visualization of biochemical networks in living cells. Proc Natl Acad Sci USA 98, 7678–83.
79 GRIFFIN B. A., S. R. ADAMS, R. Y. TSIEN. 1998. Specific covalent labeling of recombinant protein molecules inside live cells. Science 281, 269–72. 80 KAIN S. R.. 1999. Green fluorescent protein (GFP): applications in cell-based assays for drug discovery. Drug Discov Today 4, 304–12. 81 MATZ M. V., A. F. FRADKOV, Y. A. LABAS, A. P. SAVITSKY, A. G. ZARAISKY, M. L. MARKELOV, S. A. LUKYANOV. 1999. Fluorescent proteins from nonbioluminescent Anthozoa species. Nat Biotechnol 17, 969–73. 82 EDWARDS B. S., F. W. KUCKUCK, E. R. PROSSNITZ, J. T. RANSOM, L. A. SKLAR. 2001. HTPS flow cytometry: a novel platform for automated high throughput drug discovery and characterization. J Biomol Screen 6, 83–90. 83 TAYLOR D. L., E. S. WOO, K. A. GIULIANO. 2001. Real-time molecular and cellular analysis: the new frontier of drug discovery. Curr Opin Biotechnol 12, 75–81. 84 BRUNET A., D. ROUX, P. LENORMAND, S. DOWD, S. KEYSE, J. POUYSSEGUR. 1999. Nuclear translocation of p42/p44 mitogen-activated protein kinase is required for growth factor-induced gene expression and cell cycle entry. EMBO J 18, 664–74. 85 DALAL S. N., C. M. SCHWEITZER, J. GAN, J-A DECAPRIO. 1999. Cytoplasmic localization of human cdc25C during interphase requires an intact 14-3-3 binding site. Mol Cell Biol 19, 4465–79. 86 LIU F., C. ROTHBLUM-OVIATT, C. E. RYAN, H. PIWNICA-WORMS. 1999. Overproduction of human Myt1 kinase induces a G2 cell cycle delay by interfering with the intracellular trafficking of Cdc2–cyclin B1 complexes. Mol Cell Biol 19, 5113–23. 87 VOGT A., T. ADACHI, A. P. DUCRUET, J. CHESEBROUGH, K. NEMOTO, B. I. CARR, J. S. LAZO. 2001. Spatial analysis of key signaling proteins by high-content solid-phase cytometry in Hep3B cells treated with an inhibitor of Cdc25 dual-specificity phosphatases. J Biol Chem 276, 20544–50.
41
42 Cellular Assays and Their Application in Drug Discovery
88 CHADEE D. N., M. J. HENDZEL, C. P. TYLIPSKI, C. D. ALLIS, D. P. BAZETT-JONES, J. A. WRIGHT, J. R. DAVIE. 1999. Increased Ser-10 phosphorylation of histone H3 in mitogen-stimulated and oncogenetransformed mouse fibroblasts. J Biol Chem 274, 24914–20. 89 BOGGS B. A., C. D. ALLIS, A. C. CHINAULT. 2000. Immunofluorescent studies of human chromosomes with antibodies against phosphorylated H1 histone. Chromosoma 108, 485–90. 90 PEREZ O. D., G. P. NOLAN. 2002. Simultaneous measurement of multiple active kinase states using polychromatic flow cytometry. Nat Biotechnol 20, 155–62. 91 MAYER T. U., T. M. KAPOOR, S. J. HAGGARTY, R. W. KING, S. L. SCHREIBER, T. J. MITCHISON. 1999. Small molecule inhibitor of mitotic spindle bipolarity identified in a phenotype-based screen. Science 286, 971–4. 92 BOGGS B. A., B. CONNORS, R. E. SOBEL, A. C. CHINAULT, C. D. ALLIS. 1996. Reduced levels of histone H3 acetylation on the inactive X chromosome in human females. Chromosoma 105, 303–9. 93 HENDZEL M. J., M. J. KRUHLAK, D. P. BAZETT-JONES. 1998. Organization of highly acetylated chromatin around sites of heterogeneous nuclear RNA accumulation. Mol Biol Cell 9, 2491–507. 94 YONEDA Y. 2000. Nucleocytoplasmic protein traffic and its significance to cell function. Genes Cells 5, 777–87. 95 HOWELL J. L., R. TRUANT. 2002. Live-cell nucleocytoplasmic protein shuttle assay utilizing laser confocal microscopy and FRAP. Biotechniques 32, 80–7. 96 DING G. J., P. A. FISCHER, R. C. BOLTZ, J. A. SCHMIDT, J. J. COLAIANNE, A. GOUGH, R. A. RUBIN, D. K. MILLER. 1998. Characterization and quantitation of NF-kappaB nuclear translocation induced by interleukin-1 and tumor necrosis factor-alpha. Development and use of a high capacity fluorescence cytometric system. J Biol Chem 273, 28897–905. 97 De WIT R., C. M. HENDRIX, J. BOONSTRA, A. J. VERKLEU, J. A. POST. 2000. Large-scale screening assay to measure
98
99
100
101
102
103
104
105
epidermal growth factor internalization. J Biomol Screen 5, 133–40. PRASHER D. C., V. K. ECKENRODE, W. W. WARD, F. G. PRENDERGAST, M. J. CORMIER. 1992. Primary structure of the Aequorea victoria green-fluorescent protein. Gene 111, 229–33. CODY C. W., D. C. PRASHER, W. M. WESTLER, F. G. PRENDERGAST, W. W. WARD. 1993. Chemical structure of the hexapeptide chromophore of the Aequorea green-fluorescent protein. Biochemistry 32, 1212–8. BARAK L. S., S. S. FERGUSON, J. ZHANG, M. G. CARON. 1997. A beta-arrestin/ green fluorescent protein biosensor for detecting G protein-coupled receptor activation. J Biol Chem 272, 27497–500. OAKLEY R. H., S. A. LAPORTE, J. A. HOLT, M. G. CARON, L. S. BARAK. 2000. Differential affinities of visual arrestin, beta arrestin1, and beta arrestin2 for G protein-coupled receptors delineate two major classes of receptors. J Biol Chem 275, 17201–10. GROARKE D. A., S. WILSON, C. KRASEL, G. MILLIGAN. 1999. Visualization of agonist-induced association and trafficking of green fluorescent protein-tagged forms of both beta-arrestin-1 and the thyrotropin-releasing hormone receptor-1. J Biol Chem 274, 23263–9. GROARKE D. A., T. DRMOTA, D. S. BAHIA, N. A. EVANS, S. WILSON, G. MILLIGAN. 2001. Analysis of the C-terminal tail of the rat thyrotropin-releasing hormone receptor-1 in interactions and cointernalization with beta-arrestin 1-green fluorescent protein. Mol Pharmacol 59, 375–85. GHOSH R. N., Y. T. CHEN, R. DEBIASI, R. L. DEBIASIO, B. R. CONWAY, L. K. MINOR, K. T. DEMAREST. 2000. Cell-based, high-content screen for receptor internalization, recycling and intracellular trafficking. Biotechniques 29, 170–5. CONWAY B. R., L. K. MINOR, J. Z. XU, J. W. GUNNET, R. DEBIASIO, M. R. D’ANDREA, R. RUBIN, R. DEBIASIO, K. GIULIANO, L. DEBIASIO, K. T. DEMAREST.
References
106
107
108
109
110
111
112
113
114
1999. Quantification of G-protein coupled receptor internalization using G-protein coupled receptor-green fluorescent protein conjugates with the ArrayScanTM high-content screening system. J Biomol Screen 4, 75–86. MERMELSTEIN P. G., K. DEISSEROTH, N. DASGUPTA, A. L. ISAKSEN, R. W. TSIEN. 2001. Calmodulin priming: nuclear translocation of a calmodulin complex and the memory of prior neuronal activity. Proc Natl Acad Sci USA 98, 15342–7. SANTAGATA S., T. J. BOGGON, C. L. BAIRD, C. A. GOMEZ, J. ZHAO, W. S. SHAN, D. G. MYSZKA, L. SHAPIRO. 2001. G-protein signaling through tubby proteins. Science 292, 2041–50. TERUEL M. N., T. MEYER. 2002. Parallel single-cell monitoring of receptor-triggered membrane translocation of a calcium-sensing protein module. Science 295, 1910–2. TORRANCE C. J., V. AGRAWAL, B. VOGELSTEIN, K. W. KINZLER. 2001. Use of isogenic human cancer cells for high-throughput screening and drug discovery. Nat Biotechnol 19, 940–5. LIPPINCOTT-SCHWARTZ J., E. SNAPP, A. KENWORTHY. 2001. Studying protein dynamics in living cells. Nat Rev Mol Cell Biol 2, 444–56. MAHAJAN N. P., K. LINDER, G. BERRY, G. W. GORDON, R. HEIM, B. HERMAN. 1998. Bcl–2 and Bax interactions in mitochondria probed with green fluorescent protein and fluorescence resonance energy transfer. Nat Biotechnol 16, 547–52. ZHANG J., Y. MA, S. S. TAYLOR, R. Y. TSIEN. 2001. Genetically encoded reporters of protein kinase A activity reveal impact of substrate tethering. Proc Natl Acad Sci USA 98, 14997– 5002. TING A. Y., K. H. KAIN, R. L. KLEMKE, R. Y. TSIEN. 2001. Genetically encoded fluorescent reporters of protein tyrosine kinase activities in living cells. Proc Natl Acad Sci USA 98, 15003–8. HONDA A., S. R. ADAMS, C. L. SAWYER, V. LEV-RAM, R. Y. TSIEN, W. R. DOSTMANN. 2001. Spatiotemporal dynamics of guanosine 3 ,5 -cyclic monophosphate
115
116
117
118
119
120
121
122
revealed by a genetically encoded, fluorescent indicator. Proc Natl Acad Sci USA 98, 2437–42. ZACCOLO M., F. De GIORGI, C. Y. CHO, L. FENG, T. KNAPP, P. A. NEGULESCU, S. S. TAYLOR, R. Y. TSIEN, T. POZZAN. 2000. A genetically encoded, fluorescent indicator for cyclic AMP in living cells. Nat Cell Biol 2, 25–9. MOCHIZUKI N., S. YAMASHITA, K. KUROKAWA, Y. OHBA, T. NAGAI, A. MIYAWAKI, M. MATSUDA. 2001. Spatio-temporal images of growth-factor-induced activation of Ras and Rap1. Nature 411, 1065–8. XU Y., D. W. PISTON, C. H. JOHNSON. 1999. A bioluminescence resonance energy transfer (BRET) system: application to interacting circadian clock proteins. Proc Natl Acad Sci USA 96, 151–6. ANGERS S., A. SALAHPOUR, E. JOLY, S. HILAIRET, D. CHELSKY, M. DENNIS, M. BOUVIER. 2000. Detection of beta 2-adrenergic receptor dimerization in living cells using bioluminescence resonance energy transfer (BRET). Proc Natl Acad Sci USA 97, 3684–9. AYOUB M. A., C. COUTURIER, E. LUCAS-MEUNIER, S. ANGERS, P. FOSSIER, M. BOUVIER, R. JOCKERS. 2002. Monitoring of ligand-independent dimerization and ligand-induced conformational changes of melatonin receptors in living cells by bioluminescence resonance energy transfer. J Biol Chem 277, 21522–8. RAMSAY D., E. KELLETT, M. MCVEY, S. REES, G. MILLIGAN. 2002. Homo- and hetero-oligomeric interactions between G protein-coupled receptors in living cells monitored by two variants of bioluminescence resonance energy transfer. Biochem J 365, 429–40. ROSSI F., C. A. CHARLTON, H. M. BLAU. 1997. Monitoring protein–protein interactions in intact eukaryotic cells by beta-galactosidase complementation. Proc Natl Acad Sci USA 94, 8405–410. PELLETIER J. N., F. X. CAMPBELL-VALOIS, S. W. MICHNICK. 1998. Oligomerization domain-directed reassembly of active dihydrofolate reductase from rationally
43
44 Cellular Assays and Their Application in Drug Discovery
123
124
125
126
127
128
129
130
131
designed fragments. Proc Natl Acad Sci USA 95, 12141–6. GALARNEAU A., M. PRIMEAU, L. E. TRUDEAU, S. W. MICHNICK. 2002. Beta-lactamase protein fragment complementation assays as in vivo and in vitro sensors of protein–protein interactions. Nat Biotechnol 20, 619–22. BLAKELY B. T., F. M. ROSSI, B. TILLOTSON, M. PALMER, A. ESTELLES, H. M. BLAU. 2000. Epidermal growth factor receptor dimerization monitored in live cells. Nat Biotechnol 18, 218–22. REMY I., S. W. MICHNICK. 1999. Clonal selection and in vivo quantitation of protein interactions with protein-fragment complementation assays. Proc Natl Acad Sci USA 96, 5394–9. SUBRAMANIAM R., D. DESVEAUX, C. SPICKLER, S. W. MICHNICK, N. BRISSON. 2001. Direct visualization of protein interactions in plant cells. Nat Biotechnol 19, 769–72. MARSHALL E. S., K. M. HOLDAWAY, J. H. SHAW, G. J. FINLAY, J. H. MATTHEWS, B. C. BAGULEY. 1993. Anticancer drug sensitivity profiles of new and established melanoma cell lines. Oncol Res 5, 301–9. PETTY R. D., L. A. SUTHERLAND, E. M. HUNTER, I. A. CREE. 1995. Comparison of MTT and ATP-based assays for the measurement of viable cell number. J Biolumin Chemilumin 10, 29–34. AHMED S. A., R. M. GOGAL, Jr., J. E. WALSH. 1994. A new rapid and simple non-radioactive assay to monitor and determine the proliferation of lymphocytes. An alternative to [3 H]thymidine incorporation assay. J Immunol Methods 170, 211–24. SKAANILD M. T., J. CLAUSEN. 1989. Estimation of LC50 values by assay of lactate dehydrogenase and DNA redistribution in human lymphocyte cultures. ATLA, 293–6. CROUCH S. P., R. KOZLOWSKI, K. J. SLATER, J. FLETCHER. 1993. The use of ATP bioluminescence as a measure of cell proliferation and cytotoxicity. J Immunol Methods 160, 81–8.
132 MOSMANN T. 1983. Rapid colorimetric assay for cellular growth and survival: application to proliferation and cytotoxicity assays. J Immunol Methods 65, 55–63. 133 GREEN D. R., J. C. REED. 1998. Mitochondria and apoptosis. Science 281, 1309–12. 134 MAJNO G., I. JORIS. 1995. Apoptosis, oncosis, and necrosis. An overview of cell death. Am J Pathol 146, 3–15. 135 WOO E. S., G. LAROCCA, O. LAPETS, G. R. BRIGHT. 1999. Multiparametric high content screening for apoptotic signals. High Throughput Screening Supplement to Biomed Products. 136 LEVEE M. G., M. I. DABROWSKA, J. L. LELLI, Jr., D. B. HINSHAW. 1996. Actin polymerization and depolymerization during apoptosis in HL-60 cells. Am J Physiol 271, C1981–92. 137 MAEKAWA T. L., T. A. TAKAHASHI, M. FUJIHARA, J. AKASAKA, S. FUJIKAWA, M. MINAMI, S. SEKIGUCHI. 1996. Effects of ultraviolet B irradiation on cell–cell interaction; implication of morphological changes and actin filaments in irradiated cells. Clin Exp Immunol 105, 389–96. 138 ENDRESEN P. C., P. S. PRYTZ, J. AARBAKKE. 1995. A new flow cytometric method for discrimination of apoptotic cells and detection of their cell cycle specificity through staining of F-actin and DNA. Cytometry 20, 162–71. 139 van ENGELAND M., H. J. KUIJPERS, F. C. RAMAEKERS, C. P. REUTELINGSPERGER, B. SCHUTTE. 1997. Plasma membrane alterations and cytoskeletal changes in apoptosis. Exp Cell Res 235, 421–30. 140 CAMILLERI-BROET S., H. VANDERWERFF, E. CALDWELL, D. HOCKENBERY. 1998. Distinct alterations in mitochondrial mass and function characterize different models of apoptosis. Exp Cell Res 239, 277–92. 141 ANDERSON D. J., F. H. GAGE, I. L. WEISSMAN. 2001. Can stem cells cross lineage boundaries? Nat Med 7, 393–5. 142 HENRY D. 1997. Haematological toxicities associated with dose-intensive chemotherapy, the role for and use of
References
143
144
145
146
147
148
149
150
151
152
recombinant growth factors. Ann Oncol 8 (Suppl. 3), S7–S10. SCHEDING S., J. E. MEDIA, A. NAKEFF. 1994. Acute toxic effects of 3 -azido-3 -deoxythymidine (AZT) on normal and regenerating murine hematopoiesis. Exp Hematol 22, 60–5. EAVES C. J.. 1995. Assay of hematopoietic progenitor cells. In BEUTLER E., M. A. LICHTMAN, B. S. COLLIER, T. J. KIPPS (eds), Williams Hematology, 5th edn. New York: McGraw-Hill, 22–6. BRASIER A. R., D. RON. 1992. Luciferase reporter gene assay in mammalian cells. Methods Enzymol 216, 386–97. WARREN M. K., W. L. ROSE, L. D. BEALL, J. CONE. 1995. CD34+ cell expansion and expression of lineage markers during liquid culture of human progenitor cells. Stem Cells 13, 167–74. GREENWALT D. E., J. SZABO, I. MANCHEL. 2001. High throughput cell-based assay of hematopoietic progenitor differentiation. J Biomol Screen 6, 383–92. SAARINEN K., K. KIVISTO, K. BLOMBERG, K. PUNNONEN, L. LEINO. 2000. Time-resolved fluorometric assay for leukocyte adhesion using a fluorescence enhancing ligand. J Immunol Methods 236, 19–26. FIEHN O. 2002. Metabolomics – the link between genotypes and phenotypes. Plant Mol Biol 48, 155–71. NICHOLSON J. K., J. C. LINDON, E. HOLMES. 1999. “Metabonomics”: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica 29, 1181–9. FIEHN O. 2002. Metabolomics –the link between genotypes and phenotypes. Plant Mol Biol 48, 155–71. LEHMANN M., W. BAUMANN, M. BRISCHWEIN, H. GAHLE, I. FREUND, R. EHRET, S. DRECHSLER, H. PALZER, M. KLEINTGES, U. SIEBEN, B. WOLF. 2001.
153
154
155
156
157
158
Simultaneous measurement of cellular respiration and acidification with a single CMOS ISFET. Biosens Bioelectron 16, 195–203. LEHMANN M., W. BAUMANN, M. BRISCHWEIN, R. EHRET, M. KRAUS, A. SCHWINDE, M. BITZENHOFER, I. FREUND, B. WOLF. 2000. Non-invasive measurement of cell membrane associated proton gradients by ion-sensitive field effect transistor arrays for microphysiological and bioelectronical applications. Biosens Bio-electron 15, 117–24. HENNING T., M. BRISCHWEIN, W. BAUMANN, R. EHRET, I. FREUND, R. KAMMERER, M. LEHMANN, A. SCHWINDE, B. WOLF. 2001. Approach to a multiparametric sensor-chip-based tumor chemosensitivity assay. Anticancer Drugs 12, 21–32. RININGER J. A., V. A. DIPIPPO, B. E. GOULD-ROTHBERG. 2000. Differential gene expression technologies for identifying surrogate markers of drug efficacy and toxicity. Drug Discov Today 5, 560–8. SCHENA M., SHALON D., DAVIS R. W., BROWN P. O.. 1995. Quantitative monitoring of gene expression patterns with a complementary DNA microarrays. Science 270, 368–71. LOCKHART D. J., DONG H., BYRNE M. C., FOLLETTIE M. T., GALLO M. V., CHEE M. S., MITTMANN M., WANG C., KOBAYASHI M., HORTON H., BROWN E. L.. 1996. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14, 1649. GONZALES F., MORITA K., SEKI T., MIYATA N., UEKI T., LOIS A., MILAN L., MUNDAY C., GARCIA E., NAKAO T. 2001. Simultaneous HTS measurement of heme oxygenase and GAPDH mRNAs by multiStarTM High Performance Signal Amplification (HPSATM ). Presented at SBS 7th Annual Conference & Exhibition, Baltimore, MD.
45
1
Phosphoproteome Analysis in Medical Research Elke Butt
Universit¨atsklinik W¨urzburg, W¨urzburg, Germany
Katrin Marcus Ruhr-University Bochum, Bochum, Germany
Originally published in: Proteomics in Drug Research. Edited by Michael Hamacher, Katrin Marcus, Kai St¨uhler, Andr´e van Hall, Bettina Warscheid and Helmut E. Meyer. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31226-9
1 Introduction
Proteomics has entirely shifted the paradigm for drug discovery and novel diagnostics, as it is an effective means to rapidly identify new biomarkers and therapeutic targets for the diagnosis and treatment of various diseases, while increasing our understanding of biological processes. Clinical applications of proteomics involve the use of proteomic technologies with the final goal being to characterize the information flow through the intra- and extracellular molecular protein networks that interconnect organ and circulatory systems. Protein phosphorylation, a reversible posttranslational modification, occurs principally on serine, threonine, and tyrosine residues and plays a central role in cellular regulation either by altering a protein’s activity directly or by inducing specific protein–protein interactions, which in turn affect localization, binding specificity or activity. Abnormal kinase or phosphatase function plays a role in several pathologies and tumorigenesis. The phosphoproteome especially will play a key role in clinical proteomics as the aberrant function of protein kinases and phosphatases are often the center of many diseases, e.g., cancer, stroke, inflammatory and viral pathologies. The new focus of narrowly focused molecular targeted therapeutics addresses this concept [1]. A comprehensive analysis of protein phosphorylation includes the identification of the phosphoprotein or phosphopeptide, the localization of the modified amino acid, and if possible, the quantification of phosphorylation.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Phosphoproteome Analysis in Medical Research
Several methods have been used to detect and identify phosphorylated residues in a protein. The first step in a proteome study is the separation of a highly complex protein mixture resulting from a cell, tissue or organ lysate. In general, methods applied to the analysis of the proteome and protein identification are also well suited for the analysis of the phosphoproteome, including the localization of specific phosphorylation sites. The most prominent method for this purpose today is twodimensional polyacrylamide gel electrophoresis (2D PAGE), which separates the proteins according to their isoelectric points in the first and their molecular weights in the second dimension [2, 3]. The phosphoproteins can be visualized using different methods such as radiolabeling with [32 P]-orthophosphate combined with autoradiography [4], immunodetection with phosphospecific antibodies, and/or staining using phosphospecific dyes [5]. The specific isolation of phosphorylated proteins/peptides from a complex protein mixture is facilitated by different methods such as immunoprecipitation [6–8], immobilized metal affinity chromatography [9, 10], or chemical modification followed by specific purification [11, 12]. Mass spectrometric identification and localization of the phosphorylated amino acid residue can be done by several different methods such as mass spectrometry (MS)/MS-based peptide sequencing, precursor ion or neutral loss scanning [13–15] Indeed, the analysis of phosphoproteins and the detection and localization of the phosphorylated residues are not straightforward for various reasons: Ĺ The site of phosphorylation in a protein may vary resulting in a very heterogeneous mixture of different phosphorylation forms. Ĺ Phosphorylation in most instances emerges as a transient modification within signaling cascades and is accompanied by dephosphorylation. Hence, the stoichiometry is generally relatively low and the phosphoprotein detection method must be very sensitive. Ĺ Most analytical techniques for studying protein phosphorylation have a limited dynamic range and only major phosphorylation sites may be detected, whereas minor sites might be difficult to localize. Ĺ Phosphorylated peptides are generally suppressed in mass spectrometric analysis because of their low ionization rates.
In consequence, the localization of the phosphorylated residue often requires the specific isolation, detection and analysis of the modified region, which is complicated by the physicochemical conditions and complexity of the analyzed protein sample. The following three sections show concrete examples of the analysis of phosphoproteins as potential drug targets and demonstrate the applicability and enormous power of proteomics in this field.
2 Phosphoproteomics of Human Platelets
Blood platelets arise as pinch-offs of megakaryocytes in the bone marrow, playing a critical role in wound healing. In their inactive, nonadhesive state they circulate
2 Phosphoproteomics of Human Platelets 3
freely in the blood. Interactions with structures in the subendothelial matrix caused by injuries of the blood vessels initiate a rapid platelet activation resulting in the formation of vascular plugs and release of intracellular substances (e.g., thrombin) that initiate repair processes. Platelet adhesion at the vessel wall is the first step in platelet activation in vivo. Starting from this adhesion, intracellular signaling cascades are initiated resulting in secretion of activating substances and aggregation of the platelets. One important event within these intracellular signaling cascades is the phosphorylation/dephosphorylation of multiple proteins on various tyrosine, serine and threonine residues [16, 17]. Platelets are important components of hemostasis underlying cardiovascular diseases such as myocardial infarction or stroke [18, 19]. Genetic defects may result in bleeding disorders like Glanzmann thrombasthenia and Bernard-Soulier syndrome [20]. The high clinical relevance of arterial thrombosis, the limited knowledge about the underlying mechanisms on the molecular level, and the small number of available antiplatelet drugs [21] show the necessity for gaining more profound insights into the system. Although in recent years inhibitors of platelet aggregation have been established as the outstanding principle for secondary, and partially for primary prevention of cardiovascular diseases [22, 23], only a few effective drugs exist so far [21]. Hence, the search for better and perhaps safer antiplatelet drugs is of particular clinical interest, and it is therefore imperative to understand the exact molecular mechanisms in platelet activation, to find new potential components of the signal transduction pathways and to determine the role of protein phosphorylation. In the study described, a differential phosphoproteomic approach using the combination of several methods – 2D PAGE, 1D PAGE, autoradiography, immunoblotting, and nano high-performance liquid chromatography (LC)-electrospray ionization (ESI) MS/MS – was employed, resulting in a comprehensive analysis of the platelet phosphoproteome. After ex vivo stimulation with thrombin and radiolabeling of the platelet (resting platelets served as control), proteins were separated using 2D PAGE and phosphorylated proteins were directly assigned by detection of [32 P] via autoradiography (Figure 1). Discrimination of serine/threonine-phosphorylated proteins on the one hand and tyrosine-phosphorylated proteins on the other was done by detection of phosphotyrosine-containing proteins via immunoblotting with an antiphosphotyrosine-specific antibody. Phosphorylated protein spots/bands were analyzed by nano-LC MS/MS. Several of the identified proteins were known to be involved in signal transduction and platelet function, e.g., cortactin, myosin regulatory light chain (myosin RLC), and protein disulfide isomerase (PDI). 2.1 Cortactin
In several spots of the immunoblots and the autoradiogram (Figure 2, white arrow) the actin-binding protein cortactin was identified. The human form of cortactin is known as an oncogene, often amplified in human colon carcinomas and tumor cells [24] and seems to be involved in the signal transduction caused by extracellular
4 Phosphoproteome Analysis in Medical Research
Fig. 1 Autoradiography of a twodimensional separation of phosphorylated human platelet proteins after labeling with [32 P] in the pH range 3–10. Top: Autoradiography of the 2D gel with pH 3–10 after application of 400 :g protein lysate of acti-
vated (A) and resting (B) human platelets. Below: Scaled-up sections I-V of the 2D gels. This figure clearly demonstrates the power of autoradiography in the differential phosphoproteome analysis.
stimuli such as thrombin, integrins, or the EGF [25]. On the basis of the MS/MS spectra of the nonphosphorylated and the phosphorylated peptide, a phosphorylation of Ser-418 was detected (Figure 3A, B). Additionally, two further phosphorylation sites were localized at Thr-401 and Ser-405 (results not shown). These sites were already described to be phosphorylated by the extracellular signal-regulated kinase (ERK) in vivo and in vitro [26]. Although the exact role of this phosphorylation is not yet clear, there is evidence for an important function in the regulation of the subcellular localization of cortactin [27]. Additionally, a tyrosine-phosphorylation of cortactin by the tyrosine kinase pp60c-src at different tyrosine residues (Tyr-421, Tyr-466, and Tyr-482) is known to be responsible for its reduced actin-crosslink activity [28].
2 Phosphoproteomics of Human Platelets 5
Fig. 2 Extracts of an anti-phosphotyrosine immunoblot (top) and autoradiography (bottom) after two-dimensional separation in the pH range 4–7 of resting and activated human platelets. (For better orientation a black circle is drawn in.) The extracts of the regions between pI 5.4–5.9 and 40–55 kDa clearly demonstrate the differences in the phosphorylation patterns of resting and thrombin-activated platelets. An intensity
increase of different protein spots identified as, e.g., cortactin (three different phosphorylated forms; white arrows) and protein disulfide-isomerase (black arrow) could be detected after stimulation with thrombin in both the immunoblot and the autoradiogram. Several of the identified proteins were known to be involved in signal transduction and platelet function.
2.2 Myosin Regulatory Light Chain
The regulatory light chains of myosin are known to be phosphorylated at different serine and threonine residues. Thr-18 and Ser-19 are substrates of the myosin light chain-kinase (MLCK) whereas Ser-1, Ser-2 und Ser-9 are phosphorylated by the protein kinase C (PKC) [29–31]. As the first slow phosphorylation by the MLCK is already present in the resting platelets the second fast phosphorylation by the PKC occurred after thrombin stimulation. This second Ca2+ -dependent phosphorylation plays an important role in the secretion of substances from platelet intracellular granules [32]. In 1988, Naka and coworkers [29] described the existence of three different forms of myosin RLC in thrombin–activated platelets (non-, mono-, and bis-phosphorylated RLC) by nonproteomic studies. These results were confirmed unequivocally and were even augmented by this proteomic study (Figure 3C, D) [7]. 2.3 Protein Disulfide Isomerase
PDI is a noncovalent homodimer with a molecular weight of 57 kDa for each subunit. The protein is generally involved in the formation, reduction and
6 Phosphoproteome Analysis in Medical Research
Fig. 3 Localization of the phosphorylation sites of cortactin (A and B) and the myosin regulatory light chain (RLC; C, D). A: Fragment ion spectrum of the nonphosphorylated peptide LPSpSPVYEDAASFK of cortactin. B: Fragment ion spectrum of the nonphosphorylated peptide LPSSPVYEDAASFK of cortactin. Due to the secessions of −98 Da from y11 Ser-418 (Ser-4 in the spectrum) was unambiguously detected to be phosphorylated. C: Fragment ion spectrum of the nonphosphorylated peptide AT-
SNVFAMFDQSQIQEFK of the RLC. D: Fragment spectrum of the phosphorylated peptide ApTSNVFAMFDQSQIQEFK or ATpSNVFAMFDQSQIQEFK, respectively. This peptide contained three possible phosphory-lation sites (Thr-2, Ser-3, and Ser-12). The comparison of the two spectra revealed Thr-2 or Ser-3 as phosphorylated residues. As no secession of −98 Da was observed till y13 a phosphorylation of Ser13 could be excluded. On the basis of the MS/MS spectrum the exact site of phosphorylation was not determinable.
rearrangement of disulfide bridges and plays an important role in platelet function. PDI is localized at the extracellular side of the plasma membrane and is secreted after platelet activation [33]. In platelets, PDI catalyzes the formation of disulfide-bridged complexes of thrombospondin 1 and thrombin-antithrombin III and is thus involved in hemostasis and wound healing [34]. In addition, it is part of the integrin-mediated platelet adhesion [35]. A threonine phosphorylation has been described for the PDI localized in the endoplasmatic reticulum [36].
3 Identification of cAMP- and cGMP-Dependent 7
All proteins identified could be classified into different categories based on their function: 23 cytoskeletal proteins, 25 signaling molecules, 19 metabolic proteins, 5 vesicular proteins, 9 mitochondrial proteins, 6 proteins with extracellular function, 12 proteins involved in protein processing, 23 unknown proteins, and 12 miscellaneous. Overall, this proteome study resulted in a successful identification of 134 proteins and the localization of eight partly unknown phosphorylation sites, clearly demonstrating the power of proteomics. Methods, procedures and results are described in detail in [7]. The identification of phosphorylated proteins from such global approaches is only the first step in characterizing these signaling molecules. It is important to verify the involvement of the proteins identified and to prove the localized phosphorylation sites in regard to their functionality. Such studies may be performed by using antibodies against the protein of interest and/or overexpression of wild-type and mutant forms of epitope-tagged proteins to examine their role in signal transduction pathways.
3 Identification of cAMP- and cGMP-Dependent Protein Kinase Substrates in Human Platelets
The activation process of human platelets and vessel wall–platelet interactions are tightly regulated under physiological conditions and are often impaired in thrombosis, arteriosclerosis, hypertension, and diabetes. Platelet activation can be inhibited by a variety of agents, including aspirin and Ca2+ antagonists, as well as cGMP- and cAMP-elevating agents such as sodium nitroprusside (NO) and prostaglandin I2 [37]. The inhibitory effects of cGMP and cAMP are mediated by cAMP- and cGMP-dependent protein kinase (PKA and PKG), respectively [38]. For example, platelets from certain patients with chronic myelocytic leukemia showed decreased expression of PKG but a normal level of PKA. Phosphorylation of vasodilator-stimulated phosphoprotein (VASP) a specific substrate of both PKA and PKG, was severely impaired in these cGMP-dependent protein-kinase-deficient platelets, despite an exaggerated cGMP response to sodium nitroprusside, whereas the cAMP-mediated VASP phosphorylation was functionally intact [39]. In PKGdeficient mice cGMP-mediated inhibition of platelet aggregation is impaired [40]. However, the molecular mechanisms of platelet inhibition by cGMP signaling downstream of cGK activation are only partially understood and to date, only a few substrates for PKA and PKG have been identified and characterized in intact human platelets. In this respect, we used differential phosphoproteomic display of radiolabeled human platelets to identify specific substrates of PKG and PKA. Therefore, human platelets were labeled with 32 P, stimulated with 500 µM of the specific PKG activator 8-pCPT-cGMP, and proteins of the resulting platelet lysate were separated by 2D PAGE. The two-dimensional phosphoproteomes resulting from this experiment (Figure 4) demonstrate the phosphorylation/dephosphorylation of several proteins after stimulation with 8-pCPT-cGMP.
8 Phosphoproteome Analysis in Medical Research
Fig. 4 PKG-dependent phosphorylation in intact human platelets. Human platelets were incubated in the presence of [32 P]orthophosphate and treated with buffer alone (control) or with 500 :M PKG activator 8pCPT-cGMP for 30 min (stimulation). Platelet homogenate was separated by
two-dimensional gel electrophoresis and an autoradiogram was obtained. Spot 1: LASP; spots 2a, b and c: VASP; spot 3: rap 1b; spots 4, 5 and 8 have not yet been identified. The pH gradient is indicated at the top of the gel.
For identification of these proteins, the spots were excised from the gel, digested with trypsin, and the resulting peptides were analyzed by nano LC-ESI MS/MS [41]. Alongside the increase of phosphorylation in several spots, dephosphorylation is also observed (Spots 4, 5 and 8) indicating the activation of phosphatases by activated PKG (Table 1). All PKG substrates identified so far in human platelets interact directly or indirectly with the actin filament and the cytoskeleton of the cell and are involved in the negative regulation of platelet activation after phosphorylation by PKG. The proteomic approach in combination with functional studies is a very successful tool for studying phosphorylation-dependent effects in platelets. Ongoing proteome studies will now focus on membrane proteins for new therapeutic targets to develop antithrombotic drugs.
4 Identification of a New Therapeutic Target for Anti-Inflammatory Therapy by Analyzing Differences in the Phosphoproteome of Wild Type and Knock Out Mice
Genetically modified mice either overexpressing or knocking out specific components have provided exciting advances in our knowledge of diseases. Recently, we focused on the p38 mitogen-activated protein kinase (MAPK) pathway and its physiological effect on inflammation. An intact cytokine response is essential for efficient host defense against invading pathogens. Inhibition of the p38 MAPK pathway that is involved in a number of cellular stress responses has been used successfully to decrease cytokine production in
4 Identification of a New Therapeutic Target for Anti-Inflammatory Table 1 Identification and function of cAMP- and cGMP-dependent protein kinase substrates
in human platelets Spot No.
Identification
Function
1
LIM and SH3 domain protein LASP
2a, b and c
VASP
3
Rap 1b
Phosphorylation of LASP at Ser-146 leads to a redistribution of the actin-bound protein from the tips of the cell membrane to the cytosol accompanied with a reduced cell migration [41, 42] Three distinct phosphorylation sites (Ser-157, Ser-239 and Thr-278) with different specificities for PKA and PKG [43]. VASP phosphorylation is thought to be involved in the negative regulation of integrin α IIb bIII [45]. Experiments in vitro revealed reduced F-actin binding and actin polymerization of phosphorylated VASP [44]. Phosphorylation of Rap 1b at Ser-179 is associated with translocation of the protein from the membrane to the cytosol [46]
4, 5, 8 Not identified yet Gel not shown Heat shock protein 27 Hsp27
Phosphorylation of Hsp27 at Thr-143 by PKG reduced the stimulatory effect of MAPKAP kinase 2-phosphorylated Hsp27 on actin polymerization [47]
vitro [48], suggesting a key role for this MAPK pathway in inflammation. However, the central role of p38 MAPK, which is responsible for the activation of a number of kinases and transcription factors, limits the use of p38 inhibitors as a selective antiinflammatory strategy. Thus, downstream substrates of the MAPK pathway could represent promising targets for a specific suppression of cytokine production. In mouse and human, three MAPKAP kinases (MK2, MK3, and MK5) have been identified and are activated by p38. The substrate specificities of these kinases also seem very similar and the small heat shock protein HSP25/27 is an efficient substrate for all three kinases in vitro [49]. Therefore, it was rather surprising that lack of MK2 resulted in a defect of cytokine biosynthesis and decreased levels of p38 MAPK. This was unexpected since MK3 and MK5 should be able to compensate for the loss of MK2, at least to a certain degree. To analyze these significant differences, we deleted the MK2 and MK5 gene from mice and compared the phenotypes of the resulting MK5 knockout (KO) mice and the MK2-deficient mice with the wild type (WT) [50]. For this approach WT, MEK2 -/- and MEK5 -/- mouse embryonic cardiac fibroblasts were labeled with 32 P, stimulated by 100 µM arsenite and proteins of the cell lysates were separated by 2D PAGE. The differential phosphoproteomes (Figure 5) demonstrate the phosphorylation of a protein identified by MS as HSP25 (arrow) in wild type and MK5-deficient cells indicating that in vivo, MK2 was the only kinase phosphorylating HSP25 in these cells stimulated by arsenite [50].
9
10 Phosphoproteome Analysis in Medical Research
Fig. 5 Autoradiograms of 2D phosphoproteomics from wildtype (WT), MAPKAP kinase 2 deficient (MK2 −/−) and MAPKAP kinase 5 deficient (MK5 −/−) mouse cardiac fibroblasts before and after stimulation with 100 :M arsenite. The spot for phospho-HSP 25 was identified by mass spectrometry and is indicated by an arrow.
Obviously, MK5 is not able to compensate for this effect, making MK2 a unique pharmacological target for anti-inflammatory therapy. 5 Conclusions
In summary, modern proteomics approaches offer unprecedented opportunities to identify novel phosphorylated proteins for diagnosis and therapy of major illnesses. The development of new clinical proteomic tools away from 32 P labeling methods and the monitoring of protein expression by two-dimensional gels towards MSbased methods using phosphospecific isotope-tagged labeled peptides followed by one-dimensional electrophoresis will gain in importance. In the future, better protein enrichment strategies and quantification of phosphorylation will allow the clinical transfer to functional application and early detection of pathological organ conditions, as a supplement to existing and coevolving advances in diagnostic imaging. Acknowledgements
Parts of the authors’ work were funded by the German Ministry of Education and Research (BMBF).
References
References 1 PETRICOIN, E., WULFKUHLE, J., ESPINA, V., LIOTTA, L. A. (2004). Clinical proteomics: revolutionizing disease detection and patient tailoring therapy. J. Proteome Res. 3, 209–217. 2 G¨ORG, A., POSTEL, W., G¨UNTHER, S., WESER, J. (1985). Improved horizontal two-dimensional electrophoresis with hybrid isoelectric focusing in immobilized pH gradients in the first dimension and laying-on transfer to the second dimension. Electrophoresis 6, 599–604. 3 KLOSE, J. (1999). Large-gel 2-D electrophoresis. Methods Mol. Biol. 112, 147–172. 4 GARRISON, J. C. (1993). Protein phosphorylation. In HARDIE, D. G. (Ed.), A Practical Approach. Oxford University Press, Oxford, pp. 1. 5 SCHULENBERG, B., GOODMAN, T. N., AGGELER, R., CAPALDI, R. A., PATTON, W. F. (2004). Characterization of dynamic and steady-state protein phosphorylation using a fluorescent phosphoprotein gel stain and mass spectrometry. Electrophoresis 15, 2526–2532. 6 MARCUS, K., IMMLER, D., STERNBERGER, J., MEYER, H. E. (2000). Identification of platelet proteins separated by two-dimensional gel electrophoresis and analyzed by matrix assisted laser desorption/ionization-time of flightmass spectrometry and detection of tyrosine-phosphorylated proteins. Electrophoresis 21, 2622–2636. 7 MARCUS, K., MOEBIUS, J., MEYER, H. E. (2003). Differential analysis of phosphorylated proteins in resting and thrombin-stimulated human platelets. Anal. Bioanal. Chem. 276, 973–993. 8 GRONBORG, M., KRISTIANSEN, T. Z., STENSBALLE, A., ANDERSEN, J. S., OHARA, O., MANN, M., JENSEN, O. N., PANDEY, A. (2002). A mass spectrometry-based proteomic approach for identification of serine/threonine-phosphorylated proteins by enrichment with phospho-specific antibodies: identification of a novel protein, Frigg, as a protein kinase A substrate. Mol. Cell Proteomics 1, 517–527.
9 SALOMON, A. R., FICARRO, S. B., BRILL, L. M., BRINKER, A., PHUNG, Q. T., ERICSON, C., SAUER, K., BROCK, A., HORN, D. M., SCHULTZ, P. G., et al. (2003). Profiling of tyrosine phosphorylation pathways in human cells using mass spectrometry. Proc. Natl. Acad. Sci. USA 100, 443–448. 10 FICARRO, S. B., MCCLELAND, M. L., STUKENBERG, P. T., BURKE, D. J., ROSS, M. M., SHABANOWITZ, J., HUNT, D. F., WHITE, F. M. (2002). Phosphoproteome analysis by mass spectrometry and its application to Saccharomyces cerevisiae. Nat. Biotechnol. 20, 301–305. 11 ODA, Y., NAGASU, T., CHAIT, B. T. (2001). Enrichment analysis of phosphorylated proteins as a tool for probing the phosphoproteome. Nat. Biotechnol. 19, 379–382. 12 ZHOU, H., WATTS, J. D., AEBERSOLD, R. (2001). A systematic approach to the analysis of protein phosphorylation. Nat. Biotechnol. 19, 375–378. 13 STEEN, H., MANN, M. (2002). A new derivatization strategy for the analysis of phosphopeptides by precursor ion scanning in positive ion mode. J. Am. Soc. Mass Spectrom. 13, 996–1003. 14 BATEMAN, R. H., CARRUTHERS, R., HOYES, J. B., JONES, C., LANGRIDGE, J. I., MILLAR, A., VISSERS, J. P. (2002). A novel precursor ion discovery method on a hybrid quadrupole orthogonal acceleration time-of-flight (Q-TOF) mass spectrometer for studying protein phosphorylation. J. Am. Soc. Mass Spectrom. 13, 792–803. 15 MANN, M., ONG, S. E., GRONBORG, M., STEHEN, H., JENSEN, O. N., PANDEY, A. (2002). Analysis of protein phosphorylation using mass spectrometry: deciphering the phosphoproteome. Trends Biotechnol. 20, 261–268. 16 WATSON, S. P. (1997). Protein phosphatases in platelet function. In VON BRUCHHAUSEN, F., WALTER, U. (Eds.), Platelets and their Factors. Springer-Verlag, Berlin, pp. 297–325. 17 PRESEK, P., MARTINSON, E. A. (1997). Platelet protein tyrosine kinases cyclases. In VON BRUCHHAUSEN, F., WALTER, U.
11
12 Phosphoproteome Analysis in Medical Research
18
19
20
21
22
23 24
25
26
27
28
(Eds.), Platelets and their Factors. Springer-Verlag, Berlin, pp. 263–296. BROWN, A. S., ERUSALIMSKY, J. D., MARTIN, J. F. (1997). Megakaryocytopoiesis: the megakaryocyte/platelet haemi-static axis. In VON BRUCHHAUSEN, F., WALTER, U. (Eds.), Platelets and their Factors. Springer-Verlag, Berlin, pp. 3–19. GLASS, C. K., WITZUM, J. L. (2001). Atherosclerosis: The Road Ahead. Cell 104, 503–516. NURDEN, A. T. (1999). Inherited abnormalities of platelets. Thromb Haemost 82, 468–480. FITZGERALD, D. J. (2001). Vascular biology of thrombosis: the role of platelet–vessel wall adhesion. Neurology 57, S1–S4. LEFKOWITZ, R. J., WILLERSON, J. T. (2001). Prospects for cardiovascular research. JAMA 285, 581–587. BENNETT, J. S. (2001). Platelet-fibrinogen interactions. Annu. Rev. Med. 52, 161–184. SCHUURING, E., VERHOEVEN, E., LITVINOV, S., MICHALIDES, R. J. (1993). The product of the EMS1 gene, amplified and over-expressed in human carcinomas, is homologous to a v-src substrate and is located in cell-substratum contact sites. Mol. Cell Biol. 13, 2891–2898. HUANG, C., NI, Y., WANG, T., GAO, Y., HAUDENSCHILD, C. C., ZHAN, X. (1997). Down-regulation of the filamentous actin cross-linking activity of cortactin by Src-mediated tyrosine phosphorylation. J. Biol. Chem. 272, 13911–13915. CAMPBELL, D. H., SUTHERLAND, R. L., DALY, R. J. (1999). Signaling pathways and structural domains required for phosphorylation of EMS1/cortactin. Cancer Res. 59, 5376–5385. van DAMME, H., BROK, H., SCHUURING-SCHOLTES, E., SCHUURING, E. (1997). The redistribution of cortactin into cell-matrix contact sites in human carcinoma cells with 11q13 amplification is associated with both overexpression and posttranslational modification. J. Biol. Chem. 272, 7374–7380. RAFFEL, G. D., PARMAR, K., ROSENBERG, N. (1996). In vivo association of v-Abl with Shc mediated by a non-phosphotyrosinedependent SH2 interaction. J. Biol. Chem. 271, 4640–4645.
29 NAKA, M., SAITOH, M., HIDAKA, H. (1988). Two phosphorylated forms of myosin in thrombin-stimulated platelets. Arch. Biochem. Biophys. 261, 235–240. 30 KAWAMOTO, S., BENGUR, A. R., SELLERS, J. R., ADELSTEIN, R. S. (1989). In situ phosphorylation of human platelet myosin heavy and light chains by protein kinase C. J. Biol. Chem. 264, 2258–2265. 31 MOUSSAVI, R. S., KELLEY, C. A., ADELSTEIN, R. S. (1993). Phosphorylation of vertebrate nonmuscle and smooth muscle myosin heavy chains and light chains. Mol. Cell Biochem. 127–128, 219–227. 32 NISHIKAWA, M., TANAKA, T., HIDAKA, H. (1980). Ca2+ -calmodulin-dependent phosphorylation and platelet secretion. Nature 287, 863–865. 33 CHEN, K., LIN, Y., DETWILER, T. C. (1992). Protein disulfide isomerase activity is released by activated platelets. Blood 79, 2226–2228. 34 MILEV, Y., ESSEX, D. W. (1999). Protein disulfide isomerase catalyzes the formation of disulfide-linked complexes of thrombospondin-1 with thrombin – antithrombin III. Arch. Biochem. Biophys. 361, 120–126. 35 LAHAV, J., GOFER-DADOSH, N., LUBOSHITZ, J., HESS, O., SHAKLAI, M. (2000). Protein disulfide isomerase mediates integrin-dependent adhesion. FEBS Lett. 475, 89–92. 36 QUEMENEUR, E., GUTHAPFEL, R., GUEGUEN, P. (1994). J. Biol. Chem. 269, 5485–5488. 37 KOESLING, D., NUERNBERG, B. (1997). Platelet G proteins and adenylyl and guanylyl cyclases. In VON BRUCHHAUSEN, F., WALTER, U. (Eds.), Platelets and their Factors. Springer-Verlag, Berlin, pp. 181–218. 38 M¨UNZEL, T., FEIL, R., M¨ULSCH, A., LOHMANN, S. M., HOFMANN, F., WALTER, U. (2003). Physiology and pathophysiology of vascular signaling controlled by guanosine 3 ,5 -cyclic monophosphate-dependent protein kinase. Circulation 108, 2172–2183. 39 EIGENTHALER, M., ULLRICH, H., GEIGER, J., HORSTRUP, K., HONIG-LIEDL, P., WIEBECKE, D., WALTER, U. J. (1993). Defective nitrovasodilator-stimulated protein phosphorylation and calcium regulation
References
40
41
42
43
44
45
in cGMP-dependent protein kinase-deficient human platelets of chronic myelocytic leukemia. Biol. Chem. 268, 13526–13531. MASSBERG, S., SAUSBIER, M., KLATT, P., BAUER, M., PFEIFER, A., SIESS, W., FASSLER, R., RUTH, P., KROMBACH, F., HOFMANN, F. (1999). Increased adhesion and aggregation of platelets lacking cyclic guanosine 3 ,5 -monophosphate kinase I. J. Exp. Med. 189, 1255–1264. BUTT, E., GAMBARYAN, S., G¨OTTFERT, N., GALLER, A., MARCUS, K., MEYER, H. E. (2003). Actin binding of human LIM and SH3 protein is regulated by cGMP- and cAMP-dependent protein kinase phosphorylation on serine 146. J. Biol. Chem. 278, 15601–15607. KEICHER, C., GAMBARYAN, G., SCHULZE, E., MARCUS, K., MEYER, H. E., BUTT, E. (2004). Phosphorylation of mouse LASP-1 on threonine 156 by cAMP-and cGMP-dependent protein kinase. Biochem. Biophys. Res. Commun. 324, 308–316. BUTT, E., ABEL, K., KRIEGER, M., PALM, D., HOPPE, V., HOPPE, J., WALTER, U. (1994). cAMP- and cGMP-dependent protein kinase phosphorylation sites of the focal adhesion vasodilator-stimulated phosphoprotein (VASP) in vitro and in intact human platelets. J. Biol. Chem. 269, 14509–14517. HARBECK, K., H¨UTTELMAIER, S., SCHLU¨ TER, K., JOCKUSCH, B. M., ILLENBERGER, S. (2000). Phosphorylation of the vasodilator-stimulated phosphoprotein regulates its interaction with actin. J. Biol. Chem. 275, 30817–30825. HORSTRUP, K., JABLONKA, B., H¨ONIG-LIEDL,
46
47
48
49
50
P., JUST, M., KOCHSIEK, K., WALTER, U. (1994). Phosphorylation of focal adhesion vasodilator-stimulated phosphoprotein at Ser157 in intact human platelets correlates with fibrinogen receptor inhibition. Eur. J. Biochem. 225, 21–27. TORTI, M., LAPETINA, E. G. (1994). Structure and function of rap proteins in human platelets. Thromb Haemost 71, 533–543. BUTT, E., IMMLER, D., MEYER, H. E., KOTLYAROV, A., LAASS, K., GAESTEL, M. (2001). Heat shock protein 27 is a substrate of cGMP-dependent protein kinase in intact human platelets: phosphorylation-induced actin polymerization caused by HSP27 mutants. J. Biol. Chem. 276, 7108–7113. LEE, J. C., LAYDON, J. T., MCDONNEL, P. C., GALLAGHER, T. F., KUMAR, S., GREEN, D., MCNULTY, D., BLUMENTHAL, M. J., HEYS, J. R., LANDVATTER, S. W. (1994). A protein kinase involved in the regulation of inflammatory cytokine biosynthesis. Nature 372, 739–746. STOKOE, D., ENGEL, K., CAMPBELL, D. G., COHEN, P., GAESTEL, M. (1992). Identification of MAPKAP kinase 2 as a major enzyme responsible for the phosphorylation of the small mammalian heat shock proteins. FEBS Lett. 313, 307–313. SHI, Y., KOTLYAROV, A., LAASS, K., GRUBER, A. D., BUTT, E., MARCUS, K., MEYER, H. E., FRIEDRICH, A., VOLK, H. D., GAESTEL, M. (2003). Elimination of protein kinase MK5/PRAK activity by targeted homologous recombination. Mol. Cell Biol. 23, 7732–7741.
13
1
Mass Spectrometry of Peptides and Proteins Bettina Warscheid Ruhr-University Bochum, Bochum, Germany
Originally published in: Proteomics in Drug Research. Edited by Michael Hamacher, Katrin Marcus, Kai St¨uhler, Andr´e van Hall, Bettina Warscheid and Helmut E. Meyer. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31226-9 Abstract
Rapid advances in biological mass spectrometry have promoted proteomics to being a key technology in molecular cell biology and biomedical research. Current mass spectrometry-based proteomics provides the capability for the identification of cellular and subcellular proteomes, the study of changes in protein concentrations via stable isotope labeling methods and the mapping of functional protein modules isolated by means of affinity purification. The large potential of proteomics techniques to identify and accurately quantify proteins from complex biological samples has been well recognized in the larger scientific community and it will in all probability have a great impact on future drug research.
1 Introduction
In 1981, Barber et al. [1] demonstrated the analysis of a peptide of about 1300 Da in a mass spectrometer. To ionize the native peptide, the molecules were embedded in a glycerol matrix and bombarded with a high energy beam of argon atoms, a technique referred to as fast atom bombardment (FAB). Since ionization of hydrophobic peptides in the FAB process is favored compared to hydrophilic ones, the applicability of this technique to peptide mixture analysis is limited. Yet an avenue toward ionizing large biomolecules was apparent and encouraged the development of new ionization techniques, which also triggered the design of commercial mass spectrometers with expanded mass ranges. In 1988, Karas and Hillenkamp embedded proteins in a large molar excess of a UV-absorbing crystal matrix and irradiated the sample with a laser beam at suitable wavelengths [1]. In this process of matrix-assisted laser desorption/ionization Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Mass Spectrometry of Peptides and Proteins
(MALDI), large proteins can be transferred into the gas phase as intact molecules and become, for the most part, singly protonated. Since MALDI relies on the use of a pulsed laser, mass analysis is usually performed with a time-of-flight (TOF) analyzer of theoretically unlimited mass range. Since the mass spectrometer determines the mass-to-charge (m/z) ratio of the molecules, the ability to generate multiply charged ions would allow for the use of instruments with limited mass ranges in peptide and protein analysis. This was realized by Fenn and coworkers in the 1980s when they sprayed a diluted solution of protein in a high voltage electrostatic field gradient, generating multiprotonated species of intact proteins [2]. Owing to this propensity of the electrospray ionization (ESI) process, quadrupole instruments with a mass range of m/z 2000 (where z = 1) could henceforth be used to analyze molecules with masses exceeding the nominal m/z range of the instrument by about 50 times. The advent of both ESI and MALDI revolutionized the analysis of large biomolecules of low volatility such as peptides and proteins by their capability to form stable ions with little excess energy, enabling the determination of molecular weights even in protein mixtures. To obtain information specific to the primary structure of proteins, however, principles such as the activation of molecules via collisions with small neutral molecules, which have been used in the study of gaseous ion chemistry for decades, had to be adapted and helped to propel mass spectrometry to being of the most important tools in the field of proteomics.
2 Ionization Principles
MALDI and ESI represent the predominant ionization techniques in mass spectrometry-based proteomics, as recognized by the Nobel Prize in chemistry in 2002. MALDI is mainly used to volatize and ionize simple polypeptide samples for mass spectrometric (MS) analysis at high speed. The analysis of more complex peptide mixtures is usually conducted via ESI mass spectrometry (ESI MS) coupled online with a high-pressure liquid chromatography (HPLC) system to concentrate and separate peptides prior to MS analysis. 2.1 Matrix-Assisted Laser Desorption/Ionization (MALDI)
MALDI sample preparation takes place on a metal target by coprecipitation of the analyte molecules and matrix material present in a molar excess of 1000 to 10 000. Upon drying, the solid analyte-doped matrix sample is irradiated under high vacuum conditions by a nitrogen laser with a wavelength of 337 nm and nanosecond pulse-width. Organic molecules well suited to MALDI show high absorbance at the wavelength of the laser employed, and isolate the analyte molecules in a solid crystalline matrix to reduce their intermolecular interactions. For peptide analysis, matrices of α-cyano-4-hydroxycinnamic acid (HCCA) or 2,5-dihydrobenzoic acid
2 Ionization Principles
(DHB) are preferred; the latter or sinnapinic acid (SA) are best applicable to protein analysis. During the MALDI process, matrix crystals are excited and sublimated by laser irradiation. While the matrix plume is expanding, intact analyte molecules and matrix molecules are jointly transferred into the gas phase. The amount of energy transferred to the analyte molecules in this process depends on the physico-chemical properties of the matrix employed and may cause the unimolecular decay of the analyte molecules within a very short time frame. Generally, little internal energy is transferred from DHB to the analyte molecules with increasing energy transfer from SA to HCCA. The latter, which is considered a “hot” matrix, is normally the matrix of choice when analyzing peptides with MALDI-TOF MS on a microsecond time scale. In recent years, MALDI sources have also been successfully coupled to ion-trapping instruments. Extensive collisional cooling of MALDI ions is mandatory for their stabilization in trapping experiments on the millisecond time scale. Higher sensitivity is therefore obtained with the “cooler” DHB matrix than with HCCA. MALDI favors the formation of singly charged ions, which makes the interpretation of mass spectra straightforward. There is evidence that the matrix molecules play a crucial role within the ionization process. Yet the origin of ions produced in MALDI and the mechanisms underlying ionization are still not fully understood. Among various proposed ionization pathways, analyte ionization by gas phase proton transfer reactions in the expanding plume with photoionized matrix molecules has been favored. Ionization efficiencies in MALDI experiments mainly depend on the matrix compound and the sample preparation technique employed. Uniform incorporation of peptides into a monolayer of matrix crystals generally improves ion signal intensities. Increasing analyte concentrations, however, often cause ion signal suppression. Furthermore, the primary structure of a peptide influences its ionization efficiency. Peptides containing arginine at their C-termini, for example, show higher ionization rates than peptides with a lysine residue at the C-terminal position [3]. Considering sample concentrations in the picomole to femtomole range on the target and the ablated sample volume during MALDI, it can be estimated that only a few attomoles of the analyte are needed for informative MS analysis. Nonetheless, high concentrations of salt, buffer, and detergents should be avoided as they impede analyte-matrix cocrystallization and cause ion suppression, both effects leading to a significant loss of sensitivity. High sensitivity and high mass resolution are obtained for peptide analysis in the mass range below 2500 Da by MALDI-MS. Mass analysis of peptide ions below 800 Da is impeded by the appearance of matrix ions and their abundant cluster ions in the MALDI spectra. Peak broadening and loss of sensitivity is usually observed for peptides and proteins at higher masses, due to unimolecular decay occurring to some extent during MALDI. The speed of MALDI analysis depends on the laser pulse rate. With the recent introduction of 200-Hz lasers, samples can be analyzed ten times faster than before. This development is especially advantageous for offline liquid chromatography (LC)/MALDI applications [4]. MALDI mass analysis is performed considerably
3
4 Mass Spectrometry of Peptides and Proteins
faster than LC separation, allowing for chromatographic separation runs conducted in parallel. In an off-line LC/MALDI-MS experiment, complex peptide samples are separated by LC, and while the different peptide species are eluting from the LC column, they are deposited in discrete fractions of low complexity with the matrix of choice on the MALDI target. The automated sampling of the effluent on the MALDI plate in discrete spots is facilitated by the use of targets containing hydrophilic surface sites, which significantly reduces spot size and also improves spotting precision. Offline LC/MALDI samples can be stored and provide the option to perform multiple analyses at different instrumental parameter settings. In recent years, two more approaches – MALDI imaging [5, 6, 8] and surfaceenhanced laser desorption/ionization (SELDI) [9, 10, 10, 12] – have emerged that focus on the discovery of polypeptide biomarkers abundant in the diseased state and either absent or only a minor presence in the healthy state. While the first method employs MALDI for molecular imaging of tissues or cells, SELDI is preferable to screen body fluids from patients by ion mass profiling combined with advanced statistical analysis. For MALDI imaging, a thin slice of, for example, tumor tissue is placed on the sample plate and coated with matrix. Then the laser scans over the surface to generate an ion map of the tissue. To identify molecular patterns diagnostic of the diseased state, images of ions with specific m/z ratios are reconstructed, revealing their distribution on the analyzed tissue slice. In SELDI, separation chemistry on the target is employed to enrich peptides from urine or serum samples, for example. SELDI has also been coupled to quadrupole TOF (qTOF) instruments, providing superior mass resolution and enabling a better distinction of ions similar in mass [10–12]. Both strategies, however, face the analytical challenge of the high complexity of biological samples, resulting in ion suppression, limited reproducibility in MS data and a strong bias against low-abundance constituents as well as polypeptides with higher molecular masses. Various publications have demonstrated the applicability of these discovery-driven MS methods to screening biological material in short analysis times [9, 12, 14]. However, a dependable differentiation between the diseased and normal states appears feasible only if a prominent, disease-distinctive set of biomarkers is detected with high reliability and adequate accuracy. 2.2 Electrospray Ionization
In ESI, the endergonic transfer of ions from solution into the gas phase is accomplished by desolvation. The electric field penetrates the analyte solution and separates positive and negative ions in an electrophoresis-like process. The positive charges accumulate on the surface of the droplets; when the surface tension is exceeded, the characteristic “Taylor cone” is formed and the microspray occurs [15]. At this point, the droplets are close to their stability limit (Rayleigh limit), which is determined by the Coulomb forces of the accumulated positive charges and the surface tension of the solvent. When the repelling Coulomb forces reach the cohesion forces due to solvent evaporation, nanometer-sized droplets are formed by
2 Ionization Principles
a cascade of spontaneous divisions (Coulomb explosion). The ions still present in the liquid phase become completely desolvated during the transfer into the mass spectrometer by further solvent evaporation. If the electric field on the surface of the droplets is high enough, the direct desorption of bare ions occurs. During the whole process, little additional energy is transferred to the ions, which therefore remain stable in the gas phase. In comparison with MALDI, ESI represents an even softer ionization technique and causes no fragmentation of analyte ions. Peptides and proteins subjected to ESI become protonated in the positive mode, while the number of multiple charges increases with the molecular weight of the biomolecules [16]. If the mass spectrometer provides sufficient mass resolution, the charge state of a compound can be directly derived from the difference of m/z values of its isotopes in the mass spectrum (0.5 Da mass differences for doubly charged peptides and about 0.3 Da mass differences for triply charged peptides). A major benefit of the generation of multiply charged ions from polypeptides is that they are measured at a proportional fraction of their molecular weight, and can therefore be readily analyzed with a less sophisticated instrument of limited mass range. ESI takes place at atmospheric pressure. The ions produced are efficiently transmitted through a small orifice into the ion optics of the mass spectrometer. The desolvation of ions is assisted by nitrogen gas which streams from the interface into the region where ionization occurs. This curtain gas stream also hinders neutral molecules entering the high vacuum region. The ESI source basically consists of a capillary tube which continuously transfers the analyte solution into an electric field. The capillary also acts as counter-electrode to the interface plate; a potential difference of about 2–6 kV is typically applied to generate a stable spray with flow rates in the range of nano- to microliters per minute. Since the ion current of a given compound correlates with its concentration over a wide range and is rather independent of the liquid flow rate, ESI sources operating at nanoliter flow rates (<50 nL/min) have become exceedingly popular, drastically reducing sample consumption with an enhancement of sensitivity. The analytical properties of the nano ESI ion source and its great potential for proteomic research were described by Wilm et al. [17, 18]. This source basically consists of a very small, metal-coated glass capillary with an inner diameter of about 1 µm micrometer at the tip. When loading the capillary with a sample volume of 1 µL dissolved in 50 : 50 MeOH : H2 O, for example, it can be analyzed for more than 20 min without loss of sensitivity; ionization efficiency is increased by a factor of about 100 compared to conventional ESI. This setup enables the MS analysis of an enzymatic protein digest in the low femtomole concentration range [18]. Sensitivity in mass analysis can even be increased by coupling a nano ESI source with a high-performance liquid chromatography (HPLC) system operating at a flow rate of about 200 nL/min [19, 20]. An additional benefit of online HPLC/ESI-MS is that sample cleanup, analyte concentration and separation are accomplished in an automated process. However, polypeptides of high hydrophobicity are often retained on reversed-phase columns using a standard gradient and will therefore not be transferred into the MS device for mass analysis. Additionally, very hydrophobic peptides and proteins
5
6 Mass Spectrometry of Peptides and Proteins
(e.g., membrane proteins) tend to precipitate in the small glass capillaries during the electrospray process, which leads to the breakup of the spray. Another problem concerns the suppression of ion formation from an analyte caused by another analyte or by the presence of another constituent (e.g., buffer) in the solution. Since the MS response significantly depends on solvent and sample composition, ion signal intensities of a given analyte do not necessarily correlate with its concentration in the sample. For the quantitative analysis of an analyte of interest by ESI-MS (the same is true for MALDI-MS), the use of an adequate internal standard is therefore mandatory.
3 Mass Spectrometric Instrumentation
The promising use of various MALDI- and ESI-MS approaches in proteome research has accelerated many developments in the instrumental design and concepts of mass spectrometry. Novel MS techniques and intriguing combinations of existing instruments have emerged. Nonetheless, the basic principle of measuring the mass-to-charge ratios of analytes remains. In general, it is mainly the process of ion formation that determines the mass analyzer used. MALDI is commonly coupled to TOF analyzers, while ESI is coupled to quadrupole or ion-trapping instruments [21, 22]. Yet novel MS techniques have been developed for advanced proteomic research which allow for the efficient transfer of MALDI or ESI ions into a mass analyzer almost independent of its working principle. Owing to the soft ionization characteristics of MALDI and ESI, mass-tocharge ratios of intact molecules can be readily measured. To obtain information on the primary structure of polypeptides by MS, however, more sophisticated experiments within the MS device have to be performed. Amino acid sequence-specific data on polypeptides can be generated with mass spectrometers that allow for ion isolation, gas phase peptide fragmentation, and fragment ion analysis. Sequencespecific fragmentation of isolated peptide ions by collision-induced dissociation (CID) requires the combined use of two mass analyzers with the same or two different ion separation principles, commonly referred to as tandem MS instruments. Exceptions are ion trap instruments and MALDI-TOF MS. The first allows for the performance of multiple consecutive cycles of peptide ion fragmentation (MSn ) in a single-stage device [23], whereas in the latter peptide ions undergo unimolecular decay in the field-free drift tube commonly referred to as postsource decay (PSD) [24]. In this article, a description of the working principles of TOF as well as TOF/TOF and qTOF analyzers is provided. The latter two, illustrated in Figure 1, are hybrid instruments and have already had a big impact on proteomic research. The TOF analyzer is a very robust, simple, and fast instrument, which justifies its widespread use in proteomics research. In this technique, a set of ions is accelerated from the ion source into the field-free drift tube. Since the same constant acceleration voltage is applied, the flight-time of ions down the tube is determined
3 Mass Spectrometric Instrumentation
Fig. 1 Schematic representations of a timeof-flight/time-of-flight (TOF/TOF) instrument with a matrix-assisted laser desorption/ionization (MALDI) source (a), and a quadrupole time-of-flight (qTOF) instrument which can be interchangeably coupled
with a MALDI or electrospray ionization (ESI) source (b). In both instrumental setups, peptide ions can be fragmented via collision-induced dissociation (CID) in order to obtain specific information on peptide amino acid sequences.
7
8 Mass Spectrometry of Peptides and Proteins
by their kinetic energy and m/z ratio. Generally, the flight-time of ions at a given charge state is inversely proportional to their molecular mass, with smaller ions arriving earlier at the detector than higher mass ions. However, small differences in the kinetic energy of ions with the same m/z ratio (and therefore in the velocity of these ions) can cause a substantial deterioration of mass resolution in TOF MS analysis. To effectively correct for the initial spatial and kinetic energy distribution of MALDI ions (and therefore to improve mass resolution), the concepts of delayed extraction and reflectron TOF have successfully been established in current MALDITOF instruments. In proteomics, masses of intact peptides from a protein digest are normally measured by MALDI-TOF to acquire a characteristic peptide mass fingerprint (PMF) spectrum. To obtain sequence information on peptide ions with a single stage TOF analyzer, MALDI-PSD analysis is performed in the reflectron ion mode. To increase the rate of unimolecular decay in PSD experiments, the sample is ionized at increased laser energies. As a direct consequence, a decrease in mass resolution is observed in MALDI fragmentation spectra. Further problems involved in MALDI-PSD analysis relate to the limited ability to select precursor ions within a narrow mass window and to general constraints in peptide ion fragmentation by unimolecular decay. These limitations were addressed by the development of the MALDI-TOF/TOF mass spectrometer [10] Figure 1. In this tandem MS instrument, precursor ion selection is performed in the first TOF analyzer, whereas fragment ion mass analysis takes place in the second reflector–TOF instrument. A collision cell is placed between the two mass analyzers and allows for true MS/MS analysis of isolated MALDI ions by CID (Figure 1). To retain the capability of high-throughput peptide mass analysis, ions are generated with a fast pulsing laser. However, MS/MS analysis with a MALDI-TOF/TOF instrument is not significantly faster than with other types of tandem MS instruments such as ion traps. In addition to this vital improvement in MALDI-TOF MS, the concept of orthogonal ion acceleration had great impact on the development of a novel instrumental
Fig. 1 a In the MALDI-TOF/TOF analyzer, ions are accelerated to high kinetic energy and separated according to their mass-tocharge (m/z) ratios along their flight in the second reflectron TOF analyzer (TOF2) and subsequently detected. For mass spectrometric/mass spectrometric (MS/MS) analysis, peptide ions of specific m/z ratios are selected in the first TOF analyzer (TOF1), collisionally activated and fragmented in the collision cell. The fragment ions (F+ ) generated are subsequently separated in the TOF2 section followed by detection; neutral fragments (Fn ) are lost. b The qTOF instrument combines three quadrupole sections (Q0, Q1, and Q2) with a reflec-
tor TOF analyzer for mass analysis of ESI or MALDI ions. Q0 is part of the ion optics for focusing of ions generated by MALDI or ESI. Peptide ions of interest are isolated in Q1 and fragmented by CID in Q2, which is used as collision cell. Fragment ions (F+ ) formed are then perpendicularly accelerated into the reflectron TOF analyzer to acquire the corresponding MS/MS spectrum. For extensive fragmentation of MALDI peptide ions, argon is generally used as collision gas irrespective of the instrumental design. Peptide ions formed by ESI can be readily fragmented with nitrogen or argon as collision gas in qTOF MS/MS experiments.
4 Protein Identification Strategies
configuration, the qTOF analyzer [26, 27] (Figure 1). In this instrumental design, the ion source is decoupled from the TOF analyzer by extensive gas collisions in the quadrupole stages and well defined ion packages are accelerated perpendicularly into the TOF analyzer. MALDI and ESI ion sources can be interchangeably coupled to the qTOF instrument. For MS/MS analysis, ions of a particular m/z ratio are selected in the Q1 quadrupole and fragmented by collisional activation with nitrogen or argon atoms in the collision cell (Q2). The masses of fragment ions generated are then analyzed by the reflector TOF analyzer (Figure 1b). Peptides at the low femtomole level can be measured with high mass resolution of about 10 000 full width at half maximum (FWHM) and with high mass accuracy of about 10 ppm in both MS and MS/MS modes of this instrument. Its usefulness in proteomic research was early demonstrated [28]. In current proteomics, the TOF and hybrid TOF instruments as well as three-dimensional ion trap analyzers are predominantly used. Regarding new types of instruments, linear ion traps either as stand-alone devices [29] or in combination with two quadrupoles [30] or Fourier transform ion cyclotron resonance (FTICR)-MS [31] will certainly take a leading role in future proteomic research.
4 Protein Identification Strategies
In proteomics, two main strategies are followed for the identification of proteins in complex samples as illustrated in Figure 2. The first employs two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) to greatly reduce the complexity of biological samples prior to peptide MS analysis. In the second strategy, limited protein separation by one-dimensional gel electrophoresis (1-D PAGE) is performed and then combined with online nano HPLC/ESI-tandem MS analysis of peptides in an automated fashion. Alternatively, protein mixtures can be separated by LC [e.g., strong cation exchange (SCX) chromatography]. In both strategic tracks, proteins are converted to a set of peptides by enzymatic digestion prior to MS analysis. Although the sample complexity is increased again by protein digestion, the main reasons to subject peptides instead of proteins to MS analysis are based on the facts that (1) MS sensitivity is best in the nominal mass range below 2500 Da, (2) informative sequence information is most readily obtained from peptides of up to 20 amino acid residues via MS/MS and (3) peptides are more soluble than proteins and therefore easier to handle, to chromatographically separate and to electrospray. Visualized protein spots in a 2-D gel are cut out and separately digested ingel with a sequence-specific protease (e.g., trypsin). Subsequently, the molecular masses of the proteolytic peptide mixture generated from a single protein spot are analyzed by MALDI coupled with a TOF, TOF/TOF, qTOF or high-resolution FTICR instrument to acquire the corresponding PMF spectrum (Figure 2). In this method (also termed peptide mass mapping), the identity of a precursor protein can be determined using bioinformatic tools by matching the experimental peptide masses with lists of in-silico generated peptide masses of proteins cataloged in
9
10 Mass Spectrometry of Peptides and Proteins
Fig. 2 Schematic overview of two protein identification strategies commonly followed in proteomics. Protein samples are separated by either two-dimensional (2-D) or one-dimensional (1-D) polyacrylamide gel electrophoresis (PAGE). In both strategic tracks, proteins are converted into a set of peptides by enzymatic digestion (e.g., with trypsin) prior to MS analysis. Peptide mass fingerprinting (PMF) by MALDI MS is predominantly used to identify proteins sepa-
rated by 2-D PAGE. In contrast, protein mixtures only partially separated by 1-D PAGE are analyzed by nano high-performance liquid chromatography (nano HPLC) coupled online to ESI tandem MS (ESI MS/MS). For protein identification, peptide mass maps or sequence specific data obtained are used to search against databases via bioinformatic tools (see text for further details on both strategies).
a database. The specificity and therefore the probability of protein identification with high reliability can be improved by sequence analysis of selected peptides by MALDI-MS/MS with a TOF/TOF or qTOF instrument. This also provides the possibility of coupling LC separation of complex proteolytic peptide mixtures with MALDI-TOF MS analysis (offline LC/MALDI). If a protein spot cannot successfully be identified by MALDI analysis, peptide sequencing by automated nano HPLC/ESI MS/MS is an alternative. This provides superior sensitivity due to the preconcentration of peptides on trapping columns [20] and delivers considerably more sequence-specific information at the peptide level. However, nano HPLC coupled online with ESI-tandem MS analysis is much more time-consuming than MALDI-PMF. Therefore, it is predominantly used to efficiently identify complex protein samples which have only been separated partially, e.g., by 1-D PAGE. This case represents the second protein identification strategy illustrated in Figure 2. Visualized protein bands are excised, followed by protein digestion and separation of the proteolytically generated peptide mixtures by nano HPLC. While peptides are eluting from the chromatographic column, they are directly transferred into the nano ESI ion source via a fused silica capillary to become ionized and analyzed by MS/MS-capable instruments (e.g., ion traps or qTOF). Fragmentation of peptides is typically performed at low energy CID conditions in order to induce cleavages of the amide bonds between the amino acid residues (for a detailed review about peptide sequencing see Steen and Mann, [32]). An alternative strategy, known as the shotgun approach, is to initially digest a protein mixture with a sequence-specific enzyme and subsequently to subject the highly complex proteolytic peptide mixture to 2-D chromatographic separations (e.g., SCX/reversed phase) coupled online to ESI-MS/MS [33–36]. In all
5 Quantitative Mass Spectrometry for Comparative and Functional Proteomics
these cases, algorithms such as Sequest or Mascot are used for large-scale protein identification by searching sequence databases using the MS/MS data obtained in the experiments. However, certain criteria have to be applied to ensure high reliability of the identification of proteins using bioinformatics tools. It should be noted that generally only a fraction of an entire protein sequence is identified, which usually suffices for identification but does not allow the complete characterization of an entire protein. More sophisticated strategies are necessary to detect posttranslationally modified and/or N- or C-terminally processed protein sequences.
5 Quantitative Mass Spectrometry for Comparative and Functional Proteomics
Comparative MS-based proteomics has emerged as a promising tool in the field of drug discovery. To reveal potential drug targets or the effects of drugs on biological systems, a set of diseased and control, or perturbed and nonperturbed samples is compared. Quantitative protein analysis represents a targeted proteomic approach as it aims at identifying only those proteins which show altered expression profiles or undergo changes in the relative degree of posttranslational modification. In addition, new strategies have been developed to harness quantitative MS for the study of protein-protein interactions as discussed later in this chapter. Since the latter approach allows the placing of proteins of interest into a functional context, it in particular offers great potential for the discovery of new targets in drug research. In quantitative proteomics, two alternative strategies have been developed. The first one is based on 2-D PAGE combined with mass spectrometry for protein identification [37]. This method and current advances in the differential display of proteins by 2-D PAGE are discussed at length in another chapter of this book. The second approach exploits mass spectrometry in combination with stable isotope labeling for gaining accurate quantitative information on proteins. Quantitative MS via stable isotope labeling of proteins or peptides represents a universal tool applicable to samples derived from cell lines or tissues. This strategy has also provided shotgun methods capable of quantitation and identification in the same experiment [35, 36, 38, 39]. Typically, a differentially labeled set of samples is combined in a 1: 1 ratio prior to MS analysis. Relative quantitative information on proteins is obtained from mass spectra of isotopecoded peptides. Since MS is not an absolutely quantitative method, the determination of the absolute amount of an analyte is only achievable with respect to an adequate internal standard. To this end, a stable isotope-coded peptide representing the endogenous peptide is synthesized in its heavy form and used as internal standard that is added to the sample in a defined quantity [40]. MS measurement of the ratio of endogenous to synthetic peptide then allows for absolute quantitation of the former. This method is also applicable to the quantitative determination of the degree of posttranslational modification in a protein of interest [41]. Peptide synthesis is time-consuming, though, and instead
11
12 Mass Spectrometry of Peptides and Proteins
of studying a protein that is already known, the aim is often to quantify unknown proteins in a complex biological system. This idea triggered the development of several stable isotope labeling techniques for comparative proteomic studies, using 2 H, 13 C, 15 N or 18 O isotopes [42–46]. Differentially labeled samples are simultaneously analyzed by MS and quantitative information is derived by comparing ion signal intensities or peak areas of peptide pairs observed in the corresponding mass spectra. A benefit of using 13 C, 15 N or 18 O isotopes is that the peptides present in the “heavy” isotopic forms do not exhibit significant isotope-dependent chromatographic shifts from their “light” counterparts, thus improving the precision for quantifying peptide abundances by LC/ESI-MS. Stable isotope-enriched media can be used to directly impart a specific isotope signature to proteins without structural changes. Alternatively, isotope-coded tags can be introduced at the peptide or protein level depending on the labeling method applied. Therefore, current stable isotope labeling methods fall in two major categories: metabolic (in vivo) labeling and chemical (in vitro) labeling Figure 3 shows an overview of the different approaches to relative quantitative analysis of proteins by stable isotope labeling and MS (see text below for details). Irrespective of the method chosen, the complete labeling of proteins (≥ 95%) is a prerequisite to obtaining quantitative information on proteins with high accuracy.
6 Metabolic Labeling Approaches
Metabolic labeling is an in situ method which employs 15 N or isotope-encoded (13 C or 2 H) amino acids. Organisms are grown in controlled media under defined conditions in order to ensure translation of mRNA entities to uniformly isotopically encoded proteins. Consequently, the approach is generally restricted to cell culture systems and is not applicable to quantitative MS analysis of proteins derived from body fluids or tissue biopsy. Since the internal standard is generated in situ, differentially labeled cells can be pooled directly after harvest, which allows for consistent sample processing, assuring accurate protein quantitation. In addition, metabolic labeling provides the investigator with the possibility of effectively reducing the complexity of biological samples at the protein level via gel electrophoresis or liquid chromatography. Separation prior to enzymatic digestion is advantageous as it generally improves qualitative as well as quantitative information on complex protein mixtures obtained by MS analysis. 6.1 15 N Labeling
In 1999, the concept of using 14 N/15 N-containing media for relative quantitative MS analysis of proteins in yeast was introduced [47]. In this work, wild-type versus mutant yeast cells were compared. Identification and quantitation of individual proteins were performed simultaneously by MS analysis. The reported approach
6 Metabolic Labeling Approaches
Fig. 3 Schematic representation of different stable isotope labeling strategies used for determination of protein concentration ratios by MS. Metabolic labeling: heavy isotopes are incorporated in situ into the proteome of a cell population (state 2) by using cell culture medium enriched in 15 N atoms (> 96%) or containing stably isotopi-cally labeled amino acids (e.g., 13 C6 arginine). A second cell population (state1) is grown in normal medium (light), which is used as control for relative quantitation experiments. Light- and heavy-labeled cell populations are immediately mixed in a 1:1 ratio after cell harvest. Subsequently, cells are jointly lysed, proteins are extracted and subjected to any further purification and/or separation steps. Alternatively, cells from two conditions can be initially lysed and then combined to be further processed. The proteins are then digested with a sequencespecific protease and proteolytic peptide
mixtures generated are subjected to MS analysis. Peptide ratios can be obtained from MS spectra, which eventually allows relative quantification of protein concentrations in two different cell states. Chemical labeling: proteins or peptides extracted from cells of two different states are differentially labeled via chemical reaction with stably isotopically encoded tags (light and heavy). Most tagging reagents specifically react with cysteine residues or primary amino groups in proteins or peptides. Once the label is introduced, samples from two conditions can be mixed in a 1:1 ratio, enzymatically digested and relatively quantified by MS analysis. Labeling at the protein level provides the investigator with the possibility of jointly separating differentially labeled samples by gel electrophoresis or liquid chromatography. This greatly reduces the sample complexity prior to digestion, which facilitates relative quantitation via MS analysis.
13
14 Mass Spectrometry of Peptides and Proteins
also provided information on the level of phosphorylation of the protein kinase Ste20. In the same year, Pasa-Tolic et al. demonstrated quantitative profiling of Escherichia coli proteins on a large scale via in vivo stable isotope labeling combined with intact mass tag analysis by high-resolution FTICR-MS [48]. The idea of using accurate mass tags for high-throughput quantitation was further exploited for the analysis of whole cell lysate from two populations of bacteria, one grown in normal medium and one in medium enriched in 15 N (> 96%) [49–51]. To simplify quantitative profiling of proteolytic peptide mixtures from 14 N/15 N-encoded proteins in whole cell lysates, Conrads et al. specifically enriched cysteine-containing peptides via a thiol-reactive affinity tag prior to FTICR-MS analysis [49]. However, a general drawback of cysteine tagging methods [such as the isotope-coded affinity tag (ICAT) technology discussed later in this chapter] is that a large protein fraction does not contain cysteine residues at all and often the cysteine content of the remaining fraction is low. The probability of analyzing a suitable set of isotopically encoded peptides to infer accurate quantitative information on precursor proteins is therefore significantly reduced. With increasing numbers of growth cycles, endogenous proteins become enriched in 15 N atoms when cells are cultured in 15 N media; eventually all 14 N atoms are replaced by 15 N. The corresponding mass shift of 14 N/15 N-labeled proteins or peptides can only be predicted if the amino acid sequences of the proteins are known. This certainly complicates quantitative evaluation of the mass spectral data acquired. Therefore, the use of high-resolution mass spectrometry, such as FTICR, has been recommended [49]. However, a suitable algorithm can predict the expected mass shift of peptides or proteins based on the proteome of the organism studied. 15 N- and 14 N-labeled peptides can easily be distinguished by their isotope distributions observed in mass spectra. Owing to the use of 15 N media of > 96% purity and therefore incomplete replacement of 14 N, the 15 N-labeled peptides exhibit additional isotope peaks in mass spectra. 2-D PAGE is often employed for high-resolution separation of 14 N/15 N-labeled cell lysates followed by MS-based identification and quantitation of distinct proteins. Alternatively, the applicability of a shotgun approach to determine concentration ratios of 14 N/15 N-encoded proteins from yeast on a large scale has been demonstrated [38, 39]. To support data analysis in shotgun proteomics, a correlation algorithm for automated protein quantitation has been developed [52]. In another approach for gel-free proteomics, Wang and coworkers reported an inverse labeling strategy to simplify protein quantitation by 15 N combined with MS [53, 55]. In this experimental approach, both perturbed and control samples are independently cultured in 14 Nand 15 N-enriched media, generating a set of four sample pools as opposed to two in a conventional labeling experiment. Differentially labeled control and perturbed samples are combined, processed, and analyzed by mass spectrometry. Peptide mass spectra obtained were then compared by subtractive analysis to reveal only those peptide pairs which show altered concentrations. This strategy provides confidence in detecting changes in protein concentrations. However, the workload and costs are doubled, so its necessity must be determined on a case by case situation.
6 Metabolic Labeling Approaches
In situ 15 N labeling has successfully been applied to studying bacteria, yeast, and some mammalian cells at moderate costs. Metabolic labeling of Caenorhabditis elegans by feeding 15 N-labeled E. coli cells [54] or 15 N labeling of potato plants for structural protein analysis [55] have also been reported. However, in vivo labeling of higher eukaryotic organisms is expensive and may significantly affect growth and protein expression profiles. A detailed study of the effects of 15 N labeling on these systems has not yet been reported. 6.2 Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC)
Another very elegant in situ method for accurate quantitative assessment of protein or peptide abundances between two given cell lines is the stable isotope labeling by amino acids in cell culture (SILAC) [56]. Mann and coworkers grew mammalian cell lines in media lacking the essential amino acid leucine, but supplemented with its nonradioactive, deuterated form (2 H3 -Leu) [56]. Analysis of cell morphology showed that 2 H3 -Leu caused no differences in the growth of the cells. After five cell cycles, proteins were completely labeled with 2 H3 -Leu. Jiang and English reported metabolic labeling of a yeast strain auxotroph for leucine that was grown on synthetic complete media containing natural abundance Leu or 2 H10 -Leu [57]. Before this concept was harnessed for quantitative proteomics, incorporation of 13 C/15 N/2 H-encoded amino acids during cell culture was utilized to increase confidence in mass fingerprint database searches [58]. To prevent chromatographic differences in peptides containing 1 H/2 H isotopes, the SILAC approach was successfully extended to the use of 13 C6 -arginine [59], improving the accuracy of quantitation. When trypsin is used for protein digestion (note that trypsin cleaves proteins on the carboxy-terminal sides of arginine and lysine residues) in a 13 C6 -arginine experiment, the label is localized at the Cterminus of arginine-containing peptides, while lysine peptides are unlabeled and therefore do not provide quantitative information in peptide mass spectra. Fenselau and coworkers successfully used metabolic labeling with 13 C6 -arginine and 13 C6 lysine for comparative proteomics in drug-resistant and parental breast cancer cell lines [60]. This double isotope labeling approach was previously used to quantify changes in protein phosphorylation in resting versus activated human embryonic kidney cells [61]. Since both 13 C6 -lysine and 13 C6 -arginine were employed during culturing, tryptic digestion of proteins resulted in proteolytic peptide mixtures in which all peptides were labeled at the C-terminus, except for the original C-terminus of the proteins. This is certainly a considerable improvement to the SILAC approach, since it is essential to obtain an adequate set of peptide pairs from each protein for accurate relative quantitation, although there may still be variability from protein to protein. It has been demonstrated very recently that SILAC is applicable to multiplex experiments in order to measure dynamics of protein abundances in cells in response to stimuli such as growth factors or drugs [62]. Henrietta Lacks (HeLa) cells were grown in normal medium or in media lacking normal arginine and
15
16 Mass Spectrometry of Peptides and Proteins
supplemented with either 13 C6 14 N4 - or 13 C6 15 N4 -arginine. After harvesting the cells, differentially labeled cell populations were pooled in a 1 : 1 : 1 ratio for combined sample processing followed by triplicate LC/ESI-MS/MS peptide analyses to test for technical reproducibility. Interestingly, standard deviations obtained in quantitative measurements do vary significantly between proteins, even though the same number of peptide pairs was evaluated. Nevertheless, it was shown that metabolic labeling allows for reliable measurements of alterations in protein abundances of 1.5- to 2-fold in a single biological experiment. To account for biological variations in protein expression profiles, at least three independent biological experiments must be performed. The SILAC approach has also been used to investigate metastatic prostate cancer development at the protein level [63]. The fact that proteins showed altered concentration ratios by quantitative MS was confirmed by western blotting. In addition, proteomic approaches for quantitation of protein phosphorylation via SILAC combined with MS analysis have been described [61, 64, 65]. A recent study reports on identification as well as relative quantitation of in vivo methylation sites of proteins in HeLa cells by stable isotope labeling with 13 C2 H3 -methionine [66]. As shown by the examples given above, SILAC is a very flexible, simple, and accurate procedure for relative quantitation that provides deeper insight into biological systems based on cell culture.
7 Chemical Labeling Approaches
Chemical reactions are used to introduce isotope-coded tags into specific sites in proteins and peptides after sample collection, i.e., following tissue biopsy or cell harvest. In vitro labeling of biomolecules has been achieved in a variety of ways, including enzymatically directed labeling and incorporation of isotope tags prior to or following protein digestion (Figure 3). 7.1 Chemical Isotope Labeling at the Protein Level
Gygi et al. described an approach for accurate quantitation and concurrent identification of individual proteins within complex mixtures from yeast using chemical reagents termed isotope-coded affinity tags (ICATs) and tandem MS [67]. The ICAT reagent consists of a iodoacetyl moiety reacting with cysteine residues, an isotopeencoded linker (1 H8 or 2 H8 ) and a biotin tag. In this method, a set of two samples is differentially tagged and combined in a 1 : 1 ratio, the proteins are separated and then are enzymatically digested. The major innovation here was the selective enrichment of cysteine-containing peptides by the affinity tag, reducing the sample complexity by a factor of about 10 prior to MS analysis. Since the ICAT-encoded peptide pairs differ by at least 8 Da, this labeling method is applicable to MS analysis with low resolution instruments such as ion traps.
7 Chemical Labeling Approaches
The ICAT approach promised to be a widely applicable tool for comparative proteomics of cells and tissues. However, it fails for proteins which are cysteine-free and often the cysteine content of proteins is rather low, rendering them unavailable for accurate quantitative MS analysis. Following this first report on the ICAT strategy [67], a number of studies have further conveyed the ICAT idea in quantitative proteomic research [46, 68–70]. However, there are several drawbacks in the ICAT strategy: (1) ICAT reagents are relatively large (about 500 Da) and their addition to proteins may be sterically hindered; (2) prolonged reaction times are often needed to achieve complete incorporation of ICAT tags into proteins which may lead to partial derivatization of lysine, histidine, methionine, tryptophan, and tyrosine residues; and (3) the chemical stability of the thioether linkage towards molecular oxygen is low, which may result in uncontrolled partial cleavage of the label via βelimination. Smolka et al. systematically evaluated the ICAT method and addressed the need for optimization of labeling conditions [71]. However, even if complete labeling of samples can be achieved, the molecular masses of peptides bearing the ICAT label are significantly increased and the analysis of the corresponding MS/MS spectra (especially for small peptides of fewer than ten amino acids) is complicated due to additional fragmentation of the affinity tag. Nevertheless, improvements in ICAT technology have been made [46], such as the development of a photocleavable ICAT reagent bound to a solid support, thus eliminating the large biotin tag [72, 73], and the introduction of a modified version of the ICAT reagent that uses acid-labile isotope-coded extractants (ALICE) [74]. Finally, 13 C-containing ICAT reagents have been introduced to ensure coelution of peptides during LC separation. This new ICAT generation has been applied to the analysis of a cortical neuron proteome sample to identify proteins regulated by the antitumor drug camptothecin [75]. In 2003, a stable isotope labeling similar to ICAT was reported utilizing a membrane-impermeable biotinylating reagent, which contains a cleavable linker and specifically reacts with primary amino groups, i.e., the ε -amino group of lysine residues and the free amino termini of isolated intact proteins [76]. The capability of this reagent, available in the light and heavy forms for labeling and enrichment of lysine-containing peptides followed by relative quantitative assessment, was shown for two standard proteins. Labeling of proteins was performed under nondenaturing conditions which resulted in incomplete incorporation of the label. Certainly, this considerably limits the reproducibility and accuracy of the method and therefore its usefulness in quantitative proteomics. However, the authors proposed that the method designed would allow for targeting of cell surface proteins (without the need for subcellular fractionation) which could be useful in drug research [76]. Recently, Lottspeich and coworkers reported an alternative approach, termed isotope-coded protein label (ICPL), which is also specific to primary amino groups [77]. Since ICPL is based on stable isotope tagging at the protein level, it is applicable to any protein sample, including extracts from tissues or body fluids. The ICPL tag is an isotope-coded N-nicotinoyloxy-succinimide (Nic-NHS) and different versions such as 1 H4 /2 H4 - and 12 C6 /13 C6 -Nic-NHS have been synthesized for their use
17
18 Mass Spectrometry of Peptides and Proteins
in quantitative proteomics on a global scale, the latter ensuring coelution of the isotopic peptide pairs and therefore accurate quantitation by LC/MS analysis. So far, the efficiency of the approach has been demonstrated by comparative analysis of E. coli lysates differentially spiked with standard proteins [77]. Advantages of the ICPL method are the generally high abundance of lysine residues in proteins providing multiple labeling sites and an increase in MS sensitivity for labeled peptides which permits accurate quantitation and yields in improved sequence coverage of proteins. However, modification of proteins by Nic induces a strong shift in pH resulting in a drastic change in the migration of labeled proteins during 2-D PAGE with isoelectric focusing in the first dimension [77]. This may be advantageous in the analysis of very basic proteins (e.g., ribosomal proteins) but considerably reduces the resolution of 2-D PAGE for the bulk of proteins. 7.2 Stable Isotope Labeling at the Peptide Level
Various alternatives to the ICAT and ICPL technologies have been reported but these methods are consistently based on chemical tagging at the peptide level. In this chapter, only the most common and promising peptide labeling approaches will be discussed [45, 78]. Regnier and coworkers coined the term global internal standard technology (GIST) which employs N-acetoxy-[2 H3 ]succinimide or 2 H4 -succinic anhydride as acetylation reagents to modify all primary amino groups in peptides [53, 55, 79–83]. The applicability of GIST to the quantitation of phosphoproteins enriched via immobilized metal affinity chromatography (IMAC) has recently been demonstrated [84]. In this work, relative differences in phosphopeptide concentration between samples were derived from isotope ratio measurements of the peptide isoforms observed in mass spectra. One should bear in mind, however, that the modification of lysine residues reduces the basicity; consequently, peptide ionization efficiencies can be low. In addition, N-terminally blocked peptides containing no further lysine residues are not susceptible to quantitative MS analysis via GIST. To address this issue, a combination of GIST and labeling of the C-terminal carboxyl groups by 18 O has been proposed and tested for the comparison of the relative protein expression level in epidermal cells grown in the presence or absence of epidermal growth factor [85]. Proteolytic peptide labeling with 18 O isotopes was first reported by Schn¨olzer et al. for its use in de novo sequencing experiments. Chemical labeling of the Ctermini of peptides with 16 O/18 O isotopes leads to a doublet signature for the y-ion series in MS/MS spectra [86]. Since the b-ion series do not contain a label, MS/MS data interpretation is quite straightforward. The usefulness of 18 O labeling for de novo peptide sequencing was shown with MALDI-TOF [87], ESI-qTOF [88], ESI-ion trap [89], and ESI-FTICR [90] instruments. In 2000, Roepstorff and coworkers described 18 O labeling for peptide and protein quantitation [91]. However, the uncontrolled incorporation of one or two 18 O
7 Chemical Labeling Approaches
atoms into peptides and issues related to the overlap of two isotopic distributions complicated the calculation of peptide ratios from the MS spectra. Fenselau and coworkers successfully harnessed 18 O labeling for comparative proteomics using shotgun methods [92]. Generally, two protein samples are proteolytically digested in parallel, one in H2 16 O and the other one in H2 18 O. Alternatively, proteolytic digestion can be conducted prior to 18 O labeling, which increases the flexibility of this method [93]. The incorporation of 18 O atoms in the carboxy-termini of peptides is catalyzed by serine proteases (e.g., trypsin) via hydrolysis [92–94]. Double labeling can be obtained under adequate conditions (sufficient digestion times, high enzyme concentration and high enzyme activity) inducing a fixed mass shift of 4 Da into peptides. High as well as low resolution mass spectrometers can be employed for quantitative analysis of 16 O/18 O-labeled peptide pairs [95, 96]. 18 O labeling as a tool for proteomics was also evaluated by Figeys and coworkers [97] and a number of biologically relevant applications have been reported so far [98–104]. Stable isotope labeling methods commonly impart a mass difference as basis for relative quantitation by evaluation of peptide peaks in MS spectra. These methods are primarily designed for the quantitative analysis of a binary sample set. Even though the feasibility of multiplex protein quantitation from peptide mass spectra via SILAC [62] or ICPL [77] has been demonstrated, the further increase in the overall sample complexity has to be considered. Hence, high-resolution separation prior to MS analysis for the diminution of peak overlapping in peptide mass spectra is a prerequisite for accurate protein quantitation. To address this issue, a new generation of stable isotope tagging reagents, termed iTRAQ, has recently been introduced enabling the simultaneous quantitative comparison of a twofold up to a fourfold sample set [105]. The iTRAQ reagent is an N-hydroxysuccinimide ester which reacts with primary amines in peptides. It links an isotope-encoded tag consisting of a mass balance group (carbonyl) and a reporter group (based on N-methyl-pirazine) to peptides via the formation of an amide bond. In the iTRAQ method, the protein samples to be compared are separately digested with a protease, differentially labeled, pooled, and then jointly separated by chromatography followed by tandem MS analysis. Owing to the specific mass design of the 13 C, 15 N, and 18 O-encoded reporter and balance groups, the overall nominal mass of each of the four different iTRAQ reagents is kept constant and differentially labeled peptides therefore appear as single peaks in MS scans. When iTRAQ tagged peptides are subjected to MS/MS analysis, the amide bond of the tag fragments in a way similar to the cleavage of peptide bonds. The mass balancing carbonyl moiety, however, is lost with no retain of charge (neutral fragment) and liberates the reporter group as isotope-encoded fragment entities of 114.1–117.1 Da. The ion fragments (e.g., b- and y-ion series) of modified peptides are isobaric and fragmentation spectra obtained are usually improved with respect to ion signal intensities and sequence coverage. The relative concentration of the peptides can be deduced from the intensity ratios of reporter ions observed in the MS/MS spectra. The applicability of the iTRAQ method to multiplex quantitative protein profiling has been exemplified by
19
20 Mass Spectrometry of Peptides and Proteins
simultaneous relative and absolute quantitative analysis of wild-type yeast versus two isogenic mutant strains [105]. An inherent drawback of this methodology is that quantitative information is only obtained from those peptides which are subjected to MS/MS analysis. This fact stresses the necessity for efficient peptide separation via multidimensional chromatography (e.g., SCX/reversed-phase HPLC) prior to tandem MS analysis. Moreover, ion trap instruments cannot be employed in quantitative iTRAQ experiments since their peptide fragmentation spectra do not contain information on the low m/z range (low-mass cut off) in which reporter ions would appear (Doroshenko and Cotter, [106]). On the other hand, the use of fast scanning mass analyzers would be an advantage in order to increase the fraction of peptides from a total HPLC run that are subjected to MS/MS analysis. In a recent study, DeSouza et al. employed both iTRAQ and cleavable ICAT reagents in combination with multidimensional LC and tandem MS analysis to discover potential markers for endometrial cancer (EmCa) tissues [107]. As can be expected, the use of ICAT led to identification of a higher proportion of lower abundance signaling proteins while iTRAQ was clearly biased towards abundant sample constituents (i.e., ribosomal proteins and transcription factors). From this set of experiments, a total of nine potential markers for EmCa were discovered. Pyruvate kinase was found to be overexpressed in EmCa tissues using both iTRAQ and ICAT labeling [107]. In any case, the researcher should prefer stable isotope labeling methods which permit simple, specific, and complete incorporation of isotope tags at the earliest possible stage of sample processing. Furthermore, one should aspire to acquiring resolved peptide profiles and isotope patterns which imply the use of multidimensional separation technologies and high-resolution mass analyzers (e.g., FTICR, TOF/TOF, or qTOF). However, protein quantitation of high accuracy can also be achieved with low resolution mass spectrometers such as ion trap instruments [95]. In general, qTOF and TOF/TOF analyzers provide more accurate information on peptide abundances than ion trapping instruments due to space-charge limitations of the latter. The improvement in ion statistics and signal-to-noise ratios via prolonged scanning times is particularly beneficial for accurate quantitation of lower abundance constituents. While primarily it is the MS system used that determines accuracy and dynamic range of isotope ratio measurements, reproducibility strongly depends on the entire experimental strategy employed. The latter provides good reason for minimizing the number of separate sample processing steps, and thus for performing stable isotope labeling at the protein level (also in case chemical labeling is required for quantitative analysis of clinical and mammalian specimens). In terms of accuracy, it is essential to perform protein quantitation on the basis of multiple peptide pairs. Comparing stable isotope labeling followed by MS with differential 2-D PAGE using fluorescent dyes (DIGE technology), Heck and coworkers demonstrated that relative protein quantities measured with both methods are generally in good agreement [108]. It should be noted that for verification of altered ratios of protein concentrations in a biological system the analysis of independent repetitions (at least three biological experiments) is a must, irrespective of the quantitation method employed.
8 Quantitative MS for Deciphering Protein-Protein Interactions
8 Quantitative MS for Deciphering Protein-Protein Interactions
Stable isotope labeling in combination with mass spectrometry is on its way to revolutionizing the study of protein-protein interactions in cells covering both stable complexes as well as transient interactions. There is a variety of techniques available for the purification of protein complexes. In-depth information on purification strategies as well as protein complex analysis by MS means is provided in the literature [109–116]. In a conventional approach, the components of a purified complex and a control are separated by 1-D PAGE, proteins are subsequently stained and band patterns are compared. Distinct bands are cut out from the gel followed by in-gel protein digestion and peptide LC/ESI-MS/MS analysis for protein identification. The lists of the individual protein compositions are then compared to distinguish between background proteins and specific interacting partners in the purified complex. However, due to limitations in peptide MS/MS analyses of complex samples performed separately (e.g., MS/MS analysis of only a subset of peptides in a complex mixture), the nonidentification of proteins in a control does not provide convincing evidence for the specificity of protein-protein interactions and therefore further proof is required, e.g., by biochemical methods such as western blotting. To address this issue, quantitative MS methods can be utilized to allow for identification of specific interacting partners in protein complexes with high reliability. In a metabolic labeling approach, two distinct cell populations (e.g., one containing the bait protein fused with a tag for affinity purification, the other one representing the control) are differentially labeled with stable isotopes during culturing, and pooled in a 1:1 ratio followed by cell lysis and isolation of the protein complex via affinity purification (Figure 4). In contrast, in chemical labeling approaches using ICAT or peptide-specific reagents, affinity purification of protein complexes derived from cell cultures or tissues of two distinctive differentiation or developmental states is performed separately followed by stable isotope tagging of proteins or proteolytic peptides. In either case, samples are combined after completion of the labeling step and proteolytic peptide mixtures are jointly analyzed by LC/tandem MS (Figure 4). In such experiments, protein identification is obtained on the basis of sequence-specific information in peptide fragmentation spectra. In addition, specific interacting partners can be distinguished from copurifying proteins via ratio measurements of light peptides versus their heavy counterparts. True binding proteins become specifically enriched via affinity purification of complexes, and therefore peptide ratios are significantly greater than should be observed in mass spectra representing these proteins. In contrast, nonspecific binding partners are present in equal amounts, resulting in peptide ratios of about 1 (Figure 4). In the case of protein complex versus control, direct discrimination between bona fide and nonspecific interacting partners can be achieved by comparing ratios of peptide pairs. A major promise of this approach is that protein complex purification protocols can be simplified and mild washing conditions used; thus, the ability to screen for
21
22 Mass Spectrometry of Peptides and Proteins
Fig. 4 Schematic representation of the strategies followed for accurate mapping of protein complexes by stable isotope labeling combined with MS. Relative quantitative MS allows for discrimination between specific interacting partners and contaminants which are often copurified with the protein complex of interest. In the metabolic labeling approach, two cell populations are differentially labeled with stably isotopically encoded amino acids (e.g., 13 C -/12 C -arginine) during culturing. Fol6 6 lowing the labeling experiment, the heavy cell population, which contains the bait protein (termed B) fused with a tag for affinity purification, and the light cell population which is used as control are mixed in a 1:1 ratio followed by cell lysis and protein complex isolation. In the chemical labeling approach, cells from cultures or tissues of two distinctive differentiation or developmental states (termed states A and B) are sepa-
rately lysed followed by affinity purification of protein complexes and stable isotope tagging of proteins. Subsequently, the isotopically tagged protein samples can be pooled in a 1:1 ratio and further processed. In both approaches, proteolytic peptide mixtures are generated and then separated by liquid chromatography (LC) followed by qualitative as well as relative quantitative analysis using MS/MS. Specific interacting partners can be differentiated from copurifying proteins based on ion signal intensities of light peptides versus their heavy counterparts in MS spectra. Peptide ratios significantly greater than 1 point to the presence of specific interacting partners; peptides ratios of about 1 are usually distinctive for nonspecific interacting partners. Protein identification is performed on the basis of sequencespecific information observed in MS/MS spectra.
9 Conclusions
transient or weak interaction partners is improved. This has been demonstrated by Aebersold and coworkers who used ICAT technology combined with MS analysis to identify specific interacting partners of a large RNA polymerase II (Pol II) preinitiation complex (PIC) only partially purified from nuclear extracts by a single-step promoter DNA affinity procedure [117]. Comparative analysis of two or more isolated protein complexes looks for differences in protein composition. Using ICAT, dynamic changes in the composition and abundance of STE12 complexes immunopurified from yeast cells in different states [117] as well as in protein complexes associating with the MafK transcription factor upon erythroid differentiation in MEL cells [118] were detected. Pandey and coworkers have recently provided detailed instructions for using the SILAC approach to study protein complexes, protein-protein interactions, and the dynamics of protein abundance and posttranslational modifications [119]. This in situ labeling approach combined with affinity pull-down experiments was shown to be useful to study (1) signaling complexes involved in the epidermal growth factor receptor (EGFR) pathway of EGF-stimulated versus unstimulated HeLa cells [120]; (2) peptide-protein interactions in EGFR signaling [121] and (3) early events in EGFR signaling in a time-course experiment [122]. Hochleitner et al. have recently reported the determination of absolute protein quantities of the human U1 small nuclear ribonuclearprotein (snRNP) complex by using synthetic peptides and stable isotope peptide labeling combined with MS analyses [123]. The examples given above clearly demonstrate that quantitative MS-based proteomics is a powerful tool for shedding light on the stoichiometry, dynamics and assembly processes of functional protein modules in biological systems. Since drugs can equally be used as affinity baits, this methodology could also provide valuable information on cellular target proteins in drug research.
9 Conclusions
Biological mass spectrometry represents the core technology in proteome research. Instrumental designs and fragmentation techniques are still advancing at an expeditious pace. Novel proteomic strategies, with MS, biochemical as well as molecular cell biological techniques acting in concert, promise to dramatically extend our current knowledge of biological systems in the near future. In quantitative proteomics, it is crucial to consider the variability of biological systems in order to deliver meaningful results regarding protein quantities. This biological variability should not be underestimated but rather be addressed by independent repetitions and advanced statistical data analysis. Certainly, evaluation of large quantitative data sets is very laborious and only feasible using bioinformatics. In our laboratory, the current lack of suitable software tools for accurate large-scale determination of protein concentration ratios deduced from MS or MS/MS spectra was addressed by in-house development of an efficient software package supporting quantitative analysis of
23
24 Mass Spectrometry of Peptides and Proteins
data obtained in any experiment employing stable isotope labeling. Quantitative MS in combination with epitope tagging opens up an additional major area for future proteomic research: the mapping and characterization of functional protein modules on a large scale. This adds to the great promise of proteomics in drug research. Yet, the final impact of proteomics in the discovery of new drug targets cannot easily be foreseen. Acknowledgements
My special thanks go to Dr. Silke Oeljeklaus for helpful discussion and critical reading of the manuscript and to her and Sebastian Wiese for assistance in the design of the figures presented in this chapter. References 1 BARBER, M., BORDOLI, R. S., SEDGEWICK, R. D. and TYLER, A. N. (1981). Fast-atom bombardment of solids (F. A. B.): a new ion source for mass spectrometry. J. Chem. Soc. Chem. Commun. 293, 325–327. 2 KARAS, M., HILLENKAMP, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal. Chem. 60, 2299–2301. 3 MENG, C. K., MANN, M., FENN, J. B. (1988). On protons of proteins – a beams a beam for a that. Z. Phys. D 10, 361–368. 4 KRAUSE, E., WENSCHUH, H., JUNGBLUT, P. R. (1999). The dominance of arginine-containing peptides in MALDI-derived tryptic mass fingerprints of proteins. Anal. Chem. 71, 4160–4165. 5 ERICSON, C., PHUNG, Q. T., HORN, D. M., PETERS, E. C., FITCHETT, J. R., FICARRO, S. B., SALOMON, A. R., BRILL, L. M., BROCK, A. (2003). An automated noncontact deposition interface for liquid chromatography matrix-assisted laser desorption/ionization mass spectrometry. Anal. Chem. 75, 2309–2315. 6 CHAURAND, P., CAPRIOLI, R. M. (2002). Direct profiling and imaging of peptides and proteins from mammalian cells and tissue sections by mass spectrometry. Electrophoresis 23, 3125–3135. 7 HUTCHENS, T. W., YIP, T. T. (1993). New desorption strategies for the mass
8
9
10
11
12
spectrometric analysis of macromole-cules. Rapid Commun. Mass Spectrom. 7, 576–580. CHAURAND, P., SCHWARTZ, S. A., CAPRIOLI, R. M. (2002). Imaging mass spectrometry: a new tool to investigate the spatial organization of peptides and proteins in mammalian tissue sections. Curr. Opin. Chem. Biol. 6, 676–681. STOECKLI, M., CHAURAND, P., HALLAHAN, D. E., CAPRIOLI, R. M. (2001). Imaging mass spectrometry: a new technology for the analysis of protein expression in mammalian tissues. Nat. Med. 7, 493–496. MEDZIHRADSZKY, K. F., CAMPBELL, J. M., BALDWIN, M. A., FALICK, A. M., JUHASZ, P., VESTAL, M. L., BURLINGAME, A. L. (2000). The characteristics of peptide collision-induced dissociation using a high-performance MALDI-TOF/TOF tandem mass spectrometer. Anal. Chem. 72, 552–558. PETRICOIN III, E. F., ORNSTEIN, D. K., PAWELETZ, C. P., ARDEKANI, A., HACKETT, P. S., HITT, B. A., VELASSCO, A., TRUCCO, C., WIEGAND, L., WOOD, K., et al. (2002). Serum proteomic patterns for detection of prostate cancer. J. Natl. Cancer Inst. 94, 1576–1578. PETRICOIN, E. F., ARDEKANI, A. M., HITT, B. A., LEVINE, P. J., FUSARO, V. A., STEINBERG, S. M., MILLS, G. B., SIMONE, C., FISHMAN, D. A., KOHN, E. C., LIOTTA,
References
13
14
15
16
17
18
19
20
21
22
L. A. (2002). Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359, 572–577. TANG, N., TORNATORE, P., WEINBERGER, S. R. (2004). Current developments in SELDI affinity technology. Mass Spectrom. Rev. 23, 34–44. CAPUTO, E., MOHARRAM, R., MARTIN, B. M. (2003). Methods for on-chip protein analysis. Anal. Biochem. 321, 116–124. GUO, J., YANG, E. C., DESOUZA, L., DIEHL, G., RODRIGUES, M. J., ROMASCHIN, A. D., COLGAN, T. J., SIU, K. W. (2005). A strategy for high-resolution protein identification in surface-enhanced laser desorption/ ionization mass spectrometry: Calgranulin A and chaperonin 10 as protein markers for endometrial carcinoma. Proteomics 5, 1953–1966. YANAGISAWA, K., SHYR, Y., XU, B. J., MASSION, P. P., LARSEN, P. H., WHITE, B. C., ROBERTS, J. R., EDGERTON, M., GONZALEZ, A., NADAF, S., et al. (2003). Proteomic patterns of tumour subsets in non-small-cell lung cancer. Lancet 362, 433–439. WILM, M., MANN, M. (1994). Electrospray and Taylor-Cone theory, Dole’s beam of macromolecules at last? Int. J. Mass Spectrom. Ion Process 136, 167–180. FENN, J. B., MANN, M., MENG, C. K., WONG, S. F., WHITEHOUSE, C. M. (1989). Electrospray ionization for mass spectrometry of large biomolecules. Science 246, 64–71. WILM, M., MANN, M. (1996). Analytical properties of the nanoelectrospray ion source. Anal. Chem. 68, 1–8. WILM, M., SHEVCHENKO, A., HOUTHAEVE, T., BREIT, S., SCHWEIGERER, L., FOTSIS, T., MANN, M. (1996). Femtomole sequencing of proteins from polyacrylamide gels by nano-electrospray mass spectrometry. Nature 379, 466–469. BANKS, J. F. Jr. (1996). High-sensitivity peptide mapping using packed-capillary liquid chromatography and electrospray ionization mass spectrometry. J. Chrom. A 743, 99–104. MITULOVIC, G., SMOLUCH, M., CHERVET, J. P., STEINMACHER, I., KUNGL, A., MECHTLER, K. (2003). An improved
23
24
25
26
27
28
29
30
31
method for tracking and reducing the void volume in nano HPLC-MS with micro trapping columns. Anal. Bioanal. Chem. 376, 946–951. JONSSON, A. P. (2001). Mass spectrometry for protein and peptide characterisation. Cell Mol. Life Sci. 58, 868–884. MANN, M., HENDRICKSON, R. C., PANDEY, A. (2001). Analysis of proteins and proteomes by mass spectrometry. Annu. Rev. Biochem. 70, 437–473. JONSCHER, K. R., YATES III, J. R. (1997). The quadrupole ion trap mass spectrometer – a small solution to a big challenge. Anal. Biochem. 244, 1–15. SPENGLER, B. (1997). Post-source decay analysis in matrix-assisted laser desorption/ionization mass spectrometry of biomolecules. J. Mass Spectrom. 32, 1019–1036. KRUTCHINSKY, A. N., ZHANG, W., CHAIT, B. T. (2000). Rapidly switchable matrix-assisted laser desorption/ionization and electrospray quadrupole-time-of-flight mass spectrometry for protein identification. J. Am. Soc. Mass Spectrom. 11, 493–504. LOBODA, A. V., KRUTCHINSKY, A. N., BROMIRSKI, M., ENS, W., STANDING, K. G. (2000). A tandem quadrupole/time-of-flight mass spectrometer with a matrix-assisted laser desorption/ionization source: design and performance. Rapid Commun. Mass Spectrom. 14, 1047–1057. SHEVCHENKO, A., LOBODA, A., SHEVCHENKO, A., ENS, W., STANDING, K. G. (2000). MALDI quadrupole time-of-flight mass spectrometry: a powerful tool for proteomic research. Anal. Chem. 72, 2132–2141. SCHWARTZ, J. C., SENKO, M. W., SYKA, J. E. (2002). A two-dimensional quadrupole ion trap mass spectrometer. J. Am. Soc. Mass Spectrom. 13, 659–669. LE BLANC, J. C., HAGER, J. W., ILISIU, A. M., HUNTER, C., ZHONG, F., CHU, I. (2003). Unique scanning capabilities of a new hybrid linear ion trap mass spectrometer (Q TRAP) used for high sensitivity proteomics applications. Proteomics 3, 859–869.
25
26 Mass Spectrometry of Peptides and Proteins
32 WILCOX, B. E., HENDRICKSON, C. L., MARSHALL, A. G. (2002). Improved ion extraction from a linear octopole ion trap: SIMION analysis and experimental demonstration. J. Am. Soc. Mass Spectrom. 13, 1304–1312. 33 STEEN, H., MANN, M. (2004). The ABC’s (and XYZ’s) of peptide sequencing. Nat. Rev. Mol. Cell Biol. 5, 699–711. 34 LINK, A. J., ENG, J., SCHIELTZ, D. M., CARMACK, E., MIZE, G. J., MORRIS, D. R., GARVIK, B. M., YATES III, J. R. (1999). Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 17, 676–682. 35 OPITECK, G. J., LEWIS, K. C., JORGENSON, J. W., ANDEREGG, R. J. (1997). Comprehensive on-line LC/LC/MS of proteins. Anal. Chem. 69, 1518–1524. 36 WASHBURN, M. P., WOLTERS, D., YATES III, J. R. (2001). Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19, 242–247. 37 WOLTERS, D. A., WASHBURN, M. P., YATES III, J. R. (2001). An automated multidimensional protein identification technology for shotgun proteomics. Anal. Chem. 73, 5683–5690. 38 HAYNES, P. A., YATES III, J. R., (2000). Proteome profiling-pitfalls and progress. Yeast 17, 81–87. 39 WASHBURN, M. P., ULASZEK, R., DECIU, C., SCHIELTZ, D. M., YATES III, J. R. (2002). Analysis of quantitative proteomic data generated via multidimensional protein identification technology. Anal. Chem. 74, 1650–1657. 40 WASHBURN, M. P., ULASZEK, R. R., YATES III, J. R. (2003). Reproducibility of quantitative proteomic analyses of complex biological mixtures by multidimensional protein identification technology. Anal. Chem. 75, 5054–5061. 41 DESIDERIO, D. M., ZHU, X. (1998). Quantitative analysis of methionine enkephalin and beta-endorphin in the pituitary by liquid secondary ion mass spectrometry and tandem mass spectrometry. J. Chromatogr. A 794, 85–96. 42 GERBER, S. A., RUSH, J., STEMMAN, O., KIRSCHNER, M. W., GYGI, S. P. (2003).
43
44
45
46
47
48
49
50
51
Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl. Acad. Sci. USA 100, 6940–6945. AEBERSOLD, R., RIST, B., GYGI, S. P. (2000). Quantitative proteome analysis: methods and applications. Ann. N.Y. Acad. Sci. 919, 33–47. FLORY, M. R., GRIFFIN, T. J., MARTIN, D., AEBERSOLD, R. (2002). Advances in quantitative proteomics using stable isotope tags. Trends Biotechnol. 20, 23–29. GOSHE, M. B., SMITH, R. D. (2003). Stable isotope-coded proteomic mass spectrometry. Curr. Opin. Biotechnol. 14, 101–109. LILL, J. (2003). Proteomic tools for quantification by mass spectrometry. Mass Spectrom. Rev. 22, 182–194. TAO, W. A., AEBERSOLD, R. (2003). Advances in quantitative proteomics via stable isotope tagging and mass spectrometry. Curr. Opin. Biotechnol. 14, 110–118. ODA, Y., HUANG, K., CROSS, F. R., COWBURN, D., CHAIT, B. T. (1999). Accurate quantification of protein expression and site-specific phosphorylation. Proc. Natl. Acad. Sci. USA 96, 6591–6596. PASA-TOLIC, L., JENSEN, P. K., ANDERSON, G. A., LIPTON, M. S., PEDEN, K. K., MARTINOVIC, S., TOLIC, N., BRUCE, J. E., SMITH, S. R. (1999). High throughput proteome-wide precision measurements of protein expression using mass spectrometry. J. Am. Chem. Soc. 121, 7949–7950. CONRADS, T. P., ALVING, K., VEENSTRA, T. D., BELOV, M. E., ANDERSON, G. A., ANDERSON, D. J., LIPTON, M. S., PASA-TOLIC, L., UDSETH, H. R., CHRISLER, W. B., et al. (2001). Quantitative analysis of bacterial and mammalian proteomes using a combination of cysteine affinity tags and 15N-metabolic labeling. Anal. Chem. 73, 2132–2139. SMITH, R. D., ANDERSON, G. A., LIPTON, M. S., MASSELON, C., PASA-TOLIC, L., SHEN, Y., UDSETH, H. R. (2002). The use of accurate mass tags for high-throughput microbial proteomics. Omics 6, 61–90.
References 52 SMITH, R. D., PASA-TOLIC, L., LIPTON, M. S., JENSEN, P. K., ANDERSON, G. A., SHEN, Y., CONRADS, T. P., UDSETH, H. R., HARKEWICZ, R., BELOV, M. E., et al. (2001). Rapid quantitative measurements of proteomes by Fourier transform ion cyclotron resonance mass spectrometry. Electrophoresis 22, 1652–1668. 53 MACCOSS, M. J., WU, C. C., LIU, H., SADYGOV, R., YATES III, J. R. (2003). A correlation algorithm for the automated quantitative analysis of shotgun proteomics data. Anal. Chem. 75, 6912–6921. 54 WANG, S., ZHANG, X., REGNIER, F. E. (2002). Quantitative proteomics strategy involving the selection of peptides containing both cysteine and histidine from tryptic digests of cell lysates. J. Chromatogr. A 949, 153–162. 55 WANG, Y. K., MA, Z., QUINN, D. F., FU, E. W. (2002). Inverse 15N-metabolic labeling/mass spectrometry for comparative proteomics and rapid identification of protein markers/targets. Rapid Commun. Mass Spectrom. 16, 1389–1397. 56 KRIJGSVELD, J., KETTING, R. F., MAHMOUDI, T., JOHANSEN, J., ARTAL-SANZ, M., VERRIJZER, C. P., PLASTERK, R. H., HECK, A. J. (2003). Metabolic labeling of C. elegans and D. melanogaster for quantitative proteomics. Nat. Biotechnol. 21, 927–931. 57 IPPEL, J. H., POUVREAU, L., KROEF, T., GRUPPEN, H., VERSTEEG, G., van den PUTTEN, P., STRUIK, P. C., van MIERLO, C. P. (2004). In vivo uniform (15)N-isotope labelling of plants: using the greenhouse for structural proteomics. Proteomics 4, 226–234. 58 ONG, S. E., BLAGOEV, B., KRATCHMAROVA, I., KRISTENSEN, D. B., STEEN, H., PANDEY, A., MANN, M. (2002). Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell Proteomics 1, 376–386. 59 JIANG, H., ENGLISH, A. M. (2002). Quantitative analysis of the yeast proteome by incorporation of
60
61
62
63
64
65
66
67
68
69
isotopically labeled leucine. J. Proteome Res. 1, 345–350. CHEN, X., SMITH, L. M., BRADBURY, E. M. (2000). Site-specific mass tagging with stable isotopes in proteins for accurate and efficient protein identification. Anal. Chem. 72, 1134–1143. ONG, S. E., KRATCHMAROVA, I., MANN, M. (2003). Properties of 13C-substituted arginine in stable isotope labeling by amino acids in cell culture (SILAC) J. Proteome Res. 2, 173–181. GEHRMANN, M. L., HATHOUT, Y., FENSELAU, C. (2004). Evaluation of metabolic labeling for comparative proteomics in breast cancer cells. J. Proteome Res. 3, 1063–1068. IBARROLA, N., KALUME, D. E., GRONBORG, M., IWAHORI, A., PANDEY, A. (2003). A proteomic approach for quantification of phosphorylation using stable isotope labeling in cell culture. Anal. Chem. 75, 6043–6049. MOLINA, H., PARMIGIANI, G., PANDEY, A. (2005). Assessing reproducibility of a protein dynamics study using in vivo labeling and liquid chromatography tandem mass spectrometry. Anal. Chem. 77, 2739–2744. EVERLEY, P. A., KRIJGSVELD, J., ZETTER, B. R., GYGI, S. P. (2004). Quantitative cancer proteomics: stable isotope labeling with amino acids in cell culture (SILAC) as a tool for prostate cancer research. Mol. Cell Proteomics 3, 729–735. GRUHLER, A., OLSEN, J. V., MOHAMMED, S., MORTENSEN, P., FAERGEMAN, N. J., MANN, M., JENSEN, O. N. (2005). Quantitative phosphoproteomics applied to the yeast pheromone signaling pathway. Mol. Cell Proteomics 4, 310–327. IBARROLA, N., MOLINA, H., IWAHORI, A., PANDEY, A. (2004). A novel proteomic approach for specific identification of tyrosine kinase substrates using [13C]tyrosine. J. Biol. Chem. 279, 15805–15813. ONG, S. E., MITTLER, G., MANN, M. (2004). Identifying and quantifying in vivo methylation sites by heavy methyl SILAC. Nat. Methods 1, 119–126. GYGI, S. P., RIST, B., GERBER, S. A., TURECEK, F., GELB, M. H., AEBERSOLD, R.
27
28 Mass Spectrometry of Peptides and Proteins
70
71
72
73
74
75
76
77
(1999). Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17, 994–999. GRIFFIN, T. J., HAN, D. K., GYGI, S. P., RIST, B., LEE, H., AEBERSOLD, R., PARKER, K. C. (2001). Toward a high-throughput approach to quantitative proteomic analysis: expression-dependent protein identification by mass spectrometry. J. Am. Soc. Mass Spectrom. 12, 1238–1246. HAN, D. K., ENG, J., ZHOU, H., AEBERSOLD, R. (2001). Quantitative profiling of differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat. Biotechnol. 19, 946–951. SMOLKA, M., ZHOU, H., AEBERSOLD, R. (2002). Quantitative protein profiling using two-dimensional gel electrophoresis, isotope-coded affinity tag labeling, and mass spectrometry. Mol. Cell Proteomics 1, 19–29. SMOLKA, M. B., ZHOU, H., PURKAYASTHA, S., AEBERSOLD, R. (2001). Optimization of the isotope-coded affinity tag-labeling procedure for quantitative proteome analysis. Anal. Biochem. 297, 25–31. ZHOU, H., RANISH, J. A., WATTS, J. D., AEBERSOLD, R. (2002). Quantitative proteome analysis by solid-phase isotope tagging and mass spectrometry. Nat. Biotechnol. 20, 512–515. ZHOU, H., BOYLE, R., AEBERSOLD, R. (2004). Quantitative protein analysis by solid phase isotope tagging and mass spectrometry. Methods Mol. Biol. 261, 511–518. QIU, Y., SOUSA, E. A., HEWICK, R. M., WANG, J. H. (2002). Acid-labile isotope-coded extractants: a class of reagents for quantitative mass spectrometric analysis of complex protein mixtures. Anal. Chem. 74, 4969–4979. YU, L. R., CONRADS, T. P., UO, T., ISSAQ, H. J., MORRISON, R. S., VEENSTRA, T. D. (2004). Evaluation of the acid-cleavable isotope-coded affinity tag reagents: application to camptothecin-related
78
79
80
81
82
83
84
85
86
87
88
cortical neurons. J. Proteome Res. 3, 469–477. HOANG, V. M., CONRADS, T. P., VEENSTRA, T. D., BLONDER, J., TERUNUMA, A., VOGEL, J. C., FISHER, R. J. (2003). Quantitative proteomics employing primary amine affinity tags. J. Biomol. Tech. 14, 216–223. SCHMIDT, A., KELLERMANN, J., LOTTSPEICH, F. (2005). A novel strategy for quantitative proteomics using isotope-coded protein labels. Proteomics 5, 4–15. MORITZ, B., MEYER, H. E. (2003). Approaches for the quantification of protein concentration ratios. Proteomics 3, 2208–2220 CHAKRABORTY, A., REGNIER, F. E. (2002). Global internal standard technology for comparative proteomics. J. Chromatogr. A 949, 173–184. GENG, M., JI, J., REGNIER, F. E. (2000). Signature-peptide approach to detecting proteins in complex mixtures. J. Chromatogr. A 870, 295–313. REN, D., PENNER, N. A., SLENTZ, B. E., REGNIER, F. E. (2004). Histidine-rich peptide selection and quantification in targeted proteomics. J. Proteome Res. 3, 37–45. ZHANG, R., REGNIER, F. E. (2002). Minimizing resolution of isotopically coded peptides in comparative proteomics. J. Proteome Res. 1, 139–147. ZHANG, R., SIOMA, C. S., WANG, S., REGNIER, F. E. (2001). Fractionation of isotopically labeled peptides in quantitative proteomics. Anal. Chem. 73, 5142–5149. RIGGS, L., SEELEY, E. H., REGNIER, F. E. (2005). Quantification of phospho-proteins with global internal standard technology. J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 817, 89–96. LIU, P., REGNIER, F. E. (2002). An isotope coding strategy for proteomics involving both amine and carboxyl group labeling. J. Proteome Res. 1, 443–450. SCHNOLZER, M., JEDRZEJEWSKI, P., LEHMANN, W. D. (1996). Protease-catalyzed incorporation of 18O into peptide fragments and its application for protein sequencing by
References
89
90
91
92
93
94
95
96
electrospray and matrix-assisted laser desorption/ionization mass spectrometry. Electrophoresis 17, 945–953. MO, W., TAKAO, T., SHIMONISHI, Y. (1997). Accurate peptide sequencing by post-source decay matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Rapid Commun. Mass Spectrom. 11, 1829–1834. SHEVCHENKO, A., CHERNUSHEVICH, I., ENS, W., STANDING, K. G., THOMSON, B., WILM, M., MANN, M. (1997). Rapid ‘de novo’ peptide sequencing by a combination of nanoelectrospray, isotopic labeling and a quadrupole/time-of-flight mass spectrometer. Rapid Commun. Mass Spectrom. 11, 1015–1024. QIN, J., HERRING, C. J., ZHANG, X. (1998). De novo peptide sequencing in an ion trap mass spectrometer with 18O labeling. Rapid Commun. Mass Spectrom. 12, 209–216. KOSAKA, T., TAKAZAWA, T., NAKAMURA, T. (2000). Identification and C-terminal characterization of proteins from two-dimensional polyacrylamide gels by a combination of isotopic labeling and nanoelectrospray Fourier transform ion cyclotron resonance mass spectrometry. Anal. Chem. 72, 1179–1185. MIRGORODSKAYA, O. A., KOZMIN, Y. P., TITOV, M. I., KORNER, R., SONKSEN, C. P., ROEPSTORFF, P. (2000). Quantification of peptides and proteins by matrix-assisted laser desorption/ionization mass spectrometry using (18)O-labeled internal standards. Rapid Commun. Mass Spectrom. 14, 1226–1232. YAO, X., FREAS, A., RAMIREZ, J., DEMIREV, P. A., FENSELAU, C. (2001). Proteolytic 18O labeling for comparative proteomics: model studies with two serotypes of adenovirus. Anal. Chem. 73, 2836–2842. YAO, X., AFONSO, C., FENSELAU, C. (2003). Dissection of proteolytic 18O labeling: endoprotease-catalyzed 16O-to-18O exchange of truncated peptide substrates. J. Proteome Res. 2, 147–152. REYNOLDS, K. J., YAO, X., FENSELAU, C. (2002). Proteolytic 18O labeling for
97
98
99
100
101
102
103
104
comparative proteomics: evaluation of endoprotease Glu-C as the catalytic agent. J. Proteome Res. 1, 27–33. HELLER, M., MATTOU, H., MENZEL, C., YAO, X. (2003). Trypsin catalyzed 16O-to-18O exchange for comparative proteomics: tandem mass spectrometry comparison using MALDI-TOF, ESI-QTOF, and ESI-ion trap mass spectrometers. J. Am. Soc. Mass Spectrom. 14, 704–718. SAKAI, J., KOJIMA, S., YANAGI, K., KANAOKA, M. (2005). 18O-labeling quantitative proteomics using an ion trap mass spectrometer. Proteomics 5, 16–23. STEWART, I. I., THOMSON, T., FIGEYS, D. (2001). 18O labeling: a tool for proteomics. Rapid Commun. Mass Spectrom. 15, 2456–2465. BERHANE, B. T., LIMBACH, P. A. (2003). Stable isotope labeling for matrix-assisted laser desorption/ionization mass spectrometry and post-source decay analysis of ribonucleic acids. J. Mass Spectrom. 38, 872–878. BLONDER, J., HALE, M. L., CHAN, K. C., YU, L. R., LUCAS, D. A., CONRADS, T. P., ZHOU, M., POPOFF, M. R., ISSAQ, H. J., STILES, B. G., VEENSTRA, T. D. (2005). Quantitative profiling of the detergent-resistant membrane proteome of iota-b toxin induced vero cells. J. Proteome Res. 4, 523–531. BONENFANT, D., SCHMELZLE, T., JACINTO, E., CRESPO, J. L., MINI, T., HALL, M. N., JENOE, P. (2003). Quantification of changes in protein phosphorylation: a simple method based on stable isotope labeling and mass spectrometry. Proc. Natl. Acad. Sci. USA 100, 880–885. BROWN, K. J., FENSELAU, C. (2004). Investigation of doxorubicin resistance in MCF-7 breast cancer cells using shot-gun comparative proteomics with proteolytic 18O labeling. J. Proteome Res. 3, 455–462. QIAN, W. J., MONROE, M. E., LIU, T., JACOBS, J. M., ANDERSON, G. A., SHEN, Y., MOORE, R. J., ANDERSON, D. J., ZHANG, R., CALVANO, S. E., et al. (2005).
29
30 Mass Spectrometry of Peptides and Proteins
105
106
107
108
109
110
111
Quantitative proteome analysis of human plasma following in vivo lipo-polysaccharide administration using 16O/18O labeling and the accurate mass and time tag approach. Mol. Cell Proteomics 4, 700–709. WANG, Y. K., MA, Z., QUINN, D. F., FU, E. W. (2001). Inverse 18O labeling mass spectrometry for the rapid identification of marker/target proteins. Anal. Chem. ZANG, L., PALMER TOY, D., HANCOCK, W. S., SGROI, D. C., KARGER, B. L. (2004). Proteomic analysis of ductal carcinoma of the breast using laser capture micro dissection, LC-MS, and 16O/18O isotopic labeling. J. Proteome Res. 3, 604–612. ROSS, P. L., HUANG, Y. N., MARCHESE, J. N., WILLIAMSON, B., PARKER, K., HATTAN, S., KHAINOVSKI, N., PILLAI, S., DEY, S., DANIELS, S., et al. (2004). Multiplexed protein quantification in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol. Cell Proteomics 3, 1154–1169. DOROSHENKO, V. M., COTTER, R. J. (1995). High-performance collision-induced dissociation of peptide ions formed by matrix-assisted laser desorption/ionization in a quadrupole ion trap mass spectrometer. Anal. Chem. 67, 2180–2187. DESOUZA, L., DIEHL, G., RODRIGUES, M. J., GUO, J., ROMASCHIN, A. D., COLGAN, T. J., SIU, K. W. (2005). Search for cancer markers from endometrial tissues using differentially labeled tags iTRAQ and cICAT with multidimensional liquid chromatography and tandem mass spectrometry. J. Proteome Res. 4, 377–386. KOLKMAN, A., DIRKSEN, E. H., SLIJPER, M., HECK, A. J. (2005). Double standards in quantitative proteomics: direct comparative assessment of difference in gel electrophoresis and metabolic stable isotope labeling. Mol. Cell Proteomics 4, 255–266. BAUER, A., KUSTER, B. (2003). Affinity purification-mass spectrometry. Powerful tools for the characterization of protein complexes. Eur. J. Biochem. 270, 570–578.
112 DZIEMBOWSKI, A., SERAPHIN, B. (2004). Recent developments in the analysis of protein complexes. FEBS Lett. 556, 1–6. 113 FRITZE, C. E., ANDERSON, T. R. (2000). Epitope tagging: general method for tracking recombinant proteins. Methods Enzymol. 327, 3–16. 114 GINGRAS, A. C., AEBERSOLD, R., RAUGHT, B. (2005). Advances in protein complex analysis using mass spectrometry. J. Physiol. 563, 11–21. 115 PUIG, O., CASPARY, F., RIGAUT, G., RUTZ, B., BOUVERET, E., BRAGADO-NILSSON, E., WILM, M., SERAPHIN, B. (2001). The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods 24, 218–229. 116 RIGAUT, G., SHEVCHENKO, A., RUTZ, B., WILM, M., MANN, M., SERAPHIN, B. (1999). A generic protein purification method for protein complex characterization and proteome exploration. Nat. Biotechnol. 17, 1030–1032. 117 SHEVCHENKO, A., SCHAFT, D., ROGUEV, A., PIJNAPPEL, W. W., STEWART, A. F., SHEVCHENKO, A. (2002). Deciphering protein complexes and protein interaction networks by tandem affinity purification and mass spectrometry: analytical perspective. Mol. Cell Proteomics 1, 204–212. 118 TERPE, K. (2003). Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl. Microbiol. Biotechnol. 60, 523–533. 119 RANISH, J. A., YI, E. C., LESLIE, D. M., PURVINE, S. O., GOODLETT, D. R., ENG, J., AEBERSOLD, R. (2003). The study of macromolecular complexes by quantitative proteomics. Nat. Genet. 33, 349–355. 120 BRAND, M., RANISH, J. A., KUMMER, N. T., HAMILTON, J., IGARASHI, K., FRANCASTEL, C., CHI, T. H., CRABTREE, G. R., AEBERSOLD, R., GROUDINE, M. (2004). Dynamic changes in transcription factor complexes during erythroid differentiation revealed by quantitative proteomics. Nat. Struct. Mol. Biol. 11, 73–80.
References 121 AMANCHY, R., KALUME, D. E., PANDEY, A. (2005). Stable isotope labeling with amino acids in cell culture (SILAC) for studying dynamics of protein abundance and posttranslational modifications. Sci. STKE 267, 12. 122 BLAGOEV, B., KRATCHMAROVA, I., ONG, S. E., NIELSEN, M., FOSTER, L. J., MANN, M. (2003). A proteomics strategy to elucidate functional protein-protein interactions applied to EGF signaling. Nat. Biotechnol. 21, 315–318. 123 SCHULZE, W. X., MANN, M. (2004). A novel proteomic screen for peptide-protein interactions. J. Biol. Chem. 279, 10756–10764.
124 BLAGOEV, B., ONG, S. E., KRATCHMAROVA, I., MANN, M. (2004). Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics. Nat. Biotechnol. 22, 1139– 1145. 125 HOCHLEITNER, E. O., KASTNER, B., FROHLICH, T., SCHMIDT, A., LUHRMANN, R., ARNOLD, G., LOTTSPEICH, F. (2005). Protein stoichiometry of a multiprotein complex, the human spliceosomal U1 small nuclear ribonucleoprotein: absolute quantification using isotope-coded tags and mass spectrometry. J. Biol. Chem. 280, 2536–2542.
31
1
Difference Gel Electrophoresis (DIGE) Barbara Sitek Ruhr-University Bochum, Bochum, Germany
Burghardt Scheibe GE Healthcare Bio-Sciences, Discovery Systems, Freiburg, Germany
Klaus Jung University of Dortmund, Dortmund, Germany
Alexander Schramm Universit¨atsklinikum Essen, Essen, Germany
Kai St¨uhler University of Dortmund, Dortmund, Germany
Originally published in: Proteomics in Drug Research. Edited by Michael Hamacher, Katrin Marcus, Kai St¨uhler, Andr´e van Hall, Bettina Warscheid and Helmut E. Meyer. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31226-9 Abstract
In the last years the application of two-dimensional electrophoresis (2-DE) has often been declared outdated and a new century of gel-free proteomics was announced. Nevertheless, 2-DE is still the method of choice when analyzing complex protein mixtures. With a separation of 10 000 proteins, 2-DE gives access to high-resolution proteome analysis. Continuous development has consolidated 2-DE application in proteomics, where the introduction of difference gel electrophoresis (DIGE) is the latest improvement. DIGE is based on fluorescently tagging all proteins in each sample with one set of matched fluorescent dyes designed to minimally interfere with protein mobility during 2-DE. For a DIGE analysis, two different fluorescence dyes, CyDyeő minimal dyes and CyDye saturation dyes, are available. CyDye minimal dyes react with an NHS-ester bond of lysine ε-amino residues and enable coelectrophoresis of up to three different samples in one approach. For special applications, e.g., samples from microdissection, the CyDye saturation dyes allow complete 2-D analysis and
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Difference Gel Electrophoresis (DIGE)
quantification of protein abundance changes in scarce sample amounts. The dyes react via a maleimide group with all available cysteine residues in the protein sample, giving a high labeling concentration. Here we show the application of both CyDye types for the analysis of relevant clinical samples and an approach for evaluating DIGE data statistically. We used CyDye minimal dyes to detect kinetic proteome changes resulting from ligand activation of either tyrosine kinases TrkA or TrkB in neuroblastoma cells. TrkA and TrkB are members of the family of neurotrophin receptors, which mediate survival, differentiation, growth, and apoptosis of neurons in response to stimulation by their ligands, NGF and BDNF, respectively. In addition, we applied CyDye saturation dyes for the identification of new molecular markers of the pancreatic tumor progression. One thousand microdissected cells were analyzed from different pancreatic intraepithelial neoplasias (PanIN) grades, the precursor lesion of pancreatic ductal adenocarcinoma (PDAC). Based of the multiplexing strategy and the application of an internal standard, DIGE enables one to perform quantitative proteomics with high accuracy allowing statistical approaches for high-confidence data analysis.
1 Introduction
In the early 1990s Marc Wilkins introduced the term proteome as a description for all proteins, which are coded by a genome at specific time points and under certain conditions. It is thought that in addition to the analysis of the genome that proteome analysis, with its high dynamic and complexity, can be used to describe life in more detail. A number of different methods have therefore been established in different proteomics techniques to allow analysis of complex protein mixtures. Currently, two-dimensional electrophoresis (2-DE) is the separation method with highest resolution power for protein samples. Up to 10 000 proteins can be separated in one gel and are then accessible for quantitative analysis (Figure 1 [1]). In 1975, 2-DE was independently developed by Klose et al. [2] and O’Farrell et al. [3] representing the combination of two different separation techniques. In the first dimension the proteins are focused according to their isoelectric points (pIs) (isoelectric focussing (IEF)) and subsequently in the second according to their molecular size (SDS electrophoresis) (see review [4] for more details). For the IEF, two different techniques, carrier ampholyte IEF (CA-IEF) [1] and immobiline pH gradient (IPG) [3, 6], respectively, are available. The immobilization of the pH gradient and fixation on a plastic backing simplified the IEF and therefore allowed the widespread application of 2-DE (reviewed in more detail in, [2, 8, 9]). Meanwhile, a number of different IPG stripes, with narrow or broad as well as linear or nonlinear pH gradient, are provided by different manufacturers. In contrast, CA-IEF, a labor-intensive technique, failed to achieve widespread application but is still used in more specialized laboratories.
1 Introduction
Fig. 1 Schematic representation of 2-dimensional electrophoresis (2-DE). In the first dimension the proteins are focused according to their isoelectric point [isoelectric focusing (IEF)] and subsequently in the second dimension according to their molecular size (SDS electrophoresis or SDS-PAGE).
For quantitative analysis the proteins can be visualized using a number of staining techniques with different sensitivities and linear dynamic detection ranges (Table 1). In contrast to array technologies like, e.g., Affymetrix chips or protein chips where the analyte position is assignable with a high local x, y resolution, the protein (spot) position in a 2-DE gel depends on its physicochemical properties. Therefore, generating 2-DE gels for a differential analysis requires standardized opTable 1 The most commonly used protein staining methods for proteome analysis using
2-DE. Besides the low detection limit of the applied staining methods, the linear dynamic range for protein quantification is another important parameter for a global description of a proteome Staining method Silver Zinc imidazole SyproRuby SyproOrange, SyproRed Colloidal Coomassie DIGE minimal DIGE saturation Phosphor-imaging (32 P, 14 C and 35 S) a
Detection limit (ng)
Linear dynamic range
Reference
1 3–5a 10 1 30
2 orders of magnitude No quantification 3 orders of magnitude 3 orders of magnitude
[38] [40] [37] [42] [44]
3 orders of magnitude 3–5 orders of magnitude 3–5 orders of magnitude 5 orders of magnitude
[41] [15] [43] [39]
8–10 0.1–0.2 0.005–0.010 1
Detection limit of MS-compatible silver staining [25].
3
4 Difference Gel Electrophoresis (DIGE)
erating procedures to achieve highly reproducible protein patterns [10]. However, small alterations in 2-DE processing, such as, for instance, during the gel casting process, polymerization reaction and/or temperature changes during the IEF run (21–24 h) results in reproducibility variations which have to be compensated for by appropriate image analysis software. Besides detection and normalization features, classical image analysis software like, for example, ImageMaster, PDQuestő , ProteomWeaverő or Delta2Dő provide appropriate algorithms for matching of the spots between different 2-DE gels to compensate for these variations.
2 Difference Gel Electrophoresis: Next Generation of Protein Detection in 2-DE
An important improvement for the application of 2-DE was the introduction of the so-called difference gel electrophoresis (DIGE) by J. S. Minden’s group in 1997 [11]. DIGE circumvents some basic problems of 2-DE, for example gel-to-gel variations and limited accuracy using different fluorophores (Cy2, Cy3 and Cy5) for a multiplexed analysis. In contrast to classical detection methods (Table 1) this method relies on covalent derivatization of proteins in each sample with one of the set of matched CyDye that do not affect the relative mobility of proteins during electrophoresis. Thus, DIGE allows rapid identification of protein changes between two samples on the same 2-DE gel without influences of gel-to-gel variations. Additionally, DIGE covers a dynamic detection range of 3–5 orders of magnitude; while silver staining can only detect 30-fold changes [1, 9, 11, 13, 15]. For a DIGE analysis the two different fluorescence dyes CyDye minimal dyes and CyDye saturation dyes are available. CyDye minimal dyes that react via an NHS-ester bond with ε-amino residues of lysine enable coelectrophoresis of up to three different samples in one approach. Approximately 3% of the available proteins are labeled on a single lysine per protein, whereas the rest remains unlabeled. This makes the technique robust and labeling optimization is usually unnecessary. The three different CyDye tags add approximately 450 Da to the protein mass when coupled to the protein. The resulting image patterns are comparable to poststained gels and sensitivity of the minimal dyes is similar to most sensitive silver staining. A pooled internal standard can be created by mixing aliquots of all samples to be analyzed. The use of such an internal standard on each gel that comprises equal quantities of each of the samples in the experiment allows calculation of ratios for the same protein spot within one gel as well as between gels. This largely removes experimental gel-to-gel variation leading to improved accuracy of protein quantification between samples from different gels. In a setup of two samples plus internal standard per gel, renunciation of gel repetitions and emphasis on biological repetitions brings down the total number of gels per experiment necessary for quantitative statistics (see Section 2.3). The bottlenecks of classical 2-DE-like methodical variation, laborious image analysis and restricted quantification are therefore minimized in the DIGE workflow (Figure 1).
2 Difference Gel Electrophoresis: Next Generation of Protein Detection in 2-DE
For special applications, e.g., samples from microdissection (see Section 2.3), the CyDye saturation dyes enable complete 2-DE analysis and quantification of protein abundance changes in scarce sample amounts (5 µg protein/image). The dyes react via a maleimide group with all available cysteine residues in the protein sample, giving a high labeling concentration. Owing to their net zero charge, there is no charge alteration of the labeled protein. As with all DIGE experiments an internal standard sample containing an equal amount of each sample is run on each gel. Sensitivity is 20-fold higher than for silver staining. In contrast to minimal labeling with saturation dyes, a labeling optimization is necessary to determine the appropriate amount of dye. With a confocal fluorescent imager the dye images are acquired at their specific wavelength without crosstalk. Dedicated image analysis software (DeCyder) utilizes a proprietary codetection algorithm (up to triple detection) that permits automatic detection, background subtraction, quantification, normalization and intergel matching of fluorescent images. The experimental design using the internal standard effectively eliminates gel-to-gel variation, allowing detection of small differences in protein levels (Figure 2). Systemic variation as well as inherent biological variation arising from patientto-patient, culture-to-culture differences, etc., can be clearly differentiated from induced biological changes. The DIGE system allows discrimination between true biological differences on the one hand and systemic as well as interindividual differences on the other.
2.1 Application of CyDye DIGE Minimal Fluors (Minimal Labeling with CyDye DIGE Minimal Fluors) 2.1.1 General Procedure In the standard labeling protocol, proteins are first solubilized in a DIGE lysis buffer (30 mM Tris, 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, pH 8.5). Samples produced by acidic precipitation should be lysed with DIGE lysis buffer adjusted to an appropriate pH (e.g., 9.5) resulting in a pH between 8 and 9. Lower pH during the labeling reaction affects labeling efficiency and kinetics in a negative way. The protein concentration should then be determined using a standard protein quantification method. Protein concentrations in the range 1–20 mg/mL have been successfully labeled using the standard protocol (personal communication, Amersham Bioscience). CyDye samples are then added to the protein lysate and incubated on ice in the dark for 30 min so that 50 µg of protein are labeled with 400 pmol of CyDye (prepared in a working solution containing 400 pmol CyDye µL-1 anhydrous dimethylformamide). Then 1 µL of 10 mM lysine is added to stop the reaction and left for 10 min on ice in the dark. Samples can now be stored for at least one month at −70 ◦ C. Lysis buffer in classical 2-DE traditionally contains primary amines (e.g., carrier ampholytes, pharmalytes) and excess of thiols (e.g., DTT, DTE) which should be not present during the labeling reaction.
5
6 Difference Gel Electrophoresis (DIGE)
Fig. 2 Difference gel electrophoresis (DIGE). Ettanő DIGE workflow: three-color and two-color experiments including the internal standard. For fluorescence proteins tagging, two different CyDyes techniques are available. Minimal fluors allow consideration of three different CyDyes (Cy2, Cy3 and Cy5) in a multiplexing experiment. Applying sat-
uration fluors multiplexing is only done for internal standardization, but allows working with low concentration protein samples. After image acquisition at CyDyes specific wavelength using a confocal fluorescence scanner (e.g., Typhoon series) protein spot pattern can be analyzed by appropriate image analysis software (DeCyder).
After labeling, 2×lysis buffer [8 M urea, 4% (w/v) CHAPS, 2% (v/v) carrier ampholytes, 2% (w/v) DTT] can be added to the samples in a 1+1 dilution for IEF. Combining the three samples to be separated in one IEF strip or tube gel results in a total protein concentration of 150 µg per gel (50 µg Cy3+50 µg Cy5+50 µg Cy2 labeled). From here methods are virtually identical to classical 2-DE. A peculiarity is that glass cassettes should have low intrinsic fluorescence capacity, since the gel will be scanned still assembled between the plates. The CyDyes are matched for size and charge resulting in an overlay of identical proteins when run on the same 2-DE gel. The lysine amino acid in proteins carries a +1 charge at neutral or acidic pH. CyDye minimal dyes also carry an intrinsic +1 charge which, when coupled to the lysine, replaces the lysine’s +1 charge with its own, ensuring that the pI of the protein does not significantly alter (Figure 3). The experimental design is based on evidence that the experimental variation in a 2-DE experiment is mostly due to gel-to-gel variation. Running multiple samples on a single gel reduces the number of gels required to produce the same amount
2 Difference Gel Electrophoresis: Next Generation of Protein Detection in 2-DE
Fig. 3 Labeling chemistry of CyDye DIGE minimal fluors. Dye Nhydroxysuccinimid-ester (NHS-ester) reacts with a single g-amino residue of lysine per protein and a positively charged dye molecule replaces the positive charged lysine on the protein so there is no net change in pI.
of data. The recommended protocol suggests that a Cy2-labeled internal standard sample should be run on all gels within an experiment. The standard sample consists of an aliquot of all the different samples in an experiment (Figure 2). This ensures representation of all proteins on all the gels analyzed. The standard sample will increase the reliability in matching between gels and will also allow the generation of accurate spot statistics between gels (see Section 2.3). 2.1.2 Example of Use: Identification of Kinetic Proteome Changes upon Ligand Activation of Trk-Receptors DIGE exerts its full strength in settings comprising highly similar but not identical biological conditions. Potential applications include monitoring of cellular responses to all kinds of external stimuli. Those systems are mostly error-prone in classical 2-DE due to the above-mentioned interexperimental variations. We have chosen the DIGE system to detect kinetic proteome changes resulting from ligand activation of either tyrosine kinases TrkA or TrkB in neuroblastoma cells (Figure 4). TrkA and TrkB are members of the family of neurotrophin receptors, which mediate survival, differentiation, growth, and apoptosis of neurons in response to stimulation by their ligands, NGF and BDNF, respectively (reviewed in [16]). So far, little is known about the molecular mechanisms used by TrkA/TrkB to mediate this different biological behavior. Expression levels of TrkA/TrkB are important prognostic factors in a variety of embryonal tumors including neuroblastoma, the most common solid tumor of childhood [17]. Since TrkA/TrkB exhibit a high level of sequence similarity and use overlapping pathways for signal transduction, the existence of specific effector molecules crucial for receptor and cell-type specific response is likely [18]. To identify these effectors by analyzing biological effects of TrkA and TrkB activation in a defined model, we performed a proteome analysis using the human neuroblastoma SY5Y cell line stably transfected with the TrkA or TrkB cDNA [19]. The resulting phenotypes of these cell lines have been extensively analyzed and are in agreement with the observed biological functions [5, 6].
7
8 Difference Gel Electrophoresis (DIGE)
Fig. 4 2DE-gels of SY5Y–TrkA cell lysate (A) or SY5Y–TrkB cell lysate (B). Separation of proteins was performed in the first dimension (horizontal) by IEF and then in the second dimension (vertical) by SDS-PAGE. Representative pictures of whole cell lysates
of SY5Y–TrkA (A) and SY5Y–TrkB (B) are shown. Differentially expressed proteins following activation of neurotrophin receptors TrkA or TrkB by their respective ligands are numbered and correspond to the data presented in Table 2.
Proteomic changes were monitored in a time-course of 0, 0.5, 1, 6 and 24 h following receptor activation. These time points were chosen to monitor immediate responses to neurotrophins (NGF/BDNF) as well as late effector proteins. Considering the biological variation, we focused on identification of significantly regulated protein spots in five biologically independent experiments at each time point. At each time point a differential comparison was performed between neurotrophintreated and untreated cells. Technically, 50 µg protein of untreated and ligand treated cells are labeled with Cy5 and Cy3, respectively. A pool of all analyzed samples was labeled with Cy2, generating the internal standard. Proteins extracted from neurotrophin-treated and untreated SY5Y–TrkA/B cells were labeled with Cy3 and Cy5, respectively, mixed with Cy2-labeled internal standards (Figure 5) and run in one gel. After scanning of gels and manual correction of spot detection, protein spots were matched for statistical analysis and determination of the differentially expressed proteins. In whole cell lysates of SY5Y–TrkA/TrkB, 1700–1900 distinct protein spots were detected by the DeCyder software and subsequent manual correction. The analysis of the expression profiles of SY5Y–TrkB cells resulted in 24 significantly regulated spots induced by BDNF (p < 0.05, ratio > 1.5) and 13 regulated spots induced by NGF in SY5Y–TrkA cells (p < 0.05, ratio > 1.3) (Table 2). A total of 20 spots regulated during the time-course following neurotrophin treatment were specific for SY5Y–TrkB cells, and nine regulated spots were specific for SY5Y–TrkA cells. Four spots were regulated in both cell lines. While three spots demonstrated the same regulation pattern, one was inversely regulated between SY5Y–TrkA and SY5Y–TrkB. The respective protein was identi-
2 Difference Gel Electrophoresis: Next Generation of Protein Detection in 2-DE
Fig. 5 Schematic workflow of sample preparation for 2D-DIGE. SY5Y–TrkA and SY5Y–TrkB cells were stimulated by neurotrophins NGF (100 ng/mL) or BDNF (50 ng/mL), respectively (“treated cells”). Controls received fresh medium without neurotrophins at t = 0 (“untreated cells”). Samples of the same time point following
neurotrophin addition were labeled, mixed and run in one gel with an equal amount of internal standard (B). This standard was pooled from all samples in an experiment. The lower part of the figure exemplarily depicts the procedure for time point t = 24 h. Experiments were repeated five times to adjust for inter-experimental variations.
fied as tropomyosin-3 by MALDI-PMF. The majority of the proteins differentially expressed upon neurotrophin receptor activation were regulated in the late stimulation phase (10 of 13 protein spots in SY5Y–TrkA and 16 of 24 in SY5Y–TrkB) (Figure 6). Most of the proteins identified were assigned to the GeneOntology (GO) class “cytoskeleton organization and biogenesis”, but proteins involved in signal transduction could also be found. Most prominently, SY5Y–TrkB transfected cells up-regulate hnRNP K, which functions as a “RNA silencer” in immature epithelial cells to suppress translation of proteins only needed in differentiated cells. It is intriguing to speculate about a similar function for hnRNP K in preventing expression of proteins found only in differentiated neuronal cells. A recent report indicated that in fact hnRNP K controls the switch from proliferation to neuronal differentiation through regulation of p21 [22]. In this model system, application of the DIGE system has set the basis for integration of the signaling pathways and functional analysis of effector proteins as well as insights into the underlying courses for the opposing phenotypes caused by activation of different neurotrophin receptors. 2.2 Application of Saturation Labeling with CyDye DIGE Saturation Fluors 2.2.1 General Procedure As this type of labeling method aims to label all available cysteines on each protein, many cysteine residues existing as disulfide bonds in proteins must be unfolded and the disulfide bonds broken. This can be achieved under denaturing conditions
9
10 Difference Gel Electrophoresis (DIGE) Table 2 Summary of significantly regulated proteins following neurotrophin receptor activa-
tion in SY5Y–TrkA or SY5Y–TrkB identified by MALDI-MS Spot No.
Protein
Theoretical
Experimental
pI
MW [kDa]
pI
MW [kDa]
Accession no. NCBI
Regulation/ kinetic group TrkA
TrkB
1
Dynein
5.0
71.5
5.4
96.9
gi/24307879
none
↑A
2 3
5.0 5.0
71.5 72.3
5.5 5.3
95.3 86.8
gi/24307879 gi/16507237
↓C ↑C
↓A ↑C
5.0
72.3
5.3
86.8
gi/16507237
none
↑C
6.8
74.1
6.2
86.6
gi/27436946
none
↑C
5.1
51.0
5.4
72.0
gi/14165437
none
↑B
7
Dynein Heat shock 70kDa protein 5 Heat shock 70kDa protein 5 Lamin A/C isoform 1 precursor Heterogeneous nuclear ribonucleoprotein K isoform a not identified
5.6
67.5
↓C
↓B
8
Lamin A/C isoform 2
6.4
65.1
6.0
62.9
gi/27436946
none
↑C
9
Lamin A/C isoform 2
6.4
65.1
6.1
62.9
gi/27436946
none
↑C
10
JC5704 protein disulfide-isomerase (EC 5.3.4.1) ER60 precursor
5.9
56.8
5.7
60.9
gi/7437388
none
↑C
11
JC5704 protein disulfide-isomerase (EC 5.3.4.1) ER60 precursor
5.9
56.8
5.8
60.4
gi/7437388
none
↑C
12
Dihydrolipoamide dehydrogenase precursor
7.9
54.2
6.3
60.4
gi/4557525
none
↓B
13
Adenylyl cyclase-associated protein Nuclear matrix protein NMP200 related to splicing factor PRP19
8.0
51.7
6.3
57.5
gi/5453595
none
↓B
6.1
55.2
6.1
60.5
gi/7657381
none
↓C
TATA binding protein interacting protein 49 kDa ATP synthase, H+ transporting,
6.0
50.2
6.1
56.6
gi/4506753
none
↓C
9.2
59.8
6.3
55.8
gi/15030240
none
↓C
4 5 6
14
15
16 17
ATP synthase, H+ transporting,
9.2
59.8
6.2
55.8
gi/15030240
none
↓C
18
Calumenin
4.4
37.1
5.0
52.0
gi/4502551
none
↑C
19
A Chain A
8.5
36.7
6.5
35.8
gi/13786849
none
↓C
20
A27674 tropomyosin 3, fibroblast
4.7
32.9
5.1
34.4
gi/88928
↓C
↑C
2 Difference Gel Electrophoresis: Next Generation of Protein Detection in 2-DE Table 2 (Continued)
Spot No.
21
Protein
Theoretical
Experimental
pI
MW [kDa]
pI
MW [kDa]
Rho GDP dissociation inhibitor (GDI) alpha
5.0
23.2
5.3
31.2
Accession no. NCBI
Regulation/ kinetic group TrkA
TrkB
gi/4757768
none
↓C
22
AF487339 1 NM23-H1
5.2
19.7
5.8
22.1
gi/29468184
none
↓B
23
1713410A beta galactoside soluble lectin
5.1
14.6
5.3
12.8
gi/227920
none
↑C
24
not identified
5.4
24
none
↓A
25
6.8
74.1
6.2
86.6
gi/27436946
↑C
none
6.8
74.1
6.2
86.6
gi/27436946
↑A
none
5.1
51.0
5.4
70.5
gi/14165437
↓C
none
28
Lamin A/C isoform 1 precursor Lamin A/C isoform 1 precursor Heterogeneous nuclear ribonucleoprotein K isoform a Lamin B2
5.3
67.7
5.6
71.0
gi/27436951
↑A
none
29
Vimentin
7.9
54.2
5.2
54.2
gi/4557525
↓C
none
30
not identified
5.5
49.4
↓C
none
31
not identified
5.5
49.4
↓C
none
32
ACTB protein
5.5
49.4
↓C
none
33
not identified
5.6
26.4
↓C
none
26 27
5.6
40.2
gi/15277503
with a reducing agent such as tris-(carboxyethyl) phosphine hydrochloride (TCEP) (Figure 7). In some proteins cysteine residues are buried, and thus the extent of labeling will depend on the accessibility of cysteine within the protein under the reaction conditions used. Also protein quantification in the amounts and concentrations used is not easy to carry out. A labeling optimization in a 2-D titration experiment is therefore obligatory to determine the optimum amount of TCEP and dye required for the protein extract being used. The molar ratio of TCEP : dye should always be kept at 1 : 2 to ensure efficient labeling. Typically, 5 µg of protein lysate requires 2 nmol TCEP and 4 nmol dye for the labeling reaction (assuming an average cysteine content of 2%). If the amount of TCEP/dye is too low, available thiol groups on some proteins will not be labeled and show molecular weight trains in the 2-D image. If the amount of TCEP/dye is too high, nonspecificlabeling of the amine groups on lysine residues can occur and have been observed as pI charge trains on the gel. In the standard protocol, proteins are solubilized in lysis buffer (30 mM Tris, 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, pH 8.0). Primary amines or thiols should not be present, because they will compete with the protein for
11
12 Difference Gel Electrophoresis (DIGE)
Fig. 6 Typical kinetics of regulated proteins following neurotrophin treatment of SY5Y–TrkA or SY5Y–TrkB. The majority of proteins like for instance galectin-1 were regulated in the late stimulation phase. Circles (control) and triangles (neurotrophintreated) represent single standardized abun-
dance values from one gel. Lines connect the average standardized abundance values. Though experiments were repeated five times, the DeCyder software could not detect the here presented protein spot in all gels. Thus, at some time points less than five circles or triangles are represented.
Fig. 7 Labelling chemistry of CyDye DIGE saturation fluors: dye maleimide group reacts with all available cysteine residues in the protein and due to its net zero charge there is no charge change.
2 Difference Gel Electrophoresis: Next Generation of Protein Detection in 2-DE
the dye. After lysis, the pH should not deviate from pH 8; protein concentrations should be between 0.55 and 10 mg/mL. Then cysteine residues are reduced at 37 ◦ C for 1 h by incubating with TCEP (2 mM solution) with a volume determined in the labeling optimization experiment. CyDye DIGE Fluor saturation dye (2 mM working solution in anhydrous dimethylformamide) is added for reaction at 37 ◦ C in the dark for a further 30 min. Cy3 is used for the pooled standard and Cy5 for the individual samples. Finally, the reaction is stopped by excess of DTT using 2×lysis buffer (s.a.). 2-DE is according to standard procedure; the alcylation step in the equilibration procedure can be omitted. 2.2.2 Example of Use: Analysis of 1000 Microdissected Cells from PanIN Grades for the Identification of a New Molecular Tumor Marker Using CyDye DIGE Saturation Fluors Pancreatic adenocarcinoma is the fourth leading cause of cancer in the United States [23]. The main problem of combating pancreatic adenocarcinoma arises from a lack of specific symptoms and limitations in detection methods. The overwhelming majority of pancreatic carcinoma patients are discovered at a late clinical tumor stage. Only 10% of patients show a potentially curable resectable tumor. In order to identify new molecular markers of the pancreatic tumor progression we established a proteomics approach analyzing 1000 microdissected cells from different pancreatic intraepithelial neoplasias (PanIN) grades [24]. It is believed that PanINs are the precursor lesions of pancreatic ductal adenocarcinoma (PDAC). PanINs are histologically subdivided into four different grades, namely PanIN1A/B, PanIN-2 and PanIN-3. All PanINs progress from flat to papillary lesions with increasing degrees of dysplasia (Figure 8). To analyze the different PanIN grades we applied microdissection, through which it is possible to analyze the biologically
Fig. 8 Histological images of different PanIN grades stained with H&E. It is believed that pancreatic intraepithelial neoplasias (PanINs) are the precursor lesions of pancreatic ductal adenocarcinoma (PDAC). PanINs are histologically subdivided into four different grades, namely
PanIN-1A/B, PanIN-2 and PanIN-3. All PanINs progress from flat to papillary lesions with increasing degrees of dysplasia. The PanIN classification has been established as a common diagnostic criterion at the National Cancer Institute Think Tank (http://pathology.jhu.edu/pancreas panin).
13
14 Difference Gel Electrophoresis (DIGE)
Fig. 9 DIGE analysis of pancreatic ductal epithelium and PanIN-2 microdissected cells. The analysis of 1000 microdissected cells from pancreatic ductal epithelium and PanIN-2 lesions revealed a pattern of A 1.875 and B 2.050 protein spots, respec-
tively. Eight significantly regulated proteins are indicated by arrows and respective spot number. In the zoomed sections two differentially regulated proteins (white box) are displayed in 3D view obtained with the DeCyder software.
relevant cell type, often in the minority when analyzing precursor lesions. Different techniques have been developed to facilitate microdissection [7, 25, 26, 28]. To select for PanIN grades, manual microdissection was chosen because we have found it easier and speedier than automatic processing [24]). Using CyDye DIGE saturation fluors we were able to combine the high resolution power of 2-DE with the high sensitivity of fluorescence imaging for the analysis of the scarce sample amount of 1000 microdissected cells (approximately 2 µg). In this preliminary report we have shown that the established protocol can successfully be applied to the proteome analysis of PanIN-2 grade as well as pancreatic ductal epithelium (Figure 9), resulting in the identification of the first molecular markers of PanIN-2 using LC-ESI-MS/MS (Table 3). Annexin A2 and annexin A4 have been found to be differentially expressed in PanIN-2 grades. These proteins are members of the annexin family of calcium-dependent phospholipid-binding proteins showing a 45–59% identity with other members of the annexin family [12]. Although their functions are still not clearly defined, several members of the annexin family have been implicated in membrane-related events along exocytotic and endocytotic pathways [30]. Furthermore, annexin A2 was found to be significantly up-regulated in PanIN-2 grades. It has been shown that the endothelial cell-surface Ca2+ -binding protein, annexin A2, activates the t-PA-dependent formation of plasmin from plasminogen and promotes tumor cell invasion [4]. In the differential expression profiling of Lu et al., annexin 2A has also been found to be up-regulated in pancreatic carcinoma tissue [22].
2 Difference Gel Electrophoresis: Next Generation of Protein Detection in 2-DE Table 3 First candidate molecular markers for PanIN-2 grade pancreas carcinoma analyzing
1000 microdissected cells Spot No.
Spotname
Accession
Protein
Fold change
T-test
Seq. pI
1 2 3 4
1347 2437 1811 2660
gi/15277503 gi/18645167 gi/4502105 gi/5453740
5 6 7 8
820 1844 2547 823
– – gi/48255907 gi/5030431
Actin, gamma 1 4.0 0.017 5.6 Annexin A2 4.8 0.0008 7.4 Annexin A4 −4.5 0.0017 5.8 Myosin regulatory −2.1 0.00031 4.6 light chain MRCL3 Not identified −2.65 0.0068 – Not identified −13.37 0.0031 – Transgelin −4.5 0.0017 8.9 Vimentin −2.5 0.0055 4.8
Seq. MW (kDa) 40.2 38.6 36.1 19.8 – – 22.6 41.6
Furthermore, a group of three actin-associated proteins down-regulated in PanIN-2 grades were found, whereas actin itself was detected as significantly upregulated. For instance transgelin is an actin-binding protein of unknown function cross-linking actin filaments [33]. It has been shown that down-regulation of transgelin may be an important early event in tumor progression and it is considered a diagnostic marker for breast and colon cancer development [34]. The role of the other actin-associated proteins is at the moment speculative and will be analyzed in more detail when the expression profiles of all PanIN grades and carcinoma samples are available. Currently, there is very little information available concerning the initiation and prevention of pancreatic cancer. But the candidate protein identified applying DIGE saturation labeling for the analysis of PanIN grades will help to find new diagnostic markers for early detection of pancreatic cancer and to decipher processes within tumor biology. After analyzing all of the PanIN stages a widespread catalogue of protein expression in pancreatic tumor progression will be obtained, helping us to understand pancreatic tumor biology in more detail.
2.3 Statistical Aspects of Applying DIGE Proteome Analysis
For the identification of new molecular markers and interaction partners proteome analysis using 2-DE (DIGE) is often applied, but the deduced candidate proteins using the subtractive approach of a differential analysis need further validation. For reasons of time and cost, the number of candidate proteins is limited and only significant proteins changes can be considered for candidate validation. A DIGE experiment produces large amounts of data, or more exactly, volume values for thousands of gel spots. In order to make correct inferences from these data, statistical methods are quite important. Many of the statistical methods used in DNA microarray studies can be adapted for analyses of gel data. In this section we
15
16 Difference Gel Electrophoresis (DIGE)
focus on the calibration and normalization of protein expression data as well as on the detection of differentially expressed proteins resulting from a DIGE experiment. 2.3.1 Calibration and Normalization of Protein Expression Data From the DeCyder software it is possible to find the background-subtracted spot volumes, i.e., for a single spot the lowest 10th percentile pixel values on the spot boundary have been excluded. Several features of this raw data require calibration and normalization prior to statistical analysis. The volume values are affected by dye-specific system gains and constant additive biases. In Figure 10 the Cy5 and Cy3 spot volumes of one gel are plotted against each other. It can clearly be seen that the difference between Cy5 and Cy3 spot volumes increases when the spot volumes increase. When using m gels for a DIGE study (with internal standard, treatment and control) and regarding each image of a gel as a distinct gel, one receives a data set with j = 1, . . ., 3 m gels. To calibrate the spot volumes Karp et al. [17] proposed the following statistical model for consideration:
xi j =a j xi j +bi j
(1)
where xi j is the real expression value of the ith spot on the jth gel, xij is the respective measured spot volume, aj the scaling factors that adjust for the dye-specific system gain and bj additive offsets which compensate for the additive biases. These parameters can be estimated by maximum-likelihood estimation. The respective estimation
Fig. 10 Scatterplot of the Cy5 versus the Cy3 spot volumes of a given minimal CyDye gel.
2 Difference Gel Electrophoresis: Next Generation of Protein Detection in 2-DE
Fig. 11 Graphs of the logarithm and the arsinh for data normalization.
algorithm and the calibration procedure have been implemented within the vsnpackage for the open-source software R (available at http://cran.r-project.org) [15]. The resulting calibrated spot volumes are lognormal-distributed. Most statistical methods, however, are based on the assumption that the data are normaldistributed. Hence, the calibrated volume values must be normalized. A common method for normalization is to apply the logarithm to the data, but the logarithm has a singularity at zero and causes a bias in the Cy3/Cy5 ratios for small values. The mentioned software package uses the arsinh instead, which for high values is equivalent to the logarithm but has no singularity at zero and its smoothness for small values does not cause biases. The graphs of the logarithm and the arsinh can be compared in Figure 11. Applying the logarithm or the arsinh to the data also has the benefit that the variance of the data is stabilized. In the raw data, high volume values usually have a different variance than small volume values. In Figure 12 the effect on data normalization using arsinh transformation is shown. 2.3.2 Detection of Differentially Expressed Proteins In DIGE experiments it is often of interest to find proteins with altered expression in two different mixtures, e.g., treatment and control. Having transformed the data by the logarithm or the arsinh one has to look for differential expression changes and not for relative changes, because ratios become differences when being transformed by the logarithm or the arsinh, i.e., because log (a/b) = log(a) − log(b). If the volume values from treatment and control probes come, for example, from the same patient or, as in DIGE experiments, from the same gel, then there is
17
18 Difference Gel Electrophoresis (DIGE)
Fig. 12 Lognormal-distributed volume values before normalization (left) and normal-distributed values after the arsinh-transformation (right).
a dependency between the resulting values. Hence, the t-test for paired samples, which has even a higher statistical power than the t-test for independent samples, can be used here. The term power is explained below. For the paired t-test the difference d between the treatment and control value is used. For each single spot the t-test for paired samples tests the null hypothesis H0 , that there is no expression change for this spot, versus the alternative hypothesis, H1 that there is an expression change. Based on the sample, the test decides either to accept H0 or to reject it and accept H1 . Be aware that the decision of a statistical test does not supply 100% certainty. A differentiation between the test decision and reality must always be made. In Table 4 the two kinds of errors that may occur are shown: a Type I error is to reject the null hypothesis when it is true, and a Type II error is to accept the null Table 4 Possible results of a statistical test
Test decision
Reality
Protein not differentially expressed
Protein differentially expressed
Protein not differenttially expressed
Correct decision
Wrong decision (Type I error α)
Protein differentially expressed
Wrong decision (Type II error β)
Correct decision
2 Difference Gel Electrophoresis: Next Generation of Protein Detection in 2-DE
hypothesis when it is not true. The probability α for the Type I error is called the level of the test. The probability β for the Type II error depends on this α-level, the sample size, the expression change to be detected and the variance of the measured values. The probability of rejecting the null hypothesis is called the power of the test. It should be very small when the null hypothesis is true and very big when the alternative hypothesis is true. The higher the sample size for the experiment the better becomes the power. Unfortunately it is not possible to keep α and β small at the same time. In practice it is common to specify a small α first, also called the testing level, and determine an appropriate number m of replications to keep β as small as possible afterwards. Within the t-test the data is summarized in the test-statistic which is t-distributed under the assumption that the data for the treatment and control groups was obtained from normal distributions. The t-test is available in most statistical software packages. 2.3.3 Sample Size Determination It is not easy to directly determine a number m of replicates to use for the t-test. Instead it is more convenient to look how the power of the test behaves when using a specified number of replicates. The power of the paired t-test depends (1) on the probability a for the Type I error, (2) on the standard deviation of the
Fig. 13 Power-function of the paired t-test using three replicates (dotted line), four replicates (dashed line) and five replicates (solid line) under the assumption that the standard deviation of the spots differences is s = 0.2.
19
20 Difference Gel Electrophoresis (DIGE)
differences between the treatment and control values, (3) on the expression change to be detected and (4) on the number m of replicates which are used. Given these four parameters the power can be plotted as function of the expression change to be detected. In Figure 3 the power function is given for different numbers of replicates. On the lower abscissa the relative expression change is given, and on the upper, the differential expression change of the normalized values. From the power one can detect the probability β of the Type II error by the relationship β = 1 − power. If in the case of Figure 13 one wants to detect a relative expression change of 1.5 then β ≈ 0.1 when using five replicates. Hence, the strategy of finding the correct sample size is to specify the above-named four parameters, plot the power function and read the probability β for the Type II error from the graph. If β is too big, plot the power function for a higher number of replicates. Within the open source software R, the power function for the paired t-test can be plotted. 2.3.4 Further Applications There are many other statistical models which can be used for the evaluation of DIGE studies. Inclusion of not only a group factor, but also a time factor in the experiment methods of the analysis of variance (ANOVA) can be applied to find expression changes within the temporal course of the protein expression or to find interactions between the group and time factor. Several multivariate statistical methods are of use, too. Spots with similar expression profiles can be grouped by cluster analysis or, on the other hand, new spots can be assigned to existing groups by the methods of discriminant analysis.
References 1 KLOSE, J., KOBALZ, U. (1995). Two-dimensional electrophoresis of proteins: an updated protocol and implications for a functional analysis of the genome. Electrophoresis 16, 1034–1059. 2 KLOSE, J. (1975). Protein mapping by combined isoelectric focusing and electrophoresis of mouse tissues. A novel approach to testing for induced point mutations in mammals. Humangenetik 26, 231–243. 3 O’FARRELL, P. H. (1975). High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 250, 4007–4021. 4 G¨ORG, A., OBERMAIER, C., BOGUTH, G., HARDER, A., SCHEIBE, B., WILDGRUBER, R., WEISS, W. (2000). The current state of two-dimensional electrophoresis with immobilized pH gradients. Electrophoresis 21(6), 1037–1053.
5 BJELLQVIST, B., EK, K., RIGHETTI, P. G., GIANAZZA, E., G¨ORG, A., WESTERMEIER, R., POSTEL, W. (1982). Isoelectric focusing in immobilized pH gradients: principle, methodology and some applications. J. Biochem. Biophys. Methods 6(4), 317–339. 6 G¨ORG, A., POSTEL, W., GUNTHER, S. (1988). The current state of two-dimensional electrophoresis with immobilized pH gradients. Electrophoresis 9(9), 531–546. 7 ANDERSON, N. G., ANDERSON, N. L. (1996). Twenty years of two-dimensional electrophoresis: past, present and future. Electrophoresis 17(3), 443–453. 8 HERBERT, B. (1999). Advances in protein solubilization for two-dimensional electrophoresis. Electrophoresis 20(4–5), 660–663.
References 9 RABILLOUD, T. (2000). Proteome research: two-dimensional gel electrophoresis and identification methods. Springer, Berlin. 10 LOPEZ, M. F., PATTON, W. F. (1997). Reproducibility of polypeptide spot positions in two-dimensional gels run using carrier ampholytes in the isoelectric focusing dimension. Electrophoresis 18, 338–343. 11 UNLU¨ , M., MORGAN, M. E., MINDEN, J. S. (1997). Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis 18, 2071–2077. 12 ALBAN, A., DAVID, S. O., BJORKESTEN, L., ANDERSSON, C., SLOGE, E., LEWIS, S., CURRIE, I. (2003). A novel experimental design for comparative two-dimensional gel analysis: two-dimensional difference gel electrophoresis incorporating a pooled internal standard. Proteomics 3, 36–44. 13 KNOWLES, M. R., CERVINO, S., SKYNNER, H. A., HUNT, S. P., de FELIPE, C., SALIM, K., MENESES-LORENTE, G., MCALLISTER, G., GUEST, P. C. (2003). Multiplex proteomic analysis by two-dimensional differential in-gel electrophoresis. Proteomics 3, 1162–1171. 14 GHARBI, S., GAFFNEY, P., YANG, A., ZVELEBIL, M. J., CRAMER, R., WATERFIELD, M. D., TIMMS, J. F. (2002). Evaluation of two-dimensional differential gel electrophoresis for proteomic expression analysis of a model breast cancer cell system. Mol. Cell Proteomics 1, 91–98. 15 TONGE, R., SHAW, J., MIDDLETON, B., ROWLINSON, R., RAYNER, S., YOUNG, J., POGNAN, F., HAWKINS, E., CURRIE, I., DAVISON, M. (2001). Validation and development of fluorescence two-dimensional differential gel electrophoresis proteomics technology. Proteomics 1, 377–396. 16 MILLER, F. D., KAPLAN, D. R. (2001). On Trk for retrograde signaling. Neuron 32, 767–770. 17 NAKAGAWARA, A., ARIMA-NAKAGAWARA, M., SCAVARDA, N. J., AZAR, C. G., CANTOR, A. B., BRODEUR, G. M. (1993). Association between high levels of expression of the TRK gene and favorable outcome in human
18
19
20
21
22
23
24
neuroblastoma. N. Engl. J. Med. 328, 847–854. SCHULTE, J. H., SCHRAMM, A., KLEIN-HITPASS, L., KLENK, M., WESSELS, H., HAUFFA, B. P., EILS, J., EILS, R., BRODEUR, G. M., SCHWEIGERER, L., HAVERS, W., EGGERT, A. (2005). Microarray analysis reveals differential gene expression patterns and regulation of single target genes contributing to the opposing phenotype of TrkA- and TrkB-expressing neuroblastomas. Oncogene 24(1), 165–177. SITEK, B., APOSTOLOV, O., STU¨ HLER, K., PFEIFFER, K., MEYER, H. E., EGGERT, A., SCHRAMM, A. (2005). Identification of dynamic proteome changes upon ligand activation of Trk-receptors using two-dimensional fluorescence difference gel electrophoresis and mass spectrometry. Mol. Cell Proteomics 4, 291–299. EGGERT, A., IKEGAKI, N., LIU, X.-G., CHOU, T. T., LEE, V. M., TROJANOWSKI, J. Q., BRODEUR, G. M. (2000). Molecular dissection of TrkA signal transduction pathways mediating differentiation in human neuroblastoma cells. Oncogene 19, 2043–2051. EGGERT, A., GROTZER, M. A., IKEGAKI, N., LIU, X. G., EVANS, A. E., BRODEUR, G. M. (2002). Expression of the neurotrophin receptor TrkA down-regulates expression and function of angiogenic stimulators in SH-SY5Y neuroblastoma cells. Cancer Res. 62, 1802–1808. YANO, M., OKANO, H. J., OKANO, H. (2005). Involvement of Hu and hnRNP K in neuronal differentiation through p21 mRNA posttranscriptional regulation. J. Biol. Chem. 280, 12690–12699. NIEDERHUBER, J. E., BRENNAN, M. F., MENCK, H. R. (1995). The National Cancer Data Base report on pancreatic cancer. Cancer 76(9), 1671–1677. SITEK, B., L¨UTTGES, J., MARCUS, K., KLO¨ PPEL, G., SCHMIEGEL, W., MEYER, H. E., HAHN, S. A., STU¨ HLER, K. (2005). Application of DIGE saturation labeling for the analysis of microdissected precursor lesions of pancreatic adenocarcinoma. Proteomics 5, in press.
21
22 Difference Gel Electrophoresis (DIGE)
25 WHETSELL, L., MAW, G., NADON, N., RINGER, D. P., SCHAEFER, F. V. (1992). Polymerase chain reaction microanalysis of tumors from stained histological slides. Oncogene 7, 2355–2361. 26 ZHUANG, Z., BERTHEAU, P., EMMERT-BUCK, M. R., LIOTTA, L. A., GNARRA, J., LINEHAN, W. M., LUBENSKY, I. A. (1995). A microdissection technique for archival DNA analysis of specific cell populations in lesions < 1 mm in size. Am. J. Pathol. 146, 620–625. 27 EMMERT-BUCK, M. R., BONNER, R. F., SMITH, P. D., CHUAQUI, R. F., ZHUANG, Z., GOLDSTEIN, S. R., WEISS, R. A., LIOTTA, L. A. (1996). Laser capture microdissection. Science 274, 998–1001. 28 SCHU¨ TZE, K., LAHR, G. (1998). Identification of expressed genes by laser-mediated manipulation of single cells. Nat. Biotechnol. 16, 737–742. 29 HAUPTMANN, R., MAURER-FOGY, I., KRYSTEK, E., BODO, G., ANDREE, H., REUTELINGSPERGER, C. P. (1989). Vascular anticoagulant beta: a novel human Ca2+ /phospholipid binding protein that inhibits coagulation and phospholipase A2 activity. Its molecular cloning, expression and comparison with VAC-alpha. Eur. J. Biochem. 185, 63–71. 30 WAISMAN, D. M. (1995). Annexin II tetramer: structure and function. Mol. Cell Biochem. 149–150, 301–322. 31 DIAZ, V. M., HURTADO, M., THOMSON, T. M., REVENTOS, J., PACIUCCI, R. (2004). Specific interaction of tissue-type plasminogen activator (t-PA) with annexin II on the membrane of pancreatic cancer cells activates plasminogen and promotes invasion in vitro. Gut 53, 993–1000. 32 LU, Z., HU, L., EVERS, S., CHEN, J., SHEN, Y. (2004). Differential expression profiling of human pancreatic adenocarcinoma and healthy pancreatic tissue. Proteomics 4, 3975–3988. 33 SHAPLAND, C., HSUAN, J. J., TOTTY, N. F., LAWSON, D. J. (1993). Purification and properties of transgelin: a transformation and shape change sensitive actin-gelling protein. Cell Biol. 121, 1065–1073.
34 SHIELDS, J. M., ROGERS-GRAHAM, K., DER, C. J. (2002). Loss of transgelin in breast and colon tumors and in RIE-1 cells by Ras deregulation of gene expression through Raf-independent pathways. J. Biol. Chem. 277, 9790–9799. 35 KARP, A. N., KREIL, D. P., LILLEY, K. S. (2004). Determining a significant change in protein expression with DeCyder during a pairwise comparison using two-dimensional difference gel electrophoresis. Proteomics 4, 1421–1432. 36 HUBER, W., HEYDEBRECK, A., VON S¨ULTMANN, H., POUSTKA, A., VINGRON, M. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18, S96-S104. 37 FERNANDEZ-PATRON, C., CASTELLANOS-SERRA, L., HARDY, E., GUERRA, M., ESTEVEZ, E., MEHL, E., FRANK, R. W. (1998). Understanding the mechanism of the zinc-ion stains of biomacromolecules in electrophoresis gels: generalization of the reverse-staining technique. Electrophoresis 19, 2398–2406. 38 HEUKESHOVEN, J., DERNICK, R. (1988). Improved silver staining procedure for fast staining in PhastSystem Development Unit. I. Staining of sodium dodecyl sulfate gels. Electrophoresis 9, 28–32. 39 JOHNSTON, R. F., PICKETT, S. C., BARKER, D. L. (1990). Autoradiography using storage phosphor technology. Electrophoresis 11, 355–360. 40 NESTERENKO, M. V., TILLEY, M., UPTON, S. J. (1994). A simple modification of Blum’s silver stain method allows for 30 minute detection of proteins in polyacrylamide gels. J. Biochem. Biophys. Methods 28, 239–242. 41 NEUHOFF, V., STAMM, R., PARDOWITZ, I., AROLD, N., EHRHARDT, W., TAUBE, D. (1990). Essential problems in quantification of proteins following colloidal staining with coomassie brilliant blue dyes in polyacrylamide gels, and their solution. Electrophoresis 11, 101–117. 42 RABILLOUD, T., STRUB, J. M., LUCHE, S., van DORSSELAER, A., LUNARDI, J. (2001). A
References comparison between Sypro Ruby and ruthenium II tris (bathophenanthroline disulfonate) as fluorescent stains for protein detection in gels. Proteomics 1, 699–704. 43 SHAW, J., ROWLINSON, R., NICKSON, J., STONE, T., SWEET, A., WILLIAMS, K., TONGE, R. (2003). Evaluation of saturation labeling two-dimensional
difference gel electrophoresis fluorescent dyes. Proteomics 3, 1181–1195. 44 YAN, J. X., HARRY, R. A., SPIBEY, C., DUNN, M. J. (2000). Postelectrophoretic staining of proteins separated by two-dimensional gel electrophoresis using SYPRO dyes. Electrophoresis 21, 3657–3665.
23
1
Electrophysiological Analysis of Ion Channels Michael Pusch Instituto di Biofisica, Consiglio Nazionale delle Ricerche, Genova, Italy
Originally published in: Expression and Analysis of Recombinant Ions Channels. Edited by Jeffrey J. Clare and Derek J. Trezise. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31209-2 This article outlines electrophysiological methods for extracting biophysical parameters that describe two fundamental properties of ion channels: gating and permeation. The Introduction provides a broad overview of the general concepts of ion channel biophysics and the text a review of the kind of information that can be extracted from electrophysiological recordings. The later sections introduce several methods for the analysis of electrophysiological experiments on heterologously expressed ion channels. Many parts are explicit and can be directly applied “at the bench”. Other, more advanced topics (gating current measurements, singlechannel kinetic analysis) are touched upon only superficially since their application requires further background that can be found in the references.
1 Introduction
The function of ion channels is to rapidly pass – in a passive but selective manner – a large number of ions across biological membranes. This electrogenicity is exploited by excitable cells to quickly change the transmembrane voltage allowing, for example, the conduction of the action potential and the postsynaptic electrical response to chemical neurotransmission. Other cells and organelles exploit the large capacity of charge transport for ion homeostasis and transepithelial transport. For the researcher the electrogenicity is interesting and useful because it allows the measurement of ion channel functioning in “real-time”. It is possible to monitor the action of ion channels both in vivo or in appropriate simplified in vitro systems like brain slices. Intracellular electrodes directly measure the membrane voltage of individual cells while extracellular electrodes monitor cellular electrical activity indirectly. However, the recording of the physiological voltage provides little information regarding the biophysical parameters of the underlying channels, for two main reasons. First, all cells possess a complement of different ion channels and other Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Electrophysiological Analysis of Ion Channels
electrogenic transporters that make it difficult to tease apart their respective contribution to the electrical response. Second, the physiological recordings are currentclamp measurements in which the membrane voltage is “freely floating” and the concentration of neurotransmitters or other ligands is uncontrolled. In order to reliably define a physical parameter of an ion channel (ligand affinity, slope of the voltage-dependence etc.) these key variables have to be fixed, or “clamped”. A further important step is to isolate, as far as possible, a single type of ion channel. In physiological preparations this can be partially achieved using appropriate solutions and blocking agents. Heterologous systems are even better in this respect, even if they carry the risk that some physiologically important molecular component is missing. From the perspective described above the application and interpretation of the voltage-clamp measurements by Hodgkin and Huxley [2] revolutionized the approach to ion channel analysis. Today’s description and interpretation of ion channel function still draws heavily on the principles embodied in this work. One of the most important concepts for ion channels is the “gate”. A gate can be either open or shut (closed). There is no intermediate, half-open gate. At the time of Hodgkin and Huxley, there was no direct evidence for this as practically nothing was known at the level of the single channel. The assumption of a simple open or closed gate allowed a convenient description, as a two-state Markov process associated with a linear differential equation at a fixed voltage or ligand concentration. For the classical “m-gate”, the activation gate of the voltage-gated Na+ channel, this equation reads dm (t)=−βm(t)+α(1−m(t)) dt where m(t) is the probability of the gate being open at time t. The opening and closing rate constants, α and β, respectively, are voltage-dependent and represent a model for the underlying molecular rearrangement leading to gate opening. A verification of the open-closed dichotomy of most ion channels became possible with the patch clamp technique [3, 4] that allowed the real-time visualization of single channel opening and closing in almost all types of animal, plant, and bacterial cells (for simplicity here, the existence of “subconductance” states, “flickering”, and other probably ubiquitous complicated single-channel behavior is ignored). Of course, the description of gating as a two-state process is an idealization, opening is not instantaneous. Also, microscopically, a single “open” state does not exist, but rather an almost infinite number of possible molecular configurations that are macroscopically lumped together into “the open” state, because functionally, they are almost indistinguishable from each other. The transitions from closed to open (and vice versa) appear “instantaneously” on single channel recordings, but in reality take several 100 ns. However, despite the availability of several ion channel structures (see for example Refs. [5–7]) and large computing power it is not yet possible to explore channel gating by molecular dynamics. Even ion permeation that occurs on a much faster time scale cannot be fully simulated, even though some theoretical progress has been made, especially for K+ channels (see for example Ref [8]). Quantitative functional measurements are still essential for a detailed insight into the mechanisms of ion channels. It can be expected that
2 Expression Systems and Related Recording Techniques
many more structural data for ion channels will be available in the near future. The structures will guide computational studies and rational mutagenesis in order to understand the mechanisms of function at a molecular model, to obtain high affinity ligands, and possibly to exploit ion channels as molecular devices in applied technological systems. Computational predictions and structure-based hypotheses have to be tested experimentally with functional data. The present chapter aims to provide an aid to designing, analyzing and interpreting such measurements.
2 Expression Systems and Related Recording Techniques
Each type of heterologous expression system determines a range of possible recording techniques. The most popular expression systems are Xenopus oocytes and mammalian cell lines, like HEK293 or CHO cells. A more rarely used system is the incorporation of relatively crude vesicles or purified proteins into planar lipid bilayers. 2.1 Expression in Xenopus Oocytes
The expression in Xenopus oocytes represents an extremely versatile system that allows the application of many different electrophysiological and biochemical methods [see Refs. 9–11]. Normally, in vitro transcribed cRNA is microinjected but expression can also be achieved using nuclear injection of eukaryotic expression plasmids. The oocyte system is popular because electrophysiological recordings can be easily performed by nonexperts employing the two-electrode voltage-clamp technique (TEV) [9]. This method allows a relatively high throughput compared to patch-clamp techniques and is thus often used, for example, for drug screening. A commonly underestimated problem of the TEV technique that is relevant also for qualitative measurements concerns the error introduced by the so-called series resistance (see for example Ref. [12]). The series resistance is caused by a finite conductivity of the oocyte cytoplasm, leading to a voltage drop within the cytoplasm and thus to a voltage error (see Fig. 1). Typical values of the series resistance are of the order of 0.5–1 k. Thus a current of 10 µA will cause a voltage error of the order of 5–10 mV, a value that cannot always be neglected. Furthermore, even when the series resistance error is accounted for, the TEV technique has a limited time resolution of the order of almost 1 ms in most realworld applications. The apparent time resolution can be enhanced but the oocyte nevertheless provides a nonideal space-clamp. “Fast” kinetic parameters that are derived from TEV measurements are therefore seldom comparable to the same parameters measured with the patch clamp technique. Another disadvantage of whole oocytes is that their cytoplasmic content cannot be controlled. This may lead to a significant variability of measurements from different oocytes if the channel properties depend on the cytosolic composition. Also one would often like to
3
4 Electrophysiological Analysis of Ion Channels
Fig. 1 The intracellular series resistance in Xenopus oocytes. Current flowing through the interior of the oocyte leads to a voltage drop caused by the finite resistance of the cytoplasm.
change, or at least to fix, the intracellular solution. Furthermore, in the case of large expression the intracellular ion concentrations can be significantly altered by the voltage-clamp measurements. For example it is very difficult to handle a large expression (>10 µA) of the Cl− selective muscle channel CLC-1, because its kinetics depends strongly on the intracellular Cl− concentration. The disadvantages described above are partially overcome by the “cut-open” oocyte technique [13]. With this method only a small part of the surface area of the oocyte is clamped and the intracellular solution can be exchanged. However, the method is low throughput, necessitates considerable skill, and perfusion of the interior solution is very slow. Thus, this method finds a narrow range of special applications. A general problem with the expression of Ca2+ -permeable channels or channels that depend on intracellular Ca2+ is that Xenopus oocytes endogenously express a large Ca2+ dependent Cl− current, ICl− (Ca2+ ) [14, 15]. With maximal stimulation ICl− (Ca2+ ) can reach several tens of µA of current. Thus activation of Ca2+ permeable channels that leads to an influx of Ca2+ will inevitably activate this endogenous current and confound the measurements. It is also practically impossible to manipulate the intracellular Ca2+ concentration in order to study its effect on expressed channels. Nevertheless, the endogenous current can be exploited as a Ca2+ sensor to test for a possible Ca2+ permeability and also to test if the activation of (expressed) receptors and/or G-proteins results in an increase in the intracellular Ca2+ concentration (see for example Ref. [16]). One other advantage of the oocyte system is that several cRNAs coding for different subunits of an ion channel or other interacting proteins can be co-injected at defined proportions. For example dominant heterozygous genotypes of channelopathies can be simulated by a one to one co-expression of WT and mutant subunits, and possible dominant negative effects can be quantified (see for example Ref [17]). Co-expression of different proteins can also be achieved in transfected cells. However, with the oocyte injection it is easier to precisely control the relative expression of each protein.
2 Expression Systems and Related Recording Techniques
Finally, electrical data recorded with the TEV can be correlated for the same oocyte with the surface expression of the expressed protein, using for example an introduced extracellular epitope [18, 19]. This is of particular importance for channelopathies because many disease-causing mutations are pathogenic because they effectively reduce or enhance the plasma membrane expression of the channel (see for example Refs. [18, 20]). The patch clamp technique can also be applied to Xenopus oocytes after the vitelline membrane has been removed [10]. Recordings can be performed in the cell-attached, the inside out and the outside out configuration (see Ref. [4] for a description of these methods). The electrical properties of the obtainable seal are exceptional – seal resistances >100 G can be achieved, which allow very high resolution recordings. The size of the patch pipette range from very small to “giant” [21], allowing single-channel or macroscopic recordings. Rapid solution exchanges can be applied to excised patches allowing a precise investigation of, for example, transmitter activated channels (see for example Ref. [22]). The excellent electrical properties of the cell-free patch-clamp configuration represent a significant advantage over whole cell recordings of small cells that can suffer from limited time-resolution due to the access (series) resistance (see below). 2.2 Expression in Mammalian Cells
Another popular expression system is “transfected” mammalian cell lines like HEK, CHO or many others (see Ref. [23]). The expression of one or more proteins is induced by the introduction of the DNA in an appropriate eukaryotic (often mammalian) expression vector by various chemical or physical methods. Cells can be either transiently or stably transfected. Stable transfection generally requires the integration of one or more copies of the expression construct into the genome and is initially more labor intensive than transient transfection. It is, however, convenient for long term studies on a particular channel or for drug screening where large numbers of cells may be required. Many molecular biological methods exist that increase transfection efficiency and the level of expression. Expression can also be induced with several different kinds of viruses [24]. The patch-clamp technique is the method of choice for studying the function of channels expressed in these small cells. All configurations (cell attached, whole cell, inside out, outside out) can be applied but the whole cell configuration is the most straightforward and widely used. Indeed, several different technical approaches have been taken to automate the whole cell patch-clamp for high throughput drug screening (see Ref. [25]). Several factors have to be considered for the analysis of whole cell data. The time resolution in voltage-jump experiments is limited by the time that is necessary to charge the membrane capacitance, Cm , across the access resistance, Ra , given by τ =Cm Ra
5
6 Electrophysiological Analysis of Ion Channels
Typical values of Cm = 20 pF, Ra = 5 M yield a charging time constant of 0.1 ms. This is adequate for most applications but can lead to problems for very fast kinetics observed for example in voltage-gated Na+ or Cl− channels [26, 27]. The access resistance leads also to an error in the voltage-reading similar to the series resistance problem in the two-electrode voltage-clamp. For a membrane current Im the voltage error amounts to V =Im Ra which for typical values Ra = 5 M, Im = 1 nA amounts to 5 mV and becomes worse for larger currents and/or access resistances. In voltage-jump experiments with large currents and fast kinetics the two kinds of errors described combine to create a complex dynamic error that can render certain measurements uninterpretable. Most amplifiers provide an access (series) resistance compensation that compensates both kinds of errors based on an estimate of Cm and Ra . Since the compensation involves positive feedback elements it increases the noise and is prone to oscillations. Care must therefore be taken with its application (see Ref. [28]). Similar to the oocyte system cell-free patches allow a much better voltage-clamp and also faster solution changes. However, because most mammalian cells are quite small, “macroscopic” recordings are more difficult to achieve in excised patches since large pipette diameters are poorly tolerated. 2.3 Leak and Capacitance Subtraction
In heterologous expression systems unwanted currents can arise either from true leak caused by the recording electrodes or from endogenous ion channels and transporters. These have to be carefully avoided using appropriate solutions and protocols. The subtraction of currents remaining after application of a specific blocker, if available, at a saturating concentration, is a very good but often tedious method. For ligand-gated channels the subtraction of currents at zero concentration of ligand is obviously a good method, because the spontaneous open probability is very small in most cases. For voltage-gated channels studied by step-protocols the responses are additionally distorted by the capacitive transients. These can be assumed to be linear, meaning that their size is proportional to the voltage step but independent of the voltage from which the pulse is delivered. Thus smaller steps applied in a voltage range where channels are closed or steps to the reversal potential can be used to subtract capacitive transients after appropriate scaling (see for example Refs. [29, 30]). The most commonly used protocol for the subtraction of linear leak and capacity currents when measuring voltage-gated sodium and potassium channels is called the “P/4 method” [29], that is a standard feature of most data acquisition programs. For this method four small voltage pulses with a quarter of the size of the main voltage pulse are delivered before or after the main pulse. The small pulses are subthreshold and elicit exclusively leak and capacitive
3 Macroscopic Recordings
currents. The response to the four quarter-sized pulses is summed and subtracted from the main response. Linear components are therefore practically completely subtracted.
3 Macroscopic Recordings
In the following sections it is assumed that the ion channel of interest is expressed in a heterologous system and represents the major contribution to the total membrane conductance. The methodologies to extract various biophysical parameters that are useful for a characterization of ion channels are explained. Single channel measurements, if recorded at a sufficient bandwidth, contain, in principle, more information than macroscopic ensemble measurements. However, they are significantly more technically demanding and the very small single channel currents for many ion channels renders their analysis virtually impossible. In addition, fast kinetics are difficult to measure at the single channel level because of the lower signal to noise ratio that has to be compensated by heavier filtering. Thus, for many applications macroscopic recordings represent the only practical approach. The most important relation regarding macroscopic currents is given by I=Ni p
(1)
that describes the total current, I, through a homogenous population of N independent channels. A basic assumption is that the channel under investigation possesses a single open state with current level i, and is without subconductance states. Of course this is an oversimplification for many channels. Without this assumption, however, a practical interpretation of macroscopic currents is almost impossible, because it is very hard to tease apart different conductance levels of a single channel from macroscopic recordings. The parameter p represents the open probability of the channel, that is the time- and/or voltage- and/or ligand-dependent probability of the channel being in a conducting state with an associated current, i. The single parameter p in Eq. (1) summarizes the combined action of all gates of the channels. Often it is useful to think of the gates as independent devices. For example the voltage-gated Na+ channel of Hodgkin and Huxley has 3 m-gates and one h-gate, all independent from each other such that the parameter p is equivalent to p = m3 h. While the independence of different gates is seldom realistic, it is very useful conceptually. Eq. (1) incarnates one of the dogmas of ion channel biophysics: permeation through the open channel, characterized by the parameter i, is independent of channel gating, characterized by the parameter p. This is a very useful conceptual distinction, although in real life it breaks down immediately: the occupancy of the pore by permeating ions generally stabilizes the protein structure and thereby influences gating. However, such effects are generally relatively small with some exceptions. For example, in CLC type Cl− -channels, permeation and gating are
7
8 Electrophysiological Analysis of Ion Channels
strongly coupled [31, 32]. On the other hand, gating has practically no influence on permeation through the open pore, because, by definition “open” implies an open gate. Any influence of closed gates on ion occupancy of pore binding sites vanishes rapidly after opening the gate because the two processes occur on vastly different time scales. This assumption becomes questionable only if a rapid “flicker type” gate is present that opens and closes on a time scale more similar to that of ion conduction. For example in KvLQT1 (KCNQ1) K+ channels a rapid flicker type gate seems to be present that leads to drastic effects of Rb+ ions on the macroscopic current amplitude [33]. In this case it is difficult to distinguish between an effect of Rb+ on permeation or gating because the concept of a “gate” becomes questionable. 3.1 Analysis of Pore Properties–Permeation
Two basic parameters are important when considering permeation properties. One is ion selectivity and the other conductance. Ion selectivity is the ability to favor one ion over another and is expressed as the “permeability” ratio of the two ions, for example PK /PNa for potassium and sodium. It is, in practice, determined from the reversal potential measured in mixtures of the two ions. The simplest situation is when the ions are present at equal concentrations, one on one side of the membrane and the other on the other side – so called bi-ionic conditions. We consider the example of a cationic channel measured with 150 mM NaCl intracellular and 150 mM KCl extracellular. Then the reversal potential, E rev , is given by E rev =RT/(zF )ln(PK /PNa )
(2)
where R is the gas constant, T the absolute temperature, F the Faraday constant and z the valence (z = 1 for the example above). At room temperature the factor RT/F amounts to ∼25 mV, a value that is very useful to remember for electrophysiologists. Inverting Eq. (2) yields PK /PNa =e zF E rev /(RT ) Thus, for example, E rev = 25 mV indicates an e-fold higher permeability of K+ versus Na+ (e ∼2.718). Bi-ionic conditions are preferable but are not easily achieved in some experimental systems. For example in TEV recordings from oocytes the intracellular solution cannot be changed. In this case, the difference of the reversal potential that arises by changing from one extracellular solution to an other is measured. To obtain a permeability ratio, the Goldman-Hodgkin-Katz equation is used E rev = RT/F In((PK [K]ext + PNa [Na]ext )/(PK [K]int + PNa [Na]int ))
(3)
where [K]ext is the extracellular K+ concentration and similarly for Na+ (for anions the sign of the reversal potential has to be inverted). We consider the case where
3 Macroscopic Recordings
the extracellular concentrations of Na+ and K+ are changed from [Na]0 to [Na]1 and from [K]0 to [K]1 , respectively, and assume that the measured reversal potential changes from E 0 to E 1 . From Eq. (3) it follows that the permeability ratio is given by PK /PNa =([Na]1 −[Na]0 eφ )/([K]0 eφ −[K]1 ) where φ = F(E 1 −E 0 )/(RT) ∼ (E 1 −E 0 )/(25 mV). The assumption is that the intracellular concentrations do not change. When ions of different valence are compared (for example Na+ and Cl− or Na+ and Ca2+ ) the equations change slightly [1] but the basic type of measurement remains the same. One important and often overlooked problem of reversal potential measurements is the presence of liquid junction potentials that invariably arise when a solution is exchanged for another. The liquid junction potential is caused by the different mobility of different ions and is most pronounced when small inorganic mobile ions (for example Na+ , Cl− ) are exchanged by large organic quite immobile ions (for example NMDG+ , gluconate− ; see Ref. [34] for how to determine and correct for liquid junction potentials). When the Cl− concentration is changed care must be taken because most reference electrodes are Ag/AgCl electrodes that must be shielded from the solution exchange, for example by agar bridges. It may be difficult to determine the reversal potential because the current flowing through the channel is small, such that endogenous background conductances or a leak conductance dominate the effective reversal potential. One reason for this might be that the gates are closed at the expected reversal potential. This is especially a problem for voltage-gated channels. In this case so-called tail-current analysis can be applied to determine the reversal potential and the shape of the single channel current-voltage relationship, as illustrated in Fig. 2. The currents shown in Fig. 2c were simulated based on the two-state scheme shown in Fig. 2a using the pulse-protocol illustrated in Fig. 2b. Opening and closing rate constants are exponentially voltage-dependent such that the channel closes at negative voltages and opens maximally at positive voltages. The channel was assumed to have a linear single-channel current-voltage relationship (i – V) with an imposed reversal potential of E rev = −70 mV. However, from the steady state current-voltage relationship, obtained at the end of the variable pulse (see box in Fig. 2c, squares in Fig. 2d), the reversal potential cannot be obtained, because the open probability is too low. In contrast, the initial current, “immediately” after the end of the activating voltage-step (arrow in Fig. 2c), is measurable at these negative voltages. This “instantaneous tail current” and the resulting instantaneous current–voltage relationship (Fig. 2d, circles) reflect the shape of the single channel current–voltage relationship. This can be seen from the equation Itail (V t )=Npend i(V t )
(4)
where Itail is the instantaneous current at the tail-voltage,V t , pend is the openprobability at the end of the activating prepulse, and i (V t ) is the single channel
9
10 Electrophysiological Analysis of Ion Channels
Fig. 2 Tail current analysis. Macroscopic currents were simulated for the two state scheme shown in A. The reversal potential was −70 mV. From the holding potential of −80 mV a 0.2 s prepulse to 60 mV was followed by variable pulses ranging from −140 to 80 mV as illustrated in B. At negative voltages currents deactivate quickly
such that the reversal potential cannot be reliably obtained from the steady-state currents (squares in D). The initial tail currents (circles in D) faithfully reproduce the linear single-channel current–voltage relationship and allow a precise determination of the reversal potential.
current at V t . It is assumed that the open probability immediately after the voltagestep remains at the value it had at the end of the prepulse (pend ). Then, the factor N pend is independent of the tail voltage, and Itail (V t ) is proportional to i (V t ). If the purpose is only to determine the reversal potential it is not very important to get exactly the initial current, and an average over a short stretch of currents shortly after the voltage jump can be calculated. For measuring the exact shape of the single channel i–V, care must be taken to determine the “correct” initial value. This may be complicated if the deactivation is fast and the voltage jump
3 Macroscopic Recordings
is associated with a large capacitance transient. In this case the time course of the deactivating current (after the capacitance transient) is fitted by a suitable function (for example an exponential function) that is then back-extrapolated to time “zero”. The determination of the permeability of a blocking ion can also be difficult. For example the muscle Cl− channel CLC-1 is blocked by iodide while iodide has a significant permeability with a permeability ratio of P1 /PCl ∼ 0.2. If extracellular Cl− is completely exchanged by iodide, however, CLC-1 is totally blocked and the reversal potential is dominated by endogenous background conductances, resulting in a wrong estimate of P1 /PCl . The problem can be solved by partially exchanging extracellular Cl− for iodide (for example change from 100 mM Cl− to a solution containing 20 mM Cl− and 80 mM iodide) leaving enough Cl− to allow for a significant conductance. Eq. (3) can then be used again to quantify the permeability ratio. 3.2 Analysis of Fast Voltage-dependent Block – the Woodhull Model
Tail current analysis is useful to analyze voltage-dependent block by fast blockers. A commonly used model to describe voltage-dependent block is the Woodhull model [1, 35]. In this model it is assumed that the charged blocking particle enters the channel pore to a certain distance, and senses therefore part of the transmembrane electric field. Block is quantified by I(c) 1 = I(0) 1+ K Dc(0) exp(zδV F /(RT ))
(5)
Here I(c) is the current in presence of blocker at concentration c, K D (0) is the dissociation constant at zero voltage, z the valence of the blocking ion and δ the “electrical” distance of the binding site from the bulk solution, that stands for the fraction of the electric field from the bulk solution to the binding site. Eq. (5) describes simultaneously the concentration and the voltage-dependence of block with just two parameters, K D (0) and δ (see Fig. 3). Another way to describe the Woodhull model is in terms of an exponentially voltage-dependent dissociation constant (Fig. 3c). The simplicity of the Woodhull model makes it attractive and it is often a good initial model. Several assumptions are, however, seldom truly satisfied. Firstly, “blocking” ions are often permeable to some extent and may “punch through” at large voltages. Also, almost all ion channels have multi-ion pores in which the ions interact. A blocking ion could displace a permeable ion present at the blocking site within the pore. Such effects add to the intrinsic voltage dependence of block (described by δ) and complicate the picture. The incorporation of such features into mechanistic models is beyond the scope of this chapter. See Ref. [1] for a comprehensive description of blocking mechanisms.
11
12 Electrophysiological Analysis of Ion Channels
Fig. 3 Illustration of the Woodhull model of channel block. A linear single-channel i–V and a zero reversal potential was assumed for the unblocked channel (solid line in A). The parameters of the block were K D (0) = 1 mM and z * = 0.4. Application of increasing amounts of blocker (0.5, 1, and 5 mM) produces the increasing voltage-dependent
block dashed lines in A). The concentration dependence of block is illustrated in B for different voltages (−100, −30, 30, and +100 mV). The exponential voltagedependence of the apparent K D , determined by fitting the curves in B with a simple 1:1 binding curve is illustrated in C.
3.3 Information on Gating Properties from Macroscopic Measurements
Macroscopic currents can be used to obtain an estimate about the open-probability. According to Eq. (1) the macroscopic current is proportional to the open probability, p, but further information or assumptions about the number of channels, N, and the single-channel current, i, are necessary to estimate p from the measured current, I. The number of channels can be assumed to be constant in a typical experiment, as long as its duration is relatively short and no particular maneuvers are undertaken to enhance or decrease protein turnover. Some ion channels can be drastically affected by co-expression and acute manipulation of co-expressed or endogenous regulating proteins, like the ubiquitin-protein ligase Nedd4 [36]. The number of channels can also be nonspecifically affected by agents that lead to a general retrieval of plasma membrane. For example treating Xenopus oocytes with phorbol esters, activators of protein kinase C, leads to an unspecific reduction of expressed conductances via a reduction of the plasma membrane surface [37]. Nevertheless, in most cases, N can be regarded as fixed. If measurements are performed at a fixed voltage, as in the case of ligand activated channels or in “isochronous tail-current” measurements for voltage-gated channels (see below) the single channel current, i, is also fixed. Otherwise, the shape of the single-channel current–voltage relationship (i – V)
3 Macroscopic Recordings
has to be taken into account. For the purpose of extracting information about the open probability, the i – V is parametrized in a phenomenological manner, without necessarily interpreting the corresponding parameters mechanistically. In the absence of direct information the i – V is often assumed to be linear i(V)=γ (V−Erev ) with a single-channel conductance, y, and a reversal potential, E rev . This assumption is particularly appropriate if the reversal potential is relatively close to 0 mV, because in this case the intrinsic “Goldman”-rectification [38] has little influence on the shape of the i – V. If the reversal potential is far from zero, and if the channel is highly selective for one ion species present in the solutions, the GoldmanHodgkin-Katz equation is more appropriate to describe the i – V because it takes the concentrations of the permeable ion on the two sides of the membrane into account [1]: i(V )=K φ
exp(z(φ−φr ))−1 exp(zφ)−1
(6)
Where K is a constant depending on the ionic concentrations, φ = VF/(RT) and φ r = E rev F/(RT). The nonlinear shape of the Goldman current-voltage relationship is illustrated in Fig. 4 for a monovalent cation with a reversal potential of −60 mV. It is also not uncommon that some voltage-dependent block or strong rectification is present that has to be taken into account for the description of the i – V. Such a block can be phenomenologically described by a factor that is derived from the Woodhull model. For a Goldman-type rectification with additional block the i – V is described by i(V )=K φ
1 exp(z(φ−φr ))−1 exp(zφ)−1 1+exp((V −V1 )/V2 )
(7)
where V 1 and V 2 are empirical parameters describing the block; an example is shown in Fig. 4 (dashed line).
Fig. 4. The Goldman–Hodgkin–Katz equation. The solid line is drawn according to Eq. (6) with Erev = −60 mV. The dashed line is drawn according to Eq. (7) with Erev = −60 mV and V 1 = V 2 = 50 mV.
13
14 Electrophysiological Analysis of Ion Channels
Fig. 5 Determination of the open probability for voltage gated channels. Currents were simulated according to the 2-state scheme shown in Fig. 2A assuming a reversal potential of 0 mV and applying the pulse protocol shown in A. Steady state cur-
rents measured at the end of the variablevoltage pulse (see arrow in B) are plotted in C as squares. The tail currents at the beginning of the constant tail pulse to 60 mV are shown as circles.
3.3.1 Equilibrium Properties–Voltage-gated Channels Two types of pulse protocols are most often used to extract the overall-open probability from macroscopic measurements. They are illustrated in Fig. 5, together with simulated currents based on the 2-state scheme of Fig. 2. In the direct pulse protocol (Fig. 5a and b) pulses are delivered to various potentials and the maximum current during the pulse is plotted versus voltage (Fig. 5c, squares). Usually, the open-probability of voltage-gated channels is parametrized with a Boltzmann distribution, here written in two different versions
popen =
1 1 = 1+exp((V1/2 −V )/k) 1+exp(zg (V1/2 −V )F /(RT ))
(8)
In both forms the voltage of half-maximal activation,V 1/2 , describes the voltage at which popen = 0.5. The steepness of the voltage dependence is described either by the so-called “slope-factor”, k (in mV), or the “apparent gating valence” zg (dimensionless). These two quantities are inversely related by
k=
RT zg F
3 Macroscopic Recordings
The Boltzmann equation derives from the statistical Boltzmann equilibrium that relates the ratio of the probability to be in one of two microscopic states, O and C that differ in free energy by a certain amount G: pO =exp(−G/(RT )) pC The Boltzmann distribution of voltage-gated channels (Eq. (8)) stems from a simple model for voltage-gated channels that assumes that the free energy difference between the open and the closed state is additively composed of an electrical term, determined by the electrical charge, denoted by Q C and Q O , respectively, and a purely chemical term, G0 , such that G is given by G=G 0 +V (QO −QC )=G 0 +V zg F where zg is the apparent gating valence. The larger the charge difference between the closed and open state, the more sensitive is the channel to voltage, and the steeper the popen (V) curve. To extract the gating component (p) from the permeation component (i) for currents obtained from the “direct” I – V (Fig. 5c, squares) the I – V is fitted by the product of the i – V term and the Boltzmann term: I(V )=K φ
1 exp(z(φ−φr ))−1 exp(zφ)−1 1+exp(zg (V1/2 −V )F /(RT ))
(9)
Here, in Eq. (9), the Goldman-Hodgkin-Katz equation (Eq. (6)) was used for a description of the i – V. The four parameters, K, φ r , V 1/2 and zg are obtained from a fit to the macroscopic data, while only the two parameters V 1/2 , and zg are relevant for gating. The tail-pulse protocol illustrated in Fig. 5a and b (see arrow) is often called “isochronous tail protocol” because the fixed tail pulse is applied after a fixed amount of time. The initial tail current is a measure of the open probability at the end of the (variable) pre-pulse (see Eq. (4)). As for the instantaneous I–V (see Section 3.1) a correct determination of the initial tail current may be hindered by the capacitive artifact. A careful back-extrapolation of the time course of the tail current to “time 0” may be necessary to obtain a reliable estimate of the initial tail current. The tail voltage should be chosen such that the relaxations are well resolved with the employed voltage-clamp technique. The resulting initial tail currents are then plotted versus the pre-pulse voltage (Fig. 5c, circles) and fitted by I(V )=
Imax 1+exp(zg (V1/2 −V )F /(RT ))
(10)
Here, Imax , is the maximal current obtained at saturating voltages. It can be determined by normalization with the measured currents or it can be left as an
15
16 Electrophysiological Analysis of Ion Channels
independent parameter. The latter possibility is particularly appropriate if the employed voltage range is not sufficient to saturate channel gating. In this case the plateau of the Boltzmann distribution is not reached and currents can not be normalized by the maximally measured value. Sometimes it happens that currents do not tend to zero at voltages where the channel should close according to Eq. (8). This may be an intrinsic property of the gating mechanism of the channel or may represent an uncompensated leak component. If such a “residual” open probability is an intrinsic property, a description with a Boltzmann distribution (Eq. (8)) is, strictly speaking, not adequate. Nevertheless the shape of the popen (V) curve can often be described phenomenologically by a modified Boltzmann distribution I(V )=Imax (pmin +
1− pmin ) 1+exp(zg (V1/2 −V )F /(RT ))
Where pmin describes the minimal open probability reached at saturating voltages where channels are maximally closed. V 1/2 is no longer the voltage at which popen = 0.5 but where popen = pmin + 0.5(1–pmin ). The isochronous tail-current protocol is, in principle, superior to the direct I – V because it is not influenced by the shape of the i – V. However, in certain cases a direct method has to be employed. For example, voltage-gated Na+ channels are governed by two main gating processes of opposite voltage dependence and one wants to determine separately their respective voltage dependence. Na+ channels inactivate with a voltage-dependent time-course after an activating voltage step. The steady state, isochronous tail current i – V would determine only the “window” current, the region where activation and inactivation gates are both open. To separate the two gates of the Na+ channel two different protocols are used to assess the voltage dependence of the activation and the inactivation gate, respectively. The “peak current” of the direct I – V is generally used as a measure of the activation gate (Fig. 6a and b). This is justified because the time constant of activation is considerably faster than that of inactivation. The inactivation is measured with a two-pulse protocol similar to the isochronous tail current protocol (Fig. 6c and d). 3.3.2 Equilibrium Properties–Ligand Gated Channels Conceptually, the determination of equilibrium properties of ligand activated channels is similar to that of voltage activated channels. The energy driving the conformational change is not supplied by the membrane voltage but by the chemical energy of ligand binding. Accordingly, the relevant intensive variable is the ligand concentration, here denoted by [L]. However, for ligand activated channels the allosteric action of the ligand is much more evident and explicit than is the voltage for voltage gated channels [39] (even though most quantitative models of
I(V )=G Na (V −E rev )
1 1+exp(zg (V1/2 −V )F /(RT ))
3 Macroscopic Recordings
Fig. 6 Activation and inactivation of voltage-gated sodium channels. Currents were recorded from tsA201 cells transfected with the cardiac sodium channel and measured using the whole cell configuration of the patch clamp technique. In A currents were elicited by 10 ms voltage steps from −80 to 50 mV (see inset). In B the peak currents are plotted versus the test voltage (symbols) superimposed with a fit of the equation as shown by the lines. C shows
currents from a different cell evoked by a two-pulse protocol as shown in the inset. The response to the 100 ms long prepulse to voltages from −120 to −10 mV is not shown. The currents represent the response to the fixed tail pulse to −10 mV that assays channel availability. The peak currents are plotted in D (symbols) together with a fit of Eq. (10). Note that activation (A, B) and inactivation (C, D) have an opposite voltage-dependence.
voltage gated channels often have an allosteric character). Thus instead of assuming a scheme of the form Scheme 1.
where binding of a ligand directly opens the channel, the minimal scheme applied for ligand activated channels assumes that binding of the ligand favors opening but does not directly open the channel. This scheme is given by Scheme 2.
with a closed, unliganded state U, a liganded (bound) but closed state, B, and an open state, O [40]. Ligand binding occurs with second-order association rate kon and dissociation rate koff , while the allosteric transition is described by rate constants α and β.
17
18 Electrophysiological Analysis of Ion Channels
More complex schemes are necessary to include the possibility that unliganded channels are able to open [39]. Allosteric schemes and equations become even more complex if more than one binding site is present, the usual case in real life. Therefore, mostly for reasons of simplicity, the phenomenological description of ligand activated channels is often expressed in terms of the Hill equation popen =
pmax
(11)
1+( K[L]D )n
Where K D is the apparent dissociation constant of the binding site(s) and n the Hill coefficient, an estimate of the number of binding sites [1]. For the simple 2, equilibrium properties can indeed be expressed in the form of Eq. (11) popen =
1 1+ 1r + 1r K[L]D
=
r/(1+r) K* 1+ [L]D
(12)
Where r = α/β, KD = koff /kon , and the apparent affinity K D * = K D /(1 + r). The maximum open probability is pmax = r/(1 + r) = α/(α + β). Thus, even though the concentration dependence of the macroscopic current strictly follows a 1:1 binding isotherm, the measured apparent affinity can be very different from that of the true affinity of the ligand binding site. K D * is always smaller than K D and only if r is very small (α β) is the apparent affinity equal to the true affinity. But in this case the ligand is not very effective (pmax 1). If r is very large, the ligand is very effective (pmax ∼ 1) but the apparent affinity is much higher than the true affinity (K D * K D ). Since the absolute open probability is difficult to determine from macroscopic equilibrium measurements alone, additional information is necessary to determine true affinities and ligand efficacies. These can stem from kinetic macroscopic measurements, noise analysis, or single-channel analysis (see below). A fundamental difference between voltage gated channels and ligand gated channels is that the latter can be more or less efficiently activated by different types of ligands (for example certain glutamate receptors can be activated also by NMDA), while there is only one stimulus (voltage) for voltage gated channels. In terms of the simple model shown by Scheme 2, quantified by Eq. (12), different ligands have generally a different (true) affinity, and a different efficacy (pmax ). For ligands that occupy the “same” binding sites the true number of binding sites is the same. The Hill coefficient in Eq. (11) may nevertheless be different, since Eq. (11) is an approximate phenomenological description of channel activation. Certain ligands might also counteract the effect of a more potent activator. Furthermore, certain receptors possess different kinds of binding sites. For example certain glutamate receptors need glycine as a co-factor for full activation by glutamate [41]. A good overview over equilibrium properties of ligand gated ion channels with further reference is given by [39].
3 Macroscopic Recordings
Many ligand gated channels exhibit desensitization: currents decrease despite the continuous presence of ligand. The degree and kinetics of desensitization vary wildly between different channel types (see Ref. [42] and references therein). This phenomenon is conceptually similar to the “inactivation” of voltage gated channels. The presence of desensitization complicates the determination of activation properties. Experimentally, the most difficult problem, particularly if desensitization is an issue, is the fast application of the ligand (see Section 2.2). 3.3.3 Macroscopic Kinetics Channel kinetics can be evoked by a variety of stimuli. Sometimes, biophysical analysis alone is not sufficient to determine the physiological effect in a complex cellular system, as for example a cardiac myocyte, where numerous channel types contribute to the various phases of the action potential. In such cases, a “physiological” stimulation with an action potential waveform or other stimuli might reveal if, for example, a given mutation leads to action potential shortening (see for example Ref. [43]). However, such physiological stimuli are not well suited to uncovering the underlying mechanistic effect. As outlined in the Introduction, for this purpose clamping the relevant intensive physical parameter (voltage, ligand concentration, temperature, pressure, light intensity,) to a fixed value and performing jump experiments are more informative. Like practically all kinds of conformational changes of proteins, current relaxations of a homogenous population of channels induced by a step-wise change of an intensive parameter, can be described by the sum of a constant term (the steady-state current, I∞ ) and one or more exponential functions
I(t)=I∞ +
n
ai exp(−t/τi )
(13)
i=1
With amplitudes, ai , and time constants, τ i (t is the time after the jump). The kinetics is thus determined by 2n+1 parameters. The time constants, τ i , depend only on the actual value of the relevant physical parameter (the voltage or the ligand concentration after the jump), while the coefficients, ai , depend on the state occupancy before the jump. The exponential time dependence is a mathematical consequence from the Markov property of the conformational changes: once the channel undergoes a conformational change it loses “immediately” the memory of from what state it arrived (if there is more than one possibility to arrive in a certain state) and it has no memory of how long it has already been in a given state. One of the underlying assumptions is, of course, that there exists a finite set of definable, stable “states”. Actually, it is more scientific to turn the argument around: the experimental result that relaxation kinetics for most channels can be well described by Eq. (13) (with a reasonably small and reproducible number, n, of components) suggests that the Markov assumption is valid for ion channels. What is a reasonably small number, n? This is indeed a difficult question. In principle, a gating scheme with N states predicts exponential relaxation with N-1 components (for example a two-state system has single-exponential kinetics). In practice it is very difficult to reliably fit more than two or three exponential components. However,
19
20 Electrophysiological Analysis of Ion Channels
gating schemes often require more than four states. Indeed it is extremely difficult to define a “correct” gating scheme based on fitting of current relaxations. Often gating schemes that have a large number of states can be simplified with symmetry arguments and the number of free parameters can be reduced. A beautiful example are the Hodgkin–Huxley equations (see Ref. [1]), but also recent models of Shaker K+ channels have a relatively small number of parameters, despite a large number of states (see Refs. [44, 45]). Furthermore, current relaxations are often approximately single- or double exponential under certain conditions, even though a full kinetic model requires many states, because most components are negligible. For example the deactivation of Na+ channels at very negative voltages after a brief activating pulse can be well described by a single exponential, even though the general gating involves very many states. Often, kinetics of ion channels is fitted and time constants are determined phenomenologically without necessarily wanting to define a molecular mechanism of gating. In many circumstances, this is the only practical choice, because the underlying mechanism is too complex to be determined reliably from the measurements. One of the most difficult problems in curve fitting with exponentials is to separate components with time constants that differ less than, let us say three-fold. Extreme care has to be taken in such cases and reproducibility has to be tested extensively. Often data can be reasonably well described with the sum of two exponentials, even though the “true” mechanism would require at least three. In such cases, the time constants (and relative weights) of the exponential components determined from the double-exponential fit can be almost meaningless. If the true time course is distorted slightly, for example by voltage-clamp errors (see above), the kinetic analysis becomes even more difficult. Under certain conditions it is impossible to directly follow the kinetics of a process measuring the ionic current. For example, the channel may be closed quickly by one kind of gate while another gate is slowly changing its status. Another reason that renders impossible a direct measurement might be that the voltage is close to the reversal potential or that a strong block occurs. Also, a large capacitive artifact may obscure fast relaxations [27]. In these cases so-called “envelope” protocols can often be applied, as illustrated in Fig. 7 that shows the classical protocol to study the recovery from inactivation of the voltage gated Na+ channel or, completely analogously it illustrates the measurement of the recovery from desensitization of a ligand gated channel. It is insightful to explicitly consider the kinetics of the most simple, two state system, because often more complex schemes can also be simplified to it, allowing an easy quantitative description. Scheme 3.
Opening occurs with rate-constant α, closing with rate constant β. These rate constants depend on the intensive physical parameters (voltage, ligand concentration). The equilibrium open probability is α p∞ = α+β
3 Macroscopic Recordings
Fig. 7 Envelope protocol to study recovery from inactivation. The pulse protocol is illustrated in A, current traces are shown in B. Currents during the recovery period are not shown. In C the peak current at the fi-
nal test pulse is plotted versus the recovery time together with a single exponential fit. Currents were simulated with a simplified Hodgkin–Huxley model.
While the relaxation time constant is given by τ=
1 α+β
Knowing, both p and τ allows the determination of α and β: α= p∞ /τ ;β=(1− p∞ )/τ Relaxations that start from a given value of open probability, p0 , proceed in time as α= p∞ +( p0 − p∞ )exp(−t/τ )
The Del Castillo-Katz model for ligand gated channels (Scheme 2) can be reduced to an effective two-state system if the ligand binding/unbinding is much faster than the opening isomerization transition (described by α and β). In this case the receptor is always in equilibrium with the ligand (see Ref. [1]): Scheme 4.
21
22 Electrophysiological Analysis of Ion Channels
Where the effective opening rate is given by αeff =
1 1+ K[L]D
α
Combining steady state-measurements (Eq. (12)) and relaxation measurements with step-changes in ligand concentration allows the determination of all three parameters (K D , α, β) of the model of Scheme 4: for jumps into zero concentration of ligand, the relaxation rate is τ −1 = because α eff = 0 for [L] = 0. For jumps into saturating concentration of ligand the relaxation rate is τ −1 = α + β because α eff = α for [L] KD . These kinetic experiments thus provide estimates for α and β. From the equilibrium measurements the apparent affinity, K D * is determined (Eq. (12)) and using the values for α and β the true affinity, K D , can be calculated from K D = K D * (1 + α/β). This simple example illustrates how the combined use of equilibrium and kinetic measurements in addition to simplifying assumptions can be used to obtain quantitative information about a molecular mechanism.
3.4 Channel Block
All ion channels interact with a variety of smaller molecules (peptides, small organic molecules) that directly or indirectly reduce ion permeation. Such substances are called blockers or inhibitors and are the bread and butter of the pharmaceutical industry interested in ion channel targets. Blockers may directly block the pore and physically impede ion flow. Inhibitors may reduce current flow by stabilizing the closed state of a channel gate, for example, in which case the binding site may be far away from the ionic pore. Such inhibitors are often called “gating modifiers”. This distinction between these two kinds of modulators is actually not so strict - many pore blockers additionally alter the gating by binding more tightly to one or another gating state (state-dependent block). Such blockers may be useful tools to study the properties of channels [46, 47]. Many pore blockers exert a voltage-dependent block that can often be described by the Woodhull model (see Section 3.2). For most practical purposes channel block is quantified by the Hill equation I([B]) 1 = I(0) 1+( K[B] )n D
that quantifies the ratio of current in the presence of blocker at concentration [B] to the current in its absence with an apparent affinity K D and the Hill coefficient, n.
3 Macroscopic Recordings
3.5 Nonstationary Noise Analysis
The opening and closing of ion channels is a random process that renders current registrations “noisy”, in particular if few channels are present. The statistical properties of the noise can be used to infer some characteristics of the underlying elementary events. A prerequisite is that the channel-induced noise is significantly above the background noise of the recording system. This condition is, for example, generally not fulfilled in TEV recordings from Xenopus oocytes. The patch-clamp technique, on the other hand, is exceptionally well suited for noise analysis. However, background noise may be large if the series resistance in whole cell recordings is highly compensated (see Section 2.2). For stationary noise analysis the system is recorded for a prolonged time at fixed external conditions (voltage, ligand concentration). The power spectrum is then fitted with the sum of Lorentzian functions. While this method can yield important information [48], nonstationary noise analysis is often faster and easier to perform. Under appropriate conditions, each of the elementary parameters of Eq. (1) can be determined assuming a single open conductance level. This should be verified with single channel analysis. For standard nonstationary noise analysis a step-protocol (that may be a voltagestep or stepwise change in ligand concentration) is applied repeatedly, with enough time passing between individual stimulations to ensure identical initial conditions for each step. Each current response, Ii (t), is recorded (i = 1, . . ., n) (Fig. 8a). From these recordings the mean can be calculated by
=
n 1 Ii (t) n i=1
(14)
This is now a much smoother curve than the individual traces (Fig. 8b) and because of this it can be written a =Nip(t) where p(t) is the time course of the (“true”) open probability. The variance of the response, σ 2 (t), for each time point, t, is given by σ 2 (t)=
n 1 (Ii (t)−)2 n i=1
(15)
the standard statistical definition (Fig. 8c). However, the most important “trick” in nonstationary noise analysis is to calculate the variance not as suggested by Eq. (15) but as the squared difference of consecutive records: 1 (Ii+1 (t)−Ii (t))2 2(n−1) i=1 n−1
σ 2 (t)=
(16)
23
24 Electrophysiological Analysis of Ion Channels
Fig. 8 Nonstationary noise analysis. Currents were repeatedly evoked by a test pulse and individual responses are shown in A (currents were simulated with a simplified Hodgkin–Huxley model). The mean and the
variance are shown in B and C, respectively. In D the (binned) variance is plotted versus the respective mean together with a fit of Eq. (18). The horizontal line in D marks the level of the background variance.
(Note the scaling by a factor of 1/2 in Eq. (16)). For a perfectly stable system the results obtained from Eq. (15) and Eq. (16) are identical. However, in reality, small drifts of the total current amplitude (“run-down”, “run-up”) or of the reversal potential or other parameters are practically unavoidable. These small drifts are well cancelled out using Eq. (16) while they artificially increase the variance when calculated by Eq. (16) and may render the noise analysis meaningless, particularly if the single channel conductance is small [49]. Meaningful results can be obtained even with run-down up to a factor of two or more. Since the variance is obtained from the differences of records, leak currents and capacity transients also cancel out (if they are associated with negligible inherent noise). That means that the individual records do not have to be, and should not be, leak-corrected for the application of Eq. (16), in contrast to the calculation of the mean (Eq. (14)). A nice property of the variance is that independent noise source adds independently to it. Thus, total variance is given by 2 2 2 =σchannel +σbackground σtot
3 Macroscopic Recordings
and the background variance can usually be assumed to be independent of voltage. It can thus be determined at a voltage where no ion current flows through the channels (for example at the reversal potential or in the absence of ligand for ligand activated channels or at a voltage where channels are closed for voltage gated channels). Having determined the variance and the mean it is now possible to proceed with the “variance-mean analysis”. For this we use the equality σ 2 =Ni2 p(1− p)
(17)
This fundamental equation [48, 49] can be understood intuitively: for p = 0 the variance is null because no current flows. Also for p = 1 (the maximum value it can attain) there is no fluctuation in the current because all channels are permanently open. The largest fluctuations are present for p = 0.5 when channels are half open and half closed on average. Note that (Eq. (17)) is independent of the kinetics of the fluctuation. However, the bandwidth of the recording system must be sufficiently large to resolve the fastest transitions. Combining Eq. (17) with Eq. (1) yields σ 2 =i I−I 2 /N
(18)
that relates the two macroscopic quantities, σ 2 , and mean current I, in a parabolic function that depends on two parameters, the single channel current i, and the number of channels, N, that can be determined by a least-squares fit (Fig. 8d). Often the whole traces are not plotted and fitted against each other but they are first binned in an appropriate manner [49] (see Fig. 8d). The two parameters, i and N, are best defined if a large interval of popen is covered by the relaxation. If only a very small interval of popen is sampled, the two parameters cannot be determined independently: a small variance can be caused by a small number of channels and/or by a small or large popen and vice versa. It may be that the current excursion in the relaxation is substantial but that popen remains significantly smaller than 0.5. In this case the second term in Eq. (18) is negligible, the relationship is linear, and only the single channel current can be determined. If both, i and N, can be determined the absolute popen during the relaxation can be calculated from p(t)=
I(t) Ni
Thus, using a simple experimental protocol (Fig. 8) allows a quite precise determination of fundamental channel parameters, without the need to perform single channel analysis. 3.6 Gating Current Measurements in Voltage Gated Channels
Another way of obtaining additional information about molecular gating mechanisms is to measure the so-called “gating currents” associated with the molecular rearrangements of voltage gated cation channels [50]. These are transient currents,
25
26 Electrophysiological Analysis of Ion Channels
similar to capacitive currents, that reflect the movement of the gating charges within the electric field. Of course, any conformational rearrangement of the channel protein that is associated with charge redistribution gives rise to “transient” gating currents, even if they do not reflect the movement of a “voltage sensor”. However, in voltage gated K+ , Na+ and Ca2+ channels the voltage-sensor movements clearly dominate the gating currents. In order to resolve gating currents that are of small magnitude it is necessary to eliminate the normal ionic currents by applying blockers or by eliminating permeant ions. However, it is necessary to ensure that such maneuvers of completely eliminating ion flow through the pore does not alter significantly the gating process itself, an often difficult task. It is beyond the scope of this chapter to describe the design and the analysis of gating current measurements (see Ref. [51] for review). However, the estimation of the total gating charge of a single voltage gated K+ channel provides a nice example of this approach [52]. The authors determined first the number of channels, N, in an inside out patch using nonstationary noise analysis (see Section 3.5). Then they replaced intracellular K+ with TEA+ , a blocker of K+ channels, to eliminate the ionic currents and measured the total gating charge, Q, by integrating the gating currents. The ratio Q/N, the gating charge per channel, was about 12 elementary charges, consistent with more indirect measurements. Another nice result was obtained by Conti and St¨uhmer [53], who estimated the size of the charge of a single voltage sensor in Na+ channels. Voltage gated cation channels possess four voltage sensors that move more or less independently of each other. The movement of each sensor produces a spike-like tiny current. The ensemble of many sensors is the random superposition of many such spikes, filtered at the recording frequency. Nonstationary noise analysis yielded an elementary charge of individual spikes of about 2.3 elementary charges, a very reasonable value.
4 Single Channel Analysis
The possibility to observe and analyze the opening and closing of single ion channel molecules marked a revolution in ion channel research and remains one of the few techniques that allow a true single molecule measurement in real time [3, 4]. Single channel recordings can provide a wealth of information and numerous numerical methods for single channel analysis have been developed. The reader is referred to “Single channel Recording” by Sakmann and Neher for detailed information [54]. Here a very broad overview of a typical single channel analysis is provided. Recent papers utilizing more advanced techniques can also be found [55, 56]. 4.1 Amplitude Histogram Analysis
The first step of a single channel analysis is usually to construct an amplitude histogram. Already at this point one ever returning aspect of single channel analysis
4 Single Channel Analysis
becomes important: adequate filtering. Of course the data should be filtered with a good filter (for example an 8-pole Bessel filter) with a cut-off of at least at one half the sample frequencies to avoid aliasing [57]. In order to get, at least in principle, as much information as possible, the sampling rate must be sufficiently high. It is however useless to acquire data at a low signal-to-noise ratio. A dilemma is sometimes that one would like to see online the highly filtered data in order to get an immediate impression of its quality, but one also wants to acquire at a higher frequency for quantitative offline analysis. One possibility is to divert the signal after a primary anti-alias filter into two separately sampled channels. One signal is acquired after only the first filter, while in the second the current signal is subjected to further filtering for immediate inspection. This can be done on an oscilloscope, or, if the acquiring software allows the simultaneous sampling of two channels, the highly filtered signal can be acquired as a second input channel and visualized on the computer screen. For the construction of the amplitude histogram all sample points that fall in a given “current-bin” are counted, resulting in the number of events per bin that are then plotted versus the mid-point of the current bin. This is illustrated in Fig. 9. To each current level of the channel (“closed” and “open” in Fig. 9) corresponds a “peak” in the amplitude histogram. The peaks may not be well separated because noise is large (Fig. 9b). In this case the data can be digitally filtered by a Gaussian filter that has various convenient properties [57] (Fig. 9c). If the baseline is not stable the amplitude histogram becomes distorted. Excessive baseline drift can make the single channel analysis very difficult and must be corrected. Several analysis programs are available that allow baseline correction and many other features. Once an acceptable amplitude histogram has been constructed it is fitted with the sum of Gaussian functions, Gi , one for each peak, i H(I)=
G i (I)=
i
ai (I−µi )2 exp(− ) σi 2σi2 i
(19)
Where each Gaussian functions is characterized by a mean µi a width, σ i , and amplitude, ai . The inclusion of the width, σ i , in the prefactor ai /σ i in the Gaussian fit (Eq. (19)) facilitates the calculation of the relative area, Ai , that is occupied by each Gaussian component: ai Ai =100% j
aj
The relative area, Ai , is a measure of the probability to dwell in the conductance state associated with mean µi . Often it happens that the membrane patch contains an unknown number, N, of identical channels, leading to equidistant peaks of the amplitude histogram at levels ni (n = 0, 1, . . .), where i is the amplitude of a single channel. Even though the absolute open probability cannot easily be determined in such a situation a useful parameter to evaluate effects of drug application or
27
28 Electrophysiological Analysis of Ion Channels
Fig. 9 Amplitude histogram analysis. Currents were simulated based on a simple 2-state scheme with an open conductance level of 1 pA and an added Gaussian noise of 0.4 pA SD (panel A). The baseline and the open conductance level are indicated by horizontal lines. The noisy trace in A seems to be almost useless. However, the amplitude histogram shown in B clearly shows
two peaks at the right amplitudes (0 pA and 1 pA), and the fit with the sum of two Gaussian functions (thick line in B) correctly predicts the amplitude and area of each conductance level. The trace shown in C is strongly filtered and the histogram in D shows two well separated peaks of correct amplitude and weight. The bin width used for the histograms was 10f.A.
other maneuvers is the so-called “NpO ”, i.e. the product of the (unknown) number of channels and their open probability. From the histogram fit the “NpO ” can be calculated as ∞
Np O =
j =0 ∞
jAj Aj
j=0
where the “areas” Aj are obtained from the Gaussian fits as described above, and A0 corresponds to the baseline. 4.2 Kinetic Single Channel Analysis
A comprehensive description of the kinetic analysis of single channel data is beyond the limits of this chapter and only general directions can be given. The strategic decisions that can be taken for kinetic analysis are outlined schematically in Fig. 10.
4 Single Channel Analysis
Fig. 10 Flow diagram of strategies for kinetic single channel analysis.
The most direct way of analysis is depicted on the left of the figure and consists of directly fitting a “hidden Markov model” to the “raw” data [58–60]. A Markov model is a kinetic scheme like those described above (for example, Scheme 2) with possibly many states of various conductance levels connected by rate constants. The rate constants and the conductance levels of the various states are the parameters fitted in this approach. The model is “hidden” for two reasons. First, several kinetic states may be associated with the same conductance. Transitions among these states are therefore not directly visible. Second, the noise can hide shortlived dwell times or low conductance states. One advantage of the hidden Markov model approach is that it takes the noise into account explicitly [59, 60]. Algorithmically, for the hidden Markov model the following question is raised: for a given set of parameters (these parameters include the rate constants of the model, the conductance level of each state, and parameters that describe the noise) what is the probability of observing the currents that have been measured? The parameters of the model and the characteristics of the noise are then adapted to maximize this probability.
29
30 Electrophysiological Analysis of Ion Channels
The calculation of this probability is a formidable task but efficient algorithms have been developed that allow the analysis of quite long data sets. One drawback of the method is that a reasonable kinetic model should a priori be known. The method can be applied for example to study the effect of mutations of an ion channel for which a kinetic scheme has been established previously. Another drawback is that the method is a kind of black box with little possibility of visual evaluation to check if the results are “reasonable”. A further serious problem may occur if the general properties of the noise are not adequately treated [60]. Thus, while the hidden Markov modeling can be a powerful tool, its use requires some experience and results should be checked with other methods. The more traditional methods of analysis do not work directly on the raw data trace but this is first “idealized” (Fig. 11). In the idealization process the events of channel opening and closing are detected either completely automatically or interactively using specially designed computer programs. The noisy raw data trace is thus substituted by the idealized smooth trace (Fig. 11) that can be easily represented as a list with two numbers for each entry in the list: the duration of each dwelling together with the corresponding current level. For idealization one must decide if a current fluctuation represents a transition to another conductance level or if it is just noise. As a criterion the 50% level criterion is most often employed [57]. To apply this criterion, the possible conductance levels are estimated first by the fitting of the amplitude histogram (see above). In addition, events are only accepted if they are of a minimal length. For a reliable assignment of the transitions data usually have to be filtered more than for the hidden Markov approach, at least if the signal to noise level is low. The basic problem in the idealization process is that short events are easily missed (the missed events problem) but short events can also be artifactually introduced if noise is excessive. The problem is double: missing a short closure not only leads to the loss of a closed event but it also lengthens the opening time of the event during which the closure occurred. Similarly, the artifactual introduction of a short closure not only alters the closed times but also shortens, and divides into two the underlying opening. To deal with this problem in generality is not easy. First, a reasonable cut-off is defined such that no events shorter than this are rigorously accepted and the resulting error in the final fitting procedure can then be compensated [56, 61]. For simplicity, we ignore here the missed events problem. The idealized trace can then be analyzed in several ways. A first step is to construct and inspect dwell-time histograms. Several kinds of binning procedures and histograms to construct have been proposed (see for example [62]). In Fig. 11 so-called cumulative dwell time histograms for the two conductance levels are displayed. These histograms can then be fitted with the sum exponential components in order to extract kinetic information from the single channel data. The construction and fitting of histograms is always a first step in data analysis if little is known about the underlying channel and if one wants to obtain an impression about its kinetic behavior, like for example an estimate of the number of open and closed states. Histogram fitting may also provide a purely empirical set of parameters whose variation under the influence of ligands or mutation can be
5 Conclusion
Fig. 11 Dwell-time analysis. In A a short stretch of a simulated trace of a two-state scheme is shown. In B the corresponding idealized trace is displayed. The cumulative dwell-time histograms shown in C and D are based on a total of 408 events each and represent the relative frequency of events to be longer than a given duration. By def-
inition the cumulative distribution equals 1 for a duration of 0. A single exponential dwell-time distribution is represented by a straight line in the logarithmic scaling of Fig. 11. Other representations of dwell-time histograms are probably more common [62] but require a much larger number of events for a satisfactory graphical display.
studied. If a good working hypothesis for a kinetic Markov model for the channel is available it may instead be a good idea to fit directly the likelihood of the idealized channel trace. This approach is similar to the hidden Markov approach described above in that the full kinetic information including possible correlations are exploited. The maximum-likelihood fitting overcomes one of the biggest problems of histogram fitting: it is not clear how the time constants and coefficients extracted from closed and open time histograms have to be weighted in fitting a concrete Markov scheme. In the maximum-likelihood approach the rate constants defining the Markov scheme are directly optimized [56, 61]. Recordings obtained under different conditions can also be fitted simultaneously. This approach thus allows on the one hand an objective estimate of physical parameters similar to the hidden Markov fitting and on the other hand the results can be easily judged visually by comparing the predictions for all kinds of dwell-time histograms [56].
5 Conclusion
This chapter provides a broad overview of current concepts and methods of analysis of electrophysiological data. Several of the methods are incorporated into the free analysis program written by the author that is available for down-
31
32 Electrophysiological Analysis of Ion Channels
load at http://www.ge.cnr.it/ICB/conti moran pusch/programs-pusch/softwaremik.-htm. The simulation program used to generate several of the figures can also be found there.
Acknowledgements
I thank Armando Carpaneto for critically reading the manuscript. The financial support by Telethon Italy (grant GGP04018) and the Italian Research Ministry (FIRB RBAU01PJMS) is gratefully acknowledged.
References 1 B. HILLE, Ion Channels of Excitable Membranes, Sinauer, Sunderland, MA, 2001. 2 A. L. HODGKIN, A. F. HUXLEY, A quantitative description of membrane current and its application to conduction and excitation in nerve, J. Physiol. 1952, 117, 500–544. 3 E. NEHER, B. SAKMANN, Single-channel currents recorded from membrane of denervated frog muscle fibres, Nature 1976, 260, 799–802. 4 O. P. HAMILL, A. MARTY, E. NEHER, B. SAKMANN, F. J. SIGWORTH, Improved patch-clamp techniques for high-resolution current recording from cells and cell-free membrane patches, Pflugers. Arch. 1981, 391, 85–100. 5 D. A. DOYLE et al., The structure of the potassium channel: molecular basis of K+ conduction and selectivity, Science 1998, 280, 69–77. 6 Y. JIANG et al., The open pore conformation of potassium channels, Nature 2002, 417, 523–526. 7 R. DUTZLER, E. B. CAMPBELL, M. CADENE, B. T. CHAIT, R. MACKINNON, X-ray structure of a ClC chloride channel at 3.0 Å reveals the molecular basis of anion selectivity, Nature 2002, 415, 287–294. 8 S. BERNE` CHE, B. ROUX, A microscopic view of ion conduction through the K+ channel, Proc. Natl. Acad.Sci. U. S. A. 2003, 100, 8644–8648. Epub 2003 Jul 01.
9 W. STU¨ HMER, Electrophysiological recording from Xenopus oocytes, Methods Enzymol. 1992, 207, 319–339. 10 W. STU¨ HMER, Electrophysiological recordings from Xenopus oocytes, Methods Enzymol. 1998, 293, 280–300. 11 M. F. ROMERO, Y. KANAI, H. GUNSHIN, M. A. HEDIGER, Expression cloning using Xenopus laevis oocytes, Methods Enzymol. 1998, 296, 17–52. 12 G. NAGEL, T. SZELLAS, J. R. RIORDAN, T. FRIEDRICH, K. HARTUNG, Non-specific activation of the epithelial sodium channel by the CFTR chloride channel, EMBO Rep. 2001, 2, 249–254. 13 E. STEFANI, L. TORO, E. PEROZO, F. BEZANILLA, Gating of Shaker K+ channels: I. Ionic and gating currents, Biophys. J. 1994, 66, 996–1010. 14 M. E. BARISH, A transient calcium-dependent chloride current in the immature Xenopus oocyte, J. Physiol. 1983, 342, 309–325. 15 C. HARTZELL, I. PUTZIER, J. ARREOLA, Calcium-activated chloride channels, Annu. Rev. Physiol. 2005, 67, 719–758. 16 T. M. MORIARTY et al., Beta gamma subunits of GTP-binding proteins inhibit muscarinic receptor stimulation of phospholipase C, Proc. Natl. Acad. Sci. U. S. A. 1988, 85, 8865–8869. 17 M. PUSCH, K. STEINMEYER, M. C. KOCH, T. J. JENTSCH, Mutations in dominant human myotonia congenita drastically alter the voltage dependence of the CIC-1
References
18
19
20
21
22
23
24
25
26
27
28
29
chloride channel, Neuron 1995, 15, 1455–1463. D. FIRSOV et al., Cell surface expression of the epithelial Na channel and a mutant causing Liddle syndrome: a quantitative approach, Proc. Natl. Acad. Sci. U. S. A. 1996, 93, 15370–15375. N. ZERANGUE, B. SCHWAPPACH, Y. N. JAN, L. Y. JAN, A new ER trafficking signal regulates the subunit stoichiometry of plasma membrane K(ATP) channels, Neuron 1999, 22, 537–548. M. SCHWAKE, M. PUSCH, T. KHARKOVETS, T. J. JENTSCH, Surface expression and single channel properties of KCNQ2/KCNQ3, M-type K+ channels involved in epilepsy, J. Biol. Chem. 2000, 275, 13343–13348. D. W. HILGEMANN, C. C Lu, Giant membrane patches: improvements and applications, Methods Enzymol. 1998, 293, 267–280. D. J. MACONOCHIE, D. E. KNIGHT, A method for making solution changes in the sub-millisecond range at the tip of a patch pipette, Pflugers. Arch. 1989, 414, 589–596. R. J. KAUFMAN, Overview of vector design for mammalian gene expression, Methods. Mol. Biol. 1997, 62, 287–300. G. P. NOLAN, A. R. SHATZMAN, Expression vectors and delivery systems, Curr. Opin. Biotechnol. 1998, 9, 447–450. P. B. BENNETT, H. R. GUTHRIE, Trends in ion channel drug discovery: advances in screening technologies, Trends Biotechnol. 2003, 21, 563–569. W. STU¨ HMER et al., Structural parts involved in activation and inactivation of the sodium channel, Nature 1989, 339, 597–603. A. ACCARDI, M. PUSCH, Fast and slow gating relaxations in the muscle chloride channel CLC-1, J. Gen. Physiol. 2000, 116, 433–444. F. J. SIGWORTH, in Single-channel Recording, B. SAKMANMN, E. NEHER (Eds.), Plenum Press, New York, 1995, pp. 95–127. F. BEZANILLA, C. ARMSTRONG, Inactivation of the sodium channel. I. Sodium current experiments, J. Gen. Physiol. 1977, 70, 549–566.
30 C. SAVIANE, F. CONTI, M. PUSCH, The muscle chloride channel ClC-1 has a double-barreled appearance that is differentially affected in dominant and recessive myotonia, J. Gen. Physiol. 1999, 113, 457–468. 31 M. PUSCH, U. LUDEWIG, A. REHFELDT, T. J. JENTSCH, Gating of the voltage-dependent chloride channel CIC-0 by the permeant anion, Nature 1995, 373, 527–531. 32 T-Y. CHEN, Structure and function of CLC channels, Annu. Rev. Physiol. 2005, 67, 809–839. 33 M. PUSCH, L. BERTORELLO, F. CONTI, Gating and flickery block differentially affected by rubidium in homomeric KCNQ1 and heteromeric KCNQ1/KCNE1 potassium channels, Biophys. J. 2000, 78, 211–226. 34 E. NEHER, Correction for liquid junction potentials in patch clamp experiments, Methods Enzymol. 1992, 207, 123–131. 35 A. M. WOODHULL, Ionic blockage of sodium channels in nerve, J. Gen. Physiol. 1973, 61, 687–708. 36 O. STAUB et al., Regulation of stability and function of the epithelial Na+ channel (ENaC) by ubiquitination, EMBO J. 1997, 16, 6325–6336. 37 M. S. AWAYDA, Specific and nonspecific effects of protein kinase C on the epithelial Na+ channel, J. Gen. Physiol. 2000, 115, 559–570. 38 D. E. GOLDMAN, Potential, impedance, and rectification in membranes, J. Gen. Physiol. 1943, 27, 37–60. 39 J. KRUSEK, Allostery and cooperativity in the interaction of drugs with ionic channel receptors, Physiol. Res. 2004, 53, 569–579. 40 J. Del CASTILLO, B Katz, Interaction at end-plate receptors between different choline derivatives, Proc. R. Soc. London, Ser. B. Biol. Sci. 1957, 146, 369–381. 41 J. D. CLEMENTS, G. L. WESTBROOK, Activation kinetics reveal the number of glutamate and glycine binding sites on the N-methyl-D-aspartate receptor, Neuron 1991, 7, 605–613. 42 J. P. CHANGEUX, S. J Edelstein, Allosteric receptors after 30 years, Neuron 1998, 21, 959–980.
33
34 Electrophysiological Analysis of Ion Channels
43 G. BERECKI et al., HERG Channel (Dys)-function Revealed by Dynamic Action Potential Clamp Technique, Biophys. J. 2005, 88, 566–578. 44 W. N. ZAGOTTA, T. HOSHI, R. W. ALDRICH, Shaker potassium channel gating. III: Evaluation of kinetic models for activation, J. Gen. Physiol. 1994, 103, 321–362. 45 J. ZHENG, F. J. SIGWORTH, Selectivity Changes during Activation of Mutant Shaker Potassium Channels, J. Gen. Physiol. 1997, 110, 101–117. 46 E. D. KOCH, B. M. OLIVERA, H. TERLAU, F. CONTI, The Binding of {kappa}-Conotoxin PVIIA and Fast C-Type Inactivation of Shaker K+ Channels are Mutually Exclusive, Biophys. J. 2004, 86, 191–209. 47 M. PUSCH et al., Mechanism of block of single protopores of the Torpedo chloride channel ClC-0 by 2-(p-chlorophenoxy)butyric acid (CPB), J. Gen. Physiol. 2001, 118, 45–62. 48 F. CONTI, Noise analysis and single-channel recordings, Curr. Top. Membranes Transport 1984, 22, 371–405. 49 S. H. HEINEMANN, F. CONTI, Nonstationary noise analysis and application to patch clamp recordings, Methods Enzymol. 1992, 207, 131–148. 50 C. M. ARMSTRONG, F. BEZANILLA, Currents related to movement of the gating particles of the sodium channels, Nature 1973, 242, 459–461. 51 F. BEZANILLA, The voltage sensor in voltage-dependent ion channels, Physiol. Rev. 2000, 80, 555–592. 52 N. E. SCHOPPA, K. MCCORMACK, M. A. TANOUYE, F. J. SIGWORTH, The size of gating charge in wild-type and mutant Shaker potassium channels, Science 1992, 255, 1712–1715. 53 F. CONTI, W. STU¨ HMER, Quantal charge redistributions accompanying the
54
55
56
57
58
59
60
61
62
structural transitions of sodium channels, Eur. Biophys. J. 1989, 17, 53–59. Single-channel Recording, B. SAKMANN, E. NEHER (Eds.), Plenum Press, New York, 1995. F. QIN, Restoration of single-channel currents using the segmental K-means method based on Hidden Markov Modeling, Biophys. J. 2004, 86, 1488–1501. D. COLQUHOUN, C. J. HATTON, A. G. HAWKES, The quality of maximum likelihood estimates of ion channel rate constants, J. Physiol. (London) 2003, 547, 699–728. D. COLQUHOUN, F. J. SIGWORTH, in Single-channel Recording, B. SAKMANN, E. NEHER (Eds.), Plenum Press, New York, 1995, pp. 483–585. S. H. CHUNG, J. B. MOORE, L. G. XIA, L. S. PREMKUMAR, P. W. GAGE, Characterization of single channel currents using digital signal processing techniques based on Hidden Markov Models, Philos. Trans. R. Soc. London, Ser. B Biol. Sci. 1990, 329, 265–285. F. QIN, A. AUERBACH, F. SACHS, Hidden Markov modeling for single channel kinetics with filtering and correlated noise, Biophys. J. 2000, 79, 1928–1944. L. VENKATARAMANAN, F. J. SIGWORTH, Applying hidden Markov models to the analysis of single ion channel activity, Biophys. J. 2002, 82, 1930–1942. F. QIN, A. AUERBACH, F. SACHS, Estimating single-channel kinetic parameters from idealized patch-clamp data containing missed events, Biophys. J. 1996, 70, 264–280. F. J. SIGWORTH, S. M. SINE, Data transformations for improved display and fitting of single-channel dwell time histograms, Biophys. J. 1987, 52, 1047–1054.
1
Ion Channel Assays Using Fluorescent Probes Jes´us E. Gonz´alez, and Jennings Worley III and Fredrick Van Goor Vertex Pharmaceuticals, Inc., San Diego, USA
Originally published in: Expression and Analysis of Recombinant Ions Channels. Edited by Jeffrey J. Clare and Derek J. Trezise. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31209-2
1 Introduction
Ion channels are critical for physiological signaling and are the targets of several drugs. Most of these drugs were discovered using in vivo pharmacology models or directly from observation in humans and only later was it determined that their mechanism of action involved modulation of ion channel activity. Ion channels are widely recognized as “druggable” targets due to their modulation by a wide diversity of small molecules. Despite their promise, these targets have historically been difficult to pursue because of limited structural information, low expression levels, requirement of a membrane environment for proper folding and pharmacology, and limitations or absence of high-throughput screening methods. In the last 15 years drug discovery approaches have increasingly relied on high throughput methods to profile a larger number of candidate compounds in increasing numbers of in vitro assays aimed at providing insight into which molecules will be more likely to be efficacious, safe, and able to be administered in humans. Assays for primary targets, safety counter-screens, chemical properties, and in vitro metabolism are key components of modern discovery processes and approaches. Ion channel assays play as an important role and are particularly challenging as they generally require cellular expression. Fluorescence readouts are commonly use for cell-based assays because of their high sensitivity, compatibility with readily available instrumentation and microtiter plates, and the availability of a variety of probes and reagents [1]. The application of physiological indicators of intracellular calcium and membrane potential has made possible functional fluorescence based ion channels assays with the requisite throughput, sensitivity, and reliability required for large scale profiling. In this chapter we will review the range of fluorescence probes, approaches and concepts Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Ion Channel Assays Using Fluorescent Probes
for assaying activity for sodium, calcium, potassium, and chloride channels, including those activated and regulated by ligands and voltage. Within each class of channels the utility and challenges will be discussed. Where appropriate, relevant reviews will be cited and illustrative examples will be given. Finally, we will provide examples and describe areas of where novel fluorescence assays are being developed, with an emphasis on drug discovery applications and new approaches for measuring ion channel trafficking.
2 Membrane Potential Probes
Fluorescent probes of cellular membrane potential were initially developed as physiological indicators and are divided into three major categories. The first includes fast electrochromic probes, such as Di-4-ANEPPS, that are capable of detecting microsecond voltage changes yet have low voltage sensitivity (∼1–10% F/F o per 100 mV) [2]. The second class is comprised of environment-sensitive dyes that distribute between cellular sites and extracellular solution according to the membrane potential. These probes have good sensitivity (1–100% F/F o per 100 mV) yet relatively slow time resolution, requiring minutes to attain maximal fluorescence response [2]. In the last five years new redistribution dyes have been introduced by Molecular Devices that respond substantially faster, in tens of seconds instead of minutes [3, 4]. The third category involves probes based on fluorescence resonance energy transfer (FRET) readout of rapid transmembrane translocation of fluorescent hydrophobic ions [5, 6]. These probes have good sensitivity (20–150% F/F o per 100 mV) and respond much more rapidly than redistribution dyes since the temporal response results from facile transmembrane redistribution of a mobile voltage-sensitive ionic dye and not slow diffusion across multiple membrane/water interfaces. Ion channels in cells throughout the body open and close in microto milliseconds to produce rapid, < 5 s, changes in membrane potential which are important for cell signaling and are not, in many cases, readily detected with slow redistribution dyes. For this reason most applications requiring the highest temporal resolution, such as neuronal signaling, require electrochromic or FRET dyes. Redistribution and FRET probes are generally used for most ion channel drug discovery applications because of their relatively high sensitivity and ease of use. 2.1 Redistribution Probes
Fluorescent redistribution probes are environment-sensitive charged dyes that equilibrate between intracellular hydrophobic sites and extracellular solution according to membrane potential. Fig. 1A is a schematic illustrating the redistribution mechanism and Fig. 1B shows a fluorescence micrograph of cells stained with the oxonol DiSBAC4 (3). Generally, the fluorescence quantum yield is high when the probe binds to hydrophobic sites such as proteins and membranes and very low when
2 Membrane Potential Probes
Fig. 1 Oxonol probes that operate via redistribution. (A) Schematic of voltage-sensitive redistribution mechanism. Negativelycharged oxonol molecules (small circles) distribute into a cell (large circles) according to the membrane potential V m . The dye fluorescence is greatly enhanced, represented by the small filled circles, when bound to hydrophobic intracellular protein and membrane sites. At less negative po-
tentials, represented on the right panel, more dye accumulates in the cell and the total fluorescence increases. (B) A fluoresence micrograph of CHO-K1 cells stained with DiSBAC4 (3) showing extensive intracellular staining. The extracellular solution is relatively dark because the quantum efficiency in aqueous solution is neglible. (C) Structures of bis-barbiturate trimethine oxonols.
in aqueous solution. Consequently positively charged probes such as cyanines and rhodamines result in bright fluorescence at negative membrane potentials and are relatively dim at less negative or positive potentials. The opposite is true for negatively charged probes such as oxonol dyes, which make it more challenging to detect significant oxonol fluorescence from cells that hyperpolarize to very negative potentials. Anions translocate across the membranes and into cells more easily than cations because of the dipole that originates from the lipid carbonyl groups. The rate difference due to charge can be orders of magnitude, as has been shown for the isostructural borates and phosphonium ions [7]. This is an important reason why negatively charged probes, such as oxonols, are routinely used and positively charge probes are not. The structures of commonly used bisbarbiturate oxonol dyes are shown in Fig. 1C. One of the first fluorescent assays specifically designed for identifying ion channel modulators used the redistribution oxonol probe DiBAC4 (3) to develop a higher throughput assay for identifying openers of KATP channels [8]. This work played a major role in the development of the fluorometric imaging plate reader (FLIPR) [9]. DiBAC4 (3)’s high sensitivity to temperature and excitation and emission properties were key considerations in the original FLIPR design specifications. Key issues with DiBAC4 (3) are temperature sensitivity, slow temporal response, interference from fluorescent compounds and relatively high false positive rate. Even at 37 ◦ C, the time response often requires 15 min or more to reach the maximal fluorescence
3
4 Ion Channel Assays Using Fluorescent Probes
change. Despite these limitations, DiBAC4 (3) has been broadly applied and has also been used with a microfluidic device, in combination with a cationic dye, to measure ion channel activity in human T lymphocytes [10]. Recently Molecular Devices have introduced a new FLIPR membrane potential (FMP) kit that includes a redistribution dye that offers advantages in temporal response and reduced fluorescence interference compared to DiBAC4 (3). The probe has fluorescence properties similar to known thiobarbiturate oxonols and has redistribution kinetics much faster than DiBAC4 (3) with maximal response achieved in approximately 10–15 s. The kit also includes a second quenching dye that reduces extracellular fluorescence and improves signal to background and potentially reduces the effect of some fluorescent compounds. Another FMP kit advantage for HTS applications is that the dyes can be added directly to cells and do not require additional washing steps, streamlining the process and resulting in greater throughput. 2.2 FRET Probes
Membrane potential sensors based on FRET are useful for high-throughput screening (HTS) of ion channel targets [11–13]. They are comprised of two fluorescent components. The first is a highly fluorescent, hydrophobic ion that binds to the plasma membrane and “senses” the transmembrane electric field. The ion sensor rapidly redistributes between two binding sites on opposite sides of the membrane, establishing a Nernstian equilibrium (10-fold concentration ratio for ∼60 mV). In response to a membrane potential change, the hydrophobic ions electrodiffuse across the membrane and establish a new equilibrium corresponding to the new membrane potential. The voltage-dependent redistribution is converted into a ratiometric fluorescent readout with a second fluorescent molecule that binds specifically to one face of the plasma membrane and functions as a FRET partner to the mobile voltage-sensing ion. A schematic of the mechanism is shown in Fig. 2A. A variety of fluorescent membrane-bound molecules have been designed to function as voltage-sensitive FRET partners with different voltage sensitivities, temporal responses, and wavelengths. In the most commonly used configuration, the two fluorescent dye components are a ChloroCoumarin-labeled phospholipid (e.g. CC1-DMPE and CC2-DMPE) and a bis-(1,3-dialkylthiobarbituric acid) trimethine oxonol (DiSBACn (3)), where n corresponds to the number of carbon atoms in the n-alkyl group, shown in Fig. 2B. Both are very bright fluorophores, the properties are shown in Table 1. CC1/2DMPE selectively partitions into the outer leaflet of the plasma membrane and acts as a fixed FRET donor to the mobile voltage-sensitive and negatively charged oxonol acceptor. CC1/2-DMPE does not cross the bilayer because of two negative charges from the coumarin and the phosphate groups. Cells have negative (inside) resting membrane potentials and under these conditions the majority of the negatively charge oxonols populate the relatively positive extracellular leaflet, resulting in efficient FRET (i.e. quenching of the coumarin
2 Membrane Potential Probes
Fig. 2 Voltage-sensitive FRET probes. (A) Schematic of voltage-sensitive FRETmechanism. Fluorescent donor molecules bind selectively to the extracellular leaflet of the plasma membrane, represented by a circle with hatching. Negatively charged acceptor, bold circles, distribute across the plasma membrane according to the membrane potential in a Nernstian manner. At negative resting potentials the acceptors
are predominantly at the extracellular surface and FRET is efficient, as shown on the left panel. Upon depolarization, the transmembrane acceptor equilibrium changes so that more oxonols are at the intracellular side. This causes a decrease in FRET and results in an increase and decrease in donor and acceptor fluorescence, respectively. (B) Structures of FRET donor CC1-DMPE and thiobarbiturate oxonols.
donor and increase in the oxonol acceptor emission). As illustrated in Fig. 2A, depolarization causes translocation of the oxonol to the inner surface of the plasma membrane and an increase in the mean donor and acceptor distance. Because FRET is very sensitive to donor–acceptor distance, this charge movement causes a simultaneous increase and decrease in the CC1/2-DMPE and oxonol fluorescence, respectively. The donor and acceptor fluorescence emission changes are reversed upon repolarization. The oxonol moves reversibly in the membrane with sub-second kinetics, allowing voltage-sensitive FRET to report fast voltage changes. Simultaneous patch-clamp and rapid optical recording have been used to demonstrate the speed, sensitivity and ratiometric nature of voltage-sensitive FRET in cells [5, 6]. The coumarin donor to oxonol acceptor fluorescence emission ratio is independent of the excitation intensity, the number of cells being detected, and the optical path length, providing fewer experimental artifacts compared to single
5
6 Ion Channel Assays Using Fluorescent Probes Table 1 Fluorescence and sensitivity properties of FRET membrane potential probes
Donor CC1/2-DMPE Acceptors DiSBAC2 (3) DiSBAC4 (3) DiSBAC6 (3) DiSBAC2 (5) DiSBAC4 (5) DiSBAC6 (5)
kex (nm)
kem (nm)
Tc (ms)
e (M−1 cm−1 )
Q.Y.
V m sensitivity %R per mV
405
460
na
40 000
1.0
na
540 540 540 640 640 640
560 560 560 660 660 660
500 20 2 50 2 0.40
200 000 200 000 200 000 225 000 225 000 225 000
0.44 0.44 0.44 0.67 0.67 0.67
1–3 0.6–1 0.4–0.8 0.5–2 <0.4 <0.2
wavelength probes. These dyes load well in about 20 min at room temperature and cells can be maintained for hours with the dyes without significantly degrading voltage responses. Analysis of data from a wide variety of cell types gives a sensitivity of about ∼1–3% ratio change per mV for DiSBAC2 (3) over the relevant physiological range. Oxonols differ in their physical, optical, and voltage-sensing properties. DiSBAC2 (3) is more water soluble and is left in the extracellular media during assay. DiSBAC2 (3) is often the first choice of ion channel screening applications because of its ease of loading, assay stability, and sensitivity. With a response time constant of ∼500 ms, a FRET assay using DiSBAC2 (3) as acceptor is 2 orders of magnitude faster than DiBAC4 (3) redistribution assays [14] and 10-fold faster than the FMP probes, making this dye well suited for liquid addition protocols which are often used to trigger membrane potential changes in HTS assays. FRET assays with even higher temporal resolution are possible by using more hydrophobic oxonols, such as DiSBAC6 (3) [5]. These more hydrophobic oxonols require using excipients, for example Pluronic F-127, to assist cellular loading and are fast enough to measure millisecond voltage changes. In the case of hexyl substituted pentamethine oxonol the response time constant was measured at 400 µs [6]. A summary of fluorescence and voltage-sensing properties of various FRET membrane potential probes is given in Table 1. The recent development of rapid electrical stimulation methods has enabled FRET ion channel assays that utilize the millisecond probes (Huang et al., submitted). 2.3 Advantages and Limitations of Membrane Potential Probes
Ion channels pose unique challenges for functional assay development due to their variety and complex kinetics. Direct sensing of membrane potential is an attractive ion channel readout because it is sensitive, generic, and is applied in intact living cells. Fluorescent membrane potential assays can be as, or more, sensitive as other ion channel assays methods including isotopic flux assays and patch clamping
2 Membrane Potential Probes
while being more amenable to high-throughput profiling. Detection sensitivity of ion channel agonists can often be comparable to or better than voltage-clamp in whole cells, depending on channel density and background cellular conductances. Consistent with drug–receptor models where the assay response is nonlinear with receptor occupancy [15], the observed EC50 s for agonist are often shifted to lower concentrations because only a small fraction of the channels have to be activated to elicit sufficient current to change the membrane potential. For antagonist assays, this property works against detection sensitivity and generally causes shifts in IC50 s to higher compound concentrations. Only ∼30 million ions need to pass through channels to change the membrane potential 60 mV in a 50 µm diameter cell [16]. Except for Ca2+ , this small number of ions will not appreciably change the intracellular or extracellular ion concentrations under normal physiological conditions. In the case of a neuronal voltage-activated Na+ channel, only a small number of channels (10s to 100s) need to be opened to cause enough ions to enter into the cell and rapidly depolarize a cell with a high resistance. The approach is generic because changes in membrane potential can be generated by any electrogenic ionic flux across the membrane. Voltage-sensitive fluorescent probes therefore provide a basis for a universal HTS-compatible assay platform for ion channels. Researchers applying voltage-sensitive fluorescent probes must also be aware of some limitations, potential sources of experimental artifacts and important optimization parameters. Selective functional assays for the target of interest require a host cell line with low background currents that are not activated by the stimulation protocol. Occasionally, small currents that seem insignificant in voltage-clamp recording can cause large effects on membrane potential assays. Once sufficient functional expression and stimulation conditions are established, it is important to optimize the cellular dye staining. Generally, staining conditions are selected for maximum dynamic range, signal over background, and response stability. For HTS applications, where reproducibility and efficiency is critical, it is important to select conditions that are not overly sensitive. Another potential issue is drug–dye interactions. The susceptibility of various membrane potential probes to such artifacts has been evaluated and it has been reported that FRET probes and FMP probes have decreased compound-dye issues than DiBAC4 (3)[17]. The relatively high optical interference of DiBAC4 (3) was also reported in a comparison with FMP in hERG [18] and KATP [4] channel assays. Normally these effects are fast and occur immediately after compound addition and do not affect the probes ability to respond to voltage changes. However, if the fluorescence intensities are significantly changed then the apparent voltage changes can be proportionately offset, causing significant error in the measured compound activity. For FRET probes, the ratiometric readout reduces some of these artifacts relative to single intensity dyes, and these effects can be readily identified as a large change in the donor or acceptor intensities compared to control wells in the absence of test compound. These interactions are to a large extent not observed at test compound concentrations below 3 µM, so potent compounds that interfere at higher concentration can usually be cleanly observed at lower concentrations. Finally, cell issues that affect membrane potential as well as most cell-based assays are confluence, health and stability of
7
8 Ion Channel Assays Using Fluorescent Probes
the cells. Sick cells or those cultured under variable conditions often have different functional channel expression and membrane properties that cause increased false positives or negatives and variability. 3 Ion-sensitive Fluorescent Probes 3.1 Calcium Dyes
Calcium dyes have been used extensively for probing Ca2+ signaling in cells and tissue. All currently used exogenous small molecules indicators trace their origin to the seminal work from Roger Tsien and coworkers. These cellular probes are fluorescent Ca2+ chelators based on EGTA that have different excitation and emission wavelengths, ratiometric readouts, and Ca2+ affinities. Fluo3 and its progeny, such as Ca-Green and Fluo4 [19] have been the probes of choice for calcium channel assays using single wavelength excitation, which is typical for most plate readers. Fluo3, which is based on a fluorescein, is conveniently excited with the 488 nm argon ion laser line and its fluorescence can increase 100-fold upon binding Ca2+ (K D = ∼400 nM) [20]. Fura2 has been favored for ratiometric measurements and requires instrumentation capable of dual excitation such as the plate-imaging fluorimeter developed by the former SIBIA Neurosciences and SAIC group [21]. These intracellular indicators are typically loaded into cells as acetoxymethyl (AM) esters which are subsequently hydrolyzed via intracellular esterases to the tetraacid forms which are trapped within cells. The structures of commonly used fluorescent ion indicators are shown in Fig. 3. 3.2 Indicators of Other Ions
Cellular fluorescent indicators of other common ions that permeate through ion channels exist and have been successfully applied in a variety of cells and tissue, however they are not as commonly employed as Ca2+ indicators. This is particularly true for HTS applications that generally require a larger fluorescent change to ensure robustness, and compatibility with screening plate readers. Specific ion channel dependent fluorescence changes are modest because of limited ion selectivity and the difficulty of generating sufficiently large ion concentration changes due to target ion channels. All fluorescent indicators for Na+ and K+ are based on crown ethers, which confer ion selectivity, conjugated with two fluorescent probes. Upon binding to the macrocycle, the fluorescence intensity is enhanced. The sodium, SBFI, and potassium, PBFI, indicators use a benzofuran fluorphore [22]. These probes produce ratiometric changes in their excitation spectra that require similar detection instrumentation as Fura2. Sodium green was developed by Molecular Probes using fluorescein as the fluorophore which enables excitation in visible wavelengths
3 Ion-sensitive Fluorescent Probes
Fig. 3 Structures of fluorescent ion probes.
similar to Fluo3 [23]. A key issue that limits the utility of these probes for sodium channels, as an example, is that the selectivity over K+ is not sufficiently high to eliminate competition with high intracellular K+ . As a result large, >10 mM, intracellular Na+ concentrations are required to elicit significant fluorescence changes. This is not feasible for many channels that inactivate rapidly and/or express at low levels. For K+ channels the intracellular concentrations are essentially fixed, however, the use of thallium as a surrogate in combination with the low affinity Ca2+ indicator BTC has been shown to be capable of large K+ channel dependent signals[24] and is discussed later. Chloride indicators are primarily quinolinium or acridinium dyes that have decreased fluorescence when exposed to higher halide concentrations due to increasing collisional quenching. Different dyes, including SPQ and MQAE, have been applied to monitor CFTR activity [25]. An iodide-sensitive pteridine dye with improve optical properties has been reported [26] and applied to studying CFTR function [27]. An important property of these dyes is that their fluorescence is essentially insensitive to other physiological relevant anions and nitrate. This feature has been used to develop CFTR assays. CFTR expressing cells are loaded with Cl− or I− and a dye such as SPQ. The extracellular Cl− is replaced with nitrate and upon CFTR activation with forskolin intracellular Cl− leaves the cell via CFTR and is exchanged with nitrate, which causes the cellular fluorescence to increase. While routinely used for CFTR these approaches have not been generally used because
9
10 Ion Channel Assays Using Fluorescent Probes Table 2 Fluorescent probes and assays for ion channel classes: the approach most often
used is highlighted in bold Target channel class
Membrane potential
Ion indicators
Fluorescent proteins
Na+
FRET Redistribution
SBFI, Na-green
No
Ca2+
FRET Redistribution
Fluo-3&4, Fura2
K+
FRET Redistribution
Tl+ sensitive probes
Cl−
FRET Redistribution FRET Redistribution
Quenching dyes (e.g. SPQ) If Ca2+ permeable
Ligand-gated
Comments
Fast FRETprobes and E-VIPR use-dependence without gating modifiers Calcium-sensitive Co-expression of GFPs inward rectifier used to assay voltage-dependence No Thallium as a substitute for K+ used for fluorescent flux assay Halide-sensitive Trafficking assays for GFPs CFTR yes TRP nicotinic channels use both Ca2+ and membrane potential dyes
they have limited brightness and signal changes, require the channel to stay open for a long time (like CFTR), and assay development complications that arise from other endogenous conductance pathways for halides and anion substitutes, can also be problematic.
4 Fluorescence Assays for Ion Channels
In the remainder of the chapter, we will describe specific examples of how fluorescence probes have been applied to assay various ion channel sub-families. A large number of these applications are directed toward drug discovery because these assays can be automated and used cost effectively to profile large numbers of compounds. This is currently a necessity because of the inability to design modulators de novo. A summary of available and preferred options for the individual sub-families is listed in Table 2. 4.1 Calcium Channels
The ability to develop functional fluorescent calcium channel assays greatly benefited from the development of fluorescent intracellular calcium probes and
4 Fluorescence Assays for Ion Channels
screening instrumentation, such as the FLIPR [9, 28] and the voltage ion probe reader (VIPR) [12]. Typically these assays take advantage of the ∼105 -fold Ca2+ concentration gradient across the plasma membrane and high sensitivity and selectivity of probes such as Fura2 [29] and Fluo-3 [20]. The channels are typically activated with high K+ or ligand and the probes sensitively detect intracellular concentration changes from ∼50–200 nM basal levels. Since the basal intracellular calcium concentrations levels are so low very few activated channels are required to elicit a substantial fluorescence change. Voltage-gated calcium channels are the targets for established drugs such as dihydropyridines, phenylalkylamines, and benzothiazepines that block L or CaV 1 channels and are used to treat cardiovascular disease, in particular hypertension and angina. The N-type or CaV 2.2 channel is the target for the recently launched drug PrialtTM , which is used to treat chronic morphine resistant pain [30]. Functional calcium channel assays have been developed for most members of this target class. Velicelebi and coworkers have developed functional Ca2+ assays against CaV 2 family channel complexes using stably expressing cell lines [21]. CaV -dependent calcium influx was stimulated with high K+ depolarization and the assay was able to reproduce the known molecular pharmacology differences of peptide toxins against these CaV subtypes. For example, they found that the µ-conotoxin GVIA selectively blocked CaV 2.2 over CaV 2.1 and CaV 2.3. This approach has also been applied to the screening of plant extracts for modulators of neuronal channels [31]. One of the limitations of assays designed to identify blockers for voltage-gated channels is that many blockers are state and voltage dependent. Xia and coworkers co-expressed the inward rectifier Kir2.3 with a CaV 1 or L-type channel complex in order to probe the voltage dependence of block by incubating compound and cells at two extracellular K+ concentrations before applying a maximal high K+ stimulation [32]. They showed that blockers such as nimodopine, which are known to be voltage-dependent blockers with higher affinity at less negative membrane potentials, are more sensitively detected when the cells are K+ -clamped at less negative potentials. In the case of nimodipine, they determined an IC50 of 3 nM at 30 mM external K+ versus 60 nM at 6 mM external K+ . The ability to assay the relative state-dependent activity of channel modulators in a HTS compatible format is a significant improvement over traditional high K+ induced assay protocols. 4.2 Non-voltage-gated Cation Permeable Channels
In addition to voltage-induced channel responses, fluorescent indicators have been widely used to measure activity of cation permeable channels that are activated by voltage-independent mechanisms. These channels can be classified as ligandgated or ligand-activated, a characteristic property used to activate or induce channel activity at a defined time in the assay. For example exogenous application of a ligand to trigger external calcium entry has been essential to implement calcium indicator assays to aid the discovery of small molecules that pharmacologically modulate channel function.
11
12 Ion Channel Assays Using Fluorescent Probes
Fig. 4 hTRPV1 response in 3456 microtiter plate using fluorescent Ca2+ indicator Fluo3/ Fura-red readout. (A) HEK cells stably expressing hTRPV1 were plated in a 3456nanoplate. Cells were loaded with Fluo3AM/Fura-red and capsaicin (200 nM) was added to evoke a calcium transient measured using a fluorescent plate reader. Each
6X6 square, one example is enlarged, represents an 11 point concentration response analysis to a single compound in triplicate as well as capsaicin, positive (capsaicin plus antagonist) and negative (DMSO) controls with N = 1. (B) Concentration–response curve of capsazepine block of capscaicinstimulated hTRPV1 is shown.
One important type of ligand-gated channel is the transient receptor potential (TRP) protein channel family. This class of proteins has gained considerable attention over the past several years as a novel and rapidly expanding channel superfamily. These primarily voltage-independent channels are most notable for their ability to respond to a variety of physical and chemical stimuli, such as temperature, mechanical force, phospholipids, oxidants and many others [33–35]. Because of these properties and where they are expressed, TRP channels have been implicated in a wide range of physiological processes including sensory physiology and immune response and related diseases. The calcium permeable nature of many TRP channels has been exploited to advance the field by assisting the identification and determination of the function of novel family members, for example TRPM3[36], as well as the search for new physiological stimuli and novel small molecules. TRPV1 (vanilloid receptor sub-family) has been defined by its activation by vanilloids such as capsaicin and noxious heat. Figure 4 demonstrates the use of the Fluo3 and Fura-Red calcium indicators to measure hTRPV1 response to capsaicin in a 3456-well nanoplate [37]. Capsazepine, a known TRPV1 antagonist dose-dependently inhibits the fluorescent response and is shown in the right panel. Using a similar assay, Rami and coworkers [38] used Fluo-3 to search for compounds that inhibited the capsaicin induced calcium-dependent fluorescent responses. In this study, they characterized a potent series of compounds that bind to an extracellular site to produce inhibition of temperature-induced activation. In a side-by-side comparison of the responses of the cold responsive channel TRPM8 and TRPV1 Behrendt and coworkers [39] found selective action from a library of
4 Fluorescence Assays for Ion Channels
odorants, providing new tools to help dissect the details of complex sensory signaling pathways that discriminate among different thermo-sensations. Fluorescent calcium indicators have also been used to identify compounds that modulate native channel complexes thought to be related to or comprised by members of the TRP family. Ishikawa et al. [40] examined the ability of small molecules to inhibit induced Fura-2 responses in human Jurkat T-cells to identify a potent inhibitor of Icrac, which was shown to alter immunosuppressant activity in a mouse contact hypersensitivity model. Appendino and colleagues have reported another example of extending studies from bench to preclinical stage. In this study, potent subnanomolar TRPV1 agonists were identified as well as sub-micromolar antagonists from channels exogenously expressed in HEK293 cells using Fluo-3 [41]. These agents were shown to pharmacologically modulate dorsal root ganglion neurons, bladder smooth muscle contraction and significantly altered micturition volume and frequency in an obstructed bladder function model. Thus the use of calcium indicators has greatly facilitated the identification of agents that modify in vivo physiology and are now being tested in clinical trials. In addition to elevating intracellular calcium levels, calcium ion entry can also produce membrane depolarization or hyperpolarization, via calcium-activated K+ channels. The nictoinic receptor, a calcium permeable cation selective channel is an example of a channel that has multiple functional roles. It is activated by acetyl-choline and is important in excitation–contraction coupling as well as many central and peripheral neuronal functions. The cell signaling events that a channel participates in can vary and can lead to complex function. Recently, a number of different cell lines using both calcium and membrane potential probes to measure nicotinic receptor channel activity have been used to dissect and exploit these differences [42]. Data from both formats were complementary but demonstrated distinct differences in ligand-induced channel kinetics. Membrane potential measurements demonstrated greater sensitivity and potency to activation by agonists, while antagonists were slightly less potent as compared to calcium indicator measurements in the same cells. While some differences may be addressed by channel density or assay sensitivity importantly, this represents the ability to use fluorescent indicators to differentiate between channel functions. For voltage-gated calcium channels, participation of these channels in calcium entry versus membrane depolarization is also of interest as both functions have been shown to impact and regulate cellular physiology. Figure 5 compares the measured fluorescence membrane potential versus the fluorescence calcium esponse resulting from applying an electrical field to cells containing a CaV channel. Note the differences in waveform of the calcium and V m signal. The membrane potential has an abrupt large transient change in signal that then reaches a plateau, however, the calcium indicator signal increases slowly building to a sustained level. The cellular property of calcium Ca2+ release (CICR) provides an additional complexity for the application of calcium dyes to the study of channel activity. CICR can greatly amplify the initial calcium entry through the plasma membrane-bound assay target. While providing high sensitivity, CICR can also limit the detection of slowly acting blockers and cause displacement of the dose response relationship relative
13
14 Ion Channel Assays Using Fluorescent Probes
Fig. 5 Membrane potential versus calcium signal. Comparative measurements from CaV -containing cells evoked using electric field stimulation using either an indicator of changes in membrane potential or intracellular calcium ions. HEK cells stably expressing a CaV channel were plated in a
96-well E-VIPR. The stimulation period is indicated. Cells were stained with membrane potential FRETdye pair (top) or loaded with Fluo-3AM and Fura RedAM (bottom). All responses were blocked by the calcium channel blocker Mibefradil (10 µM, not shown).
to the actual compound activity on the target channel. The use of fast membrane potential probes, such as FRET probes, can be designed to provide complementary data that is not dependent on CICR. Using both formats as a measure of CaV activity, differences in pharmacology have been revealed and may be pursued to identify small molecules that differentiate between these CaV functions. 4.3 Sodium Channels
Voltage-gated sodium channels (Nav s) historically have been one of the most difficult ion channel classes for which to develop high-throughput assays. The difficulty has arisen from a combination of challenges including cellular expression, voltage dependence of activation and inactivation, millisecond activation and inactivation kinetics, and voltage and state dependence of many blockers. The use of fluorescence membrane potential probes is the primary approach for this target class because of its high sensitivity to small currents and high-speed readout. Because these channels open and inactivate in milliseconds most assays require the use of gating modifiers such as veratridine and batrachotoxin (BTX) to lock the channel in an open conformation to prolong Na+ influx and cause sufficient current to depolarize cells. The use of toxins in assays is generally not desired because test compounds must compete for these sites and can limit the sensitivity of the
4 Fluorescence Assays for Ion Channels
assays to the detection of competitive binders. Typically cells are incubated in the absence of extracellular Na+ , substituting with organic cations such as tetramethylammonium, choline, or N-methyl-D-glucamine which do not permeate through open NaV channels. With the channel locked open, addition of Na+ results in an Nav -dependent inward current that rapidly depolarizes the cell [12]. Blockers are identified by their ability to inhibit this depolarization. The most commonly used fluorescent membrane potential probes are DiBAC4 (3), FMP and FRET-based voltage sensor probes. Some recent examples of these approaches have been published. Vickery and coworkers have applied FMP dye and NaV activator evoked depolarizations to compare the molecular pharmacology of activators and blockers against rNaV 1.8, rNaV 1.2 and hNaV 1.5 [43]. Using these assays they observed that different activators or gating-modifiers, including veratridine and the type II pyrethroids deltamethrin and venfalerate, had efficacies that were subtype-dependent. They reported that veratridine induced NaV dependent depolarizations in rNaV 1.2 and hNaV 1.5 but was not effective against rNaV 1.8. The type II pyrethroids were most effective in hNaV 1.5 and rNaV 1.8 and were potentiated with veratridine. Of the approximately 15 known blockers examined, all blocked the three subtypes with approximately equal concentrations, except for the neurotoxin TTX. Felix and coworkers have used voltage-sensitive FRET probes to develop assays for NaV 1.7 and NaV 1.5. They have reported that pre-incubation of compounds prior to addition of activators enabled compounds to pre-bind to the native conformation and then subsequently added the activator prior to Na+ addition [44]. They observed that this protocol more accurately reported blocking activities for some classes of compounds. The use of electrical stimulation and high-speed FRET probes has been introduced recently and enables repetitive millisecond stimulation and detection of NaV s in standard 96 and 384 well plates. The electrical stimulation voltage ion probe reader (E-VIPR) eliminates the need for using pharmacological modifiers and enables repetitive stimulation. E-VIPR is very flexible and has been shown to robustly identify and characterize NaV blocker activity, including assessment of use-dependence (Huang et al. submitted). Figure 6 shows E-VIPR concentration–response traces of 5 drugs using an E-VIPR assay and stimulating at 5 Hz. The response includes contributions from both voltage and frequency. At intermediate concentrations use-dependence is detected as differential blockade of the first (tonic) and last (use-dependent) stimulated voltage-transients. TTX, which is not voltage-dependent, does not exhibit as much use-dependence as the local anesthetic drugs. 4.4 Potassium Channels
Potassium channels are deceptively challenging targets for which to develop useful fluorescence assays. The K+ channel family is relatively large and individual members are activated in a wide variety of voltage and nonvoltage mechanisms. While it is straightforward to add high K+ to cells and elicit a depolarization, it is much
15
16 Ion Channel Assays Using Fluorescent Probes
Fig. 6 Activity of use-dependent NaV blockers in E-VIPR membrane potential assay. Concentration–response traces for TTX, tetracaine, amitriptyline, mexiletine, lidocaine and lamotrigine block of hNaV 1.3dependent depolarization transients using a 5 Hz stimulation protocol. The concentrations shown above the first ratio responses, at the left side of the figure, are the highest
concentrations tested. Each blocker was diluted 1 to 4 across the plates, as illustrated by the arrow. Use-dependent block is evident for all compounds except TTX, and is manifested as greater block at the end of the response train compared to the first response. Control responses without blockers are shown on the far right.
more challenging to develop an assay specific to a single target channel. The high and variable K+ channel background conductances make cell line and assay development difficult. The most common approach utilizes membrane potential dyes, although recently, the application of low affinity Ca2+ probes in combination with thallium as a K+ substitute has been used to develop HTS compatible fluorescence assays [24]. Membrane potential assays require the use of cell lines with minimal background conductances and relatively positive resting potentials. For blocker assays, expression of the target K+ channel will set the membrane potential at near its activation potential or near the K+ reversal potential (Krev ), so that block will result in a depolarization (if there are no other significant K+ conductances). The block can be monitored continuously after compound addition or by probing for the reduction on high K+ induced depolarization. For activators or “openers”, the expressed channel is closed at less negative potentials and compound activation causes the cell to hyperpolarize toward the potassium equilibrium potential. This strategy has been employed for identifying and profiling KATP channels openers
4 Fluorescence Assays for Ion Channels
using redistribution dyes such as DiBAC4 in the bladder [8, 45, 46], myocytes [47] and in cells heterologously expressing SUR1 and the inward rectifier Kir6.2 [48]. DiBAC4 has also been used to identify blockers, for example fluoxetine inhibition of Ca-activated K+ channels [49]. Fluorescent probes that directly measure K+ are difficult to apply for monitoring K+ channel-dependent [K+ ] changes because intracellular and extracellular [K+ ] do not change sufficiently under experimentally accessible conditions to elicit a large fluorescence change. Thus, K+ surrogates have been widely used for developing K+ channel assays. Rb86+ is routinely used as a radioactive substitute for K+ in flux assays. Recent developments in the application of atomic spectroscopy detection have led to nonradioactive Rb+ flux assays. Thallium (Tl+ ), another K+ surrogate, can readily enter cells via Na+ /K+ ATPase and Na+ /K+ /Cl+ co-transport mechanisms [50]. Tl+ readily fluxes through K+ channels and is known to quench various water-soluble fluorophores. Its quenching properties have been exploited to probe the activity of acetylcholine receptors [51, 52]. Recently, Weaver and colleagues have identified a coumarin benzothiazole-based low affinity Ca2+ fluorescent probe BTC, structure shown in Fig. 3, that functions as an intracellular Tl+ sensor from a screen of fluorescent metal chelating probes [24]. Using this probe in combination with a Cl− -free assay buffer, which overcomes TlCl solubility limitations, they devised an HTS compatible fluorescent assays for measuring KCNQ2 and SK3 K+ channel modulator activities. Two-fold fluorescence changes were achieved with minimal interference from intracellular Ca2+ and fluorescent test compounds because of weak Ca2+ affinity and the visible excitation wavelengths. Researchers considering using Tl+ should be aware that it is readily absorbed and is known to be toxic [53]. 4.5 Chloride Channels
Extracellular and intracellular ligand-gated Cl− channels regulate membrane potential in excitable and nonexcitable cells and have been identified as drug targets for a number of different conditions and diseases, including neurological disorders and cystic fibrosis. For example, the GABAA receptor is a ligand-gated Cl− channel that is the primary target of many drugs for the treatment of anxiety, epilepsy, muscle spasms, sleep related conditions and is the target for benzodiazepine drugs such as diazepam. Despite GABAA being a well validated drug target, historically it has been difficult to develop fluorescence assays because of desensitization, limitations of Cl− sensitive probes, background conductances, and the desire to identify compounds that modulate the effects of GABA. Recently, FRET membrane potential probes have been successfully applied to develop assays for characterizing GABAA pharmacological agents, including potentiators [54, 55]. This approach has also helped to identify new selective allosteric modulators that can differentiate between beta subunits [56] and to illuminate the structural basis of the selectivity of neurosteriods on the receptor [57]. Cystic fibrosis is due to mutations in the gene encoding the epithelial PKA-gated Cl− channel known as the cystic fibrosis transmembrane conductance regulator
17
18 Ion Channel Assays Using Fluorescent Probes
(CFTR) [58, 59]. The most common mutation is a deletion of phenylalanine at position 508 in the first nucleotide-binding domain of CFTR. This mutation leads to impaired trafficking and gating of CFTR to the apical membrane of respiratory epithelia, resulting in the defective anion transport that underlies the lung disease in patients [60]. Several fluorescent-based assays have been developed to identify and track the structure–activity relationship (SAR) of small molecules that directly target mutant CFTR to restore its defective gating or trafficking [61]. These include the use of fluorescent membrane potential and halide sensors, including geneticallytargeted fluorescent proteins. Voltage-sensitive membrane potential probes offer a sensitive and convenient approach to the indirect monitoring of anion flux through chloride channels in a variety of different cells types, including recombinant cells and primary cell cultures. The excellent throughput, sensitivity, and reproducibility of such assays make them especially useful for tracking SAR to support medicinal chemistry optimization. However, because the membrane potential response is nonlinear, reaching equilibrium at the reversal potential for Cl− , the assay window and ability to monitor potency (Fig. 7) and efficacy (Fig. 8) can be compromised. Two approaches can be used to circumvent this problem. First, the rate of membrane potential change, rather than the absolute magnitude, can be used to monitor Cl− flux. To do this, the temporal response of the dye must be sufficiently fast to accurately track the membrane potential. Second, decreasing channel density or reducing agonist concentrations can reduce the assay sensitivity and expand the linear range. Although this is particularly useful for the identification of highly efficacious compounds, it can potentially reduce the range of potency, limiting SAR evaluation. Another assay approach is to monitor Cl− flux directly using genetically-encoded anion-sensitive fluorescent proteins, such as mutants of yellow fluorescent protein (YFP) [62, 63]. Anion binding shifts the YFP chromophore pK a values and results in quenched fluorescence. Cellular YFP sensor expression enables measurement of transmembrane halide fluxes. Similar to small molecule halide indicators, iodide and nitrate are more sensitively detected and consequently similar anion replacement strategies have been successfully applied to assaying CFTR activity, including high throughput screening [61, 64]. In another application, GABAA -dependent intracellular Cl− changes were measured in cultured hippocampal neurons using a Cl− sensitive YFP mutant fused to Cl− insensitive FRET acceptor protein [65]. The chimeric fluorescent probe provides emission ratio detection that reduces experimental artifacts compared to single intensity indicators. Despite successful applications of fluorescent protein probes for ion channel analysis, their broad use is limited by relatively small fluorescence changes and cellular expression.
5 Assays for Monitoring Channel Trafficking
In addition to using fluorescence-based assays to monitor changes in CFTR gating, they can be used to monitor increased channel density due to compound rescue
5 Assays for Monitoring Channel Trafficking
Fig. 7 CFTR membrane potential assay demonstrates efficacy limitations compared to flux assays: Response to CFTR activation between fluorescent-based and electrophysiological assay formats. To monitor the response to increasing amounts of CFTR activation cells expressing wild-type CFTR were mixed with parental cells in the indicated proportions (% wild-type CFTR). The response to CFTR activation using a maximal concentration of forskolin was monitored in both the fluorescence assay (black circles) and in an electrophysiological assay (red circles) under voltage-clamp control. The response in both assays was normal-
ized to the response using 100% wild-type CFTR-expressing cells. In the fluorescencebased assay, the half maximal response was observed at ∼3% wild-type CFTR and was nonlinear as the concentration of wildtype CFTR expressing cells was increased. In contrast, the half-maximal response in the using chamber assay was reached at ∼60% wild-type CFTR and was linear. These results highlight the nonlinearity of the fluorescence-based assays, which can limit the SAR evaluation of agonist efficacy because the sensitive response saturates with low amounts of CFTR activation.
of protein processing and/or trafficking. A few pharmacological agents have been identified that partially rescue the defective trafficking of mutant CFTR, leading to increased cell surface density. 4-Phenyl butyrate, for example, acts at millimolar concentrations to produce a modest increase in F508-CFTR density in vitro [66], but had limited efficacy in vivo [67]. To enable the detection of compounds that alter protein trafficking, it is necessary to incubate the compounds for several hours prior to monitoring activity to allow for de novo synthesis and insertion into the membrane. This presents a potential issue in that incomplete wash-out of compounds that potentiate channel activity rather than increase its density may also be identified. To separate compounds that rescue the defective processing or trafficking from those that alter channel gating it is necessary to obtain independent confirmation using biochemical assays. For example, the correct processing and trafficking of mutant CFTR can be confirmed by monitoring its maturation via
19
20 Ion Channel Assays Using Fluorescent Probes
Fig. 8
6 Conclusion
passage through the golgi network. Compounds may also act by increasing the gating activity, as well as trafficking to the cell’s surface, further complicating the assessments of channel density in the membrane. To directly monitor changes in the cell surface density and eliminate the impact of channel gating, fluorescence-based assays that monitor protein localization could be used. For example, GFP has been tagged to the cytoplasmic C terminal tail of CFTR and has been successfully used to monitor the cytoplasmic and cell surface protein localization [68]. Although the throughput of such assays is limited, improvements in algorithms designed to separate cytoplasmic from nuclear staining could improve the utility of such assays. In addition to cystic fibrosis, protein trafficking and processing abnormalities of ion channels have been identified as the underlying cause of diseases, such as familial hyperinsulinism (Kir6.2), Sj¨ogren’s syndrome (aquaporin-5), nephrogenic diabetes insipidus (aquaporin-2), Brugada Syndrome (NaV1.5), inherited long QT syndrome (KCNQ1, hERG), and Andersen-Tawil syndrome (Kir2.1). In some cases, channel ligands (i.e, agonists and antagonists) have been demonstrated to partially rescue the defective trafficking and membrane density of mutant channels, including NaV 1.5 (mexiletine) [69], hERG (astemizole, terfenadine) [70], and Kir6.2 (sulfonylureas) [71]. Although a few agents have been identified, fluorescence-based assays similar to those used for mutant CFTR could be used to identify more druglike, potent, and efficacious agents to correct the defective folding and/or trafficking of Na+ and K+ channels.
6 Conclusion
Fluorescence approaches to studying ion channel function and the interactions of modulating molecules offer significant advantages in sensitivity, ease of use, and ← Fig. 8 CFTR membrane potential assay demonstrates the dependence of channel density on agonist sensitivity: Effects of CFTR density on agonist activity in fluorescent membrane potential assays. (A) Theoretical Michaelis-Menten type increase in open probability of CFTR by forskolin (Kd = 5 µM). (B) Simulation of the membrane potential response to CFTR activation by forskolin. The concentration dependent effects on membrane potential were calculated using the Goldman-Hodgkin-Katz equation incorporatingthe open probability at each agonist concentration and CFTR densities of 0.01 to 100 times the background permeability, which was assumed to
be due to an endogenous K+ conductance. Only at very low CFTR densities did the EC50 approximate the Kd for forskolin. (C) To monitor the effects on CFTR density on agonist stimulation in a fluorescence-based membrane potential assay, CFTR-expressing cells were mixed with parental cells at the indicated proportions and expressed as % wild-type (wt) CFTR. As observed in the simulations, only at very low wt-CFTR proportions did the EC50 for forskolin approximate its Kd. These results illustrate that the potency of ion channel agonists in a fluorescence-based ssay is highly sensitive to the channel density.
21
22 Ion Channel Assays Using Fluorescent Probes
high throughput analysis. Existing fluorescent cellular probes provide access to all classes of ion channels. These approaches primarily use membrane potential and ion-sensitive probes, which respond to the net ion flux through the channel. Each has it own assay development challenges and in some cases there are significant limitations on the types of information that can be obtained. Through a combination of improved probes and instrumentation, increasingly higher information content analysis is now possible, for example, fluorescence assays for use-dependent blockade and channel trafficking. Looking forward, additional detail should be attainable through fluorescent proteins and assays that can monitor different channel conformational states and their interactions with other proteins.
References 1 J. E. GONZALEZ, P. A. NEGULESCU, Intracellular detection assays for high-throughput screening, Curr. Opin. Bio-technol. 1998, 9(6), 624–631. 2 L. M. LOEW, Potentiometric membrane dyes, in Fluorescent and Luminescent Probes for Biological Activity, W.T. MASON (Ed.), Academic Press, San Diego, 1993, pp. 150–160. 3 D. F. BAXTER, et al., A novel membrane potential-sensitive fluorescent dye improves cell-based assays for ion channels, J. Biomol. Screen., 2002, 7(1), 79–85. 4 K. L. WHITEAKER, S. M. GOPALAKRISHNAN, D. GROEBE, C. C. SHIEH, U. WARRIOR, D. J. BURNS, M. J. COGHLAN, V. E. SCOTT, M. GOPALAKRISHNAN, Validation of FLIPR membrane potential dye for high throughput screening of potassium channel modulators, J. Biomol. Screen. 2001, (5), 305–312. 5 J. E. GONZALEZ, R. Y. TSIEN, Voltage sensing by fluorescence resonance energy transfer in single cells, Biophys. J. 1995, 69(4), 1272–1280. 6 J. E. GONZALEZ, R. Y. TSIEN, Improved indicators of cell membrane potential that use fluorescence resonance energy transfer, Chem. Biol. 1997, 4(4), 269–277. 7 R. F. FLEWELLING, W. L. HUBBELL, The membrane dipole potential in a total membrane potential model. Applications to hydrophobic ion interactions with membranes, Biophys. J. 1986, 49(2), 541–552.
8 K. O. HOLEVINSKY, Z. FAN, M. FRAME, J. C. MAKIELSKI, V. GROPPI, D. J. NELSON, ATP-sensitive K+ channel opener acts as a potent Cl− channel inhibitor in vascular smooth muscle cells, J. Membr. Biol. 1994, 137(1), 59–70. 9 K. S. SCHROEDER, B. D. NEAGLE, FLIPR: A new instrument for accurate, high throughput optical screening, J. Biomol. Screen. 1996, 1, 75–80. 10 J. FARINAS, A. W. CHOW, H. G. WADA, A microfluidic device for measuring cellular membrane potential, Anal. Biochem. 2001, 295(2), 138–142. 11 J. E. GONZALEZ, M. P. MAHER, Cellular fluorescent indicators and voltage/ion probe reader (VIPR) tools for ion channel and receptor drug discovery, Receptors Channels 2002, 8(5–6), 283–295. 12 J. E. GONZALEZ, K. OADES, Y. LEYCHKIS, A. HAROOTUNIAN, P. A. NEGULESCU, Cell-based assays and instrumentation for screening ion-channel targets, Drug Discov. Today 1999, 4(9), 431–439. 13 M. FALCONER, et al., High-throughput screening for ion channel modulators, J. Biomol. Screen. 2002, 7(5), 460–465. 14 D. E. EPPS, M. L. WOLFE, V. GROPPI, Characterization of the steady-state and dynamic fluorescence properties of the potential-sensitive dye bis-(1,3-dibutylbarbituric acid)trimethine oxonol (Dibac4(3)) in model systems and cells, Chem. Phys. Lipids. 1994, 69(2), 137–150. 15 T. KENAKIN, Stimulus-Response Mechanisms, in Pharmacologic Analysis of
References
16
17
18
19
20
21
22
23
24
25
26
Drug-Receptor Interaction, Lippincott-Raven, Philadelphia, 1997, pp. 78–105. J. KOESTER, Membrane Potential, in Principles of Neural Science, E. R. KANDEL, J. H. SCHWARTZ, T. M. JESSELL (Eds.), Elsevier, New York, 1991, pp. 81–94. C. WOLFF, B. FUKS, P. CHATELAIN, Comparative study of membrane potential-sensitive fluorescent probes and their use in ion channel screening assays, J. Biomol. Screen. 2003, 8(5), 533–543. W. TANG, J. KANG, X. WU, D. RAMPE, L. WANG, H. SHEN, Z. LI, D. DUNNINGTON, T. GARYANTES, Development and evaluation of high throughput functional assay methods for HERG potassium channel, J. Biomol. Screen. 2001, 6(5), 325–331. K. R. GEE, K. A. BROWN, W. N. CHEN, J. BISHOP-STEWART, D. GRAY, I. JOHNSON, Chemical and physiological characterization of fluo-4 Ca(2+ )-indicator dyes, Cell Calcium 2000, 27(2), 97–106. A. MINTA, J. P. KAO, R. Y. TSIEN, Fluorescent indicators for cytosolic calcium based on rhodamine and fluorescein chromophores, J. Biol. Chem. 1989, 264(14), 8171–8178. G. VELICELEBI, K. A. STAUDERMAN, M. A. VARNEY, M. AKONG, S. D. HESS, E. C. JOHNSON, Fluorescence techniques for measuring ion channel activity, Methods Enzymol. 1999, 294, 20–47. A. MINTA, R. Y. TSIEN, Fluorescent indicators for cytosolic sodium, J. Biol. Chem. 1989, 264(32), 19449–19457. G. P. AMORINO, M. H. FOX, Intracellular Na+ measurements using sodium green tetraacetate with flow cytometry, Cytometry 1995, 21(3), 248–256. C. D. WEAVER, D. HARDEN, S. I. DWORETZKY, B. ROBERTSON, R. J. KNOX, A thallium-sensitive, fluorescence-based assay for detecting and characterizing potassium channel modulators in mammalian cells, J. Biomol. Screen. 2004, 9(8), 671–677. M. R. WEST, C. R Molloy, A microplate assay measuring chloride ion channel activity, Anal. Biochem. 1996, 241(1), 51–58. S. JAYARAMAN, L. TEITLER, B. SKALSKI, A. S. VERKMAN, Long-wavelength
27
28
29
30
31
32
33
34
35 36
37
iodide-sensitive fluorescent indicators for measurement of functional CFTR expression in cells, Am. J. Physiol. 1999, 277(5 Pt. 1), C1008–1018. F. MUNKONGE, et al., Measurement of halide efflux from cultured and primary airway epithelial cells using fluorescence indicators, J. Cystic Fibrosis 2004, 3 (Suppl. 2), 171–176. E. SULLIVAN, E. M. TUCKER, I. L. DALE, Measurement of [Ca2+ ] using the Fluorometric Imaging Plate Reader (FLIPR), Methods Mol. Biol. 1999, 114, 125–133. G. GRYNKIEWICZ, M. POENIE, R. Y. TSIEN, A new generation of Ca2+ indicators with greatly improved fluorescence properties, J. Biol. Chem. 1985, 260(6), 3440–3450. G. P. MILJANICH, Ziconotide: neuronal calcium channel blocker for treating severe chronic pain, Curr. Med. Chem. 2004, 11(23), 3029–3040. K. L. ROGERS, W. F. FONG, J. REDBURN, L. R. GRIFFITHS, Fluorescence detection of plant extracts that affect neuronal voltage-gated Ca2+ channels, Eur. J. Pharm. Sci. 2002, 15(4), 321–330. M. XIA, J. P. IMREDY, K. S. KOBLAN, P. BENNETT, T. M. CONNOLLY, State-dependent inhibition of L-type calcium channels: cell-based assay in high-throughput format, Anal. Biochem. 2004, 327(1), 74–81. D. E. CLAPHAM, L. W. RUNNELS, C. STRUBING, The TRP ion channel family, Nat. Rev. Neurosci. 2001, 2(6), 387–396. R. INOUE, TRP channels as a newly emerging non-voltage-gated Ca2+ entry channel superfamily, Curr. Pharm. Des. 2005, 11 (15), 1899–1914. C. MONTELL, The TRP superfamily of cation channels, Sci. STKE 2005, (272): re3. N. LEE, et al., Expression and characterization of human transient receptor potential melastatin 3 (hTRPM3), J. Biol. Chem. 2003, 278(23), 20890–20897. L. MERE, T. BENNETT, P. COASSIN, P. ENGLAND, B. HAMMAN, T. RINK, S. ZIMMERMAN, P. NEGULESCU, Miniaturized FRETassays and microfluidics: key components for ultra-high-throughput screening, Drug Discov. Today 1999, 4(8), 363–369.
23
24 Ion Channel Assays Using Fluorescent Probes
38 H. K. RAMI, et al., Discovery of small molecule antagonists of TRPV1, Bioor. Med. Chem. Lett. 2004, 14(14), 3631–3634. 39 H. J. BEHRENDT, T. GERMANN, C. GILLEN, H. HATT, R. JOSTOCK, Characterization of the mouse cold-menthol receptor TRPM8 and vanilloid receptor type-1 VR1 using a fluorometric imaging plate reader (FLIPR) assay, Br. J. Pharmacol. 2004, 141(4), 737–745. 40 J. ISHIKAWA, K. OHGA, T. YOSHINO, R. TAKEZAWA, A. ICHIKAWA, H. KUBOTA, T. YAMADA, A pyrazole derivative, YM-58483, potently inhibits store-operated sustained Ca2+ influx and IL-2 production in T lymphocytes, J. Immunol. 2003, 170(9), 4441–4449. 41 G. APPENDINO, et al., Development of the first ultra-potent “capsaicinoid” agonist at transient receptor potential vanilloid type 1 (TRPV1) channels and its therapeutic potential, J. Pharmacol. Exp. Ther. 2005, 312(2), 561–570. 42 R. W. FITCH, Y. XIAO, K. J. KELLAR, J. W. DALY, Membrane potential fluorescence: a rapid and highly sensitive assay for nicotinic receptor channel function, Proc. Natl. Acad. Sci. U S A 2003, 100(8), 4909–4914. 43 R. G. VICKERY, et al., Comparison of the pharmacological properties of rat Na(V)1.8 with rat Na(V)1.2a and human Na(V)1.5 voltage-gated sodium channel subtypes using a membrane potential sensitive dye and FLIPR, Receptors Channels 2004, 10(1), 11–23. 44 J. P. FELIX, et al., Functional assay of voltage-gated sodium channels using membrane potential-sensitive dyes, Assay Drug Dev. Technol. 2004, 2(3), 260–268. 45 W. A. CARROLL, et al., Synthesis and structure-activity relationships of a novel series of tricyclic dihydropyridine-based KATP openers that potently inhibit bladder contractions in vitro, J. Med. Chem. 2004, 47(12), 3180–3192. 46 M. GOPALAKRISHNAN, et al., Characterization of the ATP-sensitive potassium channels (KATP) expressed in guinea pig bladder smooth muscle cells, J. Pharmacol. Exp. Ther. 1999, 289(1), 551–558.
47 K. L. WHITEAKER, R. DAVIS-TABER, V. E. SCOTT, M. GOPALAKRISHNAN, Fluorescence-based functional assay for sarco-lemmal ATP-sensitive potassium channel activation in cultured neonatal rat ventricular myocytes, J. Pharmacol. Toxicol. Methods 2001, 46(1), 45–50. 48 M. GOPALAKRISHNAN, et al., Pharmacology of human sulphonylurea receptor SUR1 and inward rectifier K(+) channel Kir6.2 combination expressed in HEK-293 cells, Br. J. Pharmacol. 2000, 129(7), 1323–1332. 49 G. C. TERSTAPPEN, A. PELLACANI, L. ALDEGHERI, F. GRAZIANI, C. CARIGNANI, G. PULA, C. VIRGINIO, The anti-depressant fluoxetine blocks the human small conductance calcium-activated potassium channels SK1, SK2 and SK3, Neurosci. Lett. 2003, 346(1–2), 85–88. 50 T. BRISMAR, S. ANDERSON, V. P. COLLINS, Mechanism of high K+ and Tl+ uptake in cultured human glioma cells, Cell Mol. Neurobiol. 1995, 15(3), 351–360. 51 H. P. MOORE, M.A Raftery, Direct spectroscopic studies of cation translocation by Torpedo acetylcholine receptor on a time scale of physiological relevance, Proc. Natl. Acad. Sci. U S A 1980, 77(8), 4509–4513. 52 W. C. WU, H. P. MOORE, M. A. RAFTERY, Quantitation of cation transport by reconstituted membrane vesicles containing purified acetylcholine receptor, Proc. Natl. Acad. Sci. U S A 1981, 78(2), 775–779. 53 R. S. HOFFMAN, Thallium toxicity and the role of Prussian blue in therapy, Toxicol. Rev. 2003, 22(1), 29–40. 54 C. E. ADKINS, et al., GABA(A) receptors characterized by fluorescence resonance energy transfer-derived measurements of membrane potential, J. Biol. Chem. 2001, 276(42), 38934–38939. 55 A. J. SMITH, P. B. SIMPSON, Methodological approaches for the study of GABA(A) receptor pharmacology and functional responses, Anal. Bioanal. Chem. 2003, 377(5), 843–851. 56 A. J. SMITH, B. OXLEY, S. MALPAS, G. V. PILLAI, P. B. SIMPSON, Compounds exhibiting selective efficacy for different beta subunits of human recombinant gamma-aminobutyric acid A receptors, J.
References
57
58
59
60
61
62
63
64
65
Pharmacol. Exp. Ther. 2004, 311(2), 601–609. G. V. PILLAI, A. J. SMITH, P. A. HUNT, P. B. SIMPSON, Multiple structural features of steroids mediate subtype-selective effects on human alpha4beta3delta GABAA receptors, Biochem. Pharmacol. 2004, 68(5), 819–831. P. M. QUINTON, Missing Cl conductance in cystic fibrosis, Am. J. Physiol. 1986, 251(4 Pt. 1), C649–652. J. R. RIORDAN, et al., Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA, Science 1989, 245(4922), 1066–1073. R. C. BOUCHER, New concepts of the pathogenesis of cystic fibrosis lung disease, Eur. Respir. J. 2004, 23(1), 146–158. H. YANG, et al., Nanomolar affinity small molecule correctors of defective Delta F508-CFTR chloride channel gating, J. Biol. Chem. 2003, 278(37), 35079–35085. R. M. WACHTER, S.J Remington, Sensitivity of the yellow variant of green fluorescent protein to halides and nitrate, Curr. Biol. 1999, 9(17), R628–629. R. M. WACHTER, D. YARBROUGH, K. KALLIO, S. J. REMINGTON, Crystallographic and energetic analysis of binding of selected anions to the yellow variants of green fluorescent protein, J. Mol. Biol. 2000, 301(1), 157–171. A. S. VERKMAN, S. JAYARAMAN, Fluorescent indicator methods to assay functional CFTR expression in cells, Methods Mol. Med. 2002, 70, 187–196. T. KUNER, G. J. AUGUSTINE, A genetically encoded ratiometric indicator for chloride: capturing chloride transients in
66
67
68
69
70
71
cultured hippocampal neurons, Neuron 2000, 27(3), 447–459. R. C. RUBENSTEIN, M. E. EGAN, P. L. ZEITLIN, In vitro pharmacologic restoration of CFTR-mediated chloride transport with sodium 4-phenylbutyrate in cystic fibrosis epithelial cells containing delta F508-CFTR. J. Clin. Invest. 1997, 100(10), 2457–2465. R. C. RUBENSTEIN, P. L. ZEITLIN, A pilot clinical trial of oral sodium 4-phenylbutyrate (Buphenyl) in deltaF508-homozygous cystic fibrosis patients: partial restoration of nasal epithelial CFTR function, Am. J. Respir. Crit. Care Med. 1998, 157(2), 484–490. B. D. MOYER, B.A Stanton, Analysis of CFTR trafficking and polarization using green fluorescent protein and confocal microscopy, Methods Mol. Med. 2002, 70, 217–227. C. R. VALDIVIA, et al., A trafficking defective, Brugada syndrome-causing SCN5A mutation rescued by drugs, Cardiovasc. Res. 2004, 62(1), 53–62. E. FICKER, C. A. OBEJERO-PAZ, S. ZHAO, A. M. BROWN, The binding site for channel blockers that rescue misprocessed human long QTsyndrome type 2 ether-a-gogo-related gene (HERG) mutations, J. Biol. Chem. 2002, 277(7), 4989–4998. F. YAN, C. W. LIN, E. WEISIGER, E. A. CARTIER, G. TASCHENBERGER, S. L. SHYNG, Sulfonylureas correct trafficking defects of ATP-sensitive potassium channels caused by mutations in the sulfonylurea receptor, J. Biol. Chem. 2004, 279(12), 11096–11105.
25
1
Enzyme Activity Fingerprinting Methods for Hydrolases Johann Grognux and Jean-Louis Reymond University of Berne, Berne, Switzerland
Originally published in: Enzyme Assays. Edited by Jean-Louis Reymond. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52731095-1
1 Introduction
Enzyme activity fingerprinting consists in measuring the activity of a single enzyme under a series of conditions, in particular a series of different substrates (Figure 1) [1]. The data set resulting from this high-content assay can be defined as a fingerprint if the measurement is easily repeatable and reproducible, which calls for practical and simple procedures. Activity fingerprinting is possible on the basis of any enzyme assay adaptable to a range of different substrates. All data points should be measured simultaneously using the same enzyme solution. An enzyme activity fingerprint may provide sufficient information to identify a specific enzyme by its activity and distinguish it from other closely related enzymes. This functional identification might serve as a tool for diagnostic and quality control of enzyme-containing samples [2]. The data points in an enzyme fingerprint can be used to compose an image by associating each data point with a pixel. The resolution of this image depends on the number of pixels and on their accuracy, and its meaning is given by the association of each pixel with a particular enzyme substrate. The fingerprint image conveys information about the enzyme’s reactivity, much as the set of atom coordinates or their representation conveys information about an enzyme’s structure (Figure 2). Most importantly, one cannot obtain one from the other because our current theoretical understanding of enzyme structure–activity relationships has only very limited predictive value. Indeed predicting the detailed effect of a structural modification in either enzyme (mutations) or substrate (chemical analog) on the reactivity of the enzyme–substrate pair is almost impossible. The complexity of the problem is illustrated by the fact that a single mutation in an enzyme may have only a barely detectable influence on its structure, but might change its reactivity Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Enzyme Activity Fingerprinting Methods for Hydrolases
Fig. 1 Principles of enzyme fingerprinting.
Fig. 2 Images of protein structure (upper) and activity fingerprints (lower) for two microbial lipases. The 3D structures were generated from Protein Database (PDB) data. The activity fingerprints are taken with the
polyol acetate substrate array in Figure 14. Both pairs of images are different. Note that the functional information in the fingerprint cannot be inferred from the protein structure.
1 Introduction 3
and selectivity completely. Similarly, minor structural changes in a substrate can completely change its reactivity towards a given enzyme. Enzyme assays based on microarrays are large-scale profiling experiments similar to fingerprinting. For example peptide arrays prepared by SPOT synthesis are used to profile the activity of proteases, kinases, and phosphatases, which serves to define their substrate specificity and identify their biological function by defining the optimal substrate [3]. These experiments use arrays of thousands of substrates derived from combinatorial or automated parallel synthesis. Positional scanning libraries form the basis of the analysis of protease substrate specificity. However such experiments are often not easy to reproduce due to the complexity of the reagents and size of the data sets produced. The present chapter reviews fingerprinting methods based on small sets of enzyme substrates, where emphasis is placed on reproducibility and simplicity of the measurements. The aim is to describe a photographic method of taking pictures of reactivity landscapes of enzymes, and using these pictures to analyze function. The principle of enzyme fingerprinting derives from microorganism activity fingerprinting, which is used to as a taxonomic tool in microbiology and in medical diagnostic applications [4]. Enzyme activity fingerprints may be used in the area of quality control and medical diagnostics, where rapid functional identification of enzymes may be needed as an indicator. Enzyme fingerprinting also implicitly addresses the fundamental question of catalytic promiscuity: provided that a pair of enzymes displays identical fingerprints across a series of substrates, are these two enzymes actually identical from the functional point of view, to the degree that their entire reactivity profile against any substrate will be identical? 1.1 One Enzyme – One Substrate
In the traditional view of biochemistry, each enzyme is associated with catalysis on a specific substrate, and this holds particularly true for enzymes of metabolic cycles which process single intermediates with high specificity. A large number of enzyme assays are enzyme-coupled assays, whereby the reaction product of the enzyme to be detected is subsequently converted by secondary enzymes, which are part of the detection reagent, to ultimately produce a detectable signal. Most enzyme-coupled assays relay product processing to a redox enzyme interconverting NAD+ and NADH or NADP+ and NADPH. The latter reaction can be followed spectrophotometrically at 340 nm, a wavelength compatible with almost any type of container, including polystyrene microtiter plates. An example of an enzymecoupled assay relevant in the context of enzyme fingerprinting is the enzymecoupled detection system for acetic acid, in which a series of three highly specific enzymes are used (Figure 3). First, acetyl-CoA synthase (ACS) activates acetic acid to acetyl-CoA with ATP. Second, acetyl-CoA is condensed with oxaloacetate to form citrate by citrate synthase (CS). Finally, oxaloacetate is produced from L-malate by L-malate dehydrogenase (L-MDH). Each of these three enzymes is highly specific
4 Enzyme Activity Fingerprinting Methods for Hydrolases
Fig. 3 Enzyme-coupled detection of acetic acid via NAD/NADH at 340 nm allows lipase activities to be monitored. The lipase may accept a variety of substrates, while the detection enzymes ACS (acetylCoA synthase), CS (citrate synthase) and l-MDH (l-malate dehydrogenase) are very specific for their substrates.
for its substrate. This assay system for acetic acid was used by Bornscheuer et al. to follow lipases and esterases activities on various substrates [5], in particular enantiomeric pairs and could form the basis for fingerprinting experiments for lipases. Reference assays associating a single substrate with an enzyme serve to define activity units. Unit measurements are carried out under defined conditions of medium, temperature, and substrate concentration. One unit generally corresponds to 1 µmol of product formed per minute under specified assay conditions. It is recommended to use M s−1 under standard conditions [6]. Activity units are used to compare different enzyme samples, for example during isolation and purification. All commercial enzymes are sold together with an indication of activity in units per milligram protein, whereby the reference assay substrate and conditions are described. The activity units in many lipases and esterases are defined with reference to the hydrolysis of chromogenic nitrophenyl esters, and those of proteases by reference to the hydrolysis of amino acid nitroanilides, which are also chromogenic. Alcohol dehydrogenases are measured by spectrophotometric recording of the reduction of the cofactor NAD(P)+ to NAD(P)H in the presence of a reference alcohol substrate, for example ethanol for horse liver alcohol dehydrogenase. Glycosidases units refer to the hydrolysis of chromogenic nitrophenyl glycosides as substrates, for example β-D-nitrophenyl-galactoside for β-galactosidase. For many enzymes the reference assay defining units is performed under this enzyme’s optimal conditions and sometimes using the only accepted substrate. In this situation the unit determination provides a fair estimate of the enzyme’s catalytic power in an absolute sense. However the application of enzymes to chemical synthesis has shown that many enzymes accept a wide variety of substrates and may operate under very different conditions (pH, solvent, temperature), each of which corresponds to a certain activity and selectivity pattern of the enzyme. For such enzymes, which include many hydrolytic enzymes, activity units may not reflect the enzyme’s performance. This is the case, for example, for lipases as discussed above, because their reference nitrophenyl ester substrates are often quite poor substrates compared with their natural triglyceride substrates.
1 Introduction 5
1.2 Enzyme Activity Profiles
Enzymes are intrinsically complex catalysts, as evidenced by their folded threedimensional structure. The smallest enzymes are in the range of 20 kDa, which corresponds to 150 amino acids. The polypeptide is always much larger than the catalytic site itself and imparts essential functions in addition to catalysis, including stability, allosteric and biochemical regulation, and cellular localization. Primary sequences often allow a gene product to be assigned to a given enzyme class due to the occurrence of conserved catalytic site arrangements in certain enzymes. For example lipases almost always display a catalytic triad involving a nucleophilic serine within a G-X-S-X-G conserved motif often located in a buried type II turn [7]. Three-dimensional structures may also allow a partial functional assignment classification. For example, the occurrence of an eight-stranded β-barrel (TIM barrel) is often indicative of enzymatic activities. However structural data do not provide detailed information on an enzyme’s catalytic behavior, which must be measured directly. The description of an enzyme’s catalytic function from the chemical standpoint includes activity changes in response to changing reaction conditions (temperature, pH, medium) and substrate structure (chemo- and stereospecificity). These data can be defined as the activity profile of the enzyme, which thus consists in a series of multidimensional data sets. The simplest form of activity profile consists of parametric profiles using a single substrate, which are used to extract mechanistic information. The substrate/concentration profile can lead to the so-called Michaelis–Menten parameters of K m and kcat following a simple kinetic model [8]. Most enzymes also display a specific pH/rate profile which provides information about the catalytic mechanism, for example the role of aspartate and glutamate residues in acidic xylanases [9]. pH/rate profiles often indicate which reaction step is rate-limiting, for example in protein tyrosine phosphatases [10]. Temperature profiles can be interpreted in terms of enzyme denaturation (melting) [11], temperature-dependent activity [12], or using the Arrhenius equation to determine activation parameters. Arrhenius plots for enzymic rates may be nonlinear and their interpretation quite problematic [13], in particular for membrane-bound enzymes [14], this due in part to the temperature dependence of substrate affinity [15]. The most significant activity profiles concern variations in substrate structure, and this will be the main topic of this chapter. Substrate profiles, if recorded in simple and reproducible manner, may provide entirely specific and reproducible patterns that can be used to identify an enzyme or enzyme containing sample, the functional equivalent of a fingerprint. Substrate profiling allows the determination of the substrate preference of an enzyme, which is of practical interest for biocatalysis applications. In the area of biochemistry, the determination of preferred substrates helps to elucidate the cellular function of enzymes, in particular in the case of proteases, kinases, and phosphatases through the identification of the target peptide sequences of each enzyme.
6 Enzyme Activity Fingerprinting Methods for Hydrolases
1.3 The APIZYM System for Microbial Strain Identification
The idea of using a substrate profile as a fingerprint for sample identification dates back to the 1960s, with the invention of the APIZYM system by Bussiere ` et al. as a tool for microorganism classification and identification. The system, first named Auxotab, involved an array of 16 different enzyme substrates [16], and included chromogenic substrates for lipases and esterases, aminopeptidases, chymotrypsin, trypsin, phosphatases, sulfatases, and β-galactosidases. The assays were formulated as a combined set on a filter-paper format, and carried out in parallel on crude microorganism cultures to assess the presence/absence of the corresponding enzymes. The APIZYM system was then developed from this discovery by Daniel Monget at Biomerieux and commercialized [17], and became a popular tool in microbiology. This system uses a series of 19 enzyme assays characteristic of 19 enzyme activities and one blank as reference. It has been used broadly to characterize microorganisms (Table 1) [18]. The apparent activities are classified qualitatively from enzymic tests using labels such as α for a strong activity and β for a weak
Table 1 Enzymes and substrates assayed by the parallel assay APIZYM system
No. Enzyme
Substrate
pH Colora
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
None 2-Naphthyl phosphate 2-Naphthyl butyrate 2-Naphthyl caprylate 2-Naphthyl myristate L-Leucyl 2-naphthylamide L-Valyl-2-naphthylamide L-Cystyl-2-naphthylamide N-Benzoyl-DL-arginine 2-naphthylamide N-Benzoyl-DL-phenylalanine 2-naphthylamide 2-Naphthyl phosphate Naphthol AS bis-phosphodiamide 6-Bromo-2-naphthyl-α-D-galactopyranoside 2-Naphthyl-β-D-galactopyranoside Naphthol AS bis-β-D-glucuronide 2-Naphthyl-α-D-glucopyranoside 6-Bromo-2-naphthyl-β-D-glucopyranoside 1-Naphthyl-N-acetylyS-d-glucosaminide 6-Bromo-2-naphthyl-α-D-mannopyranoside 2-Naphthyl-α-L-fucoside
8.5 7.1 7.1 7.1 7.5 7.5 7.5 8.5 7.1 5.4 5.4 5.4 5.4 5.4 5.4 5.4 5.4 5.4 5.4
None Alkaline phosphatase Esterase C4 Lipase C8 Lipase C14 Leucine aminopeptidase Valine aminopeptidase Cystine aminopeptidase Trypsin Chymotrypsin Acid phosphatase Phosphoamidase α-Galactosidase β-Galactosidase β-Glucuronidase α-Glucosidase β-Glucosidase N-Acetyl-β-glucosaminidase α-Mannosidase α-Fucosidase
b)
purple purple purple purple orange orange orange orange purple purple blue purple purple blue purple purple brown purple purple
A fibrous material (filter paper) is impregnated with substrate added as an alcohol solution, then with a pH stabilizer (Tris-HCl > pH 7.0, or Tris-maleate), and then put in contact for 2–4 h at 37 ◦ C with the biological sample to be analyzed. a Coloration observed after reaction with a solution of Fast Blue BB (N-(4-amino-2,5-diethoxyphenyl)benzamide) in 25% aqueous Tris-HCl containing 10% weight of lauryl sulfate. b Control with biological sample only.
1 Introduction 7
Fig. 4 Principle of PHENOZYM microorganism profiling. The cocktail contains 16 labeled enzyme substrates (see Table 2). Data processing involves determination of percentage conversion of each substrate in the positive (culture) and negative (medium alone) assay.
activity, and interpreted in terms of the presence/absence of the corresponding enzymes [19]. Newer, more exhaustive variations on this theme have been developed. The current format includes 32 different assays and comes in miniaturized, preformated plates [20]. The APIZYM system demonstrates that enzyme activity data allow a functional classification of microorganisms. A more compact version of the APIZYM can be realized in a single assay using a cocktail containing different enzyme substrates, called PHENOZYM (Figure 4 and Table 2) [21]. This format is applicable to extremophilic microorganisms under their optimal growth conditions. Labeled substrates for 16 different enzyme types are combined in this cocktail. The assay involves a single reaction followed by determination of substrate consumption by high-performance liquid chromatography (HPLC) analysis. This allows a rapid identification of multiple enzyme activities, and is compatible with a diversity of growth media and reaction conditions (pH, temperature), including those for thermophilic microorganisms. The functional profiles can be used for a functional classification of the different microbial strains,
8 Enzyme Activity Fingerprinting Methods for Hydrolases Table 2 The PHENOZYM cocktail for profiling thermophilic microorganisms
No. Enzyme 1 2 3 4 5 6 7 8
Phosphatase Amylase β-Galactosidase α-Galactosidase β-Glucosidase β-Glucuronidase N-Acetyl-β-glucosaminidase α-Glucosidase
9 10
α-Mannosidase α-Fucosidase
11 12 13 14 15 16
Valine aminopeptidase Leucine aminopeptidase Chymotrypsin Trypsin Esterase C4 Lipase C8
Substrate
tR
4-NP-phosphate 4-NP-α-D-hexa-(1,4)-glucopyranoside) 4-NP-β-D-galactopyranoside 4-NP-α-D-galactopyranoside 4-NP-β-D-glucopyranoside 4-NP-β-D-glucuronide 4-NP-N-acetyl-β-D-glucosaminide 4-MU-α-D-glucopyranoside 4-Methylumbelliferone 4-MU-α-D-mannopyranoside 4-NP-α-L-fucopyranoside 4-Nitrophenol 4-Nitroaniline L-Valine-paranitroanilide L-Leucine-paranitroanilide L-Phenylalanine-4-nitroanilide N-α-Benzoyl-α-L-arginine-4-nitroanilide 5-(4-Nitrophenoxy)-2-hydroxy-pentyl butanoate 3-(Umbelliferyl)-2-methyl-2-hydroxypropyl octanoate
3.94 7.49 8.98 9.50 10.41 11.53 12.39 13.13 17.38 15.49 19.83 20.12 20.35 23.44 24.83 26.12 27.33 40.23 43.85
a
Analysis conditions: Vydac 218TP54 RP-C18 column, 0.4×22 cm, elution 1.5 mL min−1 , gradient water-acetonitrile +0.1% TFA, detection by UV at 300 nm. b NP: nitrophenyl; MV: methylumbellifenyl. c This substrate gave mostly nitrophenol. Intermediates corresponding to hydrolysis between the glucosyl units were also detected in small amounts. d The 1,2-diol hydrolysis products were not detected and are apparently further degraded under the assay conditions.
resulting in a “phylo-enzymatic” tree consistent with the known phylogenetic classification of the strains. 2 Hydrolase Fingerprinting
The APIZYM and PHENOZYM analyses identify a microorganism on the basis of the enzyme panel it expresses. The principle of enzyme fingerprinting also extends to identifying single enzymes on the basis of their activity profiles against a series of substrates. Substrates for fingerprinting should be sufficiently reactive to return an activity with most enzymes within an enzyme class, yet show differential reactivity to distinguish between different enzymes. The activity detection should also be simple and reproducible to facilitate analysis. 2.1 Fingerprinting with Fluorogenic and Chromogenic Substrates
Fluorogenic and chromogenic substrates provide a very direct, selective, and flexible tool to design enzyme-specific assays. Optimally such substrates react only in
2 Hydrolase Fingerprinting 9
Fig. 5 Fluorogenic and chromogenic substrates using secondary $elimination. The bonds cleaved by the primary enzyme being assayed are marked as wobble lines.
contact with the targeted enzyme or enzyme class, while being unreactive otherwise. The indirect release of a fluorescent or colored phenolate (X− ) by β-elimination from a ketone or aldehyde intermediate provides a general strategy for fluorogenic and chromogenic substrates for a broad variety of enzymes (Figure 5). This principle is suitable for the assay of alcohol dehydrogenases (1) [22], and to measure retro-aldolization in catalytic antibodies (e.g. 2) [23] and transaldolases (e.g. 3) [24]. Similar catalytic antibody assays have been reported, releasing either resorufin from 4 [25], or bromonaphthol from 5 [26]. In the latter case reaction of the released bromonaphthol with brilliant blue forms an insoluble blue precipitate suitable for direct detection in agar plates. Assays using a similar β-elimination strategy to release a fluorescent or colored phenolate have been reported for lipases (6) [27], transketolase (7) [28], Baeyer-Villiger monooxygenases (8) [29], β-lactamase (9) [30], and a fluorescent probe for NADPH and NADH (10) [31]. Indirect release of umbelliferone (7hydroxycoumarin) is also possible via an intermediate hemi-acetal (Figure 6), which leads to assays for lipases and esterases (e.g. 11) [32], lactonases (e.g. 12) and BaeyerVilliger monooxygenases (e.g. 13) [33]. The β-elimination can also be coupled with periodate oxidation of an intermediate 1,2-diol or 1,2-aminoalcohol, thereby allowing the assay of enzymes releasing these as primary products (Figure 7) [34]. The strategy is suitable for lipases and esterases
Fig. 6 Indirect release fluorogenic substrates. The bonds cleaved by the primary enzyme being assayed are marked as wobble lines.
Fig. 7 Fluorogenic and chromogenic substrates for periodate-coupled enzyme assays with $-elimination.
2 Hydrolase Fingerprinting 11
Fig. 8 Naphthaldehyde release assays.
(e.g. 14–15) [35], epoxide hydrolases (e.g. 16) [36], phosphatases (e.g. 17) [37], and acylases and proteases (e.g. 18) [38]. In a similar manner release of the fluorescent 6-methoxy-naphthaldehyde (20) from alcohol precursors can be used in a blue fluorescent assay for alcohol dehydrogenase (19) [39], aldolase antibodies and proline (21) [25], or via the periodate cleavage for esterases or lipases (22), and epoxide hydrolases (23) (Figure 8) [1]. Such substrates do not show any significant background reaction in the absence of specific enzymes, and can be prepared in many variations by simple synthetic steps. A series of mono-acetates, diacetates, cyclic carbonates, and epoxides derived from optically pure versions of fluorogenic and chromogenic 1,2-diols related to 14–16 were prepared and used for fingerprinting hydrolases [1]. The fingerprinting experiment involved incubation of a single enzyme with all the different substrates in parallel in the same microtiter plate, and following the evolution of fluorescence or color simultaneously for all substrates. This fingerprinting procedure for the measurement is repeated at will for each enzyme, and provides reproducible activity profiles across the entire array of substrates. The key advantages of this array are the negligible background reaction without enzyme, the common chromophore/fluorophore release chemistry and the simple signal acquisition using a microtiter plate reader. The array of fluorogenic monoacyl-glycerol analogs shown in Figure 9 was used to analyze the acyl chain length selectivity of lipases and esterases, as discussed below [40]. Esters of nitrophenol and umbelliferone are often used as reference substrate to define activity units of esterases and lipases. These substrates usually show a high level of nonspecific hydrolysis in the absence of enzyme, and therefore cannot be used for fingerprinting. We have recently found that the problem of nonspecific hydrolysis can be circumvented in the case of umbelliferyl esters by using the substrates in a solid-supported format [41]. Umbelliferyl esters (32–47) and oxymethyl ethers (48–51) (Figure 10) can be adsorbed onto silica gel plates
Fig. 9 Lipase fingerprinting using a series of fluorogenic monoacylglycerol equivalents with different acyl chain length. The data from this array are shown in Figure 19.
Fig. 10 Umbelliferyl esters used for fingerprinting on solid support.
2 Hydrolase Fingerprinting 13
Fig. 11 Principle of solid-supported assay.
from dichloromethane solutions to form a homogeneously impregnated layer. The silica gel plates thus prepared are then treated with the enzyme-containing solution in buffer (Figure 11). Under these conditions only active enzymes induce a fluorogenic hydrolysis reaction, while buffer and noncatalytic proteins have no effect. This solid-supported format is advantageous for fingerprinting because only small volumes of enzyme solution are required (1 µL per assay), and simple esters of umbelliferone are prepared in a single synthetic step, or are directly available commercially.
Procedure 1: Esterase/lipase Fingerprinting on Impregnated Silicagel Plates General procedure for the synthesis of esters 32a/b-47a/b: A solution of umbelliferone or 4-methylumbelliferone (1.23 mmol) in anhydrous tetrahydrofuran (THF) (3 mL) was treated with NaH (118 mg, 55% suspension in oil, 2.3 equiv.). After 30 min at 25 ◦ C, the reaction was cooled to 0 ◦ C and the acyl chloride (1.85 mmol, 1.5 equiv.) was added as a solution in dry THF (1 mL). After 2 h at 25 ◦ C, the reaction mixture was poured into aqueous 1 M HCl (50 mL) and extracted with CH2 Cl2 (2x50 mL). The organic phase was dried over Na2 SO4 , evaporated and the residue was purified by flash chromatography to give the pure esters. Fingerprinting: Silica gel thin-layer chromatography (TLC) glass-plates sil G-25 (Macherey-Nagel, Dueren, Germany) (silica gel 40–63 µm, surface 550 m2 g−1 , layer 0.25 mm) were soaked in a solution of substrate (32a/b–51a/b, 2 mM in CH2 Cl2 ) for 2 min and dried. Aliquots (1 µL) of enzyme samples in phosphate-buffered saline (PBS, 10 mM phosphate, 160 mM NaCl, pH 7.4, with 30% v/v glycerol) were spotted at 25 ◦ C. Product formation was recorded after 2–24 h using a fluorescence microtiter plate reader (λex = 360nm, λem = 460nm).
14 Enzyme Activity Fingerprinting Methods for Hydrolases
2.2 Fingerprinting with Indirect Chromogenic Assays
The use of a chromogenic or fluorogenic signal allows multiple quantitative analyses in microtiter plates. However the use of tagged substrates limits the range of structures available to compose the substrate array for fingerprinting. In addition, many chromogenic and fluorogenic substrates are not available commercially, so that the composition of a substrate array for fingerprinting requires a large synthetic effort. A practical alternative consists in using an indirect chromogenic or fluorogenic assay that returns a signal upon reaction progress with an untagged, commercially available substrate. Multisubstrate profiling of lipases and esterases has been realized using a colorimetric assay based on pH indicators, as described by Kazlauskas et al. [42]. Hydrolysis of an ester substrate releases a carboxylic acid, which results in a drop of pH in the assay solution. This pH change can be detected colorimetrically with nitrophenolate as indicator (yellow to colorless upon decreasing pH). In the so-called Quick E variation [43], the reaction rates of separate enantiomers are determined by the pH method in the presence of a competing resorufin ester substrate. Activity profiling with this method across an array of substrates was used to describe the substrate range of four esterases. Indirect fingerprinting of lipases and esterases has also been demonstrated with an assay using the principle of back titration with adrenaline (52) (Figure 12) [44]. In this assay a periodate-resistant substrate, for example a phosphate of carboxylic ester of a 1,2-diol or its epoxide precursor, is converted to a periodate-sensitive reaction product, that is a 1,2-diol, upon enzymatic transformation, that is hydrolysis by an esterase or an epoxide hydrolase. The test solution is treated with a measured amount of sodium periodate, which reacts with the oxidizable functional groups in the products. The unreacted periodate reagent is then revealed by addition
Fig. 12 Indirect chromogenic endpoint assay using back titration of periodate with adrenaline.
2 Hydrolase Fingerprinting 15
of adrenaline (52), which undergoes an instantaneous oxidation with periodate to give adrenochrome (53), a cationic orthoquinone dye with a red absorption maximum in the visible spectrum. This colorimetric assay provides off-the-shelf endpoint assays for lipases using vegetal oils as substrates, phytases using phytic acid as substrate, and epoxide hydrolases using epoxides as substrates. In the latter case, simply plating out a series of epoxides (54a-z) provides an array suitable for fingerprinting of epoxide hydrolase reactivity (Figure 13). The assay can be adapted for fingerprinting hydrolases using an array of acetates (55–88) derived from commercially available polyols such as carbohydrates and glycols (Figure 14) [45]. The resulting activity fingerprints were used to identify unusual reactivities in newly discovered microbial esterases. A related indirect assay based on chelation of copper by a fluorescent chelate or an amino acid is available for screening proteases and peptidases with their natural substrates [46], and could form the basis for similar fingerprinting arrays.
Procedure 2: Lipase/esterase Fingerprinting Using the Adrenaline Test for Enzymes
Substrates were diluted from 10 mM stock solutions in water/acetonitrile mixtures. Enzymes were diluted from 10 mg mL−1 stock solutions in aqueous borate buffer (50 mM, pH 8.0). NaIO4 was added as a freshly prepared 10 mM stock solution in water. Adrenaline (as HCl salt) was added as a 15 mM stock solution in water. Assays (0.1 mL) were conducted in individual wells of 96-well flat-bottom half-area polystyrene microtiter plates (Costar, Cambridge, MA, USA). Conditions: (1) Enzyme in 50 mM aqueous borate pH 8.0, 1 mM NaIO4 , 1 mM substrate, 60 min at 37 ◦ C; (2) 1.5 mM adrenaline, 5 min, 26 ◦ C. The optical density (OD) was recorded using a Spectramax 190 Microplate Spectrophotometer (Molecular Devices, Ismaning/M¨unchen, Germany λ = 490 nm). Commercial enzyme samples were tested at 1 mg mL−1 , proprietary esterases and lipases samples were used at one-tenth of crude enzyme extracts filtered through size-exclusion chromatography columns.
2.3 Cocktail Fingerprinting
Recording good fingerprinting data requires a simple yet reliable assay procedure. Optimally the reagent should be easy to assemble and use, and it should be possible to check its composition at any time. In addition, fingerprinting should be based on readily available instruments and adaptable to a broad range of reaction conditions, such as to be available to any laboratory working with enzymes. A remarkably straightforward solution to this problem is realized in the form of substrate cocktails, which are mixtures of enzyme substrates chosen such that either the substrates, or the reaction products, or both, are easily separable, selectively detectable, and precisely quantifiable after reaction.
16 Enzyme Activity Fingerprinting Methods for Hydrolases
Fig. 13 Array of epoxides used for fingerprinting epoxide hydrolases using the assay described.
The principle of cocktail fingerprinting was first demonstrated with a cocktail of 20 different lipase substrates tagged with a strong UV chromophore, and such that the reaction products are separated by HPLC analysis [47]. The cocktail composition can be checked at any time by HPLC or any spectroscopic analysis method, and the assay involves only a single measurement, which ensures that the relative product distribution is completely conserved for two measurements under identical conditions. The cocktail reagent was used to record the activity fingerprints of 40 different lipases and esterases. All substrates were mono-octanoyl esters of 1,2-diol, which ensured that most substrates showed significant reactivity with the enzymes. Thus, the cocktail ensured activity detection of all enzymes tested, while returning sufficient different reactivity information to uniquely identify each enzyme by a particular reactivity pattern. Cocktail fingerprinting is also suitable for the analysis of protease reactivities [48]. In this case the challenge consists in reducing the enormous diversity of possible peptide sequences to a limited yet representative set of peptides spanning a sufficient range for ensuring broad and differentiated reactivity with various proteases while being separable by HPLC. A series of five hexapeptides including all 25 possible dipeptides between (1) acidic, (2) basic, (3) hydrophobic, (4) aromatic, and (5) small and polar amino acids was realized following a domino-game assembly (Figures 15 and 16). The peptides were fluorescence labeled at the N-terminus, resulting in 31 possible measurable fragments, including the five substrates and
2 Hydrolase Fingerprinting 17
Fig. 14 Array of polyol acetates for lipase/esterase fingerprinting.
Fig. 15 Cocktail fingerprinting of protease with N-terminal labeled hexapeptide substrates. Sequences are selected by domino design.
Fig. 16 Domino design of peptide cocktail. All 25 possible types of dipeptide (upper right) serve as domino pieces to assemble five hexapeptides. Only one possible solution of the domino game is shown and realized as an actual sequence (lower
right). Note that the domino solution selected also realizes 16 of 25 possible 1,3arrangements of amino acid types. The missing 1,3-arrangements are AXA, HXA, HXH, PXP, PXS, NXP, NXN, SXA, SXS.
3 Classification from Fingerprinting Data 19
their N-terminal cleavage products. The resulting cocktail reagent reacts with a broad range of proteases, and returns a specific cleavage pattern in each case (Figure 18).
3 Classification from Fingerprinting Data
The fingerprints provided by the methods above return enzyme-specific or samplespecific activity patterns. The information that can be extracted from such patterns depends on data accuracy, the variability between different samples, and the actual structure of the substrates. In general the fingerprint data consist either of a series of initial reaction rates measured for each substrate in the presence of the enzyme, or of a series of percentages of conversion observed for each substrate after a fixed incubation time. The first step in processing fingerprinting data is to extract this reaction rate information from the primary signal recorded following appropriate calibration curves with reference products. Transformation of the recorded signal to a chemically significant value such as a reaction rate or percentage conversion is not a formal requirement if one is only interested in enzyme similarity by multivariate analysis (see below). 3.1 Fingerprint Representation
A fingerprint data set can be represented in a visual format to compare multiple enzymes and substrates simultaneously. This is done conveniently using grayscale or multicolor array displays associating each square in an array with an enzyme–substrate combination identifiable by its coordinates. The color scale should be adapted for each fingerprint such that the full color scale is used to span reactivity scales from zero to the maximum reaction rate or conversion observed in the fingerprint. Rectangular grids representing the substrate array associated with an enzyme are a convenient option for visual representation. Such grids can be created directly from a data table by conversion to the portable gray map (.pgm) or portable pixel map (.ppm) file formats. Grayscales are suitable for representation of a single data point per square in the array. The representative series of gray-scale fingerprints in Figure 17 was obtained with the carbohydrate acetates array in Figure 14 using the indirect adrenaline test described in Figure 12. The advantage of the array layout is that similarities between enzymes can be quickly assessed by a visual inspection of the data. Such comparisons are not possible if the data are represented as table numbers. One can also represent a series of fingerprints in a combined array where each line corresponds to a different enzyme fingerprint, and each column corresponds to a different substrate in the array. Such combined arrays can be integrated into a table, giving additional information about enzymes and substrates, and the different fingerprints can be ordered according to data clustering as discussed below. The clustered data set of protease cleavage patterns derived from the domino cocktail
20 Enzyme Activity Fingerprinting Methods for Hydrolases
Fig. 17 Activity of esterases and lipases towards polyacetates in Figure 14 as determined by the adrenaline test. Each grayscale square corresponds to one ester substrate according to the layout, from white (0%, no activity) to black (100%, maximum signal), after correction from blank values.
is shown as a representative example of such data display (Figure 18). The contrast in these grayscales can be augmented by using multicolor scales as available from many graphic programs, for example the blue–green–yellow–red scale used for coding temperature ranges. One of the most useful multicolor scales combines color shading with color intensity by using gradual desaturation of blue, green, and red channels, which translates into a white–yellow–orange–red–black color scale with increasing data values. Selectivities between pairs of substrates (enantiomers, stereoisomers, or analogs) can be represented using a two-dimensional color code [1]. In this format one square in the array represents data for the substrate pair by associating one color channel to the first substrate, a second color channel to the second substrate, and the third color channel to the average value of the first two channels. Complete selectivity for either substrate appears as a pure color, and the absence of selectivity
3 Classification from Fingerprinting Data 21
Fig. 18 Grayscale-coded protease fingerprints from domino-peptide cocktail. The relative distribution of products is shown by coloring the P1 position of cleavage of each detected N-terminal coumarinlabeled fragment relative to the most abundant fragment detected (shown in black) using the indicated scale. Unreacted substrates are not color coded. Proteases are ordered according to the hier-
archical clustering shown at right (Ward’s clustering using squared Euclidean distances with product percentages as variables. The variables were not normalized). * is the (7-coumaryl)oxyacetyl label and the capital letters are standard amino acid codes. C-termini of the peptide substrates are CONH2 (peptide 1, 3, 5) or COOH (peptide 2, 4).
(equal rates for each of the two substrates) appears as gray. This color scale is very convenient for visual analysis. Taking into account the occurrence of colorblindness in red–green contrast, the best choice of colors is to use the green (G) and blue (B) channel of the RGB code for the substrate values, and the red channel (R) for the average, thus producing a purple-to-green contrast map. An example of color-coded representation of a data set is shown in Figure 19, showing the data set of lipase and esterase fingerprints obtained against the array of enantiomeric fluorogenic monoacyl glycerol analogs in Figure 9. Procedure 3: Data Treatment for Representation of Colored Selectivity Arrays
The file format used to generate the colored selectivity arrays is the portable pixel map (.ppm) format. Each grid position is first assigned three whole numbers corresponding to the RGB color code between 0 (zero intensity, maximum activity) and 255 (maximum intensity, no activity) as follows: the first number is set according to the activity observed with the (R)-enantiomer (or the first of two given stereoisomers), the second number according to the activity observed with the (S)-enantiomer (or second stereoisomer), and the third number is simply the mathematical average of the first two numbers. Thus a grid with X columns and Y lines is coded with 3X columns and Y lines of whole numbers between 0 and 255. The grid of numbers is saved as
22 Enzyme Activity Fingerprinting Methods for Hydrolases
Fig. 19 Lipase/esterase fingerprints and hierarchical tree. Each line represents a different enzyme, and each column represents a different substrate. Each colored square represents the reaction rates of two enantiomeric lipase substrates (measured
separately) relative to the maximum rate observed with the corresponding enzyme (line), which is given in pM s−1 (color key at lower right). Hierarchical agglomerative clustering was carried out using Ward’s method on the basis of Euclidean distances.
comma separated value (.csv) file. This file is then opened in a text editor and the following three (or four) lines are inserted at the top of the file: P3 # (optional line with identifier) XY 255 where X is the number of columns in the array and Y the number of lines in the array. The file is then saved from the text editor in the portable pixel map format by simply adding the “.ppm” suffix. This file is opened by imageprocessing software and can be resized and saved in a different format (e.g. bmp). In this format it is possible to permutate the order of the three columns corresponding to the RGB code, and to obtain the color shading orange-to-blue (when G is the average of R and B) and a green-to-purple (when R is the average of G and B, as shown in Figure 19).
3 Classification from Fingerprinting Data 23
3.2 Data Normalization
Data processing should be kept to a minimum, paying attention to staying as close to the actual chemistry taking place in the flask as possible. Raw output data recorded during the experiment are expressed in relative fluorescence units in the case of fluorescence assays, or peak areas when processing HPLC spectra. Such primary data are already suitable for similarity analysis between substrates and between enzymes without any manipulation. Nevertheless one usually converts this primary output into chemically meaningful information such as substrate conversions (%) or reaction rates (pM s−1 ) by means of a calibration curve. With or without such conversion, the unspecific signal recorded in blank assays without enzyme should be subtracted from the primary signal. If the different substrates in the array display very different chemical reactivities, the rate data can be converted to relative rate accelerations over background to report the enzyme-specific contribution only. If there is no observable background reaction, one can calculate the relative rate in relation to a reference nonselective chemical catalyst (Eq. 1). Reporting the enzyme-specific rate acceleration rather than the absolute reaction rate allows a stronger statistical weight to be assigned to unreactive substrates, such as esters of hindered alcohols, where even a small conversion by the enzyme is noteworthy. V(i)rel =V(i)obs /V(i)ref
(1)
where V(i)obs is the observed reaction rate or conversion of substrate i; V(i)ref is the reaction rate or conversion of substrate i with a reference nonselective chemical catalyst; V(i)rel is the relative rate acceleration of substrate i over reference. For the purpose of enzyme similarity analysis, we usually normalize the data such that the sum of all reaction rates or conversions observed with a given enzyme across the different substrates in the array is set to a constant value of 1 or 100% (Eq. 2). In this manner two samples of the same enzyme at two different concentrations or at two different reaction times should appear similar. This normalization enables to conserve the actual ratio between each substrate across the total enzyme reactivity against the array. This allows one to compare biocatalysts independently of their concentration in the test samples. Such a selectivity analysis circumvents the need to define enzyme activity units for each enzyme sample being analyzed. The lipase selectivity data set (Figure 19) was normalized using this method.
V(i)rel =V(i)obs /
n
V(i)
(2)
i=1
n V(i) is the sum of the reaction rates of all the substrates for a given where i=1 enzyme.
24 Enzyme Activity Fingerprinting Methods for Hydrolases
Although we have not tested all possible data treatment approaches, we found that the following two data correction methods must be avoided in fingerprint data analysis. First, reporting the data for each substrate relative to the maximum rate or conversion observed in the fingerprint, as is done for the color-coded representation, provides unreliable results in terms of statistical comparisons, probably due to the automatic propagation of any error relating to this maximum peak (Eq. 3). Second, the standard variable normalization methods in multivariate analysis, which consists in transforming each variable such that its average is 0 and its standard deviation is 1, must be avoided because all substrates receive equal statistical weight, in particular substrates with very low conversion where the observed small variations in reaction rate or conversion reflect measurement inaccuracies rather than actual data. V(i)rel =V(i)/Vmax
(3)
where Vmax is the maximum reaction rate observed for an enzyme over the array. Correction of rate data by nonlinear mathematical operations (e.g. logarithmic scales) can be considered when the different reaction rates or conversions comprising the fingerprint span several orders of magnitude yet are all reliably measured. This correction should be considered after normalization of all observed rates of conversion to a constant sum as discussed above, such as to preserve a data set independent of catalyst concentration. 3.3 Hierarchical Clustering of Enzyme Fingerprints
In multivariate analysis, each enzyme is considered as an observation and positioned as a point in a multidimensional space whose dimensions correspond to the reaction rates observed with each substrate in the array, defined as variables (Figure 20). Similarity analysis is based on comparing distances between different enzymes (observations) in this multidimensional space or the different substrates (variables) [49]. This comparison depends on the relative weight assigned to the variables as discussed in the previous section. Hierarchical clustering is an automated procedure for grouping different observations consisting in a set of measured variables according to their similarity (geometrical proximity in the variable space). A number of distance measures and algorithms can be chosen for clustering. For the case of enzyme fingerprints, we have used the agglomerative technique based on Ward’s method which is the most commonly used in statistics [50]. We employed Euclidean and squared Euclidean distances as a measure of similarity. The latter is actually recommended [51], but both resulted in the formation of chemically meaningful clusters. A representative example is the data set in Figure 19. The classification proposed by Ward clustering correctly represents the intuitive reactivity pattern analysis operated visually, and groups esterases operating on short-chain acyl groups at the beginning of the list,
3 Classification from Fingerprinting Data 25
Fig. 20 Multivariate analysis of enzyme rate data. Each substrate defines a dimension, and the enzyme is positioned in the n-dimensional space using the observed reaction rate or conversion with each substrate as coordinate. Hierarchical cluster analysis compares distances in this n-dimensional space.
and lipases active on long-chain acyl groups at the end of this list. The cluster structure is shown in the hierarchical tree. Principal component analysis (PCA) reduces the dimensionality by combining correlated variables linearly into one factor whilst maximizing the variance captured in this factor. For example, PCA identifies if a set of points in a three-dimensional space are grouped in the same plane (two dimensions) or along the same line (one dimension) and determines the corresponding axes, with coefficients for each variable. In most data sets more than 60% of the total variance is explained by the first two principal components, allowing a 2D representation of the intrinsic structure of the clusters. For example, the lipase and esterase data set discussed above can be represented in such a 2D plot, giving a more graphical layout to the data (Figure 21). 3.4 Analysis of Substrate Similarities
The chemical composition of the substrate series composing a fingerprinting array is central to the analysis method, and the structures of these substrates contain the chemical essence of the measurement. As pointed out above, fingerprints based on a limited set of substrates must incorporate substrates displaying a significant reactivity with the targeted enzyme class, such as to avoid “zero” fingerprints devoid of any reactivity information. Most substrates should therefore be variations on a structural type known to be favored by the enzyme class under investigation.
26 Enzyme Activity Fingerprinting Methods for Hydrolases
Fig. 21 Principal component analysis (PCA) of lipases and esterases from fingerprint data. The groups A–F identified visually (dashed circles) correspond to those obtained by agglomerative clustering, with the exception of CRL2 and MJL in cluster E, which are assigned to cluster D (see Figure 19).
Any substrate array can be investigated for the functional relevance of its substrates once a data set has been acquired. In particular, the key question is whether the different substrates in the array all return differential reactivity information across the different enzymes tested. The PCA discussed above answers part of the question. Indeed if each substrate in the array behaves differently, the variability of the data set spans as many dimensions as substrates. However, if any two substrates display a similar reactivity pattern across the different enzymes, the diversity observed will be reduced to a lower number of dimensions. In the data set of lipase and esterase fingerprints discussed above, the first two principal components explain 68% of the diversity observed between the different enzymes. A bar diagram representation of the principal component coefficients indicates how each substrate contributes to the principal components of the data set (Figure 22). Substrate similarities can also be directly investigated by hierarchical clustering or PCA of substrates as observations against enzymes as variables. The data set is first normalized against substrates, with the sum of all reaction rates observed for each substrate across all enzymes being set to a constant of 100%. The results of clustering can be visualized as a hierarchical tree, or by direct rendering of the symmetrical distance matrix, which indicates functional similarities between substrates, as illustrated for the lipase data set (Figure 23). Pairs of enantiomers appear very similar in their reactivity against the different enzymes with the exception of the butyryl and hexanoyl esters, suggesting that this particular fingerprinting set
Fig. 22 Coefficients per substrate of the first two principal components (PC), which account for 68% of observed variance: PC1=46% (black bars), PC2=22% (white bars).
Fig. 23 Euclidean distance matrix between substrates 24–31 in the 25-dimensional space of enzyme reaction rate, rendered in grayscale (black = distance 0, white=maximum distance). For each substrate, reactivity for each enzyme was ex-
pressed as per cent of the total reactivity observed with this substrate across all enzymes measured. Substrates were clustered by agglomerative clustering using the group-average method.
28 Enzyme Activity Fingerprinting Methods for Hydrolases
could be reduced to a lower number of substrates to return essentially the same analysis results in terms of differentiation between enzymes.
4 Conclusions
Enzyme fingerprints are images of the reactivity profile of an enzyme across the chemical structural space of a given substrate type. The resolution of these images depends on the number and structural diversity of the substrates composing the fingerprinting array. Fingerprinting arrays composed of small sets of substrates (<20) are readily assembled and used for fingerprinting by parallel measurements in microtiter plates or in the form of cocktails analyzed by a separative analysis such as HPLC. These arrays or cocktails can be easily reproduced and therefore used as reference measurements of activity profiles. Cocktails can be made fluorogenic by using FRET labels, thereby providing general activity detection and functional identification reagents for enzyme classes such as lipases and proteases [52]. The reactivity profiles derived from enzyme fingerprints provide sufficient information for a functional classification and identification of closely related enzymes within a class. Enzyme fingerprinting can be used as microbial strain identification in medical diagnostics, as exemplified by the APIZYM system. This analysis principle may be extendable to quality control and medical diagnostics in a broadened context. Enzyme fingerprints may also prove useful for enzyme engineering. The most functionally diverse enzymes of a given collection, for example a microbial strain collection, could be identified by fingerprinting and selected as representative members for testing new substrate types, or as most favorable starting points for gene shuffling experiments to create new enzymes with novel selectivities. Fingerprinting might also enable a systematic analysis of the effect of directed evolution on reactivity landscapes to reveal the effect of single mutations on reactivity in a broader context. Finally, the data from enzyme fingerprinting experiments should provide a rich source of information to complement structural information and lead to a refined understanding of structure–activity relationships in enzymes. Acknowledgment
This work was financially supported by the University of Berne, the Swiss National Science Foundation, and Prot´eus SA, Nˆımes, France.
References 1 (a) D. WAHLER, F. BADALASSI, P. CROTTI, J.-L. REYMOND, Angew. Chem. Int. Ed. 2001, 40, 4457–4460; (b) D. WAHLER,
F. BADALASSI, P. CROTTI, J.-L. REYMOND, Chem. Eur. J. 2002, 8, 3211–3228.
References 2 J.-L. REYMOND, D. WAHLER, Chem Bio Chem 2002, 3, 701–708. 3 U. REIMER, U. REINEKE, J. SCHNEIDER-MERGENER, Curr. Opin. Biotechnol. 2002, 13, 315–320. 4 (a) E. GRUNER, A. VON GRAEVENITZ, M. ALTWEGG, J. Microbiol. Methods 1992, 16, 101–118; (b) P. GARCIA-MARTOS, P. MARIN, J. M. HERNANDEZ-MOLINA, L. GARCIA-AGUDO, S. AOUFI, J. MIRA, Mycopathologia 2001, 150, 1–4. 5 M. BAUMANN, R. STU¨ RMER, U. T. BORNSCHEUER, Angew. Chem. 2001, 113, 4329–4333; Angew. Chem. Int. Ed. 2001, 40, 4201–4204. 6 Nomenclature Committee of the International Union of Biochemistry (NC-IUCB), Recommendation 1978, Eur. J. Biochem. 1979, 97, 319–320. 7 M. KORDEL, U. MENGE, G. MORELLE, H. ERDMANN, R. D. SCHMID, Comparative analysis of lipases in view of protein design in Lipases: Structure, Mechanism and Genetic Engineering, GBF Monographs vol. 16, eds. L. ALBERGHINA, R.D. SCHMID, R. VERGER, VCH, Weinheim, 1991. 8 J. KRAUT, Science 1988, 242, 533–540. 9 M. D. JOSHI, G. SIDHU, I. POT, G. D. BRAYER, S. G. WITHERS, L. P. MCINTOSH, J. Mol. Biol. 2000, 299, 255–279. 10 J. M. DENU, J. E. DIXON, Proc. Natl. Acad. Sci. USA 1995, 92, 5910–5914. 11 D. J. HEI, D. S. CLARK, Biotechnol. Bioeng. 1993, 42, 1245–1251. 12 Y.-X. FAN, P. MCPHIE, E. W. MILES, J. Biol. Chem. 2000, 275, 20301–20307. 13 (a) S. KUNUGI, H. HIROHARA, N. ISE, Eur. J. Biochem. 1982, 124, 157–163; (b) D. G. TRUHLAR, A. KOHEN, Proc. Natl. Acad. Sci. USA 2001, 98, 848–851. 14 (a) G. LENAZ, A. M. SECHI, G. PARENTICASTELLI, L. LANDI, E. BERTOLI, Biochem. Biophys. Res. Commun. 1972, 49, 536–542; (b) M. L. PUTERMAN, N. HRBOTICKY, S. M. INNIS, Anal. Biochem. 1988, 170, 409–420. 15 J. R. SILVIUS, B. D. READ, R. N. MCELHANEY, Science 1978, 199, 902–904. 16 J. BUSSIE` RE, A. FOUCART, L. COLOBERT, C.R. Acad. Sceances Paris 1967, 264 D, 415–417. 17 D. MONGET, P. NARDON, French patent FR 2341865, Paris, 1976, US patent 4277561, 1981.
18 M. W. HUMBLE, A. KING, I. PHILIPPS, J. Clin. Pathol. 1977, 30, 275–277. 19 (a) E. GRUNER, A. VON GRAEVENITZ, M. ALTWEGG, J. Microbiol. Methods 1992, 16, 101–118; (b) P. GARCIA-MARTOS, P. MARIN, J. M. HERNANDEZ-MOLINA, L. GARCIA-AGUDO, S. AOUFI, J. MIRA, Mycopathologia 2001, 150, 1–4. 20 http://www.biomerieux.com 21 R. SICARD, J.-P. GODDARD, M. MAZEL, C. AUDIFFRIN, L. FOURAGE, G. RAVOT, D. WAHLER, F. LEFE` VRE and J.-L. REYMOND, Adv. Synth. Catal. 2005, 347, 987–996. 22 (a) G. KLEIN, J.-L. REYMOND, Bioorg. Med. Chem. Lett. 1998, 8, 1113–1116; (b) G. KLEIN, J.-L. REYMOND, Helv. Chim. Acta 1999, 82, 400–407. 23 (a) N. JOURDAIN, R. PEREZ-CARLON, J.-L. REYMOND, Tetrahedron Lett. 1998, 39, 9415–9418; (b) R. P´EREZ CARLON ` , N. JOURDAIN, J.-L. REYMOND, Chem. Eur. J. 2000, 6, 4154–4162. 24 E. M. GONZALEZ-GARCIA, V. H´ELAINE, G. KLEIN, M. SCHUERMANN, G. SPRENGER, W.-D. FESSNER, J.-L. REYMOND, Chem. Eur. J. 2003, 9, 893–899. 25 B. LIST, C. F. BARBAS, R. A. LERNER, Proc. Natl. Acad. Sci. USA 1998, 95, 15351–15355. 26 F. TANAKA, L. KERWIN, D. KUBITZ, R. A. LERNER, C. F. BARBAS, Bioorg. Med. Chem. Lett. 2001, 11, 2983–2986. 27 E. LEROY, N. BENSEL, J.-L. REYMOND, Adv. Syn. Catal. 2003, 345, 859–865. 28 A. SEVESTRE, V. H´ELAINE, G. GUYOT, C. MARTIN, L. HECQUET, Tetrahedron Lett. 2003, 44, 827–830. 29 M. C. GUTIERREZ, A. SLEEGERS, H. D. SIMPSON, V. ALPHAND, R. FURSTOSS, Org. Biomol. Chem. 2003, 1, 3500–3506. 30 W. GAO, B. XING, R. Y. TSIEN, J. RAO, J. Am. Chem. Soc. 2003, 125, 11146–11147. 31 C. A. ROESCHLAUB, N. L. MAIDWELL, M. REZA REZAI, P. G. SAMMES, Chem. Commun. 1999, 1637–1638. 32 E. LEROY, N. BENSEL, J.-L. REYMOND, Bioorg. Med. Chem. Lett. 2003, 13, 2105–2108. 33 R. SICARD, L. S. CHEN, A. J. MARSAIOLI, J.-L. REYMOND, Adv. Synth. Catal. 2005, 347, 1041–1050. 34 F. BADALASSI, D. WAHLER, G. KLEIN, P. CROTTI, J.-L. REYMOND, Angew. Chem. Int. Ed. 2000, 39, 4067–4070.
29
30 Enzyme Activity Fingerprinting Methods for Hydrolases
35 (a) D. LAGARDE, H.-K. NGUYEN, G. RAVOT, D. WAHLER, J.-L. REYMOND, G. HILLS, T. VEIT, F. LEFEVRE, Org. Process. R.& D. 2002, 6 441–445; (b) E. NYFELER, J. GROGNUX, D. WAHLER, J.-L. REYMOND, Helv. Chim. Acta 2003, 86, 2919–2927; (c) J. GROGNUX, D. WAHLER, E. NYFELER, J.-L. REYMOND, Tetrahedron Asymmetry 2004, 15, 2981–2989. 36 F. BADALASSI, G. KLEIN, P. CROTTI, J.-L. REYMOND, Eur. J. Org. Chem. 2004, 2557–2566. 37 E. M. GONZALEZ-GARCIA, J. GROGNUX, D. WAHLER, J.-L. REYMOND, Helv. Chim. Acta 2003, 86, 2458–2470. 38 F. BADALASSI, H.-K. NGUYEN, P. CROTTI, J.-L. REYMOND, Helv. Chim. Acta 2002, 85, 3090–3098. 39 J. WIERZCHOWSKI, W. P. DAFELDECKER, B. HOLMQUIST, B. L. VALLEE, Anal. Biochem. 1989, 178, 57–62. 40 J. GROGNUX, J.-L. REYMOND, ChemBioChem 2004, 5, 826–831. 41 P. BABIAK, J.-L. REYMOND, Anal. Chem. 2005, 77, 373–377. 42 A. M. F. LIU, N. A. SOMERS, R. J. KAZLAUSKAS, T. S. BRUSH, F. ZOCHER, M. M. ENZELBERGER, U. T. BORNSCHEUER, G. P. HORSMAN, A. MEZZETTI, C.
43
44 45 46
47 48 49
50 51
52
SCHMIDT-DANNERT, R. D. SCHMID, Tetrahedron Asymmetry 2001, 12, 545–556. (a) L. E. JANES, R. J. KAZLAUSKAS, J. Org. Chem. 1997, 62, 4560–4561; (b) L. E. JANES, A.C. L¨OWENDAHL, R. J. KAZLAUSKAS, Chem. Eur. J. 1998, 4, 2324–2331. D. WAHLER, J.-L. REYMOND, Angew. Chem. Int. Ed. 2002, 41, 1229–1232. D. WAHLER, O. BOUJARD, F. LEFE` VRE, J.-L. REYMOND, Tetrahedron 2004, 60, 703–710. (a) G. KLEIN, J.-L. REYMOND, Angew. Chem. Int. Ed. 2001, 40, 1771–1773; (b) K. E. S. DEAN, G. KLEIN, O. RENAUDET, J.-L. REYMOND, Bioorg. Med. Chem. Lett. 2003, 10, 1653–1656. J.-P. GODDARD, J.-L. REYMOND, J. Am. Chem. Soc. 2004, 126, 11116–11117. Y. YONGZHENG, J.-L. REYMOND, MolBiosys 2005, 1, 57–63. C. CHATFIELD, A. J. COLLINS, Introduction to Multivariate Analysis, Chapman and Hall, London, 1983. B. S. EVERITT, Cluster Analysis, Edward Arnold, London, 1993. B. S. EVERITT, S. LANDAU, M. M. LEESE, Cluster Analysis, Edward Arnold, London, 2001. Y. YONGZHENG, PhD thesis, University of Berne, 2005.
1
Chemical Complementation Assays Scott Lefurgy and Virginia Cornish Columbia University, New York, USA
Originally published in: Enzyme Assays. Edited by Jean-Louis Reymond. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52731095-1
1 Introduction
Directed evolution of enzymes designed de novo and from existing enzyme scaffolds has the potential to produce catalysts for many useful chemical reactions. Efforts in directed evolution to date have produced a few cases in which significantly increased enzyme activity, increased thermostability, or modification of substrate specificity have been obtained [1]. Notably, many of these successful examples of directed evolution have used complementation strategies as the basis for selection of improved variants, in which the function of a protein is linked to cell survival. While this approach has the advantage of being a selection rather than a screen, a major limitation is that it can only be applied to reactions naturally carried out by the cell that are inherently selectable. In order to exploit complementation to evolve enzymes for a broad range of chemical reactions, general assays must be developed. Recently, our laboratory reported a complementation assay for enzyme catalysis that is reaction-independent [2]. This assay has been called “chemical complementation” because it combines the power of genetic selection with the diversity of synthetic chemistry. Here, enzyme catalysis is linked to transcription of an essential gene through a small molecule yeast three-hybrid system, in which the reaction chemistry is specified in the linker of a small molecule chemical inducer of dimerization (CID). In addition to serving as the substrate for the enzyme, the CID dimerizes a DNA-binding domain and transcription activation domain to reconstitute an artificial transcription factor and activate expression of a reporter gene. Since the chemistry of the linker (i.e. the substrate) can change while the transcription readout remains constant, the assay has the advantage of being reaction-independent. This system has the potential to assay for numerous enzymatic activities that had Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Chemical Complementation Assays
previously been inaccessible to selection methods. To demonstrate the flexibility of this assay, we have applied it to such diverse reactions as β-lactam bond cleavage and glycosidic bond formation [3]. In this chapter, we first present a brief overview of the historical context of complementation methods, followed by an outline of the development of chemical complementation and its adaptation to detect enzyme activity. Finally, two applications of chemical complementation are described: the study of enzyme–inhibitor interactions and directed evolution of a glycosynthase.
2 Complementation Assays 2.1 Introduction
The most widely used genetic method for determination of protein function is the complementation assay. This strategy consists of modifying a cell to knockout an essential protein and then rescuing it from death by introducing a compensatory protein on a plasmid. In the context of genetics, the protein can be derived from the genomic DNA of another organism, as in the case of gene cloning. The power of this assay lies in its ability to select from a library of thousands or millions of variants only those proteins that possess the desired function. Since the generation of variability in DNA and protein sequences is now routine, researchers have begun to apply complementation to enzymology and directed evolution. The only caveat is that a suitable selection system must be devised that (1) can adequately distinguish functional and nonfunctional proteins, and that (2) can also be tuned to different levels of stringency, depending on the desired outcome. The major limitation to this method has been the reliance on naturally selectable or screenable phenotypes. Despite these parameters, however, many promising results have emerged from complementation approaches. 2.2 Early Complementation Assays
Complementation assays have historically been the tools of geneticists. In a now classic experiment, Beadle and Tatum identified enzymes responsible for the biosynthesis of essential metabolites (Figure 1) [4]. They introduced random mutations throughout the chromosomal DNA of the ascomycete Neurospora crassa by treating the organism with X-rays. Approximately 2000 individual mutated strains were then screened for growth in both rich and minimal medium, where the minimal medium contained only nutrients, such as biotin, that Neurospora cannot make itself. Three mutant strains were identified that had growth rates indistinguishable from the wild-type strain in rich media, but grew poorly or not at all in minimal
2 Complementation Assays
Fig. 1 A genetic method to identify genes responsible for the biosynthesis of essential metabolites. Mutant ascomycetes generated by X-ray irradiation were identified by failure to grow on minimal medium. Restoration of growth was conferred by the addition
of a single essential metabolite (vitamin B6). This result pointed toward a putative pathway affected by the mutation and subsequently led to the identification of genes involved in biosynthesis [4].
medium. By adding back individual metabolites to the minimal medium and then measuring the cell growth rates, the three strains were shown to be unable to synthesize vitamin B6, vitamin B1, and p-aminobenzoic acid, respectively. This basic approach of mutating the DNA encoding the proteins and then assaying for some detectable phenotype such as cell growth is the basis for genetic analysis. Most of the genes whose function is known have been identified through this type
3
4 Chemical Complementation Assays
of method, and the functions of thousands of proteins have been assayed in this manner. The contribution of Beadle and Tatum was to point out that the mutant strains could be screened based on enzymatic activity, as opposed to just cell death. Their method, however, still required several years of laborious study to identify the enzymes encoded by the genes they had mutated. In the 1950s Benzer introduced the strategy of complementation, which provided a general solution to the problem of gene identification [5]. In genetic complementation, a pool of plasmid DNA encoding fragments of wild-type chromosomal DNA is introduced into the mutant cell line and then the DNA fragment that complements the mutation is identified based on the recovery of the wild-type phenotype. In the Neurospora example above, plasmid DNA that encoded fragments of the wildtype Neurospora genome would be introduced into the mutant Neurospora deficient in vitamin B6 biosynthesis. Only mutant Neurospora with a copy of the wild-type vitamin B6 biosynthetic gene would be able to grow in the absence of vitamin B6, allowing the plasmid DNA encoding the vitamin B6 biosynthetic gene to be distinguished readily. The recent availability of several complete genome sequences further simplifies genetic complementation because the plasmid DNA sequence can be compared to the complete genome sequence. In addition, it has allowed several labs to construct libraries of plasmid DNA not just encoding random fragments of chromosomal DNA, but instead explicitly encoding every open reading frame expressed in a given organism or cell line [6]. 2.3 Enzymology by Complementation
While the cloning of proteins based on function continues to be the major application of complementation assays, they have more recently been shown to be useful tools for enzymology. The advantage of such an approach lies in the rapid assay of millions of mutants simultaneously. The advent of modern molecular biology has facilitated the generation of large libraries of protein variants. Since these libraries are genetically encoded, cell-based assays that link function of the enzyme to cell survival can be exploited for global analysis of the roles of amino acids in protein structure and function. One aspect of protein structure that has been investigated by complementation is hydrophobic core packing. The observation that substitutions in the protein core are destabilizing led to the theory that optimal packing of side-chains is required for protein folding. While this hypothesis can be derived from first principles, it is difficult to test experimentally. However, Lim and Sauer addressed this problem by using a complementation assay to select properly folded lambda repressor proteins from a collection of mutants with all possible combinations of the core residues [7]. They first generated a library in which seven hydrophobic amino acids in the protein core were randomized by cassette mutagenesis. Then functional repressors were selected from the library based on their ability to prevent infection of Escherichia coli by a repressor-defective phage. Substitutions that resulted in tightly packed cores,
2 Complementation Assays
with hydrophobicities similar to that of the wild-type, produced active repressors. Since this experiment examined every possible set of core residues, the authors were able to draw general conclusions about the rules and patterns of protein core packing. Their findings indicate that hydrophobicity, rather than steric constraints, play the most important role in determining core stability. This example illustrates an experimental question that could not be addressed by site-directed mutagenesis, but was well suited to a combinatorial mutagenesis/selection approach. Another question of interest to enzymologists is how electrostatic interactions, hydrogen bonding, and conformational constraint contribute to catalysis. Hilvert and coworkers used a complementation approach to assess the role of an active site arginine (R90) in chorismate mutase [8]. This enzyme catalyzes the first committed step in the synthesis of aromatic amino acids. They constructed a mutant E. coli strain whose chorismate mutase gene was inactive, and showed that the introduction of the active gene on a plasmid restored growth on a medium lacking phenylalanine and tyrosine. When all possible mutants of R90 were assayed by complementation, only the enzyme containing arginine was selected as active. This result confirmed the requirement of positive charge in that region of the active site. To assess the conformational constraint on this charge, a library was constructed in which both R90 and neighboring cysteine residue (C88) were completely randomized. The selection from this library produced several double mutants that were active (Table 1). The R90L mutation was shown to be active when paired with a compensatory mutation at position 88 to serine, alanine, or glycine. Surprisingly, a lysine at position 88 could restore function when position 90 was changed to serine, showing that the placement of the cation is somewhat flexible. Kinetic analysis of these mutants showed that the removal of the cation in the R90G mutant severely diminished catalytic activity, while restoration of the cation by lysine at position 88 or 90 improved this activity by 1000-fold [9]. This result demonstrated that alternative active-site architectures are not only possible, but can be accessed experimentally. Once again, the use of a selection allowed the examination of many more mutants than could be tested by traditional screening methods. Remarkably, this study and the previous one stand alone as examples of complementation applied to enzymology. Given the power of genetic selections, it is clear that this strategy is underutilized.
Table 1 Kinetic parameters for selected chorismate mutase mutants [9]
Chorismate mutase
kcat (s−1 )
WT R90G R90K C88S/R90K C88K/R90S
46 0.00027 – 0.29 0.32
K m (l M)
kcat /K m (M−1 s−1 )
67 150 – 4300 1900
690 000 1.8 31 67 170
kcat /kuncat
kcat /K m kuncat (M−1 )
4.0×106 2.3×101 – 2.5×104 2.8×104
6.0×1010 1.6×105 2.7×106 5.8×106 1.5 ×107
5
6 Chemical Complementation Assays
2.4 Directed Evolution by Complementation
Complementation assays have proven particularly powerful tools for directed evolution of proteins with new functions. In the last section, we saw examples of protein folding and activity being linked to cell survival. These kinds of assays also lend themselves well to the selection of enzymes with new or improved properties. In a directed evolution experiment, plasmids that can complement the cellular defect are isolated from a starting library, mutagenized, and submitted to additional rounds of selection. This process proceeds iteratively until the desired activity is attained or convergence of the protein sequence is reached. Increasingly sophisticated methods for generating variation in the library have been developed in order to sample a wider array of sequences. In addition, selective pressures can be applied to the cells to increase the stringency of the selection, and thus glean only the most active members of the mutant library. This process attempts to recapitulate Darwinian evolution in a test tube, and has demonstrated the capacity to produce enzymes with novel and useful properties. The earliest example of directed evolution by complementation was essentially a systematic observation of artificial selection of bacteria. Hall used a lacZ-deletion strain of E. coli to select for spontaneous mutants that could grow with lactose as the sole carbon source [10]. The mutations all fell on the opposite side of the chromosome in a single gene, which was termed ebg (for evolved β-galac-tosidase). He first identified two phenotypes that arose from the initial round of selection: class I, which grew on lactose; and class II, which grew slowly on lactose but well on lactulose. He subjected the class I mutants to selection on lactulose and obtained class IV mutants, which grew well on both substrates. Genetic analysis suggested that class IV mutants possessed a combination of the class I and II mutations. The demonstration that enzymes with new functions could be obtained through “directed evolution” by a process that mimicked natural selection was an important step forward for the field. It is remarkable that this kind of analysis was carried out in the absence of genetic engineering methods that we now take for granted. While Hall relied on naturally occurring mutations, chemical and enzymatic methods of mutagenesis soon came into use for generating libraries of random mutants. Typically, reagents like sodium bisulfite and nitrous acid were used to chemically modify single-stranded DNA, such that misincorporations would occur when the complementary strand was synthesized, resulting in transitions or transversions [11]. Unfortunately, the resulting libraries were often biased, producing many mutations at “hot spots” and failing to mutagenize other positions. An important step forward was the generation of statistically random DNA libraries through the use of chemically synthesized oligonucleotides [12]. In 1990, Knowles and colleagues devised a selection system to find active triosephosphate isomerase (TIM) mutants from a pool of random variants [13]. They started with a known mutant (E165D) that was 1000-fold less active than wild-type. Then they generated a library of TIM mutants by using oligonucleotide primers that had been doped with non-wild-type bases throughout the length of the sequence, such that each
2 Complementation Assays
member of the library had an average of one or two mutations, but retained the original E165D mutation. This library was then introduced into an E. coli strain that lacked the TIM gene. Selections for functional enzymes were carried out on media that contained glycerol, lactose, or both. The wild-type could grow with glycerol as the sole carbon source, whereas the E165D mutant could only grow on lactose (and grew very slowly on glycerol). Mutants of E165D with increased activity were selected on glycerol-only media. Six “pseudo-revertants” that contained additional mutations were characterized and found to possess higher catalytic efficiency than the starting mutant enzyme. These new mutations were found to cluster near the active site. The best mutant, E165D/S96P, had a catalytic efficiency of 6 × 10 M−1 s−1 , which was 19-fold higher than the E165D mutant alone. By comparison, chemical mutagenesis only identified a single mutant with improved activity (S96P). In addition to increasing catalytic activity, directed evolution has been used to change the substrate specificity of an enzyme. For example, Yano and coworkers used a complementation assay to convert an aspartate aminotransferase to a branched-chain aminotransferase [14]. First, they engineered an E. coli strain with a knockout in the wild-type branched-chain aminotransferase gene and showed that this strain was not viable unless the growth medium was supplemented with Val, Ile, and Leu. The wild-type aspartate aminotransferase gene was mutated and recombined using DNA shuffling and then introduced into the E. coli selection strain. The authors point out that DNA shuffling, a PCR-based method, could be used both to introduce point mutations and to generate recombined gene products. Five rounds of mutagenesis and selection were carried out. At each round, 106 –107 mutants could be examined, and the stringency of the selection could be readily increased by changing the concentration of 2-oxovaline in the growth media, the expression level of the enzyme, or the incubation time. After the final round, a mutant enzyme with 13 point mutations was isolated that had a 105 -fold increase in kcat /K m for β-branched amino acids and a 30-fold decrease in kcat /K m for aspartate (Tables 2 and 3). The improvement in kcat /K m here is impressive and likely reflects the importance of being able to modulate the stringency of a selection – first to detect poor catalysts and finally to demand high catalytic efficiency. The authors also address the question of how hard it is to alter specificity. By testing each of the 13 mutations individually, they showed that only six point mutations are actually required for a 105 -fold increase in activity toward a new substrate.
Table 2 Evolution of branched-chain aminotransferase activity from enzyme aspartate amino-
transferase [14] Residue No.
7 34 37 41 126 139 142 269 282 293 297 311 326 363 381 387 397
Wild-type AV5A-1 AV5A-7 AV5A-7(6)
E N I K K V D M V D M N R D M
S G G G
N T T T
A T
R C
A V V
N S S S
S G
M K
S G
A V
V
M
L L
L
7
2-Oxovaline L-Aspartate Oxaloacetate
L-Valine
Substrate
46 28 220 650
– 100 4.5 0.035
<0.001 0.057 1.2×105 2.3×107
– 0.0057 550 800
kcat (s−1 )
kcat /km (M−1 s−1 )
kcat (s−1 )
km (mM)
AV5A-7
Wild-type
Table 3 Kinetic parameters of evolved branched-chain aminotransferases [14]
420 3.8 68 0.52
km (mM) 110 7.4×103 3.2×103 1.3×106
kcat /km (M−1 s−1 )
– 60 70 1400
kcat (s−1 )
AV5A-76
– 33 11 0.46
km (mM)
14 1.8×103 6.4×103 3.0×105
kcat /km (M−1 s−1 )
8 Chemical Complementation Assays
3 Development of Chemical Complementation
Interestingly, their results also showed that all six mutations are beneficial on their own – raising the possibility that each position in a protein can be randomized independently. In a related application, the Diversa Corporation (San Diego, CA) has applied a mutagenesis and selection technique, termed Gene Site Saturation MutagenesisTM (GSSM), to evolve proteins that have improved thermostability and stereoselectivity [15]. In this approach, each position in the protein is randomized independently. With library sizes approaching 108 variants, it is actually possible to randomize all positions in the protein in the same experiment. On the challenging problem of changing the substrate specificity of an enzyme, directed evolution approaches have shown great promise. For example, Schultz and colleagues have used genetic selections to generate aminoacyl-tRNA synthetases that accept unnatural amino acids as substrates [16]. In addition, Pabo and coworkers have adapted a bacterial two-hybrid assay [17] to evolve zinc finger variants with defined DNA-binding specificities [18]. In our opinion, the most successful examples of directed evolution to change substrate specificity have employed complementation assays. Since the advent of catalytic antibodies, the hope has been that complementation assays could be used to improve the efficiency of those first-generation catalysts. However, antibodies have proven difficult to express in E. coli and Saccharomyces cerevisiae and thus difficult to improve by selection in these organisms [19–21]. With recent successes in the de novo computational design of enzymes [22–25], it seems we may be able to return to this basic approach but with better-behaved protein scaffolds. Computational design could be used to generate the first-generation catalyst, and selection via a complementation assay could then be employed to improve the activity of that catalyst. To achieve this lofty goal, however, general complementation assays, which can link enzyme catalysis to cell growth in a way that is reaction-independent, will be needed. In the next section, we describe a system based on the yeast three-hybrid assay that fulfills this requirement.
3 Development of Chemical Complementation 3.1 Introduction
As we saw in the previous section, complementation assays are very powerful, but limited to a few reactions that are inherently screenable or selectable. What is needed is an assay that has the power of a complementation growth selection but where the chemistry can be adapted to a reaction of interest. Thus, several laboratories have sought to develop high-throughput assays for enzyme catalysis that are general and can be applied to a broad range of chemical reactions [26–32]. Our laboratory recently reported such an assay, which uses the yeast three-hybrid assay to link enzyme catalysis to reporter gene transcription in vivo [2, 33]. We called this assay “chemical complementation” because it resembles a complementation assay
9
10 Chemical Complementation Assays
in that enzyme activity is required for the cellular phenotype, but the chemistry being complemented is controlled by the chemical linker between two small-molecule ligands and thus can be readily varied. Chemical complementation detects enzyme catalysis of bond formation or cleavage reactions based on covalent coupling of two small molecule ligands in vivo. The heterodimeric small molecule reconstitutes a transcriptional activator, turning on the transcription of a downstream reporter gene. Bond formation is detected as activation of an essential reporter gene; bond cleavage is detected as repression of a toxic reporter gene. The assay can be run on a large library of enzymes as a growth selection where only cells containing a functional enzyme survive. A number of well-characterized reporters for both positive and negative selections, as well as screens, have been reported [34]. The assay can be extended to new chemistry simply by synthesizing small molecule heterodimers with different chemical linkers as the enzyme substrates. All of the other components of the readout system remain constant while the chemistry changes. This flexibility extends this assay to a wide range of reactions, with a variety of substrate specificities and regio- and stereoselectivities. A distinct advantage of this general method is that a new assay would not have to be developed for each new enzyme target. In this section, we describe the development of chemical complementation. Since chemical complementation hinges on the integrity of the transcriptional readout system, we initially set out to devise a robust yeast three-hybrid assay. We then adapted this three-hybrid assay into a reaction-independent enzyme assay, using β-lactamase as a model enzyme. We begin by considering three-hybrid assays generally, focusing on the dexamethasone-methotrexate (Dex-Mtx) system that was developed in our laboratory. Then we present the basic chemical complementation system and consider technical improvements made to the assay since its inception.
3.2 Three-hybrid Assay
Obviously, the key to chemical complementation working well is the small molecule three-hybrid system. Several three-hybrid systems and CIDs had already been reported when we began designing the chemical complementation assay. These systems supported the feasibility of our strategy. Nonetheless, we decided to begin by designing a new CID pair to provide a yeast three-hybrid system that would simplify the synthesis of the small-molecule dimerizer and give a strong transcriptional readout (Figure 2).
→ Fig. 2 Molecules used for yeast threehybrid systems. Monomeric small molecules (1–4) that bind with high affinity to their respective receptor proteins are frequently employed in yeast three-hybrid systems.
Heterodimeric small molecules (5–7) that can reconstitute hybrid transcriptional activators are constructed by connecting monomers with a long, flexible linker (see text).
3 Development of Chemical Complementation
Fig. 2
11
12 Chemical Complementation Assays
3.2.1 Original Yeast Three-hybrid System Licitra and Liu reported the first yeast three-hybrid assay [35]. This assay was an extension of the yeast two-hybrid assay reported by Fields and Song for the highthroughput detection of protein–protein interactions [36]. The yeast two-hybrid assay is based on the observation that eukaryotic transcriptional activators can be dissected into two functionally independent domains, a DNA-binding domain (DBD) and a transcription activation domain (AD), and that hybrid transcriptional activators can be generated by mixing and matching these two domains [37, 38]. It seems that the DNA-binding domain only needs to bring the activation domain into the proximity of the transcription start site, suggesting that the linkage between the DBD and the AD can be manipulated without disrupting activity [39]. Thus, the linkage in the yeast two-hybrid assay is the non-covalent bond between the two interacting proteins. In the yeast three-hybrid assay, the linkage between the DBD and AD is a small-molecule chemical inducer of dimerization (CID) [40]. The CID consists of two ligands that are covalently linked, and that bind to their respective receptor proteins. One receptor is fused to the DBD and the other to the AD, such that when the CID binds to both receptors at the same time, the complete transcription factor is reconstituted (Figure 3). In Licitra and Liu’s yeast three-hybrid system, the transcription factor Gal4 was split into its DBD and AD. The DBD of Gal4 was fused to the ligand-binding domain of rat glucocorticoid receptor (GR), which binds to ligand 1 (dexamethasone, Dex) with nanomolar affinity. The AD of Gal4 was fused to yeast FK506 binding protein (FKBP12), which binds to the ligand 3 (FK506) with nanomolar affinity. Both 1
Fig. 3 Dex-FK506 yeast three-hybrid system. Heterodimeric ligand 5 (Dex-FK506) bridges a DNA-binding protein–receptor protein chimera (Gal4 DBD-GR) and a transcription activation protein–receptor protein chimera (Gal4 AD-FKBP12), effectively reconstituting a transcriptional activator and stimulating
transcription of a reporter gene. The levels of reporter transcription serve as an indicator of the efficiency of Dex-FK506-induced protein dimerization [35]. AD = activation domain; GR = glucocorticoid receptor; DBD = DNA-binding domain.
3 Development of Chemical Complementation
and 3 can be chemically modified without disrupting binding to their respective receptors, so heterodimer 5 (Dex-FK506) served as the CID. The reporter gene consisted of the Gal4 promoter placed upstream of the LEU2 gene. The use of LEU2 as a reporter gene linked reconstitution of the Gal4 transcription factor to survival of a leucine-auxotrophic yeast strain. In a proof of principle experiment, Licitra and Liu selected FKBP12 from a Jurkat cDNA library fused to the Gal4 AD. Despite its promise, there were two drawbacks to the Dex-FK506 yeast three-hybrid assay. First, polyketide 3 is a complex natural product – expensive to obtain in large quantities, requiring several steps for chemical modification, and sensitive to many standard chemical transformations. This synthetic complexity was considered a major drawback for chemical complementation, where the linker between 1 and 3 would need to be a complex substrate rather than a simple commercial linker. Second, the use of a Gal4-based system precluded the use of the regulatable GAL promoter for expression of the three-hybrid components or the enzyme in the eventual chemical complementation system. In our system, we sought to address these issues. 3.2.2 Dexamethasone–Methotrexate Yeast Three-hybrid System The yeast three-hybrid assay is based on the interaction between two small molecule ligands and their respective protein receptors. In theory, any ligand–receptor pair could serve in this capacity, but certain pairs have advantages for this system. The ideal ligand–receptor pair would have a very low dissociation constant to increase the lifespan and concentration of the three-hybrid complex and thus increase the levels of reporter gene transcription. The ideal ligand would be small, membranepermeable, and readily derivatized without disrupting its binding to the receptor. Its receptor would be monomeric and able to be expressed as a fusion protein. Methotrexate and dihydrofolate reductase (DHFR) meet these criteria perfectly. Ligand 2 (methotrexate, Mtx) is commercially available and can be synthesized readily from simple starting materials. DHFR fusion proteins have been used for a variety of biochemical applications [41, 42] due to the picomolar dissociation constant of 2 for DHFR [43]. These properties made Mtx-DHFR an ideal choice for replacement of FK506-FKBP12 in the original yeast three-hybrid system. For the second pair, we chose to retain the dexamethasone–glucocorticoid receptor pair from Liu’s system. The rat glucocorti-coid receptor binds 1 with a K D of 5 nM, and mutants of GR with increased affinity for 1 have been isolated [44]. The commercially available steroid 1 has been used extensively as a cell-permeable small molecule to regulate the activity and nuclear localization of GR fusion proteins in vivo [45]. The heterodimer 6 (Dex-Mtx) thus served as a synthetically tractable heterodimer that could easily be modified to add the specialized linker for chemical complementation. Another important consideration for yeast three-hybrid systems is the relative expression of the receptor–fusion proteins. The GAL promoter is the most tightly regulated promoter in yeast and can be induced to high levels with galactose, completely repressed by glucose, and tuned with varying ratios of galactose and glucose [46]. Of course, this promoter could not be used in previous yeast three-
13
14 Chemical Complementation Assays
hybrid systems that employed the Gal4 system because Gal4 is part of the GAL promoter system. Controllable expression of the protein fusions offers a number of advantages, particularly for molecular evolution. The enzyme can be expressed at high levels during early rounds of selection to obtain a low level of the desired activity. The enzyme expression levels can then be decreased during later rounds of selection to discriminate the most active enzyme variants. Inducible systems also offer a rapid means to check for false positives, by rescreening for the ability to activate reporter genes in the absence of inducer. In order to make a yeast three-hybrid system that would permit the use of the GAL promoter, we employed the orthogonal LexA-B42 system similar to that of Brent and coworkers [47]. The E. coli repressor LexA served as the DBD, while the E. coli B42 “acid-blob” domain served as the AD. Neither of these components is known to interfere with the yeast regulatory systems. The selective markers on the LexA and B42 plasmids were changed from ampicillin-resistance to kanamycin-resistance in anticipation of their use in chemical complementation (vide infra). This modification eventually allowed the use of a different marker on the enzyme vector to facilitate its isolation following the selection. Finally, lacZ and LEU2 were used for the reporter genes so that the assay could either be run as a screen or a selection. Having made these improvements to the system, we showed that heterodimer 6 can activate lacZ transcription in vivo [48]. On the basis of previous studies showing that lacZ transcription levels correlate with the strength of protein-protein interactions in the yeast two-hybrid assay [47], we expected β-galactosidase activity to be a good indicator of Dex-Mtx-induced protein dimerization. Using standard β-galactosidase activity assays we determined that optimal transcription of lacZ occurred at concentrations of 6 greater than 1 µM. Control experiments established that lacZ transcription was dependent on 6. Only background levels of β-galactosidase activity were detected when 6 was omitted. Competition of 6 with a 10-fold excess of 2 reduced Dex-Mtx-dependent lacZ transcription to near background levels. A ten-fold excess of ligand 1, however, did not affect Dex-Mtxdependent lacZ transcription, and higher concentrations of 1 were toxic to the yeast cells. This result may be due to differences in cell permeability between 1 and 6 or may suggest that LexA-DHFR, but not B42-GR, is the limiting reagent. In addition, when either or both receptors were deleted, only background levels of lacZ transcription were detected. The fully assembled system produced a strong transcriptional response in the presence of 6 and provided a robust system in which to read out enzymatic activity in chemical complementation. 3.2.3 Technical Considerations Since the introduction of the yeast two-hybrid assay in 1989, there have been a number of improvements to the basic system. Different DNA-binding and activation domains have been introduced. Convenient vectors with different bacterial drugresistance markers, yeast origins of replication, and yeast auxotrophic markers are now commercially available. An array of common yeast genetic markers have been introduced as reporters, making it possible to test large pools of protein variants (about 106 ) using growth selections. In addition to the simple activator system,
3 Development of Chemical Complementation
reverse- and split-hybrid systems have been developed to detect the disruption of protein-protein interactions, and most recently a transcriptional repressor-based system was reported. These improvements to the yeast two-hybrid system are directly applicable to the three-hybrid system, and have been given detailed treatment in several recent reviews [34, 47]. The CIDs are an additional complexity of the yeast three-hybrid system and bring their own challenges. Two important considerations that have been examined experimentally in our laboratory are the effect of the CID linker on transcription activation and the species-dependence of the receptor proteins [49]. To carry out a high-throughput screen with the Dex-Mtx yeast three-hybrid assay, it proved necessary to stabilize the transcription readout. The original Dex-Mtx system showed both a high number of false positives and false negatives, and an inconsistency in the levels of small-molecule-induced transcription activation. It is generally assumed that this variability in the transcription readout comes from the use of multiple 2µ plasmids. For the Dex-Mtx yeast three-hybrid system, the DBD fusion, AD fusion, and reporter gene are all on 2µ plasmids, and for the chemical complementation system a fourth 2µ plasmid would then be used to express the enzyme. Not only can these plasmids recombine with one another since they have large regions of sequence identity, but also the distribution of plasmid copies likely varies in each cell. The common remedy is to integrate the genes of interest into the chromosome. However, doing so decreases the number of copies of each gene from as many as 60 to one or two (haploid or diploid), which may not be sufficient for a strong transcription readout. Thus, all combinations of the DBD, AD, and reporter gene were integrated into the chromosome, and the resulting strains were tested for their levels of small-molecule-induced lacZ transcription [50]. The AD and reporter gene were found to be the limiting reagents, such that the DBD and any one other component could be integrated without affecting the levels of transcription activation significantly. Use of a Dex-Mtx yeast three-hybrid strain in which the DBD and reporter gene were integrated reduced the false positive rate from 20% to 3%, a significant improvement. Interestingly, integration of the DBD actually increased the levels of transcription activation, which arguably is consistent with optimizing the concentration of DBD bound both to DNA and to the CID. For all of the applications carried out in our laboratory to date, it has proven necessary to integrate one or more components of the yeast three-hybrid system into the chromosome, and it is not simply predictable which integrated strain will be optimal for a given application. Despite the widespread use of yeast two- and three-hybrid assays, very little is known about how transcription activation levels correlate with the strength of the binding interaction. Golemis and coworkers undertook a systematic study of the relationship between protein–protein interaction and the yeast two-hybrid readout using homo- and heterodimers for which the dissociation constants had been determined in vitro [47]. From their studies they concluded that in the yeast two-hybrid system it is not possible to detect interactions with dissociation constants above 1 µM. To extend this work to the yeast three-hybrid system, we set out to examine the correlation between the strength of the binding interaction and the
15
16 Chemical Complementation Assays
transcription readout using a series of mutants of a well-studied li-gand-receptor pair [51]. We chose to carry out these studies in the Mtx-DHFR yeast three-hybrid system described above. The sensitivity of the three-hybrid assay should be a function of the ligand-receptor pair being used as an anchor. Since 2 binds to DHFR with low picomolar affinity, the use of 2 as an anchor should facilitate the detection of relatively weak interactions at the other end. To characterize the sensitivity of the Mtx yeast three-hybrid system, we considered the interaction between FKBP12 and 4 (synthetic ligand for FKBP12, SLF). The nanomolar dissociation constant of this pair left ample room for the design of FKBP12 mutants with reduced but measurable affinity. First, we demonstrated that 7 (Mtx-SLF) could activate lacZ transcription in a yeast three-hybrid system consisting of LexA-DHFR and B42FKBP12. Then, to obtain FKBP12 variants with a wide range of binding affinities for 4, we designed point mutants of FKBP12 on the basis of previous biochemical studies and analysis of the high-resolution crystal structure of FKBP12 with 4 and 3 [52, 53]. We then determined the dissociation constants for 4 and the FKBP12 mutant variants by using a fluorescence polarization assay [54]. Finally, we inserted these FKBP12 point mutants into the yeast three-hybrid assay and measured the levels of transcription activation using a lacZ reporter gene. This experiment showed that binding affinity in the yeast three-hybrid system correlates with transcription, but only over a dynamic range of an order of magnitude. Furthermore, this study demonstrated that the yeast three-hybrid system has a K D cutoff of 50 nM when using the Mtx-DHFR anchor. Interestingly, another study, in which the Mtx-DHFR anchor is used to screen a cDNA library for drug targets, estimates the detection limit to be low micromolar [55]. Since the readout is affected by the placement of the linker on the ligand and the folding and localization of the receptor, there may not be a simple correspondence between affinity and transcription for every interacting pair.
3.2.4 Other Three-hybrid Systems The appeal of high-throughput screening methods to detect protein–small molecule interactions has prompted the development of yeast three-hybrid systems that use alternative strategies for transcription activation. Since any ligand–receptor pair can potentially be used in the three-hybrid system, some groups have looked to exploit increasingly high-affinity interactions as anchors. GPC Biotech introduced a drug–target interaction into a yeast three-hybrid system anchored by Mtx-DHFR. They then used this system to screen cDNA libraries for cyclin-dependent kinases (CDKs) that bind selectively to a given CDK inhibitor [55, 56]. Peterson and coworkers developed a yeast three-hybrid system anchored by the femtomolar biotin–streptavidin interaction [57]. Taking this approach to the extreme, Johnsson and colleagues designed a system based on covalent bond formation between the CID and the DBD, using the human DNA-repair enzyme O6 -alkylguanine-DNA alkyltransferase (hAGT) and its substrate, O6 - methylguanine [58]. The DBD-hAGT fusion served as one “receptor” and DHFR-AD as the other, while the CID consisted of 2 linked to O6 -benzylguanine.
3 Development of Chemical Complementation
Another interesting possibility is the use of a small molecule to activate transcription, without the need for a protein AD. Mapp and coworkers have produced a system in which the CID consists of 2 linked to derivatives of the small molecule isoxazolidine, which interacts directly with the transcription machinery to activate reporter gene expression. Three-hybrid approaches have generated much interest as high-throughput screening tools, but the validation of these methods will require their application to practical problems of protein– small molecule interactions. 3.3 Chemical Complementation
To implement chemical complementation, we needed a robust small molecule yeast three-hybrid assay and an enzyme–substrate pair that could be programmed into the assay. Our optimized Dex-Mtx yeast three-hybrid system produced a strong small-molecule-dependent transcription readout. All we needed to do was introduce the enzyme on a plasmid and the substrate in the linker of 6. Having developed the assay around a model reaction, we then established it as a high-throughput screen for enzyme activity (Figure 4).
Fig. 4 Molecules used in the development of chemical complementation. The cephem core $-lactam (8) is a substrate for cephalosporinases. Mtx-CephemDex (9) (MCD) is both a substrate for cephalosporinases and a heterodimeric
ligand for the Dex-Mtx yeast three-hybrid system. Nitrocefin (10) is a chromogenic cephalosporinase substrate. o-Nitrophenyl-$D-galactopyranoside (11) (ONPG) and X-Gal (12) are chromogenic substrates of the $galactosidase reporter gene (lacZ) [2].
17
18 Chemical Complementation Assays
3.3.1 Selection Scheme and Model Reaction To adapt our Dex-Mtx yeast three-hybrid system to read out enzyme catalysis of bond cleavage reactions, we replaced the linkage between Mtx and Dex in 6 with the substrate for the reaction and added an enzyme as a fourth component to the system. We chose cephalosporin hydrolysis by the Enterobacter cloacae P99 cephalosporinase as a simple cleavage reaction [59] to demonstrate the selection strategy (Figure 5). Cephalosporins are β-lactam antibiotics, and cephalosporinases are the bacterial resistance enzymes that hydrolyze and, therefore, inactivate these antibiotics. The P99 cephalosporinase is well-characterized biochemically and structurally [60, 61] and the synthesis of cephem compounds (the family described by 8) is established [62]. We chose to incorporate Mtx and Dex at the C3’ and C7 positions, respectively, of the cephem core (R3 and R1 of 8). Cleavage of the β-lactam bond in the cephem moiety of heterodimer 9 (Mtx-Ce-phem-Dex, MCD) results in expulsion of the leaving group at the C3’ position, effectively breaking the bond between Mtx and Dex (Figure 5a). Thus, heterodimer 9 should reconstitute the transcriptional activator, causing transcription of the lacZ reporter gene in the yeast three-hybrid assay. When the cephalosporinase enzyme is expressed, however, the cephem linkage
→ Fig. 5 Chemical complementation links enzyme catalysis to reporter gene transcription. Cephalosporin hydrolysis provides a simple cleavage reaction to demonstrate the complementation strategy. (a) Cephalosporin hydrolysis by a cephalosporinase enzyme. Cephalosporinases are serineprotease enzymes and catalyze the hydrolysis of cephalosporin antibiotics 8 by means of an acyl-enzyme intermediate. Hydrolysis of the $-lactam bond in 9 results in expulsion of the leaving group at the C3 position of the cephem core, effectively breaking the bond between Mtx and Dex. (b) Cephalosporin hydrolysis by the cephalosporinase enzyme disrupts transcription of a lacZ reporter gene. Substrate 9 dimerizes a LexA DNA-binding domaindihydrofolate reductase (LexA-DHFR) and a B42 activation domain-glucocorticoid receptor (B42-GR) fusion protein, activating transcription of a lacZ reporter gene. Addition of active cepha-losporinase enzyme results in cleavage of substrate 9 and disruption of lacZ transcription. (c) X-Gal plate assays of cephalosporinase-dependent Mtx-Cephem-Dex (MCD)-induced lacZ transcription. Yeast strains containing a lacZ reporter gene were grown on indicator plates containing 12 with or without Mtx-linker-Dex molecules as indicated. Columns 1–4 correspond to yeast strains containing a LexA
DNA-binding domain fusion protein, a B42 activation domain fusion protein, and enzyme, as follows: Column 1, LexA-DHFR, B42, P99 cephalosporinase. Strains in column 1 lack GR and are used as negative controls. Column 2, LexA-DHFR, B42-GR, no enzyme; column 3, LexA-DHFR, B42-GR, P99 cephalosporinase; and column 4, LexADHFR, B42-GR, P99 S64A cephalosporinase. The rows correspond to individual X-Gal plates, which have different small molecules as indicated: No MD, No substrate; MD, 1 :M 6; MCD, 10 :M 9. (d) o-Nitrophenyl-$d-galactopyranoside (ONPG) liquid assays. Yeast strains expressing the LexA-DHFR and B42-GR fusion proteins and containing lacZ reporter gene and expressing no enzyme (left), P99 cephalosporinase (center), or P99 S64A cephalosporinase (right) were grown in liquid culture and assayed for $-galactosidase activity with 11 as a substrate. The liquid culture contained small molecules as indicated. The assays were done in triplicate. Hydrolysis rates of 11 are reported as nmol min−1 mg−1 of total protein, and the error bars for the specific activity correspond to the standard error. Strains containing the active P99 cephalosporinase showed an eightfold decrease in the level of lacZ transcription relative to strains containing the inactive S64A variant [2].
3 Development of Chemical Complementation
Fig. 5
19
20 Chemical Complementation Assays
should be cleaved, and protein dimerization and transcription of the reporter gene should be disrupted (Figure 5b). The next step was to add the enzyme to the yeast three-hybrid system and find conditions where we could observe enzyme-dependent transcription. To this end, the yeast three-hybrid strain was re-engineered so that the expression of the transcriptional activator fusion proteins and the enzyme could be regulated independently. The LexA-DHFR and B42-GR proteins were placed under the control of the fully regulatable GAL1 promoter; the P99 cephalosporinase, under the repressible MET promoter. In addition, the enzyme was expressed from a plasmid containing a spectinomycin-resistance marker, whereas the transcriptional activator fusion proteins were expressed from vectors containing kanamycin-resistance markers, to facilitate isolation of the plasmid encoding the enzyme at the end of a screen or selection. Finally, no ampicillin-resistance markers were used because these markers encode β-lactamase enzymes, which, if expressed, could cleave the MCD substrate. First, we established independently that the cephalosporinase was being expressed in an active form in the yeast cells by using 10 (nitrocefin), a known chromogenic substrate for the P99 cephalosporinase [23]. We then developed conditions where expression of the P99 cephalosporinase disrupted MCD-mediated lacZ transcription (Figure 5c, column 3). These results were confirmed by using quantitative assays in liquid culture with 11 (o-nitrophenyl-β-D-galactopyranoside, ONPG) and demonstrated that the three-hybrid assay could be used to detect cephalosporinase activity (Figure 5d).
3.3.2 Results To confirm that the observed activation of the reporter gene by heterodimer 9 was truly enzyme-dependent, we carried out a number of control experiments. Using standard β-galactosidase assays both on plates and in liquid culture, we measured the levels of lacZ transcription in yeast strains expressing different LexA and B42 fusion proteins, enzymes, and a lacZ reporter gene (Figure 5c, d). Transcription in each of these strains was tested in the presence of no small molecule, 6 (MD), or 9 (MCD). As can be seen, lacZ transcription in the strain expressing LexA-DHFR and B42-GR is small-molecule-dependent (Figure 5c, column 2). Addition of the wild-type P99 cephalosporinase disrupts this small-molecule-induced transcription activation when the cells are grown in the presence of 9, but not 6 (Figure 5c, column 3, and 5 d). Importantly, expression of the P99 cephalosporinase has little effect on the levels of Mtx-Dex-activated lacZ transcription (Figure 5c, row 2, and 5 d). Another important control was to show that the disruption of lacZ transcription was due to turnover of the cephem linkage and not merely due to sequestration of substrate 9 by the cephalosporinase enzyme. To address this question, we compared the activity of the wild-type P99 cephalosporinase in this assay with that of the inactive mutant S64A (Figure 5c, column 4, and 5 d). No detectable change in the level of lacZ transcription was observed for cells expressing the inactive enzyme. Together, these results established that the change in transcription of the lacZ reporter gene is due to enzyme-catalyzed turnover of substrate 9.
3 Development of Chemical Complementation
One criterion for a complementation assay is its ability to select enzymes with a desired activity from a large cDNA or mutant library. To determine the usefulness of our system for high-throughput screening, an experiment was carried out to isolate active enzymes from a large pool of inactive mutants. The yeast selection strain was transformed with a mixture of plasmids containing the inactive S64A mutant (95%) or the wild-type P99 cephalosporinase (5%). The clones were grown on medium containing 9 and 12 (X-Gal) for blue-white screening. Five blue and five white colonies were analyzed: all five blue colonies contained the expected inactive mutant enzyme, while four of five white colonies contained the wildtype active enzyme. This result indicated that the assay has a low false-positive rate, demonstrating that chemical complementation can be used reliably to screen libraries of proteins in vivo on the basis of catalytic activity. 3.3.3 General Considerations An important parameter for any enzyme assay is its ability to detect enzymes of varying catalytic activity. In an evolution experiment, for example, one might wish to select for poor catalysts in early rounds and then increase the stringency to detect better catalysts in later rounds. For chemical complementation, the key question was whether differences in transcription of the reporter correlated with the catalytic efficiency of the enzyme. To address this question, mutant P99 cephalosporinases with a range of catalytic activity were constructed and assayed. LacZ reporter gene activation was indeed found to correlate with catalytic activity, and the system was able to distinguish between enzymes spanning three orders of magnitude in kcat /K m [63]. This dynamic range makes the assay useful for many applications, including directed evolution and enzymology. To improve this dynamic range, switching to a growth selection strategy will be necessary. In such a system, the reporter gene would be an enzyme that catalyzes the synthesis of an essential metabolite. The advantages of this system are twofold. First, small changes in transcription are amplified into large differences in growth advantage, allowing detection of slight differences in enzymatic activity. Second, the assay can be tuned to detect enzymes in a particular catalytic range by providing more or less of the essential metabolite in the growth medium. Alternatively, changing the expression level of the enzyme could produce a similar effect. Efforts are currently underway in our laboratory to develop conditions that improve the dynamic range of the assay. 3.3.4 Related Methods Since a general assay for enzyme activity would be an invaluable tool for drug discovery, directed evolution and proteomics, many groups have attempted to develop such a system. Some of these approaches are also based on the yeast three-hybrid, but use different selection strategies. Benkovic and colleagues have reported a three-hybrid system in bacteria in which the presence of an active enzyme results in transcription of the arabinose operon. Enzyme activity is linked to transcription by using unreacted substrate to compete with a CID for binding to the yeast threehybrid receptor fusion proteins [30]. In this example, scytalone dehydratase (SD) can process a scytalone analog, rendering the substrate unable to bind to the yeast
21
22 Chemical Complementation Assays
three-hybrid components. Thus, when an active enzyme is present, the three-hybrid is reconstituted and transcription of the arabinose operon ensues. In a totally different strategy, Peterson and coworkers have developed a threehybrid-based system to detect tyrosine kinases. In this system, the peptide substrate of the kinase is covalently fused to the DBD by in-frame translation and takes the role of the CID. Recognition and phosphorylation of the peptide by an active kinase converts the peptide into a ligand recognized by an SH3 domain, which acts as a receptor [64]. Further, this group has reported a system that makes transcription dependent on the enzymatic biotinylation of a protein substrate [65]. Here, the CID consists simply of biotin, which the enzyme BirA covalently links to a peptide fused to the AD. The biotinylated peptide then binds to the DBD-streptavidin fusion to reconstitute the transcriptional activator. Additional methods that use small molecules to control transcription have been put forth, including the “chemical complementation” assay of Doyle and coworkers [66]. Kodadek and colleagues [67] have developed a complementation system for selecting peptides that bind to proteins of interest using a negative selection, where binding of a peptide to its target protein creates a transcriptional repressor. While these methods have not been applied to enzyme catalysis, they point to a growing interest in the field of high-throughput assays that are based on complementation strategies.
4 Applications of Chemical Complementation 4.1 Introduction
To explore the flexibility and power of chemical complementation, we have used the system to study inhibitor interactions with a β-lactamase and to improve the catalytic activity of a glycosynthase. In the first example, the system was used to screen a library of mutant β-lactamases for their interaction with third-generation cephalosporin antibiotics, in order to dissect residues that contribute to antibiotic resistance. In the second example, we added carbohydrate chemistry to the repertoire of this assay and carried out a selection for glycosynthase enzymes with improved catalytic function. In this section, we will present the results of these two applications, which exemplify the ease with which this system can be adapted to new chemistry and used to answer complex biochemical questions. 4.2 Enzyme–Inhibitor Interactions
Although chemical complementation was designed to read out the activity of an enzyme, a logical extension of this system would be to identify or evaluate inhibitors of an enzyme. This approach is particularly attractive as a means of drug discovery
4 Applications of Chemical Complementation
using a library of small molecules to screen for an enzyme inhibitor. Alternatively, the effect of a single inhibitor on the function of an enzyme could be probed systematically by screening it against a series of mutant enzymes. In contrast to the classical approach of generating point mutants and correlating them with altered binding, chemical complementation can rapidly screen hundreds of thousands of targeted or random mutants in a functional assay to qualitatively assess the effectiveness of an inhibitor. 4.2.1 Rationale β-Lactam antibiotics are the largest class of antibacterial agents administered worldwide, and the continuing development of resistance by bacterial strains to these drugs is a major public health concern. Resistance is most often mediated by the expression of β-lactamases and cephalosporinases, which inactivate the antibiotics by hydrolyzing the β-lactam bond [68]. The β-lactamase reaction proceeds through a covalent acyl-enzyme intermediate, which in the case of resistant enzymes is rapidly hydrolyzed to regenerate the catalyst. Therefore, significant resources have been devoted to the development of highly functionalized β-lactam antibiotics that can trap these enzymes in the acyl-enzyme state, thus making them unavailable for β-lactam hydrolysis. Mutant β-lactamases that confer resistance to third-generation cephalosporins such as 13 (cefotaxime) and 15 (ceftazidime) have recently been isolated. Structural studies of these mutants suggest that residues contacting the C7 substituent of the cephalosporin are important determinants of resistance [69, 70]. In particular, mutating tyrosine 221 (Y221) of E. coli cephalosporinase AmpC to alanine or glycine appears to remove a steric constraint on the acyl-enzyme intermediate, allowing it to adopt a favorable conformation for deacylation. Additionally, biochemical evidence has shown that mutations at glutamic acid 217 (E217) and alanine 220 (A220) may also contribute to cephalosporin resistance [71]. In order to understand the molecular basis of antibiotic resistance, we sought to use chemical complementation to measure the effect of inhibitors on β-lactamases containing mutations at positions that contact the C7-substitutuent [72]. Since genetic assays for β-lactamase activity already exist, we also used this experiment as a way to evaluate the precision and accuracy of chemical complementation for high-throughput screening. First, we made a library of enzymes in which position 221 was mutated to all 20 amino acids and compared the inhibition profiles obtained by chemical complementation with the in vitro activity of these mutants. Subsequently, we used chemical complementation to screen a library in which E217, A220, and Y221 were all randomized for enzymes that were active against third-generation cephalosporins. 4.2.2 Screen Strategy In order to determine the inhibition profiles of mutant β-lactamases, we first set out to demonstrate that the assay could distinguish active, inactive, susceptible and resistant enzymes. Since the third-generation cephalosporins 13–16 form a long-lived acyl-enzyme intermediate with the β-lactamase, it was envisioned that
23
24 Chemical Complementation Assays
Fig. 6 Chemical complementation as a screen for enzyme inhibition. The wild-type $-lactamase (left) is unable to cleave the cephem bond in heterodimer 9 due to the presence of an inhibitor bound in its active site, resulting in transcription of the lacZ
reporter gene. Mutant $-lactamases (right) that can cleave or avoid the inhibitor can be identified using the chemical complementation assay because the enzyme cleaves the cephem bond of heterodimer 9, turning off transcription of the lacZ reporter gene.
these inhibitors would also block cleavage heterodimer 9 by P99 cephalosporinase, resulting in an increase in transcription of the lacZ reporter gene. Further, mutant β-lactamases, such as the GC1 variant cephalosporinase, that are not inhibited by the third-generation cephalosporins would be able to cleave heterodimer 9, and thus could be detected based on a decrease in transcription of the lacZ reporter gene (Figure 6). This strategy should be general for enzymes that proceed through a covalent enzyme intermediate. Initially, to establish the feasibility of this screen, we determined the ability of chemical complementation to distinguish the inhibition profiles of wild-type P99 cephalosporinase and a known resistance mutant GC1 (Figure 7). Both of these enzymes were able to cleave heterodimer 9, resulting in background levels of lacZ transcription. When the third-generation cephalosporins 13 (cefotaxime) or 14 (cefuroxime) were added to the growth medium, cells expressing the P99 cephalosporinase showed increased transcription of the lacZ reporter, indicating that the enzyme was being inhibited. Cells expressing the GC1 variant, however, showed relatively little change in lacZ transcription, indicating that this enzyme was available to cleave heterodimer 9 and thus not inhibited. By contrast, as expected, the presence of cephamycin 16 (cefoxitin) strongly inhibited both enzymes,
4 Applications of Chemical Complementation
Fig. 7 Inhibition profiles of two $-lactamases with cephalosporins. Chemical com-plementation can distinguish the inhibition of the P99 cephalosporinase by a series of second- and third-generation cephalosporins from that of the extended spectrum GC1 variant. (a) o-Nitrophenyl$-d-galactopyranoside (ONPG) liquid assay. Screening strains expressing either P99 or GC1 are assayed for lacZ transcription in liquid medium containing 5 :M heterodimer 9 in the presence or absence of 100 :M 13 (cefotaxime, CFT), 14 (cefuroxime, CFR), or 16 (cefoxitin, CFX). Signals are reported as fold-activation relative to
basal transcription of the lacZ reporter gene in the absence of heterodimer 9. P99 cephalosporinase and GC1 variant both effectively cleave heterodimer 9 in the absence of any added inhibitors, reducing the lacZ signal down to background levels. P99 cephalosporinase is clearly inhibited by all three inhibitors, as shown by the four-fold increase in its lacZ transcription signal, while GC1 variant is only inhibited by 16. Error bars represent the standard error for three separate transformants. (b) Molecules described in this study. Cephalosporin 15 (ceftazidime, CFZ) was not tested.
25
26 Chemical Complementation Assays
resulting in large increases in transcription. Thus, chemical complementation could successfully classify enzymes into three phenotypes: inactive, active and inhibited, or active and resistant. This convincingly showed that the assay could be used as an inhibitor screen. 4.2.3 Enzyme Library Screen Having established the feasibility of the inhibitor screen, we then carried out a pilot screen at position Y221 followed by a high-throughput screen of residues contacting the C7 substituent of the cephalosporin antibiotic. The Y221 library was characterized thoroughly to evaluate the precision and accuracy of the chemical complementation assay. The high-throughput assay then extended the assay to a large scale and explored different positions in the cephalosporin binding pocket. For the Y221 screen, position 221 of the P99 cephalosporinase was randomized to all 20 amino acids by using degenerate oligonucleotide primers bearing the NNS codon (N=G,A,T,C; S=G,C). The library was transformed into the yeast selection strain and plated under nonselective conditions. Ninety-five randomly chosen transformants were arrayed in a microtiter plate and assayed in the chemical complementation lacZ screen for transcription with heterodimer 9 in the presence and absence of cephalosporin 13. Based on this assay, each clone was assigned a phenotype: active against heterodimer 9 and resistant to 13; active, but susceptible to inhibition by 13; or inactive. The plasmids were isolated from each clone and sequenced in the region containing position 221 so that the phenotypes and mutations could be correlated (Figure 8). As expected, the Tyr and Ala mutants produced susceptible and resistant phenotypes, respectively, consistent with previous studies. Surprisingly, mutants with Phe, Leu, Val, Cys, His, Lys, Arg, and Ser all showed resistant phenotypes, in contrast to previous studies suggesting a small, hydrophobic residue was required at this position for efficient deacylation of 13. To determine whether the screen had correctly predicted the in vitro activity of the mutant enzymes, a few mutants from each phenotype were expressed, purified, and assayed for activity against 13. All of the mutants deemed resistant showed a substantial increase in their kcat values with 13, compared with wild-type P99 cephalosporinase, ranging from a 430-fold increase with Y221L to over 2700-fold increase with Y221A. In addition, the K m values of these mutants were increased by about 10-fold over the wild-type enzyme, leading to an overall improvement of 100-fold in their catalytic efficiencies (kcat /K m ). By contrast, mutants that were phenotypically susceptible to 13, such as Y221T, showed only modest 30-fold increases in kcat over wild-type. Inactive mutant Y221E was substantially impaired against the heterodimer 9 substrate, with a 100-fold decrease in kcat /K m compared with wild-type, despite its 400-fold increase in kcat for 13. In every case where a mutant was characterized in vitro, the chemical complementation assay had accurately predicted the effect of the mutation. Since redundancy was built into the assay and most mutants were assayed multiple times, the precision of the assay can also be estimated as a ratio of the number of correct results to total results, 74%. This estimate of 74% is conservative because the sequencing results showed that many of the false positives/negatives had additional mutations in the coding
4 Applications of Chemical Complementation
Fig. 8 Y221X mutants of P99 cephalosporinase characterized using the chemical complementation lacZ screen. Mutations are grouped as hydrophobic amino acids (A, F, L, V), negatively and positively charged amino acids (D, E, K, R), polar amino acids (C, H, N, Q, S, T, Y), stop codons ("), or deletions and insertions (D/I). Based on the signals obtained from the liquid o-nitrophenyl-$-d-galactopyranoside (ONPG) data
using the chemical complementation lacZ screen in the presence of 5 :M heterodimer 9, each mutant was categorized as an active or inactive $-lactamase. Active $-lactamases were further categorized as inhibited by or resistant to cephalosporin 13 on the basis of the same assay carried out in the presence of 100 :M 13. No mutants containing G, I, M, P, or W were obtained out of 85 colonies screened.
region. Thus, the Y221 library showed the chemical complementation assay to be robust and even revealed some surprising resistance mutations at this well-studied position. Confident of the accuracy of the chemical complementation assay, a highthroughput screen was then undertaken to survey the effects of additional mutations in the region of the enzyme contacting the C7 substituent of the cephalosporin. A library in which positions 217, 220, and 221 of P99 were randomized using NNS codons was constructed and a random subset of 960 clones was screened with 13 in the chemical complementation assay as described above. Mutants that showed resistant profiles in this screen were characterized in vitro and several were found to have increased activity toward 13. Overall, the A220N/Y221H/N226D gave both the highest kcat and highest K m obtained for any mutant tested, with a kcat that was 5600-fold over that obtained for wild-type P99 cephalosporinase and a K m that was 100-fold over that obtained for wild-type P99 cephalosporinase. Although the library was not screened comprehensively, an improvement in catalytic efficiency of two orders of magnitude was observed for the best mutant, D88N/D217V/A220F/Y221A. Surprisingly, a few mutants in which tyrosine was retained at position 221 also showed up to 77-fold increases in catalytic efficiency toward 13. This result clearly demonstrated the ability of chemical complementation to screen mutant libraries and identify improved catalysts.
27
28 Chemical Complementation Assays
4.2.4 General Considerations The requirement of yeast cell permeability to cephalosporins led us to make two critical modifications to the system. First, the growth medium was supplemented with 1% dimethyl sulfoxide (DMSO) to slightly permeabilize the yeast cell membrane to the inhibitors. Next, the PDR1 gene was knocked-out in the selection strain. Pdr1p positively regulates the expression of PDR5, SNQ2, and YOR1, which encode ABC transporters that work on a variety of small molecules [73–75]. PDR1 knockout strains have been used in a number of studies to increase the permeability of yeast to various small molecules [76]. Together, these two modifications to the assay permitted the observation of enzyme inhibition by cephalosporins in vivo. Like any high-throughput assay, chemical complementation is prone to false positives and false negatives, which result from clonal variation in the yeast three-hybrid components. Conveniently, secondary screening to identify these artifacts simply consists of multiple repetitions of the chemical complementation lacZ screen in the presence and absence of heterodimer 9. Clones that fail to activate lacZ transcription in the presence of the CID are considered false positives for β-lactamase activity conversely, clones that activate transcription in the absence of the CID are regarded as false negatives. In the Y221 library screen, 10 of the 95 colonies tested in this secondary screen were found to be false positives and eliminated. Since this method of secondary screening is enzyme-independent, it should be generally applicable for chemical complementation. The use of secondary screens for the assessment of yeast transcription assays is treated in greater detail elsewhere [34]. 4.3 Glycosynthase Evolution
One of the major goals of enzyme research is the successful design of enzyme catalysts for any desired reaction. There are exciting recent advances in computational modeling [23], but the goal of designing protein sequences de novo that will have a specified activity is still beyond our reach. Directed evolution, on the other hand, seeks to improve catalysts by creating large libraries of protein variants and selecting for those that have improved activity. The bottleneck to enzyme discovery by this method is the design of a suitable selection system. Chemical complementation offers a way to evolve an enzyme for which there is no intrinsic selection, such as a glycosynthase. 4.3.1 Rationale Chemical synthesis of carbohydrates is presently limited by the need for differentially protected intermediates and reactant-dependent coupling yields and stereocontrol. Enzymes, with their exquisite control of both regio- and stereochemistry, provide a promising alternative to traditional small molecule chemistry for the synthesis of oligosaccharides. Glycosyltransferases, the natural enzymes responsible for the synthesis of oligo- and polysaccharides, and glycosidases, the enzymes that normally hydrolyze carbohydrates, have both been used for carbohydrate synthesis [77–80]. The use of glycosyltransferases, however, is limited by the need for
4 Applications of Chemical Complementation
nucleotide diphosphate glycosyl donors, which are relatively more expensive and difficult to synthesize than glycosidase substrates. Oligosaccharide synthesis using glycosidases, not surprisingly, suffers from low yields since the enzyme also catalyzes the hydrolysis of the desired product. Thus, alternative methods are being sought for enzyme-catalyzed carbohydrate synthesis. Recently, Withers and coworkers demonstrated that retaining glycosidases can be engineered to glycosynthases simply by mutating the nucleophilic glutamate residue at the base of the active site to a small hydrophobic residue and using an α-glycosyl fluoride as the donor substrate [81]. Mutation of the active-site nucleophile to a small, hydrophobic residue both accommodates the α-glycosyl fluoride donor and inactivates the hydrolytic activity of the enzyme, allowing the reaction to proceed in the reverse direction. This result opened a new route for carbohydrate synthesis, and several retaining glycosidases have already been successfully converted to glycosynthases using this strategy [82–88]. Directed evolution would offer an obvious route to improve the activity of these enzymes and modify their substrate specificities, except that there is no intrinsic way to screen or select for glycosynthase activity. Thus, we sought to bring chemical complementation to bear on this important class of enzymes for carbohydrate synthesis (Figure 9). 4.3.2 Selection Scheme Chemical complementation is a powerful method because to look at a new reaction class, all that needs to be changed is the bond linking the two halves of the CID. For the glycosynthase reaction, we chose to model the linker after the disaccharide substrates used by the E197A mutant of Cel7B from Humicola insolens [89]. Cel7B is an endoglucanase that catalyzes the hydrolysis of β-1,4-linked glucosidic bonds in cellulose with retention of stereochemistry at the anomeric center [90]. The E197A mutant has been shown to be a glycosynthase [89, 91]. We envisioned that chemical complementation could detect glycosynthase activity as formation of a bond between donor 17 (methotrexate-disaccharide-fluoride, Mtx-Lac-F) and acceptor 18 (dexamethasone-disaccharide, Dex-Cel) in a Dex-Mtx yeast three-hybrid system (Figure 10) [3]. For the acceptor compound 18, cellobiose (the natural substrate of the glycosynthase) was used as the disaccharide with 1 attached at the anomeric position. For the α-glycosyl fluoride donor 17, however, cellobiosyl fluoride could not be used, since it could also act as an acceptor and thus self-polymerize. Therefore, an epimer of cellobiose fluoride, lactosyl fluoride, was chosen that differed only in the stereochemistry at the 4 position. Ligand 2 was installed at the 6 position of the galactose unit to facilitate the chemical synthesis and because the crystal structure of Cel7B complexed with a substrate analog suggests that this position is exposed and, therefore, that the addition of 2 would not interfere with substrate binding. To validate the choice of substrates for this enzyme system, heterodimer 19 (Mtx-LacCel-Dex) was synthesized in vitro from the donor and acceptor substrates 17 and 18 using a commercial preparation of the Cel7B:E197A glycosynthase. The complete CID activated transcription of the LEU2 reporter gene in the Dex-Mtx three-hybrid system, demonstrating that the designed small molecules 17 and 18 were suitable for use in the glycosynthase assay.
29
30 Chemical Complementation Assays
Fig. 9 Molecules used for glycosynthase evolution. "-Glycosyl fluoride donor 17 and cellobiose acceptor 18 are linked by the Cel7B : E197A glycosynthase to make heterodimer 19, which can reconstitute the
Dex-Mtx yeast three-hybrid system [3]. The catalytic activity of evolved glycosynthases is measured in vitro using commercially available analogous substrates 20 and 21.
4.3.3 Glycosynthase Assay Once the CID for the glycosynthase was established, we could determine if chemical complementation could detect the glycosynthase activity of the Cel7B: E197A enzyme. Using standard conditions for a LEU2 growth assay, we showed that expression of Cel7B: E197A glycosynthase in the presence of 17 and 18 conferred a growth advantage to the yeast three-hybrid selection strain (Figure 11) [3], presumably because the Cel7B:E197A glycosynthase catalyzed the synthesis of heterodimer 19. Several control experiments were carried out to confirm that the growth advantage is indeed caused by the catalytic activity of the Cel7B: E197A glycosynthase. First, we showed that transcription activation required both the donor and the acceptor substrates. The yeast three-hybrid selection strain expressing the Cel7B:E197A glycosynthase was grown in the presence of no small molecule, 10 µM 18 alone, 10 µM 17 alone, or 10 µM 17 and 10 µM 18 in media lacking the appropriate auxotrophs and leucine. Activation of the LEU2 reporter gene results in an increase in cell growth and hence in the OD600 for the cell culture. Cell growth was
4 Applications of Chemical Complementation
Fig. 10 Chemical complementation as a high-throughput assay for glycosynthase activity. Chemical complementation detects enzyme catalysis of bond formation or cleavage reactions based on covalent coupling of two small molecule ligands. The heterodimeric small molecule reconsti-
tutes a transcriptional activator, turning on the transcription of a downstream reporter gene. Here, a Dex-Mtx yeast three-hybrid system is used. Glycosynthase activity is detected as formation of a glycosidic linkage between the Mtx "-fluoride donor 17 and Dex cellobiose acceptor 18 [3].
at background levels for cells grown with no small molecule or with only 18 and near background levels with only 17. Cells grown in the presence of both 17 and 18 showed a clear growth advantage. Next, we showed that transcription activation is dependent on the catalytic activity of the Cel7B:E197A glycosynthase. Cells expressing the wild-type Cel7B glycosidase, which should be able to bind the two substrates, but should be much less efficient at product synthesis, showed no growth advantage in the presence of 17 and 18. Furthermore, cells expressing the Cel7B glycosidase were indistinguishable from cells expressing no enzyme. Together, these data suggest that the Cel7B:E197A glycosynthase activity can be detected using chemical complementation. To demonstrate the ability of chemical complementation to carry out a highthroughput selection for glycosynthase activity, the growth advantage conferred by the Cel7B:E197A glycosynthase was used to select the glycosynthase from a pool of inactive variants. A mock library containing 100:1 Cel7B glycosidase to Cel7B:E197A glycosynthase was used to transform the yeast three-hybrid selection strain. Here, the Cel7B glycosidase was used as the “inactive” control. First, the transformed cells were plated under nonselective conditions to determine the transformation efficiency and ten random colonies were analyzed to establish the integrity of the library. All ten colonies were found to express the Cel7B glycosidase, as would be expected from the 100:1 library ratio, based on colony PCR and restriction digestion. Then the library was plated on selective media lacking the appropriate auxotrophs and leucine and containing substrates 17 and 18. The 10 largest colonies were analyzed by colony PCR and restriction digestion. Of these
31
32 Chemical Complementation Assays
Fig. 11 Chemical complementation links Cel7B : E197A glycosynthase activity to LEU2 reporter gene transcription in vivo. Yeast cells containing the DBD-DHFR and AD-GR fusion proteins and the LEU2 reporter gene, expressing either no enzyme, Cel7B glycosidase, or Cel7B : E197A glycosynthase were grown in selective medium with or without the substrates 17 (methotrexate-lactose fluoride, Mtx-Lac-F) and 18 (dexamethasone-cellobiose, Dex-
Cel). Small molecule dimerizer 6 (Dex-Mtx), with a methylene linker, was used as a positive control. Cells expressing no enzyme or Cel7B glycosidase were used as negative controls. Activation of the LEU2 reporter gene and hence cell growth in the absence of leucine is reported here as the OD600 after 6 days of growth at 30◦ C. The growth assays were carried out in triplicate, and the error bars correspond to the standard error [3].
10 colonies, eight contained the Cel7B:E197A glycosynthase, and two contained the Cel7B glycosidase, which corresponds to an enrichment of 400-fold after a single round of selection. This result established that the system could select active glycosynthases from a large library and thus was suitable for directed evolution. 4.3.4 Directed Evolution Having demonstrated that chemical complementation could select for glycosynthase activity, we applied the LEU2 selection to the directed evolution of Cel7B with a E197 saturation library (Figure 11). Position 197 was randomized to all 20 amino acids using cassette mutagenesis with an NNK codon (N = G,A,T,C; K = T,G). This library was transformed into the yeast three-hybrid selection strain and plated on selective media lacking the appropriate auxotrophs and leucine and containing substrates 17 and 18. The 96 largest colonies from the selection plate were arrayed in microtiter plates for secondary screening to eliminate false positives. In the secondary screen, growth of the selected mutants was monitored under selective conditions with and without the two substrates. Only those colonies that showed better growth with the two substrates were considered true positives and
4 Applications of Chemical Complementation
subjected to further characterization. Of the 96 colonies picked, 35 were active in the secondary screen. The Cel7B genes from those 35 positives were PCR amplified and sequenced. Three mutants, Ala, Gly, and Ser, occurred most frequently and represent 80% of the positive clones. In addition, the Leu mutant was isolated twice and the Pro, Asp, and Thr mutants were all isolated once. Of the 10 Ala mutants isolated, three of them also had a spontaneous N196D mutation. To confirm that these mutants are indeed glycosynthases, plasmids for individual mutants were isolated, transformed back into the yeast three-hybrid selection strain, and resubjected to the secondary screen with or without small molecules. All of these variants again showed a growth advantage with substrates 17 and 18, with the N196D/E197A variant growing faster than the E197A variant. The E197A, E197S and N196D/E197A variants were all subsequently overexpressed in yeast and purified to allow determination of their glycosynthase activities. These three variants were assayed in vitro using a standard glycosynthase assay with the substrates 20 (α-lactosyl fluoride) and 21 (p-nitrophenyl-β-cellobioside) [91]. These two substrates are structurally similar to 17 and 18, respectively. The original Cel7B:E197A glycosynthase has a specific activity of 8 ± 2 mol product min−1 mol−1 enzyme. The E197S variant isolated from this selection shows a significant improvement in activity, with a specific activity of 40 ± 5 mol min−1 mol−1 . The N196D/E197A variant, somewhat surprisingly, is within experimental error of E197A, with a specific activity of 7 ± 1 mol min mol−1 . Thus, we were able to increase the activity of the glycosynthase five-fold in this first directed evolution experiment. We are presently expanding our efforts toward screening larger libraries through DNA shuffling, by which we expect to find mutants of significantly higher activity.
4.3.5 General Considerations For any high-throughput assay, a secondary screen is important for ruling out false positives. The secondary screen for glycosynthase activity simply consisted of the chemical complementation growth assay, repeated in both the presence and absence of small molecules 17 and 18. Only those clones that exhibited smallmolecule-dependent growth were considered to be true positives. Additionally, it is important to demonstrate that cell growth is enzyme-dependent and not due to random mutation in the strain. This was easily accomplished by isolating the plasmid of the putative active enzyme and retransforming it into the selection strain to repeat the assay. The number of mutants that can be screened or selected in chemical complementation is presently limited by the transformation efficiency of yeast. While this was not a concern for the small library considered here, the largest libraries that can be considered are on the order of 105 –107 . For the E197 saturation library, the transformation efficiency was 106 colonies per microgram of DNA, showing that the yeast three-hybrid selection strain is not impaired in its transformation efficiency. Yet, for larger library sizes (108 –1010 ), the system will have to be moved to E. coli, and we have made efforts in that direction [92].
33
34 Chemical Complementation Assays
5 Conclusion
As we have seen, complementation approaches are a powerful means to select active enzymes from a large library of proteins. Here, we have presented chemical complementation as a reaction-independent assay that links enzyme activity to cell survival. The desired chemistry is programmed into a small molecule such that bond cleavage or bond formation reconstitutes a yeast three-hybrid transcriptional activator that drives expression of an essential reporter gene. The readout system, Mtx-Dex, LexA-DHFR, B42-GR and the reporter gene can all remain constant while the chemistry changes. Thus, all that needs to be changed for each new reaction is the Mtx-substrate-Dex, or Dex-substrate and Mtx-substrate molecules synthesized in the lab, and the enzyme library. This assay could be used to engineer glycosyltransferases, aldolases, esterases, amidases, and “diels-alderases,” all with a variety of substrate specificities and regio- and stereoselectivities. By converting the assay to a coupled enzyme assay, it may even be possible to detect oxidases and reductases. In addition to providing a powerful selection for the evolution of enzymes with new activities, there should be many uses for a reaction-independent highthroughput assay for enzyme catalysis. The assay can be used to study enzyme function, either to test hundreds of mutants to identify amino acids important for the catalytic activity of an enzyme or hundreds of different molecules to determine the substrate specificity of an enzyme. Likewise, the assay could be applied to drug discovery by screening libraries of small molecules on the basis of inhibition of enzyme activity and a change in transcription of the reporter gene. A distinct advantage is that a new assay would not have to be developed for each new enzyme target. Finally, this assay should be particularly well suited to proteomics. A battery of Mtx-Dex substrates with different substrates as linkers could be prepared and then used to screen cDNA libraries to identify enzymes that fall into common families, such as glycosidases or aldolases. Because mammalian, as well as yeast, three-hybrid assays are standard, the assays could be carried out in the endogenous cell line, ensuring correct posttranslational modification of the proteins. The key to all of these applications is a robust assay for enzymatic activity. Chemical complementation promises to be a powerful high-throughput assay for a wide variety of enzymatic reactions.
References 1 H. TAO, V. W. CORNISH, Curr. Opin. Chem. Biol. 2002, 6, 858–864. 2 K. BAKER, C. BLECZINSKI, H. LIN, G. SALAZAR-JIMENEZ, D. SENGUPTA, S. KRANE, V. W. CORNISH, Proc. Natl Acad. Sci. USA 2002, 99, 16537–16542. 3 H. LIN, H. TAO, V. W. CORNISH, J. Am. Chem. Soc. 2004, 126, 15051–15059.
4 G. W. BEADLE, E. L. TATUM, Proc. Natl. Acad. Sci. USA 1941, 27, 499–506. 5 S. BENZER, Sci. Am. 1962, 206, 10–84. 6 P. UETZ, L. GIOT, G. CAGNEY, T. A. MANSFIELD, R. S. JUDSON, J. R. KNIGHT, D. LOCKSHON, V. NARAYAN, M. SRINIVASAN, P. POCHART, A. QURESHI-EMILI, Y. LI, B. GODWIN, D. CONOVER, T. KALBFLEISCH, G.
References
7 8
9
10 11 12
13
14 15
16
17 18 19
20 21 22 23 24
25
26
VIJAYADAMODAR, M. YANG, M. JOHNSTON, S. FIELDS, J. M. Rothberg, Nature 2000, 403, 623–627. W. A. LIM, R. T. SAUER, Nature 1989, 339, 31–36. P. KAST, M. ASIF-ULLAH, N. JIANG, D. HILVERT, Proc. Natl. Acad. Sci. USA 1996, 93, 5043–5048. P. KAST, C. GRISOSTOMI, I. A. CHEN, S. LI, U. KRENGEL, Y. XUE, D. HILVERT, J. Biol. Chem. 2000, 275, 36832–36838. B. G. HALL, Genetics 1978, 89, 453–465. R. M. MYERS, L. S. LERMAN, T. MANIATIS, Science 1985, 229, 242–247. J. D. HERMES, S. M. PAREKH, S. C. BLACKLOW, H. KOSTER, J. R. KNOWLES, Gene 1989, 84, 143–151. J. D. HERMES, S. C. BLACKLOW, J. R. KNOWLES, Proc. Natl. Acad. Sci. USA 1990, 87, 696–700. T. YANO, S. OUE, H. KAGAMIYAMA, Proc. Natl. Acad. Sci. USA 1998, 95, 5511–5515. G. DESANTIS, K. WONG, B. FARWELL, K. CHATMAN, Z. ZHU, G. TOMLINSON, H. HUANG, X. TAN, L. BIBBS, P. CHEN, K. KRETZ, M. J. BURK, J. Am. Chem. Soc. 2003, 125, 11476–11477. S. W. SANTORO, L. WANG, B. HERBERICH, D. S. KING, P. G. SCHULTZ, Nat. Biotechnol. 2002, 20, 1044–1048. S. L. DOVE, J. K. JOUNG, A. HOCHSCHILD, Nature 1997, 386, 627–630. J. K. JOUNG, E. I. RAMM, C. O. PABO, Proc. Natl. Acad. Sci. USA 2000, 97, 7382–7387. S. A. LESLEY, P. A. PATTEN, P. G. SCHULTZ, Proc. Natl. Acad. Sci. USA 1993, 90, 1160–1165. J. A. SMILEY, S. J. BENKOVIC, Proc. Natl. Acad. Sci. USA 1994, 91, 8319–8323. Y. TANG, J. B. HICKS, D. HILVERT, Proc. Natl. Acad. Sci. USA 1991, 88, 8784–8786. J. KAPLAN, W. F. DEGRADO, Proc. Natl. Acad. Sci. USA 2004, 101, 11566–11570. M. A. DWYER, L. L. LOOGER, H. W. HELLINGA, Science 2004, 304, 1967–1971. C. A. VOIGT, S. L. MAYO, F. H. ARNOLD, Z. G. WANG, Proc. Natl. Acad. Sci. USA 2001, 98, 3778–3783. B. KUHLMAN, G. DANTAS, G. C. IRETON, G. VARANI, B. L. STODDARD, D. BAKER, Science 2003, 302, 1364–1368. A. D. GRIFFITHS, D. S. TAWFIK, EMBO J. 2003, 22, 24–35.
27 H. PEDERSEN, S. HOLDER, D. P. SUTHERLIN, U. SCHWITTER, D. S. KING, P. G. SCHULTZ, Proc. Natl. Acad. Sci. USA 1998, 95, 10523–10528. 28 D. S. TAWFIK, A. D. GRIFFITHS, Nat. Biotechnol. 1998, 16, 652–656. 29 M. J. OLSEN, D. STEPHENS, D. GRIFFITHS, P. DAUGHERTY, G. GEORGIOU, B. L. IVERSON, Nat. Biotechnol. 2000, 18, 1071–1074. 30 S. M. FIRESTINE, F. SALINAS, A. E. NIXON, S. J. BAKER, S. J. BENKOVIC, Nat. Biotechnol. 2000, 18, 544–547. 31 G. XIA, L. CHEN, T. SERA, M. FA, P. G. SCHULTZ, F. E. ROMESBERG, Proc. Natl. Acad. Sci. USA 2002, 99, 6597–6602. 32 F. J. GHADESSY, J. L. ONG, P. HOLLIGER, Proc. Natl. Acad. Sci. USA 2001, 98, 4552–4557. 33 S. W. MICHNICK, F. X. VALOIS, Proc. Natl. Acad. Sci. USA 2002, 99, 16513–16515. 34 B. T. CARTER, H. LIN, V. W. CORNISH, in Directed Molecular Evolution of Proteins, eds. K. BRAKMANN, K. JOHNSSON, Wiley-VCH, Weinheim, 2002. 35 E. LICITRA, J. LIU, Proc. Natl. Acad. Sci. USA 1996, 93, 12817–12821. 36 S. FIELDS, O. SONG, Nature 1989, 340, 245–246. 37 R. BRENT, M. PTASHNE, Cell 1985, 43, 729–736. 38 I. HOPE, K. STRUHL, Cell 1986, 46, 885–894. 39 S. J. TRIEZENBERG, R. KINGSBURY, S. MCKNIGHT, Genes Dev. 1988, 2, 718– 729. 40 D. M. SPENCER, T. J. WANDLESS, S. L. SCHREIBER, G. R. CRABTREE, Science 1993, 262, 1019–1024. 41 M. T. RYAN, H. MULLER, N. PFANNER, J. Biol. Chem. 1999, 274, 20619–20627. 42 J. N. PELLETIER, F. X. CAMPBELL-VALOIS, S. W. MICHNICK, Proc. Natl. Acad. Sci. USA 1998, 95, 12141–12146. 43 S. P. SASSO, R. M. GILLI, J. C. SARI, O. S. RIMET, C. M. BRIAND, Biochim. Biophys. Acta 1994, 1207, 74–79. 44 P. K. CHAKRABORTI, M. J. GARABEDIAN, K. R. YAMAMOTO, S. S. SIMONS Jr., J. Biol. Chem. 1991, 266, 22075–22078. 45 E. J. REBAR, C. O. PABO, Science 1994, 263, 671–673. 46 M. JOHNSTON, Microbiol. Rev. 1987, 51, 458–476.
35
36 Chemical Complementation Assays
47 J. ESTOJAK, R. BRENT, E. A. GOLEMIS, Mol. Cell. Biol. 1995, 15, 5820–5829. 48 H. LIN, W. M. ABIDA, R. T. SAUER, V. W. CORNISH, J. Am. Chem. Soc. 2001, 122, 4247–4248. 49 W. M. ABIDA, B. T. CARTER, E. A. ALTHOFF, H. LIN, V. W. CORNISH, ChemBioChem 2002, 3, 887–895. 50 K. BAKER, D. SENGUPTA, G. SALAZAR-JIMENEZ, V. W. CORNISH, Anal. Biochem. 2003, 315, 134–137. 51 K. S. DE FELIPE, B. T. CARTER, E. A. ALTHOFF, V. W. CORNISH, Biochemistry 2004, 43, 10353–10363. 52 O. FUTER, M. T. DECENZO, R. A. ALDAPE, D. J. LIVINGSTON, J. Biol. Chem. 1995, 270, 18935–18940. 53 D. A. HOLT, J. I. LUENGO, D. S. YAMASHITA, H. J. OH, A. L. KONILIAN, H. K. YEN, L.W. ROZAMUS, M. BRANDT, M. J. BOSSARD, M. A. LEVY, D. S. EGGLESTON, J. LIANG, L. W. SCHULTZ, T. J. STOUT, J. CLARDY, J. Am. Chem. Soc. 1993, 115, 9925–9938. 54 P. D. BRAUN, K. T. BARGLOW, Y. M. LIN, T. AKOMPONG, R. BRIESEWITZ, G. T. RAY, K. HALDAR, T. J. WANDLESS, J. Am. Chem. Soc. 2003, 125, 7575–7580. 55 F. BECKER, K. MURTHI, C. SMITH, J. COME, N. COSTA-ROLDAN, C. KAUFMANN, U. HANKE, C. DEGENHART, S. BAUMANN, W. WALLNER, A. HUBER, S. DEDIER, S. DILL, D. KINSMAN, M. HEDIGER, N. BOCKOVICH, S. MEIER-EWERT, A. F. KLUGE, N. KLEY, Chem. Biol. 2004, 11, 211–223. 56 S. LEFURGY, V. W. CORNISH, Chem. Biol. 2004, 11, 151–153. 57 S. L. HUSSEY, S. S. MUDDANA, B. R. PETERSON, J. Am. Chem. Soc. 2003, 125, 3692–3693. 58 S. GENDREIZIG, M. KINDERMANN, K. JOHNSSON, J. Am. Chem. Soc. 2003, 125, 14970–14971. 59 M. PAGE, The Chemistry of the β-Lactams, Chapman & Hall, Glasgow, 1992. 60 M. GALLENI, J. M. FRERE, Biochem. J. 1988, 255, 119–122. 61 E. LOBKOVSKY, P. C. MOEWS, H. LIU, H. ZHAO, J. M. FRERE, J. R. KNOX, Proc. Natl. Acad. Sci. USA 1993, 90, 11257–11261. 62 W. DURCKHEIMER, F. ADAM, G. FISCHER, R. KIRRSTETTER, Adv. Drug Res. 1988, 17, 61–234. 63 D. SENGUPTA, H. LIN, S. D. GOLDBERG, J. J.
64 65 66
67 68 69
70
71
72
73
74
75
76
77 78 79 80 81
MAHAL, V. W. CORNISH, Biochemistry 2004, 43, 3570–3581. D. D. CLARK, B. R. PETERSON, Chem BioChem 2003, 4, 101–107. S. ATHAVANKAR, B. R. PETERSON, Chem. Biol. 2003, 10, 1245–1253. B. AZIZI, E. I. CHANG, D. F. DOYLE, Biochem. Biophys. Res. Commun. 2003, 306, 774–780. Z. ZHANG, W. ZHU, T. KODADEK, Nat. Biotechnol. 2000, 18, 71–74. J. M. GHUYSEN, Annu. Rev. Microbiol. 1991, 45, 37–67. M. NUKAGA, S. KUMAR, K. NUKAGA, R. F. PRATT, J. R. KNOX, J. Biol. Chem. 2004, 279, 9344–9352. R. A. POWERS, E. CASELLI, P. J. FOCIA, F. PRATI, B. K. SHOICHET, Biochemistry 2001, 40, 9207–9214. Z. ZHANG, Y. YU, J. M. MUSSER, T. PALZKILL, J. Biol. Chem. 2001, 276, 46568–46574. B. T. CARTER, H. LIN, S. D. GOLDBERG, E. A. ALTHOFF, J. RAUSHEL, V. W. CORNISH, Chem BioChem 2005, 6, 2055–2067. T. DELAVEAU, A. DELAHODDE, E. CARVAJAL, J. SUBIK, C. JACQ, Mol. Gen. Genet. 1994, 244, 501–511. H. B. VAN den Hazel, H. PICHLER, M. A. DO Valle Matta, E. LEITNER, A. GOFFEAU, G. DAUM, J. Biol. Chem. 1999, 274, 1934–1941. A. NOURANI, M. WESOLOWSKI-LOUVEL, T. DELAVEAU, C. JACQ, A. DELAHODDE, Mol. Cell. Biol. 1997, 17, 5453–5460. J. KATO-STANKIEWICZ, I. HAKIMI, G. ZHI, J. ZHANG, I. SEREBRIISKII, L. GUO, H. EDAMATSU, H. KOIDE, S. MENON, R. ECKL, S. SAKAMURI, Y. LU, Q. Z. CHEN, S. AGARWAL, W. R. BAUMBACH, E. A. GOLEMIS, F. TAMANOI, V. KHAZAK, Proc. Natl. Acad. Sci. USA 2002, 99, 14398–14403. S. L. FLITSCH, Curr. Opin. Chem. Biol. 2000, 4, 619–625. H. J. GIJSEN, L. QIAO, W. FITZ, C. H. WONG, Chem. Rev. 1996, 96, 443–474. K. M. KOELLER, C. H. WONG, Chem. Rev. 2000, 100, 4465–4494. K. M. KOELLER, C. H. WONG, Nature 2001, 409, 232–240. P. SEARS, C. H. WONG, Science 2001, 291, 2344–2350.
References 82 L. F. MACKENZIE, Q. WANG, R. A. J. WARREN, S. G. WITHERS, J. Am. Chem. Soc. 1998, 120, 5583–5585. 83 M. FAIJES, J. K. FAIRWEATHER, H. DRIGUEZ, A. PLANAS, Chemistry 2001, 7, 4651–4655. 84 G. J. DAVIES, S. J. CHARNOCK, B. HENRISSAT, Trends Glycosci. Glycotechnol. 2001, 13, 105–120. 85 J. K. FAIRWEATHER, R. V. STICK, S. G. WITHERS, Aust. J. Chem. 2000, 53, 913–916. 86 C. MAYER, D. L. ZECHEL, S. P. REID, R. A. WARREN, S. G. WITHERS, FEBS Lett. 2000, 466, 40–44. 87 O. NASHIRU, D. L. ZECHEL, D. STOLL, T. MOHAMMADZADEH, R. A. WARREN, S. G. WITHERS, Angew. Chem. Int. Ed. Engl. 2001, 40, 417–420.
88 A. TRINCONE, G. PERUGINO, M. ROSSI, M. MORACCI, Bioorg. Med. Chem. Lett. 2000, 10, 365–368. 89 S. FORT, V. BOYER, L. GREFFE, G. J. DAVIES, O. MOROZ, L. CHRISTIANSEN, M. SCHULEIN, S. COTTAZ, H. DRIGUEZ, J. Am. Chem. Soc. 2000, 122, 5429–5437. 90 C. SCHOU, G. RASMUSSEN, M. B. KALTOFT, B. HENRISSAT, M. SCHULEIN, Eur. J. Biochem. 1993, 217, 947–953. 91 V. M. DUCROS, C. A. TARLING, D. L. ZECHEL, A. M. BRZOZOWSKI, T. P. FRANDSEN, I. VON OSSOWSKI, M. SCHULEIN, S. G. WITHERS, G. J. DAVIES, Chem. Biol. 2003, 10, 619–628. 92 E. A. ALTHOFF, V. W. CORNISH, Angew. Chem. Int. Ed. Engl. 2002, 41, 2327– 2330.
37
1
High-throughput Screening Methods for Oxidoreductases Tyler W. Johannes, Ryan D. Woodyer, and Huimin Zhao University of Illinois, Urbana, USA
Originally published in: Enzyme Assays. Edited by Jean-Louis Reymond. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52731095-1
1 Introduction
The interest in using enzymes as synthetic chemistry tools continues to grow rapidly [1, 2]. Among various classes of enzymes, oxidoreductases represent a highly versatile class of biocatalysts for specific reduction, oxidation and oxyfunctionalization reactions, and are currently used for the production of a wide variety of chemical and pharmaceutical products. Examples of such products include l-tert-leucine, 6-hydroxynicotinic acid, 5-methylpyrazine-2-carboxylic acid, (R)2-hydroxyphenoxypropionic acid, and steroids [3]. Oxidoreductases are generally employed in whole-cell biotransformations or fermentation-based processes because they require expensive cofactors and coenzymes (e.g. NAD(H) or NADP(H)) to donate or accept the chemical equivalents for reduction or oxidation. To facilitate their applications in vitro, a large number of cofactor regeneration systems have been developed [4, 5]. As with most enzyme biocatalysts, the discovery and engineering of oxidoreductases with improved stability, catalytic activity, and substrate specificity are highly desirable for many applications. Thanks to recent advances in recombinant DNA technology, new enzymes can now be isolated directly from microorganisms that are difficult to cultivate by high-throughput screening of expressed libraries of environmental DNA [6], whereas improved enzymes with desired functions may be engineered rapidly by directed evolution approaches [7]. Identifying new or improved oxidoreductases by these routes requires the development of high-throughput screening methods that are sensitive, efficient, and simple to implement. This chapter focuses on various high-throughput screening methods recently developed for the discovery and directed evolution of oxidoreductases. For the purposes of this chapter, oxidoreductases have been classified into four categories: Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 High-throughput Screening Methods for Oxidoreductases
dehydrogenases, oxidases, oxygenases, and laccases. Each section begins by discussing the general uses for each oxidoreductase, then explains the details of various high-throughput screening assays, and ends with specific examples of how these methods have been used to date.
2 High-throughput Methods for Various Oxidoreductases 2.1 Dehydrogenases
Dehydrogenases catalyze the removal and transfer of hydrogen from a substrate in an oxidation-reduction reaction and are widely used to reduce carbonyl groups of aldehydes or ketones and carbon/carbon double bonds. The reduction reactions carried out by dehydrogenases often introduce chirality into a prochiral center, which may represent a key step in pharmaceutical and fine chemical synthesis. The majority of dehydrogenase enzymes require nicotinamide cofactors (NAD(P) and NAD(P)H), while other cofactors such as pyrroloquinoline quinone (PQQ) and flavins (FAD, FMN) are encountered more rarely [1]. The production or depletion of the nicotinamide cofactors has enabled scientists to measure the kinetics of several dehydrogenases. These reactions can be monitored colorimetrically by combining the redox reaction with a dye-forming reaction. High-throughput screening methods based on this approach have been developed and used to identify dehydrogenase variants with improved characteristics in various directed evolution studies. 2.1.1 Colorimetric Screen Based on NAD(P)H Generation The absorbance of NAD(P)H at 340 nm is commonly used to measure the activity of dehydrogenases. Unfortunately, this approach is generally not suitable for highthroughput methods due to background noise created by cell lysates and the 96-well plate itself. Special plates designed so that they do not absorb in the UV range can be used, but tend to be very expensive when considering a large-scale screening effort. Colorimetric assays resolve most of these difficulties and are quite amenable to high-throughput screening methods. Thus an indirect method requiring either a synthetic compound or a secondary enzyme must be applied. Tetrazolium salts such as nitroblue tetrazolium (NBT) (1) are useful because they can be reduced to formazan dyes such as 2, which absorb light in the visible region. These reactions are essentially irreversible under biological conditions and the increase in color can be easily monitored visually on filter discs or on a standard 96-well plate reader. A cascade reaction leading to the formation of a colored formazan links the production of NAD(P)H to the catalytic activity of a dehydrogenase. An NBT/phenazine methosulfate (PMS) assay has been developed that works well under several different high-throughput formats. Dehydrogenase activity on nitrocellulose membranes, electrophoretic gels, and in liquid phase has been successfully monitored using the NBT/PMS assay. The dehydrogenase cofactor NAD(P)H reacts with NBT (1) in the presence of PMS (3) to produce an insoluble blue-purple
2 High-throughput Methods for Various Oxidoreductases
Fig. 1 Colorimetric assay for dehydrogenase activity. DH = dehydrogenase; PMS = phenzine methosulfate; NBT = nitroblue tetrazolium.
formazan (2) (Figure 1). Other transferring agents besides PMS may be used such as the enzyme diaphorase or Meldola’s Blue (8-dimethylamino-2-benzophenoxazine, 4) [8]. Compared with PMS, Meldola’s Blue has a superior proton transfer rate and is far less sensitive to light. In addition, the cost of Meldola’s Blue is approximately US$4 per gram, while the price of PMS is nearly US$16 per gram (Sigma catalog, 2004, St. Louis, MO, USA). Despite these disadvantages, PMS is used as the primary transferring agent in the NBT colorimetric assay. For the liquid-phase NBT/PMS assay, 0.1% gelatin is often added to the assay mix to help prevent precipitation of the insoluble blue-purple formazan (2) [9]. When scanned spectrophotometrically (range 400–700 nm), the blue-purple formazan (2) shows a maximal absorption at 555 nm. By measuring the increase in formazan (2) production at 580 nm in a microplate reader, the NBT/PMS assay has been used to identify E. coli 6-phosphogluconate dehydrogenase (6PGDH) variants with higher activity and thermostability [10]. This assay can also be used to screen dehydrogenase variants on a nitrocellulose membrane. Briefly, a library of variants is transferred to a nitrocellulose filter followed by lysis of the colonies by washing in a lysozyme solution. A NBT/PMS cocktail is then sprayed onto the membrane to detect dehydrogenase activity. Active mutants produce a blue halo on the nitrocellulose filters. This approach has been successfully used by Holbrook and coworkers to produce a lactate dehydrogenase that no longer requires the expensive activator fructose-1,6-bisphosphate (FBP) [11]. The same group also used this approach to create new substrate specificities for lactate dehydrogenase [12]. The NBT/PMS assay has been used to identify novel alcohol dehydrogenases (ADH) in a high-throughput format without enzyme purification [13]. This new screening approach uses bioinformatics, polymerase chain reaction (PCR) techniques, and the direct in vitro expression of enzymes to rapidly detect and characterize novel ADHs. A pool of 18 novel thermoactive ADHs with a broad substrate range was discovered and characterized. The compound p-rosaniline (5) can also be used as an alternative to NBT (1). No transferring agents are necessary when using p-rosaniline (5), however bisulfite must be added to help convert the dye into its rose-colored form. Dehydrogenase activity can be monitored by observing the formation of a red color on solid agar media and this allows the rapid screening of thousands of colonies [14].
3
4 High-throughput Screening Methods for Oxidoreductases
2.1.2 Screens Based on NAD(P)H Depletion NAD(P)H consumption can be monitored either directly or indirectly, similar to NAD(P)H generation in the previous section (see Section 2.1.1). Enzyme kinetics can be directly monitored by observing the decrease in absorbance at 340 nm as NAD(P)H is depleted. Coupling NAD(P)H depletion to a liquid- or solid-phase colorimetric assay such as NBT/PMS (described in Section 2.1.1) allows for an indirect method for monitoring enzyme activity. Residual NAD(P)H can be titrated with NBT (1) in the presence of PMS (3) to produce an insoluble blue-purple formazan 2 (Figure 1). High-throughput methods involve use of a microplate reader to measure the decrease in absorption at 580 nm or screening bacterial colonies on agar plates by visually identifying white spots on a purple background. Instead of focusing on NAD(P)H, the oxidized form NAD(P)+ provides an alternative. The basis for this method relies on the instability of NAD(P)H in a strong alkali environment. Under these conditions NAD(P)H breaks down to form highly fluorescent products [15]. Recently this method has been adapted to a high-throughput screening method using microtiter plates [16]. In general, any oxidoreductase that utilizes either NADH or NADPH as a cofactor can potentially be screened using this method. 2.2 Oxidases
Oxidases catalyze the removal of hydrogen from substrates and transfer it to molecular oxygen to form either hydrogen peroxide or water. These fundamental synthetic reactions have made oxidases a prime target for directed evolution. Some oxidases are currently being used on a large scale, such as the food antioxidant D-glucose oxidase, but unfortunately the vast majority of oxidases remain economically unattractive due to their poor biocatalytic performance such as low catalytic activity [1]. Thus, several high-throughput screening methods have been developed for directed evolution studies of these oxidases with the goal of overcoming the limitations of naturally occurring enzymes. 2.2.1 Galactose Oxidase Galactose oxidases catalyze the oxidation of primary alcohols to yield aldehydes and hydrogen peroxide. These synthetically useful chemical transformations have made galactose oxidases an attractive target of directed evolution in recent years [17, 18]. Standard chemical methods for oxidizing primary alcohols rely on environmentally unfriendly heavy metals and organic solvents, whereas galactose oxidase provides an environmentally friendly alternative. Galactose oxidase has broad substrate specificity as well as strict regioselectivity and stereoselectivity which make it widely applicable in chemical synthesis, diagnostics, and biosensors [17]. For example, the enzyme can be used to detect a disaccharide tumor marker, Gal-GalNAc, in certain types of colon cancer [19]. Galactose oxidase isolated from its natural host has insufficient expression and activity to be economically used in a large-scale industrial process. Liquid- and solid-phase high-throughput screening methods have
2 High-throughput Methods for Various Oxidoreductases
Fig. 2 Colorimetric assay for oxidase activity based on the HRP/ABTS/H2 O2 reaction. HRP = horseradish peroxidase; ABTS = 2,2 -azinobis(3-ethylbenzthiazoline)-6-sulfonic acid.
been developed to help analyze galactose oxidase libraries to identify variants with improved expression and specific activity. 2.2.1.1 HRP/ABTS/H2 O2 Assay Galactose oxidase activity can be monitored by coupling the production of hydrogen peroxide (H2 O2 ) to a horseradish peroxidase (HRP)/2,2 -azinobis(3-ethylbenzthiazoline)-6-sulfonic acid (ABTS)/H2 O2 reaction [19]. The enzyme HRP catalyzes the polymerization of ABTS (6) to produce a soluble end-product (7) that is green in color and can be measured spectrophotometrically at 405 nm (Figure 2). The high solubility of ABTS (6) in aqueous solutions combined with its high extinction coefficient and reproducibility make this assay well suited for a 96-well plate format. Other advantages of ABTS (6) are its low toxicity, stability at high temperature (for sterilization), and the resulting green dye is relatively stable once oxidized. However, problems such as autooxidation, instability, and a lack of sensitivity have been reported [20]. This method has been used to create a galactose oxidase enzyme with enhanced thermostability and improved expression in E. coli [17]. It has also been successfully used to screen fungal sugar oxidases [21]. 2.2.1.2 HRP/4CN/H2 O2 Assay In addition to traditional 96-well plate screening methods, a relatively new technology involving digital imaging shows great potential in screening large enzyme libraries. The efficiency of digital imaging technology provides a distinct advantage over its traditional counterpart [18]. This system combines simple well-known colorimetric activity assays with single-pixel imaging spectroscopy on a solid-phase screening format. The assay is performed in the solid phase, and 4-chloronaphthol (4CN, 7) is used to generate an insoluble compound (8) which absorbs visible light maximally around 550 nm. As the color develops on the solid support due to oxidase activity, digital imaging equipment is used to track the absorbance versus time for each pixel in the image. Delgrave and coworkers used this technology to isolate a galactose oxidase variant that showed a 16-fold increase in activity and a threefold lower Km compared with the wild-type enzyme [18]. 2.2.2 d-Amino Acid Oxidase The flavoprotein D-amino acid oxidase (D-AO) catalyzes the deamination of D-amino acids to their corresponding α-imino acids. This reaction has drawn significant
5
6 High-throughput Screening Methods for Oxidoreductases
attention recently due to its potential to deaminate cephalosporin C in a sequential enzymatic route to form 7-amino cephalosporanic acid [22]. 7-Amino cephalosporanic acid is a basic building block used in the synthesis of many important semi-synthetic cephalosporin antibiotics. Moreover, D-AO has been used to resolve racemic amino acid mixtures and prepare α-keto acids. A high-throughput screening method based on the oxidation of the chromogen, o-dianisidine (9), has been used to detect the activity of D-AO and identify new microbial D-AO producers [20]. 2.2.2.1 Peroxidase/o-Dianisidine Assay The flavin cofactor FADH2 is oxidized by molecular oxygen to form oxidized FAD and H2 O2 . The product H2 O2 can then be coupled with HRP and a chromogen to form a colored compound. ABTS (6) was found to be unsuitable with D-AO due to problems such as autooxidation, instability, and a lack of sensitivity. Another chromogen o-dianisidine (9) has been successfully used as an alternative to ABTS (6) [20]. HRP reacts with o-dianisidine (9) and H2 O2 to form a red-colored compound (10) under acidic conditions. The o-dianisidine reagent (9) has the added benefit of being a competitive inhibitor of catalase, an enzyme that competes in vitro with HRP for the substrate H2 O2 . This assay might be adaptable to screening other oxidases by simply changing to the particular oxidase’s substrate. 2.2.3 Peroxidases Peroxidases catalyze the oxidation of a wide variety of molecules utilizing H2 O2 as the oxidant. Many peroxidases have an iron protoporphyrin IX prosthetic group in the active site, while others have a vanadium or, in the case of some bacterial peroxidases, no metal cofactor [23]. Peroxidases catalyze reactions that are potentially useful in both industrial and biotechnological applications. For example, HRP has been used as a reporter in diagnostic assays, biosensors, and histochemical staining [24]. HRP has also been suggested as a catalyst for chemical synthesis as it catalyzes polymerization and dehydration of aromatic compounds, heteroatom oxidations, and epoxidation reactions [24, 25]. Furthermore, peroxidases have shown potential in bioremediation for removal of phenols and aromatic amines [26]. However, many peroxidases are not stable enough under the oxidative conditions created by high peroxide concentration and there are many substrates that peroxidases do not react with. Therefore, a number of activity assays have been developed that can be used in a high-throughput fashion for the discovery and engineering of new and improved peroxidases. 2.2.3.1 ABTS and o-Dianisidine Assays As mentioned above in the galactose oxidase and D-AO sections, ABTS (6) and o-dianisidine (9) can both be used to assay peroxidases. ABTS (6) is oxidized by peroxidases in the presence of H2 O2 to form a radical cation with intense green color (7) as shown in Figure 2, whereas the oxidation of o-dianisidine (9) results in the formation of a red dye (10). These types of assays are carried out similarly to those described in the oxidase section, except that hydrogen peroxide is now provided as a substrate. ABTS screening has been used to find cytochrome c peroxidase mutants with altered specificity [27]. Using a combination of DNA shuffling and saturation
2 High-throughput Methods for Various Oxidoreductases
Fig. 3 Colorimetric assay for peroxidase activity based on 3,3 ,5 ,5tetramethylbenzidine (TMB).
mutagenesis, Iffland and coworkers were able to obtain variants of the cytochrome c peroxidase with a 70-fold increased specificity toward ABTS (6) compared with the natural substrate (cytochrome c) [27]. The ABTS assay has also been used to screen a library of HRP variants created by random mutagenesis [28], which resulted in a mutant that is more thermally stable, resistant to a variety of chemical and physical denaturants, and has a higher activity towards other organic substrates. The ABTS-based assay was also utilized to find mutants of HRP that can be functionally expressed in E. coli by directed evolution [29], to find mutants with 174-fold higher thermostability and 100-fold higher oxidative stability in Coprinus cinereus heme peroxidase by directed evolution [30], and to enhance the peroxidase activity of horse heart myoglobin by directed evolution [31]. The usefulness of the o-dianisidine (9) assay has been demonstrated in determining HRP mutants with expanded substrate activity [28]. The convenience of the high-throughput o-dianisidine (9) assay was further demonstrated when it was used to measure the total plasma peroxidase activity to assess the clinical severity of sickle cell anemia and citrulline therapy [32]. 2.2.3.2 TMB Assay Among the numerous colorimetric assays available for peroxidases, one of the most sensitive ones is based on the colorless compound 3,3 ,5,5 -tetramethylbenzidine (TMB) (11). Upon oxidation of this substrate by the peroxidase, TMB (11) forms a blue charge-transfer complex (12) initially, followed by a stable yellow final product (13) (Figure 3) [33]. A typical cell lysate screen is performed by monitoring the production of the blue charge-transfer product (12) at 650 nm on a microplate reader after mixing a solution of the TMB (11) substrate, H2 O2 , and the cell lysate. This assay is
7
8 High-throughput Screening Methods for Oxidoreductases
commonly used in HRP-based biosensors to determine the effects of mutations on peroxidases, and would be a reasonable assay for directed evolution of a peroxidase. 2.2.3.3 Guaiacol Assay The guaiacol peroxidase assay is another colorimetric assay that is amenable to high-throughput screening. In this case, the colorless guaiacol (2-methoxyphenol, 14) is oxidized to the phenoxy radical (15) in the presence of the peroxidase and H2 O2 . The phenoxy radical subsequently undergoes polymerization to form tetraguaiacol (16), which is brown and can be measured at 470 nm. This assay can be performed in microplate format or in solid media format. Removal of background activity can be achieved by the addition of a small amount of ascorbate, which will scavenge the phenoxy radicals until all the ascorbate has reacted. The guaiacol assay has been successfully applied in identifying cytochrome c peroxidase mutants with altered substrate specificity from a library of mutants created by combinatorial mutagenesis [34]. Additionally, cytochrome c peroxidase mutants were created by directed evolution and screened for activity with this nonnatural substrate with a final mutant displaying a 1000-fold change in substrate preference [35]. 2.2.3.4 MNBDH Assay 4-(N-Methylhydrazino)-7-nitro-2,1,3-benzooxadiazole (MNBDH, 17) has been demonstrated to be a good substrate for peroxidase screening. The hydrazine portion of MNBDH (17) is oxidized by peroxidases in the presence of hydrogen peroxide to yield the strongly fluorescent nitroaniline (18) [36]. When measured with a fluorescent plate reader, this assay gives a very sensitive readout and is superior to chromogenic assays such as ABTS (6). However, application of this assay in directed evolution or discovery of peroxidases has yet to be demonstrated. 2.3 Oxygenases
There are several families of oxygenase enzymes whose members fall into one of two groups: those that introduce one oxygen atom into their substrate are generically referred to as monooxygenases, while those that introduce two oxygen atoms are referred to as dioxygenases. These enzymes typically catalyze the hydroxylation of nonactivated carbons or epoxidations of organic substrates and therefore have potential applications in synthetic chemistry, environmental remediation, toxicology, and gene therapy [37]. Examples of oxygenases are the omnipresent cytochrome P450 heme monooxygenases, di-iron enzymes such as methane and toluene monooxygenases, and Rieske nonheme iron dioxygenases such as naphthalene dioxygenase [38]. Molecular oxygen often serves as the oxidant, while reducing equivalents typically come from NAD(P)H-dependent reductase proteins via electron transfer. Since reducing equivalents are transferred from a separate protein in most oxygenases (excluding those with a fused reductase domain), they typically function as large multimeric protein complexes. As such, oxygenases are often not stable enough or inadequately active towards the substrate of interest in vitro. Therefore,
2 High-throughput Methods for Various Oxidoreductases
protein engineering of mono- and dioxygenases and the discovery of new oxygenases has received much interest in the hopes of discovering or evolving oxygenases for use in organic synthesis or biotechnology. Discovery of new or improved oxygenases from large random libraries requires an efficient and sensitive highthroughput screening method for desired enzyme function. Many such screening methods have been developed and used. In this section some of these methods and their applications will be discussed. Assays based on NADPH depletion have been discussed previously. However, these assays can yield misleading results with oxygenases, since uncoupling of the reductase and oxygenase activity can occur, especially for unnatural substrates.
2.3.1 Assays Based on Optical Properties of Substrates and Products Many substrates of mono- and dioxygenases are aromatic and upon oxidation become colored or have a shifted absorbance spectrum. In other cases, the reaction products will autooxidize further to form colored products under the proper conditions. Although this is not applicable to all oxygenase products, it is the most convenient method of assaying oxygenase activity in a high-throughput manner. 2-Hydroxybiphenyl-2-monooxygenase (HpbA), in addition to its natural activity, catalyzes at low efficiency the regioselective o-hydroxylation of many different 2substituted phenols to form the corresponding catechols. Random mutagenesis of HpbA was used to create mutants with enhanced monooxygenase activity with 2substituted phenolic substrates [39]. Screening was accomplished by the formation of color when the reaction products autooxidize [39]. Using this method, mutants were discovered with greater than eightfold improvement in catalytic efficiency of hydroxylation of 2-methoxyphenol and a fivefold improvement in turnover rate with 2-tert-butylphenol [39]. Biphenyl (19) dioxygenase (BPDO) was screened in a somewhat different manner. BPDO is important industrially for the degradation of polychlorinated biphenyls (PCBs) and directed evolution was utilized to broaden its substrate specificity [40]. Screening was achieved by coexpressing dihydrodiol dehydrogenase (BphB) and 2,3-dihydroxybiphenyl-1,2-dioxygenase (BphC), which converted the BPDO reaction product (20) to a diol (21) and finally into a ring meta cleavage product (22) that was yellow in color (Figure 4). This method was first developed to screen a BPDO library created by DNA shuffling for enhanced PCB degradation capabilities [41] and later used to obtain dioxygenase variants with enhanced activity towards monocyclic aromatic hydrocarbons such as toluene and benzene [42]. In another example, hybrid genes of catechol-2,3-dioxygenase were screened for enhanced thermostability based on the color of the product [43, 44]. In this case, the substrate, catechol, is colorless, while the product, 2-hydroxymuconic semialdehyde, is bright yellow. Using this method of screening, mutant dioxygenases with 26-fold enhancement in thermostability were isolated. Finally, mutants of toluene 4-monooxygenase were screened for enhanced activity on o-cresol and o-methoxyphenol based on the formation of the red-brown autooxidation products [45]. Mutants created by saturation mutagenesis were identified that had sevenfold
9
10 High-throughput Screening Methods for Oxidoreductases
Fig. 4 Colorimetric assay for biphenyl dioxygenase (BDPO) activity. BphB = dihydrodiol dehydrogenase; BphC = 2,3-dihydroxybiphenyl1,2-dioxygenase.
and twofold higher catalytic rate with o-methoxyphenol and o-cresol as substrates, respectively [45]. 2.3.2 Assays Based on Gibbs’ Reagent and 4-Aminoantipyrine 4-Aminoantipyrine (4AAP, 23) and 2,6-dichloroquinone-4-chloroimide (Gibbs’ reagent, 24) are very similar in the products they are able to detect. They both react strongly with ortho- and meta-substituted phenolic compounds to produce strongly colored products (e.g. 25). Additionally, they react well with para-substituted compounds if the substituent is a halide or alkoxy moiety to form colored products (e.g. 26) [46]. Since many oxygenases produce such compounds, these are useful assays for measuring their activity. Additionally, dioxygenase cis-dihydrodiol products can be assayed by this method if the cis-dihydrodiol is first converted to a phenol by treatment with acid or cis-dihydrodiol dehydrogenase. Both of the reagents form adducts with the products, which can then be measured with a spectrophotometer. Both reagents are added at the end of the biotransformation (end-point assay) and the colored product develops in several minutes [47]. In the case of 4AAP (23) this occurs under basic conditions in the presence of potassium persulfate or similar oxidant. Although 4AAP (23) has infrequently been used for high-throughput screening, Gibbs’ reagent (24) has been applied in directed evolution experiments. The Gibbs’ assay was used to isolate toluene dioxygenases that accept 4-methylpyridine as a substrate. After several rounds of mutagenesis and screening, a mutant in which the stop codon was replaced with a threonine was found to have sixfold higher activity with 4-methylpyridine and enhanced activity towards the natural substrate [48]. 2.3.3 para-Nitrophenoxy Analog (pNA) Assay The chemical methods available for linear chain hydrocarbon oxidation are energy intensive and hazardous to the environment. Several monooxygenases such as the P450s can catalyze alkane hydroxylation, but at slow rates. An ingenious assay involving the release of para-nitrophenolate from a para-nitrophenoxy analog (pNA, e.g. 25) of a linear chain hydrocarbon has been developed and utilized. When the analogs are hydroxylated by the oxygenase, an unstable hemiacetal (26) is formed that dissociates into an aldehyde and para-nitrophenolate (27); the latter is
2 High-throughput Methods for Various Oxidoreductases
Fig. 5 Colorimetric assay for the activity of P450 monooxygenases.
bright yellow (Figure 5). The obvious disadvantage of this assay is that the pNA (25) substrate is not the desired substrate and may result in the evolved enzyme only acting on pNA (25). However, when compared with other alternatives, this is the easiest assay for long-chain hydrocarbon hydroxylation by monooxygenases. The assay is accomplished by the addition of pNA (25) and NADPH (or use of a regenerative system) and then monitoring the increase in absorbance at 410 nm. P450 monooxygenase BM-3 mutants were screened using an octane pNA (25). Typically this monooxygenase only has activity on longer chain hydrocarbons (at least 12 carbons), but after two rounds of mutagenesis and screening with the analog, the activity of this enzyme on octane was improved fivefold [49]. Another group used site-specific saturation mutagenesis and pNA screening to create mutants of the same enzyme with similarly enhanced activity toward smaller substrates [50]. Recently this P450 was optimized for activity in organic cosolvents by directed evolution using a pNA assay [51]. Activity of the P450 was increased 10-fold in the presence of 2% tetrahydrofuran (THF) and sixfold in the presence of 25% DMSO. Overall, this assay has proved to be very useful for screening monooxygenases that act on long-chain hydrocarbons. 2.3.4 Horseradish Peroxidase-coupled Assay The use of HRP/H2 O2 to detect the reaction products of oxygenases has shown promise [52]. This assay can be used when the products of the oxygenase reaction on aromatic substrates, e.g. naphthalene (28), are or can be converted into hydroxylated aromatic compounds. These hydroxylated aromatic compounds (e.g. naphthol, 29) are then converted by oxidation into colored or fluorescent compounds (e.g. 30 in the presence of HRP and H2 O2 ) (Figure 6). Mono- and dihydroxylated aromatic compounds form dimers (e.g. 30) and polymers in this assay and, depending on their structure, they will display different colorimetric and fluorescent properties, allowing determination of the desired product. This assay can be carried out in cell strains expressing HRP either on solid phase or in liquid assay by adding H2 O2 to
11
12 High-throughput Screening Methods for Oxidoreductases
Fig. 6 Horseradish peroxidase (HRP)-based fluorimetric assay for oxygenase activity.
the solid media or cell culture. The positive mutants will then display an enhanced fluorescent or colorimetric signal. This assay has been suggested for use in screening toluene dioxygenase in combination with cis-dihydrodiol dehydrogenase [52]. Additionally, this assay was used to screen for improved activity of P450cam on naphthalene (28) using H2 O2 as the oxidant. Mutants were found with a 20-fold improvement in activity with differences in regioselectivity of hydroxylation in some cases [53]. 2.3.5 Indole Assay Indole (31) can be oxidized by mono- and dioxygenases to various 2 and 3 position hydroxyl and epoxide indoles. Upon exposure to air, the generated compounds further oxidize and dimerize to form indigo (32) and indirubin (33) (Figure 7). Both of the formed compounds are intensely colored and are easily monitored colorimetrically. This assay, coupled to site-specific saturation mutagenesis, was used to enhance the activity of P450 BM-3 on indole (31) from a level that was too low to detect up to a turnover rate of 2.7 s−1 [54]. This assay was recently used to broaden the substrate specificity of toluene dioxygenase such that it would accept p-xylene as a substrate. One round of mutagenesis and screening resulted in a 4.3-fold activity improvement on p-xylene as well enhancements with several other unnatural substrates [55]. This method of assaying oxygenases is well suited for removing inactive clones from a library or for evolution of altered substrate specificity. 2.4 Laccases
Laccases (EC 1.10.3.2) belong to the blue copper oxidase family and are multi-copper enzymes capable of oxidizing phenols, polyphenols, substituted phenols, diamines, anilines, and other similar compounds. Electrons are removed one at a time from the substrate by a type 1 blue copper ion and transferred to a trinuclear copper cluster and molecular oxygen is utilized as the electron acceptor [56]. The product radical can undergo further oxidation catalyzed by the laccase or it may undergo a nonenzymatic reaction such as hydration or polymerization [56]. Laccases are common enzymes in nature, but are primarily produced by fungi and plants. Laccases are currently receiving a lot of attention as industrial enzymes with proposed applications including textile dye bleaching, pulp bleaching, effluent detoxification, and bioremediation [57]. However, a common hurdle that must
2 High-throughput Methods for Various Oxidoreductases
Fig. 7 Indole-based assay for oxygenase activity.
be overcome with these enzymes is their low expression level in native hosts. Additionally, laccases can catalyze the oxidation of polycyclic hydrocarbons (very toxic pollutants) in the presence of primary substrates such as 1-hydroxy benzothiazole (HBT) [58, 59]. These primary substrates are often toxic, expensive, and result in side-reactions [60]. Therefore, high-throughput screening applied to the discovery and/or directed evolution of laccases is critical to overcome these hurdles. Several screening methods have been developed and are described below. 2.4.1 ABTS Assay The ABTS (6) assay is a very flexible assay applicable to many oxidases, peroxidases as well as laccases. Again the production of the green radical cation (7) is observed spectrophotometrically. This assay has been utilized to improve the functional expression of a fungal laccase in Saccharomyces cerevisiae [61]. Directed evolution was utilized to improve protein expression by eightfold to the highest level yet reported with an additional increase of 22-fold in the turnover number. The ABTS assay has also been used to screen for functional expression of another laccase from Trametes versicolor into the heterologous host Yarrowia lipolytica [62]. This assay is well suited for development of functional selection and enhanced stability screening, and should prove useful in the future. 2.4.2 Poly R-478 Assay Poly R-478 is a polymeric dye that can act as a surrogate substrate for lignin degradation by laccases in the presence of a mediator such as HBT. Upon oxidation by laccases the dye is decolorized, making it easy to follow the activity. The activity of laccases on this substrate has been correlated to the oxidation of polycyclic aromatic hydrocarbons (PAHs) [63]. The assay is performed simply by mixing the water-soluble polymeric dye with the library members to be screened and measuring the decrease in absorbance at 520 nm. Notably, this assay has been utilized recently to discover novel laccases from cultivable fungal strains [64] and to find PAH-degrading strains [63]. 2.4.3 Other Assays Several other assays have been described for laccases that have not yet seen significant application as high-throughput screening methods [65, 66]. In one such assay,
13
14 High-throughput Screening Methods for Oxidoreductases
anthracene is converted to 9,10-anthroquinone by the laccase enzyme, followed by subsequent chemical reduction to 9,10-anthrahydroquinone, which is orange in color and can be measured at 419 nm. Another assay is the production of triiodide in acidic conditions from sodium iodide [66]. In this assay the activity of the laccase can be monitored at 353 nm as the formed triiodide is yellow. Both of these assays require the use of mediators like HBT or ABTS (6) and might be useful in the directed evolution of laccases that have the desired activity in their absence by screening with progressively less mediator. 3 Conclusions
High-throughput screening methods for oxidoreductases represent a key step in identifying variants with improved catalytic function by directed evolution as well as discovering new oxidoreductases by expression of environmental DNA. In this chapter, we have discussed many effective high-throughput screening methods for various oxidoreductases and their successful applications in discovering and engineering new or improved oxidoreductases. The resulting oxidoreductases are not only important for many potential applications such as biocatalysis, bioremediation, diagnostics, and gene therapy, but also are critical to increase our understanding of cellular metabolism and protein structure/function relationships. No doubt in the years to come, these high-throughput screening methods will be used increasingly to isolate oxidoreductases with new or improved characteristics. Acknowledgements
We thank Biotechnology Research and Development Consortium (BRDC) and Office of Naval Research for supporting our work on the characterization and engineering of oxidoreductases including a phosphite dehydrogenase and arylamine oxidases. Abbreviations
4AAP 4CN ABTS ADH BPDO D-AO DH HpbA HRP MNBDH NBT NAD(H)
4-aminoantipyrine 4-chloronaphthol 2,2-azinobis(3-ethylbenzthiazoline)-6-sulfonic acid alcohol dehydrogenase biphenyl dioxygenase D-amino oxidase dehydrogenase 2-Hydroxybiphenyl 2-monooxygenase horseradish peroxidase 4-(N-methylhydrazino)-7-nitro-2,1,3-benzooxadiazole nitroblue tetrazolium nicotinamide adenine dinucleotide (reduced form)
References
NADP(H) PCR PMS PQQ pNA TMB
nicotinamide adenine dinucleotide phosphate (reduced form) polymerase chain reaction phenazine methosulfate pyrroloquinoline quinone para-nitrophenoxy analog 3,3 ,5,5 -tetramethylbenzidine
References 1 K. FABER, Biotransformations in Organic Chemistry: a Textbook, 4th edn, Springer, Berlin, 2000. 2 A. SCHMID, J. S. DORDICK, B. HAUER, A. KIENER, M. WUBBOLTS, B. WITHOLT, Nature 2001, 409, 258–268. 3 S. K. POWELL, M. A. KALOSS, A. PINKSTAFF, R. MCKEE, I. BURIMSKI, M. PENSIERO, E. OTTO, W. P. STEMMER, N. W. SOONG, Nat. Biotechnol. 2000, 18, 1279–1282. 4 H. ZHAO, W. A. van der DONK, Curr. Opin. Biotechnol. 2003, 14, 583–589. 5 W. A. van der DONK, H. ZHAO, Curr. Opin. Biotechnol. 2003, 14, 421–426. 6 J. HANDELSMAN, M. R. RONDON, S. F. BRADY, J. CLARDY, R. M. GOODMAN, Chem. Biol. 1998, 5, R245–249. 7 C. SCHMIDT-DANNERT, Biochemistry 2001, 40, 13125–13136. 8 D. C. DEMIRJIAN, P. C. SHAH, F. MORIS-VARAS, Top. Curr. Chem. 1999, 200, 1–29. 9 J. FIBLA, R. GONZALEZDUARTE, J. Biochem. Biophys. Methods 1993, 26, 87–93. 10 K. M. MAYER, F. H. ARNOLD, J. Biomol. Screening 2002, 7, 135–140. 11 S. J. ALLEN, J. J. HOLBROOK, Protein Eng. 2000, 13, 5–7. 12 A. S. ELHAWRANI, R. B. SESSIONS, K. M. MORETON, J. J. HOLBROOK, J. Mol. Biol. 1996, 264, 97–110. 13 G. RAVOT, D. WAHLER, O. FAVRE-BULLE, V. CILIA, F. LEFEVRE, Adv. Synth. Catal. 2003, 345, 691–694. 14 T. CONWAY, G. W. SEWELL, Y. A. OSMAN, L. O. INGRAM, J. Bacteriol. 1987, 169, 2591–2597. 15 N. O. KAPLAN, S. P. COLOWICK, C. C. BARNES, J. Biol. Chem. 1951, 191, 461–472.
16 G. E. TSOTSOU, A. E. G. CASS, G. GILARDI, Biosens. Bioelectron. 2002, 17, 119–131. 17 L. H. SUN, I. P. PETROUNIA, M. YAGASAKI, G. BANDARA, F. H. ARNOLD, Protein Eng. 2001, 14, 699–704. 18 S. DELAGRAVE, D. J. MURPHY, J. L. R. PRUSS, A. M. MAFFIA, B. L. MARRS, E. J. BYLINA, W. J. COLEMAN, C. L. GREK, M. R. DILWORTH, M. M. YANG, D. C. YOUVAN, Protein Eng. 2001, 14, 261–267. 19 G. Y. YANG, A. M. SHAMSUDDIN, Histol. Histopathol. 1996, 11, 801–806. 20 M. GABLER, M. HENSEL, L. FISCHER, Enzyme Microb. Technol. 2000, 27, 605–611. 21 H. J. DANNEEL, M. ULLRICH, F. GIFFHORN, Enzyme Microb. Technol. 1992, 14, 898–903. 22 H. D. CONLON, J. BAQAI, K. BAKER, Y. Q. SHEN, B. L. WONG, R. NOILES, C. W. RAUSCH, Biotechnol. Bioeng. 1995, 46, 510–513. 23 B. VALDERRAMA, M. AYALA, R. VAZQUEZ-DUHALT, Chem. Biol. 2002, 9, 555–565. 24 N. C. VEITCH, A. T. SMITH, Adv. Inorg. Chem. 2001, 51, 107–162. 25 S. COLONNA, N. GAGGERO, C. RICHELMI, P. PASTA, Trends Biotechnol. 1999, 17, 163–168. 26 J. A. NICELL, J. K. BEWTRA, N. BISWAS, C. C. STPIERRE, K. E. TAYLOR, Can.J. Civil Eng. 1993, 20, 725–735. 27 A. IFFLAND, S. GENDREIZIG, P. TAFELMEYER, K. JOHNSSON, Biochem. Biophys. Res. Commun. 2001, 286, 126–132. 28 B. MORAWSKI, S. QUAN, F. H. ARNOLD, Biotechnol. Bioeng. 2001, 76, 99–107. 29 Z. LIN, T. THORSEN, F. H. ARNOLD, Biotechnol. Prog. 1999, 15, 467–471.
15
16 High-throughput Screening Methods for Oxidoreductases
30 J. R. CHERRY, M. H. LAMSA, P. SCHNEIDER, J. VIND, A. SVENDSEN, A. JONES, A. H. PEDERSEN, Nat. Biotechnol. 1999, 17, 379–384. 31 L. WAN, M. B. TWITCHETT, L. D. ELTIS, A. G. MAUK, M. SMITH, Proc. Natl Acad. Sci. USA 1998, 95, 12825–12831. 32 W. H. WAUGH, J. Pediatr. Hematol. Oncol. 2003, 25, 831–834. 33 P. D. JOSEPHY, T. ELING, R. P. MASON, J. Biol. Chem. 1982, 257, 3669–3675. 34 M. WILMING, A. IFFLAND, P. TAFELMEYER, C. ARRIVOLI, C. SAUDAN, K. JOHNSSON, Chembiochem 2002, 3, 1097–1104. 35 A. IFFLAND, P. TAFELMEYER, C. SAUDAN, K. JOHNSSON, Biochemistry 2000, 39, 10790–10798. 36 J. MEYER, A. BULDT, M. VOGEL, U. KARST, Angew. Chem. Int. Ed. Engl. 2000, 39, 1453–1455. 37 D. R. NELSON, T. KAMATAKI, D. J. WAXMAN, F. P. GUENGERICH, R. W. ESTABROOK, R. FEYEREISEN, F. J. GONZALEZ, M. J. COON, I. C. GUNSALUS, O. GOTOH, et al. DNA Cell Biol. 1993, 12, 1–51. 38 C. S. BUTLER, J. R. MASON, Adv. Microb. Physiol. 1997, 38, 47–84. 39 A. MEYER, A. SCHMID, M. HELD, A. H. WESTPHAL, M. ROTHLISBERGER, H. P. KOHLER, W. J. VAN BERKEL, B. J. WITHOLT, Biol. Chem. 2002, 277, 5575–5582. 40 H. SUENAGA, M. GOTO, K. FURUKAWA, J. Biol. Chem. 2001, 276, 22500–22506. 41 T. KUMAMARU, H. SUENAGA, M. MITSUOKA, T. WATANABE, K. FURUKAWA, Nat. Biotechnol. 1998, 16, 663–666. 42 H. SUENAGA, M. MITSUOKA, Y. URA, T. WATANABE, K. FURUKAWA, J. Bacteriol. 2001, 183, 5441–5444. 43 A. OKUTA, K. OHNISHI, S. HARAYAMA, Gene 1998, 212, 221–228. 44 M. KIKUCHI, K. OHNISHI, S. HARAYAMA, Gene 1999, 236, 159–167. 45 Y. TAO, A. FISHMAN, W. E. BENTLEY, T. K. WOOD, J. Bacteriol. 2004, 186, 4705–4713. 46 C. R. OTEY, J. M. JOERN, Methods Mol. Biol. 2003, 230, 141–148. 47 J. M. JOERN, T. SAKAMOTO, A. ARISAWA, F. H. ARNOLD, J. Biomol. Screen. 2001, 6, 219–223. 48 T. SAKAMOTO, J. M. JOERN, A. ARISAWA, F.
49
50
51
52 53 54
55
56 57
58
59
60 61
62
63
64 65 66
H. ARNOLD, Appl. Environ. Microbiol. 2001, 67, 3882–3887. E. T. FARINAS, U. SCHWANEBERG, A. GLIEDER, F. H. ARNOLD, Adv. Synth. Catal. 2001, 343, 601–606. Q. S. LI, U. SCHWANEBERG, M. FISCHER, J. SCHMITT, J. PLEISS, S. LUTZ-WAHL, R. D. SCHMID, Biochim. Biophys. Acta 2001, 1545, 114–121. T. SENG WONG, F. H. ARNOLD, U. SCHWANEBERG, Biotechnol. Bioeng. 2004, 85, 351–358. H. JOO, A. ARISAWA, Z. LIN, F. H. ARNOLD, Chem. Biol. 1999, 6, 699–706. H. JOO, Z. LIN, F. H. ARNOLD, Nature 1999, 399, 670–673. Q. S. LI, U. SCHWANEBERG, P. FISCHER, R. D. SCHMID, Chemistry 2000, 6, 1531–1536. L. M. NEWMAN, H. GARCIA, T. HUDLICKY, S. A. SELIFONOV, Tetrahedron 2004, 60, 729–734. C. F. THURSTON, Microbiol-Uk 1994, 140, 19–26. N. DURAN, M. A. ROSA, A. D’ANNIBALE, L. GIANFREDA, Enzyme Microb. Technol. 2002, 31, 907–931. C. JOHANNES, A. MAJCHERCZYK, A. HUTTERMANN, Appl. Microbiol. Biotechnol. 1996, 46, 313–317. M. A. PICKARD, R. ROMAN, R. TINOCO, R. VAZQUEZ-DUHALT, Appl. Environ. Microbiol. 1999, 65, 3805–3809. M. ALCALDE, T. BULTER, Methods Mol. Biol. 2003, 230, 193–201. T. BULTER, M. ALCALDE, V. SIEBER, P. MEINHOLD, C. SCHLACHTBAUER, F. H. ARNOLD, Appl. Environ. Microbiol. 2003, 69, 987–995. C. JOLIVALT, C. MADZAK, A. BRAULT, E. CAMINADE, C. MALOSSE, C. MOUGIN, Appl. Microbiol. Biotechnol. 2005, 66, 450–456. J. A. FIELD, E. DE JONG, G. Feijoo COSTA, J. de BONT, A. Appl. Environ. Microbiol. 1992, 58, 2219–2226. L. L. KIISKINEN, M. RATTO, K. KRUUS, J. Appl. Microbiol. 2004, 97, 640–646. M. ALCALDE, T. BULTER, F. H. ARNOLD, J. Biomol. Screen. 2002, 7, 547–553. F. XU, Appl. Microbiol. Biotechnol. 1996, 59, 221–230.
1
Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes Romas J. Kazlauskas University of Minnesota, Saint Paul, USA
Originally published in: Enzyme Assays. Edited by Jean-Louis Reymond. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52731095-1
1 Introduction
Color changes are the most convenient way to follow enzymatic reactions, but most enzymatic reactions do not directly cause a color change. (Reactions that consume or generate NAD(P)H are a notable exception, making them among the most widely used enzyme assays.) Chromogenic substrates conveniently introduce a color change into many reactions, but at the cost of using an artificial rather than a true substrate. One of the best solutions is to use the true substrate and then to couple the reaction of the substrate to a color change reaction. Many such coupled reaction assays have been developed. The focus in this chapter will be on reactions linked to pH changes, which are conveniently detected using pH indicators. For many applications, substrate selectivity is the characteristic that distinguishes a useful enzyme from a non-useful one. To measure substrate selectivity, one must compare the reactions of two substrates. There are two ways to do this using color change reactions – separately or in the same reaction mixture. Measuring the rates of reactions of two substrates separately and then comparing these rates is the simplest way to measure selectivity. However, this approach only yields an estimate of selectivity. Enzyme selectivity stems from differences in the binding of the two substrates to the active site and from differences in the reaction rates of the two substrates. Measuring the reaction rate of each substrate separately may overlook differences in the binding of the two substrates to the active site. Measuring selectivity by simultaneously measuring the reaction rates of both in the same reaction mixture seems at first to be a very special case, which would
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
rarely apply to real world examples. However, by using a reference compound and two measurements, this approach can be applied to a wide range of compounds and, unlike separate measurements, can yield accurate selectivity values. The idea is to first measure the selectivity of the one true substrate and the reference compound. This reference compound is chosen to be convenient to detect and may be a chromogenic substrate. Reaction of the true substrate may be detected by one color (e.g. a pH indicator), while reaction of the reference compound is detected by another color (the chromophore released upon reaction). This competitive experiment gives the true selectivity for this pair of substrates. The second step is to measure the selectivity of the second substrate toward the same reference compound. This yields a second true selectivity. Finally, since the reference compound was the same in both experiments, dividing the two gives the correct selectivity for the two true substrates. We refer to such colorimetric methods where the substrate selectivity is determined not by direct competition of the substrates with each other but in a two-step procedure by comparing reaction of each substrate with a reference compound, as “Quick selectivity methods.” This chapter will provide examples of using this Quick selectivity approach to measure enantioselectivity, diastereoselectivity as well as substrate selectivity.
2 Direct Assays Using Chromogenic Substrates
The simplest and most convenient assays are those involving a chromogenic substrate, that is, one where the reaction yields a product of a different color from the starting material. The most common chromogenic substrates to assay hydrolase activity are the p-nitrophenyl derivatives (Figure 1). In each case the p-nitrophenyl derivatives mimic the natural substrate of the hydrolase while adding a p-nitrophenyl moiety at the cleavage site. For lipases, phospholipases and esterases, these are esters of p-nitrophenol (e.g. 1, 2) [1]; for proteases, these are amides of p-nitroaniline (e.g. 3, 4) [2], and for glycosidases, these are glycosides of p-nitrophenol (e.g. 5) [3]. Other common chromogenic substrates are resorufin or 1-naphthol derivatives (Figure 2). Resorufin analogs have been used previously to spectrophotometrically monitor the activity of enzymes including galactosidases [4], cellulases [5], lipases [6], proteases, esterases, and phospholipases [7] and are particularly useful for detecting activity at low enzyme concentrations due to the intense absorption of resorufin, with an extinction coefficient approximately 70000 M−1 cm−1 . The 1naphthol derivatives are used for activity staining of gels (e.g. 6). Hydrolysis yields 1-naphthol (7), which reacts with a diazo compound (Fast Red 8) to yield red-brown insoluble precipitate (9). The disadvantage of chromogenic substrates is that they only mimic the true substrate of interest. One is never sure whether the true substrate will behave the same way as the chromogenic mimic.
3 Indirect Assays Using Coupled Reactions – pH Indicators
Fig. 1 Typical p-nitrophenyl derivatives for chromogenic assay of hydrolase activity. Each substrate includes a p-nitrophenyl moiety in the normal substrate in such a way that hydrolysis releases p-nitrophenol or p-nitroaniline, which can be detected by their yellow color. The wavy line marks the bond cleaved during hydrolysis.
3 Indirect Assays Using Coupled Reactions – pH Indicators
To detect reaction of a substrate lacking a chromogenic moiety, it is necessary to convert a product of the reaction into a colored compound. For example, hydrolysis of an acetate ester by a lipase or esterase can be detected using a coupled enzyme assay [8]. Several sequential enzymatic reactions convert acetate to citrate accompanied by the formation of NADH, which absorbs at 340 nm. Another example, suitable for monitoring enzyme-catalyzed transesterifications in organic solvent, detects the acetaldehyde released upon transesterification of an alcohol (11) with a vinyl ester (10)
3
4 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
Fig. 2 Action of an esterase releases 1-naphthol (7) which reacts with the diazo compound Fast Red B (8) to give a water-insoluble azo dye (9).
to form ester (12) and acetaldehyde [9]. The acetaldehyde reacts with 4-hydrazino7-nitro-2,1,3-benzoxadiazole (13) to yield a strongly fluorescent compound (14) (Figure 3). Both of the examples above are limited to certain substrates: acetate esters in the first case, transesterifications with vinyl esters in the second case. A more general assay relies on pH indicators. Hydrolysis of an ester at neutral pH, for example, solketal butyrate (15) (butanoate ester of 2,2-dimethyl-1,3-dioxolane4-methanol) to form solketal (16) below, releases a proton (Scheme 1). Detecting this proton release with a pH indicator allows one to monitor a wide range of ester hydrolyses. Researchers have used pH indicators to monitor the progress of enzyme-catalyzed reactions that release or consume protons since the 1940s [10, 11]. For example, researchers have monitored reactions catalyzed by amino acid decarboxylase [12], carbonic anhydrase [13], cholinesterase [14], hexokinase [15, 16], and ester hydrolysis by proteases [17]. In some cases, researchers used a pH indicator assay qualitatively, but in other cases, the color change was proportional to the number of protons released. This quantitative use requires a calibration with additional experiment or a careful choice of reaction conditions.
Fig. 3 Assaying the acylation of an alcohol with a vinyl ester in organic solvent relies on the detection of acetaldehyde.
3 Indirect Assays Using Coupled Reactions – pH Indicators
Scheme 1
3.1 Overview of Quantitative Use of pH Indicator Assay
The most important variables in the pH indicator assay are the choices of buffer and pH indicator [18]. Both the buffer and the indicator must have the same affinity for protons (pK a values within 0.1 unit of each other) so that the relative amount of buffer protonated and indicator protonated stays constant as the pH shifts during the reaction. A difference in pK a of 0.3 units causes an 8% error when the pH changes by 0.1 unit [12]. In a typical assay, the pH changes by 0.05 units (10% hydrolysis of the substrate), thus, differences in pK a values can lead to nonlinear and inaccurate rates. If different pK a values cannot be avoided, accurate results can still be obtained by using calibration experiments [10] or a more complex equation [12]. If the pH indicator is the only species in the reaction solution that reacts with the released protons, then the rate of hydrolysis is equal to the rate of formation of the protonated form of the indicator. The color change in the solution reveals the protonation of the indicator as shown in Eq. (1), where A/dt is the change in absorbance of the solution, ε is the difference in the extinction coefficient of the protonated and nonprotonated forms of the indicator and l is the pathlength of the solution. rate=[indicator·H+ ]/time=
A/dt ε·l
(1)
In practice, solutions containing only indicator are difficult to use because the color changes too readily due to small variation in conditions. For this reason, researchers usually add some buffer to the reaction mixture. Under these conditions, the rate of hydrolysis equals the rate of formation of the sum of the protonated indicator and the protonated buffer (Eq. (2)). rate=[indicator·H+ ]/time+[buffer·H+ ]/time
(2)
When the pK a values of the indicator and buffer are the same, then released protons partition between the indicator and buffer according to the ratio of their concentrations (Eq. (3)) [6, 7, 19]. The highest sensitivity (largest dA/dt) occurs with less buffer. The protons released add to either the buffer, giving no color change, or to the indicator, giving a color change. Thus, lowering the buffer concentration or increasing the indicator concentration increases the sensitivity of the assay. When
5
6 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
Fig. 4 Buffer/indicator pairs with similar pK a values that may be suitable for screening at pH 6, 8, or 9.2.
the buffer to indicator concentration is high, the (1+[buffer]/[indicator]) term can be replaced with ([buffer]/[indicator]): rate=
[buffer] A/dt A/dt [buffer] A/dt + = (1+ ) ε·l [indicator] ε·l ε·l [indicator]
(3)
Since most hydrolases have maximal activity near neutral pH, we developed the assay for pH 7.2. As a pH indicator, we chose 4-nitrophenol 17 (Figure 5). The similarity of its pK a (7.15) to the pH of the reaction mixture ensures that changes in pH give a large and linear color change [6]. The large difference in the extinction coefficients of the protonated and deprotonated forms (200 versus 18000 M−1 cm−1 at 404 nm) gives good sensitivity. Finally, nitrophenols bind less to proteins than some polyaromatic indicators [20]. The concentration of the pH indicator should be as high as possible to maximize sensitivity (see Scheme 1). In our assay, the high initial absorbance of 4-nitrophenoxide/4-nitrophenol limited the concentration to 0.45 mM. (The pathlength in a 96-well plate depends on the volume of the solution in the well since the light passes from the top of the plate through the solution. Thus, the maximum indicator concentration varies with the solution volumes. With other acid-base indicators, poor water solubility can also limit the maximal concentration.) This concentration gave a starting absorbance of ∼1.2 where the accuracy is still not compromised by low light levels. As a buffer, we chose BES (N,N-bis[2-hydroxyethyl]-2-amino-ethanesulfonic acid) (18) because its pK a (7.15) [21] is identical to that of 4-nitrophenol (17). We chose a buffer concentration
3 Indirect Assays Using Coupled Reactions – pH Indicators
of approximately 5 mM as a compromise between low buffer concentrations to maximize sensitivity (see Scheme 1), and high buffer concentrations to ensure accurate measurements and small pH changes throughout the assay (<0.05 pH units for 10% hydrolysis at our conditions). The small pH changes are important because kinetic constants can change with changing pH. Other buffer/indicator pairs may be suitable for screening at other pHs. For example, at pH 6 chlorophenol red (19) (pK a 6.0) and MES (2-[N-morpholino]ethanesulfonic acid, pK a 6.1) (20) may be suitable; at pH 8 phenol red (21) (pK a 8.0) and EPPS (N-[2-hydroxylethyl]piperazine-N -[3-propanesulfonic acid], pK a 8.0) (22) may be suitable; at pH 9 thymol blue (pK a 9.2) (23) and CHES (2-[Ncyclohexylamino]ethanesulfonic acid, pK a 9.3) (24) may be suitable (Figure 5).
Fig. 5 4-Nitrophenol (17) as a pH indicator to detect hydrolysis of esters. Hydrolysis of an ester releases a proton, which protonates the yellow 4-nitrophenoxide ion. The graph shows experimental data for hydrolysis of solketal butyrate (15) by several hydrolases.
The inset shows the linear loss of yellow color as a function of time, which yields the rate of reaction using Eq. (3). The graph shows that the measured rate of hydrolysis increases with increasing enzyme amount as expected. (Data from [18]).
7
8 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
Suitable substrate concentrations ranged from 0.5 to 2 mM, typically 1 mM. At substrate concentration below 0.5 mM, the absorbance changes are too small to be detected accurately. For example, hydrolysis of 5% of a 0.25 mM substrate concentration at our standard conditions (pH 7.2, 0.45 mM 4-nitrophenol (17), 5 mM BES (18)), changes the absorbance by only 0.005 absorbance units. Solubility in water sets the upper limit of substrate concentration because spectrophotometric measurements require clear solutions. Typical organic substrates dissolve poorly in water, so we added organic cosolvent – 7 vol% acetonitrile. For very insoluble substrates, we prepared clear emulsions using detergents. The assay tolerates small changes in reaction conditions, such as the addition of 7% acetonitrile. Indeed, the pK a of 4-nitrophenol (17) changes only slightly from 7.15 to 7.17 upon addition of 10% ethanol. This result suggests that cosolvent concentrations below 10% do not compromise the accuracy of the assay. Also, small amounts of salts present in the hydrolase solutions (buffer salts in commercial hydrolase preparations, 2 mM CaCl2 in the protease solutions) did not affect the accuracy. 3.2 Applications
The pH indicator-based assay is a quick way to measure whether an enzyme accepts a particular substrate. This measurement quickly eliminates unsuitable enzymes or unsuitable substrates from more complicated subsequent experiments. It is most useful with unusual substrate/enzyme combinations where only a few are likely to work. 3.2.1 Searching for an Active Hydrolase (Testing Many Hydrolases Toward One Substrate) One application of this assay is for a rapid survey of commercial enzymes to find those that accept an unnatural or unusual substrate. In one project, we screened a library of 100 commercial hydrolases to find the two that catalyzed hydrolysis of hindered spiro compound 25 that was useful as a chiral auxiliary [22]. This screening identified several candidate hydrolases. Subsequent confirmation of this activity and optimization of the reaction led to a subtilisin-catalyzed preparative scale resolution of this auxiliary (Scheme 2).
Scheme 2
3 Indirect Assays Using Coupled Reactions – pH Indicators
Scheme 3
In a second project, we screened a library of 72 commercial hydrolases (lipases, esterases, and proteases) towards solketal butyrate (15) (see Scheme 1), a chiral building block [18]. Most hydrolases reacted with this less hindered substrate so this initial screen eliminated only 20 of the 72 candidate enzymes from the next stage of screening. In a third project, we again assayed a library of 100 hydrolases to find the few that catalyzed hydrolysis of the amide link in N-acyl sulfinamides (27) to form sulfinamides (28) [23]. Although hydrolysis of amides usually does not work in this assay because it does not release a proton, the N-acyl sulfinamide (27) was a special case. The sulfinamide (28) released is not a strong base like an amine and does not take up the proton released from the ionization of the acid, so the pH indicator assay was suitable for this special amide hydrolysis (Scheme 3). Other researchers recently used similar pH indicator assays to measure the activity of glycosyl transferase [24], haloalkane dehalogenase [25], and nitrilase [26]. 3.2.2 Substrate Mapping of New Hydrolases (Testing Many Substrates Toward Hydrolase) Another application of this assay is mapping the substrate range of newly discovered hydrolases, such as those from thermophiles. Advances in microbiology and molecular biology have made hundreds of new enzymes available from unusual microorganisms such as thermophiles. These enzymes may be useful for organic synthesis because they allow the use of higher temperatures, where reactions are faster and substrates are more soluble [27]. In addition, these enzymes may also tolerate other unusual conditions such as high concentrations of organic solvents. To apply these new enzymes in organic synthesis, one needs some idea of their substrate range and selectivity. The pH indicator-based assay is quick way to map the substrate range of new enzymes. Recently, we, and others, used this pH indicator assay to map over 20 different esterases using the substrate library of more than 50 esters (Figure 6) [28–30]. This substrate mapping first grouped each substrate/esterase combination as slow (≤1 mmol ester hydrolyzed mg−1 protein min−1 ), good (1–10) or very good (>10). The 19 esterases and 31 substrates gave a total of 589 substrate/esterase combinations. Examination of these data yielded several conclusions. The best substrates were activated esters, which were also chemically the most reactive in the library: vinyl esters (30) and phenyl esters (32). Among the vinyl and ethylesters with different chain lengths, most of the esterases appear to favor the hexanoate (36) or octanoate (37), while one esterase (E018b) appeared to strongly favor acetate esters (34). Esters with a sterically hindered acyl group (vinyl pivaloate 30, R=tbutyl or vinyl benzoate 30, R=Ph) and polar esters, e.g. methyl 2-hydroxyacetate 31, R=CH2 OH, were usually poor substrates. Three pairs of esterases (E010/E011,
9
10 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
Fig. 6 Examples of esters used to survey the substrate selectivity of esterases.
E013/E014, E019/E020) showed similar reaction rates with all substrates and therefore might be very similar or even identical enzymes. Thus, this initial substrate mapping provided a good overview of substrate range and ideas for the further refinement of these general conclusions. This further refinement usually involves a quantitative measure of substrate selectivity, where one includes reference compounds in the screening (see below). 3.3 Comparison with Other Methods
There are several advantages with this screening method using pH indicators. First, it is hundreds of times faster than conventional screening. The 96-well format allows the analysis of large numbers of samples simultaneously. The method is quantitative, unlike screening for hydrolytic activity by thin-layer chromatography (TLC). Second, since the entire reactions and analyses take place in the microplate wells, workup and analysis by gas chromatography (GC), high-performance liquid chromatography (HPLC) or nuclear magnetic resonance (NMR) is avoided. Third, it requires hundreds to thousands of times less substrate (typically 20 mg per well) and hydrolase (we used between 0.6 and 35 mg protein per well). Fourth, this assay measures the hydrolysis of any ester, not just chromogenic esters. The
4 Estimating and Measuring Selectivity
most important rule of screening is “You get what you screen for,” so the ability to screen the target compound, not an analog of the target compound, is an important advantage. There are also few disadvantages with our screening method. This assay is approximately seven times less sensitive than one using hydrolysis of 4-nitrophenyl esters (e.g. 1) (Figure 1). For example, if the rate of hydrolysis towards a nonchromogenic ester and a 4-nitrophenyl ester were identical, then our assay would require seven times more hydrolase to observe the same change in absorbance. The assay with 4-nitrophenyl esters releases one molecule of 4-nitrophenol (17) (53% of these will be deprotonated at pH 7.2), while our assay protonates one 4-nitrophenoxide for every 12 protons released. Second, this assay requires clear solutions. To obtain clear solutions with water-insoluble substrates, experimentation is sometimes required to find the best cosolvent or emulsion conditions. Third, the pH indicator method requires a reaction that generates or consumes protons, so it cannot be used for most amide hydrolysis or glycoside hydrolysis.
4 Estimating and Measuring Selectivity
One of the most useful characteristics of enzymes is their selectivity. The true selectivity of an enzyme toward a pair of substrates is the ratio of the specificity constants (kcat /K m ) for each substrate [31]. For example, the enantioselectivity (or enantiomeric ratio, E) of an enzyme is the ratio of the specificity constants for the enantiomers (Eq. (4)). Enantiomeric ratio=E =
(kcat /K m )fast enantiomer (kcat /K m )slow enantiomer
(4)
Other examples of selectivity are substrate selectivity (e.g. selectivity for different acyl chain length among esters), regioselectivity (e.g. selectivity for one ester of several esters in a nucleoside derivative), and diastereoselectivity (e.g. selectivity for a cis isomer over a trans isomer). It is rarely convenient to measure selectivity by measuring the kinetic parameters for each substrate. Such measurement requires multiple kinetic studies, which are slow and tedious. In addition, combining four such measured values, each with an uncertainty of measurement, yields selectivities with large uncertainties. The best methods to measure selectivities are competitive experiments. A mixture of both substrates competes for the enzyme active site and is converted to product. Measuring the amounts of each product formed reveals the selectivity. One such method is the enantioselectivity determination method developed by C.J. Sih’s group [32]. This method is the “gold standard” to which other methods are compared. To measure E, researchers run a test resolution, work up the reaction and measure two of the following: enantiomeric purity of the starting material
11
12 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
(ees), enantiomeric purity of the product (eep), or conversion (c). The difficulty with this method is mainly the measurement of enantiomeric purity, which can be time-consuming. Screening hundreds of commercial enzymes or cultures of microorganisms by this method is difficult without laboratory automation. 4.1 Estimating Selectivity without a Reference Compound
In some cases, a qualitative estimate of selectivity is sufficient. In these cases, the measured rates of reaction in two experiments are simply compared (Eq. (5)). For example, we estimated the enantioselectivity of several lipases toward 4-nitrophenyl2-phenylpropanoate (57) by measuring the initial rates of hydrolysis of each enantiomer (Table 1). The true E-values came from a competitive experiment where the enantiomeric purity of the products and starting materials were measured. The estimated E came from a simple comparison of the initial rate of reaction for the two enantiomers separately. In this case, a pH indicator was not needed to detect hydrolysis because hydrolysis yields the yellow p-nitrophenoxide directly. The agreement is qualitatively good since the estimated E identified enantioselective Table 1 True enantioselectivity and estimated enantioselectivity of several hydrolases toward
4-nitrophenyl 2-phenylpropanoate (57). (All data from [34])
4 Estimating and Measuring Selectivity Table 2 True enantioselectivity and estimated enantioselectivity of several hydrolases toward
solketal butyrate (15). (All data from [18]) Hydrolase
True Ea
Estimatedb
Horse liver esterase Rhizopus oryzae lipase Cutinase Candida rugosa lipase Esterase E013
15 5.0 5.0 3.0 1.0
12 11 2.3 4.9 1.0
a True E was measured by hydrolysis to approximately 40% conversion of a racemic sample of solketal butyrate (15) with the indicated hydrolase followed by measurement of the enantiomeric purity of the product and remaining starting material by gas chromatography using a chiral stationary phase. b Estimated E was measured by comparing the initial rate of hydrolysis of pure enantiomers of solketal butyrate (15) separately. Initial rates were measured using a pH indicator (4-nitrophenol 17) to detect release of acid upon hydrolysis of solketal butyrate (15). With the exception of horse liver esterase, all enzymes favored (R)-solketal butyrate.
and nonenantioselective enzymes. However, there is clearly no quantitative agreement. (The Quick E-values, which do agree quantitatively with the true E-values will be discussed below.) estimatedE =
rate of fast enantiomer measured separately rate of slow enantiomer measured separately
(5)
In another example, we estimated the enantioselectivity of the 52 hydrolases toward solketal butyrate (15) by separately measuring the initial rates of hydrolysis of the pure enantiomers. We used pH indicators to measure the rates of hydrolysis and then presented the ratio of these rates as an estimated enantioselectivity [33]. A few typical results are shown in Table 2. The true enantioselectivity and the estimated enantioselectivity agreed to within a factor of 2.3. This qualitative screening identified horse liver esterase as a new hydrolase for the resolution of solketal butyrate (15) with modest enantioselectivity. In a last example, estimated selectivities differed so much from the true selectivities that they were useless. In this case we estimated the diastereoselectivity of 91 commercial hydrolases towards the cis-(2S, 4S) and trans-(2R, 4S) dioxolanes (59a/b) by measuring the initial rates of hydrolysis of the two pure diastereomers separately. The ratio of the two rates is the estimated diastereoselectivity (Table 3). Three hydrolases – cholesterol esterase, α-chymotrypsin and subtilisin from Bacillus licheniformis – showed excellent (>100) estimated diastereoselectivity. Unfortunately, the true diastereoselectivity was much lower (4.4–17), so this estimated selectivity screening was not accurate enough to identify selective enzymes. The main advantages of measuring estimated selectivity are speed and simplicity, while the main disadvantage is the risk that the estimate is so far off from the true value that it is misleading. Typical screening and data workup times for a library of 100 enzymes are several hours. In practice, we often include an estimated selectivity screening before the quantitative screen involving a reference compound
13
14 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
(see below) to group the substrates into fast-, medium-, and slow-reacting. This information helps choose a good reference compound. 4.2 Quantitative Measure of Selectivity Using a Reference Compound (Quick E and Related Methods)
These estimated selectivities are only estimates because they do involve competition between the substrates. By measuring initial rates of the two substrates separately, we eliminate competitive binding between the two diastereomers. Both kcat and K m contribute to the overall selectivity of an enzyme, thus eliminating competitive binding can lead to inaccuracies, as the example in Table 3 above showed. To understand the contribution of binding to selectivity, imagine a hypothetical case where both substrates have the same kcat values, but different K m values. In a competitive experiment, the enzyme will bind and transform the better binding substrate. The reaction is selective. However, if the hydrolysis of the two substrates occurs in separate vessels, the result depends on the substrate concentration. At saturating amounts of substrate, both reactions will proceed at the same rate and estimated selectivity will incorrectly indicate that the reaction is nonselective. At substrate concentrations well below K m , the estimated selectivity will be close to the true selectivity. One can imagine other combinations of reaction conditions and kinetic parameters that can lead to either overestimation or underestimation of the true selectivity. Table 3 True diastereoselectivity, estimated diastereoselectivity and Quick D of several
hydrolases toward diastereomers of methyl (4S)-2-benzyloxy-1,3-dioxolane-4-carboxylate (59a/b). (All data from [37])
4 Estimating and Measuring Selectivity
Measuring the true selectivity requires a competitive experiment. Adding two substrates to the reaction mixture creates a competitive experiment, but complicates analysis. One must be able to distinguish the reaction of each substrate individually. Methods like HPLC or GC can distinguish the two substrates, but are much slower than colorimetric methods. An alternative is to use one true substrate, whose hydrolysis can be measured by the pH indicator and a second, chromogenic reference compound, whose hydrolysis can be measured by formation of a color different from the pH indicator. Such an experiment would yield the correct selectivity for true substrate 1 and the reference compound (Eq. (6)). rate of substrate 1 reaction [reference] substrate 1 selectivity= · reference rate of reference reaction [substrate 1]
(6)
In a second experiment, a second substrate competes against the same reference compound to yield the correct selectivity for true substrate 2 and the reference compound (Eq. (7)). rate of substrate 2 reaction [reference] substrate 2 selectivity= · reference rate of reference reaction [substrate 2]
(7)
Finally, dividing these two selectivities yields the desired selectivity of substrate 1 versus substrate 2 (Eq. (8)). substrate 1 selectivity substrate 1 reference selectivity= substrate 2 substrate 2 selectivity reference
(8)
We define Quick selectivity methods (Quick E for enantioselectivity, Quick D for diastereoselectivity, Quick S for substrate selectivity) as methods that, instead of letting the substrates compete directly against one another, do so indirectly by competing each substrate against a reference compound. 4.2.1 Chromogenic Substrate In the first Quick E experiments, we used chromogenic substrates and a chromogenic reference compound [34]. Hydrolysis of pure enantiomers of 4-nitrophenyl 2-phenylpropanoate (57) liberates the yellow p-nitrophenoxide ion (7). The increase in absorbance at 404 nm revealed the initial rates of hydrolysis of each enantiomer, but the ratio of these rates gave only an enantiomeric ratio (see Table 1). The ratio of rates over- or underestimated E by as much as 70%. To reintroduce competition, we added resorufin tetradecanoate (58) as a reference compound. We monitored the initial rates of hydrolysis of 4-nitrophenyl (S)-2-phenylpropanoate ((S)-57) at 404 nm and the reference compound 58 at 572 nm in the same solution (Figure 7).
15
16 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
Fig. 7 The first step of the Quick E measure of enantioselectivity of lipases toward 4-nitrophenyl 2-phenylpropanoate (57). Lipase-catalyzed hydrolysis of 4-nitrophenyl (S)-2-phenylpropanoate ((S)-57) and the reference compound, resorufin tetradecanoate (58), yields yellow and pink chromophores,
respectively. The solution turns deep orange if both substrates are hydrolyzed; pink if only the reference compound is hydrolyzed. The second step of the Quick E is the same, except that it uses the (R)enantiomer of the chiral ester. Equations (6) to (8) yield the enantioselectivity.
Two experiments, one with (S)-enantiomer of 57 competing against the reference compound 58 and the second with the (R)-enantiomer of 57 competing against the reference compound 58 yielded the data to calculate the Quick E-values according to Eqs (6) to (8). These Quick E-values agreed with the values measured by the slower methods involving direct competition of the enantiomers (R)- or (S)-57 and analysis of the enantiomers by HPLC (Table 1). We measured low (E = 1.4), average (E = 27), and excellent (E = 210) enantioselectivities correctly by this technique. Each hydrolysis experiment requires 30 s; thus, the measurement time for E was only 1 min.
4.2.2 pH Indicators To apply the Quick E ideas to nonchromogenic substrates, we used pH indicators to detect the hydrolysis of the substrate. The reference compound remained a chromogenic resorufin ester, acetate (60). We first demonstrated this to measure diastereoselectivity of the dioxolane stereoisomers 59a/b in Table 2 (Figure 8). The Quick D measurements agreed with the true values determined by NMR using a direct competitive hydrolysis of the two diastereomers.
4 Estimating and Measuring Selectivity
Fig. 8 The first selectivity step of measuring the Quick D diastereoselectivity of a hydrolase toward the 2R and 2S diastereomers of methyl (4S)-benzyloxy-1,3-dioxolane-4carboxylate (59a/b). Enzyme-catalyzed hydrolysis of trans-(2R, 4S) diastereomers (57b) in the presence of the reference compound, resorufin acetate (60), releases protons which produce a decrease in the yellow absorbance of the pH indicator, and a brilliant pink chromophore, resorufin anion 61. Both of the absorbance changes, at 404 nm and 574 nm respectively, can be monitored simultaneously. The second step of the Quick D uses the cis-(2S, 4R) diastereomer (59a). The graph shows experimental
data for the Quick D measurement with bovine pancreatic protease (Quick D=12.8). The selectivity ratio of the trans-(2R, 4S) diastereomer (59b) versus resorufin acetate (60) is 1.64 (Eq. (6)) while the selectivity ratio of the cis-(2S, 4S) diastereomer (59a) versus resorufin acetate (60) is 0.128 (Eq. (7)). The ratio of the two selectivity ratios equals the diastereoselectivity, 12.8 (Eq. (8)). Conditions in the well during assay: 2.0 mM cis or trans dioxolane methyl ester (59), 0.1 mM resorufin acetate (60), 4.65 mM BES (18), 0.434 mM 4-nitrophenol (17), 0.86% Triton X-100, 7% acetonitrile. (Data from [34]).
17
18 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
Fig. 9 Acyl length selectivity of thermophile esterases toward vinyl esters. Darker squares correspond to higher activity. Most esterases favor either vinyl hexanoate or octanoate, but E018b and AcE (acetyl esterase from orange peel) favor vinyl acetate.
To show selectivity clearly, the absolute enzyme activities were scaled as indicated. This grayscale array representation was created as described by Reymond and coworkers [38]. AChE=acetylcholine esterase. (Data from [30]).
4.3 Application 4.3.1 Substrate Mapping of Hydrolases Mapping the substrate selectivity of hydrolases using Quick S screening was dramatically faster than traditional methods. For example, the mapping of the acyl chain length selectivity of a group of esterases from thermophiles confirmed the high acetyl preference of one esterase (E018b) (Figure 9). We determined the true selectivity of the thermophile esterases toward different acyl chain length using a competitive experiment usually using resorufin acetate (60) as the reference compound. However, accurate measurement of the selectivities requires that the substrate and the reference compound react at comparable rates to measure both reaction rates accurately. When the substrate hydrolysis was much slower than resorufin acetate (60) hydrolysis, we replaced resorufin acetate (60) with the slower reacting resorufin pivaloate (62) or resorufin isobutyrate (63) as reference compounds and extended the measurement time. The trends for the true selectivities were similar to the estimated selectivities. Among the vinyl esters, the hexanoate or octanoate was the best substrate for all except E018b, where the acetate was the best substrate. We confirmed this high acetyl selectivity of E018b by a direct competition between vinyl butyrate and acetate monitored by 1 H-NMR. The resonances of both substrate vinyl esters, and the products, butyric and acetic acid, were monitored over time and revealed a 17-fold acetyl preference. In comparison the Quick S measurements showed a 24-fold acetyl preference. Surprisingly, acetyl esterase from orange peel showed only a fourfold preference for acetyl. 4.3.2 Screening of Mutants in Directed Evolution An important application of Quick E screening was search for more enantioselective mutants in a library of randomly mutated esterases. For this application, the ability to measure enantioselectivity accurately was important to find those mutants
4 Estimating and Measuring Selectivity
Fig. 10 Mutants of Pseudomonas fluorescens esterase generated by random mutagenesis of the entire protein and ordered from highest to lowest enantioselectivity. The flat central part of the curve represents colonies
where there was little or no change in enantioselectivity because they either contained no mutations or mutations that did not affect enantioselectivity.
that only increased enantioselectivity moderately. To increase the enantioselectivity of PFE toward hydrolysis of methyl 3-bromo-2-methylpropanoate (MBMP) (64) (E WT = 12 favoring (S)-64), we used random mutagenesis by an Escherichia coli mutator strain [35]. Screening of 288 crude cell lysates with Quick E revealed that most of the mutants had enantioselectivities near that of the wild type (Quick E = 12), but one mutant, MS6-31, showed significantly higher enantioselectivity (Quick E = 21) (Figure 10 and Table 4). DNA sequencing revealed a C689 T transition, which changes Thr230 to isoleucine. This mutant showed E = 19 in a scale-up reaction in good agreement with Quick E-value. For this and the Thr230Pro mutant in Table 4, the differences between Quick E and the true E are within experimental error even though we measured Quick E-values on cell lysates, but measured endpoint method values on purified enzyme. The next generation of this directed evolution project involved focusing mutations closer to the substrate-binding site [36]. The much more enantioselective mutants at positions 28 and 121 identified in this project also highlighted a weakness of the Quick E screening method. The Quick E-values proved more difficult to measure because the rate reaction of the slow enantiomer was so slow that it was difficult to measure accurately. First, we increased the concentration of substrate, but solubility limited how much we could add. We could not simply add more enzyme because then the reference compound reacted too quickly. Our solution was a slower reacting reference compound – resofurin isobutyrate (63) – in place of resorufin acetate (60) (Figure 11). This reference compound 63 reacted about seven times more slowly with Pseudomonas fluorescens esterase than the acetate 60. With new reference compound 63, we could measure the higher enantioselectivities (Table 4). The errors for measuring the higher enantioselectivity remained higher than that for low enantioselectivity. For this application, this accuracy was sufficient to identify the highly enantioselective mutants. For applications that require measuring high enantioselectivities accurately with Quick E, we are developing even slower-reacting reference compounds.
19
20 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes Table 4 Enantioselectivities of wild-type and selected PFE mutants toward methyl 3-bromo-2-
methylpropanoate (MBMP) (64)
4.4 Advantages and Disadvantages
The main advantage of Quick E is speed, which comes from four differences between Quick E, which involves competition with a reference compound, and true E methods, which involve direct competition of the two enantiomers. First, and most important, the Quick E screening eliminates the need to measure enantiomeric purity. Measuring enantiomeric purity for substrate mapping would require several different analytical methods and probably several different chiral HPLC and GC columns. Each measurement would require at least an hour. Second, Quick E screening measures only initial rates (>5% conversion) so there is no need to wait for the reaction to reach 30 or 40% conversion. Third, the colorimetric
Fig. 11 Resorufin isobutyrate (63) reacts approximately seven times slower than resorufin acetate (60), which makes it a better reference compound for slow-reacting substrates.
References
methods allow us to follow 96 reactions simultaneously using a microplate reader. Fourth, the small scale of the reaction (typically 0.1 mmol) allows us to use more enzyme per mole of substrate and thus the reaction is faster. These advantages all contribute to making Quick E a much faster method than true E methods. Quick E is ideal for rapidly measuring the enantioselectivity of large numbers of samples and it is based on the same equations and assumptions as the true E methods. One disadvantage of Quick E is that it requires pure enantiomers, albeit in small amounts. In many cases, small amounts are available from preparative HPLC with chiral stationary phases or other methods. Another disadvantage is the need for clear aqueous solutions of substrate. Dissolving hydrophobic substrates in aqueous solution, even with the help of surfactants, is sometimes difficult. Measuring high enantioselectivity is challenging for both the true E methods and the Quick E methods. For the true E methods, measuring high enantioselectivity requires the slow-reacting enantiomer to be detected accurately. For the Quick E methods, measuring high enantioselectivity requires the rates of reaction of the slow-reacting enantiomer and a reference compound to be measured simultaneously. This measurement will normally require a slower reacting reference compound. Another limit to measuring high enantioselectivity with Quick E is the enantiomeric purity of the starting slow enantiomer. If the slow-reacting enantiomer is contaminated with the fast-reacting one, then the measured Quick E-value will be lower than the true E.
Acknowledgments
I am grateful to Lana E. Janes, who developed the pH indicator screening and Quick E during her PhD studies at McGill University and to the coworkers who tested, refined, and applied these methods to screening projects in biocatalysis: A.C. L¨owendahl, A. Man Fai Liu, N. Somers, D. Yazbeck, V. Magloire, G. Horsman, P. Mugford, and K. Morley. The Quick E research was funded by the Petroleum Research Fund administered by the American Chemical Society and the Natural Sciences and Engineering Research Council (Canada).
References 1 S. KURIOKA, M. MATSUDA, Anal. Biochem. 1976, 75, 281–289. 2 L. D. GRAHAM, K. D. HAGGETT, P. A. JENNINGS, D. S. LE BROCQUE, R. G. WHITTAKER, Biochemistry 1993, 32, 6250–6258. 3 T. L. ARNALDOS, M. L. SERRANO, A. A. CALDERON, R. MUNOZ, Phytochem. Anal. 1999, 10, 171–174.
4 J. HOFMANN, M. SERETZ, Anal. Chim. Acta 1984, 163, 67–72. 5 G. G. GUILBAULT, A. N. J. HEYN, Anal. Lett. 1967, 1, 163–171. 6 T. VORDERWU¨ LBECKE, K. LIESLICH, H. ERDMANN, Enzyme Microb. Technol. 1992, 14, 631–639. 7 D. N. KRAMER, G. G. GUILBAULT, Anal. Chem. 1964, 36, 1662–1663.
21
22 Quantitative Assay of Hydrolases for Activity and Selectivity Using Color Changes
8 M. BAUMANN, R. STURMER, U. T. BORNSCHEUER, Angew. Chem. Int. Ed. Engl. 2001, 40, 4201–4204. 9 M. KONARZYCKA-BESSLER, U. T. BORNSCHEUER Angew. Chem. Int. Ed. Engl. 2003, 42, 1418–1420. 10 M. J. WAJZER, C.R. Hebd. Seances Acad. Sci. 1949, 229, 1270–1272. 11 R. A. JOHN in Enzyme Assays, eds R. EISENTHAL, M.J. DANSON, IRL, Oxford, 1992, pp. 81–82. 12 R. M. ROSENBERG, R. M. HERREID, G. J. PIAZZA, M. H. O’LEARY, Anal. Biochem. 1989, 181, 59–65. 13 B. H. GIBBONS, J. T. EDSALL, J. Biol. Chem. 1963, 238, 3502–3507. 14 O. H. LOWRY, N. R. ROBERTS, M.-L. WU, W. S. HIXON, E. J. CRAWFORD, J. Biol. Chem. 1954, 207, 19–37. 15 R. A. DARROW, S. P. COLOWICK, Meth. Enzymol. 1962, Vol. V, 226–235. 16 R. K. CRANE, A. SOLS, Meth. Enzymol. 1960, Vol. I, 277–286. 17 R. G. WHITTAKER, M. K. MANTHEY, D. S. LE BROCQUE, P. J. HAYES, Anal. Biochem. 1994, 220, 238–243. 18 L. E. JANES, A.C. L¨OWENDAHL, R. J. KAZLAUSKAS, Chem. Eur. J. 1998, 4, 2324–2331. 19 R. G. KHALIFAH, J. Biol. Chem. 1971, 246, 2561–2573. An appendix includes a derivation of the pH dependence of the buffer factor, Q. 20 E. BANYAI, in Indicators, ed. E. BISHOP, Pergamon Press, Oxford, 1972, p. 75. 21 R. J. BEYNON, J. S. EASTERBY, Buffer Solutions, The Basics, IRL Press, Oxford, 1996, p. 72. 22 P. F. MUGFORD, S. M. LAIT, B. A. KEAY, R. J. KAZLAUSKAS, ChemBioChem 2004, 5, 980–987. 23 C. K. SAVILE, V. P. MAGLOIRE, R. J. KAZLAUSKAS, J. Am. Chem. Soc. 2005, 127, 2104–2113. 24 C. DENG, R. R. CHEN, Anal. Biochem. 2004, 330, 219–226. 25 (a) P. HOLLOWAY, J. T. TREVORS, H. LEE, J. Microbiol. Methods 1998, 32, 31–36; (b) H. ZHAO, Methods Mol. Biol. 2003, 230, 213–221.
26 A. BANERJEE, P. KAUL, R. SHARMA, U. C. BANERJEE, J. Biomol. Screening 2003, 8, 559–565. 27 I. N. TAYLOR, R. C. BROWN, M. BYCROFT, G. KING, J. A. LITTLECHILD, M. C. LLOYD, C. PRAQUIN, H. S. TOOGOOD, S. J. C. TAYLOR, Biochem. Soc. Trans. 2004, 32, 290–292. 28 D. C. DEMIRJIAN, P. C. SHAH, F. MORIS-VARAS, Top. Curr. Chem. 1999, 200, 1–29. 29 A. M. F. LIU, N. A. SOMERS, R. J. KAZLAUSKAS, T. S. BRUSH, F. ZOCHER, M. M. ENZELBERGER, U. T. BORNSCHEUER, G. P. HORSMAN, A. MEZZETTI, C. SCHMIDT-DANNERT, R. D. SCHMID, Tetrahedron Asymmetry 2001, 12, 545–556. 30 N. A. SOMERS, R. J. KAZLAUSKAS, Tetrahedron Asymmetry 2004, 15, 2991–3004. 31 A. FERSHT, Enzyme Structure and Mechanism, 2nd edn, Freeman, New York, 1985, pp. 103–106. 32 C. S. CHEN, Y. FUJIMOTO, G. GIRDAUKAS, C. J. SIH, J. Am. Chem. Soc. 1982, 104, 7294–7299. 33 Other researchers have also estimated enantioselectivity by separately measuring the rates of hydrolysis of the pure enantiomers. For example: a) G. ZANDONELLA, L. HAALCK, F. SPENER, K. FABER, F. PALTAUF, A. HERMETTER, Chirality 1996, 8, 481–489; (b) M. T. REETZ, A. ZONTA, K. SCHIMOSSEK, K. LIEBETON, K.-E. JAEGER, Angew. Chem. Int. Ed. Engl. 1997, 36, 2830–2832. 34 L. E. JANES, R. J. KAZLAUSKAS, J. Org. Chem. 1997, 62, 4560–4561. 35 G. P. HORSMAN, A. M. F. LIU, E. HENKE, U. T. BORNSCHEUER, R. J. KAZLAUSKAS, Chem. Eur. J. 2003, 9, 1933–1939. 36 S. PARK, K. MORLEY, G. P. HORSMAN, M. HOLMQUIST, K. HULT, R. J. KAZLAUSKAS, Chem. Biol. 2005, 12, 45–54. 37 L. E. JANES, A. CIMPOIA, R. J. KAZLAUSKAS, J. Org. Chem. 1999, 64, 9019–9029. 38 D. WAHLER, F. BADALASSI, P. CROTTI, J.-L. REYMOND, Chem. Eur. J. 2002, 8, 3211–3228.
1
Protease Substrate Profiling Jennifer L. Harris The Genomics Institute of the Novartis Research Foundation and The Scripps Research Institute, both San Diego, USA
Originally published in: Enzyme Assays. Edited by Jean-Louis Reymond. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52731095-1
1 Introduction
For all of the coordination and energy that an organism puts into the process of protein production, the degradation of proteins is an equally important and critical process. The specialized enzymes that carry out the hydrolysis of peptide bonds are commonly referred to as proteases (also known as peptidases and proteinases). With more than 500 proteases encoded in the human genome, proteases are one of the largest classes of enzymes [1, 2]. The fundamental activity of a protease is the hydrolysis of peptide bonds. Proteases can be divided into two general groups: exopeptidases that cleave terminal peptide bonds and endopeptidases that cleave internal peptide bonds. Proteases are further classified on the basis of catalytic mechanism into five classes, serine, threonine, cysteine, aspartyl, and metalloproteases [1].In the human genome, metalloproteases are the largest class, with approximately 186 members followed by serine proteases with 176 members, cysteine proteases with 143 members and threonine and aspartyl proteases with just 27 and 21 members respectively [2]. Historically, proteases were recognized as nonselective, promiscuous enzymes that were responsible for indiscriminately degrading dietary proteins.However, with the commitment that is inherent in the irreversibility of the peptide hydrolysis reaction carried out by proteases coupled with the exquisite substrate discrimination shown by many proteases, it is now widely accepted that this enzyme class plays crucial roles in the initiation and regulation of biological pathways. Indeed, proteases have been implicated in almost all aspects of life and death – from fertilization, development, differentiation, homeostasis, immunity, cell migration, cell activation, wound healing, to cell death. Likewise, from a therapeutic standpoint, the
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protease Substrate Profiling
modulation of proteolytic activity offers considerable promise for the treatment of human diseases, among them cardiovascular disease, lung disease, cancer, stroke, inflammation, arthritis, asthma, osteoporosis, neurodegeneration, and infectious disease. It has been estimated that approximately 14% of human proteases are under investigation as drug targets, this in addition to the proteases targeted from infectious diseases [3]. With the complete sequencing of multiple genomes, information on the repertoire of proteases can be established; however, large gaps still remain in our understanding of the biological role of most proteases. These gaps include limited characterization of enzyme active sites, limited understanding of the substrates and pathways that specific proteases regulate or are regulated by, and redundancy of function among proteases. In contrast to genomics, where the changes in the content or amount of cellular DNA or RNA can be readily examined, monitoring translational and posttranslational dynamics of functional proteins on a genomewide level is a more difficult challenge. This challenge of profiling the functional activity of proteases has led to the development of multiple tools and technologies (reviewed in [4–8]). This chapter will focus on the advantages, disadvantages, and application of a variety of tools that have been developed for the identification of protease substrates.
2 Functional Protease Profiling – Peptide Substrate Libraries
One crucial characteristic of a protease is the ability to discriminate among many potential substrates, termed the substrate specificity of the protease. The substrate specificity of a protease is determined by multiple factors that include the temporal and spatial expression of the protease, temporal and spatial expression of potential substrates, activation of the protease by posttranslational modification, availability of essential cofactors or adaptor proteins, and the presence of endogenous inhibitors. But fundamentally, the protease substrate specificity is determined by the make-up of the substrate-binding pockets that comprise the active site [9, 10] (Figure 1). Identifying the preferred substrate cleavage sequence of a protease can have a large impact on the ability to characterize the underlying biology of the protease as well as aid in the drug discovery process. Traditionally, the substrate specificity of a protease is determined by digestion of standard proteins such as gelatin, casein, hemoglobin, or insulin B-chain. Following digestion, the fragments are separated by chromatography or electrophoresis and the cleaved sequences are identified typically by mass spectrometry or Edman degradation. An example of this method is demonstrated by Bleukx et al. [11] where the substrate specificity of a wheat gluten aspartyl protease was determined through digestion of the 30-amino-acid oxidized insulin B-chain peptide and determination of the cleavage sites using Edman degradation. Using this method, the authors were able to demonstrate that the wheat gluten aspartyl protease preferred to cleave peptide bonds between or next to large hydrophobic amino acids [11]. While cleavage
2 Functional Protease Profiling – Peptide Substrate Libraries
Fig. 1 Schematic of protease-substrate subsite binding. The substrate is shown in gray and the protease is shown in black. According to Berger and Schechter [104] nomenclature, the nonprime side substrate sites are labeled P1, P2, . . . Pn from the scissile bond towards the N-terminus of the sub-
strate and the corresponding enzyme sites are designated S1, S2, . . . Sn. The prime side substrate sites are labeled P1 , P2 , . . . Pn from the scissile bond towards the Cterminus of the substrate and the corresponding enzyme sites are designated as S1 , S2 , . . . Sn .
of standard proteins and peptides is widely used, the main limitation is in the number of sequences that can be sampled in a given protein or peptide substrate. Because of this, the results from these studies are useful but only provide a limited analysis of the substrate specificity determinants. In addition to the digestion of standard proteins by proteases, preparation and evaluation of synthetic peptides with systematic substitutions in substrate sequence sites have been used to define the specific substrate subsites. Examples of this approach are found widely in the literature and allow for the quantification of specific contributions of amino acids in the substrate sequence to the substrate specificity, expressed as the kcat /K m kinetic term [12, 13]. Taking this approach, Tsu and Craik were able to determine that the serine collagenase 1 from the fiddler crab recognized a wide range of basic, hydrophobic, and polar P1 amino acids [14]. They demonstrated this through the kinetic analysis of the protease against 15 peptide substrates of the form succinyl-Ala-Ala-Pro-Xaa-pNA, where pNA represents the para-nitroanilide chromogenic leaving group that can be monitored spectrophotometrically upon cleavage by the protease. The major drawback of techniques using single protein or peptide substrates is that only limited substrate sequences can be sampled by the protease. To illustrate, determination of a tetrapeptide sequence of the 20 naturally occurring amino acids would require the systematic synthesis and testing of 160 000 (204 ) distinct tetrapeptide sequences. This number increases exponentially with an eight-aminoacid sequence requiring over 20 billion peptide sequences. Beyond the systematic examination of 1–3 substrate subsites, the synthesis and testing of single proteins or peptides is simply not practical. To address this limitation and to overcome the inefficiencies of producing and screening single substrates, combinatorial chemistry and genetic techniques have been developed for a more exhaustive determination of the substrate specificity of proteases.
3
4 Protease Substrate Profiling
2.1 Solution-based Peptide Substrate Libraries
The introduction of combinatorial synthesis techniques has allowed for the facile production of large peptide libraries with which protease activity can be investigated [15]. While the ease of peptide production has increased the number of peptides one can make, the challenge still remains on how to assay, deconvolute, and interpret the results of combinatorial peptide libraries. Because of this, considerable effort has been put into the development of mixture-based screening for the identification of protease substrate specificity determinants. Positional scanning synthetic combinatorial libraries (PSSCLs) developed by Houghten et al. [16] represent a solution-based method for the determination of substrate specificity of proteases. The major advantage of PSSCLs is that they exploit the power of combinatorial peptide synthesis, in which all possible combinations of amino acids can be represented, while simplifying the time and effort for library production, analysis, and deconvolution. The PSSCL concept can be illustrated by the creation of a tetrapeptide library of 20 amino acids for 160 000 possible sequences (20)4 . In a one-position fixed tetrapeptide PSSCL, the library is synthesized in 80 discrete pools. Each pool consists of 8000 peptide substrates where one position is defined and the other three positions contain an equimolar mixture of 20 amino acids (20 × 20 × 20 × 1 = 8000). In other words, the same 160 000 different peptide substrates are assayed four different times, each time probing a different subsite of the peptide independent of the other three positions. The PSSCL synthesis and deconvolution method has been used to identify individual active molecules in a variety of assays, including the identification of peptide inhibitors of prohormone convertase 1 and 2 by Apletalina et al. [17]. The application of PSSCL to protease substrates was elaborated by Thornberry et al. who combined the use of PSSCLs with 7-aminocoumarin-linked peptide substrates for the rapid and quantitative determination of the substrate specificity of the caspase family of serine proteases [18, 19]. Two of the main advantages of peptide coumarin substrates, first described by Zimmerman et al. [20], are that (1) the fluorescence of the coumarin group is masked when the 7-amino group is conjugated to a peptide substrate and (2) the register of cleavage is defined (Figure 2a). Both of these advantages rest on the fact that only upon cleavage of the peptide–anilide bond does the coumarin group undergo a shift in the excitation and emission wavelengths with a concomitant increase in fluorescence over the uncleaved peptide coumarin substrate. Caspases are cysteine proteases that require Asp at the P1 site of their substrates. The requirement for P1-Asp was exploited for the synthesis of coumarin substrate libraries by linking the carboxylic acid of the aspartic acid side-chain to a solid support to facilitate the solid-phase synthesis of the PSSCL [18] (Figure 2b). Cleavage of the library from the solid support resulted in regeneration of the aspartic acid and allowed for solution-based screening of the library. While the P1 position was held constant as P1-Asp, the P2, P3, and P4 side-chains were varied and included the 20 natural amino acids, with the exception of cysteine and methionine, and the
2 Functional Protease Profiling – Peptide Substrate Libraries
Fig. 2 7-Aminocoumarin substrates. (a) Structure of 7-aminocoumarin substrate. Upon cleavage at the anilide bond there is a concomitant increase in fluorescence of the coumarin. (b and c) Comparison of solid-
phase synthesis strategies for coumarin substrates linked to the resin through the P1-aspartic acid (b) or through modification of coumarin (c).
inclusion of two non-natural amino acids norleucine and D-alanine for a total of 8000 substrates (203 ) in 60 pools of 400 substrates per pool. The library was used to determine the substrate specificity of the caspase family of cysteine proteases and the serine protease granzyme B [19]. The results from their analysis showed that the caspase family could be separated into three classes based on their substrate specificity as determined from the substrate library. All three classes showed a preference for Asp in the P1 position and Glu in the P3 position. The largest differences in preference between the enzymes were observed to be in the P4 position. Class I caspases showed a preference for large hydrophobic amino acids in the P4 position and His in the P2 position. Caspases in class I included those enzymes associated with inflammation such as caspases 1, 4, and 5. The class II caspases showed a distinct preference for Asp in the P4 position and include caspases 2, 3, and 7. The consensus sequence of Asp-Glu-Xaa-Asp is found in many
5
6 Protease Substrate Profiling
of the downstream macromolecular substrates cleaved during apoptosis, suggesting that members of class II are involved during the effector phase of apoptosis. Finally, the class III members, caspases 6, 8, and 9, show a P4 preference for aliphatic amino acids. The resulting consensus site of (Ile/Leu/Val)-Glu-Xaa-Asp resembles the zymogen activation site for the class II caspases and implicates the class III enzymes as upstream activators of the apoptotic protease cascade. The major limitation of the method described above is that the synthesis of the libraries requires connection to the solid support through the P1-Asp side-chain (Figure 2b). Only a fraction of the natural amino acid side-chains have functionality that would allow them to be covalently linked to a solid support. To overcome this limitation, Harris et al. developed a modified 7-aminocoumarin that contains a carboxylic acid group rather than methyl group in the 4-position of the coumarin [21, 22]. The coumarin could then be directly linked to the solid support through its carboxylic acid moiety, allowing for greater diversity in the peptide portion of the molecule (Figure 2c). These libraries allowed for complete examination of the P1 functionality and could be combined in multiple formats for substrate preparation and screening. For example, the most simple tetrapeptide library that allows for complete examination of the specificity determinants in a minimum number of wells is the one-position fixed library [23, 24]. In this format, only one position in the tetrapeptide is fixed at any one time and the three other positions contain an equimolar mixture of all the amino acids used in the library. For the serine protease thrombin in a one-position fixed tetrapeptide coumarin library of 19 amino acids (cysteine is not used and methionine is replaced by the isosteric amino acid norleucine), the major determinant is for the basic amino acids, Arg and Lys in the P1 position and Pro or Ala in the P2 position (Figure 3). Thrombin does not show a significant preference for amino acids in the P3 and P4 positions in this library format. Analysis of these results with thrombin brings up one major drawback to the format of positional scanning libraries where only one position is examined in any given pool of substrates – the inability to observe cooperativity or dependence between subsites in the substrate. Clearly, the format of a one-position fixed PSSCL only allows for a gross overview of substrate specificity determinants. To accomplish a more comprehensive evaluation of the substrate specificity of a protease requires reducing the complexity of the mixtures and increasing the number of fixed positions. For example, in a two-position fixed tetrapeptide PSSCL, the same 130 321 (194 ) substrates are monitored six different times, each time with two different positions fixed in the substrate. Rather than 6859 (193 ) substrates per pool, the complexity is reduced to 361 (192 ) substrates per pool, but the number of pools to synthesize and assay has increased from 76 (19 + 19 + 19 + 19) to 2166 (19 × 19 × 6 for the six different combinations of fixing two positions in the substrate). Another advantage of the two-position fixed format over the one-position fixed format is that the concentration of individual substrates in the pool can be increased from approximately 0.01 µM in the one-position fixed library to 0.25 µM in the two-position fixed library. This is because the concentration of the pooled substrates is limited by the overall solubility of all substrates in the pool, and therefore the lower complexity of the two-position fixed library allows for higher concentrations
2 Functional Protease Profiling – Peptide Substrate Libraries
Fig. 3 Profiling data from thrombin in a 1-position-fixed tetrapeptide positional scanning synthetic combinatorial library (PSSCL). The x-axis represents the fixed amino acid, the other three nonfixed positions contain an equimolar mixture of 19
amino acids for a total of 6859 substrates per well (represented by each bar on the histogram). The y-axis represents the normalized rate of coumarin released from the mixture of substrates by thrombin in relative fluorescence units per second (RFU s−1 ).
of individual substrates in the pool, resulting in a higher signal-to-noise ratio. These substrate concentrations are still well below the typical K m values for these substrates, allowing the rate of hydrolysis to be proportional to the specificity constant, kcat /K m . The results with thrombin now clearly show a preference for P4 aliphatic amino acids (Figure 4) [25]. One major disadvantage of coumarin substrates or any peptide substrate that relies on the cleavage of a peptide/latent leaving group bond, is that the latent leaving group occupies the prime site, preventing examination of prime site/peptide interactions. This limits the utility of these libraries to examining only the nonprime specificity of proteases. Also, while coumarin peptide libraries have been shown to be effective against some aspartyl proteases [26], in general, these substrates show little activity toward enzymes that have strong requirements for prime site interactions such as aspartyl- and metallo-proteases. An iterative strategy that incorporates the nonprime information identified from the PSSCL coumarin libraries in a donor-quencher PSSCL format has been recently demonstrated by Shipway et al. for the identification of prime side substrate determinants (Figure 5) [25, 27]. After identifying the nonprime side substrate
7
8 Protease Substrate Profiling
Fig. 4 Profiling data from thrombin in a 2 position-fixed tetrapeptide PSSCL. The xand y-axes represent the fixed amino acid, the other two nonfixed positions contain an equimolar mixture of 19 amino acids for
a total of 361 substrates assayed in each well (represented by one square on the 2D array). Each 2D array depicts, by relative intensity, reaction velocities in RFU s−1 .
2 Functional Protease Profiling – Peptide Substrate Libraries
Fig. 5 Schematic representation of design of octapeptide donor quencher library. (a) The nonprime side specificity is determined by the coumarin libraries. (b) The nonprime side sequence is fixed to bias for cleavage
between the P1 and P1 positions of the substrate. (c) Upon cleavage by the protease, the quencher is separated from the fluorescent donor group and the increase in fluorescence can be directly monitored.
specificity of the serine protease prostasin as P4-Lys P3-His P2-Tyr P1-Arg, a prime side library was synthesized where the P1 to P4 positions were randomized in a PSSCL format and the P1–P4 positions were held constant as Lys-His-Tyr-Arg. Construction of a one-position fixed PSSCL in this format addressed the issue of catalytic register of the substrate by satisfying the prime side positions and biasing for cleavage after the internal Arg amino acid (Figure 6). The results of prostasin in the library showed no activity for P1 -Ile and helped explain the inability of the enzyme to autoactivate through processing of its prodomain at the sequence Pro-Gln-Ala-ArgIle-Thr-Gly-Gly [27]. Another interesting application of PSSCLs to protease profiling was reported by Mason et al. for the identification of substrate specificity determinants of two deubiquitin hydrolase (DUB) family members, isopeptidase T (isoT) and UCH-L3 [28].
9
10 Protease Substrate Profiling
Fig. 6 Profiling data from prostasin with donor-quencher library. (a) Structure of donor-quencher substrate library. (b) Results of prostasin activity in the 1 positionfixed P1 –P4 substrate libraries. Each graph represents a position in the P1 –P4 sub-
strate sequence with an ‘O’ representing the fixed position and an ‘X’ representing variable positions. The identity of the amino acid in the fixed position is indicated on the x-axis. The y-axis is the reaction velocity in RFU s−1 . (Taken from [27]).
DUBs recognize and cleave the isopeptide bond created when the C-terminus of ubiquitin is conjugated to the ε-amine of a lysine residue. A PSSCL was synthesized based on a C-terminal fragment of ubiquitin, Gly-Gly-Arg-Leu-Arg-Leu-Val-Leuphenyl-12 C6 isocyanate, conjugated to the ε-amine of the lysine in a one-position fixed positional scanning library of the sequence Ac-Arg-Leu-Xaa4 -Xaa3 -Xaa2 -Xaa1 Lys-Gln-Leu-Glu-Asp-Gly-Arg. The kinetic cleavage rate of the branched peptide library by DUBs was quantitatively determined by monitoring the appearance of free Gly-Gly-Arg-Leu-Arg-Leu-Val-Leu-phenyl-12 C6 isocyanate as compared to the 13 Clabeled internal standard Gly-Gly-Arg-Leu-Arg-Leu-Val-Leu-phenyl-C6 isocyanate by mass spectrometry. This study showed that DUBs indeed show distinct specificity for their substrates as demonstrated by the 10-fold preference of UCH-L3 for basic amino acids over other amino acids in the Xaa3 position. Another method that has been used quite extensively to address the prime side specificity of proteases is Edman degradation of peptide mixtures [29–32]. For instance, Cantley’s lab has demonstrated iterative design and synthesis of peptide substrates coupled with Edman degradation to determine the optimal substrate
2 Functional Protease Profiling – Peptide Substrate Libraries
specificity of proteases. In particular, a primary dodecamer peptide library is synthesized of the form acetyl-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-Xaa-amide, where Xaa represents a degenerate position consisting of all natural amino acids except cysteine. To ensure equimolar incorporation of all 19 amino acids, the concentrations of the individual amino acids are adjusted to compensate for unequal coupling rates as determined by Ostresh et al. [33]. Only the substrates that have been cleaved by the protease will have free N-termini and the whole mixture can be sequenced and the amounts and identities of the prime side amino acids (i.e. C-terminal to the cleaved bond) can be quantitated using Edman degradation. One cycle of Edman degradation and sequencing reveals the preferred P1 amino acids, two cycles reveals the preference at P2 and so on. A second library can then be synthesized that examines the nonprime side specificity. The prime side optimal substrate sequence as determined by the primary library is fixed and degenerate positions are synthesized in the non-prime side positions. The N-termini of the peptides in this library are not acylated, but rather a bio-tin molecule is added to the C-terminus of each peptide. Once the library is digested with the protease of interest, streptavidin is used to capture the C-termini of both the cleaved and the uncleaved peptides leaving only the N-terminal fragments of the library, which can then be sequenced using Edman degradation. Using this approach the authors were able to profile multiple metalloproteinases including matrix metalloprotease-2 (MMP-2) [32] and the anthrax lethal factor metalloprotease [34]. One shortcoming with this iterative library design strategy is that the substrates with cleavable sequences may be relatively rare and therefore may not give a significant signal over the background in the primary library. To overcome this, the authors suggest the design of biased libraries where some of the positions in the substrate are held constant, for example a caspase library would have the sequence acetyl-Asp-Glu-Val-Asp-Xaa-Xaa-Xaa-Xaa-Ala-Lys-Lys-amide to bias for cleavage between the Asp-Xaa bond. This second approach requires a priori knowledge of some substrate specificity information for the protease of interest. Another potential drawback that may complicate the execution of this strategy is the potential for misdirected cleavage. This can be especially true for serine proteases of the chymotrypsin fold, where the majority of the substrate specificity determinants are in the nonprime rather than the prime side position of the substrate and therefore fixing the prime side positions may be insufficient to direct the cleavage register of the substrate. 2.2 Solid Support-based Synthesis and Screening of Peptide Libraries
The fundamental advantage of the mixture-based approach arises from the fact that substrates are synthesized and assayed as mixtures and this allows for the examination of a large amount of substrate space in a minimal number of assays. However, this also inevitably leads to the fundamental shortcoming of the mixturebased approach: the results of the screen represent an average of the activity on the
11
12 Protease Substrate Profiling
multiple substrates in the pool. Combinatorial libraries that rely on the synthesis and screening of pools of substrates rather than individual substrates trade off the reduction in the effort of preparing and screening the library for the increase in the effort needed to interpret, deconvolute and validate the results of the screen. In many cases this trade-off is a favorable one because the preparation of large numbers of individual substrates is a daunting task. However, as parallel synthesis techniques improve for single substrate synthesis [35, 36], the next challenge will be to increase the capacity to assay individual substrates. Inspired by DNA chips, microarrays have been successful in miniaturizing an ever-widening range of protein- and peptide-based assays (reviewed in [37]). An interesting application to protease substrate microarrays involved the exploration of the subsite specificity of the outer membrane protease T (OmpT) from E. coli by Dekker et al. [38]. In this study, spot synthesis [39] was used to produce an 80-peptide donor-quencher array on 8 mm spots of cellulose paper that had been derivatized with a polyoxyethylene glycol (PEG) linker. The peptides in the library were of the form acetyl-Dap(dnp)-Ala-Arg-Arg-Ala-Lys(Abz)-Gly-PEG where each position in the P2–P2 sequence Ala-Arg-Arg-Ala was systematically replaced with one of the 20 naturally occurring amino acids. Upon incubation of the cellulosebound peptides with OmpT, the peptide spots that were recognized and cleaved by OmpT increased in fluorescence signal due to the removal of the N-terminal portion of the peptide that contained the quenching dinitrophenyl group. The length of the PEG linker, which corresponded in length to approximately 200 carbon-carbon bonds, was shown to be essential for OmpT activity on the cellulosebound substrates. Direct coupling of the substrates to cellulose support produced substrates that were active against trypsin but were presumably not accessible to the larger detergent-solubilized OmpT protease. The results from the spot assay revealed a broad tolerance for amino acids at the P2 position, Lys>Arg at the P1 position, Lys>Ile>R in the P1 position, and Ile=Val>Ala in the P2 position of the substrate. While the general features of optimal and suboptimal substrate preference are present in the results of the cellulose-bound peptide array, the relative substrate specificity observed on the array does not correlate well with the kcat /K m determined on the select number of substrates that were tested in solution. For example, based on the results from the cellulose-bound peptide array, P2 -Ile should be preferred over P2 -Ala, however the kcat /K m for the P2 -Ile substrate showed a greater than 1000-fold decrease in activity over the P2 -Ala substrate. The authors speculate that this may be due to a dominant influence of kcat versus kcat /K m for substrates immobilized on the cellulose array. Another complicating factor that could contribute to inconclusive results is the lack of register for the site of cleavage within the donor-quencher peptide. Another substrate microarray approach was reported by Salisbury et al. on the synthesis and evaluation of microarrays of coumarin substrates in a miniaturized format representing 800 data points in an area less than 1.7 cm2 [40]. In this study, they linked individually synthesized alkoxylamine-functionalized coumarin substrates to 4-carboxybenzaldehyde derivatized BSA-covered glass slides via a chemoselective oxime ligation. A spatially arrayed collection of substrates was pro-
2 Functional Protease Profiling – Peptide Substrate Libraries
duced where the P4 position in the substrate was held constant as alanine, P1 was constant as lysine, and P2 and P3 consisted of 19 amino acids for a total of 361 individual substrates. Upon treatment of the array with a protease, susceptible substrates are cleaved at the peptide anilide bond thus allowing detection of the now highly fluorescent coumarin linked to the array. The relative kcat /K m for the individual substrates can be determined based on the amount of fluorescence detected for each of the substrates on the array. In this assay, thrombin showed a preference for P2 proline and broad tolerance of multiple amino acids at the P3 position. To address the correlation of solid phase cleavage versus solution-based cleavage, four single substrates were assayed in solution and compared with the microarray results. In this case, the relative ranking of substrate cleavage on the solid-surface microarray was predictive of substrate cleavage in solution, but distinct differences were observed in suboptimal substrates. For example, Ac-Ala-Phe-Ser-Lys-ACC was shown to be hydrolyzed in solution but not on the microarray. While these microarray techniques for monitoring protease substrate specificity solve the issue of assay miniaturization, they still require greater effort for the parallel synthesis of individual substrates than do combinatorial synthesis schemes. Methods that combine the use of solid-phase based portion-mixing (also known as split and mix) with the ability to detect single substrates have proven to be powerful tools [41–43]. One example of the portion-mixing solid-phase assay approach has been developed by Meldal et al. [43] through the use of a bead-based intramolecular fluorescence quenching assay to select for optimal peptide substrates. An important aspect of this solid phase-based assay is the development of biocompatible solid supports, such as polyethylene glycol-poly-(N,N-dimethylacrylamide) copolymer (PEGA) resin, that allow for the libraries to be screened directly on the beads. The general approach for this strategy requires that the peptide be flanked by an internally quenched fluorescence pair, typically Abz as the fluorescent donor on the C-terminal portion of the peptide nearest the bead linker, and 3-nitro-tyrosine as the fluorescent quencher located near the N-terminal portion of the peptide. The synthesis of an exhaustive set of tetrapeptide sequences containing each of the 20 amino acids at all four positions using portion-mixing requires only 84 synthesis steps (20 × 4 coupling steps + 4 deprotection steps), whereas a twoposition fixed PSSCL of 20 amino acids requires 19200 synthesis steps (400 × 6 × 4 coupling steps + 400×6×4 deprotection steps), and finally the synthesis of all 160 000 individual substrates requires 1280000 steps (160000 × 4 coupling steps + 160000 × 4 deprotection steps). Following synthesis, each bead contains a unique peptide sequence that upon incubation with the protease of interest can be cleaved by removing the N-terminal 3-nitro-tyrosine quencher, thus causing the bead to be fluorescent when placed under UV light. The fluorescent beads are then manually chosen and separated from the nonfluorescent beads. The peptide identity is identified directly from the bead using Edman degradation. Hydrolysis of the peptide on the bead is terminated before it is complete, typical values are 20–80% total hydrolysis of the peptide, therefore a portion of the uncleaved peptide remains and both the full-length sequence and the cleavage site within the peptide can be identified by this method.
13
14 Protease Substrate Profiling
The ratio of cleaved to uncleaved peptide is then used to quantify an approximate rate of hydrolysis for each substrate. Because this is a selection rather than a screening method, the statistical analysis of preferred amino acids in the subsites requires the selection and sequencing of a large number of beads. This method has been applied to many proteases, including subtilisin Carlsberg, cruzain, and papain [44–46] and, in general, it identifies good substrate sequences that can then be used to further characterize the enzyme. However, as with other solid-phase selection techniques, a direct correlation is not always seen between the activity of substrates on the solid support and the activity of the substrates in solution. Several reasons could account for this discrepancy, the simplest being that a more extensive set of beads would need to be selected and sequenced to improve the statistical correlation and be representative of the true substrate specificity of the protease. Another potential reason for discrepancies is that additional factors may influence the interaction of the protease with the support-bound substrate. This was indeed observed when screening papain against a library of 20 natural amino acids. The majority of the sequences selected from the assay contained acidic amino acids at P4 , P5 , and P6 [45]. However, the equivalent preference of activity of these peptides did not translate when the peptides were tested in solution. The authors determined that this was an artifact and likely due to an increased local concentration of the enzyme for beads with negatively charged residues because of the positive charge of papain at the pH of the assay. The authors followed this up by designing a library that was absent of negatively charged amino acids. These problems notwithstanding, bead-based methods remain a powerful means of identifying the substrate specificity of proteases. The major complicating factor for methods that utilize solid support assay techniques is that the potential interaction of the protease to the solid support may bias the interpretation of the substrate specificity information and may not reflect the true substrate specificity of the protease. To address this issue and combine the advantages of portion mixing synthesis with solution-based screening methods, Winssinger et al. recently reported the use of peptide nucleic acid (PNA)-encoded rhodamine substrates [47]. The PNA-encoding strategy is ideal for the generation of peptide substrate libraries as it allows for the rapid generation of libraries by split and mix synthesis. The substrates are synthesized using a bifunctional linker to the solid support with orthogonal protecting groups that allow for cosynthesis of the peptide and PNA chains (Figure 7). The synthesis of the reported library of 192 substrates relied on the alternating use of Fmoc-protected PNA monomers and Alloc-protected amino acids [48]. Proteolysis of the bond connecting the peptide substrate to the latent rhodamine fluorophore changes the electronic properties of this fluorophore, resulting in a large increase in fluorescence that can be detected at λex = 488 nm and λem =530 nm (Figure 8) [49]. Proteolysis is measured by incubating the PNA-encoded library of substrates with the protease or biological sample of interest. The fluorescence signal from the library is resolved by hybridization to an oligonucleotide microarray. Location of hybridization is dictated by the sequence of the PNA that is linked to the sequence of the substrate peptide (Figure 8).
2 Functional Protease Profiling – Peptide Substrate Libraries
15
16 Protease Substrate Profiling
Fig. 8 Profiling with the PNA-encoded rhodamine-substrate library. A: Following split and mix synthesis, the rhodamine substrate library is present in solution as a mixture. B: Incubation of substrates with protease or lysate releases selected peptides
from rhodamine scaffold, thus increasing the intrinsic fluorescence of the rhodamine group. C: Active substrates are spatially resolved through hybridization to complementary oligonucleotides on a microarray and visualized with fluorescence detection.
← Fig. 7 Peptide nucleic acid (PNA)-encoded rhodamine substrates. (a) Synthesis of the PNA-encoded rhodamine library using split and combine chemistry. (b) General structure of the 192 member PNA-encoded rho-
damine substrate library where each of the amino acids in the peptide sequence on the rhodamine scaffold is encoded by a specific sequence of PNA monomers as shown in the lower boxes. (Adapted from [47]).
2 Functional Protease Profiling – Peptide Substrate Libraries
While other protease microarray methods have been reported [38, 40], the PNAencoding system has the advantage that the proteolysis can be carried out in solution. This is important in order to exclude the effects of nonspecific interactions of the enzymes with the surface and offers better control of substrate/analyte concentration and reaction condition. While only demonstrated on a limited set of 192 tetrapeptide substrates, experiments with the purified enzymes thrombin and caspase 3 established that the microarray method produced reliable results for the determination of protease substrate specificity. The activity profile of blood samples from healthy donors was compared to blood samples from an individual treated with warfarin, a frequently used drug for oral anticoagulant therapy. More significantly, the PNA-encoded substrate libraries were applied to clinical blood samples to compare protease activity of patients on warfarin therapy. Warfarin decreases the level of active prothrombin, factor X, IX, VII, and protein C by blocking their vitamin K-dependent carboxylation of glutamic acid moieties that is essential for their synthesis and function [50]. The normal blood sample showed a profile Arg in the P1 position and Pro in the P2 position – a profile indicative of thrombin activity. A loss of this thrombin profile was observed in blood from patients treated with warfarin, indicating a decrease in thrombin-mediated signal transduction. One limitation of the PNA-encoded substrates is that detection of proteolysis is dependent on the interaction of the encoded PNA oligomer with the DNA oligomer attached to the microarray. Our knowledge of the properties that define PNA: DNA and PNA: PNA interactions is currently incomplete and requires increased understanding to aid in the design of large substrate libraries. Furthermore, development of cell-permeable probes will be required to capture many dynamics of protein function.
2.3 Genetic Approaches to Identifying Protease Substrate Specificity
The use of peptide libraries displayed on the surface proteins of filamentous bacteriophage has become a widely adopted technique for optimizing peptide or protein sequences [51]. The major advantage of bacteriophage display is that the protein or peptide displayed on the surface of the bacteriophage is directly linked to the genetic material of the phage. Therefore, the identity of the protein sequence on the surface of the phage can be determined by simply sequencing the phage DNA. The application of phage display peptide libraries to protease substrate optimization was first developed by Matthews and Wells [52]. The typical approach to making phage display libraries is the insertion of randomized sequences into protein pIII (gene III product) followed by a protein or peptide sequence that can be used as an affinity tag. Upon incubation of the phage library with the protease of interest, the phage that are displaying substrate sequences susceptible to cleavage by the protease will lose their affinity tag. The phage displaying sequences refractory to cleavage by the protease will retain their affinity tag and can be trapped preferentially by binding to an affinity matrix. The phage with susceptible sequences can be used to infect
17
18 Protease Substrate Profiling
Fig. 9 Schematic representation of substrate phage.
E. coli cells and be amplified for another round of substrate selection or isolated for DNA sequencing of individual clones (Figure 9). The substrate specificity of multiple proteases has been determined using substrate phage display and typically results in the identification of substrates with good catalytic efficiencies [53–59]. For example, Coombs et al. reported the identification of the substrate consensus sequence Ser-Ser-(Tyr or Phe)-TyrSer-(Gly or Ser) for the serine protease prostate-specific antigen (PSA) [55]. They reported kcat /K m values as high as 3100 M−1 s−1 as compared to 2.0 M−1 s−1 for the sequence Gln-Gln-Leu-LeuHis-Asn found in semenogelin, a physiological substrate of PSA. Tenzer et al. reported the use of subtractive substrate phage display to identify specific proteolytic activity in the cytosol of cancer cells in response to treatment with an antiproliferative regiment consisting of staurosporine and ionizing radiation
3 Identification of Macromolecular Substrates
[60]. In this library, a randomized gene encoding seven amino acids and a Cterminal 6×His affinity tag were fused within the T7 phage capsid protein Xb. Fusion within this phage protein allows for the expression of 10 copies of the putative substrate on the surface of the phage. The phage were then bound to a Ni2+ affinity matrix and were first incubated with cytosolic extracts from untreated cells to release the treatment-independent substrates from the matrix. The same matrix was then treated with cytosolic extract from cancer cells treated with the antiproliferation regiment, releasing the treatment-dependent proteolytic substrates. While differences in substrate sequences were observed between treated and untreated cell lysates, there was a lack of agreement between the substrate sequences in two independent experiments. Despite this, several sequences such as Trp-Ala-Val-Glu-Ile-Asp-Phe and Ser-Glu-Ala-Gly-Ile-Ser-Ala identified from one of the experiments were synthesized and further investigated. While no difference in cleavage from cytosolic extracts of treated versus untreated cells was observed for the Ser-Glu-Ala-Gly-Ile-Ser-Ala peptide, a significant increase in cleavage of Trp-Ala-Val-Glu-Ile-Asp-Phe for treated versus untreated cells was observed. The authors went on to show that this substrate was likely being cleaved by a caspase that was activated upon treatment of the cells with the antiproliferative regiment. The major limitation to substrate phage is that the size and diversity of the phage library are ultimately determined by the transformation efficiency and biological constraints of the E. coli host. If only a small fraction of the possible sequences can be sampled in the selection, then the optimal sequence may be missed. By design, substrate phage is a selection rather than a screening process so typically only the optimal substrates are identified and information as to the suboptimal sequences can be inferred, but ultimately remain unaddressed with this method. The catalytic alignment of the cleavage site can also be cryptic and often requires the incorporation of known specificity determinants to bias for cleavage at a particular site or the iterative production and testing of multiple phage libraries [58, 59]. As with the solid-phase approaches, substrate optimization by phage display can also be subject to artifactual results due to the context of the peptide sequence within the phage construct. Furthermore, as elaborated by Sharkov et al., under typical selection conditions the enzyme is present at nanomolar concentrations, whereas the phage particles are typically present at low picomolar concentrations [61]. Under these conditions the kinetics of hydrolysis are more representative of a single turnover reaction rather than steady state conditions and therefore may not enrich for substrates with increased kcat /K m . Indeed, several studies have shown a lack of agreement between substrate phage-enriched clones and the kcat and K m values of the peptides derived from the clones [55, 62–64].
3 Identification of Macromolecular Substrates
Perhaps the most useful information to know about a protease is the identity of the downstream proteins that it cleaves. By knowing the physiological substrates of a
19
20 Protease Substrate Profiling
protease, one can more completely understand the pathways in which the protease is operating and determine the extent of impact that modulating the protease activity would have on a therapeutic or pathological process. Information from synthetic or genetic peptide libraries gives a rapid overview of the proximal substrate cleavage site that can provide insight into the determinants of the substrate binding site as well as sensitive tools for assaying the protease. While the information from the substrate specificity of a protease can be used to identify a list of potential substrates through the bioinformatic examination of the proteome [65], much effort is still needed to validate the proteins as actual substrates of the protease. Indeed, the determinants of the proximal substrate cleavage site of a protease identified by in vitro methods may not accurately represent the determinants for macromolecular substrates that a protease will encounter in a biological setting. The traditional biochemical approach to identifying protease substrates has been the serial candidate approach that is often tedious and likely to only yield partial or inadequate identification of protease substrates. Even when a protease is shown to cleave a protein, the functional significance of that cleavage may be poorly understood. Direct identification of protease substrates on a genome-level in a biological context is still one of the major challenges in protease biology. 3.1 Genetic Approach to the Identification of Macromolecular Substrates
Genetic modification of protease function has been an important approach to identifying downstream protease substrates and furthering our understanding of the biological role of specific proteases. Typically, loss of function mutations in proteases results in the accumulation of specific proteins that lead to the identification of those proteins as substrates for the protease and confirmation that the protease is acting in the pathway. For example, the familial disease haemophilia A is due to loss-of-function mutations in factor VIII that lead to diminished proteolytic activity of factor IXa [66]. Human hereditary disorder for early onset of Alzheimer’s disease can be linked to gain-of-function mutations in presenilins-1 and -2 [67, 68] as well as mutations in the amyloid precursor protein substrate itself [69]. Targeted deletion of proteases in mice has also proven useful in the identification and validation of protease substrates [70–73]. For example, glucagon-like peptide-1 (GLP-1(7–36)-amide) is a gut hormone that enhances the glucose-dependent secretion of insulin from pancreatic β-cells [74]. GLP-1(7–36)-amide is rapidly degraded to an inactive protein by the removal of the two N-terminal amino acids to generate GLP-1(9–36)-amide. Marguet et al. generated mice deficient in dipeptidyl peptidase IV and showed that there was an increase in the level of circulating GLP-1(7–36)amide with a consequent increase in the insulinotropic effect [75]. This knockout strategy for the identification of substrates can be time consuming and because of the existence of compensatory or redundant pathways may sometimes lead to inconclusive results. The question usually remains as to whether a particular substrate is a direct or indirect substrate of the genetically targeted protease because many proteases operate in cascades and activation of downstream
3 Identification of Macromolecular Substrates
proteases may in fact be responsible for the substrate cleavage. Also, unless an obvious phenotype is observed that can guide the researcher to examine candidate proteins as potential substrates, identifying the in vivo substrates by this approach can be tedious and may overlook other relevant substrates. This method is good for confirming hypotheses about the protease and substrate interactions, but rarely are new substrates identified. Methods that allow for the proteome to be queried in a systematic fashion would greatly complement in vivo genetic approaches. One technology that is emerging as a tool to rapidly generate and validate proteases and protease substrates in biology is RNA interference (RNAi) [76–79]. For example, Li et al. used RNAi to demonstrate the role of the de-ubiquitinating protease HAUSP in regulating the antiproliferative effects of the p53 tumor suppressor protein [79]. The ubiquitin ligase Mdm2 is responsible for ubiquitinating p53 thus targeting p53 for destruction by the proteasome. HAUSP can reverse the destruction of p53 through direct binding and de-ubiquitination of the protein. In support of this interaction, a partial reduction of HAUSP in cells by HAUSP-directed RNAi led to the destabilization of p53. When a more complete ablation of HAUSP was achieved through amplified treatment of cells with RNAi, the unexpected result of p53 stabilization was observed. The authors were able to show that Mdm2 was also a substrate of HAUSP and that when levels of HAUSP were completely reduced, Mdm2 was destabilized and no longer able to ubiquitinate p53, thus leading to the stabilization of p53. This study illustrates the subtleties in the dynamics of protease-substrate interactions that can be observed with tools like RNAi but that would be completely overlooked by targeted deletion of the gene. A method that attempts to address cleavage of all proteins in a proteome in an unbiased manner is small pool cDNA expression screening originally developed by Lustig et al. [80] (Figure 10). In this method, a cDNA library is constructed from the cells, tissues or organisms. The clones from a cDNA library are combined in pools of approximately 10–100 clones per pool. The pools of cDNA are then transcribed/translated in vitro in the presence of a labeling agent, typically [35 S]methionine. The labeled protein pools are then incubated with the protease of interest and subsequently separated on sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). Pools that show a shift in the labeled proteins with treatment of the protease are transformed into E. coli and pooled in less complex pools of approximately 1–10 clones per pool. The in vitro transcription/translation and cleavage assay is then iterated until the single cleavage susceptible clone is isolated. The cDNA is then simply sequenced to reveal the identity of the cleaved protease substrate. Using this method Kothakota et al. were able to systematically investigate 1000 cDNA pools of 100 clones per pool (a total of 100 000 clones) to identify substrates of caspase-3 [81]. In three different pools a 65 kDa protein was reduced to a 48 kDa protein. Deconvolution of the pool and sequencing of the clone revealed the protein to be the partial sequence of gelsolin from residues 142–731. Based on the in vitro determined substrate specificity of caspase-3 [19], the cleavage site in gelsolin was identified as Asp-Gln-Thr-AspGly. The functional significance of gelsolin cleavage during apoptosis was then examined and shown to play an important
21
22 Protease Substrate Profiling
Fig. 10 Schematic representation of smallpool cDNA expression screening. (a) The cDNA library of interest is transformed into E. coli to produce single clones. (b) The DNA from approximately 100 clones are pooled into a single vial and the encoded protein is produced by in vitro transcription and translation in the presence of [35 S]methionine. (c) The protein mixtures are then incubated with or without the protease of interest. The proteins are then resolved on SDS-PAGE and visualized by
autoradiography. The cDNA pools that correspond to lanes that show a disappearance or appearance of bands upon incubation with the protease (an example is circled) are again transformed into E. coli. The process of in vitro transcription translation and protease incubation is repeated with the single clones to isolate the putative protease substrate. The protease substrate is identified by sequencing the isolated cDNA clone.
role in effecting the morphological changes of apoptosis through the severing of actin filaments. Additional confirmation of this role was identified by a delayed onset of membrane blebbing and DNA fragmentation in neutrophils isolated from gelsolin-deficient mice. While this method provides a rapid and direct approach to the identification of downstream targets of proteases, there are some limitations. First, the output of the method is only as good as the quality of the cDNA library input – partial sequences lacking an initiation site or empty sequences will decrease the diversity of the cDNA library. Also, abundant transcripts may be overrepresented in the collection and rare transcripts may not be represented at all. Hence, not all proteins will be reliably expressed in the in vitro transcription/translation mixture – proteins may require membrane insertion, posttranslational modifications, or the presence
3 Identification of Macromolecular Substrates
of cofactor or adaptor proteins. All of these factors can contribute to a decrease in representation in the screen and indeed, only 10-40% of the clones in a pool will typically result in labeled proteins [82]. Also, as with most other proteomic techniques, cleavage of the identified protein may not be due to direct cleavage of the protease of interest, but rather through the activation of a downstream protease in the in vitro transcription/translation lysate. Another genetic approach that attempts to identify macromolecular substrates is exosite scanning introduced by Overall et al. An exosite is a domain or site that is peripheral to the primary catalytic site of a protease, but nonetheless can be an important determinant of substrate specificity [83, 84]. To identify new substrates of MMP-2, McQuibban et al. [85] fused the exosite domain of MMP-2 to the β-Gal activator domain to use as bait in a yeast two-hybrid screen. The screen resulted in the identification of monocyte chemoattractant protein-3 (MCP-3) as a protein that interacted with the exosite domain of MMP-2. Further investigation using fulllength MMP-2 showed that MCP-3 is indeed cleaved by MMP-2 at a rate that exceeds that of gelatin cleavage. MMP-2 cleavage of MCP-3 was shown to be functionally relevant as a broad-spectrum CC-chemokine receptor antagonist in in vitro assays. Mouse models of inflammation further supported this role with the observation that MMP-2-cleaved MCP-3 protein significantly eliminated monocyte infiltration in a dose-dependent manner. While exosite scanning has been shown to facilitate the identification of substrates for some proteases, the method is not generally applicable to all proteases. The method is currently limited to proteases that contain identifiable exosite domains and the fundamental limitations of the yeast two-hybrid approach also apply. 3.2 Proteomic Approaches to Identifying Protease Substrates
In contrast to the genetic approach of exosite scanning for identifying protein substrates through binding interactions, several groups have taken a similar proteomic approach to identify substrates using protease catalytic domains with active site mutations [86, 87]. For example, Beresford et al. used affinity chromatography with an inactive variant of the serine protease granzyme A, the active site Ser was mutated to Ala, to isolate potential substrates of granzyme A [86]. Granzyme A is a serine protease that is released from the granules of cytotoxic lymphocytes to facilitate the elimination of tumor and virus-infected cells through a caspase-independent mechanism. Using the affinity chromatography approach the authors were able to identify the endoplasmic reticulum (ER)-associated 270–450 kDa SET complex that among other proteins contained tumor suppressor NM23-H1 which was shown to be a granzyme A-activated DNase [86, 88, 89]. Identification of the substrates of granzyme A provided an increased understanding of the molecular mechanism that granzyme A uses to eliminate tumor and virus cells that evade caspase-mediated apoptosis. The major limitation of this approach is that by removal of the active site residue(s) the interaction of the protease with the substrate relies only on the
23
24 Protease Substrate Profiling
ground state binding, approximated by K m , and does not take into account the catalytic step, approximated by kcat . Specificity of an enzyme for a substrate is dependent on the ability of the enzyme to utilize substrate-binding energy to promote catalysis, kcat /K m . By only relying ground state binding to identify substrates, efficiently catalyzed substrates could easily be overlooked in favor of tight binding substrates that may be less efficiently hydrolyzed by the protease. An interesting solution to the identification of protease substrates is the use of selective tagging of proteolyzed substrates. Wang et al. suggested the use of peptide nucleophiles for identifying substrates of serine and cysteine proteases [90]. Serine and cysteine proteases undergo a two-step mechanism for the hydrolysis of protein substrates. A stable acyl– or thioacyl–enzyme intermediate is formed with the N-terminal portion of the substrate. In this method, a peptide nucleophile tag is designed in which the peptide portion binds to the prime side subsites of the enzyme, placing the nucleophile in position to attack the acyl-enzyme intermediate, thereby incorporating the peptide nucleophile tag to the C-terminus of the N-terminal portion of the substrate. Using biotin in the nucleophile tag allows for selective isolation of the substrates using immobilized streptavidin. The authors showed that this method was effective for labeling bovine serine albumin cleavage products of the cysteine protease cathepsin S and the serine protease human neutrophil elastase. They also showed that pro-interleukin-1β could be selectively labeled by caspase-1. An obvious limitation to the method includes the requirement for design of selective prime side peptide nucleophile tags so that only substrates for the protease of interest are tagged. Also, since the nucleophile tag is competing with water only a small percentage of the substrate will be tagged, leading to potential issues of detection. While this approach is promising, the general application of this method to identifying novel substrates in complex biological systems has yet to be demonstrated. Traditional proteome analysis techniques couple high-resolution protein separation with protein identification and quantification. Typical approaches for protein separation include two-dimensional PAGE, high-performance liquid chromatography, and capillary electrophoresis. Protein identification and quantification are usually achieved by mass spectrometry of stable isotope-incorporated protein mixtures or Edman degradation-based sequencing. These powerful proteomic approaches are beginning to be applied to the challenging field of enzyme substrate identification [65, 91, 92]. Of particular interest, Tam et al. reported the use of an isotopecoded affinity tag (ICAT) coupled with multidimensional liquid chromatography and mass spectrometry to identify proteins shed from the surface of MDA-MB-231 breast cancer cells upon transfection with membrane type 1-matrix metalloproteinase [93]. Proteins that differed in size or abundance between the trans-fected and untransfected cells were identified using tandem mass spectrometry. While this is still an emerging field, these techniques offer promise for identifying the natural substrates of proteases thereby more readily establishing the physiological function of proteases in biology.
References
4 Conclusions
Multiple tools have been developed or are in the process of being developed to facilitate the identification of protease substrates in terms of both identifying the optimal substrate specificity of a protease as well as identifying the physiologically relevant macromolecular substrates of a protease. The application of this knowledge can have far-reaching implications into the characterization of the protease of interest. A practical application of substrate specificity information is in the production of sensitive substrates to monitor protease activity for screening for small molecule modulators. Moreover, in the absence of leads from screening efforts, utilization of the optimal substrate specificity can be used as a starting point for inhibitor design. Indeed, it is typical for efforts in protease drug discovery to utilize the subsite substrate specificity at some point in the inhibitor development process. Other applications of protease substrate specificity information include the visualization of specific protease activity in lysates, cells, tissues, or whole organisms. For example, near infrared (NIR) fluorochromes coupled to optimal peptide substrate sequences have been used to visualize the activity of proteases in whole animals (reviewed in [94]). Low absorbance and low autofluorescence of NIR by tissues allows light to travel 5–6 cm, making NIR well suited for imaging of intact organisms [95]. The ability to view specific proteolysis in vivo will enable the diagnosis of diseases and the real time assessment of drug treatment [96, 97]. Substrate specificity has also been used to selectively target drugs or retro-viruses to specific tissues or sites of disease [98–101]. As an example, several groups have taken advantage of the selective upregulation of the PSA serine protease in prostate cancer to target cytotoxic drugs to the cancer cells [102, 103]. Conjugation of the cytotoxic drug doxorubicin to the optimal peptide substrate of PSA results in a drug that is not toxic unless activated by PSA cleavage. This strategy should allow for an increased therapeutic window for cytotoxic chemotherapy. Proteases form the cornerstone of many biological processes. While much progress has been made over the century of protease research, issues of redundancy and selectivity still brand proteases as a demanding enzyme class. Systematic tools that allow for functional profiling of substrate specificity will be an important mechanism to characterize, classify, and differentiate proteases and protease families. Some of the tools for identifying optimal and suboptimal peptide substrates as well as tools for the discovery of physiological substrates have been presented.
References 1 A. J. BARRETT, N. D. RAWLINGS, J. FRED WOESSNER (eds), Handbook of Proteolytic Enzymes, Elsevier Academic Press, London, 2004.
2 X. S. PUENTE, L. M. SANCHEZ, C. M. OVERALL, C. LOPEZ-OTIN, Nat. Rev. Genet. 2003, 4, 544–558.
25
26 Protease Substrate Profiling
3 C. SOUTHAN, Drug Discov. Today 2001, 6, 681–688. 4 C. LOPEZ-OTIN, C. M. OVERALL, Nat. Rev. Mol. Cell Biol. 2002, 3, 509–519. 5 P. L. RICHARDSON, Curr. Pharm. Des. 2002, 8, 2559–2581. 6 W. M. FREDERIKS, O. R. F. MOOK, J. Histochem. Cytochem. 2004, 52, 711–722. 7 A. BARUCH, D. A. JEFFERY, M. BOGYO, Trends Cell Biol. 2004, 14, 29–35. 8 N. JESSANI, B. F. CRAVATT, Curr. Opin. Cell Biol. 2004, 8, 54–59. 9 J. J. PERONA, C. S. CRAIK, Protein Sci. 1995, 4, 337–360. 10 L. HEDSTROM, Chem. Rev. 2002, 102, 4501–4523. 11 W. BLEUKX, K. BRIJS, S. TORREKENS, F. VAN LEUVEN, J. A. DELCOUR, Biochim. Biophys. Acta 1998, 1387, 317–324. 12 S. BUTENAS, M. KALAFATIS, K. G. MANN, Biochemistry 1997, 36, 2123–2131. 13 H. R. STENNICKE, M. RENATUS, M. MELDAL, G. S. SALVESEN, Biochem. J. 2000, 350, 563–568. 14 C. A. TSU, C. S. CRAIK, J. Biol. Chem. 1996, 271, 11563–11570. 15 D. J. MALY, L. HUANG, J. A. ELLMAN, ChemBioChem 2002, 3, 16–37. 16 C. PINILLA, J. R. APPEL, P. BLANC, R. A. HOUGHTEN, BioTechniques 1992, 13, 901–902, 904–905. 17 E. APLETALINA, J. APPEL, N. S. LAMANGO, R. A. HOUGHTEN, I. LINDBERG, J. Biol. Chem. 1998, 273, 26589–26595. 18 T. A. RANO, T. TIMKEY, E. P. PETERSON, J. ROTONDA, D. W. NICHOLSON, J. W. BECKER, K. T. CHAPMAN, N. A. THORNBERRY, Chem. Biol. 1997, 4, 149–155. 19 N. A. THORNBERRY, T. A. RANO, E. P. PETERSON, D. M. RASPER, T. TIMKEY, M. GARCIA-CALVO, V. M. HOUTZAGER, P. A. NORDSTROM, S. ROY, J. P. VAILLANCOURT, K. T. CHAPMAN, D. W. NICHOLSON, J. Biol. Chem. 1997, 272, 17907–17911. 20 M. ZIMMERMAN, E. YUREWICZ, G. PATEL, Anal. Biochem. 1976, 70, 258–262. 21 J. L. HARRIS, B. J. BACKES, F. LEONETTI, S. MAHRUS, J. A. ELLMAN, C. S. CRAIK, Proc. Natl Acad. Sci. USA 2000, 97, 7754–7759. 22 D. J. MALY, F. LEONETTI, B. J. BACKES, D. S. DAUBER, J. L. HARRIS, C. S. CRAIK, J. A. ELLMAN, J. Org. Chem. 2002, 67, 910–915.
23 J. L. HARRIS, P. B. ALPER, J. LI, M. RECHSTEINER, B. J. BACKES, Chem. Biol. 2001, 8, 1131–1141. 24 W. W. RAYMOND, S. W. RUGGLES, C. S. CRAIK, G. H. CAUGHEY, J. Biol. Chem. 2003, 278, 34517–34524. 25 H. PETRASSI, J. WILLIAMS, J. LI, C. TUMANUT, B. BACKES, J. HARRIS, Submitted for publication. 26 D. S. DAUBER, R. ZIERMANN, N. PARKIN, D. J. MALY, S. MAHRUS, J. L. HARRIS, J. A. ELLMAN, C. PETROPOULOS, C. S. CRAIK, J. Virol. 2002, 76, 1359–1368. 27 A. SHIPWAY, H. DANAHAY, J. A. WILLIAMS, D. C. TULLY, B. J. BACKES, J. L. HARRIS, Biochem. Biophys. Res. Commun. 2004, 324, 953–963. 28 D. E. MASON, J. EK, E. C. PETERS, J. L. HARRIS, Biochemistry 2004, 43, 6535–6544. 29 A. J. BIRKETT, D. F. SOLER, R. L. WOLZ, J. S. BOND, J. WISEMAN, J. BERMAN, R. B. HARRIS, Anal. Biochem. 1991, 196, 137–143. 30 J. R. PETITHORY, F. R. MASIARZ, J. F. KIRSCH, D. V. SANTI, B. A. MALCOLM, Proc. Natl Acad. Sci. USA 1991, 88, 11510–11514. 31 D. ARNOLD, W. KEILHOLZ, H. SCHILD, T. DUMRESE, S. STEVANOVIC, H.-G. RAMMENSEE, Eur. J. Biochem. 1997, 249, 171–179. 32 B. E. TURK, L. L. HUANG, E. T. PIRO, L. C. CANTLEY, Nat. Biotechnol. 2001, 19, 661–667. 33 J. M. OSTRESH, J. H. WINKLE, V. T. HAMASHIN, R. A. HOUGHTEN, Biopolymers 1994, 34, 1681–1689. 34 B. E. TURK, T. Y. WONG, R. SCHWARZENBACHER, E. T. JARRELL, S. H. LEPPLA, R. J. COLLIER, R. C. LIDDINGTON, L. C. CANTLEY, Nat. Struct. Mol. Biol. 2004, 11, 60–66. 35 B. J. BACKES, J. L. HARRIS, F. LEONETTI, C. S. CRAIK, C. J. A. ELLMAN, Nat. Biotechnol. 2000, 18, 187–193. 36 J. E. SHEPPECK II, H. KAR, L. GOSINK, J. B. WHEATLEY, E. GJERSTAD, S. M. LOFTUS, A. R. ZUBIRIA, J. W. JANC, Bioorg. Med. Chem. Lett. 2000, 10, 2639–2642. 37 M. F. TEMPLIN, D. STOLL, M. SCHRENK, P. C. TRAUB, C. F. VOHRINGER, T. O. JOOS, Trends Biotechnol. 2002, 20, 160–166.
References 38 N. DEKKER, R. C. COX, R. A. KRAMER, M. R. EGMOND, Biochemistry 2001, 40, 1694–1701. 39 R. FRANK, Tetrahedron 1992, 48, 9217–9232. 40 C. M. SALISBURY, D. J. MALY, J. A. ELLMAN, J. Am. Chem. Soc. 2002, 124, 14868–14870. 41 A. FURKA, F. SEBESTYEN, M. ASGEDOM, G. DIBO, Int. J. Pept. Protein. Res. 1991, 37, 487–493. 42 Y. DUAN, R. A. LAURSEN, Anal. Biochem. 1994, 216, 431–438. 43 M. MELDAL, I. SVENDSEN, K. BREDDAM, F.-I. AUZANNEAU, Proc. Natl Acad. Sci. USA 1994, 91, 3314–3318. 44 M. A. JULIANO, E. DEL NERY, J. SCHARFSTEIN, M. MELDAL, I. SVENDSEN, A. WALMSLEY, L. JULIANO, in Peptides 1996, Proceedings of the 24th European Peptide Symposium, Edinburgh, Sept. 8–13, 1996, 1998, pp. 511–512. 45 P.M.St. HILAIRE, M. WILLERT, M. A. JULIANO, L. JULIANO, M. MELDAL, J. Comb. Chem. 1999, 1, 509–523. 46 P.M.St. HILAIRE, L. C. ALVES, S. J. SANDERSON, J. C. MOTTRAM, M. A. JULIANO, L. JULIANO, G. H. COOMBS, M. MELDAL, ChemBioChem 2000, 1, 115–122. 47 N. WINSSINGER, R. DAMOISEAUX, D. C. TULLY, B. H. GEIERSTANGER, K. BURDICK, J. L. HARRIS, Chem. Biol. 2004, 11, 1351–1360. 48 F. DEBAENE, L. MEJIAS, J. L. HARRIS, N. WINSSINGER, Tetrahedron 2004, 60, 8677–8690. 49 S. P. LEYTUS, W. L. PATTERSON, W. F. MANGEL, Biochem. J. 1983, 215, 253–260. 50 B. FURIE, B. C. FURIE, Cell 1988, 53, 505–518. 51 G. P. SMITH, J. K. SCOTT, Methods Enzymol. 1993, 217, 228–257. 52 D. J. MATTHEWS, J. A. WELLS, Science 1993, 260, 1113–1117. 53 S.-H. KE, G. S. COOMBS, K. TACHIAS, M. NAVRE, D. R. COREY, E. L. MADISON, J. Biol. Chem. 1997, 272, 16603–16609. 54 S.-H. KE, G. S. COOMBS, K. TACHIAS, D. R. COREY, E. L. MADISON, J. Biol. Chem. 1997, 272, 20456–20462. 55 G. S. COOMBS, R. C. BERGSTROM, J.-L. PELLEQUER, S. I. BAKER, M. NAVRE, M. M.
56
57
58
59
60
61
62 63
64
65
66 67
68
SMITH, J. A. TAINER, E. L. MADISON, D. R. COREY, Chem. Biol. 1998, 5, 475–488. Z. Q. BECK, L. HERVIO, P. E. DAWSON, J. H. ELDER, E. L. MADISON, Virology 2000, 274, 391–401. H. R. MAUN, C. EIGENBROT, R. A. LAZARUS, J. Biol. Chem. 2003, 278, 21823–21830. T. TAKEUCHI, J. L. HARRIS, W. HUANG, K. W. YAN, S. R. COUGHLIN, C. S. CRAIK, J. Biol. Chem. 2000, 275, 26333–26342. J. L. HARRIS, E. P. PETERSON, D. HUDIG, N. A. THORNBERRY, C. S. CRAIK, J. Biol. Chem. 1998, 273, 27364–27373. A. TENZER, B. HOFSTETTER, C. SAUSER, S. BODIS, A. P. SCHUBIGER, C. BONNY, M. PRUSCHY, Proteomics 2004, 4, 2796–2804. N. A. SHARKOV, R. M. DAVIS, J. F. REIDHAAR-OLSON, M. NAVRE, D. CAI, J. Biol. Chem. 2001, 276, 10788–10793. M. M. SMITH, L. SHI, M. NAVRE, J. Biol. Chem. 1995, 270, 6440–6449. L. DING, G. S. COOMBS, L. STRANDBERG, M. NAVRE, D. R. COREY, E. L. MADISON, Proc. Natl Acad. Sci. USA 1995, 92, 7627–7631. L. S. HERVIO, G. S. COOMBS, R. C. BERGSTROM, K. TRIVEDI, D. R. COREY, E. L. MADISON, Chem. Biol. 2000, 7, 443–452. T. LOHMUELLER, D. WENZLER, S. HAGEMANN, W. KIESS, C. PETERS, T. DANDEKAR, T. REINHECKEL, Biol. Chem. 2003, 384, 899–909. D. J. BOWEN, Mol. Pathol. 2002, 55, 1–18. M. CITRON, D. WESTAWAY, W. XIA, G. CARLSON, T. DIEHL, G. LEVESQUE, K. JOHNSON-WOOD, M. LEE, P. SEUBERT, A. DAVIS, D. KHOLODENKO, R. MOTTER, R. SHERRINGTON, B. PERRY, H. YAO, R. STROME, I. LIEBERBURG, J. ROMMENS, S. KIM, D. SCHENK, P. FRASER, P. S. G. HYSLOP, D. J. SELKOE, Nat. Med. 1997, 3, 67–72. C. De JONGHE, M. CRUTS, E. A. ROGAEVA, C. TYSOE, A. SINGLETON, H. VANDERSTICHELE, W. MESCHINO, B. DERMAUT, I. VANDERHOEVEN, H. BACKHOVENS, E. VANMECHELEN, C. M. MORRIS, J. HARDY, D. C. RUBINSZTEIN, P.H.St. GEORGE-HYSLOP, C. Van BROECKHOVEN, Hum. Mol. Genet. 1999, 8, 1529–1540.
27
28 Protease Substrate Profiling
69 J. HARDY, D. J. SELKOE, Science 2002, 297, 353–356. 70 P. LI, H. ALLEN, S. BANERJEE, S. FRANKLIN, L. HERZOG, C. JOHNSTON, J. MCDOWELL, M. PASKIND, L. RODMAN, J. SALFELD, E. TOWNE, D. TRACEY, S. WARDWELL, F.-Y. WEI, W. WONG, R. KAMEN, T. SESHADRI, Cell 1995, 80, 401–411. 71 G. P. SHI, J. A. VILLADANGOS, G. DRANOFF, C. SMALL, L. GU, K. J. HALEY, R. RIESE, H. L. PLOEGH, H. A. CHAPMAN, Immunity 1999, 10, 197–206. 72 B. M. STEIGLITZ, M. AYALA, K. NARAYANAN, A. GEORGE, D. S. GREENSPAN, J. Biol. Chem. 2004, 279, 980–986. 73 U. SAHIN, G. WESKAMP, K. KELLY, H.-M. ZHOU, S. HIGASHIYAMA, J. PESCHON, D. HARTMANN, P. SAFTIG, C. P. BLOBEL, J. Cell Biol. 2004, 164, 769–779. 74 L. B. KNUDSEN, J. Med. Chem. 2004, 47, 4128–4134. 75 D. MARGUET, L. BAGGIO, T. KOBAYASHI, A.-M. BERNARD, M. PIERRES, P. F. NIELSEN, U. RIBEL, T. WATANABE, D. J. DRUCKER, N. WAGTMANN, Proc. Natl Acad. Sci. USA 2000, 97, 6874–6879. 76 G. J. HANNON, J. J. ROSSI, Nature 2004, 431, 371–378. 77 S. TANIDA, T. JOH, K. ITOH, H. KATAOKA, M. SASAKI, H. OHARA, T. NAKAZAWA, T. NOMURA, Y. KINUGASA, H. OHMOTO, H. ISHIGURO, K. YOSHINO, S. HIGASHIYAMA, M. ITOH, Gastroenterology 2004, 127, 559–569. 78 J. L. CONTRERAS, M. VILATOBA, C. ECKSTEIN, G. BILBAO, A. THOMPSON, D. E. ECKHOFF, Surgery 2004, 136, 390–400. 79 M. LI, C. L. BROOKS, N. KON, W. GU, Mol. Cell 2004, 13, 879–886. 80 K. D. LUSTIG, K. L. KROLL, E. E. SUN, M. W. KIRSCHNER, Development 1996, 122, 4001–4012. 81 S. KOTHAKOTA, T. AZUMA, C. REINHARD, A. KLIPPEL, J. TANG, K. CHU, T. J. MCGARRY, M. W. KIRSCHNER, K. KOTHS, D. J. KWIATKOWSKI, L. T. WILLIAMS, Science 1997, 278, 294–298. 82 V. L. CRYNS, Y. BYUN, A. RANA, H. MELLOR, K. D. LUSTIG, L. GHANEM, P. J. PARKER, M. W. KIRSCHNER, J. YUAN, J. Biol. Chem. 1997, 272, 29449–29453.
83 B. STEFFENSEN, U. M. WALLON, C. M. OVERALL, J. Biol. Chem. 1995, 270, 11555–11566. 84 C. M. OVERALL, E. TAM, G. A. MCQUIBBAN, C. MORRISON, U. M. WALLON, H. F. BIGG, A. E. KING, C. R. ROBERTS, J. Biol. Chem. 2000, 275, 39497–39506. 85 G. A. MCQUIBBAN, J.-H. GONG, E. M. TAM, C. A. G. MCCULLOCH, I. CLARK-LEWIS, C. M. OVERALL, Science 2000, 289, 1202–1206. 86 P. J. BERESFORD, C.-M. KAM, J. C. POWERS, J. LIEBERMAN, Proc. Natl Acad. Sci. USA 1997, 94, 9285–9290. 87 J. M. FLYNN, S. B. NEHER, Y.-I. KIM, R. T. SAUER, T. A. BAKER, Mol. Cell 2003, 11, 671–683. 88 P. J. BERESFORD, D. ZHANG, D. Y. OH, Z. FAN, E. L. GREER, M. L. RUSSO, M. JAJU, J. LIEBERMAN, J. Biol. Chem. 2001, 276, 43285–43293. 89 Z. FAN, P. J. BERESFORD, D. Y. OH, D. ZHANG, J. LIEBERMAN, Cell 2003, 112, 659–672. 90 Y. WANG, D. RASNICK, J. KLAUS, D. PAYAN, D. BROEMME, D. C. ANDERSON, J. Biol. Chem. 1996, 271, 28399–28406. 91 L. GUO, J. R. EISENMAN, R. M. MAHIMKAR, J. J. PESCHON, R. J. PAXTON, R. A. BLACK, R. S. JOHNSON, Mol. Cell. Pro-teomics 2002, 1, 30–36. 92 A. J. BREDEMEYER, R. M. LEWIS, J. P. MALONE, A. E. DAVIS, J. GROSS, R. R. TOWNSEND, T. J. LEY, Proc. Natl Acad. Sci. USA 2004, 101, 11785–11790. 93 E. M. TAM, C. J. MORRISON, Y. I. WU, M. S. STACK, C. M. OVERALL, Proc. Natl Acad. Sci. USA 2004, 101, 6917–6922. 94 M. FUNOVICS, R. WEISSLEDER, C.-H. TUNG, Anal. Bioanal. Chem. 2003, 377, 956–963. 95 R. WEISSLEDER, C.-H. TUNG, U. MAHMOOD, A. BOGDANOV Jr., Nat. Biotechnol. 1999, 17, 375–378. 96 C. BREMER, C.-H. TUNG, R. WEISSLEDER, Natl. Med 2001, 7, 743–748. 97 A. WUNDER, C.-H. TUNG, U. MUELLER-LADNER, R. WEISSLEDER, U. MAHMOOD, Arthritis Rheum. 2004, 50, 2459–2465. 98 P. L. CARL, P. K. CHAKRAVARTY, J. A. KATZENELLENBOGEN, M. J. WEBER, Proc.
References Natl Acad. Sci. USA 1980, 77, 2224–2228. 99 G. M. DUBOWCHIK, R. A. FIRESTONE, Bioorg. Med. Chem. Lett. 1998, 8, 3341–3346. 100 P. M. LOADMAN, M. C. BIBBY, J. A. DOUBLE, W. M. AL-SHAKHAA, R. DUNCAN, Clin. Cancer Res. 1999, 5, 3682–3688. 101 R. M. SCHNEIDER, Y. MEDVEDOVSKA, I. HARTL, B. VOELKER, M. P. CHADWICK, S. J. RUSSELL, K. CICHUTEK, C. J. BUCHHOLZ, Gene Ther. 2003, 10, 1370–1380.
102 S. R. DENMEADE, A. NAGY, J. GAO, H. LILJA, A. V. SCHALLY, J. T. ISAACS, Cancer Res. 1998, 58, 2537–2540. 103 D. DEFEO-JONES, V. M. GARSKY, B. K. WONG, D.-M. FENG, T. BOLYAR, K. HASKEL, D. M. KIEFER, K. LEANDER, E. MCAVOY, P. LUMMA, J. WAI, E. T. SENDERAK, S. L. MOTZEL, K. KEENAN, M. VAN ZWIETEN, J. H. LIN, R. FREIDINGER, J. HUFF, A. OLIFF, R. E. JONES, Nat. Med. 2000, 6, 1248–1252. 104 A. BERGER, I. SCHECHTER, Philos. Trans. R. Soc. Lond. B 1970, 257, 249–264.
29
1
Photochemical Enzyme Regulation Ling Peng Universit´e de Marseille, Marseille, France
Maurice Goeldner University of Oxford, Oxford, United Kingdom
Originally published in: Dynamic Studies in Biology. Edited by Maurice Goeldner and Richard S. Givens. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30783-8
1 Introduction
Enzymes are important macromolecules that ensure numerous biological events such as cell metabolism, signal transduction, proliferation, differentiation, etc., and enzyme function is regulated by a number of different modulators, e.g. substrates, products, inhibitors and/or coenzymes. Thus, understanding the functioning of enzymes is important for us to study related biological events. Enzyme catalysis is a dynamic process that involves, successively, binding and breakdown of substrate with or without the help of a coenzyme, formation of product, as well as release of product. Enzymatic function can also be modulated with inhibitors that act at the active or a secondary-binding site. Therefore, photochemical enzyme regulation using caged enzyme modulators is of special interest, allowing control of the catalytic reaction. These caged modulators have a photoremovable protecting group which renders them biologically inactive either by transforming them to an inert chemical or by converting them to an enzyme inhibitor. Upon photoactivation, the caged compounds release the corresponding enzyme substrate, product, coenzyme or inhibitor and switch either on or off the enzymatic reaction. Thus, they are useful tools to photoregulate the enzyme activity and to undertake dynamic studies on their catalytic mechanism. This chapter will cover different examples of photochemical enzyme regulation by caged enzyme modulators, including caged substrates, products, coenzymes or inhibitors.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Photochemical Enzyme Regulation
2 Photoregulation of ATPases 2.1 ATPases using Caged ATP Molecules 2.1.1 Na,K-ATPase The first application of a caged compound in the photoregulation of enzyme activity was pioneered 1978 with the groundbreaking work of Kaplan et al. [1] using caged ATP (Scheme 1) to control the activity on the Na,K-ATPase. The caging group used was a 2-nitrobenzyl group, a frequently used protecting group in organic synthesis [2]. Both the P3 -2-nitrobenzyl ATP (1, NB-ATP) and the P3 -1-(2-nitrophenyl) ethyl ATP (2, NPE-ATP), upon irradiation at 340 nm, released ATP, with a better photolytic efficiency for the NPE-caged ATP (2). The mechanism and kinetics of the photochemical reaction of the caged ATP molecules have been examined in details [3]. Prior to irradiation, neither 1 nor 2 was a substrate nor an inhibitor of purified renal Na,K-ATPase. After photolysis, ATP was released from 2 and hydrolysed by Na,K-ATPase. The byproduct of the photoreaction, 2-nitroso acetophenone, produced inhibition of the enzymatic hydrolysis of ATP, which could be prevented by using scavengers such as glutathione or bisulfite in the irradiated enzyme solution. Further, caged ATP was incorporated into resealed human erythrocyte ghosts prepared from red blood cells depleted of internal energy stores [3]. While the Na,K pump was unable to use incorporated caged ATP molecules as a substrate, the
Scheme 1 Caged ATPase modulators: caged ATP (1 and 2); caged SERCA inhibitor (3); caged Ca2+ .
2 Photoregulation of ATPases
ATP liberated by photolysis activated the pump as evidenced by measurements of K+ -dependent, ouabain-sensitive Na+ efflux. 2.1.2 Transport ATPases and Carriers A thorough analysis of the mechanism of ion transport across membranes using caged substrates for the chemical energy producing enzymes have been described in details by the groups of Fendler and Bamberg, and has been summarized recently [4]. These studies used the NPE-ATP (2) to photoregulate the different ATPases involved in ion transport processes: Na,K-ATPase, Ca-ATPase, Kdp-ATPase and H,KATPase, while a caged Ca2+ molecule was used to investigate the Na,Ca exchanger. More recently, these groups described in detail the mechanism of functioning of the Kdp system, a bacterial P-type ATPase of Escherichia coli that transports K+ with high affinity [5]. Upon rapid release of ATP from 2, a transient current occurred in the absence of K+ , similar to the stationary current seen in the presence of K+ . Based on these studies, a kinetic model for the Kdp-ATPase similar to that of the Na,K-ATPase was proposed: (1) the K-independent step is electrogenic and corresponds to the outward transport of a negative charge; and (2) the K-translocating step, probably also electrogenic, corresponds to transport of positive charge to the intracellular side of the protein [5]. 2.1.3 Molecular Motors Cell motility includes contraction of skeletal, cardiac and smooth muscles, and other motion. The motor proteins that are responsible for these essential functions are members of three superfamilies: kinesins, myosins and dyneins. They convert the chemical energy of ATP hydrolysis into mechanical work. The elementary mechanical steps like force generation should be related to chemical reactions such as ATP binding, and Pi and ADP release. Combining the measurement of force produced by kinesin or myosin with photolysis of caged compounds provides a powerful approach to study the mechano-chemical coupling of the molecular motors, which has been summarized recently by Dantzig et al. [6]. In order to investigate the relations between transient of force by single kinesin molecule and the elementary steps of the ATPase cycle [7], Higuchi et al. measured the time to force generation by kinesin after photorelease of ATP from 2. The kinetics of force generation was consistent with a two-step reaction: ATP binding followed by force generation. The transient rate of force generation was close to the rate of the ATPase cycle in solution, suggesting that the rate-limiting step of the ATPase cycle was involved with the force generation. 2.2 Ca2+ -ATPase
The Ca2+ -ATPase of sarcoplasmic reticulum is a membrane protein which performs the active transport of Ca2+ from the cytoplasm of muscle cells into the sarcoplasmic reticulum, through hydrolysis of ATP [8]. It belongs to the P-type ATPase family and serves as a model for the study of these enzymes. A reaction scheme
3
4 Photochemical Enzyme Regulation
has been proposed for the Ca2+ -ATPase of sarcoplasmic reticulum, suggesting a change from a high Ca2+ affinity, ATP-phosphorylated enzyme state to a low Ca2+ affinity, Pi -phosphorylated state [8]. Details of the transport mechanism, especially the coupling between ATP hydrolysis and active Ca2+ transport, are unknown. Thus, different caged modulators, combined with time-resolved techniques, represent a powerful methodology to study the catalytic reaction of Ca2+ -ATPase. 2.2.1 Ca2+ -ATPase using Caged ATP M¨antele et al. used caged ATP (2) to photoregulate the activity of the Ca2+ -ATPase and studied the subsequent reactions by Fourier transform IR (FTIR) spectroscopy [9]. Photolytic release of ATP from 2 started the catalytic cycle of Ca2+ -ATPase, permitting a sensitive probing of the molecular process such as ATP hydrolysis and formation of the phosphorylated enzyme during catalytic activity of Ca2+ -ATPase by FTIR spectroscopic studies. Their results suggested that the active Ca2+ -ATPase was the phosphorylated form with two Ca2+ ions. Further, Lewis and Thomas [10] undertook studies on microsecond rotational dynamics of Ca2+ -ATPase during enzymatic cycling initiated by photolysis of caged ATP 2. The Ca2+ -ATPase was selectively labeled with an iodoacetamide spin label, which yielded electron paramagnetic resonance (EPR)-sensitive conformational changes of enzyme during ATP-induced enzymatic cycling. Their results indicated that substantial changes in protein mobility did not occur during the Ca2+ -ATPase cycle unless they occurred very early in the transient phase or at some later point in the cycle that was not significantly populated in the steady state. Further experiments [11] showed that transient changes in the amplitude of the restricted component associated with the pre-steady state of Ca2+ pumping were detected with 10-ms time resolution after an ATP jump produced by laser flash photolysis of 2. More light would be shed to this question with caged ATP molecules displaying higher time resolution or with different caged compounds such as caged phosphate, caged calcium, etc. 2.2.2 Ca2+ -ATPase using Caged Inhibitor The family of sarcoplasmic/endoplasmic reticulum Ca2+ -ATPases (SERCA) that transport Ca2+ into the sarcoplasmic reticulum and endoplasmic reticulum are important regulators of cytosolic free Ca2+ levels. The effect of these pumps on Ca2+ oscillations has been investigated by the use of a caged SERCA inhibitor: NmocDBHQ (3, Scheme 1) [12]. DBHQ is a membrane-permeant, reversible inhibitor of SERCA. Photolysis of Nmoc-DBHQ is rapid and the subsequent release of DBHQ takes around 5 ms. The rate-limiting step in the release of the DBHQ inhibitor is the decarboxylation reaction of the photolytic carbonate intermediate. Nmoc-DBHQ was used for photomodulating the SERCA activity and for probing Ca2+ signaling dynamics such as [Ca2+ ] oscillations in fibroblasts, which were monitored through the Ca2+ -sensitive fluorescence of fluo-3 [12]. 2.2.3 Ca2+ -ATPase using Caged Ca2+ Using caged Ca2+ , DM-nitrophen Ca (4, Scheme 1), Troullier et al. [13] studied the calcium binding at the high-affinity transport sites of sarcoplasmic reticulum
3 Photoregulation of GTPases: Ras Protein
Ca2+ -ATPase through time-resolved FTIR spectroscopy. DM-nitrophen is a calcium chelator consisting of a photolabile group linked to an EDTA molecule, having a high affinity for calcium (5 nM at pH 7.1) [14]. Upon photolysis, the EDTA moiety of DM-nitrophen is cleaved into two parts and the affinity of the photoproducts for calcium (3 mM at pH 7.1) is dramatically reduced [14]. Thus, DM-nitrophen Ca was used to trigger the efficient release of calcium in the IR samples. FTIR analyses indicated that glutamic and/or aspartic side-chains were deprotonated upon calcium binding, while main-chain carbonyl groups could also play a role in calcium binding [14].
3 Photoregulation of GTPases: Ras Protein
The guanine nucleotide-binding protein Ras [15] plays a central role in cellular processes acting as a switch between an active GTP-bound and an inactive GDPbound form. Such a switch mechanism is found in pathways involved in cell proliferation, signal transduction, differentiation, protein synthesis and protein transport. The intrinsic GTP hydrolysis rate by Ras is rather low and it is greatly enhanced by GTPase-activating proteins (GAPs). Therefore, understanding the detailed GTPase mechanism of Ras might provide a paradigm for many similar processes. Furthermore, the molecular GTPase mechanism of Ras is particularly relevant since oncogenic Ras mutants appear to be involved in 25–30% of all human tumors. 3.1 Time-resolved Crystallographic Studies with Caged GTP
Crystals of Ha-Ras p21 with NPE-GTP (5, Scheme 2) at the active site [16] have been used to study the conformational change of p21 on GTP hydrolysis. Upon photolysis, GTP was released from the caged compound and the structure of the short-lived p21–GTP complex was determined by Laue diffraction methods, although the structural resolution was low [17]. The binding of GTP in p21 was not significantly affected by the caging group, when the optically pure caged GTP (Scheme 2) was employed [18]. The resolution was, however, significantly improved when the crystals were frozen for monochromatic X-ray data collection [19]. The structural analyses showed local events around the γ -phosphate group of GTP played an important role in GTP hydrolysis. 3.2 Time-resolved FTIR Studies with Caged GTP
The group of Gerwert has employed the technique of time-resolved FTIR difference spectroscopy to study, in solution, the GTPase reaction of Ras, at atomic resolution, in a time-resolved mode [20]. The phosphate vibration was analyzed using site
5
6 Photochemical Enzyme Regulation
Scheme 2 Caged GTP.
specifically 18 O-labeled caged GTP isotopomers [21]. One non-bridging oxygen per nucleotide was replaced for an 18 O isotope in the α, β or γ position of the phosphate chain. In photolysis experiments with free caged GTP, strong vibrational couplings were observed among all phosphate groups. The investigation of the photolysed [Ras·(NPE-GTP)] complex and the subsequent hydrolysis reaction of Ras·GTP showed that the phosphate vibrations are largely decoupled by interaction with the protein in contrast to free GTP. The unusual low frequency of the β(PO2 − ) vibration of Ras-bound GTP, as compared to free GTP, indicated a large decrease in the P O bond order. This revealed that the oxygen atoms of β(PO2 − ) group interact much more strongly with the protein than the γ -oxygen atoms, leading to partial bond breakage or at least weakening the bond between the β/γ bridging oxygen, which preprograms the hydrolysis of GTP. Using FTIR technique, Du et al. [22] found also stronger binding of the GDP moiety by the Ras mutant with higher activity, suggesting that the transition state is largely GDP-like. They analyzed the photolysis and hydrolysis FTIR spectra of the [βnon-bridging-18 O2 , α/β-non-bridging-18 O]GTP isotopomer to probe for positional isotope exchange. However, no positional isotope exchange was observed during GTP hydrolysis by Ras. Thus, they proposed a concerted mechanism with the transition state having a considerable amount of dissociative character. Further, the mechanism of the Ras GTPase reaction, in the presence of catalytic amounts of GAP, was also probed by time-resolved FTIR using caged GTP (5) as a photolabile trigger [23]. This approach provided the complete GTPase reaction pathway at the atomic level with milliseconds time resolution. The addition of GAP reduced the measuring time by two orders of magnitude, which improved significantly the quality of the data [23]. The spectral data indicated that binding to Ras forced the flexible GTP molecule into a strained conformation and induced a specific charge distribution different from that in the unbound case. Binding of
4 Photoregulation of Kinases
GAP to Ras·GTP shifted the negative charge from γ - to β-phosphate. Such a shift was already identified by FTIR because of Ras binding and was enhanced by GAP binding. Thus, the charge distribution of the GAP–Ras–GTP complex resembled dissociative-like transition state and was more like that in GDP. An intermediate was observed on the reaction pathway that appeared when the bond between β- and γ -phosphate was cleaved. In the intermediate, the released Pi was strongly bound to the protein and surprisingly showed bands typical of those seen for phosphorylated enzyme intermediates. The release of Pi from the protein complex is the ratelimiting step for the GAP-catalyzed reaction. These results provided a mechanistic picture that is different from the intrinsic GTPase reaction of Ras [24].
4 Photoregulation of Kinases
Kinases catalyze the transfer of a phosphate group from ATP to an acceptor including amino acids (serine, threonine, tyrosine and histidine), nucleotides (monoand diphosphates) and glucides. This action can be reversed by phosphatases, and the activity of the kinases is closely regulated by intracellular signals such as the concentration of cAMP and of Ca2+ . Among the different protein kinases, PKA, activated by cAMP, and PKC, activated by Ca2+ and diacylglycerol (DAG), represent important regulatory enzymes for signal transduction pathways. Interestingly, as early as 1977, even earlier to the work reported by Kaplan et al. on caged ATP, Engels and Schlaeger [25] reported the synthesis of adenosine cyclic 3 ,5 -phosphate2-nitrobenzyl triester and their characterization as photolabile precursor of cAMP as demonstrated by the photochemical regulation of the activity of protein kinase. Caged cAMP with a 2-nitrobenzyl caging group was synthesized by treatment of the free acid of cAMP with diazo-2-nitrotoluene leading to two diastereomeric isomers of 2NB-caged cAMP, axial (6) and equatorial (7), respectively (Scheme 3). Clearly this pioneering work has opened the way to the use of caged regulators of signaling pathways [26]. 4.1 cAMP-dependent Protein Kinase
cAMP is a second messenger which plays an important role in regulating a variety of cellular processes in eukaryotic organisms. Its intracellular level is regulated by two enzymes of opposite action, i.e. synthesis versus degradation, catalyzed by adenylate cyclase and phosphodiesterase, respectively. cAMP activates specific cellular enzymes called cAMP-dependent protein kinases which catalyze the transfer of a phosphate group from ATP to a specific serine or threonine. Due to the cellular importance of cAMP, it is not surprising that this second messenger has been targeted for caging studies to allow a spatiotemporal control of its liberation in living cells.
7
8 Photochemical Enzyme Regulation
4.1.1 PKA using Caged cAMP As already mentioned, the photochemical activation of PKA using caged cAMP molecules 6 or 7 was demonstrated in vitro on purified bovine brain cAMPdependent protein kinase [25]. Another example assessed the role of cAMP and PKA in 11,12-epoxyeicosatrienoic acid-induced changes in endothelial cell coupling. This was demonstrated by a similar effect observed with forskolin (an adenylate cyclase activator), PKA activators or the photochemical release of cAMP from DMNB-cAMP (8, Scheme 3) on the enhanced endothelial cell coupling [27]. Finally, the cAMPdependent regulation of intracellular Ca2+ involves activation of cAMP-dependent PK as suggested by phosphorylation of IP3 receptor. Using DMNB-cAMP as well as other caged second messengers (Ca2+ and IP3 ) it could be concluded that cAMP elevation inhibits IP3 -induced Ca2+ release in an intact cell and that this inhibition was mediated by cAMP-dependent PK [28]. 4.1.2 PKA using Caged Inhibitor Lawrence et al. described the synthesis and the photochemical properties of a caged peptide PKA inhibitor [29]. The incorporation of the NVOC-caging group on one arginine side-chain of the nonapeptide Gly-Arg-Thr-Gly-Arg-Arg-Asn-Ala-Ile (9, Scheme 3) generated a PKA competitive inhibitor of reduced affinity (20 µM for the caged inhibitor versus 420 nM for the uncaged inhibitor). The in vivo inhibitory activity of the nonapeptide on rat embryo fibroplasts, through PKA-dependent
Scheme 3 Caged PKA modulators: caged cAMP (6–8); caged PKA peptide inhibitor (9).
4 Photoregulation of Kinases
inhibition of morphological changes, was demonstrated by the light-induced changes in morphology observed in the presence of caged inhibitor on this cell line. 4.2 PKC
PKC is a family of enzymes that regulate several cellular proteins by phosphorylation. It is inactive in the cytoplasm until its primary activator DAG is released by phospholipase C. 4.2.1 PKC using Caged Activators The synthesis and the photochemical properties of an α-carboxyl-2,4-dinitrobenzyl1,2-dioctanoyl-rac-glycerol (10, Scheme 4) has been described and used to photochemically activate PKC purified from rat brain [30]. The caged DAG derivative was successfully incorporated in cell membranes. Illumination of electrically stimulated cardiac myocytes loaded with the caged derivative led to a complex cellular response including an initial enhancement of cell contraction followed by a loss of responsiveness. Both phases were inhibited by chelerythrine, a PKC antagonist. The caging of fatty acids as potential PKC activators led to the development of a new caging group for carboxylic acids: 1-(2 -nitrophenyl)-2-ethanediol derivatives (NED-caged derivative) [31]. This new reagent was used to cage arachidonic acid (11, Scheme 4) and was developed mainly because of the hydrolytic instability of the corresponding CNB-caged derivative (12, Scheme 4). The group, besides its
Scheme 4 Caged PKC modulators: caged DAG (10); caged arachidonic acid (11 and 12); caged fluorescent PKC peptide substrate (13).
9
10 Photochemical Enzyme Regulation
hydrolytic stability, displayed fairly good quantum yields (0.2). Activation of PKC could be clearly related to the photochemical release of arachidonic acid from 11. 4.2.2 PKC using Caged Fluorescent Substrates A caged fluorescent PKC substrate was conceived as a peptide derivative where the N-terminal serine side-chain was protected from phosphorylation by a DMNBcaging group and the N-terminal substituted by an NBD fluorescent moiety (13, Scheme 4) [32]. A 2.5-fold increase in fluorescence intensity was observed upon phosphorylation of the uncaged peptide [33]. This caged fluorescent PKC substrate constitutes a first example of a spatiotemporal sensor of PKC in living cells. 4.3 Calmodulin-dependent Protein Kinases 4.3.1 Myosin Light Chain Kinase (MLCK) using Caged Peptide Inhibitors To probe the function of calmodulin, caged peptide inhibitors of MLCK were synthesized and tested as potential blockers of eosinophil cell motility induced by calcium-bound calmodulin (CAM) binding to MLCK [34]. A 20-amino-acid peptide, representing the target sequence in MLCK and which binds tightly to CAM, has been modified as a CNB-caged Tyr derivative (14, Scheme 5). This caged peptide binds CAM 50 times less tightly than the parent peptide and is effectively inactive when assayed in smooth muscle cells known to be CAM dependent. This caged peptide was microinjected in eosinophil cells exhibiting normal motility and promptly blocked cell locomotion when exposed to light. 4.3.2 Calmodulin-dependent Protein Kinase II (CaMKII) using Caged Peptide Inhibitors CaMKII is a second-messenger-responsive multifunctional protein kinase that plays important roles in controlling a variety of cellular functions including memory in response to an increase in intracellular Ca2+ [35]. A 13-amino-acid peptide (LysLys-Ala-Leu-Arg-Arg-Gln-Glu-Ala-Val-Asp-Ala-Leu), designated as autocamitide2-related inhibitory peptide (AIP), was found to inhibit CaMKII in a highly specific manner [36]. Studies showed the importance of the first two Lys residues in the inhibitory activity [37]. Therefore, caged AIPs (15 and 16, Scheme 5) were synthesized and characterized to elucidate the mechanism of CaMKII [38]. The caged AIP resulted in the decrease of the inhibitory activity on CaMKII. Upon irradiation, caged AIP regenerates the parent peptide AIP, thus restores the inhibitory activity as expected.
5 Photoregulation of Alcohol Dehydrogenases
Dehydrogenases are oxidoreductases which utilize different types of co-factors: nucleotide coenzymes (NAD+ , NADP+ ), flavine coenzymes (FAD, FMN), quinone
5 Photoregulation of Alcohol Dehydrogenases
Scheme 5 Caged calmodulin-dependent PK peptide inhibitors (14–16).
coenzymes (PQQ, TTQ) or metallic coenzymes. These oxido-reductions catalyze the transfer of different pairs of reactive species: hydride and proton, or one or two protons together with an equal number of electrons. 5.1 Isocitrate Dehydrogenase (IDH): Caged NAD and NADP
Pyridine nucleotides NAD and NADP are the most abundant coenzymes in eukaryotic cells, and serve as oxidative cofactors for many dehydrogenases, reductases and hydroxylases. They are major carriers of H+ and e− in many important metabolic systems such as the glycolysis pathway, the tricarboxylic acid cycle, fatty acid synthesis and sterol synthesis. A series of caged NADP and NAD analogs has been synthesized and investigated for potential use in time-resolved crystallography study of IDH, an NADPdependent enzyme of the tricarboxylic acid cycle which converts isocitrate to αketoglutarate [39]. Biochemically distinct coenzymes analogs have been synthesized and characterized (Scheme 6). The first, a “catalytically caged” NADP 17, where the nicotinamide group is modified by a CNB-caging group, was designed to bind to IDH prior to photolysis, but is expected to become catalytically inactive. The second, an “affinity caged” NADP 18, where a phosphate group modified by
11
12 Photochemical Enzyme Regulation
Scheme 6 Caged dehydrogenase modulators: caged NADP (17 and 18); caged dehydrogenases substrates (19–23).
5 Photoregulation of Alcohol Dehydrogenases
a DMNPE-caging group, was expected to reduce its affinity for IDH. The incorporation at the 2 -phosphate is expected to most significantly reduce the affinity of NADP for IDH. The catalytic mechanism of IDH was investigated using these caged cofactors as well as a caged substrate, NPE-caged isocitrate (19, Scheme 6) [40]. This enzyme is fully reactive in the crystal and displays a kinetic mechanism similar to that in solution, with an overall turnover rate of about 60 s−1 at 22 ◦ C and with a final dissociation of products as rate limiting step. Millisecond Laue structures of the enzyme/α-ketoglutarate/NADPH complex (the product complex) were obtained using these caged modulators [41], and the most well-refined structure was obtained from the experiment using the DMNPE-caged NADP molecule which displayed a better photochemical efficiency and the highest fragmentation rate (13000 s−1 ).
5.2 Glucose-6-Phosphate Dehydrogenase (G6PDH): Caged Glucose 6-Phosphate and Caged NADP
The in vivo rate of G6PDH activity in sea urchin eggs has been investigated using radiolabeled (3 H or 14 C) NPE-caged phosphate esters of the glucose 6-phosphate substrate, NPE-glucose-6-phosphate (20, Scheme 6) [42]. Exposure to intense light flashes liberates the labeled glucose-6-phosphate within the cell, and the rates of formation of 14 CO2 and 3 H2 O after photolysis correlate with the in vivo rates of G6PDH activity. These results are discussed in terms of various hypotheses regarding the modulation of G6PDH activity by fertilization. The catalytic activity of Saccharomyces cerevisiae G6PDH was also photoregulated using a NB-caged amide of NADP and allowing a 60% enzymatic reduction of photoreleased NADP to NADPH [43].
5.3 7α-Hydroxy Steroid Dehydrogenase (7α-HSDH): Caged Cholic Acid
Bile acids are essential to solubilizing and transporting dietary lipids. Recently, bile acids have been found to bind at orphan nuclear receptor, farnesoid X receptor, which suppresses transcription of the gene encoding cholesterol 7α-hydroxylase when bound to bile acids. The signaling mediated by such nuclear receptors is essential for the regulation of endogenous hormones as well as for metabolic elimination of xenobiotic chemicals. Thus, caged bile acids (21 and 22, Scheme 6) were developed to study the signaling mediated by steroid hormones [44]. Indeed, these caged compounds produced the parent bile acid upon photoirradiation at 350 nm. The enzymatic activities of these compounds using 7α-HSDH, which oxidizes cholic acid to 7-oxo-cholic acid, were recovered after irradiation.
13
14 Photochemical Enzyme Regulation
5.4 Caged Benzyl Alcohol
The catalytic oxidation of benzyl alcohol by horse liver alcohol dehydrogenase was investigated at sub zero temperature using a caged benzyl alcohol derivative (23, Scheme 6) [45].
6 Photoregulation of Cholinesterases
Acetylcholinesterase (AChE) is a serine hydrolase that terminates cholinergic transmission by rapid hydrolysis of the neurotransmitter acetylcholine, with a turnover number approaching 20 000 s−1 [46]. Butyrylcholinesterase (BuChE) [47], termed as such because of its faster hydrolysis of butyrylcholine over acetylcholine, besides having no identified endogenous substrate, plays an important role in detoxification by degrading esters like the muscle relaxant succinylcholine and cocaine. The description of the three-dimensional (3-D) structure of AChE [48] and of BuChE [49] as well as of several AChE-inhibitor complexes, has permitted a better understanding of structure-function relationships in the cholinesterases (ChEs). It has, however, also raised new questions concerning the traffic of substrate and products to and from the active site in view of the high turnover rate. To study these dynamic issues of ChEs, a fast and efficient triggering of the enzyme reaction is required, and, clearly, a temporally and spatially controlled photoregulation of enzyme activity represents the best adapted methodology and prompted the synthesis of several caged ChEs modulators (Scheme 7) [50]. 6.1 Caged Choline Derivatives
As choline is the enzymatic product of ChEs, “caged” choline derivatives 24 and 25 (Scheme 7) were designed [51] to be complexed at the ChEs active site and to generate choline by subsequent photoactivation, allowing us to investigate the rapid clearance of choline from the active site. The arsenocholine derivatives 26 and 27 (Scheme 7) are heavy-atom analogs of caged choline, designed to track the path of choline exit by time-resolved crystallographic studies [52]. We have established the photoregulation of the ChE activities using probe 25, by restoring the enzymatic activity after photolysis of a ChE/25 complex [53]. The experiments in which we established the recovery of enzymic activity after flash laser photolysis of solutions of AChE or BuChE containing 25 (Fig. 1) suggested not only that the byproduct 2-nitrosoacetophenone did not have an inhibitory effect on the two enzymes in the experimental conditions employed, but that both 2-nitrosoacetophenone and choline, which are generated concomitantly within the active sites of the enzymes, were cleared from their active sites [53]. Thus, the experimental system can indeed serve as a paradigm for studying the exit of choline
6 Photoregulation of Cholinesterases
Scheme 7 Caged cholinesterase modulators: caged choline (24 and 25) and arsenocholine (26 and 27); caged nor-acetylcholine (28 and 29) and nor-butyrylcholine (30 and 31); caged carbamylcholine (32) and arsenocarbamylcholine (33).
from the active site of AChE or BuChE by means of time-resolved crystallography. Recently, the 3-D structure of caged arsenocholine 27 complexed to AChE has been obtained [54] opening the way to anomalous diffraction analyses. These studies, however, will need to be reconsidered in the light of recent studies demonstrating [55] that photolytic release of alcohols from caged nitrobenzyl ether derivatives is much slower than initially claimed. 6.2 Caged Cholinesterase Substrate
Nor-acetylcholine is a close analog of the endogenous substrate of AChE. Hydrolysis of nor-acetylcholine and of acetylcholine is chemically identical: after the binary enzyme–substrate complex formation, successive acylation and deacylation steps follow as illustrated in Scheme 8. The advantage of caged nor-acetylcholine (28 and 29, Scheme 7) [56] is that it offers the possibility of studying the mechanism of the entire catalytic process shown in Scheme 8. Following the same strategy, caged nor-butyrylcholine derivatives (30 and 31, Scheme 7) [57] were designed for the
15
16 Photochemical Enzyme Regulation
Fig. 1 Regeneration of enzymic activity after laser flash photolysis (351 nm) of (a) AChE in the presence of 25 (5×10−4 M in phosphate buffer, pH 6.5) and (b) BuChE in the presence of 25 (5.4×10−5 M in phosphate buffer, pH 7.2). (Reproduced with the permission of the American Chemical Society)
Scheme 8 Mode of action of different caged cholinesterase modulators: (a) caged nor-acetylcholine and (b) caged carbamylcholine.
7 Miscellaneous Examples of Caged Enzyme Modulators
studies on BuChE. Unfortunately, both 28/29 and 30/31 are slowly hydrolysed by AChE and BuChE, respectively, impeding a further utilization of these probes. Carbamylcholine serves as a slow substrate for AChE: rapid carbamylation with release of choline is followed by much slower decarbamylation which can be accelerated by dilution (Scheme 8). Thus, “caged” carbamylcholine and arsenocarbamylcholine, 32 [58] and 33 [52] (Scheme 7), can offer the possibility to study the mechanism of carbamylation of AChE as well as to investigate the enzymatic release of choline (Scheme 8). The photoregulation of the AChE activity could be demonstrated using 32, by inactivating the AChE activity through photoinduced carbamylation after photolysis of the AChE/32 complex [53]. In this experiment, even a single laser pulse caused time-dependent loss of enzymic activity to 60% of the control value, while 20 pulses decreased the enzyme activity by a further 15% (Fig. 2a). That this observed inactivation was due to photoinduced carbamylation was demonstrated by the fact that >90% of control activity could be regenerated in a progressive fashion upon subsequent dilution of carbamylated enzyme (Fig. 2b), similar to what was observed for AChE carbamylated directly by carbamylcholine. Such photoregulation reactions using caged carbamylcholine derivatives could not be applied to BuChE, since carbamylcholine has a very low affinity for BuChE and carbamylation of BuChE is a very slow process that requires in addition very high concentrations of carbamylcholine. The obtained results show that 25 and 32, or their quaternary arsonium analogs, are suitable for a photoregulation of the ChEs activities. They each release photochemically a different cholinergic modulator controlling the catalytic reactions at different steps. 25 photoreleases choline (the enzymatic product), while 32 photogenerates carbamylcholine (an AChE substrate), allowing the carbamylation of the enzyme with concomitant release of choline. Since these two compounds generate choline in two different ways, either by a direct photocleavage reaction (with 25) or by enzymatic hydrolysis of a substrate generated by photocleavage (with 32), they constitute complementary tools for time-resolved crystallographic studies envisaged [59]. To overcome the problem raised by the slow photolytic release of choline from the NPE-caged choline derivative 25, a recently described alternative method combined cryophotolysis to kinetic crystallography [60].
7 Miscellaneous Examples of Caged Enzyme Modulators 7.1 Caged Nitric Oxide Synthase (NOS) Inhibitor
The photo-control of the activity of NOS using a caged NOS inhibitor, Bhc-1400W (34, Scheme 9), has recently been established [61]. Incorporation of the Bhc-caging group induced about a 16-fold loss of affinity for NOS (IC50 ∼1100 versus 70 nM for the uncaged inhibitor). Photochemical uncaging was possible either by UV
17
18 Photochemical Enzyme Regulation
Fig. 2 (a) Time-dependent inactivation of AChE after laser flash photolysis (351 nm) of AChE in the presence of 32 (4.2×10−3 M in phosphate buffer, pH 6.5): (circles): 1 pulse; (triangles): 20 pulses. (b) Regeneration of AChE activity after subsequent dilution. (Reproduced with the permission of the American Chemical Society)
irradiation or multiphoton excitation, although with a moderate first-order time constant (0.007 min−1 for multiphoton uncaging). This constitutes a first reported conjugation of a caging group with a large two-photon uncaging cross-section to a therapeutic agent.
7 Miscellaneous Examples of Caged Enzyme Modulators
Scheme 9 Miscellaneous examples of caged enzyme modulators: caged nitric oxide synthase inhibitor (34); caged cytochrome bc1 substrate (35); caged urease substrate (36–39) caged ribonuclease reductase substrate (40).
7.2 Caged Substrate of Quinol-oxidizing Enzymes
Quinol oxidation is a common feature of respiration in mitochondria and aerobic bacteria. The quinol-oxidation enzymes utilize the energy released in electron transfer from quinols to electron acceptor-substrates to promote the translocation of protons across membranes. To investigate in detail the catalytic mechanisms of these enzymes (cytochrome bo3 and cytochrome bc1 ), a caged substrate (35, Scheme 9) was developed to overcome the use of stopped-flow mixing techniques [62]. A BCMB-caged decylubiquinol derivative, BCMB-DQ (35), was synthesized and tested for potential photoregulation of the quinol-oxidation enzymes. The electron input kinetics of cytochrome bc1 could be investigated at a millisecond timescale corresponding to the rate-limiting decarboxylation of the photolytic DQ-carboxylate. 7.3 Caged Urease Substrate
A series of o-nitrobenzyl urea derivatives has been synthesized and tested for their photolytic properties (36–39, Scheme 9) [63]. Rates of photolysis were in the 104 −105 s−1 range together with high quantum yields (up to 0.81). Irradiation of N-CNB-caged derivative (R=CO2 H) triggered the Klebsiella aerogenes urease activity, although only to a very moderate extent.
19
20 Photochemical Enzyme Regulation
7.4 Caged Ribonuclease Reductase Substrate
A coumarincaged cytidine 5 -diphosphate derivative was synthesized and tested for its photolytic properties (40, Scheme 9) [64]. The release of CDP from the caged derivative was ultra-fast (2 × 108 s−1 ) while the moderate quantum yields (∼0.03) are compensated for by the high absorptivity of the coumarin chromophore above 300 nm. The potential use of this derivative for photochemical triggering of Ribonuclease reductase (RNR) activity using this new caged substrate has not been reported yet.
8 Conclusions
As largely exemplified in this chapter, caged enzyme modulators are indeed useful tools to trigger enzymatic reactions or to control their inhibition. Such photoregulation represent a complementary method to the light-activated enzymes involving caging of protein residues. Photochemical enzyme regulation using caged modulators is of special interest, allowing a spatiotemporal control of the catalytic reaction. Combined with time-resolved studies such as FTIR and X-ray crystallography, it is possible to study, at the atomic level, the enzyme functions and mechanisms such as binding or breakdown of a substrate, with or without the help of a coenzyme, formation of product and release of product. Future endeavors will be directed towards the design of new caged enzyme modulators with caging groups displaying improved chemical and photochemical properties together with the development of improved time-resolved techniques with higher resolution for structure elucidation and functional study of enzymes.
References 1 J. H. KAPLAN, B. FORBUSH III, J.F. HOFFMAN, Biochemistry 1978, 17, 1929–1935. 2 (a) V. N. R. PILLAI, Synthesis 1980, 1–26; (b) T. W. GREENE, P. G. M. WUTS, Protective Groups in Organic Synthesis, 3rd edn, Wiley, New York, 1999. 3 J. W. WALKER, G. P. REID, J. A. MCCRAY, D. R. TRENTHAM, J. Am. Chem. Soc. 1988, 110, 7170–7177. 4 K. FENDLER, K. HARTUNG, G. NAGEL, E. BAMBERG, Methods Enzymol. 1998, 291, 289–306. 5 K. FENDLER, S. DRO¨ SE, W. EPSTEIN, E. BAMBERG, K. ALTENDORF, Biochemistry 1999, 38, 1850–1856.
6 J. A. DANTZIG, H. HIGUCHI, Y. E. GOLDMAN, Methods Enzymol. 1998, 291, 307–348. 7 H. HIGUCHI, E. MUTO, Y. INOUE, T. YANAGIDA, Proc. Natl Acad. Sci. USA 1997, 94, 4395–4400. 8 C. TOYOSHIMA, H. NOMURA, Y. SUGITA, FEBS Lett. 2003, 555, 106– 110. 9 A. BARTH, W. KREUZ, W. M¨ANTELE, FEBS Lett. 1990, 277, 147–150. 10 S. LEWIS, D. D. THOMAS, Biochemistry 1991, 30, 8331–8339. 11 J. E. MAHANEY, J. P. FROEHLICH, D. D. THOMAS, Biochemistry 1995, 34, 4864–4879.
References 12 F. M. ROSSI, J. P. Y. KAO, J. Biol. Chem. 1997, 272, 3266–3271. 13 A. TROULLIER, K. GERWERT, Y. DUPONT, Biophys. J. 1996, 71, 2970–2983. 14 J. H. KAPLAN, G. C. R. ELLIS-DAVIS, Proc. Natl Acad. Sci. USA 1988, 85, 6571–6575. 15 K. KINBARA, L. E. GOLDFINGER, M. HANSEN, F. L. CHOU, M. H. GINSBERG, Nat. Rev. Mol. Cell Biol. 2003, 4, 767–776. 16 I. SCHLICHTING, G. RAPP, J. JOHN, A. WITTINGHOFER, E. F. PAI, R. S. GOODY, Proc. Natl Acad. Sci. USA 1989, 86, 7687–7690. 17 I. SCHLICHTING, S. C. ALMO, G. RAPP, K. WILSON, K. PETRATOS, A. LENTFER, A. WITTINGHOFER, W. KABSCH, E. F. PAI, G. A. PETSKO, R. S. GOODY, Nature 1990, 345, 309–315. 18 A. J. SCHEIDIG, S. M. FRANKEN, J. E. T. CORRIE, G. P. REID, A. WITTINGHOFER, E. F. PAI, R. S. GOODY, J. Mol. Biol. 1995, 253, 132–150. 19 A. J. SCHEIDIG, C. BURMESTER, R. S. GOODY, Structure 1999, 7, 1311–1324. 20 K. GERWERT, Biol. Chem. 1999, 380, 931–935. 21 V. CEPUS, A. J. SCHEIDIG, R. S. GOODY, K. GERWERT, Biochemistry 1998, 37, 10263–10271. 22 X. DU, H. FREI, S.-H. KIM, J. Biol. Chem. 2000, 275, 8492–8500. 23 C. ALLIN, K. GERWERT, Biochemistry 2001, 40, 3037–3046. 24 C. ALLIN, M. R. AHMADIAN, A. WITTINGHOFER, K. GERWERT, Proc. Natl Acad. Sci. USA 2001, 98, 7754–7759. 25 J. ENGELS, E.-J. SCHLAEGER, J. Med. Chem. 1977, 20, 907–911. 26 K. CURLEY, D. S. LAWRENCE, Pharmacol. Ther. 1999, 82, 347–354. 27 R. POPP, R. P. BRANDES, G. OTT, R. BUSSE, I. FLEMING, Circ. Res. 2002, 90, 800–806. 28 S. TERTYSHNIKOVA, A. FEIN, Proc. Natl Acad. Sci. USA 1998, 95, 1613–1617. 29 J. S. WOOD, M. KOSZELAC, J. LIU, D. S. LAWRENCE, J. Am. Chem. Soc. 1998, 120, 7145–7146. 30 R. SREEKUMAR, Y. Q. PI, X. P. HUANG, J. W. WALKER, Bioorg. Med. Chem. Lett. 1997, 7, 341–346. 31 J. XIA, X. HUANG, R. SREEKUMAR, J. W. WALKER, Bioorg. Med. Chem. Lett. 1997, 7, 1243–1248.
32 W. F. VELDHUYZEN, Q. NGUYEN, G. MCMASTER, D. S. LAWRENCE, J. Am. Chem. Soc. 2003, 125, 13358–13359. 33 R.-H. YEH, X. YAN, M. CAMMER, A. R. BRESNICK, D. S. LAWRENCE, J. Biol. Chem. 2002, 277, 11527–11532. 34 J. W. WALKER, S. H. GILBERT, R. M. DRUMMOND, M. YAMADA, R. SREEKUMAR, R. E. CARRAWAY, M. IKEBE, F. S. FAY, Proc. Natl Acad. Sci. USA 1998, 95, 1568–1573. 35 J. LISMAN, R. C. MALENKA, R. A. NICOLL, R. MALINOW, Science 1997, 276, 2001–2002. 36 A. ISHIDA, I. KAMESHITA, S. OKUNO, T. KITANI, H. FUJISAWA, Biochem. Biophys. Res. Commun. 1995, 212, 806–812. 37 A. ISHIDA, Y. SHIGERI, Y. TATSU, K. UEGAKI, I. KAMESHITA, S. OKUNO, T. KITANI, N. YUMOTO, H. FUJISAWA, FEBS Lett. 1998, 427, 115–118. 38 Y. TATSU, Y. SHIGERI, A. ISHIDA, I. KAMESHITA, H. FUJISAWA, N. YUMOTO, Bioorg. Med. Chem. Lett. 1999, 9, 1093–1096. 39 B. E. COHEN, B. L. STODDART, D. E. KOSHLAND JR, Biochemistry 1997, 36, 9035–9044. 40 M. J. BRUBAKER, D. H. DYER, B. L. STODDARD, D. E. KOSHLAND Jr, Biochemistry 1996, 35, 2854–2864. 41 B. L. STODDARD, B. E. COHEN, M. BRUBAKER, A. D. MESECAR, D. E. KOSHLAND JR, Nat. Struct. Biol. 1998, 5, 891–897. 42 R. R. SWEZEY, D. EPEL, Dev. Biol. 1995, 169, 733–744. 43 C. P. SALERNO, M. RESAT, D. MAGDE, J. KRAUT, J. Am. Chem. Soc. 1997, 119, 3403–3404. 44 Y. HIRAYAMA, M. IWAMURA, T. FURUTA, Bioorg. Med. Chem. Lett. 2003, 13, 905–908. 45 S.-C. TSAI, J. P. KLINMAN, Bioorg. Chem. 2003, 31, 172–190. 46 J. MASSOULIE´ , L. PEZZEMENTI, S. BON, E. KREJCI, F. M. VALETTE, Prog. Neurobiol. 1993, 41, 31–91. 47 O. LOCKRIDGE, Pharmacol. Ther. 1990, 47, 35–60. 48 J. L. SUSSMAN, M. HAREL, F. FROLOW, C. OEFNER, A. GOLDMAN, L. TOKER, I. SILMAN, Science 1991, 253, 872–879. 49 Y. NICOLET, O. LOCKRIDGE, P. MASSON, J. C. FONTECILLA-CAMPS, F. NACHON, J. Biol. Chem. 2003, 42, 41141–41147.
21
22 Photochemical Enzyme Regulation
50 L. PENG, C. COLAS, M. GOELDNER, Pure Appl. Chem. 1997, 69, 755–759. 51 L. PENG, M. GOELDNER, J. Org. Chem. 1996, 61, 185–191. 52 L. PENG, F. NACHON, J. WIRZ, M. GOELDNER, Angew. Chem. Int. Ed. 1998, 37, 2691–2693. 53 L. PENG, I. SILMAN, J. SUSSMAN, M. GOELDNER, Biochemistry 1996, 35, 10854–10861. 54 M. WEIK, Personal communication. 55 J. E. T. CORRIE, A. BARTH, V. R. N. MUNASINGHE, D. R. TRENTHAM, M. C. HUTTER, J. Am. Chem. Soc. 2003, 125, 8546–8554. 56 L. PENG, J. WIRZ, M. GOELDNER, Angew. Chem. Int. Ed. 1997, 36, 398– 400. 57 L. PENG, J. WIRZ, M. GOELDNER, Tetrahedron Lett. 1997, 38, 2961–2964.
58 J. W. WALKER, J. A. MCCRAY, G. P. HESS, Biochemistry 1986, 25, 1799. 59 L. PENG, M. GOELDNER, Methods Enzymol. 1998, 291, 265–278. 60 A. SPECHT, T. URSBY, M. WEIK, L. PENG, J. KROON, D. BOURGEOIS, M. GOELDNER, Chem Bio Chem 2001, 2, 845–848. 61 H. J. MONTGOMERY, B. PERDICAKIS, D. FISHLOCK, G. A. LAJOIE, E. JERVIS, J. G. GUILLEMETTE, Bioorg. Med. Chem. 2002, 10, 1919–1927. 62 K. C. HANSEN, B. E. SCHULTZ, G. WANG, S. I. CHAN, Biochim. Biophys. Acta 2000, 1456, 121–137. 63 R. WIEBOLDT, D. RAMESH, E. JABRI, P. A. KARPLUS, B. K. CARPENTER, G. P. HESS, J. Org. Chem. 2002, 67, 8827–8831. 64 R. O. SCHO¨ NLEBER, J. BENDIG, V. HAGEN, B. GIESE, Bioorg. Med. Chem. 2002, 10, 97–101.
1
Kinetic Protein Crystallography using Caged Compounds Dominique Bourgeois Institut de Biologie Structurale LCCP, Grenoble, France
Martin Weik Institut de Biologie Structurale LBM, Grenoble, France
Originally published in: Dynamic Studies in Biology. Edited by Maurice Goeldner and Richard S. Givens. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30783-8
1 Introduction 1.1 Protein Activity and Conformational Changes in Crystals
It is quite astonishing that such complex and highly flexible objects as biological macromolecules can arrange themselves in a perfectly ordered fashion to form crystals. Nevertheless, more than 18 000 protein structures have now been determined to near-atomic resolution by X-ray crystallography and this number is just a beginning: with the advent of high-throughput structural proteomics, a world-wide initiative aimed at systematic structural investigations of proteins expressed by given genomes, an explosion of new structures will undoubtedly appear in the near future. However, despite the considerable efforts behind this evolution, our knowledge of static X-ray structures remains insufficient to understand how proteins function. What characterizes life is motion, and biological molecules constantly breathe around their average shape, wandering in a conformational landscape at the basis of activity. Fortunately, watching crystalline proteins in action is not an impossible task. Indeed, macromolecular crystals are fragile samples maintained by weak contact forces, of the level of a single hydrogen bond (∼10 kcal mol−1 ). Therefore, individual molecules in the crystal remain flexible, with only few surface residues interacting with neighboring molecules and most of the others being bathed in solvent (Fig. 1). Indeed, the presence of solvent channels, sometimes making up to 90% of the crystal volume content, is a characteristic feature of protein crystals. Under these conditions, the observation that many enzymes are fully Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Kinetic Protein Crystallography using Caged Compounds
Fig. 1. Representation of the solvent channels within a trigonal crystal of acetyl-cholinesterase, viewed down the c-axis. The protein molecules are bathed within solvent and small molecules such as substrate or cofactors (an NPE-Ch molecule is shown encircled) may diffuse quite freely through the channels whose diameter attains 65 Å in this case. Structural data are from PDB entry 1EA5.
active in the crystalline state is not so surprising. Following conformational changes by X-ray crystallography is nevertheless a considerable challenge. In some cases, the catalysis driving force may lead to crystal disorder or even rupture, ruining the diffraction power of the sample. In other cases, access to the protein active site by a substrate may be restricted by unfortunate crystal contacts or by the competitive presence of some constituent indispensable to crystallization and present in large concentration in the medium. Additionally, the pH, dielectric constant and solvent viscosity generally deviate from their optimal values in physiological conditions, which often slows down catalysis and may alter its mechanism or even abolish it. Last, but not least, crystallography yields an average structure from an ensemble of typically 1013 –1015 individual molecules and a significant portion, if not all, of these must “walk in pace” for a coherent structural picture of catalysis to be delivered. The emerging field of “kinetic crystallography” [1–3], aimed at solving the structures of transient species along the reaction pathway, is therefore paved with a number of exciting challenges. One of the most critical issues concerns the question of homogeneous and synchronous triggering of a reaction in a protein crystal [4]. Caged compounds – photolabile precursors of substrates, cofactors, amino acids or even protons – represent highly effective tools to address this question and their use in the context of kinetic crystallography has already produced results of outstanding biological significance [5–10]. Properly handling them for such experiments, however, is delicate, and it is the purpose of this chapter to highlight both their potential and limitations in this field of research.
1.2 Timescales and Activation States
Enzymes catalyze biological reactions over a broad range of timescales, extending from nanoseconds to seconds or more. In general, rapid reactions involve subtle, small conformational changes, e.g. side-chain motions, associated with small activation barriers, whereas slower reactions involve larger changes, e.g. helix or loop motions, associated with high barriers. Larger motions are not necessarily easier to observe crystallographically because the reduced requirement for high-resolution
1 Introduction 3
data may be compromised by the generation of stronger disorder in the crystal during catalysis. When designing a reaction initiation strategy in a kinetic crystallography experiment, the sole knowledge of the overall reaction rate as measured in solution is largely insufficient. First, rates must be considered in the crystalline state, which in general slows down turnover [1], but in certain circumstances may accelerate it (e.g. due to a short-cut pathway imposed by lattice forces [11]). Second, what counts is the lifetime of the particular intermediate targeted in the study, which can be considerably shorter than the overall reaction cycle. As an example, the lifetime of the K intermediate in the reaction cycle of bacteriorhodopsin, a light-driven proton pump, is a few microseconds [12], whereas the overall cycle takes place in milliseconds. Third, for a significant population of an intermediate to build-up, the rate of formation of the latter must be greater than its rate of decay. These rates (taking the form k=v exp–E a /kT) depend on activation energy barriers E a , as well as on frequency factors v, which can be understood as how often per second the molecule attempts to jump over the activation barrier. As a consequence, extensive kinetic studies are generally required, both in the crystal and in solution, before embarking on non-interpretable diffraction experiments. Once timescales are approximately known, one must decide on a suitable reaction triggering method, activation profile and data collection strategy. It is useful to distinguish between the triggering method, which brings together the ingredients (protein, ligand, substrate and/or cofactor) necessary to start a chemical reaction, and the activation profile, which controls how fast or how far the reaction may proceed by acting on physicochemical parameters such as temperature, pressure, viscosity or pH. Accumulation of a transient species in the crystal will, in principle, be possible whenever reaction triggering is rapid enough to ensure synchrony amongst the molecules of the sample in the chosen activation state. Suppose, for example, that the activation state is first lowered by reducing the temperature to the point where no activity may occur. Under such conditions, reaction triggering, e.g. delivering the substrate to an enzyme, may be as slow as practical considerations permit and accumulation of the intermediate will still be possible, depending on subsequent, adequate temperature elevation. On the contrary, if the activation state of the sample is kept high, optimal turnover occurs immediately upon triggering, which therefore must be rapid enough to keep synchrony. The same considerations hold true for diffraction data collection. If the biological system is brought to a low activation state (e.g. by cryo-cooling), data collection may be slow and single-wavelength X-rays may be used with standard protocols. In a high activation state, transient species must be caught “on the fly” and rapid data collection must be used, such as the Laue technique making use of polychromatic X-rays. Although most kinetic protein crystallographic experiments to date are based on single-turnover strategies, the possibility to rely on a steady-state build-up of an intermediate, where substrates and products are continuously delivered to and withdrawn from the sample with a flow-cell apparatus, has been used successfully [13, 14].
4 Kinetic Protein Crystallography using Caged Compounds
The recent evolution of protein crystallography shows a global standardization, and even automation, towards data collection at cryo-temperatures (typically 100 K or less, i.e. in a low activation state) using samples mounted in nylon cryo-loops [15, 16]. This is mostly due to the extreme brilliance of recent synchrotron sources, resulting in rapid radiation damage, which is minimized [17], but not abolished, at 100 K [17–20]. Monochromatic data collection at close to ambient temperature from crystals mounted in capillaries remains generally only possible with laboratory Xray sources and provides high-resolution density maps with only a small number of friendly crystalline proteins. In parallel, the availability of Laue beamlines is scarce, because of the high level of instrumental sophistication required combined with the low number of favorable samples. As a consequence, kinetic crystallography nowadays receives practical constraints from the structural proteomics world. The dedicated methodology must adapt and we have to consider the use of caged compounds in this new context. 1.3 Accumulating Unstable Species in the Crystal
In principle, the easiest way of triggering catalysis in a crystalline enzyme consists of letting the substrate or a cofactor rapidly diffuse into the crystal by soaking the latter into a proper solution. However, diffusion throughout the narrow crystal channels is often exceedingly slow and may require up to hours or even days, becoming rate limiting [21–23] and preventing the synchronous build-up of intermediate states if the enzyme is in a high activation state. In extreme cases, a reactant may not diffuse at all, being turned over into product by the first layers of macromolecules crossed in the crystal. Despite these limitations, a number of intermediates could be captured using this simple methodology [24–26]. Slow diffusion could be tolerated if the protein could be kept in a low activation state in a way that does not interfere with the diffusion process. For example, lowering the temperature below solvent crystallization or embedding the protein in a glassy medium would arrest diffusion and cannot be envisaged. However, a combination with a pH jump is possible [27, 28] or with engineered proteins that are mutated to block the reaction at a certain step [13, 29]. The other triggering protocol most largely used in protein crystallography relies on direct light activation of built-in photosensitive moieties. These can be natural chromophores such as the retinal of bacteriorhodopsin [30] or the 4-hydroxy cinamoic acid from photoactive yellow protein [31], or photosensitive coordination bonds such as the iron-carbonmonoxy bonds in heme containing proteins [32]. The advantages of this strategy are immense: (1) proper choice of the excitation wavelength and light intensity often allows homogeneous activation throughout the sample, (2) photoactivation is extremely fast, taking place on the timescale of Franck-Condon absorption (∼10−18 s), allowing synchronization in real-time-resolved experiments at the level of the pulse duration of the used light source, and (3) photoactivation occurs at cryo-temperatures, while protein activity is partially or completely stopped. Using this latter strategy, the early steps of extremely fast reactions such as the photocycle of bacteriorhodopsin [12, 33] or PYP [34–36]
2 Real-time Crystallography and the Laue Technique
and the relaxation of myoglobin [37–40] or hemoglobin [41] after CO photolysis could be studied at high resolution using gentle light sources combined with cryocrystallography. Unfortunately, natural photosensitivity is scarce in nature and other widely applicable triggering strategies are necessary. Recently, the observation of radiation damage caused by X-ray-induced photoelectrons has been turned into an elegant strategy to trigger photoreduction in redox crystalline enzymes at low temperature [42, 43]. For the huge number of non-redox proteins that do not contain a naturally photoactivable group, caged compounds represent an elegant way to trigger catalysis by UV-vis light.
2 Real-time Crystallography and the Laue Technique
The main reason for using caged compounds in biological sciences is the rapidity with which they can be photolysed (typically in the millisecond timescale). This is because in practically all fields of research using these compounds [44], this property is a major issue, the biological system being kept in its physiological, high activation state and fast synchronization being essential. In this context, caged compounds were first envisaged in crystallography for their use in real-time-resolved experiments, where intermediate states are caught “on the fly” as they transiently build-up in the crystal. This was at a time where these types of experiments were largely put forward as one of the most elegant applications of high-brilliance synchrotron machines in protein science. Indeed, synchrotrons deliver intense X-rays over a broad bandpass of wavelengths, allowing us to reduce the exposure time necessary to record a diffraction pattern down to well below typical durations of single-turnover biological reactions. This is achieved by employing the Laue technique, where the white beam is allowed to diffract at once on a non-rotating, static crystal. With the advent of third-generation facilities such as the ESRF, Spring-8 or APS, exposure times can be as short as 100 ps [45] – a time much shorter than the photofragmentation time of most known caged compounds. It was therefore anticipated that real movies of catalysis could be obtained where the time resolution would be as high as permitted by the photoactivation process (down to a few tens of microseconds). However, it is striking to observe that amongst all publications that have used Laue diffraction in combination with photolysis of caged compounds, only one of them achieved a time resolution down to the millisecond [8]. For all the others [5–7], the time resolution was rather in the second or even minute time-scale. Two reasons explain this fact. The first reason is that the collection of a single Laue diffraction image, in the vast majority of cases [46], is largely insufficient to completely sample the crystal reciprocal space. Therefore, several images are needed to reconstruct interpretable electron density maps. Since photolysis of caged compounds is a non-repeatable process, the time resolution of the experiment is given by the total experimental time necessary to collect a sufficient number of images if a single crystal is employed. The latter comprises delays to rotate the sample and readout the detector between X-ray exposures. Even with the fastest CCD detectors
5
6 Kinetic Protein Crystallography using Caged Compounds
currently available, the readout time cannot be reduced to below ∼ 100 ms and the total experimental time in practice reaches the order of several seconds. This limitation can, in principle, be alleviated if a series of crystals are employed, each of them being exposed once at complementary orientations. Such a strategy, however, is experimentally very demanding, requiring the preparation of many high-quality, pre-oriented samples. Furthermore, it is based on proper scaling between the different frames – an extremely difficult task due to crystal-to-crystal variations in photolysing efficiency and diffraction quality. 2.1 A Favorable Case: Isocitrate Dehydrogenase
The problem addressed above was overcome successfully by Stoddard et al. [8] in studies of isocitrate dehydrogenase. This enzyme uses NAD(P) to convert isocitrate into α-ketoglutarate and carbon dioxide. The rate-limiting product complex, with a life-time of 40–50 ms under the experimental conditions, could be accumulated in the crystal by using the following strategy several crystals were soaked into free isocitrate and caged (DMNPE)-NADP (P2 -[1-(4,5-dimethoxy-2-nitrophenyl)-ethyl] NADP) (Fig. 2). In the dark, isocitrate bound to the enzyme active site, while caged NADP remained in the solvent channels because the cage lowered dramatically the affinity of the cofactor for the enzyme. A single 0.5-ms xenon light pulse was then sufficient to release a sufficient number of NADP molecules to completely saturate the entire population of enzyme molecules in each crystal despite crystalto-crystal variations in photolysis yield. This was possible because the concentration of enzyme molecules in the crystals was rather low (5 mM) and because NADP has an affinity for the enzyme in the micromolar range, so that a 10-fold excess of caged
Fig. 2 Representation of caged NADP. The DMNPE blocking group is outlined by a dashed border. (Reproduced with permission from [8])
2 Real-time Crystallography and the Laue Technique
cofactor could be used (50 mM) without compromising the possibility of a uniform photolysis due to high optical density. In such conditions, several crystals could be used, each of them providing partial data sets with millisecond time resolution that could be scaled together successfully because even partial photolysis (20–50%) led to complete saturation of the active sites in each sample. The studies of Stoddard et al. [8] could possibly be tackled by two other methods allowing the use of a unique crystal. The first would consist in using a flow cell to regenerate the crystal content in between two X-ray exposures by flowing in fresh isocitrate and caged NADP, while flushing out the products liberated by the last round of reaction triggering. The second, still hypothetical as of today, would rely on the ongoing development of so-called pixel detectors, which are able to continuously readout pixels as they are hit by X-ray photons at a very high rate, thus considerably decreasing the overall data collection time of a Laue experiment provided the crystal can be rotated sufficiently quickly [47]. The advantage given here by the so-called “affinity caging” might in general be offset by the concomitant decrease in time resolution, which is limited by the time it takes for the released co-factor to diffuse and bind into the active site. The diffusion time is small because the distance to travel is tiny, of the order of the unit-cell length. With a conservative diffusion coefficient of D=10−9 cm2 s−1 (representing a diffusion time of a day through a 0.2-mm thick crystal [48]), the mean distance √ traveled within a millisecond is given by d= (2Dt) = 140 Å, that is about the size of a unit cell. The binding time also depends on the bimolecular association (and dissociation) rate constant, and reactant concentrations. In the case of isocitrate dehydrogenase, this penalty had no consequence, considering the high association rate constant (105 –106 M−1 s−1 ) and high reactant concentrations (5 and 10 mM). The experiment by Stoddard et al. [8] was also successful for another reason, i.e. the radiation hardness of the biological system with respect to intense and short UV-vis radiation, of the order of 5 mJ (in 0.5 ms) delivered onto the sample in the case of isocitrate dehydrogenase. This value can be compared to the theoretical energy needed to photolyse a caged compound in a crystal, expressed by: E delivered [mJ]=12×105 ×d[mm]×c[mol l−1 ]×S[mm2 ]/{1−10−OD )×λ[nm]×} where d is the crystal thickness, c is the concentration of caged compound that needs to be photolysed in the crystal, S is the photolysing beam section, OD is the optical density at the photolysing wavelength λ and is the photolysis yield. For crystals of 0.1mm thickness, a concentration of caged compound of 10 mM, an optical density of 0.1, a mean wavelength of 450 nm, a yield of 0.2 and a beam cross-section of 2 mm2 , we find E delivered =13 mJ. This energy is commonly achievable by pulsed flash lamps. Provided good care is taken in alignment and focusing procedures, the energy actually deposited within the crystal is expressed by: E deposited [mJ]=12×105 ×V [mm3 ]×c[moll−1 ]/(λ[nm]×)
7
8 Kinetic Protein Crystallography using Caged Compounds
with V the crystal volume. In the case above, the energy deposited into a 1×1×0.1mm3 crystal amounts to 1.3 mJ. The adiabatic temperature rise resulting from such energy deposition can be calculated as: T [K]=E deposited [J]/(m[mol]×C[J K−1 mol−1]) with m the number of moles of matter (taken as water) in the crystal and C the molar heat capacity of the crystal (taken as water, i.e. 75.3 J K−1 mol−1 ). The temperature rise amounts to about 30 K, a very substantial increase. Unfortunately, in addition to this isotropic temperature elevation, thermal gradients will unavoidably set in throughout the crystal, which may often result in strong crystal disorder and significant loss of X-ray diffraction. In the case of isocitrate dehydrogenase, the employed caged compound could be efficiently photolysed in a single pulse at around 450 nm. At this wavelength, no competitive absorption by other species in the sample occurred and, again, the situation was favorable. In general, the absorption patterns of common caged compounds like NPE compounds tend to overlap significantly with the ones of aromatic amino acids like tryptophans or possibly with the ones of other chromophores present in the sample (e.g. NADH). As a consequence, the choice of a proper excitation wavelength is difficult. If a wavelength close to the absorption maximum of the caged compound is chosen, the overall optical density of the sample will be high (at some point, light will only penetrate a fraction of the crystal volume), resulting in low overall photolysis and in a strong thermal gradient within the sample inducing transient or irreversible damage. On the other hand, if a wavelength remote from the absorption maximum is chosen, the photolysis yield will be lower than optimal and a very intense light source (that may not easily be available) will be required together with a sacrifice in time resolution. The situation may even be more complicated if the build-up of photolytic products results in further competitive absorption and acts as a shield against the photolysing beam. For these reasons, a sufficient level of photolysis generally requires two or more gentle flashes, with the crystal being ideally rotated in between the flashes for a most uniform photoactivation. Otherwise, the photolysis level is too low or crystal damage is too high. In practice, therefore, the achievable time resolution is also limited by the total photolysing time, which can be as long as several seconds at room temperature when employing a flash lamp, possibly being reduced to the 100-ms timescale if using a continuous light source or laser connected to a fast shutter. Finally, isocitrate dehydrogenase crystals suffered only moderately from a further complication linked to the Laue technique. The geometry of Laue diffraction renders this technique extremely sensitive to crystalline disorder. Therefore, crystals amenable to Laue diffraction must not only grow with low mosaicity (0.5), but also must not become mosaic as a result of either X-ray radiation damage, photolysis damage or transient constraints on the lattice resulting from the genuine conformational changes that are under study. In addition, the possibility of subtle structural modifications being induced by the very intense X-ray beam on the crys-
2 Real-time Crystallography and the Laue Technique
tal kept at close to room temperature must be carefully considered. The likelihood of such artifacts is indeed quite high, since at room temperature, primary radiation damage (inelastic interaction of a X-ray photon with atoms in the crystal) is accompanied by extensive secondary damage (caused by diffusing secondary radicals that are created as a consequence of primary events). Overall, the number of macromolecular crystals fulfilling the above-mentioned requirements is unfortunately low, explaining for a large part the small number of successful studies that have employed the Laue technique. Only a very small number of favorable biological systems like isocitrate dehydrogenase are amenable to Laue diffraction and to single-flash photolysis, which are both required to reach a time resolution of a few milliseconds. In the majority of cases, successful photolysis takes of the order of seconds or reproducible kinetics may not be achieved and only the study of intermediates of long lifetimes may be envisaged. Such lifetimes are greater than the time it takes to flash-cool a protein crystal (<1 s) [49]. Although the technique of flash-cooling introduces its own difficulties, we will see in the next section that trapping intermediates in this way and collecting the data with monochromatic radiation rather than with Laue diffraction may be the best compromise. As a consequence, white beam stations at third-generation synchrotron sources nowadays favor a few number of studies on naturally photosensitive biological systems, requiring extremely high time-resolution (nanoseconds or even picoseconds [39, 40]) and receive very few requests for caged-compound-based Laue studies. 2.2 The Case of Ha-ras p21: From Laue to Monochromatic Diffraction and Further Difficulties with Caged Compounds in Kinetic Crystallography
A representative case reflecting the evolution of the use of caged compounds in kinetic crystallography concerns the study of the enzyme p21ras , a small guanine binding protein involved in a signal transduction pathway similar to those involving G-proteins. The protein switches from a GTP-bound, active state to a GDP-bound, inactive state. The switch in signaling state involves conformational changes associated to the hydrolysis of GTP to GDP, that occur on a timescale of several minutes. Photolysis of NPE-GTP (Fig. 3) in crystals of p21ras was first used in combination with Laue diffraction to observe the relatively short-lived GTP complex [6]. Ten flashes with a xenon lamp were delivered in about 1 min to activate crystals and Laue data collection took 5 min. However, it was realized over the years that a number of pitfalls compromised the data quality. Considerable improvement resulted from several modifications in the experimental strategy. Firstly, the use of gentle photolysis using monochromatic light at a wavelength (313 ± 2nm) remote from absorption bands of aromatic amino acids and at moderately low temperature (2 ◦ C) enabled the preservation of the diffraction quality of the crystals, and sometimes even improved it slightly. Complete photolysis could be achieved in this way within 2–3 min. During exposure, continuous rotation of crystals chosen for their relatively small size (about 150 µm in the largest dimension) allowed uniform
9
10 Kinetic Protein Crystallography using Caged Compounds
Fig. 3 Reaction scheme for time-resolved and kinetic crystallography on Ras p21. Ras p21 in complex with GTP is unstable and cannot be crystallized. Caged GTP is a biochemical inert precursor of GTP and the 1:1 complex with Ras p21 is stable and can be crystallized. Illumination with light cleaves off the protection group and converts the crystal of Ras p21-caged GTP into a crys-
tal of Ras p21-GTP. In time-resolved crystallography the data collection is at room temperature and has to be finished within 2 min after complete photolysis of the crystal (the Laue X-ray diffraction method is used). In kinetic crystallography the GTP state is stabilized by freeze-trapping and data collection is performed at cryogenic temperatures. Courtesy of Axel Scheidig.
photolysis throughout the crystal volume, while eliminating disorder that could otherwise result from thermal gradients. Second, the crystal quality was improved by using a pure diastereoisomeric form of caged GTP instead of a racemic mixture of this chiral molecule as employed in the first studies [50, 51]. The use of the pure R-diastereoisomer yielded clear electron density maps consistent with the conformation of the nucleotide precursor being similar to that of the non-hydrolysable form GppNHp, at variance with the early observation of an abnormal binding when using the R/S racemic mixture. In fact, cocrystallization with either form of the caged precursor gave crystals of different morphology and diffraction power, showing that interference may occur between the precursor binding mode and the lattice packing forces. Interestingly, non-chiral caged GTP molecules such as those based on 2-nitrobenzyl or bis(2-nitrophenyl)-methyl caging groups gave crystals of lower diffraction quality, emphasizing that subtle chemical modifications of the caged molecules may drastically influence crystalline order in a rather unpredictable way. In these studies, “catalytic caging” was used, i.e. the caged compound displayed high affinity for the protein and did bind to the active site prior to photolysis (as opposed to “affinity caging” where the caged precursor is prevented from binding, as in the case of isocitrate dehydrogenase). The results on p21ras showed that where “catalytic caging” takes place, the presence of the cage may modify the crystallization conditions, alter the packing constraints and modify the configuration of the active site due to interactions between the cage and various amino acids. Whereas this is acceptable for the unphotolysed structure, which is not expected to be of
3 Trapping of Intermediate States
Fig. 4 Comparison of the electron density distribution around the crucial catalytic loop (P-loop) of Ras p21, obtained with Laue (a) and monochromatic (b) diffraction. A clear improvement is obtained with monochromatic data collection. Courtesy of Axel Scheidig.
physiological relevance, this may lead to trouble if, upon photolysis, the released, highly reactive cage (orthonitroacetophenone in the case of NPE compounds) stays in the active site and continues to interact with it. To minimize this potential problem, it has been suggested to use reducing agents such as dithiothreitol (DTT) in high concentration in the crystallization medium so as to scavenge the released cage [44]. This concern seems to be of less importance in the case of affinity caging, when the cage has no affinity for the active site. However, DTT should always be used to avoid residual interactions with other parts of the protein. Third, instead of relying on Laue diffraction to catch the transient GTP-bound complex, this unstable species was trapped by rapid flash-cooling and structural data were obtained by the standard monochromatic data collection method [9]. Using this strategy, the overall time resolution of the experiment, which previously was limited by the Laue data collection time (several minutes), was improved, since flash-cooling a protein crystal can be achieved in <1 s. Also, the likelihood of residual radiolysis of the caged compound by the white X-ray beam, resulting in further degradation or “blurring” of the time resolution, was much reduced (see below). Furthermore, diffraction data of higher quality and higher completeness could be collected on smaller crystals. Overall, the gain in data quality jumped from 2.8 Å with 60% completeness to 1.6 Å with >90% completeness using gentle photolysis at 2 ◦ C, small crystals, pure diastereoisomer, and monochromatic data collection (Fig. 4). This drastic improvement allowed the authors to draw firm conclusions on the hydrolysis mechanism, particularly concerning the role of the catalytic loop, especially Gln61, and of water molecules positioned around the GTP γ -phosphate group (Fig. 5).
3 Trapping of Intermediate States
Progress made in the case of p21ras is in line with a trend showing that the structures of transients are often more easily visualized by crystallography using trapping methods rather than “real-time-resolved” methods based on fast data collection
11
12 Kinetic Protein Crystallography using Caged Compounds
Fig. 5 Freeze-trapped structure of Ras p21GTP. Electron density distribution around the nucleotide GTP and the catalytic loop of Ras p21. Displayed are the nucleotide, the magnesium(II) ion (together with the coordinating amino acids Ser17 and Thr35), the
residues Gly60 and Gln61, and four water molecules close to the (-phosphate. Note the quality of the electron density, which is high enough to clearly identify the binding mode of GTP. Courtesy of Axe Scheidig.
techniques such as the Laue technique. However, capturing intermediate states in rapid enzymes, with turnover 1 s−1 , still represents a considerable challenge. Indeed, the application of the experimental protocol applied to, for example, p21ras may not be envisaged. On the other hand, a number of trapping studies have been performed on extremely fast proteins displaying a built-in photosensitivity [12, 33, 35, 41, 52, 53]. These studies have allowed researchers to capture intermediates with lifetimes as short as nanoseconds by initiating the photoreaction at cryo-temperatures. At such temperatures, the reaction cycle may not proceed entirely and is stopped whenever an enthalpy barrier cannot be crossed due to the lack of thermal energy or when motions throughout the conformational landscape of the protein are arrested below critical temperatures. The low-temperature intermediates trapped in this way, although they may not thoroughly resemble their physiological counterparts, have nevertheless provided considerable insight into the structure-function relationships of the biological molecules studied in this way. A similar approach may in principle be used in the case of very fast crystalline enzymes, where the built-in photosensitivity is replaced by the exogenous photosensitivity of a caged compound introduced into the crystal [10]. The following strategy, based on a combination of cryo-photolysis of caged compounds and transient temperature increase, can be envisaged (Fig. 6). The caged compound is first soaked into the crystal at room temperature and either binds specifically to the active site (case of “catalytic caging”) or remains in the solvent part of the crystal (case of “affinity caging”). The crystal is then flash-cooled to 100 K and illuminated with UV light. At such a low temperature, neither the photolysis products nor the protein itself are anticipated to move significantly. Only upon transiently raising the temperature to above a critical value (T 1 in Fig. 6) would the protein flexibility and solvent viscosity be such that the un-caged substrate can be expected to reach the active site and/or to be turned over partially by the enzyme. As a consequence,
3 Trapping of Intermediate States
Fig. 6 Temperature profile proposed for a trapping experiment based on cryo-photolysis.
an intermediate state may accumulate above T 1 that can be trapped by cooling the crystal back to 100 K, so that its structure can be solved by conventional monochromatic X-ray crystallography. Two critical questions arise from such a strategy. First, is it possible to cryo-photolyse a caged compound? Second, which temperature profile should be chosen? These issues are addressed in the following sections. 3.1 Cryo-photolysis of NPE-caged Compounds
It is well known that photolysis of caged compounds results from complex photofragmentation reactions involving rate limiting dark reactions [44]. In contrast to the photoexcitation of built-in chromophores that immediately leads to a response of biological relevance, the decomposition of a caged compound into its biologically active form is comparatively much slower and occurs at a rate that decreases as the temperature is lowered. At cryo-temperatures around 100 K, it would be expected that the active compound may not be released at all. For example, the activation enthalpy of NPE-ATP has been measured to be ∼55 kJ mol−1 at pH 6.0–8.0, at ambient temperature [54]. If we (naively) extrapolate the Arrhenius behavior of the photolytic process as measured in the range 280–300 K down to cryo-temperatures, a rate constant of 10−16 s−1 at 100 K is predicted, meaning that photofragmentation may simply not occur at such a temperature. In practice, however, preliminary experiments by absorption microspectrophotometry [55] have suggested that the photolysis of several caged compounds was possible at temperatures as low as 100 K [56]. In these experiments, nano-volumes of solutions of NPE ether (NPE-choline) or phosphate ester (NPE-TMP, NPE-ATP) compounds are flash-cooled and subjected to photolysis at 100 K by a xenon flash lamp or a UV laser delivering light at 355 nm. Typical absorption bands at 290 and 320 nm generally assigned to the nitrosoketone photoproduct appear on the
13
14 Kinetic Protein Crystallography using Caged Compounds
3 Trapping of Intermediate States
minute timescale (Fig. ??a and b). In agreement with these spectrophotometric results, HPLC analysis has shown that quantitative uncaging of NPE-choline and NPE-TMP is possible by illumination at 100 K followed by subsequent excursion to room temperature [56]. Furthermore, a crystallographic experiment revealed directly that biologically active compounds could be released by cryo-photolysis followed by transient warming to room temperature. NPE-caged ATP in flash-cooled crystals of Mycobacterium tuberculosis thymidylate kinase was successfully photolysed at 100 K as assessed by the structural observation of ATP-dependent enzymatic conversion of TMP to TDP after temporarily warming the photolysed crystals to room temperature [10]. To account for the observation of photofragmentation reactions at cryotemperatures, it was suggested that the absorption of UV photons could generate heat locally that may serve to carry forward the fragmentation reaction without affecting significantly the overall temperature of the sample [10, 56]. Alternatively, a possible photosensitivity of one or several intermediates on the photofragmentation pathway was hypothesized. Overall, the mechanism of cryo-photolysis was not fully elucidated. A recent article by Corrie et al. [57] made us realize that the experiments carried out so far could not demonstrate unambiguously the effective release of active compounds at ∼ 100 K. In that article, it was shown that the rate-limiting step in the photofragmentation pathway of NPE ethers is the decomposition of a hemiacetal species whose absorption spectrum closely resembles the one of the nitrosoketone product. Therefore, at least in the case of NPE-choline, successful photolysis at 100 K cannot be concluded from the appearance of absorption bands at 290–320 nm. It is possible that only the hemiacetal species formed and that free choline is not released at 100 K. In contrast, in the case of NPE-phosphate esters, the fragmentation reaction is expected to be rate limited by decomposition of the wellestablished aci-nitro intermediate and microspectrophotometric results are probably more conclusive. ← Fig. 7 (a) Absorption spectrum of NPEAsCh before and after photolysis at 103 K with 355-nm light. (b) Kinetics of disappearance of the initial nitrophenyl compound [black curve in (a)] and build up of the photolysis product (putative nitroso or hemiacetal compound) [gray curve in (a)]. The relative contribution of these species were obtained by fitting the absorption spectra with a linear combination of reference spectra corresponding to 0% and (estimated) 100% photolysis (at room temperature). This procedure allowed us to estimate the total photolysis yield as a function of time. The experimental data (in black) are properly fitted with a biexponential kinetic
scheme (in gray). (c) Kinetics of appearance and disappearance of the aci-nitro intermediate, monitored by the absorbance at 400 nm. In this case, no deconvolution was necessary and the raw data could be used. The experimental data (in black) are properly fitted with a simple kinetic scheme taking into account the conditions of steady state illumination (in gray), of type C(t) = k2 N0 /(k2 –k1 )(exp–k1 t–exp–k2 t), where C is the concentration of the aci-nitro intermediate as a function of time, N0 is a constant, and k1 and k2 are the decay and build-up time constants of the intermediate, respectively.
15
16 Kinetic Protein Crystallography using Caged Compounds
In total, a clear conclusion obtained so far is that cryo-photolysis at ∼ 100 K followed by transient warming to room temperature leads to effective photofragmentation. The amount of uncaging is partial, increases with the photolysing temperature and depends on the type of caged compound [56]. Although a complete elucidation of the mechanism of photofragmentation at cryo-temperatures will clearly require further investigations, recent experiments were carried out (Fig. ??) that reveal new interesting features, some of which are summarized below. In these experiments, a 80-µW laser beam at 355 nm focused on a 50-µm diameter spot was directed on an amorphous solution containing 15 mM NPE-arsenocholine (NPE-AsCh), 50 mM MES, pH 6.0 and ∼25% glycerol. In NPE-AsCh, the nitrogen atom of the ammonium group is replaced by arsenic. This modification does not modify the photofragmentation pathway as compared to normal NPE-choline, but has the advantage in crystallo-graphic experiments that it provides a strong and characteristic anomalous X-ray diffraction signal allowing to readily position this atom in electron density maps [58]. A quantitative evaluation of the reaction kinetics under continuous laser illumination shows the disappearance of ∼ 80% of the initial NPE compound in about 100 s (Fig. ??b). Therefore, photolysis of the initial molecules is nearly complete on this timescale. However, only ∼ 40% of these molecules are transformed into species absorbing at 290–320 nm. These species may either consist of the nitrosoketone final product or a hemiacetal still bound to the molecule to be uncaged [57]. The other 40% photolysed molecules probably end-up in species that are spectroscopically silent in the range 260–600 nm. Prolonged exposure to the laser beam does not modify these amounts substantially, suggesting that an equilibrium is reached. The decay of the initial NPE compound and the build-up of the 290–320-nm (Fig. ??b) species both display biexponential behaviors. In each case, a fast component (t1/2 < 8.0 s) coexists with a slower component (t1/2 > 25 s), consistent with the idea that upon flash-cooling at least two different photofragmentation pathways exist. Interestingly, fitting with a simple kinetic model reveals that the aci-nitro intermediate (monitored at 400 nm) builds up rapidly (< 1 s) and decays with a half-time of ∼ 28 s. This finding indicates that the aci-nitro intermediate is involved only in the slower reaction pathway leading to the formation of the 290–320-nm species, not in the faster one, in agreement with the conclusions of Corrie et al. [57]. If, subsequently to laser illumination at 355 nm at 100 K, the temperature is elevated (at a rate of ∼ 5 K min−1 ), in the dark, the absorbencies at 290–320 nm increase. They increase slowly below 170 K and then rapidly above this temperature, reaching a level suggestive of full conversion to the 290–320-nm species at around 200 K (Fig. 8; above this temperature, rapid ice formation takes place and obscures further measurements). This observation means that, as the temperature is elevated, cross reactions between the above mentioned reaction pathways may occur, finally leading to the production of the 290–320-nm species, which is the most thermodynamically favored species at high temperatures. It is tempting to correlate the change in the rate of formation of the 290–320-nm species at ∼ 170 K with the occurrence of a glass transition in the sample, resulting in a greatly enhanced rotational and translational freedom of the molecules [59].
3 Trapping of Intermediate States
Fig. 8 Evolution of the photolysis product formation upon temperature increase in the dark after illumination at 100 K by UV light at 355 nm for 400 s, as reported by the increase in optical density monitored at 320 nm.
Taken together, our results show that photolysis at 100 K followed by transient warming above a critical temperature (T 1 in Fig. 6) leads to complete liberation of photolysis products absorbing at 290–320 nm. In the case of NPE compounds, assuming that these products correspond to final nitroso species (which remains to be demonstrated for NPE-ether compounds [57]), a complete release of the active molecule is expected. In other words, and as substantiated below, the photofragmentation process is in general not expected to be rate limiting for subsequent catalysis in a protein crystal at cryo-temperatures, opening the possibility to efficiently trap intermediate states. 3.2 Temperature-dependent Enzymatic Activity and Dynamical Transitions
Enzymatic turnover often ceases below a certain temperature called the dynamical transition temperature. This transition (generally within the range 150–250 K) marks the onset of an-harmonic motions of protein atoms occurring on a timescale from picoseconds to 100 ns as evidenced by, for example, neutron [60, 61] or M¨ossbauer spectroscopy [62]. A correlation between the dynamical transition and the onset of biological activity has been established for several proteins, such as bacteriorhodopsin [61], ribonuclease A [63, 64] or myoglobin [52, 65], but in other cases this correlation was not observed [66] and the issue remains controversial. Whether or not the onset of biological activity coincides with the dynamical transition might depend on the type of protein motions that are monitored, which are either determinant or not for biological function. Although the dynamical transition is still a poorly understood process, especially in the case of crystalline proteins, there are further indications suggesting that this transition might be coupled to a
17
18 Kinetic Protein Crystallography using Caged Compounds
solvent glass transition occurring within the crystal channels surrounding the protein molecules [67, 68]. Above the glass transition, the solvent molecules gain rotational and translational freedom, meaning that some degree of liquid state is reached and diffusion may occur. The determination of the proper temperature T 1 (Fig. 6) to be used experimentally is tricky, because transitions are difficult to detect and because experimental caveats also come into play, such as ice formation within the crystal that might slowly develop as the glass transition is passed and that may irreversibly damage the crystal. These questions are currently being investigated and some of them might be answered by techniques such as fluorescence microspectrophotometry, whose sensitivity allows us to probe subtle changes in protein dynamics as a function of temperature within a single crystal [69]. Finally, our current view is that, if proper conditions are found, the combination of cryo-photolysis at a temperature below the dynamical transition followed by transient warming above this temperature should allow a substantial amount of caged compound to be effectively released. Moreover, in the case of “affinity caging”, a released substrate, for example, should be able to diffuse from the solvent channels to the nearest active site centre, typically 100 Å away, within a few minutes. Furthermore, partial catalysis may take place and it is possible that an enzymatic intermediate will accumulate in the crystal.
3.3 Potential Applications of Cryo-photolysis Combined with Temperature-controlled Protein Crystallography
The strategy described above and summarized in Fig. 6 may enable the effective accumulation and trapping of an intermediate state of biological interest within the crystal, in order to solve its three-dimensional structure. This strategy appears especially interesting in the case of rapid enzymes, for which any other known trapping method is prone to failure due to suboptimal synchrony between the molecules of the crystal. In fact, cryo-photolysis below the dynamical transition temperature simply eliminates the requirement for synchrony. As a consequence, this technique may also widen the panel of caged compounds adequate for kinetic crystallography experiments. Indeed, caged compounds of low quantum yield may be used [44], since, within practical limits, the rate of photolysis is not an issue as long as protein activity is arrested below the dynamical transition. Enzymes for which there are no compounds of high quantum yield available could therefore be studied. For example, caged dioxygen [70], displaying a quantum yield of 0.04 at physiological pH, could serve to study the mechanisms of dioxygenases and preliminary experiments have shown that cryo-photolysis of this compound is successful (Grummit, personal communication). Caged protons [71] could also be used to trigger pH-dependent catalytic activity at low temperature.
3 Trapping of Intermediate States
3.4 Cryo-photolysis Wavelength and Cryo-radiolysis of NPE-caged Compounds
The possibility to use gentle (low-power) light sources is a key advantage of cryophotolysis, due to the absence of requirement for a high photolysis rate. Cryophotolysis can also be envisaged by two-photon absorption using light in the range 640–700 nm [72]. However, the choice of the photolysing wavelength leading to optimal cryo-photolysis is always a delicate problem. For example, in the case of amorphous solutions of NPE-AsCh (see above), we observed that illumination at 266 nm leads to a higher photolysis yield than at 355 nm, independently of the illumination time (not shown). This finding is probably related to the wavelengthdependent photosensitivity of various species along the fragmentation pathways. However, illuminating a protein crystal at 266 nm leads to competitive absorption by the aromatic residues, which is expected to offset completely the advantage of a higher photolysis yield. The high optical density of the crystal at this wavelength will result in significant temperature elevation of the sample, thermal gradients, UV damage and even a lack of available photons to interact with the caged molecules located in the crystal centre. An adequate solution may consist in chemically modifying the caged compound so as to tune its absorption spectrum away from the 280-nm region, which often can be achieved by incorporating two metoxy groups on the aromatic group [8]. A serious pitfall complicates the usage of caged compounds in protein crystallography. Experiments by absorption microspectrophotometry have suggested that NPE-caged compounds can be cleaved at cryo-temperatures by employing a highly intense X-ray beam rather than visible light [10]. In the case of NPE-TMP soaked into crystals of M. tuberculosis thymidylate kinase, spectral changes induced by X-irradiation after collection of a full crystallographic data set at 100 K on a undulator beamline (ID9 at the ESRF, Grenoble) were comparable to those induced by cryo-photolysis with 355-nm laser light at the same temperature. Recently online microspectrophotometry on beamline ID14-4 at the ESRF (R. Ravelli et al, in preparation) enabled us to monitor the spectral changes induced by X-ray irradiation in real-time. Thin films of amorphous NPE-AsCh solutions, as described above, were flash-cooled to 100 K using a cryo-loop, and exposed to monochromatic X-rays (γ = 0.94 Å; ∼ 5 × 1011 photons s−1 through slits of dimensions 100 × 100 µm2 ). Absorbance spectra were recorded every ∼ 0.4 s during X-ray exposure. Figure 9(a) shows spectra before, after 10 s and after 100 s of X-ray irradiation. After 10 s of exposure, the peak height at 320 nm has increased and the peak at 290 nm is clearly visible. The spectrum after 100 s of exposure resembles the one after cryo-photolysis with a 355-nm laser, although the ratio of the absorbances at 290 and 320 nm is largely different, suggestive of different fragmentation mechanisms (compare with Fig. ??). The time course of the spectral changes at 260 and 290 nm is shown in Fig. 9(b). Up to a time point of 15 s, corresponding to the onset of X-ray irradiation, the absorbance at 290 nm slowly increases and the one at 260 nm decreases, due to significant photolysis by the measuring light. At time points longer than 15 s, radiolysis is evident, leading to a rapid increase of absorbance
19
20 Kinetic Protein Crystallography using Caged Compounds
Fig. 9 (a) Absorption spectrum of NPEAsCh before and after 10 and 100 s of radiolysis at 100 K with X-rays at 0.94 Å. (b) Kinetics of disappearance of the initial nitrophenyl compound (260 nm) and build up of the photolysis product (putative nitroso or hemiacetal compound, 290 nm peak). In this case, no deconvolution was attempted and the results are only qualitative. Already before X-ray irradiation a peak at 290–320
nm is visible, originating from photolysis products that formed under the influence of the measuring light (deuterium lamp) during alignment of the sample. A flashcooled control solution, containing 50 mM MES, pH 6.0, and ∼25% glycerol but no caged compound, did not show any of the features observed in this figure upon X-ray irradiation (data not shown).
3 Trapping of Intermediate States
at 290–320 nm. The absorbance at 260 nm first decreases and then increases due to a competition between the vanishing peak at 260 nm and the rising peak at 290 nm. Synchrotron X-ray irradiation not only leads to changes in the spectrum of caged compounds, but can also alter them structurally. Electron densities for unilluminated NPE-AsCh in the active site of trigonal crystals of Torpedo californica acetylcholinesterase [73] are shown for data collected at 100 K on a laboratory X-ray source and on beamline ID14-EH2 at the ESRF (Fig. 10a and b). Clearly, irradiation with highly intense synchrotron radiation leads to a loss of definition of the caged compound, whereas radiation from a laboratory X-ray source necessary to collect a full data set preserves its full definition. It is not clear yet how spectroscopic changes induced by radiolysis (Fig. 9) relate to the observed structural damage (Fig. 10) and experiments will be needed to clarify this issue. However, we observed that it is possible to preserve full definition of the caged compound even on intense synchrotron beamlines (data not shown) if the X-ray dose necessary to collect a full data set is carefully minimized. The requirements for suitable X-ray data collection in combination with the strategy of Fig. 6 are summarized below. 3.5 X-ray Data Collection Strategy
Proper X-ray data collection protocols are needed to establish unambiguously the occurrence of a catalytically induced conformational change. In general, structural motions are best detected by computing difference Fourier electron density maps between a “trapped” state and a reference “resting” state. However, the problem of how to collect the corresponding diffraction data is far from obvious. The two data sets should be collected at the same temperature, typically 100 K, where secondary
Fig. 10 Density maps of NPE-Ch bound to the active site of T. californica acetylcholinesterase. Diffraction data were collected at 100 K on a laboratory X-ray source (a) and on the undulator beamline ID14EH2 at the ESRF (b). A model of the caged compound refined from the data collected
on the laboratory X-ray source is shown (unpublished result, obtained in collaboration with M. Goeldner, L. Peng, P. Gros, I. Silman and J. Sussman). Clearly the intense X-rays from a synchrotron source induced a structural alteration of the caged compound.
21
22 Kinetic Protein Crystallography using Caged Compounds
X-ray radiation damage is minimized. Searching for the best isomorphism between data sets would indicate to use a unique crystal for the collection of both sets, one before warming to T 1 and one after (Fig. 6). Unfortunately, a temperature profile transiently driving the crystal above the critical glass and/or dynamical transition temperatures will likely result in diffusion throughout the crystal of radicals formed during the first data collection, leading to disastrous damage apparent in the second data set that will alter the structure, if not ruin the diffraction quality. Separating radiation-induced structural changes from genuine conformational changes might be feasible to a certain extent if the specific alterations expected from the former are known, but this will remain a delicate task. For the same reason, deliberately using X-rays to cryo-radiolyse a caged substrate or cofactor would introduce the risk of damaging the entire crystal at the same time. It seems, therefore, preferable to use two different crystals – one serving as a reference, and the other containing the trapped intermediate generated by UV photolysis and transient temperature rise. However, the following difficulties appear: (1) the reference data set may already reveal a modified caged compound, due to X-ray radiolysis, (2) the caged compound may have different occupancies in different crystals, (3) non-isomorphism between crystals may significantly deteriorate the quality of experimental difference electron density maps (most informative because least biased by modeling errors) and (4) many crystals need to be tediously prepared if the diffraction quality tends to vary randomly. Additional control experiments are necessary that address the structural consequences of either the sole application of a temperature profile on a non cryophotolysed crystal or the sole UV illumination at 100 K.
4 Conclusion
In conclusion, kinetic protein crystallography is a challenging field of research standing at the forefront of protein crystallography. Its development, which will certainly receive great attention from the structural proteomics world in the future, is still in its infancy and will still require considerable efforts. It has now been realized that real-time-resolved crystallography employing the Laue method, although an extremely powerful and elegant technique, is only adequate for a limited number of proteins displaying built-in photosensitivity and may not be easily used in combination with caged compounds. New perspectives are directed towards trapping strategies, where caged compounds still offer considerable interest, especially through their ability to be photolysed at cryo-temperatures, as recently demonstrated [10, 56]. This possibility, combined with the use of physicochemical properties of biological macromolecules such as temperature-dependent activity and dynamical transitions, adds an additional tool to the panel of methods based on cryo-crystallography [3, 74] to trap intermediate states in protein crystals. In particular, it opens the way to the study of structural changes in rapid enzymes such as cholinesterases, a field that was hitherto reserved for a limited number of proteins displaying endogenous photosensitivity. A wide range of caged compounds can
References
benefit from the technique, including those that are moderately photosensitive, and therefore a wider range of enzymes may in principle be studied.
Acknowledgments
We acknowledge M. Goeldner, A. Specht, I. Silman, J. Sussman and P. Gros for fruitful collaborations and numerous discussions. T. Ursby is acknowledged for his essential contribution to cryo-photolysis studies and the development of absorption microspectrophotometry. We thank R. Ravelli and J. Colletier for their help in cryoradiolysis experiments.
References 1 B. L. STODDARD, Methods 2001, 24, 125–138. 2 G. A. PETSKO, D. RINGE, Curr. Opin. Chem. Biol. 2000, 4, 89–94. 3 I. SCHLICHTING, Acc. Chem. Res. 2000, 33, 532. 4 I. SCHLICHTING, R. S. GOODY, Methods Enzymol. 1997, 277, 467–490. 5 E. M. DUKE, S. WAKATSUKI, A. HADFIELD, L. N. JOHNSON, Protein Sci. 1994 3, 1178. 6 I. SCHLICHTING, S. C. ALMO, G. RAPP, K. WILSON, K. PETRATOS, A. LENTFER, A. WITTINGHOFER, W. KABSCH, E. F. PAI, G. A. PETSKO, R. S. GOODY, Nature 1990, 345, 309. 7 B. L. STODDARD, P. KOENIGS, N. PORTER, K. PETRATOS, G. A. PETSKO, D. RINGE, Proc. Natl Acad. Sci. USA 1991, 88, 5503–5507. 8 B. L. STODDARD, B. E. COHEN, M. BRUBAKER, A. D. MESECAR, D. E. KOSHLAND, Jr, Nat. Struct. Biol. 1998, 5, 891–897. 9 A. J. SCHEIDIG, C. BURMESTER, R. S. GOODY, Struct. Fold. Des. 1999, 7, 1311–1324. 10 T. URSBY, M. WEIK, E. FIORAVANTI, M. DELARUE, M. GOELDNER, D. BOURGEOIS, Acta Crystallogr. D 2002, 58, 607–614. 11 D. M. van AALTEN, W. CRIELAARD, K. J. HELLINGWERF, L. JOSHUA-TOR, Protein Sci. 2000, 9, 64–72. 12 K. EDMAN, P. NOLLERT, A. ROYANT, H. BELRHALI, E. PEBAY-PEYROULA, J. HAJDU, R. NEUTZE, E. M. LANDAU, Nature 1999, 401, 822–826.
13 J. M. BOLDUC, D. H. DYER, W. G. SCOTT, P. SINGER, R. M. SWEET, D. E. KOSHLAND, Jr, B.L. STODDARD, Science 1995, 268, 1312–1318. 14 T. R. SCHNEIDER, E. GERHARDT, M. LEE, P. H. LIANG, K. S. ANDERSON, I. SCHLICHTING, Biochemistry 1998, 37, 5394–5406. 15 E. GARMAN, T. SCHNEIDER, J. Appl. Crystallogr. 1997, 27, 211–237. 16 T. Y. TENG, J. Appl. Crystallogr. 1990, 23, 387–391. 17 M. WEIK, R. B. RAVELLI, I. SILMAN, J. L. SUSSMAN, P. GROS, J. KROON, Protein Sci. 2001, 10, 1953–1961. 18 R. B. RAVELLI, S. M. MCSWEENEY, Structure Fold. Des. 2000, 8, 315–328. 19 W. P. BURMEISTER, Acta Crystallogr.D 2000, 56, 328–341. 20 M. WEIK, R. B. RAVELLI, G. KRYGER, S. MCSWEENEY, M. L. RAVES, M. HAREL, P. GROS, I. SILMAN, J. KROON, J. L. SUSSMAN, Proc. Natl Acad. Sci. USA 2000, 97, 623–628. 21 M. W. MAKINEN, A. L. FINK, Annu. Rev. Biophys. Bioeng. 1977, 6, 301–343. 22 B. L. STODDARD, G. K. FARBER, Structure 1995, 3, 991–996. 23 P. O’HARA, P. GOODWIN, B. L. STODDARD, J. Appl. Crystallogr. 1995, 28, 829–833. 24 W. G. SCOTT, J. B. MURRAY, J. R. ARNOLD, B. L. STODDARD, A. KLUG, Science 1996, 274, 2065–2069. 25 P. GOUET, H. M. JOUVE, P. A. WILLIAMS, I. ANDERSSON, P. ANDREOLETTI, L. NUSSAUME, J. HAJDU, Nat. Struct. Biol. 1996, 3, 951–956.
23
24 Kinetic Protein Crystallography using Caged Compounds
26 H. KACK, K. J. GIBSON, Y. LINDQVIST, G. SCHNEIDER, Proc. Natl Acad. Sci. USA 1998, 95, 5495–5500. 27 R. C. WILMOUTH, K. EDMAN, R. NEUTZE, P. A. WRIGHT, I. J. CLIFTON, T. R. SCHNEIDER, C. J. SCHOFIELD, J. HAJDU, Nat. Struct. Biol. 2001, 8, 689–694. 28 P. T. SINGER, A. SMALAS, R. P. CARTY, W. F. MANGEL, R. M. SWEET, Science 1993, 259, 669–673. 29 A. D. PANNIFER, A. J. FLINT, N. K. TONKS, D. BARFORD, J. Biol. Chem. 1998, 273, 10454–10462. 30 D. OESTERHELT, W. STOECKENIUS, Nat. New Biol. 1971, 233, 149–152. 31 W. D. HOFF, P. DUX, K. HARD, B. DEVREESE, I. M. NUGTEREN-ROODZANT, W. CRIELAARD, R. BOELENS, R. KAPTEIN, J. van BEEUMEN, K.J. HELLINGWERF, Biochemistry 1994, 33, 13959–13962. 32 R. H. AUSTIN, K. W. BEESON, L. EISENSTEIN, H. FRAUENFELDER, I. C. GUNSALUS, Biochemistry 1975, 14, 5355–5373. 33 A. ROYANT, K. EDMAN, T. URSBY, E. PEBAY-PEYROULA, E. M. LANDAU, R. NEUTZE, Nature 2000, 406, 645–648. 34 U. K. GENICK, G. E. BORGSTAHL, K. NG, Z. REN, C. PRADERVAND, P. M. BURKE, V. SRAJER, T. Y. TENG, W. SCHILDKAMP, D. E. MCREE, K. MOFFAT, E. D. GETZOFF, Science 1997, 275, 1471–1475. 35 U. K. GENICK, S. M. SOLTIS, P. KUHN, I. L. CANESTRELLI, E. D. GETZOFF, Nature 1998, 392, 206–209. 36 B. PERMAN, V. SRAJER, Z. REN, T. TENG, C. PRADERVAND, T. URSBY, D. BOURGEOIS, F. SCHOTTE, M. WULFF, R. KORT, K. HELLINGWERF, K. MOFFAT, Science 1998, 279, 1946–1950. 37 I. SCHLICHTING, J. BERENDZEN, G. N. PHILLIPS, Jr, R. M. SWEET, Nature 1994, 371, 808–812. 38 I. SCHLICHTING, K. CHU, Curr. Opin. Struct. Biol. 2000, 10, 744–752. 39 D. BOURGEOIS, B. VALLONE, F. SCHOTTE, A. ARCOVITO, A. E. MIELE, G. SCIARA, M. WULFF, P. ANFINRUD, M. BRUNORI, Proc. Natl Acad. Sci. USA 2003, 100, 8704–8709. 40 F. SCHOTTE, M. LIM, T. A. JACKSON, A. V. SMIRNOV, J. SOMAN, J. S. OLSON, G. N. PHILLIPS, Jr, M. WULFF, P. A. ANFINRUD, Science 2003, 300, 1944–1947.
41 S. ADACHI, S. Y. PARK, J. R. TAME, Y. SHIRO, N. SHIBAYAMA, Proc. Natl Acad. Sci. USA 2003, 100, 7039–7044. 42 I. SCHLICHTING, J. BERENDZEN, K. CHU, A. M. STOCK, S. A. MAVES, D. E. BENSON, R. M. SWEET, D. RINGE, G. A. PETSKO, S. G. SLIGAR, Science 2000, 287, 1615–1622. 43 G. I. BERGLUND, G. H. CARLSSON, A. T. SMITH, H. SZOKE, A. HENRIKSEN, J. HAJDU, Nature 2002, 417, 463–468. 44 J. A. MCCRAY, D. R. TRENTHAM, Annu. Rev. Biophys. Biophys. Chem. 1989, 18, 239–270. 45 D. BOURGEOIS, T. URSBY, M. WULFF, C. PRADERVAND, A. LEGRAND, W. SCHILDKAMP, S. LABOURE´ , V. SRAJER, T. Y. TENG, M. ROTH, K. MOFFAT, J. Synchrotron Radiat. 1996, 3, 65–74. 46 I. J. CLIFTON, M. ELDER, J. HAJDU, J. Appl. Crystallogr. 1991, 24, 267–277. 47 J. HAJDU, Annu. Rev. Biophys. Biomol. Struct. 1993, 22, 467–498. 48 A. L. FINK, G. A. PETSKO, Adv. Enzymol. Relat. Areas. Mol. Biol. 1981, 52, 177–246. 49 E. H. SNELL, R. A. JUDGE, M. LARSON, M. J. van der WOERD, J. Synchrotron Radiat. 2002, 9, 361–367. 50 A. J. SCHEIDIG, A. SANCHEZ-LLORENTE, A. LAUTWEIN, E. F. PAI, J. E. CORRIE, G. P. REID, A. WITTINGHOFER, R. S. GOODY, Acta Crystallogr. D 1994, 50, 512–520. 51 A. J. SCHEIDIG, S. M. FRANKEN, J. E. CORRIE, G. P. REID, A. WITTINGHOFER, E. F. PAI, R. S. GOODY, J. Mol. Biol. 1995, 253, 132–150. 52 A. OSTERMANN, R. WASCHIPKY, F. G. PARAK, G. U. NIENHAUS, Nature 2000, 404, 205–208. 53 M. BRUNORI, B. VALLONE, F. CUTRUZZOLA, C. TRAVAGLINI-ALLOCATELLI, J. BERENDZEN, K. CHU, R. M. SWEET, I. SCHLICHTING, Proc. Natl Acad. Sci. USA 2000, 97, 2058–2063. 54 K. BARABAS, L. KESZTHELYI, Acta Biochim. Biophys. Acad. Sci. Hung. 1984, 19, 305–309. 55 D. BOURGEOIS, X. VERNEDE, V. ADAM, E. FIORAVANTI, T. URSBY, J. Appl. Crystallogr. 2002, 35, 319–326. 56 A. SPECHT, T. URSBY, M. WEIK, L. PENG, J. KROON, D. BOURGEOIS, M. GOELDNER, ChemBioChem 2001, 2, 845–848.
References 57 J. E. CORRIE, A. BARTH, V. R. MUNASINGHE, D. R. TRENTHAM, M. C. HUTTER, J. Am. Chem. Soc. 2003, 125, 8546–8554. 58 L. PENG, F. NACHON, J. WIRZ, M. GOELDNER, Angew. Chem. Int. Ed. 1998, 37, 2691–2693. 59 M. FISHER, J. P. DEVLIN, J. Phys. Chem. 1995, 99, 11584–11590. 60 W. DOSTER, S. CUSACK, W. PETRY, Nature 1989, 337, 754–756. 61 M. FERRAND, A. J. DIANOUX, W. PETRY, G. ZACCAI, Proc. Natl Acad. Sci. USA 1993, 90, 9668–9672. 62 F. PARAK, E. W. KNAPP, D. KUCHEIDA, J. Mol. Biol. 1982, 161, 177–194. 63 R. F. TILTON, Jr, J.C. DEWAN, G. A. PETSKO, Biochemistry 1992, 31, 2469–2481. 64 B. F. RASMUSSEN, A. M. STOCK, D. RINGE, G. A. PETSKO, Nature 1992, 357, 423–424. 65 H. LICHTENEGGER, W. DOSTER, T. KLEINERT, A. BIRK, B. SEPIOL, G. VOGL, Biophys. J. 1999, 76, 414–422.
66 R. M. DANIEL, J. C. SMITH, M. FERRAND, S. HERY, R. DUNN, J. L. FINNEY, Biophys. J. 1998, 75, 2504–2507. 67 P. W. FENIMORE, H. FRAUENFELDER, B. H. MCMAHON, F. G. PARAK, Proc. Natl Acad. Sci. USA 2002, 99, 16047. 68 M. WEIK, Eur. Phys. J. E 2003, 12, 153. 69 M. WEIK, X. VERNEDE, R. ROYANT, D. BOURGEOIS, Biophys. J. 2004, 86, 3176. 70 R. MACARTHUR, A. SUCHETA, F. F. CHONG, O. EINARSDOTTIR, Proc. Natl Acad. Sci. USA 1995, 92, 8105–8109. 71 S. KHAN, F. CASTELLANO, J. L. SPUDICH, J. A. MCCRAY, R. S. GOODY, G. P. REID, D. R. TRENTHAM, Biophys. J. 1993, 65, 2368–2382. 72 W. DENK, J. H. STRICKLER, W. W. WEBB, Science 1990, 248, 73–76. 73 J. L. SUSSMAN, M. HAREL, F. FROLOW, C. OEFNER, A. GOLDMAN, L. TOKER, I. SILMAN, Science 1991, 253, 872–879. 74 K. MOFFAT, R. HENDERSON, Curr. Opin. Struct. Biol. 1995, 5, 656–663.
25
1
Luminescent Proteins in Binding Assays
Aldo Roda, Massimo Guardigli, Elisa Michelini, and Mara Mirasoli University of Bologna, Bologna, Italy
Patrizia Pasini University of Kentucky, Lexington, USA
Originally published in: Photoproteins in Bioanalysis. Edited by Sylvia Daunert and Sapna K. Deo. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31016-6
1 Introduction
Unraveling the mechanisms of natural phenomena occurring in various marine and terrestrial organisms and resulting in light production has provided exciting new tools in biosciences. The study of such mechanisms has moved from curiosity towards such fascinating phenomena to their effective application in analytical chemistry. Various bioluminescent or fluorescent proteins have been isolated from luminescent organisms, and the corresponding genes have been cloned. In addition, mutational studies have provided a number of luminescent proteins with improved spectral characteristics or emission at different colors. Bioluminescent and fluorescent proteins allow for the development of ultrasensitive binding assays, thanks to their high detectability and to the availability of highly sensitive light detection instruments. Bioluminescent proteins are referred to with the general term “luciferases”. They are the key enzymes that catalyze bioluminescent reactions, which involve oxidation of a substrate generically known as luciferin. The reaction yields a singlet-excited intermediate that decays with photon emission. Because the enzyme active site provides a reaction microenvironment that is favorable to radiative decay of the excited intermediate product, the bioluminescent reaction is characterized by a very high quantum yield (0.88 for the reaction of firefly luciferase, which is the highest reported for a biological reaction). Among luciferases, calcium-activated photoproteins, such as aequorin from Aequorea victoria or obelin from Obelia longissima, can be defined as proteins that emit light without turnover. Indeed, the product of the reaction is not released from the active site, where it remains tightly bound, preventing turnover of the reaction. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Luminescent Proteins in Binding Assays Table 1 Emission properties of luminescent proteins
Fluorescent proteins GFP EB(lue)FP EG(reen)FP EC(yan)FP EY(ellow)FP RFP
Origin
Emission wavelength (nm)
Aequorea victoria GFP mutant GFP mutant GFP mutant GFP mutant Discosoma striata
509 (absorption at 395 nm) 440 (absorption at 380 nm) 508 (absorption at 489 nm) 477 (absorption at 434 nm) 527 (absorption at 514 nm) 583 (absorption at 558 nm)
Aequorea victoria Obelia longissima
465 (Ca2+ -triggered emission) 485 (Ca2+ -triggered emission)
Photoproteins Aequorin Obelin Luciferases Bacterial luciferases Vibrio spp., Photorabdus spp. 489 Firefly luciferase Photinus pyralis 562 Renilla luciferase Renilla reniformis 470
Fluorescent proteins emit light as a result of photoexcitation. At present, a number of fluorescent proteins are available for applications in biosciences. Along with the green fluorescent protein (GFP), originally isolated from A. victoria, natural or mutated variants are available that display interesting features such as improved photophysical properties, emission at different colors, or photoactivation properties. Relevant properties of common luminescent proteins are gathered in Table 1. Luminescent proteins have been employed as labels in different types of binding assays, including protein–protein and protein–ligand interaction assays, immunoassays, biotin–avidin-based assays, and nucleic acid hybridization assays. The principles of binding assays involving luminescent proteins are depicted in Fig. 1 and 2. Besides their fluorescent or bioluminescent properties, a further reason for the increasing use of luminescent proteins in binding assays derives from the recent advancements in biotechnology. The availability of cDNA encoding these proteins and the possibility of expressing them in bacterial systems has allowed for their production in virtually unlimited amounts. Furthermore, fusion proteins obtained by genetic in-frame fusion of a luminescent protein to a target protein have enabled a number of revolutionary applications. Such bifunctional molecules have the interesting feature of conjugating luminescence with the target protein properties. Interestingly, the same approach can be used to obtain in vivo biotinylated luminescence proteins by fusion with a suitable biotin acceptor peptide. Gene fusion presents several advantages when compared with conventional chemical conjugation approaches. Indeed, it provides bifunctional conjugates uniform in their stoichiometry and in the position of luminescent protein attachment. In
1 Introduction 3
Fig. 1 Use of luminescent proteins as labels in binding assays
Fig. 2 Resonance energy transfer (RET)-based binding assays involving luminescent proteins. In FRET assays both the donor and the acceptor are fluorescent proteins, while in BRET assays the donor and the acceptor are bioluminescent and fluorescent proteins, respectively
4 Luminescent Proteins in Binding Assays
addition, such conjugates are readily available in virtually unlimited amounts after gene expression of the fusion gene, usually in bacteria, and protein purification. In addition, the luminescence activity of the label, which is often partially lost upon chemical conjugation, is usually retained upon protein fusion. In the following, the most recent and innovative applications of luminescent proteins in binding assays will be described.
2 Protein–Protein and Protein–Ligand Interaction Assays
Luminescent proteins have been used extensively in the development of binding assays involving proteins, which have allowed us to obtain insights into many important biological processes. 2.1 FRET and BRETM Techniques
Most of these assays involve resonance energy transfer (RET) techniques. RET is a short-range, nonradiative energy transfer between donor and acceptor molecules that takes place only if the two species are in close proximity (<10 nm) to each other. RET systems are therefore “spectroscopic rulers” that are suitable for monitoring interactions between two partners, provided that each component of the donor–acceptor couple is linked to one partner. The same strategy can be used to detect protein conformation changes if donor and acceptor are fused at the target protein terminals. In the case of fluorescence RET (FRET), both the donor and the acceptor are fluorescent molecules, whereas in bioluminescence RET (BRET) a bioluminescent molecule acts as the energy donor. FRET and, more recently, BRET are the methods of choice for imaging protein association inside living cells. In addition, RET processes can be evaluated using microtiter plate readers, thus allowing for the development of FRET- and BRET-based quantitative assays. FRET methods can be broadly classified as intensity-based and decay kineticsbased methods. Intensity-based methods rely on the measurement of either the acceptor fluorescence or the acceptor:donor fluorescence intensity ratio. However, they are not suitable for detecting low-level protein–protein interactions because their sensitivity is reduced by several factors (e.g., sample autofluorescence and overlap between donor and acceptor emission). Such problems have been partially overcome by other FRET technologies, such as time-resolved FRET (TR-FRET), in which the donor is a lanthanide chelate with long-lived fluorescence. Decay kinetics-based FRET methods rely on the measurement of the kinetics of the light emission. In fluorescence lifetime imaging microscopy (FLIM), the decrease in the donor fluorescence lifetime that is due to FRET is measured, providing a reliable, donor concentration-independent estimation of FRET efficiency. Because the fluorescence lifetimes of donors are in the pico- or nanosecond range, even
2 Protein–Protein and Protein–Ligand Interaction Assays 5
extremely rapid interactions can be monitored. In addition, unlike intensity-based FRET, FLIM measures can be also performed with spectrally similar fluorophores. The BRET process occurs naturally in some marine organisms, such as the sea pansy Renilla reniformis and the jellyfish Aequorea victoria. In both cases the green fluorescent protein (GFP) is the acceptor, while Renilla luciferase (Rluc) or aequorin is the donor, respectively. BRET possesses some advantages in comparison with FRET because it avoids the problems associated with illumination. Therefore, it can be used even in photoresponsive cells or in cell types with significant autofluorescence. A further advantage of BRET is its higher sensitivity, which allows protein–protein interactions to be measured at low protein concentration, thus reducing nonspecific association phenomena of the target proteins. Because BRET assays are essentially intensity-based assays, spectral separation of donor and acceptor emissions is required. 2.2 FRET and BRET Applications
The great variety of available fluorescent and bioluminescent proteins emitting at different colors and the possibility to produce genetic in-frame fusions have had a great impact on FRET applications (which were originally based on organic fluorophores) and made BRET possible. Further advancements came as a result of the availability of improved substrates for bioluminescent proteins. For example, Packard has developed a BRET technology (BRET2 ) based on a proprietary coelenterazine substrate and a GFP mutant that provides a large spectral shift between Rluc and GFP emission, thus allowing for more reliable BRET measurement [1]. The performance of FRET assays has also been significantly improved by new FRET techniques, such as two-photon excitation FRET and time-correlated single photon-counting FRET, which was made possible by the availability of more sensitive imaging devices and ultrafast light detection systems. In recent years, an increasing number of studies have employed FRET and BRET techniques based on luminescent proteins, both in vitro and in vivo in intact cells [2, 3]. In addition to those described in the following paragraphs, selected recent FRET and BRET binding assays involving luminescent proteins are reported in Table 2. FRET assays can involve either a donor–acceptor couple of fluorescent proteins or a fluorescent protein, usually acting as the energy donor, and an organic fluorophore. FRET between enhanced yellow fluorescent protein (EYFP) and enhanced cyan fluorescent protein (ECFP) was used to study receptor–arrestin interactions [19]. FRET allowed detection of a rapid, time-dependent interaction between EYFPtagged β-arrestin 2 and ECFP-tagged receptor 5, while co-immunoprecipitation techniques showed interaction only after 30 min of ligand stimulation. This result clearly demonstrates the superiority of RET methods for the assessment of protein–protein interactions. FRET has been used to study homodimerization of the hY(1), hY(2), and hY(5) neuropeptide Y (NPY) receptors, which belong to the large family of G
6 Luminescent Proteins in Binding Assays Table 2 Additional selected BRET and FRET binding assays reported in the literature
Type of binding interaction
RET process
Reference
Association of neuropeptide Y Y4 receptor Heterodimerization of µ-opioid receptor (MOR1) and substance P receptor (NK1) Homo-oligomerization of leukotriene C4 synthase Oligomerization of adenosine A2A and dopamine D2 receptors Conformational change of protein B (PKT)/Akt resulting from lipid binding Dimerization of CXCR4 receptor Homo- and heterodimerization of oxytocin and vasopressin V1a and V2 receptors Interaction between insulin receptor and protein tyrosine-phosphatase 1B Oligomerization of transcriptional intermediary factor 1 regulators Homodimerization of herpes simplex virus thymidine kinase (TK) monomers Oligomerization of human thyrotropin receptor (TSHR) Interaction between the heart-specific FHL2 protein and the DNA-binding nuclear protein hNP220 Dimerization and ligand-induced conformational changes of melatonin receptors Homo- and heterodimerization of G protein-coupled receptors Oligomerization between adenosine A(1) and P2Y(1) receptors
BRET (BRET2 ) BRET (Rluc-GFP)
[4] [5]
BRET (BRET2 ) BRET (BRET2 )
[6] [7]
FRET (GFP-YFP)
[8]
BRET (BRET2 ) BRET (Rluc-EYFP)
[9] [10]
BRET (Rluc-YFP)
[11]
BRET (Rluc-GFP)
[12]
FRET (CFP-YFP)
[13]
FRET (EGFP-Cy3)
[14]
FRET (BFP-GFP)
[15]
BRET (Rluc-YFP)
[16]
BRET (Rluc-EYFP, BRET2 )
[17]
BRET (BRET2 )
[18]
protein-coupled receptors, in living cells. Fusion proteins of NPY receptors and GFP or its spectral cyan, yellow, and red variants (CFP, YFP, and RFP, respectively) were generated and used as FRET pairs in either FRET fluorescence microscopy or fluorescence spectroscopy. Both techniques clearly showed that all the receptor subtypes are able to form homodimers and that dimers constitute a significant fraction of the receptors [20]. The YFP-RFP donor–acceptor FRET couple has been employed to study the dimerization of the human thyrotropin (TSH) receptor [21]. Observation of FRET between receptors differently tagged with the GFP variants provided evidence for the close proximity of individual receptor molecules, according to previous studies demonstrating the presence of receptor dimers and oligomers in thyroid tissue. The mechanisms underlying the human immunodeficiency virus’s (HIV) inhibitory activity of the chemokine stromal cell-derived factor 1 (SDF-1) were investigated. This chemokine, which binds to the CXC chemokine receptor 4 (CXCR4), is able to inhibit infection of CD4+ cells by HIV strains through binding to cell-surface heparan sulfate proteoglycans (HSPGs). Using FRET between an enhanced green
2 Protein–Protein and Protein–Ligand Interaction Assays 7
fluorescent protein (EGFP)-tagged CXCR4 and a Texas Red-labeled SDF-1 measured by laser confocal microscopy, the concomitant binding of SDF-1 to CXCR4 and cell-surface HSPGs was demonstrated. This binding may increase the local concentration of the chemokine in the surrounding environment of CXCR4, thus promoting its HIV-inhibitory activity [22]. Signal transducer and activator of transcription 3 (Stat3) dimerization was studied by live-cell fluorescence spectroscopy and imaging microscopy combined with FRET. Stat3 fusion proteins with CFP and YFP were constructed and expressed in live cells. Fusion proteins showed functionality and intracellular distribution similar to wild-type Stat3, and FRET studies demonstrated the existence of Stat3 dimers. Visualization of the FRET signals also allowed insights into the spatiotemporal dynamics of Stat3 signal transduction [23]. The association and disassociation of heterotrimeric G proteins were monitored in living cells by tagging the Gα 2 and Gβγ proteins with CFP and YFP. Data from emission spectra were used to detect the FRET fluorescence and to determine kinetics and dose-response curves of bound ligand and analogues. Extending Gprotein FRET to other G proteins should enable direct in situ mechanistic studies and applications, such as drug screening and identifying ligands of G proteincoupled receptors [24]. An improved decay kinetics-based FRET method has been employed to study the association of the α and β 1 subunits of the human cardiac sodium channel [25]. FRET measurements performed using a mode-locked laser, a confocal microscope, and a streak camera demonstrated that the subunits, which were labeled with YFP and CFP, associate in the endoplasmic reticulum before they reach the plasma membrane. This approach has been described as superior to other decay kineticsbased methods because the fluorescence spectra and lifetimes of both donor and acceptor can be measured simultaneously, allowing distinction between FRET and other donor emission-quenching processes. FRET has also been applied to flow cytometry for the development of cell-sorting assays [26]. CFP and YFP were fused to the FUSE-binding protein (FBP)-interacting repressor and to the binding domain of FBP, respectively, and these tagged proteins were used to detect FRET in viable cells by flow cytometry. FRET between the interacting species was easily detected, suggesting that FRET could represent a generally available flow cytometry technique. BRET was applied for the first time in 1999 for the study of the interactions between the circadian clock genes KaiA and KaiB of a cyanobacterium [27], showing the usefulness of this methodology over standard techniques for identifying protein interactions. From that time, the number of BRET applications reported in the literature, which mainly employed Rluc as the bioluminescent donor, has rapidly increased. BRET and TR-FRET were used to study the oligomerization of the human δopioid receptor. Both approaches allowed the detection of homo-oligomers, whose formation was influenced by neither agonist nor inverse agonist ligands. Hetero-oligomers between co-expressed human δ-opioid receptor and human β 2 adrenoreceptor were also studied. Interestingly, heterodimerization was detected
8 Luminescent Proteins in Binding Assays
by BRET, using the Rluc-EYFP BRET couple, but not by TR-FRET, and the extent of heterodimerization was increased in the presence of receptor agonists. The lack of observation of heterodimerization by TR-FRET has been attributed to the lower sensitivity of the technique or, alternatively, to the intracellular localization of the heterodimers, which in this case could not be detected by TR-FRET [28]. BRET has been extensively applied for the study of G protein-coupled receptor (GPCR) oligomerization in living cells, which is emerging as crucial aspect of GPCR function. BRET was shown to be a more powerful tool for studying GPCR oligomerization than standard co-immunoprecipitation methods, which are often limited by the formation of artifactual aggregates [30], and demonstrated that that GPCRs exist as either homo- or hetero-oligomeric complexes [31]. It was also used to study GPCR receptor–β-arrestin interactions involved in receptor desensitization and trafficking in mammalian cells. Because of the highly hydrophobic nature and cellular localization of GPCRs, BRET presented significant advantages in comparison with other techniques (e.g., co-immunoprecipitation and yeast two-hybrid screening) for measuring protein–protein interactions [29]. BRET techniques also allow the real-time monitoring of ligand–receptor interactions. BRET from Rluc to YFP was used to study the interaction between a G protein-coupled receptor kinase (GRK2) and the human oxytocin receptor (OTR) in living cells [32]. While previous attempts to demonstrate involvement of GRK have failed, the real-time BRET assay clearly showed the GRK2-OTR interactions and also offered insights into the dynamics of the process. BRET- and FRET-based binding assays represent a powerful tool in drug discovery because they could permit homogeneous high-throughput screening assays. BRET is particularly suitable for these assays, because the FRET background signal that is due to sample autofluorescence is difficult to eliminate in non-imaging methods, such as in microtiter plate format assays. BRET was used to monitor in vitro the activation state of the insulin receptor. Fusion proteins in which the human insulin receptor is fused to either Rluc or EYFP were expressed in HEK293 cells. Spontaneous in vivo dimer formation through disulfide bonds led to functional insulin receptors that, upon insulin binding, underwent conformational changes detectable by BRET. The procedure allowed for rapid determination of the activation state of the receptor, and it could be used in the search for new molecules with insulin-like activity [33]. 2.3 Other Detection Principles
Fluorescent proteins can be directly used to monitor protein–protein association processes or conformation changes that are due to ligand binding. Fluorescent cellular sensors expressing YFP fused to the ligand-binding domains of estrogen, androgen, and glucocorticoid receptors were developed by Muddana and Peterson [34]. The fusion proteins were tethered through a short linker and expressed in Saccharomyces cerevisiae yeast cells. A dose-dependent fluorescence enhancement
3 Antibody-based Binding Assays 9
was observed in the presence of steroid receptor ligands. The correlation with the known relative receptor-binding affinity values of the compounds suggested that the fluorescence enhancement was due to ligand-induced receptor dimerization, perhaps through stabilization of YFP protein folding. These fluorescent cellular sensors could represent a novel, high-throughput method to identify and analyze ligands of nuclear hormone receptors. Fluorescence anisotropy decay microscopy was used to determine, in individual living cells, the spatial and temporal monomer–dimer distribution of GFP-tagged herpes simplex virus thymidine kinase (TK). Measurement of the rotational time of the proteins by confocal microscopy and a time-correlated single photon-counting technique allowed the determination of their oligomeric state in both the cytoplasm and the nucleus. It was demonstrated that tagged TK was initially produced in a monomeric state and then formed dimers that grew into aggregates. Picosecond time-resolved fluorescence anisotropy microscopy was thus proposed as a promising technique for obtaining structural information on proteins in living cells, even in the case of low protein expression levels [35].
3 Antibody-based Binding Assays
Several binding assays take advantage of the high specificity and avidity of antigen–antibody binding. Among these, immunoassays are the method of choice for the direct, sensitive, and specific quantification in complex matrices of analytes of clinical, pharmacological, environmental, or alimentary interest. What makes immunoassays very appealing is their rapidity, as well as the possibility of automation and high-throughput analyses. In addition, immunohistochemical methods are widely employed for the sensitive and specific localization of analytes in tissue sections or cells. Various highly detectable labels have been used in the development of immunoassays, radioisotopes being the first example. However, safety and disposal issues, as well as limited shelf life, have prompted the search for alternative labels. Nowadays, enzymes are the most commonly used labels, thanks to signal amplification deriving from substrate turnover. This, together with the availability of chemiluminescent substrates, allowed reaching the highest detectability, which is similar or even higher than that obtained with radioisotopes. Among enzymes that can be detected using a chemiluminescent substrate, horseradish peroxidase (HRP) is extensively used, and the recent cloning and heterologous expression of its gene is opening new exciting applications in biosciences [36]. The most popular substrate for HRP, based on the H2 O2 –luminol–enhancer system, even if highly optimized, still presents a relatively low quantum yield, on the order of 0.01. Bioluminescence and fluorescence should provide superior detectability because they are characterized by much higher luminescence quantum yields. The various approaches used to employ luminescent proteins in antibody-based binding assays will be discussed in the following section.
10 Luminescent Proteins in Binding Assays
3.1 Chemical Conjugation
Chemical conjugation of the protein label with either the analyte or the binder (e.g., antibody or protein A) is traditionally used in order to obtain tracers suitable for developing bioluminescent or fluorescent immunoassays. This rather simple approach takes advantage of the presence of reactive groups (usually primary amino or sulfhydryl groups) on the protein molecule. Various ultrasensitive assays have been developed with this strategy, using photoproteins [37–41] and GFP [42] as labels. In order to provide signal amplification, and thus a potential increase in assay sensitivity, streptavidin-biotinylated luciferase complexes were used [43]. Alternatively, the efficiency of the analyte chemical conjugation to obelin was increased by introducing additional sulfhydryl groups into the obelin sequence, with the use of Traut’s reagent (2-iminothiolane). This approach allowed development of a competitive immunoassay for thyroxin and a sandwich immunoassay for thyrotropin, both characterized by a sensitivity comparable with that obtained using radioisotope labels. However, it required careful experimental optimization to avoid excessive obelin conjugation and, consequently, a decrease in its bioluminescent activity [44]. To circumvent low reproducibility and label inactivation drawbacks, site-directed chemical conjugation has been proposed. In particular, aequorin mutants containing a single cysteine residue have been produced and used to obtain oneto-one thyroxine–aequorin homogeneous conjugates that allowed development of immunoassays for thyroxine with detection limits on the order of 10−12 M, which is three orders of magnitude lower than that reported for commercially available immunoassays [45]. Because luciferase is most often inactivated by the chemical reactions involved in direct labeling of antigens or binding molecules, gene-fusion strategies have been used to exploit its high detectability in binding assay development. Alternatively, the luciferase reaction was coupled with a second enzymatic reaction. This allowed the use of the coupled enzyme as a conjugated label, followed by firefly luciferase addition in solution. For example, luciferin derivatives available as a substrate for firefly luciferase only upon activation by a specific enzymatic reaction have been synthesized [46]. Alternatively, the firefly luciferase reaction was coupled with an enzymatic reaction producing ATP, such as those catalyzed by acetate kinase or pyruvate phosphate dikinase [47]. 3.2 Gene Fusion
A new approach for producing tracers suitable for the development of fluorescent or bioluminescent immunoassays takes advantage of the possibility to produce genetic in-frame fusions of a target protein (either the analyte or a binding protein) with the bioluminescent label. Several fusions of binding and luminescent proteins have been produced and proposed as universal reagents for immunoassays or immunohistochemistry
3 Antibody-based Binding Assays 11
techniques. In particular, a fusion protein between protein A and GFP has been proposed as a reagent for Western blotting [48]. Alternatively, the IgG-binding domain of protein A (ZZ) was fused to obelin and utilized for developing quantitative assays of rabbit, mouse, and human IgGs, with the same sensitivity obtained with commercially available protein A–HRP conjugate [49]. With a similar strategy, GFP was fused to the single-chain antibody variable fragment of antibodies of interest, such as the antibody specific for hepatitis B surface antigen [50], for the E6 protein of human papillomavirus type 16 [51], or for the herbicide picloram [52]. In all cases the fusion protein retained both binding and luminescence properties and was suitable for developing sensitive and rapid antibody-based assays. An alternative approach consists of engineering the GFP amino acid sequence by inserting antibody-binding loops into surface-exposed loops at one end of the GFP molecule. The resulting so-called fluorobody works as a binding protein with intrinsic fluorescence characteristics [53]. A fusion protein between streptavidin and luciferase from Pyrophorus plagiophthalamus was produced in an insect cell line using the baculovirus system and was used in both sandwich- and competitivetype immunoassays [54]. An identical cloning strategy was used to obtain a fusion protein between streptavidin and GFP, which was successfully used for identification of biotinylated antibodies in living or fixed cells by fluorescence microscopy, proving that GFP is an excellent marker for histochemical applications. However, in the microtiter plate format the GFP–streptavidin fusion exhibited much lower detectability than the luciferase–streptavidin fusion. The binding domain from a human hyaluronan-binding protein was fused to Renilla reniformis luciferase, and the fusion protein was used to develop an assay for hyaluronan [55]. An emerging number of luciferases from marine and terrestrial organisms are finding their way to biotechnological applications. Different types of fusion proteins between eukaryotic luciferases and binding molecules, along with their possible applications, have been discussed [56]. As an example, an in vivo biotinylated firefly luciferase was constructed by gene fusion of a thermostable mutant of the Luciola lateralis luciferase with a biotin acceptor peptide [57]. This construct, which retains luciferase activity and is able to bind to streptavidin, is an extremely versatile bioluminescent label, and it has been applied to the development of various immunoassays [58, 59]. The gene-fusion approach has also been used to obtain bioluminescent tracers, where a bioluminescent protein is fused to a protein analyte. Using this approach, ultrasensitive competitive immunoassays have been developed for various analytes, such as a model octapeptide [60], the mammalian opioid leucine–enkephalin peptide [61], and protein C [62]. These works demonstrate the versatility of aequorin in such a strategy, since the target analyte can be fused at either the aequorin Nterminal or the C-terminal. In addition, the antibody-binding region of a large protein can be fused to aequorin, rather than the whole protein, thus avoiding problems usually encountered when expressing large eukaryotic proteins in bacteria [62]. The methods developed using such aequorin-based tracers were shown to be more sensitive than immunoassays based on conventional enzyme- or fluorescence-based assays. With a similar strategy, an analyte–obelin fusion protein was used to develop an immunoassay for a model octapeptide [63].
12 Luminescent Proteins in Binding Assays
3.3 Dual-analyte Assays
The possibility of assaying more than one analyte in one assay is particularly appealing in analytical chemistry, though still not very often reported. A dual-analyte competitive binding assay using tandem flash luminescence from the photoprotein aequorin and an acridinium-9-carboxamide label has been described. The assay allowed quantification of two analytes in one tube, by sequential triggering and measuring of the two luminescent reactions. The assay, which requires very short measurement times, is amenable for high-throughput screening [64]. A competitive fluorescence immunoassay that allowed quantification of two model peptide analytes in one well was developed by using two mutants of GFP emitting at different wavelengths as labels. Each peptide was genetically fused to either red-shifted GFP (rsGFP) or blue fluorescent protein (BFP), and the conjugates were used to develop an assay in which the peptides could be independently measured in one well, with detection limits on the order of 10−9 M to 10−10 M [65]. Because of the great variety of available fluorescent and bioluminescent proteins, further advances in this field are expected. 3.4 Expression Immunoassays
Use of enzyme labels is successful in the development of immunoassays because they allow signal amplification as a result of their high substrate turnover. Expression immunoassays take advantage of commercially available cell-free transcription–translation systems to introduce a further amplification system in the assay. In particular, an enzyme-coding DNA fragment, which also contains control elements for the transcription–translation process, is used as a label, instead of the protein itself. The amplification factor obtained by using this approach is potentially very high because, theoretically, several mRNA molecules are generated from each DNA molecule during the transcription–translation process, and, similarly, several protein molecules are generated from each mRNA molecule. Subsequently, the enzyme is detected by adding the suitable substrate. Various expression immunoassays have been developed using a firefly luciferase-coding DNA sequence that, in addition to luciferase high detectability, offers the advantage of being composed of a single and relatively short (550-amino-acid) polypeptide chain, which enhances its in vitro expression efficiency. As an example, an immunoassay for anti-thyrotropin immunoglobulins characterized by a limit of detection of 5 × 104 molecules per well [66] and a sandwich immunoassay for prostate-specific antigen characterized by a limit of detection of 30 ng L−1 [67] have been developed. In order to circumvent difficulties encountered when linking the antibody to the reporter gene, a bifunctional cross-linking molecule was recently produced. The linker molecule consists of (strept)avidin bound to a poly(dA) oligonucleotide, which is able to bind both to a biotinylated antibody and to a poly(dT) sequence properly enzymatically added to the reporter gene [68].
4 Biotin–Avidin Binding Assays 13
3.5 BRET-based Immunoassays
The possibility of fusing a binding with a luminescent protein has been exploited for developing FRET- and BRET-based immunoassays. An example is the development of open sandwich fluorescent immunoassays, where the antibody heavy-chain and light-chain fragments are fused to FRET or BRET partners. The assay is based on the RET phenomenon observed upon antigen-dependent re-association of antibody variable domains. Using this strategy, a FRET-based assay [69] and a BRET-based assay for egg lysozyme [70] were developed. However, the authors observed that such an interesting and innovative approach is not always guaranteed to work. For example, when the epitopes recognized by the couple of antibody fragments are quite far from one another on the analyte molecule, a situation in which FRET or BRET partners are not close enough for energy transfer, even after antibody variable domain association, may occur. For this reason, the authors further developed the method by fusing an artificial dimerization motif of leucine zippers at the C-terminal of FRET partners. These additional dimerization motifs were able to bring FRET partners in close proximity upon antigen–antibody reactions, thus enhancing FRET efficiency. Using this strategy, a homogeneous sandwich immunoassay for a rather large molecule, such as human serum albumin, was developed [71].
4 Biotin–Avidin Binding Assays
Biotin (vitamin H) is a vitamin found in tissue and blood that binds to the glycoprotein avidin, which contains four identical binding sites for biotin. Biotin and its binding protein avidin have been used extensively in the development of a variety of bioanalytical techniques that take advantage of the extremely high affinity (ka 1015 M−1 ) between the two biomolecules. Biotin has also been selected as a model analyte to develop binding assays in order to explore the possibility of applying luminescent proteins as labels for the determination of given analytes in competitive binding assay formats. These assays are based upon the competition between free biotin in a sample and the biotin–label conjugate for the binding sites on avidin [72]. A biotin–aequorin conjugate was produced by conventional chemical conjugation methods and used to develop both a homogeneous and a heterogeneous assay for biotin. The homogeneous assay relied on the inhibition of the signal from the biotin–aequorin conjugate upon binding with avidin. The degree of inhibition was inversely proportional to the free biotin concentration, and a detection limit for biotin of 1 × 10−14 M was achieved [73]. Alternatively, the heterogeneous assay employed a solid phase of avidin-coated microspheres to separate the bound and free biotin-aequorin conjugate, which permitted lowering the detection limit to 1 × 10−15 M [74]. Such a low detection limit, corresponding to 100 zmol of biotin in
14 Luminescent Proteins in Binding Assays
the sample, allowed for miniaturization of the assay. A homogeneous competitive binding assay for the detection of biotin in microfabricated picoliter vials using biotinylated recombinant aequorin was developed. The results showed that detection limits on the order of 10−14 mol of biotin were possible [75]. A binding assay for avidin in picoliter-volume vials in which avidin could be detected at femtomole levels was also described [76]. These binding assays based on picoliter volumes have potential applications in different fields, such as microanalysis and single-cell analysis, in which the amount of sample is limited. In addition, the small volumes allow for instantaneous mixing of the reagents, thus rendering these miniaturized assays suitable for high-throughput screening of biopharmaceuticals and potential drugs synthesized by combinatorial methods. The detection of biotin in individual cells using a BL competitive binding assay was reported. An aequorin–biotin conjugate, free biotin, and avidin were microinjected into sea urchin oocytes, and then the resulting bioluminescence within the oocyte upon triggering of aequorin was measured by a photomultiplier tube-based microscope photometry system. The method allowed for the detection of approximately 20 amol of biotin inside individual oocytes [77]. Recombinant GFP and its mutant enhanced GFP (EGFP) have been used as labels in heterogeneous and homogeneous competitive binding assays for biotin, respectively. Detection limits on the order of 10−8 M and 10−9 M were obtained [42, 78]. Homogeneous binding assays for biotin were recently developed using lumines cent proteins as labels and BRET as the measurement technique. Biotinylated aequorin and a quencher dye conjugated to avidin were used as the two partners of the BRET pair. When biotin bound to avidin, bioluminescence from aequorin was transferred to the chemical dye, which was selected on the basis of its ability to quench the BL emission. A water-soluble biotin derivative or a biotinylated protein was used as a model analyte to develop homogeneous competitive assays. In both cases dose–response curves showing an inverse relationship between percent bioluminescence quenching and analyte concentration were generated, thus demonstrating the feasibility of assays based on this BRET format. Furthermore, the approach is quite general because biotin–avidin could be replaced with a different pair, e.g., hapten–antibody, and other quencher dyes could be used [79]. Fusions of aequorin with streptavidin (SAV) and enhanced green fluorescent protein (EGFP) with biotin carboxyl carrier protein (BCCP) were obtained after expression of the corresponding genes in Escherichia coli cells. Association of SAV–aequorin and BCCP–EGFP was followed by BRET between aequorin (donor) and EGFP (acceptor). It was shown that free biotin inhibited BRET in a dosedependent manner because of its competition with BCCP-EGFP for binding to SAV–aequorin. The distinctive feature of this assay is the use of protein fusions that provide bifunctionally active conjugates that are uniform in their stoichiometry and in the point of reporter protein attachment and that are readily available after expression of the corresponding genes in E. coli and purification. This represents an advantage with respect to methods that use the chemical conjugation of biotin and avidin with donor and acceptor, since chemical modifications are often difficult to reproduce; in addition, in the case of aequorin particular precautions
5 Nucleic Acid Hybridization Assays 15
should be taken while biotinylating and storing the conjugate to avoid loss of BL activity [80].
5 Nucleic Acid Hybridization Assays
The analysis of specific nucleic acid sequences by hybridization is a fast-growing area of laboratory medicine. These methods are based on the specific hybridization of a suitably labeled oligonucleotide probe with a target nucleic acid sequence, followed by the hybrid detection and quantification. Luminescent proteins are becoming more and more popular as labels because luminescence-based hybridization assays are amenable to automation and offer higher detectability over conventional spectrophotometric ones. A microtiter well-based hybridization assay using aequorin as a detection molecule has been developed. The target DNA was hybridized simultaneously with a digoxigenin (dig)-labeled capture probe and a biotinylated detection probe. The hybrids were immobilized onto the wells through dig–anti-dig interaction and were detected by aequorin covalently attached to streptavidin or by complexes of biotinylated aequorin with streptavidin. Linearity of response in the range from 5 amol to 10 fmol of target DNA was shown. The method was combined with reverse transcriptase polymerase chain reaction (RT-PCR) and applied to the determination of the mRNA for prostate-specific antigen (PSA) that was detected from a single cell in the presence of 106 cells not expressing PSA. The method proved to be very fast, since the detection of aequorin is practically instantaneous as compared to the long incubation times required for the detection of enzyme-labeled probes [81]. An aequorin-based immunoassay combined with RT-PCR for quantitating cytokine mRNA has been developed. In this system the hybridized duplex was captured onto a streptavidin-coated microtiter plate and quantitated by anti-dig antibody conjugated with aequorin. The method detected as low as 40 amol of amplified cytokine products, corresponding to 500 copies of template when 27 PCR cycles were used [82]. The same capture and detection format, in combination with RT-PCR, was exploited for investigating the induction of human cytokine expression in peripheral blood mononuclear cells. A statistically significant increase in cDNA for several cytokines in stimulated cells compared to unstimulated cells was demonstrated. The BL method exhibited significant advantages when compared with radioactive methods, such as the ability to quantitate amplicons in a PCR cycle range where linear detection is more robust and to analyze products in an automated, open-architecture microtiter plate format [83]. Hybridization assays that use luminescent proteins as labels have been successfully coupled to PCR, in order to quantify the amplification product in a more accurate manner. A dual-analyte luminescence hybridization assay for quantitative PCR has been developed. This method allowed the simultaneous determination of both amplified target DNA and internal standard (IS) in the same reaction vessel. Biotinylated PCR products from target DNA and IS were captured on a single
16 Luminescent Proteins in Binding Assays
microtiter well coated with streptavidin. The amplified target DNA was hybridized with a dig-labeled specific probe, and the hybrids were detected by anti-dig antibody labeled with aequorin. The amplified IS DNA was hybridized in the same well with a fluorescein-labeled specific probe, and the hybrids were detected by anti-fluorescein antibody labeled with AP. The ratio of the luminescence values obtained from the target DNA and IS amplification products was linearly related to the number of target DNA molecules present in the sample prior to amplification, over a range of 430 to 315 000 target DNA molecules [84]. Novel hybridization assay configurations based on in vitro expression of DNA reporter molecules have been proposed. In vitro expression of DNA consists of the cell-free transcription of DNA to mRNA followed by translation of the mRNA into protein. In one study a system was described that involved simultaneous hybridization of the single-stranded target DNA with a biotinylated capture probe, immobilized on streptavidin-coated microtiter wells, and a dATP-tailed detection probe. The hybrids were reacted with dTTP-tailed firefly luciferase-coding DNA fragments followed by in vitro expression of the DNA on the solid phase. Several enzyme molecules per molecule of enzyme-coding DNA label were generated in solution, thus providing signal amplification [85]. Another study described a similar configuration that used a DNA label encoding apoaequorin. After in vitro expression, apoaequorin was converted to active aequorin and each DNA label was estimated to produce 156 aequorin molecules. A detection limit as low as 0.25 amol of target DNA was achieved, with a linear range over four orders of magnitude [86]. The significance of this approach lies in the use of DNA in hybridization assays as a signal-generating molecule (reporter), rather than solely as a recognition molecule, which forms the basis of a highly sensitive analytical system for binding assays. A different strategy to enhance the sensitivity of aequorin-based hybridization assays relied on the introduction of multiple aequorin labels per DNA hybrid, through enzyme amplification. After simultaneous hybridization of the target DNA with a biotinylated capture probe and a dig-labeled detection probe, the hybrids were reacted with anti-dig antibody labeled with HRP. Peroxidase catalyzed the oxidation of digoxigenin–tyramine by H2 O2 , with the production of tyrosyl radicals that reacted with the tyrosine residues of the streptavidin immobilized on the surface of the microtiter wells, forming protein-bound dityrosine. This resulted in the covalent attachment of multiple dig moieties to the solid phase that were detected by aequorin-labeled anti-dig antibody. An eightfold improvement in detectability was observed with the enzyme amplification as compared to an assay that used only aequorin-labeled anti-dig antibody, resulting in the detection of as low as 1 amol per well of target DNA [87]. Biotinylated aequorin complexed with streptavidin can be used as a reporter molecule in hybridization assays to detect target DNA either labeled with biotin through PCR or hybridized with biotin-labeled probe, as well as in immunoassays in combination with biotinylated antibodies. In order to avoid the inconvenience of chemical cross-linking, bacterial expression of in vivo-biotinylated aequorin was carried out. The entire process was completed in less than two days, and it was
6 Other Binding Assays 17
calculated that 1 L of bacterial culture provided enough biotinylated aequorin for 300 000 hybridization assays [88]. A new conjugation strategy based on the use of recombinant aequorin fused to a hexahistidine tag was recently proposed to obtain aequorin–oligonucleotide conjugates. Affinity capture-facilitated purification of the conjugates enabled the rapid and effective removal of the unreacted oligonucleotide. The conjugates were applied to the development of rapid bioluminometric hybridization assays that exhibited a dynamic range of 2–2000 pmol L−1 of target DNA [89]. Recently, the potential of Gaussia princeps luciferase (GLuc) as a new label for DNA hybridization has been examined. Because luciferases are inactivated upon conjugation to other biomolecules, such as DNA probes or antibodies, their use as labels is limited. To overcome inactivation problems, a bacterial overexpression system that produces in vivo-biotinylated GLuc has been designed. Purified GLuc activity was maintained, allowing enzyme detection down to 1 amol. Biotinylated GLuc was complexed with streptavidin and used as a detection reagent in a microtiter well-based DNA hybridization assay, in which an analytical range of 1.6–800 pmol L−1 of target DNA was shown [90]. The difference in light emission kinetics between the Ca2+ -triggered BL reaction of aequorin and the alkaline phosphatase (AP)-catalyzed CL hydrolysis of dioxetane-based substrates has been exploited for the analysis of both alleles of biallelic polymorphisms in a single microtiter well. Genomic DNA isolated from blood was first subjected to PCR. Then a single-oligonucleotide ligation reaction employing two allele-specific probes, labeled with biotin and digoxigenin, and a common probe carrying a characteristic tail was performed. The ligation product was captured in the microtiter well by hybridization of the tail with an immobilized complementary oligonucleotide. The products were detected by adding a mixture of streptavidin–aequorin complex and anti-dig–alkaline phosphatase conjugate. The ratio of the luminescence signals obtained from AP and aequorin gave the genotype of each sample [91].
6 Other Binding Assays
A different type of binding assay that exploits mutants of the EGFP was developed for the detection of bacterial endotoxin, or lipopolysaccharide (LPS). EGFP variants containing binding sites for LPS and lipid A (LA), the bioactive component of LPS, were obtained by virtual mutagenesis. DNA mutant constructs were expressed in COS-1 cells. LPS or LA binding to the EGFP mutant proteins caused concentrationdependent fluorescence quenching. Thus, the EGFP mutant can represent the basis of a novel fluorescent biosensor for bacterial endotoxin [92]. In another study it was demonstrated that the EGFP mutant can specifically tag gram-negative bacteria such as Escherichia coli and Pseudomonas aeruginosa in contaminated environmental water samples. This dual function in detecting both free endotoxin and live gramnegative bacteria extends the potential of this novel fluorescent biosensor [92].
18 Luminescent Proteins in Binding Assays
GFP mutants were designed, created, and characterized after identifying potential metal-binding sites on the surface of GFP. These metal-binding mutants of GFP exhibit fluorescence quenching at lower transition metal ion concentrations than those of the wild-type protein, thus representing a new class of luminescent protein-based metal sensors [93].
7 Conclusions
The naturally occurring luminescent proteins aequorin, obelin, GFP, luciferases, and their mutants have proved to be highly effective labels in the development of different types of binding assays, including protein–protein and protein–ligand interaction assays, immunoassays, biotin–avidin-based assays, and nucleic acid hybridization assays. These proteins can be detected down to very low levels, thus allowing ultrasensitive detection of the target analytes. This also enables the analysis of small-volume samples, which leads to the development of miniaturized and high-throughput assays. Further advantages include the opportunities to produce fusion proteins and to genetically engineer the native proteins to emit luminescence signals at different wavelengths or with higher efficiency. These features, along with the discovery of new luminescent proteins, allow for their use as labels in the simultaneous array detection of several biomolecules in a given sample, as well as for the development of improved FRET- and BRET-based assays.
References 1 DIONNE, P., CARON, M., LABONTE` , A., CARTER-ALLEN, K., HOULE, B., JOLY, E., TAYLOR, S. C., MENARD, L. BRET2 : Efficient energy transfer from Renilla luciferase to GFP2 to measure protein-protein interactions and intracellular signaling events in living cells. In Luminescence biotechnology. Instruments and applications (Van DYKE, K., Van DYKE, C., WOODFORK, K., Eds.), CRC Press, Boca Raton, pp. 539–555, 2002. 2 EIDNE, K. A., KROEGER, K. M., HANYALOGLU, A. C. Applications of novel resonance energy transfer techniques to study dynamic hormone receptor interactions in living cells. Trends Endocrinol. Metab. 2002, 13, 415–421. 3 BOUTE, N., JOCKERS, R., ISSAD, T. The use of resonance energy transfer in
high-througput screening: BRET versus FRET. Trends Pharmacol. Sci. 2002, 23, 351–354. 4 BERGLUND, M. M., SCHOBER, D. A., ESTERMAN, M. A., GEHLERT, D. R. Neuropeptide Y Y4 receptor homodimers dissociate upon agonist stimulation. J. Pharmacol. Exp. Ther. 2003, 307, 1120–1126. 5 PFEIFFER, M., KIRSCHT, S., STUMM, R., KOCH, T., WU, D., LAUGSCH, M., SCHRODER, H., HOLLT, V., SCHULTZ, S. Heterodimerization of substance P and µ-opioid repeptors regulates receptor trafficking and resensitization. J. Biol. Chem. 2003, 278, 51630–51637. 6 SVARTZ, J., BLOMGRAN, R., HAMMARSTROM, S., SODERSTROM, M. Leukotriene C4 synthase homo-oligomers detected in living cells
References
7
8
9
10
11
12
13
14
by bio luminescence resonance energy trans fer. Biochim. Biophys. Acta 2003, 21, 90–95. KAMIYA, T., SAITOH, O., YOSHIOKA, K., NAKATA, H. Oligomerization of adenosine A2A and dopamine D2 receptors in living cells. Biochem. Biophys. Res. Commun. 2003, 306, 544–549. CALLEJA, V., AMEER-BEG, S. M., VOJNOVIC, B., WOSCHOLSKI, R., DOWNWARD, J., LARIJANI, B. Monitoring conformational changes of proteins in cells by fluorescence lifetime imaging microscopy. Biochem. J. 2003, 372, 33–40. BABCOCK, G. J., FARZAN, M., SODROSKI, J. Ligand-independent dimerization of CXCR4, a principal HIV-1 coreceptor. J. Biol. Chem. 2003, 278, 3378–3385. TERRILLON, S., DURROUX, T., MOUILLAC, B., BREIT, A., AYOUB, M. A., TAULAN, M., JOCKERS, R., BARBERIS, C., BOUVIER, M. Oxytocin and vasopressin V1a and V2 receptors form constitutive homo- and heterodimers during biosynthesis. Mol. Endocrinol. 2003, 17, 677–691. BOUTE, N., BOUBEKEUR, S., LACASA, D., ISSAD, T. Dynamics of the interaction between the insulin receptor and protein tyrosine-phosphatase 1B in living cells. EMBO Rep. 2003, 4, 313–319. GERMAIN-DESPREZ, D., BAZINET, M., BOUVIER, M., AUBRY, M. Oligomerization of transcriptional intermediary factor 1 regulators and interaction with ZNF74 nuclear matrix protein revealed by bioluminescence resonance energy transfer in living cells. J. Biol. Chem. 2003, 278, 22367–22373. TRAMIER, M., GAUTIER, I., PIOLOT, T., RAVALET, S., KEMNITZ, K., COPPEY, J., DURIEUX, C., MIGNOTTE, V., COPPEY-MOISAN, M. Picosecond-hetero-FRET microscopy to probe protein-protein interactions in live cells. Biophys. J. 2002, 83, 3570–3577. LATIF, R., GRAVES, P., DAVIES, T. F. Ligand-dependent inhibition of oligomerization at the human tyrotropin receptor. J. Biol. Chem. 2002, 277, 45059–45067.
15 NG, E. K., CHAN, K. K., WONG, C. H., TSUI, S. K., NGAI, S. M., LEE, S. M., KOTAKA, M., LEE, C. Y., WAYE, M. M., FUNG, K. P. Interaction of the heart-specifc LIM domain protein, FHL2, with DNA-binding nuclear protein, hNP220. J. Cell Biochem 2002, 84, 556–566. 16 AYOUB, M. A., CONTURIER, C., LUCAS-MEUNIER, E., ANGERS, S., FOISSER, P., BOUVIER, M., JOCKERS, R. Monitoring of ligand-independent dimerization and ligand-induced conformational changes of melatonin receptors in living cells by bioluminescence resonance energy transfer. J. Biol. Chem. 2002, 277, 21522–21528. 17 RAMSAY, D., KELLETT, E., MCVEY, M., REES, S., MILLIGAN, G. Homo- and hetero-oligomeric interactions between G-protein coupled receptors in living cells monitored by two variants of bioluminescence resonance energy transfer (BRET): hetero-oligomers between receptor subtypes form more efficiently than between less closely related sequences. Biochem. J. 2002, 365, 429–440. 18 YOSHIOKA, K., SAITOH, O., NAKATA, H. Agonist-promoted heteromeric oligomerization between adenosine A(1) and P2Y(1) receptors in living cells. FEBS Lett. 2002, 523, 147–151. 19 KRAFT, K., OLBRICH, H., MAJOUL, I., MACK, M., PROUDFOOT, A., OPPERMANN, M. Characterization of sequence determinants within the carboxyl-terminal domain of chemokine receptor CCR5 that regulate signaling and receptor internalization. J. Biol. Chem. 2001, 276, 34408–34418. 20 DINGER, M. C., BADER, J. E., KOBOR, A. D., KRETZSCHMAR, A. K., BECK-SICKINGER, A. G. Homodimerization of neuropeptide y receptors investigated by fluorescence resonance energy transfer in living cells. J. Biol. Chem. 2003, 278, 10562–10571. 21 LATIF, R., GRAVES, P., DAVIES, T. F. Oligomerization of the human thyrotropin receptor: fluorescent protein-tagged hTSHR reveals post-translational complexes. J. Biol. Chem. 2001, 276, 45217–45224.
19
20 Luminescent Proteins in Binding Assays
22 VALENZUELA-FERNANDEZ, A., PALANCHE, T., AMARA, A., MAGERUS, A., ALTMEYER, R., DELAUNAY, T., VIRELIZIER, J. L., BALEUX, F., GALZI, J. L., ARENZANA-SEISDEDOS, F. Optimal inhibition of X4 HIV isolates by the CXC chemokine stromal cell-derived factor 1 alpha requires interaction with cell surface heparan sulfate proteoglycans. J. Biol. Chem. 2001, 276, 26550–26558. 23 KRETZSCHMAR, A. K., DINGER, M. C., HENZE, C., BROCKE-HEIDRICH, K., HORN, F. Analysis of Stat3 (signal transducer and activator of transcription 3) dimerization by fluorescence resonance energy transfer in living cells. Biochem. J. 2004, 377, 289–297. 24 JANETOPOULOS, C., DEVREOTES, P. Monitoring receptor-mediated activation of heterotrimeric G-proteins by fluo-rescence resonance energy transfer. Methods 2002, 27, 366–373. 25 BISKUP, C., ZIMMER, T., BENNDORF, K. FRET between cardiac Na+ channel subunits measured with a confocal microscope and a streak camera. Nat. Biotechnol. 2004, 22, 220–224. 26 HE, L., BRADRICK, T. D., KARPOVA, T. S., WU, X., FOX, M. H., FISCHER, R., MCNALLY, J. G., KNUTSON, J. R., GRAMMER, A. C., LIPSKY, P. E. Flow cytometric measurement of fluorescence (Forster) resonance energy transfer from cyan fluorescent protein to yellow fluorescent protein using single-laser excitation at 458 nm. Cytometry 2003, 53, 39–54. 27 XU, Y., PISTON, D. W., JOHNSON, C. H. A bioluminescence resonance energy transfer (BRET) system: application to interacting circadian clock proteins. Proc. Natl. Acad. Sci. USA 1999, 96, 151–156. 28 MCVEY, M., RAMSAY, D., KELLETT, E., REES, S., WILSON, S., POPE, A. J., MILLIGAN, G. Monitoring receptor oligomerization using time-resolved fluorescence resonance energy transfer and bioluminescence resonance energy transfer. The human delta-opioid receptor displays constitutive oligomerization at the cell surface, which is not regulated by receptor occupancy. J. Biol. Chem. 2001, 276, 14092–10499.
29 KROEGER, K. M., EIDNE, K. A. Study of g-protein-coupled receptor-protein interactions by bioluminescence resonance energy transfer. Methods Mol. Biol. 2004, 259, 323–334. 30 CHENG, Z. J., MILLER, L. J. Agonist-dependent dissociation of oligomeric complexes of G protein-coupled cholecystokinin receptors demonstrated in living cells using bioluminescence resonance energy transfer. J. Biol. Chem. 2001, 276, 48040–48047. 31 ANGERS, S., SALAHPOUR, A., BOUVIER, M. Biochemical and biophysical demon-stra tion of GPCR oligomerization in mammalian cells. Life Sci. 2001, 68, 2243–2250. 32 HASBI, A., DEVOST, D., LAPORTE, S. A., ZINGG, H. H. Real time detection of interactions between the human oxytocin receptor and G protein-coupled receptor kinase-2. Mol. Endocrinol 2004, 18, 1277–1286. 33 BOUTE, N., PERNET, K., ISSAD, T. Monitoring the activation state of the insulin receptor using bioluminescence resonance energy transfer. Mol. Pharmacol. 2001, 60, 640–645. 34 MUDDANA, S. S., PETERSON, B. R. Fluorescent cellular sensors of steroid receptor ligands. Chembiochem. 2003, 4, 848–855. 35 GAUTIER, I., TRAMIER, M., DURIEUX, C., COPPEY, J., PANSU, R. B., NICOLAS, J. C., KEMNITZ, K., COPPEY-MOISAN, M. Homo-FRET microscopy in living cells to measure monomer-dimer transition of GFP-tagged proteins. Biophys. J. 2001, 80, 3000–3008. 36 GRIGORENKO, V., ANDREEVA, I., BORCHERS, T., SPENER, F., EGOROV, A. A genetically engineered fusion protein with horseradish peroxidase as a marker enzyme for use in competitive immunoassays. Anal. Chem. 2001, 73, 1134–1139. 37 ERIKAKU, T., ZENNO, S., INOUYE, S. Bioluminescent immunoassay using a monomeric Fab -photoprotein aequorin conjugate. Biochem. Biophys. Res. Commun. 1991, 174, 1331–1336.
References 38 STULTS, N. L., STOCKS, N. F., RIVERA, H., GRAY, J., MCCANN, R. O., O’KANE, D., CUMMINGS, R. D., CORMIER, M. J., SMITH, D. F. Use of recombinant biotinylated aequorin in microtiter and membrane-based assays: purification of recombinant apoaequorin from Escherichia coli. Biochemistry. 1992, 31, 1433–1442. 39 ZATTA, P. F. A new bioluminescent assay for studies of protein G and protein A binding to IgG and IgM. J. Biochem. Biophys. Methods. 1996, 32, 7–13. 40 MIRASOLI, M., DEO, S. K., LEWIS, J. C., RODA, A., DAUNERT, S. Bioluminescence immunoassay for cortisol using recombi-nant aequorin as a label. Anal. Biochem. 2002, 306, 204–211. 41 DESAI, U. A., DEO, S. K., HYLAND, K. V., POON, M., DAUNERT, S. Determination of prostacyclin in plasma through a bioluminescent immunoassay for 6-keto-prostaglandin F1alpha: implication of dosage in patients with primary pulmonary hypertension. Anal. Chem. 2002, 74, 3892–3898. 42 DEO, S. K., DAUNERT, S. Green fluorescent protein mutant as label in homogeneous assays for biomolecules. Anal. Biochem. 2001a, 289, 52–59. 43 VALDIVIESO-GARCIA, A., DESRUISSEAU, A., RICHE, E., FUKUDA, S., TATSUMI, H. Evaluation of a 24-hour bioluminescent enzyme immunoassay for the rapid detection of Salmonella in chicken carcass rinses. J. Food Prot. 2003, 66, 1996–2004. 44 FRANK, L. A., PETUNIN, A. I., VYSOTSKI, E. S. Bioluminescent immunoassay of thyro-tropin and thyroxine using obelin as a label. Anal. Biochem. 2004, 325, 240–246. 45 LEWIS, J. C., DAUNERT, S. Bioluminescence immunoassay for thyroxine employing genetically engineered mutant aequorins containing unique cysteine residues. Anal. Chem. 2001, 73, 3227–3233. 46 MISKA, W., GEIGER, R. A new type of ultrasensitive bioluminogenic enzyme substrates. I. Enzyme substrates with D-luciferin as leaving group. Biol. Chem. Hoppe. Seyler. 1988, 369, 407–411.
47 MAEDA, M. New label enzymes for bioluminescent enzyme immunoassay. J. Pharm. Biomed. Anal. 2003, 30, 1725–1734. 48 AOKI, T., TAKAHASHI, Y., KOCH, K. S., LEFFERT, H. L., WATABE, H. Construction of a fusion protein between protein A and green fluorescent protein and its application to western blotting. FEBS Lett. 1996, 384, 193–197. 49 FRANK, L. A., ILLARIONOVA, V. A., VYSOTSKI, E. S. Use of proZZ-obelin fusion protein in bioluminescent immunoassay. Biochem. Biophys. Res. Commun. 1996, 219, 475–479. 50 CASEY, J. L., COLEY, A. M., TILLEY, L. M., FOLEY, M. Green fluorescent antibodies: novel in vitro tools. Protein Eng. 2000, 13, 445–452. 51 SCHWALBACH, G., SIBLER, A. P., CHOULIER, L., DERYCKERE, F., WEISS, E. Production of fluorescent single-chain antibody fragments in Escherichia coli. Protein Expr. Purif. 2000, 18, 121–132. 52 KIM, I. S., SHIM, J. H., SUH, Y. T., YAU, K. Y., HALL, J. C., TREVORS, J. T., LEE, H. Green fluorescent protein-labeled recombinant fluobody for detecting the picloram herbicide. Biosci. Biotechnol. Biochem. 2002, 66, 1148–1151. 53 ZEYTUN, A., JEROMIN, A., SCALETTAR, B. A., WALDO, G. S., BRADBURY, A. R. Fluorobodies combine GFP fluorescence with the binding characteristics of antibodies. Nat. Biotechnol. 2003, 21, 1473–1479. 54 KARP, M., LINDQVIST, C., NISSINEN, R., WAHLBECK, S., AKERMAN, K., OKER-BLOM, C. Identification of biotinylated molecules using a baculovirus-expressed luciferase-streptavidin fusion protein. Biotechniques 1996, 20, 452–459. 55 CHANG, T. S., WAN, H. M., CHEN, C. C., GIRIDHAR, R., WU, W. T. Fusion protein of the hyaluronan binding domain from human TSG-6 with luciferase for assay of hyaluronan. Biotechnol. Lett. 2003, 25, 1037–1040. 56 KARP, M., OKER-BLOM, C. A streptavidin-luciferase fusion protein: comparisons and applications. Biomol. Eng. 1999, 16, 101–104.
21
22 Luminescent Proteins in Binding Assays
57 TATSUMI, H., FUKUDA, S., KIKUCHI, M., KOYAMA, Y. Construction of biotinylated fireflyluciferases using biotin acceptor peptides. Anal. Biochem. 1996, 243, 176–180. 58 SETO, Y., IBA, T., ABE, K. Development of ultra-high sensitivity bioluminescent enzyme immunoassay for prostate-specific antigen (PSA) using firefly luciferase. Luminescence 2001a, 16, 285–290. 59 SETO, Y., OHKUMA, H., TAKAYASU, S., IBA, T., UMEDA, A., ABE, K. Development of highly sensitive bioluminescent enzyme immunoassay with ultra-wide measurable range for thyroid-stimulating hormone using firefly luciferase. Anal. Chim. Acta 2001b, 429, 19–26. 60 RAMANATHAN, S., LEWIS, J. C., KINDY, M. S., DAUNERT, S. Heterogeneous bio luminescence binding assay for an octa-peptide using recombinant aequorin. Anal. Chim. Acta 1998, 369, 181–188. 61 DEO, S. K., DAUNERT, S. An immunoassay for Leu-enkephalin based on a C-terminal aequorin-peptide fusion. Anal. Chem. 2001b, 73, 1903–1908. 62 DESAI, U. A., WININGER, J. A., LEWIS, J. C., RAMANATHAN, S., DAUNERT, S. Using epitope-aequorin conjugate recognition in immunoassays for complex proteins. Anal. Biochem. 2001, 294, 132–140. 63 MATVEEV, S. V., LEWIS, J. C., DAUNERT, S. Genetically engineered obelin as a bioluminescent label in an assay for a peptide. Anal. Biochem. 1999, 270, 69–74. 64 ADAMCZYK, M., MOORE, J. A., SHREDER, K. Dual analyte detection using tandem flash luminescence. Bioorg. Med. Chem. Lett. 2002, 12, 395–398. 65 LEWIS, J. C., DAUNERT, S. Dual detection of peptides in a fluorescence binding assay by employing genetically fused GFP and BFP mutants. Anal. Chem. 1999, 71, 4321–4327. 66 WHITE, S. R., CHIU, N. H., CHRISTOPOULOS, T. K. Expression immunoassay. Methods 2000, 22, 24–32. 67 CHIU, N. H., CHRISTOPOULOS, T. K. Two-site expression immunoassay using
68
69
70
71
72
73
74
75
76
a firefly luciferase-coding DNA label. Clin. Chem. 1999, 45, 1954–1959. TANNOUS, B. A., CHIU, N. H., CHRISTOPOULOS, T. K. Heterobifunctional linker between antibodies and reporter genes for immunoassay development. Anal. Chim. Acta 2002, 459, 169–176. ARAI, R., UEDA, H., TSUMOTO, K., MAHONEY, W. C., KUMAGAI, I., NAGAMUNE, T. Fluorolabeling of antibody variable domains with green fluorescent protein variants: applica tion to an energy transfer-based homogeneous immunoassay. Protein Eng. 2000, 13, 369–376. ARAI, R., NAKAGAWA, H., TSUMOTO, K., MAHONEY, W., KUMAGAI, I., UEDA, H., NAGAMUNE, T. Demonstration of a homogeneous noncompetitive immunoassay based on bioluminescence resonance energy transfer. Anal. Biochem. 2001, 289, 77–81. OHIRO, Y., ARAI, R., UEDA, H., NAGAMUNE, T. A homogeneous and noncompetitive immunoassay based on the enhanced fluorescence resonance energy transfer by leucine zipper interaction. Anal. Chem. 2002, 74, 5786–5792. LEWIS, J. C., DAUNERT, S. Photoproteins as luminescent labels in binding assays. Fresenius J. Anal. Chem. 2000, 366, 760–768. WITKOWSKI, A., RAMANATHAN, S., DAUNERT, S. Bioluminescence binding assay for biotin with attomole detection based on recombinant aequorin. Anal. Chem. 1994, 66, 1837–1840. FELTUS, A., RAMANATHAN, S., DAUNERT, S. Interaction of immobilized avidin with an aequorin-biotin conjugate: an aequorin-linked assay for biotin. Anal. Biochem. 1997, 254, 62–68. GROSVENOR, A. L., FELTUS, A., CONOVER, R. C., DAUNERT, S., ANDERSON, K. W. Development of binding assays in microfabricated picoliter vials: an assay for biotin. Anal. Chem. 2000, 72, 2590–2594. CROFCHECK, C. L., GROSVENOR, A. L., ANDERSON, K. W., LUMPP, J. K., SCOTT, D.
References
77
78
79
80
81
82
83
L., DAUNERT, S. Detecting biomolecules in picoliter vials using aequorin bioluminescence. Anal. Chem. 1997, 69, 4768–4772. FELTUS, A., GROSVENOR, A. L., CONOVER, R. C., ANDERSON, K. W., DAUNERT, S. Detection of biotin in individual sea urchin oocytes using a bioluminescence binding assay. Anal. Chem. 2001, 73, 1403–1407. HERNANDEZ, E. C., DAUNERT, S. Recombinant green fluorescent protein as a label in binding assays. Anal. Biochem. 1998, 261, 113–115. ADAMCZYK, M., MOORE, J. A., SHREDER, K. Quenching of biotinylated aequorin bioluminescence by dye-labeled avidin conjugates: application to homogeneous bioluminescence resonance energy transfer assays. Org. Lett. 2001, 3, 1797–1800. GOROKHOVATSKY, A. Y., RUDENKO, N. V., MARCHENKOV, V. V., SKOSYREV, V. S., ARZHANOV, M. A., BURKHARDT, N., ZAKHAROV, M. V., SEMISOTNOV, G. V., VINOKUROV, L. M., ALAKHOV, Y. B. Homogeneous assay for biotin based on Aequorea victoria bioluminescence resonance energy transfer system. Anal. Biochem. 2003, 313, 68–75. GALVAN, B., CHRISTOPOULOS, T. K. Bioluminescence hybridization assays using recombinant aequorin. Application to the detection of prostate-specific antigen mRNA. Anal. Chem. 1996, 68, 3545–3550. XIAO, L., YANG, C., NELSON, C. O., HOLLOWAY, B. P., UDHAYAKUMAR, V., LAL, A. A. Quantitation of RT-PCR amplified cytokine mRNA by aequorin-based bioluminescence immunoassay. J. Immunol. Methods 1996, 199, 139–147. ACTOR, J. K., KUFFNER, T., DEZZUTTI, C. S., HUNTER, R. L., MCNICHOLL, J. M. A flash-type bioluminescent immunoassay that is more sensitive than radioimaging: quantitative detection of cytokine cDNA in activated and resting human cells. J. Immunol. Methods 1998, 211, 65–77.
84 VERHAEGEN, M., CHRISTOPOULOS, T. K. Quantitative polymerase chain reaction based on a dual-analyte chemiluminescence hybridization assay for target DNA and internal standard. Anal. Chem. 1998, 70, 4120–4125. 85 LAIOS, E., IOANNOU, P. C., CHRISTOPOULOS, T. K. Novel hybridization assay configurations based on in vitro expression of DNA reporter molecules. Clin. Biochem. 1998, 31, 151–158. 86 WHITE, S. R., CHRISTOPOULOS, T. K. Signal amplification system for DNA hybridization assays based on in vitro expression of a DNA label encoding apoaequorin. Nucleic Acids Res. 1999, 27, e25i–viii. 87 LAIOS, E., IOANNOU, P. C., CHRISTOPOULOS, T. K. Enzyme-amplified aequorin-based bioluminometric hybridization assays. Anal. Chem. 2001, 73, 689–692. 88 VERHAEGEN, M., CHRISTOPOULOS, T. K. Bacterial expression of in vivo-biotinylated aequorin for direct application to bioluminometric hybridization assays. Anal. Biochem. 2002a, 306, 314–322. 89 GLYNOU, K., IOANNOU, P. C., CHRISTOPOULOS, T. K. Affinity capture-facilitated preparation of aequorin-oligonucleotide conjugates for rapid hybridization assays. Bioconjugate Chem. 2003, 14, 1024–1029. 90 VERHAEGEN, M., CHRISTOPOULOS, T. K. Recombinant Gaussia luciferase. Overexpression, purification, and analytical application of a biolumi nes-cent reporter for DNA hybridization. Anal. Chem. 2002b, 74, 4378–4385. 91 TANNOUS, B. A., VERHAEGEN, M., CHRISTOPOULOS, T. K., KOURAKLI, A. Combined flash- and glow-type chemiluminescent reactions for high-throughput genotyping of biallelic polymorphisms. Anal. Biochem. 2003, 320, 266–272. 92 GOH, Y. Y., FRECER, V., HO, B., DING, J. L. Rational design of green fluorescent protein mutants as biosensor for
23
24 Luminescent Proteins in Binding Assays
bacterial endotoxin. Protein Eng. 2002a, 15, 493–502. 93 RICHMOND, T. A., TAKAHASHI, T. T., SHIMKHADA, R., BERNSDORF, J. Engineered metal binding sites on green fluorescence protein. Biochem. Biophys.
Res. Commun. 2000, 268, 462–465. 94 GOH, Y. Y., HO, B., DING, J. L. A novel fluorescent protein-based biosensor for gram-negative bacteria. Appl. Environ. Microbiol. 2002b, 68, 6343–6352.
1
Sensory Defects Caused by Mutant Motor Proteins Karen B. Avraham
Tel Aviv University, Tel Aviv, Israel
Originally published in: Molecular Motors. Edited by Manfred Schliwa. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52730594-0
1 Introduction
Mutations in a multitude of molecular motors cause sensory defects. The loss of these senses, in particular vision and hearing, is not life-threatening, but it can severely affect one’s quality of life. The myosin mutations associated with hearing loss affect the hair cells of the cochlea, and the mutations associated with visual impairment affect the retinal pigment epithelial (RPE) cells and photoreceptors of the retina in a disease termed retinitis pigmentosa (RP) (Fig. 1). Mutations in five myosins, myosin IIIA, myosin VI, myosin VIIA, myosin XV, and the nonmuscle-myosin heavy-chain MYH9 are known to date to lead to hearing loss, while myosin VIIA mutations also cause RP and hearing loss in Usher syndrome type IB. This chapter will describe these genes, the proteins they encode, and the pathology caused by mutations in these genes.
2 Development of the Visual and Auditory Sensory Systems
The three major sensory organs of the head, namely the eye, the ear and the nose, develop from interactions between neural crest cells and epidermal thickenings, the cranial ectodermal placodes. The organs of the inner ear develop from the otic placode [1]. The human inner ear is composed of the cochlea, the organ for hearing, and the vestibular apparatus, formed from five organs, the saccule, utricle, and the three semicircular canals. Beginning at or about embryonic day 22 in humans, a thickening of the surface ectoderm occurs on either side of the rhombencephalon. Through a series of signaling mechanisms, interaction, and incorporation of other Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Sensory Defects Caused by Mutant Motor Proteins
Fig. 1 The inner ear and eye structures affected by mutations in myosins. (A) The inner ear is composed of the cochlea, the organ responsible for hearing, and the vestibular apparatus, made up of five organs (three semicircular canals, utricle, and saccule) responsible for the position and
movement of our head in space. Degeneration of the inner (IHC) and outer hair cells (OHC) of the organ of Corti often leads to hearing loss. (B) The retina of the eye contains photoreceptors, rods and cones, which along with the pigment epithelium, degenerate to cause retinitis pigmentosa.
embryonic tissues, the otic placode develops into the otocyst or otic vesicle. Cells from the medial aspect of each otocyst, along with neural crest cells, develop into both the cochlear and vestibular statoacoustic ganglion. The fluid-filled portion of the otocyst grows into the endolymphatic sacs, the utricle and the saccule. The vestibular and cochlear components separate as the cochlear duct is formed in the fifth week. Differentiation of cochlear cells begins in the seventh week, as well as outgrowths from the utricle. Programmed cell death of these outgrowths results in the formation of the three semicircular canals. Ganglion cells from the VIIIth cranial nerve migrate along the cochlear coils to form the spiral cochlear ganglion, whose processes terminate on the organ of Corti hair cells. The otic capsule is formed beginning in the ninth week. The cochlear duct begins to take on the shape
3 Visual Impairment
it has as in the adult as fluid-filled compartments are formed. Ossification of the otic capsule continues, until the inner ear, with an adult size and configuration, is complete at about week 22. The human eye develops from the optic vesicles that begin as two bulges from the lateral walls of the diencephalon and the overlying surface ectoderm at embryonic day 22 [2]. As the vesicles contact the surface ectoderm, this region thickens to form the lens placode that will differentiate into the lens. The optic vesicles invaginate to form the two layers of the optic cup, which differentiates to form the pigmented retina (the outer layer) and the neural retina (the inner layer), which is composed of glia, ganglion neurons, interneurons and light-sensitive photoreceptor neurons. Differentiation of the neural retina, the lens, and the cornea continues throughout embryogenesis.
3 Visual Impairment
Visual impairment can result from a number of eye abnormalities, including myopia, glaucoma, heterochromia irides, lens opacities, RP and optic atrophy and macular degeneration, to name a few [3]. The leading cause of vision impairment and blindness is age-related eye diseases, including age-related macular degeneration, cataract, glaucoma and diabetic retinopathy. RP is a disease of the rod and cones characterized by night blindness, visual field constriction, acuity loss and abnormal retinograms (ERGs). The loss can involve only peripheral vision, or can lead to total blindness. RP is a genetically heterogeneous disease that involves mutations in numerous genes and is an isolated clinical phenotype or is associated with several syndromes http://www.sph.uth.tmc.edu/RetNet/; [4]. Genes with mutations leading to RP includes rhodopsin and 35 other genes; the proteins they encode have functions ranging from retinal development, protein trafficking and photoreceptor structure, to visual transduction. The chromosomal locations of an additional 24 RPs are known, although their causative genes remain to be discovered. The syndromes associated with RP include Usher syndrome (for example, OMIM 276900; http://www.ncbi.nlm.nih.gov/Omim), Bardet-Biedl syndrome (OMIM 209900) and Leber congenital amaurosis (LCA; OMIM 204000). Half of the RP cases are sporadic, with the remaining having a dominant, recessive, or X-linked mode of inheritance. The cells most affected by RP are the retinal rod photoreceptors, one class of neurons in the mammalian retina. RP is a progressive disease, often beginning with loss of night vision and leading to complete blindness. One of the predominant forms of blindness is Usher syndrome (USH), a disease involving both hearing loss and blindness, first described by Usher in 1913 [5]. Several clinical types have been described based on the age of onset of RP, the cause of blindness; the age of onset and severity of hearing loss; and the presence or absence of vestibular dysfunction. The most severe form, USH type I (USH1), involves profound congenital sensorineural hearing loss with vestibular dsyfunction and an onset of RP at puberty. USH1 loci have been found on seven chromosomes, including
3
4 Sensory Defects Caused by Mutant Motor Proteins
14q32 (USH1A) [6], 11q13 (USH1B), 11p15 (USH1C) [7], 10q (USH1D) [8], 21q (USH1E), 10q21–22 (USH1F), and 17q24–25 (USH1G). The genes for four of these loci have been identified. The USH1B gene is myosin VII (MYO7A) [9], the USH1C gene encodes a PDZ-containing protein named USH1C [10, 11], the USH1D gene is caderin 23 [12, 13], and the USH1F gene is protocad-herin 15 [14, 15]. USH2 is characterized by mild to severe hearing loss, normal vestibular response, and RP. The three forms of USH2 are localized on chromosome 1q41 (USH2A) [16], on 3p23–24.2 (USH2B) [17] and on chromosome 5q14.3–q21.3 (USH2C) [18]. The gene for USH2A encodes a novel protein with laminin epidermal growth factor and fibronectin type III motifs [19]. To date, there is only one form known for USH3, characterized by post-lingual progressive hearing loss, progressive visual loss due to RP, and variable presence of vestibular dysfunction [20]. The gene encodes a protein named clarin-1, a novel protein with homology to stargazin [21, 22]. Of all the USH genes, only one, myosin VIIA, encodes a molecular motor. This gene was isolated with the help of a mouse model for human deafness, shaker 1 (sh1). The locus for this mouse maps to chromosome 7 in a region homologous with human chromosome 11. Mutations in the myosin VIIA gene were found concurrently in both the sh1 mouse [23] and USH1B patients [9]. Two years later, myosin VIIA mutations were found in both dominant and recessive forms of human deafness (see Section 5.1.4).
4 Hearing Impairment
Hearing loss is considered to be the most common sensory loss, affecting 1 in 1000 newborns and up to half of the population at the age of 80 years. Approximately 60% of hearing loss is due to genetic mutations. Hearing loss is genetically heterogeneous, since it is caused by mutations in many different genes [24]. Furthermore, hearing loss is most often monogenic in nature; namely, mutations in only one gene cause deafness in any given individual (or family). Genetic deafness occurs both in association with other signs in the form of syndromic hearing loss (SHL), or as an isolated finding, non-syndromic hearing loss (NSHL). It is estimated that 30% of genetic hearing loss is associated with syndromes. There are over 500 clinically defined syndromes that include hearing loss. Hearing loss has been found associated with diabetes, peripheral neuropathy, craniofacial abnormalities, and dwarfism, to name just a few. Hearing loss is a major feature in a number of prevalent syndromes, including Pendred syndrome (with goitre), Waardenburg syndrome (WS, with pigmentary anomalies and widely-spaced eyes), Alport syndrome (with kidney defects), branchio–oto–renal (BOR) syndrome (with craniofacial and kidney defects), and Usher syndrome (with retinitis pigmentosa). The genes for some of these forms of SHL have been identified, including SLC26A4 [25] (Pendred syndrome), PAX3 [26] (WS type I and III), MITF [27] (WS type II), EDNRB [28], EDN3 [29], and SOX10 [30] (WS type IV), EYA1 [31] (BOR syndrome), COL4A5 [32], COL4A3, and COL4A4 [33] (X-linked and autosomal Alport syndrome).
5 Myosins Involved In Sensory Defects 5
Approximately 70 % of genetic hearing loss is non-syndromic in nature. The largest proportion is inherited in an autosomal recessive mode (∼80%; defined as DFNB loci), ∼18% is inherited in an autosomal dominant mode (defined as DFNA loci), and ∼ 2 % is X-linked (defined as DFN loci). Mitochondrial/maternal inheritance also contributes to a small (1%) proportion of NSHL. The chromosomal locations for 70 loci have already been found for NSHL (updated regularly in the Hereditary Hearing Loss Homepage, http://dnalab-www.uia.ac.be/dnalab/hhh/). A large variety of genes are associated with NSHL. These include myosins (see Sections 5), transcription factors, connexins, ion channels, and more. The most prevalent causative gene for NSHL is connexin 26 (GJB2) [34], which codes for a gap junction protein. Connexin 26 mutations account for 30–50% of inherited hearing loss in many parts of the world [35–39]. Another three connexins, connexin 30 (GJB6) [40–42], connexin 31 (GJB3) [43], and connexin 43 (GJA1) [44], are also associated with human hearing loss, but they occur with much less frequency.
5 Myosins Involved In Sensory Defects
To date, there are five myosins associated with human and mouse hearing loss. Myosins were among the first genes cloned for NSHL in mice. Myosin VIIA mutations were discovered in sh1 mice (see Section 5.1.2) and in USH1B simultaneously, followed by the discovery of myosin VI mutations in Snell’s waltzer mice (sv) [45]; see Section 5.2.2). Five years later, a myosin VI mutation was found in a family with dominant hearing loss [46],; see Section 5.2.3). Mutations in myosin XV were discovered simultaneously in mice and humans [47, 48]; see Sections 5.3.2–5.3.3). A mutation in MYH9 was identified in a family with autosomal dominant hearing loss [49]; see Section 5.4.2). And most recently, three mutations in the myosin IIIA gene were found in a family with autosomal recessive hearing loss [50]; see Section 5.5.3). Of all these molecular motors, only myosin VIIA is currently known to be associated with a defect in the visual system, although myosin IIIA is also expressed in the retina. 5.1 Myosin VIIA
Myosin VII (OMIM 276903) was the seventh myosin to be discovered among the ‘unconventional’ myosins in a polymerase chain reaction (PCR) amplification designed to identify new myosin proteins [51]. Two isoforms of myosin VII are known, myosin VIIA and myosin VIIB [52]. The first myosin VIIA was identified in a human epithelial cell line and human leukocytes [51]. Since then, myosin VIIA has been discovered and characterized in mouse [23], zebrafish [53], C. elegans [54], and in the Dictyostelium discoideum amoeba (DdMVII) [55]. Mutations in myosin VIIA are associated with Usher syndrome type IB and NSHL in humans, NSHL
6 Sensory Defects Caused by Mutant Motor Proteins
Fig. 2 Myosin VIIA protein domain structure.
in mice, impaired phagocytosis in Dictyostelium, impaired spermatogenesis in C. elegans, and abnormal mechanotransduction in zebrafish. 5.1.1 Structure, function, and expression of myosin VIIA The human myosin VIIA (locus designation MYO7A) gene consists of 48 exons, spanning a genomic region of over 100 kb [56, 57]. The myosin VIIA protein consists of 2215 amino acids and has a predicted molecular weight of 254 kDa. The head/motor domain is 729 amino acids long, the neck is 126 amino acids in length, and the tail is 1360 amino acids long (Fig. 2). The unique myosin VIIA tail contains some distinguishing features, including a coiled-coil domain and two repeats of a set of a MyTH4 (myosin tail homology 4) domain and a FERM membrane-binding domain. The repeats are connected by an SH3 domain. Recently, several proteins have been shown to interact with myosin VIIA, revealed through the use of yeast two-hybrid screens. MyRIP is a novel Rab effector, and its interaction with myosin VIIA and Rab27A has led to the speculation that a complex composed of all three proteins bridges retinal melanosomes to the actin cytoskeleton and mediates the trafficking of these melanosomes [58]. Keap1, a human homolog of the Drosophila ring canal protein, kelch, interacts with myosin VIIA through the SH3 domain. While the interaction of myosin VIIA and keap1 was identified in a testes library, these two proteins are also co-expressed in the inner ear [59]. Myosin VIIA has also been shown to interact with the ype I alpha regulatory subunit (RI alpha) of protein kinase A [60]. The FERM domain is the region to which this A-kinase-anchoring protein binds. A retina yeast two-hybrid library was used to identify these interacting proteins and they were also found to both be expressed in the outer hair cells of the cochlea and rod photoreceptor cells of the retina. To date, a fourth myosin VIIA-interacting protein has been identified in the yeast two-hybrid library retina library, vezatin, a novel transmembrane protein that bridges myosin VIIA to the cadherin–catenins complex [61]. Using antibodies specific against myosin VIIA, it has been shown that this motor is expressed in the inner ear, retina, testes, lung, and kidney [58, 62]; Fig. 3. In frog, mouse, rat, and guinea pig inner ears, myosin VIIA is expressed in both the cochlea and vestibular epithelium [63]. The specific expression is in the cell bodies, the cuticular plate (the region immediately under the stereocilia) and along the length of the stereocilia, suggesting that this protein might anchor connectors between stereocilia and fulfill a structural role (Fig. 3). In the eye, myosin VIIA is expressed in the pigmented epithelium and the photoreceptors of the retina [58, 63, 64]. Myosin VIIA may act as a vesicle transporter in the eye, and be involved
5 Myosins Involved In Sensory Defects 7
Fig. 3 Myosin VIIA expression in the ear and eye [58]. (A) Immunolabelling of myosin VIIA in the inner (IHC) and outer hair cells (OHC) of a mouse inner ear. (B) Immunolabelling of myosin VIIa in the pigmented epithelium (E) and (C) photoreceptor (P) cells of the retina of a macaque eye.
in transport of melanosomes or opsin to required regions of the RPE cells that lack these molecules [65]. Based on its potential structural role, myosin VIIA may also be involved in actin organization, affecting melanosome distribution. Most intriguing, a role for myosin VIIA in auditory transduction has recently been described [66]. The mechanical sound wave normally enters the fluid-filled cochlea and is transduced into an electrochemical signal. The sound wave causes deflection of the hair bundles that project from the apical portion of the hair cells, composed of actin-rich structures named stereocilia (Fig. 1). Transduction channels between adjacent stereocilia open and close, depending on the tension of gating springs. A resting tension is maintained by an adaptation motor, allowing channels to be open even in the absence of stimuli. This element of auditory transduction allows for the incredible sensitivity and speed in the system. In shaker 1 mice lacking myosin VIIA (see Section 5.1.2), the resting tension is absent, requiring non-physiologically large bundle deflections to open the transduction channels. A structural role for myosin VIIA is supported since in this case it appears to participate in anchoring membrane-bound elements to the actin core of the stereocilia. 5.1.2 Shaker 1 Mice and Other Models The shaker 1 (sh1) mouse exhibits circling and deafness and is one of a large group of deaf mouse mutants (see Mouse Mutants with Hearing or Balance Defects, http://www.ihr.mrc.ac.uk/hereditary/mousemutants.htm). Ten recessive sh1 alleles have been discovered over the years, having arisen spontaneously or been generated by radiation. The sh1 locus lies on mouse chromosome 7, in the region homologous to human chromosome 11q13.5 (for regions of homology between human and mouse chromosomes, see Human-Mouse Homology Maps, http://www.ncbi.nlm.nih.gov/Homology/). During a positional cloning effort to clone the gene for sh1, a Myo7a exon was found in the sh1 region and since another unconventional myosin had been found in stereociliary bundles [67], myosin VIIA appeared to be a very good candidate for deafness in these mice. This was a key discovery in the identification of the USH1B gene (see Section 5.1.3). Since then, a spectrum of Myo7a mutations have been found in sh1 mice [68, 69]. Overtly,
8 Sensory Defects Caused by Mutant Motor Proteins
Fig. 4 An analysis of sh1 hair cells has revealed a progressive disorganization of the hair bundles. SEM of the surface of outer hair cells from 20-day old (A) control mice and (B) sh1 mutant homozygotes (Myo7a6J allele). Note disorganization of hair bundles. Adapted with permission from [69], Company of Biologists Ltd.
sh1 mice appear to have NSHL, along with vestibular dysfunction, as there is no evidence of visual problems. An analysis of sh1 hair cells has revealed a progressive disorganization of the hair bundles and degeneration (Fig. 4; [69]). The question of why sh1 mice do not develop RP has been an ongoing one. Although sh1 mice do not appear to have any retinal degeneration, they do have defective melanosome distribution in the RPE [65]. Furthermore, a number of alleles do have electroretinographic (ERG) anomalies, revealed by an observation of decreased ERG amplitudes with normal thresholds [68]. Myosin VIIA has also been shown to be essential for aminoglycoside accumulation in cochlear hair cells and hence may be involved in ototoxicity caused by these antibiotics [70]. Unique to the myosins involved in deafness is that myosin VIIA mutations have also been found in zebrafish [53]. Zebrafish have two mechanosensory organs, the inner ear and the lateral line organ, both of which contain sensory cells. The particular advantage of studying these lower vertebrates is the visibility of hair cell development. The mariner mutant is a circling zebrafish with inner ear hair bundle defects, manifested as splaying of the stereocilia (Fig. 5). The Myo7a missense and nonsense mutations in mariner larvae lead to defects in mechanotransduction and inhibition of apical hair cell endocytosis [71]. The comparative analysis of zebrafish, mouse, and human myosin VIIA reveals striking similarities between their domains and the pathogenesis seen in sensory cells from zebrafish and mice help to address the pathology in humans. 5.1.3 Usher Syndrome Type 1B Mutations in myosin VIIA underlie the pathogenesis of Usher’s syndrome type 1B. First described in 1858 [72], this disease was named after Charles Usher, who reported the combination of RP and deafness in Britain [5]. In USH1B, retinal
5 Myosins Involved In Sensory Defects 9
Fig. 5 Stereocilia of the myosin VIIa mariner zebrafish mutant shows a splayed pattern. Five-day old larvae were fixed and labelled with Oregon green phalloidin and observed by confocal microscopy. Hair bundles from (A) control zebrafish are conical
in shape, while (B) mutant mariner (the tn3540 allele) larvae show splaying of the stereociliary hair bundles. Adapted with permission from [53], by Oxford University Press.
(photoreceptor cells) and cochlear and vestibular sensory cells (hair cells) are defective. As a result, these children are born with congenital deafness, usually profound, and with vestibular dysfunction, seen as motor development delay. During childhood, the retinopathy begins as night blindness and progresses to complete blindness, usually by puberty. The USH1B locus (OMIM 276903) was first identified in 1992 to chromosome 11q13.5 [7, 73] and it took only three more years to discover the molecular basis of this disease. When the Myo7a exon was discovered in the sh1 region homologous to the USH1B-containing region, mutations in the human gene were found as well [9]. Since then, 81 mutations in the myosin VIIA gene have been discovered in USH1B [24]. Many of the mutations identified are localized to the 17 exons of the head/motor domain, although there are also mutations in the neck (three exons) and tail region (28 exons). The mutations are varied in nature, including nucleotide substitutions (missense, nonsense, and splicing mutations), small deletions and insertions, and rarer, larger deletions and rearrangements (one case for each) (summarized in Human Gene Mutation Database, http://archive.uwcm.ac.uk/uwcm/mg/hgmd0.html). 5.1.4 DFNB2 and DFNA11 There are a number of cases where mutations in the same gene lead to both recessive and dominant deafness (http://dnalab-www.uia.ac.be/dnalab/hhh), and such is the case for myosin VIIA as well. Two non-syndromic deafness loci, DFNB2 (OMIM 600060) [74, 75], and DFNA11 (OMIM 601317) [76], are associated with MYO7A mutations. DFNB2 is an autosomal recessiveformofnon-syndromic deafness, identifiedinChinese and Tunisian families. The Tunisian family is a highly consanguineous one, with 22 affected members exhibiting profound hearing loss. The mutation in this family is a G1797A missense, changing a methionine to isoleucine at the end of exon 15, leading to a decrease in splicing efficiency. Affected
10 Sensory Defects Caused by Mutant Motor Proteins
DFNA11 family members from Japan have moderate progressive sensorineural hearing loss, with no evidence of RP, as revealed by ophthalmological testing [77]. Sequencing of DNA from affected DFNA11 family members revealed an in-frame 9-base pair (bp) deletion that results in the loss of three amino acids. This deletion occurs in the coiled-coiled region of the myosin VIIA tail (Fig. 2), and thus is predicted to prevent dimerization of myosin VII, leading to a dominant-negative effect. 5.2 Myosin VI
Myosin VI (OMIM 600970) was the sixth myosin to be discovered among the ‘unconventional’ myosins. The first myosin VI was identified in Drosophila, named 95F MHC [78]. Since then, myosin VI has been discovered in C. elegans [54], pig [79], bullfrog [80], chicken [81] and mouse [45]. Mutations in myosin VI are associated with NSHL in both humans and mice. 5.2.1 Structure, function, and expression of myosin VI The human myosin VI (MYO6) gene is composed of 32 coding exons and spans a genomic region of approximately 70 kb. The phenotype of the sesv allele clearly suggests that there are additional upstream regulatory sequences to be identified in mice, and they may be present in human as well (see Section 5.2.2). The coding region of myosin VI produces a 4-kb cDNA, although by Northern analysis, transcripts of 6 and 8 kb are observed [82]. The human myosin VI protein is predicted to encode a 1263-amino acid protein, composed of a head/motor domain (776 amino acids), a neck domain, and a tail that is made up of a coiled-coil tail (192 amino acids) and a globular region (232 amino acids) (Fig. 6). Myosin VI is ubiquitously expressed, with the highest expression in the kidney and brain [82]. In the inner ear, myosin VI is localized to the inner and outer hair cells of the cochlea and the vestibular organs [45, 63] (Fig. 7). Myosin VI is localized in the cell body, and in particular, in the cuticular plate, the actin-rich structure underlying the stereocilia. Expression has not been observed in the stereocilia. In Drosophila, inhibition of myosin VI by antibodies demonstrates that this protein is required for proper organization of the syncitial blastoderm [83]. Porcine myosin VI was cloned in an effort to study myosins in the proximal tubule of the kidney, and was the source of the first mammalian myosin VI clone. In the kidney, myosin VI is expressed at the base of the brush border in proximal tubule cells [84]. In
Fig. 6 Myosin VI protein domain structure.
5 Myosins Involved In Sensory Defects 11
Fig. 7 Protein localization of myosin VI in guinea pig organ of Corti. (A) Section at level of the cuticular plate, the actin-rich region below the stereocilia, where myosin VI is enriched. (B) Section at the level of the
cell bodies of the inner (IHC) and outer hair cells (OHC). MVI, myosin VI. Adapted with permission from [63], by copyright permission of The Rockefeller University Press.
the striped bass, two isoforms of myosin VI (FMVIA and FMVIB) were found in a retinal cDNA library [85]. In the retina, these two isoforms are expressed in the photoreceptors, horizontal cells and Muller cells in both fish and primate retinas. Myosin VI has been studied both from the perspective of its inner ear pathology and as a protein involved in membrane traffic in polarized cells. Although myosin VI is ubiquitously expressed, its localization in different cell types has implicated this protein in receptor-mediated endocytosis. Myosin VI is localized to the Golgi complex and the leading, ruffling edge of fibroblasts [81]. In the kidney, myosin VI is localized in the coated-pit region of the brush border and is associated with clathrin-associated proteins [84]. Along with its role in Drosophila embryos (see Section 5.2.2), myosin VI appears to be a general membrane transport motor and have an essential role in membrane trafficking. Thus far, interacting proteins for myosin VI have been identified from brain and kidney libraries. In a search for proteins that interact with the glucose transporter binding protein GLUT1CBP, myosin VI was found to interact, suggesting a role for both in vesicle trafficking or a mechanism for linking GLUT1CBP to the actin cytoskeleton [86]. In experiments designed to find myosin VI-interacting proteins, DOC-2/DAB2 was identified. This Ras cascade signaling molecule interacts with myosin VI at the coiled-coil and globular tail region [87, 88]. Thus myosin VI may transport DOC-2/DAB2 from the cell surface to its specific cellular targets. Myosin VI has several unique features. It has been shown to translocate toward the pointed (−) end of actin, contrary to the direction taken for all the other myosins characterized [89]. A myosin VI construct was bound to actin, and using cryoelectron microscopy and image analysis, an ADP-mediated conformational change was observed in the domain distal to the motor in an opposite orientation to other myosins. This domain corresponds to the converter/lever arm, containing a unique
12 Sensory Defects Caused by Mutant Motor Proteins
53-amino acid insertion. In addition, myosin VI has been shown to be a processive motor with an unusually large step size [90]. 5.2.2 Snell’s waltzer mice and other models The Snell’s waltzer (sv) mouse is a recessive mutation that arose spontaneously at the Jackson Laboratory in the 1960s [91]. These mice are profoundly deaf and exhibit vestibular dysfunction, observed by circling. During a radiation screen at Oak Ridge National Laboratories, another sv allele, sesv , was created [92]. A positional cloning endeavor led to the isolation of the svMyo6 gene, using the sesv allele to gain access to the sv locus [45]. Exon-trapping (a technique used to identify proteinencoding sequences in genomic clones; [93]) led to the identification of an exon with 89 % homology to the porcine myosin VI gene. Subsequent analysis of the sv allele showed that there is a 2-kb genomic deletion in the mouse myosin VI gene, which leads to a 130-bp deletion in the coding region. As a consequence, a stop codon is formed in the neck region and no myosin VI protein is produced in any the sv tissues examined. In the sesv allele, a 2-cM inversion leads to the putative loss of upstream regulatory sequences. As a result, there is reduced expression of myosin VI in all sesv tissues examined. Snell’s waltzer mice never appear to hear and sound stimulation experiments at 30 days after birth reveal only minimal reponses of a summating potential (SP) at 125 dB SPL, a high intensity stimulus that is similar to the noise of a jet engine. There was no compound action potential (CAP) or cochlear microphonics (CM) response at any frequency or intensity of stimulus used [69]. Although vestibular function has not been measured, the sv mice circle continuously in either direction. The hair cells of sv mice show a unique pattern due to the loss of myosin VI, as observed by scanning electron microscopy (SEM) [94]. Already by 3 days after birth, a disorganized pattern is apparent that includes fusion of stereocilia with one another (Fig. 8). By 20 days after birth, giant stereocilia are seen at the top of the hair cells. Transmission EM has demonstrated that the fusion begins at the base of the stereocilia simultaneously, hair cells degenerate, and by 6 weeks of age, there are no hair cells left and the entire organ of Corti is lost [45, 94]. Although endocytosis was shown to function normally in these hair cells, technical limitations may have prevented abnormal processes from being detected. Studies in lower organisms also implicate myosin VI in membrane trafficking. In Drosophila, myosin VI is known to be crucial for proper blastoderm formation [83], and in both C. elegans and Drosophila, for proper spermatogenesis. C. elegans deletion mutants do not properly segregate cell components, resulting in aberrant spermatogenesis. Mitochondria, endoplasmic reticulum/Golgi-derived fibrous-body membranous organelle complexes, actin filaments, and microtubules are not properly transported to or from spermatids [54, 95]. During fertilization, each spermatid must go through a process called ‘individualization’, whereby it is enclosed in its own membrane. Two Drosophila mutants are known to have defects in spermatid individualization. Thus the 95F motor is required for transport and reorganization of membranes in the individualization complex within developing spermatids [95].
5 Myosins Involved In Sensory Defects 13
Fig. 8 Scanning electron microscopy of Snell’s waltzer (sv) inner and outer hair cells of (A) newborn, (B) 3, (C) 7, and (D) 12-day old mutant mice. Note fusion of stereocilia already beginning at three days after birth and general disorganization of cells. Adapted with permission from [94], with permission from Elsevier Science.
5.2.3 DFNA22 Efforts to find a human myosin VI mutation failed for several years, despite this gene’s involvement in mouse deafness [96]. However, in 2001, a genome scan on DNA derived from an Italian family with progressive dominant NSHL revealed linkage to chromosome 6q13, the region containing the human MYO6 gene [46]. A missense mutation was discovered in exon 12, corresponding to the head/motor domain, between the ATP-binding and actin-binding sites (Fig. 6). The cysteine to tyrosine amino acid change may destablilize the protein, and thus compromise myosin VI function. This cysteine is conserved in myosin VI in human, mouse, chicken, pig, striped bass and sea urchin, and in C. elegans and Drosophila, the cysteine is replaced with a similar hydrophilic residue, serine. In a comparison with 143 other myosins, no tyrosine residue appears at this position (see Myosin Motor Domain Sequence Alignment; http://www.mrc-lmb.cam.ac.uk/myosin/trees/txalign.html). It does not appear to form disulfide bonds, which are often formed by cysteines. Given the segregation of this MYO6 mutation with the affected individuals in this family, the previous association of myosin VI with deafness, and the conservation of the mutated residue, this missense mutation appears to lead to deafness in this human family. Although the protein structure of myosin VI has not been elucidated, the threedimensional structure of the head portion of myosin II is known [97]. Using
14 Sensory Defects Caused by Mutant Motor Proteins
Fig. 9 A ribbon respresentation of a threedimensional model of the human myosin VI motor domain, showing the mutated human myosin VI cysteine residue as a ball in the middle. The model was built using the automatic homology modeling facility, Swiss
Model (http://www.expasy.ch/swissmod/ SWISS-MODEL.html) and the threedimensional structures of several myosin motor domains. Adapted with permission from [46], from the University of Chicago.
homology modeling techniques, a model for myosin VI was made based on the amino acid sequence homology between these two myosins (Fig. 9). According to the model, the mutated cysteine that leads to deafness is partly buried in the protein core, suggesting that replacement of this small amino acid with a bulky tyrosine may destabilize the protein [46].
5.3 Myosin XVA
Myosin XVA (OMIM 602666) was discovered in a search for the gene associated with the mouse deaf mutant shaker 2 (sh2) [47] and the human NSHL locus DFNB3 [48].
5 Myosins Involved In Sensory Defects 15
Fig. 10 Myosin XVA protein domain structure.
5.3.1 Structure, function, and expression of myosin XVA The human (locus designation, MYO15A) and mouse myosin XVA genes both contain 66 exons and cover 71 and 60 kb of genomic DNA, respectively [98]. A fulllength isoform of myosin XVA is predicted to encode a 3530-amino acid protein with a molecular weight of 395 kDa. Like most other myosins, myosin XVA contains a motor domain, two IQ motifs, and a tail domain (Fig. 10). Myosin XVA also has a 1200-amino acid domain that precedes the motor at the N-terminal end. The tail domain of myosin XVA is composed of two MyTH4, two FERM-like domains, and an SH3 domain, similar to myosin VIIA (see Section 5.1.1). Myosin XV has a more restricted expression pattern than myosin VI. Northern and dot-blot analysis indicates that the human gene is expressed in fetal and adult brain, ovary, testis, kidney and pituitary gland. In the mouse inner ear, as demonstrated by in situ hybridization, Myo15A is expressed in the hair cells of the cochlea and the vestibular system, including the saccule, utricle, and cristea ampularis [98] (Fig. 11). Labeling with antibodies showed a more restricted pattern in the hair cells, namely in the cuticular plate (the actin-rich region in the apical portion of the cell) and stereocilia. 5.3.2 Shaker 2 The original sh2 allele arose in an X-ray irradiation screen [99] and was mapped to mouse chromosome 11 [100], to a region homologous to human chromosome 17. Like sh1 and sv, these mice are deaf and exhibit the characteristic circling behavior
Fig. 11 Myosin XVa is expressed in the inner (IHC) and outer hair cells (OHC) of the inner (IHC) and outer hair cells (OHC) of the mouse. Myosin XVa is detected by indirect immunofluorescence using myosin XVa
antibodies on whole-mount preparations of organ of Corti in (A) low magnification and (B) high magnification. Adapted with permission from [27], with permission from Elsevier Science.
16 Sensory Defects Caused by Mutant Motor Proteins
Fig. 12 Scanning electron microscopy of the surface of shaker2 (sh2) organ of Corti. Note shortened stereocilia of outer hair cells (OHC) and inner hair cells (IHC).
of waltzer mice. The stereocilia of sh2 hair cells have a unique pattern in that they are extremely short, although the overall array pattern of the hair cells appears normal [47]; Fig. 12). The sh2 mouse was proposed as a model for the human DFNB3 locus due to the mapping of homologous genes in the sh2 and DFNB3 regions [98, 101]. To clone the gene, a physical map of the sh2 region was constructed. Since there were many compelling candidate genes in the region, an in vivo complementation approach was taken in order to reduce the number of candidates to screen. Bacterial artificial chromosomes (BAC) from the critical region were injected into fertilized eggs derived from sh2/sh2 matings. One of the transgenic progeny born contained one of the BACs integrated in its genome. This mouse, named Sebastian after Johann Sebastian Bach, did not circle and was hearing, unexpected from a shaker 2 homozygous cross. This suggested that the gene responsible for the sh2 phenotype was contained within this 140 kb BAC. Sequencing and subsequent examination of the data generated revealed a novel unconventional myosin, named myosin XV, since it was the 15th to be identified. Since the subsequent discovery of a myosin XV, pseudogene, the myosin XV deafness gene has been renamed XVA [102]. Initially, a mutation was detected in one sh2 allele within the motor domain. A G → A transition converts a cysteine to tyrosine. Comparison with 82 other myosins revealed either a cysteine or conservative change to threonine or leucine at this site, demonstrating the conservation of this residue. This change is likely to prevent myosin XVA binding to actin. 5.3.3 DFNB3 The DFNB3 locus (OMIM 600316) was the third recessive locus to be identified when it was found in a significant proportion of villagers from Bengkala, Bali [101]. Deafness in this community has been documented for at least seven generations, and as a result, the villagers, both hearing and deaf, developed their own sign language for effective communication [103]. Additional families were found in India and Pakistan [104]. Deafness was mapped to the centromeric portion of chromosome 17 to a 3-cM region. Once the shaker 2 gene was identified, primers based on homology to the mouse-predicted exons were designed and used to amplify human genomic DNA. Three mutations were identified that segregate with
5 Myosins Involved In Sensory Defects 17
deafness. These include an isoleucine to phenylalanine substitution (I892F) within the MyTH4 domain, an aspartic acid to tyrosine substitution within the same domain (N890Y), and a nonsense mutation in exon 39, predicted to result in a truncated or null protein. All three mutations compromise myosin XVA protein function, an essential component of the auditory pathway. Since then, three additional mutations have been found in myosin XVA in families with non-syndromic deafness. Smith–Magenis syndrome (OMIM 182290) patients have an interstitial deletion [del(17)p11.2] that includes MYO15A, leading to brachycephaly, midface hypoplasia, prognathism, hoarse voice, speech delay with or without hearing loss, psychomotor and growth retardation, and behavioral problems. Recently, the remaining MYO15A allele in one family was found to contain a substitution of a well-conserved threonine downstream of the first MyTH4 domain, which may be responsible for the moderate–severe hearing loss in this individual [104]. 5.4 MYH9
MYH9 (OMIM 160775), also known as non-muscle myosin IIA (NMIIA), is a nonmuscle myosin heavy-chain gene that maps to human chromosome 22q13.1. This myosin is involved in a number of diseases, including autosomal dominant giantplatelet disorders May–Hegglin anomaly, Sebastian syndrome, Fechtner syndrome and Epstein syndrome. Of these, Fechtner syndrome and Epstein patients exhibit hearing loss. 5.4.1 Structure, function, and expression of MYH9 MYH9 was the first mammalian non-muscle myosin heavy chain gene to be identified and at the time was shown to be expressed in fibroblasts, endothelial cells, and macrophages [105]. The MYH9 gene is expressed in a large number of tissues, as revealed by its presence in cDNA libraries and expressed sequence tags (ESTs) derived from stomach, uterus, pancreas, testis, and more (Uni-Gene Cluster Hs.146550; http://www.ncbi.nlm.nih.gov/UniGene). In the rat cochlea, MYH9 protein is expressed in the outer hair cells of the organ of Corti, the spiral ligament, and Reissner’s membrane. Homologs have been identified in mouse, rat, Arabidopsis, Drosophila, C. elegans and yeast. Class II myosins consist of a pair of heavy chains, a pair of light chains, and a pair of regulatory light chains that together form a hexameric myosin molecule [106]. MYH9 encodes an amino-terminal head domain that contains the actin and nucleotide binding sites and a carboxy-terminal coiled-coil α-helical rod structure, separated by an IQ motif. The 40 coding exons of MYH9 encompass 5886 bp of coding sequence. The protein is predicted to encode 1960 amino acids with a molecular weight of 226 kDa. 5.4.2 DFNA17 The DFNA17 locus (OMIM 603622) was identified on 22q12.2–22q13.3 when linkage analysis of a five-generation non-consanguinous American family was carried
18 Sensory Defects Caused by Mutant Motor Proteins
Fig. 13 Histopathological analysis of the temporal bones from a hearing individual (A) and a (B) deceased male with profound hearing loss, demonstrating classical Scheibe degeneration or cochleosaccular degeneration. This phenotype is characteristic of DFNA17/MYH9 hearing loss. Adapted with permission from [107].
out [107]. DFNA17 is inherited in an autosomal dominant manner. Affected members of this family suffer from progressive hearing loss, beginning with the high frequencies at the age of 10 years, and by the age of 30 years the condition manifests itself as moderate to severe HL. Histopathological analysis was performed on the temporal bones of a deceased male with profound hearing loss, demonstrating classical Scheibe degeneration or cochleosaccular degeneration. In this condition, there is a loss of hair cells and supporting cells in the cochlea and saccule (an organ of the vestibular apparatus), with an accompanying loss of neurons, as well as degeneration of the stria vascularis (Fig. 13). Sequence analysis of MYH9 revealed a G → A heterozygote mutation in exon 16, leading to a substitution of arginine by histidine in codon 705 [49]. The R705H mutation occurs in a highly conserved 16-amino acid linker region that contains two free thiol groups SH1 and SH2. Since this region is believed to play a major role in myosin head conformational changes, the DFNA17 mutation may disrupt mechanical function of the motor domain.
5.5 Myosin IIIA
The myosin IIIA (gene locus, MYO3A) (OMIM 606808) was first identified in Drosophila [108] and the horseshoe crab Limulus polyphemus [109], where the gene is expressed exclusively in eye photoreceptor cells. Mutations in the Drosophila myosin IIIA, NINAC, are associated with abnormal photoreceptor electrophysiology. The human MYO3A gene was isolated with the intention of following its retinal expression [110], with the expectation that this gene would be involved in a human eye disease. Therefore it was surprising that the first human mutations in MYO3A are associated with NSHL only [50].
5 Myosins Involved In Sensory Defects 19
Fig. 14 Myosin IIIA protein domain structure.
5.5.1 Structure, function, and expression of myosin IIIA The human myosin IIIA gene (locus designation, MYO3A) has an open reading frame of 4848 bp that codes for a predicted 1616-amino acid protein [110]. This alternatively spliced 35-exon gene spans 308 kb of genomic sequence. Myosin IIIA is composed of a motor domain, a neck and a tail domain (Fig. 14). There are two IQ motifs in the neck, and unique to this protein, an IQ motif in the tail. There is no evidence of a coiled-coil domain, suggesting that myosin IIIA functions as a monomeric myosin. The most unusual feature of myosin IIIA is the N-terminal kinase domain that most resembles a serine/threonine kinase. This kinase is also conserved in the Limulus and Drosophila myosin IIIA proteins, although they do not possess a similar tail motif. The mouse myosin IIIA has recently been identified and shows high similarity to the human gene and protein [50]. While the Limulus and Drosophila myosin IIIA is expressed solely in the photoreceptor cells, human myosin IIIA has a slightly more broad expression. Myosin IIIA mRNA is found in human and monkey retina, as well as in human pancreas but not in the heart, brain, placenta, lung, liver, skeletal muscle or kidney [110]. In the mouse cochlea, MYO3A is expressed specifically in the inner and outer hair cells, as observed by in situ hybridization [50]; Fig. 15). Antibodies against the fish myosin IIIA demonstrate expression specifically in the photoreceptor cells of the retina (http://mcb.berkeley.edu/labs/burnside/myosin.html).
Fig. 15 Myosin IIIa is expressed in the inner (IHC) and outer hair cells (OHC) of the mouse inner ear. mRNA in situ hybridization was performed to demonstrate Myo3a expression in a newborn mouse cochlea. (A) Entire cochlea showing hair cells, with line indicating orientation of cross sections.
(B) Cross section of cochlear duct showing specific labelling of IHC and OHC with Myo3a antisense probe. (C) Cross section of cochlear duct with Myo3a sense probe, indicating that expression in (B) is specific. Adapted with permission from [50], ľ 2002 National Academy of Sciences, U.S.A.
20 Sensory Defects Caused by Mutant Motor Proteins
The human MYO3A gene was localized by radiation hybrid mapping to the human chromosome 10p11.2 region [110]. Although there are at least six retinal diseases that map to chromosome 10 (http://www.sph.uth.tmc.edu/RetNet/), no retinal diseases map to the MYO3A region. Recently, an NSHL locus, DFNB30, was localized to a 13-cM region on chromosome 10, corresponding to 10 Mb [50]. Although many genes are present in this region, myosin IIIA appeared to be a particularly promising candidate due to the known involvement of myosins in the auditory pathway and the role of NINAC in the visual pathway. 5.5.2 Myosin IIIA mutants A mouse mutant with myosin IIIA mutations has not yet been produced. Thus far, the only animal model produced exhibits abnormalities in the visual transduction pathway. The ninaC Drosophila mutant was recovered in an electrophysiological screen to identify altered photoresponse [108]. The ninaC gene codes for two splice variants, p132 and p174. They are identical in their kinase and motor domain, but diverge after the first IQ motif. p132 expression is restricted to the photoreceptor cell bodies, while p174 is expressed in the rhabdomeres. Removal of the kinase domain of both p132 and p174 lead to altered ERGs, but no retinal degeneration. Removal of the myosin motor domain leads to both abnormal ERGs and retinal degeneration. Additional mutagenesis studies led to the conclusion that the motor domain is required for photoreceptor maintenance but not for initiation of the phototransduction cascade [111]. 5.5.3 DFNB30 The DFNB30 locus is represented by an Israeli family originating in Mosul, Iraq [50]. The hearing loss in this family is progressive, beginning in the second or third decade of life. During the initial characterization of this family, it was unclear whether the inheritance pattern was recessive or dominant. Until now, all families with progressive hearing loss have shown a dominant pattern of inheritance. A genome scan revealed age-dependent recessive inheritance, although homozygosity was restricted to only a part of the critical region. Therefore it was not surprising to discover several mutations in the MYO3A gene, corresponding to the haplotypes segregating in the family. The first mutation is a splice mutation that leads to an unstable message, 732 (−2) A → G (Fig. 14). The second mutation is a splice mutation at 1777 (−12) G → A, which leads to the deletion of exon 18 and a stop at codon 668. The third mutation is a nonsense mutation, 3126 T → G, which leads to a stop at codon 1043. Most interesting, there is a genotype–phenotype correlation seen in affected family members (Fig. 16). Individuals who were homozygous for the nonsense mutation had an earlier onset of hearing loss. Between the ages of 25 and 50 years, individuals homozygous for the nonsense mutation had significantly poorer hearing thresholds than those heterozgyote for the nonsense and either of the missense mutations. Although hearing impaired individuals have never complained of visual problems, detailed ophthamological tests will determine whether
6 Conclusions 21
Fig. 16 A phenotype-genotype correlation is present in myosin IIIA mutations in the DFNB30 family. Three mutations segregated in the hearing impaired individuals from this family. Individuals with the nonsense (NN) mutation on both MYO3A alleles had a more severe hearing loss that began in the early 20s. Those who are compound heterozygote for a nonsense and
either splice mutation (NS) began to notice a change in their hearing in their 30s, and their hearing loss is not as severe. The figure shows the hearing thresholds in decibels (dB) at ages 25-39 years and 40–49 years, measured by pure-tone audiometry. Adapted with permission from [50], ľ 2002 National Academy of Sciences, U.S.A.
the retina is affected, as myosin IIIA is expressed in the retina and ninaC mutations lead to retinal degeneration (see Section 5.5.2).
6 Conclusions
Myosins associated with sensory defects perform many functions, ranging from structural, developmental and physiological roles. To date, five myosins are associated with hearing and visual defects in organisms ranging from Drosophila to humans. Additional myosins clearly play a role in the auditory and visual pathways, as they are expressed in cells of the cochlea and retina and perform essential functions in these pathways. For example, myosin Ic mediates auditory adaptation and is expressed in the stereocilia of mammalian hair cells [112]. At least an additional 40 deafness loci and 35 RP loci have been mapped, so that additional myosins may be associated with these disorders. Thus far, we have learned a great deal about the
22 Sensory Defects Caused by Mutant Motor Proteins
function of the molecular motors associated with sensory defects in humans, and availability of mouse models will undoubtedly lead to yet more discoveries.
Acknowledgements
K. B. A. is supported by grants from the European Commission, the Israel Ministry of Health, the Israel Academy of Sciences, the Israel Ministry of Science, Culture and Sports, and the National Institutes of Health Fogarty International Center Grant. I would like to thank Yehoash Raphael, Lisa Beyer, Anil Lalwani, Teresa Nicolson, Thomas Friedman, Aziz El-Amraoui, and Christine Petit for contributing figures and the many collaborators who have worked on myosins with me over the years: Tama Hasson, Karen Steel, Corne Kros, Paolo Gasparini, Mary-Claire King, Nancy Jenkins and Neal Copeland. I would also like to thank my laboratory members who have worked on myosins: Tama Sobe, Nadav Ahituv, Orit Ben-David, and Sarah Vreugde.
References 1 GOERINGER, G. C. 1998. Development of the ear. In: Pediatric Otology and Neurotology. Edited by A. K. LALWANI and K. M. GRUNDFAST, New York: Lippincott–Raven Publishers, pp. 3–27. 2 GILBERT, S. F. 1994. Developmental Biology. Massachusetts, USA: Sinauer Associates. 3 SCHOEM, S. R. and K. M. GRUNDFAST 1998. Oculoauditory syndromes. In: Pediatric Otology and Neurotology. Edited by A. K. LALWANI and K. M. GRUNDFAST. New York: Lippincott–Raven Publishers, pp. 365–374. 4 FARRAR, G. J., P. F. KENNA and P. HUMPHRIES. 2002. On the genetics of retinitis pigmentosa and on mutation-independent approaches to therapeutic intervention. EMBO J. 21: 857–864. 5 USHER, C. H. 1914. On the inheritance of retinitis pigmentosa, with notes of cases. R. Lond. Ophthalmol. Hosp. Rep. 19: 130–236. 6 KAPLAN, J., et al., 1992. A gene for Usher syndrome type I (USH1A) maps to chromosome 14q. Genomics 14: 979–987. 7 SMITH, R. J., et al., 1992. Localization of two genes for Usher syndrome type I to chromosome 11. Genomics 14: 995–1002.
8 WAYNE, S., et al., 1996. Localization of the Usher syndrome type ID gene (Ush1D) to chromosome 10. Hum. Mol. Genet. 5: 1689–1692. 9 WEIL, D., et al., 1995. Defective myosin VIIA gene responsible for Usher syndrome type 1B. Nature 374: 60–61. 10 BITNER-GLINDZICZ, M., et al., 2000. A recessive contiguous gene deletion causing infantile hyperinsulinism, enteropathy and deafness identifies the Usher type 1C gene. Nature Genet. 26: 56–60. 11 VERPY, E., et al., 2000. A defect in harmonin, a PDZ domain-containing protein expressed in the inner ear sensory hair cells, underlies Usher syndrome type 1C. Nature Genet. 26: 51–55. 12 BOLZ, H., et al., 2001. Mutation of CDH23, encoding a new member of the cadherin gene family, causes Usher syndrome type 1D. Nature Genet. 27: 108–112. 13 BORK, J. M., et al., 2001. Usher syndrome 1D and nonsyndromic autosomal recessive deafness DFNB12 are caused by allelic mutations of the novel cadherin-like gene CDH23. Am. J. Hum. Genet. 68: 26–37.
References 14 AHMED, Z. M., et al., 2001. Mutations of the protocadherin gene PCDH15 cause Usher syndrome type 1F. Am. J. Hum. Genet. 69: 25–34. 15 ALAGRAMAM, K. N., et al., 2001. Mutations in the novel protocadherin PCDH15 cause Usher syndrome type 1F. Hum. Mol. Genet. 10: 1709–1718. 16 KIMBERLING, W. J., et al., 1990. Localization of Usher syndrome type II to chromosome 1q. Genomics 7: 245–249. 17 HMANI, M., et al., 1999. A novel locus for Usher syndrome type II, USH2B, maps to chromosome 3 at p23-24.2. Eur. J. Hum. Genet. 7: 363–367. 18 PIEKE-DAHL, S., et al., 2000. Genetic heterogeneity of Usher syndrome type II: localisation to chromosome 5q. J. Med. Genet. 37: 256–262. 19 EUDY J. D., et al., 1998. Mutation of a gene encoding a protein with extracellular matrix motifs in Usher syndrome type IIa. Science 280: 1753–1757. 20 SANKILA, E. M., et al., 1995. Assignment of an Usher syndrome type III (USH3) gene to chromosome 3q. Hum. Mol. Genet. 4: 93–98. 21 ADATO, A., et al., 2002. USH3A transcripts encode clarin-1, a four-transmembrane-domain protein with a possible role in sensory synapses. Eur. J. Hum. Genet. 10: 339–350. 22 JOENSUU et al., 2001. Mutations in a novel gene with transmembrane domains underlie Usher syndrome type 3. Am. J. Hum. Genet. 69: 673–684. 23 GIBSON, F., et al., 1995. A type VII myosin encoded by the mouse deafness gene shaker-1. Nature 374: 62–64. 24 PETIT, C., J. LEVILLIERS and J. P. HARDELIN. 2001. Molecular genetics of hearing loss. Annu. Rev. Genet. 35: 589–646. 25 EVERETT, L. A., et al., 1997. Pendred syndrome is caused by mutations in a putative sulphate transporter gene (PDS). Nature Genet. 17: 411–422. 26 HOTH, C. F., et al., 1993. Mutations in the paired domain of the human PAX3 gene cause Klein-Waardenburg syndrome (WS-III) as well as
27
28
29
30
31
32
33
34
35
36
37
38
Waardenburg syndrome type I (WS-I). Am. J. Hum. Genet. 52: 455–462. TASSABEHJI, M., V. E. NEWTON and A. P. READ. 1994. Waardenburg syndrome type 2 caused by mutations in the human microphthalmia (MITF) gene. Nature Genet. 8: 251–255. ATTIE, T., et al., 1995. Mutation of the endothelin-receptor B gene in Waardenburg–Hirschsprung disease. Hum. Mol. Genet. 4: 2407–2409. EDERY, P., et al., 1996. Mutation of the endothelin-3 gene in the Waardenburg–Hirschsprung disease (Shah–Waardenburg syndrome). Nature Genet. 12: 442–444. PINGAULT, V., et al., 1998. SOX10 mutations in patients with Waardenburg–Hirschsprung disease. Nature Genet. 18: 171–173. ABDELHAK, S., et al., 1997. A human homologue of the Drosophila eyes absent gene underlies branchio–oto–renal (BOR) syndrome and identifies a novel gene family. Nature Genet. 15: 157–164. BARKER, D. F., et al., 1990. Identification of mutations in the COL4A5 collagen gene in Alport syndrome. Science 248: 1224–1227. MOCHIZUKI, T., et al., 1994. Identification of mutations in the alpha 3(IV) and alpha 4(IV) collagen genes in autosomal recessive Alport syndrome. Nature Genet. 8: 77–81. KELSELL, D., et al., 1997. Connexin 26 mutations in hereditary non-syndromic sensorineural deafness. Nature 387: 80–83. DENOYELLE, F., et al., 1997. Prelingual deafness: high prevalence of a 30delG mutation in the connexin 26 gene. Hum. Mol. Genet. 6: 2173–2177. ESTIVILL, X., et al., 1998. Connexin-26 mutations in sporadic and inherited sensorineural deafness. Lancet 351: 394–398. KELLEY P. M., et al., 1998. Novel mutations in the connexin 26 gene (GJB2) that cause autosomal recessive (DFNB1) hearing loss. Am. J. Hum. Genet. 62: 792–799. SHAHIN, H., et al., 2002. Genetics of congenital deafness in the Palestinian
23
24 Sensory Defects Caused by Mutant Motor Proteins
39
40
41
42
43
44
45
46
47
48
population: multiple connexin 26 alleles with shared origins in the Middle East. Hum. Genet. 110: 284–289. SOBE, T., et al., 2000. The prevalence and expression of inherited connexin 26 mutations associated with nonsyndromic hearing loss in the Israeli population. Hum. Genet. 106: 50–57. DEL CASTILLO, I., et al., 2002. A deletion involving the connexin 30 gene in nonsyndromic hearing impairment. N. Engl. J. Med. 346: 243–249. GRIFA, A., et al., 1999. Mutations in GJB6 cause nonsyndromic autosomal dominant deafness at DFNA3 locus. Nature Genet. 23: 16–18. LERER, I., et al., 2001. A deletion mutation in GJB6 cooperating with a GJB2 mutation in trans in non-syndromic deafness: A novel founder mutation in Ashkenazi Jews. Hum. Mutat. 18: 460. XIA, J. H., et al., 1998. Mutations in the gene encoding gap junction protein beta-3 associated with autosomal dominant hearing impairment. Nature Genet. 20: 370–373. LIU, X. Z., et al., 2001. Mutations in GJA1 (connexin 43) are associated with non-syndromic autosomal recessive deafness. Hum. Mol. Genet. 10: 2945–2951. AVRAHAM, K. B., et al., 1995. The mouse Snell’s waltzer deafness gene encodes an unconventional myosin required for the structural integrity of inner ear hair cells. Nature Genet. 11: 369–375. MELCHIONDA, S., et al., 2001. MYO6, the human homologue of the gene responsible for deafness in Snell’s waltzer mice, is mutated in autosomal dominant nonsyndromic hearing loss. Am. J. Hum. Genet. 69: 635–640. PROBST, F. J., et al., 1998. Correction of deafness in shaker-2 mice by an unconventional myosin in a BAC transgene. Science 280: 1444–1447. WANG, A., et al., 1998. Association of unconventional myosin MYO15 mutations with human nonsyndromic deafness DFNB3. Science 280: 1447–1451.
49 LALWANI, A. K., et al., 2000. Human nonsyndromic hereditary deafness DFNA17 is due to a mutation in nonmuscle myosin MYH9. Am. J. Hum. Genet. 67: 1121–1128. 50 WALSH, T., et al., 2002. From flies’ eyes to our ears: Mutations in a human class III myosin cause progressive nonsyndromic hearing loss DFNB30. Proc. Natl Acad. Sci. USA 99: 7518– 7523 51 BEMENT, W. M., T. HASSON, J. A. WIRTH, R. E. CHENEY and M. S. MOOSEKER. 1994. Identification and overlapping expression of multiple unconventional myosin genes in vertebrate cell types. Proc. Natl Acad. Sci. USA 91: 6549–6553. 52 CHEN, Z. Y., et al., 2001. Myosin-VIIb, a novel unconventional myosin, is a constituent of microvilli in transporting epithelia. Genomics 72: 285–296. 53 ERNEST, S., et al., 2000. Mariner is defective in myosin VIIA: a zebrafish model for human hereditary deafness. Hum. Mol. Genet. 9: 2189–2196. 54 KELLEHER, J. F., et al., 2000. Myosin VI is required for asymmetric segregation of cellular components during C. elegans spermatogenesis. Curr. Biol. 10: 1489–1496. 55 TITUS, M. A. 1999. A class VII unconventional myosin is required for phagocytosis. Curr. Biol. 9: 1297–1303. 56 CHEN, Z. Y., et al., 1996. Molecular cloning and domain structure of human myosin-VIIa, the gene product defective in Usher syndrome 1B. Genomics 36: 440–448. 57 WEIL, D., et al., 1996. Human myosin VIIA responsible for the Usher 1B syndrome: a predicted membrane-associated motor protein expressed in developing sensory epithelia. Proc. Natl Acad. Sci. USA 93: 3232–3237. 58 EL-AMRAOUI, A., et al., 1996. Human Usher 1B/mouse shaker-1: the retinal phenotype discrepancy explained by the presence/absence of myosin VIIA in the photoreceptor cells. Hum. Mol. Genet. 5: 1171–1178. 59 VELICHKOVA, M., et al., 2002. A human homologue of Drosophila kelch
References
60
61
62
63
64
65
66
67
68
69
associates with myosin-VIIa in specialized adhesion junctions. Cell Motil. Cytoskelet. 51: 147–164. KUSSEL-ANDERMANN, P., et al., 2000. Unconventional myosin VIIA is a novel A-kinase-anchoring protein. J. Biol. Chem. 275: 29654–29659. KUSSEL-ANDERMANN, P., et al., 2000. Vezatin, a novel transmembrane protein, bridges myosin VIIA to the cadherin-catenins complex. EMBO J. 19: 6020–6029. HASSON, T., M. B. HEINTZELMAN, J. SANTOS-SACCHI, D. P. COREY and M. S. MOOSEKER. 1995. Expression in cochlea and retina of myosin-VIIa, the gene defective in Usher Syndrome type 1B. Proc. Natl Acad. Sci. USA 92: 9815– 9819. HASSON, T., et al., 1997. Unconventional myosins in inner-ear sensory epithelia. J. Cell. Biol. 137: 1287–1307. LIU, X., G. VANSANT, I. P. UDOVICHENKO, U. WOLFRUM and D. S. WILLIAMS. 1997. Myosin VIIa, the product of the Usher 1B syndrome gene, is concentrated in the connecting cilia of photoreceptor cells. Cell Motil. Cytoskelet. 37: 240–252. LIU, X., B. ONDEK and D. S. WILLIAMS. 1998. Mutant myosin VIIa causes defective mela-nosome distribution in the RPE of shaker-1 mice. Nature Genet. 19: 117–118. KROS, C. J., et al., 2002. Reduced climbing and increased slipping adaptation in cochlear hair cells of mice with Myo7a mutations. Nature Neurosci. 5: 41–47. GILLESPIE, P. G., M. C. WAGNER and A. J. HUDSPETH. 1993. Identification of a 120-kd hair-bundle myosin located near stereociliary tips. Neuron 11: 581–594. LIBBY R. T. and K. P. STEEL. 2001. Electroreti-nographic anomalies in mice with mutations in Myo7a, the gene involved in human Usher syndrome type 1B. Invest. Ophthalmol. Vis. Sci. 42: 770–778. SELF, T., et al., 1998. Shaker-1 mutations reveal roles for myosin VIIA in both development and function of cochlear hair cells. Development 125: 557–566.
70 RICHARDSON, G. P., et al., 1997. Myosin VIIA is required for aminoglycoside accumulation in cochlear hair cells. J. Neurosci. 17: 9506–9519. 71 SEILER, C. and T. NICOLSON. 1999. Defective calmodulin-dependent rapid apical endocytosis in zebrafish sensory hair cell mutants. J. Neurobiol. 41: 424–434. 72 VON GRAEFE, A. 1858. Vereinzelte Beobachtungen und Bemerkungen. Exceptionelle Verhalten des Gesichtsfeldes bei Pigmentenartung des Netzhaut. Albrecht Graefes Arch. Klin. Ophthalmol. 4: 250–253. 73 KIMBERLING, W. J., et al., 1992. Linkage of Usher syndrome type I gene (USH1B) to the long arm of chromosome 11. Genomics 14: 988–994. 74 LIU, X.-Z., et al., 1997. Mutations in the myosin VIIA gene causing non-syndromic recessive deafness. Nature Genet. 16: 188–190. 75 WEIL, D., et al., 1997. The autosomal recessive isolated deafness, DFNB2, and the Usher 1B syndrome are allelic defects of the myosin-VIIA gene. Nature Genet. 16: 191–193. 76 LIU, X., et al., 1997. Autosomal dominant non-syndromic deafness caused by a mutation in the myosin VIIA gene. Nature Genet. 17: 268–269. 77 TAMAGAWA, Y., et al., 2002. Phenotype of DFNA11: a nonsyndromic hearing loss caused by a myosin VIIA mutation. Laryngoscope 112: 292–297. 78 KELLERMAN, K. A. and K. G. MILLER. 1992. An unconventional myosin heavy chain gene from Drosophila melanogaster. J. Cell Biol. 119: 823–834. 79 HASSON, T. and M. S. MOOSEKER. 1994. Porcine myosin-VI: characterization of a new mammalian unconventional myosin. J. Cell Biol. 127: 425–440. 80 SOLC, C. F., B. H. DERFLER, G. M. DUYK and D. P. COREY. 1994. Molecular cloning of myosins from the bullfrog saccular macula: a candidate for the hair cell adaptation motor. Auditory Neurosci. 1: 63–75. 81 BUSS, F., et al., 1998. The localization of myosin VI at the golgi complex and leading edge of fibroblasts and its
25
26 Sensory Defects Caused by Mutant Motor Proteins
82
83
84
85
86
87
88
89
90
91
phosphorylation and recruitment into membrane ruffles of A431 cells after growth factor stimulation. J. Cell Biol. 143: 1535–1545. AVRAHAM, K. B., et al., 1997. Characterization of unconventional MYO6, the human homologue of the gene responsible for deafness in Snell’s waltzer mice. Hum. Mol. Genet. 6: 1225–1231. MERMALL, V. and K. G. MILLER. 1995. The 95F unconventional myosin is required for proper organization of the Drosophila syncytial blastoderm. J. Cell Biol. 129: 1575–1588. BIEMESDERFER, D., S. A. MENTONE, M. MOOSEKER and T. HASSON. 2002. Expression of myosin VI within the early endocytic pathway in adult and developing proximal tubules. Am. J. Physiol. Renal Physiol. 282: F785–794. BRECKLER, J., K. AU, J. CHENG, T. HASSON and B. BURNSIDE. 2000. Novel myosin VI isoform is abundantly expressed in retina. Exp. Eye Res. 70: 121– 134. BUNN, R. C., M. A. JENSEN and B. C. REED. 1999. Protein interactions with the glucose transporter binding protein GLUT1CBP that provide a link between GLUT1 and the cytoskeleton. Mol. Biol. Cell 10: 819–832. INOUE, A., O. SATO, K. HOMMA and M. IKEBE. 2002. DOC-2/DAB2 is the binding partner of myosin VI. Biochem. Biophys. Res. Comm. 292: 300–307. MORRIS, S. M., et al., 2002. Myosin VI binds to and localises with Dab2, potentially linking receptor-mediated endocytosis and the actin cytoskeleton. Traffic 3: 331–341. WELLS, A. L., et al., 1999. Myosin VI is an actin-based motor that moves backwards. Nature 401: 505–508. ROCK, R. S., et al., 2001. Myosin VI is a processive motor with a large step size. Proc. Natl Acad. Sci. USA 98: 13655–13659. DEOL, M. S. and M. C. GREEN. 1966. Snell’s waltzer, a new mutation affecting behaviour and the inner ear of the mouse. Genet. Res. 8: 339–345.
92 RUSSELL, L. B. 1971. Definition of functional units in a small chromosomal segment of the mouse and its use in interpreting the nature of radiation-induced mutations. Mutat. Res. 11: 107–123. 93 CHURCH, D. M., et al., 1994. Isolation of genes from complex sources of mammalian genomic DNA using exon amplification. Nature Genet. 6: 98–105. 94 SELF, T., et al., 1999. Role of myosin VI in the differentiation of cochlear hair cells. Dev. Biol. 214: 331–341. 95 HICKS, J. L., W. M. DENG, A. D. ROGAT, K. G. MILLER and M. BOWNES. 1999. Class VI unconventional myosin is required for spermatogenesis in Drosophila. Mol. Biol. Cell 10: 4341– 4353. 96 AHITUV, N., et al., 2000. Genomic structure of the human unconventional myosin VI gene. Gene 261: 269–275. 97 RAYMENT, I., et al., 1993. Three-dimensional structure of myosin subfragment-1: a molecular motor. Science 261: 50–58. 98 LIANG, Y., et al., 1999. Characterization of the human and mouse unconventional myosin XV genes responsible for hereditary deafness DFNB3 and shaker 2. Genomics 61: 243–258. 99 DOBROVOLSKAIA-ZAVASDKAIA, N. 1928. L’irradiation des testicules et l’heredite chez la souris. Arch. Biol. 38: 457– 501. 100 SNELL, G. D. and L. W. LAW. 1939. A linkage between shaker-2 and wavy-2 in the house mouse. J. Hered. 30: 447. 101 FRIEDMAN, T. B., et al., 1995. A gene for congenital, recessive deafness DFNB3 maps to the pericentromeric region of chromosome 17. Nature Genet. 9: 86–91. 102 BOGER, E. T., J. R. SELLERS and T. B. FRIEDMAN. 2001. Human myosin XVBP is a transcribed pseudogene. J. Muscle Res. Cell Motil. 22: 477–483. 103 WINATA, S., et al., 1995. Congenital non-syndromal autosomal recessive deafness in Bengkala, an isolated Balinese village. J. Med. Genet. 32: 336–343.
References 104 LIBURD, N., et al., 2001. Novel mutations of MYO15A associated with profound deafness in consanguineous families and moderately severe hearing loss in a patient with Smith–Magenis syndrome. Hum. Genet. 109: 535–541. 105 SAEZ, C. G., J. C. MYERS, T. B. SHOWS and L. A. LEINWAND. 1990. Human nonmuscle myosin heavy chain mRNA: generation of diversity through alternative polyadenylylation. Proc. Natl Acad. Sci. USA 87: 1164–1168. 106 SELLARS, J. R. 1999. Myosins. Oxford: Oxford University Press. 107 LALWANI, A. K., et al., 1997. A five-generation family with late-onset progressive hereditary hearing impairment due to cochleosaccular degeneration. Audiol. Neurootol. 2: 139–154.
108 MONTELL, C. and G. M. RUBIN. 1988. The Drosophila ninaC locus encodes two photoreceptor cell specific proteins with domains homologous to protein kinases and the myosin heavy chain head. Cell 52: 757–772. 109 BATTELLE, B. A., et al., 1998. A myosin III from Limulus eyes is a clock-regulated phosphoprotein. J. Neurosci. 18: 4548–4559. 110 DOSE, A. C. and B. BURNSIDE. 2000. Cloning and chromosomal localization of a human class III myosin. Genomics 67: 333–342. 111 PORTER, J. A. and C. MONTELL. 1993. Distinct roles of the Drosophila ninaC kinase and myosin domains revealed by systematic mutagenesis. J. Cell Biol. 122: 601–612. 112 HOLT, J. R., et al., 2002. A chemical-genetic strategy implicates myosin-1c in adaptation by hair cells. Cell 108: 371–381.
27
1
Myosin Myopathies
John P. Konhilas, and Leslie A. Leinwand University of Colorado, Boulder, USA
Originally published in: Molecular Motors. Edited by Manfred Schliwa. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52730594-0
1 Introduction
Muscle contraction is ultimately dependent on the coordinated interaction of thick and thin filaments contained within each sarcomere. Myosin is the central component of the muscle thick filament. The myosin head, or cross-bridge, extends outward from the thick filament backbone and contains the catalytic site and the actin-binding domain. The binding to actin and subsequent ATP-dependent conformational change in cross-bridge orientation forces the interdigitated thick and thin filaments to slide past one another and is responsible for generating the power stroke and transmitting the force necessary for muscle contraction. Over the past decade, numerous mutations in the myosin molecule have been linked to muscle myopathies. Myosin mutations are most common in the heart giving rise to either dilated or hypertrophic cardiomyopathies. The best characterized of these inherited diseases is Familial Hypertrophic Cardiomyopathy (FHC). FHC is an autosomal dominant disease characterized by heterogeneous anatomic and histologic features including left ventricular hypertrophy, myofibrillar disarray, and increased interstitial collagen. Patients suffering from FHC have a variable clinical outcome usually associated with contractile dysfunction that can range from being benign and relatively symptom free to sudden death. Although mutations in many sarcomeric components can lead to FHC, the majority are missense mutations in the β-myosin heavy chain (MyHC) genes [1]. To date, at least 60 mutations in MyHC have been described and are associated with the clinical manifestation of FHC in humans. Although the precise mechanisms responsible for the development of myosin myopathy remain unclear, it most certainly involves alterations in force production as a result of these mutations. In support of this idea, analysis of the crystallographic Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Myosin Myopathies
Fig. 1 Myosin molecule structure and location of FHC mutations. (A) Crystal structure of chicken skeletal S1 showing the actinbinding domain, ATP binding pocket, and placement of the essential (ELC) and regulatory light chains (RLC). (B) Schematic of
a single myosin molecule. The locations of some of the FHC mutations are indicated by the inverted arrows (). (C) Linear representation of the full length rat MHCwith the relative location of each subunit.
structure of the myosin head reveals that most, but not all mutations, including the skeletal myosin mutation [2], map to the head and neck region of the myosin molecule in close proximity to the functional domains of the motor (see Fig. 1; [3, 4]). The locations of these mutations suggest an interference with either the catalytic activity or actin–myosin interaction leading to an alteration in the generation of power necessary for muscle contraction. From this model of pathogenesis, however, it is difficult to reconcile two distinct phenotypes such as the development of hypertrophic cardiomyopathy (HCM) or dilated cardiomyopathy (DCM) from mutations holding similar positions in the myosin head [1, 5]. In addition, this model does not readily explain FHC induced by mutations in the myosin rod [3]. It seems logical to conclude that mechanism by which these rod mutations initiate the FHC disease process appear to involve aspects of myosin function independent of myosin motor activity. Much of our understanding of how mutations in the sarcomere can result in a condition such as FHC, DCM or skeletal myopathy has emerged from a wide range of approaches including analysis of the enzymatic and motility properties of mutated myosin molecules and the phenotypic and genetic characterization of transgenic mice expressing these mutant alleles. In this chapter, we will examine the functional characteristics of specific mutant myosins and transgenic models harboring these same mutations. We will further describe the molecular basis of FHC with a discussion of mutations in myosin-interacting proteins. Finally, although the majority of this discussion focuses on cardiac-based myopathies because
2 Cardiac Myosin Heavy Chains
of the extensive analysis of this disease, we also discuss skeletal muscle myopathies resulting from mutations in myosin and myosin-interacting proteins.
2 Cardiac Myosin Heavy Chains 2.1 MyHC Structure and Function
Striated muscle MyHC isoforms exhibit striking sequence homology. Thus, in considering the mechanisms whereby mutations in myosin cause myopathy, it is important to review what is known about the structure and functional domains of the cardiac myosins. Conventional myosin or myosin II is the main component of the muscle thick filament and accounts for 30% of the total myofibrillar proteins. Myosin is a hexameric molecule consisting of two heavy chains and two nonidentical light chains (MLC). The myosin rod is an α-helical coiled coil, termed light meromyosin (LMM), arranged in parallel along the axis of the thick filament (Fig. 1). It is linked via a hinge region termed subfragment-2 (S2) to a globular head, or subfragment-1 (S1). Myosin S1 is at the N-terminus of the heavy chain and contains the catalytic site and the actin binding domain. Whereas the S1 subunit of myosin hydrolyzes ATP and can bind actin, LMM mediates filament assembly [7] and provides binding sites for myosin-associated proteins such as myosin binding protein C and titin [8, 9]. In the mammalian heart, two isoforms (α-MyHC and β-MyHC) with 93% amino acid identity are expressed [10]. Three distinct myosins (V1 , V2 , and V3 ) have been described in the adult heart. V1 and V3 are homodimers of α-MyHC and (βMyHC, respectively. V2 myosin is the α/β heterodimer. In mammals, the relative proportion of each polypeptide depends on the species, developmental stage and physiological or pathophysiological status of the myocardium. Whereas β-MyHC predominates in the human heart, small rodents express 90–100% α-MyHC in the adult myocardium. Intermediate between the rodent and human myocardium, the adult rabbit heart is comprised of approximately 70% β-MyHC [11]. In all mammals, β-MyHC is the dominant isoform during the embryonic stages. However, α-MyHC expression is essential during mouse development as demonstrated by the lethality of the α-MyHC null mutation [12]. Around birth in small rodents, there is shift to the α-MyHC with complete silencing of P-MyHC expression [13]. Shifts of the cardiac MyHC isoforms have been demonstrated in response to hormonal changes, aging, exercise and pathology. For example, hypothyroidism induces a shift to β-MyHC in both rabbits and rats [11, 14]. Similarly, pressure overload brought about by aortic banding, myocardial infarction, or other pathological stressors significantly increases β-MyHC content in the myocardium [15]. Transgenic mouse hearts with mutations in sarcomeric proteins demonstrate a MyHC isoform switch to predominantly β-MyHC [16, 17]. MyHC isoform shifts were thought to be of no significance in humans since the normal heart was
3
4 Myosin Myopathies
previously believed to be composed entirely of β-MyHC [5, 19]. However, evidence has been presented indicating that α-MyHC comprises ∼10% of total MyHC in non-failing hearts and becomes virtually undetectable in the failing myocardium [20, 21]. Interestingly, in contrast to the induction of β-MyHC by pathologic stimulation, exercise induces expression of α-MyHC in rats [22]. Despite significant structural and sequence homology, the two isoforms exhibit distinct functional characteristics. Early studies illustrated that myosin composed of α-MyHC exhibits two to three times the actin-activated and calcium-stimulated myosin ATPase activity when compared to myosin composed of primarily β-MyHC [11, 23, 24]. Moreover, actin filament velocities of each isoform in an in vitro motility assay show parallel differences [11, 23]. These functional differences are achieved independent of alterations in the Ca2+ sensitivity of tension and maximum tensiongenerating capability [19]. The results with isolated proteins are consistent with observations that intact cardiac muscle preparations with α-MyHC have increased unloaded shortening velocities compared to β-MyHC in both papillary muscles from thyroxine treated (hyperthyroid) rabbits [26] and intact cardiac trabeculae from PTUtreated (hypothyroid) rats [14]. Skinned cardiac myocytes expressing α-MyHC have augmented loaded shortening and power output when compared to those composed of primarily β-MyHC [27]. Consistent with these observations, the rates of submaximal and maximal tension development are lower (60 %) coupled with an increase in the half-time for relaxation in skinned myocardium expressing β-MyHC from that of controls [19]. Furthermore, the isozyme characteristics are not affected by β-adrenergic treatment suggesting that these properties are due to the intrinsic differences in cross-bridge kinetics between α-MyHC and β-MyHC [14, 19]. The increased kinetics of α-MyHC compared to β-MyHC are not, however, without an energetic cost. Warshaw and colleagues illustrated in the in vitro motility assay that β-MyHC has greater average isometric force-generating capacity than α-MyHC [11]. Coupled with a slowed actin filament velocity and reduced myosin ATPase, β-MyHC undergoes a slower transition from the force-generating to the non-force-generating state once strongly bound. Thus, the prolonged detachment rate of β-MyHC suggests a greater economy of force production, a similar conclusion reached by the same group using an in vitro motility assay [23]. Consistent with this idea, Holubarsch et al. [28] showed that left ventricular papillary muscles from PTU-treated rats produce less heat, relative to controls, when contractile heat was normalized to force [28]. A direct measurement of force-dependent ATP consumption in skinned rat cardiac preparations yielded similar results as demonstrated by a linear decrease in tension cost with increasing β-MyHC isoform content relative to α-MyHC [29]. Taken together, these mechanical data predict that myocardium composed primarily of α-MyHC would develop tension, or pressure, at a greater rate (due to enhanced myosin ATPase activity and increased power output) than myocardium expressing predominantly β-MyHC. Furthermore, it could be argued that the extended half-time for relaxation in myocardial preparations of β-MyHC [19] should
2 Cardiac Myosin Heavy Chains
similarly slow the rate of relaxation in an intact ventricle. Indeed, this has been demonstrated in several studies in which hypothyroidism in the mouse results in a decrease in the time derivative of both pressure development (+ dP/dt) and relaxation (– dP/dt) in isolated work-performing hearts [7, 31] and intact closed-chest preparations [32]). Additional effects independent of MyHC expression confound the experimental models of hyperthyroidism and, similarly, hypothyroidism. To address the specific role of β-MyHC in the intact myocardium without the accompanying hormonal effects of thyroid hormone manipulation, a transgenic murine model was generated that expresses 12% β-MyHC in adult ventricular tissue [33]. The depression in myofibrillar ATPase activity determined from transgenic hearts correlates with the significant decrease in systolic function assessed by a Langendorff preparation [33]. Thus, it appears that MyHC isoform expression in the intact heart does have functional consequences even in the absence of additional alterations. An important observation from these studies is that expression of β-MyHC in the heart does not result in an overt cardiac pathology [33]. Therefore, the implication is that the isoform switch to β-MyHC, in and of itself, does not contribute to the pathogenesis of cardiomyopathy in general and that HCM is not caused purely by a reduction in the ATPase activity of the myosin molecule. Instead, in light of an enhanced economy of force production associated with β-MyHC, an increase in β-MyHC content is, at least initially, an adaptive response to the energetic demands in the context of a compromised myocardium. 2.2 Cardiac Muscle Regulation and Disease
Cardiac function is ultimately dependent on the ability of the myofilaments to generate force, which, in turn, is dependent on the amount of cytosolic Ca2+ , phosphorylation state, and isoform content of the cardiac myocyte [14, 35]. Thus, in an attempt to understand the development of myosin myopathies resulting from mutations, it is important to elucidate the relationship between myosin function and these additional factors involved in the regulation of muscle contraction. As detailed above, the mechanical properties of the cardiac heavy chain isoforms have a direct impact on cardiac function independent of other signals. On the other hand, factors such as thyroid hormone are involved in both the regulation of MyHC isoform content and cardiac function [32]. In general, an increase in physiologic load forces heart cells to adapt and match the intensity and dynamics of their mechanical activity to the prevailing hemodynamic demands. The beat-by-beat regulation of cardiac output is accomplished through intrinsic and extrinsic feedback regulation that acts in a relatively instantaneous manner. Changes in end-diastolic volume can alter the contractile state of the ventricle through an intrinsic mechanism known as the Frank–Starling law of the heart [36]. Myocardial function can also be tuned to the hemodynamic load via neurohumoral factors such as adrenergic stimulation and subsequent activation of intracellular signaling pathways [35].
5
6 Myosin Myopathies
However, the inability of the myocyte to match cardiac output to the physiological demand that occurs with a chronic hemodynamic load such as hypertension or altered contractile function such as mutant sarcomeric proteins initiates a remodeling process. Central to this remodeling process is myocyte enlargement (hypertrophy) since the adult cardiac myocyte is terminally differentiated and unable to proliferate. Clearly, the development of FHC is dependent on the coordinated integration of the multiple signals to counteract the contractile dysfunction. Very often, however, these signals lead to maladaptations that can exacerbate the myocardial dysfunction [14, 37]. Ultimately, the heart is unable to respond to changes in hemodynamic load resulting in the clinical syndrome of congestive heart failure (CHF) [22]. Clearly, hypertrophy is an adaptive, compensatory mechanism which attempts to match cardiac output to the increase in physiological demand. Yet, hypertrophy resulting from pathological stimuli such as a mutant protein or hypertension is initially adaptive but can become maladaptive and is a leading predictor of CHF. Although it is unclear how myofilament dysfunction resulting from these mutant proteins translates into the complex phenotypes associated with FHC, it is most certainly related to a functional impairment of the contractile apparatus and subsequent integration of the hypertrophic signals arising from this dysfunction as discussed below.
3 Cardiac MyHC Myopathy 3.1 Functional Characterization of MyHC Motor Domain Mutations
To date, the vast majority of mutations in the β-MyHC described in human disease are located in the motor domain of the myosin molecule, i. e. the actin-binding region, the nucleotide binding pocket and the hinge region [4]. The functional impact of several single amino acid substitutions has been assessed in assays such as actin-activated ATPase and in vitro motility. In most instances, the severity of the disease correlates with the functional defects in motor activity as measured by these assays. For example, patients with an arginine substituted by a glutamine at residue 403 (R403Q) generally have a poor clinical prognosis [29]. Accordingly, actin-activated ATPase activity of the R403Q mutation expressed in baculovirus or mammalian cells is greatly reduced [40–42]. In addition, this mutation results in a velocity as measured by the in vitro motility assay that is nearly 5-fold less than normal myosin [42]. A reduction in this parameter was also found in myosin purified from cardiac and skeletal muscle biopsies with this same mutation as well as other myosin mutations (T124I, Y162C, G256E, R870H, and L908V) [11, 44]. Moreover, skinned fibers from patients with the R403Q mutation exhibit lowered force/stiffness ratio and depressed velocity of shortening and power output [44]. A summary of the mechanical and/or kinetic properties of additional mutations is presented in Table 1.
3 Cardiac MyHC Myopathy Table 1 In vitro functional analysis of myosin mutations
Mutation
Source
T124I Y162C R249Q G256E R403Q R453C V606M G741R R870H L908V
Human, HMM/S1 Human HMM/S1 Human, HMM/S1 Human, HMM/S1, Mouse HMM/S1 Human, HMM/S1 Human Human Human
ATPase
↓ ↓ ↓ ↓, ↔
filament velocity ↓ ↓↓ ↓ ↔, ↓ ↓↓, ↑ ↓ ↓ ↓ ↓ ↓↓, ↑
Ca2+ sensitivity
↔ ↓, ↑ ↓
Study [43] [43] [41, 43] [42–44, 125] [40–44, 53, 125] [41] [40, 43] [44] [43] [43, 125]
HMM, heavy meromyosin; S1, myosin (S1) head domain.
The biochemical properties of additional myosin mutations located in the myosin S1 were also studied and found to correspond to the clinical prognosis [40, 41]. The V606M mutation lies in the actin-binding domain and the R249Q and R453C mutations are located at the base of the ATP-binding pocket [4]. The V606M mutation is typically associated with a benign form of FHC [46] although an unfavorable prognosis has been reported in several families with the V606M [17, 37]. Patients with the R453C mutation present with a severe form of FHC. In contrast, the R249Q mutation has a moderate phenotype with respect to incidence and age of sudden death. These mutations (R249Q, R453C, and V606M) result in a reduction in sliding velocities of actin filaments [41]. In the ATPase assay, the V606M mutation shows the mildest decrease and R453C mutation has the most significant decrease of the three [40]. In contrast, Sata and Ikebe [41] found no difference in the actin-activated ATPase between the V606M mutation and wild-type controls. However, these functional differences may be due to the use of human β-MyHC S1/S2 fragment [41] versus myosin S1 of the rat α-MyHC [40]. Thus, in the majority of cases, the degree of impairment in the mutant myosin motors correlates with clinical prognosis. However, the phenotypes of certain mutations depend largely on additional genetic and non-genetic factors. Support of this is found in a recent study demonstrating that a polymorphic modifier gene can affect the hypertrophic response in two inbred mouse strains expressing a mutant allele (R403Q; [49]). More apparent genetic factors such as ethnic background also contribute to the development of the disease. For example, Korean kindred with the R403Q mutation had predominantly left ventricular outflow obstruction and no sudden cardiac death while Caucasian pedigrees had non-obstructive hypertrophic cardiomyopathy with a high incidence of sudden cardiac death [17]. Thus, the mechanical factors only partially predict the penetrative nature of this disease, a notion consistent with the clinical heterogeneity.
7
8 Myosin Myopathies
3.2 Transgenic Models of Myosin-based FHC
The pathogenesis of myosin-based FHC has been studied using transgenic murine models with the missense mutation (R403Q) in the mouse α-MyHC corresponding to the human β-MyHC mutation [50, 51]. In addition, a transgenic rabbit model of the R403Q mutation in the β-MyHC gene has been described [52]. The murine model described by Geisterfer-Lowrance et al. [50] placed the mutation into the endogenous locus while that described by Geisterfer-Lowrance et al. [50] used a transgenic approach. The latter model contains an additional deletion of amino acids 468–527 bridged by nine non-myosin amino acids [51]. Although each mouse model recapitulates aspects of the myocardial pathology associated with FHC, they exhibit distinct phenotypes. In both mouse models, cardiac histopathology characterized by myocellular disarray and interstitial fibrosis was evident by 3–4 months of age. As early as 5 weeks of age, the mice described by Geisterfer-Lowrance et al. [50] had altered LV diastolic kinetics with delayed pressure relaxation and chamber filling accompanied by decreased cardiac output [50, 53]. They also observed enlarged atria in the absence of ventricular hypertrophy [50]. The diminished actin-activated ATPase activity and enhanced dynamic stiffness in cardiac muscle strips from these hearts is consistent with diastolic dysfunction [54]. Interestingly, in this particular model the cardiac contractile dysfunction at 5 weeks of age preceded any histopathology or morphological abnormalities. Lending further support for the mechanical basis of this disease was the observation that the decay of the intracellular Ca2+ transient was markedly delayed in isolated myocytes from the transgenic mice independent of alterations in the amount of sarcoplasmic reticulum Ca2+ -ATPase [55]. This suggests that altered mechanical activity of the sarcomere not only leads to aberrant ventricular function but also induces the cellular pathology associated with FHC. In contrast to the mice described above, the transgenic mice showed significant left ventricular hypertrophy at 4 months of age [16, 51]. These mice exhibited diastolic dysfunction at both 4 months and 8–10 months of age with either normal or increased systolic function in females. Unlike the female mice, hemodynamic function worsened with age in males [16, 56]. At an older age (8 months), male mice developed progressive left ventricular dilation similar to cases reported in humans [51, 57]. Interestingly, augmentation of catecholamine reactivity in the mutant myocardium by high level over-expression of the β 2 -adrenergic receptor hastened the progression to cardiac failure. In contrast, the male CHF phenotype was prevented by genetic manipulations that blunted the adrenergic signaling pathway [58]. Thus, the underlying cause of the myocellular pathology can be attributed to the mutated protein, yet it appears that the progression of the disease is influenced by additional factors, such as gender, associated with the hypertrophic response. Recently, a transgenic rabbit model of FHC was described [52]. The hearts in these rabbits carry the human β-MyHC with the R403Q mutation. A significant increase in ventricular septal wall thickness and left ventricular mass was seen in the transgenic hearts compared to hearts expressing the wild-type human
4 MyHC Interacting Proteins and FHC 9
transgene. Despite normal ventricular dimensions and systolic function (measured by echocardiography), transgenic mutant hearts displayed significant myocyte disarray and increased interstitial collagen. Premature death was also more common in mutant than in wild-type transgenic animals. The interesting feature of the transgenic rabbit model is that, similar to humans, rabbits express predominantly β-MyHC. Thus, trangenic rabbits may prove a more ideal model over existing transgenic mouse models to study the pathogenesis of human FHC. In one of the transgenic mouse lines with minimal expression of the mutant transgene (0.6–2.5 %), the appearance of histopathology preceded the onset of ventricular hypertrophy suggesting that the myocellular degeneration triggers the hypertrophy [51]. Thus, considering each model, the time course of events is such that, sarcomeric dysfunction results in altered ventricular function triggering remodeling that can also feedback and contribute to this dysfunction. In order to compensate for decreased cardiac output, the hypertrophic response is initiated starting a vicious cycle of positive feedback into this aberrant, maladaptive process. Although transgenic models provide a direct link between altered contractile function and the development of cardiomyopathies, there is no complete model describing the integration of the multiple specific pathways induced by the myofilament dysfunction and leading to the pathologies associated with FHC. Given the impact of additional factors on the progression of this disease [53, 58], it seems obvious that development of FHC depends on the coordinated interaction of multiple signaling pathways. For example, the identification of two mutations in the myosin rod (A1379T and S1776G) that lead to hypertrophic cardiomyopathy cannot be readily explained by a deficit in force transmission [6]. These missense mutations may disrupt the coiled-coil structure and/or thick filament assembly that may lead to ultrastructural disorder, despite the presumably normal force-generating capability of the myosin head [6]. Yet, it is not known how a potential impairment of filament structure or assembly translates into the complications associated with FHC. Potential paradigms in which to identify and distinguish these pathways may reside in the observation that there are marked differences between female and male animals [50, 51, 56]. Moreover, the identification of myosin mutations (S532P and F764L) in the actin-binding domain and hinge region that result in a dilated cardiomyopathy unaccompanied by hypertrophy, also points to a distinct signaling response [5]. Thus, it appears that the performance of the contractile machinery not only initiates this disease process but is central to the integration of the signals that perpetuate and promote this response.
4 MyHC Interacting Proteins and FHC 4.1 The Essential and Regulatory Light chains
The structure of the myosin rod (LMM) allows the specific myosin–myosin interactions necessary for thick filament assembly. In addition, other myosin-interacting
10 Myosin Myopathies
proteins are necessary for normal muscle functioning. The carboxyl terminus of the S1 region forms a single α-helix. Structural and functional data suggest that this region acts as the lever arm, such that the catalytic domain of the myosin head swings relative to this neck domain [59]. Thus, this motion accompanied with the strong acto–myosin interaction during muscle activation is responsible for the displacement between the thick and thin filament necessary for contraction. Each of the two light chains, the essential light chain (ELC) and regulatory light chain (RLC), wrap around this α-helix, encompassing the majority of this region, and may play a role in providing structural support for this segment of the myosin rod [3]. In addition to providing structural support for myosin, the light chains may modulate myofilament contractile properties. The RLC of myosin contains a divalent cation-binding site similar to other Ca2+ binding ‘EF-hand’ proteins like troponin C and calmodulin [60]. Removal of the light chains reduced the velocity of actin filament movement in an in vitro motility assay [61] whereas only removal of the ELC reduced isometric force [62]. Two adjacent serine residues in the N-terminal domain of the RLC are targeted by the Ca2+ /calmodulin-dependent myosin light chain kinase (MLCK). The phosphorylation of these sites by MLCK in smooth and non-muscle myosins plays an integral role in the activation of myosin ATPase and subsequent contractile strength [63]. However, in striated muscle, the primary regulation of muscle contraction is not dependent on RLC phosphorylation. Nevertheless, phosphorylation of the RLC may act to modulate cross-bridge cycling kinetics in contracting muscle. In support of this, RLC phosphorylation results in an increase in Ca2+ -sensitivity of tension development and the rate of force redevelopment in intact and skinned skeletal muscle fibers [64–67]. Szczesna et al. [68] also demonstrated that the increase in Ca2+ -sensitivity was accompanied by an increase in maximal force development and higher Ca2+ -dependent ATPase activity in skinned muscle fibers reconstituted with phosphorylated RLC. However, because RLC phosphorylation significantly increased the rate constant for isometric force redevelopment [64, 65] without a concomitant increase in the rate of isometric force-dependent ATPase consumption [65], it has been proposed that RLC phosphorylation enhances the rate of transition from non-force-generating cross-bridges to force-generating states. The structural basis for these physiological findings may reside in an increase in the mobility of the myosin heads following RLC phosphorylation as demonstrated by disordering of the myosin head array [69, 70]. Although these studies were performed using skeletal muscle preparations, experimental evidence has suggested an important role of RLC phosphorylation in the stretch activation response of cardiac papillary muscles [71]. The high level of RLC phosphorylation in the epicardium will increase tension and decrease the stretch-activation response while resulting in the opposite effect in the endocardium, allowing different muscle mechanics across the ventricular wall [71, 72]. The stretch-activation response coupled with the spatial gradient of phosphorylated RLC may facilitate the torsional contraction observed in the intact ventricle [72]. Moreover, the presence of mutations within the α-helix (the light chain-binding domain) and in both ELC and RLC underlie distinct phenotypes of FHC.
4 MyHC Interacting Proteins and FHC 11
4.2 Myosin Light Chain-based FHC
The first identified mutations in the RLC (A13T, E22K, and P94R) and the ELC (M149V) were associated with a form of FHC characterized by mid-ventricular cavity obstruction due to papillary muscle hypertrophy [73]. However, a more typical phenotype of hypertrophic cardiomyopathy dominated by increased left ventricular mass and abnormal contractile function was subsequently found in patients with the RLC mutations F18L and R58Q [74]. Interestingly, these mutations were not associated with cases of sudden death [73, 74]. The mechanism by which the light chain mutations produce the FHC phenotype is unclear but most likely involves an interference of the interaction between the light chains and myosin thereby affecting force generation. Skinned fibers with the E22K mutation had an increased Ca2+ -sensitivity of tension and enhanced disordering of the myosin heads [69]. Furthermore, RLC mutations (A13T, F18L, E22K, R58Q, and P95A) reduced Ca2+ -binding with the largest reduction observed in the F18L mutation and complete obliteration in the R58Q mutation [67]. Moreover, phosphorylation of the RLC altered the Ca2+ affinity to varying degrees in wild-type RLC and the RLC mutants [67]. In the context of the intact sarcomere, a transgenic mouse in which the human ELC mutation (M149V) was inserted into the murine genome recapitulated FHC [71]. However, if the homologous mutation was made in the context of the murine gene locus, there was histopathology consistent with FHC but without the characteristic hypertrophic response [75]. Instead, the hearts from these mice were smaller as measured by heart weight-to-body weight ratios. This occurred in conjunction with an increased myofibrillar ATPase activity and the Ca2+ -sensitivity of tension and a concomitant decrease in maximum power output [75]. Sanbe et al. [75] also reported that unlike patients with the E22K RLC mutation as described above, transgenic mice carrying the same mutation exhibited no hypertrophy and no pathology. To determine the functional impact of myosin mutations that disrupt the interaction of the light chains and myosin, a transgenic mouse was generated with a deletion in the light chain-binding domain of the α-cardiac MyHC [76]. Skinned myocytes and multicellular preparations from transgenic hearts exhibited decreased Ca2+ -sensitivity of tension and rates of relaxation [76]. These hearts displayed an increase in anterior wall thickness whicht translated into an overall increase in heart mass. Histopathological features were consistent with FHC including small vessel coronary disease, myocellular disarray, and interstitial fibrosis [76]. In addition, these mice had a significant valvular pathology seen in a subset of patients with FHC [76]. Similar to the MyHC mutations, the in vitro functional characteristics of light chain mutations do not necessarily predict the myocardial phenotype harboring the corresponding mutations. Moreover, light chain mutations in transgenic models do not always recapitulate human FHC indicative of the phenotypic diversity seen in the clinical realm. Again, the implication is that the pathogenesis of FHC
12 Myosin Myopathies
involves the integrative response to multiple signaling pathways and modifying factors. 4.3 Myosin Binding Protein C-Based FHC
Another major myosin accessory protein present in nearly all vertebrate striated muscle is myosin-binding protein C (MyBP-C). MyBP-C is relatively large (∼130 kDa) and belongs to the intracellular immunoglobulin superfamily. It is composed of repeated Ig and fibronectin domains [77]. The C-terminal region of MyBP-C also interacts with the LMM and with titin, presumably anchoring the protein to the thick filament complex [8, 78]. The N-terminal portion binds to the myosin S2 segment in close proximity to the lever arm of the myosin head [79]. Despite an undefined, yet necessary role of MyBP-C in thick filament assembly [80–82], a role for MyBP-C as a physiological regulator of cardiac contractility is emerging in the literature [83–86]. The influence of MyBP-C on actomyosin ATPase is dependent on ionic strength, the isoform of myosin, and the presence of the light chains [82, 87, 88]. Another important issue apart from the effect on actomyosin ATPase is the physiological role of MyBP-C phosphorylation. Within the N-terminal region that interacts with the myosin S2 segment are phosphorylation sites targeted by cAMPdependent protein kinase (PKA) and Ca2+ /calmodulin-dependent kinase (CaMK) [89, 90]. Introduction of non-phosphorylated recombinant cardiac MyBP-C into skinned muscle fiber preparations reduced Ca2+ -activated maximal force and increased the Ca2+ -sensitivity of force development compared to MyBP-C in the phosphorylated state [84]. However, recent studies demonstrated that PKA treatment of skinned myocardium from transgenic mice that undergo PKA-dependent phosphorylation of MyBP-C in the absence of TnI phosphorylation had no effect on the Ca2+ -sensitivity of force development or Ca2+ -activated maximal force [91, 92]. Despite an undetermined functional role, phosphorylation of MyBP-C has also been shown to have a structural effect on both the myosin heads [93] and myofilament lattice structure [94]. FHC has also been linked to mutations in cardiac MyBP-C [95]. In general, patients with mutations in MyBP-C have later onset, lower penetration, a better prognosis and longer life expectancies compared to most MyHC mutations and very often remain asymptomatic. A knock-in mouse model that carries a shortened MyBP-C at the N-terminus lacking the mutant site identified in FHC [95] shows no pathology despite an increase in Ca2+ -sensitivity of force [96]. The majority of these mutations, however, result in truncations of the MyBP-C molecule at the C-terminus [95] which is the region that interacts with the myosin rod and with titin [78, 97]. Given the interactions of the C-terminus with myosin and titin, the mutated complexes could theoretically disrupt sarcomeric structure and interfere with normal contractile function of the heart potentially leading to a hypertrophic response. This idea is supported by transgenic models in which a truncated MyBP-C
4 MyHC Interacting Proteins and FHC 13
expressed in the heart elicited aspects of FHC [86, 98, 99]. Similar to the human form of MyBP-C-based FHC, the pathology associated with these mutations was quite subtle. Despite an increase in the Ca2+ -sensitivity of force and a reduction in the maximum power output in both permeabilized cardiac fibers or myocytes, hypertrophy was not evident in the hearts of transgenic mice expressing a mutant MyBP-C that lacks both the myosin- and titin-binding domains [99]. A similar mechanical effect accompanied by cardiac hypertrophy was found in hearts that expressed a mutant MyBP-C that lacks the myosin-binding domain but retains the titin-binding domain [86]. In both models, the mutant peptide was incorporated into the sarcomere, albeit in an aberrant fashion. Interestingly, there were no changes in myocyte and interstitial morphology yet striking structural abnormalities such as deficits in sarcomere organization and arrangement, were exhibited [86, 99]. When the mutant protein was expressed in a homozygous fashion, there was profound pathology characterized by a dilated cardiomyopathy at birth that progressed into significant cardiac hypertrophy by 8–12 weeks of age [98]. Despite well-organized sarcomeres, the hearts had depressed contractile function and a prominent histopathology including myocyte hypertrophy, myofibrillar disarray, and fibrosis consistent with the hypertrophic response [98]. Similarly, inactivation of the MyBP-C gene and thus elimination from the cardiac sarcomere resulted in significant hypertrophy, depressed contractile function, and a characteristic hypertrophic histopathology [100]. In the MyBP-C null hearts, sarcomere striation patterns were visible although the Z-lines appeared out of register [100]. Although the functional impact of MyBP-C-based mutations cannot be ignored, based on these studies, the notion of MyBP-C as an essential component in sarcomere assembly must be modified, at a minimum, to a modulatory or regulatory role. In support of this, FHC-associated mutations in the MyBP-C-binding domain of the MyHC S2 segment drastically reduced MyBP-binding without affecting the coiledcoil structure of myosin or myofibrillar integrity [79]. Again, this suggests that the contractile dysfunction associated with MyBP-C-based FHC is associated with a functional impairment rather than an assembly deficit. 4.4 Titin-based Familial Hypertrophic Cardiomyopathy
In striated muscle, titin (also known as connectin) spans from the Z-line to the M-line of the sarcomere and binds tightly with the thick filament [9, 101]. Titin has been targeted as an essential contributor to the passive elasticity of striated muscle, and the variable extensibility of titin isoforms may explain, in part, the differences in passive stress between cardiac and skeletal muscle [102–105]. Although a multidomain polypeptide, only the I-band region is functionally extensile [102, 103, 106]. Within this region is a segment rich in proline (P), glutamate (E), valine (V), and lysine (K) residues (the so-called PEVK segment) that is serially flanked by immunoglobulin-like (Ig) domains [107]. In addition, two differentially spliced extensile regions, the N2A/N2B elements that join the tandem Ig domains with the PEVK element, are included in this region and may explain the variation in passive
14 Myosin Myopathies
stiffness of cardiac myocytes [104]. During passive stretch, these regions do not stretch in parallel but instead tandem Ig and PEVK segments extend sequentially with the Ig domain associated with small stretch and low passive force while the PEVK domain associated with greater stretch and passive force increases [103, 108]. The contribution of titin to passive tension arises from the extensibility properties of these specific domains. Likewise, titin acts as scaffolding for thick filament assembly as suggested by its interactions with several sarcomeric components. The carboxyl terminus of titin extends into the A-band and binds near the end of the LMM region of the myosin tail forming a stiff structure relative to the I-band segment [9, 109]. This region of titin contains domain patterns that reflect myosin head repeats, suggesting titin may have a role in determining thick filament structure and arrangement of the myosin heads [109]. Consistent with this structural role, titin interacts with an M-band protein, myomesin, an association that is regulated by the phosphorylation state of a serine residue within this region of myomesin [101]. Also within the carboxy terminal region, titin binds to MyBP-C, a protein that has been implicated in the structural organization of the thick filament as mentioned above [97, 99]. Interestingly, titin can associate with actin and this interaction is limited to the Z-line. This presumably maintains high stiffness of the relaxed cardiac myofibril [9, 110], suggesting both a structural and physiological role for titin. Within the Z-line, titin forms a ternary complex with actin and α-actinin, an important Z-line component [111]. Although the functional consequence of this association remains to be elucidated, specific titin mutations in this region linked to the development of FHC may provide a testable framework to study this interaction [112]. Recently, a mutation in titin was identified in patients with a typical hypertrophic cardiomyopathy who had no known mutations in other genes known to cause FHC [112]. This mutation (R740L) is within the binding domain of αactinin in the Z-line region. The hypertrophic response in these patients may be empirically related to the observation that this titin mutation increased binding affinity to α-actinin [112]. However, Satoh et al. [113] also mapped mutations to the Z-line binding region of the titin gene in patients presenting with a familial form of dilated cardiomyopathy (DCM). The two mutations (V54M and A743V) decreased the affinity of titin to α-actinin and another Z-line protein, T-cap/telethonin as measured by a yeast two-hybrid assay [113]. Two additional mutations (Q4053ter and S4465D) found in FHC patients were localized to N2A extensible domains [113]. Although the mechanical properties of the mutated titins have yet to be defined, it is likely that both the HCM and the DCM have arisen from a functional impairment in force transmission from Z-line to Z-line mediated through the titin molecule.
5 Myosin-based Myopathies in Skeletal Muscle
As indicated by the discussion above, the relationship between mutations in sarcomeric components of cardiac muscle and FHC are well defined. Mutations in
5 Myosin-based Myopathies in Skeletal Muscle 15
sarcomeric proteins of the skeletal contractile apparatus can also lead to a severe myopathic phenotype. A majority of skeletal muscle myopathies are recessive although some actin and myosin mutations display a dominant inheritance [2, 114]. There have also been a number of de novo mutations identified. Skeletal myopathies are generally characterized by structural abnormalities, hypotonia and severe muscle weakness that can result in early death (within the first year of life) usually due to respiratory failure. Also, skeletal muscle mutations typically lead to muscle degeneration rather than the myocellular hypertrophy which is usually observed in FHC [115, 116]. These congenital myopathies typically display one of two distinct phenotypes: nemaline myopathy, which is identified by the appearance of intranuclear and/or sarcoplasmic rod structures, termed nemaline bodies, consisting of proteins from the Z-line and thin filament, and actin myopathy, which is distinguished from nemaline myopathies by the presence of large sub-sarcolemmal accumulations of thin filaments composed primarily of actin [114, 117]. Whereas mutations in actin can lead to both actin and nemaline myopathies [114], patients with mutations in tropomyosin [118], troponin T [119], and nebulin [120] exhibit a pure nemaline myopathy. Another group of inheritable skeletal myopathies are classified as hereditary inclusion-body myopathies (HIBM). In general, muscle fibers from patients with HIBM have morphological features dominated by rimmed vacuoles and filamentous inclusion bodies [115]. A novel mutation in the skeletal MyHC-IIa gene has been identified in patients with HIBM and is inherited in an autosomal dominant fashion [2, 121]. The histopathological findings are heterogeneous depending on age, with adolescents having variability in fiber size, central nuclei and focal disorganization; older patients show progressive muscle weakness, dystrophic changes and inclusion bodies [2]. This mutation is a missense mutation leading to an amino acid substitution at residue 706 (E to K) and is located in the highly conserved motor domain of the myosin head. The manifestation of the pathology strongly suggests a severe deficit in mechanical capacity of these myofilaments as has been demonstrated by the incorporation of point mutations in skeletal MyHC genes in Drosophila, C. elegans, and Dictyostelium or by targeted disruption of fast MyHC IIb or IId in mice [122]. A muscular dystrophy described in mice has been linked to a mutation in the region that links the Ig with the PEVK domain in the skeletal isoform of titin [123]. While the heterozygote mice are unaffected by this mutation, homozygotes develop progressive muscle degeneration and severe structural abnormalities [123]. As detailed above, each of these domains is important in the passive properties of both skeletal and cardiac muscle [103, 108]. Given that the Ig and PEVK domains are stretched sequentially and not in parallel, a mutation in the linker region may disrupt this transition in passive stretch. In addition, this mutation is in the calpain binding domain of titin. Calpain is a highly autolytic, skeletal muscle-specific protease that is presumably stabilized by its binding to titin. The significance of this interaction is supported by mutations in the calpain gene found in patients exhibiting human limb-girdle muscular dystrophy characterized by pelvic girdle weakness, atrophic pattern of muscle involvement, lax abdominal muscles and
16 Myosin Myopathies
mobility difficulties [124]. Thus, the etiology of this mutation as a muscular dystrophy may be due to the autolytic activity of calpain, abnormal force transmission, or both.
6 Conclusions
The discovery of sarcomeric mutations underlying human cardiac and skeletal myopathies has greatly accelerated our understanding of the how myofilament contractile dysfunction can induce a diseased state. The ability to recapitulate many aspects of the disease state in transgenic models has hastened this learning process. One of the more confounding issues is the fact that the same mutations can induce a clinically heterogeneous phenotype in humans. Similarly, the same mutation inserted into the murine genome does not exactly mimic the human disease. For one thing, murine physiology is very different from that of larger mammals. In addition, our ability to study cardiac physiology although improving, is limited when compared to the study of larger mammals [125]. Clearly, the context and physiological environment play a critical role in determining the phenotype. Nevertheless, the transgenic mouse model, including the many other tools described in this chapter, play a critical role in describing the molecular pathogenesis of the disease. The elucidation of the signaling pathways that participate in the remodeling process associated with these myopathies will provide the necessary link between sarcomeric mutations, muscle dysfunction, and the manifestation of the disease.
References 1 SEIDMAN, J. G. and SEIDMAN, C. 2001. The genetic basis for cardiomyopathy: From mutation identification to mechanistic paradigms. Cell 104: 557–567. 2 DARIN, N., et al. 1998. Autosomal dominant myopathy with congenital joint contractures, ophthalmoplegia, and rimmed vacuoles. Ann. Neurol. 44: 242–248. 3 RAYMENT, I., et al. 1993. Three-dimensional structure of myosin subfragment-1: A molecular motor. Science 261: 50–58. 4 RAYMENT, I., et al. 1995. Structural interpretation of the mutations in the beta-cardiac myosin that have been implicated in familial hypertrophic cardiomyopathy. Proc. Natl Acad. Sci. USA 92: 3864–3868.
5 KAMISAGO, M., et al. 2000. Mutations in sarcomere protein genes as a cause of dilated cardiomyopathy. N. Engl. J. Med. 343: 1688–1696. 6 BLAIR, E., et al. 2002. Mutations of the light meromyosin domain of the beta-myosin heavy chain rod in hypertrophic cardiomyopathy. Circ. Res. 90: 263–269. 7 MCLACHLAN, A. D. and KARN, J. 1982. Periodic charge distributions in the myosin rod amino acid sequence match cross-bridge spacings in muscle. Nature 299: 226–231. 8 OKAGAKI, T., et al. 1993. The major myosin-binding domain of skeletal muscle mybp-c (c protein) Resides in the cooh-terminal, immunoglobulin c2 motif. J. Cell Biol. 123: 619–626.
References 9 SOTERIOU, A., et al. 1993. A survey of interactions made by the giant protein titin. J. Cell Sci. 104 (Pt. 1): 119–123. 10 MCNALLY, E. M., et al. 1989. Full-length rat alpha and beta cardiac myosin heavy chain sequences. Comparisons suggest a molecular basis for functional differences. J. Mol. Biol. 210: 665–671. 11 VANBUREN, P., et al. 1995. Cardiac v1 and v3 myosins differ in their hydrolytic and mechanical activities in vitro. Circ. Res. 77: 439–444. 12 JONES, W. K., et al. 1996. Ablation of the murine alpha myosin heavy chain gene leads to dosage effects and functional deficits in the heart. J. Clin. Invest. 98: 1906–1917. 13 LYONS, G. E., et al. 1990. Developmental regulation of myosin gene expRes.sion in mouse cardiac muscle. J. Cell Biol. 111: 2427–2436. 14 de TOMBE, P. P. and TER KEURS, H. E. D. J. 1991. Lack of effect of isoproterenol on unloaded velocity of sarcomere shortening in rat cardiac trabeculae. Circ. Res. 68: 382–391. 15 NADAL-GINARD, B. and MAHDAVI, V. 1989. Molecular basis of cardiac performance. Plasticity of the myocardium generated through protein isoform switches. J. Clin. Invest. 84: 1693–1700. 16 FREEMAN, K., et al. 2001. Progression from hypertrophic to dilated cardiomyopathy in mice that express a mutant myosin trans-gene. Am. J. Physiol. Heart Circ. Physiol. 280: H151–H159. 17 TARDIFF, J. C., et al. 1999. Cardiac troponin t mutations Res.ult in allele-specific phenotypes in a mouse model for hypertrophic cardiomyopathy J. Clin. Invest. 104: 469–481. 18 BOUVAGNET, P., et al. 1989. Distribution pattern of alpha and beta myosin in normal and diseased human ventricular myocardium. Basic Res. Cardiol. 84: 91–102. 19 SCHIAFFINO, S., et al. 1984. Myosin changes in hypertrophied human atrial and ventricular myocardium. A correlated immunofluorescence and quantitative immunochemical study on
20
21
22
23
24
25
26
27
28
29
30
serial cryosections. Eur. Heart J. 5 (Suppl. F): 95–102. LOWES, B. D., et al. 1997. Changes in gene expRes.sion in the intact human heart. Down-regulation of alpha-myosin heavy chain in hypertrophied, failing ventricular myocardium. J. Clin. Invest. 100: 2315–2324. MIYATA, S., et al. 2000. Myosin heavy chain isoform expRes.sion in the failing and nonfailing human heart. Circ. Res. 86: 386–390. SCHAIBLE, T. F. and SCHEUER, J. 1985. Cardiac adaptations to chronic exercise. Prog. Cardiovasc. Dis. 27: 297–324. HARRIS, D. E., et al. 1994. Smooth, cardiac and skeletal muscle myosin force and motion generation assessed by cross-bridge mechanical interactions in vitro. J. Muscle Res. Cell Motil. 15, 11–19. LITTEN, R. Z., 3rd, et al. 1982. Altered myosin isozyme patterns from pRes.sure-overloaded and thyrotoxic hypertrophied rabbit hearts. Circ. Res. 50: 856–864. FITZSIMONS, D. P., et al. 1998. Role of myosin heavy chain composition in kinetics of force development and relaxation in rat myocardium. J. Physiol. (Lond.) 513: 171–183. PAGANI, E. D. and JULIAN, F. J. 1984. Rabbit papillary muscle myosin isozymes and the velocity of muscle shortening. Circ. Res. 54: 586–594. HERRON, T. J., et al. 2001. Loaded shortening and power output in cardiac myocytes are dependent on myosin heavy chain isoform expRes.sion. Am. J. Physiol. Heart Circ. Physiol. 281: H1217–H1222. HOLUBARSCH, C., et al. 1985. The economy of isometric force development, myosin isoenzyme pattern and myofibrillar atpase activity in normal and hypothyroid rat myocardium. Circ. Res. 56: 78–86. RUNDELL, V. L., et al. 2002. Cardiac myofilament tension cost in rats: Relation to mhc isoform content. Biophys. J. 82: 397a. BRITTSAN, A. G., et al. 1999. The effect of isoproterenol on phospholamban-deficient mouse hearts
17
18 Myosin Myopathies
31
32
33
34
35
36
37
38
39
40
41
with altered thyroid conditions. J. Mol. Cell. Cardiol. 31, 1725–1737. KISS, E., et al. 1998. Thyroid hormone-induced alterations in phospholamban-deficient mouse hearts. Circ. Res. 83: 608–613. LORENZ, J. N. and ROBBINS, J. 1997. Measurement of intraventricular pressure and cardiac performance in the intact closed-chest anesthetized mouse. Am. J. Physiol. 272: H1137–H1146. TARDIFF J. C., et al. 2000. Expression of the beta (slow)-isoform of mhc in the adult mouse heart causes dominant-negative functional effects. Am. J. Physiol. Heart Circ. Physiol. 278: H412–H419. de TOMBE, P. P. and SOLARO, R. J. 2000. Integration of cardiac myofilament activity and regulation with pathways signaling hypertrophy and failure. Ann. Biomed. Eng. 28: 991–1001. SOLARO, R. J. and RARICK, H. M. 1998. Troponin and tropomyosin: Proteins that switch on and tune in the activity of cardiac myofilaments. Circ. Res. 83: 471–480. STARLING, E. H. 1918. Linacre Lecture on the Law of the Heart, Cambridge, 1915. New York: Longmans, Green, and Co. VIKSTROM, K. L., et al. 1998. Hypertrophy, pathology, and molecular markers of cardiac pathogenesis. Circ. Res. 82: 773–778. FRANCIS, G. S. and COHN, J. N. 1990. Heart failure: Mechanisms of cardiac and vascular dysfunction and the rationale for pharmacologic intervention. FASEB J. 4: 3068–3075. GEISTERFER-LOWRANCE, A. A., et al. 1990. A molecular basis for familial hypertrophic cardiomyopathy: A beta cardiac myosin heavy chain gene missense mutation. Cell 62: 999–1006. ROOPNARINE, O. and LEINWAND, L. A. 1998. Functional analysis of myosin mutations that cause familial hypertrophic cardiomyopathy. Biophys. J. 75: 3023–3030. SATA, M. and IKEBE, M. 1996. Functional analysis of the mutations in the human
42
43
44
45
46
47
48
49
cardiac beta-myosin that are Res.ponsible for familial hypertrophic cardiomyopathy. Implication for the clinical outcome. J. Clin. Invest. 98: 2866–2873. SWEENEY, H. L., et al. 1994. Heterologous expRes.sion of a cardiomyopathic myosin that is defective in its actin interaction. J. Biol. Chem. 269: 1603–1605. CUDA, G., et al. 1997. The in vitro motility activity of beta-cardiac myosin depends on the nature of the beta-myosin heavy chain gene mutation in hypertrophic cardiomyopathy. J. Muscle Res. Cell Motil. 18: 275–283. LANKFORD, E. B., et al. 1995. Abnormal contractile properties of muscle fibers expressing beta-myosin heavy chain gene mutations in patients with hypertrophic cardiomyopathy. J. Clin. Invest. 95: 1409–1414. PALMITER, K. A., et al. 2000. R403q and l908v mutant beta-cardiac myosin from patients with familial hypertrophic cardiomyopathy exhibit enhanced mechanical performance at the single molecule level. J Muscle Res. Cell Motil. 21: 609–620. WATKINS, H., et al. 1992. Characteristics and prognostic implications of myosin missense mutations in familial hypertrophic cardiomyopathy. N. Engl. J. Med. 326: 1108–1114. FANANAPAZIR, L. and EPSTEIN, N. D. 1994. Genotype-phenotype correlations in hypertrophic cardiomyopathy. Insights provided by comparisons of kindreds with distinct and identical beta-myosin heavy chain gene mutations. Circulation 89: 22–32. HAVNDRUP, O., et al. 2001. The val606met mutation in the cardiac beta-myosin heavy chain gene in patients with familial hypertrophic cardiomyopathy is associated with a high risk of sudden death at young age. Am. J. Cardiol. 87: 1315–1317. SEMSARIAN, C., et al. 2001. A polymorphic modifier gene alters the hypertrophic Response in a murine model of familial hypertrophic cardiomyopathy. J. Mol. Cell Cardiol. 33: 2055–2060.
References 50 GEISTERFER-LOWRANCE, A. A., et al. 1996. A mouse model of familial hypertrophic cardiomyopathy. Science 272: 731–734. 51 VIKSTROM, K. L., et al. 1996. Mice expressing mutant myosin heavy chains are a model for familial hypertrophic cardiomyopathy. Mol. Med. 2: 556–567. 52 MARIAN, A. J., et al. 1999. A transgenic rabbit model for human hypertrophic cardiomyopathy. J. Clin. Invest. 104: 1683–1692. 53 MCCONNELL, B. K., et al. 2001. Comparison of two murine models of familial hypertrophic cardiomyopathy. Circ. Res. 88: 383–389. 54 BLANCHARD, E., et al. 1999. Altered cross-bridge kinetics in the alphamhc403/+ mouse model of familial hypertrophic cardiomyopathy. Circ. Res. 84: 475–483. 55 KIM, S. J., et al. 1999. An alpha-cardiac myosin heavy chain gene mutation impairs contraction and relaxation function of cardiac myocytes. Am. J. Physiol. 276: H1780–H1787. 56 OLSSON, M. C., et al. 2001. Gender and aging in a transgenic mouse model of hypertrophic cardiomyopathy. Am. J. Physiol. Heart Circ. Physiol. 280: H1136–H1144. 57 TEN CATE, F. J. and ROELANDT, J. 1979. Progression to left ventricular dilatation in patients with hypertrophic obstructive cardiomyopathy. Am. Heart J. 97: 762–765. 58 FREEMAN, K., et al. 2001. Alterations in cardiac adrenergic signaling and calcium cycling differentially affect the progression of cardiomyopathy. J. Clin. Invest. 107: 967–974. 59 UYEDA, T. Q., et al. 1996. The neck region of the myosin motor domain acts as a lever arm to generate movement. Proc. Natl Acad. Sci. USA 93: 4459–4464. 60 COLLINS, J. H. 1976. Homology of myosin dtnb light chain with alkali light chains, troponin c and parvalbumin. Nature 259: 699–700. 61 LOWEY S., et al. 1993. Skeletal muscle myosin light chains are essential for physiological speeds of shortening. Nature 365: 454–456.
62 VANBUREN, P., et al. 1994. The essential light chain is required for full force production by skeletal muscle myosin. Proc. Natl Acad. Sci. USA 91: 12403–12407. 63 BRESNICK, A. R. 1999. Molecular mechanisms of nonmuscle myosin-ii regulation. Curr. Opin. Cell Biol. 11: 26–33. 64 METZGER, J. M. and MOSS, R. L. 1992. Myosin light chain 2 modulates calcium-sensitive cross-bridge transitions in vertebrate skeletal muscle. Biophys. J. 63: 460–468. 65 SWEENEY, H. L. and STULL, J. T. 1990. Alteration of cross-bridge kinetics by myosin light chain phosphorylation in rabbit skeletal muscle: Implications for regulation of actin-myosin interaction. Proc. Natl Acad. Sci. USA 87: 414–418. 66 SWEENEY, H. L., et al. 1993. Myosin light chain phosphorylation in vertebrate striated muscle: Regulation and function. Am. J. Physiol. 264, C1085–C1095. 67 SZCZESNA, D., et al. 2001. Familial hypertrophic cardiomyopathy mutations in the regulatory light chains of myosin affect their structure, Ca2+ binding, and phosphorylation. J. Biol. Chem. 276: 7086–7092. 68 SZCZESNA, D., et al. 2002. Phosphorylation of the regulatory light chains of myosin affects ca(2+) sensitivity of skeletal muscle contraction. J. Appl. Physiol. 92: 1661–1670. 69 LEVINE, R. J., et al. 1998. Structural and functional Res.ponses of mammalian thick filaments to alterations in myosin regulatory light chains. J. Struct. Biol. 122: 149–161. 70 YANG, Z., et al. 1998. Changes in interfilament spacing mimic the effects of myosin regulatory light chain phosphorylation in rabbit psoas fibers. J. Struct. Biol. 122: 139–148. 71 VEMURI, R., et al. 1999. The stretch-activation Res.ponse may be critical to the proper functioning of the mammalian heart. Proc. Natl Acad. Sci. USA 96: 1048–1053. 72 DAVIS, J. S., et al. 2001. The overall pattern of cardiac contraction depends
19
20 Myosin Myopathies
73
74
75
76
77
78
79
80
81
on a spatial gradient of myosin regulatory light chain phosphorylation. Cell 107: 631–641. POETTER, K., et al. 1996. Mutations in either the essential or regulatory light chains of myosin are associated with a rare myopathy in human heart and skeletal muscle. Nature Genet. 13: 63–69. FLAVIGNY, J., et al. 1998. Identification of two novel mutations in the ventricular regulatory myosin light chain gene (myl2) associated with familial and classical forms of hypertrophic cardiomyopathy. J. Mol. Med. 76: 208–214. SANBE, A., et al. 2000. In vivo analysis of an essential myosin light chain mutation linked to familial hypertrophic cardiomyopathy. Circ. Res. 87: 296–302. WELIKSON, R. E., et al. 1999. Cardiac myosin heavy chains lacking the light chain binding domain cause hypertrophic cardiomyopathy in mice. Am. J. Physiol. 276: H2148–H2158. EINHEBER, S. and FISCHMAN, D. A. 1990. Isolation and characterization of a cdna clone encoding avian skeletal muscle c-protein: An intracellular member of the immunoglobulin superfamily Proc. Natl Acad. Sci. USA. 87: 2157–2161. ALYONYCHEVA, T. N., et al. 1997. Isoform-specific interaction of the myosin-binding proteins (mybps) with skeletal and cardiac myosin is a property of the c-terminal immunoglobulin domain. J. Biol. Chem. 272: 20866–20872. GRUEN, M. and GAUTEL, M. 1999. Mutations in beta-myosin s2 that cause familial hypertrophic cardiomyopathy (fhc) abolish the interaction with the regulatory domain of myosin-binding protein-c. J. Mol. Biol. 286: 933–949. KORETZ, J. F. 1979. Effects of c-protein on synthetic myosin filament structure. Biophys. J. 27: 433–446. MAW, M. C. and ROWE, A. J. 1986. The reconstruction of myosin filaments in rabbit psoas muscle from solubilized myosin. J. Muscle Res. Cell Motil. 7: 97–109.
82 WINEGRAD, S. 1999. Cardiac myosin binding protein c. Circ. Res. 84: 1117–1126. 83 FLAVIGNY J., et al. 1999. Cooh-terminal truncated cardiac myosin-binding protein c mutants Res.ulting from familial hypertrophic cardiomyopathy mutations exhibit altered expRes.sion and/or incorporation in fetal rat cardiomyocytes. J. Mol. Biol. 294: 443–456. 84 KUNST, G., et al. 2000. Myosin binding protein c, a phosphorylation-dependent force regulator in muscle that controls the attachment of myosin heads by its interaction with myosin s2. Circ. Res. 86: 51–58. 85 WINEGRAD, S. 2000. Myosin binding protein c, a potential regulator of cardiac contractility [editorial; comment]. Circ. Res. 86: 6–7. 86 YANG, Q., et al. 1999. In vivo modeling of myosin binding protein c familial hypertrophic cardiomyopathy. Circ. Res. 85: 841–847. 87 HARTZELL, H. C. 1985. Effects of phosphorylated and unphosphorylated c-protein on cardiac actomyosin atpase. J. Mol. Biol. 186: 185–195. 88 MOOS, C. and FENG, I. N. 1980. Effect of c-protein on actomyosin atpase. Biochim. Biophys. Acta 632: 141–149. 89 GARVEY J. L., et al. 1988. Phosphorylation of c-protein, troponin i and phospholamban in isolated rabbit hearts. Biochem. J. 249: 709–714. 90 HARTZELL, H. C. and GLASS, D. B. 1984. Phosphorylation of purified cardiac muscle c-protein by purified camp-dependent and endogenous ca2+-calmodulin-dependent protein kinases. J. Biol. Chem. 259: 15587–15596. 91 FENTZKE, R. C., et al. 1999. Impaired cardio-myocyte relaxation and diastolic function in transgenic mice expressing slow skeletal troponin i in the heart. J. Physiol. (Lond.) 517: 143–157. 92 KENTISH, J. C., et al. 2001. Phosphorylation of troponin i by protein kinase a accelerates relaxation and cross-bridge cycle kinetics in mouse
References
93
94
95
96
97
98
99
100
101
102
ventricular muscle. Circ. Res. 88: 1059–1065. WEISBERG, A. and WINEGRAD, S. 1998. Relation between cross-bridge structure and actomyosin atpase activity in rat heart. Circ. Res. 83: 60–72. KONHILAS, J. P., et al. 2000. Alterations in myofilament lattice spacing do not underlie the frank-starling law of the heart. Circulation 102: II-215. NIIMURA, H., et al. 1998. Mutations in the gene for cardiac myosin-binding protein c and late-onset familial hypertrophic cardiomyopathy. N. Engl. J. Med. 338: 1248–1257. WITT, C. C., et al. 2001. Hypercontractile properties of cardiac muscle fibers in a knock-in mouse model of cardiac myosin-binding protein-c. J. Biol. Chem. 276: 5353–5359. FREIBURG, A. and GAUTEL, M. 1996. A molecular map of the interactions between titin and myosin-binding protein c. Implications for sarcomeric assembly in familial hypertrophic cardiomyopathy. Eur. J. Biochem. 235: 317–323. MCCONNELL, B. K., et al. 1999. Dilated cardiomyopathy in homozygous myosin-binding protein-c mutant mice. J. Clin. Invest. 104: 1235–1244. YANG, Q., et al. 1998. A mouse model of myosin binding protein c human familial hypertrophic cardiomyopathy. J. Clin. Invest. 102: 1292–1300. HARRIS, S. P., et al. 2002. Hypertrophic cardiomyopathy in cardiac myosin binding protein-c knockout mice. Circ. Res. 90: 594–601. OBERMANN, W. M., et al. 1997. Molecular structure of the sarcomeric m band: Mapping of titin and myosin binding domains in myomesin and the identification of a potential regulatory phosphorylation site in myomesin. EMBO J. 16: 211–220. GRANZIER, H. L. and IRVING, T. C. 1995. Passive tension in cardiac muscle: Contribution of collagen, titin, microtubules, and intermediate filaments. Biophys. J. 68: 1027–1044.
103 LINKE, W. A., et al. 1998. Nature of pevk-titin elasticity in skeletal muscle. Proc. Natl Acad. Sci. USA 95: 8052–8057. 104 TROMBITAS, K., et al. 2000. Extensibility of isoforms of cardiac titin: Variation in contour length of molecular subsegments provides a basis for cellular passive stiffness diversity [in process citation]. Biophys. J. 79: 3226–3234. 105 WANG, K., et al. 1991. Regulation of skeletal muscle stiffness and elasticity by titin isoforms: A test of the segmental extension model of Res.ting tension. Proc. Natl Acad. Sci. USA 88: 7101–7105. 106 LABEIT, S., et al. 1997. The giant protein titin. Emerging roles in physiology and pathophysiology. Circ. Res. 80, 290–294. 107 LABEIT, S. and KOLMERER, B. 1995. Titins: Giant proteins in charge of muscle ultrastructure and elasticity. Science 270: 293–296. 108 GRANZIER, H., et al. 1996. Nonuniform elasticity of titin in cardiac myocytes: A study using immunoelectron microscopy and cellular mechanics. Biophys. J. 70: 430–442. 109 HOUMEIDA, A., et al. 1995. Studies of the interaction between titin and myosin. J. Cell Biol. 131: 1471–1481. 110 LINKE, W. A., et al. 1997. Actin-titin interaction in cardiac myofibrils: Probing a physiological role. Biophys. J. 73: 905–919. 111 SORIMACHI, H., et al. 1997. Tissue-specific expRes.sion and alpha-actinin binding properties of the z-disc titin: Implications for the nature of vertebrate z-discs. J. Mol. Biol. 270: 688–695. 112 SATOH, M., et al. 1999. Structural analysis of the titin gene in hypertrophic cardiomyopathy: Identification of a novel disease gene. Biochem. Biophys. Res. Commun. 262: 411–417. 113 ITOH-SATOH, M., et al. 2002. Titin mutations as the molecular basis for dilated cardiomyopathy. Biochem. Biophys. Res. Commun. 291: 385–393. 114 NOWAK, K. J., et al. 1999. Mutations in the skeletal muscle alpha-actin gene in patients with actin myopathy and nemaline myopathy. Nature Genet. 23: 208–212.
21
22 Myosin Myopathies
115 ASKANAS, V. and ENGEL, W. K. 1998. Sporadic inclusion-body myositis and hereditary inclusion-body myopathies: Current concepts of diagnosis and pathogenesis. Curr. Opin. Rheumatol. 10: 530–542. 116 BURTON, E. A. and DAVIES, K. E. 2002. Muscular dystrophy–reason for optimism? Cell 108: 5–8. 117 MARSTON, S. B. and HODGKINSON, J. L. 2001. Cardiac and skeletal myopathies: Can genotype explain phenotype? J. Muscle Res. Cell Motil. 22: 1–4. 118 LAING, N. G., et al. 1995. A mutation in the alpha tropomyosin gene tpm3 associated with autosomal dominant nemaline myopathy nem1. Nature Genet. 10: 249. 119 JOHNSTON, J. J., et al. 2000. A novel nemaline myopathy in the amish caused by a mutation in troponin t1. Am. J. Hum. Genet. 67: 814–821. 120 PELIN, K., et al. 1999. Mutations in the nebulin gene associated with autosomal
121
122
123
124
125
recessive nemaline myopathy. Proc. Natl Acad. Sci. USA 96: 2305–2310. MARTINSSON, T., et al. 2000. Autosomal dominant myopathy: Missense mutation (glu-706 ? lys) in the myosin heavy chain iia gene. Proc. Natl Acad. Sci. USA 97: 14614–14619. RUPPEL, K. M. and SPUDICH, J. A. 1996. Structure-function analysis of the motor domain of myosin. Annu. Rev. Cell Dev. Biol. 12: 543–573. GARVEY, S. M., et al. 2002. The muscular dystrophy with myositis (mdm) mouse mutation disrupts a skeletal muscle-specific domain of titin. Genomics 79: 146–149. BUSHBY, K. M. 1999. Making sense of the limb-girdle muscular dystrophies. Brain 122 (Pt. 8): 1403–1420. ROBBINS, J. 2000. Remodeling the cardiac sarcomere using transgenesis. Annu. Rev. Physiol. 62: 261–287.
1
Protein Sequence Variants: Resources and Tools Yum Lina Yip, Maria Livia Famiglietti, Elisabeth Gasteiger, and Amos Bairoch Swiss Institute of Bioinformatics (ISB/SIB), Gen`eve, Switzerland
Originally published in: Biomedical Applications of Proteomics. Edited by JeanCharles Sanchez, Garry L. Corthals and Denis F. Hochstrasser. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30807-1
1 Introduction
The mapping of the human genome has provided a large volume of data, which, rather than representing the conclusion of years of work, forms the basis for future projects such as the characterization of all human genes and the study of the role of genetic diversity in determining health or disease status [1]. Very few traits have a single genetic origin. Most depend on the combination of various genetic factors, together with environmental influences. For complex diseases it is a very arduous task to decipher the genetic background contributing to the manifestation of the disease. A major challenge in medical genetics is, indeed, to understand the relationship between genetic and phenotypic variation. A first step to reach this goal is the identification and characterization of all variants of human genes. Since most genomic diversity is due to single nucleotide polymorphisms (SNPs), which represent about 90% of human DNA sequence variation [2], a particular effort is currently being put into their detection and mapping. High-throughput genotyping techniques [3, 4] are generating huge amount of data [5–13] whose collection and integration with other medical, biological, and biochemical information is crucial in unravelling disease mechanisms. The need to develop “a considered, integrated, and systematic approach to the problem of mutation documentation” was recognized as early as 1994, and resulted in the HUGO Mutation Database Initiative [14]. This initiative evolved into the Human Genome Variation Society (http://www.hgvs.org/), with the major aim of providing up-to-date documentation of genetic variations via a system of cross-linked specialized and central databases whose content and structure follow uniform guidelines and recommendations [15, 16]. Presently, a variety of databases [17] and tools of search and information retrieval are available on the Web. In this Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Sequence Variants: Resources and Tools
article, we illustrate the main topics concerning the annotation process and present the main online resources documenting human variation, including genomic and protein databases, and related search tools. Finally, we describe different analysis tools for in silico identification and characterization of protein variants. Variation at the protein level can be due to a plethora of different modifications. In this article, the term “protein variants” indicates protein sequences containing amino acid substitutions, due to non-synonymous coding SNPs.
2 Medical Protein Annotation
The expression “medical protein annotation” is used for the process of collecting, reviewing, and integrating information on disease-associated genes/proteins, susceptible of inclusion in a database for public use. It can cover different fields. Hence, disease phenotypes, clinical information, therapeutic information, genetic and structural data, might be more or less relevant depending on the main focus of the database and the needs of the particular scientific community to which the database is addressed. Nevertheless, independent of the content of a database, the general major challenge in the annotation process and, at the same time, obligation toward the scientific community is to make sure that the information is correct, accurate, clear, and not misleading. In the annotation of sequence variants, two issues are particularly relevant to the problem of data quality in mutation databases, namely nomenclature and the effect of sequence variations. A consistent and widely accepted nomenclature is indispensable for unambiguous and accurate reporting of sequence variants. Recommendations have been published for the description of simple and complex sequence changes [18, 19]. However, due to the complexity of the subject, discussions are necessary in order to improve variant designation rules. Official recommendations are available online and scientists are invited to submit remarks and nomenclature suggestions for complicated cases that are not yet covered (http://www.genomic.unimelb.edu.au/mdi/mutnomen/). Correct variants nomenclature in publications and database submissions is an essential basis for the quality control that each scientific database annotator should undertake before including variation data in a database. In particular, errors due to wrong assignment of base position, incorrectly deduced amino acid changes, erroneous use of the one-letter amino acid code, and typographical mistakes can be recognized if the name of the variant follows the basic nomenclature rules [18, 19]: (1) variants are best described at the DNA level, using either a genomic or a cDNA reference sequence with the accession and version number from a primary sequence database; (2) the description should be preceded by a letter indicating the type of reference sequence used (“g” for genomic, “c” for cDNA, “m” for mitochondrial,); (3) the nucleotide numbering should designate the A of the ATG of the initiator methionine as nucleotide +1; (4) it should be clearly indicated at which level sequence variation has been analyzed: “r” or “p” should precede variant
3 Databases
descriptions if RNA or protein sequences have been analyzed. For example, given the cDNA sequence with accession number AF151980 and version AF151980.2, the designation c.226C>A indicates the substitution of cytosine at nucleotide position 226 by an adenosine, corresponding to the deduced amino acid change p. Arg76Ser (p. R76S) [20]. A curation tool for checking consistency between described nucleotide changes and corresponding deduced amino acid changes [21] is available online (http://www.ebi.ac.uk/cgi-bin/mutations/check.cgi). A second major difficulty in the annotation process is to describe the effect of a sequence variation. Establishing a disease association, especially for missense variants, may not be an easy task, as recently outlined [22, 23]. A few criteria can help in designating a missense variant as disease-causing: Ĺ The extent of DNA analysis: the entire gene should be analyzed to rule out the possibility of missing the actual relevant mutation. Ĺ Segregation of the disease phenotype with the mutation within pedigrees is a good indication of pathogenicity. Ĺ The type of missense change: substitutions of a residue by amino acids of different physical character are more likely to influence protein structure, particularly if they affect residues conserved among species. Ĺ The frequency of the variant, since a mutation occurring at a frequency less than 1% in the population is rare and may be disease-causing. Ĺ Functional analysis of the variant is the definitive test to confirm its phenotypic effect.
The retinal-specific ATP-binding cassette transporter gives a good example of the complexity of this issue. The corresponding gene (ABCA4 or ABCR) is mutated in retinal disorders [24–27]. However, while the association with Stargardt disease [28, 29], cone-rod dystrophy [25, 26, 30, 31], and retinitis pigmentosa [30] is widely accepted, the role of ABCR mutations in age-related macular dystrophy is still unclear due to studies that either attest [32, 33] or confute [34] association with the disease. The annotation process in such cases should consist of the objective and complete reporting of the conflicting data, providing database users with references from which the information has been obtained.
3 Databases
Many databases that describe sequence variations, in the form of mutations and polymorphisms, and their relationship to phenotype or disease are currently freely available from the Internet. It is expected that medical research will benefit greatly from these resources. To make the best use of the data, it is essential for a user to know what is available and how to approach the overwhelming data space. In this section, we aim to provide a brief description of the main databases documenting
3
4 Protein Sequence Variants: Resources and Tools
human variation and discuss their utility and limitations. Some techniques of search will also be suggested. 3.1 Central Databases
In general, there are two main types of mutation database: (1) central databases, which contain information on variation across the whole genome [35]; and (2) specialized databases, which concentrate on variations within a single gene or in genes associated with a single phenotype. If the central databases always contain a large amount of information, they can differ from each other in several aspects. First, the source of data and how they are accepted in the database can vary. Some databases are intensively curated and may contain only peer-reviewed, published information. Others accept submitted data directly from researchers and do little validation. There are also some databases that only collect data from many sources and present them with a common format at a single site. The quality of the databases and the degree of validity of the presented data is therefore variable. Secondly, the type of information provided can differ. For example, some databases may put their focus on disease phenotype and clinical relevance, while others record the biochemical consequences of the variations. In the following, we provide examples of some major central databases in which human mutation information can be documented. 3.1.1 Online Mendelian Inheritance in Man (OMIM) OMIM (http://www.ncbi.nlm.nih.gov/omim/) is a knowledgebase of human genes and genetic disorders [36, 37]. Originally, it was based on the textbook version of MIM [38]. OMIM entries are created for each unique genetic disorder or gene for which sufficient information exists. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to PubMed references, genetic databases (DNA and protein sequence), or other central and specialized databases. It also provides a highly detailed map-viewer and information on inheritance patterns. OMIM can be searched by title, OMIM number, allelic variants, text, references, clinical synopsis, gene map disorder, and contributors. In terms of information on allelic variants, OMIM does not contain all known variations in a gene, but concentrates on the disease-causing mutations and some most common polymorphisms. It should also be noted that the variants are germline mutations and are distinct from somatic mutations, as seen in non-familial cases of cancer, which are generally not included. Allelic variants can be investigated for each entry under the “Allelic variants” section of the “Table of contents” of OMIM. This section contains a list of variants, which includes a brief description of the mutation in terms of amino acid change. It also describes the discovery and clinical details of a particular mutation. Besides the richness of textual information provided in each entry, one of the major advantages of the OMIM database is that it can serve as a gateway to a wealth of genetic databases through its relevant links. However, the current textual
3 Databases
nature of OMIM makes automatic parsing of information difficult. Furthermore, OMIM is mainly directed to geneticists and is therefore not specifically concerned with sequence-oriented data. This results in a low correspondence between allelic variants and sequence information and makes it difficult to faithfully map allelic variants onto a sequence. 3.1.2 The Human Gene Mutation Database (HGMD) The Human Gene Mutation Database [39, 40] (http://archive.uwcm.ac.uk/ uwcm/mg/hgmd0.html) is a collection of published germline mutations associated with inherited diseases [41]. HGMD comprises single base-pair substitutions in coding, regulatory, and splicing regions of human nuclear genes, as well as deletions, duplications, insertions, repeat expansions, combined micro insertions and deletions (indels), and complex rearrangements. HGMD usually only includes mutations with obvious phenotypic consequences (disease-causing mutations). However, some polymorphisms with a statistically significant association to disease have been included on the basis that they significantly reduce the expression of a given gene or they alter the functional activity of its protein product. HGMD can be searched either by disease, gene name, or gene symbol. All HGMD entries have a unique accession number. Each entry provides a reference to the first published report of a mutation, the number of entries by mutation type, associated disease, the gene name, symbol, and chromosomal location. HGMD also provides links to OMIM, GDB, GenAtlas, the HUGO nomenclature, and specialized databases. 3.1.3 The SNP Databases Apart from OMIM and HGMD, where the database’s focus is on disease-associated variation, high-throughput SNP data can be documented via the “SNP databases”. In general, these databases serve as repositories and do not provide, at least for the moment, additional information about associated phenotype or biological significance. Two major databases of this type are dbSNP and the Human Genome Variation database (HGVbase). DbSNP (http://www.ncbi.nlm.nih.gov/SNP/) is a repository for single-base nucleotide substitution, short deletion and insertion polymorphisms, and microsatellite repeats from all species [42, 43]. It should be noted that in dbSNP, the use of the term “polymorphism” does not imply minimum allele frequencies (1% of the population) for SNPs. DbSNP contains about 3 million human SNPs as well as 0.5 million from other organisms. The Web-based interface allows flexible searches by gene name and by cross-reference to other databases, such as OMIM or structure databases. HGVbase [44] (http://www.hgvbase.cgb.ki.se/) was constructed to provide a nonredundant collection of all known, both functionally important and silent human DNA variants. Data are derived from different sources such as publications, batch submissions, or information from other Web-based databases. They are integrated in HGVbase on the basis of extensive curation. Records include neutral polymorphisms as well as disease-related mutations. They are grouped in proven and
5
6 Protein Sequence Variants: Resources and Tools
suspected variants. Generally, variants detected only by computational methods are classified as “suspected”. The tag “proven” is used for records that have been experimentally confirmed. New data can be submitted directly to the database by researchers using a Web page submission form. The data are then indexed and presented using the EBI SRS system (http://srs.ebi.ac.uk/) so that search can be performed by SNP identifier (ID), gene name, gene symbol, variation type and so forth. Each HGVbase entry is presented in the context of its surrounding sequence and mapped to the genome sequence. Population allele frequencies, when available, are also presented. In addition, the record provides information about the submitter and publications, genotyping assays and functional prediction, and links to GenBank and other databases. SNP-related information can also be retrieved from the SNP consortium website [45], from the JSNP repository of Japanese SNP data [46], from the Seattle SNPs website (http://www.pga.mbt.washington.edu/), or from the ALFRED [47, 48], which is designed to store and disseminate frequencies of alleles at human autosomal polymorphic sites and multiple defined population samples. Some non-mutationoriented databases, such as the Genome Database (GDB) [49] or the Swiss-Prot Protein Knowledgebase can also provide valuable information on variants. 3.1.4 Advantages and Drawbacks of Central Databases The major advantage of central databases lies in the fact that they usually offer a consistent and well-maintained interface to access a large amount of variation data across the whole genome. As these databases often receive direct funding, the database design can reach a greater sophistication and consequently, flexible search options, together with good data visualization tools, can be made available. As a common base of information about all variations, a centralized database may also allow statistical analysis to be conducted for a spectrum of variations [35]. However, while striving to provide a large amount of data, the central databases fail to have specialized knowledge of any particular locus. The curators are less likely to have close ties to a particular research community and, therefore, data produced in small laboratories or more specific information may be missed out from these databases. 3.2 Specialized Databases
For many years now, efforts have been made to create specialized databases. These efforts are of tremendous value to the biological research community, since they help to organize the massive amounts of data into specific, tractable research areas. Many more types of data, such as molecular structure and function, macromolecular interactions, geographic location of the allele, can be included in these specific databases. Two subcategories of specialized databases can be distinguished, based on their focus: gene-oriented, locus-specific databases (LSDBs) (Table 1) [50], and disease-oriented databases (Table 2).
http://www.mcgill.ca/ androgendb/
http://www.bioinf.uta.fi/ BTKbase 758 entries Last update: September 2002
http://www.genet.sickkids. on.ca/cftr/
http:// www.rubic.rdg.ac.uk/g6pd/ 193 entries Last update: December 2001
Androgen receptor gene mutations database [54]
BTKbase [55]
Cystic fibrosis mutation database
G6PD mutations [62]
1
Website
Database name
Table 1 Examples of locus-specific databases
Published articles; favism website; electronic submission; manual validation
Submissions by authors before publication
Direct electronic submissions by authors
Published articles; curated
Data source Mutations in androgen receptor gene Phenotype description Functional effects References Information on androgen receptor interacting proteins Mutations and polymorphisms in Bruton’s tyrosine kinase gene Phenotype description; population data Technical data Textual description of protein domains Information on mutations effect References Special feature: mutation checker Mutations and polymorphisms in the cystic fibrosis transmembrane conductance regulator Phenotype description; population data References Mutations in G6PD gene Phenotype classification 3D structural data References
Content
(continued)
Indexed by SRS
OMIM GDB Indexed by SRS
OMIM EMBL Swiss-Prot Swiss-Change PubMed Indexed by SRS
PubMed Swiss-Prot EMBL Indexed by SRS
Linksa)
3 Databases 7
http://www.le.ac.uk/ genetics/collagen/ Last update: March 2003 No statistics available online http://www.uta.fi/imt/ bioinfo/KinMutBase/ 320 entries Last update: December 2001
Human type I and III collagen mutation database [57, 58]
KinMutBase [59]
Human PAX6 allelic variant databases [56]
http://www.kcl.ac.uk/ip/ petergreen/ haemBdatabase.html 2511 patient entries Version 12, 2003 http://www.pax6.hgu. mrc.ac.uk/ 228 record Last update: January 2003
Haemophilia B mutation database
Table 1 (continued)
OMIM GDB GeneCards HGMD EMBL Swiss-Prot PubMed Indexed by SRS OMIM PubMed Mutations and polymorphisms in the paired box protein Pax-6 Phenotype description Information on protein domains Information on mutations effect References
Mutations and polymorphisms in COL1A1, COL1A2, and COL3A1 Phenotype description References Mutations in protein kinases genes Phenotype description References
Published articles and electronic data submis sions; automatic curation (MuStar software)
Published articles; curated
Direct submissions to curators
RefSeq Swiss-Prot
Indexed by SRS
Mutations and polymorphisms in factor IX gene Phenotype description Heterogeneous comments References
Direct submissions to coordinators
8 Protein Sequence Variants: Resources and Tools
http://www.tGRAP.uit.no Entries: 10500 Last update: April 2001 http://www.shef.ac.uk/ vwf/ Continual update No statistics available
TGRAP
Mutations and polymorphisms in von Willebrand factor Phenotype description; functional studies References
Electronic submissions
Published articles; curation
Mutations and polymorphisms in the phenylalanine-4-hydroxylase gene Phenotype description; clinical information Population data; haplotype data Mouse models; in vitro expression analysis results Molecular modeling data References Mutations in PHEX gene Phenotype description Information on mutation effects References Mutations in G protein-coupled receptors References
Mutations in nuclear receptors Phenotype description Position in structural alignment
Published articles and personal communications; manual curation
Published articles; SwissProt and other Web-based resources; limited curation
Swiss-Prot OMIM PubMed Indexed by SRS
OMIM
Swiss-Prot NucleaRDB OMIM VDR, PNR, GRR pages PubMed OMIM GeneBank GeneCards Swiss-Prot Indexed by SRS
Only clickable links from an entry to a corresponding relevant entry of another database are described here. Links that are provided by the website to other websites are not included.
a)
VWF database
http:// www.phexdb.mcgill.ca/
http://www.cmbipc60. cmbi.kun.nl:8080/ cgi-bin2/nrmd/nrmd.py 1095 entries Last update: November 2002 http://www.pahdb. mcgill.ca/
PHEXdb
PAHdb [60, 61]
NRMD [63]
3 Databases 9
Website
http://molgen-www.uia. ac.be/ADMutations/
http:// www.wmi.usyd.edu.au: 8080/melanoma.html
http://www.pc4.fsm.it:81/ cardmoc/ Regular update
http://www.globin.cse.psu.edu/ globin/hbvar/menu.html 1165 entries (May 2003) Continual update
Database name
Alzheimer disease mutation database
eMelanoBase [64]
Gene connection for the heart
HbVar [65]
Table 2 Examples of disease-oriented databases
Books on hemoglobin variants and thalassemia; curators for additional entries
Database maintained and regularly updated under the supervision of editors in a European-wide collaborative effort
Submissions by E-mail; curated
Published articles and direct submissions
Data source
Linkage data and mutations underlying inherited arrhythmias Phenotype descriptions In vitro expression data References Hemoglobin variants and thalassemia causing mutations Population data; clinical information Stability data Detection methods
Mutations in genes involved in Alzheimer’s disease Phenotype descriptions Family details References Germline variants in genes involved in familial melanoma susceptibility Population and geopolitical data References
Content
PubMed Indexed by SRS
OMIM GDB GeneBank Swiss-Prot PubMed NCBI nucleotide NCBI protein OMIM Ensembl PubMed
Linksa)
10 Protein Sequence Variants: Resources and Tools
http://www.ucl.ac.uk/ncl/ Last Published articles and update: October 2002 personal communications
NCL mutations
Mutations and polymorphisms in genes responsible for hereditary inflammatory disorders Population data Phenotype description Detection methods References Mutations and polymorphisms in genes involved in muscular dystrophies Gene and protein descriptions Phenotypes description Information on molecular diagnosis techniques References Mutations and polymorphisms in genes involved in neuronal ceroid lipofuscinoses Phenotype description Information on mutation effects References
Indexed by SRS
OMIM LocusLink GDB HGMD PubMed Indexed by SRS
PubMed
Only clickable links from an entry to a corresponding relevant entry of another database are described here. Links that are provided by the website to other websites are not included.
a)
http://www.dmd.nl/
Leiden Muscular Dystrophy Pages
Published articles and direct submissions
http://fmf.igh.cnrs.fr/ infevers/ Electronic submissions; curated
INFEVERS [66]
3 Databases 11
12 Protein Sequence Variants: Resources and Tools
3.2.1 An Example of a Locus-specific Database: the IARC TP53 Database The IARC TP53 mutation dataset [51] (http://www.iarc.fr/p53/index.html) is the largest data set available on the variations of any human gene [52]. The database includes all somatic and germline p53 mutations, as well as polymorphisms, that have been reported in the published literature since 1989. The IARC TP53 database is extensively annotated, in particular with respect to the description of tumor pathologies. It is interesting to note that the “central unit” of the database schema is the cancer patient whose tumor contains a mutation. Therefore, clinical information, as well as information on individual exposure and risk factors is provided for each individual whenever possible. For germline mutations, the data also include the description of the family, information on each family member, information on tumor samples, and reference to the publication in which the family is described. The database can be searched via a Web-based application. The user can identify and select specific sets of data based on queries. The results, such as mutation patterns, codon distribution and tumor spectrum, can be visualized in the form of either graphs or tables. The entire data set, or sets of data selected according to the user’s queries, can be downloaded as tab-limited text. In addition, the website offers a comprehensive user guide, a slide shown on p53 mutations as well as many links to other websites or database entries on TP53. Apart from the IARC TP53 database, other p53 databases are also available. For example, the UMD-p53 database (http://p53.curie.fr/index.html) [53]. Although it has been developed independently from the IARC TP53 database, the UMD-p53 database contains essentially similar entries derived from the published literature. The two databases differ in their format, their annotations (more extensive in the IARC database), and the design of their respective Web-search applications.
3.2.2 An Example of a Disease-oriented Specialized Database: Retina International’s Scientific Newsletter – Mutation Database In contrast to “gene-oriented” specialized databases that focus on gathering mutant information for a single locus, the “disease-oriented” databases contain all variants implicated in a particular disease and these variants can be from different loci. The Retina International’s Scientific Newsletter (http://www.retinainternational.com/sci-news/database.htm) provides an up-to-date synopsis of genes and mutations underlying retinal disorders. The data are organized in three different databases. In the mutation database, the mutations are sorted by gene and are presented following the official mutation nomenclature recommendations [18, 19]. The protein database offers an overview of all proteins directly or indirectly involved in the vision process and in the maintenance of the integrity of retinal cells and tissues. Information derived from the literature and links to other databases (GeneBank, OMIM, LocusLink, Swiss-Prot, MEDLINE) are also provided. The disease database is a collection of various retinal phenotypes and provides information on chromosomal location and the markers linked to a given locus. Finally, an animal model database presents a description of animal models for inherited retinal disorders.
3 Databases
3.2.3 Other Locus-specific Databases There are numerous other LSDBs available on the Internet. Some examples [50, 54– 66] are listed in Tables 1 and 2. It is possible to obtain a nearly exhaustive list on the following website: http://www.expasy.org/alinks.html#Human mut. As the LSDBs were developed independently of each other, they have very different content and structure depending on gene and disease characteristics. In a recent study, a total of 94 independent websites that describe mutations associated with human disease were evaluated according to 80 content criteria defined by the authors [50]. It was found that 54% of the LSDBs studied were easy to use, whereas 11% were hard to follow. The majority of the databases (73%) were displayed through HTML and a certain redundancy can be found among the databases. Not surprisingly, there was extreme heterogeneity between databases and the type of criteria met by them. Clearly, this study provided a strong case for having a certain uniformity of data among the databases in order to make the content as useful as possible. 3.2.4 Advantages and Drawbacks of Specialized Databases Specialized databases have several advantages. First, they often present the best state of knowledge in a particular area, and information for each variant is provided in greater depth. This is because specialized databases are usually set up, maintained, and curated by expert(s) with a particular interest in the gene(s) or field in question. They are consequently very well adapted and responsive to the needs of the corresponding research field. Secondly, they allow a large range of information to be retrieved for a particular gene. This information would otherwise have to be collected from multiple sites. However, because of their smaller size and usually a lack of direct funding, specialized databases may become stagnant and/or disappear. 3.3 The Swiss-Prot Protein Knowledgebase and Information on Disease and Sequence Variations
In this section, a detailed description of the Swiss-Prot Protein Knowledgebase for the use of proteomic medical research will be provided. Although Swiss-Prot is not a disease or mutation-centered database, it stores a wealth of information of interest to the medical community. Swiss-Prot [67] was created in 1986 at the Department of Medical Biochemistry of the University of Geneva. It is now an equal partnership between the European Molecular Biology Laboratory (EMBL) and the Swiss Institute of Bioinformatics (SIB). The database contains data originating from a wide variety of biological organisms. Currently (Release 41.15 of 03.07.2003), there are a total of about 130000 annotated entries from more than 8200 different species. A particular effort, however, is put on the annotation of all known human protein sequences and their mammalian orthologues (The Human Proteomics Initiative) (http://www.expasy.org/sprot/hpi/) [68]. In release 41.15 (03.07.2003), there were 9503 annotated human sequences. These entries were associated with 24448
13
14 Protein Sequence Variants: Resources and Tools
Fig. 1 Excerpt from Swiss-Prot entry P06400 (Nice-Prot format). Various reference blocks, cross-references, variants in the feature table, and part of the sequence have been deleted for the sake of simplicity.
literature references; 23378 experimental or predicted post-translational modifications (PTMs), 3222 splice variants and 15770 variants. The majority (about 60%) of the variants annotated in Swiss-Prot are linked with disease states. A detailed description of the structure of a Swiss-Prot entry (Fig. 1), and of the type of information annotated in the different line types, is available from the Swiss-Prot user manual: http://www.expasy.org/sprot/userman.html. Information on diseases and the characteristics of the variants can be found in various sections of Swiss-Prot, as discussed below.
3 Databases
Fig. 1 (continued)
3.3.1 Gene Names In Swiss-Prot, the name of the gene coding for a specific protein is indicated in a linetype called “GN” (Gene Name). Swiss-Prot uses the gene symbol proposed by the HUGO Gene Nomenclature Committee (HGNC) [69] whenever possible in order to avoid confusion in gene nomenclature, and to facilitate information retrieval from public databases. It also lists, if they exist, symbols that were previously used in literature or database reports.
15
16 Protein Sequence Variants: Resources and Tools
Fig. 1 (continued)
3.3.2 Description of Diseases For proteins involved in disease, the description of the disease can be found in the comment lines (“CC”) prefixed by the token “DISEASE”. Disease descriptions can be very concise, but links to corresponding OMIM entries are provided, allowing the user to retrieve more detailed information. Diseases associated with chromosomal
3 Databases
translocations are also described, and, if known, information on the translocation break point is indicated. Another comment token, “POLYMORPHISM” contains data on alleles and polymorphisms not associated with a specific disease. 3.3.3 Proteins as Therapeutic Drugs Swiss-Prot stores information relevant to the use of specific proteins as therapeutic drugs (pharmaceuticals). For such proteins, the brand name(s), the name(s) of the company(ies) that develops and/or sells the drug(s) and a brief description of the therapeutic use(s) of the protein are provided. This is implemented through a comment topic called “PHARMACEUTICAL”. It should also be noted that the generic name of a protein drug is recorded in the description lines (“DE”) of the relevant entry. While some generic names correspond to the main scientific name (example: insulin, somatotropin), for some others the generic name is one of the synonyms of the official protein name. 3.3.4 Data on Variants One of the major aims of Swiss-Prot is to provide the users with non-redundant data. Therefore, in contrast to other often automatically created databases or repositories which maintain one sequence record per unique sequence and can consequently have up to hundreds of records corresponding to the same gene, Swiss-Prot chooses to display the most common isoform of a protein as its “master” sequence. All variations related to this “master” sequence are annotated in the feature tables using diverse keys according to the origin of the difference (alternative splicing, polymorphism, conflictual sequencing results). Genetic variation data are stored in the feature table (“FT” lines) of the relevant entries using the “VARIANT” key. As Swiss-Prot is a “proteocentric” resource, only data on point mutations, small deletions, or insertions are collected. Frame-shift or nonsense mutations are excluded, as they generally have a deleterious effect on the structure and function of a protein. Each variant annotated in Swiss-Prot is manually checked, and information on disease association and functional effects is provided according to literature reports. If a variant occurs in different disease phenotypes, it is registered only once. However, information on the associated disease phenotypes can be found in the corresponding FT descriptions. Each variant is also given a unique identifier (FTId) to allow a direct link to the relevant entries in disease mutation databases, as well as to provide these databases with a method to implement reciprocal links. While Swiss-Prot is focusing its effort in the annotation of experimentally proven and published variants, sequence differences revealed by sequence alignments are also stored in the database, namely in the feature table using the “CONFLICT” key. Although it cannot be excluded that these data might reflect sequencing errors, they might well correspond to not yet identified variants and can be considered as the Swiss-Prot counterpart of in silico detected polymorphisms stored in SNPs repositories such as dbSNP and HGVbase. Recently, a specific variant web page has been implemented through the ExPASy server [70] to group and provide additional information on each variant (Fig. 2) (Yip
17
18 Protein Sequence Variants: Resources and Tools
Fig. 2 Example of a web page for a Swiss-Prot variant (VAR 011580 in Swiss-Prot entry P06400). The structures of the homology models can be obtained via a hyperlink in the “3D homology model” section.
3 Databases
et al., unpublished data). This page can be accessed via a link provided through the FTId. Apart from general information such as the amino acid change, position and effect of the variant, and its association with diseases, the page provides new structural information of the variant. In particular, when available, 3-D models generated by an automatic homology modeling method can be obtained so that the mutation can be visualized directly on the protein structure (Yip et al., unpublished data). Results of a sequence alignment can also be retrieved to evaluate residue conservation among different species. In addition, the variant web page gives a list of specific literature references, and direct links to relevant dbSNP, HGVbase and OMIM entries. When compared with the information on missense variants listed in other mutation databases, it should be noted that Swiss-Prot does not simply catalogue amino acid changes predicted from nucleotide variations, but it stores, when available, information on direct protein sequencing and characterization including post-translational modifications (PTMs). This is important, as the real effect of missense variants on proteins, such as PTM and structural phenotype [71], cannot be deduced from simple translation of single nucleotide substitutions at the DNA level. In this regard, Swiss-Prot offers links to proteomic databases, for example Swiss 2DPAGE (http://www.expasy.org/ch2d/) [72] and Siena 2DPAGE (http://www.bio-mol.unisi.it/2d/2d.html), and can be regarded as an integration platform between genomics and proteomics. Finally, it should be noted that while the concept of redundancy minimization and the organization of variant data in feature tables have many advantages for human readers, it may represent a certain danger to computer programs which only look at the “master” sequence and disregard the annotation in the feature tables. Similarity or motif search programs, such as BLAST [73] or ScanProsite [74], can miss alternatively spliced isoforms, or disease state isoforms with a high degree of sequence variation. In order to remedy this problem, the program var splic.pl [75] was written to reconstruct alternative sequences as annotated in the feature tables of Swiss-Prot and its computer-annotated supplement TrEMBL. Var splic.pl was conceived in particular for alternative splice isoform sequences, but it can also be used to derive alternative sequences resulting from polymorphisms or disease variants. The FASTA-formatted sequences obtained in this manner can be downloaded from the ExPASy and EBI FTP servers. The program is available free of charge from ftp://ftp.ebi.ac.uk/pub/software/swissprot/. When searching Swiss-Prot or TrEMBL, most sequence search tools on the ExPASy and EBI servers take into account these isoform sequences. 3.3.5 Cross-references Swiss-Prot is directly cross-referenced [76] to about 50 other databases, such as the EMBL/GenBank/DDBJ international nucleotide sequence database [77], the PDB structural database [78, 79], OMIM, MEDLINE/PubMed, the human gene nomenclature database Genew [80], which stores all HGNC-approved gene symbols, the Gene Ontology database (GO) [81], and many others. GO is a database of controlled terms for the description of the molecular function, biological process, and cellular
19
20 Protein Sequence Variants: Resources and Tools
component of gene products. Most of the cross-references from Swiss-Prot to other databases are stored in a line type called “DR”. However, a direct link from the relevant protein entries to specific mutation databases can be found within a comment topic called “DATABASE” (Fig. 1). In addition to these explicit cross-references hard coded in the raw Swiss-Prot entry, human sequence entries also have implicit links to databases such as GeneCards [82] and GeneLynx [83]. Implicit links are not annotated in the “DR” lines of the raw Swiss-Prot entry, but are added automatically when viewing the entry on the ExPASy server. GeneCards and GeneLynx are meta-databases of human genes, which offer concise information about the function of the gene products as well as their involvement in diseases. Both are created by automatically extracting relevant information from diverse internet resources, including Swiss-Prot, MIM, GenAtlas, and GDB. On the ExPASy web server, all explicit and implicit cross-references are implemented as hypertext links that allow users to seamlessly browse from one database to another. 3.3.6 Medical-oriented Keywords In order to facilitate the retrieval of data on medically relevant proteins, some specific medical-oriented keywords stored in the “KW” lines have been created in Swiss-Prot. These are: “Disease mutation”, which is used for sequences in which there is at least one known disease-linked mutation; “Polymorphism”, which is used in each entry containing protein sequence variants for which no disease association has been reported (at the level of the protein sequence), and “Chromosomal translocation”, which is used to indicate proteins whose genes are known to be involved in chromosomal translocations. The keyword “Pharmaceutical” was created to indicate that a protein is used as a therapeutic drug. Other specific keywords such as “Albinism”, “Charcot-Marie-Tooth disease”, “Deafness”, “Diabetes”, “Hemophilia”, “Phenylketonuria”, “Retinitis pigmentosa”, and “Xeroderma pigmentosum” were designed for genetic diseases linked with more than a single gene/protein. This list is constantly expanding, and new disease-related keywords are added while relevant Swiss-Prot entries are being annotated. Keywords such as “Allergen”, “Anti-oncogene”, and “Proto-oncogene” are also available. In summary, the Swiss-Prot database offers a high-quality annotation and a high level of integration with other biomolecular databases [76]. Swiss-Prot is distributed by anonymous FTP (see http://www.expasy.org/sprot/download.html) and can be accessed through the ExPASy (http://www.expasy.org/sprot/) and EBI (http://www.ebi.ac.uk/swissprot/) servers. 3.4 Techniques of Search
With this plethora of databases available through the Internet and each offering specific features [17], the investigator’s needs will clearly determine which database(s) is most suitable for use. For example, a clinician may want to find clinical information about patients with mutations. In this case, a database such as OMIM
3 Databases
can provide the desired information. However, when a researcher studying a single gene or a disease wants to find more biochemical or experimental data about the mutations, well-curated and maintained LSDBs would be preferable for this purpose. In general, it is usually good to start by searching one or more central databases to obtain some general information about mutations in a gene. Then, by using the data or particular links provided in these databases, one can easily move to the LSDBs for more complete and up-to-date data [35]. It is also recommended that researchers should not limit their studies to the use of one particular database. A recent study assessing the utility of seven international public or private SNPs databases (dbSNP, HGVbase, JSNP, GeneSNP, HOWDY, CGAP-GAI, LEELAB) showed that there was very little overlap among the databases [84]. Therefore, researchers who do not explore the entire information space provided by various databases take the risk of missing out information. Moreover, special care should be taken when using databases that solely rely on in silico detection methods of the variants, as they may contain false positives. A comprehensive interrogation of multiple databases is clearly necessary to take full advantage of the available online resources. Several valuable systems are available that can serve as central hubs for querying across different databases. At the European Bioinformatics Institute (EBI), resources related to mutation data are indexed under the Sequence Retrieval System (SRS) [85], which is a powerful program for parsing, indexing, querying, viewing, and linking independent textual databases. The SRS system at EBI indexes more than 40 databases containing information on human nuclear gene mutations. These include many LSDBs, OMIM, variation data from the EMBL and Swiss-Prot sequence databases (EMBLCHANGE and SWISSCHANGE, respectively), and SNP information from dbSNP. In addition, key fields from many LSDBs are combined into a central resource HUMUT, which contains a subset of validated information common to all its component databases with limited redundancy. HUMUT is therefore a federated human mutation database that uses SRS as a tool for analysis. The data in HUMUT are not simply stored in a database, but are organized in a standardized format that allows easy computational analysis of the data. In addition, this database includes cross-links between databases [21, 86]. A similar system is also provided by the NCBI. Entrez is an integrated search and retrieval system for the major databases supported by the NCBI, including PubMed, GenBank, GenPept Sequences, dbSNP, LocusLink, and many others. It enables text searching of databases and provides links to related information from other NCBI-maintained resources [86]. 3.5 Challenges for Databases
With the completion of the human genome map, the research environment has changed dramatically during the last few years. Sequence variations and their biological consequences were once considered as a field of peripheral interest. They are now widely recognized to be one of the major focuses of the post-genomic
21
22 Protein Sequence Variants: Resources and Tools
era. Consequently, mutation databases should respond to the demands coming from the scientific community. The first challenge databases are confronted with is to keep up with new data. This is particularly true for central databases that constantly have to deal with a large amount of heterogeneous information. Electronic submission forms may provide an option to encourage authors to directly deposit data. The development of programs for setting up, managing, and curating databases is also becoming a constant priority. These programs should aim to facilitate data retrieval and storage without diminishing data quality [87, 88]. Some databases also strive to develop automatic or semi-automatic annotation tools. In the case of medical annotation of the entries, an automatic approach is less trivial. In fact, for a complete and correct annotation, a large number of documents have to be retrieved and read, and every entry checked manually. As a result, the implementation of advanced data-mining tools may become more and more frequent [89]. Alternatively, interfaces can be built to allow external experts to edit, or participate in the curation process. The second challenge is to allow the information to be fully interconnected as a whole so that biological knowledge can be derived. Such integration can possibly be achieved on two levels. On the first level, ideally, the various data in different databases should be comprehensively cross-linked so that when starting with an entry from one database, information on the same entry provided by other databases can be accessed easily. On the second level, it should be made possible to query across different databases. Indexing and designing sophisticated data structure, which connects optimally the data objects, could achieve this. Complex queries for discovering indirect relationships among data could then be carried out through an efficient search and retrieval system. The EBI SRS system is an example of the successful implementation of such a system. In the near future, these two levels of data integration will continue to present a formidable intellectual, technical, and logistical challenge. A third challenge for databases is related to the information content. At the present time, most LSDBs have been created with the intention of listing gene mutations and directly relating the genotypes to their phenotype expression. However, as noted by Gottlieb et al., variable expressivity due to somatic mutations or mosaicism can cause phenotypic differences [90]. This suggests that it may be necessary to adapt the databases to incorporate additional data on genetic heterogeneity, and variable expressivity. For example, tissue specificity and the timing of the mutation may have important consequences on the phenotype and should be recorded in the databases. At the same time, a clear distinction between germline and somatic mutations should be made for the entries. Apart from adapting the content of the data, it is essential for the different LSDBs to share a minimum of core elements and strive to provide certain uniformity in data structure. From the survey conducted by Claustres et al., it is encouraging to see that the guidelines provided by the HUGO Mutation Database Initiative are being increasingly accepted and used in the design of new databases. For example, 42% of LSDBs use the HUGO-MDI nomenclature, and 29% of LSDBs follow the complete HUGO-MDI guidelines [50].
4 Analysis Tools in the Context of Protein Variants
The last, but not the least important challenge is to assure the quality of the data. Therefore, even in cases where a degree of automation is implemented for information assembly and annotation, it is necessary to have quality control over the input into the databases. Continuous update and validation of the current database content has to be ensured. Clearly, it goes without saying that the quality of the data you get from a database is only as good as the quality of the input data. Quality assurance is thus essential for the retrieval of most updated biologically relevant information from the databases. It can be envisioned that in the not too far future, central databases and LSDBs will continue to complement each other. Central databases would store a large amount of mutations, each with a minimal set of information. Links to LSDBs, where the major curation job would remain, would provide access to more detailed information for a particular field. Meta-analysis of mutation data across many genes would be available either in central databases or in specially designed retrieval systems.
4 Analysis Tools in the Context of Protein Variants
Currently, most information about variants stored in databases stems from highthroughput genomic projects. Changes in amino acid residues are thus often deduced from changes in the nucleotide sequence. However, advances in the field of proteomics now allow new variants to be identified or deduced nucleotide changes to be confirmed with online proteomic tools. At the same time, there are more analysis and prediction programs being made available through the Internet, which provide researchers with the opportunity to conduct in silico analysis of protein variants. The development of these computational tools is essential because they can be employed on a scale that is consistent with the large number of variants being identified. 4.1 Proteomic Tools for Protein Identification and the Characterization of Variants 4.1.1 Protein Identification Tools Peptide mass fingerprinting provides a common method to identify proteins [91– 93]. The experimental data are a list of peptide mass values obtained from an enzymatic digestion of a protein. The protein is identified by comparing these experimental masses with theoretical masses obtained by an in silico digestion of all proteins in a database. This is most effective for the simple (and often theoretical) scenario where the sample is not post-translationally modified, and does not contain any residues that differ from the “master” sequence, e.g. sequence variants. Both types of sequence alterations considerably influence the mass of a peptide, and an identification method based on an identity check of theoretical and experimental masses is very likely to miss out peptides modified in this manner. In these cases,
23
24 Protein Sequence Variants: Resources and Tools
more sophisticated identification and characterization tools are required. In the following enumeration of such tools, we concentrate on programs that are freely available to the scientific community via a Web interface, or can be downloaded free of charge. Others are only mentioned briefly. 4.1.1.1 Mascot Mascot (http://www.matrixscience.com/cgi/index.pl?page=/ search?form?select.html) is a powerful search engine, which uses mass spectrometry data to identify proteins from primary sequence databases [94]. In particular, the Mascot peptide mass fingerprinting tool has a comprehensive database menu, which includes Swiss-Prot (although not TrEMBL). If Swiss-Prot is selected, the tool searches not only the master Swiss-Prot sequences, but also all splice isoforms, annotated variants and conflicts, and combinations of these. These alternative entries can be recognized in the result by their pseudo-accession numbers, which are of the form P12345-01-02-03. The isoform described by this number would possess the splicing variations belonging to the first annotated alternatively spliced isoform (01), the variant features belonging to the second alternative variant form (02), and the conflicts belonging to the third alternative conflict form (03). 4.1.1.2 SEQUEST The SEQUEST package (http://www.fields.scripps.edu/ sequest/), distributed by Thermo Finnigan, correlates uninterpreted tandem mass spectra of peptides with amino acid sequences from protein and nucleotide databases [95]. A mode to identify amino acid substitutions resulting from SNPs, applicable to both HPLC and microspray tandem mass spectrometry has been investigated, but is not available to the general public. Additional algorithms have been published that address the improvement of identification of mutated proteins using mass spectrometry: SALSA [96] is a patternrecognition algorithm developed to screen large numbers of peptide MS-MS spectra for fragmentation characteristics indicative of specific peptide modifications. MSConvolution and MS-Alignment [97] use the spectral alignment approach. However, to the best of our knowledge, no tools implementing these algorithms are available to the scientific community. 4.1.2 Peptide Characterization Tools Besides identification of proteins with variants, several online tools allow detailed peptide characterization and thus help to determine if a variant is present. 4.1.2.1 FindMod The FindMod tool (http://www.expasy.org/tools/findmod.html) was conceived to identify potential amino acid substitutions (as well as posttranslational modifications) in peptides from mass spectrometry experiments [98]. For a protein identified as a potential candidate, theoretical and experimental masses are compared, and their mass differences are then matched with the mass differences induced by all of the possible substitutions of any of the amino acids in the peptide by any other residue. If a mass match is found, the corresponding amino acid substitution is suggested to the user. If the substitution further corresponds to a sequence variant or conflict annotated for this peptide in the underlying
4 Analysis Tools in the Context of Protein Variants
Swiss-Prot entry, this is highlighted in color. This allows distinguishing between merely potential substitutions and those already observed and documented in the literature, or it might help to confirm a variation that was previously annotated as a conflict. As additional interpretation help, the BLOSUM62 score [99] is provided for each suggested potential amino acid substitution. 4.1.2.2 PFMUTS (ProteinFragmentMUTationS) PFMUTS (http://www.mcs.vuw. ac.nz/∼aleksand/pfmuts.html) is based on the same idea. PFMUTS only accepts raw sequence input, and the analysis cannot, therefore, be combined with the wealth of related information already known and annotated in Swiss-Prot. However, the program allows prediction not only of single, but also potential double mutations of a peptide fragment, i.e. it is possible to find peptides in which two amino acids are replaced by two other amino acids. 4.1.2.3 Mascot In the context of peptide characterization, Mascot (see also above) allows a subset of the matching candidate proteins to be selected for submission to an “error tolerant search”. This can be done using the MS-MS search mode that tries to increase the number of matching peaks by predicting potential unspecific cleavage, post-translational modifications, and primary sequence variations. Only the amino acid substitutions that result from single base substitutions are taken into account, which is an attempt to reduce the number of suggested substitutions of an otherwise potentially very long list. As can be seen from the above list, the number of available identification tools that allow the discovery of new SNPs, based on peptide mass fingerprinting alone, is still limited. However, if one candidate protein is available, several characterization programs are available that allow the score for this candidate protein to be refined by investigating potential amino acid substitutions. This is probably due to the huge combinatorial overhead that results from the large number of possible substitutions: If every amino acid of every peptide of every protein in a database is potentially substituted, there is a high risk of obtaining a large number of unrelated, false-positive hits. Identification algorithms have to be extremely robust and well designed in order to deal with this problem in a sophisticated manner. 4.2 Tools for Analyzing and/or Predicting the Effects of Protein Variants
Many protein variants cause disease in humans, and these disease-causing mutations often tend to occur in structurally and functionally important sites. However, without in-depth biochemical or functional studies, it is often difficult to distinguish causative mutations from polymorphisms with little or no clinical significance. Techniques that could predict the functional consequences of a protein variant would therefore be invaluable for the clinical management of the related disease, as well as for medical research. Currently, there are two major approaches for the analysis of the impact of protein variations on the structure/function of proteins.
25
26 Protein Sequence Variants: Resources and Tools
4.2.1 Sequence-based Analysis or Prediction Tools The first strategy involves sequence-based analysis. In this type of analysis, protein sequence comparison among species is often used to infer preliminary information about protein function. The basic assumption is that in evolution, sequences coding for the most critical structural or functional amino acids tend to be the most highly conserved due to selection pressure against functionally altered proteins. Consequently, evolutionarily conserved amino acids could serve as a surrogate to identify functionally critical amino acids. There are several online analysis tools that use this assumption to predict the effect of variants. Most of these tools use empirical weighting schemes to measure amino acid similarity and score the substitution based on how often each amino acid replaces other amino acids in alignments of evolutionarily related proteins. 4.2.1.1 SIFT (Sorting Intolerant From Tolerant) SIFT [100] (http://blocks.fhcrc.org/ ∼pauline/SIFT.html) uses sequence homology and position-specific information to predict whether a substitution affects protein function or not [101]. Given a protein sequence, this tool searches for related proteins and performs a multiple sequence alignment (MSA) of these proteins with the query. Then, based on the amino acid appearing at each position in the alignment, the program classifies the substitutions as tolerated or deleterious. A position in the protein query that is conserved in the alignment will be scored by SIFT as intolerant to most changes; whereas a position that is poorly conserved will be scored by SIFT as tolerant. Whilst SIFT can choose the sequences for alignment automatically, it was found that better prediction results were obtained when a list of homologous sequences was provided. In addition, orthologous sequences will be more appropriate than paralogous sequences with distinct biochemical functions. The latter may confound prediction. 4.2.1.2 ConSeq ConSeq (http://conseq.bioinfo.tau.ac.il/) is also a Web server that allows structurally and functionally important residues in proteins to be identified based on MSA. The program accepts either a unique sequence of a protein or a domain, or an MSA provided by the user as input. In cases where a single sequence is provided, the server automatically carries out a PSI-BLAST search [73] for close homologous sequences in the Swiss-Prot database, and produces an MSA using the CLUSTALW program [102]. A phylogenetic tree consistent with the MSA (either calculated or provided by users) is then built. The conservation score for each residue is calculated using the Rate4Site algorithm [103], which is based on the maximum likelihood principle [104]. In addition, the server uses the MSA to predict the relative solvent accessibility of each residue, whether it is in the protein core or exposed to the solvent. It then uses these two parameters to predict if a residue is biologically important. For example, slowly evolving and exposed residues are marked with an “f” (predicted to be functional), whereas slowly evolving and buried residues are marked with an “s” to indicate that they may have an important structural role. The ConSeq server provides a color scheme to visualize the conservation scores, the relative accessibility prediction for each
4 Analysis Tools in the Context of Protein Variants
residue, and indicates those residues with a predicted functional and structurally role. Both SIFT and ConSeq base their predictions on sequence data alone and thus do not require protein structural information. They are thus well suited for the analysis of proteins with homologous sequences but of unknown three-dimensional (3-D) structure. However, sequence-based methods may not be sensitive enough to help analyze the effect of point mutations in more depth. 4.2.2 Structure-based Analysis or Prediction Tools The second strategy is structure-based. In fact, knowledge of protein structure has long been used as one of the main approaches to rationalize the effects of disease-causing mutations [105, 106]. Amino acid variants may have an impact on the folding, interaction sites, solubility or stability of the protein, and thus directly affect protein structure and function. These effects cannot be directly analyzed by simple alignment programs, but they can be estimated from physico-chemical considerations. Therefore, several online servers use additional structural data to study or predict the effects of mutations. 4.2.2.1 ConSurf (Conservation Surface-mapping) The ConSurf server (http:// consurf.tau.ac.il/) can be considered as a complement to the ConSeq server described above [107]. It has been developed by the same group. ConSurf enables the identification of functionally important regions on the surface of a protein or domain, based on the phylogenetic relations between its close sequence homologues [108]. Therefore, although ConSurf uses structural information, the algorithm used for the analysis is still mainly sequence-based. ConSurf accepts a PDB file of a protein structure as input. It extracts the sequence from the PDB file and automatically carries out a search for homologous sequences of proteins of known structure. A phylogenetic tree consistent with the computed MSA is then built using the maximum parsimony method [109], and the conservation scores calculated [103]. Similar to ConSeq, ConSurf also accepts user-provided MSA to build the phylogenetic tree. The conservation scores are colorcoded and “projected” onto the molecular surface of the protein. Users can use the online Protein Explorer engine (http://molvis.sdsc.edu/protexpl/frntdoor.htm) to visualize patches of highly conserved residues that are often of important biological function. 4.2.2.2 MutaProt MutaProt (http://www.bioinfo.weizmann.ac.il/cgi-bin/ MutaProt/mutations0.cgi) is a tool designed to analyze pairs of PDB files that differ in one or two amino acids and to examine the microenvironment surrounding the mutated residue(s) [110]. The server has a collection of PDB files (resolution of 3.5 Å or better) concerning structural distinction between proteins with point substitutions in amino acid. Users can query these data in three ways: by specifying a PDB ID, keywords, or by designating a particular pair of amino acids. Searches can also be restricted to a particular level of residue solvent accessibility. Links are provided to the detailed structural analysis of the mutation site. This includes the
27
28 Protein Sequence Variants: Resources and Tools
relative accessibility of the mutated residues, the displacement between equivalent atoms of the contacting residues, and a list of contacts involving the mutated site and its surrounding residues. In addition, users can visualize the superimposed regions around the mutation site(s) via an interactive 3-D presentation. The MutaProt database was last updated on November 2002 and currently contains 17668 PDB file-pairs, including 5654 single mutations, and 12014 double mutations. 4.2.2.3 PolyPhen (Polymorphism Phenotyping) PolyPhen (http://www.tux.emblheidelberg.de/ramensky/) is an automatic tool for the prediction of the possible impact of an amino acid substitution on the structure and function of a human protein [111]. The prediction is based on a set of straightforward empirical rules that take into account the protein annotation of the substitution site, the profile analysis of homologous sequences and the 3-D protein structure [112]. For a given amino acid substitution in a human protein, PolyPhen first checks if the amino acid replacement occurs at a site that is annotated in the SwissProt/TrEMBL database feature table as DISULFID, BINDING, ACT SITE, LIPID, METAL, SITE, or MOD RES, or at a site located in a TRAMSMEM, SIGNAL, or PROPEP region. The server also takes into account predictions made with computational programs, such as the TMHMM algorithm for transmembrane regions (http://www.cbs.dtu.dk/services/TMHMM/) and the SignalP program (http://www.cbs.dtu.dk/services/SignalP/) for signal peptide regions of the protein sequences. As a second step, PolyPhen identifies homologues of the input sequences via a BLAST search of the NRDB database (a database of non-identical protein sequences extracted from EMBL CDS translations + PDB + Swiss-Prot + PIR) and uses aligned sequences with 30–94% sequence identity to the input sequence to construct a profile matrix by using the position-specific independent counts (PSIC) method [113]. Finally, a number of structural parameters, such as secondary structure, solvent accessible surface area, φ– dihedral angles, as well as contacts with “critical sites”, ligands, and other polypeptide chains are determined. This is done by mapping the amino acid substitution to known protein 3-D structure or, if the 3-D structure is not available, to a homologous protein structure of at least 50% sequence identity to the query. PolyPhen accepts a protein sequence, or the Swiss-Prot/TrEMBL ID or AC together with sequence position and two amino acid variants characterizing the polymorphism as input. The program then sorts the mutations into possibly damaging (supposed to affect protein function), probably damaging, benign (most likely lacking any phenotypic effect), or unknown. Not all the parameters calculated are used for the decision rule, if some of them do not help to increase sensibility without significant loss of specificity of predictions. Apart from these analysis or prediction tools for variants, several recent studies have performed in-depth analysis of different structural biophysical parameters to estimate the likely impact of an amino acid substitution on the structure and function of a protein. These studies either seek to differentiate between neutral and functional polymorphisms [114, 115], or attempt to identify important struc-
4 Analysis Tools in the Context of Protein Variants
tural parameters for the disease-causing effects of particular mutations [116–118]. Most of these studies performed mapping of a set of protein variants from public databases onto existent 3-D structures of the corresponding proteins and analyzed their structural features. The availability of 3-D structures is thus critical to determine the data set used. To date, the number of experimental protein structures (19151, PDB holding 10 June 2003) is still limited when compared with the number of available sequences (more than 1 million Swiss-Prot Release 41.15 and TrEMBL release 24.1 of 04.07.2003). In this context, the use of predictive methods, such as comparative or homology modeling, could represent an alternative solution to the problem. 4.2.3 The Swiss-Prot Variant Page and Comparative Modeling The Swiss-Prot variant page (see Fig. 2) provides 3-D comparative models of protein variants (Yip et al., unpublished data). These homology models were constructed using the automated comparative protein modeling machine PromodII underlying the SWISS-MODEL server (http://www.swissmodel.expasy.org/) [119]. Comparative modeling is based on the observation that families of proteins with members sharing similar sequences often fold into similar 3-D structures. This conservation allows, in principle, the prediction of structure in a family of proteins when only the structure of a single family member is known. Clearly, the reliability and the accuracy of the models are of great importance in such an approach and these two factors largely depend on how closely related the target sequence and the template structure are. Therefore, for the Swiss-Prot variant page, proteins are chosen to be modeled only when crystal template structure(s) of better than 2.5 Å resolutions and of at least 70% sequence identity can be found. This threshold was selected on the basis of several facts: (1) the four successive Critical Assessment of Protein Structure Prediction (CASP) meetings have demonstrated that deviation of models from experimental structures are typically < 1 Å for Cα atoms at >60% sequence identity [120]; (2) models which were built using templates with a high degree of sequence identity (>70%) have been proven useful during drug design projects, thus indicating their high reliability and accuracy [121, 122]. Currently, more than 3500 variants of the 15 000-plus human protein variants recorded in the Swiss-Prot database have corresponding 3-D models. Of these 3500 variants, about 1600 are related to disease. The model data, together with information on Swiss-Prot sequences/variants, and experimentally determined protein structures from the PDB are stored in the regularly updated ModSNP database (Yip et al., unpublished data). The models can be accessed through the Swiss-Prot variant page and can be further analyzed using the platform-independent molecular graphic suite Astax Viewer. 4.2.4 Remarks The analysis or prediction tools, either sequence- or structure-based, all have their own pros and cons. Sequence-based tools that use sequence homology rather than protein structure can potentially analyze a larger number of variants than tools based on protein structure alone. However, in instances where fewer than 5-10
29
30 Protein Sequence Variants: Resources and Tools
homologues are available, it is recommended that structural information should be included if available [123]. In terms of performance, it is currently found that sequence- and structure-based tools have similar prediction abilities [100], and that the MSA-based profile scores appear to contribute most to the prediction when both structural and sequence information is used [111]. Nevertheless, as it is at present still not clear which structural parameters give the strongest predictive power and because few detailed large-scale structural analyses of protein variants have been performed due to the deficit in protein structures, it could be possible that structure-based analysis has not yet reached its full potential. The availability of high-quality comparative models will surely help to provide a larger data set for analysis.
5 Conclusions
Following the completion of the Human Genome Project, the increased availability of public databases and online analysis tools is surely influencing the way medical research is done. In silico approaches may be used to identify candidate genes for genetic disorders and to set priorities in their analysis [124, 125]. For example, a wise use of the analysis and prediction tools of the effect of protein variants can provide a fast and useful guide for the design and analysis of mutagen-esis studies. This can reduce the number of functional assays required and give a higher proportion of affected phenotypes. Careful sequence and structural analysis of proteins involved in similar disorders may also uncover novel features and suggest new pathological pathways [126]. In this respect, a comprehensive and correct annotation is of crucial importance as any in silico approach is highly dependent on the quality of the data to which it is applied. The integration of mutation data, functional data, expression profiles, protein sequence and structural information, protein/protein interaction data, and phenotypic descriptions in databases will play a major role in the elucidation of the chain of events leading from a molecular defect to a pathology.
References 1 COLLINS, F. S., PATRINOS, A., JORDAN, E., CHAKRAVARTI, A., GESTELAND, R., WALTERS, L. Science 1998, 282, 682–689. 2 COLLINS, F. S., BROOKS, L. D., CHAKRAVARTI, A. Genome Res. 1998, 8, 1229–1231. 3 GUT, I. G. Hum. Mutat. 2001, 17, 475–492. 4 TSUCHIHASHI, Z., DRACOPOLI, N. C. Pharmacog. J. 2002, 2, 103–110.
5 CARGILL, M., ALTSHULER, D., IRELAND, J. et al. Nature Genet. 1999, 22, 231–238. 6 CAMBIEN, F., POIRIER, O., NICAUD, V. et al. Am. J. Hum. Genet. 1999, 65, 183–191. 7 HALUSHKA, M. K., FAN, J. B., BENTLEY, K. et al. Nature Genet. 1999, 22, 239–247. 8 ALTSHULER, D., POLLARA, V. J., COWLES, C. R. et al. Nature 2000, 407, 513–516. 9 CLIFFORD, R., EDMONSON, M., HU, Y., NGUYEN, C., SCHERPBIER, T., BUETOW, K.
References
10 11 12
13 14 15
16
17 18 19 20
21
22 23 24 25
26
27
28 29
H. Genome Res. 2000, 10, 1259– 1265. MARTIN, E. R., LAI, E. H., GILBERT, J. R. et al. Am. J. Hum. Genet. 2000, 67, 383–394. JOHNSON, G. C., ESPOSITO, L., BARRATT, B. J. et al. Nature Genet. 2001, 29, 233–237. SACHIDANANDAM, R., WEISSMAN, D., SCHMIDT, S. C. et al. Nature 2001, 409, 928–933. IIDA, A., SAITO, S., SEKINE, A. et al. J. Hum. Genet. 2002, 47, 285–310. COTTON, R. G., MCKUSICK, V., SCRIVER, C. R. Science 1998, 279, 10–11. SCRIVER, C. R., NOWACKI, P. M., LEHVASLAIHO, H. Hum. Mutat. 1999, 13, 344–350. SCRIVER, C. R., NOWACKI, P. M., LEHVASLAIHO, H. Hum. Mutat. 2000, 15, 13–15. BAXEVANIS, A. D. Nucleic Acids Res. 2003, 31, 1–12. ANTONARAKIS, S. E. Hum. Mutat. 1998, 11, 1–3. DEN DUNNEN, J. T., ANTONARAKIS, S. E. Hum. Mutat. 2000, 15, 7–12. PAZNEKAS, W. A., BOYADJIEV, S. A., SHAPIRO, R. E. et al. Am. J. Hum. Genet. 2003, 72, 408–418. LEHVASLAIHO, H., STUPKA, E., ASHBURNER, M. Hum. Mutat. 2000, 15, 52–56. COTTON, R. G., SCRIVER, C. R. Hum. Mutat. 1998, 12, 1–3. COTTON, R. G., HORAITIS, O. Hum. Mutat. 2000, 15, 16–21. ROZET, J. M., GERBER, S., SOUIED, E. et al. Eur. J. Hum. Genet. 1998, 6, 291–295. PALOMA, E., MARTINEZ-MIR, A., VILAGELIU, L., GONZALEZ-DUARTE, R., BALCELLS, S. Hum. Mutat. 2001, 17, 504–510. BRIGGS, C. E., RUCINSKI, D., ROSENFELD, P. J., HIROSE, T., BERSON, E. L., DRYJA, T. P. Invest. Ophthalmol. Vis. Sci. 2001, 42, 2229–2236. WEBSTER, A. R., HEON, E., LOTERY, A. J. et al. Invest. Ophthalmol. Vis. Sci. 2001, 42, 1179–1189. GERBER, S., ROZET, J. M., Van de POL, T. J. et al. Genomics 1998, 48, 139–142. NASONKIN, I., ILLING, M., KOEHLER, M. R., SCHMID, M., MOLDAY, R. S., WEBER, B. H. Hum. Genet. 1998, 102, 21–26.
30 CREMERS, F. P., Van de POL, D. J., Van DRIEL, M. et al. Hum. Mol. Genet. 1998, 7, 355–362. 31 MAUGERI, A., KLEVERING, B. J., ROHRSCHNEIDER, K. et al. Am. J. Hum. Genet. 2000, 67, 960–966. 32 ALLIKMETS, R., SHROYER, N. F., SINGH, N. et al. Science 1997, 277, 1805–1807. 33 ALLIKMETS, R. Am. J. Hum. Genet. 2000, 67, 487–491. 34 GUYMER, R. H., HEON, E., LOTERY, A. J. et al. Arch. Ophthalmol. 2001, 119, 745–751. 35 PORTER, C. J., TALBOT, C. C., CUTICCHIA, A. J. Hum. Mutat. 2000, 15, 36–44. 36 HAMOSH, A., SCOTT, A. F., AMBERGER, J., VALLE, D., MCKUSICK, V. A. Hum. Mutat. 2000, 15, 57–61. 37 HAMOSH, A., SCOTT, A. F., AMBERGER, J., BOCCHINI, C., VALLE, D., MCKUSICK, V. A. Nucleic Acids Res. 2002, 30, 52–55. 38 MCKUSICK, V. A. Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders, 12th edn. Johns Hopkins University Press, Baltimore, 1997. 39 KRAWCZAK, M., BALL, E. V., FENTON, I. et al. Hum. Mutat. 2000, 15, 45–51. 40 STENSON, P. D., BALL, E. V., MORT, M. et al. Hum. Mutat. 2003, 21, 577–581. 41 COOPER, D. N., BALL, E. V., KRAWCZAK, M. Nucleic Acids Res. 1998, 26, 285–287. 42 SHERRY, S. T., WARD, M., SIROTKIN, K. Genome Res. 1999, 9, 677–679. 43 SHERRY, S. T., WARD, M. H., KHOLODOV, M. et al. Nucleic Acids Res. 2001, 29, 308–311. 44 FREDMAN, D., SIEGFRIED, M., YUAN, Y. P., BORK, P., LEHVASLAIHO, H., BROOKES, A. J. Nucleic Acids Res. 2002, 30, 387–391. 45 THORISSON, G. A., STEIN, L. D. Nucleic Acids Res. 2003, 31, 124–127. 46 HIRAKAWA, M., TANAKA, T., HASHIMOTO, Y., KURODA, M., TAKAGI, T., NAKAMURA, Y. Nucleic Acids Res. 2002, 30, 158–162. 47 OSIER, M. V., CHEUNG, K. H., KIDD, J. R., PAKSTIS, A. J., MILLER, P. L., KIDD, K. K. Am. J. Phys. Anthropol. 2002, 119, 77–83. 48 RAJEEVAN, H., OSIER, M. V., CHEUNG, K. H. et al. Nucleic Acids Res. 2003, 31, 270–271. 49 CUTICCHIA, A. J. Hum. Mutat. 2000, 15, 62–67.
31
32 Protein Sequence Variants: Resources and Tools
50 CLAUSTRES, M., HORAITIS, O., VANEVSKI, M., COTTON, R. G. Genome Res. 2002, 12, 680–688. 51 OLIVIER, M., EELES, R., HOLLSTEIN, M., KHAN, M. A., HARRIS, C. C., HAINAUT, P. Hum. Mutat. 2002, 19, 607–614. 52 HERNANDEZ-BOUSSARD, T., RODRIGUEZ-TOME, P., MONTESANO, R., HAINAUT, P. Hum. Mutat. 1999, 14, 1–8. 53 BEROUD, C., SOUSSI, T. Hum. Mutat. 2003, 21, 176–181. 54 GOTTLIEB, B., LEHVASLAIHO, H., BEITEL, L. K., LUMBROSO, R., PINSKY, L., TRIFIRO, M. Nucleic Acids Res. 1998, 26, 234–238. 55 VIHINEN, M., KWAN, S. P., LESTER, T. et al. Hum. Mutat. 1999, 13, 280–285. 56 BROWN, A., MCKIE, M., VAN H., V., PROSSER, J. Nucleic Acids Res. 1998, 26, 259–264. 57 DALGLEISH, R. Nucleic Acids Res. 1997, 25, 181–187. 58 DALGLEISH, R. Nucleic Acids Res. 1998, 26, 253–255. 59 STENBERG, K. A., RIIKONEN, P. T., VIHINEN, M. Nucleic Acids Res. 2000, 28, 369–371. 60 SCRIVER, C. R., WATERS, P. J., SARKISSIAN, C. et al. Hum. Mutat. 2000, 15, 99–104. 61 SCRIVER, C. R., HURTUBISE, M., KONECKI, D. et al. Hum. Mutat. 2003, 21, 333–344. 62 KWOK, C. J., MARTIN, A. C., AU, S. W., LAM, V. M. Hum. Mutat. 2002, 19, 217–224. 63 Van DURME, J. J., BETTLER, E., FOLKERTSMA, S., HORN, F., VRIEND, G. Nucleic Acids Res. 2003, 31, 331–333. 64 FUNG, D. C., HOLLAND, E. A., BECKER, T. M., HAYWARD, N. K., BRESSACDE PAILLERETS, B., MANN, G. J. Hum. Mutat. 2003, 21, 2–7. 65 HARDISON, R. C., CHUI, D. H., GIARDINE, B. et al. Hum. Mutat. 2002, 19, 225–233. 66 SARRAUSTE DE MENTHIERE, C., TERRIERE, S., PUGNERE, D., RUIZ, M., DEMAILLE, J., TOUITOU, I. Nucleic Acids Res. 2003, 31, 282–285. 67 BOECKMANN, B., BAIROCH, A., APWEILER, R. et al. Nucleic Acids Res. 2003, 31, 365–370. 68 O’DONOVAN, C., APWEILER, R., BAIROCH, A. Trends Biotechnol. 2001, 19, 178–181.
69 POVEY, S., LOVERING, R., BRUFORD, E., WRIGHT, M., LUSH, M., WAIN, H. Hum. Genet. 2001, 109, 678–680. 70 GASTEIGER, E., GATTIKER, A., HOOGLAND, C., IVANYI, I., APPEL, R. D., BAIROCH, A. Nucleic Acids Res. 2003, 31, 3784–3788. 71 JACOBY, E. S., KICMAN, A. T., ILES, R. K. J. Mol. Endocrinol. 2003, 30, 239–252. 72 HOOGLAND, C., SANCHEZ, J. C., TONELLA, L. et al. Nucleic Acids Res. 2000, 28, 286–288. 73 ALTSCHUL, S. F., MADDEN, T. L., SCHAFFER, A. A. et al. Nucleic Acids Res. 1997, 25, 3389–3402. 74 GATTIKER, A., GASTEIGER, E., BAIROCH, A. Appl. Bioinform. 2002, 1, 107–108. 75 KERSEY, P., HERMJAKOB, H., APWEILER, R. Bioinformatics 2000, 16, 1048–1049. 76 GASTEIGER, E., JUNG, E., BAIROCH, A. Curr. Issues Mol. Biol. 2001, 3, 47–55. 77 STOESSER, G., BAKER, W., Van DEN, B. A. et al. Nucleic Acids Res. 2002, 30, 21–26. 78 BHAT, T. N., BOURNE, P., FENG, Z. et al. Nucleic Acids Res. 2001, 29, 214–218. 79 BERMAN, H. M., WESTBROOK, J., FENG, Z. et al. Nucleic Acids Res. 2000, 28, 235–242. 80 WAIN, H. M., LUSH, M., DUCLUZEAU, F., POVEY, S. Genew: the human gene nomenclature database. Nucleic Acids Res. 2002, 30, 169–171. 81 ASHBURNER, M., BALL, C. A., BLAKE, J. A. et al. Nature Genet. 2000, 25, 25–29. 82 REBHAN, M., CHALIFA-CASPI, V., PRILUSKY, J., LANCET, D. Bioinformatics 1998, 14, 656–664. 83 LENHARD, B., HAYES, W. S., WASSERMAN, W. W. Genome Res. 2001, 11, 2151–2157. 84 MARSH, S., KWOK, P., MCLEOD, H. L. Hum. Mutat. 2002, 20, 174–179. 85 ETZOLD, T., ULYANOV, A., ARGOS, P. Methods Enzymol. 1996, 266, 114–128. 86 WHEELER, D. L., CHURCH, D. M., FEDERHEN, S. et al. Nucleic Acids Res. 2003, 31, 28–33. 87 BEROUD, C., COLLOD-BEROUD, G., BOILEAU, C., SOUSSI, T., JUNIEN, C. Hum. Mutat. 2000, 15, 86–94. 88 BROWN, A. F., MCKIE, M. A. Hum. Mutat. 2000, 15, 76–85. 89 DOBROKHOTOV, P. B., GOUTTE, C., VEUTHEY, A. L., GAUSSIER, E. Bioinformatics 2003, 19 Suppl. 1, i91-i94.
References 90 GOTTLIEB, B., BEITEL, L. K., TRIFIRO, M. A. Hum. Mutat. 2001, 17, 382–388. 91 HENZEL, W. J., BILLECI, T. M., STULTS, J. T., WONG, S. C., GRIMLEY, C., WATANABE, C. Proc. Natl Acad. Sci. USA 1993, 90, 5011–5015. 92 MANN, M., HOJRUP, P., ROEPSTORFF, P. Biol. Mass Spectrom. 1993, 22, 338–345. 93 YATES, J. R., III, SPEICHER, S., GRIFFIN, P. R., HUNKAPILLER, T. Anal. Biochem. 1993, 214, 397–408. 94 PERKINS, D. N., PAPPIN, D. J., CREASY, D. M., COTTRELL, J. S. Electrophoresis 1999, 20, 3551–3567. 95 GATLIN, C. L., ENG, J. K., CROSS, S. T., DETTER, J. C., YATES, J. R., III. Anal. Chem. 2000, 72, 757–763. 96 LIEBLER, D. C., HANSEN, B. T., DAVEY, S. W., TISCARENO, L., MASON, D. E. Anal. Chem. 2002, 74, 203–210. 97 PEVZNER, P. A., MULYUKOV, Z., DANCIK, V., TANG, C. L. Genome Res. 2001, 11, 290–299. 98 WILKINS, M. R., GASTEIGER, E., GOOLEY, A. A. et al. J. Mol. Biol. 1999, 289, 645–657. 99 HENIKOFF, S., HENIKOFF, J. G. Proc. Natl Acad. Sci. USA 1992, 89, 10915–10919. 100 NG, P. C., HENIKOFF, S. Genome Res. 2002, 12, 436–446. 101 NG, P. C., HENIKOFF, S. Genome Res. 2001, 11, 863–874. 102 THOMPSON, J. D., HIGGINS, D. G., GIBSON, T. J. Nucleic Acids Res. 1994, 22, 4673–4680. 103 PUPKO, T., BELL, R. E., MAYROSE, I., GLASER, F., BEN TAL, N. Bioinformatics 2002, 18 Suppl. 1, S71–S77. 104 FELSENSTEIN, J. J. Mol. Evol. 1981, 17, 368–376. 105 ERLANDSEN, H., STEVENS, R. C. Mol. Genet. Metab. 1999, 68, 103–125. 106 SIGAL, A., ROTTER, V. Cancer Res. 2000, 60, 6788–6793. 107 GLASER, F., PUPKO, T., PAZ, I. et al. Bioinformatics 2003, 19, 163–164. 108 ARMON, A., GRAUR, D., BEN TAL, N. J. Mol. Biol. 2001, 307, 447–463.
109 FELSENSTEIN, J. Methods Enzymol. 1996, 266, 418–427. 110 EYAL, E., NAJMANOVICH, R., SOBOLEV, V., EDELMAN, M. Bioinformatics 2001, 17, 381–382. 111 RAMENSKY, V., BORK, P., SUNYAEV, S. Nucleic Acids Res. 2002, 30, 3894–3900. 112 SUNYAEV, S., RAMENSKY, V., KOCH, I., LATHE, W., III, KONDRASHOV, A. S., BORK, P. Hum. Mol. Genet. 2001, 10, 591–597. 113 SUNYAEV, S. R., EISENHABER, F., RODCHENKOV, I. V., EISENHABER, B., TUMANYAN, V. G., KUZNETSOV, E. N. Protein Eng. 1999, 12, 387–394. 114 WANG, Z., MOULT, J. Hum. Mutat. 2001, 17, 263–270. 115 CHASMAN, D., ADAMS, R. M. J. Mol. Biol. 2001, 307, 683–706. 116 TERP, B. N., COOPER, D. N., CHRISTENSEN, I. T. et al. Hum. Mutat. 2002, 20, 98– 109. 117 FERRERCOSTA, C., OROZCO, M., De LA CRUZ, X. J. Mol. Biol. 2002, 315, 771–786. 118 STITZIEL, N. O., TSENG, Y. Y., PERVOUCHINE, D., GODDEAU, D., KASIF, S., LIANG, J. J. Mol. Biol. 2003, 327, 1021–1030. 119 GUEX, N., PEITSCH, M. C. Electrophoresis 1997, 18, 2714–2723. 120 VENCLOVAS, C., ZEMLA, A., FIDELIS, K., MOULT, J. Proteins 2001, Suppl 5, 163–170. 121 MARTIRENOM, M. A., STUART, A. C., FISER, A., SANCHEZ, R., MELO, F., SALI, A. Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 291–325. 122 BAKER, D., SALI, A. Science 2001, 294, 93–96. 123 SAUNDERS, C. T., BAKER, D. J. Mol. Biol. 2002, 322, 891–901. 124 PEREZ-IRATXETA, C., BORK, P., ANDRADE, M. A. Nature Genet. 2002, 31, 316–319. 125 Van DRIEL, M. A., CUELENAERE, K., KEMMEREN, P. P., LEUNISSEN, J. A., BRUNNER, H. G. Eur. J. Hum. Genet. 2003, 11, 57–63. 126 SCHEEL, H., TOMIUK, S., HOFMANN, K. Hum. Mol. Genet. 2002, 11, 1757–1762.
33
1
PPARs and Cancer J. H. Gill
University of Bradford, Bradford, UK
Ruth A. Roberts Aventis Pharma SA, centre de Recherche de Paris, Vitry sur Seine, France
Originally published in: Cellular Proteins and Their Fatty Acids in Health and Disease. Edited by Asim K. Duttaroy and Friedrich Spener. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30437-0
1 Introduction
The peroxisome proliferator activated receptors (PPARs) are ligand-activated transcription factors belonging to the superfamily of nuclear hormone receptors. The prototype PPAR, PPAR α was cloned from mouse liver by homology to known superfamily members [1] and was subsequently found to mediate the cellular response to the fibrate hypolipidemic drugs [2]. Since then, three subtypes of PPAR have been identified termed α, β/δ, and γ [1, 3–5] and an increasing diversity of natural and artificial PPAR ligands have been described, including fatty acids, eicosinoids, and insulin sensitizers (reviewed in Ref [6]). This biological profile has stimulated much interest in PPARs as therapeutic targets in the search for new drugs for obesity, hypolipidemia, and type 2 diabetes (reviewed in Refs [7–9]). Recently, there has been great interest in the suggestion that the activation of PPARs may have anticancer effects. Here, we review the evidence for PPARs in cancer and the potential opportunities for therapeutic intervention.
2 The PPAR Family
The PPARs characterized to date have a common organization consisting of six coding exons (Fig. 1), encoding the N-terminal A/B domain, the DNA binding domain, the hinge region, and the ligand binding domain. PPARs exhibit 80% and
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 PPARs and Cancer
Fig. 1 Diagrammatic representation of PPAR activation and protein domains of PPAR. PPAR structure has an A/B domain, a DNA binding domain (DBD), a hinge region, and a ligand binding domain (LBD). The DBD and LBD domains are encoded by two exons each, one for each zinc finger or ligand, respectively. ACO, acyl-CoA oxidase;
ACS, acyl-CoA synthetase; ACODH, acyl-CoA dehydrogenase; BFE, bifunctional enzyme; CYP4A1, cytochrome P450 4A1; FATP, fatty acid transport protein; L-FABP, liver fatty acid binding protein; LPL, lipoprotein lipase; LST-1, liver-specific type 1 sugar transporter; PEPCK, phosphoenolpyruvate carboxykinase.
65% homology in their DNA binding and ligand binding domains, respectively, and are highly conserved in mammalian species. Once activated, PPARs directly regulate gene transcription by forming heterodimers with the retinoid X receptor (RXR) [10]. This complex binds to a specific sequence in the 5 region of target genes, known as a PPAR response element (PPRE) that consists of a direct repeat (DR) of two TGACCT sequences separated by one nucleotide (so called DR1 element). The first natural PPRE was identified upstream of the rat peroxisomal acyl-CoA oxidase (ACO) gene [11] then subsequently in other peroxisomal genes such as bifunctional enzyme (BFE) [12], rat cytochrome P4504A1 [13] and in an increasing number of genes implicated in lipid homeostasis (reviewed in Ref. [9]). All these PPREs conform to the DR1 criteria but a detailed analysis of the family of PPAR responsive genes indicates that additional parameters dictate the patterns of activation such as the regions flanking the PPRE [14] and the presence of additional transcriptional repressors and coactivators [15– 18]. Although the PPAR isoforms are highly homologous and function by the same mechanism, their ligand specificity [19] and tissue distribution differs and their target genes are different, suggesting unique functions for the different receptor isoforms[5].
3 PPAR" 3
3 PPARα 3.1 Expression and Activation
PPARα is expressed in a variety of tissues, especially the liver and kidneys [3, 4, 20]. Although many chemicals have been demonstrated to activate PPARα, including plasticizers, fibrate hypolipidemic drugs, and natural fatty acids [1, 6, 21–26] a definitive endogenous ligand has yet to be identified. Two potential candidates are the inflammatory mediator leukotriene B4 (LTB4 ), and fatty acids or their metabolites, both of which have been demonstrated to bind directly to PPARα [27, 28]. The outcome of PPARα activation has primarily been studied in the liver. Genes upregulated by PPARα include those involved with the regulation of fatty acid degradation, such as acyl-CoA oxidase and cytochrome P4504A [11, 13, 28]. Transcriptional regulation of such genes strongly suggests a role for PPARα in liver lipid homeostasis [28–30]. 3.2 PPARα and Cancer
In addition to their many roles in normal tissue physiology, PPARs have been implicated in regulating cancer development, surprisingly both as promoters and inhibitors of the process [31–38]. Activation of PPARα by the peroxisome proliferator (PP) class of chemicals is known to cause rodent hepatocellular carcinoma (reviewed in Ref [39]). Studies performed using PPARα-null mice support the dependence of this process on PPARα [18, 40–43], although the actual mechanism by which PPARα activation leads to the formation of liver tumors in rodents is at present unclear. One such hypothesis involves the perturbation of hepatocyte growth control, such that activation of PPARα leads to an increase in proliferation and a suppression of cell death [42, 44, 45]. The induction of proliferation by PPARα ligands has been shown to involve regulation of cell cycle control genes, such as CDK4 [46, 47], c-myc, c-fos, and c-jun [47, 48]. Growth regulators such as the interleukins [49, 50] and tumor necrosis factor α (TNFα) [51–55] have been implicated in this response. However, there is as yet no evidence for direct transcription of these growth regulatory genes by PPARα. The suppression of apoptosis by activation of PPARα may result from the release of cytokines, is not dependent on the type of apoptotic pathway, and occurs in a specific population of hepatocytes [46, 49, 50, 55–59]. 3.3 Species Differences
Although PPARα ligands clearly cause rodent liver tumors, this appears to be a rodent-specific phenomenon (reviewed in Ref. [9]). Clinical side effects of fibrate
4 PPARs and Cancer
PPs are rare and analyses of causes of death during treatment show no evidence of an adverse effect and no evidence of an increase in malignant disease compared with the normal population [60]. When IARC reviewed clofibrate they concluded that the mechanism of liver carcinogenesis in clofibrate treated rats would not be operative in humans [61] based on lack of response in human hepatocytes and on the results of extensive epidemiological studies, particularly the WHO trial on clofibrate comprising 208000 man-years of observation [62, 63]. Furthermore, a meta-analysis [64] of the results from six clinical trials on clofibrate also found no excess cancer mortality [61]. Extensive studies have been performed to dissect this species difference in response to PPARα activation. Such marked differences may be explained by quantity of PPARα and quality of PPARα-mediated responses [39, 42, 65–67]. Humans were found to express functional PPARα, but at much lower expression levels than that of the responsive rodent species [42, 66]. The lower expression of PPARα could be attributed to the presence of a truncated, inactive form of PPARα, which appears to be present in most individuals examined to date [39, 66]. In addition, there is a species-specific response with regard to the responsiveness of PPARα target genes, suggesting different patterns of gene regulation compared with the rodent situation [68–70]. Therefore, levels of human PPARα are sufficient to facilitate the action of hypolipidemic drugs, but may be too low to activate the genes responsible for the hepatocarcinogenic effects of PPARα activation [42, 67]. 3.4 PPARα as a Therapeutic Target?
Several studies have suggested an indirect role for PPARα in tumorigenesis based on correlation of expression with cancer development. Expression levels of PPARα have been suggested to be associated with the development of prostatic adenocarcinoma [71]. Similarly, activation of PPARα has been shown to cause an upregulation of the immediate early gene cyclooxygenase-2 (COX-2) in murine liver epithelia [72], rabbit corneal epithelia [73] and human colorectal epithelial cell lines [74, 75]. Since approximately 80% of human colorectal carcinomas have increased levels of COX-2, an enzyme involved in prostaglandin breakdown, the PPARα-induced expression of COX-2 is of potential significance for colon carcinogenesis [74–76]. Interestingly, fatty acids are known to be activators of PPARα and dietary fat is a known risk factor for colon carcinogenesis [77]. The regulation of COX-2 by PPARα observed in colorectal cells may also be mirrored in other human epithelia, given that fatty acids have already been shown to upregulate COX-2 in rat mammary gland [78]. These data suggest that activation of PPARα may play a role in colon carcinogenesis, raising the possibility that inhibition of PPARα could have therapeutic potential. However, these data should be viewed with caution in the presence of extensive epidemiological evidence showing that prolonged exposure to PPARα ligands has no effects on tumor incidence [62, 63]. Nonetheless, if further research identifies inhibition of PPARα as a reasonable therapeutic target, a dominant negative
4 PPAR(
isoform of PPARα, hPPARα6/29 has been described that can inhibit PPARmediated gene expression [59]. Aside from the possible effects of PPARα ligands on COX-2 expression, activation of PPARα by Wy 14,643 has been shown to inhibit progression of DMBA-induced rodent mammary tumors [79]. Also, PPARα ligands have been shown to result in apoptosis of human monocyte-derived macrophages [80] and to alter chemotaxis in macrophages from peritoneal fluid [81]. These data raise the possibility of targeting tumor macrophages as a means of modulating cancer cell survival signals provided by macrophage-derived cytokines [82].
4 PPARγ 4.1 Expression and Activation
PPARγ has a restricted expression profile, being expressed predominantly in adipose tissue and large intestine [34, 38, 83]. The observation that the anti-diabetic thiazolidinediones (TZDs) activated PPARγ led to a wave of research and hypotheses regarding a role for PPARγ in the regulation of diabetes and obesity [36, 84]. In addition to TZDs, PPARγ is activated by endogenous ligands (Fig. 2), including fatty acids and prostaglandin metabolites, such as 15-deoxy-12,14 -prostaglandin J2 [83, 84]. Major roles for PPARγ have been identified in adipocyte differentiation, monocyte/macrophage development and glucose homeostasis [36, 38, 83, 85]
Fig. 2 Natural and synthetic ligands for PPARs and possible role in cancer therapy. LA, linoleic acid; AA, arachadonic acid; Trog, troglitazone; PGJ2 , 15-deoxy-12,14 -prostaglandin J2 .
5
6 PPARs and Cancer
mediated via control of the expression of a large number of genes involved in lipid transport and insulin sensitization (reviewed in Ref. [86]). In humans, three PPARγ mRNA have been identified, all of which are derived as splice variants of the same gene. To date, no physiological differences have been discovered between the three different PPARγ isoforms [87]. 4.2 PPARγ and Cancer
In addition to a role in obesity and diabetes, PPARγ may regulate proliferation and differentiation (reviewed in Ref [88]), suggesting a potential role in tumorigenesis. PPARγ expression was shown to be tumor-specific in non-small cell lung cancer, with expression observed in > 50% of primary lung tumors and several immortalized cell lines [32]. Similarly, PPARγ expression was observed in >90% pancreatic adenocarcinomas investigated, in contrast to normal pancreatic duct epithelia which were negative for PPARγ at the protein level [89]. In a study of normal versus malignant brain, breast, and prostate cells, PPARγ expression was consistently higher in the malignant cells but was not regulated by ligand [90]. As well as evidence from expression studies, activation of PPARγ resulted in an inhibition of estrogen production in breast adipose tissue [91] and played a role in tumor regression [92]. Additionally, activation of PPARγ inhibited growth of human hepatocellular [93] and esophageal carcinoma cells though cell cycle arrest and induction of apoptosis [94]. The tumor suppressor functions of PPARγ ligands are reported to be mediated by modulation of expression of the tumor suppressor PTEN [95], a protein known to have an involvement in many cellular functions including proliferation and apoptosis (reviewed in Ref. [96]). 4.3 PPARγ as a Therapeutic Target?
Studies on estrogen production suggest a role for PPARγ ligands in the treatment of breast cancer [91]. In addition, the observation that activation of PPARγ is antiproliferative and correlates with maturational stage in neuroblastoma, vascular smooth muscle cells, and DMBA-induced mammary tumors suggests a potential use for PPARγ in the management of advanced stage vascularised tumors [38, 79, 97]. However, the most compelling evidence for PPARγ as a therapeutic target for cancer is derived from studies of colorectal tumors. Treatment of colorectal cancer cell lines with PPARγ ligands resulted in growth inhibition, promotion of differentiation related markers, and expression of Drg-1 (differentiation-related gene-1), a putative suppressor of metastasis in human colorectal cancer [34, 98]. Activation of PPARγ was also found to inhibit expression of COX-2 in colorectal cancer cell lines, which is in opposition to results observed with PPARα [75, 99]. Furthermore, inactivating mutations in PPARγ were identified in a subset of colorectal tumors, supporting a role for PPARγ as a tumor suppressor of colorectal carcinogenesis [100].
5 PPAR$
Although the majority of studies have demonstrated a role for PPARγ ligands as preventative agents for colorectal cancer, two studies in APCmin/+ mice suggested the converse. These studies showed an increase in tumors or polyps in the colon of these mice after they were fed a diet containing a PPARγ agonist for 8 or 5 weeks, respectively [101, 102]. However, this is in contrast to in vitro and murine xenograft studies using human cell lines in which PPARγ agonists resulted in growth inhibition and induction of differentiation [100, 103]. As already described for PPARα, species differences in response between the human cell lines compared with the APCmin/+ mice offer the most likely explanation for this discrepancy. Additionally, one recent paper suggests that the antitumor effects of TZDs are independent of PPARγ and are mediated instead by inhibition of transcription [104].
5 PPARβ 5.1 Expression and Activation
Expression of PPARβ has been demonstrated in a vast array of tissue and cell types [4, 105, 106]. Unlike PPARα and PPARγ , little is known about the physiological role of PPARβ and several independent studies have linked this receptor to diverse functions. An involvement for PPARβ has been suggested in brain lipid metabolism [107], squamous cell differentiation [105], regulation of other hormone receptors [108], and as a mediator of prostacyclin function in blastocyst implantation in mice [76]. Although several compounds have been demonstrated to activate PPARβ nonspecifically, including a number of PPARα and γ agonists, a selective ligand for PPARβ has yet to be elucidated. 5.2 PPARβ and Cancer
Unlike PPARα and PPARγ , little is known about the role of PPARβ and its involvement, if any, in the development of cancer. As described earlier, activation of this receptor has been associated with processes independent of those identified for either PPARα or PPARγ . The observations that PPARβ can repress human PPARα activity [108] and induce PPARγ expression [109] would suggest that PPARβ can act as a negative regulator of tumorigenesis in certain cell systems. In contrast to this hypothesized antitumorigenic role, PPARβ has been suggested to play a role in the development of colorectal cancer [37, 76, 110]. Expression levels of PPARβ were shown to be higher in colon carcinoma compared to the adjacent surrounding tissue [37, 76]. Furthermore, PPARβ was concentrated in the most differentiated cells at the luminal surface of the mucosal glands in normal mucosa, but was expressed in epithelial cells located throughout the dysplastic glands of the
7
8 PPARs and Cancer
neoplastic tissue [76]. This suggests that PPARβ expression alone is not proneoplastic [76]. Using serial analysis of gene expression (SAGE), PPARβ was identified as a downstream target of the adenomatous polyposis coli (APC) tumor suppressor pathway [111]. Under normal conditions, APC binds to β-catenin and inhibits its ability to form a transcription complex with TCF-4. Conversely, in the majority of colorectal tumors examined, APC is inactivated by truncating mutations giving rise to elevated transcriptional activity of the β-catenin/TCF-4 complex. Similarly, in those tumors with intact APC genes, mutations of β-catenin that render it resistant to the inhibitory effects of APC occur, thereby demonstrating that inhibition of β-catenin/TCF-4-mediated transcription is critical to the tumor suppressive role of APC [37]. Downregulation of PPARβ by APC was demonstrated to be as the result of a direct molecular interaction, since the PPARβ promoter contains TCF-4 binding sites, and PPARβ promoters were repressed by APC as well as stimulated by mutant β-catenin [37]. 5.3 PPARβ as a Therapeutic Target?
Evidence for a potential use of PPARβ ligands in cancer therapy is derived from studies of colorectal tumor development where non-steroidal anti-inflammatory drugs (NSAIDs) can bind and potentially inhibit PPARβ [37], offering an explanation of NSAID-mediated chemoprevention [110]. This theory was supported using a PPARβ-null human colorectal cancer cell line [37]. These cells showed no obvious phenotype in vitro, and no increased sensitivity to NSAIDs. However, they were defective in establishing tumors in vivo [37]. Although the absence of PPARβ affected tumorigenicity of this colorectal cell line, the authors stressed that this effect may be specific to this cell line, and that the role of PPARβ in the chemopreventative effects of NSAIDs may involve more than one mechanism [37]. However, because the APC pathway is mutated with high frequency in colon cancer and overexpression of PPARβ has been seen in many colorectal cancers, it seems likely that increased expression of PPARβ may contribute to the neoplastic process and thus its modulation may present therapeutic opportunities.
6 Conclusions
A review of the last decade of literature reveals an involvement of PPARs in many diverse biological processes from adipocyte differentiation to rat liver cancer. The evidence for a role of PPARs in fat metabolism and energy balance is conclusive whereas the evidence for a role in tumorigenesis is confusing. Nonetheless, it is clear that modulation of PPARγ activity has some effect on the regulation of tumors of the colon, opening many avenues for exploratory research. Perhaps most compelling is the tentative connection between high fat diet, cellular energy balance, and cancer of the colon and maybe of the breast. Although the way forward
References
depends on further elucidation of the basic biology of cancer, much is already known about PPAR ligand chemistry and toxicology and the binding specificity of different PPARs [19]. Thus, PPARs provide excellent therapeutic targets since they are receptors with intrinsic thresholds that act directly to modulate gene expression.
References 1 ISSEMANN, I., GREEN, S. Nature 1990, 347, 645–650. 2 KREY, G., BRAISSANT, O., L’HORSET, F., KALKHOVEN, E., PERROUD, M., PARKER, M., WAHLI, W. Mol. Endocrinol. 1997, 11, 779–791. 3 KLIEWER, S. A., FORMAN, B. M., BLUMBERG, B., ONG, E. S., BORGMEYER, U., MANGELSDORF, D. J., UMESONO, K., EVANS, R. M. Proc. Natl Acad. Sci. USA 1994, 91, 7355–7359. 4 BRAISSANT, O., FOUFELLE, F., SCOTTO, C., DAUCA, M., WAHLI, W. Endocrinology 1996, 137, 354–366. 5 CORTON, J. C., ANDERSON, S. P., STAUBER, A. Annu. Rev. Pharmacol. Toxicol. 2000, 40, 491–518. 6 DESVERGNE, B., WAHLI, W. Endocr. Rev. 1999, 20, 649–688. 7 ESCHER, P., WAHLI, W. Mutat. Res. 2000, 448, 121–138. 8 KERSTEN, S., DESVERGNE, B., WAHLI, W. Nature 2000, 405, 421–424. 9 ROBERTS, R., MOFFAT, G. Comments Toxicol. 2001, 7, 259–274. 10 ISSEMANN, I., PRINCE, R. A., TUGWOOD, J. D., GREEN, S. J. Mol. Endocrinol. 1993, 11, 37–47. 11 TUGWOOD, J. D., ISSEMANN, I., ANDERSON, R. G., BUNDELL, K. R., MCPHEAT, W. L., GREEN, S. EMBO J. 1992, 11, 433–439. 12 BARDOT, O., ALDRIDGE, T., LATRUFFE, N., GREEN, S. Biochem. Biophys. Res. Com-mun. 1993, 192, 37–45. 13 ALDRIDGE, T. C., TUGWOOD, J. D., GREEN, S. Biochem. J. 1995, 306, 473–479. 14 HSU, M., PALMER, C., SONG, W., GRIFFIN, K., JOHNSON, E. J. Biol. Chem. 1998, 273, 27988–97. 15 BAES, M., CASTELEIN, H., DESMET, L., DECLERCQ, P. Biochem. Biophys. Res. Commun. 1995, 215, 338–345. 16 MARCUS, S., CAPONE, J., RACHUBINSKI, R. Mol. Cell. Endocrinol. 1996, 120, 31–39.
17 YAN, Z., KAAM, W., STAUDINGER, J., MEDVEDEV, A., GHANAYEM, B., JETTEN, A. J. Biol. Chem. 1998, 273, 10948–57. 18 LADIAS, J., HADZOPOULOU-CLADARAS, M., KARDASSIS, D., CARDOT, P., CHENG, J., ZANNIS, V., CLADARAS, C. J. Biol. Chem. 1992, 267, 15849–60. 19 XU, H., LAMBERT, M., MONTANA, V., PLUNKET, K., MOORE, L., COLLINS, J., OPLINGER, J., KLIEWER, S., GAMPE, R., MCKEE, D., MOORE, J., WILLSON, T. Proc. Natl Acad. Sci. USA. 2001, 98, 13919–24. 20 AUBOEUF, D., RIEUSSET, J., FAJAS, L., VALLIER, P., FRERING, V., RIOU, J., STAELS, P., AUWERX, J., LAVILLE, M., VIDAL, H. Diabetes 1997, 46, 1319–1327. 21 AUWERX, J. Horm. Res. 1992, 38, 269–277. 22 BOCOS, C., GOTTLICHER, M., GEARING, K., BANNER, C., ENMARK, E., TEBOUL, M., CRICKMORE, A., GUSTAFSSON, J.-H. J. Steroid Biochem. 1995, 53, 467–473. 23 ELLINGHAUS, P., WOLFRUM, C., ASSMANN, G., SPENER, F., SEEDORF, U. J. Biol. Chem. 1999, 29, 2766–2772. 24 GOTTLICHER, M., DEMOZ, A., SVENNSSON, D., TOLLET, P., BERGE, R., GUSTAFSSON, J.-A. Biochem. Pharmacol. 1993, 46, 2177–2184. 25 KELLER, H., DREYER, C., MEDIN, J., MAHFOUDI, A., OZATO, K., WAHLI, W. Proc. Natl Acad. Sci. USA 1993, 90, 2160–2164. 26 SCHOONJANS, K., PEINADO-ONSURBE, J., LEFEBVRE, A. M., HEYMAN, R., BRIGGS, M., DEEB, S., STAELS, B., AUWERX, J. EMBO J. 1996, 15, 5336–5348. 27 DEVCHAND, P. R., KELLER, H., PETERS, J. M., VAZQUEZ, M., GONZALEZ, F. J., WAHLI, W. Nature 1996, 384, 39–43. 28 CORTON, J. C., LAPINSKAS, P. J., GONZALEZ, F. J. Mutat. Res. 2000, 448, 139–151.
9
10 PPARs and Cancer
29 MOTOJIMA, K., PASSILLY, P., PETERS, J. M., GONZALEZ, F. J., LATRUFFE, N. J. Biol. Chem. 1998, 273, 16710–14. 30 WAHLI, W., BRAISSANT, O., DESVERGNE, B. Chem. Biol. 1995, 2, 261–266. 31 BROWN, P. H., LIPPMAN, S. M. Breast Cancer Res. Treat. 2000, 62, 1–17. 32 CHANG, T. H., SZABO, E. Cancer Res. 2000, 60, 1129–1138. 33 GUAN, Y. F., ZHANG, Y. H., BREYER, R. M., DAVIS, L., BREYER, M. D. Neoplasia 1999, 1, 330–339. 34 GUPTA, R. A., BROCKMAN, J. A., SARRAF, P., WILLSON, T. M., DUBOIS, R. N. J. Biol. Chem. 2001, 276, 29681–87. 35 HAN, S. W., GREENE, M. E., PITTS, J., WADA, R. K., SIDELL, N. Clin. Cancer Res. 2001, 7, 98–104. 36 MURPHY, G. J., HOLDER, J. C. Trends Pharmacol. Sci. 2000, 21, 469–474. 37 PARK, B. H., VOGELSTEIN, B., KINZLER, K. W. Proc. Natl Acad. Sci. USA 2001, 98, 2598–03. 38 WEIHAN, S., GREENE, M. E., PITTS, J., WADA, R. K., SIDELL, N. Clin. Cancer Res. 2001, 7, 98–104. 39 ROBERTS, R. A. Arch. Toxicol. 1999, 73, 413–418. 40 LEE, S. S., PINEAU, T., DRAGO, J., LEE, E. J., OWENS, J. W., KROETZ, D. L., FERNANDEZ-SALGUERO, P. M., WESTPHAL, H., GONZALEZ, F. J. Mol. Cell. Biol. 1995, 15, 3012–3022. 41 LEE, S. S., GONZALEZ, F. J. Ann. NY Acad. Sci. 1996, 804, 524–529. 42 ROBERTS, R. A., JAMES, N. H., HASMALL, S. C., HOLDEN, P. R., LAMBE, K., MACDONALD, N., WEST, D., WOODYATT, N. J., WHITCOME, D. Toxicol. Lett. 2000, 112–113, 49–57. 43 PETERS, J. M., CATTLEY, R. C., GONZALEZ, F. J. Carcinogenesis 1997, 18, 2029–2033. 44 HASMALL, S. C., ROBERTS, R. A. Pharmacol. Ther. 1999, 82, 63–70. 45 JAMES, N. H., SOAMES, A. R., ROBERTS, R. A. Arch. Toxicol. 1998, 72, 784–790. 46 GILL, J. H., BRICKELL, P., DIVE, C., ROBERTS, R. A. Carcinogenesis 1998, 19, 1743–1747. 47 PETERS, J. M., AOYAMA, T., CATTLEY, R. C., NOBUMITSU, U., HASHIMOTO, T., GONZALEZ, F. J. Carcinogenesis 1998, 19, 1989–1994.
48 LEDWITH, B. J., JOHNSON, T. E., WAGNER, L. K., PAULEY, C. J., MANAM, S., GALLOWAY, S. M., NICHOLS, W. W. Cancer Res. 1996, 56, 3257–3264. 49 ROBERTS, R. A., JAMES, N. H., COSULICH, S., HASMALL, S. C., ORPHANIDES, G. Toxicol. Lett. 2001, 120, 301–306. 50 WEST, D. A., JAMES, N. H., COSULICH, S. C., HOLDEN, P. R., BRINDLE, R., ROLFE, M., ROBERTS, R. A. Hepatology 1999, 30, 1417–1424. 51 CHEVALIER, S., MACDONALD, N., ROBERTS, R. A. J. Cell Sci. 1999, 112, 4785–4791. 52 HASMALL, S. C., WEST, D. A., OLSEN, K., ROBERTS, R. A. Carcinogenesis 2000, 21, 2159–65. 53 HOLDEN, P. R., HASMALL, S. C., JAMES, N. H., WEST, D. R., BRINDLE, R. D., GONZALEZ, F. J., PETERS, J. M., ROBERTS, R. A. Cell. Mol. Biol. 2000, 46, 29–39. 54 PETERS, J. M., RUSYN, I., ROSE, M. L., GONZALEZ, F. J., THURMAN, R. G. Carcinogenesis 2000, 21, 823–826. 55 ROLFE, M., JAMES, N. H., ROBERTS, R. A. Carcinogenesis 1997, 18, 2277–80. 56 GILL, J. H., JAMES, N. H., ROBERTS, R. A., DIVE, C. Carcinogenesis 1998, 19, 299–304. 57 HASMALL, S. C., JAMES, N. H., MACDONALD, N., GONZALEZ, F. J., PETERS, J. M., ROBERTS, R. A. Mutat. Res. 2000, 448, 193–200. 58 ROBERTS, R. A., SOAMES, A. R., GILL, J. H., JAMES, N. H., WHEELDON, E. B. Carcinogenesis 1995, 16, 1693–1698. 59 ROBERTS, R. A., JAMES, N. H., WOODYATT, N. J., MACDONALD, N., TUGWOOD, J. D. Carcinogenesis 1998, 19, 43–48. 60 The Coronary Drug Project Research Group. JAMA 1975, 231, 360–380. 61 IARC. Clofibrate. IARC Working Group Monograph on the Evaluation of the Carcinogenic Risk to Humans 1996, vol. 66, pp. 391–426. 62 Committee of Principal Investigators. Lancet 1980, 379–385. 63 Committee of Principal Investigators. Lancet 1984, 600–604. 64 LAW, M., THOMPSON, S., WALD, N. BMJ 1994, 308, 373–379.
References 65 VANDEN HEUVEL, J. P. Toxicol. Sci. 1999, 47, 1–8. 66 PALMER, C., HSU, M. H., GRIFFIN, K., RAUCY, J., JOHNSON, E. Mol. Pharmacol. 1998, 53, 14–22. 67 GONZALEZ, F. In: Dietary Fat and Cancer: Genetic and Molecular Interactions (Proceedings of Seventh Annual Conference of AICR held in Washington, D.C., August 28–30, 1996). 1997, pp. 109–125. 68 WOODYATT, N. J., LAMBE, K. G., MYERS, K. A., TUGWOOD, J. D., ROBERTS, R. A. Carcinogenesis 1999, 20, 369–372. 69 LAWRENCE, J., LI, Y., CHEN, S., DELUCA, J., BERGER, J., UMBERNHAUER, D., MOLLER, D., ZHOU, G. J. Biol. Chem. 2001, 276, 31521–7. 70 HSU, M.-H., SAVAS, U., GRIFFIN, K., JOHNSON, E. J. Biol. Chem. 2001, 276, 27950–8. 71 COLLETT, G. P., BETTS, A. M., JOHNSON, M. I., PULIMOOD, A. B., COOK, S., NEAL, D. E., ROBSON, C. N. Clin. Cancer Res. 2000, 6, 3241–3248. 72 LEDWITH, B. J., PAULEY, C. J., WAGNER, L. K., ROKOS, C. L., ALBERTS, D. W., MANAM, S. J. Biol. Chem. 1997, 272, 3707–3714. 73 BONAZZI, A., MASTYUGIN, V., MIEYAL, P. A., DUNN, M. W., LANIADO-SCHWARTZMAN, M. J. Biol. Chem. 2000, 275, 2837–2844. 74 MEADE, E. A., MCINTYRE, T. M., ZIMMERMAN, G. A., PRESCOTT, S. M. J. Biol. Chem. 1999, 274, 8328–8334. 75 IKAWA, H., KAMEDA, H., KAMITANI, H., BAEK, S. J., NIXON, J. B., HSI, L. C., ELING, T. E. Exp. Cell Res. 2001, 267, 73–80. 76 GUPTA, R. A., TAN, J., KRAUSE, W. F., GERACI, M. W., WILLSON, T. M., DEY, S. K., DUBOIS, R. N. Proc. Natl Acad. Sci. USA 2000, 97, 13275–80. 77 BARTSCH, H., NAIR, J., OWEN, R. W. Carcinogenesis 1999, 20, 2209–2218. 78 BADAWI, A., EL-SOHEMY, A., STEPHEN, L., GHOSHAL, A., ARCHER, M. Carcinogenesis 1998, 19, 905–910. 79 PIGHETTI, G. M., NOVOSAD, W., NICHOLSON, C., HITT, D. C., HANSENS, C., HOLLINGSWORTH, A. B., LERNER, M. L., BRACKETT, D., LIGHTFOOT, S. A., GIMBLE, J. M. Anticancer Res. 2001, 21, 825–829.
80 CHINETTI, G., GRIGLIO, S., ANTONUCCI, M., TORRA, I., DELERIVE, P., MAJD, Z., FRUCHART, J.-C., CHAPMAN, J., NAJIB, J., STAELS, B. J. Biol. Chem. 1998, 273, 25573–80. 81 HORNUNG, D., WAITE, L., RICKE, E., BENTZIEN, F., WALLWEINER, D., TAYLOR, R. J. Clin. Endocrin. Metab. 2001, 86, 3108–3114. 82 ROBERTS, R. A., KIMBER, I. Carcinogenesis 1999, 20, 1397–1401. 83 KLIEWER, S. A., LENHARD, J. M., WILLSON, T. M., PATEL, I., MORRIS, D. C., LEHMANN, J. M. Cell 1995, 83, 813–819. 84 GIMBLE, J. M., ROBINSON, C. E., WU, X., KELLY, K. A., RODRIGUEZ, B. R., KLIEWER, S. A., LEHMANN, J. M., MORRIS, D. C. Mol. Pharmacol. 1996, 50, 1087–1094. 85 KLIEWER, S. A., XU, H. E., LAMBERT, M. H., WILLSON, T. M. Recent Prog. Horm. Res. 2001, 56, 239–263. 86 ROCCHI, S., AUWERX, J., ISEMURA, M. Br. J. Nutr. 2000, 84, S223–S227. 87 GELMAN, L., FRUCHART, J. C., AUWERX, J. Cell. Mol. Life Sci. 1999, 55, 932–943. 88 FAJAS, L., DEBRIL, M.-B., AUWERX, J. J. Mol. Endocrinol. 2001, 27, 1–9. 89 MOTOMURA, W., OKUMURA, T., TAKAHASHI, N., OBARA, T., KOHGO, Y. Cancer Res. 2000, 60, 5558–5564. 90 NWANKWO, J. O., ROBBINS, M. E. Prostaglandins Leukot. Essent. Fatty Acids 2001, 64, 241–245. 91 RUBIN, G. L., ZHAO, Y., KALUS, A. M., SIMPSON, E. R. Cancer Res. 2000, 60, 1604–1608. 92 AGARWAL, V. R., BISCHOFF, E. D., HERMANN, T., LAMPH, W. W. Cancer Res. 2000, 60, 6033–6038. 93 RUMI, M., SATO, H., ISHIHARA, S., KAWASHIMA, K., HAMAMOTO, S., KAZUMORI, H., OKUYAMA, T., FUKUDA, R., NAGASUE, N., KINOSHITA, Y. Br. J. Cancer. 2001, 1640–1647. 94 TAKASHIMA, T., FUJIWARA, Y., HIGUCHI, K., ARAKAWA, T., YANO, Y., HASUMA, T., OTANI, S. Int. J. Oncol. 2001, 19, 465–471. 95 PATEL, L., PASS, I., COXON, P., DOWNES, C., SMITH, S., MACPHEE, C. Curr. Biol. 2001, 11, 764–768. 96 SIMPSON, L., PARSONS, R. Exp. Cell Res. 2001, 264, 29–41.
11
12 PPARs and Cancer
97 HUPFELD, C. J., WEISS, R. H. Am. J. Physiol. 2001, 281, 207–216. 98 GUAN, R. J., FORD, H. L., FU, Y., LI, Y., SHAW, L. M., PARDEE, A. B. Cancer Res. 2000, 60, 749–755. 99 YANG, W. L., FRUCHT, H. Carcinogenesis 2001, 22, 1379–1383. 100 SARRAF, P., MUELLER, E., SMITH, W. M., WRIGHT, H. M., KUM, J. B., AALTONEN, L. A., de LA CHAPELLE, A., SPIEGELMAN, B. M., ENG, C. Mol. Cell. 1999, 3, 799–804. 101 SAEZ, E., TONTONOZ, P., NELSON, M. C., ALVAREZ, J. G., MING, U. T., BAIRD, S. M., THOMAZY, V. A., EVANS, R. M. Nature Med. 1998, 4, 1058–1061. 102 LEFEBVRE, A. M., CHEN, I., DESREUMAUX, P., NAJIB, J., FRUCHART, J. C., GEBOES, K., BRIGGS, M., HEYMAN, R., AUWERX, J. Nature Med. 1998, 4, 1053–1057. 103 KITAMURA, S., MIYAZAKI, Y., SHINOMURA, Y., KONDO, S., KANAYAMA, S.,
104
105
106 107
108 109
110 111
MATSUZAWA, Y. Jpn J. Cancer Res. 1999, 90, 75–80. PALAKURTHI, S., AKTAS, H., GRUBISSICH, L., MORTENSEN, R., HALPERIN, J. Cancer Res. 2001, 61, 6213–6218. MATSUURA, H., ADACHI, H., SMART, R. C., XU, X., ARATA, J., JETTEN, A. M. Mol. Cell. Endocrinol. 1999, 147, 85–92. BRAISSANT, O., WAHLI, W. Endocrinology 1998, 139, 2748–2754. BASU-MODAK, S., BRAISSANT, O., ESCHER, P., DESVERGNE, B., HONEGGER, P., WAHLI, W. J. Biol. Chem. 1999, 274, 35881–88. JOW, L., MUKHERJEE, R. J. Biol. Chem. 1995, 270, 3836–3840. BASTIE, C., HOLST, D., GAILLARD, D., JEHL-PIETRI, C., GRIMALDI, P. A. J. Biol. Chem. 1999, 274, 21920–25. ROY, H. K., KAROLSKI, W. J., RATASHAK, A. Int. J. Cancer 2001, 92, 609–615. HE, T. C., CHAN, T. A., VOGELSTEIN, B., KINZLER, K. W. Cell 1999, 99, 335–345.
1
PPARs in Atherosclerosis Jorge Plutzky Harvard Medical School, Boston, MA, USA, 02115
Originally published in: Cellular Proteins and Their Fatty Acids in Health and Disease. Edited by Asim K. Duttaroy and Friedrich Spener. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30437-0
1 Atherosclerosis 1.1 Introduction
Atherosclerosis and its protean manifestations – including myocardial infarction, cerebrovascular event, peripheral vascular disease – remains a major cause of morbidity and mortality worldwide [1]. Although a common scourge of Western society, the exportation of atherosclerosis to developing countries reveals obvious and worrisome trends. This immense burden of disease has been countered with extensive efforts to better understand, prevent, and treat atherosclerosis and its complications. Over the years, this has led to an evolution of thinking regarding atherosclerosis [2, 3]. A century ago, atherosclerosis was considered primarily a degenerative disease of the elderly. Over time, it became apparent that atherosclerosis is a clinical entity manifest in a much broader demographic segment of the population. Early observations focused on intra-luminal arterial obstructions as being at the center of an imbalance between arterial supply and tissue demand that precipitated cardiac ischemia and infarction. Although true to some extent for “demand ischemia”, this view of a gradual “hardening of the arteries” gave way to understanding atherosclerotic plaque as a more dynamic lesion [4]. Critical to this was recognition of thrombus formation as an integral element in both the propagation of lesions as well as their complications [5, 6]. Most of acute cardiovascular events derive from plaque rupture and the ensuing occlusive thrombus that results [7]. More recently, the field has focused on two dominant issues regarding the nature of atherosclerotic plaque. The first is atherosclerosis as a chronic inflammatory Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 PPARs in Atherosclerosis
disease [1]. The second is attention to how metabolic pathways and disorders fundamentally and pervasively influence vascular responses and the atherosclerotic process [8]. Both of these trends – inflammation and metabolism as critical components of the atherosclerotic process – underlie the explosive interest in the role of peroxisome proliferator activated receptors (PPARs) in vascular biology and atherosclerosis [9, 10]. We will review the current concepts in atherosclerosis itself before delving into the existing evidence implicating PPARs and their ligands as possible mediators in these responses, looking in turn at PPARα, PPARγ , and PPARδ (also known as PPARβ). 1.2 Atherosclerosis as a Clinical Syndrome
Atherosclerosis can no longer be considered as a single pathologic process accounting for its various clinical manifestations. Two common scenarios highlight this spectrum of disease. The first is the 45-year-old man who presents with sudden death and is found at autopsy to have ruptured a single severe atherosclerotic plaque in his left anterior descending artery, with a subsequent rapid cascade of occlusive thrombus, acute cardiac ischemia, ventricular fibrillation, and death. Contrast this to the octagenarian who has a mild infarct as a complication of hip surgery and is found to have diffuse mild three-vessel disease. Both patients have atherosclerosis, but the manifestations are quite divergent. Similarly, distinctions can be made between atherosclerosis in the coronary arteries, cerebrovasculature, aorta, and femoral vessels. The presence of biomechanical factors, such as shear stress and hemodynamics, which alter cellular responses including gene expression is one of many likely contributors to such differences [11]. Here we will focus largely on coronary atherosclerosis. Atherosclerosis develops through a variety of recognized stages, from its earliest manifestation as a fatty streak through raised lesions to evolution into complicated advanced lesions [6]. An early seminal event in atherosclerosis is the entry of inflammatory cells like monocytes (MO) and their in situ development into macrophages (Mφ) and then foam cells as they take up lipid [1]. This arterial entry of MO/Mφ involves a series of specific cellular interactions heavily dependent on the expression of adhesion molecules on the luminal surface of endothelial cells (EC) [12]. Late-stage complicated lesions are characterized by a large necrotic and highly thrombogenic lipid core, separated from the circulation by the fibrous cap, a reactive response produced primarily by vascular smooth muscle cells (VSMCs) and consisting of collagen and other extracellular matrix materials [10]. Data from pathologic studies as well as thrombolytic trials, in which the culprit lesion leading to myocardial infarction (MI) could be seen by comparing coronary angiograms preand post-thrombolysis, revealed that most MIs occurred not in the most stenotic lesions but rather in those with more modest stenoses, in the 50–70% range [13, 14]. Through such work, the fissuring of the fibrous cap, known as plaque rupture, and the exposure of the circulation to the thrombogenic lipid core result in the occlusive thrombosis inducing most MIs.
1 Atherosclerosis
These observations have focused attention on the nature of the fibrous cap, what maintains it, and the forces and players involved with plaque rupture versus plaque stabilization [7]. One group of proteins highly implicated in plaque destabilization are the matrix metalloproteinases (MMPs) [15]. MMPs represent a large, complex, family of highly regulated matrix-degrading enzymes thought to be especially active in the shoulder regions of ruptured plaques. Another mechanism contributing to lesion formation is the superficial erosion of ECs. 1.3 Cellular Constituents of Atherosclerosis
The complex interplay of factors and forces described above clearly involves unique and critical roles for different cellular constituents of the arterial wall in both normal vessel function and pathologic atherosclerotic responses. One fundamental shift in perspective over the past decade has been in understanding the endothelium as not a simple passive conduit but rather a dynamic organ involved in metacrine, paracrine, and endocrine function [16, 17]. In their critical position lining blood vessels, ECs represent the interface between circulating elements and the ultimate tissue response, serving as a transducer of those reactions. Endothelial changes like adhesion molecule expression are not only among the earliest cellular responses contributing to atherosclerosis, endothelial dysfunction also occurs as an early manifestation of clinical atherosclerosis [18]. Centrally involved in such changes, and a salient example of endothelial roles in vascular responses, is endothelial nitric oxide production, an endogenously produced substance synthesized by endothelial nitric oxide synthetase, that induces arterial relaxation through effects on VSMCs [19, 20]. The studies establishing the dependency on the endothelium for such responses contributed to the Nobel Prize in Medicine being awarded to Furchgott, Murad, and Ignarro [21]. The medial layer of the arterial wall is primarily muscular, comprising VSMCs. These cells help maintain vascular tone and provide an essential structural component. VSMCs are integral to many processes implicated in atherosclerosis [22]. They may contribute to the presence of hypertension or be involved in the reaction to it. The migration of VSMCs from the media to the intima as well as proliferation of VSMCs once present there is a hallmark of the atherosclerotic process. VSMCs also provide the main source of extracellular matrix which forms the fibrous cap. In this role, VSMCs provide an essential function, albeit a reaction to a pathologic state. VSMC dysfunction may contribute to the weakening of the fibrous cap due to less collagen synthesis. Interestingly, such decreases in collagen production are repressed by inflammatory stimuli such as interleukins (IL-1). VSMCs also elaborate matrix metalloproteinases (MMPs), which contribute to the remodeling process [6]. Interestingly, MMP expression appears in part regulated by pro-inflammatory signals such as IL-1 [23, 24]. Two other cell types represent critical players in atherosclerosis, even if they are not cellular constituents of the artery wall under normal conditions. These atheroma-associated cells include monocytes/macrophages and lymphocytes,
3
4 PPARs in Atherosclerosis
predominantly T cells [1]. Monocytes are attracted from the circulation to sites of endothelial injury, going through a process of rolling, attachment, and entry into the vessel wall [12]. Extensive data suggest chemokines, or chemoattractant cytokines, as an important signal attracting monocytes and other inflammatory cells to sites of inflammation and injury [25]. Members of this large complex family of proteins are induced by different signals, including inflammatory cytokines acting on various cells, such as EC. Secreted chemokines bind to chemokine receptors present on specific inflammatory cells. Moving down concentration gradients of chemokines, monocytes adhere and enter the vessel wall, and take up lipid, especially low-density lipoproteins (LDL). T lymphocytes are also critical atheromaassociated cells, providing important signals, for example inflammatory cytokines that contribute to monocyte, VSMC, and EC activation [26]. 1.4 Atherosclerosis as an Inflammatory Disorder
As suggested above, an extensive literature has now established atherosclerosis as a chronic inflammatory disease [1]. Although the exact proximal signals remain unclear, with posited contributors ranging from possible infectious etiologies to oxidized LDL, inflammatory cells and inflammatory signals are implicated in essentially every step of atherogenesis and atherosclerosis. Extensive efforts have identified many inflammatory mediators that participate in these responses. Interestingly, these basic science studies now intersect with epidemiologic studies which find that levels of non-specific markers of inflammation like c-reactive protein (cRP) predict the risk of future cardiovascular events [27]. Interestingly, c-RP levels are predictive above and beyond total cholesterol/HDL ratios, and decreased cardiovascular events through lipid-lowering are paralleled by decreases in c-RP levels [28, 29]. Although a multitude of questions persist about the clinical significance of inflammatory markers, their striking correlation with atherosclerosis underscores the prospect of inflammation as a therapeutic target, a notion that has been raised for PPAR agonists. 1.5 Atherosclerosis as a Metabolic Disorder
Coincident with expanding recognition of the dynamic nature of the vasculature and its pathobiology has been the increasing attention to the interaction between various aspects of metabolism and vascular responses [8]. The most obvious connection in this regard has been lipid metabolism and its now well-established role in atherosclerosis. This extends beyond the extensive data establishing LDL as a risk factor, pathogenic mediator, and therapeutic target to other lipoprotein particles [30]. Triglycerides, carried predominantly in VLDL particles (when synthesized by the liver) and chylomicrons (when particles are assembled in the gut) are composed primarily of fatty acids (FAs) [31, 32]. Establishing if triglycerides represent independent risk factors for atheroclerosis has been challenging [33, 34]. Although
2 PPAR in the Vasculature
triglycerides are clearly associated with cardiovascular events, this association tends to weaken or disappear when other parameters are taken into account. This is not surprising given the close interaction between other risk factors like obesity, diabetes, the post-menopausal state and increased triglyceride levels [35, 36]. Triglycerides are also associated, although inversely, with high-density lipoproteins [30]. HDL is thought to participate in reverse cholesterol transport, moving lipoproteins from the periphery back to the liver [37]. The extensive connections between PPARs and lipoprotein metabolism are discussed in detail elsewhere in this book, but remain relevant to the connections between PPARs and atherosclerosis since as PPAR agonists may exert their effects on atherosclerosis through changes in lipids. Recent trials suggest that even in patients with coronary artery disease (CAD) with average to low LDL levels, modestly increased triglycerides, and low HDL, fibrate treatment can decrease cardiovascular events [38]. The possibility that these may occur in part through PPARα activation has been raised [9, 39]. Beyond lipoproteins, the metabolic effects on atherosclerosis are also strikingly evident among diabetic patients [40]. Diabetes mellitus is now considered a cardiovascular risk equivalent based on data that patients with diabetes but no known CAD have the same risk of a future cardiovascular event as non-diabetics with a prior MI [41–43]. Increased cardiovascular risk also extends to patients with the insulin resistance syndrome (syndrome X) and even so-called “pre-diabetes”. A significant percentage of these insulin-resistant subjects will go on to manifest frank diabetes [44, 45]. The fact that so many of these patients will have clinically significant atherosclerosis has suggested the hypothesis that the metabolic aspects of these various syndromes - high triglycerides, low HDL, increased post-prandial hyperglycemia, and lipemia – may exert effects on the vasculature long before hyperglycemia diagnostic of diabetes becomes apparent [46]. Supportive of this hypothesis is the observation that perhaps as many as half of the patients by the time they are diagnosed with diabetes have already had an MI [47, 48]. Considerable attention has been paid to the potential effects, either protective or harmful, of anti-diabetic medications on the vasculature [49, 50].
2 PPAR in the Vasculature 2.1 PPARs in Vascular Biology and Atherosclerosis
From this broad overview, we can now consider the potential role of PPAR pathways in atherosclerosis. PPARs have been considered elsewhere in this book; the focus here is on the potential part these key transcriptional regulators of various aspects of metabolism (Tab. 1) may play in vascular responses, especially those relevant to atherosclerosis. To some extent, this issue has been greatly influenced by the ongoing use of PPAR agonists as therapeutic agents in patients with high risk for atherosclerosis – either those with dyslipidemia receiving fibrates or diabetic
5
6 PPARs in Atherosclerosis Table 1 Putative ligands, cellular expression patterns, and pathways in which PPARs are
thought to be involved PPARα
PPARγ
PPARδ
Ligands
Fibrates Fatty acids
Carba/prostacyclins Fatty acids
Expression
Liver Muscle Vascular: ECs, SMCs Atheroma-associated: MPs, MPs, lymphocytes FA oxidation Lipid metabolism/transfer Energy balance
Thiazolidinediones Fatty acids Hydroxyoctadenoic acids Adipose tissue Vascular: ECs, SMCs Atheroma-associated: MPs, lymphocytes Adipogenesis Lipid metabolism/transfer Glucose homeostasis
Inflammation? Lipid metabolism/transfer
Pathways
Ubiquitous
PPAR expression in vascular and atheroma associated cells is now well-established, raising the prospect of synthetic and endogenous PPAR ligands having direct effects on vascular responses. EC, endothelial cell; SMC, smooth muscle cell; FA, fatty acid; MP, macrophages.
patients receiving thiazolidinediones. Early experiments examining these direct effects of PPAR agonists in the vasculature sought to first determine if PPARs were present in the various cellular components of the arterial wall and/or atheromaassociated cells, before considering if PPAR-regulated targets with relevance to vascular biology existed. Work has now evolved into testing if such effects occur in vivo, including humans. Thus, prior and emerging clinical trial data regarding cardiovascular events in patients receiving PPAR agonists can be viewed through this lens. To what extent did changes in cardiovascular events seen in those clinical trials stem from indirect changes in metabolism induced by PPAR agonists versus possible effects of directly activating PPARs in vascular cells. Beyond these pharmacologic issues, a pertinent question also exists regarding the role of endogenous PPAR activation in vascular responses. 2.2 Examining Evidence for PPAR in Vascular Responses
To a certain degree, the data regarding PPARs in the vasculature have been limited by several critical issues. One is the inherent complexity of PPAR action, with multiple levels of control that can vary among different cell types or conditions [51]. These issues include differences in respect to ligands, and the effects of those ligands at different concentrations, differences in PPAR action dependent on cell type, for example due to the presence or absence of certain co-activators or the relative levels of PPAR themselves, and variance in promoter response elements, to name a few. Published results may also differ because of different experimental protocols. Thus, perhaps not surprisingly, various and at times divergent effects
3 PPAR( in Vascular Biology and Atherosclerosis
have been reported in different and sometimes the very same vascular or atheromaassociated cell types. Furthermore, it remains challenging to ever establish that a given effect occurs in response to nuclear receptor activation, and not some other up- or downstream mechanism. These issues must be kept in mind in interpreting many of the results reported in the PPAR literature in general and certainly with regard to PPARs in the vasculature. Thus, elements of confusion, contradiction, and unresolved issues persist in the nascent world of PPARs and their agonists in the vasculature. These issues, perhaps a product of the dizzying pace of progress, will be noted as they arise in the review of the data that follows.
3 PPARγ in Vascular Biology and Atherosclerosis 3.1 In vitro Evidence
PPARγ expression is now recognized in essentially all vascular and atheromaassociated cells seen in the vessel wall [52]. This includes VSMCs, ECs, monocyte/macrophages, and more recently Tcells. PPARγ is also expressed in human atherosclerotic lesions [53]. With the presence of these nuclear receptors established in these cellular settings, the question arises as to the evidence for regulation of relevant targets in the vascular pathways described above. PPARγ expression has been linked to a variety of integral responses in VSMCs. Among these, regulation of matrix metalloproteinase 9 (MMP-9) is perhaps an example of a now canonical PPARγ -regulated target gene, with evidence of transrepression of MMP-9 induction seen across many levels in both VSMCs [54] and monocytes/macrophages [55]. PPARγ agonists repress MMP-9 mRNA induction, protein levels, and gelatinolytic activity on zymograms, a commonly used tool to study MMP responses [54]. Similar PPARγ effects have been reported for the MMP-9 promoter [55]. A net effect of decreased MMP-9 action seems apparent given the absence of PPARγ regulation of tissue inhibitors of MMPs (known as TIMPs) [54]. PPARγ agonists also limit VSMC migration [54] and proliferation [56] with effects on cell cycling [57]. These changes in VSMC responses may contribute to thiazolidinedione effects in cardiovascular settings in which VSMC matrix remodeling may be at work, for example instent restenosis [58, 59]. They may also play a role in the inhibition seen in intimal: medial hyperplasia reported with thiazolidinediones [60]. PPARγ agonists also exert effects on MMP-9 in monocyte/macrophages, where MMPs [61] have been implicated in plaque rupture; it remains to be seen if PPARγ activation contributes to plaque stabilization. In addition to similar effects through PPARγ activation on MMPs in monocytes and macrophages, PPARγ activation also has other effects in these cells as well. Among the earliest reports of PPARγ action in inflammatory cells were studies indicating that PPARγ agonists could limit cytokine induction in
7
8 PPARs in Atherosclerosis
monocytes/macrophages [62, 63]. More recent work establishes PPARγ (and PPARα, discussed below) expression in T lymphocytes [64–66], with a variety of effects reported, including apoptosis and inhibition of cytokine production [67]. Provocative results were also reported for PPARγ -mediated induction of CD36, a receptor for oxidized LDL uptake [68, 69]. These effects, which appeared to contribute to foam cell formation, raised concerns for possible pro-atherogenic effects through thiazolidinediones. This would not necessarily be the case, given other possible off-setting effects through sequestering oxidized LDL by CD36 induction in other tissues, and concurrent changes in cholesterol efflux [70]. Furthermore, the nature of the increased lipid in these monocytes was never established and could well have been triglyceride-rich lipoprotein, and not the lipid particles typically associated with foam cells. In fact, subsequent work has established that PPARγ , and apparently PPARα activators also induce the expression of ABC-1, a protein implicated in cholesterol efflux [71–73]. ABC-1 is the gene disrupted in Tangier’s disease, a clinical condition associated with markedly decreased HDL levels [74, 75]. Endothelial cells also express PPARγ as shown by RT-PCR [76], Western blotting and Northern analysis [77]. The anti-inflammatory effects suggested above have also been implicated in ECs on a different limb of inflammation – chemokine expression. We have reported that PPARγ agonists limit IFNγ induction of specific subsets of chemokines implicated in atherosclerosis, at least among those tested. Three different well-established PPARγ agonists limited induction of three different CXC chemokines (IP-10, Mig, and ITAC), but not the CC chemokine MCP-1 [78]. Further analysis found consistent PPARγ effects on the IP-10 promoter and in functional responses to lymphocytes expressing the CXC chemokine receptor. Although no effect was seen on MCP-1, this does not exclude PPARγ effects on the MCP-1 pathway. Two groups have reported responses that either clearly or likely occur by repression of the receptor for MCP-1 – CCR2 [79, 80]. Interestingly, MCP-1 levels were not changed in mouse models of atherosclerosis treated with PPARγ agonists, although MCP-1 levels do appear to fall in patients treated with TZDs [81]. PPARγ agonists may also induce apoptosis in EC [82], although these effects were observed primarily with 15-deoxy12,14 prostaglandin J2 (15-deoxy-12,14 PGJ2 ), which also has PPAR-independent effects [83]. Another relevant PPARγ target in ECs highlights some of the important issues noted earlier regarding challenges of establishing effects in vitro and comparing these to clinical responses. A variety of groups have found that PPARγ agonists could induce plasminogen activator inhibitor-1 (PAI-1), a procoagulant protein implicated in atherosclerosis [77, 84, 85]. Increased PAI-1 expression through PPARγ has been reported in EC in angiogenesis studies [84], in response to putative natural but not synthetic ligands [86] and in adipocytes [85]. In contrast, others find a decrease in PAI-1 in EC. Regardless, in human studies, serum PAI-1 levels appear to fall, possibly due to improved insulin sensitivity, or improved levels of glucose and/or triglycerides [87]. Thus different effects may be seen in vitro while clinical responses may vary due to pleiotropic drug effects, including differences between different agonists as well as PPAR-independent effects.
4 PPAR" in Vascular Biology and Atherosclerosis
3.2 In vivo Evidence
At least four different studies have reported various PPARγ agonists in vivo decrease atherosclerosis in mouse models [81, 88–90]. The first of these, by Glass and colleagues, studied the effects of rosiglitazone and an experimental PPARγ agonist, GW7845, in LDL receptor-deficient animals [81]. The extent of lesions was decreased with both agents, although interestingly, only in male mice. The explanation for this gender difference is unclear, but the female mice were more insulin resistant. This difference between males and females was not observed in other studies employing other models (apoE-deficient and LDLR-deficient mice on high fat or high fructose) and other PPARγ agonists. The ongoing clinical use of PPARγ ligands (pioglitazone or Actos, rosiglitazone or Avandia) as insulin-sensitizing agents in humans affords the possibility of asking if these drugs exert similar effects on atherosclerosis. Such questions are particularly germane given the very high risk for ischemic cardiovascular disease that patients with diabetes face. Despite these opportunities, establishing that the effects of any PPAR agonist derives from activation of its purported nuclear receptor target remains difficult. These agents may influence atherosclerosis indirectly, by improving metabolic status, directly but independent of PPAR activation, or directly through activation of its cognate PPAR. Early studies in humans suggest that these agents can in fact improve both serum and vascular surrogate markers of abnormal vascular responses [60, 87, 91, 92]. In terms of lipids, decreased triglycerides and improved HDL levels have been seen, often with changes that rival or exceed those seen in trials demonstrating decreased cardiac events through HDL-raising therapies [93]. Questions persist if there may also be differential effects on various parameters such as lipids among PPARγ agonists. Beyond this, improvements in inflammatory markers have also been seen, as recently reported. Such effects include changes in markers such as c-reactive protein (C-RP) [94], currently under intensive study as a novel predictor of cardiovascular risk. Arterial reactivity may be improved after TZD treatment, a potentially significant finding given evidence for endothelial dysfunction as a surrogate marker for early changes in atherosclerosis, although much of this data has been in abstract form. Perhaps most impressive have been the improvements in carotid intimal thickness seen in relatively short time frames in studies with limited numbers of patients. These results have contributed to a variety of ongoing studies focusing on cardiovascular endpoints in patients being treated with these agents. 4 PPARα in Vascular Biology and Atherosclerosis 4.1 In vitro Evidence
The central role for PPARα in fatty acid metabolism – controlling enzymes involved with β-oxidation as well as various targets involved in lipid metabolism – places PPARα in a position to be relevant to a host of vascular issues [95]. Such a role is
9
10 PPARs in Atherosclerosis
furthered by evidence that certain FAs can act as ligands for PPARα [51, 96]. FAs, present in various lipoproteins to a variable degree and with a varying nature (chain length, degree of saturation, cis/trans conformation), have been implicated in many aspects of EC, VSMC, monocyte/macrophage and even lymphocyte responses [97– 100]. PPARα has been reported in all such cellular settings, thus targets in these locations may come under regulation by synthetic and natural PPAR agonists, exerting direct effects on vascular responses. Such action could occur independently or in synergism with PPARα’s myriad effects on lipid metabolism, which range from regulation of apolipoproteins involved in HDL metabolism to enzymes of lipid metabolism (e.g., lipoprotein lipase), to even possible effects of other lipid drugs on HDL levels (e.g., statin-induction of HDL). Our focus here is on the potential direct vascular effects. Prior observations from our group revealed that certain fatty acids, including the ω-3 FA docosohexanoic acid, could limit adhesion molecule expression, especially VCAM-1, in ECs [101]. We asked if these effects might be occurring through PPARα activation. In fact, PPARα agonists can limit inflammatory cytokine-induced VCAM-1 expression, with effects that do not derive from cellular toxicity or changes in mRNA stability. These effects are demonstrable at the level of the VCAM-1 promoter through likely NFkB effects, as well as in functional assays of adhesion [86]. Although we found this effect to be specific to PPARα, others report that PPARγ agonists can also limit adhesion molecule expression [102, 103]. More recent evidence indicates that oxidation of some ω-3 FA, like eicosopentanoic acid (EPA), can limit leukocyte adhesion through PPARαdependent mechanisms evident in vivo (Fig. 1) [104]. These effects, specific to oxidized EPA and evident in classic PPARα transactivation assays, were absent when such in vivo adhesion studies were repeated in the genetic absence of PPARα (Fig. 1) [104]. Other PPARα-regulated targets in ECs include endothelin 1 as well as various enzymes involved in oxidative processes including superoxide dismutase and phox (p47, p22) [92, 105]. The effects of PPAR agonists in general appear to be an anti-oxidative effect, as also suggested by studies in humans [91, 92]. In contrast to these potentially anti-atherosclerotic effects, at least two reports indicate that certain oxidized lipoproteins may induce adhesion molecule expression through PPARαdependent mechanisms [106], although perhaps only under certain conditions such as in the presence of specific lipolytic enzymes [107]. Integrating these apparently contradictory effects into a unified model remains currently beyond our grasp but certainly worthy of further investigation, and likely a function of the complexity alluded to earlier. PPARα is also expressed in VSMCs. Evidence from Staels and colleagues indicates that PPARα activation can limit IL-6-induced changes in those cells [108]. PPARs clearly play a role in inflammatory cell differentiation and signaling, thus providing another setting in which they are likely players in atherosclerosis. In monocyte/macrophages PPARα has been reported to induce apoptosis, but only after cytokine stimulation, in contrast to PPARγ agonists, which induced apoptosis in the absence of such stimuli [109]. Mentioned earlier was the possible effect of
4 PPAR" in Vascular Biology and Atherosclerosis
Fig. 1 Effect of oxidized EPA on leukocyte adhesion in mesenteric venules in wildtype and PPAR"-deficient mice. Wild-type or PPAR"-deficient mice (PPAR"−/−) were given an intraperitoneal injection of vehicle (Veh) alone, native EPA, or oxidized EPA (oxEPA) one hour prior to injection of LPS. Five hours later mice were anesthetized and mesenteric venules were observed using intra-vital microscopy. (A) Adherent leukocytes were determined (n = 5–7
for each group of mice). *P<0.03 compared with Veh + LPS (wild-type) and oxidized EPA+LPS (PPAR"−/−). Similar results were seen for leukocyte rolling. (B) Representative photographs of leukocytes interacting with the vessel wall (arrows) in LPS stimulated wild-type and PPAR"−/− mice, after indicated treatments, are shown. The effects of oxidized EPA are abrogated in the genetic absence of PPAR" [104].
PPARα on ABC-1 regulation [72]. Recent work highlights the effects of PPARs in T lymphocytes, although the results have varied, in part due to different agonists, concentrations, and cell types employed [65, 67, 110, 111]. We found PPARα activation limits inflammatory cytokine production, including TNFa, IL-2, and IFNγ [67]. This would argue for a significant proximal effect of PPARα activators on cytokine production by T lymphocytes, which is of potential importance given the critical part of lymphocytes in interacting with and activating macrophages in atherosclerotic plaque.
11
12 PPARs in Atherosclerosis
4.2 In vivo Evidence
In vivo mouse data had previously been somewhat lacking. This may have been a result of the toxicity that mice experience in response to fibrate-induced peroxisomal proliferation. PPARα-deficient mice were crossed to the apoE deficiency model of atherosclerosis, with the surprising result that these mice had less not more atherosclerosis, despite the published record supporting an anti-inflammatory and anti-atherosclerotic roles for PPARα in the vasculature [112]. One might have anticipated that atherosclerosis should have been accentuated, not decreased. Several factors may have contributed to this result. This important experiment did require the crossing of different strains, which could have contributed to the results. Interestingly, this same group has reported in abstract form that bone marrow transplantation of PPARα-deficient bone marrow ameliorates atherosclerosis in the LDL receptor-deficient mouse, potentially more consistent with the brunt of the in vitro data [113]. If so, perhaps involvement of PPARα in apoE pathways may have contributed to different responses in the absence of apoE. Anomalous results in mouse models of atherosclerosis have engendered general questions regarding the suitability of this model for broad extrapolation to human atherosclerosis [114, 115]. For example, hepatic effects are certainly fundamentally different in mice versus humans. Rigorous examination of this interesting data will also have to consider the prospect that PPARα can, in certain settings or in certain cells, promote proatherosclerotic and pro-inflammatory responses. Recent work demonstrated that treatment of apoE-deficient mice with the PPARα agonist fenofibrate decreased lesion size, an effect that was even greater when these mice also expressed human aPoAI [128]. In terms of humans, clinical trials with fibrates from PPARα agonists can also be reviewed, asking if the responses seen might have derived from PPARα action, either because of indirect effects on lipid parameters or directly through PPARα activation in vascular or inflammatory cells themselves. PPARα activation accounts at least in part for the effects of fibrates on HDL increases (inducing apoAI transcription) or triglyceride lowering (by inducing lipoprotein lipase, downregulating apoC-III expression, and inducing β-oxidation) [116–118]. Interestingly, some of these trials may suggest the possibility of direct PPARα effects in the vasculature exactly because of minimal lipid changes despite positive results. VA-HIT (Veterans Administration HDL Intervention Trial) is one such case in point [38]. In that trial, veterans with relatively low to average LDL levels (110 mg dL−1 ; obviating an ethical requirement for statin treatment), low HDL (32 mg dL−1 ), and only modestly increased triglycerides (160 mg dL−1 ) were given the putative PPARα ligand gemfibrozil or placebo. At the end of the study, primary cardiovascular endpoints were significantly decreased. Given that HDL levels only rose ∼3 points to a mean of 35 mg dL−1 , and that post-hoc analyses fail to show an impact of triglyceride lowering on event reduction, the possibility of a direct effect of the drug can be entered as a possible hypothesis. Other studies with fibrates, e.g. bezafibrate, have had somewhat less impressive results, although this may derive
6 Conclusion
from bezafibrates lesser activity against PPARα [119]. Positive angiographic results have been seen with fenofibrate in DAIS (Diabetes Atherosclerosis Intervention Study) [120]. The notion that fibrates might bind to and activate PPARα raises the intriguing issue of what substances produced endogenously by the body might be capable of similar responses. We have found that lipolysis of triglyceride-rich lipoproteins generates specific PPAR ligands in a manner determined by the lipase, the lipid substrate acted upon, and the PPAR being targeted. Lipoprotein lipase (LPL), the main enzyme implicated in triglyceride metabolism, can act on VLDL to release specific PPARα ligands as well as distal PPARα effects. Of note, these distal responses included decreases in inflammation, suggesting that intact triglyceride metabolism may limit inflammation and/or atherosclerosis through PPARα [129].
5 PPARδ in Vascular Biology and Atherosclerosis
Less is known about PPARδ pathways and effects, but this is rapidly changing [121–123]. There is considerable interest in the potential role for PPARδ given its essentially ubiquitous expression. Like PPARα, certain fatty acids bind to the activated PPARδ. Studies using PPARδ (and PPARα) ligands have been performed in an interesting rhesus monkey model that spontaneously develops an insulin resistancelike syndrome among some but not all members of the colony [124, 125]. More recent work suggests that PPARδ may mediate pro-inflammatory responses, including the promotion of lipid accumulation in macrophages [126], although other reports suggest effects on cholesterol efflux through PPARδ activation [127]. No doubt ongoing work will provide further insights into the potential role of PPARδ in the vasculature.
6 Conclusion
Dramatic progress has been made over the past decade regarding the role of PPAR in metabolism; even more impressive has been the rapidly evolving evidence for PPAR signaling in vascular biology and atherosclerosis. Both sets of data continue to argue that PPAR are likely relevant for vascular responses and the pathogenesis of atherosclerosis, either indirectly by changing metabolic parameters or directly by activating PPAR. Despite these advances, a multitude of issues remains unresolved. Most fundamentally, potential pro-atherosclerotic effects through PPAR activation have been suggested by some reports. Better understanding of PPAR effects, especially in different cellular settings, should address this. No doubt the field will continue to be driven by the ongoing emergence of new synthetic PPAR ligands, including ones with greater potency, dual PPARα/PPARγ ligands, and PPARδ
13
14 PPARs in Atherosclerosis
activators. A central issue persists regarding the identity of true endogenous PPAR ligands, with a sense that perhaps those naturally occurring mediators have not yet been identified. Understanding the nature and generation of such natural PPAR ligands may provide considerable insight into in vivo physiologic pathways that maintain insulin sensitivity, raise HDL, lower triglycerides, and may limit inflammation. Ultimately, all these issues will return to the central question regarding the effects such pathways have on atherosclerosis in humans, and its treatment.
References 1 R. ROSS, N. Engl. J. Med. 1999, 340, 115–126. 2 G. C. MCMILLAN, Adv. Exp. Med. Biol. 1995, 369, 1–6. 3 W. E. STEHBENS, Cardiovasc. Pathol. 1999, 8, 177–178. 4 R. ROSS, Nature 1993, 362, 801–809. 5 M. B. TAUBMAN, P. L. GIESEN, A. D. SCHECTER, Y. NEMERSON, Thromb. Haemost. 1999, 82, 801–805. 6 P. LIBBY, J. Intern. Med. 2000, 247, 349–358. 7 P. LIBBY, Circulation 2001, 104, 365–372. 8 J. PLUTZKY, Curr. Opin. Cardiol. 2000, 15, 416–421. 9 N. MARX, P. LIBBY, J. PLUTZKY, J. Cardiovasc. Risk 2001, 8, 203–210. 10 C. K. GLASS, J. L. WITZTUM, Cell 2001, 104, 503–516. 11 J. N. TOPPER, M. A. GIMBRONE, Jr. Mol. Med. Today 1999, 5, 40–46. 12 T. A. SPRINGER, Cell 1994, 76, 301–314. 13 W. C. LITTLE, Am. J. Cardiol. 1990, 66, 44G–47G. 14 V. FUSTER, J. BADIMON, J. H. CHESEBRO, J. T. FALLON, Haemostasis 1996, 26 (Suppl 4), 269–284. 15 E. E. CREEMERS, J. P. CLEUTJENS, J. F. SMITS, M. J. DAEMEN, Circ. Res. 2001, 89, 201–210. 16 M. A. GIMBRONE, Jr., N. RESNICK, T. NAGEL, L. M. KHACHIGIAN, T. COLLINS, J. N. TOPPER, Ann. NY Acad. Sci. 1997, 811, 1–10; discussion 10–11. 17 T. F. LUSCHER, M. BARTON, Clin. Cardiol. 1997, 20, II-3–10. 18 N. RUDERMAN, D. CHISHOLM, X. PI SUNYER, S. SCHNEIDERS, Am. J. Med. 1998, 105, 32S–39S.
19 H. LI, U. FORSTERMANN, J. Pathol. 2000, 190, 244–254. 20 A. TEDGUI, Z. MALLAT, Circ. Res. 2001, 88, 877–887. 21 T. N. RAJU, Lancet 2000, 356, 346. 22 J. L. ORFORD, A. P. SELWYN, P. GANZ, J. J. POPMA, C. ROGERS, Am. J. Cardiol. 2000, 86, 6H–11H. 23 E. LEE, A. J. GRODZINSKY, P. LIBBY, S. K. CLINTON, M. W. LARK, R. T. LEES, Arterioscler. Thromb. Vasc. Biol. 1995, 15, 2284–2289. 24 U. SCHONBECK, F. MACH, P. LIBBY, J. Immunol. 1998, 161, 3340–3346. 25 A. D. LUSTER, N. Engl. J. Med. 1998, 338, 436–445. 26 T. J. REAPE, P. H. GROOT, Atherosclerosis 1999, 147, 213–225. 27 P. M. RIDKER, M. J. STAMPFER, N. RIFAI, JAMA 2001, 285, 2481–2485. 28 P. M. RIDKER, N. RIFAI, M. A. PFEFFER, F. SACKS, E. BRAUNWALD, Circulation 1999, 100, 230–235. 29 I. JIALAL, D. STEIN, D. BALIS, S. M. GRUNDY, B. ADAMS-HUET, S. DEVARAJ, Circulation 2001, 103, 1933–1935. 30 K. A. DUGI, D. J. RADER, Semin. Thromb. Hemost. 2000, 26, 513–519. 31 I. J. GOLDBERG, J. Lipid Res. 1996, 37, 693–707. 32 A. JAGLA, J. SCHREZENMEIR, Exp. Clin. Endocrinol. Diabetes 2001, 109, S533–S547. 33 E. J. SCHAEFER, J. J. GENEST, Jr., J. M. ORDOVAS, D. N. SALEM, P. W. WILSON, Atherosclerosis 1994, 108 (Suppl), S41–S54. 34 M. A. AUSTIN, Proc. Nutr. Soc. 1997, 56, 667–670.
References 35 F. H. De MAN, M. C. CABEZAS, H. H. Van BARLINGEN, D. W. ERKELENS, T. W. de BRUIN, Eur. J. Clin. Invest. 1996, 26, 89–108. 36 M. LAAKSO, S. LEHTO, Atherosclerosis 1998, 137 (Suppl), S65–S73. 37 S. A. HILL, M. J. MCQUEEN, Clin. Biochem. 1997, 30, 517–525. 38 H. B. RUBINS, S. J. ROBINS, D. COLLINS, C. L. FYE, J. W. ANDERSON, M. B. ELAM, F. H. FAAS, E. LINARES, E. J. SCHAEFER, G. SCHECTMAN, T. J. WILT, J. WITTES, N. Engl. J. Med. 1999. 341, 410–418. 39 I. P. TORRA, G. CHINETTI, C. DUVAL, J. C. FRUCHART, B. STAELS, Curr. Opin. Lipidol. 2001, 12, 245–254. 40 J. R. SOWERS, M. EPSTEIN, E. D. FROHLICH, Hypertension 2001, 37, 1053–1059. 41 S. M. HAFFNER, N. Engl. J. Med. 2000, 342, 1040–1042. 42 R. W. JAMES, Curr. Opin. Lipidol. 2001, 12, 425–431. 43 JAMA 2001, 285, 2486–2497. 44 R. B. GOLDBERG, Med. Clin. North Am. 2000, 84, 81–93, viii. 45 H. N. GINSBERG, L. S. HUANG, J. Cardiovasc. Risk 2000, 7, 325–331. 46 S. M. HAFFNER, L. MYKKANEN, A. FESTA, J. P. BURKE, M. P. STERN, Circulation 2000, 101, 975–980. 47 R. M. JACOBY, R. W. NESTO, J. Am. Coll. Cardiol. 1992, 20, 736–744. 48 J. JULIENS, J. Diabetes Complications 1997, 11, 123–130. 49 H. GINSBERG, J. PLUTZKY, B. E. SOBEL, J. Cardiovasc. Risk 1999, 6, 337–346. 50 M. LAAKSOS, J. Intern. Med. 2001, 249, 225–235. 51 T. M. WILLSON, P. J. BROWN, D. D. STERNBACH, B. R. HENKE, J. Med. Chem. 2000, 43, 527–550. 52 D. BISHOP-BAILEY, Br. J. Pharmacol. 2000, 129, 823–834. 53 O. ZIOUZENKOVA, S. PERREY, N. MARX, D. BACQUEVILLE, J. J. PLUTZKY, Curr. Atheroscler. Rep. 2002, 4, 59–64. 54 N. MARX, U. SCHONBECK, M. A. LAZAR, P. LIBBY, J. PLUTZKY, Circ. Res. 1998, 83, 1097–1103.
55 M. RICOTE, J. T. HUANG, J. S. WELCH, C. K. GLASS, J. Leukoc. Biol. 1999, 66, 733–739. 56 R. E. LAW, W. MEEHAN, X. XI, K. GRAF, D. WUTHRICH, W. COATS, D. FAXONS, J. Clin. Invest. 1996, 98, 1897–1905. 57 R. E. LAW, S. GOETZE, X. P. XI, S. JACKSON, Y. KAWANO, L. DEMER, M. C. FISHBEIN, W. P. MEEHAN, W. A. HSUEH, Circulation 2000, 101, 1311–1318. 58 T. TAKAGI, A. YAMAMURO, K. TAMITA, K. YAMABE, M. KATAYAMA, S. MORIOKA, T. AKASAKA, K. YOSHIDA, Am. J. Cardiol. 2002, 89, 318–322. 59 T. TAKAGI, T. AKASAKA, A. YAMAMURO, Y. HONDA, T. HOZUMI, S. MORIOKA, K. YOSHIDA, J. Am. Coll. Cardiol. 2000, 36, 1529–1535. 60 J. MINAMIKAWA, S. TANAKA, M. YAMAUCHI, D. INOUE, H. KOSHIYAMA, J. Clin. Endocrinol. Metab. 1998, 83, 1818–1820. 61 N. MARX, G. SUKHOVA, C. MURPHY, P. LIBBY, J. PLUTZKY, Am. J. Pathol. 1998, 153, 17–23. 62 C. JIANG, A. T. TING, B. SEED, Nature 1998, 391, 82–86. 63 M. RICOTE, A. C. LI, T. M. WILLSON, C. J. KELLY, C. K. GLASS, Nature 1998, 391, 79–82. 64 R. B. CLARK, D. BISHOP-BAILEY, T. ESTRADA-HERNANDEZ, T. HLA, L. PUDDINGTON, S. J. PADULA, J. Immunol. 2000, 164, 1364–1371. 65 S. G. HARRIS, R. P. PHIPPS, Eur. J. Immunol. 2001, 31, 1098–1105. 66 B. R. KWAK, S. MYIT, F. MULHAUPT, N. VEILLARD, N. RUFER, E. ROOSNEK, F. MACH, Circ. Res. 2002, 90, 356–362. 67 N. MARX, B. KEHRLE, K. KOHLHAMMER, M. GRUB, W. KOENIG, V. HOMBACH, P. LIBBY, J. PLUTZKYS, Circ. Res. 2002, 90, 703–710. 68 P. TONTONOZ, L. NAGY, J. G. ALVAREZ, V. A. THOMAZY, R. M. EVANS, Cell 1998, 93, 241–252. 69 L. NAGY, P. TONTONOZ, J. G. ALVAREZ, H. CHEN, R. M. EVANS, Cell 1998, 93, 229–40. 70 B. M. SPIEGELMAN, Cell 1998, 93, 153–155. 71 K. J. MOORE, E. D. ROSEN, M. L. FITZGERALD, F. RANDOW, L. P. ANDERSSON, D. ALTSHULER, D. S.
15
16 PPARs in Atherosclerosis
72
73
74 75 76
77
78
79
80
81
82 83
84
85
MILSTONE, R. M. MORTENSEN, B. M. SPIEGELMAN, M. W. FREEMAN, Nature Med. 2001, 7, 41–47. G. CHINETTI, S. LESTAVEL, V. BOCHER, A. T. REMALEY, B. NEVE, I. P. TORRA, E. TEISSIER, A. MINNICH, M. JAYE, N. DUVERGER, H. B. BREWER, J. C. FRUCHART, V. CLAVEY, B. STAEL, Nature Med. 2001, 7, 53–58. A. CHAWLA, Y. BARAK, L. NAGY, D. LIAO, P. TONTONOZ, R. M. EVANS, Nature Med. 2001, 7, 48–52. H. H. HOBBS, D. J. RADERS, J. Clin. Invest. 1999, 104, 1015–1017. J. M. ORDOVAS, Nutr. Rev. 2000, 58, 76–79. H. SATOH, K. TSUKAMOTO, Y. HASHIMOTO, N. HASHIMOTO, M. TOGO, M. HARA, H. MAEKAWA, N. ISOO, S. KIMURA, T. WATANABE, Biochem. Biophys. Res. Commun. 1999, 254, 757–763. N. MARX, T. BOURCIER, G. K. SUKHOVA, P. LIBBY, J. PLUTZKY, Arterioscler. Thromb. Vasc. Biol. 1999, 19, 546–551. N. MARX, F. MACH, A. SAUTY, J. H. LEUNG, M. N. SARAFI, R. M. RANSOHOFF, P. LIBBY, J. PLUTZKY, A. D. LUSTER, J. Immunol. 2000, 164, 6503–6508. K. H. HAN, M. K. CHANG, A. BOULLIER, S. R. GREEN, A. LI, C. K. GLASS, O. QUEHENBERGERS, J. Clin. Invest. 2000, 106, 793–802. U. KINTSCHER, S. GOETZE, S. WAKINO, S. KIM, S. NAGPAL, R. A. CHANDRARATNA, K. GRAF, E. FLECK, W. A. HSUEH, R. E. LAW, Eur. J. Pharmacol. 2000, 401, 259–270. A. C. LI, K. K. BROWN, M. J. SILVESTRE, T. M. WILLSON, W. PALINSKI, C. K. GLASS, J. Clin. Invest. 2000, 106, 523–531. D. BISHOP-BAILEY, T. HLA, J. Biol. Chem. 1999, 274, 17042–17048. A. ROSSI, P. KAPAHI, G. NATOLI, T. TAKAHASHI, Y. CHEN, M. KARIN, M. G. SANTORO, Nature 2000, 403, 103–108. X. XIN, S. YANG, J. KOWALSKI, M. E. GERRITSEN, J. Biol. Chem. 1999, 274, 9116–9121. H. IHARA, T. URANO, A. TAKADA, D. J. LOSKUTOFF, FASEB J. 2001, 15, 1233–1235.
86 N. MARX, G. K. SUKHOVA, T. COLLINS, P. LIBBY, J. PLUTZKY, Circulation 1999, 99, 3125–3131. 87 D. A. EHRMANN, D. J. SCHNEIDER, B. E. SOBEL, M. K. CAVAGHAN, J. IMPERIAL, R. L. ROSENFIELD, K. S. POLONSKY, J. Clin. Endocrinol. Metab. 1997, 82, 2108–2116. 88 A. R. COLLINS, W. P. MEEHAN, U. KINTSCHER, S. JACKSON, S. WAKINO, G. NOH, W. PALINSKI, W. A. HSUEH, R. E. LAWS, Arterioscler. Thromb. Vasc. Biol. 2001, 21, 365–371. 89 Z. CHEN, S. ISHIBASHI, S. PERREY, J. OSUGA, T. GOTODA, T. KITAMINE, Y. TAWMURA, H. OKAZAKI, N. YAHAGI, Y. IIZUKA, F. SHIONOIRI, K. OHASHI, K. HARA, H. SHIMANO, R. NAGAI, N. YAMADAS, Arterioscler. Thromb. Vasc. Biol. 2001, 21, 372–377. 90 T. CLAUDEL, M. D. LEIBOWITZ, C. FIEVET, A. TAILLEUX, B. WAGNER, J. J. REPA, G. TORPIER, J. M. LOBACCARO, J. R. PATERNITI, D. J. MANGELSDORF, R. A. HEYMAN, J. AUWERXS, Proc. Natl Acad. Sci. USA 2001, 98, 2610–2615. 91 A. ALJADA, H. GHANIM, J. FRIEDMAN, R. GARG, P. MOHANTY, P. DANDONAS, J. Clin. Endocrinol. Metab. 2001, 86, 3130–3133. 92 H. GHANIM, R. GARG, A. ALJADA, P. MOHANTY, Y. KUMBKARNI, E. ASSIAN, W. HAMOUDA, P. DANDONAS, J. Clin. Endocrinol. Metab. 2001, 86, 1306–1312. 93 A. A. PARULKAR, M. L. PENDERGRASS, R. GRANDA-AYALA, T. R. LEE, V. A. FONSECA, Ann. Intern. Med. 2001, 134, 61–71. 94 S. M. HAFFNER, A. S. GREENBERG, W. M. WESTON, H. CHEN, K. WILLIAMS, M. I. FREED, Circulation 2002, 106, 679–684. 95 I. PINEDA TORRA, P. GERVOIS, B. STAEL, Curr. Opin. Lipidol. 1999, 10, 151–159. 96 H. E. XU, M. H. LAMBERT, V. G. MONTANA, K. D. PLUNKET, L. B. MOORE, J. L. COLLINS, J. A. OPLINGER, S. A. KLIEWER, R. T. GAMPE, Jr., D. D. MCKEE, J. T. MOORE, T. M. WILLSON, Proc. Natl Acad. Sci. USA 2001, 98, 13919–13924. 97 D. HWANG, Annu. Rev. Nutr. 2000, 20, 431–456. 98 P. T. PRICE, C. M. NELSON, S. D. CLARKE, Curr. Opin. Lipidol. 2000, 11, 3–7.
References 99 J. BREMER, Prog. Lipid Res. 2001, 40, 231–268. 100 P. A. GRIMALDI, Curr. Opin. Clin. Nutr. Metab. Care 2001, 4, 433–437. 101 R. De CATERINA, M. A. CYBULSKY, S. K. CLINTON, M. A. GIMBRONE, Jr., P. LIBBY, Prostaglandins Leukot. Essent. Fatty Acids 1995, 52, 191–195. 102 S. M. JACKSON, F. PARHAMI, X. P. XI, J. A. BERLINER, W. A. HSUEH, R. E. LAW, L. L. DEMER, Arterioscler. Thromb. Vasc. Biol. 1999, 19, 2094–2104. 103 V. PASCERI, H. D. WU, J. T. WILLERSON, E. T. YEH, Circulation 2000, 101, 235–238. 104 S. SETHI, O. ZIOUZENKOVA, H. NI, D. D. WAGNER, J. PLUTZKY, T. N. MAYADAS, Blood 2002, 100, 1340–1346. 105 I. INOUE, S. GOTO, T. MATSUNAGA, T. NAKAJIMA, T. AWATA, S. HOKARI, T. KOMODA, S. KATAYAMA, Metabolism 2001, 50, 3–11. 106 H. LEE, W. SHI, P. TONTONOZ, S. WANG, G. SUBBANAGOUNDER, C. C. HEDRICK, S. HAMA, C. BORROMEO, R. M. EVANS, J. A. BERLINER, L. NAGYS, Circ. Res. 2000, 87, 516–521. 107 P. DELERIVE, C. FURMAN, E. TEISSIER, J. FRUCHART, P. DURIEZ, B. STAELS, FEBS Lett. 2000, 471, 34–38. 108 B. STAELS, W. KOENIG, A. HABIB, R. MERVAL, M. LEBRET, I. PINEDA TORRA, P. DELERIVE, A. FADEL, G. CHINETTI, J.-C. FRUCHART, J. NAJIB, J. MACLOUF, A. TEDGUI, Nature 1998, 393, 790–793. 109 G. CHINETTI, S. GRIGLIO, M. ANTONUCCI, I. P. TORRA, P. DELERIVE, Z. MAJD, J. C. FRUCHART, J. CHAPMAN, J. NAJIB, B. STAELS, J. Biol. Chem. 1998, 273, 25573–25580. 110 P. WANG, P. O. ANDERSON, S. CHEN, K. M. PAULSSON, H. O. SJOGREN, S. LI, Int. Immunopharmacol. 2001, 1, 803–812. 111 P. GOSSET, A. S. CHARBONNIER, P. DELERIVE, J. FONTAINE, B. STAELS, J. PESTEL, A. B. TONNEL, F. TROTTEIN, Eur. J. Immunol. 2001, 31, 2857–2865. 112 K. TORDJMAN, C. BERNAL-MIZRACHI, L. ZEMANY, S. WENG, C. FENG, F. ZHANG, T. C. LEONE, T. COLEMAN, D. P. KELLY, C. F. SEMENKOVICH, J. Clin. Invest. 2001, 107, 1025–1034.
113 V. R. BABAEV, A. S. MAJOR, T. ZHU, I. HIROYUKI, D. E. DOVE, S. FAZIO, M. F. LINTON, C. F. SEMENKOVICH, Circulation 2001, 104, II45. 114 T. M. BOCAN, Curr. Pharm. Des. 1998, 4, 37–52. 115 J. W. KNOWLES, N. MAEDA, Arterioscler. Thromb. Vasc. Biol. 2000, 20, 2336–2345. 116 R. HERTZ, J. BISHARA-SHIEBAN, J. BAR-TANA, J. Biol. Chem. 1995, 270, 13470–13475. 117 B. STAELS, N. VU DAC, V. A. KOSYKH, R. SALADIN, J. C. FRUCHART, J. DALLONGEVILLE, J. AUWERX, J. Clin. Invest. 1995, 95, 705–712. 118 J. C. FRUCHART, B. STAELS, P. DURIEZS, Curr. Atheroscler. Rep. 2001, 3, 83–92. 119 S. BEHAR, E. GRAFF, H. REICHER REISS, V. BOYKO, M. BENDERLY, A. SHOTAN, D. BRUNNER, Eur. Heart J. 1997, 18, 52–59. 120 G. STEINER, Diabetes Care 2000, 23 (Suppl 2), B49–53. 121 C. QI, Y. ZHU, J. K. REDDY, Cell Biochem. Biophys. 2000, 32, 187–204. 122 I. TAKADA, R. T. YU, H. E. XU, M. H. LAMBERT, V. G. MONTANA, S. A. KLIEWER, R. M. EVANS, K. UMESONO, Mol. Endocrinol. 2000, 14, 733–740. 123 T. HATAE, M. WADA, C. YOKOYAMA, M. SHIMONISHI, T. TANABE, J. Biol. Chem. 2001, 276, 46260–46267. 124 D. A. WINEGAR, P. J. BROWN, W. O. WILKISON, M. C. LEWIS, R. J. OTT, W. Q. TONG, H. R. BROWN, J. M. LEHMANN, S. A. KLIEWER, K. D. PLUNKET, J. M. WAY, N. L. BODKIN, B. C. HANSEN, J. Lipid Res. 2001, 42, 1543–1551. 125 J. B. HANSEN, H. ZHANG, T. H. RASMUSSEN, R. K. PETERSEN, E. N. FLINDT, K. KRISTIANSEN, J. Biol. Chem. 2001, 276, 3175–3182. 126 H. VOSPER, L. PATEL, T. L. GRAHAM, G. A. KHOUDOLI, A. HILL, C. H. MACPHEE, I. PINTO, S. A. SMITH, K. E. SUCKLING, C. R. WOLF, C. N. PALMER, J. Biol. Chem. 2001, 276, 44258–44265. 127 W. R. OLIVER, Jr., J. L. SHENK, M. R. SNAITH, C. S. RUSSELL, K. D. PLUNKET, N. L. BODKIN, M. C. LEWIS, D. A. WINEGAR, M. L. SZNAIDMAN, M. H. LAMBERT, H. E.
17
18 PPARs in Atherosclerosis
XU, D. D. STERNBACH, S. A. KLIEWER, B. C. HANSEN, T. M. WILLSON, Proc. Natl Acad. Sci. USA 2001, 98, 5306–5311. 128 H. DUEZ, Y. S. CHAO, M. HERNANDEZ, G. TORPIER, P. POULAIN, S. MUNDT, Z. MALLAT, E. TEISSIER, C. A. BURTON, A. TEDOUI, J. C. FRUCHART, C. FIEVET, S. D.
WRIGHT, B. STAELS, J. Biol. Chem. 2002, 207, 1. 129 O. ZIOUZENKOVA, S. PERREY, L. ASATRYAN, J. HWANG, K. MACNAUL, D. MOLLER, D. RADER, A. SEVANIAN, R. ZECHNER, G. HOEFLER, J. PLUTZKY, PNAS, 2003, 100, 2730–2735
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
1
An Overview of Protein Misfolding Diseases Christopher M. Dobson University of Cambridge, Cambridge, United Kingdom
Orginally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The ability of even the most complex protein molecules to fold to their biologically functional states is perhaps the most fundamental example of biological self-assembly, one of the defining characteristics of living systems [1]. In recent years, considerable progress has been made in understanding both the general principles of protein folding and the specific structural transitions that are involved in the folding of individual proteins [2, 3]. In addition to the detailed mechanistic studies that have largely been carried out in vitro, considerable efforts have been directed at understanding the manner in which folding occurs in vivo. Although the fundamental principles underlying the mechanism of folding are unlikely to differ in any significant manner from those elucidated from in vitro studies, many details of the way in which individual proteins fold undoubtedly depend on the environment in which they are located. In particular, given the complexity and stochastic nature of the folding process, it is inevitable that misfolding events will occur under some circumstances [4]. As misfolded or even incompletely folded chains inevitably expose to the outside world many regions of the polypeptide molecule that are buried in the native state, such species are prone to aberrant interactions with other molecules within the complex and crowded cellular environment [5]. Such interactions can result both in the disruption of normal cellular processes and in self-association or aggregation. In order to cope with the inevitable issues of misfolding and aggregation, therefore, living systems have evolved a range of elaborate strategies, including molecular chaperones, folding catalysts, and quality-control and degradation mechanisms [6, 7]. The fact that proteins can misfold and interact inappropriately with one another is well established. For example, one of the major problems in generating Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
2 An Overview of Protein Misfolding Diseases
re-combinant protein for research purposes or for use in biotechnology is recovering significant quantities of soluble and fully active protein from in vitro refolding procedures carried out in the laboratory [8]. The fact that misfolding, and its consequences such as aggregation, can be an important feature of the behavior of proteins in vivo is clearly evident from observations that the levels of certain proteins known to be able to deal with such problems are increased substantially during cellular stress [9]. Indeed, many molecular chaperones were first recognized in prokaryotic (bacterial) systems that had been subjected to stress generated by exposure to elevated temperatures, as their nomenclature as heat shock proteins (Hsps) indicates. As well as assisting folding and preventing incompletely folded chains from forming misfolded species, it is clear that some molecular chaperones are able to rescue proteins that have misfolded and allow them a second chance to fold correctly. There are also examples of molecular chaperones that are known to be able to solubilize misfolded aggregates under at least some circumstances. Such active intervention requires energy, and, not surprisingly, ATP is required for many of the molecular chaperones to function correctly [5]. Despite the fact that molecular chaperones are often expressed at high levels only in stressed systems, it is clear that they have a critical role to play in all organisms even when present at lower levels under normal physiological conditions. In eukaryotic systems, many of the proteins that are synthesized in a cell are destined for secretion to the extracellular environment. These proteins are translocated into the ER, where folding takes place prior to secretion through the Golgi apparatus. The ER contains a wide range of molecular chaperones and folding catalysts to promote efficient folding; in addition, the proteins involved are subject to stringent quality control [6]. The quality-control mechanism in the ER can involve a complex series of glycosylation and deglycosylation processes and acts to prevent misfolded proteins from being secreted from the cell. Once recognized, unfolded and misfolded proteins are targeted for degradation through the ubiquitin-proteasome pathway [7]. The details of how these regulatory systems operate provide remarkable evidence of the stringent mechanisms that biology has established to ensure that misfolding and its consequences are minimized. The importance of such a process is underlined by the fact that recent experiments suggest that up to half of all polypeptide chains fail to satisfy the quality-control mechanism in the ER, and for some proteins the success rate is even lower [10]. Like the “heat shock response” in the cytoplasm, the “unfolded protein response” in the ER is upregulated during stress and, as we shall see below, is strongly linked to the avoidance of misfolding diseases.
2 Protein Misfolding and Its Consequences for Disease
Folding and unfolding are the ultimate ways of generating and abolishing specific cellular activities, and unfolding is also the precursor to the ready degradation of a protein [11]. Moreover, it is increasingly apparent that some events in the cell,
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
2 Protein Misfolding and Its Consequences for Disease Table 1 Representative protein-folding diseases (adapted from Ref. [14]).
Disease
Protein
Site of folding
Hypercholesterolemia Cystic fibrosis Phenylketonuria Huntington’s disease Marfan syndrome Osteogenesis imperfecta α1-Antitrypsin deficiency Tay-Sachs disease Scurvy Alzheimer’s disease Parkinson’s disease Scrapie/Creutzfeldt-Jakob disease Familial amyloidoses Retinitis pigmentosa Cataract Cancer
Low-density lipoprotein receptor Cystic fibrosis transmembrane regulator Phenylalanine hydroxylase Huntingtin Fibrillin Procollagen α1-Antitrypsin β-Hexosaminidase Collagen Amyloid β-peptide/tau α-Synuclein Prion protein Transthyretin/lysozyme Rhodopsin Crystallins p53
ER ER Cytosol Cytosol ER ER ER ER ER ER Cytosol ER ER ER Cytosol Cytosol
such as translocation across membranes, can require proteins to be in unfolded or partially folded states. Processes as apparently diverse as trafficking, secretion, the immune response, and the regulation of the cell cycle are in fact now recognized to be directly dependent on folding and unfolding [12]. It is not surprising, therefore, that failure to fold correctly, or to remain correctly folded, gives rise to the malfunctioning of living systems and therefore to disease. Indeed, it is becoming increasingly evident that a wide range of human diseases is associated with aberrations in the folding process (Table 1) [13, 14]. Some of these diseases (e.g., cystic fibrosis) can be attributed to the simple fact that if proteins do not fold correctly, they will not able to exercise their proper function; such misfolded species are often degraded rapidly within the cell. Others (e.g., disorders associated with α1-antitrypsin) result at least in part from the failure of proteins to be trafficked to the appropriate organs in the body [15, 16]. In other cases, misfolded proteins escape all the protective mechanisms discussed above and form intractable aggregates within cells or in extracellular space. An increasing number of pathologies–including Alzheimer’s and Parkinson’s diseases, the spongiform encephalopathies, and late-onset diabetes – are known to be directly associated with the deposition of such aggregates in tissue (Table 2) [13, 14, 17–19]. Diseases of this type are among the most debilitating, socially disruptive, and costly in the modern world, and they are becoming increasingly prevalent as our societies age and as new agricultural, dietary, and medical practices are adopted [20]. One of the most characteristic features of many of the aggregation diseases is that they give rise to the deposition of proteins in the form of amyloid fibrils and plaques [17, 18]. Such deposits can form in the brain, in vital organs such as the liver and spleen, or in skeletal tissue, depending on the disease involved.
3
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
4 An Overview of Protein Misfolding Diseases Table 2 Examples of diseases associated with amyloid deposition (adapted from Refs. [14] and [18]).
Clinical syndrome Alzheimer’s disease Spongiform encephalopathies Primary systemic amyloidosis Secondary systemic amyloidosis
Fibril component
Aβ peptide,1-42,1-43 Full-length prion or fragments Intact light chain or fragments 76-residue fragment of amyloid A protein Familial amyloidotic polyneuropathy I Transthyretin variants and fragments Senile systemic amyloidosis Wild-type transthyretin and fragments Hereditary cerebral amyloid angiopathy Fragment of cystatin-C Hemodialysis-related amyloidosis β2-microglobulin Familial amyloidotic polyneuropathy II Fragments of apolipoprotein A-I Finnish hereditary amyloidosis 71-residue fragment of gelsolin Type II diabetes Islet-associated polypeptide Medullary carcinoma of the thyroid Fragments of calcitonin Atrial amyloidosis Atrial natriuretic factor Lysozyme amyloidosis Full-length lysozyme variants Insulin-related amyloidosis Full-length insulin Fibrinogen α chain amyloidosis Fibrinogen α chain variants
Type Organ limited Organ limited Systemic Systemic Systemic Systemic Organ limited Systemic Systemic Systemic Organ limited Organ limited Organ limited Systemic Systemic Systemic
In the case of neurodegenerative diseases, the quantity of such aggregates can be almost undetectable in some cases, while in systemic diseases, kilograms of protein can be found in such deposits. Each amyloid disease primarily involves the aggregation of a specific peptide or protein, although a range of additional components, including other proteins and carbohydrates, are also incorporated into the deposits when they form in vivo. The characteristics of the soluble forms of the 20 or so proteins involved in the well-defined amyloidoses vary greatly, ranging from intact globular proteins to largely unstructured peptide molecules, but the aggregated forms have many common characteristics [21]. Thus, amyloid deposits all show specific optical properties (such as birefringence) on binding certain dye molecules, notably Congo red; these properties have been used in diagnosis for over a century. The fibrillar structures that are characteristic of many of these types of aggregates have very similar morphologies (long, unbranched, and often twisted structures a few nanometers in diameter) and a characteristic “crossbeta” X-ray fiber diffraction pattern [21]. The latter reveals that the organized core structure is composed of β-sheets with the strands running perpendicular to the fibril axis (Figure 1) [22]. Fibrils having the essential characteristics of those found in ex vivo deposits can be reproduced in vitro from the component proteins under appropriate conditions, showing that they can self-assemble without the need for other components. For many years it was generally assumed that the ability to form amyloid fibrils with the characteristics described above was limited to a relatively small number
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
2 Protein Misfolding and Its Consequences for Disease
Fig. 1 Molecular model of an amyloid fibril. This model is derived from cryo-EM analysis of fibrils grown from bovine insulin whose native topology is indicated in (a). The two chains (A, green, and B, blue) are connected by three disulfide bridges (b). A possible topology for insulin in the fibrils
is illustrated in (c), while (d) indicates how strands could be assembled in a complete fibril, a model of which is shown in (e). The fibril consists of four “protofilaments” that twist around one another to form a regular pattern with a diameter of approximately 10 nm (from Ref. [31]).
of proteins, largely those seen in disease states, and that these proteins must possess specific sequence motifs encoding the amyloid core structure. However, recent studies have suggested that the ability of polypeptide chains to form such structures is common and indeed can be considered a generic feature of polypeptide chains [4, 23–25]. The most direct evidence for the latter statement is that fibrils can be formed under appropriate conditions by many different proteins that are not associated with disease, including such well-known proteins as myoglobin [26], as well as by homopolymers such as polythreonine or polylysine [27]. Remarkably, fibrils of similar appearance to those containing large proteins can be formed by peptides with just a handful of residues [28]. One can consider that amyloid fibrils are highly organized structures (effectively one-dimensional crystals) adopted by an unfolded polypeptide chain when it behaves as a typical polymer molecule; similar types of structures can be formed, for example, by synthetic polymers. The essential features of such structures are therefore determined by the physicochemical properties of the polymer chain. As with other highly organized materials (including crystals)
5
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
6 An Overview of Protein Misfolding Diseases
whose structures are based on repetitive long-range interactions, the most stable structures are usually those consisting of a single type of molecular species (e.g., a specific peptide sequence) where such interactions can be optimized [25].
3 The Structure and Mechanism of Amyloid Formation
Studies of amyloid fibrils formed by both disease-associated and other peptides and proteins have enabled many of the features of these structures to be defined [21, 29– 33], although no complete structure has yet been determined in atomic detail. It is clear that the core structure of the fibrils is stabilized primarily by interactions, particularly hydrogen bonds, involving the atoms of the extended polypeptide main chain. As the main chain is common to all polypeptides, this observation explains why fibrils formed from polypeptides of very different amino acid sequences are similar in appearance. The side chains are likely to be incorporated in whatever manner is most favorable for a given sequence within the amyloid structures; they appear to affect the details of the fibrillar assembly but not the fundamental structure of the fibril [34]. In addition, the proportion of a polypeptide chain that is incorporated in the core structure can vary substantially; in some cases only a handful of residues may be involved in such structure, with the remainder of the chain associated in some other manner with the fibrillar assembly. This generic type of structure contrasts strongly with the globular structures of most natural proteins that result from the uniquely favorable packing of a given set of side chains into a particular fold. On the conceptual model outlined here, structures such as amyloid fibrils occur because interactions associated with the highly specific packing of the side chains can sometimes override the conformational preferences of the main chain [24–27]. The strands and helices so familiar in the structures of native proteins are then the most stable structures that the main chain can adopt in the folds that are primarily defined by the side chain interactions. However, if the solution environment (pH, temperature, concentration, etc.) in which the molecules are found is such that these side chain interactions are insufficiently stable, the structures can unfold and may then, at least under some circumstances, reassemble in the form of amyloid fibrils. In order to understand the occurrence and significance of amyloid formation in biological systems, it is essential to establish the mechanism by which such structures are assembled from the soluble precursor species. In globular proteins the polypeptide main chain is largely buried within the folded structure. Conditions that favor formation of any form of aggregate by such species are therefore ones in which the molecules involved are at least partially unfolded, as occurs, for example, at low pH or elevated temperature [23, 35]. Because of the importance of the globular fold in preventing aggregation, the fragmentation of proteins, through proteolysis or other means, is another ready mechanism to stimulate amyloid formation. Indeed, many amyloid disorders, including Alzheimer’s disease, involve aggregation of fragments of larger precursor proteins that are unable to fold in the
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
3 The Structure and Mechanism of Amyloid Formation
absence of the remainder of the protein structure (see Table 2). Experiments in vitro indicate that the formation of fibrils, by appropriately destabilized or fragmented proteins, is then generally characterized by a lag phase followed by a period of relatively rapid growth [36]. Behavior of this type is typical of nucleated processes such as crystallization; as with crystallization, the lag phase in amyloid formation can generally be eliminated by addition of preformed fibrils to fresh solutions, a process known as seeding [36]. Although the specific events taking place during fibril growth are not yet elucidated in any detail, it is becoming possible to simulate the overall kinetic profiles using relatively simple models that incorporate wellestablished principles of nucleated processes [37]. One of the key findings of studies of the formation of amyloid fibrils is that there are many common features in the behavior of the different systems that have been examined [19, 36, 38, 39]. The first phase of the aggregation process generally appears to involve the formation of more or less disordered oligomeric species as a result of relatively nonspecific interactions, although in some cases specific structural transitions, such as domain swapping [40], may be involved if such processes increase the rate of aggregation. The earliest species visible by electron or atomic force microscopy often resemble small bead-like structures, sometimes described as amorphous aggregates or as micelles. These early “pre-fibrillar aggregates” then appear to transform into species with more distinctive morphologies, sometimes described as “protofibrils” or “protofilaments” [39, 41]. These structures are commonly short, thin, sometimes curly, fibrillar species that are thought to be able to assemble into mature fibrils, perhaps by lateral association accompanied by some degree of structural reorganization [42] (Figure 2). The extent to which aggregates dissolve and reassemble into more regular structures involved at the different stages of fibril assembly is not clear, but it could well be important under the slow growth conditions in which the most highly structured fibrils are formed. The earliest aggregates are likely in most situations to be relatively disorganized structures that expose to the outside world a variety of segments of the protein that are normally buried in the globular state. In other cases, however, such species appear to adopt distinctive structures, including the well-defined “doughnut-shaped” species that have been seen for a number of systems (Figure 2) [39, 44]. Although the ability of polypeptides to form amyloid fibrils appears generic, the propensity to do so varies dramatically with amino acid composition and sequence. At the most fundamental level, some types of amino acids are much more soluble than others, such that the concentration required for aggregation to occur will be much greater for some polypeptides than for others. In addition, the aggregation process, like crystallization, needs to be nucleated and the rates at which this process takes place can be highly dependent on many different factors. It is clear that even single changes of amino acids in protein sequences can change the rates at which the unfolded polypeptide chains aggregate by an order of magnitude or more [45]. Moreover, it has recently proved possible to correlate changes in aggregation rates caused by mutations with changes in simple properties that result from substitutions, such as charge, secondary structure propensities, and hydrophobicity [45]. As this correlation has been found to hold for a wide range of
7
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
8 An Overview of Protein Misfolding Diseases
Fig. 2 Schematic representation of the formation of amyloid fibrils and pathological deposits. An unfolded or partially unfolded peptide or protein initially forms small soluble aggregates that then assemble to form a variety of “pre-fibrillar” species, some or all of which appear to be toxic to living cells. These species undergo a series of additional
assembly and reorganizational steps to give ordered amyloid fibrils of the type illustrated in Figure 3. Pathological deposits such as the Lewy bodies associated with Parkinson’s disease [43] frequently contain large assemblies of fibrillar aggregates, along with other components including molecular chaperones (from Ref. [4]).
different sequences (see Figure 3), it strongly endorses the concept of a common mechanism of amyloid formation. In accordance with such ideas, those proteins that are completely or partially unfolded under normal conditions in the cell have sequences that are likely to have very low propensities to aggregate. An interesting and potentially important additional observation is that specific regions of the polypeptide chain appear to be responsible for nucleating the aggregation process [46]. Interestingly, the residues that nucleate the folding of a globular protein seem to be distinct from those that nucleate its aggregation into amyloid fibrils. Such a characteristic, which may reflect the different nature of the partially folded species that initiate the assembly processes, offers the opportunity for evolutionary pressure to select sequences that favor folding over aggregation [46, 47].
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
4 A Generic Description of Amyloid Formation
Fig. 3 Rationalization of the effects of mutations on the aggregation rates of peptides and proteins. The experimental aggregation rates of a variety of short peptides or natively unfolded proteins, including amylin, amyloid $-peptide, tau, and "-synuclein (see Tables 1 and 2 for a summary of the diseases with which these are associated), are shown plotted against rates calculated from an algorithm derived from extensive
mutational studies of the protein acylphosphatase; the analogous plot of the data for acylphosphatase is shown in (b). The correlation shown in (a) argues strongly for a common mechanism for amyloid formation and provides a platform for both predicting the effects of natural mutations and designing polypeptides with altered aggregation properties (from Ref. [45]).
4 A Generic Description of Amyloid Formation
It is clear that proteins can adopt different conformational states under different conditions and that there can be considerable similarities in this regard between different proteins. This situation can be summarized by using a schematic representation of the different states that are in principle accessible to a polypeptide chain (see Figure 4) [4, 25, 48]. In such a situation, the state that a given system populates under specific conditions will depend on the relative thermodynamic stabilities of the different states (in the case of oligomers and aggregates, the concentration will be a critical parameter) and on the kinetics of the various intercon-version processes. In this diagram, amyloid fibrils are included as just one of the types of aggregates that can be formed by proteins, although they have particular significance in that their highly organized, hydrogen-bonded structure gives them unique kinetic stability. This type of representation emphasizes that complex biological systems have become robust by controlling and regulating the various states accessible to a given polypeptide chain at given times and under given conditions, just as they regulate and control the various chemical transformations that take place in the cell [4, 24]. The latter is achieved primarily through enzymes and the former by means of the molecular chaperones and degradation mechanisms mentioned above. And just as the aberrant behavior of enzymes can cause metabolic disease, the aberrant behavior of the chaperone and other machinery regulating polypeptide conformations can contribute to misfolding and aggregation diseases [49].
9
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
10 An Overview of Protein Misfolding Diseases
Fig. 4 Schematic representation of some of the states accessible to a polypeptide chain following its synthesis on a ribosome. The relative populations of the different states depend on the kinetics and thermodynamics of the various equilibria shown in the diagram. In living systems the fate of a given
molecule is closely regulated, using mechanisms such as those illustrated in Figures 1 and 2, rather like metabolic pathways in cells are controlled by enzymes and associated molecules such as cofactors (from Ref. [25]).
The type of diagram shown in Figure 4 serves as a framework for understanding the fundamental molecular events that underlie the regulatory and quality-control mechanisms and the origins of the amyloid diseases that can result when these fail or are overwhelmed. As we have discussed above, partially or completely unfolded polypeptides are highly aggregation-prone and represent the species that trigger amyloid formation. Such species are, however, inherent in the folding process, and for this reason a variety of molecular chaperones is present in abundance in the cellular compartments wherever such processes occur. It is possible, however, that chaperones also exist in environments where ab initio folding does not take place. Indeed, an extracellular chaperone (clusterin) has been identified recently [50]. Nevertheless, it is undoubtedly important in the context of avoiding aggregation
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
5 The Fundamental Origins of Amyloid Disease
that proteins are correctly folded prior to their secretion from the cell, hence the need for a highly effective system of quality control in the ER. In this context it is interesting that the majority of the diseases associated with amyloid formation involve deposits that are extracellular in nature, although it is not yet clear exactly where their formation in fact takes place [19]. As biology is a dynamic process, there is a continuous need to degrade as well as synthesize proteins, and the degradative mechanisms target misfolded as well as redundant proteins. It is during such processes, which require unfolding and proteolysis of polypeptide chains, that aggregation may be particularly likely. Degradation pathways, such as those of the ubiquitin-proteasome system, are therefore highly regulated in order to avoid the occurrence of such events [7, 51]. In order to understand misfolding and aggregation diseases, we need to know not only how such systems function efficiently but also why they fail to do so under some circumstances [51–53]. The effects of many pathogenic mutations can be particularly well understood from the schematic representation given in Figure 4. Many of the mutations associated with the familial deposition diseases increase the population of partially unfolded states either within or outside the cell by decreasing the stability or cooperativity of the native state [54–56]. Cooperativity is in fact a crucial factor in enabling proteins to remain soluble, as it ensures that, even for a protein that is marginally stable, the equilibrium population of unfolded molecules or of unfolded regions of the polypeptide chain is minimal [24]. Other familial diseases are associated with the accumulation of fragments of native proteins, which are often produced by aberrant processing or incomplete degradation; such species are usually unable to fold into aggregation-resistant states. Other pathogenic mutations act by enhancing the propensities of such species to aggregate, for example, by increasing their hydrophobicity or decreasing their charge [45]. In the case of the transmissible encephalopathies, it is likely that ingestion of pre-aggregated states of an identical protein (e.g., by cannibalistic means or by contamination of surgical instruments) dramatically increases the rate of aggregation within the individual concerned and hence underlies the mechanism of transmission [36, 57, 58]. Such seeding phenomena may also at least partly explain why some deposition conditions such as Alzheimer’s disease progress so rapidly once the initial symptoms are evident [36, 59].
5 The Fundamental Origins of Amyloid Disease
Despite our increasing knowledge of the general principles that underlie protein misfolding diseases, the manner in which improperly folded proteins and aggregated proteins can generate pathological behavior is not yet understood in detail. In the case of systemic disease, the sheer mass of protein that can be deposited may physically disrupt the functioning of specific organs [17]. In other cases it may be that the loss of functional protein results in the failure of some crucial cellular process [39]. For neurodegenerative disorders such as Alzheimer’s disease,
11
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
12 An Overview of Protein Misfolding Diseases
however, it appears that the primary symptoms arise from the destruction of cells such as neurons by a “toxic gain of function” that results from the misfolding and aggregation process [19, 39]. It has recently become apparent that the early prefibrillar aggregates of proteins associated with neurological disorders can be highly damaging to cells; by contrast, the mature fibrils are relatively benign [39, 60]. There is evidence, however, that the toxic nature of protein aggregates is not restricted to species formed from the peptides and proteins associated with pathological conditions. Experiments have recently indicated that pre-fibrillar aggregates of several proteins that are not connected with any known diseases have a degree of toxicity comparable to that of similar species formed from the Aβ-peptide [61]. The concept that the effects of such aggregates on cells are more strongly related to their generic nature than to their specific sequence has recently been reinforced through experiments with polyclonal antibodies that cross-react with early aggregates of different peptides and proteins and, moreover, inhibit their toxicity in cellular assays [62]. It is possible that there are specific mechanisms for such toxicity, for example, through the doughnut-shaped aggregates that resemble the toxins produced by bacteria that form pores in membranes and disrupt the ion balance in cells [44]. It is also possible that disorganized early aggregates, or even mis-folded monomers, are toxic through a less specific mechanism, e.g., because exposed, nonnative hydrophobic surfaces may stimulate aberrant interactions with cell membranes or other cellular components [53, 63, 64]. Such findings raise the question as to how cellular systems are able to tolerate the intrinsic tendency of incompletely folded proteins to aggregate. The answer is almost certainly that under normal circumstances the molecular chaperones and other ”housekeeping“ mechanisms are remarkably efficient in ensuring that such potentially toxic species are neutralized [5, 61]. Molecular chaperones of various types are able to shield hydrophobic regions, to unfold some forms of aggregates, or to alter the partitioning between different forms of aggregates. The latter mechanism, for example, could convert the precursors of amyloid fibrils into less intractable species, allowing them to be refolded or disposed of by the cellular degradation systems. Indeed, evidence has been obtained that such a situation occurs with polyglutamine sequences associated with disorders such as Huntington’s disease [65]. In this case the precursors of amyloid fibrils appear to be diverted into amorphous, and hence more readily degradable, species by the action of molecular chaperones. If such protective processes fail, it may be possible for potentially harmful species to be sequestered in relatively harmless forms such as inclusion bodies in bacteria or aggresomes in eukaryotic systems. Indeed, it has been suggested that the formation of mature amyloid fibrils, whose toxicity appears to be much lower than that of their precursors (as we have discussed above), may itself represent a protective mechanism in some cases [19, 36]. Most of the aggregation diseases, however, do not result from genetic mutations or infectious agents but rather are sporadic or associated with old age. The ideas summarized in this chapter offer a qualitative explanation of why such a situation arises. We have seen that all proteins have an inherent tendency to aggregate unless they are maintained in a highly regulated environment. Selective pressure
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
6 Approaches to Therapeutic Intervention in Amyloid Disease
during evolution has therefore resulted in protein molecules that are normally able to resist aggregation during our normal lifespan, enabling us to develop, to pass on genes, and to give appropriate protection to our offspring. Evolution, however, can only generate sequences and cellular environments that are adequate for those conditions under which selective pressure exists [20]. There is no reason to suppose that random mutations will in general further reduce the propensity to aggregate; indeed they will generally increase it – just as random mutations generally reduce the stability of native proteins. We can see, therefore, that our recent ability to prolong life [66] is leading to the proliferation of these diseases as the limitations in their ability to resist reversion to aggregates, including the intractable and damaging amyloid structures, become evident. However, it is intriguing to speculate that favorable mutations in aggregation-prone proteins might be the reason that some in the population do not readily succumb to diseases such as Alzheimer’s even in extreme old age [45]. The link with aging is likely to involve more than just a greater probability of aggregation taking place as we get older. It is likely to be linked more fundamentally to the failure of the “housekeeping” mechanisms in our bodies under such circumstances [20, 67]. This failure may in part be a result of the need for greater protective capacity in old age as aggregation becomes more prevalent, perhaps as a consequence of the increasing accumulation of misfolded and damaged proteins, leading to chaperone overload [68].But as we age, it is likely that the activity of our chaperone response and degradative mechanisms declines, resulting in the increasing probability that the protective mechanisms are overwhelmed. One reason for this decline, although there may be many, is that cells become less efficient in producing the ATP that is needed for the effective functioning of many chaper-ones. Similarly, the rapidity with which we have introduced practices that have not been experienced previously in history – including new agricultural practices (associated with the emergence of BSE and variant CJD [57]), a changing diet and lifestyle (associated with the prevalence of type II diabetes [69]), and new medical procedures (e.g., resulting in iatrogenic diseases such as CJD [57] and amyloid deposition in hemodialysis, during which the concentration of β-2 microglobulin in serum increases [70]) – means that we have not had time to evolve effective protective mechanisms [20]. Thus, we are now increasingly observing the limitations of our proteins and their environments to resist aggregation under conditions that differ from those under which natural selection has taken place.
6 Approaches to Therapeutic Intervention in Amyloid Disease
The progress made in understanding the underlying causes of the amyloid diseases is also leading to new approaches to their prevention and treatment [71, 72]. For proteins whose functional state is a tightly packed globular fold, we have seen that an essential first step in fibril formation is undoubtedly the partial or complete
13
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
14 An Overview of Protein Misfolding Diseases
unfolding of the native structure that otherwise protects the aggregation-prone polypeptide backbone. Thus, many of the familial forms of amyloid disease are associated with genetic mutations that decrease protein stability and thereby promote unfolding. In such cases, one approach to therapy is to find ways of stabilizing the native states of amyloidogenic protein variants. In accordance with this proposal, a recent study has reported a series of small-molecule analogues of the hormone thyroxine, the natural ligand of transthyretin, that act in this manner. Transthyretin is the protein associated with the most common type of systemic amyloidosis, and these potential drugs have been found to dramatically inhibit the rate at which the disease-associated variants aggregate, at least in vitro [73]. In a similar approach, specific antibodies raised against lysozyme have been found to prevent amyloid fibril formation by pathogenic forms of this well-know antibacterial protein whose aggregation is linked to another systemic amyloidosis [74]. Moreover, quinacrine, a drug previously used to combat malaria, has been found to limit dramatically the replication in cell cultures of the pathogenic form of the prion protein associated with CJD [75]. The finding that this molecule interacts with the soluble form of the protein [76] suggests that a similar mechanism could be operating here. Quinacrine is already entering clinical trials, and determined efforts are now underway to find more potent forms of this compound [75]. A possible therapeutic strategy for those amyloid diseases that result from the aggregation of proteolytic fragments rather than intact proteins would be to reduce the levels of the aggregation-prone peptides by inhibiting the enzymes through whose action the fragments are generated. An extremely important example of this approach involves the 40- or 42-residue Aβ-peptide derived from the amyloid precursor protein (APP), as the aggregation of this species is linked to Alzheimer’s disease. Much is now known about the complex secretase enzymes whose role in processing APP gives rise to the Aβ-peptide, and a number of potent inhibitors have already been developed as potential therapeutics for this debilitating neuro-degenerative disease [77, 78]. Another way of reducing the levels of aggregation-prone species in an organism is to enhance their clearance from the body. A number of strategies based on this general idea have emerged recently, some of the most exciting of which involve the action of antibodies raised against specific amyloid-forming polypeptides. Of particular interest is the finding that active immunization with Aβ-peptides can result in the extensive clearance of amyloid deposits in transgenic mice that overexpress the Aβ-peptide. Although the first set of clinical trials with human Alzheimer victims was terminated as a result of an inflammatory response in a significant number of patients, the potential of this type of approach has been clearly demonstrated [79]. One possible variation on the original procedure is to use passive immunotherapy, in which antibodies are infused into, rather than generated within, the patient. Additionally, there are means of stimulating clearance other than by the direct action of antibodies. In a study of the effects of a ligand designed to bind to serum amyloid P (SAP), a protein that is associated with amyloid deposits in vivo and is known to inhibit natural clearance mechanisms, the levels of SAP in serum were reduced dramatically [80]. It remains to be established how such molecules fare in ongoing clinical trials, but the general principle of
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
7 Conclusions
using small molecules to target specific proteins for degradation is an extremely interesting development. Enhancing the clearance of the amyloidogenic species may well be of particular importance in other forms of amyloid disorders that involve the deposition of peptides and proteins that, although intact, appear not to fold to globular structures even under normal physiological conditions. Examples of such cases include αsynuclein, whose aggregation is associated with Parkinson’s disease, and IAPP (amylin), which is involved in type II diabetes [14, 18]. Another strategy that has been proposed is to intervene in the aggregation process directly by means of small peptides or peptide analogues designed to bind tightly to fibrils as they form, hence blocking their further growth. A variety of such species is being explored, particularly in the context of Alzheimer’s disease [81]. A potential problem with this approach is that inhibition of the formation of amyloid fibrils might occur at the expense of the small aggregates that are their precursors. Such an effect could be highly counterproductive, as the latter appear to be the primary pathogenic agents, at least in the neurodegenerative forms of amyloid disease, as we discuss above. But provided that problems of this type can be avoided, this approach could potentially be rather generally applicable. It is a particularly satisfying aspect of research directed at understanding the origins of the amyloid disorders to find that so many different therapeutic approaches can be rationalized using the relatively simple framework illustrated in Figure 4 [72]. Particularly, as there are a number of different stages in the aggregation process where intervention is potentially possible, one can be optimistic that novel and efficacious forms of treatment will emerge in the not-too-distant future. Moreover, with the increasing evidence for the generic nature of the amyloid conditions, it is interesting to speculate that there might be generic approaches to inhibiting a variety, perhaps even all, of the amyloid diseases with a single drug [25, 72]. That such an idea could in principle be realized is suggested by the observation that antibodies raised against small aggregates of the Aβ-peptide not only recognize but also suppress the toxicity of similar aggregates formed from other proteins [62]. An approach of this type would be particularly dramatic in light of the number of different types of amyloid disorders that increasingly place the aging populations of the world at risk. Alternatively, one might even speculate that a means could be found to extend the period of time that our natural defenses are able to combat aggregation, and hence defer the onset of these disorders. And looking perhaps even further into the future, our ability to design peptides and proteins whose tendencies to aggregate are reduced, often substantially, provides the basis for a range of exciting approaches involving gene therapy and stem cell techniques [45].
7 Conclusions
This chapter has discussed the state of our developing knowledge of the origin of amyloid disease from the perspective of our increasing understanding of the
15
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
16 An Overview of Protein Misfolding Diseases
fundamental properties of proteins. We have stressed the fact that the structure and biological effects of amyloid deposition can be considered to be generic and that this fact can be rationalized, at least in outline, through the ideas of polymer science. The propensities of different sequences to aggregate under specific conditions, however, can vary significantly, and it is becoming possible to explain the origins of pathogenic behavior in terms of the factors that affect such propensities. The amyloid diseases can be thought of as fundamentally resulting from the reversion of normally soluble proteins to the generic amyloid structure as a result of the failures of the mechanisms that have evolved to prevent such behavior. A range of novel approaches is now being developed to combat the various amyloid diseases and can to a large extent be rationalized in terms of intervention at different steps in the process of aggregation as represented in Figure 4. Many of these approaches show considerable promise for the treatment of specific diseases, and the generic nature of amyloid formation raises the exciting possibility that there could be generic approaches to preventing some or indeed all of the various members of the family of diseases. We can therefore be optimistic that the better understanding of protein misfolding and aggregation that is developing from recent research of the type discussed in this article will enable us to devise increasingly effective strategies for drug discovery based on rational arguments.
Acknowledgements
The ideas in this article have emerged from extensive discussions with outstanding students, research fellows, and colleagues over many years. I am most grateful to all of them; they are too numerous to mention here, but the names of many appear in the list of references. The content of this chapter is derived in part from a review article published in Nature in 2003 [4] and a perspective published in Science in 2004 [72]. The research of C.M.D. is supported by Programme Grantsfrom the Wellcome Trust and the Leverhulme Trust, as well as by the BBSRC, EPSRC, and MRC.
References 1 M. VENDRUSCOLO, J. ZURDO, C. E. MACPHEE and C. M. DOBSON, “Protein Folding and Misfolding: A Paradigm of Self-Assembly and Regulation in Complex Biological Systems”, Phil. Trans. R. Soc. Lond., A, 361, 1205–1222 (2003). 2 A. R. FERSHT, Structure and Mechanism in Protein Science; A Guide to Enzyme Catalysis and Protein Folding, W.H. Freeman, New York, 1999. 3 R. H. PAIN (Ed.), Protein Folding (2nd ed.) Oxford University Press, Oxford, 2000.
4 C. M. DOBSON, “Protein Folding and Misfolding”, Nature, 426, 884–890 (2003). 5 F. U. HARTL and M. HAYER-HARTL, “Molecular Chaperones in the Cytosol: From Nascent Chain to Folded Protein”, Science, 295, 1852–1858 (2002). 6 R. SITIA and I. BRAAKMAN, “Quality Control in the Endoplasmic Reticulum Protein Factory”, Nature, 426, 891–894 (2003). 7 A. L. GOLDBERG, “Protein Degradation and Protection against Misfolded or Damaged Proteins”, Nature, 426, 895–899 (2003).
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
References 8 R. R. KOPITO, “Aggresomes, Inclusion Bodies and Protein Aggregates”, Trends Cell Biol., 10, 524–530 (2000). 9 H. R. PELHAM, “Speculations on the Functions of the Major Heat Shock and Glucose-Regulated Proteins”, Cell, 46, 959–961 (1968). 10 U. SCHUBERT, L. C. ANTON, J. GIBBS, C. C. ORBYRY, J. W. YEWDELL and J. R. BENNINK, “Rapid Degradation of a Large Fraction of Newly Synthesized Proteins by Proteasomes, Nature, 404, 770–774 (2000). 11 A. MATOUSCHEK, “Protein Unfolding – an Important Process in Vivo?”, Curr. Opin. Struct. Biol., 13, 98–109 (2001). 12 S. E. RADFORD and C. M. DOBSON, “From Computer Simulations to Human Disease: Emerging Themes in Protein Folding”, Cell, 97, 291–298 (1999). 13 P. J. THOMAS, B. H. QU and P. L. PEDERSEN, “Defective Protein Folding as a Basis of Human Disease”, Trends Biochem. Sci., 20, 456–459 (1995). 14 C. M. DOBSON, “The Structural Basis of Protein Folding and its Links with Human Disease”, Phil. Trans. R. Soc. Lond., B 356, 133–145 (2001). 15 K. POWELL and P. L. ZEITLIN, “Therapeutic Approaches to Repair Defects in F508 CFTR Folding and Cellular Targeting”, Adv. Drug Deliv. Rev., 54, 1395–1408 (2002). 16 R. W. CARRELL and D. A. LOMAS, “Mechanisms of Disease. α1-Antitrypsin Deficiency – A Model for Conformational Diseases”, N. Engl. J. Med, 346, 45–53 (2002). 17 M. B. PEPYS, “Amyloidoses” in The Oxford Textbook of Medicine (eds D. J. WEATHERALL, J. G. LEDINGHAM and D. A. WARREL), 3rd Edition Oxford University Press, Oxford, 1512–1524 (1995). 18 D. J. SELKOE, “Folding Proteins in Fatal Ways”, Nature, 426, 900–904 (2003). 19 E. H. KOO, P. T. LANSBURY Jr., and J. W. KELLY, “Amyloid Diseases: Abnormal Protein Aggregation in Neurodegeneration”, Proc. Natl. Acad. Sci. USA, 96, 9989–9990 (1999). 20 C. M. DOBSON, “Getting Out of Shape – Protein Misfolding Diseases”, Nature, 418, 729–730 (2002).
21 M. SUNDE and C. C. F. BLAKE, “The Structure of Amyloid Fibrils by Electron Microscopy and X-ray Diffraction”, Adv. Protein Chem., 50, 123–159 (1997). 22 J. L. JIME´ NEZ, J. I. GUIJARRO, E. ORLOVA, J. ZURDO, C. M. DOBSON, M. SUNDE and H. R. SAIBIL, “Cryo-electron Microscopy Structure of an SH3 Amyloid Fibril and Model of the Molecular Packing”, EMBO J., 18, 815–821 (1999). 23 F. CHITI, P. WEBSTER, N. TADDEI, A. CLARK, M. STEFANI, G. RAMPONI and C. M. DOBSON, “Designing Conditions for in vitro Formation of Amyloid Protofilaments and Fibrils”, Proc. Natl. Acad. Sci. USA, 96, 3590–3594 (1999). 24 C. M. DOBSON, “Protein Misfolding, Evolution and Disease”, Trends Biochem. Sci., 24, 329–332 (1999). 25 C. M. DOBSON, “Protein Folding and Disease: A View from the First Horizon Symposium”, Nature Rev. Drug Disc., 2, 154–160 (2003). 26 M. F¨ANDRICH, M. A. FLETCHER and C. M. DOBSON, “Amyloid Fibrils from Muscle Myoglobin”, Nature, 410, 165–166 (2001). 27 M. F¨ANDRICH and C. M. DOBSON, “The Behavior of Polyamino Acids Reveals an Inverse Side-chain Effect in Amyloid Structure Formation”, EMBO J., 21, 5682–5690 (2002). 28 M. LOPEZ DE LA PAZ, K. GOLDIE, J. ZURDO, E. LACROIS, C. M. DOBSON, A. HOENGER and L. SERRANO, “De Novo Designed Peptide-based Amyloid Fibrils”, Proc. Natl. Acad. Sci. USA, 99, 16052–16057 (2002). 29 L. C. SERPELL, C. C. F. BLAKE and P. E. FRASER, “Molecular Structure of a Fibrillar Alzheimers Aβ Fragment”, Biochemistry, 39, 13269–13275 (2000). 30 H. WILLE, M. D. MICHELITSCH, V. GUENEBAUT, S. SUPATTAPONE, A. SERBAN, F. E. COHEN, D. A. AGARD and S. B. PRUSINER, “Structural Studies of the Scrapie Prion Protein by Electron Crystallography”, Proc. Natl. Acad. Sci, USA, 99, 3563–3568 (1999). 31 J. L. JIME´ NEZ, E. J. NETTLETON, M. BOUCHARD, C. V. ROBINSON, C. M. DOBSON and H. R. SAIBIL, “The Protofilament Structure of Insulin Amyloid Fibrils”, Proc. Natl. Acad. Sci. USA, 99, 9196–9201 (2002).
17
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
18 An Overview of Protein Misfolding Diseases
32 A. T. PETKOVA, Y. ISHII, J. J. BALBACH, O. N. ANTZUTKIN, R. D. LEAPMAN, F. DELAGLIO and R. TYCKO, “A Structural Model for Alzheimer’s β-amyloid Fibrils Based on Experimental Constraints from Solid State NMR”, Proc. Natl. Acad. Sci. USA, 99, 16742–16747 (2002). 33 C. P. JARONIEC, C. E. MACPHEE, V. S. BAJAJ, M. T. MCMAHON, C. M. DOBSON and R. G. GRIFFIN, “High Resolution Molecular Structure of a Peptide in an Amyloid Fibril Determined by Magic Angle Spinning NMR Spectroscopy”, Proc. Natl. Acad. Sci. USA, 101, 711–716 (2004). 34 A. CHAMBERLAIN, C. E. MACPHEE, J. ZURDO, L. A. MOROZOVA-ROCHE, H. A. O. HILL, C. M. DOBSON and J. DAVIS, “Ultrastructural Organisation of Amyloid Fibrils by Atomic Force Microscopy”, Biophys. J., 79, 3282–3293 (2000). 35 J. KELLY, “Alternative Conformation of Amyloidogenic Proteins and their Multi-step Assembly Pathways”, Curr. Opin. Struct. Biol., 8, 101–106 (1998). 36 J. D. HARPER and P. T. LANSBURY Jr., “Models of Amyloid Seeding in Alzheimer’s Disease and Scrapie: Mechanistic Truths and Physiological Consequences of the Time-dependent Solubility of Amyloid Proteins”, Annu. Rev. Biochem., 66, 385–407 (1997). 37 S. B. PADRICK and A. D. MIRANKER, “Islet Amyloid” Phase Partitioning and Secondary Nucleation are Central to the Mechanism of Fibrillogenesis”, Biochemistry, 41, 4694–4703 (2002). 38 E. J. NETTLETON, P. TITO, M. SUNDE, M. BOUCHARD, C. M. DOBSON and C. V. ROBINSON, “Characterization of the Oligomeric States of Insulin in Self-assembly and Amyloid Fibril Formation by Mass Spectrometry”, Biophys. J., 79, 1053–1065 (2000). 39 B. CAUGHEY and P. T. LANSBURY Jr., “Protofibrils, Pores, Fibrils, and Neurodegeneration: Separating the Responsible Protein Aggregates from the Innocent Bystanders”, Annu. Rev. Neurosci., 26, 267–298 (2003). 40 M. P. SCHLUNEGGER, M. J. BENNETT and D. EISENBERG, “Oligomer Formation by 3D Domain Swapping: A Model for Protein
41
42
43
44
45
46
47
48
49
50
Assembly and Misassembly”, Adv. Protein Chem., 50, 61–122 (1997). G. BITAN, M. D. KIRKITADZE, A. LOMAKIN, S. S. VOLLERS, G. B. BENEDEK and D. B. TEPLOW. “Amyloid β-protein assembly: Aβ40 and Aβ42 oligomerize through distinct pathways,” Proc. Natl. Acad. Sci. USA, 100, 330–335 (2003). M. BOUCHARD, J. ZURDO, E. J. NETTLETON, C. M. DOBSON and C. V. ROBINSON, “Formation of Insulin Amyloid Fibrils Followed by FTIR Simultaneously with CD and Electron Microscopy”, Protein Sci., 9, 1960–1967 (2000). M. G. SPILLANTINI, M. L. SCHMIDT, V. M.-Y. LEE, J. Q. TROJANOWSKI, R. JAKES and M. GOEDERT, “α-Synuclein in Lewy Bodies”, Nature, 388, 839–840 (1997). H. A. LASHUEL, D. HARTLEY, B. M. PETRE, T. WALZ, P. T. LANSBURY Jr., “Neurodegenerative Disease: Amyloid Pores from Pathogenic Mutations”, Nature, 418, 291 (2002). F. CHITI, M. STEFANI, N. TADDEI, G. RAMPONI and C. M. DOBSON, “Rationalisation of Mutational Effects on Peptide and Protein Aggregation Rates”, Nature, 424, 805–808 (2003). F. CHITI, N. TADDEI, F. BARONI, C. CAPANNI, M. STEFANI, G. RAMPONI and C. M. DOBSON, “Kinetic Partitioning of Protein Folding and Aggregation”, Nature Struct. Biol., 9, 137–143 (2002). S. W. RASO and J. KING, “Protein Folding and Human Disease”, in Protein Folding, R. H. PAIN (Ed.), (2nd ed) Oxford University Press, Oxford, 2000, pp 406–428. C. M. DOBSON, “How Do We Explore the Energy Landscape for Folding?”, Simplicity and Complexity in Proteins and Nucleic Acids (eds. H. FRAUNFELDER, J. DEISENHOFER and P. G. WOLYNES), Dahlem University Press, Berlin, pp. 15–37 (1999). A. J. L. MACARIO and E. C. DE MACARIO, “Sick Chaperones and Ageing: A Perspective”, Ageing Res. Rev., 1, 295–311 (2002). M. R. WILSON and S. B. EASTERBROOK-SMITH, “Clusterin is a Secreted Mammalian Chaperone”, Trends Biochem Sci., 25, 95–98 (2000).
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
References 51 N. F. BENCE, R. M. SAMPAT and R. R. KOPITO, “Impairment of the Ubiquitin-proteosome System by Protein Aggregation”, Science, 292, 1552–1555 (2001). 52 A. HORWICH, “Protein Aggregation in Disease: A Role for Folding Intermediates Forming Specific Multimeric Interactions”, J. Clin Invest., 110, 1221–1232 (2002). 53 M. STEFANI and C. M. DOBSON, “Protein Aggregation and Aggregate Toxicity: New Insights into Protein Folding, Misfolding Diseases and Biological Evolution”, J. Mol. Med., 81, 678–699 (2003). 54 D. R. BOOTH, M. SUNDE, V. BELLOTTI, C. V. ROBINSON, W. L. HUTCHINSON, P. E. FRASER, P. N. HAWKINS, C. M. DOBSON, S. E. RADFORD, C. C. F. BLAKE and M. B. PEPYS, “Instability, Unfolding and Aggregation of Human Lysozyme Variants Underlying Amyloid Fibrillogenesis”, Nature, 385, 787–793 (1997). 55 M. RAMIREZ-ALVARADO, J. S. MERKEL and L. REGAN, “A Systematic Exploration of the Influence of Protein Stability on Amyloid Fibril Formation in vitro”, Proc. Natl. Acad. Sci. USA, 97, 8979–8984 (2000). 56 M. DUMOULIN, A. M. LAST, A. DESMYTER, K. DECANNIERE, D. CANET, A. SPENCER, D. B. ARCHER, S. MUYLDERMANS, L. WYNS, A. MATAGNE, C. REDFIELD, C. V. ROBINSON and C. M. DOBSON, “A Camelid Antibody Fragment Inhibits Amyloid Fibril Formation by Human Lysozyme”, Nature, 424, 783–788 (2003). 57 S. PRUSINER, “Prion Diseases and the BSE Crisis”, Science, 278, 245–251 (1997). 58 P. CHIEN, A. H. DEPACE, S. R. COLLINS and J. S. WEISSMAN, “Generation of Prion Transmission Barriers by Mutational Control of Amyloid Conformations”, Nature, 424, 948–951 (2003). 59 M. F. PERUTZ and A. H. WINDLE, “Cause of Neuronal Death in Neuro-degenerative Diseases Attributable to Expansion of Glutamine Repeats”, Nature, 412, 143–144 (2001). 60 D. M. WALSH, I. KLYUBIN, J. V. FADEEVA, W. K. CULLEN, R. ANWYL, M. S. WOLFE, M. J. ROWAN and D. J. SELKOE, “Naturally Secreted Oligomers of Amyloid Beta Protein Potently Inhibit Hippocampal
61
62
63
64
65
66
67
68
69
70
Long-term Potentiation in vivo”, Nature, 416, 535–539 (2002). M. BUCCIANTINI, E. GIANNONI, F. CHITI, F. BARONI, L. FORMIGLI, J. ZURDO, N. TADDEI, G. RAMPONI, C. M. DOBSON and M. STEFANI, “Inherent Cytotoxicity of Aggregates Implies a Common Origin for Protein Misfolding Diseases”, Nature, 416, 507–511 (2002). R. KAYED, E. HEAD, J. L. THOMPSON, T. M. MCINTIRE, S. C. MILTON, C. W. COTMAN and C. G. GLABE, “Common Structure of Soluble Amyloid Oligomers Implies Common Mechanisms of Pathogenesis”, Science, 300, 486–489 (2003). M. SVENSSON, H. SABHARWAL, A. HAKANSSON, A. K. MOSSBERG, P. LIPNIUNAS, H. LEFFLER, C. SVANBORG, S. LINSET, “Molecular Characterization of α-Lactalbumin Folding Variants that Induce Apoptosis in Tumor Cells”, J. Biol. Chem., 274, 6388–6396 (1999). P. POLVERINO DE LAURETO, N. TADDEI, E. FRARE, C. CAPANNI, S. CONSTANTINI, J. ZURDO, F. CHITI, C. M. DOBSON and A. FONTANA, “Protein Aggregation and Amyloid Fibril Formation by an SH3 Domain Probed by Limited Proteolysis”, J. Mol. Biol., 334, 129–141 (2003). P. J. MUCHOWSKI, G. SCHAFFAR, A. SITTLER, E. E. WANKER, M. K. HAYER-HARTL and F. U. HARTL, “Hsp70 and Hsp40 Chaperones Can Inhibit Self-assembly of Polyglutamine Proteins into Amyloid-like Fibrils”, Proc. Natl. Acad. Sci. USA, 97, 7841–7846 (2000). J. OEPPEN and J. W. VAUPEL, “Broken Limits to Life Expectancy”, Science, 296, 1029–31 (2002). P. CSERMELY, “Chaperone Overload is a Possible Contributor to ‘Civilization Diseases’”, Trends Gen., 17, 701–704 (2001). A. J. L. MACARIO and E. C. MACARIO, “Sick Chaperones and Ageing: A Perspective”, Ageing Res., Rev. 1, 295–311 (2002). J. W. M. H¨OPPENER, M. G. NIEUWEN-HUIS, T. M. VROOM, B. AHRE´ N and C. J. M. LIPS, “Role of Islet Amyloid in Type 2 Diabetes Mellitus: Consequence or Cause?”, Mol. Cell. Endocrin., 197, 205–212 (2002). F. GEJYO, N. HOMMA, Y. SUZUKI and M. ARAKAWA, “Serum Levels of
19
c33 (JWBK148-Bommarius)
January 17, 2008
18:59
Char Count=
20 An Overview of Protein Misfolding Diseases
71
72
73
74
75
76
β2-microglobulin as a New Form of Amyloid Protein in Patients Undergoing Long-term Hemodialysis”, N. Engl. J. Med., 314, 585–586 (1986). F. E. COHEN and J. W. KELLY, “Therapeutic Approaches to Protein-misfolding Diseases”, Nature, 426, 905–909 (2003). C. M. DOBSON, “In the Footsteps of Alchemists”, Science, 304, 1259–1262 (2004). P. HAMMARSTROM, R. L. WISEMAN, E. T. POWERS and J. W. KELLY, “Prevention of Transthyretin Amyloid Disease by Changing Protein Misfolding Energetics”, Science, 299, 713–716 (2003). M. DUMOULIN, A. M. LAST, A. DESMYTER, K. DECANNIERE, D. CANET, G. LARSSON, A. SPENCER, D. B. ARCHER, J. SASSE, S. MUYLDERMANS, L. WYNS, C. REDFIELD, A. MATAGNE, C. V. ROBINSON and C. M. DOBSON, “A Camelid Antibody Fragment Inhibits the Formation of Amyloid Fibrils by Human Lysozyme”, Nature, 424, 783–788 (2003). B. C. MAY, A. T. FAFARMAN, S. B. HONG, M. ROGERS, L. W. DEADY, S. B. PRUSINER, F. E. COHEN, “Potent Inhibition of Scrapie Prion Replication in Cultured Cells by Bis-acridines”, Proc. Natl. Acad. Sci. USA, 100, 3416–3421 (2003). M. VOGTHERR, S. GRIMME, B. ELSHORST, D. M. JACOBS, K. FIEBIG, C. GRIESINGER and R. ZAHN, “Antimalarial Drug Quinacrine
77
78
79
80
81
Binds to C-terminal Helix of Cellular Prion Protein”, J. Med. Chem., 46, 3563–3564 (2003). M. S. WOLFE, “γ -Secretase as a Target for Alzheimer’s Disease”, Curr. Top. Med. Chem., 2, 371–383 (2002). R. VASSAR, “Beta-secretase (BACE) as a Drug Target for Alzheimer’s Disease”, Adv. Drug Deliv. Rev., 54, 1589–1602 (2002). J. A. R. NICOLL, D. WILKINSON, C. HOLMES, P. STEART, H. MARKHAM and R. O. WELLER, “Neuropathology of Human Alzheimer Disease after Immunization with Amyloid-beta Peptide: A Case Report”, Nature Med., 9, 448–452 (2003). M. B. PEPYS, J. HERBERT, W. L. HUTCHINSON, G. A. TENNENT, H. J. LACHMANN, J. R. GALLIMORE, L. B. LOVAT, T. BARTFAI, A. ALANINE, C. HERTEL, T. HOFFMANN, R. JAKOB-ROETNER, R. D. NORCROSS, J. A. KEMP, K. YAMAMURA, M. SUZUKI, G. W. TAYLOR, S. MURRAY, D. THOMPSON, A. PURVIS, S. KOLSTOE, S. P. WOOD, P. N. HAWKINS, “Targeted Pharmacological Depletion of Serum Amyloid P Component for Treatment of Human Amyloidosis”, Nature, 417, 254–259 (2002). C. SOTO, “Unfolding the Role of Protein Misfolding in Neurodegenerative Diseases”, Nature Rev. Neurosci, 4, 49–60 (2003).
1
Biochemistry and Structural Biology of Mammalian Prion Disease Rudi Glockshuber
ETH H¨onggerberg, Z¨urich, Switzerland
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction 1.1 Prions and the “Protein-Only” Hypothesis
Transmissible spongiform encephalopathies (TSEs) or prion diseases in humans and mammalians have been the subject of intensive research during the last decade, and there are excellent reviews available dealing with all aspects of TSEs [1–8]. This chapter focuses on biophysical and structural studies on the recombinant prion protein (PrP) and their implications for understanding the mechanism of prion propagation in mammals. In contrast to infectious diseases caused by microorganisms or viruses, mammalian transmissible spongiform encephalopathies (TSEs) are caused by an infectious agent, the prion (for prion terminology, see Table 2), which appears to be devoid of informational nucleic acid [9, 10]. According to the “protein-only” hypothesis [11–13], the prion consists mainly, if not entirely, of the insoluble, oligomeric scrapie isoform (PrPSc ) of the monomeric, cellular prion protein (PrPC ) that is produced by the infected host [14]. The protein-only hypothesis further assumes that the subunits in PrPSc oligomers have the same covalent structure as PrPC [15–17], but a different tertiary structure [18], and that PrPSc oligomers propagate by triggering the conversion of PrPC (e.g., from a non-infected cell) into new PrPSc molecules. These basic assumptions are supported by the fact that highly enriched prions isolated from the brain of infected animals essentially consist of PrPSc oligomers [9, 10, 19] and, unlike nucleic acid–containing infectious agents, are resistant to strong UV irradiation [11, 20]. The strongest evidence in favor of the protein-only hypothesis comes from the fact that knockout mice devoid of the gene encoding
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Biochemistry and Structural Biology of Mammalian Prion Disease
the murine prion protein are resistant to prion infections, and transgenic mice overexpressing PrPC are particularly susceptible to prions [21]. This proves that PrP is indeed an indispensable component in prion propagation. In addition, all inherited prion diseases in humans described to date are associated with mutations in the gene encoding human PrP [1, 22], arguing that these mutations favor spontaneous formation of PrPSc in the corresponding individuals. Despite these facts, the protein-only hypothesis on the nature of TSE prions still cannot be considered proven [23]. At first view, it appears that the prion phenomenon is not in accordance with one of the basic principles in protein biochemistry, namely, the dogma that polypeptide chains adopt only one specific conformation, which is the thermodynamically most stable state in aqueous solution [24]. The prion hypothesis now suggests that there are at least two conformational states of the prion protein (PrPC and PrPSc ) and implies that PrPSc is more stable than the functional, cellular form of the prion protein. A simple way to resolve this apparent contradiction is to restrict the central dogma of protein folding to the physiologically relevant association state of proteins in vivo. Under these prerequisites, monomeric PrPC would be more stable than monomeric PrPSc , which agrees well with the fact that monomeric PrPSc has indeed never been observed. Moreover, it is practically impossible to assess the thermodynamic stability of PrPSc because it cannot be reversibly dissociated and reassembled in vitro [25]. The tertiary structure of PrPSc thus appears to be stable only in the context of the quaternary structure of the PrPSc oligomer. Overall, the thermodynamic stability of the PrPSc oligomer is not accessible experimentally, and there is no principle theoretical difficulty in combining the protein-only hypothesis with Anfinsen’s central dogmas of protein folding [24]. Table 1 summarizes the known forms of human and mammalian TSEs. 1.2 Models of PrPSc Propagation
A key feature of any infectious agent is its ability to replicate. In contrast to amyloids associated with amyloid diseases in humans, including Alzheimer’s and Parkinson diseases, PrPSc aggregates are the only known protein aggregates in mammalians that can be transmitted and cause disease by propagation in the newly infected host. Within the framework of the protein-only hypothesis, there are mainly two potential mechanisms that are discussed for replication of PrPSc aggregates, i.e., formation of new PrPSc oligomers from PrPC of the host. These models also account for the de novo formation of prions in inherited or sporadic prion diseases. The template assistance or heterodimer model [26] proposes that the rate-limiting step in the de novo generation of PrPSc is formation of a PrPSc monomer from PrPC . A PrPSc monomer would then form a heterodimer with a benign PrPC molecule and convert this PrPC into PrPSc . This requires direct or indirect specific recognition of PrPC by PrPSc . Newly converted PrPSc molecules would then further convert PrPC molecules into PrPSc (Figure 1A). The nucleation-polymerization model [27] (Figure 1B) assumes that PrPC and PrPSc are always in a dynamic equilibrium, but
1 Introduction 3 Table 1 Transmissible spongiform encephalopathies in humans and mammals.
Disease
Origin/Mechanism Human prion diseases
Sporadic Creutzfeld-Jakob disease (CJD)
Variant CJD
Familial CJD Gerstmann-Stra¨ussler-Scheinker disease
Spontaneous formation of prions from PrPc Most common human prion disease; nevertheless extremely rare (1 case per one million per year) Affected individuals are almost exclusively homozygous with respect to the polymorphism at residue 129 (Met/Met or Val/Val) Average age of onset: >60 years Infection from bovine prions Infected individuals are generally Met/Met homozygous with respect to the polymorphism at residue 129, which is also Met in bovine PrP Germ-line mutations in the PrP gene Germ-line mutations in the PrP gene
Fatal familial insomnia Iatrogenic CJD
Germ-line mutations in the PrP gene Infection from prion-contaminated human growth hormone, dura mater grafts, etc.
Kuru
Infection through ritual cannibalism (Fore people) Prion diseases in mammals
Bovine spongiform encephalopathy Scrapie (sheep and goats) Transmissible mink encephalopathy Chronic wasting disease (elk, deer) Feline spongiform encephalopathy (cats)
BSE-contaminated food Genetic susceptibility to spontaneous formation of scrapie prions. Transmission among sheep? Infection with prions from cattle or sheep Origin and transmission unknown Infection with prion-contaminated beef
PrPC monomers are by far more stable than PrPSc monomers, such that the small fraction of PrPSc monomers is not observed experimentally. Here, the rate-limiting step in spontaneous PrPSc formation is the oligomerization of the small fraction of PrPSc monomers to an oligomer of critical size, which is sufficiently stabilized by quaternary structure contacts such that it can no longer dissociate. This nucleus is capable of growing further, by pulling additional PrPSc monomers from the equilibrium into the PrPSc oligomer. An important difference between the models is that the nucleation-polymerization model does not necessarily require a direct contact and specific recognition between PrPC and PrPSc . Moreover, it much better accounts for the prion strain phenomenon, as will be discussed below (see Section 5). In addition to these models for the generation of PrPSc , another mechanism of amyloid formation has been postulated for the aggregation of human lysozyme variants in hereditary systemic amyloidosis. Here, single amino acid replacements
4 Biochemistry and Structural Biology of Mammalian Prion Disease
Fig. 1 Schematic representation of the template-assistance model (A) and nucleation-polymerization model (B) of PrPSc propagation.
in lysozyme destabilize the native state relative to partially structured intermediates. This leads to an increase of the fraction of partially folded intermediates that act as amyloid precursors and are pulled from the normal folding equilibrium during amyloid propagation [28]. A further mechanism is observed in the formation of amyloid fibrils of transthyretin (TTR), which are the putative cause of senile systemic amyloidosis (SSA) and familial amyloidotic polyneuropathy (FAP). Here, dissociation of the native TTR homotetramer into monomers is the rate-limiting step in amyloid formation, which is followed by partial monomer denaturation and misassembly into fibrils [29].
2 Properties of PrPC and PrPSc
Mammalian PrPC is a cell surface glycoprotein of about 210 amino acids that is strongly expressed in neurons but is also present in most other tissues. PrPC belongs to the most conserved proteins known, and pairwise alignment of PrPs from different species generally reveals more than 90% sequence identity [30, 31]. In addition, there are three types of invariant posttranslational modifications found in mature PrPC : two N-glycosylation sites at Asn181 and Asn197 [34], a single disulfide bond between Cys179 and Cys214 [33], and a C-terminal glycosyl-phosphatidylinositol (GPI) anchor at residue 231 (amino acid numbering according to human PrP) [16, 17, 32]. Another typical feature of mammalian PrP sequences is a segment of five octapeptide repeats in the N-terminal region (Figure 3A). Although
2 Properties of PrPC and PrPSc
the strong conservation of PrP in mammals suggests an important cellular role of the protein, the biological function of PrPC remains unknown to date. This is mainly due to the fact that PrP knockout mice are essentially healthy [35] and show only subtle phenotypes that are not completely preserved in different knockout strains. Proposed functions for PrP include roles in signal transduction [36], copper storage [37], circadian rhythm regulation [38], and maintenance of synaptic function [39] and Purkinje cell survival [40] (also reviewed in Ref. [8]). PrPC is monomeric, is soluble in non-denaturing detergents, and contains a high fraction of α-helical secondary structure [18]. in vivo, PrPC is enriched in cholesterol and sphingolipid-rich membrane rafts, subject to caveolae-mediated endocytosis [41], and has a half-life of about 3–6 h [42]. There is evidence that raft localization of PrPC and co-localization with PrPSc in the same membrane are required for its conversion into PrPSc [43]. Most PrP molecules synthesized in the cell reach the cell surface and are anchored to the cell membrane via their GPI anchor, but two intracellular transmembrane forms of PrP have been identified that span the ER membrane in different orientations, of which the form with the C-terminus towards the ER lumen (Ctm PrP) may be associated with neurodegeneration in prion-infected individuals [44]. Whether aberrant membrane topology of PrP at the ER has general functional implications still has to be determined. Moreover, retrograde transport of misfolded PrP from the ER to the cytosol normally leads to PrP degradation by the proteasome, but if this degradation is compromised, the accumulation of cytosolic PrP has severe toxic effects on neurons and has been proposed to be the general mechanism underlying the death of neurons during prion pathogenesis [45]. The insoluble, oligomeric scrapie form PrPSc has so far been isolated only from the brain of prion-infected individuals or prion-infected cell cultures. Although the subunits in PrPSc have the same covalent structure and posttranslational modifications as PrPC , it differs from PrPC with respect to all its biochemical properties: it is insoluble in non-denaturing detergents and shows an increased β-sheet content and a decreased α-helix content compared to PrPC [18]. Another important property of PrPSc is its partial resistance to proteinase K (PK). While PrPC is readily degraded by PK, the polypeptide chains in PrPSc exhibit a protease-resistant core ranging from approximately residue 90 to the C-terminal residue 231. In contrast, the N-terminal PrPSc segment from residue 23 to approximately residue 90 is degraded (Figure 2) [46, 47]. As will be discussed in Section 5, this observation has important implications, as it shows that PrPSc is an ordered oligomer in which all subunits appear to have a very similar structural environment that protects residues 90–231 in every subunit from access by PK. This is a strong hint that PrPSc does not represent a nonspecific, inclusion body–like protein aggregate but rather a regular array of subunits with identical tertiary structures in which both the fold of the subunits and their quaternary structure contacts confer protease resistance. This is in good agreement with the nucleation-polymerization model of PrPSc propagation [27], which assumes specific subunit-subunit contacts in the nucleation and further growth of the PrPSc oligomer. The importance of the
5
6 Biochemistry and Structural Biology of Mammalian Prion Disease
Fig. 2 Protease resistance of PrPSc . While PrPC is readily degraded by proteinase K into peptide fragments, the subunits of PrPSc contain a protease-resistant core (PrP27–30) from about residue 90 to the C-terminal residue 231 (amino acid numbering according to human PrP). Thus, only
the N-terminal segment 23–90 in PrPSc subunits is accessible to proteinase K degradation. The structure of PrPC , comprised of the flexibly disordered N-terminal tail 23–120, and the structured C-terminal domain PrP(121–231) (cf. Figure 3) are also indicated.
protease-resistant core of PrPSc , termed PrP27–30, is further underlined by the fact that it retains prion infectivity and forms amyloid-like fibers [19]. In addition, transgenic mice exclusively expressing N-terminally truncated forms of PrP that still contain segment 90–231 are susceptible to prions and able to replicate prions [48]. This shows that the octapeptide repeats in the N-terminal segment of PrPC are not required for prion propagation. The identification of PrP27–30 after protei-nase K treatment of brain extracts, protein separation by SDS-PAGE, and immunospecific detection in Western blots represents the basis for both post-mortem and pre-clinical detection of prions, e.g., in BSE-infected cattle, and, as outlined in Section 5, biochemical characterization of prion strains [6, 47]. Interestingly, a high-molecular-weight polysaccharide consisting of 1,4-, 1,6-, and 1,4,6-linked glucose units was found to be associated with infectious preparations of PrP27–30, and it was postulated that this polysaccharide is a scaffold that promotes formation of PrPSc and contributes to its extraordinary physical and chemical stability [49]. Additional structural information on PrPSc has become available from fiber diffraction studies and atomic force microscopy studies indicating that PrPSc contains β-sheets that are arranged perpendicular to the fiber axis [50] and may form parallel β-helices [51].
3 Three-dimensional Structure and Folding of Recombinant PrP
3 Three-dimensional Structure and Folding of Recombinant PrP 3.1 Expression of the Recombinant Prion Protein for Structural and Biophysical Studies
The protein-only hypothesis and all theoretical models of PrPC to PrPSc conversion imply that knowledge of the three-dimensional structures of PrPC and PrPSc is the prerequisite for understanding prion propagation at a molecular level. As PrPSc is insoluble, and because no soluble, low-molecular-weight oligomers with defined stoichiometry have been isolated so far, PrPSc has not been accessible to atomic structure determination by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. Because it is extremely difficult to purify milligram quantities of PrPC from mammalian brain, production of recombinant PrP in Escherichia coli has proved to be necessary to determine the three-dimensional structure of the monomeric protein. Since there is neither N-linked glycosylation nor GPI anchor biogenesis in E. coli, bacterially produced PrP lacks these modifications. Nevertheless, there is no evidence at present that these modifications significantly influence the tertiary structure of the prion protein. To allow formation of the invariant disulfide bond between Cys179 and 214 in the recombinant prion protein, two strategies have been pursued. As disulfide bonds cannot form in the reducing environment of the bacterial cytoplasm, the prion protein and N-terminally truncated fragments were secreted into periplasm with the E. coli OmpA signal sequence [52]. The disulfide bond was formed, but the protein was N-terminally degraded by periplasmic proteases such that only the fragment from residues 121–231 stayed stable in the periplasm. The fragment was termed PrP(121–231) [52]. This was the first evidence that the entire N-terminal segment 23–120 of recombinant PrP does not adopt a defined structure in solution (see below). Because any attempt to produce full-length PrP in the bacterial periplasm has failed so far, the full-length protein was expressed in the cytoplasm where the protein accumulates in the reduced form and nonspecifically aggregates into inclusion bodies. Oxidative refolding from the inclusion body fraction in vitro and further purification yielded large quantities of homogeneous, recombinant PrP for structure analysis and biochemical characterrization [53, 54]. 3.2 Three-dimensional Structures of Recombinant Prion Proteins from Different Species and Their Implications for the Species Barrier of Prion Transmission 3.2.1 Solution Structure of Murine PrP The first three-dimensional structure of a recombinant prion protein was reported in 1996 for the fragment PrP(121–231) of the murine prion protein, which was determined by nuclear magnetic resonance spectroscopy (NMR) [56, 57] (Figure 3B). PrP(121–231) adopts a unique fold that has so far not been found in other proteins. The fold consists of a short, two-stranded, antiparallel β-sheet
7
8 Biochemistry and Structural Biology of Mammalian Prion Disease
Fig. 3 Posttranslational modifications of PrPC and three-dimensional structure of recombinant PrP. (A) Mammalian prion proteins contain an N-terminal signal sequence (residues 1–22) that directs the protein to the endoplasmic reticulum, where glycans (CHO) at Asn181 and Asn197 are attached, the C-terminal peptide 232–254 is replaced by a GPI anchor at the carboxylate of residue 231, and the single disulfide bridge between Cys179 and Cys214 is formed (amino acid numbering according to human PrP). Positions of $-strands (S1, S2) and "-helices (H1–H3) in the NMR structure of the recombinant PrP are indicated, as well the length of the protease-
resistant core of PrPSc and the structured C-terminal domain PrP(121–231). (B) Ribbon diagram of the NMR structure of recombinant murine PrP. The protein is composed of a flexibly disordered N-terminal tail (residues 23–120, represented as random coil C" trace) and a structured domain (residues 121–231), which consists of a two-stranded, antiparallel $-sheet and three "-helices. The single disulfide bond (black lines) connects the second and third "-helix and is completely shielded from the solvent. The solvent-accessible residues 90–120, which become protease-resistant in PrPSc , are indicated as red lines.
3 Three-dimensional Structure and Folding of Recombinant PrP
(residues 128–131 and 161–164) and three α-helices (residues 144–154, 175–194, and 200– 226). The fold is additionally characterized by a tightly packed hydrophobic core of 20 amino acids (residues 134, 137, 139, 141, 158, 161, 175, 176, 179, 180, 184, 198, 203, 205, 206, 209, 210, 213, 214, and 215). The invariant disulfide bond between Cys179 and Cys214 is part of this hydrophobic core and connects the second with the third α-helix. The hydrophobic core is surrounded by a shell of hydrogen-bonded residues that further stabilize the fold [58]. Overall, the structure of PrP(121–231) resembles a flat ellipsoid that is characterized by an uneven distribution of surface charges between the two flat surfaces. The two glycosylation sites at Asn181 and Asn189 are located on the negatively charged side of the molecule, while the opposite surface is positively charged. It has been proposed that the uneven charge distribution may contribute to the orientation of PrPC relative to the cell membrane. One would expect that the protein associates via its positively charged surface with the membrane, such that the two glycosylation sites would be oriented towards the extracellular space [56]. The NMR structure of full-length murine PrP, which was reported in 1997, revealed that residues 23–120 (including the octapeptide repeats), in contrast to the compact domain PrP(121–231), are flexibly disordered in solution (Figure 3B), with all residues in this segment being flexible on a sub-nanosecond timescale [59]. The structure of residues 121–231 in the context of the full-length protein proved to be indistinguishable from that in isolated PrP(121–231). The functional role of the flexible tail is presently unclear. However, a plausible model has been proposed for structure formation within the octapeptide repeat regions upon cooperative binding of Cu2+ ions [60]. As there is increasing evidence that a substantial fraction of cellular proteins are “natively unfolded” in isolation and adopt a defined tertiary structure only in complex with target molecules [61], it appears likely that the flexible tail of PrP is required for the interaction of PrPC with its natural ligands. Overall, the structure of recombinant, non-glycosylated, full-length PrP produced in E. coli is in full agreement with the known properties of PrPC isolated from mammalian cells. As natural PrPC , it predominantly contains α-helical secondary structure and is soluble and monomeric. It is therefore generally accepted that the structure of recombinant PrP corresponds to that of the benign cellular prion protein. As will be discussed in Section 3.2.2, the NMR structure of murine PrP is also very similar to the structures of other mammalian prion proteins. Because the segment 90–231 is protease-resistant in PrPSc , an important conclusion from the structure of full-length PrP(23–231) is that residues 90–120, which are required for prion propagation [36], must adopt a defined three-dimensional structure in PrPSc . This corresponds to the minimal structural change that has to be postulated for a PrP polypeptide when it is converted from PrPC to a subunit of PrPSc (Figure 3B). As one might expect from the flexibly disordered N-terminal tail, attempts to crystallize recombinant prion proteins have failed so far. The only exception is the X-ray structure of human PrP(90–231), which crystallized as a covalently linked homodimer in which α-helices 3 are swapped between the subunits such that αhelix 2 in each subunit is now disulfide-bonded to α-helix 3 from the other subunit (Figure 4B) [62]. This structure may be an experimental artifact that was produced
9
10 Biochemistry and Structural Biology of Mammalian Prion Disease
Fig. 4 Comparison of the NMR structure of recombinant murine PrP(121–231) (pdb code: 1AG2) (A) with the X-ray structure of a dimer of recombinant human PrP(90–231) (B) (code1I4M) and the NMR structure of the murine Doppel protein (C) (code 1I17). (B) In the dimer of human PrP(90–231), the C-terminal "-helices are swapped and disul-
fide is cross-linked with the second "-helix of the other subunit. The molecule may be an artifact resulting from oxidative refolding of the recombinant protein from E. coli inclusion bodies under nonnative conditions. (C) Note that the Doppel protein is stabilized by two intramolecular disulfide bonds (dark blue lines).
during oxidative in vitro refolding of the recombinant protein under conditions that favored intermolecular disulfide bond formation. Disulfide-bonded homodimers have not been observed for PrPC , nor have intermolecular disulfide bonds been reported for PrPSc oligomers [33]. As mentioned above, the structure of the C-terminal domain of the mammalian prion protein represents a new fold that has not been observed in other proteins. In this context, it is interesting to discuss the recently solved NMR structure of the mammalian protein with closest homology to the prion protein, termed Doppel (Dpl) [63] (Figure 4C). Dpl lacks the N-terminal octapeptide repeats found in PrP and is most likely not linked with prion diseases, as it is normally not expressed in the central nervous system. In contrast, it was shown to be required for male fertility in mice by controlling male gametogenesis and sperm-egg interaction [64]. Figure 4C shows the NMR structure of murine Dpl [65], which shares 21% amino acid sequence identity with murine PrP and is very similar to the structure of human Dpl [66]. The Dpl fold is similar to that of PrP(121–231) in that the secondary structures fall in the same positions in the two proteins. Differences are observed mainly in the lengths of the α-helices and the positions of the β-strands (in Dpl, the strands are displaced by one to two residues relative to those in the PrP structure). Moreover, Dpl contains two intramolecular disulfide bonds. The disulfide bond
3 Three-dimensional Structure and Folding of Recombinant PrP
109–143 corresponds to that of PrP(121–231), and the second disul-fide 95–148 connects the segment after β-strand 1 with the C-terminal segment following αhelix 3. The strongest structural difference between the proteins is that the plane of the β-sheet is parallel to the axis of α-helix 2 in Dpl, while it is perpendicular to this axis in the structure of PrP(121–231). Moreover, there is a marked kink in αhelix 2 of Dpl, which is not observed in PrP(121–231). In summary, the structured domains of Dpl and PrP appear to have evolved from the same ancestor by gene duplication but now fulfill different functions. 3.2.2 Comparison of Mammalian Prion Protein Structures and the Species Barrier of Prion Transmission In addition to the structure of murine PrP, the NMR structures of the recombinant prion proteins from hamster [67, 68], human [69], and cattle [70] have been determined. All proteins consist of very similar global architectures, i.e., the flexible N-terminal tail of about 100 residues, followed by the folded C-terminal domain. The folds of C-terminal domains are again very similar but exhibit interesting local differences [57]. For example, α-helix 3 is kinked in the structure of murine PrP but straight in the structures of hamster, bovine, and human PrP. Comparison of the structures of mouse and hamster PrP revealed a better-defined loop segment between β-strand 1 and α-helix 2 and slight local differences between the structures in the segment between α-helix 1 and β-strand 2. In the structure of human PrP, α-helix 3 is also straight and well defined and extends to residue 228. Interestingly, the C-terminal turns of α-helices 2 and 4 appear to be in a local unfolding equilibrium. In addition, short, transient contacts between the flexible tail and the folded C-terminal domain were detected for human PrP, which may slightly contribute to a stabilization of the C-terminal residues of α-helices 2 and 3 against unfolding [57]. The structure of bovine PrP is most closely related to that of human PrP but shows the differences in local backbone conformation from murine and hamster PrP described above (Figure 5). Comparison of the structure of bovine PrP(121–231) with that from humans, mice, and hamsters revealed RMSD values of 0.98 Å, 1.66 Å, and 1.68 Å, respectively [57, 70]. The striking structural similarity between bovine and human PrPC , in conjunction with the fact that BSE prions can be transmitted to humans and cause new variant Creutzfeld-Jakob disease (CJD) (Table 1), suggests that the structural relationship at the level of PrPC inversely correlates with the species barrier of prion transmission. The species-barrier phenomenon has so far been discussed mainly at the level of sequence similarity between PrPs from different species with the view that the transmission barrier decreases with increasing sequence identity [30, 31]. A detailed structural comparison of all available PrP solution structures suggests that the most relevant structural feature with regard to the species-barrier phenomenon, besides the conformational differences described above, are differences in local surface charges, even though the uneven surface charge distribution on both flat surfaces of the C-terminal PrP domain is preserved in all structures [57]. In the context of the nucleation-polymerization mechanism of prion propagation, an inverse correlation between structural similarity of PrPC s from different species
11
12 Biochemistry and Structural Biology of Mammalian Prion Disease
Fig. 5 Comparison of the NMR structure of bovine PrP(121–231) (green, pdb code 1DX0) with that of recombinant PrP(121–231) from mice (yellow, code 1AG2), hamsters (pink, code 1B10), and humans (red, code 1E1J). The C" traces are represented as spline functions, and the
thickness of the cylindrical rods represents the mean global displacement per residue after superposition of the backbone atoms in the set of energy-minimized conformers that represents each NMR structure (picture from Dr. R. Riek, ETH Zurich).
and species barrier could mean that those parts of the PrPC structure that are similar between a pair of species but different in other PrPC structures are preserved in the structure of the PrPSc subunits and possibly are involved in subunit-subunit contacts. This could mean that substantial parts of the structure of the PrPC segment 121-231 are still contained in the tertiary structure of PrPSc subunits. In addition, a host-specific factor, called “protein X,” may be responsible for the species-barrier phenomenon. “Protein X” is believed to be contained in every mammalian species and to be required for prion propagation [71]. 3.3 Biophysical Characterization of the Recombinant Prion Protein 3.3.1 Folding and Stability of Recombinant PrP Studies on the folding and thermodynamic stability of murine PrP(121–231) at neutral pH and 25 ◦ C revealed that the C-terminal PrP domain behaves perfectly according to the two-state model as one would expect for a one-domain domain protein. It shows entirely reversible unfolding and refolding transitions in denaturantinduced equilibrium experiments, with a free energy of folding of −30 kJ mol−1 and a cooperativity expected for a 111-residue protein (m = 4.8 kJ mol−1 /M urea) [52]. In the context of full-length PrP(23–231), the stability of the domain is slightly decreased (–26 kJ mol−1 ), which is due to a somewhat lower cooperativity, while the transition midpoint is the same as that of PrP(121-231) (6.3 M urea). The lower folding cooperativity observed for full-length PrP may be due to formation of residual structure in the unfolded state in which residues 23-120 interact with the parts of segment 121–231 [55]. Reversible unfolding transitions at neutral pH were
3 Three-dimensional Structure and Folding of Recombinant PrP
also confirmed for human PrP(90–231) [72–75]. Hydrogen-exchange experiments on human PrP(90-231) revealed that only a small segment of about 10 residues around the disulfide bond retains residual structure in the unfolded state [74]. Further studies revealed that the stability of recombinant PrP decreases with decreasing pH [87, 96] and shows an unusual salt dependence in that salts destabilize the protein at concentrations below 50 mM increase the stability at high concentrations according to the Hofmeister series [96]. Assuming that folding of recombinant PrP corresponds to that of PrPC , the complete reversibility of PrPC folding nicely explains within the framework of the protein-only hypothesis why treatment of prions with denaturants such as urea or guanidinium chloride (GdmCl) completely abolishes infectivity, even after subsequent removal of the denaturant [76]: high denaturant concentrations dissociate PrPSc into monomers and denature its subunits, so that a solution of dissociated and unfolded PrPSc is identical to a solution of unfolded PrPC . Consequently, unfolded PrPSc subunits will always refold to PrPC . Analysis of the kinetics of folding of murine PrP(121–231) showed that the Cterminal domain belongs to the fastest folding proteins described so far. It folds with a half-time of about 200 µs at neutral pH and 4 ◦ C and unfolds with a halftime of 4.6 min under these conditions [77]. Rapid folding of PrP(121–231) is consistent with the absence of cis-proline peptide bonds in the three-dimensional structure [57, 67–70]. Analysis of the dependence of the rate constants of unfolding and refolding on denaturant concentration showed that the compactness of the transition state of folding is closer to the unfolded state than to the native state. No transient folding intermediates have been detected so far in the kinetics of folding of PrP(121–231). However a kinetic folding intermediate has been identified in the folding of recombinant human PrP(90–231) which nevertheless also folds extremely rapidly [78, 79]. Whether this difference is due to the additional segment 90–120 in human PrP(90–231) or reflects intrinsic differences in the folding of human and murine PrP(121–231) still has to be established. Overall the very rapid folding of PrP seems to exclude that kinetic folding intermediates can serve as a source of PrPSc at neutral pH. However the population of kinetic intermediates is uniformly higher in all recombinant human PrP(90–231) variants that bear mutations linked with inherited human TSEs [79]. PrPSc has been shown to accumulate in lysosomes of prion-infected cells [80–82] where the pH varies between 4 and 6 [83]. This opens the possibility that conversion of PrPC into PrPSc does not occur at the cell surface but rather in the acidic environment of lysosomes. For this reason, folding of recombinant PrP has also been analyzed in detail under acidic conditions, where the properties of the protein indeed change dramatically. At pH 4, the folding mechanism (U↔N) of murine PrP(121–231) changes from a two-state to a three-state equilibrium in which an acid-induced unfolding intermediate is observed by a pronounced plateau phase in the circular dichroism signal at 222 nm [72, 83, 84, 84–86]. At pH 4, the intermediate is maximally populated at 3 M urea and still significantly populated in the absence of urea [85]. In contrast to the α-helical CD spectrum of native PrP(121–231) at pH 4, the intermediate shows far-UV CD spectra of a
13
14 Biochemistry and Structural Biology of Mammalian Prion Disease
β-sheet protein with a single minimum at 215 nm that are reminiscent of PrPSc . The transition from two-state to three-state equilibrium transitions has an apparent pKa of 4.5, which indicates that protonation of acidic side chains is required for stabilization of the intermediate [85]. Very similar results were obtained for the folding of the full-length murine PrP(23–231) and human PrP(90–231). Consequently, formation of the acid-induced unfolding intermediate is an intrinsic property of the domain PrP(121–231). Further investigations demonstrated that formation of the intermediate is dependent on protein concentration and that the intermediate is most likely is a homodimer. In addition, ionic strengths above 50 mM are required for the intermediate to be observed. Equilibrium folding of recombinant PrP at neutral and acidic pH can thus by described by the following scheme [87]: pH 4–8, low ionic strength at pH 4–5: U ↔ N pH 4, ionic strength > 50 µM, [PrP] > 20 mM: 2 U ↔ I2 ↔ 2 N An attractive hypothesis resulting from these data is that the intermediate represents an early dimeric intermediate in the formation of oligomeric PrPSc . Indeed, starting with conditions that favor the population of the intermediate, fibrillar aggregates of recombinant PrP(90–231) could be obtained that showed a somewhat increased resistance to proteinase K [88]. Nevertheless, as in all attempts to generate prions de novo from recombinant PrP performed so far, it could not be demonstrated that these aggregates are infectious and relevant for understanding prion propagation. The same holds true for a β-sheet-rich conformation of human PrP(91–231), which is observed after thermal unfolding and cooling of the protein [86]. 3.3.2 Role of the Disulfide Bond in PrP The disulfide bond between the only cysteine residues of PrP, Cys179, and Cys214 is invariant in all mammalian prion protein sequences [31]. It is formed quantitatively in PrPC , and it has been shown unequivocally that free thiol groups are also absent in PrPSc [33]. Thus, there is primarily no indication that reduced PrP is involved in the mechanism of prion propagation. Nevertheless, a series of interesting recent observations argue in favor of a transient reduction of the disulfide bond during PrPSc formation. For example, it could be that the disulfide bond of PrPC , after caveolin-dependent endocytosis into lysosomes, becomes reduced and oligomerizes into reduced PrPSc in which the subunits again become re-oxidized when PrPSc is released into the extracellular space after death of the infected cell. Alternatively, it could be that reduced PrP is required for formation of PrPSc nuclei but that further incorporation of oxidized PrP into these nuclei eventually leads to a PrPSc oligomer in which the vast majority of subunits are oxidized, such that the small fraction of reduced PrP that was initially present is no longer detectable. The characterization of reduced, recombinant prion proteins has indeed revealed that the prion protein has the property of adopting two entirely different tertiary
3 Three-dimensional Structure and Folding of Recombinant PrP
structures, depending on whether or not the Cys179-Cys214 disulfide is formed. This was first observed for recombinant PrP(91–231) [89, 90]. Reduction of the disulfide bond in human PrP(90–231) at pH 4 and low ionic strength leads to transition of the predominantly α-helical protein to a monomeric polypeptide that shows β-sheet-like CD spectra [90]. Loss of chemical shift dispersion in 1 H–15 N heteronuclear NMR spectra, however, indicates that the tertiary structure content in this conformational state is significantly lower compared to that of the oxidized protein. Increase in ionic strength at acidic pH then leads to formation of amyloidlike fibers of reduced PrP(91–231), which show increased protease resistance [90]. In addition to these in vitro experiments, several studies on the expression of fulllength PrP in the reducing environment of the yeast cytoplasm have revealed that self-replicating amyloid-like fibers of reduced PrP are formed in vivo and show a protease-resistance pattern similar to that of the non-glycosylated PrP from PrPSc [91, 92]. In this context, it is very interesting that an in vitro protocol for transformation of recombinant hamster PrP(90–231) into amyloids could be developed that requires reduction and re-oxidation of the recombinant protein. The subunits in the resulting oligomers were connected by intermolecular disulfide bonds, which can be explained by a sub-domain swapping mechanism according to which each PrP molecule is covalently connected to two other PrPs via its cysteine residues and each oligomer has two non-satisfied ends with a free thiol group [93]. Here, the domain contacts and the structures of the monomers must be different compared to the X-ray structure of homodimeric human PrP(90–231), which lacks free thiols, as each Cys179 is disulfide-bonded with Cys214 of the other subunit of the dimer [62]. In summary, amyloid-like oligomers can be formed with both oxidized and reduced recombinant PrP in vitro. In none of these cases, however, could it be shown that infectious prions were generated. In this context, it should be emphasized that strong evidence has been accumulated during the past few years that nearly any protein, including purely α-helical proteins such as myoglobin that are not related to known amyloid diseases, can be converted into β-sheet-rich amyloid fibrils in vitro [94, 95]. This suggests that protein amyloids are primarily stabilized by β-sheet hydrogen bonds between main chain atoms. Thus, amyloid formation appears to be an intrinsic property of any polypeptide chain, independent of the amino acid sequence [95]. Therefore, it could well be that none of the aggregates of recombinant PrP reported so far are related to infectious PrPSc . 3.3.3 Influence of Point Mutations Linked With Inherited TSEs on the Stability of Recombinant PrP All known inherited prion diseases in humans are associated with mutations in the gene encoding human PrP. There are three different familial human TSEs, all of which are autosomal dominant: the familial Creutzfeld-Jakob diseases (CJD), Gerstmann-Straussler-Scheinker syndrome (GSS), and fatal familial insomnia (FFI) (Table 1) [1]. Figure 6A summarizes the location of amino acid replacements in human PrP linked with the three different disease phenotypes. Interestingly, the
15
16 Biochemistry and Structural Biology of Mammalian Prion Disease
mutation D178N causes either CJD or FFI, depending on the polymorphism at position 129 (Val or a Met, respectively). Figure 6B shows that disease-related mutations are mainly located in the folded C-terminal domain of PrP. There is, however, no structural clustering of amino acid replacements related to a specific disease phenotype. As the affected individuals with a point mutation in one of the two PrP alleles spontaneously develop prions without having had contact with external prions, an obvious mechanism underlying this phenomenon could be a destabilization of PrPC due to the mutation and, consequently, a facilitated conversion to PrPSc . Inspection of the NMR structure of recombinant PrP indeed predicts a destabilization in the case of the replacements D178N (loss of a salt bridge to Arg164), T183A (loss of two hydrogen bonds from the hydroxyl group of Thr183 to the HN of Tyr163 and the carbonyl oxygen of Cys179), F198S (generation of a cavity), and Q2017R (loss of a hydrogen bond to the carbonyl atom of Ala133) [57, 58]. However, there are no clear-cut predictions for the other amino acid replacements, and the GSS mutations P102L, P105L, and A117V are located in the flexible tail and should not affect the stability of PrPC at all. Therefore, all disease-related point mutations were introduced into recombinant PrP(121–231), and the thermodynamic stabilities of the variants were analyzed. All variants showed far-UV CD spectra that were indistinguishable from that of the wild-type protein, demonstrating that none of these mutations a priori induces a PrPSc -like, β-sheet-rich conformation [55]. The thermodynamic stabilities of the PrP(121–231) variants with single disease-related amino acid replacements are summarized in Figure 6C and reveal that some but not all mutations destabilize PrPC . A significant destabilization was indeed observed only for the above replacements that were predicted to be destabilizing from analysis of the threedimensional structure. The most destabilizing mutation proved to be T183A, which decreases the free energy of folding relative to the wild-type protein from −30 to −10 kJ mol−1 [55]. Other mutations such as E220K or V210I were completely neutral with respect to PrPC stability. Analogous results were obtained for variants of recombinant PrP(90–231) [62]. Consequently, destabilization of PrPC cannot be the only mechanism underlying the spontaneous formation of prions in inherited human TSEs, and destabilization may trigger disease only in the case of mutations D178N, T183A, F198S, R208H, and Q217R (Figure 6C). Another result from this analysis is that there is no correlation between the thermodynamic stability of a TSE-related PrPC variant and the disease phenotype. For example, both the most destabilizing mutation T183A and the “neutral” mutation E200K are linked with inherited CJD (Figure 6). It follows that there are most likely multiple and independent mechanisms that can lead to spontaneous generation of prions as a consequence of a mutation in PrP. Such mechanisms may, e.g., be an increased stability of PrPSc (“wild-type” PrPSc has a half-life of about two days in vivo) or faster nucleation kinetics of PrPSc . The latter may be caused by the above-mentioned higher fraction of kinetic folding intermediates that has been reported for variants of human PrP(90–231) bearing amino acid replacements associated with inherited TSEs [78, 79].
3 Three-dimensional Structure and Folding of Recombinant PrP
Fig. 6 Point mutations in human PrP that are associated with inherited prion diseases and their effect on the intrinsic stability of recombinant PrP(121–231). (A) Scheme of the primary structure of mature human PrPC , with the amino acid replacements that are linked with the three inherited human TSEs: CJD (blue), GSS (red), and FFI (pink). The Met/Val polymorphism at residue 129 (gray) determines the phenotype of the disease related to the replacement D178N (FFI or CJD, respectively). (B) Location of residues that are replaced in inherited TSE in the structure of PrP(121–231). The
polymorphism in murine PrP at residue 190 is also shown (same color code as in (A)). (C) Thermodynamic stabilities of recombinant variants of PrP(121–231), with disease-related, single amino acid replacements, relative to PrP(121–231) wild type. Positive Go values indicate that the variant is less stable than wild-type PrP(121231) (same color code as in (A)). Wild-type PrP(121–231) has a free energy of folding of −30 kJ mol−1 . The least stable variant T183A is thus destabilized threefold relative to the wild-type protein.
17
18 Biochemistry and Structural Biology of Mammalian Prion Disease
4 Generation of Infectious Prions in vitro: Principal Difficulties in Proving the Protein-Only Hypothesis
Most scientists would consider the in vitro generation of prions from purified PrPC the final proof of the protein-only hypothesis [8]. In principle, there are several ways to perform such an experiment. One approach is the propagation of prions by inoculation of purified PrPC (or recombinant PrP) with catalytic amounts of PrPSc isolated from the brain of prion-infected animals. Another (and possibly even more convincing) experiment is the de novo generation of prions from purified PrPC or recombinant PrP by applying in vitro conditions that favor both spontaneous formation and propagation of PrPSc . The only way to monitor the success of these experiments is to test newly converted PrPSc molecules for infectivity. This is done by injection of prions into the brain of transgenic mice that overexpress PrPC , and are therefore particularly susceptible to prions, and determination of the incubation time until the onset of the disease. Unfortunately, there is only a very rough, inverse linear correlation between the logarithm of the concentration of infectious units and incubation time [48, 97]. In practice, this means that differences in prion concentration between two samples (e.g., before and after an in vitro conversion experiment) can be detected reliably only when the concentrations differ by two to three orders of magnitude. If we assume that a difference in concentration of infectious units by a factor of 100 is minimally required to distinguish two prion preparations, a conversion experiment in which an infectious PrPSc preparation is used as an inoculum has to be designed such that PrPC is used at 100-fold excess over PrPSc and that the conversion would be 100%. It is, however, obvious that the conditions for an in vitro conversion experiment will not be optimal. Thus, if the conversion efficiency was, e.g., only 1%, PrPC would have to be used at an excess of at least 10000-fold over PrPSc . In summary, the low sensitivity of the only available assay for infectious prions requires very high in vitro conversion efficiencies for detection of newly generated prions. The most promising attempts to convert PrPC to the protease-resistant, oligomeric form (which is also termed PrPres ) are experiments in which 35 S-labeled PrPC isolated from mammalian cells is incubated with PrPSc in vitro [47, 98]. Numerous experiments performed according to the following reaction scheme revealed that 35 S-labeled PrPC can indeed be converted in vitro into a protease-resistant form, PrPres , which has the same biochemical properties as proteinase K-treated PrPSc [47]. 35 S-labeled PrPC from mammalian cells + PrPSc from prion-infected individuals Incubation ↓ Proteinase K treatment ↓ Detection of protease-resistant35 S-PrP(Prpres ) ↓
5 Understanding the Strain Phenomenon in the Context of the Protein-Only Hypothesis 19
Further mechanistic studies demonstrated that this in vitro conversion of PrPC to PrPres is characterized by a rapid and specific binding step of PrPC to PrPSc oligomers, followed by a slow conformational transition to PrPres in the PrPSc -bound state [47]. Strikingly, this assay reproduces experimentally determined species barriers of prion diseases, in that PrPSc preparations from one species convert PrPC from another species with similar efficiency, as the disease can be transmitted between the two species [99]. Unfortunately, the seeded in vitro conversion of PrPC to PrPsen generally does not generate more than stoichiometric amounts of PrPres relative to the PrPSc molecules used as seeds [47], which prevents detection of newly generated infectivity. However, the seeded in vitro conversion has been significantly improved recently by a protocol in which growing PrPSc /PrPres oligomers were subjected to repeated rounds of sonification and re-incubation with PrPC , with the idea of breaking growing oligomers into smaller nuclei by the sonification step and thereby increasing the concentration of “growing ends” [100]. Indeed, this PCRlike approach strongly increased the efficiency of the in vitro conversion process, such that a 50-fold molar excess of converted PrPsen over the initial PrPSc seeds was achieved. Despite this significant improvement, it could still not be shown that the concentration of infectivity increased and new prions were generated. In any case, this new method is predicted to improve the sensitivity of prion assays based on the detection of protease-resistant PrP by almost two orders of magnitude. Regarding the above described principle difficulties in proving generation of prions from PrPC when PrPSc is used as an inoculum, the de novo generation of prions from recombinant PrP, which requires only a qualitative proof of existence of infectivity, may be the more promising approach towards the final proof of the protein-only hypothesis. However, the seeded in vitro conversion method has the potential of being further improved through the finding that mammalian RNA preparations stimulate the in vitro amplification of PrPres [101]. This hints at a role for host-encoded stimulatory RNA molecules in prion pathogenesis, which is, e.g., principally possible after retrograde transport of PrP into the cytosol.
5 Understanding the Strain Phenomenon in the Context of the Protein-Only Hypothesis: Are Prions Crystals?
The most frequently raised argument against the protein-only hypothesis is the occurrence of prion strains. This relates to the fact that different prions can be isolated from a single species that differ in incubation time, clinical signs, vacuolar brain pathology, and PrPSc deposition in the brain [8, 23]. These specific phenotypic properties of a given prion strain are reproducibly preserved during consecutive rounds of experimental infection and re-isolation of the strain from the brain of the infected individual. Strain phenomena are typically observed for viruses, which are subject to high mutation rates of the viral genome. The prion strain phenomenon can, however, be explained in the context of the protein-only hypothesis on the basis of the nucleation-polymerization model of prion propagation [47]. This is because
20 Biochemistry and Structural Biology of Mammalian Prion Disease
Fig. 7 Explanation of different prion strains within the framework of the protein-only hypothesis: (A) Besides disease symptoms and incubation time, prion strains isolated from mammalians and humans can be distinguished on the basis of the biochemical properties of PrPSc subunits after proteinase K treatment and separation by SDS-PAGE and Western blotting. Specifically, prion strains differ in the fractions of non-glycosylated, mono-glycosylated, and doubly glycosylated PrP contained in PrPSc oligomers, as wel as the N-terminus of the protease-resistant core of the individual subunits. The figure schematically shows the Western blot band patterns of four different human prion strains after proteinase K treatment of PrPSc : (1) sporadic
CJD, Met/Met 129 genotype; (2) sporadic CJD, Val/Val 129 genotype; (3) iatrogenic CJD, Val/Val129 genotype; (4) variant CJD, Met/Met129 genotype. (B) The simplest explanation for the occurrence of different prion strains are crystallike PrPSc oligomers that differ in the fractions of the incorporated PrP glycoforms, the arrangement of subunits in the quaternary structure, and/or slightly different tertiary structures of incorporated subunits. In this model, each prion strain corresponds to a specific crystal form. Only one crystal form of PrPSc would then occur in a prion-infected individual, and only this crystal form would be further propagated when the disease is transmitted to another individual.
there is very convincing evidence that prion strains differ not only with respect to the above phenotypic criteria but also in the biochemical properties of their PrPSc deposits [6, 47, 102]. Figure 7A schematically depicts a Western blot analysis of human PrPSc from the brain of different patients after proteinase K digestion and SDS-PAGE separation [102]. It shows that the proteinase K– resistant core of the PrPSc oligomers that are associated with a given strain differs both in the fractions of the incorporated PrP glycoforms (in mammalian cells, there is always a mixture of non-glycosylated, mono-glycosylated, and di-glycosylated PrPC ) and in the exact length of the proteinase K–resistant polypeptide fragments. Like the phenotypic
6 Conclusions 21
properties described above, the characteristic strain-specific banding patterns are always preserved, even after crossing the species barrier, and thus are diagnostic for each prion strain. For example, a characteristic feature of BSE prions from cattle is the high fraction of di-glycosylated PrP contained in the PrPSc oligomers, which is also observed in PrPSc isolated from patients with variant CJD [102] (Figure 7A, Table 1). The simplest model that explains these similarities is that PrPSc is a crystallike oligomer. Similar to the fact that proteins frequently form different crystal forms, even under identical crystallization conditions, PrPSc appears to be capable of forming multiple macromolecular assemblies that differ in subunit composition and arrangement in their quaternary structures and possibly also in the tertiary structures of their subunits (Figure 7B). Each prion strain thus would correspond to a different PrPSc “crystal form,” and this single crystal form then acts as a seed for formation of new PrPSc oligomers that grow further in the same crystal form. This model can also explain why humans who are heterozygous with respect to the polymorphism at amino acid 129 (Met or Val in humans) are essentially protected from sporadic CJD: although about 51% of the European population is Met/Met or Val/Val homozygous and 49% is Met/Val heterozygous, 95% of all sporadic CJD patients are homozygous [103, 104]. If we assume that residue 129 is solventexposed in PrPSc subunits (as it is in PrPC ; Figure 6B) and involved in subunitsubunit contacts in PrPSc , it is easy to imagine that a PrPSc oligomer composed of identical polypeptide chains could more easily assemble spontaneously into a quaternary structure with a regular array of subunits, compared to a 1:1 mixture of different polypeptides that are incorporated statistically into the oligomer. Another remarkable finding in this context is that residue 129 is exclusively Met in cattle PrP, and variant CJD patients are exclusively Met/Met homozygous [105].
6 Conclusions
Although essentially all available experimental data on TSE prions and their propagation are either in agreement with or can be explained within the framework of the protein-only hypothesis, the most convincing proof of the prion hypothesis, i.e., the experimental replication of TSE prions in vitro, has not been reported to date. Some of the principal difficulties in proving the generation of infectious prions in vitro by seeding experiments have already been addressed in Section 4. In addition, if the “crystal model” of PrPSc that explains the strain-specific fractions of the different PrP glycoforms in PrPSc is correct, it follows that the PrP glycans are critically involved in subunit-subunit contacts that determine the quaternary structure of PrPSc . In the worst case, this means that it will not be possible to produce a PrPSc oligomer that is infectious and identical to PrPSc isolated from prion-infected brains from non-glycosylated, recombinant PrP. Moreover, experiments on the de novo formation of prions from recombinant PrP have so far essentially been performed in solution, while the conversion process in vivo most likely occurs at the surface of membranes. The covalent linkage to the GPI anchor and a membrane
22 Biochemistry and Structural Biology of Mammalian Prion Disease
localization of PrP may therefore be additional prerequisites for generation of prions in vitro. A new protocol for the preparation of recombinant PrP covalently attached to liposomes has recently been established [106] that will allow in vitro conversion experiments with membrane-bound, recombinant PrP. In addition, like cellular RNAs [101], the glucose-based polyglycans found to be associated with PrPSc [49] may be another critical cofactor of prion replication. A more general question is which experimental strategies should be pursued to gain mechanistic insights into prion propagation at the molecular level. Because there appears to be an inverse correlation between structural similarity at the level of recombinant PrP and species barrier, the determination of additional structures of recombinant PrPs is certainly a promising approach towards a further understanding of the species barrier. Even more important, however, appears to be the investigation of the oligomeric structure of PrPSc . For example, PrPSc oligomers that are spontaneously formed in inherited human prion diseases can either be homo-oligomers of the mutated prion protein or hetero-oligomers of mutated and non-mutated PrP. Both cases have been reported (see references in [55]), but no comprehensive mechanistic investigation on these phenomena is available. As PrPSc is insoluble and not accessible to structure determination by NMR spectroscopy and X-ray crystallography, the combination of electron microscopy, electron crystallography, and biochemical and genetic studies will be required to obtain further insights into the molecular details of the PrPSc quaternary structure. In contrast to mammalian TSEs, the protein-only hypothesis can be considered proven in the case of the yeast prions Ure2p and Sup35p, which are responsible for the non-genetically transmissible phenotypes [URE3] and [PSI], respectively [107–109]. Specifically, infectious aggregates of Sup35p have been generated from recombinant, bacterial protein in vitro and then successfully transmitted to yeast cells that further propagated the [PSI] phenotype [110]. In addition, the speciesbarrier phenomenon was unequivocally shown to be caused by specific aggregates of Sup35p from different yeasts that replicate independently in the same test tube [111]. Thus, there are no principle reasons that the protein-only hypothesis should not be valid for mammalian TSEs.
References 1 PRUSINER, S. B. (1997). Prion diseases and the BSE crisis. Science 278, 245–251. 2 COHEN, F. E. (1999). Protein misfolding and prion diseases. Journal of Molecular Biology 293, 313–320. 3 WEISSMANN, C. (1999). Molecular Genetics of Transmissible Spongiform Encephalopathies. J. Biol. Chem. 274, 3–6.
4 AGUZZI, A., MONTRASIO, F. & KAESER, P. S. (2001). Prions: health scare and biological challenge. Nat Rev Mol Cell Biol 2, 118–126. 5 WEISSMANN, C., ENARI, M., KLOHN, P. C., ROSSI, D. & FLECHSIG, E. (2002). Transmission of prions. J Infect Dis 186 Suppl 2, S157–S165. 6 WADSWORTH, J. D., HILL, A. F., BECK, J. A. & COLLINGE, J. (2003). Molecular and
References
7
8
9
10
11
12 13
14
15
16
17
clinical classification of human prion disease. Br Med Bull 66, 241–254. OSHEROVICH, L. Z. & WEISSMAN, J. S. (2002). The utility of prions. Dev Cell 2, 143–151. AGUZZI, A. & POLYMENIDOU, M. (2004). Mammalian prion biology: one century of evolving concepts. Cell 116, 313–327. BOLTON, D. C., MCKINLEY, M. P. & PRUSINER, S. B. (1982). Identification of a protein that purifies with the scrapie prion. Science 218, 1309–1311. RIESNER, D., KELLINGS, K., WIESE, U., WULFERT, M., MIRENDA, C. & PRUSINER, S. B. (1993). Prions and nucleic acids: search for “residual” nucleic acids and screening for mutations in the PrP-gene. Dev Biol Stand 80, 173–181. ALPER, T., CRAMP, W. A., HAIG, D. A. & CLARKE, M. C. (1967). Does the agent of scrapie replicate without nucleic acid?. Nature 214, 764–766. GRIFFITH, J. S. (1967). Self-replication and scrapie. Nature 215, 1043–1044. PRUSINER, S. B. (1982). Novel proteinaceous infectious particles cause scrapie. Science 216, 136–144. BASLER, K., OESCH, B., SCOTT, M., WESTAWAY, D., WALCHLI, M., GROTH, D. F., MCKINLEY, M. P., PRUSINER, S. B. & WEISSMANN, C. (1986). Scrapie and cellular PrP isoforms are encoded by the same chromosomal gene. Cell 46, 417–428. HOPE, J., MORTON, L. J., FARQUHAR, C. F., MULTHAUP, G., BEYREUTHER, K. & KIMBERLIN, R. H. (1986). The major polypeptide of scrapie-associated fibrils (SAF) has the same size, charge distribution and N-terminal protein sequence as predicted for the normal brain protein (PrP). Embo J 5, 2591–2597. STAHL, N. & PRUSINER, S. B. (1991). Prions and prion proteins. Faseb J 5, 2799–2807. STAHL, N., BALDWIN, M. A., TEPLOW, D. B., HOOD, L., GIBSON, B. W., BURLINGAME, A. L. & PRUSINER, S. B. (1993). Structural studies of the scrapie prion protein using mass spectrometry and amino acid sequencing. Biochemistry 32, 1991–2002.
18 PAN, K. M., BALDWIN, M., NGUYEN, J., GASSET, M., SERBAN, A., GROTH, D., MEHLHORN, I., HUANG, Z., FLETTERICK, R. J., COHEN, F. E. & et al. (1993). Conversion of alpha-helices into beta-sheets features in the formation of the scrapie prion proteins. Proc Natl Acad Sci USA 90, 10962–10966. 19 PRUSINER, S. B., MCKINLEY, M. P., BOWMAN, K. A., BOLTON, D. C., BENDHEIM, P. E., GROTH, D. F. & GLENNER, G. G. (1983). Scrapie prions aggregate to form amyloid-like birefringent rods. Cell 35, 349–358. 20 ALPER, T. (1985). Scrapie agent unlike viruses in size and susceptibility to inactivation by ionizing or ultraviolet radiation. Nature 317, 750. 21 BUELER, H., AGUZZI, A., SAILER, A., GREINER, R. A., AUTENRIED, P., AGUET, M. & WEISSMANN, C. (1993). Mice devoid of PrP are resistant to scrapie. Cell 73, 1339–1347. 22 HSIAO, K., BAKER, H. F., CROW, T. J., POULTER, M., OWEN, F., TERWILLIGER, J. D., WESTAWAY, D., OTT, J. & PRUSINER, S. B. (1989). Linkage of a prion protein missense variant to Gerstmann-Straussler syndrome. Nature 338, 342–345. 23 CHESEBRO, B. (1998). BSE and prions: uncertainties about the agent. Science 279, 42–43. 24 ANFINSEN, C. B. (1973). Principles that govern the folding of protein chains. Science 181, 223–230. 25 PRUSINER, S. B., GROTH, D., SERBAN, A., STAHL, N. & GABIZON, R. (1993). Attempts to restore scrapie prion infectivity after exposure to protein denaturants. Proc Natl Acad Sci USA 90, 2793–2797. 26 PRUSINER, S. B. (1991). Molecular biology of prion diseases. Science 252, 1515–1522. 27 JARRETT, J. T. & LANSBURY, P. T., Jr. (1993). Seeding “one-dimensional crystallization” of amyloid: a pathogenic mechanism in Alzheimer’s disease and scrapie?. Cell 73, 1055–1058. 28 BOOTH, D. R., SUNDE, M., BELLOTTI, V., ROBINSON, C. V., HUTCHINSON, W. L., FRASER, P. E., HAWKINS, P. N., DOBSON,
23
24 Biochemistry and Structural Biology of Mammalian Prion Disease
29
30
31
32
33
34
35
36
C. M., RADFORD, S. E., BLAKE, C. C. & PEPYS, M. B. (1997). Instability, unfolding and aggregation of human lysozyme variants underlying amyloid fibrillogenesis. Nature 385, 787–793. HAMMARSTROM, P., WISEMAN, R. L., POWERS, E. T. & KELLY, J. W. (2003). Prevention of transthyretin amyloid disease by changing protein misfolding energetics. Science 299, 713–716. SCHATZL, H. M., DA COSTA, M., TAYLOR, L., COHEN, F. E. & PRUSINER, S. B. (1995). Prion protein gene variation among primates. J Mol Biol 245, 362–374. WOPFNER, F., WEIDENHOFER, G., SCHNEIDER, R., VON BRUNN, A., GILCH, S., SCHWARZ, T. F., WERNER, T. & SCHATZL, H. M. (1999). Analysis of 27 mammalian and 9 avian PrPs reveals high conservation of flexible regions of the prion protein. J Mol Biol 289, 1163–1178. HOPE, J., MORTON, L. J., FARQUHAR, C. F., MULTHAUP, G., BEYREUTHER, K. & KIMBERLIN, R. H. (1986). The major polypeptide of scrapie-associated fibrils (SAF) has the same size, charge distribution and N-terminal protein sequence as predicted for the normal brain protein (PrP). Embo J 5, 2591–2597. TURK, E., TEPLOW, D. B., HOOD, L. E. & PRUSINER, S. B. (1988). Purification and properties of the cellular and scrapie hamster prion proteins. Eur J Biochem 176, 21–30. CAUGHEY, B., RACE, R. E., ERNST, D., BUCHMEIER, M. J. & CHESEBRO, B. (1989). Prion protein biosynthesis in scrapie-infected and uninfected neuro-blastoma cells. J Virol 63, 175–181. BUELER, H., FISCHER, M., LANG, Y., BLUETHMANN, H., LIPP, H. P., DEARMOND, S. J., PRUSINER, S. B., AGUET, M. & WEISSMANN, C. (1992). Normal development and behaviour of mice lacking the neuronal cell-surface PrP protein. Nature 356, 577–582. SHMERLING, D., HEGYI, I., FISCHER, M., BLATTLER, T., BRANDNER, S., GOTZ, J., RULICKE, T., FLECHSIG, E., COZZIO, A., VON MERING, C., HANGARTNER, C., AGUZZI, A. & WEISSMANN, C. (1998). Expression of amino-terminally
37
38
39
40
41
42
43
truncated PrP in the mouse leading to ataxia and specific cerebellar lesions. Cell 93, 203–214. BROWN, D. R., QIN, K., HERMS, J. W., MADLUNG, A., MANSON, J., STROME, R., FRASER, P. E., KRUCK, T., VON BOHLEN, A., SCHULZ-SCHAEFFER, W., GIESE, A., WESTAWAY, D. & KRETZSCHMAR, H. (1997). The cellular prion protein binds copper in vivo. Nature 390, 684–687. TOBLER, I., GAUS, S. E., DEBOER, T., ACHERMANN, P., FISCHER, M., RULICKE, T., MOSER, M., OESCH, B., MCBRIDE, P. A. & MANSON, J. C. (1996). Altered circadian activity rhythms and sleep in mice devoid of prion protein. Nature 380, 639–642. COLLINGE, J., WHITTINGTON, M. A., SIDLE, K. C., SMITH, C. J., PALMER, M. S., CLARKE, A. R. & JEFFERYS, J. G. (1994). Prion protein is necessary for normal synaptic function. Nature 370, 295–297. SAKAGUCHI, S., KATAMINE, S., NISHIDA, N., MORIUCHI, R., SHIGEMATSU, K., SUGIMOTO, T., NAKATANI, A., KATAOKA, Y., HOUTANI, T., SHIRABE, S., OKADA, H., HASEGAWA, S., MIYAMOTO, T. & NODA, T. (1996). Loss of cerebellar Purkinje cells in aged mice homozygous for a disrupted PrP gene. Nature 380, 528–531. BARON, G. S., WEHRLY, K., DORWARD, D. W., CHESEBRO, B. & CAUGHEY, B. (2002). Conversion of raft associated prion protein to the protease-resistant state requires insertion of PrP-res (PrP(Sc)) into contiguous membranes. Embo J 21, 1031–1040. PETERS, P. J., MIRONOV, A., Jr., PERETZ, D., van DONSELAAR, E., LECLERC, E., ERPEL, S., DEARMOND, S. J., BURTON, D. R., WILLIAMSON, R. A., VEY, M. & PRUSINER, S. B. (2003). Trafficking of prion proteins through a caveolae-mediated endosomal pathway. J Cell Biol 162, 703–717. CAUGHEY, B. & RAYMOND, G. J. (1991). The scrapie-associated form of PrP is made from a cell surface precursor that is both protease- and phospholipase-sensitive. J Biol Chem 266, 18217–18223.
References 44 HEGDE, R. S., MASTRIANNI, J. A., SCOTT, M. R., DEFEA, K. A., TREMBLAY, P., TORCHIA, M., DEARMOND, S. J., PRUSINER, S. B. & LINGAPPA, V. R. (1998). A transmembrane form of the prion protein in neurodegenerative disease. Science 279, 827–834. 45 MA, J., WOLLMANN, R. & LINDQUIST, S. (2002). Neurotoxicity and neurodegeneration when PrP accumulates in the cytosol. Science 298, 1781–1785. 46 MCKINLEY, M. P., BOLTON, D. C. & PRUSINER, S. B. (1983). A protease-resistant protein is a structural component of the scrapie prion. Cell 35, 57–62. 47 CAUGHEY, B., RAYMOND, G. J., CALLAHAN, M. A., WONG, C., BARON, G. S. & XIONG, L. W. (2001). Interactions and conversions of prion protein isoforms. Adv Protein Chem 57, 139–169. 48 FISCHER, M., RULICKE, T., RAEBER, A., SAILER, A., MOSER, M., OESCH, B., BRANDNER, S., AGUZZI, A. & WEISSMANN, C. (1996). Prion protein (PrP) with amino-proximal deletions restoring susceptibility of PrP knockout mice to scrapie. Embo J 15, 1255–1264. 49 APPEL, T. R., DUMPITAK, C., MATTHIESEN, U. & RIESNER, D. (1999). Prion rods contain an inert polysaccharide scaffold. Biol Chem 380, 1295–1306. 50 NGUYEN, J. T., INOUYE, H., BALDWIN, M. A., FLETTERICK, R. J., COHEN, F. E., PRUSINER, S. B. & KIRSCHNER, D. A. (1995). X-ray diffraction of scrapie prion rods and PrP peptides. J Mol Biol 252, 412–422. 51 WILLE, H., MICHELITSCH, M. D., GUENEBAUT, V., SUPATTAPONE, S., SERBAN, A., COHEN, F. E., AGARD, D. A. & PRUSINER, S. B. (2002). Structural studies of the scrapie prion protein by electron crystallography. Proc Natl Acad Sci USA 99, 3563–3568. 52 HORNEMANN, S. & GLOCKSHUBER, R. (1996). Autonomous and reversible folding of a soluble amino-terminally truncated segment of the mouse prion protein. J Mol Biol 261, 614–619. 53 HORNEMANN, S., KORTH, C., OESCH, B., RIEK, R., WIDER, G., WUTHRICH, K. &
54
55
56
57
58
59
60
61
62
GLOCKSHUBER, R. (1997). Recombinant full-length murine prion protein, mPrP(23–231): purification and spectroscopic characterization. FEBS Lett 413, 277–281. ZAHN, R., VON SCHROETTER, C. & WUTHRICH, K. (1997). Human prion proteins expressed in Escherichia coli and purified by high-affinity column refolding. FEBS Lett 417, 400–404. LIEMANN, S. & GLOCKSHUBER, R. (1999). Influence of amino acid substitutions related to inherited human prion diseases on the thermodynamic stability of the cellular prion protein. Biochemistry 38, 3258–3267. RIEK, R., HORNEMANN, S., WIDER, G., BILLETER, M., GLOCKSHUBER, R. & WUTHRICH, K. (1996). NMR structure of the mouse prion protein domain PrP(121–321). Nature 382, 180–182. WUTHRICH, K. & RIEK, R. (2001). Three-dimensional structures of prion proteins. Adv Protein Chem 57, 55–82. RIEK, R., WIDER, G., BILLETER, M., HORNEMANN, S., GLOCKSHUBER, R. & WUTHRICH, K. (1998). Prion protein NMR structure and familial human spongiform encephalopathies. Proc Natl Acad Sci USA 95, 11667–11672. RIEK, R., HORNEMANN, S., WIDER, G., GLOCKSHUBER, R. & WUTHRICH, K. (1997). NMR characterization of the full-length recombinant murine prion protein, mPrP(23–231). FEBS Lett 413, 282–288. VILES, J. H., COHEN, F. E., PRUSINER, S. B., GOODIN, D. B., WRIGHT, P. E. & DYSON, H. J. (1999). Copper binding to the prion protein: structural implications of four identical cooperative binding sites. Proc Natl Acad Sci USA 96, 2042–2047. DYSON, H. J. & WRIGHT, P. E. (2002). Coupling of folding and binding for unstructured proteins. Curr Opin Struct Biol 12, 54–60. KNAUS, K. J., MORILLAS, M., SWIETNICKI, W., MALONE, M., SUREWICZ, W. K. & YEE, V. C. (2001). Crystal structure of the human prion protein reveals a mechanism for oligomerization. Nat Struct Biol 8, 770–774.
25
26 Biochemistry and Structural Biology of Mammalian Prion Disease
63 MOORE, R. C., LEE, I. Y., SILVERMAN, G. L., HARRISON, P. M., STROME, R., HEINRICH, C., KARUNARATNE, A., PASTERNAK, S. H., CHISHTI, M. A., LIANG, Y., MASTRANGELO, P., WANG, K., SMIT, A. F., KATAMINE, S., CARLSON, G. A., COHEN, F. E., PRUSINER, S. B., MELTON, D. W., TREMBLAY, P., HOOD, L. E. & WESTAWAY, D. (1999). Ataxia in prion protein (PrP)-deficient mice is associated with upregulation of the novel PrP-like protein doppel. J Mol Biol 292, 797–817. 64 BEHRENS, A., GENOUD, N., NAUMANN, H., RULICKE, T., JANETT, F., HEPPNER, F. L., LEDERMANN, B. & AGUZZI, A. (2002). Absence of the prion protein homologue Doppel causes male sterility. Embo J 21, 3652–3658. 65 MO, H., MOORE, R. C., COHEN, F. E., WESTAWAY, D., PRUSINER, S. B., WRIGHT, P. E. & DYSON, H. J. (2001). Two different neurodegenerative diseases caused by proteins with similar structures. Proc Natl Acad Sci USA 98, 2352–2357. 66 LUHRS, T., RIEK, R., GUNTERT, P. & WUTHRICH, K. (2003). NMR structure of the human doppel protein. J Mol Biol 326, 1549–1557. 67 JAMES, T. L., LIU, H., ULYANOV, N. B., FARR-JONES, S., ZHANG, H., DONNE, D. G., KANEKO, K., GROTH, D., MEHLHORN, I., PRUSINER, S. B. & COHEN, F. E. (1997). Solution structure of a 142-residue recombinant prion protein corresponding to the infectious fragment of the scrapie isoform. Proc Natl Acad Sci USA 94, 10086–10091. 68 DONNE, D. G., VILES, J. H., GROTH, D., MEHLHORN, I., JAMES, T. L., COHEN, F. E., PRUSINER, S. B., WRIGHT, P. E. & DYSON, H. J. (1997). Structure of the recombinant full-length hamster prion protein PrP(29– 231): the N terminus is highly flexible. Proc Natl Acad Sci USA 94, 13452–13457. 69 ZAHN, R., LIU, A., LUHRS, T., RIEK, R., VON SCHROETTER, C., LOPEZ GARCIA, F., BILLETER, M., CALZOLAI, L., WIDER, G. & WUTHRICH, K. (2000). NMR solution structure of the human prion protein. Proc Natl Acad Sci USA 97, 145–150. 70 LOPEZ GARCIA, F., ZAHN, R., RIEK, R. & WUTHRICH, K. (2000). NMR structure of
71
72
73
74
75
76
77
78
79
the bovine prion protein. Proc Natl Acad Sci USA 97, 8334–8339. TELLING, G. C., SCOTT, M., MASTRIANNI, J., GABIZON, R., TORCHIA, M., COHEN, F. E., DEARMOND, S. J. & PRUSINER, S. B. (1995). Prion propagation in mice expressing human and chimeric PrP transgenes implicates the interaction of cellular PrP with another protein. Cell 83, 79–90. SWIETNICKI, W., PETERSEN, R., GAMBETTI, P. & SUREWICZ, W. K. (1997). pH-dependent stability and conformation of the recombinant human prion protein PrP(90–231). J Biol Chem 272, 27517–27520. SWIETNICKI, W., PETERSEN, R. B., GAMBETTI, P. & SUREWICZ, W. K. (1998). Familial mutations and the thermodynamic stability of the recombinant human prion protein. J Biol Chem 273, 31048–31052. HOSSZU, L. L., BAXTER, N. J., JACKSON, G. S., POWER, A., CLARKE, A. R., WALTHO, J. P., CRAVEN, C. J. & COLLINGE, J. (1999). Structural mobility of the human prion protein probed by backbone hydrogen exchange. Nat Struct Biol 6, 740–743. JACKSON, G. S., HOSSZU, L. L., POWER, A., HILL, A. F., KENNEY, J., SAIBIL, H., CRAVEN, C. J., WALTHO, J. P., CLARKE, A. R. & COLLINGE, J. (1999a). Reversible conversion of monomeric human prion protein between native and fibrilogenic conformations. Science 283, 1935–1937. PRUSINER, S. B., GROTH, D., SERBAN, A., STAHL, N. & GABIZON, R. (1993). Attempts to restore scrapie prion infectivity after exposure to protein denaturants. Proc Natl Acad Sci USA 90, 2793–2797. WILDEGGER, G., LIEMANN, S. & GLOCKSHUBER, R. (1999). Extremely rapid folding of the C-terminal domain of the prion protein without kinetic intermediates. Nat Struct Biol 6, 550–553. APETRI, A. C. & SUREWICZ, W. K. (2002). Kinetic intermediate in the folding of human prion protein. J Biol Chem 277, 44589–44592. APETRI, A. C., SUREWICZ, K. & SUREWICZ, W. K. (2004). The effect of
References
80
81
82
83
84
85
86
87
88
disease-associated mutations on the folding pathway of human prion protein. J Biol Chem 279, 18008–18014. VEY, M., PILKUHN, S., WILLE, H., NIXON, R., DEARMOND, S. J., SMART, E. J., ANDERSON, R. G., TARABOULOS, A. & PRUSINER, S. B. (1996). Subcellular colocalization of the cellular and scrapie prion proteins in caveolae-like membranous domains. Proc Natl Acad Sci USA 93, 14945–14949. ARNOLD, J. E., TIPLER, C., LASZLO, L., HOPE, J., LANDON, M. & MAYER, R. J. (1995). The abnormal isoform of the prion protein accumulates in late-endosome-like organelles in scrapie-infected mouse brain. J Pathol 176, 403–411. BORCHELT, D. R., TARABOULOS, A. & PRUSINER, S. B. (1992). Evidence for synthesis of scrapie prion proteins in the endocytic pathway. J Biol Chem 267, 16188–16199. LEE, R. J., WANG, S. & LOW, P. S. (1996). Measurement of endosome pH following folate receptor-mediated endocytosis. Biochim Biophys Acta 1312, 237–242. ZHANG, H., STOCKEL, J., MEHLHORN, I., GROTH, D., BALDWIN, M. A., PRUSINER, S. B., JAMES, T. L. & COHEN, F. E. (1997). Physical studies of conformational plasticity in a recombinant prion protein. Biochemistry 36, 3543–3553. HORNEMANN, S. & GLOCKSHUBER, R. (1998). A scrapie-like unfolding intermediate of the prion protein domain PrP(121–231) induced by acidic pH. Proc Natl Acad Sci USA 95, 6010–6014. JACKSON, G. S., HILL, A. F., JOSEPH, C., HOSSZU, L., POWER, A., WALTHO, J. P., CLARKE, A. R. & COLLINGE, J. (1999b). Multiple folding pathways for heterologously expressed human prion protein. Biochim Biophys Acta 1431, 1–13. GLOCKSHUBER, R. (2001). Folding dynamics and energetics of recombinant prion proteins. Adv Protein Chem 57, 83–105. SWIETNICKI, W., MORILLAS, M., CHEN, S. G., GAMBETTI, P. & SUREWICZ, W. K.
89
90
91
92
93
94
95 96
97
98
(2000). Aggregation and fibrillization of the recombinant human prion protein huPrP90-231. Biochemistry 39, 424–431. MEHLHORN, I., GROTH, D., STOCKEL, J., MOFFAT, B., REILLY, D., YANSURA, D., WILLETT, W. S., BALDWIN, M., FLETTERICK, R., COHEN, F. E., VANDLEN, R., HENNER, D. & PRUSINER, S. B. (1996). High-level expression and characterization of a purified 142-residue polypeptide of the prion protein. Biochemistry 35, 5528–5537. JACKSON, G. S., HOSSZU, L. L., POWER, A., HILL, A. F., KENNEY, J., SAIBIL, H., CRAVEN, C. J., WALTHO, J. P., CLARKE, A. R. & COLLINGE, J. (1999a). Reversible conversion of monomeric human prion protein between native and fibrilogenic conformations. Science 283, 1935–1937. MA, J. & LINDQUIST, S. (1999). De novo generation of a PrPSc-like conformation in living cells. Nat Cell Biol 1, 358–361. MA, J., WOLLMANN, R. & LINDQUIST, S. (2002). Neurotoxicity and neurodegeneration when PrP accumulates in the cytosol. Science 298, 1781–1785. LEE, S. & EISENBERG, D. (2003). Seeded conversion of recombinant prion protein to a disulfide-bonded oligomer by a reduction-oxidation process. Nat Struct Biol 10, 725–730. FANDRICH, M., FLETCHER, M. A. & DOBSON, C. M. (2001). Amyloid fibrils from muscle myoglobin. Nature 410, 165–166. DOBSON, C. M. (2003). Protein folding and misfolding. Nature 426, 884–890. APETRI, A. C. & SUREWICZ, W. K. (2003). Atypical effect of salts on the thermodynamic stability of human prion protein. J Biol Chem 278, 22187–22192. BRANDNER, S., ISENMANN, S., RAEBER, A., FISCHER, M., SAILER, A., KOBAYASHI, Y., MARINO, S., WEISSMANN, C. & AGUZZI, A. (1996). Normal host prion protein necessary for scrapie-induced neurotoxicity. Nature 379, 339–343. KOCISKO, D. A., COME, J. H., PRIOLA, S. A., CHESEBRO, B., RAYMOND, G. J., LANSBURY, P. T. & CAUGHEY, B. (1994). Cell-free formation of protease-resistant prion protein. Nature 370, 471–474.
27
28 Biochemistry and Structural Biology of Mammalian Prion Disease
99 RAYMOND, G. J., HOPE, J., KOCISKO, D. A., PRIOLA, S. A., RAYMOND, L. D., BOSSERS, A., IRONSIDE, J., WILL, R. G., CHEN, S. G., PETERSEN, R. B., GAMBETTI, P., RUBENSTEIN, R., SMITS, M. A., LANSBURY, P. T., Jr. & CAUGHEY, B. (1997). Molecular assessment of the potential transmissibilities of BSE and scrapie to humans. Nature 388, 285–288. 100 SABORIO, G. P., PERMANNE, B. & SOTO, C. (2001). Sensitive detection of pathological prion protein by cyclic amplification of protein misfolding. Nature 411, 810–813. 101 DELEAULT, N. R., LUCASSEN, R. W. & SUPATTAPONE, S. (2003). RNA molecules stimulate prion protein conversion. Nature 425, 717–720. 102 COLLINGE, J., SIDLE, K. C., MEADS, J., IRONSIDE, J. & HILL, A. F. (1996a). Molecular analysis of prion strain variation and the aetiology of ’new variant’ CJD. Nature 383, 685–690. 103 PALMER, M. S., DRYDEN, A. J., HUGHES, J. T. & COLLINGE, J. (1991). Homozygous prion protein genotype predisposes to sporadic Creutzfeldt-Jakob disease. Nature 352, 340–342. 104 COLLINGE, J., PALMER, M. S. & DRYDEN, A. J. (1991). Genetic predisposition to iatrogenic Creutzfeldt-Jakob disease. Lancet 337, 1441–1442.
105 COLLINGE, J., BECK, J., CAMPBELL, T., ESTIBEIRO, K. & WILL, R. G. (1996b). Prion protein gene analysis in new variant cases of Creutzfeldt-Jakob disease. Lancet 348, 56. 106 EBERL, H., TITTMANN, P. & GLOCKSHUBER, R. (2004). Characterization of recombinant, membrane-attached full-length prion protein. J Biol Chem 279, 25058–25065. 107 UPTAIN, S. M. & LINDQUIST, S. (2002). Prions as protein-based genetic elements. Annu Rev Microbiol 56, 703–741. 108 OSHEROVICH, L. Z. & WEISSMAN, J. S. (2002). The utility of prions. Dev Cell 2, 143–151. 109 WICKNER, R. B., EDSKES, H. K., ROBERTS, B. T., PIERCE, M. & BAXA, U. (2002). Prions of yeast as epigenetic phenomena: high protein “copy number” inducing protein “silencing“. Adv Genet 46, 485–4525. 110 SPARRER, H. E., SANTOSO, A., SZOKA, F. C., Jr. & WEISSMAN, J. S. (2000). Evidence for the prion hypothesis: induction of the yeast [PSI+] factor by in vitro – converted Sup35 protein. Science 289, 595–599. 111 SANTOSO, A., CHIEN, P., OSHEROVICH, L. Z. & WEISSMAN, J. S. (2000). Molecular basis of a yeast prion species barrier. Cell 100, 277–288.
1
Prion Protein
Philippe Derreumaux Institut de Biologie Physico-Chimique–CNRS, Paris, France
Originally published in: Amyloid Proteins. Edited by Jean D. Sipe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31072-X
1 Introduction
The deposition of amyloid fibrils sharing a common cross-β-sheet structure with the β-strands perpendicular to the fiber axis is a hallmark of several fatal diseases [1–3]. These disorders associated with the failure of proteins to fold correctly can affect the brain, the central nervous system, and various organs such as the liver and heart in humans and animals. The list of fatal diseases includes, among others, primary and secondary systemic amyloidosis, Alzheimer’s, Parkinson’s and Huntington’s diseases, diabetes Type 2, and transmissible spongiform encephalopathies (TSE). The first TSE symptom observed is pronounced astrogliosis [4], followed by spongiform degeneration caused by the formation of vacuoles in neuronal processes and astrocytes [5]. The late step in the TSE disease often involves the formation of amyloid plaques containing PrPSc [6], the abnormal (or scrapie) isoform of the cellular prion protein (PrPC ) encoded by the PrP gene and found predominantly on the outer surface of neurons. The “protein-only” hypothesis states that a single prion (proteinaceous infectious particle) protein, lacking a small nucleic acid, can give rise to multiple isolates or strains with varying infectivity and incubation time, and is able to convert PrPC to PrPSc in an autocatalytic manner [7–9]. This theory is at variance with the virus [10] and virino [11] hypotheses which postulate that the infectious particle consists of a small nucleic acid coated by PrPSc . The original translation product consists of 253 amino acids, but region 1–22 is cleaved as signal peptide during trafficking and region 232–253 is replaced by a glycosylphosphatidylinositol (GPI) anchor at position 231. As seen in Fig. 1, mammalian PrPC is a highly conserved secretory cell surface glycoprotein of approximately 210 amino acids (residues 23–231). PrPC is also characterized by two glycosylation sites at N181 and N197, and a single disulfide bond between the cysteines C179 and C214. PrPC and PrPSc Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Prion Protein
share the same primary structure – no sequence or post-translational differences have been detected [12] – but have distinct biophysical and biochemical properties. PrPC is monomeric, soluble in detergents and sensitive to proteinase K, while PrPSc is oligomeric, insoluble and partially resistant to proteinase K [13, 14]. In addition, the conversion of PrPC to PrPSc involves a large variation in secondary structure as determined by IR spectroscopy: PrPC has 42% α-helix and 3–5% β-sheet, whereas PrPSc has 30% α-helix and 43% β-sheet, and PrPSc 27–30 (i.e. PrPSc with removal of residues 23–90) has 25% α-helix and 48–54% β-sheet [15–18]. Despite decades of study and recent promising discoveries, scientists do not know how to prevent prion diseases or what leads prion proteins to become misfolded and toxic. No purified recombinant PrP has been successfully converted in vitro to infectious PrPSc . Yet, scrapie in sheep and goats was first recognized in the 18th century, and mad cow disease or bovine spongiform encephalopathy (BSE) was discovered in 1986. In humans, different types of TSE were identified during the 20th century. Human TSEs are essentially sporadic (80%) with the modes of natural transmission remaining undetermined; around 15% are inherited [familial forms of Creutzfeldt–Jakob disease (CJD), fatal familial insomnia (FFI) and Gerstmann–Str¨aussler syndrome (GSS)] by mutations in the human PrP gene within chromosome 20. Sporadic CJD occurs at a rate of 0.5–1 per million population per year and GSS, which is the most common familial TSE, generally occurs in the third or fourth decade of life. New pathogenic mutations in the human prion protein continue to be discovered and the current identified list is given in Fig. 1. Finally, around 5% of human TSEs are infectious through CJD-infected surgical equipment or tissue transplants; kuru, recognized in 1957, results from exposure to contaminated human tissues during endocannibalistic rituals, whereas new variant CJD (vCJD), identified in 1996, is believed to be spread by dietary exposure to the BSE agent [19] and might also be transmitted by blood transmission [20]. In this chapter, we review our current understanding of the underlying factors that may promote the topological change of mammalian prion proteins. We have selected to focus here on the structural, thermodynamic and dynamic aspects of the in vitro and in vivo conversion. Reviews on the link between misfolding, pathogenesis and neurotoxicity on the pathological implications of prion glycosylation, on the differences between mammalian and yeast prions, and, finally, on the possible destinations taken by PrPC can be found elsewhere [21–24].
2 Conformations of PrPC and PrPSc
Knowledge of the three-dimensional structure of PrP in its protease-sensitive and protease-resistant forms is important for our understanding of prion replication. However, both the PrPC and PrPSc structures are still unknown at atomic resolution. Amyloid fibrils are non-crystalline and insoluble, and therefore not amenable to X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. Furthermore, there is accumulating evidence that the PrPSc structure is not unique
2 Conformations of PrPC and PrPSc
Fig. 1 Species variations and mutations of the prion protein gene. (A) The octarepeats within the tail are represented by white boxes, the regular secondary structures (helices: H1, H2 and H3; $-strands: S1 and S2) within the NMR PrPC structure [28] are also shown. The disulfide bond (SS) between Cys179 and Cys214 is drawn as a red line; the GPI anchor at position 231 and the two glycosylation sites N181 and N197 are also shown. Polymorphisms and point mutations associated with known human GSS, CJD or FFI are given. The
mutations H187R, T188R, T188A, T188K and P238S are still unclassified [137]. (B) Aligned amino acid sequences of human, bovine, hamster, mouse, sheep, chicken and duck PrP. The multiple alignment was carried using T-Coffee [136]. The reliability of the alignment is given in a color code (e.g. red means very reliable). In the last line below each aligned sequences, the asterisk indicates the strictly conserved residues. Note that the fragment spanning residues 106–126 is fully identical between the seven species.
3
4 Prion Protein
Fig. 2 Mutations of the PrP gene are associated with prion diseases. (A) The NMR recombinant Syrian hamster PrPC NMR structure [28] with the three "-helices H1, H2 and H3, the two $-strands S1 and S2, the disulfide bond between the cysteines C179 and C214, the GPI anchor, and the two glycosylation sites N181 and N197. For
clarity, only residues 120–231 are shown. (B) The positions of all polymorphisms and disease-causing mutations as discussed in Fig. 1. The mutations leading to GSS are given in green; those leading to CJD are in red. Here, all residues spanning PrP90–231 are shown.
and differences in the conformation of the infectious protein determine prion strain variation [25, 26]. Based on far-UV circular dichroism (CD) and limited proteolysis experiments, PrP and recombinant PrPs in membranes (lacking glycosylation sites) or in solution (lacking both GPI anchor and glycosylation sites) seem to adopt similar conformations [27], but no detailed structure is available for exact comparison. Thus far, only the solution structure of recombinant PrP from various mammalian species was characterized by NMR spectroscopy and X-ray diffraction. All NMR solution structures consist of a disordered tail (residues 23–123), and a globular domain with three α-helices and a short antiparallel β-sheet (Fig. 2). The β-strands S1 and S2 comprise residues 128–131 and 161–164, respectively helix H1 spans residues 144–153, helix H2 residues 172–194 and helix H3 residues 200–225 (numbering in Syrian hamster [28]). Although the NMR structures from mouse PrP121–231 [29], bovine PrP23–230 [30], Syrian hamster PrP23–231 [28] and human PrP121–230 at pH 7 and 4.5 [31, 32] superpose well, conformational changes are observed within the loops connecting the helices and even in the length of the helices. The disorder in the loops is an intrinsic property of PrP as deduced by hydrogen-deuterium exchange measurements [33] and is independent of the oligomeric (monomeric or dimeric) state of PrP, as deduced by molecular dynamics (MD) simulations [34]. Helix H3 spans residues 200–222 in mouse PrP121–231 versus 200–228 in human PrP90–231 and helix H2 is 12 residues shorter (173–182) in the human PrP125–228(S170N) variant [35]. Strikingly, the crystal PrP structures do not always superpose on the NMR conformations. The crystal structure of recombinant human PrP119–226 shows a
2 Conformations of PrPC and PrPSc
covalently dimer with an intermolecular disulfide bridge, swapping of helix H3 and formation of an interchain antiparallel β-sheet through residues 190–194, i.e. in helix H2 within the monomeric NMR structure [36]. In contrast to human PrP, the crystal structure of recombinant sheep PrP123–230 points to a monomeric state with an intramolecular disulfide bridge, superposing within 1.7 Å r.m.s. deviation from the NMR human PrP structure [37]. This X-ray structure also shows intermolecular or lattice contacts between the strands S1, allowing for the formation of a four-stranded intermolecular β-sheet and the possibility that the residues 190–194 could easily shift from an α-helix to a β-strand [37]. Whether the dimeric form with swapping of helix H3 and intermolecular disulfide bridge is an artifact of the crystallization protocol or indicates a physiological state remains to be determined. An essential aspect of the recombinant PrP structure is that the entire tail encompassing residues 23 to around 120 is highly flexible and largely disordered for a wide range of pH values. This is not totally surprising since the tail spanning residues 23–90 contains around 40% of glycine residues. Region 90–120 is, however, critical in prion propagation. Residues 108–111 influence prion replication efficiency [38]. PrP with the region 113–120 deleted is not converted to a protease-resistant form when expressed in scrapie-infected neuroblastoma cells [39]. Mice expressing PrP with a deletion of residues 32–106 are not susceptible to infection in vivo [40]. In addition, the mutations P102L and P105L lead to GSS syndrome and the mutation P101L in murine PrP (P102L in human numbering) alters TSE incubation time across three species barriers [41]. The flexible tail contains the octapeptide PHGGGWGQ repeated 4 times from residues 60 to 90 which binds copper within the physiological range [42]. Using PrP-derived peptides, full-length Syrian hamster PrP and electron paramagnetic resonance measurements, copper was found to interact not only with each HGGGW segment of the tail, but also with the PrP92–96 segment GGGTH through the His96 imidazole group and the carboxyl oxygen atom of Gly94 [43]. The 106–126 peptide, cytotoxic in vivo [44] and containing the palindromic sequence VAGAAAAGAV, was shown to form fibrils with parallel β-sheets [45], but the extrapolation of this result to PrP90–231 remains to be determined. Recent NMR studies of recombinant human PrP23–230 at pH 6.2 showed that the octapeptide repeats are structured, with the segments HGGGW and GWGQ adopting a loop conformation and a β-turn, respectively [46]. This result, along with the crystal structure of the copperbinding pentapeptide HGGGW-Cu2+ [47], suggests that the conformations of the HGGGW loop depend on both pH and copper binding. Three PrPSc models were first constructed on the basis of immunological studies and CD spectra. In the first model, the region between residues 90 and 145 was modeled by two consecutive β-hairpins (four β-strands) with β-strand 1 parallel to β-strand 3 [48]. In the second model, the secondary H1, S1 and S2 structural elements were replaced by a Greek key consisting of four adjacent antiparallel β-strands [49], whereas in the last model, PrPSc adopted β-helical conformations [50]. More recently, three other PrPSc models were proposed on the basis of specific criteria and electron microscopy data at 7 Å resolution from in vitro infectious Syrian and mouse PrP two-dimensional protofibril crystals. Fig. 3 shows
5
6 Prion Protein
Fig. 3 Schematic representation of the secondary structures of wild-type recombinant Syrian hamster PrPSc by NMR [28], and the three PrPSc models proposed by Wille et al. [51], Mornon et al. [53] and De Marco
and Daggett [54]. Helices H1, H2 and H3 are presented by grey boxes; S1, S2 and the new $-strands by black boxes; $-helical conformations by cross-hatched boxes.
the fluctuations of the secondary structures within these predicted PrPSc models. Wille et al. proposed a hexamer with each monomer featuring β-helices in the region 90–170 and α-helices H2 and H3 formed using threading techniques onto known β-helical structures. Nevertheless, they did not exclude the possibility that PrPSc could adopt a novel protein fold [51]. Mornon et al., using sequence analysis, identified the TATA box-binding protein fold as a template for PrPSc which allowed for the construction of another hexameric PrPSc model [52, 53]. In their model, the S1H1S2 motif and the N-terminal end of helix H3 are preserved, whereas the helix H2 and its neighboring residues are transformed into three new β-strands. Finally, De Marco and Daggett used MD simulations of the Syrian hamster PrP109–219 (D147N) at low pH to generate β-rich conformations and then docking between the units to build an alternative hexameric PrPSc model [54]. In this model, the three helices are preserved, and the core of the β structure consists of an isolated and new β-strand preceding helix H1, and a new β-strand spanning residues 116–119 packed against the S1 and S2 strands. Since it is possible to generate several PrPSc models within the electron microscopy envelope, other experimental data are needed to discriminate the solution from false positives. These can include differential proteinase K, vibrational spectroscopies and antibodies which recognize selectively PrPC or PrPSc . The model of Wille et al. [51] with β-helices was recently questioned using vibrational Raman optical activity measurements, which pointed to strong similarities between the spectra of PrPSc and flat β-sheet proteins, but not β-helical proteins [55]. The model of Mornon et al. is consistent with a recently discovered antibody recognizing the YYR motif within the strand S2 of PrPSc [56] and with the antibody V5B2, raised against residues 214–226, which recognizes PrPSc , but not PrPSc , indicating a structural rearrangement of the C-terminus [57]. The model of De Marco and Daggett [54] is consistent with peptide-binding studies of PrPSc [58, 59] potently inhibiting the PrP-res induced cell-free conversion of PrP-sen to the protease-resistant state. This model is also consistent with the monoclonal antibody 15B3, which recognizes
3 Stability and Unfolding/Folding of PrPC In Vitro
PrPSc at positions 142–148 (N-terminal half of helix H1 in the normal protein), 162–170 and 214–226, but not PrPC [49]. However, neither Mornon’s or De Marco’s model discuss the conformations of the fragment 90–109, and fail to reproduce, in their present forms, the experimental percentage of β structure. Fourier transform IR studies suggest that 48% of residues (or about 68 residues) [18] are in β-strands within PrPSc 90–231, whereas for instance the De Marco’s model has 37 residues in β-strands within the region 109–219. These differences require study in more detail.
3 Stability and Unfolding/Folding of PrPC In Vitro
An important step towards understanding the conformational conversion of PrPC to PrPSc is to characterize the energy landscape and unfolding/folding pathways of the wild-type and mutant prion proteins in vitro. It is well established that the recombinant prion protein can be folded either to a monomeric α-helical topology or β-rich oligomers and that folding to the PrPC isoform is under kinetic control [60]. Starting from disordered conformations, PrP folds to its PrPC isoform at pH 7, whereas under acidic pH conditions, PrP avoids the kinetic trap and folds to the β-rich isoform. The energy barrier separating the PrPC state and the PrP βrich oligomers was estimated to be 35–45 kcal/mol for wild-type mouse PrP [60]. There is also strong evidence that the full tertiary context of the protein is necessary to stabilize the terminal α-helices and the β-strands within PrPC . Biophysical studies showed that the helices H2 and H3 are largely disordered in the PrP fragment spanning helices H2 and H3 [61]. Monte Carlo simulations suggested that the sequence PrP127–164 can adopt two distinct tertiary folds with different secondary structures [62]. Initial experiments on recombinant mouse PrP121–231 at 298 K, based on stopped-flow fluorescence using the variant F175W which has the same overall structure and stability as wild-type mouse PrP121–231, suggested a two-state folding model at neutral pH, i.e. PrP either unfolded or native. The mouse PrP121–231 fragment was also found to fold very rapidly to PrPC with a half-life of 170 µs [63]. Then, other kinetic studies on human PrP90–231 and mouse PrP121–231 suggested that the change from neutral to acid pH conditions shifted the two-state kinetic to a three-state kinetic by stabilizing a monomeric intermediate with β-sheet structure in equilibrium with the native helical state. An intermediate state was detected at pH 4.0 and 1–1.5 M Gdn-HCl [64], pH 3.6 and 0.75–1.75 M Gdn-HCl [65], pH 5.0 and 2–2.5 M Gdn-HCl [66], and 3.5–4.5 M urea [67]. These β-sheet-rich intermediates, generated under acidic pH conditions (pH below 5), are, however, oligomeric rather than monomeric in character [68–71]. More recently, by following the kinetics of folding and unfolding reactions of human PrP90–231 at 278 K and pH 7, it was again proposed that the prion protein folds by a three-state mechanism involving a monomeric intermediate [72]. Since then, the existence of a folding intermediate, PrP*, has been advocated by a number of studies. These include
7
8 Prion Protein
hydrogen exchanges monitored by NMR [73] and high-pressure NMR experiments on Syrian hamster PrP90–231 [74] – the population of this intermediate was found to be 1% at pH 5.2, 303 K and 1 bar – or mouse PrP121–231 [75]. The point mutations associated with inherited human TSEs are also of interest for characterizing the energy landscape. It was initially believed that mutations in PrP gene could promote the conformational conversion by destabilizing the native structure of PrPC [76]. Fig. 2 shows the positions of polymorphisms and diseasecausing mutations within the NMR human PrP structure. The most frequent polymorphism, modulating disease susceptibility and onset, is located at codon 129. For instance, the M129/N178 allele segregates with FFI, whereas the V129/N178 allele leads to CJD [77]. The codon stop mutation at position 145 which generates an atypical GSS variant is also given [78]. As seen in Fig. 2, most familial mutations are located within the globular domain 124–226: strand S1 (G131V), helix H2 (D178N, V180I, T183A, H187R, T188A or T188K or T188R), loop between H2 and H3 (E196K and F198S), and helix H3 (E200K, D202N, V203I, R208H, V210I, E211Q, Q212P and Q217R). However, three mutations also occur within the disordered N-terminal tail: P102L, P105L and A117V. Although destabilization of the PrPC structure upon mutations has been confirmed for D178N, T183A, F198S, R208H and Q217R mutants by equilibrium unfolding studies in urea [79–81], it is not a general mechanism underlying the formation of PrPSc . The structure of the E200K variant of human PrP superposes well on the wild-type sequence structure [82]. The M129V and P102L variants of human PrP show PrPC -like structural properties from CD analysis [79, 83]. The recently discovered G131V mutation does not lead to any identifiable effects on PrPC secondary structure, as deduced by MD simulations [84]. Clearly, other mechanisms rather than changes in the thermodynamic stability of PrPC are thus important for conversion. One solution is that single-point mutations reduce the energy barrier by stabilizing the transition state. This effect has not been fully explored yet. Another possibility is that the point mutations change the electrostatic energy surface of PrPC and thus alter the interaction (binding or conversion step) with PrPSc . This effect has been proposed for the E200K mutant [82]. Finally, it is possible that the point mutations stabilize a partially folded intermediate. Recent kinetic experiments on a number of human PrP90–231 variants carrying mutations associated with familial forms of prion disease suggest the existence of partially structured intermediates on the refolding pathway of the following PrP variants: F198S, Q217R, V180I, V210I, R208H, D178N/M129 and D178N/V129. In each case, the partially folded state was found to be at least an order of magnitude more populated than the fully unfolded state. The strongest effect was seen for the variant F198S where the ratio of intermediate states relative to the native state is 1/350 versus 1/42,000 for wild-type PrP [81]. In contrast, the P102L mutation did not lead to any increase in the population of the intermediate, leaving unanswered the question of its impact on the energy surface. While it is not a definitive proof for the existence of a monomeric PrP* intermediate, it is interesting that such an intermediate has not been detected for Doppel, a homolog of PrP that does not form infectious prions [73].
4 Mechanisms of Prion Replication In Vivo 9
Little is still known on the structure of the monomeric PrP* intermediate. Highpressure NMR experiments point to an intermediate with the helices H2 and H3 disordered and helix H1 formed at pH 5.2, 303 K and 1 bar [74]. MD simulations of PrP90–231 with either the strand S1 or the strand S2 deleted – these variants do not inhibit prion propagation in a cell-free assay system [85] – suggest partially unfolded intermediates of molten globule type with the helices H2 and H3 disordered, and the helix H1 either fully formed or partially disordered at its C-terminal end [86]. Clearly, the structural features of PrP*, if this intermediate exists in monomeric form, need to be investigated in more detail. Nonetheless, based on MD simulations coupled with φ analysis on globular proteins, it would not be too surprising that PrP* consists of an ensemble of distinct conformations having different secondary structure compositions [87].
4 Mechanisms of Prion Replication In Vivo
Within the protein-only hypothesis, a detailed mechanism for the conformational transition is still unclear. A number of facts are, however, well accepted. Interactions between PrPC and PrPSc provide the main forces to propagate the topological change [60]. The conversion is promoted by partially denaturing conditions and in the presence of low concentrations of urea, i.e. under conditions expected to increase the population of an intermediate. Irrespective of the kinetic model (see below), transition from PrPC to PrPSc is a two-step process, which begins with binding between the two PrP isoforms, followed by conversion of the cellular form to the pathogenic form [88]. By designing specific synthetic peptides which inhibit binding and conversion reaction, it was found that the binding surfaces include residues 119–138 (region preceding helix H1), 165–174 (loop between S2 and H2) and 206–223 (helix H3), while conversion to PrPSc is strongly influenced by residues 139, 155 and 170 [58, 59, 88]. The species barrier, which is the most intriguing feature of prion propagation, limits transmission of prions across species. Mouse scrapie prion can be transmitted to Syrian hamster with an incubation time of 380 days, but transmission in mice is 130 days [89]. At present, two main factors have been identified to contribute to the species barrier from studies on transgenic animals: the conformations of individual strains of PrPSc , and the difference in PrP sequences between the donor and acceptor species [90, 91]. For instance, six positions have been identified in affecting prion transmission between mice and humans, i.e. residues 138, 143, 145, 155 and 166 [29]. Several kinetic models have been proposed to explain why spontaneous formation is an extremely rare event and how infection with PrPSc promotes the conversion of the cellular prion protein (Fig. 4). The template-assisted or heterodimer model [92] postulates that PrPSc is thermodynamically more stable than PrPC , but kinetically inaccessible. PrPC exists in equilibrium with a conformational intermediate PrP* which is able to interact with PrPSc . Binding of PrP* to PrPSc lowers the high activation energy barrier and allows the formation of a PrPSc heterodimer, but other
10 Prion Protein
Fig. 4 Kinetic models for the conformational conversion of PrPC to PrPSc . (A) The template-assisted or heterodimer model [92]. Here, the rate-limiting step during assembly is the conformational conversion between the partially folded PrP intermediate (PrP*) and a newly formed PrPSc . Binding of PrP* to PrPSc under the control of the putative protein X allows the formation
of a PrPSc heterodimer. (B) The polymerization nucleation model [71, 93]. Here the rate-limiting step is the formation of a stable PrPSc nucleus or seed. This is discerned by the observation of a lag phase. The exact size of the nucleus is unknown. PrPSc monomers are stabilized by joining the nucleus, as in seeded crystallization.
cellular factors such as molecular chaperones or prion-disease causing mutations might also contribute by reducing the energy barriers or increasing the stability of the partially folded intermediate species. In this model, the rate-limiting step is the conformational conversion between PrP* and PrPSc (Fig. 4A). The second kinetic model is nucleated polymerization [71, 93]. In this model, PrPC and PrPSc are in equilibrium in solution, but PrPSc is rare and unstable. The rate-limiting step is the formation of a stable PrPSc nucleus or seed. This is discerned by the observation of a lag phase in polymer growth. Once a seed is present, molecular association facilitates the conformational change of PrPC at a rapid rate (Fig. 4B). Within these models, two variants have also emerged recently. The β-nucleation model proposes that unfolding of helix H1 in PrPC , catalyzed by the PrPSc aggregate, is the key event in PrPSc propagation, and that intermolecular salt bridges between the helix H1 D and R residues of adjacent molecules are critical for the conversion [94]. This kinetic model, partially supported by MD simulations of wild-type and its D178N and E200K variants [95], was, however, not validated by cell-free conversion data [96]. The nucleated conformational conversion model, which is a dynamic version of nucleated polymerization, assumes that structurally fluid oligomeric complexes are crucial intermediates and rapid assembly occurs when these complexes conformationally convert upon association with nuclei [97]. At present, computer
4 Mechanisms of Prion Replication In Vivo 11
simulations have not been able to distinguish these kinetic models, independently of the protein models used, because the size of the oligomer is too small [98–102], the energy surface is biased towards the native state [103, 104] or the simulation conditions are not appropriate [105]. Regardless of the kinetic model, template-assisted or nucleation models, several important questions remain to be answered. (1) What is the role played by PrPC plasticity in the conformational change? Based on the experimental percentage of β-sheet within PrPSc [15], both the tail and the protein core undergo a structural change, but does one region initiate the conformational change? To address this issue, several MD simulations focused on the tail [84, 106–109]. In particular, Alonso et al. performed 10-ns MD simulations of Syrian hamster PrP109–219 at 300 K under neutral and low pH conditions [106]. Their results showed that acidic conditions favor the formation of a three-stranded antiparallel β-sheet spanning residues 109–131, whereas neutral conditions maintain the Nterminal region in a disordered state. Santini et al., by using MD simulations on various Syrian hamster PrP90–231 sequences, provided evidence that the ratelimiting factor for the formation of a three-stranded antiparallel β-sheet within the tail is thermodynamic rather than kinetic in character at neutral pH [84]. This βsheet within the tail, which is marginally populated in both PrP90–231 M129V and G131V mutants (30% of β-sheet versus 70% of random conformations) and not populated in wild-type PrP90–231 or its P102L variant, might be stabilized upon binding to the PrPSc template. In parallel, other studies focused on the possible role of the structured core. By using MD studies of PrP in solution [106, 110], molecular mechanics calculations of PrP fragments in monomeric, dimeric and tetrameric forms [94], and NMR structures of sheep PrP142–166 in solution [111], unfolding of helix H1 was suggested to provide the first internal driving force. However, other studies pointed to the high stability of helix H1. These include CD and NMR studies of human and murine prion peptides encompassing helix 1 and flanking sequences under various pH conditions [112–114], high-pressure NMR experiments on Syrian hamster PrP90–231 [74], MD simulations of PrP at low pH and high temperature conditions [115], and MD simulations of two PrP deletion variants at pH 7 and 300 K [86]. Alternatively, other theoretical investigations led to the proposal that helix H2 was the key region in the conformational change [52, 53, 116], because helix H2 is frustrated in the monomeric PrP structure [117, 118]. Frustration in secondary structure elements is defined as the incompatibility between the predicted (e.g. by using neural networks) secondary structure and the experimentally determined structure. Or they concluded using biased MD simulations that there are two populated unfolding routes, the helices H2 and H3 being disordered first with the S1H1S2 motif formed and vice versa [119]. Clearly, this diversity of unfolding and folding routes for PrP is not surprising, but it remains to simulate the dynamics of PrPC in interaction with a PrPSc model to reflect experimental reality. (2) What is the exact role of the β-rich oligomers formed by non reduced recombinant PrP proteins in the conversion? Recent studies have shown that these oligomers, sensitive to digestion by proteinase K, are apparently not on the pathway to amyloid fibril formation [69, 70]. However, it remains possible that these β-rich oligomers may offer a pool of pre-aggregated material for further structural reorganization
12 Prion Protein
through alternative pathways. Furthermore, these oligomers have been observed in other neurodegenerative diseases and there is accumulating evidence that both soluble A β oligomers (early aggregates) and fibrils are toxic in cell cultures [120]. (3) What is the minimal size of the infectious PrPSc particle? Ionizing irradiation experiments have suggested that the minimally infectious PrPSc is a dimer [121], and protease-resistant PrP dimers were observed in hamster brain [122]. On the other hand, recent studies based on size-exclusion chromatography, electrospray mass spectrometry and dynamic light scattering showed that the β-rich oligomers consist of octamers [60, 69, 123], whereas electron crystallography experiments have deduced an hexamer from in vitro infectious Syrian and mouse PrP two-dimensional protofibril crystals [51]. Taken together, these data suggest that infectivity may be associated with several sizes of aggregates under specific external conditions such as pH, temperature, concentration. (4) Finally, what is the role played by copper in the interconversion process and is our inability to generate infectious PrPSc in test tubes due to the absence of cellular factors? There is strong genetic evidence supporting for the importance of the copperbinding domain, since modifications in the number of octa-repeats cause familial prion diseases [124]. The exact role of copper in the interconversion process is far from being elucidated. It is currently believed that the binding of copper ions induces a conformational transition that presumably modulates PrP aggregation [46], thereby enhancing its infectivity [125]. Prior studies on the transmission of human prions to transgenic mice have suggested the role of a cofactor, protein X, in the formation of PrPSc . Protein X appears to bind to the side-chains of residues that form a discontinuous epitope: some amino acids are in the loop composed of residues 165–171 and at the end of helix H2 (Q168 and Q172), whereas others are on the surface of helix H3 (T215 and Q219) [126]. It is intriguing that the protein X epitope coincide with the binding surfaces between PrPSc and PrPC and the disordered regions within the NMR PrPC structures of human, mouse and Syrian hamster. However, all physical attempts to identify the protein X have failed thus far. Recent studies have also indicated that PrPC can associate with various molecular complexes [127]. For instance, Edenhofer et al. found interactions between the molecular chaperone Hsp60 and recombinant glutathione S-transferase-fusion PrPC peptides [128]. In addition, in vitro conversion experiments using a variant of the protein-misfolding cyclic amplification method [129] have shown that specific RNA molecules stimulate PrPSc formation [130]. This raises the possibility that RNA molecules are cellular cofactors and are involved in generating strain diversity. Clearly, advances in mass spectrometry should help clarify the nature of the cellular factors. 5 Conclusions
Since 1997 when Stanley Prusiner was awarded the Nobel Prize for Medicine, important steps have been made in understanding prion replication at the structural
References
level, but many questions remain to be answered. Thus far, many molecules have been identified to be effective for inhibiting prion replication in cell culture essays. For instance, Soto et al. partly reversed in vitro PrPSc to PrPC using β-sheet breaker peptide spanning PrP115–122 [131]. Caughey et al. inhibited the conversion reaction in vitro using diferoylmethane [132]. However, all these inhibitors failed on sick animals. The design of efficient inhibitors blocking prion propagation in mammals requires, at least, a combined effort in three directions: 1. PrP functional role Several studies have pointed to the possible role of PrP in copper transport or metabolism [125], signal transduction [133] and apoptosis [134], to name a few, but the physiological function of PrP remains unknown. Yet, PrP is highly conserved across species, and is expressed in most adult tissues and at high levels in the central nervous system. Thus, PrP is likely to be very useful. 2. Cellular factors Advances in experimental techniques are needed to identify the cellular factors because no purified recombinant PrP has yet been successfully converted in vitro to infectious PrPSc . 3. Structural characterization of the soluble PrP oligomers and insoluble PrPSc fibrils A detailed characterization of the soluble oligomeric intermediates is very difficult because the intermediates are typically short lived and are present in a wide range of conformations and degrees of aggregation; however, these atomic models would greatly facilitate the design of new drugs by computers.
Such a combined effort should not only be useful in prion science but also in Alzheimer’s or Huntington’s diseases because soluble oligomers apparently share a unique common structural feature [135], and, eventually, in understanding the generic rules between amino acid sequences, folded and misfolded structures for all protein coding genes. References 1 SIPE, J. D. Amyloidosis. Annu Rev Biochem 1992, 61, 947–975. 2 KELLY, J. F. The alternative conformations of amyloidogenic proteins and their multistep assembly pathways. Curr Opin Struct Biol 1998, 8, 101–106. 3 DOBSON, C. M. Protein-misfolding diseases: getting out of shape. Nature 2002, 418, 729–730. 4 DIEDRICH, J. F., P. E. BENDHEIM, Y. S. KIM, R. I. CARP and A. T. HAASE. Scrapie-associated prion protein accumulates in astrocytes during scrapie infection. Natl Acad Sci USA 1991, 88, 375–379.
5 JEFFREY, M., J. R. SCOTT, A. WILLIAMS and H. FRASER. Ultrastructural features of spongiform encephalopathy transmitted to mice from three species of biovidae. Acta Neuropathol 1992, 84, 559–569. 6 BENDHEIM, P. E., R. A. BARRY, S. J. DEARMOND, D. P. STITES and S. B. PRUSINER. Antibodies to a scrapie prion protein. Nature 1984, 310, 418–421. 7 ALPER, T., W. A. CRAMP, D. A. HAIG and M. C. CLARKE. Does the agent of scrapie replicate without nucleic acid? Nature 1967, 214, 764–766. 8 GRIFFITH, J. S. Self-replication and scrapie. Nature 1967, 215, 1043–1044.
13
14 Prion Protein
9 PRUSINER, S. B. Novel proteinaceous infectious particles cause scrapie. Science 1982., 216, 136–144. 10 ROHWER, R. G. Scrapie infectious agent is virus-like in size and susceptibility to inactivation. Nature 1984, 308, 658–662. 11 DICKINSON, A. G. and G. W. OUTRAM. Genetic aspects of unconventional virus infections: the basis of the virino hypothesis. Ciba Found Symp 1988, 135, 63–83. 12 STAHL, N., M. A. BALDWIN, D. B. TEPLOW, L. HOOD, B. W. GIBSON, A. L. BURLINGAME and S. B. PRUSINER. Structural studies of the scrapie prion protein using mass spectrometry and amino acid sequencing. Biochemistry 1993, 32, 1991–2002. 13 PRUSINER, S. B. Prions. Proc Natl Acad Sci USA 1998, 95, 13363–13383. 14 KOCISKO, D. A., J. H. COME, S. A. PRIOLA, B. CHESBRO, G. J. RAYMOND, P. T. LANSBURY and B. CAUGHEY. Cell-free formation of protease-resistant prion protein. Nature 1994, 370, 471–474. 15 CAUGHEY, B. W., A. DONG, K. BHAT, D. ERNST, S. HAYES and W. CAUGHEY. Secondary structure analysis of the scrapie-associated protein PrP 27–30 in water by infrared spectroscopy. Biochemistry 1991, 30, 7672–7680. 16 GASSET, M., M. A. BALDWIN, R. J. FLETTERICK and S. B. PRUSINER. Perturbation of the secondary structure of the scrapie prion protein under conditions that alter infectivity. Proc Natl Acad Sci USA 1993, 90, 1–5. 17 PAN, K. M., M. BALDWIN, J. NGUYEN, M. GASSET, A. SERBAN, D. GROTH, I. MEHLHORN, Z. HUANG, et al. Conversion of alpha-helices into beta-sheets features in the formation of the scrapie prion proteins. Proc Natl Acad Sci USA 1993, 90, 10962–10966. 18 WILLE, H., G. F. ZHANG, M. A. BALDWIN, F. E. COHEN and S. B. PRUSINER. Separation of scrapie prion infectivity from PrP amyloid polymers. J Mol Biol 1996, 259, 608–621. 19 BRUCE, M. E., R. G. WILL, J. W. IRONSIDE, I. MCCONNELL, D. DRUMMOND, A. SUTTIE, L. MCCARDLE, A. CHREE, et al. Transmissions to mice indicate that
20
21
22
23
24
25
26
27
28
29
30
“new variant” CJD is caused by the BSE agent. Nature 1997, 389, 498–501. LLEWELYN, C. A., P. E. HEWITT, R. S. KNIGHT, K. AMAR, S. COUSENS, J. MACKENZIE and R. G. WILL. Possible transmission of variant Creutzfeldt-Jakob disease by blood transfusion. Lancet 2004, 363, 417–421. BOUSSET, L. and R. MELKI. Similar and divergent features in mammalian and yeast prions. Microbes Infect 2002, 4, 461–469. ERMONVAL, M., S. MOUILLET-RICHARD, P. CODOGNO, O. KELLERMANN and J. BOTTI. Evolving views in prion glycosylation: functional and pathological implications. Biochimie 2003, 85, 33–45. HETZ, C. and C. SOTO. Protein misfolding and disease: the case of prion disorders. Cell Mol Life Sci 2003, 60, 133–143. PRADO, M. A., J. ALVES-SILVA, A. C. MAGALHAES, V. F. PRADO, R. LINDEN, V. R. MARTINS and B. R. BRENTANI. PrPC on the road: trafficking of the cellular prion protein. J Neurochem 2004, 88, 769–781. TANAKA, M., P. CHIEN, N. NABER, R. COOKE and J. S. WEISSMAN. Conformational variations in an infectious protein determine prion strain differences. Nature 2004, 428, 323–328. KING, C. Y. and R. DIAZ-AVALOS. Protein-only transmission of three yeast prion strains. Nature 2004, 428, 319–323. EBERL, H., P. TITTMANN and R. GLOCKSHUBER. Characterization of recombinant, membrane-attached full-length prion protein. J Biol Chem 2004, 279, 25058–25065. DONNE, D. G., J. H. VILES, D. GROTH, I. MEHLHORN, T. L. JAMES, F. E. COHEN, S. B. PRUSINER, P. E. WRIGHT, et al. Structure of the recombinant full-length hamster prion protein PrP(29–231): the N terminus is highly flexible. Proc Natl Acad Sci USA 1997, 94, 13452–13457. RIEK, R., S. HORNEMANN, G. WIDER, M. BILLETER, R. GLOCKSHUBER and K. W¨UTHRICH. NMR structure of the mouse prion protein domain PrP(121–321). Nature 1996, 382, 180–182. LOPEZ GARCIA, F., R. ZAHN, R. RIEK and K. W¨UTHRICH. NMR structure of the
References
31
32
33
34
35
36
37
38
39
bovine prion protein. Proc Natl Acad Sci USA 2000, 97, 8334–8339. ZAHN, R., A. LIU, T. LUHRS, R. RIEK, C. VON SCHROETTER, F. LOPEZ GARCIA, M. BILLETER, L. CALZOLAI, et al. NMR solution structure of the human prion protein. Proc Natl Acad Sci USA 2000, 97, 145–150. CALZOLAI, L. and R. ZAHN. Influence of pH on NMR structure and stability of the human prion protein globular domain. J Biol Chem 2003, 278, 35592–35596. HOSSZU, L. L., N. J. BAXTER, G. S. JACKSON, A. POWER, A. R. CLARKE, J. P. WALTHO, C. J. CRAVEN and J. COLLINGE. Structural mobility of the human prion protein probed by backbone hydrogen exchange. Nat Struct Biol 1999, 6, 740–743. SEKIJIMA, M., C. MOTONO, S. YAMASAKI, K. KANEKO and Y. AKIYAMA. Molecular dynamics simulation of dimeric and monomeric forms of human prion protein: insight into dynamics and properties. Biophys J 2003, 85, 1176–1185. CALZOLAI, L., D. A. LYSEK, P. GUNTERT, C. VON SCHROETTER, R. RIEK, R. ZAHN and K. W¨UTHRICH. NMR structures of three singleresidue variants of the human prion protein. Proc Natl Acad Sci USA 2000, 97, 8340–8345. KNAUS, K. J., M. MORILLAS, W. SWIETNICKI, M. MALONE, W. K. SUREWICZ and V. C. YEE. Crystal structure of the human prion protein reveals a mechanism for oligomerization. Nat Struct Biol 2001, 8, 770–774. HAIRE, L. F., S. M. WHYTE, N. VASISHT, A. C. GILL, C. VERMA, E. J. DODSON, G. G. DODSON and P. M. BAYLEY. The crystal structure of the globular domain of sheep prion protein. J Mol Biol 2004, 336, 1175–1183. SUPATTAPONE S., T. MURAMOTO, G. LEGNAME, I. MEHLHORN, F. E. COHEN, S. J. DEARMOND, S. B. PRUSINER and M. R. SCOTT. Identification of two prion protein regions that modify scrapie incubation time. J Virol 2001, 75, 1408–1413. HOLSCHER, C., H. DELIUS and A. BURKLE. Overexpression of nonconvertible
40
41
42
43
44
45
46
47
PrPC delta 114–121 in scrapie-infected mouse neuroblastoma cells leads to trans-dominant inhibition of wild-type PrPSc accumulation. J Virol 1998, 72, 1153–1159. WEISSMANN, C. Molecular genetics of transmissible spongiform encephalopathies. J Biol Chem 1999, 274, 3–6. BARRON, R. M., V. THOMSON, E. JAMIESON, D. W. MELTON, J. IRONSIDE, R. WILL and J. C. MANSON. Changing a single amino acid in the N-terminus of murine PrP alters TSE incubation time across three species barriers. EMBO J 2001, 20, 5070–5078. KRAMER, M. L., H. D. KRATZIN, B. SCHMIDT, A. ROMER, O. WINDL, S. LIEMANN, S. HORNEMANN and H. KRETZSCHMAR. Prion protein binds copper within the physiological concentration range. J Biol Chem 2001, 276, 16711–16719. BURNS, C. S., E. ARONOFF-SPENCER, G. LEGNAME, S. B. PRUSINER, W. E. ANTHOLINE, G. J. GERFEN, J. PEISACH and G. L. MILLHAUSER. Copper coordination in the full-length, recombinant prion protein Biochemistry 2003, 42, 6794–6803. ETTAICHE, M., R. PICHOT, J. P. VINCENT and J. CHABRY. In vivo cytotoxicity of the prion protein fragment 106–126. J Biol Chem 2000, 275, 36487–36490. KUWATA, K., T. MATUMOTO, H. CHENG, K. NAGAYAMA, T. L. JAMES and H. RODER. NMR-detected hydrogen exchange and molecular dynamics simulations provide structural insight into fibril formation of prion protein fragment 106–126. Proc Natl Acad Sci USA 2003, 100, 14790–14795. ZAHN, R. The octapeptide repeats in mammalian prion protein constitute a pH-dependent folding and aggregation site. J Mol Biol 2003, 334, 477–488. BURNS, C. S., E. ARONOFF-SPENCER, C. M. DUNHAM, P. LARIO, N. I. AVDIEVICH, W. E. ANTHOLINE, M. M. OLMSTEAD, A. VRIELINK, et al. Molecular features of the copper binding sites in the octarepeat domain of the prion protein. Biochemistry 2002, 41, 3991–4001.
15
16 Prion Protein
48 HUANG, Z., S. B. PRUSINER and F. E. COHEN. Scrapie prions: a three-dimensional model of an infectious fragment. Fold Design 1996, 1, 13–19. 49 KORTH, C., B. STIERLI, P. STREIT, M. MOSER, O. SCHALLER, R. FISCHER, W. SCHULZ-SCHAEFFER, H. KRETZSCHMAR, et al. Prion (PrP c)-specific epitope defined by a monoclonal antibody. Nature 1997, 390, 74–77. 50 DOWNING, D. T. and N. D. LAZO. Molecular modelling indicates that the pathological conformations of prions might be β-helical. Biochem J 1999, 343, 453–460. 51 WILLE, H., M. D. MICHELITSCH, V. GUENEBAUT, S. SUPATTAPONE, A. SERBAN, F. E. COHEN, D. A. AGARD and S. B. PRUSINER. Structural studies of the scrapie prion protein by electron crystallography. Proc Natl Acad Sci USA 2002, 99, 3563–3568. 52 MORNON, J. P., K. PRAT, F. DUPUIS and I. CALLEBAUT. Structural features of prions explored by sequence analysis. I. Sequence data. Cell Mol Life Sci 2002, 59, 1366–1376. 53 MORNON, J. P., K. PRAT, F. DUPUIS, N. BOISSET and I. CALLEBAUT. Structural features of prions explored by sequence analysis. II. A PrP c model. Cell Mol Life Sci 2002, 59, 2144–2154. 54 DEMARCO, M. L. and V. DAGGETT. From conversion to aggregation: protofibril formation of the prion protein. Proc Natl Acad Sci USA 2004, 101, 2293–2298. 55 MCCOLL, I. H., E. W. BLANCH, A. C. GILL, A. G. RHIE, M. A. RITCHIE, L. HECHT, K. NIELSEN and L. D. BARRON. A new perspective on beta-sheet structures using vibrational Raman optical activity from poly(L-lysine) to the prion protein. J Am Chem Soc 2003, 125, 10019–10026. 56 PARAMITHIOTIS, E., M. PINARD, T. LAWTON, S. LABOISSIERE, V. L. LEATHERS, W. Q. ZOU, L. A. ESTEY, J. LAMONTAGNE, et al. A prion protein epitope selective for the pathologically misfolded conformation. Nat Med 2003, 9, 893–899. 57 SERBEC, C. V., M. BRESJANAC, M. POPOVIC, P. K. HARTMAN, V. GALVANI, R. RUPREHT, M. CERNILEC, T. VRANAC, et al.
58
59
60
61
62
63
64
65
66
Monoclonal antibody against a peptide of human prion protein discriminates between Creutzfeldt-Jacob’s disease-affected and normal brain tissue. J Biol Chem 2003, 279, 3694–3698. CHABRY, J., B. CAUGHEY and B. CHESEBRO. Specific inhibition of in vitro formation of protease-resistant prion protein by synthetic peptides. J Biol Chem 1998, 273, 13203–13207. HORIUCHI, M., G. S. BARON, L. W. XIONG and B. CAUGHEY. Inhibition of interactions and interconversions of prion protein isoforms by peptide fragments from the C-terminal folded domain. J Biol Chem 2001, 276, 15489–15497. BASKAKOV, I. V., G. LEGNAME, S. B. PRUSINER and F. E. COHEN. Folding of prion protein to its native alpha-helical conformation is under kinetic control. J Biol Chem 2001, 276, 19687–19690. EBERL, H. and R. GLOCKSHUBER. Folding and intrinsic stability of deletion variants of PrP(121–231), the folded C-terminal domain of the prion protein. Biophys Chem 2002, 96, 293–303. DERREUMAUX, P. Evidence that the 127–164 region of prion proteins has two equienergetic conformations with beta or alpha features. Biophys J 2001, 81, 1657–1665. WILDEGGER, G., S. LIEMANN and R. GLOCKSHUBER. Extremely rapid folding of the C-terminal domain of the prion protein without kinetic intermediates. Nat Struct Biol 1999, 6, 550–553. JACKSON, G. S., L. L. HOSSZU, A. POWER, A. F. HILL, J. KENNEY, H. SAIBIL, C. J. CRAVEN, J. P. WALTHO, et al. Reversible conversion of monomeric human prion protein between native and fibrilogenic conformations. Science 1999, 283, 1935–1937. SWIETNICKI, W., R. PETERSEN, P. GAMBETTI and W. K. SUREWICZ. pH-dependent stability and conformation of the recombinant human prion protein PrP(90–231). J Biol Chem 1997, 272, 27517–27520. ZHANG, H., J. STOCKEL, I. MEHLHORN, D. GROTH, M. A. BALDWIN, S. B. PRUSINER, T. L. JAMES and F. E. COHEN. Physical
References
67
68
69
70
71
72
73
74
75
76
studies of conformational plasticity in a recombinant prion protein. Biochemistry 1997, 36, 3543–3553. HORNEMANN, S. and R. GLOCKSHUBER. A scrapie-like unfolding intermediate of the prion protein domain PrP(121–231) induced by acidic pH. Proc Natl Acad Sci USA 1998, 95, 6010–6014. MORILLAS, M., D. L. VANIK and W. K. SUREWICZ. On the mechanism of alpha-helix to beta-sheet transition in the recombinant prion protein. Biochemistry 2001, 40, 6982–6987. BASKAKOV, I. V., G. LEGNAME, M. A. BALDWIN, S. B. PRUSINER and F. E. COHEN. Pathway complexity of prion protein assembly into amyloid. J Biol Chem 2002, 277, 21140–21148. LU, B. Y. and J. Y. CHANG. Isolation and characterization of a polymerized prion protein. Biochem J 2002, 364, 81–87. BASKAKOV, I. V., G. LEGNAME, Z. GRYCZYNSKI and S. B. PRUSINER. The peculiar nature of unfolding of the human prion protein. Protein Sci 2004, 13, 586–595. APETRI, A. C. and W. K. SUREWICZ. Kinetic intermediate in the folding of human prion protein. J Biol Chem 2002, 277, 44589–44592. NICHOLSON, E. M., H. MO, S. B. PRUSINER, F. E. COHEN and S. MARQUSEE. Differences between the prion protein and its homolog Doppel: a partially structured state with implications for scrapie formation. J Mol Biol 2002, 316, 807–815. KUWATA, K., H. LI, Y. YAMADA, G. LEGNAME, S. B. PRUSINER, K. AKASAKA and T. L. JAMES. Locally disordered conformer of the hamster prion protein: a crucial intermediate to PrPSc ? Biochemistry 2002, 41, 12277–12283. MARTINS, S. M., A. CHAPEAUROUGE and S. T. FERREIRA. Folding intermediates of the prion protein stabilized by hydrostatic pressure and low temperature. J Biol Chem 2003, 278, 50449–50455. COHEN, F. E., K. PAN, Z. HUANG, M. BALDWIN, R. FLETTERICK and S. B. PRUSINER. Structural clues to prion replication. Science 1994, 264, 530–531.
77 GOLDFARB, L. G., R. B. PETERSEN, M. TABATON, P. BROWN, A. C. LEBLANC, P. MONTAGNA, P. CORTELLI, J. JULIEN, et al. Fatal familial insomnia and familial Creutzfeldt-Jakob disease: disease phenotype determined by a DNA polymorphism. Science 1992, 258, 806–808. 78 KITAMOTO, T., R. IIZUKA and J. TATEISHI. An amber mutation of prion protein in Gerstmann–Str¨aussler syndrome with mutant PrP plaques. Biochem Biophys Res Commun 1993, 192, 525–531. 79 LIEMANN, S. and R. GLOCKSHUBER. Influence of amino acid substitutions related to inherited human prion diseases on the thermodynamic stability of the cellular prion protein. Biochemistry 1999, 38, 3258–3267. 80 VANIK, D. L. and W. K. SUREWICZ. Disease-associated F198S mutation increases the propensity of the recombinant prion protein for conformational conversion to scrapie-like form. J Biol Chem 2002, 277, 49065–49070. 81 APETRI, A. C., K. A. SUREWICZ and W. K. SUREWICZ. The effect of disease-associated mutations on the folding pathway of human prion protein. J Biol Chem 2004, 279, 18008–18014. 82 ZHANG, Y., W. SWIETNICKI, M. G. ZAGORSKI, W. K. SUREWICZ and F. D. SONNICHSEN. Solution structure of the E200K variant of human prion protein. J Biol Chem 2000, 275, 33650–33654. 83 SWIETNICKI, W., R. PETERSEN, P. GAMBETTI and W. SUREWICZ. Familial mutations and the thermodynamic stability of the recombinant human prion protein. J Biol Chem 1998, 273, 31048–31052. 84 SANTINI, S., J.-B. CLAUDE, S. AUDIC and P. DERREUMAUX. Impact of the tail and mutations G131Vand M129V on prion protein flexibility. Proteins 2003, 51, 258–265. 85 VORBERG, I., K. CHAN and A. PRIOLA. Deletion of β-strand and α-helix secondary structure in normal prion protein inhibits formation of its protease-resistant iso-form. J Virol 2001, 75, 10024–10032.
17
18 Prion Protein
86 SANTINI, S. and P. DERREUMAUX. The helix H1 of the prion protein is rather stable against environmental perturbations: molecular dynamics of mutation and deletion variants of PrP(90–231). Cell Mol Life Sci 2004, 61, 951–960. 87 PACI, E., M. VENDRUSCOLO, CM. DOBSON and M. KARPLUS. Determination of a transition state at atomic resolution from protein engineering data. J Mol Biol 2002, 324, 151–163. 88 HORIUCHI, M., S. A. PRIOLA, J. CABRY and B. CAUGHEY. Interactions between heterologous forms of prion protein: binding, inhibition of conversion and species barriers. Proc Natl Acad Sci USA 2000, 97, 5836–5841. 89 KIMBERLIN, R. H., S. COLE and C. A. WALKER. Temporary and permanent modifications to a single strain of mouse scrapie on transmission to rats and hamsters. J Gen Virol 1987, 68, 1875–1881. 90 PRUSINER, S. B., M. SCOTT, D. FOSTER, K. PAN, D. GROTH, C. MIRENDA, M. TORCHIA, S. L. YANG, et al. Transgenetic studies implicate interactions between homologous PrP isoforms in scrapie prion replication. Cell 1990, 63, 673–686. 91 ASANTE, E. A., J. M. LINEHAN, M. DESBRUSLAIS, S. JOINER, I. GOWLAND, A. L. WOOD, J. WELCH, A. F. HILL, et al. BSE prions propagate as either variant CJD-like or sporadic CJD-like prion strains in transgenic mice expressing human prion protein. EMBO J 2002, 21, 6358–6366. 92 COHEN, F. E. and S. B. PRUSINER. Pathologic conformations of prion proteins. Annu Rev Biochem 1998, 67, 793–819. 93 JARRETT, J. T. and P. T. LANSBURY Jr. Seeding one dimensional crystallization of amyloid: a pathogenic mechanism in Alzheimer’s disease and scrapie? Cell 1993, 73, 1055–1058. 94 MORRISSEY, M. P. and E. I. SHAKHNOVICH. Evidence for the role of PrPC helix 1 in the hydrophilic seeding of prion aggregates. Proc Natl Acad Sci USA 1999, 96, 11293–11298.
95 LEVY, Y. and O. M. BECKER. Conformational polymorphism of wild-type and mutant prion proteins: energy landscape analysis. Proteins 2002, 47, 458–468. 96 SPEARE, J. O., T. S. RUSH 3rd, M. E. BLOOM and B. CAUGHEY. The role of helix 1 aspartates and salt bridges in the stability and conversion of prion protein. J Biol Chem 2003, 278, 12522–12529. 97 SERIO, T. R., A. G. CASHIKAR, A. S. KOWAL, G. J. SAWICKI, J. J. MOSLEHI, L. SERPELL, M. F. ARNSDORF and S. L. LINDQUIST. Nucleated conformational conversion and the replication of conformational information by a prion determinant. Science 2000, 289, 1317–1321. 98 HARRISON, P. M., H. S. CHAN, S. B. PRUSINER and F. E. COHEN. Conformational propagation with prion-like characteristics in a simple model of protein folding. Protein Sci 2001, 10, 819–835. 99 GSPONER, J., U. HABERTHUR and A. CAFLISCH. The role of side-chain interactions in the early steps of aggregation: molecular dynamics simulations of an amyloidforming peptide from the yeast prion Sup35. Proc Natl Acad Sci USA 2003, 100, 5154–5159. 100 KLIMOV, D. K. and D. THIRUMALAI. Dissecting the assembly of Aβ S16–22 amyloid peptides into antiparallel beta sheets. Structure 2003, 11, 295–307. 101 SANTINI, S., G. WEI, N. MOUSSEAU and P. DERREUMAUX. Exploring the folding pathways of proteins through energy landscape sampling: application to Alzheimer’s beta-amyloid peptide. Internet Electron J Mol Des 2003, 2, 564–577. 102 SANTINI, S., G. WEI, N. MOUSSEAU and P. DERREUMAUX. Pathway complexity of Alzheimer’s-amyloid A1622 peptide assembly. Structure 2004, 12, 1245–1255. 103 DING, F., N. V. DOKHOLYAN, S. V. BULDYREV, H. E. STANLEY and E. I. SHAKHNOVICH. Molecular dynamics simulation of the SH3 domain aggregation suggests a generic amyloidogenesis mechanism. J Mol Biol 2002, 324, 851–857.
References 104 JANG, H., C. K. HALL and Y. ZHOU. Assembly and kinetic folding pathways of a tetrameric beta-sheet complex: molecular dynamics simulations on simplified off-lattice protein models. Biophys J 2004, 86, 31–49. 105 DIMA, R. I. and D. THIRUMALAI. Exploring protein aggregation and self-propagation using lattice models: phase diagram and kinetics. Protein Sci 2002, 11, 1036–1049. 106 ALONSO, D. O., S. J. DEARMOND, F. E. COHEN and V. DAGGETT. Mapping the early steps in the pH-induced conformational conversion of the prion protein. Proc Natl Acad Sci USA 2001, 98, 2985–2989. 107 ZUEGG, J. and J. E. GREADY Molecular dynamics simulations of human prion protein: importance of correct treatment of electrostatic interactions. Biochemistry 1999, 38, 13862–13876. 108 OKIMOTO, N., K. YAMANAKA, A. SUENAGA, M. HATA and T. HOSHINO. Computational studies on prion proteins: effect of Ala117 -Val mutation. Biophys J 2002, 82, 2746–2757. 109 PARCHMENT, O. G. and J. W. ESSEX. Molecular dynamics of mouse and Syrian hamster PrP: implications for activity. Proteins 2000, 38, 327–340. 110 GSPONER J., P. FERRARA and A. CAFLISCH. Flexibility of the murine prion protein and its Asp178Asn mutant investigated by molecular dynamics simulations. J Mol Graph Model 2001, 20, 169–182. 111 KOZIN S. A., G. BERTHO, A. K. MAZUR, H. RABESONA, J. P. GIRAULT, T. HAERTLE, M. TAKAHASHI, P. DEBEY, et al. Sheep prion protein synthetic peptide spanning helix 1 and beta-strand 2 (residues 142–166) shows beta-hairpin structure in solution. J Biol Chem 2001, 276, 46364–46370. 112 LIU, A., R. RIEK, R. ZAHN, S. HORNEMANN, R. GLOCKSHUBER and K. W¨UTHRICH. Peptides and proteins in neurode-generative disease: helix propensity of a polypeptide containing helix 1 of the mouse prion protein studied by NMR and CD spectroscopy. Biopolymers 1999, 51, 145–152. 113 ZIEGLER, J., H. STICHT, U. C. MARX, W. MULLER, P. ROSCH and S. SCHWARZINGER. CD and NMR-studies of prion protein
114
115
116
117
118
119
120
121
122
123
helix 1: novel implications for its role in the PrPC to PrPSc conversion process. J Biol Chem 2003, 278, 50175–50181. JAMIN, N., Y. M. COIC, C. LANDON, L. OVTRACHT, F. BALEUX, J. M. NEUMANN and A. SANSON. Most of the structural elements of the globular domain of murine prion protein form fibrils with predominant beta-sheet structure. FEBS Lett 2002, 529, 256–260. GU, W., T. WANG, J. ZHU, Y. SHI, H. LIU. Molecular dynamics simulation of the unfolding of the human prion protein domain under low pH and high temperature conditions. Biophys Chem 2003, 104, 79–94. GILIS, D. and M. ROOMAN. PoPMuSiC, an algorithm for predicting protein mutant stability changes: application to prion proteins. Protein Eng 2000, 13, 849–856. KALLBERG, Y., M. GUSTAFSSON, B. PERSSON, J. THYBERG and J. JOHANSSON. Prediction of amyloid fibrilforming proteins. J Biol Chem 2001, 276, 12945–12950. DIMA, R. I. and D. THIRUMALAI. Exploring the propensities of helices in PrPC to form beta sheet using NMR structures and sequence alignments. Biophys J 2002, 83, 1268–1280. SETTANNI, G., T. X. HOANG, C. MICHELETTI and A. MARITAN. Folding pathways of prion and doppel. Biophys J 2002, 83, 3533–3541. WALSH, D. M., I. KLYUBIN, J. V. FADEEVA, W. K. CULLEN, R. ANWYL, M. S. WOLFE, M. J. ROWAN and D. J. SELKOE. Naturally secreted oligomers of amyloid beta protein potently inhibit hippocampal longterm potentiation in vivo. Nature 2002, 416, 483–484. BELLINGER-KAWAHARA, C. G., E. KEMPNER, D. GROTH, R. GABIZON and S. B. PRUSINER. Scrapie prion liposomes and rods exhibit target sizes of 55,000 Da. Virology 1988, 164, 537–541. TURK, E., D. B. TEPLOW, L. E. HOOD, S. B. PRUSINER. Purification and properties of the cellular and scrapie hamster prion proteins. Eur J Biochem 1988, 176, 21–30. SOKOLOWSKI, F., A. J. MODLER, R. MASUCH, D. ZIRWER, M. BAIER, G. LUTSCH, D. A. MOSS, K. GAST, et al.
19
20 Prion Protein
124
125
126
127
128
129
130
Formation of critical oligomers is a key event during conformational transition of recombinant Syrian hamster prion protein. J Biol Chem 2003, 278, 40481–40492. CAPELLARI, S., P. PARCHI, J. CAMPBELL, D. M. POSEY, R. B. PETERSEN and P. GAMBETTI. Creutzfeldt-Jakob disease associated with a deletion of two repeats in the prion protein gene. Neurology 2002, 59, 1628–1630. SIGURDSSON, E. M., D. R. BROWN, M. A. ALIM, H. SCHOLTZOVA, R. CARP, H. C. MEEKER, F. PRELLI, B. FRANGIONE, et al. Copper chelation delays the onset of prion disease. J Biol Chem 2003, 278, 46199–46202. KANEKO, K., L. ZULIANELLO, M. SCOTT, C. M. COOPER, A. C. WALLACE, T. L. JAMES, F. E. COHEN and S. B. PRUSINER. Evidence for protein X binding to a discontinuous epitope on the cellular prion protein during scrapie prion propagation. Proc Natl Acad Sci USA 1997, 94, 10069–10074. LEE, K. S., R. LINDEN, M. A. PRADO, R. R. BRENTANI and V. R. MARTINS. Towards cellular receptors for prions. Rev Med Virol 2003, 13, 399–408. EDENHOFER, F., R. RIEGER, M. FAMULOK, W. WENDLER, S. WEISS and E. L. WINNACKER. Prion protein PrPC interacts with molecular chaperones of the Hsp60 family. J Virol 1996, 70, 4724–4728. SABORIO, G. P., B. PERMANNE and C. SOTO. Sensitive detection of pathological prion protein by cyclic amplification of protein misfolding. Nature 2001, 411, 810–813. DELEAULT, N. R., R. W. LUCASSEN, S. SUPATTAPONE. RNA molecules stimulate prion protein conversion. Nature 2003, 425, 717–720.
131 SOTO, C., R. J. KASCSAK, G. P. SABORIO, P. AUCOUTURIER, T. WISNIEWSKI, F. PRELLI, R. KASCSAK, E. MENDEZ, et al. Reversion of prion protein conformational changes by synthetic-sheet breaker peptides. Lancet 2000, 355, 192–197. 132 CAUGHEY, B., L. D. RAYMOND, G. J. RAYMOND, L. MAXSON, J. SILVEIRA and G. S. BARON. Inhibition of protease-resistant prion protein accumulation in vitro by curcumin. J Virol 2003, 77, 5499–5502. 133 MOUILLET-RICHARD, S., M. ERMONVAL, C. CHEBASSIER, J. L. LAPLANCHE, S. LEHMANN, J. M. LAUNAY and O. KELLERMANN. Signal transduction through prion protein. Science 2000, 289, 1925–1928. 134 SCHATZL, H. M., L. LASZLO, D. M. HOLTZMAN, J. TATZELT, S. J. DEARMOND, R. I. WEINER, W. C. MOBLEY and S. B. PRUSINER. A hypothalamic neuronal cell line persistently infected with scrapie prions exhibits apoptosis. J Virol 1997, 71, 8821–8831. 135 KAYED, R., E. HEAD, J. L. THOMPSON, T. M. MCINTIRE, S. C. MILTON, C. W. COTMAN and C. G. GLABE. Common structure of soluble amyloid oligomers implies common mechanism of pathogenesis. Science 2003, 300, 486–489. 136 NOTREDAME, C., D. G. HIGGINS and J. HERINGA. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol 2000, 302, 205–217. 137 KOVACS, G. G., G. TRABATTONI, J. A. HAINFELLNER, J. W. IRONSIDE, R. S. KNIGHT and H. BUDKA. Mutations of the prion protein gene phenotypic spectrum. J Neurol 2002, 249, 1567–1582.
1
Protein Aggregates in Neurodegenerative Disorders Giorgio Giaccone
Istituto Nazionale Neurologico Carlo Besta, Milano, Italy
Mario Salmona Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy
Fabrizio Tagliavini Istituto Nazionale Neurologico Carlo Besta, Milano, Italy
Gianluigi Forloni Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy
Originally published in: Amyloid Proteins. Edited by Jean D. Sipe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31072-X
1 Introduction
The presence of protein aggregates has been proposed as a unifying feature of neurodegenerative disorders. The aggregates may consist of fibrils with a β-pleatedsheet conformation, termed amyloid, or be made up of misfolded proteins without the staining and ultrastructural properties of amyloid. This abnormal material can accumulate at the intra- or extracellular levels, but is invariably associated with neuronal degeneration. In some instances, new categories of disorders could be identified on the basis of the misfolded protein. The formation of inclusions may represent an early event or the end stage of a molecular cascade of several steps and the pathogenic role of the aggregates might be variable in different diseases. For several neurodegenerative disorders, genetic variants assist in explaining the pathogenesis of the more common sporadic forms, and in developing mouse and other models. Our understanding of the pathways involved in protein aggregation and in the molecular mechanisms of cellular toxicity is growing rapidly [1, 2]. The presence of extracellular amyloid deposits and of intracellular neurofibrillary tangles is the main neuropathological feature of Alzheimer’s disease. The β amyloid (Aβ) protein, a peptide of 40–42 amino acids, is the major component Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Aggregates in Neurodegenerative Disorders
of amyloid, while neurofibrillary tangles are made up of hyperphosphorylated tau protein. Based on the presence of abnormal hyperphosphorylation and aggregation of tau without other specific neuropathological abnormalities, a heterogeneous group of neurodegenerative disorders clinically characterized by dementia and/or motor syndromes is now unified under the term of “tauopathies”. The group includes Pick’s disease, progressive supranuclear palsy (PSP) and corticobasal degeneration (CBD). The discovery that mutations of protein tau are associated with familial forms of frontotemporal dementia and parkinsonism (FTDP-17) provides strong evidence that the derangement of tau metabolism contributes to neurodegeneration. The accumulation of a misfolded form of the prion protein characterizes prion diseases and plays an essential role also in their transmissibility. Molecular genetic investigations allowed the identification of α-synuclein aggregates as the main component of Lewy bodies, the intracellular inclusions that characterize the degenerating cells in Parkinson’s disease. α-Synuclein is also found in the intraneural inclusions of other neurodegenerative disorders, such as dementia with Lewy bodies (DLB) and multiple system atrophy, which are all now commonly described as “α-synucleinopathies”. The pathogenetic role of protein aggregates found in other groups of neurodegenerative diseases is less defined. Ubiquitinated cytoplasmatic inclusions have been detected in amyotrophic lateral sclerosis, and distinguished morphologically in skein inclusions, hyaline Lewy body-like inclusions and Bunina bodies. The composition of these aggregates is varied: in the familial forms associated with copper–zinc superoxide dismutase (SOD1) mutations, the mutated form of the enzyme was consistently found in the aggregates. Recent studies have proposed a direct involvement of SOD1 misfolding and aggregation in the pathogenesis of this disease. According to this hypothesis, Stathopulos et al. [3] have shown that SOD1 mutants undergo conformational changes that facilitate their polymerization. However, in terms of temporal development of the neurodegenerative process investigated in animal models, the appearance of intracellular inclusions is not an early event. Furthermore, the heterogeneous composition of intracellular inclusions in sporadic cases did not support a seminal role of protein aggregation in the pathogenesis of amyotrophic lateral sclerosis [4–6]. The presence of a trinucleotide (CAG) repeat expansion is characteristic of many neurological disorders including Huntington’s disease, spinobulbar muscular atrophy, dentatorubro-pallidoluysian atrophy and different forms of spinocerebellar ataxia. The expanded polyglutamine [poly(Q)] tract lies in functionally unrelated genes and the accumulation of poly(Q) sequences is a common pathological hallmark of these disorders. In most cases, the accumulation occurs at the level of neuronal nuclei (intranuclear inclusions). The molecular basis of the toxicity of poly(Q) has been explained by two alternative concepts. One implies that the expansion leads to conformational changes creating a misfolded structure with cytotoxic properties [7]; the other that the long poly(Q) sequence increases the capacity of the protein to form amyloid aggregates, according to the hypothesis of nucleation [8]. However, the central role of protein aggregates in poly(Q) disorders has been
2 Neuropathology 3
questioned by researchers who consider them to be a neuronal byproduct that, in some cases, may have protective functions [9, 10].
2 Neuropathology 2.1 Alzheimer’s Disease
The neuropathologic hallmark of Alzheimer’s disease is the co-occurrence of extracellular deposits of amyloid made up of straight fibrils of around 10 nm whose main component is the Aβ polypeptide and the intracellular build-up of abnormal twisted filaments [paired helical filaments (PHFs)], mainly composed of hyperphosphorylated tau protein. Aβ deposition takes place in the neuropil as well as in leptomeningeal and parenchymal vessel walls, whereas PHF formation occurs within perykaria and neurites of selected neuronal populations. Therefore, Alzheimer’s disease is a unique protein misfolding disease, since it is characterized by misfolding of two unrelated proteins, Aβ and tau, that cause two distinct histopathologic changes. This peculiarity, together with the fact that Aβ deposition and neurofibrillary changes can occur independently of each other, is the basis of several questions regarding the pathogenesis, and, in particular, which one of the two lesions is more strictly associated with the development of diffuse neuronal loss and dysfunction, synaptic rarefaction, and the appearance of the clinical signs of dementia. The issue is further complicated by the fact that Aβ deposition and tau pathology, although with differences in severity and topographic distribution, may also be present in non-demented elderly subjects [11]. A main distinction is that, in non-demented subjects, neurofibrillary changes, when present, are less severe and confined to restricted brain areas (the mesial temporal structures), while Aβ deposits may be widespread in the neocortex, in some instances in an amount similar to Alzheimer’s disease. This is in keeping with the concept that neurofibrillary changes evolve in an anatomical stereotypical hierarchical fashion, with the entorhinal cortex being the earliest area affected, followed by the hippocampal formation. With increasing disease progression, neurofibrillary changes occur in the association cortex, while the primary sensory and motor cortices are spared until the very advanced stages of the disease [12, 13] (Fig. 1A and 1B). Aβ is a 40- to 42-residue peptide derived by proteolytic cleavage of a 695- to 770amino-acid Aβ protein precursor (AβPP) [14–16], a transmembrane glycoprotein encoded on chromosome 21, in which Aβ is positioned partly in the extracellular domain and partly in the transmembrane domain. The Aβ deposits in the vessel wall take the form of congophilic (amyloid) angiopathy, while in the neuropil, the deposits determine the formation of the senile plaques (Fig. 2). These are complex lesions whose morphogenesis is influenced by the participation of cellular elements, either neuronal (neurites and synaptic terminals) or glial (reactive astrocytes and activated microglia), as well as by the association of Aβ amyloid with other
4 Protein Aggregates in Neurodegenerative Disorders
Fig. 1 Alzheimer’s disease. Immunohistochemistry with monoclonal antibody AT8 to phosphorylated tau (immunoreactivity corresponds to the brown reaction product) revealed severe involvement of the cerebral cortex by neurofibrillary changes that can spare the primary motor and sensory areas (A) or involve all cortical fields (B). AT8 labeled cell bodies and a network of neu-
ronal processes dispersed in the neuropil of the cerebral cortex (neuropil threads) (C). Clustering of AT8-immunoreactive profiles occurred in direct correspondence to senile plaques (D). Labeled neurons often showed pyramidal morphology, with AT8 immunostaining extending to the apical dendrite, reproducing the picture of flame-shaped NFTs (E).
“chaperone” molecules. These may be associated to all forms of amyloid, such as P component, complement factors, apolipoprotein (Apo) E and J, or more specific for Aβ deposits as α 1 -antichymotrypsin [17]. The introduction of immunohistochemical techniques using antibodies against Aβ revealed a far higher density and wider distribution of Aβ deposits in Alzheimer’s disease than had been appreciated by the use of classic silver impregnation methods or by applying specific staining for amyloid, such as Congo red or Thioflavin S. This was because immunohistochemistry could also recognize Aβ deposits that lacked the staining and ultrastructural properties of amyloid. These amorphous appearing, non-fibrillar plaques were referred to as “diffuse plaques” or “pre-amyloid deposits”, since they might represent an early stage in the morphogenesis of senile plaques, also considering that they are not associated with neuritic or glial changes [18–21] (Fig. 3). This hypothesis is supported by studies on patients with Down’s syndrome (trisomy 21) who invariably develop Alzheimer’s neuropathological changes, starting with pre-amyloid deposits in their teenage years and displaying fully-developed senile plaques with neuritic/glial changes and neurofibrillary tangles (NFTs) about two decades later [22, 23]. Moreover, by immunohistochemistry, it has been demonstrated that, in
2 Neuropathology 5
Fig. 2 Alzheimer’s disease. Double immunohistochemical detection of phosphorylated tau (blue) and A$ protein (brown). Note the intermingling of A$ deposition with tau-immunoreactive neuronal profiles, configuring the senile plaque.
Alzheimer’s disease, Aβ deposition also affects brain regions previously considered to be spared by the pathologic process, such as the cerebellum, striatum and brainstem [24, 25]. Interestingly, non-fibrillar pre-amyloid deposits are the predominant form of Aβ deposition in these regions. The concept of the C-terminal heterogeneity of the Aβ peptides deposited in Alzheimer’s disease brains provided a further distinguishing element between pre-amyloid deposits and senile plaques, since the former are made up almost exclusively by Aβ1–42, while in the latter contain a mixture of Aβ1–40 and Aβ1–42. Aβ1–40 is the main constituent of amyloid in the vessel walls [26, 27]. NFTs are caused by the intracellular accumulation of PHFs, abnormal twisted filaments 80–200 nm thick, whose main component is an abnormally phosphorylated form of tau protein [28–30]. Tau is a microtubule-associated protein, encoded by a gene on chromosome 17 [31, 32], highly expressed in the central nervous system (CNS). In normal conditions, tau is present exclusively in axons, while in Alzheimer’s disease, it accumulates in an insoluble, hyperphosphorylated form ectopically in the perikaria and dendrites. The main biologic function of tau is to bind
6 Protein Aggregates in Neurodegenerative Disorders
Fig. 3 Alzheimer’s disease. Immunohistochemistry with antibodies to A$ (immunoreactivity corresponds to the brown reaction product) revealed the high density of A$ deposits in the cerebral cortex (low magnification, A) and the different morphologies
of A$ deposition: classic senile plaques with a dense core and a more dispersed peripheral halo (B), amorphous amyloid deposit (C), pre-amyloid deposits (D) and amyloid angiopathy close to a senile plaque (E).
microtubules via the C-terminal binding domains, promoting their polymerization and stabilization. Tau therefore plays an important role in maintaining neuronal integrity, axonal transport and polarity [33, 34]. Highly phosphorylated tau has fewer tendencies to bind microtubules [35–38]. NFTs and degenerating neurites of plaques are consistently immunoreactive with antiphosphorylated tau antibodies (Fig. 1D). Moreover, immunohistochemistry with these antibodies is much more sensitive than classic silver impregnation methods in revealing neurofibrillary
2 Neuropathology 7
changes and enabled recognition of an additional form of hyperphosphorylated tau (Fig. 1C–E). These are the “neuropil threads”, a meshwork of randomly oriented, tau-immunoreactive neurites, dispersed in the neuropil, that represent a consistent part of the total burden of neurofibrillary pathology in Alzheimer’s disease [39]. NFT and degenerating neurites of plaques are also consistently immunoreactive for ubiquitin. The molecular mechanism of PHF formation has not been clarified: several kinases and phosphatases have been implicated [40, 41], but it remains to be determined whether hyperphosphorylation is the primary event that makes tau insoluble and resistant to degradation or is a consequence of the aggregation process. In summary, several lines of evidence indicate the cerebral deposition of Aβ amyloid is the seminal event that initiates a cascade of biochemical and cellular changes leading to tau-related alteration of the neuronal cytoskeleton and to neurodegeneration. This scenario – known as the “amyloid cascade hypothesis” [42, 43] – is supported by the observation that quantitative modification of the total levels of Aβ or of the ratio between long forms and short forms of this peptide (i.e. Aβ1–42 versus Aβ1–40) are sufficient to induce the whole neuropathologic phenotype of the disease. The first situation takes place in Down’s syndrome, in which one extra copy of the gene coding for the AβPP is present as a consequence of the chromosome 21 trisomy, while the second corresponds to the familiar cases of Alzheimer’s disease associated with mutations of the genes for AβPP, presenilin1 or presenilin2 (see below) [43]. NFT composed of tau aggregates that are biochemically similar to those of Alzheimer’s disease have been described in many neurodegenerative diseases in which Aβ is absent. These are referred as tauopathies, and include Picks disease, CBD and PSP. A subset of tauopathies is familial and mutations of the tau gene have been demonstrated in a percentage of them [44–47]. 2.2 Tauopathies
Starting from 1994, an autosomal-dominantly inherited form of frontotemporal dementia with parkinsonism was described in several families and linked to the region of chromosome 17 that contains the tau gene, resulting in the denomination “frontotemporal dementia and parkinsonism linked to chromosome 17” (FTDP-17) [44, 48]. The neuropathologic picture of FTDP-17 is characterized by tau-positive inclusions that occur in neurons, but may also be numerous in glial cells. The now widely used term “tauopathy” was introduced to highlight the abundance of tau accumulation, in the absence of Aβ aggregates [45]. In 1998, the first mutations in the tau gene (MAPT) were reported in FTDP-17 families and now the total of the known mutations is more than 30 [46, 47, 49]. Tau protein is encoded by a single gene of 16 exons from which different forms are produced via alternative mRNA splicing. In adult human brain, six tau forms are generated ranging from 352 to 441 amino acids, differing with respect to the presence of three (3R-tau) or four (4R-tau) C-terminal repeat sequences of
8 Protein Aggregates in Neurodegenerative Disorders
31–32 residues. Three different forms of each (3R-tau and 4R-tau) exist, as a result of the presence of none, one or two N-terminal inserts of unknown function [41]. The complex structure of the tau protein is the basis of the variability in the chemicophysical characteristics of the insoluble tau filaments in different tauopathies. The partial elucidation of the relationship between the site of the mutation on the MAPT gene, the tau isoforms engaged in the misfolding process, and the biochemical and structural characteristics of the tau aggregates provided very important insights into the role of tau abnormalities in neurodegenerative diseases, and more generally into the pathogenesis of protein misfolding diseases. The most important implication derived from studies of patients with FDTP-17 is that the mutation of the tau gene is sufficient to cause intracellular tau deposition and to induce neurodegeneration. Moreover, the studies of these rare inherited forms of tauopathies have shed light on the pathogenesis of other more common, sporadic neurodegenerative diseases characterized by tau inclusions (Pick’s disease, CBD and PSP), that can be considered as primary tauopathies, due to the absence of other disease-specific neuropathological abnormalities. In contrast, other brain diseases are known in which tau inclusions are likely secondary to metabolic (Niemann–Pick type C), infective (subacute sclerosing panencephalitis) or traumatic (dementia pugilistica) brain lesions. PSP is characterized clinically by supranuclear gaze palsy and by postural instability [50]. Neuropathologically, the main features are atrophy of the basal ganglia and brainstem, with neuronal loss and gliosis. In these brain areas, there is a high density of tau pathology, including NFTs in neurons and consistent tau aggregates in glial cells, both astrocytes (“tufted astrocytes”) and oligodendrocytes (“coiled bodies”) [51] (Fig. 4C and 4D). In contrast to Alzheimer’s disease, ultrastructurally these aggregates are made up straight fibrils of 15–18 nm [52] and biochemically are mainly composed of 4R-tau isoforms [41]. CBD is a progressive neurodegenerative disease involving the cerebral cortex and deep grey and white matter structures such as the striatum, thalamus, substantia nigra, capsula interna, subcortical frontal white matter and brainstem [53]. The neuropathological features are depigmentation of substantia nigra and asymmetrical frontoparietal atrophy. In affected regions, neuronal loss, gliosis and tau-immunoreactive glial and neuronal inclusions are prominent, and surviving nerve cells are often swollen (“achromatic” or “ballooned” neurons). The neuronal tau inclusions are morphologically pleomorphic. In some neurons they appear as dispersed globose NFTs; in others as small compact inclusion bodies and occasionally tau-immunoreactivity diffusely fills the cytoplasm. A typical finding of CBD is the prominent deposition of abnormal tau in neuronal processes (“threadlike neuritic profiles”) and in glial cells, with consistent involvement of the white matter. In the cerebral cortex, peculiar lesions consisting of annular clusters of tau-immunoreactive processes (“astrocytic plaques”) are considered typical of CBD [54] (Fig. 4E and 4F). The abnormal filaments include both PHF-like and straight filaments [55]. The biochemical profile of insoluble tau is similar to that of PSP, being mainly composed of 4R-tau isoforms [41].
2 Neuropathology 9
Fig. 4 Tauopathies. In Pick’s disease the characteristic neuronal, round, welldemarcated inclusions (Pick bodies) are revealed by Bodian silver impregnation (A) and by immunohistochemistry with monoclonal antibody to phosphorylated tau (AT8, immunoreactivity corresponds to the brown reaction product) (B). In PSP the tau pathology is stained by monoclonal antibody AT8 to phosphorylated tau, and includes NFTs in neurons (C) and tau deposi-
tion in glial cells, as the “tufted astrocytes” (D). In CBD the deposition of phosphorylated tau disclosed by AT8 immunohistochemistry is prominent in neuronal perikaria (E), but also in neuritic processes (E). The “astrocytic plaque” (F) is considered typical of this disease. FTDP-17 is characterized by the presence of phosphorylated tau deposition revealed by AT8 immunohistochemistry in neurons (often diffuse in the cytoplasm, G) and in glial cells (H).
Pick’s disease is now a term that should be restricted to forms of frontotemporal degeneration with tau-immunoreactive lesions and, in particular, with the characteristic neuronal, round, well-demarcated argyrophilic inclusions (Pick bodies) [56] (Fig. 4A and 4B). The neuropathologic picture is that of severe frontotemporal
10 Protein Aggregates in Neurodegenerative Disorders
atrophy with marked neuronal loss, gliosis and neuropil vacuolization. Pick bodies are abundant in the dentate gyrus of the hippocampus and focally in layers II, III and VI of the cerebral cortex. They are detected by silver impregnation and by immunohistochemistry with antibodies to hyperphosphorylated tau that reveal additional cytoskeletal changes, including neuritic profiles and astrocytic and oligodendroglial inclusions [57]. Biochemically, the tau aggregates are predominantly made up by the 3R-tau isoforms [41]. It should be recalled that the majority of patients with the clinical features of FTD do not show the neuropathology of Pick’s disease, but a more unspecific picture of atrophy of the frontal and temporal lobes with neuronal loss, gliosis and microvacuolar neuropil degeneration, in the absence of tau-immunoreactive changes. The neuropathology of FTDP-17 is heterogeneous with respect to the morphologic characteristic of the lesions as well as their severity and topographic distribution. It can overlap with the scenario of Pick’s disease, PSP and CBD – the invariable change being the presence of tau deposits in neurons and glial cells in several grey and white matter structures of the CNS (Fig. 4G and 4H). The heterogeneity of the clinical/pathologic phenotypes partially reflects the position of the mutations in the tau gene. They can be separated in two broad categories: (1) coding region missense mutations, and (2) intronic mutations and silent mutations of the coding region. The first group of tau mutations is the ordinary one, in which a nucleotide substitution in the coding region of the MAPTgene changes the corresponding amino acidic residue. Most of the reported missense mutations occur in the highly conserved region within or near the microtubule-binding domains. The effects of these mutations seem to be 2-fold: (1) they reduce the ability of tau protein to bind to microtubules and to promote microtubule assembly in vitro [58, 59], and (2) they increase the tendency of tau to aggregate in insoluble filaments [60, 61]. The structure and biochemical characteristics of the misfolded tau aggregates are dependent on the exon of the mutation. While mutations located outside exon 10 modify all six tau isoforms, those within exon 10 affect only the three tau isoforms bearing four microtubule-binding domains (4R-tau), since the supplementary repeat of this portion of the molecule is coded by exon 10. The second class of mutations is more puzzling. These genetic defects are single base pair substitutions occurring within intron 10 or silent nucleotide changes in the adjacent exon 10. At first it was difficult to explain how mutations that do not affect the sequence of the tau protein could provoke its misfolding. The explanation came from biochemical studies demonstrating that these mutations increase the levels of 4R forms of tau, likely due to a higher proportion of mRNAs in which exon 10 is retained [62, 63], as a consequence of the disruption of a regulatory stem-loop, “splicing enhancer” structure [64]. The mechanisms by which changes in the ratio of 3R-tau to 4R-tau lead to neuronal dysfunction is unclear, but it seems established that a ratio of 3R-tau forms to 4R-tau forms of about unity is necessary for the normal structure and function of microtubules. It has been suggested that some missense mutations in exon 10 close to the downstream intron may also act with this mechanism.
2 Neuropathology 11
2.3 Prion Diseases
Prion diseases are fatal, rapidly progressive, neurodegenerative disorders of humans and animals. They are a unique category of diseases, since may be sporadic or inherited in origin and can be transmitted. They are therefore also called “transmissible spongiform encephalopathies”, a term that underlines their infectious character and the classic main neuropathological hallmark of neuropil vacuolation. The transmissible agent, the prion, is devoid of informational nucleic acid and consists only of protein [65] (see below). Several lines of evidence indicate that prions are composed of an abnormal, pathogenic isoform of the prion protein (PrP). The normal form of PrP (PrPC ) is widely expressed in neurons and glia in the CNS, and little is known about its function(s). The human cellular gene which encodes PrPC has been called PRNP, whose open reading frame is in a single exon. The pathogenic isoform (PrPSc ) results when the normal form undergoes a conformational change, converting α-helical regions to β-sheet motifs, and possesses abnormal physicochemical properties such as detergent insolubility and protease resistance [66]. Therefore, prion diseases are “transmissible protein misfolding diseases”, in which the abnormally folded protein may induce the conversion of normal molecules into the misfolded status. Animal prion diseases include scrapie in sheep and goats and bovine spongiform encephalopathy (BSE) in cattle. In humans, the most common prion disease is Creutzfeldt-Jakob disease (CJD), which can be inherited as autosomal dominant disease associated with mutations of the PRNP gene, acquired iatrogenically through exposure to material contaminated with prions or arise sporadically for no obvious reason. Possible causes of sporadic CJD include spontaneous formation of PrPSc as a rare stochastic event, somatic mutations of PRNP or unrecognized prion exposure. A variant CJD (vCJD) form was reported in 1996 [67], characterized by clustering in the UK and peculiar neuropathological and biochemical features. Geographic and temporal association, as well as transmission studies revealing strong analogies between the vCJD and BSE agents, lead to the conclusion that vCJD is acquired by humans by exposure to BSE contaminated material. The detection of the deposition of PrPSc is now possible not only biochemically by immunoblot analysis, but also in histological specimens following the introduction of effective pre-treatments that make immunohistochemistry with anti-PrP antibodies sensitive and specific [68–70]. These techniques revealed how PrPSc accumulates in the CNS of CJD patients with variable topographic distribution and different patterns of immunostaining – synaptic, perivacuolar, perineuronal and plaque-like (Fig. 5A–D). The accumulation of PrPSc in the CNS is accompanied by neuronal and glial changes. The triad of spongiform changes, neuronal loss and gliosis, in the absence of an inflammatory response, is the classic neuropathological hallmark of CJD. Spongiform change is relatively specific to CJD, but may differ in severity in different patients and from regions to regions of the CNS. It is characterized by small, round or oval vacuoles in the neuropil of the cerebral cortex and other grey structures (Fig. 5E). Glial changes consist of reactive
12 Protein Aggregates in Neurodegenerative Disorders
Fig. 5 Prion diseases. In CJD, most grey structures exhibit PrPSc immunoreactivity (corresponding to the brown reaction product) (A) that may assume different patterns: diffuse synaptic type (B), perineuronal (C) and perivacuolar (D), and is associated with
spongiform changes (hematoxylin & eosin, E) and astrogliosis (anti-glial fibrillary acid immunohistochemistry, F). PrP amyloid deposits are consistent in GSS (Thioflavin S, G) and vCJD (PrP immunohistochemistry, H).
astrogliosis and microglial activation (Fig. 5F). Neuronal loss in the affected cortical and subcortical regions is often severe, sometimes with complete depletion of some neuronal populations [71]. While in classic CJD cases the formation of true amyloid deposits is a rare event and is restricted to specific brain regions, in Gerstmann–Str¨aussler–Scheinker (GSS) disease this phenomenon is relevant, and PrP amyloid deposits are abundant and widespread in the CNS [72] (Fig. 5G). GSS is determined by specific mutations in the PRNP gene and has a longer disease duration than CJD. Amyloid PrP deposition is also abundant in vCJD, in which the typical lesion is the florid plaques – an amyloid core surrounded by vacuoles of spongiosis (Fig. 5H).
2 Neuropathology 13
The neuropathology of CJD is heterogeneous and it has been shown that this may be related to variations in the tertiary structure of PrPSc , resulting in different conformers of the abnormal protein having distinct physicochemical and pathogenic properties [73]. This is confirmed by the observation of molecular size differences in the protease-resistant core of PrPSc that are due to different sites of proteolytic cleavage reflecting different tertiary structures. In particular, the clinicopathological heterogeneity of sporadic CJD has been linked to two types of PrPSc , termed type 1 and type 2, having a molecular weight of 21 and 19 kDa. Evidence suggests that the PrPSc type in combination with the genotype at codon 129 of the PRNP gene – a common polymorphic site encoding methionine or valine – is a major determinant of deposition pattern (i.e. diffuse or focal) and brain regional distribution of PrPSc . The view that different conformers of PrPSc may possess distinct pathogenic properties is supported by a study on CJD cases with both type 1 and type 2 PrPSc in the same brain. This event is relatively common, involving about 25% of the sporadic patients with a close relationship between PrPSc type, pattern of PrP immunoreactivity and severity of spongiform changes [74]. Further support for the prion hypothesis has been provided by transgenic animal studies; in particular, the finding that mice in which the PrP gene has been ablated (PrP knock-out mice) neither develop neuropathological features of prion diseases nor accumulate PrPSc or propagate the disease after inoculation of prions, demonstrating that the absence of PrPC expression prevents prion spread and neuronal dysfunction [75]. 2.4 Synucleinopathies
Synucleinopathies represent a heterogeneous group of neurodegenerative disorders characterized by the presence of cytoplasmic neuronal and glial inclusions. The synucleinopathies can be divided into Lewy body disorders (Parkinson’s disease and DLB) and disorders with intracytoplasmic inclusions in glial cells (multiple system atrophy). Parkinson’s disease is the second most common neurodegenerative disease after Alzheimer’s disease. Parkinson’s disease is characterized clinically by resting tremor, rigidity and bradykinesia, resulting from the progressive and selective loss of dopaminergic neurons in the pars compacta of the substantia nigra. Histopathologically, it is characterized by the degeneration of specific nerve cell populations that develop filamentous inclusions in the form of Lewy bodies and dilated neurites. The onset of clinical symptoms occurs when the loss of dopaminergic neurons is over 50%. Lewy bodies are found not only in the substantia nigra, but also in other brain structures [76]. The classical Lewy body is a spherical, cytoplasmatic inclusion with a diameter of about 10–30 µm. After hematoxylin & eosin staining, the Lewy body assumes the classical “target shape” with a central core more intensely stained and a more faded external portion (Fig. 6A). At the ultrastructural level, the central portion is formed by tightly packed filamentous and granular material, while in the external part filaments of 7–20 nm are associated with electron-dense material.
14 Protein Aggregates in Neurodegenerative Disorders
Fig. 6 Synucleinopathies. In Parkinson’s disease, Lewy bodies appear as target-like, spherical, cytoplasmic inclusions (hematoxylin & eosin, A) intensely immunoreactive with anti-SYN antibodies (immunore-
activity corresponds to the brown reaction product, B). In MSA, GCIs appear black in sections stained with the Gallyas silver impregnation (C) and are immunolabeled by anti-SYN antibodies (D).
Other round-shaped inclusions (“pale” bodies) are also identified in Parkinson’s disease, and appear as granular and eosinophilic material displacing neuromelanin of pigmented neurons, without the classical “target shape” of Lewy bodies. Pale bodies are considered precursors of Lewy bodies. The identification of familial forms of Parkinson’s disease caused by missense mutations in the α-synuclein gene [77] prompted immunohistochemical studies showing that Lewy bodies are consistently and intensely immunoreactive with anti-SYN antibodies [78] (Fig. 6B). Previously, Lewy bodies were recognized by means of anti-ubiquitin antibodies [79]. Furthermore, parkin, another protein genetically associated to familial Parkinson’s disease, was invariably found in Lewy bodies [80]. SYN immunoreactivity was found not only in perykarial inclusions but also in neurites (Lewy neurites). DLB defines a specific clinical syndrome characterized by progressive dementia with fluctuation of cognitive deficits and hallucinations, mainly visual, associated with the neuropathological finding of Lewy bodies. However, the threshold of Lewy bodies for a DLB diagnosis has not been defined yet. This could lead to the undetermined condition of subjects with a clinical diagnosis of dementia where the typical Alzheimer’s disease pathology (Aβ deposits and NFTs) co-exists with the presence of Lewy bodies. The problem is that the relative contribution to the clinical symptoms of the individual pathological components is difficult to define. Other DLB cases are “pure”, in which Alzheimer-type pathology is absent or minimal. Based on the localization and density of Lewy bodies, three categories have been identified: Lewy bodies at predominance in the brainstem, in the limbic cortex and in the neocortex. Unexpectedly, DLB is not necessary associated with the third category (Lewy bodies at predominance in neocortex). Cortical Lewy bodies after
3 The Neurotoxic Proteins
hematoxylin & eosin show different shapes: round, oval and bean-like, lacking the target shape typical of nigral Lewy bodies. On the other hand, the antigenic characteristics of cortical Lewy bodies are essentially the same of the classical Lewy bodies, as they are immunopositive for SYN, ubiquitin and parkin. Multiple system atrophy (MSA) is clinically associated to disautonomic disturbances, parkinsonism and cerebellar signs. Neuropathologically, it is characterized by intracellular glial inclusions originally recognized by the silver impregnation method of Gallyas [81]. They appear as oval, or triangle-, flame- or sickle-shaped glial cytoplasmic inclusions (GCIs or Papp-Lantos inclusions) localized in oligodendrocytes (Fig. 6C). The size is variable, but the inclusions often completely fill the cytoplasm, confining the nucleus at the periphery. GCIs are immunoreactive with antibodies against SYN [82] (Fig. 6D). At the ultrastructural level, GCIs are composed of tubules or filaments having a diameter of 20–40 nm in association with granular material. In MSA subjects, SYN-immunoreactive inclusions can be found in glial and neuronal cells nuclei and in neurites. GCIs are prominent in the motor and sensory primary cortices, in the putamen, globus pallidus, internal and external capsula, corpus callosum, and several other structures of the brainstem and spinal cord.
3 The Neurotoxic Proteins 3.1 Alzheimer’s Disease
The pathogenic role of Aβ in Alzheimer’s disease has been put forward on the basis of genetic, experimental and biochemical evidence, and an essential contribution to this hypothesis derived from the demonstration that Aβ peptides can exert neurotoxic effects in vitro [42]. The toxicity of Aβ peptides is mediated by apoptosis [83, 84] and was initially associated with the polymerization into fibrils of the peptide [85]. The biological active fragment was found to be the Aβ 25–35 sequence. The fibrillogenic capacity, tested in vitro, increases progressively from peptide Aβ1–40 to Aβ1–42 and Aβ25–35. Aβ peptides containing 40 or 42 residues are those found in the brains of Alzheimer’s disease patients in senile plaques and originate from the precursor of Aβ, AβPP. In the amyloidogenic pathway of catabolism, AβPP is cleaved sequentially by β- and y-secretases at the N- and C-terminus of the Aβ region. β-Secretase cleaves between residues 671 (Met) and 672 (Asp), while the cleavage at the C-terminal by γ -secretases can occur either at residue 711 (Val) or 713 (Iso) to generate Aβ1–40 or Aβ1–42 [86–88]. The amyloid-like fibrils spontaneously formed in vitro by these peptides have morphological, staining and ultrastructural features similar to the amyloid deposits isolated from Alzheimer’s disease brains. The amyloid fibrils are composed of protofilaments that are hydrogen-binding β-sheet structures with the β-strands running perpendicular to the long fibril axis. It is not clear whether the conformational alterations in the oligomers are
15
16 Protein Aggregates in Neurodegenerative Disorders
a consequence of the oligomer assembling or whether the misfolded monomers induce the formation of the aggregates. An intermediate possibility is that an initial, unstable, misfolded conformation of peptide monomer can be stabilized by interaction with other proteins to form oligomers that lead the formation of fibrils and aggregates [89]. In this case, the conformational changes are not triggered by oligomerization, but their stable conformation is indissolubly associated with the oligomers. The formation of aggregates in Alzheimer’s disease brains is thought to be seeded by Aβ1–42, which is more prone to form aggregates, whereas Aβ1–40 accumulates subsequently. The essential role of Aβ1–42 is supported by evidence that the common molecular change associated with the mutations of three different genes linked to familial Alzheimer’s disease (AβPP, presenilin1 and presenilin2) is the increase of Aβ1–42 production [90]. The close association between the neurotoxicity and the aggregation of the Aβ peptide was subsequently reconsidered, and the hypothesis that Aβ deposition is a primary factor in the pathogenesis of Alzheimer’s disease has been challenged on the basis of studies showing that the degree of cortical synapse loss and the quantity of neurofibrillary changes predict the severity of dementia more accurately than the density of Aβ deposits [91, 92]. Recently, new data have emerged that lead to reconsideration of the amyloid cascade hypothesis; they support as the central pathogenic element the soluble oligomers of Aβ rather than the fibrillar aggregates of the peptide [93, 94]. Using amidation at the C-terminal of β25–35 to reduce the fibrillogenic capacity of the peptide, we demonstrated that the neurotoxic activity was independent of the aggregation state of the peptide [95]. Accordingly, several other studies suggested that neuronal death better correlates with the presence of oligomer species rather than with aggregates of Aβ peptides [96, 97]. It is possible that senile plaques represent the reservoir of aggregated Aβ that can continuously release diffusible oligomers and protofibrils to induce injury not only in the surrounding cells, but also at a certain distance of the senile plaques. Other theories advance the notion that neuronal death is triggered by intracellular events that occur during Aβ PP processing. In this regard, a role of intraneuronal Aβ in Alzheimer’s disease pathogenesis is suggested by the study of triple transgenic mice (Aβ PP/PS1/Tau). In the hippocampus of these animals, the intraneuronal accumulation of Aβ is an early event that precedes plaque formation and correlates with synaptic dysfunction [98]. A good correlation between total content of Aβ1–42 (biochemically determined in specific cerebral regions) and cognitive decline was recently found in Alzheimer’s disease, in keeping with the closer association of intellectual deterioration with Aβ production rather than with the density of plaques [99]. 3.2 Prion Diseases
As mentioned earlier, the pathogenic mechanism of prion diseases is based on the conformational conversion of the cellular prion protein (PrPC ) into the disease-specific species (PrPSc ). The “protein only” hypothesis on the nature of the
3 The Neurotoxic Proteins
infectious agent (prion) indicates that it is essentially constituted of PrPSc [65] that is able to induce a “conformational transmission” via the host PrPC [100]. It has been postulated that other element(s) (called protein X) facilitate the conversion of PrPC to PrPSc [101]. Although the pathogenetic mechanisms are not fully clarified, the direct role of PrPSc in neuronal degeneration and glial activation is largely accepted. However, based on the evidence obtained in transgenic mice, Chiesa et al. [102] have demonstrated that the disease-associated, infectious form of the prion protein differs from the neurotoxic species. To evaluate the toxicity of PrPSc , in principle, one could directly apply the purified protein to neurons in culture. Although there have been several reports of these experiments [103], they are difficult to interpret because of uncertainties about the physical state of the PrPSc , since detergents that are required to maintain the protein in solution have to be removed prior to application to cell cultures. An alternative strategy has been to analyze the effect on cultured neurons of synthetic peptides derived from the PrP sequence [104]. The concept that misfolding of PrP causes a transmissible neurodegenerative disorder has prompted studies aimed at identifying polypeptide segments that are central to the conversion process. In particular, a synthetic peptide corresponding to the human PrP region 106–126 (PrP106–126) recapitulates several chemicophysical characteristics of PrPSc , including the propensity to form β-sheet-rich, insoluble and protease-resistant fibrils similar to those found in prion diseases [104, 105]. Similar to the studies with synthetic Aβ peptides, 10 years ago PrP106–126 was proposed as a model with which to investigate the biological effects of PrPSc [106]. This approach was successful with respect to the capacity of the synthetic peptide to mimic in vitro several aspects of the disease associated to the presence of PrPSc , including neurodegeneration, glial activation and alteration of membrane fluidity. Although PrP106–126 is not normally found in the brain of individuals with prion diseases, the various N- and C-terminal truncated fragments of PrP produced in the CNS in the course of sporadic and inherited prion diseases invariably contain the 106–126 sequence, suggesting that this region of the protein may possess the ability to trigger a fundamental pathogenic mechanism. The neurotoxic effect of PrP106–126 was abolished or reduced in neurons derived from PrP knock-out mice that are resistant to prion infectivity arguing for a role of the cellular prion protein in the neurotoxic cascade activated by PrP106–126 and underscoring the relevance of this model [107, 108]. Numerous other peptides homologous to PrP fragments were synthesized in wild-type and mutated forms, and their chemicophysical characteristics and biological effects investigated [109]. In some cases, the introduction of missense mutations associated to familial prion diseases (i.e. D178N) increased the fibrillogenic capacity of the PrP peptides [110]. The PrP106–126 peptide is highly fibrillogenic, but its in vitro toxicity was independent of this feature, while the ability to induce astrogliosis seems to be associated with the aggregation state of the peptide [110]. The amidation at the C-terminal combined with the acetylation at the N-terminal strongly reduced the fibrillogenic capacity of PrP106–126, but did not affect the toxicity of the peptide
17
18 Protein Aggregates in Neurodegenerative Disorders
[111]. These results are in agreement with the neuropathological evidence that PrPSc is deposited as amyloid fibrils only in particular conditions, as in GSS and vCJD, while in sporadic CJD PrPSc is assembled in non-fibrillar form, most likely as oligomers, strongly supporting the concept that soluble oligomers rather than mature amyloid fibrils are actually the pathogenetic species that causes neurodegeneration. It is noteworthy, in this regard, that an antibody recently developed and recognizing soluble oligomers of Aβ [112] prevented the toxicity induced by Aβ oligomers. Surprisingly, the antibody also recognized oligomer aggregates of SYN, islet amyloid polypeptide, polyglutamine, insulin and PrP peptides. As for Aβ, the antibody did not react with the monomeric form or the fibrillar version of these proteins, but only with the oligomers, indicating that the antibody recognizes a common structural epitope independent of the amino acid sequence. 3.3 Synucleinopathies
Epidemiological studies suggest that, in the etiology of synucleinopathies, environmental factors are more relevant than genetic ones. Nevertheless, recent data about the existence of rare familiar forms of Parkinson’s disease have been collected, and a pathogenetic role has been demonstrated for several genes, as SYN, parkin and DJ-1, whose mutations have been linked to familiar forms of Parkinson’s disease [113]. Two different missense mutations in the SYN gene (A30P and A53T) cause rare familial forms of Parkinson’s disease [77, 114] and recently a family with a new mutation (E46K) has been described [115]. Based on this evidence, SYN was identified as a major component of Lewy bodies [78]. On the other hand, a recessive form of juvenile Parkinson’s disease was associated with mutations in the gene encoding parkin [116]. Several pieces of evidence, most of which are based on the analysis of transfected cells, support the hypothesis that accumulation of SYN in Parkinson’s disease is a consequence of an alteration of the ubiquitin/proteasome system (UPS) [117, 118]. The self-aggregation capacity of SYN [119, 120] and the ubiquitin ligase activity attributed to parkin [121, 122] suggest that protein aggregation and dysfunction of the UPS might play a causal role in the development of sporadic and familial Parkinson’s disease. SYN is a highly conserved 140-amino-acid protein widespread in the CNS. It interacts with other cerebral proteins (14-3-3, synphilin-1, parkin, tyrosine hydroxylase, dopamine transporter) and it is involved in dopamine vesicle trafficking [123]. In aqueous solution, free SYN shows a natively unfolded conformation that favors protein self-aggregation and toxicity [124]. Experimental injuries reproducing possible Parkinson’s disease triggers (toxins, oxidative stress, proteasome impairment) increase SYN aggregation [125–127]. Like the other amyloidogenic proteins, SYN aggregation follows a multistep process, starting from SYN monomers that form oligomers (protofibrils) whose coalescence generates fibrils that, in turn, can aggregate in inclusion bodies. Recent data suggest that the protofibrils are the toxic intermediate, while the final inclusions may have a protective value [128]. Several in vitro studies have confirmed that SYN mutations
3 The Neurotoxic Proteins
predispose to protein aggregation and the overexpression of the mutated forms of the protein are toxic for the cell [129, 130]. According to this hypothesis, the accumulation of SYN in the Lewy bodies is a toxic event that eventually triggers neuronal death [131]. We found that the fragment corresponding to the residues 61–95 [also known as non-amyloid component of plaques (NAC)] was specifically neurotoxic for the dopaminergic cells and the pre-aggregation of the peptide amplified this effect [132]. Conversely, other studies demonstrated that SYN can exert neuroprotective activity against various cellular stresses like oxidative stress, serum deprivation and apoptosis [133, 134]. In our investigation, we have also noted that PC12 transfected with SYN were more resistant to oxidative stress. This evidence was derived mainly by in vitro overexpression of the wild-type protein whose protective role was reduced by the mutations A30P and A53T. Accordingly, the loss of SYN protective function, rather than a gain of toxic function, may determine the cell death in Parkinson’s disease. Nevertheless, the situation is not completely clear, as other authors have reported a protective role also for the mutated forms of the protein [134]. It is possible that the two apparent opposing actions of SYN are not mutually exclusive. SYN may be at first neuroprotective, but, if the protein self-aggregation becomes relevant, the disease occurs. In this regard, it was recently reported, using an in vitro model that the extracellular administration of wild-type SYN on the nanomolar scale is protective against oxidative stress, while the overexpression of the protein or its extracellular administration on the micromolar scale is toxic [135]. We have confirmed these data using a chimeric SYN associated with the TAT [136] sequence (TAT–SYN) to translocate the protein inside the cells. Mutated (A30P, A53T) and wild-type TAT–SYN sequences equally protected against oxidative stress induced by hydrogen peroxide in PC12 and the effect was mediated by the induction of HSP70 protein [137]. The protective effect by TAT–SYN in SK-NBE neuroblastoma cells was efficacious also against the dopamine-specific neurotoxin 6-hydroxydopamine (6OHDA). A third important point that has been intensively investigated for Parkinson’s disease is the so called “selective vulnerability”, i.e. why only a particular region of the brain (substantia nigra pars compacta) is affected even though SYN is widely expressed in brain and the factors that probably trigger the disease are not region specific. Recent studies have pointed out that a first important factor is dopamine synthesis and secretion that are peculiar features of the neurons of the affected areas [138]. In fact, dopamine is a molecule that is prone to oxidation, which may contribute to generate reactive molecules potentially able to damage cellular components like proteins or lipids. From this point of view several experimental data indicate that SYN is implicated in dopamine metabolism and trafficking [139, 140] suggesting a relationship between protein function, dopamine homeostasis, oxidative stress and neuronal damage. The development of animal models has provided further insight about SYN pathogenetic mechanisms. Transgenic mice overexpressing human SYN (wildtype or mutated) showed intraneuronal inclusions, but the associated phenotype was quite heterogeneous as for the neuronal populations involved and no model recapitulated completely the human pathology. For instance, transgenic mice
19
20 Protein Aggregates in Neurodegenerative Disorders
expressing human SYN gene carrying the A53T mutation under Thy-1 promoter control [141] developed a progressive motor impairment starting from 40 days post-birth. Many telencephalic and brainstem neurons and the neurons of the spinal cord show a strong immunoreactivity to SYN antibodies in cellular bodies and dendrites; this picture differs from the axonal and presynaptic distribution of endogenous SYN in control mice. Brainstem neurons and motor neurons seem particularly vulnerable: motor neurons alterations comprise axonal damage and neuromuscular junction denervation, suggesting that an increased SYN expression may interfere with synaptic trophic mechanisms. Otherwise, transgenic mice with an elevated human SYN expression under platelet-derived growth factor promoter control showed a dopaminergic neuronal loss, and SYN- and ubiquitin-positive inclusions formation in different cerebral areas [142]. 4 Conclusions
Although the presence of aggregates made up of misfolded proteins is a common feature of many neurodegenerative disorders, the biological processes responsible for this phenomenon are variable and its significance in the pathogenesis of the diseases is different. As illustrated in this chapter, the neuropathological manifestations may be extremely heterogeneous even within disorders characterized by aggregates made up by the same protein. In some conditions, like Alzheimer’s disease, prion diseases and tauopathies, the evidence supporting a pathogenic role of the aggregates is quite compelling, although the molecular mechanisms triggering the protein accumulation remain to be established. In the synucleinopathies, the responsibility of the pathologic inclusions in determining the degenerative process is more uncertain. In ALS and diseases with CAG repeat expansions the role played by the aggregates is still elusive, although the number of studies focused on their potential pathogenic role is growing. Finally, the consensus around the neurotoxic role of oligomer species rather than fibrils themselves is vast and propose new therapeutic approaches at these diseases. Acknowledgment
Supported by the Italian Ministry of Health, Department of Social Services (PS03–4 and PS-03–10) and by the European Community (LSHM-CT-2004–503039 and Neuroprion FOOD-CT-2004–506579). References 1 TAYLOR, J. P., HARDY, J. and FISCHBECK, J. H. Toxic Proteins in neurodegenerative disease. Science 2002, 296, 1991–1995. 2 SOTO, C. Unfolding the role of protein misfolding in neurodegenerative
diseases. Nat Rev Neurosci 2003, 4, 49–60. 3 STATHOPULOS, P. B., RUMFELDT, J. A., SCHOLZ, G. A., IRANI, R. A., FREY, H. E., HALLEWELL, R. A., LEPOCK, J. R. and
References
4
5
6
7
8
9
10
11
12
MEIERING, E. M. Cu/Zn superoxide dismutase mutants associated with amyotrophic lateral sclerosis show enhanced formation of aggregates in vitro. Proc Natl Acad Sci USA 2003, 100, 7021–7026. BENDOTTI, C. and CARRI, M. T. Lessons from models of SOD1-linked familial ALS. Trends Mol Med 2004, 10, 393–400. van WELSEM, M. E., HOGENHUIS, J. A., MEININGER, V., METSAARS, W. P., HAUW, J. J. and SEILHEAN, D. The relationship between Bunina bodies, skein-like inclusions and neuronal loss in amyotrophic lateral sclerosis. Acta Neuropathol 2002, 10, 583–589 PIAO, Y. S., et al. Neuropathology with clinical correlations of sporadic amyo-trophic lateral sclerosis: 102 autopsy cases examined between 1962 and 2000. Brain Pathol 2003, 13, 10–22. BUSCH, A., ENGEMANN, S., LURZ, R., OKAZAWA, H., LEHRACH, H. and WANKER, E. E. Mutant huntingtin promotes the fibrillogenesis of wild-type huntingtin: a potential mechanism for loss of huntingtin function in Huntington’s disease. J Biol Chem 2003, 278, 41452–41461. SCHERZINGER, E., et al. Self-assembly of polyglutamine-containing huntingtin fragments into amyloid-like fibrils: implications for Huntington’s disease pathology. Proc Natl Acad Sci USA 1999, 96, 4604–4609. LEE, H. G., PETERSEN, R. B., ZHU, X., HONDA, K., ALIEV, G., SMITH, M. A. and PERRY, G. Will preventing protein aggregates live up to its promise as prophylaxis against neurodegenerative diseases? Brain Pathol 2003, 13, 630–638. MICHALIK, A. and Van BROECKHOVEN, C. Pathogenesis of polyglutamine disorders: aggregation revisited. Hum Mol Genet 2003, 12, R173–R186. PRICE, J. L., DAVIS, P. B., MORRIS, J. C. and WHITE, D. L. The distribution of tangles, plaques, and related immunohistochemical markers in healthy aging and Alzheimer’s disease. Neurobiol Aging 1991, 12, 295–312. BRAAK, H. and BRAAK, E. Neuropathological stageing of
13
14
15
16
17
18
19
20
21
22
Alzheimer-related changes. Acta Neuropathol 1991, 82, 239–259. DELACOURTE, A., et al. The biochemical pathway of neurofibrillary degeneration in aging and Alzheimer’s disease. Neurology 1999, 52, 1158–1165. GLENNER, G. G. and WONG C. W. Alzheimer’s disease: initial report of the purification and characterization of a novel cerebrovascular amyloid protein. Biochem Biophys Res Commun 1984, 120, 885–890. MASTERS, C. L., SIMMS, G., WEINMAN, N. A., MULTHAUP, G., MCDONALD, B. L. and BEYREUTHER, K. Amyloid plaque core protein in Alzheimer disease and Down syndrome. Proc Natl Acad Sci USA 1985, 82, 4245–4249. KANG, J., et al. The precursor of Alzheimer’s disease amyloid A4 protein resembles a cell-surface receptor. Nature 1987, 325, 733–736. MUCHOWSKI, P. J. and WACKER, J. L. Modulation of neurodegeneration by molecular chaperones. Nat Rev Neurosci 2005, 6, 11–22. TAGLIAVINI, F., GIACCONE, G., FRANGIONE, B. and BUGIANI, O. Preamyloid deposits in the cerebral cortex of patients with Alzheimer’s disease and nondemented individuals. Neurosci Lett 1988, 93, 191–196. YAMAGUCHI, H., HIRAI, S., MORIMATSU, M., SHOJI, M., and IHARA, Y. Diffuse type of senile plaques in the brains of Alzheimer-type dementia. Acta Neuropathol 1988, 77, 113–119. VERGA, L., FRANGIONE, B., TAGLIAVINI, F., GIACCONE, G., MIGHELI, A. and BUGIANI, O. Alzheimer patients and Down patients: cerebral preamyloid deposits differ ultrastructurally and histochemically from the amyloid of senile plaques. Neurosci Lett 1989, 105, 294–299. BUGIANI, O., TAGLIAVINI, F. and GIACCONE, G. Preamyloid deposits, amyloid deposits, and senile plaques in Alzheimer’s disease, Down syndrome, and aging. Ann NY Acad Sci 1991, 640, 122–128. GIACCONE, G., TAGLIAVINI, F., LINOLI, G., BOURAS, C., FRIGERIO, L., FRANGIONE, B.
21
22 Protein Aggregates in Neurodegenerative Disorders
23
24
25
26
27
28
29
30
31
and BUGIANI, O. Down patients: extracellular preamyloid deposits precede neuritic degeneration and senile plaques. Neurosci Lett 1989, 97, 232–238. MANN, D. and ESIRI, M. The pattern of acquisition of plaques and tangles in the brains of patients under 50 years of age with Down’s syndrome. J Neurol Sci 1989., 89, 169–179. BUGIANI, O., GIACCONE, G., FRANGIONE, B., GHETTI, B. and TAGLIAVINI, F. Alzheimer patients: preamyloid deposits are more widely distributed than senile plaques throughout the central nervous system. Neurosci Lett 1989, 103, 263–268. JOACHIM, C. L., MORRIS, J. H. and SELKOE, D. J. Diffuse senile plaques occur commonly in the cerebellum in Alzheimer’s disease. Am J Pathol 1989, 135, 309–319. IWATSUBO, T., ODAKA, A., SUZUKI, N., MIZUSAWA, H., NUKINA, N. and IHARA, Y. Visualization of Aβ42(43) and AyS40 in senile plaques with end-specific A monoclonals: evidence that an initially deposited species is Aβ42(43). Neuron 1994, 13, 45–53. FUKUMOTO, H., ASAMI-ODAKA, A., SUZUKI, N., SHIMADA, H., IHARA, Y. and IWATSUBO, T. Amyloid beta-protein deposition in normal aging has the same characteristics as that in Alzheimer’s: predominance of AyS42(43) and association of AyS40 with cored plaques. Am J Pathol 1996, 148, 259–265. KIDD, M. Paired helical filaments in electron microscopy of Alzheimer’s disease. Nature 1963, 197, 192–194. CROWTHER, R. A. and WISCHIK, C. M.. Image reconstruction of the Alzheimer paired helical filament. EMBO J 1985, 4, 3661–3665. GOEDERT, M., WISCHIK, C. M.. CROWTHER, R. A., WALKER, J. E. and KLUG, A. Cloning and sequencing of the cDNA encoding a core protein of the paired helical filament of Alzheimer disease: identification as the microtubule associated protein tau. Proc Natl Acad Sci USA 1988, 85, 4051–4055. NEVE, R. L., HARRIS, P., KOSIK, K. S., KURNIT, D. M. and DONLON, T. A. Identification of cDNA clones for the
32
33
34
35
36
37
38
39
human microtubule-associated protein tau and chromosomal localization of the genes for tau and microtubule-associated protein 2. Brain Res 1986, 387, 271–280. ANDREADIS, A., BROWN, W. M. and KOSIK, K. S. Structure and novel exons of the human tau gene. Biochemistry 1992, 31, 10626–10633. WEINGARTEN, M. D., LOCKWOOD, A. H., HWO, S. Y. and KIRSCHNER, M. W. A protein factor essential for microtubule assembly. Proc Natl Acad Sci USA 1975, 72, 1858–1862. CLEVELAND, D. W., HWO, S. Y. and KIRSCHNER, M. W. Purification of tau, a microtubule-associated protein that induces assembly of microtubules from purified tubulin. J Mol Biol 1977, 116, 207–225. DRECHSEL, D. N., HYMAN, A. A., COBB, M. H. and KIRSCHNER, M. W. Modulation of the dynamic instability of tubulin assembly by the microtubule-associated protein tau. Mol Biol Cell 1992, 3, 1141–1154. BIERNAT, J., GUSTKE, N., DREWES, G., MANDELKOW, E. M. and MANDELKOW, E. Phosphorylation of Ser262 strongly reduces binding of tau to microtubules: distinction between PHF-like immunoreactivity and microtubule binding. Neuron 1993, 11, 153–163. BRAMBLETT, G. T., GOEDERT, M., JAKES, R., MERRICK, S. E., TROJANOWSKI, J. Q. and LEE, V. M. Abnormal tau phosphorylation at Ser396 in Alzheimer’s disease recapitulates development and contributes to reduced microtubule binding. Neuron 1993, 10, 1089–1099. YOSHIDA, H. and IHARA, Y. Tau in paired helical filaments is functionally distinct from fetal tau: assembly incompetence of paired helical filament-tau. J Neurochem 1993, 61, 1183–1186. BRAAK, H., BRAAK, E., GRUNDKE-IQBAL, I. and IQBAL, K. Occurrence of neuropil threads in the senile human brain and in Alzheimer’s disease: a third location of paired helical filaments outside of neurofibrillary tangles and neuritic plaques. Neurosci Lett 1986, 65, 351–355.
References 40 LEE, V. M. Y., GOEDERT, M. and TROJANOWSKI, J. Q. Neurodegenerative Tauopathies. Annu Rev Neurosci 2001, 24, 1121–1159. 41 BUE´ E, L., BUSSIE´ RE, T., BUE´ E-SCHERRER, V., DELACOURTE, A. and HOF, P. R. Tau protein isoforms, phosphorylation and role in neurodegenerative disorders. Brain Res Rev 2000, 33, 95–130. 42 HARDY, J. A. and HIGGINS, G. A. Alzheimer’s disease: the amyloid cascade hypothesis. Science 1992, 256, 184–185. 43 SELKOE, D. Alzheimer’s disease: genes, proteins, and therapy. Physiol Rev 2001, 81, 741–766. 44 FOSTER, N. L., WILHELMSEN, K., SIMA, A. A., JONES, M. Z., D’AMATO, C. J. and GILMAN, S. Frontotemporal dementia and parkinsonism linked to chromosome 17: a consensus conference. Ann Neurol 1997, 41, 706–715. 45 SPILLANTINI, M. G., GOEDERT, M., CROWTHER, R. A. MURRELL, J. R., FARLOW, M. R. and GHETTI, B. Familial multiple system tauopathy with presenile dementia: a disease with abundant neuronal and glial tau filaments. Proc Natl Acad Sci USA 1997, 94, 4113–4118. 46 HUTTON, M., et al. Association of missense and 5 -splice-site mutations in tau with the inherited dementia FTDP-17. Nature 1998, 393, 702–705. 47 POORKAJ, P., et al. Tau is a candidate gene for chromosome 17frontotemporal dementia. Ann Neurol 1998, 43, 815–825. 48 WILHELMSEN, K. C., LYNCH, T., PAVLOU, E., HIGGINS, M. and NYGAARD, T. G. Localization of disinhibition-dementiaparkinsonism-amyotrophic complex to 17q21–22. Am J Hum Genet 1994, 55, 1159–1165. 49 GHETTI, B., HUTTON, M. L. and WSZOLEK, Z. Frontotemporal dementia and parkinsonism linked to chromosome 17 associated with tau gene mutations (FTDP-17T). In Neurodegeneration: The Molecular Pathology of Dementia and Movement Disorders, D. DICKSON (ed.). ISN Neuropath Press, Basel, 2003.
50 STEELE, J. C., RICHARDSON, J. C. and OLSZEWSKI, J. Progressive supranuclear palsy. Arch Neurol 1964, 10, 333–359. 51 HAUW, J. J., VERNY, M., DELAERE, P., CERVERA, P., HE, Y. and DUYCKAERTS, C. Constant neurofibrillary changes in the neucortex in progressive supranuclear palsy. Basic differences with Alzheimer’s disease and aging. Neurosci Lett 1990, 119, 182–186. 52 TELLEZ-NAGEL, I. and WISNIEWSKI, H. M. Ultrastructure of neurofibrillary tangles in Steele-Richardson-Olszewski syndrome. Arch Neurol 1973, 29, 324–327. 53 REBEIZ, J. J., KOLODNY, E. H. and RICHARDSON, E. P., Jr. Corticodentatonigral degeneration with neuronal achromasia. Arch Neurol 1968, 18, 20–33. 54 FEANY, M. B. and DICKSON, D. W. Widespread cytoskeletal pathology characterizes corticobasal degeneration. Am J Pathol 1995, 146, 1388–1396. 55 KSIEZAK-REDING, H., MORGAN, K., MATTIACE, LA., DAVIES, P., LIU, W. K., YEN, S. H., WEIDENHEIM, K. and DICKSON, D. W. Ultrastructure and biochemical composition of paired helical filaments in corticobasal degeneration. Am J Pathol 1994, 145, 1496–1508. 56 CONSTANTINIDIS, J., RICHARD, J. and TISSOT, R. Picks disease. Histological and clinical correlations. Eur Neurol 1974, 11, 208–217. 57 FEANY, M. B., MATTIACE, LA. and DICKSON, D. W. Neuropathologic overlap of progressive supranuclear palsy, Picks disease and corticobasal degeneration. J Neuropathol Exp Neurol 1996, 55, 53–67. 58 HONG, M., et al. Mutation-specific functional impairments in distinct tau isoforms of hereditary FTDP-17. Science 1998., 282, 1914–1917. 59 HASEGAWA, M., SMITH, M. J. and GOEDERT, M. Tau proteins with FTDP-17 mutations have a reduced ability to promote microtubule assembly. FEBS Lett 1998, 437, 207–210. 60 ARRASATE, M., PEREZ, M., ARMAS-PORTELA, R. and AVILA, J. Polymerization of tau peptides into fibrillary structures. The effect of
23
24 Protein Aggregates in Neurodegenerative Disorders
61
62
63
64
65
66 67
68
69
70
FTDP-17 mutations. FEBS Lett 1999., 446, 199–202. NACHARAJU, P., LEWIS, J., EASSON, C., YEN, S., HACKETT, J., HUTTON, M. and YEN, S. H. Accelerated filaments formation from tau protein with specific FTDP-17 mis-sense mutations. FEBS Lett 1999, 447, 195–199. D’SOUZA, I., POORKAJ, P., HONG, M., NOCHLIN, D., LEE, V. M., BIRD, T. D. and SCHELLENBERG, G. D. Missense and silent tau gene mutations cause frontotemporal dementia with parkinsonism-chromosome 17 type, by affecting multiple alternative RNA splicing regulatory elements. Proc Natl Acad Sci USA, 1999, 96, 5598–5603. GROVER, J., et al. 5 splice site mutations in tau associated with the inherited dementia FTDP-17 affect a stem-loop structure that regulates alternative splicing of exon 10. J Biol Chem 1999, 274, 15134–15143. JIANG, Z., COTE, J., KWON, J. M., GOATE, A. M. and WU, J. Y. Aberrant splicing of tau pre-RNA caused by intronic mutations associated with the inherited frontotemporal dementia with parkinsonism linked to chromosome 17. Mol Cell Biol 2000., 20, 4036–4048. PRUSINER, S. B. Novel proteinaceous infectious particles cause scrapie. Science, 1982, 216, 136–144. PRUSINER, S. B. Prions. Proc Natl Acad Sci USA 1998, 95, 13363–13383. WILL, R. G., et al. A new variant of Creutzfeldt-Jakob disease in the UK. Lancet 1996, 347, 921–925. BUDKA, H., et al. Neuropathological diagnostic criteria for Creutzfeldt-Jakob disease (CJD) and other human spongiform encephalopathies (prion diseases). Brain Pathol 1995, 5, 459–466. KRETZSCHMAR, H. A., IRONSIDE, J. W., DEARMOND, S. J. and TATEISHI, J. Diagnostic criteria for sporadic Creutzfeldt-Jakob disease. Arch Neurol 1996, 53, 913–920. GIACCONE, G., et al. Creutzfeldt-Jakob disease: Carnoy’s fixative improves the immunohistochemistry of the proteinase-K resistant prion protein. Brain Pathol 2000, 10, 31–37.
71 BUDKA, H., HEAD, M. W., IRONSIDE, J. W., GAMBETTI, P., PARCHI, P., ZEIDLER, M. and TAGLIAVINI, F. Sporadic Creutzfeldt-Jakob disease. In Neurodegeneration: The Molecular Pathology of Dementia and Movement Disorders, D. DICKSON (ed.). ISN Neuropath Press, Basel, 2003. 72 GHETTI, B., DLOUHY, S. R., GIACCONE, G., BUGIANI, O., FRANGIONE, B., FARLOW, M. and TAGLIAVINI, F. Gerstmann-Str¨aussler-Scheinker disease and the Indiana kindred. Brain Pathol 1995, 5, 61–75. 73 PARCHI, P., et al. Classification of sporadic Creutzfeldt-Jakob disease based on molecular and phenotypic analysis of 300 subjects. Ann Neurol 1999, 46, 224–233. 74 PUOTI, G., GIACCONE, G., ROSSI, G., CANCIANI, B., BUGIANI, O. and TAGLIAVINI, F. Sporadic Creutzfeldt-Jakob disease: cooccurrence of different types of PrPSc in the same brain. Neurology 1999, 53, 2173–2176. 75 BUELER, H., AGUZZI, A., SAILER, A., GREINER, R. A., AUTENRIED, P., AGUET, M. and WEISSMANN C. Mice devoid of PrP are resistant to scrapie. Cell 1993, 73, 1339–1347. 76 FORNO, L. S. Neuropathology of Parkinson’s disease. J Neuropathol Exp Neurol 1996, 55, 259–272. 77 POLYMEROPOULOS, M. H., et al. Mutation in the α-synuclein gene identified in families with Parkinson’s disease. Science 1997, 276, 2045–2077. 78 SPILLANTINI, M. G., SCHMIDT, M. L., LEE, V. M., TROJANOWSKI, J. Q., JAKES, R. and GOEDERT, M. α-synuclein in Lewy bodies. Nature 1997, 388, 839–840. 79 KAZUHARA, S., MORI, H., IZUMIYAMA, N., YOSHIMURA, M. and IHARA, Y. Lewy bodies are ubiquitinated. A light and electron microscopic immunohistochemical study. Acta Neuropathol 1988, 75, 345–353. 80 SCHLOSSMACHER, M. G., et al. Parkin localizes to the Lewy bodies of Parkinson disease and dementia with Lewy bodies. Am J Pathol 2002, 160, 1655–1667. 81 PAPP, M. I., KAHN, J. E. and LANTOS, P. L. Glial cytoplasmic inclusions in the CNS
References
82
83
84
85
86
87
88
89
90
of patients with multiple system atrophy (striatonigral degeneration, olivopontocerebellar atrophy and Shy-Drager syndrome). J Neurol Sci 1989, 94, 79–100. ARIMA, K., et al. NACP/-synuclein immunoreactivity in fibrillary components of neuronal and oligodendroglial cytoplasmic inclusion in the pontine nuclei in multiple system atrophy. Acta Neuropathol 1998, 96, 439–444. FORLONI, G., CHIESA, R., SMIROLDO, S., VERGA, L., SALMONA, M., TAGLIAVINI, F. and ANGERETTI, N. Apoptosis mediated neurotoxicity induced by chronic application of beta amyloid fragment 25–35. Neuroreport 1993, 4, 523–526. LOO, D. T., COPANI, A., PIKE, C. J., WHITTEMORE, E. R., WALENCEWICZ, A. J. and COTMAN, C. W. Apoptosis is induced by beta-amyloid in cultured central nervous system neurons. Proc Natl Acad Sci USA 1993, 90, 7951–7955. PIKE, C. J., WALENCEWICZ, A. J., GLABE, C. G. and COTMAN, C. W. In vitro aging of beta-amyloid protein causes peptide aggregation and neurotoxicity. Brain Res 1991, 563, 311–314. LUO, Y., et al. Mice deficient in BACE1, the Alzheimer’s beta-secretase, have normal phenotype and abolished beta-amyloid generation. Nat Neurosci 2001, 4, 231–232. DEWACHTER, I. and Van LEUVEN, F. Secretases as targets for the treatment of Alzheimer’s disease: the prospects. Lancet Neurol 2002, 1, 409–416. SELKOE, D. J. The cell biology of beta-amyloid precursor protein and presenilin in Alzheimer’s disease. Trends Cell Biol 1998, 8, 447–453. CAUGHEY, B. and LANSBURY, P. T. Protofibrils, pores, fibrils and neurodegeneration: separating the responsible protein aggregates from the innocent bystanders. Annu Rev Neurosci 2003, 26, 267–298. FORLONI, G., et al. Protein misfolding in Alzheimer’s and Parkinson’s disease: genetics and molecular mechanisms. Neurobiol Aging 2002, 23, 957–976.
91 ARRIAGADA, P. V., GROWDON, J. H., HEDLEY-WHYTE, E. T. and HYMAN, B. T. Neurofibrillary tangles but not senile plaques parallel duration and severity of Alzheimer’s disease. Neurology 1992, 42, 631–639. 92 BENNETT, D. A., COCHRAN, E. J., SAPER, C. B., LEVERENZ, J. B., GILLEY, D. W. and WILSON, R. S. Pathological changes in frontal cortex from biopsy to autopsy in Alzheimer’s disease. Neurobiol Aging 1993, 14, 589–596. 93 KLEIN, W. L., KRAFFT, G. A. and FINCH, C. E. Targeting small Abeta oligomers: the solution to an Alzheimer’s disease conundrum? Trends Neurosci 2001, 24, 219–224. 94 HARDY, J. and SELKOE, D. J. The amyloid hypothesis of Alzheimer’s disease: progress and problems on the road to therapeutics. Science 2002, 297, 353–356. 95 FORLONI, G., LUCCA, E., ANGERETTI, N., DELLA TORRE, P. and SALMONA, M. Amidation of beta-amyloid peptide strongly reduced the amyloidogenic activity without alteration of the neurotoxicity. J Neurochem 1997, 69, 2048–2054. 96 HARTLEY, D. M., WALSH, D. M., YE, C. P., DIEHL, T., VASQUEZ, S., VASSILEV, P. M., TEPLOW, D. B. and SELKOE, D. J. Protofibrillar intermediates of amyloid beta-protein induce acute electrophysiological changes and progressive neurotoxicity in cortical neurons. J Neurosci 1999, 19, 8876–8884. 97 CLEARY, J. P., WALSH, D. M., HOFMEISTER, J. J., SHANKAR, G. M., KUSKOWSKI, M. A., SELKOE, D. J. and ASHE, K. H. Natural oligomers of the amyloid-beta protein specifically disrupt cognitive function. Nat Neurosci 2005, 8, 79–84. 98 ODDO, S., et al. Triple-transgenic model of Alzheimer’s disease with plaques and tangles: intracellular Abeta and synaptic dysfunction. Neuron 2003, 39, 409–421. 99 NASLUND, J., HAROUTUNIAN, V., MOHS, R., DAVIS, K. L., DAVIES, P., GREENGARD, P. and BUXBAUM, J. D. Correlation between elevated levels of amyloid beta-peptide in the brain and cognitive decline. J Am Med Ass 2000, 283, 1571–1577.
25
26 Protein Aggregates in Neurodegenerative Disorders
100 COHEN, F. E. and PRUSINER, S. B. Pathologic conformations of prion proteins. Annu Rev Biochem 1998, 67, 793–819. 101 SOTO, C. and SABORIO, G. P. Prions: disease propagation and disease therapy by conformational transmission. Trends Mol Med 2001 7, 109–114. 102 CHIESA, R., PICCARDO, P., QUAGLIO, E., DRISALDI, B., SI-HOE, S. L., TAKAO, M., GHETTI, B. and HARRIS, D. A. Molecular distinction between pathogenic and infectious properties of the prion protein. J Virol 2003, 77, 7611–7622. 103 HETZ, C., RUSSELAKIS-CARNEIRO, M., MAUNDRELL, K., CASTILLA, J. and SOTO, C. Caspase-12 and endoplasmic reticulum stress mediate neurotoxicity of pathological prion protein. EMBO J 2003, 22, 5435–5445. 104 FORLONI, G., ANGERETTI, N., CHIESA, R., MONZANI, E., SALMONA, M., BUGIANI, O. and TAGLIAVINI, F. Neurotoxicity of a prion protein fragment. Nature 1993, 362, 543–546. 105 TAGLIAVINI, F., et al. Synthetic peptides homologous to prion protein residues 106–147 form amyloid-like fibrils in vitro. Proc Natl Acad Sci USA 1993, 90, 9678–9682. 106 FORLONI, G., TAGLIAVINI, F., BUGIANI, O. and SALMONA, M. Amyloid in Alzheimer’s disease and prion-related encephalopathies: studies with synthetic peptides. Prog Neurobiol 1996, 49, 287–315. 107 BROWN, D. R., HERMS, J. and KRETZSCHMAR, H. A. Mouse cortical cells lacking cellular PrP survive in culture with a neurotoxic PrP fragment Neuroreport 1994, 27, 2057–2060. 108 FIORITI, L., QUAGLIO, E., HARRIS, D., FORLONI, G. and CHIESA, R. The neurotoxicity of prion protein (PrP) peptide 106–126 is independent of the expression level of PrP and is not mediated by abnormal PrP species. Cell Mol Neurosci 2005, 28, 165–176. 109 TAGLIAVINI, F., FORLONI, G., D’URSI, P., BUGIANI, O. and SALMONA, M. Studies on peptide fragments of prion. Adv Protein Chem 2001, 57, 171–201.
110 FORLONI, G., ANGERETTI, N., MALESANI, P., PERESSINI, E., RODRIGUEZ MARTIN, T., DELLA TORRE, P. and SALMONA, M. Influence of mutations associated with familial prion-related encephalopathies on biological effects of PrP peptides. Ann Neurol 1999, 45, 489–494. 111 FORLONI, F., TAGLIAVINI, F., BUGIANI, O. and SALMONA, M. Apoptosis-mediated neurotoxicity induced by beta-amyloid and PrP fragments. Mol Chem Neuropathol 1996, 28, 163–171. 112 KAYED, R., HEAD, E., THOMPSON, J. L., MCINTIRE, T. M., MILTON, S. C., COTMAN, C. W. and GLABE, C. G. Common structure of soluble amyloid oligomers implies common mechanism of pathogenesis. Science 2003, 300, 486–489. 113 BONIFATI, V., OOSTRA, B. A. and HEUTINK, P. Unraveling the pathogenesis of Parkinson’s disease – the contribution of monogenic forms. Cell Mol Life Sci 2004, 61, 1729–1750. 114 KRUGER, R., et al. Ala30Pro mutation in the gene encoding alpha-synuclein in Parkinson’s disease. Nat Genet 1998, 18, 106–108. 115 ZARRANZ, J. J., et al. The new mutation, E46K, of alpha-synuclein causes Parkinson and Lewy body dementia. Ann Neurol 2004, 55, 164–173. 116 KITADA, T., et al. Mutations in the parkin gene cause autosomal recessive juvenile parkinsonism. Nature 1998, 392, 605–608. 117 BIASINI, E., FIORITI, L., CEGLIA, I., INVERNIZZI, R., BERTOLI, A., CHIESA, R. and FORLONI, G. Proteasome inhibition and aggregation in Parkinson’s disease: a comparative study in untransfected and transfected cells J Neurochem 2004, 88, 545–553. 118 PETRUCELLI, L. and DAWSON, T. M. Mechanism of neurodegenerative disease: role of the ubiquitin proteasome system. Ann Med 2004, 36, 315–320. 119 EL-AGNAF, O. M. and IRVINE, G. B. Aggregation and neurotoxicity of alpha-synu-clein and related peptides. Biochem Soc Trans 2000, 30, 559–565. 120 WOOD, S. J., WYPYCH, J., STEAVENSON, S., LOUIS, J. C., CITRON, M. and BIERE, A. L.
References
121
122
123
124
125
126
127
128
Synuclein fibrillogenesis is nucleation-dependent. Implications for the pathogenesis of Parkinson’s disease. J Biol Chem 1999, 274, 19509–19512. SHIMURA, H., et al. Familial Parkinson disease gene product, parkin, is a ubiquitin-protein ligase. Nat Genet 2000, 25, 302–305. YAMAMOTO, A., FRIEDLEIN, A., IMAI, Y., TAKAHASHI, R., KAHLE, P. J. and HAASS, C. Parkin phosphorylation and modulation of its E3 ubiquitin ligase activity. J Biol Chem 2005, 280, 3390–3399. WERSINGER, C., PROU, D., VERNIER, P., NIZNIK, H. B. and SIDHU, A. Mutations in the lipid-binding domain of alpha-synuclein confer overlapping, yet distinct, functional properties in the regulation of dopamine transporter activity. Mol Cell Neurosci 2003, 24, 91–103. MUNISHKINA, L. A., HENRIQUES, J., UVERSKY, V. N. and FINK, A. L. Role of protein-water interactions and electrostatics in alpha-synuclein fibril formation. Biochemistry 2004, 43, 3289–3300. HASHIMOTO, M., HSU, L. J., XIA, Y., TAKEDA, A., SISK, A., SUNDSMO, M. and MASLIAH, E. Oxidative stress induces amyloid-like aggregate formation of NACP/alpha-synuclein in vitro. Neurore-port 1999, 10, 717–721. NORRIS, E. H., GIASSON, B. I., ISCHIROPOULOS, H. and LEE, V. M. Effects of oxidative and nitrative challenges on alpha-synuclein fibrillogenesis involve distinct mechanisms of protein modifications. J Biol Chem 2003, 278, 27230–27240. SHERER, T. B., KIM, J. H., BETARBET, R. and GREENAMYRE, J. T. Subcutaneous rotenone exposure causes highly selective dopaminergic degeneration and alphasynuclein aggregation. Exp Neurol 2003, 179, 9–16. CONWAY, K. A., LEE, S. J., ROCHET, J. C., DING, T. T., WILLIAMSON, R. E. and LANSBURY, P. T., Jr. Acceleration of oligomerization, not fibrillization, is a shared property of both alpha-synuclein
129
130
131
132
133
134
135
136
137
mutations linked to early-onset Parkinson’s disease: implications for pathogenesis and therapy. Proc Natl Acad Sci USA 2000, 97, 571–576. CONWAY, K. A., HARPER, J. D. and LANSBURY, P. T. Accelerated in vitro fibril formation by a mutant alpha-synuclein linked to early-onset Parkinson disease. Nat Med 1998, 4, 1318–1320. GREENBAUM, E. A., GRAVES, C. L., MISHIZEN-EBERZ, A. J., LUPOLI, M. A., LYNCH, D. R., ENGLANDER, S. W., AXELSEN, P. H. and GIASSON, B. I. The E46K mutation in alpha-synuclein increases amyloid fibril formation. J Biol Chem 2005, Jan 4 [Epub ahead of print] VOLLES, M. J. and LANSBURY, P. T., Jr. Zeroing in on the pathogenic form of alpha-synuclein and its mechanism of neurotoxicity in Parkinson’s disease. Biochemistry 2003, 42, 7871–7878. FORLONI, G., BERTANI, I., CALELLA, A. M., THALER, F. and INVERNIZZI, R. Alpha-synuclein and Parkinson’s disease: selective neurodegenerative effect of alpha-synuclein fragment on dopaminergic neurons in vitro and in vivo. Ann Neurol 2000, 47, 632–640. MANNING-BOG, A. B., MCCORMACK, A. L., PURISAI, M. G., BOLIN, L. M. and DI MONTE, D. A. Alpha-synuclein overexpression protects against paraquat-induced neurodegeneration. J Neurosci 2002, 23, 3095–3099. ANCOLIO, K., ALVES DA COSTA, C., UEDA, K. and CHECLER, F. Alpha-synuclein and the Parkinson’s disease-related mutant Ala53Thr-alpha-synuclein do not undergo proteasomal degradation in HEK293 and neuronal cells. Neurosci Lett 2000, 285, 79–82. SEO, J. H., et al. Alpha-synuclein regulates neuronal survival via Bcl-2family expression and PI3/Akt kinase pathway. FASEB J 2002, 16, 1826–1828 BECKER-HAPAK, M., MCALLISTER, S. S. and DOWDY, S. F. TAT-mediated protein transduction into mammalian cells. Methods 2001, 24, 247–256. ALBANI, D., PEVERELLI, E., RAMETTA, R., BATELLI, S., VESCHINI, L., NEGRO, A. and
27
28 Protein Aggregates in Neurodegenerative Disorders
FORLONI, G. Protective effect of TAT-delivered alphasynuclein: relevance of the C-terminal domain and involvement of HSP70. FASEB J 2004, 18, 1713–1715. 138 XU, J., KAO, S. Y., LEE, F. J., SONG, W., JIN, L. W. and YANKNER, B. A. Dopamine-dependent neurotoxicity of alphasynuclein: a mechanism for selective neurodegeneration in Parkinson disease. Nat Med 2002, 8, 600–606. 139 LEE, F. J., LIU, F., PRISTUPA, Z. B. and NIZNIK, H. B. Direct binding and functional coupling of alpha-synuclein to the dopamine transporters accelerate
dopamine-induced apoptosis. FASEB J 2001, 15, 916–922. 140 PEREZ, R. G., WAYMIRE, J. C., LIN, E., LIU, J. J., GUO, F. and ZIGMOND, M. J. A role for alpha-synuclein in the regulation of dopamine biosynthesis. J Neurosci 2002, 22, 3090–3099. 141 van der PUTTEN, H., et al. Neuropathology in mice expressing human alpha-synuclein. J Neurosci 2000, 20, 6021–6029. 142 MASLIAH, E., et al. Dopaminergic loss and inclusion body formation in alpha-synuclein mice: implications for neuro-degenerative disorders. Science 2000, 287, 1265–1269.
1
Cellular Toxicity of Protein Aggregates Bruce Kagan
University of California, Los Angeles, USA
Originally published in: Amyloid Proteins. Edited by Jean D. Sipe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3–527–31072-X
1 Introduction
“Amyloid” was first described by Virchow in the 19th century as an amorphous starch-like deposit in tissues that stained with iodine, indicating the presence of carbohydrate. Investigators in the 20th century concluded that amyloid deposits were largely proteinaceous and exhibited a characteristic staining with the dye Congo red. It also became clear that many different proteins could form amyloid deposits and that these deposits were composed of fibrils with a characteristic structure (for review [1]. Fibrils are 80–100 Å in width and often of very extended length. At least 21 known proteins and peptides have been reported to deposit as amyloid (Table 1) and are associated with a wide variety of human illnesses [2]. Although amyloid-forming proteins exhibit no amino acid sequence homology, they all adopt a characteristic β-sheet structure within amyloid fibrils. Amyloid deposits also contain a variety of common components, such as glycosaminoglycans, proteoglycans and the pentraxin serum amyloid P. The structural role of these components in amyloid fibrils and deposits is unclear. Although β-sheet conformation is at the essence of amyloid fibril formation, its role in determining pathology has been less clear. Amyloid deposits were often found to be associated with disease, but were also found in otherwise apparently healthy tissue. Subsequently, studies began to show that amyloid deposits are associated with disease and, perhaps, play an etiologic role [3]. However, the experimental evidence for a correlation of clinical illness with amyloid burden has been irregular and unpersuasive. Recent studies have suggested that smaller, soluble aggregates of proteins, many of which go on to form amyloid fibrils, may be responsible for cellular dysfunction and death. These early aggregates will be referred to as oligomers in this chapter, with the understaning that the term oligomer might refer to species containing as few as two or as many as thousands of molecules. More Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Cellular Toxicity of Protein Aggregates Table 1 Diseases of protein misfolding: amyloidoses
Disease
Protein
Abbreviation
Alzheimer’s disease Down’s Syndrome (trisomy 21) Heredity cerebral angiopathy (Dutch) Kuru Gerstmann-Str¨ausslerScheinker syndrome (GSS) Creutzfeldt-Jacob disease Scrapie (sheep) Bovine spongiform encephalopathy (“mad cow”) Type II diabetes mellitus (adult onset) Dialysis-associated amyloidosis Senile cardiac amyloidosis Familial amyloid polyneuropathy Reactive amyloidosis (familial Mediterranean fever) Familial amyloid polyneuropathy (Finnish) Macroglobulinemia Primary systemic amyloidoses Familial Polyneuropathy – Iowa (Irish) Hereditary cerebral myopathy – Iceland Non-neuropathic hereditary amyloid with renal disease Non-neuropathic hereditary amyloid with renal disease Familial British dementia
amyloid β precursor protein (Aβ1–42)
AβPP (Aβ1–42)
prion protein
PrPC /PrPSc
islet amyloid polypeptide (amylin)
IAPP
β 2 -microglobulin atrial natriuretic protein transthyretin
β2M ANP TTR
serum amyloid A
SAA
gelsolin
AGel
λ1 heavy chain Igλ, Igκ apolipoprotein AI
AH AL ApoAI
cystatin C
ACys
fibrinogen α
AFibA
lysozyme
ALys
FBDP
ABri
precise numbers may come from future studies, but the important distinction is that these oligomers exist in solution (at least for a time), as opposed to fibrils which precipitate out of solution. A number of common, significant, costly, and devastating diseases are amyloidoses. These include Alzheimer’s disease, Type 2 diabetes mellitus and the spongiform encephalopathies [prion diseases, e.g. scrapie, “mad cow” disease, new variant Creutzfeldt-Jakob disease (Table 1)]. The aging of the population in the developed world has caused an epidemic of these age-related diseases to begin, making the search for an understanding of disease pathogenesis more critical than ever. Although a great deal of research has strongly implicated amyloid proteins in the pathogenesis of these illnesses, the molecular mechanisms by which these misfolded proteins cause dysfunction and/or toxicity remains elusive (for review, see [4]). In this article, I will first review the evidence that oligomers (as opposed to fibrils) of amyloid peptides are the pathogenic species. I will go on to
2 Aggregation Table 2 Diseases of protein misfolding: non-amyloidoses
Disease
Protein
Abbreviation
Diffuse Lewy body disease Parkinson’s disease Fronto-temporal dementia Amyotrophic lateral sclerosis Triplet-repeat diseases: Huntington’s spinal and bulbar muscular atrophy spinocerebellar ataxias spinocerebellar ataxia 17
α-synuclein
AS
tau superoxide dismutase-1 polyglutamine tracts in the following proteins: huntingtin androgen receptor
tau SoD-1 PG
ataxins TATA box-binding protein
review the current evidence as to how these oligomers cause cellular dysfunction and/or toxicity. It has been also reported that protein misfolding and aggregation is linked to the pathophysiology of a number of other, non-amyloid diseases (Table 2). Furthermore, it has been reported that non-disease-related proteins can form amyloid oligomers with toxicity. Here, the links amongst these various disease-associated phenomena will be considered.
2 Aggregation
The early findings that amyloid burden in patients did not correlate well with clinical severity of illness was troubling to investigators who were thinking along the lines that amyloid fibrils and deposits play an etiologic role in amyloidoses. Subsequently, the discovery of familial forms of amyloidoses linked to mutations in amyloid proteins gave new impetus to the notion that these proteins “caused” the illness [5, 6]. Further research, initially with the Aβ1–42 peptide of Alzheimer’s disease, showed that amyloid peptides could demonstrate cytotoxicity in isolated cell systems and that these peptides were toxic to the cell types damaged in diseases [7]. Thus, Aβ and PrP106–126 [8] were shown to be toxic to neurons, and IAPP was shown to be toxic to β cells from the islets of Langerhans in the pancreas [9]. Further experiments showed that amyloid fibrils themselves usually lacked toxicity. Monomers, also, seemed to lack cytotoxicity. In early experiments, however, cytotoxicty was poorly reproducible, especially for the Aβ peptides. This was eventually shown to be due to differences in the aggregation status of Aβ under varying experimental conditions [10]. The aggregation of amyloid fibril-forming peptides was found to be a highly complex and variable process [11]. The initial phases of protein aggregation from monomers into small oligomers progressed quite slowly. However, once a critical mass was reached, aggregation proceeded quite rapidly. Thus, growth of fibrils could be “nucleated” by the addition of pre-formed “seeds” to a solution of peptide. The seeds greatly accelerated the aggregation of the amyloid protein. However, it was soon discovered that, while promotion of amyloid peptide aggregation (by
3
4 Cellular Toxicity of Protein Aggregates
acidic pH, high concentration, “aging” or seeding) was necessary to achieve toxicity, extensive aggregation could actually lead to a decline in cytotoxicty [12]. It appeared that an intermediate state of aggregation was associated with the cytotoxicity of amyloid fibril-forming peptides. The notion of amyloid peptide-induced change also broadened. While cytotoxicity was clearly relevant to diseases in which cells died, animal models of Alzheimer’s disease created with mice transgenic for the Aβ precursor protein (AβPP) or Aβ developed amyloid deposits, learning and memory deficits, and synaptic loss, but no frank neuronal loss. Although some criticized the models as inadequate, or attributed the lack of nerve loss to differences between mice and men, others began to think that, perhaps in the early stages of Alzheimer’s disease, there were characteristic dysfunctions that precede neuronal loss. Developmental investigations of transgenic mouse models showed that memory deficits actually preceded amyloid deposits and synaptic loss. Further studies showed that Aβ peptides could inhibit long-term potentiation, a model for memory [13]. Additionally, only oligomeric forms of Aβ could inhibit long-term potentiation, not monomeric or fibrillar Aβ [14]. These observations led to the implication of oligomers in cellular dysfunction relevant to clinical symptoms and to cytotoxicity. In vitro, it was shown that fibroblasts could be killed by Aβ that was “freshly prepared” (and thus less highly aggregated) and that appeared to be “globular” by atomic force microscopy. Toxicity was blocked by antibodies or the channel-blocking zinc ion, but not by antioxidants [15]. A similar story emerged in Huntington’s disease, where early deficits in learning and memory could be linked to the presence of aggregates of the disease related protein huntingtin with a long PG tract. These deficits appear well before any frank neuronal loss in the brains of affected transgenic mice. Electrophysiologic abnormalities accompanied these deficits and could be detected even prior to the appearance of huntingtin aggregates, suggesting that the large, visible aggregates may not be responsible for the early cellular effects of huntingtin with extended PG tracts [14]. It has also been shown that PG aggregates targeted to the nucleus can be directly cytotoxic [17]. Oligomers have also been implicated in the cytotoxicity of AS. Mutations in AS which promote “protofibril” (oligomers) formation also enhance cytotoxicity [18]. Similarly, the “arctic” mutation of Aβ, which enhances oligomer formation, shows increased cytotoxicity in comparison to native Aβ peptide [19]. Toxic oligomers of IAPP were shown by light scattering to contain between 25 and 6000 IAPP molecules [20]. Once again, fibrils and monomers were shown to lack toxicity. An antibody preparation to the “soluble oligomer” conformation was generated, which recognized several different amyloid peptides, but only in this precise conformation [21]. This antibody could block cytotoxicity of a range of amyloid peptides, strongly suggesting that oligomers were the toxic species and that the toxic mechanism (or at least a part of it) was common to all the amyloids. Biological dogma has maintained that the three-dimensional structure and function of proteins is dictated by their primary amino acid sequence. Our knowledge of amyloid diseases has led to the recognition that natively folded proteins may
2 Aggregation
unfold at times and adopt non-native conformations. Proteins may refold to their native conformation (with or without the assistance of chaperones) or they may be targeted for degradation via the ubiquitin–proteasome pathway. Alternately, some partially unfolded or misfolded proteins may aggregate. This aggregation may be driven thermodynamically by new hydrogen bonding possibilities [22] or by the hydrophobic effect, the need to shield hydrophobic parts of proteins from the aqueous environment. Proteins that form amyloid appear to have a propensity to misfold into β-sheet structures. These β-sheets tend to self-aggregate, forming intermolecular bonds that aggregate the proteins or peptides. Proteins that possess hydrogen-bonding defects are more likely to interact with lipid bilayers [22]. It is still not well understood why these complexes are not handled by the cell’s usual defense mechanisms, i.e. refolding to native structures through chaperone assistance or targeting for degradation through ubiquitin–proteasomes. It may be that these processes are at work, but are merely overwhelmed by the amount of misfolded protein they must handle. This might be the place where disease begins. Alternatively, it is possible that the nature of these β-sheet aggregates leads them to attack or insert into cellular membranes. This would render them inaccessible to chaperones and to the ubiquitin–proteasome pathway. Another possibility is that, once seeding or nucleation has occurred, the fibrillation process proceeds so rapidly that the aggregates become too large to be handled by the cell’s defenses. It has been suggested that amyloid fibril formation, with or without vacuolization, is actually a cellular defense mechanism to isolate the misfolded protein and keep it separate from the cell’s vital machinery. Support for this idea comes from experiments showing that amyloid fibrils have little, if any, toxicity to cells, while smaller aggregates appear to possess considerable cytotoxicity. The initial misfolding of a protein may be driven by mutations, metal interactions, environmental conditions such as temperature, acidic pH, oxidation and proteolysis or a simple increase in concentration. All these phenomena can potentially increase the presence of misfolded conformations and increase the probability of aggregation. The presence of membranes may also play a key role in this process, as lipid bilayers have been shown to influence both the conformation of amyloid peptides and the propensity of amyloid peptides to aggregate [23]. Thus, the intracellular component in which an amyloid protein finds itself may significantly affect its ultimate conformational fate. The many amyloid peptides exhibit no primary sequence homology, but they all possess the possibility of converting to relatively high β-sheet content. Sometimes this β-sheet content is unpredictable, as in the prion protein where sequences which would usually be predicted to be α-helical adopt a β-sheet conformation when synthesized [8]. While the earliest stages of amyloid peptide aggregation cannot be easily observed with current imaging techniques, the results of recent electron and atomic force microscopic studies have shown the presence of spherical or globular aggregates early in the process of amyloid fibril formation. The smallest particle appears to aggregate into chains or annular rings referred to as “protofibrils”. These structures appear to be the precursors of fibrils that exist in mature amyloid deposits. The
5
6 Cellular Toxicity of Protein Aggregates
protofibrillar structures possess significant toxicity in comparison to fibrils themselves. It is unclear whether the annular structures or rings are more toxic than their linear brethren, but it is of interest that pathogenic mutations of Aβ amyloid and AS lead to increased formation of these annular structures [18]. 3 Cellular Mechanisms of Oligomeric Toxicity
Recently, it has been reported that proteins and peptides associated with amyloid disease are not unique in their ability to form amyloid. Aggregation and amyloid fibril formation have been achieved for a number of non-disease associated proteins (Table 3) [20]. It has been proposed that under appropriate conditions all proteins might be capable of forming amyloid fibrils. Beyond the structural similarity of these non-disease-associated amyloid proteins to actual amyloid proteins, there are functional similarities as well. For example, HypF is able to aggregate, and the aggregates can kill cells and permeabilize liposomes [95]. Fibrils lack these properties, but aggregation of HypF is required for toxicity and permeabilization, thus implicating smaller oligomers as the cytotoxic, membrane-permeabilizing species. Preceding cytotoxicity, HypF oligomers, but not fibrils, raised intercellular Ca2+ levels and levels of reactive oxygen species [11]. These intracellular effects were reversible, and they were prevented with the omission of Ca2+ from the media or the addition of reducing agents. These physiologic effects are strikingly similar to those observed with disease related amyloid peptides [26, 27]. These results are also consistent with the findings that antibodies raised against soluble pre-fibrillar oligomers of various amyloid peptides recognize that state in other amyloid peptides and can prevent cytotoxicity [21]. This suggests that amyloid peptides share an oligomeric conformation critical to cytotoxicity and independent of amino acid sequence. Thus, a wide variety of structural, biochemical, and physiologic studies suggest that pre-fibrillar amyloid peptide oligomers act to damage and/or kill cells, especially neurons, and that they appear to share a common mechanism of action. As we discuss below, a growing body of biophysical evidence implicates channel formation
Table 3 Non-disease related amyloid-forming proteins/peptides
SH3 domain p 85 Phosphatidylinositol-3-kinase HypF N-terminal domain (E. coli) Apomyoglobin (equine) Endostatin (human) StefinB(human) Fibroblast growth factor (Notophthalmus viridescens) VI domain (murine) Curlin CgsA subunit
fibronectin type III phosphoglycerate kinase acylphosphatase amphoterin (human) apocytochrome c Met aminopeptidase ADA2H apolipoprotein CII B1 domainof IgG-binding protein monellin
7 The Channel Hypothesis 7
by amyloid peptides in cellular membranes as the molecular mechanism of this damage. 4 Loss of Function Hypothesis
This view postulates that cytotoxicity results from a loss of native protein function due to misfolding of native protein and aggregation. This idea seems untenable in light of the fact that so many different, structurally and functionally unrelated proteins are involved in amyloid diseases. Transgenic mouse models in which an amyloid protein-encoding gene is inserted and disease results also argue against this idea. An alternative form of this concept is that aggregated amyloid proteins might bind and inactivate essential cellular proteins, such as transcription factors. However, there is little evidence to suggest that physiologically significant amounts of critical cellular proteins can be found in amyloid deposits or inclusion bodies. Substantial evidence has now accumulated to show that amyloid proteins possess toxic activity toward cells. The molecular mechanism of this toxicity remains unknown. 5 Receptors for Advanced End-products of Glycation (RAGE) Receptors
It has been proposed that RAGE bind various amyloid oligomers, inducing cellular stress and activation of NF-κB [10]. While some experimental evidence supports this idea, it has not been clearly documented that this mechanism accounts for tissue damage in amyloid diseases. Further investigation is warranted. 6 Oxidative Stress
Another hypothesis for the pathogenesis of amyloidosis and amyloid-like diseases is that oxidative stress is induced directly by amyloid peptides, leading to free radical production, mitochondrial dysfunction and death [13]. Although many elements of oxidative stress pathology occur in amyloid induced cytotoxicity, it does not explain how oxidative stress and/or free radicals are generated by amyloid peptides. Initial reports that the peptide itself could generate free radicals have not been confirmed. Oxidative stress clearly occurs in cells affected with amyloid toxicity. It remains to be established where in the chain of toxicity oxidation plays its role. 7 The Channel Hypothesis
A large body of research has focused on the ability of amyloid peptides to interact with membranes. A series of provocative studies has shown that many (if not all)
8 Cellular Toxicity of Protein Aggregates
amyloid peptides can aggregate into β-sheet oligomers capable of spontaneously inserting into lipid membranes. The peptides form relatively permanent ionpermeable channel structures across the cell membrane. These channels are reported to have properties that would damage or kill most cells (Table 4). The channels are reported to be (a) large, (b) non-selective, (c) heterogeneous, (d) voltage independent, (e) irreversible, (f) inhibited by agents that prevented aggregation such as Congo red and (g) blocked by zinc ions. These channels would likely cause a leakage pathway in plasma, mitochondrial, endoplasmic reticulum, lysosomal or other membranes. These leaks could damage cells by (1) disrupting membrane potentials and ion gradients, (2) causing loss of vital intracellular ions such as K+ and Mg+ , (3) allowing influx of toxic ions such as Ca2+ , (4) running down energy stores by forcing ion pumps to work harder, (5) disrupting mitochondrial membrane potential and initiating apoptosis by allowing cytochrome c to leak out of mitochondria, and (6) allowing toxic enzymes and other factors to leak out of lysosomes and peroxisomes. In the sections that follow, the evidence for channel formation by the amyloid peptides which have been most studied by these techniques is reviewed.
8 Aβ
The “channel hypothesis” was first proposed by Arispe et al. [3] after they reported that Aβ1–40 could form cation-selective, calcium permeable channels of various conductances in planar lipid bilayer membranes (BLMs). The channels were formed by Aβ1–42 as well and were large, voltage independent and blocked by tromethamine (Tris+ ) and aluminum [4]. The largest channels observed (4 nS) could potentially change the interior [Na] of a cell by as much as 10 µM/s. They proposed that ionic leaking of Na+ , K+ and Ca2+ could disrupt membrane potential and ionic regulation within a few seconds. While these findings were not immediately confirmed by other laboratories, due to problems with irregular aggregation of Aβ [32], eventually a series of studies found Aβ peptides to be capable of forming channels in BLMs [32, 34], liposomes [32], neurons and oocytes [24], and fibroblasts [36]. The state of Aβ aggregation is critical not only to its cytotoxicity [10, 33], but also to its channel-forming abilities. Indeed, monomers and fibrils of Aβ that are non-toxic fail to form ion channels [32]. Oligomeric species of Aβ cause a variety of channel entities, which can be distinguished by their single-channel conductance, ionic selectivity, kinetics and other channel properties [38]. It was also found that conditions (aging, acidic pH, etc.) that favor the aggregation of monomers into oligomers led to an increase in channel activity [32]. Exposure to organic solvents, which promote monomeric Aβ, led to loss of channel activity. However, channel activity could be recovered by allowing the peptide to “age” in aqueous solution. Although Aβ1–40 and 1–42 are the primary forms of Aβ peptides found in vivo, other Aβ fragments have been of experimental interest. Aβ25–35, a cytotoxic peptide not found in vivo, is a voltage-dependent, non-selective channel former [32]. Studies using variants of Aβ25–35 showed that channel formation was necessary for
Table 4 Channel properties of amyloid peptides
Peptide Aβ25–35 Aβ1–40 Aβ1–40 Aβ1–40 ARC (E22G) Aβ1–42 CT105 (C-terminal fragment of amyloid precursor protein (APP) Islet amyloid polypeptide (Amylin) PrP106–126 PrP106–126 PrP82–146 SAA SAA 2.2 (murine hexamer) C-type natriuretic peptide Atrial natriuretic factor Beta2 -microglobulin Transthyretin Polyglutamine (average molecular weight = 6000) Polyglutamine 40 NAC (AS65–95) AS A30P/A53T CT (human/salmon) Lysozyme 87–114
Single-channel Ion selectivity conductance (pS) (permeability ratio)
Blockade by zinc by Congo red
10–400 10–2000 50–4000
cation (PK /PCl =1.6) + cation (PK /PCl =1.8) + cation (PK /PCl =11.1) +
+
[59, 70] [32] [3, 4], [3, 30] [55]
10–2000 120
cation (PK /PCl =1.8) cation
+ +
+ +
[32] [50]
7.5
cation (PK /PCl = 1.9) +
+
[71]
10–400 140, 900, 1444
+
+
10–1000
cation (PK /PCl =2.5) cation (PK /PCl >10) cation (variable) cation (PK /PCl =2.9)
+
+
21, 63 68, 160, 273 0.5–120 variable 19–220
cation (PK /PCl > 10) + variable non-selective + cation (variable) + non-selective –
+ + + –
[60] [49] [6] [35] [99] [35] [50] [30] [30] [15]
17 10–300
cation variable
+
12/580
non-selective non-selective
15–20/70–100
25/80
20–25/80–120 permeable to β-galactosidase (molecular weight 116 kD) 50/190
Reference
+
[73] [5] [99] [96] [36] 8 A$
Cu/Zn SoD
Ring diameter (inner/outer, in Å, by microscopy)
Chung et al. (2003) 9
10 Cellular Toxicity of Protein Aggregates
cytotoxicity, but not sufficient, i.e. all cytotoxic species formed channels, but there were two channel forming variants of Aβ25–35 that did not kill cells [39]. Channel activity could be enhanced by lipids carrying a net negative surface charge and this effect could be countered by high salt concentrations. Addition of cholesterol, which stiffens membranes, decreased Aβ25–35 channel activity [40]. Aβ25–35 variants could not form channels if they were not at least 10 residues long, indicating a minimum bilayer spanning length of about 30 Å, a result consistent with the β-sheet span lengths of the known channels generated by Staphylococcal α toxin and anthrax toxin [41, 42]. More recently, an extremely short channel-forming Aβ variant (31–35) has been reported [43]. Whether this peptide might form hemi-channels to span the bilayer similar to the peptide Gramicidin is unknown. In vivo, Aβ1–40 or 1–42 can induce currents in rat cortical neurons [25, 44], HNT cells [46] and gonadotrophin-releasing hormone secreting neurons [42]. The channels observed in vivo seem indistinguishable in their properties from those observed in vitro. Aβ1–40 or 1–42 can also kill fibroblasts in a manner inhibited by antibodies, tromethamine or zinc, but not by antioxidants, suggesting that channel formation is the mechanism of cytotoxicity. The freshly prepared Aβ used in these studies appeared “globular:” consistent with an early stage of aggregation [8, 36]. It has also been shown that the cholesterol content of plasma membranes affects a cell’s vulnerability to Aβ1–40 and 1–42 [2], suggesting that the membrane plays a critical role in Aβ cytotoxicity. Also, it has been reported that Aβ can directly induce cytochrome c with release from mitochondria. This could occur through the action of Aβ to decrease mitochondrial membrane potential or even through Aβ channel mediated release of cytochrome c [46].
9 PrP106–126
The prion protein (PrP) has at least two distinct tertiary conformations, PrPC and PrPSc , the latter of which results in a transmissible neurodegenerative disease known as a spongiform encephalopathy. Prion diseases include scrapie in sheep and “mad cow” disease as well as Creutzfeldt-Jakob disease, Gerstmann-Str¨ausslerScheinker syndrome and fatal familial insomnia in humans. These illnesses may be sporadic, infectious or hereditary. The familial versions are associated with mutations in the prion protein [16]. PrPSc deposits in the brains of afflicted organisms in a form that is readily converted in amyloid fibrils in vitro. A critical step in the conformational transition from PrPC to PrPSc is the conversion of α-helical and random coil regions of PrP to β-sheets [52]. One region predicted to be α-helical, PrP106–126, actually forms β-sheets when chemically synthesized and self-aggregates into amyloid fibrils [26]. The β-sheet-rich form of PrP106–126 binds to membranes, unfolds and ultimately disrupts the bilayer [45]. Forloni et al. [23] demonstrated that PrP106–126 was toxic to neurons in culture. Lin et al. [56] reported that PrP106–126 could form ion-permeable channels in planar lipid bilayer membranes at neurotoxic concentrations. PrP106–126 channels were irreversibly associated with the membranes, demonstrated a multiplicity of single
9 PrP106–126 11
channel conductances (10–400 pS in 0.1 M NaCl) and had relatively long lifetimes (seconds to minutes). Ionic selectivity of the channels was meager, with significant permeability being shown to Na+ , K+ , Cl− and Ca+ (PNa /PCl = 2.5). Channel activity could be enhanced dramatically by “aging” of the peptide in aqueous solution, a procedure which promotes aggregation and increases neurotoxicity. Incubation of PrP106–126 at acidic pH also enhanced channel activity by nearly 100 times and shifted the distribution of observed single-channel conductances to higher conductance levels. It has also been reported that acidic pH converts α-helical PrP106–126 to the β-sheet conformation [8]. Kourie and Culverson [49] characterized three distinct channel types formed by PrP106–126. These included: (1) a dithiodipyin sensitive channel of 40 pS with slow kinetic behavior, (2) a giant channel, 900–1500 pS, exhibiting five separate subconductance states and (3) a tetraethyl-ammonium chloride (TEA)-sensitive channel of 140 pS with rapid kinetics. Manunta et al. [58] were unable to observe PrP106–126 neurotoxicity or channel formation, but this may have been a result of the highly variable aggregation state of the PrP106–126 peptide, reminiscent of the variability seen with aggregation of Aβ peptides. Bahadi et al. [6] have reported that PrP82–146, a peptide found in the PrPSc brains of patients with Gerstmann-Str¨aussler-Scheinker syndrome, can also form ion channels. Scrambling the amino acid sequence of the 106–126 region of this longer peptide abolishes the ability to form ion channels, whereas scrambling the 127–146 region has no effect, thus implicating the 106–126 region as key in channel-forming ability. The electrophysiologic properties of PrP82–145 are very similar to those of PrP106–126. Channel activity could be decreased by the antibiotic rifampicin which had previously been shown to decrease aggregation and toxicity of Aβ peptides. The association of amyloid deposits with prion diseases is variable. Intriguingly, in one prion disease where amyloid fibrils are not found, the mutant prion protein adapts a transmembrane conformation [29]. It is tempting to speculate that this transmembrane protein may be causing ionic leakage across the cell membrane. Lysosomotropic agents have been reported to inhibit PrPSc accumulation in neuroblastoma cells [18]. One of these agents, quinacrine, has been reported to block PrP106–126 channels [6]. Quinacrine is also able to repair the impaired functioning of N-type calcium channels in prion-infected neurons [62]. Thus, it seems likely that channel blockers, such as quinacrine, may be useful as potential therapeutic agents in prion related diseases. Indeed, there is at least one report of quinacrine improving the clinical status of four patients with Creutzfeldt-Jakob disease [63]. Other acridine derivatives and tricyclic compounds may have even better antiprion efficiency [47]. Congo red can inhibit channel formation, block PrP106–126 cytotoxicity and inhibit the development of scrapie [33, 37]. It remains to be seen whether the anti-channel blocking or anti-aggregation activities of quinacrine are key to its anti-prion effects. It has also been reported that a peptide PrP170–175 bearing a prion protein mutation related to schizoaffective disorder [6] increases the permeability of planar lipid bilayers and forms channels with conductance of 8–26 pS in 0.5 M KCl. The native PrP170–175 does not form channels in membranes. This result suggests that yet another segment of PrP may be capable of influencing toxicity via channel formation and that this may be directly relevant to human disease.
12 Cellular Toxicity of Protein Aggregates
10 IAPP
IAPP (amylin) is a 37-residue amyloidogenic hormone which is co-secreted with insulin from β cells in the islets of Langerhans in the pancreas. Amyloid deposits comprising IAPP are found in the islets of patients with Type 2 diabetes, and are positively correlated with β cell loss and clinical insulin requirements [12, 66]. IAPP is cytotoxic to β cells in culture [9]. Although IAPP is α-helical in aqueous solution, exposure to lipid membranes induced a transition to the β-sheet structure [67]. Human IAPP formed ionpermeable channels in planar lipid membranes at cytotoxic concentrations [39]. Rat IAPP, which differs from human IAPP at five amino acid positions, and is non-amyloidogenic and non-toxic, did not form channels. Human IAPP channels could be inserted into membranes irrespective of voltage; however, once inserted, channels rapidly opened at negative voltages and rapidly inactivated at positive voltages (voltages being relative to the IAPP-containing side). Inactivation faded gradually over a time course of several minutes. Open IAPP channels were ohmic and exhibited a single-channel conductance of 7.5 pS in 0.1 M KCl. Channels were permanently associated with the membrane and showed lifetimes of seconds to minutes depending on voltages. Increasing concentrations of net negatively charged lipids in the membrane led to an increase in IAPP channel activity. Increasing salt concentrations in the aqueous solution decreased channel activity. Anguiano et al. [1] showed that liposomes could be permeabilized by IAPP in a graded fashion, allowing Ca2+ to cross the membrane, while not allowing fura-2 (molecular weight = 832) or FITC-dextran (molecular weight = 4400) to escape. IAPP has also been reported to disrupt Ca2+ homeostasis in cells in a manner similar to Aβ and prion-related peptides [27]. Large fibrils of IAPP have been reported to be non-toxic, whereas smaller aggregates are associated with cytotoxicity [20]. The aggregates, but not fibrils, could disrupt planar lipid bilayers. Light scattering showed these oligomers to range in size from 25 to 6000 IAPP molecules. Hirakura et al. [33] showed that Congo red incubation with IAPP, Aβ or PrP106–126, prior to membrane exposure, could inhibit channel formation. They also reported that Zn2+ could reversibly block these channels. The concurrence of channel-forming properties, physiologic effects and cytotoxicity strongly suggests a common mechanism for the channel-forming action for these three amyloid peptides.
11 ANP
A family of hormones, C-type natriuretic peptide (CNP), ANP and brain-derived natriuretic peptide (BNP), helps to regulate fluid and ionic balance. As people age, increasingly large amyloid deposits of ANP are found in their hearts. These fibril-containing deposits are thought to play a deleterious role, perhaps leading
12 SAA 13
to atrial fibrillation and other cardiac pathology [64]. Channels in lipid bilayers have been reported for ANP1–28 [34], CNP-22 and OaC-type natriuretic peptide (18–39) from platypus [48]. ANP channels exhibited a multiplicity of single channel conductances, but all were cation selective. The channels could be divided into three types: (1) Ba2+ -sensitive, fast kinetics with three modes (spike, burst and open), 68 pS, (2) a large conductance channel possessing subconductance states, time dependent inactivation, 273 pS, and (3) transient activation, 160 pS. CNP channels were weakly cation selective, with a high open probability and large single-channel conductance (546 pS). The physiologic properties of these peptides were believed to be concordant with known pathological effects described in animal models and human disease. Although the peptides act through a well-known receptor and second messenger signaling system, it has been proposed that the channel-forming activity of the peptides could play a role in physiologic events. ANP channels would likely hyperpolarize muscle cell membrane and inhibit depolarization driven contractions. Large conductance channels were postulated to degrade membrane potential and ionic balance.
12 SAA
SAA refers to a family of related apolipoproteins. During states of infection or inflammation the acute-phase isoforms of SAA can increase their levels in serum by as much as 3 orders of magnitude. AA fibrils most commonly comprising the N-terminal 76 residues of SAA are found as amyloid deposits in various organs such as spleen, kidney and liver. Patients with chronic infections, such as tuberculosis, or inflammatory diseases, such as rheumatoid arthritis, are particularly at risk. Patients with cancer, arteriosclerosis and Alzheimer’s disease have also been reported to have elevated SAA concentrations [72]. One commercially available, recombinant-generated acute-phase isoform, SAAp, has been reported to form ion-permeable channels in planar lipid bilayer membranes at physiologically relevant concentrations [35]. A wide variety of singlechannel conductances (10–1000 pS) were observed, consistent with a peptide aggregated into multiple oligomeric states. SAA channels were permeable to most physiologic ions including Na+ , K+ , Ca2+ and Cl− , exhibiting only a weak preference for cations over anions. Channel formation could be inhibited by preincubation of SAA with Congo red, but addition of Congo red after channel formation had no effect. Channels could be reversibly blocked by 100 µM Zn2+ . The naturally occurring acute phase isoform SAA1 was reported to lyse bacterial cells when expressed in Escherichia coli, whereas expression of the constitutive isoform SAA4 did not. SAA1 and SAA4 differ in their sequences at approximately 50% of residues. SAA1 has a greater concentration of hydrophobic residues in the N-terminal region. The resemblance of these results to the properties of channel forming toxins such as colicins [74], yeast killer toxins [39], defensins [40] and protegrins [77] led to the proposal that SAA might play a role in host defense
14 Cellular Toxicity of Protein Aggregates
against microbes. Electron microscopic analysis revealed that murine SAA 2.2 can exist as an annular hexamer with a central “pore-like” region [78]. Although membranes were not present, the observed pore diameter of 25 Å was consistent with the physiologic findings of Hirakura et al. [35].
13 AS
Parkinson’s disease is a progressive neurodegenerative disorder characterized by tremor, rigidity and brachykinesia. The hallmark lesion of Parkinson’s disease is the Lewy body, an inclusion body in dopaminergic neurons consisting largely of non-amyloid component (NAC; residues 66–95 of AS). Incorrectly named, NAC is actually a fibril-forming amyloid peptide. Mutations in AS are associated with familial Parkinson’s disease and implicate AS in the pathophysiology of the illness [7]. NAC is also found in Alzheimer’s disease amyloid deposits, which suggests a link between the two amyloid diseases. This notion is supported by the clinical overlap in these illnesses. Alzheimer’s disease patients are commonly found to have motor abnormalities. Parkinson’s disease patients frequently have cognitive problems, such as dementia and depression. Intermediate syndromes such as dementia with Lewy bodies also implicate AS in damage to neurons outside the dopaminergic system [65]. It has been reported that AS can increase the permeability of liposomes in a graded manner to substances of increasing size. This “sieving” action is characteristic of “pore-like” transport systems [81]. Pathogenic mutations in AS such as A30P and A53T have been reported to accelerate the formation of oligomers (protofibrils) capable of permeabilizing activity [81]. Electron microscopy revealed that AS formed annular, pore-like oligomers and that the Parkinson’s diseaserelated mutations enhanced the formation of these structures [18]. The pathogenic “arctic” mutation of Aβ also showed a similar enhancing effect on these annular structures [54]. Electrophysiologic studies of NAC have confirmed the formation of ion-permeable channels in lipid bilayers [5]. These channels have properties strikingly similar to those of other amyloidogenic peptides. Single-channel conductances are heterogeneous. Ionic selectivity is weak. Channels are irreversible and have extended lifetimes. Channel formation is inhibited by Congo red and channels are blocked by Zn2+ .
14 β2M
β 2 M forms amyloid deposits in bones and joints of patients on hemodialysis or peritoneal dialysis, a syndrome referred to as “dialysis-associated amyloidosis”. This 99-residue peptide belongs to the MHC class I complex which is involved in the presentation of foreign antigens to lymphocytes. β 2 M levels can rise 100-fold
16 PG
during states of renal failure [19]. Renal transplantation can result in lower β 2 M levels and clinical symptoms. (β 2 M’s physiologic effects include the induction of Ca2+ , efflux from calvariae, bone resorption and increasing collagenase production [9, 72, 78]. β 2 M is mainly found in amyloid deposits as fulllength native protein. In contrast to other amyloid protein “misfolding” diseases, the misfolding here appears to be solely a function of increased protein concentration in the serum, rather than mutation or proteolysis. Channel formation by β 2 M was reported by Hirakura and Kagan [41]. A multiplicity of single channel conductances were observed, ranging from 0.5 to 120 pS, with 90 pS being the most commonly observed size in 0.1 M KCl. Channel lifetimes were typically extended and ionic selectivity was poor. β 2 M associated irreversibly with the membrane. Incubation of β 2 M with Congo red inhibited formation of channels. Zn2+ could reversibly block inserted channels. Open β 2 M channels exhibited a slight degree of rectification with trans positive currents being somewhat larger than trans negative currents. Channel formation could be accelerated by acidic pH, compatible with the idea that the acidotic/uremic state of renal failure could enhance the generation of pathologic oligomers of β 2 M.
15 AL Amyloidosis
AL (light chain) amyloidosis is characterized by fibrillar deposits of the variable domain of immunoglobulin light chains. Fibril assembly is dependent on environmental conditions, e.g. the process may be different on surfaces versus in solution [36]. AL deposits are frequently found on surfaces such as arterial walls and basement membranes. A recent study of AL aggregation using atomic force microscopy found that AL protein of the variable domain SMA could form annular aggregates similar to those seen with AS (see Section 13). The SMA annular aggregates were significantly larger, however. Acidic pH was critical to the formation of these aggregates. It was suggested that these annular species might form pores in membranes [89].
16 PG
Although triplet-repeat diseases are not classic “amyloidoses”, their pathology seems to involve protein misfolding and the accumulation of toxic protein aggregates. Huntington’s disease is the most common and best known of these hereditary illnesses, which are caused by an expansion of the codon CAG which codes for glutamine. In Huntington’s disease, PG tracts larger than 37 residues cause disease, although this number varies amongst the different triplet-repeat illnesses. Several different proteins are associated with the various triplet-repeat diseases, but all are affected by an extended PG tract with a minimum threshold length for causing
15
16 Cellular Toxicity of Protein Aggregates
clinical symptoms. In Huntington’s disease, the best-studied triplet-repeat disease, the presence of amyloid-like neuronal aggregates of huntingtin, the PG expanded protein, correlates with disease progression in transgenic mice [56]. The toxicity of huntingtin is proportional to the PG tract repeat length. Age of onset of illness is inversely proportional to PG repeat length. However, a PG repeat length = 12 cell line shows vulnerability to apoptosis without visible aggregates of huntingtin (PG repeat length = 150). Thus, visible deposit or aggregates may not be necessary to cause dysfunction or cell death. Indeed, some transgenic mice show electrophysiologic abnormalities in striatum before aggregates are visible [14]. Channel formation by PG was reported almost simultaneously by Hirakura et al. [34] and Monoi et al. [73]. The former group reported channels that were long lived, non-selective and heterogeneous in single channel conductance size, ranging from 19 to 220 pS in 0.1 M KCl. Channel formation was increased by acidic pH. Unlike classic amyloid peptides, Congo red pre-incubation did not inhibit channel activity nor was Zn2+ able to block PG channels. These distinct properties indicate that PG aggregates are clearly different structurally from classic amyloids. Monoi et al. [73] reported that PG40 could form cation-selective, long-lived channels with a single-channel conductance of 17 pS. PG29 could not form channels, consistent with the 37-residue cut-off for clinical illness. The investigators also proposed a structural model, the µ-helix, for PG channels, a model which just spans the bilayer hydrophobic core at a length of 37 residues, again in agreement with the clinical data. Further evidence of possible channel formation by PG in mitochondria was reported by Panov et al. [76], who showed that, in Huntington’s disease, mitochondria exhibited decreased membrane potential and were depolarized at lower Ca2+ levels than control mitochondria. Mitochondria from brain of transgenic mice expressing huntingtin with a pathogenic PG tract exhibited a similar dysfunction. Electron microscopy revealed mutant huntingtin localized to mitochondrial membranes. Most strikingly, the mitochondrial defects could be reproduced by a fusion protein with a long PG repeat. These results suggest that PG tracts are toxic to mitochondria and likely act via a channel-forming mechanism, depolarizing mitochondria and leaving them more vulnerable to apoptosis or metabolic insults. These data are consistent with a report that Aβ peptide can directly induce cytochrome c release in isolated mouse brain mitochondria, by directly inducing a permeability increase in the mitochondrial membrane [46].
17 HypF
The recent studies of non-disease related amyloid proteins has led to a reexamination of the nature of amyloid structure with respect to disease. It has been reported that a wide variety of non-disease proteins can form amyloid fibrils under the “appropriate” conditions [94]. It is a curiosity that the aggregation process of at least one such non-disease-related amyloid protein, HypF, can lead to formation
19 Lysozyme
of oligomeric structures and permeabilization of lipid membranes, without forming amyloid fibrils [83]. These oligomers are cytotoxic as well. These data strongly suggest that it is the oligomer β-sheet conformation itself that leads to membrane insertion, channel formation, cellular dysfunction and, eventually, cytotoxicity. It has been proposed that these are latent properties of all polypeptide chains (See chapter 3). 18 Calcitonin (CT)
Human calcitonin is (hCT) is a 32-residue peptide hormone involved in the regulation of calcium and phosphorous metabolism. It is produced by the C cells of the thyroid gland and is found in fibrillar amyloid deposits in patients with medullary carcinoma of the thyroid. The calcitonin peptide and fragments as small as four residues can form amyloid fibrils when incubated in aqueous solution. Calcitonin has pharmacological use in humans as a treatment for Paget’s disease and osteoporosis. The tendency of hCT to form fibrils limits its utility and has favored the use of salmon calcitonin (sCT), which is less fibrillogenic. A peptide corresponding to residues 16–21 of hCT form antiparallel β-sheets in methanol water and peptides consisting of residues 15–19 are highly amyloidogenic [82]. Fibril formation is accompanied by a conformational charge from α-helix to β-sheet in this central region or a change from random coil to β-sheet in the C-terminal region [27]. CT from several species, including human, salmon, eel and porcine, can form ionpermeable channels in lipid bilayer membranes [96]. These channels were long lived, voltage dependent, non-selective between cations and anions, permeable to Ca2+ , and enhanced by the presence of negatively charged phospholipids. It is possible that the channel-forming activity of hCT could have other physiologic effects on signal transduction, particularly in light of its ability to allow Ca2+ across membranes. Further investigation is needed to investigate the potential physiological role of these channels. 19 Lysozyme
Lysozyme is an enzyme that exhibits antimicrobial activity by cleaving bonds in the outer wall of Gram-positive bacteria. Lysozyme also forms toxic amyloid deposits in human. Mutations or partial protein unfolding leads to aggregation, oligomer formation and fibrilization of lysozyme. Partial unfolding of lysozyme also leads to the development of a non-enzymatic, broad-spectrum antimicrobial activity and a membrane-permeabilizing activity [36]. These activities were localized to a helix–loop–helix peptide at the upper lip of the active site cleft (87–114 of hen lysozyme). Similar peptides from human and chicken lysozyme possessed these activities as well. These results suggest that the unfolding of lysozyme leads to fundamental shifts in protein function and activity. The aggregation of lysozyme
17
18 Cellular Toxicity of Protein Aggregates
monomers into oligomers appears to create a membrane-penetrating, antimicrobial complex which can aggregate into amyloid rings as seen by imaging [62].
References 1 SIPE J. D., COHEN A. S.. Review: history of the amyloid fibril. J Struct Biol 2000, 130, 88–98. 2 MERLINI G., WESTERMARK P.. The systemic amyloidoses: clearer understanding of the molecular mechanisms offers hope for more effective therapies. J Intern Med 2004, 255, 159–178. 3 HARDY J., ALLSOP D.. Amyloid deposition as the central event in the aetiology of Alzheimer’s disease. Trends Pharmacol Sci 1991, 12, 383–388. 4 MERLINI G., BELLOTTI V.. Molecular mechanisms of amyloidosis. N Engl J Med 2003, 349, 583–596. 5 PEPYS M. B., HAWKINS P. N., BOOTH D. R., VIGUSHIN D. M., TENNENT G. A., SOUTAR A. K., TOTTY N., NGUYEN O., BLAKE C. C., TERRY C. J., FEEST T. G., ZALIN A. M., HSUAN J. J.. Human lysozyme gene mutations cause hereditary systemic amyloidosis. Nature 1993, 362, 553–557. 6 SAMAIA H. B., MARI J. J., VALLADA H. P., MOURA R. P., SIMPSON A. J., BRENTANI R. R.. A prionlinked psychiatric disorder. Nature 1997, 390, 241. 7 YANKNER B. A., DUFFY L. K., KIRSCHNER D. A.. Neurotrophic and neurotoxic effects of amyloid β protein: reversal by tachykinin neuropeptides. Science 1990, 250, 279–282. 8 De GIOIA L., SELVAGGINI C., GHIBAUDI E., DIOMEDE L., BUGIANI O., FORLONI G., TAGLIAVINI F., SALMONA M.. Conformational polymorphism of the amyloidogenic and neurotoxic peptide homologous to residues 106–126 of the prion protein. J Biol Chem 1994, 269, 7859–7862. 9 LORENZO A., RAZZABONI B., WEIR G. C., YANKNER B. A.. Pancreatic islet cell toxicity of amylin associated with type-2 diabetes mellitus. Nature 1994, 368, 756–760.
10 PIKE C. J., BURDICK D., WALENCEWICZ A. J., GLABE C. G., COTMAN C. W.. Neurodegeneration induced by β-amyloid peptides in vitro: the role of peptide assembly state. J Neurosci 1993, 13, 1676–1687. 11 HARPER J. D., LANSBURY P. T., Jr. Models of amyloid seeding in Alzheimer’s disease and scrapie: mechanistic truths and physiological consequences of the time-dependent solubility of amyloid proteins. Annu Rev Biochem 1997, 66, 385–407. 12 HIRAKURA Y., SATOH Y., HIRASHIMA N., SUZUKI T., KAGAN B. L., KIRINO Y. Membrane perturbation by the neurotoxic Alzheimer amyloid fragment β25–35 requires aggregation and β-sheet formation. Biochem Mol Biol Int 1998, 46, 787–794. 13 CHEN Q. S., KAGAN B. L., HIRAKURA Y., XIE C. W.. Impairment of hippocampal long-term potentiation by Alzheimer amyloid β-peptides. J Neurosci Res 2000, 60, 65–72. 14 WALSH D. M., KLYUBIN I., FADEEVA J. V., CULLEN W. K., ANWYL R., WOLFE M. S., ROWAN M. J., SELKOE D. J.. Naturally secreted oligomers of amyloid β protein potently inhibit hippocampal long-term potentiation in vivo. Nature 2002, 416, 483–484. 15 ZHU M., SOUILLAC P. O., IONESCU-ZANETTI C., CARTER S. A., FINK A. L.. Surface-catalyzed amyloid fibril formation. J Biol Chem 2002, 277, 50914–50922. 16 CEPEDA C., HURST R. S., CALVERT C. R., HERNANDEZ-ECHEAGARAY E., NGUYEN O. K., JOCOY E., CHRISTIAN L. J., ARIANO M. A., LEVINE M. S.. Transient and progressive electrophysiological alterations in the corticostriatal pathway in a mouse model of Huntington’s disease. J Neurosci 2003, 23, 961– 969.
References 17 YANG W., DUNLAP J. R., ANDREWS R. B., WETZEL R.. Aggregated polyglutamine peptides delivered to nuclei are toxic to mammalian cells. Hum Mol Genet 2002, 11, 2905–2917. 18 LASHUEL H. A., PETRE B. M., WALL J., SIMON M., NOWAK R. J., WALZ T., LANSBURY P. T., Jr. Alpha-synuclein, especially the Parkinson’s disease-associated mutants, forms pore-like annular and tubular protofibrils. J Mol Biol 2002a, 322, 1089–1102. 19 LASHUEL H. A., HARTLEY D. M., PETRE B. M., WALL J. S., SIMON M. N., WALZ T., LANSBURY P. T., Jr. Mixtures of wild-type and a pathogenic (E22G) form of Aβ40 in vitro accumulate protofibrils, including amyloid pores. J Mol Biol 2003, 332, 795–808. 20 JANSON J., ASHLEY R. H., HARRISON D., MCINTYRE S., BUTLER P. C.. The mechanism of islet amyloid polypeptide toxicity is membrane disruption by intermediate-sized toxic amyloid particles. Diabetes 1999, 48, 491–498. 21 KAYED R., HEAD E., THOMPSON J. L., MCINTIRE T. M., MILTON S. C., COTMAN C. W., GLABE C. G.. Common structure of soluble amyloid oligomers implies common mechanism of pathogenesis. Science 2003, 300, 486–489. 22 FERNANDEZ A., BERRY R. S.. Proteins with H-bond packing defects are highly interactive with lipid bilayers: implications for amy-loidogenesis. Proc Natl Acad Sci USA 2003, 100, 2391–2396. 23 MCLAURIN J., FRANKLIN T., FRASER P. E., CHAKRABARTTY A.. Structural transitions associated with the interaction of Alzheimer β-amyloid peptides with gangliosides. J Biol Chem 1998, 273, 4506–4515. 24 FANDRICH M., FORGE V., BUDER K., KITTLER M., DOBSON C. M., DIEKMANN S.. Myoglobin forms amyloid fibrils by association of unfolded polypeptide segments. Proc Natl Acad Sci USA 2003, 100, 15463–15468. 25 BUCCIANTINI M., CALLONI G., CHITI F., FORMIGLI L., NOSI D., DOBSON C. M., STEFANI M.. Prefibrillar amyloid protein aggregates share common features of
26
27
28
29
30
31
32
33
cytotoxicity. J Biol Chem 2004, 279, 31374–31382. SCHUBERT D., BEHL C., LESLEY R., BRACK A., DARGUSCH R., SAGARA Y., KIMURA H.. 1995, Amyloid peptides are toxic via a common oxidative mechanism. Proc Natl Acad Sci USA 92, 1989–1993. KAWAHARA M., KURODA Y., ARISPE N., ROJAS E.. Alzheimer’s β-amyloid, human islet amylin, and prion protein fragment evoke intracellular free calcium elevations by a common mechanism in a hypothalamic GnRH neuronal cell line. J Biol Chem 2000, 275, 14077– 14083. BUCCIARELLI L. G., WENDT T., RONG L., LALLA E., HOFMANN M. A., GOOVA M. T., TAGUCHI A., YAN S. F., YAN S. D., STERN D. M., SCHMIDT A. M.. RAGE is a multiligand receptor of the immunoglobulin superfamily implications for homeostasis and chronic disease. Cell Mol Life Sci 2002, 59, 1117–1128. BUTTERFIELD D. A., BUSH A. I.. Alzheimer’s amyloid β-peptide (1–42): involvement of methionine residue 35 in the oxidative stress and neurotoxicity properties of this peptide. Neurobiol Aging 2004, 25, 563–568. ARISPE N., POLLARD H. B., ROJAS E.. Giant multi level cation channels formed by Alzheimer disease amyloid-protein [AβP-(1–40)] in bilayer membranes. Proc Natl Acad Sci USA 1993a, 90, 10573–10577. ARISPE N., ROJAS E., POLLARD H. B.. Alzheimer disease amyloid β protein forms calcium channels in bilayer membranes: blockade by tromethamine and aluminum. Proc Natl Acad Sci USA 1993b, 90, 567–571. MIRZABEKOV T., LIN M. C., YUAN W. L., MARSHALL P. J., CARMAN M., TOMASELLI K., LIEBERBURG I., KAGAN B. L.. Channel formation in planar lipid bilayers by a neurotoxic fragment of the β-amyloid peptide. Biochem Biophys Res Commun 1994, 202, 1142–1148. HIRAKURA Y., LIN M. C., KAGAN B. L.. Alzheimer amyloid aβ1–42 channels: effects of solvent, pH, and Congo Red. J Neurosci Res 1999, 57, 458–466.
19
20 Cellular Toxicity of Protein Aggregates
34 KOURIE J. I., HANNA E. A., HENRY C. L. Properties and modulation of a human atrial natriuretic peptide (a-hANP)-formed ion channels. Can J Physiol Pharmacol 2001, 79, 654–664. 35 FRASER S. P., SUH Y. H., DJAMGOZ M. B.. Ionic effects of the Alzheimer’s disease β-amyloid precursor protein and its metabolic fragments. Trends Neurosci 1997, 20, 67–72. 36 ZHU Y. J., LIN H., LAL R.. Fresh and nonfibrillar amyloid β protein(1–40) induces rapid cellular degeneration in aged human fibroblasts: evidence for AβP-channel-mediated cellular toxicity. FASEB J 2000, 14, 1244–1254. 37 HIRAKURA Y., YIU W. W., YAMAMOTO A., KAGAN B. L.. Amyloid peptide channels: blockade by zinc and inhibition by Congo red (amyloid channel block). Amyloid 2000a, 7, 194–199. 38 KOURIE J. I., CULVERSON A. L., FARRELLY P. V., HENRY C. L., LAOHACHAI K. N.. Heterogeneous amyloid-formed ion channels as a common cytotoxic mechanism: implications for therapeutic strategies against amyloidosis. Cell Biochem Biophys 2002, 36, 191–207. 39 MIRZABEKOV T. A., LIN M. C., KAGAN B. L.. Pore formation by the cytotoxic islet amyloid peptide amylin. J Biol Chem 1996, 271, 1988–1992. 40 LIN M. C., KAGAN B. L. Electrophysiologic properties of channels induced by Aβ25–35 in planar lipid bilayers. Peptides 2002, 23, 1215–1228. 41 SONG L., HOBAUGH M. R., SHUSTAK C., CHELEY S., BAYLEY H., GOUAUX J. E.. Structure of staphylococcal α-hemolysin, a heptameric transmembrane pore. Science 1996, 274, 1859–1866. 42 PETOSA C., COLLIER R. J., KLIMPEL K. R., LEPPLA S. H., LIDDINGTON R. C.. Crystal structure of the anthrax toxin protective antigen. Nature 1997, 385, 833– 838. 43 QI J. S., QIAO J. T.. Amyloid β-protein fragment 31–35 forms ion channels in membrane patches excised from rat hippocampal neurons. Neuroscience 2001, 105, 845–852. 44 WEISS J. H., PIKE C. J., COTMAN C. W. Ca2+ channel blockers attenuate
45
46
47
48
49
50
51
52
53
β-amyloid peptide toxicity to cortical neurons in culture. J Neurochem 1994, 62, 372–375. FURUKAWA K., ABE Y., AKAIKE N.. Amyloid β protein-induced irreversible current in rat cortical neurones. Neuroreport 1994, 5, 2016–2018. SANDERSON K. L., BUTLER L., INGRAM V. M.. Aggregates of a β-amyloid peptide are required to induce calcium currents in neuron-like human teratocarcinoma cells: relation to Alzheimer’s disease. Brain Res 1997, 744, 7–14. KAWAHARA M., ARISPE N., KURODA Y., ROJAS E.. Alzheimer’s disease amyloid β-protein forms Zn2+ -sensitive, cation-selective channels across excised membrane patches from hypothalamic neurons. Biophys J 1997, 73, 67–75. BHATIA R., LIN H., LAL R.. Fresh and globular amyloid β protein (1–42) induces rapid cellular degeneration: evidence for AβP channel-mediated cellular toxicity. FASEB J 2000, 14, 1233–1243. ARISPE N., DOH M.. Plasma membrane cholesterol controls the cytotoxicity of Alzhei mer’s disease AβP (1–40) and (1–42) peptides. FASEB J 2002, 16, 1526–1536. KIM H. S., LEE J. H., LEE J. P., KIM E. M., CHANG K. A., PARK C. H., JEONG S. J., WITTENDORP M. C., SEO J. H., CHOI S. H., SUH Y. H.. Amyloid β peptide induces cytochrome C release from isolated mitochondria. Neuroreport 2002, 13, 1989–1993. DEARMOND S. J., PRUSINER S. B.. Perspectives on prion biology, prion disease pathogenesis, and pharmacologic approaches to treatment. Clin Lab Med 2003, 23, 1–41. PAN K. M., BALDWIN M., NGUYEN J., GASSET M., SERBAN A., GROTH D., MEHLHORN I., HUANG Z., FLETTERICK R. J., COHEN F. E., PRUSINER S. B.. Conversion of α-helices into β-sheets features in the formation of the scrapie prion proteins. Proc Natl Acad Sci USA 1993, 90, 10962–10966. GASSET M., BALDWIN M. A., LLOYD D. H., GABRIEL J. M., HOLTZMAN D. M., COHEN F., FLETTERICK R., PRUSINER S. B..
References
54
55
56
57
58
59
60
61
62
Predicted α-helical regions of the prion protein when synthesized as peptides form amyloid. Proc Natl Acad Sci USA 1992, 89, 10940–10944. KAZLAUSKAITE J., SANGHERA N., SYLVESTER I., VENIEN-BRYAN C., PINHEIRO T. J.. Structural changes of the prion protein in lipid membranes leading to aggregation and fibrillization. Biochemistry 2003, 42, 3295–3304. FORLONI G., CHIESA R., SMIROLDO S., VERGA L., SALMONA M., TAGLIAVINI F., ANGERETTI N.. Apoptosis mediated neurotoxicity induced by chronic application of β amyloid fragment 25–35. Neuroreport 1993, 4, 523–526. LIN M. C., MIRZABEKOV T., KAGAN B. L.. Channel formation by a neurotoxic prion protein fragment. J Biol Chem 1997, 272, 44–47. KOURIE J. I., CULVERSON A.. Prion peptide fragment PrP[106–126] forms distinct cation channel types. J Neurosci Res 2000, 62, 120–133. MANUNTA M., KUNZ B., SANDMEIER E., CHRISTEN P., SCHINDLER H.. Reported channel formation by prion protein fragment 106–126 in planar lipid bilayers cannot be reproduced [Letter]. FEBS Lett 2000, 474, 255–256. BAHADI R., FARRELLY P. V., KENNA B. L., KOURIE J. I., TAGLIAVINI F., FORLONI G., SALMONA M.. Channels formed with a mutant prion protein PrP(82–146) homologous to a 7-kDa fragment in diseased brain of GSS patients. Am J Physiol Cell Physiol 2003, 285, C862–C872. HEGDE R. S., MASTRIANNI J. A., SCOTT M. R., DEFEA K. A., TREMBLAY P., TORCHIA M., DEARMOND S. J., PRUSINER S. B., LINGAPPA V. R.. A transmembrane form of the prion protein in neurodegenerative disease. Science 1998, 279, 827–834. DOH-URA K., IWAKI T., CAUGHEY B.. Lysosomotropic agents and cysteine protease inhibitors inhibit scrapie-associated prion protein accumulation. J Virol 2000, 74, 4894–4897. SANDBERG M. K., WALLEN P., WIKSTROM M. A., KRISTENSSON K.. Scrapie-infected
63
64
65
66
67
68
69
70
71
GT1–1 cells show impaired function of voltagegated N-type calcium channels (Ca(v) 2.2) which is ameliorated by quinacrine treatment. Neurobiol Dis 2004, 15, 143–151. NAKAJIMA M., YAMADA T., KUSUHARA T., FURUKAWA H., TAKAHASHI M., YAMAUCHI A., KATAOKA Y. Results of quinacrine administration to patients with Creutzfeldt-Jakob disease. Dement Geriatr Cogn Disorders 2004, 17, 158–163. KORTH C., MAY B. C., COHEN F. E., PRUSINER S. B.. Acridine and phenothiazine derivatives as pharmacotherapeutics for prion disease. Proc Natl Acad Sci USA 2001, 98, 9836–9841. INGROSSO L., LADOGANA A., POCCHIARI M.. Congo red prolongs the incubation period in scrapie-infected hamsters. J Virol 1995, 69, 506–508. WESTERMARK P., WILANDER E.. The influence of amyloid deposits on the islet volume in maturity onset diabetes mellitus. Diabetologia 1978, 15, 417– 421. BUTLER A. E., JANSON J., SOELLER W. C., BUTLER P. C.. Increased β-cell apoptosis prevents adaptive increase in β-cell mass in mouse model of type 2 diabetes: evidence for role of islet amyloid formation rather than direct action of amyloid. Diabetes 2003, 52, 2304–2314. MCLEAN L. R., BALASUBRAMANIAM A.. Promotion of β-structure by interaction of diabetes associated polypeptide (amylin) with phosphatidylcholine. Biochim Biophys Acta 1992, 1122, 317–320. ANGUIANO M., NOWAK R. J., LANSBURY P. T., Jr. Protofibrillar islet amyloid polypeptide permeabilizes synthetic vesicles by a pore-like mechanism that may be relevant to type II diabetes. Biochemistry 2002, 41, 11338–11343 MCCARTHY R. E., 3rd, KASPER E. K.. A review of the amyloidoses that infiltrate the heart. Clin Cardiol 1998, 21, 547–552. KOURIE J. I.. Characterization of a C-type natriuretic peptide (CNP-39)-formed cation-selective channel from platypus (Ornithor-hynchus anatinus) venom. J Physiol 1999, 518, 59–69.
21
22 Cellular Toxicity of Protein Aggregates
72 SIPE J. D.. Serum amyloid A: from fibril to function. Current status. Amyloid 2000, 7, 10–12. 73 HIRAKURA Y., CARRERAS I., SIPE J. D., KAGAN B. L.. Channel formation by serum amyloid A: a potential mechanism for amyloid pathogenesis and host defense. Amyloid 2002, 9, 13–23. 74 SCHEIN S. J., KAGAN B. L., FINKELSTEIN A.. Colicin K acts by forming voltage-dependent channels in phospholipid bilayer membranes. Nature 1978, 276, 159–163. 75 KAGAN B. L.. Mode of action of yeast killer toxins: channel formation in lipid bilayer membranes. Nature 1983, 302, 709–711. 76 KAGAN B. L., SELSTED M. E., GANZ T., LEHRER R. I.. Antimicrobial defensin peptides form voltage-dependent ion-permeable channels in planar lipid bilayer membranes. Proc Natl Acad Sci USA 1990, 87, 210–214. 77 SOKOLOV Y., MIRZABEKOV T., MARTIN D. W., LEHRER R. I., KAGAN B. L.. Membrane channel formation by antimicrobial protegrins. Biochim Biophys Acta 1999, 1420, 23–29. 78 WANG L., LASHUEL H. A., WALZ T., COLON W. Murine apolipoprotein serum amyloid A in solution forms a hexamer containing a central channel. Proc Natl Acad Sci USA 2002, 99, 15947– 15952. 79 BAPTISTA M. J., COOKSON M. R., MILLER D. W.. Parkin and α-synuclein: opponent actions in the pathogenesis of Parkinson’s disease. Neuroscientist 2004, 10, 63–72. 80 MCKEITH I. G., MOSIMANN U. P.. Dementia with Lewy bodies and Parkinson’s disease. Parkinsonism Relat Disord 2004, 10 (Suppl 1), S15–S18. 81 VOLLES M. J., LANSBURY P. T., Jr. Vesicle permeabilization by protofibrillar α-synuclein is sensitive to Parkinson’s disease-linked mutations and occurs by a pore-like mechanism. Biochemistry 2002, 41, 4595–4602. 82 LASHUEL H. A., HARTLEY D., PETRE B. M., WALZ T., LANSBURY P. T., Jr. Neurodegenerative disease: amyloid
83
84
85
86
87
88
89
90
91
92
93
pores from pathogenic mutations. Nature 2002b, 418, 291. AZIMOVA R. K., KAGAN B. L.. Ion channels formed by a fragment of α-synuclein (NAC) in lipid membranes. Biophys J 2003, 84, 53a. DRU¨ EKE T. B.. Dialysis-related amyloidosis. Nephrol Dial Transplant 1998, 13 (Suppl 1), 58–64. BRINCKERHOFF C. E., MITCHELL T. I., KARMILOWICZ M. J., KLUVE-BECKERMAN B., BENSON M. D.. Autocrine induction of collagenase by serum amyloid A-like and β 2 -microglobulin-like proteins [see Comments]. Science 1989, 243, 655– 657. MOE S. M., SPRAGUE S. M.. Beta 2-microglobulin induces calcium efflux from cultured neonatal mouse calvariae. Am J Physiol 1992, 263, F540–F545. PETERSEN J., KANG M. S.. In vivo effect of β 2 -microglobulin on bone resorption. Am J Kidney Dis 1994, 23, 726–730. KAGAN B. L., HIRAKURA Y., AZIMOV R., AZIMOVA R.. The channel hypothesis of Huntington’s disease. Brain Res Bull 2001, 56, 281–284. ZHU M., HAN S., ZHOU F., CARTER S. A., FINK A. L.. Annular oligomeric amyloid intermediates observed by in situ atomic force microscopy. J Biol Chem 2004, 279, 24452–24459. LI S. H., LI X. J.. Huntingtin-protein interactions and the pathogenesis of Huntington’s disease. Trends Genet 2004, 20, 146–154. HIRAKURA Y., AZIMOV R., AZIMOVA R., KAGAN B. L.. Polyglutamine-induced ion channels: a possible mechanism for the neurotoxicity of huntington and other CAG repeat diseases. J Neurosci Res 2000b, 60, 490–494. MONOI H., FUTAKI S., KUGIMIYA S., MINAKATA H., YOSHIHARA K.. Poly-l-glutamine forms cation channels: relevance to the pathogenesis of the polyglutamine diseases [see Comments]. Biophysical J 2000, 78, 2892–2899. PANOV A. V., GUTEKUNST C. A., LEAVITT B. R., HAYDEN M. R., BURKE J. R., STRITTMATTER W. J., GREENAMYRE J. T.. Early mitochondrial calcium defects in Huntington’s disease are a direct effect
References
94
95
96
97
98
99
of polyglutamines. Nat Neurosci 2002, 5, 731–736. STEFANI M., DOBSON C. M.. Protein aggregation and aggregate toxicity new insights into protein folding, misfolding diseases and biological evolution. J Mol Med 2003, 81, 678–699. RELINI A., TORRASSA S., ROLANDI R., GLIOZZI A., ROSANO C., CANALE C., BOLOGNESI M., PLAKOUTSI G., BUCCIANTINI M., CHITI F., STEFANI M.. Monitoring the process of HypF fibrillization and liposome permeabilization by protofibrils. J Mol Biol 2004, 338, 943–957. RECHES M., PORAT Y., GAZIT E.. Amyloid fibril formation by pentapeptide and tetrapeptide fragments of human calcitonin. J Biol Chem 2002, 277, 35475–35480. STIPANI V., GALLUCCI E., MICELLI S., PICCIARELLI V., BENZ R.. Channel formation by salmon and human calcitonin in black lipid membranes. Biophys J 2001, 81, 3332–3338. IBRAHIM H. R., THOMAS U., PELLEGRINI A.. A helix-loop-helix peptide at the upper lip of the active site cleft of lysozyme confers potent antimicrobial activity with membrane permeabilization action. J Biol Chem 2001, 276, 43767–43774. MALISAUSKAS M., ZAMOTIN V., JASS J., NOPPE W., DOBSON C. M., MOROZOVA-ROCHE L. A.. Amyloid protofilaments from the calcium-binding protein equine lysozyme: formation of ring and linear structures depends on pH and metal ion concentration. J Mol Biol 2003, 330, 879–890.
100 FARRELLY P. V., KENNA B. L., LAOHACHAI K. L., BAHADI R., SALMONA M., FORLONI G., KOURIE J. I.. Quinacrine blocks PrP (106–126)-formed channels. J Neurosci Res 2003, 74, 934–941. 101 HIRAKURA Y., KAGAN B. L.. Pore formation by β-2-microglobulin: a mechanism for the pathogenesis of dialysis associated amyloidosis. Amyloid 2001, 8, 94–100. 102 KOURIE J. I., HENRY C. L., FARRELLY P.. Diversity of amyloid jl protein fragment [1–40]-formed channels. Cell Mol Neurobiol 2001b, 21, 255–284. 103 LIN H., ZHU Y. J., LAL R.. Amyloid β protein (1–40) forms calcium-permeable, Zn +-sensitive channel in reconstituted lipid vesicles. Biochemistry 1999, 38, 11189–11196. 104 LIN M. C.. Channel formation by amyloidogenic neurotoxic and neurodegenerative disease related peptides. PhD Dissertation, Division of Neuroscience, UCLA, 1996. 105 RODRIGUES C. M., SOLA S., BRITES D.. Bilirubin induces apoptosis via the mitochondrial pathway in developing rat brain neurons. Hepatology 2002, 35, 1186–1195. 106 SIMMONS M. A., SCHNEIDER C. R.. Amyloid β peptides act directly on single neurons. Neurosci Lett 1993, 150, 133–136. 107 YANG A. J., KNAUER M., BURDICK D. A., GLABE C.. Intracellular A β1–42 aggregates stimulate the accumulation of stable, insoluble amy-loidogenic fragments of the amyloid precursor protein in transfected cells. J Biol Chem 1995, 270, 14786–14792.
23
1
Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases Ronald Wetzel University of Tennessee, Knoxville, USA
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
There are at least 15 human diseases associated with the genetic expansion of a chromosomal trinucleotide repeat sequence [1–3]. Some of these occur in noncoding regions, while others are located in open reading frames coding for a variety of proteins. Most of the diseases associated with expressed protein repeats involve CAG repeats coding for polyglutamine (polyGln). The common feature of DNA that underlies the genetic nature of these diseases is the genetic instability of triplet repeats of some of the 64 possible DNA nucleotide triplets [1]. Currently there are nine known diseases involving expansion of a polyGln sequence [1]. While these diseases on average vary one from the other in details of brain pathology and symptoms [1], there are also significant similarities. The polyGln repeat length threshold separating the normal population from at-risk individuals is strikingly similar for eight of the nine diseases [1]. All nine diseases are progressive and uniformly lethal and tend to present in midlife [3]. Of particular significance to this chapter, polyGln aggregates have been demonstrated in each of these diseases [3]. The demonstration of polyGln-containing inclusions in patient material [4] and in animal and cellular models [5–7] also suggested their resemblance to other neurodegenerative diseases [8] involving protein deposition into amyloid or other aggregated deposits. As a subject for studying protein aggregation diseases at the molecular level, however, the polyGln diseases have pluses and minuses. On the one hand, the uniform monotony of glutamine that constitutes the pathogenic core of the disease proteins does not inspire immediate insights into disease
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
mechanisms or protein-folding mechanisms. Indeed, standard site-directed mutagenesis approaches are technically difficult at the DNA level, due to the lack of unique internal restriction sites [9], and in any case may be of questionable value at the protein level, due to the likely ability of the sequence to shift and slide while aggregating so as to accommodate, and negate the effect of, potential modifying mutations [10]. On the other hand, there is something attractive about the simplicity of a homopolymer as a subject for addressing fundamental problems in protein folding and aggregation; for example, studying a real-world homopolymer should provide data more easily related to theoretical protein-folding studies that are often based on highly simplified model structures. A significant additional attraction to studying these diseases is the prospect of contributing to the development of therapies. As summarized below, protein-folding and aggregation studies of these sequences in vitro are making significant contributions to a better understanding of the molecular basis of these diseases and along the way are providing unique systems for better characterizing the role of protein structure in ordered aggregation processes.
2 Key Features of the Polyglutamine Diseases
The literature on the clinical, genetic, cellular, and biochemical aspects of the expanded polyGln diseases has grown enormously since the discovery of the disease genes over the past decade. There are excellent books reviewing the various aspects of each disease [11, 12], and a steady stream of review articles and essays discuss possible disease mechanisms from various points of view [2, 13–31]. These reviews reflect the lively, ongoing debate on the importance of polyGln oligomers and aggregates in the disease mechanism. The discussion here is very brief and necessarily centered on the toxicity of polyGln peptides, especially the possibility that polyGln aggregates play a toxic role. 2.1 The Variety of Expanded PolyGln Diseases
There are currently nine human diseases linked to a genetic expansion in an in-frame, translated CAG repeat [3], of which the most recently identified is spinocerebellar ataxia 17, involving expansions in the transcription factor TBP, or TATA-binding protein [32]. Table 1 lists the known diseases, the disease protein involved, its molecular weight and subcellular localization, and the CAG repeat lengths for the normal and pathological ranges. The most well known and most common of these diseases is Huntington’s disease (HD). There are a number of genetic ataxias whose disease gene has not yet been identified [33, 34], and at least some of these may eventually be found to also involve CAG repeat expansion.
2 Key Features of the Polyglutamine Diseases Table 1 Overview of expanded CAG repeat diseasesa
Disease
Gene product
Cellular localization
Normal CAG Mutant CAG MW (kDa) repeat range repeat range
Huntington’s DRPLAb SBMAc SCA1d SCA2d SCA3d /MJDe SCA6d SCA7d SCA17d
huntingtin atrophin-1 androgen rec. ataxin-1 ataxin-2 ataxin-3 CACNA1Af ataxin-7 TBPg
Cytoplasm/nucleus Cytoplasm/nucleus Cytoplasm/nucleus Cytoplasm/nucleus Cytoplasm Cytoplasm/nucleus Cytoplasm Cytoplasm/nucleus Nucleus
348 190 104 87 145 46 280 95 38
6–39 3–35 9–33 6–44 13–33 3–40 4–19 4–35 24–44
36–>200 49–88 38–65 39–83 32–>200 54–89 20–33 37–306 46–63
a
Data from Ref. [12]. Dentatorubral-pallidoluysian atrophy c Spinal and bulbar muscular atrophy d Spinocerebellar ataxia e Machado-Joseph Disease f α 1A voltage-dependent calcium channel g TATA box-binding protein b
2.2 Clinical Features
Except for the X-linked spinal and bulbar muscular atrophy (SBMA), involving CAG repeat expansions in the gene for the androgen receptor, the other eight known polyGln diseases exhibit autosomal dominant genetics [3]. Huntington’s disease normally presents as a motor disorder, although a significant minority of patients present with psychiatric symptoms [12]. The expanded CAG repeat ataxias involve a wide range of symptoms and significant variability [3]. Early-onset forms tend to present with distinct, more-aggressive symptoms that set them apart from the later-onset forms. The large variations in the course of disease for patients with the same CAG repeat length suggest important roles for secondary factors, including environment and modifying genes, and these are currently under investigation in those diseases with statistically significant amounts of patient data. 2.2.1 Repeat Expansions and Repeat Length DNA triplet expansions leading to genetic disease can occur in expressed open reading frames or in non-coding regions [1]. Of the possible triplet repeat sequences, only a few are known to be involved in triplet expansion genetic disorders [1]. Although it is presumed that there are unique structural aspects to the DNA of those triplets whose instability during one or more enzymatic process in the cell leads to expansion-based diseases, the structural basis of repeat instability is unclear and continues to be evaluated [35]. The stability of the CAG repeat, in particular, appears to depend on a number of factors, including the length of the repeat before expansion, its sequence context, and whether the donor of that gene
3
4 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
is the mother or the father [3, 36]. The normal role of benign repeat lengths of the polyGln sequence in the various proteins in which it is found remains a mystery. In fact, since homologues of the same protein often vary considerably in their polyGln repeat length, while other parts of the protein may show very little variation, it is possible that polyGln in most cases is not functionally important. The hypothesis that normal-length polyGln sequences serve no important function that might be compromised upon sequence expansion is consistent with the widely accepted view that expanded CAG repeat diseases are predominantly gain-of-function disorders [3]. Table 1 shows the clear separation of benign repeat lengths from pathological repeat lengths in the expanded CAG repeat diseases. In all cases but one (SCA 6, which uniquely involves a resident transmembrane channel protein), the pathological cutoff lies in the mid- to upper-30s. In HD, within this critical region, repeat length also influences penetrance, the percentage of individuals with the mutation that eventually develop disease symptoms [37]. Within the pathological repeat length range, age of onset correlates strongly with repeat length, as shown for HD in Figure 1. One of the clinical characteristics of the disease, recognized prior to the discovery of the disease genes, is that of anticipation, in which a parent presents with a disease only after it is already recognized in a child. With the discovery of the dynamic genetic nature of the triplet repeat diseases, anticipation is now understood by the fact that repeat expansions, especially when passed on by the father, can be large, and large repeat expansions tend to dramatically decrease age of onset (Figure 1) [1]. 2.3 The Role of PolyGln and PolyGln Aggregates
Although a few genetic ataxias involve CAG expansions in non-coding regions [33, 34], the bulk of the characterized disorders are associated with expression of the polyGln sequence at the protein level. Loss of function may play a role in some mutations in the gene associated with SCA 6 [3]. Further, a decline in normal levels of active protein, perhaps mediated by aggregation, may also contribute to symptoms in some expanded CAG repeat diseases [38, 39]. On the whole, however, the dominant effect of polyGln expansion appears to be a gain of toxic function. It seems very unlikely that the expanded polyGln disorders, with their often overlapping clinical features and very similar genetics, do not share intimate details of their molecular mechanisms, and it is further unlikely that the molecular mechanisms involving these otherwise unrelated protein sequences are not centered in the behavior of the polyGln sequence. This is further supported by the generation of animal models based on expression of artificial proteins containing in-frame CAG repeats spliced to proteins not known to be associated with disease [5, 40]. Opinion in the field has been divided as to whether expanded polyGln is cytotoxic due to the induction, within the soluble monomer, of a toxic conformation or, alternatively, to the accelerated aggregation of the monomer into a toxic, oligomeric state. Although the ability of a series of antibodies to bind preferentially to expanded
2 Key Features of the Polyglutamine Diseases
Fig. 1 HD CAG repeat length is correlated with the age at onset of neurologic symptoms. The relationship between the expanded HD CAG repeat length (“Number of CAG Repeat Units”) and the age at neurologic onset is given for 1070 HD patients reported in the literature. The mean age at onset for any given HD CAG repeat length is depicted by filled symbol. Power
regression analysis reveals a significant inverse correlation between expanded CAG repeat size and age at onset (r = −0.87, P < 0:0001), although individuals with the same expanded repeat exhibit widely different ages at onset, suggesting the existence of modifiers of the disease process. Figure and analysis courtesy of Marcy MacDonald.
polyGln has been interpreted as evidence for the existence of a novel conformation within the monomer, recent studies suggest that the increased avidity observed for such antibodies can be explained by the multiplicity of independent, oligoGlnbinding sites in the expanded sequence (a linear lattice effect) [41]. Most other data is consistent with no significant change in the conformation of bulk-phase polyGln upon its sequence expansion. In contrast, substantial evidence points to a central role for polyGln aggregation in the disease process [25, 27–29], beginning with the observation of intraneuronal, nuclear inclusions (NII) in cell and animal models of polyGln disease [5–7]. Similar aggregates are found in brain tissue from HD and other diseases listed in Table 1, but not in normal, aged control brain [3, 4]. Genetically identified suppressers of polyGln toxicity, such as PQE1 [42] and dHDJ1 [40], have been shown to be aggregation suppressors in vitro (see Section 8). Further, designed polypeptide-based inhibitors of polyGln aggregation also protect cells from the
5
6 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
toxic effects of expanded polyGln sequences [43] and polyGln aggregates [44] (see Section 8.1). As discussed in detail below, the repeat length–dependent aggregation properties of polyGln closely mirror the repeat-length dependence of disease risk shown in Table 1. Also as discussed below, polyGln aggregates made in vitro and delivered to the nuclei of cells in culture are very cytotoxic [45]. Although the appearance of visible aggregates does not always closely correlate with cytotoxicity in cell and animal models [2, 17, 21], there are reasonable explanations for how such observations might still be consistent with cytotoxic aggregates. Some of these rationales will be alluded to in this chapter.
3 PolyGln Peptides in Studies of the Molecular Basis of Expanded Polyglutamine Diseases
If the effect of sequence expansion on the physicochemical properties of the polyGln sequence plays a central role in the expanded CAG repeat diseases, then it is of particular importance to study polyGln behavior with respect to repeat length in well-defined biophysical experiments. Consists of an overview of studies using either chemically synthesized peptides or recombinant proteins to conduct studies on the conformational preferences of soluble polyGln and on the aggregation efficiency of these sequences. The poor solution behavior of the polyGln sequence has provided substantial barriers to conducting such studies, and much has been learned simply in the effort to produce molecules amenable to analysis. 3.1 Conformational Studies
Initial studies on the favored conformation of the polyGln sequence gave conflicting results. Working with chemically synthesized polyGln sequences well below the pathological threshold repeat length, Altschuler et al. found the sequence to exist in random coil [46], while Sharma et al. found peptides of similar length to be dominated by β-sheet [47]. This discrepancy was resolved with the later realization of the importance of applying rigorous methods of peptide disaggregation as part of the solubilization protocol [48]. Using such protocols, Chen et al. showed that polyGln sequences in solution in PBS exhibit the signature CD spectrum of a statistical coil [49]. More importantly, they also showed that the CD spectra of polyGln peptides both below and above the pathological threshold repeat length give essentially identical CD spectra [49], providing the first experimental indication that a pure conformational change within a monomeric polyGln protein is not likely to underlie the repeat-length dependence of clinical features. Similar results were later obtained with various polyGln repeat lengths within the context of E. coli-expressed fusion proteins containing either naked polyGln sequences [50] or polyGln sequences embedded in a fragment of huntingtin [41]. Importantly, these
3 PolyGln Peptides in Studies of the Molecular Basis of Expanded Polyglutamine Diseases
studies also show that results obtained with synthetic peptides, such as those described here, are not compromised by the disaggregation protocol. The above discussion notwithstanding, the CD spectrum generally assigned to the random coil state is not unambiguous. In fact, a recent study has confirmed earlier suggestions that polypeptides exhibiting the random coil CD spectrum might actually exist in the ordered conformation known as polyproline type II helix [51]. Indeed, recent studies suggest that the amino acid glutamine is particularly comfortable in this form of secondary structure [52]. While the exact nature of the conformation of monomeric polyGln may ultimately be of significant importance to understanding disease mechanisms, the main important lesson to date from solution studies is that there appears to be no difference between the solution structures of polyGln sequences below and above the pathological repeat length range. Absent some more subtle conformational difference between short and long polyGln, some other aspect of the polyGln sequence must underlie the repeatlength dependence of disease risk. 3.2 Preliminary in vitro Aggregation Studies
With the revelation that expanded polyGln sequences are responsible for the diseases listed in Table 1, and in light of the importance of protein aggregation in other neurodegenerative diseases such as Alzheimer’s disease, Perutz speculated that polyGln expansion might lead to an increased propensity of proteins containing these sequences to aggregate and thereby cause disease [53]. Working with a Q15 sequence flanked by pairs of charged residues, the Perutz lab reported facile aggregation of the peptide in aqueous solution to yield aggregates exhibiting the cross-β structure characteristic of amyloid [53]. In analogy to the leucine zippers designed by nature to promiscuously self-associate in the creation of dimeric transcription factors, Perutz coined the term “polar zipper” for the tendency of polyGln sequences to aggregate, presenting a model in which hydrogen bonding by side chain amide groups contributes to the stability of the aggregate [53]. Scherzinger et al. were able to overcome many of the limitations encountered in the above preliminary studies by using a recombinantly expressed form of glutathione S-transferase (GST) fused with the exon 1 fragment of huntingtin containing the polyGln sequence [54, 55]. Fusion of GST with htt exon 1 renders the protein reasonably well behaved so that it can be extracted and partially purified while maintaining it in a soluble form. By the ingenious use of an installed protease site, Scherzinger et al. were able to cleave this soluble protein between its GST and htt components in a controlled reaction, releasing the polyGln portion in situ so that its aggregation could be monitored in a synchronized reaction [54]. By doing so, and by following the generation of aggregate using a filter trap assay, they were able to observe a clear increase in the aggressiveness of spontaneous aggregation as polyGln repeat length increased, with a significant increase in aggregation rate as repeat length increased through the pathological cutoff range [54, 55]. This correspondence between the repeat length threshold for disease risk and the threshold
7
8 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
for rapid aggregation continues to be one of the most striking features of polyGln behavior in solution, and it has been invoked as supporting evidence for hypotheses involving aggregation in the disease mechanism. 3.3 In Vivo Aggregation Studies
In addition to contributing to our understanding of the disease mechanism, cell and animal models involving recombinant expression of polyGln-containing disease proteins have contributed greatly to our knowledge of the aggregation process itself. The dependence of aggregation efficiency on polyGln repeat length was characterized qualitatively in a number of in vivo experiments comparing a limited variety of repeat length constructs [6, 7, 54, 56]. At the same time, it is also possible for non-pathological repeat length polyGlns to form aggregates under certain conditions in the cell [57], emphasizing that even sequences below the pathological threshold have some potential to aggregate. Transgene experiments have implicated a modulating role for flanking sequences and/or flanking domains in the aggregation of expanded polyGln, consistent with the possibility that proteolytic processing may play an important role in controlling aggregation and toxicity in the cell. For example, mice expressing exon 1–like fragments of htt tend to develop aggregates (and pathology) more rapidly and aggressively than do mice expressing full-length htt [26]. Besides the mammalian systems mentioned here, a number of other organisms, such as yeast [58], D. melanogaster and C. elegans [59], have been used to learn details about how polyglutamine aggregates in the cellular environment. The promiscuity of polyGln aggregation was noted early on in the observation that an aggregate initiated by an expanded polyGln sequence is capable of recruiting the normal-length polyGln protein into the deposit [57]. More recently, recruitment of the normal allele of htt by expanded polyGln htt aggregates has also been observed, suggesting that expanded CAG repeat diseases may include an overlay of loss of function in the disease mechanism [39]. The possible role of generalized polyGln aggregation in the cell was also suggested by the observation that expression of expanded polyGln containing a nuclear localization signal (NLS) could mediate nuclear localization of green fluorescent protein (GFP) linked to a normal-length polyGln [60]. One of the intriguing possible mechanisms of polyGln cytotoxicity is that cellular polyGln aggregates might deplete the soluble cellular pools of other polyGln-containing factors by recruiting them into the aggregate in an ongoing elongation reaction [57, 61, 62]. Such mechanisms are supported by the observation of co-localization of TBP [57, 61] and CBP (CREB-binding protein) [60, 63] to cellular aggregates in tissue from patients and/or from cell and animal models.
4 Analyzing Polyglutamine Behavior With Synthetic Peptides: Practical Aspects
Given the limited success of previous studies of polyGln behavior using synthetic peptides and the comparative success of studies using recombinant proteins in vitro
4 Analyzing Polyglutamine Behavior With Synthetic Peptides: Practical Aspects
and in vivo, one might legitimately question the value of putting further efforts into the chemical approach. There are a number of reasons for doing so, however. While polypeptides, even unnatural fusions, expressed in a cellular environment might be considered to be “more native” than chemically synthesized peptides exposed, even transiently, to organic solvents, the situation is complicated by the central role of the prior aggregation state of a protein in biasing its aggregation behavior in vitro. This is illustrated by early experiments in the Alzheimer’s disease field, in which studies of the amyloid peptide Aβ were confused by investigators’ ignorance of the aggregation state of the peptide and the importance it plays in cytotoxicity and aggregation. This led to an odd lore concerning certain manufacturers’ batches of synthetic peptide as being “good” (cytotoxic right out of the vial) or “bad” (nontoxic) batches. Eventually, both cytotoxicity [64] and aggressive aggregation ability [65] were clearly shown to be attributable to traces of aggregates resident in the vialed material. As shown below, methods now exist for rigorously removing preexisting aggregates from synthetic peptides. Such methods are chemically benign, allow complete removal of the organic solvent, and are compatible with peptides whose natural state as a soluble monomer in native buffer is relatively unstructured. In contrast, recombinant proteins are potentially much more complex and, once nonnative assembly occurs, the unnatural oligomeric state may be much more difficult to recognize and reverse. For example, expression of a GST-AT-3 fusion containing a Q27 sequence led to an enriched, soluble protein that, however, migrated as an oligomer on native gel filtration [66]; similar soluble oligomers can be observed in isolation of htt exon-1 fusion proteins (G. Thiagaragan and R. Wetzel, unpublished). Clearly, it is possible, at least with relatively short polyGln repeats, to prepare monomeric, folded proteins by recombinant expression [41], but a great level of care and analytical scrutiny is required to insure that the preparation is aggregate-free, and the challenges increase dramatically with increasing repeat length. Thus, there may be situations in which chemically synthesized peptides, exposed transiently to non-aqueous solvents to effect efficient disaggregation, are “more native” than a recombinant protein prepared and maintained in aqueous buffers. An additional attractive feature of focusing on chemically synthesized polyGln peptides is that optical studies are not complicated by the presence of other segments of polypeptide. For example, this allows relatively clean interpretation of the CD spectra of subject polypeptides without having to be concerned about the contributions of other domains and how those other domains may or may not also be changing during the experiment. Work with synthetic peptides also allows facile use of chemical tags such as biotin [67] and fluorescein [45], as well as non-natural amino acids such as D-amino acids [10]. Synthetic peptides often make it more possible to conduct studies requiring relatively large amounts of material and to accomplish the studies in a shorter time. Finally, while it is clear that the most appropriate studies, if they can be done on pristine protein samples, are those that are chemically and physically most similar to the toxic fragments generated in vivo, it is also true that studies on simple polyGln peptides will allow baselines to be established for their biophysical behavior, so that further studies on the sequence in the natural settings of the disease proteins
9
10 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
can be placed into context and the specific effects of sequence context be cleanly appreciated. 4.1 Disaggregation of Synthetic Polyglutamine Peptides
In the review of the aggregation literature in Section 3.2, one troubling inconsistency stands out. Work with recombinant protein fragments yields results on the repeat-length dependence of aggregation that mirror the repeat-length dependence of disease risk in HD and other expanded CAG diseases, with sequences up to about Q35 being relatively resistant to aggregation. In contrast, however, previous work with chemically synthesized peptides established an apparent solubility ceiling of Q15 –Q22 , above which it appeared that peptides could not be studied in solution [46, 47, 53]. This appeared to be so even when charged flanking residues were included in an attempt to improve peptide solubility. However, we now understand that this apparent discrepancy between the transient solubilities of recombinant and chemically synthesized peptides is due to the presence, in the latter, of residual polyGln aggregates, leading to a situation not unlike that described above for Aβ peptides. This is dramatically demonstrated by two examples. Although attempts to directly dissolve a K2 Q44 K2 sequence in aqueous buffer failed to give significant soluble peptide [48], pretreatment of a K2 Q44 K2 peptide (or a K2 Q20 K2 peptide) with a trifluoroacetic acid/hexafluoroisopropanol disaggregation protocol adapted from the Aβ literature [68] yields peptides that display excellent initial solubility in aqueous buffers [48]. Even more dramatic is the effect this protocol has on the peptide K2 Q15 K2 . This peptide readily dissolves directly from the synthetic lyophilized powder in low salt, pH 3 buffer to yield a transparent, low light-scattering solution. When this solution is adjusted to phosphate-buffered saline conditions, however, aggregation occurs very rapidly to form a protofibril-like product [48]. In contrast, when this same peptide is exposed to the TFA/HFIP disaggregation protocol prior to solubilization in water and then is adjusted to PBS conditions, the peptide remains in solution for months at 37 ◦ C, finally reaching an equilibrium position of aggregation after about six months [48]. Although the composition of K2 Q15 K2 directly dissolved in water has not been further investigated, it is most likely that the suspended peptide exists as a mixture of dissolved monomer and microaggregates, the latter of which seed aggregation by the former when the solution is adjusted to PBS conditions. This tediously slow aggregation kinetics of the K2 Q15 K2 peptide after disaggregation now qualitatively matches the behavior of short polyGln sequences in recombinant proteins and is also consistent with the lack of disease risk in humans with short polyGln sequences in htt and other disease proteins. This change in the character of the K2 Q15 K2 peptide is not a consequence of any covalent chemical change in the peptide [48], and no significant amounts of organic solvent remain after the treatment. To ensure that all traces of aggregates are removed, a preparative ultracentrifuge step is included. The sequential treatment of the lyophilized, synthetic peptide with TFA, followed by HFIP, as described
4 Analyzing Polyglutamine Behavior With Synthetic Peptides: Practical Aspects
for treatment of Aβ peptides [68], is adequate for disaggregating polyGln peptides up to repeat length 40. Above this, the synthetic peptides do not dissolve well in TFA. However, it was found that peptides of higher repeat length dissolve well in a 1:1 mixture of TFA and HFIP and that, with sufficient incubation time and care, they can be completely disaggregated in this solvent mixture [48]. This treatment also appears to dissolve other aggregation-prone peptides that are resistant to the sequential method. 4.2 Growing and Manipulating Aggregates
Although many studies on disease mechanism in HD and other protein aggregation diseases involve analysis of aggregation behavior, the study of the aggregation products is also important, since it is likely that it is the aggregates themselves, or a continuing aggregation process supported by these aggregates, that are responsible for cytotoxicity. In analogy to the above discussion on working with the peptides, therefore, a few words must also be said about the growth and handling of aggregates. Simple polyGln peptides incubated at 37 ◦ C grow into a variety of morphologically related structures, as reviewed in Section 6.1. In general, these aggregates appear to be variations on the assembly of narrow protofilaments, in analogy to the substructures of amyloid fibrils. The aggregate formation reaction is quite reproducible, so long as the peptides are processed as described above and growth conditions are held constant. The aggregates are quite stable, with critical concentrations well below micromolar for polyGln peptides of repeat length 30 or more [49]. Aggregate suspensions in PBS can be snap-frozen in liquid nitrogen and stored at −80 ◦ C for extended periods without noticeable change in properties. 4.2.1 Polyglutamine Aggregation by Freeze Concentration Simple polyGln peptides dissolved in PBS, when incubated for one day or more in the frozen state at −5 to −20 ◦ C, are highly aggregated upon thawing [66]. This is true even for Q15 peptides, which, when incubated at 37 ◦ C in the completely disaggregated state, require months to achieve significant levels of aggregation. Studies on the mechanism of this effect showed that aggregate formation under these conditions is due to the process of freeze-concentration [69, 70]. Peptide solutions in buffer stored frozen at temperatures above the eutectic points of the buffer components consist of a solid ice phase and fissures of liquid phase containing most of the solutes, including the peptide [69, 70]. Since the liquid phase is only a small volume of the total, the solute concentrations in the liquid phase are enormous, with, for example, NaCl concentrations of several molar resulting from the freezing of a simple PBS ([NaCl] ∼ 0.15 M) solution. These conditions often result in the formation of protein aggregates, which is why it is often advisable to store protein solutions either above 0 ◦ C or below 0 ◦ C with freezing-point depression agents such as glycerol. Interestingly, because ice structure at modest freezing temperatures is rather unstable, solutions of polyGln peptides may be snap-frozen in liquid nitrogen, but if they are then stored at −20 ◦ C, they will still develop
11
12 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
Fig. 2 Time course of freeze-concentration aggregation monitored by light scattering. Samples of 10 µM disaggregated K2 Q37 K2 in PBS were incubated at 37 ◦ C (solid black bars), at −10 ◦ C in the liquid state (open
bars), or at −10 ◦ C in the frozen state (gray bars). The atter sample was snap-frozen in liquid nitrogen then transferred to the −10 ◦ C bath. From Ref [66] with permission from the American Chemical Society.
aggregates as the ice structure equilibrates to that normally formed at the −20 ◦ C storage temperature. Figure 2 shows the time course of aggregate formation in the ice state versus liquid state in the −5 to −10 ◦ C range [66]. The freeze-concentration-induced aggregation of polyGln has several implications. First, there is no reason to suspect that any polyGln sequence, even in the context of other protein sequences, will not be susceptible to this effect. We have found that polyGly peptides can be stored for months at −80 ◦ C if they are first disaggregated, adjusted to low concentrations, and snap-frozen in liquid nitrogen before storage. It is recommended that all polyGln-containing proteins be stored with similar care or not frozen at all. Interestingly, we found that the Aβ(1–40) peptide does not form amyloid by freeze-concentration. Thus, the same level of care required for working with polyGln sequences may not be required when working with other amyloid systems. It is important to be aware, however, of the potential for the seemingly benign freezing process to wreck havoc with samples of any protein, especially aggregation-prone proteins. 4.2.2 Preparing Small Aggregates In addition to this potential for undesired aggregation, the freeze-concentrationinduced aggregation of polyGln sequences is important because it generates aggregates with somewhat different properties compared to those grown from the same peptide at 37 ◦ C [66]. As discussed in Section 6, these aggregates appear to be more like the protofibril assembly intermediates of other amyloid growth reactions than like the mature aggregates grown at 37 ◦ C. These aggregates, once formed, are stable and do not assemble further on incubation at 37 ◦ C. On a weight basis, aggregates prepared by freeze-concentration are much better seeds for fibril elongation than are aggregates grown at 37 ◦ C [49]. At the same time, 37 ◦ C aggregation reactions seeded (and hence greatly accelerated) with aggregates prepared
5 In vitro Studies of PolyGln Aggregation
by freeze-concentration produce aggregates indistinguishable from those grown by spontaneous aggregation at 37 ◦ C, indicating that the substructures of these aggregates must be quite similar. We have used freeze-concentration to produce aggregates of small sizes to facilitate their uptake into mammalian cells [45]. After aggregates are grown in frozen buffer, reactions are thawed and the aggregates collected by centrifugation. Aggregates are sonicated with a probe sonicator and then pushed through a series of molecular weight cutoff membrane filters to produce the finest possible aggregate size. This procedure is critical for the ability of mammalian cells to take up naked polyGln aggregates [45] (see Section 7.1.1).
5 In vitro Studies of PolyGln Aggregation
The central importance of the expansion of the polyGln repeat in this family of neurodegenerative diseases logically leads to the study of the biophysical properties of various repeat lengths of polyGln in search of clues to molecular mechanisms. Coincidentally, focusing on the polyGln sequence in isolation also makes accessible certain experimental approaches not possible with more complex proteins. For example, circular dichroism (CD) spectra can be cleanly interpreted in terms of what the polyGln sequence is doing, without being concerned about sequences and other elements of secondary structure. Solution concentrations of non-aggregated peptide can be quantified with great precision and accuracy using reverse-phase HPLC, as described in the experimental Section. Insolubility can be cleanly interpreted in terms of polyGln aggregation, without worrying about the aggregation tendency of other protein sequences in unnatural fusion proteins or truncated native proteins. The data from studies on isolated polyGln sequences establish a baseline for the fundamental behavioral trends of the polyGln sequence, so that further studies with sequences more closely resembling the complex structures of disease proteins can be put into context. 5.1 The Universe of Protein Aggregation Mechanisms
The broad phenomenon of particulate aggregation is systematized in the field of colloidal assembly and coagulation. The aggregation of proteins, especially in response to heat and as a side reaction in protein folding, has been viewed historically as a problem in colloidal assembly [71]. Early generalized models of the kinetics of coagulation involved models of particles undergoing hierarchical assembly into larger and larger clusters. Cluster formation in these models was considered to be controlled either by simple diffusion limits or by an energy barrier that determined the productiveness of each collision, but the assumption in either case was that each productive collision was irreversible [72]. However, since many protein aggregation reactions such as crystallization and amyloid fibril formation reach
13
14 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
equilibrium positions in which both aggregates and monomers are populated, the possibility of productive collisions being reversible must be considered, which complicates the modeling [72]. Protein crystallization and amyloid formation may differ from classical colloidal assembly in another respect; rather than a hierarchical assembly mechanism involving the interactions of intermediates of ever-increasing size, protein crystals and amyloids, once formed, might be capable of growing by the addition of monomeric or oligomeric protein units [72]. This could occur, for example, if the protein in the aggregated state projects a docking surface that is capable of recruiting bulk-phase monomer very efficiently, compared to the relatively inefficient process of nucleation. Such a nucleation-dependent polymerization model has been developed [73, 74] and successfully applied to hemoglobin aggregation [75]. The model also takes into account the potential reversibility of steps in the aggregation process and thus treats the nucleation process as an unfavorable pre-equilibrium before a series of increasingly favorable elongation steps; that is, the kinetic nucleus is viewed as the least stable species on the aggregation pathway. The unfavorability of the nucleation step gives rise to a “lag phase,” during which no mature aggregates are formed and negligible amounts of monomer are consumed, followed by a rapid growth phase. Since this is a relatively simple model that has been successfully applied to protein systems, and since there is no evidence from in vitro studies of a more coagulationbased assembly mechanism for polyGln aggregation, this model has been applied to polyGln aggregation data, as described in the following section. 5.2 Basic Studies on Spontaneous Aggregation
Rigorously disaggregated polyglutamine peptides (Section 4.1) incubated in PBS at 37 ◦ C undergo a spontaneous aggregation process that generally exhibits a lag phase typical of nucleated growth polymerization. These lag times become shorter as the repeat length gets longer, with aggregation becoming much more aggressive at repeat lengths about 35 [49]. These results agree with previous observations based on studies of exon I fragments of huntingtin [55]. Following the kinetics by a number of different windows on the aggregation process – light scattering, ThT fluorescence, CD, and solubility – reveals overlapping growth curves (Figure 3), suggesting that there are no major hidden species in the process [76]. The fact that the CD transition, from a spectrum consistent with random coil to one consistent with β-sheet, coincident with the reaction as monitored by aggregate formation and peptide solubility suggests that there is no pre-assembly in the bulk-phase monomer to a β-sheet containing monomeric species before aggregation occurs [76]. This interpretation is reinforced by an experiment in which the aggregation reaction is stopped midway and centrifuged [76]. The resuspended centrifugation pellet gives a typical β-sheet CD spectrum, while the supernatant gives the same random coil signature spectrum as seen in the starting reaction mixture (Figure 4). The implication is that the mechanism by which the bulk-phase polyGln peptides take on β-sheet characteristics occurs either through a concerted mechanism, in
5 In vitro Studies of PolyGln Aggregation
Fig. 3 Spontaneous aggregation of a polyGln peptide monitored by four independent measures. A sample of 66 µM freshly disaggregated K2 Q42 K2 in 10 mM Tris-TFA, pH 7.0, was incubated at 37 ◦ C and aggregation was monitored by circular dichroism
signal at 200 nm (), determining remaining soluble monomer by HPLC (•), light scattering (), and thioflavin T fluorescence (◦). From Ref. [76] with permission from the National Academy of Sciences (USA).
Fig. 4 Circular dichroism spectra of various fractions of a polyGln aggregation reaction midpoint. The reaction described in the legend of Figure 3 was sampled after 90-h incubation and CD spectra were collected after various treatments: no treat-
ment (a); ultracentrifugation supernatant (b); resuspended ultracentrifugation pellet (c); sum of the supernatant and pellet spectra (d). Adapted from Ref. [76] with permission from the National Academy of Sciences (USA).
15
16 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
which molecules take on β-structure as they become ensconced onto the end of the growing aggregate, or through a mechanism in which isolated monomers rapidly add to aggregates as soon as they acquire β-structure. 5.3 Nucleation Kinetics of PolyGln
The increasing efficiency with which peptides undergo spontaneous aggregation (see previous section) suggests that it may be this property of the polyGln sequence that is responsible for the critical importance of repeat length in the expanded CAG repeat diseases. The source of the repeat-length dependence of aggregation is not immediately obvious, however. To investigate whether the repeat-length effect is played out at the level of nucleation and to learn more about the nucleation process, we applied a successful model for the nucleation-dependent aggregation of sickle hemoglobin [74] to the polyGln aggregation process [76]. In this model, the formation of an oligomeric kinetic nucleus from a bulk phase of monomeric proteins is treated as a pre-equilibrium, the favorability of which is dependent on the equilibrium constant and the number of monomers that cluster together to form the nucleus. This latter number, which is of particular importance because it becomes an exponential factor in the equation describing nucleation kinetics, is called the “critical nucleus,” n* . Whether any particular nucleus collapses back to n* monomers, or progresses on to committed aggregate growth, depends on the efficiency with which it is stabilized through the elongation of the nucleus into a growing aggregate, as determined by the rate constant describing this elongation step. Since the number of kinetic nuclei formed during a nucleation-dependent polymerization reaction is vanishingly small and cannot be directly observed, the concentration of nuclei present at any one time is inferred by the kinetics with which observable aggregates develop, as described by the rate constant for elongation of aggregates. The overall nucleation kinetics equation (Eq. (1)) emerging from this model describes , the increase in the concentration of monomers that have been converted into aggregates at time t, as it depends on the concentration of monomers in the bulk phase, c, the nucleation (pre-)equilibrium constant K n *, and the rate constants for elongation of the nucleus and aggregate, k+ , which, for simplicity, are assumed to be identical. 2 =1/2k+ K n* c (n*+2) t 2
(1)
Eq. (1) accounts for the time lag observed for nucleation-dependent growth kinetics and for the concentration dependence of the lag phase and the underlying nucleation kinetics. It also predicts two other features of nucleation kinetics: (1) that the kinetics should be dependent on time2 , and (2) that a log-log plot of the term emerging from the t2 plot, with respect to monomer concentration, will yield a line of slope n* + 2, from which n* , the number of monomers in the kinetic nucleus, can be derived. In fact, analysis of polyGln peptides with repeat lengths below and above the pathological threshold region shows linear t2 plots at all peptide
5 In vitro Studies of PolyGln Aggregation
Fig. 5 Critical nucleus analysis for polyGln peptides of various lengths. Nucleation kinetics for each peptide at several concentrations was determined as described in the text and analyzed by t2 plots. The log of the slope of the t2 plot is plotted against the log of peptide concentration. The slope
of the straight line fitting this data is n* + 2, where n* is the critical nucleus. Shown are data for K2 Q28 K2 (), K2 Q36 K2 (•), and K2 Q47 K2 (). Also shown are theoretical lines showing slopes of 3 (◦) and 4 (). From Ref. [76] with permission from the National Academy of Sciences (USA).
concentrations examined [76]. Figure 5 shows that plotting the log of the slopes of the t2 plots at each concentration, against the log of the concentration, gives a straight line for each repeat length. The equivalence of the slopes for peptides above, at, and below the pathological threshold, as shown in Figure 5, shows that, within this range, repeat length does not alter the size of the critical nucleus. Table 2 shows the surprising result that the kinetics analyses for these peptides yield slopes of 3, reducing, therefore, to n* = 1 in each case. Table 2 also shows that the source of the more aggressive nucleation kinetics, as polyGln repeat length increases, is therefore the larger value of the k+ 2 Kn *. term, rather than a change in the value of n*. Since it was previously shown that elongation rates do not change dramatically as polyGln repeat lengths change [49], it can be concluded that the major source of the increased aggressiveness of polyGln aggregation above the
Table 2 Kinetic parameters for polyGln nucleation of aggregationa
Slope of the log-log plot Calculated critical nucleus (n*) from slope k+ 2 K n* (M−2 s−2 ) a
Data from Ref. [76].
Q28
Q36
Q47
2.98 0.98
2.68 0.68
2.59 0.59
0.001128
0.09193
1.8304
17
18 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
Fig. 6 A model for the nucleationdependent polymerization of polyGln peptides. Reversible formation of the monomeric nucleus, the least stable species on the aggregation pathway, is visualized as an unfavorable folding reaction. Elongation is visualized as a cyclic two-step process
involving addition of bulk-phase polyGln monomer to the nucleus or growing aggregate, followed by consolidation of the structure of the newly added monomer to extend the structure of the aggregate. From Ref. [76] with permission from the National Academy of Sciences (USA).
pathological threshold is an increase in the nucleation (pre-)equilibrium constant Kn* . The result of this analysis, i.e., that the kinetic nucleus for nucleation-dependent polymerization of polyGln sequences contains a single molecule of polyGln, is a surprising, if not shocking, result. The whole theoretical basis for nucleation in nucleation-dependent polymerization and colloidal assembly theories is a model in which the nucleus is itself an oligomer on the assembly pathway. Based on general colloidal assembly models [72] and on water condensation analysis [77], the stability derived from the growth of the nucleus is basically attributed to an improved surface-to-volume quotient for this particulate cluster [78]. The recognition, however, that monomeric polyGln peptides, even long ones, assume a non-β-sheet conformation in solution makes conceptually possible a model in which the nucleation event is an unfavorable folding reaction within the monomer population. Figure 6 illustrates this model schematically, showing bulk-phase monomer as random coil and the nucleus as a monomer collapsed into an antiparallel β-sheet conformation. Interestingly, despite the fact that the model for the nucleation process was developed based on the assumption of an oligomeric critical nucleus, the mathematical analysis growing out of this model is sufficiently robust that it also describes quite well the situation in which n* = 1. The reason for this is that the most fundamental definition of the critical nucleus is more mathematical than physical, stating only that the nucleus is the least stable species on the assembly pathway. The critical nucleus for polyGln assembly into amyloid-like aggregates achieves this distinction by being the product of a highly unfavorable folding reaction. How unfavorable these reactions are in the polyGln series is presently unknown, since the k+ 2 K n* . term cannot be deconvoluted due to (1) a present inability to independently determine the value of k+ and (2) the inability to independently observe the nucleation equilibrium without its leading to aggregation. Direct observation of the nucleation process, determination of K n *, and derivation of molar elongation rate constants remain challenges for future research. The ability of the nucleation kinetics model to accommodate polyGln aggregation made possible the calculated estimation of the expected kinetics curves for
5 In vitro Studies of PolyGln Aggregation
Fig. 7 Calculated nucleation kinetics curves for 100-pM polyGln peptides in PBS at 37 ◦ C, based on kinetic parameters derived from in vitro studies at higher concentrations. From Ref. [76] with permission from the National Academy of Sciences (USA).
nucleation-dependent polymerization of polyGln at relatively low steady-state concentrations, i.e., of magnitudes similar to those expected in vivo. Figure 7 shows the resulting curves for assumed polyGln concentrations of 100 pM, which show predicted ages of onset for the three polyGln repeat lengths studied that differ by orders of magnitude and in a way that is consistent with disease statistics (Figure 1). The actual steady-state concentrations of the aggregation-prone proteolytic fragments of huntingtin or other expanded CAG repeat disease proteins are not known, and it is clear that the molecular complexity of cells and organisms is likely to foil attempts at adequately describing disease through test tube experiments in simple buffers alone. However, the simulation in Figure 7 illustrates how nucleation propensity, as controlled simply by repeat-length variation, is capable of contributing significantly to disease risk and age of onset. Ongoing studies of simple peptides in defined buffers are also providing quantitative descriptions of the roles of molecular crowding, molecular chaperones, downstream amino acid sequences, and other polyGln-containing molecules on impacting aggregation nucleation (A. Thakur, A. Bhattacharyya, G. Thiagaragan, and R. Wetzel, unpublished results). 5.4 Elongation Kinetics
The model for polyGln aggregate growth shown in Figure 6 shows that the nucleus elongates by successive productive encounters with bulk-phase monomer. Although other models have been proposed for fibril elongation that may be followed by some proteins under some solution conditions [79], growth by monomer additions is likely to be the major pathway by which amyloid fibrils grow in vivo, especially in most systems where precursor protein concentrations are quite low. The
19
20 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
kinetics of this process have been successfully modeled [80] as a simple bimolecular reaction between aggregate and monomer, such that the observed rate depends on monomer concentration, aggregate (molar) concentration, and the second-order rate constant. However, since in a heavily seeded reaction, aggregation mass builds primarily by the lengthening of the seeds, the molar concentration of aggregate (or the molar concentration of growth sites) does not change appreciably over the course of a simple elongation reaction, and the rate expression reduces to a pseudofirst-order kinetics equation involving monomer concentration and the first-order rate constant. (This rate constant is not a universal constant for this peptide, however, since it incorporates the molar concentration of fibrils used as seeds, which is not trivial to determine.) Understanding more about the polyGln aggregate elongation process is important not only because of how this process contributes to the deposition of the disease protein in neurons but also because of the possible importance of the recruitment of other polyGln-containing proteins in the disease mechanism (Section 3.3). If the polyGln aggregate, or the aggregation process itself, is responsible for the neurotoxicity observed in these diseases, it is also important to learn how we might interfere with this process therapeutically. The need to discover and characterize aggregation inhibitors adds to the importance of having available one or more ways of quantifying elongation in vitro. 5.4.1 Microtiter Plate Assay for Elongation Kinetics One convenient way of studying the elongation reaction is by immobilizing preformed aggregate on a surface, incubating the surface in a solution of monomer, and monitoring the growth of the aggregate seed. Aβ elongation has been studied by this approach in two ways, by using either fibrils fixed to microtiter plate wells and following elongation with radio-labeled monomer [81] or fibrils fixed to a dextran surface and following elongation by an increase in fibril mass as detected by surface plasmon resonance [82]. A modification of the microplate approach has been used to develop a polyGln elongation assay, which has been described in detail [67, 83]. In this assay, very small masses (typically 20–100 ng per well) of polyGln aggregates are immobilized onto microtiter plate wells. An aqueous solution of a disaggregated polyGln, with a biotin covalently attached to the N-terminus, is supplied at low concentration (typically 10 nM) and incubated. After incubation, excess biotinylated peptide is washed out and the amount of biotin remaining on the solid phase by virtue of a continuation of the polyGln aggregation process is measured with a streptavidin-linked reagent. High sensitivity is achieved through using streptavidin complexed with europium, which in turn can be measured with great sensitivity using time-resolved fluorescence. Since the microplate must be processed intact, kinetics are accomplished by phasing the start times of multiple reactions so that their common termination will deliver a time course. A typical kinetics progress curve for this polyGln elongation reaction is pictured in Figure 8. The division of the kinetics into fast and slow phases, as seen in Figure 8, has also been observed in the Aβ microplate assay [84]. As predicted by the kinetic model described above, the magnitude of the kinetic phases increases both with
5 In vitro Studies of PolyGln Aggregation
Fig. 8 Kinetics of polyGln aggregate elongation monitored by a microtiter plate– based assay. Twenty nanograms of K2 Q28 K2 aggregates were immobilized by adsorption to each microtiter plate well, and
the wells were incubated for various times with 10 nM biotinyl-K2 Q28 K2 . Signal was developed using europium-tagged streptavidin and time-resolved fluorescence. From Ref. [67] with permission from Elsevier.
the increase of the amount of deposited polyGln aggregate and with an increase in monomer concentration. This assay is proving useful in working out some fundamental aspects of the polyGln elongation reaction [49], in screening for inhibitors of polyGln aggregate elongation (see following section), and in characterizing HD brain material. 5.4.2 Repeat-length and Aggregate-size Dependence of Elongation Rates The repeat-length dependence of spontaneous polyGln aggregation is grounded in the repeat-length dependence of the unfavorable folding of the monomeric kinetic nucleus (Section 5.3) and is the simplest available explanation for how the gross features of the repeat-length effect on disease risk and age of onset are established. The repeat-length effect of the elongation phase of aggregate growth is also important, however. If the ability of polyGln aggregates to cause cell death and/or dysfunction is due to their ability to recruit other polyGln-containing proteins into inactivating aggregates, then the dependence of the efficiency of recruitment on polyGln repeat length will determine the range of polyGln-containing proteins that are at risk of being recruited. Experiments using the microtiter plate elongation assay show that there is, indeed, a length dependence of recruitment kinetics, but it is not nearly as dramatic as that for nucleation-dependent aggregation [49]. PolyGln peptides in the 5–10 repeat length range are not strongly recruited into preexisting aggregates.
21
22 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
At repeat lengths in the Q15 to Q20 range, however, elongation rates become more substantial and continue to increase with increasing repeat length up to about Q40 , above which there appears to be little effect on rate of increasing repeat length. This suggests that polyGln proteins with repeat lengths in the 5–10 range are not likely to impact polyGln aggregation processes very much. However, at repeats of Q15 and above, the existence of polyGln proteins, and their relative concentrations and functions in the cell, may well impact expanded CAG repeat diseases. These results are consistent with observations that neuronal polyGln aggregates often contain CREB-binding protein (which has a repeat length of 18 Gln residues), but that the homologous protein P300, which has a polyGln repeat of only 6, is not recruited into cellular aggregates [85]. The rate of aggregate elongation, as determined by polyGln repeat length and concentration, may also significantly contribute to other aspects of polyGln aggregation and pathology (A. M. Bhattacharyya and R. Wetzel, manuscript in preparation). Size also matters in considering the mass-normalized recruitment activity of polyGln aggregates. Microtiter plate experiments show that, on an equal mass basis, the ability of a preformed aggregate to serve as a template for polyGln elongation improves as the average particle size of the aggregate decreases [49]. The data support the idea that not all polyGln aggregates are functionally equivalent and that small aggregates are substantially better than the larger ones at supporting elongation. If polyGln aggregates are, in fact, toxic by virtue of their ability to recruit polyGln-containing proteins and thus remove them from their normal sites of action, it stands to reason that more recruitment-active aggregates would be more toxic. That is, aggregates too small to be easily detected by light/fluorescence microscopy may be far more cytotoxic than the relatively recruitment-inert aggregates that can be so visualized. These conclusions are further supported by staining cellular aggregates for recruitment activity, which will be discussed in Section 6.2.
6 The Structure of PolyGln Aggregates
Like other amyloid-like fibrils, the interior, atomic-level structure of the polyGln aggregate remains opaque, due to its inaccessibility to standard high-resolution methods and the limited resolution of the available methods. The following sections catalog what we do know about polyGln aggregate structure, with an emphasis on the extent to which these aggregates resemble other amyloids. 6.1 Electron Microscopy Analysis
Perutz and coworkers first showed that the polyGln aggregate has a fibrillar construction. The aggregates obtained from a Q15 peptide immediately upon adjusting a pH 3 solution of the synthetic product to pH 7 [53] resemble in the electron microscope the protofibrils observed as assembly intermediates in amyloid
6 The Structure of PolyGln Aggregates
formation by other peptides, such as Aβ [86]. Similar structures are observed from Q15 or Q20 peptides after they are rigorously disaggregated and then incubated in PBS (Figure 9). In addition, aggregates of longer polyGln sequences take on this same protofibril-like appearance when they are grown in PBS at −20 ◦ C by freeze-concentration (Section 4.1) (Figure 9). It is possible that aggregation stops at this stage at −20 ◦ C because the diffusion of the protofibrils is limited by the ice lattice. The generation of protofibril-like structures from simple polyGln peptides in vitro has also been accomplished through the use of nonnative buffer conditions [87]; the relationship of these aggregates to those formed, for example, by freeze-concentration has not been explored. When simple polyGln peptides longer than Q20 are incubated at 37 ◦ C in PBS, they grow into more complex aggregated structures that appear in the EM as ribbons of about 20 nm in diameter and with substructures suggesting assembly from parallel-aligned protofibrils or protofilaments (Figure 9). Attempts to detect protofibril intermediates in these assembly reactions have not been successful, however (unpublished results), suggesting that under native conditions in simple PBS buffers protofibrils either do not exist, or have a very short lifetime. The observation of polyGln-associated protofibril-like structures in cells [88] may be due to the presence of flanking protein sequences and/or other molecules in the milieu altering the assembly pathway. When longer polyGln peptides (Q45 or higher) are incubated in PBS at 37 ◦ C, especially in low salt buffer, they grow into long filamentous aggregates that match the structures of typical amyloid fibrils (Figure 9). Incubation in vitro of fusion proteins containing huntingtin exon 1 fragments with expanded polyGln repeats also generates amyloid fibrils [54], and some EM studies suggest a fibrillar appearance of the substructures of polyGln inclusions in cells [4]. Thus, there appears to be a hierarchy of form in the world of polyGln aggregates, all of which exhibit a similar protofilament substructure that is capable of assembling in higher-ordered structures depending on polyGln repeat length, protein and cellular context, and growth conditions. 6.2 Analysis with Amyloid Dyes Thioflavin T and Congo Red
The early history of amyloids is associated with the discovery of the ability of certain heteroaromatic dyes to bind selectively to fibrils and, through binding, take on altered spectroscopic properties that can be monitored. In 1927, Divry reported that amyloid deposits in tissue, previously detected by a starch-like reaction to iodide staining, could be easily detected by staining the tissue with the dye Congo red (CR) followed by polarized light microscopy [89]. The success of the method, which remains the primary pathological stain for amyloid [90], depends not only on the ability of CR to bind to amyloid fibrils but also on the ability of the fibrils to confer onto the bound CR a macroscopic order that can be detected as birefringence in light microscopy. Since the resolution of light microscopy is limited by the wavelength range of visible light, birefringence is observed only if the ordered
23
24 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
Fig. 9 Electron micrographs of various protein aggregates. (A) D2 Q15 K2 aggregates grown in PBS at 37 ◦ C. (B) K2 Q20 K2 aggregates grown in PBS at 37 ◦ C. (C) K2 Q37 K2 aggregates grown in PBS at 37 ◦ C. (D) K2 Q37 K2 aggregates grown in PBS in the frozen state at −20 ◦ C. (E) Fluorescein-
tagged K2 Q42 K2 aggregates grown in PBS at 37 ◦ C. (F) K2 Q42 K2 aggregates grown in Tris-HCl, pH 7.2 at 37 ◦ C. (G) A$(1–40) fibrils grown in PBS at 37 ◦ C. The bar represents 50 nm. From Ref. [66] with permission from the American Chemical Society.
array of CR molecules that produces it extends into that size range. Since amyloid fibril diameters are in the 10 nm range, it is clear that for fibrils to be detected by CR birefringence, the fibrils have to be arranged in a higher level of order, that is, oriented non-randomly in the field. In fact, EM of fibrils often shows them to be loosely bundled together in an approximate parallel arrangement. This type of order is required in tissue for amyloid to be detected by CR birefringence. Perhaps this explains the lack of success in detecting polyGln aggregates in tissue
6 The Structure of PolyGln Aggregates
by amyloid reagents. CR staining of polyGln aggregates grown in vitro produces a red cast, suggesting that the aggregates bind CR, but does not yield significant birefringence [66], suggesting that a lack of macroscopic order, rather than an absence of the fundamental amyloid fold, may be the explanation for the lack of signal. CR binding to amyloid in vitro can be detected by virtue of a spectral shift conferred onto CR upon binding [91], but there appear to be no data on whether polyGln aggregates produce this effect. Another important dye for amyloid analysis is thioflavin T (ThT). ThT has the ability to bind to many different amyloid fibrils [92, 93], and when it does, the binding produces a characteristic change in ThT’s fluorescent properties. The fluorescence yield upon ThT binding is linear with the number of ThT molecules bound, and therefore (under saturating conditions) also with the mass of amyloid, which allows the dye to be used to quantify amyloid in vitro [93]. PolyGln aggregates produce a typical amyloid-like ThT response [49]. PolyGln aggregation kinetics monitored by ThT track exactly, within experimental error, with other measures of aggregate formation (Figure 3). The molecular mechanisms by which CR and ThT bind to polyGln and other amyloids are not known. CR, as a sulfonate, carries a strong negative charge, while ThT carries a positive charge, at neutral pH. The charged nature of these dyes has led to speculation that electrostatics play an important role in their binding to amyloids, most of which are derived from sequences possessing charged amino acids. PolyGln sequences carry only the neutral glutamine side chain, however, casting doubt on any overriding role for electrostatics in the binding of ThT. Although the synthetic polyGln molecules summarized here contain flanking Lys residues for peptide solubility, their aggregates, displaying only positively charged Lys side chains, are still capable of binding the positively charged ThT. 6.3 Circular Dichroism Analysis
As discussed in Section 5.2, the CD spectra of aggregated polyGln peptides are consistent with high levels of β-sheet (Figure 4), and the progress of the CD signal change at 200 nm that marks the progress of the two-state transition from soluble peptide to aggregate overlaps the aggregation kinetic curves determined by other means (Figure 3). The ease of obtaining an interpretable CD spectrum of the polyGln aggregate is somewhat unusual in the amyloid field, where attempts to collect CD spectra of amyloid fibrils in suspension are marred by high levels of light scattering. That polyGln aggregates are better behaved in CD must be due to lower light scattering, presumably due to smaller average particle sizes. In fact, while Rayleigh light scattering in a fluorometer is a very sensitive means of following amyloid fibril formation by Aβ (B. O’Nuallain and R. Wetzel, unpublished observations), equivalent weight concentrations of polyGln aggregates scatter light much less well, although still to a useful extent [48, 76]. When polyGln aggregation reactions are carried beyond the point when the ThT reaction has reached plateau and the solution-phase peptide is essentially depleted, further, relatively small,
25
26 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
changes occur in the CD spectra (S. Chen and R. Wetzel, unpublished). Similar changes have been seen in other β-sheet proteins, where they have been interpreted as reflecting increasing degrees of twist in the sheet [94]. 6.4 Presence of a Generic Amyloid Epitope in PolyGln Aggregates
Recently, a class of antibody has been described that has the unusual, unanticipated ability to recognize a conformational epitope that seems to be present to some extent on all amyloid fibrils while not being present in the native state of the monomer [95, 96]. For example, the mouse MAb WO1 was generated by challenging mice with an Aβ(1-40) amyloid fibril immunogen, but the resulting antibody binds not only to Aβ fibrils but also to amyloid fibrils derived from IAPP, β2-microglobulin, transthyretin, immunoglobulin light chain, etc. [96]. This antibody also binds to the amyloid-like aggregates of synthetic polyGln peptides [96], offering another indication of the relatedness of the polyGln aggregate to the generic amyloid motif [66]. Conversely, some MAbs generated from mice immunized with polyGln aggregates are able to bind well to other amyloid fibrils such as Aβ fibrils (B. O’Nuallain, R. Wetzel, S. Ou, J. Ko, P. Patterson, unpublished), further indicating the presence of an amyloid epitope on the polyGln aggregate. Such pan-amyloid antibodies tend to be IgMs, limiting their use as tissue-staining reagents. These antibodies have a number of other uses, however, as diagnostics for the amyloid motif and as a means of quantifying certain features of fibrils [10]. 6.5 Proline Mutagenesis to Dissect the Polyglutamine Fold Within the Aggregate
One of the central questions in amyloid research is how the constituent polypeptide folds upon itself to engage the amyloid structural motif. Amyloid fibrils clearly possess a cross-β structure, in which the peptide chains are perpendicular to the fibril axis, while the H-bonds between strands in the β-sheets are parallel to the axis [97]. The details of how the peptide chains are arranged is space within these simple, minimal restraints have not, however, been established for any amyloid. Mutagenesis, especially with proline residues [98], has been used in other amyloid systems to garner details about how the sequence of the amyloidogenic peptide folds within the fibril. In Aβ, for example, it is clear from a variety of techniques that the N-terminal 10–15 amino acids, as well as the C-terminal 3–4 amino acids, are not involved in tight β-sheet H-bonding [99]. Proline analysis, which is based on the observation that loops, floppy ends, and some turn positions can comfortably accept Pro residues, while β-sheets are destabilized by them, also suggests the location of turns in the Aβ fibril [99]. A major problem confronts the use of mutagenesis to dissect polyGln structure in the aggregate, however. The problem is the prospect that the polyGln sequence will be promiscuous in how it adopts its fold within the aggregate. This means that
6 The Structure of PolyGln Aggregates
any particular sequence position of a Q40 sequence, let’s say position 10, might be found in a number of environments when a series of Q40 molecules folds into the aggregate. Further, should position 10 be favored in a certain kind of structure in the aggregate, a mutation at position 10 that is unfavorable in that type of structure will likely simply lead to adjustments in the polyGln chain as it engages the aggregate structure, to place residue 10 in the least damaging – that is, least destabilizing – place in the aggregate. A modified mutagenesis strategy, taking the above complication into account, has led to some insights into polyGln structure within the aggregate. It is based on the hypothesis that polyGln folds in the aggregate in a series of extended chain segments, of some unknown optimal length, alternating with turn segments [10]. This hypothesis was then tested by analysis of the aggregation kinetics of a series of mutant polyGln peptides composed of alternating segments of oligoGln and Pro-Gly sequences, the latter being especially comfortable in turn elements. It was found that the peptide PGQ9 , consisting of four Q9 segments interspersed with three PG pairs, undergoes spontaneous aggregation essentially as fast as a Q42 peptide of the same length. Since peptides with shorter QN elements aggregate less rapidly, while a PGQ10 peptide aggregates only marginally more rapidly, it was concluded that the best model for how an unbroken polyGln sequence aggregates is PGQ9 . Since each turn may involve one or two Gln residues bordering the PG pair, this corresponds to extended chain segments of 7–8 residues interspersed with turns of 3–4 residues. Two further mutagenesis tests on PGQ9 support such models. To test the interpretation that the PG pairs are located in turns in the aggregate, a PGQ9 peptide was synthesized containing D-prolines in place of L-prolines; this peptide aggregates more rapidly than the L-Pro form, consistent with data showing D-Pro-Gly to be more compatible in β-turns than L-Pro-Gly [100]. To test the interpretation that the oligoGln sequences between the PG pairs are in extended chain in the aggregate, mutant peptides were synthesized in which additional proline residues were inserted in the middle of one or more of these oligoGln sequences. When a single Pro residue is placed in the middle of either the second (PGQ9 P2 ) or third (PGQ9 P3 ) of the four Q9 segments, these peptides are completely incapable of forming aggregates under conditions where PGQ9 aggregates readily. This replicates the strong inhibition of amyloid formation when a single Pro residue is placed in the middle of a β-sheet region in Aβ peptides [98, 99] and strongly supports the model for the aggregate structure of PGQ9 . That the structure of the PGQ9 aggregate is very similar to that of a Q42 aggregate was shown by a number of studies, including EM, WO1 binding (Section 6.4) and the abilities of each aggregate to seed elongation of the other peptide [10]. Interestingly, the aggregation of another Q42 analogue, containing only a single proline residue at the same residue position as the additional proline in PGQ9 P2 , is nearly as rapid as that of an unbroken Q42 peptide. This confirms the concerns about folding promiscuity in polyGln aggregation that guided the design of the PGQ9 series. That is, simple mutagenesis experiments on unbroken polyGln will generally be very difficult to interpret with confidence, because of the ability of the
27
28 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
polyGln sequence to accommodate to the mutation by slightly altering its fold in the aggregate. The model of a repeat structure consisting of alternating extended chains and turns is compatible with either an antiparallel β-sheet motif or an irregular parallel β-helix motif of the kind seen in certain globular proteins [101]. Previously, Starikov et al. proposed an antiparallel β-sheet model, noting that the number of residues required to make a four-stranded version of such a model is very close to the pathological threshold for most expanded CAG repeat diseases [102]. Perutz and coworkers proposed a “water-filled nanotube” model for the polyGln aggregate [103], which is related to parallel β-helix models previously proposed for the amyloid structure [101, 104]. It is not clear whether the central core of water would be energetically favorable or whether a more stable variation might be to exclude the water to generate an irregular helix more like those seen in parallel β-helical proteins [101, 105]. However, the “ideal” water-filled nanotube model [103] might be expected to be destabilized by Pro or Pro-Gly insertions, given the uniform extended chain and continuous twisted β-sheet in the model. The ability of polyGln aggregates to tolerate certain periodicities of Pro-Gly insertion is also incompatible with aggregates consisting of very long strands of polyGln extended chain [27]. Interestingly, the mechanism of nucleation of polyGln aggregation does not seem to be altered in these Pro-Gly mutants. Thus, analysis of the concentration dependence of the early phases of spontaneous aggregation for the peptide PGQ9 yields a value of n* = 1 for the critical nucleus [10], the same number obtained for unbroken polyGln sequences.
7 Polyglutamine Aggregates and Cytotoxicity
There is little disagreement that the expansion of the polyGln sequence plays a major role in expanded CAG repeat diseases, but opinions differ considerably as to the mechanisms by which this effect plays out. Many reviews are available that summarize the data and argue for and against an active involvement of polyGln aggregates [2, 13–31]. One of the central barriers to reaching a consensus on the nature of the polyGln effect is the difficulty in characterizing and/or controlling the aggregation state of polyGln in cells. This problem has an impact in various ways. For example, in order to produce htt aggregates in a reasonable time frame in a cell or animal model, it is necessary to express unnaturally high concentrations of monomeric protein in the cell; if toxicity is observed, is it then due to the aggregates or to the high concentration of monomer required to make the aggregate? At the same time, genetic analyses for binding partners of expanded polyGln proteins may in some cases be mediated by an aggregated form of the bait molecule rather than the assumed monomeric form. Semantics and sensitivity of detection may also play roles in the controversy over the toxic species in polyGln diseases. To a protein chemist, an aggregate is any oligomeric protein state held together by noncovalent interactions, even
7 Polyglutamine Aggregates and Cytotoxicity
including native oligomers; thus, a nonnative dimer of huntingtin, for example, would be considered an aggregate. Some workers, however, interpret an absence of large, macroscopic cellular inclusions as definitively ruling out the involvement of aggregates in toxicity. If the neuronal dysfunction and death characteristics of expanded CAG repeat diseases can in fact be attributed to polyGln aggregates, it remains to be determined how aggregates produce their effects at the molecular and cellular level. A number of theories have been put forward to explain the toxicity of protein aggregates in neurodegeneration. Aggregates might saturate and/or otherwise inactivate the cellular machinery, such as molecular chaperones, the ubiquitin-proteasome system [106], and the aggresome system [107], responsible for dealing with protein misfolding. Aggregates might insert into membranes and alter their properties [108]. One mechanism that is uniquely feasible for polyGln aggregation and that has received significant attention and confirmatory data is the recruitment-sequestration model [24, 57, 61, 62, 85]. Since it is well established that amyloid like aggregates of one polyGln sequence are capable of being elongated by the addition of other polyGln sequences, it is possible that polyGln aggregates in the cell might recruit other polyGln-containing proteins. Many such proteins exist in the human genome [109–111], in particular among proteins involved in nucleic acid binding, such as transcription factors. In a pivotal paper, Ross and coworkers showed that expanded polyGln cytotoxicity in a cell model appears to be mediated by the recruitment of the transcription factor CREB-binding protein (CBP), which contains a Q18 repeat, into the polyGln aggregates [85]. Expression of high levels of a functional, polyGln-minus version of CBP protects cells from expanded polyGln toxicity [85]. As discussed below, synthetic polyGln peptides have contributed to the debate about the toxicity of expanded polyGln sequences by addressing some of these technical barriers. 7.1 Direct Cytotoxicity of PolyGln Aggregates
One way to more directly test the toxicity of aggregates would be to deliver polyGln aggregates produced in vitro into cells. This would eliminate the need for producing unnaturally high levels of monomeric polyGln, the toxicity of which then becomes an important question if the experimental result is to be properly interpreted. The ability to produce a variety of well-characterized polyGln aggregates in relative bulk, as described above, makes possible experiments involving delivery of aggregates to cells. In addition, the ability to deliver aggregates essentially composed of only the polyGln sequence addresses the question of the role of other sequence elements in cytotoxicity of expanded CAG repeat proteins. 7.1.1 Delivery of Aggregates into Cells and Cellular Compartments The challenge in analyzing the cytotoxicity of exogenous aggregates is their delivery into the cell. In fact, liposome encapsulation does allow polyGln peptide aggregates to be taken up by cells in a relatively short time [45]. Interestingly, however, a control
29
30 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
experiment of polyGln aggregate mixed directly with cells in culture also resulted in very good cellular uptake of the aggregate [45]. Sonication and membrane filtration (Section 4.2.2) to generate smaller average sizes of aggregates improves cellular uptake [45]. Uptake through the cellular membrane was confirmed by confocal microscopy on fluorescently tagged aggregates. The mechanism by which cells take up these aggregates is not clear, but it may be related to an electrostatic attraction between the negatively charged phospholipid membrane and the highly positively charged aggregates, which are composed of peptides consisting of polyGln flanked by pairs of Lys residues. The process does not require polyGln, since a control amyloid fibril is also readily taken up [45]. Since nuclear polyGln inclusions are often observed in cell and animal models, it is also of interest to deliver aggregates into the nucleus. In an attempt to do this, polyGln peptides were prepared that contain an N-terminal nuclear localization signal (NLS) peptide sequence as well as a fluorescein tag. In fact, aggregates of such peptides are not only taken up by the cell but also are transported into the nucleus, as confirmed by confocal microscopy of isolated nuclei [45]. A control amyloid fibril from a peptide containing a NLS is also readily taken up into the nucleus. The ability of the nuclear transporter to manage such aggregates is somewhat surprising, since the aggregates appear to exceed 100 nm in diameter (based on their ability to resist filtration through a 100-nm filter membrane), while the diameter of the nuclear pore is on the order of 26 nm [112]. It is possible that the multiple display of NLS signal peptides along the surface of the aggregate provides a repeat that allows it to “ratchet” through the pore [112]. It is also possible that the large aggregates mixed with and taken up by cells may transiently break apart to individual protofibrillar aggregates, as seen in EMs of these aggregates; these have diameters in the range of 5 nm. It is very unlikely that aggregates get into cells or nuclei by way of full dissociation to monomers followed by reassembly on the other side of the membrane. These polyGln aggregates are very stable in vitro in physiological conditions [45]. Further, such a mechanism could not explain the ability of Q20 aggregates to invade the cell and nucleus, since the very slow aggregation kinetics of the monomer, especially at the concentrations likely to obtain in this scenario, rules out any efficient aggregation after dissociation. 7.1.2 Cell Killing by Nuclear-targeted PolyGln Aggregates The ability to deliver various protein aggregates into cells and their nuclei makes it possible to study the toxicity of these aggregates. PolyGln, as well as control, aggregates delivered to the cytoplasm of either PC12 or Cos-7 cells in culture exhibit little or no toxicity to these cells when examined by a number of standard methods for determining cell death: propidium iodide exclusion, lactate dehydrogenase release, and MTT reduction [45]. In contrast, polyGln aggregates delivered to cell nuclei rapidly kill cells, with very similar results from all three measures of cell killing. Although amyloid fibrils of cold shock protein peptide B-1 (CspB1) are efficiently delivered to cell nuclei, they do not kill cells, indicating the importance
7 Polyglutamine Aggregates and Cytotoxicity
of the polyGln sequence. Interestingly, nuclear-targeted aggregates of Q20 and Q42 peptides are equally effective at killing cells. This strongly suggests that any aggregated polyGln peptide can be toxic when delivered to the nucleus and reinforces the conclusion derived from the aggregation kinetics (Section 5.2) that the distinction between a toxic (Q42 ) and benign (Q20 ) repeat-length sequence is made at the level of aggregation efficiency (Section 5). Although the ability of nuclear-localized polyGln aggregates to rapidly kill cells may thus be responsible for the neuronal loss observed in end-stage expanded CAG repeat brain stem, it remains possible that cytoplasmic polyGln aggregates might also have harmful effects that may be less dramatic and difficult to detect in a one-day cell culture. The cell death induced in cultured cells by nuclear uptake of polyGln aggregates appears to be related to an apoptotic process (W. Yang and R. Wetzel, manuscript in preparation). 7.2 Visualization of Functional, Recruitment-positive Aggregation Foci
One possible explanation for why easily detected polyGln aggregates do not correlate with cytotoxicity in all cases is that the wrong aggregates are being monitored. If the cell killing by expanded polyGln sequences is due to recruitment of cellular polyGln proteins by aggregates of the polyGln disease protein, then the ability of these aggregates to efficiently recruit other polyGln sequences – their “recruitment activity” – is of paramount importance to their cytotoxicity. Recruitment activity can be defined as a measure of the number of monomeric polyGln peptides that can be recruited into a polyGln aggregate of a certain mass in a certain time. Interestingly, even in vitro aggregates of simple polyGln peptides can exhibit different recruitment activities depending on the way they are prepared and on their sizes [49]. Larger aggregates exhibit lower recruitment activity than smaller aggregates, presumably because of the greater surface area of the latter [49] (Section 5.4). Thus, if the recruitment hypothesis is valid, then the best way to gauge the polyGln aggregate burden of a cell or tissue would be to assess the recruitment activity in the tissue, either as a tissue stain or as an activity measurement akin to an enzymatic assay. Both of these approaches are feasible, because of the robustness and specificity of the polyGln elongation reaction, and both have been shown to work when using human brain material. Aggregates in an enriched fraction isolated from HD brain tissue, when fixed to microplate wells, exhibit the same ability to recruit biotinylated polyGln peptides as synthetic aggregates exhibit (V. Berthelier and R. Wetzel, unpublished). HD brain slices incubated with biotinylated polyGln pick up the peptide in small, punctate, intracellular centers called aggregation foci [113]. The ability to characterize cells and tissue for levels of functional polyGln aggregates should provide sharper tools for addressing the question of the role of polyGln aggregates in cell dystrophy and death in expanded CAG repeat diseases.
31
32 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
8 Inhibitors of PolyGln Aggregation
If polyGln aggregates are toxic to cells, a viable therapeutic approach for expanded CAG repeat diseases would be the use of specific inhibitors of spontaneous polyGln aggregation and/or elongation. Such inhibitors clearly exist in nature. For example, several genetic screens in HD disease models reveal that increased expression of DnaJ class chaperones can protect cells and animals from polyGln toxicity. Investigations of the molecular basis of this effect reveal that recombinant HDJ1, a human DnaJ homologue, is very effective at inhibiting both spontaneous polyGln aggregation and seeded elongation, presumably by binding to nuclei or small aggregates and inhibiting their elongation with additional polyGln monomers (A. Bhattacharyya and R. Wetzel, unpublished). This suggests that inhibitors of polyGln aggregation can have a protective effect. 8.1 Designed Peptide Inhibitors
A similar result has been obtained with a structure-based elongation inhibitor. Not only is the peptide PGQ9 -P2 incapable of spontaneously aggregating (see Section 6.5), it can also inhibit the aggregation of other polyGln peptides. In the microplate assay described earlier, PGQ9 -P2 inhibits Q42 elongation with an IC50 in the low micromolar range (Figure 10) [44]. Furthermore, cells pre-incubated with
Fig. 10 Concentration-dependent inhibition of polyGln aggregate elongation by PGQ9 -based peptide inhibitors. Microtiter plate wells containing adsorbed Q45 polyGln aggregates (100 ng per well) were incubated with 10 nM biotinyl-Q28 and various concentrations of PGQ9 P2 (•) for 45 min at 37 ◦ C. The derived EC50 value is 1.1 µM.
8 Inhibitors of PolyGln Aggregation
PGQ9 -P2 are protected, in a dose-dependent manner, from the cytotoxicity of polyGln aggregates in the cell model described in Section 7.1.2. [44]. This implies not only that aggregates are toxic but also that the basis of their toxicity is their ability to recruit other polyGln peptides in the cell. This in turn suggests that even cells that have already entered the aggregation pathway might be protected from aggregate toxicity by appropriate inhibitors.
8.2 Screening for Inhibitors of PolyGln Elongation
As discussed above, polypeptide-based inhibitors of polyGln aggregation exist. However, small molecules with comparable activities would presumably be better drug candidates. In order to discover and refine such inhibitors, viable assays are needed. A filter blot assay that allows screening of small compound libraries for inhibitors of the aggregation of the polyGln-containing htt exon 1 domain has identified a number of small molecule inhibitors [114]. The microtiter plate elongation assay described in Section 5.4.1 was initially designed for the same purpose [67, 83]. The assay has several important features, including reproducibility, insensitivity to small concentrations of DMSO and small pH changes, good throughput, and the ability to operate at low polyGln concentrations approaching physiological. This assay has been used to screen a number of small libraries of drug-like compounds as well as to determine dose-response curves for hits and analogues (V. Berthelier and R. Wetzel, unpublished). A typical screening result is shown in Figure 11.
Fig. 11 Microtiter plate screen of a small compound library for inhibitors of biotinylpolyGln elongation on adsorbed polyGln aggregates. Amount of elongation for each compound was compared to elongation
without added compound, and the percent inhibition was calculated. In this plate, 1 out of 80 compounds tested gave >50% inhibition. Error bars from triplicate analysis.
33
34 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
9 Conclusions
Historically, protein aggregation has been viewed as a nonspecific, amorphous process that is very difficult to study and unlikely to be relevant to events in the cell. Recent studies of amyloid-like phenomena, however, are showing that at least some aggregation processes are highly specific, more closely resembling crystallization phenomena. As described in this article, a logical approach that assumes such crystal-like packing and growth as a starting point for focused analytical experiments can make great progress in understanding the aggregation process both in vitro and in vivo. Far from being alien to the cell environment, it is now known that aggregation is a continuing background problem in the life of the cell. Although cells can normally manage the process pretty well, in some cases, as in the amyloidrelated neurodegenerative diseases, the normal cellular safeguards break down, are overwhelmed, or are isolated by cellular compartmentalization, and protein aggregation proceeds unabated, with devastating consequences. Protein aggregation, long swept under the rug in the protein-folding field, is now very much a viable subject for biophysical studies. While there has long been a perception that protein misfolding and aggregation are shadowy events occurring behind a veil impenetrable to the experimentalist, a variety of analytical methods exist [115], and improved methods are being developed [99, 116–120], that are capable of revealing these processes with analytical clarity.
Acknowledgements
The author would like to thank Geetha Thiagarajan for help with the experimental Sextion of this chapter and Angela Williams, Valerie Berthelier, and Ashwani Thakur for permission to include unpublished data. I am also very grateful to Marcy MacDonald for providing Figure 1 would like to acknowledge past and present members of my laboratory, in particular, Songming Cheh, Valerie Berthelier, J. Bradley Hamilton, Wen Yang, Ashwani Thakur, Anusri Bhattacharya, Brian O’Nuallian, Alex Osmand, Angela Williams, and Tina Richey, for their woek and ideas as developing the body of work summarized here. I gratefully acknowledge the Hereditary Disease Foundation and the NIH (R01 AG19322) for sustained funding and Ethan Signer, Carl Johnson, and Nancy Wexler for their enthusiastic support.
References 1 WILMOT, G. R. & WARREN, S. T. (1998). A new mutational basis for disease. In Genetic Instabilities and Hereditary Neurological Diseases (WELLS, R. D. &
WARREN, S. T., eds.), pp. 3–12. Academic Press, San Diego. 2 CUMMINGS, C. J. & ZOGHBI, H. Y. (2000). Fourteen and counting: unraveling
References
3
4
5
6
7
8
9
10
trinucleotide repeat diseases. Hum Mol Genet 9, 909–916. BATES, G. P. & BENN, C. (2002). The polyglutamine diseases. In Huntington’s Disease (BATES, G. P., HARPER, P. S. & JONES, L., eds.), pp. 429–472. Oxford University Press, Oxford, U.K. DIFIGLIA, M., SAPP, E., CHASE, K. O., DAVIES, S. W., BATES, G. P., VONSATTEL, J. P. & ARONIN, N. (1997). Aggregation of huntingtin in neuronal intranuclear inclusions and dystrophic neurites in brain. Science 277, 1990–1993. ORDWAY, J. M., TALLAKSEN-GREENE, S., GUTEKUNST, C. A., BERNSTEIN, E. M., CEARLEY, J. A., WIENER, H. W., DURE, L. S. t., LINDSEY, R., HERSCH, S. M., JOPE, R. S., ALBIN, R. L. & DETLOFF, P. J. (1997). Ectopically expressed CAG repeats cause intranuclear inclusions and a progressive late onset neurological phenotype in the mouse [In Process Citation]. Cell 91, 753–763. DAVIES, S. W., TURMAINE, M., COZENS, B. A., DIFIGLIA, M., SHARP, A. H., ROSS, C. A., SCHERZINGER, E., WANKER, E. E., MANGIARINI, L. & BATES, G. P. (1997). Formation of neuronal intranuclear inclusions underlies the neurological dysfunction in mice transgenic for the HD mutation. Cell 90, 537–548. PAULSON, H. L., PEREZ, M. K., TROTTIER, Y., TROJANOWSKI, J. Q., SUBRAMONY, S. H., DAS, S. S., VIG, P., MANDEL, J. L., FISCHBECK, K. H. & PITTMAN, R. N. (1997). Intranuclear inclusions of expanded polyglutamine protein in spinocerebellar ataxia type 3. Neuron 19, 333–344. MARTIN, J. B. (1999). Molecular basis of the neurodegenerative disorders [published erratum appears in N Engl J Med 1999 Oct 28;341(18):1407]. N Engl J Med 340, 1970–1980. CHEN, Y. W. (2003). Site-specific mutagenesis in a homogeneous polyglutamine tract: application to spinocerebellar ataxin-3. Protein Eng 16, 1–4. THAKUR, A. & WETZEL, R. (2002). Mutational analysis of the structural organization of polyglutamine
11
12
13
14
15
16
17
18
19
20
21
22 23
24
aggregates. Proc Natl Acad Sci USA 99, 17014–17019. WELLS, R. D. & WARREN, S. T. (1998). Genetic Instabilities and Hereditary Neurological Diseases, Academic Press, San Diego. BATES, G. P., HARPER, P. S. & JONES, L., Eds. (2002). Huntington’s Disease. Oxford, U.K.: Oxford University Press. REDDY, P. S. & HOUSMAN, D. E. (1997). The complex pathology of trinucleotide repeats. Curr Opin Cell Biol 9, 364–372. GUSELLA, J. F. & MACDONALD, M. E. (1998). Huntingtin: a single bait hooks many species. Curr Opin Neurobiol 8, 425–430. REDDY, P. H., WILLIAMS, M. & TAGLE, D. A. (1999). Recent advances in understanding the pathogenesis of Huntington’s disease. Trends Neurosci 22, 248–255. LIN, X., CUMMINGS, C. J. & ZOGHBI, H. Y. (1999). Expanding our understanding of polyglutamine diseases through mouse models. Neuron 24, 499–502. ZOGHBI, H. Y. & ORR, H. T. (1999). Polyglutamine diseases: protein cleavage and aggregation. Curr Opin Neurobiol 9, 566–570. PAULSON, H. L. (1999). Protein fate in neurodegenerative proteinopathies: polyglutamine diseases join the (mis)fold. Am J Hum Genet 64, 339–345. FERRIGNO, P. & SILVER, P. A. (2000). Polyglutamine expansions: proteolysis, chaperones, and the dangers of promiscuity. Neuron 26, 9–12. TOBIN, A. J. & SIGNER, E. R. (2000). Huntington’s disease: the challenge for cell biologists. Trends Cell Biol 10, 531–536. ZOGHBI, H. Y. & ORR, H. T. (2000). Glutamine repeats and neurodegeneration. Annu Rev Neurosci 23, 217–247. BATES, G. (2000). Huntington’s disease. In reverse gear. Nature 404, 944–945. CHA, J. H. (2000). Transcriptional dysregulation in Huntington’s disease. Trends Neurosci 23, 387–392. MCCAMPBELL, A. & FISCHBECK, K. H. (2001). Polyglutamine and CBP: fatal attraction?. Nat Med 7, 528–530.
35
36 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
25 ROSS, C. A. (2002). Polyglutamine pathogenesis: emergence of unifying mechanisms for Huntington’s disease and related disorders. Neuron 35, 819–822. 26 RUBINSZTEIN, D. C. (2002). Lessons from animal models of Huntington’s disease. Trends Genet 18, 202–209. 27 ROSS, C. A., POIRIER, M. A., WANKER, E. E. & AMZEL, M. (2003). Polyglutamine fibrillogenesis: the pathway unfolds. Proc Natl Acad Sci USA 100, 1–3. 28 BATES, G. (2003). Huntingtin aggregation and toxicity in Huntington’s disease. Lancet 361, 1642–1644. 29 MICHALIK, A. & Van BROECKHOVEN, C. (2003). Pathogenesis of polyglutamine disorders: aggregation revisited. Hum Mol Genet 12 Suppl 2, R173–86. 30 PAULSON, H. (2003). Polyglutamine neurodegeneration: minding your Ps and Qs. Nat Med 9, 825–826. 31 RUDNICKI, D. D. & MARGOLIS, R. L. (2003). Repeat expansion and autosomal dominant neurodegenerative disorders: consensus and controversy. Exp. Rev. Mol. Med. 5, 1–24. 32 KOIDE, R., KOBAYASHI, S., SHIMOHATA, T., IKEUCHI, T., MARUYAMA, M., SAITO, M., YAMADA, M., TAKAHASHI, H. & TSUJI, S. (1999). A neurological disease caused by an expanded CAG trinucleotide repeat in the TATA-binding protein gene: a new polyglutamine disease?. Hum Mol Genet 8, 2047–2053. 33 MARGOLIS, R. L. (2002). The spinocerebellar ataxias: order emerges from chaos. Curr Neurol Neurosci Rep 2, 447–456. 34 ALBIN, R. L. (2003). Dominant ataxias and Friedreich ataxia: an update. Curr Opin Neurol 16, 507–514. 35 BOWATER, R. P. & WELLS, R. D. (2001). The intrinsically unstable life of DNA triplet repeats associated with human hereditary disorders. Prog Nucleic Acid Res Mol Biol 66, 159–202. 36 HARPER, P. S. & JONES, L. (2002). Huntington’s disease: genetic and molecular studies. In Huntington’s Disease (BATES, G. P., HARPER, P. S. & JONES, L., eds.), pp. 113–158. Oxford University Press, Oxford, U.K.
37 MYERS, R. H., MARANS, K. S. & MACDONALD, M. E. (1998). Huntington’s Disease. In Genetic Instabilities and Hereditary Neurological Diseases (WELLS, R. D. & WARREN, S. T., eds.), pp. 301–323. Academic Press, San Diego. 38 ZAJAC, J. D. & MACLEAN, H. E. (1998). Kennedy’s disease: clinical aspects. In Genetic Instabilities and Hereditary Neurological Diseases (WELLS, R. D. & WARREN, S. T., eds.), pp. 87–111. Academic Press, San Diego. 39 CATTANEO, E., RIGAMONTI, D., GOFFREDO, D., ZUCCATO, C., SQUITIERI, F. & SIPIONE, S. (2001). Loss of normal huntingtin function: new developments in Huntington’s disease research. Trends Neurosci 24, 182–188. 40 KAZEMI-ESFARJANI, P. & BENZER, S. (2000). Genetic suppression of polyglutamine toxicity in Drosophila. Science 287, 1837–1840. 41 BENNETT, M. J., HUEY-TUBMAN, K. E., HERR, A. B., WEST, A. P., ROSS, S. A. & BJORKMAN, P. J. (2002). A linear lattice model for polyglutamine in CAG expansion diseases. Proc Natl Acad Sci USA 99, 11634–11639. 42 FABER, P. W., VOISINE, C., KING, D. C., BATES, E. A. & HART, A. C. (2002). Glutamine/proline-rich PQE-1 proteins protect Caenorhabditis elegans neurons from huntingtin polyglutamine neurotoxicity. Proc Natl Acad Sci USA 99, 17131–17136. 43 KAZANTSEV, A., WALKER, H. A., SLEPKO, N., BEAR, J. E., PREISINGER, E., STEFFAN, J. S., ZHU, Y. Z., GERTLER, F. B., HOUSMAN, D. E., MARSH, J. L. & THOMPSON, L. M. (2002). A bivalent Huntingtin binding peptide suppresses polyglutamine aggregation and pathogenesis in Drosophila. Nat Genet 25, 25. 44 THAKUR, A. K., YANG, W., & WETZEL, R. (2004). Inhibition of polyglutamine aggregate cytotoxicity by a structure-based elongation inhibitor. FASEB J. 18, 923–925. 45 YANG, W., DUNLAP, J. R., ANDREWS, R. B. & WETZEL, R. (2002). Aggregated polyglutamine peptides delivered to nuclei are toxic to mammalian cells. Hum Mol Genet 11, 2905–2917.
References 46 ALTSCHULER, E. L., HUD, N. V., MAZRIMAS, J. A. & RUPP, B. (1997). Random coil conformation for extended polyglutamine stretches in aqueous soluble monomeric peptides. J Pept Res 50, 73–75. 47 SHARMA, D., SHARMA, S., PASHA, S. & BRAHMACHARI, S. K. (1999). Peptide models for inherited neurodegenerative disorders: conformation and aggregation properties of long polyglutamine peptides with and without interruptions. FEBS Lett 456, 181–185. 48 CHEN, S. & WETZEL, R. (2001). Solubilization and disaggregation of polyglutamine peptides. Protein Science 10, 887–891. 49 CHEN, S., BERTHELIER, V., YANG, W. & WETZEL, R. (2001). Polyglutamine aggregation behavior in vitro supports a recruitment mechanism of cytotoxicity. J. Mol. Biol. 311, 173–182. 50 MASINO, L., KELLY, G., LEONARD, K., TROTTIER, Y. & PASTORE, A. (2002). Solution structure of polyglutamine tracts in GST-polyglutamine fusion proteins. FEBS Lett 513, 267–272. 51 SHI, Z., OLSON, C. A., ROSE, G. D., BALDWIN, R. L. & KALLENBACH, N. R. (2002). Polyproline II structure in a sequence of seven alanine residues. Proc Natl Acad Sci USA 99, 9190–9195. 52 RUCKER, A. L., PAGER, C. T., CAMPBELL, M. N., QUALLS, J. E. & CREAMER, T. P. (2003). Host-guest scale of left-handed polyproline II helix formation. Proteins 53, 68–75. 53 PERUTZ, M. F., JOHNSON, T., SUZUKI, M. & FINCH, J. T. (1994). Glutamine repeats as polar zippers: their possible role in inherited neurodegenerative diseases. Proc Natl Acad Sci USA 91, 5355–5358. 54 SCHERZINGER, E., LURZ, R., TURMAINE, M., MANGIARINI, L., HOLLENBACH, B., HASENBANK, R., BATES, G. P., DAVIES, S. W., LEHRACH, H. & WANKER, E. E. (1997). Huntingtin-encoded polyglutamine expansions form amyloid-like protein aggregates in vitro and in vivo. Cell 90, 549–558. 55 SCHERZINGER, E., SITTLER, A., SCHWEIGER, K., HEISER, V., LURZ, R., HASENBANK, R., BATES, G. P., LEHRACH, H. & WANKER, E.
56
57
58
59
60
61
62
63
E. (1999). Self-assembly of polyglutamine-containing huntingtin fragments into amyloid-like fibrils: implications for Huntington’s disease pathology. Proc Natl Acad Sci USA 96, 4604–4609. ONODERA, O., BURKE, J. R., MILLER, S. E., HESTER, S., TSUJI, S., ROSES, A. D. & STRITTMATTER, W. J. (1997). Oligomerization of expanded-polyglutamine domain fluorescent fusion proteins in cultured mammalian cells. Biochem Biophys Res Commun 238, 599–605. PEREZ, M. K., PAULSON, H. L., PENDSE, S. J., SAIONZ, S. J., BONINI, N. M. & PITTMAN, R. N. (1998). Recruitment and the role of nuclear localization in polyglutamine-mediated aggregation. J Cell Biol 143, 1457–1470. SHERMAN, M. Y. & MUCHOWSKI, P. J. (2003). Making yeast tremble: yeast models as tools to study neurodegenerative disorders. Neuromolecular Med 4, 133–146. DRISCOLL, M. & GERSTBREIN, B. (2003). Dying for a cause: invertebrate genetics takes on human neurodegeneration. Nat Rev Genet 4, 181–194. KAZANTSEV, A., PREISINGER, E., DRANOVSKY, A., GOLDGABER, D. & HOUSMAN, D. (1999). Insoluble detergent-resistant aggregates form between pathological and nonpathological lengths of polyglutamine in mammalian cells. Proc Natl Acad Sci USA 96, 11404–11409. HUANG, C. C., FABER, P. W., PERSICHETTI, F., MITTAL, V., VONSATTEL, J. P., MACDONALD, M. E. & GUSELLA, J. F. (1998). Amyloid formation by mutant huntingtin: threshold, progressivity and recruitment of normal polyglutamine proteins. Somat Cell Mol Genet 24, 217–233. PREISINGER, E., JORDAN, B. M., KAZANTSEV, A. & HOUSMAN, D. (1999). Evidence for a recruitment and sequestration mechanism in Huntington’s disease. Philos Trans R Soc Lond B Biol Sci 354, 1029–1034. MCCAMPBELL, A., TAYLOR, J. P., TAYE, A. A., ROBITSCHEK, J., LI, M., WALCOTT, J.,
37
38 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
64
65
66
67
68
69
70
71
72
73
MERRY, D., CHAI, Y., PAULSON, H., SOBUE, G. & FISCHBECK, K. H. (2000). CREB-binding protein sequestration by expanded polyglutamine. Hum Mol Genet 9, 2197–2202. HOWLETT, D. R., JENNINGS, K. H., LEE, D. C., CLARK, M. S., BROWN, F., WETZEL, R., WOOD, S. J., CAMILLERI, P. & ROBERTS, G. W. (1995). Aggregation state and neurotoxic properties of Alzheimer beta-amyloid peptide. Neurodegeneration 4, 23–32. WOOD, S. J., CHAN, W. & WETZEL, R. (1996). Seeding of Aβ fibril formation is inhibited by all three isotypes of apolipoprotein E. Biochem. 35, 12623–12628. CHEN, S., BERTHELIER, V., HAMILTON, J. B., O’NUALLAIN, B. & WETZEL, R. (2002). Amyloid-like features of polyglutamine aggregates and their assembly kinetics. Biochemistry 41, 7391–7399. BERTHELIER, V., HAMILTON, J. B., CHEN, S. & WETZEL, R. (2001). A microtiter plate assay for polyglutamine aggregate extension. Anal Biochem 295, 227–236. ZAGORSKI, M. G., YANG, J., SHAO, H., MA, K., ZENG, H. & HONG, A. (1999). Methodological and chemical factors affecting amyloid-β amyloidogenicity. Meth Enzymol 309, 189–204. FRANKS, F. (1985). Biophysics and biochemistry at low temperatures, Cambridge University Press, Cambridge. FRANKS, F. (1993). Storage stabilization of proteins. In Protein Biotechnology: Isolation, Characterization and Stabilization (FRANKS, F., ed.), pp. 489–531. Humana Press, New York. JAENICKE, R. & SECKLER, R. (1997). Protein Assembly In Vitro. In Protein Misassembly (WETZEL, R., ed.), Vol. 50, pp. 1–59. Academic Press, San Diego. ANDERSON, V. J. & LEKKERKERKER, H. N. (2002). Insights into phase transition kinetics from colloid science. Nature 416, 811–815. BISHOP, M. F. & FERRONE, F. A. (1984). Kinetics of nucleation-controlled polymerization. A perturbation treatment for use with a secondary pathway. Biophys J 46, 631–644.
74 FERRONE, F. (1999). Analysis of protein aggregation kinetics. Meths. Enzymol. 309, 256–274. 75 CAO, Z. & FERRONE, F. A. (1996). A 50th order reaction predicted and observed for sickle hemoglobin nucleation. J Mol Biol 256, 219–222. 76 CHEN, S., FERRONE, F. & WETZEL, R. (2002). Huntington’s Disease age-of-onset linked to polyglutamine aggregation nucleation. Proc. Natl. Acad. Sci. USA 99, 11884–11889. 77 ABRAHAM, F. F. (1974). Homogeneous Nucleation Theory, Academic Press, New York. 78 PERUTZ, M. F. & WINDLE, A. H. (2001). Cause of neural death in neurodegenerative diseases attributable to expansion of glutamine repeats. Nature 412, 143–144. 79 SERIO, T. R., CASHIKAR, A. G., KOWAL, A. S., SAWICKI, G. J., MOSLEHI, J. J., SERPELL, L., ARNSDORF, M. F. & LINDQUIST, S. L. (2000). Nucleated conformational conversion and the replication of conformational information by a prion determinant. Science 289, 1317–1321. 80 NAIKI, H. & GEJYO, F. (1999). Kinetic analysis of amyloid fibril formation. Meths Enzymol 309, 305–318. 81 ESLER, W. P., STIMSON, E. R., GHILARDI, J. R., FELIX, A. M., LU, Y. A., VINTERS, H. V., MANTYH, P. W. & MAGGIO, J. E. (1997). A beta deposition inhibitor screen using synthetic amyloid. Nat Biotechnol 15, 258–263. 82 MYSZKA, D. G., WOOD, S. J. & BIERE, A. L. (1999). Analysis of fibril elongation using surface plasmon resonance biosensors. In Amyloid, Prions and other Protein Aggregates (WETZEL, R., ed.), Vol. 309, pp. 386–402. Academic Press, San Diego. 83 BERTHELIER, V. & WETZEL, R. (2003). An assay for characterizing the in vitro kinetics of polyglutamine aggregation. In Methods in Molecular Medicine: Neurogenetics (N. T. POTTER, e., ed.), pp. 295–303. Humana Press, New York. 84 ESLER, W. P., STIMSON, E. R., JENNINGS, J. M., VINTERS, H. V., GHILARDI, J. R., LEE, J. P., MANTYH, P. W. & MAGGIO, J. E. (2000). Alzheimer’s disease amyloid
References
85
86
87
88
89
90
91
92
93
94
propagation by a template-dependent dock-lock mechanism. Biochemistry 39, 6288–6295. NUCIFORA, F. C., Jr., SASAKI, M., PETERS, M. F., HUANG, H., COOPER, J. K., YAMADA, M., TAKAHASHI, H., TSUJI, S., TRONCOSO, J., DAWSON, V. L., DAWSON, T. M. & ROSS, C. A. (2001). Interference by huntingtin and atrophin-1 with cbp-mediated transcription leading to cellular toxicity. Science 291, 2423–2428. HARPER, J. D., WONG, S. S., LIEBER, C. M. & LANSBURY, P. T. (1997). Observation of metastable Abeta amyloid protofibrils by atomic force microscopy. Chem Biol 4, 119–125. KAYED, R., HEAD, E., THOMPSON, J. L., MCINTIRE, T. M., MILTON, S. C., COTMAN, C. W. & GLABE, C. G. (2003). Common structure of soluble amyloid oligomers implies common mechanism of pathogenesis. Science 300, 486–489. POIRIER, M. A., LI, H., MACOSKO, J., CAI, S., AMZEL, M. & ROSS, C. A. (2002). Huntingtin spheroids and protofibrils as precursors in polyglutamine fibrilization. J Biol Chem 277, 41032–41037. DIVRY, P. & FLORKIN, M. (1927). Sur les proprietes optiques de l’amyloide. C. R. Seances Soc Biol 97, 1808–1810. WESTERMARK, G. T., JOHNSON, K. H. & WESTERMARK, P. (1999). Staining methods for identification of amyloid in tissue. Methods Enzymol 309, 3–25. KLUNK, W. E., JACOB, R. F. & MASON, R. P. (1999). Quantifying amyloid beta-peptide (Abeta) aggregation using the congo red-abeta (CR-abeta) spectrophotometric assay [In Process Citation]. Anal Biochem 266, 66–76. NAIKI, H., HIGUCHI, K., HOSOKAWA, M. & TAKEDA, T. (1989). Fluorometric determination of amyloid fibrils in vitro using the fluorscent dye, thioflavine T. Anal. Biochem. 177, 244–249. LEVINE, H. (1999). Quantification of β-sheet amyloid fibril structures with thioflavin T. Meth Enzymol 309, 274–284. WOODY, R. W. (1996). Theory of circular dichroism of proteins. In Circular
95
96
97
98
99
100
101
102
103
104
105
106
Dichroism and Conformational Analysis of Biomolecules (FASMAN, G. D., ed.), pp. 25–67. Plenum, New York. HRNCIC, R., WALL, J., WOLFENBARGER, D. A., MURPHY, C. L., SCHELL, M., WEISS, D. T. & SOLOMON, A. (2000). AntibodyMediated Resolution of Light Chain-Associated Amyloid Deposits. Am J Pathol 157, 1239–1246. O’NUALLAIN, B. & WETZEL, R. (2002). Conformational antibodies recognizing a generic amyloid fibril epitope. Proc Natl Acad Sci USA 99, 1485–1490. SUNDE, M. & BLAKE, C. (1997). The structure of amyloid fibrils by electron microscopy and X-ray diffraction. Adv Protein Chem 50, 123–159. WOOD, S. J., WETZEL, R., MARTIN, J. D. & HURLE, M. R. (1995). Prolines and amyloidogenicity in fragments of the Alzheimer’s peptide β/A4. Biochem. 34, 724–730. WILLIAMS, A., PORTELIUS, E., KHETERPAL, I., GUO, J., COOK, K., XU, Y. & WETZEL, R. (2004). Mapping Aβ amyloid fibril secondary structure using scanning proline mutagenesis. J. Mol. Biol. 335, 833–842. VENKATRAMAN, J., SHANKARAMMA, S. C. & BALARAM, P. (2001). Design of folded peptides. Chem Rev 101, 3131–3152. WETZEL, R. (2002). Ideas of order for amyloid fibril structure. Structure 10, 1031–1036. STARIKOV, E. B., LEHRACH, H. & WANKER, E. E. (1999). Folding of oligoglutamines: a theoretical approach based upon thermodynamics and molecular mechanics. J Biomol Struct Dyn 17, 409–427. PERUTZ, M. F., FINCH, J. T., BERRIMAN, J. & LESK, A. (2002). Amyloid fibers are water-filled nanotubes. Proc Natl Acad Sci USA 99, 5591–5595. JENKINS, J. & PICKERSGILL, R. (2001). The architecture of parallel beta-helices and related folds. Prog Biophys Mol Biol 77, 111–175. PICKERSGILL, R. W. (2003). A primordial structure underlying amyloid. Structure (Camb) 11, 137–138. BENCE, N. F., SAMPAT, R. M. & KOPITO, R. R. (2001). Impairment of the
39
40 Protein Folding and Aggregation in the Expanded Polyglutamine Repeat Diseases
107
108
109
110
111
112
113
114
ubiquitin-proteasome system by protein aggregation. Science 292, 1552–1555. KOPITO, R. R. (2000). Aggresomes, inclusion bodies and protein aggregation. Trends Cell Biol 10, 524–530. LASHUEL, H. A., HARTLEY, D., PETRE, B. M., WALZ, T. & LANSBURY, P. T., Jr. (2002). Neurodegenerative disease: amyloid pores from pathogenic mutations. Nature 418, 291. MARGOLIS, R. L., ABRAHAM, M. R., GATCHELL, S. B., LI, S. H., KIDWAI, A. S., BRESCHEL, T. S., STINE, O. C., CALLAHAN, C., MCINNIS, M. G. & ROSS, C. A. (1997). cDNAs with long CAG trinucleotide repeats from human brain. Hum Genet 100, 114–122. HANCOCK, J. M., WORTHEY, E. A. & SANTIBANEZ-KOREF, M. F. (2001). A role for selection in regulating the evolutionary emergence of disease-causing and other coding CAG repeats in humans and mice. Mol Biol Evol 18, 1014–1023. COLLINS, J. R., STEPHENS, R. M., GOLD, B., LONG, B., DEAN, M. & BURT, S. K. (2003). An exhaustive DNA micro-satellite map of the human genome using high performance computing. Genomics 82, 10–19. TALCOTT, B. & MOORE, M. S. (1999). Getting across the nuclear pore complex. Trends Cell Biol 9, 312–318. OSMAND, A. P., BERTHELIER, V. & WETZEL, R. (2002). Identification of aggregation foci, intracellular neuronal structures in the neocortex in Huntington’s disease capable of recruiting polyglutamine. Program 293. 6, 2002 Abstract Viewer/Itinerary Planner. Washington, DC: Society for Neuroscience, 2002. Online. HEISER, V., SCHERZINGER, E., BOEDDRICH, A., NORDHOFF, E., LURZ, R., SCHUGARDT, N., LEHRACH, H. & WANKER, E. E. (2000).
115
116
117
118
119
120
Inhibition of huntingtin fibrillogenesis by specific antibodies and small molecules: Implications for Huntington’s disease therapy. Proc. Natl. Acad. Sci. (USA) 97, 6739–6744. WETZEL, R., Ed. (1999). Amyloid, Prions, and other Protein Aggregates. Methods in Enzymology. Edited by ABELSON, J. N. & SIMON, M. I. San Diego, CA: Academic Press. KHETERPAL, I., ZHOU, S., COOK, K. D. & WETZEL, R. (2000). Abeta amyloid fibrils possess a core structure highly resistant to hydrogen exchange. Proc Natl Acad Sci USA 97, 13597–13601. KHETERPAL, I., WILLIAMS, A., MURPHY, C., BLEDSOE, B. & WETZEL, R. (2001). Structural features of the Aβ amyloid fibril elucidated by limited proteolysis. Biochem. 40, 11757–11767. TOROK, M., MILTON, S., KAYED, R., WU, P., MCINTIRE, T., GLABE, C. G. & LANGEN, R. (2002). Structural and Dynamic Features of Alzheimer’s Abeta Peptide in Amyloid Fibrils Studied by Site-directed Spin Labeling. J Biol Chem 277, 40810–40815. PETKOVA, A. T., ISHII, Y., BALBACH, J. J., ANTZUTKIN, O. N., LEAPMAN, R. D., DELAGLIO, F. & TYCKO, R. (2002). A structural model for Alzheimer’s beta-amyloid fibrils based on experimental constraints from solid state NMR. Proc Natl Acad Sci USA 99, 16742–16747. FOGUEL, D., SUAREZ, M. C., FERRAO-GONZALES, A. D., PORTO, T. C., PALMIERI, L., EINSIEDLER, C. M., ANDRADE, L. R., LASHUEL, H. A., LANSBURY, P. T., KELLY, J. W. & SILVA, J. L. (2003). Dissociation of amyloid fibrils of alpha-synuclein and transthyretin by pressure reveals their reversible nature and the formation of water-excluded cavities. Proc Natl Acad Sci USA 100, 9831–9836.
1
Thermodynamics of Amyloid Formation Ilia V. Baskakov
University of Maryland, Baltimore, USA
Originally published in: Amyloid Proteins. Edited by Jean D. Sipe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31072-X
1 Introduction
More than 20 systemic and neurodegenerative maladies have been linked to the formation of ordered protein aggregates [1, 2]. When the deposited polymeric form is sufficiently ordered to bind Congo red and Thioflavin T, the term amyloid is used to define these types of aggregation [3]. A common feature of amyloid aggregates is formation of β-sheet-rich polymeric forms organized into highly ordered fibrils or plaques [4]. Recent studies have demonstrated that a broad variety of proteins unrelated to any known conformational disease can adopt β-sheet-rich amyloid forms in vitro and in vivo [5–8]. Medin, a proteolytic fragment of lactadherin, is an example of polypeptides that form amyloid deposits in vivo (amyloid deposits of medin were found in aorta of virtually all individuals studied older than 60 years), but has not been linked to any pathological processes [9]. Amyloidogenic proteins are now found in a variety of organisms including prokaryotes, plants, insect and mammals [10–13]. No consensus sequences that would predetermine the ability to form amyloid have been identified in any of these classes of proteins. Even though the amyloidogenic proteins show no obvious sequence similarity, they share a similar conformational property when converted into the amyloid fibrils, i.e. thermodynamically stable cross-β-pleated sheets [14, 15]. This finding has led to the hypothesis that the ability to fold into amyloid forms is not a unique property of certain proteins associated with degenerative maladies; rather, it may well be a feature of polypeptides in general [16].
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Thermodynamics of Amyloid Formation
2 Thermodynamic versus Kinetic Control of Protein Folding
In his 1972 Nobel Prize lecture Christian Anfinsen described the “thermodynamic hypothesis” of protein folding [17]. This hypothesis states that “the threedimensional structure of a native protein in its normal physiological milieu is the one in which the Gibbs free energy of the whole system is lowest; that is, that the native conformation is determined by the totality of interatomic interactions and hence by the amino acid sequence, in a given environment”. Although the thermodynamic hypothesis has now been widely established and protein folding is commonly thought to be controlled by thermodynamic preferences, it has been understood by many, including Anfinsen and others, that kinetic issues can alter the folding landscape [18]. Whereas most small globular proteins will refold spontaneously in vitro to a native conformation, in vivo folding often exploits auxiliary molecules and defined subcellular compartments to avoid the deposit of ordered misfolded aggregates. What drives conversion of natively folded proteins into alternative misfolded conformations? Why do some proteins assemble into amyloid fibrils at some point within a protein’s lifetime, while others do not? Direct comparison of the thermodynamic stability of the native state with that of the β-sheet-rich amyloid state is impossible due to the insolubility, heterogeneity and high degree of polymerization of the amyloid fibril aggregates. On the other hand, the high resistance of the amyloid fibril aggregates to denaturation by detergents and to thermal and solvent-induced denaturation serves as a clever illustration of the extremely high thermodynamic stability and remarkable physical properties that the β-pleated sheet conformation imparts to amyloid fibrils. It is noteworthy that nature has learned to exploit these unusual properties of amyloid structures for a variety of physiological functions. For example, the major structural component of the shells of many insects and fish is of amyloid [11, 13]. To protect the developing embryo from temperature variation, mechanical pressure, proteases, bacteria, viruses and dehydration, the ability to construct complex amyloid structures evolved in these organisms through natural evolution and selection [11, 13]. Other examples of naturally occurring amyloid structures include extracellular curly fibrils expressed by Escherichia coli and Salmonella. These fibrils are involved in the colonization of bacteria on surfaces and in biofilm formation [19]. Furthermore, mammalian melanocytes produce a glycoprotein that polymerizes into amyloid-like fibrils, on which melanins are sequestered and concentrated during the multistage process of melanosome biogenesis [20]. These observations illustrate that amyloid formation is an evolutionary preserved biological pathway used to produce natural biomaterials with important physiological functions. While the direct thermodynamic analysis of insoluble amyloid structures is quite complicated, numerous studies have demonstrated that, in addition to insoluble amyloid fibrillar forms, many amyloidogenic proteins also adopt soluble oligomeric β-sheet-rich states [21–25]. Some of these β-sheet-rich oligomeric forms lie on the kinetic pathways to the amyloid fibrils, while others are off the kinetic pathways [24–26]. A direct comparison of the thermodynamic stability of the native and misfolded β-sheet-rich oligomeric isoform illustrates that the native state is not the
2 Thermodynamic versus Kinetic Control of Protein Folding
Fig. 1 Schematic free energy diagram of the conformational transition of the prion protein [27]. The "-helical monomeric isofrom is not the lowest energy state. However, the conformational transition form the "-helical
isoform to a $-sheet-rich oli-gomeric form is controlled by a large energetic barrier and, therefore, is prevented within the protein lifetime.
lowest energy state [27, 28]. It is shown in Fig. 1 that, when the unfolded state is used as a reference in the free energy diagram, the β-sheet-rich state is thermodynamically more stable than the native state. Even though the β-oligomeric states exhibit a thermodynamic stability higher than that of the native states, they might not be the true global energy minimum states, because the β-oligomeric species may undergo an additional time-dependent transition to highly polymeric amyloid forms [24, 25]. Why is the thermodynamically more stable β-sheet-rich isoform not accessible during folding under native conditions? It has been demonstrated that the rate of folding to the β-sheet-rich oligomeric isoform is slower by several orders of magnitude than the rate of folding to the native conformation [27]. To prevent the conformational conversion, the native state has to be separated by a large energetic barrier from the alternative β-sheet-rich state (Fig. 1). Although the free energy diagram does not provide a view of the actual kinetic pathways for the conformational transition, several important observations can be made regarding the origin of the energetic barrier. First, the native form has to unfold substantially on the route to the β-sheet-rich oligomeric isoform. Indeed, several studies have demonstrated that a substantial portion of the energetic barrier is linked to the partial unfolding of the native conformations [27, 29, 30]. The conversion to β-sheet-rich forms can be accelerated by shifting the native-unfolded equilibrium toward the unfolded state [6, 30–32]. Furthermore, the connection between the structural complexity of the pre-transition state and the energetic barrier is demonstrated by numerous observations that conversion of polypeptides with low structural complexity into β-sheet-rich isoforms occurs spontaneously and does not require partially denaturing conditions [21, 33]. A significant contribution to the energetic barrier seems to be associated with the process of assembly. Several studies demonstrated that the accumulation of a β-rich conformation is coupled with oligomerization. Analyses of the kinetic traces indicate that the process of folding to the β-oligomeric
3
4 Thermodynamics of Amyloid Formation
isoforms represents a transition with an apparent reaction order of 3 or higher [27, 34]. This high order of reaction suggests that the conformational transition will depend dramatically upon the concentration of the transition state.
3 What Thermodynamic Forces are Responsible for the Exceptional Stability of Amyloid Aggregates?
It has been more than 40 years since Kauzmann described the thermodynamic forces responsible for the folding of proteins to the native conformations [35]. Such forces include hydrogen bonding, electrostatic, van der Waals, conformational entropy and hydrophobic interactions. Upon folding, groups accessible to solvent in the denatured state became newly buried in the native state. On transfer of the denatured state from a denaturing solution environment to physiological conditions, collapse of the denatured state occurs, causing removal of hydrophobic groups from water. This is believed to be an important driving force for folding of proteins into unique native conformations. For hydrophobic forces to play the major role in protein folding, the fraction of the hydrophobic groups destined for burial must be significant in the denatured states of proteins. Fig. 2 presents a analysis of the classes of groups that are buried upon protein folding in terms of their relative proportion as a function of protein molecular mass. Hydrophobic groups account for more than 20% of all buried groups upon protein folding and their relative proportion is consistently maintained, regardless of protein molecular mass. This result strongly supports the importance of the role that hydrophobic effects play in protein folding. Remarkably, it is evident from Fig. 2 that it is the peptide backbone that comprises the largest numerical fraction of groups newly buried on protein folding (numerical
Fig. 2 Number fraction of backbone and side-chain classes that are newly buried upon folding proteins of different molecular masses to their native states. In addition to the peptide backbone class, all side-chains are grouped into three classes: hydrophobic,
polar and charged side-chains. Crystallographic coordinates data were used from the PDB databank in evaluating the number of each group exposed in the native states of the proteins used in the calculations [38].
4 Single Polypeptide Chain-Multiple $-Sheet-rich Abnormal Isoforms
fraction = 55%). Whether the removal of the peptide backbone units from contact with water molecules is thermodynamically favorable or not has been widely debated [36]. The essential feature of the peptide backbone is its ability to form interand intramolecular hydrogen bonds. By forming hydrogen bonds between NH and CO groups in the folded state, polypeptides minimize the unfavorable effects of removing polar groups from water. The requirement that NH and CO groups of the peptide backbone must either form hydrogen bonds with water or with each other provides a strong structural constraint on the pathways toward possible folded states. This holds, especially, for ordered β-sheet-rich aggregates, where the role of hydrophobic interactions seems to be unsubstantial. Analyses of microcrystals of the amyloidogenic peptides demonstrated that amyloid fibril structures are highly hydrogen bonded, nearly anhydrous and densely packed β-sheets [37]. One can speculate that amyloid structures are formed in a way that optimizes formation of hydrogen bonds between strands of newly buried polypeptide backbone. The ability of many proteins to adopt alternative β-sheet-rich polymeric folding in vitro and in vivo argues that the mechanism involved and the thermodynamic forces that stabilize amyloid conformations have to be generic in nature. The fact that the numerical fraction of the newly buried peptide backbone is large and constant with respect to molecular mass illustrates that, under favorable conditions, burial of the peptide backbone could counteract the unfavorable exposure of hydrophobic groups. Considering that the number of newly buried peptide backbone groups predominates over that of newly buried side-chains lends support to the concept that it is hydrogen bonding of the peptide backbone that provides generic force for the stabilization of amyloid fibril forms. As the amyloid fibril aggregates grow and, correspondingly, the ratio of surface accessible to solvent versus the volume occupied by protein fabric decreases, the stabilizing effect increases. In the way that the hydrophobic force is considered one of the major determinants of the unique native conformation, it seems that the chemical nature of the polypeptide backbone, with its capacity for forming inter- and intramolecular hydrogen bonds is the central determinant of amyloid fibril formation [38].
4 Single Polypeptide Chain-Multiple β-Sheet-rich Abnormal Isoforms
Numerous biophysical studies have revealed that amyloidogenic proteins can adopt conformationally distinct non-native β-sheet-rich isoforms in vitro [26, 39]. For example, amyloid fibril formation by the prion protein occurs through a pathway different from the one that leads to formation of the β-oligomeric species [26, 40]. Regardless of which abnormal β-isoforms are biologically relevant, the ability of amyloidogenic proteins to form distinct abnormal conformers reflects the complexity of the energetic landscape of folding, as well as high conformational plasticity. How do different conformers arise from the same amino acid sequence in the absence of cellular cofactors or templates? Recent biophysical studies of model proteins demonstrated that physical properties of the native and denatured states of some proteins may change gradually
5
6 Thermodynamics of Amyloid Formation
Fig. 3 Free energy diagram illustrating the thermodynamically variable characters of native and denatured states. Not only do the relative populations of the native (N) and the denatured (D) states change as a function denaturant concentration, but the
physical properties of both states are variable depending of solvent conditions. [N1 , N2 , N3 , N4 ] and [D1 , D2 , D3 , D4 ] illustrate the thermodynamically variable character of native and denatures states, respectively.
with environmental conditions [41]. The gradual change within the native and denatured species is consistent with the thermodynamically variable model, which postulates that the thermodynamic character of the native and/or denatured ensemble changes continuously as a function of the solvent environment [42, 43] (Fig. 3). The gradual change of the thermodynamic character is not always accompanied by conformational rearrangements of secondary structure. Thus, many proteins that display a thermodynamically variable behavior of the native ensembles did not show any changes as monitored by circular dichroism (CD) [42, 44]. However, such changes can be observed by other techniques such as nuclear magnetic resonance (NMR) [45], proton uptake/release [42], size-exclusion chromatography [41, 44] and dynamic light scattering [34]. One may speculate that the diversity of potential misfolding pathways of amyloidogenic proteins is linked to the variable thermodynamic behavior. Thus, a gradual environment-dependent change in the physical properties of the native and/or denatured ensembles may bias the adoption of particular pathway of misfolding under different pathological conditions.
5 Does the Process of Prion Propagation Differ from Formation of Ordered Amyloid Aggregates?
Among many amyloid-related maladies one subclass of the conformational disorders, prion diseases, seems to be distinguished by certain peculiar features: 1. The most unorthodox feature of prion disease is the existence of an infectious isoform of the prion protein, PrPSc [46, 47]. PrPSc propagates its abnormal conformation in an autocatalytic manner using the normal isoform (PrPC ) of the
6 Prion Propagation is an Autocatalytic Process
same protein as a substrate. In addition to the transmission of prion diseases in mammals, the phenomenon of self-propagating conformational transition has been described for prion proteins in yeast and in fungi [48, 49]. In all cases, the abnormal protein conformation acts either as the transmissible agent of disease or as a heritable determinant of phenotype. Reconstitution of mammalian prion infectivity in vitro has been difficult to accomplish for many years. This problem raised growing skepticism over the sufficiency of PrP alone to form an infectious agent. Recent work, however, demonstrated that the amyloid form of recombinant PrP induced a transmissible form of prion diseases in transgenic mice, providing the first compelling evidence for the “protein-only hypothesis” of prion propagation in mammals [50]. 2. Efficient self-propagation of prions requires identity or high homology between the amino acid sequences of PrPC and PrPSc , implying there is a high species specificity associated with their interaction. 3. Another prominent feature of prion propagation is the “strain” phenomenon. When PrPC is converted into pathogenic isoforms, a single unique amino acid sequence is capable of adopting conformationally distinct states, which are known as “strains” of PrPSc .
Interestingly, despite differences in the primary and tertiary structures of yeast and mammalian prions, the yeast prions display all of the characteristic features of mammalian prion proteins [51, 52]. Remarkably, recent studies illustrated that amyloid formation by non-prion proteins can also exhibit some properties that are thought to be peculiar for prion propagation. Thus, the systemic amyloidosis caused by the amyloid deposition of the serum amyloid A protein can be transmitted from animal to animal by a prion-like mechanism [53]. A species barrier, previously thought to be a distinctive feature of prion propagation, was also observed for the non-prion protein α-synuclein [54]. The growth of amyloid fibrils by Aβ peptides displayed striking specificity to cross-seeding by heterologous fibrils [55]. Since the process of amyloid formation is a general property of the polypeptide backbone, it would be of great interest to know how many amyloidogenic proteins are capable of self-propagating conformational transition in a prion-like manner (Fig. 4).
6 Prion Propagation is an Autocatalytic Process
Two competing models have been proposed to explain the autocatalytic conversion of prion folding: (1) the nucleation–polymerization model (NPM) [56] and (2) the template assisted model (TAM) [57, 58] (Fig. 5). Using different terminology, “heterogeneous nucleation” versus “templating”, both models employ an autocatalytic mechanism to explain the process of prion transmission and replication. Phenomenologically, the presence of a catalyst (nucleus or template) accelerates the process of conversion, either by avoiding a rate-limiting step in NPM or by lowering of the energy barrier in TAM. Both models predict low occurrences of sporadic
7
8 Thermodynamics of Amyloid Formation
Fig. 4 The multidimensional sequence space defines all possible amino acids sequences. Among all species, only approximately 350,000 polypeptide sequences appear to have evolved through natural evolution and selection (http://protomap.cornell.edu/). Because the ability to adopt an amyloid conforma-
tion seems to be encoded in the physical/chemical nature of the polypeptide backbone, a substantial fraction of naturally occurring proteins would be expected to be amyloidogenic. The number of proteins that are capable of propagating abnormal conformations in a prion-like manner remains to be determined.
form of prion disease, and, correspondingly, a low rate of spontaneous conversion, since spontaneous or non-seeded formation of the first catalytic center is inaccessible either thermodynamically (according to NPM) or kinetically (as follows from TAM). Exogenous administration of catalyst (PrPSc in infectious form of disease) shortens the incubation time to such extent that disease develops within the human
Fig. 5 Schematic diagram showing the nucleation-dependent and the templateassisted mechanisms of the conversion of PrPC to PrPSc . Briefly, NPM postulates that the rate-limiting step is the formation of a nucleus: an oligomeric aggregate of PrP. The nucleus is thermodynamically unstable and this makes the spontaneous process a very rare event. However, as the nucleus is formed, further polymerization is
facile. Exogenous addition of PrPSc bypasses the formation of the nucleus. TAM, on the other hand, argues that PrPC is separated from PrPSc by a substantial energy barrier. The high barrier precludes the formation of PrPSc under normal conditions. However, the process of conversion is facilitated by an exogenous administration of PrPSc , which acts as a catalyst and lowers the energy barrier.
7 Conformational Diversity of Self-propagating Prion Aggregates Table 1 The autocatalytic mechanism predicts three possible outcomes with respect to prion
disease, depending on the multiplication coefficient (r)
Essential kinetic features Kinetics follows formal mechanisms of: Clinical form of disease
r>1
r≈1
r<1
exponential growt of the reaction rate branched chain reactions development of prion disease
steady-state level
decay of the reaction
enzyme catalyses enzyme catalyses subclinical form of disease
apparent first-order kinetics clearance of prions
or animal lifetime. However, neither model explicitly discusses the mechanism of multiplication of the catalytic centers in the time course of the disease. Time-dependent multiplication of catalytic centers is a key feature of any autocatalytic reaction and may have a dramatic effect on the final outcome of these processes. In a simplified mathematical equation, the reaction velocity of an autocatalytic process is determined by the multiplication coefficient (r), which is proportional to the rate of generation of new catalytic centers divided by the rate of their loss or clearance in the time course of the process. When r > 1 the reaction rate increases exponentially, while the reaction decays when r < 1 (Table 1). Such mechanism postulates a threshold effect when r is very close to 1. In this case slight changes in experimental parameters may switch the reaction from a decay mode to an autoacceleration mode and vice versa. This mechanism predicts that the autocatalytic reaction can be induced even at subthreshold conditions (r < 1) by exogenous addition of catalytic centers. The proposed model is in agreement with the hypothesis that the progression of prion disease and the final outcome are controlled by fine dynamic balance between the rate of self-propagation and the rate of clearance of PrPSc . It would be of great interest to know what mechanisms account for multiplication of catalytic centers in vivo. The ability to control the rate of multiplication of catalytic isoforms of PrP would be a powerful strategy to treat prion diseases.
7 Conformational Diversity of Self-propagating Prion Aggregates
The “protein-only hypothesis” postulates that the given PrP amino acid sequence must be capable of adopting separate states when it is converted into pathological isoforms called “strains” of PrPSc . The existence of different prion “strains” has been recognized for a long time in the history of prion diseases [59, 60]. Each prion “strain” is associated with distinct neuropathologic features, such as the length of incubation time and the distribution of neuronal vacuolation in the brain [61]. These features are stable during serial transmission of PrPSc in a given host species. In the absence of another molecule that assists conversion, the formation
9
10 Thermodynamics of Amyloid Formation
of different strains by the same protein can only be accomplished either by covalent modification of PrP or through the adoption of multiple conformations. During the past several years, considerable evidence has been amassed indicating that the properties of prion “strains” are enciphered in the conformation of PrPSc [62–66]. According to the template assisted model the abnormal PrPSc isoform provides a conformational template and guides conversion of PrPC into a conformer identical to PrPSc . Therefore, the conformational properties of each individual “strain” are inherited by nascent PrPSc . Although supported by numerous experiments carried out in vivo, the principle of formation of conformationally distinct self-propagating aggregates by the prion protein within the same primary sequence has yet to be demonstrated in vitro. The questions of fundamental importance are: (1) what factors determine the conformational diversity of self-propagating protein aggregates, (2) what is the role of the primary structure of protein versus template and environmental factors in this process, and (3) how do new conformers (“strains”) arise?
8 High Species Specificity of Prion Propagation
Numerous experiments on transmission of prions between different species demonstrated that prion replication is highly specific with respect to the primary structure of interacting isoforms, PrPC and PrPSc [67]. When the amino acid sequence of the PrPC of a recipient animal is identical to the sequence of PrPSc of a donor animal, the disease has the shortest incubation time. In contrast, if prions are transmitted from one species to another, the incubation time is longer in the first passage and the newly infected animals develop atypical clinical signs and unusual histopathology [68]. Once the initial passage has been accomplished, the incubation period shortens in subsequent passages and becomes fixed, as does histopathology. This phenomenon is called a “species barrier” [69]. From studies in transgenic animals, two factors have been identified that contribute to the species barrier: (1) the difference in PrP sequences between the donor species and the recipient species, and (2) the “strains” of PrPSc . It has been noticed that, when transmitted to a different species, some “strains” overcome the species barrier more easily than others [70]. To explain the species barrier, Caughey et al. proposed that when both homologous and heterologous PrPC are present, the heterologous PrPC (PrPC with a sequence different from PrPSc ) could bind to PrPSc without conversion. By binding to PrPSc , the heterologous PrPC inhibits binding and conversion of homologous PrPC [71]. In this model, the process of conversion, rather than binding, requires amino acid compatibility between PrPC and PrPSc . Although this simple model rationalizes a substantial body of experimental data on the species barrier, it fails to recognize the strain specificity of the species barrier. The questions are: (1) why can some “strains” from donor species be transmitted to recipient species, while other “strains” cannot, and (2) why some, but not all, recipient species propagate certain “strains” from donor species? It is clear now that the strain and the species barrier
9 Conclusions 11
Fig. 6 The new conceptual framework postulates that (1) primary structure of PrP of each individual species (A, B and C) determines conformational diversity of “strains”. (2) However, a template specifies particular conformation within given conformational space. (3) Subsets of conformers (“strains”) formed by PrP molecules from different species may overlap suggesting that some
PrPSc “strains” are more universal and can be shared by two or more species, whereas other “strains” are more species-specific. For instance, “strain” n is shared by species A, B, and C, while “strain” m is shared by species A and B. Only those “strains” that occupy overlapping areas can be propagated by more than one species.
phenomena are closely connected. Therefore, the experimental data related to both phenomena have to be discussed in the context of a common model. Fig. 6 represents a new conceptual framework that explains the strain specificity of the species barrier. This framework may provide general guidance for research conducted on this topic.
9 Conclusions
The observations that many proteins are able to adopt alternative amyloid-like folding require us to revisit many basic issues of protein folding, such as the role of kinetic traps in the folding pathway, the complexity of the energy landscape and the position of the native state in this landscape. An alternative view of protein folding presented in Fig. 7 proposes a complex energy surface with multiple energy minima. Most polypeptides are trapped in a native state that corresponds to a local free energy minimum for the entire protein lifetime, whereas global energy minima are occupied by β-sheet-rich states, which are kinetically inaccessible under physiological conditions and at indefinite protein dilution. However, despite high kinetic barriers, some proteins, including PrP, α-synuclein, parkin, tau and other, find a route to β-sheet-rich multimeric structures with unfortunate consequences. If a β-sheet-rich amyloid structure is an intrinsic preference, particularly at high
12 Thermodynamics of Amyloid Formation
Fig. 7 An idealized “volcano energy landscape” of protein folding, to illustrate how a protein could fold fast to the native state, which might not the lowest energy state. Most proteins remain “trapped” in their native states for the entire protein lifetime, because alternative $-sheet-rich
states are separated by a high energetic barrier. Opposite to the native form, the $-sheet-rich state of each individual protein may adopt more than one conformation within given conformational space. N = native state, U = unfolded state; A = amyloid state.
protein concentration, then compartmentalization of proteins during folding and clearance of misfolded proteins should play critical roles in cellular health. In addition, amyloidogenic sidechain patterns, such as alternating polar and nonpolar amino acid residues, will be avoided through natural selection [72]. With many fundamental rules that govern protein folding having been discovered, new questions have to be addressed in the nearest future. How general is the phenomenon of self-propagating conformational transition? Can the prion hypothesis be expanded to non-prion proteins? Are individual non-prion proteins capable of assembly into conformationally distinct self-propagating aggregates (“strains”)? To what extent is sequence identity (or high homology) between native and selfpropagating abnormal isoforms necessary for efficient self-propagation? What is the role of the template in conformational diversity of self-propagating β-sheet-rich states?
References 1 CARRELL, R. W. and LOMAS, D. A. Conformational disease. Lancet 1997, 350, 134–138. 2 PRUSINER, S. B. Shattuck Lecture – neuro-degenerative diseases and prions. N Engl J Med 2001, 344, 1516–1526. 3 SIPE, J. D. and COHEN, A. S. Review: history of the amyloid fibril. J Struct Biol 2000, 130, 88–98. 4 KELLY, J. W. The alternative conformations of amyloidogenic proteins and their multistep assembly pathways. Curr Opin Struct Biol 1998, 8, 101–106.
5 CHITI, F., WEBSTER, P., TADDEI, N., CLARK, A., STEFANI, M., RAMPONI, G. and DOBSON, C. M. Designing conditions for in vitro formation of amyloid protofilaments and fibrils. Proc Natl Acad Sci USA 1999, 96, 3590–3594. 6 RAMIREZ-ALVARADO, M., MERKEL, J. S. and REGAN, L. A Systematic exploration of the influence of the protein stability on amyloid fibril formation in vitro. Proc Natl Acad Sci 2000, 97, 8979–8984. 7 YUTANI, K., TAKAYAMA, G., GODA, S., YAMAGATA, Y., MAKI, S., NAMBA, K., TSUNASAWA, S. and OGASAHARA, K. The
References
8
9
10
11
12
13
14
15
16
process of amyloid-like fibril formation by methionine aminopeptidase from a hyperthermophile, Pyrococcus furiosus. Biochemistry 2000, 39, 2769–2777. GUIJARRO, J. I., SUNDE, M., JONES, J. A., CAMPBELL, I. D. and DOBSON, C. M. Amyloid fibril formation by an SH3 domain. Proc Natl Acad Sci USA 1998, 95, 4224–4228. HAGGQVIST, B., NASLUND, J., SLETTEN, K., WESTERMARK, G., MUCCHIANO, G., TJERNBERG, L. O., NORDSTEDT, C., ENGSTROM, U. and WESTERMARK, P. Medin: an integral fragment of aortic smooth muscle cell-produced lactadherin forms the most common human amyloid. Proc Acad Natl Sci USA 1999, 96, 8669–8674. KONNO, T. Multistep nucleus formation and a separate subunit contribution of the amyloidogenesis of heat-denatured monellin. Protein Sci 2001, 10, 2093–2101. ICONOMIDOU, V. A., VRIEND, G. and HAMODRAKAS, K. Amyloid protects the silk moth oocyte and embryo. FEBS Lett 2000, 479, 141–145. CHAPMAN, M. R., ROBINSON, L. S., PINKNER, J. S., ROTH, R., HEUSER, J., HAMMAR, M., NORMARK, S. and HULTGREN, S. J. Role of Escherichia coli curli operons in directing amyloid fiber formation. Science 2002, 295, 851–855. PODRABSKY, J. E., CARPENTER, J. E. and HAND, S. C. Survival of water stress in annual fish embryos: dehydration avoidance and egg envelope amyloid fibers. Am J Physiol 2001, 280, R123–R131. JIMENEZ, J. L., NETTLETON, E. J., BOUCHARD, M., ROBINSON, C. V., DOBSON, C. M. and SAIBIL, H. The protofilament structure of insulin amyloid fibrils. Proc Acad Natl Sci USA 2002, 99, 9196–9201. JIMENEZ, J. L., GUIJARRO, J. I., ORLOVA, E., ZURDO, J., DOBSON, C. M., SUNDE, M. and SAIBIL, H. Cryoelectron microscopy structure of an SH3 amyloid fibril and model of the molecular packing. EMBO J 1999, 18, 815–821. DOBSON, C. M. Protein misfolding, evolution and disease. Trends Biochem Sci 2002, 24, 329–332.
17 ANFINSEN, C. B. Principles that govern the folding of protein chains. Science 1973, 181, 223–230. 18 BAKER, D. and AGARD, D. A. Kinetics versus thermodynamics in protein folding. Biochemistry 1994, 33, 7505–7509. 19 CHAPMAN, M. R., ROBINSON, L. S., PINKNER, J. S., ROTH, R., HEUSER, J., HAMMAR, M., NORMARK, S. and HULTGREN, S. J. Role of Escherichia coli curli operones in directing amyloid fiber formation. Science 2002, 295, 851–855. 20 BERSON, J. F., THEOS, A. C., HARPER, D. C., TENZA, D., RAPOSO, G. and MARKS, M. S. Proprotein convertase cleavage liberates a fibrillogenic fragment of a resident glycoprotein to initiate melanosome biogenesis. J Cell Biol 2003, 161, 521–533. 21 BASKAKOV, I. V., AAGAARD, C., MEHLHORN, I., WILLE, H., GROTH, D., BALDWIN, M. A., PRUSINER, S. B. and COHEN, F. E. Self-assembly of recombinant prion protein of 106 residues. Biochemistry 2000, 39, 2792–2804. 22 NETTLETON, E. J., TITO, P., SUNDE, M., BOUCHARD, M., DOBSON, C. M. and ROBINSON, C. V. Characterization of the oligomeric states of insulin in self-assembly and amyloid fibril formation by mass spectrometry. Biophys J 2000, 79, 1053–1065. 23 HUANG, T. H. J., YANG, D.-S., PLASKOS, N. P., GO, S., YIP, C. M., FRASER, P. E. and CHAKRABARTTY, A. Structural studies of soluble oligomers of the Alzheimer beta-amyloid peptide. J Mol Biol 2000, 297, 73–87. 24 WALSH, D. M., HARTLEY, D. M., KUSUMOTO, Y., FEZOUI, Y., CONDRON, M. M., LOMAKIN, A., BENEDEK, G. B., SELKOE, D. J. and TEPLOW, D. B. Amyloid beta-protein fibrillogenesis: structure and biological activity of protofibrillar intermediates. J Biol Chem 1999, 274, 25945–25952. 25 SERAG, A. A., ALTENBACH, C., GINGERY, M., HUBBELL, W. L. and YEATES, T. O. Identification of a subunit interface in transthyretin amyloid fibrils: evidence for self-assembly from oligomeric
13
14 Thermodynamics of Amyloid Formation
26
27
28
29
30
31
32
33
34
building blocks. Biochemistry 2001, 40, 9089–9096. BASKAKOV, I. V., LEGNAME, G., BALDWIN, M. A., PRUSINER, S. B. and COHEN, F. E. Pathway complexity of prion protein assembly into amyloid. J Biol Chem 2002, 277, 21140–21148. BASKAKOV, I. V., LEGNAME, G., PRUSINER, S. B. and COHEN, F. E. Folding of prion protein to its native α-helical conformation is under kinetic control. J Biol Chem 2001, 276, 19687–19690. HAMMARSTROM, P., WISEMAN, R. L., POWERS, E. T. and KELLY, J. W. Prevention of transthyretin amyloid disease by changing protein misfolding energetics. Science 2003, 299, 713–716. KHURANA, R., GILLESPIE, J. R., TALAPATRA, A., MINERT, L. J., IONESCU-ZANETTI, C., MILLETT, I. and FINK, A. L. Partially folded intermediates as critical precursors of light chain amyloid fibrils and amorphous aggregates. Biochemistry 2001, 40, 3525–3535. CHITI, F., TADDEI, N., BUCCIANTINI, M., WHITE, P., RAMPONI, G. and DOBSON, C. M.. Mutational analysis of the propensity for amyloid formation by a globular protein. EMBO J 2000, 19, 1441–1449. HAMADA, D. and DOBSON, C. M.. A kinetic of beta-lactoglobulin amyloid fibril formation promoted by urea. Protein Sci 2002., 11, 2417–2426. NIELSEN, L., KHURANA, R., COATS, A., FROKJAER, S., BRANGE, J., VYAS, S., UVERSKY V. N. and FINK, A. L. Effect of environmental factors on the kinetics of insulin fibril formation: elucidation of the molecular mechanism. Biochemistry 2001, 40, 6036–6046. KUNDU, B., MAITI, N. R., JONES, E. M., SUREWICZ, K. A., VANIK, D. L. and SUREWICZ, W. K. Nucleation-dependent conformational conversion of the Y145Stop variant of human prion protein: structural clues for prion propagation. Proc Acad Natl Sci USA 2003, 100, 12069–12074. SOKOLOWSKI, F., MODLER, A. J., MASUCH, R., ZIRWER, D., BAIER, M., LUTSCH, G. M. D. A., GAST, K. and NAUMANN, D. Formation of critical oligomers is a key event during conformational transition
35
36
37
38
39
40
41
42
43
44
45
of recombinant Syrian hamster prion protein. J Biol Chem 2003, 278, 40481–40492. KAUZMANN, W. Some factors in the interpretation of protein denaturation. Adv Protein Chem 1959, 14, 1–63. HONIG, B. and COHEN, F. E. Adding backbone to protein folding: why proteins are polypeptides. Fold Des 1996, 1, R17–R20. BALBIRNIE, M., GROTHE, R. and EISENBERG, D. S. An amyloidforming peptide from the yeast prion Sup35 reveals a dehydrated beta-sheet structure for amyloid. Proc Acad Natl Sci USA 2001, 98, 2375–2380. BOLEN, D. W. and BASKAKOV, I. V. The osmophobic effect: natural selection of a thermodynamic force in protein folding. J Mol Biol 2001, 310, 955–963. LU, B. Y. and CHANG, J. Y. Isolation of isoforms of mouse prion protein with PrPSc -like structural properties. Biochemistry 2001, 40, 13390–13396. BASKAKOV, I. V. Autocatalytic conversion of recombinant prion proteins displays a species barrier. J Biol Chem 2004, 279, 586–595. BASKAKOV, I. V. and BOLEN, D. W. Monitoring the sizes of denatured ensembles of staphylococcal nuclease proteins: implications regarding m values, intermediates, and thermodynamics. Biochemistry 1998, 37, 18010–18017. BOLEN, D. W. and YANG, M. Effects of guanidine hydrochloride on the proton inventory of proteins: implications on interpretations of protein stability. Biochemistry 2000, 39, 15208–15216. YANG, M., LIU, D. and BOLEN, D. W. The peculiar nature of the guanidine hydrochloride-induced two-state denaturation of staphylococcal nuclease: a calorimetric study. Biochemistry 1999, 38, 11216–11222. BASKAKOV, I. V., LEGNAME, G., GRYCZYNSKI, Z. and PRUSINER, S. B. The peculiar nature of unfolding of human prion protein. Protein Sci 2004, 13, 586–595. KUWATA, K., LI, H., YAMADA, H., LEGNAME, G., PRUSINER, S. B., AKASAKA,
References
46
47 48
49
50
51
52
53
54
55
K. and JAMES, T. L. Locally disordered conformer of the hamster prion protein: a crucial intermediate of PrPSc ? Biochemistry 2002, 41, 12277–12283. GAJDUSEK, D. C. Spontaneous generation of infectious nucleating amyloids in the transmissible and nontransmissible cerebral amyloidoses. Mol Neurobiol 1994, 8, 1–13. PRUSINER, S. B. Prion diseases and the BSE crisis. Science 1997, 278, 245–251. WICKNER, R. B. [URE3] as an altered URE2 protein: evidence for a prion analog in Saccharomyces cerevisiae. Science 1994, 264, 566–569. MADDELEIN, M. L., DOS REIS, S., DUVEZIN-CAUBET, S., COULARY-SALIN, B. and SAUPE, S. J. Amyloid aggregates of the HET-S prion protein are infectious. Proc Acad Natl Sci USA 2002, 99, 7402–7407. LEGNAME, G., BASKAKOV, I. V., NGUYEN, H.-O. B., RIESNER, D., COHEN, F. E., DEARMOND, S. J. and PRUSINER, S. B. Synthetic wild-type mammalian prions. Paper presented at the First International Conference of the European Network of Excellence Neuroprion, Paris, May 25, 2004. CHIEN, P. and WEISSMAN, J. S. Conformational diversity in a yeast prion dictates its seeding specificity. Nature 2001, 410, 223–227. TANAKA, M., CHIEN, P., NABER, N., COOKE, R. and WEISSMAN, J. S. Conformational variations in an infectious protein determine prion strain differences. Nature 2004, 428, 323–328. LUNDMARK, K., WESTERMARK, G. T., NYSTROM, S., MURPHY, C. L., SOLOMON, A. and WESTERMARK, P. Transmissibility of systemic amyloidosis by a prion-like mechanism. Proc Acad Natl Sci USA 2002, 99, 6979–6984. ROCHET, J. C., CONWAY, K. A. and LANSBURY, P. T., Jr. Inhibition of fibrillization and accumulation of prefibrillar oligomers in mixtures of human and mouse alpha-synuclein. Biochemistry 2000, 39, 10619–10626. O’NUALLAIN, B., WILLIAMS, A. D., WESTERMARK, P. and WETZEL, R. Seeding specificity in amyloid growth induced by
56
57
58
59
60
61
62
63
64
hetero logous fibrils. J Biol Chem 2004, 279, 17490–17494. HARPER, J. D. and LANSBURY, P. T., Jr. Models of amyloid seeding in Alzheimer’s disease and scrapie: mechanistic truths and physiological consequences of the time-dependent solubility of amyloid proteins. Annu Rev Biochem 1997, 66, 385–407. COHEN, F. E., PAN, K.-M., HUANG, Z., BALDWIN, M., FLETTERICK, R. J. and PRUSINER, S. B. Structural clues to prion replication. Science 1994, 264, 530– 531. COHEN, F. E. and PRUSINER, S. B. Pathologic conformations of prion proteins. Annu Rev Biochem 1998, 67, 793–819. FRASER, H. and DICKINSON, A. G. Scrapie in mice. Agentstrain differences in the distribution and intensity of grey matter vacuolation. J Comp Pathol 1973, 83, 29–40. DICKINSON, A. G. and FRASER, H. Modification of the pathogenesis of scrapie in mice by treatment of the agent. Nature 1969, 222, 892–893. HECKER, R., TARABOULOS, A., SCOTT, M., PAN, K.-M., TORCHIA, M., JENDROSKA, K., DEARMOND, S. J. and PRUSINER, S. B. Replication of distinct scrapie prion isolates is region specific in brains of transgenic mice and hamsters. Genes Dev 1992, 6, 1213–1228. BESSEN, R. A. and MARSH, R. F. Distinct Prp properties suggest the molecular basis of strain variation in transmissible mink encephalopathy. J Virol 1994, 68, 7859–7868. TELLING, G. C., PARCHI, P., DEARMOND, S. J., CORTELLI, P., MONTAGNA, P., GABIZON, R., MASTRIANNI, J., LUGARESI, E., GAMBETTI, P. and PRUSINER, S. B. Evidence for the conformation of the pathologic isoform of the prion protein enciphering and propagating prion diversity. Science 1996, 274, 2079–2082. SAFAR, J., WILLE, H., ITRI, V., GROTH, D., SERBAN, H., TORCHIA, M., COHEN, F. E. and PRUSINER, S. B. Eight prion strains have PrPSc molecules with different conformations. Nat Med 1998, 4, 1157–1165.
15
16 Thermodynamics of Amyloid Formation
65 CAUGHEY, B., RAYMOND, G. J. and BESSEN, R. A. Strain-dependent differences in /8-sheet conformations of abnormal prion protein. J Biol Chem 1998, 273, 32230–32235. 66 PERETZ, D., SCOTT, M., GROTH, D., WILLIAMSON, A., BURTON, D., COHEN, F. E. and PRUSINER, S. B. Strain-specified relative conformational stability of the scrapie prion protein. Protein Sci 2001, 10, 854–863. 67 SCOTT, M., RIDLEY, R. M., BAKER, H. F., DEARMOND, S. J. and PRUSINER, S. B. Transgenetic investigations of the species barrier and prion strains. In Prion Biology and Diseases, PRUSINER, S. B. (ed.). Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1999, pp. 307–347. 68 PATTISON, I. H. Experiments with scrapie with special reference to the nature of the agent and the pathology of the disease. In Slow, Latent and Temperate Virus Infections, NINDB Monograph 2, GAJDUSEK, D. C., GIBBS, C. J., Jr. and
69
70
71
72
ALPERS, M. P. (eds). US Government Printing Office, Washington, DC, 1965, pp. 249–257. PATTISON, I. H. and JONES, K. M. Modification of a strain of mouse-adapted scrapie by passage through rats. Res Vet Sci 1968, 9, 408–410. SCOTT, M. R., GROTH, D., TATZELT, J., TORCHIA, M., TREMBLAY, P., DEARMOND, S. J. and PRUSINER, S. B. Propagation of prion strains through specific conformers of the prion protein. J Virol 1997, 71, 9032–9044. HORIUCHI, M., PRIOLA, S. A., CHABRY, J. and CAUGHEY, B. Interactions between heterologous forms of prion protein: binding, inhibition of conversion, and species barriers. Proc Natl Acad Sci USA 2000, 97, 5836–5841. BROOME, B. M. and HECHT, M. H. Nature disfavors sequences of alternating polar and non-polar amino acids: implications for amyloidogenesis. J Mol Biol 2000, 296, 961–968.
1
Post-translational Chemical Modifications in Amyloid Fibril Formation Melanie R. Nilsson McDaniel College, Westminster, USA
Originally published in: Amyloid Proteins. Edited by Jean D. Sipe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31072-X
1 Introduction
From a purely na¨ıve point of view, it would seem that an obvious connection between normal protein degradation reactions (i.e. normal protein aging) and diseases associated with aging (e.g. amyloidosis) may exist. This connection, however, has received little attention. The lack of research in this area stems, in part, from the difficulties associated with detecting these subtle modifications and, in part, from the proposition that these modifications are not relevant to disease. The “conformational hypothesis” (the predominant view) suggests that amyloid fibril formation results from destabilization of the native protein as a result of changes in the local solution environment (e.g. pH, ionic strength, accessory proteins) [1–3]. An alternative view, which I will call the “chemical modification hypothesis”, suggests the protein conformational change is facilitated by a change in the primary structure (i.e. chemical modification) of the protein. The local solution environment can dramatically affect the types of chemical modifications that occur, so the solution environment is still a critical factor in fibril formation. Both hypotheses are consistent with Anfinsen’s view (cited in [1–3]) that the primary structure and aqueous environment govern the three-dimensional fold of a protein. In the conformational change hypothesis, the alteration in three-dimensional structure arises from changes in the aqueous environment, whereas in the chemical modification hypothesis, the altered conformation is a result of an altered primary sequence. It is quite possible that both hypotheses may be true, but applicable to different proteins. However, it is important to resolve which mechanism occurs in each disease because unraveling the mechanism of aggregation will have a significant impact on evaluating appropriate therapeutic strategies. The purpose of this article is to present the evidence that supports the view in which post-translational chemical modification of the primary sequence may be an important mechanism of amyloid fibril formation. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Post-translational Chemical Modifications in Amyloid Fibril Formation Table 1 Modification of amino acids in proteins (reprinted from [4], with permission)
Amino Acid
Type of modification
Alanine Arginine Asparagine Aspartic acid
GPI-anchoring N-methylation, ADP-ribosylation, deimination glycosylation, GPI-anchoring, deamidation phosphorylation, methylation, isomerization, racemization, GPI-anchoring cystine formation, selenocysteine formation, heme linkage, myris-toylation, palmitoylation, S-nitrosylation, oxidation to C-α-formylglycine, deamination, GPI-anchoring γ -carboxyglutamate formation, methylation, polyglycylation deamidation, crosslinking, pyroglutamate formation, ADP-ribosylation, N-methylation GPI-anchoring, myristoylation methylation, diphthamide formation, phosphorylation, deamination N-acetylation, N-methylation oxidation, hydroxylation, crosslinking, palmitoylation, biotinylation, deamination hydroxylation hydroxylation, glycosylation phosphorylation, glycosylation, acetylation, GPI-anchoring, oxidation to C-α-formylglycine, dehydration and formation of a thioether link phosphorylation, glycosylation, dehydration and formation of a thioether link iodination, phosphorylation, sulfation, flavin linkage, nucleotide linkage, o-tyrosine, chloro-, nitrotyrosine and dityrosine formation formylation, acetylation, pyroglutamate formation GPI-anchoring, amidation, polyglycylation
Cysteine
Glutamic acid Glutamine Glycine Histidine Lysine Phenylalanine Proline Serine Threonine Tyrosine N-terminus C-terminus
The primary structure of a protein refers to the covalent arrangement of atoms in a protein. This includes not only the sequence of amino acids, but also posttranslational processing (e.g. formation of disulfides, proteolysis, glycosylation, lipid modification, phosphorylation) in addition to non-enzymatically regulated chemical modifications such as deamidation, racemization and isomerization. Table 1 lists some chemical modifications that are known to occur in proteins [4]. It is important to approach any relationship between primary structure and protein conformation with caution, since these relationships can be extremely difficult to predict. For example, it is not surprising that myoglobin and lysozyme have very different folds (myoglobin is all α-helical and lysozyme is a mixed α/β fold) since the primary structures are not similar. However, there are proteins with significantly different primary structures that all have the same basic fold (e.g. several non-homologous proteins fold into an α/β (TIM) barrel [5]). Another insightful example is the case of mutant sequences of transthyretin (TTR). Some mutations of TTR destabilize the protein conformation and facilitate amyloid deposition. However, there are also mutations that stabilize the native conformation and suppress amyloid formation, and polymorphisms that have no significant impact on
1 Introduction 3
amyloid formation. (For a review on ATTR, see [6].) These examples emphasize that it is not easy to predict the impact a given chemical modification will have on amyloid fibril formation and, furthermore, it cannot be assumed a priori that chemical modifications will play a primary role in all amyloid diseases. Post-translational modifications encompass both enzymatic and non-enzymatic modifications. Enzymatically regulated modifications are a normal part of protein processing and typically are required for protein folding, stability or function. Nonenzymatic chemical modifications, in contrast, are generally not part of normal protein processing and result simply from the favorability of these reactions in the chemical environment of the body. Both types of modifications result in a change in the primary structure of the protein and, therefore, can potentially contribute to amyloid fibril formation. The role of enzymatically regulated modifications in amyloid fibril formation is unclear. There are two possible scenarios by which these modifications could affect fibril formation – one in which the correct processing events do not occur and the other in which there is modification at the wrong place or at the wrong time. The absence of the modifications necessary for normal processing could play a role in amyloid formation by either destabilizing the native fold or by the presence of an unprocessed region that may initiate deposition. For example, incompletely processed islet amyloid polypeptide (IAPP), in which the pro-hormone is not completely proteolytically cleaved, has been identified in amyloid deposits [7–9]. The incorrectly processed peptide contains IAPP with an additional N-terminal peptide. The presence of the N-terminal segment could contribute to amyloid formation by destabilizing IAPP (possibly by preventing the formation of the proper storage form [10]) or the N-terminal peptide may bind to components of the extracellular matrix and act as an initiation site for fibril formation [11]. Similarly, evidence for the precursors of calcitonin [12] and atrial natriuretic peptide [13] have been identified in amyloid deposits. Enzymatically regulated modifications may also contribute to fibril formation by occurring at the wrong time or in the wrong place. For example, tau, the protein component of neurofibrillary tangles in Alzheimer’s disease, is hyperphosphorylated [14]. Phosphorylation is an enzymatically regulated process but, in this case, occurs in excess. The role of non-enzymatic chemical modifications in amyloid fibril formation is also unclear. Proteins spontaneously degrade in vivo and in vitro by a variety of mechanisms. Some of the more common reactions are racemization, oxidation of methionine, and deamidation of asparagine and glutamine. In vitro these modifications are frequently ignored and considered of minor significance. One notable exception, however, is the stability of peptide drugs, which has received considerable attention in an attempt to prolong their shelf life. In vivo the body normally protects itself against these modifications by either preventing the modification, repairing the modified sequences or clearing the damaged proteins. It has been proposed, for example, that the deamidation of asparagine to aspartic/isoaspartic acid may serve as a type of molecular clock to estimate the amount of time a protein has been in the cell [15]. Presumably, after a specified period of time, a signal will be sent to degrade the protein.
4 Post-translational Chemical Modifications in Amyloid Fibril Formation
Alterations due to the labile nature of amino acids can lead to additional functionality, but there is also a danger that these reactions will result in negative consequences. This is reminiscent of the mutability of DNA that allows for evolution but also can lead to disease. Like random mutations in DNA, there is no way to easily predict the outcome of a chemical modification, but the probability of a damaging modification that is not repaired will statistically increase with age (for an overview of chemical damage associated with aging, see [16]). This is a very attractive proposition to explain the familial versus sporadic variability of amyloid disease progression. Familial cases of amyloid diseases occur when a genetic mutation leads to a protein sequence that is more prone to aggregation than the wild-type sequence. There are also, however, late-onset variations in which there are no mutations and the mechanism of destabilization of the native protein is unclear. One possible explanation is that chemical modifications which result from normal age-related protein degradation reactions may destabilize the proteins in a similar fashion as inherited point mutations. 2 Common Modifications that May Play a Significant Role In Vivo 2.1 Cleavage by Proteases or Non-enzymatic Hydrolysis
The cleavage of a peptide or protein can occur as a result of protease activity or nonenzymatic hydrolysis. Four major classes of proteases (aspartic acid, serine, cysteine and metalloproteases) have been identified and named according to the catalytic portion of the active site. Proteases are commonly stored in lysosomes and, therefore, protein cleavage associated with lysosomes has been implicated for many years in amyloidogenesis. For example, Glenner et al. in 1971 proposed a primary role for proteolysis in AL amyloid formation: “The fact that ‘amyloid’ fibrils can be created from some Bence–Jones proteins at a physiologic temperature in the presence of a proteolytic enzyme having an acidic pH optimum suggests that one possible pathogenetic mechanism for amyloid formation may be by means of intralysosomal catheptic digestion of light polypeptide chains of immunoglobulins” [17]. Cleavage of the amide backbone of a peptide or protein, however, does not need to be mediated by proteases. Acidic conditions can be used to facilitate cleavage of a protein into constituent amino acids in preparation for amino acid analysis. Although the conditions employed in this technique are not found in the body, hydrolysis of many amide bonds occurs under physiologically relevant conditions. Amide hydrolysis can be catalyzed by acid or base, but the mechanisms and labile sites are substantially different. In several cases in which the amyloid fibril component is a fragment of a larger precursor, specific proteases have not been identified in the generation of the amyloidogenic fragment, suggesting that some proteins may be cleaved by non-enzymatic reactions. A survey of the amyloid literature results in a multitude of examples in which cleavage of the protein backbone is implicated in amyloid fibril formation. For
2 Common Modifications that May Play a Significant Role In Vivo 5
example, in the most common form of amyloid deposition (which occurs in the aortic smooth muscle), the major protein component is medin, a small fragment of lactadherin [18]. More recently, unusual fragments of kerato-epithelin have been identified in corneal amyloid deposits [19]. Additional examples in which cleavage by proteolysis or hydrolysis may play a significant role in amyloid fibril formation are highlighted below. Non-pathological proteolytically induced amyloid fibril formation has been identified in melanosome biogenesis [20]. Pmel17 is a protein expressed in melanocytes and, as part of normal processing, is cleaved into two unequal pieces (the 80-kDa Mα fragment and the 28-kDa Mβ fragment) by a furin-like convertase. The Mα fragment subsequently forms insoluble fibrils which act as a scaffold for melanin pigments. The full-length protein is unable to form fibrils and assembly only occurs upon cleavage. It is speculated that this tight control mechanism prevents aberrant fibril formation. Other physiologically relevant amyloid processes, such as curli formation in Escherichia coli, are also carefully regulated events [21]. Lithostathine (also known as pancreatic stone protein, S2) is a 144-residue secretory protein that generates a 133-residue amyloidogenic fragment (S1, pancreatic thread protein) upon spontaneous autocleavage or cleavage with trypsin [22]. S1 has been identified in the brain of patients with Alzheimer’s disease [23] and in pancreatic stones [24]. Cleavage of lithostathine to S1 is a pre-fibrillogenic event and is an interesting example of fibril formation that can be induced by proteolysis or by an autocatalytic mechanism. Gelsolin is an actin-binding protein [25]. In familial amyloidosis of the Finnish type (FAF), there is a mutation at residue 187 in which an aspartic acid is converted to an asparagine or tyrosine [26, 27]. The mutations have mild effects on the thermodynamic stability of the protein [28] but lead to a susceptibility to proteolytic cleavage [29]. Cleavage results in the production of an amyloidogenic fragment that encompasses residues 173–223 or 173-225 [26]. In contrast, the wild-type protein is not cleaved and does not form amyloid fibrils. The amyloidogenic fragment has been observed in the cerebrospinal fluid (CSF) [30] and identified as the major circulating form of patients with FAF [31, 32], suggesting that cleavage is the driving force in FAF amyloid deposition. Cystatin C (also known as γ -trace basic protein) is an inhibitor of cysteine proteases. In hereditary cystatin C amyloid angiopathy (HCCAA or HCHWA-I), there is a mutation at position 68 in which a leucine is replaced with a glutamine [33]. Amyloid deposits of this variant protein contain cystatin C lacking the first 10 residues [33]. The soluble truncated form has not been observed in detectable quantities in the CSF [34], but the loss of the 10-residue fragment is consistent with the proposed domain-swapped dimer mechanism of cystatin fibril formation [35]. The dimer interface occurs between strands 2 and 3 (residues 42–59) [35], and the loss of the first 10 residues would eliminate the edge strand (strand 1) and potentially facilitate deposition. In sporadic cerebral amyloid angiopathy with cystatin C deposition (SCCAA), there is no mutation present and full-length cystatin C is found in amyloid deposits. However, in SCCAA it is speculated that cystatin C co-precipitates with the Aβ peptide [36].
6 Post-translational Chemical Modifications in Amyloid Fibril Formation
N-terminal fragments of different apolipoproteins are associated with a wide variety of amyloid deposits. Fragments of apolipoprotein AIV (ApoAIV) have been co-localized with TTR in cardiac amyloid and the authors have proposed that this fragment may “serve as a local nidus for fibrillogenesis” [37]. An N-terminal fragment of apolipoprotein AI (ApoAI) is the major fibril protein in atherosclerotic plaques and a fragment of variant ApoAI is the major amyloid constituent in FAP Type III (Iowa) [38–40]. Similarly, the N-terminal portion of the apolipoprotein serum amyloid A (ApoSAA) has been isolated from amyloid deposits and this fragment has been generated in vitro upon digestion with cathepsin B, suggesting that proteolysis is a prefibrillogenic event [41, 42]. Hatters and Howlett propose a mechanism by which apolipoproteins may be deposited. They suggest that “oxidative processes may promote protein truncation or modification of apolipoproteins in a manner that perturbs their lipid binding properties and thus by default promote the amyloidogenic folding options” [43]. Familial British dementia (FBD) is associated with amyloid fibrils composed of C-terminal fragments of α- and β-tubulin [44] and fragments of a protein of unknown function named BRI [45, 46]. In FBD, a mutation in the BRI gene results in the production of a protein (BRI-L) that is extended by 11 amino acids at the C-terminus. Cleavage of this protein near the C-terminus results in the release of a 34-amino-acid peptide that readily forms amyloid fibrils [45]. Cleavage of BRI-L is speculated to be mediated by furin [47, 48]. A similar proteolytic event is involved in the formation of a different amyloidogenic peptide derived from BRI that occurs in familial Danish dementia [48]. β 2 -Microglobulin (β 2 M) amyloidosis occurs in patients undergoing long-term dialysis treatment. N-terminally truncated forms of β 2 M are common in ex vivo amyloid. Cleavage occurs between K6 and I7 (around 25%), and to a lesser extent between residues 10–11, 17–18, 19–20, 86–87 and 98–99 [49, 50]. Truncated β 2 M, which corresponds to a loss of the first six residues, has a high tendency to aggregate in vitro [49, 51]. Therefore, it is possible that truncation of β 2 M destabilizes the native fold and leads to amyloid fibril formation. TTR amyloid formation occurs in familial amyloid polyneuropathy (FAP) and senile systemic amyloidosis (SSA). FAP is associated with mutations in TTR [52], while SSA is associated with a fragment of wild-type TTR that results from cleavage between residues 46 and 52 [53]. It has been proposed that the mutations observed in FAP destabilize the native TTR fold and facilitate deposition. The lack of mutations observed in the A and H β-strands of TTR and the demonstration that a synthetic peptide of the A-strand can form amyloid fibrils in vitro [54, 55] led to the hypothesis that the A-strand (and H-strand) may be critical for fibril formation [56]. However, if the A-strand plays a critical role in the fibril structure, removal of this strand would be detrimental to fibril formation. The observation that around 80% of wildtype TTR in ex vivo amyloid fibrils lacks the N-terminal region (residues 1–46, 49, 52 or 53) [53] which contains the A-strand (residues 10–20) is inconsistent with this view. Therefore, wild-type TTR and mutant TTR may form fibrils via different pathways. Mutations of TTR could lead to global destabilization of the native fold that facilitates amyloid fibril formation, resulting in the relatively early age of onset
2 Common Modifications that May Play a Significant Role In Vivo 7
observed in FAP. In contrast, wild-type TTR may be cleaved as a result of age-related chemical modification or aberrant proteolysis, which would be consistent with the older age of onset of SSA. α-Synuclein (also known as NACP) has been identified in inclusions associated with Parkinson’s disease, dementia with Lewy bodies (DLB), Alzheimer’s disease, multiple system atrophy (MSA) and amyotrophic lateral sclerosis (ALS) (reviewed in [57]). Lewy bodies contain α-Synuclein and in vitro studies reveal α-synuclein fibrils have all the hallmarks of amyloid [58]. Truncated forms of α-Synuclein have been reported in Parkinson’s disease and DLB [59, 60]. Fragments of α-Synuclein (around 2–4 kDa shorter than the full-length protein) have been identified in Lewy bodies, but not in a normal control [60]. Fragments that are around 4 and 10 kDa shorter than full-length α-Synuclein have been observed in soluble fractions of Parkinson’s disease, DLB and normal control extracts [59], suggesting these fragments may be normal breakdown products of α-Synuclein. However, the smaller fragment was also identified in poorly soluble fractions suggesting this fragment may play a role in fibril formation [59]. Immunochemical analysis indicates that the 4- and 10-kDa truncated fragments are derived from the middle region of α-Synuclein [59], which is consistent with the previously identified NAC peptide isolated from the brains of patients with Alzheimer’s disease [61]. Significant Aβ N-terminal and C-terminal heterogeneity has been observed in Alzheimer’s disease amyloid deposits. Aβ is generated by cleavage of a larger precursor, the amyloid β precursor protein (AβPP). In addition to Aβ, cleavage of AβPP can generate two other peptides, C100 and p3 [62]. All three peptides (Aβ, p3 and C100) have been implicated in amyloid formation. Intact C100 is competent to form amyloid fibrils which, upon non-specific proteolysis, results in fibrils composed of Aβ [63]. The peptide p3 is the major component of “preamyloid”, indicating an important role of p3 in amyloidogenesis [64]. Ragged N and C-termini of Aβ have been found in plaques. The C-termini reported include V39, V40, A42 and T43 [65]. Aβ which ends in A42 is highly amyloidogenic and represents the earliest C-terminus observed in deposits, suggesting that an increase in the ratio of A42 to other peptides may facilitate fibril formation [66, 67]. Aβ peptides have been identified with N-termini beginning with –I6, -V3, D1 or isoD1, A2, pE3, F4, S8, G9, pE11 and L17. The earliest N-termini in deposits have been speculated to be L17 (p3) followed by deposition of pE3 and D1 [64, 66]. The full-length human prion protein (PrPc ) consists of residues 23–231 after the signal sequence is removed from the primary translation product and the GPI anchor attached. A protein truncated at approximately residue 80–90 is significantly elevated in Creutzfeldt-Jakob disease [68]. It has been proposed that heterogeneity at the N-terminus could contribute to the strain variation observed in prion diseases [69]. Neuronal inclusions observed in Huntington’s disease are fibrillar [70] and Congo red birefringent [71, 72]. The inclusions contain huntingtin and a 40-kDa N-terminal fragment of huntingtin [70]. In vitro experiments on N-terminal fragments of huntingtin have led to the suggestion that “the proteolytic cleavage event that yields this ‘toxic’ fragment would also provide a rate-limiting step in Huntington’s disease
8 Post-translational Chemical Modifications in Amyloid Fibril Formation
onset” [73]. Proteolysis has also been proposed as an important factor in other polyglutamine disorders [74]. The neurotoxicity observed in Huntington’s disease and other polyglutamine diseases may be mediated by the binding of polyglutamine proteins (e.g. TATA-binding protein) to the huntingtin deposit which could deplete the necessary supply of these proteins in the brain [72]. Aberrant processing of viral proteins may lead to amyloidogenic fragments that can nucleate amyloid deposition [75]. This has been championed in the case of glycoprotein B (gB) from herpes simplex virus 1 (HSV1). HSV1 DNA has previously been identified in the same regions of the brain that are affected by Alzheimer’s disease [76, 77] and more recently a fragment of gB has been demonstrated to accelerate Aβ fibril formation in vitro [75].
2.2 Deamidation, Isomerization, Racemization and Protein l-Isoaspartyl Methyltransferase (PIMT)
Deamidation of asparagine and glutamine is common under physiological conditions (for reviews, see [78–81]). Both the reaction rate and mechanism are influenced by a host of factors, including solution conditions (e.g. pH, ionic strength, buffers, temperature), primary sequence, secondary structure and tertiary structure. The inherent chemical and thermal lability of these residues is highlighted in the observed degradation products of peptide drugs upon aging (e.g. calcitonin [82], pramlintide [83], insulin [84]) and in the low number of Q/N repeats in thermophiles compared to mesophiles [85]. Deamidation reactions can lead to fragmentation of the amide backbone or result in subtle backbone or side-chain modifications. For example, asparagine deamidation can lead to the formation of L- or D-aspartic acid, L- or D-isoaspartic acid or fragmentation of the protein chain (Fig. 1) [86, 87]. Deamidation of glutamine can yield glutamic acid or, if the residue is at the N-terminus, pyroglutamic acid. The conversion of an asparagine to an aspartic/isoaspartic acid or glutamine to glutamic/pyroglutamic acid results in a protein that has the equivalent of a posttranslational point mutation. From a protein folding point of view, the chemical modification could affect the thermodynamic stability of the native fold, the kinetics of folding or unfolding, or the overall flexibility of the molecule. Deamidation of ribonuclease A (N67isoAsp), for example, significantly decreases the folding rate [88] while deamidation of lysozyme (N103D or N106D) increases the susceptibility to proteolysis [89]. The destabilization of the native state or enhanced flexibility could result in proteins that are more prone to aggregation. Deamidation leads to a change in net charge of the protein. Under physiological conditions (pH ∼7.4), asparagine and glutamine are neutral, but aspartic acid and glutamic acid are negatively charged. Model studies suggest that aggregation is favored when the net charge of the protein is closer to zero [90], so deamidation may facilitate amyloid fibril formation of basic proteins by altering the net charge. Several other post-translational chemical modifications can alter the net charge
2 Common Modifications that May Play a Significant Role In Vivo 9
Fig. 1 Mechanisms for deamidation, isomerization, and racemization of asparagine and aspartic acid. (Reprinted from [86, 87], with permission)
of a protein, including C-terminal amidation, N-terminal acetylation, advanced glycation (or glycosylation) end-product (AGE) formation and arginylation [91]. Polyglutamine repeat diseases [e.g. Huntington’s disease, spinalbulbar muscular atrophy (Kennedy’s disease), dentatorubral-pallidoluysian atrophy, and spinocerebellar ataxias 1, 2, 3 (Machado-Joseph disease), 6, 7 and 17] are characterized by the aggregation of glutamine-rich peptides/proteins [92, 93]. Huntington’s disease is the most common glutamine-repeat disease, and in vitro analysis of glutaminerich peptides demonstrates that these deposits are fibrillar, β-sheet rich and bind Thioflavin T [94]. Polyglutamine peptides are likely candidates for deamidation, but the effect of deamidation on fibril formation has not yet been investigated. Polyglutamine peptides are, however, susceptible to transglutaminases which can catalyze deamidation of glutamine or crosslinking between glutamine and other amines (e.g. lysine) [95]. These chemical modifications could potentially play a role in fibril formation by promoting nucleation or by stabilizing the resulting fibril. Glutamine- and asparagine-rich domains are common in yeast prions, and it is speculated that these repeats may provide interaction specificity [85]. These proteins, however, will also likely be susceptible to deamidation. Deamidation of PrPC has been shown to occur spontaneously [96] and results in a protein that readily forms protease K-resistant aggregates upon exposure to Cu2 + [97]. Aggregation of other prions may be mediated by a similar pathway.
10 Post-translational Chemical Modifications in Amyloid Fibril Formation
β 2 M extracted from amyloid deposits has been reported to be deamidated at positions 17 and 42 (Asn to Asp/isoAsp) [98, 99]. In vitro experiments suggest that Asn 17 to Asp is unlikely to accelerate amyloid fibril formation [100], but the other deamidated species have not been investigated. Similarly, deamidation and isomerization of tau has been identified in paired helical filaments, although no specific role in the aggregation process has been identified for these modifications [101]. In familial cataract, amyloid fibrils composed of crystallins are thought to precede cataract formation in vivo [102] and this process is supported by in vitro studies [103]. The crystallins are a group of proteins that reside in the human lens. These proteins do not turn over as the lens ages, so chemical modifications of crystallins are common in both normal and cataractous lenses. Investigations to elucidate differences between normal and cataractous lenses have revealed elevated deamidation of γ S-crystallin in cataractous lenses [104] at asparagine 143 [105]. Significant levels of glutamine deamidation have been reported in ex vivo Aβ amyloid deposits. The glutamine at position 3 of Aβ is significantly deamidated (10–50%) to yield an N-terminal pyroglutamic acid (pE3 Aβ) [65, 106–108]. Pyroglutamic acid at position 11 (pE11 Aβ) has also been reported [65]. The highly variable amounts reported for pE3 Aβ in the brain may, in part, be due to the different localization of this peptide [65, 108]. pE3 Aβ is common in diffuse amyloid plaques (proposed to be an early event in amyloid deposition) and the ratio of Aβ to pE3 Aβ directly correlates with age (younger subjects have higher levels of pE3 Aβ in plaques) [65, 109]. pE3 Aβ has also been shown to aggregate more readily than Aβ in vitro [107, 110]. Taken together, this data suggests that the formation of pE3 Aβ is a pre-fibrillogenic event which can induce amyloid fibril formation of pE3 Aβ followed by the subsequent deposition of Aβ onto these deposits [109]. Isomerization of aspartic acid and asparagine (concurrent with deamidation) is one of the most common non-enzymatic chemical modifications of proteins in vivo. This reaction occurs most readily in regions of high flexibility and with preference to sequences that contain an aspartic acid or asparagine followed by a glycine. Isomerization results in a change in conformation of the peptide/protein backbone [111] which could facilitate fibril formation. Aβ has three aspartic acid residues (D1, D7 and D23) and isomerization of each residue has been reported in ex vivo amyloid deposits [108, 112, 113]. If isomerization occurs as a secondary phenomenon to exposed Asp residues after Aβ fibril formation, a relatively equal distribution of chemically modified Aβ would be anticipated. Interestingly, the distribution of modified forms of Aβ differ between neuritic and vascular amyloid [108, 114]. This variable localization has, therefore, led to the hypothesis that Asp isomerization is a pre-fibrillogenic event [114]. A direct role in nucleation of fibril formation is supported by the observation that peptides with isoAsp at D1 and D7 have a higher β-sheet content [115], and isomerized D23 forms fibrils more readily than unmodified Aβ [113]. Furthermore, the presence of isoAsp (or D-Asp) can prevent protein turnover [112], thereby increasing the local concentration. Alternative explanations for the role of Asp isomerization in Alzheimer’s disease have been proposed in which a central role is attributed to the succinamide intermediate that
2 Common Modifications that May Play a Significant Role In Vivo 11
is transiently populated during the conversion of Asp to isoAsp. The succinamide intermediate, unlike Asp and isoAsp, is neutral and the loss of charge has been postulated to facilitate aggregation [116]. Furthermore, the succinamide residue adopts a geometry similar to a type II β-turn and may result in the formation of cytotoxic Aβ β-hairpins [117]. Racemization occurs when the stereochemistry at the α carbon is altered from the L configuration to the D configuration. This process is quite common [118] and can be catalyzed by racemases or occur non-enzymatically. Non-enzymatic racemization of amino acids in proteins has been explored as a possible dating technique to estimate the age of shells and other biomineralized materials [119]. Bacteria use racemases to generate cell wall peptides that contain both L- and D-amino acids which are relatively resistant to proteases. Some racemase inhibitors (e.g. cycloserine) produce neurological side-effects which may be a result of interactions with the recently discovered human serine racemase in the brain [120]. The presence of racemized amino acids can affect the protein conformation, have an impact on the folding/unfolding rates, or alter the thermodynamic stability. Although D-amino acids can destabilize β-sheets, peptides with D-amino acids have been shown to readily form amyloid fibrils [121]. Racemization of aspartic acid and serine residues has been observed in Aβ isolated from ex vivo amyloid fibrils [122]. Aspartic acid is very susceptible to spontaneous racemization and D-Asp Aβ has been identified in vitro [123]. D-Asp Aβ aggregates faster than unmodified Aβ, suggesting a possible role for racemization in amyloid fibril nucleation [123, 124]. D-Ser at position 8 likewise facilitates aggregation, but D-Ser at position 26 does not [124]. However, D-Ser 26 Aβ can be proteolyzed into a smaller, highly toxic fragment that may play a role in Alzheimer’s disease [125]. PIMT is an intracellular repair enzyme that facilitates the conversion of D-Asp or L-isoAsp into L-Asp [126]. A lower amount of PIMT activity has been correlated with an earlier age of death [127], suggesting that there may be a relationship between the accumulation of chemically modified proteins and mortality. PIMT knock-out mice exhibit a 9-fold increase in L-isoAsp and a phenotype of fatal epilepsy [128]. Likewise, the examination of human hippocampal tissue reveals dramatically reduced levels (50% lower) of PIMT activity and a 2-fold increase in damaged tubulin in epileptic versus control samples [129]. Cortical samples show similar levels of PIMT activity compared to controls [129], which is consistent with an earlier Alzheimer’s disease study [127]. Thus, local depletion of PIMT may be an interesting area of investigation to determine if excess deamidation or isomerization plays an active role in amyloid fibril formation. 2.3 Oxidative Damage
The reactants and products of oxidation reactions are highly variable, resulting in a staggering array of possible protein modifications that may impact amyloid deposition. Proteins can be oxidized by reacting with metals, lipids and other small
12 Post-translational Chemical Modifications in Amyloid Fibril Formation
molecules [16]. Oxidative damage has been proposed to play a role in a variety of lateonset diseases including Alzheimer’s disease [130]. For example, lipid peroxides can react with cysteine, histidine, and lysine residues to generate advanced lipoxidation end-products (ALEs) which can affect protein structure, charge, hydrophobicity and crosslinking [16]. Baynes and Thorpe “propose that atherosclerosis is an age-related disease characterized by accelerated lipid peroxidation and lipoxidative aging of proteins in the vascular wall” [16]. Abnormal metabolites (e.g. ketoaldehydes) have also been proposed to similarly react with proteins and affect aggregation [131]. Cysteine side-chains are frequently oxidized to form disulfide bonds as part of normal protein processing, but aberrant disulfide bonds may facilitate amyloid formation [132]. For example, normal lenses contain αA-crystallin in which Cys131 and Cys142 are partially reduced, but cataractous lenses contain only oxidized Cys131 and Cys142 which may impact lens transparency [133]. The release of monomeric TTR from ex vivo amyloid upon reduction suggests that there may be some disulfide bonds formed between TTR monomers in the fibrils [53]. (Note: Cys10 is the only cysteine in wild-type TTR.) The identification of a Cys10Arg TTR mutation that results in FAP [134] and a Cys10Ala mutant that forms amyloid fibrils in vitro [135], however, indicates that an intermolecular disulfide bridge is not necessary for the initiation of amyloid fibril formation. The Cys10 crosslinks may instead play a role in the thermodynamic stabilization of the fibrils. Cysteine residues on proteins can also react with free cysteine, cysteinylglycine, glutathione or sulfite in vivo. Disturbances in cysteine metabolism have been identified in several neurological disorders, including Alzheimer’s disease, Parkinson’s disease, motor neuron disease [136] and Hallervorden-Spatz disease [137]. Although no direct correlation to amyloid fibril formation has been elucidated, it has been proposed that altered thiol metabolism may contribute to the overall oxidative damage observed in these diseases. A more direct relationship between free thiols and amyloid fibril formation has been identified in AL amyloidosis and FAP. Protein extraction from ex vivo amyloid fibrils has revealed cysteinylation of Cys214 in k1 light chain in AL amyloidosis [138], and conjugation of Cys10 of TTR to sulfite, cysteine, cysteinylglycine and glutathione in FAP [139]. Modification of Cys10 with cysteine, cysteinylglycine or glutathione results in destabilization of TTR and enhanced fibril formation [140]. Significantly, these modifications result in TTR molecules that readily form fibrils at pH 4.8, a condition under which wild-type TTR does not aggregate [140]. Methionine oxidation has been reported for several amyloid fibril proteins including lysozyme [141], PrPsc [142], β 2 M [98] and Aβ [143, 144]. While earlier studies suggested that Met35 oxidation of Aβ increased the rate of aggregation [145], more recent work indicates that Met35 oxidation decreases the rate of βfi fibril formation [146–148]. Oxidation of methionine residues in α-synuclein results in a decrease in aggregation, but this inhibition is eliminated in the presence of metals [149–151]. In view of the present evidence, the role of methionine oxidation in amyloidosis remains unclear. It is possible that, in some cases, the oxidation may be protective by slowing fibril formation.
2 Common Modifications that May Play a Significant Role In Vivo 13
Tyrosine is susceptible to a variety of oxidation reactions that can lead to the formation of chlorotyrosine, nitrotyrosine, o-tyrosine and dityrosine [16]. Tyrosine nitration of α-synuclein in Parkinson’s disease and tau in Alzheimer’s disease has been reported and postulated to contribute to disease progression by affecting fibril formation or stability [152, 153]. Chlorination of tyrosine occurs upon exposure to HOCl, a strong oxidizing agent that is produced by neutrophils and monocytes [154]. Tyrosyl chlorination can promote protein aggregation [155] and has been implicated in formation of atherosclerotic plaques [156]. Dityrosine formation can form covalent crosslinks between peptides and it has been proposed that dityrosine formation of Aβ may stabilize amyloid fibrils in Alzheimer’s disease [157]. 2.4 AGEs
Carbohydrates can react with protein amino groups (e.g. N-terminus, lysine, arginine) to form a Schiff base that can rearrange to an Amadori product and, finally, an AGE. The time required to reach equilibrium is on the order of hours for Schiff base formation, days for Amadori products and weeks to months for AGEs. For an excellent review of AGE chemistry, see Bucala and Cerami [158]. AGEs, therefore, are predominantly identified in proteins with a low turnover rate in vivo. The formation of an AGE may result in the attachment of an adduct to the protein, protein crosslinking (e.g. an argininelysine crosslink or dilysine crosslink) [16] or fragmentation of the backbone [159]. The presence of AGE-modified proteins has been observed in cataracts [158], FAP [160], AA amyloidosis [161], Type 2 diabetes mellitus [162], Alzheimer’s disease [163] and patients undergoing long-term dialysis treatment [164]. AGE modification may promote amyloid fibril formation by decreasing the solubility as a result of the alteration in the net charge of the protein or AGE crosslinks could stabilize the fibrils. AGE modifications of both Aβ [163] and tau [165] have been identified in Alzheimer’s disease. AGE modification likely occurs early in the disease process [166] and may nucleate amyloid deposition as a result of accelerating Aβ aggregation [167]. In contrast, AGE-modified IAPP does not dramatically alter fibril formation kinetics, but does lead to fibrils that are more cytotoxic [168]. 2.5 Phosphorylation
Alzheimer’s disease is characterized by the presence of Aβ amyloid deposits and neurofibrillary tangles (NFTs) in the brain. Similarly, inclusion body myositis (IBM) is characterized by the deposition of Aβ amyloid and tau filaments in muscle. Ultrastructurally, the tau filaments are very similar in Alzheimer’s disease and IBM [169]. In Alzheimer’s disease, these filaments are called paired helical filaments (PHFs) and are the principal structural component of NFTs. In IBM, these filaments accumulate in muscle fibers and are called twisted tubulo filaments (TTFs). PHFs and TTFs are both composed of hyperphosphorylated tau [14, 169]. Phosphorylation
14 Post-translational Chemical Modifications in Amyloid Fibril Formation
of tau is not required for the formation of PHFs in vitro, but fibrils composed of phosphorylated tau resemble authentic PHFs more closely than do fibrils generated from non-phosphorylated tau [170]. This data suggests that phosphorylation may be an important pre-fibrillogenic event. AGE modification of tau, however, has been proposed to play a greater role in PHF insolubility than phosphorylation [171]. Therefore, it is possible that phosphorylation may nucleate fibril formation and AGE modification may stabilize the resultant fibrils against degradation.
3 Proposed Mechanisms by which Chemical Modifications may Affect Amyloid Deposition
Virtually all amyloid deposits contain some protein that is chemically modified. However, the identification of chemically modified protein in ex vivo amyloid does not a priori indicate that the modifications play a causative role. Therefore, it is necessary to consider the possible ways in which chemical modifications may affect amyloid deposition. There are three fundamentally different ways in which chemical modifications may contribute to amyloidosis. A chemical modification may affect the rate at which fibrils are formed (kinetic effect), the relative stability of the fibrils (thermodynamic effect) or lead to an increase in fibril toxicity (cytotoxic effect). In order to have a kinetic effect on amyloid fibril formation, the chemical modification must occur prior to fibrillogenesis and accelerate the slow step of fibril formation. Most of the mechanistic models of amyloid fibril formation require two basic steps, nucleation and growth (for a review of current models, see [172]). Nucleation involves the conversion of soluble protein (S) into an assembly competent state (A) that forms the nucleus (N) or seed. The formation of the nucleus from the soluble protein is the slow (rate-limiting) step in amyloid fibril formation. Fibril growth then proceeds via the addition of protein to the nucleus. Protein can bind to the nucleus in the soluble form (S) and then rearrange upon binding [173–175], or it can bind in the assembly competent (A) form [176] (Fig. 2). The assembly competent state (A) is generally believed to be a partially folded intermediate state [1–3, 177]. The destabilization of the native state, therefore, can increase the population of this intermediate and drive nucleus formation. Amyloidogenic mutants, for example, have been shown to be less stable than the wild-type proteins and it is proposed that the mutants populate the assembly competent intermediate state more readily [178]. Analogously, it is possible that post-translational chemical modifications can result in a protein (S*) that is less stable than the wild-type, readily populates the assembly competent intermediate state (i.e. S*→A is fast) and facilitates nucleus formation (Fig. 2). The protein generated by post-translational chemical modification could even be the equivalent of the assembly competent protein (i.e. S* = A). Depending on the mechanism of protein addition to the nucleus, the growth of amyloid fibrils by the chemical modification mechanism could require either
3 Proposed Mechanisms by which Chemical Modifications may Affect Amyloid Deposition
Fig. 2 Nucleation and growth mechanisms. Key: S=soluble protein, S*=chemically modified protein, A=assembly competent state, N=nucleus, F=fibril.
small or large amounts of modified protein. If the soluble protein adds directly to the nucleus, then only trace amounts of chemically modified proteins would be required to drive amyloid fibril formation. If, however, the assembly competent state is required for addition, then the majority of the protein in the fibrils would need to be chemically modified. Chemically modified proteins have been observed in ex vivo amyloid deposits in both major and minor quantities, suggesting that both mechanisms may occur in vivo. Nucleus formation is inherently slow as a result of the entropically unfavorable association of the A-state protein. Increasing the concentration of the A-state protein, however, can help to overcome this barrier. The population of the Astate can be increased in two different ways; destabilization of the S-state could shift the equilibrium toward the A-state or the degradation of the A-state could be slowed. Chemical modifications of the primary structure likely affect both pathways. The likelihood of amyloid fibril formation proceeding by the mechanism out lined in Fig. 2b has been strongly supported by examples of proteins in which cross-seeding occurs. Cross-seeding experiments demonstrate that the nucleus does not need to be formed from the protein used in the growth process [179]. A nucleus formed from a fragment of a protein can, in many cases, nucleate fibril formation from the full-length protein (e.g. tau [180], lysozyme [181], β 2 M [182]). This suggests that cleavage of a protein could generate a highly amyloidogenic fragment which nucleates fibril formation of the full-length protein. Nuclei that contain sequences that are very similar (e.g. polyglutamine [183]), differ by only 1 amino acid (e.g. α-synuclein [184], lysozyme [185], protein G BI domain [186], Aβ [187]) or differ significantly (e.g. tau/α-synuclein [188], tubulin/α-Synuclein [189], NAC/Aβ [190], A/β/IAPP [187], Aβ/tau [191], Aβ, TTR fragments, IAPP, silk/serum amyloid A, reviewed in [192]) from the soluble protein have been shown to accelerate fibril formation. Cross-seeding, however, is not necessarily a reciprocal process. A nucleus composed of protein X may be able to seed protein Y, but a nucleus of protein Y may not be able to seed protein X [187, 193]. It is also
15
16 Post-translational Chemical Modifications in Amyloid Fibril Formation
important to note that there are cases in which a protein may bind to a nucleus but not seed additional fibril formation [194]. The discovery of heterologous seeding has implications for the purity of peptides and proteins used for in vitro experiments designed to understand fibril formation. The batch-to-batch variability of Aβ is well documented and may be a result of different impurities [123]. In some cases, only trace amounts of modified peptide are necessary to seed fibril formation from unmodified protein [195], indicating that experiments need to be performed with exceptionally pure materials to assess mechanistic implications of fibril formation. Furthermore, the protein deposited in the fibrils can also be characterized to determine the extent of modified protein in the amyloid fibrils [196]. The two pieces of evidence necessary to adequately assess the impact of a chemical modification on the kinetics of fibril formation in vivo are the demonstration that the chemical modification is a pre-fibrillogenic event and accelerates nucleus formation. One method to determine if the modification is pre-fibrillogenic in vivo is to determine if the modified protein is present in the circulation. Although this type of experiment is insightful, there are two potential obstacles. First, if modified protein is not observed in the circulation, it is possible that the rate of nucleus formation is so fast that the modified protein is not present in detectable levels. Second, if modified protein is observed in the circulation, it is possible that the protein originated from the dissolution of amyloid fibrils in vivo and the chemical modification was, in fact, a post-fibrillogenic event. Another method to assess if the modification is preor post-fibrillogenic is to examine the distribution of chemically modified protein in the fibrils. If a chemical modification is a post-fibrillogenic event, it is reasonable to assume that the distribution of chemically modified protein in the fibrils will be similar. However, it has been determined that many modifications of Aβ are not equally distributed and have temporal relationship to fibril deposition (see Section 2.2). Finally, the extent of chemically modified protein in the fibrils may provide an insight into whether or not the modification is pre- or post-fibrillogenic. If the chemical modification of a specific residue is a post-fibrillogenic event, over time all of the exposed residues will become modified. Thus, if 50% or more of these residues are modified in aged samples, this may suggest that this residue is surface exposed and may be modified after fibril formation. However, in cases in which only small amounts of modification are present, this chemical modification is either very slow or the residue is in the interior of the fibril. If the residue is in the core of the fibril, chemical modification is likely to be a pre-fibrillogenic event. New methods have recently been explored in which the extent of surface exposed residues of in vitro amyloid fibrils were elucidated by artificial aging of the fibrils [196]. It will be interesting to see if similar methods can be applied to ex vivo fibrils. Another factor that is important to determine the effect of a chemical modification on the kinetics of fibril formation is the extent to which the modification accelerates fibril formation. In vitro studies have been used to determine that many chemical modifications of peptides lead to enhanced rates of fibril formation compared to wild-type as monitored by Thioflavin T fluorescence or Congo red binding. These rates are generally assessed by comparing equal concentrations of modified and
4 Conclusions
wild-type peptides. These results, however, need to be examined in the context of the relative concentrations of modified and wild-type peptides in vivo. For example, Met oxidation of Aβ has been observed in vivo and in vitro and there is evidence that this peptide aggregates more slowly than Aβ in vitro (see Section 2.3). From this information, it is tempting to conclude that Met oxidation of Aβ has a protective effect by inhibiting Aβ fibril formation. However, if the Met-oxidized Aβ peptide is turned over at a significantly slower rate than Aβ, the Met-oxidized peptide could accumulate and nucleate fibril formation. Pre- or post-fibrillogenic chemical modifications can play a role in amyloid fibril formation by altering the thermodynamic stability of the fibrils or the cytotoxicity. The stability can be enhanced by nucleating a different packing arrangement, the formation of covalent crosslinks (e.g. disulfides, AGEs, dityrosine), or decreasing the ability of the immune system to recognize or degrade the fibrils. One method to determine if the chemical modification can nucleate a different packing arrangement is to examine the fibril morphology. Phosphorylated tau, for example, results in fibrils that more closely resemble authentic PHFs than nonphosphorylated tau [170]. Fibrils can be examined for thermostability by comparing the measured solubility of modified and non-modified fibrils upon addition of denaturant. Changes in thermodynamic stability and cytotoxicity could be closely related. Fibrils and small oligomers both appear cytotoxic, but oligomers may be able to diffuse through tissue and, therefore, be toxic to more cells. Therefore, increasing the thermodynamic stability of the fibrils may be both bad and good – bad because it will make the fibrils difficult to clear from the body, good because it may shift the equilibrium toward the fibrils which could decrease the production of cytotoxic oligomers.
4 Conclusions
The majority of amyloid deposits examined to date contain some protein that is post-translationally modified. The role of these chemical modifications in amyloid fibril formation, however, has remained underexplored. A major challenge for the future will be to establish a temporal relationship between chemical modifications and fibril formation, and to assess the impact of the modification on kinetics of fibril formation, thermodynamic stability and cytotoxicity. A clear understanding of the role of chemical modifications in amyloid fibril formation will not only expand our current knowledge of protein misfolding, but may also lead to new avenues for therapeutic intervention. One current strategy to inhibit amyloid fibril formation is to design molecules that can stabilize the native state of the protein. This strategy will also be applicable for cases in which chemical modification plays a role because modifications are less likely in regions of a protein that are not flexible. However, a new strategy to prevent amyloid fibril formation may be to inhibit aberrant protein chemical modifications.
17
18 Post-translational Chemical Modifications in Amyloid Fibril Formation
5 Acknowledgments
I am grateful to Daniel Moriarty for a critical review of this manuscript, and to Carole Klapper and Randall Morrison for technical assistance in manuscript preparation.
References 1 CARRELL, R. W. and D. A. LOMAS. Conformational disease. Lancet 1997, 350, 134–138. 2 KELLY, J. W. The alternative conformations of amyloidogenic proteins and their multi-step assembly pathways. Curr Opin Struct Biol 1998, 8, 101–106. 3 DOBSON, C. M. The structural basis of protein folding and its links with human disease. Phil Trans R Soc Lond B Biol Sci 2001, 356, 133–145. 4 NALIVAEVA, N. N. and A. J. TURNER. Posttranslational modifications of proteins: acetylcholinesterase as a model system. Proteomics 2001, 1, 735–747. 5 NAGANO, N., E. G. HUTCHINSON and J. M. THORNTON. Barrel structures in proteins: automatic identification and classification including a sequence analysis of TIM barrels. Protein Sci 1999, 8, 2072–2084. 6 DAMAS, A. M. and M. J. SARAIVA. Review TTR amyloidosis – structural features leading to protein aggregation and their implications on therapeutic strategies. J Struct Biol 2000, 130, 290–299. 7 CLARK, A., E. D. KONING, C. BAKER, S. CHARGE and J. MORRIS. Localisation of N-terminal pro-islet amyloid polypeptide in β-cells of man and transgenic mice. Diabetologia 1993, 36, A136. 8 WESTERMARK, P., U. ENGSTROM, G. T. WESTERMARK, K. H. JOHNSON, J. PERMERTH and C. BETSHOLTZ. Islet amyloid polypeptide (IAPP) and pro-IAPP immunoreactivity in human islets of Langerhans. Diabetes Res Clin Pract 1989, 7, 219–226. 9 WESTERMARK, G. T., D. F. STEINER, S. GEBRE-MEDHIN, U. ENGSTROM and P. WESTERMARK. Pro islet amyloid polypeptide (ProIAPP) immunoreactivity
10
11
12
13
14
15
16
17
in the islets of Langerhans. Ups J Med Sci 2000, 105, 97–106. JAIKARAN, E. T., M. R. NILSSON and A. CLARK. Pancreatic beta-cell granule peptides form heteromolecular complexes which inhibit islet amyloid polypeptide fibril formation. Biochem J 2004, 377, 709–716. PARK, K. and C. B. VERCHERE. Identification of a heparin binding domain in the N-terminal cleavage site of pro-islet amyloid polypeptide. Implications for islet amyloid formation. J Biol Chem 2001, 276, 16611–16616. SLETTEN, K., P. WESTERMARK and J. B. NATVIG. Characterization of amyloid fibril proteins from medullary carcinoma of the thyroid. J Exp Med 1976, 143, 993–998. PUCCI, A., J. WHARTON, E. ARBUSTINI, M. GRASSO, M. DIEGOLI, P. NEEDLEMAN, M. VIGANO and J. M. POLAK. Atrial amyloid deposits in the failing human heart display both atrial and brain natriuretic peptide-like immunoreactivity. J Pathol 1991, 165, 235–241. GOEDERT, M. Tau protein and the neurofibrillary pathology of Alzheimer’s disease. Trends Neurosci 1993, 16, 460–465. ROBINSON, N. E. and A. B. ROBINSON. Molecular clocks. Proc Natl Acad Sci USA 2001, 98, 944–949. BAYNES, J. W. and S. R. THORPE. Glycoxidation and lipoxidation in atherogenesis. Free Rad Biol Med 2000, 28, 1708–1716. GLENNER, G. G., D. EIN, E. D. EANES, H. A. BLADEN, W. TERRY and D. L. PAGE. Creation of “amyloid” fibrils from Bence Jones proteins in vitro. Science 1971, 174, 712–714.
References 18 HAGGQVIST, B., J. NASLUND, K. SLETTEN, G. T. WESTERMARK, G. MUCCHIANO, L. O. TJERNBERG, C. NORDSTEDT, U. ENGSTROM, et al. Medin: an integral fragment of aortic smooth muscle cell-produced lactadherin forms the most common human amyloid. Proc Natl Acad Sci USA 1999, 96, 8669–8674. 19 KORVATSKA, E., H. HENRY, Y. MASHIMA, M. YAMADA, C. BACHMANN, F. L. MUNIER and D. F. SCHORDERET. Amyloid and nonamyloid forms of 5q31-linked corneal dystrophy resulting from kerato-epithelin mutations at Arg-124 are associated with abnormal turnover of the protein. J Biol Chem 2000, 275, 11465–11469. 20 BERSON, J. F., A. C. THEOS, D. C. HARPER, D. TENZA, G. RAPOSO and M. S. MARKS. Proprotein convertase cleavage liberates a fibrillogenic fragment of a resident glycoprotein to initiate melanosome biogenesis. J Cell Biol 2003, 161, 521–533. 21 CHAPMAN, M. R., L. S. ROBINSON, J. S. PINKNER, R. ROTH, J. HEUSER, M. HAMMAR, S. NORMARK and S. J. HULTGREN. Role of Escherichia coli curli operons in directing amyloid fiber formation. Science 2002, 295, 851–855. 22 CERINI, C., V. PEYROT, C. GARNIER, L. DUPLAN, S. VEESLER, J. P. LE CAER, J. P. BERNARD, H. BOUTEILLE, et al. Biophysical characterization of lithostathine. Evidences for a polymeric structure at physiological pH and a proteolysis mechanism leading to the formation of fibrils. J Biol Chem 1999, 274, 22266–22267. 23 DUPLAN, L., B. MICHEL, J. BOUCRAUT, S. BARTHELLEMY, S. DESPLAT-JEGO, V. MARIN, D. GAMBARELLI, D. BERNARD, et al. Lithostathine and pancreatitis-associated protein are involved in the very early stages of Alzheimer’s disease. Neurobiol Aging 2001, 22, 79–88. 24 JIN, C. X., S. NARUSE, M. KITAGAWA, H. ISHIGURO, T. KONDO, S. HAYAKAWA and T. HAYAKAWA. Pancreatic stone protein of pancreatic calculi in chronic calcified pancreatitis in man. JOP: J Pancreas (Online) 2002, 3, 54–61.
25 SUN, H. Q., M. YAMAMOTO, M. MEJILLANO and H. L. YIN. Gelsolin, a multifunctional actin regulatory protein. J Biol Chem 1999, 274, 33179–33182. 26 MAURY, C. P. Gelsolin-related amyloidosis. Identification of the amyloid protein in Finnish hereditary amyloidosis as a fragment of variant gelsolin. J Clin Invest 1991, 87, 1195–1199. 27 de LA CHAPELLE, A., R. TOLVANEN, G. BOYSEN, J. SANTAVY, L. BLEEKER-WAGEMAKERS, C. P. MAURY and J. KERE. Gelsolin-de-rived familial amyloidosis caused by asparagine or tyrosine substitution for as-5 Role of Post-translational Chemical Modifications in Amyloid Fibril Formation partic acid at residue 187. Nat Genet 1992., 2, 157–160. 28 RATNASWAMY, G., M. E. HUFF, A. I. SU, S. RION and J. W. KELLY. Destabilization of Ca2+ -free gelsolin may not be responsible for proteolysis in Familial Amyloidosis of Finnish Type. Proc Natl Acad Sci USA 2001, 98, 2334–2339. 29 CHEN, C. D., M. E. HUFF, J. MATTESON, L. PAGE, R. PHILLIPS, J. W. KELLY and W. E. BALCH. Furin initiates gelsolin familial amyloidosis in the Golgi through a defect in Ca2+ stabilization. EMBO J 2001, 20, 6277–6287. 30 KANGAS, H., T. PAUNIO, N. KALKKINEN, A. JALANKO and L. PELTONEN. In vitro expression analysis shows that the secretory form of gelsolin is the sole source of amyloid in gelsolin-related amyloidosis. Hum Mol Genet 1996, 5, 1237–1243. 31 MAURY, C. P. and H. ROSSI. Demonstration of a circulating 65K gelsolin variant specific for familial amyloidosis, Finnish type. Biochem Biophys Res Commun 1993., 191, 41–44. 32 MAURY, C. P., K. SLETTEN, N. TOTTY, H. KANGAS and M. LILJESTROM. Identification of the circulating amyloid precursor and other gelsolin metabolites in patients with G654A mutation in the gelsolin gene (Finnish familial amyloidosis): pathogenetic and diagnostic implications. Lab Invest 1997, 77, 299–304.
19
20 Post-translational Chemical Modifications in Amyloid Fibril Formation
33 GHISO, J., O. JENSSON and B. FRANGIONE. Amyloid fibrils in hereditary cerebral hemorrhage with amyloidosis of Icelandic type is a variant of gamma-trace basic protein (cystatin C). Proc Natl Acad Sci USA 1986, 83, 2974–2978. 34 OLAFSSON, I., G. GUDMUNDSSON, M. ABRAHAMSON, O. JENSSON and A. GRUBB. The amino terminal portion of cerebrospinal fluid cystatin C in hereditary cystatin C amyloid angiopathy is not truncated: direct sequence analysis from agarose gel electropherograms. Scand J Clin Lab Invest 1990, 50, 85–93. 35 STANIFORTH, R. A., S. GIANNINI, L. D. HIGGINS, M. J. CONROY, A. M. HOUNSLOW, R. JERALA, C. J. CRAVEN and J. P. WALTHO. Three-dimensional domain swapping in the folded and molten-globule states of cystatins, an amyloid-forming structural superfamily. EMBO J 2001, 20, 4774–4781. 36 NAGAI, A., S. KOBAYASHI, K. SHIMODE, K. IMAOKA, N. UMEGAE, S. FUJIHARA and M. NAKAMURA. No mutations in cystatin C gene in cerebral amyloid angiopathy with cystatin C deposition. Mol Chem Neuropathol 1998, 33, 63–78. 37 BERGSTROM, J., C. MURPHY, M. EULITZ, D. T. WEISS, G. T. WESTERMARK, A. SOLOMON and P. WESTERMARK. Codeposition of apolipoprotein A-IV and transthyretin in senile systemic (ATTR) amyloidosis. Biochem Biophys Res Commun 2001, 285, 903–908. 38 WESTERMARK, P., G. MUCCHIANO, T. MARTHIN, K. H. JOHNSON and K. SLETTEN. Apolipoprotein A1-derived amyloid in human aortic atherosclerotic plaques. Am J Pathol 1995, 147, 1186–1192. 39 MUCCHIANO, G. I., B. HAGGQVIST, K. SLETTEN and P. WESTERMARK. Apolipoprotein A-1-derived amyloid in atherosclerotic plaques of the human aorta. J Pathol 2001, 193, 27027–27035. 40 NICHOLS, W. C., F. E. DWULET, J. LIEPNIEKS and M. D. BENSON. Variant apolipoprotein AI as a major constituent of a human hereditary amyloid. Biochem Biophys Res Commun 1988, 156, 762–768. 41 HUSEBEKK, A., B. SKOGEN, G. HUSBY and G. MARHAUG. Transformation of
42
43
44
45
46
47
48
49
50
amyloid precursor SAA to protein AA and incorporation in amyloid fibrils in vivo. Scand J Immunol 1985, 21, 283–287. YAMADA, T., J. J. LIEPNIEKS, B. KLUVE-BECKERMAN and M. D. BENSON. Cathepsin B generates the most common form of amyloid A (76 residues) as a degradation product from serum amyloid A. Scand J Immunol 1995, 41, 94–97. HATTERS, D. M. and G. J. HOWLETT. The structural basis for amyloid formation by plasma apolipoproteins: a review. Eur Biophys J 2002, 31, 2–8. BAUMANN, M. H., T. WISNIEWSKI, E. LEVY, G. T. PLANT and J. GHISO. C-terminal fragments of alpha- and beta-tubulin form amyloid fibrils in vitro and associate with amyloid deposits of familial cerebral amyloid angiopathy, British type. Biochem Biophys Res Commun 1996, 219, 238–242. KIM, S. H., R. WANG, D. J. GORDON, J. BASS, D. F. STEINER, G. THINAKARAN, D. G. LYNN, S. C. MEREDITH, et al. Familial British dementia: expression and metabolism of BRI. Ann NY Acad Sci 2000, 920, 93–99. VIDAL, R., B. FRANGIONE, A. ROSTAGNO, S. MEAD, T. REVESZ, G. PLANT and J. GHISO. A stop-codon mutation in the BRI gene associated with familial British dementia. Nature 1999, 399, 776–781. SCHWAB, C., M. HOSOKAWA, H. AKIYAMA and P. L. MCGEER. Familial British dementia: colocalization of furin and ABri amyloid. Acta Neuropathol (Berl) 2003, 106, 278–284. KIM, S. H., J. W. CREEMERS, S. CHU, G. THINAKARAN and S. S. SISODIA. Proteolytic processing of familial British dementia-associated BRI variants: evidence for enhanced intracellular accumulation of amyloidogenic peptides. J Biol Chem 2002, 277, 1872–1877. BELLOTTI, V., M. STOPPINI, P. MANGIONE, M. SUNDE, C. ROBINSON, L. ASTI, D. BRANCACCIO and G. FERRI. Beta2-microglobulin can be refolded into a native state from ex vivo amyloid fibrils. Eur J Biochem 1998, 258, 61–67. LINKE, R. P., H. HAMPL, H. LOBECK, E. RITZ, J. BOMMER, R. WALDHERR and M.
References
51
52
53
54
55
56
57
58
59
EULITZ. Lysine-specific cleavage of beta 2-microglobulin in amyloid deposits associated with hemodialysis. Kidney Int 1989, 36, 675–681. ESPOSITO, G., R. MICHELUTTI, G. VERDONE, P. VIGLINO, H. HERNANDEZ, C. V. ROBINSON, A. AMORESANO, F. DAL PIAZ, et al. Removal of the N-terminal hexapeptide from human beta2-microglobulin facilitates protein aggregation and fibril formation. Protein Sci 2000, 9, 831–845. SARAIVA, M. J. Transthyretin mutations in health and disease. Hum Mutat 1995, 5, 191–196. GUSTAVSSON, A., H. JAHR, R. TOBIASSEN, D. R. JACOBSON, K. SLETTEN and P. WESTERMARK. Amyloid fibril composition and transthyretin gene structure in senile systemic amyloidosis. Lab Invest 1995, 73, 703–708. GUSTAVSSON, A., U. ENGSTROM and P. WESTERMARK. Normal transthyretin and synthetic transthyretin fragments form amyloid-like fibrils in vitro. Biochem Biophys Res Commun 1991, 175, 1159–1164. MACPHEE, C. E. and CM. DOBSON. Chemical dissection and reassembly of amyloid fibrils formed by a peptide fragment of transthyretin. J Mol Biol 2000, 297, 1203–1215. KELLY, J. and P. L. LANSBURY Jr. A chemical approach to elucidate the mechanism of transthyretin and β-protein amyloid fibril formation. Amyloid 1994, 1, 186–205. LUCKING, C. B. and A. BRICE. Alpha-synuclein and Parkinson’s disease. Cell Mol Life Sci 2000, 57, 1894–1908. CONWAY, K. A., J. D. HARPER and P. T. LANSBURY, Jr. Fibrils formed in vitro from alpha-synuclein and two mutant forms linked to Parkinson’s disease are typical amyloid. Biochemistry 2000, 39, 2552–2563. CULVENOR, J. G., C. A. MCLEAN, S. CUTT, B. C. CAMPBELL, F. MAHER, P. JAKALA, T. HARTMANN, K. BEYREUTHER, et al. Non-Abeta component of Alzheimer’s disease amyloid (NAC) revisited. NAC and alpha-synuclein are not associated
60
61
62
63
64
65
66
67
with Abeta amyloid. Am J Pathol 1999, 155, 1173–1181. BABA, M., S. NAKAJO, P. H. TU, T. TOMITA, K. NAKAYA, V. M. LEE, J. Q. TROJANOWSKI and T. IWATSUBO. Aggregation of alpha-synuclein in Lewy bodies of sporadic Parkinson’s disease and dementia with Lewy bodies. Am J Pathol 1998, 152, 879–884. UEDA, K., H. FUKUSHIMA, E. MASLIAH, Y. XIA, A. IWAI, M. YOSHIMOTO, D. A. OTERO, J. KONDO, et al. Molecular cloning of cDNA encoding an unrecognized component of amyloid in Alzheimer disease. Proc Natl Acad Sci USA 1993, 90, 11282–11286. HAASS, C. and D. J. SELKOE. Cellular processing of beta-amyloid precursor protein and the genesis of amyloid beta-peptide. Cell 1993, 75, 1039–1042. TJERNBERG, L. O., J. NASLUND, J. THYBERG, S. E. GANDY, L. TERENIUS and C. NORDSTEDT. Generation of Alzheimer amyloid beta peptide through nonspecific pro-teolysis. J Biol Chem 1997, 272, 1870–1875. LALOWSKI, M., A. GOLABEK, C. A. LEMERE, D. J. SELKOE, H. M. WISNIEWSKI, R. C. BEAVIS, B. FRANGIONE and T. WISNIEWSKI. The “nonamyloidogenic” p3fragment (amyloid beta17–42) is a major constituent of Down’s syndrome cerebellar pre-amyloid. J Biol Chem 1996, 271, 33623–33631. SAIDO, T. C., W. YAMAO-HARIGAYA, T. IWATSUBO and S. KAWASHIMA. Aminoand carboxyl-terminal heterogeneity of beta-amyloid peptides deposited in human brain. Neurosci Lett 1996, 215, 173–176. LEMERE, C. A., J. K. BLUSZTAJN, H. YAMAGUCHI, T. WISNIEWSKI, T. C. SAIDO and D. J. SELKOE. Sequence of deposition of heterogeneous amyloid beta-peptides and APO E in Down syndrome: implications for initial events in amyloid plaque formation. Neurobiol Dis 1996, 3, 16–32. JARRETT, J. T., E. P. BERGER and P. T. LANSBURY, Jr. The carboxy terminus of the beta amyloid protein is critical for the seeding of amyloid formation: implications for the pathogenesis of
21
22 Post-translational Chemical Modifications in Amyloid Fibril Formation
68
69
70
71
72
73
74
75
Alzheimer’s disease. Biochemistry 1993, 32, 4693–4697. CHEN, S. G., D. B. TEPLOW, P. PARCHI, J. K. TELLER, P. GAMBETTI and L. AUTILIO-GAMBETTI. Truncated forms of the human prion protein in normal brain and in prion diseases. J Biol Chem 1995, 270, 19173–19180. PAN, T., R. LI, B. S. WONG, T. LIU, P. GAMBETTI and M. S. SY. Heterogeneity of normal prion protein in two-dimensional immunoblot: presence of various glycosylated and truncated forms. J Neurochem 2002, 81, 1092–1101. DIFIGLIA, M., E. SAPP, K. O. CHASE, S. W. DAVIES, G. P. BATES, J. P. VONSATTEL and N. ARONIN. Aggregation of huntingtin in neuronal intranuclear inclusions and dystrophic neurites in brain. Science 1997, 277, 1990–1993. MCGOWAN, D. P., W. van ROON-MOM, H. HOLLOWAY, G. P. BATES, L. MANGIARINI, G. J. COOPER, R. L. FAULL and R. G. SNELL. Amyloid-like inclusions in Huntington’s disease. Neuroscience 2000, 100, 677–680. HUANG, C. C., P. W. FABER, F. PERSICHETTI, V. MITTAL, J. P. VONSATTEL, M. E. MACDONALD and J. F. GUSELLA. Amyloid formation by mutant huntingtin: threshold, progressivity and recruitment of normal polyglutamine proteins. Somat Cell Mol Genet 1998, 24, 217–233. SCHERZINGER, E., A. SITTLER, K. SCHWEIGER, V. HEISER, R. LURZ, R. HASENBANK, G. P. BATES, H. LEHRACH, et al. Self-assembly of polyglutamine-containing huntingtin fragments into amyloid-like fibrils: implications for Huntington’s disease pathology. Proc Natl Acad Sci USA 1999, 96, 4604–4609. TARLAC, V. and E. STOREY. Role of proteolysis in polyglutamine disorders. J Neu-rosci Res 2003, 74, 406–416. CRIBBS, D. H., B. Y. AZIZEH, C. W. COTMAN and F. M. LAFERLA. Fibril formation and neurotoxicity by a herpes simplex virus glycoprotein B fragment with homology to the Alzheimer’s A beta peptide. Biochemistry 2000, 39, 5988–5994.
76 LIN, W. R., D. SHANG, G. K. WILCOCK and R. F. ITZHAKI. Alzheimer’s disease, herpes simplex virus type 1, cold sores and apolipoprotein E4. Biochem Soc Trans 1995, 23, 594S. 77 JAMIESON, G. A., N. J. MAITLAND, G. K. WILCOCK, C. M. YATES and R. F. ITZHAKI. Herpes simplex virus type 1 DNA is present in specific regions of brain from aged people with and without senile dementia of the Alzheimer type. J Pathol 1992, 167, 365–368. 78 ROBINSON, A. B. and C. J. RUDD. Deamidation of glutaminyl and asparaginyl residues in peptides and proteins. Curr Topics Cell Reg 1974, 8, 247–295. 79 BISCHOFF, R. and H. V. KOLBE. Deamidation of asparagine and glutamine residues in proteins and peptides: structural determinants and analytical methodology. J Chromatogr B Biomed Appl 1994, 662, 261–278. 80 WRIGHT, H. T. Sequence and structure determinants of the nonenzymatic deamidation of asparagine and glutamine residues in proteins. Protein Eng 1991, 4, 283–294. 81 WRIGHT, H. T. Nonenzymatic deamidation of asparaginyl and glutaminyl residues in proteins. Crit Rev Biochem Mol Biol 1991, 26, 1–52. 82 WINDISCH, V., F. DELUCCIA, L. DUHAU, F. HERMAN, J. J. MENCEL, S. Y. TANG and M. VUILHORGNE. Degradation pathways of salmon calcitonin in aqueous solution. J Pharm Sci 1997, 86, 359–364. 83 HEKMAN, C. M., W. S. DEMOND, P. J. KELLEY, S. F. MAUCH and J. D. WILLIAMS. Isolation and identification of cyclic imide and deamidation products in heat stressed pramlintide injection drug product. J Pharm Biomed Anal 1999, 20, 763–772. 84 BRANGE, J., L. LANGKJAER, S. HAVELUND and A. VOLUND. Chemical stability of insulin. 1. Hydrolytic degradation during storage of pharmaceutical preparations. Pharm Res 1992, 9, 715–726. 85 MICHELITSCH, M. D. and J. S. WEISSMAN. A census of glutamine/asparagine-rich regions: implications for their conserved function and the prediction of novel
References
86
87
88
89
90
91
92
93
94
95
prions. Proc Natl Acad Sci USA 2000, 97, 11910–11915. SHIMIZU, T., A. WATANABE, M. OGAWARA, H. MORI and T. SHIRASAWA. Isoaspartate formation and neurodegeneration in Alzheimer’s disease. Arch Biochem Biophys 2000, 381, 225–234. LOWENSON, J. D. and S. CLARKE. Recognition of D-aspartyl residues in polypeptides by the erythrocyte L-isoaspartyl/D-aspartyl protein methyltransferase. Implications for the repair hypothesis. J Biol Chem 1992, 267, 5985–5995. ORRU, S., L. VITAGLIANO, L. ESPOSITO, L. MAZZARELLA, G. MARINO and M. RUOPPOLO. Effect of deamidation on folding of ribonuclease A. Protein Sci 2000, 9, 2577–2582. KATO, A., S. TANIMOTO, Y. MURAKI, K. KOBAYASHI and I. KUMAGAI. Structural and functional properties of hen egg-white lysozyme deamidated by protein engineering. Biosci Biotechnol Biochem 1992, 56, 1424–1428. CHITI, F., M. CALAMAI, N. TADDEI, M. STEFANI, G. RAMPONI and C. M. DOBSON. Studies of the aggregation of mutant proteins in vitro provide insights into the genetics of amyloid diseases. Proc Natl Acad Sci USA 2002, 99 (Suppl 4), 16419–16426. HALLAK, M. E. and G. BONGIOVANNI. Posttranslational arginylation of brain proteins. Neurochem Res 1997, 22, 467–473. LA SPADA, A. R., E. M. WILSON, D. B. LUBAHN, A. E. HARDING and K. H. FISCHBECK. Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature 1991, 352, 77–79. PERUTZ, M. F. Glutamine repeats and neurodegenerative diseases: molecular aspects. Trends Biochem Sci 1999, 24, 58–63. CHEN, S., V. BERTHELIER, J. B. HAMILTON, B. O’NUALLAIN and R. WETZEL. Amyloidlike features of polyglutamine aggregates and their assembly kinetics. Biochemistry 2002, 41, 7391–7399. VIOLANTE, V., A. LUONGO, I. PEPE, S. ANNUNZIATA and V. GENTILE.
96
97
98
99
100
101
102
103
Transglutaminase-dependent formation of protein aggregates as possible biochemical mechanism for polyglutamine diseases. Brain Res Bull 2001, 56, 169–172. SANDMEIER, E., P. HUNZIKER, B. KUNZ, R. SACK and P. CHRISTEN. Spontaneous deamidation and isomerization of Asn108 in prion peptide 106–126 and in full-length prion protein. Biochem Biophys Res Commun 1999, 261, 578–583. QIN, K., D. S. YANG, Y. YANG, M. A. CHISHTI, L. J. MENG, H. A. KRETZSCHMAR, C. M. YIP, P. E. FRASER, et al. Copper(II)-induced conformational changes and protease resistance in recombinant and cellular PrP. Effect of protein age and deamidation. J Biol Chem 2000, 275, 19121–19131. MIYATA, T., Y. IIDA, Y. UEDA, T. SHINZATO, H. SEO, V. M. MONNIER, K. MAEDA and Y. WADA. Monocyte/macrophage response to beta 2-microglobulin modified with advanced glycation end products. Kidney Int 1996, 49, 538–550. ODANI, H., R. OYAMA, K. TITANI, H. OGAWA and A. SAITO. Purification and complete amino acid sequence of novel beta 2-microglobulin. Biochem Biophys Res Commun 1990, 168, 1223–1229. KAD, N. M., N. H. THOMSON, D. P. SMITH, D. A. SMITH and S. E. RADFORD. Beta2-microglobulin and its deamidated variant, N17D form amyloid fibrils with a range of morphologies in vitro. J Mol Biol 2001, 313, 559–571. WATANABE, A., K. TAKIO and Y. IHARA. Deamidation and isoaspartate formation in smeared tau in paired helical filaments. Unusual properties of the microtubule-binding domain of tau. J Biol Chem 1999, 274, 7368–7378. SANDILANDS, A., A. M. HUTCHESON, H. A. LONG, A. R. PRESCOTT, G. VRENSEN, J. LOSTER, N. KLOPP, R. B. LUTZ, et al. Altered aggregation properties of mutant gamma-crystallins cause inherited cataract. EMBO J 2002, 21, 6005–6014. MEEHAN, S., Y. BERRY, B. LUISI, C. M. DOBSON, J. A. CARVER and C. E. MAC-PHEE. Amyloid fibril formation by lens crystallin proteins and its
23
24 Post-translational Chemical Modifications in Amyloid Fibril Formation
104
105
106
107
108
109
110
111
112
implications for cataract formation. J Biol Chem 2004, 279, 3413–3419. TAKEMOTO, L., T. EMMONS and D. GRANSTROM. The sequences of two peptides from cataract lenses suggest they arise by deamidation. Curr Eye Res 1990, 9, 793–797. TAKEMOTO, L. and D. BOYLE. Increased deamidation of asparagine during human senile cataractogenesis. Mol Vis 2000, 6, 164–168. MORI, H., K. TAKIO, M. OGAWARA and D. J. SELKOE. Mass spectrometry of purified amyloid beta protein in Alzheimer’s disease. J Biol Chem 1992, 267, 17082–17086. HARIGAYA, Y., T. C. SAIDO, C. B. ECKMAN, C. M. PRADA, M. SHOJI and S. G. YOUNKIN. Amyloid beta protein starting pyroglutamate at position 3 is a major component of the amyloid deposits in the Alzheimer’s disease brain. Biochem Biophys Res Commun 2000, 276, 422–427. KUO, Y. M., M. R. EMMERLING, A. S. WOODS, R. J. COTTER and A. E. ROHER. Isolation, chemical characterization, and quantitation of A beta 3-pyroglutamyl peptide from neuritic plaques and vascular amyloid deposits. Biochem Biophys Res Commun 1997, 237, 188–191. SAIDO, T. C., T. IWATSUBO, D. M. MANN, H. SHIMADA, Y. IHARA and S. KAWASHIMA. Dominant and differential deposition of distinct beta-amyloid peptide species, A beta N3(pE), in senile plaques. Neuron 1995, 14, 457–466. HE, W. and C. J. BARROW. The A beta 3-pyroglutamyl and 11-pyroglutamyl peptides found in senile plaque have greater beta-sheet forming and aggregation propensities in vitro than full-length A beta. Biochemistry 1999, 38, 10871–10877. SZENDREI, G. I., H. FABIAN, H. H. MANTSCH, S. LOVAS, O. NYEKI, I. SCHON and L. OTVOS, Jr. Aspartate-bond isomerization affects the major conformations of synthetic peptides. Eur J Biochem 1994, 226, 917–924. ROHER, A. E., J. D. LOWENSON, S. CLARKE, C. WOLKOW, R. WANG, R. J. COTTER, I. M. REARDON, H. A. ZURCHER-NEELY, et al. Structural alterations in the peptide
113
114
115
116
117
118
119 120
121
backbone of beta-amyloid core protein may account for its deposition and stability in Alzheimer’s disease. J Biol Chem 1993, 268, 3072–3083. SHIMIZU, T., H. FUKUDA, S. MURAYAMA, N. IZUMIYAMA and T. SHIRASAWA. Isoas-partate formation at position 23 of amyloid beta peptide enhanced fibril formation and deposited onto senile plaques and vascular amyloids in Alzheimer’s disease. J Neurosci Res 2002, 70, 451–461. SHIN, Y., H. S. CHO, H. FUKUMOTO, T. SHIMIZU, T. SHIRASAWA, S. M. GREENBERG and G. W. REBECK. Abeta species, including IsoAsp23 Abeta, in Iowa-type familial cerebral amyloid angiopathy. Acta Neuropathol (Berl) 2003, 105, 252–258. FABIAN, H., G. I. SZENDREI, H. H. MANTSCH, B. D. GREENBERG and L. OTVOS, Jr. Synthetic post-translationally modified human A beta peptide exhibits a markedly increased tendency to form beta-pleated sheets in vitro. Eur J Biochem 1994, 221, 959–964. ORPISZEWSKI, J. and M. D. BENSON. Induction of beta-sheet structure in amy-loidogenic peptides by neutralization of aspartate: a model for amyloid nuclea-tion. J Mol Biol 1999, 289, 413–428. ORPISZEWSKI, J., N. SCHORMANN, B. KLUVE-BECKERMAN, J. J. LIEPNIEKS and M. D. BENSON. Protein aging hypothesis of Alzheimer disease. FASEB J 2000, 14, 1255–1263. BADA, J. L. In vivo racemization in mammalian proteins. Methods Enzymol 1984., 106, 98–115. BROWN, R. Amino acid dating. Origins 1985., 12, 8–25. De MIRANDA, J., A. SANTORO, S. ENGELENDER and H. WOLOSKER. Human serine racemase: molecular cloning, genomic organization and functional analysis. Gene 2000, 256, 183–188. JANEK, K., J. BEHLKE, J. ZIPPER, H. FABIAN, Y. GEORGALIS, M. BEYERMANN, M. BIENERT and E. KRAUSE. Water-soluble beta-sheet models which self-assemble into fibrillar structures. Biochemistry 1999, 38, 8246–8252.
References 122 SHAPIRA, R., G. E. AUSTIN and S. S. MIRRA. Neuritic plaque amyloid in Alzheimer’s disease is highly racemized. J Neurochem 1988, 50, 69–74. 123 ZAGORSKI, M. G., J. YANG, H. SHAO, K. MA, H. ZENG and A. HONG. Methodological and chemical factors affecting amyloid beta peptide amyloidogenicity. Methods Enzymol 1999, 309, 189–204. 124 KANEKO, I., K. MORIMOTO and T. KUBO. Drastic neuronal loss in vivo by beta-amyloid racemized at Ser(26) residue: conversion of non-toxic [D-Ser(26)]beta-amyloid 1–40 to toxic and proteinase-resistant fragments. Neuroscience 2001, 104, 1003–1011. 125 KUBO, T., Y. KUMAGAE, C. A. MILLER and I. KANEKO. Beta-amyloid racemized at the Ser26 residue in the brains of patients with Alzheimer disease: implications in the pathogenesis of Alzheimer disease. J Neuropathol Exp Neurol 2003, 62, 248–259. 126 LOWENSON, J. D. and S. CLARKE. Structural elements affecting the recognition of L-isoaspartyl residues by the L-isoas-partyl/D-aspartyl protein methyltransferase. Implications for the repair hypothesis. J Biol Chem 1991, 266, 19396–19406. 127 JOHNSON, B. A., J. M. SHIROKAWA, J. W. GEDDES, B. H. CHOI, R. C. KIM and D. W. ASWAD. Protein L-isoaspartyl methyl-transferase in postmortem brains of aged humans. Neurobiol Aging 1991, 12, 19–24. 128 YAMAMOTO, A., H. TAKAGI, D. KITAMURA, H. TATSUOKA, H. NAKANO, H. KAWANO, H. KUROYANAGI, Y. YAHAGI, et al. Deficiency in protein L-isoaspartyl methyl-transferase results in a fatal progressive epilepsy. J Neurosci 1998, 18, 2063–2074. 129 LANTHIER, J., A. BOUTHILLIER, M. LAPOINTE, M. DEMEULE, R. BELIVEAU and R. R. DESROSIERS. Down-regulation of protein L-isoaspartyl methyltransferase in human epileptic hippocampus contributes to generation of damaged tubulin. J Neurochem 2002, 83, 581–591. 130 SMITH, M. A., L. M. SAYRE, V. M. MONNIER and G. PERRY. Radical AGEing
131
132
133
134
135
136
137
138
in Alzheimer’s disease. Trends Neurosci 1995, 18, 172–176. ZHANG, Q., E. T. POWERS, J. NIEVA, M. E. HUFF, M. A. DENDLE, J. BIESCHKE, C. G. GLABE, A. ESCHENMOSER, et al. Metabolite-initiated protein misfolding may trigger Alzheimer’s disease. Proc Natl Acad Sci USA 2004, 101, 4752–4757. FEUGHELMAN, M. and B. K. WILLIS. Thiol-disulfide interchange a potential key to conformational change associated with amyloid fibril formation. J Theor Biol 2000, 206, 313–315. TAKEMOTO, L. J. Oxidation of cysteine residues from alpha-A crystallin during cataractogenesis of the human lens. Biochem Biophys Res Commun 1996, 223, 216–220. UEMICHI, T., J. R. MURRELL, S. ZELDENRUST and M. D. BENSON. A new mutant transthyretin (Arg 10) associated with familial amyloid polyneuropathy. J Med Genet 1992, 29, 888–891. MCCUTCHEN, S. L. and J. W. KELLY. Intermolecular disulfide linkages are not required for transthyretin amyloid fibril formation in vitro. Biochem Biophys Res Commun 1993, 197, 415–421. HEAFIELD, M. T., S. FEARN, G. B. STEVENTON, R. H. WARING, A. C. WILLIAMS and S. G. STURMAN. Plasma cysteine and sulphate levels in patients with motor neurone, Parkinson’s and Alzheimer’s disease. Neurosci Lett 1990, 110, 216–220. PERRY, T. L., M. G. NORMAN, V. W. YONG, S. WHITING, J. U. CRICHTON, S. HANSEN and S. J. KISH. Hallervorden-Spatz disease: cysteine accumulation and cysteine dioxygenase deficiency in the globus pallidus. Ann Neurol 1985, 18, 482–489. LIM, A., J. WALLY, M. T. WALSH, M. SKINNER and C. E. COSTELLO. Identification and location of a cysteinyl posttranslational modification in an amyloidogenic kappa1 light chain protein by electrospray ionization and matrix-assisted laser desorption/ionization mass spectrometry. Anal Biochem 2001, 295, 45–56.
25
26 Post-translational Chemical Modifications in Amyloid Fibril Formation
139 LIM, A., T. PROKAEVA, M. E. MCCOMB, L. H. CONNORS, M. SKINNER and C. E. COSTELLO. Identification of S-sulfonation and S-thiolation of a novel transthyretin Phe33Cys variant from a patient diagnosed with familial transthyretin amyloidosis. Protein Sci 2003, 12, 1775–1785. 140 ZHANG, Q. and J. W. KELLY. Cys10 mixed disulfides make transthyretin more amyloidogenic under mildly acidic conditions. Biochemistry 2003, 42, 8756–8761. 141 BELLOTTI, V., P. MANGIONE and M. STOPPINI. Biological activity and pathological implications of misfolded proteins. Cell Mol Life Sci 1999, 55, 977–991. 142 STAHL, N., M. A. BALDWIN, D. B. TEPLOW, L. HOOD, B. W. GIBSON, A. L. BURLINGAME and S. B. PRUSINER. Structural studies of the scrapie prion protein using mass spectrometry and amino acid sequencing. Biochemistry 1993, 32, 1991–2002. 143 NASLUND, J., A. SCHIERHORN, U. HELLSMAN, L. LANNFELT, A. D. ROSES, L. O. TJERNBERG, J. SILBERRING, S. E. GANDY, et al. Relative abundance of Alzheimer A beta amyloid peptide variants in Alzheimer disease and normal aging. Proc Natl Acad Sci USA 1994, 91, 8378–8382. 144 KUO, Y. M., T. A. KOKJOHN, T. G. BEACH, L. I. SUE, D. BRUNE, J. C. LOPEZ, W. M. KALBACK, D. ABRAMOWSKI, et al. Comparative analysis of amyloid-beta chemical structure and amyloid plaque morphology of transgenic mouse and Alzheimer’s disease brains. J Biol Chem 2001, 276, 12991–12998. 145 SNYDER, S. W., U. S. LADROR, W. S. WADE, G. T. WANG, L. W. BARRETT, E. D. MATAYOSHI, H. J. HUFFAKER, G. A. KRAFFT, et al. Amyloid-beta aggregation: selective inhibition of aggregation in mixtures of amyloid with different chain lengths. Biophys J 1994, 67, 1216–1228. 146 VARADARAJAN, S., J. KANSKI, M. AKSENOVA, C. LAUDERBACK and D. A. BUTTERFIELD. Different mechanisms of oxidative stress and neurotoxicity for Alzheimer’s A beta(1–42) and A
147
148
149
150
151
152
153
154
155
beta(25–35). J Am Chem Soc 2001, 123, 5625–5631. PALMBLAD, M., A. WESTLIND-DANIELSSON and J. BERGQUIST. Oxidation of methionine 35 attenuates formation of amyloid beta-peptide 1–40 oligomers. J Biol Chem 2002, 277, 19506–19510. HOU, L., I. KANG, R. E. MARCHANT and M. G. ZAGORSKI. Methionine 35 oxidation reduces fibril assembly of the amyloid abeta-(1–42) peptide of Alzheimer’s disease. J Biol Chem 2002, 277, 40173–40176. HOKENSON, M. J., V. N. UVERSKY, J. GOERS, G. YAMIN, L. A. MUNISHKINA and A. L. FINK. Role of individual methionines in the fibrillation of methionineoxidized alpha-synuclein. Biochemistry 2004, 43, 4621–4633. YAMIN, G., C. B. GLASER, V. N. UVERSKY and A. L. FINK. Certain metals trigger fibrillation of methionine-oxidized alpha-synuclein. J Biol Chem 2003, 278, 27630–27635. UVERSKY, V. N., G. YAMIN, P. O. SOUILLAC, J. GOERS, C. B. GLASER and A. L. FINK. Methionine oxidation inhibits fibrillation of human alpha-synuclein in vitro. FEBS Lett 2002, 517, 239–244. GIASSON, B. I., J. E. DUDA, I. V. MURRAY, Q. CHEN, J. M. SOUZA, H. I. HURTIG, H. ISCHIROPOULOS, J. Q. TROJANOWSKI, et al. Oxidative damage linked to neurode-generation by selective alpha-synuclein nitration in synucleinopathy lesions. Science 2000, 290, 985–989. HORIGUCHI, T., K. URYU, B. I. GIASSON, H. ISCHIROPOULOS, R. LIGHTFOOT, C. BELLMANN, C. RICHTER-LANDSBERG, V. M. LEE, et al. Nitration of tau protein is linked to neurodegeneration in tauopathies. Am J Pathol 2003, 163, 1021–1031. DOMIGAN, N. M., T. S. CHARLTON, M. W. DUNCAN, C. C. WINTERBOURN and A. J. KETTLE. Chlorination of tyrosyl residues in peptides by myeloperoxidase and human neutrophils. J Biol Chem 1995, 270, 16542–16548. OLSZOWSKI, S., E. OLSZOWSKA, T. STELMASZYNSKA, A. KRAWCZYK, J. MARCINKIEWICZ and N. BACZEK.
References
156
157
158
159
160
161
162
163
164
Oxidative modification of ovalbumin. Acta Biochim Pol 1996, 43, 661–672. HAZEN, S. L. and J. W. HEINECKE. 3-Chlorotyrosine, a specific marker of myeloperoxidase-catalyzed oxidation, is markedly elevated in low density lipoprotein isolated from human atherosclerotic intima. J Clin Invest 1997, 99, 2075–2081. ATWOOD, C. S., G. PERRY, H. ZENG, Y. KATO, W. D. JONES, K. Q. LING, X. HUANG, R. D. MOIR, et al. Copper mediates dityrosine cross-linking of Alzheimer’s amyloid-beta. Biochemistry 2004, 43, 560–568. BUCALA, R. and A. CERAMI. Advanced glycosylation: chemistry, biology and implications for diabetes and aging. Adv Pharmacol 1992, 23, 1–34. HUNT, J. V. and S. P. WOLFF. Oxidative glycation and free radical production: a causal mechanism of diabetic complications. Free Rad Res Commun 1991, 12/13, 115–123. NYHLIN, N., Y. ANDO, R. NAGAI, O. SUHR, M. EL SAHLY, H. TERAZAKI, T. YAMASHITA, M. ANDO, et al. Advanced glycation end product in familial amyloidotic polyneuropathy (FAP). J Intern Med 2000, 247, 485–492. KAMALVAND, G. and Z. ALI-KHAN. Immunolocalization of lipid peroxidation/advanced glycation end products in amyloid A amyloidosis. Free Rad Biol Med 2004, 36, 657–664. MA, Z., P. WESTERMARK and G. T. WESTERMARK. Amyloid in human islets of Langerhans: immunologic evidence that islet amyloid polypeptide is modified in amyloidogenesis. Pancreas 2000, 21, 212–218. SASAKI, N., R. FUKATSU, K. TSUZUKI, Y. HAYASHI, T. YOSHIDA, N. FUJII, T. KOIKE, I. WAKAYAMA, et al. Advanced glycation end products in Alzheimer’s disease and other neurodegenerative diseases. Am J Pathol 1998, 153, 1149–1155. MIYATA, T., O. ODA, R. INAGI, Y. IIDA, N. ARAKI, N. YAMADA, S. HORIUCHI, N. TANIGUCHI, et al. beta 2-Microglobulin modified with advanced glycation end products is a major component of
165
166
167
168
169
170
171
172
173 174
hemodialysis-associated amyloidosis. J Clin Invest 1993, 92, 1243–1252. LEDESMA, M. D., P. BONAY and J. AVILA. Tau protein from Alzheimer’s disease patients is glycated at its tubulin-binding domain. J Neurochem 1995, 65, 1658–1664. MUNCH, G., A. M. CUNNINGHAM, P. RIEDERER and E. BRAAK. Advanced glycation endproducts are associated with Hirano bodies in Alzheimer’s disease. Brain Res 1998, 796, 307–310. VITEK, M. P., K. BHATTACHARYA, J. M. GLENDENING, E. STOPA, H. VLASSARA, R. BUCALA, K. MANOGUE and A. CERAMI. Advanced glycation end products contribute to amyloidosis in Alzheimer disease. Proc Natl Acad Sci USA 1994, 91, 4766–4770. KAPURNIOTU, A., J. BERNHAGEN, N. GREENFIELD, Y. AL-ABED, S. TEICHBERG, R. W. FRANK, W. VOELTER and R. BUCALA. Contribution of advanced glycosylation to the amyloidogenicity of islet amyloid polypeptide. Eur J Biochem 1998, 251, 208–216. ASKANAS, V., W. K. ENGEL, M. BILAK, R. B. ALVAREZ and D. J. SELKOE. Twisted tubulofilaments of inclusion body myositis muscle resemble paired helical filaments of Alzheimer brain and contain hyperphosphorylated tau. Am J Pathol 1994, 144, 177–187. CROWTHER, R. A., O. F. OLESEN, M. J. SMITH, R. JAKES and M. GOEDERT. Assembly of Alzheimer-like filaments from full-length tau protein. FEBS Lett 1994, 337, 135–138. SMITH, M. A., R. H. NAGARAJ and G. PERRY. Protocol for quantitative analysis of paired helical filament solubilization: a method applicable to insoluble amyloids and inclusion bodies. Brain Res Brain Res Protoc 1997, 1, 247–252. KELLY, J. W. Mechanisms of amyloidogenesis. Nat Struct Biol 2000, 7, 824–826. GRIFFITH, J. S. Self-replication and scrapie. Nature 1967, 215, 1043–1044. SERIO, T. R., A. G. CASHIKAR, A. S. KOWAL, G. J. SAWICKI, J. J. MOSLEHI, L. SERPELL, M. F. ARNSDORF and S. L. LINDQUIST. Nucleated conformational
27
28 Post-translational Chemical Modifications in Amyloid Fibril Formation
175
176
177
178
179
180
181
182
conversion and the replication of conformational information by a prion determinant. Science 2000., 289, 1317–1321. ESLER, W. P., E. R. STIMSON, J. M. JENNINGS, H. V. VINTERS, J. R. GHILARDI, J. P. LEE, P. W. MANTYH and J. E. MAGGIO. Alzheimer’s disease amyloid propagation by a template-dependent dock-lock mechanism. Biochemistry 2000, 39, 6288–6295. HARPER, J. D. and P. T. LANSBURY, Jr. Models of amyloid seeding in Alzheimer’s disease and scrapie: mechanistic truths and physiological consequences of the time-dependent solubility of amyloid proteins. Annu Rev Biochem 1997, 66, 385–407. WETZEL, R. For protein misassembly, it’s the “I” decade. Cell 1996, 86, 699–702. TAKANO, K., J. FUNAHASHI and K. YUTANI. The stability and folding process of amyloidogenic mutant human lysozymes. Eur J Biochem 2001, 268, 155–159. CHIEN, P. and J. S. WEISSMAN. Conformational diversity in a yeast prion dictates its seeding specificity. Nature 2001., 410, 223–227. VON BERGEN, M., P. FRIEDHOFF, J. BIERNAT, J. HEBERLE, E. M. MANDELKOW and E. MANDELKOW. Assembly of tau protein into Alzheimer paired helical filaments depends on a local sequence motif (306 VQIVYK311 ) forming beta structure. Proc Natl Acad Sci USA 2000, 97, 5129–5134. KREBS, M. R., D. K. WILKINS, E. W. CHUNG, M. C. PITKEATHLY, A. K. CHAMBERLAIN, J. ZURDO, C. V. ROBINSON and C. M. DOBSON. Formation and seeding of amyloid fibrils from wild-type hen lysozyme and a peptide fragment from the beta-domain. J Mol Biol 2000, 300, 541–549. KOZHUKH, G. V., Y. HAGIHARA, T. KAWAKAMI, K. HASEGAWA, H. NAIKI and Y. GOTO. Investigation of a peptide responsible for amyloid fibril formation of beta 2-microglobulin by achromobacter protease I. J Biol Chem 2002, 277, 1310–1315.
183 CHEN, S., V. BERTHELIER, W. YANG and R. WETZEL. Polyglutamine aggregation behavior in vitro supports a recruitment mechanism of cytotoxicity. J Mol Biol 2001, 311, 173–182. 184 WOOD, S. J., J. WYPYCH, S. STEAVENSON, J. C. LOUIS, M. CITRON and A. L. BIERE. Alpha-synuclein fibrillogenesis is nucleation-dependent. Implications for the pathogenesis of Parkinson’s disease. J Biol Chem 1999, 274, 19509–19512. 185 MOROZOVA-ROCHE, L. A., J. ZURDO, A. SPENCER, W. NOPPE, V. RECEVEUR, D. B. ARCHER, M. JONIAU and C. M. DOBSON. Amyloid fibril formation and seeding by wild-type human lysozyme and its disease-related mutational variants. J Struct Biol 2000, 130, 339–351. 186 RAMIREZ-ALVARADO, M. and L. REGAN. Does the location of a mutation determine the ability to form amyloid fibrils? J Mol Biol 2002, 323, 17–22. 187 O’NUALLAIN, B., A. D. WILLIAMS, P. WESXTERMARK and R. WETZEL. Seeding specificity in amyloid growth induced by heterologous fibrils. J Biol Chem 2004, 279, 17490–17499. 188 GIASSON, B. I., M. S. FORMAN, M. HIGUCHI, L. I. GOLBE, C. L. GRAVES, P. T. KOTZBAUER, J. Q. TROJANOWSKI and V. M. LEE. Initiation and synergistic fibrillization of tau and alpha-synuclein. Science 2003, 300, 636–640. 189 ALIM, M. A., M. S. HOSSAIN, K. ARIMA, K. TAKEDA, Y. IZUMIYAMA, M. NAKAMURA, H. KAJI, T. SHINODA, et al. Tubulin seeds alpha-synuclein fibril formation. J Biol Chem 2002, 277, 2112–2117. 190 HAN, H., P. H. WEINREB and P. T. LANSBURY, Jr. The core Alzheimer’s peptide NAC forms amyloid fibrils which seed and are seeded by beta-amyloid: is NAC a common trigger or target in neurodegenerative disease? Chem Biol 1995, 2, 163–169. 191 GIACCONE, G., B. PEDROTTI, A. MIGHELI, L. VERGA, J. PEREZ, G. RACAGNI, M. A. SMITH, G. PERRY, et al. PP and Tau interaction. A possible link between amyloid and neurofibrillary tangles in Alzheimer’s disease. Am J Pathol 1996, 148, 79–87.
References 192 KISILEVSKY, R. Review: amyloidogenesis – unquestioned answers and unanswered questions. J Struct Biol 2000, 130, 99–108. 193 BALBIRNIE, M., R. GROTHE and D. S. EISENBERG. An amyloid-forming peptide from the yeast prion Sup35 reveals a dehydrated beta-sheet structure for amyloid. Proc Natl Acad Sci USA 2001, 98, 2375–2380. 194 NILSSON, M. R. and C. M. DOBSON. In vitro characterization of lactoferrin aggregation and amyloid
formation. Biochemistry 2003, 42, 375–382. 195 NILSSON, M. R., M. DRISCOLL and D. P. RALEIGH. Low levels of asparagine dea-midation can have a dramatic effect on aggregation of amyloidogenic peptides: implications for the study of amyloid formation. Protein Sci 2002, 11, 342–349. 196 NILSSON, M. R. and C. M. DOBSON. Chemical modification of insulin in amyloid fibrils. Protein Sci 2003, 12, 2637–2641.
29
1
Serum amyloid P Component Steve P. Wood, and Alun R. Coker University of Southampton, Southampton, United Kingdom
Originally published in: Amyloid Proteins. Edited by Jean D. Sipe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31072-X
1 Introduction
The characteristic amyloid fibril-forming polypeptides and the amyloid P component (AP) are abundant constituents of amyloid deposits, and a growing body of evidence suggests that their interaction is an important feature of amyloid accumulation. When the mouse gene for serum amyloid P component (SAP), the circulating equivalent of AP, is inactivated or knocked-out, then the time course for deposition of amyloid after experimental induction is considerably retarded [1]. This strongly suggests that SAP is centrally involved in some way in fibrillogenesis and/or in stabilizing deposits against clearance. A recently discovered drug, CPHPC, induces rapid clearance of circulating SAP in transgenic mice expressing the human protein, and leads to a reduction in the amount of amyloid associated SAP and to a reduction in the amyloid load in animals where systemic amyloidosis has been induced [2]. These results suggest that SAP contributes to the stability and persistence of fibrils in vivo. Amyloid fibril formation and the calcium-dependent binding of SAP to the fibrils can be reproduced in vitro, and, in these in vitro conditions, it was found that bound SAP protects the fibrils against proteolysis and attack by phagocytic cells, reinforcing the stabilization hypothesis [3]. Physical/chemical stabilization of amyloid fibrils might also result from SAP binding. Amyloid formation in vitro is often driven by harsh protein destabilizing conditions and the process appears to be essentially irreversible. However, regression of deposits in vivo following depletion of circulating fibril precursor pools by immunization [4] or transplantation [5] suggests that they may be more dynamic structures. SAP binding might contribute to “locking in” materials to the deposit. However, the administration of radiolabeled SAP and whole-body scintigraphy, an established and effective procedure for locating amyloid laden organs, depends upon exchange
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Serum amyloid P Component
of fibril-localized SAP with the circulating tracer pool and is further evidence of a dynamic aspect of amyloid deposits [6]. Although the interaction of SAP with amyloid fibrils appears to be a fundamental feature of amyloid disease pathology, the whole story of amyloid structure is likely to be far more complex. A host of other components are found in association with amyloid deposits including the complement protein C1q, α 1 -antichymotrypsin, apolipoprotein E, ubiquitin and glycosaminoglycans [7]. Their role is far from clear. There is evidence that SAP interacts with some of these constituents in other contexts, such as complement activation and basement membrane structure [8]. Some of the components may well be present in response to the local inflammatory cellular crisis at sites of amyloid deposition. However, the ability to reproduce amyloid recognition by SAP in vitro from pure components suggests that investigation of the interaction at a structural level is both a tractable and valid target. Here, we review what is known of the structure and properties of SAP, and how this is related to current views on the structure of amyloid and its formation, topics covered in greater depth elsewhere in this volume.
2 Amyloid Fibrils and their Formation
For many years, our understanding of fibril structure was limited to the interpretation of X-ray fibre diffraction patterns that reported the presence of cross-β structure. Technical advances in the use of synchrotron radiation for these investigations, the development of cryopreservation techniques and image analysis for electron microscopy and the use of multidimensional nuclear magnetic resonance (NMR) and more recently solid-state NMR are providing a clearer view of amyloid structure and potential motifs for SAP binding [9–13]. Amyloid fibrils are believed to be formed by a concentration dependent, nucleated growth pattern analogous to crystal growth [9]. Structurally perturbed precursors form stable intermolecular hydrogen bonding networks, perhaps involving domain swapping in some cases [14], that, together with any retained β-structure of the precursor, form the extended cross-β structure of the fibril. Strands are believed to run approximately normal to the fibril axis. This regular cross-β structure gives rise to characteristic reflections in the X-ray fibril diffraction pattern [15]. Fibril growth is probably far more complex than sequential monomer addition, as electron microscopy and atomic force microscopy have shown the presence of micelle-like components and beaded protofilaments during growth [16, 17]. Mature fibrils are often assembled as “rope-like” structures of entwined filaments of variable helical pitch and a hollow core [18] (Fig. 1a). Fibrils can be very long, but are usually in the range of 100–150 Å in diameter. The structural destabilization of precursors required to drive fibril growth may not always be extensive. It is clear that a number of mutated proteins associated with aggressive amyloid deposition exhibit reduced thermal stability [19, 20], and experimental conditions that promote fibril growth in vitro lead to partial denaturation and specific strand displacements from native folds.
2 Amyloid Fibrils and their Formation
Fig. 1 (a) A model for ATTR derived from deuterium-exchange protection NMR and electron spin resonance data. Displacement of strands C and D from the native fold creates newly exposed A and B edge strands that together with a slightly modified form of the native intersubunit sheet provides a continuum of $-strands. Disordered loops
and chain termini protrude from the fibril surface. (Reproduced from Oloffson et al. [27], with permission.) (b) Image processed electron microscopy images of insulin amyloid fibrils showing various degrees of intertwining of protofibrils. (Reproduced from Saibil et al. [10], with permission.)
However, models for transthyretin amyloid (ATTR) that fit with much experimental evidence incorporate a good deal of the native β structure [21–24]. Crystal structures of an amyloidogenic TTR mutant and of a κ-light chain variable region show extended β-helix arrangements, generated by the crystal symmetry, that emulate many of the features expected for amyloid fibrils [25, 26]. Recent solution NMR and mass spectrometry studies for TTR, Aβ and β 2 -microglobulin (β 2 M), using deuterium-exchange protection methods, suggest that considerable portions of the polypeptide chain, including chain termini and displaced loops, are probably disordered at the fibril surface providing an interaction surface distinct from that of the native proteins (Fig. 1b) [27–29]. Monoclonal antibodies recognizing cryptic epitopes of TTR that are displayed by amyloid fibrils and by hyper-amyloidogenic mutants in solution, but not the native protein, reaffirm this view [30]. Solidstate NMR studies of Aβ1–40 and fibril proteolysis results suggest that the polar
3
4 Serum amyloid P Component
N-terminal region and turns are exposed at the surface of the fibril [31, 32] and that the N-terminal region is a primary epitope recognized by antibodies to Aβ fibrils [33]. The hydrogen bonding potential of the β-strands is likely to be fully satisfied by the cross-β structure, except at the fibril ends, and hydrophobic residues are probably buried, if indeed the fibril state represents a low-energy “primordial fold” [34]. Although many proteins, including those with no association with amyloid disease, can be driven to the fibril state by harsh destabilizing conditions, in general, protein structures have probably evolved with variable degrees of success to avoid exposure of sheet edge strands capable of repeating intermolecular interaction in physiological conditions [35]. On the order of 20–30 distinct proteins have been reported to produce amyloid deposits in vivo. Therefore, the universal calcium dependent binding of SAP to these fibrils is unlikely to be highly sequence specific, in view of the variety of the fibrillogenic proteins. Evidence for the existence of a generic amyloid motif, however, is provided by the identification of an antibody that recognizes amyloid derived from different source proteins [36]. In both cases, it is possible that there exists a conformation-dependent discontinuous epitope that obscures identification of important conserved patterns of residues. If the emerging structural models of amyloid fibrils formed by TTR, β 2 M, Aβ1–40 are general in character, then we might expect the surface of the amyloid fibril to display turns between strands, displaced loops and flexible chain termini. These features are likely to be distributed in accord with the twist of the sheets and the inter-twining of the fibrils to produce a fuzz of polypeptide at the fibril surface within which motifs for SAP recognition must reside.
3 The Structure of SAP
The crystal structure of human SAP was first reported in 1994 and was determined by the classical method of isomorphous replacement [37]. Crystallization of the protein had been problematic due to its propensity to aggregate in the presence of calcium ions, a phenomenon that has continued to complicate many aspects of research into the behavior of SAP in solution [38]. SAP is a normal, relatively abundant (around 30 mg/l) plasma glycoprotein that is readily purified by virtue of its calcium-dependent affinity for immobilized phosphoethanolamine. Subsequent experimental manipulations are traditionally carried out in neutral buffers containing EDTA, where the protein exhibits the behavior of a highly soluble decamer of Mr 23,500 subunits. Each subunit comprises 204 amino acids and carries a single Asn32-linked biantennary oligosaccharide chain [39]. The isoelectric point is 5.5, giving the protein a net negative charge at physiological pH; however, this charge is not evenly distributed. Crystallization was carried out in the presence of high concentrations of calcium and acetate ions and at a pH close to the pI.
3 The Structure of SAP
Fig. 2 The SAP pentamer viewed down the 5-fold symmetry axis showing two calcium ions (yellow) bound to each subunit.
The crystallographic structure revealed molecules with cyclic pentameric symmetry packed in a herring-bone fashion and containing only five subunits. These disc-like pentameric molecules are approximately 100 Å in diameter and 35 Å deep, and show a large 20 Å diameter hole through the center, in accord with previous negative stained electron microscopy (Fig. 2) [40]. The polypeptide chain in each subunit is organized as two layers of antiparallel β-strands, sandwiching hydrophobic residues of the core between them (Fig. 3). The first and last four strands sequentially visit the two layers in a “jelly-roll” topology, while the intervening six strands define two β-meanders of three strands at the same end of each layer. This architecture very closely resembles that of the plant lectins, concanavalin A and pea lectin [41, 42]. One of the sheets is flat and has a 10-residue α-helix positioned upon it. The other sheet is rather twisted and loops dipping into the concave surface contribute to the double calcium-binding site of each subunit. The loops connecting strands are generally short and this likely contributes to the remarkable stability of calcium bound SAP in the face of proteolytic digestion [43]. The amino acid sequence of SAP is provided in Fig. 4 in comparison with that of C-reactive protein
5
6 Serum amyloid P Component
Fig. 3 Ribbon diagrams of orthogonal views of the SAP subunit showing the location of the A-face helix and the concavity of the B-face sheet.
Fig. 4 Amino acid sequences for SAP and CRP with residues involved in calcium binding in red, residues of the hydrophobic pocket bound by MO$DG and CPHPC in blue. The oligosaccharide attachment site (CHO) and site of chymotrypsin cleavage are indicated, as is the distribution of secondary structure elements.
4 The Calcium-binding Site
Fig. 5 Side view of the SAP pentamer showing the 10 bound calcium ions of the B face and five helices of the A face.
(CRP), annotated with secondary structure features and with important residues in color. All of the strands of each subunit approximately populate a plane normal to the 5-fold axis of the pentamer. The subunits interact via van der Waals contacts and hydrogen bonds through the open-core end of the β-sheet sandwich of one subunit and the N- and C-terminal strands of the adjacent subunit, to result in the burying of 20% of the protomer surface on pentamer formation. The pentamer has a polar character, with five α-helices on one face (the A face) and five double calcium sites on the other (B face) (Fig. 5). Only the first ring of the oligosaccharide chain attached to Asn32 was visible in this and several subsequent crystal structure analysis. However, more recently, we have experimentally observed almost the entire length of a single sugar chain for one subunit (unpublished data). In this case the sugar chain was immobilized by crystal packing. If the oligosaccharide chains are fully extended, as suggested by solution X-ray and neutron scattering [44], then their area of influence around the pentamer is considerable, extending the effective molecular diameter from 100 to around 135 Å (Fig. 6)
4 The Calcium-binding Site
Each SAP subunit in the crystal structure contains two calcium ions bound very close together (4.2 Å) in the concavity formed by the buckled B-face sheet. This “tense” electrostatic arrangement is stabilized by a selection of carboxylate and amide protein side-chain ligands to the metals (Fig. 7). Unlike many other calciumbinding proteins, these side-chains come from different parts of the sequence and so it seems likely that the bound metals are important for stabilizing the structure in this region. For instance, Glu136 and Asp138 are positioned on an extended loop that dips into the site, with both side-chains forming bridging interactions with both calcium ions. The two metal ions have different total numbers of stabilizing ligands and are likely to be bound to the protein with quite different affinities. Furthermore,
7
8 Serum amyloid P Component
Fig. 6 SAP pentamer with fully extended oligosaccharide chains attached to each subunit, extending the effective radius of the molecule by around 40%.
the ligand complement to the metals in this crystal form also included a bound acetate ion in four of the five subunits and an intermolecular interaction involving Glu167 from the prominent A-face helix of an adjacent pentamer. Acetate ions and other small molecules that bind this site inhibit the aggregation of SAP by blocking intermolecular interactions. There are many acidic groups exposed on the surface
Fig. 7 The double calcium ion-binding site of SAP, showing the uneven distribution of amino acid side-chain ligands to the metals and a bound acetate ion.
5 Comparative studies of CRP
of the pentamer that could contribute to intermolecular contacts, but the Glu167 side-chain is of special interest as five copies project from the A face and could simultaneously interact with the five double calcium sites of another pentamer stacked on the same 5-fold symmetry axis. Such stacks would also be expected to be rather stable because the net charges on the contact surfaces are opposite. Limited stacks have been observed on the surface of ex vivo amyloid fibrils by electron microscopy [45], but the most impressive long stacks are reported in the presence of EDTA where, in the absence of calcium, the above stacking mechanism would not be expected to occur [46]. The presence of long strings of SAP pentamers in calcium-free solution seems unlikely as the protein is very soluble, monodisperse and generally well behaved in these conditions. It is possible that stack formation in the absence of calcium is the result of negative stain metal binding during sample preparation. Stack formation is a potentially important property of SAP as it might contribute to SAP accumulation at amyloid sites. Certainly the related troublesome solubility properties of SAP have provided major technical complications for studies of SAP binding to putative target macromolecules. Calcium site I of each SAP subunit is coordinated to the side-chains of Asp58, Asn59, Glu136, Asp138 and the main chain carbonyl of Gln137. The acetate ion provides the seventh ligand to produce a pentagonal bipyramidal arrangement. Calcium site II is more open and has fewer protein ligands deriving from Glu136, Asp138 and Gln148 as well as the acetate ion and two water molecules. The site was found to release calcium easily, by washing crystals in calcium-free buffers, and indeed preferentially bound the much larger cerium ion. It is conceivable that metals larger than calcium, such as copper or zinc, might be preferred in site II in spite of their much lower availability in biological fluids. The integrity of the site probably also depends upon the presence of a bound ligand analogous to the acetate ion seen in the crystal, but no such ligand has been identified in SAP preparations. In the absence of bound metal ions SAP is readily cleaved by chymotrypsin at the 144–145 bond, suggesting that the loop carrying these residues, which is an integral part of the metal binding site, is much more accessible. The degree of protection conferred by calcium is concentration dependent [43]. Protection is complete at 5 mM and absent at 0.1 mM calcium. Protection is not entirely complete at 2 mM calcium and 35 mg/l SAP, conditions close to those expected in vivo.
5 Comparative studies of CRP
CRP is a close sequence and structural homologue of SAP and investigations of its structure and properties provide some intriguing comparisons with SAP, since CRP is not bound by amyloid fibrils. A number of crystal structures of CRP have been determined in the presence and absence of metal ions and ligands [47–50]. These show that the subunit fold of CRP is very similar to SAP. Two closely bound calcium ions reside on each subunit of a pentamer. The distribution of protein ligands to the two calcium ions is much more even in CRP and the metals are held with equal affinity [51]. In structures where no calcium ions are bound, the 140–150
9
10 Serum amyloid P Component
loop region is no longer well ordered, and, in solution, the 145–146 and 146–147 peptide bonds are readily cleaved by proteolytic enzymes. Although no structure for metal-free SAP has been determined, it seems highly probable that an equivalent disorganization of the metal binding region occurs. The organization of the pentamer of CRP is somewhat different from the SAP pentamer. Each subunit is tilted by 22◦ about an axis through its centre of mass towards the 5-fold axis of the pentamer such that the A-face helices are 5 Å closer to the axis while the five double calcium sites are displaced by an equivalent amount away from the axis. This reorganization may explain why CRP is not as susceptible to calcium-dependent aggregation and to stacking interactions as SAP, because the putative interacting groups are displaced to new pentamer radii differing by 10 Å. Perhaps it also contributes to the inability of CRP to bind motifs with a particular geometry at the surface of amyloid fibrils that are recognized by SAP. When an equivalent subunit rotation is modeled onto the SAP structure, surprisingly few steric clashes occur, suggesting that such rotations could be a dynamic feature of pentraxin structures. The crystal structure of CRP in the complete absence of calcium ions shows two pentamers packed A face to A face on a common 5-fold axis, bringing the helices on each subunit close together. Although there is no evidence for the formation of CRP decamers in solution, this does illustrate a preferred packing arrangement for pentamers of CRP, albeit driven by the supersaturation of the crystallization cocktail.
6 SAP Structure in the Absence of Calcium
In all of the biological contexts in which SAP structure informs function, it is likely that metal ions are present and, thus, metal binding is only likely to be disrupted by proteolytic clipping of the 140–150 loop. In this respect, the decameric structure adopted by SAP in metal-free conditions appears to be of limited interest. However, in many investigations of the possible role of SAP in influencing fibril formation, sugar binding or protein folding (that we will visit later), metal-free conditions have often been employed. The value of observations made in these conditions is that they report on a functionality of the protein that persists when the metal site is severely disrupted. This might be attributed to a completely different part of the protein or to generation of a novel functionality for the disrupted region. The decameric form of SAP that exists in solution in the absence of bound metals very likely involves a face-to-face interaction. The susceptibility of these decamers to proteolysis in the calcium site loop suggests that these loops are exposed and that the pentamers are interacting via their A faces, as they do in crystals of metal-free CRP. Binding of calcium however generates SAP pentamers. It is not clear how the ordering of the metal site would destabilize an A–A-face interaction some 35 Å away unless perhaps subunit rotation equivalent to that of CRP took place. It is conceivable that the SAP decamer involves interactions between pentamer B faces.
7 Binding of Small Molecule Ligands to SAP 11
The loop site of proteolysis is oriented to the outer edge of the pentamer and may not be completely obscured by a B–B-face interaction. Metal binding and ordering of the loop could disrupt a B–B-face decamer. Determination of the structure of metal-free SAP will clearly be informative.
7 Binding of Small Molecule Ligands to SAP
The calcium dependent binding of phosphoethanolamine (PE) and the 4,6-cyclic pyruvate acetal of β-D-galactose (MOβDG) to SAP are well-established observations [8]. Immobilized PE, linked to a matrix through the amino group, and MOβDG as a component of agarose are both effective affinity supports binding SAP from solutions containing calcium ions. The binding sites for the isolated ligands were identified by soaking SAP crystals in the compounds or by co-crystallization and the determination of crystal structures. Subsequently, both studies have been extended to higher resolution [52]. Both compounds were found to bind to the double calcium site of SAP via their acidic moieties. The structure of bound MOβDG was of special interest, as this compound has been shown to inhibit the binding of SAP to amyloid fibrils in vitro [53], and, therefore, has potential both as a lead compound for drug design and as a guide to the nature of the amyloid interaction. The carboxylate of MOβDG interacts with the calcium ions of each SAP subunit and the bridge oxygens of the carboxyethylidine ring hydrogen bond with the side-chain amide nitrogens of Asn59 and Gln148 (Fig. 8). The side-chain carbonyl oxygens of the same residues also interact with the calcium ions. There are also hydrogen bonds between sugar oxygens of C1 and C3 to Gln148 and Lys79, respectively. The methyl group of the bound R isomer of MOβDG locates in a hydrophobic pocket formed by Tyr74, Phe64 and Leu62. In one subunit, crystal packing leaves no room for a molecule of MOβDG to bind and only one calcium ion is bound in site I. This suggests that a bound ligand of some sort is required to stabilize the filling of calcium site II. Isothermal calorimetry measurements indicate a dissociation constant of 50 µM for the interaction of MOβDG with SAP [2], considerably (around 103 -fold) weaker than that with amyloid fibrils. This may be due in large part to the likelihood that multiple attachment points and some associated cooperativity are involved in amyloid recognition. A tight β-turn between polypeptide strands and including aspartic acid at its apex would present a broadly similar array of binding groups, as MOβDG and such turns can be readily built into the binding site, an observation consistent with the supposition that exposed turns at the fibril surface are involved in SAP binding [8]. However, MOβDG would also be expected to interfere with the formation of SAP stacks. Inhibition of stack formation could account for much of the observed inhibition of SAP binding to fibrils in the presence of MOβDG and calcium ions. The importance of cooperative binding effects was demonstrated particularly well for the first time in studies of the calcium-dependent binding of dAMP to SAP
12 Serum amyloid P Component
Fig. 8 The structure of the galactose analog (MO$DG) bound into the double calcium site of an SAP subunit, showing direct interaction with the calcium ions and hydrogen bonds to side-chains that also donate ligands to the metals.
[54]. This ligand was identified as an effective inhibitor of SAP aggregation in the presence of calcium ions. The crystal structure of the SAP–dAMP complex revealed decamers of SAP. Each subunit bound a single dAMP molecule via its phosphate group to the double calcium site. The two pentamers were arranged B face to B face, but the dominant interaction stabilizing the complex was the non-covalent stacking of the adenine rings. This complex was stable during gel filtration even though the extent of the interactions at one double calcium site suggests a low-affinity binding, in the same range as MOβDG. High-throughput screening approaches have identified a high-affinity ligand for SAP that clearly reinforces the importance of cooperative binding [2]. The screen selected compounds that inhibited binding of SAP to Aβ1–42 fibrils immobilized on a plastic surface. This approach identified CPHPC, a compound comprising two D-proline residues linked via a five-carbon spacer and that binds SAP with nanomolar affinity. The crystal structure of the SAP–CPHPC complex showed a decameric structure in which the two D-proline head groups of five molecules of the drug bound to the double calcium sites of subunits from two pentamers oriented B face to B face on a common 5-fold axis (Fig. 9). These decamers are stable in solution. The packing of the proline ring into the Leu62, Tyr74, Phe64 hydrophobic pocket and the interaction of the carboxylate with the calcium ions are the major
7 Binding of Small Molecule Ligands to SAP 13
Fig. 9 Two pentamers of SAP ($-strands in green and heices in red) crosslinked by five molecules of the drug CPHPC (left).
stabilizing interactions. Binding of d-proline alone to SAP, where cooperative effects are not involved, is about 103 -fold weaker, although the crystal structure of the N-acetyl D-Pro–SAP complex shows an identical set of interactions to the CPHPC head group complex. As noted earlier, the properties of this compound potentially provide an approach to treating amyloidoses. Although of considerable interest in a clinical context, it is not clear that the discovery of this compound directly aids our understanding of amyloid recognition. Multiple peptide chain C-termini at the fibril surface might bind SAP in a similar fashion to CPHPC and the side-chains of certain hydrophobic amino acids might fill the hydrophobic pocket, but they are unlikely to achieve the particularly snug fit observed with D-proline. L-proline binds considerably less tightly. During studies of the chaperone activity of SAP, we discovered that high concentrations of chloride ions can effectively inhibit the aggregation of SAP in the presence of calcium ions. X-ray structure analysis of crystals grown in these conditions revealed that a single chloride ion can bind close to the calcium ions of each SAP subunit. Although the binding is weak, at sufficiently high concentrations it is capable of blocking the intermolecular interactions that lead to SAP precipitation with calcium ions. This provides a convenient solution condition for comparative investigations of the calcium dependent binding of SAP to other macromolecules (unpublished data). In overview, structural investigations have defined the calcium-binding site of SAP in considerable detail. The most direct interpretation of the calcium dependence of fibril recognition by SAP suggests that this is the site involved although it is conceivable that the structural integrity of the site is required without direct metal
14 Serum amyloid P Component
bonding interactions as seen for sugar binding to structurally homologous lectins. The structural features of the binding of small molecule ligands provide clues to the nature of the binding mode of polypeptide ligands likely presented at the fibril surface. It is not clear how the architecture of the amyloid fibril might complement the 5-fold symmetric pattern of binding sites on SAP, although it seems that a multipoint attachment is required to explain the binding affinity observed for such a shallow recognition pocket. It is conceivable that the array of loops, turns and termini at the fibril surface satisfy the multipoint cooperative attachment by virtue of their density rather than their precise organization or sequence. While we have detailed structural information for SAP and its likely amyloid recognition site, and a much clearer view of the structure of fibrils, difficult issues remain in bringing this information together into a coherent hypothesis for the form of decorated fibrils in vivo. The possible motifs for amyloid fibril recognition by SAP discussed above are likely to occur with high frequency due to the tight twisting together of protofilaments in mature fibrils and the relatively large number of repeats of the generally small protein constituents of most amyloid fibrils. A 150Å diameter fibril could bind three SAP pentamers through a face interaction for each 140 Å of its length to fully coat the surface. However, the molar binding ratio of SAP pentamers to synthetic Aβ1–40 fibrils reaches a plateau at around 1:250, corresponding to around 12% of the mass of the fibril – roughly comparable to that found in in vivo amyloid fibrils [3]. Based on calculations of the average mass per unit length of mature synthetic Aβ1–40 fibrils [16], this would only give one SAP molecule every 370 Å along the fibril (Fig. 10). This low binding ratio of SAP to Aβ1–40 suggests that the amyloid fibrils in this model would be rather sparsely coated with SAP. For a fully coated surface providing maximum stabilization, the molar ratios would drop to around 1:20. An explanation for this might reside in the work of MacRaild [55]. Simultaneous attachment of SAP to more than one fibril, crosslinking and generating fibril entanglement might limit the accessibility of potential fibril ligands to further molecules of SAP and provide stabilization and resistance towards proteolytic attack.
Fig. 10 Glycosylated SAP pentamers (around 140 Å diameter) attached through a B-face interaction with a 150-Å diameter fibril of A$ would be around 370 Å apart if SAP made up around 12% of the mass of the complex.
9 SAP, Protein Folding and Amyloid Fibril Formation
8 The Role of Glycosaminoglycans (GAGs)
Heparan sulfate and chondroitin sulfate proteoglycans (HSPG and CSPG) are well-documented constituents of amyloid deposits. In vitro experiments show that SAP interacts in a calcium-dependent fashion with dermatan and heparan sulfate polymers [56]. Kiselevsky et al. [57] have proposed on the basis of in situ electron microscopy that this interaction forms the foundation for the structure of amyloid fibrils in vivo, and that disruptive extraction procedures may explain discrepancies between this and other work. Their model suggests that stacked SAP molecules are coated by a coil of CSPG on top of which lie HSPG chains and surface-exposed 1-nm protofilaments. During water extraction these filaments are freed from the fibril and assemble into the familiar pattern of twisted filaments observed by others and reproduced in in vitro experiments. While all aspects of this model are not in accord with the views of many investigators, it is conceivable that the SAP stacks observed attached to ex vivo fibrils is a relic of a GAG bound structure that has survived fibril extraction from tissues. It is less easy to accept the proposed presence of single strand protofilaments enmeshed in the HSPG layer as a dominant form. TTR fibrils removed from aqueous humor with minimal disturbance clearly show the classical cross-β X-ray fibril diffraction pattern [10] and lend strong support to the concept that in vivo fibril assembly bears strong similarities to that in vitro.
9 SAP, Protein Folding and Amyloid Fibril Formation
SAP recognizes amyloid fibrils formed from a diverse array of source proteins and polypeptides, but there is no evidence to suggest that SAP has any preferential binding to these proteins in their native state. SAP has been reported to bind non-fibrillar β 2 M, serum amyloid A protein (SAA) and Aβ, but the proteins were adsorbed to a plastic surface where some unfolding might be expected [58, 59]. Experiments designed to monitor the effects of SAP on fibril formation and protein folding are complicated by the calcium-induced aggregation of SAP and the technical difficulties arising from the need to employ harsh conditions to initiate fibril growth for some proteins. Synthetic Aβ has often been employed, as fibril growth proceeds spontaneously in dilute aqueous media. Amyloidogenic polypeptides like Aβ and SAA are likely to be metastable in aqueous solution, as their sequences are adapted to quite different environments. SAA preferentially associates with a high-density lipoprotein particle, while Aβ is a fragment of a larger protein and at least a portion of its sequence has evolved to reside within a lipid bilayer. Both materials have poor solubility and are rather difficult to work with. Precise solvent conditioning protocols are recommended by some investigators [29] to ensure reproducible behavior of synthetic Aβ, but these are not followed by all and this, together with the solubility difficulties of SAP, may go some way towards explaining diverging views on the effects of SAP on Aβ fibrillogenesis.
15
16 Serum amyloid P Component
SAP has been reported to both enhance and to inhibit fibril formation by Aβ in different experimental conditions. Hamazaki [60] reported that SAP enhanced the production of sedimentable Aβ1–40 approximately 3-fold over 16 h in the presence of 1.5 mM calcium ions. Webster and Rodgers [61] reported a doubling of the Thioflavin T fluorescence for Aβ1–42 in the presence of SAP and 2 mM calcium ions over 24 h. In both cases, nanomolar concentrations of Aβ and SAP were employed, approximating to in vivo levels. Janciauskine et al. [62] found that in the absence of calcium ions, SAP completely inhibited the formation of fibrils by Aβ1–42. More recently, MacRaild et al. [55] have investigated the effects of SAP and apolipoprotein E on the state of amyloid fibrils using sedimentation velocity analysis, electron microscopy and rheology. They concluded that both proteins cause amyloid fibrils to associate more densely, becoming highly entangled. The additional stabilization of fibrils provided by calcium-dependent binding and/or crosslinking by SAP would be expected to favor fibril formation. Inhibition of fibril formation, however, suggests a different mode of interaction. There is considerable current interest in identifying small molecules that interfere with extension of the “crossed-β” structure [63–65] and peptide strand terminators offer a plausible guide to the way in which SAP might halt fibril growth. The disordered region made up of residues 140–150 is a unique feature of metal-free SAP and could contribute to this effect, donating a strand that cannot be propagated.
10 Conclusion
The view that some disorganization of amyloidogenic proteins is required for them to restructure into a cross-β fold is well supported by experimental observations and the amyloidoses are often referred to as protein misfolding diseases [66]. The ability of SAP to recognize intermediate and end products of this process shows close parallels with the properties of molecular chaperones. These proteins interact with folding intermediates of other proteins in the cell to prevent aggregation and facilitate efficient production of correctly folded and biologically active materials [67, 68]. SAP has been shown to assist the refolding of chemically denatured lactate dehydrogenase (LDH) and to inhibit the turbulence induced inactivation of the same enzyme [69]. These effects were observed in the absence and presence of calcium ions, with the greatest effect being observed in the presence of the galactose analogue, MOβDG, which blocks amyloid binding. They are not inhibited by the presence of dAMP, which stabilizes a B–B-face interaction, suggesting that some other part of the SAP molecule is involved. The effects are saturable, possibly reflecting partitioning of refolding LDH intermediates between states that bind SAP and others that do not. Only a small proportion of molecules negotiate a direct route through the refolding energy landscape [70] in the absence of SAP, with the majority forming a misfolded and inactive product. SAP may bind misfolding intermediates that would otherwise inactivate correctly folded subunits of the dimeric enzyme and in the case of the turbulence inactivation assay, SAP may inhibit a nucleated
References
aggregation process of inactivation analogous to amyloid formation. One might speculate that these effects of SAP reflect a surveillance role in detecting damaged proteins in vivo, which is overwhelmed in amyloid disease. A selection of other intracellular molecular chaperones have been identified by proteomic analysis of senile plaques, levels of heat-shock protein (Hsp) 90 being particularly elevated [7]. The extracellular chaperone clusterin is also found. This protein has been shown to bind tightly to Aβ and inhibit its aggregation and to inhibit the formation of amyloid fibrils by apolipoprotein CII in vitro [71, 72], effects reminiscent of those exhibited by calcium-free SAP on fibrillogenesis. SAP binding to amyloid may be a pathological accident, perhaps a side-reaction to some other more general chaperone-like function, or SAP may have evolved as a safety mechanism to trap/localize misfolded proteins. A core role of SAP appears to be as an anti-opsonin in chromatin clearance and innate immunity [73] and masking the low level amyloid deposition that appears to be a common feature of ageing [74] may be a further element of this process.
References 1 BOTTO, M., et al. Amyloid deposition is delayed in mice with targeted deletion of the serum amyloid P component gene. Nat Med 1997, 3, 855–859. 2 PEPYS, M. B., et al. Targeted pharmacological depletion of serum amyloid P component for treatment of human amyloidosis. Nature 2002, 417, 254–259. 3 TENNENT, G. A., L. B. LOVAT and M. B. PEPYS. Serum amyloid P-component prevents proteolysis of the amyloid fibrils of Alzheimer-disease and systemic amyloidosis. Proc Natl Acad Sci USA 1995, 92, 4299–4303. 4 SCHENK, D., et al. Immunization with amyloid-beta attenuates Alzheimer disease-like pathology in the PDAPP mouse. Nature 1999, 400, 173–177. 5 HOLMGREN, G., et al. Clinical improvement and amyloid regression after liver-transplantation in hereditary transthyretin amyloidosis. Lancet 1993, 341, 1113–1116. 6 HAWKINS, P. N., et al. Scintigraphic imaging and turnover studies with iodine–131 labelled serum amyloid P component in systemic amyloidosis. Eur J Nucl Med 1998, 25, 701–708. 7 LIAO, L., et al. Proteomic characterization of postmortem amyloid plaques isolated
8
9
10
11
12
by laser capture microdissection. J Biol Chem 2004, 279, 37061–37068. PEPYS, M. B., D. R. BOOTH, W. L. HUTCHINSON, J. R. GALLIMORE, P. M. COLLINS and E. HOHENESTER. Amyloid P component. A critical review. Amyloid: J Protein Fold Disorders 1997, 4, 274– 295. JARRETT, J. T. and P. T. LANSBURY. Seeding one-dimensional crystallization of amyloid – a pathogenic mechanism in Alzheimer’s-disease and scrapie. Cell 1993, 73, 1055–1058. BLAKE, C. and L. SERPELL. Synchrotron X-ray studies suggest that the core of the transthyretin amyloid fibril is a continuous beta-sheet helix. Structure 1996, 4, 989–998. JIMENEZ, J. L., J. L. GUIJARRO, E. ORLOVA, J. ZURDO, C. M. DOBSON, M. SUNDE and H. R. SAIBIL. Cryo-electron microscopy structure of an SH3 amyloid fibril and model of the molecular packing. EMBO J 1999, 18, 815–821. ANTZUTKIN, O. N., R. D. LEAPMAN, J. J. BALBACH and R. TYCKO. Supramolecular structural constraints on Alzheimer’s beta-amyloid fibrils from electron microscopy and solid-state nuclear magnetic resonance. Biochemistry 2002, 41, 15436–15450.
17
18 Serum amyloid P Component
13 ANTZUTKIN, O. N., J. J. BALBACH and R. TYCKO. Site-specific identification of non-beta-strand conformations in Alzheimer’s beta-amyloid fibrils by solid-state NMR. Biophys J 2003, 84, 3326–3335. 14 STANIFORTH, R. A., et al. Three-dimensional domain swapping in the folded and molten-globule states of cystatins, an amyloid-forming structural superfamily. EMBO J 2001, 20, 4774–4781. 15 SERPELL, L. C. Alzheimer’s amyloid fibrils: structure and assembly. Biochim Biophys Acta 2000, 1502, 16–30. 16 GOLDSBURY, C. S., S. WIRTZ, S. A. MULLER, S. SUNDERJI, P. WICKI, U. AEBI and P. FREY. Studies on the in vitro assembly of A beta 1–40: implications for the search for A beta fibril formation inhibitors. J Struct Biol 2000, 130, 217–231. 17 IONESCU-ZANETTI, C., et al. Monitoring the assembly of Ig light-chain amyloid fibrils by atomic force microscopy. Proc Natl Acad Sci USA 1999, 96, 13175–13179. 18 JIMENEZ, J. L., E. J. NETTLETON, M. BOUCHARD, C. V. ROBINSON, C. M. DOBSON and H. R. SAIBIL. The protofilament structure of insulin amyloid fibrils. Proc Natl Acad Sci USA 2002, 99, 9196–9201. 19 BOOTH, D. R., et al. Instability, unfolding and aggregation of human lysozyme variants underlying amyloid fibrillogenesis. Nature 1997, 385, 787–793. 20 FERRAO-GONZALES, A. D., L. PALMIERI, M. VALORY, J. L. SILVA, H. LASHUEL, J. W. KELLY and D. FOGUEL. Hydration and packing are crucial to amyloidogenesis as revealed by pressure studies on transthyretin variants that either protect or worsen amyloid disease. J Mol Biol 2003, 328, 963–974. 21 INOUYE, H., F. S. DOMINGUES, A. M. DAMAS, M. J. SARAIVA, E. LUNDGREN, O. SANDGREN and D. A. KIRSCHNER. Analysis of X-ray diffraction patterns from amyloid of biopsied vitreous humor and kidney of transthyretin (TTR) Met30
22
23
24
25
26
27
28
29
30
familial amyloidotic polyneuropathy (FAP) patients: axially arrayed TTR monomers constitute the protofilament. Amyloid: Int J Exp Clin Invest 1998, 5, 163–174. SERAG, A. A., C. ALTENBACH, M. GINGERY, W. L. HUBBELL and T. O. YEATES. Arrangement of subunits and ordering of beta-strands in an amyloid sheet. Nat Struct Biol 2002, 9, 734–739. SERAG, A. A., C. ALTENBACH, M. GINGERY, W. L. HUBBELL and T. O. YEATES. Arrangement of subunits and ordering of beta-strands in an amyloid sheet. Nat Struct Biol 2003, 10, 70. SERAG, A. A., C. ALTENBACH, M. GINGERY, W. L. HUBBELL and T. O. YEATES. Identification of a subunit interface in transthyretin amyloid fibrils: evidence for self-assembly from oligomeric building blocks. Biochemistry 2001, 40, 9089–9096. ENEQVIST, T., K. ANDERSSON, A. OLOFSSON, E. LUNDGREN and A. E. SAUER-ERIKSSON. The beta-slip: a novel concept in transthyretin amyloidosis. Mol Cell 2000, 6, 1207–1218. STEINRAUF, L. K., M. Y. CHIANG and D. SHIUAN. Molecular structure of the amyloid-forming protein kappa I Bre. J Biochem 1999, 125, 422–429. OLOFSSON, A., J. H. IPPEL, S. S. WIJMENGA, E. LUNDGREN and A. OHMAN. Probing solvent accessibility of transthyretin amyloid by solution NMR spectroscopy. J Biol Chem 2004, 279, 5699–5707. HOSHINO, M., H. KATOU, Y. HAGIHARA, K. HASEGAWA, H. NAIKI and Y. GOTO. Mapping the core of the beta2 -microglobulin amyloid fibril by H/D exchange. Nat Struct Biol 2002, 9, 332–336. KHETERPAL, I., S. ZHOU, K. D. COOK and R. WETZEL. A beta amyloid fibrils possess a core structure highly resistant to hydrogen exchange. Proc Natl Acad Sci USA 2000, 97, 13597–13601. GOLDSTEINS, G., et al. Exposure of cryptic epitopes on transthyretin only in amyloid and in amyloidogenic mutants. Proc Natl Acad Sci USA 1999, 96, 3108–3113.
References 31 KHETERPAL, I., A. WILLIAMS, C. MURPHY, B. BLEDSOE and R. WETZEL. Structural features of the A beta amyloid fibril elucidated by limited proteolysis. Biochemistry 2001, 40, 11757– 11767. 32 TYCKO, R. Insights into the amyloid folding problem from solid-state NMR. Biochemistry 2003, 42, 3151–3159. 33 TOWN, T., J. TAN, N. SANSONE, D. OBREGON, T. KLEIN and M. MULLAN. Characterization of murine immunoglobulin G antibodies against human amyloid-beta1–42 . Neurosci Lett 2001, 307, 101–104. 34 DOBSON, C. M. Protein-misfolding diseases: getting out of shape. Nature 2002, 418, 729–730. 35 RICHARDSON, J. S., D. C. RICHARDSON. Natural beta-sheet proteins use negative design to avoid edge-to-edge aggregation. Proc Natl Acad Sci USA 2002, 99, 2754–2759. 36 O’NUALLAIN, B. and R. WETZEL. Conformational Abs recognizing a generic amyloid fibril epitope. Proc Natl Acad Sci USA 2002., 99, 1485–1490. 37 EMSLEY, J., et al. Structure of pentameric human serum amyloid-P component. Nature 1994, 367, 338–345. 38 WOOD, S. P., et al. A pentameric form of human-serum amyloid-P component – crystallization, X-ray-diffraction and neutron-scattering studies. J Mol Biol 1988, 202, 169–173. 39 PEPYS, M. B., et al. Human Serum amyloid-P component is an invariant constituent of amyloid deposits and has a uniquely homogeneous glycostructure. Proc Natl Acad Sci USA 1994, 91, 5602–5606. 40 PAINTER, R. H., I. DEESCALLON, A. MASSEY, L. PINTERIC and S. B. STERN. The structure and binding characteristics of serum amyloid protein (9. 5S alpha-1-glycoprotein). Ann NY Acad Sci 1982, 389, 199–215. 41 SRINIVASAN, N., H. E. WHITE, J. EMSLEY, S. P. WOOD, M. B. PEPYS and TL. BLUNDELL. Comparative analyses of pentraxins – implications for protomer assembly and ligand-binding. Structure 1994, 2, 1017–1027.
42 SRINIVASAN, N., S. D. RUFINO, M. B. PEPYS, S. P. WOOD and T. L. BLUNDELL. A super-family of proteins with the lectin fold Chemstracts. Biochem Mol Biol 1996, 6, 149–164. 43 KINOSHITA, C. M., et al. A protease-sensitive site in the proposed Ca2+ -binding region of human serum amyloid-P component and other pentraxins. Protein Sci 1992, 1, 700– 709. 44 ASHTON, A. W., M. K. BOEHM, J. R. GALLIMORE, M. B. PEPYS and S. J. PERKINS. Pentameric and decameric structures in solution of serum amyloid P component by X-ray and neutron scattering and molecular modelling analyses. J Mol Biol 1997, 272, 408–422. 45 SHIRAHAMA, T. and A. S. COHEN. High-resolution electron microscopic analysis of the amyloid fibril. J Cell Biol 1967, 33, 679–708. 46 PINTERIC, L. and R. H. PAINTER. Electron microscopy of serum amyloid protein in the presence of calcium; alternative forms of assembly of pentagonal molecules in two-dimensional lattices. Can J Biochem 1979, 57, 727–736. 47 THOMPSON, D., M. B. PEPYS and S. P. WOOD. The physiological structure of human C-reactive protein and its complex with phosphocholine. Structure 1999, 7, 169–177. 48 SHRIVE, A. K., A. W. METCALFE, J. R. CARTWRIGHT and T. J. GREENHOUGH. C-reactive protein and SAP-like pentraxin are both present in Limulus polyphemus haemolymph: crystal structure of Limulus SAP. J Mol Biol 1999, 290, 997–1008. 49 RAMADAN, M. A. M., A. K. SHRIVE, D. HOLDEN, D. A. A. MYLES, J. E. VOLANAKIS, L. J. DELUCAS and T. J. GREENHOUGH. The three-dimensional structure of calcium-depleted human C-reactive protein from perfectly twinned crystals. Acta Crystallogr D Biol Crystallogr 2002, 58, 992–1001. 50 SHRIVE, A. K., et al. Three dimensional structure of human C-reactive protein. Nat Struct Biol 1996, 3, 346–354. 51 KINOSHITA, C. M., et al. Elucidation of a protease-sensitive site involved in the
19
20 Serum amyloid P Component
52
53
54
55
56
57
58
59
60
binding of calcium to C-reactive protein. Biochemistry 1989, 28, 9840–9848. THOMPSON, D., M. B. PEPYS, I. TICKLE and S. WOOD. The structures of crystalline complexes of human serum amyloid P component with its carbohydrate ligand, the cyclic pyruvate acetal of galactose. J Mol Biol 2002, 320, 1081–1086. HIND, C. R. K., P. M. COLLINS, D. RENN, R. B. COOK, D. CASPI, M. L. BALTZ and M. B. PEPYS. Binding-specificity of serum amyloid-P component for the pyruvate acetal of galactose. J Exp Med 1984, 159, 1058–1069. HOHENESTER, E., W. L. HUTCHINSON, M. B. PEPYS and S. P. WOOD. Crystal structure of a decameric complex of human serum amyloid P component with bound dAMP. J Mol Biol 1997, 269, 570–578. MACRAILD, C. A., et al. Non-fibrillar components of amyloid deposits mediate the self-association and tangling of amyloid fibrils. J Biol Chem 2004, 279, 21038–21045. HAMAZAKI, H. Ca2+ -mediated association of human-serum amyloid-P component with heparan-sulfate and dermatan sulfate. J Biol Chem 1987, 262, 1456–1460. INOUE, S., M. KUROIWA and M. J. SARAIVA, A. GUIMARAES and R. KISILEVSKY. Ultrastructure of familial amyloid polyneuropathy amyloid fibrils: examination with high-resolution electron microscopy. J Struct Biol 1998, 124, 1–12. DANIELSEN, B., I. J. SORENSEN, M. NYBO, E. H. NIELSEN, B. KAPLAN and S. E. SVEHAG. Calcium-dependent and -independent binding of the pentraxin serum amyloid P component to glycosaminoglycans and amyloid proteins: Enhanced binding at slightly acid pH. Biochim Biophys Acta 1997, 1339, 73–78. HAMAZAKI, H. Ca2+ -dependent binding of human serum amyloid-P component to Alzheimer’s beta-amyloid peptide. J Biol Chem 1995, 270, 10392– 10394. HAMAZAKI, H. Amyloid-P component promotes aggregation of Alzheimer’s
61
62
63
64
65
66
67
68
69
70
71
beta-amyloid peptide. Biochem Biophys Res Commun 1995, 211, 349–353. WEBSTER, S. and J. ROGERS. Relative efficacies of amyloid beta peptide (A beta) binding proteins in A beta aggregation. J Neurosci Res 1996, 46, 58–66. JANCIAUSKIENE, S., P. G. DEFRUTOS, E. CARLEMALM, B. DAHLBACK and S. ERIKSSON. Inhibition of Alzheimer beta-peptide fibril formation by serum amyloid-P component. J Biol Chem 1995, 270, 26041–26044. LASHUEL, H. A., D. M. HARTLEY, D. BALAKHANEH, A. AGGARWAL, S. TEICHBERG and D. J. E. CALLAWAY. New class of inhibitors of amyloid-beta fibril formation – implications for the mechanism of pathogenesis in Alzheimer’s disease. J Biol Chem 2002, 277, 42881–42890. FRENKEL, D., B. SOLOMON and I. BENHAR. Modulation of Alzheimer’s beta-amyloid neurotoxicity by site-directed single-chain antibody. J Neuroimmunol 2000, 106, 23–31. SOTO, C., E. M. SIGURDSSON, L. MORELLI, R. A. KUMAR, E. M. CASTANO and B. FRANGIONE. Beta-sheet breaker peptides inhibit fibrillogenesis in a rat brain model of amyloidosis: implications for Alzheimer’s therapy. Nat Med 1998, 4, 822–826. COHEN, F. E. and J. W. KELLY. Therapeutic approaches to protein-misfolding diseases. Nature 2003, 426, 905–909. ELLIS, R. J. Roles of molecular chaperones in protein-folding. Curr Opin Struct Biol 1994, 4, 117–122. HARTL, F. U. Molecular chaperones in cellular protein folding. Nature 1996, 381, 571–580. COKER, A. R., A. PURVIS, D. BAKER, M. B. PEPYS and S. P. WOOD. Molecular chaper-one properties of serum amyloid P component. FEBS Lett 2000, 473, 199–202. DILL, K. A. and H. S. CHAN. From Levinthal to pathways to funnels. Nat Struct Biol 1997, 4, 10–19. MATSUBARA, E., C. SOTO, S. GOVERNALE, B. FRANGIONE and J. GHISO.
References Apolipoprotein J and Alzheimer’s amyloid beta solubility. Biochem J 1996, 316, 671–679. 72 HATTERS, D. M., M. R. WILSON, S. B. EASTERBROOK-SMITH and G. J. HOWLETT. Suppression of apolipoprotein C-II amyloid formation by the extracellular chaperone, clusterin. Eur J Biochem 2002, 269, 2789– 2794.
73 BICKERSTAFF, M. C. M., et al. Serum amyloid P component controls chromatin degradation and prevents antinuclear autoimmunity. Nat Med 1999, 5, 694–697. 74 HAGGQVIST, B., et al. Medin: an integral fragment of aortic smooth muscle cell-produced lactadherin forms the most common human amyloid. Proc Natl Acad Sci USA 1999, 96, 8669–8674.
21
1
Amyloid Fibril Formation of Natively Unfolded Proteins Vladimir N. Uversky Indiana University School of Medicine, Indianapolis, USA
Anthony L. Fink University of California, Santa Cruz, USA
Originally published in: Amyloid Proteins. Edited by Jean D. Sipe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31072-X
1 Introduction
The realization that many proteins are intrinsically disordered, or natively unfolded, under physiological conditions (for the purified protein) is a relatively recent one. In fact, current estimates suggest that between 25 and 33% of the human genome encoding proteins may code for intrinsically disordered regions [1–4]! Thus, rather than being rarities, such proteins may be rather common. Several hypotheses have been proposed for the existence of natively unfolded proteins; probably the most widely accepted is that the lack of ordered structure permits greater flexibility in binding partners, a valuable characteristic in a protein that must react with multiple binding partners in its normal function, as is true for many signaling molecules and gene regulators. It is likely that in their normal physiological milieu intrinsically unstructured proteins are not unfolded, due to interactions with other proteins, nucleic acids, membranes or small ligands. This would, at least in part, explain their stability to intracellular proteases. Natively unfolded, or intrinsically disordered, proteins are specifically localized within a unique region of chargehydrophobicity phase space, and are characterized by a combination of low overall hydrophobicity and large net charge [5]. As will be discussed, a variety of conditions can bring about folding of natively unfolded proteins, resulting in either relatively non-compact, partially folded conformations or relatively compact, tightly folded conformations. An informative example is the effect of increasing trimethylamine N-oxide (TMAO) on α-synuclein. TMAO is an osmolyte, and is known to stabilize native states and induce structure in disordered states. When the structure of α-synuclein was examined as a function
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Amyloid Fibril Formation of Natively Unfolded Proteins
of increasing TMAO, the initially natively unfolded conformation adopted a partially folded intermediate conformation in the vicinity of 1 M TMAO, and became compact and helical in 3 M TMAO. The term conformational disease or protein deposition disease has been given to a number of pathological states in which a specific protein or protein fragment changes from its natural soluble form into stable, insoluble, ordered filamentous protein aggregates, commonly referred to as amyloid fibrils [6–11]. Given the prevalence of natively unfolded proteins it should not be surprising that of the more than two dozen proteins known so far to be involved in protein deposition diseases, more than 50% are either completely unfolded or contain significant unfolded regions under physiological conditions (Tables 1 and 2). For many years it has been generally assumed that the ability to form amyloid fibrils is limited to a relatively small number of proteins, essentially those found in the diseases, and that these proteins possess specific sequence motifs encoding the unique structure of amyloid core. However, recent studies have suggested that the ability of proteins to form amyloid fibrils is probably very common and may be a general feature of the polypeptide chain [7, 12, 13]. In fact, many proteins that are not associated with diseases have been shown to form fibrils in vitro (reviewed in [14]). Overall, there is an increasing belief that the ability to form fibrils is an inherent property of the polypeptide chain, i.e. many proteins, perhaps all, are potentially able to form amyloid fibrils under appropriate conditions [7, 15–17]. Amyloid fibrils are highly organized structures (similar to one-dimensional crystals) in which the essential features of the structure are determined by the physicochemical properties of the polypeptide chain [18]. Furthermore, it has been pointed out that, as with other highly organized compounds (including crystals) whose structures are based on repetitive long-range interactions, the most stable structures are those consisting of a single type of peptide or protein where self-interactions can be optimized [18, 19]. Obviously, this can explain the remarkable specificity of protein aggregation in protein deposition diseases. Although amyloidogenic polypeptides may be rich in β-sheet, α-helix, β-helix or natively unfolded [5, 20–27], the resulting fibrils display many common properties including a core cross-β-sheet structure in which continuous β-sheets are formed with β-strands running perpendicular to the long axis of the fibril [28]. All fibrils have similar morphologies, with mature fibrils usually having a twisted, rope-like structure, which usually involves two to six unbranched protofilaments around 2–3 nm in diameter associated laterally or twisted together to form fibrils of around 10 nm diameter (see, e.g. [29–31]). Since all fibrils, regardless of the original structure of the given amyloidogenic protein, have a common cross-β structure, considerable conformational rearrangements have to occur in order for fibril formation to happen. The molecular basis of amyloid fibril formation from globular proteins is described; briefly, fibril formation can only occur when the rigid native structure is destabilized, favoring partial unfolding and the formation of an amyloidogenic partially unfolded conformation [6–11, 32–35]. This is because the structural rearrangements required for the formation of fibrils cannot take place within the tightly packed native globular
1 Introduction 3 Table 1 Some natively unfolded or significantly disordered amyloidogenic proteins and the
corresponding amyloid-based clinical disorders Amyloidogenic protein
Type of structure
Disease
Prion protein and its fragments
N-terminal fragment (23–121) is natively unfolded; C-terminal domain (121–230) is α-helical (predominantly)
Aβ and its fragments
natively unfolded
ABri Huntingtin
natively unfolded exon 1 is unfolded and forms fibrils ligand-binding and DNAbinding domains are α-helical; N-terminal domain is natively unfolded unknown (natively unfolded)
Creutzfeldt-Jacob disease (CJD) Gerstmann-StrausslerSchneiker syndrome (GSS) fatal familial insomnia (FFI) kuru bovine spongiform encephalopathy (BSE) and scrapie Alzheimer’s disease (AD) Dutch hereditary cerebral hemorrhage with amyloidosis (HCHWA, also known as cerebrovascular amyloidosis) Congophilic angiopathy familial British dementia Huntington disease
Androgen receptor protein
Ataxin-1
DRPLA protein (atrophin-1)
unknown (probably natively unfolded)
IAPP (amylin)
natively unfolded
Calcitonin
natively unfolded
α-Synuclein
natively unfolded
Tau protein
natively unfolded
spinal and bulbar muscular atrophy (SBMA)
spinocerebellar ataxia (SCA) neuronal intranuclear inclusion disease (NIID) hereditary dentatorubral-pal-lidoluysian atrophy (DRPLA) pancreatic islet amyloidosis in late-onset diabetes (type II diabetes mellitus) medullary carcinoma of the thyroid (MCT) Parkinson’s disease (PD) diffuse Lewy bodies disease (DLBD) Lewy bodies variant of Alzheimer’s disease (LBVAD) dementia with Lewy bodies (DLB) multiple system atrophy (MSA) Hallervorden-Spatz disease Alzheimer disease (AD) Pick’s disease progressive supranuclear palsy (PSP)
4 Amyloid Fibril Formation of Natively Unfolded Proteins Table 2 Non-disease-related amyloidogenic proteins and peptides
Protein (peptide)
Type of structure
Prothymosin α [164] Human complement receptor 1, 18–34 fragment [165] GAGA factor [166] Yeast prion Ure2p [167] ApoCII [156, 157, 160] Cold shock protein B, 1–22 fragment [168] Core histones [163] Soluble homopolypeptides [169]: poly(L-lysine) poly(L-glutamic acid) poly(L-threonine)
natively unfolded unfolded native unfolded α-helical/unfolded Natively unfolded Unfolded Natively unfolded
protein, due to the constraints of its tertiary structure. This situation, obviously, is not applicable for natively unfolded proteins, as they are devoid of rigid structure in their starting state. Thus, in contrast to globular proteins, the primary step of fibrillogenesis of natively unfolded proteins is partial folding rather than partial unfolding. The details of the process will be illustrated using results from a detailed analysis of the aggregation of human α-synuclein, a protein whose fibrillogenesis in vitro has been studied extensively. The generality of the results will be emphasized with a variety of other natively unfolded proteins that assemble into amyloid fibrils. 2 Molecular Mechanisms of Amyloid Fibril Formation by a Natively Unfolded Protein: α-Synuclein 2.1 α-Synuclein in Parkinson’s Disease and other Neurodegenerative Disorders
Parkinson’s disease is a progressive disorder resulting from loss of neurons of the substantia nigra, a small area of cells in the mid-brain. Gradual degeneration of these dopaminergic neurons causes a reduction in the release of the neurotransmitter dopamine in the striatum. This, in turn, can produce one or more of the classic signs of Parkinson’s disease: resting tremor on one (or both) side(s) of the body, generalized slowness of movements (bradykinesia), stiffness of limbs (rigidity) and gait or balance problems (postural dysfunction). The precise mechanisms of neuronal death are unknown as yet. Some surviving nigral dopaminergic neurons contain cytosolic filamentous inclusions known as Lewy bodies (LBs) and Lewy neurites (LNs) [36, 37]. LBs are also neurophathological hallmarks of several other diseases collectively termed synucleinopathies. There is a strong link between α-synuclein aggregation and the pathogenesis of Parkinson’s disease, including three different missense mutations in the α-synuclein gene, corresponding to A30P, E46K and A53T substitutions in
2 Molecular Mechanisms of Amyloid Fibril Formation by a Natively Unfolded Protein: "-Synuclein
α-synuclein protein, in a small number of kindreds with autosomal-dominantly inherited, early-onset Parkinson’s disease [38–40]; triplication of the α-synuclein gene locus causes familial autosomal dominant Parkinson’s disease with early onset [41]; overexpression of wild-type (WT) α-synuclein in transgenic mice [42] or of WT, A30P and A53T in transgenic flies [43] leads to motor deficits and neuronal inclusions reminiscent of Parkinson’s disease. 2.2 Key Structural Properties of α-Synuclein: A Natively Unfolded Protein
α-Synucleins from different organisms exhibit a high degree of sequence conservation: human α-synuclein consists of 140 amino acid residues and can be divided into the N-terminal region, residues 1–95, containing six 11-amino-acid imperfect repeats with a highly conserved hexamer motif (KTKEGV), and the C-terminal region, residues 96–140, which is enriched in acidic residues and prolines, suggesting that it adopts a disordered conformation. α-Synuclein is a typical intrinsically unstructured or natively unfolded protein (for recent reviews, see [13, 21, 23, 26, 44, 45]). The sequence of intrinsically disordered proteins is characterized by amino acid compositional bias and the existence of highly predictable flexibility [46]. Further, the majority of the intrinsically disordered proteins, are substantially depleted in I, L, V, W, F, Y, C and N, and enriched in E, K, R, G, Q, S, P and A [21]. These features account for the low hydrophobicity and high net charge of the intrinsically unstructured proteins, a property that allows discrimination of globular (folded) and intrinsically unstructured proteins based solely on their amino acid composition [5]. Several detailed studies show that purified α-synuclein possesses little ordered structure under physiological conditions [45, 47–49] and is characterized by farUV circular dichroism (CD) and Fourier transform IR (FTIR) spectra typical of a substantially unfolded polypeptide pH [49]. The hydrodynamic properties of α-synuclein show that the protein is quite expanded, but is slightly more compact than expected for a random coil and lacks a tightly packed globular structure [45, 49, 50]. The results of pulsed-field gradient nuclear magnetic resonance (NMR; which allows an estimation of the hydrodynamic radius) confirm that α-synuclein is unfolded, but slightly collapsed [51], and a high-resolution NMR analysis revealed that α-synuclein is largely unfolded in solution, but exhibits a region between residues 6 and 37 with a preference for helical conformation [48]. Finally, Raman optical activity spectra suggested that α-synuclein may contain some poly(L-proline) II helical conformation [52]. One of the characteristic features of natively unstructured proteins is a reverse response to changes in their environment compared with that of globular proteins. For example, intrinsically unstructured proteins gain rather than lose ordered structure at extremes of pH or high temperatures. The structure-forming effects of low pH on α-synuclein [45, 49, 50] were attributed to the minimization of the large net negative charge present at neutral pH, thereby decreasing intramolecular charge–charge repulsion and permitting hydrophobic-driven collapse to the
5
6 Amyloid Fibril Formation of Natively Unfolded Proteins
partially folded conformation. The effect of elevated temperatures was attributed to increased strength of the hydrophobic interaction at higher temperatures, leading to a stronger hydrophobic driving force for folding [49]. In fact, any changes in the environment of a natively unfolded protein leading to an increase in protein hydrophobicity and/or decrease in its net charge are expected to lead to partial folding. 2.3 Major Structural Characteristics of Partially Folded α-Synuclein
As the pH is decreased (or temperature increased) changes were observed in the shape of the CD spectrum for α-synuclein. Fig. 1(a) shows that the minimum at 196 nm becomes less intense, whereas the negative intensity of the spectrum around 222 nm increases, reflecting pH-induced formation of secondary structure. Similarly, the FTIR spectrum of α-synuclein at pH 7.5 was typical of an unfolded polypeptide (Fig. 1b), whereas a decrease in pH leads to significant spectral changes, indicating an increase in ordered secondary structure. The most evident change is the appearance of a new β-sheet band in the vicinity of 1626 cm–1 . Thus, at acidic pH natively unfolded α-synuclein is transformed into a partially folded conformation with a significant amount of β-structure. Furthermore, Fig. 1(c) shows that a decrease in pH leads to a considerable increase in 1-anilino-8-naphthalene sulfonate (ANS) fluorescence intensity and a large blue shift of the ANS fluorescence maximum (from around 515 to around 475 nm), reflecting the pH-induced transformation of the natively unfolded α-synuclein to a partially folded conformation with solvent-exposed hydrophobic regions. Hydrodynamic methods revealed that this partially folded conformation is accompanied by a substantial decrease in hydrodynamic dimensions (Rs = 27.9 ± 0.4 and Rg = 30 ± 1 Å). Moreover, changes in the profile of the small-angle X-ray scattering Kratky plot at pH 3 were consistent with the development of a tightly packed core (Fig. 1D). This means that protonation of α-synuclein results in the transformation of this natively unfolded protein into a more compact conformation with a significant amount of ordered secondary structure, affinity for ANS and the beginnings of a tightly packed core, all hallmarks of a partially folded intermediate [49]. Comparable structural changes are induced in α-synuclein by high temperatures [49]. The analysis of these data suggests that the partially folded conformation induced in α-synuclein by low pH or high temperature resembles the pre-molten globule state, an intermediate, preceding the molten globule in the refolding of globular proteins. In fact, it has been shown that the pre-molten globule is a denatured state, i.e. a conformation where the protein lacks a rigid three-dimensional structure. It is characterized by some secondary structure, although much less than that of the molten globule or native globular protein. A protein in the pre-molten globule state is considerably less compact than in the molten globule state and does not have a tightly packed globular structure, but it is more compact than the corresponding random coil. Furthermore, the pre-molten globule can interact with the hydrophobic fluorescent probe ANS, although more weakly than a molten globular
2 Molecular Mechanisms of Amyloid Fibril Formation by a Natively Unfolded Protein: "-Synuclein
Fig. 1 a–c
7
8 Amyloid Fibril Formation of Natively Unfolded Proteins
Fig. 1 Major structural characteristics of the natively unfolded and the partially folded pre-molten glubule-like conformation induced in human "-synuclein by low pH. (a) Far-UV CD spectra measured under different conditions. (b) FTIR spectra in the amide I region measured for natively
unfolded, partially folded and fibrillar forms of human alpha-synuclein. (c) ANS spectra measured under different experimental conditions. (d) Kratky plot representation of the results of small angle X-ray scattering analysis of "-synuclein at different experimental conditions.
protein. This means that at least some hydrophobic clusters are already formed in the pre-molten globule state, although there is no globular structure [53–55]. 2.4 Fibril Formation by α-Synuclein and the Partially Folded Amyloidogenic Conformation
The fibrillogenesis of α-synuclein in vitro has been studied extensively (for recent reviews, see [45, 56]). Although α-synuclein is an intrinsically unstructured protein, it forms fibrils of highly organized structure. Fig. 2 shows that both decreasing the pH and increasing the temperature result in accelerated fibrillation of α-synuclein [49]; thus, there is an excellent correlation between the intramolecular conformational change described above and the increased kinetics of fibril formation, suggesting that the partially structured conformation is a key intermediate on the fibril-forming pathway [49]. In contrast to an unfolded polypeptide chain, a partially folded conformation is anticipated to have contiguous hydrophobic patches on its surface, which might foster self-association and fibrillation. Thus, factors that shift the equilibrium in favor of this monomeric partially folded conformation will favor fibril formation. An increase in protein concentration obviously increases the absolute concentration of the intermediate and, thus, is predicted to accelerate fibrillation, as is observed (Fig. 3). There is an inverse linear correlation between the logarithm of α-synuclein concentration and the duration of the lag time (Fig. 3B). This reflects an important aspect of the kinetics of nucleus formation, i.e. the formation of the
2 Molecular Mechanisms of Amyloid Fibril Formation by a Natively Unfolded Protein: "-Synuclein
Fig. 2 Effect of pH (A) and temperature (B) on fibril formation of human "-synuclein, monitored by Thioflavin T fluorescence. Decreasing pH and increasing temperature accelerate fibrillation, due to decreased net charge and increased hydrophobicity, respectively.
aggregation-prone partially folded intermediate represents the rate-limiting step in α-synuclein nucleation and this process is probably responsible for the observed first-order kinetics. An alternative way to view this is that association to form fibrils involves the association of a rare conformer, the amyloidogenic partially folded intermediate. The data suggest that the key partially folded intermediate, once formed, oligomerizes rapidly to form fibrils. Fig. 3(C) shows that there is also a linear dependence of the first-order rate constant for fibril growth (elongation) on protein concentration. The kinetics of elongation have been shown to follow
9
10 Amyloid Fibril Formation of Natively Unfolded Proteins
Fig. 3 The effect of protein concentration on the kinetics of fibril formation of human recombinant "-synuclein. (a) Kinetics of fibrillation at different "-synuclein concentrations. Protein concentrations were 21 (solid circles), 70 (open circles), 105 (solid tri-
angles) and 190 :M (open triangles). (b) Inverse linear dependence of the logarithm of "-synuclein concentration as a function of lag time. (c) Linear dependence of the rate constant kapp for fibril growth on the "-synuclein concentration.
first-order kinetics for some other proteins, including Aβ and insulin [57–60]. The simplest explanation for this fact is that increasing protein concentration leads to increasing numbers of fibrils, due to increasing concentration of nuclei and to increasing concentration of monomeric protein (either in its native-like form or as a partially folded intermediate, see below).
3 Fibrillogenesis of Natively Unfolded Proteins Requires Partial Folding 11
It has been assumed that the aggregation of α-synuclein, leading to the development of Parkinson’s disease and other synucleinopathies, arises from various factors that would significantly increase the concentration of the critical partially folded intermediate [49]. For example, a number of factors have been shown to induce partial folding of α-synuclein and accelerated fibrillation in vitro: these include some common pesticides and herbicides [61–63], metal ions [63, 64], moderate concentrations of osmolytes such as TMAO [65], and some alcohols [66]. Under all these conditions α-synuclein was shown to form fibrils rapidly, thus confirming the correlation between the partially folded conformation and fibril formation (see Figs 2B and 4A). In contrast, Fig. 4(B) shows that the process of fibril formation can be slowed or completely inhibited under conditions favoring formation of more folded conformations [65, 66] or by stabilization of off-pathway oligomers, e.g. via nitration of tyrosines [67] or methionine oxidation [68], or interaction with β- or γ -synucleins [50]. All conditions that populated the aggregation-prone partially folded conformation accelerated both the nucleation and elongation stages of fibril assembly. This means that the partially folded intermediate is likely involved in both the formation of the nucleus and in the subsequent propagation of fibrils. It is worth noting that the formation of amyloid-like fibrils is not the only pathological outcome of protein deposition diseases and in several disorders (as well as in numerous in vitro experiments) protein deposits are composed of amorphous aggregates, without local order, e.g. light chain deposition disease, which involves deposition of immunoglobulin light chains. Similarly, soluble oligomers represent another alternative final product of the aggregation process. The physiological significance of oligomers is that they may be the toxic species. Fig. 5 represents a simplified model of α-synuclein aggregation, demonstrating the fact that aggregation is a very complex process, which can be divided into three major steps. The model shows that the first stage of the aggregation process is the structural transformation of a soluble natively unfolded protein, UN , into the “sticky” aggregation-prone precursor or intermediate (I), which plays a central role in the entire aggregation process. This species can partition kinetically between a minimum of three distinct pathways, leading to fibrils, amorphous aggregates and soluble oligomers. The formation of a nucleus leading to both fibrils and amorphous aggregates is a kinetically disfavored event and is responsible for the lag period preceding significant formation of aggregates. Once a critical nucleus has been generated, the conditions change in favor of a rapid increase in aggregate size.
3 Fibrillogenesis of Natively Unfolded Proteins Requires Partial Folding 3.1 Fibril Formation by Proteins Involved in Conformational Disorders
Data presented in Tables 1 and 2 show that many of the known amyloidogenic proteins are natively unfolded. It is reasonable to assume that such proteins are well
12 Amyloid Fibril Formation of Natively Unfolded Proteins
Fig. 4 Effect of different environmental factors on fibrillation of human "-synuclein. (a) Factors accelerating protein fibrillation: control (circles), 1 :M TMAO (inverse triangles), 10% methanol (diamonds), 500 M
paraquat (squares) and 500 :M HgCl2 (triangles). (b) Factors inhibiting fibrillation of "-synuclein: control (circles), nitrated "-synuclein (squares) and 1:1 mixture of "-/$-synucleins (triangles).
suited for amyloidogenesis, as they lack significant secondary and tertiary structure. In the absence of such conformational constraints they would be expected to be substantially more conformationally mobile, and thus more able to polymerize in the form of β-sheets than tightly packed globular proteins. However, this is not always the case and many natively unfolded proteins form fibrils in vitro with similar rates to those of native globular proteins (i.e. within a time frame of a few
3 Fibrillogenesis of Natively Unfolded Proteins Requires Partial Folding 13
Fig. 5 Schematic overview of the aggregation of "-synuclein, showing the central position of the partially folded intermediate and the different final aggregated products.
hours to several days). The delay in the fibril formation of these proteins has been attributed to the requirement for considerable structural rearrangement within the unfolded polypeptide chain, giving rise to a partially folded conformation. The following section considers illustrative examples of fibril formation of natively unfolded proteins. 3.2 Amyloid β protein (Aβ)
Alzheimer’s disease is the most prevalent age-dependent dementia, and is characterized by the accumulation of extracellular amyloid deposits, senile plaques, in the cerebral cortex and of intracellular neurofibrillary tangles. Senile plaques contain Aβ, which is a 40- to 42-residue peptide produced by endoproteolytic cleavage of the Aβ protein precursor (AβPP), whereas neurofibrillary tangles are assembled from the protein tau (see below). Many lines of evidence support the crucial role of Aβ in Alzheimer’s disease, in which the aggregation of Aβ is associated with the development of the cascade of neuropathogenetic events, ending with the appearance of cognitive and behavioral features typical of Alzheimer’s disease. NMR studies have shown that monomers of Aβ1–40 or Aβ1–42 are unfolded under physiological conditions [69]. However, partial folding has been detected at the earliest stages of Aβ fibril formation [69] and Aβ aggregation fits the “standard” pattern of fibril formation, i.e. native(ly unfolded) to partially folded to oligomers to fibrillar species. Many recent studies have shown that Aβ forms small soluble oligomers, in which the polypeptide is partially folded and that appear to be the toxic species [70–78]. 3.3 Tau protein
Tau, a microtubule assembly protein [79–85], represents a family of isoforms with one, two, three or four repeats in the C-terminal region [86]. Post-translational phosphorylation of tau is an additional source of microheterogeneity [87]. Tau
14 Amyloid Fibril Formation of Natively Unfolded Proteins
is a significantly disordered protein. Interest in tau dramatically increased with the discovery of its aggregation in neuronal cells in the progress of Alzheimer’s disease and various other neurodegenerative disorders, especially fronto-temporal dementia [88, 89]. In these cases specific tau-containing neurofibrillary tangles paired helical filaments (PHFs)] are formed [90]. Hyperphosphorylation was shown to be a common characteristic of pathological tau [91], and hyperphosphorylated tau isolated from patients with Alzheimer’s disease is unable to bind to microtubules and promote microtubule assembly. However, both of these activities are restored after enzymatic dephosphorylation of tau [92–96]. During brain development, tau is phosphorylated at many residues and by several kinases [97, 98]. Most of the in vitro phosphorylation sites of tau are located within the microtubule interacting region (repeat domain) and sequences flanking the repeat domain. Many of these sites are also phosphorylated in PHF-tau [99, 100]. In fact, 10 major phosphorylation sites have been identified in tau isolated from PHFs from patients with Alzheimer’s disease [99]. All of these sites are located in regions flanking tau’s repeat domain and constitute recognition sites for several Alzheimer’s disease diagnostic antibodies, which may point to an important role for these phosphorylation sites for Alzheimer’s disease pathogenesis. Hyperphosphorylation is accompanied by transformation from the unfolded state of tau into a partially folded conformation [101, 102], dramatically accelerating the self-assembly into paired helical filaments in vitro [93]. To analyze the potential role of tau hyperphosphorylation in tauopathies, mutated tau proteins have been produced, in which all 10 serine/threonine residues known to be highly phosphorylated in PHF-tau were substituted for negatively charged residues, thus producing a model for a defined and permanent hyperphosphorylation-like state [103]. Analogous to hyperphosphorylation, glutamate substitutions induce compact structure elements and sodium dodecylsulfate (SDS)-resistant conformational domains in tau, as well as dramatic acceleration of its fibrillation [103].
3.4 Islet Amyloid Polypeptide (IAPP) or Amylin
In addition to insulin, pancreatic islet cells β produce a peptide called amylin or IAPP [104]. Dysfunction of amylin due to mutation and/or amyloid fibril formation is associated with the development of non-insulin-dependent diabetes mellitus (Type 2 diabetes) [105–107]. Amylin is an unstructured peptide hormone of 37 amino acid residues, which forms fibrils under physiological conditions in vitro. The process of polymerization is relatively fast (lag times of 100 and 50 min for fulllength amylin and its 8–37 fragment, respectively) and results in the appearance of typical amyloid fibrils [108]. Interestingly, both peptides showed formation of a partially folded intermediate early in the fibrillation process. It takes around 90 min for full-length amylin to form such an intermediate, whereas this period was almost half as long for the truncated peptide, showing excellent correlation with the fibrillation lag-times [108].
3 Fibrillogenesis of Natively Unfolded Proteins Requires Partial Folding 15
3.5 Prion Protein
Prion diseases, collectively referred to as the transmissible spongiform encephalopathies (TSEs), are characterized by unique infectious prion particles. TSEs include Creutzfeldt-Jakob disease, scrapie, bovine spongiform encephalopathy, and chronic wasting disease of mule deer and elk [109]. The characteristic pathological features of TSEs are spongiform degeneration of the brain and accumulation of the abnormal, protease-resistant prion protein isoform in the central nervous system, which sometimes forms amyloid-like plaques. The N-terminal region of about 100 amino acids in PrPC is unstructured in the isolated molecule in solution in the absence of copper. The C-terminal domain is folded into a largely α-helical conformation (three α-helices and a short antiparallel β-sheet) and stabilized by a single disulphide bond linking helices 2 and 3 [110]. The central event in the pathogenesis of prion diseases is a major conformational change of the C-terminal region of the prion protein (PrP) from an α-helical (PrPc ) to a β-sheet-rich isoform (PrPSc ) and PrPSc propagates itself by causing the conversion of PrPC to PrPSc . Although unstructured in the isolated molecule, the N-terminal region contains tight binding sites for Cu2+ ions and acquires structure following copper binding [111–115]. Investigations of the steps required for prion propagation and neurodegeneration in transgenic mice expressing chimeric mouse-hamster-mouse or mouse-human-mouse PrP transgenics indicated that the last 50 residues in the disordered N-terminal region play a particularly important role in the interaction of PrPC with PrPSc leading to the conversion of the former to the latter [116, 117]. Those residues are largely unordered or weakly helical in the full-length PrPC [118, 119], but are predicted to be β-structure in PrPSc [120]. These observations emphasize a crucial role of the disordered N-terminal region in the modulation of prion protein aggregation. Several kinetics studies have revealed the existence of partially folded intermediates for the prion protein [121–123] and is reasonable to assume that fibrillation requires partial unfolding of the C-terminal domain prior to self-association. 3.6 Polyglutamine Repeat Diseases
Currently there are eight known hereditary diseases, including Huntington’s disease, in which the expansion of a CAG repeat in the gene leads to neurodegeneration [124, 125]. The neurotoxicity in these diseases is due to the expansion of the CAGn -encoded polyglutamine [poly(Gln)] repeat. The mechanistic hypothesis linking CAG repeat expansion to toxicity involves the tendency of longer poly(Gln) sequences, regardless of protein context, to form insoluble aggregates [126–134]. To help evaluate various possible mechanisms, the biophysical properties of a series of simple poly(Gln) peptides have been analyzed. The CD spectra of poly(Gln) peptides with repeat lengths of 5, 15, 28 and 44 residues were shown to be nearly identical, and were consistent with a high degree of random coil structure,
16 Amyloid Fibril Formation of Natively Unfolded Proteins
suggesting that the length dependence of disease is not related to a conformational change in the monomeric states of expanded poly(Gln) sequences [132]. In contrast, there was a dramatic acceleration in the spontaneous formation of ordered, amyloid-like aggregates for poly(Gln) peptides with repeat lengths of greater than 37 residues. Several studies have revealed the existence of partially folded intermediates of poly(Gln)-repeat proteins as key species in fibrillation [133, 135, 136] and these are assumed to be precursors of oligomeric aggregates [132].
4 Fibrillation of Proteins Unrelated to Conformational Disease 4.1 Yeast Prions
There are at least two natural genetic elements, the [PSI+ ] and [URE3] factors, in the budding yeast Saccharomyces cerevisiae that exhibit non-Mendelian inheritance, but can be “cured” by treatment of the cells with low concentrations of the protein denaturant guanidine hydrochloride [137]. The [PSI+ ] factor is a self-perpetuating, conformationally altered form of a cellular protein [138–140]. The inheritance of [PSI+ ] from mother to daughter cells is based on the transmission of conformational information from ordered non-functional Sup35p aggregates (amyloid-like fibrils) to the soluble, functional form of Sup35p, a subunit of the polypeptide chain release complex that is essential for translation termination [141, 142]. In vitro, the prion-determining region of Sup35p, NM, is initially disordered, but slowly converts to a partially folded structure and then fibrils [143–145]. In particular, factors such as elevated temperatures, chemical chaperones and certain mutations that increased the amount of structure increased the amount of aggregation [145]. The acceleration of Sup35p fibril formation is determined by the acceleration of slow conformational changes rather than by providing stable nuclei. Strikingly, inhibitory mutations map exclusively within a short glutamine/asparagine-rich region of Sup35p and all but one occur at polar residues. Even after replacement of this region with poly(Gln), Sup35p retains its ability to form fibrils [140]. This suggests similarities between the prion-like propagation of [PSI+ ] and poly(Gln)-mediated pathogenesis of several neurodegenerative diseases (see above). The [URE3] element is another factor of S. cerevisiae that propagates by a prionlike mechanism and corresponds to the loss of function of the cellular protein Ure2 [146, 147]. The N-terminal region of the protein is flexible and unstructured, while its C-terminal region is compact and folded [148–150]. The overexpression of full-length Ure2p in wild-type S. cerevisiae strains induced a 20- to 200-fold increase in the frequency with which [URE3] arose. On the other hand, expression of just the N-terminal 65 residues of Ure2p increased the frequency of [URE3] induction 6000-fold [148]. These observations emphasize a key role of the disordered N-terminal domain in the fibrillation of Ure2 protein. A dimeric intermediate is populated transiently during refolding and populated under conditions correlating
4 Fibrillation of Proteins Unrelated to Conformational Disease
with fibril formation. Interestingly, the native dimer and dimeric intermediate are significantly more stable than either of their monomeric counterparts. Stabilization of the native state disfavors amyloid nucleation [151]. 4.2 Prothymosin α
This is a very acidic protein, containing around 50% aspartic and glutamic acid, no aromatic or cysteine residues, and very few large hydrophobic aliphatic amino acids [152]. Because of these features, prothymosin α adopts a random coillike conformation with no regular secondary structure at neutral pH ([152, 153], and Tables 1 and 2). However, at acidic pH prothymosin α folds into a partially folded conformation [153]. Interestingly, it has been recently shown that at low pH (below pH 3, i.e. under conditions favoring the formation of the partially folded conformation [153]), prothymosin α is capable of relatively fast formation (lag time around 100 min) of regular elongated fibrils with a flat ribbonlike structure 4–5 nm in height and 12–13 nm in width [154]. 4.3 Apolipoprotein CII (ApoCII)
Human ApoCII is a plasma protein consisting of 79 amino acid residues, whose function is to activate lipoprotein lipase. The structure of ApoCII in the presence of the lipid mimetic, SDS, reveals three regions of well-defined amphi-pathic α-helix with loosely defined intervening regions that may reflect flexible hinge regions [155]. In the absence of lipid or detergent, human ApoCII lacks ordered structure and forms amyloid ribbons after incubation for several days [156–162]. Fibril formation was dramatically accelerated by the addition of phospholipids in submicellar concentrations, which were shown to induce partial folding of the protein into a pre-molten globule-like intermediate [156]. In contrast, the fibrillation of ApoCII was completely inhibited in the presence of micellar phospholipids, i.e. under conditions favoring α-helical conformation [156]. 4.4 Histones
The bovine core histones are natively unfolded proteins in solutions of low ionic strength, due to their high net positive charge at pH 7.5 [163]. Analysis of the structural properties of core histones as a function of pH, ionic strength and organic solvent concentration demonstrated a correlation between conformational change and propensity to aggregate. Overall, the histones are able to adopt at least five different relatively stable conformations: the intrinsically unstructured form (pH 2.0, low protein concentration, low ionic strength), a fully-unfolded conformation (6 M urea), partially folded α-helical dimers (low ionic strength, pH 2.0, high protein concentrations), helical monomers (low ionic strength, pH 2.0, 10% trifluoroethanol)
17
18 Amyloid Fibril Formation of Natively Unfolded Proteins
and molten globule-like α-helical oligomers (pH 2.0, high salt) [163]. Under most of the conditions studied the histones formed amyloid fibrils with typical morphology as seen by electron microscopy. In contrast to most aggregation/amyloidogenic systems, the kinetics of fibrillation showed an inverse dependence on histone concentration, which has been attributed to partitioning to a faster pathway leading to non-fibrillar self-associated aggregates at higher protein concentrations. In keeping with the hypothesis that partial folding is required of intrinsically disordered proteins for aggregation, the rate of fibril formation correlated with conditions favoring the stabilization of the partially folded conformation [163].
5 Conclusions
Intrinsically disordered or natively unfolded proteins represent a high fraction of naturally occurring amyloidogenic proteins. This is most likely due to the ease with which they can form the β-sheet topology required in amyloid fibrils, in contrast to folded globular proteins, for which the native conformation places major constraints for the required topological rearrangement. In contrast to tightly folded globular proteins, the first, and critical, step in the aggregation of intrinsically disordered proteins or polypeptides is partial folding. This results in a partially folded conformation/intermediate with hydrophobic surface patches that favors self-association. As with such intermediates formed from the partial folding of globular proteins, such aggregate-prone intermediates can polymerize to form fibrillar or amorphous aggregates, or soluble oligomers.
Acknowledgments
This work was supported by grants INTAS 2001-2347 (V. N.U.), and NIH NS39985, DK55675 and NS43778 (A. L. F.).
References 1 PANDEY, N., M. GANAPATHI, K. KUMAR, D. DASGUPTA, S. D. SUTAR and D. DASH. Comparative analysis of protein unfoldedness in human housekeeping and non-housekeeping proteins. Bioinformatics 2004, 20, 2904–2910. 2 DUNKER, A. K., Z. OBRADOVIC, P. ROMERO, E. C. GARNER and C. J. BROWN. Intrinsic protein disorder in complete genomes. Genome Inform Ser Workshop Genome Inform 2000, 11, 161–171.
3 WRIGHT, P. E. and H. J. DYSON. Intrinsically unstructured proteins: re-assessing the protein structurefunction paradigm. J Mol Biol 1999, 293, 321–331. 4 RADIVOJAC, P., Z. OBRADOVIC, C. J. BROWN and A. K. DUNKER. Improving sequence alignments for intrinsically disordered proteins. Pacific Symp Biocomput 2002, 589–600. 5 UVERSKY, V. N., J. R. GILLESPIE and A. L. FINK. Why are “natively unfolded”
References
6
7
8
9
10
11
12
13
14
15
16
proteins unstructured under physiologic conditions? Proteins 2000, 41, 415–427. KELLY, J. W. The alternative conformations of amyloidogenic proteins and their multi-step assembly pathways. Curr Opin Struct Biol 1998, 8, 101–106. DOBSON, C. M. Protein misfolding, evolution and disease. Trends Biochem Sci 1999, 24, 329–332. BELLOTTI, V., P. MANGIONE and M. STOPPINI. Biological activity and pathological implications of misfolded proteins. Cell Mol Life Sci 1999, 55, 977–991. UVERSKY, V. N., A. TALAPATRA, J. R. GILLESPIE and A. L. FINK, Protein deposits as the molecular basis of amyloidosis. I. Systemic amyloidoses. Med Sci Monitor 1999, 5, 1001–1012. UVERSKY, V. N., A. TALAPATRA, J. R. GILLESPIE and A. L. FINK. Protein deposits as the molecular basis of amyloidosis. II. Localized amyloidosis and neurodegenerative disorders. Med Sci Monitor 1999, 5, 1238–1254. ROCHET, J. C. and P. T. LANSBURY, Jr. Amyloid fibrillogenesis: themes and variations. Curr Opin Struct Biol 2000, 10, 60–68. CHITI, F., P. WEBSTER, N. TADDEI, A. CLARK, M. STEFANI, G. RAMPONI and C. M. DOBSON. Designing conditions for in vitro formation of amyloid protofilaments and fibrils. Proc Natl Acad Sci USA 1999, 96, 3590–3594. UVERSKY, V. N. Protein folding revisited. A polypeptide chain at the foldingmisfolding-nonfolding cross-roads: which way to go? Cell Mol Life Sci 2003, 60, 1852–1871. UVERSKY, V. N. and A. L. FINK. Conformational constraints for amyloid fibrillation: the importance of being unfolded. Biochim Biophys Acta 2004, 1698, 131–153. DOBSON, C. M. Protein folding and its links with human disease. Biochem Soc Symp 2001, 1–26. FANDRICH, M., M. A. FLETCHER and C. M. DOBSON. Amyloid fibrils from muscle myoglobin – even an ordinary globular protein can assume a rogue guise if
17
18
19
20
21
22
23
24
25
26
27
28
conditions are right. Nature 2001, 410, 165–166. PERTINHEZ, T. A., M. BOUCHARD, E. J. TOMLINSON, R. WAIN, S. J. FERGUSON, C. M. DOBSON and L. J. SMITH. Amyloid fibril formation by a helical cytochrome. FEBS Lett 2001, 495, 184–186. DOBSON, C. M. Principles of protein folding, misfolding and aggregation. Semin Cell Dev Biol 2004, 15, 3–16. STEFANI, M. and C. M. DOBSON. Protein aggregation and aggregate toxicity: new insights into protein folding, misfolding diseases and biological evolution. J Mol Med 2003, 81, 678–699. DYSON, H. J. and P. E. WRIGHT. Elucidation of the protein folding landscape by NMR. Methods Enzymol 2005, 394, 299–321. DUNKER, A. K., J. D. LAWSON, C. J. BROWN, R. M. WILLIAMS, P. ROMERO, J. S. OH, C. J. OLDFIELD, A. M. CAMPEN, C. M. RATLIFF, K. W. HIPPS, J. AUSIO, M. S. NISSEN, R. REEVES, C. KANG, C. R. KISSINGER, R. W. BAILEY, M. D. GRISWOLD, W. CHIU, E. C. GARNER and Z. OBRADOVIC. Intrinsically disordered protein. J Mol Graph Model 2001, 19, 26–59. UVERSKY, V. N. What does it mean to be natively unfolded? Eur J Biochem 2002, 269, 2–12. UVERSKY, V. N. Natively unfolded proteins: a point where biology waits for physics. Protein Sci 2002, 11, 739–756. DUNKER, A. K., C. J. BROWN, J. D. LAWSON, L. M. IAKOUCHEVA and Z. OBRADOVIC. Intrinsic disorder and protein function. Biochemistry 2002, 41, 6573–6582. DUNKER, A. K., C. J. BROWN and Z. OBRADOVIC. Identification and functions of usefully disordered proteins. Adv Protein Chem 2002, 62, 25–49. TOMPA, P. Intrinsically unstructured proteins. Trends Biochem Sci 2002, 27, 527–533. DYSON, H. J. and P. E. WRIGHT. Coupling of folding and binding for unstructured proteins. Curr Opin Struct Biol 2002, 12, 54–60. SUNDE, M., L. C. SERPELL, M. BARTLAM, P. E. FRASER, M. B. PEPYS and C. C. BLAKE.
19
20 Amyloid Fibril Formation of Natively Unfolded Proteins
29
30
31
32
33
34
35
36
37
38
Common core structure of amyloid fibrils by synchrotron X-ray diffraction. J Mol Biol 1997, 273, 729–739. SHIRAHAMA, T., M. D. BENSON, A. S. COHEN and A. TANAKA. Fibrillar assemblage of variable segments of immunoglobulin light chains: an electron microscopic study. J Immunol 1973, 110, 21–30. SHIRAHAMA, T. and A. S. COHEN High-resolution electron microscopic analysis of the amyloid fibril. J Cell Biol 1967, 33, 679–708. JIMENEZ, J. L., J. I. GUIJARRO, E. ORLOVA, J. ZURDO, C. M. DOBSON, M. SUNDE and H. R. SAIBIL. Cryoelectron microscopy structure of an SH3 amyloid fibril and model of the molecular packing. EMBO J 1999, 18, 815–821. LANSBURY, P. T., Jr. Evolution of amyloid: what normal protein folding may tell us about fibrillogenesis and disease. Proc Natl Acad Sci USA 1999, 96, 3342–3344. FINK, A. L. Protein aggregation: folding aggregates, inclusion bodies and amyloid. Fold Des 1998, 3, 9–15. DOBSON, C. M. The structural basis of protein folding and its links with human disease. Philos Trans R Soc Lond B Biol Sci 2001, 356, 133–145. ZEROVNIK, E. Amyloidfibril formation. Proposed mechanisms and relevance to conformational disease. Eur J Biochem 2002., 269, 3362–3371. LEWY, F. H. Paralysis Agitans. Pathologische Anatomie. In Handbuch der Neurologie, LEWANDOWSKI, M. (ed.). Springer, Berlin, 1912. FORNO, L. S.. Neuropathology of Parkinson’s disease. J Neuropathol Exp Neurol 1996, 55, 259–272. POLYMEROPOULOS, M. H., C. LAVEDAN, E. LEROY, S. E. IDE, A. DEHEJIA, A. DUTRA, B. PIKE, H. ROOT, J. RUBENSTEIN, R. BOYER, E. S. STENROOS, S. CHANDRASEKHARAPPA, A. ATHANASSIADOU, T. PAPAPETROPOULOS, W. G. JOHNSON, A. M. LAZZARINI, R. C. DUVOISIN, G. DI IORIO, L. I. GOLBE and R. L. NUSSBAUM. Mutation in the alphasynuclein gene identified in families with Parkinson’s disease. Science 1997, 276, 2045–2047.
39 KRUGER, R., W. KUHN, T. MULLER, D. WOITALLA, M. GRAEBER, S. KOSEL, H. PRZUNTEK, J. T. EPPLEN, L. SCHOLS and O. RIESS. Ala30 Pro mutation in the gene encoding alpha-synuclein in Parkinson’s disease. Nat Genet 1998, 18, 106–108. 40 ZARRANZ, J. J., J. ALEGRE, J. C. GOMEZ-ESTEBAN, E. LEZCANO, R. ROS, I. AMPUERO, L. VIDAL, J. HOENICKA, O. RODRIGUEZ, B. ATARES, V. LLORENS, T. E. GOMEZ, T. DEL SER, D. G. MUNOZ and J. G. de YEBENES. The new mutation, E46K, of alpha-synuclein causes Parkinson and Lewy body dementia. Ann Neurol 2004, 55, 164–173. 41 SINGLETON, A. B., M. FARRER, J. JOHNSON, A. SINGLETON, S. HAGUE, J. KACHERGUS, M. HULIHAN, T. PEURALINNA, A. DUTRA, R. NUSSBAUM, S. LINCOLN, A. CRAWLEY, M. HANSON, D. MARAGANORE, C. ADLER, M. R. COOKSON, M. MUENTER, M. BAPTISTA, D. MILLER, J. BLANCATO, J. HARDY and K. GWINN-HARDY. α-Synuclein locus triplication causes Parkinson’s disease. Science 2003., 302, 841. 42 MASLIAH, E., E. ROCKENSTEIN, I. VEINBERGS, M. MALLORY, M. HASHIMOTO, A. TAKEDA, Y. SAGARA, A. SISK and L. MUCKE. Dopaminergic loss and inclusion body formation in alpha-synuclein mice: implications for neurodegenerative disorders. Science 2000, 287, 1265–1269. 43 FEANY, M. B. and W. W. BENDER. A Drosophila model of Parkinson’s disease [see Comments]. Nature 2000, 404, 394–398. 44 UVERSKY, V. N. What does it mean to be natively unfolded? Eur J Biochem 2002, 269, 2–12. 45 UVERSKY, V. N. A protein-chameleon: conformational plasticity of α-synuclein, a disordered protein involved in neurodegenerative disorders. J Biomol Struct Dyn 2003, 21, 211–234. 46 DUNKER, A. K., E. GARNER, S. GUILLIOT, P. ROMERO, K. ALBERCHT, J. HART, Z. OBRADOVIC, C. KISSINGER and J. E. VILLAFRANCA. Protein disorder and the evolution of molecular recognition: theory, predictions and observations. Pacific Symp Biocomput 1998, 3, 473–484.
References 47 WEINREB, P. H., W. ZHEN, A. W. POON, K. A. CONWAY and P. T. LANSBURY, Jr. NACP, a protein implicated in Alzheimer’s disease and learning, is natively unfolded. Biochemistry 1996, 35, 13709–13715. 48 ELIEZER, D., E. KUTLUAY, R. BUSSELL, Jr and G. BROWNE. Conformational properties of alpha-synuclein in its free and lipid-associated states. J Mol Biol 2001, 307, 1061–1073. 49 UVERSKY, V. N., J. LI and A. L. FINK. Evidence for a partially folded intermediate in alpha-synuclein fibril formation. J Biol Chem 2001, 276, 10737–10744. 50 UVERSKY, V. N., J. LI, P. SOUILLAC, I. S. MILLETT, S. DONIACH, R. JAKES, M. GOEDERT and A. L. FINK. Biophysical properties of the synucleins and their propensities to fibrillate – inhibition of alpha-synuclein assembly by beta- and gamma-synucleins. J Biol Chem 2002, 277, 11970–11978. 51 MORAR, A. S., A. OLTEANU, G. B. YOUNG and G. J. PIELAK. Solvent-induced collapse of alpha-synuclein and acid-denatured cytochromeitac. Protein Sci 2001, 10, 2195–2199. 52 SYME, CD., E. W. BLANCH, C. HOLT, R. JAKES, M. GOEDERT, L. HECHT and L. D. BARRON. A Raman optical activity study of rheomorphism in caseins, synucleins and tau. New insight into the structure and behaviour of natively unfolded proteins. Eur J Biochem 2002, 269, 148–156. 53 UVERSKY, V. N. and O. B. PTITSYN. Further evidence on the equilibrium “pre-molten globule state”: Four-state guanidinium chloride-induced unfolding of carbonic anhydrase B at low temperature. J Mol Biol 1996, 255, 215–228. 54 UVERSKY, V. N. Diversity of equilibrium compact forms of denatured globular proteins. Protein Peptide Lett 1997, 4, 355–367. 55 KARNOUP, A. S. and V. N. UVERSKY. Sequential compaction of a random copolymer of hydrophilic and hydrophobic amino acid residues. Macromolecules 1997, 30, 7427–7434.
56 UVERSKY, V. N. and A. L. FINK. Biophysical properties of human α-synuclein and its role in Parkinson’s disease. In Recent Research Developments in Proteins, PANDALAI, S. G. (ed.). Transworld Research Network, Kerala, 2002. 57 LOMAKIN, A., D. S. CHUNG, G. B. BENEDEK, D. A. KIRSCHNER and D. B. TEPLOW. On the nucleation and growth of amyloid beta-protein fibrils: detection of nuclei and quantitation of rate constants. Proc Natl Acad Sci USA 1996, 93, 1125–1129. 58 NAIKI, H. and K. NAKAKUKI. First-order kinetic model of Alzheimer’s betaamyloid fibril extension in vitro. Lab Invest 1996, 74, 374–383. 59 ESLER, W. P., E. R. STIMSON, J. R. GHILARDI, Y. A. LU, A. M. FELIX, H. V. VINTERS, P. W. MANTYH, J. P. LEE and J. E. MAGGIO. Point substitution in the central hydrophobic cluster of a human beta-amyloid congener disrupts peptide folding and abolishes plaque competence. Biochemistry 1996, 35, 13914–13921. 60 NIELSEN, L., R. KHURANA, A. COATS, S. FROKJAER, J. BRANGE, S. VYAS, V. N. UVERSKY and A. L. FINK. Effect of environmental factors on the kinetics of insulin fibril formation: elucidation of the molecular mechanism. Biochemistry 2001, 40, 6036–6046. 61 UVERSKY, V. N., J. LI and A. L. FINK. Pesticides directly accelerate the rate of alpha-synuclein fibril formation: a possible factor in Parkinson’s disease. FEBS Lett 2001, 500, 105–108. 62 MANNING-BOG, A. B., A. L. MCCORMACK, J. LI, V. N. UVERSKY, A. L. FINK and D. A. DI Monte. The herbicide paraquat causes up-regulation and aggregation of alpha-synuclein in mice. J Biol Chem 2002., 277, 1641–1644. 63 UVERSKY, V. N., J. LI, K. BOWER and A. L. FINK. Synergistic effects of pesticides and metals on the fibrillation of α-synuclein: implications for Parkinson’s disease. Neurotoxicology 2002, 23, 527–536. 64 UVERSKY, V. N., J. LI and A. L. FINK. Metal-triggered structural transformations, aggregation and
21
22 Amyloid Fibril Formation of Natively Unfolded Proteins
65
66
67
68
69
70
71
72
fibrillation of human alpha-synuclein. A possible molecular link between Parkinson’s disease and heavy metal exposure. J Biol Chem 2001, 276, 44284–44296. UVERSKY, V. N., J. LI and A. L. FINK. Tri-methylamine-N-oxide-induced folding of alpha-synuclein. FEBS Lett 2001, 509, 31–35. MUNISHKINA, L. A., C. PHELAN, V. N. UVERSKY and A. L. FINK. Conformational behavior and aggregation of alpha-synuclein in organic solvents: modeling the effects of membranes. Biochemistry 2003., 42, 2720–2730. YAMIN, G., V. N. UVERSKY and A. L. FINK. Nitration inhibits fibrillation of human alpha-synuclein in vitro by formation of soluble oligomers. FEBS Lett 2003, 542, 147–152. UVERSKY, V. N., G. YAMIN, P. O. SOUILLAC, J. GOERS, C. B. GLASER and A. L. FINK. Methionine oxidation inhibits fibrillation of human alpha-synuclein in vitro. FEBS Lett 2002, 517, 239–244. KIRKITADZE, M. D., M. M. CONDRON and D. B. TEPLOW. Identification and characterization of key kinetic intermediates in amyloid beta-protein fibrillogenesis. J Mol Biol 2001, 312, 1103–1119. KHETERPAL, I., H. A. LASHUEL, D. M. HARTLEY, T. WLAZ, P. T. LANSBURY and R. WETZEL. A beta protofibrils possess a stable core structure resistant to hydrogen exchange. Biochemistry 2003, 42, 14092–14098. KAYED, R., Y. SOKOLOV, B. EDMONDS, T. M. MCINTIRE, S. C. MILTON, J. E. HALL, and C. G. GLABE. Permeabilization of lipid bilayers is a common conformation- dependent activity of soluble amyloid oligomers in protein misfolding diseases. J Biol Chem 2004, 279, 46363–46366. YONG, W., A. LOMAKIN, M. D. KIRKITADZE, D. B. TEPLOW, S. H. CHEN and G. B. BENEDEK. Structure determination of micelle-like intermediates in amyloid beta-protein fibril assembly by using small angle neutron scattering. Proc Natl Acad Sci USA 2002, 99, 150–154.
73 ZEROVNIK, E. Amyloid-fibril formation. Proposed mechanisms and relevance to conformational disease. Eur J Biochem 2002, 269, 3362–3371. 74 BITAN, G., A. LOMAKIN and D. B. TEPLOW. Amyloid yβ-protein oligomerization. I. Prenucleation interactions revealed by photo-induced cross-linking of unmodified proteins. J Biol Chem 2001, 276, 35176–35184. 75 CHROMY, B. A., R. J. NOWAK, M. P. LAMBERT, K. L. VIOLA, L. CHANG, P. T. VELASCO, B. W. JONES, S. J. FERNANDEZ, P. N. LACOR, P. HOROWITZ, C. E. FINCH, G. A. KRAFFT and W. L. KLEIN. Self-assembly of Abeta(1-42) into globular neurotoxins. Biochemistry 2003, 42, 12749–12760. 76 ELAGNAF, O. M. and G. B. IRVINE. Aggregation and neurotoxicity of alpha-synuclein and related peptides. Biochem Soc Trans 2002, 30, 559–565. 77 HOSHI, M., M. SATO, S. MATSUMOTO, A. NOGUCHI, K. YASUTAKE, N. YOSHIDA and K. SATO. Spherical aggregates of beta-amyloid (amylospheroid) show high neurotoxicity and activate tau protein kinase I/glycogen synthase kinase-3beta. Proc Natl Acad Sci USA 2003, 100, 6370–6375. 78 STINE, W. B., Jr, K. N. DAHLGREN, G. A. KRAFFT and M. J. LADU. In vitro characterization of conditions for amyloid-beta peptide oligomerization and fibrillogenesis. J Biol Chem 2003, 278, 11612–11622. 79 CLEVELAND, D. W., S. Y. HWO and M. W. KIRSCHNER. Purification of tau, a micro-tubule-associated protein that induces assembly of microtubules from purified tubulin. J Mol Biol 1977, 116, 207–225. 80 CLEVELAND, D. W., S. Y. HWO and M. W. KIRSCHNER. Physical and chemical properties of purified tau factor and role of tau in microtubule assembly. J Mol Biol 1977, 116, 227–247. 81 CLEVELAND, D. W., J. A. CONNOLLY, V. I. KALNINS, B. M. SPIEGELMAN and M. W. KIRSCHNER. Physical-properties and cellular localization of tau, a microtubule associated protein which induces
References
82
83
84
85
86
87
88
89
90
91
assembly of purified tubulin. J Cell Biol 1977, 75, A283–R283. DRECHSEL, D. N., A. A. HYMAN, M. H. COBB and M. W. KIRSCHNER. Modulation of the dynamic instability of tubulin assembly by the microtubule-associated protein tau. Mol Biol Cell 1992, 3, 1141–1154. BRANDT, R. and G. LEE. The balance between tau proteins microtubule growth and nucleation activities – implications for the formation of axonal microtubules. J Neurochem 1993, 61, 997–1005. BRANDT, R. and G. LEE. Functional-organization of microtubule-associated protein-tau – identification of regions which affect microtubule growth, nucleation and bundle formation in vitro. J Biol Chem 1993, 268, 3414–3419. BINDER, L. I., A. FRANKFURTER and L. I. REBHUN. The distribution of tau in the mammalian central nervous system. J Cell Biology 1985, 101, 1371– 1378. HIMMLER, A. Structure of the bovine tau-gene – alternatively spliced transcripts generate a protein family. Mol Cell Biol 1989, 9, 1389–1396. KENESSEY, A. and S. H. C. YEN. The extent of phosphorylation of fetal-tau is comparable to that of PHF-tau from Alzheimer paired helical filaments. Brain Res 1993, 629, 40–46. CROWTHER, R. A. and M. GOEDERT. Abnormal tau-containing filaments in neuro-degenerative diseases. J Struct Biol 2000, 130, 271–279. GOEDERT, M. The significance of tau and alpha-synuclein inclusions in neurode-generative diseases. Curr Opin Genet Dev 2001, 11, 343–351. DELACOURTE, A. and L. BUEE. Normal and pathological tau proteins as factors for microtubule assembly. Int Rev Cytol 1997, 171, 167–224. VULLIET, R., S. M. HALLORAN, R. K. BRAUN, A. J. SMITH and G. LEE. Proline-directed phosphorylation of human tau protein. J Biol Chem 1992, 267, 22570–22574.
92 LU, Q. and J. G. WOOD. Functional studies of Alzheimers disease tau protein. J Neurosci 1993, 13, 508–515. 93 ALONSO, A. C., I. GRUNDKEIQBAL and K. IQBAL. Alzheimer’s disease hyperpho-sphorylated tau sequesters normal tau into tangles of filaments and disassembles microtubules. Nat Med 1996, 2, 783–787. 94 ALONSO, A. D., I. GRUNDKEIQBAL and K. IQBAL. Abnormally phosphorylated tau from Alzheimer disease brain depoly-merizes microtubules. Neurobiol Aging 1994, 15, S37–S37. 95 ALONSO, A. D., T. ZAIDI, I. GRUNDKEIQBAL and K. IQBAL. Role of abnormally phosphorylated tan in the breakdown of microtubules in Alzheimer disease. Proc Natl Acad Sci USA 1994, 91, 5562–5566. 96 IQBAL, K., T. ZAIDI, C. BANCHER and I. GRUNDKEIQBAL. Alzheimer paired helical filaments – restoration of the biological-activity by dephosphorylation. FEBS Lett 1994, 349, 104–108. 97 WATANABE, A., M. HASEGAWA, M. SUZUKI, K. TAKIO, M. MORISHIMAKAWASHIMA, K. TITANI, T. ARAI, K. S. KOSIK and Y. IHARA. In vivo phosphorylation sites in fetal and adult rat tau. J Biol Chem 1993, 268, 25712–25717. 98 BILLINGSLEY, M. L. and R. L. KINCAID. Regulated phosphorylation and dephosphorylation of tau protein: effects on microtubule interaction, intracellular trafficking and neurodegeneration. Biochem J 1997, 323, 577–591. 99 MORISHIMAKAWASHIMA, M., M. HASEGAWA, K. TAKIO, M. SUZUKI, H. YOSHIDA, A. WATANABE, K. TITANI and Y. IHARA. Hyperphosphorylation of Tau in PHF. Neurobiol Aging 1995, 16, 365–371. 100 MORISHIMAKAWASHIMA, M., M. HASEGAWA, K. TAKIO, M. SUZUKI, H. YOSHIDA, K. TITANI and Y. IHARA. Proline-directed and non-proline-directed phosphorylation of PHF-tau. J Biol Chem 1995, 270, 823–829. 101 UVERSKY, V. N., S. WINTER, O. V. GALZITSKAYA, L. KITTLER and G. LOBER. Hyper-phosphorylation induces
23
24 Amyloid Fibril Formation of Natively Unfolded Proteins
102
103
104
105
106
107
108
109
110
structural modification of tau protein. FEBS Lett 1998, 439, 21–25. HAGESTEDT, T., B. LICHTENBERG, H. WILLE, E. M. MANDELKOW and E. MANDELKOW. Tau-protein becomes long and stiff upon phosphorylation – correlation between paracrystalline structure and degree of phosphorylation. J Cell Biol 1989, 109, 1643–1651. EIDENMULLER, J., T. FATH, A. HELLWIG, J. REED, E. SONTAG and R. BRANDT. Structural and functional implications of tau hyperphosphorylation: information from phosphorylation-mimicking mutated tau proteins. Biochemistry 2000, 39, 13166–13175. COOPER, G. J., A. C. WILLIS, A. CLARK, R. C. TURNER, R. B. SIM and K. B. REID. Purification and characterization of a peptide from amyloid-rich pancreases of type 2 diabetic patients. Proc Natl Acad Sci USA 1987, 84, 8628–8632. HIGHAM, C. E., E. T. JAIKARAN, P. E. FRASER, M. GROSS and A. CLARK. Preparation of synthetic human islet amyloid polypeptide (IAPP) in a stable conformation to enable study of conversion to amyloid-like fibrils. FEBS Lett 2000, 470, 55–60. JAIKARAN, E. T., C. E. HIGHAM, L. C. SERPELL, J. ZURDO, M. GROSS, A. CLARK and P. E. FRASER. Identification of a novel human islet amyloid polypeptide beta-sheet domain and factors influencing fibrillogenesis. J Mol Biol 2001, 308, 515–525. JAIKARAN, E. T. and A. CLARK. Islet amyloid and type 2 diabetes: from molecular misfolding to islet pathophysiology. Biochim Biophys Acta 2001, 1537, 179–203. GOLDSBURY, C., K. GOLDIE, J. PELLAUD, J. SEELIG, P. FREY, S. A. MULLER, J. KISTLER, G. J. S. COOPER and U. AEBI. Amyloid fibril formation from full-length and fragments of amylin. J Struct Biol 2000, 130, 352–362. PRUSINER, S. B. Shattuck lecture – neurodegenerative diseases and prions. N Engl J Med 2001, 344, 1516–1526. RIEK, R., S. HORNEMANN, G. WIDER, M. BILLETER, R. GLOCKSHUBER and K. WUTHRICH. NMR structure of the mouse
111
112
113
114
115
116
117
118
prion protein domain PrP(121–231). Nature 1996, 382, 180–182. COLLINGE, J. Prion diseases of humans and animals: their causes and molecular basis. Annu Rev Neurosci 2001, 24, 519–550. ARONOFF-SPENCER, E., C. S. BURNS, N. I. AVDIEVICH, G. J. GERFEN, J. PEISACH, W. E. ANTHOLINE, H. L. BALL, F. E. COHEN, S. B. PRUSINER and G. L. MILLHAUSER. Identification of the Cu2+ binding sites in the N-terminal domain of the prion protein by EPR and CD spectroscopy. Biochemistry 2000, 39, 13760–13771. BURNS, C. S., E. ARONOFF-SPENCER, C. M. DUNHAM, P. LARIO, N. I. AVDIEVICH, W. E. ANTHOLINE, M. M. OLMSTEAD, A. VRIELINK, G. J. GERFEN, J. PEISACH, W. G. SCOTT and G. L. MILLHAUSER. Molecular features of the copper binding sites in the octarepeat domain of the prion protein. Biochemistry 2002, 41, 3991–4001. BURNS, C. S., E. ARONOFF-SPENCER, G. LEGNAME, S. B. PRUSINER, W. E. ANTHOLINE, G. J. GERFEN, J. PEISACH and G. L. MILLHAUSER. Copper coordination in the full-length, recombinant prion protein. Biochemistry 2003, 42, 6794–6803. MILLHAUSER, G. L. Copper binding in the prion protein. Acc Chem Res 2004, 37, 79–85. SCOTT, M., D. GROTH, D. FOSTER, M. TORCHIA, S. L. YANG, S. J. DEARMOND and S. B. PRUSINER. Propagation of prions with artificial properties in transgenic mice expressing chimeric PrP genes. Cell 1993, 73, 979–988. TELLING, G. C., M. SCOTT, J. MASTRIANNI, R. GABIZON, M. TORCHIA, F. E. COHEN, S. J. DEARMOND and S. B. PRUSINER. Prion propagation in mice expressing human and chimeric PrP transgenes implicates the interaction of cellular PrP with another protein. Cell 1995, 83, 79–90. DONNE, D. G., J. H. VILES, D. GROTH, I. MEHLHORN, T. L. JAMES, F. E. COHEN, S. B. PRUSINER, P. E. WRIGHT and H. J. DYSON. Structure of the recombinant full-length hamster prion protein PrP(29–231): the N terminus is highly
References
119
120
121
122
123
124
125
126
127
128
flexible. Proc Natl Acad Sci USA 1997, 94, 13452–13457. JAMES, T. L., H. LIU, N. B. ULYANOV, S. FARR-JONES, H. ZHANG, D. G. DONNE, K. KANEKO, D. GROTH, I. MEHLHORN, S. B. PRUSINER and F. E. COHEN. Solution structure of a 142-residue recombinant prion protein corresponding to the infectious fragment of the scrapie iso-form. Proc Natl Acad Sci USA 1997, 94, 10086–100891. PRUSINER, S. B., M. R. SCOTT, S. J. DEARMOND and F. E. COHEN. Prion protein biology. Cell 1998, 93, 337–348. APETRI, A. C., K. SUREWICZ and W. K. SUREWICZ. The effect of disease-associated mutations on the folding pathway of human prion protein. J Biol Chem 2004, 279, 18008–18014. APETRI, A. C. and W. K. SUREWICZ. Kinetic intermediate in the folding of human prion protein. J Biol Chem 2002, 277, 44589–44592. MARTINS, S. M., A. CHAPEAUROUGE and S. T. FERREIRA. Folding intermediates of the prion protein stabilized by hydrostatic pressure and low temperature. J Biol Chem 2003, 278, 50449–50455. CUMMINGS, C. J. and H. Y. ZOGHBI. Trinucleotide repeats: mechanisms and pathophysiology. Annu Rev Genomics Hum Genet 2000, 1, 281–328. CUMMINGS, C. J. and H. Y. ZOGHBI. Fourteen and counting: unraveling trinucleotide repeat diseases. Hum Mol Genet 2000, 9, 909–916. ZOGHBI, H. Y. and H. T. ORR. Polyglutamine diseases: protein cleavage and aggregation. Curr Opin Neurobiol 1999, 9, 566–570. ROSS, C. A., J. D. WOOD, G. SCHILLING, M. F. PETERS, F. C. NUCIFORA, Jr, J. K. COOPER, A. H. SHARP, R. L. MARGOLIS and D. R. BORCHELT. Polyglutamine pathogenesis. Phil Trans R Soc Lond B Biol Sci 1999, 354, 1005–1011. PREISINGER, E., B. M. JORDAN, A. KAZANTSEV and D. HOUSMAN. Evidence for a recruitment and sequestration mechanism in Huntington’s disease. Phil Trans R Soc Lond B Biol Sci 1999, 354, 1029–1034.
129 WANKER, E. E. Protein aggregation and pathogenesis of Huntington’s disease: mechanisms and correlations. Biol Chem 2000, 381, 937–942. 130 MCCAMPBELL, A., J. P. TAYLOR, A. A. TAYE, J. ROBITSCHEK, M. LI, J. WALCOTT, D. MERRY, Y. CHAI, H. PAULSON, G. SOBUE and K. H. FISCHBECK. CREB-binding protein sequestration by expanded polyglutamine. Hum Mol Genet 2000, 9, 2197–2202. 131 MCCAMPBELL, A. and K. H. FISCHBECK. Polyglutamine and CBP: fatal attraction? Nat Med 2001, 7, 528–530. 132 CHEN, S., V. BERTHELIER, W. YANG and R. WETZEL. Polyglutamine aggregation behavior in vitro supports a recruitment mechanism of cytotoxicity. J Mol Biol 2001, 311, 173–182. 133 CHEN, S., V. BERTHELIER, J. B. HAMILTON, B. O’NUALLAIN and R. WETZEL. Amyloidlike features of polyglutamine aggregates and their assembly kinetics. Biochemistry 2002, 41, 7391–7399. 134 PERUTZ, M. F., B. J. POPE, D. OWEN, E. E. WANKER and E. SCHERZINGER. Aggregation of proteins with expanded glutamine and alanine repeats of the glutamine-rich and asparagine-rich domains of Sup35 and of the amyloid beta-peptide of amyloid plaques. Proc Natl Acad Sci USA 2002, 99, 5596–5600. 135 CHOW, M. K., H. L. PAULSON and S. P. BOTTOMLEY. Destabilization of a non-pathological variant of ataxin-3 results in fibrillogenesis via a partially folded intermediate: a model for misfolding in polyglutamine disease. J Mol Biol 2004, 335, 333–341. 136 POIRIER, M. A., H. LI, J. MACOSKO, S. CAI, M. AMZEL and C. A. ROSS. Huntingtin spheroids and protofibrils as precursors in polyglutamine fibrilization. J Biol Chem 2002, 277, 41032–41037. 137 LINDQUIST, S. Mad cows meet psi chotic yeast: the expansion of the prion hypothesis. Cell 1997, 89, 495–498. 138 PATINO, M. M., J. J. LIU, J. R. GLOVER and S. LINDQUIST. Support for the prion hypothesis for inheritance of a phenotypic trait in yeast. Science 1996, 273, 622–626.
25
26 Amyloid Fibril Formation of Natively Unfolded Proteins
139 PAUSHKIN, S. V., V. V. KUSHNIROV, V. N. SMIRNOV and M. D. TER Avanesyan. Propagation of the yeast prion-like [psi+ ] determinant is mediated by oligomerization of the SUP35-encoded polypeptide chain release factor. EMBO J 1996, 15, 3127–3134. 140 DEPACE, A. H., A. SANTOSO, P. HILLNER and J. S. WEISSMAN. A critical role for aminoterminal glutamine/asparagine repeats in the formation and propagation of a yeast prion. Cell 1998, 93, 1241–1252. 141 SERIO, T. R. and S. L. LINDQUIST. [PSI+ ]: an epigenetic modulator of translation termination efficiency. Annu Rev Cell Dev Biol 1999, 15, 661–703. 142 SERIO, T. R., A. G. CASHIKAR, J. J. MOSLEHI, A. S. KOWAL and S. L. LINDQUIST. Yeast prion [psi+ ] and its determinant, Sup35p. Methods Enzymol 1999, 309, 649–673. 143 GLOVER, J. R., A. S. KOWAL, E. C. SCHIRMER, M. M. PATINO, J. J. LIU and S. LINDQUIST. Self-seeded fibers formed by Sup35, the protein determinant of [PSI+], a heritable prion-like factor of S. cerevisiae. Cell 1997, 89, 811–819. 144 SERIO, T. R., A. G. CASHIKAR, A. S. KOWAL, G. J. SAWICKI, J. J. MOSLEHI, L. SERPELL, M. F. ARNSDORF and S. L. LINDQUIST. Nucleated conformational conversion and the replication of conformational information by a prion determinant. Science 2000, 289, 1317–1321. 145 SCHEIBEL, T. and S. L. LINDQUIST. The role of conformational flexibility in prion propagation and maintenance for Sup35p. Nat Struct Biol 2001, 8, 958–962. 146 WICKNER, R. B. [URE3] as an altered URE2 protein: evidence for a prion analog in Saccharomyces cerevisiae. Science 1994, 264, 566–569. 147 THUAL, C., L. BOUSSET, A. A. KOMAR, S. WALTER, J. BUCHNER, C. CULLIN and R. MELKI. Stability, folding, dimerization and assembly properties of the yeast prion Ure2p. Biochemistry 2001, 40, 1764–1773. 148 MASISON, D. C. and R. B. WICKNER. Prion-inducing domain of yeast Ure2p and protease resistance of Ure2p in
149
150
151
152
153
154
155
156
prion-containing cells. Science 1995, 270, 93–95. MASISON, D. C., M. L. MADDELEIN and R. B. WICKNER. The prion model for [URE3] of yeast: spontaneous generation and requirements for propagation. Proc Natl Acad Sci USA 1997, 94, 12503–12508. THUAL, C., A. A. KOMAR, L. BOUSSET, E. FERNANDEZ-BELLOT, C. CULLIN and R. MELKI. Structural characterization of Saccharomyces cerevisiae prion-like protein Ure2. J Biol Chem 1999, 274, 13666–13674. ZHU, L., X. J. ZHANG, L. Y. WANG, J. M. ZHOU and S. PERRETT. Relationship between stability of folding intermediates and amyloid formation for the yeast prion Ure2p: a quantitative analysis of the effects of pH and buffer system. J Mol Biol 2003, 328, 235–254. GAST, K., H. DAMASCHUN, K. ECKERT, K. SCHULZE-FORSTER, H. R. MAURER, M. MULLER-FROHNE, D. ZIRWER, J. CZARNECKI and G. DAMASCHUN. Prothymosin alpha: a biologically active protein with random coil conformation. Biochemistry 1995, 34, 13211–13218. UVERSKY, V. N., J. R. GILLESPIE, I. S. MILLETT, A. V. KHODYAKOVA, A. M. VASILIEV, T. V. CHERNOVSKAYA, R. N. VASILENKO, G. D. KOZOVSKAYA, D. A. DOLGIKH, A. L. FINK, S. DONIACH and V. M. ABRAMOV. Natively unfolded human prothymosin alpha adopts partially folded collapsed conformation at acidic pH. Biochemistry 1999, 38, 15009–15016. PAVLOV, N. A., D. I. CHERNY, G. HEIM, T. M. JOVIN and V. SUBRAMANIAM. Amyloid fibrils from the mammalian protein prothymosin alpha. FEBS Lett 2002, 517, 37–40. MACRAILD, C. A., D. M. HATTERS, G. J. HOWLETT and P. R. GOOLEY. NMR structure of human apolipoprotein C-II in the presence of sodium dodecyl sulfate. Biochemistry 2001, 40, 5414–5421. HATTERS, D. M., L. J. LAWRENCE and G. J. HOWLETT. Sub-micellar phospholipid accelerates amyloid formation by apolipoprotein C-II. FEBS Lett 2001, 494, 220–224.
References 157 HATTERS, D. M., C. E. MACPHEE, L. J. LAWRENCE, W. H. SAWYER and G. J. HOWLETT. Human apolipoprotein C-II forms twisted amyloid ribbons and closed loops. Biochemistry 2000, 39, 8276–8283. 158 HATTERS, D. M., R. A. LINDNER, J. A. CARVER and G. J. HOWLETT. The molecular chaperone, alpha-crystallin, inhibits amyloid formation by apolipoprotein C-II. J Biol Chem 2001, 276, 33755–33761. 159 HATTERS, D. M., A. P. MINTON and G. J. HOWLETT. Macromolecular crowding accelerates amyloid formation by human apolipoprotein C-II. J Biol Chem 2002, 277, 7824–7830. 160 HATTERS, D. M. and G. J. HOWLETT. The structural basis for amyloid formation by plasma apolipoproteins: a review. Eur Biophys J 2002, 31, 2–8. 161 HATTERS, D. M., M. R. WILSON, S. B. EASTERBROOK-SMITH and G. J. HOWLETT. Suppression of apolipoprotein C-II amyloid formation by the extracellular chaper-one, clusterin. Eur J Biochem 2002, 269, 2789–2794. 162 PHAM, C. L., D. M. HATTERS, L. J. LAWRENCE and G. J. HOWLETT. Cross-linking and amyloid formation by N- and C-terminal cysteine derivatives of human apolipoprotein C-II. Biochemistry 2002, 41, 14313–14322. 163 MUNISHKINA, L. A., A. L. FINK and V. N. UVERSKY. Conformational prerequisites for formation of amyloid fibrils from
164
165
166
167
168
169
histones. J Mol Biol 2004, 342, 1305–1324. PAVLOV, N. A., D. I. CHERNY, G. HEIM, T. M. JOVIN and V. SUBRAMANIAM. Amyloid fibrils from the mammalian protein prothymosin alpha. FEBS Lett 2002, 517, 37–40. PERTINHEZ, T. A., M. BOUCHARD, R. A. G. SMITH, C. M. DOBSON and LJ. SMITH. Stimulation and inhibition of fibril formation by a peptide in the presence of different concentrations of SDS. FEBS Lett 2002, 529, 193–197. AGIANIAN, B., K. LEONARD, E. BONTE, H. Van der ZANDT, P. B. BECKER and P. A. TUCKER. The glutamine-rich domain of the Drosophila GAGA factor is necessary for amyloid fibre formation in vitro, but not for chromatin remodelling. J Mol Biol 1999, 285, 527–544. BOUSSET, L., N. H. THOMSON, S. E. RADFORD and R. MELKI. The yeast prion Ure2p retains its native alpha-helical conformation upon assembly into protein fibrils in vitro. EMBO J 2002, 21, 2903–2911. WILKINS, D. K., C. M. DOBSON and M. GROSS. Biophysical studies of the development of amyloid fibrils from a peptide fragment of cold shock protein B. Eur J Biochem 2000, 267, 2609–2616. FANDRICH, M. and C. M. DOBSON. The behaviour of polyamino acids reveals an inverse side chain effect in amyloid structure formation. EMBO J 2002, 21, 5682–5690.
27
1
Kinesin Superfamily Proteins Nobutaka Hirokawa, and Reiko Takemura University of Tokyo, Tokyo, Japan
Originally published in: Molecular Motors. Edited by Manfred Schliwa. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52730594-0
1 Introduction
A cell is not simply a bag filled with cytoplasmic fluid surrounded by the plasma membrane in which membranous organelles such as the Golgi apparatus, endoplasmic reticulum and secretory vesicles float and through which newly synthesized proteins diffuse to reach their destination. Instead, cells transport and sort proteins and lipids after their synthesis to their proper destinations at appropriate velocities in membranous organelles and protein complexes using various kinds of motor proteins. The Neuron is composed of a cell body, dendrites and a long axon along the direction of impulse propagation. Because of the lack of protein synthesis machinery in the axon, most of the proteins necessary for the functioning of the axon and synaptic terminal must be transported down the axon after their synthesis into the cell body [1]. Most proteins are conveyed in membranous organelles or protein complexes. In this sense intracellular transport in the axon is fundamental to Neuronal morphogenesis and functioning. Epithelial cells also develop a polarized structure, that is, apical and basolateral regions, to which certain proteins are specifically transported and sorted [2, 3]. Vectorial intracellular transport occurs not only in polarized cells such as Neurons and epithelial cells but rather in all types of cells [4]. Microtubules serve as rails for this transport and have a polarity with a fastgrowing or plus end and a slow-growing or minus end. They are well organized in cells. In nerve axons, microtubules are arranged longitudinally with the plus end oriented away from the cell body. In proximal dendrites, the polarity of microtubules is mixed, while in the distal end, the polarity is the same as in the axon. In epithelial cells, microtubules are organized with the plus end oriented toward the basement Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Kinesin Superfamily Proteins
Fig. 1 Quick-freeze deep-etch electron micrographs showing short cross-bridges between membranous organelles and microtubules that are structural candidates for KIFs. Bar 50 nm. Reproduced from [4], Trends Cell Biol. 6: 135–141, with permission.
membrane. In most other cells such as fibroblasts, microtubules radiate from the cell center with the plus end oriented toward the periphery. Electron microscopy (EM) revealed short cross-bridge structures between the organelles and microtubules [5], which are candidates for microtubule-associated motor proteins conveying the membranous organelles along microtubules (Fig. 1). Moreover, video-enhanced differential interference contrast microscopy revealed bidirectional transport of various types of membranous organelles [6, 7]. Based on these findings, kinesin superfamily proteins have been identified as motor proteins transporting cargoes along microtubule rails. ‘Conventional kinesin’, the first member of the kinesin superfamily that was identified, consists of two 120-kDa heavy chains (KHCs) and two 64-kDa light chains (KLCs), at least in animal species [7–14]. It has a rod-like structure composed of two globular heads (10 nm in diameter), a stalk, and a fan-like end, with a total length of 80 nm. The globular heads are composed of KHCs that bind to microtubules [15, 16]; the KLCs constitute the fan-like end [15, 17]. Complementary DNA (cDNA) encoding KHCs yields a protein of ∼975 amino acids of which ∼350 NH2 -terminal amino acids form the motor domain (which binds to microtubules and has ATPase activity), an α-helical coiled coil-rich stalk domain involved in dimer formation and a tail domain [18].
2 The Kinesin Superfamily Proteins
Fig. 2 Structure of cDNAs from murine kinesin superfamily proteins (KIFs). Purple, motor domain; thin red line, ATP-binding consensus sequence; thick red line, microtubule-binding consensus sequence.
A systematic molecular biological search of kinesin superfamily genes coding for proteins containing ATP-binding and microtubule-binding consensus sequences or screening using anti-pan kinesin antibodies led to the discovery of many kinesin superfamily proteins, KIFs [19–24] (Figs 2 and 3). Combining molecular biological approaches with a BLAST search of proteins in public and private genome databases, a total of 45 KIFs were identified in mouse and human genomes [25]; Figs 4, 5 and Tab. 1. In this chapter, we provide an overview of KIFs.
2 The Kinesin Superfamily Proteins
KIFs have been classified into three major types based on the position of the motor domain: NH2 -terminal motor domain type, middle motor domain type, and COOHterminal motor domain type (referred to as N-kinesin, M-kinesin, and C-kinesin, respectively).
3
4 Kinesin Superfamily Proteins
Fig. 3 Left: Panels showing main KIFs functioning in intracellular transport, as observed by low-angle rotary shadowing EM. Scale bar, 100 nm. Right: Schematic illustration of the same KIFs based on EM studies or predicted from analysis of primary structures. Reproduced from [26], Science 279: 519–526, with permission.
All KIFs known or predicted to be transcribed in the human and mouse genomes are presented in a phylogenic tree along with the KIFs from S. cerevisiae, Drosophila melanogaster, and Caenorhabditis elegans (Fig. 5). The entire family is classified into 14 classes. N-kinesin consists of 11 classes, comprising 16 families. Most classes are composed of one family except N-3, N-4, N-6, and N-8. The N-3 class consists of Unc104/KIF1, KIF13, and KIF16 families. Members of the Unc104/KIF1 family are mostly monomeric, while members of the KIF13 family have different characteristics. The N-4 class is composed of the KIF3 and Osm3/KIF17 families. KIF3 is heterotrimeric, and Osm3/KIF17 forms homodimers, indicating that these two families are distinct within this class. The N-8 class consists of the KIF18 and Kid/KIF22 families. M-kinesin is composed ofone class, the KIF2 family. Ckinesin is composed of C-1 and C-2 classes, each having one family. Most of the KIFs of other species including plant species can be categorized into these 14 classes.
3 N-Kinesins 3.1 N-1 Kinesins
The first reported kinesin, the kinesin heavy chain (KHC), is included in this group. There are three highly related family members in this group, KIF5A, KIF5B, and
3 N-Kinesins
Fig. 4 All mouse and human KIFs and phylogenetic analysis of mouse and human orthologs. Sequences were analyzed by the neighbor-joining method. Reproduced from [25], PNAS 98: 7004–7011, with permission.
KIF5C, all of which form tetramers consisting of two heavy chains and two light chains [18, 19, 26–30]. KIF5B is ubiquitously expressed in many tissues, while KIF5A and KIF5C are specifically expressed in the nervous system [25, 28, 30]. The members of this family transport various organelles and macromolecular complexes within the cell [25, 26, 31], including anterograde transported vesicles [32] probably important for action potential propagation [33], mitochondria [34], lysosomes [35, 36], endocytic vesicles [37], tubulin oligomers [38], intermediate filament proteins such as vimentin [39, 40] and mRNA complexes [41, 42] (Fig. 6a,b). It may also be involved in cytoplasmic viral transport [43].
5
6 Kinesin Superfamily Proteins
Fig. 5 Phylogenetic analysis of all KIFs expressed in mouse/human, D. melanogaster, C. elegans and S. cerevisiae. Amino acid sequences were aligned using maximum parisomy. Reproduced from [25], PNAS 98: 7004–7011, with permission.
The exact molecular interaction involved in the binding of KIF5 to these cargoes has not been elucidated, but recently a direct interaction between KIF5 and some of the cargoes has been shown. KIF5 has been shown to bind directly to a group of scaffolding proteins of the JNK signaling pathway namely, the c-jun NH2 -terminal kinase (JNK)-interacting proteins (JIPs), JIP-1, JIP-2, and JIP-3 (also called JSAP1
KIF1A KIF1B KIF1C KIF2A KIF2B KIF2C KIF3A KIF3B KIF3C KIF4A KIF4B KIF5A KIF5B KIF5C KIF6 KIF7 KIF8 KIF11 KIF9 KIF10 KIF12 KIF13A (−) (−) BimC (−) (−) (−) Klp4
(−) (−) Klp61F (−) Cmet Cana CG15844 Kin73
CENP-E
EG5/KNSL1
Unc116
KHC
Y43F4B.6
nKHC uKHC xKHC
KRP85 KRP95
(−) K11D9.1
Unc104
C. elegans
KIF4
CG1453 CG3219
CG8566
D. melanogaster
Klp64D Klp68D CG17461 Klp3A
CAKin/KNSL6
KIF2
ATSV
H. sapiens
(−) (−) Kip1 Cin8 (−) Kip2 (−) (−)
(−)
(−)
(−)
(−)
(−)
S. cerevisiae
AnBo,C XlEg5 CrKlp1
RnnKHC
GgChrkin
CgMCAK XlKCM1 SpKRP85 SpKRP95
RnKIF1D XlXKIF2
Others
Table 1 All KIFs from mouse, human, D. melanogaster, C. elegans, S. cerevisiae, and selected other organisms
(Continued)
[22] [22] [22, 58, 61, 127, 181] [22, 171, 172] [22, 128, 135], FB, [127] [22], FB [22, 77, 182, 183]
[19, 65, 72, 175] [64, 66], FB [22, 63, 176] [19], FB, WB, [153] [25], FB [25, 149, 155, 157] [19, 24, 81, 177] [22], FB [19, 109], WB, [113] [25, 178] [19, 22, 24, 28, 30, 179, 180]
Reference
3 N-Kinesins 7
HSET/KNSL2
KNSL3
Kid/KNSL4 MKLP1/KNSL5
Rab6Kin KlpMPP1
GAKIN HUMORFW Hklp2
T01G1.1 (−) Zen4A Zen4B (−) (−) Vab8 C41G7.2 M01-E11.6 W02-B12.7 Klp3
(−) Klp67A CG9913 CG12298 CG5300 Nod Pav CG17459 (−) (−) Ncd (−)
Osm3 (−) (−) (−)
Klp98A
C. elegans
Klp6 C06G3.2 C33H5.4 (−)
Klp38B (−)
D. melanogaster
FB Flybase http://flybase.bio.indiana.edu:82/; WB, Wormbase, www.wormbase.orgxs
KIFC2 KIFC3
KIF16A KIF16B KIF17 KIF18A KIF18B KIF19A KIF19B KIF20A KIF20B KIF21A KIF21B KIF22 KIF23 KIF24 KIF25 KIF26A KIF26B KIFC1
KIF13B KIF14 KIF15
H. sapiens
(−)
(−) (−) (−) (−) (−) (−) Kar3
(−)
(−) Kip3
(−)
(−) (−)
S. cerevisiae
XlXCTK1
CgCHO2
XlXkid CgCHO1
MmKlp174
XlXklp4
Others
Table 1 All KIFs from mouse, human, D. melanogaster, C. elegans, S. cerevisiae, and selected other organisms (Continued)
[22, 155, 167, 168, 193] [22, 169, 194]
[22, 183] [22, 185] [22, 106, 107] [25], FB, [186] [25], FB [25, 123], FB [25, 187] [114], FB, WB [25, 136–138] [25, 115, 116, 118, 119] [25], FB [25, 188] [25, 143] [22, 189–191], WB, [157, 192]
[22, 78] [22, 184], WB [22, 142], WB
Reference
8 Kinesin Superfamily Proteins
3 N-Kinesins
Fig. 6 Scheme of KIFs and their cargo organelles in neurons (a) and in cells in general (b). In (b), neuron-specific KIFs and ubiquitous KIFs are illustrated in the same cell. CGN, cis-Golgi network; TGN, trans-Golgi network; ECV, endosomal carrier vesicle. Black arrows indicate the direction of transport.
9
10 Kinesin Superfamily Proteins
Fig. 7 Schematic drawing of diverse functions of KIF3 in various cell types.
and Sunday Driver, SYD) [44–48]. JIP-1 and JIP-2 interact with tetra-tricopeptide repeat (TPR) motifs of kinesin light chains through their C-termini. JIP-1 and JIP-2 are soluble proteins by themselves. Therefore, they do not directly connect kinesin to membrane organelles. Rather, they may serve as a link between other cargo membrane proteins and KIF5. One possible mechanism of interaction is that JIP-1 and JIP-2 mediate the interaction of KIF5 with vesicles containing ApoER2, the receptor for the Reelin ligand that controls Neuronal migration (Fig. 8). The role of JIP-3 (Sunday Driver, JSAP1) is less clear. JIP-3 interacts with TPR motifs of kinesin light chains through internal sequences, instead of the C-terminus, and is proposed to be a membrane protein by itself [44], but an unequivocal transmembrane domain has not yet been found. Amyloid precursor protein (APP), whose aberrant transport is thought to contribute to the development of Alzheimer’s disease, is another protein that was proposed to interact directly with KIF5 [49]. The interaction between APP and KIF5
3 N-Kinesins
Fig. 8 Scheme of how KIFs recognize and bind cargoes.
is again through the TPR motifs of kinesin light chains (KLCs). It is proposed that APP serves as the membrane cargo receptor for KIF5 which links KIF5 to a particular subset of axon-transported vesicles. Thus far, reports of binding of KIF5 to other molecules seem to be focused on the TPR motifs of kinesin light chains. However, it was not clear whether KLC is the only site for the interaction between kinesin and cargoes. In some species such as fungi [50, 51], kinesin light chains are absent, implying that KHCs alone are sufficient by themselves for binding to some cargoes [52]. An biochemical interaction between KHC and membranous organelles has been suggested [53]. Very recently, GRIP1 (Glutamate Receptor Interacting Protein 1) has been shown to bind to the KHC tail domain and transport AMPA-type glutamate receptor (GluR2)-containing vesicles to dendrites [54] (Fig. 8). Furthermore, this study demonstrated clearly that GRIP1 steers kinesin to dendrites while JSAP1 (JIP-3, Sunday Driver) steers kinesin to axons through its interaction with KLC [54]. KIF5B was also shown to interact with the actin-based myosin motor MyoVA, suggesting a coordination between these two motors [55].
11
12 Kinesin Superfamily Proteins
3.2 N-2 Kinesins
This class includes KIF11, BimC, Eg5, and plays a role in mitosis [25, 56–62]. KIF11 (Eg5) was shown to form a homotetramer for bipolar spindle formation. The motors localize at the midzone of interpolar microtubules, where microtubules have an anti-parallel orientation. Because the tetramers have motor domains on both ends of the molecule, they are thought to crosslink antiparallel microtubules and to allow them to slide against each other. 3.3 N-3 Kinesins
This class consists of three families, the Unc104/KIF1, KIF13, and KIF16 families [25]. The Unc104/KIF1 family is unique, because most of the proteins in this family form monomeric motors. 3.3.1 The Unc104/KIF1 family There are four members in this family, namely KIF1A, KIF1Bα, KIF1Bβ, and KIF1C [26, 63–66]. KIF1A and KIF1B are monomeric, a unique characteristic of these proteins compared to other KIFs. KIF1A was shown to be able to move processively as a monomer by a biased Brownian movement toward the microtubule plus end [67–70]. C. elegans Unc104 is a homolog of mouse KIF1A [71, 72]. KIF1Bα and KIF1Bβ are splice variants which have similar N-terminal motor domains but are distinct in their cargo-binding domains. Interestingly, the cargo-binding domains of KIF1A and KIF1Bβ have high homology, and both KIFs transport precursors of synaptic vesicles [65, 66] (Fig. 6a). The knock-out mice of KIF1A and KIF1B show an aberrant Neuronal function due to the defect in the transport of precursors of synaptic vesicles [66, 73]. This gene-targeting experiment demonstrated for the first time that defects in axonal transport due to a mutated motor protein can cause human peripheral neuropathy [66]. The KIF1B heterozygotes show progressive muscle weakness and motor disturbance similar to human neuropathies, and it was shown that patients with Charcot–Marie–Tooth disease type 2A carry a loss-offunction mutation in the motor domain of the KIF1B gene [66]. KIF1Bα transports mitochondria, but this role is shared with KIF5B [34] (Fig. 6a). KIF1C is dimeric, has a high homology to the mitochondrial motor KIF1Ba, and is believed to be involved in the transport of vesicles from the Golgi apparatus to the endoplasmic reticulum [74]. KIF1C was shown to be associated with 14-3-3 β, γ , and ζ proteins [75]. However, unexpectedly, cells from KIF1C (−/−) mice showed no significant difference from control cells in the transport of vesicles from the Golgi apparatus to the endoplasmic reticulum, suggesting that KIF1C is dispensable for retrograde transport [63]. Other KIFs may share this function. Recently, KIF1C was reported to mediate mouse macrophage resistance to the anthrax lethal factor [76]. KIF1C does not affect the cellular entry process of anthrax lethal toxin; therefore, events occurring later in the intoxication pathway are probably involved.
3 N-Kinesins
3.3.2 The KIF13 family KIF13A transports vesicles containing the mannose-6-phosphate receptor from the trans-Golgi network to the plasma membrane [77] (Fig. 6b). The intracellular trafficking of the mannose-6-phosphate receptor has been well studied. One of the roles of the mannose-6-phosphate receptor is to function as a recognition signal for the transport of newly synthesized lysosomal enzymes from the trans-Golgi network; it is also involved in the uptake of external ligands on the cell surface. The interaction of KIF13A with the vesicle is mediated via direct interaction between the KIF13A tail and β-1-adaptin, a subunit of the AP-1 adaptor complex [77], and the direct interaction of AP-1 with the mannose-6-phosphate receptor has been demonstrated (Fig. 8). KIF13B (GAKIN) was reported to interact with the human lympohcyte homolog of the Drosophila disc large tumor suppressor protein (hDlg), and may participate in the translocation of hDlg from the cytoplasm to the ‘cap’ structure upon activation [78]. Among the MAGUK superfamily proteins, GAKIN binds to the guanylate kinase-like domain of PSD-95, but not to other MAGUKs. 3.3.3 The KIF16 family KIF14, KIF16A, and KIF16B are members of this family. The function of this group of KIFs has not been clarified yet [25]. Because the tail domains as well as the expression patterns of KIF14, KIF16A, and KIF16B are different, these three KIFs may have separate functions. 3.4 N-4 Kinesins
This class is composed of the KIF3 family, which form heterotrimeric motors, and the Osm3/KIF17 family [25]. 3.4.1 The KIF3 family KIF3A, KIF3B, and KIF3C constitute this family. Members of the KIF3 family form heterotrimeric complexes. Thus the KIF3A–KIF3B heterodimer interacts at its tail domain with a non-motor protein, kinesin superfamily-associated protein 3 (KAP3) [26, 79–84] (Fig. 3). The KIF3A/B-KAP3 complex (also called kinesin II) transports vesicles and macromolecules [85]. In axons of mammalian Neurons, the complex transports vesicles which carry fodrin [86], and choline acetyltransferase, a soluble protein in Drosophila [87]. In melanophores, it is involved in the dispersion of pigment organelles called melanosomes [88–90] (Fig. 7). The KIF3 complex plays a vital role in the formation and maintenance of cilia and flagella in various organisms [91–96]. It transports macromolecular protein complexes from the bottom of the cilium to its distal tip along microtubules by a process called intraflagellar transport (Fig. 7). The components of these macromolecular complexes, called rafts, have been analyzed to some extent; rafts may contain as many as 15 polypeptides, but the details have not been completely clarified [91]. An interesting aspect of the involvement of KIF3 in cilium formation is that it also plays an important role in the determination of the left–right axis in early
13
14 Kinesin Superfamily Proteins
mouse development [97–99]. During mouse development, there is a monocilium at the nodal cell that rotates to produce unidirectional leftward flow of extraembryonic fluid [98, 99]. This nodal flow could generate concentration gradients of putative secreted morphogens in the extraembryonic fluid at the nodal region, which could switch on a gene cascade strictly expressed on the left side of the body and lead to left–right asymmetry. KIF3 is required for the formation of this monocilium. In kif3A (−/−) or kif3B (−/−) mouse, nodal cilia cannot be generated, nodal flow is absent, and the left–right axis formation is fundamentally impaired [98, 99]. In retinal photoreceptor cells, the KIF3 complex is localized in the connecting cilium, which is located between the outer and inner segment [100]. It was shown that the KIF3 complex participates in the transport of opsin from the inner to the outer segment at the connecting cilium, where microtubule plus-ends are pointed toward the outer segment [101]. At the connecting cilium, opsin is localized at the plasma membrane, and the KIF3 motors probably drag the transmembrane protein along the underlying microtubules. The soluble protein, arrestin, also seems to be transported by the KIF3 complex. In migrating epithelial cells, KAP3 is reported to bind to the tumor suppressor gene, adenomatous polyposis coli (APC), and it is proposed that β-catenin is transported to the edges of migrating epithelial cells through this interaction [102]. There is another form of the KIF3 complex, KIF3A/C–KAP3. KIF3A/C–KAP3 is expressed in the nervous system only [103, 104], while KIF3A/B–KAP3 is ubiquitously expressed in many tissues. However, a gene-targeting experiment showed that the KIF3A/C-KAP3 complex is dispensable; therefore, the function of this KIF3 complex may be redundant [105]. 3.4.2 The Osm3/KIF17 family KIF17 is localized to the dendrites and transports vesicles containing N-methyl-Daspartate (NMDA)-type glutamate receptor 2B towards the microtubule plus-end to the postsynaptic site, where the receptor plays an important role in synaptic plasticity [106] (Fig. 2). In the dendrite, microtubules are of mixed polarity. The large protein complex containing mLin-10 (Mint1), mLin-2 (CASK), mLin-7 (MALS/Velis), and NR2B mediates the attachment of KIF17 to the vesicle. The tail domain of KIF17 interacts directly with the PDZ domain of mLin-10, which then sequentially interacts with mLin-2 (CASK), mLin-7, and the NR2B subunit in this order [106] (Figs 6a and 8). Osm-3 is closely related to KIF3, but forms a dimer [107]. In C. elegans, many Neurons have ciliated dendritic ends, and some of them are chemosensory receptors. Osm-3 transports ciliary components to these ciliated dendritic ends [23, 108]. 3.5 N-5 Kinesins
This class includes KIF4A, KIF4B, KIF21A, and KIF21B, and plays roles in both vesicle transport and mitosis [25, 109]. KIF4A mRNA is abundantly expressed in juvenile Neurons, including differentiated immature Neurons, and the motor
3 N-Kinesins
transports vesicles to growth cones [110]. These vesicles may contain L1, a cell adhesion molecule implicated in axonal elongation in immature Neurons [111]. KIF4 was shown to bind murine leukemia virus Gag proteins and may play a role in the transport of viruses in an infected cell [112]. Chromokinesin, a chicken isolog of KIF4, is associated with chromosome arms and functions as a mitotic motor with DNA as its cargo [113]. What KIF21A and KIF21B transport is not known, but they have different distribution patterns within a Neuron; KIF21A is localized throughout Neurons, but KIF21B is highly expressed in dendrites [114]. 3.6 N-6 Kinesins
CHO1/KIF23 and KIF20/Rab kinesin families constitute this class. 3.6.1 The CHO1/KIF23 family The KIF23 family, called by various names such as CHO1, MKLP, ZEN-4 or Pavarotti, depending on the species, appears to be involved in cytokinesis [115–120]. Members of this family can bind to microtubules at their cargo-binding domain, and are, therefore, capable of bundling anti-parallel microtubules. During mitosis, they exist in the midbody region of the spindle in late anaphase and telophase, and probably establish the structure of the telophase spindle to provide a framework for the assembly of the contractile ring. They are also expressed in cultured Neurons [121, 122]. 3.6.2 The KIF20/Rab6 kinesin family The Rab family of GTP-binding proteins regulates vesicular transport and membrane traffic. KIF20A (Rab6-KIF) was first reported as a motor protein that associates with Rab6 and participates in Golgi-derived vesicle transport [123]. Moreover, it was revealed that KIF20 may also participate in cytokinesis [124, 125]. 3.7 N-7 Kinesins
KIF10 (CENP-E) plays an important role in mitosis [126–129]. It associates with kinetochores and tethers the centromere to spindle microtubules. It is presumed to function in ‘congression’, in which chromosomes attached to the spindle microtubules oscillate and migrate to the spindle equator. Therefore, it plays an essential role in chromosome alignment at the spindle equator by tethering kinetochores to the dynamic microtubule plus end [130–134]. CENP-meta has a similar function in Drosophila [135]. 3.8 N-8 Kinesins
This class includes the Kid/KIF22 and KIF18 families.
15
16 Kinesin Superfamily Proteins
3.8.1 The Kid/KIF22 family KIF22 (Kid) is involved in mitosis [136]. It binds to the arms of chromosomes, not to the kinetochore, and plays a role in chromosome alignment at the equatorial plate during metaphase [137–139]. It is proposed that Kid is responsible for the polar ejection force acting on chromosome arms and its degeneration is required for chromosome movement during anaphase. A highly homologous KIF, Nod, is required during female meiosis for chromosome alignment [140, 141]. KIF19 is included in this family, but its function is not yet well characterized [25]. 3.8.2 The KIF18 family This most recently discovered family of KIFs has not yet been characterized. 3.9 N-9 Kinesins
KIF12, a member of this family, is highly expressed in the kidney and may have a significant role in kidney cells. However, its function is not yet well characterized [25]. 3.10 N-10 Kinesins
The KIF15 family forms this class that is predominantly expressed in the spleen and testis [25]. Human KIF15 (Hklp2) may interact with the fork-head-associated domain of pKi-67, a widely used cell proliferation marker protein [142]. 3.11 N-11 Kinesins
This class includes the KIF26 family that is related to Cos2 and Vab-8. Vab-8 has been implicated in axon outgrowth and cell migration [143]. Costal2 (Cos2) has been shown to be part of the hedgehog signaling cascade; therefore, it is involved in the determination of cell fate and the patterning of animal development [144– 148]. It forms a large multiprotein complex with Ser/Thr protein kinases, Fused (Fu) and the transcription factor Cubitus interruptus (Ci). Cos2 prevents Ci nuclear translocation by tethering it in the cytoplasm via binding to microtubules, whereas hedgehog stimulates Ci nuclear import.
4 M-Kinesins
M-kinesins have a motor domain at the center of the molecule. The KIF2 family belongs to this class; its members, KIF2A, 2B, and 2C, have functions in vesicle transport, mitosis, and regulation of microtubule dynamics [25, 149, 150];
5 C-Kinesins
Figs 2 and 3). KIF2A plays a significant role in neurite outgrowth and transports vesicles containing a variant of the (β-subunit of the IGF-1 receptor, which is found on growth cones and is implicated in nerve cell development [151]; Fig. 6a). It may also participate in localization of lysosomes [152]. The KIF2C homolog in Xenopus, XKCM1, has a microtubule-destabilizing activity. It influences microtubule dynamics within the cell and is thought to promote chromosome segregation during mitosis by depolymerizing microtubules at the kinetochore [153–155]. XKCM1 acts on a single protofilament, possibly via an interaction of the basic residues in XKCM1 with the acidic tail of tubulin [156]. Mitotic centromere-associated kinesin (MCAK) identified in hamster is also a homolog of KIF2C [157]. It is also a microtubule depolymerizing factor, consistent with its role in promoting chromosome segregation during mitosis [158].
5 C-Kinesins 5.1 C-1 Kinesins
Drosophila Ncd, S. cerevisiae Kar3, and KIFC1 are included in this family [159, 160]. They are microtubule minus-end-directed motors and participate in meiosis, mitosis and karyogamy [25]. They have a microtubule-binding site on the cargo-binding domain. During mitosis, Ncd may function by crosslinking and sliding anti-parallel spindle microtubules relative to one another, pulling them apart, and thereby opposing the effect of bipolar plus-end-directed microtubule motors [161]. In meiosis, Ncd may play a role in establishing bipolar acentrosomal meiotic spindles [162]. Kar3 has an essential role in nuclear fusion during mating or karyogamy in S. cerevisiae. In addition, it has been implicated in several microtubule functions during the vegetative cell cycle including spindle assembly, mitotic chromosome segregation, microtubule depolymerization, kinetochore-motor activity and spindle positioning. These multiple functions may be performed by interaction of Kar3 with different associating proteins such as Cik1p or Viklp [163–166]. 5.2 C-2 Kinesins
The KIFC2/C3 families constitute C-2 kinesins [25, 167, 168]. KIFC2 is localized in the cell body and dendrites of Neurons [168]. It participates in the transport in dendrites of multivesicular body-like membranous organelles, which are morphologically similar to multivesicular bodies but appear to lack markers of endocytic compartments (Fig. 6a). The classic multivesicular body exists in both axons and dendrites; therefore, the organelle transported by KIFC2 may constitute a novel class. It has not been rigorously demonstrated, but it is presumed that KIFC2 acts as a minus-end-directed motor.
17
18 Kinesin Superfamily Proteins
KIFC3 is a minus-end-directed microtubule motor protein that exists in kidney epithelial cells and other cell types. In epithelial cells, microtubule organization is different from that in other cell types; microtubule minus-ends are located at the apical surface of the cell. KIFC3 accumulates on the apical surface of epithelial cells and transports vesicles associated with annexin XIIIb, a previously characterized membrane protein for apically transported vesicles [169]. A recent gene targeting study showed that KIFC3 plays a complementary role to cytoplasmic dynein in Golgi positioning and integration [170].
6 Orphans
These KIFs have no counterpart in Drosophila or C. elegans. KIF6 and KIF9 are located near the BimC family [25]. KIF9 is reported to interact with the Ras-like GTPase Gem, which suggests that it may play a role in cell shape remodeling [171, 172]. The distance between KIF9 and KIF6 is large. KIF7 has no evident homolog in Drosophila, C. elegans, or S. cerevisiae. Its function is as yet unknown, but it is predominantly expressed in the testis [25].
7 Cargoes of KIFs; Specificity and Redundancy
As mentioned above, KIFs convey various types of cargo. In some cases, a single KIF transports several distinct cargoes. For example, conventional kinesin (KIF5s) transports several cargoes such as mitochondria, lysosomes, vesicles containing Amyloid Precursor Protein (APP) and GAP43 [49, 173], ApoE2, the receptor for the Reelin ligand [48], or the AMPA-type glutamate receptor GluR2 [54]. Kinesin can also convey mRNA complexes [41], tubulin oligomers [38] and vimentin intermediate filament protein complexes [39, 40]. Another example is KIF3. KIF3 also conveys several different cargoes in distinct cell types. KIF3 transports vesicles associated with fodrin, important for neurite elongation in Neurons [86], and it conveys protein complexes for the formation of cilia and flagella along microtubules in various ciliated cell types [92, 93]. KIF3 is also a motor for the transport of pigment granules in pigmented cells in Xenopus [88–90] and is proposed to function in the trafficking of vesicles from the Golgi apparatus to the ER [174]. Thus, KIF3 is another example of a motor that carries out different tasks depending on the type of cell (Fig. 7). How the same KIF can transport different cargoes is an interesting and important question that needs to be addressed. On the other hand, different KIFs sometimes redundantly convey similar cargoes. Mitochondria are transported by KIF1Bα [64] as well as by conventional kinesin (KIF5) [28, 34]. KIF1A’s cargo is synaptic vesicle precursors [65], which are also conveyed by KIF1Bβ [66]. Another example is lysosomes which have at least two
8 Recognition and Binding to Cargoes
microtubule plus-end motors, kinesin and KIF2Aβ [35, 36, 152]. This redundancy could be one of the reasons for subtle phenotypes in knock-out mice in which some KIF genes have been knocked out.
8 Recognition and Binding to Cargoes
In many cases, a KIF has a specific cargo, while sometimes the same KIF, such as conventional kinesin and KIF3, conveys different cargoes. Thus, the fundamental question has been, how do KIFs recognize and bind to their cargoes? Studies of KIF17 and KIF13A clearly indicated for the first time how KIFs recognize and bind cargo molecules [77, 106]. KIF17 transports NMDA-type glutamate receptorcontaining vesicles through an interaction of its tail with mLin10-mLin2-mLin7NR2B. KIF13A, on the other hand, recognizes and binds to mannose-6-phosphate receptor-containing vesicles via an interaction between the KIF13A tail-β1 adaptinAP-1 adaptor complex and mannose-6-phosphate receptor (Fig. 8). In the case of conventional kinesin, it was demonstrated recently that it recognizes and binds cargoes such as APP- and GAP43-containing vesicles through KLC and APP interaction [173], and ApoE2-containing vesicles through interaction between KLC, and JIP1 and JIP2 [48]. Kinesin also recognizes cargoes such as AMPA receptor subunit GluR2-containing vesicles through its interaction with KHC-GRIP1-GluR2 [54] (Fig. 8). Thus, kinesin uses KHC and KLC to discriminate between distinct cargoes. A similar strategy may be used by KIF3. Thus in terms of how KIFs recognize and bind cargoes, two methods can be distinguished. Ĺ KIF tail–scaffolding protein(s) or adaptor protein(s)–membrane protein This is applicable to KIF17-mLin10-mLin2-mLin7-NR2B, KIF13A-13A-β1adaptin-AP1adaptor complex-M6PR, KLC-JIP1/JIP2-ApoE2, the reelin receptor and KIF5 (KHC)-GRIP1-GluR2 (Fig. 8). Ĺ KIF tail–membrane protein Kinesin was reported to bind directly to membrane proteins through its interaction with KLC. Sunday Driver (also called JSAP1 and JIP3) [44] and APP [173] were proposed to be membrane proteins that directly bind to kinesin through KLC, but it needs to be demonstrated whether Sunday Driver (JSAP1 ad JIP3) is indeed a membrane protein.
We are just beginning to understand how KIFs recognize and bind their specific cargoes. So far we only know that rather complex mechanisms are involved in these events. This may be related to how the association and dissociation between KIFs and cargoes are regulated. Another question is how KIFs recognize and bind to protein complexes such as tubulin and intermediate filament proteins, chromosomes and mRNAs. This is obviously a significant question that needs to be answered in the near future.
19
20 Kinesin Superfamily Proteins
9 How to Determine the Direction of Transport
Another intriguing question is how a cell regulates the direction of transport, for example the transport to axons versus dendrites. Kinesin (KIF5) transports APPcontaining vesicles to the axon while the same motor conveys AMPA-receptorcontaining vesicles to dendrites. In this case it was shown that GRIP1, which binds KHC and JSAP1, which binds KLC, steers kinesin to dendrites and axons, respectively [54]. Thus, GRIP1 and JSAP1 are functioning as drivers for kinesin (KIF5). There are motors that transport cargoes mainly to dendrites. KIF17 conveys NMDA receptors mainly to dendrites [106] and KIFC2 transports a new type of multivesicular body-like organelle to dendrites [168]. KIF21B was also proposed to be a motor for dendritic transport [114]. How differential transport is performed is clearly a fundamental question that needs to be solved in the near future. The mechanism is probably not simple and could involve several distinct events. Thus, interestingly, cells use many KIFs and very precisely control the direction and velocity of transport of various distinct cargoes that are fundamental in basic cellular functions. This field is obviously very important and rapidly advancing. Further studies need to be carried out to fully understand the mechanism of intracellular transport.
References 1 GRAFSTEIN, B. and D. S. FORMAN. 1980. Intracellular transport in neurons. Physiol. Rev. 60: 1167–1283. 2 SIMONS, K. and E. IKONEN. 1997. Functional rafts in cell membranes. Nature 387: 569–572. 3 WEIMS, T., S. H. LOW, S. J. CHAPIN, and K. E. MOSTOV 1997. Apical targeting in polarized epithelial cells: there’s more afloat than rafts. Trends Cell Biol. 7: 393–399. 4 HIROKAWA, N. 1996. Organelle transport along microtubules–The role of KIFs. Trends Cell Biol. 6: 135–141. 5 HIROKAWA, N. 1982. Cross-linker system between neurofilaments, microtubules, and membranous organelles in frog axons revealed by the quick-freeze, deep-etching method. J. Cell Biol. 94: 129–142. 6 ALLEN, R. D., J. METUZALS, I. TASAKI, S. T. BRADY, and S. P. GILBERT. 1982. Fast axonal transport in squid giant axon. Science 218: 1127–1129.
7 SCHNAPP, B. J., R. D. VALE, M. P. SHEETZ, and T. S. REESE. 1985. Single microtubules from squid axoplasm support bidirectional movement of organelles. Cell 40: 455–462. 8 BLOOM, G. S., M. C. WAGNER, K. K. PFISTER, and S. T. BRADY. 1988. Native structure and physical properties of bovine brain kinesin and identification of the ATP-binding subunit polypeptide. Biochemistry 27: 3409–3416. 9 BRADY, S. T. 1985. A novel brain ATPase with properties expected for the fast axonal transport motor. Nature 317: 73–75. 10 CYR, J. L., K. K. PFISTER, G. S. BLOOM, C. A. SLAUGHTER, and S. T. BRADY. 1991. Molecular genetics of kinesin light chains: generation of isoforms by alternative splicing. Proc. Natl Acad. Sci. USA 88: 10114–10118. 11 GAUGER, A. K. and L. S. GOLDSTEIN. 1993. The Drosophila kinesin light chain. Primary structure and interaction with
References
12
13
14
15
16
17
18
19
20
21
kinesin heavy chain. J. Biol. Chem. 268: 13657–13666. STENOIEN, D. L. and S. T. BRADY. 1997. Immunochemical analysis of kinesin light chain function. Mol. Biol. Cell 8: 675–689. VALE, R. D., T. S. REESE, and M. P. SHEETZ. 1985. Identification of a novel force-generating protein, kinesin, involved in microtubule-based motility. Cell 42: 39–50. WEDAMAN, K. P., A. E. KNIGHT, J. KENDRICK-JONES, and J. M. SCHOLEY. 1993. Sequences of sea urchin kinesin light chain isoforms. J. Mol. Biol. 231: 155–158. HIROKAWA, N., K. K. PFISTER, H. YORIFUJI, M. C. WAGNER, S. T. BRADY, and G. S. BLOOM. 1989. Submolecular domains of bovine brain kinesin identified by electron microscopy and monoclonal antibody decoration. Cell 56: 867–878. SCHOLEY J. M., J. HEUSER, J. T. YANG, and L. S. GOLDSTEIN. 1989. Identification of globular mechanochemical heads of kinesin. Nature 338: 355–357. PFISTER, K. K., M. C. WAGNER, D. L. STENOIEN, S. T. BRADY, and G. S. BLOOM. 1989. Monoclonal antibodies to kinesin heavy and light chains stain vesicle-like structures, but not microtubules, in cultured cells. J. Cell Biol. 108: 1453–1463. YANG, J. T., R. A. LAYMON, and L. S. GOLDSTEIN. 1989. A three-domain structure of kinesin heavy chain revealed by DNA sequence and microtubule binding analyses. Cell 56: 879–889. AIZAWA, H., Y. SEKINE, R. TAKEMURA, Z. ZHANG, M. NANGAKU, and N. HIROKAWA. 1992. Kinesin family in murine central nervous system. J. Cell Biol. 119: 1287–1296. COLE, D. G., W. Z. CANDE, R. J. BASKIN, D. A. SKOUFIAS, C. J. HOGAN, and J. M. SCHOLEY 1992. Isolation of a sea urchin egg kinesin-related protein using peptide antibodies. J. Cell Sci. 101: 291–301. HIROKAWA, N. 1993. Axonal transport and the cytoskeleton. Curr. Opin. Neurobiol. 3: 724–731.
22 NAKAGAWA, T., Y. TANAKA, E. MATSUOKA, S. KONDO, Y. OKADA, Y. NODA, Y. KANAI, and N. HIROKAWA. 1997. Identification and classification of 16 new kinesin superfamily (KIF) proteins in mouse genome. Proc. Natl Acad. Sci. USA 94: 9654–9659. 23 TABISH, M. Z. K. SIDDIQUI, K. NISHIKAWA, and S. S. SIDDIQUI. 1995. Exclusive expression of C. elegans osm-3 kinesin gene in chemosensory neurons open to the external environment. J. Mol. Biol. 247: 377–389. 24 YANG, Z., D. W. HANLON, J. R. MARSZALEK, and L. S. GOLDSTEIN. 1997. Identification, partial characterization, and genetic mapping of kinesin-like protein genes in mouse. Genomics 45: 123–131. 25 MIKI, H., M. SETOU, K. KANESHIRO, and N. HIROKAWA. 2001. All kinesin superfamily protein, KIF, genes in mouse and human. Proc. Natl Acad. Sci. USA 98: 7004–7011. 26 HIROKAWA, N. 1998. Kinesin and dynein superfamily proteins and the mechanism of organelle transport. Science 79: 519–526. 27 HIROKAWA, N., Y. NODA, and Y. OKADA. 1998. Kinesin and dynein superfamily proteins in organelle transport and Cell division. Curr. Opin. Cell Biol. 10: 60–73. 28 KANAI, Y., Y. OKADA, Y. TANAKA, A. HARADA, S. TERADA, and N. HIROKAWA. 2000. KIF5C, a novel neuronal kinesin enriched in motor neurons. J. Neurosci. 20: 6374–6384. 29 NAVONE, F., J. NICLAS, N. HOM-BOOHER, L. SPARKS, H. D. BERNSTEIN, G. MCCAFFREY and R. D. VALE. 1992. Cloning and expression of a human kinesin heavy chain gene: interaction of the COOH-terminal domain with cytoplasmic microtubules in transfected CV-1 cells. J. Cell Biol. 117: 1263–1275. 30 NICLAS, J., F. NAVONE, N. HOM-BOOHER, and R. D. VALE. 1994. Cloning and localization of a conventional kinesin motor expressed exclusively in neurons. Neuron 12: 1059–1072. 31 TERADA, S. and N. HIROKAWA. 2000. Moving on to the cargo problem of microtubule-dependent motors in
21
22 Kinesin Superfamily Proteins
32
33
34
35
36
37
38
39
40
41
neurons. Curr. Opin. Neurobiol. 10: 566–573. HIROKAWA, N., R. SATO-YOSHITAKE, N. KOBAYASHI, K. K. PFISTER, G. S. BLOOM, and S. T. BRADY. 1991. Kinesin associates with anterogradely transported membranous organelles in vivo. J. Cell Biol. 114: 295–302. GHO, M., K. MCDONALD, B. GANETZKY and W. M. SAXTON. 1992. Effects of kinesin mutations on neuronal functions. Science 258: 313–316. TANAKA, Y., Y. KANAI, Y. OKADA, S. NONAKA, S. TAKEDA, A. HARADA, and N. HIROKAWA. 1998. Targeted disruption of mouse conventional kinesin heavy chain, kif5B, results in abnormal perinuclear clustering of mitochondria. Cell 93: 1147–1158. HOLLENBECK, P. J. and J. A. SWANSON. 1990. Radial extension of macrophage tubular lysosomes supported by kinesin. Nature 346: 864–866. NAKATA, T. and N. HIROKAWA. 1995. Point mutation of adenosine triphosphate-binding motif generated rigor kinesin that selectively blocks anterograde lysosome membrane transport. J. Cell Biol. 131: 1039–1053. BANANIS, E., J. W. MURRAY, R. J. STOCKERT, P. SATIR, and A. W. WOLKOFF 2000. Microtubule and motor-dependent endocytic vesicle sorting in vitro. J. Cell Biol. 151: 179–186. TERADA, S., M. KINJO, and N. HIROKAWA. 2000. Oligomeric tubulin in large transporting complex is transported via kinesin in squid giant axons. Cell 103: 141–155. LIAO, G. and G. G. GUNDERSON. 1998. Kinesin is a candidate for cross-bridging microtubules and intermediate filaments. Selective binding of kinesin to detyrosinated tubulin and vimentin. J. Biol. Chem. 273: 9797–9803. PRAHLAD, V., M. YOON, R. D. MOIR, R. D. VALE, and R. D. GOLDMAN. 1998. Rapid movements of vimentin on microtubule tracks: kinesin-dependent assembly ofintermediate filament networks. J. Cell Biol. 143: 159–170. BRENDZA, R. P., L. R. SERBUS, J. B. DUFFY, and W. M. SAXTON. 2000. A function for
42
43
44
45
46
47
48
49
50
kinesin I in the posterior transport of oskar mRNA and Staufen protein. Science 289: 2120–2122. CARSON, J. H., K. WORBOYS, K. AINGER, and E. BARBARESE. 1997. Translocation of myelin basic protein mRNA in oligodendrocytes requires microtubules and kinesin. Cell Motil. Cytoskeleton 38: 318–328. RIETDORF J., A. PLOUBIDOU, I. RECKMANN, A. HOLMSTROM, F. FRISCHKNECHT, M. ZETTL, T. ZIMMERMANN, and M. WAY. 2001. Kinesin-dependent movement on microtubules precedes actin-based motility of vaccinia virus. Nature Cell Biol. 3: 992–1000. BOWMAN, A. B., A. KAMAL, B. W. RITCHINGS, A. V. PHILP, M. MCGRAIL, J. G. GINDHART, and L. S. GOLDSTEIN. 2000. Kinesin-dependent axonal transport is mediated by the sunday driver (SYD) protein. Cell 103: 583–594. BYRD, D. T., M. KAWASAKI, M. WALCOFF N. HISAMOTO, K. MATSUMOTO, and Y. JIN. 2001. UNC-16, a JNK-signaling scaffold protein, regulates vesicle transport in C. elegans. Neuron 32: 787–800. ITO, M., K. YOSHIOKA, M. AKECHI, S. YAMASHITA, N. TAKAMATSU, K. SUGIYAMA, M. HIBI, Y. NAKABEPPU, T. SHIBA, and K. I. YAMAMOTO. 1999. JSAP1, a novel jun N-terminal protein kinase (JNK)-binding protein that functions as a Scaffold factor in the JNK signaling pathway. Mol. Cell Biol. 19: 7539–7548. VERHEY K. J. and T. A. RAPOPORT. 2001. Kinesin carries the signal. Trends Biochem. Sci. 26: 545–550. VERHEY, K. J., D. MEYER, R. DEEHAN, J. BLENIS, B. J. SCHNAPP, T. A. RAPOPORT, and B. MARGOLIS. 2001. Cargo of kinesin identified as JIP scaffolding proteins and associated signaling molecules. J. Cell Biol. 152: 959–970. KAMAL, A., G. B. STOKIN, Z. YANG, C. H. XIA, and L. S. GOLDSTEIN. 2000. Axonal transport of amyloid precursor protein is mediated by direct binding to the kinesin light chain subunit of kinesin-I. Neuron 28: 449–459. KIRCHNER, J., G. WOEHLKE, and M. SCHLIWA. 1999. Universal and unique features of kinesin motors: insights
References
51
52
53
54
55
56
57
58
59
from a comparison of fungal and animal conventional kinesins. Biol. Chem. 380: 915–921. STEINBERG, G. and M. SCHLIWA. 1995. The Neurospora organelle motor: a distant relative of conventional kinesin with unconventional properties. Mol. Biol. Cell 6: 1605–1618. SEILER, S., J. KIRCHNER, C. HORN, A. KALLIPOLITOU, G. WOEHLKE, and M. SCHLIWA. 2000. Cargo binding and regulatory sites in the tail of fungal conventional kinesin. Nature Cell Biol. 2: 333–338. SKOUFIAS, D. A., D. G. COLE, K. P. WEDAMAN, and J. M. SCHOLEY. 1994. The carboxyl-terminal domain of kinesin heavy chain is important for membrane binding. J. Biol. Chem. 269: 1477–1485. SETOU, M., D. H. SEOG, Y. TANAKA, Y. KANAI, Y. TAKEI, M. KAWAGISHI, and N. HIROKAWA. 2002. Glutamate-receptor-interacting protein GRIP1 directly steers kinesin to dendrites. Nature 417: 83–87. HUANG, J. D., S. T. BRADY, B. W. RICHARDS, D. STENOLEN, J. H. RESAU, N. G. COPELAND, and N. A. JENKINS. 1999. Direct interaction of microtubule- and actin-based transport motors. Nature 397: 204–205 BLANGY, A., H. A. LANE, P. D’HERIN, M. HARPER, M. KRESS, and E. A. NIGG. 1995. Phosphorylation by p34cdc2 regulates spindle association of human Eg5, a kinesin-related motor essential for bipolar spindle formation in vivo. Cell 83: 1159–1169. COLE, D. G., W. M. SAXTON, K. B. SHEEHAN, and J. M. SCHOLEY. 1994. A ‘slow’ homotetrameric kinesin-related motor protein purified from Drosophila embryos. J. Biol. Chem. 269: 22913–22916. ENOS, A. P. and N. R. MORRIS. 1990. Mutation of a gene that encodes a kinesin-like protein blocks nuclear division in A. nidulans. Cell 60: 1019–1027. KAPOOR, T. M., T. U. MAYER, M. L. COUGHLIN, and T. J. MITCHISON. 2000. Probing spindle assembly mechanisms with monastrol, a small molecule
60
61
62
63
64
65
66
67
inhibitor of the mitotic kinesin, Eg5. J. Cell Biol. 150: 975–988. KAPOOR, T. M. and T. J. MITCHISON. 2001. Eg5 is static in bipolar spindles relative to tubulin: evidence for a static spindle matrix. J. Cell Biol. 154: 1102–1104. LE GUELLEC, R., J. PARIS, A. COUTURIER, C. ROGHI, and M. PHILIPPE. 1991. Cloning by differential screening of a Xenopus cDNA that encodes a kinesin-related protein. Mol. Cell Biol. 11: 3395–3398. SHARP, D. J., K. L. MCDONALD, H. M. BROWN, H. J. MATTHIES, C. WALCZAK, R. D. VALE, and T. J. MITCHISON. 1999. The bipolar kinesin, KLP61F, cross-links microtubules within interpolar microtubule bundles of Drosophila embryonic mitotic spindles. J. Cell Biol. 144: 125–138. NAKAJIMA, K., Y. TAKEI, Y. TANAKA, T. NAKAGAWA, T. NAKATA, Y. NODA, M. SETOU, and N. HIROKAWA. 2002. Molecular motor KIF1C is not essential for mouse survival and motor-dependent retrograde Golgi apparatus-to-endo-plasmic reticulum transport. Mol. Cell Biol. 22: 866–873. NANGAKU, M., R. SATO-YOSHITAKE, Y. OKADA, Y. NODA, R. TAKEMURA, H. YAMAZAKI, and N. HIROKAWA. 1994. KIF1B, a novel microtubule plus end-directed monomeric motor protein for transport of mitochondria. Cell 79: 1209–1220. OKADA, Y., H. YAMAZAKI, Y. SEKINE-AIZAWA, and N. HIROKAWA. 1995. The neuron-specific kinesin superfamily protein KIF1A is a unique monomeric motor for anterograde axonal transport of synaptic vesicle precursors. Cell 81: 769–780. ZHAO, C., J. TAKITA, Y. TANAKA, M. SETOU, T. NAKAGAWA, S. TAKEDA, H. W. YANG, S. TERADA, T. NAKATA, Y. TAKEI, M. SAITO, S. TSUJI, Y. HAYASHI, and N. HIROKAWA. 2001. Charcot–Marie–Tooth disease type 2A caused by mutation in a microtubule motorKIF1Bbeta. Cell 105: 587–597. KIKKAWA, M., Y. OKADA, and N. HIROKAWA. 2000. 15 Å resolution model
23
24 Kinesin Superfamily Proteins
68
69
70
71
72
73
74
75
76
of the monomeric kinesin motor: KIF1A. Cell 100: 241–252. KIKKAWA, M., E. P. SABLINA., Y. OKADA, H. YAJIMA, R. J. FLETTERICK and N. HIROKAWA. 2001. Switch-based mechanism of kinesin motors. Nature 411: 439–445. OKADA, Y. and N. HIROKAWA. 1999. A processive single-headed motor: kinesin superfamily protein, KIF1A. Science 283: 1152–1157. OKADA, Yand N. Hirokawa. 2000. Mechanism of the single-headed processivity: Diffusional anchoring between ‘k-loop’ of kinesin and the C-terminus of tubulin. Proc. Natl Acad. Sci. USA 97: 640–645. HALL, D. H. and E. M. HEDGECOCK. 1991. Kinesin-related gene unc-104 is required for axonal transport of synaptic vesicles in C. elegans. Cell 65: 837–847. OTSUKA, A. J., A. JEYAPRAKASH, J. GARCIA-ANOVEROS, L. Z. TANG, G. FISK, T. HARTSHORNE, R. FRANCO, and T. BORN. 1991. The C. elegansunc-104 gene encodes a putative kinesin heavy chain-like protein. Neuron 6: 113–122. YONEKAWA, Y., A. HARADA, Y. OKADA, T. FUNAKOSHI, Y. KANAI, Y. TAKEI, S. TERADA, T. NODA, and N. HIROKAWA. 1998. Defect in synaptic vesicle precursor transport and neuronal cell death in KIF1A motor protein-deficient mice. J. Cell Biol. 141: 431–441. DORNER, C., T. CIOSSEK, S. MULLER, P. H. MOLLER, A. ULLRICH, and R. LAMMERS. 1998. Characterization of KIF1C, a new kinesin-like protein involved in vesicle transport from the Golgi apparatus to the endoplasmic reticulum. J. Biol. Chem. 273: 20267–20275. DORNER, C., A. ULLRICH, H. U. HARING, and R. LAMMERS. 1999. The kinesin-like motor protein KIF1C occurs in intact cells as a dimer and associates with proteins of the 14-3-3 family. J. Biol. Chem. 274: 33654–33660. WATTERS, J. W., K. DEWAR, J. LEHOCZKY V. BOYARTCHUK, and W. F. DIETRICH. 2001. Kif1C, a kinesin-like motor protein, mediates mouse macrophage
77
78
79
80
81
82
83
84
resistance to anthrax lethal factor. Curr. Biol. 11: 1503–1511. NAKAGAWA, T., M. SETOU, D. SEOG, K. OGASAWARA, N. DOHMAE, K. TAKIO, and N. HIROKAWA. 2000. A novel motor, KIF13A, transports mannose-6-phosphate receptor to plasma membrane through direct interaction with AP-1 complex. Cell 103: 569–581. HANADA, T., L. LIN, E. V. TIBALDI, E. L. REINHERZ, and A. H. CHISHTI. 2000. GAKIN, a novel kinesin-like protein associates with the human homologue of the Drosophila discs large tumor suppressor in T lymphocytes. J. Biol. Chem. 275: 28774–28784. KONDO, S., R. SATO-YOSHITAKE, Y. NODA, H. AIZAWA, T. NAKATA, Y. MATSUURA, and N. HIROKAWA. 1994. KIF3A is a new microtubule-based anterograde motor in the nerve axon. J. Cell Biol. 125: 1095–1107. PESAVENTO, P. A., R. J. STEWART, and L. S. GOLDSTEIN. 1994. Characterization of the KLP68D kinesin-like protein in Drosophila: possible roles in axonal transport. J. Cell Biol. 127: 1041–1048. RASHID, D. J., K. P. WEDAMAN, and J. M. SCHOLEY. 1995. Heterodimerization of the two motor subunits of the heterotrimeric kinesin, KRP85/95. J. Mol. Biol. 252: 157–162. WEDAMAN, K. P., D. W. MEYER, D. J. RASHID, D. G. COLE, and J. M. SCHOLEY. 1996. Sequence and submolecular localization of the 115-kD accessory subunit of the heterotrimeric kinesin-II (KRP85/95) complex. J. Cell Biol. 132: 371–380. YAMAZAKI, H., T. NAKATA, Y. OKADA, and N. HIROKAWA. 1995. KIF3A/B: a heterodimeric kinesin superfamily protein that works as a microtubule plus end-directed motor for membrane organelle transport. J. Cell Biol. 130: 1387–1399. YAMAZAKI, H., T. NAKATA, Y. OKADA, and N. HIROKAWA. 1996. Cloning and characterization of KAP3: a novel kinesin superfamily-associated protein of KIF3A/3B. Proc. Natl Acad. Sci. USA 93: 8443–8448.
References 85 HIROKAWA, N. 2000. Stirring up development with the heterotrimeric kinesin KIF3. Traffic 1: 29–34. 86 TAKEDA, S., H. YAMAZAKI, D. H. SEOG, Y. KANAI, S. TERADA, and N. HIROKAWA. 2000. Kinesin superfamily protein 3 (KIF3) motor transports fodrin-associating vesicles important for neurite building. J. Cell Biol. 148: 1255–1265. 87 RAY, K., S. E. PEREZ, Z. YANG, J. XU, B. W. RITCHINGS, H. STELLER, and L. S. GOLDSTEIN. 1999. Kinesin-II is required for axonal transport of choline acetyltransferase in Drosophila. J. Cell Biol. 147: 507–518. 88 GROSS, S. P., M. C. TUMA, S. W. DEACON, A. S. SERPINSKAYA, A. R. REILEIN, and V. I. GELFAND. 2002. Interactions and regulation of molecular motors in Xenopus melanophores. J. Cell Biol. 156: 855–865. 89 REESE, E. L. and L. T. HAIMO. 2000. Dynein, dynactin, and kinesin II’s interaction with microtubules is regulated during bidirectional organelle transport. J. Cell Biol. 151: 155–166. 90 TUMA, M. C., A. ZILL, N. LE BOT, I. VERNOS, and V. GELFAND. 1998. Heterotrimeric kinesin II is the microtubule motor protein responsible for pigment dispersion in Xenopus melanophores. J. Cell Biol. 143: 1547–1558. 91 COLE, D. G., D. R. DIENER, A. L. HIMELBLAU, P. L. BEECH, J. C. FUSTER, and J. L. ROSENBAUM. 1998. Chlamydomonas kinesin-II-dependent intraflagellar transport (IFT): IFT particles contain proteins required for ciliary assembly in Caenorhabditis elegans sensory neurons. J. Cell Biol. 141: 993–1008. 92 KOZMINSKI, K. G., P. L. BEECH, and J. L. ROSENBAUM. 1995. The Chlamydomonas kinesin-like protein FLA10 is involved in motility associated with the flagellar membrane. J. Cell Biol. 131: 1517–1527. 93 MORRIS, R. L. and J. M. SCHOLEY 1997. Heterotrimeric kinesin-II is required for the assembly of motile 9 + 2 ciliary axonemes on sea urchin embryos. J. Cell Biol. 138: 1009–1022.
94 PIPERNO, G. and K. MEAD. 1997. Transport of a novel complex in the cytoplasmic matrix of Chlamydomonas flagella. Proc. Natl Acad. Sci. USA 94: 4457–4462. 95 PIPERNO, G., K. MEAD, and S. HENDERSON. 1996. Inner dynein arms but not outer dynein arms require the activity of kinesin homologue protein KHP1(FLA10) to reach the distal part of flagella in Chlamydomonas. J. Cell Biol. 133: 371–379. 96 WALTHER, Z., M. VASHISHTHA, and J. L. HALL. 1994. The Chlamydomonas FLA10 gene encodes a novel kinesin-homologous protein. J. Cell Biol. 126: 175–188. 97 MARSZALEK, J. R., P. RUIZ-LOZANO, E. ROBERTS, K. R. CHIEN, and L. S. GOLDSTEIN. 1999. Situs inversus and embryonic ciliary morphogenesis defects in mouse mutants lacking the KIF3A subunit of kinesin-II. Proc. Natl Acad. Sci. USA 96: 5043–5048. 98 NONAKA, S., Y. TANAKA, Y. OKADA, S. TAKEDA, A. HARADA, Y. KANAI, M. KIDO, N. HIROKAWA. 1998. Randomization of left–right asymmetry due to loss of nodal cilia generating leftward flow of extraembryonic fluid in mice lacking KIF3B motor protein. Cell 95: 829–837. 99 TAKEDA, S., Y. YONEKAWA, Y. TANAKA, Y. OKADA, S. NONAKA, and N. HIROKAWA. 1999. Left–right asymmetry and kinesin superfamily protein KIF3A: new insights in determination of laterality and mesoderm induction by kif3A-/-mice analysis. J. Cell Biol. 145: 825–836. 100 WHITEHEAD, J. L., S. Y. WANG, L. BOST-USINGER, E. HOANG, K. A. FRAZER, and B. BURNSIDE. 1999. Photoreceptor localization of the KIF3A and KIF3B subunits of the heterotrimeric microtubule motor kinesin II in vertebrate retina. Exp. Eye Res. 69: 491–503. 101 MARSZALEK, J. R., X. LIU, E. A. ROBERTS, D. CHUI, J. D. MARTH, D. S. WILLIAMS, and L. S. GOLDSTEIN. 2000. Genetic evidence for selective transport of opsin and arrestin by kinesin-II in mammalian photoreceptors. Cell 102: 175–187.
25
26 Kinesin Superfamily Proteins
102 JIMBO, T., Y. KAWASAKI, R. KOYAMA, R. SATO, S. TAKADA, K. HARAGUCHI, and T. AKIYAMA. 2002. Identification of a link between the tumor suppressor APC and the kinesin superfamily. Nature Cell Biol. 4: 323–327. 103 MURESAN, V., T. ABRAMSON, A. LYASS, D. WINTER, E. PORRO, F. HONG, N. L. CHAMBERLIN, and B. J. SCHNAPP. 1998. KIF3C and KIF3A form a novel neuronal heteromeric kinesin that associates with membrane vesicles. Mol. Biol. Cell 9: 637–652. 104 YANG, Z. and L. S. GOLDSTEIN. 1998. Characterization of the KIF3C neural kinesin-like motor from mouse. Mol. Biol. Cell 9: 249–261. 105 YANG, Z., E. A. ROBERTS, and L. S. GOLDSTEIN. 2001. Functional analysis of mouse kinesin motor Kif3C. Mol. Cell Biol. 21: 5306–5311. 106 SETOU, M., T. NAKAGAWA, D. H. SEOG, and N. HIROKAWA. 2000. Kinesin superfamily motor protein KIF17 and mLin-10 in NMDA receptor-containing vesicle transport. Science 288: 1796–1802. 107 SHAKIR, M. A., T. FUKUSHIGE, H. YASUDA, J. MIWA, and S. S. SIDDIQUI. 1993. C. elegans osm-3 gene mediating osmotic avoidance behaviour encodes a kinesin-like protein. Neuroreport 4: 891–894. 108 SIGNOR, D., K. P. WEDAMAN, L. S. ROSE, and J. M. SCHOLEY. 1999. Two heteromeric kinesin complexes in chemosensory neurons and sensory cilia of Caenorhabditis elegans. Mol. Biol. Cell 10: 345–360. 109 WILLIAMS, B. C., M. F. RIEDY, E. V. WILLIAMS, M. GATTI, and M. L. GOLDBERG. 1995. The Drosophila kinesin-like protein KLP3A is a mid-body component required for central spindle assembly and initiation of cytokinesis. J. Cell Biol. 129: 709–723. 110 SEKINE, Y., Y. OKADA, Y. NODA, S. KONDO, H. AIZAWA, R. TAKEMURA, and N. HIROKAWA. 1994. A novel microtubule-based motor protein (KIF4) for organelle transports, whose expression is regulated developmentally. J. Cell Biol. 127: 187–201.
111 PERETTI, D., L. PERIS, S. ROSSO, S. QUIROGA, and A. CACERES. 2000. Evidence for the involvement of KIF4 in the anterograde transport of L1-containing vesicles. J. Cell Biol. 149: 141–152. 112 KIM, W., Y. TANG, Y. OKADA, T. A. TORREY S. K. CHATTOPADHYAY, M. PFLEIDERER, F. G. FALKNER, F. DORNER, W. CHOI, N. HIROKAWA, and Morse HC, III. 1998. Binding of murine leukemia virus Gag polyproteins to KIF4, a microtubule-based motor protein. J. Virol. 72: 6898–6901. 113 WANG, S. Z. and R. ADLER. 1995. Chromokinesin: a DNA-binding, kinesin-like nuclear protein. J. Cell Biol. 128: 761–768. 114 MARSZALEK, J. R., J. A. WEINER, S. J. FARLOW, J. CHUN, and L. S. GOLDSTEIN. 1999. Novel dendritic kinesin sorting identified by different process targeting of two related kinesins: KIF21A and KIF21B. J. Cell Biol. 145: 469–479. 115 ADAMS, R. R., A. A. TAVARES, A. SALZBERG, H. J. BELLEN, and D. M. GLOVER. 1998. Pavarotti encodes a kinesin-like protein required to organize the central spindle and contractile ring for cytokinesis. Genes Dev. 12: 1483–1494. 116 KURIYAMA, R., S. DRAGAS-GRANOIC, T. MAEKAWA, A. VASSILEV, A. KHODJAKOV, and H. KOBAYASHI. 1994. Heterogeneity and microtubule interaction of the CHO1 antigen, a mitosis-specific kinesin-like protein. Analysis of subdomains expressed in insect sf9 cells. J. Cell Sci. 107: 3485–3499. 117 KURIYAMA, R., C. GUSTUS, Y. TERADA, Y. UETAKE, and J. MAULIENE. 2002. CHO1, a mammlian kinesin-like protein, interacts with F-actin and is involved in the terminal phase of cytokinesis. J. Cell Biol. 156: 783–790. 118 NISLOW, C., C. SELLITTO, R. KURIYAMA, and J. R. MCINTOSH. 1990. A monoclonal antibody to a mitotic microtubule-associated protein blocks mitotic progression. J. Cell Biol. 111: 511–522. 119 RAICH, W. B., A. N. MORAN, J. H. ROTHMAN, and J. HARDIN. 1998.
References
120
121
122
123
124
125
126
127
128
Cytokinesis and midzone microtubule organization in Caenorhabditis elegans require the kinesin-like protein ZEN-4. Mol. Biol. Cell 9: 2037–2049. SEVERSON, A. F., D. R. HAMILL, J. C. CARTER, J. SCHUMACHER, and B. BOWERMAN. 2000. The aurora-related kinase AIR-2 recruits ZEN-4/CeMKLP1 to the mitotic spindle at metaphase and is required for cytokinesis. Curr. Biol. 10: 1162–1171. FERHAT, L., R. KURIYAMA, G. E. LYONS, B. MICALE, and P. W. BAAS. 1998. Expression of the mitotic motor protein CHO1/MKLP1 in postmitotic neurons. Eur. J. Neurosci. 10: 1383–1393. SHARP, D. J., W. YU, L. FERHAT, R. KURIYAMA, D. C. RUEGER, and P. W. BAAS. 1997. Identification of a microtubule-associated motor protein essential for dendritic differentiation. J. Cell Biol. 138: 833–843. ECHARD, A., F. JOLLIVET, O. MARTINEZ, J. J. LACAPERE, A. ROUSSELET, I. JANOUEIX-LEROSEY, and B. GOUD. 1998. Interaction of a Golgi-associated kinesin-like protein with Rab6. Science 279: 580–585. FONTIJN, R. D., B. GOUD, A. ECHARD, F. JOLLIVET, J. van MARLE, H. PANNEKOEK, and A. J. G. HORREVOETS. 2001. The human kinesin-like protein RB6K is under tight cell cycle control and is essential for cytokinesis. Mol. Cell Biol. 21: 2944–2955. HILL, E., M. CLARKE, and F. A. BARR. 2000. The Rab6-binding kinesin, Rab6-KIFL, is required for cytokinesis. EMBO J. 19: 5711–5719. COTTINGHAM, F. R. and M. A. HOYT. 1997. Mitotic spindle positioning in Saccharomyces cerevisiae is accomplished by antagonistically acting microtubule motor proteins. J. Cell Biol. 138: 1041–1053. ROOF, D. M., P. B. MELUH, and M. D. ROSE. 1992. Kinesin-related proteins required for assembly of the mitotic spindle. J. Cell Biol. 118: 95–108. YEN, T. J., D. A. COMPTON, D. WISE, R. P. ZINKOWSKI, B. R. BRINKLEY W. C. EARNSHAW, and D. W. CLEVELAND. 1991. CENP-E, a novel human
129
130
131
132
133
134
135
136
centromere-associated protein required for progression from metaphase to anaphase. EMBO J. 10: 1245–1254. YEN, T. J., G. LI, B. T. SCHAAR, I. SZILAK, and D. W. CLEVELAND. 1992. CENP-E is a putative kinetochore motor that accumulates just before mitosis. Nature 359: 536–539. MCEWEN, B. F., G. K. CHAN, B. ZUBROWSKI, M. S. SAVOIAN, M. T. SAUER, and T. J. YEN. 2001. CENP-E is essential for reliable bioriented spindle attachment, but chromosome alignment can be achieved via redundant mechanisms in mammalian cells. Mol. Biol. Cell 12: 2776–2789. SCHAAR, B. T., G. K. CHAN, P. MADDOX, E. D. SALMON, and T. J. YEN. 1997. CENP-E function at kinetochores is essential for chromosome alignment. J. Cell Biol. 139: 1373–1382. TOPPER, L. M., H. BASTIANS, J. V. RUDERMAN, and G. J. GORBSKY 2001. Elevating the level of Cdc34/Ubc3 ubiquitin-conjugating enzyme in mitosis inhibits association of CENP-E with kinetochores and blocks the metaphase alignment of chromosomes. J. Cell Biol. 154: 707–717. WOOD, K. W., R. SAKOWICZ, L. S. GOLDSTEIN, and D. W. CLEVELAND. 1997. CENP-E is a plus end-directed kinetochore motor required for metaphase chromosome alignment. Cell 91: 357–366. YAO, X., K. L. ANDERSON, and D. W. CLEVELAND. 1997. The microtubule-dependent motor centromere-associated protein E (CENP-E) is an integral component of kinetochore corona fibers that link centromeres to spindle microtubules. J. Cell Biol. 139: 435–447. YUCEL, J. K., J. D. MARSZALEK, J. R. MCINTOSH, L. S. GOLDSTEIN, D. W. CLEVELAND, and A. V. PHILP. 2000. CENP-meta, an essential kinetochore kinesin required for the maintenance of metaphase chromosome alignment in Drosophila. J. Cell Biol. 150: 1–11. TOKAI, N., A. FUJIMOTO-NISHIYAMA, Y. TOYOSHIMA, S. YONEMURA, S. TSUKITA, J. INOUE, and T. YAMAMOTO. 1996. Kid, a
27
28 Kinesin Superfamily Proteins
137
138
139
140 141
142
143
144
145
novel kinesin-like DNA binding protein, is localized to chromosomes and the mitotic spindle. EMBO J. 15: 457–467. ANTONIO, C., I. FERBY, H. WILHELM, M. JONES, E. KARSENTI, A. R. NEBREDA, and I. VERNOS. 2000. Xkid, a chromokinesin required for chromosome alignment on the metaphase plate. Cell 102: 425–435. FUNABIKI, H. and A. W. MURRAY. 2000. The Xenopus chromokinesin Xkid is essential for metaphase chromosome alignment and must be degraded to allow anaphase chromosome movement. Cell 102: 411–424. LEVESQUE, A. A. and D. A. COMPTON. 2001. The chromokinesin Kid is necessary for chromosome arm orientation and oscillation, but not congression, on mitotic spindles. J. Cell Biol. 154: 1135–1146. HEALD, R. 2000. Motor function in the mitotic spindle. Cell 102: 399–402. ZHANG, P., B. A. KNOWLES, L. S. GOLDSTEIN, and R. S. HAWLEY. 1990. A kinesin-like protein required for distributive chromosome segregation in Drosophila. Cell 62: 1053–1062. SUEISHI, M., M. TAKAGI, and Y. YONEDA. 2000. The forkhead-associated domain of Ki-67 antigen interacts with the novel kinesin-like protein Hklp2. J. Biol. Chem. 275: 28888–28892. WOLF, F. W., M. S. HUNG, B. WIGHTMAN, J. WAY, and G. GARRIGA. 1998. vab-8 is a key regulator of posteriorly directed migrations in C. elegans and encodes a novel protein with kinesin motor similarity. Neuron 20: 655–666. ASCANO, M. Jr, K. E. NYBAKKEN, J. SOSINSKI, M. A. STEGMAN, and D. J. ROBBINS. 2002. The carboxyl-terminal domain of the protein kinase fused can function as a dominant inhibitor of hedgehog signaling. Mol. Cell Biol. 22: 1555–1566. ROBBINS, D. J., K. E. NYBAKKEN, R. KOBAYASHI, J. C. SISSON, J. M. BISHOP, and P. P. THEROND. 1997. Hedgehog elicits signal transduction by means of a large complex containing the kinesin-related protein costal2. Cell 90: 225–234.
146 SISSON, J. C., K. S. HO, K. SUYAMA, and M. P. SCOTT. 1997. Costal2, a novel kinesin-related protein in the Hedgehog signaling pathway. Cell 90: 235–245. 147 STEGMAN, M. A., J. E. VALLANCE, G. ELANGOVAN, J. SOSINSKI, Y. CHENG, and D. J. ROBBINS. 2000. Identification of a tetrameric hedgehog signaling complex. J. Biol. Chem. 275: 21809–21812. 148 WANG, G., K. AMANAI, B. WANG, and J. JIANG. 2000. Interactions with Costal2 and suppressor of fused regulate nuclear translocation and activity of cubitus interruptus. Genes Dev. 14: 2893–2905. 149 KIM, I. G., D. Y. JUN, U. SOHN, and Y. H. KIM. 1997. Cloning and expression of human mitotic centromere-associated kinesin gene. Biochim. Biophys. Acta 1359: 181–186. 150 NODA, Y., R. SATO-YOSHITAKE, S. KONDO, M. NANGAKU, and N. HIROKAWA. 1995. KIF2 is a new microtubule-based anterograde motor that transports membranous organelles distinct from those carried by kinesin heavy chain or KIF3A/B. J. Cell Biol. 129: 157–167. 151 MORFINI, G., S. QUIROGA, A. ROSA, K. KOSIK, and A. CACERES. 1997. Suppression of KIF2 in PC12 cells alters the distribution of a growth cone nonsynaptic membrane receptor and inhibits neurite extension. J. Cell Biol. 138: 657–669. 152 SANTAMA, N., J. KRIJNSE-LOCKER, G. GRIFFITHS, Y. NODA, N. HIROKAWA, and D. G. DOTTI. 1998. KIF2beta, a new kinesin superfamily protein in non-neuronal cells, is associated with lysosomes and may be implicated in their centrifugal translocation. EMBO J. 17: 5855–5867. 153 DESAI, A., S. VERMA, T. J. MITCHISON, and C. E. WALCZAK. 1999. Kin I kinesins are microtu-bule-destabilizing enzymes. Cell 96: 69–78. 154 TOURNEBIZE, R., A. POPOV, K. KINOSHITA, A. J. ASHFORD, S. RYBINA, A. POZNIAKOVSKY T. U. MAYER, C. E. WALCZAK, E. KARSENTI, and A. A. HYMAN. 2000. Control of microtubule dynamics by the antagonistic activities of XMAP215 and XKCM1 in Xenopus egg extracts. Nature Cell Biol. 2: 13–19.
References 155 WALCZAK, C. E., T. J. MITCHISON, and A. DESAI. 1996. XKCM1: a Xenopus kinesin-related protein that regulates microtubule dynamics during mitotic spindle assembly. Cell 84: 37–47. 156 NIEDERSTRASSER, H., H. SALEHI-HAD, E. C. GAN, C. WALCZAK, and E. NOGALES. 2002. XKCM1 acts on a single protofilament and requires the C terminus of tubulin. J. Mol. Biol. 316: 817–828. 157 WORDEMAN, L. and T. J. MITCHISON. 1995. Identification and partial characterization of mitotic centromere-associated kinesin, a kinesin-related protein that associates with centromeres during mitosis. J. Cell Biol. 128: 95–104. 158 MANEY, T., M. WAGENBACH, and L. WORDEMAN. 2001. Molecular dissection of the microtubule depolymerizing activity of mitotic centromere-associated kinesin. J. Biol. Chem. 276: 34753–34758. 159 ENDOW, S. A. and M. HATSUMI. 1991. A multimember kinesin gene family in Drosophila. Proc. Natl Acad. Sci. USA 88: 4424–4427. 160 MELUH, P. B. and M. D. ROSE. 1990. KAR3, a kinesin-related gene required for yeast nuclear fusion. Cell 60: 1029–1041. 161 SHARP, D. J., K. R. YU, J. C. SISSON, W. SULLIVAN, and J. M. SCHOLEY 1999. Antagonistic microtubule-sliding motors position mitotic centrosomes in Drosophila early embryos. Nature Cell Biol. 1: 51–54. 162 CULLEN, C. F. and H. OHKURA. 2001. Msps protein is localized to acentrosomal poles to ensure bipolarity of Drosophila meiotic spindles. Nature Cell Biol. 3: 637–642. 163 BARRETT, J. G., B. D. MANNING, and M. SNYDER. 2000. The Kar3p kinesin-related protein forms a novel heterodimeric structure with its associated protein Cik1p. Mol. Biol. Cell 11: 2373–2385. 164 MANNING, B. D., J. G. BARRETT, J. A. WALLACE, H. GRANOK, and M. SNYDER. 1999. Differential regulation of the Kar3p kinesin-related protein by two associated proteins, Cik1p and Vik1p. J. Cell Biol. 144: 1219–1233.
165 SHANKS, R. M., R. J. KAMIENIECKI, and D. S. DAWSON. 2001. The Kar3-interacting protein Cik1p plays a critical role in passage through meiosis I in Saccharomyces cerevisiae. Genetics 159: 939–951. 166 TROXELL, C. L., M. A. SWEEZY, R. R. WEST, K. D. REED, B. D. CARSON, A. L. PIDOUX, W. Z. CANDE, and J. R. MCINTOSH. 2001. pkl1(+) and klp2(+): Two kinesins of the Kar3 subfamily in fission yeast perform different functions in both mitosis and meiosis. Mol. Biol. Cell 12: 3476–3488. 167 HANLON, D. W., Z. YANG, and L. S. GOLDSTEIN. 1997. Characterization of KIFC2, a neuronal kinesin superfamily member in mouse. Neuron 18: 439–451. 168 SAITO, N., Y. OKADA, Y. NODA, Y. KINOSHITA, S. KONDO, and N. HIROKAWA. 1997. KIFC2 is a novel neuron-specific C-terminal type kinesin superfamily motor for dendritic transport of multivesicular body-like organelles. Neuron 18: 425–438. 169 NODA, Y., Y. OKADA, N. SAITO, M. SETOU, Y. XU, Z. ZHANG, and N. HIROKAWA. 2001. KIFC3, a microtubule minus end-directed motor for the apical transport of annexin XIIIb-associated Triton-insoluble membranes. J. Cell Biol. 155: 77–88. 170 XU, Y., S. TAKEDA, T. TANAKA, Y. NODA, Y. TANAKA, and N. HIROKAWA, 2002. Role of KIFC3 motor protein in Golgi positioning and integration. J. Cell Biol. 158: 293–303. 171 BERNSTEIN, M., P. L. BEECH, S. G. KATZ, and J. L. ROSENBAUM. 1994. A new kinesin-like protein (Klp1) localized to a single microtubule of the Chlamydomonas flagellum. J. Cell Biol. 125: 1313–1326. 172 PIDDINI, E., J. A. SCHMID, R. de MARTIN, and D. G. DOTTI. 2001. The Ras-like GTPase Gem is involved in cell shape remodelling and interacts with the novel kinesin-like protein KIF9. EMBO J. 20: 4076–4087. 173 KAMAL, A., A. ALMENAR-QUERALT, J. F. LEBLANC, E. A. ROBERTS, and L. S. GOLDSTEIN 2001. Kinesin-mediated axonal transport of a membrane compartment containing beta-secretase
29
30 Kinesin Superfamily Proteins
174
175
176
177
178
179
180
181
182
and presenilin-1 requires APP. Nature 414: 643–648. LEBOT, N., C. ANTONY, J. WHITE, E. KARSENTI, and I. VERNOS. 1998. Role of xKlp3, a subunit of the Xenopus kinesin II heterotrimer complex, in membrane transport between the endoplasmic reticulum and Golgi apparatus. J. Cell Biol. 143: 1559–1573. FURLONG, R. A., C. Y. ZHOU, M. A. FERGUSON-SMITH, and N. A. AFFARA. 1996. Characterization of a kinesin-related gene ATSV, within the tuberous sclerosis locus (TSC1) candidate region on chromosome 9Q34. Genomics 33: 421–429. ROGERS, S. L., I. S. TINT, P. C. FANAPOUR, and V. I. GELFAND. 1997. Regulated bidirectional motility of melanophore pigment granules along microtubules in vitro. Proc. Natl Acad. Sci. USA 94: 3720–3725. COLE, D. G., S. W. CHINN, K. P. WEDAMAN, K. HALL, T. VUONG, and J. M. SCHOLEY. 1993. Novel heterotrimeric kinesin-related protein purified from sea urchin eggs. Nature 366: 268–270. HA, M. J., J. YOON, E. MOON, Y. M. LEE, H. J. KIM, and W. KIM. 2000. Assignment of the kinesin family member 4 genes (KIF4A and KIF4B) to human chromosome bands Xq13.1 and 5q33.1 by in situ hybridization. Cytogenet. Cell Genet. 88: 41–42. SAXTON, W. M., J. HICKS, L. S. GOLDSTEIN, and E. C. RAFF. 1991. Kinesin heavy chain is essential for viability and neuromuscular functions in Drosophila, but mutants show no defects in mitosis. Cell 64: 1093–1102. PATEL, N., D. THIERRY-MIEG, and J. R. MANCILLAS. 1993. Cloning by insertional mutagenesis of a cDNA encoding Caenorhabditis elegans kinesin heavy chain. Proc. Natl Acad. Sci. USA 90: 9181–9185. HOYT, M. A., L. HE, K. K. LOO, and W. S. SAUNDERS. 1992. Two Saccharomyces cerevisiae kinesin-related gene products required for mitotic spindle assembly. J. Cell Biol. 118: 109–120. LI, H. P., Z. M. LIU, and M. NIRENBERG. 1997. Kinesin-73 in the nervous system
183
184
185
186
187
188
189
190
191
of Drosophila embryos. Proc. Natl Acad. Sci. USA. 94: 1086–1091. STEWART, R. J., P. A. PESAVENTO, D. N. WOERPEL, and L. S. GOLDSTEIN. 1991. Identification and partial characterization of six members of the kinesin superfamily in Drosophila. Proc. Natl Acad. Sci. USA 88: 8470–8474. MOLINA, I., S. BAARS, J. A. BRILL, K. G. HALES, M. T. FULLER, and P. RIPOLL. 1997. A chromatin-associated kinesin-related protein required for normal mitotic chromosome segregation in Drosophila. J. Cell Biol. 139: 1361–1371. VERNOS, I., J. HEASMAN, and C. WYLIE. 1993. Multiple kinesin-like transcripts in Xenopus oocytes. Dev. Biol. 157: 232–239. DEZWAAN, T. M., E. ELLINGSON, D. PELLMAN, and D. M. ROOF. 1997. Kinesin-related KIP3 of Saccharomyces cerevisiae is required for a distinct step in nuclear migration. J. Cell Biol. 138: 1023–1040. WESTENDORF J. M., P. N. RAO, and L. GERACE. 1994. Cloning of cDNAs for M-phase phosphoproteins recognized by the MPM2 monoclonal antibody and determination of the phosphorylated epitope. Proc. Natl Acad. Sci. USA 91: 714–718. OKAMOTO, S., M. MATSUSHIMA, and Y. NAKAMURA. 1998. Identification, genomic organization, and alternative splicing of KNSL3, a novel human gene encoding a kinesin-like protein. Cytogenet. Cell Genet. 83: 25–29. ANDO, A., Y. Y. KIKUTI, H. KAWATA, N. OKAMOTO, T. IMAI, T. EKI, K. YOKOYAMA, E. SOEDA, T. IKEMURA, K. ABE, et al. 1994. Cloning of a new kinesin-related gene located at the centromeric end of the human MHC region. Immunogenetics 39: 194–200. ENDOW, S. A., S. HENIKOFF and L. SOLER-NIEDZIELA. 1990. Mediation of meiotic and early mitotic chromosome segregation in Drosophila by a protein related to kinesin. Nature 345: 81–83. MCDONALD, H. B. and L. S. GOLDSTEIN. 1990. Identification and characterization
References of a gene encoding a kinesin-like protein in Drosophila. Cell 61: 991–1000. 192 KURIYAMA, R., M. KOFRON, R. ESSNER, T. KATO, S. DRAGAS-GRANOIC, C. K. OMOTO, and A. KHODJAKOV. 1995. Characterization of a minus end-directed kinesin-like motor protein from cultured mammalian cells. J. Cell Biol. 129: 1049–1059. 193 KHAN, M. L., C. B. GOGONEA, Z. K. SIDDIQUI, M. Y. ALI, R. KIKUNO, K. NISHIKAWA, and S. S. SIDDIQUI. 1997. Molecular cloning and expression of the Caenorhabditis elegans klp-3, an ortholog
of C terminus motor kinesins Kar3 and ncd. J. Mol. Biol. 270: 627–639. 194 YANG, Z., Ch. XIA, E. A. ROBERTS, K. BUSH, S. K. NIGAM, and L. S. GOLDSTEIN. 2001b. Molecular cloning and functional analysis of mouse C-terminal kinesin motor KifC3. Mol. Cell Biol. 21: 765–770. 195 KIM, A. J. and S. A. ENDOW. 2000. A kinesin family tree. J. Cell Sci. 113: 3681–3682. 196 SHARP, D. J., G. C. ROGERS, and J. M. SCHOLEY. 2000. Microtubule motors in mitosis. Nature 407: 41–47.
31
1
Myosin Superfamily Proteins Michele C. Kieke and Margaret A. Titus
University of Minnesota, Minneapolis, USA
Originally published in: Molecular Motors. Edited by Manfred Schliwa. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-52730594-0
1 Introduction
Actin filaments and the myosin motors associated with them play important roles in many dynamic biological processes. The classic example of actin filaments and myosin at work is during skeletal muscle contraction. But the functions of actin and myosin extend to many other cellular events, such as motility, adhesion, endocytosis, cytoplasmic streaming, neuron growth, structural maintenance and polarization. Like molecular cars on an actin track, myosins transport organelles and other cellular components, such as mRNA. Myosins can also aid in the formation or maintenance of an organized actin-based structures (such as the stereocilia of hair cells), and play roles in intracellular signal transduction pathways ([1–4], for reviews summarizing myosin functions). Myosins are molecular motor proteins that use the energy from adenosine triphosphate (ATP) hydrolysis to generate force for directed movement along actin filaments. Myosins are composed of one to two heavy chains, and one or more light chains. The heavy chain consists of several major domains and can include various other subdomains or protein motifs (Fig. ??). The relatively conserved N-terminal motor or ‘head’ domain has binding sites for both ATP and F-actin. A short region joining the head and neck (termed the ‘converter domain’) is believed to be responsible for producing the force required for movement. The neck domain contains one to six light chain binding regions termed IQ motifs, repeats of approximately 23 to 30 residues containing the sequence IQXXXRGXXXRK [5]. The divergent C-terminal globular ‘tail’ has been implicated in binding cargo and targeting the myosin to its proper location in the cell [6]. Some myosins also feature a coiled-coil domain that promotes heavy chain dimerization.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Myosin Superfamily Proteins
1 Introduction 3
The founding member of the myosin family, filament-forming class II muscle myosin, was discovered nearly a century ago, and its role in muscle contraction has been studied extensively [7, 8]. A combination of biochemical and molecular approaches has led to the identification of over 20 different myosin classes [9]. Because of the extensive amount of knowledge acquired regarding the properties of myosin II it is referred to as ‘conventional’ myosin; all other types of myosin are referred to as ‘unconventional’. The first unconventional myosin, myosin I, was described in 1973 by Pollard and Korn [10, 11]. They isolated a protein with enzymatic properties similar to myosin II (i.e. it exhibited actin-activated Mg+2 -ATPase and ATP-sensitive binding to actin) from the common freshwater amoeba Acanthamoeba castellanii that had a lower molecular weight than muscle myosin II (125 versus 200 kD) and did not form filaments. In addition, it was determined that myosin I had one head rather than two. Careful analysis of this unusual molecule revealed that it was indeed a bona fide myosin [12]. This work provided the first insights into the potential diversity and functions of the myosin superfamily. Myosin superfamily members are grouped into different classes based on phylogenetic analysis of motor domains (Fig. 2) [9, 13, 14]. Each class is designated by a Roman numeral, largely in the order of their discovery (note that we will refer to myosins using Arabic numbers for simplicity). A total of 18 classes have been officially designated, but there are at least six novel myosins that have yet to be classified. Myosins have been found in a variety of organisms but no one class is universally expressed in all phyla. The yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe have a total of five myosin genes from three classes (M1, M2, M5). These myosins are shared by higher organisms, ranging from Caenorhabditis elegans (C. elegans) to mammals. The human genome includes about 40 myosin genes from 12 classes [9]. M8, M11, and M13 are only found in plants [15]), and M14 myosins are found in parasites such as Toxoplasma gondii and Plasmodium falciparum [9]. A unique class of myosins (as yet undesignated) has been found in the ciliated protozoan Tetrahymena [16, 17], suggesting that these organisms have a distinctive set of myosins. Cells typically express multiple myosins – the expression of at least a dozen myosins in a single cell type has been described [18]. This includes several different classes of myosin, as well as two or more isoforms of several classes. Myosins from the same class can have isoform-specific roles, such as M5 isoforms in mammalian cells and yeast [19], or they can have functionally overlapping roles, such as the M1s in Dictyostelium discoideum and Saccharomyces [20–23].
← Fig. 1 Domain structure schematic for characterized myosin genes. Schematic illustrating the variety of known structural motifs found in myosin genes. A general box diagram is given for each class, although individual members of the same class may vary depending on the organism and/or particular isoform.
4 Myosin Superfamily Proteins
Fig. 2 Unrooted myosin superfamily phylogenetic tree. Phylogenetic tree from [9], constructed using myosin motor domain sequences. Species names are listed in the Abbreviations table, and some gene names have been shortened to save space. Se-
quences predicted in full or in part from genomic clones are indicated by an asterisk. Figure and legend text reprinted from Molecular Biology of the Cell (2001, 12: 780–794), with permission from the American Society for Cell Biology.
2 Functional Properties of Myosins
The existence of not only diverse myosin classes but also isoforms within those classes is a clear indication of the range of roles for this motor protein superfamily. Myosins have been implicated in many fundamental biological processes, and their functional significance is accented by the discovery that mutations in unconventional myosin genes in worms, flies, mice, and humans cause severe phenotypes such as deafness, blindness, and sterility. Despite the important and diverse roles for myosins, much remains to be learned about them at the molecular level. Only in the last decade have the intricacies of myosins been realized due to the characterization of properties such as structure, kinetics, localization, directionality, and putative functional roles. In the past five years, the rate of discovery in the myosin field has increased significantly – we are learning much about what myosins do and how they work. This chapter will summarize what is currently known about the myosin superfamily members, including their cellular functions and their roles in various diseases.
2 Functional Properties of Myosins
The motor domains of myosins are relatively well-conserved, although kinetic properties can vary from class to class. Some of the myosins are short-duty motors that most likely function as part of an assembly of motors (e.g. M2). Others are processive and can move for long distances without releasing from the actin filament (e.g. M5). It should be noted, however, that not all members of the same class have similar properties. For example, mammalian M5 is highly processive while yeast M5 is not [24, 25]. Myosins can move toward one end of the actin filament or the other, a property referred to as ‘directionality’. In contrast to the conservation between motor regions, the tail regions are quite divergent between the different myosin classes. A variety of protein motifs are found in the myosin tail region and a few are also located in the motor domain. These kinetic and structural properties can offer insight into myosin functions. 2.1 Directionality and Processivity
The directionality and processivity of myosins contribute to their diverse cellular functions [26]. Myosins move unidirectionally on actin filaments, either toward the plus-end or the minus-end of actin. The arrangement of actin filaments in the cell periphery is generally with the plus-ends (fast-growing) toward the plasma membrane. Myosins moving toward the plus-ends (i. e. M1, M5) would be expected to carry cargo and/or to localize at the cell periphery. On the other hand, myosins that move in the opposite direction, such as M6 and M9 [27, 28], would be predicted to have complementary roles. Perhaps minus-end directed myosins provide a means to transport cargos into the cell rather than toward the periphery, and thus provide an opposing activity for plus-end directed motors.
5
6 Myosin Superfamily Proteins
In addition to the directionality of the motor, the time spent bound to actin filaments is an important consideration when thinking about myosin functions. ‘Duty ratio’ refers to the filament-bound state of myosin, or more specifically to the fraction of the ATPase cycle that myosin spends strongly bound to actin. Myosins that spend a significant proportion of the cycle tightly bound to actin have a ‘highduty ratio’, while those that spend a small fraction of the cycle tightly bound are considered to have a ‘low-duty ratio’. Conventional muscle M2 is a low-duty motor [7], a property reflected in the fact that large assemblies of these motors work in coordination with each other during muscle contraction. A low-duty motor would not be suitable for persistent transport of cargo, as it would detach after a single step on actin. In contrast, motors with a high-duty ratio are good candidates to serve as long-range transport motors. Such single myosin molecules can translocate along an individual actin filament for long distances. It is also worth noting that high-duty ratio is a trait that would be advantageous for a motor suspected to provide force and maintain tension. High-duty ratio motors can be identified by detailed kinetic analyses or by in vitro motility assays of single myosin molecules. Myosins from at least three classes are processive – M5 [24], M6 [29–31], and M9b [27]. M5 and M6 are two-headed myosins, while M9b is a single-headed motor. This structural difference suggests distinct mechanisms for processive movement. 2.2 Protein Motifs Found in Myosins
The myosins possess a variety of known protein motifs at either their N- or Cterminus (see Fig. ??) that offer insight into possible myosin function and localization. These motifs can be generally categorized as membrane localization or targeting, signaling, structural, or myosin-specific. A major effort is currently underway to identify the contribution of each of these regions to myosin function. Several classes of myosin have intriguing N-terminal domains. The motor domain of M3 has a protein kinase domain that is required for its function in the termination of the phototransduction signal cascade in the Drosophila eye [32]. M16 has a series of ankyrin repeats at its N-terminus [33], and M18 has an Nterminal PDZ domain [34, 35]. The role of these two different domains in mediating protein–protein interactions suggests that the head may play a role in myosin localization apart from simply binding to actin. M15 has a unique extension at its N-terminus [36], the function of which remains to be determined but could play a role in its motor function and/or localization. The known membrane localizing motifs in the myosin tail are PH (pleckstrin homology), polybasic regions, and FERM (band 4.1/ezrin/radixin/moesin) domains. PH domains bind to the charged head groups of phospholipids in the plasma membrane and are often involved in signal transduction pathways. Thus, myosins with PH domains (M10 and the novel myoM from Dictyostelium [37–39] might act as adapters or relay proteins in signal transduction cascades, functioning to localize signaling proteins to the plasma membrane. The M1s all have a polybasic
2 Functional Properties of Myosins
domain that binds to anionic phospholipids with high affinity [2, 40], suggesting that this region directs the localization of this myosin class to a particular subset of intracellular membranes. It should be noted, however, that this domain also binds to actin and might be able to contribute to the association of M1s to actin filaments [41, 42]. FERM domains were first identified as the N-terminal region of the ERM (ezrin, radixin, moesin) family of actin-binding proteins that shares homology with the Nterminus of band 4.1 [43]. The band 4.1 protein family members participate in cell signaling events that regulate processes such as cell growth, cytoskeleton dynamics, and cell-substrate adhesion [44–46]. The FERM domains of ERM proteins bind to phospholipids as well as the cytoplasmic region of transmembrane receptors such as the CD44 adhesion molecule, linking them to the actin cytoskeleton [44, 46]. Talin, another FERM-domain containing protein, binds both the cytoplasmic tail of integrins via its FERM domain and the actin cytoskeleton via its C-terminus [47], forming mechanical links between the two components (similar to the ERMs). The M7, M10, M12, and M15s each have one or two FERM domains in their C-terminus [48]. Several different classes of myosins contain motifs in their tail regions that are commonly found in signaling proteins. These include rho-GAP (rho-GTPaseactivating protein) domains and SH3 (src homology 3) domains. M9s possess a rho-GAP domain [49, 52]. The rho-GTPases have been reported to regulate actin cytoskeleton organization [51], suggesting that M9 may act as both a motor and a signaling molecule. M4 and one subtype of M1 (the amoeboid type) each have a single SH3 domain [48]. The SH3 domain of the amoeboid M1 binds to proteins that recruit the Arp2/3 complex, implicating these motors in regulating or contributing to the directed polymerization of actin in yeast and Dictyostelium [52–55]. There are additional tail motifs that contribute either to the overall structure or function of given myosins. A number of unconventional myosins (M5, M6, M7, M8, M10, M11, M12, and M18) have stretches of predicted coiled-coil, indicating that these myosins function as dimers. The amoeboid M1s have a domain rich in the amino acids glycine, proline, and alanine (or serine or glutamate) that binds to actin with high affinity [56]. The M9s have one or more zinc-binding regions in the tail region [49, 50]. Zinc-binding motifs are also found in several signaling proteins such as protein kinase C, raf kinase, and Vav [57]. Mammalian M5 and M10 both have PEST regions in their tails [37, 58], sequences enriched in the amino acids proline, glutamate, serine, and threonine. It has been suggested that the PEST regions are susceptible to cleavage by calpain, a calcium-dependent protease [59]. If this cleavage occurs in vivo it would result in separation of the motor domain from the tail. One of the most ubiquitous and least characterized domains found in myosins is the MyTH4 (myosin tail homology) domain. MyTH4 domains are found not only in myosins (M4, M7, M10, M12, M15) but also in a kinesin from plants [48] as well as in a protein that participates in signaling processes that occur during axon guidance [60]. The precise role for the MyTH4 motif is not known, but preliminary evidence suggests that MyTH4 domains might bind to microtubules [48, 61]. In many cases
7
8 Myosin Superfamily Proteins
the MyTH4 domain immediately precedes a FERM domain, suggesting that these two domains together form a functional unit. Evidence in support of a critical role for this domain comes from the finding that mutations in the MyTH4 domains of M7 and M15 cause disease [36, 62]. Finally, both M5 and M6 have unique globular tail domains at their extreme C-termini that direct the intracellular localization of each of these myosins [19, 63]. In the case of M5a, there are several unique exons found only in M5a expressed in the skin [64]. The skin-specific exon has been shown to play an essential role in targeting M5a to the melanosome [65]. 2.3 Myosin Regulation
Motor function must be precisely regulated. The activity of myosins is regulated by either calcium or phosphorylation [3]. Most myosin light chains (LCs) are calmodulin or members of the calmodulin superfamily that bind to IQ motifs in the neck region. Myosins have from one to six IQ motifs per heavy chain (HC) and the LCs bind in the absence of calcium. Calcium binding to the associated calmodulin can result in a weakening of affinity and may result in LC loss. Binding also influences the actin-activated ATPase activity, actin binding, and can result in either reduced or increased translocation velocity of actin filaments in in vitro motility assays [66, 67]. The relationship between calcium binding to calmodulin LCs, actin binding, actin-activated Mg2+ -ATPase activity (AAA) and in vitro motility is different for each myosin. In the case of M5, micromolar calcium concentration results in higher AAA and an increased affinity for actin, but decreased motility [66, 68]. In contrast, higher calcium levels result in both decreased AAA and motility. The reader is referred to several recent reviews for a more detailed discussion of the effects of calcium-calmodulin regulation of various myosins [3, 67, 69, 70]. Regulation by HC phosphorylation occurs in the motor domain of amoeboid M1 and M6 at a residue called the ‘TEDS rule’ site [71] that resides 16 residues Nterminal from the conserved DALAK sequence located near the actin-binding site sequence. The TEDS site residue is usually either phosphorylatable (amino acids Tor S) or acidic (E or D), thus the ‘TEDS’ designation. The importance of phosphorylation in regulating unconventional myosin activity is highlighted by studies of amoeboid M1s in fungi or amoebae. These motors require phosphorylation by a PAK family kinase that is regulated by small G-proteins for actin-activated ATPase activity [72]. It should be noted that phosphorylation regulates the activity of only the lower eukaryotic forms of M1 because only these M1s have an S or T at the TEDS rule site. Mutation of the TEDS rule site to A in either yeast or Dictyostelium amoeboid M1 abolishes in vivo function [73–75]. Phosphorylation is not essential for the activity of all amoeboid M1s. Mutation of the TEDS rule site of Aspergillus MYOA to A results in a loss of virtually all AAA and motor activity, yet does not impair the in vivo function of this myosin [76]. This suggests that some M1s may play a non-motor role in certain contexts. The higher eukaryotic M1s have an E or D at
3 Diverse Functions for Myosins 9
that site and their regulation is likely to occur via Ca2+ -calmodulin rather than HC phosphorylation. Mammalian M6 is also phosphorylated by a PAK kinase at the TEDS rule site and this has been shown to moderately activate M6 motility in vitro as well the rate of Pi release [29, 77, 78]. The exact relationship between changes in the rates of in vitro motility and alterations in the kinetic properties of phosphorylated M6 are not yet clear, but ongoing investigations should resolve this issue. Recent experiments have revealed that phosphorylation in the myosin tail region can regulate the activity of myosins not by affecting enzymatic activity, but by changing the affinity of the myosin for its cargo. Calcium/calmodulin dependent kinase II (CaMKII) phosphorylates a serine in the C-terminal organelle-binding domain of Xenopus M5 in a cell-cycle dependent manner [79, 80]. The onset of mitosis results in an increase in M5 tail phosphorylation and a concomitant decrease in the level of M5 associated with melanosomes in melanophores. The increased phosphorylation is predicted to cause cargo (melanosome) release from M5 and the treatment of mitotic Xenopus extracts with CaMKII inhibitors strongly inhibits M5-melanosome dissociation. HC phosphorylation in the myosin tail region may be one mechanism for regulating myosins involved in organelle transport.
3 Diverse Functions for Myosins
Myosins participate in diverse biological functions (Tab. 1). While a number of different roles for myosins have been defined over the past few years, it has become increasingly clear with the identification of additional novel myosins that the full spectrum of functions for this family of motors has yet to be appreciated. Thus far, myosins have been implicated in a variety of processes that include cell motility and adhesion, intracellular transport, maintenance of actin-rich structures, membrane trafficking and signal transduction [1, 2]. 3.1 Non-muscle Contractile Structures
M2 forms filaments not only in muscle but in non-muscle cells as well. The nonmuscle M2 filaments are dynamic structures – they can be assembled and disassembled in particular intracellular locations as needed [81]. The critical role for M2 in cell motility will be discussed below; it is also involved in forming actinrich contractile structures during processes such as cytokinesis, morphogenesis and wound healing [82–84]. M2 is found interspersed among actin filaments that form contractile bundles between adjacent epithelial cells and in the cortical actin network that supports cell structure. At the cell periphery, a contracting acto-M2 network is thought to prevent pseudopod formation, assist in the retraction of the rear of the cell and provide resting cortical tension [85–87]. It is also found in actin filament stress fiber bundles that are thought to play a role in anchoring a cell to the substrate [88].
10 Myosin Superfamily Proteins Table 1 Summary of putative myosin functions. A brief listing of the known functions of sev-
eral different myosin classes. The question marks (?) indicate functions that have been inferred based on localization, initial phenotypic analyses and analysis of phenotypes resulting from ectopic expression of wild-type or mutant forms of a given myosin Myosin Putative role M1
M2
M3 M5
M6
Relevant organism or cell type/gene
Cell motility (leading edge) Acanthamoeba, Dictyostelium, mammalian cells Microvilli structure Intestinal epithelial cells/MYO1A Vesicle transport, Intestinal epithelial cells/MYO1A endocytosis Endocytosis Dictyostelium, yeast Endosome recycling Polarized epithelial cells/MYO1D Early endosome recycling Dictyostelium/myoB Signal transduction Stereocilia in mouse/MYO1C Cytokinesis Dictyostelium, mammal. cells, echinoderm Fission yeast/MYO2 and MYP2; budding yeast/MYO1 Morphogenesis Drosophila and Dictyostelium Cortical bundles, stress Mammalian cells fibers Cell motility (rear of cell) Acanthamoeba, Dictyostelium, mammalian cells Cell integrity and structure Drosophila photoceptor cells/ninaC Signal transduction Drosophila eye/ninaC Signal transduction (?) Limulus eye Secretory vesicle transport Budding yeast/MYO2 Vacuole Budding yeast/MYO2 transport/inheritance Golgi and peroxisome Budding yeast/MYO2 inheritance ASH1 mRNA transport Budding yeast/MYO2 Mitotic spindle orientation Budding yeast/MYO2 Particle transport Fisson yeast/MYO4 (MYO52) SER and vesicle transport Neurons Pigment granule transport Frog, fish, mouse/M5a Particle transport, filopod DRG neuron growth cones extentsion Intracellular trafficking Particle transport Transport and membrane organization (?)
Clathrin-mediated endocytosis
Mammal. cells/M5b, M5c Drosophila embryo/95F Drosophila oocyte/95F Drosophila developing sperm/95F C. elegans developing sperm/spe-15 Mouse hair cells, stereocilia/Myo6, sv Epithelial cells Polarized epithelial cells
Reference(s) [101, 102, 106] [13, 182, 184] [185, 206, 207] [20–23] [208] [209] [197, 221, 222, 224] See [89, 90] and references in [83] [92, 93] [84, 96, 97] See reference in [88] [85, 98–100, 106] [187, 188] [186, 217, 218] [219] [120, 124] [131] [133, 134] [137, 138, 141] [135, 136] [129, 257] [148–150, 192] [153, 154, 157–159] [106] [211, 212] [178] [179] [180] [181] [195, 196, 198] [191, 198–200] [63, 213, 214]
3 Diverse Functions for Myosins 11 Table 1 (Continued).
Myosin Putative role
Relevant organism or cell type/gene
Reference(s)
M7
Dictyostelium/myoi
[111]
Sensory hair cells/Myo7a, shaker-1
[197, 202, 204, 205]
Retinal pigment epithelia (RPE) Rat, human (GAP domain)
[175, 176] [49, 226, 227]
Mammalian cells
[193]
Toxoplasma/TgMyoA and Plasmodium
[113–116]
M9 M10 M14
Cell adhesion, filopod extension Lateral links, stereocilia organization Melanosome transport Signaling to actin cytoskeleton (?) Transport (?), filopod extension Gliding motility
Early studies showed the presence of M2 in the contractile ring at the equator of dividing cells and that the injection of function-blocking antibodies resulted in an inhibition of cytokinesis in echinoderm embryos [89, 90]. Subsequent molecular genetic studies in the Dictyostelium and yeast provided direct proof of a role for non-muscle M2 in cytokinesis [83, 91]. While budding yeast and Dictyostelium only have a single M2, the fission yeast Schizosaccharomyces pombe has two M2s (Myo2p and Myp2p) that seem to play non-overlapping roles in cytokinesis [92, 93]. Interestingly, in the case of Dictyostelium, M2 is only essential for cytokinesis when cells are in suspension where mutant cells become large and multinucleate. The cells are able to undergo a M2 independent cytokinesis when plated onto a substrate. Thus, Dictyostelium has more than one distinct mode of cytokinesis. One mode is dependent on M2 (‘cytokinesis A’), while another is M2-independent and instead requires adherence to a substrate (‘cytokinesis B’; [94, 95]). Finally, non-muscle M2 is required for morphogenesis in both Dictyostelium and Drosophila melanogaster [84, 96]. For example, mutations in Drosophila M2 result in embryos with defects in dorsal closure, head involution and axon patterning [84]. Further analysis of the mutant embryos suggests M2 is involved in the maintenance of cell shape and cell sheet movement. Embryos mutant for myosin phosphatase, a negative regulator of non-muscle M2, also exhibit dorsal closure defects [97].
3.2 Cell Motility and Adhesion
Myosins play a central role in cell motility. Immunofluorescence experiments have shown that M2 is concentrated in the lamellae, posterior and cortical region of polarized, locomoting cells [98–100]. Conversely, M1s are located at the leading edges of lamellipodial projections of migrating Dictyostelium and Acanthamoeba [101, 102] and in filopodia, lamellipodia, and growth cones of mammalian cells [103–105]. These findings suggest that M1 is involved in forward extension of the cell and that M2 functions in retraction at the rear of the cell body.
12 Myosin Superfamily Proteins
Dictyostelium cells mutant for either M1 or M2 have defects in locomotion [96], further supporting the essential roles for myosins in cell motility. Both M1 and M2 are required for dictating where a pseudopod is formed. M2 filaments at the cortex inhibit protrusion and the loss of this myosin results in the extension of pseudopodia around the periphery of the cell and a 50% decrease in velocity [87]. Dictyostelium lacking the M1s myoA or myoB are not deficient in extending pseudopodia, rather they make excess pseudopodia that are not extended in the typically ordered manner (i. e. one at a time) [106, 107]. Similarly, chromophore-assisted laser inactivation (CALI) of M1 in neuronal growth cones causes expansion of lamellipodia [108]. These observations suggest that M1s have a general role in regulating pseudopod formation, an actin-dependent process. The amoeboid M1s have recently been shown to be linked to the Arp2/3 actin-polymerization machinery and may aid in focusing the site of its action to direct the site of the growing pseudopod [52–54, 109]. Characterization of a Dictyostelium M7 null mutant revealed a role for this class of myosin in cell adhesion [110]. Mutant cells displayed reduced contact between the cell surface and substrate during migration. Loss of adhesion to surfaces results in a defect in phagocytosis [110, 111] due to reduced binding to particles. Preliminary analysis suggests that the loss of adhesion is not due to a loss of cell surface receptors. Since the early stages of phagocytosis and cell motility are similar, it is not completely surprising that M7 is important for both processes. It has been proposed that this myosin participates in adhesion receptor complex assembly and disassembly at the plasma membrane [110]. Apicomplexan parasites such as Toxoplasma gondii and Plasmodium falciparium move by an unusual means termed gliding motility [112]. These cells tightly anchor themselves to their host cell and then propel themselves into the cell by translocating the sites of attachment towards the rear of the cell. No change in cell shape is involved in their movement, in contrast to amoeboid cells. Treatment of cells with actin depolymerizing drugs inhibits gliding motility [113]. These parasites uniquely express M14, a class of myosin that has a motor domain, virtually no neck, and a short tail [114, 115] and one of these, TgMyoA, is localized to the plasma membrane [79, 80]. Recent biochemical analysis of TgMyoA [116] reveals that it is associated with a novel, relatively large LC (31 kD versus 20 kD) that is calmodulin-like and binds to the C-terminal 53 amino acids of the TgMyoA HC. The biochemical properties of this unusual myosin are remarkably similar to those of M2. In spite of a low degree of sequence similarity to other myosins in the converter and LC binding domains, TgMyoA propels actin filaments at a rate of ∼5 µm s−1 and has a step size of 5 nm. These data suggest that this myosin is capable of generating the forces required for propelling gliding motility. The development of an inducible expression system for Toxoplasma [117] should facilitate analysis of the role of this fascinating myosin in gliding motility. Myosins contribute to various aspects of cell migration – directed protrusion of the lamellipodium, adhesion to the substrate and retraction of the rear of the cell. The importance of an actomyosin-based mechanism for the propulsion of a wide array of cells is evidenced by the finding that an unusual myosin, M14, powers
3 Diverse Functions for Myosins 13
a unique form of motility. Given the complex series of events that occur during migration, it is likely that other myosins might also be found to participate in cell migration as new members of the myosin superfamily are studied. 3.3 Organelle/Cellular Component Transport
The proper localization of intracellular components at the necessary time(s) is important for cell function. Actin- and microtubule-dependent motors each play essential roles in ensuring that cellular components (cargos) reach the right place at the right time. Current evidence suggests that microtubule motors serve to translocate organelles for long distances while the actin-based motors move things locally [118, 119]. Myosins appear to transport cellular cargos that range from vesicles to mRNA, and the best-studied of the transport myosins is M5. Evidence for cargo transport by M5s comes from the study of yeast as well as pigment cells from mouse, fish and frog [118, 119]. The yeast Saccharomyces cerevisiae has two M5 genes (MYO2 and MYO4). The product of the MYO2 gene (Myo2p) is essential for viability Yeast with the temperature sensitive (ts) mutation myo2-66 (a point mutation in the N-terminal actin-binding domain) exhibit an accumulation of secretory vesicles and arrest in a large, unbudded state at the restrictive temperature [120, 121]. It has been shown that Myo2p binds vesicles via its tail region and transports them along actin filaments from the mother cell to the bud [122–124]. Genetic data suggest that the Myo2p tail also binds to a kinesin homolog (Smy1p) and a vesicle-associated Rab protein (Sec4p) [14, 122, 124, 127, 128]. The association between Myo2p and Smy1p is supported by co-localization and yeast two-hybrid experiments [121, 125]. These data contribute to the growing body of evidence that actin filament and microtubule systems cooperate. An M5 from fission yeast, Myo4p (also referred to as Myo52), has also been implicated in organelle transport. Myo4p/Myo52p is localized to small punctae that accumulate at sites of polarized growth at the ends of the cells and is also found at the septum. These particles appear to move rapidly within the cell. Deletion of myo4/myo52 results in rounder cells that exhibited an accumulation of internal vesicles [93, 129]. In addition to its role in transporting materials to sites of cell growth and division, Myo4p/Myo52p has also been found to play a role in vacuole fusion in response to osmotic stress [130]. Further work has revealed a function for M5 in vacuole inheritance in budding yeast [131]. The myo2-66 ts mutant was found to exhibit aberrant vacuole partitioning and inheritance by the daughter cell prior to division. A screen for additional myo2 alleles resulted in the identification of the myo2-2 mutant yeast which was also defective for vacuole inheritance but did not display the cell growth defects observed with myo2-66 [122]. The mutation was found to reside in a region of the Myo2p tail specifically required for vacuole binding [132]. It appears that Myo2p is required for two separate processes in yeast: the transport of secretory vesicles important for polarized cell growth and the segregation of vacuoles (an event not
14 Myosin Superfamily Proteins
essential for polarized cell growth). This result has exciting implications for the identification of multiple cargos, and thus multiple cargo binding receptors, for the same molecular motor protein. Like vacuoles, peroxisomes and Golgi elements are also segregated into the yeast bud during cell growth. Movement of these organelles from mother to daughter requires the actin cytoskeleton, and it was demonstrated that Myo2p provides the means for their transport. In cells containing the myo2-66 ts allele, peroxisome movement stops abruptly at the non-permissive temperature [133] and the inheritance of late Golgi elements is inhibited [134]. In addition to its role as a transport motor, yeast Myo2p has been shown to participate in mitotic spindle orientation [135, 136]. Proper spindle orientation is important for accurate chromosome segregation. The tail region of Myo2p binds Kar9p (a microtubule-associated protein) in yeast two-hybrid and co-immunoprecipitation experiments, and Myo2p is required for Kar9p transport into the bud. It appears that Myo2p and Kar9p both play essential roles in spindle orientation. Current models propose that Kar9p binds to both microtubules and Myo2p and that Myo2p transports both into the growing bud [135, 136]. This analysis provides another link between the actin filament and microtubule cytoskeleton systems. The other M5 in budding yeast, Myo4p, has been shown to participate in the asymmetric localization of the transcriptional repressor, Ash1p [137, 138]. Ash1p represses HO endonuclease, the protein that induces mating type switching in yeast [139]. Thus, localization of Ash1p to the budding daughter cells suppresses mating type switching [137, 138]. ASH1 mRNA is asymmetrically segregated to the bud tip of the growing daughter cell [140, 141], and its movement into the bud tip can be monitored by tagging it with green fluorescent protein (GFP) [142, 143]. Myo4p and two other She proteins (She2p and She3p) co-localize with ASH1 mRNA [140], and all four components co-immunoprecipitate from cell extracts [22, 131, 146, 147]. Myo4p co-immunoprecipitates with the She3p adapter in the absence of RNA, while She2p (a second adapter) recruits the Myo4p–She3p complex to ASH1 mRNA [144– 147]. The association between She3p and Myo4p appears to be mediated by their coiled-coil regions [144]. M5 may also have a vesicle transport role in neurons, where it was found to move extruded axon smooth endoplasmic reticulum (SER) and brain vesicles in vitro [148, 149]. Antibodies against the M5 head or tail were able to inhibit this movement [148, 149] and immunogold electron microscopy revealed the co-localization of M5 and a kinesin on the SER vesicles [149], consistent with a model of actin filament and microtubule system cooperation. M5 has also been found to be associated with organelles in live neurons [150]. Analysis of neurons from dilute mice lacking M5a reveals that organelles still undergo bidirectional movement along microtubules. However, loss of this motor causes an accumulation of organelles in regions of the cell enriched for dynamic microtubule ends, suggesting that M5a is responsible for dispersal of these organelles to actin-rich areas of the cell [65, 150]. The role of M5 in pigment granule transport has been well characterized in frog (Xenopus laevis), fish and mouse melanophores – cell types that have traditionally served as model systems to study intracellular transport mechanisms
3 Diverse Functions for Myosins 15
[151, 152]. Pigment granules in Xenopus melanophores undergo rapid and tightly regulated aggregation and dispersal, creating changes in color in response to external stimuli [151]. Transport of pigment granules seems to involve both the actin filament and microtubule cytoskeletal systems [153, 154]. Purified granules from frog melanosomes contain bound M5, kinesin II, and dynein, and have been shown to move along actin filaments (as well as microtubules) in vitro [80, 155] (Fig. 3a). Recent in vivo analysis of melanosome movement suggests that the motor action of kinesin and dynein are in a ‘tug-of-war’ with M5 and that M5 activity is downregulated during the aggregation phase [156]. Melanosome transport to the cell periphery by M5 has also been studied in mouse melanocytes [152]. Mouse melanosomes containing pigment are transported from the center of the cell to the periphery, where they are taken up by keratinocytes. The pigment is eventually deposited into the hair shaft, causing changes in coat color. Loss of M5a (dilute) in the mouse results in a lightened or ‘diluted’ coat color because the pigment granules are not transported to the dendritic tips of the melanocytes in both cultured cells [157, 158] and in situ [159]. M5 was shown to localize to the melanosomes [160–162]. Expression of the globular tail domain in cultured melanocytes disrupts the binding between endogenous M5 and melanosomes, causing a clumping of these organelles in the central region of the cell that leads to the dilute phenotype [163]. Mutations at two other murine loci (ashen and leaden) cause the same phenotype as dilute – a lightened coat color due to pigment granule transport defects [164]. The ashen gene was found to encode the Rab27a GTPase, and leaden encodes a Rab binding protein named melanophilin [165, 166]. The molecular relationship between the gene products of these loci has only recently been elucidated. The receptor for M5a (dilute) has been identified as a complex of two proteins, Rab27a (ashen) and melanophilin (leaden, also referred to as Slac2-a) [163, 167– 172]. Rab27a was shown to first bind the melanosome, then recruit melanophilin, which subsequently recruits M5a into the complex (Fig. 3b). The C-terminal region of melanophilin binds to the C-terminal domain of M5a [172, 173]. Binding of melanophilin to Rab27a occurs in a GTP-dependent manner, providing a means of controlling M5a association to the melanosome [163, 174]. It is of interest to note that M7a is associated with melanosomes in retinal pigment epithelium (RPE) and that these melanosomes are not correctly distributed in the shaker-1 M7a mutant [175]. This observation, coupled with the recent identification of a rabphilin [176] as an M7a tail binding protein, suggests that M7a may also have a role in melanosome transport in the RPE. M6 has also been implicated in organelle transport, consistent with it being a high-duty ratio motor [177]. The fly M6, designated 95F, is present on moving particles of unknown identity in the syncytial blastoderm. These particles appear to move on linear tracks. An inhibitory polyclonal antibody against 95F was able to block the linear (non-random diffusion) motion of the particles. The effect was similar to what was observed following treatment of the embryos with either actin-depolymerizing agent cytochalasin D or ATP inhibitor dinitrophenol (DNP), suggesting that 95F M6 powers the movement of these particles. These particles appear to be
16 Myosin Superfamily Proteins
Fig. 3 Models for melanosome transport by M5a. (a) Schematic of melanosome transport in a Xenopus melanophore, highlighting the involvement of both microtubule-based motors (cytoplasmic dynein and kinesin II), and the actin-based motor M5a. In the dual filament model of transport proposed by Langford (discussed in [151]), melanosomes are transported over a long distance from the center of the melanosome to the cel periphery on microtubules by dynein and kinesis. Once they reach the periphery,
they are transferred to the actin-filament cytoskeleton where local transport occurs via M5a. (b) A more detailed, molecular view of melanosome transport by M5a is diagrammed. A protein complex composed of Rab27a and melanophilin was shown to act as a receptor for M5a, targeting the motor to the melanosome [55, 93, 170, 172]. Rab27a first binds the melanosome then recruits melanophilin, which subsequently recruits M5a into the complex.
3 Diverse Functions for Myosins 17
associated with the invaginating plasma membrane and may be required for supplying membrane or other material to the growing invagination during the rapid cycles of cell division that occur in the early fly embryo [178]. Fluorescence time-lapse microscopy experiments with anti-95F antibodies revealed a role for fly M6 in the transport of cytoplasmic particles from the nurse cell to the oocyte [179]. Microinjection of mitochondria-specific dyes into the developing oocyte suggested that some of these particles were mitochondria. Partial loss of function mutations in the 95F gene revealed a role for M6 in sperm development [180]. The jaguar mutation disrupts the first exon of the 95F myosin, causing male sterility due to a defect in sperm individualization (the process of enclosing each spermatid with its own membrane from what was once a syncytium). Consistent with this phenotype, 95F is localized to the leading edge of the individualization complex (IC), an actin-rich structure that surrounds the nuclei as the process of individualization proceeds [180]. This region is enriched in vesicles and it is possible that 95F could participate directly in the transport of membranes to the IC. Alternatively, it may contribute to the organization or stabilization of a membrane-rich leading edge of the IC that is required for efficient delivery of plasma membrane to the sperm undergoing individualization. Interestingly, M6 is also critical for sperm development in C. elegans [181]. Animals with a null mutation in the spe-15 gene are sterile and exhibit defects in the segregation of various cellular components during spermatogenesis. M6 is thus essential for transport and/or membrane reorganization during spermatogenesis in both flies and worms, and might play a conserved role despite the significant differences in sperm development between these two organisms. Myosins have also been implicated in nuclear movements. The unique myosin from the ciliated protozoan Tetrahymena thermophila, MYO1, has been found to be required for macronucleus elongation, a process essential for correct segregation of genetic material to daughter cells [17]. This result raises the intriguing possibility that myosins play a role in the transport or motility of nuclei in some contexts. 3.4 Maintenance of Actin-rich Extensions
M1, M5, M6, M7, M10 and M15 are found in actin-rich cell extensions such as microvilli, filopodia and stereocilia. Their exact roles in the generation or maintenance of these structures remain unclear at this time. Several myosins have been proposed to provide materials, force and tension required to maintain these structures. One of the best characterized M1s is the brush border subtype MYO1A (better known as BBM1) that forms links between actin bundles in the microvillar core and overlying plasma membrane in brush-border intestinal microvilli [182, 183]. It has been hypothesized to provide structural support to the microvilli, but recent analysis of GFP-tagged MYO1A in an epithelial cell line suggests that it is rapidly exchanged with a cytosolic pool [184], raising questions about how this myosin might play a structural role. It has also been localized to Golgi-derived vesicles, suggesting that its role is largely to deliver membrane to the growing microvillus
18 Myosin Superfamily Proteins
[185] and that it might also contribute to the dynamic growth or changes in flexibility of this structure [184]. The Drosophila M3, ninaC (neither inactivation nor after potential C), was identified by the characterization of a vision mutant defective in photoreception. NINAC is expressed specifically in fly photoreceptor cells as two alternatively spliced proteins [186]. One form localizes to the microvillar extensions of photoreceptors (rhabdomeres), while the other form is found in cell bodies [187, 188]. Cells null for NINAC undergo retinal degeneration [188]. M3 was also recently identified in humans [189], but whether or not it functions in human vision is not yet apparent. M5 is found at the tips of microvilli and is also present in filopodia [190, 191]. CALI of M5 in growth cones of dorsal root ganglion (DRG) neurons results in rapid filopod retraction [108], suggesting a role in delivery of materials required for filopod growth or for growth cone function. Fractionation studies and in vitro motility assays are consistent with this model of M5a action [148, 192]. However, analysis of cultured dilute neurons did not reveal any defects in growth cone dynamics [190]. The discrepancy between the CALI study and the analysis of cultured dilute neurons remains to be resolved, but it is possible that another M5 can compensate partially for the loss of M5a in the mutant mouse neurons. Dictyostelium M7 and mammalian M10 are both localized to the tips of filopodia [37, 110]. Deletion of the Dictyostelium M7 results in a loss of filopodia extension [110]. It is not yet known if this is due to the loss of delivery of materials necessary for making filopodia (as appears to be the case for M5), or if this motor is required for stabilizing this type of actin-rich extension. Overexpression of M10 in COS-7 cells, in contrast, results in the production of longer filopodia in increased numbers [193]. The association of M10 with particles that move up and down along the filopodial shaft suggests that this motor might be required for the targeted delivery of materials to the growing filopod. The inner ear sensory hair cells in mice have actin-rich extensions (modified microvilli) called stereocilia at their apical surface (Fig. 4). These structures are organized into units or bundles that move together when stimulated. The bundles are deflected in response to sound vibrations and gravity, resulting in the opening of ion channels and eventually transduction of electrical signals to the brain [194]. The Snell’s waltzer (sv) mouse mutant carries an intragenic deletion in the gene encoding M6 (Myo6) [195]. These animals are deaf, and also display hyperactivity and circling behavior [196]. M6 is localized to the base of the stereocilia and is enriched in the cuticular plate (a region of densely-packed actin filaments that the stereocilia are embedded into), as well as being diffusely distributed throughout the cytosol [195, 197]. Additionally, it is also found in the pericuticular necklace (a region at the periphery of the apical surface of the cell that is rich in vesicles) and is associated with punctae throughout the cytoplasm of the hair cells [197]. Molecular analysis of the inner ear hair cells in sv mice supports a role for M6 in maintaining stereocilia structure. Stereocilia bundles appear normal at birth, but by 3 days after birth the bundles become fused together and disorganized [198]. One model for this fusion is that M6 provides the force required to anchor the
3 Diverse Functions for Myosins 19
Fig. 4 Schematic of myosins in the hair cell of the ear. Several myosins play critical roles in the mammalian ear–mutations in M2, M3, M6, M7a and M15 have all been linked to human deafness. The apical tip of the hair cel features actin-rich extensions called stereocilia. M1 localizes near the tip of the stereocilia in close proximity to tip links, which are likely attached to gated ion channels that open upon deflection of the hair cell bundle following stimulation [223]. M6 localizes to the base of the stereocilia, and is also enriched in the cuticular plate and perinuclear necklace [195, 197]. It is
believed to provide the force necessary to anchor the stereociliar membrane at the base of each extension. M7a localizes along the length of each stereocilium and is concentrated at site of lateral links, which are thought to be important for maintaining bundle integrity [197]. M7a may anchor cadherins at lateral link sites via an association with vezatin [204]. Finally, M15 also localizes to stereocilia and is concentrated in the cuticular plate [36]. Its role in the hair cell has not yet been elucidated. Note that the localization of neither M2 nor M3 in the sensory hair cells of the ear is as yet known.
membrane at the base of each individual stereocilium. The organization of actin filaments in the stereocilia and the minus-end directionality of M6 are such that this motor may help pull down the membrane around each stereocilium. In the absence of M6, discrete stereocilia form, but these structures are not preserved as the mice age resulting in stereocilia fusion.
20 Myosin Superfamily Proteins
It should be noted that although M6 is expressed ubiquitously in mammals [195, 199], the sv mice have no gross phenotypes other than deafness [195, 196]. However, their intestinal microvilli are shorter than those found in matched controls [198], consistent with M6 playing an important role in the maintenance of these specialized actin-filled structures. Despite the aberrant microvillar length, no gross defects in digestion have been reported. It may be necessary to subject the mice to some type of stress in order to uncover the consequences of shorter microvilli. The hypothesized role for M6 in membrane reorganization during stereocilia bundle formation/maintenance, along with membrane reorganization during fly embryogenesis and sperm development, follows a recurring theme for M6 involvement in membrane remodeling and maintenance. As observed in stereocilia, M6 localizes at the base of microvilli (another type of actin-rich structure) in brush border epithelial cells [191, 199, 200] where it may provide force for anchoring the membrane around each microvillus. A role for M6 in endocytosis could also relate to various biological processes involving membrane restructuring. As a high-duty ratio motor [29], M6 is a motor ideally adapted for maintaining membrane tension and providing force. M7a may also be important for aspects of stereocilia integrity. As with the Snell’s waltzer mouse, mice mutant for M7a (shaker-1) are deaf and display balance dysfunction [201]. Their hair cells extend stereocilia, but these are splayed apart [202]. Immunofluoresence studies have shown that M7a protein localizes along the length of each stereocilium and is concentrated at sites of lateral links that are believed to be important for maintaining the integrity of the stereocilia bundles [203]. Perhaps M7a is involved in organizing the stereocilia into bundles to provide rigidity during bundle deflections. M7a may be required to anchor cadherins at the lateral link sites via an association with vezatin, a ubiquitous, novel transmembrane protein [204]. 3.5 Membrane Trafficking
The M1s have been suggested to play a role in vesicle transport and endocytosis based on localization studies. As mentioned above, MYO1A in intestinal epithelial cells is associated with Golgi-derived vesicles that potentially provide membrane to the base of microvilli [185]. MYO1A is localized to both endosomes and lysosomes and expression of a truncated myosin dominantly inhibits the delivery of endosomal contents to lysosomes [205, 206]. It has also been implicated in the trafficking of basolateral vesicles to the apical plasma membrane in polarized epithelial cells [207]. The structurally similar MYO1D has been shown to play a role in trafficking of recycling endosomes [208]. Interestingly, an amoeboid M1 from Dictyostelium (myoB) has also been implicated in recycling from early endosomes, suggesting that there is some basic conservation of M1 function [209]. Kinetic analysis of MYO1A indicates that this mammalian M1 is a short-duty motor [210]; therefore, if it acts as an organelle motor, it should be present in a complex of motors. Such
3 Diverse Functions for Myosins 21
a complex has yet to be identified, and indeed a bona fide MYO1A binding protein remains to be found. The different trafficking roles that various M1 isoforms play suggest that there are specific targeting molecules for each one, but these have not yet been identified. Characterization of such proteins should provide interesting insights into how the targeting of individual myosins may contribute to exquisite specification of their function. Two M5 isoforms, M5b and M5c, have also been implicated in intracellular trafficking. M5b co-localizes with transferrin and Rab11a to intracellular perinuclear vesicles [211]. Expression of the M5b tail alone disturbs the recycling of transferrin to the plasma membrane in both polarized and non-polarized cells. Two-hybrid analysis revealed that M5b interacts with Rab11a, but attempts to verify this interaction in vivo have not yet been successful [211]. Similarly, M5c co-localizes with Rab8 on intracellular vesicles and expression of the tail domain alone perturbs transferrin trafficking [212]. The available data suggest that M5c is associated with a compartment distinct from that of M5b, another indication that even within a class of myosins there can be precise functional specificity of individual isoforms. M6 has been suggested to participate in clathrin-mediated endocytosis in cultured polarized epithelial cells [63]. Endocytosis is important for nutrient uptake in the cell, along with immune system defense against microorganisms and cell-surface receptor recycling. The actin cytoskeleton plays a role in endocytosis, perhaps by providing a structural framework for trafficking and/or supplying the track for myosin motors transporting vesicles. M6 tagged with GFP was shown to colocalize with clathrin-coated vesicles via its C-terminal tail in polarized Caco-2 cells [63]. It was found in a protein complex with adaptor protein-2 (AP-2) by pull-down assays and co-immunoprecipitation. The AP-2 protein is a component of clathrin-coated vesicles associated with the plasma membrane. Overexpression of the M6 tail domain reduced the uptake of transferrin in transfected cells by over 50 % [63] In contrast, M6 is found in association with the Golgi complex in non-polarized cells as well as in the actin-rich ruffles of growth factor-stimulated cells [77]. The difference in observed localization when compared to polarized cells has been attributed to the lack of an epithelial cell-specific insert in the globular tail region. These findings reiterate how specific residues in the myosin tail play a significant role in specifying subcellular localization and function. M6 has also been shown by two-hybrid screens and co-immunoprecipitation to associate with Disabled-2 (Dab2), a protein involved in endocytosis and signal transduction [213, 214]. The central region of Dab2 is important for AP-2 binding, while the C-terminal tail mediates binding to M6 [214]. Dab2 transiently co-localizes with members of the low-density lipoprotein receptor (LDLR) family in clathrin-coated pits and is thought to regulate receptor protein trafficking [215]. It has been proposed that the M6–Dab2 interaction provides a link between the actin cytoskeleton and receptors undergoing endocytosis [214]. It remains to be determined if M6 acts as a motor for endocytic vesicles or if it participates in the organization of an actin-based sub-membranous domain where initial sorting of endosomal contents occurs.
22 Myosin Superfamily Proteins
3.6 Signal Transduction
The role of myosins in signaling was first eludidated in the fly eye. Illumination of the fly eye with pulses of light results in a distinctive negative potential followed by a rapid return to baseline each time the eye is stimulated. The ninaC (M3) mutant has a larger negative potential upon stimulation followed by a slower return to baseline after cessation of the stimulus [186, 216]. The N-terminal kinase activity, as well as calmodulin binding to sites in the C-terminal region, are required for normal phototransduction [217, 218]. The role of NINAC in normal phototransduction is separate from its role in maintaining the overall integrity of the rhabdomere microvilli [172]. M3 is also found in Limulus polyphemus (horseshoe crab) eyes where it was identified as a phosphoprotein that undergoes clock-regulated phosphorylation (i. e. horseshoe crabs have circadian neural input that changes phototransduction in the eye, among other things) possibly by a cAMP-dependent protein kinase [219]. Interestingly, Limulus NINAC has also been shown to be a phosphoprotein that can be phosphorylated in vitro by protein kinase C (PKC) [219]. Mutation of PKC sites in the tail region results in defects in deactivation after the light stimulus has been shut off. Recent evidence suggests that NINAC is a component of the ‘signalplex’, a cluster of molecules required for phototransduction and that it binds to INAD (a PDZ protein) through its C-terminus [220]. NINAC does not require INAD binding for localization to the rhabdomeres, but their interaction is essential for termination of the electrophysiological response to light [220]. MYO1C (also known as myr 2, myosin 1β) plays an unusual role in signal transduction. This M1 localizes near the tips of the sensory hair cell stereocilia where a fine fiber known as the tip link extends from the side of one stereocilium to the top of the adjacent, shorter stereocilium [197, 221, 222]; Fig. 4. Elegant biophysical studies have shown that the tip link is likely attached to a calcium channel that opens upon deflection of the hair bundle following an auditory stimulation [223]. The cell becomes depolarized, but if the stimulus persists adaptation occurs. Adaptation has been proposed to occur when a myosin molecule climbs up the actin filaments in the core of the stereocilium, acting to reset the tension on the tip link and close the channel. An exciting new chemical genetics approach has been used to test this model of MYO1C function [224]. Introduction of a single mutation in the ATP binding pocket of MYO1C (Y61G) allows the mutant molecule to bind a modified ADP (NMB-ADP) that locks it into the rigor state. Wild-type MYO1C does not bind NMBADP and is not inhibited by it, whereas the Y61G-MYO1C functions normally in the presence of ATP but not with NMB-ADP [225]. Application of NMB-ADP to mouse utricular hair cells expressing small amounts of Y61G-MYO1C has a dramatic effect – it abolishes adaptation [224]. These data provide strong evidence in support of MYO1C being the adaptation motor and suggest a novel role for myosins in the ion channel gating.
4 Myosins in Disease
Human and rat M9s are predicted to contain a GAP domain in their tail regions with approximately 30% sequence identity to GAP proteins of the Rho small Gprotein subfamily [226, 227]. Rho belongs to a family of signaling proteins in the Ras superfamily of monomeric GTPases that act as molecular ‘switches’ whose activity depends on their nucleotide conformation state (GTP-bound versus GDPbound). Activation of Rho leads to the formation of actin filament bundles and focal adhesions, among other processes [51]. GAP proteins negatively regulate these processes by stimulating GTP hydrolysis of Rho, converting it from an active state (GTP-bound) to an inactive (GDP-bound) state. M9s have been shown to stimulate the GTPase activity of Rho A, B and C [50, 57, 226], and as predicted the overexpression of M9 in cultured mammalian cells leads to a loss of stress fibers and focal adhesion contacts [49]. The exact role for M9 in signal transduction is not known, but it is interesting to note that this motor has been shown to be processive and to move toward the minus-end of actin filaments [27]. An unusual myosin from Dictyostelium may also play a role in intracellular signaling. MyoM (to date an undesignated myosin) has a functional GEF domain at its extreme C-terminus [38, 39]. While deletion of this myosin does not result in any overt phenotype, expression of the tail region alone resulted in a decrease in growth rates and a hypersensitivity to osmotic stress [38, 39]. Cells exposed to water extended broad, actin-filled protrusions with the myoM tail at their tip, suggesting that the GEF domain is capable of signaling to the actin cytoskeleton under conditions of stress [38]. It remains to be determined whether this phenotype reflects the true function of myoM, but the fact that the behavior of the cells is only affected under specific conditions suggests that the observed phenotype is not due to non-specific misregulation of GEF activity.
4 Myosins in Disease
The critical roles for unconventional myosins are highlighted by their connection to a variety of human disorders, primarily those related to sensory dysfunctions. A common theme of structural roles in the inner ear and retina emerge for three classes of myosins. Mutations in genes encoding M3, M6, M7 and M15 have been implicated in human hearing disorders and deafness in mice. Myosin functions in hearing and other cellular processes related to disease phenotypes are just beginning to be uncovered. Perhaps the best-known relationship between myosin mutation and disease is in familial hypertropic cardiomyopathy (FHC; see the Chapter 19 by Konhilas and Leinwand), an autosomal dominant condition characterized by shortness of breath, angina, and heart palpitations that can lead to heart failure, stroke and sudden death [228]. FHC can be caused by mutations in β cardiac myosin HC. Myosin LC mutations have also been associated with human heart myopathies similar to FHC [229].
23
24 Myosin Superfamily Proteins
4.1 Griscelli Syndrome
Griscelli syndrome is characterized by severe primary immunodeficiency along with pigment dilution of hair, eyebrows and eyelashes [230, 231]. A role for M5 in this disorder was hypothesized to be due to the phenotypic similarities between Griscelli syndrome patients and the dilute mouse mutant (described above; [232]). For example, pigment accumulation has been observed in the center of Griscelli melanocytes (similar to the accumulation seen in dilute melanocytes; [157, 158, 230]. M5a was subsequently shown to co-localize with melanosomes in cultured human melanocytes [160]. Genetic evidence supports a role for M5 in this disorder. The MYO5A gene maps to the disease locus 15q21 and at least two M5 mutations have been identified in Griscelli patients (type 1 Griscelli syndrome; Pastural et al., [231, 233]). Some Griscelli patients were reported to have immunodeficiency disorders [230, 234], but this phenotype was inconsistent with the mouse dilute phenotype. However, the RAB27A gene is quite close to MYO5A – they are 1.6 cM apart- and RAB27A has also been linked with this syndrome (type 2 Griscelli syndrome; [235]. Rab27a is localized with M5s on melanosomes [168, 172] and these two proteins co-immunoprecipitate from melanocyte extracts [168]. Rab27a plays a role in granule exocytosis while M5a does not [236]. Thus, MYO5A mutations are associated with the pigmentation and neurological aspects of Griscelli disease, while RAB27A mutations are associated with the immunological disease form of Griscelli [235]. Mutations in the non-muscle M2 encoded by MYH9 have recently been implicated in three giant-platelet disorders – May–Hegglin anomaly (MHA), Sebastian syndrome (SBS) and Fechter syndrome [237, 238]. All three disorders are characterized by thrombocytopenia, large platelets and leukocyte inclusions [237, 238]. Based on similarities between the three disorders and mapping experiments, it was proposed that MHA, SBS and FTNS are allelic. One unique feature of FTNS is sensorineural deafness, consistent with a connection between Fechter syndrome and DFNA17 deafness disorder. 4.2 Roles for Myosins in Hearing
Approximately one in 1000 people experience severe or profound deafness at birth or during early childhood, and another one in 1000 children become deaf before they reach adulthood [239]. Non-syndromic deafness patients are afflicted only with hearing loss, while syndromic cases exhibit hearing loss in combination with other defects. There are many genes involved in deafness disorders, including several encoding unconventional myosins (see Fig. 4). M3, M6, M7 and M15 have fundamentally important functions in the auditory system as mutant alleles at these loci have been shown to co-segregate with human deafness. M7 plays a role in hearing and balance in humans, mice and zebrafish [201, 240, 241]. M7a (MYO7A) has a highly restricted tissue distribution in
4 Myosins in Disease
mammals – it is found almost exclusively in cells that have specialized actin-based structures such as the hair cells of the ear, retina, testis, lung and kidney [203]. Mutations in MYO7A result in two forms of human non-syndromic deafness, mutant loci DFNB2 (autosomal recessive; [242]) and DFNA11 (autosomal dominant; [243]) and a syndromic form of deafness, Usher’s syndrome type IB [241]. Usher’s syndrome type IB is characterized by a gradual degeneration of the rods and cones in the retina, causing loss of vision in addition to hearing loss [244] and [245] for reviews). As mentioned above, the shaker-1 M7a mutant mice are deaf and display balance dysfunction [201]. Mutations in the Danio rerio (zebrafish) M7a gene, mariner, were found to exhibit circling behavior along with hair cell defects similar to those observed in shaker-1 mice [240]. Hair cells did extend stereocilia but these are splayed apart, suggesting that the links between adjacent stereocilia have been lost [240]. These data demonstrate the remarkable functional conservation of M7a through evolution. A recent electrophysiological study has uncovered a role for M7a in gating of transducer channels that cannot be attributed to the disorganization of the hair bundles [246], suggesting that this motor may play more than one role in stereocilia function. While analysis of the shaker-1 mutant mouse has provided insight into the role of M7a in hearing [201, 202, 246], it has not provided a clear explanation for the late onset retinitis pigmentosa exhibited by Usher’s IB patients. A careful analysis of shaker-1 photoreceptors has revealed that opsin transport is decreased [247] and melanosome distribution is aberrant [175]. It has been suggested that the relatively short life-span of the mouse is not long enough for expression of the vision defect to become apparent [247]. Mutations have been documented all along the length of the M7a HC in humans, as well as mice and zebrafish. In addition to confirming the importance of the motor domain, the identification of mutations in both the C-terminal MyTH4 and FERM domains indicate these domains are essential for function [62, 240]. Mutations in M15 also result in deafness both in humans and mice. MYO15 is mutant in human DFNB3, a non-syndromic, congenital deafness disorder [248]. As is the case for M7a, mouse M15 has a restricted tissue distribution. In addition to the cochlea, it is expressed in the pituitary and other neuroendocrine tissue [36, 249]. The M15 homolog in mouse is also associated with a hearing loss phenotype; the shaker-2 mouse was found to have stereocilia that are approximately one-tenth the normal length [250]. M15 is localized to the stereocilia of cochlear hair cells and is concentrated in the cuticular plate [36]. Consistent with its localization, an abnormal organization of actin is also observed in the cochlear hair cells of shaker-2 mice, suggesting that this myosin plays a role in actin organization. M6 was first associated with hearing loss when a null mutation in murine Myo6 was discovered to cause the Snell’s waltzer phenotype [195], as discussed above (Section 3.4). A missense mutation in the motor domain of M6a results in an autosomal dominant non-syndromic form of deafness, DFNA22, indicating that this myosin has a conserved role in hearing [251]. The effect of this single mutation on the motor properties of M6a is not yet known.
25
26 Myosin Superfamily Proteins
Less is known regarding the role of other myosins involved in hearing, as mouse models for several human deafness mutations are not yet available for study. The role of M3 in fly vision has already been discussed, but M3 also plays an important sensory role in humans. Mutation of M3a causes progressive hearing loss DFNB30 [252]. MYO3A is expressed in the sensory cells of the mouse ear, but more work remains to be done to determine how this motor protein contributes to the maintenance of hearing. Given the conservation of M3 from flies to Limulus to humans, it is tempting to speculate that it plays a role in regulating some aspect of signaling in hair cells. A mutation in MYH9, a non-muscle M2, has been reported to be responsible for an autosomal dominant form of deafness, DFNA17 [253]. This missense mutation results in a single amino acid change (R705H) in a residue located in the critical SH1 helix, which is known to influence the ATPase activity of myosin. The underlying cause of deafness and the localization of this myosin to sensory cells are not currently known, but it will undoubtedly be of interest to determine how a myosin that plays a role in generating contractile forces contributes to hair cell function.
5 New Myosins and Myosin Functions on the Horizon
A considerable number of myosins have been identified but their functions have not yet been elucidated. The presence of interesting domains in their N- or C-termini and divergence in their motor domains suggests that they have the potential to play unique and interesting roles. As work in the field progresses, these roles will undoubtedly be uncovered. A comprehensive analysis of genes predicted by the human and fly genome projects [9] has revealed the existence of a novel human myosin predicted from contig AC023133. This myosin appears to have a unique N-terminal extension and a short tail region. Two novel myosins were identified in the fly genome, one designated 29CD that has an N-terminal extension and a short, proline-rich tail and another designated 95E that has a large insert in the motor domain loop 1 and a tail region comprised largely of basic residues. The diverse roles played by the cytoskeleton in fly development is likely to catalyze interest in studying the in vivo function of these novel myosins through a search for null mutants or use of double-stranded RNA interference (dsRNAi) to knock out their mRNAs. A novel human myosin, M18 or PDZ-myosin, was identified in a differential screen for genes expressed highly in bone marrow stromal cells that support hematopoietic cells [34]. The tail region of this myosin is comprised largely of predicted coiled-coil sequences (suggesting that it is a dimer) and there is a PDZ domain at the N-terminus. The tissue distribution of this myosin is not yet known. It is localized to cytoskeletal elements in stromal cell lines as well as NIH3T3 cells, where it appears to be associated with both actin and microtubules as well as being distributed throughout the cell.
6 Conclusions 27
The novel M16, also referred to as myr8, was identified in a study of neuronal myosins in the rat [33]. Two forms of this myosin were identified. M16a has a shorter tail and is expressed predominantly in the nervous system. The longer M16b appears to be broadly expressed. Analysis of the tail sequence reveals that it is unique and does not contain any regions of predicted coiled-coil, suggesting M16b is a monomer. The N-terminus contains eight ankryin repeats that bind to protein phosphatases. Immunoprecipitation experiments revealed that M16b indeed associates with protein phosphatase 1 catalytic subunits 1α and 1γ . This myosin has a punctate distribution in both neurons and astrocytes. M4, M12 and a novel unconventional myosin from scallop mantle (tissue comprised of both muscle and non-muscle cells) named ScunM have been identified either at the protein or gene level but no information regarding their tissue distribution (in the case of M12 and ScunM) or subcellular localization is available [1, 254, 255]. These myosins appear to be unique to the organisms in which they were originally identified. The M4 gene was identified in Acanthamoeba and sequence analysis revealed the presence of a C-terminal SH3 domain as well as a single MyTH4 domain [255]. It was shown to co-precipitate with actin under rigor conditions and to be released from actin by ATP, indicating that it has the biochemical properties expected of a myosin. M12 was identified at the gene level through a comprehensive search of the C. elegans genome for myosin genes and is named HUM-4 (heavy chain of unconventional myosin) [256]. HUM-4 has a 150-amino acid extension on the N-terminus. In addition, the P-loop and actin-binding sequences in the motor domain are more divergent than those found in other myosins. There are two MyTH4 domains in the tail but no clear FERM domains have been identified. No known mutations map near the hum-4 gene and efforts to inactivate gene function by dsRNAi have not resulted in any phenotype (J. P. Baker and M. A. Titus, unpublished data; WormBase – www.wormbase.org). ScunM was identified in a search for unconventional myosin genes in scallop mantle tissue [254]. It is one of the smallest myosins, with the HC having a predicted molecular weight of 90 kD. Similar to the M14s, ScunM has a short tail consisting of only 46 amino acids and it lacks any LC binding domains. Phylogenetic analysis, however, reveals that this myosin is the founding member of a novel class and not an M14.
6 Conclusions
It is likely that efforts to sequence the genomes of various organisms will result in the discovery of additional novel myosins. The ability to express myosins in baculovirus and inactivate gene function by dsRNAi in organisms and cultured cells should permit rapid characterization of their biochemical properties and in vivo functions. In addition to identifying myosins based on conserved motor sequences,
28 Myosin Superfamily Proteins
it is possible that large-scale efforts directed at crystallizing proteins will identify proteins that do not share the conserved myosin primary structure, but whose structure is highly similar to that of the myosin motor domain. Thus, a convergence of proteomic and genomic approaches is likely to expand our appreciation of the diversity of the myosin superfamily A number of lessons can be learned from examining the myosin superfamily. The most striking one is the diversity of the family – both in terms of the variety of different classes and the number of different isoforms within a class. This diversity is reflected in the wide range of functions observed for myosins of even a single class. There are classes of myosins that appear to be specific for plants and parasites, while only three myosin classes are conserved from fungi to humans. As evolution proceeded, it appears that higher eukaryotes acquired a more sophisticated complement of myosins. Some of which, such as M7, were preserved and others, such as M4 and M12, appear to have been lost. The importance of myosins is highlighted by their essential roles in fungi and their association with a number of human diseases, ranging from cardiomyopathies to deafness. An unexpected surprise that emerged from studies of myosins in mammals is the key role that they play in hearing, perhaps reflecting their ability to participate in an array of cellular functions. Future studies of this large motor family are likely to reveal additional roles and operating principles as well as, undoubtedly, exotic new family members.
Acknowledgements
The authors would like to thank Shawn Galdeen for helpful comments on the manuscript and Dr. Richard Cheney for providing Figure 2. M.C.K. is supported by an NIH postdoctoral fellowship (NRSA) and work in the Titus laboratory is supported by the National Institutes of Health. M.A.T. is an Established Investigator of the American Heart Association.
References 1 BAKER, J. P. and M. A. TITUS. 1998. Myosins: matching motors with functions. Curr. Op. Cell Biol. 10: 80–86. 2 MERMALL, V., P. L. POST, and M. S. MOOSEKER. 1998. Unconventional myosins in cell movement, membrane traffic, and signal transduction. Science 279: 527–533. 3 SOKAC, A. M. and W. M. BEMENT. 2000. Regulation and expression of unconventional myosins. Int. Rev. Cytol. 200: 197–304.
4 TUXWORTH, R. I. and M. A. TITUS. 2000. Unconventional myosins: anchors in the membrane traffic relay. Traffic 1: 11–18. 5 B¨AHLER, M. and A. RHOADS. 2002. Calmodulin signalling via the IQ motif. FEBS Lett. 513: 107–113. 6 KARCHER, R. L., S. W. DEACON, and V. I. GELFAND. 2002. Motor-cargo interactions: the key to transport specificity. Trends Cell Biol. 12: 21–27. 7 GEEVES, M. A. and K. C. HOLMES. 1999. Structural mechanism of muscle
References
8
9
10
11
12
13
14
15
16
17
18
19
contraction. Ann. Rev. Biochem. 68: 687–728. HUXLEY, A. F. 2000. Cross-bridge action: present views, prospects, and unknowns. J. Biomech. 33: 1189–1195. BERG, J. C., B. C. POWELL, and R. E. CHENEY. 2001. A millenial myosin census. Mol. Biol. Cell 12: 780–794. POLLARD, T. D. and E. D. KORN. 1973. Acanthamoeba myosin. I. Isolation from Acanthamoeba castellanii of an enzyme similar to muscle myosin. J. Biol. Chem. 248: 4682–4690. POLLARD, T. D. and E. D. KORN. 1973. Acanthamoeba myosin. II. Interaction with actin and with a new cofactor protein required for actin activation of Mg2+ adenosine triphosphatase activity J. Biol. Chem. 248: 4691–4697. KORN, E. D. 1991. Acanthamoeba myosin I: past, present and future. Curr. Top. Mem. 38: 13–30. CHENEY, R. E. and M. S. MOOSEKER. 1992. Unconventional myosins. Curr. Op. Cell Biol. 4: 27–35. GOODSON, H. V. and J. A. SPUDICH. 1993. Molecular evolution of the myosin family: relationships derived from comparisons of amino acid sequences. PNAS 90: 659–663. REDDY A. S. N. and I. S. DAY. 2001. Analysis of the myosins encoded in the recently completed Arabidopsis thaliana genome sequence. Genome Biol. 2: 1–17. GARCE´ S, J. and R. H. GAVIN. 1998. A PCR screen identifies a novel, unconventional myosin heavy chain gene (MYO1) in Tetrahymena thermophila. J. Euk. Microbiol. 45: 252–259. WILLIAMS, S. A., R. E. HOSEIN, J. A. CARCe´ s, and R. H. GAVIN. 2000. MYO1, a novel unconventional myosin gene affects endocytosis and macronuclear elongation in Tetrahymena thermophila. J. Euk. Microbiol. 47: 561–568. BEMENT, W. M., T. HASSON, J. A. WIRTH, R. E. CHENEY, and M. S. MOOSEKER. 1994. Identification and overlapping expression of multiple unconventional myosin genes in vertebrate cell types. PNAS 91: 6549–6553. RECK-PETERSON, S. L., D. W. PROVANCE, M. S. MOOSEKER, and J. A. MERCER. 2000.
20
21
22
23
24
25
26
27
28
29
30
Class V myosins. Biochem. Biophys. Acta 1496: 36–51. GELI, M. I. and H. RIEZMAN. 1996. Role of type I myosins in receptor-mediated endocytosis in yeast. Science 272: 533–535. GOODSON, H. V., B. L. ANDERSON, H. M. WARRICK, L. A. PON, and J. A. SPUDICH. 1996. Synthetic lethality screen identifies a novel yeast myosin I gene (MYO5): myosin I proteins are required for polarization of the actin cyto-skeleton. J. Cell Biol. 133: 1277–1291. JUNG, G., X. WU, and J. A. HAMMER, III. 1996. Dictyostelium mutants lacking multiple classic myosin I isoforms reveal combinations of shared and distinct functions. J. Cell Biol. 133: 305–323. NOVAK, K. D., M. D. PETERSON, M. C. REEDY, and M. A. TITUS. 1995. Dictyostelium myosin I double mutants exhibit conditional defects in pinocytosis. J. Cell Biol. 131: 1205–1221. MEHTA, A. D., R. S. ROCK, M. RIEF J. A. SPUDICH, M. S. MOOSEKER, and R. E. CHENEY. 1999. Myosin-V is a processive actin-based motor. Nature 400: 590–593. RECK-PETERSON, S. L., M. J. TYSKA, P. J. NOVICK, and M. S. MOOSEKER. 2001. The yeast class V myosins, Myo2p and Myo4p, are nonprocessive actin-based motors. J. Cell Biol. 153: 1121–1126. HIGUCHI, H. and S. A. ENDOW. 2002. Directionality and processivity of molecular motors. Curr. Op. Cell Biol. 14: 50–57. INOUE, A., J. SAITO, R. IKEBE, and M. IKEBE. 2002. Myosin IXb is a single-headed minusend directed processive motor. Nature Cell Biol. 4: 302–306. WELLS, A. L., A. W. LIN, L. Q. CHEN, D. SAFER, S. M. CAIN, T. HASSON, B. O. CARRAGHER, R. A. MILLIGAN, and H. L. SWEENEY. 1999. Myosin VI is an actin-based motor that moves backwards. Nature 401: 505–508. De LA CRUZ, E. M., E. M. OSTAP, and H. L. SWEENEY. 2001. Kinetic mechanism and regulation of myosin VI. J. Biol. Chem. 32373–32381. NISHIKAWA, S., K. HOMMA, Y. KOMORI, M. IWAKI, T. WAZAWA, A. H. IWANE, J.
29
30 Myosin Superfamily Proteins
31
32
33
34
35
36
37
38
SAITO, R. IKEBE, E. KATAYAMA, T. YANAGIDA, and M. IKEBE. 2002. Class VI myosin moves processively along actin filaments backward with large steps. BBRC 290: 311–317. ROCK, R. S., S. E. RICE, A. L. WELLS, T. J. PURCELL, J. A. SPUDICH, and H. L. SWEENEY. 2001. Myosin VI is a processive motor with a large step size. PNAS 98: 13655–13659. B¨AHLER, M. 2000. Are class III and IX myosins motorized signalling molecules? Biochem. Biophys. Acta 1496: 52–59. PATEL, K. G., C. LIU, P. L. CAMERON, and R. S. CAMERON. 2001. Myr8, a novel unconventional myosin expressed during brain development associates with the protein phosphatase catalytic subunits 1α and 1γ 1. J. Neurosci. 21: 7954–7968. FURUSAWA, T., S. IKAWA, N. YANAI, and M. OBINATA. 2000. Isolation of a novel PDZ-containing myosin from hematopoietic supportive bone marrow stromal cell lines. BBRC 270: 67–75. YAMASHITA, R. A., J. R. SELLERS, and J. B. ANDERSON. 2000. Identification and analysis of the myosin superfamily in Drosophila: a database approach. J. Mus. Res. Cell Mot. 21: 491–505. LIANG, Y., A. WANG, I. A. BELYANTSEVA, D. W. ANDERSON, F. J. PROBST, T. D. BARBER, W. MILLER, J. W. TOUCHMAN, L. JIN, S. L. SULLIVAN, J. R. SELLERS, S. A. CAMPER, R. V. LLOYD, B. KACHAR, T. B. FRIEDMAN, and R. A. FRIDELL. 1999. Characterization of the human and mouse unconventional myosin XV genes responsible for hereditary deafness DFNB3 and shaker-2. Genomics 61: 243–258. BERG, J. S., B. H. DERFLER, C. M. PENNISI, D. P. COREY, and R. E. CHENEY. 2000. Myosin-X, a novel myosin with pleckstrin homology domains, associates with regions of dynamic actin. J. Cell Sci. 113: 3439–3451. GEISSLER, H., R. ULLMANN, and T. SOLDATI. 2000. The tail domain of myosin M catalyses nucleotide exchange on rac1 GTPases and can induce
39
40
41
42
43
44
45
46
actin-driven surface protrusions. Traffic 1: 399–410. OISHI, N., H. ADACHI, and K. SUTOH. 2000. Novel Dictyostelium unconventional myosin, myoM, has a putative RhoGEF domain. FEBS Lett. 474: 16–22. TITUS, M. A. 1997. Unconventional myosins: new frontiers in actin-based motors. Trends Cell Biol. 7: 119–123. LEE, W. L., E. M. OSTAP, H. G. ZOT, and T. D. POLLARD. 1999. Organization and ligand binding properties of the tail of Acanthamoeba myosin-1A. Identification of an actin-binding site in the basic (tail homology-1) domain. J. Biol. Chem. 274: 35159–35171. LIU, X., H. BRZESKA, and E. D. KORN. 2000. Functional analysis of tail domains of Acanthamoeba myosin IC by characterization of truncation and deletion mutants. J. Biol. Chem. 275: 24886–24892. CHISHTI, A. H., A. C. KIM, S. M. MARFATIA, M. LUTCHMAN, M. HANSPAL, H. JINDAL, S. C. LIU, P. S. LOW, G. A. ROULEAU, N. MOHANDAS, J. A. CHASIS, J. G. CONBOY P. GASCARD, Y. TAKAKUWA, S. C. HUANG, E. J. BENZ, Jr., A. BRETSCHER, R. G. FEHON, J. F. GUSELLA, V. RAMESH, F. SOLOMON, V. T. MARCHESI, S. TSUKITA, S. TSUKITA, M. ARPIN, D. LOUVARD, N. K. TONKS, J. M. ANDERSON, A. S. FANNING, P. J. BRYANT, D. F. WOODS, and K. B. HOOVER. 1998. The FERM domain: a unique module involved in the linkage of cytoplasmic proteins to the membrane. Trends Biochem. Sci. 23: 281–282. BRETSCHER, A., D. CHAMBERS, R. NGUYEN, and D. RECZEK. 2000. ERM-Merlin and EBP50 protein families in plasma membrane organization and function. Ann. Rev. Cell Dev. Biol. 16: 113–143. HOOVER, K. B. and P. J. BRYANT. 2000. The genetics of the protein 4.1 family: organizers of the membrane and cytoskeleton. Curr. Op. Cell Biol. 12: 229–234. TSUKITA, S. and S. YONEMURA. 1999. Cortical actin organization: lessons from ERM (Ezrin/Radixin/Moesin) Proteins. J. Biol. Chem. 274: 34507–34510.
References 47 HORWITZ, A., K. DUGGAN, C. BUCK, M. C. BECKERLE, and K. BURRIDGE. 1986. Interaction of plasma membrane fibronectin receptor with talin–a transmembrane linkage. Nature 320: 531–533. 48 OLIVER, T. N., J. S. BERG, and R. E. CHENEY. 1999. Tails of unconventional myosins. Cell. Mol. Life Sci. 56: 243–257. 49 M¨ULLER, R. T., U. HONNERT, J. REINHARD, and M. B¨AHLER. 1997. The rat myosin myr5 is a GTPase activating protein for rho in vivo: essential role of arginine 1695. Mol. Biol. Cell 8: 2039–2053. 50 POST, P. L., G. M. BOKOCH, and M. S. MOOSEKER 1998. Human myosin IXb is a mechano-chemically active motor and a GAP for rho. J. Cell Sci. 111: 941–950. 51 RIDLEY, A. J. 2001. Rho GTPases and cell migration. J. Cell Sci. 114: 2713–2722. 52 EVANGELISTA, M., B. M. KLEBL, A. H. Y. TONG, B. A. WEBB, T. LEEUW, E. LEBERER, M. WHITEWAY D. Y. THOMAS, and C. BOONE. 2000. A role for myosin-I in actin assembly through interactions with Vrp1p, Bee1p, and the Arp2/3 complex. J. Cell Biol. 148: 353–362. 53 JUNG, G., K. REMMERT, X. WU, J. M. VOLOSKY and J. A. HAMMER, III. 2001. The Dictyostelium CARMIL protein links capping protein and the Arp2/3 complex to type I myosins through their SH3 domains. J. Cell Biol. 153: 1479–1497. 54 LECHLER, T., A. SHEVCHENKO, A. SHEVCHENKO, and R. LI. 2000. Direct involvement of yeast type I myosins in Cdc42-dependent actin polymerization. J. Cell Biol. 148: 363–373. 55 XU, P., K. I. MITCHELHILL, B. KOBE, B. E. KEMP, and H. G. ZOT. 1997. The myosin I binding protein Acan125 binds the SH3 domain and belongs to the superfamily of leucine-rich repeat proteins. PNAS 94: 3685–3690. 56 POLLARD, T. D., S. K. DOBERSTEIN, and H. G. ZOT. 1991. Myosin I. Ann. Rev. Physiol. 53: 653–681. 57 CHIEREGATTI, E., A. G¨ARTNER, H. E. STO¨ FFLER, and M. B¨AHLER. 1998. Myr 7 is a novel myosin IX-RhoGAP expressed in the brain. J. Cell Sci. 111: 3597–3608. 58 ESPREAFICO, E. M., R. E. CHENEY, M. MATTEOLI, A. A. C. NASCIMENTO, P. V. De
59
60
61
62
63
64
65
66
CAMILLI, R. E. LARSON, and M. S. MOOSEKER. 1992. Primary structure and cellular localization of chicken brain myosin-V (p190), an unconventional myosin with calmodulin light chains. J. Cell Biol. 119: 1541–1557. RECHSTEINER, M. and S. W. ROGERS. 1996. PEST sequences and regulation by proteolysis. Trends Biochem. Sci. 21: 267–271. HUANG, X., H. J. CHENG, M. TESSIER-LAVIGNE, and Y. JIN. 2002. MAX-1, a novel PH/MyTH4/FERM domain cytoplasmic protein implicated in netrin-mediated axon repulsion. Neuron 34: 563–576. NARASIMHULU, S. B. and A. S. N. REDDY. 1998. Characterization of microtubule binding domain in the Arabidopsis kinesin-like calmodulin binding protein. Plant Cell 10: 957–965. LIU, X. Z., C. HOPE, J. WALSH, V. NEWTON, X. M. KE, C. Y. LIANG, L. R. XU, J. M. ZHOU, D. TRUMP, K. P. STEEL, S. BUNDEY and S. D. BROWN. 1998. Mutations in the myosin VIIA gene cause a wide phenotypic spectrum, including atypical Usher Syndrome. Am. J. Hum. Genet. 63: 909–912. BUSS, F., S. D. ARDEN, M. LINDSAY, J. P. LUZIO, and J. KENDRICK-JONES. 2001. Myosin VI isoform localized to clathrin-coated vesicles with a role in clathrin-mediated endocytosis. EMBO J. 20: 3676–3684. SEPARACK, P. K., J. A. MERCER, M. C. STROBEL, N. G. COPELAND, and N. A. JENKINS. 1995. Retroviral sequences located within an intron of the dilute gene alter dilute expression in a tissue-specific manner. EMBO J. 14: 2326–2332. WU, X., B. BOWERS, K. RAO, Q. WEI, and J. A. HAMMER, III. 1998. Visualization of melanosome dynamics within wild-type and dilute melanocytes suggests a paradigm for myosin V function in vivo. J. Cell Biol. 143: 1899–1918. TAUHATA, S. B. F., D. V. DOS SANTOS, E. W. TAYLOR, M. S. MOOSEKER, and R. E. LARSON. 2001. High affinity binding of brain myosin-Va to F-actin induced by
31
32 Myosin Superfamily Proteins
67
68
69
70
71
72
73
74
75
76
77
calcium in the presence of ATP. J. Biol. Chem. 276: 39812–39818. WOLENSKI, J. S. 1995. Regulation of calmodulin-binding myosins. Trends Cell Biol. 5: 310–316. CHENEY, R. E., M. K. O’SHEA, J. E. HEUSER, M. V. COELHO, J. A. WOLENSKI, E. M. ESPREAFICO, P. FORSCHER, R. E. LARSON, and M. S. MOOSEKER. 1993. Brain myosin-V is a two-headed unconventional myosin with motor activity. Cell 75: 13–23. BARYLKO, B., D. D. BINNS, and J. P. ALBANESI. 2000. Regulation of the enzymatic and motor activities of myosin I. Biochem. Biophys. Acta 1496: 23–35. REILEIN, A. R., S. L. ROGERS, M. C. TUMA, and V. I. GELFAND. 2001. Regulation of molecular motor proteins. Int. Rev. Cytol. 204: 179–238. BEMENT, W. M. and M. S. MOOSEKER. 1995. TEDS rule: a molecular rationale for differential regulation of myosins by phosphorylation of the heavy chain head. Cell Mot. Cytoskel. 31: 87–92. BRZESKA, H. and E. D. KORN. 1996. Regulation of class I and class II myosins by heavy chain phosphorylation. J. Biol. Chem. 271: 16983–16986. NOVAK, K. D. and M. A. TITUS. 1997. Myosin I overexpression impairs cell migration. J. Cell Biol. 136: 633–648. NOVAK, K. D. and M. A. TITUS. 1998. The myosin I SH3 domain and TEDS rule phosphorylation site are required for in vivo function. Mol. Biol. Cell 9: 75–88. WU, C., V. LYTVYN, D. Y. THOMAS, and E. LEBERER. 1997. The phosphorylation site for Ste20p-like protein kinases is essential for the function of myosin-I in yeast. J. Biol. Chem. 272: 30623–30626. LIU, X., N. OSHEROV, R. YAMASHITA, H. BRZESKA, E. D. KORN, and G. S. MAY. 2001. Myosin I mutants with only 1% of wild-type actin activated MgATPase activity retain essential in vivo function(s). PNAS 98: 9122–9127. BUSS, F., J. KENDRICK-JONES, C. LIONNE, A. E. KNIGHT, G. P. COT ˆ E´ , and J. P. LUZIO. 1998. The localization of myosin VI at the Golgi complex and leading edge of
78
79
80
81
82
83
84
85
86
87
fibroblasts and its phosphorylation and recruitment into membrane ruffles of A431 cells after growth factor stimulation. J. Cell Biol. 143: 1535–1545. YOSHIMURA, M., K. HOMMA, J. SAITO, A. INOUE, R. IKEBE, and M. IKEBE. 2001. Dual regulation of mammalian myosin VI motor function. J. Biol. Chem. 276: 39600–39607. KARCHER, R. L., J. T. ROLAND, F. ZAPPACOSTA, M. J. HUDDLESTON, R. S. ANNAN, S. A. CARR, and V. I. GELFAND. 2001. Cell cycle regulation of myosin-V by calcium/calmodulin-dependent protein kinase II. Science 293: 1317–1320. ROGERS, S. L., R. L. KARCHER, J. T. ROLAND, A. A. MININ, W. STEFFEN, and V. I. GELFAND. 1999. Regulation of melanosome movement in the cell cycle by reversible association of myosin V. J. Cell Biol. 146: 1265–1275. TAN, J. L., S. RAVID, and J. A. SPUDICH. 1992. Control of nonmuscle myosin phosphorylation. Ann. Rev. Biochem. 61: 721–759. BEMENT, W. M. 2002. Actomyosin rings: the riddle of the sphincter. Curr. Biol. 11: R12–R15. ROBINSON, D. N. and J. A. SPUDICH. 2000. Towards a molecular understanding of cytokinesis. Curr. Op. Cell Biol. 10: 228–237. YOUNG, P. E., A. M. RICHMAN, A. S. KETCHUM, and D. P. KIEHART. 1993. Morphogenesis in Drosophila requires nonmuscle myosin heavy chain function. Genes Dev. 7: 29–41. EDDY R. J., L. M. PIERINI, F. MATSUMURA, and F. R. MAXFIELD. 2000. Ca2+ -dependent myosin II activation is required for uropod retraction during neutrophil migration. J. Cell Sci. 113: 1287–1298. PASTERNAK, C., J. A. SPUDICH, and E. L. ELSON. 1989. Capping of surface receptors and concomitant cortical tension are generated by conventional myosin. Nature 341: 549–551. WESSELS, D., D. R. SOLL, D. KNECHT, W. F. LOOMIS, A. De LOZANNE, and J. SPUDICH. 1988. Cell motility and chemotaxis in Dictyostelium amebae
References
88
89
90
91
92
93
94
95
96
97
98
lacking myosin heavy chain. Dev. Biol. 128: 164–177. BURRIDGE, K. and M. CHRZANOWSKA-WODNICKA. 1996. Focal adhesions, contractility and signaling. Ann. Rev. Cell Dev. Biol. 12: 463–519. FUJIWARA, K. and T. D. POLLARD. 1976. Fluorescent antibody localization of myosin in the cytoplasm, cleavage furrow, and mitotic spindle of human cells. J. Cell Biol. 71: 848–875. MABUCHI, I. and M. OKUNO. 1977. The effect of myosin antibody on the division of starfish blastomeres. J. Cell Biol. 74: 251–263. GUERTIN, D. A., S. TRAUTMANN, and D. MCCOLLUM. 2002. Cytokinesis in eukaryotes. Microbiol. Mol. Biol. Rev. 66: 155–178. BEZANILLA, M. and T. D. POLLARD. 2000. Myosin-II tails confer unique functions in Schizosac-charomyces pombe: characterization of a novel myosin-II tail. Mol. Biol. Cell 11: 79–91. MOTEGI, F., F. NAKANO, and I. MABUCHI. 2000. Molecular mechanism of myosin-II assembly at the division site in Schizosaccharomyces pombe. J. Cell Sci. 113: 1813–1825. GERISCH, G. and I. WEBER. 2000. Cytokinesis without myosin II. Curr. Op. Cell Biol. 12: 126–132. UYEDA, T. Q. P., C. KITAYAMA, and S. YUMURA. 2000. Myosin II-independent cytokinesis in Dictyostelium: its mechanism and implications. Cell Str. Func. 25: 1–10. UYEDA, T. Q. P. and M. A. TITUS. 1997. The myosins of Dictyostelium. In: Dictyostelium: a Model System for Cell and Developmental Biology. Edited by Y. MAEDA, K. INOUYE, and I. TAKEUCHI, Tokyo, Japan: University Academy Press, pp. 43–64. MIZUNO, T., K. TSUTSUI, and Y. NISHIDA. 2002. Drosophila myosin phosphatase and its role in dorsal closure. Development 129: 1215–1223. ANDERSON, K. I., Y. L. WANG, and J. V. SMALL. 1996. Coordination of protrusion and translocation of the keratocyte involves rolling of the cell body. J. Cell Biol. 134: 1209–1218.
99 POST, P. L., R. L. DEBIASIO, and D. L. TAYLOR. 1995. A fluorescent protein biosensor of myosin II regulatory light chain phosphorylation reports a gradient of phosphorylated myosin II in migrating cells. J. Cell Sci. 12: 1755–1768. 100 VERKHOVSKY A. B., T. M. SVITKINA, and G. G. BORISY 1995. Myosin II filament assemblies in the active lamella of fibroblasts: their morphogenesis and role in the formation of actin filament bundles. J. Cell Biol. 131: 989–1002. 101 FUKUI, Y., T. J. LYNCH, H. BRZESKA, and E. D. KORN. 1989. Myosin I is located at the leading edge of locomoting Dictyostelium amoebae. Nature 341: 328–331. 102 YONEMURA, S. and T. D. POLLARD. 1992. The localization of myosin I and myosin II in Acanthamoeba by fluorescence microscopy. J. Cell Sci. 102: 629–642. 103 LEWIS, A. K. and P. C. BRIDGMAN. 1996. Mammalian myosin 1α is concentrated near the plasma membrane in nerve growth cones. Cell Mot. Cytoskel. 33: 130–150. 104 RUPPERT, C., J. GODEL, R. T. M¨ULLER, R. KROSCHEWSKI, J. REINHARD, and M. B¨AHLER. 1995. Localization of the rat myosin I molecules myr 1 and myr 2 and in vivo targeting of their tail domains. J. Cell Sci. 108: 3775–3786. 105 WAGNER, M. C., B. BARYLKO, and J. P. ALBANESI. 1992. Tissue distribution and subcellular localization of mammalian myosin I. J. Cell Biol. 119: 163–170. 106 WESSELS, D., J. MURRAY, G. JUNG, J. A. HAMMER, III, and D. R. SOLL. 1991. Myosin IB null mutants of Dictyostelium exhibit abnormalities in motility. Cell Mot. Cytoskel. 20: 301–315. 107 WESSELS, D., M. A. TITUS, and D. R. SOLL. 1996. A Dictyostelium myosin I plays a crucial role in regulating the frequency of pseudopods formed on the substratum. Cell Mot. Cytoskel. 33: 64–79. 108 WANG, F. S., J. S. WOLENSKI, R. E. CHENEY, M. S. MOOSEKER, and D. G. JAY. 1996. Function of myosin-V in filopodial extension of neuronal growth cones. Science 273: 660–663.
33
34 Myosin Superfamily Proteins
109 LEE, W. L., M. BEZANILLA, and T. D. POLLARD. 2000. Fission yeast myosin-I, Myo1p, stimulates actin assembly by Arp2/3 complex and shares functions with WASp. J. Cell Biol. 151: 789–799. 110 TUXWORTH, R. I., I. WEBER, D. WESSELS, G. C. ADDICKS, D. R. SOLL, G. GERISCH, and M. A. TITUS. 2001. A role for myosin VII in dynamic cell adhesion. Curr. Biol. 11: 318–329. 111 TITUS, M. A. 1999. A class VII unconventional myosin is required for phagocytosis. Curr. Biol. 9: 1297–1303. 112 SIBLEY, L. D., S. HAKANSSON, and V. B. CARRUTHERS. 1998. Gliding motility: an efficient mechanism for cell penetration. Curr. Biol. 8: R12–R14. 113 DOBROWOLSKI, J. M. and L. D. SIBLEY. 1996. Toxoplasma invasion of mammalian cells is powered by the actin cytoskeleton of the parasite. Cell 84: 933–939. 114 HEINTZELMAN, M. B. and J. D. SCHWARTZMAN. 1997. A novel class of unconventional myosins from Toxoplasma gondii. J. Mol. Biol. 271: 139–146. 115 HETTMANN, C., A. HERM, A. GEITER, B. FRANK, E. SCHWARZ, T. SOLDATI, and D. SOLDATI. 2000. A dibasic motif in the tail of a class XIV api-complexan myosin is an essential determinant of plasma membrane localization. Mol. Biol. Cell 11: 1385–1400. 116 HERM-G¨OTZ, A., S. WEISS, R. STRATMANN, S. FUJITA-BECKER, C. RUFF, E. MEYHO¨ FER, T. SOLDATI, D. J. MANSTEIN, M. A. GEEVES, and D. SOLDATI. 2002. Toxoplasma gondii myosin A and its light chain: a fast, single-headed, plus-end-directed motor. EMBO J. 21: 2149–2158. 117 MEISSNER, M., S. BRECHT, H. BUJARD, and D. SOLDATI. 2001. Modulation of myosin A expression by a newly established tetracycline repressor-based inducible system in Toxoplasma gondii. Nucl. Acids Res. 29: E115. 118 BROWN, S. S. 2000. Cooperation between microtubule- and actin-based motor proteins. Ann. Rev. Cell Dev. Biol. 15: 63–80.
119 ROGERS, S. L. and V. I. GELFAND. 2000. Membrane trafficking, organelle transport, and the cytoskeleton. Curr. Op. Cell Biol. 12: 57–62. 120 JOHNSTON, G. C., J. A. PRENDERGAST, and R. A. SINGER. 1991. The Saccharomyces cerevisiae MYO2 gene encodes an essential myosin for vectorial transport of vesicles. J. Cell Biol. 113: 539–551. 121 LILLIE, S. H. and S. S. BROWN. 1994. Immunofluorescence localization of the unconventional myosin, Myo2p, and the putative kinesin-related protein, Smy1p, to the same regions of polarized growth in Saccharomyces cerevisiae. J. Cell Biol. 125: 825–842. 122 CATLETT, N. L. and L. S. WEISMAN. 1998. The terminal tail region of a yeast myosin-V mediates its attachment to vacuole membranes and sites of polarized growth. PNAS 95: 14799–14804. 123 RECK-PETERSON, S. L., P. J. NOVICK, and M. S. MOOSEKER. 1999. The tail of a yeast class V myosin, Myo2p, functions as a localization domain. Mol. Biol. Cell 10: 1001–1017. 124 SCHOTT, D., J. HO, D. PRUYNE, and A. BRETSCHER. 1999. The COOH-terminal domain of Myo2p, a yeast myosin V has a direct role in secretory vesicle targeting. J. Cell Biol. 147: 791–807. 125 BENINGO, K., S. H. LILLIE, and S. S. BROWN. 2000. The yeast kinesin-related protein Smy1p exerts its effects on the class V myosin Myo2p via a physical interaction. Mol. Biol. Cell 11: 691–702. 126 LILLIE, S. H. and S. S. BROWN. 1992. Suppression of a myosin defect by a kinesin-related gene. Nature 356: 358–361. 127 PRUYNE, D. W., D. H. SCHOTT, and A. BRETSCHER. 1998. Tropomyosin-containing actin cables direct the Myo2p-dependent polarized delivery of secretory vesicles in budding yeast. J. Cell Biol. 143: 1931–1945. 128 WALCH-SOLIMENA, C., R. N. COLLINS, and P. J. NOVICK. 1997. Sec2p mediates nucleotide exchange on Sec4p and is involved in polarized delivery of post-Golgi vesicles. J. Cell Biol. 137: 1495–1509.
References 129 WIN, T. Z., Y. GACHET, D. P. MULVIHILL, K. M. MAY, and J. S. HYAMS. 2000. Two type V myosins with non-overlapping functions in the fission yeast Schizosaccharomyces pombe: Myo52 is concerned with growth polarity and cytokinesis, Myo51 is a component of the cytokinetic actin ring. J. Cell Sci. 114: 69–79. 130 MULVIHILL, D. P., P. J. POLLARD, T. Z. WIN, and J. S. HYAMS. 2001. Myosin V-mediated vacuole distribution and fusion in fission yeast. Curr. Biol. 11: 1124–1127. 131 HILL, K. L., N. L. CATLETT, and L. S. WEISMAN. 1996. Actin and myosin function in directed vacuole movement during cell division in Saccharomyces cerevisiae. J. Cell Biol. 135: 1535–1549. 132 CATLETT, N. L., J. E. DEUX, F. TANG, and L. S. WEISMAN. 2000. Two distinct regions in a yeast myosin-V tail domain are required for the movement of different cargoes. J. Cell Biol. 150: 513–525. 133 HOEPFNER, D., M. van den BERG, P. PHILIPPSEN, H. F. TABAK, and E. H. HETTEMA. 2001. A role for Vsp1p, actin and the Myo2p motor in peroxisome abundance and inheritance in Saccharomyces cerevisiae. J. Cell Biol. 155: 979–990. 134 ROSSANESE, O. W., C. A. REINKE, B. J. BEVIS, A. T. HAMMOND, S. I. B. J. O’CONNOR, and B. S. GLICK. 2001. A role for actin, Cdc1p, and Myo2p in the inheritance of late Golgi elements in Saccharomyces cerevisiae. J. Cell Biol. 153: 47–61. 135 BEACH, D. L., J. THIBODEAUX, P. MADDOX, E. YEH, and K. BLOOM. 2000. The role of the proteins Kar9 and Myo2 in orienting the mitotic spindle of budding yeast. Curr. Biol. 10: 1497–1506. 136 YIN, H., D. BRUYNE, T. C. HUFFAKER, and A. BRETSCHER. 2000. Myosin V orientates the mitotic spindle in yeast. Nature 406: 1013–1015. 137 BOBOLA, N., R. P. JANSEN, T. H. SHIN, and K. NASMYTH. 1996. Asymmetric accumulation of Ash1p in postanaphase nuclei depends on a myosin and restricts
138
139
140
141
142
143
144
145
146
147
yeast mating-type switching to mother cells. Cell 84: 699–709. JANSEN, R. P., C. DOWZER, C. MICHAELIS, M. GALOVA, and K. NASMYTH. 1996. Mother cell-specific HO expression in budding yeast depends on the unconventional myosin Myo4p and other cytoplasmic proteins. Cell 84: 687–697. SIL, A. and I. HERSKOWITZ. 1996. Identification of asymmetrically localized determinant, Ash1p, required for lineage-specific transcription of the yeast HO gene. Cell 84: 711–722. LONG, R. M., R. H. SINGER, X. H. MENG, I. GONZALEZ, K. NASMYTH, and R. P. JANSEN. 1997. Mating type switching in yeast controlled by asymmetric localization of ASH1 mRNA. Science 277: 383–387. TAKIZAWA, P. A., A. SIL, J. R. SWEDLOW, I. HERSKOWITZ, and R. D. VALE. 1997. Actin-dependent localization of an RNA encoding a cell-fate determinant in yeast. Nature 389: 90–93. BEACH, D. L., E. D. SALMON, and K. BLOOM. 1999. Localization and anchoring of mRNA in budding yeast. Curr. Biol. 9: 569–578. BERTRAND, E., P. CHARTRAND, M. SCHAEFER, S. M. SHENOY, R. H. SINGER, and R. M. LONG. 1998. Localization of ASH1 mRNA particles in living yeast. Mol. Cell 2: 437–445. B¨OHL, F., C. KRUSE, A. FRANK, D. FERRING, and R. P. JANSEN. 2000. She2p, a novel RNA-binding protein tethers ASH1 mRNA to the Myo4p myosin motor via She3p. EMBOJ. 19: 5514–5524. LONG, R. M., W. GU, E. LORIMER, R. H. SINGER, and P. CHARTRAND. 2000. She2p is a novel RNA-binding protein that recruits the Myo4p-She3p complex to ASH1 mRNA. EMBOJ. 19: 6592–6601. M¨UNCHOW, S., C. SAUTER, and R. P. JANSEN. 1999. Association of the class V myosin Myo4p with a localised messenger RNA in budding yeast depends on She proteins. J. Cell Sci. 112: 1511–1518. TAKIZAWA, P. A. and R. D. VALE. 2000. The myosin motor, Myo4p, binds ASH1
35
36 Myosin Superfamily Proteins
148
149
150
151
152
153
154
155
156
157
158
mRNA via the adapter protein, She3p. PNAS 97: 5273–5278. EVANS, L. L., A. J. LEE, P. C. BRIDGMAN, and M. S. MOOSEKER. 1998. Vesicle-associated brain myosin-V can be activated to catalyze actin-based transport. J. Cell Sci. 111: 2055–2066. TABB, J. S., B. J. MOLYNEAUX, D. L. COHEN, S. A. KUZNETSOV, and G. M. LANGFORD. 1998. Transport of ER vesicles on actin filaments in neurons by myosin V. J. Cell Sci. 111: 3221–3234. BRIDGMAN, P. C. 1999. Myosin Va movements in normal and dilutelethal axons provide support for a dual filament motor complex. J. Cell Biol. 146: 1045–1060. TUMA, M. C. and V. I. GELFAND. 1999. Molecular mechanisms of pigment transport in mela-nophores. Pig. Cell Res. 12: 283–294. WU, X. and J. A. HAMMER, III. 2000. Making sense of melanosome dynamics in mouse melanocytes. Pig. Cell Res. 13: 241–247. RODIONOV, V. I., A. J. HOPE, T. M. SVITKINA, and G. G. BORISY. 1998. Functional coordination of microtubule-based and actin-based motility in melanophores. Curr. Biol. 8: 165–168. ROGERS, S. L. and V. I. GELFAND. 1998. Myosin cooperates with microtubule motors during organelle transport in melanophores. Curr. Biol. 8: 161–163. ROGERS, S. L., I. S. TINT, P. C. FANAPOUR, and V. I. GELFAND. 1997. Regulated bidirectional motility of melanophore pigment granules along microtubules in vitro. PNAS 94: 3720–3725. GROSS, S. P., M. C. TUMA, S. W. DEACON, A. S. SERPINSKAYA, A. R. REILEIN, and V. I. GELFAND. 2002. Interaction and regulation of molecular motors in Xenopus melanophores. J. Cell Biol. 156: 855–865. KOYAMA, Y. and T. TAKEUCHI. 1981. Ultrastructural observations on melanosome aggregation in genetically defective melanocytes of the mouse. Anatom. Rec. 201: 599–611. PROVANCE, D. W., Jr., M. WEI, V. IPE, and J. A. MERCER. 1996. Cultured
159
160
161
162
163
164
165
166
melanocytes from dilute mutant mice exhibit dendritic morphology and altered melanosome distribution. PNAS 93: 14554–14558. WEI, Q., X. WU, and J. A. HAMMER, III. 1997. The predominant defect in dilute melanocytes is in melanosome distribution and not cell shape, supporting a role for myosin V in melanosome transport. J. Mus. Res. Cell Mot. 18: 517–527. LAMBERT, T., J. ONDERWATER, Y. VANDERHAEGHEN, G. VANCOILLIE, H. K. KOERTEN, A. M. MOMMAAS, and J. M. NAEYAERT. 1998. Myosin V colocalizes with melanosomes and subcortical actin bundles not associated with stress fibers in human epidermal melanocytes. J. Invest. Dermatol. 111: 835–840. NASCIMENTO, A. A. C., R. G. AMARAL, J. C. S. BIZARIO, R. E. LARSON, and E. M. ESPREAFICO. 1997. Subcellular localization of myosin-V in the B16 melanoma cells, a wild-type cell line for the dilute gene. Mol. Biol. Cell 8: 1971–1988. WU, X., B. BOWERS, Q. WEI, B. KOCHER, and J. A. HAMMER, III. 1997. Myosin Vassociates with melanosomes in mouse melanocytes: evidence that myosin V is an organelle motor. J. Cell Sci. 110: 847–859. WU, X., F. WANG, K. RAO, J. R. SELLERS, and J. A. HAMMER, III. 2002. Rab27a is an essential component of the melanosome receptor for myosin Va. Mol. Biol. Cell 13: 1735–1749. SILVERS, W. K. 1979. Dilute and leaden, the p-locus, ruby-eye, and ruby-eye-2. In: Coat Colors of Mice: A Model for Mammalian Gene Action and Interaction. Edited by W. K. SILVERS, New York: Springer-Verlag, pp. 83–89; 104–107. MATESIC, L. E., R. YIP, A. E. REUSS, D. A. SWING, T. N. O’SULLIVAN, C. F. FLETCHER, N. G. COPELAND, and N. A. JENKINS. 2001. Mutations in Mlph, encoding a member of the rab effector family, cause the melanosome transport defects observed in leaden mice. PNAS 98: 10238–10243. WILSON, S. M., R. YIP, D. A. SWING, T. N. O’SULLIVAN, Y. ZHANG, E. K. NOVAK, R. T. SWANK, L. B. RUSSELL, N. G.
References
167
168
169
170
171
172
173
174
COPELAND, and N. A. JENKINS. 2000. A mutation in Rab27a causes the vesicle transport defects observed in ashen mice. PNAS 97: 7933–7938. FUKUDA, M., T. S. KURODA, and K. MIKOSHIBA. 2002. Slac2-a/melanophilin, the missing link between Rab27a and myosin Va: implications of a tripartite protein complex for melanosome transport. J. Biol. Chem. 277: 12432–12436. HUME, A. N., L. M. COLLINSON, A. RAPAK, A. Q. GOMES, C. R. HOPKINS, and M. C. SEABRA. 2001. Rab27a regulates the peripheral distribution of melanosomes in melanocytes. J. Cell Biol. 152: 795–808. HUME, A. N., L. M. COLLINSON, C. R. HOPKINS, M. STROM, D. C. BARRAL, G. BOSSI, G. M. GRIFFITHS, and M. C. SEABRA. 2002. The leaden gene product is required with Rab27a to recruit myosin Va to melanosomes in melanocytes. Traffic 3: 193–202. PROVANCE, D. W., Jr., T. L. JAMES, and J. A. MERCER. 2002. Melanophilin, the product of the leaden locus, is required for targeting of myosin-Va to melanosomes. Traffic 3: 124–132. WU, X., K. RAO, M. B. BOWERS, N. G. COPELAND, N. A. JENKINS, and J. A. HAMMER, III. 2001. Rab27a enables myosin Va-dependent melanosome capture by recruiting the myosin to the organelle. J. Cell Sci. 114: 1091–1100. WU, X., K. RAO, H. ZHANG, F. WANG, J. R. SELLERS, L. E. MATESIC, N. G. COPELAND, N. A. JENKINS, and J. A. HAMMER, III. 2002. Identification of an organelle receptor for myosin-Va. Nature Cell Biol. 4: 271–278. NAGASHIMA, K., S. TORII, Z. YI, M. IGARASHI, K. OKAMOTO, T. TAKEUCHI, and T. IZUMI. 2002. Melanophilin directly links Rab27a and myosin Va through its distinct coiled-coil regions. FEBS Lett. 517: 233–238. KURODA, T. S. M., FUKUDA, A. H., and K. MIKOSHIBA. 2002. The Slp homology domain of synaptotagmin-like proteins 1-4 and Slac2 functions as a novel rab27A binding domain. J. Biol. Chem. 277: 9212–9218.
175 LIU, X., B. ONDECK, and D. S. WILLIAMS. 1998. Mutant myosin VIIa causes defective melanosome distribution in the RPE of shaker-1 mice. Nature Genet. 19: 117–118. 176 EL-AMRAOUI, A., J. S. SCHONN, P. K¨USSEL-ANDERMANN, S. BLANCHARD, C. DESNOS, J. P. HENRY, U. WOLFRUM, F. DARCHEN, and C. PETIT. 2002. MyRIP, a novel rab effector, enables myosin VIIa recruitment to retinal melanosomes. EMBO Rep. 3: 463–470. 177 MERMALL, V., J. G. MCNALLY, and K. G. MILLER. 1994. Transport of cytoplasmic particles catalysed by an unconventional myosin in living Drosophila embryos. Nature 369: 560–562. 178 MERMALL, V. and K. G. MILLER. 1995. The 95F unconventional myosin is required for proper organization of the Drosophila syncytial blastoderm. J. Cell Biol. 129: 1575–1588. 179 BOHRMANN, J. 1997. Drosophila unconventional myosin VI is involved in intra- and intercellular transport during oogenesis. Cell. Mol. Life Sci. 53: 652–662. 180 HICKS, J. L., W. M. DENG, A. D. ROGAT, K. G. MILLER, and M. BOWNES. 1999. Class VI unconventional myosin is required for spermatogenesis in Drosophila. Mol. Biol. Cell 10: 4341–4353. 181 KELLEHER, J. F., M. A. MANDELL, G. L. MOULDER, K. L. HILL S. W. L’HERNAULT, R. J. BARSTEAD, and M. A. TITUS. 2000. Myosin VI is required for asymmetric segregation of cellular components during C. elegans spermatogenesis. Curr. Biol. 10: 1489–1496. 182 COLUCCIO, L. M. 1997. Myosin I. Am. J. Physiol. 273: C347–C359. 183 MOOSEKER, M. S. and R. E. CHENEY. 1995. Unconventional myosins. Ann. Rev. Cell Dev. Biol. 11: 633–675. 184 TYSKA, M. J. and M. S. MOOSEKER. 2002. MYO1A (brush border myosin I) dynamics in the brush border of LL-PK1-CL4 cells. Biophys. J. 82: 1869–1883. 185 FATH, K. and D. R. BURGESS. 1993. Golgi-derived vesicles from developing epithelial cells bind actin filaments and possess myosin-I as a
37
38 Myosin Superfamily Proteins
186
187
188
189
190
191
192
193
194
195
cytoplasmically-oriented peripheral membrane protein. J. Cell Biol. 120: 117–127. MONTELL, C. and G. M. RUBIN. 1988. The Drosophila ninaC locus encodes two photoreceptor cell specific proteins with domains homologous to protein kinases and the myosin heavy chain head. Cell 52: 757–772. HICKS, J. and D. S. WILLIAMS. 1992. Distribution of the myosin I-like ninaC proteins in the Drosophila retina and ultrastructural analysis of mutant phenotypes. J. Cell Sci. 101: 247–254. PORTER, J. A., J. L. HICKS, D. A. WILLIAMS, and C. MONTELL. 1992. Differential localizations of and requirements for the two Drosophila ninaC kinase/myosins in photoreceptor cells. J. Cell Biol. 116: 683–693. DOSE´ , A. C. and B. BURNSIDE. 2000. Cloning and chromosomal localization of a human class III myosin. Genomics 67: 333–342. EVANS, L. L., J. HAMMER, and P. C. BRIDGMAN. 1997. Subcellular localization of myosin V in nerve growth cones and outgrowth from dilutel-ethal neurons. J. Cell Sci. 110: 439–449. HEINTZELMAN, M. B., T. HASSON, and M. S. MOOSEKER. 1994. Multiple unconventional myosin domains of the intestinal brush border cytoskeleton. J. Cell Sci. 107: 3535–3543. PREKERIS, R. and D. M. TERRIAN. 1997. Brain myosin V is a synaptic vesicle-associated motor protein: evidence for a Ca2+ -dependent interaction with the synaptobrevin–synaptophysin complex. J. Cell Biol. 137: 1589–1601. BERG, J. S. and R. E. CHENEY. 2002. Myosin-X is an unconventional myosin that undergoes intrafilopodial motility. Nature Cell Biol. 4: 246–250. HOLT, J. R. and D. P. COREY. 2000. Two mechanisms for transducer adaptation in vertebrate hair cells. PNAS 97: 11730–11735. AVRAHAM, K. B., T. HASSON, K. P. STEEL, D. M. KINGSLEY, L. B. RUSSELL, M. S. MOOSEKER, N. G. COPELAND, and N. A. JENKINS. 1995. The mouse Snell’s waltzer
196
197
198
199
200
201
202
203
204
deafness gene encodes an unconventional myosin required for structural integrity of inner ear hair cells. Nature Genet. 11: 369–375. DEOL, M. S. and M. C. GREEN. 1966. Snell’s waltzer, a new mutation affecting behaviour and the inner ear in the mouse. Genet. Res. 8: 339–345. HASSON, T., P. G. GILLESPIE, J. A. GARCIA, R. B. MACDONALD, Y. ZHAO, A. G. YEE, M. S. MOOSEKER, and D. P. COREY. 1997. Unconventional myosins in inner-ear sensory epithelia. J. Cell Biol. 137: 1287–1307. SELF, T., T. SOBE, N. G. COPELAND, N. A. JENKINS, K. B. AVRAHAM, and K. P. STEEL. 1999. Role of myosin VI in the differentiation of cochlear hair cells. Dev. Biol. 214: 331–341. HASSON, T. and M. S. MOOSEKER. 1994. Porcine myosin-VI: characterization of a new mammalian unconventional myosin. J. Cell Biol. 127: 425–440. BIEMSDERFER, D., S. A. MENTONE, M. MOOSEKER, and T. HASSON. 2002. Expression of myosin-VI within the early endocytic pathway in the adult and developing proximal tubule. Am. J. Physiol Renal Physiol. 282: F785–F794. GIBSON, F., J. WALSH, P. MBURU, A. VARELA, K. A. BROWN, M. ANTONIO, K. W. BEISEL, K. P. STEEL, and S. D. M. BROWN. 1995. A type VII myosin encoded by the mouse deafness gene shaker-1. Nature 374: 62–64. SELF T., M. MAHONY J. FLEMING, J. WALSH, S. D. M. BROWN, and K. P. STEEL. 1998. Shaker-1 mutations reveal roles for myosin VIIA in both development and function of cochlear hair cells. Development 125: 557–566. HASSON, T., M. B. HEINTZELMAN, J. SANTOS-SACCHI, D. P. COREY, and M. S. MOOSEKER. 1995. Expression in the cochlea and retina of myosin VIIa, the gene product defective in Usher syndrome type 1B. PNAS 92: 9815–9819. K¨USSEL-ANDERMAN, P., A. EL-AMRAOUI, S. SAFIEDDINE, S. NOUAILLE, I. PERFETTINI, M. LECUIT, P. COSSART, U. WOLFRUM, and C. PETIT. 2000. Vezatin, a novel transmembrane protein, bridges myosin
References
205
206
207
208
209
210
211
212
213
VIIA to the cadherins-catenins complex. EMBO J. 19: 6020–6029. CORDONNIER, M. N., D. DAUZONNE, D. LOUVARD, and E. COUDRIER. 2001. Actin filaments and myosin I alpha cooperate with microtubules for the movement of lysosomes. Mol. Biol. Cell 12: 4013–4029. RAPOSO, G., M. N. CORDONNIER, D. TENZA, B. MENICHI, A. D¨URRBACH, D. LOUVARD, and E. COUDRIER. 1999. Association of myosin I alpha with endosomes and lysosomes in mammalian cells. Mol. Biol. Cell 10: 1477–1494. DURRBACH, A., G. RAPOSO, D. TENZA, D. LOUVARD, and E. COUDRIER. 2000. Truncated brush border myosin I affects membrane traffic in polarized cells. Traffic 1: 411–424. HUBER, L. A., I. FIALKA, K. PAIHA, W. HUNZIKER, D. B. SACKS, M. B¨AHLER, M. WAY, R. GAGESCU, and J. GRUENBERG. 2000. Both calmodulin and the unconventional myosin myr4 regulate membrane trafficking along the recycling pathway of MDCK cells. Traffic 1: 494–503. NEUHAUS, E. M. and T. SOLDATI. 2000. A myosin I is involved in membrane recycling from early endosomes. J. Cell Biol. 150: 1013–1026. JONTES, J. D., R. A. MILLIGAN, T. D. POLLARD, and E. M. OSTAP. 1997. Kinetic characterization of brush border myosin-I ATPase. PNAS 94: 14332–14337. LAPIERRE, L. A., R. KUMAR, C. M. HALES, J. NAVARRE, S. G. BHARTUR, J. O. BURNETTE, D. W. PROVANCE, J. A. MERCER, M. B¨AHLER, and J. R. GOLDENRING. 2001. Myosin Vb is associated with plasma membrane recycling systems. Mol. Biol. Cell 12: 1843–1857. RODRIGUEZ, O. C. and R. E. CHENEY. 2002. Human myosin-Vc is a novel class V myosin expressed in epithelial cells. J. Cell Sci. 115: 991–1004. INOUE, A., O. SATO, K. HOMMA, and M. IKEBE. 2002. DOC-2/DAB2 is the binding partner of myosin VI. BBRC 292: 300–307.
214 MORRIS, S. M., S. D. ARDEN, R. C. ROBERTS, J. KENDRICK-JONES, J. A. COOPER, J. P. LUZIO, and F. BUSS. 2002. Myosin VI binds to and localises with Dab2, potentially linking receptor-mediated endocytosis and the actin cytoskeleton. Traffic 3: 331–341. 215 MORRIS, S. M. and J. A. COOPER. 2001. Disabled-2 colocalizes with LDLR in clathrin-coated pits and interacts with AP-2. Traffic 2: 111–123. 216 STEPHENSON, R. S., J. E. O’TOUSA, N. J. SCAVARDA, L. L. RANDALL, and W. S. PAK. 1983. Drosophila mutants with reduced rhodopsin content. In: Biology of Photoreceptors. Edited by D. COSENS and D. VINCE-PRUE, Cambridge: Cambridge University Press, pp. 471–495. 217 PORTER, J. A. and C. MONTELL. 1993. Distinct roles of the Drosophila ninaC kinase and myosin domains revealed by systematic mutagenesis. J. Cell Biol. 122: 601–612. 218 PORTER, J. A., B. MINKE, and C. MONTELL. 1995. Calmodulin binding to Drosophila NinaC required for termination of phototransduction. EMBO J. 14: 4450–4459. 219 BATTELLE, B. A., A. W. ANDREWS, B. G. CALMAN, J. R. SELLERS, R. M. GREENBERG, and W. C. SMITH. 1998. A myosin III from Limulus eyes is a clock-regulated phosphoprotein. J. Neurosci. 18: 4548–4559. 220 WES, P. D., X. Z. S. XU, H. S. LI, F. CHIEN, S. K. DOBERSTEIN, and C. MONTELL. 1999. Termination of phototransduction requires binding of the NINAC myosin III and the PDZ protein INAD. Nature Neurosci. 2: 447–453. 221 GARC´IA, J. A., A. G. YEE, P. G. GILLESPIE, and D. P. COREY. 1998. Localization of myosin-Iβ near both ends of tip links in frog saccular hair cells. J. Neurosci. 18: 8637–8647. 222 STEYGER, P. S., P. G. GILLESPIE, and R. A. BAIRD. 1998. Myosin1β is located at tip link anchors in vestibular hair bundles. J. Neurosci. 18: 4603–4615. 223 GILLESPIE, P. G. and R. G. WALKER. 2001. Molecular basis of mechanosensory transduction. Nature 413: 194–202.
39
40 Myosin Superfamily Proteins
224 HOLT, J. R., S. K. H. GILLESPIE, D. W. PROVANCE, Jr., K. SHAH, K. M. SHOKAT, D. P. COREY, J. A. MERCER, and P. G. GILLESPIE. 2002. A chemical–genetic strategy implicates myosin-1c in adaptation by hair cells. Cell 108: 371–381. 225 GILLESPIE, P. G., S. K. H. GILLESPIE, J. A. MERCER, K. SHAH, and K. M. SHOKAT. 1999. Engineering of the myosin-Iβ nucleotide-binding pocket to create selective sensitivity to N-modified ADP analogs. J. Biol. Chem. 274: 31373–31381. 226 REINHARD, J., A. A. SCHEEL, D. DIEKMANN, A. HALL, C. RUPPERT, and M. B¨AHLER. 1995. A novel type of myosin implicated in signalling by rho family GTPases. EMBO J. 14: 697–704. 227 WIRTH, J. A., K. A. JENSEN, P. L. POST, W. M. BEMENT, and M. S. MOOSEKER. 1996. Human myosin-IXb, an unconventional myosin with a chimerin-like rho/rac GTPase-activating protein domain in its tail. J. Cell Sci. 109: 653–661. 228 SEIDMAN, J. G. and C. SEIDMAN. 2001. The genetic basis for cardiomyopathy: from mutation identification to mechanistic paradigms. Cell 104: 557–567. 229 POETTER, K., H. JIANG, S. HASSANZADEH, S. R. MASTER, A. CHANG, M. C. DALAKAS, I. RAYMENT, J. R. SELLERS, L. FANANAPAZIR, and N. D. EPSTEIN. 1996. Mutations in either the essential or regulatory light chains of myosin are associated with a rare myopathy in human heart and skeletal muscle. Nature Genet. 13: 63–69. 230 GRISCELLI, C., A. DURANDY, D. GUY-GRAND, F. DAGUILLARD, C. HERZOG, and M. PRUNIERAS. 1978. A syndrome associating partial albinism and immunodeficiency. Am. J. Med. 65: 691–702. 231 PASTURAL, E., F. J. BARRAT, R. DUFOURCQ-LAGELOUSE, S. CERTAIN, O. SANAL, N. JABADO, R. SEGER, C. GRISCELLI, A. FISHER, and G. de SAINT BASILE. 1997. Griscelli disease maps to chromosome 15q21 and is associated with mutations in the myosin-Va gene. Nature Genet. 16: 289–292.
232 WESTBROEK, W., J. LAMBERT, and J. M. NAEYAERT. 2001. The dilute locus and Griscelli syndrome: gateways towards a better understanding of melanosome transport. Pig. Cell Res. 14: 320–327. 233 PASTURAL, E., F. ERSOY, N. YALMAN, N. WULFFRAAT, E. GRILLO, F. OZKINAY I. TEZCAN, G. GEDIKO¨ GLU, N. PHILIPPE, A. FISCHER, and G. de SAINT BASILE. 2000. Two genes are responsible for Griscelli syndrome at the same 15q21 locus. Genomics 63: 299–306. 234 KLEIN, C., N. PHILIPPE, F. LE DEIST, S. FRAITAG, C. PROST, A. DURANDY, A. FISCHER, and C. GRISCELLI. 1994. Partial albinism with immunodeficiency (Griscelli syndrome). J. Pediat. 125: 886–895. 235 M´ENASCHE´ , G., E. PASTURAL, J. FELDMANN, S. CERTAIN, F. ERSOY, S. DUPUIS, N. WULFFRAAT, D. BIANCHI, A. FISCHER, F. LE DEIST, and G. de SAINT BASILE. 2000. Mutations in RAB27A cause Griscelli syndrome associated with haemophagocytic syndrome. Nature Genet. 25: 173–176. 236 HADDAD, E. K., X. WU, J. A. HAMMER, III, and P. A. HENKART. 2001. Defective granule exocytosis in rab27a-deficient lymphocytes from ashen mice. J. Cell Biol. 152: 835–841. 237 The May-Hegglin/Fechtner Syndrome consortium 2000. Mutations in MYH9 result in May-Hegglin anomaly, and Fechtner and Sebastion syndromes. Nature Genet. 26: 103–105. 238 KELLEY, M. J., W. JAWIEN, T. L. ORTEL, and J. F. KORCZAK. 2000. Mutation of MYH9, encoding non-muscle myosin heavy chain A, in May Hegglin anomaly. Nature Genet. 26: 106–108. 239 PETIT, C., J. LEVILLIERS, and J. P. HARDELIN. 2001. Molecular genetics of hearing loss. Ann. Rev. Genet. 35: 589–646. 240 ERNEST, S., G. J. RAUCH, P. HAFFTER, R. GEISLER, C. PETIT, and T. NICOLSON. 2000. Mariner is defective in myosin VIIA: a zebrafish model for human hereditary deafness. Hum. Mol. Genet. 9: 2189–2196. 241 WEIL, D., S. BLANCHARD, J. KAPLAN, P. GULIFORD, F. GIBSON, J. WALSH, P.
References
242
243
244
245
246
247
248
249
MBURU, A. VARELA, J. LEVILLIERS, M. D. WESTON, P. M. KELLEY W. J. KIMBERLING, M. WAGENAAR, F. LEVI-ACOBAS, D. LARGET-PIET, A. MUNNICH, K. P. STEEL, S. D. M. BROWN, and C. PETIT. 1995. Defective myosin VIIA gene responsible for Usher syndrome type 1B. Nature 374: 60–61. WEIL, D., P. K¨USSEL, S. BLANCHARD, G. L´EVY F. LEVI-ACOBAS, M. DRIRA, H. AYADI, and C. PETIT. 1997. The autosomal recessive isolated deafness, DFNB2, and the Usher 1B syndrome are allelic defects of the myosin-VIIA gene. Nature Genet. 16: 191–193. LIU, X. Z., J. WALSH, P. MBURU, J. KENDRICK-JONES, M. J. TV COPE, K. P. STEEL, and S. D. M. BROWN. 1997. Mutations in the myosin VIIA gene cause non-syndromic recessive deafness. Nature Genet. 16: 188–190. EUDY, J. D. and J. SUMEGI. 1999. Molecular genetics of Usher syndrome. Cell. Mol. Life Sci. 56: 258–267. KEATS, B. J. B. and D. P. COREY. 1999. The Usher Syndromes. Am. J. Med. Genet. 89: 158–166. KROS, C. J., W. MARCOTTI, S. M. van NETTEN, T.J. SELF, R. T. LIBBY, S. D. M. BROWN, G. P. RICHARDSON, and K. P. STEEL. 2002. Reduced climbing and increased slipping adaptation in cochlear hair cells of mice with Myo7a mutations. Nature Neurosci. 5: 41–47. LIU, X., I. P. UDOVICHENKO, S. D. M. BROWN, K. P. STEEL, and D. S. WILLIAMS. 1999. Myosin VIIa participates in opsin transport through the photoreceptor cilium. J. Neurosci. 19: 6267–6274. WANG, A., Y. LIANG, R. A. FRIDELL, F. J. PROBST, E. R. WILCOX, J. W. TOUCHMAN, C. C. MORTON, R. J. MORELL, K. NOBEN-TRAUTH, S. A. CAMPER, and T. B. FRIEDMAN. 1998. Association of unconventional myosin MYO15 mutations with human nonsyndromic deafness DFNB3. Science 280: 1447–1451. LLOYD, R. V., S. VIDAL, L. JIN, S. ZHANG, K. KOVACS, E. HORVATH, B. W. SCHEITHAUER, E. T. A. BOGER, R. A.
250
251
252
253
254
255
256
FRIDELL, and T. B. FRIEDMAN. 2001. Myosin XVa expression in the pituitary and in other neuroendocrine tissues and tumors. Am. J. Pathol. 159: 1375–1382. PROBST, F. J., R. A. FRIDELL, Y. RAPHAEL, T. L. SAUNDERS, A. WANG, Y. LIANG, R. J. MORELL, J. W. TOUCHMAN, R. H. LYONS, K. NOBEN-TRAUTH, T. B. FRIEDMAN, and S. A. CAMPER. 1998. Correction of deafness in shaker-2 mice by an unconventional myosin in a BAC transgene. Science 280: 1444–1447. MELCHIONDA, S., N. AHITUV, L. BISCEGLIA, T. SOBE, F. GLASER, R. RABIONET, M. L. ARBONES, A. NOTARANGELO, E. DI IORIO, M. CARELLA, L. ZELANTE, X. ESTIVILL, K. B. AVRAHAM, and P. GASPARINI. 2001. MYO6, the human homologue of the gene responsible for deafness in Snell’s waltzer mice, is mutated in autosomal dominant nonsyndromic hearing loss. Am. J. Hum. Genet. 69: 635–640. WALSH, T., V. WALSH, S. VREUGDE, R. HERTZANO, H. SHAHIN, H. HAIKA, M. K. LEE, M. KANAAN, M. C. KING, and K. B. AVRAHAM. 2002. From flies’ eyes to our ears: mutations in a human class III myosin cause progressive nonsyndromic hearing loss DFNB30. PNAS 99: 7518–7523. LALWANI, A. K., J. A. GOLDSTEIN, M. J. KELLEY, W. LUXFORD, C. M. CASTELEIN, and A. N. MHATRE. 2000. Human nonsyndromic hereditary deafness DFNA17 is due to a mutation in nonmuscle myosin MYH9. Am. J. Hum. Genet. 67: 1121–1128. HASEGAWA, Y. and T. ARAKI. 2002. Identification of a novel unconventional myosin from scallop mantle tissue. J. Biochem. 131: 113–119. HOROWITZ, J. A. and J. A. HAMMER, III. 1990. A new Acanthamoeba myosin heavy chain. Cloning of the gene and immunological identification of the polypeptide. J. Biol. Chem. 265: 20646–20652. BAKER, J. P. and M. A. TITUS. 1997. A family of unconventional myosins from the nematode Caenorhabditis elegans. J. Mol. Biol. 272: 523–535.
41
42 Myosin Superfamily Proteins
257 MOTEGI, F., R. ARAI, and I. MABUCHI. 2001. Identification of two type V myosins in fission yeast, one of which functions in polarized cell growth and moves rapidly in the cell. Mol. Biol. Cell 12: 1367–1380.
258 CHEN, Z. Y., T. HASSON, D. S. ZHANG, B. J. SCHWENDER, B. H. DERFLER, M. S. MOOSEKER, and D. P. COREY. 2001. Myosin VIIb, a novel unconventional myosin, is a constituent of microvilli in transporting epithelia. Genomics 72: 285–296.
1
Plasma-Membrane H+ -ATPase From Yeast Silvia Lecchi and Carolyn W. Slayman Yale University School of Medicine, New Haven, USA
Originally published in: Handbook of ATPase. Edited by Masamitsu Futai, Yoh Wada and Jack H. Kaplan. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30689-3
1 Introduction
The single most abundant protein in the fungal plasma membrane is a protontransporting ATPase, encoded by the PMA1 gene [1] and hydrolyzing as much as one-quarter of cellular ATP [2]. By forming a large electrochemical gradient across the surface membrane, the H+ -ATPase provides energy to an array of protoncoupled cotransporters for sugars, amino acids, and other nutrients; it also contributes to the regulation of intracellular pH (Figure 1). Direct evidence for the ATPase first came several decades ago from microelectrode studies on the filamentous fungus Neurospora crassa, where a large (> 200 mV), metabolically sensitive membrane potential was shown to be generated by ATP-dependent proton pumping [3, 4]. By the early 1980s, the H+ -ATPase had been purified from Schizosaccharomyces pombe [5], Saccharomyces cerevisiae [6], and Neurospora crassa [7, 8] and found to consist of a 100 kDa polypeptide, firmly embedded in the membrane bilayer and requiring detergents for solubilization. Reconstitution into proteoliposomes confirmed that the ATPase did indeed mediate electrogenic proton transport [9–11], with a stoichiometry of 1 H+ /ATP, as predicted from electrophysiological measurements [12]. In parallel, biochemical studies demonstrated that it split ATP by way of a covalent β-aspartyl intermediate [13–16], the hallmark of a widespread group later named the P-type ATPases. Like other members of that group, the H+ -ATPase alternated during its reaction cycle between two major conformational states (E1 and E2 ), which could be distinguished by their different patterns of proteolytic fragments following digestion with low concentrations of trypsin [17, 18]. By 1986, cloning of PMA1 genes from Saccharomyces cerevisiae [1] and Neurospora crassa [19, 20] had revealed clearcut sequence homology between the fungal Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Plasma-Membrane H+ -ATPase From Yeast
Fig. 1 Pma1 H+ -ATPase plays an essential role in yeast cell physiology. Pumping protons (H+ ) across the membrane at the expense of ATP hydrolysis, Pma1 H+ -ATPase establishes the membrane potential (R ) and the pH gradient required for nutrient import and contributes to the regulation of intracellular pH. PM: plasma membrane.
H+ -ATPase, the Kdp K+ -ATPase from Escherichia coli [21], and the plasmamembrane Na+ ,K+ - and sarcoplasmic reticulum Ca2+ -ATPases of animal cells [22–24]. More than 200 P-ATPase genes have since been cloned from archaebacteria, eubacteria, fungi, algae, plants, and animals and analyzed to establish their relationship with one another. In 1995, Lutsenko and Kaplan [25] put forward a useful classification scheme that divides P-type ATPases into three groups based on the cation transported, the number and location of hydrophobic transmembrane segments, and the position of three conserved sequences: DKTGT (where the β-aspartyl phosphointermediate forms), TGES (located upstream of the phosphorylation site in a smaller cytoplasmic loop), and GDGXNDXP (close to the ATP-binding site). According to this scheme, heavy metal pumps are classified as P1 -ATPases; H+ , Na+ /K+ , Mg2+ , and Ca2+ pumps, as P2 -ATPases; and the K+ -ATPase of E. coli, as a lone P3 -ATPase (Table 1). Several years later, Axelsen and Palmgren [26] developed a more complex classification based on the analysis of eight stretches of amino acid sequence shared by all P-type ATPases (Table 1). In this scheme, the heavy metal pumps remain as Type I ATPases, with the K+ -ATPase of E. coli distantly related to them. The Na+ ,K+ -, H+ ,K+ -, and Ca2+ -ATPases and H+ -ATPases still fall in the same general sector of the phylogenetic tree, but are separated into Type II and Type III, respectively. Finally, a place is found for two additional groups of P-ATPases that had become known from genome-sequencing projects: Type IV, involved in the transport of aminophospholipids and other hydrophobic compounds, and Type V, whose function is unknown.
1 Introduction Table 1 Classification of P-Type ATPases
Lutsenko and Kaplan [25]
Cation specificity
Axelsen and Palmgren [26]
Cation specificity
P1
Heavy metal
Type I
P2
Ca2+ Na+ /K+ H+ /K+ H+ Mg2+ K+
Type II
Heavy metal K+ Ca2+ Na+ /K+ H+ /K+ H+ Mg2+
P3
Type III
Type IV Type V
Aminophospholipid Unknown
Within the P-type group, the P2 - or Type II/III ATPases have been most intensively studied. These ATPases have a characteristic topology, with four membranespanning segments (M1-4) towards the N-terminal end of the polypeptide and six such segments (M5-10) towards the C-terminal end (Figure 2). The large cytoplasmic loop between M4 and M5 and the smaller cytoplasmic loop between M2 and M3 assemble to form the catalytically active part of the molecule, as discussed below. The N- and C-termini are also exposed at the cytoplasmic surface of the membrane. This chapter will review how the yeast and Neurospora Pma1 H+ -ATPases have come to serve as simple eukaryotic prototypes of the P2 -ATPases, based on the powerful genetic and cell biological tools that are available to study their reaction mechanism, regulation, and biogenesis.
Fig. 2 Transmembrane topology of Pma1 H+ -ATPase. The ATPase has 10 transmembrane "-helices with both N- and C-termini located in the cytoplasm. Highly conserved amino acid sequences characteristic of Type III ATPases are shown. PM: plasma membrane.
3
4 Plasma-Membrane H+ -ATPase From Yeast
2 Structure 2.1 Ca2+ -ATPase as a Model
A landmark event came in 2000 with the publication by Toyoshima and his colleagues of the first high-resolution structure of a P-type ATPase. They successfully crystallized sarcoplasmic reticulum Ca2+ -ATPase in the E1 conformation (with two Ca2+ ions bound) and solved the structure at 2.6 Å [27]. As expected from an earlier cryoelectron microscopic map at 8 Å resolution [28], they found the ATPase to be organized into a cytoplasmic headpiece and a transmembrane or M domain. Within the latter, four of the ten transmembrane α-helices (M4, M5, M6, and M8) assemble to create a pair of Ca2+ binding sites, consistent with the known stoichiometry of the ATPase. Site I is formed by side-chain oxygen atoms in M5 (N768 and E771), M6 (T799 and D800), and M8 (E908), while Site II is formed by main-chain carbonyl oxygens in M4 (V304, A305, I307) and side-chain oxygens in M4 (E309) and M6 (N796 and D800). Local unwinding of transmembrane helices, helped by two Pro residues in M4 and one in M6, plays an essential role in the architecture of both sites. The headpiece of Ca2+ -ATPase is organized into three domains: P (phosphorylation), N (nucleotide-binding), and A (actuator or anchor), quite distinct from one another in the E1 Ca conformation. The P domain is formed by the two ends of the large M4–M5 cytoplasmic loop and contains the aspartyl residue (D351) that undergoes alternate phosphorylation and dephosphorylation during the reaction cycle. It is a typical Rossman fold, with a seven-stranded parallel β-sheet and eight short α-helices. The large N domain consists of the middle portion of the M4–M5 loop, arranged in a seven-stranded antiparallel β-sheet lying between two helix bundles. In crystals soaked with a non-hydrolyzable ATP analog (TMP-AMP), this compound appears within the N domain, surrounded by residues previously implicated in nucleotide binding. Finally, the N-terminal part of the ATPase and the small M2–M3 cytoplasmic loop come together to form the A domain, which is connected to the transmembrane part of the ATPase by long loops. The function of the A domain was unclear at the time of the first Toyoshima paper, although there was evidence from proteolytic cleavage studies that its conformation changes substantially during the reaction cycle. In 2002 the same group published a second high-resolution (3.1 Å) structure of Ca2+ -ATPase, this time stabilized in the E2 conformation by the inhibitor thapsigargin [29]. It is strikingly different from the E1 Ca structure. On the one hand, six of the ten transmembrane helices (M1–M6) move significantly between E1 and E2 , “opening” the M domain, destroying the two Ca2+ binding sites, and, presumably, releasing Ca2+ into the lumen of the sarcoplasmic reticulum. Simultaneously, the A domain rotates ca. 100◦ horizontally and the N domain inclines ca. 90◦ relative to the membrane, shifting the cytoplasmic headpiece into a “closed” conformation. The authors suggest that the return from E2 to E1 requires Ca2+ to bind, fixing the
2 Structure
membrane helices in a way that prohibits the closed conformation of the cytoplasmic headpiece. The headpiece opens; the γ -phosphate of bound ATP can now reach and phosphorylate D351; and the cycle continues. Thus, although much remains to be done to understand the reaction mechanism fully, the two high-resolution structures have made it possible to develop a preliminary picture of how the Ca2+ pump may work. 2.2 Applicability of the Ca2+ -ATPase Structure to Other P2 -ATPase, Including the Pma1 H+ -ATPase
Not surprisingly, there has since been a flurry of activity to determine whether the first (and now the second) Toyoshima structure can serve as a legitimate template for other P2 -ATPases. So far, there is reason for optimism. Sweadner and Donnet [30] found that the sequence of Na+ ,K+ -ATPase could readily be projected onto the 2.6 Å Ca2+ -ATPase structure in a way that superimposes short stretches of homology throughout the cytoplasmic and membrane domains; when this was done, most insertions and deletions appeared (as they presumably should) on the protein surface. The authors concluded that the core regions of both ATPases are likely to fold in the same basic manner. Further support is that conformationally sensitive Fe2+ cleavage sites, identified by Karlish and co-workers for the Na+ ,K+ ATPase, can be mapped logically onto the E1 and E2 structures of Ca2+ -ATPase (reviewed in Ref. [31]). K¨uhlbrandt and co-workers [32] have recently provided strong evidence that the similarity also extends to the Neurospora and yeast plasma-membrane H+ ATPases. In this case, the authors made a point-by-point comparison between their earlier 8 Å cryoelectron microscopic map of the Neurospora enzyme [33] and Toyoshima’s 2.6 Å crystallographic structure of Ca2+ -ATPase [27]. There was an excellent fit between the two in the M and P domains; the A and N domains could also be accommodated by a rigid-body displacement of 10 Å in the first case and a more substantial rotation of 73◦ in the second case. Within the headpiece, key structural elements involved in ATP binding and formation of the phosphoenzyme reaction intermediate were readily recognizable, and within the membrane, two of the ten cation-ligating residues were conserved in the H+ -ATPase (D730 and E803, corresponding to D800 and E908 of Ca2+ -ATPase). Indeed, only the highly divergent N- and C-termini were impossible to overlay on the Ca2+ -ATPase template. Thus, it seems reasonable to use the K¨uhlbrandt H+ -ATPase model as a framework in developing experiments and interpreting results on the Pma1 ATPases of yeast, Neurospora, and other fungi. 2.3 H+ -ATPase Oligomers
By contrast with the steady progress made towards a detailed structure for the 100 kDa catalytic subunit, there has been considerable debate about the oligomeric state
5
6 Plasma-Membrane H+ -ATPase From Yeast
of the Pma1 ATPase. For the Neurospora enzyme, Scarborough and co-workers have demonstrated that 100 kDa monomers are fully competent to carry out ATP hydrolysis and ATP-dependent proton transport after reconstitution into proteoliposomes [34]. Under other conditions, however, the same enzyme clearly forms hexamers [35], which allow the ready production of two-dimensional crystals for cryoelectron microscopy [33]. Even larger oligomers of yeast Pma1 H+ -ATPase (up to dodecamers) are observed when detergent-solubilized cell extracts are analyzed by gradient centrifugation [36], blue native polyacrylamide gel electrophoresis [37], or co-immunoprecipitation of differently tagged 100 kDa polypeptides [38]. The biological significance of the oligomers is not yet clear. Recent studies by the Chang, Schekman, and Simons laboratories [36–38] suggest that they may play a role in biogenesis, since they are formed soon after the 100 kDa ATPase polypeptide is synthesized and appear to be associated with packaging the ATPase into “lipid rafts” for transport to the plasma membrane (see below). Conversely, K¨uhlbrandt et al. [32] found crystalline patches of rosette-shaped particles in freeze–fracture replicas of starving yeast and Neurospora cells, and consider them to be a downregulated “storage form” of the enzyme. Further work is required to sort out these possible interpretations, which are not mutually exclusive. It will also be of interest to map the regions of the Pma1 molecule that take part in oligomer formation; based on their structural model of the Neurospora ATPase, K¨uhlbrandt et al. [32] propose that the C-terminal domain of one monomer (known to be exposed in the cytoplasm [39]) interacts with Q624 and R625 in the P domain of the neighboring monomer. 2.4 Associated Proteolipids
Studies of highly purified Pma1 H+ -ATPase preparations have given no evidence for a tightly bound β subunit of the kind seen in mammalian Na+ ,K+ - and H+ ,K+ ATPases. Conversely, two small proteolipids do associate closely with the Pma1 enzyme and appear to play a physiologically significant role. Information about them began to emerge in 1992, when Navarre and co-workers first observed diffuse 4 and 7.5 kDa bands in Coomassie-stained gels of highly purified yeast H+ -ATPase [40]. Chloroform–methanol extracts of both bands yielded the same 38-amino acid proteolipid, encoded by a gene that the authors named PMP1 (for plasma-membrane proteolipid). A closely related PMP2 gene was later cloned by hybridization with a PMP1 probe [41]. Although the two open reading frames predict slightly different N-termini, post-translational processing yields mature Pmp1 and Pmp2 proteolipids that share all but a single amino acid residue (A21 vs. S21). Their hydropathy profiles are virtually identical, with a hydrophobic stretch of 24 residues followed by a highly basic, 14-amino acid tail. In their overall topology, Pmp1 and Pmp2 call to mind the amphipathic FYXD proteins that co-purify with mammalian Na+ ,K+ -ATPases (reviewed in Ref. [42]) as well as two regulatory proteins, phospholamban and sarcolipin, that are associated with sarcoplasmic reticulum Ca2+ -ATPases [43, 44].
3 Reaction Mechanism
Single deletions of PMP1 or PMP2 have only minor effects on the activity of yeast H+ -ATPase, but deletion of both genes leads to a 50% reduction in activity, suggesting some kind of regulatory role for the two proteolipids [41]. Recent NMR studies by Neumann and co-workers have explored the in vitro interactions between synthetic fragments of Pmp1 and lipid micelles [45, 46]. Based on their results, the authors propose that Pmp1 extends across the plasma membrane bilayer, with its bulky, hydrophobic N-terminal end located alongside the membrane-embedded domain of the ATPase in the outer, sterol- and sphingolipid-rich leaflet, while its C-terminal part sequesters a subset of negatively charged phosphatidylserine lipids in the inner leaflet to create a proper environment for the ATPase at the cytoplasmic surface of the membrane. The relationship between these structural data and stepwise models for ATPase biogenesis is discussed below.
3 Reaction Mechanism 3.1 Overview of the Reaction Cycle
This section covers three interrelated aspects of the reaction mechanism of Pma1 H+ -ATPase: (i) ATP binding and formation of the β-aspartyl phosphoenzyme intermediate; (ii) the E1 –E2 conformational change; and (iii) H+ transport. The Post-Albers scheme, based on more than three decades of investigation into the partial reactions of P-ATPases, describes the generally accepted relationship among these steps. In the top limb of the cycle, as diagrammed for the H+ -ATPase (Figure 3), ATP is bound with high affinity to the E1 form of the enzyme. Intracellular H+ (or a hydronium ion; see below) then binds and stimulates the transfer of γ -phosphate from ATP to create the high-energy β-aspartyl phosphate intermediate; E1 P shifts to E2 P, accompanied by release of the transported H+ to the extracellular medium; and ATP binds with low affinity to the E2 form, accelerating the return to E1 .
Fig. 3 Reaction cycle of Pma1 H+ -ATPase. Based on the classical Post-Albers scheme for P-type ATPases.
7
8 Plasma-Membrane H+ -ATPase From Yeast
3.2 ATP Binding and Phosphorylation
Although some of the amino acid residues that contribute to the ATP-binding site of P-type ATPases were first identified by affinity labeling and site-directed mutagenesis (reviewed in Ref. [47]), a much more complete picture has emerged with the crystal structure of Ca2+ -ATPase, where TNP-ATP was used to visualize the nucleotide site directly [27]. As expected, key residues have remained virtually unchanged during evolution. They include F487 in Ca2+ -ATPase (F451 in yeast H+ ATPase), K492 (K456), and K515 (K474), which lie deep in the pocket surrounding the adenosine moiety, and R560 (R509) near the mouth of the pocket, where it probably interacts with the β or γ phosphate of ATP. Consistent with this picture, mutational analysis has demonstrated the functional importance of K474 in the yeast Pma1 ATPase [48], and Pardo and co-workers have shown that K474 of the Neurospora enzyme is labeled by fluorescein 5-isothiocyanate in an ATP- and ADPprotectable way [49]. Overall, however, the N domain has been less well conserved than the rest of the cytoplasmic headpiece, with Ca2+ - and H+ -ATPases sharing only ca. 20 % identity in this region. More work is needed to correlate the structural differences with known differences in catalytic properties. In particular, the yeast H+ -ATPase has a K m for MgATP of ca.1.5 mM and K D s that are too high to measure accurately; its relatively weak affinity may relate to the smaller size of the N domain (ca. 150 residues) than in Ca2+ -ATPase (ca. 240 residues). By contrast, the P domain is similar in size (ca. 165 residues) and displays a very high degree of sequence identity (up to 40%) among P2 -type ATPases. This is not surprising in view of its central role in the reaction cycle. Functionally important P-domain motifs include 347-DKTGTLT in sarcoplasmic reticulum Ca2+ -ATPase (378-DKTGTLT in yeast H+ -ATPase) and 601-DPPR (534-DPPR), both located in or near the “hinge” region between the P and N domains; and 701-TGDGVNDAP (632TGDGVNDAP), located at the interface of the P and A domains. The first of these motifs surrounds the Asp residue that forms the characteristic high-energy E1 P intermediate, while the second and third contain residues involved in phosphoryl group transfer [50–52]. Based on site-directed mutagenesis studies of yeast Pma1 H+ -ATPase, amino acid substitutions within these highly conserved parts of the P domain often lead to a pronounced defect in protein folding, which can be recognized by an elevated sensitivity of the ATPase to trypsin [53]. In severe cases, newly synthesized ATPase fails to leave the endoplasmic reticulum and is eventually degraded by the proteasome (see below). In milder cases, the ATPase manages to reach the plasma membrane, but is then recognized by a yet-to-be-identified quality control mechanism and sent to the vacuole for degradation [54]. Similar folding problems have been reported for Na+ ,K+ -ATPase carrying mutations in the conserved TGDGVNDSP motif [55]. Thus, as might be expected from the crystal structures, the interfaces between the P, N, and A domains play a critical role in protein folding and stability. Valiakhmetov and Perlin [56] have recently reported encouraging progress towards mapping the P domain of yeast Pma1 H+ -ATPase. In this study, oxidative
3 Reaction Mechanism
cleavage by Fe2+ was used to identify parts of the 100 kDa polypeptide that lie close to bound Mg2+ and ATP during phosphoryl group transfer. Of the cleavage sites described, three are near conserved residues (Thr-558 and Lys-615) known to take part in the phosphorylation mechanism; the fourth cleavage site (tentatively located within 335-PVGLPA) provides intriguing evidence that the cytoplasmic end of membrane segment 4 may play a hitherto-unsuspected role in this part of the reaction cycle. 3.3 E1–E2 Conformational Change
Profound conformational changes accompany ATP hydrolysis by P-type ATPases [29], and are intricately involved in the transport mechanism. For simplicity, the reaction cycle is usually drawn as an alternation between two major states, E1 and E2 (Figure 3), even though the shift from one to the other presumably involves a carefully orchestrated sequence of smaller movements. Early evidence for structural differences between E1 and E2 came from proteolytic studies of Na+ ,K+ - and Ca2+ ATPases, in which distinctive cleavage patterns could be seen depending upon the step of the reaction cycle (reviewed in Ref. [57]). Differences were also reported in intrinsic tryptophan fluorescence and in the accessibility of individual amino acid residues to group-specific reagents [57]. Several years ago, Karlish and co-workers introduced a versatile way to probe conformational changes in Na+ ,K+ -ATPase by mapping and comparing the sites of Fe2+ - and Cu2+ -catalyzed oxidative cleavage in the presence of ligands that pull the ATPase into the E1 or E2 state. To date, the results fit well with the E1 and E2 structures of Ca2+ -ATPase, and indeed serve to extend those structures by providing information on the conformation-dependent binding of ATP and Mg2+ (reviewed in Ref. [31]). As expected, Pma1 H+ -ATPase also displays major conformational changes during its reaction cycle. Early work by Scarborough [17] and by our laboratory [18] showed that the tryptic fragmentation pattern of the Neurospora enzyme could be altered by ligands that pull the ATPase into the E1 state (ADP) or E2 state (vanadate). Further insight into the nature of the E1 –E2 conformational change has come from a novel group of yeast H+ -ATPase mutants. Haber and co-workers selected the first example of this group (S368F) based on its resistance to hygromycin B [58], an antibiotic later shown to require a negative membrane potential for uptake into the cell [59]. Not only was H+ -ATPase activity reduced in S368F, but the mutant enzyme also had three kinetic abnormalities: a large increase in IC50 for inhibition by vanadate, a several-fold lowering of the K 1/2 for MgATP, and an alkaline shift in pH optimum [60]. Soon afterwards, three similar mutants were identified by our laboratory as part of a scanning mutagenesis study of membrane segment 4 (I332A, V336A, V341A; Ref. [71]). In this case, the amino acid substitutions were located in the middle of M4, making it difficult to imagine that vanadate binding was affected directly. Much more likely was a shift in the conformational equilibrium of the ATPase from E2 (where vanadate binds tightly as a Pi analogue) towards E1 (which is expected to have low affinity for vanadate but high affinity for ATP
9
10 Plasma-Membrane H+ -ATPase From Yeast
and H+ ; see reaction cycle in Figure 3). Mutants with similar kinetic changes have since been described for Na+ ,K+ -ATPase [61, 62], gastric H+ ,K+ -ATPase [63], and sarcoplasmic reticulum Ca2+ -ATPase [64, 65] and interpreted in the same way. The discovery of the vanadate-resistant phenotype has provided a convenient way to map regions of the yeast H+ -ATPase polypeptide that work together to affect the E1 -E2 equilibrium. Not surprisingly, given the global nature of the conformational change [29], “E1 –E2 ” mutations are scattered throughout the 100 kDa polypeptide. Against this general background, however, a remarkable cluster has recently been discovered in a scanning mutagenesis study of the stalk region (S4) that links the cytoplasmic end of membrane-spanning segment 4 to the phosphorylation site, D378. As reported by Ambesi and co-workers [66], mutants at thirteen successive positions from I359 through G371 (including S368F) have undergone a 10- to 200fold increase in the IC50 for vanadate; most of them also show a parallel decrease in the apparent affinity for MgATP and an alkaline shift in pH optimum. In the Ca2+ -ATPase-based homology model of the H+ -ATPase [32], the I359–G371 stretch forms a short α-helix (P1) that is oriented transversely between the membrane and the phosphorylation site (D378). Based on the kinetic behavior of the E1 E2 mutants, it seems likely that this helix plays a central role in the conformational shift that lies at the heart of the transport mechanism.
3.4 H+ Pumping
Sequence identity drops to less than 20% in the M domain of P2 -type ATPases, making it difficult to recognize functional features based on sequence analysis alone. Even before the Toyoshima E1 Ca structure appeared, however, there was considerable evidence from mutagenesis studies that membrane-spanning segments 4, 5, 6, and 8 (M4, M5, M6, and M8) define the actual transport pathway of P2 -ATPases [67–70]. The way in which these four segments cooperate to form the two Ca2+ -binding sites of Ca2+ -ATPase is now clear from the 2.6 Å crystal structure of that enzyme [27]. As summarized above, side-chain oxygen atoms (D, E, N, T) and main-chain carbonyl oxygen atoms (A, V, I) are involved, and the actual coordination geometry is made possible by local unwinding of M4 and M6. Detailed structures are not yet available for other P2 -ATPases, but site-directed mutagenesis has begun to shed light on the residues involved in cation binding and transport. For the yeast Pma1 H+ -ATPase, scanning mutagenesis has been performed along the entire length of M4 [71], M5 [72], M6 and M8 (Petrov et al. and Guerra et al., manuscripts in preparation) and throughout much of M1 and M2 [73]. Among the mutants that have been constructed and analyzed, four types of defects have been noted: i) Protein misfolding, leading to ER retention and eventual degradation. Mutations at 16 intramembrane positions fall into this category, presumably because they disrupt the precise helix packing needed to stabilize the M domain of the H+ ATPase. Not surprisingly, changes in charged residues are poorly tolerated in M5
3 Reaction Mechanism
(R695, H701) and M6 (D730, D739) of the H+ -ATPase (Petrov et al., manuscript in preparation). Kaplan and co-workers have shown that the M5–M6 hairpin of Na+ ,K+ -ATPase is only tenuously embedded in the lipid bilayer, requiring occluded K+ ions to keep it from being lost into the medium after cytoplasmic portions of the polypeptide are removed by proteolysis [74, 75]. M8 appears to play an even more important role in proper maturation of the H+ -ATPase, since alanine substitutions at seven positions along this segment (I794, F796, L797, Q798, I799, L801, I807) lead to misfolding and retention in the ER (Guerra et al., manuscript in preparation). ii) Decrease in ATPase activity. Alanine substitutions in M4 (at G333, L338, V341), M5 (Y691, S699), M6 (L721, V723, F724, I727, F728, L734, Y738) cause H+ ATPase activity to fall below 20 % of the value seen in the wild-type control [71, 72]; Petrov et al., manuscript in preparation. The preponderance of such mutants in M6 again calls attention to the special nature of this membrane segment and points to the importance of bulky hydrophobic residues (Phe, Tyr, Leu, Val, Ile) for its proper functioning. iii) Shift in conformational equilibrium from E1 to E2 . The “E1 –E2 ” phenotype, in which the H+ -ATPase becomes strongly resistant to vanadate, has already been discussed (see above). Six such mutants have been discovered in the membrane domain, and the fact that three of them are located in M4 (I332A, V341A, M346A) points to a possible link between S4 and M4 in mediating the E1 –E2 conformational change (Petrov et al., manuscript in preparation). Similar mutants are also found in M1 (M128C; Ref. [73]), M5 (V692A; Ref. [72]), and M6 (V723A; Petrov et al., manuscript in preparation). iv) Mutational change in coupling between ATP hydrolysis and H+ transport. H+ ATPases pose a special challenge in studies of transport mechanisms since there are no radioisotopic methods to assay unidirectional proton fluxes or to detect protons occluded within the membrane. Instead, experimenters generally rely upon a fluorescent probe such as acridine orange or 9-amino-6-chloro-2methoxyacridine (ACMA) to measure the initial rate of proton uptake by ATPase that has been expressed in inside-out secretory vesicles [76] or purified and reconstituted into proteoliposomes [9–11]. Such measurements gain in reliability if the MgATP concentration is varied over a broad range (say, from 0 to 3.0 mM) and the initial rate of fluorescence quenching plotted as a function of the initial rate of ATP hydrolysis. The resulting plots are typically linear, giving evidence for constant coupling across a wide range of pump velocities [71].
To date, more than 80 mutant H+ -ATPases have been analyzed in this way. For most amino acid substitutions, little or no change is seen in the slope of the quenching vs. hydrolysis plot. However, membrane-spanning segments 5, 6, and 8 contain twelve positions at which mutations have caused the slope (or “coupling ratio”) to fall below 50 % of that seen in the wild-type control [72], Petrov et al., and Guerra et al., manuscripts in preparation. Based on these results, both hydrophobic residues (I712, L700, L708, L713, M791) and polar residues (T733, N792, E703, E803) contribute directly or indirectly to the transport pathway, as do two alanines (A726 and A732) and a glycine (G793).
11
12 Plasma-Membrane H+ -ATPase From Yeast
An unexpected effect on coupling has come from mutations at two positions in M8 of the yeast Pma1 ATPase. In the first case (E803, homologous to E908 of Ca2+ ATPase), replacement by Asn or Ala caused pumping to fall to barely detectable levels, but replacement by Gln led to a remarkable two-fold increase in the slope of the quenching vs. hydrolysis plot [77]. Approximately one helical turn away, substitution of S800 by Ala also produced a ca. 2-fold increase in slope (Guerra, L., et al., manuscript in preparation). Control measurements, in which the rate of decay of pH was tracked after the addition of vanadate to turn off the pump, showed no significant change in the passive permeability of the membrane [77]. Thus, the enhanced transport seen in S800A and E803Q appears to reflect an actual improvement in the effective stoichiometry of the pump. Further work is required to understand the relationship between this finding and earlier reports of variable stoichiometries in the Neurospora [78] and yeast [79] Pma1 ATPases. Efforts are already under way to model the transport pathway of Pma1 ATPase, with the longer-range goal of synthesizing the results of site-directed mutagenesis into a clear mechanistic picture of how proton transport actually works. On the one hand, it is tempting to draw analogies with better-understood proton pumps such as bacteriorhodopsin [80] and Fo F1 ATPase [81], where the carboxyl group of an essential Asp or Glu residue is alternately protonated and deprotonated as H+ ions move through the membrane. On the other hand, the clearcut resemblance between Pma1 H+ -ATPase and P2 -type Na+ ,K+ , H+ ,K+ , and Ca2+ -ATPases suggests that hydronium ions (H3 O+ ) may be the actual transported species, crossing the membrane via one or more binding sites analogous to those seen in the crystal structure of sarcoplasmic reticulum Ca2+ -ATPase. Both possibilities are currently being pursued. Based on homology modeling, Bukrinsky and co-workers have proposed that D730 (M6) and I331, I332, and V334 (M4) of yeast Pma1 ATPase may form a binding site for H3 O+ , analogous to site II of sarcoplasmic reticulum Ca2+ -ATPase [82]. In a similar model of Neurospora Pma1 ATPase, however, Radresa and his colleagues have pointed to the relative shortage of negatively charged groups in the region corresponding to Ca2+ sites I and II [83]. Rather, these authors emphasize a bacteriorhodopsin-like string of polar cavities, starting at D378 (the catalytic phosphorylation site), that may conduct protons from the cytoplasmic surface into the interior of Pma1 ATPase. It is unclear whether site-directed mutagenesis will be able to distinguish unambiguously between the two types of mechanistic models or whether other kinds of data, including a longhoped-for high-resolution Pma1 structure, will be required.
4 Biogenesis
Alongside studies of reaction mechanism, there has been great interest in the biogenesis of P2 -type ATPases which, as large polypeptides anchored in the lipid bilayer by multiple hydrophobic helices, pose a special challenge to the membrane insertion and trafficking machinery. The yeast H+ -ATPase has proved to be an
4 Biogenesis
excellent model system. Unlike Na+ ,K+ -ATPase, which has been equally well studied (reviewed in [84, 85]), the yeast ATPase lacks the complication of an auxiliary β-subunit. Work in yeast also benefits from powerful genetic tools that are not yet available in mammalian systems. 4.1 Pma1 Mutants with Defects in Folding and Biogenesis
As expected, yeast Pma1 ATPase is synthesized and inserted into the membrane in the rough endoplasmic reticulum [86]. Studies by Addison on the closely related Neurospora H+ -ATPase have defined the way in which alternating signal anchor (SA) and stop-transfer (ST) sequences serve to thread the 100 kDa polypeptide into the bilayer, helped by hairpin formation between neighboring membrane segments [87, 88]. At the earliest time points in a pulse-chase experiment, the ATPase can already be protected against trypsinolysis by low concentrations of MgADP, MgATP, and vanadate, indicating that its cytoplasmic domain has folded to form recognizable binding sites for nucleotides and inhibitors [89]. It then travels along the well-known secretory pathway to the plasma membrane. The availability of nearly 300 point mutants of the yeast H+ -ATPase has provided a wealth of material to explore the structural requirements for ATPase folding and trafficking [90]. At one end of the spectrum are mutants like D378N, D378S, and D378A, which are severely misfolded [91], become trapped in the ER [91, 92], and are eventually ubiquitinated and degraded in the proteasome [93]. Such mutants act in a dominant lethal fashion to prevent growth when co-expressed with wild-type Pma1-ATPase [91, 92]; this behavior has recently been traced to the formation of mixed oligomers between mutant and wild-type polypeptides [38]. At the other end of the spectrum are mutants that display a modest folding defect and a relatively mild genetic phenotype. By means of confocal microscopy, it has been possible to visualize a transient arrest of one such ATPase (G381A) in prominent punctate structures, which resemble the vesicular-tubular complexes (VTCs) that serve as a normal ER-Golgi intermediate in mammalian cells [94]. Thus, the yeast structures may correspond to specific ER exit ports that become amplified as the mutant ATPase is synthesized. The G381A protein eventually escapes the VTCs and travels to the plasma membrane, still in a misfolded, trypsin-sensitive state. There, its behavior points to an additional, yet-to-be-characterized quality control step that recognizes the ATPase as defective and sends it to the vacuole for degradation [94]. A separate series of experiments has shown that G381A can impose its phenotype on co-expressed wild-type ATPase, transiently retarding the wild-type protein in the ER and later stimulating its degradation in the vacuole [94]. Both effects serve to lower the steady-state amount of wild-type ATPase in the plasma membrane and can thus explain the co-dominant genetic behavior of the G381A mutation. Between the two extremes are numerous genetically intermediate mutants, which display a recessive lethal phenotype but have not yet been studied biochemically. All three kinds of mutations cluster in regions of the H+ -ATPase that are likely to be
13
14 Plasma-Membrane H+ -ATPase From Yeast
important for protein folding: in the 231 TGES and 632 TGDGVNDAP motifs, located at the interface between the A and P domains; in the 535 DPPR and 378 DKTGTLT motifs, situated within or immediately adjacent to the “hinge” regions between the N and P domains; and in the membrane-spanning segments, known to play a critical role in protein folding and stability (reviewed in Ref. [90]). 4.2 Use of Pma1 Mutants to Screen for Other Genes that Play a Role in Biogenesis and Quality Control
Because Pma1 ATPase is delivered to the cell surface by way of the secretory pathway, its biogenesis can be probed with the classical collection of temperaturesensitive “sec” mutants that define the basic architecture of that pathway. For example, sec18ts , sec7ts , and sec6ts strains have made it possible to detect stepwise Ser/Thr phosphorylation of the ATPase between the endoplasmic reticulum and the plasma membrane [95]. The sec6ts strain has also permitted the expression of mutant Pma1 ATPases in secretory vesicles, where ATP hydrolysis and ATPdependent proton transport can be examined quantitatively, free of contamination by wild-type ATPase [76]. More recently, genetic screens have been invented to uncover genes that play a specialized role in ATPase biogenesis (Table 2). In one such approach, Haber and his colleagues [96] began with the pma1-114 mutant, which is moderately resistant to hygromycin B due to a low membrane potential; they then used higher concentrations of the same drug to select “mop” (modifier of pma1) mutations leading to increased resistance. Among the isolates were several allelic strains (mop2) with a reduced amount of Pma1 ATPase at the cell surface. The new mutations did not affect transcription of the PMA1 gene, nor did they cause Pma1 ATPase to be arrested intracellularly. Rather, the 108 kDa Mop2 protein (also known as End4 or Sla2) appears to play a role in setting the amount and/or stability of the ATPase at the plasma membrane. In parallel, Chang and co-workers have used misfolded Pma1 mutants to pinpoint other genes involved in ATPase biogenesis and quality control. One novel gene identified in this way is EPS1, which, when disrupted, suppresses the dominant lethal phenotype of pma1-D378N and re-routes the mutant ATPase to the plasma membrane [93]. Interestingly, EPS1 encodes a protein disulfide isomerase that may act as a membrane-bound chaperone. Other genes, when overexpressed [AST1; [97]] or disrupted [SOP; [98]], allow a temperature-sensitive mutant ATPase (pma1–114) to escape quality control and move to the plasma membrane. The nature of these gene products and the specificity of their role in biogenesis are currently under active study. Kaiser and his co-workers [99] have focused genetically on the formation and activity of COPII vesicles, which carry newly synthesized proteins from the ER to the Golgi. Among ten LST (lethal with sec thirteen) genes identified by a method known as “synthetic lethality”, the product of one (Lst1p) helps to package Pma1 ATPase into the vesicles [99, 100]; significantly, it is homologous to a known COPII
4 Biogenesis Table 2 Proteins governing the biogenesis of Pma1 H+ -ATPase
Protein
Subscellular location
Function
effect on Pma1 ATPase
Ref.
Eps1
ER (integral)
Protein disulfide isomerase
[93]
Lst1
ER (peripheral)
Component of COPII vesicle coat
Ast1
Multiple membranes (peripheral)
Targeting to the plasma membrane
Mop2 (End4, Sla2)
Cytoskeleton
Actin filament organization Cell wall organization and biogenesis Endocytosis, exocytosis Polar budding
Deletion of EPS1 prevents the proteasomal degradation of Pma1-D378N ATPase and reroutes it to the plasma membrane Deletion of LST1 inhibits the delivery of Pma1 ATPase to the cell surface Overexpression of AST1 reroutes a temperature-sensitive Pma1 ATPase mutant from the vacuole to the plasma membrane Mutation of MOP2 reduces the amount of Pma1 ATPase in the plasma membrane
Sop1-16
Various
Not known
Deletion of each SOP gene perturbs the endosomal trafficking and re-routes a temperature-sensitive Pma1 ATPase mutant from the vacuole to the plasma membrane
[98]
[99, 100] [97]
[96]
coat protein, Sec24p. More work will be required to understand the exact function of Lst1p, but its discovery hints at differentiation of the secretory machinery to handle large, physiologically important cargo molecules such as the ATPase. 4.3 Role of Lipid Rafts
In addition to the protein components of the secretory pathway, membrane lipids also play a vital role in the biogenesis of Pma1 ATPase. Recent studies by Simons and his colleagues have shown that newly synthesized ATPase and other proteins destined for the plasma membrane are sorted preferentially into lipid rafts, which form an ordered, sphingolipid- and ergosterol-rich phase within the bilayer [101]. If a mutation such as pma1-7 impairs the ability of the 100 kDa ATPase polypeptide to associate with the rafts, the ATPase is mistargeted to the vacuole rather than the plasma membrane [36]. Thus, the rafts are an essential part of the biogenesis
15
16 Plasma-Membrane H+ -ATPase From Yeast
process; at the plasma membrane, their high sphingolipid and sterol content may also protect against the stresses of the external environment [36, 101].
5 Regulation
Although P2 -ATPases are structurally and functionally similar in most respects, very different mechanisms have evolved to regulate their activity; this is perhaps not surprising in view of the wide range of physiological roles played by members of the family. In some cases, regulation is mediated by an interacting protein: for example, phospholamban for sarcoplasmic reticulum Ca2+ -ATPase [102] or 14-3-3 protein for plant plasma-membrane H+ -ATPase [103–105]. Small proteolipids are also associated with many P2 -ATPases and appear to modulate activity (see above). In other cases, regulation is accomplished intramolecularly by posttranslational modification – usually phosphorylation of one or more Ser/Thr residues (reviewed in Ref. [106]). The latter mechanism is clearly involved in the well-documented posttranslational regulation of yeast H+ -ATPase activity by glucose. It has been known since 1983 that, when yeast cells are placed in carbon-free medium, there is a rapid 5- to 10-fold decrease in H+ -ATPase activity, and when glucose is added back, activity rebounds completely in less than 5 minutes [107]. Although the mechanism is not yet fully understood, there is growing evidence to implicate the C-terminus of the ATPase, acting as an autoinhibitory domain [108]. Mutations at potential phosphorylation sites near the C-terminus affect the ability of glucose to stimulate ATPase activity [109], and in thermolysin digests of the ATPase, two (as yet unidentified) phosphopeptides decrease in amount during carbon starvation and increase again upon glucose addition [95]. Thus, it has been proposed that the C-terminus becomes dephosphorylated during carbon starvation, allowing it to interact in an inhibitory way with one or more catalytically important parts of the ATPase; upon addition of glucose, the C-terminus is rephosphorylated and the inhibition is released. Recent evidence points to a role for stalk segment 5 (the cytoplasmic extension of M5) in this regulatory process. Cys substitutions at seven positions along one face of S5 lead to strong constitutive activation of the ATPase [110]. Furthermore, labeling studies with a membrane-impermeant fluorescent maleimide have pinpointed three S5 cysteines that fail to react in ATPase from glucose-starved cells but become reactive in ATPase from glucose-metabolizing cells, pointing to a significant conformational change between the two states of the protein [111]. Work is now needed to identify the Ser/Thr residues that undergo glucose-dependent phosphorylation, and, by means of biochemical and biophysical approaches, to probe the interaction of the C-terminus with other parts of the ATPase including S5. Progress is also being made towards defining the elements of the signal transduction pathway. In particular, Portillo and co-workers have recently identified two kinases (Ptk2 and Hrk1) that may mediate Ser phosphorylation of the ATPase in response to glucose [112].
6 Emerging Knowledge of Other Yeast P-type ATPase
6 Emerging Knowledge of Other Yeast P-type ATPase
The sequence of the yeast genome, published in 1996 [113], allowed the first identification of the entire set of P-type ATPases that support the growth and physiological functioning of a simple eukaryotic cell. In all, there are 16 P-ATPases in Saccharomyces cerevisiae [114]; Table 3. The plasma membrane contains Pma and Ena ATPases that mediate the efflux of H+ and Na+ (or K+ ) from the cell. Of them, only Pma1 ATPase (the subject of this chapter) is well expressed and highly active under normal laboratory conditions. Ena1 ATPase is induced during growth at elevated Na+ or Li+ concentrations and serves to protect the cell against excessive accumulation of Na+ , Li+ , or K+ [115–117]. Ca2+ homeostasis in yeast relies upon a complex system of transporters and signaling pathways including two endomembrane P-type ATPases: Pmr1 and Pmc1, which are homologues of mammalian SPCA and PMCA ATPases [118–125]. Both work together to maintain a low Ca2+ concentration in the cytoplasm. In addition, by pumping Ca2+ into the Golgi, Pmr1 creates a lumenal Ca2+ concentration (10 µM) that permits the proper functioning of the secretory pathway [121]. Interestingly, Pmr1 ATPase can also transport Mn2+ ions, and it plays an essential role in the resistance of yeast to millimolar Mn2+ [122]. By site-directed mutagenesis of amino acid residues with oxygen-containing side chains throughout M4, M5, M6, M7, and M8, followed by screening for altered sensitivity to Mn2+ and ion-chelating agents, Rao and co-workers have identified two residues in M6 (Asn-774 and Asp-778) that are likely to be involved in divalent cation binding [122]. Even more intriguing is a third mutant in M6 (Q783A) that transports Ca2+ normally but displays a 60-fold
Table 3 P-type ATPases in Saccharomyces cerevisiae
Specificity
Protein
Type Subcellular location
Size
Ref.
H+ H+ Na+ K+ Ca2+ , Mn2+ Ca2+ Cu2+
Pma1 Pma2 Ena1 (Hor6, Pmr2) to Ena5a Pmr1 (Ldb1) Pmc1 Ccc2 Pca1 (Cad2, Pay2) Drs2 (Fun38, Swa3) Neo1 Dnf1 Dnf2 Dnf3 Spf1 (Cod1, Pio1, Per9) YOR291w
IIIA IIIA IID IIA IIB IB IB IV IV IV IV IV V V
918 aa 947 aa 1091 aa 950 aa 1173 aa 1004 aa 1216 aa 1355 aa 1151 aa 1571 aa 1612 aa 1656 aa 1215 aa 1472 aa
This chapter [135, 136] [115–117] [118–123] [124, 125] [126, 127] [128] [129–131] [131] [131] [131] [131] [132, 133] [134]
PL
Unknown
a
Plasma membrane Plasma membrane Plasma membrane Golgi Vacuolar membrane Golgi Unknown Golgi Unknown Plasma membrane Plasma membrane Golgi ER Unknown
Various strains of Saccharomyces cerevisiae contain a tandem cluster of up to 5 nearly identical ENA genes. PL: Aminophospholipid.
17
18 Plasma-Membrane H+ -ATPase From Yeast
reduction in affinity for Mn2+ [122]; results of this kind promise to be useful in modeling the molecular basis for cation selectivity. Studies have barely begun on the remaining yeast P-type ATPases. Two belong to the Type IB subfamily and are related to the Cu2+ -ATPases known to underlie Menkes and Wilson’s diseases; they are Ccc2, located in a late-Golgi or post-Golgi compartment [126, 127], and Pca1, whose location and function have not yet been established [128]. Five (Drs2/Fun38/Swa3, Neo1, and Dnf1-3) are Type IV ATPases, postulated to “flip” aminophosholipids between leaflets of the membrane bilayer [129–131]. One Type V ATPase (Spf1/Cod1/Pio1/Per9), located in the endoplasmic reticulum, has been implicated in Ca2+ homeostasis [132, 133], while another (YOR291W) is known only as an open reading frame [134]. Thus, it seems certain that the P-type ATPases of yeast will provide rich material for studies of subcellular targeting, molecular mechanism, and physiological regulation for many years to come.
Acknowledgments
The authors are grateful to members of the Slayman laboratory for helpful comments on the manuscript. Work in the laboratory has been supported by research grant GM15761 from the National Institute of General Medical Sciences.
References 1 SERRANO R., KIELLAND-BRANDT M. C., and FINK G. R., Nature, 1986, 319, 689–693. 2 GRADMANN D. HANSEN U. P., LONG W. S., SLAYMAN C. L., and WARNCKE J., J. Membr. Biol., 1978, 39, 333–367. 3 SLAYMAN C. L., LU C. Y. H., and SHANE L., J. Membr. Biol. 1973, 14, 305–338. 4 SLAYMAN C. L., Am. Zool., 1970, 10, 377–392. 5 DUFOUR J. P., and GOFFEAU A., J. Biol. Chem., 1978, 253, 7026–7032. 6 MALPARTIDA F., and SERRANO R., FEBS Lett., 1980, 111, 69–72. 7 BOWMAN B. J., BLASCO F., and SLAYMAN C. W., J. Biol. Chem., 1981, 256, 12343–12349. 8 ADDISON R. and SCARBOROUGH G. A., J. Biol. Chem., 1981, 256, 13165–13171. 9 VILLALOBO A., BOUTRY M. and GOFFEAU A., J. Biol. Chem., 1981, 256, 12081–12087. 10 MALPARTIDA F., and SERRANO R., J. Biol. Chem., 1981, 256, 4175–4177. 11 PERLIN D. S., KASAMO K., BROOKER R. J.,
12
13 14 15 16
17 18 19
20
and SLAYMAN C. W., J. Biol. Chem., 1984, 259, 7884–7892. PERLIN D. S., SAN FRANCISCO M. J., SLAYMAN C. W., and ROSEN B. P., Arch. Biochem. Biophys. 1986, 248, 53–61. AMORY A., FOURY F., and GOFFEAU A., J. Biol. Chem., 1980, 255, 9353–9357. DAME J. B. and SCARBOROUGH G. A., Biochemistry, 1980, 19, 2931–2937. DAME J. B. and SCARBOROUGH G. A., J. Biol. Chem., 1981, 256, 10724–10730. AMORY A., GOFFEAU A., MCINTOSH D. B., and BOYER P. D., J. Biol. Chem., 1982, 257, 12509–12516. ADDISON R., and SCARBOROUGH G. A., J. Biol. Chem., 1982, 257, 10421–10426. MANDALA S. M., and SLAYMAN C. W., J. Biol. Chem., 1988, 263, 15122–15128. HAGER K. M., MANDALA S. M., DAVENPORT J. W., SPEICHER D. W., BENZ E. J. Jr., and SLAYMAN C. W., Proc. Natl. Acad. Sci. U. S. A., 1986, 83, 7693–7697. ADDISON R., J. Biol. Chem., 1986, 261, 14896–14901.
References 21 HESSE J E., WIECZOREK L., ALTENDORF K., REICIN A. S., DORUS E., and EPSTEIN W., Proc. Natl. Acad. Sci. U. S. A., 1984, 81, 4746–4750. 22 KAWAKAMI K., NOGUCHI S., NODA M., TAKAHASHI H., OHTA T., KAWAMURA M., NOJIMA H., NAGANO K., HIROSE T., INAYAMA S. HAYASHIDA H., MIYATA T., and NUMA S., Nature, 1985, 316, 733–773. 23 SHULL G. E., SCHWARTZ A., and LINGREL J. B., Nature, 1985, 316, 691–669. 24 MACLENNAN D. H., BRANDL C. J., KORCZAK B., GREEN N. M., Nature, 1985, 316, 696–700. 25 LUTSENKO S., and KAPLAN J. H., Biochemistry, 1995, 34, 15607–156013. 26 AXELSEN K. B., and PALMGREN M. G., J. Mol. Evol., 1998, 46, 84–101. 27 TOYOSHIMA C., NAKASAKO M., NOMURA H., and OGAWA H., Nature, 2000, 405, 647–655. 28 ZHANG P., TOYOSHIMA C., YONEKURA K., GREEN N. M., and STOKES D. L., Nature, 1998, 392, 835–839. 29 TOYOSHIMA C., and NOMURA H., Nature, 2002, 418, 605–611. 30 SWEADNER K. J., and DONNET C., Biochem. J., 2001, 356, 685–704. 31 KARLISH S. J. D., Ann. New York Acad Sci., 2003, 986, 39–49. 32 K¨UHLBRANDT W., ZEELEN J., and DIETRICH J., Science, 2002, 297, 1692–1696. 33 AUER M., SCARBOROUGH G. A., and K¨UHLBRANDT W., Nature, 1998, 392, 840–843. 34 GOORMAGHTIGH E., CHADWICK C., and SCARBOROUGH G. A., J. Biol. Chem., 1986, 261, 7466–7471. 35 CHADWICK C. C., GOORMAGHTIGH E., and SCARBOROUGH G. A., Arch. Biochem. Biophys., 1987, 252, 348–356. 36 BAGNAT M., CHANG A., and SIMONS K., Mol. Biol. Cell, 2001, 12, 4129–4138. 37 LEE M. C., HAMAMOTO S., and SCHEKMAN R., J. Biol. Chem., 2002, 277, 22395–22401. 38 WANG Q., and CHANG A., Proc. Natl. Acad. Sci. U. S. A., 2002, 99, 12853–12858. 39 MANDALA S. M., and SLAYMAN C. W., J. Biol. Chem., 1989, 264, 16276–81621. 40 NAVARRE C., GHISLAIN M., LETERME S.,
41
42 43
44
45
46
47
48 49 50
51
52
53
54
55 56 57 58
FERROUD C., DUFOUR J. P., and GOFFEAU A., J. Biol. Chem., 1992, 267, 6425–6428. NAVARRE C., CATTY P., LETERME S., DIETRICH F. and GOFFEAU, A. J. Biol. Chem., 1994, 269, 21262–21268. CRAMBERT G., and GEERING K., Sci. STKE 166, 2003, RE1. MACLENNAN D. H., ABU-ABED M., and KANG C., J. Mol. Cell. Cardiol., 2002, 8, 897–918. MACLENNAN D. H., ASAHI M., TUPLING A. R., Ann. N. Y. Acad. Sci., 2003, 986, 472–480. BESWICK V., ROUX M., NAVARRE C., COIC Y. M., HUYNH-DINH T., GOFFEAU A., SANSON A., and NEUMANN J. M., Biochimie, 1998, 80, 451–459. MOUSSON F., COIC Y. M., BALEUX F., BESWICK V., SANSON A., and NEUMANN J. M., Biochemistry, 2002, 41, 13611–13616. JORGENSEN P. L., and PEDERSEN P. A., Biochim. Biophys. Acta, 2001, 1505, 57–74. MALDONADO A. M., and PORTILLO F., J. Biol. Chem., 1995, 270, 8655–8659. PARDO J. P., and SLAYMAN C. W., J. Biol. Chem., 1988, 263, 18664–18668 FARLEY R. A., ELQUZA E., MULLER-EHMSEN J., KANE D. J., NAGY A. K., KASHO V. N., FALLER L. D., Biochemistry, 2001, 40, 6361–6370. PEDERSEN P. A., JORGENSEN J. R., JORGENSEN P. L., J. Biol. Chem., 2000, 275, 37588–37595 JORGENSEN P. L., JORGENSEN J. R., PEDERSEN P. A., J. Bioenerg. Biomembr., 2001, 33, 367–377. DEWITT N. D., TURINHO DOS SANTOS C. F., ALLEN K. E., and SLAYMAN C. W., J. Biol. Chem., 2002, 277, 21027–21040. FERREIRA T., MASON A. B., PYPAERT M., ALLEN K. E., and SLAYMAN C. W., J. Biol. Chem., 1988, 277, 21027–21040 JORGENSEN J. R., and PETERSEN P. A., Biochemistry, 2001, 40, 7301–7308 VALIAKMETOV A., and PERLIN D. S., J. Biol. Chem., 2003, 278, 6330–6336. JORGENSEN P. L., and ANDERSEN J. P., J. Membr. Biol., 1988, 103, 95–120. MCCUSKER J. H., PERLIN D. S., and HABER, J. E., Mol. Cell. Biol., 1987, 7, 4082–4088.
19
20 Plasma-Membrane H+ -ATPase From Yeast
59 PERLIN D. S., BROWN C. L., and HABER J. E., J. Biol. Chem., 1988, 263, 18118–18122. 60 PERLIN D. S., HARRIS S. L., SETO-YOUNG D., and HABER J. E., J. Biol. Chem., 1989, 264, 21857–21864. 61 BOXENBAUM N., DALY S. E., JAVAID Z. Z., LANE L. K., BLOSTEIN R., J. Biol. Chem., 1998, 273, 23086–23092. 62 SEGALL L., LANE L. K., BLOSTEIN R., J. Biol. Chem., 2002, 277, 35202–35209. 63 De PONT J. J., SWARTS H. G., WILLEMS P. H., KOENDERINK J. B., Ann. New York Acad. Sci., 2003, 86, 175–182. 64 VILSEN B., Biochemistry, 1997, 36, 13312–13324. 65 TOUSTRUP-JENSEN M., HAUGE M., VILSEN B., Biochemistry, 2001, 40, 5521–5532. 66 AMBESI A., MIRANDA M., ALLEN K. E., and SLAYMAN C. W., J. Biol. Chem., 2000, 275, 20545–20550. 67 CLARKE D. M., LOO T. W., INESI G., MACLENNAN D. H., Nature, 1989, 339, 476–478. 68 CLARKE D. M., LOO T. W., MACLENNAN D. H., J. Biol. Chem., 1990, 265, 22223–22227. 69 NIELSEN J. M., PEDERSEN P. A., KARLISH S. J., JORGENSEN P. L., Biochemistry, 1998, 37, 1961–1968. 70 PEDERSEN P. A., NIELSEN J. M., RASMUSSEN J. H., JORGENSEN P. L., Biochemistry, 1998, 37, 17818–17827. 71 AMBESI A., PAN R. L. and SLAYMAN C. W., J. Biol. Chem., 1996, 271, 22999–23005. 72 DUTRA M. B., AMBESI A., SLAYMAN C. W., J. Biol. Chem., 1998, 273, 17411–17417. 73 SETO-YOUNG D., HALL M. J., NA S., HABER J. E., and PERLIN D. S., J. Biol. Chem., 1996, 271, 581–587. 74 LUTSENKO S., ANDERKO R., KAPLAN J. H., Proc. Natl. Acad. Sci. U. S. A., 1995, 92, 7936–7940. 75 GATTO C., LUTSENKO S., SHIN J. M., SACHS G., KAPLAN J. H., J. Biol. Chem., 1999, 274, 13737–13740. 76 NAKAMOTO R. K., RAO R. and SLAYMAN C. W., J. Biol. Chem., 1991, 266, 7940–7949. 77 PETROV V. V., PADMANABHA K. P., NAKAMOTO R. K., ALLEN K. E., and SLAYMAN C. W., J. Biol. Chem., 2000, 275, 15709–15716.
78 WARNCKE J., SLAYMAN C. L., Biochim. Biophys. Acta, 1980, 591, 224–233. 79 VENEMA K., and PALMGREN M. G., J. Biol. Chem., 1995, 270, 1965–1967. 80 LANYI J. K., J. Biol. Chem., 1997, 272, 31209–31212. 81 FILLINGAME R. H., and DIMITRIEV O. Y., Biochim. Biophys. Acta, 2002, 1565, 232–245. 82 BUKRINSKY J. T., BUCH-PEDERSEN M. J., LARSEN S., and PALMGREN M. G., FEBS Lett., 2001, 494, 6–10. 83 RADRESA O., OGATA K., WODAK S., RUYSSCHAERT J. M., and GOORMATIGH E., Eur. J. Biochem., 2002, 269, 5246–5258. 84 GEERING K., J. Membr. Biol., 2000, 174, 181–190. 85 GEERING K., J. Bioenerg. Biomembr., 2001, 33, 425–438. 86 FERREIRA T., MASON A. B., SLAYMAN C. W., J. Biol. Chem., 2001, 276, 29613–29616. 87 LIN J., and ADDISON R., J. Biol. Chem., 1995, 270, 6935–6941. 88 LIN J., and ADDISON R., J. Biol. Chem., 1995, 270, 6942–6948. 89 CHANG A., ROSE M. D., and SLAYMAN C. W., Proc. Natl. Acad. Sci. U. S. A., 1993, 90, 5808–5812. 90 MORSOMME P., SLAYMAN C. W., GOFFEAU A., Biochim. Biophys. Acta, 2000, 1469, 133–157. 91 DEWITT N. D., dos SANTOS C. F., ALLEN K. E., and SLAYMAN C. W., J. Biol. Chem., 1998, 273, 21744–21751. 92 HARRIS S. L., NA S., ZHU X., SETO-YOUNG D., PERLIN D. S., TEEM J. H. and HABER J. E., Proc. Natl. Acad. Sci. U. S. A., 1994, 91, 10531–10535. 93 WANG Q., and CHANG A., EMBO J., 1999, 18, 5972–5982. 94 FERREIRA T., MASON A. B., PYPAERT M., ALLEN K. E., SLAYMAN C. W., J. Biol. Chem., 2002, 277, 21027–21040. 95 CHANG, A., and SLAYMAN, C. W. J. Cell Biol., 1991, 115 289–295. 96 NA S. HINCAPIE M., MCCUSKER J. H., HABER J. E., J. Biol. Chem., 1995, 270, 6815–6823. 97 CHANG A., FINK G. R., J. Cell Biol., 1995, 128, 39–49. 98 LUO W., CHANG A., J. Cell Biol., 1997, 138, 731–746.
References 99 ROBERG K. J., CROTWELL M., ESPENSHADE P., GIMENO R. and KAISER C. A., J. Cell Biol., 1999, 145, 659–672. 100 SHIMONI Y., KURIHARA T., RAVAZZOLA M., AMHERDT M., ORCI L., SCHEKMAN R., J. Cell Biol., 2000, 151, 973–984. 101 BAGNAT M., KERANEN S., SHEVCHENKO A., SHEVCHENKO A., SIMONS K., Proc. Natl. Acad. Sci. U. S. A., 2000, 97, 3254–3259. 102 ASAHI M., KIMURA Y., KURZYDLOWSKI K., TADA M., MACLENNAN D. H., J. Biol. Chem., 1999, 274, 32855–32862. 103 JAHN T., FUGLSANG A. T., OLSSON A., BRUNTRUP I. M., COLLINGE D. B. VOLKMANN D., SOMMARIN M., PALMGREN M. G., LARSSON C., Plant Cell, 1997, 9, 1805–1814. 104 FUGLSANG A. T., VISCONTI S., DRUMM K., JAHN T., STENSBALLE A., MATTEI B., JENSEN O. N., ADUCCI P., PALMGREN M. G., J. Biol. Chem., 1999, 274, 36774–36780. 105 MAUDOUX O., BATOKO H., OECKING C., GEVAERT K., VANDEKERCKHOVE J., BOUTRY M., MORSOMME P., J. Biol. Chem., 2000, 275, 17762–17770. 106 THERIEN A. G., BLOSTEIN R., Am. J. Physiol. Cell Physiol., 2000, 279, C541–566.. 107 SERRANO R. FEBS Lett., 156, 1983, 11–14. 108 PORTILLO F., Biochim. Biophys. Acta, 2000, 1469, 31–42. 109 PORTILLO F., ERASO P. and SERRANO R., FEBS Lett., 1991, 287, 71–74. 110 MIRANDA M., PARDO J. P., ALLEN K. E., SLAYMAN C. W., J. Biol. Chem., 2002, 277, 40981–40988. 111 MIRANDA M., ALLEN K. E., PARDO J. P., SLAYMAN C. W., J. Biol. Chem., 2001, 276, 22485–22490. 112 GOOSSENS A., de LA FUENTE N., FORMENT J., SERRANO R., PORTILLO F., Mol. Cell Biol., 2000, 20, 7654–7661. 113 GOFFEAU A. et al., Science, 1996, 274, 546. 563–567. 114 CATTY P., de KERCHOVE D’EXAERDE A., GOFFEAU A., FEBS Lett., 1997, 409, 325–332. 115 HARO R., GARCIADEBLAS B., RODRIGUEZ-NAVARRO A., FEBS Lett., 1991, 291, 189–191.
116 BENITO B., QUINTERO F. J., RODRIGUEZ-NAVARRO A., Biochim. Biophys. Acta, 1997, 1328, 214–226. 117 BENITO B., GARCIADEBLAS B., RODRIGUEZ-NAVARRO A., Microbiology, 2002, 148, 933–941. 118 ANTEBI A., FINK G. R., Mol. Biol. Cell, 1992, 3, 633–654. 119 SORIN A., ROSAS G., and RAO R., J. Biol. Chem., 1997, 272, 9895–9901. 120 DURR G., STRAYLE J., PLEMPER R., ELBS S., KLEE S. K., CATTY P., WOLF D. H., RUDOLPH H. K., Mol. Biol. Cell, 1998, 9, 1149–1162. 121 STRAYLE J., POZZAN T., and RUDOLPH H. K., EMBO J., 1999, 18, 4733–4743. 122 WEI Y., CHEN J., ROSAS G., TOMPKINS D. A., HOLT P. A., RAO R., J. Biol. Chem., 2000, 275, 23927–23932. 123 MANDAL D., RULLI S. J., RAO R., J. Biol. Chem., 2003, 278, 35292–35298. 124 CUNNINGHAM K. W., and FINK G. R., J. Exp. Biol., 1994, 196, 157–166. 125 CUNNINGHAM K. W., and FINK G. R., 1994, J. Cell Biol. 124, 351–363. 126 FU D., BEELER T. J., DUNN T M., Yeast, 1995, 11, 283–292. 127 HUFFMAN D. L., and O’HALLORAN T. V., Annu. Rev. Biochem., 2001, 70, 677–701. 128 RAD M. R., KIRCHRATH L., HOLLENBERG C. P., et al., Yeast, 1994, 10, 1217–1225. 129 TANG X., HALLECK M. S., SCHLEGEL R. A., WILLIAMSON P., Science, 1996, 272, 1495–1497. 130 CHEN C. Y., INGRAM M. F., ROSAL P. H., GRAHAM T. R., J. Cell Biol., 1999, 147, 1223–1236. 131 HUA Z., FATHEDDIN P., GRAHAM T. R., Mol. Biol. Cell, 2002, 13, 3162–3177. 132 CRONIN S. R., RAO R., HAMPTON R. Y., J. Cell Biol., 2002, 157, 1017–1028. 133 VASHIST S., FRANK C. G., JAKOB C. A., and DAVIS T. W., Mol. Biol. Cell, 2002, 13, 3955–3966. 134 POIREY R., CZIEPLUCH C., TOBIASCH E., PUJOL A., KORDES E., JAUNIAUX J. C., Yeast, 1997, 13, 479–482. 135 SUPPLY P., WACH A., GOFFEAU A., J. Biol. Chem., 1993, 268, 19753–19759. 136 FERNANDES A. R., SA-CORREIA I., Yeast, 2003, 20, 207–219.
21
1
Gastric H+ ,K+ -ATPase Jaim Moo Shin, Olga Vagin, Keith Munson, and George Sachs Membrane Biology Laboratory, West Los Angeles VA Medical Center, Los Angeles, USA
Originally published in: Handbook of ATPase. Edited by Masamitsu Futai, Yoh Wada and Jack H. Kaplan. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30689-3
1 Gastric H+ ,K+ -ATPase
The gastric H+ ,K+ -ATPase exchanges cytoplasmic protons for extracytoplasmic potassium ions in an electroneutral manner and, by being coupled to a KCl pathway in the apical membrane of the parietal cell, is responsible for the elaboration of HCl by the parietal cell of the gastric mucosa. The transport of H+ across the apical membrane is by means of conformational changes in the protein driven by cyclic phosphorylation and dephosphorylation of the catalytic subunit of the ATPase. This mechanism places the ATPase in the P2 ATPase family. The gastric H+ ,K+ -ATPase is composed of two subunits, the α-subunit and the β-subunit. The α subunit with molecular mass of about 100 kDa has the catalytic site and the β subunit with peptide mass of 35 kDa is strongly but non-covalently associated with the α subunit. The β subunit has six or seven N-glycosylated sites exposed to the extra-cytoplasmic surface. There is about 60% sequence homology between the Na+ ,K+ -ATPase and the H+ ,K+ -ATPase a subunit, while the CaATPase of sarcoplasmic reticulum shows only about 15% overall homology with the H+ ,K+ -ATPase. The hydropathy profile is, however, very similar, as is the 3D structure based on site-directed mutagenesis. The β subunit has 35% homology to the β 2 subunit of the Na+ ,K+ -ATPase. The gastric H+ ,K+ -ATPase has been a therapeutic target in treatment of acidrelated disease for the last 15 years. Substituted benzimidazoles that inhibit the H+ ,K+ -ATPase are the newest and most effective class of anti-ulcer drugs used to treat various diseases of the upper GI tract.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Gastric H+ ,K + -ATPase
1.1 α Subunit of Gastric H+ ,K+ -ATPase
The primary sequences of the α subunits deduced from cDNA have been defined for pig [1], rat [2] and rabbit [3]. The hog gastric H+ ,K+ -ATPase α subunit sequence deduced from its cDNA consists of 1034 amino acids and has an Mr of 114,285 [1]. The sequence based on the known N-terminal amino acid sequence is one less than the cDNA derived sequence [4]. Rat gastric H+ ,K+ -ATPase consists of 1033 amino acids and has a Mr of 114,012 [2], and rabbit gastric H+ ,K+ -ATPase consists of 1035 amino acids, with Mr = 114,201 [3]. Notably, the initial publication of the rat sequence had an N-terminal fragment of retinol dehydrogenase that is used in some microarrays as a marker for the ATPase (Agilent). Homology of the α subunits among the animal species is extremely high (over 97% identities). The gene sequence for human and the 5 part of the rat H+ ,K+ -ATPase α subunits have been determined [5–7]. The human gastric H+ ,K+ -ATPase gene has 22 exons and encodes α protein of 1035 residues, including the initiator methionine residue (Mr = 114,047). The N terminal amino acid is actually glycine. These H+ ,K+ ATPase α subunits show high homology (∼60 % identity) with the Na+ ,K+ -ATPase catalytic α subunits [5]. The gastric α subunit has conserved sequences along with the other P type ATPases, the sarcoplasmic reticulum Ca-ATPase and the Na+ ,K+ -ATPase, for the ATP binding site, the phosphorylation site, the pyridoxal 5 -phosphate binding site and the fluorescein isothiocyanate binding site. These sites are thought to be within the ATP binding domain in the large cytoplasmic loop between membranespanning segments 4 and 5 (N domain). Using the hog gastric vesicles containing the H+ ,K+ -ATPase, it was shown that pyridoxal 5 -phosphate is bound at lys497 of the α subunit in the absence but not the presence of ATP [8], suggesting that lys497 is present in the ATP binding site or in its vicinity [9]. The phosphorylation site was observed to be at asp386 [10], which is well conserved in other P type ATPases. Fluorescein isothiocyanate (FITC) covalently labels the gastric H+ ,K+ -ATPase in the absence of ATP [11]. The binding site of FITC was at lys518 [12]. Later, several additional lysines, such as those at 497 and 783, were shown to react with FITC during the inactivation of the Na+ ,K+ ATPase and to be protected from reaction with FITC when ATP was present in the incubation. Based on these data, similar lysines of the H+ ,K+ -ATPase could be near or in the ATP binding site. The membrane topology of the α subunit has been extensively studied. The hydropathy plots is most commonly used to predict the location of membranespanning α helices [13]. These are based on determining α moving average of hydrophobic, neutral and hydrophilic amino acids using α variety of scales. Interpretation of the hydrophobicity plots of the α subunit of the H+ ,K+ -ATPase based on the primary amino acid sequence suggested an 8 or 10 membrane-spanning segment model for the secondary structure. The first 4 membrane-spanning segments in the N terminal one-third of the α subunit are clearly defined; however, in
1 Gastric H+ ,K + -ATPase
the C terminal one-third of the protein α prediction from the hydropathy analysis is more difficult. The C-terminal amino acids of the α subunit are tyr-tyr, which therefore can be iodinated with peroxidase–H2 O2 –125 I on the cytoplasmic side of intact hog gastric vesicles. Digestion with carboxypeptidase Y then released about 28% of the counts incorporated into the α subunit, as would be predicted from α cytoplasmic location of the C terminal tyrosines [14]. These data show that there is an even number of transmembrane segments in the α subunit. In either an 8- or 10-transmembrane segment model, each transmembrane pair connecting the luminal loop has at least one cysteine, which allows fluorescent labeling of the cysteines left after complete cleavage of the cytoplasmic domain by trypsin. The N-terminus of each fluorescent peptide fragment is then defined by N-terminal sequencing. The size of the fragment is determined from Mr measurement in the tricine gradient gels used for the separation. This then allows the C-termination of the peptide to be identified using the lys or arg cleavage sites at this end. Four transmembrane pairs connected by their luminal loop were detected in the hog gastric H+ ,K+ -ATPase digest [15, 16]. A tryptic peptide fragment beginning at gln104 represents the H1/loop/H2 sector. The H3/loop/H4 sector was found at a single peptide beginning at thr291, and the H5/loop/H6 sector at a peptide beginning at leu776. The H7/loop/H8 region was found in a single peptide fragment of 11 kDa, beginning at leu853. These represent the first 4 transmembrane segment pairs of the proposed 10 transmembrane segment model [3]. From these studies, 4 membrane segment/loop/membrane segment sectors were identified, corresponding to H1/loop/H2 through H7/loop/H8. Although the hydropathy plot predicts two additional membrane-spanning segments (H9, H10) at the C-terminal region of the enzyme, no evidence was obtained for this pair biochemically, even though H9 is predicted to have 4 cysteine residues that should be able to bind fluorescein 5-maleimide (F-MI). The absence of F-MI labeling of these hydrophobic segments may suggest that the cysteines in H9 are post-translationally modified. Reduction, ester cleavage by hydroxylamine and chelating agents have not been successful in identifying this putative transmembrane pair after tryptic cleavage. The nature of the post-translational modification, if this is the reason for the lack of cysteine reactivity, is obscure. Many monoclonal antibodies have been generated reacting with the H+ ,K+ ATPase. With the sequence of the α subunit deduced from cDNA, it is now possible to define the epitopes, and to determine the sidedness of these epitopes by staining intact or permeable cells with either fluorescent or immunogold techniques. Antibody 95 inhibits ATP hydrolysis in the intact vesicles, and appears to be K+ competitive [17]. Its epitope was identified by Western blotting of an E. coli expression library of fragments of the cDNA encoding the α subunit, as well as by Western analysis of tryptic fragments. The sequence recognized by this antibody was between the amino acid positions 529 and 561. Since it inhibits intact vesicles this epitope must be cytoplasmic. Its epitope is close to the region known to bind the cytoplasmic reagent FITC, namely in the loop between H4 and H5. The ability
3
4 Gastric H+ ,K + -ATPase
of this antibody to immunoprecipitate intact gastric vesicles showed that it was on the cytoplasmic surface of the pump. Antibody 1218 was shown by Western analysis of tryptic fragments and by recombinant methodology to have its major epitope between amino acid positions 665 and 689 [18], and the epitope was then more cleanly defined as seven consecutive amino acids, Asp682-Met-Asp-Pro-Ser-GluLeu688 [19]. This epitope is on the cytoplasmic surface of the enzyme also in the loop between H4 and H5. This antibody does not inhibit ATPase activity. A second epitope for mAb 1218 was also identified to locate between amino acid positions 853 and 946 according to Western blot analysis of tryptic fragments. A synthetic peptide (888–907) containing this second epitope displaced mAb 1218 from vesicles adsorbed to the surface of ELISA wells [17]. Monoclonal antibody 146 was generated against intact parietal cells and subsequently purified and shown to react with rat H+ ,K+ -ATPase. In cells it reacts on the outside surface of the canaliculus, as shown by immunogold electron microscopy [20, 21]. Western analysis of rat, rabbit and hog enzyme showed, surprisingly, that it was present on the β subunit of rat and rabbit and absent from the β subunit of hog. Disulfide reduction eliminated reactivity of this antibody. Comparing sequence of the different β subunits, there is an arg, pro substitution in the hog for leu, val in the rat between the disulfide at position 161 and 178. This suggests that the β subunit epitope of mAb 146 is contained within this region of the subunit. Conversely, there was also an epitope on the α subunit of all three species recognized by mAb 146 (also using Western analysis). This epitope was defined, both by tryptic mapping and octamer walking, to be between positions 873 and 877 of the hog α subunit. This is on, or close to, the extra-cytoplasmic face of H7. The finding using Western blotting that there was an epitope both on the α and β subunits was confirmed by expressing the rabbit subunits individually in SF9 cells using baculovirus transfection. The β subunit reacted in the SF9 cells and on Western blots with mAb 146. The α subunit did not react in the cells but did on Western blots as if the epitope in the α subunit was difficult to access in the absence of α denaturing detergent such as SDS [18]. These data may indicate tight binding between the α and β subunits in this region of the enzyme. A molecular biological method was developed to analyze not only for the presence of the membrane segments defined as above, but also to explore the nature of membrane insertion [22]. A cDNA encoding α fusion protein of the 102 Nterminal amino acids of the rabbit H+ ,K+ -ATPase α subunit linked by α variable segment to the 177 most C-terminal amino acids of the rabbit H+ ,K+ -ATPase β subunit was transcribed and translated in a rabbit reticulocyte lysate system using labeled methionine in the absence or presence of microsomes. The cDNA for the variable region containing one or more putative membrane-spanning segments is synthesized using selected primers in a PCR reaction and ligated into the cDNA construct. Since the β subunit region has 5 consensus N-glycosylation sites, translocation of the C-terminal β subunit part of the fusion protein into the interior of the microsomes can be determined by assessment of glycosylation. The presence of glycosylation is evidence for an odd number of transmembrane segments in the variable region preceding the β subunit part. The absence of glycosylation shows
1 Gastric H+ ,K + -ATPase
either the presence of an even number of membrane-spanning segments or absence of membrane insertion. From this approach, the first 4 membrane segments are present and are co-inserted with translation. The last 2 hydrophobic sequences, H9 and H10, also appear to be co-inserted. The information within the H5, H6 and H7 sequences is apparently insufficient for these to act on their own as either signal anchor or stop transfer sequences. The information within the H8 sequence is sufficient for this to act as α stop transfer sequence. However, a stop transfer sequence in the absence of a preceding signal anchor sequence may have no implications for membrane insertion of the stop transfer sequence [23]. That H5 through H7 are indeed membrane inserted in the mature enzyme is clear from the biochemical analyses. A reasonable hypothesis that may explain these translation data is that insertion of H5 through H7 is post-translational, depending on insertion of H9/10, and that H8 then acts as a stop transfer sequence. Alternatively, the presence of the β sequence may also be involved in tethering H8 and thence H5 to H7. The presence of the additional pair of membrane segments predicted by hydropathy at the C-terminus is strongly suggested by these translation data. The combination of techniques described above, namely, sided trypsin proteolysis, epitope mapping, iodination, and in vitro translation, provides evidence for a 10 membrane segment model (Figure 1), with α large cytoplasmic loop between H4 and H5 and a large extra-cytoplasmic loop between H7 and H8. The 10 transmembrane segments determined and defined from the above evidence are named as TM1–10. These data are consistent with the 3D structure derived from the crystals of the SR Ca-ATPase. It is also worth noting that the reaction of the thiophilic luminal proton pump inhibitors provided conclusive evidence for the presence of TM5 and TM6, as discussed below.
1.2 β Subunit of Gastric H+ ,K+ -ATPase
The β subunit was first identified in work analyzing the carbohydrates of gastric membranes [24]. Post-embedding electron-microscopic staining techniques showed that wheat germ agglutinin staining occurred on the inside face of the inside-out gastric vesicles. The same β subunit was found by determining the nature of the antigen in autoimmune gastritis [25]. Using α lectin-affinity chromatography, the H+ ,K+ -ATPase α subunit was co-purified with the β subunit, showing that the α subunit interacts with the β subunit [26–28]. By cross-linking with low concentrations of glutaraldehyde, the α subunit was shown to be closely associated with the β subunit [29, 30]. The primary sequences of the β subunits have been reported for rabbit [31], hog [25], rat [32–35], mouse [36, 37], and human [38]. The hydropathy profile of the β subunit appears less ambiguous than the α subunit. There is one transmembranespanning region predicted by the hydropathy analysis, which is located between positions 38 and 63 near the N-terminus. Tryptic digestion of the intact gastric H+ ,K+ -ATPase produces no visible cleavage of the β subunit on SDS gels. Wheat
5
6 Gastric H+ ,K + -ATPase
Fig. 1 Proposed model of the gastric H+ ,K+ -ATPase " subunit. There are three lobes in the cytoplasmic domain, N (ATP binding), P (phosphorylation), and A (activation) regions, and ten transmembrane segments. This model is based on the
sr Ca-ATPase structures published by Toyoshima et al. [94]. The $ subunit is shown with " single membrane-spanning domain and regions of interaction are numbered in both subunits.
germ agglutinin (WGA) binding of the β subunit is retained. These data indicate that most of the β subunit is extra-cytoplasmic and glycosylated. When lyophilized hog vesicles are cleaved by trypsin followed by reduction, a small, non-glycosylated peptide fragment is seen on SDS gels with the N terminal sequence AQPHYS, which represents C-terminal region beginning at position 236. This small fragment is not found either after trypsinolysis of intact vesicles or in the absence of reducing agents. A disulfide bridge must therefore connect this cleaved fragment to the β subunit containing the carbohydrates. The C-terminal end of the disulfide is at position 262. This leaves little room for an additional membrane-spanning α helix. Hence it is likely that the β subunit has only one transmembrane-spanning segment. Reduction of the disulfides of the β subunit inhibits the activity of the H+ ,K+ -ATPase [39]. With Na+ ,K+ -ATPase, the effect of reducing agents on the ability of the enzyme to hydrolyze ATP and bind ouabain was quantitatively correlated with the reduction of disulfide bonds in the β subunit [40]. The β subunit of the Na+ ,K+ -ATPase is necessary for targeting the complex from the endoplasmic reticulum to the plasma membrane [41, 42]. It also stabilizes a
1 Gastric H+ ,K + -ATPase
functional form of both the Na+ ,K+ -ATPases [43, 44]. The H+ ,K+ -ATPase β-subunit is also necessary to target the membrane after proper assembly and sorting [45–48]. When the α- and β-subunits of Na+ ,K+ -ATPase and H+ ,K+ -ATPase were expressed in Sf9 cells in different combinations, the hybrid ATPase with the Na+ ,K+ -ATPase α-subunit and the H+ ,K+ -ATPase β-subunit showed an ATPase activity, which was 12% of the Na+ ,K+ -ATPase activity, with decreased apparent K+ affinity and about half the turnover number. Another hybrid ATPase with the H+ ,K+ -ATPase α-subunit and the Na+ ,K+ -ATPase β-subunit showed 9% of the H+ ,K+ -ATPase activity but increased the apparent K+ affinity [49]. Another report has shown that both lipid and the β subunit affect the K+ affinity of the gastric H+ ,K+ -ATPase [50]. Since both enzymes are unique in being K+ counter-transporters, perhaps the β is important to force counter-transport. Seven putative N-glycosylated sites (AsnXaaSer and AsnXaaThr) are found in rabbit H+ ,K+ -ATPase β-subunit [31], conserved in rat [32–35] and human [38], but only six putative N-glycosylation sites are present in the hog gastric β-subunit [25]. The structure of N-linked oligosaccharides of the β subunit of rabbit gastric H+ ,K+ ATPase was identified [51, 52]. All seven N-linked AsnX(Ser/Thr) sites at positions 99, 103, 130, 146, 161, 193, and 222 were fully glycosylated. Asn99 was modified exclusively with oligomannosidic-type structures, Man6 GlcNAc2 -Man8 GlcNAc2 , and Asn193 has Man5 GlcNAc2 -Man8 GlcNAc2 and lactosamine-type structures. Asn 103, 146, 161, and 222 contain lactosamine-type structures. All the branches of the lactosamine-type structure were terminated with Galα-Galβ-GlcNAc extensions. Similarly, the structure of complex glycans of the β subunit of the Na+ ,K+ -ATPase contain polylactosaminoglycans [53]. The role of the carbohydrate chains in membrane trafficking has been studied in HEK-293 cells and in polarized cells such as LLCPK and MDCK cells [54]. In HEK293 cells, enzyme activity was not affected by removal of any single carbohydrate site of the β subunit but removal of all the glycosylation sites resulted in the complete loss of activity. The role of glycosylation in sorting and trafficking is discussed below. The H+ ,K+ -ATPase β subunit has the sequence Phe-Arg-His-Tyr in its cytoplasmic domain. This tyrosine is important in initiating the removal of the H+ ,K+ ATPase from the apical membrane of the parietal cells and in ensuring its return to the TVE compartment in order to terminate the process of acid secretion. The participation of a tyrosine-based signal in the retrieval of the H+ ,K+ -ATPase suggests that this process involves interactions with adaptins and is mediated by clathrincoated pit formation [55]. In mice deficient in the H+ ,K+ -ATPase β-subunit, cells that express the H+ ,K+ -ATPase α-subunit had abnormal canaliculi and were devoid of typical tubulovesicular membranes [56]. Chimeric β-subunits between the gastric H+ ,K+ -ATPase and the Na+ ,K+ ATPase were constructed and co-transfected with the H+ ,K+ -ATPase α-subunit cDNA in HEK-293 cells [57]. Whole cytoplasmic and transmembrane domains of H+ ,K+ -ATPase β-subunit can be replaced by those of Na+ ,K+ -ATPase β-subunit without losing the enzyme activity. Also, the extracellular segment between Cys152 and Cys178, which contains the second disulfide bond, was exchangeable between
7
8 Gastric H+ ,K + -ATPase
H+ ,K+ -ATPase and Na+ , K+ -ATPase, preserving the ATPase activity intact. However, when four amino acids (76 QLKS79 ) in the ectodomain of H+ ,K+ -ATPase βsubunit were replaced by the corresponding amino acids (72 RVAP75 ) of the Na+ ,K+ ATPase β-subunit, the ATPase activity was abolished. This region appears essential for effective interaction between the two subunits. 1.3 Regions of Association Between the α and β Subunits
For Na+ ,K+ -ATPase, the last 161 amino acids of the α subunit are essential for effective association with the β subunit [58]. Furthermore, the last 4 or 5 C terminal hydrophobic amino acids of the Na+ pump β subunit are also essential for interaction with the α subunit whereas the last few hydrophilic amino acids are not [59]. Expression of sodium pump α subunit along with the β subunit of either sodium or proton pump in Xenopus oocytes has shown that the β subunit of the gastric proton pump can act as a surrogate for the β subunit of the sodium pump as far as membrane targeting and 86 Rb+ uptake, suggesting some homology in the associative domains of the β subunits of the two pumps [60] To biochemically specify the region of the α subunit associated with the β subunit, the tryptic digest was solubilized using non-ionic detergents such as NP-40 or C12 E8 . These detergents allow the holoenzyme to retain ATPase activity. The soluble enzyme was then adsorbed onto a WGA affinity column. Following elution of the peptides not associated with the β subunit binding to the WGA column, elution of the β subunit by 0.1 N acetic acid also eluted almost quantitatively the TM7/loop/TM8 sector of the α subunit. These data show that this region of the α subunit is tightly associated with the β subunit such that non-ionic detergents are unable to dissociate it from the β subunit [61]. The antibody mAb 146-14 also recognizes the region of the α subunit at the extra-cytoplasmic face of the TM7 segment, and also recognizes the β subunit, a finding consistent with the association found by column chromatography. Using a yeast two-hybrid analysis, a fragment Leu855 to Arg922 of the α subunit was identified as interacting with the β subunit [62]. Also, two different domains in the β-subunit, Gln64 to Asn130 and Ala156 to Arg188, were identified as association domains with the α-subunit by this method. 1.4 Regions of Association of the α Subunits
There has been much suggestive evidence that the α-β heterodimeric H+ ,K+ ATPase exists as an oligomer. Such evidence includes target size [63, 64] and unit cell size of the enzyme in two-dimensional crystals [65]. It was shown that the enzyme did indeed exist as an (αβ)2 heterodimeric dimmer, using blue native gel separation, and cross-linking with Cu2+ -phenanthroline. Membrane-bound H+ ,K+ ATPase reacted with Cu-phenanthroline to provide an α-α dimer. No evidence was obtained for β-β dimerization. ATP prevents this Cu2+ -phenanthroline-induced
2 Kinetics of the H+ ,K + -ATPase
α-α dimerization. It was possible, by blocking free SH groups and subsequent reduction and labeling with fluorescein maleimide, to demonstrate that the site of Cu2+ -oxidative cross-linking was either at cys565 or cys616 [66]. Hence this region of the α subunit in the N domain is in close contact with its neighboring α subunit. Expression studies of the Na+ ,K+ -ATPase in insect SF9 cells and immunoprecipitation reached the conclusion that a similar region of the α subunit of this enzyme was also in close contact [67].
2 Kinetics of the H+ ,K+ -ATPase
H+ ,K+ -ATPase exchanges intracellular hydrogen ions for extracellular potassium ions by consuming ATP. The H+ for K+ stoichiometry of the H+ ,K+ -ATPase was reported to be one [68–70] or two [71, 72] per ATP hydrolyzed. The H+ /ATP ratio was independent of external KCl and ATP concentration [71]. When care is taken to use only tight vesicles at pH 6.1 the stoichiometry is 2H+ per ATP. Clearly, at a luminal pH of 0.8, the in vitro pH reached in the canaliculus, the stoichiometry must be 1H+ per ATP. This could be explained by retained protonation at one of the ion-binding site carboxylic acids at a pH well below the pK a of the carboxylic amino acid. The kinetics of the H+ ,K+ -ATPase have defined many reaction steps, similar to those of the Na+ ,K+ -ATPase [73]. The rate of formation of the phosphoenzyme (EP) and the K+ -dependent rate of phosphoenzyme breakdown are sufficiently fast to allow the phosphoenzyme to be an intermediate in the overall ATPase reaction. The initial step is the reversible binding of ATP to the enzyme in the absence of added K+ ion, followed by a Mg2+ (and proton) dependent transfer of the terminal (γ ) phosphate of ATP to asp386 of the catalytic subunit (E1 -P· H+ ). The Mg2+ remains occluded in the P domain near asp730 [74] until dephosphorylation [75, 76]. The addition of K+ to the enzyme-bound acyl phosphate results in a biphasic two-step dephosphorylation. The faster initial step depends on [K+ ], whereas the slower step is not affected by K+ concentration. The second phase of EP breakdown is accelerated in the presence of K+ , but at [K+ ] i 500 µM the rate becomes independent of K+ concentration. This shows that two forms of EP exist. The first form, presumably E1 P, is K+ insensitive and converts spontaneously in the rate-limiting step into E2 P, the K+ -sensitive form. ATP binding to the H+ ,K+ -ATPase occurs in both the E1 and the E2 state, but with a lower affinity in E2 state (2,000 times lower than for E1 ) [77] (Figure 2). H+ or K+ interacts competitively on the cytoplasmic surface of intact vesicles. The effects of H+ and K+ on formation and breakdown of phosphoenzyme were determined using transient kinetics [78]. Increasing hydrogen ion concentrations on the ATP-binding face of the vesicles accelerate phosphorylation, whereas increasing [K+ ] inhibit phosphorylation. Increasing [H+ ] reduces this K+ inhibition of the phosphorylation rate. Decreasing [H+ ] accelerates dephosphorylation in the
9
10 Gastric H+ ,K + -ATPase
Fig. 2 Pump cycle of the gastric H+ ,K+ ATPase. The initial step is proton binding to the enzyme (E1 ·H+ ), followed the reversible binding of ATP to the enzyme in the absence of added K+ ion (E1 ·AT P ·H+ ). The phosphorylation step occurs by a Mg2+ (and proton) dependent transfer of the ter-
minal (() phosphate of ATP to the catalytic subunit (E1 -P ·H+ ). After releasing the proton, the enzyme (E2 -P) binds K+ ion, forming E2 -P ·K+ . Dephosphorylation (E2 -P ·K+ → E2 ·K+ ) followed by release of K+ (E2 ·K+ → E1 ) are the last steps for exchange of H+ with K+ .
absence of K+ , and K+ on the luminal surface accelerates dephosphorylation. Increasing [K+ ] at constant ATP decreases the rate of phosphorylation and increasing ATP concentrations at constant [K+ ] accelerates ATPase activity and increases the steady-state phosphoenzyme level [79]. Therefore, inhibition by cations is due to cation stabilization of a dephospho form at a cytosolically accessible cation-binding site. To determine the role of divalent cations in the reaction mechanism of the H+ ,K+ -ATPase, calcium was substituted for magnesium, which is necessary for phosphorylation. Calcium ion inhibits K+ stimulation of the H+ ,K+ -ATPase by binding at a cytoplasmic divalent cation site. Ca·EP dephosphorylates 10–20 times more slowly than Mg·EP in the presence of 10 mM KCl with either 8 mM CDTA or 1 mM ATP. The inability of the Ca·EP to dephosphorylate in the presence of K+ , compared with Mg·EP, demonstrates that the type of divalent cation that occupies the catalytic divalent cation site required for phosphorylation is important for the conformational transition to a K+ sensitive phosphoenzyme. Calcium is tightly bound to the divalent cation site of the phosphoenzyme and the occupation of this site by calcium causes slower phosphoenzyme kinetics. Since the presence of CDTA or EGTA does not change the dephosphorylation kinetics of the EP·Ca form of the enzyme, it was concluded that the divalent cation remains occluded in the enzyme until dephosphorylation occurs [80].
3 Conformations of the H+ ,K + -ATPase
3 Conformations of the H+ ,K+ -ATPase
H+ ,K+ -ATPase generates HCl in the stomach by the electroneutral exchange of H+ for K+ , dependent on conformation changes in the protein. The alteration of enzyme conformation changes the affinity and sidedness of the ion-binding sites during the cycle of phosphorylation and dephosphorylation. The ions transported from the cytoplasmic side are H+ at high pH. Since Na+ is transported as a surrogate for H+ , it is possible that the hydronium ion, rather than the proton per se, is the species transported [81]. The ions transported inwards from the outside face of the pump are Tl+ , K+ , Rb+ or NH4 + [73, 79, 82]. Presumably the change in conformation changes a relatively small ion-binding domain in the outward direction into a larger ion-binding domain in the inward direction. The E1 conformation of the H+ ,K+ -ATPase binds the hydronium ion from the cytoplasmic side at high affinity. Following phosphorylation, the conformation changes from E1 P·H3 O+ to E2 P· H3 O+ form, which has high affinity for K+ and low affinity for H3 O+ allowing release of H3 O+ and binding of K+ from the extracytoplasmic surface of the enzyme. Breakdown of the E2 P form requires K+ or its congeners on the outside face of the enzyme. With dephosphorylation, the E1 K+ conformation is produced with a low affinity for K+ , releasing K+ to the cytoplasmic side, allowing rebinding of H3 O+ [73]. The steps of phosphorylation of the H+ ,K+ ATPase were studied by measuring the inorganic phosphate, P18 O4 and P16 O4 distribution as a function of time at different H+ , K+ , and Pi concentrations [83]. The formation of E·Pi complex that exchanges 18 O with HOH was slower at pH 5.5 than at pH 8 and is not diffusion controlled, suggesting a unimolecular chemical transformation involving an additional intermediate in the phosphorylation mechanism such as, perhaps, a protein conformational change. From competitive binding of ATP and 2 ,3 -O-(2,4,6-trinitrophenylcyclohexadienylidine) adenosine 5 -phosphate (TNP-ATP), two classes of nucleotide binding sites were suggested [84]. TNP-ATP is not a substrate for the H+ ,K+ -ATPase. However, TNP-ATP prevents phosphorylation by ATP and inhibits the K+ -stimulated pNPPase and ATPase activities. The number of TNP-ATP binding sites was twice the stoichiometry of phosphoenzyme formation. Recently, two moles of phosphate were claimed to be liberated from one mole of phosphoenzyme of the gastric H+ ,K+ -ATPase [85]. It was hypothesized that one mole of Pi is from a high-affinity ATP binding site and the other Pi is from enzyme-bound ATP at a low-affinity site during crosstalk between catalytic subunits. All-sites phosphorylation has been proposed by studies using Pi or acetyl phosphate [86, 87], which showed that the stoichiometry of the maximum amount of phosphoenzyme formed from ATP, that from acetyl phosphate, that from inorganic phosphate (Pi), and the maximum amount of ATP binding to the enzyme was close to 1:2:2:2. The phosphoenzyme formed was shown to be turning over. The addition of K+ reduced the amount of phosphoenzyme from ATP to one-tenth but reduced those from acetyl phosphate or Pi to only a half.
11
12 Gastric H+ ,K + -ATPase
Fluorescein isothiocyanate (FITC) binds to the H+ ,K+ -ATPase at pH 9.0, inhibiting ATPase activity but not pNPPase activity [11]. Fluorescence of the FITC-labeled enzyme, representing the E1 conformation, was quenched by K+ , Rb+ , and Tl+ [11, 88]. The quenching of the fluorescence by KCl reflects the formation of E2 K+ . FITC binds at lys516 in the hog enzyme sequence [12]. This FITC binding site apparently becomes less hydrophobic when KCl binds to form the E2 ·K conformation. The FITC labeled Na+ ,K+ -ATPase has quite similar properties [89, 90]. Two K+ ions are required to cause the conformational change from E1 to E2. The binding site of FITC was at lys501. However, several additional lysines at positions 480 and 766 were shown to react with FITC during inactivation of the Na+ ,K+ -ATPase. These lysines were also protected from labeling in the presence of ATP. The fluorescent 1-(2-methylphenyl)-4-methylamino-6-methyl-2,3dihydropyrroloquinoline (MDPQ) was shown to inhibit the H+ ,K+ -ATPase and the K+ phosphatase competitively with K+ [75]. MDPQ fluorescence is quenched by the imidazo-pyridine SCH 28080. The imidazopyridine Me-DAZIP binds to the TM1/loop/TM2 sector of the β subunit [91]. MDPQ binding to the extra-cytoplasmic surface of the pump enhances its fluorescence, suggesting that inhibitor binding occurs to a relatively hydrophobic region of the protein. The fluorescence was quenched by K+ , independently of Mg2+ . The binding of Mg-ATP increased the fluorescence due to the formation of an E2 P·[I] complex [75], suggesting that the binding pocket on the outside surface between TM5 and TM6, as discussed later for these compounds, changes conformation or position between the two major conformers of the enzyme. The fluorescent changes with FITC and MDPQ may reflect relative motion of the cytoplasmic domain and outside surface of the enzyme. In the E1 form, the FITC region is relatively closer to the membrane and the extra-cytoplasmic loop relatively hydrophilic. With the formation of the E2 ·K+ form the FITC region is more distant from the membrane, whereas with formation of the E2 P form the MDPQ binding region moves towards the membrane. These postulated conformational changes are therefore reciprocal in the two major conformers of the enzyme. The effect of trypsin on the gastric H+ ,K+ -ATPase provides evidence for conformational changes as a function of ligand binding. Only K+ of the ionic ligands provided significant protection against extensive tryptic hydrolysis [61]. Neither ATP nor ADP affected the tryptic pattern at high trypsin:protein ratios. (J. M. Shin, unpublished observations). Two large fragments, 67 and 33 kDa, were found in the presence of ATP, Mg2+ and SCH 28080 as a K+ surrogate whereas several fragments were produced in the absence of ligands. These data suggest that the E2 K+ , or more particularly the E2 P·[SCH] conformation of the H+ ,K+ -ATPase, severely limits accessibility of trypsin to most of the lysines and arginines in the β subunit [16, 91]. Extensive tryptic digestion of the gastric H+ ,K+ -ATPase in the presence of KCl provided a C-terminal peptide fragment of 20 kDa beginning at the TM7 transmembrane segment, a peptide of 9.4 kDa comprising the TM1/loop/TM2 sector beginning at asp84, and another peptide of 9.4 kDa containing the TM5/loop/M6
4 Functional Residues of the H+ ,K + -ATPase
sector beginning at asn753 [61]. The C-terminal 20 kDa peptide fragment was suggested to be capable of Rb+ occlusion [76]. When these digests in the absence and presence of KCl are compared, some regions near the membrane can be seen to be K+ protected. The region between gly93 and glu104 near the TM1 segment, the region between asn753 and leu776 near the cytoplasmic side of the TM5 segment, and the region after the TM8 segment, especially the region between ile945 and ile963 containing 5 arginines and 1 lysine, are protected from the trypsin digestion in the E2 ·K conformation [61]. Further, there must be protection prior to TM3 and for some distance subsequent to M4, since no fragment containing these segments was found at an Mr of less than 20 kDa. When the gastric H+ ,K+ -ATPase was cleaved by Fe2+ -catalyzed oxidation in the presence of various ligands, the cleavage patterns were different between the conformational changes. There are two Fe2+ cleavage sites. In Fe2+ site-1, the parallel appearance of the fragments at 230 ESE, near 624 MVTGD, and at 728 VNDS upon transition from E1 to E2 (Rb) conformations were observed. Meanwhile, in Fe2+ site-2, the fragment near 299 HFIH was cleaved independently of conformational changes [74]. These cleavage patterns were identical to those of the Na+ ,K+ -ATPase [92, 93]. The cleavage data showed that structural organization and changes in the cytoplasmic domains, association with E1 /E2 transitions, are essentially the same for the H+ ,K+ -ATPase, the Na+ ,K+ -ATPase, and sr Ca-ATPase. Using the well-defined sr Ca-ATPase structure, the N-domain where ATP binds then inclines nearly 90◦ with respect to the membrane, and the A domain rotates by about 110◦ horizontally following E1 to E2 conformational changes [94].
4 Functional Residues of the H+ ,K+ -ATPase
When the gastric H+ ,K+ -ATPase was digested by trypsin in the presence of a high concentration of KCl, the tryptic membrane digest showed capability for Rb+ occlusion [76], like the Na+ ,K+ -ATPase [95]. As described above, some regions near the membrane were K+ protected, such as the region between gly93 and glu104 near the TM1 segment, the region between asn753 and leu776 near the cytoplasmic side of the TM5 segment, and the region after the TM8 segment, especially the region between ile945 and ile963 containing 5 arginines and 1 lysine. Furthermore, when K+ is removed from this membrane digest, the TM5–TM6 hairpin was released from the membrane, showing that this membrane hairpin is stabilized by K+ ions and that this region is more flexible than the other membrane-spanning regions [96]. Proton pump inhibitors such as omeprazole, pantoprazole, lansoprazole, and rabeprazole bind to Cys813 of TM5–TM6, giving inhibition of activity [15, 97]. This biochemical study shows that TM5 and TM6 must be part of the ion pathway. Using site-directed mutagenesis of the gastric H+ ,K+ -ATPase transfected in HEK293 cells, the TM5–TM6 luminal loop was studied in terms of K+ access to the ion binding domain [98]. Mutations of TM5, TM5–TM6, and M6 regions such
13
14 Gastric H+ ,K + -ATPase
as P798C, Y802L, P810A, C813A or S, F818C, T823V, and mutations of TM7–TM8 and TM8 such as E914Q, F917Y, G918E, T929L, F932L, reduced the affinity for SCH28080 up to 10-fold without affecting the affinity for the activating cation, NH4 + . The L809F substitution in the loop between TM5 and M6 resulted in about a 100-fold decrease in inhibitor affinity. C813T mutant showed a 9-fold loss of SCH28080 affinity. All these data suggest that the binding domain for SCH28080 contains the surface between L809 and C813 , in the TM5–TM6 loop and the luminal end of TM6. Mutations of C813 and I816 in TM6 and M334 in TM4 also showed that the inhibitor binds close to the luminal surface of the enzyme [99–102]. When negatively charged amino acid residues in the α-subunit were mutated the results showed that carboxyl groups in the membrane-spanning domain are important for cation binding [103]. Mutation of E820Q showed less sensitivity to K+ and the dephosphorylation was not stimulated by either K+ or ADP, indicating that E820 might be involved in K+ binding and transition to the E2 form of the H+ ,K+ -ATPase [104]. Mutation of E795 at the cytoplasmic side of TM5 showed a decrease of the phosphorylation rate and the apparent ATP affinity, indicating that E795 is involved in both K+ and H+ binding [105]. Mutation of E795 and E820 in TM5 and TM6 resulted in a K+ -independent, SCH28080-sensitive ATPase activity, due to a high spontaneous dephosphorylation rate [106, 107]. Similarly, the positively charged lysine in TM5/TM6 region was mutated. The mutants K800A and K800E of the Buffo bladder H+ ,K+ -ATPase showed K+ stimulated and ouabain-sensitive electrogenic transport. When the positive charge was conserved (K800R), no K+ -induced outward current could be measured, but rubidium transport was present. This shows that a single positive charged residue in TM5 can determine the electrogenicity [108]. The sixth transmembrane (TM6) segment of the catalytic subunit plays an important role in the ion recognition and transport in the P2 -type ATPase families. When all amino acid residues in the TM6 segment of gastric H+ ,K+ -ATPase α-subunit were singly mutated with alanine, four mutants, L819A, D826A, I827A, and L833A, completely lost the K+ -ATPase activity. Mutant L819A was phosphorylated but hardly dephosphorylated in the presence of K+ , whereas mutants D826A, I827A, and L833A were not phosphorylated from ATP. Amino acids involved in the phosphorylation are located exclusively in the cytoplasmic half of the TM6 segment and those involved in the K+ -dependent dephosphorylation are in the luminal half. Several mutants, such as I821A, L823A, T825A, and P829A, partly retained the K+ -ATPase activity accompanying the decrease in the rate of phosphorylation [109]. The cation selectivity of the Na+ ,K+ - and H+ ,K+ -ATPase may be generated through a cooperative effort between residues of the transmembrane segments and the flanking loops that connect these transmembrane domains. Substituting three residues in the Na+ ,K+ -ATPase sequence with their H+ ,K+ -ATPase counterparts (L319F, N326Y, T340S) and replacing the TM3–TM4 ectodomain sequence with that of the H+ ,K+ -ATPase result in a pump that gives 50 % of ATPase activity in the absence of Na+ at pH 6. This effect was not seen when the ectodomain alone is replaced [110]. Many of these mutational results can be rationalized based on the crystal structure of the sr Ca-ATPase.
5 Structural Model of the H+ ,K + -ATPase
5 Structural Model of the H+ ,K+ -ATPase 5.1 Crystal Structure of the H+ ,K+ -ATPase
As discussed above, experimental evidence showed that the H+ ,K+ -ATPase has 10 membrane spanning segments in the catalytic α-subunit and 1 membranespanning segment in the β-subunit. The two-dimensional structure crystals of the H+ ,K+ -ATPase formed in an imidazole buffer containing VO3− and Mg+ ions was resolved at about 25 Å [63, 65]. The average cell edge of the H+ ,K+ -ATPase was 115 Å, containing four asymmetric protein units of 50 x 30 Å. The structure of Na+ ,K+ -ATPase was determined by electron crystallography at 9.5 Å from multiple small 2-D crystals induced in purified membranes [111]. The density map shows a protomer stabilized in the E2 conformation that extends approximately 65 x 75 x 150 Å in the asymmetric unit of the P2 type unit cell. The unit cell dimension of the Co(NH3 )4 ATP-induced crystals of Na+ ,K+ -ATPase was 141 Å [112]. Two-dimensional crystals of the Na+ ,K+ -ATPase were reported to be best formed at pH 4.8 in sodium citrate buffer and to represent a unique lattice (a = 108.7, b = 66.2, r = 104.2) by electron cryomicroscopy There are two high contrast parts in one unit cell [113]. The crystal structure of the sarcoplasmic reticulum Ca-ATPase was resolved at 2.6 Å resolution with two calcium ions bound in the transmembrane domain, which consists often α-helices [114]. Also, the calcium-free E2 state of sr Ca-ATPase was compared with the calcium-bound E1Ca state [94]. In the Ca-ATPase, the Ndomain where ATP binds inclines nearly 90◦ with respect to the membrane and the A domain rotates by about 110◦ horizontally with a change from E1 to E2 conformation. Several attempts have been made to define the tertiary structure of this P2 type-ATPase based on the structure of sr Ca-ATPase combined with sitedirected mutagenesis, cleavage patterns of different conformations and molecular modeling [115–117]. 5.2 Molecular Modeling of the Gastric H+ ,K+ -ATPase
Gastric H+ ,K+ -ATPase is a member of the P2 type family of ATPases that transport ions against their concentration gradients across lipid bilayers at the expense of ATP hydrolysis. These enzymes, which differ in their ion and inhibitor specificities, nevertheless contain numerous stretches of identical amino acid sequence that not only place them in the family but imply a conserved three-dimensional structure. The implicit homology of mechanism has been substantiated during the past 30 years by an array of biochemical and spectroscopic results showing common conformational intermediates in all family members. These intermediates occur along a general reaction pathway which includes two fundamental conformations, E1 in which the binding sites for the outwardly transported ion are accessible from
15
16 Gastric H+ ,K + -ATPase
the cytoplasm and E2 where the binding sites for the inwardly transported ion are accessible from the side opposite the cytoplasm (the “outside”). Binding of the outwardly transported ion (H+ or H3 O+ for H+ ,K+ -ATPase) favors ATP phosphorylation of a conserved aspartic acid side-chain giving E1 P. This phosphorylation shifts the conformational equilibrium in favor of E2 and a separate conformer, E2 P, predominates. The fundamental reactive difference between these forms is that E1 P can reform ATP from ADP whereas E2 P cannot. Binding of the inwardly transported ion stimulates dephosphorylation and gives an occluded form E2 [K+ ] in the H+ ,K+ -ATPase where the ion is inaccessible from either side of the membrane, similar to that observed in the Na+ ,K+ -ATPase. Relaxation to the E1 state results in release of the K+ ion to the cytoplasm. The ultimate goal of structural chemists in this area of research has been to define the mechanism of active ion transport in terms of the molecular structures of the reaction intermediates and, secondly, to understand the molecular basis of ion and inhibitor specificity. The first of these aims requires high-resolution crystal structures for each of the reaction intermediates of at least one member of the P2 -type ATPase family. A tremendous advance towards this requirement has been met recently with rabbit sr Ca-ATPase where high-resolution crystal structures have been reported for both E1 ·2Ca2+ [114] and E2 [thapsigargin] [94] conformations. These structures have provided a molecular understanding of the general mechanism of ion transport [116]. The general shape of the sr Ca-ATPase is that of a “Y”, where the lower half of the base of the Y (M domain) passes through the membrane and contains the ion transport pathway while the upper half extends out of the membrane to form the “stalk” and contains the site of phosphorylation (P domain). One arm of the “Y” makes up the N (nucleotide) domain which binds ATP and the other assists in the transfer of essential conformational effects from the active site to the M domain and is designated the A (for “activator”) domain. In the E1 ·2Ca2+ conformation the N and A domains are splayed open while the calcium ions are encaged side by side near the middle of the membrane in roughly octahedral complexes formed by side-chains’ ligands from transmembrane (TM) helices 8, 5, and 6 (site-1) or 5, 6 and 4 (site-2). This splaying of the Y allows Ca2+ access to the ion-binding domain. The N domain in this form is too distant from P to allow for phosphorylation by bound ATP and it was surmised that N must be capable of rotation about a hinge to allow the γ -phosphate of ATP to reach the phosphorylated aspartate [118]. Subsequently, in the E2 conformation which binds ATP with low affinity, the N domain was shown to be tilted over the P domain, which itself has tilted to a more vertical orientation with respect to the plane of the membrane, making the M, P and N domains nearly collinear. Further, ATP with a fully extended triphosphate can span the N and P domains in the E2 state [119], proving the ability of N to achieve an orientation with P which is at least close to that in E1 . ATP and, presumably, similar to E1 P (ADP still being bound and capable of reforming ATP). The position of the A domain in E2 is raised and rotated upon conversion from E1 . Thus the A, P and N domains are substantially gathered together in E2 . These effects are transmitted to the membrane domain where the geometry of the ion ligands is altered, with
5 Structural Model of the H+ ,K + -ATPase
TM4 rotating to move its carboxylate ligand away from ion site-2 and TM6 rotating with respect to TM4 and TM5. This rearrangement optimizes the transport site for release of outward ion and for entry of the counterion from the outside, while formation of the ion complex triggers dephosphorylation, occlusion, and return to E1 . Currently, the molecular structures of E1 P and E2 P, which are essential for fully understanding the conversion of energy into vectorial ion transport, remain unknown. In contrast to the sr Ca-ATPase it has not been possible to crystallize either the Na+ ,K+ - or H+ ,K+ -ATPase in a form necessary for high-resolution structural determination. There have been two recent structures reported, however, for the Na+ ,K+ – ATPase at 9.5 [111] and 11 Å [120] resolution determined by cryoelectron microscopy of the E2 . vanadate conformations. In each case the density envelope conformed fairly well to the high-resolution form of the sr Ca-ATPase E1 ·2Ca2+ structure with the differences clearly explained by closer approximation to the E2 [thapsigargin] structure that has also been published [94]. The identity of the A, P, M, and N domains was clearly evident, demonstrating the expected homology in structure. The density contributed by the β subunit was tentatively identified in the transmembrane region of the higher-resolution structure [111]. Models for the N domain of the Na+ ,K+ -ATPase have also been determined by NMR of this fragment expressed in E. coli. Changes in structure were shown in the presence of high concentrations of ATP and an ATP bound conformation was constructed [121]. This work is of particular benefit for homology modeling since the N domain sequences are considerably different in sr Ca-ATPase and Na+ ,K+ -ATPase with several long deletions and insertions [115]. In contrast, the Na+ ,K+ – and H+ ,K+ ATPases show high homology in the N domain, with no insertions or deletions in the sequence alignments. For the closely related Na+ ,K+ - and H+ ,K+ -ATPases the more common strategy has been to use biochemical results to validate homology modeling based on the known sr Ca-ATPase structures [117]. In homology modeling, the peptide backbone of a known high-resolution structure is used to substitute the sequence of a homologous protein based on an amino acid sequence alignment. Steric clashes are removed and energy minimization then gives the final model. This method has been applied to the yeast H+ -ATPase [122], colonic H+ K+ -ATPase [123], and the Na+ ,K+ - [124] and H+ ,K+ -ATPases [125]. The predictions of a model can then tested experimentally. For instance, mutational analyses have shown that most of the positions in the M domain which provide ion ligands are conserved, albeit with some substitutions to other oxygen-containing side-chains. It is assumed that these changes are at least partially responsible for the separate ion specificities, but mutating them to affect specificity changes is ineffective, implying the precise geometry is critically dependent on subtle differences in global conformation. Similarly, site-specific mutagenesis has shown that many of the residues affecting specific inhibitor binding of ouabain to the Na+ ,K+ -ATPase, and K+ competitive imidazopyridine, SCH28080, to the H+ ,K+ -ATPase ([102, 125, 126], are conserved in the two pumps. The binding sites are therefore formed from some of the same
17
18 Gastric H+ ,K + -ATPase
Fig. 3 Gastric H+ ,K+ -ATPase a subunit (backbone in ribbon) in the E2 conformation viewed from the membrane (cytoplasmic side up). The model was constructed by substitution of the rabbit amino acid sequence on to the backbone of the known sr Ca-ATPase structure (pdb.1iwo, the E2 [thapsigargin] conformation) whose N domain backbone had been replaced with that of the Na+ ,K+ -ATPase determined by NMR. Energy minimization was then applied to give the structure shown. The site of phosphorylation is adjacent to the ac-
tive site magnesium (violet sphere). Ion transport sites are near the middle of the membrane. Proton pump inhibitors (imidazopyridines) block K+ access and inactivate acid secretion by covalent modification of cys813 in a cavity formed by ala335 and phe332 from TM4 (green), tyr925 from TM8 (orange), pro808 and leu809 from loop TM5/TM6 (yellow), and TM1/TM2 (blue). The structure and location of the $ subunit (not shown) is currently unknown although a major site of interaction is the M7/M8 luminal loop.
residues in the loops between TM3–TM4, TM5–TM6 and TM7–TM8 (Figure 3). Both of these inhibitors bind to E2 P conformations and prevent K+ access to the ion binding site. Specificity presumably arises from a combination of a few side-chain changes near the site and global differences generated allosterically by distant substitutions whose importance for binding is nevertheless demonstrable by mutation. The presence of an inhibitor cavity or vestibule accessible from the outside near the TM5-M6 loop is further shown by covalent labeling of the H+ ,K+ ATPase at Cys813 at the beginning of TM6 by the covalent proton pump inhibitors (e.g. omeprazole). The homology model of the E2 form of the H+ ,K+ -ATPase with SCH28080 bound accounts accurately for the known active conformer of
6 Acid Secretion and the H+ ,K + -ATPase
the inhibitor as well as for the effects given by specific amino acid mutations around the inhibitor site [125]. In addition, the model predicts disulfide bond linkages in the immediate vicinity of the bound inhibitor, which can be engineered into the protein via cysteine substitution. These results validate the accuracy of homology modeling in the membrane based on the sr Ca-ATPase structure. The fit of the model to the density envelopes reported for the homologous Na+ ,K+ -ATPase mentioned above has led to predictions for the site of β subunit interaction and kinase phosphorylation, but these have yet to be tested experimentally.
6 Acid Secretion and the H+ ,K+ -ATPase
The H+ ,K+ -ATPase is present mainly in the gastric parietal cell. In the resting parietal cell it is present in smooth surfaced cytoplasmic membrane tubules. Upon stimulation of acid secretion, the pump is found on the microvilli of the secretory canaliculus of the parietal cell. This morphological change results in a severalfold expansion of the canaliculus [127, 128]. There are actin filaments within the microvilli and the subapical cytoplasm. In the cytoskeleton system, there is an abundance of microtubules among the tubulovesicles. Some microtubules appeared to be associating with tubulovesicles. Numerous electron-dense coated pits and vesicles were observed around the apical membrane vacuoles in cimetidine-treated resting parietal cells, consistent with an active membrane uptake in the resting state. The cultured parietal cells undergo morphological transformation under histamine stimulation, resulting in a great expansion of apical membrane vacuoles. Immunogold labeling of H+ ,K+ -ATPase was present not only on the microvilli of expanded apical plasma membrane vacuoles but also in the electron-dense coated pits [129]. There is activation of a K+ and Cl− conductance in the pump membrane which allows K+ to access the extra-cytoplasmic face of the pump [130]. This allows H+ for K+ exchange to be catalyzed by the ATPase [131]. This conductance is probably due to the presence of individual proteins in the stimulated membrane. Covalent inhibitors of the H+ ,K+ -ATPase that have been developed to treat ulcer disease and esophagitis depend on the presence of acid secreted by the pump. They are also acid-activated prodrugs that accumulate in the acid space of the parietal cell. Hence their initial site of binding is only in the secretory canaliculus of the functioning parietal cell [132]. These data show also that the pump present in the cytoplasmic tubules does not generate HCl. The upstream DNA sequence of the a subunit contains both Ca and cAMP responsive elements in rat H+ ,K+ -ATPase [133]. There are gastric nuclear proteins present that bind selectively to a nucleotide sequence, GATACC, in this region of the gene [34]. These proteins have not been detected in other tissues. Another regulatory site has also been postulated [134]. Stimulation of acid secretion by histamine increases the level of mRNA for the α subunit of the pump [135]. Elevation of serum gastrin, which secondarily
19
20 Gastric H+ ,K + -ATPase
stimulates histamine release from the enterochromaffin-like cell in the vicinity of the parietal cell, also stimulates transiently the mRNA levels in the parietal cell [136]. H2 receptor antagonists block the effect of serum gastrin elevation on mRNA levels [137]. It seems, therefore, that activity of the H2 receptor on the parietal cell determines in part gene expression of the ATPase. Chronic stimulation of this receptor might be expected therefore to upregulate pump levels whereas inhibition of the receptor would down-regulate levels of the ATPase. However, chronic administration of these H2 receptor antagonists, such as famotidine, results in an increase in pump protein, whereas chronic administration of omeprazole (which must stimulate histamine release) reduces the level of pump protein in the rabbit [138, 139]. Regulation of pump protein turnover downstream of gene expression must account for these observations. Probably, the pump remains in the tubular state, preventing retrieval and degradation of part of the retrieved protein with receptor antagonist inhibition. This is consistent with the finding that the half-life of the protein is increased with H2 receptor inhibition [155].
7 Inhibitors of the H+ ,K+ -ATPase
Two types of proton pump inhibitors have been developed to react exclusively with the extra-cytoplasmic surface of the gastric H+ ,K+ -ATPase. One class of inhibitors now commercially available is substituted benzimidazoles such as omeprazole, lansoprazole, rabeprazole, and pantoprazole. The other class is a series of K+ competitive reagents, including imidazopyridines that are also known to react on the outside surface of the pump. 7.1 Substituted Benzimidazoles
The H+ ,K+ -ATPase in the parietal cell secretes acid into the secretory canaliculus, generating a pH of < 1.0 in lumen of this structure. The acidity of this space is more than 1000-fold greater than anywhere else in the body, and allows accumulation of weak bases. Weak bases of a pK a less than 4.0 would be accumulated only in this acidic space and no other acidic space in the body. The first compound of this class with inhibitory activity on the enzyme and on acid secretion was the 2-(pyridylmethyl)sulfinylbenzimidazole, timoprazole [140]. Since a substituted benzimidazole was first reported to inhibit the H+ ,K+ -ATPase, many inhibitors of the H+ ,K+ -ATPase have been synthesized. The first pump inhibitor used clinically was 2-[(3,5-dimethyl-4-methoxypyridin-2-yl)methylsulfinyl]5-methoxy-1H-benzimidazole, omeprazole [141]. The covalent inhibitors all belong to the substituted benzimidazole family [142–146] (Figure 4). These reagents are weak base, acid-activated compounds, which form cationic sulfenamides in acidic environments. The sulfenamides formed react with the SH group of cysteines in
7 Inhibitors of the H+ ,K + -ATPase
Fig. 4 Proton pump inhibitors. These are substituted pyridylmethylsulfinyl benzimidazole. For omeprazole R and R are methyl groups and R and R are methoxy groups. For lansoprazole R and R are H, R is a methyl group and R is a trifluo-
roethoxy group. Pantoprazole has R and R as methoxy groups, R is H and R is a difluoromethoxy group. For rabeprazole R and R are H, R is a methyl group and R is a methoxypropoxy group.
proteins to form relatively stable disulfides. Since the pump generates acid on its extra-cytoplasmic surface, only those cysteines available from that surface are accessible to these sulfenamides if labeling is carried out under acid transporting conditions. Omeprazole is an acid-activated prodrug [147]. Omeprazole can be accumulated in the acidic space and easily converted into a reactive cationic sulfenamide species, which binds to SH group of cysteines in the H+ ,K+ -ATPase [147–151]. Omeprazole has a stoichiometry of 2 mol inhibitor bound per mol phosphoenzyme under acid transporting conditions and is bound only to the a subunit even in vivo [150, 151]. Substituted benzimidazole inhibitors show slightly different effects depending on the inhibitor structure. Two irreversible inhibitors that form cysteine-reactive sulfenamides in the acid space generated by the pump, omeprazole and E3810, appear to inhibit the enzyme in different conformations [152]. Both omeprazole and E3810, 2-{[4-(3-methoxypropoxy)-3-methylpyridin-2-yl]methylsulfinyl}1H-benzimidazole, are acid activated in luminal surface to form active sulfenamide derivatives, which can bind cysteines within the H+ ,K+ -ATPase. The omeprazole-bound enzyme has a lower FITC fluorescence, perhaps due to an E2 -like conformation. Both ATPase activity and steady-state phosphorylation were inhibited. The E3810 bound enzyme showed a high FITC fluorescence, more
21
22 Gastric H+ ,K + -ATPase
Fig. 5 Proton pump inhibitor reaction pathway. Proton pump inhibitors have a pyridine moiety of pK a ∼ 4.0, which enables them to be selectively accumulated in the acidic space of the active parietal cell. A second protonation on the benzimidazole ring with a pK a ∼ 1.0 results in electron deficiency of C-2 of benzimidazole, where the unpro-
tonated pyridine N can attack intramolecularly. After this rearrangement, PPIs generate either a sulfenic acid or a cyclic sulfenamide, both of which are very reactive with thiol groups (cysteine in protein), giving a product that is a permanent cation, thereby restricting re-entry into the cytoplasm of the parietal cell.
like a E1 conformation. Fluorescence of the E3810 bound enzyme was quenched by K+ in contrast to the omeprazole-derivatized FITC labeled enzyme. It is not known whether these effects are due to differences in structure of the inhibitors or to differences in location of binding site or both. These proton pump inhibitors have different binding sites. Omeprazole (Figure 5) binds to cysteines in the extra-cytoplasmic regions of TM5/TM6 (cys813) and TM7/TM8 (cys892) [16]. Pantoprazole binds only to the cys813 and cys822 in M5/M6 [15] and lansoprazole binds to cysteines in TM3/TM4(cys321), cys813 inTM5/TM6, and cys892 in TM7/TM8 [153]. These data suggest that, of the 28 cysteines in the α subunit, only the cysteines present in the TM5/TM6 region are important for inhibition of acid secretion by the substituted benzimidazoles. The proton pump inhibitors provided different acid recovery rates in vivo. In human, the half-life of the inhibitory effect on acid secretion is ∼13 h for lansoprazole, ∼28 h for omeprazole and ∼46 h for pantoprazole [154]. The half-time of recovery of acid secretion and ATPase in rats following omeprazole treatment is ∼ 15 h whereas the pump protein half-life is 54 h [155, 156]. Covalent inhibition of the ATPase results in inhibition of acid secretion extending for longer than the
7 Inhibitors of the H+ ,K + -ATPase
plasma half-life of the PPIs. Recovery from inhibition of acid secretion could occur in principle by either de novo synthesis of pump protein and/or reduction of the disulfide by an endogenous cellular reducing agent such as glutathione. The latter mechanism of reversal depends on whether glutathione or other reducing agents can gain access to a particular cysteine disulfide bond. Recovery of acid secretion following inhibition by all PPIs, other than pantoprazole, may depend on both protein turnover and reversal of the inhibitory disulfide bond. In contrast, recovery of acid secretion after pantoprazole may depend entirely on new protein synthesis [157].
7.2 Substituted Imidazo[1,2α]pyridines and other K+ -competitive Antagonists
Reversible inhibitors contain protonatable nitrogens and have various structures. One type is represented by the imidazo-pyridine derivatives [158], others are piperidinopyridines [159], substituted 4-phenylaminoquinolines [160], pyrrolo[3,2-c]quinolines [161]g, guanidino-thiazoles [162], substituted pyrimidines[163] and scopadulcic acid [164, 165]g. Some natural products such as cassigarol A [166] and naphthoquinone [167] also showed inhibitory activity. Many of these reversible inhibitors show K+ competitive characteristics, in contrast to the benzimidazole type. Imidazo[1,2α]pyridine derivatives were shown to inhibit gastric secretion more rapidly than the benzimidazole type since the latter require acid secretion, accumulation and acid activation [168, 169]. SCH 28080, 3-cyanomethyl-2-methyl8-(phenylmethoxy)imidazo[1,2α]pyridine, inhibited the H+ ,K+ -ATPase competitively with K+ [170]. SCH 28080 binds to free enzyme extra-cytoplasmically in the absence of substrate to form E2(SCH 28080) complexes. SCH 28080 inhibits ATPase activity with high affinity in the absence of K+ (Figure 6). SCH 28080 has no effect on spontaneous dephosphorylation but inhibits K+ -stimulated dephosphorylation, presumably by forming a E2 -P·[I] complex. Hence SCH 28080 inhibits K+ -stimulated ATPase activity by competing with K+ for binding to E2 P [171]. Steady-state phosphorylation is also reduced by SCH 28080, showing that this compound also binds to the free enzyme. Using a photoaffinity reagent, 8-[(4azidophenyl)-methoxy]-1-tritiomethyl-2,3-dimethylimidazo[1,2α]pyridinium iodide (Me-DAZIP+ ), part of the binding site of this class of K+ competitive inhibitor was identified to be in or close to the loop between the TM1 and TM2 segments [91]. Site-directed mutagenesis shows that the binding region is also in the vestibule bounded by the loop between TM5 and TM6 [102, 125]. This binding region, as discussed earlier, was identified by site-directed mutagenesis. Another type of fluorescent K+ competitive arylquinoline, 1-(2-methylphenyl)4-methylamino-6-methyl-2,3-dihydropyrrolo[3,2-c]quinoline (MDPQ), shows enhanced hydrophobicity of its environment with formation of the E2 -P·[I] conformer of the enzyme, as if the H1/loop/H2 segment moves further into the membrane in the E2 conformation [75].
23
24 Gastric H+ ,K + -ATPase
Fig. 6 SCH 28080 and Me-DAZIP+ . SCH 28080, 3-cyanomethyl-2-methyl-8-(phenylmethoxy)imidazo[1,2"]pyridine, inhibited H+ ,K+ -ATPase competitively with K+ . SCH 28080 binds to free enzyme extra-cytoplasmically in the absence of substrate to form E2 (SCH 28080) com-
plexes. SCH 28080 inhibits ATPase activity by forming an E2 -P·[I] complex. Another imidazo-pyridine derivative, 8-[(4azidophenyl)methoxy]-1-tritiomethyl-2,3dimethylimidazo-[1,2"]pyridinium iodide (Me-DAZIP+ ), inhibits enzyme activity as SCH28080 does.
8 Trafficking of the H+ ,K+ -ATPase
Early electron microscopy suggested stimulation of acid secretion was accompanied by a morphological change in the parietal cell whereby membrane vesicles changed locale from the cytoplasm to microvilli of the secretory canaliculus [128, 172]. Disappearance of membrane vesicles from the cytoplasm that is observed by immunoelectron microscopy in association with acid secretion [173] was also consistent with the pump moving into microvilli. Fluorescence microscopy then demonstrated acid secretion takes place into the microvilluslined secretory canaliculus [174]. The association of a cytoskeletal component, ezrin, with stimulated but not with resting membranes [175] and the presence of other SNARE proteins such as Rab 11 [176] in the parietal cell was taken as further support of the vesicle fusion hypothesis. Freeze–fracture images of rapidly frozen fixed tissue indicated the membrane vesicles were in fact tubules [177, 178], or even a network of tubules [179]. These data imply the microvilli result from tubule followed by tubule fusion-eversion, rather than vesicle fusion to form the microvillus. Stimulation of secretion with a tubular network structure presumably results in eversion of this network to form the microvillar network of the stimulated secretory canaliculus without the need even for individual tubule fusion events [180, 181].
8 Trafficking of the H+ ,K + -ATPase
Thus, regulation of acid secretion in gastric parietal cells reflects the appearance of the pump in the microvilli of the secretory canaliculus as compared with a cytoplasmic location. [180, 181]. As mentioned above, a tyrosine-based motif in a cytoplasmic tail of the β-subunit appears to be responsible for the H+ ,K+ -ATPase relocation to the cytoplasmic compartment of the cell [55]. When this tyrosine was mutated to alanine and mutant β-subunit was expressed in mice, transgenic animals continuously secreted acid in a stimulus-independent manner. It was suggested that the same motif might be recognized as a basolateral sorting signal when the H+ ,K+ -ATPase β-subunit was expressed in MDCK cells [182] but the experimental data did not support this suggestion [183]. Parietal cells, similar to other polarized cells, sort proteins into two delivery routes, either to apical or basolateral domains of the plasma membrane. Newly synthesized proteins are sorted to the trans-Golgi network and recycling proteins are sorted in endosomes. Sorting is based on the presence of intrinsic sorting signals that are recognized by specific sorting machinery [184–186]. Information on the sorting signals within the H+ ,K+ -ATPase subunits has been inferred from expression studies. When expressed in LLC-PK1 cells, which do not contain tubulovesicular elements, the gastric H+ ,K+ -ATPase is located exclusively on the apical membrane [187]. Studies in which the H+ ,K+ -ATPase β-subunit or chimeric complexes of the H+ ,K+ -ATPase α- or β-subunit with the appropriate subunit of the Na+ ,K+ -ATPase were expressed in LLC-PK1 cells have shown that both α- and β-subunits of the H+ ,K+ -ATPase contain apical trafficking signals [182, 187]. The apical sorting signal in the α-subunit appears to reside in the fourth transmembrane domain and to act through long-range interactions with its flanking cytoplasmic loop domains [188]. However, the nature of the apical sorting signal(s) within the β-subunit remains unknown. N-Glycosylation sites might be considered as potential candidates for the apical signals since the gastric H+ ,K+ -ATPase βsubunit is heavily glycosylated and N-glycosylation sites act as apical signals in several secreted and membrane proteins [189–191]. If expressed in non-polarized HEK-293 cells, the H+ ,K+ -ATPase is localized on the plasma membrane [101, 192]. Mutational studies in HEK-293 cells have shown that six of the seven glycosylation sites in the gastric H+ ,K+ -ATPase β-subunit are essential for trans-Golgi network to plasma membrane trafficking of the H+ ,K+ -ATPase β-subunit in HEK-293 cells [193]. The second glycosylation site (Asn103), which is not conserved among the β-subunits from different species, is not critical for the plasma delivery of the protein. The trafficking step that is affected by the removal of N-glycosylation sites is the route from the Golgi to the plasma membrane and not from ER to Golgi. It is possible that, similar to polarized cells, HEK-293 cells have machinery which recognizes sorting signal(s) and places the β-subunit into specific cargo vesicles in trans-Golgi network that deliver the protein to the plasma membrane. There is increasing evidence that apical and basolateral sorting and trafficking pathways exist not only in polarized but also in non-polarized cells [194]. When apical and basolateral proteins are expressed in non-polarized cells, in trans-Golgi network they are sorted into different containers that travel separately until they
25
26 Gastric H+ ,K + -ATPase
Fig. 7 Hypothetical models of stimulation of acid secretion. The vesicle fusion hypothesis suggests that upon stimulation the H+ ,K+ -ATPase containing vesicles relocate from the cytoplasm to the microvili and fuse with the apical membrane, with
a contribution of cytoskeletal components and SNARE proteins. The tubule-eversion model suggests that stimulation of acid secretion results in fusion and eversion of the H+ ,K+ -ATPase containing tubules rather than individual vesicle fusion events.
fuse with plasma membrane [195–198] (Figure 7). This implies that trans-Golgi network to plasma membrane delivery in HEK-293 cells depends on the presence or absence of apical sorting information encoded by particular glycosylation sites within the extracellular domain of the β-subunit [199]. The decrease in apical surface expression of the H+ ,K+ -ATPase β-subunit in polarized LLC-PK1 due to the mutation of N-glycosylation sites support this conclusion [200].
References 1 M. MAEDA, J. ISHIZAKI, and M. FUTAI, Biochem. Biophys. Res. Commun., 1988, 157, 203–209. 2 G. E. SHULL and J. B. LINGREL, J. Biol. Chem., 1986, 261, 16788–16791. 3 K. BAMBERG, F. MERCIER, M. A. REUBEN, Y. KOBAYASHI, K. B. MUNSON, and G. SACHS, Biochim. Biophys. Acta, 1992, 1131, 69–77. 4 L. K. LANE, T. L. KIRLEY, and W. J. BALL, Jr., Biochem. Biophys. Res. Commun., 1986, 138, 185–192. 5 M. MAEDA, K. OSHIMAN, S. TAMURA, and M. FUTAI, J. Biol. Chem., 1990, 265, 9027–9032.
6 P. R. NEWMAN, J. GREEB, T. P. KEETON, A. A. REYES, and G. E. SHULL, DNA Cell. Biol., 1990, 9, 749–762. 7 K. OSHIMAN, K. MOTOJIMA, S. MAHMOOD, A. SHIMADA, and S. TAMURA, et al., FEBS Lett., 1991, 281, 250–254. 8 S. TAMURA, M. TAGAYA, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1989, 264, 8580–8584. 9 M. MAEDA, M. TAGAYA, and M. FUTAI, J. Biol. Chem., 1988, 263, 3652–3656. 10 M. O. WALDERHAUG, R. L. POST, G. SACCOMANI, R. T. LEONARD, and D. P. BRISKIN, J. Biol. Chem., 1985, 260, 3852–3859.
References 11 R. J. JACKSON, J. MENDLEIN, and G. SACHS, Biochim. Biophys. Acta, 1983, 731, 9–15. 12 R. A. FARLEY and L. D. FALLER, Biochemistry, 1989, 28, 5764–5772. 13 J. KYTE and R. F. DOOLITTLE, J. Mol. Biol., 1982, 157, 105–132. 14 D. R. SCOTT, K. MUNSON, N. MODYANOV, and G. SACHS, Biochim. Biophys. Acta, 1992, 1112, 246–250. 15 J. M. SHIN, M. BESANCON, A. SIMON, and G. SACHS, Biochim. Biophys. Acta, 1993, 1148, 223–233. 16 M. BESANCON, J. M. SHIN, F. MERCIER, K. MUNSON, and M. MILLER, et al., Biochemistry, 1993, 32, 2345–2355. 17 D. BAYLE, J. C. ROBERT, K. BAMBERG, F. BENKOUKA, and A. M. CHERET, et al., J. Biol. Chem., 1992, 267, 19060–19065 18 F. MERCIER, D. BAYLE, M. BESANCON, T. JOYS, and J. M. SHIN, et al., Biochim. Biophys. Acta, 1993, 1149, 151–165. 19 A. J. SMOLKA, K. A. LARSEN, and C. E. HAMMOND, Biochem. Biophys. Res. Commun., 2000, 273, 942–947. 20 F. MERCIER, H. REGGIO, G. DEVILLIERS, D. BATAILLE, and P. MANGEAT, J. Cell. Biol., 1989, 108, 441–453. 21 F. MERCIER, H. REGGIO, G. DEVILLIERS, D. BATAILLE, and P. MANGEAT, Biol. Cell., 1989, 65, 7–20. 22 K. BAMBERG and G. SACHS, J. Biol. Chem., 1994, 269, 16909–16919. 23 D. BAYLE, D. WEEKS, and G. SACHS, J. Biol. Chem., 1995, 270, 25678–25684. 24 K. HALL, G. PEREZ, D. ANDERSON, C. GUTIERREZ, and K. MUNSON, et al., Biochemistry, 1990, 29, 701–706. 25 B. H. TOH, P. A. GLEESON, R. J. SIMPSON, R. L. MORITZ, and J. M. CALLAGHAN, et al., Proc. Natl. Acad. Sci. U. S. A., 1990, 87, 6418–6422. 26 C. T. OKAMOTO, J. M. KARPILOW, A. SMOLKA, and J. G. FORTE, Biochim. Biophys. Acta, 1990, 1037, 360–372. 27 J. M. CALLAGHAN, B. H. TOH, J. M. PETTITT, D. C. HUMPHRIS, and P. A. GLEESON, J. Cell. Sci., 1990, 95, 563–576. 28 J. M. CALLAGHAN, B. H. TOH, R. J. SIMPSON, G. S. BALDWIN, and P. A. GLEESON, Biochem. J., 1992, 283, 63–68. 29 K. HALL, G. PEREZ, G. SACHS, and E. RABON, Biochim. Biophys. Acta, 1991, 1077, 173–179.
30 E. C. RABON and M. A. REUBEN, Annu. Rev. Physiol., 1990, 52, 321–344. 31 M. A. REUBEN, L. S. LASATER, and G. SACHS, Proc. Natl. Acad. Sci. U. S. A., 1990, 87, 6767–6771. 32 V. A. CANFIELD, C. T. OKAMOTO, D. CHOW, J. DORFMAN, and P. GROS, et al., J. Biol. Chem., 1990, 265, 19878–19884 33 G. E. SHULL, J. Biol. Chem., 1990, 265, 12123–12126. 34 M. MAEDA, K. OSHIMAN, S. TAMURA, S. KAYA, and S. MAHMOOD, et al., J. Biol. Chem., 1991, 266, 21584–21588. 35 P. R. NEWMAN and G. E. SHULL, Genomics, 1991, 11, 252–262. 36 V. A. CANFIELD and R. LEVENSON, Proc. Natl. Acad. Sci. U. S. A., 1991, 88, 8247–8251. 37 G. P. MORLEY, J. M. CALLAGHAN, J. B. ROSE, B. H. TOH, P. A. GLEESON, and I. R. DRIEL VAN, J. Biol. Chem., 1992, 267, 1165–1174. 38 J. Y. MA, Y. H. SONG, S. E. SJOSTRAND, L. RASK, and S. MARDH, Biochem. Biophys. Res. Commun., 1991, 180, 39–45. 39 D. C. CHOW, C. M. BROWNING, and J. G. FORTE, Am. J. Physiol., 1992, 263, 39–46. 40 T. L. KIRLEY, J. Biol. Chem., 1990, 265, 4227–4232. 41 K. J. RENAUD, E. M. INMAN, and D. M. FAMBROUGH, J. Biol. Chem., 1991, 266, 20491–20497. 42 P. JAUNIN, J. D. HORISBERGER, K. RICHTER, P. J. GOOD, B. C. ROSSIER, and K. GEERING, J. Biol. Chem., 1992, 267, 577–585. 43 U. ACKERMANN and K. GEERING, FEBS Lett., 1990, 269, 105–108. 44 K. GEERING, Soc. Gen. Physiol. Ser., 1991, 46, 31–43. 45 C. J. GOTTARDI and M. J. CAPLAN, J. Biol. Chem., 1993, 268, 14342–14347. 46 D. L. ROUSH, C. J. GOTTARDI, and M. J. CAPLAN, Ann. New York Acad. Sci., 1994, 733, 212–222. 47 M. J. CAPLAN, Curr. Opin. Cell. Biol., 1998, 10, 468–473. 48 L. A. DUNBAR and M. J. CAPLAN, Eur. J. Cell. Biol., 2000, 79, 557–563. 49 J. B. KOENDERINK, H. G. SWARTS, H. P. HERMSEN, and J. J. PONT DE, J. Biol. Chem., 1999, 274, 11604–11610.
27
28 Gastric H+ ,K + -ATPase
50 H. P. HERMSEN, H. G. SWARTS, L. WASSINK, F. J. DIJK, and M. T. RAIJMAKERS, et al., Biochim. Biophys. Acta, 2000, 1480, 182–190. 51 K. TYAGARAJAN, R. R. TOWNSEND, and J. G. FORTE, Biochemistry, 1996, 35, 3238–3246. 52 K. TYAGARAJAN, P. H. LIPNIUNAS, R. R. TOWNSEND, and J. G. FORTE, Biochemistry, 1997, 36, 10200–10212. 53 M. J. TREUHEIT, C. E. COSTELLO, and T. L. KIRLEY, J. Biol. Chem., 1993, 268, 13914–13919. 54 S. ASANO, K. KAWADA, T. KIMURA, A. V. GRISHIN, M. J. CAPLAN, and N. TAKEGUCHI, J. Biol. Chem., 2000, 275, 8324–8330. 55 N. COURTOIS-COUTRY, D. ROUSH, V. RAJENDRAN, J. B. MCCARTHY, and J. GEIBEL, et al., Cell, 1997, 90, 501–510. 56 K. L. SCARFF, L. M. JUDD, B. H. TOH, P. A. GLEESON, and I. R. DRIEL VAN, Gastroenterology, 1999, 117, 605–618. 57 S. ASANO, T. KIMURA, S. UENO, M. KAWAMURA, and N. TAKEGUCHI, J. Biol. Chem., 1999, 274, 22257–22265. 58 M. V. LEMAS, K. TAKEYASU, and D. M. FAMBROUGH, J. Biol. Chem., 1992, 267, 20987–20991. 59 A. T. BEGGAH, P. BEGUIN, P. JAUNIN, M. C. PEITSCH, and K. GEERING, Biochemistry, 1993, 32, 14117–14124. 60 J. D. HORISBERGER, P. JAUNIN, M. A. REUBEN, L. S. LASATER, and D. C. CHOW, et al., J. Biol. Chem., 1991, 266, 19131–19134 61 J. M. SHIN and G. SACHS, J. Biol. Chem., 1994, 269, 8642–8646. 62 D. MELLE-MILOVANOVIC, M. MILOVANOVIC, S. NAGPAL, G. SACHS, and J. M. SHIN, J. Biol. Chem., 1998, 273, 11075–11081. 63 E. RABON, M. WILKE, G. SACHS, and G. ZAMPIGHI, J. Biol. Chem., 1986, 261, 1434–1439. 64 E. C. RABON, R. D. GUNTHER, S. BASSILIAN, and E. S. KEMPNER, J. Biol. Chem., 1988, 263, 16189–16194. 65 H. HEBERT, Y. XIAN, I. HACKSELL, and S. MARDH, FEBS Lett., 1992, 299, 159–162. 66 J. M. SHIN and G. SACHS, J. Biol. Chem., 1996, 271, 1904–1908
67 J. C. KOSTER, G. BLANCO, and R. W. MERCER, J. Biol. Chem., 1995, 270, 14332–14339. 68 W. W. REENSTRA and J. G. FORTE, J. Membr. Biol., 1981, 61, 55–60. 69 G. S. SMITH and P. B. SCHOLES, Biochim. Biophys. Acta, 1982, 688, 803–807. 70 S. MARDH and L. NORBERG, Acta Physiol. Scand. Suppl., 1992, 607, 259–263. 71 E. C. RABON, T. L. MCFALL, and G. SACHS, J. Biol. Chem., 1982, 257, 6296–6299. 72 A. T. SKRABANJA, H. T. HIJDEN van der, and J. J. PONT DE, Biochim. Biophys. Acta, 1987, 903, 434–440. 73 B. WALLMARK, H. B. STEWART, E. RABON, G. SACCOMANI, and G. SACHS, J. Biol. Chem., 1980, 255, 5313–5319. 74 J. M. SHIN, R. GOLDSHLEGER, K. B. MUNSON, G. SACHS, and S. J. KARLISH, J. Biol. Chem., 2001, 276, 48440–48450. 75 E. RABON, G. SACHS, S. BASSILIAN, C. LEACH, and D. KEELING, J. Biol. Chem., 1991, 266, 12395–12401. 76 E. C. RABON, K. SMILLIE, V. SERU, and R. RABON, J. Biol. Chem., 1993, 268, 8012–8018. 77 P. BRZEZINSKI, B. G. MALMSTROM, P. LORENTZON, and B. WALLMARK, Biochim. Biophys. Acta, 1988, 942, 215–219. 78 B. STEWART, B. WALLMARK, and G. SACHS, J. Biol. Chem., 1981, 256, 2682–2690. 79 P. LORENTZON, G. SACHS, and B. WALLMARK, J. Biol. Chem., 1988, 263, 10705–10710. 80 J. MENDLEIN and G. SACHS, J. Biol. Chem., 1989, 264, 18512–18519. 81 C. POLVANI, G. SACHS, and R. BLOSTEIN, J. Biol. Chem., 1989, 264, 17854–17859. 82 P. LORENTZON, D. SCOTT, S. HERSEY, B. WALLMARK, E. RABON, and G. SACHS, Prog. Clin. Biol. Res., 1988, 273, 247–254. 83 L. D. FALLER and R. A. DIAZ, Biochemistry, 1989, 28, 6908–6914. 84 L. D. FALLER, Biochemistry, 1989, 28, 6771–6778. 85 K. ABE, S. KAYA, T. IMAGAWA, and K. TANIGUCHI, Biochemistry, 2002, 41, 2438–2445. 86 D. W. MARTIN and J. R. SACHS, Biochemistry, 1999, 38, 7485–7497. 87 H. EGUCHI, S. KAYA, and K. TANIGUCHI, Biochem. Biophys. Res. Commun., 1993, 196, 294–300.
References 88 S. MARKUS, Z. PRIEL, and D. M. CHIPMAN, Biochemistry, 1989, 28, 793–799. 89 S. J. KARLISH, J. Bioenerg. Biomembr., 1980, 12, 111–136. 90 I. N. SMIRNOVA and L. D. FALLER, J. Biol. Chem., 1993, 268, 16120–16123. 91 K. B. MUNSON, C. GUTIERREZ, V. N. BALAJI, K. RAMNARAYAN, and G. SACHS, J. Biol. Chem., 1991, 266, 18976–18988. 92 R. GOLDSHLEGER and S. J. KARLISH, J. Biol. Chem., 1999, 274, 16213–16221. 93 R. GOLDSHLEGER and S. J. KARLISH, Proc. Natl. Acad. Sci. U. S. A., 1997, 94, 9596–9601. 94 C. TOYOSHIMA and H. NOMURA, Nature, 2002, 418, 605–611. 95 S. J. KARLISH, R. GOLDSHLEGER, and W. D. STEIN, Proc. Natl. Acad. Sci. U. S. A., 1990, 87, 4566–4570. 96 C. GATTO, S. LUTSENKO, J. M. SHIN, G. SACHS, and J. H. KAPLAN, J. Biol. Chem., 1999, 274, 13737–13740. 97 M. BESANCON, A. SIMON, G. SACHS, and J. M. SHIN, J. Biol. Chem., 1997, 272, 22438–22446. 98 O. VAGIN, S. DENEVICH, K. MUNSON, and G. SACHS, Biochemistry, 2002, 41, 12755–12762. 99 S. ASANO, S. MATSUDA, S. HOSHINA, S. SAKAMOTO, and N. TAKEGUCHI, J. Biol. Chem., 1999, 274, 6848–6854. 100 K. B. MUNSON, N. LAMBRECHT, and G. SACHS, Biochemistry, 2000, 39, 2997–3004. 101 N. LAMBRECHT, K. MUNSON, O. VAGIN, and G. SACHS, J. Biol. Chem., 2000, 275, 4041–4048. 102 O. VAGIN, K. MUNSON, N. LAMBRECHT, S. J. KARLISH, and G. SACHS, Biochemistry, 2001, 40, 7480–7490. 103 E. C. RABON, M. HOGGATT, and K. SMILLIE, J. Biol. Chem., 1996, 271, 32137–32146. 104 H. G. SWARTS, C. H. KLAASSEN, M. BOER DE, J. A. FRANSEN, and J. J. PONT DE, J. Biol. Chem., 1996, 271, 29764–29772. 105 H. P. HERMSEN, J. B. KOENDERINK, H. G. SWARTS, and J. J. PONT DE, Biochemistry, 2000, 39, 1330–1337. 106 H. P. HERMSEN, H. G. SWARTS, L. WASSINK, J. B. KOENDERINK, P. H.
107
108
109
110
111
112
113
114
115 116 117
118 119 120
121
122 123
WILLEMS, and J. J. PONT DE, Biochemistry, 2001, 40, 6527–6533. H. G. SWARTS, J. B. KOENDERINK, H. P. HERMSEN, P. H. WILLEMS, and J. J. PONT DE, J. Biol. Chem., 2001, 276, 36909–36916. M. BURNAY, G. CRAMBERT, S. KHAROUBI-HESS, K. GEERING, and J. D. HORISBERGER, J. Biol. Chem., 2003, 278, 19237–19244. S. ASANO, T. IO, T. KIMURA, S. SAKAMOTO, and N. TAKEGUCHI, J. Biol. Chem., 2001, 276, 31265–31273. M. MENSE, V. RAJENDRAN, R. BLOSTEIN, and M. J. CAPLAN, Biochemistry, 2002, 41, 9803–9812. H. HEBERT, P. PURHONEN, H. VORUM, K. THOMSEN, and A. B. MAUNSBACH, J. Mol. Biol., 2001, 314, 479–494. H. HEBERT, E. SKRIVER, M. SODERHOLM, and A. B. MAUNSBACH, J. Ultrastruct. Mol. Struct. Res., 1988, 100, 86–93. Y. TAHARA, S. OHNISHI, Y. FUJIYOSHI, Y. KIMURA, and Y. HAYASHI, FEBS Lett., 1993, 320, 17–22. C. TOYOSHIMA, M. NAKASAKO, H. NOMURA, and H. OGAWA, Nature, 2000, 405, 647–655. K. J. SWEADNER and C. DONNET, Biochem. J., 2001, 356, 685–704. A. G. LEE, Biochim. Biophys. Acta, 2002, 1565, 246–266. P. L. JORGENSEN, K. O. HAKANSSON, and S. J. KARLISH, Annu. Rev. Physiol., 2003, 65, 817–849. C. XU, W. J. RICE, W. HE, and D. L. STOKES, J. Mol. Biol., 2002, 316, 201–211. H. MA, G. INESI, and C. TOYOSHIMA, J. Biol. Chem., 2003, 278, 28938–28943. W. J. RICE, H. S. YOUNG, D. W. MARTIN, J. R. SACHS, and D. L. STOKES, Biophys. J., 2001, 80, 2187–2197. M. HILGE, G. SIEGAL, G. W. VUISTER, P. GUNTERT, S. M. GLOOR, and J. P. ABRAHAMS, Nat. Struct. Biol., 2003, 10, 468–474. W. KUHLBRANDT, J. ZEELEN, and J. DIETRICH, Science, 2002, 297, 1692–1696. M. L. GUMZ, D. DUDA, R. MCKENNA, C. S. WINGO, and B. D. CAIN, Molecular modeling of the rabbit colonic (HKalpha2a) H(+), K(+) ATPase. J. Mol. Model. (Online), 2003, 9, 283–289.
29
30 Gastric H+ ,K + -ATPase
124 H. OGAWA and C. TOYOSHIMA, Proc. Natl. Acad. Sci. U. S. A., 2002, 99, 15977–15982. 125 K. MUNSON, O. VAGIN, G. SACHS, and S. KARLISH, Ann. New York Acad. Sci., 2003, 986, 106–110. 126 E. L. BURNS, R. A. NICHOLAS, and E. M. PRICE, J. Biol. Chem., 1996, 271, 15879–15883. 127 H. F. HELANDER and B. I. HIRSCHOWITZ, Gastroenterology, 1972, 63, 951–961. 128 H. F. HELANDER and B. I. HIRSCHOWITZ, Gastroenterology, 1974, 67, 447–452. 129 A. SAWAGUCHI, K. L. MCDONALD, S. KARVAR, and J. G. FORTE, J. Microsc., 2002, 208, 158–166. 130 J. M. WOLOSIN and J. G. FORTE, J. Membr. Biol., 1983, 76, 261–268. 131 G. SACHS, H. H. CHANG, E. RABON, R. SCHACKMAN, M. LEWIN, and G. SACCOMANI, J. Biol. Chem., 1976, 251, 7690–7698. 132 D. R. SCOTT, H. F. HELANDER, S. J. HERSEY, and G. SACHS, Biochim. Biophys. Acta, 1993, 1146, 73–80. 133 S. TAMURA, K. OSHIMAN, T. NISHI, M. MORI, M. MAEDA, and M. FUTAI, FEBS Lett., 1992, 298, 137–141. 134 M. KAISE, A. MURAOKA, J. YAMADA, and T. YAMADA, J. Biol. Chem., 1995, 270, 18637–18642. 135 A. TARI, V. WU, M. SUMII, G. SACHS, and J. H. WALSH, Biochim. Biophys. Acta, 1991, 1129, 49–56. 136 A. TARI, G. YAMAMOTO, K. SUMII, M. SUMII, and Y. TAKEHARA, et al., Am. J. Physiol., 1993, 265, 752–758. 137 A. TARI, G. YAMAMOTO, Y. YONEI, M. SUMII, and K. SUMII, et al., Am. J. Physiol., 1994, 266, 444–450. 138 D. R. SCOTT, M. BESANCON, G. SACHS, and H. HELANDER, Dig. Dis. Sci., 1994, 39, 2118–2126. 139 J. M. CROTHERS, Jr., D. C. CHOW and J. G. FORTE, Am. J. Physiol., 1993, 265, 231–241. 140 E. FELLENIUS, T. BERGLINDH, G. SACHS, L. OLBE, and B. ELANDER, et al., Nature, 1981, 290, 159–161. 141 B. WALLMARK, A. BRANDSTROM, and H. LARSSON, Biochim. Biophys. Acta, 1984, 778, 549–558.
142 H. NAGAYA, H. SATOH, K. KUBO, and Y. MAKI, J. Pharmacol. Exp. Ther., 1989, 248, 799–805. 143 H. FUJISAKI, H. SHIBATA, K. OKETANI, M. MURAKAMI, and M. FUJIMOTO, et al., Biochem. Pharmacol., 1991, 42, 321–328. 144 J. C. SIH, W. B. IM, A. ROBERT, D. R. GRABER, and D. P. BLAKEMAN, J. Med. Chem., 1991, 34, 1049–1062. 145 W. A. SIMON, C. BUDINGEN, S. FAHR, B. KINDER, and M. KOSKE, Biochem. Pharmacol., 1991, 42, 347–355. 146 T. ARAKAWA, T. FUKUDA, K. HIGUCHI, K. KOIKE, H. SATOH, and K. KOBAYASHI, Jpn. J. Pharmacol., 1993, 61, 299–302. 147 P. LINDBERG, P. NORDBERG, T. ALMINGER, A. BRANDSTROM, and B. WALLMARK, J. Med. Chem., 1986, 29, 1327–1329. 148 D. J. KEELING, C. FALLOWFIELD, K. J. MILLINER, S. K. TINGLEY, R. J. IFE, and A. H. UNDERWOOD, Biochem. Pharmacol., 1985, 34, 2967–2973. 149 W. B. IM, J. C. SIH, D. P. BLAKEMAN, and J. P. MCGRATH, J. Biol. Chem., 1985, 260, 4591–4597. 150 P. LORENTZON, R. JACKSON, B. WALLMARK, and G. SACHS, Biochim. Biophys. Acta, 1987, 897, 41–51. 151 D. J. KEELING, C. FALLOWFIELD, and A. H. UNDERWOOD, Biochem. Pharmacol., 1987, 36, 339–344. 152 M. MORII and N. TAKEGUCHI, J. Biol. Chem., 1993, 268, 21553–21559. 153 G. SACHS, J. M. SHIN, M. BESANCON, and C. PRINZ, Aliment. Pharmacol. Ther. 7, 1993, 29, 4–12. 154 M. KATASHIMA, K. YAMAMOTO, Y. TOKUMA, T. HATA, Y. SAWADA, and T. IGA, Eur. J. Drug. Metab. Pharm., 1998, 23, 19–26. 155 K. GEDDA, D. SCOTT, M. BESANCON, P. LORENTZON, and G. SACHS, Gastroenterology, 1995, 109, 1134–1141. 156 W. B. IM, D. P. BLAKEMAN, and G. SACHS, Biochim. Biophys. Acta, 1985, 845, 54–59. 157 J. M. SHIN and G. SACHS, Gastroenterology, 2002, 123, 1588–1597. 158 J. J. KAMINSKI and A. M. DOWEYKO, J. Med. Chem., 1997, 40, 427–436. 159 Y. HIOKI, J. TAKADA, Y. HIDAKA, H. TAKESHITA, M. HOSOI, and M. YANO, Arch. Int. Pharmacol. Ther., 1990, 305, 32–44.
References 160 R. J. IFE, T. H. BROWN, D. J. KEELING, C. A. LEACH, and M. L. MEESON, et al., J. Med. Chem., 1992, 35, 3413–3422. 161 C. A. LEACH, T. H. BROWN, R. J. IFE, D. J. KEELING, and S. M. LAING, et al., J. Med. Chem., 1992, 35, 1845–1852. 162 J. L. LAMATTINA, P. A. MCCARTHY, L. A. REITER, W. F. HOLT, and L. A. YEH, J. Med. Chem., 1990, 33, 543–552. 163 K. S. HAN, Y. G. KIM, J. K. YOO, J. W. LEE, and M. G. LEE, Biopharm. Drug. Dispos., 1998, 19, 493–500. 164 T. HAYASHI, S. ASANO, M. MIZUTANI, N. TAKEGUCHI, and T. KOJIMA, et al., J. Nat. Prod., 1991, 54, 802–809. 165 S. ASANO, M. MIZUTANI, T. HAYASHI, N. MORITA, and N. TAKEGUCHI, J. Biol. Chem., 1990, 265, 22167–22173. 166 S. MURAKAMI, I. ARAI, M. MURAMATSU, S. OTOMO, and K. BABA, et al., Biochem. Pharmacol., 1992, 44, 33–37. 167 A. H. DANTZIG, P. L. MINOR, J. L. GARRIGUS, D. S. FUKUDA, and J. S. MYNDERSE, Biochem. Pharmacol., 1991, 42, 2019–2026. 168 D. J. KEELING, A. G. TAYLOR, and C. SCHUDT, J. Biol. Chem., 1989, 264, 5545–5551. 169 D. J. KEELING, C. FALLOWFIELD, K. M. LAWRIE, D. SAUNDERS, S. RICHARDSON, and R. J. IFE, J. Biol. Chem., 1989, 264, 5552–5558. 170 B. WALLMARK, C. BRIVING, J. FRYKLUND, K. MUNSON, and R. JACKSON, et al., J. Biol. Chem., 1987, 262, 2077–2084. 171 J. MENDLEIN and G. SACHS, J. Biol. Chem., 1990, 265, 5030–5036. 172 T. FORTE and J. G. FORTE, J. Ultrastruct. Res., 1971, 37, 322–334. 173 A. SMOLKA, H. F. HELANDER, and G. SACHS, Am. J. Physiol., 1983, 245, 589–596. 174 P. MANGEAT, T. GUSDINAR, A. SAHUQUET, D. K. HANZEL, J. G. FORTE, and R. MAGOUS, Biol. Cell., 1990, 69, 223–231. 175 D. HANZEL, H. REGGIO, A. BRETSCHER, J. G. FORTE, and P. MANGEAT, Embo J., 1991, 10, 2363–2373. 176 X. YAO and J. G. FORTE, Annu. Rev. Physiol., 2003, 65, 103–131. 177 G. PERANZI, D. BAYLE, M. J. LEWIN, and A. SOUMARMON, Biol. Cell., 1991, 73, 163–171.
178 T. JONS, S. LEHNARDT, H. BIGALKE, H. K. HEIM, and G. AHNERT-HILGER, Eur. J. Cell. Biol., 1999, 78, 779–786. 179 T. NAMIKAWA, K. ARAKI, and T. OGATA, Arch. Histol. Cytol., 1998, 61, 47–56. 180 T. OGATA and Y. YAMASAKI, Microsc. Res. Tech., 2000, 48, 282–292. 181 T. OGATA and Y. YAMASAKI, Anat. Rec., 2000, 258, 15–24. 182 D. L. ROUSH, C. J. GOTTARDI, H. Y. NAIM, M. G. ROTH, and M. J. CAPLAN, J. Biol. Chem., 1998, 273, 26862–26869. 183 A. S. DUFFIELD, A. N. BROWN, H. FOLSCH, and I. C. MELLMAN. Presented at the 42nd American Society for Cell Biology Annual Meeting, San Francisco, 2002. 184 L. M. TRAUB and S. KORNFELD, Curr. Opin. Cell. Biol., 1997, 9, 527–533. 185 C. YEAMAN, K. K. GRINDSTAFF, and W. J. NELSON, Physiol. Rev., 1999, 79, 73–98. 186 E. IKONEN and K. SIMONS, Semin. Cell. Dev. Biol., 1998, 9, 503–509. 187 C. J. GOTTARDI and M. J. CAPLAN, J. Cell. Biol., 1993, 121, 283–293. 188 L. A. DUNBAR, P. ARONSON, and M. J. CAPLAN, J. Cell. Biol., 2000, 148, 769–778. 189 K. MATTER and I. MELLMAN, Curr. Opin. Cell. Biol., 1994, 6, 545–554. 190 P. SCHEIFFELE, J. PERANEN, and K. SIMONS, Nature, 1995, 378, 96–98. 191 A. GUT, F. KAPPELER, N. HYKA, M. S. BALDA, H. P. HAURI, and K. MATTER, EMBO J., 1998, 17, 1919–1929. 192 T. KIMURA, Y. TABUCHI, N. TAKEGUCHI, and S. ASANO, J. Biol. Chem., 2002, 277, 20671–20677. 193 O. VAGIN, S. DENEVICH, and G. SACHS, Am. J. Physiol. Cell. Physiol., 2003, 285, 968–976. 194 P. KELLER and K. SIMONS, J. Cell. Sci., 1997, 110, 3001–3009. 195 P. KELLER, D. TOOMRE, E. DIAZ, J. WHITE, and K. SIMONS, Nat. Cell. Biol., 2001, 3, 140–149. 196 A. MUSCH, H. XU, D. SHIELDS, and E. RODRIGUEZ-BOULAN, J. Cell. Biol., 1996, 133, 543–558. 197 A. RUSTOM, M. BAJOHRS, C. KAETHER, P. KELLER, and D. TOOMRE, et al., Traffic, 2002, 3, 279–288.
31
32 Gastric H+ ,K + -ATPase
198 T. YOSHIMORI, P. KELLER, M. G. ROTH, and K. SIMONS, J. Cell. Biol., 1996, 133, 247–256. 199 F. T. WIELAND, M. L. GLEASON, T. A. SERAFINI, and J. E. ROTHMAN, Cell, 1987, 50, 289–300.
200 O. VAGIN, S. TURDIKULOVA, I. YAKUBOV, and G. SACHS. Molecular Biology of the Cell, 2003, 14, p. 80a. (Abstracts of the 43rd Annual Meeting of the American Society for Cell Biology, Dec 13–17, 2003, San Francisco.)
1
Proton Translocating ATPases Masamitsu Futai, Ge-Hong Sun-Wada, and Yoh Wada Osaka University, Osaka, Japan
Originally published in: Handbook of ATPase. Edited by Masamitsu Futai, Yoh Wada and Jack H. Kaplan. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30689-3
Abbreviations DCCD, dicyclohexylcarbodiimide; Pi, inorganic phosphate.
1 Introduction
The mechanism of ATP synthesis has been a focus of biochemists for more than four decades. ATP synthase was first identified in the mitochondrial inner membrane, and then found successively in chloroplasts and bacterial membranes (for reviews, see Refs. 1–8). This enzyme synthesizes ATP from ADP and phosphate (Pi) coupled with an electrochemical proton gradient, and is also called protontranslocating ATPase or F-ATPase (F-Type ATPase) because of the reversible proton pumping upon ATP hydrolysis. The name F-ATPase originated from the coupling factor of oxidative phosphorylation sensitive to oligomycin [1]. Escherichia coli F-ATPase is composed of a membrane extrinsic F1 sector and a transmembrane Fo , formed from five (α 3 β 3 γ δ ε) and three (a b2 c10–14 ) subunit assemblies with different stoichiometries, respectively (Figure 1). It can also be divided into a catalytic α 3 β 3 hexamer, stalks (γ εab2 ), and membrane (a b2 c10–14 ) domains. Two stalks have been observed by electron microscopy [2]. The mitochondrial enzyme has additional subunits, possibly with regulatory functions. The higher-ordered structure of the bovine α 3 β 3 γ complex has been solved by X-ray crystallography [3]. Following this breakthrough, the structures of crystals of bovine F1 inhibited by efrapeptin, aurovertin, NBD-Cl (7-chloro-4-nitrobenzo2-oxa-1,3-diazole), and DCCD (dicyclohexylcarbodiimide) were solved by the same group [4–7]. Bovine F1 containing 1 mol MgADP-trifluoroaluminate [8] and two moles MgADP trifluoroaluminate [9] has also been crystallized, and the structures Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Proton Translocating ATPases
Fig. 1 Schematic model of F-ATPase, showing the subunit structures of the catalytic hexamer ("3 $3 ), stalk and membrane domain. The membrane extrinsic F1 and transmembrane Fo sector are shown together with energy coupling between the ATP synthesis–hydrolysis and subunit rotation.
determined. These studies, together with biochemical analysis, have contributed greatly to an understanding of the catalytic site and mechanism. The structures of F1 sectors of other origins have been reported [10–13]. The catalytic site is mainly located in the β subunit of the α 3 β 3 hexamer, and the three sites (one in each β) show strong cooperativity. The amino and carboxyl terminal helices of the γ subunit are located in the center of the α 3 β 3 hexamer, and form a central stalk with the ε subunit. The second or peripheral stalk is formed from the a and b subunits, and the δ subunit is located near the top of the α subunit [14]. The proton pathway is located at the interface between the a and c subunits, and the hairpin structure of the purified c subunit has been solved by NMR [15]. A ring structure formed from multiple c subunits was suggested by early studies involving electron [16] and atomic force microscopy [17, 18], and was extensively analyzed recently through NMR structure and genetic approaches [19, 20]. The X-ray structure of yeast F1 with a c subunit ring has also been reported [21]. The mechanism of coupling of proton transport and ATP synthesis or hydrolysis has been a major question in research on this complicated enzyme [14, 22, 23]. The binding change mechanism proposes rotation of the γ subunit relative to the α 3 β 3 hexamer coupled with the chemistry at the catalytic sites [23]. Rotation of the γ εc10–14 complex relative to α 3 β 3 δab2 upon ATP addition was shown recently, indicating that continuous rotation of the assembly of F1 and Fo subunits is involved in the coupling between ATP hydrolysis and proton transport [24–27]. Vacuolar-type ATPase (V-ATPase) with significant similarity to F-ATPase was introduced later to a family of proton-pumping ATPases (for reviews, see Refs. [28– 31]). V-ATPase is apparently different from F-ATPase in its physiological roles, and forms an acidic luminal pH in endomembrane organelles including lysosomes and
2 Catalytic Mechanism of F-ATPase
endosomes, and in extracellular compartments such as resorption lacuna formed between osteoclasts and the bone surface. It has extrinsic membrane V1 and Vo transmembrane domains formed from the A, B, C, D, E, F, G and H subunits, and a, c, c , c and d, respectively. F- and V-ATPase share significant homology, especially in their catalytic subunits and proton pathways, but also some unique differences, including their membrane sector and stalk region subunit compositions. Thus, comparative studies of the two proton pumps are pertinent for understanding the molecular mechanisms of both. In this chapter we discuss recent progress in the understanding of F-ATPase, focusing mainly on the energy coupling between the chemistry and proton transport through subunit rotation, and briefly on comparison with V-ATPase. It should be interesting for readers to follow a series of biochemical studies to show rotational catalysis of the F-ATPase holoenzyme leading to that of V-ATPase.
2 Catalytic Mechanism of F-ATPase
As expected from its complicated structure, F-ATPase is not a simple Michaelis–Menten type enzyme. Furthermore, the overall mechanism includes catalysis (chemistry), subunit rotation and proton translocation. X-Ray structure and kinetic studies, especially substrate binding analysis with an intrinsic tryptophan probe, revealed that the three catalytic sites are asymmetric. Senior and coworkers have recently extensively summarized and discussed the molecular catalytic mechanism of F-ATPase [32]. Thus, we discuss the catalytic mechanism and active site only briefly here. ATP synthesis and hydrolysis could be carried out kinetically at a single site (uni-site catalysis), two sites operating together (bi-site catalysis), or all three sites working together (tri-site catalysis). Uni-site catalysis has only been demonstrated for ATP hydrolysis, and can be measured experimentally with an ATP:F1 ratio of less than 1:3 [33]. This rate is 105 –106 -fold lower than that of steady-state (multisite) catalysis. The enzyme cross-linked chemically, and thus could not rotate, but could still carry out uni-site catalysis [34]. This catalysis is not a part of the steady state, which includes subunit rotation. Catalytic residues have been identified by analyzing uni-site catalysis of the purified E. coli mutant F1 sector (Figure 2). They were discussed in detail previously for mechanistic implication [32, 35–38]. Briefly, βLys155 of the β subunit is required for binding of the γ phosphate moiety, as shown by studies involving affinity labeling [39] and mutant enzymes such as βLys155Ala (βLys155 → Ala) or βLys155Ser [40–43]. The βArg182 residue is also involved in the binding. Enzymes with substitutions of βThr156 showed similar properties to those of βLys155 [40]. The hydroxyl moiety of βThr156 is essential, possibly for Mg2+ binding, since it can only be replaced by a serine residue [40, 44]. βGlu181 is a critical catalytic residue [41]. Its side-chain forms a hydrogen bond with a water molecule located near the γ phosphate of ATP [3]. However, it was shown later that this residue was not
3
4 Proton Translocating ATPases
Fig. 2 Catalytic sites of F-ATPase and V-ATPase. The catalytic residues of E. coli F-ATPase are shown together with bound ATP. Their positions are cited according to the bovine crystal structure [3]. Corresponding residues of yeast V-ATPase are also shown [109, 110].
involved in nucleotide binding or Mg2+ coordination [42, 45]. βTyr331, which is stacked close to the adenine ring [3], is required for the binding of ADP or ATP [37]. Results of analysis of the tryptophan fluorescence of the βTyr331Trp mutant F1 are consistent with the role of the βTyr331 residue [22, 23]. The bovine residues corresponding to those discussed above are located close to the phosphate moiety or the adenine ring of bound-ATP or ADP in the X-ray structure [3] (Figure 2). Recent results [9, 11, 37, 46–49], including the F1 structure of all three sites filled with nucleotides [9, 11], support tri-site catalysis, in which the three sites are working together during the steady state, and the notion that bi-site catalysis does not occur [32]. We leave convincing discussion of these points to the article of Senior and coworkers [32]. It can be assumed that mutant enzymes defective in steady state catalysis should show impaired uni-site catalysis. In this regard, αArg376 mutant enzymes are of interest [50]. The αArg376 residue of the α subunit is located close to the β- or γ -phosphate of ATP or ADP and Mg at the catalytic site [3] (Figure 2). However, αAg376 does not directly participate in the chemistry of ATP hydrolysis or synthesis, as shown by the uni-site catalysis of mutant F1 sectors [50]. The αArg376Lys r αArg376Ala mutant enzyme showed 2 × 103 -fold lower steady-state ATP hydrolysis than the wild type. However, the mutant enzymes showed essentially the same kinetics for uni-site catalysis as the wild type, suggesting that they can pass through the transition state. These results indicate that αArg376 is essential for promotion of catalysis to the steady-state turnover. This notion is different from the previously suggested roles of αArg376 deduced from the structural model [3] and
3 Roles of the ( subunit: energy coupling by mechanical rotation 5
fluoroaluminate binding, an indicator of the formation of the pentacovalent transition state [51]. The βGlu185 residue, located close to the γ -phosphate and Mg at the catalytic site, may have a similar role to αArg376 because the mutant enzymes maintain uni-site catalysis but are defective in multisite catalysis [52]. However, detailed analysis could not be carried out because the mutant F1 was unstable after solubilization from membranes. The binding change mechanism of Boyer [23] proposes that the three sites are involved sequentially in ATP synthesis or hydrolysis: at a specific time point during the steady state, the chemical reaction occurs reversibly at one site, and ATP release and/or binding of ATP + Pi at two other sites with the expenditure of energy. Evidence supporting catalytic cooperativity and the binding change mechanism has accumulated, as reviewed previously [23]. It includes early kinetic evidence of uni-site and steady-state catalysis showing three K m values and three interacting nucleotide binding sites, 18 O isotope exchange reactions, inhibitor studies, etc. These studies suggested that the chemistry, “ADP + Pi ←→ ATP + H2 O”, at the catalytic site is reversible, and provides essentially no energy change. The mechanism is strongly supported by the X-ray structure showing asymmetric catalytic sites and continuous γ subunit rotation [3]. However, it can not be concluded that the binding change mechanism was proven at the molecular level [32], although the mechanism is conceptually accepted and has been extremely useful for understanding the enzyme.
3 Roles of the γ subunit: energy coupling by mechanical rotation 3.1 Roles of the γ Subunit in Energy Coupling
The essential role of the γ subunit in catalysis and assembly was shown by early reconstitution experiments: a catalytic core complex exhibiting ATPase activity could be reconstituted from the purified E. coli α, β and γ subunits, but not without the γ subunit [53, 54]. Consistently, assembly of the F1 sector is strongly affected by γ subunit mutations, especially those of residues interacting with the β subunit [55]. The isolated α 3 β 3 γ complex could functionally bind to the Fo sector only after its assembly with the δ and ε subunits [54]. These results suggest that the two minor subunits are required for the functional binding of α 3 β 3 γ . Early genetic approaches focused on the γ subunit carboxyl terminal region [56–58]. The amino acid sequences of the γ subunits are weakly conserved among different species [57–59], although their X-ray structures are similar [3, 11–13]. When the known γ sequences were aligned, only 28 of the 286 residues of the E. coli subunit are conserved, and mostly in the carboxyl and amino terminal helices located at the center of the α 3 β 3 hexamer [58]. Seventeen residues are conserved between residues 242 and 286 of the γ subunit. An enzyme with a nonsense mutation lacking 10 residues at the carboxyl terminus is still capable of in vivo ATP
6 Proton Translocating ATPases
synthesis [57], indicating that the three conserved residues in this region are not required. Structural flexibility of the carboxyl terminus was also demonstrated by a frameshift mutation: the enzyme was still active with the γ subunit having seven additional residues at its carboxyl terminus together with nine altered residues downstream of γ Thr277 [60]. The enzyme with the nonsense mutation (γ Gln269End) was inactive, and substitution of a conserved residue (γ Gln269, γ Thr273, or γ Glu275) between γ Gln269 and γ Leu276 gave enzymes with reduced ATPase activity and energy coupling [57]. We noticed that mutations resulted in different ratios of ATPase catalysis and proton transport; three mutants (βThr277End, βGln269Leu, and βGlu275Lys) exhibited about 15 % of the wild-type ATPase activity, but showed various degrees of ATP-dependent H+ transport and in vivo ATP synthesis. These results suggest active role(s) of the γ subunit in ATPase activity and energy coupling. In addition to the carboxyl terminus, the only other conserved region is near the amino terminus [58]. The importance of this region was first suggested by the mutant lacking residues between γ Lys21 and γ Ala27, which resulted in failure of assembly of the F1 complex [61]. We introduced amino acid substitutions systematically into the amino terminal region of the γ subunit [62]. Most of the changes between γ Ile19 and γ Lys33, γ Asp83 and γ Cys87, or at γ Asp65 had no effect, even with a drastic replacement such as hydrophobic to hydrophilic or acidic to basic. Interesting exceptions were the γ Met23Arg and γ Met23Lys substitutions. These mutants grew only slowly on succinate through oxidative phosphorylation, indicating that they are impaired in ATP synthesis. However, the membranes prepared from the γ Arg23 and γ Lys23 strains showed 100 and 65% of the wild-type ATPase activity, but formed only 32 and 17 % of the electrochemical proton gradient, respectively, indicating that these mutants are defective in energy coupling between ATP hydrolysis and proton transport. In the X-ray structure, the γ Met23 residue is located close to the DELSEED (βAsp380–βAsp386) loop of the β subunit [3]. Thermodynamic and kinetic analyses of the purified γ Met23Lys enzyme suggested that the introduced γ Lys23 residue forms an ionized hydrogen bond with βGlu381 in the loop [63]. Consistent with this interpretation, the phenotype of γ Met23Lys was restored by the second mutation, βGlu381Gln [64]. The βGlu381Lys mutation also caused deficient energy coupling. These results suggest that the interaction between the regions around γ Met23 and βGlu381, and thus the γ subunit residue and β subunit DELSEED loop, are involved in energy coupling. The γ Met23Lys mutation was suppressed by substitution of carboxyl terminal residues, including γ Arg242, γ Glu269, γ Ala270, γ Ile272, γ Thr273 γ Glu278, γ Ile279k, and γ Val280 [65]. From these results, we initially suggested that γ Met23, γ Arg242, and the region between γ Glu269 and γ Val280 are three interacting domains that are required for efficient energy coupling. The X-ray structure shows that γ Met23 located in the amino terminal helix is near γ Arg242, but γ Gly269 and γ Val280 in the carboxyl terminal helix are near the top of the α 3 β 3 hexamer [3]. We further isolated second site mutations that suppressed the primary mutations, γ Gln269Glu and γ Thr273Val [66]. These mutations were mapped to the amino (residues 18, 34, and 35) and carboxyl (residues 236, 238, 242, and 262) termini.
3 Roles of the ( subunit: energy coupling by mechanical rotation 7
The higher-ordered structure clearly shows that γ Glu269 or γ Thr273 does not interact directly with the residues of the second mutations. The occurrence of suppression at a distance may suggest that the two a helices of the γ subunit located at the center of the α 3 β 3 complex undergo long-range conformational changes during catalysis [67]. As expected, the relative orientations of the γ subunit to the three β subunits (β E , β DP , and β TP ) are different in the crystal structure, strongly supporting γ subunit rotation during ATP hydrolysis or synthesis [3]. 3.2 γ Subunit Rotation
The higher-ordered structure of α 3 β 3 γ indicated that a and β are arranged alternately around the amino- and carboxyl-terminal a helices of the γ subunit [3, 14]. The binding change mechanism predicts that the catalytic sites in the three β subunits participate sequentially in ATP synthesis or hydrolysis via conformation transmission through the γ subunit rotation [23]. This rotation was suggested by experiments on chemical cross-linking between γ Cys87 and βCys380 (originally βAsp380) in the DELSEED loop [68, 69], and analysis of polarized absorption recovery after photobleaching of a probe linked to the carboxyl terminus of the chloroplast γ subunit [70, 71]. The continuous unidirectional γ subunit rotation in F1 was recorded directly by Yoshida and Noji and their colleagues [72] (Figure 3). They immobilized the Bacillus α 3 β 3 γ complex on a glass surface through a histidine-tag introduced into the β subunit. The fluorescent actin filament connected to the γ subunit rotated continuously in an anticlockwise direction during ATP hydrolysis. The rotation became slower with an increase in the filament length, and generated a frictional
Fig. 3 Rotation of the ( subunit of F-ATPase. F1 was immobilized through a histidine tag introduced to the " or $ subunit, and an actin filament was connected to the carboxyl terminus of the ( subunit. ATP-dependent rotation of the wild-type (open circles) and ( Met23Lys mutant (filled circles) F1 sector are shown. Taken from Omote et al. [77].
8 Proton Translocating ATPases
torque of ∼40 pN nm. The ε subunit rotation was also shown using the same approach [73], consistent with the tight association of γ and ε [21, 74]. A 120◦ step rotation was shown in the presence of a dilute ATP concentration, indicating that the γ subunit rotates, interacting with the three β subunits successively [75]. Furthermore, a refined measurement system involving gold beads revealed that the 120◦ step could be divided into 90◦ and 30◦ steps [76]. These two steps were proposed to correspond to ATP binding and ADP release, respectively, although this model is difficult to prove unambiguously. Interestingly, the rotation rates observed were much higher than those expected from the V max of steady-state catalysis. We could observe the rotation of an actin filament connected to the γ subunit of E. coli F1 [77] using a similar system to that described for the Bacillus α 3 β 3 γ complex [72]. The rotation was anticlockwise, inhibited by azide (F-ATPase inhibitor), and generated a frictional torque of ∼40 pN·nm, as reported for the Bacillus complex. The F1 sector, proven to be a chemically driven motor, should be connected functionally to the Fo sector to complete ATP-driven proton transport. Conversely, proton transport through Fo should be coupled to the γ subunit rotation and chemistry at the catalytic sites. Questions on the energy coupling between chemistry, rotation and proton transport could be answered with the E. coli enzyme by taking advantage of the accumulated genetic and biochemical information. 3.3 Mutational Analysis of the γ Subunit Rotation
We were interested in characterizing the γ rotation of a series of mutant enzymes defective in energy coupling and catalytic cooperativity. As described above, the γ Met23Lys mutant is defective in energy coupling between catalysis and proton transport [62]. We thought that the γ rotation in the mutant F1 may be defective, since the γ residue was replaced. Similar to the original mutant, the γ Me23Lys enzyme engineered for rotation observation exhibited essentially the same ATPase activity as the engineered wild type. However, the enzyme could not form an electrochemical proton gradient in membrane vesicles, or carry out in vivo oxidative phosphorylation [77]. Unexpectedly, an actin filament connected to the γ subunit of γ Met23Lys rotated, and generated essentially the same torque as that of the wild type (Figure 3). These results suggest that the γ Met23Lys mutant F-ATPase could couple between chemistry and rotation, but was defective in transforming mechanical work into proton translocation or vice versa. In this regard, Al-Shawi et al. showed that the γ Met23Lys mutant is defective in communication between F1 and Fo [63]. These results also suggested that analysis of F1 sector rotation is not enough to understand F-ATPase. It became imperative to determine which subunit complex is rotating in the F-ATPase holo enzyme purified or embedded in the membrane. Mutant F1 sectors with substitution of the βSer174 residue and their suppressors have been useful for understanding the rotation mechanism (Figure 4). Replacing βSer-174 with other residues altered the ATPase activity to between 150 and 10 % of the wild-type level, and the larger the side-chain of the residue introduced the lower
3 Roles of the ( subunit: energy coupling by mechanical rotation 9
Fig. 4 Models of the E. coli $ subunit domain including $Gly149 and $Ser174. (a) Domain structure for the ATP-bound ($TP ) and empty ($E ) $ subunit. The bovine structure [3] was used to model E. coli domains. Positions of ATP and residues discussed
in the text are indicated. Nomenclatures for the a helix and $ sheet are those of previous workers [3]. (b) Models of the $Ser174Phe mutant domain structure. The positions of the residues discussed in the text are shown. Taken from Iko et al. [79].
the ATPase activity observed [78]. Both the βSer174Leu and βSer174Phe enzymes retained about 10 % of the wild-type ATPase activity. However, the two mutants showed a difference in energy coupling: the βSer174Leu mutant could still grow on succinate by oxidative phosphorylation and transport protons into the isolated membrane vesicles, whereas the βSer174Phe mutant could not grow and showed no proton transport. Consistent with these observations, their F1 sectors differed in γ subunit rotation [79]. The F1 sector with βSer174Phe showed apparently slower rotation than the wild type, and generated significantly lower frictional torque (∼17 pN nm), whereas the F1 with βSer174Leu was similar to the wild type. These results suggest that the rotation or torque generation is closely related to the energy coupling with proton transport. Biochemical defects of the βSer174Phe mutation were suppressed by a secondsite mutation, βGly149 to Ser, Cys, or Ala [80]. Double mutants such as βSer174Phe/βGly149Ser showed essentially the same ATPase activity and proton transport as the wild type. As expected from these results, the double mutant F1 generated the wild-type torque (∼40 pN nm) [79]. The high-resolution structure of the bovine F1 predicts that βSer174 is located on the β subunit surface within the loop between an α helix (helix B) and a β sheet (β sheet 4) (Figure 4). βGly149, the first residue of the phosphate binding P loop connected to helix B, is close to the catalytic site. The structure of the βGly149–bSer174 region is significantly
10 Proton Translocating ATPases
different between the nucleotide-bound and empty β subunits. Thus, it can be assumed that the conformational change of the catalytic site, followed by that of the βGly149–βSer174 domain, leads to the γ subunit rotation for the energy coupling to proton transport. The βSer174Phe mutation was also suppressed by αArg296Cys of the α subunit [78]. It was of interest to analyze the rotation of the double mutant and related strains. However, the F1 sector of βSer174Phe/αArg296Cys was difficult to purify. Energy minimization with a simple potential function of the modeled βSer174Phe mutant F1 predicts that the side-chain of βPhe174 interacts with that of βIle163 or βIle166 of the β subunit with no nucleotide bound [79]. We substituted βIle163 or βIle166 with a less bulky Ala residue in the βSer174Phe mutant. As expected, the F1 with the βIle163Ala/βSer174Phe or βIle166Ala/βSer174Phe double mutant could rotate and generate almost similar torque to the wild type. These results suggest that the βGly149–βSer174 domain plays important roles in the rotation and torque generation essential for energy coupling. The role of the DELSEED loop has been focused on because the conformations of the three β subunits (β E and β T or β D ) are significantly different [3], which is consistent with the roles of γ Met23 [62] and βGlu381 (the second residue of the DELSEED loop) [63] in energy coupling. However, mutagenesis studies involving replacement of residues in the DELSEED loop of thermophilic Bacillus α 3 β 3 γ did not reveal significant effects on the torque generation [81]. The negative results may suggest that the loop is not related to the conformation change driving the β subunit rotation. However, this is difficult to conclude from the directed mutations without structural analysis. Furthermore, the experimental system with actin filaments gives rotation rates and torque values with high deviations. Extensive substitution of related residues and their second mutations should be analyzed for the final conclusion, as discussed above for a region around βSer174. Furthermore, analysis of rotation in the F-ATPase holoenzyme may be necessary, as pointed out for the γ Met23Lys mutation [77].
4 Rotational Catalysis of the F-ATPase Holoenzyme 4.1 Structure of Fo Sector and Proton Transport Pathway
As discussed above, the Fo membrane sector is composed of a, b and c subunits, whose stoichiometry (1:2:10 ± 1, for a:b:c) was first determined from the stained bands on polyacrylamide-gel electrophoresis [82]. Based on structural prediction, Cox et al. proposed a model of Fo in which the a and b subunit helices are surrounded by a ring of c subunits [83]. However, this model was not consistent with the electron [16] and atomic force [17, 18] microscopic images. In the model derived from the images, the a and b subunits are attached to one side of the symmetric ring structure formed by the c subunits. An atomic force microscope image indicated
4 Rotational Catalysis of the F-ATPase Holoenzyme
< 12 c subunits in the ring. The ab2 complex was purified recently after solubilization of Fo from membranes utilizing a histidine tag introduced at the a subunit amino terminus [84]. The proton pathway (Fo ) was reconstituted from the ab2 complex and c subunit. These results confirmed that ab2 and the c ring are two functional subcomplexes of the Fo sector. Subunit a spans the membrane with five helices, and its fourth helix could be cross-linked to the carboxyl terminal helix of a c subunit when Cys residues were introduced at appropriate positions [85, 86]. aArg210 in the fourth helix is involved in proton transport, and all the substitutions, even aArg210 to Lys, gave a Fo sector with no ability to transport protons [19, 20, 87]. The structure of the amino terminus (residues 1–34) of the b subunit, including the trans-membrane helix, was solved by NMR in a chloroform–methanol–water mixture [88], and extramembrane helices interact with the δ and α subunits at the top of the F1 sector [89, 90], which is consistent with the model proposing that the b subunit closely interacts with an α 3 β 3 hexamer [20]. The b subunit dimer interaction with the F1 sector is necessary for a second stalk [88]. The carboxyl terminal helix of a c subunit can be cross-linked to the membrane helix of the b subunit [91]. Fillingame and co-workers solved the structure of the E. coli c subunit by NMR [15]: monomeric c (in a mixture of chloroform, methanol and water, pH 5) give a hairpin-like structure formed from the two helices connected by a polar region that interacts with the γ and ε subunits. Structural models of Fo and the c ring were reviewed recently by Fillingame and coworkers [19, 20]. Thus, we discuss here only the pertinent points of the rotational catalysis. The front face of one c subunit packs with the back face of a second c subunit to form a dimer, consistent with the results of mutant studies [19]. Fillingame and coworkers proposed that dimers form a ring of 12 monomers, with the amino and carboxyl α helices of each c subunit in the interior and at the periphery, respectively, and the ring was modeled from the results of molecular dynamic calculations [92]. The model was supported by cross-linking analysis [93]. However, a ring containing more than 10 monomers was not found in purified Fo F1 [94], and recent recalculation gave essentially the same but a slightly smaller ring formed from 10 copies of the c subunit with the two helices in similar orientations [20, 95]. In an alternative model, the carboxyl and amino terminal α helices form inner and outer rings, respectively [96]. However, this model is not supported by cross-linking experiments [19]. The X ray structure indicated a yeast F1 tightly bound with a c ring formed from ten monomers [21]. These results established that the c subunits form a ring of ten monomers. However, the number of monomers forming the ring is still controversial [21, 24]: atomic force microscopy of chloroplast [97] and bacterial [98] Fo demonstrated rings of 14 and 11 c subunits, respectively. The difference in the copy number may be due to the species difference, loss of a part of the monomers, or reorganization of the ring during purification. Structural studies on F-ATPase or the Fo sector will eventually explain the discrepancy. cAsp61, in the middle of the second transmembrane a helix, is responsible for proton transport, and close to cAla24 and cIle28, of which substitutions by other
11
12 Proton Translocating ATPases
residues reduced the DCCD reactivity of cAsp61 [18]. The stoichiometry of the a and c subunits (1:10) indicates that one aArg210 and multiple cAsp61 are required for proton translocation in F-ATPase. As the pKa of the cAsp61 carboxyl moiety is 7.1, the NMR c subunit structure at pH 5 is at the fully protonated stage. The interaction between cAsp61 and aArg210 may lower the pKa to facilitate proton release into the proton pathway [19, 20]. Restogi and Girvin solved the c subunit structure at pH 8, with cAsp61 being in an almost completely deprotonated form [99]. The difference between the c structures at pH 5 and 8 is that the carboxyl terminal a helix was rotated by 140◦ with respect to the amino terminal helix. From the two structures and the results of cross-linking experiments, Fillingame et al. suggested that the carboxyl terminal a helix rotates during proton transport, interacts with aArg210, and deprotonates cAsp61 [19]. This c subunit structural change possibly drives stepwise rotation of the c ring.
4.2 Rotational Catalysis of the F-ATPase Holoenzyme
To complete ATP hydrolysis-dependent proton transport, the γ rotation should be transmitted to the Fo membrane sector. Conversely, for ATP synthesis, proton transport through Fo should generate torque to drive the γ subunit rotation coupled to chemistry. The γ rotation should be coupled to protonation–deprotonation of cAsp61 in either direction. Depending on the modeling of Fo (a and b subunit inside the c ring), rotation of a, b, γ , δ and ε relative to the complex of α, β and the c ring once has been suggested [83]. This speculation led to experimental tests, although an alternative model in which the c subunit ring attached by the a and b subunit helices is supported experimentally, as discussed above. Different mechanisms could be proposed for the coupling between γ rotation within the α 3 β 3 hexamer and proton transport through the Fo sector: (a) the γ subunit rotates on the surface of the c ring; and (b) the γ subunit and the c ring rotate as a single complex. Models should be consistent with the proton transport continuously utilizing one aArg210 and 10–14 cAsp61 residues of the a subunit and the c ring, respectively. Consistent with mechanism (b), energy transduction of F-ATPase with a counter-rotating rotor (γ ε c ring) and stator has been proposed [71, 100]. Convincing evidence supporting this mechanism includes the results of chemical cross-linking. Cross-linking between the γ and c subunits did not affect F-ATPase activity, indicating that sequential interaction of the rotating γ subunits with multiple c subunits is not necessary during ATP synthesis-hydrolysis [101]. Similarly, cross-linking between the γ and ε subunits did not affect ATP hydrolysis [102], supporting the rotation of an actin filament connected to either subunit. However, α–γ , α–ε, β–γ and β–ε cross-linking resulted in loss of the activity [20, 103], confirming that ε γ rotates against α 3 β 3 . These results suggest that a complex formed from γ , ε and c subunits is a mechanical unit for relative rotation to the α 3 β 3 hexamer.
4 Rotational Catalysis of the F-ATPase Holoenzyme
Fig. 5 Rotational catalysis of F-ATPase. Fo F1 was immobilized through a histidinetag attached to the a subunit (a) or c ring (b, c), and a fluorescent actin filament was connected as a probe to the c (a), " (b), or
a (c) subunit to observe rotational catalysis. Continuous rotation of the probe connected to the purified Fo F1 (a or b) or membraneembedded Fo F1 (c) was observed upon addition of ATP.
We obtained direct evidence of continuous rotation of the E. coli εγ c10–14 complex during ATP hydrolysis [24] (Figure 5). F-ATPase engineered for rotation was solubilized from membranes, purified, and immobilized through histidine tags introduced into the a subunit. Upon the addition of ATP, the filament connected to the c ring rotated anticlockwise continuously, and generated similar torque to that observed for the γ rotation in the F1 sector. The rotation was inhibited by venturicidin, a specific inhibitor for F-ATPase but not for ATPase activity of the F1 sector. The rotation and ATPase activity became less sensitive to the antibiotic when the cIle38Thr mutation was introduced into the c subunit [26], indicating that antibiotic binding to the c ring inhibited rotation [24]. P¨anke et al. also observed c subunit rotation during ATP hydrolysis [25]. They immobilized Fo F1 through a histidine tag introduced into the β subunit, and an actin filament was connected to the c subunit through a streptag and streptavidin. The characteristics of the rotation observed were the same as those with the above systems. These results are consistent with the recent experiments showing that complete cross-linking of the γ , ε and c subunit had no effect on ATP hydrolysis, proton translocation, or ATP synthesis [103]. The important conclusion drawn from these experiments is that the c subunit ring rotates when F-ATPase is immobilized through the α or β subunit. As discussed by P¨anke et al. [25] and Wada et al. [104], it is not easy to prove that all the subunits were integrated into F-ATPase rotating under the microscope. However, indirect evidence supports the intactness of the enzyme used for rotation: reconstitution studies indicated that the a, b and c subunit are required to form Fo capable of F1 binding [105], and all F1 subunits are required for binding to the Fo sector [54]. Early genetic and biochemical studies led to the conclusion that all three Fo subunits are required for F1 binding (see Ref. [56] for a review). The intactness of F-ATPase in these observations was also supported by the results for the membrane-bound enzyme [27], which rotated in essentially a similar manner to the purified F-ATPase. As discussed previously [26, 104], criticism [106] regarding
13
14 Proton Translocating ATPases
the initial observation [24] was based mainly on the negative observation under different experimental conditions. We further addressed the basic question of whether the rotor and stator are interchangeable in F-ATPase [26] (Figure 5b). Thus, F-ATPase was immobilized on a glass surface through a histidine tag introduced into the c subunit, and an actin filament was connected to the β subunit through the biotin-binding domain of transcarboxylase (biotin-tag). We could observe ATP-dependent filament rotation generating the same frictional torque as observed above, similar to the case of F-ATPase immobilized through the α or β subunit and with an actin filament connected to the c subunit [24, 25]. Thus, either of the two subcomplexes (εγ c10–14 or α 3 β 3 δab2 ) could be a rotor or a stator. Their actual roles in vivo will depend on the viscous drag due to the cytosol and membranes. 4.3 Rotational Catalysis of F-ATPase in Membranes
Although rotation of the γ εc10–14 complex has been shown using purified F-ATPase [24–26], a more favorable experimental system is the isolated membrane because the original integrity of the enzyme is maintained. The membrane could be used to answer one of the obvious questions of whether the c ring rotates relative to the a subunit. Such rotation would support the model of proton transport through the interface of the two subunits (Figure 5c). It may be difficult to test the rotation of a probe connected to the c ring embedded in membranes by immobilizing F-ATPase through the α or β subunit. As described above, purified F-ATPase can be immobilized through a histidine-tag connected to the c ring, and an actin filament can be connected to the α or β subunit through a biotin-tag [38]. This experimental system was modified to test the rotation of F-ATPase in the membrane. As histidine and biotin-tags face the periplasm and cytoplasm of the intact E. coli cell, respectively, membrane preparation for rotation should be carefully considered. Right-side out or everted membrane vesicles can not be used for testing rotation, because only one of the two tags in these vesicles is accessible from the medium: the histidine-tag faces the medium for right-side out vesicles, but the biotin-tag is inside the same vesicles. Similarly, the histidine-tag is inside everted vesicles, and thus cannot be used for immobilizing F-ATPase in these vesicles. Therefore, F-ATPase in planar membranes should be used to test for the rotation. We prepared membrane fragments by passing E. coli cells through a French press by a slight modification of the procedure used to prepare everted vesicles [27]. A labeling experiment with streptavidine-conjugated gold particles indicated that the preparation contained a significant number of planar membranes. Using this preparation, we observed ATP-dependent rotation of an actin filament connected to the β subunit. As expected from the previous studies, the rotation was counterclockwise and sensitive to DCCD. To complete a model of proton translocation through the rotation in F-ATPase, it is essential to show rotation of the c ring relative to the a subunit (Figure 5c).
4 Rotational Catalysis of the F-ATPase Holoenzyme
Fig. 6 Relative rotation of the a subunit in membrane-embedded F-ATPase. Time courses of rotating filaments connected to the a subunit of membrane F-ATPase immobilized through the c subunit are shown,
along with time courses of rotating filaments of varying length (a) and typical sequential video images (video interval, 100 ms) (b). Taken from Nishio et al. [27].
The c subunit (cMet65Cys) cross-linked with the a subunit (αAsn214Cys) was highly protected from labeling with 14 C-DCCD [107]. However, when F-ATPase was labeled with 14 C-DCCD and subjected to the ATPase reaction before crosslinking there was significantly increased labeling of the a–c dimer. This experiment supports the rotational catalysis in Fo , although no information on the mechanism could be obtained. We could show rotation of the actin filament connected to the a subunit relative to the c ring using membrane fragments [27] (Figure 6). These results and those obtained with purified Fo F1 or the F1 sector indicate that γ εc10–14 and α 3 β 3 δab2 are mechanical units, i.e., an interchangeable rotor and stator, respectively. Notably, the results obtained with membrane-embedded F-ATPase and those with the purified one are similar. Furthermore, the experimental system will be extremely valuable for studying the mechanism of coupling of rotation and proton transport, although further modification(s) may be necessary. These studies established rotational catalysis by F-ATPase (Figure 7). For ATP hydrolysis, the F-ATPase is a chemically driven motor rotating γ εc10–14 to drive proton transport. Studies on the mechanism of the rotation in the Fo sector were started only recently. The rotation is inhibited by venturicidin or DCCD, suggesting that the tight or covalent binding of these bulky chemicals to the c subunit inhibits mechanical rotation [13]. Junge and coworkers showed that the c subunit ring with the cAsp61Asn mutation can still rotate during ATP hydrolysis, indicating that the proton transport is not obligatory for the chemically driven rotation (ATP hydrolysis-dependent c ring rotation) [108]. For ATP synthesis, the same enzyme is a potential-driven motor rotating γ εc10–14 through an electrochemical proton gradient. This membrane system will be useful for studying the rotation in this direction.
15
16 Proton Translocating ATPases
Fig. 7 Schematic mechanism of rotational catalysis by F-ATPase. For ATP synthesis, electrochemical proton transport changes the c subunit conformation that drives the rotational movement of the c ring together with the ( and g subunits. Conversely, ATP
hydrolysis drives the ( and g subunit rotation together with the c ring to transport protons. Modified from Oster [100], Fillingame [19, 20], Junge [71] and their coworkers.
5 Rotational catalysis of V-ATPase 5.1 Catalytic Site and Proton Pathway
V-ATPase (vacuolar-type ATPase) acidifies the lumens of endomembrane organelles such as lysosomes, endosomes and synaptic vesicles (for reviews, see Refs. 32–35). The same enzyme in the plasma membranes of specialized cells, including osteoclasts and renal intercalated cells, pumps protons into extra cellular compartments such as resorption lacunae and collecting ducts, respectively. Despite significant physiological differences, V-ATPase exhibits similarities with F-ATPase. The catalytic A subunit of V-ATPase is homologous to F-ATPase β. The homology is striking in the phosphate binding P-loop and other sequences, including the F-ATPase catalytic residues discussed above [29]. E. coli F-ATPase residues at the catalytic site, βLys155, βThr156, βGlu181, βArg182 and βGlu185, correspond to ALys263, AThr264, AGlu266, AArg267 and AGlu290 of the yeast V-ATPase A subunit, respectively (Figure 8). Mutational studies of yeast V-ATPase
5 Rotational catalysis of V-ATPase
Fig. 8 Similarities of the V-ATPase G subunit with the F-ATPase b or * subunit. The subunits are aligned to obtain maximal homology. Identical amino acid residues are boxed.
supported the notion that they are catalytic residues [109, 110]. The kinetics indicate that V-ATPase has three catalytic sites exhibiting cooperativity [111]. The cysteine residue in the V-ATPase P-loop may be involved in regulation [112]. F-ATPase with Cys introduced at the corresponding position became sensitive to sulfhydryl reagents, similar to V-ATPase [113]. The membrane Vo sector of V-ATPase has a more complicated subunit composition: yeast Vo is formed from c, c , c , a and d subunits [28]. Of the three proteolipid subunits, the c and c having four transmembrane helices are a duplicated form of the F-ATPase c subunit, and show 56% identity in amino acid residues with each other. The carboxyl moieties (cGlu137 and c Glu145) for proton transport are located on the fourth helix of the c and c subunits, respectively [114, 115]. The c subunit, exhibiting some homology with c and c , is also required in proton transport: c γ Lu188 in the middle of the third a helix is also implicated for proton transport [114]. The c subunit is conserved in mammalians [116, 117], whereas the c subunit is only found in yeast. The stoichiometry of the c:c :c subunits in yeast is n:1:1 [118]. The three proteolipids may form a ring structure similar to the Fo c ring. Assuming that 4, 1, and 1 molecules of the c, c and c subunits [114], respectively, form a ring, Vo has ∼6 proton translocating residues altogether. This is in contrast with 10–14 residues for Fo . Mutation of V-ATPase aArg735 abolished proton translocation similar to that of F-ATPase aArg210 [119], although the subunits a of Vo and Fo exhibit little homology. Thus, Vo has multiple carboxyl moieties and one arginine essential for proton translocation. The similarities between the two ATPases raised the interesting question of whether V-ATPase can synthesize ATP. Unlike F-ATPase localized with the respiratory chain in the mitochondrial membrane, mammalian or yeast membranes
17
18 Proton Translocating ATPases
containing V-ATPase do not have a system for generating an electrochemical proton gradient. Does V-ATPase synthesize ATP when a membrane potential or protongradient is generated? To answer this question, we expressed a plant proton translocating pyrophosphatase in yeast vacuoles [120]. Upon hydrolysis of pyrophosphate, the vacuolar membrane vesicles could form an electrochemical proton gradient, which was lowered by the addition of ADP + Pi. As expected from this result, we observed ATP synthesis sensitive to the V-ATPase specific inhibitor bafilomycin. The results further support the similarities between the two ATPases. Plant vacuoles having both V-ATPase and pyrophosphatese may synthesize ATP similar to the hybrid system studied in yeast.
5.2 Subunit Rotation of V-ATPase During Catalysis
Despite the similarities of V- and F-ATPase discussed above, they are also significantly different. Typically, F-ATPase of E. coli has eight subunits, whereas yeast V-ATPase has 13. Most of the subunits unique to V-ATPase are located in the stalk region. Furthermore, the presence of isoforms of subunits B, E, G, d, and a has been found for V-ATPase, whereas the information on isoforms is limited for F-ATPase. As pointed out above, Vo , possibly its ring structure, may be different from Fo . These differences and similarities led us to examine the rotational catalysis in V-ATPase. As described above, rotation of an actin filament connected to the a, α, or β subunit was observed when F-ATPase was immobilized through the c subunit ring. The correspondence of minor subunits in the stalk region between the two ATPases was difficult to determine (Figure 8). The G subunit was shown to exist in the stalk domain of a recent model, and to be accessible from the cytosol [30]. The G subunit and F-ATPase b exhibit ≈24% homology, although the G subunit lacks a transmembrane region. However, recent cross-linking studies suggested that the G subunit is located near the top of V1 [121, 122], similar to the F-ATPase δ subunit [89, 90]. Thus, the G subunit may correspond to the b or δ subunit of F1 , and could be a candidate for connecting a probe to observe rotation, although the homology is quite limited. Based on these considerations, we engineered the yeast chromosome (VMA3 and VMA10 for the c and G subunit, respectively), solubilized V-ATPase from vacuolar membranes, and tested rotational catalysis. An actin filament connected to the G subunit rotated upon ATP hydrolysis [123] (Figure 9). The rotation was inhibited by nitrate and concanamycin, similar to their effects on ATPase activity. Concanamycin inhibition was striking: the rotation terminated within a few seconds after the addition Figure 9, which is consistent with its tight affinity [111]. This antibiotic is similar to bafilomycin, which binds to the Vo sector, possibly to its c subunit [124], indicating that concanamycin, possibly bound to the rotor–stator interface, terminated the rotation. Torque generated by the rotation was similar to that with F-ATPase. We observed rotation in a sonicated vacuolar membrane
6 Conclusion
Fig. 9 Rotation of an actin filament connected to the g subunit of V-ATPase. Solubilized V-ATPase was immobilized on a glass surface through the c subunit, and an actin filament was connected to the G subunit. Rotation was observed upon the addition of ATP, and inhibited by nitrate and concanamycin. Taken from Hirata et al. [123].
preparation; however, we could not obtain enough data for analysis (unpublished observation). These results suggest that the A3 B3 hexamer rotates together with the G subunit, when the c subunit ring is immobilized. An interesting question is “Which part of the V-ATPase rotates in vivo?” In this regard, Holiday and coworkers reported that the B subunit interacts with the cytoskeleton, possibly actin [125]. Thus, the c subunit ring together with the subunit corresponding to the γ (possibly D) subunit may be rotating in vivo, if the V-ATPase holo enzyme is immobilized with the cytoskeleton. However, it is also possible that the rotor–stator is determined only by the slight difference in viscous drag applied to each sub-complex. Rotational catalysis of the peripheral membrane sector of T. thermophilus ATPase was shown recently [126]. This Archaean ATPase is more similar to F-ATPase than V-ATPase, as pointed out by the authors. In this regard, the ATPase of Archae (Archaean bacterial) plasma membranes has been classified as an A-type ATPase [127]. Thus, three distantly related proton translocating ATPases carry out common rotational catalysis.
6 Conclusion
F-ATPase has been a fascinating membrane enzyme for generations of biochemists. As Paul Boyer called it “A splendid molecular machine” [23], it is more than an ordinary enzyme. F-ATPase synthesizes or hydrolyzes ATP coupling between
19
20 Proton Translocating ATPases
proton translocation and chemistry, and has become a real molecular machine in this sense, since mechanical rotation of its subunit complex was included recently in its mechanism. For the engineering aspect of the mechanism, the F1 sector immobilized on a metal surface can rotate a metal plate [128]. As described above, experimental systems have been established for further studies. Details of the rotation mechanism will be revealed by studies involving E. coli F-ATPase together with the progress regarding physical methods and the higher-ordered X-ray structure determinations of the Fo sector. It was interesting to learn that rotational catalysis has been expanded to V-ATPase and also distantly related Archaean ATPase. The basic mechanism of V-ATPase rotation may be similar to that of F-ATPase, as expected from the similarities between the two enzymes. Thus, the rotational mechanism of E. coli F-ATPase should be studied extensively, using its genetic and biochemical advantages. VATPase may have a more fascinating regulatory mechanism that is supported by a unique Vo structure and a series of isoforms in the stalk region. The regulatory role of stalk subunit(s) including subunit E in energy coupling has already been discussed [129]. Finally, owing to limited space, the present article is not a comprehensive survey, and so the reader is directed to the reviews listed above for references not cited here.
Acknowledgments
We wish to thank our coworkers, whose names appear in the references, from our laboratory. We are grateful to Ms S. Shimamura and M. Nakashima for preparation of the manuscript. The research in our own laboratories was supported by Grantsin-Aid from the Ministry of Education, Science, and Culture of Japan, the Takeda Foundation, and the Japan Science and Technology Corporation.
References 1 E. RACKER, A New Look at Mechanisms in Bioenergetics, Academic Press, New York, 1976. 2 S. WIKENS and R. A. CAPALDI, Nature, 1998, 393, 29–29. 3 J. P. ABRAHAMS, A. G. W. LESLIE, R. LUTTER, and J. E. WALKER, Nature, 1994, 370, 621–628. 4 J. P. ABRAHAMS, S. K. BUCHANAN, M. J. RAAJI VAN, I. M. FEARNLEY, A. G. W. LESLIE, and J. E. WALKER, Proc. Natl. Acad. Sci. U. S. A., 1996, 93, 9420–9424. 5 M. J. RAAJI VAN, J. P. ABRAHAMS, A. G. W. LESLIE, and J. E. WALKER, Proc. Natl. Acad. Sci. U. S. A., 1996, 93, 6913–6917.
6 G. L. ORRISS, A. G. W. LESLIE, K. BRAIG, and J. E. WALKER, Structure, 1996, 6, 831–837. 7 C. GIBBONS, M. G. MONTGOMERY, A. G. W. LESLIE, and J. E. WALKER, Nat. Struct. Biol., 2000, 7, 1055–1061. 8 K. BRAIG, R. I. MENZ, M. G. MONTGOMERY, A. G. W. LESLIE, and J. E. WALKER, Structure, 2000, 8, 567–573. 9 R. I. MENZ, J. E. WALKER, and A. G. W. LESLIE, Cell, 2001, 106, 331–341. 10 Y. SHIRAKIHARA, A. G. W. LESLIE, J. P. ABRAHAMS, J. E. WALKER, T. UEDA, Y. SEKIMOTO, M. KAMBARA, K. SAIKA, Y.
References
11
12
13 14
15
16
17
18
19
20
21 22 23 24
25
26
KAGAWA, and M. YOSHIDA, Structure, 1997, 5, 825–836. M. A. BIANCHET, J. HULLIHEN, P. L. PEDERSON, and L. M. AMZEL, Proc. Natl. Acad. Sci. U. S. A., 1998, 95, 11065–11070. A. C. HAUSRATH, G. GRUBER, B. W. MATTHEWS, and R. A. CAPALDI, Proc. Natl. Acad. Sci. U. S. A., 1999, 96, 13697–13702. G. GROTH and E. POHL, J. Biol. Chem., 2001, 276, 1345–1352. D. STOCK, C. GIBBONS, I. ARECHAGA, A. G. W. LESLIE, and J. E. WALKER, Curr. Opin. Struct. Biol., 2000, 10, 672–679. M. E. GIRVIN, V. K. RASTOGI, F. ABILDGAARD, J. L. MARKLEY, and R. H. FILLINGAME, Biochemistry, 1998, 37, 8817–8824. R. BIRKENHA¨ GER, M. HOPPERT, G. DECKERS-HEBESTREIT, F. MAYER, and K. ALTENDORF, Eur. J. Biochem., 1995, 230, 58–67. K. TAKEYASU, H. OMOTE, S. NETTIKADAN, F. TOKUMASU, A. IWAMOTO-KIHARA, and M. FUTAI, FEBS Lett., 1996, 392, 110–113. S. SIGN, P. TURINA, C. J. BUSTAMANTE, D. J. KELLER, and R. A. CAPALDI, FEBS Lett., 1996, 397, 30–34. R. H. FILLINGAME, C. M. ANGEVINE, and O. Y. DMITRIEV, Biochim. Biophys. Acta, 2002, 1555, 29–36. R. H. FILLINGAME and O. Y. DMITRIEV, Biochim. Biophys. Acta, 2002, 1565, 232–245. D. STOCK, A. LESLIE and J. E. WALKER, Science, 1999, 286, 1700–1705. J. WEBER and A. E. SENIOR, Biochim. Biophys. Acta, 1997, 1319, 19–58. P. D. BOYER, Annu. Rev. Biochem., 1997, 66, 717–749. Y. SAMBONGI, Y. IKO, M. TANABE, H. OMOTE, A. IWAMOTO-KIHARA, I. UEDA, T. YANAGIDA, Y. WADA, and M. FUTAI, Science, 1999, 286, 1722–1724. O. P¨ANKE, K. GUMBIOWSKI, W. JUNGE, and S. ENGELBRECHT, FEBS Lett., 2000, 472, 34–38. M. TANABE, K. NISHIO, Y. IKO, Y. SAMBONGI, A. IWAMOTO-KIHARA, Y. WADA, and M. FUTAI, J. Biol. Chem., 2001, 276, 15269–15274.
27 K. NISHIO, A. IWAMOTO-KIHARA, A. YAMAMOTO, Y. WADA, and M. FUTAI, Proc. Natl. Acad. Sci. U. S. A., 2002, 99, 13448–13452. 28 N. NELSON and W. R. HARVEY, Physiol. Rev., 1999, 79, 361–385. 29 M. FUTAI, T. OKA, G.-H. SUN-WADA, Y. MORIYAMA, H. KANAZAWA, and Y. WADA, J. Exp. Biol., 2000, 203, 107–116. 30 T. NISHI and M. FORGAC, Nat. Rev. Mol. Cell. Biol., 2002, 3, 94–103. 31 G.-H. SUN-WADA, Y. WADA, and M. FUTAI, J. Bioenerg. Biomembr., 2003, 35, 347–358. 32 A. E. SENIOR, S. NADANACIVA, and J. WEBER, Biochim. Biophys. Acta, 2002, 1553, 188–211. 33 H. S. PANEFSKY and R. L. CROSS, Adv. Enzymol. Relat. Areas Mol. Biol., 1991, 64, 173–214. 34 J. J. GARCIA and R. A. CAPALDI, J. Biol. Chem., 1998, 273, 15940–15945. 35 H. OMOTE and M. FUTAI, Acta Physiol. Scand., 1998, 163 (Suppl), 177–183. 36 M. FUTAI, H. OMOTE, Y. SAMBONGI, and Y. WADA, Biochim. Biophys. Acta, 2000, 1458, 276–288. 37 H. REN and W. S. ALLISON, Biochim. Biophys. Acta, 2000, 1458, 221–233. 38 A. E. SENIOR, S. NADANACIVA, and J. WEBER, J. Exp. Biol., 2000, 203, 35–40. 39 K. IDA, T. NOUMI, M. MAEDA, T. FUKUI, and M. FUTAI, J. Biol. Chem., 1991, 266, 5424–5429. 40 H. OMOTE, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1992, 267, 20571–20576. 41 M. Y. PARK, H. OMOTE, M. MAEDA, and M. FUTAI, J. Biochem., 1994, 116, 1139–1145. 42 S. L¨OBAU, J. WEBER, S. WILKE-MOUNTS, and A. E. SENIOR, J. Biol. Chem., 1997, 272, 3648–3456. 43 S. NADANACIVA, J. WEBER, and A. E. SENIOR, Biochemistry, 1999, 38, 7670–7677. 44 A. E. SENIOR and M. K. AL-SHAWI, J. Biol. Chem., 1992, 267, 21471–21478. 45 J. WEBER, S. T. HAMMOND, S. WIKE-MOUNTS, and A. E. SENIOR, Biochemistry, 1998, 37, 608–614. 46 J. WEBER, R. S. V. LEE, E. GRELL, J. G. WISE, and A. E. SENIOR, J. Biol. Chem., 1992, 267, 1712–1718.
21
22 Proton Translocating ATPases
47 R. K. NAKAMOTO, C. J. KETCHUM, and M. K. AL-SHAWI, Annu. Rev. Biophys. Biomol. Struct., 1999, 28, 205–234. 48 J. WEBER and A. E. SENIOR, Biochem. Biophys. Acta, 2000, 1458, 300–309. 49 J. WEBER and A. E. SENIOR, J. Biol. Chem., 2001, 276, 35422–35428. 50 N. P. LE, H. OMOTE, Y. WADA, M. K. AL-SHAWI, R. K. NAKAMOTO, and M. FUTAI, Biochemistry, 2000, 39, 2778–2783. 51 S. NADANACIVA, J. WEBER, S. WILKE-MOUNTS, and A. E. SENIOR, Biochemistry, 1999, 38, 15493–15499. 52 H. OMOTE, N. P. LE, M.-Y. PARK, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1995, 270, 25656–25660. 53 M. FUTAI, Biochem. Biophys. Res. Commun., 1977, 79, 1231–1237. 54 S. D. DUNN and M. FUTAI, J. Biol. Chem., 1980, 265, 113–118. 55 H. OMOTE, K. TAINAKA, A. IWAMOTO-KIHARA, Y. WADA, and M. FUTAI, Arch. Biochem. Biophys., 1998, 308, 277–282. 56 M. FUTAI, T. NOUMI, and M. MAEDA, Annu. Rev. Biochem., 1989, 58, 111–136. 57 A. IWAMOTO, J. MIKI, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1990, 265, 5043–5048. 58 R. K. NAKAMOTO, K. SHIN, A. IWAMOTO, H. OMOTE, M. MAEDA, and M. FUTAI, Ann. New York Acad. Sci., 1992, 671, 335–344. 59 M. FUTAI and H. OMOTE, Handbook of Biological Physics. W. N. KONINGS, H. R. KABACK, and J. S. LOLKEMA, eds., Elsevier Science, Amsterdam 1996, Vol. 2, pp 47–74. 60 C. J. De BEUKELAER, H. OMOTE, A. IWAMOTO-KIHARA, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1995, 270, 22850–22854. 61 H. KANAZAWA, H. HAMA, B. P. ROSEN, and M. FUTAI, Arch. Biochem. Biophys., 1985, 241, 364–370. 62 K. SHIN, R. K. NAKAMOTO, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1992, 267, 20835–20839. 63 M. K. AL-SHAWI, C. J. KETCHUM, and R. K. NAKAMOTO, J. Biol. Chem., 1997, 272, 2300–2306.
64 C. J. KETCHUM, M. K. AL-SHAWI, and K. K. NAKAMOTO, Biochem. J., 1998, 330, 707–712. 65 R. K. NAKAMOTO, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1993, 268, 867–872. 66 R. K. NAKAMOTO, M. K. AL-SHAWI, and M. FUTAI, J. Biol. Chem., 1995, 270, 14042–14046. 67 M. FUTAI and H. OMOTE, J. Bioenerg. Biomembr., 1996, 28, 409–414. 68 R. AGGELER, M. A. HAUGHTON, and R. A. CAPALDI, J. Biol. Chem., 1995, 270, 9185–9191. 69 T. M. DANCAN, V. V. BULYGIN, Y. ZOU, M. L. HUTCHEON, and R. L. CROSS, Proc. Natl. Acad. Sci. U. S. A., 1995, 92, 10964–10968. 70 D. SUBBERT, S. ENGELBRECHT, and W. JUNGE, Nature, 1996, 381, 623–625. 71 W. JUNGE, H. LILL, and S. ENGELBRECHT, Trends Biochem. Sci., 1997, 22, 420–423. 72 H. NOJI, R. YASUDA, H. YOSHIDA, and K. KINOSHITA, Jr., Nature, 1997, 386, 299–302. 73 K. KATO-YAMADA, H. NOJI, R. YASUDA, K. KINOSHITA, Jr., and M. YOSHIDA, J. Biol. Chem., 1998, 273, 19375–19377. 74 S. D. DUNN, J. Biol. Chem., 1982, 267, 7354–7359. 75 R. YASUDA, H. NOJI, and K. KINOSHITA, Jr. and M. YOSHIDA, Cell, 1998, 93, 1117–1124. 76 R. YASUDA, H. NOJI, H. YOSHIDA, and K. KINOSHITA, Jr., Nature, 2001, 410, 898–904. 77 H. OMOTE, N. SAMBONMATSU, K. SAITO, Y. SAMBONGI, A. IWAMOTO-KIHARA, T. YANAGIDA, Y. WADA, and M. FUTAI, Proc. Natl. Acad. Sci. U. S. A., 1999, 96, 7780–7784. 78 H. OMOTE, M.-Y. PARK, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1994, 269, 10265–10269. 79 Y. IKO, Y. SAMBONGI, M. TANABE, A. IWAMOTO-KIHARA, K. SAITO, I. UEDA, Y. WADA, and M. FUTAI, J. Biol. Chem., 2001, 276, 47508–47511. 80 A. IWAMOTO, H. OMOTE, H. HANADA, N. TOMIOKA, A. ITAI, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1991, 266, 16350–16355.
References 81 K. Y. HARA, H. NOJI, D. BALD, R. YASUDA, K. KINOSHITA, Jr., and M. YOSHIDA, J. Biol. Chem., 2000, 275, 14260–14263. 82 D. L. FOSTER and R. H. FILLINGAME, J. Biol. Chem., 1982, 257, 2009–2015. 83 G. B. COX, A. L. FIMMEL, F. GIBSON, and L. HATCH, Biochim. Biophys. Acta, 1986, 849, 62–69. 84 W.-D. STALZ, J.-C. GREIE, G. DECKERS-HEBESTREIT, and K. ALTENDORF, J. Biol. Chem., 2003, 278, 27068–27071. 85 J. C. LONG, S. WANG, and S. B. VIK, J. Biol. Chem., 1998, 273, 16235–16240. 86 F. I. VALIYAVEETIL and R. H. FILLINGAME, J. Biol. Chem., 1998, 273, 16241–16247. 87 S. EYA, M. MAEDA, and M. FUTAI, Arch. Biochem. Biophys., 1991, 284, 71–77. 88 O. Y. DMITRIEV, P. C. JONES, W. P. JIANG, and R. H. FILLINGAME, J. Biol. Chem., 1999, 274, 15598–15604. 89 A. J. W. RODGERS and R. A. CAPALDI, J. Biol. Chem., 1998, 273, 29406–29410. 90 D. T. MC LACHLIN and S. D. DUNN, Biochemistry, 2000, 39, 3486–3490. 91 P. C. JONES, J. HERMOLIN, W. P. JIANG, and R. H. FILLINGAME, J. Biol. Chem., 2000, 275, 31340–31346. 92 O. Y. DMITRIEV, P. C. JONES, and R. H. FILLINGAME, Proc. Natl. Acad. Sci. U. S. A., 1999, 96, 7785–7790. 93 P. C. JONES and R. H. FILLINGAME, J. Biol. Chem., 1998, 273, 29701–29705. 94 W. P. JIANG, J. HERMOLIN, and R. H. FILLINGAME, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 4966–4971. 95 R. H. FILLINGAME, W. P. JIANG, and O. Y. DIMITRIEV, J. Bioenerg. Biomembr., 2000, 32, 433–439. 96 G. GROTH and J. E. WALKER, FEBS Lett., 1997, 410, 117–123. 97 H. SEELERT, A. POETSCH, N. A. DENCHER, A. ENGEL, H. STAHLBERG, and D. J. M¨ULLER, Nature, 2000, 405, 418–419. 98 H. STAHLBERG, D. J. M¨ULLER, K. SUDA, D. FOTIADIS, A. ENGEL, T. MEIER, U. MATTHEY, and P. DIMROTH, EMBO Rep., 2001, 2, 229–233. 99 V. K. RESTOGI and M. E. GIRVIN, Nature, 1999, 402, 263–268. 100 T. ELSTON, H. WANG, and G. OSTER, Nature, 1998, 391, 510–513. 101 S. D. WATTS and R. A. CAPALDI, J. Biol. Chem., 1997, 272, 15065–15068.
102 C. L. TANG and R. A. CAPALDI, J. Biol. Chem., 1996, 271, 3018–3024. 103 S. P. TSUNODA, R. AGGELER, M. YOSHIDA, and R. A. CAPALDI, Proc. Natl. Acd Sci. U. S. A., 2001, 98, 898–902. 104 Y. WADA, Y. SAMBONGI, and M. FUTAI, Biochim. Biophys. Acta, 2000, 1549, 499–505. 105 E. SCHNEIDER and K. ALTENDORF, EMBO J., 1985, 4, 515–518. 106 S. P. TSUNODA, R. AGGELER, H. NOJI, K. KINOSHITA, Jr, M. YOSHIDA, and R. A. CAPALDI, FEBS Lett., 2000, 470, 244–248. 107 M. L. HUTCHEON, T. M. DUNCAN, H. NGAI, and R. L. CROSS, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 8519–8524. 108 K. GUMBIOWSKI, O. P¨ANKE, W. JUNGE, and S. ENGELBRECHT, J. Biol. Chem., 2002, 277, 31287–31290. 109 Q. LIU, P. M. KANE, P. R. NEWMAN, and M. FORGAC, J. Biol. Chem., 1996, 271, 2018–2022. 110 K. J. MACLEOD, E. VASILYEVA, J. D. BALEJA, and M. FORGAC, J. Biol. Chem., 1998, 273, 150–156. 111 H. HANADA, Y. MORIYAMA, M. MAEDA, and M. FUTAI, Biochem. Biophys. Res. Commun., 1990, 170, 873–878. 112 Q. LIU, X. H. LENG, P. R. NEWMAN, E. VASILYEVA, P. M. KANE, and M. FORGAC, J. Biol. Chem., 1997, 272, 11750–11756. 113 A. IWAMOTO, Y. ORITA, M. MAEDA, and M. FUTAI, FEBS Lett., 1994, 352, 243–246. 114 R. HIRATA, L. A. GRAHAM, A. TAKATSUKI, T. H. STEVENS, and Y. ANRAKU, J. Biol. Chem., 1997, 272, 4975–4803. 115 N. UMEMOTO, Y. OHYA, and Y. ANRAKU, J. Biol. Chem., 1999, 266, 24526–24532. 116 G.-H. SUN-WADA, H. MURAKAMI, H. NAKAI, Y. WADA, and M. FUTAI, Gene, 2001, 274, 93–99. 117 T. NISHI, S. KAWASAKI-NISHI, and M. FORGAC, J. Biol. Chem., 2001, 276, 34122–34130. 118 B. POWELL, L. A. GRAHAM, and T. H. STEVENS, J. Biol. Chem., 2000, 275, 26354–23660. 119 S. KAWASAKI-NISHI, T. NISHI, and M. FORGAC, Proc. Natl. Acd Sci. U. S. A., 2001, 98, 12397–12402. 120 H. HIRATA, N. NAKAMURA, H. OMOTE, Y. WADA, and M. FUTAI, J. Biol. Chem., 2000, 275, 386–389.
23
24 Proton Translocating ATPases
121 Y. ARATA, J. D. BALEJA, and M. FORGAC, J. Biol. Chem., 2002, 277, 3357–3363. 122 Y. ARATA, J. D. BALEJA, and M. FORGAC, Biochemistry, 2002, 41, 11301–11307. 123 T. HIRATA, A. IWAMOTO-KIHARA, G.-H. SUN-WADA, T. OKAJIMA, Y. WADA, and M. FUTAI, J. Biol. Chem., 2003, 278, 23741–23719. 124 M. HUSS, G. INGENHORST, S. KONIG, M. GASSEL, S. DORSE, A. ZEECK, K. ALTENDORF, and H. WIECZOREK, J. Biol. Chem., 2002, 277, 40544–40548. 125 L. S. HOLIDAY, M. LU, B. S. LEE, R. D. NELSON, S. SOLIVAN, L. ZHANG, and S. L. GLUCK, J. Biol. Chem., 2000, 275, 32331–32337.
126 H. IMAMURA, M. NAKANO, H. NOJI, E. MUREYUKI, S. OHKUMA, M. YOSHIDA, and K. YOKOYAMA, Proc. Natl. Acad. Sci. U. S. A., 2003, 100, 2312–2315. 127 K. IHARA, T. ABE, K. I. SUGIMURA, and Y. MUKOHATA, J. Exp. Biol., 1992, 172, 475–485. 128 R. K. SOONG, G. D. BACHAND, H. P. NEVES, A. G. OLKHOVETS, H. G. CRAIGHEAD, and C. D. MONTEMAGNO, Science, 2000, 290, 1555–1558. 129 G.-H. SUN-WADA, Y. IMAI-SENGA, A. YAMAMOTO, Y. MURATA, T. HIRATA, Y. WADA, and M. FUTAI, J. Biol. Chem., 2002, 277, 18098–18105. 130 R. H. FILLINGAME, Curr. Opin. Struct. Biol., 1996, 6, 491–498.
1
Vacuolar H+ -ATPases: Structure, Mechanism and Regulation Elim Shao and Michael Forgac Tufts University School of Medicine, Boston, USA
Originally published in: Handbook of ATPases. Edited by Masamitsu Futai, Yoh Wada and Jack H. Kaplan. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30689-3
1 Introduction
Vacuolar (H+ )-ATPases (or V-ATPases) are ATP-dependent proton pumps that acidify intracellular compartments and, in certain cases, transport protons across the plasma membrane of eukaryotic cells. Acidification of intracellular compartments functions in such processes as receptor-mediated endocytosis, intracellular targeting of lysosomal enzymes, protein processing and degradation, and coupled transport. Plasma membrane V-ATPases are involved in acid–base balance, bone resorption, potassium secretion and tumor metastasis. This chapter will focus on work from our laboratory on the structure, mechanism and regulation of the V-ATPases from clathrin-coated vesicles and yeast. V-ATPases are composed of a peripheral V1 domain responsible for ATP hydrolysis and an integral V0 domain responsible for proton transport. V1 has a molecular mass of 640 kDa and is composed of eight different subunits (subunits A–H) of molecular mass 70–13 kDa, whereas V0 has a molecular mass of 260 kDa and is composed of five different subunits (subunits a, d, c, c and c ) of molecular mass 100–17 kDa. Conventional and cysteine-mediated cross-linking as well as electron microscopy have allowed us to define the overall shape of the V-ATPase and the location of subunits within the complex. Site-directed and random mutagenesis together with chemical modification have been used to identify residues involved in nucleotide binding and hydrolysis, proton translocation and the coupling of these two processes. We have also obtained information about the mechanism of intracellular targeting of V-ATPases and regulation of V-ATPase activity by reversible dissociation.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Vacuolar H+ -ATPases: Structure, Mechanism and Regulation
2 Function of V-ATPases
V-ATPases are ATP-dependent proton pumps responsible for acidification of intracellular compartments and, in specialized cell types, proton secretion across the plasma membrane (for reviews see Refs. [1–4]). V-ATPases are present in such intracellular compartments as clathrin-coated vesicles, endosomes, lysosomes, Golgi-derived vesicles, secretory vesicles and the central vacuoles of fungi and plants. A major role of V-ATPases within intracellular compartments is in various membrane traffic processes [1]. V-ATPases within early endosomes create the low pH necessary for ligand–receptor dissociation and recycling of receptors to the cell surface and for the formation of endosomal carrier vesicles that move dissociated ligands along the endocytic pathway. Several viruses (such as influenza virus) and toxins (such as diphtheria toxin) gain access to the cytoplasm of target cells via acidic endosomal compartments. Acidification of late endosomes is required for targeting of newly synthesized lysosomal enzymes from the Golgi to lysosomes. In addition to their role in membrane traffic, V-ATPases also provide the driving force for various coupled transport processes. For example, the proton gradient and membrane potential generated by the V-ATPases drives uptake of neurotransmitters into synaptic vesicles. Finally, V-ATPases in lysosomes and the central vacuoles of fungi and plants provide the acidic environment required for macromolecular degradation as well as the driving force for coupled transport. Plasma membrane V-ATPases have been identified in various cell types, including renal intercalated cells, osteoclasts, macrophages, insect goblet cells and tumor cells, where they function in such processes as renal acidification, bone resorption, pH homeostasis, coupled potassium transport and tumor invasion [1, 4].
3 Overall Structure of V-ATPases and Relationship to F-ATPases
Our current model of the structure of the V-ATPases is shown in Figure 1, and the subunit composition, including the molecular mass of the subunits, the genes encoding these subunits in yeast and information about subunit function, is summarized in Table 1. V-ATPases are composed of a peripheral V1 domain responsible for ATP hydrolysis and an integral V0 domain responsible for proton translocation [1–4]. V1 is a 640 kDa complex that contains eight different subunits (A–H) of molecular mass 70–13 kDa that are present in a stoichiometry of A3 B3 C1 D1 E1 F1 G2 H1–2 [5, 6]. The nucleotide-binding sites are located on the 70 kDa A subunit and the 60 kDa B subunit, with the catalytic sites located on the three A subunits and socalled “non-catalytic” sites located on the three B subunits [7–9]. V 0 is a 260 kDa complex [10] that contains five different subunits (a, d, c, c and c ) of molecular mass 100–17 kDa in a stoichiometry a1 d1 c 4–5 c 1 c 1 [5, 11]. Buried charged residues essential for proton transport are present in both the proteolipid subunits (c, c and c ) and the 100 kDa a subunit [12, 13].
3 Overall Structure of V-ATPases and Relationship to F-ATPases
Fig. 1 Structural model of the V-ATPase complex. The V1 domain (shown in white) is responsible for ATP hydrolysis whereas the V0 domain (shaded) is responsible for proton translocation. ATP hydrolysis at the catalytic nucleotide-binding sites (located on the A subunits) is proposed to drive rotation of a central stalk (composed of the D and F subunits) which in turn drives rotation of the ring of proteolipid subunits (c, c , c ) relative to subunit a. Subunit a
is held fixed relative to the A3 B3 head by a peripheral stalk composed of subunits C, E, G, H and the soluble domain of subunit a. Movement of the ring of proteolipid subunits relative to subunit a is postulated to drive unidirectional proton transport across the membrane. Reprinted with permission
from Ref. [44], copyright 2002, the American Society for Biochemistry and Molecular Biology.
V-ATPases are structurally similar to the F-ATPases that functon in the synthesis of ATP in mitochondria, chloroplasts and bacteria [14–16]. In both cases, the peripheral domain is responsible for nucleotide binding and hydrolysis (or, in the case of the F-ATPases, synthesis), whereas the integral domain is responsible for transport of protons across the membrane. Sequence homology has been identified between the nucleotide-binding subunits of the V and F-ATPases as well as between the proteolipid subunits of these two classes [17–19], suggesting that they have evolved from a common evolutionary ancestor. X-Ray structures have been obtained for F1 from several sources and they reveal a hexameric arrangement of alternating nucleotide-binding subunits (α and β) surrounding a central cavity occupied by a highly α-helical γ subunit, which extends towards the membrane [20, 21]. More recently, an X-ray structure of F1 attached to a ring of proteolipid subunits has been obtained from yeast mitochondria that reveals the γ and ε subunits form a central stalk connecting F1 to a ring of 10 c subunits [22]. F1 and F0 have also been shown to be connected by a peripheral stalk containing the δ subunit and the soluble domains of subunit b [23, 24].
3
4 Vacuolar H+ -ATPases: Structure, Mechanism and Regulation Table 1 Subunit composition of the yeast V-ATPase
V1
V0
Subunit Yeast Gene
Mr (kDa)
Function/location
A B C
VMA1 VMA2 VMA5
69 58 44
D E F G H a
VMA8 VMA4 VMA7 VMA10 VMA13 VPH1
29 26 14 13 54 100
STV1
100
d c
VMA6 VMA3
40 17
c
VMA11
17
c
VMA16
23
Catalytic site, homolog of β of F1 F0 -ATPase Non-catalytic site, homolog of α of F1 F0 -ATPase Peripheral stalk, released from V1 complex during dissociation Central stalk, homolog of γ of F1 F0 -ATPase Peripheral stalk Central stalk Peripheral stalk Peripheral stalk Proton translocation, targeting to vacuole, homolog of a of F1 F0 -ATPase Proton translocation, targeting to late Golgi, homolog of a of F1 F0 -ATPase Cytoplasmic side, non-transmembrane protein Proton translocation, DCCD- and concanamycin-binding sites, homolog of c of F1 F0 -ATPase Proton translocation, homolog of c of F1 F0 -ATPase Proton translocation, homolog of c of F1 F0 -ATPase
The presence of multiple stalks connecting the peripheral and integral domains has been demonstrated for both the V and F-ATPases by electron microscopy [25–27]. The V-ATPase structure, however, appears more complex than the F-ATPase, with many projections emerging from the V1 domain and a knob present on the lumenal side of the membrane [26]. EM images of theV0 domain suggest a ring (presumably of proteolipid subunits) adjacent to a membrane embedded mass that probably corresponds to subunit a [28]. This chapter focuses on our work on the structure and function of individual subunits, their arrangement in the V-ATPase complex, the mechanism of ATP-driven proton transport, the mechanism of targeting of V-ATPases to different intracellular compartments and the mechanisms employed in regulating V-ATPase activity in vivo.
4 Structure and Function of the Nucleotide-binding Subunits
Results from several studies suggest that the catalytic nucleotide-binding sites are located on the 70 kDa A subunits. Modification of the bovine A subunit by sulfhydryl reagents (such as NEM and NBD-Cl) leads to ATP-protectable inhibition of V-ATPase activity [29]. The cysteine residue whose modification is responsible for this inhibition was identified as Cys254 in the bovine A subunit [7]. This cysteine residue is located in a consensus sequence termed the Walker A sequence or
4 Structure and Function of the Nucleotide-binding Subunits
P-loop (corresponding to GXGKTV). The X-ray structure of F1 reveals that this sequence is located at the catalytic nucleotide-binding site on the β subunit, where it surrounds the terminal phosphate groups of ATP [20, 21]. Interestingly, formation of a disulfide bond between Cys254 and Cys532 (located in the C-terminal domain of the A subunit) causes reversible inhibition of V-ATPase activity [30]. It has been proposed that disulfide bond formation between Cys254 and Cys532 leads to inhibition of ATPase activity by preventing this subunit from adopting an open conformation that it must go through during the catalytic cycle [31]. Reversible disulfide bond formation between these cysteine residues in the A subunit has also been proposed to represent an important mechanism of regulating V-ATPase activity in vivo [32]. Studies using the photoactivated nucleotide analog 2-azido-ATP also support the idea that the catalytic nucleotide-binding sites are located on the A subunits [8]. This reagent inhibits V-ATPase activity in an ATP-protectable manner, with complete inhibition of ATPase activity occurring after modification of only one of the three A subunit sites per complex. Further support for the location of the catalytic sites on the A subunit comes from mutagenesis studies. Using molecular modeling based on the F-ATPase structure and the homology between the V and F-ATPase nucleotide-binding subunits, A subunit residues were identified that were predicted to play a role in either ATP binding or hydrolysis at the catalytic sites [33]. Mutagenesis of K263 in the Walker A sequence of the A subunit completely inhibited ATP hydrolysis, as did mutagenesis of E286 [34]. K263 is proposed to stabilize the negatively charged phosphate groups of bound ATP whereas E286 is suggested to serve as a proton acceptor from water during ATP hydrolysis. In addition, mutagenesis studies coupled with photochemical modification by 2-azidoATP have identified four aromatic residues that are thought to form the adenine binding pocket of the catalytic site on the A subunit [33, 35]. More recently, evidence has been obtained for an important role of a novel domain of the V-ATPase A subunit, termed the “non-homologous” region, in coupling of ATP hydrolysis and proton transport and reversible dissociation of the V-ATPase complex [36]. The non-homologous region is a 90 amino acid stretch which is present and conserved in all V-ATPase A subunit sequences but which is absent from the corresponding F-ATPase β subunit [17, 37]. This region is located approximately one-third of the way from the amino terminus of the A subunit and, because the A3 B3 structure (by analogy with F1 ) is predicted to be tightly packed, is likely to be located on the outer surface of the complex. A peripheral location of this domain suggests the possibility that it may contribute to the peripheral stalk connecting the V1 and V0 domains [26]. Consistent with this idea, mutations have been identified in this region that lead to significant changes in coupling of proton transport and ATP hydrolysis [36]. Interestingly, one mutation (P217V) dramatically increased the coupling efficiency, suggesting that the wild-type enzyme is not normally optimally coupled. Several mutations were also identified in the non-homologous region that block reversible dissociation of the V-ATPase complex in response to glucose depletion [36], suggesting that this domain may also play a role in regulating V-ATPase activity in vivo (see below).
5
6 Vacuolar H+ -ATPases: Structure, Mechanism and Regulation
Several lines of evidence suggest that, in addition to subunit A, the 60 kDa B subunit also participates in nucleotide binding. First, subunit B is selectively modified by the photoactivated nucleotide analog BzATP [9]. Second, the B subunit is homologous to the α and β subunits of F1 and to subunit A of the V-ATPases (approximately 20–25 % amino acid identity), although subunit B lacks the consensus Walker A and B motifs that are often present in nucleotide-binding proteins [18, 38]. Despite the absence of these motifs, the sequences that replace them in the B subunit are highly conserved among species. Modeling studies of the B subunit based upon sequence homology with the α subunit, the X-ray coordinates of F1 and energy minimization have been used to predict a structure for this subunit. This model accurately predicts the presence of a number of residues at the nucleotide-binding site on B, based upon nucleotide-protectable chemical modification of mutant proteins [39]. Unique cysteine residues were introduced into a cys-less form of subunit B and their modification by the sulfhydryl reagent biotin maleimide was tested in the intact V-ATPase in the presence and absence of BzATP. All six cysteine residues which reacted with biotin maleimide displayed protection from labeling by BzATP, which is consistent with their presence at the nucleotide-binding site [39]. Although the function of the “non-catalytic” nucleotide-binding sites on the B subunit is not certain, at least one mutation at this site causes a time-dependent change in ATPase activity, suggesting that they may play a role in regulating activity [33]. Other mutations at these sites appear to have less dramatic effects on activity [40]. As with the F-ATPases [20], the nucleotide-binding sites (both catalytic and non-catalytic) appear to be located at the interface of the alternating A and B (or β and α) subunits [33, 40], although, as indicated above, most residues at the catalytic sites are contributed by the A subunits whereas most residues at the non-catalytic sites are contributed by the B subunits.
5 Structure and Function of other V1 Subunits and Arrangement of Subunits in the V-ATPase Complex
Studies from a number of laboratories have begun to shed light on the function of other subunits in the V1 domain. Mutants have been identified in subunit D (product of the VMA8 gene) that suggest it plays an important role in coupling of proton transport and ATP hydrolysis [41]. The most dramatic effects on coupling are observed for double mutants in which changes near the amino and carboxyl-terminus appear to act synergistically to cause uncoupling. One possible explanation for these results is that the N and C-terminus of the protein interact. Together with the predicted high α-helical content of subunit D, this has led us to postulate that subunit D acts as the γ subunit homolog in the V-ATPases [41]. Mutations causing uncoupling of the F-ATPase have also been identified in the γ subunit of F1 [42], which exists as a coiled-coil structure within the central stalk connecting F1 and F0 [20, 21].
5 Structure and Function of other V1 Subunits and Arrangement of Subunits in the V-ATPase Complex 7
Support for the hypothesis that subunit D is the γ subunit homolog in the VATPases comes from cysteine mutagenesis and covalent cross-linking studies using the bifunctional, photoactivated maleimide reagent maleimido-benzophenone. As indicated above, a molecular model of the A3 B3 hexamer has been derived using the X-ray coordinates of F1 , the sequence homology between the nucleotide-binding subunits of the V and F-ATPases and energy minimization [35, 39]. This model was used to predict the location of unique cysteine residues introduced into a cys-less form of subunit B by site-directed mutagenesis. These unique cysteine residues were first modified using maleimido-benzophenone followed by photoactivated cross-linking. Cross-linked products were then identified by Western blot analysis using subunit-specific antibodies. Results from these studies indicate that subunits B and D are cross-linked only at sites on the B subunit predicted to be oriented towards the interior of the A3 B3 hexamer [43]. By contrast, subunits E and G are cross-linked to subunit B at sites predicted to be facing the exterior of the complex [43, 44]. These results suggest that subunit D is located in the central stalk connecting V1 and V0 whereas subunits E and G form part of a peripheral connection between these domains. Conventional cross-linking studies performed on the coated vesicle V-ATPase using a number of bifunctional cross-linking reagents have demonstrated that subunit D makes contact with subunit F whereas subunit E is in contact with a significant number of subunits, including subunits C, G, H and the soluble domain of subunit a [6]. Disruption of V-ATPase assembly in yeast strains lacking certain V1 subunits gives rise to several subcomplexes, including the heterodimeric species DF and EG [45], providing further support for the structural model shown in Figure 1. Dissociation of the bovine coated vesicle V-ATPase also gives rise to a heterodimeric EC subcomplex, as detected by immunoprecipitation [46]. These results suggest that subunits D and F form a central stalk connecting V1 and V0 whereas subunits C, E, G, H and the soluble domain of subunit a form the peripheral stalk connecting these domains. Considerable information about the function of subunits C, G and H has been obtained from studies performed in other laboratories involving mutagenesis and overexpression of these subunits in yeast. Thus, subunit C appears to activate ATP hydrolysis by V1 whereas subunit H inhibits this activity [47, 48]. Subunit G, which shows some homology to subunit b of the F-ATPases along one helical face [49], has been shown to tolerate short deletions in its sequence without loss of function [50]. This result is similar to those obtained for the F-ATPase b subunit [51], and suggests that subunit G may be the b subunit homolog of the V-ATPases. An X-ray structure of subunit H has recently been obtained [52] and has revealed interesting structural similarities to the importins, proteins involved in nuclear import. Yeast 2-hybrid analysis and co-immunoprecipitation studies have revealed that subunit H makes contact with both subunit A in V1 and the soluble, cytoplasmic domain of subunit a in V0 [53], suggesting that it serves an important bridging function. Moreover, subunit H has been shown to serve an important role as the docking site for the NEF protein, which in turn functions in internalization of the HIV receptor CD4 [54].
8 Vacuolar H+ -ATPases: Structure, Mechanism and Regulation
6 Structure and Function of V0 Subunits
The V0 domain of the V-ATPases contains five subunits. Three of these correspond to highly hydrophobic proteins termed proteolipids because of their ability to be extracted with organic solvents: subunit c (Vma3p), subunit c (Vma11p) and subunit c (Vma16p). All three of these proteins are required for proton transport by the V-ATPases [12]. Subunit c and c each contain four transmembrane helices with an essential buried glutamate residue present in TM4. Although subunit c was originally predicted to contain five transmembrane helices, recent studies have demonstrated that it instead contains only four, with the critical glutamic acid residue present in TM2 [55]. Interestingly, the region originally predicted to correspond to TM1 has been shown not to be required for function. Topological studies employing expression of epitope-tagged proteins as well as differential modification of unique cysteine residues indicate that the C-terminus of subunit c is present on the lumenal side of the membrane whereas the C-terminus of subunit c is exposed on the cytoplasmic side of the membrane [55, 56]. The buried glutamic acid residues present in the proteolipid subunits are essential for proton transport by both chemical modification by DCCD [31] and site-directed mutagenesis [12]. These residues have therefore been proposed to participate directly in proton translocation. Interestingly, subunit c forms part of the binding site for the specific V-ATPase inhibitor bafilomycin [57]. The three V-ATPase proteolipid subunits are homologous to the F-ATPase subunit c, which is composed of just two transmembrane helices that adopt a helical hairpin structure [58]. The V-ATPase proteolipids appear to have been derived during evolution from a gene duplication and fusion event from an ancestral gene that more closely resembles the F-ATPase c subunit [19]. NMR analysis of the F-ATPase proteolipid suggests that the orientation of the helix containing the essential acidic residue changes upon a change in protonation state [59], leading to the suggestion that this helical swiveling is an important part of the mechanism of proton translocation (see below). The F0 domain from E. coli and yeast mitochondria contain 10 copies of subunit c, which form a ring [22, 60]. Stoichiometry measurements indicate that the V0 domain contains 5–6 copies of subunits c plus c and a single copy of subunit c [5], and epitope-tagging experiments indicate the presence of single copies of both subunits c and c [11]. Thus, these results suggest a subunit stoichiometry for the V0 domain of c4–5 c 1 c 1 . In addition to the proteolipid subunits, the V0 domain contains two additional subunits. Subunit d (Vma6p) is a 40 kDa hydrophilic polypeptide which appears to lack transmembrane segments [61], but which remains tightly bound to V0 upon dissociation of V1 [62]. The fifth V0 subunit is a 100 kDa integral membrane protein called subunit a. In yeast, subunit a is encoded by two genes (VPH1 and STV1), with Vph1p associated with V-ATPase complexes targeted to the vacuole and Stv1p associated with V-ATPases targeted to the late Golgi [63–65]. In mammals, subunit a is encoded by four genes (a1–a4), which are expressed in a tissue specific manner [66–70]. The a3 isoform appears to be responsible for plasma membrane targeting
7 Mechanism of ATP-dependent Proton Transport
of the V-ATPase in osteoclasts [68], and a defect in this gene causes the genetic defect autosomal recessive osteopetrosis [69]. The a4 isoform is responsible for targeting of the V-ATPase to the plasma membrane of renal intercalated cells [67], and disruption of this gene causes the disease renal tubule acidosis [70]. The topology of subunit a has been studied by introduction of unique cysteine residues into a cys-less form of Vph1p followed by evaluation of the exposure of these sites in intact vacuolar membranes using differential reactivity towards membrane permeant and impermeant sulfhydryl reagents [71]. These studies indicate that subunit a contains an amino terminal hydrophilic domain of about 50 kDa oriented towards the cytoplasmic side of the membrane and a carboxyl terminal hydrophobic domain containing nine transmembrane segments, with the C-terminus located on the lumenal side of the membrane. Site-directed mutagenesis of Vph1p has identified a buried, charged residue present in TM7 (Arg735) that is absolutely essential for proton translocation [13]. In addition, there are a series of other buried charged residues present in TM7 and TM9, including E789, H743 and R799, whose mutation causes a reduction in proton transport, suggesting that these residues may also play some role in this process [13, 72, 73]. Although no sequence homology exists between the a subunits of V0 and F0 , we have suggested that the V0 subunit a plays a similar role in proton transport. For subunit a of F0 , Arg210 located in TM4 is critical for proton transport [74]. It is postulated to directly interact with the buried carboxyl groups of the proteolipid c subunits during proton trans-location [75, 76]. The F-ATPase a subunit also contains other buried charged residues that are postulated to form aqueous access channels that allow protons to reach and leave these buried sites [74, 75]. V-ATPase complexes containing different isoforms of subunit a differ in other properties besides intracellular localization [77]. Thus Vph1p-containing complexes show a ca. 10-fold greater assembly of V1 and V0 than Stv1p-containing complexes. In addition, Vph1p-containing complexes are more tightly coupled than those containing Stv1p. These differences may reflect the need to maintain a lower lumenal pH in the vacuole (where Vph1p-containing complexes reside) than in the Golgi (where Stv1p-containing complexes are found). Using chimeric constructs, we have shown that whereas intracellular targeting is controlled by the amino terminal domain, the degree of assembly of V1 and V0 and the tightness of coupling of proton transport and ATP hydrolysis are controlled by the carboxyl-terminal [65].
7 Mechanism of ATP-dependent Proton Transport
Because of their structural similarity, the V and F-ATPases are believed to operate by a similar rotary mechanism [78, 79]. For the F-ATPases, ATP hydrolysis by the β subunits of F1 causes rotation of a central stalk, composed of the γ and ε subunits, which in turn drives rotation of the ring of proteolipid c subunits relative to subunit a. Rotation of both the γ subunit in F1 [80–82] and the ring of c subunits in F1 F0 [83] have been demonstrated. The a subunit is postulated to provide access channels
9
10 Vacuolar H+ -ATPases: Structure, Mechanism and Regulation
for the protons to reach and leave the buried carboxyl groups on the c subunit ring and to activate release of protons into the egress channel through interaction with a critical arginine residue [16, 74, 75]. Rotation of the c subunit ring relative to subunit a thus causes unidirectional proton transport across the membrane. For the V-ATPases, we suggest that ATP hydrolysis in V1 drives rotation of the D and F subunits together with the proteolipid ring relative to subunit a. Rotation of the D and F subunits of the V1 domain of the V-ATPase from Thermos thermophilus was recently demonstrated [84], which is consistent with the placement of these subunits in the central stalk [43]. Interaction of the proteolipid carboxyl groups with Arg735 on subunit a [13] would then lead to release of protons into the egress channel leading to the lumenal side of the membrane. Further rotation of the c subunit ring would bring these deprotonated carboxyl groups into contact with the cytoplasmic aqueous channel. They would need to pick up protons from this channel before re-entering the hydrophobic phase of the bilayer.
8 Regulation of V-ATPase Activity In Vivo
Control of V-ATPase activity in cells and tissues is critical for the diversity of functions served by these enzymes. Several mechanisms have been proposed to be involved in regulation of V-ATPase activity in vivo. As described above, reversible disulfide bond formation between conserved cysteine residues at the catalytic sites on the A subunit causes reversible inhibition of V-ATPase activity [30, 32], and several studies suggest that this represents an important in vivo mechanism for controlling activity [32, 85]. Differences in coupling efficiency have been demonstrated for V-ATPases containing different a subunit isoforms [77], and mutations in a number of V-ATPase subunits, including subunit D [41], subunit a [13] and the non-homologous domain of subunit A [36], cause changes in coupling efficiency. Moreover, coupling of proton transport and ATPase activity is readily perturbed [86, 87], suggesting that the enzyme may be poised to change the tightness of coupling in vivo. In addition to the mechanisms described above, reversible dissociation of the V-ATPase into its constituent V1 and V0 domains represents an important in vivo regulatory mechanism [88, 89]. In yeast, reversible dissociation occurs in response to glucose depletion [88], although many of the signal transduction pathways activated by glucose withdrawal do not appear to be involved in this process [90]. Glucose-dependent dissociation requires a catalytically active enzyme [35, 90] and an intact microtubular network [91]. Interestingly, however, reassembly of the V-ATPase is not dependent upon microtubules. Several mutations have been identified in the non-homologous region which block dissociation without inhibiting activity [36], suggesting that this domain may play a role in controlling dissociation independent of any effects on activity. Similar mutations have been identified in subunit G, where they appear to cause increased stability of the V-ATPase complex [50].
References
The in vivo dissociation behavior of V-ATPase complexes containing different isoforms of the a subunit has proven to be complex [77]. Thus, Vph1p-containing complexes localized to the vacuole undergo dissociation in response to glucose withdrawal whereas Stv1p-containing complexes localized to the Golgi do not. Re-targeting of Stv1p-containing complexes to the vacuole by overexpression of Stv1p results in a re-appearance of the ability to dissociate in response to glucose depletion. Conversely, prevention of Vph1p-containing complexes from reaching the vacuole by disruption of genes involved in intracellular targeting results in VATPase complexes that show glucose-dependent dissociation, but less completely than when these complexes have a vacuolar localization. Thus, in vivo dissociation appears to be controlled by both the a subunit isoform present in the complex and by the cellular environment in which the complex resides [77]. Recent studies have identified a novel complex (termed RAVE) which includes part of the ubiquitin ligase complex (Skp1p) and which appears to function in both normal and regulated assembly of the V-ATPase [92, 93]. Additional studies will be required to further elucidate the mechanism of regulating in vivo dissociation of the V-ATPase complex.
9 Conclusions
V-ATPases are multisubunit complexes responsible for ATP-driven proton transport in both intracellular compartments and the plasma membrane. They are composed of a peripheral V1 domain responsible for ATP hydrolysis and an integral V0 domain that transports protons. A variety of approaches have begun to elucidate the arrangement and function of subunits in the V-ATPase complex. Regulation of V-ATPase activity in vivo is likely to be complex given the great diversity of functions performed by this family of proton pumps.
Acknowledgments
The authors thank Drs Takao Inoue, Tsuyoshi Nishi and Shoko Kawasaki-Nishi for many helpful discussions. This work was supported by NIH Grant GM34478 (to M. F.).
References 1 T. NISHI and M. FORGAC, Nat. Rev. Mol. Cell. Biol., 2002, 3, 94–103. 2 H. WIECZOREK, D. BROWN, S. GRINSTEIN, J. EHRENFELD, and W. R. HARVEY, Bioeswsays, 1999, 21, 637–648. 3 L. A. GRAHAM, B. POWELL, and T. H. STEVENS, J. Exp. Biol., 2000, 203, 61–70.
4 P. M. KANE and K. J. PARRA, J. Exp. Biol., 2000, 203, 81–87. 5 H. ARAI, G. TERRES, S. PINK, and M. FORGAC, J. Biol. Chem., 1988, 263, 8796–8802. 6 T. XU, E. VASILYEVA, and M. FORGAC, J. Biol. Chem., 1999, 274, 28909–28915.
11
12 Vacuolar H+ -ATPases: Structure, Mechanism and Regulation
7 Y. FENG and M. FORGAC, J. Biol. Chem., 1992, 267, 5817–5822. 8 J. ZHANG, E. VASILYEVA, Y. FENG, and M. FORGAC, J. Biol. Chem., 1995, 270, 15494–15500. 9 E. VASILYEVA and M. FORGAC, J. Biol. Chem., 1996, 271, 12775–12782. 10 J. ZHANG, Y. FENG, and M. FORGAC, J. Biol. Chem., 1994, 269, 23518–23523. 11 B. POWELL, L. A. GRAHAM, and T. H. STEVENS, J. Biol. Chem., 2000, 275, 23654–23660. 12 R. HIRATA, L. A. GRAHAM, A. TAKATSUKI, T. H. STEVENS, and Y. ANRAKU, J. Biol. Chem., 1997, 272, 4795–4803. 13 S. KAWASAKI-NISHI, T. NISHI, and M. FORGAC, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 12397–12402. 14 R. L. CROSS, Biochim. Biophys. Acta, 2000, 1458, 270–275. 15 J. WEBER and A. E. SENIOR, Biochim. Biophys. Acta, 2000, 1458, 300–309. 16 R. H. FILLINGAME, W. JIANG, and O. Y. DMITRIEV, J. Exp. Biol., 2000, 203, 9–17. 17 L. ZIMNIAK, P. DITTRICH, J. P. GOGARTEN, H. KIBAK, and L. TAIZ, J. Biol. Chem., 1988, 263, 9102–9112. 18 B. J. BOWMAN, R. ALLEN, M. A. WECHSER, and E. J. BOWMAN, J. Biol. Chem., 1988, 263, 14002–14007. 19 M. MANDEL, Y. MORIYAMA, J. D. HULMES, Y. C. PAN, H. NELSON and N. NELSON, Proc. Natl. Acad. Sci. U. S. A., 1988, 85, 5521–5524. 20 J. P. ABRAHAMS, A. G. LESLIE, R. LUTTER, and J. E. WALKER, Nature, 1994, 370, 621–628. 21 M. A. BIANCHET, J. HULLIHEN, P. L. PEDERSEN, and L. M. AMZEL, Proc. Natl. Acad. Sci. U. S. A., 1998, 95, 11065–11070. 22 D. STOCK, A. G. LESLIE, and J. E. WALKER, Science, 1999, 286, 1700–1705. 23 I. OGILVIE, R. AGGELER, and R. A. CAPALDI, J. Biol. Chem., 1997, 272, 16652–16656. 24 D. T. MCLACHLIN, J. A. BESTARD, and S. D. DUNN, J. Biol. Chem., 1998, 273, 15162–15168. 25 E. J. BOEKEMA, T. UBBINK-KOK, J. S. LOLKEMA, A. BRISSON, and W. N. KONINGS, Proc. Natl. Acad. Sci. U. S. A., 1997, 94, 14291–14293. 26 S. WILKENS, E. VASILYEVA, and M. FORGAC, J. Biol. Chem., 1999, 274, 31804–31810.
27 S. WILKENS and R. A. CAPALDI, Nature, 1998, 393, 29. 28 S. WILKENS and M. FORGAC, J. Biol. Chem., 2001, 276, 44064–44068. 29 H. ARAI, M. BERNE, G. TERRES, H. TERRES, K. PUOPOLO, and M. FORGAC, Biochemistry, 1987, 26, 6632–6638. 30 Y. FENG and M. FORGAC, J. Biol. Chem., 1994, 269, 13224–13230. 31 M. FORGAC, J. Biol. Chem., 1999, 274, 12951–12954. 32 Y. FENG and M. FORGAC, J. Biol. Chem., 1992, 267, 19769–19772. 33 K. J. MACLEOD, E. VASILYEVA, J. D. BALEJA, and M. FORGAC, J. Biol. Chem., 1998, 273, 150–156. 34 Q. LIU, X. H. LENG, P. R. NEWMAN, E. VASILYEVA, P. M. KANE, and M. FORGAC, J. Biol. Chem., 1997, 272, 11750–11756. 35 K. J. MACLEOD, E. VASILYEVA, K. MERDEK, P. D. VOGEL, and M. FORGAC, J. Biol. Chem., 1999, 274, 32869–32874. 36 E. SHAO, T. NISHI, S. KAWASAKI-NISHI, and M. FORGAC, J. Biol. Chem., 2003, 278, 12985–12991. 37 K. PUOPOLO, C. KUMAMOTO, I. ADACHI, and M. FORGAC, J. Biol. Chem., 1991, 266, 24564–24572. 38 K. PUOPOLO, C. KUMAMOTO, I. ADACHI, R. MAGNER, and M. FORGAC, J. Biol. Chem., 1992, 267, 3696–3706. 39 E. VASILYEVA, Q. LIU, K. J. MACLEOD, J. D. BALEJA, and M. FORGAC, J. Biol. Chem., 2000, 275, 255–260. 40 Q. LIU, P. M. KANE, P. R. NEWMAN, and M. FORGAC, J. Biol. Chem., 1996, 271, 2018–2022. 41 T. XU and M. FORGAC, J. Biol. Chem., 2000, 275, 22075–22081. 42 K. SHIN, R. K. NAKAMOTO, M. MAEDA, and M. FUTAI, J. Biol. Chem., 1992, 267, 20835–20839. 43 Y. ARATA, J. D. BALEJA, and M. FORGAC, Biochemistry, 2002, 41, 11301–11307. 44 Y. ARATA, J. D. BALEJA, and M. FORGAC, J. Biol. Chem., 2002, 277, 3357–3363. 45 J. J. TOMASHEK, L. A. GRAHAM, M. U. HUTCHINS, T. H. STEVENS, and D. J. KLIONSKY, J. Biol. Chem., 1997, 272, 26787–26793. 46 K. PUOPOLO, M. SCZEKAN, R. MAGNER, and M. FORGAC, J. Biol. Chem., 1992, 267, 5171–5176.
References 47 K. J. PARRA, K. L. KEENAN, and P. M. KANE, J. Biol. Chem., 2000, 275, 21761–21767. 48 K. K. CURTIS and P. M. KANE, J. Biol. Chem., 2002, 277, 2716–2724. 49 I. E. HUNT and B. J. BOWMAN, J. Bioenerg. Biomembr., 1997, 29, 533–540. 50 C. M. CHARSKY, N. J. SCHUMANN, and P. M. KANE, J. Biol. Chem., 2000, 275, 37232–37239. 51 P. L. SORGEN, T. L. CAVISTON, R. C. PERRY, and B. D. CAIN, J. Biol. Chem., 1998, 273, 27873–27878. 52 M. SAGERMANN, T. H. STEVENS, and B. W. MATTHEWS, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 7134–7139. 53 C. LANDOLT-MARTICORENA, K. M. WILLIAMS, J. CORREA, W. CHEN, and M. F. MANOLSON, J. Biol. Chem., 2000, 275, 15449–15457. 54 M. GEYER, H. YU, R. MANDIC, T. LINNEMANN, Y. H. ZHENG, O. T. FACKLER, and B. M. PETERLIN, J. Biol. Chem., 2002, 277, 28521–28529. 55 T. NISHI, S. KAWASAKI-NISHI, and M. FORGAC, J. Biol. Chem., 2003, 278, 5821–5827. 56 T. NISHI, S. KAWASAKI-NISHI, and M. FORGAC, J. Biol. Chem., 2001, 276, 34122–34130. 57 B. J. BOWMAN and E. J. BOWMAN, J. Biol. Chem., 2002, 277, 3965–3972. 58 M. E. GIRVIN, V. K. RASTOGI, F. ABILDGAARD, J. L. MARKLEY, and R. H. FILLINGAME, Biochemistry, 1998, 37, 8817–8824. 59 V. K. RASTOGI and M. E. GIRVIN, Nature, 1999, 402, 263–268. 60 W. JIANG, J. HERMOLIN, and R. H. FILLINGAME, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 4966–4971. 61 C. BAUERLE, M. N. HO, M. A. LINDORFER, and T. F. STEVENS, J. Biol. Chem., 1993, 268, 12749–12757. 62 I. ADACHI, K. PUOPOLO, N. MARQUEZ-STERLING, H. ARAI, and M. FORGAC, J. Biol. Chem., 1990, 265, 967–973. 63 M. F. MANOLSON, D. PROTEAU, R. A. PRESTON, A. STENBIT, B. T. ROBERTS, M. HOYT, D. PREUSS, J. MULHOLLAND, D. BOTSTEIN, and E. W. JONES, J. Biol. Chem., 1992, 267, 14294–14303.
64 M. F. MANOLSON, B. WU, D. PROTEAU, B. E. TAILLON, B. T. ROBERTS, M. A. HOYT, E. W. JONES, J. Biol. Chem., 1994, 269, 14064–14074. 65 S. KAWASAKI-NISHI, K. BOWERS, T. NISHI, M. FORGAC, and T. H. STEVENS, J. Biol. Chem., 2001, 276, 47441–47420. 66 T. NISHI and M. FORGAC, J. Biol. Chem., 2000, 275, 6824–6830. 67 T. OKA, Y. MURATA, M. NAMBA, T. YOSHIMIZU, T. TOYOMURA, A. YAMAMOTO, G. H. SUN-WADA, N. HAMASAKI, Y. WADA, and M. FUTAI, J. Biol. Chem., 2001, 276, 40050–40054. 68 T. TOYOMURA, T. OKA, C. YAMAGUCHI, Y. WADA, and M. FUTAI, J. Biol. Chem., 2000, 275, 8760–8765. 69 Y. P. LI, W. CHEN, Y. LIANG, E. LI, and P. STASHENKO, Nat. Gen., 1999, 23, 447–451. 70 A. N. SMITH, J. SKAUG, K. A. CHOATE, A. NAYIR, A. BAKKALOGLU, S. OZEN, S. A. HULTON, S. A. SANJAD, E. A. AL-SABBAN, R. P. LIFTON, S. W. SCHERER, and F. E. KARET, Nat. Gen., 2000, 26, 71–75. 71 X. H. LENG, T. NISHI, and M. FORGAC, J. Biol. Chem., 1999, 274, 14655–14661. 72 X. H. LENG, M. F. MANOLSON, Q. LIU, and M. FORGAC, J. Biol. Chem., 1996, 271, 22487–22493. 73 X. H. LENG, M. MANOLSON, M. FORGAC, J. Biol. Chem., 1998, 273, 6717–6723. 74 B. D. CAIN, J. Bioenerg. Biomembr., 2000, 32, 365–371. 75 S. B. VIK, J. C. LONG, T. WADA, and D. ZHANG, Biochim. Biophys. Acta, 2000, 1458, 457–466. 76 W. JIANG and R. H. FILLINGAME, Proc. Natl. Acad. Sci., 1998, 95, 6607–6612. 77 S. KAWASAKI-NISHI, T. NISHI, M. FORGAC, J. Biol. Chem., 2001, 276, 17941–17948. 78 S. B. VIK and B. J. ANTONIO, J. Biol. Chem., 1994, 269, 30364–30369. 79 W. JUNGE, D. SABBERT, and S. ENGELBRECHT, Ber. Bunsenges. Phys. Chem., 1996, 100, 2014–2019. 80 T. M. DUNCAN, V. V. BULYGIN, Y. ZHOU, M. L. HUTCHEON, and R. L. CROSS, Proc. Natl. Acad. Sci. U. S. A., 1995, 92, 10964–10968. 81 D. SABBERT, S. ENGELBRECHT, and W. JUNGE, Nature, 1996, 381, 623–625. 82 H. NOJI, R. YASUDA, M. YOSHIDA, and K. KAZUHIKO, Nature, 1997, 386, 299–302.
13
14 Vacuolar H+ -ATPases: Structure, Mechanism and Regulation
83 Y. SAMBONGI, Y. IKO, M. TANABE, H. OMOTE, A. IWAMOTO-KIHARA, I. UEDA, T. YANAGIDA, Y. WADA, and M. FUTAI, Science, 1999, 286, 1722–1724. 84 H. IMAMURA, M. NAKANO, H. NOJI, E. MUNEYUKI, S. OHKUMA, M. YOSHIDA, and K. YOKOYAMA, Proc. Natl. Acad. Sci. U. S. A., 2003, 100, 2312–2315. 85 Y. E. OLUWATOSIN and P. M. KANE, J. Biol. Chem., 1997, 272, 28149–28157. 86 H. ARAI, S. PINK, and M. FORGAC, Biochemistry, 1989, 28, 3075–3082. 87 I. ADACHI, H. ARAI, R. PIMENTAL, and M. FORGAC, J. Biol. Chem., 1990, 265, 960–966.
88 P. M. KANE, J. Biol. Chem., 1995, 270, 17025–17032. 89 J. P. SUMNER, J. A. DOW, F. G. EARLY, U. KLEIN, D. JAGER, and H. WIECZOREK, J. Biol. Chem., 1995, 270, 5649–5653. 90 K. J. PARRA and P. M. KANE, Mol. Cell. Biol., 1998, 18, 7064–7074. 91 T. XU and M. FORGAC, J. Biol. Chem., 2001, 276, 24855–24861. 92 J. H. SEOL, A. SHEVCHENKO, A. SHEVCHENKO, and R. J. DESHAIES, Nat. Cell. Biol., 2001, 3, 384–391. 93 A. M. SMARDON, M. TARSIO, and P. M. KANE, J. Biol. Chem., 2002, 277, 13831–13839.
1
Anchoring Proteins of the Erythrocyte Membrane Yoshihito Yawata
Kawasaki College of Allied Health Professions, Kurashiki City, Japan
Originally published in: Cell Membrane. Yoshihito Yawata. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30463-9
1 Ankyrin 1.1 Introduction
Ankyrin is one of the major proteins of human red cells, making up approximately 5% of the total membrane proteins [1–3]. Ankyrin is a large, 206 kDa sulfhydryl-rich protein with a molecular size of 8.3 × 10 nm. It is present at a level of 120 × 103 copies per cell. Functionally, ankyrin is connected with β-spectrin through a high affinity linkage (K d : ∼10−7 M), and with the cytoplasmic domain of band 3, which also has a high affinity linkage (K d : ∼10−7 to 10−8 M) [4, 5]. Ankyrin is a polar protein, and is involved in the local segregation of integral membrane proteins. The polarization of membrane proteins appears to be produced by the relative affinities of the various isoforms of ankyrins for target membrane proteins. These ankyrin isoforms appear to be expressed through tissue-specific, developmentally-regulated control [6–10], as discussed below. 1.2 Structure of Red Cell Ankyrin
Ankyrin is composed of three domains [6, 7, 11–13]: (1) a membrane (band 3)binding domain (89 kDa) at its NH2 -terminal (amino acids 2 to 827), (2) a spectrin binding domain (62 kDa) at the central part of the molecule (amino acids 828 to 1382), and (3) a regulatory domain (55 kDa) at its COOH-terminal (amino acids 1383 to 1881) (Fig. 1). The NH2 -terminal band 3-binding domain is basic, the
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Anchoring Proteins of the Erythrocyte Membrane
Fig. 1 Structure of red cell ankyrin. Protein domain structure of erythroid ankyrin and its binding sites are shown schematically. Genetic basis of ankyrin 2.1 and 2.2 is also shown at the lower-right.
central spectrin-binding domain is neutral, but heavily phosphorylated, and the COOH-terminal regulatory domain is highly acidic. 1.2.1 Membrane (Band 3)-Binding Domain of Ankyrin This 89 kDa domain, which is at the NH2 -terminal portion of the ankyrin molecule, is almost entirely composed of 24 consecutive 33 amino acid tandem repeats (so-called “cdc 10/ankyrin repeats”, or “ankyrin repeats”), which are subdivided into six repeat folding units [12, 13]. These repeats are fairly similar to each other. Binding sites for band 3 and at least six other types of integral membrane proteins are present in this domain. Repeats 7 through 12 (folding unit 2) and especially repeat 13 through 24 (folding units 3 and 4) form two distinct but cooperating binding sites for band 3. These two binding sites of ankyrin for band 3 are able to interact with four band 3 molecules, because normally band 3 is present in a dimeric form on the membrane. This binding of ankyrin to band 3 appears to be critical for maintaining the normal integrity of membrane functions, especially membrane stability, because selective disruption of the ankyrin/band 3 interaction in intact red cells, which are placed at a slightly alkaline pH, decreases membrane stability markedly. The 15 out of 33 amino acids in the ankyrin repeats are highly conserved. Their conserved sequence is “-G-TPLH-AA–GH–V(or A)–LL–GA–N(or D)–”. L-shaped structures of the ankyrin repeats are composed of a pair of α-helices that form an antiparallel coiled-configuration, which is followed by an extended loop perpendicular to the helices and a β-hairpin [8, 14–19]. Very similar repeats have commonly been found in various proteins, in all phyla. Therefore, the ankyrin repeats appear to have been propagated as one of the versatile modules for specific ligands during evolution.
1 Ankyrin 3
1.2.2 Spectrin-Binding Domain of Ankyrin This spectrin-binding domain is also known as the central domain, and was previously known as the 62 kDa domain. The spectrin-binding sites are located in the beginning and middle regions of this domain. The site in the middle portion is highly conserved and appears to be the principal area of binding [20]. These regions are found to be those near the end of the β-spectrin molecule (repeats 15 and 16) which are involved in dimer–tetramer self-association [21]. Although two binding sites for spectrin are available in the ankyrin molecule, each spectrin tetramer appears to bind only one ankyrin molecule, probably because ankyrin is able to bind to the spectrin tetramer approximately ten-fold more strongly than to the spectrin dimer. 1.2.3 Regulatory Domain of Ankyrin This domain (55 kDa) is at the COOH-terminal region of the ankyrin molecule, and is known to contain regulatory sequences that enhance or reduce the extent of interaction of ankyrin with spectrin and band 3. Ankyrin binds to the COOHterminal region of the β-spectrin with its 55 kDa domain and to the cytoplasmic tail of band 3 via repeated sequences in the NH2 -terminal 89 kDa domain. The regulatory domain consists of multiple isoforms of different sizes and functions, which are produced by alternative splicing [8, 9]. One of these distinct isoforms is ankyrin 2.2, which lacks the acidic 162 amino acid sequence from exon 38, which is found in full-sized ankyrin (protein 2.1) [6]. This protein 2.2 (the smaller isoform: ankyrin 2.2) is an activated ankyrin, and enhances ankyrin binding to band 3 and spectrin. Use of alternative promoters has recently been shown to produce a muscle-specific, truncated isoform of ANK 1 [9]. Many alternatively spliced isoforms of ankyrin at the three COOH-terminal exons are known, these are isoforms lacking: (1) exons 38 and 39, (2) exons 36 through 39, and (3) exons 36 through 41. The COOH-terminal exons are highly conserved [8]. 1.3 Functions of Ankyrin
The major function of ankyrin lies in its binding to β-spectrin and band 3 [22]. These bindings create a tight association between spectrin and band 3, but the strength of binding can be modified by the extent of phosphorylation [23]. The presence of seven phosphorylation sites have been proven in vitro for casein kinase I and cyclic AMP-independent protein kinase. Unphosphorylated ankyrin binds preferentially to spectrin tetramers and oligomers rather than to spectrin dimers, but phosphorylation removes this preference. Phosphorylation also reduces the capacity of ankyrin to bind band 3. In addition, ankyrin is phosphorylated by protein kinase A. Stoichiometrically, one ankyrin molecule links each spectrin tetramer to the membrane. Since these binding interactions are cooperative, attachment of ankyrin to band 3 enhances greatly the ability of the molecule to organize the spectrin, to
4 Anchoring Proteins of the Erythrocyte Membrane
which it is attached, into tetramers. It is also true that spectrin binding enhances the affinity for band 3. Therefore, ankyrin plays a crucial role in the organization of the network. 1.4 Erythroid and Nonerythroid Ankyrins
Ankyrins are widely expressed, and several ankyrin gene families are known [2]. Red cell ankyrin (ankyrinR , or ANK 1) is expressed not only in red cells, but also in myocytes, endothelial cells, and brain (especially in Purkinje cells of the cerebellum) [6, 7]. The gene is located on 8p11. 2. AnkyrinB (ANK 2) is a neural form, and is present in neuronal cell bodies and dendrites [24, 25]. The gene is located on 4q25–q27. AnkyrinG (ANK 3) is the most widely distributed, mostly in epithelia and axons, but also in megakaryocytes, macrophages, myocytes, melanocytes, hepatocytes, kidney cells, and testicular Leydig’s cells [26, 27]. Nonerythroid ankyrins interact with a variety of integral membrane proteins other than band 3 (AE1), such as Na+ -, K+ -ATPase [28–30], the voltage-dependent axonal Na+ channel [31], the amiloride-sensitive epithelial channel [32], the cardiac Na+ /Ca2+ exchanger [33], H+ -, K+ -ATPase [34], the IP3 receptor [35], CD44 [36], and a group of neuro-fascinrelated brain adhesion molecules [37, 38]. Mice with targeted disruption of ANK 1 or ANK 3 exhibit neurological abnormalities, including Purkinje cell degeneration and ataxia, and those with targeted disruption of ANK 2 also exhibit brain defects, but with more clear-cut abnormalities in brain development [39].
2 Protein 4.2
Red cell protein 4.2 (P4.2) is one of the major components of the red cell membrane skeletal network, which binds to the cytoplasmic domain of anion exchanger band 3 and interacts with ankyrin in red cells [40–43]. Patients with P4.2 deficiency in their red cell membranes suffer from congenital hemolytic anemia with microspherocytosis or other such disorders. This fact suggests that P4.2 plays an important role in maintaining the stability and flexibility of red cells. P4.2 has been reviewed previously by two authors in 1993 [40], 1994 [41, 42] and recently in 2000 [43]. 2.1 Protein Chemistry of Protein 4.2
Protein 4.2 is a membrane protein (Fig. 2) accounting for approximately 5% of the total membrane protein content and for 250 × 103 copies per red cell. It has a molecular weight of 72 kDa on SDS–PAGE [44, 45]. Extraction of protein 4.2 from red cell membranes is more difficult than for any of the other peripheral proteins, even under high and low ionic strength conditions. Therefore, strong basic conditions (pH 11 or above) have been used in combination
2 Protein 4.2 5
Fig. 2 Molecular structure of protein 4.2 and the genetic isoforms. A wild type of protein 4.2 is shown at the second line from the top. Black solid boxes indicate constitutive coding exons, and shaded boxes show alternative coding exons.
with gel filtration with 1 M KI-Sephacryl S-200 as the standard method for the extraction of protein 4.2. This procedure yields 1–2 mg of “type I” protein 4.2 from 500 mL of whole blood with a purity of approximately 85%. This protein is water-soluble, and is difficult to separate from residual ankyrin and protein 4.1. The use of other extraction methods with 10 to 20 mM lithium diiodosalicylate, 6 mM 2,3-dimethylmaleic anhydride, 5 mM p-chloromercuribenzoic acid (pCMB), or 1 mM p-chloromercuribenzoic sulfate (pCMBS) have been reported. The nonionic detergent Triton X-100 can also be used to extract protein 4.2 from red cell membranes. Under these conditions, band 3 is co-extracted along with the portion of protein 4.2, suggesting an association of protein 4.2 with band 3 in red cell membranes in situ [44, 45]. An alternative method using 2 M Tris-HCl (pH 7.6) has also been used to extract protein 4.2 with a purity of greater than 97%. However, this “type II” variety of protein 4.2 is less water-soluble and behaves like an integral protein. It has been speculated that its characteristic hydrophobicity is due to myristylation [46]. Protein 4.2 purified by the standard pH 11 method with 1 M KI-Sephacryl S-200 appears to be heterogeneous in size and probably primarily consists of a mixture of dimers and trimers. Electron microscopically, purified protein 4.2 appears as globular particles with diameters in the range 80–150 Å, and has been suggested as being tetrameric in situ in the membrane.
6 Anchoring Proteins of the Erythrocyte Membrane
Protein 4.2 in human red cell membranes is known to be myristylated [47] at a site near the N-terminus, as assayed by the release of myristoyl glycine from partially hydrolyzed protein 4.2. Glycine at the second position appears to be responsible for this myristylation (N-myristoyl glycine). Further studies of its biological functions should be considered. It has also been reported that protein 4.2 is palmitoylated under physiological conditions [48]. After labeling of intact human red cells with 3 [H] palmitic acid, radioactivity was found to be associated with protein 4.2 by immunoprecipitation of peripheral membrane proteins extracted at pH 11 from ghosts with anti-protein 4.2 antibody. The fatty acid linked to protein 4.2 was identified as palmitic acid. Protein 4.2 could be depalmitoylated with hydroxylamine, suggesting a thioester linkage. Depalmitoylated protein 4.2 showed significantly decreased binding to protein 4.2-depleted membranes, as compared with native protein 4.2. Several red cell membrane proteins including ankyrin, band 3, p55, protein 4.1 and spectrin were palmitoylated. Fatty acid acylation of proteins confers an extra hydrophobic moiety on proteins, which promotes hydrophobic protein–membrane and protein–protein interactions. Whereas control protein 4.2 showed a binding capacity of 280 mg per g of vesicle protein (band 3), depalmitoylated protein 4.2 showed a capacity of 108 mg per g of vesicle protein. Therefore, palmitoylation of protein 4.2 appears to favor its interaction with band 3 in the membrane. To date, protein 4.2 has not been crystallized. Crystallography is thus expected to be performed in the future. 2.2 Functions of Protein 4.2
The major functions of protein 4.2 are, in conjunction with other membrane proteins, topographically adjacent to it in situ in red cell membranes, especially band 3, ankyrin, spectrin, and protein 4.1. 2.2.1 Binding Properties of Protein 4.2 2.2.1.1 Interactions of protein 4.2 with band 3 When Triton X-100 extracts of red cell membranes were fractionated by ion exchange chromatography and nondenaturing gel electrophoresis, protein 4.2 was found with band 3 [44]. In addition, direct binding assays have indicated that an excess of the cytoplasmic domain of band 3 eliminated the normal (2–8 × 10−7 M) binding of purified protein 4.2 to red cell inside-out vesicles (IOV). The binding of protein 4.2 to the purified cytoplasmic domain of band 3 usually takes from 6 to 20 h for complete saturation. Therefore, it has been hypothesized that re-binding of protein 4.2 to the membrane probably requires the formation of other types of associations apart from protein–protein contacts at the membrane–medium interface, perhaps the formation of protein 4.2 or band 3 oligomers. Although the major binding site of protein 4.2 has been considered to be the cytoplasmic domain of band 3, no direct evidence has been found, because the state of self-association of purified protein 4.2 is heterogeneous and the exact oligomeric state of band 3 in situ in membranes under the conditions of the
2 Protein 4.2 7
binding assays is unknown. It has been tentatively estimated that the stoichiometry of the protein 4.2 to band 3 interaction is approximately 1:3.9 on a monomer basis. One synthetic peptide of protein 4.2 (P8: L61 FVRRGQPFTIILYF) was found to bind strongly to the cytoplasmic domain of band 3 [49]. Four other peptides (P22: L271 LNKRRGSVPILRQW, P27: G346 EGQRGRIWIFQTST, P41: L556 WRKKLHLTLSANLE, and P48: I661 HRERSYRFRSVWPE) bind less strongly. These peptides have in common a cluster of two or three basic amino acid residues (arginine or lysine) in a region with virtually no acidic residues. The cytoplasmic domain of band 3 bound in a saturated manner to P8 with a K d of 0.16 µM and a capacity of 0.56 mol of the cytoplasmic domain of band 3 monomer per mol of P8. Replacement of R64 R with R64 G, G64 R or G64 G almost completely abolishes the cytoplasmic domain of band 3 binding, suggesting that R64 R is essential for its binding. P8 competitively inhibits binding of purified human red cell P4.2 to the cytoplasmic domain of band 3. Protein 4.2 can be found with band 3 and ankyrin in an immunoprecipitated complex, probably due to association of protein 4.2 with ankyrin and band 3 in situ [40]. A partial deficiency of band 3 has also been found to be accompanied by partial deficiency of protein 4.2 [43]. Furthermore, a cow with total band 3 deficiency and mice with the targeted band 3-knock-out gene clearly demonstrated a loss of protein 4.2 in their red cell membranes [43]. The rotational and lateral mobility of band 3 in red cell membranes from patients with protein 4.2 deficiency is substantially increased [50–52]. In our study with fluorescence recovery after use of the photobleaching method (FRAP), the immobile fraction of band 3, which constitutes about 60% of the total in normal red cells, was totally absent in complete protein 4.2 deficiency. It is also interesting that, with complete deficiency of protein 4.2, the number of intramembrane particles (IMP) has been found to be reduced with a shift to larger sizes, indicating the possibility of increased oligomerization of band 3 molecules in these red cells [53]. Heat treatment considerably enhanced this effect. The structural and functional characteristics of band 3 in these protein 4.2-deficient red cells appeared normal in terms of the cleavage pattern of band 3 fragments and the binding properties of band 3 to protein 4.2 or ankyrin. 2.2.1.2 Interactions of protein 4.2 with ankyrin Considering the association of protein 4.2 with ankyrin, it has been reported that protein 4.2 can bind to 0.65 mol of ankyrin per mol of protein 4.2 with a K d of from 1 to 3.5 × 10−7 M on the Scatchard plot in vitro [45]. However, this binding requires several hours to approach saturation in solution, and no conclusive evidence has yet been shown for an association of protein 4.2 with ankyrin in the membrane in situ. It has also been shown that ankyrin can bind to the cytoplasmic domain of band 3 in IOV without protein 4.2, and that reassociation of ankyrin with IOV is unaffected even when protein 4.2 is removed from the IOV. In some hemolytic anemias, however, a partial deficiency of protein 4.2 has also been reported in ankyrin deficiencies [54]. Decreased protein 4.2 content has been noted in a mouse strain (nb/nb) with an ankyrin deficiency. In a protein 4.2 deficiency with the 142 Ala←Thr point mutation, it was reported that ankyrin was partially released from the patient’s red cell membranes upon preparation of
8 Anchoring Proteins of the Erythrocyte Membrane
IOV, suggesting that protein 4.2 might contribute to the stability of the membrane protein association. It has been shown that the red cell membranes of nb/nb mice, which were almost completely deficient in full-length 210 kDa ankyrin due to a defect in the Ank-1 gene on mouse chromosome 8, were severely (up to 73%) deficient in protein 4.2 content [55]. This deficiency of protein 4.2 in nb/nb homozygous mice was not the result of defective protein 4.2 synthesis. Reconstitution of nb/nb to inside-out vesicles with human red cell ankyrin restored ankyrin levels to up to 80% of the normal levels and increased binding of exogenously added human red cell protein 4.2 by approximately 60%. These results suggest that ankyrin is required for normal associations of protein 4.2 with the red cell membrane. 2.2.1.3 Interactions of protein 4.2 with spectrin Normal protein 4.2 has been shown to bind to spectrin in solution and to promote the binding of spectrin to ankyrinstripped inside-out vesicles [51]. Two independent classes of binding sites of protein 4.2 to spectrin have been identified: 1) a high-affinity (a binding coefficient: K d = 7.4 ± 0.2 × 10−9 M−1 ), low-capacity (0.6 ± 0.8 × 10−9 M L−1 ) class of sites; and 2) a low-affinity (K d = 2.8 ± 2.0 × 10−7 M−1 ), high capacity (5.8 ± 1.0 × 10−9 M L−1 ) class of sites. It has been calculated that, at saturation, there is approximately one spectrin binding site per seven protein 4.2 molecules. Therefore, protein 4.2 provides low-affinity binding sites for both band 3 oligomers and spectrin dimers on the human red cell membrane. These observations suggest that protein 4.2 may stabilize skeleton–membrane interactions by providing a direct link between band 3 and spectrin. A spectrin-binding domain of human erythrocyte membrane protein 4.2 has recently been identified [56]. In a disease state, i. e., in red cells with total protein 4.2 deficiency and also in red cells of 4.2−/− mice, the cytoskeletal proteins (spectrins, ankyrin, and protein 4.1) are not deficient [41, 42, 57]. However, the cytoskeletal network in these protein 4.2-deficient red cells appears to be less extended when studied by electron microscopy using the surface replica method and the quick-freeze deep-etching method [53]. Interestingly, the cytoskeletal network in these protein 4.2-deficient red cells becomes markedly disorganized, with the appearance of larger aggregates when heat-treated up to 48 ◦ C [50]. Under these conditions, a marked decrease in red cell membrane deformability has been observed by ektacytometry It should be noted that the spectrin and ankyrin contents were maintained as normal in these cells. Therefore, these abnormalities appear to be independent of spectrin and ankyrin per se, and due to the lack of protein 4.2. These results raise the possibility that protein 4.2 may play a role in connecting the cytoskeletal network to integral proteins (especially band 3) as a type of anchoring protein. 2.2.1.4 Interaction of protein 4.2 with protein 4.1 As to a possible association of protein 4.2 with protein 4.1, it has been reported that protein 4.1 may interact with protein 4.2 in solution, suggesting the masking of the binding domains of these proteins when they are present together in solution [45]. Protein 4.2, protein 4.1 and ankyrin binding are partially inhibited (about 50%) by the presence of these proteins.
2 Protein 4.2 9
The protein 4.1 and protein 4.2 binding sites are localized at the nearby sites on the cytoplasmic domain of band 3. It is possible that, in the absence of protein 4.2, additional binding sites for protein 4.1 on band 3 may be exposed. The content of protein 4.1, however, remains nearly normal both in human red cells when there is a total deficiency of protein 4.2 and in 4.2−/− mouse red cells [41, 42, 57]. It is also worth noting that the content of protein 4.2 appears to be normal in mice with a complete deficiency of all protein 4.1 R isoforms, which had beeen generated by gene knock-out technology [58]. 2.2.2 Transglutaminase Activity of Protein 4.2 Protein 4.2 in human red cells has no transglutaminase activity, although the homology in the gene structure between protein 4.2 and transglutaminase is high (i.e., an overall identity of 32% in a 446 amino-acid overlap with guinea-pig liver transglutaminase, and of 27% in a 639 amino-acid overlap with human coagulation factor XIII subunit a) [59]. It has been speculated that the lack of transglutaminase activity may be due to the presence of an alanine substituted for the cysteine at the active site of the molecule. 2.2.3 Phosphorylation of Protein 4.2 Although phosphorylation has not been observed on protein 4.2 extracted from mature human red cells, protein 4.2 can be phosphorylated, if it is purified by methods that result in exposing phosphorylation sites through alteration of protein 4.2 sulfhydryl groups [40, 46]. Seventeen potential phosphorylation sites have been activated in red cell ghosts, i. e., eight possible protein kinase C sites, seven casein kinase II sites, one tyrosine kinase site, and one cAMP- (or cGMP-) dependent kinase site [47]. It has been suggested that membrane-associated protein 4.2 in human mature red cells is already fully phosphorylated with little or no turnover under normal conditions. The physiological functions of such potential protein 4.2 phosphorylation is unknown. The activities of the major red cell kinases have recently been determined to assess the phosphorylation status in red cells of 4.2−/− mice [57]. Cytosolic protein kinase C (PKC) was significantly decreased with decreased PKC-α and PKC-β I isoforms and normal PKC-β III in 4.2−/− red cells. Cytosolic protein kinase A (PKA) activity was increased in these red cells. Basal phosphorylation was increased and PMAstimulated phosphorylation was reduced in 4.2−/− red cells. Cytosolic casein kinase I (CK I) activity was normal, but cytosolic CK II activity was decreased in these red cells. The functional significance of these activities remains to be clarified at some point in the future. 2.3 Protein 4.2 in Red Cell Membrane Ultrastructure
Although the localization of P4.2 in the red cell membrane structure has still not been elucidated in detail, there are several positive pieces of evidence. With a total deficiency of P4.2, it is now recognized that the intramembrane particles (IMPs) are clustered in the inside-out vesicles (IOVs) of the patient’s red cells [41, 50, 53]. In
10 Anchoring Proteins of the Erythrocyte Membrane
addition, electron microscopic studies with the freeze fracture method have shown IMPs on the red cell ghosts to be enlarged, suggesting increased oligomerization of band 3 molecules. This phenomenon has been verified by biophysical analyses [41–43, 53]. Furthermore, analyses with the quick-freeze deep-etching method or the surface replica method have shown the cytoskeletal network to be disrupted in this disorder [41–43, 53]. These results appear to indicate that P4.2 molecules are located near band 3 molecules and membrane proteins making up the cytoskeletal network. Biophysical analyses, especially with ektacytometry, have revealed the increased instability of the cytoskeletal network of the patient’s red cells. It has recently been proven that P4.2 can bind directly to spectrins [51], implying that P4.2 may play an important role as one of the anchoring proteins connecting the cytoskeletal network to the integral proteins (particularly band 3 molecules). The exact location of various membrane proteins in situ in the normal human red cell membrane ultrastructure has been studied by immnuno-electron microscopy with the surface replica method by utilizing antibodies against various membrane proteins: i.e., spectrins, ankyrin, band 3 (the cytoplasmic domain), protein 4.1 and protein 4.2. At first, spectrins were readily detected as major constituents of the cytoskeletal network, and ankyrin and protein 4.1 were also identified on it. Band 3 molecules were found attached to the cytoskeletal network as an immobile form of band 3. Other band 3 molecules were located inside the basic units, as the mobile form of band 3. P4.2, however, could not be detected by this procedure, when the antibody-conjugated immunogold particles were applied to the open red cell ghosts, implying that the epitopes of P4.2 were not exposed. Therefore, normal red cell ghosts were subjected to gentle treatment with Triton X-100 to remove part of the cytoskeletal network. P4.2 was then found to be attached predominantly to the cytoskeletal network. These results suggest that P4.2 is very likely to be present at the outer face of the cytoskeletal network and even under the lipid bilayer, and is attached to spectrins and band 3, as shown by biophysical studies [51]. 2.4 Protein 4.2 Gene 2.4.1 Characteristics of Genomic DNA The human red cell protein 4.2 gene is ∼20 kb in length and contains 13 exons and 12 introns [59–63]. The coding sequence from the genomic DNA is identical to the cDNA sequence. Nucleotide polymorphism has been observed in the normal human protein 4.2 gene. The exons range in size from 104 to 314 base pairs with an average size of 170 base pairs, while the introns vary from 6.4 to 0.3 kb. All the exon–intron boundaries follow the consensus 5 donor–3 acceptor splice junction sequence for eukaryotic genes of gt-ag. The gene is localized at human chromosome 15q15–q21 [62, 63]. The upstream region of the protein 4.2 gene contains several elements that are similar in sequence to the upstream elements of the genes for β-globin and porphobilinogen deaminase, which are also red cell protein genes [59, 62]. The elements are spaced a similar distance from the transcription start site and have
2 Protein 4.2 11
similar relative spacings and orders. These similarities have made it easier to identify five possible regulatory cis-elements in the protein 4.2 gene, starting –20 nucleotides upstream from the transcription start site, i.e.: (1) a possible TATA element, (2) a short G + C-rich domain, which could be an Spl binding site, (3) a possible CAAT box, (4) a CAAC box, and (5) two GF-1 binding domains, one at −23 to −28, and another one at –173 to –178 [59]. These findings suggest the use of common cis-elements in these three erythroid genes, although the identification of these elements as having a regulatory function in protein 4.2 gene expression is highly speculative. It has also been reported [62] that the nucleotides upstream from the cDNA start site (nt 1) are: (1) CAGT (nt –4 to –1), agreeing well with the CA cap signal; (2) nucleotides –26 to –21 upstream from the cDNA start site having a sequence of ATAAAA, which agrees well in sequence and position with the promoter TATA box for eukaryotic genes; (3) a CCAT sequence was noted at nt –89 to –86 within the reported –385 nt upstream sequence, where upstream promoter elements are located; (4) a GC-rich region from nt –85 to –34 (G/C to A/T ratio = 3); (5) within this region, a sequence of CCCACCCC CTCCCCC containing a CACC element (nt –83 to –80) that is a potential binding site for the Spl nuclear factor; and (6) two AGATAA sequences for potential binding of erythroid-specific transcription factor GATA-1 (also known as GF-1, NF-E1, Eryf-1) located at nt –175 to –170 and at nt –28 to –23, respectively. The number of the 5 -CpG-3 di-nucleotide sites appears to be small, unlike the β-spectrin gene that has numerous 5 -CpG-3 sites known as the so-called “CpG islands” [64]. The 5 -CpG-3 sites of the protein 4.2 gene were highly methylated, when genomic DNA was prepared from mononuclear cells in normal human peripheral blood. Alignment of the protein 4.2 amino acid sequence with that of a subunit of human coagulation factor XIII and division of the sequence into exons have revealed a remarkable correspondence, although the gene for the a subunit of human factor XIII, which is on chromosome 6p24–p25, is 160 kb and has 15 exons and 14 introns, while the gene for protein 4.2 is only 20 kb and contains 13 exons and 12 introns [59]. With only one exception, the exons of protein 4.2 are very similar and in many cases identical in size to the exons of the a subunit of factor XIII with which they are paired. In additon, in every case, the corresponding intervening introns are of the same splice junction class. These and other similarities suggest that the gene for protein 4.2 is closely related to and possibly derived from that for the a subunit of factor XIII and that the proteins may share common structural and functional properties. However, it should be noted that, despite this close similarity, purified protein 4.2 has no transglutaminase activity in vitro, and that normal red cell membranes do not contain transglutaminase activity. The lack of protein 4.2 transglutaminase activity is induced by the substitution of the cysteine (GQCWVF) at the highly conserved consensus sequence in the transglutaminase, which is substituted for an alanine (GQAWVL) in protein 4.2. The cysteine appears to be required for transglutaminase activity. It is also possible that the substitution of a leucine for a phenylalanine may also be responsible for a loss of this activity. It has been shown that reticulocytes contain two forms of protein 4.2 mRNA, a small form (P4.2S) encoding a protein of 691 amino acids, and a larger form (P4.2L),
12 Anchoring Proteins of the Erythrocyte Membrane
which contains an additional 90 nucleotides following nucleotide +9, encoding a protein of 721 amino acids [59, 62]. Protein 4.2 exon I contains a 5 non-coding sequence, the translation start site, and 99 nucleotides encoding 33 amino acids. These 33 amino acids are identical to the first 33 amino acids of the larger protein 4.2 transcript. The last 90 nucleotides of exon I, coding for 30 amino acids, are removed by splicing in order to generate the smaller transcript, coding for the 691 amino acid protein (a wild type of protein 4.2: 72 kDa on the SDS–PAGE). The genomic organization of the protein 4.2 gene of human red cells contains 13 exons, i.e.: exon I, ut ∼33 residues; II, 34–95; III, 96–173; IV, 174–213; V 214–248; VI, 249–307; VII, 308–354; VIII, 355–388; IX, 389–469; X, 470–569; XI, 570–623; XII, 624–668; and XIII, 669-ut (ut, untranslated sequence) [59, 60] (Fig. 2). The sizes of the introns were 6490 for intron 1, 900 for 2, 580 for 3, 340 for 4, 940 for 5, 320 for 6, 740 for 7, 500 for 8, 390 for 9, 2560 for 10, 2200 for 11, and 3080 for 12. 2.4.2 cDNA of the Protein 4.2 Gene Protein 4.2 complimentary DNA (cDNA) obtained from a human reticulocyte cDNA library has been cloned and sequenced [60–62]. The full-length cDNA was 2.35 kb and contained an open reading frame with a 227–nt untranslated region upstream from the putative ATG start codon. The calculated molecular weight was 76.9 kDa encoding 691 amino acids. The nucleotide sequence CAACCATGC around this initiation site was similar to the consensus sequence for initiation found in higher eukaryocytes, except that the second nt in the P4.2 cDNAs was A rather than C [61]. The presence or absence of the 90 nt insert gave rise to two P4.2 cDNA sequences; that is, a P4.2S from 2073 bp and a P4.2L from 2163 bp [59, 62]. The amino acid sequence derived from the 2.5 kb cDNA contained ∼43% nonpolar, ∼35% polar, ∼10% acidic, and ∼12% basic amino acid residues [61]. The most abundant amino acids were leucine (82 residues) and alanine (60 residues). There were 49 serine and 43 threonine residues, which are potential sites for O-glycosylation and represent 13% of the total residues. There were 16 cysteine residues, six potential N-glycosylation sites (Asn-Xaa-Ser/Thr) at Asn-103, -420, -447, -529, -604, and -705, one potential cAMP-dependent phosphorylation site (basic-basicXaa-Ser) at Ser-278, and nine potential protein kinase C phosphorylation sites (Ser/Thr-Xaa-Arg/Lys) at Ser-7, -57, -58, -154, -224, -449, -455, and -666, and Thr287 [61]. There was one Arg-Gly-Asp sequence at 518–520. Secondary structure analysis predicted that P4.2 should contain ∼3% β-sheet, ∼24% α-helix, and ∼45% reverse turns. Hydropathy analysis of the deduced amino acid sequence revealed a major hydrophobic domain (residues 298–322), which was predicted to be mainly a β-sheet structure with a possible turn. There was a strongly hydrophilic region (residues 438–495). Toward the C terminus of this region, there was a highly charged segment predicted to be an α-helix (residues 470–492) and containing a large number of both positively and negatively charged residues, especially glutamic acid [61]. Elsewhere, it was reported that there were 37% hydrophobic residues and 28% polar residues [60]. Protein 4.2 did not show any obvious repeating primary structure, but a globular protein was suggested [60]. There were no extended stretches of β-sheet or α-helix. Instead, the protein was
2 Protein 4.2 13
characterized by short segments. A hydropathy plot of protein 4.2 showed short alternating regions of hydrophobic and hydrophilic character [60]. The region of protein between amino acids 265 and 475, however, was characterized by two sets of alternating, prominent hydrophobic and hydrophilic domains [60]. 2.4.3 Protein 4.2 Gene in Mouse Red Cells There are substantial discrepancies between the three reports published on the protein 4.2 gene in mouse red cells [65–67]. Korsgren and Cohen (1994) described isolation of a 3.5 kb mouse P4.2 cDNA with the P4.2 transcript of 4.1 kb from mouse reticulocytes [65]. Rybicki et al. (1994), on the other hand, reported isolation of a full-length P4.2 cDNA of 2.2 kb from mouse reticulocytes [66]. Karacay et al. (1995) described an entire P4.2 cDNA sequence consisting of 3465 nt with an open reading frame (ORF) of 691 amino acids, and despite its similarity to human P4.2 cDNA, the mouse cDNA had a longer 3 untranslated region [67]. In addition, they reported that the mouse reticulocyte P4.2 RNA did not exhibit alternative splicing in the region identified in human P4.2 RNA. The P4.2 gene in mice was mapped to murine chromosome 2 [68], in contrast to 15q15–q21 in the human red cell P4.2 gene. 2.4.4 Tissue-Specific Expression of the Mouse Protein 4.2 Gene and the Pallid Mutation Immunoreactive forms of P4.2 with a molecular weight of 72 kDa, in addition to those larger or smaller than 72 kDa, have been detected in nonerythroid cells and tissues [40, 69–71]. Immunologic cross-reactivity between the red cell P4.2 protein and other cellular proteins has also been reported [40, 69–71]. Zhu et al. (1998) recently reported that expression of the mouse P4.2 gene was temporally regulated during embryogenesis and that the P4.2 mRNA expression pattern matched the timing of erythropoietic activity in hematopoietic organs [70]. It should be noted that, contrary to previous reports, P4.2 expression was detected only in the erythroid cell-producing organs and circulating red cells during mouse embryonic development and in adult mice. They first analyzed poly A +RNAs from various adult mice tissues by Northern blot analysis using a 714 bp mouse P4.2 cDNA fragment containing the 3 protein of the P4.2-coding region as the probe. A single 3.5 kb P4.2 transcript was detected at a relatively high level in the spleen, while little or no P4.2 hybridization was seen in other tissues examined (brain, lung, liver, skeletal muscle, kidney, or testis). They extended the use of mice embryos for their P4.2 expression studies. A P4.2 hybridization signal was first detected not at E6.5 days, but instead in primitive erythroid cells in E7.5 embryos. In E10.5 embryos, the P4.2 hybridization signal was detected only in the heart and blood vessels. In E12.5 embryos, there was a switch in the hematopoietic production sites from the yolk sac to the fetal liver. In E 16.5 embryos, the signal was greatly reduced in the liver, and was almost undetectable after birth. Finally, P4.2 gene expression became confined to the red pulp on postnatal day 7. No P4.2 specific labeling was observed in the white pulp, which consisted of germinal centers for lymphocytes, plasma cells, and macrophages. No P4.2 hybridization signal was detected in megakaryocytes.
14 Anchoring Proteins of the Erythrocyte Membrane
Therefore, the P4.2 message was specifically expressed in cells of erythroid lineage in postnatal hematopoietic organs [70]. The chromosomal location of the mouse P4.2 gene was near a mouse pallid (pa) mutation [71]. Pallid was found in a mouse with dilution of coat color, increased bleeding time, and abnormal lysosomal enzyme secretion, as a model of the platelet storage pool disease [72, 73]. Therefore, it has been suggested that the P4.2 gene may be related to the pallid mutation gene, and it has even been proposed that the P4.2 gene itself should be nominated as a “pallidin” [40, 65, 71]. Patients with P4.2 deficiency, however, do not have the platelet storage pool deficiency seen in pa/pa mice, and the mutant mice do not exhibit the hemolysis and spherocytosis observed in the P4.2 deficiency [41–43]. It has recently been shown that the P4.2 gene is distinct from the pa gene, and that changes in P4.2 in pallid mice were not responsible for the pallid mutation [68]. There have also been reports of P4.2 transcripts, in addition to spleen, in other tissues (kidney, heart, brain, and liver), and even in other cells (HeLa cells or HT-29 cells). As with the results for the immunoreactive forms of P4.2 previously detected in nonerythroid tissues or cells, neither P4.2 message nor protein 4.2 have been found in nonerythroid tissues and cells [70].
References 1 BENNETT, V., STENBUCK, P. J. (1979) Identification and partial purification of ankyrin, the high-affinity membrane attachment site for human erythrocyte spectrin. J. Biol. Chem. 254: 2533–2541. 2 PETERS, L. L., LUX, S. E. (1993) Ankyrin. Structure and function in normal cells and hereditary spherocytes. Semin. Hematol. 30: 85–118. 3 BENNETT, V. (1992) Ankyrins. Adaptors between diverse plasma membrane proteins and the cytoplasm. J. Biol. Chem. 267: 8703–8706. 4 WEAVER, D. C., MARCHESI, V. T. (1984) The structural basis of ankyrin function, I. Identification of two structural domains, and II. Identification of two functional domains. J. Biol. Chem. 259: 6165–6169 and 6170–6175. 5 WALLIN, R., CULP, E. N., COLEMAN, D. B. (1984) A structural model of human erythrocyte band 2.1: Alignment of chemical and functional domains. Proc. Natl. Acad. Sci. USA 81: 4095–4099. 6 LUX, S. E., JOHN, K. M., BENNETT, V. (1990) Analysis of cDNA for human erythrocyte ankyrin indicates a repeated structure
7
8
9
10
with homology to tissue-differentiation and cell-cycle control proteins. Nature 344: 36–42. LAMBERT, S. YU, H., PRCHAL, J. T., LAWLER, J., RUFF, P., SPEICHER, D., CHEUNG, M. C., KAN, Y. W., PALEK, J. (1990) cDNA sequence for human erythrocyte ankyrin. Proc. Natl. Acad. Sci. USA 87: 1730–1734. GALLAGHER, P. G., TSE, W. T., SCARPA, A. L., LUX, S. E., FORGET, B. G. (1997) Structure and organization of the human ankyrin-1 gene: Basis for complexity of pre-mRNA processing. J. Biol. Chem. 272: 19220–19228. GALLAGHER, P. G., FORGET, B. G. (1998) An alternate promoter directs expression of a truncated, muscle-specific isoform of the human ankyrin 1 gene. J. Biol. Chem. 273: 1339–1348. GALLAGHER, P. G., SABATINO, D. E., GARETTE, L. J., BODINE, D. M., FORGET, B. G. (1998) Erythroid-specific expression of the human ankyrin 1 (Ank 1) gene in vitro and in vivo is mediated by a promoter that requires GATA-1 and CACCC-binding proteins for its activity. Blood 90: 7a.
References 11 BORK, P. (1993) Hundreds of ankyrin-like repeats in functionally diverse proteins: Mobile modules that cross phyla horizontally? Proteins 17: 363–374. 12 MICHAELY, P., BENNETT, V. (1993) The membrane-binding domain of ankyrin contains four independently folded subdomains, each comprised of six ankyrin repeats. J. Biol. Chem. 268: 22703–22709. 13 MICHAELY, P., BENNETT, V. (1995) The ANK repeats of erythrocyte ankyrin form two distinct but cooperative binding sites for the erythrocyte anion exchanger. J. Biol. Chem. 270: 22050–22057. 14 LUH, F. Y., ARCHER, S. J., DOMAILLE, P. J., SMITH, B. O., OWEN, D., BROTHERTON, D. H., RAINE, A. R., XU, X., BRIZUELA, L., BRENNER, S. L., LAUE, E. D. (1997) Structure of the cyclin-dependent kinase inhibitor p19Ink4d. Nature 389: 999–1003. 15 JACOBS, M. D., HARRISON, S. C. (1998) Structure of an I κ B α/NF-κ B complex. Cell 95: 749–758. 16 BATCHELOR, A. H., PIPER, D. E., de LA BROUSSE, F. C., MCKNIGHT, S. L., WOLBERGER, C. (1998) The structure of GABP cc/P: an ETS domain-ankyrin repeat heterodimer bound to DNA. Science 279: 1037–1041. 17 VENKATARAMANI, R., SWAMINATHAN, K., MARMORSTEIN, R. (1998) Crystal structure of the CDK4/6 inhibitory protein p18INK4C provides insights into ankyrin-like repeat structure/function and tumor-derived p16INK4 mutations. Nat. Struct. Biol. 5: 74–81. 18 YANG, Y., NANDURI, S., SEN, S., QIN, J. (1998) The structural basis of ankyrin-like repeat function as revealed by the solution structure of myotrophin. Structure 6: 619–626. 19 BAUMGARTNER, R., FERNANDEZ-CATALAN, C., WINOTO, A., HUBER, R., ENGH, R. A., HOLAK, T. A. (1998) Structure of human cyclin-dependent kinase inhibitor p19INK4d: Comparison to known ankyrin-repeat-containing structures and implications for the dysfunction of tumor suppressor p16INK4a. Structure 6: 1279–1290.
20 PLATT, O. S., LUX, S. E., FALCONE, J. F. (1993) A highly conserved region of human erythrocyte ankyrin contains the capacity to bind spectrin. J. Biol. Chem. 268: 24421–24426. 21 KENNEDY, S. P., WARREN, S. L., FORGET, B. G., MORROW, J. S. (1991) Ankyrin binds to the 15th repetitive unit of erythroid and nonerythroid P-spectrin. J. Cell Biol. 115: 267–277. 22 DAVIS, L. H., BENNETT, V. (1990) Mapping the binding sites of human erythrocyte ankyrin for the anion exchanger and spectrin. J. Biol. Chem. 265: 10589–10596. 23 CIANCI, C. D., GIORGI, M., MORROW, J. S. (1988) Phosphorylation of ankyrin down-regulates its cooperative interaction with spectrin and band 3. J. Cell Biochem. 37: 301–315. 24 OTTO, E., KUNIMOTO, M., MCLAUGHLIN, T., BENNETT, V. (1991) Isolation and characterization of cDNAs encoding human brain ankyrins reveal a family of alternatively spliced genes. J. Cell Biol. 141: 241–253. 25 KUNIMOTO, M., OTTO, E., BENNETT, V. (1991) A new 440-kD isoform is the major ankyrin in neonatal rat brain. J. Cell Biol. 115: 1319–1331. 26 PETERS, L. L., JOHN, K. M., LU, F. M., EICHER, E. M., HIGGINS, A., YIALAMAS, M., TURTZO, L. C., OTSUKA, A. J., LUX, S. E. (1995) Ank 3 (epithelial ankyrin), a widely distributed new member of the ankyrin gene family and the major ankyrin in kidney, is expressed in alternatively spliced forms, including forms that lack the repeat domain. J. Cell Biol. 130: 313–330. 27 KORDELI, E., LAMBERT, S., BENNETT, V. (1995) AnkyrinG. A new ankyrin gene with neural-specific isoforms localized at the axonal initial segment and rode of Ranvier. J. Biol. Chem. 270: 2352–2359. 28 DEVARAJAN, P., SCARAMUZZINO, D. A., MORROW, J. S. (1994) Ankyrin binds to two distinct cytoplasmic domains of Na, K-ATPase α-subunit. Proc. Natl. Acad. Sci. USA 91: 2965–2969. 29 JORDAN, C., PUSCHEL, B., KOOB, R., DRENCKHAHN, D. (1995) Identification of a binding motif for ankyrin on the
15
16 Anchoring Proteins of the Erythrocyte Membrane
30
31
32
33
34
35
36
37
38
α-subunit of Na+ , K+ -ATPase. J. Biol. Chem. 270: 29971–29975. ZHANG, Z., DEVARAJAN, P., DORFMAN, A. L., MORROW, J. S. (1998) Structure of the ankyrin-binding domain of α-Na, K-ATPase, J. Biol. Chem. 273: 18681–18684. SRINIVASAN, Y., ELMER, L., DAVIS, J., BENNETT, V., ANGELIDES, K. (1988) Ankyrin and spectrin associate with voltage-dependent sodium channels in brain. Nature 333: 177–180. SMITH, P. R., SACCOMANI, G., JOE, E. H., ANGELIDES, K. J., BENOS, D. J. (1991) Amiloride-sensitive sodium channel is linked to the cytoskeleton in renal epithelial cells. Proc. Natl. Acad. Sci. USA 88: 6971–6975. LI, Z. P., BURKE, E. P., FRANK, J. S., BENNETT, V., PHILIPSON, K. D. (1993) The cardiac Na+ -Ca2+ exchanger binds to the cytoskeletal protein ankyrin. J. Biol. Chem. 268: 11489–11491. SMITH, P. R., BRADFORD, A. L., JOE, A. H., ANGELIDES, K. J., BENOS, D. J., SACCOMANI, G. (1993) Gastric parietal cell H+ , K+ -ATPase microsomes are associated with isoforms of ankyrin and spectrin. Am. J. Physiol. 264: C63–70. BOURGUIGNON, L. Y., JIN, H. (1995) Identification of the ankyrin-binding domain of the mouse T-lymphoma cell inositol 1,4,5-triphosphate (IP3) receptor and its role in the regulation of IP3-mediated internal Ca2+ release. J. Biol. Chem. 270: 7257–7260. ZHU, D., BOURGUIGNON, L. Y. (1998) The ankyrin-binding domain of CD44s is involved in regulating hyaluronic acid-mediated functions and prostate tumor cell transformation. Cell Motil. Cytoskeleton 39: 209–222. DUBREUIL, R. R., MACVICAR, G., DISSANAYAKE, S., LIU, C., HOMER, D., HORTSCH, M. (1996) Neuroglian-mediated cell adhesion induces assembly of the membrane skeleton at cell contact sites. J. Cell Biol. 133: 647–655. TUVIA, S., CARVER, T. D., BENNETT, V. (1997) The phosphorylation state of the F1GQY tyrosine of neurofascin determines ankyrin-binding activity and patterns of cell segregation.
39
40
41
42
43
44
45
46
47
48
49
Proc. Natl. Acad. Sci. USA 94: 12957– 12962. DOONER, G. J., BARKER, J. E., GALLAGHER, P. G., DEBATIS, M. E., BROWN, A. H., FORGET, B. G., BECHER, P. S. (2000) Gene transfer to ankyrin-deficient bone marrow corrects spherocytosis in vitro. Exp. Hematol. 28: 765–774. COHEN, C. M., DOTIMAS, E., KORSGREN, C. (1993) Human erythrocyte membrane protein band 4.2 (pallidin). Semin. Hematol. 30: 119–137. YAWATA, Y. (1994) Red cell membrane protein band 4.2: Phenotypic, genetic and electron microscopic aspects. Biochim. Biophys. Acta 1204: 131–148. YAWATA, Y. (1994) Band 4.2 abnormalities in human red cells. Am. J. Med. Sci. 307: 190–243. YAWATA, Y., KANZAKI, A., YAWATA, A. (2000) Genotypic and phenotypic expressions of protein 4.2 in human erythroid cells. Gene Funct. Dis. 2: 61–81. KORSGREN, C., COHEN, C. M., (1986) Purification and properties of human erythrocyte band 4.2. Association with the cytoplasmic domain of band 3. J. Biol. Chem. 261: 5536–5543. KORSGREN, C., COHEN, C. M., (1988) Association of human erythrocyte band 4.2. Binding to ankyrin and to the cytoplasmic domain of band 3. J. Biol. Chem. 263: 10212–10218. DOTIMAS, E., SPEICHER, D. W., GUPTA ROY, B., COHEN, C. M. (1993) Human erythrocyte band 4.2: Structural domain mapping of the protein purified by a novel procedure. Biochim. Biophys. Acta 1148: 19–29. RISINGER, M. A., DOTIMAS, E. M., COHEN, C. M. (1992) Human erythrocyte protein 4.2, a high copy number membrane protein, is N-myristylated. J. Biol. Chem. 267: 5680–5685. DAS, A. K., BHATLACHARYA, R., KUNDU, M., CHAKRABARTI, P., BASU, J. (1994) Human erythrocyte membrane protein 4.2 is palmitoylated. Eur. J. Biochem. 224: 575–580. RYBICKI, A. C., MUSTO, S., SCHWARTZ, R. S. (1995) Identification of a band 3 binding site near the N-terminus of
References
50
51
52
53
54
55
56
erythrocyte membrane protein 4.2. Biochem. J. 309: 677–681. INOUE, T., KANZAKI, A., YAWATA, A., TSUJI, A., ATA, K., OKAMOTO, N., WADA, H., HIGO, I., SUGIHARA, T., YAMADA, O., YAWATA, Y. (1994) Electron microscopic and physicobiochemical studies on disorganization of the cytoskeletal network and integral protein (band 3) in red cells of band 4.2 deficiency with a mutation (codon 142: GCT → ACT). Int. J. Hematol. 59: 157–175. GOLAN, D. E., CORBETT, J. D., KORSGREN, C., THATTE, H. S., HAYETTE, S., YAWATA, Y., COHEN, C. M. (1996) Control of band 3 lateral and rotational mobility by band 4.2 in intact erythrocytes: Release of band 3 oligomers from low-affinity binding sites. Biophys. J. 70: 1534–1542. RYBICKI, A., SCHWARTZ, R. S., HUSTEDT, E. J., COBB, C. E. (1996) Increased rotational mobility and extractability of band 3 from protein 4.2-deficient erythrocyte membranes: Evidence of a role for protein 4.2 in strengthening the band 3-cytoskeleton linkage. Blood 88: 2745–2753. YAWATA, Y., YAWATA, A., KANZAKI, A., INOUE, T., OKAMOTO, N., UEHIRA, K., YASUNAGA, M., NAKAMURA, Y. (1996) Electron microscopic evidence of impaired intramembrane particles and of instability of cytoskeletal network in band 4.2 deficiency in human red cells. Cell Motil. Cytoskeleton 33: 95–105. EBER, S. W., GONZALEZ, J. M., LUX, M. L., SCARPA, A. L., TSE, W. T., DORNWELL, M., ¨ ZCAN, R., HERBERS, J., KUGLER, W., O PEKRUN, A., GALLAGHER, P. G., SCHRO¨ TER, W., FORGET, B.G., LUX, S. E. (1996) Ankyrin-1 mutations are a major cause of dominant and recessive hereditary spherocytosis. Nature Genet. 13: 214–218. RYBICKI, A. C., MUSTO, S., SCHWARTZ, R. S. (1995) Decreased content of protein 4.2 in ankyrin-deficient normoblastosis (nb/nb) mouse red blood cells: Evidence for ankyrin enhancement of protein 4.2 membrane binding. Blood 86: 3583–3589. MANDAL, D., MOITRA, P. K., BASU, J. (2002) Mapping of a spectrin-binding domain of human erythrocyte membrane protein 4.2. Biochem. J. 364: 841–847.
57 PETERS, L. L., JINDL, H. K., GWYNN, B., KORSGREN, C., JOHN, K. M., LUX, S. E., MOHANDAS, N., COHEN, C. M., CHO, M. R., GOLAN, D. E., BRUGNARA, C. (1999) Mild spherocytosis and altered red cell ion transport in protein 4.2-null mice. J. Clin. Invest. 103: 1527–1537. 58 SHI, Z.-T., AFZAL, V., COLLER, B., PATEL, D., CHASIS, J. A., PARRA, M., LEE, G., PASZTY, C., STEVENS, M., WALENSKY, L., PETERS, L. L., MOHANDAS, N., RUBIN, E., CONBOY, J. G. (1999) Protein 4.1 R-deficient mice are viable but have erythroid membrane skeleton abnormalities. J. Clin. Invest. 103: 331–340. 59 KORSGREN, C., COHEN, C. M. (1991) Organization of the gene for human erythrocyte membrane protein 4.2: Structural similarities with the gene for the a subunit of factor XIII. Proc. Natl. Acad. Sci. USA 88: 4840–4844. 60 KORSGREN, C., LAWLER, J., LAMBERT, S., SPEICHER, D., COHEN, C. M. (1990) Complete amino acid sequence and homologies of human erythrocyte membrane protein band 4.2. Proc. Natl. Acad. Sci. USA 87: 613–617. 61 SUNG, L. A., CHIEN, S., CHANG, L.-S., LAMBERT, K., BLISS, S. A., BOUHASSIRA, E. E., NAGEL, R. L., SCHWARTZ, R. S., RYBICKI, A. C. (1990) Molecular cloning of human protein 4.2: A major component of the erythrocyte membrane. Proc. Natl. Acad. Sci. USA 87: 955–959. 62 SUNG, L. A., CHIEN, S., FAN, Y.-S., LIN, C. C., LAMBERT, K., ZHU, L., LAM, J. S., CHANG, L.-S. (1992) Human erythrocyte protein 4.2: Isoform expression, differential splicing, and chromosomal assignment. Blood 79: 2763–2770. 63 NAJFELD, V., BALLARD, S. G., MENNINGER, J., WARD, D. C., BOUHASSIRA, E. E., SCHWARTZ, R. S., NAGEL, R. L., RYBICKI, A. C. (1992) The gene for human erythrocyte protein 4.2 maps to chromosome 15q15. Am J. Hum Genet. 50: 71–75. 64 REMUS, R., ZESCHNIGK, M., ZUTHER, I., KANZAKI, A., WADA, H., YAWATA, A., MUIZNIEKS, I., SCHMITZ, B., SCHELL, G., YAWATA, Y., DOERFLER, W. (2001) The state of DNA methylation in the promoter regions of the human red cell membrane
17
18 Anchoring Proteins of the Erythrocyte Membrane
65
66
67
68
protein (band 3, protein 4.2 and P-spectrin) genes. Gene Funct. Dis. 2: 171–184. KORSGREN, C., COHEN, C. M., (1994) cDNA sequence, gene sequence, and properties of murine pallidin (band 4.2), the protein implicated in the murine pallid mutation. Genomics 21: 478–485. RYBICKI, A. C., SCHWARTZ, R. S., QIU, J. J. H., GILMAN, J. G. (1994) Molecular cloning of mouse erythrocyte protein 4.2: A membrane protein with strong homology with the transglutaminase supergene family. Mammalian Genome 5: 438–445. KARACAY, B., XIE, E., CHANG, L.-S. (1995) The murine erythrocyte protein-4.2-encoding gene: Similarities and differences in structure and expression from its human counterpart. Gene 158: 253–256. GWYNN, B., KORSGREN, C., COHEN, C. M., CICIOTTE, S. L., PETERS, L. L. (1997) The gene encoding protein 4.2 is distinct from the mouse platelet storage pool deficiency mutation pallid. Genomics 42: 532–535.
69 FRIEDRICHS, B., KOOB, R., KRAEMER, D., DRENCKHAHN, D. (1989) Demonstration of immunoreactive forms of erythrocyte protein 4.2 in nonerythroid cells and tissues. Eur. J. Cell Biol. 48: 121–127. 70 ZHU, L., KAHWASH, S. B., CHANG, L.-S. (1998) Developmental expression of mouse erythrocyte protein 4.2 mRNA: Evidence for specific expression in erythroid cells. Blood 91: 695–705. 71 WHITE, R. A., PETERS, L. L., ADKINSON, L. R., KORSGREN, C., COHEN, C. M., LUX, S. E. (1992) The murine pallid mutation is a platelet storage pool disease associated with the protein 4.2 (pallidin) gene. Nature Genet. 2: 80–83. 72 NOVAK, E. K., HUI, S.-W., SWANK, R. T. (1984) Platelet storage pool deficiency in mouse pigment mutations associated with seven distinct genetic loci. Blood 63: 536–544. 73 REDDINGTON, M., NOVAK, E. K., HURLEY, E., MEDDA, C., MCGARRY, M. P., SWANK, R. T. (1987) Immature dense granules in platelets from mice with platelet storage pool disease. Blood 69: 1300–1306.
1
Skeletal Proteins of the Erythrocyte Membrane Yoshihito Yawata
Kawasaki College of Allied Health Professions, Kurashiki City, Japan
Originally published in: Cell Membrane. Yoshihito Yawata. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30463-9
1 α- and β-Spectrins 1.1 Introduction
Spectrins are the most abundant and largest proteins of red cell membrane skeletal proteins. They constitute approximately 25–30% of the total membrane proteins or 75% of the membrane skeletal proteins, and are present at a concentration of about 200 000 copies per red cell [1–4]. Spectrins are composed of two subunits, the α-chain and the β-chain, which are thus called α-spectrin (2429 amino acids, about 280 kDa) and β-spectrin (2137 amino acids, about 246 kDa) (Fig. 1). The number of copies of both α- and β-spectrins is 240 × 103 per red cell. The α-spectrin gene (SPTA 1) [5–7] is located on chromosome 1 (1q22–q23), and the β-spectrin gene (SPTB) [8–10] is on 14q23–q24.2. They are present as a heterodimer (α 2 β 2 ). The α- and β-spectrins are structurally related, but functionally distinct [11]. The α- and β-chains are aligned side by side in an antiparallel arrangement with respect to their N- and C-terminal ends. Electron microscopically, the spectrin molecule appears to have a slender and twisted rod-like structure. The total length of a spectrin molecule is about 100 nm, when examined in its extended state under an electron microscope with the negative staining method. Spectrins are highly flexible to enable a variety of shape changes. Therefore, spectrins are considered as one of the major determinants of cell shape. Although the theoretical length of spectrin tetramers is about 200 nm, the actual end to end distance is about 76 nm, implying that spectrin tetramers are tightly coiled in the native ultrastructure [12]. These coiled spectrin tetramers are able to extend reversibly to a relaxed state when the membrane is stretched artificially. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Skeletal Proteins of the Erythrocyte Membrane
Fig. 1 Molecular structure of "- and $-spectrins. Heterotetramers of "- and $-spectrins are arranged in an anti-parallel fashion. The headto-head interaction and the side-by-side interaction (at the nucleation site) are shown in this figure. "-Spectrin is composed of five domains (I to V), and $-spectrin is composed of four domains (I to IV).
1.2 Structure of Red Cell Spectrins
α-Spectrin (280 kDa) can be identified on sodium dodecyl sulfate (SDS)–polyacrylamide gels as a 240 kDa polypeptide [1]. The NH2 -terminal end with an isolated, unpaired helix (helix C) is a self-association site with the COOH-terminal end of the β-spectrin molecule. Nine typical 106 amino acid repeats, which are conformation segments 1 to 9, follow after this self-association site [1–4]. There is a short central segment which lacks homology with the above-mentioned repeats, but is known to be related to SH3 domains as segment 10 [11]. A further 12 more repeats (segment 11 to 22) follow until the COOH-terminal end is reached, where two EF hand structures are present, which are involved in Ca2+ -binding and in regulating Ca2+ action in α-actinin and fodrin. However, the exact role of the SH3 domain and EF hand structures in human red cells in vivo are not known. The α-spectrin chain is divided into five domains, based on the results from limited tryptic digestion of α-spectrin, and designated αI (conformational segments 1 to 6: 80 kDa), αII (7 to 10 including SH3 segment: 46 kDa), αIII (11 to 14: 52 kDa), αIV (15 to 17: 41 kDa), and αV (18 to 22: 41 kDa). This method is useful for isolating and characterizing functional domains of normal and mutated spectrins in disease states, especially hereditary elliptocytosis and hereditary pyropoikilocytosis. β-Spectrin (246 kDa) is isolated as a 220 kDa polypeptide by SDS–polyacrylamide gel electrophoresis (PAGE) [1]. The structure of β-spectrin begins with a nonhomologous NH2 -terminal end that contains an actin-binding domain, followed by 17 homologous 106 amino acid repeat segments [1–4]. Subsequently there is the putative protein 4.1-binding site [13] and a short nonhomologous COOH terminal segment with a consensus sequence for at least four phosphorylation sites for casein kinase 1 [14]. The state of phosphorylation is related to membrane mechanical stability, since it is known that increased phosphorylation decreases membrane
1 "- and $-Spectrins
stability, and also that decreased phosphorylation increases it [15]. Functionally, repeat 15 and part of repeat 16 are arranged in a β sheet structure and form the binding site for ankyrin [16]. β-Spectrin is also divided into four domains, designated as βIV (conformational segments 1 to 7: 74 kDa), βIII (8 to 10: 33 kDa), βII (11 to 16: 65 kDa), and βI (17 down to the COOH end: 23 kDa). In the β-spectrin, no SH3 segment is present. Nucleation sites of α-spectrin and β-spectrin are in αV (conformation segment 19 to 22), and in βIV (1 to 4), respectively. The presence of repeats in α- and β-spectrins strongly suggests that spectrin evolved from the duplication of a single ancestral gene [5–10]. The homologous 106 amino acid repeats in α- and β-spectrin fold into α-helical segments containing three antiparallel helices which are connected by short nonhelical segments. Each repeat has a triple-helical structure that is about 5 nm long and 2 nm wide. The unit is rotated 60◦ (right-handed) relative to the neighboring repeats [17, 18]. Each repeat is composed of three helices, i.e., helix A, helix B, and helix C. The α-helix (helix A) with 28 amino acids in a straight arrangement reverses itself and forms the next α-helix (helix B), which is 34 amino acids long. This is followed by another reverse turn and the third 31 amino acid α-helix (helix C), which bends in the middle. These three helices are in a triangular array and are bound together by both hydrophobic and electrostatic interactions. Hydrophobic amino acids are aligned on one face of each helix, and are spaced every third or fourth residue, because an α-helix makes one turn every 3.6 residues. Additional salt bonds are present between the mostly polar amino acids, particularly between helices A and C, and B and C. The three helices are tilted away from each other by 10◦ to 20◦ , so that the COOH-terminal end of each repeat is wider than the NH2 -terminal end. This molecular arrangement enables the following repeat to be attached without any change to the structure. The repeats connect to each other through the helices A and C, forming one long α-helix. Helix B of the proximal repeat overlaps helix A of the distal repeat, because the connection is tight. Interactions between the two helices appear to restrict the mobility of the repeats at the repeat junction. Several residues appear to be highly conserved in the repeat segments, such as tryptophan at position 45, leucine at position 26, arginine at 22, aspartate at 38, aspartate or glutamate at 41, lysine at 71, histidine at 101, and hydrophobic amino acids at 1, 12, 15, 26, 35, 46, and 68, respectively [11]. Although the presence of homologous 106 amino acid repeats suggests that spectrin evolved from duplication of a single ancestral minigene, the genomic organization of both α- and β-spectrin genes in human beings reveals that the size and position of exons does not necessarily correspond to the structural or conformational unit of spectrin repeats. 1.3 Functions of Red Cell Spectrins
Spectrins are the key protein for the composition of the cytoskeletal network, which regulates cell shape, membrane deformability and stability, and the lateral mobility of band 3 as an integral protein [1–4]. Spectrin has spring-like properties [19].
3
4 Skeletal Proteins of the Erythrocyte Membrane
Fig. 2 Demonstration of spectrin dimer (A)and spectrin tetramer (B) by electron microscopy with the shadowing method. By courtesy of the late Professor Jiri Palek.
These important functions of spectrins are mainly mediated by self-association of the spectrins themselves. Spectrin heterodimers (αβ) associate to form spectrin heterotetramers (α 2 β 2 ) and higher oligomers (Fig. 2). On the membrane, tetramer (α 2 β 2 ) is present predominantly. Spectrins stay as the tetramer at a physiological ionic strength and low-temperature (25 ◦ C), whereas they dissociate into dimers at low ionic strength and 37 ◦ C. The equilibrium is kinetically frozen at 0 ◦ C. In normal red cell ghosts, approximately 5% of the extracted spectrin is in the dimeric form and about 50% is in the tetrameric form. The remainder is the spectrin oligomers and the protein complexes of spectrins, actin, protein 4.1, and dematin, which have very high molecular weights. This procedure has been ulitized for elucidating molecular abnormalities of spectrins in the disease states, especially hereditary ellipto-cytosis. For the self-association of spectrins, two mechanisms have been considered: (a) the head-to-head interaction of the spectrins, and (b) their side-by-side interaction (Fig. 3). For the first mechanism, the interconversion of spectrin dimers to tetramers requires a reversible opening of the dimeric bond and formation of two new β attachments. The αβ contact closely resembles the triple-helical structure
Fig. 3 Head-to-head contact of "- and $-spectrins in normal red cells. $-Spectrins are shown as shaded helices, and "-spectrins are denoted as open helices. A polymerization site is composed of the last two helices of $-spectrins and the first one helix of "-spectrin.
1 "- and $-Spectrins
of native spectrin repeats [19, 20]. In the contact site, two of the helices (helices A and B) at the COOH-terminal end of the β-spectrin molecule are bound to one helix (helix C) of the NH2 -terminal end of the α-spectrin molecule [20]. The second mechanism is the side-by-side association of the α- and β-spectrins, which occurs in a zipper-like process. It starts with a defined nucleation site composed of four repeats, that is, α19 to α22 from the α-spectrin, and β1 to β4 from the β-spectrin, respectively [21]. These repeats are located at the tail end of each molecule, to which actin binds. Two of the α repeats and one of the β repeats have eight residue insertions which participate in the interchain interaction [22]. These are located between conformational segments α20/α21 and β2/β3. After the initial tight association of the complementary nucleation sites, a conformational change is initiated that promotes the pairing of the remaining part of the two chains [23]. A common α-spectrin polymorphism, α LELY interferes with normal nucleation and decreases the synthesis of functionally-competent α-spectrin chains [24]. This may influence clinical expression in spectrin mutations [25]. The linkage of spectrins to other red cell membrane proteins is also functionally important. Binding of spectrins to ankyrin requires almost the entire 15th repeat segment and a small portion of the 16th repeat of the β-spectrin molecule [16]. It has been reported that β-spectrin and nonerythroid β-spectrin (β-fodrin) contain an ankyrin-independent site in the NH2 -terminal half of each molecule, which binds to brain membranes. The binding is inhibited by Ca2+ /calmodulin. This site is known as the membrane association domain 1 (MAD1) [26]. β-Fodrin and the muscle/brain isoform of red cell β-spectrin have a second Ca2+ /calmodulinindependent site (MAD2) near to the COOH-terminal end. Another linkage of spectrins to the membrane is mediated by association with the junctional complex that includes spectrin, actin, and protein 4.1, which form a ternary or higher-order complex by linking spectrin tetramers to one another in a tail-to-tail fashion [4]. The actin binding site is located at a region near to the NH2 -terminal end of β-spectrin that contains a 27 amino acid sequence, which is also highly conserved in α-actinin, dystrophin, actin-binding protein (ABP–120), and others. 1.4 Erythroid and Nonerythroid Spectrins
There are a wide variety of spectrin-related proteins in many tissues and animal species [3, 4]. Two closely related, yet distinct mammalian α-subunits of spectrin are known, i.e.: α-spectrin in mature erythroid cells and α-fodrin in all other tissues [27]. In mammals, there are two β-subunits, i.e.: β-spectrin and β-fodrin. A third subunit (β-heavy spectrin) with a molecular weight of 430 kDa is present in Drosophila. β-spectrin exists in both erythroid and muscle/brain isoforms [28–30]. Nonerythroid spectrins are basically composed of two nonidentical, high molecular weight subunits (such as α-spectrin and β-spectrin), which are composed of homologous repeat units of 106 amino acids.
5
6 Skeletal Proteins of the Erythrocyte Membrane
Fodrin (synonyms: tissue spectrin, brain spectrin, spectrin II, or spectrin G) is a heterodimer of α- and β-fodrin chains [2]. The biological functions are essentially the same as those in red cell spectrin (spectrin I, or spectrin R). However, there are some differences between them. The α-fodrin has a calmodulin-binding site which is not present in α-spectrin. In addition, β-fodrin contains a pleckstrin homology domain which is a sequence motif in MAD2, but red cell spectrin does not. Erythroid spectrins, nonerythroid spectrins, dystrophin and its homologues, and α-actinin are believed to belong to the so-called spectrin super-family These proteins have the same characteristic features such as the flexible, rod-like shapes of the side-by-side arrangement of the proteins in an antiparallel fashion. They are composed of homologous amino acid repeats of 106, 109, and 120 residues of spectrin, dystrophin, and α-actinin, respectively, which are folded into triple α-helical segments [1, 2]. Amongst these there is homology in the sequence of the triple-helical repeats with conservation of tryptophan at position 45 or 46. The composition of the repeats are homologous mainly between spectrin and dystrophin, whereas that of the actinin repeat is less homologous. The COOH-terminal regions are homologous between α-spectrin, dystrophin, and actinin, which have the potential for Ca2+ -binding, EF hand structures. The NH2 -terminal regions are also homologous in β-spectrin, dystrophin, and α-actinin which demonstrate a potential actin-binding site. Great variability exists with respect to the subcellular localization of erythroid and nonerythroid spectrins [1–4]. The spectrin superfamily proteins are essentially present in the membrane skeleton of the mammalian red cells, but are also expressed in various tissues by developmentally regulated mechanisms, such as neural development, oogenesis, epithelial cell polarity, embryogenesis, viral transformation, the IgG binding to cell surface receptors, and apoptosis. It has been shown that tissue-specific differential modification of the COOH-terminal region of β-spectrin produces various β-spectrin isoforms. The best example is a muscle isoform of β-spectrin [28]. The isoform results from alternative mRNA splicing of the mRNA transcript of the erythroid β-spectrin gene. The splicing alters translation termination near to the COOH-terminus, leading to elongation of the COOH-terminus of the muscle form. At the present time, two α-spectrin isoforms and five β-spectrin isoforms have been reported [29]. A β III isoform is localized on the Golgi apparatus in many cell types [30].
2 Protein 4.1
Protein 4.1 in red cells is a phosphoprotein present in 200 × 103 copies per cell [31, 32]. The cloned protein is globular (5.7 nm diameter) and has a molecular weight of 66 kDa, but migrates as a 78 or 80 kDa protein on SDS–PAGE gels. The protein 4.1 gene (EL1 or EPB41) [33–35] is located on chromosome 1p36.1 [36], which encodes 588 amino acids.
2 Protein 4.1
2.1 Structure of Protein 4.1
Chymotryptic digestion and limited sequencing demonstrate the presence of four domains of red cell protein 4.1 [32]: (1) a 30 kDa domain (residues 1 to ∼300) at the NH2 -terminal end, (2) a 16 kDa domain (residues 300 to 404), (3) a 10 kDa domain (residues 405 to 471), and (4) a 22 to 24 kDa domain (residues 472 to 622) at the COOH end (Fig. 4). In red cells, there are two forms of protein 4.1 with different molecular weights, that is, protein 4.1a (80 kDa) and protein 4.1b (78 kDa). Protein 4.1a, with the higher molecular weight, results from the deamidation of two Asn residues (478 and 502) within the 22 to 24 kDa domain in a non-enzymatic, age-dependent fashion that lowers the mobility of the protein in gels [37]. Therefore, protein 4.1a is hardly apparent in young red cells, and increases as red cells age. Thus, protein 4.1a provides a useful indicator of red cell senescence. Considering the characteristic features of the structure of protein 4.1, clustering of cysteine residues is observed near the NH2 -terminal, and O-linked glycosylation is present in the 10 kDa domain [31]. The glycosylation contributes to its higher apparent molecular weight on polyacrylamide gels than is predicted on the basis of known amino acid sequence. In addition, the NH2 -terminus is definitely basic, whereas the COOH-terminus is clearly acidic. It has been known that red cell protein 4.1 is highly phosphorylated [31]. Phosphorylation sites are located near the COOH-terminal end of the 30 kDa domain and in the middle region of the 10 kDa domain. The latter site appears to be cyclic AMP dependent, but the others may be regulated by protein kinase C. At least one site appears to be a substrate for cdc kinase. 2.2 Binding to Other Membrane Proteins
The most important role of protein 4.1 is in the linkage of the spectrin–actin membrane skeleton to the lipid bilayer by facilitating complex formation between the spectrin–actin fibers [31, 38], the cytoplasmic domain of band 3 [39], p55 [40], and glycophorin C [41]. Protein 4.1 binds tightly (binding coefficient K d ∼ 10−7 M) to β-spectrin very close to the actin-binding site [13]. For this activity, a 21 amino acid within the 10 kDa domain is critical. The interaction of protein 4.1 with spectrin and actin is blocked by protein kinase A phosphorylation at residues in Ser331 in the 16 kDa domain and Ser467 in the 10 kDa domain, and also by tyrosine kinase phosphorylation at Tyr418 in the 10 kDa domain [31]. The ternary complex is also regulated by Ca2+ and calmodulin [42]. Calmodulin binds to a site within the 30 kDa domain of protein 4.1 in a Ca2+ -independent manner (K d ∼ 5 × 10−7 M). The 30 kDa domain of erythroid protein 4.1 is involved in its binding to proteins such as p55, glycophorin C, band 3, and other embedded moieties, e.g., the chloride channel pICln [43]. Protein 4.1 binds directly to glycophorin C to a site near amino
7
8 Skeletal Proteins of the Erythrocyte Membrane
Fig. 4 Molecular structure of protein 4.1. A schematic structure of protein 4.1 molecule is shown in (I), and the protein 4.1 mRNA is demonstrated in (II). The details are given in text.
2 Protein 4.1
acids 82 through 98 in the tail of this molecule [41]. It also binds to a more distal portion near residues 112 through 128, by means of the protein p55 [40]. The role of the interaction between protein 4.1 and glycophorin C is well established. Protein 4.1 deficient red cells are also deficient in glycophorin C, but not glycophorin A or band 3. The residual glycophorin C that is only loosely bound to the skeleton becomes tightly bound after protein 4.1 is reconstituted. Both protein 4.1 and glycophorin C bind p55, which enhances or regulates their interaction. The mechanical weakness of glycophorin C-deficient membranes is totally restored by restoration of protein 4.1 or its 10 kDa spectrin–actin binding domain. Under the physiological condition in vivo, approximately 40% of protein 4.1 is bound directly to glycophorin C, 40% is bound indirectly through the protein p55, and 20% is bound to band 3. 2.3 Extensive Alternative Splicings
It is known that protein 4.1 is extremely heterogeneous with respect to molecular weight, abundance, and intracellular localization [31]. Red cell protein 4.1 (P4.1R) isoforms come from a single genomic locus near the Rh locus at chromosome 1p36.1 [36]. Its primary mRNA transcript from the gene (250–300 kb long) is subjected to extensive alternative splicing, producing a diverse protein family through various mRNA (6.5 to 7.0 kb long) [44] (Fig. 5). At the present time, 12 alternatively spliced exons, as well as an important cryptic acceptor site, are known. Protein 4.1 utilizes two different initiation codons, termed “upstream” and “downstream” initiation codons, which translate to isoforms of 135 and 85 kDa, respectively [31]. The 85 kDa isoform is created by the splicing out of a 17 nucleotide motif that contains the upstream translation initiator. In most tissues, this sequence is spliced in, and a downstream 80 base pair motif is spliced out, producing the elongated 135 kDa isoform that contains an additional 209 amino acids attached to the NH2 -terminus of the shorter isoform of protein 4.1. This is because the upstream translation start site, when present, is used almost exclusively [33]. The 85 kDa shorter isoform which is encoded by the downstream initiation codon is found primarily in red cells. In erythroid protein 4.1 (P4.1R), splicings appear to be critical for red cell functions in only two regions, i. e., at the 10 kDa spectrin–actin binding domain, and in the 5 -untranslated region [31]. At the 10 kDa domain, three alternatively spliced exons are located at the 5 -end of this domain, which is the spectrin–actin binding domain. Exon 16, which is one of the three exons and is 63 nucleotides long, encodes for 21 amino acids at the 5 -terminus. This exon 16 is also expressed in muscle and testis [45], which are actually nonerythroid tissues. Two other exons (exons 14 and 15) are adjacent to the one responsible for binding to spectrin–actin. In red cells, the protein 4.1 mRNA contains only exon 16 out of these three exons [33, 46], whereas lymphocytes have
9
10 Skeletal Proteins of the Erythrocyte Membrane
Fig. 5 Extensive splicing of the protein 4.1 gene. Complicated splicings are observed around the regions of exons 3 to 5, 14 to 16, and 17A to 20.
2 Protein 4.1
none of the exons, and the brain retains all three exons [45]. Therefore, exon 16 appears to provide erythroid and stage-specific expression.
2.4 Nonerythroid Protein 4.1 Isoforms
Multiple alternative mRNA splicing events generate isoforms with different affinities for membrane and intracellular structures. The high (120–150 kDa) and low molecular weight isoforms (60–90 kDa) coexist in varying ratios in most nonerythroid tissues. Although the functional importance of the high molecular weight isoforms remains unknown, their major part of protein 4.1, which is 80 kDa upstream from the COOH-terminus, is identical to their low molecular weight analogues. Therefore, the additional amino acid residues in the high molecular weight isoforms are equivalent to a headpiece [31, 44]. It is interesting to note that high molecular weight isoforms associate with NuMa (a major organizing protein of the mitotic apparatus) [40, 47, 48], ZO-2 (a component of the tight junctions in epithelial and endothelial cells), and other membrane associated guanylate kinase (MAGUK) family proteins, in addition to p55 [40, 49, 50]. The diverse structure of these spliced isoforms appears to provide for the multiple subcellular locations for their specific functions. Although these isoforms demonstrate their widespread expression and interaction with key components, such as mitotic spindles and tight junctions, there is a surprising fact that erythroid protein 4.1 gene-targeted knockout mice exhibit only anemia and subtle neurological abnormalities [51]. As regards the high molecular weight isoforms, their junctional significance is still unknown [31]. The high molecular weight isoforms are completely absent from mature red cells. mRNA containing the 17 amino acid extension of exon 2, which is required for the synthesis of high molecular weight forms, is essentially absent from early erythroid precursors (younger than proerythroblasts) [33, 45]. Only the downstream translation initiation site is utilized for erythroid protein 4.1 biosynthesis during erythropoiesis. It is now clear that protein 4.1 and its family analogues demonstrate a mosaic of sequence homologues with other proteins [52]. A typical example is the 20–45% homology, which is observed at the NH2 -terminus of protein 4.1 family proteins. The FERM (Four. 1 protein, Ezrin, Radixin, and Moesin) domain, which is present at the NH2 -terminus, defines a family of proteins linking the cytoskeleton including actin to membranes in various cells [53, 54]. The FERM has previously been known as a family of actin binding proteins (moesin, ezrin, radixin, and merlin), which are also homologous with coraclin. This is a Drosophila protein that interacts with a large disk protein (Dlg) homologous with tumor suppressor proteins in mammals [55]. Erythroid protein 4.1 (P4.1R) is the prototype of a much more closely related family of at least four members, such as, P4.1 of a generally distributed type (P4.1G), that of a brain type (P4.1B), or that expressed predominantly in neuron (P4.1N). The extent of the homology among these isoforms is 60–95% [56, 57].
11
12 Skeletal Proteins of the Erythrocyte Membrane
3 Actin
Red cell actin (β-actin), which is a 43 kDa protein, is present in abundance (400–500 × 103 copies per red cell). The red cell actin content is 5.5% (w/w) of the total red cell membrane proteins [58–60]. The β-actin gene (ACTB) is located on the chromosome 7 (7p12–p22) encoding 375 amino acids. Red cell actin is similar to other actins in its structure and functions. Although the β-actin is distributed in various nonmuscle cells including red cells, red cell actin is organized as short, double-helical F-actin protofilaments 12 to 13 monomers long and approximately 35 nm in length. Red cell actin interacts with spectrins, adducin, protein 4.1, and tropomyosin for the sake of their stability [38, 58, 61, 62]. The actin also binds to tropomodulin by capping of the slow growing or pointed end of the actin filament [63]. The state of actin polymerization is functionally important to red cell membrane flexibility, which increases when actin polymerization is inhibited. It is also true that increased polymerization of actin makes the red cell membrane more rigid. Spectrin dimers bind to the side of actin filaments at a site near the tail end of the spectrin molecule. On average, six spectrin ends make a complex with each actin oligomer, producing an irregular network, which is approximately hexagonal [60]. Each spectrin–actin junction is stabilized by the formation of a ternary complex with protein 4.1 [38].
4 Other Minor Skeletal Proteins 4.1 The p55 Protein
The p55 protein is a phosphoprotein member of the MAGUK (membraneassociated guanylate kinase) family of proteins [40, 58]. The protein is a 55 kDa skeletal protein, which has 80 × 10 copies per red cell. The p55 gene (MPP1) is located on the X chromosome (Xq28), which encodes 466 amino acids. The protein p55 is the human homolog of dlg, a Drosophila tumor suppressor gene [40]. This protein is composed of three domains: (1) an NH2 -terminal domain of unknown function that is also present in dlg, (2) a central SH3 domain that is embedded in the MAGUK domain, and (3) a COOH-terminal guanylate kinase domain [64, 65]. The protein appears to be present in a dimeric form, extensively palmitoylated, and tightly bound to the membrane [66]. Homologues of the p55 include signal transduction proteins, tumor suppressor genes, and proteins important in cell-to-cell interactions. This protein is expressed throughout erythroid differentiation and is widely expressed in nonerythroid tissues. The p55 binds to the 30 kDa domain of protein 4.1 through a 39 amino acid binding motif in the COOH-terminal MAGUK domain and to the cytoplasmic tail of glycophorin C by means of its single PDZ motif. The p55 binds to the 30 kDa
4 Other Minor Skeletal Proteins 13
domain of protein 4.1 (K d ∼2 × 10−9 M) and to the cytoplasmic domain of glycophorin C (K d ∼7 × 10−9 M). In the disease states, patients who are actually deficient in either protein 4.1 or glycophorin C also lack the p55 [67]. 4.2 Adducin
Adducin, which is a Ca2+ /calmodulin-binding phosphoprotein, is composed of αβ-adducin heterodimers [68, 69]. This protein is located at the spectrin–actin junctional complex [58]. The α and β adducin are structurally similar proteins encoded by separate genes. A theoretical molecular weight of α-adducin, which is calculated from the results of gene sequencing, is 81 kDa, 1% of the total red cell membrane proteins, and with 30 × 103 copies per red cell. However, the actual molecular weight of this protein is 103 kDa on the SDS-PAGE gels, which is quite larger than the theoretical one, because some other components (such as glycosylated chains) are present on this molecule. β-Adducin (80 kDa) is 97 kDa on the SDS-PAGE gels, 1% of the total red cell membranes proteins, and with 30 × 103 copies per red cell. The α-adducin gene (ADDA) is located on chromosome 4 (4p16.3), which encodes 737 amino acids. The β-adducin gene (ADDB) is located on chromosome 2 (2p13–2p14), which encodes 726 amino acids. A third nonerythroid adducin (γ -adducin) is also known. The subunits have three domains: (1) an NH2 -terminal domain (39 kDa) with a globular head, (2) a 9 kDa domain of the connecting neck, and (3) a protease-sensitive domain (33 kDa) with an extended COOH-terminal tail [68, 70]. The last domain contains mainly hydrophilic residues and 22 amino acid segments homologous with the MARKS phosphorylation domain that regulates Ca2+ /calmodulin-regulated capping and bundling of actin filaments [71]. The four head domains cluster in a globular core, and the tail domains extend to interact with spectrins and actin. Adducin increases the binding of spectrin to actin, just like protein 4.1, whereas adducin does not interact directly with spectrin in the absence of actin, unlike protein 4.1. The tails of both α-and β-adducin bind to the actinbinding domain at the N-terminus of β-spectrin, and to the second spectrin repeat [72]. Adducin is expressed at the stage of erythroblasts, but is only incorporated into the red cell membrane structure at a late stage in erythroid development [73]. Adducin contributes to the early assembly of the spectrin–actin complex, which is regulated by phosphorylation of the COOH-terminal domain of adducin by protein kinase C [74, 75]. In this event, adducin does not bind directly to spectrin in the absence of actin. Targeted inactivation of β-adducin in mice produced compensated spherocytic anemia and neurologic abnormalities [76]. For this assembly of a spectrin–actin–adducin ternary complex, the actin-binding domain of β-spectrin and the first two spectrin repeats are required. Adducin is also present in various nonerythroid cells [68], especially at sites of cell-to-cell contact. The protein assembles at these sites in response to extracellular Ca2+ , and dissociates when phosphorylation of the protein is activated by protein kinase C.
14 Skeletal Proteins of the Erythrocyte Membrane
4.3 Dematin (Protein 4.9)
Dematin has been known to be associated with actin in erythroid and nonerythroid cells [58]. Human red cell dematin consists of two chains of 48 kDa and 52 kDa, which are present in a ratio of 3 (48 kDa) to 1 (52 kDa). The native protein is a trimer [77]. There are 40 × 103 trimeric copies per cell, and the content of dematin is approximately 1% of the total red cell membrane proteins. The dematin gene (EPB 49) is located on chromosome 8 (8p21.1), which encodes 383 amino acids [77, 78]. Dematin has two binding sites for β-actin, and bundles actin filaments into cables, which is inhibited by phosphorylation with protein kinase A, but not with protein kinase C. The COOH-terminal half of the 48 kDa subunit is similar to villin, which is known as an actin-binding protein that induces growth of microvilli and reorganizes actin filaments in brush borders. The 52 kDa subunit of dematin has an additional 22 amino acid sequence in the C-terminal domain of the 48 kDa subunit [77]. This sequence resembles that in protein 4.2 [79]. Dematin appears to attach to a lipid or integral membrane protein, since it remains associated with the membrane during the extraction of other skeletal proteins. 4.4 Tropomyosin
It is known that tropomyosin is also associated with actin [58]. Red cell tropomyosin is a heterodimer of 27 kDa and 29 kDa subunits on SDS–PAGE gels, on which tropomyosin migrates in the region of band 7 [80, 81]. There are 80 × 103 copies per cell, and its content is approximately 1% of the total red cell membrane proteins. The tropomyosin gene (TPM 3) is located on chromosome 1 (1q31), which encodes 239 amino acids. Stoichiometrically, one copy of tropomyosin binds to every six to eight actin monomers, which is just sufficient to line both grooves of the actin protofilament. The function of tropomyosin appears to be for stabilizing the short erythroid actin filaments and to help spectrin–actin interactions [82]. 4.5 Tropomodulin
Tropomodulin is a 41 kDa protein that binds to tropomyosin in a 2:1 molar ratio with a K d of 5 × 10−7 M [83]. Each protein binds to the NH2 -terminal region of the other. The molecular size of tropomodulin is 43 kDa on SDS–PAGE gels [58]. There are 30 × 103 copies per cell. The tropomodulin gene (TMOD) is located on chromosome 9 (9q22), which encodes 359 amino acids. This protein also binds to actin. Capping is enhanced when the grooves of the actin filament are lined with tropomyosin. Tropomodulin appears to have a tight association with the membrane.
References
4.6 Other Membrane Proteins
Myosin [58, 84, 85] and caldesmon (a 71 kDa calmodulin-binding protein) [86] are known as minor components of the red cell membrane proteins, although their physiological functions have not been elucidated in detail.
References 1 GALLAGHER, P. G., FORGET, B. G. (1993) Spectrin genes in health and disease. Semin. Hematol. 30: 4–20. 2 WINKELMANN, J. C., FORGET, B. G. (1993) Erythroid and nonerythroid spectrins. Blood 81: 3173–3185. 3 MORROW, J. S., RIMM, D. L., KENNEDY, S. P., CIANCI, C. D., SINARD, J. H., WEED, S. A. (1997) Of membrane stability and mosaics: The spectrin cytoskeleton, in: Handbook of Physiology (Hoffman, J., Jamieson, J. eds.), Oxford, London, pp. 485. 4 BENNETT, V., LAMBERT, S. (1991) The spectrin skeleton: From red cells to brain. J. Clin. Invest. 87: 1483–1489. 5 SAHR, K. E., LAURILA, P., KOTULA, L., SCARPA, A. L., COUPAL, E., LETO, T. L., LINNENBACH, A. J., WINKELMANN, J. C., SPEICHER, D. W., MARCHESI, V. T., CURTIS, P. J., FORGET, B. G. (1990) The complete cDNA and polypeptide sequences of human erythroid α-spectrin. J. Biol. Chem. 265: 4434–4443. 6 HUEBNER, K., PALUMBO, A. P., ISOBE, M., KOZAK, C. A., MONACO, S., ROVERA, G., CROCE, C. M., CURTIS, P. J. (1985) The α-spectrin gene is on chromosome 1 in mouse and man. Proc. Natl. Acad. Sci. USA 82: 3790–3793. 7 KOTULA, L., LAURY-KLEINTOP, L. D., SHOWE, L., SAHR, K., LINNENBACH, A. J., FORGET, B. G., CURTIS, P. J. (1991) The exon-intron organization of the human erythrocyte α-spectrin gene. Genomics 9: 131–140. 8 WINKELMANN, J. C., CHANG, J. G., TSE, W. T., SCARPA, A. L., MARCHESI, VT., FORGET, B. G. (1990) Full-length sequence of the cDNA for human erythroid beta-spectrin. J. Biol. Chem. 265: 11827–11832. 9 FUKUSHIMA, Y., BYERS, M. G., WATKINS, P. C., WINKELMANN, J. C., FORGET, B. G.,
10
11
12
13
14
15
SHOWS, T. B. (1990) Assignment of the gene for β-spectrin (SPTB) to chromosome 14q23–q24.2 by in situ hybridization. Cytogenet. Cell Genet. 53: 232–233. AMIN, K. M., SCARPA, A. L., WINKELMANN, J. C., CURTIS, P. J., FORGET, B. G. (1993) The exon-intron organization of the human erythroid beta-spectrin gene. Genomics 18: 118–125. SPEICHER, D. W., MARCHESI, V. T. (1984) Erythrocyte spectrin is comprised of many homologous triple helical segments. Nature 311: 177–180. VERTESSY B.G., STECK, T. L. (1989) Elasticity of the human red cell membrane skeleton. Effects of temperature and denaturants. Biophys. J. 55: 255–262. BECKER, P. S., SCHWARTZ, M. A., MORROW, J. S., LUX, S. E. (1990) Radiolabel-transfer cross-linking demonstrates that protein 4.1 binds to the N-terminal region of beta-spectrin and to actin in binary interactions. Eur. J. Biochem. 193: 827–836. MISCHE, S. M., MORROW, J. S. (1990) Multiple kinases phosphorylate spectrin, in: Molecular and Cellular Biology of Normal and Abnormal Erythrocyte Membranes (COHEN, C. M., PALEK, J. eds.), Alan R. Liss, New York, pp 113–130. ZIEMNICKA-KOTULA, D., XU, J., GU, H., POTEMPSKA, A., KIM, K. S., JENKINS, E. C., TRENKNER, E., KOTULA, L. (1998) Identification of a candidate human spectrin Src homology 3 domain-binding protein suggests a general mechanism of association of tyrosine kinases with the spectrin-based membrane skeleton. J. Biol. Chem. 273: 13681–13692.
15
16 Skeletal Proteins of the Erythrocyte Membrane
16 KENNEDY, S. P., WARREN, S. L., FORGET, B. G., MORROW, J. S. (1991) Ankyrin binds to the 15th repetitive unit of erythroid and nonerythroid β-spectrin. J. Cell Biol. 115: 267–277. 17 YAN, Y., WINOGRAD, E., VIEL, A., CRONIN, T., HARRISON, S. C., BRANTON, D. (1993) Crystal structure of the repetitive segments of spectrin. Science 262: 2027–2030. 18 WINOGRAD, E., HUME, D., BRANTON, D. (1991) Phasing the conformational unit of spectrin. Proc. Natl. Acad. Sci. USA 88: 10788–10791. 19 ALTMANN, S. M., GRU¨ NBERG, R. G., LENNE, P.-F., YLANNE, J., RAAE, A., HERBERT, K., SARASTE, M., NILGES, M. HORBER, J.K. (2002) Pathways and intermediates in forced unfolding of spectrin repeats. Structure 10: 1085–1096. 20 SPEICHER, D. W., DESILVA, T. M., SPEICHER, K. D., URSITTI, J. A., HEMBACH, P., WEGLARZ, L. (1993) Location of the human red cell spectrin tetramer binding site and detection of a related “closed” hairpin loop dimer using proteolytic footprinting. J. Biol. Chem. 268: 4227–4235. 21 SPEICHER, D. W., WEGLARZ, L., DESILVA, T. M. (1992) Properties of human red cell spectrin heterodimer (side-to-side) assembly and identification of an essential nucleation site. J. Biol. Chem. 267: 14775–14782. 22 URSITTI, J. A., KOTULA, L., DESILVA, T. M., CURTIS, P. J., SPEICHER, D. W. (1996) Mapping the human erythrocyte beta-spectrin dimer initiation site using recombinant peptides and correlation of its phasing with the alpha-actinin dimer site. J. Biol. Chem. 271: 6636–6644. 23 VIEL, A., GEE, M. S., TOMOOKA, L., BRANTON, D. (1998) Motif involved in interchain binding at the tail-end of spectrin. Biochim. Biophys. Acta 1384: 396–404. 24 WILMOTTE, R., HARPER, S. L., URSITTI, J. A., MARECHAL, J., DELAUNAY, J., SPEICHER, D. W. (1997) The exon 4b-encoded sequence is essential for stability of human erythroid alpha-spectrin and heterodimer formation. Blood 90: 4188–4196.
25 ALLOISIO, N., MORLE´ , L., MARE´ CHAL, J., ROUX, A. F., DUCLUZEAU, M. T., GUETARNI, D., POTHIER, B., BAKLOUTI, F., GHANEM, A., KASTALLY, R., DELAUNAY, J. (1991) Spα V/4I : A common spectrin polymorphism at the α IV -α V domain junction. Relevance to the expression level of hereditary elliptocytosis due to α-spectrin variants located in trans. J. Clin. Invest. 87: 2169–2177. 26 LOMBARDO, C. R., WEED, S. A., KENNEDY, S. P., FORGET, B. G., MORROW, J. S. (1994) βII-spectrin (fodrin) and βI 2-spectrin (muscle) contain NH2 - and COOH-terminal membrane association domains (MAD1 and MAD2). J. Biol. Chem. 269: 29212–29219. 27 LETO, T. L., FORTUGNO-ERIKSON, D., BARTON, D., YANG-FENG, T. L., FRANCKE, U., HARRIS, A. S., MORROW, J. S., MARCHESI, V. T., BENZ, E. J., Jr., (1988) Comparison of nonerythroid α-spectrin genes reveals strict homology among diverse species. Mol. Cell Biol. 8: 1–9. 28 WINKELMANN, J. C., COSTA, F. F., LINZIE, B. L., FORGET, B. G. (1990) β-Spectrin in human skeletal muscle: Tissue-specific differential processing of 3 β-spectrin pre-mRNA generates a P-spectrin isoform with a unique carboxyl terminus. J. Biol. Chem. 265: 20449–20454. 29 STABACH, P. R., MORROW, J. S. (2000) Identification and characterization of βV spectrin, a mammalian ortholog of Drosophila βH spectrin. J. Biol. Chem. 275: 21385–21395. 30 STANKEWICH, M. C., TSE, WT., PETERS, L. L., CH’NG, Y., JOHN, K. M., STABACH, P. R., DEVARAJAN, P., MORROW, J. S., LUX, S. E. (1998) A widely expressed beta-III spectrin associated with Golgi and cytoplasmic vesicles. Proc. Natl. Acad. Sci. USA 95: 14158–14163. 31 CONBY, J. G. (1993) Structure, function, and molecular genetics of erythroid membrane skeletal protein 4.1 in normal and abnormal red blood cells. Semin. Hematol. 30: 58–73. 32 LETO, T. L., MARCHESI, V. T. (1984) A structural model of human erythrocyte protein 4.1. J. Biol. Chem. 259: 4603–4608. 33 HUANG, J. P., TANG, C. J., KOU, G. H., MARCHESI, V. T., BENZ, E. J. Jr., TANG, T.
References
34
35
36
37
38
39
40
41
42
K. (1993) Genomic structure of the locus encoding protein 4.1. Structural basis for complex combinational patterns of tissue-specific alternative RNA splicing. J. Biol. Chem. 268: 3758–3766. SCHISCHMANOFF P. O., YASWEN, P., PARRA, M. K., LEE, G., CHASIS, J. A., MOHANDAS, N., CONBOY, J. G., (1997) Cell shape-dependent regulation of protein 4.1 alternative pre-mRNA splicing in mammary epithelial cells. J. Biol. Chem. 272: 10254–10259. BAKLOUTI, F., HUANG, S. C., VULLIAMY, TJ., DELAUNAY, J., BENZ, E. J. Jr., (1997) Organization of the human protein 4.1 genomic locus: New insights into the tissue-specific alternative splicing of the pre-mRNA. Genomics 39: 289–302. KIM, A. C., Van HUFFEL, C., LUTCHMAN, M., CHISHTI, A. H. (1998) Radiation hybrid mapping of EPB41L1, a novel protein 4.1 homologue, to human chromosome 20q11.2–q12. Genomics 49: 165–166. INABA, M., GUPTA, K. C., KUWABARA, M., TAKAHASHI, T., BENZ, E. J. Jr., MAEDE, Y. (1992) Deamidation of human erythrocyte protein 4.1: Possible role in aging. Blood 79: 3355–3361. GIMM, J. A., AN, X., NUNOMURA, W., MOHANDAS, N. (2002) Functional characterization of spectrin-actinbinding domains in 4.1 family of proteins. Biochemistry 41: 7275–7282. AN, X. L., TAKAKUWA, Y., NUNOMURA, W., MANNO, S., MOHANDAS, N. (1996) Modulation of band 3-ankyrin interaction by protein 4.1. Functional implications in regulation of erythrocyte membrane mechanical properties. J. Biol. Chem. 271: 33187–33191. CHISHTI, A. H. (1998) Function of p55 and its nonerythroid homologues. Curr. Opin. Hematol. 5: 116–121. HEMMING, N. J., ANSTEE, D. J., STARICOFF, M. A., TANNER, M. J., MOHANDAS, N. (1995) Identification of the membrane attachment sites for protein 4.1 in the human erythrocyte. J. Biol. Chem. 270: 5360–5366. TAKAKUWA, Y., MOHANDAS, N. (1988) Modulation of erythrocyte membrane material properties by Ca2+ and
43
44
45
46
47
48
49
50
51
calmodulin. Implications for their role in regulation of skeletal protein interactions. J. Clin, Invest. 82: 394–400. TANG, C. J., TANG, T. K. (1998) The 30-kD domain of protein 4.1 mediates its binding to the carboxyl terminus of pICln, a protein involved in cellular volume regulation. Blood 92: 1442–1447. CONBOY, J. (1999) The role of alternative pre-mRNA splicing in regulating the structure and function of skeletal protein 4.1. Proc. Soc. Exper. Biol. Med. 220: 73–78. CONBOY, J. G., CHAN, J. Y., CHASIS, J. A., KAN, Y. W., MOHANDAS, N. (1991) Tissueand development-specific alternative RNA splicing regulates expression of multiple isoforms of erythroid membrane protein 4.1. J. Biol. Chem. 266: 8273–8280. GASCARD, P., LEE, G., COULOMBEL, L., AUFFRAY, I., LUM, M., PARRA, M., CONBOY, J. G., MOHANDAS, N., CHASIS, J. A. (1998) Characterization of multiple isoforms of protein 4.1 R expressed during erythroid terminal differentiation. Blood 92: 4404–4414. LALLENA, M. J., MARTINEZ, C., VALCARCEL, J., CORREAS, I. (1998) Functional association of nuclear protein 4.1 with premRNA splicing factors. J. Cell Sci. 111: 1963–1971. MATTAGAJASINGH, S. N., HUANG, S. C., HARTENSTEIN, J. S., SNYDER, M., MARCHESI, V. T., BENZ, E. J. (1999) A nonerythroid isoform of protein 4.1R interacts with the nuclear mitotic apparatus (NuMA) protein. J. Cell Biol. 145: 29–43. COHEN, A. R., WOODS, D. F., MARFATIA, S. M., WALTHER, Z., CHISHTI, A. H., ANDERSON, J. M., WOODS, D. F. (1998) Human CASK/LIN-2 binds syndecan-2 and protein 4.1 and localizes to the basolateral membrane of epithelial cells. J. Cell Biol. 142: 129–138. WU, H., REUVER, S. M., KUHLENDAHL, S., CHUNG, W. J., GARNER, C. C. (1998) Subcellular targeting and cytoskeletal attachment of SAP97 to the epithelial lateral membrane. J. Cell Sci. 111: 2365–2376. SHI, Z. T., AFZAL, V., COLLER, B., PATEL, D., CHASIS, J. A., PARRA, M., LEE, G., PASZTY, C., STEVENS, M., WALENSKY, L., PETERS, L. L., MOHANDAS, N., RUBIN, E., CONBOY, J.
17
18 Skeletal Proteins of the Erythrocyte Membrane
52
53
54
55
56
57
58
59
60
61
G., (1999) Protein 4.1R-deficient mice are viable but have erythroid membrane skeleton abnormalities. J. Clin. Invest. 103: 331–340. TURUNEN, O., SAINIO, M., JAASKELAINEN, J., CARPEN, O., VAHERI, A. (1998) Structure-function relationships in the ezrin family and the effect of tumor-associated point mutations in neurofibromatosis 2 protein. Biochim. Biophys. Acta. 1387: 1–16. TSUKITA, S., YONEMURA, S. (1997) ERM (ezrin/radixin/moesin) family: from cytoskeleton to signal transduction. Curr. Opin. Cell Biol. 9: 70–75. BRETSCHER, A., RECZEK, D., BERRYMAN, M. (1997) Ezrin: a protein requiring conformational activation to link microfilaments to the plasma membrane in the assemble of cell surface structures. J. Cell Sci. 110: 3011–3018. WARD, R. E. 4th, LAMB, R. S., FEHON, R. G. (1998) A conserved functional domain of Drosophila coracle is required for localization at the septate junction and has membrane-organizing activity. J. Cell Biol. 140: 1463–1473. PARRA, M., GASCARD, P., WALENSKY, L. D., SNYDER, S. H., MOHANDAS, N., CONBOY, J. G. (1998) Cloning and characterization of 4.1G (EPB41L2), a new member of the skeletal protein 4.1 (EPB41) gene family. Genomics 49: 298–306. WALENSKY, LD., GASCARD, P., FIELDS, M. E., BLACKSHAW, S., CONBOY, J. G., MOHANDAS, N., SNYDER, S. H. (1998) The 13-kD FK 506 binding protein, FKBP13, interacts with a novel homo-logue of the erythrocyte membrane cytoskeletal protein 4.1. J. Cell Biol. 141: 143–153. GILLIGAN, D. M., BENNETT, V. (1993) The functional complex of the membrane skeleton. Semin. Hematol. 30: 74–83. PINDER, J. C., GRATZER, W. B., (1983) Structural and dynamic states of actin in the eythrocyte. J. Cell Biol. 96: 768–775. BYERS, T. J., BRANTON, D. (1985) Visualization of the protein associations in the erythrocyte membrane skeleton. Proc. Natl. Acad. Sci. USA 82: 6153–6175. KUHLMAN, P. A., HUGHES, C. A., BENNETT, V., FOWLER, V. M. (1996) A new function for adducin.
62
63
64
65
66
67
68
69
Calcium/calmodulin-regulated capping of the barbed ends of actin filaments. J. Biol. Chem. 271: 7986–7991. PICART, C., DISCHER, D. E. (1999) Actin protofilament orientation at the erythrocyte membrane. Biophys. J. 77: 865–878. FOWLER, V. M., SUSSMANN, M. A., MILLER, P. G., FLUCHER, B. E., DANIELS, M. P. (1993) Tropomodulin is associated with the free (pointed) ends of the thin filaments in rat skeletal muscle. J. Cell Biol. 120: 411–420. KIM, A. C., METZENBERG, A. B., SAHR, K. E., MARTATIA, S. M., CHISHTI, A. H. (1996) Complete genomic organization of the human erythroid p55 gene (MPP1), a membrane-associated guanylate kinase homologue. Genomics 31: 223–229. MARFATIA, S. M., MORAIS-CABRAL, J. H., KIM, A. C., BYRON, O., CHISHTI, A. H. (1997) The PDZ domain of human erythrocyte p55 mediates its binding to the cytoplasmic carboxyl terminus of glycophorin C. Analysis of the binding interface by in vitro mutagenesis. J. Biol. Chem. 272: 24191–24197. RUFF, P., SPEICHER, D. W., HUSAIN-CHISHTI, A. (1991) Molecular identification of a major palmitoylated erythrocyte membrane protein containing the src homology 3 motif. Proc. Natl. Acad. Sci. USA 88: 6595–6599. ALLOISIO, N., DALLA VENEZIA, N., RANA, A., ANDRABI, K., TEXIER, P., GILSANZ, F., CARTRON, J. P., DELAUNAY, J., CHISHTI, A. H. (1993) Evidence that red blood cell protein p55 may participate in the skeleton-membrane linkage that involves protein 4.1 and glycophorin C. Blood 82: 1323–1327. JOSHI, R., GILLIGAN, D. M., OTTO, E., MCLAUGHLIN, T., BENNETT, V. (1991) Primary structure and domain organization of human α and β adducin. J. Cell Biol. 115: 665–675. KATAGIRI, T., OZAKI, K., FUJIWARA, T., SHIMIZU, F., KAWAI, A., OKUNO, S., SUZUKI, M., NAKAMURA, Y., TAKAHASHI, E., HIRAI, Y. (1996) Cloning, expression and chromosome mapping of adducinlike 70 (ADDL), a human cDNA highly homologous to human erythrocyte
References
70
71
72
73
74
75
76
77
adducin. Cytogenet. Cell Genet. 74: 90–95. HUGHES, C. A., BENNETT, V. (1995) Adducin: A physical model with implications for function in assembly of spectrin-actin complexes. J. Biol. Chem. 270: 18990–18996. LI, X., MATSUOKA, Y., BENNETT, V. (1998) Adducin preferentially recruits spectrin to the fast growing ends of actin filaments in a complex requiring the MARCKS-related domain and a newly defined oligomerization domain. J. Biol. Chem. 273: 19329–19338. LI, X., BENNETT, V. (1996) Identification of the spectrin subunit and domains required for formation of spectrin/adducin/actin complexes. J. Biol. Chem. 271: 15695–15702. NEHLS, V., DRENCKHAHN, D., JOSHI, R., BENNETT, V. (1991) Adducin in erythrocyte precursor cells of rats and humans: Expression and compartmentarization. Blood 78: 1692–1696. MATSUOKA, Y., HUGHES, C. A., BENNETT, V. (1996) Adducin regulation. Definition of the calmodulin-binding domain and sites of phosphorylation by protein kinases A and C. J. Biol. Chem. 271: 25157–25166. MATSUOKA, Y., LI, X., BENNETT, V. (1998) Adducin is an in vivo substrate for protein kinase C: Phosphorylation in the MARCKS-related domain inhibits activity in promoting spectrin-actin complexes and occurs in many cells, including dendritic spines of neurons. J. Cell Biol. 142: 485–497. GILLIGAN, D. M., LOZOVATSKY, L., GWYNN, B., BRUGRARA, C., MOHANDAS, N., PETERS, L. L. (1999) Targeted disruption of the beta-adducin gene (Add 2) causes red blood cell spherocytosis in mice. Proc Natl. Acad. Sci. USA, 96: 10717–10722. AZIM, A. C., KNOLL, J. H., BEGGS, A. H., CHISHTI, A. H. (1995) Isoform cloning, actin binding, and chromosomal localization of human erythroid dematin,
78
79
80
81
82
83
84
85
86
a member of the villin superfamily J. Biol. Chem. 270: 17407–17413. KIM, A. C., AZIM, A. C., CHISHTI, A. H. (1998) Alternative splicing and structure of the human erythroid dematin gene. Biochim. Biophys. Acta 1398: 382–386. AZIM, A. C., MARFATIA, S. M., KORSGREN, C., DOTIMAS, E., COHEN, CM., CHISHTI, A. H. (1996) Human erythrocyte dematin and protein 4.2 (pallidin) are ATP binding proteins. Biochemistry 35: 3001–3006. FOWLER, VM., BENNETT, V. (1984) Erythrocyte membrane tropomyosin. Purification and properties. J. Biol. Chem. 259: 5978–5989. FOWLER, VM., (1996) Regulation of actin filament length in erythrocytes and striated muscle. Curr. Opin. Cell Biol. 8: 86–96. SUNG, L. A., GAO, K. M., YEE, L. J., TEMM-GROVE, C. J., HELFMAN, D. M., LIN, J. J., MEHRPOURYAN, M. (2000) Tropomyosin isoform 5b is expressed in human erythrocytes: implications of tropomodulin-TM5 or tropomodulin-TM5b complexes in the protofilament and hexagonal organization of membrane skeletons. Blood 95: 1473–1480. FOWLER, VM., (1990) Tropomodulin: A cytoskeletal protein that binds to the end of erythrocyte tropomyosin and inhibits tropomyosin binding to actin. J. Cell Biol. 111: 471–481. FOWLER, VM., DAVIS, J. Q., BENNETT, V. (1985) Human erythrocyte myosin: Identification and purification. J. Cell Biol. 100: 47–55. COLIN, F. C., SCHRIER, S. L. (1991) Myosin content and distribution in human neonatal erythrocytes are different from adult erythrocytes. Blood 78: 3052–3055. DER TERROSSIAN, E., DEPRETTE, C., LEBBAR, I., CASSOLY, R. (1994) Purification and characterization of erythrocyte caldesmon. Hypothesis for an actin-linked regulation of a contractile activity in the red blood cell membrane. Eur. J. Biochem. 219: 503–511.
19
1
Integral Proteins of the Erythrocyte Membrane Yoshihito Yawata
Kawasaki College of Allied Health Professions, Kurashiki City, Japan
Originally published in: Cell Membrane. Yoshihito Yawata. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30463-9
1 Band 3
Band 3 is the major integral protein of the red cells, and is also known as anion exchanger-1 (AE1) [1–4]. This protein is a transmembrane glycoprotein with a molecular mass of about 100 kDa, and comprises 25 to 30% (w/w) of the total membrane proteins. There are about 1200 × 103 copies per cell. Band 3 migrates as a diffuse band on SDS–PAGE gels due to heterogeneous glycosylation [5, 6]. The human band 3 cDNA is about 4.7 kb in length and encodes a 911 amino acid polypeptide [7, 8]. The gene (EPB3) is located on chromosome 17 (17q21–q22). Structure and functional interpretation of the cytoplasmic domain of erythrocyte membrane band 3 have recently been shown crystallographically [9]. 1.1 Structure of Band 3
Band 3 is composed of the cytoplasmic (NH2 -terminal) domain and helices and β-sheets to form the transmembrane (COOH-terminal) domain [1–5] (Fig. 1). The cytoplasmic (NH2 -terminal) domain of band 3 is composed of the first 403 amino acids. The major part of this domain (amino acids 1 to 359) is released from the membrane by treatment with chymotrypsin or trypsin as a 43 kDa fragment [10]. The two regions can be separated by chymotrypsin cleavage at the inner membrane (position 359). A second chymotryptic site is accessible at the external surface at position 553. This domain is an elongated, water-soluble, 403 amino acid segment with a flexible proline-rich hinge near the center, which is located between amino acids 175 and 190 of this domain. The first 45 amino acids in the cytoplasm are highly acidic. The reminder of the domain is fairly mobile. It extends at a high pH and contracts at a low pH [5]. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Integral Proteins of the Erythrocyte Membrane
Fig. 1 Schematic model for the molecular structure of human erythroid band 3. Band 3 molecules are composed of their cytoplasmic and transmembrane domains. The cytoplasmic domain demonstrates many binding sites to membrane proteins and
cytoplasmic proteins. The membrane domain contains anion transport sites, such as Lys539, Glu681, and Lys851. Asn642 in the band 3 molecule is the binding site for a sugar chain.
The transmembrane (COOH-terminal) domain (amino acids 404 to 911), which is composed of approximately 52 kDa (17 + 35 kDa), folds into helices and β-sheets to form 12 to 14 membrane-spanning segments containing the anion transport channels. The boundary between the NH2 -terminal (cytoplasmic) tail and the first membrane-spanning segment (amino acids 400 to 404) is highly conserved in red cells of various species [7, 9]. The protein at position 403 is particularly important for creating a β bend or a random coil at the membrane junction, which is known as an inter-domain hinge and gives the tail freedom of movement [11]. 1.2 Functions of Band 3 1.2.1 Membrane Protein Binding by the Cytoplasmic Domain of Band 3 The cytoplasmic (43 kDa) domain of band 3 plays a centrol role in the attachment of the cytoskeleton to the plasma membrane. The inter-domain hinge at this attachment point is important for the flexibility and rigidity of red cells. Three peripheral membrane proteins (ankyrin [12–15], protein 4.1 [16, 17], and protein 4.2 [18]) bind to the cytoplasmic domain of band 3. The ankyrin-binding involves several regions (proximal, middle, and distal) scattered throughout the cytoplasmic domain, implying that the cytoplasmic domain has a complex folded structure. Ankyrin
1 Band 3 3
binds to the flexed conformations of band 3, most probably to band 3 tetramers. It has been suggested that protein 4.1 and protein 4.2 have more than one site of attachment. Protein 4.1 also binds to peptides containing clustered basic residues (LRRRYand IRRRY) that are located at the COOH-terminal end of this domain [17]. Some of the band 3 gene mutations (band 3 Tuscaloosa, band 3 Montefiore, band 3 Fukuoka, and band 3 Coimbra) are known to demonstrate a markedly decreased protein 4.2 content. Band 3 is associated with glycophorin A (GPA) in the membrane [19, 20]. GPA facilitates translocation of band 3 from the endoplasmic reticulum as the site of synthesis of band 3 to the plasma membrane of Xenopus oocytes. Band 3 is critical for GPA synthesis or its stability, since GPA is also deficient in the band 3-deficient red cells [21]. Glycophorins do not appear to be a determinant for the expression of band 3, because when there is a combined deficiency of glycophorin A and B (a glycophorin Mk Mk phenotype) the red cells demonstrate normal amounts of band 3. There is additional evidence for the interaction of band 3 with GPA. The Wright (Wrb ) antigen is caused by interaction of a site on band 3 (Glu658 ) with a site or sites located near the end of the extracellular domain of GPA or in the adjacent transmembrane domain [22]. A monoclonal antibody to the Wrb blood type antigen immunoprecipitates both band 3 and GPA [23]. Furthermore, antibodies to glycophorin decrease the lateral and rotational mobility of band 3, indicating the presence of an interaction of band 3 with glycophorin. Self-association of band 3 is one of its major functions in red cell biology. Band 3, which is extracted from the membrane by a nonionic detergent (octaethylene glycol n-dodecyl monoether), exists as stable dimers (70%), tetramers, and higher-order oligomers (30%) [24]. Tetramers of band 3 are associated with the membrane skeleton. Since isolated membrane domains only form dimers, tetramers appear to be assembled by cross-linking neighboring dimers through the cytoplasmic domain, with ankyrin, or hemichromes. The most efficient physiological functional form in which interactions with cytoplasmic molecules occur appears to be tetrameric. 1.2.2 Binding to Glycolytic Enzymes by the Cytoplasmic Domain of Band 3 The acidic NH2 -terminal sequence has the binding sites for the glycolytic enzymes, that is, glyceraldehyde-3-phosphate dehydrogenase (G-3-PD) [25], phosphoglycerate kinase (PGK) [26], and aldolase [27]. These enzymes are basically cytosolic enzymes, but about 65 % of G-3-PD, 50% of PGK, and 40% of aldolase are bound to band 3 in the intact red cells. The enzymatic activities of these three enzymes are inhibited by membrane attachment, which are regulated by substrates, cofactors, inhibitors, and also phosphorylation of tyrosine 8 [28]. The phosphorus is added by the kinase p72syk and may be removed by a phosphotyrosine phosphatase which is bound to band 3. 1.2.3 Binding to Hemoglobin by the Cytoplasmic Domain of Band 3 The cytoplasmic domain of band 3 also binds hemoglobin. Deoxyhemoglobin binds better than the oxyhemoglobin. As regards this binding, the first five to seven amino acids of band 3 at the NH2 -terminal end are inserted deep into the
4 Integral Proteins of the Erythrocyte Membrane
2,3-diphosphoglycerate (2,3-DPG)-binding cleft of hemoglobin [29]. Therefore, 2,3DPG inhibits deoxyhemoglobin binding by band 3. Under physiological conditions, approximately half the band 3 molecules have hemoglobin attached. A denatured form of hemoglobin binds better and polymerizes with band 3, forming an aggregate of hemichromes and band 3 [30]. This binding of band 3 with hemichromes appears to stimulate aggregation into patches, which are uniquely recognized by a red cell senescence isoantibody leading to opsonization of the cell and its removal from circulation by the spleen. This mechanism may be one of the important events in red cell ageing. 1.2.4 Anion Exchange Channel by the Transmembrane Domain of Band 3 The transmembrane (52 kDa) domain of band 3 (amino acids 404–911) contains an anion exchange channel [1, 31–34]. The transport capacity is evaluated as being from 1010 to 1011 bicarbonate and chloride anions per second by 1200 × 103 molecules of band 3 per red cell. The bicarbonate is produced by carbonic anhydrase in the red cells. Most of the bicarbonate produced appears to have originated through this channel. In addition, approximately 60 % of CO2 transport from the tissues to the lungs appears to be handled in this way. The hydrogen (H+ ) byproduct by carbonic anhydrase binds to hemoglobin and facilitates oxygen release to the tissues [33]. All of these reactions are reversed in the lungs. The basic structural unit in both the inward- and outward-facing transporters is an oblong band 3 dimer. At the center of the dimer, the two monomers are located close together to form a channel. There is also a flexible subdomain on the far side of each monomer, which might be formed by one or more of the large cytoplasmic loops between transmembrane helices. The anion exchange itself appears to occur through a ping-pong mechanism. An intracellular anion enters the transport channel and is translocated outward and released, with the channel remaining in the outward conformation until an extracellular anion enters and triggers the reverse cycle [1]. It has recently been shown that glutamic acid (Glu681 ) plays the key role in chloride translocation by human band 3, that Glu681 is located near the carboxy terminus of transmembrane segment 8 (TM8) at the region of the chloride channel, and that Lys851 , Ser852 , and Ala858 are also required for this channel [34]. Anion transport requires a high energy and volume of activation. Anion exchange is extremely rapid with a half-life of 50 ms for chloride and bicarbonate, and the specificity of the channel is fairly broad. At much slower rates, large anions (sulfate, phosphate, phosphoenol pyruvate, and superoxide) are also transported through this channel [1, 2]. 1.2.5 Lateral and Rotational Mobility of Band 3 Lateral mobility of band 3 in normal human red cell membranes is constrained by steric hindrance interactions, low affinity binding interactions, and highaffinity binding interactions [35–37]. Steric hindrance interactions between band 3 oligomers and the spectrin-based membrane skeleton put a major constraint on the laterally mobile band 3 fraction, slowing the rate of band 3 lateral diffusion by approximately 50-fold compared with the predicted diffusion rate of free band
1 Band 3 5
3 in membranes devoid of a functional membrane skeleton. The spectrin/band 3 ratio is the major determinant of the lateral diffusion rate of band 3. Fluorescence recovery after use of the photobleaching (FRAP) method (Fig. 2) demonstrates that, in normal red cells, approximately one-third of band 3 is present in the mobile fraction and the remaining two-thirds exist in the immobile fraction, which is fixed to the cytoskeletal network, in particular to ankyrin [35–37]. It has been reported that the mobile fraction was 0.43 ± 0.11 with a lateral diffusion coefficient of (6.86 ± 1.37) × 10−11 cm2 s−1 in normal red cells by the FRAP method. Rotational mobility of band 3 in normal human red cell membranes is constrained by low-affinity and high-affinity binding interactions. The rotationally immobile band 3 fraction apparently represents individual band 3 molecules bound with high affinity to ankyrin. The rapidly rotating band 3 fraction consists of dimers, tetramers, and higher order oligomers of band 3 that are free from rotational constraints other than the viscosity of the lipid bilayer. The slowly rotating band 3 fraction is less well-defined. Rotational constraints applied by low-affinity binding interactions between ankyrin-linked band 3 and other band 3 molecules, and between the cytoplasmic domain of band 3 and membrane skeletal proteins (ankyrin, protein 4.1 and protein 4.2) have been invoked. 1.2.6 Blood Type Antigens and Band 3 On the transmembrane domain of band 3, a complex carbohydrate structure is attached to Asn642 [1, 2, 20]. Within the carbohydrate structure, N-acetylglucosamine, mannose, galactose and other moieties exist. Band 3 carries various polymorphic peptide epitopes including band 3 Memphis (Lys → Glu at position 56), such as the Diego blood group system, Wright (Wra and Wrb ) antigens and several other low frequency antigens [38]. No apparent consequence for band 3 function is observed in these antigenic variants. The Diego (Dia ) allele is observed with varying frequencies in Asian and South American populations, although this allele is exceedingly rare in most Caucasians [20]. Dia and Dib antigens correspond to a proline or a leucine residue at position 854 of the band 3 molecule, respectively. Among the Wright alleles, the Wrb allele is predominantly expressed (higher than 99 %), whereas the Wra allele is extremely rare [20, 22, 23]. Wra and Wrb antigens correspond to a lysine or a glutamine residue, respectively, encoded by codon 658 of band 3. The expression of Wrb is suppressed in glycophorin A deficiency. Band 3 also carries several minor antigens, such as Waldner (Wda ), Redelberger (Rba ), Traversu (Tra ), Wulfsberg (Wu), Moen (Moa ), ELO, and Warrior (WARR). It is known that WARR and Wu correspond to point mutations of Thr 552 → Ile and Gly 565 → Ala of band 3, respectively. ELO, Rb (a+), Tr (a+), and Wa (a+) correspond to Arg 432 → Trp, Pro 548 → Leu, Lys 551 → Asn, and Val 557 → Met substitutions, respectively [20, 38]. On the transmembrane domain of band 3, a complex carbohydrate structure is attached to Asn642 . The Ii antigens are carbohydrate molecules and correspond to portions of the oligosaccharide chains [20]. Molecules with i reactivity correspond to oligosaccharide chains containing at least two repeating N-acetylgalactosamine
6 Integral Proteins of the Erythrocyte Membrane
Fig. 2 Principle of fluorescence recovery after the photobleaching (FRAP)method to examine lateral mobility of band 3 molecules in red cell membranes in situ. The procedure is shown in (I). After band 3 is labeled by fluorescence (A),it is subjected to a spot laser beam, which abolishes the intensity of fluorescence on band 3 (B). The intensity of fluorescence is recovered by reentry of band 3 molecules with their lateral mobility (C). Relative fluorescence intensity is measured during this course of time.The
extent of the recovery of fluorescence intensity represents a mobile fraction (approximately 30 % of total) of band 3, and the remainder is equivalent to an immobile fraction of band 3. Interpretation of the results is shown schematically in (II). The band 3 molecules attached to the cytoskeletal network with anchoring proteins (ankyrin, and also probably protein 4.2) are immobile (a), and those without any fixation to the network are mobile (b, c, and d).
units, whereas I activity corresponds to a branched oligosaccharide structure formed by an N-acetylgalactosamine unit attached in a β1,6-linkage to a galactose residue within linear lactosamine polymers. Oligosaccharide chains in neonatal red cells are largely unbranched, and those in adult red cells are highly branched. The increase in the I reactivity, with a corresponding decrease in i reactivity, during early infancy corresponds to the elaboration and display of increasing numbers of β1,6-linked lactosamine units. Therefore, red cell expression of these Ii antigens is developmentally regulated. This particular β (1,6) N-acetylglucosaminyltransferase activity and a chain-elongating β (1,3) N-acetylglucosaminyltransferase activity have been identified.
2 Glycophorins
1.3 Band 3 in Nonerythyroid Cells
It is known that, in addition to erythroid band 3 (anion exchanger-1: AE1, or solute carrier family 4A: SLC4A1), two other genes (AE2andAE3) encoding band 3-related anion exchange proteins are present [7, 8, 39–41]. AE2 (or SLC4A2) is the general tissue anion antiporter, and is widely distributed in many tissues and cells. AE3 (or SLC4A3) is expressed in the heart and brain. In AE2 and AE3, approximately 300 amino acids are added to the NH2 -terminus of the AE1 molecule. Therefore, AE2 and AE3 are larger than AE1. Among these three transporters, there is distinct homology, particularly in the membrane domain of their molecules. AE1 is also expressed in tissues other than red cells, such as in the acid-secreting, type Aintercalated cells in the collecting ducts of the kidney, and in cardiac myocytes. The kidney transcript lacks the first 66 amino acids of the cytoplasmic domain of AE1. Thus, it is unable to bind glycolytic enzymes, protein 4.1, or ankyrin.
2 Glycophorins
The glycophorins are known as the most abundant integral membrane glycoproteins in red cells [19, 42, 43]. Glycophorins have a high sialic acid content, and more than 95 % of the periodic acid Schiff (PAS)-staining compounds come from the glycophorins. The five types of glycophorins are known as glycophorins A, B, C, D, and E [19, 42–52]. Among these five glycophorins, glycophorins A, B, and E are encoded by three closely linked genes [49–52], whereas glycophorins C and D arise from a single locus bearing no significant homology to the genes for glycophorins A, and B. Glycophorin D differs from glycophorin C by use of an alternate translation start site created by alternative mRNA splicing [19, 42, 43]. Another gene linked in tandem with those for glycophorins A and B has been isolated, which encodes glycophorin E [49–52]. This appears to have evolved from glycophorin A by homologous recombination at Alu repeats [49, 50]. However, no protein product has been identified regarding the gene for glycophorin E. Characterization of cDNA and genomic clones encoding the glycophorins has revealed that they fall into two distinct subgroups. Therefore, glycophorins are categorized into two groups, that is: (1) glycophorins A, B, and E, and (2) glycophorins C and D [19, 42, 43]. 2.1 Glycophorins A, B, and E 2.1.1 Glycophorin A (GPA) Glycophorin A (GPA) is the major sialoglycoprotein of red cells [19]. It is present at a level of approximately one million copies per cell. Its molecular weight, including carbohydrate, which is estimated from mobilities on SDS gels is 36 × 103 , and the molecular weight which is calculated from its molecular sequence excluding
7
8 Integral Proteins of the Erythrocyte Membrane
Fig. 3 Molecular structures of glycophorins A, B, C, D, and E. Glycophorins B (GPB) and E (GPE) are basically derived from glycophorin A (GPA). In contrast, glycophorin D (GPD) is derived from glycophorin C (GPC). E denotes exons. Ge: Gerwich blood group antigens.
the carbohydrate chain is 14 × 103 . GPA is approximately 1.6% (w/w) of the total membrane proteins in the red cells. Three GPA complementary DNA transcripts are expressed from the gene (transcripts of 2.8, 1.7, and 1.0 kb) that vary only by the use of alternative polyadenylation sites. The 5 -ends of these transcripts are essentially identical. The gene of GPA (GYPA) is located at chromosome 4q28–q31. GPA is synthesized with a cleavable leader peptide that yields a type I transmembrane protein 131 amino acids long [19, 44, 45]. The protein is heavily glycosylated with a single asparagine (position 26)-linked glycan and 15 serine–threonine-linked oligosaccharide units. Approximately 60 % of the GPA molecule is carbohydrate. GPA gene is composed of seven exons (Fig. 3). Exon 1 yields a leader peptide, whereas exons 2 (amino acids 1–26), 3 (amino acids 27–57), and 4 (amino acids 58–70) encode the extracellular domain. Exon 5 encodes the transmembrane domain (amino acids 71–100). Exons 6 and 7 generate the cytosolic domain (amino acids 101–131) and 3 -untranslated region [19, 44, 45]. The precise biological functions of GPA have not been well defined. Since GPA demonstrates an extensive negative surface charge of the red cells, it is expected that it may modulate interactions between red cells and red cells, or between red cells and endothelial cells. It has been reported that red cells are clumped when sialic acids are removed. The M and N antigens on GPA are determined by amino
2 Glycophorins
acid polymorphism at positions 1 and 5 of the mature polypeptide. The M antigen is defined by a serine at amino acid position 1 and a glycin at position 5, whereas the N antigen is defined by a leucine at position 1 and a glutamine at position 5 [19, 20, 53]. Another example is the Ss phenotype. When methionine is present at amino acid position 29, the S phenotype is expressed [19, 20, 53]. Similarly, the presence of threonine at position 29 produces the s phenotype. GPA is expressed only in erythroid cells, especially after the proerythroblast stage during terminal erythroid maturation. Glycophorin A-deficient red cells, such as an En (a–) type [20, 54], exhibit increased glycosylation of band 3, probably due to the addition of excessive sialic acid, which should have been present on the GPA molecule. Therefore, total surface charge density is not affected, and GPA-deficient red cells maintain the normal red cell shape and deformability. Thus, GPA may not be crucial with respect to the maintenance of mechanical stability, deformability, or shape change. Total deficiency of band 3 in the band 3-knock-out (−/−) mice [55, 56] or in the Japanese cow [57] is associated with complete deficiency of GPA in their red cells. In addition, when red cells bind to immunologically nonspecific ligands, (wheat germ agglutinin) the binding causes aggregation of glycophorin and decreases red cell deformability. Glycophorin-deficient red cells are more resistant to invasion by malaria parasites than are normal red cells [58]. Amino acids 90 to 93 of the membrane domain appear to be critical to the formation of GPA dimers in the membranes. In the transmembrane domain, a GPA dimer showed a small, well-packed interface between the molecules. Van der Waals interactions alone can mediate stable and specific associations between transmembrane helices [59]. 2.1.2 Glycophorin B (GPB) Glycophorin B (GPB) is fairly similar to GPA except for their exoplasmic domains and cytoplasmic tails [46, 48]. The gene for GPB (GYPB) is located on chromosome 4q28–q31. GPB is a structurally similar type I transmembrane protein derived from a gene consisting of five exons. This gene yields a single 0.5 kb transcript. GPB is synthesized with a cleavable single sequence to yield a type I transmembrane protein, which is 72 amino acids in length. There are no asparagine-linked carbohydrate chains, because asparagine in position 26 of exon 3 is spliced out, whereas there are approximately 11 serine–threonine-linked oligosaccharide chains. Approximately 50% of the mass of GPB molecules consists of oligosaccharide. There are only about 150 000 copies per cell. The apparent molecular weight of GPB on the SDS gels is 20 × 103 , and that calculated from protein sequence data is 8 × 103 . GPB is equivalent to 0.2 % (w/w) of the total red cell membrane proteins. The glycophorin B gene is composed of five functional exons (exons 1, 2, 4, 5, and 6) [46, 48] (Fig. 3). Sequences corresponding to exon 3 are designated as pseudoexons, because the sequences are not expressed in GPB transcripts as a consequence of a non-functional splice acceptor sequence at its 3 -border. Exon 1 yields a leading peptide. The extracellular domain is encoded by exons 2 and 4, and exon 5 encodes the transmembrane and short cytosolic segment. Exon 6 gen-
9
10 Integral Proteins of the Erythrocyte Membrane
Fig. 4 Evolution of glycophorins A, B and E. GPA: glycophorin A, GPB: glycophorin B, and GPE: glycophorin E.
erates the 3 -untranslated region. Amino acid sequence polymorphisms encoded by exon 4 yield an S-specific GPB molecule (methionine at position 29), or an s-specific molecule (threonine at position 29) [20, 53]. The amino acid sequence encoded by exon 2 yields the N antigen, with leucine at position 1 and glutamine at position 5. No biological function has been assigned to GPB other than its association with the Ss blood group [20, 53]. GPB is only present in erythroid cells, as is GPA.
2.1.3 Glycophorin E (GPE) Glycophorin E is a glycophorin that has been identified by molecular cloning [49–52]. The genes for glycophorins A, B, and E are located on the same chromosome 4q28–q31, in the order A, B, and E. The genomic structure and promoters of all three genes are highly conserved, and all three contain cleavable leader peptides. These three genes are oriented in a tandem fashion in a gene cluster (Fig. 4). The glycophorin E gene (GYPE) is predicted to be composed of four functional exons (exons 1, 2, 5, and 6) and two non-utilized pseudoexons (numbers 3, and 4) (Fig. 3). Exon 1 yields a leader peptide, exon 2 (positions 1 to 26) encodes the putative extracellular domain, and exon 5 (positions 26 to 59) encodes the predicted transmembrane segment. Exon 6 produces the 3 -untranslated region. The GYPE locus lacks DNA sequences corresponding to amino acid residues 27 to 39 of GPB, encompassing the position of the Ss amino acid sequence polymorphism in GPB. GYPE also contains a DNA sequence insertion, relative to GYPA and GYPE, at a position corresponding to exon 5 of GYPA. This insertion is predicted to encode eight amino acids not present in GPA or GPB. A polypeptide product corresponding to the GYPE locus has not been identified in vivo, but it could be a 20 000 Da molecule with a length of 59 amino acids, and with residues 1 and
2 Glycophorins
5 occupied by serine and glycine, respectively, corresponding to the M type blood group. 2.2 Gylcophorins C and D 2.2.1 Glycophorin C (GPC) Glycophorin C (GPC) is a glycoprotein with 128 amino acids [42, 43, 47, 48, 60, 61]. The molecular weight on the SDS gels is 32 kDa, but when calculated it is 14 kDa, because extensive posttranslational modification by glycosylation at the single asparagine (residue 8)-linked N-glycosylation site and 12 serine–threoninelinked O-glycosylation sites are present. Approximately 0.1% of the total membrane proteins (w/w) is GPC. There are about 143 × 103 copies of gly-cophorins C (per red cell). The gene for GPC (GYPC) is located on chromosome 2q14–q21. The GPC gene is composed of four exons (Fig. 3). The extracellular domain of GPC (residues 1–57) is encoded by exons 1, 2, and 3. The glycosylation sites are located in this extracellular domain. The transmembrane segment (residues 58–81) is encoded by exons 3 and 4, and the cytosolic domain (residues 82–128) by exon 4. The Gerbich (Ge: 2 and Ge: 3) blood type antigens are located at positions corresponding to exons 2 and 3 of GPC, respectively [20, 42, 62]. It is known that GPC is not erythroid-specific, and is present in multiple nonerythroid tissues in a distinctive fibrillar pattern. During erythroid differentiation, a desialylated form of GPC is present on the surface of erythroid progenitors of the burst forming unit in the erythroid (BFU-E). Normally glycosylated GPC first appears in erythroid progenitors of the colony-forming unit in erythroid (CFU-E). GPC is functionally important, because the cytoplasmic tail of GPC binds to protein 4.1 and p55 resulting in the anchoring of the skeletal network to the membrane. Therefore, GPC plays a crucial role in regulating the stability, deformability and shape of the membrane. 2.2.2 Glycophorin D (GPD) GPD is a truncated form of GPC (Fig. 3), corresponding to residues 22 to 128 of GPC [20, 42, 43, 60]. Therefore, GPD (23 kDa) is approximately 9000 Da smaller than GPC (32 kDa) on the SDS gels. The molecular weight of GPD calculated from the gene (GYPD) sequence is 11 kDa, and roughly 0.02% of the total membrane proteins (w/w) is GPD. There are approximately 82 × 103 copies of GPD per red cell. The same transcript yields both GPC and GPD through a mechanism involving translation initiation at an internal ATG codon. When translation is initiated at the first AUG, GPC is produced. When initiation occurs at the AUG encoding methionine at position 20, GPD as a truncated protein is produced. Residues 22 to 128 are identical in GPC and GPD. GPC does not express a cleavable signal peptide. The genes for GPC and GPD are located on chromosome 2q14–q21. There are six O-linked sugar chains in the GPD molecule, but none of the N-linked sugar chains. GPD expresses only the Ge: 3 determinant of the Gerbich blood group [20, 62].
11
12 Integral Proteins of the Erythrocyte Membrane
3 Blood Group Antigens
As many as 243 different determinants have been listed as blood group antigens, which belong to one of 19 distinct blood group systems [20, 63–68] (Table 1). Some of these demonstrate the clinical relevance of the antigens especially in red cell transfusion and organ transplant procedures. Therefore, current information on their biochemical and genetic properties and the molecular basis for the inherited polymorphisms in these molecules is critical.
3.1 ABO Blood Group
Human beings are classified into distinct groups by the ABO blood group system (Table 1) depending on the presence or absence of substances in the serum that agglutinate red cells from humans of other classes [69]. The antigens of the ABO system (A, B, and H determinants) are expressed by red cells and by many other tissues. The cells of some human tissues produce water-soluble forms of these molecules as components of the glycans on secreted and soluble glycoproteins, on glycosphingolipids, and on free oligosaccharides. This mechanism is determined by the Secretor (Se) locus. The immunoreactive regions of the ABO blood group determinant are located at the terminal ends of various oligosaccharides, which are components of integral proteins and glycolipids of red cell membranes [69–72]. The A, B, and H blood group molecules are constructed sequentially by distinct glycosyltransferases which are determined by a distinct genetic locus. These glycosyltransferases operate on one of four structurally distinct oligosaccharide precursor types synthesized in human cells. Type 1 oligosaccharide precursors are found at the termini of linear and branched chain oligosaccharides linked to proteins at asparagine residues or at serine or threonine residues. The type 1 oligosaccharide precursors are synthesized only by epithelia of various tissue cells, yielding ABH determinants present in body fluids and secretions. The ABH determinants expressed by red cells are mainly by the type 2 precursor chains, which are also asparagine-linked or serine–threonine-linked oligosaccharides. Type 3 A, B, and H antigens in mucins are not found in human red cells. Type 4 chains are restricted to glycolipids in human red cells. These oligosaccharide precursors are catalyzed by α (1,2) fucosyltransferase in a transglycosylation reaction. These enzymes transfer the nucleotide sugar substrate GDP-fucose to carbon 2 of the galactose molecule at the oligosaccharide precursors [69, 70]. The fucose is attached in an alpha anomeric linkage and forms the blood group H determinant, such as the disaccharide Fuc α (1,2) Gal β-unit. The human genome encodes two different α (1,2) fucosyltransferases in a tissuespecific fashion, which correspond to the products of the H and the Se blood group loci [73].
Table 1 Major human blood group systems
System name/symbol
Gene name
Antigen type
Antigen copy number per red cell
Number of alleles
Chromosome
Chapter No. in text
001
ABO/ABO
ABO
Oligosaccharide
8 × 105 −2 × 106
3 major Several minor (cis-AB, subgroups)
9q34.1–q34.2
3.1
002
MNS/MNS MN (glycophorin A)
∼1 × 106
2 major (M or N) Multiple minor
4q28–q31
2.1
003 004
S (glycophorin B) P/P1 Rh/RH
Glycoproteins GYPA, (GYPE) GYPB P1 Oligosaccharide Protein complex
∼1.5 × 105 ∼103
4q28–q31 22q11.2–qter 1p34.3–p36.1
2.1 3.3 3.2
005 006
D C/c, E/e Lutheran/LU Kell/KEL
RHD RHCE LU Glycoprotein KEL Glycoprotein
∼2 × 104 ∼2 × 104 ∼1600–4100 ∼5 × 103
19q12–q13 7q33
3.4 3.5
007 008 009 010
Lewis/LE Duffy/FY Kidd/JKKidd/JKa , 4> Diego/DI
FUT3 FY JK AE1
Oligosaccharide Glycoprotein Glycoprotein Glycoprotein (band 3)
4500–7300 ∼12 000–17 000 ∼14 000 ∼15 000
19p13.3 1q22–q23 18q11–q12 17q12–q21
3.6 3.7 3.8 3.11
011
Cartwright/YT
Not determined
012
XG/XG
ACHE Glycoprotein (acetylcholinesterase) XG Glycoprotein
2 major (S or s) Multiple minor Complex ∼48 haplotypes Includes minor C/c and E/e alleles 2 major (D+ or D− ) 2 major (C or c, and E or e) 3 major (Lua , Lub , recessive null) 2 major (KEL1, KEL2) Several minor 2 major (Lea , Leb ) 2 major (Fya and Fyb ) 2 major (Jka , Jkb ) Several Dia , Dib Wra , Wrb Other high-frequency antigens 1 major (Yta ) 1 minor (Ytb )
∼9000
Xp22.32
013 014
Scianna/SC Dombrock/DO
SC DO
Not determined Not determined
1 major (Xga ) 1 minor (Xg, a hypothetical null) 3; Scl, Sc2, and Sc3 1 major (Doa ) 1 minor (Dob )
Glycoprotein Glycoprotein (GPI-linked)
7q22
1p36.2–p22.2 unknown
3 Blood Group Antigens 13
ISBT No.
ISBT No.
System name/symbol
Gene name
Antigen type
Antigen copy number per red cell
Number of alleles
Chromosome
015
Colton/CO
AQP1
Not determined
2 major (Coa and Cob )
7p14
016
LandsteinerWeiner/LW
LW
Glycoprotein (aqua-porin-1) Glycoprotein (ICAM-4)
2 major (LWa , LWb ) Rare nulls
19p13.3
017
ChidoRogers/CH/RG
C4A, C4B
∼4400 (on D+ cells) ∼2835–3620 (on D− cells) ∼103
Ch (1 major, several minor) Rg (1 major, several minor)
6p21.3
018
Hh/H
FUT1
See ABO
19q13
019 020
Kx/XK Gerbich/GE
XK Protein GYPC, Glycoproteins GYPD
1 major Several minor (Bombay, para-Bombay) 1 major (Kx ) 1 major (Ge: 1, 2, 3, 4)
021
(glycophorins C and D) Cromer/CROM
DAF
022
Knops/KN
CR1
023 − −
Indian/IN Secretor/Se Ii
CD44 FUT2
ISBT: International Society of Blood Transfusion.
Glycoprotein (4th component of complement; C4A and C4B) Oligosaccharide
Glycoprotein (decay-accelerating factor) Glycoprotein (complement receptor CRI) Glycoprotein Oligosaccharide Oligosaccharide
Not determined ∼1 × 105 (glycophorin C) ∼2 × 104 (glycophorin D) ∼103
Xp21.1 2q14–q21
Chapter No. in text
3.9
2.2
Multiple minor 2 major (Cra , Crb ) Several minor
1q32
Not determined
Several major (Kna , Knb , Mca , Mcb , Yka , Sla )
1q32
Not determined see Lewis see ABO
2 major (Ina , Inb ) 2 major (Secretor, non-Secretor) 1 major (I), at least 1 minor (i)
11p13 19q13 9q21
3.10
14 Integral Proteins of the Erythrocyte Membrane
Table 1 (Continued).
3 Blood Group Antigens 15
Glycosyltransferases, which are encoded by the ABO blood group locus, use type 1, 2, 3, or 4 H determinants to form A or B blood group determinants. The A allele at the ABO locus encodes an α (1,3) N-acetylgalactosaminyltransferase that uses H molecules to form the blood group A molecule [69–71]. The B allele encodes an α (1, 3) galactosyltransferase that operates on H-active oligosaccharide precursors to form the blood group B determinant. The O allele is a null allele, which cannot encode a functional glycosyltransferase that will further modify H-active precursors. The ABH determinants in human red cells are mainly associated with membrane glycoproteins, that is, 80% of those (1–2 × 106 molecules per red cell) in band 3, and others in the red cell glucose transporter (band 4.5: 0.5 × 106 molecules per red cell), the Rh-related proteins, and the aquaporin-1 glycoprotein [74]. Red cell membrane glycolipids are also involved with the ABH molecules (0.5 × 106 molecules per red cell). A single asparagine-linked oligosaccharide molecule on band 3 is a branched poly-N-acetylgalactosaminoglycan, whose terminal branches may display several ABH determinants. Ig M antibodies specific for ABO oligosaccharide determinants are not displayed in red cells during infancy. This immune response is a consequence of exposure to microbial oligosaccharide antigens that are structurally similar (or identical) to the A and B blood group molecules. It is also interesting to note that, in most individuals, antibodies directed against H determinants are not formed because a substantial number of the blood group H precursors are not enzymatically converted into A or B determinants. The Ig M isoagglutinins, which occur naturally, efficiently fix complement leading to acute hemolysis of transfused red cells that display the corresponding antigen. There are a substantial number of variants of the ABH blood group antigensss [75, 76]. For example, in A subgroups, the A1 and A2 phenotypes are known, which differ in their molecular structure [70, 71]. The absolute number of immunodominant molecules is greater on A1 cells than it is on A2 cells. The human A transferase is a type II transmembrane protein, and is composed of 353 amino acids with an NH2 -terminal segment (residues 1–15), a hydrophobic segment (residues 16–39), and a COOH-terminal domain (residues 40–353). The COOH-terminal catalytic domain is located within the membranedelimited compartments of the Golgi and the trans-Golgi network, where terminal glycosylation reactions occur. This enzyme has a single potential site for asparaginelinked glycosylation, and functions as a glycosyltransferase [77]. The A transferase is also present as a soluble, catalytically potent polypeptide. In the molecular structure, the NH2 -terminus of the soluble enzyme corresponds to the alanine residue at codon 54, indicating that the soluble form of the A transferase is derived from its transmembrane precursor by proteolysis. Therefore, glycosyltransferase exists in both membrane-associated and soluble, catalytically active forms [70]. As regards the molecular basis for polymorphism at the ABO locus, the blood group O phenotype is produced and is due to a single base pair deletion of one nucleotide in the codon for amino acid 87 of the A transferase. By this frameshift of the reading frame, a termination codon appears at amino acid residue 117 of the A transferase. The truncated protein (116 amino acids) is consequently unable to
16 Integral Proteins of the Erythrocyte Membrane
modify the H antigen to form A or B structures, resulting in no detectable A or B transferase activity being observed in the sera taken from blood group O individuals (genotype OO). As for the A or B blood group phenotype, seven differences in nucleotide sequence are present within the protein-coding segments of the A transferase cDNA. Three of the seven appear to be functionally neutral polymorphisms. The other four appear to be critical for expressing the A and B transferases. These are at residues 176 (arginine, A; glycine, B), 235 (glycine, A; serine, B), 266 (leucine, A; methionine, B), and 268 (glycine, A; alanine, B). The polymorphisms at positions 266 and 268 are important for enzymatic functions to discriminate between UDPN-acetylgalactosamine and UDP-galactose. Therefore, leucine at 266 and glycine at 268, or methionine at 266 and alanine at 268 generate an A transferase or B transferase, respectively. H blood group oligosaccharide precursors are the most important substrates for the transferases encoded by the ABO locus. H-active precursors display terminal Fuc α (1,2) Gal β linkages, which are an integral part of the A and B antigenic determinants. These linkages are synthesized by α (1,2) fucosyltransferases (GDPfucose: Gal β2-α-L-fucosyltransferase). These transferases can use types 1, 2, 3, and 4 glycoprotein or glycolipid substrates as well as low molecular weight β-D-galactosides [70]. The synthesis of H-active blood group substances is determined by two factors, that is, the H locus and the Secretor (Se) locus [69, 70]. The Se locus determines expression of an α (1,2) fucosyltransferase activity, and H-active blood group substances (membrane-associated, and also soluble). Nearly all of this soluble blood group-active substance is constructed from type 1 precursors and is released mainly from the sublingual and submaxillary glands and the parotid gland. Red cells taken from both secretors and non-secretors maintain an essentially identical complement of H-determinants and also A or B determinants, depending on the ABO locus genotype. In red cell precursors, the synthesis of α (1,2) fucosyltransferase activity, and H determinants is directed by the H locus. The H and Se loci correspond to distinct genes encoding different α (1,2) fucosyltransferases with disparate tissue-specific expression patterns. The H locus is expressed predominantly in erythroid cells. In contrast, the Se locus represents a second α (1,2) fucosyltransferase locus whose expression is restricted to the epithelia of many tissue cells. Individuals with the secretor phenotype maintain at least one functional allele at both the H locus and the Se locus. Individuals with the non-secretor phenotype maintain two null alleles at the Se locus and at least one functional H allele. In human erythroid cells, the Se locus is constructed mainly from type 1 precursors, and the H determinants, which are synthesized by the H-encoded α (1,2) fucosyltransferase, are based on type 2 precursors [69, 73]. The human Se locus and the human H locus are separated only by 35 kb of genomic DNA on the same human chromosome 19. The human H locus encodes a 365 amino acid-long polypeptide as the type II transmembrane glycosyltransferase [78, 79]. The molecular structure is
3 Blood Group Antigens 17
composed of an NH2 -terminal cytosolic domain (residues 1–8), a hydrophobic domain (residues 9–25) that spans the Golgi membrane, and a COOH-terminal domain (residues 26–365) corresponding to a Golgi-localized catalytic domain, where two potential asparagine-linked glycosylation sites are present. Therefore, the human H locus is an α (1,2) fucosyltransferase. The human Se locus is a 332 or 343 amino acid-long polypeptide, which shares 68% of the amino acid sequence identity with the COOH-terminal 292 residues of the human H blood group α (1,2) fucosyltransferase [79]. The molecular structure is composed of a cytosolic domain with 3 or 14 residues, a 14 residue hydrophobic membrane domain, and a 315 amino acid-long COOH-terminal domain, which is located in the Golgi lumen. There are three potential asparagine-linked glycosylation sites [80, 81].
3.2 Rh Blood Group
The Rh (Rhesus) blood group antigens were initially identified by Landsteiner and Wiener in 1940 in the antisera of immunized guinea pigs and rabbits with red cells taken from Macaca rhesus monkeys [20, 68, 82, 83]. The human alloantibody is called Rh, and the heteroantibody is called LW (Table 1). RH and LW are antigenic systems determined by distinct gene complexes, which are located on chromosomes 1 and 19, respectively. LW and the Rh antigen complex associate in the membrane, and LW expression requires Rh polypeptide expression. A third antigen, Rh50, is also important for normal expression of the Rh, LW, and glycophorin molecules. The incidence of Rh positive is approximately 85 % in whites, and strikingly 99.5 % in the Japanese population. About 15 % of Caucasians and 0.5 % of Japanese are Rh negative. One main determinant of the Rh antigen system is the RhD antigen. Therefore, Rh positive individuals demonstrate the RhD antigen, whereas the Rh negative ones do not express the RhD antigen. In the Rh blood group system, other antigens, that is, the C/c and E/e antigen group, are known [84]. Their expression is determined by a second locus linked extremely tightly to the locus that determines D antigen expression [82, 83]. There are eight common Rh gene complexes considering D, C, c, E, and e alleles. The d antigen does not exist, because there is no product of the hypothetical d allele of the D gene. The C/c and E/e antigens correspond to a single, non-glycosylated, 417 amino acid-long (33 100 Da) polypeptide (RhCE) with 12 membrane spanning domains, encoded by the RHCE gene. The RHCE gene is composed of 10 exons in 70 kb of genomic DNA. The D antigen corresponds to a distinct non-glycosylated, 417 amino acid-long (33, 100 Da) protein (RhD), which is encoded by the RHD gene. This RhD protein shares 90% of the identity of the amino acid sequence with the RhCE protein, and also demonstrates a 12 membrane spanning domain [85]. The RHD and the RHCE genes are closely linked on human chromosome 1p34–p36.
18 Integral Proteins of the Erythrocyte Membrane
Considering the Rh antigen system, there is a third polypeptide (Rh50), which is a 409 amino acid-long glycoprotein. This peptide is approximately 36 % identical with amino acid sequences of RhD and RhCE, which are also known as the Rh30 polypeptides [86]. This Rh50 peptide has a 12 membrane-spanning domain, and is encoded by the RH50 gene, which is located on human chromosome 6p11–p21.1. The RH50 gene exhibits 10 exons and 32 kb of its size, closely resembling those of the RHD and RHCE genes [87, 88]. Although the Rh50 protein does not itself express Rh antigens, it interacts with the Rh30 polypeptides (RhD and RhCE) in the membrane, and is required to form a heterotetrameric complex, two molecules of Rh50 and one molecule each of RhD and RhCE, which is essential to normal cell surface expression of the Rh30 (RhD and RhCE)-encoded Rh antigens. The RhD and RhCE polypeptides are modified by fatty acylation and are also palmitylated through a thiolester linkage to free sulfhydryl groups on cysteine residues within a consensus tripeptide (Cys-Leu-Pro) in the molecules of the RhD and RhCE. Reactivity of Rh antigens is also regulated by the lipid composition of the membrane in situ. Therefore, alteration of the membrane lipid concentration could induce conformational changes of the Rh peptides. It is also known that the Rh polypeptides interact with the red cell membrane skeleton. It has recently been shown that protein 4.2-deficient red cells lack CD47 implicating an interaction between the Rh complex and the band 3 complex [89]. 3.3 P Blood Group
The P antigens (ISBT No.003: P/PI) are present exclusively on red cell membraneassociated glycosphingolipids [20, 68, 90, 91]. P molecules are produced from lactosyl ceramide by the sequential processing of a series of distinct glycosyltransferases (Table 1). Two distinct pathways have been proposed for the biosynthesis of the P antigen molecule via the step of the P antigen as a precursor, even though so little is known about the corresponding enzymes and genes. The most common phenotype (P1 ) demonstrates full activity, and its frequency is approximately 75 %. The other most common phenotype is P2 with a frequency 25%. Three rare phenotypes have been described: (1) the Pk1 phenotype, which is deficient in P transferase activity, (2) the Pk2 phenotype, which is homozygous for null alleles at the P transferase and the P1 transferase loci, and (3) the p phenotype, which is deficient in all three P antigens (P, P1 , and Pk ). Functions of the P blood group system remain unknown. 3.4 Lutheran Blood Group
In the Lutheran blood group system (ISBT No.005: Lutheran/LU), four major phenotypes are present [20, 68]. Approximately 90 % of normal subjects demonstrates Lu (a–b+), or LU: –1, 2 (ISBT phenotype). Others are Lu (a+b+); LU: 1, 2, Lu (a+b–); LU: 1, –2, and Lu (a−b−); LU: O, which displays no detectable Lutheran antigenic
3 Blood Group Antigens 19
activity (Table 1). In blood, the Lutheran blood group proteins are restricted in their expression to red cells and to B lymphocytes. The Lutheran antigens are expressed on a pair of membrane glycoproteins with molecular weights of 78 000 and 83 000. The Lutheran blood group antigens are a 597 amino acid-long type 1 transmembrane protein. The extracellular domain contains five potential N-glycosylation sites and five peptide segments that share a primary sequence similarity with members of the immunoglobulin superfamily [92]. Its structural organization is similar to MUC18 (the melanoma-associated, mucinlike protein) and related neural cell adhesion molecules. A single membranespanning segment is followed by a 59 amino acid-long intracellular segment, in which an Src homology 3 domain is present. The Lutheran blood group gene (LU) yields a pair of alternately spliced transcripts, producing two molecular weight isoforms of the protein that differ in their lengths of the cytosolic domains. One isoform of the Lutheran polypeptide is identical to a B-CAM (a basal cell carcinoma/epithelial cancer adhesion molecule). The Lutheran blood group gene corresponds to a 12.5 kb gene with 15 exons [93, 94]. There are several nucleotide polymorphisms in exon 3 at base pair 229 of the coding sequence. In the Lutheran (a+b–) phenotype, the nucleotide A at this position contributes to a histidine codon corresponding to residue 77 of the Lutheran polypeptide. In the (a–b+) phenotype, the nucleotide at this position is G corresponding to an arginine codon at residue 77 of the peptide.
3.5 Kell Blood Group
Although many human blood group alloantigens have been discovered, the KEL1 antigen of the Kell blood group system (ISBT No.006: Kell/KEL) is highly immunogenic, behind the RhD antigen, which is the strongest in its immunogenicity [20, 68, 95, 96]. Anti-KEL1 antibodies account for approximately two-thirds of nonRh immune red cell alloantibodies. There are approximately 5000 Kell determinants per red cell (Table 1). The Kell protein is a 732 amino acid-long 83 000 Da polypeptide. This is a type 1 transmembrane protein with a cytosolic NH2 -terminal segment, a single hydrophobic membrane-spanning segment, and a large COOH-terminal extracellular domain [95, 96]. Six potential asparagine-linked glycosylation sites and 15 cysteine residues are present in the extracellular domain. This protein demonstrates two heptad arrays of leucine residues, with clustered cysteine residues, in a leucine zipper motif that may be involved in protein/protein interactions. There is a primary sequence similarity of the Kell glycoprotein with the neprilysin family of zinc-binding neutral endopeptidases such as bradykinin, neurotensin, enkephalin, oxytocin, and angiotensins I and II. Biochemically, the monospecific anti-Kell antibodies precipitate immunologically a single 93 000 Da red cell membrane protein with two different Kell epitopes. This polypeptide exists as part of a large complex (from 115 × 103 Da to 200 × 103
20 Integral Proteins of the Erythrocyte Membrane
Da in size) under non-reducing conditions. The Kell glycoprotein appears to exist as a homodimer in the membrane. 3.6 Lewis Blood Group
The Lewis antigens (ISBT No.007: Lewis/LE) expressed in red cells are unique, because these antigens themselves are not synthesized by erythroid precursors. These antigens are Lewis-active glycosphingolipid molecules basically present in plasma, and are adsorbed by the red cell membrane through an apparently passive process [20, 68]. Two forms of the Lewis antigens (Lea and Leb ) are known as complexes with low- and high-density lipoproteins, and also as aqueous dispersions in the plasma (Table 1). There are approximately 4.5–7.3 × 103 Lea molecules per red cell. The molecule associates with red cell membranes through the ceramide moiety. The Lewis antigens are absent in red cells of newborns, and appear approximately 10 days after birth. The full activity of the Lewis antigens (Lea ) in red cells is established at approximately 24 months of age. The Lea and Leb antigens are synthesized by two distinct fucosyltransferases under the control of the Le blood group gene and the Se blood group gene. The Le gene corresponds to an α (1, 3/1, 4) fucosyltransferase gene which is an Fuc-TIII or FUT3 gene [69, 70, 80]. The enzyme encoded by this gene can use oligosaccharide precursors, including unsubstituted type 1 oligosaccharide precursors, to produce the Lea antigen, and type 1 H antigens to generate Leb antigens. Le genedependent expression of Lea and Leb molecules is identified at the epithelia lining the respiratory tract, urinary tract, digestive tract, salivary glands, and bile ducts [69, 70, 73, 80]. These tissues correspond to the tissue types capable of expression of type 1 H molecules, the synthesis of which is determined by the Se locus. α (1,3/1,4) Fucosyltransferase (FUT3) is a 363 amino acid-long type II transmembrane glycoprotein, which is composed of a 15 amino acid-long NH2 -terminal cytosolic segment, a 19 residue transmembrane segment, and a 320 amino acidlong COOH-terminal catalytic segment, which is located in the Golgi apparatus. The enzyme produces several types of α (1,3) – and α (1,4) fucosylated oligosaccharides, which are Lea , Leb , Lex , and Ley molecules, and sialylated forms of the Lea and Lex antigens. The FUT3 gene, as a member of an α (1,3) fucosyltransferase gene family, is located on chromosome 19p13.3. The Lewis α (1,3/1,4) fucosyltransferase can use the oligosaccharide products formed by the Se-determined α (1,2) fucosyltransferase. In addition, the Se and Le fucosyltransferases are expressed in many of the same tissues. Therefore, the genotype at these two genes determines which of the Lewis-active oligosaccharide molecules is constructed. In secretor-positive subjects, type 1 oligosaccharide precursors are first converted into type 1 H molecules, which become substrates for the Lewis locus-encoded α (1,3/1,4) fucosyltransferase [70, 80]. This enzyme converts these into Le -active molecules: that is, an Le (a–b+) phenotype. On the other hand, in non-secretors, type 1 H antigens cannot be produced in secretory epithelia. Thus,
3 Blood Group Antigens 21
these unsubstituted type 1 molecules are converted into Lea -active oligosaccharides by the Le-encoded α (1,3/1,4) fucosyltransferase: that is an Le (a+b–) phenotype. In homozygotes for null alleles at the Le gene, the phenotype should be Le (a−b−). In this case, two possibilities exist. In Lewis-negative and secretor-positive subjects, type 1 H determinants produced remain unconverted into Le antigens. In Lewisnegative, secretor-negative subjects, the type 1 precursors remain unsubstituted by either blood group fucosyltransferase. 3.7 Duffy Blood Group
There are two major alleles in the Duffy blood group system (ISBT No.008: Duffy/FY), that is, Fya and Fyb with virtually equivalent frequencies (Table 1). Other alleles are Fyx with a weakly reactive form of Fyb , Fy of a null allele, which produces no antigenic activities of Fya and Fyb [20, 68, 97]. The Duffy antigen is a 338 amino acid-long polypeptide with seven membranespanning segments. The Fya and Fyb antigens are localized on a glycoprotein with a molecular weight of from 38 × 103 to 90 × 103 Da. A significant amount of this molecule corresponds to asparagine-linked oligosaccharides, which demonstrate the heterogeneous migration properties in its native condition. The Fya and Fyb alleles differ only by a single amino acid substitution at codon 44: a glycine for the Fya antigen, and aspartic acid for the Fy antigen. Functionally, this protein corresponds to the red cell chemokine receptor, known as DARC (Duffy antigen receptor for chemokines). DARC removes excessive chemokines from the blood and tissues. In the homozygotes for the Fy allele, DARC expression in the bone marrow is defective, in spite of the normal expression of DARC in the extra-marrow tissues. This polymorphism in the tissue-specific expression of DARC is accounted for by a single base pair difference between the Fy allele and the Fya or Fyb alleles [98, 99]. The sequence change is located in the promoter region of the DARC gene, and in the Fy allele, leading to the disruption of a binding site for the erythroid lineage-specific transcription factor GATA-1. Duffy antigens are important for the invasion of human red cells by plasmodium vivax. Successful invasion of the merozoite into the red cells requires the subsequent formation of a junction between the apex of the merozoite and the red cells. Formation of this junction and penetration of the merozoite into the red cells only occur on Duffy-positive red cells [100]. 3.8 Kidd Blood Group
The Kidd antigen of the Kidd blood group system (ISBT No.009: Kidd/Jk) is a 46 000 to 60 000 Da protein [20, 68]. There are approximately 14 000 Kidd molecules in the red cells (Table 1). Two antigens (Jka and Jkb ) are expressed in their virtually equal gene frequencies, that is, 0.514 for the JK a allele and 0.486 for the JK b allele. A single amino acid substitution accounts for the polymorphism in the Kidd blood
22 Integral Proteins of the Erythrocyte Membrane
group system, that is, Asp at codon 280 of the Kidd gene for Jka , and Asn at the same codon for Jkb , respectively. Approximately half of normal individuals express the Jk (a+b+) phenotype in the red cells, owing to the genotype Jka Jkb . A quarter of individuals (26% or 24%) exhibit the Jk(a+b–) phenotype due to the genotype Jka Jka , or the Jk (a–b+) phenotype due to the genotype Jkb Jkb , respectively The Jk (a−b−) phenotype is rarely observed. The gene mutation of aberrant splicing in the Kidd gene leads to the Kidd null phenotype. It has been shown that the primary sequence of a human red cell urea transporter corresponds to the Kidd antigen [101]. 3.9 LW Blood Group
There are two major allelic antigens with the LW blood group (ISBT No.016: Landsteiner-Weiner/LW), that is, LWa and LWb [20, 68, 102]. The LWa antigen predominates and the LWb antigen is detected in less than 1 % of the total population (Table 1). The two alleles yield the common phenotype LW (a+b–), the much less common phenotypes LW (a–b+) and LW (a+b+), and the rare phenotype LW (a−b−), which is homozygous for null alleles at the LW locus. Expression of the LW polypeptides is dependent on expression of the Rh polypeptides. There are approximately 4400 LW molecules per red cell. The LW antigen corresponds to a 42 kDa red cell glycoprotein with a deglycosylated molecular mass of 25 000 Da. The COOH-terminal region of the LW glycoprotein is present at the surface of the red cells. Two structurally distinct LW proteins are present. The first one is a type I transmembrane protein with a short cytoplasmic tail. Another one is a molecule without the membrane-spanning and cytoplasmic regions of the first longer form. The LW antigens are also one of the ICAM (the intracellular adhesion molecule) family, such as ICAM-4 [103]. This protein shares approximately 30% of the same identity of the protein sequence with other members (ICAMs-1, 2, and 3). The LW gene corresponds to three exons of 2.65 kb on human chromosome 19 [104]. The LW a and LW b alleles are different at a single base pair, in a codon 70 corresponding to one amino acid residue, that is, glutamine for LW a , and arginine for LW b . One LW null allele is known with a 10 base pair deletion, resulting in a truncated protein at a position proximal to its transmembrane and cytosolic regions. 3.10 Ii Blood Group
The commonly expressed antigen of the Ii blood group system is denoted I, and its absence is called i [20, 68, 105, 106] (Table 1). The Ii antigens are carbohydrate molecules. I activity corresponds to branched oligosaccharide structure formed by an N-acetylgalactosamine unit attached in β1,6 linkage to a galactose residue within linear lactosamine polymers, whereas molecules with i reactivity correspond
3 Blood Group Antigens 23
to oligosaccharide chains containing at least two repeating N-acetylgalactosamine units. It is known that oligosaccharide chains in neonatal red cells are unbranched, and that those in adult red cells are branched. Red cell expression of the Ii blood group system is developmentally regulated. I determinants are deficient in embryonic, and cord red cells, which are highly reactive with anti-i antibodies. During the first 18 months after birth, this relationship is reversed to yield red cells with increased anti-I reactivity and diminished i reactivity, which is usually observed in adult red cells. This phenomenon corresponds to the fact that the increase in the I reactivity and the decrease in the i reactivity during early infancy are associated with the elaboration and display of increasing numbers of β1,6-linked lactosamine units. I reactivity is determined by a locus encoding an N-acetylglucosaminyltransferase that is expressed in a developmentally regulated fashion. The functions of the Ii blood group system are yet to be defined. 3.11 The Diego and Wright Blood Group Antigens on Band 3
Band 3 (anion exchanger 1: AE1) in red cells demonstrates several polymorphic peptide epitopes. The two major ones are the Diego blood group system (ISBT No.010: Diego/DI) and the Wright blood group system [20, 68] (Table 1). The Diego (Dia ) allele occurs relatively frequently in the Japanese population, but is rare in Caucasians. There are two Diego antigens, i. e., Dia and Dib . The Dia antigens correspond to a proline at position 854 of the 911 amino acid-long glycoprotein (band 3), whereas the Dib antigens correspond to a leucine residue at the same position 854 of this molecule [107]. The Wright alleles correspond to the antigens Wra and Wrb . The frequency of the Wr a allele is rare, but the Wr b allele is fairly prevalent. The Wra antigens correspond to a lysine residue at codon 658 of the AE1 gene, and the Wrb antigens to a glutamine residue at the same codon 658, respectively [108]. Wrb expression is suppressed in glycophorin A deficiency. Seven minor antigens are present on band 3, these are: Waldner (Waa ), Redelberger (Rba ), Traversu (Tra ), Wulfsberg (Wu), Moen (Moa ), ELO, and Warrior (WARR). Amino acid substitutions at codon 552 (Thr → Ile), and at codon 565 (Gly → Ala) of the band 3 gene are known as WARR and Wu, respectively. Similarly, amino acid substitutions at codon 432 (Arg → Trp), at codon 548 (Pro → Leu), at codon 551 (Lys → Asn), and at codon 557 (Val → Met) correspond to ELO, Rba , Tra , and Wda antigens, respectively. 3.12 Other Minor Blood Group Antigens
The Chido (Ch) blood group system and the Rodgers (Rg) blood group system are known to be complementary-associated blood group antigens (Table 1).
24 Integral Proteins of the Erythrocyte Membrane
The decay-accelerating factor (DAF) is one of the complementary regulatory proteins associated with red cell membranes as a glycosylphosphatidylinositol (GPI)-anchor protein. DAF-associated antigens are termed Cromer (Cr)-related antigens. The rare Inab phenotype corresponds to a total deficiency in the Cromerrelated antigens (Table 1). The Knops (Kna ) antigens in red cell membranes are expressed by the complementary receptor 1 protein (CR 1) (Table 1). The Cartwright (Yta and Ytb ) blood group system is associated with acetylcholinesterase in red cells, which is also a GPI-anchor protein. The type III red cells of paroxysmal nocturnal hemoglobinuria is deficient in acetylcholinesterase activity, demonstrating the Cartwright null phenotype (Table 1). The Indian (Ina and Inb ) antigens are expressed on CD44, which is the red cell isoform of the hyaluronan-binding protein (Table 1). The Colton (Coa and Cob ) blood group antigens are carried on an aquaporin-1, which is a red cell glycoprotein (Table 1). The physiological relevance of these antigens remains to be clarified [20, 64, 68].
4 Glycosyl Phoshatidylinositol (GPI) Anchor Proteins
Glycosylphosphatidylinositol (GPI) anchor proteins are proteins with complex glycolipid structures, which are highly conserved in all eukaryotic cells [109–111]. The common core region consists of a molecule of phosphatidylinositol (PI) to which are attached four sugars, that is, one molecule of N-glucosamine and three molecules of mannose (Fig. 5). The last mannose is attached to the carboxyl end of the protein through phosphoethanolamine. A palmitoyl residue is added to the inositol but may later be removed. The N-glucosamine is derived from the N-acetylglucosamine that is added and then deacetylated. The mannose residue is derived from dolichyl phosphoryl mannose. GPI anchor biosynthesis starts on the cytoplasmic side of the rough endoplasmic reticulum [112]. N-Acetylglucosamine is transferred to a phosphatidylinositol acceptor. This product is deacetylated to form glucosamidyl phosphatidylinositol. The first mannose is derived from dolichol phosphoryl mannose, as described above. The phosphoethanolamine is added. Addition of the GPI anchor precursor to the carboxy-terminus of the protein occurs on the luminal side of the endoplasmic reticulum, after amino acid residues 17 to 31 have been cleaved from the protein. When the synthesis of the anchor is completed, it proceeds through the Golgi apparatus and on the membrane surface. There are many membrane proteins which are related to GPI-anchor proteins, such as: (1) enzymes; acetylcholinesterase and leukocyte alkaline phosphatase, (2) complementary defense proteins; decay accelerating factor (DAF and CD55), a membrane inhibitor of reactive lysis (MIRL, CD59, protectin), and C8-binding protein (homologous restriction factor), (3) immunologic proteins; Fcy receptor
References
Fig. 5 Biosynthetic pathway of glycosyl phosphatidylinositol (GPI)anchoring proteins (A), and the molecular structure of the PIG-A gene (B).
IIIa, lymphocyte function-associated antigen-3 (LFA-3, CD58), endotoxin-binding protein receptor (CD14), and CDw 52 (Campath-1), (4) receptors; urokinase (plasminogen activator) receptor, and folate receptor, and (5) granulocyte proteins of unknown function; CD14, CD48, CD66, and Dombrock-Holley/Gregory-bearing protein. It is known that these proteins are missing from the blood cells in paroxysmal nocturnal hemoglobinuria (PNH), in which PNH blood cells are unable to synthesize the mature GPI anchor precursors. In lymphoblastoid cells of PNH, the block always occurs at the step when N-acetylglucosamine is transferred from UDPN-acetylglucosamine to phosphatidylinositol. This is the first step of the pathway and is catalyzed by the α1,6-N-acetylglucosaminyltransferase, which is an enzyne complex formed by four gene products, i. e., PIGA, PIGC, PIGH, and hGP1 [113]. Of these, the PIGA gene has been identified as pathognomonic for PNH [114]. The PIGA (phosphatidylinositol glycan complementation group A) gene consists of six exons and encodes a putative protein of 484 amino acids (approximately 60 kDa). This gene is located on chromosome Xp22.1. Many mutations of this gene have been reported in PNH patients. Recent advances on PNH and related disorders have recently been reviewed in detail [115].
25
26 Integral Proteins of the Erythrocyte Membrane
References 1 TANNER, M. J. A. (1993) Molecular and cellular biology of the erythrocyte anion exchanger (AE1). Semin. Hematol. 30: 34–57. 2 TANNER, M. J. (1997) The structure and function of band 3 (AE1): Recent developments. Mol. Membr. Biol. 14: 155–165. 3 HAMASAKI, N., JENNINGS, M. J., (eds.) (1989) Anion Transport Proteins of the Red Cell Membrane. Elsevier, Amsterdam. 4 BAMBERG, E., PASSAW, H., (eds.) (1992) The Band 3 Proteins: Anion Transporters, Binding Proteins and Senescent Antigens. Elsevier, Amsterdam. 5 LOW, P. S. (1986) Structure and function of the cytoplasmic domain of band 3: Center of erythrocyte membrane-peripheral protein interactions. Biochim. Biophys. Acta 864: 145–167. 6 WANG, D. N., KUHLBRANDT, W., SARABIA, V. E., REITHMEIER, R. A. (1993) Two-dimensional structure of the membrane domain of human band 3, the anion transport protein of the erythrocyte membrane. EMBO J. 12: 2233–2239. 7 LUX, S. E., JOHN, K. M., KOPITO, R. R., LODISH, H. F. (1989) Cloning and characterization of band 3, the human erythrocyte anion-exchange protein (AE1). Proc. Natl. Acad. Sci. USA 86: 9089–9093. 8 ALPER, S. L. (1991) The band 3-related anion exchanger (AE) gene family. Annu. Rev. Physiol. 53: 549–564. 9 ZHANG, D., KIYATKIN, A., BOLIN, J. T., LOW, P. S. (2000) Crystallographic structure and functional interpretation of the cytoplasmic domain of erythrocyte membrane band 3. Blood 96: 2925– 2933. 10 STECK, T. L., RAMOS, B., STRAPAZON, E. (1976) Proteolytic dissection of band 3, the predominant transmembrane polypeptide of the human erythrocyte membrane. Biochemistry 15: 1154–1161. 11 MOHANDAS, N., WINARDI, R., KNOWLES, D., LEUNG, A., PARRA, M., GEORGE, E., CONBOY, J., CHASIS, J. (1992) Molecular
12
13
14
15
16
17
18
19
20
basis for membrane rigidity of hereditary ovalocytosis: A novel mechanism involving the cytoplasmic domain of band 3. J. Clin. Invest. 89: 686–692. DAVIS, L., LUX, S. E., BENNETT, V. (1989) Mapping the ankyrin-binding site of the human erythrocyte anion exchanger. J. Biol. Chem. 264: 9665–9672. WILLARDSON, B. M., THEVENIN, B. J., HARRISON, M. L., KUSTER, W. M., BENSON, M. D., LOW, P. S. (1989) Localization of the ankyrin-binding site on erythrocyte membrane protein, band 3. J. Biol. Chem. 264: 15893–15899. DING, Y., KOBAYASHI, S., KOPITO, R. (1996) Mapping of ankyrin binding determinants on the erythroid anion exchanger, AE1. J. Biol. Chem. 271: 22494–22498. Van DORT, H. M., MORIYAMA, R., LOW, P. S. (1998) Effect of band 3 subunit equilibrium on the kinetics and affinity of ankyrin binding to erythrocyte membrane vesicles. J. Biol. Chem. 273: 14819–14826. LOMBARDO, C. R., WILLARDSON, B. M., LOW, P. S. (1992) Localization of the protein 4.1-binding site on the cytoplasmic domain of erythrocyte membrane band 3. J. Biol. Chem. 267: 9540–9546. WORKMAN, R. F., LOW, P. S. (1998) Biochemical analysis of potential sites for protein 4.1-mediated anchoring of the spectrin-actin skeleton to the erythrocyte membrane. J. Biol. Chem. 273: 6171–6176. RYBICKI, A. C., MUSTO, S., SCHWARTZ, R. S. (1995) Identification of a band 3-binding site near the N-terminus of erythrocyte membrane protein 4.2. Biochem. J. 309: 677–681. FUKUDA, M. (1993) Molecular genetics of the glycophorin A gene cluster. Semin. Hematol. 30: 138–151. LOWE, J. B. (2001) Red cell membrane antigens, in: Molecular Basis of Blood Disease, (STAMATOYANNOPOULOS, G., MAJERUS, P. W., PERLMUTTER, R. M., VARMUS, H. eds.), 3rd ed. McGraw-Hill, New York, pp. 314–361.
References 21 HASSOUN, H., HANADA, T., LUTCHMAN, M., SAHR, K. E., PALEK, J., HANSPAL, M., CHISHTI, A. H. (1998) Complete deficiency of glycophorin A in red blood cells from mice with targeted inactivation of the band 3 (AE1) gene. Blood 91: 2146–2151. 22 BRUCE, L. J., RING, S. M., ANSTEE, D. J., REID, M. E., WILKINSON, S., TANNER, M. J. (1995) Changes in the blood group Wright antigens are associated with a mutation at amino acid 658 in human erythrocyte band 3: A site of interaction between band 3 and glycophorin A under certain conditions. Blood 85: 541–547. 23 TELEN, M. J., CHASIS, J. A. (1990) Relationship of the human erythrocyte Wr antigen to an interaction between glycophorin A and band 3. Blood 76: 842–848. 24 CASEY, J. R., REITHMEIER, R. A. (1991) Analysis of the oligomeric state of band 3, the anion transport protein of the human erythrocyte membrane, by size exclusion high performance liquid chromatography. Oligomeric stability and origin of heterogeneity. J. Biol. Chem. 266: 15726–15737. 25 TSAI, I. H., MURTHY, S. N., STECK, T. L. (1982) Effect of red cell membrane binding on the catalytic activity of glyceraldehyde-3 -phosphate dehydrogenase. J. Biol. Chem. 257: 1438–1442. 26 DE, B. K., KIRTLEY, M. E. (1977) Interaction of phosphoglycerate kinasse with human erythrocyte membranes. J. Biol. Chem. 252: 6715–6720. 27 JENKINS, J. D., MADDEN, D. P., STECK, T. L. (1984) Association of phospho-fructokinase and aldolase with the membrane of the intact erythrocyte. J. Biol. Chem. 259: 9374–9378. 28 HARRISON, M. L., RATHINAVELU, P., ARESE, P., GEAHLEN, R. L., LOW, P. S. (1991) Role of band 3 tyrosine phosphorylation in the regulation of erythrocyte glycolysis. J. Biol. Chem. 266: 4106–4111. 29 CHE´ TRITE, G., CASSOLY, R. (1985) Affinity of hemoglobin for the cytoplasmic fragment of human erythrocyte
30
31
32
33
34
35
36
37
38
membrane band 3. Equilibrium measurements at physiological pH using matrix-bound proteins: The effects of ionic strength, deoxygenation and of 2, 3-diphosphoglycerate. J. Mol. Biol. 185: 639–644. WAUGH, S. M., WALDER, J. A., LOW, P. S. (1987) Partial characterization of the copolymerization reaction of erythrocyte membrane band 3 with hemi-chromes. Biochemistry 26: 1777–1783. FUJINAGA, J., TANG, X. B., CASEY, J. R. (1999) Topology of the membrane domain of human erythrocyte anion exchange protein, AE1. J. Biol. Chem. 274: 6626–6633. FORSTER, R. E., GROS, G., LIN, L., ONO, Y., WUNDER, M. (1998) The effect of 4,4 -diisothiocyanato-stilbene-2, 2 -disulfo-nate on CO2 permeability of the red blood cell membrane. Proc. Natl. Acad. Sci. USA 95: 15815– 15820. VINCE, J. W., REITHMEIER, R. A. (1998) Carbonic anhydrase II binds to the carboxyl terminus of human band 3, the erythrocyte Cl− /HCO3 − exchanger. J. Biol. Chem. 273: 284302–28437. TANG, X. B., KOVACS, M., STERLING, D., CASEY, J. R. (1999) Identification of residues lining the translocation pore of human AE1, plasma membrane anionexchange protein. J. Biol. Chem. 274: 3557–3564. TSUJI, A., KAWASAKI, K., OHNISHI, S. (1988) Regulation of band 3 mobilities in erythrocyte ghost membranes by protein association and cytoskeletal meshwork. Biochemistry 27: 7447–7452. CORBETT, J. D., AGRE, P., PALEK, J., GOLAN, D. E. (1994) Differential control of band 3 lateral and rotational mobility in intact red cells. J. Clin. Invest. 94: 683–688. TOMISHIGE, M., SAKO, Y., KUSUMI, A. (1998) Regulation mechanism of the lateral diffusion of band 3 in erythrocyte membranes by the membrane skeleton. J. Cell Biol. 142: 989–1000. JAROLIM, P., RUBIN, H. L., ZAKOVA, D., STORRY, J., REID, M. E. (1998) Characterization of seven low incidence blood group antigens carried by
27
28 Integral Proteins of the Erythrocyte Membrane
39
40
41
42
43 44
45
46
47
erythrocyte band 3 protein. Blood 92: 4836–4843. BROSIUS, F. C. 3rd., ALPER, S. L., GARCIA, A. M., LODISH, H. F. (1989) The major kidney band 3 gene transcript predicts an amino-terminal truncated band 3 polypeptide. J. Biol. Chem. 264: 7784–7787. DING, Y., CASEY, J. R., KOPITO, R. R. (1994) The major kidney AE1 isoform does not bind ankyrin (Ank 1) in vitro. An essential role for the 79 NH2 -terminal amino acid residues of band 3. J. Biol. Chem. 269: 32201– 32208. WANG, C. C., MORIYAMA, R., LOMBARDO, C. R., LOW, P. S. (1995) Partial characterization of the cytoplasmic domain of human kidney band 3. J. Biol. Chem. 270: 17892–17897. CARTRON, J. P., LE VAN KIM, C., COLIN, Y. (1993) Glycophorin C and related glycoproteins: Structure, function, and regulation. Semin. Hematol. 30: 152–168. CHASIS, J. A., MOHANDAS, N. (1992) Red cell glycophorins. Blood 80: 1869–1879. TOMITA, M., FURTHMAYR, H., MARCHESI, V. T. (1978) Primary structure of human erythrocyte glycophorin-A. Isolation and characterization of peptides and complete amino acid sequence. Biochemistry 17: 4756–4770. SIEBERT, P. D., FUKUDA, M. (1986) Isolation and characterization of human glycophorin A cDNA clones by a synthetic oligonucleotide approach: Nucleotide sequence and mRNA structure. Proc. Natl. Acad. Sci. USA 83: 1665–1669. SIEBERT, P. D., FUKUDA, M. (1987) Molecular cloning of a human glycophorin B cDNA: Nucleotide sequence and genomic relationship to glycophorin A. Proc. Natl. Acad. Sci. USA 84: 6735–6739. COLIN, Y., RAHUEL, C., LONDON, J., ROMEO, P. H., D AURIOL, L., GALIBERT, F., CARTRON, J. P. (1986) Isolation of cDNA clones and complete amino acid sequence of human erythrocyte glycophorin C.J. Biol. Chem. 261: 229–233.
48 BLANCHARD, D., DAHR, W., HUMMEL, M., LATRON, F., BEYREUTHER, K., CARTRON, J. P. (1987) Glycophorin B and C from human erythrocyte membranes: Purification and sequence analysis. J. Biol. Chem. 262: 5808–5811. 49 KUDO, S., FUKUDA, M. (1990) Identification of a novel human glycophorin, glycophorin E, by isolation of genomic clones and complementary DNA clones utilizing polymerase chain reaction. J. Biol. Chem. 265: 1102–1110. 50 VIGNAL, A., RAHUEL, C., LONDON, J., CHERIF-ZAHAR, B., SCHAFF, S., HATTAB, C., OKUBO, Y., CARTRON, J. P. (1990) A novel member of the human glycophorin A and B gene family. Molecular cloning and expression. Eur. J. Biochem. 191: 619–625. 51 VIGNAL, A., LONDON, J., RAHUEL, C., CARTRON, J. P. (1990) Promoter sequence and chromosomal organization of the genes encoding glycophorins A, B and E. Gene 95: 289–293. 52 ONDA, M., FUKUDA, M. (1995) Detailed physical mapping of the genes encoding glycophorins A, B and E, as revealed by p1 plasmids containing human genomic DNA. Gene 159: 225–230. 53 BLUMENFELD, O. O., HUANG, C. H. (1997) Molecular genetics of glycophorin MNS variants. Transfus. Clin. Biol. 4: 357–365. 54 TANNER, M. J. A., ANSTEE, D. J. (1976) The membrane change in En (a–) human erythrocytes. Absence of the major erythrocyte sialoglycoprotein. Biochem. J. 153: 271–277. 55 PETERS, L. L., SHIVDASANI, R. A., LIU, S. C., HANSPAL, M., JOHN, K. M., GONZALEZ, J. M., BRUGNARA, C., GWYNN, B., MOHANDAS, N., ALPER, S. L., ORKIN, S. H., LUX, S. E. (1996) Anion exchanger 1 (band 3) is required to prevent erythrocyte membrane surface loss but not to form the membrane skeleton. Cell 86: 917–927. 56 SOUTHGATE, C. D., CHISHTI, A. H., MITCHELL, B., YI, S. J., PALEK, J. (1996) Targeted disruption of the murine erythroid band 3 gene results in spherocytosis and severe haemolytic anaemia despite a normal membrane skeleton. Nature Genet. 14: 227–230.
References 57 INABA, M., YAWATA, A., KOSHINO, I., SATO, K., TAKEUCHI, M., TAKAKUWA, Y., MANNO, S., YAWATA, Y., KANZAKI, A., SAKAI, J., BAN, A., ONO, K., MAEDE, Y. (1996) Defective anion transport and marked spherocytosis with membrane instability caused by hereditary total deficiency of red cell band 3 due to a nonsense mutation. J. Clin. Invest. 97: 1804–1817. 58 CHISHTI, A. H., PALEK, J., FISHER, D., MAALOUF, G. J., LIU, S. C. (1996) Reduced invasion and growth of plasmodium falciparum into elliptocytic red blood cells with a combined deficiency of protein 4.1, glycophorin C, and p55. Blood 87: 3462–3469. 59 MACKENZIE, K. R., PRESTEGARD, J. H., ENGELMAN, D. M. (1997) A transmembrane helix dimer: Structure and implications. Science 276: 131–133. 60 EL-MALIKI, B., BLANCHARD, D., DAHR, W., BEYREUTHER, K., CARTRON, J. P. (1989) Structural homology between glyco-phorins C and D of human erythrocytes. Eur. J. Biochem. 183: 639–643. 61 COLIN, Y., LE VAN KIM, C., TSAPIS, A., CLERGET, M., D’AURIOL, L., LONDON, J., GALIBERT, F., CARTRON, J. P. (1989) Human erythrocyte glycophorin C. Gene structure and rearrangement in genetic variants. J. Biol. Chem. 264: 3773– 3780. 62 DAHR, W., KIEDROWSKI, S., BLANCHARD, D., HERMAND, P., MOULDS, J. J., CARTRON, J. P. (1987) High frequency of human erythrocyte membrane sialoglycoproteins, V Characterization of the Gerbich blood group antigens: Ge 2 and Ge 3. Biol. Chem. Hoppe-Seyler, 368: 1375–1383. 63 LEWIS, M., ANSTEE, D. J., BIRD, G. W. G., BRODHEIM, E., CARTRON, J. P., CONTREAS, M., CROOKSTON, M. C., DAHR, W., DANIELS, G. L., ENGELFRIET, C. P., GILES, CM., ISSITT, P. D., JØRGENSEN, J., KORNSTAD, L., LUBENKO, A., MARSH, W. L., MCCREARY, J., MOORE, B. P. L., MOREL, P., MOULDS, J. J., NAVANLINNA, H., NORDHAGEN, R., OKUBO, Y., ROSENFIELD, R. E., ROUGER, Ph., RUBINSTEIN, P., SALMON, Ch., SEIDL, S.,
64
65
66
67
68
69
70
71
72
73
74
SISTONEN, P., TIPPETT, P., WALKER, R. H., WOODFIELD, G., YOUNG, S. (1990) Blood group terminology 1990. Vox Sang. 58: 152–169. DANIELS, G. L., ANSTEE, D. J., DAHR, W., JØRGENSEN, J., LEVENE, C., LUBENKO, A., MOULDS, J. J., OVERBEEKE, M., ROUGER, P., SISTONEN, P., WOODFIELD, G. (1995) Blood group terminology 1995. ISBT Working Party on terminology for red cell surface antigens. Vox Sang. 69: 265–279. CARTRON, J. P., BAILLY, P., LE VAN KIM, C., CHERIF-ZAHAR, B., MATASSI, G., BERTRAND, O., COLIN, Y. (1998) Insights into the structure and function of membrane polypeptides carrying blood group antigens. Vox Sang. 74 (Suppl. 2): 29–64. REID, M. E., MCMANUS, K., ZELINSKI, T. (1998) Chromosome location of genes encoding human blood groups. Transfus. Med. Rev. 12: 151–161. AVENT, N. D. (1997) Human erythrocyte antigen expression: Its molecular bases. Br. J. Biomed. Sci. 54: 16–37. MOLLISON, P. L., ENGELFRIET, C. P., CONTRERAS, M., (eds.) (1997) Blood Transfusion in Clinical Medicine. 10th ed. Blackwell, Oxford. ORIOL, R., LE PENDU, J., MOLLICONE, R. (1986) Genetics of ABO, H, Lewis, X and related antigens. Vox Sang. 51: 161–171. WATKINS, W. M. (1980) Biochemistry and genetics of the ABO, Lewis, and P blood group systems. Adv. Hum. Genet. 10: 1–136, 379–385. YAMAMOTO, F. (1995) Molecular genetics of the ABO histo-blood group system. Vox Sang. 69: 1–7. HAKOMORI, S. (1981) Blood group ABH and Ii antigens of human erythrocytes: Chemistry polymorphism, and their developmental change. Semin. Hematol. 18: 39–62. ORIOL, R. (1990) Genetic control of the fucosylation of ABH precursor chains. Evidence for new epistatic interactions in different cells and tissues. J. Immunogenet. 17: 235–245. LAINE, R. A., RUSH, J. S. (1988) Chemistry of human erythrocyte polylactosamine glycopeptides
29
30 Integral Proteins of the Erythrocyte Membrane
75
76 77
78
79
80
81
82
83
84
(erythroglycans) as related to ABH blood group antigenic determinants. Adv. Exp. Med. Biol. 228: 331–347. EASTLUND, T. (1998) The histo-blood group ABO system and tissue transplantation. Transfusion 38: 975–988 DANIELS, G. (1995) Human Blood Groups. Blackwell, Oxford. JOZIASSE, D. H. (1992) Mammalian glycosyltransferases: Genomic organization and protein structure. Glycobiology 2: 271–277. LARSEN, R. D., ERNST, L. K., NAIR, R. P., LOWE, J. B. (1990) Molecular cloning, sequence, and expression of human GDP-L-fucose: β-D-galactoside 2-β-L-fucosyltransferase cDNA that can form the H blood group antigen. Proc. Natl. Acad. Sci. USA 87: 6674– 6678. ROUQUIER, S., LOWE, J. B., KELLY, R. J., FERTITTA, A. L., LENNON, G. G., GIORGI, D. (1995) Molecular cloning of a human genomic region containing the H blood group alpha (1, 2) fucosyltransferase gene and two H locus-related DNA restriction fragments. Isolation of a candidate for the human Secretor blood group locus. J. Biol. Chem. 270: 4632–4639. COSTACHE, M., CAILLEAU, A., FERNANDEZ-MATEOS, P., ORIOL, R., MOLLICONE, R. (1997) Advances in molecular genetics of alpha-2-and alpha-3/4-fucosyltransferases. Transfus. Clin. Biol. 4: 367–382. FERNANDEZ-MATEOS, P., CAILLEAU, A., HENRY, S., COSTACHE, M., ELMGREN, A., SVENSSON, L., LARSON, G., SAMUELSSON, B. E., ORIOL, R., MOLLICONE, R. (1998) Point mutations and deletion responsible for the Bombay H null and the Reunion H weak blood groups. Vox Sang. 75: 37–46. AVENT, N. D., REID, M. E. (2000) The Rh blood group system: a review. Blood 95: 375–387. HUANG, C. H., LIU, P. Z., CHENG, J. G. (2000) Molecular biology and genetics of Rh blood group system. Semin. Hematol. 37: 150–165. SIEGEL, D. L. (1998) The human immune response to red blood cell antigens as
85
86
87
88
89
90
91
92
93
revealed by repertoire cloning. Immunol. Res. 17: 239–251. HUANG, C. H. (1997) Molecular insights into the Rh protein family and associated antigens. Curr. Opin. Hematol. 4: 94–103. CHE´ RIF-ZAHAR, B., RAYNAL, V., GANE, P., MATTEI, M. G., BAILLY, P., GIBBS, B., COLIN, Y., CARTRON, J. P. (1996) Candidate gene acting as a suppressor of the RH locus in most cases of Rh-deficiency. Nature Genet. 12: 168–173. HUANG, C. H. (1998) The human Rh 50 glycoprotein gene. Structural organization and associated splicing defect resulting in Rh (null) disease. J. Biol. Chem. 273: 2207–2213. MATASSI, G., CHE´ RIF-ZAHAR, B., RAYNAL, V., ROUGER, P., CARTRON, J. P. (1998) Organization of the human RH50A gene (RHAG) and evolution of base composition of the RH gene family. Genomics 47: 286–293. BRUCE, L. J., GHOSH, S., KING, M. J., LAYTON, D. M., MAWBY, WJ., STEWART, G. W., OLDENBORG, P. A., DELAUNAY, J., TANNER, M. J. (2002) Absence of CD47 in protein 4.2-deficient hereditary spherocytosis in man: An interaction between the Rh complex and the band 3 complex. Blood 100: 1878–1885. MARCUS, D. M., KUNDU, S. K., SUZUKI, A. (1981) The P blood group system: Recent progress in immunochemistry and genetics. Semin. Hematol. 18: 63–71. YANG, Z., BERGSTROM, J., KARLSSON, K. A. (1994) Glycoproteins with Gal alpha 4 Gal are absent from human erythrocyte membranes, indicating that glycolipids are the sole carriers of blood group P activities. J. Biol. Chem. 269: 14620–14624. RAHUEL, C., LE VAN KIM, C., MATTEI, M. G., CARTRON, J. P., COLIN, Y. (1996) A unique gene encodes spliceoforms of the B-cell adhesion molecule cell surface glycoprotein of epithelial cancer and of the Lutheran blood group glycoprotein. Blood 88: 1865–1872. EL NEMER, W., RAHUEL, C., COLIN, Y., GANE, P., CARTRON, J. P., LE VAN KIM, C.
References
94
95
96
97
98
99
100
101
(1997) Organization of the human LU gene and molecular basis of the Lu (a)/Lu (b) blood group polymorphism. Blood 89: 4608–4616. PARSONS, S. F., MALLINSON, G., DANIELS, G. L., GREEN, C. A., SMYTHE, J. S., ANSTEE, D. J. (1997) Use of domain-deletion mutants to locate Lutheran blood group antigens to each of the five immunoglobulin superfamily domains of the Lutheran glycoprotein: Elucidation of the molecular basis of the Lu (a)/Lu (b) and the Au (a)/Au (b) polymorphisms. Blood 89: 4219–4225. LEE, S., RUSSO, D., REDMAN, CM. (2000) The Kell blood group system: Kell and XK membrane proteins. Semin. Hematol. 37: 113–121. RUSSO, D., REDMAN, C., LEE, S. (1998) Association of XK and Kell blood group proteins. J. Biol. Chem. 273: 13950–13956. POGO, O., CHAUDHURI, A. (2000) The Duffy protein. A malarial and chemo-kine receptor. Semin. Hematol. 37: 122–129. TOURNAMILLE, C., COLIN, Y., CARTRON, J. P., LE VAN KIM, C. (1995) Disruption of a GATA motif in the Duffy gene promoter abolishes erythroid gene expression in Duffy-negative individuals. Nature Genet. 10: 224–228. IWAMOTO, S., OMI, T., KAJII, E., IKEMOTO, S. (1995) Genomic organization of the glycoprotein D gene: Duffy blood group Fya/Fyb alloantigen system is associated with a polymorphism at the 44-amino acid residue. Blood 85: 622–626. CHAUDHURI, A., POLYAKOVA, J., ZBRZEZNA, V., POGO, A. O. (1995) The coding sequence of Duffy blood group gene in humans and simians: Restriction fragment length polymorphism, antibody and malarial parasite specificities, and expression in nonerythroid tissues in Duffy-negative individuals. Blood 85: 615–621. OLIVES, B., NEAU, P., BAILLY, P., HEDIGER, M. A., ROUSSELET, G., CARTRON, J. P., RIPOCHE, P. (1994) Cloning and functional expression of a urea transporter from human bone marrow cells. J. Biol. Chem. 269: 31649–31652.
102 STORRY, J. R. (1992) The LW blood group system. Immunohematology 8: 87. 103 HAYFLICK, J. S., KILGANNON, P., GALLATIN, W. M. (1998) The intercellular adhesion molecule (ICAM) family of proteins. New members and novel function. Immunol. Res. 17: 313–327. 104 HERMAND, P., LE PENNEC, P. Y., ROUGER, P., CARTRON, J. P., BAILLY, P. (1996) Characterization of the gene encoding the human LW blood group protein in LW+ LW–phenotypes. Blood 87: 2962–2967. 105 HAKOMORI, S. (1981) Blood group ABH and Ii antigens of human erythrocytes: Chemistry, polymorphism, and their developmental change. Semin. Hematol. 18: 39–62. 106 BIERHUIZEN, M. F., MAEMURA, K., KUDO, S., FUKUDA, M. (1995) Genomic organization of core 2 and I branching β-1,6-N-acetylglucosaminyltransferases. Implication for evolution of the β-1, 6-N-acetylglucosaminyltransferase gene family. Glycobiology 5: 417–425. 107 BRUCE, L. J., ANSTEE, D. J., SPRING, F. A., TANNER, M. J. (1994) Band 3 Memphis variant II: Altered stilbene disulfonate binding and the Diego (Dia ) blood group antigen are associated with the human erythrocyte band 3 mutation Pro54 → Leu. J. Biol. Chem. 269: 16155–16158. 108 BRUCE, L. J., RING, S. M., ANSTEE, D. J., REID, M. E., WILKINSON, S., TANNER, M. J. (1995) Changes in the blood group Wright antigens are associated with a mutation at amino acid 658 in human erythrocyte band 3: A site of interaction between band 3 and glycophorin A under certain conditions. Blood 85: 541–547. 109 BESSLER, M., ATKINSON, J. P. (2001) Paroxysmal nocturnal hemoglobinuria, in: Molecular Basis of Blood Disease (STAMATOYANNOPOULOS, G., MAJERUS, P.W., PERLMUTTER, R.M., VARMUS, H., eds.), 3rd ed. McGraw-Hill, New York, pp. 564–577. 110 ROSSE, W. F., (2000) Paroxysmal nocturnal hemoglobinuria, in: Hematology. Basic Principles and Practice (HOFFMAN, R., BENZ, E.J. Jr., SHATTIL, S.J., FURIE, B., COHEN, H.J., SILBERSTEIN,
31
32 Integral Proteins of the Erythrocyte Membrane
L. E., MCGLAVE, P., eds.), 3rd ed. Churchill Livingstone, New York, pp. 331–342. 111 FERGUSON, M. A. (1992) Glycosyl-phosphatidylinositol membrane anchor: The tale of a tail. Biochem. Soc. Trans. 20: 243–256. 112 VIDUGIRIENE, J., MENON, A. K. (1995) Biosynthesis of glycosyl-phosphatidyl-inositol anchors. Methods Enzymol. 250: 513–535. 113 MIYATA, T., TAKEDA, J., IIDA, Y., YAMADA, N., INOUE, N., TAKAHASHI, M., MAEDA, K., KITANI, T., KINOSHITA, T. (1993) The
cloning of PIG-A, a component in the early step of GPI-anchor biosynthesis. Science 259: 1318–1320. 114 TAKEDA, J., MIYATA, T., KAWAGOE, K., IIDA, Y., ENDO, Y., FUJITA, T., TAKAHASHI, M., KITANI, T., KINOSHITA, T. (1993) Deficiency of the GPI anchor caused by a somatic mutation of the PIG-A gene in paroxysmal nocturnal haemoglobinuria. Cell 73: 703–711. 115 OMINE, M., KINOSHITA, T., (eds.) (2003) Paroxysmal Nocturnal Hemoglobinuria and Related Disorders. Molecular Aspects of Pathogenesis. Springer, Tokyo, pp. 1–285.
1
Retinoid Receptors RAR and RXR: Structure and Function Alexander Mata de Urquiza, and Thomas Perlmann
Ludwig Institute for Cancer Research, Karolinska Institute, Stockholm, Sweden
Originally published in: Cellular Proteins and Their Fatty Acids in Health and Disease. Edited by Asim K. Duttaroy and Friedrich Spener. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30437-0
1 Retinoids in Development
Vitamin A and its biologically active metabolites, the retinoids, play essential roles in embryonic development, differentiation, and maintenance of homeostasis in the adult organism [1–3]. Adult animals suffering from vitamin A deficiency (VAD) display a number of abnormalities, including impaired vision, fertility, immune response, and epithelial differentiation. Furthermore, altering the levels of retinoic acid (RA) during embryogenesis leads to phenotypical malformations affecting for example cranofacial, CNS, limb, and heart development (reviewed in Refs [4] and [5]), underscoring the importance of correct vitamin A levels during gestation. Some of these abnormalities are thought to arise due to dysregulation of Hox gene expression. Hox genes encode a family of homeobox-containing transcription factors that play crucial roles in informing cells of their position along the anterio-posterior axis (reviewed in Refs [3] and [5]). The overlapping expression domains of these genes along the anterio-posterior axis are thought to specify the positional identity of cells along this axis, thereby enabling them to adopt a correct developmental fate. Interestingly, several Hox genes contain RA response elements (RAREs; see below) in their promoters, indicating that retinoids are involved in regulating their expression. Accordingly, embryos that develop in the absence of RA or in the presence of excess RA, display altered Hox gene expression [5, 6]. The distribution of retinoids in vivo has been analyzed using transgenic mice that express a retinoic acid-inducible reporter gene [7–14]. The results suggest that the brachial and lumbar regions of the developing spinal cord are “hot spots” of RA synthesis. In addition, reporter gene expression has been detected in the developing forebrain, forelimbs, and optic cup, as well as at the midbrain/hindbrain boundary [7, 9–11]. A better understanding of how this highly localized synthesis of Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Retinoid Receptors RAR and RXR: Structure and Function
Fig. 1 (A) Metabolic steps leading from retinol to retinoic acid and one of its oxidation products. The initial step is ratelimiting, and is the only reversible step in this process. 9-cis Retinoic acid (see B) is presumably formed via spontaneous iso-
merization from the all-trans form. (B) Structures of three natural RXR agonists, including 9-cis retinoic acid (left), the fatty acid docosahexaenoic acid (middle), and the chlorophyll metabolite phytanic acid (right) (see text for details).
RA is achieved in tissues has been gained with the cloning of enzymes responsible for RA production and degradation. Dietary all-trans retinol (atROL) is converted to all-trans retinal (atRAL), a reaction catalyzed by one of various ROL dehydrogenases (ROLDH). In the following step, atRAL is converted into atRA by a RAL dehydrogenase (RALDH) (reviewed in Refs [15] and [16]) (see Fig. 1A). 9cRA has been suggested to form by spontaneous isomerization from the all-trans form. Although atRA and 9cRA are the two best-characterized retinoids, other bioactive forms also exist in vivo, for example 3,4-didehydroRA and 4-oxo-RA [17, 18]. 4-oxo-RA is one of several breakdown products of atRA in catabolic reactions catalyzed by a family of cytochrome P450 enzymes, the CYP26 family [19–21], involved in attenuating the RA signal (see Fig. 1A). In early development, most embryonal RA is thought to be synthesized by RALDH2 [22–26]. Intriguingly, a comparison of the expression patterns of RALDH2 and CYP26 suggests that both enzymes are present in complementary and nonoverlapping domains, thereby creating areas differing in their RA levels [19, 24, 25]. CYP26 expression is restricted to the anterior- and posterior-most structures of the embryo, including the presumptive head and tailbud regions, keeping RA levels low. RALDH2 on the other hand is expressed in more central areas of the embryo, from the posterior end of the developing hindbrain to the anterior regions of the developing tail, and is responsible for RA synthesis in this region. Targeted
2 Retinoid Receptors Transduce Retinoic Acid Signals 3
disruption of the CYP26 gene gives rise to embryos where anterior brain structures are transformed to more posterior ones, resembling the morphogenetic defects generated by excess RA administration [5, 27, 28]. Conversely, disruption of the RALDH2 gene produces embryos with phenotypes indicating an expansion of anterior brain structures, reminiscent of the defects generated by vitamin A deficiency [6, 29, 30]. Taken together, these results suggest that disturbances in the graded synthesis of RA leads to posteriorization (excess RA) or anteriorization (shortage of RA) of structures in the developing head, largely due to misexpression of Hox genes. In an elegant study, Koide and co-workers show that active repression of RA target genes needs to be maintained for correct development of anterior regions [31]. This suggests that the mere absence of RA in anterior tissues is not sufficient for normal development, but that active repression of RA-responsive genes per se is essential for correct anterior patterning. Interestingly, retinoid receptors are directly involved in this repression by a mechanism that will be further discussed below (see “co-repressors” below). Within cells, cellular retinol binding proteins I and II (CRBP-I and -II) and cellular retinoic acid binding proteins I and II (CRABP-I and -II), act as cytoplasmic carriers for ROL and RA, respectively. Based on expression studies, it was thought that CRBPs function in protecting retinol from the cellular environment and presenting ROL to RA synthesizing enzymes. In contrast, CRABPs were suggested to be important in sequestering and promoting breakdown of excess RA in embryonic regions sensitive to the teratogenic effects of retinoids (see, for example, Refs [32–34]). However, animals carrying null mutations in both CRABP-I and -II are indistinguishable from wild-type littermates, suggesting that CRABPs are dispensable for normal embryonal development and adult physiology [35]. On the other hand, although not essential for embryonal development, adult mice lacking CRBP-I show decreased liver retinyl ester storage and predisposition to vitamin A deficiency [36], suggesting an important role for CBRP-I during vitamin A-limiting conditions. Interestingly, two CRBP-III genes have recently been identified [37, 38], showing partially overlapping patterns of expression, and it will be interesting to see what roles these proteins might play in vivo.
2 Retinoid Receptors Transduce Retinoic Acid Signals
The cloning and characterization of the nuclear hormone receptors (NRs) that bind and transduce the retinoid signal represent landmarks in our understanding of retinoid physiology (see references in Ref [39]). Two families of retinoid receptors exist, the retinoic acid receptors (RARs) and the retinoid X receptors (RXRs), each present in three isotypes, α, β, and γ (reviewed in Refs [39–41]). In addition, each RAR and RXR isotype exists in several isoforms (for example RARα1 and α2) due to differential promoter usage. Expression studies of RARα, β, and γ show that RARα is ubiquitously expressed, while RARβ and RARγ show a more temporal and spatial restriction, often in a non-overlapping fashion [32–34, 42, 43]. RXRs
4 Retinoid Receptors RAR and RXR: Structure and Function
are similarly differentially expressed both during development and in adult animals [42–45]. RXRβ is expressed in a general fashion, while RXRα is abundant in tissues associated with lipid metabolism. RXRγ expression is highly restricted to a few distinct areas, including the developing striatum and spinal motor neurons (where it is co-expressed with RARβ). A strong effort has been made during the last few years to understand the specific functions of the different retinoid receptors during development. Despite the unique distributions of the different receptors, genetic ablation studies have revealed a surprising redundancy in function among the different members of each receptor subtype. The results of single and double RAR or RXR mutants, as well as compound RAR/RXR mutants, have been extensively reviewed [41, 46]. Taken together, the results suggest that RARs and RXRs are important for correct regulation of numerous developmental processes. Although many known VAD phenotypes are not apparent in single RAR−/− mice, simultaneous ablation of two RAR genes recapitulates most aspects of VAD. Additionally, compound mutants of RXRα and RARα, β, or γ together reproduce almost the entire VAD phenotype spectrum, supporting the notion that RAR/RXR heterodimers are the active units for retinoid signaling in vivo (summarized in Ref [46]). Interestingly, a number of malformations not described in VAD studies are also observed, either suggesting roles for unliganded receptors or reflecting difficulties in achieving complete VAD by dietary deprivation.
3 Retinoid Receptors Belong to the Nuclear Hormone Receptor Family
NRs comprise a large and evolutionary well conserved family of transcription factors found in organisms as diverse as nematodes, flies, and mammals (reviewed in Refs [47–49]). NRs are thought to function as ligand-activated transcription factors, exerting widely different biological functions by regulating target gene expression positively and/or negatively, and include the receptors for certain small, lipophilic molecules. RARs bind both all-trans and 9-cis retinoic acids, whereas RXRs only bind 9-cRA (see Ref. [50] and references therein). Retinoid receptors activate transcription by recognizing and binding consensus sequences known as RA response elements (RAREs) in the promoters of target genes (see below). RAR binds DNA as a heterodimer with RXR, while RXR also has the ability to bind DNA as a homodimer. Additionally, RXR forms heterodimers with a number of other NRs, including the receptors for thyroid hormone (TR) and vitamin D3 (VDR) [50], thereby coupling retinoid signaling to a multitude of other cellular signaling pathways.
4 Nuclear Receptors Share a Common Structure
As mentioned above, retinoid receptors belong to the NR family of transcription factors. With only a few exceptions, all NRs share a common structure of functionally
4 Nuclear Receptors Share a Common Structure 5
Fig. 2 Nuclear receptor domains and functions. (A) NRs consist of defined domains, with variable degrees of conservation within the NR superfamily. The NTD, which is the most variable, has a ligand-independent trans-activation function (AF-1) shown to be important in basal transcription by some receptors. The more conserved DBD and the LBD are responsible for DNA and ligand binding, respectively. In addition, both domains play important roles in dimerization. A ligand-depen-dent transactivation function (AF-2) is localized to the LBD. NTD,
N-terminal domain; DBD, DNA binding domain; LBD, ligand binding domain; AF-1, activation function 1; AF-2; activation function 2. (B) Structures of the RXR DBD homodimeric complex (left) and of the RXRRAR DBD heterodimeric complex (right) on DR1 DNA response elements. Contacts between receptor partners form over the minor groove of the DNA helix, with additional protein-DNA contacts stabilizing the complex. DR1, direct repeat 1. Modified from Ref. [52].
separable domains, including an N-terminal domain (NTD), a central DNA binding domain (DBD), and a C-terminal ligand binding domain (LBD) (Fig. 2A). The highly conserved DBD is responsible for recognizing and binding to specific DNA sequences in the promoters of target genes, and is also involved in dimerization between receptors. The LBD, besides binding ligand, plays an essential role in the initial interaction between receptor heterodimers, as well as in ligand-dependent transactivation. The LBD also harbors a ligand-dependent activation function (AF-2), mediating ligand-dependent interactions with so-called co-activators (see below). The NTD is less conserved between different NRs, both in amino acid composition and length, and contains a ligand-independent activation function (AF-1), important in the basal transcriptional activity of some receptors. A hallmark of the NR family is the well-conserved DNA binding region. This cysteine-rich zinc binding domain, has been shown to contain structural features important for correct DNA target sequence recognition and binding, as well as dimerization (Fig. 2B) (see Ref. [50] and references therein). Most NRs bind socalled hormone response elements (HREs), arranged as one or two half-sites of the consensus nucleotide sequence 5 -AGGTCA-3 . Half-sites can be arranged as
6 Retinoid Receptors RAR and RXR: Structure and Function
direct- (DR), inverted- (IR), or everted repeats (ER), with a varying number of nucleotides separating the repeats. Studies of receptor dimers bound to DNA have shown that RXR and its heterodimer partners bind response elements arranged as two direct repeats spaced by one to five nucleotides (DR1 to DR5) ([51]; reviewed in Ref. [52]). Depending on the spacing between the two repeats, different RXR heterodimers will bind and activate transcription. It seems that both heterodimer partners cooperate to ensure correct binding specificity and affinity by making partner-specific protein contacts that stabilize the complex only on a correctly spaced DR motif. Additionally, RXR has the ability to switch its polarity on DR elements, binding either the upstream or the downstream half-site [52]. Importantly, the RXR heterodimer partner not only influences the response element of choice, but also the ability of RXR to become activated by ligand. For example, heterodimers between RXR and the peroxisome proliferator activated receptors (PPARs) are induced by both PPAR and RXR ligands, therefore said to be permissive to RXR activation. RAR-RXR heterodimers, on the other hand, are non-permissive in that they require binding of RAR ligands in order to become responsive to RXR ligands [53–55]. It has been suggested that the initial agonistinduced interaction between RAR and co-activators (see below) is necessary to induce the correct structural rearrangements that allow RXR to become receptive to its ligand [56]. However, several groups have obtained results that are not easily explained by this model. For example, RAR–RXR heterodimers can become responsive to an RXR ligand even after addition of an RAR antagonist, a situation when co-activators are not recruited by RAR [57, 58]. It is therefore still somewhat unclear why some heterodimers are permissive while others are not. An additional dimerization interface important for the initial interaction between NRs is found in the neighboring ligand binding domain (see below). This region has been shown to mediate cooperative binding on all three classes of DNA repeats (direct, inverted, and everted) [59, 60]. In receptor heterodimers, the second dimerization region formed within the DBD restricts receptors to direct repeat elements.
5 The LBD and Ligand-dependent Transactivation
The crystal structure of the LBD of several NRs has been solved, including unliganded (apo) and liganded (holo) RXRα, as well as holo RARγ [61, 62] (reviewed in Ref. [63]). The results reveal a common fold, consisting of 12 α-helices (H1H12) and one β-turn, arranged in a three-layered antiparallel “sandwich” with a hydrophobic core (Fig. 3A). The center of this sandwich contains the ligand binding pocket, lined mostly by hydrophobic and polar residues. As mentioned, the LBD contains the ligand-dependent activation function AF-2, and residues critical for its function have been mapped to helix 12 (the AF-2 core) [64]. Crystallographic studies have provided a model that accounts for the structural transitions involved in ligand activation of NRs (reviewed in Refs [63, 65]). In this so-called “mouse-trap” model, the initial interaction between ligand and LBD
5 The LBD and Ligand-dependent Transactivation
Fig. 3 Schematic drawing of the LBD structure of apo RAR" (A), holo RAR( bound to all-trans retinoic acid (atRA) (B), and antag onist-bound estrogen receptor (ER) " (C). atRA is shown in “stick” form in the center of the LBD in (B), and the ER antagonist,
raloxifene, is depicted as a bent cylinder in (C). Note the different positions of helix 12 (shown in black) in each situation. The "-helices of the LBD are represented as numbered rods. Modified from Ref. [65].
leads to structural changes that trap the ligand within the LBD in an induced fit mechanism. These transitions are subsequently followed by a repositioning of helices 3 and 4, which together with helix 11 now move to form a hydrophobic cleft on the surface of the LBD. The most profound conformational change involves helix 12 (H12) itself (Fig. 3). In the absence of ligand, H12 protrudes from the LBD and is exposed to solvent, whereas in the holo-receptor, it rotates and folds back towards the LBD, thereby compacting its structure. In its final position, H12 seals the pocket, trapping the ligand inside (compare Fig. 3A and B). In addition, in this new conformation, H12 has a major role in positioning so-called co-activator proteins in the hydrophobic cleft formed by residues on helices 3, 4, and 11, a process important in transcriptional activation (discussed below). Once transcription of target genes has been initiated, the hormone response needs to be attenuated in order to control the transcriptional output. Cells can regulate the activity of proteins involved in transcriptional activation by affecting their stability. Several reports show that retinoid receptors are targeted for ubiquitin-mediated degradation upon ligand binding [66–69]. Although somewhat conflicting, the results suggest that only active receptors are degraded. Interestingly, a component of the proteasome that mediates the degradation of ubiquitinated proteins, SUG1, has also been shown to function as a co-activator for certain NRs, including RAR [70], by interacting with the AF-2 helix. This could provide a functional link between ligand-induced transcriptional activation and subsequent degradation of NRs.
7
8 Retinoid Receptors RAR and RXR: Structure and Function
6 Cross-talk
In addition to acting as ligand-dependent transcription factors, some NRs also become activated or inactivated in the absence of ligand, for example by phosphorylation of the receptor itself or of its co-regulatory proteins. This is especially the case for the steroid hormone receptors (reviewed in Ref [71]), but also the activity of retinoid receptors can be modulated through phosphorylation, both in the presence and absence of ligand (see, for example, Refs [72–74]). Phosphorylation of a serine residue in the RAR DBD has for example been shown to lower the interaction between RAR and RXR, thereby decreasing transcriptional activation, and could be a way of regulating the activity of receptors in vivo [75]. On the other hand, phosphorylation of serine residues in the N-terminal AF-1 of RAR has been shown to have a positive effect on ligand-independent transcription [74, 76]. Perhaps the best established example of cross-talk is the inhibition of AP-1 transcriptional activity by several NRs including retinoid receptors. These modulatory activities are believed to be critical for the common antiproliferative effects of retinoids and other NR ligands (reviewed in Ref. [77]).
7 Co-activators
NRs require accessory factors, so-called co-activators, in order for ligand-dependent activation of target genes to occur. The first co-activators to be described interacted with the estrogen receptor (ER) in the presence of ligand [78]. One such protein, RIP160 (for receptor interacting protein 160), later shown to be identical to SRC-1 (steroid receptor co-activator-1) [79], interacted with several NRs in a hormonedependent manner. Since then, an ever-increasing number of co-activators have been cloned and characterized (reviewed in Ref. [80]). The best-characterized co-activators belong to one of three classes: the p160 family, including SRC-1/NCoA1, GRIP-1/TIF2, and ACTR/pCIP/RAC3/AIB-1; the homologous CBP and p300 co-activators; and the recently isolated TRAP/DRIP complexes (see Ref. [81] for abbreviations and references) (Fig. 4). Members of the p160 family as well as CBP/p300 and p/CAF (CBP associated f actor) show intrinsic histone acetyltransferase (HAT) activity [82–86], suggesting that co-activators may play direct roles in chromatin remodeling at promoters by acetylating histone proteins [87]. The TRAP/DRIP complexes were isolated by their ability to interact with ligand-bound TR and VDR, respectively. They form large multiprotein complexes that share common subunits [88, 89], and are thought to play an important role as bridging molecules between DNA-bound NRs and the basal transcription machinery [90]. Co-activators interact with NRs via one or several leucine-rich β-helices, also known as NR boxes, with the consensus sequence LxxLL (where L corresponds to leucine and x is any amino acid residue) [91, 92]. Structural and functional studies indicate that the co-activator LxxLL helix is accommodated along the hydrophobic
7 Co-activators
Fig. 4 The different steps and proteins involved in NR transcriptional activation. DNA-bound NRs (here exemplified as an RAR-RXR heterodimer) inhibit transcriptional activation by recruiting corepressors with histone deace-tylase activity, keeping promoter DNA packed into histones, i.e. in a silent form. Upon exposure to ligand, co-repressors are released and co-activators of the p160 family and CBP/p300 are recruited, either before or together with complexes like the TRAP/DRIP proteins. The
ATP-dependent activity of the SWI/SNF complex, initially acts to unwind DNA at the promoter. Histone acetylation by coactivators allows stronger interaction between the NRs and DNA, ultimately leading to recruitment of RNA Pol II and other accessory factors. These proteins recognize and bind DNA sequences of the TATA box at the transcriptional initiation site. RNA Pol II is finally released from the promoter and initiates gene transcription. See text for abbreviations and further details.
cleft on the surface of the receptor LBD that forms upon ligand binding [93–96]. Two strictly conserved amino acid residues on the receptor form a “charge clamp” that correctly places the co-activator LxxLL motif on the LBD, leading to transcriptional activation of the receptor. In the crystal structures of ER and RAR bound to antagonists, the AF-2 helix of the receptor is not repositioned correctly as in the ligand-bound receptors. Instead, it is translocated to overlap the co-activator interaction site, thereby preventing co-activator binding [96–99]. This in turn would facilitate the recruitment of another group of regulatory factors, corepres-sors, explaining the molecular mechanism behind antagonistic repression of NRs. The development of novel techniques for studying protein–chromatin interactions at specific promoters has yielded exciting new insights explaining the pro-
9
10 Retinoid Receptors RAR and RXR: Structure and Function
cess of transcriptional initiation (reviewed in Ref [100]). The results suggest that ligand-bound receptors continuously cycle on and off target promoters, transiently interacting with response elements on DNA, recruiting cofactors and RNA polymerase II to initiate transcription, and subsequently dissociating from DNA again [101]. Furthermore, it seems that cofactors with HAT activity not only acetylate histones at the promoter itself, but also further away on the DNA template to allow better access of proteins to promoter DNA. The exact sequence of events involved in transcriptional initiation is however unclear, and both a stepwise or concerted recruitment of p160 and TRAP/DRIP co-activators to the initiation complex has been suggested [102, 103] (see Fig. 4). 8 Co-repressors
Certain NRs, such as RAR and TR, repress basal transcription in the absence of ligand by binding the promoters of target genes, a process known as silencing [104]. The molecular mechanisms behind this phenomenon involves two related corepressor proteins, N-CoR (nuclear receptor co-repressor) and SMRT (silencing mediator of retinoid and thyroid hormone receptor), which both interact with RAR and TR in a ligand-independent manner [105, 106]. It has recently been shown that corepressors bind a region on the surface of the receptor LBD that overlaps the co-activator-interacting site. N-CoR and SMRT contain α-helical structures similar in sequence to the LxxLL helix of co-activators, with the consensus LxxI/HIxxxI/L (where L is leucine, I is isoleucine, H is histidine, and x is any amino acid residue) [107–109]. Thus, due to sequence similarities, corepressors are able to bind the same region on the NR LBD as co-activators, thereby masking the co-activator site and repressing transcription. Structural transitions in the LBD that occur upon ligand binding move the corepressor and allows AF-2 helix repositioning, which further displaces the corepressor. The extended corepressor motif no longer fits in the cavity due to steric hindrance by the AF-2 helix, and instead the co-activator gains entry to the site. Analogous to co-activators forming large multiprotein complexes, N-CoR and SMRT also interact with other proteins to repress transcription. Histone deacetylases (HDACs 1 and 2) are bridged to unliganded NRs via N-CoR/SMRT and mSin3 co-repressors, thereby mediating silencing (reviewed in Ref. [80]) (Fig. 4). Deacetylation of core histones by HDACs is recognized as a mechanism for keeping chromosome domains transcriptionally silent, and would explain how NRs mediate repression. 9 Nuclear Receptors from an Evolutionary Perspective
Based upon the homologies within the superfamily, NRs have been divided into six subfamilies [110]. The ability to bind ligand and the structure of the ligand in
10 Fatty acids as Endogenous Ligands for RXR 11
several cases seems unrelated to which subfamily the respective receptors belong. For example, RAR and TR, which bind two unrelated ligands, are nonetheless more related in sequence than, for example, RAR and RXR, which both bind retinoic acid isomers, suggesting that ligand-binding ability is independent from the evolutionary origin [111]. The fact that orphan receptors are present in all subfamilies, whereas liganded receptors are not, further suggests that the ancestral receptor did not have a ligand. Interestingly, several recent reports have revealed the unexpected presence of fatty acids/lipids in the ligand-binding pockets of several NR crystals, including RXR, the Drosophila RXR ortholog Ultraspiracle, and retinoic acid-related orphan receptor β (RORβ) [99, 112–114]. Apparently, these LBDs bind lipids derived from the bacterial strains used to express the proteins, and it seems likely that such binding is a prerequisite for a stable LBD conformation. These findings suggest an intriguing model explaining how the ligand-activated receptors of today have evolved. Accordingly, the primordial NR may have used a ubiquitous lipid as a structural element/cofactor to stabilize the LBD in its active state. In this view, the primordial “lipid binding domain” was permissive to those evolutionary adaptations necessary for a regulated domain to evolve, i.e. a structure that is regulated by bona fide ligand interactions. Ligand identity can thus be viewed as the result of an independent, convergent evolutionary process that took place in several different receptors, unrelated to their evolutionary origin within the NR superfamily.
10 Fatty acids as Endogenous Ligands for RXR
The signaling status of RXR in vivo is still a matter of debate, as its proposed natural ligand, 9-cis RA, has proved difficult to identify in mammalian tissue [115]. Nonetheless, several reports show that simultaneous addition of both RXR- and RAR-specific ligands often leads to synergistic biological effects (see, for example, Refs [57, 116, 117]). Therefore, it seems likely that ligand-induced activation of RXR does occur, a conclusion that has been corroborated by experiments in transgenic mice [14, 118]. Recent experiments have identified novel endogenous RXR ligands and expanded the perspectives on RXR functions in vivo. Interestingly, recent data suggests that the chlorophyll metabolite phytanic acid can bind and activate RXR [119, 120] (Fig. 1B). Although phytanic acid is a low-affinity ligand and high serum levels would be required to activate RXR, significant activation might occur in patients with certain metabolic disorders such as Refsum’s disease [119, 120]. Docosahexaenoic acid (DHA or C22:6 cis 4, 7, 10, 13, 16, 19) (Fig. 1B), a long-chain polyunsaturated fatty acid (PUFA), is an RXR ligand, does not activate RAR, TR, or VDR, and promotes the interaction between RXR and the co-activator SRC-1 [121]. Interestingly, other PUFAs including docosatetraenoic (C22:4 cis 7, 10, 13, 16), arachidonic (C20:4 cis 5, 8, 11, 14) and oleic (C18:1 cis 9) acids, also activate
12 Retinoid Receptors RAR and RXR: Structure and Function
RXR, although DHA is more potent [121]. Unlike 9cRA, which binds and activates RXR with high affinity, DHA is a low-affinity ligand for this receptor, reaching half-maximal activation at a concentration of about 40-50 µM (our unpublished observations). However, while 9cRA has been difficult to identify in vivo, DHA, which accumulates in the mammalian CNS during late gestation and early postnatal development, constitutes between 30 and 50% of total membrane-bound fatty acids in the postnatal brain (reviewed in Refs [122] and [123]). Therefore, it seems likely that sufficiently high intracellular concentrations of free DHA may exist in neurons. Little is known about the mechanisms of DHA release from its phospholipid compartment, although some reports implicate phospholipases A2 and C in this process (see, for example, Ref [124]). This mobilization could potentially supply enough free DHA to compensate for its rather low affinity towards RXR. Deficiencies in DHA have been shown to cause neurological abnormalities, impaired learning abilities and growth retardation (see, for example, Refs [125] and [126]). Memory deficits have recently been shown to improve upon either RA or DHA treatment, suggesting functional overlap between the two pathways [127–129]. Interestingly, the analyses of gene targeted mice lacking one or several genes encoding RXR isotypes have demonstrated overlapping functions with DHA, for example in the process of memory formation [130], vision [131], and reproduction [132]. Moreover, several RXR heterodimerization partners, such as PPARs, liver X receptors and farnesoid X receptor, contribute to energy and nutritional homeostasis in response to their respective ligands (reviewed in Ref. [133]). DHA could thus play an important modulatory role in these processes by binding and influencing the RXR subunit of such heterodimers. Structural analysis of the RXR LBD bound to DHA shows that the receptor LBD adopts the canonical conformation of a ligand bound NR, with the activation function AF-2 helix (H12) packed towards the hydrophobic groove formed on the surface of the protein [134]. The highly flexible fatty acid molecule is optimally accommodated to the ligand cavity of RXR, occupying 80% of the pocket (72% for 9cRA) and making ligand-protein contacts similar to those of 9cRA.
11 Conclusions
A few issues stand out as particularly likely to attract researchers’ attention over the next coming years. For example, although structural studies have provided valuable information as to how nuclear receptors in general, and retinoid receptors in particular, interact with each other, with DNA, and with co-regulatory proteins, most such studies have been performed on isolated domains of the receptors. Notably, crystal structures of receptors bound to DNA have been produced using only the DNA binding domain. Similarly, ligand and co-activator binding studies have focused on the ligand binding domain. Undertaking similar studies using
References
full-length receptors will undoubtedly provide additional and valuable data on how nuclear receptors perform their essential roles in vivo. Moreover, while the importance of retinoid receptors during embryonal development has been subject to detailed analysis, their roles in adult homeostasis are less clear. Gene ablation studies suggest that RXRα is the preferred partner of RARs in development in vivo [46]. Mice lacking the RXRα gene die before birth, a fact that has hampered the detailed analysis of its roles in adult homeostasis. Several studies have now employed tissue-specific knockout techniques to learn more about this important receptor (see, for example, Refs [135–137]), and recent data suggest that RXR ligands have potent effects on pathways involving other hetero-dimer partners than RAR [138]. Despite much work, little is known about retinoid receptor target genes, something that new approaches such as microarray experiments will help to address. It will be interesting to understand if isotype-specific retinoid receptor ligands induce the expression of different sets of downstream genes, perhaps by affecting the interaction between receptor and a certain co-activator or co-repressor. Perhaps the biggest mystery in retinoid action concerns the issue of specificity, i.e. how a simple small molecule such as retinoic acid can evoke such pleiotropic responses in development and adult physiology. Clearly, depending on cellular context the retinoid signal can be interpreted in many different ways. Reaching an understanding of how the diverse mechanisms discussed in this article can be differentially integrated to achieve the appropriate cellular responses should remain a major challenge in future retinoid receptor research. Acknowledgements
This work was supported by the G¨oran Gustafsson Foundation. References 1 GIGUE` RE, V. Endocr. Rev. 1994, 15, 61– 79. 2 GUDAS, L. J., SPORN, M. B., ROBERTS, A. B. In: SPORN, M. B., ROBERTS, A. B., GOODMAN, D. S. eds. The Retinoids: Biology, Chemistry, and Medicine, 2nd edn. New York: Raven Press, 1994, pp. 443–520. 3 HOFMANN, C., EICHELE, G. In: SPORN, M. B., ROBERTS, A. B., GOODMAN, D. S. eds. The Retinoids: Biology, Chemistry, and Medicine, 2nd edn. New York: Raven Press, 1994, pp. 387–441. 4 MADEN, M. Proc. Nutr. Soc. 2000, 59, 65–73. 5 ROSS, S. A., MCCAFFERY, P. J., DRAGER, U. C., De LUCA, L. M. Physiol Rev. 2000, 80, 1021–1054.
6 MADEN, M., GALE, E., KOSTETSKII, I., ZILE, M. Curr. Biol. 1996, 6, 417– 426. 7 BALKAN, W., COLBERT, M., BOCK, C., LINNEY, E. Proc. Natl Acad. Sci. USA 1992, 89, 3347–3351. 8 COLBERT, M. C., LINNEY, E., LAMANTIA, A. S. Proc. Natl Acad. Sci. USA 1993, 90, 6572–6576. 9 LAMANTIA, A. S., COLBERT, M. C., LINNEY, E. Neuron 1993, 10, 1035–1048. 10 MATA DE URQUIZA, A., SOLOMIN, L., PERLMANN, T. Proc. Natl Acad. Sci. USA 1999, 96, 13270–13275. 11 MENDELSOHN, C., RUBERTE, E., LEMEUR, M., MORRISS-KAY, G., CHAMBON, P. Development 1991, 113, 723–734.
13
14 Retinoid Receptors RAR and RXR: Structure and Function
12 REYNOLDS, K., MEZEY, E., ZIMMER, A. Mech. Dev. 1991, 36, 15–29. 13 ROSSANT, J., ZIRNGIBL, R., CADO, D., SHAGO, M., GIGUERE, V. Genes Dev. 1991, 5, 1333–1344. 14 SOLOMIN, L., JOHANSSON, C. B., ZETTERSTRO¨ M, R. H., BISSONNETTE, R. P., HEYMAN, R. A., OLSON, L., LENDAHL, U., FRISE´ N, J., PERLMANN, T. Nature 1998, 395, 398–402. 15 NAPOLI, J. L. FASEB J. 1996, 10, 993–1001. 16 DUESTER, G. Chem. Biol. Interact. 2001, 130–132, 469–480. 17 THALLER, C., EICHELE, G. Nature 1990, 345, 815–819. 18 PIJNAPPEL, W. W., HENDRIKS, H. F., FOLKERS, G. E., van den BRINK, C. E., DEKKER, E. J., EDELENBOSCH, C., van der SAAG, P. T., DURSTON, A. J. Nature 1993, 366, 340–344. 19 FUJII, H., SATO, T., KANEKO, S., GOTOH, O., FUJII-KURIYAMA, Y., OSAWA, K., KATO, S., HAMADA, H. EMBO J. 1997, 16, 4163–4173. 20 WHITE, J. A., BECKETT-JONES, B., GUO, Y. D., DILWORTH, F. J., BONASORO, J., JONES, G., PETKOVICH, M. J. Biol. Chem. 1997, 272, 18538–18541. 21 WHITE, J. A., RAMSHAW, H., TAIMI, M., STANGLE, W., ZHANG, A., EVERINGHAM, S., CREIGHTON, S., TAM, S. P., JONES, G., PETKOVICH, M. Proc. Natl Acad. Sci. USA 2000, 97, 6403–6408. 22 BERGGREN, K., MCCAFFERY, P., DRAGER, U., FOREHAND, C. J. Dev. Biol. 1999, 210, 288–304. 23 HASELBECK, R. J., HOFFMANN, I., DUESTER, G. Dev. Genet. 1999, 25, 353–364. 24 NIEDERREITHER, K., MCCAFFERY, P., DRAGER, U. C., CHAMBON, P., DOLLE, P. Mech. Dev. 1997, 62, 67–78. 25 SWINDELL, E. C., THALLER, C., SOCKANATHAN, S., PETKOVICH, M., JESSELL, T. M., EICHELE, G. Dev. Biol. 1999, 216, 282–296. 26 ZHAO, D., MCCAFFERY, P., IVINS, K. J., NEVE, R. L., HOGAN, P., CHIN, W. W., DRAGER, U. C. Eur. J. Biochem. 1996, 240, 15–22. 27 ABU-ABED, S., DOLLE, P., METZGER, D., BECKETT, B., CHAMBON, P., PETKOVICH, M. Genes Dev. 2001, 15, 226–240.
28 SAKAI, Y., MENO, C., FUJII, H., NISHINO, J., SHIRATORI, H., SAIJOH, Y., ROSSANT, J., HAMADA, H. Genes Dev. 2001, 15, 213–225. 29 NIEDERREITHER, K., SUBBARAYAN, V., DOLLE, P., CHAMBON, P. Nature Genet. 1999, 21, 444–448. 30 WHITE, J. C., SHANKAR, V. N., HIGHLAND, M., EPSTEIN, M. L., DELUCA, H. F., CLAGETT-DAME, M. Proc. Natl Acad. Sci. USA 1998, 95, 13459–13464. 31 KOIDE, T., DOWNES, M., CHANDRARATNA, R. A., BLUMBERG, B., UMESONO, K. Genes Dev. 2001, 15, 2111–2121. 32 DOLLE, P., RUBERTE, E., LEROY, P., MORRISS-KAY, G., CHAMBON, P. Development 1990, 110, 1133–1151. 33 RUBERTE, E., DOLLE, P., CHAMBON, P., MORRISS-KAY, G. Development 1991, 111, 45–60. 34 RUBERTE, E., FRIEDERICH, V., CHAMBON, P., MORRISS-KAY, G. Development 1993, 118, 267–282. 35 LAMPRON, C., ROCHETTE-EGLY, C., GORRY, P., DOLLE, P., MARK, M., LUFKIN, T., LE-MEUR, M., CHAMBON, P. Development 1995, 121, 539–548. 36 GHYSELINCK, N. B., BAVIK, C., SAPIN, V., MARK, M., BONNIER, D., HINDELANG, C., DIERICH, A., NILSSON, C. B., HAKANSSON, H., SAUVANT, P., AZAIS-BRAESCO, V., FRASSON, M., PICAUD, S., CHAMBON, P. EMBO J. 1999, 18, 4903–4914. 37 FOLLI, C., CALDERONE, V., OTTONELLO, S., BOLCHI, A., ZANOTTI, G., STOPPINI, M., BERNI, R. Proc. Natl Acad. Sci. USA 2001, 98, 3710–3715. 38 VOGEL, S., MENDELSOHN, C. L., MERTZ, J. R., PIANTEDOSI, R., WALDBURGER, C., GOTTESMAN, M. E., BLANER, W. S. J. Biol. Chem. 2001, 276, 1353–1360. 39 CHAMBON, P. FASEB J. 1996, 10, 940–954. 40 MANGELSDORF, D. J., UMESONO, K., EVANS, R. M. In: SPORN, M. B., ROBERTS, A. B., GOODMAN, D. S. eds. The Retinoids: Biology, Chemistry, and Medicine, 2nd edn. New York: Raven Press, 1994, pp. 319–349. 41 KASTNER, P., MARK, M., CHAMBON, P. Cell 1995, 83, 859–869. 42 KREZEL, W., KASTNER, P., CHAMBON, P. Neuroscience 1999, 89, 1291–1300.
References 43 ZETTERSTRO¨ M, R. H., LINQUIST, E., MATA DE URQUIZA, A., TOMAC, A., ERIKSSON, U., PERLMANN, T., OLSON, L. Eur. J. Neurosci. 1999, 11, 407–416. 44 MANGELSDORF, D. J., BORGMEYER, U., HEYMAN, R. A., ZHOU, J. Y., ONG, E. S., ORO, A. E., KAKIZUKA, A., EVANS, R. M. Genes Dev. 1992, 6, 329–344. 45 DOLLE, P., FRAULOB, V., KASTNER, P., CHAMBON, P. Mech. Dev. 1994, 45, 91–104. 46 KASTNER, P., MARK, M., GHYSELINCK, N., KREZEL, W., DUPE´ , V., GRONDONA, J. M., CHAMBON, P. Development 1997, 124, 313–326. 47 MANGELSDORF, D. J., THUMMEL, C., BEATO, M., HERRLICH, P., SCHU¨ TZ, G., UMESONO, K., BLUMBERG, B., KASTNER, P., MARK, M., CHAMBON, P., EVANS, R. M. Cell 1995, 83, 835–839. 48 PERLMANN, T., EVANS, R. M. Cell 1997, 90, 391–397. 49 DI CROCE, L., OKRET, S., KERSTEN, S., GUSTAFSSON, J.-Å., PARKER, M., WAHLI, W., BEATO, M. EMBO J. 1999, 18, 6201–6210. 50 MANGELSDORF, D. J., EVANS, R. M. Cell 1995, 83, 841–850. 51 UMESONO, K., MURAKAMI, K. K., THOMPSON, C. C., EVANS, R. M. Cell 1991, 65, 1255–1266. 52 RASTINEJAD, F. Curr. Opin. Struct. Biol. 2001, 11, 33–38. 53 KUROKAWA, R., DIRENZO, J., BOEHM, M., SUGARMAN, J., GLOSS, B., ROSENFELD, M. G., HEYMAN, R. A., GLASS, C. K. Nature 1994, 371, 528–531. 54 FORMAN, B. M., UMESONO, K., CHEN, J., EVANS, R. M. Cell 1995, 81, 541– 550. 55 PERLMANN, T., JANSSON, L. Genes Dev. 1995, 9, 769–782. 56 WESTIN, S., KUROKAWA, R., NOLTE, R., WISELEY, G. B., MCINERNEY, E. M., ROSE, D. W., MILBURN, M. V., ROSENFELD, M. G., GLASS, C. K. Nature 1998, 395, 199–202. 57 BOTLING, J., CASTRO, D. S., OBERG, F., NILSSON, K., PERLMANN, T. J. Biol. Chem. 1997, 272, 9443–9449. 58 CHEN, J. Y., CLIFFORD, J., ZUSI, C., STARRETT, J., TORTOLANI, D., OSTROWSKI, J., RECZEK, P. R., CHAMBON, P.,
59
60
61
62
63 64
65 66
67
68
69
70
71 72
73
74
GRONEMEYER, H. Nature 1996, 382, 819–822. PERLMANN, T., UMESONO, K., RANGARAJAN, P. N., FORMAN, B. M., EVANS, R. M. Mol. Endocrinol. 1996, 10, 958–966. KUROKAWA, R., YU, V. C., N¨AA¨ R, A., KYAKUMOTO, S., HAN, Z., SILVERMAN, S., ROSENFELD, M. G., GLASS, C. K. Genes Dev. 1993, 7, 1423–1435. BOURGUET, W., RUFF, M., CHAMBON, P., GRONEMEYER, H., MORAS, D. Nature 1995, 375, 377–382. RENAUD, J.-P., ROCHEL, N., RUFF, M., VIVAT, V., CHAMBON, P., GRONEMEYER, H., MORAS, D. Nature 1995, 378, 681–689. EGEA, P. F., KLAHOLZ, B. P., MORAS, D. FEBS Lett. 2000, 476, 62–67. DANIELIAN, P. S., WHITE, R., LEES, J. A., PARKER, M. G. EMBO J. 1992, 11, 1025–1033. MORAS, D., GRONEMEYER, H. Curr. Opin. Cell Biol. 1998, 10, 384–391. KOPF, E., PLASSAT, J. L., VIVAT, V., de THE, H., CHAMBON, P., ROCHETTE-EGLY, C. J. Biol. Chem. 2000, 275, 33280–33288. NOMURA, Y., NAGAYA, T., HAYASHI, Y., KAMBE, F., SEO, H. Biochem. Biophys. Res. Commun. 1999, 260, 729–733. OSBURN, D. L., SHAO, G., SEIDEL, H. M., SCHULMAN, I. G. Mol. Cell. Biol. 2001, 21, 4909–4918. ZHU, J., GIANNI, M., KOPF, E., HONORE, N., CHELBI-ALIX, M., KOKEN, M., QUIGNON, F., ROCHETTE-EGLY, C., de THE, H. Proc. Natl Acad. Sci. USA 1999, 96, 14807–14812. VOM BAUR, E., ZECHEL, C., HEERY, D., HEINE, M. J., GARNIER, J. M., VIVAT, V., LE DOUARIN, B., GRONEMEYER, H., CHAMBON, P., LOSSON, R. EMBO J. 1996, 15, 110–124. WEIGEL, N. L., ZHANG, Y. J. Mol. Med. 1998, 76, 469–479. HUGGENVIK, J. I., COLLARD, M. W., KIM, Y. W., SHARMA, R. P. Mol. Endocrinol. 1993, 7, 543–550. TANEJA, R., ROCHETTE-EGLY, C., PLASSAT, J. L., PENNA, L., GAUB, M. P., CHAMBON, P. EMBO J. 1997, 16, 6452–6465. BASTIEN, J., ADAM-STITAH, S., RIEDL, T., EGLY, J. M., CHAMBON, P.,
15
16 Retinoid Receptors RAR and RXR: Structure and Function
75
76
77 78
79
80 81 82 83
84
85
86
87 88
89
90 91
92
ROCHETTE-EGLY, C. J. Biol. Chem. 2000, 275, 21896–21904. DELMOTTE, M. H., TAHAYATO, A., FORMSTECHER, P., LEFEBVRE, P. J. Biol. Chem. 1999, 274, 38225–38231. ROCHETTE-EGLY, C., ADAM, S., ROSSIGNOL, M., EGLY, J. M., CHAMBON, P. Cell 1997, 90, 97–107. PFAHL, M. Endocr. Rev. 1993, 14, 651–658. CAVAILLES, V., DAUVOIS, S., DANIELIAN, P. S., PARKER, M. G. Proc. Natl Acad. Sci. USA 1994, 91, 10009–10013. ONATE, S. A., TSAI, S. Y., TSAI, M. J., O’MALLEY, B. W. Science 1995, 270, 1354–1357. GLASS, C. K., ROSENFELD, M. G. Genes Dev. 2000, 14, 121–141. XU, L., GLASS, C. K., ROSENFELD, M. G. Curr. Opin. Genet. Dev. 1999, 9, 140–147. BANNISTER, A. J., KOUZARIDES, T. Nature 1996, 384, 641–643. OGRYZKO, V. V., SCHILTZ, R. L., RUSSANOVA, V., HOWARD, B. H., NAKATANI, Y. Cell 1996, 87, 953–959. YANG, X. J., OGRYZKO, V. V., NISHIKAWA, J., HOWARD, B. H., NAKATANI, Y. Nature 1996, 382, 319–324. CHEN, H., LIN, R. J., SCHILTZ, R. L., CHAKRAVARTI, D., NASH, A., NAGY, L., PRIVALSKY, M. L., NAKATANI, Y., EVANS, R. M. Cell 1997, 90, 569–580. SPENCER, T. E., JENSTER, G., BURCIN, M. M., ALLIS, C. D., ZHOU, J., MIZZEN, C. A., MCKENNA, N. J., ONATE, S. A., TSAI, S. Y., TSAI, M. J., O’MALLEY, B. W. Nature 1997, 389, 194–198. MONTMINY, M. Nature 1997, 387, 654–655. FONDELL, J. D., GE, H., ROEDER, R. G. Proc. Natl Acad. Sci. USA 1996, 93, 8329–8333. RACHEZ, C., LEMON, B. D., SULDAN, Z., BROMLEIGH, V., GAMBLE, M., NAAR, A. M., ERDJUMENT-BROMAGE, H., TEMPST, P., FREEDMAN, L. P. Nature 1999, 398, 824–828. FREEDMAN, L. P. Cell 1999, 97, 5–8. HEERY, D. M., KALKHOVEN, E., HOARE, S., PARKER, M. G. Nature 1997, 387, 733–736. TORCHIA, J., ROSE, D., INOSTROZA, J., KAMEL, Y., WESTIN, S., GLASS, C.,
93
94
95
96
97
98
99
100 101
102
103
104 105 106
ROSENFELD, M. Nature 1997, 387, 677–684. DARIMONT, B. D., WAGNER, R. L., APRILETTI, J. W., STALLCUP, M. R., KUSHNER, P. J., BAXTER, J. D., FLETTERICK, R. J., YAMAMOTO, K. R. Genes Dev. 1998, 12, 3343–3356. FENG, W., RIBEIRO, R. C. J., WAGNER, R. L., NGUYEN, H., APRILETTI, J. W., FLETTERICK, R. J., BAXTER, J. D., KUSHNER, P. J., WEST, B. L. Science 1998, 280, 1747–1749. NOLTE, R. T., WISELY, G. B., WESTIN, S., COBB, J. E., LAMBERT, M. H., KUROKAWA, R., ROSENFELD, M. G., WILLSON, T. M., GLASS, C. K., MILBURN, M. V. Nature 1998, 395, 137–143. SHIAU, A. K., BARSTAND, D., LORIA, P. M., CHENG, L., KUSHNER, P. J., AGARD, D. A., GREENE, G. L. Cell 1998, 95, 927–937. BRZOZOWSKI, A. M., PIKE, A. C. W., DAUTER, Z., HUBBARD, R. E., BONN, T., ¨ HMAN, L., GREEN, G. L., ENGSTRO¨ M, O., O GUSTAFSSON, J.-A. Nature 1997, 389, 753–758. PIKE, A. C. W., BRZOZOWSKI, A. M., HUBBARD, R. E., BONN, T., THORSELL, A. G., ENGSTROM, O., LJUNGGREN, J. J.-A. G., CARLQUIST, M. EMBO J. 1999, 18, 4608–4618. BOURGUET, W., VIVAT, V., WURTZ, J.-M., CHAMBON, P., GRONEMEYER, H., MORAS, D. Mol. Cell 2000, 5, 289–298. LEE, K. C., LEE KRAUS, W. Trends Endocrinol. Metab. 2001, 12, 191–197. MCNALLY, J. G., MULLER, W. G., WALKER, D., WOLFORD, R., HAGER, G. L. Science 2000, 287, 1262–1265. DILWORTH, F. J., FROMENTAL-RAMAIN, C., YAMAMOTO, K., CHAMBON, P. Mol. Cell. 2000, 6, 1049–1058. SHANG, Y., HU, X., DIRENZO, J., LAZAR, M. A., BROWN, M. Cell 2000, 103, 843–852. PERLMANN, T., VENNSTROM, B. Nature 1995, 377, 387–388. CHEN, J. D., EVANS, R. M. Nature 1995, 377, 454–457. H¨ORLEIN, A. J., NAAR, A. M., HEINZEL, T., TORCHIA, J., GLOSS, B., KUROKAWA, R., YAN, A., KAMEL, Y., SODERSTROM, M., GLASS, C. K., ROSENFELD, M. G. 1995) Nature 1995, 377, 397–404.
References 107 HU, X., LAZAR, M. A. Nature 1999, 402, 93–96. 108 NAGY, L., KAO, H. Y., LOVE, J. D., LI, C., BABAYO, E., GOOCH, J. T., KRISHNA, V., CHATTERJEE, K., EVANS, R. M., SCHWABE, J. W. R. Genes Dev. 1999, 13, 3209– 3216. 109 PERISSI, V., STASZEWSKI, L. M., MCINERNEY, E. M., KUROKAWA, R., KRONES, A., ROSE, D. W., LAMBERT, M. H., MILBURN, M. V., GLASS, C. K., ROSENFELD, M. Genes Dev. 1999, 13, 3198– 3208. 110 LAUDET, V. J. Mol. Endocrinol. 1997, 19, 207–226. 111 ESCRIVA, H., SAFI, R., H¨ANNI, C., LANGLOIS, M.-C., SAUMITOU-LAPRADE, P., STEHELIN, D., CAPRON, A., PIERCE, R., LAUDET, V. Proc. Natl Acad. Sci. USA 1997, 94, 6803–6808. 112 BILLAS, I. M., MOULINIER, L., ROCHEL, N., MORAS, D. J. Biol. Chem. 2001, 276, 7465–7474. 113 CLAYTON, G. M., PEAK-CHEW, S. Y., EVANS, R. M., SCHWABE, J. W. Proc. Natl Acad. Sci. USA 2001, 98, 1549– 1554. 114 STEHLIN, C., WURTZ, J. M., STEINMETZ, A., GREINER, E., SCHULE, R., MORAS, D., RENAUD, J. P. EMBO J. 2001, 20, 5822–5831. 115 HORTON, C., MADEN, M. Dev. Dyn. 1995, 202, 312–323. 116 ROY, B., TANEJA, R., CHAMBON, P. Mol. Cell Biol. 1995, 15, 6481–6487. 117 LU, H. C., EICHELE, G., THALLER, C. Development 1997, 124, 195–203. 118 MASCREZ, B., MARK, M., DIERICH, A., GHYSELINCK, N. B., KASTNER, P., CHAMBON, P. Development 1998, 125, 4691–4707. 119 KITAREEWAN, S., BURKA, L. T., TOMER, K. B., PARKER, C. E., DETERDING, L. J., STEVENS, R. D., FORMAN, B. M., MAIS, D. E., HEYMAN, R. A., MCMORRIS, T., WEINBERGER, C. Mol. Biol. Cell 1996, 7, 1153–1166. 120 LEMOTTE, P. K., KEIDEL, S., APFEL, C. M. Eur. J. Biochem. 1996, 236, 328–323. 121 MATA DE URQUIZA, A., LIU, S., SJOBERG, M., ZETTERSTROM, R. H., GRIFFITHS, W., SJOVALL, J., PERLMANN, T. Science 2000, 290, 2140–2144.
122 SALEM Jr, N., KIM, H.-Y., YERGEY, J. A. In: SIMOPOULOS, A. P., KIFER, R. R., MARTIN, R. E. eds. Health Effects of Polyunsaturated Fatty Acids in Seafoods. London: Academic Press, 1986, pp. 263–317. 123 NEURINGER, M., ANDERSON, G. J., CONNOR, W. E. Annu. Rev. Nutr. 1988, 8, 517–541. 124 GARCIA, M. C., KIM, H.-Y. Brain Res. 1997, 768, 43–48. 125 HORROCKS, L. A., YEO, Y. K. Pharmacol. Res. 1999, 40, 211–225. 126 SHEAFF GREINER, R., MORIGUCHI, T., HUTTON, A., SLOTNICK, B. M., SALEM Jr, N. Lipids Suppl. 1999, 34, 239–243. 127 ETCHAMENDY, N., ENDERLIN, V., MARIGHETTO, A., VOUIMBA, R. M., PALLET, V., JAFFARD, R., HIGUERET, P. J. Neurosci. 2001, 21, 6423–6429. 128 FUJITA, S., IKEGAYA, Y., NISHIKAWA, M., NISHIYAMA, N., MATSUKI, N. Br. J. Pharmacol. 2001, 132, 1417– 1422. 129 MISNER, D. L., JACOBS, S., SHIMIZU, Y., MATA DE URQUIZA, A., SOLOMIN, L., PERLMANN, T., DELUCA, L. M., STEVENS, C. F., EVANS, R. M. Proc. Natl Acad. Sci. USA 2001, 98, 11714–11719. 130 CHIANG, M.-Y., MISNER, D., KEMPERMANN, G., SCHIKORSKI, T., GIGUE` RE, V., SUCOV, H. M., GAGE, F. H., STEVENS, C. F., EVANS, R. M. Neuron 1998, 21, 1353–1361. 131 KASTNER, P., GRONDONA, J. M., MARK, M., GANSMULLER, A., LEMEUR, M., DECIMO, D., VONESCH, J. L., DOLLE, P., CHAMBON, P. Cell 1994, 78, 987–1003. 132 KASTNER, P., MARK, M., LEID, M., GANSMULLER, A., CHIN, W., GRONDONA, J. M., ECIMO, D., KREZEL, W., DIERICH, A., CHAMBON, P. Genes Dev. 1996, 10, 0–92. 133 CHAWLA, A., SAEZ, E., EVANS, R. M. Cell 2000, 103, 1–4. 134 EGEA, P. F., MITSCHLER, A., MORAS, D. Mol. Endocrinol. 2002, 16, 987–997. 135 IMAI, T., JIANG, M., CHAMBON, P., METZGER, D. Proc. Natl Acad. Sci. USA 2001, 98, 224–228. 136 LI, M., CHIBA, H., WAROT, X., MESSADDEQ, N., GERARD, C., CHAMBON, P., METZGER, D. Development 2001, 128, 675–688.
17
18 Retinoid Receptors RAR and RXR: Structure and Function
137 WAN, Y. J., AN, D., CAI, Y., REPA, J. J., HUNG-PO CHEN, T., FLORES, M., POSTIC MAGNUSON, M. A., CHEN, J., CHIEN, K. R., FRENCH, S., MANGELSDORF, D. J. ND SUCOV, H. M. Mol. Cell. Biol. 2000, 20, 4436–4444.
138 REPA, J. J., TURLEY, S. D., LOBACCARO, J. A., MEDINA, J., LI, L., LUSTIG, K., SHAN, B., HEYMAN, R. A., DIETSCHY, J. M., MANGELSDORF, D. J. Science 2000, 289, 1524–1529.
1
Human ABC Transporters: Function, Expression, and Regulation Gerd Schmitz and Thomas Langmann
Universit¨at Regensburg, Regensburg, Germany
Originally published in: Cellular Proteins and Their Fatty Acids in Health and Disease. Edited by Asim K. Duttaroy and Friedrich Spener. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30437-0
1 Introduction
Members of the ATP binding cassette (ABC) transporter superfamily, which are found in all three kingdoms of life, namely prokaryotes, archae bacteria and eukaryotes, represent one of the biggest protein families described so far. ABC transporter genes are highly conserved among different genomes and have been sustained throughout the evolutionary tree. They are usually multispan membrane proteins that mediate the active uptake or efflux of specific substrates across various biological membrane systems [1]. The development of these two different directions of transport, import or export, most likely occurred even before the differentiation of eukaryotes from prokaryotes [2]. In bacteria, these bidirectional transport functions either provide a mechanism for nutrition supply, as in the MalK transporter from Escherichia coli [3], or serve as a defense mechanism, allowing bacteria to protect themselves from their own or foreign toxins [4]. During evolution, eukaryotes have developed specialized ABC proteins as a type of early innate immune system, protecting cells from harmful substances. Thus in the human system several ABC proteins (MDRs, MRPs, ABCG2) are responsible for increased drug exclusion in compound-treated tumor cells, providing cellular mechanisms for the development of multidrug resistance [5]. ABC transporters have also received attention because mutations in these molecules are the cause of various inherited human diseases, including familial HDL deficiency (ABCA1) [6–8], some chorioretinal diseases (ABCR or ABCA4) [9], progressive familial intrahepatic cholestasis (PFIC) type II (PFICII: BSEP or ABCB11) and type III (PFICIII: MDR3 or ABCB4) [10–12], Dubin-Johnson syndrome (cMOAT or ABCC2) [13], pseudoxanthoma elasticum (MRP6 or ABCC6) [14], adrenoleukodystrophy (ALDR or ABCD2) [15], and β-sitosterolemia [16]. In Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Human ABC Transporters: Function, Expression, and Regulation
addition, functional polymorphisms have been described in various ABC genes including ABCA1 [17], ABCA4 [18], and ABCG1 [19]. Most interestingly, a significant number of human ABC transporters are critically involved in bile acid, phospholipid, and sterol transport [20–23], whereas the expression of these ABC proteins is itself controlled by lipids. Therefore, it is obvious that ABC transporters are promising target molecules for the treatment of lipid disorders. In this article, we summarize the structural features of human ABC transporters, their role in human disorders and especially highlight the function and regulation of ABC proteins in cellular and total body lipid homeostasis. 2 Structural Features of ATP Binding Cassette (ABC) Transporters
A functional ABC transporter protein usually consists of two transmembrane domains (TMD) and two nucleotide binding domains (NBD) or ATP binding cassettes (ABC). The characteristic ABC domain is composed of two short, conserved peptides, the Walker A and Walker B motifs [24], which are required for ATP binding and which are also found in other ATP binding proteins [25] (Fig. 1). An additional element, the signature motif, is located between both Walker motifs and is characteristic for each ABC subfamily [26]. The TMD and ABC domains are either present in one polypeptide chain (full-size transporter) or in two poly-peptides (half-size transporter) and several arrangements of the TMD and ABC motifs are found in human ABC proteins (Fig. 1).
Fig. 1 Diagram depicting domain arrangements of human ABC transporters. The ATP binding cassette (ABC) consists of Walker A and Walker B motifs, separated by the signature motif characteristic for each ABC transporter subfamily [24, 28]. The membrane spanning domains are depicted as barrels. (A) The TMD0-(TMD-ABC)2 structure of ABCC (MRP) family members is shown. In addition to the regular full-size type containing the (TMD-ABC)2 domain
arrangement, this type displays an additional five trans-membrane domains termed TMD0. (B) Prototype ABC transporter with the (TMD-ABC)2 structure. (C) Two alternative types of half-size molecules, TMDABC and ABC-TMD. Only corresponding half-molecule organizations are able to form heterodimers. (D) The (ABC)2 type of molecules lacking transmem-brane domains is unlikely to function as transporter.
3 Overview of Human ABC Gene Subfamilies
Among the full-size transporters domain structures such as (TMD-ABC)2 as well as TMD0-(TMD-ABC)2 (which contains an additional N-terminal series of five transmembrane spans) occur. (TMD-ABC)2 structures are represented in the ABCA, ABCB, and ABCC families, whereas the TMD0-(TMD-ABC)2 arrangement is solely present in specific members of the ABCC subfamily (Tab. 1). The (ABCTMD)2 is only found in yeast and not present in human ABC molecules. Half-size transporters can either occur in the TMD-ABC organization, as it is the case within the ABCD subfamily, or as ABC-TMD, which is found in the ABCG group of ABC proteins (Tab. 1). In both cases, creation of a functional transporter requires the assembly as a homodimer or heterodimer. Although the final destination of full-size transporters is the plasma membrane, these proteins are also found intracellularly as a result of vesicular trafficking processes. Also, the localization in intracellular membrane-bound vesicles, collectively named multivesicular bodies (MVBs), is conceivable [27]. Most half-size molecules are routed to intracellular membrane systems such as mitochondria, peroxisomes, the endoplasmic reticulum, and the Golgi compartment [28]. However, a member of the ABCG subfamily, ABCG2, has been localized to the plasma membrane [29]. In contrast to these membrane-spanning ABC transporters, proteins from the ABCE and ABCF subfamilies do not harbor TMD at all and contain a (ABC)2 domain structure (Fig. 1). As a consequence, they are not likely to be involved in any membrane transport function. Moreover, ABCE1 binds oligoadenylate, which is produced upon viral infections and seems to be a part of the innate immune system by controlling the RNase L pathway [30]. ABCF1 is associated with ribosomes and interacts with eukaryotic initiation factor 2 (eIF2) and thereby plays a key role in the initiation of mRNA translation [31]. The group of membrane-spanning ABC transporters can be split into two different sections depending on their mode of action. The active transporters or pumps, such as members of the ABCB (MDR/TAP) subfamily, couple the hydrolysis of ATP and the resulting free energy to movement of molecules across membranes against a chemical concentration gradient [32]. In contrast, recent work has identified several ABC proteins that show nucleotide binding and a subsequent conformational change but very low ATP hydrolysis. These molecules mainly function as transport facilitators and include ABCC7 (CFTR) [33], ABCC8 (SUR1), ABCC9 (SUR2) [34], and ABCA1 [35].
3 Overview of Human ABC Gene Subfamilies
A comprehensive description of the currently known human ABC transporters is given in Tab. 1. The list contains fully characterized ABC genes, as well as gene annotations derived from sequence information based on the analysis of the human genome [36, 37] and uses the proposed nomenclature of the Human Gene Nomenclature Committee (HUGO). The synonyms, the chromosomal location, the domain structure, and the tissue specificity and cellular location of each gene are itemized. Furthermore, the lipid-sensitive regulation and the known or putative function of
3
Gene
Alternative name
Location
Domain structure
Tissue expression and cellular location
Lipid regulated
Known or putative transported molecule
ABCA1
ABC1
9q31.1
TMD-ABC) 2
Macrophages, liver
+
ABCA2 ABCA3 ABCA4 ABCA5 ABCA6 ABCA7
ABC2 ABC3 ABCR
9q34 16p13.3 1p22.1–p21 17q24 17q24 19p13.3
(TMD-ABC)2 (TMD-ABC)2 (TMD-ABC)2 (TMD-ABC)2 (TMD-ABC)2 (TMD-ABC)2
Brain Lung Photoreceptors Muscle, heart, testes Liver Spleen, thymus, PBMC
+ + − + + +
Choline-phospholipids and cholesterol Estramustine, steroids Surfactant phospholipids N-Retinylidene-PE
17q24 17q24 17q24 2q34 7p11–q11
(TMD-ABC) 2 (TMD-ABC)2 (TMD-ABC)2 (TMD-ABC)2 (TMD-ABC)2
Ovary Heart Muscle, heart Stomach Low in all tissues
− + − − −
ABCA8 ABCA9 ABCA10 ABCA12 ABCA13
Phospholipids Sphingolipids (e.g. ceramide) and serine-phospholipids (e.g. PS)
ABCB1
MDR1
7p21
(TMD-ABC)2
Excretory organs, apical membrane
+
ABCB2 ABCB3 ABCB4
TAP1 TAP2 MDR 3
6p21 6p21 7q21.1
TMD-ABC TMD-ABC (TMD-ABC)2
− − +
ABCB5 ABCB6 ABCB7
MTABC3 ABC7
7p14 2q36 Xq12–q13
(TMA-ABC)2 TMD-ABC TMD-ABC
Ubiquitous, ER Ubiquitous, ER Liver, apical membrane Ubiquitous Mitochondria Mitochondria
Phospholipids, PAF, aldoster-one, cholesterol, amphiphiles, β-amyloid peptide Peptides Peptides Phosphatidylcholine
− + −
Fe/S clusters Fe/S clusters
4 Human ABC Transporters: Function, Expression, and Regulation
Table 1 Overview of human ABC gene subfamilies
− +
Fe/S clusters
− +
Fe/S clusters Monovalent bile salts (e.g. TC)
TMD0-(TMD-ABC) 2
Lung, testes, PBMC
+
10q24
TMD0-(TMD-ABC)2
Liver
+
MRP3
17q21.3
TMD0-(TMD-ABC)2
Lung, intestine, liver, basolateral membrane
−
ABCC4
MRP4
13q32
(TMD-ABC)2
Prostate
+
ABCC5
MRP5
3q27
(TMD-ABC)2
Ubiquitous
+
ABCC6
MRP6
16p13.1
TMD0-(TMD-ABC)2
Kidney, liver
−
ABCC7 ABCC8
CFTR SUR1
7q31.2 11p15.1
(TMD-ABC)2 TMD0-(TMD-ABC)2
Exocrine tissue Pancreas
− −
GSH-, glucuronate-, sulfate-conjugates, GSSG, sphingolipids, LTC4 , PGA1, PGA2 , 17β-glucuronosyl estradiol GSH-, glucuronate-, sulfate-conjugates, bilirubin glucuronide, LTC4 , 17β-glucuronosyl estradiol, taurolithocholate 3-sulfate, anionic drugs glucuronate-, sulfate-conjugates, 17β-glucuronosyl estradiol, taur-olithocholate 3-sulfate Xenobiotics, nucleosides ATP/ADP/AMP/adenosin, GTP/GDP) Xenobiotics, nucleosides (ATP/ADP/AMP/adenosin, GTP/GDP) Anionic cyclopentapeptides (e.g. BQ123) Chlorideions, ATP Sulfonylureas
MAB C1
7q36 12q 24
TMD-ABC TMD-ABC
ABCB10 ABCB11
MTABC2 BSEP
1q42 2q24
TMD-ABC (TMD-ABC)2
ABCC1
MRP1
16p13.1
ABCC2
MRP2
ABCC3
3 Overview of Human ABC Gene Subfamilies
Mitochondria Heart, brain, lysosomes Mitochondria Liver, apical membrane
ABCB8 ABCB9
5
Gene
Alternative name
Location
Domain structure
Tissue expression and cellular location
Lipid regulated
Known or putative transported molecule
ABCC9 ABCC10 ABCC11 ABCC12
SUR2 MRP7 MRP8 MRP9
12p12.1 6p21 16q11–q12 16q11–q12
TMD0-(TMD-ABC)2 TMD0-(TMD-ABC)2 (TMD-ABC)2 (TMD-ABC)2
Heart, muscle Low in all tissues Low in all tissues Low in all tissues
− − − −
Sulfonylureas
ABCD1 ABCD2 ABCD3 ABCD4
ALD ALDR PMP70 PMP69
Xq28 12q11–q12 1p22–p21 134q24.3
TM-ABC TM-ABC TM-ABC TM-ABC
Peroxisomes Peroxisomes Peroxisomes Peroxisomes
+ + − +
Very long-chain fatty acids Very long-chain fatty acids Very long-chain fatty acids Very long-chain fatty acids
ABCE1 ABCF1
OABP ABC50
4q31 6p21.33
(ABC)2 (ABC)2
Ovary, testes, spleen Ubiquitous
− −
7q36 3q25
(ABC)2 (ABC)2
Ubiquitous Ubiquitous
− −
Oligoa denylate Translation elongation initiation factor 2
21q22.3 4q22 11q23 2p21 2p21
ABC-TMD ABC-TMD ABC-TMD ABC-TMD ABC-TMD
Ubiquitous Placenta, intestine Liver Liver, instestine Liver, instestine
+ − + + +
ABCF2 ABCF3 ABCG1 ABCG2 ABCG4 ABCG5 ABCG8
White MXR White2 White3 White 4
Phospholipids, cholesterol Drug resistance Plant and shellfish sterols Plant and shellfish sterols
Notes: The currently known 48 human ABC transporters from six different subfamilies and their typical features are listed. The proposal of the Human Genome Organization (HUGO) for the numbering of human ABC transporter genes has been used and the common names have been included additionally. The domain structure of ABC transporters has been adapted from Klein et al. [28] or by generation of hydrophobicity plots. An excellent regularly updated website established by Michael M¨uller (http://nutrigene.4t.com/humanabc.htm) provides supplementary information concerning database entries.
6 Human ABC Transporters: Function, Expression, and Regulation
Table 1 Continued
3 Overview of Human ABC Gene Subfamilies
human ABC transporters is mentioned. A short outline of each of the six known human ABC gene subfamilies is presented in the following paragraphs. 3.1 The ABCA (ABC1) Subfamily
The ABCA family contains solely full-size transporters (Tab. 1), and with ABCA1, ABCA4 (ABCR), and ABCA2 the largest proteins with 2261, 2273, and 2436 amino acids, respectively. Most of the ABCA proteins are expressed ubiquitously at low levels and also predominantly in specific tissues, such as ABCA1 in macrophages and ABCA4 (ABCR), which seems to be restricted to photoreceptor cells [9]. In contrast to all other ABC subgroups, the ABCA subfamily has no counterpart in yeast and appears for the first time in Caenorhabditis elegans [38]. Based on the genomic locations and phylogenetic analyses [39], two distinct divisions of ABCAs can be formed. The first group contains five genes located in a cluster on chromosome 17q24 (ABCA5, ABCA6, ABCA8, ABCA9, and ABCA10) and the second group consists of seven genes distributed over six different chromosomes (ABCA1, ABCA2, ABCA3, ABCA4, ABCA7, ABCA12, and ABCA13). Interestingly, the transcriptional control of at least seven ABCA members (Tab. 1) is controlled or influenced by lipids [40–45], indicating an important role of the whole ABCA subfamily in cellular lipid transport processes [23, 46]. ABCA1, the founding member of the family, is under extensive investigation and it is now widely accepted that its predominant role is associated to the regulation of cellular phospholipid and cholesterol release via an indirect mechanism, possibly by ATP-sensitive regulation of an as yet uncharacterized molecule [35]. In contrast, ABCA4 is an active retinoid-PE complex transporter which displays strong, lipid-activated ATPase activity [47–49] comparable to active pumps such as ABCB1 (MDR1). In addition to the high expression in neuronal tissues [50], ABCA2 is also present in liver, kidney, and macrophages [45, 51]. ABCA2 co-localizes with endosomal/lysosomal markers and contains a lipocalin signature motif, a feature found in a family of proteins linked to the transport of sterols including retinoids, steroids, lipids, and bilins [52]. Thus it is conceivable that ABCA2 sequesters lipids or lipid-steroid complexes via its lipocalin domain into endoso-mal/lysosomal vesicles, which could serve as a secretory pathway for these molecules [51]. This hypothesis is further supported by the lipid-sensitive induction of ABCA2 in human macrophages [45]. Although ubiquitously expressed, the ABCA7 protein is predominantly found in myelo-lymphatic tissues [43, 44] and a pivotal role in the development of hematopoietic cell lineages has been suggested [53]. Interestingly, there is recent evidence that ABCA7 may be involved in the transport of phosphatidylserine and ceramidespecies and thus be linked to apoptotic processes [54]. The ABCA3 protein is an integral part of the surfactant lamellar body membrane in lung alveolar type II cells [55]. Pulmonary surfactant is a complex mixture consisting of phospholipids, neutral lipids, and specific proteins. It is essential for normal lung function because it reduces surface tension at the air-liquid interface
7
8 Human ABC Transporters: Function, Expression, and Regulation
of alveolar spaces. Phospholipids comprise 80% of the mass of surfactant, of which 80–85% are phosphatidylcholines (PC). Among the PC molecular species, dipalmitoyl-PC (PC16:0/16:0) is the principle surface tension-lowering molecule, ranging from 40 to 60 mol% in adult mammals, whereas disaturated palmitoylmyristoyl-PC (PC16:0/14:0), together with the monounsaturated palmitoylpalmitoleoyl-PC (PC16:0/16:1) and palmitoyloleoyl-PC (PC16:0/18:1), comprise up to 38% of total PC [56]. Lung surfactant also contains four unique proteins: surfactant protein A (SP-A), SP-B, SP-C, and SP-D [57]. Lamellar bodies are enriched in SP-B and SP-C and it has been proposed that these hydrophobic proteins are secreted together with the phospholipids. SP-A and SP-D are secreted independently of lamellar bodies. The localization of ABCA3 in lamellar bodies of alveolar type II cells and the finding that raised ATP levels in bronchoalveolar lavage fluid are sufficient to stimulate surfactant secretion [58] implicate ABCA3 in the processing of pulmonary surfactant by transporting phospholipids and/or specific surfactant proteins for secretion. Since lamellar bodies are also important structures in other cells with barrier function such as keratinocytes in the skin and intestinal epithelial cells and because ABCA3 is expressed in other cells as well, a similar function in this cellular system could be envisioned [22, 59].
3.2 The ABCB (MDR/TAP) Subfamily
The ABCB family is the only subgroup of human ABC transporters that contains full-size (ABCB1, ABCB4, ABCB5, and ABCB11) and half-size transporters (ABCB2, ABCB3, ABCB6-ABC10) (Tab. 1). ABCB1 (MDR1) is probably the best studied ABC transporter. It was the first human ABC protein to be cloned [60] and has the ability to mediate multidrug resistance in cancer cells. ABCB1 is localized to the apical membrane of polarized cells and the major sites of expression are found in the liver, the intestine, and the blood-brain barrier. One proposed physiological function of MDR1 is the protection of cells by exporting lipophilic cytotoxic drugs. In addition to ABCB4 (MDR3), which only translocates phosphatidylcholine (PC) across membranes [61], ABCB1 can transport a variety of lipids: PC analogs, phosphatidylethanolamine (PE), sphingomyelin (SM), cholesterol, and glucosylceramide (GlcCer) molecules, which carry a shortened fatty acid at the C2-position of the glycerol or sphingosine backbone [62]. Of particular interest is the finding that ABCB1 (MDR1)-overexpressing cells have elevated levels of cholesterol, GlcCer [63–65] and caveolin 1 [66], all of which are constituents of raft plasma membrane microdomains involved in pathways of lipid efflux from cells. However, these data need further confirmation. Since ABCB1 itself is localized in Triton X-100-insoluble caveolin/cholesterolrich domains [67] and because cholesterol can directly interact with the substrate binding site of ABCB1 [68], it has been suggested that the transport of cytotoxic drugs, which are mostly lipophilic, is coupled to the translocation of cholesterol and sphingolipids [69].
3 Overview of Human ABC Gene Subfamilies
A recent report has indicated that ABCB1 is also involved in the secretion of platelet-activating factor (PAF) [70]. PAF (1-O-alkyl-2-acetyl-sn-glycero-3phosphocholine) is a potent bioactive lipid that is synthesized by a broad range of cells, including circulating infammatory cells, endothelial cells, and epithelial cells. It has a variety of biological effects including activation of inflammatory cells and is involved in many pathological conditions, such as angiogenesis in breast cancers, metastasis, shock, sepsis and multiple organ failure. Since PAF is a naturally occurring short-chain phosphocholine and because MDR1 recognizes short-chain analogs of PC and is expressed in many cell types, including epithelial cells, a model for direct translocation of PAF across the plasma membrane has been proposed. Once present on the cell surface, PAF interacts with the PAF receptor on a neighboring cell and elicits its signaling mechanisms. An unexpected role of ABCB1 in the immune response has been recently identified: mdr1a−/− mice kept under pathogen-free conditions develop spontaneous intestinal inflammation. It is thought that this type of colitis is due to a disturbance of the mucosal layer as a consequence of a defect in the membrane integrity of intestinal epithelial cells [71]. In this context, altered intestinal intraepithelial lymphocyte populations and a disturbed cytokine response has been documented in mdr1a−/− mice [72, 73]. ABCB1 has also been implicated in the efflux of brain β-amyloid protein, since pharmacological blockade of ABCB1 rapidly decreases extracellular levels of βamyloid secretion. Also, in vitro binding studies showed that addition of synthetic human -amyloid peptides to hamster mdr1-bearing vesicles resulted in saturable uptake of these peptides, suggesting that they interact directly with the transporter [74]. These results and the finding that apolipoprotein E (apoE) is also associated with β-amyloid peptides [75] implies that ABCB1 can co-transport apoE and β-amyloid and thereby may contribute to the etiology of Alzheimer’s disease. Two half-size members of the subfamily, ABCB2 (TAP1) and ABCB3 (TAP2), are transporters associated with antigen presentation (TAP) and form a functional heterodimer to transport peptides from the cytoplasm into the endoplasmic reticulum, from where the presentation of peptide antigens via major histocompatibility complex (MHC) I starts [76] (Fig. 2). A transient complex containing a class I heavy chain-β 2 microglobulin (β2m) heterodimer is assembled onto the TAP molecule by numerous interactions with the ER chaperones calnexin, ERp57, calreticulin, and the specialized tetherin molecule, tapasin [77]. Most interestingly, virus-infected and malignant cells have developed strategies to escape immune surveillance by affecting TAP expression or function [78]. The immediate-early gene product ICP47 of herpes simplex virus type I binds to the cytoplasmic face of TAP and thereby blocks peptide entry, whereas the ER-resident human cytomegalovirus protein US6 inhibits TAP function by blocking the ER-luminal part of the transporter (Fig. 2) [79–81]. ABCB9, which is closely related to ABCB2 and ABCB3, has been mainly found in lysosomes [82]. Although ABCB9 has been proposed to be involved in TAP-dependent processes, its exact function is currently unknown. The remaining four ABCB proteins (ABCB6, ABCB7, ABCB8, and ABCB10) are all targeted to the inner mitochondrial membrane and play a role in cellular iron homeostasis by
9
10 Human ABC Transporters: Function, Expression, and Regulation
Fig. 2 Proposed role of TAP proteins (ABCB2, ABCB3) in antigen presentation. En-dogenous proteins are degraded in the ubiquitin-proteasome pathway. The peptides are transported into the ER lumen by a fullsize complex composed of TAP1 and TAP2 [76]. The correct folding, assembly and load-
ing of MHC I molecules is mediated by numerous accessory proteins including calnexin, calreticulin, ERp57, tapasin, and TAP [77]. Stable MHC I-peptide complexes leave the ER through the Golgi compartment to the cell surface for recognition by T-cell receptors.
transporting iron-sulfur (Fe/S) cluster precursor proteins [82–85]. In this respect, a mutation in ABCB7, which is located on the X-chromosome, has been linked to X-linked sideroblastic anemia and ataxia (XLSA/A) (see Tab. 2) [86]. 3.3 The ABCC (CFTR/MRP) Subfamily
The ABCC subfamily comprises 12 full-size ABC proteins which perform such diverse functions as drug resistance, ion transport, nucleoside transport, and ion channel regulation (Tab. 2). A special subgroup within the ABCC family can be distinguished by the presence of a TMD0-(TMD-ABC)2 domain arrangement (Fig. 1A). Seven members display this special membrane topology (ABCC1, ABCC2, ABCC3, ABCC6, ABCC8, ABCC9, and ABCC10), whereas the other proteins in this subfamily exhibit the (TMD-ABC)2 structure. Although the TMD0 part is not required for transport activity, a linker region designated L0 is essential for proper
3 Overview of Human ABC Gene Subfamilies Table 2 Human ABC transporter genes and corresponding dieseases or phenotypes
Gene
Alternative name
Disorder or phenotype
Reference
ABCA1 ABCA4
ABC1 ABCR
ABCB1
PGY1, MDR1
6–8 9 9 9 9 9 9 59, 70
ABCB2 ABCB3 ABCB4
TAP1 TAP2 MDR3
ABCB7
ABC7
ABCB11
SPGP, BSEP
Familial HDL deficiency, Tangier disease Stargadt macular dystrophy (STGD) Fundus flavimaculatus (FFM) Retinitis pigmentosa 19 (RP) Cone-rod dystrophy (CRD) Cone dystrophy (CD) Age-related macular degeneration (AMD) Multidrug resistance, inflammatory bowel disease (ulcerative colitis) Immune deficiency Immune deficiency Progressive familial intrahepatic cholestasis type 3 (PFIC-3) Intrahepatic cholestasis of pregnancy (ICP) X-linked sideroblastosis and anemia (XLSA/A) Progressive familial intrahepatic cholestasis type 2 (PFIC-2)
ABCC1 ABCC2 ABCC6 ABCC7 ABCC8
MRP1 MRP2 MRP6 CFTR SUR1
Multidrug resistance Dubin-Johnson Syndrome (DJS) Pseudoxanthoma elasticum (PXE) Cystic fibrosis (CF) Persistent hyperinsulinemic hypoglycemia of infancy (PHHI)
162 13 14 163 184
ABCD1 ABCD3
ALD PXMP1, PMP70
Adrenoleukodystrophy (ALD) Zellweger syndrome 2 (ZWS2)
15 104, 105
ABCG2 ABCG5 ABCG8
ABCP, MXR, BCRP White3 White 4
Multidrug resistance β-Sitosterolemia β-Sitosterolemia
117 16 16
76-78 76-78 10–12 10–12 83 11
ABCC1function [87]. Among the (TMD-ABC)2 molecules, ABCC7 (CFTR) is characterized by an extraordinary domain structure: it contains a regulatory domain, which is controlled by cAMP/PKA-dependent phosphorylation and thereby enables ATP binding and hydrolysis at the nucleotide binding cassettes, which in turn control opening and closing of the chloride channel [88]. Mutations in ABCC7 (CFTR) cause cystic fibrosis by affecting numerous secretion processes. ABCC1, ABCC2, and ABCC3 are all able to transport anticancer drugs, whereby ABCC1 (MRP1) mainly transports glutathione-conjugated (GSH) molecules and therefore has been termed GS-X pump [89]. In addition to cancer drug resistance, the physiologic function of GS-X pumps is closely related with cellular detoxification, oxidative stress, and inflammation [90].
11
12 Human ABC Transporters: Function, Expression, and Regulation
ABCC2 (MRP2), which is located in the apical membrane of polarized epithelial cells and particularly to the canalicular membrane of hepatocytes, appears to participate in the hepatobiliary secretion of organic anions and has therefore originally called canalicular multispecific organic anion transporter (cMOAT) [91, 92]. ABCC3 (MRP3) is also an organic ion transporter but prefers glucuronate conjugates over GSH conjugates [93]. ABCC4, ABCC5, ABCC11, and ABCC12 are MRP-like proteins which lack the additional N-terminal domain, and ABCC4 and ABCC5 have been shown to function as cellular efflux pumps for nucleosides, including antihuman immunodeficiency virus drugs such as PMEA [94] and nucleotide analogs (e.g. 6-mercaptopurine and thioguanine) [95]. Although the physiological role as well as the potential participation in drug efflux of ABCC6 (MRP6) is still unclear [96], mutations in the gene have been detected in the connective tissue disorder pseudoxanthoma elasticum (PXE, Tab. 2) [97]. Since ABCC6 is highly expressed in liver and kidney cells, sites where PXE is not very pronounced, one hypothesis suggests that ABCC6 may transport or remove toxic metabolites which destroy connective tissue cells [98]. ABCC11 and ABCC12 are most closely related to the ABCC5 gene and are found tandemly duplicated on chromosome 16q12 (Tab. 1) [99]. Since ABCC11 and ABCC12 were mapped to a region harboring gene(s) for paroxysmal kinesi-genic choreoathetosis, a disease characterized by recurrent, brief attacks of involuntary movements induced by sudden movements or startling, ABCC11 and ABCC12 represent positional candidates for this disorder [100, 101]. Interestingly, several paroxysmal neurological manifestations and idiopathic age-dependent seizures are known to be caused by ion channel-related genes [102]. The two remaining members of the ABCC subfamily ABCC8 (SUR1) and ABCC9 (SUR2) bind sulfonylurea with high affinity and interact with potassium inward rectifiers KIR6.1 and KIR6.2, to form a large octameric channel with the stoichiometry (SUR/KIR6.x)4 [103]. These heteromeric channels regulate insulin release in response to glucose metabolism and sulfonylureas are widely used to stimulate insulin secretion in type 2 diabetic patients because they close these ATP-sensitive potassium (KATP ) channels in the pancreatic beta-cell membrane (see Fig. 5) [34].
3.4 The ABCD (ALD) Subfamily
This subfamily is composed of four peroxisomal half-size ABC transporters with a TMD-ABC domain structure. They are involved in very long fatty acid (VFLA) transport. A variable pattern of homo- and heterodimerization for all ABCD members has been proposed [104–106] and mutations in ABCD1 and ABCD3 are associated with adrenoleukodystrophy (ALD) and Zellweger syndrome 2 (ZWS2), respectively (Tab. 2) [107, 108]. An interesting finding is the transcripitonal regulation of ABCD genes by lipids (Tab. 1). In this respect, recent reports have provided evidence that nuclear hormone receptor ligands, especially RXR ligands and PPAR ligands, induce the ABCD2 promoter [109, 110].
3 Overview of Human ABC Gene Subfamilies
3.5 The ABCE (OABP) and ABCF (GCN20) Subfamilies
This subfamily contains four half-size ABC transporters, which are ubiquitously expressed in human tissues and do not possess transmembrane domains. The ABCE1 gene encodes an oligoadenylate binding protein (OABP), which is only found in multicellular eukaryotes and seems to participate in innate immune defense [30]. Oligoadenylates, which are produced from virus-infected cells are activators of RNaseL, which in turn degrades cellular RNAs and thereby blocks protein synthesis in infected cells. ABCE1 binds these oligonucletides and thus inhibits RNAseL, which implies that ABCE1 is involved in the negative control of immune reactions. ABCF1, the human homolog of the yeast GCN20 gene, shares some interesting features with ABCE1. Thus, ABCF1 is involved in the control of protein synthesis and also in the control of the immune system. ABCF1 binds to the translation elongation initiation factor 2 (eIF2) and seems to modulate its phosphorylation state [111]. In addition, ABCF1 has been co-purifed with ribosomal components confirming its role in protein translation [31]. In another interesting study, Richard and colleagues identified ABCF1 as a TNFinduced transcript in synoviocytes. They suggest that this ABC protein could be part of inflammatory processes related to rheumatoid arthritis. Since functionally related genes tend to be clustered on chromosomes and because ABCF1 is located on chromosome 6p21.33 (Tab. 1) in close proximity to class I MHC, the proposition that ABCF1 mediates inflammatory processes is very likely. 3.6 The ABCG (White) Subfamily
The human white or ABCG subfamily consists of five fully cloned genes (ABCG1, ABCG2, ABCG4, ABCG5, and ABCG8) and one gene so far only found in rodents (ABCG3) [22]. The ABCGs are thought to dimerize to form active membrane transporters. Among the half-size molecules ABCG proteins have a peculiar domain organization characterized by a nucleotide binding domain (ATP binding cassette) at the N-terminus followed by six transmembrane-spanning domains (Tab. 1 and Fig. 1). The founding member of this group, ABCG1, was independently described by Chen et al. and Croop et al. as the human homolog of the Dro-sophila white gene [112, 113] and its genomic organization, including the promoter region, has been described recently [114, 115]. Earlier indications linked ABCG1 with the congenital recessive deafness (DFNB10) syndrome, based on its chromosomal localization on chromosome 21q22.3 [116]. However, a recent report [117] has excluded ABCG1 along with five other known genes as candidates for DFNB10. Also, conflicting data exist whether the G2457A polymorphism in the 3UTR of the ABCG1 mRNA is associated with mood and panic disorders and related to suicidal behavior [19, 118]. The most interesting report dealing with ABCG1function came from a study by Klucken et al., which identified ABCG1 as a sterol-induced gene that participates in cholesterol and phospholipid efflux, especially in macrophages and foam cells
13
14 Human ABC Transporters: Function, Expression, and Regulation
[41]. The second well-known member of the ABCG subfamily ABCG2 has been identified by different approaches and is known under the names ABCP [119], BCRP [120], and MXR [121]. The protein has been shown to be amplified and overexpressed in human cancer cells and is capable of mediating drug resistance even in the absence of the classical MDR proteins ABCB1 (MDR1) and ABCC1 (MRP1) [121–123]. In contrast to most other half-size ABC transporters, the bulk of the ABCG2 protein has been localized to the plasma membrane, with a minor fractions found within intracellular membranes [29]. It was only a short time ago when two other ABCG transporters ABCG5 and ABCG8 had been identified and linked to the human disease β-sitosterolemia by two independent approaches [124– 126]. The latest paper on an ABCG member has reported the cloning of the complete cDNA of ABCG4 and identifed this transporter as a sterol-sensitive gene [127].
4 Diseases and Phenotypes Caused by ABCM Transporters
Eighteen out of 48 currently known human ABC proteins have been linked to human monogenetic disorders or cause special disease phenotypes (Tab. 2) [98]. Since ABC transporters represent a combination of enzymes and structural proteins, homozygous mutations cause severe human diseases, which are inherited in a recessive manner. As described below, these genetic diseases are found in five of the seven ABC subfamilies (Tab. 2). In addition, heterozygous mutations in ABC genes have been connected with susceptibility to complex, multigenic disorders. It is also worth mentioning that due to the pleiotropic functions of ABC transporters, the disease states affected by mutations in ABC transporter genes are just as complex and diverse as the cellular functions of these proteins. 4.1 Familial HDL-deficiency and ABCA1
The major clue that ABCA1 is involved in cellular cholesterol removal and lipid efflux was the identification of mutations in the human gene as the defect in familial HDL-deficiency syndromes such as classical Tangier disease (Tab. 2) [6–8]. The most striking feature of these patients is the almost complete absence of plasma HDL, low serum cholesterol levels, and a markedly reduced efflux of both cholesterol and phospholipids from cells, strongly supporting the idea that both lipids are co-transported [128, 129]. The lack of ABCA1 function in these patients has a major impact on plasma HDL levels and composition. Thus plasma HDL from TD patients is composed of small pre-β 1 -migrating HDL particles containing solely apoAI and phospholipids but lacking free cholesterol and apoAII [130, 131]. The low HDL levels seen in Tangier disease (TD) are mainly due to an enhanced catabolism of these HDL precursors [131–134]. In addition, the size of the HDL particle strongly correlates with the amount of cholesterol efflux and plasma HDL concentrations [135, 136]. In TD patients, neither cholesterol absorption nor metabolism is
4 Diseases and Phenotypes Caused by ABCM Transporters
significantly affected, however, the concentration of LDL-cholesterol is only 40% of healthy controls and the particles are often enriched in triglycerides. The reduction in LDL levels is mainly caused by disturbance of the cholesterol ester transfer pathway resulting in changes of LDL composition and size [137]. Interestingly, obligate heterozygotes for TD mutations have approximately 50% of plasma HDL, but normal LDL levels [138]. Studying 13 different mutations in 77 heterozygous individuals, Clee et al. described a more than 3-fold risk of developing coronary artery disease in affected family members and earlier onset compared with unaffected members [139, 140]. However, these results seem to be biased towards the atherosclerotic phenotype, since the prevalence of splenomegaly is much higher in the European group of ABCA1 deficiency patients [46]. These authors also reported an age-dependent modification of the ABCA1 heterozygous phenotype [140]. In addition to the absence of plasma HDL, patients with genetic HDL-deficiency syndromes display accumulation of cholesteryl esters either in the cells of the reticulo-endothelial system (RES), leading to splenomegaly and enlargement of tonsils or lymph nodes, or in the vascular wall, leading to premature atherosclerosis [46]. This indicates differences in macrophage trafficking into tissues in the absence of ABCA1 which may be a reflection of the specific localization of mutations within the ABCA1 gene. In this context, it is of note that the pool size of CD14dim CD16+ monocytes is inversely correlated with plasma HDL-cholesterol levels [141] and the expression of ABCA1 is high in phagocytes [40] but low in antigen-presenting dendritic precursor cells (unpublished observation). These observations may provide clues for a potential interlink between ABCA1function and the control of monocyte differentiation and phagocyte/dendritic cell lineage commitment. Accordingly, we have previously hypothesized that ABCA1 function regulates the differentiation, lineage commitment (phagocytic versus dendritic cells), and targeting of monocytes into the vascular wall of the RES [142]. This concept has been substantiated by recent work from our laboratory demonstrating accumulation of macrophages in liver and spleen in LDL receptor-deficient mouse chimeras that selectively lack ABCA1 in their blood cells [143]. The fact that the absence of ABCA1from leukocytes is sufficient to induce aberrant monocyte recruitment into specific tissues identifies ABCA1 as a critical leukocyte factor in the control of monocyte targeting. In addition to phagocytes, dendritic cells have been shown to be increased in atherosclerotic lesions and have been implicated in T cell activation in atherogenesis [144]. Expression of ABCA1 appears to inhibit monocyte differentiation into macrophages and may thus shift the balance between phagocytic differentiation and dendritic cell differentiation towards the latter [145]. Taking into account that dendritic cells are capable of inducing primary immune responses, ABCA1 may function, through this mechanism, as a modulator of innate immunity in atherogenesis. An interesting clue as to how ABCA1 may be implicated in the control of monocyte/macrophage trafficking at the cellular level comes from the observation that apoAI-mediated lipid efflux in ABCA1-deficient cells is paralleled by the downregulation of the protein Cdc42 and filopodia formation [146]. Cdc42, like
15
16 Human ABC Transporters: Function, Expression, and Regulation
rho and rac, is a member of the family of small GTP binding proteins which are sequentially activated by extracellular stimuli in mammalian cells [147]. Cdc42 controls a wide range of cellular functions including cytoskeletal modulation, formation of filopodia and vesicular processing. Rho proteins are known to induce the formation of stress fibers and focal adhesions; rac proteins regulate formation of lamellipodia and membrane ruffles. It is thus tempting to speculate that ABCA1 modulates cellular mobility of monocytes/macrophages through this mechanism and thus may affect recruitment of monocytes into the vessel wall. This regulator function for filopodia formation and cytoskeletal reorganization may even extend to platelet aggregation, vascular smooth muscle cell migration, and endothelial cell integrity, since these cells have been shown to express ABCA1 [148]. 4.2 Retinal Degeneration and ABCA4 (ABCR)
In addition to ABCA1, the ABCA4 (ABCR) gene located on chromosome 1p21 (Tabs 1 and 2) is another example how several mutations in one ABC transporter gene can cause pleiotropic effects. Thus, many different clinical phenotypes, associated with various forms of eye degeneration, and the age of onset as well as disease severity are associated with distinct mutations in ABCA4 [9]. As summarized in Tab. 2, ABCA4 has been found to be a causal gene for a series of retinal diseases. As an effort of several laboratories in 1997 [149–151], mutations in ABCA4 have been identified in Stargadt disease (STGD), a juvenile-onset macular dystrophy characterized by rapid central visual impairment and progressive bilateral atrophy of the retinal pigment epithelium, as well as in the late-onset form termed fundus fla-vimaculatus. Although only 60% of the mutations in the ABCA4 gene of STGD have been determined, all segregated chromosomal regions in these patients have been mapped to a locus between chromosomes 1p13 and 1p22. In addition to the monogenic STGD, ABCA4 mutations have been described in the autosomal recessive diseases conerod dystrophy (CRD) [152, 153] and retinitis pigmentosa (RP) [152, 154–156], which are both genetically and clinically heterogeneous disorders. Conerod dystrophy mainly displays cone degeneration, whereas retinitis pigmentosa affects predominantly rod photoreceptors. Age-related macular degeneration (AMD), the leading cause of severe central visual impairment among the elderly, is the fourth disease state associated with ABCA4 dysfunction. The disease is also characterized by progressive accumulation of large quantities of lipofuscin with retinal pigment epithelial cells and delayed dark adaptation [157]. Athough AMD is strongly influenced by environmental factors such as smoking, heterozygous mutations in ABCA4 have been proposed to increase the susceptibility to develop AMD. Thus, the two most frequent AMD-associated ABCA4 variants D2177N and G1961E, increase the risk of developing AMD by approximately 3-fold and 5-fold, respectively [158, 159]. In addition to the above described results based on the phenotypical analysis of ABCA4 mutations, data from in vitro studies and ABCA4 knockout mice have shed
4 Diseases and Phenotypes Caused by ABCM Transporters
Fig. 3 Model for the role of ABCA4 (ABCR) in rod outer segments. Left panel: schematic drawing of a rod photoreceptor. Right panel: magnification of rod disc membranes. Rho-dopsin is manufactured from opsin and 11-cis retinal in the Golgi of the rod inner segment and transported to rod outer segment discs. Upon light absorption the 11-cis form of retinal is converted to an all-trans form, which reacts with phosphatidylethanolamine (PE) to form the Schiff-base product N-retinylidene-PE (N-
RPE). ABCA4 is thought to flip N-RPE to the outer leaflet of the disc membrane. There, all-trans retinal is generated by hydrolsysis of N-RPE and subsequently reduced to alltrans retinol by retinol dehydrogenase prior to its delivery to the retinal pigment epithelial cells and re-esterification [62]. Under the effect of short-wave light or in ABCA4 deficiency, all-trans retinal accumulates, causing photooxidative damage and generation of toxic A2E (N-retinyl-N-retinylidene ethanolamine) [63].
light on the transport function of ABCA4 in photoreceptor cells. Recombinant liposome-reconstituted ABCA4 displays all-trans-retinal-stimulated ATPase activity [47, 48, 160] and ABCA4 knockout mice exhibit an acute light-dependent accumulation of all-trans retinal within rod outer segments and a progressive light-dependent culmination of lipofuscin-derived A2E (N-retinyl-N-retinylidene ethanolamine) [161, 162]. Based on these data, a model for the function of ABCA4 in rod disc membranes has been proposed [162]. As summarized in Fig. 3, all-trans retinoids, which are released from rhodopsin by photobleaching, react with the primary amine of PE to form the condensation product N-retinylidene-PE (N-RPE). ABCA4 is thought to flip N-RPE to the outer leaflet of the disc membrane, where all-trans retinal is generated by hydrolysis of N-RPE and subsequently reduced to
17
18 Human ABC Transporters: Function, Expression, and Regulation
all-trans retinol by retinol dehydrogenase prior to its delivery to the retinal pigment epithelial cells and re-esterification [163]. 4.3 Cystic Fibrosis (ABCC7/CFTR)
Cystic fibrosis, caused by mutations in ABCC7 (CFTR) (Tab. 2) is one of the most frequent inherited diseases in Caucasian populations with a prevalence of 1:900 to 1:2500, whereas African and Asian individuals are affected to a much lesser extent. Interestingly, a three base pair deletion (AF508) accounts for 70–80% of the mutated alleles in northern European populations. The total number now comprises more than 1000 CFTR mutations (http://www.genet.sickkids.on.ca/cftr/). The spectrum of the disease severity is dependent on the residual function of ABCC7 [164, 165]. Patients with two affected alleles develop a severe disease with a disturbed exocrine function of the pancreas leading to nutritional deficiencies, bacterial lung infections, and a blockade of the vas deferens causing male infertility. In contrast, patients with one partially functional allele retain residuary pancreatic function and have a milder disease phenotype [166]. Before ABCC7/CFTR was identified, it was known in the 1980s that the apical membrane of different epithelia displays a Cl− conductance, which could be activated by cAMP and which was defective in cystic fibrosis. With the identification of the ABCC7/CFTR gene in 1989 [167] and an impressing multitude of publications in the following years, it became more and more evident that ABC transporters are not exclusively ATP-driven pumps, but moreover can exhibit a regulatory and/or channel function. As depicted in Fig. 4, ABCC7/CFTR can act as a cAMP-regulated chloride channel as well as a regulator of outwardly rectifying chloride channels (ORCC) [88, 168]. In both cases targeting to the apical membrane of epithelial cells and activation of the regulatory (R) domain by PKA are a prerequisite for proper ABCC7function. It is now widely recognized that ABCC7 interacts with various proteins in the plasma membrane [169]. Thus, the N-terminus of CFTR binds the coiled-coil protein syntaxin 1a and the C-terminal region of CFTR binds to PDZ domain proteins, a family of proteins containing a 80–90 amino acid motif that binds the C-terminus of a variety of ion channels and receptors [170]. At least three PDZ domain-containing proteins, NHE-RF or EBP50, CAP70, and CAL, bind to the CFTR C-terminus (Fig. 4) [170–174]. Because EBP50 associates with Ezrin, which itself binds the regulatory subunit of PKA and has a binding site for F-actin [175], it is likely that EBP50 anchors CFTR to the actin cytoskeleton at a site where it can be targeted by PKA. CAP70, a subapical protein, is able to bind two ABCC7 molecules simultaneously via PDZ3 and PDZ4 [173] and thus can mediate cross-linking of CFTR dimers and thereby enhance either direct or indirect (ORCC) chloride channel activity. Taken together, the N-terminus of ABCC7 is required for binding of syntaxin 1a and other components of the SNARE-dependent vesicular trafficking machinery, whereas the C-terminus of CFTR is necessary for cytoskeletal fixation via EBP50/Ezrin proteins.
4 Diseases and Phenotypes Caused by ABCM Transporters
Fig. 4 Schematic diagram summarizing ABCC7 (CFTR) interactions in the plasma membrane. CFTR interacts with various proteins in the plasma membrane including syn-taxin 1a, CAL (CFTR-associated ligand), and EBP50/NHERF [170–175]. Syntaxin 1a contains coiled-coil protein motifs which bind the N-terminal part of CFTR. Syntaxin 1a controls the vesicular trafficking of CFTR through the Golgi to the plasma membrane and thereby inhibits its channel function. CFTR also contains a PSD95/Disc-large/ZO-1 (PDZ)-interacting motif at its C-terminus, which binds the PDZproteins CAL and EBP50/NHERF (ezrin, radixin, and moesin) binding phosphoprotein of 50 kDa/Na+ /H+ exchanger reg-
ulatory factor). CAL has one PDZ domain and two coiled-coil motifs, which organize CFTR into a cluster in the apical membrane. EBP50/NHERF binds to CFTR through its PDZ1 domain and thereby links the protein to the cytoskeleton via ezrin. Ezrin serves as a PKA-anchoring protein and facilitates cAMP-dependent phosphorylation of the CFTR regulatory domain and channel activity. In addition to these direct proteinprotein interactions, CFTR indirectly regulates several ion channels such as ROMK2, ENaC, CaCC, and ORCC. CaCC and ORCC are activated by Ca2+ -dependent purinergic receptors (P2Y2), which are in turn modulated by CFTR-dependent ATP release.
In addition, multimerization of CFTR molecules could be potentially achieved by CAP70-dependent linkage. 4.4 Multidrug Resistance (ABCB1/MDR1, ABCC1/MRP1, ABCG2)
When cells are exposed to toxic compounds, as is the case for tumor cells treated with chemotherapy, resistance against these drugs can occur by a variety of mechanisms. Among these are decreased cellular uptake, increased intracellular detoxification, modification of target proteins, and enhanced extrusion from cells. Although in most cases one compound initially causes these events, cells can become resistant to a variety of drugs with structural similarities to the initial compound. This multidrug resistance (MDR) is mainly caused by three drug efflux pumps, ABCB1 (MDR1), ABCC1 (MRP1), and ABCG2 (MXR, ABCP, BCRP) (Tabs 1 and 2).
19
20 Human ABC Transporters: Function, Expression, and Regulation
ABCB1, which was described in the mid 1970s by its ability to confer a multidrug resistant phenotype to cancer cells upon chemotherapy [176], is a highly promiscous transporter of hydrophobic drugs, e.g. vinblastine, colchicine, VP16, adriamycin, and of such diverse substrates as lipids, steroids, xenobiotics, and peptides [177]. In small-cell lung carcinoma cells, which did not display high ABCB1 activity, overexpression of ABCC1 (MRP1) was identified [178]. Thereafter a similar substrate pattern for ABCC1 compared to ABCB1 was reported, including doxorubicin, daunorubicin, vincristine, and colchicine. However, in contrast to ABCB1, which has mainly organic cations as substrates, ABCC1 is able to transport organic and anionic compounds, the latter mainly in conjugated forms [20, 95]. Another ABC protein amplified and involved in MDR is ABCG2, which confers resistance to anthracyclin chemotherapeutic drugs such as mitoxantrone. These findings have been supported by in vitro studies: ABCG2-transfected drug-sensitive breast cancer cells are resistant to mitoxantrone, daunorubicin, and doxorubicin, and also display an enhanced rhodamine-123 efflux [120]. Interestingly, an R482G mutation in ABCG2 can significantly alter its substrate specificity and concomitantly change the drug-resistance phenotype [179]. 4.5 Adrenoleukodystrophy (ABCD1/ALD)
The X-linked adrenoleukodystrophy (ALD) is an inherited peroxisomal disorder caused by mutations in ABCD1 (Tab. 2), resulting in progressive neurological dysfunction, occasionally associated with adrenal insufficiency [180]. ALD is characterized by the accumulation of unbranched saturated fatty acids with a chain length of 24–30 carbons, particularly hexacosanoate (C26), in the cholesterol esters of brain white matter and in adrenal cortex and in certain sphingolipids of brain [15, 107]. It took a long period of research for the causative defect to be identified and therapies for ALD to be developed; these include the famous approach developed by the Odone family of dietary treatment with oleic and erucic acids (glyceryl trierucate and trioleate oil), known as Lorenzo’s oil [181, 182]. Adrenoleukodystrophy belongs to a group of defects in peroxisomal β-oxidation [183]. The first step in the oxidation of very-long-chain fatty acids (VLCFA) involves their activation by conversion into CoA esters and the transport into peroxisomes [184]. The ABCD1 protein is thought to mediate this transport process [185] and evidence for this function comes from experiments using overexpression of human cDNAs encoding the ABCD1 protein and its closest relative, the ABCD2 (ALDR) protein. With this approach, Netik et al. could restore the impaired peroxisomal β-oxidation in fibroblasts of ALD patients [186]. The accumulation of very-longchain fatty acids could also be prevented by overexpression of the ABCD2 protein in immortalized ALD cells. Moreover, the peroxisomal β-oxidation defect in the liver of ABCD1-deficient mice could be restored by stimulation of ABCD2 and ABCD4 gene expression through dietary treatment with the PPAR agonist fenofibrate [186]. These results implicate that a therapy of adrenoleukodystrophy might be possible by drug-induced overexpression or ectopic expression of ABCD genes.
4 Diseases and Phenotypes Caused by ABCM Transporters
4.6 Sulfonylurea Receptor (ABCC8/SUR)
Familial persistent hyperinsulinemic hypoglycemia of infancy (PHHI) is characterized by unregulated insulin secretion from pancreatic beta cells. The defect has been localized to chromosome 11p15.1–p14, a chromosomal region containing the ABCC8 (SUR1) gene and the KCNJ11 (Kir6.2) gene [187]. Subsequently, causal mutations in the ABCC8 gene have been described in PHHI families [188] (Tab. 2); however, no defects in the KCNJ11 gene have found so far. As described earlier in this article (see Section 3.3), ABCC8 and KCNJ11 form together an inwardly rectifying potassium channel (Fig. 5). Under hyperglycemic conditions the
Fig. 5 Schematic model for KATP channelcontrolled insulin secretion from pancreatic $-cells. Entry and metabolism of glucose into pancreatic $-cells leads to increased levels of intracellular ATP and concomitantly decreases ADP levels. The increase in the ATP/ADP ratio causes binding of ATP to the nucleotide binding domains of ABCC8 (SUR1) and to KIR6.2 [34]. Thereby, the KATP channel closes and the plasma membrane is depolarized. The opening of voltage-gated Ca + channels and voltagedependent Na+ channels raises the intracellular Ca2+ concentration by Ca2+ influx and mobilization of intracellular Ca2+ stores, respectively. The increased level of intracellular Ca + stimulates the dephosphorylation
of $2 -syntrophin and the dissociation of 2syntrophin-utrophin-actin complexes from ICA 512 and secretory granules. Following dissociation of $2 -syntrophin, ICA 512 is cleaved by Ca2+ /calmodulin (CaM)-activated calpain, resulting in the mobilization of secretory granules from the cytomatrix and exocytosis of insulin. The pancreatic KATP channels are also regulated by important therapeutic pharmacological agents, such as sulfonylureas and K+ channel openers. Sulfonylureas, widely used in the treatment of NIDDM, stimulate insulin secretion by closing the KATP channels, while K+ channel openers inhibit insulin secretion by opening the KATP channels [103].
21
22 Human ABC Transporters: Function, Expression, and Regulation
intracellular ATP/ADP ratio increases and thereby causes the ATP-sensitive octameric K+ -channel complex to close, which in turn depolarizes the beta-cell membrane. The subsequent opening of voltage-dependent calcium channels allows calcium influx and initiates insulin release via several successive steps including dephosphorylation of β 2 -syntrophin, dissociation of β 2 -syntrophin-utrophin-actin complexes from the islet cell autoantigen (ICA) 512, ICA 512 cleavage by µ-calpain, and exocytosis of secretory granules. Interestingly, polymorphisms in calpain-10, a related protease also proposed to be involved in the processing of ICA512, affect insulin secretion and have been linked to type 2 diabetes [189, 190]. Hypoglycemia decreases the intracellular ATP/ADP ratio and causes opening of the ABCC8/KCNJ11 complex, which hyperpolarizes the plasma membrane and inhibits Ca + influx and thereby stops insulin secretion [191]. Mutations in ABCC8 lead to defective ATP-sensing of KCNJ11 channels and thereby to abnormal behavior in response to hypoglycemia, resulting in persistent insulin secretion despite low glucose levels. The knowledge of the function of ABCC8 has become very important in clinical practice, since sulfonylureas, drugs widely used as oral hypoglycemics to promote insulin secretion in the treatment of non-insulin-dependent diabetes mellitus (NIDDM), are high-affinity inhibitors of ABCC8 (SUR1) and ABCC9 (SUR2). In addition to the causal mutations in PHHI, polymorphisms in the ABCC8 gene have been associated with hyperinsulinemia and type 2 diabetes in Mexican Americans [192] and French Caucasians [193], respectively.
5 Function and Regulation of ABC Transporters in Lipid Transport
Due to the strong interest in the primary drug-transporting ABC proteins, other aspects of cellular functions, such as lipid homeostasis, of this large transporter superfamily has been for long time remained unknown and unappreciated. The first implication that ABC proteins could participate in lipid binding and/or transport came from mdr2 knockout mice, which displayed a complete absence of phospholipids from bile and as a consequence developed liver disease [194]. Subsequently, the translocation of phospholipids by the human homolog MDR3 (ABCB4) was demonstrated [61, 62, 195, 196]. Only a few years later, the identification of the sterol-responsiveness of ABCA1 [40] and of other ABC family members [41] had paved the way for the identification of the gene defect in HDL deficiency [6–8], which was a major clue in proving the importance of ABC transporters in macrophage cholesterol efflux. In a similar manner, the discovery of the genetic defect in βsitosterolemia has identified ABCG5 and ABCG8 as proteins which extrude dietary sterols from intestinal epithelial cells and from the liver to the bile duct [124, 126]. As is clear from Tab. 1, a significant number of ABC transporters feature a lipidsensitive regulation, which implies that in addition to the currently established ABC proteins, further members of this superfamily could have similar functional
5 Function and Regulation of ABC Transporters in Lipid Transport
properties. The following section will summarize the current knowledge of ABC transporters in lipid transport with a special emphasis on transport processes in macrophages, liver, and intestine, which reflect the major organ systems in sterol metabolism. 5.1 ABCA1 in Macrophage Lipid Transport
Several factors control the expression and activity of ABCA1. Induced cholesterol influx into macrophage cells has been shown to be a potent inducer of ABCA1 expression [40]. Since the cloning of the complete human and mouse ABCA1 genes, a number of transcriptional control elements acting via alternative promoters have been characterized [197–199] (Fig. 6). The ABCA1 upstream region contains a macrophage-specific promoter preceding exon 1. This sequence binds the repressors ZNF202 and USF1/2, as well as the activating factors Sp1/Sp3 and
Fig. 6 Diagram representing the human ABCA1 and ABCG1 gene promoters Upper panel: The ABCA1 upstream region contains two alternative promoters. Promoter 1 mainly directs macrophage-specific ABCA1 expression and contains binding sites for the transcription factors ZNF202, Sp1, Sp3, E-box binding factors, LXR/RXR, and TATA binding proteins. Promoter 2 is active in liver and steroidogenic tissues and contains putative binding motifs for HNF3, SREBP, LRH/SF1, LXR/RXR, C/EBPs, and a TATA box [198–202]. Lower panel: The ABCG1
gene contains at least three alternative promoters. Binding of ZNF202 to promoter 2 and binding of LXR/RXR to promoter 3 has been determined experimentally. The functionality of promoter 1 has not been demonstrated so far. ZNF202, zinc finger transcription factor 202; Sp1, specificity protein 1; Sp3, specificity protein 3; LXR, liver X receptor; HNF3, hepatic nuclear factor 3; SREBP, sterol regulatory element binding protein; LRH, liver x receptor homolog; SF1, steroidogenic factor 1; N/cFB, nuclear factor kappa B [203, 215].
23
24 Human ABC Transporters: Function, Expression, and Regulation
the oxysterol-induced RXR/LXR heterodimer [200, 201]. A second promoter located downstream of exon 1 has been recently implicated in the liver/steroidogenic expression of ABCA1 [198] (Fig. 6). The The LXR/RXR-responsive elements in promoter 1 triggers retinoic acid and oxysterol-dependent activation of the ABCA1 promoter and thereby confer the observed induction of ABCA1 during lipid loading of macrophages. The most likely endogenous ligand for LXRa and LXRβ is 27-hydroxycholesterol, since CYP27-deficient cells are not able to upregulate ABCA1 in reponse to sterols and since overexpression of CYP27 activates LXR/RXR [202]. The earlier described LXR ligands 20(S)-hydroxycholesterol, 22(R)-hydroxycholesterol, and 24(S),25-epoxycholesterol are not present in cholesterol-loaded macrophages, rendering them unlikely to be natural ligands of LXR [202]. In contrast to LXR/RXR, the zinc finger transcription factor ZNF202 is a transcriptional repressor of ABCA1 gene expression, which also prevents the induction of the gene by oxysterols by recruiting the universal co-repressor KAP1 (KRAB domain-associated protein 1) [203]. Due to the strong upregulation of ABCA1 expression in response to oxysterols, LXR agonists have been proposed to be promising candidates for therapeutic activation of ABCA1 [199, 204–206] (Fig. 7). It stands to reason that especially under disease conditions such as NIDDM, where the cells have low glucose levels, low ATP levels, and associated low HDL-cholesterol levels, excessive mitochondrial energy production could induce mitochondrial exhaustion. This may ultimately result in cellular ATP shortage, a process that likely enhances the programmed cell death of lesion macrophages. Mitochondrial exhaustion may also inhibit mitochondrial 27-OH sterol synthesis and its export from the mitochondrion, a critical pathway for LXR activation in response to cellular cholesterol stress (Fig. 7) [202]. Since 27-OH sterol is the predominant oxysterol in macro-phage-derived foam cells and atherosclerostic lesions [207], this mechanism may indeed be of pathophysiological significance in atherogenesis. In light of these complexities, treatment with LXR agonists bears the potential risk of inducing mitochondrial failure and pro-apoptotic effects and may thus negatively affect lesion formation. ABCA1 appears to be localized on the plasma membrane and surface expression of ABCA1 is upregulated in macrophages by cholesterol loading [208]. Recent evidence indicates that ABCA1 and Cdc42 are associated with a Lubrol detergentresistant raft subfraction, whereas ABCA1 is not detectable in Triton-resistant rafts [209, 210]. In addition, the fact that ABCA1 is detectable in the cytosol and Golgi compartment of unstimulated fibroblasts also raises the intriguing possibility that it is a mobile molecule that may shuttle between the plasma membrane and the Golgi compartment. Thus, ABCA1 could be a constituent of a vesicular transport route for lipids involving the Cdc42/N-WASP/Arp pathway (Fig. 7). Initial studies on the biologic role of ABCA1 supported the view that this transporter, like MDR1 and MDR3 [62], functions as a translocator of lipids between the inner and outer plasma membrane [211]. This was based on experiments showing an increase in cholesterol and phospholipid export under conditions of forced expression of ABCA1 and ABCA1-null mutant cells from TD individuals that characteristically display the reverse scenario [128, 208]. However, recent work from our laboratories indicated that the ATP turnover of ABCA1 occurs
5 Function and Regulation of ABC Transporters in Lipid Transport
Fig. 7 Synopsis of ABC lipid transporters, cellular lipid trafficking pathways, and energy-dependent activation of ABCA1. The model view presented highlights the interdependence of ABCA1function and the availability of ATP, thus emphasizing the requirement of mitochondrial integrity for the proper function of ABC transporters. The transcriptional activation of ABCA1 induced by oxidized sterols such as 27-OH cholesterol is shown. ACAT, acyl-CoA:cholesterol acyltransferase; ANT, ade-nine nucleotide translocator; Apaf-1, apoptotic proteaseactivating factor 1; PDH, pyruvate dehydrogenase; PKA, protein kinase A; UC, unesterified cholesterol; VDAC, voltage-dependent anion channel; ABC, ATP binding cassette
transporter; ACS, acyl-CoA synthetase; CE, cholesteryl ester; DAG, diacylglycerol; FA, fatty acid; FABP, fatty acid binding protein; FATP, FA transfer protein; FA-CoA, fatty acid acyl-CoA; GlcCer, glucosylceramide; HE1, Niemann-Pick C2 protein; HSL, hormone-sensitive lipase; L, lysosome; LacCer, lactosylcera-mide; LB, lamellar body; LCAT, lecithin-cholesteryl acyltransferase; Lipo, lipoprotein; MVB, multi-vesicular body; NCEH, neutral cholester-yl ester hydrolase; NPC, Niemann-Pick C protein; PC, phosphatidylcholine; PE, phosphatidylethanolamine; PL, phospholipid; PS, phos-phatidylserine; SPM, sphingomyelin; TG, triglycerides [207–211].
at a very low rate, whereas nucleotide binding induces conformational changes [35]. Based on this information it is likely that ABCA1 acts rather as a facilitator of cholesterol/phospholipid export within the cellular lipid export machinery than exerts bona fide pump function [35]. It will be exciting to elucidate the exact molecular mechanisms by which ABCA1 mediates the export of lipid compounds from the cell. 5.2 ABCG1 and Other ABCG Members in Sterol Homeostasis
Following its cloning in 1996 [112], it was four years before ABCG1 attracted great attention because of its striking similarities with ABCA1 in its expression
25
26 Human ABC Transporters: Function, Expression, and Regulation
pattern in monocytic cells. Using a differential display approach ABCG1 was identified as a target gene involved in macrophage lipid homeostasis [41]. Like ABCA1 [40], ABCG1 is upregulated during the differentiation process of monocytes into mature macrophages and is strongly induced by foam cell conversion of these macrophages under sterol loading conditions using acLDL. Conversely, cholesterol unloading conditions achieved by further incubation with HDL3 , as the cholesterol acceptor results in the suppression of ABCG1 mRNA and protein expression [41]. In the mean time, these results have been confirmed by other groups as well [212–214]. The observed upregulation of ABCG1 is not restricted to acLDL but is also operative when using other types of modified LDL, such as oxidized LDL or enzymatically modified LDL, but not with free cholesterol or native LDL. Of special interest is the finding that ABCG1 regulation by lipids occurs exclusively in human or murine monomyeloid cells, such as primary human macrophages, THP-1 cells, RAW246.7 cells, peritoneal macrophages, and foam cells of atherosclerotic lesions. The sterolsensitive induction seen in these cells is independent of pro-inflammatory stimuli and the oxidative state of the cell as treatment with TNFα or LPS has no impact on ABCG1 mRNA expression [213]. In addition to lipoprotein-derived lipids, some oxysterols and RXR-specific ligands upregulate ABCG1 expression via the LXR/RXR pathway. Evidence for a significant role of these nuclear receptors in ABCG1 induction comes from two different types of experiments. First, macrophages devoid of LXRα and LXRβ fail to upregulate ABCG1 mRNA upon oxysterol treatment, and secondly, retroviral expression of LXR in RAW246.7 cells facilitates the induction of ABCG1 in response to LXRα and LXR ligands [213]. A first characterization of the ABCG1 promoter (promoter 2 in Fig. 6) demonstrated its functionality and elucidated the minimal region required for liver- and macrophage-specific expression of the gene [114]. Further reports have shown that the ABCG1 gene displays a highly complex transcriptional profile due to the existence of at least three independent promoters (Fig. 6). Whereas the activity of promoter 1 has not been proven so far [115], promoter 3 of ABCG1 has been shown to bind the transcription factors LXR/RXR and thereby mediate the sterol-dependent induction of the gene [215]. In addition to this activating, sterol-regulated pathway, an independent inhibitory mechanism involving the transcriptional repressor ZNF202 and promoter 2 of ABCG1 has been described [203]. ZNF202 regulates a number of genes involved in general lipid metabolism and in particular has been shown to bind to the apoE, ABCA1, and ABCG1 promoters and thereby to modulate cellular lipid efflux. Taken together, transcription from ABCA1 and ABCG1 genes seems to be dominated by sterol-dependent activating mechanisms involving LXR/RXR and by sterol-independent repressory mechanisms mediated by ZNF202. Although the remarkable regulation of ABCG1 gene expression by cellular lipid components revealed its importance in macrophage lipid metabolism, direct evidence for a functional role in lipid trafficking came from an antisense strategy to block ABCG1 expression [41]. Specific antisense oligonucleotides which had no effect on ABCA1 levels caused a 32% and a 25% reduction in macrophage
5 Function and Regulation of ABC Transporters in Lipid Transport
cholesterol and phospholipid efflux, respectively, thereby directly linking ABCG1 with cellular lipid trafficking (Fig. 7). Since the same ABCG1 antisense oligonucleotides lead also to a significant inhibition of apoE secretion, the pathways involving ABCG1 seem to be at least in part distinct from acceptor mediated lipid efflux. Also, the residual phospholipid and cholesterol efflux present in cells from patients with Tangier disease along with a compensatory upregulation of ABCG1 in these cells further supports a function of ABCG1 in intracellular mobilization of lipid stores [212]. First steps in the elucidation of the localization of ABCG1 showed that the protein is predominantly localized in intracellular compartments mainly associated with the ER and Golgi membranes [41, 206, 216]. The small fraction of ABCG1 surface staining detected in immunocytochemical analysis is presumably due to unspecific binding of polyclonal ABCG1 antibodies to the macrophage receptor, as a ABCG1GFP fusion protein is absent from the plasma membrane [206]. There is still a lack of knowledge regarding the question whether ABCG1 functions as a heterodimer or homodimer. Both forms are conceivable for ABCG1 since both cases have been described within the subfamily, e.g. ABCG2 acts as homodimer, whereas ABCG5 and ABCG8 most likely cooperate as heterodimers. In addition to the above described lipid efflux pathways operative in macrophages and liver cells, two other members of the ABCG subfamily, namely ABCG5 and ABCG8 (Tabs 1 and 2), have been implicated recently in the efflux of dietary sterols from intestinal epithelial cells back into the gut lumen and from the liver to the bile duct (Fig. 9). The sterols in a normal western diet usually consist of cholesterol (250–500 mg) and non-cholesterol sterols (200–400 mg), mainly plant sterols like sitosterol and also sterols from fish. In healthy individuals approximately 50–60% of the cholesterol is absorbed and retained, whereas the retention of non-cholesterol sterols is less than 1% [217, 218]. These subtle mechanisms are disrupted in βsitosterolemia or phytosterolemia or shellfishsterolemia, a rare autosomal recessive disorder first described by Bhattacharyya and Connor in 1974 [219]. The disease is characterized by enhanced trapping of cholesterol and other sterols, including plant and shellfish sterols, within the intestinal cells and the inability to concentrate these sterols in the bile. As a consequence affected individuals have strongly increased plasma levels of plant sterols e.g. β-sitosterol, campesterol, stigmasterol, avenosterol, and 5-saturated stanols, whereas total sterol levels remain normal or are just moderately elevated [220, 221]. Another biochemical feature of β-sitosterolemia is a reduced cholesterol synthesis due to a lack of HMG-CoA reductase. Despite the almost normal total plasma sterol levels, the disease shares several clinical characteristics with homozygous familial hypercholesterinemia (FH). Patients display tendon and tuberous xanthomas at an early age, premature development of atherosclerosis, and coronary artery disease. In some cases hemolytic episodes, hypersplenism, platelet abnormalities, arthralgiasis, and arthritis have been described [222]. In 1998 Patel et al. [223] managed to localize the β-sitosterolemia locus to chromosome 2p21 and a recent fine mapping allowed workers to narrow the gene within a 2 cM region between markers D2S2294 and Afm210ex9 [125]. Using a
27
28 Human ABC Transporters: Function, Expression, and Regulation
combination of positional cloning and genome database survey, Lee et al. [126] identified ABCG5, which was mutated in nine unrelated β-sitosterolemia patients. Almost at the same time, Berge et al. [124] used a microarray analysis to search for LXR-regulated genes and identified ABCG5. Since ABC transporters are often found in clusters, the group screened nearby regions and found a second new member of the ABCG subfamily, ABCG8, which displayed 61% sequence similarity and was also mutated in sitosterolemia patients. The fact that the translational start sites of both ABC transporter genes are separated by only 374 bp and arranged in a head-to-head orientation led to the assumption that ABCG5 and ABCG8 have a bi-directional promoter and share common regulatory elements [124], although no functional promoter data have been provided so far. The highest expression level of both transporters is found in liver and intestine and high-cholesterol diet feeding in mice induced the expression of both genes. These findings, together with the observed clinical and biochemical features of β-sitosterolemia patients, assume that ABCG5 and ABCG8 play an important role in reducing intestinal absorption and promote biliary excretion of sterols. To date, several mutations and a number of polymorphisms have been identified in ABCG5 and ABCG8 [124, 126, 224, 225]. Interestingly, sequence analysis of both genes showed that the majority of the analyzed patients were homozygous for a single mutation and that the total number of different mutations is very low. This strongly suggests that sitosterolemia has its origin in a limited number of founder individuals. Another striking finding is that mutations in β-sitosterolemia patients occur exclusively either in ABCG5 or ABCG8, but never in both genes together [225, 226]. The coordinate regulation of both genes and the finding that mutations in either gene cause β-sitosterolemia strongly suggest that the ABCG5 and ABCG8 proteins form a functional heterodimer. As depicted in Fig. 8, dietary sterols including cholesterol and plant sterols which enter the intestinal epithelial cells via micellar transport are released along the lysosomal route. β-Sitosterol and other plant sterols are directly transported back to the gut lumen by the heterodimeric ABCG5/ABCG8 complex in a sort of kick-back mechanism, which may also efflux cholesterol, thereby regulating total sterol absorption. The retained sterols are routed along the ACAT pathway in the ER and either stored as cholesteryl esters in lipid droplets or alternatively packed into chylomicrons for further transport back to the liver (Fig. 8). In the liver alternative processes are conceivable. The sterols are either transported to peripheral tissues by VLDL and LDL particles or converted to bile acids. Also, a direct track into the bile duct for excretion exists, possibly mediated by ABCG5 and ABCG8. In addition to ABCG5 and ABCG8, other ABC transporters including ABCG1 and ABCA1 may also participate in intestinal sterol absorption mechanisms. Data from ABCA1−/− mice strongly suggest that ABCA1 is involved in the absorption of cholesterol and in the uptake of lipophilic vitamins [208, 227]. With this respect, it will be of special interest to determine in which membrane compartment, the apical or the basolateral part of intestinal epithelial cells, the ABCA1 molecule is located.
5 Function and Regulation of ABC Transporters in Lipid Transport
Fig. 8 Proposed role of ABC proteins in intestinal sterol metabolism. ABCG5, ABCG8, and ABCA1 are sterol-induced members of the ABC transporter family. ABCG5 and ABCG8, which are mutated in sitosterolemia, form a heterodimer to mediate the export of absorbed plant sterols and cholesterol into the gut lumen. In contrast, ABCA1 expression and function are required for the uptake of sterols into intestinal ep-
ithelial cells. Implications for the intracellular location and vesicular trafficking of these proteins are presented. Abbreviations not defined in text: CE, cholesteryl ester; DAG, diacylglyceride; DGAT, acyl-CoA: diacylglycerol transferase; HSP70, heat shock protein 70; L, lysosome; MAG, monoacylglyceride; Mic, micelle; MTP, microsomal transfer protein; Sit, sitosterol [22, 124–126].
5.3 ABC Transporters involved in Hepatobiliary Transport
The formation of bile is an elementary physiological function of the liver, which involves numerous transport proteins located in the basolateral (sinusoidal) and apical (canalicular) membranes of hepatocytes (Fig. 9). Bile, which is composed of bile salts, phospholipids, cholesterol, bilirubin and many other small molecules, is necessary for the micellar absorption of lipids from the intestine as well as for the excretion of endogenous and xenobiotic compounds [228]. The first step in hepatobiliary transport, the uptake of compounds into liver cells is mediated by proteins of the solute carrier (SLC) superfamily [229]. Among these, the Na+ /taurocholate co-transporting peptide (NTCP), located in the basolateral membrane is responsible for the uptake of the majority of bile salts in hepatocytes. Small (type I) organic ions (e.g. choline, drugs, and monoamine neurotransmitters) are transported by the organic cation transporter 1 (OCT1), whereas bulky (type II) organic cations,
29
30 Human ABC Transporters: Function, Expression, and Regulation
Fig. 9 Overview of lipid transport proteins in hepatocytes. Monovalent bile salts, such as taurocholate, are taken up into hepatocytes by the sodium-taurocholate cotransporting poly-peptide (NTCP) [229]. The organic anion transporting polypeptides 1 and 2 (OATP1-2) are responsible for the charge-independent uptake of bulky organic compounds, including bile salts and other organic anions, uncharged cardiac glycosides, steroid hormones, and certain type 2 organic cations [230]. Small, type 1 organic cations are transported by the organic cation transporter OCT1. Several ABC proteins belonging to the ABCB (MDR) subfamily or ABCC (MRP) subfamily are expressed in liver [231]. ABCB1 (MDR1)
is responsible for the excretion of bulky amphi-phatic compounds into bile, whereas ABCB4 (MDR3) is a phosphatidylcholine translocase. Monovalent bile salts are secreted into the bile canaliculi by the bile salt export pump BSEP (ABCB11). ABCC2 (MRP2) functions as a multispecific organic anion transport protein in the canalicular membrane. ABCC1 (MRP1), expressed at very low levels in the basolateral membrane in normal hepatocytes, has a similar substrate specificity to MRP2. ABCC3 (MRP3) preferentially translocates conjugates with glucuronate or sulfate, whereas the physiological substrates for ABCC6 (MRP6) are unknown.
glutathione conjugates, and some amount of bile acids are taken up the organic anion-transporting polypeptide (OATP1) [230]. The subsequent step in hepatobiliary transport, the translocation of compounds from hepatocytes into the bile, involves ABC transporters localized in the hepatocyte apical (canalicular) membrane [231]. These ABC proteins belong to the ABCB (MDR) and ABCC (MRP) subfamilies. Despite the low expression level of ABCB1 (MDR1) in normal human liver [232], data from Mdr1a/1b knockout mice, which are very sensitive to xenobiotics, neurotoxins, and chemotherapeutics, provide evidence that the major function of ABCB1 is the protection of hepatocytes against harmful substances by active translocation into the bile [233, 234]. It is now widely accepted
6 Conclusions 31
that ABCB4 (MDR3), which is exclusively expressed in the liver apical membrane, is a bile canalicular phosphatidylcholine translocase (Fig. 9). This function has been confirmed by a series of experimental data: (1) mice with a target disruption of the Mdr2 gene, the mouse homolog of ABCB4 (MDR3), exhibit a complete absence of PC and strongly decreased levels of cholesterol from bile [194]; (2) transgenic expression of human MDR3 in these mice can fully restore PC secretion into the bile [235]; and (3) mutations in the human ABCB4 (MDR3) gene cause progressive familial intrahepatic cholestasis (PFIC) type 3 [10] (Tab. 2). The third member of the ABCB subfamily involved in hepatobiliary secretion is ABCB11 (SPGP). Gerloff et al. [236] have shown that membrane vesicles isolated from ABCB11-overexpressing Sf9 cells display ATP-dependent taurocholate uptake characteristics similar to those of liver canalicular membrane vesicles, and thus concluded that ABCB11 is the major, if not the only bile salt transporter of mammalian liver, hence the name bile salt export pump (BSEP). Further support for this proposition comes from the findings that the ABCB11 (BSEP) gene is mutated in patients with progressive intrahepatic cholestasis type 3 (PFIC3) [11], a syndrome characterized by very low levels of biliary bile salts and elevated concentrations of serum bile salts. In the ABCC (MRP) subfamily, at least four members have been shown to be expressed in liver cells [95]. In hepatocytes and other polarized epithelial cells, ABCC2 (MRP2) is localized and is highly expressed at the canalicular membrane. In contrast, ABCC1 (MRP1) present at the basolateral membrane domain, is expressed very low in normal liver. As listed in Tab. 1 and displayed in Fig. 9, physiological substrates for ABCC1 and ABCC2 comprise glutathione conjugates (e.g. leukotriene C4), estrogen- and bilirubin-glucuronides, taurolithocholate 3-sulfate, and glutathione disulfide (GSSG). However, due to the differences in the overall expression levels and because of greatly different transport kinetics, ABCC2 seems to be the major transporter of anionic conjugates. Likewise, hereditary defects of ABCC2 in humans cause the Dubin-Johnson syndrome, which is associated with defects in biliary secretion of amphiphilic an-ionic conjugates including bilirubin-glucuronides [237, 238]. Glucuronate- and sulfate-conjugates are also substrates for ABCC3 (MRP3), which has been localized to the basolateral membrane of hepatocytes [239]; however, in contrast to ABCC1 and ABCC2, glutathione conjugates are poor substrates for ABCC3. ABCC6 (MRP6), which has been localized to the lateral hepatocyte membrane [240], is capable of transporting the anionic cyclopentapeptide BQ123, an endothelin receptor antagonist; however, the physiological substrate for ABCC6 has not been elucidated so far.
6 Conclusions
Although our knowledge of lipid-transporting ATP binding cassette transporters has grown substantially over the last few years, the detailed molecular mechanisms by which lipid compounds are transported across cellular membranes still await
32 Human ABC Transporters: Function, Expression, and Regulation
clarification. Analysis of the transcriptional and metabolic regulation, the intracellular localization and membrane domain association, the exact substrate specificity, and the functional activity of these proteins will provide helpful hints towards the understanding of working mechanisms of ABC lipid transporters. Based on their complex architecture it can be expected that ABC lipid transporters engage in multifaceted interactions with an array of yet to be identified effector molecules at specialized membrane compartments. The recent finding that ABCA1 is not an active pump but may rather function as a regulator similar to ABCC7 (CFTR) or ABCC8 (SUR1) supports this hypothesis. It will be a fascinating task to characterize the functional partners of ABC lipid transporters and to determine whether these include other ABC lipid transporters. In light of the now documented role of the prototypic cholesterol-responsive ABC molecules ABCA1 and ABCG1, it can be postulated that other ABC transporters which show a cholesterol-dependent regulation in macrophages, especially members of the ABCB and ABCC subfamilies play critical roles in macrophage lipid homeostasis.
References 1 C. F. HIGGINS, Annu. Rev. Cell Biol. 1992, 8, 67–113. 2 W. SAURIN, M. HOFNUNG, E. DASSA, J. Mol. Evol. 1999, 48, 22–41. 3 H. NIKAIDO, FEBS Lett. 1994, 346, 55–58. 4 J. YOUNG, I. B. HOLLAND, Biochim. Bio-phys. Acta 1999, 1461, 177–200. 5 T. LITMAN, T. E. DRULEY, W. D. STEIN, S. E. BATES, Cell Mol. Life Sci. 2001, 58, 931–959. 6 M. BODZIOCH, E. ORSO, J. KLUCKEN, T. LANGMANN, A. BOTTCHER, W. DIEDERICH, W. DROBNIK, S. BARLAGE, C. BUCHLER, O. M. PORSCH, W. E. KAMINSKI, H. W. HAHMANN, K. OETTE, G. ROTHE, C. ASLANIDIS, K. J. LACKNER, G. SCHMITZ, Nature Genet. 1999, 22, 347–351. 7 A. BROOKS-WILSON, M. MARCIL, S. M. CLEE, L. H. ZHANG, K. ROOMP, M. VAN-DAM, L. YU, C. BREWER, J. A. COLLINS, H. O. MOLHUIZEN, O. LOUBSER, B. F. OUELETTE, K. FICHTER, E. K. ASHBOURNE, C. W. SENSEN, S. SCHERER, S. MOTT, M. DENIS, D. MARTINDALE, J. FROHLICH, K. MORGAN, B. KOOP, S. PIMSTONE, J. J. KASTELEIN, M. R. HAYDEN, Nature Genet. 1999, 22, 336–345. 8 S. RUST, M. ROSIER, H. FUNKE, J. REAL, Z. AMOURA, J. C. PIETTE, J. F. DELEUZE, H. B. BREWER, N. DUVERGER, P. DENEFLE, G.
9 10
11
12
13
14 15
ASSMANN, Nature Genet. 1999, 22, 352–355. R. ALLIKMETS, Am. J. Hum. Genet. 2000, 67, 793–799. J. M. de VREE, E. JACQUEMIN, E. STURM, D. CRESTEIL, P. J. BOSMA, J. ATEN, J. F. DELEUZE, M. DESROCHERS, M. BURDELSKI, O. BERNARD, R. P. OUDE ELFERINK, M. HADCHOUEL, Proc. Natl Acad. Sci. USA 1998, 95, 282–287. S. S. STRAUTNIEKS, L. N. BULL, A. S. KNISELY, S. A. KOCOSHIS, N. DAHL, H. ARNELL, E. SOKAL, K. DAHAN, S. CHILDS, V. LING, M. S. TANNER, A. F. KAGALWALLA, A. NEMETH, J. PAWLOWSKA, A. BAKER, G. MIELI-VERGANI, N. B. FREIMER, R. M. GARDINER, R. J. THOMPSON, Nature Genet. 1998, 20, 233–238. E. JACQUEMIN, J. M. de VREE, D. CRESTEIL, E. M. SOKAL, E. STURM, M. DUMONT, G. L. SCHEFFER, M. PAUL, M. BURDELSKI, P. J. BOSMA, O. BERNARD, M. HADCHOUEL, R. P. ELFERINK, Gastroenterology 2001, 120, 1448–1458. J. KONIG, A. T. NIES, Y. CUI, I. LEIER, D. KEPPLER, Biochim. Biophys. Acta 1999, 1461, 377–394. J. UITTO, L. PULKKINEN, F. RINGPFEIL, Trends Mol. Med. 2001, 7, 13–17. J. MOSSER, Y. LUTZ, M. E. STOECKEL, C. O. SARDE, C. KRETZ, A. M. DOUAR, J. LOPEZ,
References
16 17
18
19
20
21 22 23 24
25
26
27
28 29
30
31
32
P. AUBOURG, J. L. MANDEL, Hum. Mol. Genet. 1994, 3, 265–271. M. H. LEE, K. LU, S. B. PATEL, Curr. Opin. Lipidol. 2001, 12, 141–149. S. LUTUCUTA, C. M. BALLANTYNE, H. ELGHANNAM, A. M. GOTTO, A. J. MARIAN, Circ. Res. 2001, 88, 969–973 A. RIVERA, K. WHITE, H. STOHR, K. STEINER, N. HEMMRICH, T. GRIMM, B. JURKLIES, B. LORENZ, H. B. SCHOLL, E. APFELSTEDT-SYLLA, B. H. WEBER, Am. J. Hum. Genet. 2000, 67, 800–813. M. NAKAMURA, S. UENO, A. SANO, H. TANABE, Mol. Psychiatry 1999, 4, 155–162. P. BORST, N. ZELCER, H. van HELVOORT, Biochim. Biophys. Acta 2000, 1486, 128–144. K. LU, M. H. LEE, S. B. PATEL, Trends Endocrinol. Metab. 2001, 12, 314–320. G. SCHMITZ, T. LANGMANN, S. HEIMERL, J. Lipid Res. 2001, 42, 1513–1520. G. SCHMITZ, W. E. KAMINSKI, Front. Biosci. 2001, 6, D505–D514. J. E. WALKER, M. SARASTE, M. J. RUNSWICK, N. J. GAY, EMBO J. 1982, 1, 945–951. S. C. HYDE, P. EMSLEY, M. J. HARTSHORN, M. M. MIMMACK, U. GILEADI, S. R. PEARCE, M. P. GALLAGHER, D. R. GILL, R. E. HUBBARD, C. F. HIGGINS, Nature 1990, 346, 362–365. C. F. HIGGINS, M. P. GALLAGHER, M. L. MIMMACK, S. R. PEARCE, Bioessays 1988, 8, 111–116. K. DENZER, M. J. KLEIJMEER, H. F. HEIJNEN, W. STOORVOGEL, H. J. GEUZE, J. Cell Sci. 2000, 113(19), 3365–3374. I. KLEIN, B. SARKADI, A. VARADI, Biochim. Biophys. Acta 1999, 1461, 237–262. E. ROCCHI, A. KHODJAKOV, E. L. VOLK, C. H. YANG, T. LITMAN, S. E. BATES, E. SCHNEIDER, Biochem. Biophys. Res. Commun. 2000, 271, 42–46. C. BISBAL, T. SALEHZADA, M. SILHOL, C. MARTINAND, F. LE ROY, B. LEBLEU, Methods Mol. Biol. 2001, 160, 183–198. J. K. TYZACK, X. WANG, G. J. BELSHAM, C. G. PROUD, J. Biol. Chem. 2000, 275, 34131–34139. K. UEDA, M. MATSUO, K. TANABE, K. MORITA, N. KIOKA, T. AMACHI, Biochim. Biophys. Acta 1999, 1461, 305–313.
33 K. SZABO, G. SZAKACS, T. HEGEDUS, B. SARKADI, J. Biol. Chem. 1999, 274, 12209–12212. 34 J. BRYAN, L. AGUILAR-BRYAN, Biochim. Biophys. Acta 1999, 1461, 285–303. 35 G. SZAKACS, T. LANGMANN, C. OZVEGY, E. ORSO, G. SCHMITZ, A. VARADI, B. SARKADI, Biochem. Biophys. Res. Commun. 2001, 288, 1258–1264. 36 E. S. LANDER, L. M. LINTON, B. BIRREN et al. Nature 2001, 409, 860–921. 37 J. C. VENTER, M. D. ADAMS, E. W. MYERS et al. Science 2001, 291, 1304–1351. 38 A. DECOTTIGNIES, A. GOFFEAU, Nature Genet. 1997, 15, 137–145. 39 C. BROCCARDO, M. LUCIANI, G. CHIMINI, Biochim. Biophys. Acta 1999, 1461, 395–404. 40 T. LANGMANN, J. KLUCKEN, M. REIL, G. LIEBISCH, M. F. LUCIANI, G. CHIMINI, W. E. KAMINSKI, G. SCHMITZ, Biochem. Biophys. Res. Commun. 1999, 257, 29–33. 41 J. KLUCKEN, C. BUCHLER, E. ORSO, M. PORSCH, G. LIEBISCH, M. KAPINSKI, W. DIEDERICH, W. DROBNIK, M. DEAN, R. ALLIKMETS, G. SCHMITZ, Proc. Natl Acad. Sci. USA 2000, 97, 817–822. 42 W. E. KAMINSKI, J. J. WENZEL, A. PIEHLER, T. LANGMANN, G. SCHMITZ, Biochem. Biophys. Res. Commun. 2001, 285, 1295–1301. 43 W. E. KAMINSKI, A. PIEHLER, G. SCHMITZ, Biochem. Biophys. Res. Commun. 2000, 278, 782–789. 44 W. E. KAMINSKI, E. ORSO, W. DIEDERICH, J. KLUCKEN, W. DROBNIK, G. SCHMITZ, Biochem. Biophys. Res. Commun. 2000, 273, 532–538. 45 W. E. KAMINSKI, A. PIEHLER, K. PULLMANN, M. PORSCH-OZCURUMEZ, C. DUONG, G. M. BARED, C. BUCHLER, G. SCHMITZ, Biochem. Biophys. Res. Commun. 2001, 281, 249–258. 46 G. SCHMITZ, W. E. KAMINSKI, E. ORSO, Curr. Opin. Lipidol. 2000, 11, 493–501. 47 H. SUN, R. S. MOLDAY, J. NATHANS, J. Biol. Chem. 1999, 274, 8269–8281. 48 J. AHN, J. T. WONG, R. S. MOLDAY, J. Biol. Chem. 2000, 275, 20399–20405. 49 J. AHN, R. S. MOLDAY, Methods Enzymol. 2000, 315, 864–879.
33
34 Human ABC Transporters: Function, Expression, and Regulation
50 C. ZHOU, L. ZHAO, N. INAGAKI, J. GUAN, S. NAKAJO, T. HIRABAYASHI, S. KIKUYAMA, S. SHIODA, J. Neurosci. 2001, 21, 849–857. 51 B. VULEVIC, Z. CHEN, J. T. BOYD, W. DAVIS, E. S. WALSH, M. G. BELINSKY, K. D. TE W, Cancer Res. 2001, 61, 3339–3347. 52 K. PAINE, D. R. FLOWER, Biochim. Biophys. Acta 2000, 1482, 351–352. 53 C. BROCCARDO, J. OSORIO, M. F. LUCIANI, L. M. SCHRIML, C. PRADES, S. SHULENIN, I. ARNOULD, L. NAUDIN, C. LAFARGUE, M. ROSIER, B. JORDAN, M. G. MATTEI, M. DEAN, P. DENEFLE, G. CHIMINI, Cytogenet. Cell Genet. 2001, 92, 264–270. 54 D. KIELAR, T. LANGMANN, W. E. KAMINSKI, A. PIEHLER, J. WENZEL, G. LIEBISCH, S. BARLAGE, C. MOEHLE, W. DROBNIK, G. SCHMITZ, manuscript submitted. 55 G. YAMANO, H. FUNAHASHI, O. KAWANAMI, L. X. ZHAO, N. BAN, Y. UCHIDA, T. MOROHOSHI, J. OGAWA, S. SHIODA, N. INAGAKI, FEBS Lett. 2001, 508, 221–225. 56 W. BERNHARD, S. HOFFMANN, H. S. DOMBROWSKY, G. A. RAU, A. KAMLAGE, M. KAPPLER, J. J. HAITSMA, J. FREIHORST, H. H. VON DER, C. F. POETS, Am. J. Respir. Cell Mol. Biol. 2001, 25, 725–731. 57 S. A. ROONEY, Comp. Biochem. Physiol A Mol. Integr. Physiol 2001, 129, 233–243. 58 W. R. RICE, M. BURHANS, J. R. WISPE, Pediatr. Res. 1989, 25, 396–398. 59 N. KLUGBAUER, F. HOFMANN, FEBS Lett. 1996, 391, 61–65. 60 I. PASTAN, M. M. GOTTESMAN, Annu. Rev. Med. 1991, 42, 277–286. 61 S. RUETZ, P. GROS, J. Biol. Chem. 1995, 270, 25388–25395. 62 A. van HELVOORT, A. J. SMITH, H. SPRONG, I. FRITZSCHE, A. H. SCHINKEL, P. BORST, G. van MEER, Cell 1996, 87, 507–517. 63 Y. LAVIE, H. CAO, S. L. BURSTEN, A. E. GIULIANO, M. C. CABOT, J. Biol. Chem. 1996, 271, 19530–19536. 64 A. LUCCI, A. E. GIULIANO, T. Y. HAN, T. DINUR, Y. Y. LIU, A. SENCHENKOV, M. C. CABOT, Int. J. Oncol. 1999, 15, 535–540. 65 A. LUCCI, W. I. CHO, T. Y. HAN, A. E. GIULIANO, D. L. MORTON, M. C. CABOT, Anti-cancer Res. 1998, 18, 475–480.
66 Y. LAVIE, G. FIUCCI, M. LISCOVITCH, J. Biol. Chem. 1998, 273, 32380–32383. 67 M. DEMEULE, J. JODOIN, D. GINGRAS, R. BELIVEAU, FEBS Lett. 2000, 466, 219–224. 68 E. WANG, C. N. CASCIANO, R. P. CLEMENT, W. W. JOHNSON, Biochem. Biophys. Res. Commun. 2000, 276, 909–916. 69 Y. LAVIE, G. FIUCCI, M. LISCOVITCH, Adv. Drug Deliv. Rev. 2001, 49, 317–323. 70 R. J. RAGGERS, I. VOGELS, G. van MEER, Biochem. J. 2001, 357, 859–865. 71 C. M. PANWALA, J. C. JONES, J. L. VINEY, J. Immunol. 1998, 161, 5733–5744. 72 M. D. EISENBRAUN, R. L. MOSLEY, D. H. TEITELBAUM, R. A. MILLER, Dev. Comp. Immunol. 2000, 24, 783–795. 73 L. MAGGIO-PRICE, D. SHOWS, K. WAGGIE, A. BURICH, W. ZENG, S. ESCOBAR, P. MORRISSEY, J. L. VINEY, Am. J. Pathol. 2002, 160, 739–751. 74 F. C. LAM, R. LIU, P. LU, A. B. SHAPIRO, J. M. RENOIR, F. J. SHAROM, P. B. REINER, J. Neurochem. 2001, 76, 1121–1128. 75 J. HERZ, U. BEFFERT, Nature Rev. Neurosci. 2000, 1, 51–58. 76 U. RITZ, B. SELIGER, Mol. Med. 2001, 7, 149–158. 77 B. ORTMANN, J. COPEMAN, P. J. LEHNER, B. SADASIVAN, J. A. HERBERG, A. G. GRANDEA, S. R. RIDDELL, R. TAMPE, T. SPIES, J. TROWSDALE, P. CRESSWELL, Science 1997, 277, 1306–1309. 78 R. ABELE, R. TAMPE, Biochim. Biophys. Acta 1999, 1461, 405–419. 79 H. HENGEL, J. O. KOOPMANN, T. FLOHR, W. MURANYI, E. GOULMY, G. J. HAMMERLING, U. H. KOSZINOWSKI, F. MOMBURG, Immunity 1997, 6, 623– 632. 80 K. AHN, A. GRUHLER, B. GALOCHA, T. R. JONES, E. J. WIERTZ, H. L. PLOEGH, P. A. PETERSON, Y. YANG, K. FRUH, Immunity. 1997, 6, 613–621. 81 B. GALOCHA, A. HILL, B. C. BARNETT, A. DOLAN, A. RAIMONDI, R. F. COOK, J. BRUNNER, D. J. MCGEOCH, H. L. PLOEGH, J. Exp. Med. 1997, 185, 1565–1572. 82 F. ZHANG, W. ZHANG, L. LIU, C. L. FISHER, D. HUI, S. CHILDS, K. DOROVINI-ZIS, V. LING, J. Biol. Chem. 2000, 275, 23287–23294. 83 G. KISPAL, P. CSERE, C. PROHL, R. LILL, EMBO J. 1999, 18, 3981–3989.
References 84 P. CSERE, R. LILL, G. KISPAL, FEBS Lett. 1998, 441, 266–270. 85 G. KISPAL, P. CSERE, B. GUIARD, R. LILL, FEBS Lett. 1997, 418, 346–350. 86 R. ALLIKMETS, W. H. RASKIND, A. HUTCHINSON, N. D. SCHUECK, M. DEAN, D. M. KOELLER, Hum. Mol. Genet. 1999, 8, 743–749. 87 E. BAKOS, R. EVERS, G. SZAKACS, G. E. TUSNADY, E. WELKER, K. SZABO, M. de HAAS, L. van DEEMTER, P. BORST, A. VARADI, B. SARKADI, J. Biol. Chem. 1998, 273, 32167–32175. 88 D. N. SHEPPARD, M. J. WELSH, Physiol. Rev. 1999, 79, S23–S45. 89 T. ISHIKAWA, Z. S. LI, Y. P. LU, P. A. REA, Biosci. Rep. 1997, 17, 189–207. 90 H. SUZUKI, Y. SUGIYAMA, Semin. Liver Dis. 1998, 18, 359–376. 91 D. KEPPLER, J. KONIG, Semin. Liver Dis. 2000, 20, 265–272. 92 D. KEPPLER, T. KAMISAKO, I. LEIER, Y. CUI, A. T. NIES, H. TSUJII, J. KONIG, Adv. Enzyme Regul. 2000, 40, 339–349. 93 T. HIROHASHI, H. SUZUKI, Y. SUGIYAMA, J. Biol. Chem. 1999, 274, 15181–15185. 94 J. D. SCHUETZ, M. C. CONNELLY, D. SUN, S. G. PAIBIR, P. M. FLYNN, R. V. SRINIVAS, A. KUMAR, A. FRIDLAND, Nature Med. 1999, 5, 1048–1051. 95 P. BORST, R. EVERS, M. KOOL, J. WIJNHOLDS, J. Natl Cancer Inst. 2000, 92, 1295–1302. 96 M. KOOL, M. van der LINDEN, M. de HAAS, F. BAAS, P. BORST, (1999) Cancer Res. 1999, 59, 175–182. 97 O. LE SAUX, Z. URBAN, C. TSCHUCH, K. CSISZAR, B. BACCHELLI, D. QUAGLINO, I. PASQUALI-RONCHETTI, F. M. POPE, A. RICHARDS, S. TERRY, L. BERCOVITCH, A. de PAEPE, C. D. BOYD, Nature Genet. 2000, 25, 223–227. 98 M. DEAN, Y. HAMON, G. CHIMINI, J. Lipid Res. 2001, 42, 1007–1017. 99 J. TAMMUR, C. PRADES, I. ARNOULD, A. RZHETSKY, A. HUTCHINSON, M. ADACHI, J. D. SCHUETZ, K. J. SWOBODA, L. J. PTACEK, M. ROSIER, M. DEAN, R. ALLIKMETS, Gene 2001, 273, 89–96. 100 W. L. LEE, A. TAY, H. T. ONG, L. M. GOH, A. P. MONACO, P. SZEPETOWSKI, Hum. Genet. 1998, 103, 608–612.
101 H. TOMITA, S. NAGAMITSU, K. WAKUI, Y. FUKUSHIMA, K. YAMADA, M. SADAMATSU, A. MASUI, T. KONISHI, T. MATSUISHI, M. AIHARA, K. SHIMIZU, K. HASHIMOTO, M. MINETA, M. MATSUSHIMA, T. TSUJITA, M. SAITO, H. TANAKA, S. TSUJI, T. TAKAGI, Y. NAKAMURA, S. NANKO, N. KATO, Y. NAKANE, N. NIIKAWA, Am. J. Hum. Genet. 1999, 65, 1688–1697. 102 J. L. DOYLE, L. STUBBS, Trends Genet. 1998, 14, 92–98. 103 L. AGUILAR-BRYAN, J. P. CLEMENT, G. GONZALEZ, K. KUNJILWAR, A. BABENKO, J. BRYAN, Physiol. Rev. 1998, 78, 227–245. 104 A. HOLZINGER, S. KAMMERER, J. BERGER, A. A. ROSCHER, Biochem. Biophys. Res. Commun. 1997, 239, 261–264. 105 A. HOLZINGER, S. KAMMERER, A. A. ROSCHER, Biochem. Biophys. Res. Commun. 1997, 237, 152–157. 106 G. LOMBARD-PLATET, S. SAVARY, C. O. SARDE, J. L. MANDEL, G. CHIMINI, (1996) Proc. Natl Acad. Sci. USA 1996, 93, 1265–1269. 107 J. MOSSER, A. M. DOUAR, C. O. SARDE, P. KIOSCHIS, R. FEIL, H. MOSER, A. M. POUSTKA, J. L. MANDEL, P. AUBOURG, Nature 1993, 361, 726–730. 108 P. AUBOURG, J. MOSSER, A. M. DOUAR, C. O. SARDE, J. LOPEZ, J. L. MANDEL, Biochemie 1993, 75, 293–302. 109 S. FOURCADE, S. SAVARY, S. ALBET, D. GAUTHE, C. GONDCAILLE, T. PINEAU, J. BELLENGER, M. BENTEJAC, A. HOLZINGER, J. BERGER, M. BUGAUT, Eur. J. Biochem. 2001, 268, 3490–3500. 110 A. PUJOL, N. TROFFER-CHARLIER, E. METZGER, G. CHIMINI, J. L. MANDEL, Genomics 2000, 70, 131–139. 111 M. RICHARD, R. DROUIN, A. D. BEAULIEU, Genomics 1998, 53, 137–145. 112 H. CHEN, C. ROSSIER, M. D. LALIOTI, A. LYNN, A. CHAKRAVARTI, G. PERRIN, S. E. ANTONARAKIS, Am. J. Hum. Genet. 1996, 59, 66–75. 113 J. M. CROOP, G. E. TILLER, J. A. FLETCHER, M. L. LUX, E. RAAB, D. GOLDENSON, D. SON, S. ARCINIEGAS, R. L. WU, Gene 1997, 185, 77–85. 114 T. LANGMANN, M. PORSCH-OZCURUMEZ, U. UNKELBACH, J. KLUCKEN, G. SCHMITZ, Biochim. Biophys. Acta 2000, 1494, 175–180.
35
36 Human ABC Transporters: Function, Expression, and Regulation
115 S. LORKOWSKI, S. RUST, T. ENGEL, E. JUNG, K. TEGELKAMP, E. A. GALINSKI, G. ASSMANN, P. CULLEN, Biochem. Biophys. Res. Commun. 2001, 280, 121–131. 116 T. B. BONNE, A. L. DESTEFANO, C. E. BRIGGS, R. ADAIR, B. FRANKLYN, S. WEISS, M. KOROSTISHEVSKY, M. FRYDMAN, C. T. BALDWIN, L. A. FARRER, (1996) Am. J. Hum. Genet. 1996, 58, 1254–1259. 117 A. BERRY, H. S. SCOTT, J. KUDOH, I. TALIOR, M. KOROSTISHEVSKY, M. WATTENHOFER, M. GUIPPONI, C. BARRAS, C. ROSSIER, K. SHIBUYA, J. WANG, K. KAWASAKI, S. ASAKAWA, S. MINOSHIMA, N. SHIMIZU, S. ANTONARAKIS, B. BONNE-TAMIR, Genomics 2000, 68, 22–29. 118 D. RUJESCU, I. GIEGLING, N. DAHMEN, A. SZEGEDI, I. ANGHELESCU, A. GIETL, M. SCHAFER, F. MULLER-SIECHENEDER, B. BONDY, H. J. MOLLER, Neuropsychobiology 2000, 42, Suppl 1, 22–25. 119 R. ALLIKMETS, L. M. SCHRIML, A. HUTCHINSON, S. V. ROMANO, M. DEAN,. Cancer Res. 1998, 58, 5337–5339. 120 L. A. DOYLE, W. YANG, L. V. ABRUZZO, T. KROGMANN, Y. GAO, A. K. RISHI, D. D. ROSS, Proc. Natl Acad. Sci. USA 1998, 95, 15665–15670. 121 K. MIYAKE, L. MICKLEY, T. LITMAN, Z. ZHAN, R. ROBEY, B. CRISTENSEN, M. BRANGI, L. GREENBERGER, M. DEAN, T. FOJO, S. E. BATES, Cancer Res. 1999, 59, 8–13. 122 T. LITMAN, M. BRANGI, E. HUDSON, P. FETSCH, A. ABATI, D. D. ROSS, K. MIYAKE, J. H. RESAU, S. E. BATES, J. Cell Sci. 2000, 113 (11), 2011–2021. 123 R. W. ROBEY, W. Y. MEDINA-PEREZ, K. NISHIYAMA, T. LAHUSEN, K. MIYAKE, T. LITMAN, A. M. SENDEROWICZ, D. D. ROSS, S. E. BATES, Clin. Cancer Res. 2001, 7, 145–152. 124 K. E. BERGE, H. TIAN, G. A. GRAF, L. YU, N. V. GRISHIN, J. SCHULTZ, P. KWITEROVICH, B. SHAN, R. BARNES, H. H. HOBBS, Science 2000, 290, 1771–1775. 125 M. H. LEE, D. GORDON, J. OTT, K. LU, L. OSE, T. MIETTINEN, H. GYLLING, A. F. STALENHOEF, A. PANDYA, H. HIDAKA, B. Jr. BREWER, H. KOJIMA, N. SAKUMA, R. PEGORARO, G. SALEN, S. B. PATEL, Eur. J. Hum. Genet. 2001, 9, 375–384.
126 M. H. LEE, K. LU, S. HAZARD, H. YU, S. SHULENIN, H. HIDAKA, H. KOJIMA, R. ALLIKMETS, N. SAKUMA, R. PEGORARO, A. K. SRIVASTAVA, G. SALEN, M. DEAN, S. B. PATEL, Nature Genet. 2001, 27, 79–83. 127 T. ENGEL, S. LORKOWSKI, A. LUEKEN, S. RUST, B. SCHLUTER, G. BERGER, P. CULLEN, G. ASSMANN, Biochem. Biophys. Res. Commun. 2001, 288, 483–488. 128 G. ROGLER, B. TRUMBACH, B. KLIMA, K. J. LACKNER, G. SCHMITZ, Arterioscler. Thromb. Vasc. Biol. 1995, 15, 683–690. 129 G. A. FRANCIS, R. H. KNOPP, J. F. ORAM, J. Clin. Invest. 1995, 96, 78–87. 130 A. BOTTCHER, J. SCHLOSSER, F. KRONENBERG, H. DIEPLINGER, G. KNIPPING, K. J. LACKNER, G. SCHMITZ, J. Lipid Res. 2000, 41, 905–915. 131 B. F. ASZTALOS, M. E. BROUSSEAU, J. R. MCNAMARA, K. V. HORVATH, P. S. ROHEIM, E. J. SCHAEFER, Atherosclerosis 2001, 156, 217–225. 132 D. BOJANOVSKI, R. E. GREGG, L. A. ZECH, M. S. MENG, C. BISHOP, R. RONAN, H. B. Jr. BREWER, J. Clin. Invest. 1987, 80, 1742–1747. 133 E. J. SCHAEFER, C. B. BLUM, R. I. LEVY, L. L. JENKINS, P. ALAUPOVIC, D. M. FOSTER, H. B. Jr. BREWER, N. Engl. J. Med. 1978, 299, 905–910. 134 E. J. SCHAEFER, D. W. ANDERSON, L. A. ZECH, F. T. LINDGREN, T. B. BRONZERT, E. A. RUBALCABA, H. B. Jr. BREWER, J. Lipid Res. 1981, 22, 217–228. 135 M. E. BROUSSEAU, G. P. EBERHART, J. DUPUIS, B. F. ASZTALOS, A. L. GOLDKAMP, E. J. SCHAEFER, M. W. FREEMAN, J. Lipid Res. 2000, 41, 1125–1135. 136 M. E. BROUSSEAU, E. J. SCHAEFER, J. DUPUIS, B. EUSTACE, P. Van EERDEWEGH, A. L. GOLDKAMP, L. M. THURSTON, M. G. FITZGERALD, D. YASEK-MCKENNA, G. O’NEILL, G. P. EBERHART, B. WEIFFENBACH, J. M. ORDOVAS, M. W. FREEMAN, R. H. Jr. BROWN, J. Z. GU, J. Lipid Res. 2000, 41, 433–441. 137 E. J. SCHAEFER, M. E. BROUSSEAU, M. R. DIFFENDERFER, J. S. COHN, F. K. WELTY, J. O’CONNOR, G. G. DOLNIKOWSKI, J. WANG, R. A. HEGELE, P. J. JONES, Atherosclerosis 2001, 159, 231–236. 138 A. R. TALL, N. WANG, J. Clin. Invest. 2000, 106, 1205–1207.
References 139 S. M. CLEE, A. H. ZWINDERMAN, J. C. ENGERT, K. Y. ZWARTS, H. O. MOLHUIZEN, K. ROOMP, J. W. JUKEMA, M. van WIJLAND, M. van DAM, T. J. HUDSON, A. BROOKS-WILSON, J. Jr. GENEST, J. J. KASTELEIN, M. R. HAYDEN, Circulation 2001, 103, 1198–1205. 140 S. M. CLEE, J. J. KASTELEIN, M. van DAM, M. MARCIL, K. ROOMP, K. Y. ZWARTS, J. A. COLLINS, R. ROELANTS, N. TAMASAWA, T. STULC, T. SUDA, R. CESKA, B. BOUCHER, C. RONDEAU, C. DESOUICH, A. BROOKS-WILSON, H. O. MOLHUIZEN, J. FROHLICH, J. Jr. GENEST, M. R. HAYDEN, J. Clin. Invest. 2000, 106, 1263–1270. 141 G. ROTHE, H. GABRIEL, E. KOVACS, J. KLUCKEN, J. STOHR, W. KINDERMANN, G. SCHMITZ, Arterioscler. Thromb. Vasc. Biol. 1996, 16, 1437–1447. 142 G. SCHMITZ, W. E. KAMINSKI, M. PORSCH-OZCURUMEZ, J. KLUCKEN, E. ORSO, M. BODZIOCH, C. BUCHLER, W. DROBNIK, Pathobiology 1999, 67, 236–240. 143 M. Van ECK, I. BOS, W. E. KAMINSKI, E. ORSO, G. ROTHE, J. TWISK, A. BOETTCHER, E. S. Van AMERSFOORT, T. A. CHRISTIANSEN-WEBER, W. P. FUNG-LEUNG, T. J. Van BERKEL, Proc. Natl Acad. Sci. USA 2002, 99, 6298–6303 144 Y. V. BOBRYSHEV, Curr. Opin. Lipidol. 2000, 11, 511–517. 145 G. ROTHE, J. STOHR, P. FEHRINGER, C. GASCHE, G. SCHMITZ, Atherosclerosis 1997, 130, 215–221. 146 W. DIEDERICH, E. ORSO, W. DROBNIK, G. SCHMITZ, Atherosclerosis 2001, 159, 13–324. 147 T. MATOZAKI, H. NAKANISHI, Y. TAKAI, Cell Signal. 2000, 12, 515–524. 148 L. BELLINCAMPI, M. L. SIMONE, C. MOTTI, C. CORTESE, S. BERNARDINI, S. BERTOLINI, S. CALANDRA, Biochem. Biophys. Res. Commun. 2001, 283, 590–597. 149 R. ALLIKMETS, N. SINGH, H. SUN, N. F. SHROYER, A. HUTCHINSON, A. CHIDAMBARAM, B. GERRARD, L. BAIRD, D. STAUFFER, A. PEIFFER, A. RATTNER, P. SMALLWOOD, Y. LI, K. L. ANDERSON, R. A. LEWIS, J. NATHANS, M. LEPPERT, M. DEAN, J. R. LUPSKI, Nature Genet. 1997, 15, 236–246. 150 S. M. AZARIAN, G. H. TRAVIS, FEBS Lett. 1997, 409, 247–252.
151 M. ILLING, L. L. MOLDAY, R. S. MOLDAY, J. Biol. Chem. 1997, 272, 10303–10310. 152 F. P. CREMERS, D. J. van de POL, M. van DRIEL, A. I. DEN HOLLANDER, F. J. van HAREN, N. V. KNOERS, N. TIJMES, A. A. BERGEN, K. ROHRSCHNEIDER, A. BLANKENAGEL, A. PINCKERS, A. F. DEUTMAN, C. B. HOYNG, Hum. Mol. Genet. 1998, 7, 355–362. 153 J. M. ROZET, S. GERBER, E. SOUIED, I. PERRAULT, S. CHATELIN, I. GHAZI, C. LEOWSKI, J. L. DUFIER, A. MUNNICH, J. KAPLAN, Eur. J. Hum. Genet. 1998, 6, 291–295. 154 J. M. ROZET, S. GERBER, I. GHAZI, I. PERRAULT, D. DUCROQ, E. SOUIED, A. CABOT, J. L. DUFIER, A. MUNNICH, J. KAPLAN, J. Med. Genet. 1999, 36, 447–451. 155 J. M. ROZET, S. GERBER, E. SOUIED, D. DUCROQ, I. PERRAULT, I. GHAZI, G. SOUBRANE COSCAS, J. L. DUFIER, A. MUNNICH, J. KAPLAN, Mol. Genet. Metab 1999, 68, 310–315. 156 A. MARTINEZ-MIR, E. PALOMA, R. ALLIKMETS, C. AYUSO, T. DEL RIO, M. DEAN, L. VILAGELIU, R. GONZALEZDUARTE, S. BALCELLS, Nature Genet. 1998, 18, 11–12. 157 F. C. DELORI, C. K. DOREY, G. STAURENGHI, O. AREND, D. G. GOGER, J. J. WEITER, Invest. Ophthalmol. Vis. Sci. 1995, 36, 718–729. 158 R. ALLIKMETS, Eur. J. Ophthalmol. 1999, 9, 255–265. 159 R. R. ALLIKMETS, Am. J. Hum. Genet. 2000, 67, 487–491. 160 H. SUN, P. M. SMALLWOOD, J. NATHANS, Nature Genet. 2000, 26, 242–246. 161 N. L. MATA, J. WENG, G. H. TRAVIS, Proc. Natl Acad. Sci. USA 2000, 97, 7154–7159. 162 J. WENG, N. L. MATA, S. M. AZARIAN, R. T. TZEKOV, D. G. BIRCH, G. H. TRAVIS, Cell 1999, 98, 13–23. 163 H. SUN, J. NATHANS, J. Biol. Chem. 2001, 276, 11766–11774. 164 J. A. COHN, K. J. FRIEDMAN, P. G. NOONE, M. R. KNOWLES, L. M. SILVERMAN, P. S. JOWELL, N. Engl. J. Med. 1998, 339, 653–658. 165 P. F. PIGNATTI, C. BOMBIERI, C. MARIGO, M. BENETAZZO, M. LUISETTI, Hum. Mol. Genet. 1995, 4, 635–639.
37
38 Human ABC Transporters: Function, Expression, and Regulation
166 M. DEAN, M. B. WHITE, J. AMOS, B. GERRARD, C. STEWART, K. T. KHAW, M. LEPPERT, Cell 1990, 61, 863–870. 167 J. R. RIORDAN, J. M. ROMMENS, B. KEREM, N. ALON, R. ROZMAHEL, Z. GRZELCZAK, J. ZIELENSKI, S. LOK, N. PLAVSIC, J. L. CHOU, Science 1989, 245, 1066–1073. 168 E. M. SCHWIEBERT, D. J. BENOS, M. E. EGAN, M. J. STUTTS, W. B. GUGGINO, Physiol. Rev. 1999, 79, S145–S166. 169 K. KUNZELMANN, News Physiol. Sci. 2001, 16, 167–170. 170 M. SHENG, C. SALA, Annu. Rev. Neurosci. 2001, 24, 1–29. 171 D. B. SHORT, K. W. TROTTER, D. RECZEK, S. M. KREDA, A. BRETSCHER, R. C. BOUCHER, M. J. STUTTS, S. L. MILGRAM. J. Biol. Chem. 1998, 273, 19797–19801. 172 S. WANG, R. W. RAAB, P. J. SCHATZ, W. B. GUGGINO, M. LI, FEBS Lett. 1998, 427, 103–108. 173 S. WANG, H. YUE, R. B. DERIN, W. B. GUGGINO, M. LI. Cell 2000, 103, 169–179. 174 J. CHENG, B. D. MOYER, M. MILEWSKI, J. LOFFING, M. IKEDA, J. E. MICKLE, G. G. CUTTING, M. LI, B. A. STANTON, W. B. GUGGINO, J. Biol. Chem. 2002, 277, 3520–3529. 175 D. T. DRANSFIELD, A. J. BRADFORD, J. SMITH, M. MARTIN, C. ROY, P. H. MANGEAT, J. R. GOLDENRING, EMBO J. 1997, 16, 35–43. 176 R. L. JULIANO, V. LING, Biochim. Biophys. Acta 1976, 455, 152–162. 177 S. V. AMBUDKAR, S. DEY, C. A. HRYCYNA, M. RAMACHANDRA, I. PASTAN, M. M. GOTTESMAN, Annu. Rev. Pharmacol. Toxicol. 1999, 39, 361–398. 178 S. P. COLE, G. BHARDWAJ, J. H. GERLACH, J. E. MACKIE, C. E. GRANT, K. C. ALMQUIST, A. J. STEWART, E. U. KURZ, A. M. DUNCAN, R. G. DEELEY, Science 1992, 258, 1650–1654. 179 Y. HONJO, C. A. HRYCYNA, Q. W. YAN, W. Y. MEDINA-PEREZ, R. W. ROBEY, A. van de LAAR T. LITMAN, M. DEAN, S. E. BATES, Cancer Res. 2001, 61, 6635–6639. 180 H. TAKANO, R. KOIKE, O. ONODERA, S. TSUJI, Cell Biochem. Biophys. 2000, 32, 177–185. 181 H. W. MOSER, Ann. Neurol. 1993, 34, 121–122.
182 H. W. MOSER, Lancet 1993, 341, 544. 183 P. T. CLAYTON, Biochem. Soc. Trans. 2001, 29, 298–305. 184 R. J. WANDERS, E. G. van GRUNSVEN, G. A. JANSEN, Biochem. Soc. Trans. 2000, 28, 141–149. 185 E. H. HETTEMA, H. F. TABAK, Biochim. Biophys. Acta 2000, 1486, 18–27. 186 A. NETIK, S. FORSS-PETTER, A. HOLZINGER, B. MOLZER, G. UNTERRAINER, J. BERGER, Hum. Mol. Genet. 1999, 8, 907–913. 187 P. M. THOMAS, G. J. COTE, D. M. HALLMAN, P. M. MATHEW, Am. J. Hum. Genet. 1995, 56, 416–421. 188 P. M. THOMAS, G. J. COTE, N. WOHLLK, B. HADDAD, P. M. MATHEW, W. RABL, L. AGUILAR-BRYAN, R. F. GAGEL, J. BRYAN, Science 1995, 268, 426–429. 189 Y. HORIKAWA, N. ODA, N. J. COX, X. LI, M. ORHO-MELANDER, M. HARA, Y. HINOKIO, T. H. LINDNER, H. MASHIMA, P. E. SCHWARZ, L. BOSQUE-PLATA, Y. HORIKAWA, Y. ODA, I. YOSHIUCHI, S. COLILLA, K. S. POLONSKY, S. WEI, P. CONCANNON, N. IWASAKI, J. SCHULZE, L. J. BAIER, C. BOGARDUS, L. GROOP, E. BOERWINKLE, C. L. HANIS, G. I. BELL, Nature Genet. 2000, 26, 163–175. 190 L. J. BAIER, P. A. PERMANA, X. YANG, R. E. PRATLEY, R. L. HANSON, G. Q. SHEN, D. MOTT, W. C. KNOWLER, N. J. COX, Y. HORIKAWA, N. ODA, G. I. BELL, C. BOGARDUS, J. Clin. Invest 2000, 106, R69–R73. 191 M. R. ABRAHAM, A. JAHANGIR, A. E. ALEKSEEV, A. TERZIC, FASEB J. 1999, 13, 1901–1910. 192 D. L. GOKSEL, K. FISCHBACH, R. DUGGIRALA, B. D. MITCHELL, L. AGUILARBRYAN, J. BLANGERO, M. P. STERN, P. O’CONNELL, Hum. Genet. 1998, 103, 280–285. 193 A. F. REIS, W. Z. YE, D. DUBOISLAFORGUE, C. BELLANNE-CHANTELOT, J. TIMSIT, G. VELHO, Hum. Genet. 2000, 107, 138–144. 194 J. J. SMIT, A. H. SCHINKEL, R. P. OUDE ELFERINK, A. K. GROEN, E. WAGENAAR, L. van DEEMTER, C. A. MOL, R. OTTENHOFF, N. M. van der LUGT, M. A. van ROON, Cell 1993, 75, 451–462. 195 S. RUETZ, P. GROS, Cell 1994, 77, 1071–1081.
References 196 A. J. SMITH, J. L. TIMMERMANSHEREIJGERS, B. ROELOFSEN, K. W. WIRTZ, W. J. van BLITTERSWIJK, J. J. SMIT, A. H. SCHINKEL, P. BORST, FEBS Lett. 1994, 54, 263–266. 197 K. SCHWARTZ, R. M. LAWN, D. P. WADE, Biochem. Biophys. Res. Commun. 2000, 274, 794–802. 198 L. B. CAVELIER, Y. QIU, J. K. BIELICKI, V. AFZAL, J. F. CHENG, E. M. RUBIN, J. Biol. Chem. 2001, 276, 18046–18051. 199 P. COSTET, Y. LUO, N. WANG, A. R. TALL, J. Biol. Chem. 2000, 275, 28240–28245. 200 G. SCHMITZ, T. LANGMANN, Curr. Opin. Lipidol. 2001, 12, 129–140. 201 T. LANGMANN, M. PORSCH-OZCURUMEZ, S. HEIMERL, M. PROBST, C. MOEHLE, M. TAHER, H. BORSUKOVA, D. KIELAR, W. E. KAMINSKI, E. DITTRICH-WENGENROTH, G. SCHMITZ, J. Biol. Chem. 2002, 277, 14443–14450 202 X. FU, J. G. MENKE, Y. CHEN, G. ZHOU, K. L. MACNAUL, S. D. WRIGHT, C. P. SPARROW, E. G. LUND, J. Biol. Chem. 2001, 276, 38378–38387. 203 M. PORSCH-OZCURUMEZ, T. LANGMANN, S. HEIMERL, H. BORSUKOVA, W. E. KAMINSKI, W. DROBNIK, C. HONER, C. SCHUMACHER, G. SCHMITZ, J. Biol. Chem. 2001, 276, 12427–12433. 204 A. CHAWLA, W. A. BOISVERT, C. H. LEE, B. A. LAFFITTE, Y. BARAK, S. B. JOSEPH, D. LIAO, L. NAGY, P. A. EDWARDS, L. K. CURTISS, R. M. EVANS, P. TONTONOZ, Mol. Cell 2001, 7, 161–171. 205 J. J. REPA, S. D. TURLEY, J. A. LOBACCARO, J. MEDINA, L. LI, K. LUSTIG, B. SHAN, R. A. HEYMAN, J. M. DIETSCHY, D. J. MANGELSDORF, Science 2000, 289, 1524–1529. 206 A. VENKATESWARAN, B. A. LAFFITTE, S. B. JOSEPH, P. A. MAK, D. C. WILPITZ, P. A. EDWARDS, P. TONTONOZ, Proc. Natl Acad. Sci. USA 2000, 97, 12097–12102. 207 A. J. BROWN, W. JESSUP, Atherosclerosis 1999, 142, 1–28. 208 E. ORSO, C. BROCCARDO, W. E. KAMINSKI, A. BOTTCHER, G. LIEBISCH, W. DROBNIK, A. GOTZ, O. CHAMBENOIT, W. DIEDERICH, T. LANGMANN, T. SPRUSS, M. F. LUCIANI, G. ROTHE, K. J. LACKNER, G. CHIMINI, G. SCHMITZ, Nature Genet. 2000, 24, 192–196.
209 W. DROBNIK, H. BORSUKOVA, A. BOETTCHER, A. PFEIFFER, G. LIEBISCH G. J. SCHUTZ H. SCHINDLER, G. SCHMITZ, Traffic 2002, 3, 268–278. 210 A. J. MENDEZ, G. LIN, D. P. WADE, R. M. LAWN, J. F. ORAM, J. Biol. Chem. 2001, 276, 3158–3166. 211 R. M. LAWN, D. P. WADE, M. R. GARVIN, X. WANG, K. SCHWARTZ, J. G. PORTER, J. J. SEILHAMER, A. M. VAUGHAN, J. F. ORAM, J. Clin. Invest. 1999, 104, R25–R31. 212 S. LORKOWSKI, M. KRATZ, C. WENNER, R. SCHMIDT, B. WEITKAMP, M. FOBKER, J. REINHARDT, J. RAUTERBERG, E. A. GALINSKI, P. CULLEN, Biochem. Biophys. Res. Commun. 2001, 283, 821–830. 213 A. VENKATESWARAN, J. J. REPA, J. M. LOBACCARO, A. BRONSON, D. J. MANGELSDORF, P. A. EDWARDS, J. Biol. Chem. 2000, 275, 14700–14707. 214 S. LORKOWSKI, M. KRATZ, C. WENNER, R. SCHMIDT, B. WEITKAMP, M. FOBKER, J. REINHARDT, J. RAUTERBERG, E. A. GALINSKI, P. CULLEN, Biochem. Biophys. Res. Commun. 2001, 283, 821–830. 215 M. A. KENNEDY, A. VENKATESWARAN, P. T. TARR, I. XENARIOS, J. KUDOH, N. SHIMIZU, P. A. EDWARDS, J. Biol. Chem. 2001, 276, 39438–39447. 216 A. VENKATESWARAN, B. A. LAFFITTE, S. B. JOSEPH, P. A. MAK, D. C. WILPITZ, P. A. EDWARDS, P. TONTONOZ, Proc. Natl Acad. Sci. USA 2000, 97, 12097–12102. 217 R. G. GOULD, R. J. JONES, G. V. LEROY, R. W. WISSLER, C. B. TAYLOR, Metabolism 1969, 18, 652–662. 218 G. SALEN, E. H. Jr. AHRENS, S. M. GRUNDY, J. Clin. Invest. 1970, 49, 952–967. 219 A. K. BHATTACHARYYA, W. E. CONNOR, J. Clin. Invest. 1974, 53, 1033–1043. 220 G. SALEN, S. SHEFER, L. NGUYEN, G. C. NESS, G. S. TINT, V. SHORE, J. Lipid Res. 1992, 33, 945–955. 221 R. E. GREGG, W. E. CONNOR, D. S. LIN, H. B. Jr. BREWER, J. Clin. Invest. 1986, 77, 1864–1872. 222 I. BJORKHEM, K. M. BOBERG, Inborn errors in the bile acid biosythesis and* storage of sterols other than cholesterol. The metabolic basis of inherited disease. 7th edition. Vol. 2, C. R. SCRIVER, A. L.
39
40 Human ABC Transporters: Function, Expression, and Regulation
223
224 225
226
227
228 229 230
BEAUDET, W. S. SLY, D. VALLE, editors. McGraw Hill, New York, 2073–2102. S. B. PATEL, G. SALEN, H. HIDAKA, P. O. KWITEROVICH, A. F. STALENHOEF, T. A. MIETTINEN, S. M. GRUNDY, M. H. LEE, J. S. RUBENSTEIN, M. H. POLYMEROPOULOS, M. J. BROWNSTEIN, J. Clin. Invest. 1998, 102, 1041–1044. M. H. LEE, K. LU, S. B. PATEL, Curr. Opin. Lipidol. 2001, 12, 141–149. M. H. LEE, K. LU, S. HAZARD, H. YU, S. SHULENIN, H. HIDAKA, H. KOJIMA, R. ALLIKMETS, N. SAKUMA, R. PEGORARO, A. K. SRIVASTAVA, G. SALEN, M. DEAN, S. B. PATEL, Nature Genet. 2001, 27, 79–83. C. MOEHLE, R. MAURER, M. DEAN, U. BEIL, K. VON BERGMANN, G. SCHMITZ, Hum. Mutat. 2002, 20, 151. W. DROBNIK, B. LINDENTHAL, B. LIESER, M. RITTER, T. C. WEBER, G. LIEBISCH, U. GIESA, M. IGEL, H. BORSUKOVA, C. BUCHLER, W. P. FUNG-LEUNG, K. VON BERGMANN, G. SCHMITZ, Gastroenterology 2001, 120, 1203–1211. J. L. BOYER, J. GRAF, P. J. MEIER, Annu. Rev. Physiol. 1992, 54, 415–438. B. HAGENBUCH, P. J. MEIER, Semin. Liver Dis. 1996, 16, 129–136. G. A. KULLAK-UBLICK, B. HAGENBUCH, B. STIEGER, C. D. SCHTEINGART, A. F. HOFMANN, A. W. WOLKOFF, P. J. MEIER,
Gastroenterology 1995, 109, 1274–1282. 231 M. MULLER, P. L. JANSEN, J. Hepatol. 1998, 28, 344–354. 232 J. A. SILVERMAN, D. SCHRENK FASEB J. 1997, 11, 308–313. 233 J. W. SMIT, A. H. SCHINKEL, B. WEERT, D. K. MEIJER, Br. J. Pharmacol. 1998, 124, 416–424. 234 J. W. SMIT, A. H. SCHINKEL, M. MULLER, B. WEERT, D. K. MEIJER, Hepatology 1998, 27, 1056–1063. 235 A. J. SMITH, J. M. de VREE, R. OTTENHOFF, R. P. OUDE ELFERINK, A. H. SCHINKEL, P. BORST, Hepatology 1998, 28, 530–536. 236 T. GERLOFF, B. STIEGER, B. HAGENBUCH, J. MADON, L. LANDMANN, J. ROTH, A. F. HOFMANN, P. J. MEIER, J. Biol. Chem. 1998, 273, 10046–10050. 237 J. KARTENBECK, U. LEUSCHNER, R. MAYER, D. KEPPLER, Hepatology 1996, 23, 1061–1066. 238 C. C. PAULUSMA, P. J. BOSMA, G. J. ZAMAN, C. T. BAKKER, M. OTTER, G. L. SCHEFFER, R. J. SCHEPER, P. BORST, R. P. OUDE ELFERINK, Science 1996, 271, 1126–1128. 239 J. KONIG, D. ROST, Y. CUI, D. KEPPLER, Hepatology 1999, 29, 1156–1163. 240 J. MADON, B. HAGENBUCH, L. LANDMANN, P. J. MEIER, B. STIEGER, Mol. Pharmacol. 2000, 57, 634–641.
1
The Photoproteins Osamu Shimomura The Photoprotein Laboratory, Falmouth, USA
Originally published in: Photoproteins in Bioanalysis. Edited by Sylvia Daunert and Sapna K. Deo. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31016-6
1 Introduction
In 1961, we found an unusual bioluminescent protein in the jellyfish Aequorea and named it “aequorin” after its genus name [1]. The protein had the ability to emit light in aqueous solutions merely by the addition of a trace of Ca2+ . Surprisingly, it luminesced even in the absence of oxygen. After some studies, we discovered that the light is emitted by an intramolecular reaction that takes place inside the protein molecule, and that the total light emitted is proportional to the amount of protein luminesced. At that time, we simply thought that aequorin was an exceptional protein accidentally made in nature. In 1966, however, we found another unusual bioluminescent protein in the parchment tubeworm Chaetopterus [2]. This protein emitted light when a peroxide and a trace of Fe2+ were added in the presence of oxygen, without the participation of any enzyme. The total light emitted was again proportional to the amount of the protein used. These two examples were clearly out of place in the classic concept of the luciferin–luciferase reaction of bioluminescence, wherein luciferin is customarily a relatively heat stable, diffusible organic substrate and luciferase is an enzyme that catalyzes the luminescent oxidation of a luciferin. Considering the possible existence of many similar bioluminescent proteins in luminous organisms, we have introduced the new term “photoprotein” as a convenient, general term to designate unusual bioluminescent proteins such as aequorin and the Chaetopterus bioluminescent protein [2]. Thus, “photoprotein” is a general term for the bioluminescent proteins that occur in the light organs of luminous organisms as the major luminescent component and are capable of emitting light in proportion to the amount of protein [3]. The proportionality of the light emission makes a clear distinction between a photoprotein and a luciferase. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Photoproteins
In a luciferin–luciferase luminescence reaction, the total amount of light emitted is proportional to the amount of luciferin, not to the amount of luciferase. If a luciferin is a protein, the luciferin is a photoprotein, regardless of the existence or nonexistence of a specific luciferase. A photoprotein could be an extraordinarily stable form of enzyme–substrate complex, more stable than its dissociated forms, an enzyme and a substrate. Because of its high stability, a photoprotein, rather than its dissociated forms, occurs as the primary light-emitting component in the light organs. For example, the light organs of the jellyfish Aequorea contain aequorin, which is highly stable in the absence of Ca2+ , but its components coelenterazine and apoaequorin, both unstable, are hardly detectable in any part of the jellyfish. In the cells of luminous bacteria, the bacterial luciferase forms an intermediate by reacting with FMNH2 and O2 , and this intermediate emits light when a fatty aldehyde is added [4, 5]. However, this intermediate is unstable and short-lived; thus, it does not fit the definition of photoprotein. Presently, there are about 30 different types of bioluminescent systems for which substantial biochemical knowledge is available. About half of these types involve a photoprotein (Table 1). These photoproteins include the Ca2+ -sensitive type from various coelenterates (aequorin, obelin, etc.); the superoxide-activation types from a scale worm (polynoidin) and the clam Pholas (pholasin); the H2 O2 -activation type from a brittle star (Ophiopsila); and the ATP-activation type from a Sequoia millipede (Luminodesmus). For analytical applications, the photoproteins of the Ca2+ -sensitive type and the superoxide-sensitive type (pholasin) have been utilized, and the photoprotein aequorin has been in extensive use in various biological studies for the past 35 years. Each of the various photoproteins are briefly described in the next section, followed by a discussion on the extraction and purification of photoproteins and a more detailed account on the photoprotein aequorin.
2 Various Types of Photoproteins Presently Known 2.1 Radiolarian (Protozoa) Photoproteins
A Ca2+ -sensitive photoprotein that resembles coelenterate photoproteins was isolated from the radiolarian Thalassicola sp., but its properties were not investigated in detail [6]. It is of interest as the only known example of a Ca2+ -sensitive photoprotein other than the coelenterate photoproteins. 2.2 Coelenterate Photoproteins
Several kinds of photoprotein, including aequorin and obelin, were isolated from hydrozoan jellyfishes and hydroids. All of them emit blue light when Ca2+ is added,
2 Various Types of Photoproteins Presently Known 3 Table 1 Photoproteins that have been isolated
Source
Protozoa Thalassicola sp.1 Coelenterata Aequorea aequorea2 Halistaura sp.3 Phialidium gregarium Obelia geniculata Obelia geniculata Obelia longissima Ctenophora Mnemiopsis10 sp. Beroe ovata10 Annelida Chaetopterus variopedatus11
Name
Mr
Thalassicolin 21 500
440
Ca2+ Ca2+ Ca2+ Ca2+ Ca2+
465 470 474
Mnemiopsin-1 Mnemiopsin-2 Berovin
24 000 27 500 25 000
Ca2+ Ca2+ Ca2+
485 485 485
120 000 184 000
Fe2+ , hydroperoxide, and O2 Fe2+ , H2 O2 , and O2
455
Peroxidase or Fe2+ , plus O2 Alkaline pH? and O2 Catalase, H2 O2 , and O2
490
60 000
ATP, Mg2+ , and O2
496
45 000
H 2 O2
482
500 000
Mollusca Pholas dactylus13
Pholasin
34 600
Symplectin
60 000
Campbell et al. [7] Shimomura [50] 3 Shimomura et al. [71] 4 Levine and Ward [31] 5 Inouye and Tsuji [76] 6 Stephenson and Sutherland [77] 7 Illarionov et al. [22] 8 Morin and Hastings [36] 9 Markova et al. [34] 10 Ward and Seliger [80] 11 Shimomura and Johnson [57] 12 Nicolas et al. [40] 13 Michelson [35] 14 Fujii et al. [27] 15 Tsuji et al. [79] 16 Shimomura unpublished 17 Shimomura [66, 68] 18 Shimomura [49] 2
Ca2+
23 000 21 600 21 0006 21 000 22 2007
Polynoidin
1
Luminescence maximum (nm)
Aequorin Halistaurin Phialidin 4 , clytin5 Obelin Obelin Obelin
Hamothoe lunulata12
Symplectoteuthis oualaniensis14 Symplectoteuthis luminosa16 Diplopoda Luminodesmus sequoia17 Echinodermata Ophiopsila californica18
Requirements for luminescence
50 000
4758 4859 4959
510
47015
4 The Photoproteins
regardless of the presence or absence of oxygen. The coelenterate photoproteins are suitable for use in the detection and measurement of trace amounts of Ca2+ , and aequorin has been widely used in the studies of Ca2+ in various biological systems, including single cells [2, 4]. The overwhelming popularity of this type of photoprotein compared with the other types sometimes leads to the misconception that the photoproteins are Ca2+ -sensitive bioluminescent proteins. Detailed studies have been made with only three kinds of photoproteins: aequorin obtained from Aequorea, obelin obtained from Obelia, and phialidin (clytin) obtained from Phialidium. Aequorin was isolated from Aequorea aequorea by Shimomura et al. [1], and its purification was described in several papers [5, 9, 10]. The recombinant form of aequorin has been made [12–14]. Obelin was isolated from Obelia geniculata [6] and also from O. australis and O. geniculata [16]. Its recombinant form was prepared by Illarionov et al. [17]. Phialidin was isolated from Phialidium gregarium by Levine and Ward [18] and was cloned by Inouye and Tsuji [19]; the recombinant protein was named clytin. All coelenterate photoproteins have a molecular weight close to 20 000. The concentrated solutions of purified photoproteins are slightly yellowish (weak absorption at about 460 nm) and non-fluorescent except for ordinary protein fluorescence. After Ca2+ -triggered luminescence, the solutions turn colorless and become fluorescent in blue (emission λmax 460 nm). The intensity of the blue fluorescence is dependent on the concentrations of the spent protein and Ca2+ ; however, the fluorescence intensity is not proportional to the concentration of the protein [20]. In the case of aequorin, the emission spectrum of blue fluorescence is almost superimposable on the emission spectrum of Ca2+ -triggered luminescence, suggesting that the blue fluorescent chromophore formed in the luminescence reaction is probably the light emitter [21]. As a Ca2+ indicator, aequorin is useful at a concentration range of Ca2+ between 10−7.5 M and 10−4.5 M [5], whereas obelin is useful at a range between 10−6.5 M and 10−3.5 M under similar conditions [16]. The Ca2+ sensitivity of phialidin is about equal to that of obelin [22]. It should be noted here that the Ca2+ sensitivity and certain other properties of aequorin, and probably of all coelenterate photoproteins, can be modified by replacing the coelenterazine moiety of the photoprotein with its analogues (explained later). The chemistry of the bioluminescence reaction of aequorin has been elucidated in considerable detail and will be described later in this chapter. The reaction mecha nisms of all hydrozoan photoproteins are believed to be essentially identical with that of aequorin. However, the luminescence reaction differs in luminous anthozoans, which are taxonomically closely related to hydrozoan. The luminous species of anthozoans contain a luciferin (coelenterazine) and a species-specific luciferase instead of a photoprotein, although the presence of a small amount of Ca2+ -sensitive photoprotein is suspected in some species, such as the sea pen Ptilosarcus gurneyi and the sea cactus Cavernularia obesa [23]. Spent aequorin that has been luminesced with Ca2+ can be regenerated into the active original form by incubation with coelenterazine in the presence of O2 and a low concentration of 2-mercaptoethanol [24]. The regenerated aequorin is
2 Various Types of Photoproteins Presently Known 5
indistinguishable from the original aequorin in every aspect of its properties. The yield of the regeneration is practically 100% when the protein concentration is over 0.1 mg mL−1 [25]. Thus, a sample of aequorin can be luminesced and recharged repeatedly. The regeneration of spent photoprotein takes place also with obelin [6], as well as with halistaurin and phialidin (unpublished results). 2.3 Ctenophore Photoproteins
The photoproteins mnemiopsin and berovin were isolated from Mnemiopsis sp. and Beroe lovata, respectively [26]. They are Ca2+ -sensitive photo proteins that are similar to aequorin, except that these photoproteins are photosensitive. The absorption maximum of mnemiopsin-2 is 435 nm, which is about 20 nm shorter than that of aequorin. The photosensitivity of ctenophore photoproteins is strikingly different from that of aequorin. Mnemiopsin and berovin are extremely sensitive to light [13], being easily inactivated by a broad spectral range of light (wavelength 230–570 nm) [28]. Aequorin and other hydrozoan photoproteins are not affected by light. Photoinactivated mnemiopsin, as well as spent mnemiopsin after Ca2+ -triggered luminescence, can be regenerated into its active form by incubation with coelenterazine in the presence of oxygen, like aequorin; however, the regeneration takes place only at a narrow pH range around 9.0 [1]. 2.4 Pholasin (Pholas Luciferin)
The boring clam Pholas dactylus is historically important in the field of bioluminescence because it was one of the two luminous species with which Dubois first demonstrated luciferin–luciferase luminescence in 1887. Thus, the luminescence of Pholas was originally considered to be a luciferin–luciferase reaction involving Pholas luciferin and Pholas luciferase. However, Pholas luciferin is a glycoprotein with a molecular weight of 34 600 [19, 21, 32]. Therefore, it is appropriate to call this luciferin a photoprotein. The name “pholasin” was first used by Roberts et al. [33]. The ultraviolet absorption spectrum of pholasin shows a bulge at about 325 nm, in addition to the protein peak at 280 nm. Pholasin emits light (λmax 490 nm) in the presence of various substances such as Pholas luciferase, ferrous ions, H2 O2 , peroxidase, superoxide anions, hypochlorite, and certain other oxidants, all in the presence of molecular oxygen [19–22, 35]. Thus, Pholas luciferase is clearly not an essential component for the luminescence of pholasin. The luminescence reaction of pholasin with Pholas luciferase is optimum at pH 8–9 and at an ionic strength of about 0.5 M, giving a quantum yield of 0.09 for pholasin [32]. According to Reichl et al. [36], the addition of horseradish peroxidase compounds I and II to pholasin induces an intense luminescence. Moreover, the addition of H2 O2 to a mixture of
6 The Photoproteins
myeloperoxidase and pholasin gives an intense burst of light. The chromophore of pholasin is still not chemically identified. The cloning and expression of apopholasin was achieved by Dunstan et al. [9], but attempts to reconstitute the recombinant apopholasin into pholasin by the addition of an acidic methanol extract of Pholas failed, although the mixture gave luminescence by the addition of sodium hypochlorite. Pholasin is commercially available from Knight Scientific, Plymouth, UK. The main application of pholasin is the measurement of oxygen radicals. 2.5 Chaetopterus Photoprotein
The photoprotein of the parchment tubeworm Chaetopterus variopedatus purified by chromatography has a molecular mass of approx. 120-130 kDa [2, 38]. The protein is amorphous when precipitated with ammonium sulfate, but it can be converted into a crystalline form with an increased molecular mass of 184 kDa by slow crystallization with ammonium sulfate. The photoprotein emits light in the presence of Fe2+ , a peroxide, and molecular oxygen. As a peroxide, H2 O2 can be used, but an unidentified hydroperoxide existing in old dioxane or tetrahydrofuran was far more effective. Two kinds of additional activators were found to give brighter luminescence, but they were not identified. The light emission of this photoprotein is strongly affected by the pH of the medium, showing a peak at pH 7.7 with a sharp decrease at both sides (50% decreases at pH 6.5 and pH 8.3); the light intensity is not significantly influenced by the salt concentration up to 1 M when tested with NaCl. The optimum temperature for the luminescence intensity is 22 ◦ C. With this photoprotein, a concentration of Fe2+ as low as 0.1 µM can be detected. The purified photoprotein is practically colorless, and its absorption spectrum shows, in addition to the 280-nm protein absorption peak, a very slight absorption in the region of 330–380 nm, although its significance is unclear. A solution of the photoprotein is moderately blue fluorescent, with a fluorescence emission maximum at 453–455 nm and an excitation maximum at 375 nm, and these peaks do not significantly change after the luminescence reaction. The luminescence spectrum of purified photoprotein (λmax 453–455 nm) closely matched with the fluorescence emission spectrum. 2.6 Polynoidin
A membrane photoprotein isolated from the scales of the scale worm Harmothoe lunulata was named “polynoidin” [39]. The purified photoprotein (Mr 500 000) emits light in the presence of molecular oxygen (λmax 510 nm) by the action of sodium hydrosulfite, the xanthine–xanthine oxidase system, Fenton’s reagent (H2 O2 plus Fe2+ ), or other reagents that produce superoxide radicals. The photoprotein luminescence was 30% brighter in phosphate buffer than in Tris buffer, and the luminescence response was significantly increased by including a complexing agent
2 Various Types of Photoproteins Presently Known 7
such as EGTA. However, the injection of polynoidin solution into the mixture of H2 O2 and Fe2+ failed to produce light; Fe2+ must be added last to initiate light emission. The photoprotein is not fluorescent (except for usual protein fluorescence) after the bioluminescence reaction or before the reaction. The requirements for its luminescence reaction are similar to that of the bioluminescence systems of Pholas and Chaetopterus, suggesting the involvement of a common basic mechanism in these luminescence systems. 2.7 Symplectin
The luminescent substance of the squid Symplectoteuthis oualaniensis was first obtained in the form of insoluble particles by Tsuji et al. [40]. The suspension of the particles emitted light in the presence of monovalent cations such as K+ , Rb+ , Na+ , Cs+ , NH4 + , and Li+ (in decreasing order of effect). Molecular oxygen was needed for the luminescence. Divalent ions such as Ca2+ and Mg2+ did not trigger light emission. The light emission (λmax 470 nm) was optimal in the presence of 0.6 M KCl or NaCl and at a pH of 7.8. The soluble form of the Symplectoteuthis photoprotein was isolated and purified from the granular light organs of the squid and was named “symplectin” [27, 41]. The light organs were first extracted with a pH 6 buffer containing 0.4 M KCl to remove impurities, and then symplectin was extracted from the residue with a pH 6 buffer containing 0.6 M KCl. All solutions used in the experiments contained 0.25 M sucrose, 1 mM dithiothreitol, and 1 mM EDTA. The 0.6 M KCl extract was chromatographed by size-exclusion HPLC on a TSK G3000SW column. Symplectin was eluted as two major components of oligomers, having molecular masses of 200 kDa or more, and a minor component of monomer (60 kDa). All processes of extraction and purification were carried out at 4 ◦ C. Warming up a solution of symplectin, adjusted to pH 8, to room temperature causes the luminescence reaction to begin, and the light emission lasts for hours. A tryptic digestion of the KCl extract increased the content of the 60-kDa species at the expense of the two high-molecular-weight species, accompanied by the formation of 40-kDa and 16-kDa species. SDS-PAGE analysis of the two high-molecular-weight oligomers revealed that they consist mainly of the 60kDa protein. The 60-kDa protein and the 40-kDa protein were fluorescent in the SDS-PAGE analysis. The spent protein of symplectin after luminescence (aposymplectin) could be reconstituted into original symplectin by treatment with dehydrocoelenterazine [27]. 2.8 Luminodesmus Photoprotein
This is presently the only example of a photoprotein of terrestrial origin. The millipede Luminodesmus sequoia [32] emits light from the surface of its whole body
8 The Photoproteins
continuously day and night. The photoprotein extracted and purified from this organism emits light (λmax 496 nm) when ATP and Mg2+ are added in the presence of molecular oxygen [11, 45]. Thus, the luminescence system of Luminodesmus resembles that of the fireflies in that it requires ATP and Mg2+ , but it differs in that it needs only the photoprotein rather than the luciferin and luciferase required in the firefly system. The molecular weight of the photoprotein is 104 000, which is close to the molecular weight of firefly luciferase reported earlier (100 000) but larger than its newer value 62 000 [46]. Although it was suspected that the photoprotein might be a complex of a firefly-type luciferase and firefly luciferin, firefly luciferin itself was not detected in this photoprotein. Recently, the possible presence of a porphyrin chromophore in the photoprotein has been suggested, although the role of this chromophore in the light-emitting reaction is unclear [47]. Using the luminescence system of Luminodesmus, 0.01 µM ATP and 1 µM Mg2+ can be detected. 2.9 Ophiopsila Photoprotein
The brittle star Ophiopsila californica is abundant around Catalina Island, off the coast of Los Angeles [48]. An animal of average size weighs about 3–4 g, and has five arms of about 10 cm long. The purified photoprotein luminesces in the presence of H2 O2 , emitting a greenish-blue light (λmax 482 nm). Molecular oxygen is probably not needed for the luminescence reaction. The molecular weight of Ophiopsila photoprotein is estimated to be about 45 000 by gel filtration. The absorption spectrum of a solution of the photoprotein showed a small peak (λmax 423 nm, with a shoulder at about 450 nm) in addition to the 280-nm protein peak. The 423-nm peak decreased slightly through the H2 O2 -triggered luminescence reaction, accompanied by a slight red shift of the peak. The photoprotein was fluorescent in greenish-blue (emission λmax 482 nm; excitation λmax 437 nm), and the fluorescence emission spectrum exactly coincided with the luminescence spectrum of photoprotein in the presence of H2 O2 , suggesting the possibility that the fluorescent chromophore might be the light emitter. However, the fluorescence emission of the photoprotein did not show any detectable change after the H2 O2 -triggered luminescence reaction; an anticipated increase in the 482-nm fluorescence did not occur.
3 Basic Strategy of Extracting and Purifying Photoproteins
Photoproteins are usually highly reactive, unstable substances, like luciferins. Their luminescence activities are easily lost by spontaneous light emission and various other causes. In isolating active photoproteins, it is extremely important to pay special attention to prevent the loss of the luminescence activity. Compared with the isolation of luciferins, however, techniques available for isolating photoproteins are somewhat limited because of their protein nature.
4 The Photoprotein Aequorin
The basic principle is to extract a photoprotein in an aqueous solution and purify the photoprotein by various means of protein purification, all under conditions that prevent the luminescence and denaturation of the protein molecules. Thus, the luminescence system must be reversibly inhibited during the extraction and purification of a photoprotein. The method of reversible inhibition differs depending on the nature and cofactor requirement of the system to be isolated. For example, the calcium chelator EDTA or EGTA is used to inhibit the luminescence of the Ca2+ -sensitive photoproteins of coelenterates and ctenophores such as aequorin, obelin, and mnemiopsin [1, 6, 13, 26]. Before the discovery of the Ca2+ requirement, however, aequorin was extracted with a pH 4 buffer that reversibly inactivated the photoprotein [1, 49]. In the case of the luminescence systems of Chaetopterus and Pholas, the metal ion inhibitors 8-hydroxyquinoline and diethyldithiocarbamate, respectively, were used [2, 17]. The ionic strength and the pH of buffers are also important, and these conditions should be chosen to optimize the yield of active photoprotein. The use of acidic buffers, pH 5.6–5.8, was effective in suppressing spontaneous luminescence during the extraction of the photoproteins of euphausiids and Luminodesmus [25, 51]. In the case of the membrane photoprotein polynoidin and the squid photoprotein symplectin, easily soluble impurities were all washed out and the substances that cause the luminescence of the photoprotein are completely removed before the solubilization of the photoproteins; thus, inhibitors were not needed [27, 39].
4 The Photoprotein Aequorin 4.1 Extraction and Purification of Aequorin
Aequorin is the best-known photoprotein and has been used widely in various applications. The first step in the extraction of aequorin from the jellyfish Aequorea (average body weight 50 g) is to cut off the circumferential margin of umbrella that contains light organs, making about 2-mm-wide strips commonly called “rings”. This process is important because it eliminates about 99% of unnecessary body parts that do not contain aequorin. The rings can be made efficiently by using specially made cutting devices [5, 29] or, much less efficiently, with scissors. The rings (about 0.5 g each) containing light organs are kept in cold seawater. Then, about 500 rings are shaken vigorously by hand with cold, saturated ammonium sulfate solution containing 50 mM EDTA [28] or with cold seawater [28] to dislodge the particles of light organs from the rings. Then, the rings are removed by filtering through a net of Dacron or Nylon (50–100 mesh), and the light organ particles suspended in the filtrate are collected by filtration on a B¨uchner funnel with the aid of some Celite. The light organ particles in the filter cake are cytolyzed and the aequorin therein is extracted by shaking with cold 50 mM EDTA (pH 6.5). After filtration, crude aequorin is precipitated by saturation with ammonium sulfate.
9
10 The Photoproteins
Of the two methods of shaking the rings mentioned above, using seawater results in much cleaner crude extracts, with a little less yield, than are obtainable by shaking in saturated ammonium sulfate containing EDTA. On the other hand, saturated ammonium sulfate strongly inhibits the luminescence response of the photogenic particles to mechanical stimulation such as shaking and stirring, and it also salts out and stabilizes aequorin, thus resulting in a better yield of aequorin and less effect on the isoform composition of aequorin extracted, compared with that obtainable by shaking in seawater. With regard to the purification of aequorin, Blinks et al. [5] described a welldesigned method for purifying an aequorin extract that has been obtained by the “seawater shaking method”. The method included gel filtration on Sephadex G-50 and ion-exchange chromatography on DEAE-Sephadex A-50 and QAE-Sephadex A-50. The ion exchangers effectively separated the green fluorescent protein from aequorin. For the purification of the extract obtained by the “saturated ammonium sulfate shaking method”, gel filtration on a column of Sephadex G-75 or G-100, using buffers containing 1 M ammonium sulfate and not containing ammonium sulfate, and ion-exchange chromatography on DEAE cellulose have been used [9, 10, 28]. Aequorin in 1 M ammonium sulfate aggregates to a larger size (Mr > 50 000). Thus, crude aequorin is first chromatographed on Sephadex G-100 with a low-salt buffer not containing ammonium sulfate, then the aequorin fraction obtained is rechromatographed on the same column using a buffer containing 1 M ammonium sulfate to obtain purified aequorin. Using Sephadex G-50 is not recommended in this case, at least for the initial step, because of the presence of a large amount of the aggregated form of impurities. 4.1.1 Hydrophobic Interaction Chromatography Butyl-Sepharose 4 Fast Flow (Pharmacia) is an excellent medium for purifying aequorin, supplementing the methods described above. Aequorin in a buffer solution containing 5–10 mM EDTA and 1.8 M ammonium sulfate is adsorbed on the column, and then aequorin is eluted with a buffer containing decreasing concentrations of ammonium sulfate and 5 mM EDTA. Aequorin elutes at an ammonium sulfate concentration between 1 M and 0.5 M. Because apoaequorin elutes at ammonium sulfate concentrations lower than 0.1 M, aequorin is cleanly separated from apoaequorin. Thus, it is possible to prepare virtually pure samples of aequorin using a single column as follows. The aequorin sample is first luminesced by the addition of a sufficient amount of Ca2+ . The spent solution, after dissolving 1 M ammonium sulfate, is adsorbed on a column of butyl-Sepharose 4. The apoaequorin adsorbed on the column is eluted with decreasing concentration of ammonium sulfate starting from 1 M; apoaequorin elutes at an ammonium sulfate concentration lower than 0.1 M. The apoaequorin eluted is regenerated with coelenterazine in the presence of 5 mM EDTA and 5 mM 2-mercaptoethanol (see the Section 4.4.3). The solution of regenerated aequorin in 1.8 M ammonium sulfate is adsorbed on the same butyl-Sepharose 4 column. Aequorin adsorbed on the column is eluted with a decreasing concentration of ammonium sulfate, yielding highly purified aequorin.
4 The Photoprotein Aequorin
4.2 Properties of Aequorin
Aequorin is a conjugated protein that has a relative molecular mass of approximately 20 000–21 000 [10], and it contains a functional chromophore corresponding to roughly 2% of the total weight. A concentrated solution of aequorin is yellowish because of its absorption peak (λmax about 460 nm), in addition to a protein absorption peak at 280 nm (A1%,1cm 30.0; [50]). Aequorin is non-fluorescent, except for a weak ultraviolet fluorescence that is due to its protein moiety. Natural aequorin is a mixture of isoforms, containing more than a dozen of them, designated aequorins A, B, C, etc. [4, 50]. The isoelectric points of these isoforms lie between 4.2 and 4.9 [3]. The solubility of aequorin in aqueous buffers is generally greater than 30 mg mL−1 [56]. Aequorin can be salted out from aqueous buffers with ammonium sulfate, although the salting out is not complete even after the complete saturation of ammonium sulfate. Usually 1–2% of aequorin remains in the solution. One milligram of aequorin emits 4.3–5.0 × 1015 photons at 25 ◦ C when Ca2+ is added, at a quantum yield of 0.16 [9, 21, 50, 56]. In the presence of an excess of Ca2+ , the luminescence reaction of aequorin has a rate constant of 100–500 s−1 for the rise and 0.6–1.25 s−1 for the decay [15, 33]. 4.2.1 Stability Aequorin is always emitting a low level of luminescence, spontaneously deteriorating by itself. Thus, the information concerning its stability is important when aequorin is used as a calcium probe. The stability of aequorin in aqueous solutions containing EDTA or EGTA varies widely by temperature, pH, concentration of salts, and impurities. To minimize the deterioration of aequorin, it is most important to keep the temperature as low as possible. The half-life of aequorin in 10 mM EDTA, pH 6.5, is about 7 days at 25 ◦ C. At room temperature, aequorin is most stable in solutions containing 2 M ammonium sulfate or when it is precipitated from saturated ammonium sulfate. Freeze-dried aequorin is also stable, but the process of drying always causes a loss of luminescence activity (see below). All forms of aequorin are satisfactorily stable for many years at -50 ◦ C or below, but all deteriorate rapidly at temperatures above 30–35 ◦ C. A solution of aequorin should be stored frozen whenever possible because repeated freeze-thaw cycles do not harm aequorin activity. 4.2.2 Freeze-drying A note on freeze-dried aequorin may be appropriate here, because most commercial preparations of aequorin are sold in a dried form. The process of freeze-drying aequorin always results in some loss of luminescence activity. Therefore, aequorin should not be dried if a fully active aequorin is required. The loss is about 5% at the minimum, typically about 10%. The loss can be slightly lessened by certain additives; the addition of 50–100 mM KCl and some sugar (50–100 mM) in the buffer seems to be beneficial. The buffer composition used for the freeze-drying of aequorin at the author’s laboratory is as follows: 100 mM KCl, 50 mM glucose, 3 mM HEPES, 3 mM Bis-Tris, and 0.05 mM EDTA, pH 7.0.
11
12 The Photoproteins
4.2.3 Specificity to Ca2+ Several kinds of cations other than Ca2+ elicit the light emission of aequorin. Some lanthanide ions (such as La3+ and Y3+ ) trigger the luminescence of aequorin as efficiently as Ca2+ . In addition, Sr2+ , Pb2+ , and Cd2+ cause significant levels of luminescence; Cu2+ and Co2+ give some luminescence only in slightly alkaline buffer. However, Be2+ , Ba2+ , Mn2+ , Fe2+ , Fe3+ , and Ni2+ do not elicit any light from aequorin [59]. In testing biological systems, however, aequorin is considered to be highly specific to Ca2+ , because the occurrence of a significant amount of metal ions other than Ca2+ is unlikely. In an in vitro test, all of these metal ions except Ca2+ , Sr2+ , and lanthanoids could be completely masked by including 1 mM sodium diethyldithiocarbamate in the test solution [60]. 4.3 Luminescence of Aequorin by Substances Other Than Divalent Cations
As already mentioned, all forms of aequorin emit photons spontaneously and constantly, regardless of its molecular status or environment conditions, even in the absence of Ca2+ or in the presence of a large excess of EDTA. The light emission results in a gradual deterioration of the luminescence capability of aequorin. A luminescence intensity of this type is quite low at 0 ◦ C, though it can be easily measured with a light meter. The intensity is temperature dependent and steeply increases with rising temperature, reaching a maximum intensity at around 60 ◦ C [56]. Such a temperature-dependent luminescence occurs with aequorin dissolved in aqueous solutions, as well as with freeze-dried aequorin and its suspension in a certain organic solvents, such as toluene, acetone, and diglyme (bis-2-methoxyethyl ether). The quantum yield of the spontaneous luminescence of dried aequorin, when warmed with or without an organic solvent, is generally in the range of 0.003–0.005, whereas that of aequorin in aqueous solutions is considerably less (about 0.001 at 43 ◦ C). Aequorin also emits luminescence in the presence of thiol-modification reagents such as p-benzoquinone, Br2 , I2 , N-bromosuccinimide, N-ethylmaleimide, iodoacetic acid, and p-hydroxymercuribenzoate [20]. The luminescence is probably caused by the conformational change of the protein that results from the modification of cysteine residues (by causing the decomposition of the coelenterazine peroxide moiety). The luminescence is weak but lasts for more than one hour. The quantum yields in this type of luminescence never exceed 0.02 (about 15% of Ca2+ -triggered luminescence) at 23–25 ◦ C. To prevent this type of luminescence, any reagents that might react with an SH group should be avoided. 4.4 Mechanism of Aequorin Luminescence and Regeneration of Aequorin 4.4.1 Structure of Aequorin Aequorin is a globular protein with three “EF-hand” domains to bind Ca2+ , and it accommodates a peroxidized coelenterazine in the central cavity of the protein [16].
4 The Photoprotein Aequorin
The presence of a peroxy group bound to position 2 of the coelenterazine moiety was previously suggested [29] and confirmed by 13 C nuclear magnetic resonance spectroscopy [39]. The protein conformation of aequorin is much more compact and rigid than that of apoaequorin, consistent with the results of the fluorescence polarization studies and the papain digestion of those proteins [30]. The functional group, peroxidized coelenterazine, is shielded from outside solvent. Therefore, no reagent can react with this group without first reacting with the residues of the protein, and any reaction with the protein residues triggers the breakdown of the peroxidized coelenterazine. 4.4.2 Luminescence Reaction In the case of aequorin reacting with Ca2+ , a conformational change of protein takes place when one molecule of aequorin is bound with two Ca2+ ions [52]. The conformational change results in the cyclization of the peroxide of coelenterazine into the corresponding dioxetanone, which instantly decomposes and produces the excited state of coelenteramide and CO2 [20, 29]. When the energy level of the excited state of coelenteramide falls to ground state, light is emitted. A simplified mechanism of the luminescence reaction is illustrated in Fig. 1. The spent solution of the luminescence reaction of aequorin is a mixture of coelenteramide, apoaequorin, and Ca2+ that forms a complex called “blue fluorescent protein” (fluorescence emission maximum about 465–470 nm). The dissociation constant of the complex into coelenteramide plus apoaequorin in the presence of 0.5 mM Ca2+ is 7 × 10−6 M at pH 7.4 and 25 ◦ C by Morise et al. [20]; (based on the molecular weight of aequorin 21 000). Thus, the luminescence reaction product of aequorin is usually blue fluorescent, unless the concentration of aequorin used is too low (much less than 1 µM) to form the fluorescent complex. The blue fluorescence of the complex (λmax 465–470 nm) closely matches the bioluminescence emission of aequorin, giving a basis to the postulation that the fluorescent complex is the light emitter of aequorin bioluminescence [21], although it now seems an oversimplification considering that the conformation of apoaequorin continues to change for several minutes after the light emission. When the light emission of aequorin is measured in low-ionic-strength buffers containing no inhibitor, the log-log plot of the luminescence intensity versus Ca2+ concentration gives a sigmoid curve having a maximum slope of about 2.0 for its middle part [10, 65], indicating that the binding of two Ca2+ ions to one molecule of aequorin is required to trigger the luminescence of aequorin. 4.4.3 Regeneration Apoaequorin can be reconstituted into aequorin by incubation with coelenterazine in the presence of O2 and 2-mercaptoethanol, which the role of the latter substance is to protect the functional sulfhydryl groups of apoaequorin during the regeneration reaction [24]. For the regeneration reaction to occur, there is no need to separate coelenteramide from apoaequorin if the material contains it. Usually, the product of the luminescence reaction is incubated at 0–5 ◦ C in a pH 7.5 buffer solution containing 5 mM EDTA, 3 mM 2-mercaptoethanol, and an excess of coelenterazine
13
14 The Photoproteins
Fig. 1 Schematic illustration of a simplified mechanism of the luminescence and regeneration of aequorin. Aequorin (upper left) is a globular protein that contains peroxidized coelenterazine sealed in its central cavity and has three EF-hand Ca2+ -binding sites on the outside. When the protein is bound with two Ca2+ ions, an intramolecular reaction starts, resulting in the formation of coelenteramide and CO2 , accompanied by the emission of blue light
(8max 460 nm) and opening of the protein shell (upper right). The protein part, apoaequorin (bottom), can be regenerated into the original aequorin by incubation with coelenterazine and molecular oxygen in the absence of Ca2+ . In the regeneration reaction, addition of a low concentration of 2-mercaptoethanol increases the yield of regenerated aequorin by protecting the functional cysteine residues of apoprotein.
(at least 2 µg mL−1 more than the calculated amount). The regeneration is usually 50% complete within 30 min and practically 100% complete after 3 h. When the regeneration reaction of apoaequorin is carried out in the presence of an excess of free Ca2+ , rather than in 5 mM EDTA, the result is a continuous, weak light emission from the reaction mixture. This weak luminescence lasts many hours, differing from the short, bright flash of the Ca2+ -triggered luminescence of aequorin. The weak luminescence of the regeneration mixture in the presence of Ca2+ can be intensified several times by including 0.5% diethylmalonate in the reaction medium [45]. During the regeneration in the presence of Ca2+ described above, apoaequorin appears to be acting as an enzyme that catalyzes the luminescent oxidation of coelenterazine. The mechanism involved might be a simple, straightforward one: aequorin is first formed, and then it instantly reacts with Ca2+ to emit light. This simple mechanism, however, has no experimental support at present; the regeneration reaction of aequorin in the presence of EDTA was not activated by diethylmalonate, suggesting either that Ca2+ is needed in the activation by diethylmalonate or that aequorin is not an intermediate in the luminescence reaction in the
4 The Photoprotein Aequorin
presence of Ca2+ [45]. Whatever the mechanism, apoaequorin must be a very sluggish enzyme if it is an enzyme. Apoaequorin has a turnover number of 1–2 per hour [10]. 4.5 Inhibitors of Aequorin Luminescence
All thiol-modification reagents cause weak, spontaneous luminescence of aequorin in the absence of Ca2+ , as already mentioned. They are in effect inhibitors of the Ca2+ -triggered luminescence of aequorin, because the quantum yields of aequorin in the luminescence caused by these reagents (∼0.008) are much lower than that of the Ca2+ -triggered luminescence of aequorin [20]. Bisulfite, dithionite, and p-dimethyaminobenzaldehyde are all strongly inhibitory even at micromolar concentrations [1]. It has been found that the functional group of aequorin, i.e., a peroxide of coelenterazine, decomposes without light emission when the photoprotein is treated with bisulfite or dithionite, resulting in the formation of a corresponding hydroxy-coelenterazine or coelenterazine [63]. A number of inorganic and organic substances at high concentrations (> 50 mM) suppress the luminescence intensity of the Ca2+ -triggered light emission. Thus, KCl (100–150 mM) used in physiological buffers is significantly inhibitory. Magnesium ions are inhibitory at millimolar concentrations, probably by competing with Ca2+ [4]. EDTA and EGTA can inhibit the Ca2+ -triggered luminescence of aequorin in two ways: (1) when free Ca2+ is removed from the reaction medium by chelation, the luminescence reaction is practically stopped; and (2) when the free (unchelated) forms of these chelators directly bind with the molecules of aequorin, inhibition results [30, 44]. The second type of inhibition is strong in solutions of low ionic strength and in the absence of other inhibitor ions such as Mg2+ , but it is relatively weak in the presence of 0.1 M KCl [47], presumably because aequorin is already inhibited by KCl. Therefore, great care must be taken if EDTA or EGTA is to be used in the calibration of the Ca2+ sensitivity of aequorin; this is especially important in the case of low-ionic-strength calcium buffers. It should also be noted that in usual calcium buffers, the lower the Ca2+ concentration, the higher the inhibitory free chelator concentration, resulting in a slope steeper than the true slope in the log–log plot of luminescence intensity versus Ca2+ concentration. 4.6 Recombinant Aequorin
The cloning and expression of apoaequorin cDNA was accomplished by two independent groups in 1985. One of these groups analyzed the cDNA clone AQ440 they obtained and reported that apoaequorin is composed of 189 amino acid residues (Mr 21 400) with an NH2 -terminal valine and a COOH-terminus proline [12, 12], which is consistent with the results of the amino acid sequence analysis of native aequorin reported by Charbonneau et al. [8]. In contrast, the other group reported that the
15
16 The Photoproteins
cDNA AEQ1 they obtained contains the entire protein-coding region of 196 amino acid residues, which includes seven additional residues attached to the N-terminus, and the apoaequorin expressed in Escherichia coli showed a molecular weight of 20 600 [13, 14]. The recombinant aequorin of the former group did not exactly match any of the isoforms of natural aequorin in the HPLC mobilities and the properties of Ca2+ -triggered luminescence [70]. No detailed comparison has been made with the recombinant aequorin obtained by the latter group, although a brief test indicated that the recombinant aequorins from both sources are practically identical. 4.7 Semi-synthetic Aequorins
The core cavity of the aequorin molecule can accommodate various synthetic analogues of coelenterazine in place of coelenterazine. The coelenterazine moiety in native aequorin can be replaced by a simple process. First, aequorin is luminesced by the addition of Ca2+ , and then the apoaequorin produced is regenerated with an analogue of coelenterazine in the presence of EDTA, 2-mercaptoethanol (or DTT), and molecular oxygen. The products are called semi-synthetic aequorins and are identified with an italic prefix (see Table 2). Semi-synthetic aequorins can be prepared from both native aequorin and recombinant aequorin, using various synthetic analogues of coelenterazine. A large number of coelenterazine analogues were synthesized, and about 50 kinds of semi-synthetic aequorins have been prepared and tested [70–73]. Some semi-synthetic aequorins are significantly different from the native type of aequorin in various properties, including spectral characteristics [51] and sensitivity to Ca2+ , the rate of luminescence reaction, and the rise time of luminescence (Table 2). The relationship between Ca2+ concentration and the initial light intensity of various semi-synthetic aequorins is shown in Fig. 2. As is apparent from the data of Table 2 and Fig. 2, the tolerance of the central cavity of the aequorin molecule in accommodating the coelenterazine moiety is surprisingly wide. The apparent limitations in the substitution of the coelenterazine moiety, and the changes in the Ca2+ sensitivity caused by the substitution, are as follows. 1. The group R1 must be aromatic. A replacement of the original p-hydroxyphenyl group with a group of larger size tends to decrease Ca2+ sensitivity. 2. The group R2 must be lager than the ethyl group. The replacement of the original phenyl group with a smaller non-aromatic group increases Ca2+ sensitivity. 3. The group R3 must be an OH group, and no substitution is allowed on the phenyl group bearing this OH. 4.7.1 e-Aequorins e-Aequorins, containing a ligand of e-coelenterazine, show properties significantly different from other aequorins [70–73]. In the structure of e-coelenterazines, the 5 position of the imidazopyrazinone structure is bound with the α position of the 6(p-hydroxyphenyl) group through an ethylene linkage, thus restraining the two ring
4 The Photoprotein Aequorin Table 2 Selected semi-synthetic aequorins derived from recombinant aequorin [76]
No. (Prefix)
1 2 (h) 3 (f ) 4 (f2) 6 (cl) 9 (n) 9 (n/J)5 12 (cp) 13 (ch)
Structural modification of coelenterazine1
None R1 : C6 H5 R1 : C6 H4 F(p) R1 : C6 H3 F2 (m,p) R1 : C6 H4 Cl(p) R1 : β-naphthyl R1 : β-naphthyl R2 : cyclopentyl R2 : cyclohexyl R1 : C6 H4 F(p), R2 : n-butyl 17 (fb) R1 : C6 H5 , R2 : cyclopentyl 19 (hcp) R1 : C6 H5 , R2 : cyclohexyl 21 (hch) R1 : C6 H4 F, R2 : cyclohexyl 22 (fch) 23 (m5) R4 : methyl 24 (e) R5 : CH2 CH2 R1 : C6 H4 F(p), R5 : CH2 CH2 26 (ef ) R2 : cyclohexyl, R5 : CH2 CH2 27 (ech) Fluorescein-labeled6
Luminescence max (nm)
Relative luminescence capacity2
Relative intensity at 10−6 or 10−7 M Ca2+3
Half-total time(s)4
466 466 472 470 464 468 467 442 453
1.00 0.75 0.80 0.80 0.92 0.25 0.30 0.63 1.00
1.00 16 20 30 0.6 0.15 0.07 28 15
M M M M 5 5 5 F F
460
0.20
1100
2
445
0.65
500
F
450
0.52
80
F
462 440 405, 472
0.43 0.37 0.50
73 2 6
M M F
405, 470
0.35
40
F
402, 440 528
0.40 1.00
8 2
F M
1 Only the changes from the coelenterazine structure are shown in this column. Those unchanged are shown in parentheses in the above structures. 2 The ratio in luminescence capacity: semi-synthetic aequorin/unmodified aequorin. 3 The ratio in luminescence intensity: semi-synthetic aequorin/unmodified aequorin, in 10−7 M Ca2+ for a value of 1 and larger and in 10−6 M Ca2+ for a value of less than 1. 4 The time required to emit 50% of the total light in 10 mM calcium acetate: F, 0.15–0.3 s; M, 0.4–0.8 s. The half-rise time of luminescence: F, 2–4 ms, all others, 6–20 ms. 5 Prepared from aequorin isoform J. 6 Fluorescein was chemically bound to apoaequorin, followed by regeneration using unmodified coelenterazine.
17
18 The Photoproteins
Fig. 2 Relationship between Ca2+ concentration and the initial light intensity of various recombinant semi-synthetic aequorins and n-aequorin J (a semi-synthetic natural aequorin made from an isoform, aequorin J). The curve number corresponds to the photoprotein number used in Table 2. A
photoprotein (3 :g) was added to 3 mL of Ca2+ buffer with various pCa values (pH 7.0) containing 1 mM total EGTA, 100 mM KCl, 1 mM free Mg2+ , and 1 mM MOPS, at 23–24 ◦ C. The data are taken from Shimomura et al. [76].
systems into the same plane. The luminescence reactions of e-aequorins are fast, with a half-rise time of 2–4 ms and a half-total time of 0.15–0.3 s, like ch-aequorins with an 8-cyclohexylmethyl substituent. The luminescence spectra are bimodal, with peaks at 400–405 nm and 440–475 nm. The ratio of the two peaks is variable not only with the type of aequorin but also with the measurement conditions, such as the concentration of Ca2+ and pH. e-Coelenterazines scarcely luminesce in the presence of apoaequorin, Ca2+ , and 2-mercaptoethanol in air [51].
References 1 SHIMOMURA, O., JOHNSON, F. H., SAIGA, Y. Extraction, purification and properties of aequorin, a bioluminescent protein from the luminous hydromedusan, Aequorea. J. Cell. Comp. Physiol. 1962, 59, 223–239. 2 SHIMOMURA, O., JOHNSON, F. H. Partial purification and properties of the
Chaetopterus luminescence system. In Bioluminescence in Progress (JOHNSON, F. H., HANEDA, A., Eds.), pp. 495–521. Princeton University Press: Princeton, NJ, 1966. 3 SHIMOMURA, O. Bioluminescence in the sea: photoprotein systems. Symp. Soc. Exp. Biol. 1985, 39, 351–372.
References 4 HASTINGS, J. W., GIBSON, Q. H. Inter mediates in the bioluminescent oxidation of reduced flavin mononucleotide. J. Biol. Chem. 1963, 238, 2537–2554. 5 HASTINGS, J. W., NEALSON, K. H. Bacterial bioluminescence. Ann. Rev. Microbiol. 1977, 31, 549–595. 6 CAMPBELL, A. K., HALLETT, M. B., DAW, R. A., RYALL, M. E. T., HATR, R. C., HERRING, P. J. Application of the photoprotein obelin to the measurement of free Ca2+ in cells. In Bioluminescence and Chemiluminescence, Basic Chemistry and Analytical Application (DELUCA, M. A., MCELROY, W. D., Eds.), pp. 601–607. Academic Press: New York, 1981. 7 BLINKS, J. R., PRENDERGAST, F. G., ALLEN, D. G. Photoproteins as biological calcium indicators. Pharmacol. Rev. 1976, 28, 1–93. 8 ASHLEY, C. C., CAMPBELL, A. K., Eds. Detection and Measurement of Free Ca2+ in Cells. Elsevier/North-Holland Biomedical Press, Amsterdam, 1979. 9 SHIMOMURA, O., JOHNSON, F. H. Properties of the bioluminescent protein aequorin. Biochemistry 1969, 8, 3991–3997. 10 SHIMOMURA, O., JOHNSON, F. H. Calcium-triggered luminescence of the photoprotein aequorin. Symp. Soc. Exp. Biol. 1976, 30, 41–54. 11 BLINKS, J. R., MATTINGLY, P. H., JEWELL, B. R., van LEEUWEN, M., HARRER, G. C., ALLEN, D. G. Practical aspects of the use of Aequorea as a calcium indicator: Assay, preparation, microinjection, and interpretation of signals. Method. Enzymol. 1978, 57, 292–328. 12 INOUYE, S., NOGUCHI, M., SAKAKI, Y., TAKAGI, Y., MIYATA, T., IWANAGA, S., MIYATA, T., TSUJI, F. I. Cloning and sequence analysis of cDNA for the luminescent protein aequorin. Proc. Natl. Acad. Sci. USA 1985, 82, 3154–3158. 13 PRASHER, D., MCCANN, R. O., CORMIER, M. J. Cloning and expression of the cDNA coding for aequorin, a bioluminescent calcium-binding protein. Biochem. Biophys. Res. Commun. 1985, 126, 1259–1268. 14 PRASHER, D. C., MCCANN, R. O., LONGIARU, M., CORMIER, M. J. Sequence comparisons of complememtary DNAs
15
16
17
18
19
20
21
22
23
24
25
encoding aequorin isotypes. Biochemistry 1987, 26, 1326–1332. CAMPBELL, A. K. Extraction, partial purification and properties of obelin, the calcium-activated luminescent protein from the hydroid Obelia geniculata. Biochem. J. 1974, 143, 411–418. STEPHENSON, D. G., SUTHERLAND, P. J. Studies on the luminescent response of the Ca2+ -activated photoprotein, obelin. Biochim. Biophys. Acta 1981, 678, 65–75. ILLARIONOV, B. A., FRANK, L. A., ILLARIONOVA, V. A., BONDAR, V. S., VYSOTSKI, E. S., BLINKS, J. R. Recombinant obelin: cloning and expression of cDNA, purification, and characterization as a calcium indicator. Method. Enzymol. 2000, 305, 223–249. LEVINE, L. D., WARD, W. W. Isolation and characterization of a photoprotein, “phialidin”, and a spectrally unique green-fluorescent protein from the bioluminescent jellyfish Phialidium gregarium. Comp. Biochem. Physiol. 1982, 72B, 77–85. INOUYE, S., TSUJI, F. I. Cloning and sequence analysis of cDNA for the Ca2+ -activated photoprotein, clytin. FEBS Lett. 1993, 315, 343–346. MORISE, H., SHIMOMURA, O., JOHNSON, F. H., WINANT, J. Intermolecular Energy Transfer in the bioluminescent system of Aequorea. Biochemistry 1974, 13, 2656–2662. SHIMOMURA, O., JOHNSON, F. H. Calcium binding, quantum yield, and emitting molecule in aequorin bioluminescence. Nature 1970, 227, 1356–1357. SHIMOMURA, O., SHIMOMURA, A. Halistaurin, phialidin and modified forms of aequorin as Ca2+ indicator in biological systems. Biochem. J. 1985, 228, 745–749. SHIMOMURA, O., JOHNSON, F. H. Comparison of the amounts of key components in the bioluminescence system of various coelenterates. Comp. Biochem. Physiol. 1979b, 64B, 105–107. SHIMOMURA, O., JOHNSON, F. H. Regeneration of the photoprotein aequorin. Nature 1975a, 256, 236–238. SHIMOMURA, O., SHIMOMURA, A. Resistivity to denaturation of the apoprotein of aequorin and reconstitution
19
20 The Photoproteins
26
27
28
29
30
31
32
33
34
35
36
of the luminescent photoprotein from the partially denatured apoprotein. Biochem. J. 1981, 199, 825–828. WARD, W. W., SELIGER, H. H. Properties of mnemiopsin and berovin, calcium-activated photoproteins. Biochemistry 1974, 13, 1500–1510. HASTINGS, J. W., MORIN, J. G. Calcium activated bioluminescent protein from ctenophores (Mnemiopsis) and colonial hydroids (Obelia). Biol. Bull. 1968, 135, 422. WARD, W. W., SELIGER, H. H. Action spectrum and quantum yield for the photoinactvation of mnemiopsin, a bioluminescent photoprotein from the ctenophore Mnemiopsis sp. Photochem. Photobiol. 1976, 23, 351–363. ANCTIL, M., SHIMOMURA, O. Mechanism of photoinactivation and re-activation in the bioluminescence system of the ctenophore Mnemiopsis. Biochem. J. 1984, 221, 269–272. HENRY, J. P., ISAMBERT, M. F., MICHELSON, A. M. Studies in bioluminescence. IX. Mechanism of the Pholas dactylus system. Biochimie 1973, 55, 83–93. HENRY, J. P., MONNY, C., MICHELSON, A. M. Characterization and properties of Pholas luciferase as a metalloglycoprotein. Biochemistry 1973b, 14, 3458–3466. MICHELSON, A. M. Purification and properties of Pholas dactylus luciferin and luciferase. Method. Enzymol. 1978, 57, 385–406. ROBERTS, P. A., KNIGHT, J., CAMPBELL, A. K. Pholasin – A bioluminescent indicator for detecting activation of single neutro-phils. Anal. Biochem. 1987, 160, 139–148. HENRY, J. P., and MICHELSON, A. M. Studies in bioluminescence. IV. Properties of luciferin from Pholas dactylus. Biochim. Biophys. Acta 1970, 205, 451–458. M¨ULLER, T., CAMPBELL, A. K. The chromophore of pholasin: a highly luminescent protein. J. Biolumin. Chemilumin. 1990, 5, 25–30. REICHL, S., ARNHOLD, J., KNIGHT, J., SCHILLER, J., ARNOLD, K. Reactions of pholasin with peroxidases and hypochlorous acid. Free Radical Biology & Medicine 2000, 28, 1555–1563.
37 DUNSTAN, S. L., SALA-NEWBY, G. B., FAJARDO, A. B., TAYLOR, K. M., CAMPBELL, A. K. Cloning and expression of the bioluminescent photoprotein pholasin from the bivalve mollusc Pholas dactylus. J. Biol. Chem. 2000, 275, 9403–9409. 38 SHIMOMURA, O., JOHNSON, F. H. Chaetopterus photoprotein: crystalli-za tion and cofactor requirements for bioluminescence. Science 1968, 159, 1239–1240. 39 NICOLAS, M. T., BASSOT, J. M., SHIMOMURA, O. Polynoidin: a membrane photoprotein isolated from the bioluminescent system of scale-worms. Photochem. Photobiol. 1982, 35, 201–207. 40 TSUJI, F. I., LEISMAN, G. B. K+ /Na+ -triggered bioluminescence in the oceanic squid Symplectoteuthis oualaniensis. Proc. Natl. Acad. Sci. USA 1981, 78, 6719–6723. 41 TAKAHASHI, H., ISOBE, M. Photoprotein of luminous squid, Symplectoteuthis oualaniensis and reconstruction of the luminous system. Chem. Lett. 1994, 5, 843–846. 42 ISOBE, M., FUJII, T., KUSE, M., MIYAMOTO, K., KOGA, K. 19 F-Dehydrocoelenterazine as probe to investigate the active site of symplectin. Tetrahedron 2002, 58, 2117–2126. 43 LOOMIS, H. F., DAVENPORT, D. A lumi nes-cent new xystodesmid milliped from California. J. Wash. Acad. Sci. 1951, 41, 270–272. 44 HASTINGS, J. W., DAVENPORT, D. The luminescence of the millipede, Lumino-desmus sequoiae. Biol. Bull. 1957, 113, 120–128. 45 SHIMOMURA, O. A new type of ATP-activated bioluminescent system in the millipede Luminodesmus sequoiae. FEBS Lett. 1981, 128, 242–244. 46 WOOD, K. V., De WET, J. R., DEWJI, N., DELUCA, M. Synthesis of active firefly luciferase by in vitro translation of RNA obtained from adult lanterns. Biochem. Biophys. Res. Commun. 1984, 124, 592–596. 47 SHIMOMURA, O., SHIMOMURA, A. Effect of calcium chelators on the Ca2+ -dependent luminescence of aequorin. Biochem. J. 1984, 221, 907–910.
References 48 SHIMOMURA, O. Bioluminescence of the brittle star Ophiopsila californica. Photochem. Photobiol. 1986a, 44, 71–674. 49 SHIMOMURA, O. A short story of aequorin. Biol. Bull. 1995c, 189, 1–5. 50 HENRY, J. P., MONNY, C. Proteiprotein interaction in the Pholas dactylus system of bioluminescence. Biochemistry 1977, 16, 2517–2525. 51 SHIMOMURA, O., JOHNSON, F. H. Extraction, purification and properties of the bioluminescence system of the euphausiid shrimp Meganyctiphanes norvegica. Biochemistry 1967, 6, 2293–2306. 52 JOHNSON, F. H., SHIMOMURA, O. Introduction to the bioluminescence of medusae, with special reference to the photoprotein aequorin. Method. Enzymol. 1978, 57, 271–291. 53 JOHNSON, F. H., SHIMOMURA, O. Preparation and use of aequorin for rapid microdetermination of Ca2+ in biological systems. Nature 1972, 237, 287–288. 54 SHIMOMURA, O. Isolation and properties of various molecular forms of aequorin. Biochem. J. 1986b, 234, 271–277. 55 BLINKS, J. R., HARRER, G. C. Multiple forms of the calcium-sensitive bioluminescent protein aequorin. Fed. Proc. 1975, 34, 474. 56 SHIMOMURA, O., JOHNSON, F. H. Chemistry of the calcium-sensitive photoprotein aequorin. In Detection and Measurement of Free Calcium Ions in Cells (ASHLEY, C. C., CAMPBELL, A. K., Eds.), pp. 73–83. Elsevier/North-Holland: Amsterdam, 1979a. 57 LOSCHEN, G., CHANCE, B. Rapid kinetic studies of the light emitting protein aequorin. Nature New Biology 1971, 233, 273–274. 58 HASTINGS, J. W., MITCHEL, G., MATTINGLY, P. H., BLINKS, J. R., van LEEUWEN, M. Response of aequorin bioluminescence to rapid changes in calcium concentration. Nature 1969, 222, 1047–1050. 59 SHIMOMURA, O., JOHNSON, F. H. Further data on the specificity of aequorin luminescence to calcium. Biochem. Biophys. Res. Commun. 1973, 53, 90–494. 60 SHIMOMURA, O., JOHNSON, F. H. Specificity of aequorin bioluminescence
61
62
63
64
65
66
67
68
69
70
71
72
to calcium. In Analytical Application of Bioluminescence and Chemiluminescence (CHAPPELLE, E. W., PICCIOLO, G. L., Eds.), NASA SP-388 ed., pp. 89–94. National Aeronautics and Space Administration: Washington, DC, 1975b. HEAD, J. F., INOUYE, S., TERANISHI, K., SHIMOMURA, O. The crystal structure of the photoprotein aequorin at 2.3A resolution. Nature 2000, 405, 372–376. MUSICKI, B., KISHI, Y., SHIMOMURA, O. Structure of the functional part of photoprotein aequorin. Chem. Commun. 1986, 1566–1568. LA, S. Y., SHIMOMURA, O. Fluorescence polarization study of the Ca2+ -sensitive photoprotein aequorin. FEBS Lett. 1982, 143, 49–51. SHIMOMURA, O. Luminescence of aequorin is triggered by the binding of two calcium ions. Biochem. Biophys. Res. Commun. 1995b, 211, 359–363. SHIMOMURA, O., SHIMOMURA, A. EDTA-binding and acylation of the Ca2+ -sensitive photoprotein aequorin. FEBS Lett. 1982, 138, 201–204. SHIMOMURA, O., JOHNSON, F. H. Peroxidized coelenterazine, the active group in the photoprotein aequorin. Proc. Natl. Acad. Sci. USA 1978, 75, 2611–2615. RIDGWAY, E. B., SNOW, A. E. Effects of EGTA on aequorin luminescence. Biophys. J. 1983, 41, 244a. SHIMOMURA, O. Porphyrin chromophore in Luminodesmus photoprotein. Comp. Biochem. Physiol. 1984, 79B, 565–567. CHARBONNEAU, H., WALSH, K. A., MCCANN, R. O., PRENDERGAST, F. G., CORMIER, M. J., VANAMAN, T. C. Amino acid sequence of the calcium-dependent photoprotein aequorin. Biochemistry 1985, 24, 6762–6771. SHIMOMURA, O., INOUYE, S., MUSICKI, B., KISHI, Y. Recombinant aequorin and recombinant semi-synthetic aequorins. Biochem. J. 1990, 270, 309–312. SHIMOMURA, O., MUSICKI, B., KISHI, Y. Semi-synthetic aequorin: an improved tool for the measurement of calcium ion concentration. Biochem. J. 1988, 251, 405–410. SHIMOMURA, O., MUSICKI, B., KISHI, Y. Semi-synthetic aequorins with improved
21
22 The Photoproteins
73
74
75
76
77
sensitivity to Ca2+ ions. Biochem. J. 1989, 261, 913–920. SHIMOMURA, O., MUSICKI, B., KISHI, Y., INOUYE, S. Light-emitting properties of recombinant semi-synthetic aequorins and recombinant fluorescein-conjugated aequorin for measuring cellular calcium. Cell Calcium 1993, 14, 373–378. SHIMOMURA, O. Cause of spectral variation in the luminescence of semisynthetic aequorins. Biochem. J. 1995a, 306, 537–543. FUJII, T., AHN, J.-Y., KUSE, M., MORI, H., MATSUDA, T., ISOBE, M. A novel photoprotein from oceanic squid (Symplectoteuthis oualaniensis) with sequence similarity to mammalian carbon-nitrogen hydrolase domains. Biochem. Biophys. Res. Commun. 2002, 293, 874–879. HENRY, J. P., ISAMBERT, M. F., MICHELSON, A. M. Studies in bioluminescence. III. The Pholas dactylus system. Biochim. Biophys. Acta 1970, 205, 437–450. ILLARIONOV, B. A., BONDAR, V. S., ILLARIONOVA, V. A., VYSOTSKI, E. S. Sequence of the cDNA encoding the Ca2+ -activated photoprotein obelin from
78
79
80
81
82
the hydroid polyp Obelia longissima. Gene 1995, 153, 273–274. INOUYE, S., SAKAKI, Y., GOTO, T., TSUJI, F. I. Expression of apoaequorin complementary DNA in Escherichia coli. Biochemistry 1986, 25, 8425–8429. MARKOVA, S. V., VYSOTSKI, E. S., BLINKS, J. R., BURAKOVA, L. P., WANG, B.-C., LEE, J. Obelin from the bioluminescent marine hydroid Obelia geniculata: cloning, expression, and comparison of some properties with those of other Ca2+ -regulated photoproteins. Biochemistry 2002, 41, 2227–2236. MORIN, J. G., HASTINGS, J. W. Biochemistry of the bioluminescence of colonial hydroids and other coelenterates. J. Cell. Physiol. 1971, 77, 305–311. SHIMOMURA, O., JOHNSON, F. H., SAIGA, Y. Extraction and properties of halistaurin, a bioluminescent protein from the hydromedusan Halistaura. J. Cell. Comp. Physiol. 1963, 62, 9–16. SHIMOMURA, O., JOHNSON, F. H., MORISE, H. Mechanism of the luminescent intramolecular reaction of aequorin. Biochemistry 1974, 13, 3278–3286.
1
The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel) Mikhail Vyssokikh Moscow State University, Moscow, Russia
Dieter Brdiczka University of Konstanz, Konstanz, Germany
Originally published in: Bacterial and Eukaryotic Porins. Edited by Roland Benz. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30775-3
1 Introduction
Eukaryotic porins are membrane proteins that form aqueous channels in the cell membrane and the mitochondrial outer membrane. In contrast to bacterial porins that have a similar function, the structure of eukaryotic porins is not known at a useful resolution. The mitochondrial outer membrane pore was first characterized by Marco Colombini [1] as a voltage-dependent anion channel (VDAC). The properties of VDACs have been widely investigated in reconstituted systems by several groups [1, 2] exploring the main differences compared to bacterial porins, i.e. sensitivity to voltage already at 30 mV and to the polarity of the applied voltage. In general, bacterial and mitochondrial outer membranes have segregating functions, and pore proteins in the membranes control limited exchange. However, in bacteria, the outer membrane has more protective functions, whereas in mitochondria, communicative functions are more important. The VDAC plays an important role in the coordination of the communication between mitochondria and cytosol. A substantial aspect of this management is a transient formation of complexes with other proteins. This will be the topic of this article. 2 Structure and Isotypes of VDAC
Although the VDAC amino acid sequence is very different compared to the bacterial porins, it is assumed that the mitochondrial outer membrane pore may form a Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel)
β-barrel composed of 16 β-strands analogous to the know structure in bacteria [3]. Three different isotypes of VDAC are expressed in cells [4]. Not much is known in mammals about tissue-specific distribution, intracellular localization and function of the different isotypes. All experiments mentioned here relate to the VDAC isotype 1. This isotype might be the main species in the outer mitochondrial membrane. This is deduced from experiments with yeast expressing two VDAC isotypes. When type 1 VDAC was deleted, type II could not replace it, but instead TOM 40, a pore for mitochondrial precursor peptides, was overexpressed [5]. Furthermore, Anflous et al. [6] followed their expression in different tissues by cloning rat heart VDAC isotypes. It was observed that VDAC type 1 was the most abundant.
3 The Influence of Phospholipids on VDAC Structure
Sterols have been found in isolated and purified mitochondrial VDAC such as ergosterol in the case of Neurospora crassa [7] and cholesterol in VDAC from bovine heart [8]. Sterols influence the channel properties of reconstituted VDAC. Popp et al. [9] found them to be essential for proper insertion of soluble precursor VDAC into bilayer membranes and also observed a 10 times increase of conductance after addition of cholesterol to native VDAC. It was postulated by the authors that sterols may increase the hydrophobicity of the surface of the VDAC β-barrel structure exposed to the phospholipid matrix of the membrane. The sterols may thus be required to obtain the high-conducting VDAC structure. Complete saturation of VDAC with sterols may even cause a reduction of voltage-dependent conductance regulation. The outer mitochondrial membrane contains 10% cholesterol, while almost nothing is found in the inner membrane [10]. However, the cholesterol in the outer membrane appears to be clustered. Ardail et al. [11] were able to isolate two contact site fractions differing significantly in cholesterol content (9.4 versus 0.4 %). Both fractions bound hexokinase and, thus, contained the specific binding protein VDAC. Treatment of mitochondria with digitonin that interacts with cholesterol removes only parts of the outer membrane, leaving small vesicles attached at contact sites were hexokinase is bound [12]. This also suggests inhomogeneous distribution of cholesterol in the outer membrane. As with Ardail et al., we were able to characterize different types of contact sites on the basis of their composition by different functionally interacting proteins, as will be described below. We determined the lipid composition of the different contact-forming complexes (Table 1) and found a 5-fold difference of cholesterol content between two contact site-generating complexes. On the basis of this observation we assume the existence of structures like lipid rafts [13] in the outer membrane as a kind of outer membrane-signaling phospholipid domain. The localization of VDAC inside or outside the cholesterol-containing lipid raft-like areas might explain the different properties and structure of VDAC observed in the two complexes, as will be described below.
5 Physiological Significance of the Voltage Dependence Table 1 Lipid composition of hexokinase and creatine kinase complexes from contact sites
of rat kidney mitochondria Fraction Kidney creatine kinase from cristae creatine kinase (periph.) hexokinase complex Liver OMCS IMCS
Cholesterol
Cardiolipin
Cardiolipin/cholesterol
1 10 2
36 26 22
36 2.6 11
9.4 0.4
20.2 21.5
2.15 53.7
In the periphery, mitochondrial creatine kinase interacts with VDAC in the outer membrane and car-diolipin in the peripheral inner membrane. The enzyme is also localized in the cristae, where it interacts with two inner membranes. For comparison, the data from Ardail et al. [11] are shown at the bottom (OMCS = outer membrane contact sites, IMCS = inner membrane contact site). Data are shown as percentage of total lipid mass.
4 VDAC Conductance and Ion Selectivity
Analysis of the isolated VDAC after reconstitution in artificial bilayer membranes resulted in the calculation of the pore diameter of 4 nm at a voltage smaller than 30 mV. In this high conductance state (4 nS at 1 M KCl in the bathing fluid) the pore is anion selective. Above 30 mV, the diameter of the pore is reduced to 2 nm. The conductance decreases to 2 nS and ion selectivity changes to cation selectivity [1, 2]. The voltage-dependent conductance variations are linked to large structural modifications of VDAC. So far, no information about the nature of these changes is available. It has been postulated that a positively charged loop moves out of the channel [14]. However, it is also possible that negative charges move into the mouth of the channel, as has been observed for bacterial porins [15]. Macromolecules such as dextran, that cannot penetrate the pore, have an osmotic effect on the aqueous interior of the channel [16] and thereby increase voltage sensitivity. In the presence of dextran the low-conductance, cation-selective state is adopted already at 10 mV [17]. It has been postulated that the pore acquires this state in the contact sites where the inner membrane potential extends to the outer membrane by capacitive coupling [18]. Recently, a new idea of generation of a potential at the outer membrane was proposed on a basis of metabolic cycling and translocation of ATP, ADP and Pi across the inner membrane, and different resistance in the contact sites [19]. 5 Physiological Significance of the Voltage Dependence
VDAC in the low-conductance, cation-selective state is not permeable for ATP, ADP and other negatively charged small molecules. This has been demonstrated
3
4 The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel)
by Colombini et al. investigating VDAC reconstituted in artificial membranes [20]. In isolated mitochondria, enzymes in the inter-membrane space such as adenylate kinase, had no access to external adenine nucleotides in the presence of K¨onig’s polyanion [21] that induces the low-conductance state of VDAC [22]. If we convey these results to the physiological situation where more contact sites are formed and macromolecules are present, we have to assume that most VDAC are in the low-conductance, cation-selective state. Indeed, in situ mitochondria of permeable cardiomyocytes or skinned red muscle fibers had a 10 times higher apparent K m for ADP compared to isolated mitochondria from the same tissue. The high K m could be reduced to in vitro values by generating porous outer membranes with digitonin or by the pro-apoptotic protein Bax, suggesting that in situ the outer membrane was not freely permeable for ADP [23]. In addition to pure voltage-dependent regulation, specific proteins in the inter-membrane space or attached to the outer surface of mitochondria have been described that physiologically may as well be involved in regulation of the pore permeability [24, 25].
6 Porins as Specific Binding Sites
In bacteria, porins are receptors for various phages. VDAC at the mitochondrial periphery acts in a similar way as a binding site for enzymes such as hexokinase [26] and glycerol kinase [27]. A shift in trans-membrane topology in the membrane also changes the surface-exposed loops between the β-strands. Considering the voltage dependence of the trans-membrane topology, this means that VDAC in the contact sites might have a role to reflect functional changes of the inner membrane potential at the mitochondrial surface. Alternatively, such a signal at the surface of the outer membrane may be generated by interaction of VDAC with the adenine nucleotide translocator (ANT; see below) if we assume that the ANT would interact exclusively with a certain VDAC state.
7 VDAC senses Inner Membrane Functions in the Contact Site
In support of this idea, it has been observed by immune electron microscopy and binding studies that hexokinase binds with higher capacity to the outer membrane pore in the contact sites compared to pores beyond contacts [28]. Freeze-fracturing analysis of isolated liver mitochondria revealed that contact sites could be induced by dextran or atractyloside and suppressed by glycerol or dinitrophenol [29, 30]. Thus, hexokinase binding to isolated outer membrane or mitochondria with induced or suppressed contact sites was studied. Hexokinase showed sigmoidal-type binding to mitochondria with contact sites, while binding to pure outer membrane or mitochondria with suppressed contact sites led to hyperbolic binding curves [29]. This suggested a cooperative binding of hexokinase to the pore in the contact
9 Isolation and Characterization of VDAC-ANT Complexes
site area. In agreement with this, hexokinase, by binding, was activated and bound hexokinase was found to form tetramers [31]. In a recent investigation, Hashimoto and Wilson were analyzing different epitopes at the surface of bound hexokinase by specific monoclonal antibodies. The authors observed a variation of the bound hexokinase structure that correlated with functional changes of oxidative phosphorylation in the inner membrane [32]. This direct reflection of inner membrane functions at the mitochondrial periphery can be explained by interaction between VDAC in the outer and ANT in the inner mitochondrial membranes [33–35]. Because of the high capacity of contact sites for hexokinase binding, the enzyme was used as a marker for isolation of this membrane fraction from osmotically disrupted mitochondria by density gradient centrifugation [36, 37]. In a recent application, hexokinase was used to indicate structural changes of VDAC correlated to the functional state of ANT. The contact site isolation method in kidney mitochondria revealed that hexokinase activity in the contacts was decreasing when the ANT was shifted to the m-conformation by pre-treatment with bongkrekic acid, whereas it was increasing after induction of the c-conformation by pre-incubation with atractyloside (Figure 1). We do not know how the structure of the VDAC changes and whether the alterations are induced by the membrane potential or by the interaction with the ANT. However, we can summarize the observations of voltage-dependent conductance changes and studies of hexokinase binding as showing that VDAC alters its transmembrane topology correlated to functional mitochondrial states. Because it affects the affinity and capacity of hexokinase binding, we can define a tensed and relaxed state of the pore in analogy to the regulation of allosteric enzymes. The tensed state would be the low-conductance, cationically selective pore in a complex with the ANT, whereas the relaxed state would be characterized by high conductance and anion selectivity without complex formation (Figure 2).
8 Cytochrome c is a Component of the Contact Sites
It was observed that not only hexokinase, but also cytochrome c is a component of the contact sites, and its concentration increased in the presence of atractyloside and decreased upon treatment with bongkrekic acid (Figure 1). Thus, as observed for hexokinase, the structure of the ANT was responsible for the cytochrome c distribution at the surface of the inner membrane.
9 Isolation and Characterization of VDAC-ANT Complexes
Complexes between VDAC and ANT were generated in vitro with isolated porin and ANT proteins [33]. However, they were also isolated from Triton X-100-dissolved brain or kidney membranes by binding the attached hexokinase to an anion
5
6 The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel)
Fig. 1 Variation of cytochrome c and hexokinase in the contact site fraction of kidney mitochondria by perturbing ANT structure. After osmotic shock subfractions of rat kidney mitochondria were separated in a sucrose density gradient (54.3 %, left; 30.8 %, right). The distribution of inner membrane
(SDH) and contact site (HK) fractions is shown. Cytochrome c concentration (empty circles) in the different fractions was determined by specific antibodies and varied significantly in 500 :M atractyloside-treated (A), compared to 250 :M bongkrekatetreated (B) mitochondria.
exchanger. It was found that the bound hexokinase complex contained VDAC and ANT isotype I [35]. A second complex composed of porin, creatine kinase and ANT could also be isolated from the same membrane extracts [34, 38]. Based on the separation and purification of the different complexes, two types of VDAC-ANT aggregates could be defined: one complex where VDAC and ANT interacted directly and VDAC achieved higher affinity for hexokinase (Figure 2A), and a second complex where VDAC interacted indirectly with the ANT through a mitochondrial creatine kinase octamer. In this case, VDAC exposed a different structure at the surface that had low affinity for hexokinase (Figure 2D), as hex-okinase activity
10 Reconstitution of VDAC-ANT Complexes
Fig. 2 Different configuration of the VDACANT complex connected to either hexokinase or creatine kinase. Each of the three components of the two kinase complexes can exist in different configurations that support or inhibit complex formation. The ANT (ANT-c, induced by atracty-loside) in its c-conformation faces the cytosol and interacts with the outer membrane pore (VDAC), whereas ANT in the m-conformation (ANT-m induced by bongkrekate) orientates to the matrix and does not interact with VDAC. The ring represents cardiolipin encircling the ANT. Cardiolipin might change from the outer to inner leaflet of the membrane correlated to the ANT structure. The active ANT is a dimer. VDAC-t (tensed) is a dimer when it interacts with the ANT-c in a cholesterol-
free lipid matrix. VDAC-r (relaxed) is in a cholesterol-rich lipid matrix (depicted as bars). It is not bound to the ANTand may also exist as a monomer. This structure of VDAC has low affinity for hexokinase and is present in the complex with the octamer of creatine kinase (CK). (A) Hexokinase (HK) associates to oligomers preferentially in the contact sites formed by the VDACANT-c complex. The oligomerization of hexokinase leads to activation of the enzyme. (B) Monomers of hexokinase bind to VDAC beyond the contacts with less affinity and activity. (C) VDAC-ANT complexes may also exist free without bound kinases. (D) Creatine kinase (CK) binds to VDAC as octamer. The octamer preferentially associates with cardiolipin that is bound to the ANT.
was always absent in the creatine kinase complexes. From cross-linking studies in isolated yeast mitochondria we know that VDAC was a dimer in the intact outer membrane [39]. Determination of the molecular weight of the isolated complexes suggested that tetramers of the active kinases were present in both cases [34]. Hexokinase is active as a monomer of 100 kDa, whereas the functionally active mitochondrial creatine kinase unit is a dimer of 85 kDa. Four dimers of mitochondrial creatine kinase form a cubic structure with identical top and bottom faces that are able to connect two membranes [40]. It is known from in vitro studies that the octamer of mitochondrial creatine kinase interacts directly with the outer membrane pore [41], while the interaction with the ANT may be indirect through the cardiolipin [42] that is known to be tightly bound to the ANT [43] (Figure 2D).
10 Reconstitution of VDAC-ANT Complexes
The hexokinase–porin–ANT complex and the porin-creatine kinase–ANT complex were isolated and functionally reconstituted in vesicles [34]. ATP loaded into the
7
8 The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel)
vesicles did not leak out, although VDAC was a component of the complexes. However, the kinases in the complexes had access to the internal ATP through the ANT. This was shown by either glucose-6-phosphate formation from external glucose via hexokinase or creatine phosphate generation by creatine kinase from external creatine. Both reactions were inhibited by blocking the antiporter activity of the ANT by atractyloside. The results suggested that the kinases, after reconstitution, were functionally coupled to the ANT. The ANT was controlling the exchange of adenine nucleotides through VDAC and was working as an antiporter [44]. However, the functional state of the ANT in the reconstituted complexes could be converted into an unspecific uniporter (resembling the permeability transition pore) by addition of Ca2+ , as has been demonstrated also for several related mitochondrial transport systems by Dierks et al. [45].
11 Importance of Metabolic Channeling in Regulation of Energy Metabolism
The tight functional coupling of peripheral kinases to the ANT is important because of two reasons: (i) it regulates the activity of oxidative phosphorylation by ADP and (ii) it increases the free energy (G) in the ATP system. To clarify the first point we may imagine the situation after eating a carbohydrate-rich meal: a high glucose concentration enters the resting muscle cell that must be phosphorylated and converted to UDP-glucose to accomplish incorporation into glycogen. The ATP for this process is more efficiently provided by oxidation of pyruvate in the mitochondria than by lactate formation. Hexokinase bound to the VDAC- ANT complex induces the oxidative pathway by converting intra-mitochondrial ATP into ADP and by that activating oxidative phosphorylation. To describe the second advantage of metabolic channeling, we have to consider that all ATP energy-consuming processes in the cell (including ion pumps) depend on the transfer of the γ -phosphoryl group of ATP according to the reaction: ATP+X = X-P +ADP. In a subsequent step, X-P dissociates when, for example, the ion is pumped. This means the efficiency of ATP to drive these phosphorylation reactions depends on the level of ADP and Pi during the reaction according to the equation of G = Go + RT ln [ADP] [Pi ]/[ATP]. Thus, the power (G) of ATP and the flux through the ATP-dependent reaction would increase if ADP and Pi would be low or be continuously withdrawn from the reaction. In the complex between mitochondrial creatine kinase and the ANT these requirements are perfectly fulfilled. The equilibrium of the reaction will not be reached as the ATP just exported is directly utilized and the ADP produced is immediately taken up into the matrix by the ANT. Pi is excluded from the reaction because of complex formation. Thus, metabolic channeling through functional coupling between ANT and creatine kinase pushes the balance of the reaction between creatine and ATP far to the side of phosphocreatine production. A further advantage of this coupling to the ANT is that the higher free energy of ATP at the surface of the inner membrane is preserved as a higher phosphocreatine/creatine quotient. ATP is electrogenically exported by the ANT at the expense of the membrane
12 The VDAC–ANT Complex as Permeability Transition Pore
potential. It has been calculated that this process increases the free energy of ATP by an additional 12 kJ mol−1 [46]. Thus, the phosphocreatine/creatine ratio could be increased by this quantity through coupling between mitochondrial creatine kinase and the ANT. It has been criticized that the functional interaction between ANT and creatine kinase may be physiologically irrelevant because of the approximately 10-fold difference in ATP turnover of the two partners. However, one has to consider that at least four of the slower ANT dimers may be attached to one of the fast creatine kinase octamers (Figure 2).
12 The VDAC–ANT Complex as Permeability Transition Pore
Mitochondria contain a yet-unidentified structure that forms a large unspecific pore under conditions of high matrix Ca2+ , Pi or oxidative stress [47]. Opening of this so-called permeability transition pore was fully reversible upon withdrawing Ca2+ [48]. Electrophysiological measurements revealed properties of the pore related to VDAC suggesting that it may be a component of the whole structure [49, 50]. The permeability transition pore could be specifically blocked by cyclosporin A, which binds to cyclophilin D, a peptidylprolyl cis–trans isomerase present in the mitochondrial matrix [48, 51]. By employing a cyclophilin D affinity matrix, a complex between VDAC and ANT was isolated by Crompton et al. after extracting heart mitochondrial membranes with the detergent CHAPS [52]. This complex was reconstituted in vesicles, and pore opening by Ca2+ and Pi addition was registered in cyclosporin A-sensitive manner. Comparable results were obtained with the isolated hexokinase–VDAC–ANT complex. Cyclophilin was co-purified through isolation of the complex [38]. The complex was reconstituted in phospholipid vesicles that were loaded with malate. The internal malate was released correlated to increasing Ca2+ concentrations. The process was inhibited by cyclosporin A, suggesting that the reconstituted hexokinase–porin–ANT complex had properties of the permeability transition pore (Figure 3A). Two questions arise: (i) whether the ANT is the only structure that causes permeability transition and (ii) whether the ANT forms the pore only as a complex with VDAC. Concerning the first question, it is possible that all related antiporters such as Pi /OH- or the Glu/Asp exchanger can form a permeability transition pore depending on the actual conditions. As to the second question, it is possible to claim that the purified ANT also forms unspecific pores in the presence of Ca2+ . This was shown by reconstitution of ANT in bilayer membranes [53] or in vesicles [54]. However, these pores were not sensitive to cyclosporin A as cyclophilin D was missing. Cyclophilin D is thus a second important regulatory component of the permeability transition pore [51, 55]. The idea that VDAC may also have regulatory functions of the permeability transition was supported by the analysis of the reconstituted VDAC–creatine kinase–ANT complex. In contrast to the reconstituted hexokinase complex there was no release of enclosed malate from the vesicles by addition of
9
10 The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel)
Fig. 3 The VDAC–ANT complex as permeability transition pore. (A, left panel) The scheme depicts the presumable arrangement of the hexokinase porin ANT complex in the contact sites of mitochondria. Hexokinase (HK) binds as a tetramer to the VDAC–ANT complex. Glycerol kinase (GK) and the mitochondrial benzodiazipin receptor (mBr) bind to the same pore configuration. (A, right panel) The complex was isolated from a Triton X-100 extract of brain membranes and was reconstituted in phospholipid vesicles that were loaded with malate or ATP. Cyclophilin was co-purified with the complex and could be extracted from the complex with phosphate buffer, pH 4.5. The enclosed malate was released from the vesicles by Ca2+ between 100 and 600 :M. The malate release was inhibited either by cyclophilin extraction (–Cyp D) or by incubation of the vesicles with 0.5
:M cyclosporin A (CSA) after addition of 200 or 500 :M Ca2+ . According to Brustovetsky and Klingenberg [53], Ca2+ interacts with the cardiolipin (DPG) around the ANT, thus changing the structure of the ANT from a ATP/ADP antiporter to an unspecific uniporter state. The uniporter state of the ANT may act as a permeability transition pore (PTP) in intact mitochondria. (B, left panel) The scheme shows the possible arrangement of a complex between VDAC, the octamer of mitochondrial creatine kinase and the ANT in the contact sites of mitochondria. When the octamer dissociates, VDAC and ANT can directly interact. (B, right panel) The complex was isolated from a Triton X-100 extract of brain membranes and was reconstituted in egg yolk liposomes that were loaded with malate. The malate was not released by increasing Ca2+ concentrations (Octamer). When the
13 The VDAC–ANT Complex as a Target for Bax-dependent Cytochrome c Release
Ca2+ (Figure 3B). As shown schematically in Figure 3B, VDAC is linked to the octamer of creatine kinase that hinders interaction with the ANT. However, direct interaction between VDAC and ANT was possible after dissociation of the creatine kinase octamer. In this case malate was liberated dependent on Ca2+ addition (Figure 3). The release was inhibited by cyclosporin A. The results suggested that VDAC such as cyclophilin D might play a role in regulation of the permeability transition pore. A possible role of VDAC might be to keep the ANT in the c-conformation, exposing the ATP/ADP-binding site to the cytosolic surface of the inner membrane according to the re-orientating carrier model proposed by Klingenberg [56]. In this conformation that is induced by atractyloside, the conversion of the ANT into an unspecific pore by Ca2+ is facilitated and leads to permeability transition. However, if VDAC and the ANT interact with the octamer of mitochondrial creatine kinase, conversion of ANT to the permeability transition pore-like state would be suppressed (Figure 3). In fact this has been observed in a transgenic mouse model expressing mitochondrial creatine kinase in liver mitochondria [57]. The permeability transition pore was opened in isolated mouse liver mitochondria by atractyloside and Ca2+ . In contrast, in the liver mitochondria from transgenic mice containing creatine kinase, the permeability transition was inhibited by substrates that stabilized the octamer of creatine kinase such as creatine and cyclocreatine. It should be mentioned here that mitochondrial creatine kinase, in addition to ts important role of organization in the peripheral mitochondrial compartment, is also localized in the cristae [58] where it has certainly different functions and is not linked to VDAC.
13 The VDAC–ANT Complex as a Target for Bax-dependent Cytochrome c Release
There are several signal- and tissue-specific pathways to induce apoptosis. One of them is a release of cytochrome c from mitochondria [59–61] that activates caspases [60] by binding to a cytoplasmic protein Apaf-1 in the presence of dATP [62]. The mechanism by which external Bax releases cytochrome c is still controversial, and may also depend on the actual localization of cytochrome c in addition to the binding and incorporation of Bax at the mitochondrial surface. VDAC and ANT interact preferentially when adopting a specific molecular conformation. The ANT has to be in the c-conformation as atractyloside induces the complex with the outer membrane pore [35], while bongkrekate shifts the ANT ← Fig. 3 octamer was dissociated into dimer by 20 min incubation of the liposomes with 5 mM MgCl2 , 20 mM creatine, 50 mM KNO3 and 4 mM ADP, VDAC–ANT complexes could form. In these complexes Ca2+ could shift the ANT structure to the unspecific permeability transition pore (PTP) and
the enclosed malate was liberated (Dimer). The malate permeability of the PTP was inhibited by pre-incubation of the vesicles with 100 nM cyclosporin A (dimer + CSA). The octamer-dimer equilibrium of the mitochondrial creatine kinase can be shifted to dimer by reactive oxygen species (ROS) [74].
11
12 The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel)
to the m-conformation and by that represses this interaction. The VDAC in the complex with the ANT appears to be in the low-conductance, cationically selective, tensed state. This can be deduced from the observation that heart muscle mitochondria in situ show a reduced response to external ADP [23]. The structural change of the outer membrane pore achieved by interaction with the ANT is recognized at the mitochondrial surface and leads to higher affinity of hexokinase. Recent observations suggest that this tensed pore structure is the preferred target of Bax molecules. Pastorino et al. [63] found that hexokinase and Bax compete for the same binding site, and Capano and Crompton [64] co-precipitated VDAC and ANT when Bax was immune precipitated from extracts of cardiomyocytes.
14 The VDAC–ANT Complexes contain Cytochrome c
As described above it was observed that the contact sites contained cytochrome c. As the VDAC–ANT complexes are derived from the contact sites it was not surprising that the complexes contained a significant amount of cytochrome c [65]. This was found although the complex was eluted from anion exchanger column by 200 mM KCl suggesting that the cytochrome c was not bound by ionic interaction. To investigate whether Bax would be able to interact with the cytochrome c within the VDAC–ANT complexes, the latter were reconstituted in phospholipid vesicles as described above (Figure 4A). After reconstitution, the vesicles were loaded by malate. Bax liberated the endogenous cytochrome c, but did not release the internal malate (Figure 4A). The Bax-dependent liberation of endogenous cytochrome c was abolished when the VDAC–ANT complex was dissociated by bongkrekate (Figure 4B).
15 The Importance of the Kinases in Regulation of Apoptosis
Release of cytochrome c from mitochondria proceeds in two steps. The first involves the release of a small fraction of cytochrome c from the compartment between the outer and peripheral inner membranes and the contact sites. It is induced by Bax at low concentrations [66] in the early phase [64]. The second step includes release of additional fractions of cytochrome c located at the surface of the cristae membranes and results from opening of the permeability transition pore, followed by mitochondrial swelling and membrane disruption [67]. The massive cytochrome c release may also be a consequence of VDAC closure preceding membrane disruption [68]. The observed failure of mitochondria to exchange adenine nucleotides with the cytosol may be a consequence of the transformation of VDAC into the tensed state. The two described kinases bound to VDAC are involved in the regulation of both processes of cytochrome c liberation.
15 The Importance of the Kinases in Regulation of Apoptosis
Fig. 4 Bax-dependent release of cytochrome c from the hexokinase VDAC–ANT complex. (A) Schematic representation of the reconstituted hexokinase (HK)–VDAC (P)–ANT complex in multilayered vesicles. Cytochrome c (circles) that was still attached to the complexes could be released by Bax, as shown in the right panel. However, malate, loaded into the vesicles, was not liberated by Bax, but by 200–500 :M Ca2+
through opening the ANTas a permeability transition pore (Figure 3A). This suggested that Bax was specifically interacting with the cytochrome c in the porin–ANT complex. The sigmoidal curve of cytochrome c release might indicate that Bax is accomplishing this process as an oligomer. (B) Schematic representation of the association/dissociation of the hexokinase–VDAC (P)–ANT complex by atractyloside (ATR)
13
14 The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel)
16 Suppression of Bax-dependent Cytochrome c Release and Permeability Transition by Hexokinase
Hexokinase binds to the outer membrane pore in its tensed state, when VDAC interacts with the ANT. In this state VDAC is not freely permeable for adenine nu-cleotides. This has been already observed in mitochondria in the presence of dextran that increases hexokinase binding and contacts between VDAC and ANT [29]. In these mitochondria, external pyruvate kinase had less access to the ADP produced by hexokinase [69]. Although VDAC in the tensed state does not allow ATP transfer, hexokinase utilizes mitochondrial ATP, suggesting a structural modification of VDAC through binding of the enzyme. On the other hand, the activity of mitochondrial-bound hexokinase was found to be important for protein kinase B-linked suppression of cytochrome c release and apoptosis [70]. The experiments suggested that following withdraw of growth hormone, Bax liberated the small fraction of cytochrome c from the contact sites. In complete agreement with this, the Bax-dependent release of cytochrome c bound in the VDAC–ANT complexes described above was inhibited by activity of hexokinase in the presence of glucose and ATP [65]. This was explained by stabilization of hexokinase binding to the VDAC that is also the main target of Bax. In addition to this effect on Bax-dependent cytochrome c release, activation of hexokinase inhibited Ca2+ -dependent opening of the permeability transition pore by ADP production [38].
17 Suppression of Permeability Transition by Mitochondrial Creatine Kinase
Another kinase which binds to VDAC is mitochondrial creatine kinase. This enzyme interacts with VDAC exclusively in the octameric state, while the dimer has only weak affinity to porin [41, 42]. Thus, the association–dissociation equilibrium between octamer and dimer determines the formation of octamer–VDAC complexes. Ligation of the coronary artery in guinea pigs led to decrease of octamer from 85 to 60 % in the infarcted area of the heart [71]. Because of the dissociation of the creatine kinase octamer, the possibility of direct VDAC–ANT interaction was increased. As suggested above, the latter complexes may be the prerequisite of Ca2+ -induced permeability transition, followed by mitochondrial swelling, ← Fig. 4 or bongkrekate (BA) after reconstitution in multilayer vesicles. Treatment of the complexes with bongkrekate changed the ANT into the m-conformation (ANTm). This led to dissociation of the complex and a correlated structural change of VDAC, with a decreased affinity for hexokinase and Bax. Cytochrome c might have redistributed
as well. As shown in the right panel, Bax was unable to release cytochrome c after the structural change of the complex by bongkrekate. All cytochrome c remained in the sediment (BA sed) in contrast to the control, without bongkrekate treatment, and nothing appeared in the supernatant (BA sup).
References
membrane disruption, cytochrome c release and apoptosis. Indeed, apoptotic cells encircle the area of necrotic cells in the infarcted heart. Heart, kidney and brain mitochondria contain two ANT isotypes. ANT isotype I was localized in the peripheral inner membrane having higher affinity to VDAC and cyclophilin, whereas isotype II was present in the cristae membranes [35]. When ANT I was overexpressed in cells, apoptosis was induced [72]. However, co-expression of cyclophilin D or creatine kinase together with ANT I abolished apoptosis induction. Moreover, it has been observed that patients suffering from dilated cardiomyopathy had significantly higher ANT I mRNA levels [73]. This phenomenon, such as in the tissue culture experiments mentioned above, may be responsible for the increased apoptosis rate observed in case of cardiomyopathy [74]. Considering that the ANT I in the peripheral inner membrane would form a permeability transition pore, we assume that proteins interacting with ANT I such as cyclophilin and the octamer of mitochondrial creatine kinase would reduce the probability of pore opening and apoptosis induction. However, the chance of pore opening and permeability transition would increase if the ratio between the ANT I and its regulatory proteins is changing either by induction of ANT I or by dissociation of the octamer of mitochondrial creatine kinase.
18 Conclusion
The first electron microscopic observation of contact sites as dynamic structures [75] reflected morphological perturbations in mitochondrial membranes induced by association or dissociation of the described kinase complexes. The different components of the complexes interact in various ways depending on their actual structures that are regulated by the mitochondrial membrane potential and the energy metabolism of the cell. Physiologically, the complexes are transient structures that are subjected to regulation. During isolation the complexes become stable because alternatively interacting partners are removed. Thus, the reconstituted complexes represent certain functional states that the different components have adopted. We have shown how perturbation of a single component such as the ANT changes the function of the whole complex and is reflected at the mitochondrial surface by VDAC. This means that physiological signals will change the complex architecture if they affect the properties of individual components such as VDAC conductance or hexokinase/creatine kinases activity with significant regulatory consequences for the cell. References 1 COLOMBINI M. A candidate for the permeability pathway of the outer mitochondrial membrane. Nature 1979, 279, 643–645.
2 BENZ R. Permeation of hydrophilic solutes through mitochondrial outer membranes: review on mitochondrial porins. Biochim Biophys Acta Rev
15
16 The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel)
3
4
5
6
7
8
9
10 11
12
13
Biomembr 1994, 1197, 167– 196. CASADIO R., JACOBONI I., MESSINA A., De PINTO V. A 3D model of the voltage-dependent anion channel (VDAC). FEBS Lett 2002, 520, 1–7. SAMPSON M. J., LOVELL R. S., CRAIGEN W. J. The murine voltage-dependent anion channel gene family. Conserved structure and function. J Biol Chem 1997, 272, 18966–18973. KMITA H., BUDZINSKA M. Involvement of the TOM complex in external NADH transport into yeast mitochondria depleted of mitochondrial porin. J Biochim Biophys Acta 2000, 1509, 86– 94. ANFLOUS K., BLONDEL O., BERNARD A., KHRESTCHATISKY M., VENTURA-CLAPIER R. Characterization of rat porin isoforms: cloning of cardiac style-3 variant encoding an additional methionine at its putative N-terminal region. Biochim Biophys Acta 1998 1399, 47–50. FREITAG H., NEUPERT W., BENZ R. Purification and characterization of a pore protein of the outer mitochondrial membrane from Neurospora crassa. Eur J Biochem 1982, 162, 629–636. DEPINTO V., BENZ R., PALMIERI F. Interaction of non-classical detergents with the mitochondrial porin. A new purification procedure and characterization of the pore-forming unit. Eur J Biochem 1989, 183, 179–187. POPP B., SCHMID A., BENZ R. Role of sterols in the functional reconstitution of water-soluble mitochondrial porins from different organisms. Biochemistry 1995, 34, 3352–3361. DAUM G. Lipids of mitochondria. Biochim Biophys Acta 1985, 822, 1–42. ARDAIL D., PRIVAT J. P., EGRET-CHARLIER M., LEVRAT C., LERME F., LOUISOT P. Mito-chondrial contact sites. Lipid composition and dynamics. J Biol Chem 1990, 265, 18797–18802. BRDICZKA D., WALLIMANN T. The importance of the outer mitochondrial compartment in regulation of energy metabolism. Mol Cell Biochem 1994, 133/134, 69–83. SIMONS K., IKONEN E. Functional rafts in
14
15
16
17
18
19
20
21
22
23
cell membranes. Nature 1997, 387, 569–572. SONG J., MIDSON C., BLACHLY-DYSON E., FORTE M., COLOMBINI M. The sensor regions of VDAC are translocated from within the membrane to the surface during the gating processes. Biophys J 1998, 74, 2926–2944. WELTE W., NESTEL U., WACKER T., DIEDERICHS K. Structure and function of the porin channel. Kidney Int 1995, 48, 930–940. ZIMMERBERG J., PARSEGIAN V. A. Polymer inaccessible volume changes during opening and closing of a voltage-dependent ionic channel. Nature 1986, 323, 36–39. GELLERICH F. N., WAGNER M., KAPISCHKE M., WICKER U., BRDICZKA D. Effect of macromolecules on the regulation of the mitochondrial outer membrane pore and the activity of adenylate kinase in the inter-membrane space. Biochim Biophys Acta 1993, 1142, 217–227. BENZ R., KOTTKE M., BRDICZKA D. The cationically selective state of the mitochondrial outer membrane pore: a study with intact mitochondria and reconstituted mitochondrial porin. Biochim Biophys Acta 1990, 1022, 311–318. LEMESHKO VV. Model of the outer membrane potential generation by the inner membrane of mitochondria. Biophys J 2002, 82, 684–692. ROSTOVTSEVA T., COLOMBINI M. VDAC channels mediate and gate the flow of ATP: implications for the regulation of mitochondrial function. Biophys J 1997, 72, 1954–1962. K¨ONIG T. KOCSIS B., M´ESZAROS L., NAHM K., ZOLTA´ N S., HORVA´ TH I. Interaction of a synthetic polyanion with rat liver mitochondria. Biochim Biophys Acta 1977, 462, 380–389. BENZ R., WOJTCZAK L., BOSCH W., BRDICZKA D. Inhibition of adenine nucleotide transport through the mitochondrial porin by a synthetic polyanion. FEBS Lett 1988, 210, 75–80. SAKS V., KUZNETSOV A., KHUCHUA Z., VASILYEVA E., BELIKOVA J., KESVATERA T. TIIVEL T. Control of cellular respiration in vivo by mitochondrial outer membrane
References
24
25
26
27
28
29
30
31
32
and by creatine kinase. A new speculative hypothesis: possible involvement of mitochondrial-cytoskeleton interactions. J. Mol Cell Cardiol 1995, 27, 625–645. LIU M. Y., COLOMBINI M. A soluble mitochondrial protein increases the voltage dependence of the mitochondrial channel, VDAC. J Bioenerg Biomembr 1992, 24, 41–46. SAKS V., KAAMBRE T. SIKK P., EIMRE M., ORLOVA E., PAJU K., PIIRSOO A., APPAIX F., KAY L., REGNITZ-ZAGROSEK V. FLECK E., SEPPET E. Intracellular energetic units in red muscle cells. Biochem J 2001, 356, 643–657. FIEK Ch, BENZ R., ROOS N., BRDICZKA D. Evidence for identity between the hex-okinase-binding protein and the mitochondrial porin in the outer membrane of rat liver mitochondria. Biochim Biophys Acta 1982, 688, 429–440. ¨ STLUND A K., G¨OHRING U., KRAUSE J., O BRDICZKA D. The binding of glycerol kinase to the outer membrane of rat liver mitochondria: its importance in metabolic regulation. Biochem Med 1983, 30, 231–245. WEILER U., RIESINGER I., KNOLL G., BRDICZKA D. The regulation of mito-chondrial-bound hexokinases in the liver. Biochem Med 1985, 33, 223–235. WICKER U., B¨UCHELER K., GELLERICH F. N., WAGNER M., KAPISCHKE M., BRDICZKA D. Effect of macromolecules on the structure of the mitochondrial inter-membrane space and the regulation of hexokinase. Biochim Biophys Acta 1993, 1142, 228–239. B¨UCHELER K., ADAMS V., BRDICZKA D. Localization of the ATP/ADP translocator in the inner membrane and regulation of contact sites between mitochondrial envelope membranes by ADP. A study on freeze fractured isolated liver mitochondria. Biochim Biophys Acta 1991, 1061, 215–225. XIE G., WILSON J. E. Tetrameric structure of mitochondrially bound rat brain hexokinase: a cross-linking study. Arch Biochem Biophys 1990, 276, 285– 293. HASHIMOTO M., WILSON J. E. Membrane potential-dependent conformational
33
34
35
36
37
38
39
40
changes in mitochondrially bound hexokinase in brain. Arch Biochem Biophys 2000, 884, 163–173. B¨UHLER S., MICHELS J., WENDT S., R¨UCK A., BRDICZKA D., WELTE W., PRZYBYLSKI M. Mass spectrometric mapping of ion channel proteins (Porins) and identification of their supramolecular membrane assembly. Proteins 1998, 2, 63–73. BEUTNER G. R¨UCK A., RIEDE B. WELTE W. BRDICZKA D. Complexes between kinases, mitochondrial porin and adenylate translocator in rat brain resemble the permeability transition pore. FEBS Lett 1996, 396, 189–195. VYSSOKIKH M. Y., KATZ A., R¨UCK A., WUENSCH C., DORNER A., ZOROV D. B., BRDICZKA D. Adenine nucleotide translocator isoforms 1 and 2 are differently distributed in the mitochondrial inner membrane and have distinct affinities to cyclophilin D. Biochem J 2001, 358, 349–358. OHLENDIECK K., RIESINGER I., ADAMS V., KRAUSE J., BRDICZKA D. Enrichment and biochemical characterization of boundary membrane contact sites in rat-liver mitochondria. Biochim Biophys Acta 1986, 860, 672–689. ADAMS V. BOSCH W., SCHLEGEL J., WALLIMANN T., BRDICZKA D. Further characterization of contact sites from mitochondria of different tissues: topology of peripheral kinases Biochim Biophys Acta 1989, 981, 213–225. BEUTNER G., R¨UCK A., RIEDE B., BRDICZKA D. Complexes between porin, hexokinase, mitochondrial creatine kinase and adenylate translocator display properties of the permeability transition pore. Implication for regulation of permeability transition by the kinases. Biochim Biophys Acta 1998, 1368, 7–18. KRAUSE J., HAY R., KOWOLLIK C., BRDICZKA D. Cross-linking analysis of yeast mitochondrial outer membrane. Biochim Biophys Acta 1986, 860, 690–698. ROJO M., HOVIUS R., DEMEL R. A., NICOLAY K., WALLIMANN T. Mitochondrial creatine kinase mediates contact formation between mitochondrial membranes. J Biol Chem 1991, 266, 20290–20295.
17
18 The Outer Mitochondrial Membrane Pore (Voltage-Dependent Anion Channel)
41 BRDICZKA D., KALDIS Ph, WALLIMANN Th In vitro complex formation between octamer of creatine kinase and porin. J Biol Chem 1994, 269, 27640–27644. 42 SCHLATTNER U., DOLDER M., WALLIMANN T., TOKARSKA-SCHLATTNER M. Mitochondrial creatine kinase and mitochondrial outer membrane porin show direct interaction that is modulated by calcium. J Biol Chem 2001, 276, 48027–48030. 43 BEYER K., KLINGENBERG M. ADP/ATP carrier protein from beef heart mitochondria has high amounts of tightly bound cardiolipin, as revealed by 31 P nuclear magnetic resonance Biochemistry 1985, 24, 3821–3826. 44 KRA¨ MER R., PALMIERI F. Metabolite carriers in mitochondria. In: ERNSTER L. (Ed.), Molecular Mechanisms in Bioener-getics. Elsevier, Amsterdam, 1992, pp. 359–384. 45 DIERKS T. SALENTIN A., HEBERGER C., KRA¨ MER R. The mitochondrial aspartate/glutamate and ADP/ATP carrier switch from obligate counterexchange to unidirectional transport after modification by SH-reagents. Biochim Biophys Acta 1990, 1028, 268–280. 46 HELDT H. W., KLINGENBERG M., MILOVANCEV M. Differences between the ATP-ADP ratios in the mitochondrial matrix and in the extramitochondrial space. Eur J Biochem 1972, 30, 434–440. 47 HARWORTH R. A., HUNTER P. R. Allosteric inhibition of the Ca2+ -activated hydrophilic channel of the mitochondrial inner membrane by nucleotides. J Membr Biol 1980, 57, 231–236. 48 CROMPTON M., COSTI A. Kinetic evidence for a heart mitochondrial pore activated by Ca2+ and oxidative stress. Eur J Biochem 1988, 178, 448–501. 49 ZORATI M., SZABAO I. The mitochondrial permeability transition. Biochim Biophys Acta 1995, 1241, 139–176. 50 KINALLY K. W., ZOROV D. B., ANTONENKO Y. N., SNYDER S. H., MCENERY M. W., TEDESCHI H. Mitochondrial diazepine receptor linked to inner membrane ion channels by nanomolar actions of ligand. Proc Natl Acad Sci USA 1993, 90, 1374–1378.
51 HALESTRAP A. P., DAVIDSON A. M. Inhibition of Ca + -induced large amplitude swelling of mitochondria by cyclosporin A is probably caused by binding to a matrix peptidylprolyl cis-trans-isomerase and preventing it interacting with the adenine nucleotide trans-locase. Biochem J 1990, 268, 153–160. 52 CROMPTON M., VIRJI S., WARD J. M. Cyclo-philin-D binds strongly to complexes of the voltage-dependent anion channel and the adenine nucleotide trans-locase to form the permeability transition pore. Eur J Biochem 1998, 258, 729–735. 53 BRUSTOVETSKY N., KLINGENBERG M. The mitochondrial ADP/ATP carrier can be reversibly converted into a large channel by Ca2+ . Biochemistry 1996, 35, 8483–8488. 54 R¨UCK A., DOLDER M., WALLIMANN T., BRDICZKA D. Reconstituted adenine nucleotide translocase forms a channel for small molecules comparable to the mitochondrial permeability transition pore. FEBS Lett 1998, 426, 97–101. 55 CROMPTON M., ELLINGER H., COSTI A. Inhibition by cyclosporin A of a Ca2+ -dependent pore in heart mitochondria activated by inorganic phosphate and oxidative stress. Biochem J 1888, 255, 257–360. 56 KLINGENBERG M., GREBE K., FALKNER G. Interaction between the binding of 35 S-atractyloside and bongkrekic acid at mitochondrial membranes. FEBS Lett 1971, 16, 301–303. 57 O’GORMAN E., BEUTNER G., DOLDER M., KORETSKY A. P., BRDICZKA D., WALLIMANN T. The role of creatine kinase in inhibition of mitochondrial permeability transition. FEBS Lett 1997, 414, 253–257. 58 KOTTKE M., WALLIMANN Th, BRDICZKA D. Dual localization of mitochondrial creatine kinase in brain mitochondria. Biochem Med Metabol Biol 1994, 51, 105–117. 59 YANG J. L. X., BHALLA K., KIM C. N., IBRADO A. M., CAI J., PENG T. I., JONES D. P., WANG X. Prevention of apoptosis by Bcl-2: release of cytochrome c from
References
60
61
62
63
64
65
66
67
68
mitochondria blocked. Science 1997, 275, 1129–1132. LIU X., KIM C. N., YANG J., JEMMERSON R., WANG X. Induction of apoptotic program in cell-free extracts: requirement for dATP and cytochrome c. Cell 1996, 86, 147–157. KLUCK R. M., GREEN D. R., NEWMEYER D. D. The release of cytochrome c from mitochondria: a primary site for Bcl-regulation of apoptosis. Science 1997, 275, 1132–1136. SALEH A. S. S., ACHARYA S., FISHEL R., ALNEMRI E. S. Cytochrome c and dATP-mediated oligomerization of Apaf-1 is a prerequisite for procaspase-9 activation. J Biol Chem 1999, 274, 17941–17945. PASTORINO J. G., SHULGA N., HOEK J. B. Mitochondrial binding of hexokinase II inhibits Bax-induced cytochrome c release and apoptosis. J Biol Chem 2002, 277, 7610–7618. CAPANO M., CROMPTON M. Biphasic translocation of BAX to mitochondria. Biochem J 2002, 367, 169–178. VYSSOKIKH M. Y., ZOROVA L., ZOROV D., HEIMLICH G., J¨URGENSMEIER J. M., BRDICZKA D. Bax releases cytochrome c preferentially from a complex between porin and adenine nucleotide translocator. Hexokinase activity suppresses this effect. Mol Biol Rep 2002, 29, 93–96. PASTORINO J. G., TAFANI, M., ROTHMAN, R. J., MARCINEVICIUTE A., HOEK J. B., FARBER J. L. Functional consequences of sustained or transient activation by Bax of the mitochondrial permeability transition pore. J Biol Chem 1999, 274, 31734–31739. HALESTRAP A. P., MCSTAY G. P., CLARKE S. J. The permeability transition pore complex: another view. Biochimie 2002, 84, 153–166. VANDER HEIDEN M. G., CHANDEL N. S., WILLIAMSON E. K., SCHUMACKER P. T.,
69
70
71
72
73
74
75
THOMPSON C. B. Bcl-xL regulates the membrane potential and volume homeostasis of mitochondria. Cell 1997, 91, 627–637. GELLERICH F. N., LATERVEER F. D., ZIERZ S., NICOLAY K. The quantification of ADP diffusion gradients across the outer membrane of heart mitochondria in the presence of macromolecules. Biochim Biophys Acta 2002, 1554, 48–56. GOTTLOB K., MAJEWSKI N., KENNEDY S., KANDEL E., ROBEY R. B., HAY N. Inhibition of early apoptotic events by Akt/PKB is dependent on the first committed step of glycolysis and mitochondrial hexokinase. Genes Dev 2001, 15, 1406–1418. SOBOLL S. BRDICZKA D., JAHNKE D., SCHMIDT A., SCHLATTNER U., WENDT S., WYSS M., WALLIMANN T. Octamer-dimer transitions of mitochondrial creatine kinase in heart disease. J Mol Cell Cardiol 1999, 31, 857–866. BAUER M. K., SCHUBERT A., ROCKS O., GRIMM S. Adenine nucleotide translocase-1, a component of the permeability transition pore, can dominantly induce apoptosis. J Cell Biol 1999, 147, 1493–1502. D¨ORNER A., SCHULZE K., RAUCH U., SCHULTHEISS H. P. Adenine nucleotide translocator in dilated cardiomyopathy: pathophysiological alterations in expression and function. Mol Cell Biochem 1997, 174, 261–269. DOLDER M., WENDT S., WALLIMANN T. Mitochondrial creatine kinase in contact sites: interaction with porin and adenine nucleotide translocase, role in permeability transition and sensitivity to oxidative damage [Review]. Biol Signals Recept 2001, 10, 93–111. KNOLL G., BRDICZKA D. Changes in freeze-fracture mitochondrial membranes correlated to their energetic state. Biochim Biophys Acta 1983, 733, 102–110.
19
1
The Phytochromes
Shih-Long Tu, and Clark Lagarias University of California at Davis, Davis, USA
Originally published in: Handbook of Photosensory Receptors. Edited by Winslow R. Briggs and John L. Spudich. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31019-7
1 Introduction 1.1 Photomorphogenesis and Phytochromes
Oxygenic photosynthetic organisms have evolved sophisticated mechanisms to adapt to their environment. Dependent upon light as an energy source, these organisms must also cope with too much light, which is especially challenging for highly pigmented species living in an aerobic environment. For this reason, plants have evolved light-receptor systems to recognize and respond to light quality, fluence rate, direction and duration from their environment. Among the many physiological processes under light control are seed germination, seedling growth, synthesis of the photosynthetic apparatus, the timing of flowering, neighbor detection, and senescence. Such light-regulated growth and developmental responses are collectively known as photomorphogenesis [19, 81, 119, 179]. Phytochrome was the first of the photomorphogenetic photoreceptor families to be identified nearly 50 years ago, [18]. In the last decade, research on the physiological, biochemical and functional properties of phytochromes has grown exponentially, [44, 54, 117, 129, 140]. More recent advances in this field can be attributed to the impact of genomics, studies that have revealed phytochrome-related genes in representative organisms from all forms of life on earth except Archaea, [111, 171]. The extended phytochrome family can be categorized into three subfamilies: plant phytochromes (Phy), cyanobacterial phytochrome 1 (Cph1) and cyanobacterial phytochrome 2 (Cph2) families (Figure 1). This review focuses on phytochromes that are found in oxygenic photosynthetic organisms.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Phytochromes
Fig. 1 The phytochrome family. Orthologous photosensory input (GAF, PHY and PAS) and regulatory output (PAS and HKRD) subdomains are depicted in representative members of the phytochrome (Phy), cyanobacterial phytochrome 1/bacte-
riophytochrome (Cph1/BphP) and cyanobacterial phytochrome 2 (Cph2) families (n = 0–2). The linear tetrapyrrole (phytobilin) chromophore is shown associated with the P3 GAF and P4 PHY domains. Adapted from Montgomery and Lagarias (2002).
1.2 The Central Dogma of Phytochrome Action
Synthesized in the red light-absorbing Pr form, all phytochromes are regulated by red light (R) absorption which initiates the photochemical interconversion to the far-red (FR) light-absorbing Pfr form (Figure 2 A). FR promotes the reverse conversion of Pfr to Pr – a process which typically abolishes the R-dependent activation of the photoreceptor. This R/FR photoreversibility is conferred by a linear tetrapyrrole (phytobilin) prosthetic group that is covalently attached to the phytochrome apoprotein. Supporting evidence for the central dogma of phytochrome action, that is that Pfr is the active form, has been accumulating for years. Much of this evidence reflects the strong correlation between the amount of Pfr produced by a given fluence of light and the magnitude of the biological response. While the central dogma appears to hold true for R-FR-reversible low-fluence responses (LFR) and for R-dependent very low fluence responses (VLFR), [147], FR high irradiance responses (HIR) do not conform to this simple view of phytochrome action, [20, 55, 148]. Such data indicate that Pr, Pfr, photocycled-Pr, as well as intermediates produced during Pfr to Pr photoconversion, may all function to transduce the light signal for different phytochromes (Figure 2 B). In addition to photochemical interconversion processes, non-photochemical Pfr-to-Pr dark reversion plays an important role to attenuate the signal output by altering the lifetime of the Pfr form (Figure 2 B). For a given quality and f luence rate of light in the environment, signal output from phytochrome therefore depends on holophytochrome synthesis, the two photochemical interconversion processes, dark reversion as well as protein turnover. It is on these topics that we will mainly focus our discussion.
2 Molecular Properties of Eukaryotic and Prokaryotic Phytochromes 3
Fig. 2 Spectral properties and modes of action of plant phytochromes. Panel A shows absorption spectra of purified oat phytochrome after saturating red (dotted line) and far-red (solid line) irradiation representing a mixture of 87% Pfr/13% Pr and
100% Pr, respectively. The deduced specturm of Pfr (dashed line) was obtained as described (Kelly and Lagarias [80]). Panel B depicts the three modes of phytochrome action in flowering plants that have been described by action spectroscopy.
2 Molecular Properties of Eukaryotic and Prokaryotic Phytochromes 2.1 Molecular Properties of Plant Phytochromes
In flowering plants, phytochromes are encoded by a small nuclear gene family reflecting repeated gene duplication of a eukaryote phytochrome progenitor during its evolution, [102]. In the model plant Arabidopsis thaliana, the phytochrome family consists of 5 genes, denoted phyA–phyE, [27], while monocot species (eg. rice or maize) appear to possess only representatives of the phyA–C families, [103]. The number of phytochrome species is not known for more primitive plants, that is gymnosperms, mosses, ferns, liverworts and algae. However the overall structure of the phytochrome photoreceptor has been preserved in all extant eukaryotic photosynthetic organisms (Figure 1). Although atypical phytochromes have been found in mosses and ferns, these organisms also possess conventional phytochromes, [120, 167]. For this reason, we will limit our discussion to those eukaryotic phytochromes which conform to the consensus structure. Our present understanding of phytochrome structure is mainly based on biochemical analysis of phyAs, but for those studied, other phytochromes appear to possess similar properties. Eukaryotic phytochromes are soluble homodimeric proteins consisting of two ∼120 kDa subunits. Small-angle X-ray scattering analyses indicate an overall protein fold with dimensions similar to mammalian immunoglobin Gs, [118]. Each subunit is composed of two domains that are separated by a protease-sensitive hinge region: the 60–70 kDa N-terminal ‘photosensory’ and 55 kDa C-terminal
4 The Phytochromes
‘regulatory’ domains. The photosensory domain functions to sense the light input and is composed of four sub-domains, that is the serine-rich N-terminal P1 domain, a PAS-related P2 domain, the bilin-binding P3 GAF domain, and the P4 PHY domain, [111]. PAS and GAF domains have been shown to possess structurally-related protein folds, [67], with PAS domains often comprising the binding site for small planar aromatic ligand molecules, [4, 127, 158]. PHY domains, evolutionarily related to PAS and GAF domains, may also adopt a similar three dimensional fold, [111]. The phytobilin prosthetic group of plant phytochromes is covalently bound to a conserved cysteine residue found in the P3 GAF domain via a thioether linkage, [90]. The presence of this conserved cysteine residue is one of the key structural features that distinguishes the phytochrome photoreceptors from members of the phytochrome-related BphP family. Biochemical analyses have shown that the photosensory domain of plant phytochromes adopts a compact globular protein fold with the phytobilin chromophore mostly buried within the protein matrix, [188]. By contrast, the C-terminal regulatory domain of plant phytochromes is elongated and more sensitive to proteolytic degradation. Comprised of four sub-domains, including two PAS domains and two domains related to the transmitter ‘output’ domains found on prokaryotic twocomponent sensor proteins, [145], the regulatory domain of plant phytochrome contains the site(s) of homodimerization, [76]. All four subdomains have been implicated in the high-affinity subunit-subunit interaction suggesting that no single subdomain of the C-terminus is necessary or sufficient, [77, 170]. By analogy to twocomponent sensor proteins, it is reasonable that the R4 H-ATPase-related domain on one subunit interacts with the R3 histidine phosphotransferase (HPT)-related domain on the other subunit, [157]. The potential for heterotypic interactions between R1 and R2 PAS domains on different subunits are also reasonable but has not been directly assessed to date. The critical role of PAS domains to plant phytochrome function is underscored by the many loss-of-function alleles that map to these regions, [130]. The hypothesis that the PAS repeats play a central role in phytochrome signaling is also supported by localization of light-dependent conformational changes to this region, [87] and to the identification of nuclear targeting sequences within this domain, [24, 104]. As depicted in Figure 3, light-induced conformational changes have been shown to occur in both photosensory and regulatory domains of phytochromes with the consensus view being that the P1 domain becomes less exposed while P4, R1, and R2 domains become more exposed upon Pr to Pfr phototransformation, [87]. The surprising observation that the C-terminal domain is dispensable for phytochrome function, if it is replaced with a nuclear-targeted protein capable of homodimerization, indicates that R1–R4 subdomains regulate the subcellular localization, dimerization and overall topology of the N-terminal domain, [104]. Together with structure-function studies using chimeric phytochromes, [175], these results indicate that phytochrome signaling in the nucleus is mediated by protein-protein interactions with its N-terminal photosensory domain. Aside from homodimerization, the presence of transmitter kinase-related R3 and R4 ‘output’ domains on all eukaryotic phytochromes implicates their role in
2 Molecular Properties of Eukaryotic and Prokaryotic Phytochromes 5
Fig. 3 Light induced conformational changes of plant phytochromes. The lightdependent accessibility of various regions of plant phytochrome to modification/interaction by proteases (inverted ar-
rows), protein kinases, and monoclonal antibodies (filled arrowheads) is depicted on the oat phyA subunit. Adapted from McMichael and Lagarias [108].
ATP-dependent phosphotransferase activity. Phytochrome R4 domains possess all of the consensus residues found in H-ATPase domains of the transmitter kinase super-family which is consistent with the observation that plant phytochromes bind ATP, [184]. Plant phytochromes are also autophosphorylating, ATP-dependent serine-threonine protein kinases, [185, 189]. These results, along with the lack of a conserved autophosphorylating histidine residue in the R3 HPT-related domain, indicate that plant phytochromes are atypical transmitter kinases that have gained a new function during evolution, [22]. Resolution of an apparent enzymatic function for the transmitter-related domains of plant phytochromes with their dispensability remains an important, albeit unanswered question. The bonafide catalytic activity of transmitter domains of cyanobacterial phytochromes argue for an ATP-dependent enzymatic role for this region of plant phytochromes in light signaling, [190]. 2.2 Molecular Properties of Cyanobacterial Phytochromes
The Cph1 family encompasses those prokaryotic phytochromes with the greatest similarity to plant phytochromes (Figure 1). All members of this family, which include the bacteriophytochromes, possess three of the four photosensory domains (P2-P4) found on plant phytochromes as well as the two conserved subdomains found in transmitter modules of two-component histidine kinases, [125]. The two regulatory R1 and R2 PAS domains, as well as the serine-threonine rich P1 domain, are missing from all members of the Cph1 family. The first representative of this family to be identified was Cph1, a soluble 85 kDa protein from the cyanobacterium Synechocystis sp. PCC 6803, [73, 190]. Cph1-related genes are present in most cyanobacterial genomes examined; those with conserved cysteine residues in their P3 GAF domains are predicted to encode phytobilin-binding, R/FR photoreversible chromoproteins, [63]. Evidence that the latter is true has been reported for Synechocystis Cph1, [95, 124] and for CphA from Calothrix sp. PCC7601, [78]. Many cyanobacterial species have two representatives of this family in their genome, the other being a bacteriophytochrome, [63].
6 The Phytochromes
Owing to the ability to express and reconstitute large quantities of recombinant Cph1s in bacteria, the biochemical properties of these proteins have been extensively investigated, [78, 92, 95, 124]. Like plant phytochromes, Cph1s bind phytobilins via the conserved cysteine residue in the P3 GAF domain to generate RFR photoreversible chromo-proteins, [71, 78, 92]. Cph1 and CphA are also protein kinases, whose histidine autophosphorylation and aspartate phosphotransferase activities are both light dependent, [72, 190]. By contrast with plant phytochromes, both activities are Pr-dependent; thus light absorption leads to inhibition rather than activation of Cph1’s catalytic function. These studies have shown that Cph1’s enzymatic activities require both HPT and H-ATPase domains which function as phosphohistidine donor/acceptor and ATP-binding sites, respectively, [190]. The overall take-home lessons from biochemical studies on the Cph1 familiy of phytochromes are that 1) Cph1’s P2-P3-P4 photosensory domain structure and phytobilin-binding environments are very similar to those of eukaryotic phytochromes, 2) Cph1’s C-terminal HPT and H-ATPase domains are required for homodimerization, and 3) monomeric and dimeric forms of Cph1 are in dynamic equilibrium – a process that is regulated by both phytobilin binding and light. With regard to the third conclusion, size-exclusion chromatographic analysis has established that phytobilin binding to the Cph1 apoprotein promotes subunit-subunit association, [92, 124]. Taken together, these results support the working hypothesis that phytobilin binding to apoCph1 activates the kinase activity by promoting homodimerization and trans-autophosphorylation. While additional experiments are needed to elucidate the structural basis of light inactivation of Cph1’s kinase activities, this could be accomplished by increasing the subunit-subunit dissociation constant and/or by decreasing the affinity for ATP. These questions remain important topics for future investigation. The second family of prokaryotic phytochromes, that is the Cph2s, also have been found exclusively in cyanobacteria, [111]. With the exception of the founding member, that is Cph2 from Synechocystis sp. PCC6803, all members of the Cph2 family are composed of two to four GAF domains and terminate in a transmitter module containing both HPT and H-ATPase subdomains. The N-terminal GAF domains of Cph2s are most similar to the P3 GAF domains of plant and Cph1 phytochromes in which the conserved cysteine residue is located. Moreover, the GAF domain immediately adjacent to this conserved GAF domain possesses strong sequence similarity with P4 GAFs of both other phytochrome families, [111]. To test the functional significance of this similarity, recombinant Cph2 was shown to encode a phytobilin-binding 1276 amino acid apoprotein that exhibited a characteristic R/FR photoreversible-phytochrome spectral signature, [123]. The phytobilin binding activity and spectroscopic properties of the full length protein were retained within the two N-terminal GAF domains of Cph2, [186]. These results provided compelling support for the hypothesis that the P3 GAF and P4 PHY subdomains of phytochromes delimit the region of phytochromes that is in direct contact with the phytobilin prosthetic group. Through expression of the N-terminal GAF domain of Cph2, it was also shown that the P3 GAF subdomain is sufficient for the phytobilin lyase activity, [186]. Based on these results and the preserved GAF-PHY
3 Photochemical and Nonphotochemical Conversions of Phytochrome 7
molecular architecture for members of all three phytochrome families, we hypothesize that the initial molecular changes that accompany light activation of the phytobilin prosthetic group will be shared by all three classes of phytochromes. The widespread distribution of the Cph1 and Cph2 families of cyanobacterial phytochromes, together with their light-regulated enzymatic activities, suggests that these proteins perform important photosensory roles in these organisms. Interposon mutagenesis of Cph1 and Cph2 genes has so far failed to convincingly identify photoregulatory functions for their protein products although they appear to modulate phototaxis, [180]. Owing to the presence of many phytochrome-related genes in cyanobacterial genomes, the potential for redundant function may account for the lack of a clear phenotype of knockout mutants. Some of the potential functions of these proteins include both photosensory roles (i.e. phototaxis, lightintensity sensing, chromatic adaptation) as well as non-photosensory roles (i.e. regulators of tetrapyrrole and/or iron metabolism or oxygen sensors). There is clearly no dearth of interesting questions remaining to be answered on these two families of prokaryotic phytochromes.
3 Photochemical and Nonphotochemical Conversions of Phytochrome 3.1 The Phytochrome Chromophore
Based upon spectral measurements of phytochrome in 1959, Butler et al. proposed that the phytochrome chromophore was a linear tetrapyrrole similar to those found in phycobiliproteins – the major components of the phycobilisome lightharvesting antennae in cyanobacteria and red algae, [18]. Siegelman and colleagues later reported that the major pigment released from oat phytochrome by refluxing methanol was a linear tetrapyrrole, [149]. An oxidative degradation approach was also used to show that this compound was a 2,3-dihydrobiliverdin ‘phytobilin’ pigment similar to those derived from phycobiliproteins, [83, 137, 138]. The structure of this phytobilin was subsequently confirmed by total chemical synthesis to be 2R, 3E-phytochromobilin (PB) [177]. Together with peptide-mapping studies, [52], these data indicated that the phytobilin chromophore of oat phytochrome was covalently bound to the apoprotein via a thioether linkage – a hypothesis that was later confirmed by 1 H NMR spectroscopy, [90]. The structure and linkage of the Pr chromophore of oat phytochrome based on these studies, and its phytobilin precursor, are shown in Figure 4. The chromophores of no other phytochrome have received direct chemical scrutiny to date. In this regard, it has been generally assumed that the structure and linkages of all flowering-plant phytochrome chromophores are the same. The situation is less clear for phytochromes from more primitive plant species, except for the evidence that the phytochrome from the green alga Mesotaenium caldariorum and Cph1 from the cyanobacterium Synechocystis both possess chromophores
8 The Phytochromes
Fig. 4 The chromophore and phytobilin precursors of plant, green algal and cyanobacterial phytochromes. Panel A illustrates the structure and linkage of the chromophore of oat phytochrome A as deduced by 1 H NMR spectroscopy [90]. The PCB-derived chro-
mophores of green algal and cyanobacterial phytochromes are assumed to have identical linkages and stereochemistry by analogy. Panel B shows the phytobilin precursors of plant, green algal, and cyanobacterial phytochromes.
derived from the more reduced phytobilin prosthetic group, phycocyanobilin (PCB), [71, 187]. Linear tetrapyrroles are flexible molecules, and their spectroscopic properties are strongly influenced by their conformation, protonation state, and chemical environment, [43]. For this reason, the chemical structure of the phytochrome chromophore in situ is still the subject of debate. It is clear however that the phytobilin prosthetic group is protonated and adopts an extended configuration in situ – conclusions that are based on vibrational spectroscopic studies and the large ratio of phytochrome’s red to near UV absorption maxima, [13]. This contrasts with the small ratio observed for bilins in free solution which assume more cyclic, porphyrin-like conformations. Based on vibrational spectroscopic analysis and semi-empirical vibrational energy calculations, the 4Z-anti, 10E-anti, 15Z-syn configuration was proposed for the Pr chromophore, [2]. This contrasts with the proposed 4Z-anti, 10Z-syn, 15Z-anti configuration, also found in chromophores of the phycobiliproteins, that was based on similar arguments, [84]. This issue will likely remain controversial until a phytochrome crystal structure is determined. Regarding the Pfr chromophore structure, 1 H NMR analysis of chromopeptides derived from the Pfr form of oat phytochrome, [139], combined with vibrational spectroscopic studies, strongly implicate the Z to E isomerization of the C15 double bond upon Pr to Pfr photoconversion, [2, 3, 45, 64, 84, 109]. This hypothesis is further supported by the intense fluorescence of covalent adducts between apophytochromes and phycoerythrobilin (PEB), a PB analog that lacks this double bond, [116]. Taken together, these results indicate that the phytobilin chromophore of phytochrome is tightly tethered to the apoprotein in a manner enabling Z to E isomerization of the C15 double bond to be the only efficient mode of radiationless de-excitation.
3 Photochemical and Nonphotochemical Conversions of Phytochrome 9
3.2 Phytochrome Photointerconversions
The photochemical interconversion processes of phytochromes have been extensively studied for many years, [13, 14, 16, 82, 136]. Since Pr and Pfr have overlapping absorption spectra in most regions of the light spectrum, Pr-to-Pfr and Pfr-to-Pr photoconversion processes lead to the formation of a photoequilibium consisting of a mixture of Pr and Pfr forms under saturating illumination. The light fluence needed to produce this photoequilibrium depends on the intensity and wavelength of light used as well as the relative quantum yields for the Pr-to-Pfr and Pfr-to-Pr phototransformation processes, [16]. Photoequilibrium is most rapidly achieved at the absorption maxima of Pr and Pfr – red light (R) producing a mixture of roughly 87% Pfr and 13% Pr for oat and rye phyA, [80, 88]. Owing to the lack of Pr absorption in the far-red (FR) region of the light spectrum, FR irradiation can convert >99% of phytochrome to the Pr form, [101]. This information has been used to calculate the amount of Pfr produced by a given f luence of light providing the basis of support for the central dogma(s) of phytochrome action described previously. From time-resolved absorption and low-temperature trapping spectroscopic techniques, a number of intermediates accompaning the Pr-to-Pfr and Pfr-to-Pr photoin-terconversion processes have been identified. Two nomenclatures have been used to identify these intermediates – one based on the rhodopsin system and the other based upon the wavelength maxima of the intermediates. The former nomenclature will be used herein and in Figure 5. While the following discussion focuses on plant phytochromes (and mostly phyAs), comparative studies on Cph1 suggest that the initial photochemical interconversions are qualitatively similar, [47, 134, 150, 151]. Photoexcitation of Pr yields Lumi-R, the first ‘stable’ photoproduct that absorbs near 700 nm. This conversion occurs very rapidly; the half-life of the Pr excited state(s) lying between 25 and 50 ps, [1, 10, 70]. Based on resonance Raman analyses, it is generally accepted that Z, syn to E, syn isomerization of the C15 double bond occurs during this initial photoconversion, [2, 109]. This mechanistic interpretation is consistent with 1) low-temperature absorption kinetic measurements, [37], 2) the intense fluorescence of PEB-apophytochrome adducts which lack the C15 double bond, [116], and 3) the small deuterium isotope effect on Pr fluorescence yield – results that all but rule out intramolecular proton transfer as the primary mechanism of Pr excited-state decay, [15]. Lumi-R decay occurs in the microsecond timescale to meta-Ra, followed by its conversion into other kinetically distinguishable intermediates (eg. meta-Rc) within the microsecond to millisecond range, ultimately yielding Pfr. The temperatureand solvent-dependence of these non-photochemical interconversions implicate significant changes in phytobilin-apoprotein interactions. At least one of these processes involves the rotation about the C14–C15 single bond to yield the final C15 E,anti conformation of the Pfr chromophore (Figure 5). The chemical structures of these intermediates is subject of debate, with some investigators favoring a deprotonation-reprotonation mechanism while others assert that the chromophore
10 The Phytochromes
Fig. 5 Phytochrome photointerconversion processes. The Pr-to-Pfr photoconversion involves the light-dependent 15Zto-15E isomerization, followed by a lightindependent rotation about the C14–C15 bond and changes in chromophore-protein interactions. The Pfrto-Pr photoconversion
is envisaged to involve a concerted inplane photoisomerization of the C15 double bond, followed by a series of light-independent chromophore-protein relaxation steps. The lifetime and absorption maxima of the various spectroscopically detectible intermediates are shown. Adapted from [3].
remains protonated throughout, [13, 46]. The hypothesis that the phytobilin chromophore is moderately planar in both Pr and lumi-R but becomes more distorted from planarity upon conversion to Pfr is presently the most widely accepted interpretation of the available spectroscopic data, [13]. The overall Pr-to-Pfr photochemical quantum yield (r) has been determined to be in the range of 0.15–0.17 for plant phytochromes, [88]. This quantum yield is in agreement with that determined by femtosecond spectroscopy, [1] and by optoacoustic spectroscopy, [58]. The reverse Pfr-to-Pr photointerconversion is less well characterized for technical reasons but clearly this also proceeds through multiple intermediates, [23]. It is notable that Pfr fluorescence and Pfr-to-Pr photoconversion quantum yields are both lower than those of Pr and Pr-to-Pfr phototransformations, respectively. In this regard, the fluorescence quantum yield for Pfr is vanishingly small, that is less than 10−6 at room temperature, [178], and fr at 0.06–0.08 is less than half of that of r, [88]. Together with the results of resonance Raman analyses and theoretical arguments, [3], these observations have been interpreted to support a mechanism for the Pfr-to-Lumi-F photochemical interconversion that involves a
3 Photochemical and Nonphotochemical Conversions of Phytochrome 11
concerted configurational and conformational isomerization about the C15 double bond (Figure 5). The spectroscopic evidence for strong nonbonded interactions between the C- and D-ring methyl groups in the Pfr chromophore indicate that the excited state of Pfr and its photoproduct are strongly coupled, which may be a major structural feature which favors the concerted C15 E,anti to C15 Z,syn isomerization mechanism, [3]. The small value for fr suggests that one or more of the Pfrto-Pr intermediates can thermally dark-revert back to Pfr-like meta-R intermediates produced in the forward reaction. It also is conceivable that a second reversible photochemical reaction (i.e. proton transfer, amino acid adduct formation) had occurred in parallel with the photoisomerization mechanism to quench the Pfr excited state. Other than their absorption-spectroscopic signatures and thermal stabilities, little is known about the chemical structures of these intermediates and much still remains unknown regarding the reverse photointerconversion process. 3.3 Dark Reversion
From the first spectroscopic measurement of phytochromes, the process of nonphotochemical Pfr-to-Pr dark reversion became evident, [51]. Dark reversion of phytochrome has not only been measured in vitro, but has been observed in vivo, which lead to the hypothesis that this process is of physiological significance, [17]. The bulk of these studies were performed using dark-grown plant seedlings to reduce chlorophyll absorption which strongly overlaps with phytochrome and because these seedlings accumulate high levels of phyA, considerably improving the sensitivity of phytochrome photoassays. Spectrophotometric surveys of etiolated plant tissues revealed considerable variation in the amount of dark reversion for different plant species. While monocot plants showed considerably less dark reversion compared with dicots, the distinct morphology and cell types of these two classes of plants made it difficult to conclude whether this difference reflected the intrinsic properties or cellular environments of phytochromes from the two classes of plants, [51]. The development of methods to isolate and purify phytochromes have facilitated analysis of the molecular basis of dark reversion – a process that has been shown to be modulated by temperature, pH, various denaturants, proteases and reductants, [51]. As improvements have been made in the method of phytochrome purification, the discrepancy between in vivo and in vitro measurements of phytochrome dark reversion has been resolved. Consistent with in vivo measurements, full length monocot phyAs display very little dark reversion in vitro, while proteolytic removal of 50–100 amino acid at phyA’s N-terminus (i.e. the P1 domain) significantly promotes dark reversion, [172]. The stabilizing inf luence of the P1 domain has been seen for most phytochromes examined – its removal invariably leading to a 10–15 nm blue shift of the Pfr absorption maximum, [173]. Indeed, the lack of a P1 domain in Cph1 and Cph2 phytochromes may be responsible for the blueshifted absorption spectra and enhanced dark reversion of these cyanobacterial photoreceptors compared with flowering plant phyAs, [95, 190].
12 The Phytochromes
The low abundance of phyB-E in plants has made it difficult to measure their spectroscopic properties until recently. The development of methods to express and reconstitute recombinant holophytochromes has enabled investigation of the biochemical and spectroscopic properties of phytochromes whose genes have been cloned (see subsequent section). Studies on the Arabidopsis phytochrome family have shown that recombinant phyC and phyE exhibit significantly more dark reversion than recombinant phyA and phyB, [36]. The importance of dark reversion to phytochrome function is further underscored by the observation that the lossof-function phyB-101 mutant displays enhanced dark reversion, [38]. Localization of the phyB-101 mutation to the R2 PAS repeat implicates intra- and inter-molecular interations within the phytochrome dimer to stabilize the ‘active’ Pfr form. The possibility that other loss-of-function alleles of phytochrome may represent enhanced dark reversion has not been carefully assessed in all instances. It is based on these and other biochemical observations that the regulatory interplay between the N- and C-terminal regions of phytochromes is the subject of active ongoing investigation, [154]. The fact that plant phytochromes are all homodimers also raises the possibility that dark reversion of a Pfr chromophore on one subunit may be affected by the chromophore state on the other subunit, [55]. While there is evidence to support this hypothesis, for example that Pfr-Pfr homodimers dark revert more slowly than Pfr-Pr heterodimers, this simple interpretation does not hold true for all phytochromes so far examined, [36]. In summary, dark reversion plays an important role to regulate the phytochrome signal output — a role that future investigations will hopefully better resolve.
4 Phytochrome Biosynthesis and Turnover
Since light perception by all phytochromes depends upon the presence of the linear tetrapyrrole chromophore, the metabolic processes involved in the synthesis of its phytobilin prosthetic group and its assembly with apophytochrome are of fundamental importance to light-mediated plant growth and development. As phytobilin-binding proteins, phytochromes also influence tetrapyrrole metabolism in addition to their photoregulatory function. This section summarizes the current understanding of the pathways involved in the synthesis and assembly of phytobilins to apophytochromes. 4.1 Phytobilin Biosynthesis in Plants and Cyanobacteria
The finding that phytochromes possess linear tetrapyrrole prosthetic groups indicated that its synthesis would share common intermediates with heme and chlorophyll biosynthetic pathways. Early studies exploited the development of specific inhibitors of early steps in the tetrapyrrole pathway which could be
4 Phytochrome Biosynthesis and Turnover 13
overcome by feeding hypothetical intermediates after the blocked step. Oat seedlings treated with gabaculine or 5-aminohexynoic acid, potent inhibitors of glutamate-1-semialdehyde amino transferase (GSAT), were shown to induce both phytochrome and chlorophyll deficiencies, [40, 57]. Supplementation of the medium with 5-aminolevulinic acid (ALA), protoporphyrin or biliverdin IXα (BV) restored phytochrome levels to those of un-inhibited plants, thereby establishing the intermediacy of these compounds in the pathway of phytochrome chromophore biosynthesis, [39, 41, 57, 75, 86]. Based on these studies, it has been well established that the phytobilin biosynthesis shares intermediates with siroheme, protoheme, and chlorophyll biosynthetic pathways that are depicted in Figure 6. All plant tetrapyrroles are derived from glutamate which is converted to ALA via three enzymes: glutamate tRNA synthetase, glutamate tRNA reductase (GluTR) and GSAT, [6, 169]. Found in plants, algae, most eubacteria, and many archaebacteria, the glutamate C5 pathway for ALA synthesis can be contrasted with the C3 Shemin pathway of alpha proteobacteria and non-photosynthetic eukaryotes that utilizes the enzyme ALA synthase (ALAS). In all chlorophyll-based photosynthetic organisms, eight molecules of ALA are metabolized in several steps to yield protoporphyrin IX (Figure 6). A key branch point in chlorophyll and heme biosynthesis, protoporphyrin IX is metalated with magnesium or iron by two distinct chelatase systems to produce Mg-protoporphyrin or iron-protoporphyrin (heme). The intermediacy of heme in phytochrome chromophore biosynthesis was implicated by the observation that Mg-protoporphyrin was unable to restore holo-phytochrome levels in gabaculine-treated plants. Unfortunately, exogenous heme was not incorporated into phytochrome under the conditions used for ALA, protoporphyrin and BV rescue, [166]. Resolution of this impasse was made from two lines of investigation – development of an in organello assay system and molecular cloning of genes involved in phytochrome chromophore biosynthesis. With the knowledge that all plant tetrapyrroles are derived from ALA made in plastids, an isolated plastid system was used to address the hypothesis that linear tetrapyrroles are synthesized in this compartment, [162]. These studies took advantage of the ability of apophytochromes to assemble with phytobilin precursors to yield photochemically active holoproteins [see, [176] and later discussion]. By adding apophytochrome to the assay mixtures, these investigators were able to show that the entire pathway of phytochrome chromophore biosynthesis resides in plastids, [162] and later confirmed the intermediacy of ALA, heme, and biliverdin in its synthesis, [166]. While the low abundance of the enzymes unique to the pathway of phytobilin biosynthesis has proven challenging to their purification and molecular cloning, the key breakthrough on this subject was made with the molecular cloning of HY1 and HY2, [85, 114] – two loci known from previous studies to be involved in the synthesis of the phytochrome chromophore in Arabidopsis, [126]. These studies have confirmed that the phytochrome chromophore is derived from heme via the intermediacy of BV and PB (see Figure 4 for structure). The properties of the plant-specific heme oxygenase and phytobilin synthases are described in the following section.
14 The Phytochromes
Fig. 6 Phytobilin biosynthetic pathways in plants, cryptophytes, and cyanobacteria. Phytobilin biosynthesis shares common intermediates with siroheme, heme, and chlorophyll pathways up to the level of protopor-phyrin IX. Iron insertion leads to heme, which is subsequently converted to BV by heme oxygenase and to phytobilins by a member of the ferredoxin-dependent bilin reductase (FDBR) family. Magnesium insertion leads to the chlorophylls. Heme, BV, phytobilins, and chlorophylls assemble with their respective apoproteins – processes that draw off these intermediates. Alternative pathways for ALA synthesis, heme, and BV metabolism found in animals and some bacteria are shown in the
stipled boxes. Exogenous intermediates that were incorporated into the phytochrome chromophore are underlined. Metabolic steps that are feedback-inhibited by heme are shown with circled Hs. The symbols (H) and (O) indicate that oxidation and reduction steps occur. Abbreviations include: ALA, 5-aminole-vulinic acid; apoBphP, apo-bacteriophytochromes; apoCph, apocyanobacterial phytochromes; apoHP, apohemoproteins; apoLHCP, apo-light harvesting chlorophyll binding proteins; apoPHYs, apo-phytochromes; FDBRs, ferredoxindependent bilin reductases; GSA, glutamate1-semialdehyde; HO, heme oxygenase; tRNAglu, glutamate tRNA
4.1.1 Ferredoxin-dependent Heme Oxygenases Heme oxygenases (HO) from mammals were the first of this family of enzymes to be identified, [159]. Microsomal enzymes, mammalian heme oxygenases catalyze the oxygen-dependent conversion of heme to BV, CO, and Fe2+ products, [121]. The seven electrons required for this reaction are provided by microsomal NADPH-
4 Phytochrome Biosynthesis and Turnover 15
cytochrome P450 reductases, [141, 160]. Although they catalyze the same reaction, plant HOs are soluble plastid-localized proteins that utilize soluble ferredoxin for reducing power, [115]. Despite their functional differences, the two families of HOs are structurally and evolutionarily related, [50, 163]. Ferredoxin-dependent HOs plays a pivotal role in the biosynthesis of phytobilins in all oxygenic photosynthetic organisms. In plants, BV is reduced by the enzyme phytochromobilin synthase to yield PB, the immediate precursor of the phytochrome chromophore of phytochromes that is critical for its photoactivity, [166]. In cyanobacteria and algae, BV is converted into the phytobilin precursors of the chromophores of the light-harvesting phycobiliproteins for photosynthesis, [5]. These include both PEB and PCB, the latter of which also serves as the immediate precursor of the cyanobacterial phytochromes, Cph1 and Cph2. The first HO identified from a photosynthetic organisms was found in the red alga Cyanidium caldarium by Troxler and colleagues, [168]. These researchers demonstrated that the phycobiliprotein chromophore precursor PCB was derived from heme in a manner similar to that in mammalian systems. More recently, Beale and Cornejo partially purified a Cyanidium HO and reported data establishing its ferredoxin dependence, [7, 31]. Cyanobacterial HO activity was also described by these researchers, [32], who later succeeded to clone and functionally express a ferredoxin-dependent HO from the cyanobacterium Synechocystis sp. PCC 6803, [33]. Other than in organello studies from a variety of plant species, [163], there has been little published on the biochemistry of heme oxygenases isolated from plant sources. The cloning of HY1 has however lead to the identification of three other heme oxygenase-related genes in the Arabidopsis genome, [34, 115]. From genetic and biochemical studies on recombinant proteins, it is clear that HY1 encodes a soluble, ferredoxin-dependent HO that is targeted to plastids, [115]. HO genes have been found in many other plant species, and like HY1, lesions in these genes also lead to phytochrome deficiencies, [161]. Aside from animals, plants, and cyanobacteria, novel HO genes have been identified in a wide variety of bacterial species, [181]. A soluble HO from the bacterial pathogen Neisseria meningitides encoded by the hemO locus was shown to convert heme to ferric-BV in the presence of ascorbate or NADPH-cytochrome P450 reductase, [191]. Biochemical studies of a protein encoded by the HO-related hmuO locus from Corynebacterium diphtheriae suggests that it also encodes an HO enzyme similar to human HO, [26, 144, 182]. Physiological studies on HmuO implicate a vital role in iron acquisition that is essential for survival and pathogenicity of this microorganism, [144]. The HO-related PigA protein from the opportunistic human pathogen Pseudomonas aeruginosa was recently shown to convert heme to the IXβ/γ isomers of biliverdin, [132]. Perhaps the most interesting is the BphO family of HOs, whose genes are invariably found linked to BphPs in a wide variety of bacterial genomes, [8]. Although there is no direct biochemical evidence that shows that BphO is a heme oxygenase, the observation that E. coli cells overexpressing the bphOP operon exhibit green color strongly suggesting that BV is produced by this HO-related protein, [8]. These observations implicate a role for the bacteriophytochrome family in iron homeostasis, [111].
16 The Phytochromes
4.1.2 Ferredoxin-dependent Phytobilin Synthases (Bilin Reductases) In oxygenic photosynthetic organisms exclusively, BV is converted to phytobilins by a family of ferredoxin-dependent bilin reductases (FDBRs) as illustrated in Figure 7. HY2, the founding member of this family, was shown to encode the enzyme PB synthase that mediates the conversion of BV to 3Z-PB, [85]. Unlike the NADPH-dependent biliverdin reductase (BVR) found in mammals, [100] and its cyanobacterial ortholog BvdR, [142], biliverdin reduction by PB synthase appears to proceed via sequential one electron steps involving radical intermediates, [50]. PB synthase isolated from etiolated oat seedlings as well as recombinant HY2 from Arabidopsis both catalyze the ferredoxin-dependent two electron reduction of BV to 3Z-PB, [85, 106]. Formally a phytochromobilin:ferredoxin oxidoreductase (E.C. 1.3.7.4.), PB synthase possesses a submicromolar affinity for its BV substrate and also exhibits a relatively high turnover rate constant of >100 min−1 , [106]. A single-copy gene in Ara-bidopsis, mutations in HY2-related genes found in other plant species all lead to phytochrome deficiencies, [161]. The paucity of HY2 cDNA clones in plant EST databases supports the observation that the HY2 protein is present in vanishingly low amounts in plant tissue, [106]. Together with its very high BV affinity, this observation may account for the high specific activity of PB
Fig. 7 Bilin metabolism in plants, cyanobacteria and mammals. Phytobilins from plants, crytophytes and cyanobacteria are all derived from BV which is metabolized by various members of the ferredoxin-
dependent bilin reductase family, that is HY2, PcyA, PcyB, PebA and PebB [49]). In mammals and cyanobacteria, BV is reduced to bilirubin (BR) by the NADPH-dependent enzymes BVR and BvdR, respectively.
4 Phytochrome Biosynthesis and Turnover 17
synthase enabling low amounts of enzyme to produce sufficient PB precursor for holophytochrome synthesis. The precursor of the chromophores of green algal and cyanobacterial phytochromes has been shown to be PCB – a phytobilin that is two electrons more reduced than PB, [71, 187]. PCB is produced by the HY2-related enzyme PcyA, phycocyanobilin:ferredoxin oxidoreductase (E.C. 1.3.7.5.), [49]. Found in all phycobiliprotein-producing cyanobacteria as well as the oxygenic photobacterium Prochlorococcus marinus, PcyA genes encode the enzymes that catalyze the ferredoxin-dependent, four electron reduction of BV to a mixture of 3Z- and 3EPCB, [49]. Mechanistic studies on PcyA have established that the reduction of BV by PcyA proceeds via the two-electron reduced intermediate 181 ,182 -dihydrobiliverdin, rather than its isomer PB, [48]. The distinct BV reduction regiospecificity of PcyA was proposed to ensure that cyanobacteria do not produce PB because its misincorporation into phycobiliproteins might alter their spectra and/or stability, [48]. The observation that the green alga Mesotaenium caldariorum possess phytobilin synthase enzymes capable of the reductions of BV to PB and PB to PCB, [187] indicates that an alterative enzyme that we have named PcyB can accomplish one or both of these conversions in this organism, [50]. Although this enzyme has not yet been cloned, the recent demonstration that the PB chromophore of plant phytochromes can be substituted with PCB, and still yield a functional photoreceptor, raise many questions about chromophore selection during the evolution of higher plant phytochromes, [79]. 4.2 Apophytochrome Biosynthesis and Holophytochrome Assembly
Since phytochromes consist of two components, an apoprotein and a phytobilin pigment, phytochrome biosynthesis involves the convergence of two biosynthetic pathways which culminate in the assembly of functional holoprotein. Progress on the synthesis of the phytochrome apoprotein and its assembly with the phytobilin chromophore precursor is summarized below. 4.2.1 Apophytochrome Biosynthesis Although they are encoded by a small gene family of 3–5 members, [27], plant phytochromes can be classified into two groups based upon their light stability. Phytochromes encoded by the phyA gene family are responsible for the light-labile pool, while phyB-E genes encode the light-stable phytochromes, [54]. The pronounced light-lability of phyA proteins is due to two processes – light-dependent transcriptional repression of the phyA gene, [128] and light-dependent protein turnover (see Section 4.3). PhyA protein accumulates to very large levels in dark grown seedlings due to the elevated level of transcription and the pronounced stability of the phytochrome apoprotein. In this regard, the phyA promoter has been shown to be even stronger than the 35S promoter from the caulif lower mosaic virus when used to express heterologous proteins in dark-grown Arabidopsis plants, [152]. It is well established that the light-dependent repression of phyA gene expression is
18 The Phytochromes
mediated by phyA itself, [128]. The pattern of expression of the different phytochrome genes has been extensively studied in Arabidopsis, [59, 153], as has the accumulation of the phytochrome apoproteins, [66, 146]. Taken together, these studies indicate that the five phytochrome apoproteins have an overlapping distribution with phyA and phyB being the dominant species of phytochrome that respectively accumulate in dark-adapted and light-adapted plant tissue. While similar results have been reported for phytochromes in other f lowering plant species, [54, 62], little is known about the regulation of apophytochrome expression in lower plants, [174]. In this regard, it has been reported that phytochrome accumulation in green algae and mosses is light-regulated, although these phenomena may reflect differential protein turnover rather than differential transcription, [42, 112]. Owing to their very low abundance, even less is known about the accumulation of the Cph1 and Cph2 apoproteins in cyanobacteria other than the observation that Cph1 mRNA accumulation is light-regulated, [56]. The regulatory mechanism for this response is distinct from that of phyA in f lowering plants however, and little is presently known how this affects apoCph1 accumulation in cyanobacteria. 4.2.2 Holophytochrome Assembly Phytobilins are covalently bound to both phytochromes and phycobiliproteins via thioether linkages. Consequently, the process of holoprotein assembly has been the subject of extensive analysis, [107]. The assembly of plant phytochromes is autocatalytic and proceeds in two steps – noncovalent phytobilin binding to the apophytochrome followed by thioether linkage formation, [89, 96, 97]. Recent studies with the cyanobacterial phytochrome Cph1 have corroborated this autocatalytic mechanism, [11]. The take-home lesson of these studies is that an A-ring ethylidene in the phytobilin precursor is required for covalent attachment – a substituent that enables Michael addition of the conserved cysteine thiol residue of apophytochromes to this electrophilic double bond. BVs and bilirubins, which lack the A-ring ethylidene moiety, do not form covalent linkages with apophytochrome, [96]. BVs can however non-covalently interact with apophytochrome and act as reversible competitive inhibitors for phytochrome assembly, [97]. These results contrast with the observation that BV covalently binds to bacteriophytochromes, [8, 91, 93, 94]. The autocatalytic mechanism of phytochrome assembly also contrasts with phycobiliprotein assembly which require lyase enzymes to mediate attachment of each phytobilin, [50, 143]. In addition to the ethylidene moiety, both propionic acid moieties have been shown to be necessary for holophytochrome assembly, [9]. The requirement of a C10 methine bridge was based on the observation that rubin analogs of phytobilins fail to form covalent adducts with recombinant apophytochromes, [164]. While most of the published reports on phytochrome assembly have utilized the 3E-phytobilin isomers, the observation that 3Z-phytobilin isomers are also capable of functional assembly with apophytochrome suggests that the 3Z to 3E isomerization is not critical for holophytochrome assembly, [165]. As was discussed in Section 2.1.2, 3Z-PB is the major product of PB synthase, hence its isomerization may not be necessary in vivo. Other studies have shown that bilins with modified substituents in
4 Phytochrome Biosynthesis and Turnover 19
the D-ring including PCB, PEB and isoPB can assemble with apophytochromes. However, the spectroscopic properties of the resulting phytobilin apophytochrome adducts are significantly altered, [41, 96, 98]. PCB substitution has been shown to produce a functional, albeit blue-shifted plant phytochrome photoreceptor, [79], while PEB adducts of apophytochromes, a.k.a. phytofluors, are intensely f luorescent, [116]. More recent studies using synthetic bilins indicate that, aside from the ethylidene moiety required for thioether linkage formation, [89, 96, 97], modifications in both A- and D-rings can be tolerated, [60, 61]. This work suggests that the phytobilin binding site on apophytochrome can accommodate structural variations, and that it should be possible to alter the spectral properties and photoregulatory function of phytochromes through engineering of new pathways of phytobilin biosynthesis in plants. While the phytobilin specificity for holophytochrome assembly has been actively addressed, less is known regarding catalytic residues on apophytochrome that specify phytobilin attachment. The only absolute requirement for thioether linkage formation appears to be the conserved cysteine itself (i.e. Cys321 in the case of PHYA3), [9, 12]. Deletion analysis of recombinant phytochromes reveals that neither the N-terminal 68 amino acids nor the entire C-terminal regulatory domains are required for assembly, [35, 65]. Comparative domain analysis of phy, cph1 and cph2 families indicate that P1 and P2 domains are dispensable for phytobilin attachment, and in the same study, it was shown the P3 GAF domain delimits the phytobilin lyase domain of phytochromes, [186]. A site-directed phytochrome mutant on the histidine residue adjacent to the conserved cysteine in the P3 GAF domain almost completely abolished phytobilin attachment, demonstrating a potential catalytic role for this residue, [9, 135]. Replacement of this histidine with asparagine did not prevent phytobilin attachment however, indicating that this residue cannot serve as a proton donor/acceptor for catalysis, [155]. Mutagenesis of a conserved glutamate residue (E189) in the P3 GAF domain of Cph1 proved inhibitory to phytobilin attachment implicating its involvement in assembly – possibly as a proton donating residue to the phytobilin, [186]. The detailed knowledge of the residues that participate in both noncovalent and covalent binding of phytobilins to apophytochromes is expected to provide insight into the molecular mechanism of holophytochrome assembly and its regulation, about which little is presently known. 4.3 Phytochrome Turnover
Aside from assembly, photoconversion and dark reversion, phytochrome localization and turnover are two important mechanisms that regulate phytochrome activity. The movement of phytochrome from the cytoplasm to the nucleus is controlled by the photoconversion between Pr and Pfr forms – a process that has been strongly linked to phytochrome signal transduction. The following discussion focusses on the processes that regulate phytochrome stability. Plant phytochromes have been classified into two groups, that is light-labile and light-stable. Accumulating to high levels in plants grown for prolonged periods in
20 The Phytochromes
darkness, the light-labile phyA class of photoreceptors function to regulate gene expression and seed germination under very low light fluences and far-red light enriched canopy environments, [21]. Such conditions are encountered for seeds that germinate and develop underground. When dark-grown seedlings are exposed to light, phyA is rapidly degraded, [117]. The rate of phyA degradation increases 100-fold upon light exposure – from a half-life of over 100 h to less than 1 h after photoconversion to Pfr, [131]. Because of this phenomenon, light-stable phyBEs predominate in light-grown plants, [146]. Light-dependent phyA turnover is preceeded by the formation of sequestered areas of phytochromes or SAPs in the cytoplasm, [99, 156]. Since SAPs are only formed under R and are also detected in phyA preparations in vitro, the hypothesis that SAPs represent self-aggregated Pfr has been proposed, [68, 69]. This proposal has received support from more recent studies showing that phyA must be translocated to the nucleus to initiate signal transduction. In this model, phyA that remains sequestered in the cytosol does not participate in signaling and is ultimately degraded. In the late 1980s, the detection of ubiquitin-phyA conjugates (Ub-P) following light treatment provided evidence that phyA degradation utilizes the ubiquitin/26S pro-teasome pathway [for review see, [29]. Ub-P conjugates appeared rapidly, that is within 5 min, following exposure to red light and disappeared with kinetics similar to those for the loss of Pfr; results that indicated that phyA is degraded in the Pfr form, [25, 74]. This conclusion was more recently corroborated by the pronounced light-stability of phyA in dark-grown phytobilin-deficient plants where phyA accumulates as its apoprotein, [161]. The role of phyA ubiquitination to both signal transduction and turnover has been the subject of great interest. Through deletion analysis of phyA and expression in transgenic plants, several regions important for Pfr degradation have been identified [see review of, [29]. Key regions critical to the light-dependent turnover have been mapped to both N- and C-terminal domains of phytochrome. However, identification of key lysine residues that participate in the light-dependent ubiquitination of phyA remains an elusive challenge, [28]. The recent discovery of cytoplasm-to-nuclear translocation raises the question of in which plant cellular compartment phyA ubiquitination and turnover occurs. This issue will undoubtedly consume countless hours of research in the laboratory. The inability to identify a phyA-specific E3 ubiquitin ligase by both biochemical analysis and genetic screens suggests that a novel mechanism(s) for phyA turnover may be responsible.
5 Molecular Mechanism of Phytochrome Signaling: Future Perspective
Phytochrome signaling requires both translocation to the nucleus and the maintenance of the Pfr form within the nucleus for a sufficient length of time to commit the light signal irreversibly. Based on the protein kinase activities of plant and cyanobacterial phytochromes, it is likely that protein phosphorylation plays a key role in the transmission of the light signal and/or in its regulation. The poten-
5 Molecular Mechanism of Phytochrome Signaling: Future Perspective 21
tial role of phytochromes to regulate tetrapyrrole metabolism is another emerging area of phytochrome signaling studies. Both topics are the subject of the following discussion. 5.1 Regulation of Protein-Protein Interactions by Phosphorylation
Previous studies have shown that the protein kinase activities of both Cph1 and eu-karyotic phytochromes are bilin- and light-regulated, [189, 190]. Phytobilin binding to apoCph1 stimulates its histidine kinase activity, which is consistent with the observation that holoCph1 is mostly dimeric while apoCph1 is a monomer, [92, 124]. Phytobilins are therefore apoCph1 ligands which regulate its trans(auto)phosphorylation – a mechanism that has been well described for other bacterial two-component histidine kinases, [157]. In contrast, the autophosphorylation activity of affinity-purified, recombinant oat phyA is inhibited by bilin attachment, indicating that the ser/thr kinase activity of a eukaryotic phytochrome is also bilin-regulated, [189]. As shown in Figure 8, the polarities of bilin- and light-regulation for plant and Cph1 phytochromes are opposite one another, with Pfr being more active than Pr for plant phytochromes and vice versa for Cph1. These results strongly implicate regulation of the monomer-dimer equilibrium by bilin and light for Cph1, while a different (allosteric) mechanism must be invoked for plant phytochromes which are are obligate dimers. The role of phytochrome phosphorylation activities in light signaling in plants is unknown. However, the observed bilin-stimulated and light-inhibited phosphotransferase activities of Cph1 implicate the regulation of the ATP/ADP-binding affinities, subunit-subunit dissociation constants, and/or ATP-to-histidine and phosphohistidine-to-aspartate equilibrium/rate constants. Little is presently known about the inf luence of bound ATP (and/or ADP) and of the histidine phosphorylation state on the thermody-
Fig. 8 Models for prokaryotic (Cph1) and eukaryotic (phyA-E) phytochrome signal transmission. Bilin and light input signals are perceived by the photosensory domains leading to altered phosphotransfer activities of the regulatory domains. Biochemical studies have shown that Cph1 and eukaryotic phytochromes have opposite responses.
The linear tetrapyrrole (bilin) chromophore is shown associated with the P3 and P4 domains as indicated in Figure 6 Potential downstream targets are indicated in red. The output signal(s) of the phytochrome signal transduction pathways are presently unknown.
22 The Phytochromes
namics and kinetics of the various protein-protein interactions. For higher plant phytochromes, it is reasonable that phytochrome-substrate interactions will be inf luenced by ATP/ADP binding as well as by the phosphorylation status of phytochrome itself and its substrate(s). We envisage that bilin, light, and ATP work together to regulate phytochrome–protein interactions which inf luence its subcellular localization, degradation, dark reversion, gene expression, and lead ultimately to changes in plant growth and development. 5.2 Regulation of Tetrapyrrole Metabolism
The evidence that phytochromes are regulators of tetrapyrrole metabolism in plants has a rich history, [110]. In addition to key enzymes of the chlorophyll pathway that are transcriptionally regulated by phytochrome, such as GluTR, [105] and protochlorophyllide oxidoreductase A (PORA), [113], phytochrome performs a metabolic regulatory role in the tetrapyrrole biosynthetic pathway that likely ref lects a more ancestral role, [111]. Phytochromes coordinate the expression of nuclear genes involved in the biogenesis of the photosynthetic apparatus, most notably the light harvesting chlorophyll a/b binding protein family, with the production of chlorophyll pigments. This is an extremely important function because misregulated chlorophyll synthesis can lead to production of toxic oxygen species via photosensitization, [30]. Indeed, the combination of light, oxygen, and photosensitizing pigments is a deadly recipe that plants must avoid at all costs. In flowering plants, chlorophyll synthesis does not occur in darkness since their protochlorophyllide oxidoreductases are all light-dependent, [183]. This situation contrasts with cryptophytes, that is mosses, ferns and algae, which possess a light-independent protochlorophyllide oxidoreductase system enabling them to synthesize chlorophyll in darkness, [53, 169]. During prolonged dark periods, angiosperm seedlings therefore accumulate protochlorophyllide (Pchlide) which becomes bound to PORA. Upon light exposure, Pchlide bound to the ternary NADPH-PORA-Pchlide complex is photochemically converted to chlorophyllide (Chlide) which is subsequently metabolized to chlorophylls a and b (see Figure 6). The dark accumulation of Pchlide serves to prime chlorophyll biosynthesis and assembly of the photosynthetic apparatus when light becomes available; a process that is of adaptive significance to angiosperm seedlings developing underground, [21]. This program of etiolation can proceed as long as food reserves in the seed last or light becomes available. Highly expressed in dark-grown seedlings, the PORA gene is rapidly down-regulated upon light exposure; a response that has been shown to be mediated by phyA, [113]. The similar pattern of PHYA expression, that is elevated expression in darkness and rapid down regulation in light, suggests that the coordinate expression of PORA and PHYA contribute to this important adaptive process. The metabolic consequence of elevated PHYA expression in dark-grown plants is elevated synthesis of Pchlide which in part is due to removal of heme-derived
5 Molecular Mechanism of Phytochrome Signaling: Future Perspective 23
phytobilins. In this regard, it is well established that heme is a potent feedback inhibitor of ALA synthesis – and in plants, the major target for this inhibition is GluTR [see Figure 6; [30, 122]. For this reason, PORA expression must also be elevated to ensure that all of the Pchlide produced under these conditions is assembled into a PORA-protein complex. The consequence of insufficient PORA expression and the resulting accumulation of free Pchlide, seen in porA mutants and seedlings exposed to prolonged FR in which PORA gene expression is downregulated via the action of phyA, is severe photooxidative damage upon transfer to light, [133]. These results show that PORA and PHYA expression must be balanced during periods of prolonged etiolation, both to prime the system for rapid synthesis/assembly of the photosynthetic apparatus and to avoid photooxidative damage. Beside contributing to the synthesis of light-modulated regulators of gene expression, the phytobilin-binding properties of apophytochromes, as well as the BV-binding properties of apobacteriophytochromes, represent one of many strategies found in nature to modulate heme levels and thereby to modulate tetrapyrrole synthesis. In animals and alpha protoeobacteria, ALA synthesis is also regulated by heme. However in these organisms, ALAS is the target of heme feedback inhibition. The regulation of heme is critical for all aerobic organisms since heme also has pro-oxidant properties. For this reason, nature has invented a wide variety of mechanisms to cope with heme accumulation aside from this feedback regulation. Heme oxygenases play a key role in this regulation. However the mammalian and plant enzymes are inefficient with poor catalytic turnover, unless their BV products are removed. To accomplish heme removal, plants and cyanobacteria utilize the FD-BR family, apophytochromes and apophycobiliproteins, while mammals, fungi, pathogenic and nonpathogenic bacteria have evolved NADPH-dependent BV reductase (BVR-A), bacteriophytochromes, and unique heme oxygenase enzyme systems with distinct heme cleavage specificities [see Figure 6; [50]]. In all of these organisms, removal of heme has the same effect to up-regulate new tetrapyrrole synthesis through metabolite-dependent derepression of ALA synthesis. As for the fate of these newly synthesized tetrapyrroles, this varies from organism to organism – impinging on such processes as photosynthesis, nitrogen fixation, nitrate assimilation, oxygen-detoxification, synthesis/secretion of defense compounds, hormone synthesis/degradation, among many others. In addition to its well-publicized genetic regulatory function, phytochrome’s role to regulate tetrapyrrole metabolism in plants and cyanobacteria is expected to receive increased experimental scrutiny in the years to come. Acknowledgements
We gratefully acknowledge grant support from the National Institutes of Health (RO1 GM068552–01) and the United States Department of Agriculture (AMD0103397) and to members of the Lagarias laboratory for critical reading of this manuscript.
24 The Phytochromes
References 1 ANDEL, F., K. C. HASSON, F. GAI, P. A. ANFINRUD, and R.A. MATHIES. 1997. Biospectroscopy 3, 421–433. 2 ANDEL, F., J. C. LAGARIAS, and R.A. MATHIES. 1996. Biochemistry 35, 15997–16008. 3 ANDEL, F., J. T. MURPHY, J. A. HAAS, M. T. MCDOWELL, I. van der HOEF, J. LUGTENBURG, J. C. LAGARIAS, and R.A. MATHIES. 2000. Biochemistry 39, 2667–2676. 4 ARAVIND, L. and C.P. PONTING. 1997. TIBS 22, 458–459. 5 BEALE, S. I., 1993. Chem. Rev. 93, 785–802. 6 BEALE, S. I., 1999. Photosynth. Res. 60, 43–73. 7 BEALE, S. I. and J. CORNEJO. 1984. Arch. Biochem. Biophys. 235, 371–384. 8 BHOO, S. H., S. J. DAVIS, J. WALKER, B. KARNIOL, and R.D. VIERSTRA. 2001. Nature 414, 776–779. 9 BHOO, S. H., T. HIRANO, H. Y. JEONG, J. G. LEE, M. FURUYA, and P.S. SONG. 1997. J. Am. Chem. Soc. 119, 11717–11718. 10 BISCHOFF, M., G. HERMANN, S. RENTSCH, and D. STREHLOW. 2001. Biochemistry 40, 181–186. 11 BORUCKI, B., H. OTTO, G. ROTTWINKEL, J. HUGHES, M. P. HEYN, and T. LAMPARTER. 2003. Biochemistry 42, 13684–13697. 12 BOYLAN, M. and P. QUAIL. 1989. Plant Cell 1, 765–773. 13 BRASLAVSKY, S. E., 2003. In Photochromism, Molecules and Systems, (ed. D. H. BOUAS and H. LAURENT), pp. 738–755. Elsevier Science BV, Amsterdam. 14 BRASLAVSKY, S. E., W. G¨ARTNER, and K. SCHAFFNER. 1997. Plant Cell Environ. 20, 700–706. 15 BROCK, H., B. P. RUZICSKA, T. ARAI, W. SCHLAMANN, A. R. HOLZWARTH, S. E. BRASLAVSKY, and K. SCHAFFNER. 1987. Biochemistry 26, 1412–1417. 16 BUTLER, W. L., 1972. In Phytochrome (ed. K. MITRAKOS and W. SHROPSHIRE), pp. 182–192. Academic Press, New York. 17 BUTLER, W. L., H. C. LANE, and H.W. SEIGELMAN. 1963. Plant Physiol. 38, 514–519.
18 BUTLER, W. L., K. H. NORRIS, H. W. SIEGELMAN, and S.B. HENDRICKS. 1959. Proc. Natl. Acad. Sci. USA 45, 1703–1708. 19 CASAL, J. J., L. G. LUCCIONI, K. A. OLIVERIO, and H.E. BOCCALANDRO. 2003. Photochem. Photobiol. Sci. 2, 625–636. 20 CASAL, J. J., R. A. SANCHEZ, and J.F. BOTTO. 1998. J. Exp. Bot. 49, 127–138. 21 CASAL, J. J., R. A. SANCHEZ, and M.J. YANOVSKY. 1997. Plant Cell Environ. 20, 813–819. 22 CASHMORE, A. R., 1998. Proc. Natl. Acad. Sci. USA 95, 13358–13360. 23 CHEN, E. F., V. N. LAPKO, J. W. LEWIS, P. S. SONG, and D.S. KLIGER. 1996. Biochemistry 35, 843–850. 24 CHEN, M., R. SCHWABB, and J. CHORY. 2003. Proc. Natl. Acad. Sci. USA 100, 14493–14498. 25 CHERRY, J. R., H. P. HERSHEY, and R.D. VIERSTRA. 1991., Plant Physiol. 96, 775–785. 26 CHU, G. C., K. KATAKURA, X. ZHANG, T. YOSHIDA, and M. IKEDASAITO. 1999. J. Biol. Chem. 274, 21319–21325. 27 CLACK, T., S. MATHEWS, and R.A. SHARROCK. 1994. Plant Mol. Biol. 25, 413–427. 28 CLOUGH, R. C., E. T. JORDAN-BEEBE, K. N. LOHMAN, J. M. MARITA, J. M. WALKER, C. GATZ, and R.D. VIERSTRA. 1999. Plant J. 17, 155–167. 29 CLOUGH, R. C. and R.D. VIERSTRA. 1997. Plant Cell Environ. 20, 713–721. 30 CORNAH, J. E., M. J. TERRY, and A.G. SMITH. 2003. Trends Plant Sci. 8, 224–230. 31 CORNEJO, J. and S.I. BEALE. 1988. J. Biol. Chem. 263, 11915–11921. 32 CORNEJO, J. and S.I. BEALE. 1997. Photosynth. Res. 51, 223–230. 33 CORNEJO, J., R. D. WILLOWS, and S.I. BEALE. 1998. Plant J. 15, 99–107. 34 DAVIS, S. J., S. H. BHOO, A. M. DURSKI, J. M. WALKER, and R.D. VIERSTRA. 2001. Plant Physiol. 126, 656–669. 35 DEFORCE, L., K. I. TOMIZAWA, N. ITO, D. FARRENS, P. S. SONG, and M. FURUYA. 1991. Proc. Natl. Acad. Sci. USA 88, 10392–10396.
References 36 EICHENBERG, K., I. BAURLE, N. PAULO, R. A. SHARROCK, W. R¨UDIGER, and E. SCHA¨ FER. 2000. FEBS Lett. 470, 107–112. 37 EILFELD, P., P. EILFELD, and W. R¨UDIGER. 1986. Photochem. Photobiol. 44, 761–769. 38 ELICH, T. D. and J. CHORY. 1997. Plant Cell 9, 2271–2280. 39 ELICH, T. D. and J.C. LAGARIAS. 1987. Plant Physiol. 84, 304–310. 40 ELICH, T. D. and J.C. LAGARIAS. 1988. Plant Physiol. 88, 747–751. 41 ELICH, T. D., A. F. MCDONAGH, L. A. PALMA, and J.C. LAGARIAS. 1989. J. Biol. Chem. 264, 183–189. 42 ESCH, H. and T. LAMPARTER. 1998. Photochem. Photobiol. 67, 450–455. 43 FALK, H., 1989. The Chemistry of Linear Oligopyrroles and Bile Pigments. Springer-Verlag, Vienna. 44 FANKHAUSER, C., 2001. J. Biol. Chem. 276, 11453–11456. 45 FODOR, S. P., J. C. LAGARIAS, and R.A. MATHIES. 1990. Biochemistry 29, 11141–11146. 46 FOERSTENDORF, H., C. BENDA, W. G¨ARTNER, M. STORF, H. SCHEER, and F. SIEBERT. 2001. Biochemistry 40, 14952–14959. 47 FOERSTENDORF, H., T. LAMPARTER, J. HUGHES, W. G¨ARTNER, and F. SIEBERT. 2000. Photochem. Photobiol. 71, 655–661. 48 FRANKENBERG, N. and J.C. LAGARIAS. 2003a. J. Biol. Chem. 278, 9219–9226. 49 FRANKENBERG, N., K. MUKOUGAWA, T. KOHCHI, and J.C. LAGARIAS. 2001. Plant Cell 13, 965–978. 50 FRANKENBERG, N. F. and J.C. LAGARIAS., 2003b. In The Porphyrin Handbook. Chlorophylls and Bilins, Biosynthesis Structure and Degradation. (ed. K.M. KADISH, K.M. SMITH, and R. GUILARD), pp. 211–235. Academic Press, New York. 51 FRANKLIN, B., 1972. In Phytochrome (ed. K. MITRAKOS and W. SHROPSHIRE), pp. 195–225. Academic Press, New York. 52 FRY, K. T. and F.E. MUMFORD. 1971. BBRC 45, 1466–1473. 53 FUJITA, Y. and C.E. BAUER. 2003. In The Porphyrin Handbook. Chlorophylls and Bilins, Biosynthesis Structure and Degradation. (ed. K.M. KADISH, K.M. SMITH, and R. GUILARD), pp. 109–156. Academic Press, New York.
54 FURUYA, M., 1993. Ann. Rev. Plant Physiol. Plant Mol. Biol. 44, 617–645. 55 FURUYA, M. and E. SCHA¨ FER. 1996. Trends Plant Sci. 1, 301–307. 56 GARCIA-DOMINGUEZ, M., M.I. MURO–PASTOR, J.C. REYES, and F.J. FLORENCIO, 2000. J. Bacteriol. 182, 38–44. 57 GARDNER, G. and H.L. GORTON. 1985. Plant Physiol. 77, 540–543. 58 GENSCH, T., M. S. CHURIO, S. E. BRASLAVSKY, and K. SCHAFFNER. 1996. Photochem. Photobiol. 63: 719–725. 59 GOOSEY, L., L. PALECANDA, and R.A. SHARROCK. 1997. Plant Physiol. 115, 959–969. 60 HANZAWA, H., K. INOMATA, H. KINOSHITA, T. KAKIUCHI, K. P. JAYASUNDERA, D. SAWAMOTO, A. OHTA, K. UCHIDA, K. WADA, and M. FURUYA. 2001. Proc. Natl. Acad. Sci. USA 98, 3612–3617. 61 HANZAWA, H., T. SHINOMURA, K. INOMATA, T. KAKIUCHI, H. KINOSHITA, K. WADA, and M. FURUYA. 2002. Proc. Natl. Acad. Sci. USA 99, 4725–4729. 62 HAUSER, B. A., M. M. CORDONNIER-PRATT, and L.H. PRATT. 1998. Plant J. 14, 431–439. 63 HERDMAN, M., T. COURSIN, R. RIPPKA, J. HOUMARD, and N. T. de MARSAC. 2000. J. Mol. Evol. 51, 205–213. 64 HILDEBRANDT, P., A. HOFFMANN, P. LINDEMANN, G. HEIBEL, S. E. BRASLAVSKY, K. SCHAFFNER, and B. SCHRADER. 1992. Biochemistry 31, 7957–7962. 65 HILL, C., W. G¨ARTNER, P. TOWNER, S. E. BRASLAVSKY, and K. SCHAFFNER. 1994. Eur. J. Biochem. 223, 69–77. 66 HIRSCHFELD, M., J. M. TEPPERMAN, T. CLACK, P. H. QUAIL, and R.A. SHARROCK. 1998. Genetics 149, 523–535. 67 HO, Y. S. J., L. M. BURDEN, and J.H. HURLEY. 2000. EMBO J. 19, 5288–5299. 68 HOFMANN, E., R. GRIMM, K. HARTER, V. SPETH, and E. SCHA¨ FER. 1991. Planta 183, 265–273. 69 HOFMANN, E., V. SPETH, and E. SCHA¨ FER. 1990. Planta 180, 372–377. 70 HOLZWARTH, A. R., J. WENDLER, B. P. RUZSICSKA, S. E. BRASLAVSKY, and K. SCHAFFNER. 1984. Biochim. Biophys. Acta 791, 265–273.
25
26 The Phytochromes
71 HUBSCHMANN, T., T. B¨ORNER, E. HARTMANN, and T. LAMPARTER. 2001a. Eur. J. Biochem. 268, 2055–2063. 72 HUBSCHMANN, T., H. J. JORISSEN, T. B¨ORNER, W. G¨ARTNER, and N. TANDEAU DE MARSAC. 2001b. Eur. J. Biochem. 268, 3383–3389. 73 HUGHES, J., T. LAMPARTER, F. MITTMANN, E. HARTMANN, W. G¨ARTNER, A. WILDE, and T. B¨ORNER. 1997. Nature 386, 663. 74 JABBEN, M., J. SHANKLIN, and R.D. VIERSTRA. 1989. Plant Physiol. 90, 380–384. 75 JONES, A. M., C. D. ALLEN, G. GARDNER, and P.H. QUAIL. 1986. Plant Physiol. 81, 1014–1016. 76 JONES, A. M. and M.D. EDGERTON. 1994. Sem. Cell Biol. 5, 295–302. 77 JONES, A. M., R. D. VIERSTRA, S. M. DANIELS, and P.H. QUAIL. 1985. Planta 164, 505–506. 78 JORISSEN, H., B. QUEST, A. REMBERG, T. COURSIN, S. E. BRASLAVSKY, K. SCHAFFNER, N. T. de MARSAC, and W. G¨ARTNER. 2002. Eur. J. Biochem. 269, 2662–2671. 79 KAMI, C., K. MUKOUGAWA, T. MURAMOTO, A. YOKOTA, T. SHINOMURA, J. C. LAGARIAS, and T. KOHCHI. 2004. Proc Natl Acad Sci U S A 101, 1099–1104. 80 KELLY, J. M. and J.C. LAGARIAS. 1985. Biochemistry 24, 6003–6010. 81 KENDRICK, R. E. and G.H.M. KRONENBERG. 1994. Photomorphogenesis in Plants, pp. 828. Martinus Nijhoff Publishers, Dordrecht, The Netherlands. 82 KENDRICK, R. E. and C.J.P. SPRUIT. 1977. Photochem. Photobiol. 26, 201–214. 83 KLEIN, G., S. GROMBEIN, and W. R¨UDIGER. 1977. Hoppe-Seyler’s Z. Physiologie Chem. 358, 1077–1079. 84 KNEIP, C., P. HILDEBRANDT, W. SCHLAMANN, S. E. BRASLAVSKY, F. MARK, and K. SCHAFFNER. 1999. Biochemistry 38, 15185–15192. 85 KOHCHI, T., K. MUKOUGAWA, N. FRANKENBERG, M. MASUDA, A. YOKOTA, and J.C. LAGARIAS. 2001. Plant Cell 13, 425–436. 86 KONOMI, K. and M. FURUYA. 1986. Plant Cell Physiol. 27, 1507–1512. 87 LAGARIAS, J. C., 1985. Photochem. Photobiol. 42, 811–820.
88 LAGARIAS, J. C., J. M. KELLY, K. L. CYR, and W. O. SMITH, Jr., 1987. Photochem. Photobiol. 46, 5–13. 89 LAGARIAS, J. C. and D.M. LAGARIAS. 1989. Proc. Natl. Acad. Sci. USA 86, 5778–5780. 90 LAGARIAS, J. C. and H. RAPOPORT. 1980. J. Am. Chem. Soc. 102, 4821–4828. 91 LAMPARTER, T., M. CARRASCAL, N. MICHAEL, E. MARTINEZ, G. ROTTWINKEL, and J. ABIAN. 2004. Biochemistry 43, 3659–3669. 92 LAMPARTER, T., B. ESTEBAN, and J. HUGHES. 2001. Eur. J. Biochem. 268, 4720–4730. 93 LAMPARTER, T., N. MICHAEL, O. CASPANI, T. MIYATA, K. SHIRAI, and K. INOMATA. 2003. J. Biol. Chem. 278, 33786–33792. 94 LAMPARTER, T., N. MICHAEL, F. MITTMANN, and B. ESTEBAN. 2002. Proc. Natl. Acad. Sci. USA 99, 11628–11633. 95 LAMPARTER, T., F. MITTMANN, W. G¨ARTNER, T. B¨ORNER, E. HARTMANN, and J. HUGHES. 1997. Proc. Natl. Acad. Sci. USA 94, 11792–11797. 96 LI, L. and J.C. LAGARIAS. 1992. J. Biol. Chem. 267, 19204–19210. 97 LI, L., J. T. MURPHY, and J.C. LAGARIAS. 1995. Biochemistry 34, 7923–7930. 98 LINDNER, I., S. E. BRASLAVSKY, K. SCHAFFNER, and W. G¨ARTNER. 2000. Angew. Chem. Intl. Ed. 39, 3269–3271. 99 MACKENZIE, J. M. J., COLEMAN, R. A., and PRATT, L. H., (1974). A specific reversible intracellular localization of phytochrome as Pfr. Plant Physiol. 53, Abstract No. 5. 100 MAINES, M. D. and G.M. TRAKSHEL. 1993. Arch. Biochem. Biophys. 300, 320–326. 101 MANCINELLI, A. L., 1986. Plant Physiol. 82, 956–961. 102 MATHEWS, S., M. LAVIN, and R.A. SHARROCK. 1995. Ann. Miss. Bot. Gard. 82, 296–321. 103 MATHEWS, S. and R.A. SHARROCK. 1997. Plant Cell Environ. 20, 666–671. 104 MATSUSHITA, T., N. MOCHIZUKI, and A. NAGATANI. 2003. Nature 424, 571–574. 105 MCCORMAC, A. C., A. FISCHER, A. M. KUMAR, D. SOLL, and M.J. TERRY. 2001. Plant J. 25, 549–561. 106 MCDOWELL, M. T. and J.C. LAGARIAS. 2001. Plant Physiol. 126, 1546–1554. 107 MCDOWELL, M. T. and J.C. LAGARIAS. 2002. In Heme, Chlorophyll and Bilins,
References
108
109
110 111 112
113
114
115
116 117 118
119 120
121
122 123
Methods and Protocols (ed. A.G. SMITH and M. WITTY), pp. 293–309. Humana Press, Totowa, NJ. MCMICHAEL, R. W. and J.C. LAGARIAS. 1990. In Current Topics in Plant Biochemistry and Physiology (ed. D.D.a.B. RANDALL, .), pp. 259–270. University of Missouri–Columbia. MIZUTANI, Y., S. TOKUTOMI, and T. KITAGAWA. 1994. Biochemistry 33, 153–158. MOHR, H. and H. OELZE-KAROW. 1982. Plant Physiol. 70, 863–866. MONTGOMERY, B. L. and J.C. LAGARIAS. 2002. Trends Plant Sci. 7, 357–366. MORAND, L. Z., D. G. KIDD, and J.C. LAGARIAS. 1993. Plant Physiol. 101, 97–103. M¨OSINGER, E., A. BATSCHAUER, E. SCHA¨ FER, and K. APEL, 1985. Eur. J. Biochem. 147, 137–142. MURAMOTO, T., T. KOHCHI, A. YOKOTA, I. H. HWANG, and H.M. GOODMAN. 1999. Plant Cell 11, 335–347. MURAMOTO, T., N. TSURUI, M. J. TERRY, A. YOKOTA, and T. KOHCHI. 2002. Plant Physiol. 130, 1958–1966. MURPHY, J. T. and J.C. LAGARIAS. 1997. Curr. Biol. 7, 870–876. NAGY, F. and E. SCHA¨ FER. 2002. Annu. Rev. Plant Biol. 53, 329–355. 329–355. NAKASAKO, M., M. WADA, S. TOKUTOMI, K. T. YAMAMOTO, J. SAKAI, M. KATAOKA, F. TOKUNAGA, and M. FURUYA. 1990. Photochem. Photobiol. 52, 3–12. NEFF, M. M., C. FANKHAUSER, and J. CHORY. 2000. Gene Dev. 14, 257–271. NOZUE, K., T. KANEGAE, T. IMAIZUMI, S. FUKUDA, H. OKAMOTO, K. C. YEH, J. C. LAGARIAS, and M. WADA. 1998. Proc. Natl. Acad. Sci. USA 95, 15826–15830. ORTIZ DE MONTELLANO, P. R. and K. AUCLAIR. 2003. In The Porphyrin Handbook. The Iron and Cobalt Pigments, Biosynthesis, Structure and Degradation. (ed. K.M. KADISH, K.M. SMITH, and R. GUILARD), pp. 183–210. Academic Press, New York City. PAPENBROCK, J. and B. GRIMM. 2001. Planta 213, 667–681. PARK, C. M., J. I. KIM, S. S. YANG, J. G. KANG, J. H. KANG, J. Y. SHIM, Y. H.
124
125 126 127 128 129 130
131 132
133 134
135
136 137
138 139
140
141
142
CHUNG, Y. M. PARK, and P.S. SONG. 2000a. Biochemistry 39, 10840–10847. PARK, C. M., J. Y. SHIM, S. S. YANG, J. G. KANG, J. I. KIM, Z. LUKA, and P.S. SONG. 2000b. Biochemistry 39, 6349–6356. PARKINSON, J. S. and E.C. KOFOID. 1992. Ann. Rev. Genet. 26, 71–112. PARKS, B. M. and P.H. QUAIL. 1991. Plant Cell 3, 1177–1186. PONTING, C. P. and L. ARAVIND. 1997. Curr. Biol. 7, R674–R677. QUAIL, P. H., 1991. Ann. Rev. Genet. 25, 389–409. QUAIL, P. H., 1997. Plant Cell Environ. 20, 657–665. QUAIL, P. H., M. T. BOYLAN, B. M. PARKS, T. W. SHORT, Y. XU, and D. WAGNER. 1995. Science 268, 675–680. QUAIL, P. H., E. SCHA¨ FER, and D. MARME. 1973. Plant Physiol. 52, 128–131. RATLIFF, M., W. M. ZHU, R. DESHMUKH, A. WILKS, and I. STOJILJKOVIC. 2001. J. Bacteriol. 183, 6394–6403. REINBOTHE, S., C. REINBOTHE, K. APEL, and N. LEBEDEV. 1996. Cell 86, 703–705. REMBERG, A., I. LINDNER, T. LAMPARTER, J. HUGHES, C. KNEIP, P. HILDEBRANDT, S. E. BRASLAVSKY, W. G¨ARTNER, and K. SCHAFFNER. 1997. Biochemistry 36, 13389–13395. REMBERG, A., P. SCHMIDT, S. E. BRASLAVSKY, W. G¨ARTNER, and K. SCHAFFNER. 1999. Eur. J. Biochem. 266, 201–208. R¨UDIGER, W., 1992. Photochem. Photobiol. 56, 803–809. R¨UDIGER, W., T. BRANDLMEIER, I. BLOS, A. GOSSAUER, and J.–P. WELLER. 1980. Z. Naturforsch. 35, 763–769. R¨UDIGER, W. and D.L. CORRELL. 1969. Liebigs. Ann. Chem. 723, 208–212. R¨UDIGER, W., F. THU¨ MMLER, E. CMIEL, and S. SCHNEIDER. 1983. Proc. Natl. Acad. Sci. USA 80, 6244–6248. SAGE, L. C., 1992. Pigment of the Imagination, A History of Phytochrome Research. Academic Press, Inc., San Diego. SCHACTER, B. A., E. B. NELSON, H. S. MARVER, and B.S.S. MASTERS. 1972. J. Biol. Chem. 247, 3601–3607. SCHLUCHTER, W. M. and A.N. GLAZER. 1997. J. Biol. Chem. 272, 13562–13569.
27
28 The Phytochromes
143 SCHLUCHTER, W. M. and A.N. GLAZER. 1999. In The Photosynthetic Prokaryotes. (ed. G.A. PESCHEK, W. LOFFELHARDT, and G. SCHMETTERER), pp. 83–95. Kluwer Academic/Plenum Press, New York. 144 SCHMITT, M. P., 1997. J. Bacteriol. 179, 838–845. 145 SCHNEIDER-POETSCH, H. A., B. BRAUN, S. MARX, and A. SCHAUMBURG. 1991. FEBS Lett. 281, 245–249. 146 SHARROCK, R. A. and T. CLACK. 2002. Plant Physiol. 130, 442–456. 147 SHINOMURA, T., A. NAGATANI, H. HANZAWA, M. KUBOTA, M. WATANABE, and M. FURUYA. 1996. Proc. Natl. Acad. Sci. USA 93, 8129–8133. 148 SHINOMURA, T., K. UCHIDA, and M. FURUYA. 2000. Plant Physiol. 122, 147–156. 149 SIEGELMAN, H. W., B. C. TURNER, and S.B. HENDRICKS. 1966. Plant Physiol. 41, 1289–1292. 150 SINESHCHEKOV, V., J. HUGHES, E. HARTMANN, and T. LAMPARTER. 1998. Photochem. Photobiol. 67, 263–267. 151 SINESHCHEKOV, V., L. KOPPEL, B. ESTEBAN, J. HUGHES, and T. LAMPARTER. 2002. J. Photochem. Photobiol. B. 67, 39–50. 152 SOMERS, D. E. and P.H. QUAIL. 1995a. Plant Physiol. 107, 523–534. 153 SOMERS, D. E. and P.H. QUAIL. 1995b. Plant J. 7, 413–427. 154 SONG, P. S., 1999. J. Biochem. Mol. Biol. 32, 215–225. 155 SONG, P. S., M. H. PARK, and M. FURUYA. 1997. Plant Cell Environ. 20, 707–712. 156 SPETH, V., V. OTTO, and E. SCHA¨ FER. 1987. Planta 171, 332–338. 157 STOCK, A. M., V. L. ROBINSON, and P.N. GOUDREAU. 2000. Ann. Rev. Biochem. 69, 183–215. 158 TAYLOR, B. L. and I.B. ZHULIN. 1999. Microbiol. Mol. Biol. Rev. 63, 479–506. 159 TENHUNEN, R., H. S. MARVER, and R. SCHMID. 1968. Proc. Natl. Acad. Sci. USA 61, 748–755. 160 TENHUNEN, R., H. S. MARVER, and R. SCHMID. 1969. J. Biol. Chem. 244, 6388–6393. 161 TERRY, M. J., 1997. Plant Cell Environ. 20, 740–745.
162 TERRY, M. J. and J.C. LAGARIAS. 1991. J. Biol. Chem. 266, 22215–22221. 163 TERRY, M. J., P. J. LINLEY, and T. KOHCHI. 2002. Biochem. Soc. Trans. 30, 604–609. 164 TERRY, M. J., M. D. MAINES, and J.C. LAGARIAS. 1993a. J. Biol. Chem. 268, 26099–26106. 165 TERRY, M. J., M. T. MCDOWELL, and J.C. LAGARIAS. 1995. J. Biol. Chem. 270, 11111–11118. 166 TERRY, M. J., J. A. WAHLEITHNER, and J.C. LAGARIAS. 1993b. Arch. Biochem. Biophys. 306, 1–15. 167 THU¨ MMLER, F., M. DUFNER, P. KREISL, and P. DITTRICH. 1992. Plant Mol. Biol. 20, 1003–1017. 168 TROXLER, R. F., A. S. BROWN, and S.B. BROWN. 1979. J. Biol. Chem. 254, 3411–3418. 169 VAVILIN, D. V. and W.F. VERMAAS. 2002. Physiol. Plant. 115, 9–24. 170 VIERSTRA, R. D., 1993. Plant Physiol. 103, 679–684. 171 VIERSTRA, R. D. and S.J. DAVIS. 2000. Sem. Cell Dev. Biol. 11, 511–521. 172 VIERSTRA, R. D. and P.H. QUAIL. 1982. Planta 156, 158–165. 173 VIERSTRA, R. D. and P.H. QUAIL. 1986. In Photomorphogenesis in Plants (ed. R.E. KENDRICK and G.H.M. KRONENBERG), pp. 35–60. Martinus Nijhoff, Dordrecht. 174 WADA, M., T. KANEGAE, K. NOZUE, and S. FUKUDA. 1997. Plant Cell Environ. 20, 685–690. 175 WAGNER, D., C. D. FAIRCHILD, R. M. KUHN, and P.H. QUAIL. 1996. Proc. Natl. Acad. Sci. USA 93, 4011–4015. 176 WAHLEITHNER, J. A., L. LI, and J.C. LAGARIAS. 1991. Proc. Natl. Acad. Sci. USA 88, 10387–10391. 177 WELLER, J. P. and A. GOSSAUER. 1980. Chem. Ber. 113, 1603–1611. 178 WENDLER, J., A. R. HOLZWARTH, S. E. BRASLAVSKY, and K. SCHAFFNER. 1984. Biochim. Biophys. Acta 786, 213–221. 179 WHITELAM, G. C., S. PATEL, and P.F. DEVLIN. 1998. Phil. Trans. Royal Soc. London Series B, Biological Sciences 353, 1445–1453. 180 WILDE, A., B. FIEDLER, and T. B¨ORNER. 2002. Mol. Microbiol. 44, 981–988. 181 WILKS, A., 2002. Antioxid Redox Signal 4, 603–614.
References 182 WILKS, A. and M.P. SCHMITT. 1998. J. Biol. Chem. 273, 837–841. 183 WILLOWS, R. D., 2003. Nat Prod Rep 20, 327–341. 184 WONG, Y. S. and J.C. LAGARIAS. 1989. Proc. Natl. Acad. Sci. USA 86, 3469–3473. 185 WONG, Y. S., R. W. MCMICHAEL, and J.C. LAGARIAS. 1989. Plant Physiol. 91, 709–718. 186 WU, S. H. and J.C. LAGARIAS. 2000. Biochemistry 39, 13487–13495. 187 WU, S. H., M. T. MCDOWELL, and J.C.
188 189
190
191
LAGARIAS. 1997. J. Biol. Chem. 272, 25700–25705. YAMAMOTO, K. T., 1990. Botan Mag 103, 469–491. YEH, K. C. and J.C. LAGARIAS. 1998. Proc. Natl. Acad. Sci. USA 95, 13976–13981. YEH, K. C., S. H. WU, J. T. MURPHY, and J.C. LAGARIAS. 1997. Science 277, 1505–1508. ZHU, W. M., A. WILKS, and I. STOJILJKOVIC. 2000. J. Bacteriol. 182, 6783–6790
29
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
1
The Mammalian Rod Cell Photoreceptor Rhodopsin Najmoutin G. Abdulaev, and Kevin D. Ridge University of Maryland Biotechnology Institute, Rockville, USA
Originally published in: Handbook of Photosensory Receptors. Edited by Winslow R. Briggs and John L. Spudich. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31019-7
1 Introduction
Visual phototransduction is the process of transforming light energy absorbed by the specialized retinal cells (rods and cones) into an electrical response. The amplitude and duration of the response depends on light intensity and the type of cell. Rods (the dim-light photoreceptors) are more sensitive to light than cones (the bright-light photoreceptors), although the latter demonstrate a faster light response [1]. In rods (Figure 1A), a characteristic two-phase electrical response reflects a multistep, highly amplified series of biochemical reactions expressing activation and recovery phases of signaling. This elegant biochemical machinery is designed to exchange information between the rod outer segment (ROS) plasma membrane (PM) harboring cGMP-regulated ion channels and the light-sensitive pigment, rhodopsin, which is abundantly present in the ROS disk membranes (DM). Rhodopsin is the first link in the chain of biochemical reactions involved in visual phototransduction (Figure 1B). It is also a member of the large family of heptahelical G-protein coupled receptors (GPCRs). In this family, rhodopsin appears to be unique in that it displays very low noise levels and the rapid activation and inactivation kinetics necessary for high sensitivity and fast image processing. These requirements are realized by adopting 11-cis retinal as a chromophore sensitive to visible light. Of the ∼1000 GPCRs identified, bovine rhodopsin is the only receptor for which a three-dimensional structure is available [2–4]. The crystal structure has confirmed a wealth of biochemical and biophysical data on the dark-state of rhodopsin and provided a template for interpreting phenomenological data obtained by affinity labeling and site-directed muta-genesis. In addition, the availability of the rhodopsin structure has provided a useful model for understanding the structure and function of other members of the GPCR family. In this Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
2 The Mammalian Rod Cell Photoreceptor Rhodopsin
Fig. 1 Mammalian visual phototransduction. a) Schematic of the highly differentiated rod cell with the outer segment (OS) and inner segment (IS) indicated. The OS is comprised of numerous stacked disks that contain the major phototransduction components. The IS contains the biosynthetic machinery of the cell. b) Signal propagation by light-activated rhodopsin. The major protein components of the disk membrane (DM) that lead to hyperpolarization of the rod cell include rhodopsin (R), the retinal G-protein transducin (Gt ) "$( heterotrimer, light-activated rhodopsin (R* ), the cGMP phosphodiesterase (PDE6) "-, $-, (-, and F-subunits, guanylate cyclase (GC), and gunaylate cyclase activating protein (GCAP). The major protein components of the plasma membrane (PM) that lead to hyperpolarization of the rod cell include the cGMP-gated channel "- and $-subunits
and the Na+ /Ca2+ , K+ exchanger. Also indicated are guanylate kinase (GK) and nucleoside diphosphate kinase (NDPK), which are involved in guanine nucleotide metabolism, and calmodulin, a major Ca2+ binding protein in the ROS. c) Inactivation of light-activated rhodopsin. The major protein components of the DM that lead to the inactivation of R* include the Ca2+ -free and Ca2+ -bound forms of myristoylated recoverin (Rec), rhodopsin kinase (RK), arrestin (Arr), and protein phosphatase 2A (PP2A). In b) and c), R, R* , and opsin (O) are shown as distinctly shaded dimers in order to emphasize their proposed higher order organization. While atomic force microscopy evidence exists for the oligomeric state of murine rhodopsin (R1 R2 ) and opsin (O1 O2 ) in the DM [68, 69], the existence of R* dimers interacting with Gt , Arr, and RK remains an open question.
chapter, various structural and functional aspects of mammalian rod rhodopsin are highlighted. While this by no means represents a comprehensive molecular description of rhodopsin, it is anticipated that the reader will gain a new sense of understanding and appreciation for this fascinating photosensory receptor. 2 Rhodopsin and Mammalian Visual Phototransduction
The process of mammalian visual phototransduction can be divided into three distinct pathways. Only the first two pathways involve rhodopsin: signal propagation
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
2 Rhodopsin and Mammalian Visual Phototransduction
by light-activated rhodopsin (metarhodopsin II or R* ) and inactivation of R* . The third pathway, inactivation of the catalytic subunit (α-subunit) of the heterotrimeric retinal G-protein, transducin (Gt ), continues to attract much attention and those interested in a perspective on this process are referred to some recent reviews on the subject [5–7].
2.1 Signal Amplification by Light-activated Rhodopsin
Light activation of rhodopsin (R) in the DM to produce R* catalytically promotes the exchange of GDP for GTP in hundreds of Gt molecules to produce the GTPbound form of Gtα (Figure 1B). Gtα -GTP interacts with the γ -subunit of the cGMP phosphodiesterase (PDE6 in rods), removing an inhibitory constraint from the catalytic α or β subunits that results in cGMP hydrolysis. The fast depletion of cGMP results in the closure of cGMP-gated channels in the PM. A drastic reduction in the circulating “dark” current, due to a blockage of Na+ and Ca2+ entry into the rod and Ca2+ depletion of the ROS as a result of continuous function of the PM Na+ , Ca2+ , K+ exchanger, activates the Ca2+ -binding guanylate cyclase activating protein (GCAP). The low intracellular Ca2+ concentration promotes activation of guanylate cyclase (GC) by Ca2+ -free GCAP, catalyzing accelerated synthesis of cGMP from GTP supplied by the guanine nucleotide cycle. The latter is comprised of two distinct nucleotide-binding enzymes, guanylate kinase (GK) and nucleoside diphosphate kinase (NDPK). The release of bound Ca2+ from calmodulin (CaM) leads to its dissociation from the cGMP-gated channel conferring a lower affinity for cGMP. Thus, light-activation of rhodopsin to produce R* leads to changes in the net production of cGMP by perturbing two opposing catalytic activities: degradation of cGMP by PDE6 and synthesis of cGMP by GC. Since cGMP levels govern how many channels are opened, the signal is transformed from a physical stimulus (light) to a biochemical chain of reactions that culminates in the hyperpolarization of the rod.
2.2 Inactivation of Light-activated Rhodopsin
In the dark, when the concentration of intracellular Ca2+ is high, myristoylated recoverin (Rec) is in the Ca2+ -bound state and forms a complex with rhodopsin kinase (RK) at the DM, preventing phosphorylation of R* (Figure 1C). When the Ca2+ concentration in the ROS drops as a result of the light response, Rec releases its bound Ca2+ and dissociates from RK. The interaction between R* and RK leads to rapid phosphorylation of R* . The subsequent binding of arrestin (Arr) blocks R* signaling via Gt . The release of all-trans retinal, accompanied by enzymatic dephosphorylation by protein phosphatase 2A (PP2A), prepares the apoprotein, opsin, for 11-cis retinal binding and restoration of its inactive ground state.
3
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
4 The Mammalian Rod Cell Photoreceptor Rhodopsin
3 Properties of Rhodopsin 3.1 Isolation of Rhodopsin
Bovine rhodopsin is the best characterized member of the large family of GPCRs. A virtual explosion of information on rhodopsin can be attributed to the high natural abundance of the pigment in the DM and the accessibility of bovine rhodopsin and its mutants in amounts amenable to extensive biochemical, biophysical, and structural studies. Polyacrylamide gel electrophoresis in the presence of sodium dodecyl sulfate (SDS-PAGE) shows that more than 90% of the protein complement of the DM is accounted for by rhodopsin. In fact, highly purified rhodopsin in the DM can be prepared by sucrose density gradient centrifugation of ROS obtained by simple shaking of bovine retinae in the appropriate buffer and low speed centrifugation [8]. ROS preparations in hypotonic buffers are then subjected to Ficoll f lotation with subsequent high speed centrifugation [9]. Further purification of rhodopsin is typically based on its solubilization in detergents in combination with conventional chromatographic techniques. Although many different detergents effectively solubilize rhodopsin (e.g., digitonin, cetyl trimethylammonium bromide, nonylglucoside), dodecylmaltoside remains the detergent of choice due to very high rhodopsin stability in this media and the retention of many functional properties [10, 11]. Affinity chromatography based on separation with concanavalin A-Sepharose was an early purification method of choice [12]. However, rhodopsin samples prepared by this method are often contaminated with concanavalin A requiring the introduction of additional purification steps. Affinity chromatography based on matrices with immobilized monoclonal antibodies, such as rho 1D4-Sepharose [13], appears to provide an extraordinarily efficient method for the one-step purification of rhodopsin. Immunoaffinity chromatography is essential for purifying mutants of rhodopsin obtained by heterologous eukaryotic cell expression and also appears to be instrumental in separating unfolded or misfolded opsin from properly folded rhodopsin [14]. More recently, methods have been developed for the selective extraction of ROS rhodopsin from the DM using specific divalent cations (Zn2+ or Cd2+ ) in conjunction with alkylglucoside detergents [15]. This method allows for the effective purification of rhodopsin and was instrumental in obtaining diffraction-quality three dimensional crystals [16]. 3.2 Biochemical and Physicochemical Properties of Rhodopsin
Bovine rhodopsin is a single polypeptide of ∼40 kDa consisting of 348 amino acids. Its amino acid sequence is known from analyses of the opsin apoprotein and the corresponding DNA [17–19]. About 65% of the amino acid residues in bovine opsin are hydrophobic residues unevenly distributed along the polypeptide chain (Figure 2). This arrangement appears to be crucial for the membrane disposition of the protein.
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
3 Properties of Rhodopsin
Fig. 2 A two-dimensional model for the organization of bovine rhodopsin in the DM. The lengths of the transmembrane (TM) helices (I–VII) are based on the crystal structure of rhodopsin [2, 3]. The cytoplasmic helix, H8, extending from TM VII is also shown. The small hexagons in the N-terminal region represent branched
oligosaccharide chains attached to Asn-2 and Asn-15 and the zigzag lines attached to Cys-322 and Cys-323 in the C-terminal region represent palmitoyl groups. The disulfide bridge between Cys-110 and Cys-187 is also shown. Key amino acid residues mentioned in the text are highlighted in black.
Rhodopsin purity is assessed by comparing the ratios of the absorption maxima at 280 nm (protein) and 500 nm (chromophore) in the dark. This ratio (A280 /A500 ) for purified rhodopsin varies between 1.60 and 1.65. Purified preparations of rhodopsin appear on SDS-PAGE as a single but somewhat broadened and diffuse band. This property is characteristic for many membrane proteins and may represent possible heterogeneity in several post-translational modifications. In bovine rhodopsin, diffusion of the band appears to be primarily due to heterogeneity in asparagine-linked (N-linked) glycosylation (see Section 3.3). Additional factors contributing to the apparent heterogeneity of bovine rhodopsin may include phosphorylated rhodopsin species produced by exposure to light or lack of appropriate dark adaptation before or during retina processing. Finally, defects in fatty acylation (palmitoylation) may also contribute to rhodopsin heterogeneity [20].
5
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
6 The Mammalian Rod Cell Photoreceptor Rhodopsin
3.3 Post-translational Modifications in Rhodopsin
As indicated in Section 3.2, the opsin polypeptide is subject to several post-translational modifications. First, the amino-terminal sequence, together with the acetylated amino-terminal amino acid, contains two consensus sequences allowing Nglycosylation at Asn-2 and Asn-15 [21]. Carbohydrate analysis of bovine rhodopsin shows that ∼70% of the oligosaccharides are accounted for by Man3 Glc-NAc3 , 10% by Man4 GlcNAc3 , and 20% by Man5 GlcNAc3 [21–23]. What is not clear, however, is the distribution of these carbohydrate components between the two N-linked glycosylation sites. A second type of post-translational modification is palmitoylation of two vicinal cysteine residues at positions 322 and 323 via a thioester linkage [24–26]. Third, two highly conserved cysteine residues, Cys-110 and Cys-187, are disulfide linked [27]. This disulfide bridge is conserved not only in opsins, but also in the entire rhodopsin family of GPCRs and appears to play an important role in protein stability [28–30]. Another type of rhodopsin post-translational modification is the light-dependent phosphorylation of several threonines and serines at the extreme carboxyl terminus of the protein by RK [31, 32]. Despite 30 years of intensive effort, questions as to which of the serine and threonine residues become phosphorylated under physiological conditions remain unanswered. Finally, covalent binding of 11-cis retinal to the ε-amino group of Lys-296 via a protonated aldimine bond, a Schiff base, is a crucial and unique post-translational modification of rhodopsin. 3.4 Membrane Topology of Rhodopsin and Functional Domains
Seven hydrophobic stretches of about 25–30 amino acid residues interrupted by relatively short hydrophilic sequences is the signature feature of rhodopsin (Figure 2), other GPCRs, and numerous retinylidene microbial photoreceptors as well. This odd number of transmembrane (TM) excursions positions the amino- and carboxyl-terminal ends of the protein on the opposing intradiscal (extracellular) and cytoplasmic surfaces, respectively. There is no obvious explanation for these so called “seven pillars of wisdom”. The notion that seven hydrophobic TM helices are required to create a suitable binding pocket for low molecular weight agonists or antagonists (like 11-cis retinal) appears to be consistent with much of the experimental data, but this can not be extended to a vast number of GPCRs which have smaller ligands. Remarkably, a consensus on the heptahelical arrangement of rhodopsin was reached in the late 1970s [33–36] with only limited available information on the primary structure, and subsequently confirmed by the crystal structure of bovine rhodopsin [2–4]. The structure defines three major functional domains within rhodopsin: transmembrane, intradiscal, and cytoplasmic. 3.4.1 Transmembrane Domain of Rhodopsin The TM domain of rhodopsin includes the seven predominantly α-helical segments, termed TM I–TM VII (Figure 2). TMI (residues 34–64) contains a highly conserved
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
3 Properties of Rhodopsin
Asn-55 which forms hydrogen bonds with other residues (see Section 5.1). TM II, extending from residue 71 through 100, contains two adjacent glycine residues, Gly-89 and Gly-90. These two residues appear to be particularly susceptible to naturally occurring mutations in both mild and severe visual disorders [37–39]. TM II also contains a conserved Asp-83. TM III encompasses residues 106–139 and contains several important amino acid residues. First, the cytoplasmic end of TM III is capped by the highly conserved and functionally important triad: Glu-134, Arg-135, and Tyr-136 (the E(D)RY motif). The carboxyl group of Glu-113, which is located toward the center of this helix, serves as the dark-state counterion to the protonated retinylidene–Schiff base [40–42]. Next, highly conserved cysteine residue Cys-110 forms a disulfide bond with Cys-187 from the second intradiscal loop [27]. Interestingly, there are also two adjacent glycine residues, Gly-120 and Gly-121, which may contribute to the conformational f lexibility of this helix [43]. TM IV is the shortest of the TMs and extends from residues 150 to 172. Two proline residues, Pro-170 and Pro-171, are located at the cytoplasmic end of this helix. Cys-167, near the center of the helix, is an important component of the chromophore binding pocket [2, 3]. TM V incorporates residues 200–225, and with the exception of His-211 and Phe-212, appears not to be overburdened with functionally important amino acids. TM VI includes residues 244–276. Pro-267 of this helix appears to be essential for providing the necessary flexibility to support the dynamics of receptor activation [44]. Another important residue is Glu-247, which makes contact with Arg-135 of the E(D)RY motif of TM III (see Section 5.1). TM VII extends from residues 286 to 309 and contains several important side chains. First, this helix contains Lys-296, the site of retinal attachment. The ε-amino group of this amino acid reacts with the aldehyde group of 11-cis retinal to form a Schiff base. Second, the cytoplasmic end of this helix contains the conserved NPXXY motif, considered to be a part of a larger functional unit known as the NPXXY(X)5,6 F motif [45]. It has been shown that a portion of this motif becomes accessible to a monoclonal antibody upon light activation of rhodopsin [46]. Finally, Lys-296, Thr-297, Ser-298, and Ala-299 form a 3, 10 helix. All other helices in rhodopsin, both transmembrane and cytoplasmic, are regular α-helices [47].
3.4.2 Intradiscal Domain of Rhodopsin The intradiscal (extracellular) domain of rhodopsin (Figure 2), long suspected in playing an important role in the structural stability of the pigment, incorporates a 33-residue N-glycosylated amino-terminal region and three connecting loops between TM helices II–III (amino acid residues 101–105), IV–V (residues 173–198), and VI–VII (residues 277–285). As indicated in Sections 3.3 and 3.4.1, Cys-187 of the second intradiscal loop forms a disulfide bridge with Cys-110 on the extracellular end of TM helix III [27]. Only three of the seventeen positively charged surface residues in rhodopsin are found in the intradiscal domain, supporting the inside-positive rule characteristic for the arrangement of many polytopic membrane proteins [48]. The intradiscal domain appears to be highly structured and rarely tolerates even minor amino acid substitutions [49].
7
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
8 The Mammalian Rod Cell Photoreceptor Rhodopsin
3.4.3 Cytoplasmic Domain of Rhodopsin The cytoplasmic domain (Figure 2) includes the carboxyl-terminal tail of about 25 residues and loops connecting TM helices I–II (residues 65–70), III–IV (residues 140–149), and V–VI (residues 226–243). A fourth cytoplasmic loop (residues 311–323), now designated as H8 [2], extends from the cytoplasmic end of TM VII to the palmitoylated Cys-322 and Cys-323. In contrast to the highly structured and rigid intradiscal domain, the cytoplasmic domain appears to be designed to provide transient conformational changes that support activation of signaling and regulatory proteins [2, 50]. Valuable information on the cytoplasmic domain dark-state structure and the dynamics of its light-induced conformational changes became available through the painstaking work of the Khorana and Hubbell laboratories [51]. The use of several experimental approaches, including site-directed spin labeling (SDSL) in combination with electron paramagnetic resonance spectroscopy (EPR) [52], sulfhydryl reactivity [14], and disulfide crosslinking [53], on more than 100 single cysteine mutants and 40 double cysteine mutants of rhodopsin, has allowed important information to be collected on the solution state conformation and the dynamics of this domain, as well as the boundaries of the loops. The resulting borders appear to be in good agreement with the published crystal structure of rhodopsin [2–4], although SDSL/EPR studies suggest that the third cytoplasmic loop might fold up further into TM V and TM VI [54]. Quite interestingly, this latter finding is in agreement with a three-dimensional model generated from an alternative (trigonal) crystal form of rhodopsin [55].
4 Chromophore Binding Pocket and Photolysis of Rhodopsin
The high quantum yield of retinal isomerization in rhodopsin is believed to be supported by the constrained conformation of the chromophore within the protein. Numerous efforts have been directed towards understanding both the groundand excited-states of the rhodopsin chromophore. In the ground-state, the conjugated polyene chain is in the 11-cis configuration with the β-ionone ring primarily oriented to give a 6-s-cis conformer [56–58]. The chromophore binding pocket is predominantly comprised of hydrophobic amino acids with the β-ionone ring positioned between Phe-212 and Phe-261, whereas Trp-265 is closer to the center of the chromophore promoting the 11-cis form of the polyene chain [2, 3]. Although well protected from the aqueous environment, the chromophore binding pocket contains several hydrophilic side chains. Of keen interest is the carboxyl group of Glu-113, which serves as counterion to the protonated the retinylidene–Schiff base [40–42]. This interaction appears to be of key importance in stabilizing ground-state chromophore–protein interactions. Altogether, around 30 amino acid side chains participate in forming the chromophore binding pocket [2]. Light-induced isomerization of 11-cis retinal to its all-trans configuration is the primary step in vision [59]. This ultra-fast (femtosecond) photochemical process is followed by much slower events culminating in increased accessibility of the
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
5 Structure of Rhodopsin
Fig. 3 Photointermediates of rhodopsin studies is shown. The absorbance maxima for photolysis. The photobleaching sequence of the various photointermediates and their aprhodopsin according to kinetic spectroscopy proximate half lives at 20◦ C are indicated.
chromophore binding pocket to the aqueous environment with subsequent hydrolysis of the retinylidene–Schiff base [60]. These two extreme events in the photolysis of rhodopsin are mediated by several thermally stable dark processes resulting in photoproducts with defined spectral, kinetic, and functional properties. Following the light-induced cis → trans isomerization of retinal, the earliest photointermediate observed to date is photorhodopsin [61]. However, this intermediate is highly unstable so little is known about its structure. The first structurally characterized stable photointermediate detected is bathorhodopsin (Figure 3), which thermally decays to the blue-shifted intermediate (BSI), followed by lumirhodopsin, metarhodopsin I, and metarhodopsin II [62, 63]. Of particular interest in this sequence is metarhodopsin II (or R* ), the active form of rhodopsin with bound all-trans retinal and capable of Gt activation. Although the precise mechanism of decay for this signaling photointermediate is not fully understood [64, 65], the two products typically detected are metarhodopsin III and opsin plus free all-trans retinal.
5 Structure of Rhodopsin 5.1 Crystal Structure of Rhodopsin
Having outlined the three topologically distinct functional domains of rhodopsin in two dimensions in Section 3.4, it is now of interest to examine the three-dimensional
9
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
10 The Mammalian Rod Cell Photoreceptor Rhodopsin
structure of rhodopsin in order to visualize how these domains are arranged to support tertiary contacts that ensure ground-state stability. While several aspects of this issue, including orientation of the TM helices, structural irregularities (e.g., tilts, kinks), chromophore–protein interactions, and the structural consequences of naturally occurring rhodopsin mutations are of particular importance, many of these have already been covered in the original report [2] as well as several recent reviews [43, 50, 81–83]. Here, we focus on some of the key structural features that confer on rhodopsin the unique ability to function as a finely tuned dim-light photoreceptor. The X-ray structure was determined from diffraction-quality crystals of detergent solubilized bovine ROS rhodopsin [2, 16]. A current refined (2.8 Å) model of rhodopsin [3] includes greater than 95% of the amino acid residues as well as post-translational modifications (Figure 4). One of the surprises from the crystal structure is a compact intradiscal arrangement, parts of which fold inwards to enclose the retinal moiety. This “lid” or “plug” over the chromophore binding pocket involves the second intradiscal loop and is comprised of an anti-parallel β-stranded
Fig. 4 Three-dimensional structure of bovine rhodopsin. Ribbon drawing of the rhodopsin structure showing the seven TM helices and the cytoplasmic "-helix, H8, in various colors with the connecting segments in gray. The N-linked oligosaccharide chains attached to Asn-2 and Asn-15 in the N-terminal region and the thioester linked palmitoyl groups attached to Cys322 and Cys-323 in the C-terminal region are shown in green. The 11-cis retinal chro-
mophore is shown in blue. The side chains of key amino acid residues mentioned in the text that are important for maintaining the ground-state structure of rhodopsin or are involved in the mechanism of rhodopsin activation are shown in purple and indicated with arrows. The image was generated using InsightII and PDB entry 1HZX (A chain) and is based on work from Teller et al., [3].
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
5 Structure of Rhodopsin
motif. The disulfide forming Cys-187 residue of this loop anchors the latter to Cys-110, near the extracellular surface of TM III. The second part of this loop is proximal to the bound 11-cis retinal, whereas the first part of the loop lies on top of the second strand. As indicated in Section 3.3, the intradiscal domain contains two consensus N-glycosylation sites at Asn-2 and Asn-15. Neither of these sites appears to make structural contacts with the protein (Figure 4), although Asn-15 N-glycosylation has previously been implicated in rhodopsin function [66]. The cytoplasmic domain, in contrast to intradiscal domain, is largely disordered. One exception is H8, which lies almost perpendicular to the carboxyl-terminal end of TM VII (Figure 4). The carboxyl-terminal cytoplasmic tail and portions of the third cytoplasmic loop are poorly resolved in the structure [2–4], while, quite interestingly, the palmitoyl chains attached to Cys-322 and Cys-323 display resolvable density (Figure 4). The TM helices are the sites where a majority of the conserved amino acid residues are found in rhodopsin. For example, Asn-55 of TM I, Asn-78 and Asp-83 of TM II, and Asn-302, Pro-303, and Tyr-306 of TMVII (part of the NPXXY(X)5,6 F motif), are all highly conserved amino acids across the GPCR family that appear to form contacts important for maintaining dark-state stability. While Asn-55 in TM I is hydrogen bonded to Asp-83 in TM II and Asn-78 in TM II is hydrogen bonded to Trp-161 in TM IV (Figure 4), conserved residues from the NPXXY(X)5,6 F motif form a network of interactions that stabilize the cytoplasmic domain [2, 3, 45]. Another important conserved element is the E(D)RY motif at the cytoplasmic end of TM III (Figure 4). Glu-134 and Arg-135 in this motif form ionic interactions with Glu-247 at the cytoplasmic end of TM VI and provide additional dark-state stability to rhodopsin [45]. It should be noted, however, that the electrostatic interaction between Glu-113 and Lys-296, residues which are largely conserved only among opsins, is considered to be the main factor in maintaining the dark-state conformation of rhodopsin [37]. 5.2 Atomic Force Microscopy of Rhodopsin in the Disk Membrane
Oligomerization of GPCRs is now a widely accepted phenomenon. For most GPCRs, evidence for oligomerization is largely indirect and based on results from immuno-precipitation, chemical or disulfide crosslinking, size-exclusion chromatography, and f luorescence or bioluminescence resonance energy transfer studies [67]. In the case of rhodopsin, however, direct evidence for oligomerization has recently been obtained using atomic force microscopy (AFM) [68, 69]. The paracrystalline arrangement of rhodopsin dimers in the murine DM provides compelling evidence for how rhodopsin may be self-organizing, and provide a platform for signal propagation and/or quenching [70]. A current model for the rhodopsin dimer, termed the “IV–V model”, where the dimeric interface is formed between helices IV and V [68, 71], proposes intermolecular contacts (hydrogen bonds) formed by Asn-199 and Ser-202 of both monomers. Asn-199 also extends the hydrogen bond network to Glu-196, and the adjacent Glu-197 residue points its carbonyl oxygen
11
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
12 The Mammalian Rod Cell Photoreceptor Rhodopsin
atom toward Asn-199, thereby contributing to the association of the two rhodopsin monomers. Despite the impressive nature of the AFM and modeling studies, it should be noted that these findings are at variance with the results of earlier biophysical measurements [72, 73], which suggested that rhodopsin rapidly diffuses as a monomeric unit in the DM.
6 Activation Mechanism of Rhodopsin
Light activation of rhodopsin appears to rely on the disruption of dark-state contacts and replacing them with a new set of interactions. This process is directed towards creation of a highly specific and effective binding site for Gt . Since opsin exhibits negligible activity at neutral pH [74], the isomerized retinal chromophore is thought to play a major role in establishing these new interactions. In fact, studies with retinal analogs have shown that removal of the C19 methyl group from 11-cis retinal dramatically shifts the metarhodopsin I-metarhodopsin II equilibrium towards the former [75, 76]. An impairment of metarhodopsin II formation has also been observed when modifications are introduced in or around the β-ionone ring of retinal [77]. Thus, it appears that both the C19 methyl group and β-ionone ring of retinal make important steric contributions to rhodopsin activation. The exact mechanism of rhodopsin activation is not fully understood but the general view is that there is an isomerization-induced separation of the cytoplasmic ends of TM III and TM VI relative to each other [51, 78]. This is thought to be a consequence of charge separation and proton transfer from the retinylidene–Schiff base to the Glu-113 counterion, although a counterion switching mechanism at the metarhodopsin I stage involving Glu-181 has now been proposed [79]. This latter mechanism, however, is at variance with recent NMR results indicating no evidence for any major structural reorganization around the chromophore binding site up to the metarhodopsin I stage [80]. Nonetheless, local conformational changes arising from retinal isomerization appear to be propagated to the cytoplasmic surface causing greater structural changes coupled to the reprotonation and rearrangement around the E(D)RY motif as well as coordinated changes in and around the NPXXY(X)5,6 F motif [45]. A strong distortion in TM VI imposed by Pro-267, one of the most conserved amino acid residues in the family of rhodopsin-like GPCRs [50], is also considered a key element of the activation mechanism.
7 Conclusions
Much has been revealed about the structure and function of mammalian rod rhodopsin over the past several years. As we begin the 21st century, it is likely that further molecular insights into this visual photosensory receptor will emerge. Having now established the basic structural features important for maintaining
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
References
the dark-state of rhodopsin and having identified many of the key residues that contribute to function, future efforts will need to focus on placing these findings in the larger context of phototransduction. One immediate challenge, which should reveal the identity of specific steric interactions between the retinal chromophore and the protein, is the high-resolution structure determination of the rhodopsin photointermediates, and in particular, metarhodopsin II. Acknowledgements
We thank Kris Palczewski for providing the template for Figure 2 and Eric DeJong for assistance in the preparation of Figure 4. This work was supported in part by U. S. Public Health Service grant EY13286 and a grant from the Karl Kirchgessner Foundation. Since the preparation of this article, a new 2.2 Å crystal structure for bovine rhodopsin has been determined tht resolves the complete polypeptide chain and provides further details about eht configuration of the 11-cis retinal chromophare [Okada et al., J. Mol. Biol., 2004, 342, 571–583]. References 1 M. E. BURNS and D. A. BAYLOR, Annu. Rev. Neurosci., 2001, 24, 779–805. 2 K. PALCZEWSKI, T. KUMASAKA, T. HORI, C. A. BEHNKE, H. MOTOSHIMA, B. A. FOX, I. LE TRONG, D. C. TELLER, T. OKADA, R. E. STENKAMP, M. YAMAMOTO, and M. MIYANO, Science, 2000, 289, 739–745. 3 D. C. TELLER, T. OKADA, C. A. BEHNKE, K. PALCZEWSKI, and R. E. STENKAMP, Biochemistry, 2001, 40, 7761–7772. 4 T. OKADA, Y. FUJIYOSHI, M. SILOW, J. NAVARRO, E. M. LANDAU, and Y. SHICHIDA, Proc. Natl. Acad. Sci. USA., 2002, 99, 5982–5987. 5 V. Y. ARSHAVSKY, T. D. LAMB, and E. N. PUGH, Annu. Rev. Physiol., 2002, 64, 153–187. 6 K. D. RIDGE, N. G. ABDULAEV, M. SOUSA, and K. PALCZEWSKI, Trends Biochem. Sci. 2003, 28, 479–487. 7 T. M. CABRERA-VERA, J. VANHAUWE, T. O. THOMAS, M. MEDKOVA, A. PREININGER, M. R. MAZZONI, and H. E. HAMM, Endocr. Rev., 2003, 24, 765–781. 8 P. P. SCHNETKAMP and F. J. DAEMEN, Methods Enzymol., 1982, 81, 110–116. 9 H. G. SMITH, Jr., G. W. STUBBS, and B. J. LITMAN, Exp. Eye Res., 1975, 20, 211–217.
10 W. J. DEGRIP, Methods Enzymol., 1982, 81, 256–265. 11 E. RAMON, J. MARRON, L. DEL VALLE, L. BOSCH, A. ANDRES, J. MANYOSA, and P. GARRIGA, Vision Res., 2003, 43, 3055– 3061. 12 B. J. LITMAN, Methods Enzymol., 1982, 81, 150–153. 13 D. D. OPRIAN, R. S. MOLDAY, R. J. KAUFMAN, and H. G. KHORANA, Proc. Natl. Acad. Sci. USA., 1987 84, 8874–8878. 14 K. D. RIDGE, Z. LU, X. LIU, and H. G. KHORANA, Biochemistry, 1995, 34, 3261–3267. 15 T. OKADA, K. TAKEDA, and T. KOUYAMA, Photochem. Photobiol., 1998, 67, 495–499. 16 T. OKADA, I. LE TRONG, B. A. FOX, C. A. BEHNKE, R. E. STENKAMP, and K. PALCZEWSKI, J. Struct. Biol., 2000, 130, 73–80. 17 Yu. A. OVCHINNIKOV, N. G. ABDULAEV, M. Yu. FEIGINA, I. D. ARTAMONOV, and A. S. BO-GACHUK, Bioorg. Khim. 1983, 9, 1331–1340. 18 P. A. HARGRAVE, J. H. MCDOWELL, D. R. CURTIS, J. K. WANG, E. JUSZCZAK, S. L. FONG, J. K. RAO, and P. ARGOS, Biophys. Struct. Mech. 1983, 9, 235–244.
13
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
14 The Mammalian Rod Cell Photoreceptor Rhodopsin
19 J. NATHANS and D. S. HOGNESS, Cell, 1983, 34, 807–814. 20 R. QANBAR and M. BOUVIER, Pharmacol. Ther., 2003, 97, 1–33. 21 M. N. FUKUDA, D. S. PAPERMASTER, and P. A. HARGRAVE, J. Biol. Chem., 1979, 254, 8201–8217. 22 C. J. LIANG, K. YAMASHITA, C. G. MUELLENBERG, H. SHICHI, and A. KOBATA, J. Biol. Chem., 1979, 254, 6414–6418. 23 P. A. HARGRAVE, J. H. MCDOWELL, R. J. FELDMANN, P. H. ATKINSON, J. K. RAO, and P. ARGOS, Vision Res., 1984, 24, 1487–1499. 24 Yu. A. OVCHINNIKOV, N. G. ABDULAEV, and A. S. BOGACHUK, FEBS Lett., 1988, 230, 1–5. 25 D. I. PAPAC, K. R. THORNBURG, E. E. BULLESBACH, R. K. CROUCH, and D. R. KNAPP, J. Biol. Chem., 1992, 267, 16889–16894. 26 S. S. KARNIK, K. D. RIDGE, S. BHATTACHARYA, and H. G. KHORANA, Proc. Natl. Acad. Sci. USA., 1993, 90, 40–44. 27 S. S. KARNIK and H. G. KHORANA, J. Biol. Chem., 1990, 265, 17520–17524. 28 F. F. DAVIDSON, P. C. LOEWEN, and H. G. KHORANA, Proc. Natl. Acad. Sci. USA., 1994, 91, 4029–4033. 29 P. GARRIGA, X. LIU, and H. G. KHORANA, Proc. Natl. Acad. Sci. USA., 1996, 93, 4560–4564. 30 J. HWA, J. KLEIN-SEETHARAMAN, and H. G. KHORANA, Proc. Natl. Acad. Sci. USA., 2001, 98, 4872–4876. 31 J. B. HURLEY, M. SPENCER, and G. A. NIEMI, Vision Res., 1998, 38, 1341– 1352. 32 T. MAEDA, Y. IMANISHI, and K. PALCZEWSKI, Prog. Retin. Eye Res., 2003, 22, 417–434. 33 A. D. ALBERT and B. J. LITMAN, Biochemistry, 1978, 17, 3893–3900. 34 M. MICHEL-VILLAZ, H. R. SAIBIL, and M. CHABRE, Proc. Natl. Acad. Sci. USA., 1979, 76, 4405–4408. 35 W. L. HUBBELL and B. K. FUNG, Soc. Gen. Physiol. Ser., 1979, 33, 17–25. 36 M. T. MAS, J. K. WANG, and P. A. HARGRAVE, Biochemistry, 1980, 19, 684–691. 37 V. R. RAO, G. B. COHEN, and D. D. OPRIAN, Nature, 1994, 367, 639–642.
38 L. BOSCH, E. RAMON, L. J. DEL VALLE, and P. GARRIGA, J. Biol. Chem., 2003, 278, 20203–20209. 39 N. G. ABDULAEV, Trends Biochem. Sci., 2003, 28, 399–402. 40 T. P. SAKMAR, R. R. FRANKE, and H. G. KHORANA, Proc. Natl. Acad. Sci. USA., 1989, 86, 8309–8313. 41 E. A. ZHUKOVSKY and D. D. OPRIAN, Science, 1989, 246, 928–930. 42 J. NATHANS, Biochemistry, 1990, 29, 9746–9752. 43 J. A. BALLESTEROS, L. SHI, and J. A. JAVITCH, Mol. Pharmacol. 2001, 60, 1–19. 44 K. D. RIDGE, T. NGO, S. S. LEE, and N. G. ABDULAEV, J. Biol. Chem., 1999, 274, 21437–21442. 45 O. FRITZE, S. FILIPEK, V. KUKSA, K. PALCZEWSKI, K. P. HOFMANN, and O. P. ERNST, Proc. Natl. Acad. Sci. USA., 2003, 100, 2290–2295. 46 G. N. ABDULAEV and K. D. RIDGE, Proc. Natl. Acad. Sci. USA., 1998, 95, 12854–12859. 47 R. P. RIEK, I. RIGOUTSOS, J. NOVOTNY, and R. M. GRAHAM, J. Mol. Biol., 2001, 306, 349–362. 48 L. SIPOS and G. VON HEIJNE, Eur. J. Biochem., 1993, 213, 1333–1340. 49 T. DOI, R. S. MOLDAY, and H. G. KHORANA, Proc. Natl. Acad. Sci. USA., 1990, 87, 4991–4995. 50 T. P. SAKMAR, Curr. Opin. Cell Biol., 2002, 14, 189–195. 51 W. L. HUBBELL, C. ALTENBACH, C. M. HUBBELL, and H. G. KHORANA, Adv. Protein Chem., 2003, 63, 243–290. 52 Z. T. FARAHBAKHSH, K. HIDEG, and W. L. HUBBELL, Science, 1993, 262, 1416–1419. 53 K. YANG, D. L. FARRENS, C. ALTENBACH, Z. T. FARAHBAKHSH, W. L. HUBBELL, and H. G. KHORANA, Biochemistry, 1996, 35, 14040–14046. 54 C. ALTENBACH, K. YANG, D. L. FARRENS, Z. T. FARAHBAKHSH, H. G. KHORANA, and W. L. HUBBELL, Biochemistry, 1996, 35, 12470–12478. 55 J. LI, P. EDWARDS, M. BURGHAMMER, C. VILLA, and G. F. X. SCHERTLER, J. Mol. Biol., 2004, 343, 1409–1438. 56 S. O. SMITH, I. PALINGS, V. COPIE, D. P. RALEIGH, J. COURTIN, J. A. PARDOEN, J. LUGTENBURG, R. A. MATHIES, and R. G.
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
References
57
58
59 60
61
62
63 64
65
66
67 68
69
70
GRIFFIN, Biochemistry, 1987, 24, 1606–1611. J. LUGTENBURG, R. A. MATHIES, R. G. GRIFFIN, and J. HERZFELD, Trends Biochem. Sci., 1988, 13, 388–393. P. J. R. SPOONER, J. M. SHARPLES, M. A. VERHOEVEN, J. LUGTENBURG, C. GLAUBITZ, and A. WATTS, Biochemistry, 2002, 41, 7549–7555. G. WALD, Science, 1968, 162, 230–239. R. W. SCHOENLEIN, L. A. PETEANU, R. A. MATHIES, and C. V. SHANK, Science, 1991, 254, 412–415. Y. SHICHIDA, Y. MATUOKA, and T. YOSHIZAWA, Photobiochem. Photobiophys., 1991, 7, 221–228. S. J. HUG, J. W. LEWIS, C. M. EINTERZ, T. E. THORGEIRSSON, and D. S. KLIGER, Biochemistry, 1990, 29, 1475–1485. I. SZUNDI, J. W. LEWIS, and D. S. KLIGER, Biophys. J., 1997, 73, 688–702. J. W. LEWIS, F. J. VAN KUIJK, J. A. CARRUTHERS, and D. S. KLIGER, Vision Res., 1997, 37, 1–8. M. HECK, S. A. SCHADEL, D. MARETZKI, F. J. BARTL, E. RITTER, K. PALCZEWSKI, and K. P. HOFMANN, J. Biol. Chem., 2003, 278, 3162–3169. S. KAUSHAL, K. D. RIDGE, and H. G. KHORANA, Proc. Natl. Acad. Sci. USA., 1994, 91, 4024–4028. S. TERRILLON and M. BOUVIER, EMBO Rep., 2004, 5, 30–34. D. FOTIADIS, Y. LIANG, S. FILIPEK, D. A. SAPERSTEIN, A. ENGEL, and K. PALCZEWSKI, Nature, 2003, 421, 127–128. Y. LIANG, D. FOTIADIS, S. FILIPEK, D. A. SAPERSTEIN, K. PALCZEWSKI, and A. ENGEL, J. Biol. Chem. 2003, 278, 21655– 21662. S. FILIPEK, K. A. KRZYSKO, D. FOTIADIS, Y. LIANG, D. A. SAPERSTEIN, A. ENGEL, and K. PALCZEWSKI, Photochem. Photobiol. Sci., 2004, 3, 628–638.
71 D. FOTIADIS, Y. LIANG, S. FILIPEK, D. A. SAPERSTEIN, A. ENGEL and K. PALCZEWSKI, FEBS Lett., 2004, 564, 281–288. 72 R. A. CONE, Nature New Biol., 1972, 236, 39–43. 73 M. POO and R. A. CONE, Nature, 1974, 247, 438–441. 74 T. J. MELIA, Jr., C. W. COWAN, J. K. ANGLESON, and T. G. WENSEL, Biophys. J., 1997, 73, 3182–3191. 75 R. VOGEL, G. B. FAN, M. SHEVES, and F. SIEBERT, Biochemistry, 2000, 39, 8895–8908. 76 C. K. MEYER, M. BOHME, A. OCKENFELS, W. GARTNER, K. P. HOFMANN, and O. P. ERNST, J. Biol. Chem., 2000, 275, 19713–19718. 77 F. JAGER, S. JAGER, O. KRUTLE, N. FRIEDMAN, M. SHEVES, K. P. HOFMANN, and F. SIEBERT, Biochemistry, 1994, 33, 7389–7397. 78 D. L. FARRENS, C. ALTENBACH, K. YANG, W. L. HUBBELL, and H. G. KHORANA, Science, 1996, 274, 768–770. 79 E. C. YAN, M. A. KAZMI, Z. GANIM, J. M. HOU, D. PAN, B. S. CHANG, T. P. SAKMAR, and R. A. MATHIES, Proc. Natl. Acad. Sci. USA., 2003, 100, 9262–9267. 80 P. J. R. SPOONER, J. M. SHARPLES, S. C. GOODALL, H. SEEDORF, M. A. VERHOEVEN, J. LUGTENBURG, P. H. BOVEE-GEURTS, W. J. DEGRIP, and A. WATTS, Biochemistry, 2003, 42, 13371–13378. 81 S. FILIPEK, R. E. STENKAMP, D. C. TELLER, and K. PALCZEWSKI, Annu. Rev. Physiol., 2003, 65, 851–879. 82 S. T. MENON, M. HAN, and T. P. SAKMAR, Physiol Rev., 2001, 81, 1659–1688. 83 T. OKADA, O. P. ERNST, K. PALCZEWSKI, and K. P. HOFMANN, Trends Biochem. Sci., 2001, 26, 318–324. 84 K. D. RIDGE, C. ZHANG, and H. G. KHORANA, Biochemistry, 1995, 34, 8804–8811.
15
c04 (JWBK148-Bommarius)
January 22, 2008
21:46
Char Count=
16
1
Microbial Rhodopsins John L. Spudich and Kwang-Hwan Jung Sogang University, Seoul, Korea
Originally published in: Handbook of Photosensory Receptors. Edited by Winslow R. Briggs and John L. Spudich. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31019-7
1 Introduction
The first 30 years of research on microbial rhodopsins concerned exclusively four proteins that share the cytoplasmic membrane of the halophilic archaeon Halobacterium salinarum, and a few very close homologs found in related haloarchaea. These four haloarchaeal types were the only microbial retinylidene proteins known prior to 1999: the light-driven ion pumps bacteriorhodopsin BR [1], and halorhodopsin (HR [2, 3]), and the phototaxis receptors sensory rhodopsin I (SRI [4]), and sensory rhodopsin II (SRII [5]). Studies of the haloarchaeal rhodopsins by the most incisive biophysical and biochemical tools available produced a wealth of information making them some of the best understood membrane-embedded proteins in terms of their structure-function relationships. Crystal structures of three [BR [6–8], HR [9], and SRII [10–12]] reveal a common seven-transmembrane α-helical structure with nearly identical helix positions in the membrane, despite their differing functions and identity in only ∼25% of their residues. The positions differ from those of visual pigments, as shown by the crystal structure of bovine rod rhodopsin [13], but their overall topologies are similar, namely the seven helices form an interior binding pocket in the hydrophobic core of the membrane for the retinal chromophore. In both the microbial and visual pigments, the retinal is attached by a protonated Schiff base linkage to a lysine in the middle of the seventh helix and retinal photoisomerization initiates their photochemical reactions. Starting in 1999, genome sequencing of cultivated microorganisms began to reveal the previously unsuspected presence of archaeal rhodopsin homologs in several organisms in the other two domains of life, namely Bacteria and Eukarya [14–16]. Further, in 2001, “environmental genomics” of populations of uncultivated microorganisms in ocean plankton showed the presence of a homolog in marine Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Microbial Rhodopsins
proteobacteria [hence given the name proteorhodopsin [17], which has swiftly expanded so far to ∼800 relatives identified in samples throughout the world’s oceans [18–23]. Microorganisms containing rhodopsin genes inhabit diverse environments including salt flats, soil, fresh water, surface and deep sea water, glacial sea habitats, and human and plant tissues as fungal pathogens. They comprise a broad phylogenetic range of microbial life, including haloarchaea, proteobacteria, cyanobacteria, fungi, dinof lagellates, and green algae. The conservation of residues, especially in the retinal-binding pocket, define a large phylogenetic class, called type 1 rhodopsins to distinguish them from the visual pigments and related retinylidene proteins in higher organisms (type 2 rhodopsins). Analysis of the sequences of the new type 1 rhodopsins, their heterologous expression and study, and in some cases study of the photosensory physiology of the organisms containing them, and spectroscopic analysis of environmental samples, have shown that the newfound pigments fulfill both ion-transport and sensory functions, the latter with a variety of signal-transduction mechanisms. The purpose of this article is to summarize what we have learned regarding the rapidly expanding group of retinylidene pigments comprising the microbial rhodopsin family.
2 Archaeal Rhodopsins
Many laboratories have characterized the four rhodopsins from H. salinarum with a battery of techniques because they provide model systems for the two fundamental functions of membranes: active transport and sensory signaling. Comprehensive reviews on mechanisms of BR [24], HR [25], and the SRs [26, 27] are available. Sixteen variants of BR, HR, SRI and SRII have been documented in related halophilic archaea, such as Natronomonas pharaonis and Haloarcula vallismortis (Table 1). Identification of members of the type-1 family has been based primarily on the conservation of residues in the retinal-binding pocket, which is known from the structures of haloarchaeal members. Atomic resolution structures, which exist for only a small number of membrane proteins, have been obtained from electron microscopy and X-ray crystallography of three of the archaeal rhodopsins: BR and HR from H. salinarum, and SRII from N. pharaonis (“NpSRII”). These proteins share a nearly identical positioning of seven transmembrane helices forming an interior pocket for the chromophore, all-trans retinal. The retinal binding pocket is comprised of residues from each of the seven helices, and it is the conservation of these residues that provides the most definitive identification of archaeal rhodopsin homologs in other organisms. Conservation outside of the pocket is sparse (Figure 1). Even between members of the archaeal branch, conservation outside the pocket is limited. For example, the phototaxis receptor NpSRII is only 27% identical to BR in amino acid sequence, and exhibits typically ∼40% identity with other archaeal sensory rhodopsins; all 4 archaeal rhodopsins exhibit ∼80% identity in the 22 residues that the crystal structures show form the retinal binding pockets in BR, HR, and
2 Archaeal Rhodopsins Table 1 List of microbial rhodopsins with database accession numbers or other sources.
We show all microbial opsin genes found in the NCBI database except proteorhodopsins, for which ∼800 have been identified (see text). We selectively present a subset of proteorhodopsins for which absorption maxima have been published. BR, bacteriorhodopsin; HR, halorhodopsin; SRI & SRII, sensory rhodopsins I and II; PR, proteorhodopsin; NOPI, Neurospora opsin I; CSRA & CSRB, Chlamydomonas sensory rhodopsins A and B Species and Name
Accession Number
Comments
Archaea Haloarcula argentinensis BR Haloarcula japonica BR Haloarcula sp. (Andes) BR Haloarcula vallismortis BR Haloarcula vallismortis HR Haloarcula vallismortis SRI Haloarcula vallismortis SRII Halobacterium marismotui BR
D31880 AB029320 S76743 D31882 D31881 D83748 Z35308 −
Halobacterium marismotui HR
−
Halobacterium marismotui SRI Halobacterium marismotui SRII Halobacterium marismotui BR-2 Halobacterium marismotui SRI-2 Halobacterium salinarum BR Halobacterium salinarum HR Halobacterium salinarum SRI
− − − − V00474 D43765 L05603
Halobacterium salinarum SRII
U62676
Halobacterium salinarum mex BR Halobacterium salinarum mex HR Halobacterium salinarum port BR Halobacterium salinarum port HR Halobacterium salinarum shark BR Halobacterium salinarum shark HR Halobacterium sp. AUS-1 BR Halobacterium sp. AUS-1 SRII Halobacterium sp. AUS-2 BR Halobacterium sp. NRC-1 BR Halobacterium sp. NRC-1 HR Halobacterium sp. NRC-1 SRI
D11056 P33970 D11057 Q48315 D11058 D43765 J05165 AB059748 S56354 NP 280292 NP 279315 AAG19914
Halobacterium sp. NRC-1 SRII
AAG19988
Halobacterium sp. SG1 BR Halobacterium sp. SG1 HR
X70291 X70292
H+ pump H+ pump H+ pump H+ pump Cl− pump phototaxis phototaxis H+ pump by homology with BR, Victor Ng, pers. com. Cl− pump by homology with HR, Victor Ng, pers. com. phototaxis,Victor Ng, pers. com. phototaxis,Victor Ng, pers. com. unknown,Victor Ng, pers. com. unknown,Victor Ng, pers. com. λmax = 568 nm, H+ pump λmax = 576 nm, Cl− pump λmax = 587 nm, phototaxis (attractant/repellent) λmax = 487 nm, phototaxis (repellent) H+ pump Cl− pump H+ pump Cl− pump H+ pump Cl− pump H+ pump phototaxis H+ pump H+ pump Cl− pump λmax = 587 nm, phototaxis (attractant/repellent) λmax = 487 nm, phototaxis (repellent) H+ pump Cl− pump (Continued)
3
4 Microbial Rhodopsins Table 1 (Continued)
Species and Name
Accession Number
Comments
Archaea Halobacterium sp. SG1 SRI Halorubrum sodomense BR Halorubrum sodomense HR Halorubrum sodomense SRI Haloterrigena sp. Arg-4 BR Haloterrigena sp. Arg-4 HR Natronomonas pharaonis HR Natronomonas pharaonis SRII
X70290 D50848 AB009622 AB009623 AB009620 AB009621 J0519 9 Z35086
phototaxis H+ pump Cl− pump phototaxis H+ pump Cl− pump Cl− pump λmax = 497 nm, phototaxis (repellent)
Eubacteria Anabaena sp. PCC7120
AP003592
Gloeobacter violaceus PCC 7421 Magnetospirillum magnetotacticum γ -proteobacterium (BAC31A8) γ -proteobacterium (HOT75m4) γ -proteobacterium (HOT0m1) γ -proteobacterium (PalE6) γ -proteobacterium (eBac64A5) γ -proteobacterium (eBac40E8) γ -proteobacterium (RSr6a5a6) γ -proteobacterium (RS23) γ -proteobacterium (RSr6a5a2)
NP 923144 − AF279106 AF349981 AF349978 AAK30200 AAK30175 AAK30174 AAO21455 AAO21449 −
Also known as Nostoc, λmax 543 nm, photosensory Unicellular cyanobacterium genome.ornl.gov/microbial/mmag λmax = 527 nm, H+ pump; GPR λmax = 490 nm, H+ pump; BPR λmax = 518 nm λmax = 490 nm λmax = 519 nm λmax = 519 nm λmax = 540 nm λmax = 528 nm λmax = 505 nm, RSr6a5a6(V105E), from Oded B´ej`a
Fungi Botrytis cinerea Botryotinia fuckeliana Cryptococcus neoformans
AL115930 − CF192410
Fusarium sporotrichioides Fusarium graminearum Leptosphaeria maculans Mycosphaerella graminicola Neurospora crassa NR Triticum aestivum Ustilago maydis Algae Chlamydomonas reinhardtii CSRA
BI187800 BU067691 AF290180 A W 180117 AF135863 CA747087 CF642219 AF508965
Chlamydomonas reinhardtii CSRB Pyrocystis lunula Guillardia theta Acetabularia acetabulum
AF508966 AF508258 AW342219 CF259014
cogeme.ex.ac.uk www.genome.ou.edu/cneo.html, Basidiomycetes
two homologs are present λmax = 534 nm Basidiomycetes photomotility for high light intensity photomotility for low light intensity dinof lagellate cryptomonad green alga
3 Clues to Newfound Microbial Rhodopsin Function from Primary Sequence 5
NpSRII. 55–75% identity in these 22 residues is also found in the new rhodopsins (Figure 1). The functions of the four archaeal rhodopsins have been well characterized. BR (λmax = 568 nm) and HR (λmax = 576 nm) are light-driven ion pumps for protons and chloride, respectively, absorbing maximally in the green-orange region of the spectrum [25, 28]. Their electrogenic transport cycles provide energy to the cell under conditions in which respiratory electron transport activity is low. Accordingly, their production in the cells is induced when oxygen is depleted in late exponential/early stationary phase cultures. Both BR and HR hyperpolarize the membrane to generate a positive outside membrane potential, thereby creating inwardly directed proton motive force. HR further contributes to pH homeostasis by hyperpolarizing the membrane by electrogenic chloride uptake rather than proton ejection, thereby providing an electrical potential for net proton uptake especially important in alkaline conditions. SRI and SRII are phototaxis receptors controlling the cell’s swimming behavior in response to changes in light intensity and color [26]. SRI (λmax = 587 nm) is also induced in cells in late exponential/early stationary phase, and attracts the cells to orange light useful to the transport rhodopsins. SRI is unique among known photosensory receptors in that it produces opposite signals (attractant and repellent) depending on the wavelengths of stimulating light. To avoid guiding the cells into light containing harmful near-UV radiation, SRI uses its color-discriminating mechanism to ensure the cells will be attracted to orange light only if that light is not accompanied by near-UV wavelengths [29]. The mechanism is based on photochromic reactions of the protein. If SRI absorbs a single photon (maximal absorption in the orange) it produces a photointermediate species called SRI-M or S373 (λmax = 373 nm) that is interpreted by the cells’ signal transduction machinery as an attractant signal. However, if S373 is photoexcited, it generates a strongly repellent-signaling photointermediate. Therefore single-photon excitation of SRI, such as occurs in orange light, attracts the cells, whereas two-photon excitation, as occurs in white light, repels the cells. SRII absorbs in the mid-visible range (λmax = 487 nm in H. salinarum SRII) and appears to serve only a repellent function [30]. It is the only rhodopsin in H. salinarum produced in cells during vigorous aerobic grow when light is not being used for energy and is therefore best avoided because of possible photooxidative damage.
3 Clues to Newfound Microbial Rhodopsin Function from Primary Sequence Comparison to Archaeal Rhodopsins
A challenge posed by the newfound microbial rhodopsin genes is to identify the photochemical and physiological function of the proteins in the cells containing them, some of which are uncultivated microorganisms. Success has been obtained in several cases: as detailed below, the Monterey Bay surface water-proteorhodopsin functions as a light-driven proton pump for the γ -proteobacterium SAR86 in its
6 Microbial Rhodopsins
3 Clues to Newfound Microbial Rhodopsin Function from Primary Sequence 7
native marine environment [18, 19], and therefore its physiological function is similar to that of BR in H. salinarum. In contrast, the Anabaena (Nostoc) rhodopsin [15] and Chlamydomonas reinhardtii pigments CSRA and CSRB [16] have been demonstrated to serve photosensory rather than transport functions, and therefore are functionally more similar to the archaeal SRI and SRII. These known transport proteins and sensory proteins do not group separately in phylogenetic analyses (Figure 2); therefore the phylogenetic trees do not permit assignment of particular sequences as encoding transport or sensory proteins. Some individual residue differences, however, provide a clue, as do the photochemical reaction cycle kinetics of the proteins. One difference in the primary sequence between BR, HR and the SRs stands out. Asp96 in BR functions as a proton donor, returning a proton to the Schiff base from the cytoplasmic side of the protein during the pumping cycle. This proton transfer improves the pumping efficiency of BR by accelerating the decay of its unprotonated Schiff base photocycle intermediate, M, and is present in all BR homologs in the haloarchaea. In the sensory rhodopsins, the corresponding M intermediates are signaling states of the receptor proteins (demonstrated unequivocally only for HsSRI), and longer M lifetimes increase the signaling efficiencies of the receptors. Accordingly, each of the five known haloarchaeal sensory rhodopsin sequences lacks a carboxylate residue at the position corresponding to Asp96 and contains Tyr or Phe instead. The residue corresponding to Asp85, which is the proton acceptor from the Schiff base, is a carboxylate residue in BR, SRI, and SRII and in each of the newly identified rhodopsins (Figure 1). HR does not produced an unprotonated Schiff base intermediate in its photocycle, and therefore does not contain a carboxylic acid residue in either of the positions corresponding to Asp96 and Asp85. Notably the SAR86 proteorhodopsin demonstrated to be a light-driven proton pump does contain a carboxylate (Glu108) at the position corresponding to Asp96 in BR, and moreover Glu108 has been shown to participate in the reprotonation of the Schiff base in the latter half of the photocycle as does Asp96 in BR [31, 32]. On ← Fig. 1 Primary sequence comparison of 15 microbial rhodopsins. We selected several opsin genes from each domain of life. Archaea- BR: Halobacterium salinarum bacteri-orhodopsin, SRI: H. salinarum sensory rhodopsin I, NpSRII: Natronomonas pharaonis sensory rhodopsin II; BacteriaGPR: (-proteobacterium (BAC31A8) proteorhodopsin, BPR: (-proteobacterium (HOT75m4) proteorhodopsin, Gloeobacter: microbial rhodopsin from Gloeobacter violaceus PCC 7421, Anabaena: sensory rhodopsin from Anabaena (Nostoc) sp. PCC7120; Eukarya- Fungi-rhodopsin from Neurospora crassa, Leptosphaeria maculans,
Botrytis cinerea, and Cryptococcus neoformans, Algae- Pyrocyctis: rhodopsin from Pyrocystis lunula, CSRA & CSRB: Chlamydomonas reinhardtii sensory rhodopsins A and B (N-terminal portions), Guillardia: rhodopsin from Guillardia theta. Conserved residues are marked with black background and the 22 residues in the retinal-binding pocket are marked with asterisks. Bacteriorhodopsin Asp85 and Asp96 in helix C and corresponding residues in the other pigments are marked with blue background (see text). Red-colored KWG residues on helix E are nearly completely conserved in fungal rhodopsins.
8 Microbial Rhodopsins
3 Clues to Newfound Microbial Rhodopsin Function from Primary Sequence 9
the basis of available information, the presence of the carboxylate residue at this position appears to be a necessary, but not a sufficient condition for identification of a new rhodopsin as a proton pump. Neurospora rhodopsin also contains a glutamate at this position, but extensive analysis of the photoactivity of the protein expressed in Pichia pastoris, as well as purified and reconstituted into liposomes, reveals a non-transport photocycle with kinetics indicative of a sensory rhodopsin [33]. A caveat is that one cannot be certain that the protein is folded correctly when expressed heterologously, although the absorption spectrum in the visible range and photochemical reactivity of the expressed protein when reconstituted with all-trans retinal provides some assurance. The demonstrated sensory rhodopsins, namely the archaeal SRI and SRII proteins, the Anabaena rhodopsin, and Chlamydomonas CSRA and CSRB, all lack a carboxylate residue at the homologous position of the BR Schiff base proton donor Asp96, while containing the carboxylate Schiff base proton-acceptor residue corresponding to Asp85 in BR (Figure 1). Hence the absence of a carboxylate in the donor position in Cryptococcus neoformans (alanine in the corresponding position) and in a marine proteorhodopsin sequence recently deposited in GenBank
← Fig. 2 Phylogenetic tree of microbial rhodopsins. A neighbor-joining tree was constructed from CLUSTALX(1.81) alignment of 46 microbial rhodopsin apoproteins. The tree was constructed using the full-length sequences, except in the case of Chlamydomonas sensory opsins, in which only the rhodopsin domains were used (CsoA, 378 N-terminal residues; CsoB, 303 N-terminal residues). Scale represents number of substitutions per site (0.1 indicates 10 nucleotide substitutions per 100 nucleotides). 1000 bootstrap replicates were performed to determine the reliability of the tree topology. The tree was drawn using TreeView1.6.6. Abbreviations: NOPI (Neurospora crassa opsin I), HtHR (Haloterrigena sp. halorhodopsin), HvalHR (Haloarcula vallismortis halorhodopsin), HsHR (Halobacterium salinarum halorhodopsin), HsportHR (Halobacterium salinarum port halorhodopsin), NpHR (Natronomonas pharaonis halorhodopsin), HsodHR (Halorubrum sodomense halorhodopsin), HspSGIHR (Halobacterium sp. SG1 halorhodopsin), AnabaenaSR (Anabaena (Nostoc) sp. PCC7120 sensory rhodopsin), HtBR (Haloterrigena sp. bacteriorhodopsin), HaspBR (Haloarcula sp. bacteriorhodopsin),
HvalBR (Haloarcula vallismortis bacteriorhodopsin), HajaponicaBR (Haloarcula japonica bacteriorhodopsin), HaargentBR (Haloarcula argentinensis bacteriorhodopsin), HsportBR (Halobacterium salinarum port bacteriorhodopsin), HsBR (Halobacterium salinarum bacteriorhodopsin), HsmexBR (Halobacterium salinarum mex bacteriorhodopsin), HspAUS-2BR (Halobacterium sp. AUS-2 bacteriorhodopsin), HsodBR (Halorubrum sodomense bacteriorhodopsin), HspAUS-1BR (Halobacterium sp. AUS1 bacteriorhodopsin), HvalSRII (Haloarcula vallismortis sensory rhodopsin II), NpSRII (Natronomonas pharaonis sensory rhodopsin II), HspAUS-1SRII (Halobacterium sp. AUS-1 sensory rhodopsin II), HsSRII (Halobacterium salinarum sensory rhodopsin II), HsodSRI (Halorubrum sodomense sensory rhodopsin I), HvalSRI (Haloarcula vallismortis sensory rhodopsin I), HsSRI (Halobacterium salinarum sensory rhodopsin I), PR eBAC64A5 ((proteobacterium proteorhodopsin), PR MBP (GPR; (-proteobacterium proteorhodopsin BAC21A8), CSRA and CSRB (Chlamydomonas reinhardtii sensory rhodopsins A and B, respectively).
10 Microbial Rhodopsins
(gi|42850614|gb|EAA92632.1, threonine in the corresponding position) strongly suggest sensory functions for these proteins. More than 10-fold faster photocycling rates distinguish the archaeal transport from the sensory pigments; the first-found sensory rhodopsin, archaeal SRI, in fact was initially called “slow-cycling rhodopsin” for this reason [4, 34]. The transport rhodopsins are characterized by photocycles typically <30 ms, whereas sensory rhodopsins are slow-cycling pigments with photocycle halftimes typically >300 ms [27]. This large kinetic difference is functionally important since a rapid photocycling rate is advantageous for efficient ion pumping, whereas a slower cycle provides more efficient light detection because signaling states persist for longer times. The photocycle rate difference, which holds firm for the archaeal rhodopsins, may in some cases not be a definitive criterion for assigning a transport versus sensory function. The Anabaena rhodopsin which has been concluded to be a sensory protein based on other criteria has a photocycle half-time of 110 ms [15], intermediate between archaeal transport and sensory proteins. Furthermore, a deep sea proteorhodopsin from a Hawaiian Ocean-Time station plankton sample from 75 m depth exhibits a light-driven proton transport cycle that is ∼10-fold slower (60 ms in cells) than that of the Monterey Bay proteorhodopsin [32]. The slower photocycling rate of the deep sea pigment is explained as an adaptation to the ∼10-fold decreased photon flux rate available to the BPR visible absorption band at 75 m.
4 Bacterial Rhodopsins 4.1 Green-absorbing Proteorhodopsin (“GPR”) from Monterey Bay Surface Plankton
Among the most abundant and widely distributed of the type 1 rhodopsins are the proteorhodopsins, the first of which was identified by genomic analysis of marine proteobacteria in plankton from Pacific coastal surface waters. The proteorhodopsin gene was the first found to encode a eubacterial homolog of the archaeal rhodopsins and was revealed by BAC library construction and sequencing of naturally occurring marine bacterioplankton from Monterey Bay [17]. The gene was functionally expressed in Escherichia coli and bound retinal to form an active, light-driven proton pump. The rRNA sequence on the same DNA fragment identified the organism as an uncultivated γ -proteobacterium (the SAR86 group), and the expressed protein was named proteorhodopsin. Phylogenetic comparison with archaeal rhodopsins placed proteorhodopsin on an independent long branch (Figure 2). The new pigment, designated GPR (λmax = 525 nm), exhibited a photochemical reaction cycle with intermediates and kinetics characteristic of archaeal proton-pumping rhodopsins. Its transport, spectroscopic, and photochemical reactions have now been characterized by a number of laboratories in Escherichia
4 Bacterial Rhodopsins
coli-expressed form, [17, 18, 20, 31, 35–38]. The efficient proton pumping and rapid photocycle (15 ms halftime) of the new pigment strongly suggested that proteorhodopsin functions as a proton pump in its natural environment. Asp97 and Glu108 in GPR function as Schiff base proton acceptor and donor carboxylate residues during the GPR pumping cycle, analogous to Asp85 and Asp96, respectively, at the corresponding positions in BR [31, 32]. The next step was examination of the plankton samples directly for the newfound protein’s activity. Retinylidene pigmentation with photocycle characteristics identical to that of the E. coli-expressed proteorhodopsin gene was demonstrated by flash spec-troscopy in membranes prepared from Monterey Bay picoplankton [18]. Estimated from laser flash-induced absorbance changes, a high density of proteorhodopsin in the SAR86 membrane is indicated, arguing for a significant role of the protein in the physiology of these bacteria. The flash photolysis results provided direct physical evidence for the existence of proteorhodopsin-like pigments and endogenous retinal molecules in the prokaryotic fraction of the Monterey Bay coastal surface waters, and provide compelling evidence that GPR functions as a light-driven proton pump photoenergizing SAR86 cells in their natural environment. Furthermore, the amplitude of the flash-photolysis signals permit a rough estimate of the total rate of solar energy conversion to proton motive force by marine proteorhodopsins; assuming for the calculation that the Monterey Bay sample has the average PR content, the conversion rate is on the order of 1013 –1014 W, a globally significant contribution to the biosphere. Since the initial finding of GPR, a wide variety of similar genes has been identified in picoplankton from very different ocean environments: the Antarctic, Central North Pacific, Mediterranean Sea, Red Sea, and the Atlantic Ocean [18–22]. Genes have been isolated from both surface and deep-water samples, and both coastal and open-sea areas. New members from the PR family were recently reported to be found also in marine α-proteobacteria [19], and based on whole genome “shotgun sequencing” of microbial populations collected en mass on tangential flow and impact filters from sea water samples collected from the Sargasso Sea near Bermuda, a remarkable 782 different partial sequences homologous to proteorhodopsins were identified [23]. Thus, microbial rhodopsin abundance and diversity within marine environments appears to be large. 4.2 Blue-absorbing Proteorhodopsin (“BPR”) from Hawaiian Deep Sea Plankton
One of the variant groups (designated clade II) of proteorhodopsin genes, differing by ∼22% in predicted primary structure from the clade I group defined by the GPR gene and its close relatives, was detected in both the Antarctic and in 75-m deep ocean plankton from Hawaiian water [18]. The Antarctic and Hawaiian PR genes when expressed in E. coli exhibit a blue-shifted absorption spectrum (λmax = 490 nm; hence referred to as “BPR“) with vibrational fine structure, unlike the unstructured spectrum of GPR [18]. The stratification of the surface GPR and 75-m BPR with depth is in accordance with light spectral quality at these depths [18].
11
12 Microbial Rhodopsins
The different absorption spectra of GPR and BPR have provided an opportunity to examine “spectral tuning” in two rhodopsins with closely similar primary sequence. One of the most notable distinguishing properties of retinal among the various chromophores used in photosensory receptors is the large variation of its absorption spectrum depending on interaction with the apoprotein (“spectral tuning”) [39, 40]. In rhodopsins, retinal is covalently attached to the ε-amino group of a lysine residue forming a protonated retinylidene Schiff base. In methanol a protonated retinylidene Schiff base exhibits a λmax = of 440 nm. The protein microenvironment shifts the λmax [the “opsin shift” [41] to longer wavelengths, e.g. to 527 nm in GPR and to 490 nm in BPR. With structural modelling comparisons and mutagenesis, a single residue difference in the retinal binding pockets at position 105 (Leu in GPR and Gln in BPR) was found to function as a spectral tuning switch and to account for most of the spectral difference between the two pigment families [20]. The mutations at position 105 almost completely interconverted the absorption spectra of BPR and GPR. GPR L105Q shifted to the blue and acquired vibrational fine structure like wild-type BPR, and BPR Q105L shifted to the red and lost the fine structure exhibiting spectra similar to those of GPR. Among both type 1 and type 2 rhodopsins the mechanisms of spectral tuning in general are still not well understood in physical chemical terms and the Q/L switch stands out as a simple spectral tuning model amenable to investigation. Spectral tuning is discussed in more detail in Section 6, below. Another difference between GPR and BPR is their photocycle halftimes, 6.5 ms and 60 ms respectively in E. coli cells [32]. The difference in photocycle rates and their different absorption maxima may be explained as an adaptation to the different light intensities in their respective marine environments, based on measured spectral distributions of intensities of solar illumination at the ocean surface and at various depths [42]. Taking into account the blue shift of BPR, matching the lower photon fluence rate from solar radiation requires a 10-fold slower photocycle in BPR than in GPR. Therefore there is no selective pressure for a photocycle faster than that of BPR at that depth. BPR may function to energize cells by light-driven electrogenic proton pumping, as does GPR. However, the contribution of solar energy capture from BPR is severely limited by the low light intensities in deep waters. This consideration raises the possibility of a regulatory rather than energy harvesting function of BPR, based either on its slow proton pumping or by yet unidentified protein-protein interaction with transducers in its native membrane. 4.3 Anabaena Sensory Rhodopsin
A rhodopsin pigment in a cyanobacterium established that sensory rhodopsins also exist in eubacteria. A gene encoding a homolog of the archaeal rhodopsins was found via a genome-sequencing project of Anabaena (Nostoc) sp. PCC7120 at Kazusa Institute (http://www.kazusa.or.jp/). The opsin gene was expressed in E. coli, and bound all-trans retinal to form a pink pigment (λmax = 543 nm) with a
4 Bacterial Rhodopsins
photochemical reaction cycle containing an M-like photointermediate and 110 ms half-life at pH 6.8 [15]. The opsin gene was found in the genome to be adjacent to another open reading frame separated by 16 base pairs under the same promoter. This operon is predicted to encode a 261-residue protein (the opsin) and a 125-residue (14 kDa) protein. The rate of the photocycle is increased ∼20% when the Anabaena rhodopsin and the soluble protein are co-expressed in E. coli [15], indicating physical interaction between the two proteins. Binding of the 14-kDa protein to Anabaena rhodopsin was confirmed by affinity-enrichment measurements and Biacore interaction analysis. The pigment did not exhibit detectable proton transport activity when expressed in E. coli, and Asp96, the proton donor of BR, is replaced with Ser86 in Anabaena rhodopsin. These observations are compelling that Anabaena opsin functions as a photosensory receptor in its natural environment, and strongly suggest that the 125-residue cytoplasmic soluble protein transduces a signal from the receptor, unlike the archaeal sensory rhodopsins which transmit signals by transmembrane helix–helix interactions with integral membrane transducers (Figure 3). Chimeric constructs have established that the archaeal sensory rhodopsins SRI and SRII transmit signals to their cognate membrane-embedded taxis transducers by interaction with the transducers, transmembrane helices and a short membrane proximal domain [43, 44] and an extensive membrane-embedded transducerbinding region has been observed in the X-ray structure of N. pharaonis SRII [12] co-crystallized with its taxis transducer fragment [10, 45]. The interaction of the soluble 14-kDa protein, likely to be a signal transducer, with Anabaena rhodopsin, therefore would extend the range of signal transduction mechanisms used by microbial sensory rhodopsins. Atomic resolution structures for the Anabaena pigment and its putative transducer have been obtained [46], but the physiological function of Anabaena SR has not been established. Several photophysiological responses of Anabaena with unidentified photosensory receptor(s) have been discussed [15, 47]. One of these is light-modulation of the pigments contained in the light-harvesting complex of Anabaena, a photoresponse called chromatic adaptation. Green light such as absorbed by Anabaena rhodopsin has been found to modulate complementary chromatic adaptation in the related cyanobacteria Calothrix and Fremyella [48]. The adaptation consists of differential biosynthesis of blue-absorbing phycoerythrins and red-absorbing phycocyanins depending on light quality. The genome of Anabaena sp. 7120 contains phycoerythrin subunits α and β(pecA and B), allophycocyanin subunits α and β(apcA and B), and phycocyanin sub-units α and β(cpcA and B). Green light (optimally 540 nm) promotes phycoerythrin synthesis whereas red light (optimally 650 nm) promotes phycocyanin synthesis [49]. A phytochrome would be an attractive candidate for a red light sensor, and 3 phytochrome homologs are present in the Anabaena sp. 7120 genome. The Anabaena rhodopsin (λmax = 543 nm) is a candidate for the green light sensor, and may function alone to discriminate color via its photochronic reactions [46]. The two major biliproteins in Synechocystis sp. 6830 are phycocyanin (λmax = 617 nm) and allophycocyanin (λmax = 650 nm) [50].
13
14 Microbial Rhodopsins
5 Eukaryotic Microbial Rhodopsins
Interestingly, the phycoerythrin gene which is regulated by green light is missing in the genome of Synechocystis which does not contain an opsin-encoding gene. 4.4 Other Bacterial Rhodopsins 4.4.1 Magnetospirillum Genome sequencing of the α-proteobacterium Magnetospirillum magnetotacticum revealed a microbial opsin gene (Table 1), proceeded by a homolog of brp, which encodes an enzyme for synthesis of retinal from β-carotene in H. salinarum [51]. The two gene-operon contains a single promoter. 4.4.2 Gloeobacter violaceus PCC 7421 Gloeobacter is a unicellular cyanobacterium and the complete genome of this strain has been sequenced [52]. There is only one type 1 rhodopsin gene, which unlike that of Anabaena, encodes a carboxylate (Glu) at the BR D96 position, suggesting a proton-pumping function.
5 Eukaryotic Microbial Rhodopsins 5.1 Fungal Rhodopsins
A genome sequencing project on the filamentous fungus Neurospora crassa revealed the first of the eukaryotic homologs, designated NOP-1 [14], and search of genome databases currently in progress indicates the presence of archaeal rhodopsin homologs in various fungi including plant and human pathogens — Ascomycetes: Botrytis cinerea, Botryotinia fuckeliana (anamorph Botrytis cinerea), Fusarium ← Fig. 3 Functional diversity among microbial rhodopsins. Domains of the two sensory rhodopsins from Chlamydomonas reinhardtii, CSRA and CSRB, based on secondary structure predictions, compared with those for proteorhodopsin, halorhodopsin and sensory rhodopsin I (in a dimeric complex with its cognate dimeric transducer) from haloarchaea and cyanobacterial sensory rhodopsin from Anabaena. The retinal chromophore (shown in red) is covalently linked to a conserved lysine residue in the seventh trans-membrane helix in each protein. Colors shown are approximately the color of the pigments. Residues in helix C of prote-
orhodopsin that are important for proton translocation, Asp-97 and Glu-108, and the amino acid differences at their corresponding positions in the other rhodopsins are highlighted. The corresponding residues are not shown for halorhodopsin (see text) to avoid the impression that they are on the chloride translocation path. Transducer domains or proteins shown in green are presumably involved in the post-receptor signal transduc-tion processes. For the cytoplasmic 14-kDa protein associated with ASR and for CSRA and CSRB, the numbers indicated correspond to the number of amino acid residues in each module.
15
16 Microbial Rhodopsins
sporotrichioides, Gibberella zeae (anamorph Fusarium graminearum), Leptosphaeria maculans, Mycosphaerella graminicola (two opsin homologs) and Basidiomycetes: Cryptococcus neoformans and Ustilago maydis. Each of these organisms contains genes predicted to encode proteins with the retinal-binding lysine in the seventh helix and high identity in the retinal-binding pocket. The Asp Schiff base counter ion and proton acceptor (Asp85 in BR) is conserved among all fungal opsin homologs and the carboxylate proton donor specific to proton pumps (Asp96 in BR) is also either Asp or Glu except in Cryptococcus which contains an Ala residue. The nop-1 gene was heterologously expressed in the yeast Pichia pastoris, and it encodes a membrane protein that forms with all-trans retinal a green lightabsorbing pigment (λmax = 534 nm) with a spectral shape and bandwidth typical of rhodopsins [33]. Laser-f lash kinetic spectroscopy of the retinal-reconstituted NOP-1 pigment (i.e. Neurospora rhodopsin) in Pichia membranes reveals that it undergoes a seconds-long photocycle with long-lived intermediates spectrally similar to intermediates detected in BR and other members of the type 1 family. The physiological function of Neurospora rhodopsin has not yet been identified. Based on the long lifetime of the intermediates in its photocycle and its apparent lack of ion-transport activities [at least when heterologously expressed [53], it seems likely to serve as a sensory receptor for one or more of the several different light responses exhibited by the organism, such as photocarotenogenesis or light-enhanced conidiation. Neurospora is non-motile, but phototaxis by zoospores of the motile fungus Allomyces reticulatus has been shown to be retinal-dependent [54] and therefore photomotility modulation is a likely photo-sensory function of rhodopsins in this particular fungal species. A blue-green light-induced photocycle in Cryptococcus neoformans native membranes has been detected and confirmed as deriving from the rhodopsin pigment by its absence in an opsin gene-deletion mutant. The photocycle is typical of a microbial rhodopsin exhibiting a blue-shifted intermediate characteristic of a deprotonated Schiff base species and a 100–150 ms half-life (pH 7.0, 25◦ C) (authors, unpublished). 5.2 Algal Rhodopsins
The rhodopsins of the green alga Chlamydomonas reinhardtii are the only ones in eukaryotic microbes to have an identified physiological function, namely photoreception controlling motility behavior [16]. Two type 1 opsin genes were identified in the C. reinhardtii genome. A microbial rhodopsin homolog gene is also present in Guillardia theta, which is a small bif lagellate organism considered both a protozoan and an alga, and opsin genes are also found in the dinof lagellate Pyrocystis lunula and Acetabularia acetabulum which is a unicellular green alga of the order Dasycladales found in warm waters of sheltered lagoons. The CSOA & CSOB (Chlamydomonas sensory opsin A and B) genes encode 712 and 737 amino acid proteins (Figure 1). The N-terminal 300 residues have a significant homology to archaeal rhodopsins with seven transmembrane helices and the conserved
6 Spectral Tuning
retinal binding pocket (Figure 1). The Chlamydomonas rhodopsins provide the first examples of evolution fusing the microbial rhodopsin motif with other domains. Early work established that Chlamydomonas uses retinylidene receptors for photomotility responses. Restoration of photomotility responses by retinal addition to a pigment-deficient mutant of C. reinhardtii first indicated a retinal-containing photoreceptor [55]. Subsequent in vivo reconstitution studies with retinal analogs prevented from isomerizing around specific bonds (“isomer-locked retinals”) in several laboratories further established that the Chlamydomonas rhodopsins governing phototaxis and the photophobic response have the same isomeric configuration (alltrans), photoisomerization across the C13–C14 double bond (all-trans to 13-cis), and 6-s-trans ring-chain conformation, (co-planar) as the archaeal rhodopsins [56–60]. The proteins encoded by csoA and csoB complexed with retinal [16]. RNAi suppression of the genes established that CSRA and CSRB mediate both phototaxis [16] and photophobic reactions [61] to high- and low-intensity light, respectively. The functions of the two rhodopsins were demonstrated by analysis of electrical currents and motility responses in transformants with RNAi directed against each of rhodopsin genes. CSRA has an absorption maximum near 510 nm and mediates a fast photoreceptor current that saturates at high light intensity. In contrast, CSRB absorbs maximally at 470 nm and generates a slow current saturating at low light intensity [16]. The rhodopsin domains of CSRA [62] and CSRB [63] have been shown to exhibit light-induced proton-channel activity in Xenopus oocytes. The relationship of this activity to their control of motility-regulating currents in C. reinhardtii is not clear. To clarify a possibly confusing series of reports in the literature, we mention here a protein that binds radiolabeled retinal, and named on this basis “chlamyrhodopsin,” that had been isolated from Chlamydomonas eyespot preparations [64]. For several years this most abundant protein in the eyespot membranes was assumed and often cited by the authors of that work as the photoreceptor for photomotile responses. However, its gene-predicted primary sequence, as well as that of a similar Volvox protein [65], suggest 2–4 transmembrane helices and no homology to archaeal opsins, nor is a photoactive retinal binding site evident from the sequences. Moreover, recently the “chlamyrhodopsin” has been ruled out as the photoreceptor pigment for either phototaxis or photophobic responses in C. reinhardtii [66].
6 Spectral Tuning
Comparison of the primary sequence, (Figure 1) alone gives hints to distinguishing properties of different microbial rhodopsins, but most properties cannot be deduced from primary structure alone. The Leu/Gln spectral-tuning switch at position 105 in GPR and BPR discussed above was revealed through structural modeling and mutagenesis. However, we were fortunate that a relatively simple single-residue switch is responsible for most of the color difference in that case, and, more
17
18 Microbial Rhodopsins
generally, detailed knowledge of atomic structure will probably be required to elucidate spectral tuning mechanisms of most microbial rhodopsins. An exemplary case is that of NpSRII [67]. Mutagenic substitution of 10 residues, in or near the retinal-binding pocket with their corresponding BR residues, produced only a 28nm red shift of the NpSRII absorption maximum [68, 69]. Structural differences responsible for the shift are evident in the 2.4-Angstrom resolution structure [12]. One notable change is a displacement of the guanidinium group of Arg72 by 1.1 Å coupled with a rotation away from the Schiff base in NpSRII. This increase in distance reduces the inf luence of Arg72 on the counterion, thus strengthening the Schiff base/counterion interaction, shifting the absorption to shorter wavelengths. In addition the position of the positive charge destabilizes the excited state contributing further blue shift [70]. Arg72 is repositioned as a consequence of several factors, including movement of its helix backbone by 0.9 Å and the cavity created by changes from BR: Phe208 →Ile197 , Glu194 →Pro183 , and Glu204 →Asp192 . Hence the spectral tuning results from precise positioning of retinal binding-pocket residues and the guanidinium of Arg72 , which could not be deduced from primary structure, but required atomic resolution tertiary structure information.
7 A Unified Mechanism for Molecular Function?
The idea that in microbial rhodopsins the sensory signaling mechanisms result from evolution “tweaking” the transport mechanism was suggested by the observation that SRI carries out light-driven proton pumping, but only when it is free of its transducer [71–74]. The HtrI protein was found to close or prevent the opening of a cytoplasmic proton-conducting channel in SRI during its photocycle. This finding led to the notion that a chemotaxis receptor progenitor of HtrI evolved an interaction with a proton-transporter progenitor of SRI, coupling to its pumping mechanism and thereby blocking the pump and converting the transport rhodopsin to a sensory receptor. NpSRII was also observed under some conditions to exhibit light-driven proton transport which was also prevented by its interaction with its transducer, HtrII [75, 76]. That the transducer-inhibition of light-driven transport occurs in both haloarchaeal rhodopsins further supports that the interaction blocking the transport is a critical aspect of the signaling mechanism. A tilting of helices, primarily helix F, contributes to opening a cytoplasmic channel in the latter half of the photocycle in BR [77]. A unifying mechanism is that ion transport and sensory signaling use the same retinal-driven protein structural changes, which is the conformational change that opens the cytoplasmic channel in the proton transport cycle. The key feature of the model is that consequences of retinal photoisomerization, including light-induced disruption of the salt bridge between the protonated Schiff base on helix G and aspartyl counterion on helix C [78], triggers tilting of helix F to which the Htr transmembrane helices are coupled. Supporting this mechanism are (i) the proton-pumping by the sensory rhodopsins, (ii) its inhibition by transducer interaction discussed above, and (iii) light-induced tilting of helix F in N. pharaonis
9 Conclusions
SRII, concluded from site-directed spin-labeling measurements [79]. Furthermore, (iv) genetic evidence supports helix-F involvement in signaling [80]. In addition, (v) substitution of the Schiff base counterion Asp85 with asparagine induces helix-F tilting in the dark in BR and the corresponding substitution in H. salinarium SRII partially activates the receptor in the dark [78]. Finally (vi), helix F interacts with the two transmembrane helices of the HtrII fragment co-crystallized with NpSRII [10]. Another prediction is that, since in the model SR helix tilting is transmitted to the Htr protein by direct helix–helix contacts, alterations in structure must occur in the Htr transmembrane domains between the receptor interaction sites and the cytoplasmic domain of the transducer, where the activity of the bound histidine kinase is controlled. Such structural alterations have been detected as light-induced changes in interactions between spin labels introduced into the NpHtrII transmembrane helices [79] and changes of disulfide bond formation rates between engineered cysteines [81]. In both studies the data indicate that the second transmembrane segment [79], and interaction of the receptor’s E-F loop with the membrane proximal domain of the transducer has been implicated in signal transfer [82, 83]. In summary, the evidence is compelling that the conformational changes in transport and sensory rhodopsins in haloarchaea share essential features despite their differing functions. For study of the newfound microbial rhodopsins, an important question is whether the light-induced conformational change observed in BR and strongly implicated in the haloarchaeal sensory rhodopsin photocycles is a key conserved feature of their functional mechanisms.
8 Opsin-related Proteins without the Retinal-binding Site
Several other genes in the fungi N. crassa [84, 85]. It may be that the conformational switching properties of the archaeal rhodopsins have been preserved in these opsinrelated proteins, while the photoactive site has been lost and its function replaced by another input module such as a protein-protein interaction domain. It is striking that opsin-related proteins lacking the retinal-binding lysine have been observed so far only in fungi and not in any of the many other classes of organisms containing type 1 rhodopsins.
9 Conclusions
Type 1 rhodopsins are present in all three domains of life, and therefore progenitors of these proteins may have existed in early evolution before the divergence of archaea, eubacteria, and eukaryotes. If so, light-driven ion transport as a means of obtaining cellular energy may well have predated the development of photosynthesis, and represent one of the earliest means by which organisms tapped solar radiation as an energy source. As more rhodopsins are identified, their evolution and dissemination into such a wide variety of organisms, whether by
19
20 Microbial Rhodopsins
divergence from a common progenitor or horizontal gene transfer, should become clearer. There is a much work to be done to understand the physiological roles and molecular mechanisms of the rhodopsins so far identified in the various microbial species. It seems likely that we will see even more members of this family as genomic sequencing becomes ever more rapid. The vast majority of microbial species have never been cultivated in a laboratory. Therefore the use of microbial rhodopsin probes in environmental genomics, which expands the search for homologous genes to uncultivated organisms, is likely to be especially fruitful.
Acknowledgements
We thank Elena Spudich for stimulating discussions. The work by the authors referred to in this review was supported by grants from the National Institutes of Health, National Science Foundation, Human Frontiers Science Program, and the Robert A. Welch Foundation.
References 1 OESTERHELT, D., and STOECKENIUS, W., (1973) Functions of a new photoreceptor membrane. Proc Natl Acad Sci USA 70, 2853–2857. 2 MATSUNO-YAGI, A., and MUKOHATA, Y., (1977) Biochem Biophys Res Commun 78, 237–243. 3 SCHOBERT, B., and LANYI, J. K., (1982) J Biol Chem 257, 10306–10313. 4 BOGOMOLNI, R. A., and SPUDICH, J. L., (1982) Proc Natl Acad Sci USA 79, 6250–6254. 5 TAKAHASHI, T., TOMIOKA, H., KAMO, N., and KOBATAKE, Y., (1985) FEMS Microbiol Lett 28, 161–164. 6 ESSEN, L., SIEGERT, R., LEHMANN, W. D., and OESTERHELT, D., (1998) Proc Natl Acad Sci USA 95, 11673–11678. 7 GRIGORIEFF, N., CESKA, T. A., DOWNING, K. H., BALDWIN, J. M., and HENDERSON, R., (1996) J Mol Biol 259, 393–421. 8 LUECKE, H., SCHOBERT, B., RICHTER, H. T., CARTAILLER, J. P., and LANYI, J. K., (1999) J Mol Biol 291, 899–911. 9 KOLBE, M., BESIR, H., ESSEN, L. O., and OESTERHELT, D., (2000) Science 288, 1390–1396. 10 GORDELIY, V. I., LABAHN, J., MOUKHAMETZIANOV, R., EFREMOV, R., GRANZIN, J., SCHLESINGER, R., BULDT, G.,
11
12
13
14
15 16
17
SAVOPOL, T., SCHEIDIG, A. J., KLARE, J. P., and ENGELHARD, M., (2002) Nature 419, 484–487. KUNJI, E. R., SPUDICH, E. N., GRISSHAMMER, R., HENDERSON, R., and SPUDICH, J. L., (2001) J Mol Biol 308, 279–293. LUECKE, H., SCHOBERT, B., LANYI, J. K., SPUDICH, E. N., and SPUDICH, J. L., (2001) Science 293, 1499–1503. PALCZEWSKI, K., KUMASAKA, T., HORI, T., BEHNKE, C. A., MOTOSHIMA, H., FOX, B. A., LE TRONG, I., TELLER, D. C., OKADA, T., STENKAMP, R. E., YAMAMOTO, M., and MIYANO, M., (2000) Science 289, 739–745. BIESZKE, J. A., BRAUN, E. L., BEAN, L. E., KANG, S., NATVIG, D. O., and BORKOVICH, K. A., (1999) Proc Natl Acad Sci USA 96, 8034–8039. JUNG, K. H., TRIVEDI, V. D., and SPUDICH, J. L., (2003) Mol Microbiol 47, 1513–1522. SINESHCHEKOV, O. A., JUNG, K. H., and SPUDICH, J. L., (2002) Proc Natl Acad Sci USA 99, 8689–8694. B´EJA` , O., ARAVIND, L., KOONIN, E. V., SUZUKI, M. T., HADD, A., NGUYEN, L. P., JOVANOVICH, S. B., GATES, C. M., FELDMAN, R. A., SPUDICH, J. L., SPUDICH, E. N., and DELONG, E. F., (2000) Science 289, 1902–1906.
References 18 B´EJA` , O., SPUDICH, E. N., SPUDICH, J. L., LECLERC, M., and DELONG, E. F., (2001) Nature 411, 786–789. 19 de la TORRE, J. R., CHRISTIANSON, L. M., BEJA, O., SUZUKI, M. T., KARL, D. M., HEIDELBERG, J., and DELONG, E. F., (2003) Proc Natl Acad Sci USA 100, 12830–12835. 20 MAN, D., WANG, W., SABEHI, G., ARAVIND, L., POST, A. F., MASSANA, R., SPUDICH, E. N., SPUDICH, J. L., and B´EJA` , O., (2003) EMBO J 22, 1725–1731. 21 MAN-AHARONOVICH, D., SABEHI, G., SINESHCHEKOV, O. A., SPUDICH, E. N., SPUDICH, J. L., and B´EJA` , O., (2004) Photochem Photobiol Sci 3, 459–462. 22 SABEHI, G., MASSANA, R., BIELAWSKI, J. P., ROSENBERG, M., DELONG, E. F., and BEJA, O., (2003) Environ Microbiol 5, 842–849. 23 VENTER, J. C., REMINGTON, K., HEIDELBERG, J. F., HALPERN, A. L., RUSCH, D., EISEN, J. A., WU, D., PAULSEN, I., NELSON, K. E., NELSON, W., FOUTS, D. E., LEVY, S., KNAP, A. H., LOMAS, M. W., NEALSON, K., WHITE, O., PETERSON, J., HOFFMAN, J., PARSONS, R., BARDEN-TILLSON, H., PFANNKOCH, C., ROGERS, Y. H., and SMITH, H. O., (2004) Science 306, 66–74. 24 LANYI, J. K., and LUECKE, H., (2001) Curr Opin Struct Biol 11, 415–419. 25 VARO, G., (2000) Biochim Biophys Acta 1460, 220–229. 26 HOFF, W. D., JUNG, K. H., and SPUDICH, J. L., (1997) Annu Rev Biophys Biomol Struct 26, 223–258. 27 SPUDICH, J. L., YANG, C. S., JUNG, K. H., and SPUDICH, E. N., (2000) Annu Rev Cell Dev Biol 16, 365–392. 28 OESTERHELT, D., (1998) Curr Opin Struct Biol 8, 489–500. 29 SPUDICH, J. L., and BOGOMOLNI, R. A., (1984) Nature 312, 509–513. 30 TAKAHASHI, T., YAN, B., MAZUR, P., DERGUINI, F., NAKANISHI, K., and SPUDICH, J. L., (1990) Biochemistry 29, 8467–8474. 31 DIOUMAEV, A. K., BROWN, L. S., SHIH, J., SPUDICH, E. N., SPUDICH, J. L., and LANYI, J. K., (2002) Biochemistry 41, 5348–5358. 32 WANG, W. W., SINESHCHEKOV, O. A., SPUDICH, E. N., and SPUDICH, J. L., (2003) J Biol Chem 278, 33985–33991. 33 BIESZKE, J. A., SPUDICH, E. N., SCOTT, K. L., BORKOVICH, K. A., and SPUDICH, J. L., (1999) Biochemistry 38, 14138–14145.
34 SPUDICH, J. L., ZACKS, D. N., and BOGOMOLNI, R. A., (1995) Isr J Photochem 35, 495–513. 35 DIOUMAEV, A. K., WANG, J. M., BALINT, Z., VARO, G., and LANYI, J. K., (2003) Biochemistry 42, 6582–6587. 36 FRIEDRICH, T., GEIBEL, S., KALMBACH, R., CHIZHOV, I., ATAKA, K., HEBERLE, J., ENGELHARD, M., and BAMBERG, E., (2002) J Mol Biol 321, 821–838. 37 KREBS, R. A., ALEXIEV, U., PARTHA, R., DEVITA, A. M., and BRAIMAN, M. S., (2002) BMC Physiol 2, 5. 38 LAKATOS, M., LANYI, J. K., SZAKACS, J., and VARO, G., (2003) Biophys J 84, 3252–3256. 39 BIRGE, R. R., (1990) Annu Rev Phys Chem 41, 683–733. 40 OTTOLENGHI, M., and SHEVES, M., (1989) J Membr Biol 112, 193–212. 41 YAN, B., SPUDICH, J. L., MAZUR, P., VUNNAM, S., DERGUINI, F., and NAKANISHI, K., (1995) J Biol Chem 270, 29668–29670. 42 JERLOV, N. G., (1976) in Marin Optics. Amsterdam, Oxford, New York, Elsevier. 43 JUNG, K. H., SPUDICH, E. N., TRIVEDI, V. D., and SPUDICH, J. L., (2001) J Bacteriology 183, 6365–6371. 44 ZHANG, X. N., ZHU, J., and SPUDICH, J. L., (1999) Proc Natl Acad Sci USA 96, 857–862. 45 SPUDICH, J. L., (2002) Nature Struct. Biol 9, 797–799. 46 VOGELEY, L., SINESHCHEKOV, O. A., TRIVEDI, V. D., SASAKI, J., SPUDICH, J. L. and LUECKE, H., (2004) Anabaena Sensory Rhodopsin: A Photochromic Color Sensor at 2.0 Å. Science 306, 1390–1393. 47 MULLINEAUX, C. W., (2001) Mol Microbiol 41, 965–971. 48 GROSSMAN, A. R., BHAYA, D., and HE, Q., (2001) J Biol Chem 276, 11449–11452. 49 KEHOE, D. M., and GROSSMAN, A. R., (1994) Semin Cell Biol 5, 303–313. 50 TOOLE, C. M., PLANK, T. L., GROSSMAN, A. R., and ANDERSON, L. K., (1998) Mol Microbiol 30, 475–486. 51 PECK, R. F., ECHAVARRI-ERASUN, C., JOHNSON, E. A., NG, W. V., KENNEDY, S. P., HOOD, L., DAS-SARMA, S., and KREBS, M. P., (2001) J Biol Chem 276, 5739–5744. 52 NAKAMURA, Y., KANEKO, T., SATO, S., MIMURO, M., MIYASHITA, H., TSUCHIYA, T., SASAMOTO, S., WATANABE, A.,
21
22 Microbial Rhodopsins
53
54 55
56 57
58
59
60
61
62
63
64
65
66
KAWASHIMA, K., KISHIDA, Y., KIYOKAWA, C., KOHARA, M., MATSUMOTO, M., MATSUNO, A., NAKAZAKI, N., SHIMPO, S., TAKEUCHI, C., YAMADA, M., and TABATA, S., (2003) DNA Res 10, 137–145. BROWN, L. S., DIOUMAEV, A. K., LANYI, J. K., SPUDICH, E. N., and SPUDICH, J. L., (2001) J Biol Chem 276, 32495–32505. SARANAK, J., and FOSTER, K. W., (1997) Nature 387, 465–466. FOSTER, K. W., SARANAK, J., PATEL, N., ZARILLI, G., OKABE, M., KLINE, T., and NAKANISHI, K., (1984) Nature 311, 756–759. HEGEMANN, P., GARTNER, W., and UHL, R., (1991) Biophys J 60, 1477–1489. LAWSON, M. A., ZACKS, D. N., DERGUINI, F., NAKANISHI, K., and SPUDICH, J. L., (1991) Biophys J 60, 1490–1498. SAKAMOTO, M., WADA, A., AKAI, A., ITO, M., GOSHIMA, T., and TAKAHASHI, T., (1998) FEBS Lett 434, 335–338. SINESHCHEKOV, O. A., GOVORUNOVA, E. G., DER, A., KESZTHELYI, L., and NULTSCH, W., (1994) Biophys J 66, 2073–2084. TAKAHASHI, T., YOSHIHARA, K., WATANABE, M., KUBOTA, M., JOHNSON, R., DERGUINI, F., and NAKANISHI, K., (1991) Biochem Biophys Res Commun 178, 1273–1279. GOVORUNOVA, E. G., JUNG, K. H., SINESHCHEKOV, O. A., and SPUDICH, J. L., (2004) Biophys J 86, 2342–2349. NAGEL, G., OLLIG, D., FUHRMANN, M., KATERIYA, S., MUSTI, A. M., BAMBERG, E., and HEGEMANN, P., (2002) Science 296, 2395–2398. NAGEL, G., SZELLAS, T., HUHN, W., KATERIYA, S., ADEISHVILI, N., BERTHOLD, P., OLLIG, D., HEGEMANN, P., and BAMBERG, E., (2003) Proc Natl Acad Sci USA 100, 13940–13945. DEININGER, W., KROGER, P., HEGEMANN, U., LOTTSPEICH, F., and HEGEMANN, P., (1995) EMBO J 14, 5849–5858. EBNET, E., FISCHER, M., DEININGER, W., and HEGEMANN, P., (1999) Plant Cell 11, 1473–1484. FUHRMANN, M., STAHLBERG, A., GOVORUNOVA, E., RANK, S., and HEGEMANN, P., (2001) J Cell Sci 114, 3857–3863.
67 TOMIOKA, H., and SASABE, H., (1995) Biochim Biophys Acta 1234, 261–267. 68 KAMO, N., SHIMONO, K., IWAMOTO, M., and SUDO, Y., (2001) Biochemistry (Mosc) 66, 1277–1282. 69 SHIMONO, K., IWAMOTO, M., SUMI, M., and KAMO, N., (2000) Photochem Photobiol 72, 141–145. 70 REN, L., MARTIN, C. H., WISE, K. J., GILLESPIE, N. B., LUECKE, H., LANYI, J. K., SPUDICH, J. L., and BIRGE, R. R., (2001) Biochemistry 40, 13906–13914. 71 BOGOMOLNI, R. A., STOECKENIUS, W., SZUNDI, I., PEROZO, E., OLSON, K. D., and SPUDICH, J. L., (1994) Proc Natl Acad Sci USA 91, 10188–10192. 72 OLSON, K. D., and SPUDICH, J. L., (1993) Biophys J 65, 2578–2585. 73 SPUDICH, E. N., and SPUDICH, J. L., (1993) J Biol Chem 268, 16095–16097. 74 SPUDICH, J. L., (1994) Cell 79, 747–750. 75 SCHMIES, G., ENGELHARD, M., WOOD, P. G., NAGEL, G., and BAMBERG, E., (2001) Proc Natl Acad Sci USA 98, 1555–1559. 76 SUDO, Y., IWAMOTO, M., SHIMONO, K., and KAMO, N., (2001) Photochem Photobiol 74, 489–494. 77 SUBRAMANIAM, S., and HENDERSON, R., (2000) Nature 406, 653–657. 78 SPUDICH, E. N., ZHANG, W., ALAM, M., and SPUDICH, J. L., (1997) Proc Natl Acad Sci U S A 94, 4960–4965. 79 WEGENER, A. A., CHIZHOV, I., ENGELHARD, M., and STEINHOFF, H. J., (2000) J Mol Biol 301, 881–891. 80 JUNG, K. H. and SPUDICH, J. L., (1998) J. Bacteriology 180, 2033–2042. 81 YANG, C. S., and SPUDICH, J. L., (2001) Biochemistry 40, 14207–14214. 82 CHEN, X. and SPUDICH, J. L., (2004) J Biol Chem 279, 42964–42969. 83 YANG, C. S., SINESHCHEKOV, O. A., SPUDICH, E. N. and SPUDICH, J. L., (2004) J Biol Chem 279: 42970–42976. 84 PIPER, P. W., ORTIZ-CALDERON, C., HOLYOAK, C., COOTE, P., and COLE, M., (1997) Cell Stress Chaperones 2, 12–24. 85 ZHAI, Y., HEIJNE, W. H., SMITH, D. W., and SAIER, M. H., Jr., (2001) Biochim Biophys Acta 1511, 206–223.
1
Light-activated Proteins Sandra Loudwig and Hagan Bayley University of Oxford, Oxford, United Kingdom
Originally published in: Dynamic Studies in Biology. Edited by Maurice Goeldner and Richard S. Givens. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30783-8
1 Introduction
While there are a considerable number of publications describing small caged molecules for applications in biology (for reviews, see [1, 2]), there are still relatively few papers describing caged proteins (for earlier reviews, see [3–6]). This is perhaps because the preparation of such molecules requires an interdisciplinary approach and often presents considerable technical difficulties. Nevertheless, the potential applications of caged proteins are likely to be highly rewarding and an expansion of work in this area is certainly warranted. We define a caged protein as an inactive protein that can be activated by light (Fig. 1). The definition is best kept loose, and here we make several asides to discuss, for example, reversible activation and light-triggered inactivation. For many applications, there is no alternative to the use of a caged protein. In other cases, the use of a caged small molecule might be considered instead. By comparison with small molecules, the advantages of caged proteins include the fact that the activity in question is precisely pinpointed and that in cellular experiments very low concentrations of a reagent can be used, avoiding, for example, problems with photolysis byproducts. Proteins are usually caged by targeted chemical modification. We aim to point out the issues with which the experimentalist should be familiar and to illustrate them with examples; however, the examples do not in themselves offer a comprehensive review of the literature, which nowadays can be better obtained from databases.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Light-activated Proteins
Fig. 1 Typical procedure for caging and uncaging a protein. A key residue (e.g. one in the active site of an enzyme) is derivatized with a caging reagent resulting in inactivation of the protein. The protein can be reactivated by photolysis of the caging group, which is released in an altered form.
2 The Properties of Caged Proteins 2.1 Extents of Caging and Photoactivation
Light-activated proteins are used to obtain control over the timing and location of activation of a biological process, and when necessary the extent of activation. If there are other biomolecules present, they should not be damaged. These requirements raise several important issues, the solutions to which will depend on the experimental system. While it is desirable to efficiently inactivate a protein with a caging reagent, and subsequently recover full activity upon irradiation, this cannot always be done. Therefore, the importance of the extents of caging and uncaging must be considered further, and will in turn affect the choice of caging reagent. For example, in the case of a caged signal transduction protein for injection into target cells, it is highly desirable to begin with a completely inactivated protein and even limited photoregeneration of activity within the cell (e.g. 5%) can be useful. In contrast, in many biophysical experiments, such as those involving time-resolved X-ray diffraction, circular dichroism or IR spectroscopy, a high recovery of activity is usually required (e.g. 95% or above), but complete inactivation is not usually as crucial (e.g. 5% of unmodified protein can often be tolerated). 2.2 The Rate of Uncaging: Continuous Irradiation
The rate of photorelease is often critical in biophysical experiments and should be fast in comparison to the timecourse of the event under observation. The half-time for activation of a caged reagent under continuous irradiation in dilute solution is given by [3]: t1/2 ≈ 0.3/φ p I0 ε
2 The Properties of Caged Proteins
Fig. 2 Illustration of t1/2 and J * for photolysis. During continuous photolysis at relatively low light intensity, the half-life of the caged protein is t1/2 , which is a function of the amount of light absorbed per unit time and the efficiency of conversion of the excited state to the uncaged product. Under
these conditions, the rate at which the excited states (and intermediates when they exist) are converted to product, represented by J *, is not rate determining. After a short intense light flash, the half-time for formation of the uncaged product is given by J *.
where P is the product quantum yield, I0 is the light intensity and ε is the extinction coefficient. t1/2 , ε and I0 are usually measured experimentally to find P . This expression encompasses the rate at which a collection of molecules absorbs photons and the efficiency at which the excited states are converted to uncaged product. t1/2 should be distinguished from the usually more rapid rate of breakdown of the excited states and intermediates (τ *), which will be discussed later (Fig. 2). The relationship given above is valid for a single wavelength, as would be the case for laser irradiation, and it is a reasonable approximation for a narrow band of wavelengths. Where a wide band is used, e.g. as might be the case from an arc lamp, the variation of ε and I0 with wavelength should be taken into account when determining P for the prevailing conditions. This can be done conveniently by summing I0 ε over the wavelength band by using, say, 10-nm intervals. To determine I0 for each interval from the total lamp output (determined by actinometry), both the emission spectrum of the lamp and the filter characteristics must be considered [7]. Clearly, in most circumstances, I0 can be increased far beyond the levels required to achieve photolysis in the desired time. The question is: what happens to other materials in the experimental system at high light intensities? The simple answer is that the consequences are usually deleterious. Most biological molecules are damaged by light, and the damage is exacerbated at short wavelengths and in the presence of oxygen. In a complex system, a wide variety of damage can occur, including crosslinking of nucleic acids and proteins. However, even in a simple biophysical experiment, involving a single protein, inter- and intramolecular crosslinks, polypeptide chain cleavage, and damage to susceptible residues such as tryptophan (Trp) can cause significant problems [8, 9]. At short photolysis times, a measure of the extent of uncaging relative to damage is given by E ud = t1/2d /t1/2u = φu εu /φd εd
3
4 Light-activated Proteins
where the subscripts ‘u’ and ‘d’ indicate uncaging and damage, respectively. At longer photolysis times, as uncaging nears completion, damage will “catch up” with uncaging: a warning to irradiate for no longer than necessary. The larger the value of E ud , the better and the experience of many workers in the field of caged reagents suggests that u ε u should be > 100 M−1 cm−1 , at least in the near-UV (350–400 nm), and even better when > 1000 M−1 cm−1 . In addition, photolysis should be at the longest wavelengths possible (proteins and nucleic acids without prosthetic groups absorb little above 300 nm) and preferably in the absence of oxygen. Values of u and ε u for many caging groups that have or could be used with proteins have been compiled here (Tab. 1). Because these values have not been obtained at identical wavelengths, it can be difficult to compare u ε u values, but we have attempted to quote values at wavelengths that would be used experimentally. The value of u ε u for a class of caging reagents can be manipulated by chemical substitution (Tab. 1). For example, in the case of 2-nitrobenzyl attached at the N atom of carbamoyl choline, the replacement of a methyl group with a carboxylate at the benzylic position increases from 0.25 to 0.8, resulting in an increase in u ε u (312 nm) from ∼300 to ∼1000 [10]. In another case, with a (2-nitrophenyl)ethyl group on the phenolic O atom of phenylephrine, the introduction of methoxy groups in positions 4 and 5 of the aromatic ring increases ε312 from 1300 to 4500 M−1 cm−1 (λmax is increased from 260 to 340 nm), which with a quantum yield = 0.13 gives u ε u (312 nm) ∼600 [11]. By comparison, the un-substituted derivative with = 0.05 gives u ε u (312 nm) ∼65. However, to the chagrin of the investigator, an improvement in ε u often leads to a reduction in u . In one such case, cysteine (Cys)- or thiophosphate-containing peptides were caged on the S atoms with 2nitrobenzyl or 4,5-dimethoxy-2-nitrobenzyl and little was gained from substitution of the ring, which serves to increase ε u [12]. Clearly, we have a great deal to learn about the mechanism(s) of photolysis of 2-nitrobenzyl reagents, which continue to be investigated [2, 13, 14]. A detailed study of substituted coumarins has been made by Furuta et al., who found u ε u (365 nm) values in the range of 100-630, with the highest value assigned to a tribrominated derivative [15]. By comparison, the 7-methoxycoumarins have u ε u (333 nm) values approaching 3000 [16]. Following their successful use of 4-hydroxyphenacyl (HPB) as a caging group [u ε u (300 nm) = 4400] [17], Givens et al. attempted to push the absorbance towards the visible by substitution of the aromatic ring with methoxy groups. The 3-methoxy and 3,5-di-methoxy derivatives have absorption tails above 400 nm, but the values are reduced to 0.03–0.04, which is accompanied by a change in the photolysis mechanism. In contrast to the underivatized 4-hydroxyphenacyl group, the formation of a phenylacetic acid from an initially released spiroketone (Fig. 3) is a minor pathway in the case of the methoxy reagents [18]. Additional examples can be found in Tab. 1. Further, while the determination of ε is a simple matter, the reader is cautioned to note that not all values of that make their way into tables in the literature (including Tab. 1) have been determined with equal rigor: some have been obtained by careful actino-metry, while others
2 The Properties of Caged Proteins
5
Table 1 Caging groups for proteins Caging group
Caging group characteristics
Caged functionality and/or molecule; caging reagent
Comment s
Refs.
25, 33, 2-Nitrobenzyl derivatives, initially described for the protection 34, 141 of carboxylates [141], are the most widely used caging groups. They have been used to cage thiols, thiophosphates, phosphates, carboxylates, amino groups and alcohols. Various modifications at the benzylic position (see NB, NPE, CNB and NPT) or on the aromatic ring (see DMNB, DMNPT, NVOC and MeNPoc) have been made in attempts to improve photochemical properties (see this table). The main drawback of the 2-nitrobenzyl group is the generation of a photolytic byproduct that reacts with thiol groups [25, 33, 34]. Further, when substituted in the benzylic position, the 2-nitrobenzyl group bears an asymmetric center. (312; pH 6) = 0.84; (312; pH 8.5) = 0.14 (312; pH 6) (312; pH8.5)
1090; 180
RCH2S-NB Caged Cys in cAMP-dependent protein kinase (PKA).
NBB 312 = 1300
[3]
(312; pH 5.8) = 0.62; (312; pH 7.2) = 0.14 (312; pH 5.8) (312; pH 7.2)
[12]
810; 180
RCH2S-NB In the heptapeptide C-kemptide, the serine of kemptide is replaced with Cys. This residue is caged with NBB.
A PKA mutant with a single 65 Cys is modified with NBB. Caging with different 2-nitrobenzyl derivatives (BNPA, DMNBB and NBB) and photolysis at two pH values are compared (see this table). Modification with unsubstituted NBB gives the highest extent of inactivation and subsequent photoregeneration of activity. Caging with BNPA and DMNBB 3, 12 (see this table) and photolysis at two pH values are compared. Photolysis of C-kemptide modified with NBB gives the best quantum yield and a slightly higher yield of uncaged product: 70% at pH 5.8 against 62 and 67% for BNPA and DMNBB, respectively.
(Continued)
6 Light-activated Proteins Table 1 (continued) Caging group
Caging group characteristics
312 = 1300
Caged functionality and/or molecule; caging reagent
[3]
(312; pH 5.8) = 0.23; (312; pH 7.2) = 0.04 (312; pH 5.8) (312; pH 7.2)
300; 50
[12]
Caged thiophosphorylated serine in the heptapeptide kemptide.
(312; pH 5.8) = 0.37; (312; pH 7.3) = 0.25 (312; pH 5.8) (312; pH 7.3)
480; 325 Caged thiophosphotyrosine in an 11 amino acid peptide.
max = 272
nm
272 = 6200 (300–350; pH 7)
RPhO-NB Caged phenylephrine [11].
= 0.05 312 65 * ND [11]
ROC(O)NH-NB Caged carbamyl(300–350; pH 8) = 0.25 choline. * (nitronate decay) = 1.7 ms (pH 7) 312 325 (pH 8) max = 262
262 = 5200
nm
Comment s
Refs.
Caging with BNPA and DMNBB 3, 12 (see this table) and photolysis at two pH values are compared. Photolysis of thiophosphoserine modified with unsubstituted NBB gives the best quantum yield and the highest yield of uncaged product: 70% at pH 5.8 against 55% for DMNBB. At pH 4, thiophosphate can be selectively modified over Cys with NBB. A thiophosphotyrosine is modi- 31 fied with NBB. Caging with NBB and 4-hydroxyphenacyl bromide (HPB), and photolysis at two pH values are compared. Uncaging of HP gives the best quantum yields and a slightly better yield of uncaged product: 50–70% (at both pH 5.8 and 7.3) against 50– 60% for NB (see below). 11, 142, Photolysis of different 2-nitro143 benzyl derivatives (NB, NPE, DMNB and CNB) are compared (see this table) [11]. Proteins have not been caged directly on hydroxyls, but catalytic serines, with enhanced nucleophilicity, might be selectively modified with NBBlike compounds. Also, caged tyrosines have been incorporated into proteins by unnatural amino acid mutagenesis [142, 143]. Proteins have not been caged as 144 carbamates in this manner, but the work provides a useful study of the pH dependence of breakdown of the nitronate intermediate.
2 The Properties of Caged Proteins
7
Table 1 (continued) Caging group
Caging group characteristics
Caged functionality and/or molecule; caging reagent
11 Proteins have not been caged directly in this way, but the work provides a useful comparison of various 2-nitrobenzyl derivatives. and hence values for NPE are about twice those for NB. The breakdown of the nitronate intermediate (Fig. 5.1.3) is slow in this case.
ROC(O)NH-NPE Caged carbamyl(300–350; pH 8) = 0.25 choline. * (nitronate decay) = 67 s (pH 7) 312 325 (pH 8)
Although the and values for 144 carbamylcholine caged with NB and NPE are the same, the decay rate of the nitronate intermediate with NPE is much faster.
RCH2S-CNB In the heptapep270 For Cys in C-kemp- tide C-kemptide, tide [12] the serine of kemptide is replaced with a Cys. This residue is caged with BNPA. Alternatively, the serine in kemptide is thiophosphorylated and then modified by BNPA [3, 12].
BNPA has been developed as a 3, 12, water-soluble reagent for the 47, 65 modification of nucleophiles in peptides and proteins. While 95% modification of the Cys in C-kemptide is achieved, only 10% modification of a thiophosphorylated serine in kemptide occurs under the same conditions. The value for Cys caged with CNB is much lower than for Cys caged with NB [3, 12]. Further, irradiation of CNB-Cys in PKA leads to 25–30% recovery of activity, while 80–100% recovery is found with PKA caged at Cys with NB [65]. BNPA has also been used for caging the pore-forming protein hemolysin [47].
nm
max = 262
nm
262 = 5200
(312; pH 5.8) = 0.21 312
CNB
Refs.
RPhO-NPE Caged phenyl(300–350; pH 7) = 0.11 ephrine. * (nitronate decay) = 300 ms (pH 7) 312 140 max = 272
272 = 6200
NPE
Comment s
BNPA max = 270
nm
RPhO-CNB Caged phenyl(300–350; pH 7) = 0.28 ephrine. * (nitronate decay) = 0.35 ms 312 365 266 = 6000
The quantum yield for CNB is 11 higher than for NB, NPE and DMNB. The lifetime of the nitronate is short compared to NPE and DMNB in the same context ( * not given for NB).
(Continued)
8 Light-activated Proteins Table 1 (continued) Caging group
Caging group characteristics
max = 266
nm
266 = 5200 (300–350; pH 7) = 0.8 * (nitronate decay) = 40 s 312 1040 [144]
Caged functionality and/or molecule; caging reagent
Comment s
Refs.
ROC(O)NH-CNB Caged carbamylcholine [144].
is remarkably high and is thus favorable [144]. Also the rate of decay of the nitronate intermediate is faster than for NB and slightly faster than for NPE in the same context [10, 144].
10, 144
RO-NPT and RO-DMNPT Caged alcohols: * = 2.2 and 50 s values are for (biexponential rate for nitronate decay) NPT-choline and DMNPT-arseno312 910 choline. R = OMe: (312; pH 7) = 0.43 312 1935 R = H:
(312; pH 7) = 0.7
R = H, NPT R = OMe, DMNPT
312 = 4500
[3]
(312; pH 5.8) = 0.15 312
675 [12]
RCH2S-DMNB The Cys in C-kemptide is caged with DMNBB [12].
DMNB
DMNBB
(312; pH 5.8) = 0.06 312 = 270
Caged thiophosphorylated serine in kemptide.
145 for NPT is very high. for DMNPT can be as high as 0.62, e.g. for 4-DMNPT -tolylgalactoside. The value for DMNPT is favorable. Proteins have not yet been caged in this manner.
is only slightly lower than for 3, 12, BNPA, but much lower than for 65 NBB. Nevertheless, since the value at 312 nm is higher, is about double that for the NB peptide. The yield of deprotection of 67% at pH 5.8 is similar to that of NB (70%) [12]. However, when DMNBB and NBB are used to cage a Cys in PKA, the photoregeneration of activity with NBPKA is 10–15 times greater than with DMNB-PKA [65]. 3, 12 DMNBB is used to cage thiophosphorylserine in kemptide. for DMNB is lower than that for NB, but (312; pH 5.8) values are about the same (300 for NB). values for NB and DMNB-caged thiophosphoserine are lower than values for caging a Cys residue.
2 The Properties of Caged Proteins
9
Table 1 (continued) Caging group
Caging group characteristics
max = 330
Comment s
Caged functionality and/or molecule; caging reagent
The value for DMNB-phenyl- 11 ephrine is similar to that for NPE-phenylephrine, but is higher. The lifetime of the nitronate lies between the values for CNB and NPE.
nm
RPhO-DMNB Caged phenyl(300–350; pH 7) = 0.13 ephrine. * (nitronate decay) = 2.8 ms 312 585 330 = 5000
max = 350
nm
DMNB is first used with caged 146 phosphate esters. Compared to NB, the absorption maximum is shifted to 350 nm and cyclic nucleotide photorelease is 200 times faster.
O Caged phosphates (cAMP and cGMP).
NH ||
/
likely to be similar to DMNB NVOC (365; pH 7.2) = 0.023 [141]
N H
N H
R
Caged arginine in a peptide inhibitor of protein kinase [141].
NVOC
Refs.
147 In the case of caged Arg, the low value of is compensated by a high , as the NVOC max is around 350 nm [141]. 6-Nitroveratryloxycarbonyl chloride (NVOC-Cl) is used to randomly modify Lys residues in proteins (see Tab. 5.1.2).
NVOC-Cl likely to be similar to DMNB
1-(4,5-Dimethoxy-2-nitrophenyl)- 66 2-nitroethene is used to modify one Cys residue per subunit in -galactosidase. Caged Cys in galactosidase.
(Continued)
10 Light-activated Proteins Table 1 (continued) Caging group
Caging group characteristics
Caged functionality and/or molecule; caging reagent
Comment s
likely to be similar to DMNB
MeNPoc-OR Photoprotection of 5 hydroxyl groups in deoxyribonucleotides.
148 1-(2-Nitro-4,5-methylenedioxyphenyl)-ethyl-1-oxycarbonyl chloride is used for the protection of deoxyribonucleoside hydroxyl groups during the synthesis of oligonucleotide arrays. Proteins have not yet been caged in this manner, but the oxycarbonyl chloride might be used as an alternative to NVOC-Cl for caging by random modification of Lys residues.
MeNPoc
MeNPoc-Cl
max = 262
nm
259 = 5500
(cyclohexanone ketal) max = 259 nm 259 = 5300 (glycol)
max = 259
Caged aldehydes and ketones.
nm
259 = 7298; 350 = 1289
NNMC
(380; pH 6.7) = 0.63 (nitroso formation) 380 = 510
Model compound.
NNMCC max
260 nm
355 = 400 (365; pH 7.2) = 0.3
[154] 355
NPPOC
120
NPPOC-OR Caged alcohol in thymidine [154].
Refs.
Aldehydes and ketones have 149– been caged as dioxolane rings 153 by using 2-nitrobenzyl derivatives. Ketones have been introduced into proteins by unnatural amino acid mutagenesis and it might be possible to cage them with diol reagents. Alternatively, caged ketones themselves could be introduced by unnatural amino acid mutagenesis. 3-Nitro-2-naphthalenemethanol 50 is irradiated as a model compound to obtain values. 3-Nitro-2-naphthyloxycarbonyl chloride (NNMCC) is then used for random caging of Lys residues in an immunoglobulin, despite the low water solubility of the compound. The NPPOC group has been used in the synthesis of oligonucleotides microarrays [155]. Proteins have not yet been caged in this manner.
154, 155
2 The Properties of Caged Proteins
11
Table 1 (continued) Caging group
MNI
Caging group characteristics
Caged functionality and/or molecule; caging reagent
Comment s
R = OMe: 246 nm max 246 = 17 500 (347; pH 7.2) = 0.085 [156] Two photon uncaging (730 nm), u = 0.06 GM for a caged carboxylate [158]
RCO-MNI Caged carboxylates.
156– Photo-accelerated solvolysis of 1-acyl-7-nitroindoline derivatives 158 is reported [156, 157]. is usually low, but is high, and photolysis occurs in high yield. Two-photon uncaging has been demonstrated [158]. Proteins have not yet been caged in this manner.
MNPC-OR Caged alcohols.
N-Methyl-N-(2-nitrophenyl) carbamoyl chloride is used to cage various alcohols [159] and the catalytic serine of butyrylcholinesterase (see Tab. 5.1.2) [53].
max = 262
nm
(262, pH 7.4) = 5000
is low
MNPCC
Benzoin
R1 = R2 = OMe: max, MeOH = 246 nm 246 = 14250; 282 = 3050 (366, benzene) = 0.64 (benzofuran formation) * 0.1 ns (quenching of triplet state) [161]
Refs.
53, 159
The rate of product formation 160– from the excited state is ex166 tremely fast and the quantum yield of benzofuran release is high [161]. Benzoins have been Photorelease of acetate used to cage phosphates [162], [161]; alcohols [163] and amines [164, benzoin-OC(O)OR, 165]. The chemically inert benzophotorelease of furan byproduct has a huge alcohols [163]; absorbance above 300 nm (for benzoin-OC(O)NHR, R = R = OMe: max, Ephotorelease of amines 1 2 tOH = 301 nm; [164, 165]. 301 = 27480) [161]. Proteins have not yet been caged as benzoins. Potential caging reagents might have solubility problems. The reagent presents a chiral center.
(Continued)
12 Light-activated Proteins Table 1 (continued) Caging group
Caging group characteristics
Caged functionality and/or molecule; caging reagent
Comment s
Refs.
17, 167, 4-Methoxyphenacyl derivatives were first reported by Sheehan 168 in an attempt to simplify the benzoin protecting group [167]. Givens developed the 4-hydroxy derivative to improve aqueous solubility [17]. Like benzoin, the 4-hydroxy-phenacyl group offers a very short * value. It does not contain or generate a chiral center and a predominant photolysis byproduct is usually released: 4-hydroxyphenylacetic acid. The byproduct, besides being chemically inert, does not absorb at the wavelengths of irradiation (> 300 nm) [168].
HP
(313; pH 7.2) = 0.085 310
670 [169]
RS-HP Caged thiol in 3 -thio2 -deoxythymidine phosphate [169].
HPB
(312; pH 5.8) = 0.65; (312; pH 7.3) = 0.56 310 5100 (pH 5.8); 310 4400 (pH 7.3)
Caged thiophosphotyrosine in a peptide.
4-Hydroxyphenacyl bromide is 31 used to cage a thiophosphorylated tyrosine in an unprotected peptide. Compared with the 2-nitrobenzyl (NB) group, HP exhibits better quantum yields. values are similar at pH 5.8 and 7.3. 4-Hydroxyphenacyl bromide is employed to cage a thiophosphorylated threonine in PKA (see Tab. 5.1.2).
(312; pH 7.3) = 0.21 310
The value is lower than for 54, 169 other 4-hydroxyphenacyl-caged functional groups and several photolysis byproducts are formed [169]. Nevertheless, 4-hydroxyphenacyl bromide (HPB) has proved successful for caging the catalytic Cys of a protein tyrosine phosphatase and about 70% activity is regained on uncaging [54].
1650
Caged thiophosphothreonine in PKA.
7
2 The Properties of Caged Proteins
13
Table 1 (continued) Caging group
Caging group characteristics
max = 286
nm
286 = 14600
(H2O/CH3CN) [168] (300; pH 7.3) = 0.37 300 4400 * = 1.2 ns (from quenching of triplet state) [17] values for caged phosphate max = 370
nm
304 = 11 730 (300–350; pH 7.2) =
HDMP
Caged functionality and/or molecule; caging reagent
0.03–0.04 * = 69 ns 304 410
O
Comment s
Refs.
The very short * value is notable.
17, 167, 168
Caged phosphate (ATP) [17]. RCOO-HP Caged carboxylates [167].
RCOO-HDMP Caged carboxylate in GABA and glutamate.
The introduction of methoxy 18 groups in positions 3 and 5 of the ring leads to a red-shift of the absorbance, but also to a drastic decrease of the quantum yield. The photolysis pathway appears to be altered and the release of the caging group as the 4-hydroxyphenylacetic acid is a minor pathway. The 3-monomethoxy reagent was also examined. Proteins have not yet been caged with methoxy substituted HP. Coumarin derivatives usually do 15 not display high quantum yields, but they do absorb strongly above 300 nm. They can also show an increase in fluorescence upon irradiation. A very interesting feature of several coumarins is susceptibility to two-photon irradiation with IR light, which enables spatial control of photolysis at the micometer level. A drawback is their low aqueous solubility.
(Continued)
14 Light-activated Proteins Table 1 (continued) Caging group
Caging group characteristics
max = 325
nm
(325, pH 7.2) = 11600; (365, pH 7.2) = 4100 (365, pH 7.2) = 0.025
HCM
365 = 103 Two-photon uncaging u = 1.07 GM (740 nm); u = 0.13 GM (800 nm) for caged acetate [15] max = 397
nm
(397, pH 7.2) = 15 900; (365, pH 7.2) = 9 700
Caged functionality and/or molecule; caging reagent
AcCOO-TBHCM Caged carboxylate (acetate).
365 = 631 u = 0.96
GM (740 nm); u = 3.1 GM (800 nm) values for caged acetate
max = 327
nm
327 = 13 300 (333; pH 7.2) = 0.21
MCM
333 = 2793 for caged cGMP [16]
max = 394
Caged phosphate (cAMP or cGMP).
The introduction of bromine 15 atoms on the coumarin ring of HCM increases both the quantum yield and the absorbance of caged carboxylates. The best results are achieved with three additional bromines, as shown. The two-photon cross section is high. Proteins have not yet been caged with the TBHCM group. Furuta et al. obtain 340 = 670 for caged cAMP in Ringer’s solution [171]. Schade et al. obtain (333, pH 7.2) = 1716 for caged cAMP and 333 = 2793 for caged cGMP in 20% methanol/80% HEPES buffer [16]. Proteins have not yet been caged with the MCM group.
16, 171
DMACM was first described for 172, caging cAMP and cGMP [172]. 173 max is at a favorable wavelength.
nm
394 = 17 200 (333; pH 7.2) = 0.28 for caged AMP [173]
DMACM
Refs.
For caged acetate, the value is 15, 162, low, but the strong absorbance 170 compensates for this [15]. Proteins have not yet been caged Caged phosphate [162] with the HCM group. Cys residues might be modified directly cAMP [170] with 4-halomethyl-7-hydroxyRCOO-HCM [15] RCNHCOO-HCM [15]. coumarins. Two-photon uncaging is demonstrated. The cross-sections are favorable.
(365, pH 7.2) = 0.065
TBHCM
Comment s
Caged phosphates (cAMP, cGMP, ATP, ADP or AMP).
2 The Properties of Caged Proteins
15
Table 1 (continued) Caging group
Caging group characteristics
max = 370
Caged functionality and/or molecule; caging reagent
nm
365 = 12 600–19 500 (365; pH 7.2) = 0.03–0.06 365 = 403–1026 u = 0.51–1.23 GM (740 nm)
BHC-diol
Caged aldehydes and ketones. max = 369
X
nm
369 = 2600
AQMOC-OR Caged alcohol (galactose).
174 Anthraquinon-2-ylmethoxycarbonyl is used to cage galactose. The value is low. However, AQMOC proves to be superior to various arylmethyl carbonates (7-methoxycoumarinyl-4-methyloxycarbonyl, pyren-1-ylmethoxycarbonyl and phenanthren-9ylmethoxycarbonyl) in terms of and .
DANP-OCOR Caged carboxylate.
For R = Me2N-, the quantum 175, 176 yield is very low, although the absorbance is high [175]. The ester is quite unstable in aqueous media (hydrolysis: t1/2 = 99 min at pH 7), but the stability is improved compared to the compound with R = OMe (hydrolysis: t1/2 = 6.1 min at pH 7.1) [176]. The caging group is released as the corresponding phenol.
750 GM (740 nm); u = 0.087 GM (780 nm) for BHQ-OAc u = 0.59
max = 327
nm
350 = 1500
AQMOC
350 = 0.1 (disappearance of starting material) in 50% aqueous THF 350 = 150
R = Me2N400 nm max 400 = 9077 (308–360; pH 7.4) =
DANP
0.03; (450; pH 7.4) = 0.002 * = 5 s (transient absorption change) [175]
41 BHC-diol is used to cage aldehydes and ketones. The quantum yields of uncaging are low, but the strong absorbance leads to high values. BHC-protected compounds can be uncaged by two-photon irradiation with a favorable cross section ( u).
BHQ-protected compounds can 40 be photolysed by two-photon irradiation. The cross-section at 740 nm is quite good. Proteins have not been caged with members of this class of reagents.
(365; pH 7.2) = 0.29
369 = 4100;
Refs.
RCOO-BHQ Caged carboxylates.
365
BHQ
Comment s
(Continued)
16 Light-activated Proteins Table 1 (continued) Caging group
Caging group characteristics
Caged functionality and/or molecule; caging reagent
Comment s
CINN-OR to modify various serine proteases
177 Cinnamate ester derivatives, CINN-OR have been used to modify the catalytic Ser of various proteases. The nature of the ester used depends on the substrate specificity of the protease. The photochemical data from the different proteases are compared below, together with a model compound. The absorbance and, especially, the quantum yield vary in the microenvironment of the protein.
CINN-OX
(trans to cis) = 0.13 max = 358
nm (366, pH 7.4) = 19 200 366 2500 [177]
CINN-OEt Model ester.
Refs.
177
CINN-O-chymotrypsin Caged catalytic serine. nm (366, pH 7.4) = 25 700 366 4370 [177] * = 199 s (cyclization after photoisomerization) [106]
106, 177
CINN-O-factor Xa Caged catalytic serine. nm (366, pH 7.4) = 26 000 366 5980 [177] * = 151 s (cyclization after photoisomerization) [106]
106, 177
(trans to cis) = 0.17
max = 372
(trans to cis) = 0.23
max = 358
(trans to cis) = 0.05
CINN-O-thrombin Caged catalytic serine. nm (366, pH 7.4) = 18 600 366 740 [177] * = 287 s (cyclization after photoisomerization) [106] max = 380
106, 177
2 The Properties of Caged Proteins
17
Table 1 (continued) Caging group
Caging group characteristics
Caged functionality and/or molecule; caging reagent
Comment s
Photoprotection of primary and secondary alcohols (X = O) or thiol (X = S).
2-Benzoylbenzoate derivatives 178, have been used to photoprotect 179 alcohols ROH (X=O) and 2-benzoylbenzenethioate derivatives to protect thiols RSH (X=S) [178]. Substituted benzyl 2-benzoylbenzoate derivatives have been used for the modification of the catalytic serine of various proteases, the -XR group varying according to the specificity of the enzyme (see Tab. 5.1.2) [179]. For caged thiols, the photodeprotection yield is 60%, together with 20% formation of disulphide. For ROH, the yield was variable, but generally good.
X = O: 2-benzoylbenzoate derivatives X = S: 2-benzoylbenzenethioate derivatives
Groups that have actually been used to cage proteins are emphasized, although promising potential alternatives are also given. Reagents that are available for directly caging proteins are shown with emphasis on those that can be used to derivatize nucleophiles, especially the thiolate side-chains of Cys residues. Key: λmax , maximum wavelength of absorption; ε xxx , extinction coefficient at XXX nm in M−1 cm−1 ; (xxx; pH Y) , quantum yield at XXX nm and pH Y. Unless otherwise stated, the product quantum yield (P ) is given, which is not always interchangeable with the quantum yield for uncaging (u ). In the text, we have often simply used for the purpose of general discussion; τ *, lifetime of photolytic intermediate, as defined in the text. δ u is the uncaging action cross-section, which is the product of the two-photon absorbance cross-section δ a and the uncaging quantum yield u2 . Ideally, δ u should exceed 0.1 Goeppert-Mayer (GM), where 1 GM is defined as 10−50 cm4 ·s photon−1 [15].
Fig. 3 Intermediates in the photolysis of two common caging groups: 2-nitrobenzyl (nitronate or aci-nitro intermediate) and 4-hydroxyphenacyl (spiroketone intermediate).
Refs.
18 Light-activated Proteins
have been obtained by comparison with other reagents for which the values of may in turn be of dubious origin. 2.3 The Rate of Uncaging: Breakdown of Intermediates
The argument about the half-time for photolysis (t1/2 ), given above, applies to cases in which the breakdown of the photogenerated intermediate is not rate limiting. In some applications, an intermediate might be long lived compared to the process under investigation. Such applications are unlikely to occur in cell biology, where the interest is usually in processes taking seconds to hours, but they can occur in biophysics, where for example there is renewed interest in examining protein folding on the submillisecond timescale using folding intermediates generated by flash photolysis [19–22]. Where they are available, values of the half-lives of caged reagents after the absorption of a photon, τ *, have been tabulated (Tab. 1). The reader must be cautioned to take a careful look at how the measurements were made. Very often the appearance and decay of an intermediate in a flash photolysis experiment is followed by absorption spectroscopy and this does not necessarily indicate the rate of product release (e.g. recent work on 2-nitrobenzyl chemistry [13, 14]). Indeed, the identity of the intermediate is not always clear. Rapid-scan FTIR has a higher information content and might prove to be a method of choice for determining photochemical reaction sequences in the future [13, 14, 23–26]. In some cases, secondary intermediates are expected. For example, unstable carbamic acids are formed upon photolysis of molecules caged with the nitrobenzyloxycarbonyl group. By using a single-molecule approach, we recently examined the lifetimes of both the (presumed) nitronate (aci-nitro compound), generated initially, and the derived carbamic acid at the surface of a protein [27]. The nitronate had a half-life of ∼2 s that was largely independent of pH, while the carbamic acid had a half-life of ∼3 ms at pH 5.5 and ∼5 s at pH 10. Clearly, these values are incompatible with, for example, rapid protein folding experiments. The rates of breakdown of photolytic intermediates can vary hugely even within the same class of reagents. For example, in the case of 2-nitrobenzyl reagents (Fig. 3), the nitronate of the 1-(3,4-dimethoxy-6-nitrophenyl)ethyl ester of glycine breaks down with a unimolecular rate constant of 1 s−1 [28], while the value for the nitronate of the corresponding α-carboxy-2-nitrobenzyl ester is 2 × 105 s−1 [29]. At present, the caging groups known to dissociate most rapidly after the absorption of a photon are the 4-hydroxyphenacyl group, τ * ∼1.2 ns [17, 30] and the benzoins [20], which cleave in < 1 ns. Photoinduced isomerizations can occur even more rapidly (see below). 2.4 The Effects of pH on Photolysis
Because there are often ionizable groups in caging reagents themselves or in photogenerated intermediates, both t1/2 and τ * can be affected by the prevailing
2 The Properties of Caged Proteins
pH. The two possibilities are not always clearly distinguished in the literature. Effects on t1/2 arise because of effects of pH on and ε. The t1/2 values are not meaningful in themselves because I0 (the intensity of the photon source) can vary enormously. The substituted 7-hydroxycoumarins of Furuta et al. provide an example of an effect of pH on t1/2 . Ionization of the coumarins led to a decrease in t1/2 (365 nm), which was largely due to an increase in εu rather than u [15]. In the case of 2-nitrobenzyl derivatives, pH can affect u [3, 31]. In an example of an effect on τ *, the rate of breakdown of the nitronate intermediate in the photolysis of 2-nitrobenzyl derivatives has been reported to be pH dependent over up to four orders of magnitude in certain cases, with faster breakdown at low pH [32]. In other cases, the pH dependence is weak [27, 29]. All these factors, u , ε u , τ * and the effects of pH, will depend on the leaving group, which for proteins will often be the thiolate of a cysteinyl (Cys) residue (see below). Some measurements have been made on proteins derivatized on Cys or thiophosphate (Tab. 1), but more work is needed in this area. At present, the effects of pH should be considered to be unpredictable and the necessary exploratory experiments must be carried out for each new reagent caged protein.
2.5 Problems with the Released Caging Group
The released caging group, which is usually liberated in an altered form, must not interfere with the experiment that is underway. Therefore, the released group should be chemically inert and should not take part in further photochemistry, either by simply absorbing light and screening the remaining caged protein or by forming reactive intermediates. These issues are expected to be less of a problem in the case of cell biology experiments, where caged proteins are most often used at low concentrations (micromolar or less). In these cases, the photoproducts will be dilute and therefore do minimal damage, even if they are chemically reactive, and they will absorb little light, even if they have high ε values. The most widely used caging reagent for proteins has been the 2-nitrobenzyl group (Tab. 1). Its main drawback is that it is released as a reactive nitrosoal-dehyde or nitrosoketone, which has been shown to be deleterious to proteins by reaction with Cys residues in particular [25, 33, 34]. The nitroso compounds can be scavenged by the addition of a substantial concentration (> 10 mM) of a thiol such as dithiothreitol (DTT) or 2-mercaptoethanol, prior to photolysis [34]. Where the caged protein is activated inside a living cell, the millimolar concentrations of intracellular thiols (glutathione, Cys, etc.) will serve to scavenge the reactive products. Semicarbazide at 10 µM has also been used to scavenge 2-nitrosobenzaldehyde, but its efficacy was not documented in detail [35]. Other photoproducts may be more benign. For example, 4-hydroxyphenacyl groups are released as a spiroketone, which rearranges to form 4hydroxyphenylacetic acid, which is water-soluble, chemically inert and blue-shifted in absorbance compared to the caging reagent (Fig. 3) [17, 30]. A detailed
19
20 Light-activated Proteins
comparison of this and other aspects of the nitrophenylethyl and hydroxyphenacyl caging groups has been made [25]. Several chromophores generate reactive oxygen species upon irradiation in the presence of oxygen. Therefore, where possible, uncaging should be carried out in an inert atmosphere, e.g. under nitrogen or argon. Alternatively, scavengers of the reactive species should be used. Thiols are good all round scavengers, but singlet oxygen is more effectively quenched with azide or carotenoids [36–38]. Many of the deleterious effects of short-wavelength light [8, 9] might be eliminated by twophoton uncaging at long wavelengths and this approach is under investigation [15, 39–41]. 2.6 Photochemistry Peculiar to Proteins
While they have been treated here in the context of caged proteins, many of the phenomena described above also occur with small caged reagents. Certain phenomena that might affect the efficiency of photolysis are, however, peculiar to proteins (Fig. 4), e.g. the excited state of the caging group might be quenched by neighboring tryptophan. Conversely, it might be possible to uncage proteins by energy transfer to the caging group from tryptophan residues. This process has been used previously to activate photoaffinity reagents [42]. Other aspects of the microenvironment associated with caged residues in proteins include the local dielectric constant and pH. The dielectric constant can alter both the ground state absorption properties of a caging group, the photo-physics after a photon is absorbed and the subsequent dark chemistry. The importance of pH has been noted above with reference to reactions in bulk solution. However, the “apparent pK a ” of a group in, say, the active site of an enzyme can be quite different from the bulk value, if the local environment differs. In other words, the protonation states of ground states and intermediates may differ significantly when a caged group is buried within a protein. Protonation and deprotonation rates may even be slowed to the extent that their kinetics dominate the deprotection process. Steric hindrance to uncaging might also be of importance. For example, after a photon is absorbed, the caging group might not be able to rotate freely to attain
Fig. 4 The microenvironment at the site of caging in a protein can affect the photochemistry. For example, as depicted, the excited caging group might be quenched by energy transfer. Uncaging might be inhibited by restricted rotation. Local conditions such as pH and the dielectric constant might differ from bulk values.
2 The Properties of Caged Proteins
Fig. 5 Active site photochemistry of 2hydroxycinnamoyl proteases. The enzyme (E-OH) is inhibited by acylation of the active site nucleophile, a serine side-chain. Upon photolysis, trans-cis isomerization oc-
curs such that the phenolic hydroxyl of the caging group can attack the carbonyl of the acyl enzyme. The caging group then cyclizes and is released.
the conformation that is required for bond rearrangement and cleavage (Fig. 4). From a practical viewpoint, these considerations indicate that it is not always possible to assess the properties of a caged protein from related solution photochemistry. Nevertheless, the circumstances suggest that, where possible, a small mobile group should be used to cage a protein. Further, aromatic amino acids, which are potential quenchers, especially tryptophan, should be removed by mutagenesis from the vicinity of the site of modification. Access to solvent will also be important if the photochemistry that occurs in bulk solution is to be approximated. In some cases, the uncaging mechanism relies on active-site chemistry, as in the hydroxycinnamoyl-proteases of Porter et al. (for recent work, see [43–45]) (Fig. 5). For caged enzymes, the photoproducts should be released quickly from the active site and fail to rebind. This can be a tricky issue where a light-removable covalent inhibitor has been directed towards the active site. For example, Cys199 of the catalytic subunit of protein kinase A (PKA), situated at the entrance of the active site, was modified in preference to Cys343, with a substrate-like peptide linked to an α-bromo-2-nitrobenzyl group [46]. While the millimolar affinity of the peptide was sufficient to discriminate kinetically between the two Cys residues in the alkylation reaction (caging), it was, as desired, too weak for the peptide to remain in the active site after photolysis (uncaging). Proteins are chiral and several caging reagents (Tab. 1) are attached through their own chiral centers forming diastereomers. Although the phenomenon has not been observed in practice, it is quite possible that the photochemistries of the two forms of caged protein differ. 2.7 Polypeptide Chain Cleavage
Occasionally, polypeptide chain cleavage has been noted during uncaging. For example, when the pore-forming protein α-hemolysin caged at Cys3 with 2-bromo2-(nitrophenyl)acetic acid (BNPA) was irradiated in the near-UV, an apparently shortened fragment was observed upon sodium dodecylsulfate-polyacrylamide gel electrophoresis (SDS-PAGE) [47]. While other explanations are possible for the shift
21
22 Light-activated Proteins
in electrophoretic mobility, such as intramolecular crosslinking, chain cleavage arising from scission at a site distant in the primary sequence is the most likely possibility. Less cleavage occurred when BNPA was attached at position 104 and in both cases the ratio of cleavage to deprotection was pH dependent [47]. In some cases, chain cleavage is intentional and desirable (see Section 3.10 and [48]). 2.8 Dominant Negative Effect
In the case of multisubunit proteins, “dominant negative” effects are a further consideration. Here, a small fraction of remaining caged protein or protein damaged during uncaging might remain in association with the properly uncaged protein and thereby reduce its activity. For example, in the case of caged α-hemolysin, which forms a heptameric pore, caging was about 60% complete as judged by SDS-PAGE, but only 10–15% of the pore-forming activity based on complete conversion was recovered [47].
3 Sites of Modification in Caged Proteins 3.1 Random Modification
Surprisingly, random modification has been used to cage proteins successfully [49]. For example, antibodies were treated with the oxycarbonylchloride of 1-(2nitrophenyl)ethanol (NPE), which primarily reacts with the e-amino groups of lysine (Lys) residues. When an Fc-specific anti-human IgG was “coated” with 30 NPE groups and the remaining Fc-binding fraction removed by affinity chromatography, a 15% recovery of IgG was obtained in which the specific Fc-binding activity was reduced to 1.5% of the original value. Upon irradiation with near-UV light, the specific binding activity increased to 29% of the original value. 3-Nitro-2naphthalenemethanol has been used in much the same way [50]. For the unconjugated alcohol, u = 0.63 and ε u = 810 M−1 cm−1 (u ε u = 510) in aqueous solution at 380 nm. In this case, the Protein A-binding activity of caged IgG increased from 18% to about 70% upon irradiation and then decreased upon further exposure. While simple in practice, the potential problems of random modification are clear. First, when the target protein has multiple functional domains, all of them will be affected by the caging reaction. Second, because multiple sites are likely to be modified by the caging reagent, uncaging will be a complicated process involving multiple photons, a lag phase and a complex mixture of intermediates with partly restored activity (Fig. 6). If the overall efficiency of removal of a caging group is E (the remaining positions being damaged in some manner), the final extent of uncaging will be ∼E n , where n is the number of groups incorporated initially. However, the effect of multiple sites of caging on the final activity is actually less
3 Sites of Modification in Caged Proteins
Fig. 6 Caging a protein by random modification. A two-domain protein is shown here. The protein is caged by “coating” with many protecting groups. Photolysis occurs through numerous partly deprotected inter-
mediates that may or may not be active. In the case of a two-domain protein, intermediates may be formed in which one domain is active and the other is not.
clear as each position at which caging takes place may have a different value of E and make a different contribution to the activity of the protein. On the positive side, a requirement for activation by multiple photons can aid in the spatial localization of uncaging (see Section 5.3). 3.2 Active-site Directed Modification
In an approach pioneered by Porter et al. [51], enzymes have been caged by making use of active-site chemistry. Various proteases have been inactivated as trans-2hydroxycinnamoyl acyl enzymes by using ester substrates (for recent work, see [43–45]). Upon photolysis, the caging group undergoes trans–cis isomerization and the product then lactonizes in a dark reaction liberating the active site (Fig. 5). In a recent manifestation, the approach has been used to separate a specific enzyme from a mixture of proteases. In this case, the leaving group in the ester substrate confers enzyme specificity (it is an “inverse” substrate) and the acylating group contains an arm carrying biotin. The modified enzyme is extracted from the mixture with an avidin column and then released from the column by irradiation. Tagged acylating agents have been used for proteomic screening [52] and it is possible that photorelease might be useful in this context. Additional active-site-directed reagents have been developed, e.g. the substratelike peptide that reacts with a Cys residue at the entrance to the catalytic site of PKA [46]. However, there may not always be a need to tailor a reagent for an enzyme active site, as the latter often contain hyper-reactive residues associated with the catalytic activity. For example, the serine at the active site of a butyrylcholinesterase can be selectively reacted with the carbamylating reagent N-methyl-
23
24 Light-activated Proteins
N-(2-nitrophenyl)carbamoyl chloride [53] and the catalytic Cys of a protein tyrosine phosphatase can be modified with various α-haloacetophenones, despite the presence of a total of four Cys residues in the protein [54]. It would seem likely that a modification strategy developed for one protein would be applicable to other members of its family, but this is not always so. For example, N-methyl-N-(2-nitrophenyl)carbamoyl chloride efficiently inhibits both acetylcholinesterase and butyrylcholinesterase. However, only butyrylcholinesterase recovers activity efficiently upon irradiation [53]. 3.3 The Use of Crosslinkers
A photocleavable, 2-nitrobenzyl, crosslinker was used to connect a ribosome-inactivating protein (PAP-S) to a monoclonal antibody or to the B-chain (lectin subunit) of ricin for delivery to cells [55]. Attachment to PAP-S was through a chloroformate and non-specific. A sulfhydryl group at the other end of the crosslinker was then deprotected for reaction with the monoclonal antibody, which had been premodified to carry maleimide groups. Alternatively, ricin B-chain was attached through another crosslinker carrying a maleimide. The conjugates were active towards cells in culture, but the activity was increased more than 20-fold in a plating efficiency assay after irradiation in situ at 350 nm, presumably because the liberated PAP-S has greater access to ribosomes (Fig. 7). This promising approach does not appear to have been pursued further.
Fig. 7 Application of a photocleavable crosslinker. (Left) A small toxic protein (PAP-S) is coupled in two steps to a cellspecific antibody through a photo-cleavable crosslinker. (Right) The toxin–antibody con-
jugate recognizes target cells and is internalized, e.g. by endocytosis. Photocleavage of the crosslinker increases the toxicity of the internalized conjugate [55].
3 Sites of Modification in Caged Proteins
There are many other potential applications of photocleavable crosslinkers for proteins. For example, they might be used to for the patterning of surfaces, including protein chips. One important possibility, for reagents carrying biotin on one arm [56], is in the purification of caged proteins. It has already been noted that it is often impossible to completely derivatize a protein with a caging reagent. With a biotinylated caging reagent, it would be possible to extract the derivatized fraction of the protein and this has been explored in the particular case of proteases by using active-site directed reagents [44]. The approach might be generalized by using Cys-directed biotinylated reagents to cage mutated proteins (see below). 3.4 Caging at Cys Residues
The deprotonated thiolate side-chains of Cys residues are by far the most nucleophilic groups in natural polypeptides. Therefore, under normal circumstances, a Cys residue will be selectively modified by an electrophilic caging reagent, even in the presence of hundreds of other reactive side-chains. “Normal circumstances” assumes that the Cys is not buried or that there is not a hyper-reactive active-site nucleophile present in the target protein. The unperturbed pK a value of a Cys residue in a polypeptide is 9.0–9.5. The next issue then is to find or place a suitable Cys residue in the target protein (Fig. 8). In general, Cys residues are rarer than Lys residues and simply “coating” the protein with a Cys-reactive caging reagent is unlikely to work. Generally, a Cys residue in or close to a key functional site in the protein is required and can be introduced by mutagenesis. Occasionally, such a Cys residue will occur naturally; Cys residues are crucial active-site nucleophiles in several classes of enzyme including various proteases and phosphatases. For example, a protein tyrosine phosphatase has been caged with 4-hydroxyphenacyl bromide at the
Fig. 8 Flow chart for caging proteins at Cys residues.
25
26 Light-activated Proteins
active-site Cys [54]. In other cases, a Cys residue in proximity to the active site may suffice and, where a structure is available, the residue is most readily identified by molecular graphics. For example, the catalytic subunit of PKA has been caged at Cys199, which lies near the entrance to the active site and in the so-called activation loop (PKA is activated by phosphorylation at Thr197). Care should be taken when caging enzymes at the periphery of the active site, because not all substrates may be affected equally by the covalent modification. Where structural information is unavailable or unhelpful, cysteine-scanning mutagenesis can be a useful approach. Although it is tedious, the various spinoffs, e.g. mechanistic information and sites for the attachment of fluorescent probes, often make it worthwhile. Cys-scanning mutagenesis was performed on the pore-forming protein α-hemolysin (αHL) before the crystal structure had been determined and when little was known about the assembly mechanism. Wild-type αHL contains no Cys residues. Eighty-three mostly charged residues in the 293-residue polypeptide were changed individually to Cys, and the poreforming activity of the protein was determined before and after modification at Cys with the bulky charged reagent 4-acetamido-4 -((iodoacetyl)amino)stilbene-2,2 -disulfonate, IASD [57]. The information gained in this way proved useful when it came to producing a caged aHL polypeptide [47]. Twenty-eight of 83 modified Cys mutants were substantially inactivated. Through various other negative criteria, including reduced activity before modification, incomplete inactivation by IASD, or a failure to be inactivated by a smaller reagent, the candidate positions were narrowed to four. One of these, Arg104, was investigated in detail and performed well when derivatized with the water-soluble caging reagent BNPA [47]. Subsequent structural information [58, 59] revealed that Arg-104 is in a region that undergoes a critical conformation change, including a 180◦ rotation at the ψ (Cα –C) angle of this residue. In principle, proteins might be inactivated by modification at locations other than an active site. For example, it might be possible to use a caging group to prevent a protein-protein interaction, which could be reestablished upon photolysis. In practice, however, this may be more difficult than it appears. For example, a nitrobenzyl group located at the dimer interface in HIV protease did not prevent the interaction between the subunits [60]. In most cases, it is likely that a bulky reagent will be required for this purpose. For example, polymeric Cys-directed caging reagents might be developed [61] or biotinylated reagents [44, 56] might be used to block the interface after coupling to streptavidin. An additional means of generating Cys residues at specific sites is by chemical ligation of polypeptide fragments, a methodology that is gaining ground for the total synthesis of proteins [62]. In this approach, two or more synthetic or recombinant fragments undergo chemical coupling. In general, the N-terminal fragment terminates in a reactive thioester at the C terminus and the C-terminal fragment contains an N-terminal Cys. For example, active channel proteins have been made in this way [63, 64]. After spontaneous ligation, the new polypeptide contains a Cys residue at the ligation site, which is available for chemical modification with a caging reagent. In this section, the question of how to handle more than one Cys residue in the target protein has been neglected. As noted above, if several caged residues
3 Sites of Modification in Caged Proteins
affect activity, the regeneration of activity upon photolysis will be a non-linear process. Because Cys residues are rare, this will not occur as often as it might with say non-specific modification at Lys. Again, because Cys is rare, it will often be possible to eliminate potentially offending residues by mutagenesis [65] (Fig. 8). In cases of extracellular proteins, Cys residues will often be present as disulfide bonds. The introduction of an additional Cys residue might then present a problem with disulfide scrambling, but often this can be circumvented by refolding in a redox buffer to encourage the formation of the natural disulfides and leave the new residue as a free sulfhydryl. Despite these issues, it is remarkable what can be achieved under potentially difficult circumstances. For instance, each 135 000 Da subunit of the β-galactosidase tetramer contains 16 Cys and 23 Met residues and yet it appears to be inactivated by the attachment of one 1-(4,5-dimethoxy-2nitrophenyl)-2-nitroethene reagent per subunit [66]. Because the alkylation reaction is not reversed by 2-mercaptoethanol, the modification is unlikely to be at the activesite Met502. Upon irradiation, 89% of the enzyme activity was recovered.
3.5 Sites other than Cys for Site-specific Modification
Reactive side-chains other than Cys residues can be introduced into proteins for sitespecific modification. For example, thiophosphate groups in peptides and proteins can be derivatized with various caging reagents. The thiophosphate group is an especially useful target because the activities of many cell-signaling proteins are controlled by phosphorylation. Thiophosphorylation can be carried out at serine and threonine by using the appropriate protein kinase and ATPγ S [adenosine 5 O-(3-thiotriphosphate)] [7]. In the case of tyrosine, the enzymatic phosphorylation conditions must be manipulated by the substitution of alternative divalent metal ions for magnesium [31]. Interestingly, the thiophosphate group is often a poor substrate for phosphatases, so proteins activated by thiophosphorylation rather than phosphorylation are expected to have longer lifetimes in vivo. A second useful feature of the thiophosphate group is that it can be alkylated at low pH in the presence of sulfhydryls [12]. The second pK a value of a thiophosphate monoester is ∼5.5 [67] and therefore at pH 4–5 the reactive monoanion will predominate, while Cys residues (pK a ∼9.0) will be protonated. With groups of similar nucleophilicity, selective modification can be obtained where pK a ∼3 or more. In principle, the wide difference in reactivity between a thiophosphoryl and sulfhydryl group at low pH might allow two different groups to be placed on a protein, e.g. a caging group and a fluorescent probe. Therefore, it is worth considering additional residues that might react orthogonally. A prime candidate is selenocysteine (Sec), which when compared with Cys has a greatly reduced pK a value (estimated to be 6.0–6.5 in a polypeptide chain [68]) and enhanced nucleophilicity [69, 70]. Sec can be introduced into proteins in vivo at TGA codons (although, in bacteria, the required cis signal disrupts the coding region) [71], by chemical ligation [70, 72] or, potentially, by unnatural amino acid mutagenesis. Interestingly, the relative rate of reaction of a selenophosphate over a thiophosphate monoester
27
28 Light-activated Proteins
towards an alkyl iodide has been examined at pH 7.0 and found to be only ∼ 3.5-fold higher [73]. More complex motifs can be derivatized even more selectively, as demonstrated by the work of Tsien et al. on tetracysteine (TC) sequences that can be reacted with bisarsenicals [74–76]. Reaction at these sites is sufficiently selective that proteins bearing them can be derivatized in vivo, although the generality of the approach for all types of cells has been questioned [77]. It is possible, for example, that caged TC motifs might be used to prevent protein–protein interactions in living cells. Other functionalities that might be useful for site-specific modification can be introduced by unnatural amino acid mutagenesis, and include azides [78, 79] and ketones [80]. 3.6 Reagents for Caging by Protein Modification
It will be clear from the discussion above that the overwhelming majority of useful caging reagents for proteins are electrophilic molecules that modify nucleophilic amino acid side-chains (Tab. 1). Preferably, a reagent should be soluble in aqueous media. However, poorly soluble reagents can often be added from a concentrated solution in a water-miscible organic solvent, provided that the final concentration of organic solvent remains low (<5%; see, e.g. [65]). At higher concentrations of organic solvent many, but not all, proteins are irreversibly inactivated. The reagent and its linkage to the target protein should be stable in the storage buffer. For example, butyrylcholinesterase derivatized at the active site serine with the carbamylating reagent N-methyl-N-(2-nitrophenyl)carbamoyl chloride hydrolyzes over several days, but in this case decomposition is not a problem given the shorter timescale of the uncaging experiments [53]. The reactivity of the caging reagent should be tuned so that the desired residue on the protein is selectively modified. Often, the reaction rate can be controlled by manipulating the pH to adjust the protonation state of the reactive polypeptide side-chain. Excess reagent should be inactivated and removed so that it does not interfere with the biological system under study or simply screen photolysis. Typically, an electrophilic reagent is reacted with excess of a nucleophile, such as 2-mercaptoethanol, and removed by gel filtration or dialysis. In principle, it would seem to be simple enough to choose a reagent (e.g. from Tab. 1) and modify a suitable Cys mutant of a target protein. In practice, there may be a need to test several reagents of differing size, shape and polarity to achieve a high extent of inactivation and the desired level of activation upon photolysis. For example, in the case of PKA, three nitrobenzyl reagents were tested for derivatizing Cys199: BNPA, 4,5-dimethoxy-2-nitrobenzyl bromide (DMNBB) and 2-nitrobenzyl bromide (NBB). Only one of them, NBB, gave an acceptable extent of inactivation (95–97%, after additional N-ethylmaleimide treatment) and reactivation (20- to 30-fold) [65]. The efficacy of NBB compared to the other reagents is not readily rationalized. Site-specific reagents are not discussed individually here, because each case must be considered on its own merits. However, another potentially general class of modification should be noted. Lawrence et al. placed a Cys residue by mutagenesis into
3 Sites of Modification in Caged Proteins
cofilin at the position of a Ser residue that is the site at which actin depolymerization activity is modulated by phosphorylation. The new Cys was modified with BNPA to yield a charged derivative that mimicked phosphoserine. This form of cofilin is inactive and is activated upon photolysis [81]. Because Cys is not a kinase substrate, the uncaged cofilin cannot be switched off by cellular kinases.
3.7 Determining the Extent of Protein Modification
As noted earlier, there are circumstances that require a high extent of inactivation when a protein is caged and others that require a high extent of reactivation upon irradiation. Often both are desirable. Unfortunately, inactivation is often incomplete, especially when less than 1% residual activity is sought. In the case of Cys residues, the possible reasons are varied. The residue in question may become oxidized during protein purification to a sulfenate (RSOH) or a disulfide (with say β-mercaptoethanol from a buffer). Sulfenates are inert towards most electrophilic reagents (but they will react with say 7-chloro-4-nitrobenzo-2-oxa-1,3diazole (NBD-Cl)) [82]. Both sulfenates and disulfides can be converted back to Cys by treatment with a reducing agent such as DTT. Improved means for purifying the protein in the first place can help, and where small amounts of proteins are required preparation by in vitro translation may be preferable [27, 47]. We have found that the Cys residues in freshly translated protein are more likely to be fully modifiable than the same residues in protein that has been subjected to a relatively harsh purification procedure. Another possibility may be to modify a protein before purification. Other sources of incomplete caging at Cys include the presence of lowmolecular-weight reactive impurities in the caging reagent that can modify but not inactivate a protein. Although functional measurements are the most direct and best for detecting small fractions of residual activity, other approaches can be used to assess the fraction of a protein that is modified. Further, while the residual activity is easily measured for an enzyme, it is not always straightforward to determine in other systems. If sufficient material is available, the extent of modification can be determined by UV-vis absorption spectroscopy, based on the near-UV or visible absorption of the caging group [5, 83]. The most powerful approach is mass spectrometry, which also requires a relatively small amount of protein. A popular method is matrix assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-MS) in which the protein is desorbed from an organic matrix by a laser pulse, which is sufficiently powerful to remove a fraction of the caging groups. The latter can be seen as a complication, in that it is difficult to quantify the extent of caging, or an advantage in that the spectrum of ions is characteristic of the caging reagent. The phenomenon has been well documented for 2-nitrobenzyl-protected Cys and thiophosphoryl peptides with a 337-nm laser [3, 12]. Besides simple removal of the caging group, a MH+ -16 peak is seen in the case of Cys peptides (loss of O from the NO2 group?) and a MH+ -135 peak from caged thiophosphoryl peptides (loss of
29
30 Light-activated Proteins
the 2-nitrobenzyl group with S/O exchange?). These peaks are not seen in electrospray ionization mass spectrometry (ESI-MS). Another simple and clearly diagnostic approach is gel-shift electrophoresis [3, 35, 47], which is based on the fact that modified proteins may exhibit altered mobility upon SDS-PAGE that is greater than anticipated based on the increase in mass (and often in the opposite direction). For example, single-Cys mutants of a-hemolysin show distinct gel shifts after reaction with BNPA that permit quantification of the extent of modification [47]. Where the caging reagent does not produce a visible shift, an indirect approach is to show that the modified protein is no longer susceptible to a shift produced by a highly charged or high mass reagent, such as 4-acetamido-4 -[(iodoacetyl)amino]stilbene-2,2 -disulfonate [7, 47] or a sulfhydryldirected poly(ethylene glycol) [84]. In a few cases, the crystal structure of a caged protein has been determined [53, 85]. These studies confirm the site and extent of caging (although small fractions of uncaged protein are not detected), and are important both in terms of the application of caged groups to study enzyme mechanisms and the examination of the mechanism of uncaging in an active site by time-resolved crystallography [85–87]. 3.8 True Residual Activity
Of course, another possibility is that the fully and cleanly modified protein has true residual activity. When activity is blocked by steric hindrance or the prevention of a protein-protein interaction, rather than modification of a catalytic residue, the extent of inhibition might vary according to the bulk or charge of the photolabile group, even if the targeted residue is completely modified. When the caging group is attached at the periphery of the active site, it is even conceivable that a caged enzyme might have activity towards one class of substrates but not another. At the level of a few percent, it can be hard to distinguish between incomplete protein modification and a low level of activity after complete modification. However, after exhaustive investigation it was concluded that the catalytic subunit of PKA, caged with 4-hydroxyphenacyl bromide on thiophosphorylated Thr197, has ∼5% residual kinase activity towards a small peptide substrate [7]. In favorable cases, a systematic exploration of caging groups and locations may result in proteins with very low residual activity. For example, one of six caged RNase S peptides that were tested for their ability to complement RNase S yielded a complex (RNase S ) that was completely inactive. On irradiation, RNase activity equivalent to 36% of the native value was obtained [88]. The caged residue was adjacent to the catalytic His12. 3.9 Incomplete Uncaging
The photoregeneration of protein activity (uncaging) can also be incomplete and here even less is known. Very often details of photochemical side-reactions are
3 Sites of Modification in Caged Proteins
poorly understood, even for model compounds of low mass, because the focus is usually on the major photolysis product. Additional details about the solution photolysis of small caged molecules can be found in this volume. Further, as indicated earlier, it is quite possible for photodeprotection to take a different course in the environment of a protein (Fig. 4). It might also simply be less efficient (reduced p ) and thereby allow competing side-reactions, such as photooxidation of the protein, to gain the upper hand. It would be very interesting to apply the power of mass spectrometry to this issue by determining the structure of protein byproducts formed during uncaging. 3.10 Means of Caging other than Chemical Modification
Caged proteins can be produced by means other than chemical modification of a protein or a mutant of it. Small proteins can now be produced by total synthesis or semisynthesis [62] and, as mentioned earlier, Cys residues generated at splice sites by chemical ligation might be modified by electrophilic reagents. Alternatively, unnatural photoactivatable amino acids might be incorporated into a polypeptide during chemical synthesis. Examples of small synthetic caged peptides include protein kinase inhibitors (caged Tyr [89]; caged Lys [90]), RNase S peptide (caged Asp, Glu, Lys, Gln [88]) and speract on a backbone NH [91]. The photomodulation of effectors and inhibitors is the topic of the review by Peng and Goeldner (Chapter 52). These approaches could be extended to entire synthetic or semisynthetic proteins. One interesting approach, which, however, cannot be generalized, was taken by Ueda et al. [92], who amidated Lys residues in phospholipase A2, removed three key N-terminal residues by Edman degradation and replaced them by covalently coupling tripeptides containing various unnatural amino acids at position 3. One of these amino acids, L-p-phenylazophenylalanine (PAP), yielded a trans-azo lipase with reduced catalytic activity that gained weak activity upon irradiation. Unnatural amino acids can also be incorporated at specific sites during in vitro protein synthesis and this methodology has been used to make a range of caged proteins including enzymes [93] and ion channels [94]. Uncaging of in vitro incorporated amino acids has been used to initiate protein splicing [95], which might therefore also be used as a means to generate enzymatic activity. When the amino acid (2-nitrophenyl)glycine is introduced into proteins, intentional polypeptide chain cleavage occurs upon irradiation [48, 96]. In the case of ion channels, the proteins can be inactivated or specific modulatory regions, such as an N-terminal inactivation domain can be removed in situ. Further investigation of the generality and efficiency of chain cleavage by this means would be informative. Attempts to modulate protein–protein interactions through unnatural amino acid mutagenesis have been instructive. The p21ras protein with the β-2-nitrobenzyl ester of Asp at position 38 was unable to act with the effector protein GAP. Residue 38 lies at the Ras-GAP interface. Upon photolysis, about ∼50% of the Ras activity was restored [35]. In contrast, when HIV protease was caged in the same way, at an active site Asp that lies at the dimer interface, the protein was inactivated but still able to dimerize [60].
31
32 Light-activated Proteins
All the examples of unnatural amino acid mutagenesis to produce caged proteins so far recorded have been based on 2-nitrobenzyl chemistry, but the approach is capable of handling a wide variety of amino acid side-chains [97, 98]. Advances are being made in applying the unnatural amino acid approach in vivo. Loaded tRNAs and mRNA can be injected into cells, most readily Xenopus oocytes [94, 99]. Recently, aminoacylated tRNA and mRNA (or DNA constructs) were loaded by electroporation into cells [100]. Bacteria containing mutant tRNA/aminoacyltRNA synthetase pairs have been prepared that permit the introduction of unnatural amino acids at amber sites in vivo [80, 101, 102]. It is only a matter of time before various classes of cells capable of incorporating caged amino acids into preselected proteins at specific sites are available. This is important because the need to rely on means such as microinjection to introduce caged proteins into cells and to the correct sites within them is a major deterrent to their more widespread use.
4 Applications of Caged Proteins
In this chapter, we have focussed on the means by which caged proteins with optimal properties can be prepared. The applications of such proteins in vivo are covered. Here, for completeness, we have tabulated selected examples that have been examined for the most part in vitro, to illustrate the various classes of proteins that have been caged and to give emphasis to those cases where the protein has been used to generate new mechanistic or biophysical information (Tab. 2).
5 Future Prospects 5.1 Turning Proteins On and Off
Attempts to control reversibly the activity of proteins with light, like straightforward caging, go back many years [103–107]. Most commonly, proteins have been derivatized with azobenzenes that undergo trans to cis isomerization upon irradiation at short wavelengths, with reversal at longer wavelengths or by thermal relaxation [108]. With notable exceptions [92, 109], early attempts involved random modification with photoisomerizable reagents. Given the problems with simply caging proteins by this approach, it is not surprising that these efforts were only partly successful in that complete on–off switching was not achieved. More recently systematic studies have been undertaken. The example of ribonuclease (RNase) is instructive [110]. The photoisomerizable amino acid PAP was incorporated into the RNase S peptide at six positions, 4, 7, 8, 10, 11 or 13, by chemical synthesis. Under the authors’ conditions, the maximum extent of photocon-version of PAP in solution was from 96% trans/4% cis to 10% trans/90% cis, which places
5 Future Prospects Table 2 Caging groups for proteins
Class
Example
Caged protein
Comments
Enzymes
Protein kinase
Modification of Cys199 in the catalytic (C) subunit of PKA by NBB. Cys199 is located at the mouth of the active site. The other Cys in the C subunit, C343 is mutated to serine.
Various 2-nitrobenzyl deri65 vatives (NBB, DMNBB and BNPA) are tested. The most effective is NBB, which gives 3–5% residual activity after further treatment with Nethylmaleimide and 80–100% activity (a 20- to 30-fold increase) after irradiation, with (312; pH 6) = 0.84 and (312; pH 8.5) = 0.14
Modification of Cys199 in the C subunit of PKA by a derivative of BNPA, in which the carboxylate is linked to a peptide that binds at the active site of the enzyme. The affinity of the peptide (Ki = 1.5 mM) is sufficient to allow selective modification of Cys199 over Cys343, but low enough that the peptide dissociates after photolysis.
46 ESI mass spectrometry shows a single covalently bound CNB-peptide per protein. The C subunit shows 2% residual activity after modification and 50% of the original activity is regenerated upon irradiation. The caged PKA is microinjected into fibroblasts and morphological changes are observed upon photoactivation.
Modification of a thiophosphorylated Thr197 residue in the catalytic (C) subunit of PKA with HPB. The C subunit of PKA is activated by phosphorylation at Thr197.
17-fold decrease in activity (5% residual activity) upon modification, and 15-fold increase of activity (85–90% uncaging) upon irradiation, with (312; pH 7.3) = 0.21
Protein Catalytic Cys of a tyrosine phosphatase phosphatase caged by HPB.
Refs.
7
HPB, 4-methoxyphenacyl 54 bromide, 4-carboxymethylphenacyl bromide and 4carboxymethylphenacyl chloride are tested. HPB proves to be the best irreversible inhibitor. It produces over 95% inactivation and 80% of the activity is recovered upon irradiation.
(Continued)
33
34 Light-activated Proteins Table 2 (continued)
Class
Example
Caged protein
Comments
Refs.
Proteases
p-Substituted o-hydroxy-methylcinnamoyl derivatives are used to acylate the catalytic serine of various proteases. The nature of the ester leaving group (-OR) is varied to achieve selectivity among chymotrypsin, thrombin and factor Xa.
Various studies have been carried out with the caged proteases [44, 51, 87, 180, 181]; including: time-resolved X-ray crystallography [85–87]; stimulation of localized thrombosis in the rabbit eye [139]; two-photon photolysis [182]; purification of proteases derivatized with a biotinylated arm, with an avidin column and release by photolysis [45]. More details are given in Tab. 5.1.1.
44, 45, 51, 85– 87, 139, 177, 180– 182
CINN-OR
HIV-1 protease
p-Nitrophenyl 2-(2,5-dimethoxybenzoyl)benzoate (R = NO2; R = OMe) is used to cage the serine at the active site of chymotrypsin, making use of the catalytic mechanism.
Chymotrypsin is completely 179 inactivated. Subsequent irradiation regenerates 70–80% of the activity. A biotinylated photolabile inhibitor enables the attachment of chymotrypsin to an avidin affinity column, with release by photolysis. p-Guanidinophenyl benzoylbenzoate (R = NH(CNH)NH2 · HCl; R = H) is used in a similar way to cage thrombin. Acylthrombin does not regain any activity upon irradiation, but is rather shown to be photolabeled.
The active site of HIV-1 protease is at the interface of a homodimer. Asp-NB ester is incorporated at position 25 in the interface by the amber codon suppression method.
Surprisingly, dimer formation is not prevented, but the caged enzyme has minimal proteolytic activity. 97% of the activity is restored upon irradiation.
60
5 Future Prospects Table 2 (continued)
Class
Example
Caged protein
Comments
Refs.
-galactosidase
Modification of a Cys residue in -galactosidase by 1-(4,5-dimethoxy-2-nitrophenyl)-2-nitroethene.
94% inhibition is obtained. 66 Subsequent irradiation at 350 nm results in restoration of 83% of the enzymatic activity.
Cholinesterases
The catalytic serine of butyrylcholinesterase is modified with N-methyl-N(2-nitrophenyl)carbamoyl chloride (MNPCC).
95% inactivation is achieved. 53 Subsequent irradiation regenerates up to 100% activity. The specificity of the reagent for the active site is demonstrated by X-ray crystallography. Acetylcholinesterase can be efficiently inhibited by MNPCC, but photoregeneration is limited, 20–30% at most.
Ribonuclease
Ribonuclease S (RNase S ) is composed of two fragments: S-peptide (residues 1–20) and S-protein (residues 21–124, also called RNase S). S-peptide complements the inactive Sprotein to restore nativelike structure and activity. 2-Nitrobenzyl caged amino acids are incorporated at different positions in the S-peptide by solid phase peptide synthesis.
Gln11 in the S-peptide 88 stabilizes a phosphate ester of the RNA substrate. Caging of Gln11 leads to complete inhibition of the enzyme. Upon irradiation, 37% of the activity is recovered. S peptide caged at other positions shows a lesser or no effect on the ribonuclease activity. The authors report 20–30% photodamage of the control native enzyme.
Lysozyme T4
Asp20 in T4 lysozyme is essential for catalytic activity. The active site Asp20 is replaced by Asp-NB ester using the amber-codon suppression method.
Irradiation of the inactivated NB-Asp20 lysozyme T4, at 315 nm restores 32% of the activity.
93
(Continued)
35
36 Light-activated Proteins Table 2 (continued)
Class
Example
Caged protein
Comments
Random modification of amino groups of Lys residues in immunoglobulin with 1-(2-nitrophenyl) ethane chloroformate.
On average 30 NPE residues 49 are bound per immunoglobulin protein. After removal of antibody that is not inactivated, the binding capacity of the modified protein in an ELISA falls to 1.5% of native IgG. On subsequent exposure to UV light as much as 29% of the binding capacity is regenerated. The unmodified control protein lost 30% of its activity upon irradiation for the same duration.
Immunoglobulin
Lys residues in an antibody (Ab) that targets brain-derived neurotrophic factor (BDNF) are randomly modified with NVOC-Cl.
About 40% of the Lys resi183 dues of the BDNF Ab are modified, leading to a reduction of 80% in binding affinity. The derivatized antibody is effective when uncaged during an electrophysiological assay with hippocampal slices.
Immunoglobulin
Random modification of Lys residues of immunoglobulin (IgG) by 3-nitro2-naphthalyloxycarbonyl chloride (NMCC).
On average 22 residues are caged per protein, which decreases the activity to 18%. Upon irradiation, binding to Protein A is increased to 70–74% of the original value. max = 259 nm; 259 = 7298; 350 = 1289; (380; pH 6.7) = 0.83.
Ribosomeinactivating protein cross-linked with a targeting protein
Lys residues in the ribosome-inactivating protein PAP-S are modified with the chloroformate group of the cross-linker. The thiol is then deprotected and used to couple with an antibody carrying maleimide groups.
55 An average of 0.7 crosslinking groups are attached per PAP-S. Targeting proteins are either a monoclonal antibody, directed against the human transferrin receptor, or the B-chain of ricin; 85% of of the protein–protein conjugate is cleaved upon irradiation.
Antibodies Immunoglobulin
Refs.
50
5 Future Prospects Table 2 (continued)
Class
Example
Caged protein
Comments
Refs.
Cytoskeleton
Colifin
Colifin is inactivated by phosphorylation at Ser-3. This residue is mutated into a Cys, which results in a constitutively active protein. Cys3 is then modified by BNPA. The negative charge of this reagent acts as a phosphate mimic and thereby inactivates colifin.
Caged colifin is microinject- 81 ed into an adenocarcinoma cell line. Irradiation causes 80% uncaging of colifin, and the appearance of cleaved F-actin filaments.
Thymosin 4 (T 4)
Thymosins contain a Lys-rich segment, which binds G-actin and thereby regulates actin polymerization into F-actin. T 4 is inactivated by random modification with NVOC-Cl.
In vitro, caged T 4 shows no 184 effect on the polymerization rate of actin. Once photolyzed, T 4 significantly inhibits actin polymerization. Caged T 4 is bead-loaded into keratocytes. Local photoactivation results in a regional locomotor response.
Heavy meromyosin
Cys707 in heavy meromyosin is modified with DMNBB.
The modification abolishes 185 actin-activated ATPase activity. A 340–400-nm light pulse activates the movement of actin filaments with a velocity of 17–50% of that seen with native meromyosin.
G-actin
Lys residues in G-actin are caged as carbamates and the oxirane group is subsequently used to crosslink the caged G-actin with thiolated fluorescent dextran.
Several cross-linking reagents 61 with various X leaving groups for amine modification are made. Once the Lys-containing G-actin has been caged, it can be cross-linked to thiolated dextran by reaction with the oxirane group. The cross-linker is also used to cage the two aromatic amines of rhodamine 110, resulting in the non-fluorescent lactone form of the dye. The resulting reagent is coupled via one of the oxirane rings to Cys374 in G-actin, resulting in a protein that can both form F-actin and release rhodamine upon irradiation.
(Continued)
37
38 Light-activated Proteins Table 2 (continued)
Class
Example
Caged protein
Comments
Refs.
Lys61 is located at the actin–actin interface in the Factin polymer. G-actin is modified with NVOC-Cl to allow photoregulation of the polymerization.
83 Three to five NVOC groups are attached per G-actin polypeptide, resulting in greatly reduced polymerization activity. Up to 95% polymerization into F-actin is induced upon irradiation.
Transcrip- GAL4VP16 GAL4VP16 is randomly tion factors modified with NVOC-Cl.
Modification of GAL4VP16 186, with NVOC-Cl under basic 187 conditions completely inhibits its DNA binding ability. More than 50% of the binding activity is recovered upon irradiation at 365 nm. GAL4VP16 modified at an average of 8 of 14 Lys residues with NVOC-Cl can be activated after injection into Drosophila embryos.
Channels and pores
Only four of the unmodified 47 mutants retain pore-forming activity similar to that of the wild-type and can be completely inhibited with sulfhydryl reagents. The R104C mutant, modified with a single CNB group, shows almost no residual activity when tested in a red cell lysis assay. Between 10 and 15% of the lytic activity of HL-R104C is recovered upon uncaging. However, removal of the carboxynitrobenzyl group is 60% effective, as assessed by SDS-PAGE. Therefore, it is likely that one or a few inactive monomers prevent formation of the active heptameric pore.
-hemolysin 83 single Cys mutants of the -hemolysin ( HL) monomer are screened and R104C is chosen to be modified by BNPA.
5 Future Prospects Table 2 (continued)
Class
Example
Caged protein
Comments
Nicotinic acetylcholine receptor (nAChR)
Leu9 in the M2 transmembrane segment of the subunit of nAChR is involved in channel gating. This residue is replaced by 2-nitrobenzyl tyrosine or 2-nitrobenzyl Cys by unnatural amino acid mutagenesis in Xenopus oocytes.
Cys-NB-9 and Tyr-NB-9 188 mutants both show an AChinduced current before photolysis. Irradiation of Cys-NB-9 causes a 50–150% current increase. For Tyr-NB-9 , the current increases up to 4-fold.
Tyr93 and 198 in the subunit of nAChR are highly conserved and thought to participate in agonist binding. They are replaced by 2-nitrobenzyltyrosine, by unnatural amino acid mutagenesis in Xenopus oocytes. The less crucial Tyr127 is caged the same way as a comparison.
The receptors containing Tyr-NB-93 and Tyr-NB-198 are barely affected by ACh until uncaged by irradiation. In the presence of ACh, photolysis produces an increase in current and subsequent desensitization. Up to 17% of the tyrosines are uncaged with a single 1-ms 300–350-nm flash.
142
Both the K+ channel and nAChR mutants are expressed in Xenopus oocytes. Irradiation of the K+ channel results in a substantial reduction in its inactivation rate. Oocytes containing NpgnAChR show a large loss of whole-cell current when irradiated, but the channels remain properly folded, as judged by -bungarotoxin binding ability.
48
nAChR and Irradiation of the unnatuK+ channel ral amino-acid 2-nitrophenyl glycine (Npg) results in the cleavage of a polypeptide backbone. Npg is incorporated by unnatural amino acid mutagenesis at position 47 of the K+ channel, just after the N-terminal inactivating “ball” domain. The nAChR contains a highly conserved disulfide loop, Cys128–Cys142, that is critical for protein folding and assembly. Npg is incorporated between the two Cys residues.
Refs.
(Continued)
39
40 Light-activated Proteins Table 2 (continued)
Class
Splicing
Example
Caged protein
Comments
Kir2.1 channel
Tyr242 in the Kir2.1 potassium channel influences channel function, through its phosphorylation state. 2-Nitrobenzylcaged tyrosine is introduced by unnatural amino acid mutagenesis in Xenopus oocytes.
Photolysis of the NB-Tyr242 143 mutant in the presence of tyrosine kinases results in a decrease of both current and cell capacitance, due to Kir2.1 endocytosis.
Self-splicing Conserved nucleophilic proteins residues (Cys, Ser or Thr) immediately following splice junctions in self-splicing proteins are essential to splicing. Ser1082 in the T. litoralis DNA polymerase precursor is replaced by NB-Ser by unnatural amino acid mutagenesis, abolishing splicing.
Refs.
Irradiation of the truncated 95 56 kDa NB-Ser1082 DNA polymerase precursor, followed by incubation to allow splicing, results in excision of the expected 45-kDa intein as seen by Western blot analysis. However, attempts to detect the 10-kDa ligation product were unsuccessful, possibly because of the proteolytic sensitivity of the DNA polymerase in Escherichia coli extracts.
Protein– p21ras protein interactions
Asp38 in p21ras forms an important interaction with Lys939 in p120-GAP (GTPase-activating protein). Asp38 is replaced by NBAsp by the amber suppression technique.
NB-Asp38 p21ras retains its intrinsic GTPase activity, but is unable to interact with p120-GAP until it has been uncaged by 355 nm irradiation. The GTPase activity is then increased 3.5-fold, up to about 50% of the activity of the wild-type control in the presence of GAP.
Caged GFP fluorescent proteins
PA-GFP fusion proteins GFP contains a mixed can be activated in vivo. population of neutral phenols ( max = 397 nm) and phenolate chromophores ( max = 475 nm). The T203H mutant (PA-GFP) is almost entirely in the neutral form. Irradiation at 413 nm causes conversion to the phenolate and an almost 100-fold increase in fluorescence. The activated form is stable for days.
35
122, 123
5 Future Prospects
a limit on the maximum fold change in activity that might be achieved. In this case, the maximum feasible extent of switching was not translated into an effect on enzymatic activity because the effects of photoisomerization on kinetic constants were small and distributed among effects on S-peptide binding to RNase S, substrate binding and catalysis. In contrast, the apparently complete on–off switching of horseradish peroxidase has been accomplished [111]. Muranaka et al. placed PAP at 15 different positions in the peroxidase by using unnatural amino acid mutagenesis. Four of the mutants retained activity. In one of these (position 68), the cis form (UV activated) was more active than the trans form. Remarkably, for the protein substituted at position 179, the cis form was completely inactive and on–off switching was obtained, with 50% recovery in the first cycle utilizing reversal with visible light and no further loss of activity in the second cycle. Given, the observations with PAP in solution, it is surprising that no residual activity was seen. One possibility is that conversion to the cis form was more complete on the peroxidase, suggesting that the position of the photoequilibrium might vary with environment. There is precedent for such a possibility. To make a photomodulated DNA-binding protein, Caama˜ no et al. connected two 26-amino-acid peptides comprising GCN4 binding domains through C-terminal Cys residues by an azobenzene bridge [112]. Irradiation at 365 nm gave a mixture of 5% trans/95% cis. In the absence of DNA, this could be reversed by irradiation at 430 nm or by thermal relaxation. The cis form bound a target DNA sequence far more strongly than the trans form and, when bound to DNA, the cis form could not be switched back to the trans form by photolysis, demonstrating the high stability of the complex. Under many circumstances, it seems likely that a photoisomerizable intramolecular crosslink would have a more profound effect on the structure of a protein than substitution with a photoisomerizable amino acid. This possibility has been explored by Woolley et al. with model peptides [113–115]. Recently, the group has devised a water-soluble azobenzene crosslinker containing two sulfonate groups for attachment at Cys residues [116]. Because it can be introduced without organic cosolvent, it should be especially useful for the direct modification of proteins. The use of photoisomerizable reagents allows proteins to be activated by one wavelength of light and turned off by another. Alternatively, reversal can occur thermally. Therefore, it is possible to titrate the amount of activated protein in an experiment with a calibrated light pulse (assuming slow thermal reversal) or by continuous irradiation to produce a photo-steady state (assuming rapid thermal reversal). Additional means for turning proteins caged with protecting groups (rather than photoisomerizable reagents) into single-cycle reversible agents should be explored. For example, light activation as described earlier might be built into a temperature-sensitive mutant of a protein, so that activity can be turned on by light and off by a temperature shift. Temperature sensitivity can also be established by targeted chemical modification with polymers such as poly(N-isopropylacrylamide) that undergo a transition between an expanded and a condensed form [117]. Interestingly, the two ideas can be combined. Certain azopolymers undergo lightdependent expansion and condensation and, when attached at specific sites in a protein, they can mediate the control of ligand binding [118].
41
42 Light-activated Proteins
Additional constructs for on–off switching can be envisioned in which the protein must be modified at two sites. For example, it might be possible to activate a protein by uncaging with one wavelength of light and then inactivate it with another wavelength, e.g. by uncaging a sequence that targets the protein for proteolytic destruction in a cell or one that acts as an intramolecular inhibitor peptide, or by causing photolytic chain cleavage. Mutually exclusive modification at two sites might be achieved by using a nucleophile on the proteins that can be distinguished from Cys by virtue of its greater reactivity or lower pK a , such as thiophosphate [12] or Sec [70]. The orthogonal photolysis required to carry out these possibilities has been demonstrated with model compounds [119, 120]. Two protecting groups, a dimethoxybenzoin (254 nm) and dimethoxy-nitrobenzyl (420 nm), were removed in either order with selectivities in the range of 5 :1 to 10:1, thanks to an appropriate distribution of u and ε u values, and the lack of energy transfer between the short-lived excited state of dimethoxybenzoin and the dimethoxynitrobenzyl group. Incidentally, this result also suggests that selective activation should be possible in an experimental system containing two different caged proteins. 5.2 Genetically Encoded Caged Proteins and Caging within Cells
A major difficulty in using caged proteins within living cells is getting them into the cells in the first place. Techniques include microinjection, the use of Tat fusion proteins, electroporation, syringe loading and bead loading, all of which bring proteins in from the outside. In the case of fluorescent indicator proteins, rather than caged proteins, this issue has been addressed by using polypeptides that are completely genetically encoded, for the most part GFP fusion proteins [76]. It would be of great interest if caged proteins could also be genetically encoded, e.g. by using a fusion to a light-responsive signaling element such as photoactive yellow protein. In work along these lines, Shimizu-Sato et al. have produced a lightactivatable gene expression system, which functions in vivo and is reversible [121] (Fig. 9). The system is based on the yeast two-hybrid model and utilizes the lightdependent phytochrome-PIF3 interaction to recruit a GAL4 DNA-binding domain to the desired promoter. In another case, the fluorescence of a GFP variant increases ∼ 100-fold when subjected to intense 400-nm radiation and can be thought of as a genetically encoded caged fluorescent marker [122, 123]. A second idea is to chemically modify a selected protein within a cell. Obviously, this cannot be done at a single Cys residue, because there are numerous endogenous proteins containing Cys residues. Nevertheless, it has been achieved by using genetically encoded TC motifs (TC tags) that couple selectively to small cell-permeant molecules containing two suitably positioned trivalent arsenic functionalities [76]. So far, caged proteins have not been made in this way, but the idea has been extended to chromophoreassisted light inactivation (CALI). In the original manifestation of CALI, a dye, malachite green, was attached to an antibody raised against a target protein and the conjugate was microinjected into target cells. Upon irradation, reactive oxygen species, perhaps hydroxyl radicals, were generated that inactivated the target, but spared proteins a short distance away [124–126]. Tsien et al. have shown that
5 Future Prospects
Fig. 9 A light-activated transcription factor. The system is based on the red lightinduced binding of the plant photoreceptor phytochrome (Phy) to the protein PIF3 and reversal of the interaction by far red light. Phy is fused with a GAL4 DNA-binding do-
main (BD) and thus binds to the promoter of the target gene. Upon irradiation with red light, Phy recruits PIF3, which is fused with a transcription activation domain (TA). Upon irradiation with far red light, Phy dissociates from PIF3 and transcription ceases.
biarsenical derivatives of resorufin or fluorescein attached to a TC tag in vivo can mediate CALI, in this case most likely by the generation of singlet oxygen [127]. However, the degree of selectivity of biarsenicals in a cellular environment, in which natural Cys-rich proteins are present, has been questioned and further refinements of the technology can be expected [77]. 5.3 Improved Spatial Control
A major advantage of the use of caged proteins is the spatiotemporal control of reagent release, and there is room for improvement in both the spatial and temporal aspects of control. By using two-photon uncaging with focussed light, photolysis within cells can be limited to a few cubic micrometers [15, 39] and new approaches to sharply focussed beams continue to be investigated [128]. In the two-photon approach, a laser beam is used to produce very high light intensity at the target site, allowing, for example, the excitation of near-UV transitions with two red photons. Because the absorption probability falls off with a quadratic dependence, the excitation volume is highly restricted. Further, although uncaged molecules may diffuse from the site of irradiation, they are rapidly diluted. The use of a doubly caged glutamate that requires the sequential absorption of two photons for conversion to active glutamate also resulted in a substantial improvement in spatial resolution over singly caged glutamate when a focussed beam was used [129]. This “chemical two-photon uncaging” might also be applied to proteins. Local heating of biomolecules coupled to metal nanoparticles with RF magnetic fields is an interesting new approach for activation that has a spatial resolution of a few nanometers [130]. Spatial resolution might also be improved
43
44 Light-activated Proteins
by placing tags on caged proteins that direct them to subcellular or-ganelles, such as the nucleus or mitochondria. It is possible to screen for new tags by using approaches such as phage display of combinatorial libraries. Spatial control is also important for patterning surfaces. For example, patterning at the micron level is desirable in the fabrication of protein chips. Surfaces can first be covered with protein, which is then released from specific sites by irradiation. Alternatively, protein can be attached to the desired sites by irradiation. In an example of photolytic removal, Ching and colleagues attached an 18-residue peptide to poly(4-vinylpyridine) via N-pyridinium linkages [131]. The linkages were cleaved by UV irradiation. Deposition brings us into the arena of photoaffinity labeling and related approaches. In a recent example, Holden and Cremer showed that dye-labeled proteins can be attached to a BSA-coated surface by irradiation, most likely through a singlet oxygen mechanism [132]. For example, an Alexa-IgG was coupled at 594 nm. 5.4 Temporal Control
In certain biophysical experiments, including time-resolved crystallography and in vitro protein folding, there is a need for very rapid, efficient uncaging. Interestingly, recent work in the area of protein folding has eschewed conventional caging reagents, such as nitrobenzyl groups, which are often released quite slowly (Tab. 1), for photoisomerizable molecules such as azobenzenes. It is known that photoinduced trans–cis isomerization in azobenzenes is very fast indeed, occurring on the subpicosecond timescale. The quantum yield is also high. Until recently, proteinfolding studies had been limited to milliseconds by rapid mixing or dilution, or to microseconds by various “jump” techniques. New spectroscopic techniques are now being applied, many of which would benefit from rapid triggering of the folding process. Femtosecond time-resolved spectroscopy has been used to examine peptides cyclized with cis-azobenzene groups (Fig. 10). Photoisomerization, the subsequent dissipation of vibrational energy and the conformational relaxation of the trans-cyclic peptide were resolved [21]. Substantial conformational changes occurred on the 50-ps time-scale, but additional changes in the polypeptide backbone were still occurring at 1 ns. The kinetics of helix unfolding in a 16-amino-acid peptide crosslinked with azobenzene have been studied on the nanosecond timescale by optical rotatory dispersion (ORD) [22]. In the trans form, the peptide favors a more helical structure (66% helix) and in the cis configuration the helical content is reduced. Thermal reisomerization takes minutes, so photochemistry could be studied by flash photolysis in the trans to cis direction. After a 7-ns laser pulse at 355 nm, the ORD signal at 230 nm was best fit to a single-exponential decay with a time constant of 55 ns. The folding and unfolding time constants for the cis peptide were estimated to be 330 and 66 ns, respectively. Alternative approaches in this area include the photolytic cleavage of crosslinks within conformationally constrained peptides, including aromatic disulfides (at 270 nm) [19], which are believed to cleave in <1 ps, and cyclic benzoins, which cleave in <1 ns [20].
5 Future Prospects
Fig. 10 Application of a photoisomerizable reagent. An azobenzene inserted into a cyclic peptide in the cis configuration serves as an ultrafast optical switch to study peptide conformational dynamics.
In crystallography, an interesting way to observe intermediates consists of irradiating to completion a crystal of a caged protein at a low temperature (about 100–150 K) and then slowly increasing the temperature [133, 134]. In this way, caging groups that are inefficiently photolyzed can be used, which would not be the case with Laue diffraction, where uncaging with a single short light flash is generally employed [43]. 5.5 Medical Applications
For many years, it has been realized that photoactivatable drugs might have advantages in certain circumstances [49, 135]. Indeed, the light-mediated effects of psoralens were among the earliest drug actions recorded. Proteins have been used to increase therapeutic specificity by carrying drugs such as photosensitizers into cells [136, 137] or onto bacteria [138], but there have been few developments of caged proteins as drugs. Examples include immunotoxins that have been shown to kill cells in vitro [55] (Fig. 7) and a caged hydroxycinnamoylthrombin that produces local thrombosis in living cornea after irradiation at 366 nm [139]. The latter might be useful for treating aberrant ocular neovascularization, a major cause of blindness. The infectivity of adenoviral vectors can be controlled by coating the viral particles with a water-soluble, photocleavable reagent [140]. The modified viral vectors possess little infectivity towards target cells. Exposure to 365-nm light induces a reversal of the neutralizing modification and the restoration of infectivity.
45
46 Light-activated Proteins
The light-directed transduction of target cells by photoactivatable adenoviral vectors was demonstrated successfully both in vitro and in vivo. The modified vectors might find applications in gene therapy. Further advances in the medical application of caged proteins will require attention to a number of issues including the means of administration, binding or uptake by target cells and light penetration into tissues [135].
Acknowledgments
We thank Seong-Ho Shin and Maurice Goeldner for comments on the manuscript. S. L. was supported by a fellowship from the Fondation pour La Recherche M´edicale and by the Welch Foundation (BE-1335). H.B. is the holder of a Royal Society–Wolfson Research Merit Award.
Note added in proof
In an excellent example of spatiotemporal control, the phosphocofilin mimick [81] has been microinjected into carcinoma cells and activated within a 3-µm diameter spot [189]. New examples of protein folding on a nanosecond timescale have appeared [190, 191]. As predicted, a caged amino acid has now been incorporated into proteins in vivo [192]. In related work in vitro, caged amino acids have been introduced at the active site at the dimer interface of an endonuclease by using a 4-base codon [193]. As in the case of HIV protease [60], caging and photoactivation do not affect dimerization. When PAP was used, a substantial increase in endonuclease activity was observed after irradiation [194]. The rate of activation was slower than photoisomerization in solution, suggesting that the azobenzene is constrained on the protein. Indeed, activation was not reversed by irradiation at long wavelengths. Finally, 2-nitrophenylglycine, which causes chain cleavage upon photolysis, has been introduced at the proteolytic processing site of caspase-3 [195]. A method for chemical peptide ligation that leaves a 2-nitrobenzylamide at the splice junction has been devised [196]. The ligation product is caged on the peptide backbone (cf. [91]). Chemical ligation has also been used to incorporate caged phosphoserines [197]. On the technical side, molecular antennae have been used to increase the efficiency of photolysis [198]. Improved sites for the attachment of biarsenicals have been developed [199].
References 1 G. MARRIOTT (ed.), Methods in Enzymology. Vol. 291. Caged Compounds, Academic Press, New York, 1998. 2 A. P. PELLICCIOLI, J. WIRZ, Photochem. Photobiol. Sci. 2002, 1, 441–458.
3 H. BAYLEY, C.-Y. CHANG, W. T. MILLER, B. NIBLACK, P. PAN, Methods Enzymol. 1998, 291, 117–135. 4 K. CURLEY, D. S. LAWRENCE, Curr. Opin. Chem. Biol. 1999, 3, 84–88.
References 5 G. MARRIOTT, P. ROY, K. JACOBSON, Methods Enzymol. 2003, 360, 274–288. 6 E. J. PETERSSON, G. S. BRANDT, N. M. ZACHARIAS, D. A. DOUGHERTY, H. A. LESTER, Methods Enzymol. 2003, 360, 258–273. 7 K. ZOU, S. CHELEY, R. S. GIVENS, H. BAYLEY, J. Am. Chem. Soc. 2002, 124, 8220–8229. 8 J.-L. RAVANAT, T. DOUKI, J. CADET, J. Photochem. Photobiol. B 2001, 63, 88–102. 9 M. J. DAVIES, Biochem. Biophys. Res. Commun. 2003, 305, 761–770. 10 T. MILBURN, N. MATSUBARA, A. P. BILLINGTON, J. B. UDGAONKAR, J. W. WALKER, B. K. CARPENTER, W. W. WEBB, J. MARQUE, W. DENK, J. A. MCCRAY, G. P. HESS, Biochemistry 1989, 28, 49–55. 11 J. W. WALKER, H. MARTIN, F. R. SCHMITT, R. J. BARSOTTI, Biochemistry 1993, 32, 1338–1345. 12 P. PAN, H. BAYLEY, FEBS Lett. 1997, 405, 81–85. 13 J. E. T. CORRIE, A. BARTH, V. R. N. MUNASINGHE, D. R. TRENTHAM, M. C. HUTTER, J. Am. Chem. Soc. 2003, 125, 8546–8554. 14 Y.V. IL’ICHEV, M.A. SCHWO¨ RER, J. WIRZ, J. Am. Chem. Soc. 2004, 126, 4581–4595. 15 T. FURUTA, S. S. H. WANG, J. L. DANTZKER, T. M. DORE, W. J. BYBEE, E. M. CALLAWAY, W. DENK, R. Y. TSIEN, Proc. Natl Acad. Sci. USA 1999, 96, 1193–1200. 16 B. SCHADE, V. HAGEN, R. SCHMIDT, R. HERBRICH, E. KRAUSE, T. ECKARDT, J. BENDIG, J. Org. Chem. 1999, 64, 9109–9117. 17 R. S. GIVENS, C.-H. PARK, Tetrahedron Lett. 1996, 37, 6259–6262. 18 P. G. CONRAD, R. S. GIVENS, J. F. W. WEBER, K. KANDLER, Org. Lett. 2000, 2, 1545–1547. 19 H. S. M. LU, M. VOLK, Y. KHOLODENKO, E. GOODING, R. M. HOCHSTRASSER, W. F. DEGRADO, J. Am. Chem. Soc. 1997, 119, 7173–7180. 20 K. C. HANSEN, R. S. ROCK, R. W. LARSEN, S. I. CHAN, J. Am. Chem. Soc. 2000, 122, 11567–11568. 21 S. SPO¨ RLEIN, H. CARSTENS, H. SATZGER, C. RENNER, R. BEHRENDT, L. MORODER, P. TAVAN, W. ZINTH, J. WACHTVEITL, Proc. Natl Acad. Sci. USA 2002, 99, 7998–8002.
22 E. CHEN, J. R. KUMITA, G. A. WOOLLEY, D. S. KLIGER, J. Am. Chem. Soc. 2003, 125, 12443–12449. 23 A. BARTH, K. HAUSER, W. M¨ANTELE, J.E. T. CORRIE, D. R. TRENTHAM, J. Am. Chem. Soc. 1995, 117, 10311–10316. 24 X. DU, H. FREI, S. H. KIM, J. Biol. Chem. 2000, 275, 8492–8500. 25 X. DU, H. FREI, S. H. KIM, Biopolymers 2001, 62, 147–149. 26 Q. CHENG, M. G. STEINMETZ, V. JAYARAMAN, J. Am. Chem. Soc. 2002, 124, 7676–7677. 27 T. LUCHIAN, S.-H. SHIN, H. BAYLEY, Angew. Chem. Int. Ed. 2003, 42, 1926–1929. 28 M. WILCOX, R. W. VIOLA, K. W. JOHNSON, A. P. BILLINGTON, B. K. CARPENTER, J. A. MCCRAY, A. P. GUZIKOWSKI, G. P. HESS, J. Org. Chem. 1990, 55, 1585–1589. 29 C. GREWER, J. J¨AGER, B.K. CARPENTER, G. P. HESS, Biochemistry 2000, 39, 2063–2070. 30 C.-H. PARK, R. S. GIVENS, J. Am. Chem. Soc. 1997, 119, 2453–2463. 31 K. ZOU, W. T. MILLER, R. S. GIVENS, H. BAYLEY, Angew. Chem. Int. Ed. Engl. 2001, 40, 3049–3051. 32 J. E. T. CORRIE, D. R. TRENTHAM, in: Biological Applications of Photochemical Switches, H. MORRISON (ed.), Wiley, New York, 1993, vol. 2, pp. 243–299. 33 P. ZUMAN, B. SHAH, Chem. Rev. 1994, 94, 1621–1641. 34 A. BARTH, J. E. T. CORRIE, M. J. GRADWELL, Y. MAEDA, W. MANTELE, T. MEIER, D. R. TRENTHAM, J. Am. Chem. Soc. 1997, 119, 4149–4159. 35 S. K. POLLITT, P. G. SCHULTZ, Angew. Chem. Int. Ed. Engl. 1998, 37, 2104. 36 J.E. GONZA´ LEZ, R.Y. TSIEN, Chem. Biol. 1997, 4, 269–277. 37 K. BRIVIBA, H. SIES, Methods Enzymol. 2000, 319, 222–226. 38 S. BEUTNER, B. BLOEDORN, T. HOFFMANN, H. D. MARTIN, Methods Enzymol. 2000, 319, 226–241. 39 W. DENK, Proc. Natl Acad. Sci. USA 1994, 91, 6629–6633. 40 O. D. FEDORYAK, T. M. DORE, Org. Lett. 2002, 4, 3419–3422. 41 M. LU, O. D. FEDORYAK, B. R. MOISTER, T. M. DORE, Org. Lett. 2003, 5, 2119–2122.
47
48 Light-activated Proteins
42 M. P. GOELDNER, C. G. HIRTH, Proc. Natl Acad. Sci. USA 1980, 77, 6439. 43 B. L. STODDARD, B. E. COHEN, M. BRUBAKER, A. D. MESECAR, D. E. KOSHLAND, Nat. Struct. Biol. 1998, 5, 891–897. 44 N. A. PORTER, J. W. THURING, H. LI, J. Am. Chem. Soc. 1999, 121, 7716–7717. 45 J. W. THURING, H. LI, N. A. PORTER, Biochemistry 2002, 41, 2002–2013. 46 K. CURLEY, D. S. LAWRENCE, J. Am. Chem. Soc. 1998, 120, 8573–8574. 47 C.-Y. CHANG, B. NIBLACK, B. WALKER, H. BAYLEY, Chem. Biol. 1995, 2, 391–400. 48 P. M. ENGLAND, H. A. LESTER, N. DAVIDSON, D. A. DOUGHERTY, Proc. Natl Acad. Sci. USA 1997, 94, 11025–11030. 49 C. H. SELF, S. THOMPSON, Nat. Med. 1996, 2, 817–820. 50 A. K. SINGH, P. K. KHADE, Bioconjugate Chem. 2002, 13, 1286–1291. 51 A. D. TURNER, S. V. PIZZO, G. W. ROZAKIS, N. A. PORTER, J. Am. Chem. Soc. 1987, 109, 1274–1275. 52 G. C. ADAM, J. BURBAUM, J. W. KOZARICH, M. P. PATRICELLI, B. F. CRAVATT, J. Am. Chem. Soc. 2004, 126, 1363–1368. 53 S. LOUDWIG, Y. NICOLET, P. MASSON, J. C. FONTECILLA-CAMPS, S. BON, F. NACHON, M. GOELDNER, ChemBioChem 2003, 4, 762–767. 54 G. ARABACI, X.-C. GUO, K. D. BEEBE, K. M. COGGESHALL, D. PEI, J. Am. Chem. Soc. 1999, 121, 5085–5086. 55 V. S. GOLDMACHER, P. D. SENTER, J. M. LAMBERT, W. A. BLA¨ TTLER, Bioconjugate Chem. 1992, 3, 104–107. 56 J. OLEJNIK, S. SONAR, E. ˜ SKA-OLEJNIK, K.J. ROTHSCHILD, KRZYMAN Proc. Natl Acad. Sci. USA 1995, 92, 7590–7594. 57 B. WALKER, H. BAYLEY, J. Biol. Chem. 1995, 270, 23065–23071. 58 L. SONG, M. R. HOBAUGH, C. SHUSTAK, S. CHELEY, H. BAYLEY, J. E. GOUAUX, Science 1996, 274, 1859–1865. 59 R. OLSON, H. NARIYA, K. YOKOTA, Y. KAMIO, E. GOUAUX, Nat. Struct. Biol. 1999, 6, 134–140. 60 G. F. SHORT III, M. LODDER, A. L. LAIKHTER, T. ARSLAN, S. M. HECHT, J. Am. Chem. Soc. 1999, 121, 478–479.
61 J. OTTL, D. GABRIEL, G. MARRIOTT, Bioconjugate Chem. 1998, 9, 143–151. 62 T. MUIR, Annu. Rev. Biochem. 2003, 72, 249–289. 63 D. CLAYTON, G. SHAPOVALOV, J. MAURER, J. A. DOUGHERTY, H. A. LESTER, G. G. KOCHENDOERFER, Proc. Natl Acad. Sci. USA 2004, 101, 4764–4769. 64 F. I. VALIYAVEETIL, M. SEKEDAT, T. W. MUIR, R. MACKINNON, Angew. Chem. Int. Ed. 2004, 43, 2504–2507. 65 C.-Y. CHANG, T. FERNANDEZ, R. PANCHAL, H. BAYLEY, J. Am. Chem. Soc. 1998, 120, 7661–7662. 66 R. GOLAN, U. ZEHAVI, M. NAIM, A. PATCHORNIK, P. SMIRNOFF, Biochim. Biophys. Acta 1996, 1293, 238–242. 67 E. K. JAFFE, M. COHN, Biochemistry 1978, 17, 652–657. 68 A. P. ARNOLD, K.-S. TAN, D. L. RABENSTEIN, Inorg. Chem. 1986, 25, 2433–2437. 69 R. G. PEARSON, H. SOBEL, J. SONGSTAD, J. Am. Chem. Soc. 1968, 90, 319– 326. 70 R. J. HONDAL, B. L. NILSSON, R. T. RAINES, J. Am. Chem. Soc. 2001, 123, 5140–5141. 71 K. E. SANDMAN, J. S. BENNER, C. J. NOREN, J. Am. Chem. Soc. 2000, 122, 960. 72 M. D. GIESELMAN, L. XIE, W. A. van der DONK, Org. Lett. 2001, 3, 1331–1334. 73 Y. XU, E. T. KOOL, J. Am. Chem. Soc. 2000, 122, 9040–9041. 74 B. A. GRIFFIN, S. R. ADAMS, R. Y. TSIEN, Science 1998, 281, 269–272. 75 S. R. ADAMS, R. E. CAMPBELL, L. A. GROSS, B. R. MARTIN, G. K. WALKUP, Y. YAO, J. LLOPIS, R. Y. TSIEN, J. Am. Chem. Soc. 2002, 124, 6063–6076. 76 J. ZHANG, R. E. CAMPBELL, A. Y. TING, R. Y. TSIEN, Nat. Rev. Mol. Cell Biol. 2002, 3, 906–918. 77 K. STROFFEKOVA, C. PROENZA, K. G. BEAM, Pfl¨ugers Arch. (Eur. J. Physiol.) 2001, 442, 859–866. 78 H. KOLB, K. B. SHARPLESS, Drug Disc. Today 2003, 8, 1128–1137. 79 A. J. LINK, D. A. TIRRELL, J. Am. Chem. Soc. 2003, 125, 11164–11165. 80 Z. ZHANG, B. A. SMITH, L. WANG, A. BROCK, C. CHO, P. G. SCHULTZ, Biochemistry 2003, 42, 6735–6746.
References 81 M. GHOSH, I. ICHETOVKIN, X. SONG, J. S. CONDEELIS, D. S. LAWRENCE, J. Am. Chem. Soc. 2002, 124, 2440–2441. 82 S. O. KIM, K. MERCHANT, R. NUDELMAN, W. F. BEYER, T. KENG, J. DEANGELO, A. LAUSLADEN, J. S. STAMLER, Cell 2002, 109, 383–396. 83 G. MARRIOTT, Biochemistry 1994, 33, 9092–9097. 84 S. HOWORKA, L. MOVILEANU, X. LU, M. MAGNON, S. CHELEY, O. BRAHA, H. BAYLEY, J. Am. Chem. Soc. 2000, 122, 2411. 85 B. L. STODDARD, P. KOENIGS, N. PORTER, K. PETRATOS, G. A. PETSKO, D. RINGE, Proc. Natl Acad. Sci. USA 1991, 88, 5503–5507. 86 B. L. STODDARD, J. BRUHNKE, N. PORTER, D. RINGE, G. A. PETSKO, Biochemistry 1990, 29, 4871–4879. 87 B. L. STODDARD, J. BRUHNKE, P. KOENIGS, N. PORTER, D. RINGE, G. A. PETSKO, Biochemistry 1990, 29, 8042–8051. 88 T. HIRAOKA, I. HAMACHI, Bioorg. Med. Chem. Lett. 2003, 13, 13–15. 89 J. W. WALKER, S. H. GILBERT, R. M. DRUMMOND, M. YAMADA, R. SREEKUMAR, R. E. CARRAWAY, M. IKEBE, F. S. FAY, Proc. Natl Acad. Sci. USA 1998, 95, 1568–1573. 90 Y. TATSU, Y. SHIGERI, A. ISHIDA, I. KAMESHITA, H. FUJISAWA, N. YUMOTO, Bioorg. Med. Chem. Lett. 1999, 9, 1093. 91 Y. TATSU, T. NISHIGAKI, A. DARSZON, N. YUMOTO, FEBS Lett. 2002, 525, 20–24. 92 T. UEDA, K. MURAYAMA, T. YAMAMOTO, S. KIMURA, Y. IMANISHI, J. Chem. Soc. Perkin Trans. I 1994, 225–230. 93 D. MENDEL, J. A. ELLMAN, P. G. SCHULTZ, J. Am. Chem. Soc. 1991, 113, 2758– 2760. 94 D. L. BEENE, D. A. DOUGHERTY, H. A. LESTER, Curr. Opin. Neurobiol. 2003, 13, 264–270. 95 S. N. COOK, W. E. JACK, X. XIONG, L. E. DANLEY, J. A. ELLMAN, P. G. SCHULTZ, C. J. NOREN, Angew. Chem. Int. Ed. Engl. 1995, 34, 1629–1630. 96 P. M. ENGLAND, Y. ZHANG, D. A. DOUGHERTY, H. A. LESTER, Cell 1999, 96, 89–98. 97 T. HOHSAKA, M. SISIDO, Curr. Opin. Chem. Biol. 2002, 6, 809–815.
98 L. WANG, P. G. SCHULTZ, Chem. Commun. 2002, 1–11. 99 M. W. NOWAK, P. C. KEARNEY, J. R. SAMPSON, M. E. SAKS, C. G. LABARCA, S. K. SILVERMAN, W. ZHONG, J. THORSON, J. N. ABELSON, N. DAVIDSON, P. G. SCHULTZ, D. A. DOUGHERTY, H. A. LESTER, Science 1995, 268, 439–442. 100 S. L. MONAHAN, H. A. LESTER, D. A. DOUGHERTY, Chem. Biol. 2003, 10, 573. 101 L. WANG, A. BROCK, B. HERBERICH, P. G. SCHULTZ, Science 2001, 292, 498. 102 R. A. MEHL, J. C. ANDERSON, S. W. SANTORO, L. WANG, A. B. MARTIN, D. S. KING, D. M. HORN, P. G. SCHULTZ, J. Am. Chem. Soc. 2003, 125, 935–939. 103 B. F. ERLANGER, Annu. Rev. Biochem. 1976, 45, 267–283. 104 D. H. HUG, Photochem. Photobiol. Rev. 1978, 3, 1–33. 105 G. MONTAGNOLI, S. MONTI, L. NANNICINI, M. P. GIOVANNITTI, M. G. RISTORI, Photochem. Photobiol. 1978, 27, 43–49. 106 N. A. PORTER, J. D. BRUHNKE, P. A. KOENIGS, in: Biological Applications of Photochemical Switches, H. MORRISON (ed.), Wiley, New York, 1993, vol. 2, 197–241. 107 I. WILLNER, R. SHAI, Angew. Chem. Int. Ed. Engl. 1996, 35, 367–385. 108 O. PIERONI, A. FISSI, N. ANGELINI, F. LENCI, Acc. Chem. Res. 2001, 34, 9–17. 109 H. A. LESTER, M. E. KROUSE, M. M. NASS, J. M. NERBONNE, N. H. WASSERMANN, B. F. ERLANGER, J. Gen. Physiol. 1980, 75, 207–232. 110 D. A. JAMES, D. C. BURNS, G. A. WOOLLEY, Protein Eng. 2001, 14, 983–991. 111 N. MURANAKA, T. HOHSAKA, M. SISIDO, FEBS Lett. 2002, 510, 10–12. ˜ O, M.E. V´AZQUEZ, J. 112 A.M. CAAMAN MART´INEZ-COSTAS, L. CASTEDO, J.L. ˜ AS, Angew. Chem. Int. Ed. MASCAREN 2000, 39, 3104–3107. 113 J. R. KUMITA, O. S. SMART, G. A. WOOLLEY, Proc. Natl Acad. Sci. USA 2000, 97, 3803–3808. 114 D. G. FLINT, J. R. KUMITA, O. S. SMART, G. A. WOOLLEY, Chem. Biol. 2002, 9, 391–397. 115 J. R. KUMITA, D. G. FLINT, O. S. SMART, G. A. WOOLLEY, Protein Eng. 2002, 15, 561–569.
49
50 Light-activated Proteins
116 Z. ZHANG, D. C. BURNS, J. R. KUMITA, O. S. SMART, G. A. WOOLLEY, Bioconjugate Chem. 2003, 14, 824–829. 117 T. SHIMOBOJI, E. LARENAS, T. FOWLER, A. S. HOFFMAN, P. S. STAYTON, Bioconjugate Chem. 2003, 14, 517–525. 118 T. SHIMOBOJI, Z. L. DING, P. S. STAYTON, A. S. HOFFMAN, Bioconjugate Chem. 2002, 13, 915–919. 119 C. G. BOCHET, Angew. Chem. Int. Ed. 2001, 40, 2071–2073. 120 A. BLANC, C. G. BOCHET, J. Org. Chem. 2002, 67, 5567–5577. 121 S. SHIMIZU-SATO, E. HUQ, J. M. TEPPERMAN, P. H. QUAIL, Nat. Biotechnol. 2002, 20, 1041–1044. 122 G. H. PATTERSON, J. LIPPINCOTT-SCHWARTZ, Science 2002, 297, 1873–1877. 123 G. H. PATTERSON, J. LIPPINCOTT-SCHWARTZ, Methods 2004, 32, 445–450. 124 D. G. JAY, Proc. Natl Acad. Sci. USA 1988, 85, 5454–5458. 125 K. G. LINDEN, J. C. LIAO, D. G. JAY, Biophys. J. 1992, 61, 956–962. 126 J. C. LIAO, J. ROIDER, D. G. JAY, Proc. Natl Acad. Sci. USA 1994, 91, 2659– 2663. 127 O. TOUR, R. M. MEIJER, D. A. ZACHARIAS, S. R. ADAMS, R. Y. TSIEN, Nat. Biotechnol. 2003, 21, 1505–1508. 128 R. DORN, S. QUABIS, G. LEUCHS, Phys. Rev. Lett. 2003, 91, 233901–233901-4. 129 D. L. PETTIT, S. S. H. WANG, K. R. GEE, G. J. AUGUSTINE, Neuron 1997, 19, 465. 130 K. HAMAD-SCHIFFERLI, J. J. SCHWARTZ, A. T. SANTOS, S. ZHANG, J. M. JACOBSON, Nature 2002, 415, 152–155. 131 J. CHING, K. I. VOIVODOV, T. W. HUTCHENS, Bioconjugate Chem. 1996, 7, 525–528. 132 M. A. HOLDEN, P. S. CREMER, J. Am. Chem. Soc. 2003, 125, 8074–8075. 133 A. SPECHT, T. URSBY, M. WEIK, L. PENG, J. KROON, D. BOURGEOIS, M. GOELDNER, ChemBioChem 2001, 2, 845–848. 134 T. URSBY, M. WEIK, E. FIORAVANTI, M. DELARUE, M. GOELDNER, D. BOURGEOIS, Acta Crystallogr. D 2002, 58, 607–614. 135 H. BAYLEY, F. GASPARRO, R. EDELSON, Trends Pharmacol. Sci. 1987, 8, 138– 143.
136 S. YEMUL, C. BERGER, A. ESTABROOK, S. SUAREZ, R. EDELSON, H. BAYLEY, Proc. Natl Acad. Sci. USA 1987, 84, 246–250. 137 R. A. FIRESTONE, Bioconjugate Chem. 1994, 5, 105–113. 138 F. BERTHIAUME, S. R. REIKEN, M. TONER, R. G. TOMPKINS, M. L. YARMUSH, Biotechnology 1994, 12, 703–706. 139 J. G. ARROYO, P. B. JONES, N. A. PORTER, D. L. HATCHELL, Thromb. Haemostasis 1997, 78, 791–793. 140 M. W. PANDORI, D. A. HOBSON, J. OLEJNIK, E. KRZYMANSKA-OLEJNIK, K. J. ROTHSCHILD, A. A. PALMER, T. J. PHILLIPS, T. SANO, Chem. Biol. 2002, 9, 567–573. 141 J. A. BARLTROP, P. J. PLANT, P. SCHOFIELD, Chem. Commun. 1966, 822–823. 142 J. C. MILLER, S. K. SILVERMAN, P. M. ENGLAND, D. A. DOUGHERTY, H. A. LESTER, Neuron 1998, 20, 619–624. 143 Y. TONG, G. S. BRANDT, M. LI, G. SHAPOVALOV, E. SLIMKO, A. KARSCHIN, D. A. DOUGHERTY, H. A. LESTER, J. Gen. Physiol. 2001, 117, 103–118. 144 J. W. WALKER, J. A. MCCRAY, G. P. HESS, Biochemistry 1986, 25, 1799–1805. 145 A. SPECHT, M. GOELDNER, Angew. Chem. Int. Ed. 2004, 43, 2008–2012. 146 J. M. NERBONNE, S. RICHARD, J. NARGEOT, H. A. LESTER, Nature 1984, 310, 74–76. 147 J. S. WOOD, M. KOSZELAK, J. LIU, D. S. LAWRENCE, J. Am. Chem. Soc. 1998, 120, 7145–7146. 148 A. C. PEASE, D. SOLAS, E. J. SULLIVAN, M. T. CRONIN, C. P. HOLMES, S. P. FODOR, Proc. Natl Acad. Sci. USA 1994, 91, 5022–5026. 149 J. HEBERT, D. GRAVEL, Can. J. Chem. 1974, 52, 187–189. 150 D. GRAVEL, J. HEBERT, D. THORAVAL, Can. J. Chem. 1983, 61, 400–410. 151 M. J. AURELL, C. BOIX, M. L. CEITA, C. LLOPIS, A. TORTAJADA, R. MESTRES, J. Chem. Res. 1995, 11, 452–453. 152 L. CEITA, A. K. MAITI, R. MESTRES, A. TORTAJADA, J. Chem. Res. 2001, 10, 403. 153 A. BLANC, C. G. BOCHET, J. Org. Chem. 2003, 68, 1138–1141. 154 S. WALBERT, W. PFLEIDERER, U. E. STEINER, Helv. Chim. Acta 2001, 84, 1601–1611. 155 T. J. ALBERT, J. NORTON, M. OTT, T. RICHMOND, K. NUWAYSIR, E. F.
References
156
157 158
159 160 161
162 163 164 165
166 167 168
169
170 171
172
173
174 175
NUWAYSIR, K.-P. STENGELE, R. D. GREEN, Nucleic Acids Res. 2003, 31, e35. B. AMIT, D. A. BEN-EFRAIM, A. PATCHORNIK, J. Am. Chem. Soc. 1976, 98, 843–844. G. PAPAGEORGIOU, J. E. T. CORRIE, Tetrahedron 2000, 56, 8197–8205. M. MATSUZAKI, G. C. R. ELLIS-DAVIES, T. NEMOTO, Y. MIYASHITA, M. LINO, H. KASAI, Nat. Neurosci. 2001, 4, 1086–1092. S. LOUDWIG, M. GOELDNER, Tetrahedron Lett. 2001, 42, 7957–7959. J. C. SHEEHAN, R. M. WILSON, J. Am. Chem. Soc. 1964, 86, 5277–5281. J. C. SHEEHAN, R. M. WILSON, A. W. OXFORD, J. Am. Chem. Soc. 1971, 93, 7222–7228. R. S. GIVENS, B. MATUSZEWSKI, J. Am. Chem. Soc. 1984, 106, 6860–6861. M. C. PIRRUNG, J. C. BRADLEY, J. Org. Chem. 1995, 60, 1116–1117. M. C. PIRRUNG, C.-Y. HUANG, Tetrahedron Lett. 1995, 36, 5883–5884. J. F. CAMERON, C. G. WILLSON, J. M. J. FRECHET, J. Am. Chem. Soc. 1996, 118, 12925–12937. C. S. RAJESH, R. S. GIVENS, J. WIRZ, J. Am. Chem. Soc. 2000, 122, 611–618. J. C. SHEEHAN, K. UMEZAWA, J. Org. Chem. 1973, 38, 3771–3774. R. S. GIVENS, J. F. W. WEBER, A. H. JUNG, C.-H. PARK, Methods Enzymol. 1998, 291, 1–29. A. SPECHT, S. LOUDWIG, L. PENG, M. GOELDNER, Tetrahedron Lett. 2002, 43, 8947–8950. T. FURUTA, M. IWAMURA, Methods Enzymol. 1998, 291, 50–63. T. FURUTA, H. TORIGAI, M. SUGIMOTO, M. IWAMURA, J. Org. Chem. 1995, 60, 3953–3956. V. HAGEN, J. BENDIG, S. FRINGS, T. ECKARDT, S. HELM, D. REUTER, U. B. KAUPP, Angew. Chem. Int. Ed. Engl. 2001, 40, 1045–1048. D. GEISSLER, W. KRESSE, B. WIESNER, J. BENDIG, H. KETTENMAN, V. HAGEN, ChemBioChem 2003, 4, 162–170. T. FURUTA, Y. HIRAYAMA, M. IWAMURA, Org. Lett. 2001, 3, 1809–1812. A. BANERJEE, C. GREWER, L. RAMAKRISHNAN, J. JAGER, A. GAMEIRO, H.-G.A. BREITINGER, K. R. GEE, B. K.
176
177
178
179 180
181 182
183
184
185 186 187 188
189
190
191
192
193
CARPENTER, G. P. HESS, J. Org. Chem. 2003, 68, 8361–8367. D. RAMESH, R. WIEBOLDT, L. NIU, B. K. CARPENTER, G. P. HESS, Proc. Natl Acad. Sci. USA 1993, 90, 11074–11078. P. M. KOENIGS, B. C. FAUST, N. A. PORTER, J. Am. Chem. Soc. 1993, 115, 9371. P. B. JONES, M. P. POLLASTRI, N. A. PORTER, J. Org. Chem. 1996, 61, 9455–9461. P. B. JONES, N. A. PORTER, J. Am. Chem. Soc. 1999, 121, 2753–2761. A. D. TURNER, S. V. PIZZO, G. ROZAKIS, N. A. PORTER, J. Am. Chem. Soc. 1988, 110, 244–250. N. A. PORTER, J. D. BRUHNKE, Photochem. Photobiol. 1990, 51, 37–43. D. A. PRATT, D. F. UNDERWOOD, S. J. ROSENTHAL, N. A. PORTER, Abstr. 221st Am. Chem. Soc. 2001, ORGN-703. A. H. KOSSEL, S. B. CAMBRIDGE, U. WAGNER, T. BONHOEFFER, Proc. Natl Acad. Sci. USA 2001, 98, 14702–14707. P. ROY, Z. RAJFUR, D. JONES, G. MARRIOTT, L. LOEW, K. JACOBSON, J. Cell. Biol. 2001, 153, 1035–1048. G. MARRIOTT, M. HEIDECKER, Biochemistry 1996, 35, 3170–3174. J. MINDEN, R. NAMBA, J. MERGLIANO, S. CAMBRIDGE, Sci. STKE. 2000, 62, PL1. S. B. CAMBRIDGE, R. L. DAVIS, J. S. MINDEN, Science 1997, 277, 825–828. K. D. PHILIPSON, J. P. GALLIVAN, G. S. BRANDT, D. A. DOUGHERTY, H. A. LESTER, Am. J. Physiol. Cell Physiol. 2001, 281, 195–206. M. GHOSH X. SONG, G. MOUNEIMNE, M. SIDANI, D. S. LAWRENCE, J. S. CONDEELIS, Science 2004, 304, 743–746. R. S. ROCK, K. C. HANSEN, R. W. LARSEN, M. ENDO, K. NAKAYAMA, Y. KAIDA, T. S.I. CHAN, Chem. Phys. 2004, 307, 201. R. P. Y. CHEN, J. J. T. HUANG, H.-L. CHEN, H. JAN, M. VELUSAMY, C.-T. LEE, W. FANN, R. W. LARSEN, S. I. CHAN, Proc. Natl. Acad. Sci. USA 2004, 101 7305–7310. N. WU, A. DEITERS, T. A. CROPP, D. KING, P. G. SCHULTZ, J. Am. Chem. Soc. 2004, 126, 14306–14307. M. ENDO, K. NAKAYAMA, T. MAJIMA, J. Org. Chem. 2004, 69, 4292–4298.
51
52 Light-activated Proteins
194 K. NAKAYAMA, M. ENDO, T. MAJIMA, Chem.Commun. 2004, in press. 195 M. ENDO, K. NAKAYAMA, Y. KAIDA, T. MAJIMA, Angew. Chem. Int. Ed. 2004, 43, 5643–5645. 196 C. MARINZI, J. OFFER, R. LONGHI, P. E. DAWSON, Bioorg. Med. Chem. 2004, 12, 2749–2757.
197 M. E. HAHN, T. W. MUIR, Angew. Chem. Int. Ed. 2004, 43, 5800–5803. 198 G. PAPAGEORGIOU, D. OGDEN, J. E. T. CORRIE, J. Org. Chem. 2004, 69, 7228–7233. 199 S. R. ADAMS, R. E. CAMPBELL, L. A. GROSS, B. R. MARTIN, G. K. WALKUP, Y. YAO, J. LLOPIS, R. Y. TSIEN, J. Am. Chem. Soc. 2002, 124, 6063–6076.
1
Ubiquitin and Ubiquitin-like Protein Conjugation Mark Hochstrasser
Yale University, New Haven, USA
Originally published in: Protein Degradation, Volume 2. Edited by R. John Mayer, Aaron Ciechanover and Martin Rechsteiner. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31130-0 Abstract
Protein modification by ubiquitin and ubiquitin-like proteins (Ubls) plays a pervasive role in eukaryotic cell regulation. One aim of this article is to survey the ubiquitin and Ubl conjugation systems in order to highlight key mechanistic and functional features. Another is to discuss some of the gaps in our understanding of both the evolutionary origins of these conjugation systems and the changes Ubl attachment can impart on a conjugated protein. The ubiquitin and Ubl systems use related enzymes to activate and attach ubiquitin and Ubls to proteins (and, in at least one case, to phospholipids). Most ubiquitin and Ubl attachments are dynamic, with efficient reversal of the modifications by a battery of deconjugating enzymes. The versatility of these systems is reflected in the enormous array of biological processes they control. It is likely that ubiquitin and Ubl attachments function fundamentally as a means of regulating macromolecular interactions. Best known is the ability of polyubiquitinated protein to bind with high affinity to polyubiquitin receptor sites on the proteasome, causing the rapid degradation of the tagged protein. Specific examples of physiological deployment of ubiquitin and Ubl attachment will be used to illustrate distinct mechanisms of regulation by these highly conserved protein modifiers.
1 Introduction
The biological functions of many proteins are altered by their covalent attachment to polypeptide modifiers [1–5]. Among these types of modification, probably the best known is ubiquitination. Ubiquitin can target proteins for degradation by the 26S proteasome, but additional effects of protein ubiquitination are now well documented. Ubiquitin is joined reversibly to proteins by amide linkage between the Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Ubiquitin and Ubiquitin-like Protein Conjugation
carboxy terminus of ubiquitin and primary amino groups of the acceptor proteins [3, 5]. The primary amine is usually a lysine ε-amino group (the bond with ubiquitin is then called an isopeptide bond) but can also be the N-terminal Nα amino group [6]. 1.1 The Ubiquitin Conjugation Pathway
The C-terminus of ubiquitin must be activated before it can form a covalent bond with another protein [3, 5] (see Figure ??, which depicts a more general Ubl cycle). Initially, the ubiquitin is adenylated by the ubiquitin-activating enzyme E1. A highenergy mixed anhydride bond links the ubiquitin-AMP, which remains bound to the E1. This bond is then attacked by a sulfhydryl group of a cysteine in the E1 enzyme, yielding a high energy E1-ubiquitin thioester intermediate. The activated ubiquitin is subsequently passed to one of a large number of distinct ubiquitin-conjugating (E2) enzymes by transthiolation to a conserved cysteine side chain of the E2. The E2 proteins catalyze substrate ubiquitination in conjunction with a ubiquitin-protein ligase (E3). For one structural class of E3 proteins (the “HECT domain” E3s), the ubiquitin is first transferred to a conserved cysteine of the E3 before the final transfer to a substrate amine. For most other ubiquitination reactions, a primary role for the E3 appears to be as an adaptor that positions the substrate in close proximity to the reactive E2-ubiquitin thioester bond. The majority of such E3s are characterized by a RING domain, which coordinates a pair of zinc ions and participates in E2 binding [7]. Additional roles for E3s in the catalytic cycle, such as allosteric activation of the E2, remain a distinct possibility [8, 9]. For instance, all E2s have an asparagine residue upstream of the active site cysteine. This asparagine is implicated in the formation of an oxyanion hole that stabilizes the tetrahedral intermediate formed by nucleophilic attack of a substrate amino group on the activated carbonyl of ubiquitin [9]. However, the side chain of the asparagine is fully hydrogen-bonded and oriented away from the active cysteine in the atomic structures determined for isolated E2 enzymes. E3 binding to the E2 and/or ubiquitin thioester formation on the E2 may trigger local structural changes that allow rotation of the E2 asparagine side chain to a position where it can help generate a functional oxyanion hole [10]. 1.2 Ubiquitin Polymers
In many cases, particularly for proteolytic substrates, more than one ubiquitin is attached to the substrate protein. These ubiquitin molecules can be attached to different substrate amino groups, or they can be attached to each other to form a poly-ubiquitin chain that is linked to a single substrate site [11, 12]. The ubiquitin molecules in these polymers are linked through the lysine side chain of one ubiquitin with the C-terminal carboxyl of the next ubiquitin. Ubiquitin has several
1 Introduction 3
different lysines that contribute to such linkages. For instance, the polyubiquitinated proteins recognized by the proteasome usually have ubiquitin Lys48-linked chains, and the chain must include at least four ubiquitins for tight binding to the proteasome [13]. Ubiquitin chain formation is also essential for certain types of DNA repair and signal transduction pathways, but these chains have Lys63 linkages and do not target the proteins to the proteasome. Ubiquitin polymers of distinct topology are generally thought to have intrinsically different binding affinities for particular target proteins [14]. However, some proteins, such as the S5a proteasome sub-unit, can bind different types of ubiquitin chains with comparable affinity [15]. Monoubiquitination has distinct signaling functions, as will be discussed below. 1.3 Ubiquitin Attachment Dynamics
Ubiquitinated proteins are in a dynamic state, subject to either further rounds of ubiquitin addition or ubiquitin removal by deubiquitinating enzymes (DUBs) (analogous enzymes act on Ubls; see Figure ??). The DUBs comprise one of the largest classes of ubiquitin-system enzymes, but their individual functions are just now beginning to come into view [16]. Many DUBs have negative roles in ubiquitindependent signaling. For example, removal of a polyubiquitin chain from a proteolytic substrate prior to its binding to the proteasome will prevent degradation of the substrate. Several DUBs have been demonstrated to have substrate-specific deubiquitinating activity. An illustrative example is the herpesvirus-associated ubiquitin-specific protease (HAUSP). HAUSP can specifically deubiquitinate the p53 tumor suppressor protein; this limits p53 degradation, thereby enhancing p53 pro-apoptotic and growth inhibitory functions [17]. Other DUBs can have positive functions in ubiquitin-dependent processes. The best-known examples of this are the enzymes that recover ubiquitin from proteasome-bound polyubiquitinated substrates [16, 18]. Failure to remove the polyubiquitin from the tagged proteins severely impedes their degradation, presumably because it is difficult to unfold and degrade the highly structured ubiquitin molecules [19]. Ubiquitin (and most Ubls) is synthesized in C-terminally extended precursor forms, which are also processed by DUBs to expose a terminal Gly-Gly dipeptide that is necessary for ubiquitin activation and conjugation. The fates of ubiquitinated proteins vary greatly. Ubiquitin-induced functional changes depend on whether a single ubiquitin or a polyubiquitin chain is attached (and, if a chain, with what topology), where and when in the cell the modification occurs, and exactly what protein receives the modification. Similar considerations apply to the Ubls and their targets, although Ubl chain formation is not widely observed. Dynamic modification of proteins by ubiquitin and Ubls allows reversible switches between different functional states, as is true for other transient covalent protein modifications such as phosphorylation. A major focus of the remainder of this article will be on the general properties of the resulting ubiquitin and Ubl conjugates and the functional consequences of these modifications.
4 Ubiquitin and Ubiquitin-like Protein Conjugation
2 Ubls: A Typical Modification Cycle by an Atypical Set of Modifiers
The existence of potential ubiquitin-related protein modifiers first became apparent in the late 1980s with the discovery that an interferon-stimulated gene product of 15 kDa, or ISG15, shares significant sequence similarity with ubiquitin and is recognized by anti-ubiquitin antibodies [20]. In 1992, ISG15 was shown to modify other proteins by what seemed likely to be a similar post-translational enzymatic mechanism [21]. Like ubiquitin, ISG15 is synthesized in precursor form, and its C-terminal tail is processed off to expose a diglycine motif that is essential for ISG15-protein conjugation [22]. The mature protein is composed of two domains, both with substantial sequence and structural similarity to ubiquitin (Table 1) [23]. Although ISG15 is the prototypical Ubl, it remains one of the least understood. Only very recently have the ISG15 E1 and E2 enzymes been identified [24–26]. These enzymes, like the Ubl itself, are strongly induced by type I interferons, and they are presumed to be important in antiviral responses; to date, however, the only genetic evidence supporting this idea is the finding that mice lacking a protease with ISG15-deconjugating activity, UBP43, have a more vigorous innate immune response to viral infections [27]. Current data also suggest that ISG15 functions in signal transduction, particularly the Jak-STAT pathway, but its exact role is unclear. Multiple ISG15-protein conjugates are likely, but only a small number of substrates have been reported so far [28]. Since the discovery of ISG15, at least 10 additional ubiquitin-related proteins have been identified that can covalently modify other macromolecules or are strongly suspected of having this ability (Table 1). The widespread occurrence of Ubls underscores the potential regulatory importance of protein attachment to other proteins (or to lipids). Ubiquitin can modify hundreds if not thousands of different proteins [29, 30]. Some Ubls, such as small ubiquitin-related modifier (SUMO), appear to rival ubiquitin in the number and diversity of substrates targeted, while others are likely to have a very limited number of substrates. Atg12 (autophagy protein 12), for example, is thought to have but a single target (Atg5), and Atg8 is specifically attached to phosphatidylethanolamine, a phospholipid. Most Ubl modification pathways utilize highly similar enzymatic mechanisms involving E1-like, E2-like, and often E3-like enzymes as well as specific Ubldeconjugating enzymes (Figure ??). Several unusual ubiquitin and Ubl conjugation mechanisms have been proposed as well. These noncanonical mechanisms, which at this point still have little supporting experimental evidence, range from ubiquitin hydrolases effectively working in reverse [31] to a set of unusual ciliate self-splicing polyproteins [32]. The ciliate polyproteins consist of a series of Ubl domains flanked by self-splicing bacterial intein-like (BIL) domains. The BIL domains are postulated to excise the flanking Ubl segments and attach them to substrates during autocatalytic in cis processing reactions. Such atypical mechanisms most likely account for only a very small percentage of Ubl conjugation reactions. The major pathways of protein conjugation appear to have evolved by repeated rounds of duplication and
2 Ubls: A Typical Modification Cycle by an Atypical Set of Modifiers 5
diversification of enzymes and protein modifiers derived from ancient enzymecofactor biosynthetic pathways (see below). It should be noted that there are also many ubiquitin-related proteins in which a ubiquitin-like domain (UBL) is built into a larger polypeptide, but the UBL is neither excised nor attached to other proteins. Such UBLs may impart properties on a protein similar to those conferred by a transferable Ubl, but the UBL is locked into a single target. The UBL-containing proteins include several proteins that also have ubiquitin-binding domains. These multi-domain proteins are thought to help
Table 1 Known or suspected Ubls
Modifieraa
Identity with Ub (%)
E1a
E2a
Comments
Reference
Ubiquitin (Ub)
100
Uba1
Many
123
ISG15 Rub1/NEDD8
32/37b 55
Ube1L Uba3-Ula1
UbcH8 Ubc12
Smt3/SUMO1-4
18
Uba2-Aos1
Ubc9
Atg12
NSc
Atg7
Atg10
Atg8
NS
Atg7
Atg3
Urm1
NS
Uba4
−
UFM1 FUBI/MNSFβ
NS 38
Uba5 −
Ufc1 −
FAT10 Ubl-1
32/40b 40
− −
− −
Hub1
22
−
−
BUBL1, 2
variable (up to 80%)
−
−
SF3a120
30
−
−
Oligo(A) synthetase
42
−
−
Viral form more diverged (75% identity) First Ubl identified Substrates: cullins, p53 Vertebrates have 4 distinct SUMO genes ∼20% identical to ATG8 3 known human isoforms; has the Ub fold Related to MoaD, ThiS Has the Ub fold Derived from ribosomal precursor Substrates unknown Nematode ribosomal precursor Might bind only noncovalently to targets Ciliate putative autoprocessed proteins Ubl at C-terminus; no data for conjugation Ubl at C-terminus; no data for conjugation
a
21 110 111 112 33
44 115 113 116 114 117–119
32
120 121
Yeast names are listed for E1s and E2s except for the ISG15 and UFM1 systems, which are not found in S. cerevisiae; for Ubl names, the yeast names are given (listed first if a vertebrate ortholog is known and goes by a different name) if present in yeast; b Two Ub-related domains; c Not statistically significant.
6 Ubiquitin and Ubiquitin-like Protein Conjugation
transfer polyubiquitinated proteins from E2 and E3 complexes to the proteasome and will be discussed later in the chpter. 2.1 Some Unusual Ubl Conjugation Features
Among the eight Ubl conjugation pathways for which E1 enzymes have been identified (the first eight entries in Table 1), there are several with unusual properties that deserve additional comment. First, the Atg8 and Atg12 Ubls share a single E1like enzyme but have different E2-like proteins [33]. This remains the only known example in which a single E1, Atg7, can activate two different Ubls. The ability of Atg7 to transfer Atg8 and Atg12 to distinct E2s suggests that these E2s bind to the E1-Ubl complex in ways that are productive for transfer only of the E2’s cognate Ubl. The structural basis of this discrimination remains to be determined. Second, there is now also an example of different E1 enzymes specific for different Ubls sharing the same E2 enzyme to effect cognate substrate modification. UBEL1, the E1 for ISG15, and UBA1, the E1 for ubiquitin, can both use the same E2, UbcH8 [25, 26]. How Ubl substrate targeting specificity is maintained is an important question, although it might be explained largely by mass action. ISG15 and UBEL1 are only strongly expressed when cells are induced with interferon. Potentially, activated ISG15 would be present at sufficiently high levels under these conditions to ensure that UbcH8 picks up a sizeable amount of it and, together with ISG15specific E3s, which might also be interferon-inducible, will be able to transfer it to the appropriate targets. No such E3 has yet been described, but it would be predicted to be able to recognize ISG15-charged, but not ubiquitin-charged, UbcH8. For the SUMO pathway, recent studies reveal direct contacts between an E3 and the SUMO protein [10, 34]. Similarly, ISG15-specific E3s might be able to use direct ISG15 contacts to identify the cognate ISG15-E2 thioester even if the same E2 is sometimes linked to ubiquitin. In the remainder of the review, several areas will be emphasized. First, I will expand on earlier speculations about the possible evolutionary origins of the ubiquitin and Ubl modification systems [1]. Such an evolutionary perspective is useful when potential mechanistic variations in Ubl activation and conjugation are considered. It also suggests an explanation for the otherwise mysterious existence of the widespread E1-E2 couple in Ubl conjugation. Second, I will take examples from both the ubiquitin and Ubl literature to highlight common and divergent themes regarding the biochemical functions of ubiquitin and Ubl ligation. In other words, the emphasis will be on what happens after the modifier is attached to its target rather than focusing on what determines how a target is chosen for modification in the first place. The latter topic is extensively reviewed elsewhere in these volumes. Finally, I will attempt to generalize these examples to give a broader account of the biochemical and physiological consequences of ubiquitin and Ubl conjugation.
3
3 Origins of the Ubiquitin System 7
Origins of the Ubiquitin System
For many years, the evolutionary antecedents to the ubiquitin system were completely mysterious. Ubiquitin itself was regarded as perhaps the most highly conserved of all eukaryotic proteins, yet no eubacterial or archaeal proteins shared any obvious primary sequence similarity to it [35]. An early hint to ubiquitin’s origins came from the cloning and sequencing of the gene encoding the E1 ubiquitin-activating enzyme; the protein displays weak but significant similarity to ChlN/MoeB, an E. coli protein required for the biosynthesis of the molybdenum cofactor (Moco) [36]. At the time, the biochemical function of MoeB was unknown, and thus this similarity was not informative by itself. During the late 1990s, however, the protein sequences and catalytic mechanisms of the enzymes used to synthesize Moco (and thiamin [vitamin B1]) began to be deciphered, and intriguing similarities to ubiquitin activation were noted [37–39]. 3.1 Sulfurtransferases and Ubl Activation Enzymes
Biosynthesis of both Moco and thiamin requires insertion of sulfur atoms into their precursor forms. In each case, the sulfur is donated by a small sulfur carrier protein termed MoaD and ThisS, respectively. The donor sulfur derives from a thiocarboxylate group generated at the C-termini of these proteins (see Figure ??). MoaD and ThiS are related and, like ubiquitin, end with a pair of glycines. Most interestingly, conversion of the C-terminal glycine carboxylate to a thiocarboxylate in these proteins is preceded by C-terminal adenylation by an E1-related enzyme: MoeB for MoaD and ThiF for ThiS [38–40]. Both MoaD and ThiS were later shown to share the ubiquitin fold despite the lack of obvious overall sequence similarity to ubiquitin [41, 42]. Therefore, ubiquitin, MoaD, and ThiS are all structurally related proteins whose C-termini are activated through adenylation by homologous E1-related enzymes [5, 43]. Further evidence for an evolutionary link between these sulfur transfer systems and ubiquitin activation came from a bioinformatics analysis of proteins that might be related to MoaD or ThiS in the predicted S. cerevisiae proteome [44]. This eukaryote lacks any Moco-containing enzymes and uses a different mechanism for thiamin synthesis; therefore, the aim of the sequence searches was to identify potential new Ubls that might have been missed in scans with ubiquitin. One previously uncharacterized protein was uncovered and named ubiquitin-related modifier-1 (Urm1), although no sequence similarity to ubiquitin could be detected. By yeast two-hybrid interaction screening with Urm1, Furukawa et al. [44] then identified a novel E1-related protein, which they named Uba4. Uba4 is more closely related to ThiF and MoeB than it is to the E1 ubiquitin-activating enzyme. Nevertheless, Uba4 appears to form a thioester intermediate with Urm1 and to stimulate covalent addition of Urm1 to cellular proteins. (Subsequent analysis revealed that a major target of Urm1 conjugation is Ahp1, a thiol-specific antioxidant protein [45].) The conclusion from these findings is that Urm1 and Uba4 function as a Ubl-protein conjugation system despite bearing much closer sequence relatedness to biosyn-
8 Ubiquitin and Ubiquitin-like Protein Conjugation
thetic sulfur transfer factors than to ubiquitin and ubiquitin-activating enzyme. As such, the Urm1–Uba4 system might represent a kind of “missing link” in the evolution of the ubiquitin system from these sulfur transfer pathways [1, 44]. 3.2 The E1-E2 Couple
Of course, the ubiquitin system involves a number of additional enzymes beyond the E1-activating enzyme. E2, E3, and ubiquitin- and Ubl-deconjugating enzymes are also central components, although E3s and Ubl-cleaving enzymes might not be part of all Ubl conjugation systems. On the other hand, an E1–E2 couple may be obligatory in all conjugation pathways that utilize an E1-like enzyme (Figure ??). For the eight Ubl conjugation systems in which an E1 has been identified (Table 1), all but one – the Urm1 system – is known to require a separate E2 protein. The transfer of Ubls from a cysteine side chain of an E1 to a cysteine on an E2 is not chemically necessary insofar as the Ubl C-terminus is already activated when it is bound to the E1, and the transfer to E2 also yields an enzyme-Ubl thioester bond. Enhanced regulatory flexibility or substrate specificity might help to explain the existence of E2s. Multiple E2 isozymes characterize the ubiquitin pathway, and in conjunction with different E3s, this enzyme diversity might increase the range or specificity of substrate modification. An E2 might also be needed to generate polymerized forms of ubiquitin or of certain Ubls [46, 47]. However, these explanations do not account for the evolutionary appearance of E2s in the first place, and no protein obviously related in sequence to any E2s has been identified in the biosynthetic sulfur donor pathways. A possible exception to the E2 requirement for E1-catalyzed Ubl-protein conjugation is Urm1. Because Urm1 ligation appears to be poised, evolutionarily speaking, between the MoaD/ThiS activation and Ubl-protein ligation mechanisms, this exception is potentially instructive. A unique and conserved feature of Uba4 (the Urm1 E1 enzyme) compared to other E1-like proteins is the presence of a rhodanese homology domain (RHD) in the protein (Figure ??). Rhodanese and a number of RHD proteins are sulfurtransferases that form a persulfide (–S–S–H) on their active-site cysteine. Many MoeB family proteins, such as human MOCS3, have a similar domain organization, with an E1-like domain followed by an RHD. Based on these and other similarities, we previously proposed that thiocarboxylate formation in MoaD (MOCS2A in humans) catalyzed by MoeB/MOCS3 is closely related mechanistically to Uba4-catalyzed Urm1 activation and transfer [1] (see Figure ??). The Uba4 RHD in this scenario functions as a kind of built-in E2. In the part of this speculative model pertaining to the sulfurtransferase, it was suggested that a persulfide is generated at the RHD active site of MoeB and that this attacks the activated MoaD to form an acyl disulfide intermediate (Figure ??, left side). In a second step, reductive cleavage of the MoaD acyl disulfide by the conserved E1-domain cysteine releases the MoaD thiocarboxylate. Recent experiments on human MOCS2A thiocarboxylate formation support many features of this model [48, 49]. We and others [1, 37, 39] had also initially suggested that the cysteine in the E1-like domain forms a thioester with MoaD. This appears not to occur,
4 Ubiquitin-binding Domains and Ubiquitin Receptors in the Proteasome Pathway
and the importance of this cysteine for Moco synthesis varies between species. The model in Figure ?? incorporates this revision. Interestingly, an acyl disulfide intermediate linking E. coli ThiF and ThiS has also been isolated, but this occurs on the conserved E1 domain cysteine [50]; such a covalent complex is not universally observed [51]. What are the implications for the Uba4 mechanism of Urm1 activation? As noted, the RHD of Uba4 was proposed to function as a built-in E2, using its active-site cysteine to attack a thioester intermediate on the E1-like domain [1] (Figure ??). We have found that the conserved RHD cysteine of Uba4 is required for its physiological function and for Urm1-protein conjugation in vivo (I. Velichutina, M. Hochstrasser, unpublished data). However, it remains to be shown that Urm1 is transferred onto the RHD or even that it forms a thioester at the E1 site, as has been assumed (Figure ??, right side). More generally, the E1-to-E2 transthiolation of Ubls may reflect the derivation of these protein-conjugation systems from sulfurtransferases that mobilize sulfur from a protein-linked persulfide through reductive cleavage by a second enzyme thiol group. Urm1 conjugation may retain features of the more ancient sulfurtransferases, while a process of “molecular takeover” might have occurred for other Ubl ligation pathways such that the RHD was replaced by a distinct E2 species. It is noteworthy that the E2s for the Atg8 and Atg12 pathways, while very weakly related to each other, are not detectably similar in sequence to the identified E2s for the remaining Ubl pathways, suggesting the possibility of convergent evolution of E2s from at least two separate lineages.
4 Ubiquitin-binding Domains and Ubiquitin Receptors in the Proteasome Pathway
Most early work on ubiquitin focused on its role in proteolysis [52–54]. Once it became clear that the 26S proteasome was responsible for the degradation of polyubiquitinated proteins, an obvious question was how the polyubiquitin chain facilitated degradation of the substrate protein. Several functions for such chains were proposed, but the most patent was that they enhanced the association between substrate and proteasome by directly binding to the proteasome. Elegant biochemical experiments eventually demonstrated that direct binding between the Lys48-linked polyubiquitin chain and the proteasome could fully account for the observed affinity of model polyubiquitinated proteins for the protease complex [13]. 4.1 A Proteasome “Ubiquitin Receptor”
Beginning with the assumption that a polyubiquitinated protein could directly bind to a single subunit of the proteasome, Rechsteiner and colleagues used farWestern analysis to search for a ubiquitin receptor within the proteasome [55]. The subunits of purified human 26S proteasomes, which consist of one or two 19S regulatory complexes bound to a 20S proteasome core, were resolved by denaturing
9
10 Ubiquitin and Ubiquitin-like Protein Conjugation
gel electrophoresis, blotted onto a membrane, and incubated with a radiolabeled polyubiquitinated protein. Remarkably, a single subunit of the 19S regulatory complex called S5a (Rpn10 in yeast) bound tightly to the radiolabeled substrate despite having been denatured initially in SDS and separated from the other subunits of the complex. Binding in solution was later shown as well. Two related ∼30-residue hydrophobic segments in S5a are responsible for the binding to polyubiquitin [56]. A subsequent bioinformatics analysis recognized a more general ∼20-residue core related to the S5a ubiquitin-binding element, and this element was christened the ubiquitin interaction motif (UIM) [57]. 4.2 A Plethora of Ubiquitin-binding Domains
Since the description of the UIM, a substantial number of distinct ubiquitin-binding modules have been discovered by a combination of bioinformatic, biochemical, and structural studies [58]. Ubiquitin-binding domains include the UBA, CUE, UEV, NZF, DAUP/ZnF-UBP/PAZ, and GAT domains [59–61]. These domains range between∼35 and∼145 amino acids in length and vary considerably in structure. Nevertheless, several generalizations unite them. First, all of the structurally characterized ubiquitin-binding proteins contact the same general region on the ubiquitin molecule. The interface centers on a hydrophobic surface on the β-sheet of ubiquitin, which includes Ile44 and is sometimes called the Ile44 face. Hydrophobic interactions dominate the binding. Second, the binding to a single ubiquitin is generally very weak, with apparent dissociation constants ranging from ∼20 mM to nearly 1 mM. Despite this unimpressive affinity, mutational studies have made clear the physiological relevance of these weak binding interactions in many cases [58, 59]. Amplification of the ubiquitin signal by linking multiple ubiquitin moieties into a chain can have dramatic effects on the affinity or avidity of interaction with target proteins. Again, the first and probably clearest example comes from studies on the proteasome and S5a/Rpn10. In the original far-Western analyses that identified S5a as a ubiquitin-binding subunit, binding of S5a to ubiquitin chains that had at least four ubiquitin units was clearly far tighter than to shorter chains [55]. Later, quantitative studies of Lys48-ubiquitin chain binding to full 26S proteasomes revealed a similar discontinuity in binding affinity [13]. Using competitor ubiquitin chains of various lengths, Thrower et al. [13] measured the inhibition of degradation of a model substrate. They found that tetrameric chains displayed very strong inhibition (Ki ∼ 170 nM), i.e., high affinity, whereas inhibition was extremely weak with trimeric chains (Ki ∼ 1.9 µM) and undetectable with dimeric chains (Ki > 15 µM). Such nonlinear effects imply that a unique binding signal is created by formation of a tetrameric chain rather than independent binding of multiple monoubiquitin moieties [62]. 4.3
4 Ubiquitin-binding Domains and Ubiquitin Receptors in the Proteasome Pathway
Ubiquitin-Conjugate Adaptor Proteins
Although S5a/Rpn10 was the first identified ubiquitin-binding subunit in the proteasome, it is not the only one, and for the degradation of many proteasome substrates, S5a/Rpn10 is completely dispensable. In S. cerevisiae for instance, loss of Rpn10 leads to only very minor phenotypic abnormalities [63]. Examination of individual substrates in vivo also suggests that only a subset are affected by loss of Rpn10 [63, 64]. Another major way by which polyubiquitinated proteins bind to the proteasome is through “adaptor proteins” that are thought to shuttle on and off the proteasome, ferrying their cargo from ubiquitin-ligase complexes or other intermediaries to sites on the 19S proteasome regulatory complex [64–67]. Although the details of the apparent substrate handoffs to and from these adaptors are still unclear, some common features of the adaptor proteins are emerging. One commonality is that the adaptors bear separate modules for proteasome and polyubiquitin binding. Three structurally related adaptors, Rad23, Dsk2, and Ddi1, were first characterized in yeast. They have an N-terminal ubiquitin-like domain (UBL) and one or two UBA elements, which, as noted earlier, are ubiquitin-binding domains. The UBL of these adaptors binds directly to either of two specific subunits in the proteasome 19S regulatory complex [68, 69]. These subunits, Rpn1 and Rnp2, share a series of leucine-rich repeats (LRRs). The LRRs, at least for Rpn1, directly bind to the UBL of Rad23, and presumably the same holds for the other adaptors [68]. Only a subset of polyubiquitinated proteasome substrates requires any of these UBL–UBA adaptor proteins [64, 70]. Genetic studies suggest that Rpn10 and Rad23 have overlapping functions in substrate targeting, but degradation of some ubiquitinated proteins requires neither [64, 71]. Such substrates might be targeted by unidentified adaptors, or they might be recognized by an integral proteasome subunit [122]. One question raised by these data is how adaptor proteins with apparently generic proteasome- and polyubiquitin-binding modules can discriminate between different polyubiquitinated substrates (or even whether there is any functional difference between targeting such substrates directly to the proteasome or to a proteasome-binding adaptor protein). Moreover, while the bipartite nature of these adaptors suggests how they can bind simultaneously to both the polyubiquitinated substrate and the proteasome, it does not indicate how substrate transfer, e.g., between the adaptor and the proteasome, takes place. A provocative recent study suggests some unexpected features of polyubiquitinated substrate transfers and also addresses the question of substrate specificity. This work indicates that the UBL of Rad23 binds not only to the proteasomal Rpn1 subunit but also to a second protein, Ufd2, which participates in polyubiquitination of a limited set of proteins [66]. Rpn1 and Ufd2 compete for Rad23 binding. One interpretation of these findings is that binding of Rad23 to a Ufd2-containing ubiquitin-ligase complex engaged in substrate polyubiquitination displaces the substrate-linked polyubiquitin chain from Ufd2. This in turn could facilitate binding of the polyubiquitin chain to the UBA domains of Rad23. The net effect will be the transfer of the polyubiquitinated substrate from Ufd2 to Rad23. Subse-
11
12 Ubiquitin and Ubiquitin-like Protein Conjugation
quently, the Rad23 UBL must somehow release Ufd2 and bind the proteasome, initiating the final transfer of substrate. Whether additional factors are required for these transfer reactions remains to be determined.
5 Ubiquitin-binding Domains and Membrane Protein Trafficking
Targeting proteins to the proteasome is not the only function of ubiquitin. An intricate array of dynamic ubiquitinated protein-protein target interactions has been described in a completely different arena, namely, membrane protein trafficking [58]. As with polyubiquitinated protein trafficking to the proteasome, sequential interactions of ubiquitinated substrate proteins with multiple ubiquitin-binding factors is a central feature of membrane protein sorting. However, in membrane protein trafficking, monoubiquitin is the predominant signal. As noted earlier, the binding of monoubiquitin to a ubiquitin-binding domain is generally weak; therefore, association is usually very transient unless additional substrate-binding sites are combined with the ubiquitin-binding motif. In principle, this allows considerable flexibility and sensitivity in the regulation of endocytosis and other membrane protein trafficking events. Initial evidence for the importance of membrane protein ubiquitination for trafficking came from yeast [72–74]. These early studies demonstrated that cell surface receptors are ubiquitinated at the plasma membrane and that this ubiquitination correlates with their endocytosis and eventual degradation in the vacuole (the yeast equivalent of the lysosome). Subsequent research from many laboratories working in a variety of organisms has revealed that monoubiquitin attachment to receptors at the cell surface is a commonly employed endocytic signal [75]. Moreover, monoubiquitination of transmembrane proteins also helps to sort them once they have entered the endosomal membrane system. An intensively investigated ubiquitin-dependent trafficking pathway in higher eukaryotes is signal transduction by receptor tyrosine kinases, the best studied of which is the epidermal growth factor (EGF) receptor [58, 76]. Beginning at the cell surface, ubiquitin modification functions at several stages in the endocytosis and intracellular sorting of EGF receptors to the lysosome, where the receptors are ultimately degraded. Thus, as with yeast plasma membrane proteins, ubiquitination is a means of downregulating or attenuating the surface expression of EGF receptors. Once in endosomal vesicles, EGF receptors either can be sorted to a recycling compartment and thence back to the plasma membrane, or they can continue on toward late endosomes [76]. Late endosomes mature by a processing of invagination and vesiculation to form multivesicular bodies (MVBs). Receptors either stay in the limiting membrane of the MVB, which allows them to make their way back to the plasma membrane, or sort into the internal vesicles. MVBs eventually fuse with lysosomes, and the internalized vesicles and their cargo receptors are destroyed by lysosomal lipases and proteases. Surprisingly, the E3 that ubiquitinates the EGF receptor at the cell surface continues to colocalize with the receptor along the endocytic pathway all the way to
5 Ubiquitin-binding Domains and Membrane Protein Trafficking
the internal vesicles of MVBs. This sustained colocalization may be crucial for maintaining the EGF receptor in its ubiquitin-modified state and for its sorting to the lysosome [77]. In yeast, there is no requirement for continued E3 association with the ubiquitinated receptor during trafficking [78]. It could be that in mammalian cells, the endocytosed receptors are more susceptible to deubiquitination by DUBs and therefore require repeated rounds of ubiquitin readdition. This sustained requirement for ubiquitin on receptor proteins reflects ubiquitindependent endosomal sorting steps that occur within both yeast and mammalian cells. When either endocytosed proteins or biosynthetic membrane cargo proteins moving from the Golgi to the vacuole or lysosome are ubiquitinated, they are sorted into the invagi-nating regions of the late endosome. This sorting requires the sequential action of at least four highly conserved protein complexes that act at the endosome surface [79]. The first of these complexes is a heteromultimer formed by Hrs and STAM. Both the Hrs and STAM subunits contain ubiquitin-binding UIMs, which are necessary for sorting of cargo into internal MVB vesicles in yeast and might directly bind ubiquitinated membrane cargo proteins [80]. In mammalian cells, the Hrs- STAM complex is required for EGF receptor sorting in the MVB and degradation in the lysosome. Based on these and other data, this ubiquitin-binding complex has been proposed to be the sorting receptor for ubiquitinated membrane proteins at the endosome [80, 81]. Following initial recognition of ubiquitinated cargo by Hrs-STAM, the cargo is passed on to the ESCRT-I complex, which also includes a ubiquitin-binding subunit, Tsg101. ESCRT-I is recruited to the endosome through direct interactions between Tsg101 and both a tetrapeptide motif in Hrs and, apparently, the monoubiquitinated membrane protein cargo [82, 83]. Therefore, as was true for trafficking of certain polyubiquitinated proteins to the proteasome, ubiquitinated membrane proteins destined for the lysosome also need to be exchanged between a series of ubiquitinbinding factors. In both cases, these escort or adaptor proteins might shield the substrate from DUBs that could otherwise prematurely remove the ubiquitin signal. By interrogating the ubiquitin-substrate conjugate multiple times along the pathway to degradation, these factors might also enhance substrate selectivity. Finally, two other complexes important for protein sorting at the late endosome membrane, ESCRT-II and ESCRT-III, help to drive the ubiquitinated cargo into invaginating membrane domains. Prior to scission of an invaginating region, a specific DUB is recruited to the ESCRT-III complex to recover the ubiquitin from the targeted receptors [84, 85]. This prevents degradation of ubiquitin and the depletion of cellular ubiquitin pools. For simplicity, only ubiquitination of the endocytic or biosynthetic membrane protein cargo itself was noted in the preceding discussion. However, this is not the only point at which ubiquitination is important in membrane protein trafficking. Multiple endocytotic factors and membrane protein-sorting factors are also monoubiquitinated, sometimes in response to the same ligands that trigger receptor ubiquitination, e.g., EGF binding to EGF receptor [58, 75]. Many of these factors also contain ubiquitin-binding domains. These observations have led to the proposal that these soluble factors form dynamic protein networks in which
13
14 Ubiquitin and Ubiquitin-like Protein Conjugation
ubiquitination of one factor allows intramolecular or intermolecular binding to other trafficking factors. This might contribute to or regulate the assembly of the ubiquitin-binding factors at plasma membrane sites for endocytosis or at endosomal sites for MVB sorting. 5.1 The MVB Pathway and RNA Virus Budding
A source of considerable excitement in the area of ubiquitin-dependent trafficking has been the discovery that enveloped RNA viruses such as HIV-1 and Ebola commandeer the MVB machinery to bud from the surface of the cell [86]. Budding is the way by which such viruses detach from the membrane of infected host cells. Outward budding of the plasma membrane and inward vesiculation of the late endosome membrane are topologically equivalent, and thus it is not entirely surprising that the same vesiculation machinery might be used in both cases. A key issue is how these viruses recruit MVB components to sites of virus particle assembly on the plasma membrane. Expression of the HIV-1 Gag protein is sufficient for membrane budding and release, and a specific region in Gag called the late or L domain is necessary for these events. Interestingly, the L domain in HIV-1 (and other retroviruses) includes a tetrapeptide similar to the sequence in Hrs mentioned earlier; as with Hrs, this sequence provides a docking site for the Tsg101 subunit of the ESCRT-1 complex. In this case, however, recruitment of ESCRT-I serves to hijack the MVB machinery to viral budding sites. In this way, the L domain of Gag binds to the MVB machinery, bringing the ESCRT complexes to Gag-associated plasma membrane sites and thereby stimulating the budding and release of virus particles. What is unclear in all this is the exact role of ubiquitin [86]. Ubiquitin is somehow necessary for enveloped RNA virus budding based on ubiquitin depletion and mutagenesis experiments. Many retroviruses incorporate ubiquitin into their virions, and their Gag proteins are ubiquitinated in infected cells. However, Gag ubiquitination does not appear to be necessary for efficient viral budding and release, at least for the HIV-1 and MuLV retroviruses [87]. Components of the MVB machinery such as Hrs are also modified by ubiquitin, so it could be that this is where ubiquitin ligation is most important for viral egress from the cell.
6 Sumoylation and SUMO-binding Motifs
Besides ubiquitin, SUMO has been the most extensively studied ubiquitin-related protein modifier, and it is already clear that it can modify many proteins, possibly hundreds, in vivo. Recent results from studies on SUMO ligation and sumoylated protein interactions underscore and extend many of the ideas about ubiquitinprotein interactions, which were summarized in the preceding two sections.
6 Sumoylation and SUMO-binding Motifs 15
RanGAP1 was the first SUMO-modified protein to be described, and it has been a useful prototype for understanding how conjugation to SUMO can modify a protein’s function [88]. RanGAP1, the Ran GTPase-activating protein, is modified at a specific lysine that conforms to what has turned out to be a widely, but not universally, utilized sumoylation consensus sequence (KxD/E, where is a large aliphatic residue and x is any residue). Protein sumoylation is also different from ubiquitination in that site-specific modification can occur with just the cognate E1 and E2 but no E3. This can be explained by the observation that the consensus sumoylation site provides a set of specific side-chain interactions with the E2 activesite pocket [89]. Attachment of SUMO to RanGAP1 dramatically changes the subcellular localization of the enzyme, resulting in its concentration on the cytoplasmic fibrils of the nuclear pore complex (NPC) where it binds to RanBP2/Nup358. Binding of SUMORanGAP1 to RanBP2 is not competed off by an excess of either free Ran-GAP1 or SUMO, suggesting either that essential RanBP2-binding determinants are present in both the RanGAP1 and SUMO portions of the SUMO-RanGAP1 conjugate or that formation of the conjugate somehow alters the conformation of RanGAP1 or SUMO to generate a RanBP2-binding site [90]. Recent work on noncovalent SUMO interactions with RanBP2 and other proteins points toward the first of these two models and has helped to identify the first SUMO-binding motif (SBM). 6.1 A SUMO-binding Motif
Unlike the description of ubiquitin-binding motifs, which has become something of a cottage industry, there are relatively few studies thus far that address noncovalent binding of SUMO to other proteins. The first published foray into this area came from a two-hybrid screen for proteins that interacted with the p53-related protein p73α [91]. Both SUMO and a series of known SUMO-interacting proteins were found, and SUMO was shown to be conjugated to p73α. Comparison of the SUMO-interacting proteins suggested a short common motif that was potentially important for SUMO binding, and this was supported by subsequent mutagenesis experiments. The consensus derived from this small group of proteins was hhXSXS/Taaa, where h is a hydrophobic amino acid, α is an acidic amino acid, and X is any residue. Building on these findings, Song et al. [34] carried out quantitative binding and NMR chemical shift perturbation studies to map the interface between SUMO and peptides bearing an SBM. With as few as nine residues, these peptides are substantially smaller than the domains that typically bind ubiquitin (20–145 residues); despite this small size and the fact that the peptides and SUMO form 1:1 complexes, the dissociation constants are between 5 µM and 10 µM, which is much tighter than the binding seen with all the ubiquitin-binding motifs discussed earlier. The refined SBM consensus sequence derived by Song et al. emphasized a central group of 3–4 hydrophobic residues. A region of RanBP2 known to bind sumoylated RanGAP1 has a sequence that conforms to this consensus. Mutagene-
16 Ubiquitin and Ubiquitin-like Protein Conjugation
sis experiments supported the significance of these hydrophobic residues but also indicated a contribution from flanking acidic residues to SUMO binding. Viewed together, the data from Minty et al. [91] and Song et al. [34] suggest that the SBM is composed primarily of a hydrophobic amino acid cluster flanked on one or both sides by acidic residues. Interestingly, a two-hybrid screen for proteins that interacted with yeast SUMO, Smt3, pointed to the presence of a potential Smt3-binding sequence related to the above SBM sequences, namely, a cluster of 3–4 hydrophobic residues flanked by acidic amino acids [92]. This finding suggests that the mechanism of noncovalent SUMO interaction with target proteins is conserved from yeast to mammals. There will almost certainly be additional SUMO-binding domains distinct from the SBM, but it is notable that unrelated screens as different as the yeast and mammalian two-hybrid studies could yield such similar consensus sequences. A very recent report on the crystal structure for a complex containing a SUMO1RanGAP1 conjugate, a segment of RanBP2, and the SUMO E2 revealed the structure of the RanBP2 SBM and its mode of binding to SUMO [10]. The SBM peptide from RanBP2 forms a β strand that sits in a hydrophobic surface depression on SUMO and extends the SUMO β sheet; the ends of the SBM segment are acidic residues that interact with basic residues on the SUMO surface. The SBM-SUMO interface determined from this structure is therefore consistent with the mutagenesis and binding data mentioned above. Importantly, both the crystallographic analysis of SUMO-RanGAP1 and an earlier NMR study of the same conjugate [93] revealed minimal direct interaction between SUMO and RanGAP1, except around the isopeptide linkage, and no obvious structural changes from the unligated proteins. From the NMR analysis, the loop of RanGAP1 linked to SUMO and the SUMO tail were both highly dynamic. Therefore, the mechanism of SUMO-RanGAP1 binding to RanBP2 does not appear to be through a conformational switch in the conjugate but rather by cooperative and simultaneous interaction of RanBP2 to a bipartite binding site created by the physical linkage of SUMO to RanGAP1. This type of binding is likely to typify many other SUMO-protein conjugate interactions. 6.2 A SUMO-induced Conformational Change
In contrast to the structural independence of the SUMO and substrate moieties in the SUMO-RanGAP1 conjugate, an elegant series of mechanistic studies on the function of SUMO conjugation to the DNA repair enzyme thymine-DNA glycosylase (TDG) suggests that in this case, SUMO conjugation to the substrate induces a substantial and functionally important conformation change in the protein [94]. TDG initiates base excision repair of a mismatched thymine or uracil nucleotide by removing the base, leaving an abasic site in the DNA. In vitro, TDG dissociation from the abasic site is strongly rate limiting; tight binding to the potentially harmful abasic site likely shields the site until downstream enzymes can complete the
6 Sumoylation and SUMO-binding Motifs 17
repair process. Nonetheless, a mechanism for enzyme turnover is required, and unexpectedly, site-specific SUMO ligation to TDG turns out to greatly accelerate its release from the DNA [95]. Even more surprisingly, the stimulation of TDG-DNA dissociation by sumoylation of the TDG C-terminal domain appears to operate through a sumoylationinduced conformational change in the TDG N-terminal domain, which causes the enzyme to loosen its grip on the abasic site [94]. Based on a protease sensitivity assay, the N-terminal domain of unsumoylated TDG triggers a conformational change in the enzyme when it binds to a G-U DNA mismatch substrate, enhancing its affinity but preventing catalytic turnover. If, instead, a sumoylated version of TDG is used in the reaction, TDG conformation no longer changes upon incubation with this DNA substrate, suggesting that in its sumoylated form, TDG does not assume the high-affinity DNA-binding state. N-terminally deleted TDG and sumoylated versions of both full-length and N-terminally truncated TDG show identical kinetic behavior when base excision is assayed (in all cases, turnover is still very slow: kcat ∼0:05 min−1 ). These data suggest that sumoylation allows enzymatic turnover by somehow facilitating an N-terminal domain-dependent conformational switch back to a state with low DNA-binding affinity. In the cell, one would assume, TDG sumoylation should occur only after the base excision step. This way, DNA damage recognition and base excision can occur efficiently (non-sumoylated, tight binding state), but substrate release and handoff to the downstream DNA endonuclease will then also be possible (sumoylated, weak binding state). What controls the timing of sumoylation and desumoylation relative to DNA binding and base excision is an interesting question that remains to be examined. 6.3 Interactions Between Different Sumoylated Proteins
While SUMO ligation to TDG is likely to be critical to its normal in vivo function, a noncovalent interaction between the two also seems to be important [96]. A mutation in TDG that blocks noncovalent SUMO binding also blocks its covalent attachment. In addition, TDG associates with the promyelocytic leukemia (PML) protein, and the same mutation that blocks TDG binding to SUMO blocks in vivo colocalization with PML; this association is not impaired by simple elimination of the TDG sumoylation site (sumoylated TDG does bind slightly better to PML in vitro). In other words, noncovalent association of SUMO with TDG is important for TDG interaction with PML. Interestingly, PML is itself a sumoylated protein and also has an SBM, so an obvious model is that the SUMO portion of sumoylated PML binds to TDG, and the SUMO on TDG binds to the PML SBM. Indeed, mutations that eliminate PML sumoylation inhibit TDG binding as well. PML functions as a scaffold for the assembly of so-called PML nuclear bodies, subnuclear structures that have been implicated in transcriptional regulation and DNA repair. PML must be sumoylated to associate with these bodies and to
18 Ubiquitin and Ubiquitin-like Protein Conjugation
concentrate a number of other sumoylated proteins there as well [97]. By combining sumoylation sites and SUMO-binding motifs into the same polypeptide, networks of protein interactions, typified by the TDG-PML association, can be created (also see Ref. [92]). This is reminiscent of what seems to occur with ubiquitin modification of and association with components of the endocytic machinery. As we saw earlier, a number of endocytic factors are ubiquitinated and also carry ubiquitin-binding domains. This is thought to contribute to the activation and interaction of such factors and/or to the exchange of ubiquitinated cargos between them. Besides being relevant to the question of how SUMO-protein conjugates interact specifically and noncovalently with their protein partners, these data might also be significant when considering SUMO ligation specificity. As noted earlier, sitespecific protein sumoylation is frequently observed in vitro with only the E1 and E2 enzymes, at least when they are at high concentration. We saw that one element of this specificity comes from E2-substrate contacts at consensus sumoylation sites. However, direct but noncovalent SUMO-substrate binding could provide an important additional binding determinant, particularly in combination with a consensus sumoylation site. This notion is supported by the finding that TDG is not sumoylated if its noncovalent SUMO-binding site is mutated [96].
7 General Biochemical Functions of Protein-Protein Conjugation
The examples chosen in the previous sections illustrate various ideas about the biochemical functions of protein modification by ubiquitin and Ubls. The simplest generalization to come out of all these examples is that attachment of ubiquitin or a Ubl to a protein (or other macromolecule) creates a distinct physiological state by altering the protein’s interactions with other macromolecules (Figure ??). This is an almost trivial assertion, but more specific versions of the statement will be formulated below that may help give a sense of which mechanisms of altering macro-molecular interaction by ubiquitin or Ubl attachment are more common and why this might be so. 7.1 Negative Regulation by Ubl Conjugation
Because ubiquitin and Ubls are bulky modifiers, one obvious way they could function is by steric occlusion: the attached Ubl simply blocks the ability of its substrate to bind to another protein (or another part of the same protein) (Figure ??A). There are still relatively few well-established examples of this inhibitory mode of action. One possible reason is that for such a mechanism to operate effectively in many cases, a very large fraction of the protein would need to be modified by the Ubl. However, for many proteins, only a very small fraction is observed in the conjugate form. This can sometimes be attributed to artifactual deconjugation during protein
7 General Biochemical Functions of Protein-Protein Conjugation
isolation, but even when such deconjugation is largely prevented, often only a few percent of a particular protein is modified, e.g., with SUMO or ISG15. Nevertheless, if the small fraction of modified protein were localized to some functionally unique cellular site, or if a transient modification were sufficient to put the protein in a new state, such an inhibitory mechanism could still operate. An example that combines both of these mechanisms is the transient ubiquitination of histone H2B at chromosomal sites of induced transcription [98]. (The exact biochemical consequences of this histone ubiquitination are not yet known, so this might not be an example of a modification that inhibits interaction.) Local ubiquitination is brought about by the recruitment of a histone H2B-specific ubiquitin ligase complex. Such transient histone H2B ubiquitination triggers histone H3 methylation, and this new histone state, which is necessary but not sufficient for gene activation, no longer requires that ubiquitin remain on histone H2B. Indeed, deubiquitination of the histone is needed for completion of the switch to the transcriptionally active state. Another example of such molecular memory is the sumoylation of TDG discussed earlier: SUMO attachment is needed to weaken the interaction with DNA after base scission, but once TDG has released from the DNA, the SUMO is no longer necessary or desirable [94]. In this case, however, SUMO attachment seems to inhibit macromolecular interaction (DNA binding) indirectly by inducing a change in enzyme conformation. If a large percentage of a protein were Ubl-modified, the notion of negative regulation of protein interaction and function would be more straightforward. A recent example of a protein that is nearly quantitatively modified by SUMO is the vaccinia virus A40R early protein [99]. A40R sumoylation is required for its localization to ER viral replication sites. Mutation of the A40R sumoylation site causes the protein to self-associate and aggregate into long rods. Thus, SUMO attachment to A40R appears to block its interaction with another protein (another copy of A40R). A second potential example of quantitative sumoylation is the plasma membrane K2P1 potassium leak channel [100]. The apparent SUMO modification is proposed to block channel opening and thereby leakage of K+ ions from the cell. 7.2 Positive Regulation by Ubl Conjugation
When the Ubl modifier enhances an interaction with another macromolecule, it usually does so by participating directly in the formation of part or all of the binding interface with the target molecule (Figure ??B). In principle, such an interaction can also be modulated if an allosteric change in a target binding site were induced by the attached Ubl (Figure ??C). Many examples of Ubl regulation fall into the former category. Here it is easy to see that even if only a small fraction of a particular protein were modified, its new activity could suffice to effect a change in physiological state. Many of the examples in the preceding sections reflect this kind of mechanism. As discussed, noncovalent ubiquitin-protein or Ubl-protein interactions tend to be weak. Binding can be greatly enhanced either
19
20 Ubiquitin and Ubiquitin-like Protein Conjugation
by polymerization of the ubiquitin signal (no clear example of obligatory Ubl chain formation is known) or by combining the weak binding from ubiquitin or Ubl with additional weak binding sites (possibly created by additional ubiquitin or Ubl modifications). The combination of multiple weak interactions to give highly specific protein- protein binding is a well-established idea in the signal transduction field. An example of such multivalent binding was discussed earlier for SUMORanGAP1 binding to the nuclear pore complex. 7.3 Cross-regulation by Ubls
Interestingly, alternative Ubl or ubiquitin modifications sometimes occur on the same substrate, and these can direct the protein to different targets. The clearest illustration of this is proliferating cell nuclear antigen (PCNA), which can be monoubiquitinated, polyubiquitinated, or sumoylated, and each of these forms results in the recruitment of distinct downstream effector proteins. PCNA functions as a DNA polymerase processivity factor in various modes of DNA replication and DNA repair. It forms a homotrimer that encircles the DNA double helix and associates with multiple DNA polymerases. Ubiquitin or a Lys63-linked polyubiquitin chain is attached to a single PCNA lysine, and SUMO is primarily attached to this same site, with a small amount at a second lysine [101]. When the DNA replication machinery encounters a DNA lesion and cannot replicate past it, the type of post-translational modification of PCNA determines which of several distinct mechanisms will be engaged to correct or bypass the lesion. A model that accounts for these different outcomes was recently outlined [102]. The ubiquitin E2 Rad6 and the E3 Rad18 are responsible for monoubiquitination of PCNA, and this ubiquitin can be extended by a heterodimeric E2, Ubc13-Mms2, and the E3 Rad5 into a Lys63-linked polyubiquitin chain [101, 103]. During normal replication, the replicative DNA polymerase is associated with PCNA at the primer-template junction. Monoubiquitination of PCNA is proposed to prevent the polymerase from accessing the junction when DNA damage is encountered, allowing a translesion synthesis (TLS) polymerase to enter the complex [102, 104]. These TLS polymerases bind directly to PCNA and promote either error-free or mutagenic replication through the lesion, depending on the TLS polymerase and the type of DNA damage. If a Lys63-ubiquitin chain has formed on PCNA, it is postulated to cause complete dissociation of the replicative DNA polymerase complex, allowing a template-switching mechanism with error-free copying of the other replicated DNA strand. The consequences of SUMO ligation to PCNA have been less clear, but PCNA sumoylation, which occurs during a normal S phase without induced DNA damage [101], prevents recombinational bypass of lesions [102]. Such recombination during normal DNA replication can be deleterious because of the risk of chromosome rearrangements. Very recent results strongly support this function for SUMO-PCNA, and argue that the sumoylated form specifically recruits the Srs2 DNA helicase to the replication fork [105, 106]. Srs2 disassembles the nucleoprotein
References
filaments that are necessary intermediates for DNA strand invasion and homologous recombination [107, 108]. The exact protein-protein interactions affected by the different ubiquitin and SUMO modifications have not been worked out in full, but it appears that both positive and negative regulation of such interactions occurs (Figure ??). Because the same site of PCNA is used for both ubiquitin and SUMO attachment, there also appears to be some antagonism between these two modifications [105] (Figure ??D). Similar competition between ubiquitin and SUMO for the same substrate lysine has been seen with IκB, an inhibitor of the NF-κB signaling pathway [109]. For IκB, Lys48 polyubiquitination leads to IκB degradation and NF-κB activation, but sumoylation of the same IκB lysines prevents this.
8 Conclusions
As should be evident from the above survey, ubiquitin and Ubl modification of proteins represents a highly versatile means of regulating protein function. Nature has made widespread use of such conjugation through the elaboration of multiple variants of the same basic enzymatic mechanism. On the order of a dozen or so Ubls have been documented to date, and for eight of these, at least one enzyme in the pathway for substrate conjugation has been identified. Ubiquitin itself can attach to proteins in the form of polymers of different topology, and these topological variants impart differences in function as well. The fundamental E1-E2 couple, which probably arose very early in the evolution of the ubiquitin system from more ancient sulfur transfer pathways, has been supplemented with an array of specificity factors (E3s) in some of the pathways, especially the ubiquitin pathway. Deconjugating enzymes have turned ubiquitin and many of the Ubls into dynamic modifiers whose attachments are tightly regulated both spatially and temporally. The basic biochemical consequence of protein modification by ubiquitin or Ubls is usually a change in the target’s association with other proteins. This change can occur by both direct and indirect mechanisms and can either stimulate or inhibit particular protein-protein interactions. Given the intricacy of the ubiquitin-Ubl system, research into its functions and mechanisms should continue to tax and reward investigators for years to come. Acknowledgments
I thank Cecile Pickart, Rachael Felberbaum, Oliver Kerscher, Stefan Kreft, Alaron Lewis, and Tommer Ravid for many helpful comments on the manuscript. Work from my laboratory was supported by grants from the U.S. National Institutes of Health (GM46904 and GM53756).
References
21
22 Ubiquitin and Ubiquitin-like Protein Conjugation
1 HOCHSTRASSER, M. Evolution and function of ubiquitin-like protein-conjugation systems. Nat Cell Biol 2, E153–E157 (2000). 2 JENTSCH, S. & PYROWOLAKIS, G. Ubiquitin and its kin: how close are the family ties? Trends Cell Biol 10, 335–342 (2000). 3 PICKART, C. M. Mechanisms Underlying Ubiquitination. Annu Rev Biochem 70, 503–533 (2001). 4 SCHWARTZ, D. C. & HOCHSTRASSER, M. A superfamily of protein tags: ubiquitin, SUMO and related modiers. Trends Biochem Sci 28, 321–328 (2003). 5 HUANG, D. T., WALDEN, H., DUDA, D. & SCHULMAN, B. A. UBIQUITINLIKE protein activation. Oncogene 23, 1958–71 (2004). 6 CIECHANOVER, A. & BEN-SAADON, R. N-terminal ubiquitination: more protein substrates join in. Trends Cell Biol 14, 103–106 (2004). 7 ZHENG, N., WANG, P., JEFFREY, P. D. & PAVLETICH, N. P. STRUCTURE of a c-Cbl-UbcH7 complex: RING domain function in ubiquitin-protein ligases. Cell 102, 533–539 (2000). 8 FURUKAWA, M., OHTA, T. & XIONG, Y. Activation of UBC5 ubiquitin-conjugating enzyme by the RING. nger of ROC1 and assembly of active ubiquitin ligases by all cullins. J Biol Chem 277, 15758–15765 (2002). 9 WU, P. Y. et al. A conserved catalytic residue in the ubiquitin-conjugating enzyme family. Embo J 22, 5241–5250 (2003). 10 REVERTER, D. & LIMA, C. D. Insights into E3 ligase activity revealed by a SUMO-RanGAP1-Ubc9-Nup358 complex. Nature 435, 687–692 (2005). 11 PICKART, C. M. & COHEN, R. E. Proteasomes and their kin: proteases in the machine age. Nat Rev Mol Cell Biol 5, 177–187 (2004). 12 HOCHSTRASSER, M. Ubiquitin signalling: what’s in a chain? Nat Cell Biol 6, 571–572 (2004). 13 THROWER, J. S., HOFFMAN, L., RECHSTEINER, M. & PICKART, C. M. RECOGNITION of the polyubiquitin proteolytic signal. Embo J 19, 94–102 (2000). 14 RAASI, S., ORLOV, I., FLEMING, K. G. &
15
16
17
18
19
20
21
22
23
24
25
PICKART, C. M. Binding of polyubiquitin chains to ubiquitin-associated (UBA)domains of HHR23A. J Mol Biol 341, 1367–1379 (2004). BABOSHINA, O. V. & HAAS, A. L. NOVEL multiubiquitin chain linkages catalyzed by the conjugating enzymes E2EPF and RAD6 are recognized by 26S proteasome subunit 5. J. Biol. Chem. 271, 2823–2831 (1996). AMERIK, A. Y. & HOCHSTRASSER, M. Mechanism and function of deubiquitinating enzymes. Biochim Biophys Acta 1695, 189–207 (2004). LI, M. et al. Deubiquitination of p53 by HAUSP is an important pathway for p53 stabilization. Nature 416, 648–653 (2002). GUTERMAN, A. & GLICKMAN, M. H. Deubiquitinating enzymes are IN/(trinsic to proteasome function). Curr Protein Pept Sci 5, 201–211 (2004). YAO, T. & COHEN, R. E. A cryptic protease couples deubiquitination and degradation by the proteasome. Nature 419, 403–407 (2002). HAAS, A. L., AHRENS, P., BRIGHT, P. M. & ANKEL, H. Interferon induces a 15-kilodalton protein exhibiting marked homology to ubiquitin. J Biol Chem 262, 11315–11323 (1987). LOEB, K. R. & HAAS, A. L. The interferon-inducible 15-kDa ubiquitin homolog conjugates to intracellular proteins. J Biol Chem 267, 7806–7813 (1992). POTTER, J. L., NARASIMHAN, J., MENDE-MUELLER, L. & HAAS, A. L. Precursor processing of pro-ISG15/UCRP, an interferon-beta-induced ubiquitin-like protein. J Biol Chem 274, 25061–25068 (1999). NARASIMHAN, J. et al. Crystal structure of the interferon-induced ubiquitin-like protein ISG15. J Biol Chem (2005). YUAN, W. & KRUG, R. M. Influenza B virus NS1 protein inhibits conjugation of the interferon (IFN)-induced ubiquitin-like ISG15 protein. Embo J 20, 362–371 (2001). ZHAO, C. et al. The UbcH8 ubiquitin E2
References
26
27
28
29
30
31
32
33
34
35
enzyme is also the E2 enzyme for ISG15, an IFN-alpha/beta-induced ubiquitin-like protein. Proc Natl Acad Sci U S A 101, 7578–7582 (2004). KIM, K. I., GIANNAKOPOULOS, N. V., VIRGIN, H. W. & ZHANG, D. E. Interferon-inducible ubiquitin E2, Ubc8, is a conjugating enzyme for protein ISGylation. Mol Cell Biol 24, 9592–9600 (2004). RITCHIE, K. J. et al. Role of ISG15 protease UBP43 (USP18) in innate immunity to viral infection. Nat Med 10, 1374–1378 (2004). MALAKHOV, M. P. et al. High-throughput immunoblotting. Ubiquitiin-like protein ISG15 modifies key regulators of signal transduction. J Biol Chem 278, 16608–16613 (2003). PENG, J. et al. A proteomics approach to understanding protein ubiquitination. Nat Biotechnol 21, 921–926 (2003). HITCHCOCK, A. L., AULD, K., GYGI, S. P. & SILVER, P. A. A subset of membrane-associated proteins is ubiquitinated in response to mutations in the endoplasmic reticulum degradation machinery. Proceedings of the National Academy of Sciences of the United States of America 100, 12735–12740 (2003). LIU, Y., FALLON, L., LASHUEL, H. A., LIU, Z. & LANSBURY, P. T., Jr. The UCH-L1 gene encodes two opposing enzymatic activities that affect alpha-synuclein degradation and Parkinson’s disease susceptibility. Cell 111, 209–218 (2002). DASSA, B., YANAI, I. & PIETROKOVSKI, S. New type of polyubiquitin-like genes with intein-like autoprocessing domains. Trends Genet 20, 538–542 (2004). ICHIMURA, Y. et al. A ubiquitin-like system mediates protein lipidation. Nature 408, 488–492 (2000). SONG, J., DURRIN, L. K., WILKINSON, T. A., KRONTIRIS, T. G. & CHEN, Y. Identification of a SUMO-binding motif that recognizes SUMO-modified proteins. Proc Natl Acad Sci USA 101, 14373–14378 (2004). SHARP, P. M. & LI, W.-H. Molecular evolution of ubiquitin genes. 2, 328–332
(1987). 36 MCGRATH, J. P., JENTSCH, S. & VARSHAVSKY, A. UBA1: an essential yeast gene encoding ubiquitin-activating enzyme. Embo J 10, 227–236 (1991). 37 RAJAGOPALAN, K. V. Biosynthesis and processing of the molybdenum cofactors. Biochem Soc Trans 25, 757–761 (1997). 38 TAYLOR, S. V. et al. Thiamin biosynthesis in Escherichia coli. Identification of this thiocarboxylate as the immediate sulfur donor in the thiazole formation. J Biol Chem 273, 16555–16560 (1998). 39 APPLEYARD, M. V. et al. The Aspergillus nidulans cnxF gene and its involvement in molybdopterin 11 Biochemical Functions of Ubiquitin and Ubiquitin-like Protein Conjugation biosynthesis. Molecular characterization and analysis of in vivo generated mutants. J Biol Chem 273, 14869–14876 (1998). 40 LEIMKUHLER, S., WUEBBENS, M. M. & RAJAGOPALAN, K. V. Characterization of Escherichia coli MoeB and its involvement in the activation of molybdopterin synthase for the biosynthesis of the molybdenum cofactor. J Biol Chem 276, 34695–34701 (2001). 41 RUDOLPH, M. J., WUEBBENS, M. M., RAJAGOPALAN, K. V. & SCHINDELIN, H. Crystal structure of molybdopterin synthase and its evolutionary relationship to ubiquitin activation. Nat Struct Biol 8, 42–46 (2001). 42 WANG, C., XI, J., BEGLEY, T. P. & NICHOLSON, L. K. Solution structure of ThiS and implications for the evolutionary roots of ubiquitin. Nat Struct Biol 8, 47–51 (2001). 43 DUDA, D. M., WALDEN, H., SFONDOURIS, J. & SCHULMAN, B. A. Structural Analysis of Escherichia Coli ThiF. J Mol Biol 349, 774–786 (2005). 44 FURUKAWA, K., MIZUSHIMA, N., NODA, T. & OHSUMI, Y. A protein conjugation system in yeast with homology to biosynthetic enzyme reaction of prokaryotes. J Biol Chem 275, 7462–7465 (2000). 45 GOEHRING, A. S., RIVERS, D. M. &
23
24 Ubiquitin and Ubiquitin-like Protein Conjugation
46
47
48
49
50
51
52
53
54
SPRAGUE, G. F., Jr. Attachment of the ubiquitin-related protein Urm1p to the antioxidant protein Ahp1p. Eukaryot Cell 2, 930–936 (2003). SILVER, E. T., GWOZD, T. J., PTAK, C., GOEBL, M. & ELLISON, M. J. A chimeric ubiquitin conjugating enzyme that combines the cell cycle properties of CDC34 (UBC3) and the DNA repair properties of RAD6 (UBC2): implications for the structure, function and evolution of the E2s. Embo J 11, 3091–3098 (1992). CHEN, P., JOHNSON, P., SOMMER, T., JENTSCH, S. & HOCHSTRASSER, M. Multiple ubiquitin-conjugating enzymes participate in the in vivo degradation of the yeast MATa2 repressor. Cell 74, 357–369 (1993). MATTHIES, A., RAJAGOPALAN, K. V., MENDEL, R. R. & LEIMKUHLER, S. Evidence for the physiological role of a rhodanese-like protein for the biosynthesis of the molybdenum cofactor in humans. Proc Natl Acad Sci USA 101, 5946–5951 (2004). MATTHIES, A., NIMTZ, M. & LEIMKUHLER, S. Molybdenum Cofactor Biosynthesis in Humans: Identification of a Persulfide Group in the Rhodanese-like Domain of MOCS3 by Mass Spectrometry. Biochemistry 44, 7912–7920 (2005). XI, J., GE, Y., KINSLAND, C., MCLAFFERTY, F. W. & BEGLEY, T. P. Biosynthesis of the thiazole moiety of thiamin in Escherichia coli: identification of an acyldisulfide-linked protein protein conjugate that is functionally analogous to the ubiquitin/E1 complex. Proc Natl Acad Sci U S A 98, 8513–8518 (2001). PARK, J. H. et al. Biosynthesis of the thiazole moiety of thiamin pyrophosphate (vitamin B1). Biochemistry 42, 12430–12438 (2003). HERSHKO, A. & CIECHANOVER, A. The ubiquitin system for protein degradation. Annu Rev Biochem 61, 761–807 (1992). GOLDBERG, A. L. & ROCK, K. L. Proteolysis, Proteasomes and Antigen Presentation. Nature 357, 375–379 (1992). HOCHSTRASSER, M. Ubiquitin-dependent
55
56
57
58
59
60
61
62
63
64
protein degradation. Ann. Rev. Genet. 30, 405–439 (1996). DEVERAUX, Q., USTRELL, V., PICKART, C. & RECHSTEINER, M. A 26 S Protease Subunit That Binds Ubiquitin Conjugates. J. Biol. Chem. 269, 7059–7061 (1994). YOUNG, P., DEVERAUX, Q., BEAL, R. E., PICKART, C. M. & RECHSTEINER, M. Characterization of two polyubiquitin binding sites in the 26 S protease subunit 5a. J Biol Chem 273, 5461–5467 (1998). HOFMANN, K. & FALQUET, L. A ubiquitin-interacting motif conserved in components of the proteasomal and lysosomal protein degradation systems. Trends Biochem Sci 26, 347–350 (2001). DI FIORE, P. P., POLO, S. & HOFMANN, K. When ubiquitin meets ubiquitin receptors: a signalling connection. Nat Rev Mol Cell Biol 4, 491–497 (2003). SCHNELL, J. D. & HICKE, L. Non-traditional functions of ubiquitin and ubiquitin-binding proteins. J Biol Chem 278, 35857–35860 (2003). FISHER, R. D. et al. Structure and ubiquitin binding of the ubiquitin-interacting motif. J Biol Chem 278, 28976–28984 (2003). PRAG, G. et al. Structural mechanism for ubiquitinated-cargo recognition by the Golgi-localized, gamma-ear-containing, ADP-ribosylation-factor-binding proteins. Proc Natl Acad Sci U S A 102, 2334–2339 (2005). VARADAN, R. et al. Solution conformation of Lys63-linked diubiquitin chain provides clues to functional diversity of polyubiquitin signaling. J Biol Chem 279, 7055–7063 (2004). van NOCKER, S. et al. The multiubiquitin-chain-binding protein Mcb1 is a component of the 26S proteasome in Saccharomyces cerevisiae and plays a nonessential, substrate-specific role in protein turnover. Mol. Cell. Biol. 16, 6020–6028 (1996). VERMA, R., OANIA, R., GRAUMANN, J. & ESHAIES, R. J. Multiubiquitin chain receptors define a layer of substrate
References
65
66
67
68
69
70
71
72
73
74
75
selectivity in the ubiquitin-proteasome system. Cell 118, 99–110 (2004). CHEN, L. & MADURA, K. Rad23 promotes the targeting of proteolytic substrates to the proteasome. Mol Cell Biol 22, 4902–4913 (2002). KIM, I., MI, K. & RAO, H. Multiple interactions of rad23 suggest a mechanism for ubiquitylated substrate delivery important in proteolysis. Mol Biol Cell 15, 3357–3365 (2004). ELSASSER, S., CHANDLER-MILITELLO, D., MULLER, B., HANNA, J. & FINLEY, D. Rad23 and Rpn10 serve as alternative ubiquitin receptors for the proteasome. J Biol Chem 279, 26817–26822 (2004). ELSASSER, S. et al. Proteasome subunit Rpn1 binds ubiquitin-like protein domains. Nature Cell Biology 4, 725–730 (2002). SAEKI, Y., SONE, T., TOHE, A. & YOKOSAWA, H. Identification of ubiquitin-like protein-binding subunits of the 26S proteasome. Biochemical & Biophysical Research Communications 296, 813–819 (2002). MEDICHERLA, B., KOSTOVA, Z., SCHAEFER, A. & WOLF, D. H. A genomic screen identifies Dsk2p and Rad23p as essential components of ER-associated degradation. EMBO Rep 5, 692–697 (2004). LAMBERTSON, D., CHEN, L. & MADURA, K. Pleiotropic defects caused by loss of the proteasome-interacting factors Rad23 and Rpn10 of Saccharomyces cerevisiae. Genetics 153, 69–79 (1999). K¨OLLING, R. & HOLLENBERG, C. P. The ABC-transporter Ste6 accumulates in the plasma membrane in a ubiquitinated form in endocytosis mutants. EMBO J. 13, 3261–3271 (1994). HICKE, L. & RIEZMAN, H. Ubiquitination of a yeast plasma membrane receptor signals its ligand-stimulated endocytosis. Cell 84, 277–287 (1996). ROTH, A. F. & DAVIS, N. G. Ubiquitination of the yeast α-factor receptor. J Cell Biol 134, 661–674 (1996). HICKE, L. & DUNN, R. Regulation of membrane protein transport by ubiquitin and ubiquitin-binding
76
77
78
79
80
81
82
83
84
85
86
87
proteins. Annu Rev Cell Dev Biol 19, 141–172 (2003). MARMOR, M. D. & YARDEN, Y. Role of protein ubiquitylation in regulating endocytosis of receptor tyrosine kinases. Oncogene 23, 2057–2070 (2004). LONGVA, K. E. et al. Ubiquitination and proteasomal activity is required for transport of the EGF receptor to inner membranes of multivesicular bodies. J Cell Biol 156, 843–854 (2002). HICKE, L. Protein regulation by monoubiquitin. Nat Rev Mol Cell Biol 2, 195–201 (2001). KATZMANN, D. J., ODORIZZI, G. & EMR, S. D. Receptor downregulation and multivesicular-body sorting. Nat Rev Mol Cell Biol 3, 893–905 (2002). BILODEAU, P. S., URBANOWSKI, J. L., WINISTORFER, S. C. & PIPER, R. C. The Vps27p Hse1p complex binds ubiquitin and mediates endosomal protein sorting. Nat Cell Biol 4, 534–539 (2002). KOMADA, M. & KITAMURA, N. The Hrs/STAM complex in the down-regulation of receptor tyrosine kinases. J Biochem (Tokyo) 137, 1–8 (2005). PORNILLOS, O. et al. HIV Gag mimics the Tsg101-recruiting activity of the human Hrs protein. J Cell Biol 162, 425–434 (2003). SUNDQUIST, W. I. et al. Ubiquitin recognition by the human TSG101 protein. Mol Cell 13, 783–789 (2004). AMERIK, A. Y., NOWAK, J., SWAMINATHAN, S. & HOCHSTRASSER, M. The Doa4 deubiquitinating enzyme is functionally linked to the vacuolar protein-sorting and endocytic pathways. Mol Biol Cell 11, 3365–3380 (2000). LUHTALA, N. & ODORIZZI, G. Bro1 coordinates deubiquitination in the multivesicular body pathway by recruiting Doa4 to endosomes. J Cell Biol 166, 717–729 (2004). MORITA, E. & SUNDQUIST, W. I. Retrovirus budding. Annu Rev Cell Dev Biol 20, 395–425 (2004). OTT, D. E., COREN, L. V., CHERTOVA, E. N., GAGLIARDI, T. D. & SCHUBERT, U. Ubiquitination of HIV-1 and MuLV Gag. Virology 278, 111–121 (2000).
25
26 Ubiquitin and Ubiquitin-like Protein Conjugation
88 MELCHIOR, F. SUMO –nonclassical ubiquitin. Annu Rev Cell Dev Biol 16, 591–626 (2000). 89 BERNIER-VILLAMOR, V., SAMPSON, D. A., MATUNIS, M. J. & LIMA, C. D. Structural basis for E2-mediated SUMO conjugation revealed by a complex between ubiquitin-conjugating enzyme Ubc9 and RanGAP1. Cell 108, 345–356 (2002). 90 MAHAJAN, R., DELPHIN, C., GUAN, T., GERACE, L. & MELCHIOR, F. A small ubiquitin-related polypeptide involved in targeting RanGAP1 to nuclear pore complex protein RanBP2. Cell 88, 97–107 (1997). 91 MINTY, A., DUMONT, X., KAGHAD, M. & CAPUT, D. Covalent modification of p73alpha by SUMO-1. Two-hybrid screening with p73 identifies novel SUMO-1-interacting proteins and a SUMO-1 interaction motif. J Biol Chem 275, 36316–36323 (2000). 92 HANNICH, J. T. et al. Defining the SUMO-modified proteome by multiple approaches in Saccharomyces cerevisiae. J Biol Chem 280, 4102–4110 (2005). 93 MACAULEY, M. S. et al. Structural and dynamic independence of isopeptide-linked RanGAP1 and SUMO-1. J Biol Chem 279, 49131–49137 (2004). 94 STEINACHER, R. & SCHAR, P. Functionality of human thymine DNA glycosylase requires SUMO-regulated changes in protein conformation. Curr Biol 15, 616–623 (2005). 95 HARDELAND, U., STEINACHER, R., JIRICNY, J. & SCHAR, P. Modification of the human thymine-DNA glycosylase by ubiquitin-like proteins facilitates enzymatic turnover. Embo J 21, 1456–1464 (2002). 96 TAKAHASHI, H., HATAKEYAMA, S., SAITOH, H. & NAKAYAMA, K. I. Noncovalent SUMO-1 binding activity of thymine DNA glycosylase (TDG) is required for its SUMO-1 modification and colocalization with the promye-locytic leukemia protein. J Biol Chem 280, 5611–5621 (2005). 97 DUPREZ, E. et al. SUMO-1 modification of the acute promyelocytic leukaemia
98
99
100
101
102
103
104
105
106
107
protein PML: implications for nuclear localisation. J Cell Sci 112, 381–393 (1999). HENRY, K. W. et al. Transcriptional activation via sequential histone H2B ubiquitylation and deubiquitylation, mediated by SAGA-associated Ubp8. Genes Dev 17, 2648–2663 (2003). PALACIOS, S. et al. Quantitative SUMO-1 Modification of a Vaccinia Virus Protein Is Required for Its Specific Localization and Prevents Its Self-Association. Mol Biol Cell 16, 2822–2835 (2005). RAJAN, S., PLANT, L. D., RABIN, M. L., BUTLER, M. H. & GOLDSTEIN, S. A. Sumoylation silences the plasma membrane leak K ρ channel K2P1. Cell 121, 37–47 (2005). HOEGE, C., PFANDER, B., MOLDOVAN, G. L., PYROWOLAKIS, G. & JENTSCH, S. RAD6-dependent DNA repair is linked to modification of PCNA by ubiquitin and SUMO. Nature 419, 135–141 (2002). HARACSKA, L., TORRES-RAMOS, C. A., JOHNSON, R. E., PRAKASH, S. & PRAKASH, L. Opposing effects of ubiquitin conjugation and SUMO modification of PCNA on replicational bypass of DNA lesions in Saccharomyces cerevisiae. Mol Cell Biol 24, 4267–4274 (2004). STELTER, P. & ULRICH, H. D. Control of spontaneous and damage-induced mutagenesis by SUMO and ubiquitin conjugation. Nature 425, 188–191 (2003). KANNOUCHE, P. L., WING, J. & LEHMANN, A. R. Interaction of human DNA polymerase eta with monoubiquitinated PCNA: a possible mechanism for the polymerase switch in response to DNA damage. Mol Cell 14, 491–500 (2004). PFANDER, B., MOLDOVAN, G. L., SACHER, M., HOEGE, C. & JENTSCH, S. SUMO-modified PCNA recruits Srs2 to prevent recombination during S phase. Nature (2005). PAPOULI, E. et al. Crosstalk between SUMO and ubiquitin on PCNA is mediated by recruitment of the helicase Srs2p Mol Cell, Published online Jun. 9, 2005 10.1016/S1097276505013766 (2005). KREJCI, L. et al. DNA helicase Srs2 disrupts the Rad51 presynaptic filament.
References Nature 423, 305–309 (2003). 108 VEAUTE, X. et al. The Srs2 helicase prevents recombination by disrupting Rad51 nucleoprotein filaments. Nature 423, 309–312 (2003). 109 DESTERRO, J. M. P., RODRIGUEZ, M. S. & HAY, R. T. SUMO-1 modification of IkBa inhibits NF-κB activation. Molec. Cell 2, 233–239 (1998). 110 HORI, T. et al. Covalent modification of all members of human cullin family proteins by NEDD8. Oncogene 18, 6829–6834 (1999). 111 JOHNSON, E. S. Protein modification by SUMO. Annu Rev Biochem 73, 355–382 (2004). 112 MIZUSHIMA, N. et al. A protein conjugation system essential for auto-phagy. Nature 395, 395–398 (1998). 113 NAKAMURA, M. & TANIGAWA, Y. Characterization of ubiquitin-like polypeptide acceptor protein, a novel pro-apoptotic member of the Bcl2 family. Eur J Biochem 270, 4052–4058 (2003). 114 JONES, D. & CANDIDO, E. P. Novel ubiquitin-like ribosomal protein fusion genes from the nematodes Caenorhabditis elegans and Caenorhabditis briggsae. J Biol Chem 268, 19545–19551 (1993). 115 KOMATSU, M. et al. A novel protein-conjugating system for Ufm1, a ubiquitin-fold modifier. Embo J 23, 1977–1986 (2004). 116 RAASI, S., SCHMIDTKE, G. & GROETTRUP, M. The ubiquitin-like protein FAT10 forms covalent conjugates and induces apoptosis. J Biol Chem 276, 35334–35343 (2001).
117 DITTMAR, G. A., WILKINSON, C. R., JEDRZEJEWSKI, P. T. & FINLEY, D. Role of a ubiquitin-like modification in polarized morphogenesis. Science 295, 2442–2446 (2002). 118 LUDERS, J., PYROWOLAKIS, G. & JENTSCH, S. The ubiquitin-like protein HUB1 forms SDS-resistant complexes with cellular proteins in the absence of ATP. EMBO Rep 4, 1169–1174 (2003). 119 YASHIRODA, H. & TANAKA, K. Hub1 is an essential ubiquitin-like protein without functioning as a typical modifier in fission yeast. Genes Cells 9, 1189–1197 (2004). 120 KRAMER, A., MULHAUSER, F., WERSIG, C., GRONING, K. & BILBE, G. Mammalian splicing factor SF3a120 represents a new member of the SURP family of proteins and is homologous to the essential splicing factor PRP21p of Saccharomyces cerevisiae. Rna 1, 260–272 (1995). 121 YAMAMOTO, A. et al. Two types of chicken 2 ,5 -oligoadenylate synthetase mRNA derived from alleles at a single locus. Biochim Biophys Acta 1395, 181–191 (1998). 122 LAM, Y. A., LAWSON, T. G., VELAYUTHAM, M., ZWEIER, J. L. & PICKART, C. M. A proteasomal ATPase subunit recognizes the polyubiquitin-degration signal. Nature 416, 763–767 (2002). 123 HAAS, A. L., KATZUNG, D. J., REBACK, P. M., GUARINO, L. A. Functional characterization of the ubiquitin variant encoded by the baculovirus Autographa californica. Biochemistry 35, 5385–5394 (1996).
27
1
Allosteric Regulation by Membrane-binding Domains Bertram Canagarajah, William J. Smith, and James H. Hurley US Department of Health and Human Services, Bethesda, USA
Originally published in: Protein-Lipid Interactions. Edited by Lukas K. Tamm. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31151-4
1 Introduction
Membrane targeting of proteins is a fundamental regulatory mechanism in cell signaling. The specific interaction of conserved protein modules with membrane lipids is one prominent mechanism for membrane targeting. The best known of these are the C1, C2, ENTH, FYVE, PH and PX domains [1–10], but the number continues to grow. Reversible binding of signaling proteins to membranes has a variety of consequences. The most obvious effect is relocalization, which typically brings a signaling protein to its site of action at a cell membrane. In a few cases, membrane localization instead sequesters a protein at the membrane and thereby removes it from its site of action. Relocalization has become straightforward to measure in vivo by using fluorescent protein fusions as probes and countless proteins have now been shown to undergo regulation by reversible changes in subcellular localization. Many excellent reviews on this topic are available [1–10]. Targeting is not the only function of membrane-binding domains. Many enzymes that act at the membrane interface contain membrane-binding domains that are rigidly attached to, but distinct from, their catalytic domain. In these systems, the membrane-binding domain serves as a rigid “handle” to position the catalytic domain on the membrane [11, 12]. The function of a rigidly attached handle, which targets and positions, stands in contrast to a flexibly tethered anchor, which targets but does not position. Yet another consequence of membrane binding can be to allosterically alter the activity of a protein by inducing conformational changes. The protein kinase Cs (PKCs) are the archetypal examples of allosteric regulation by membrane binding domains, and most of the concepts in the field were first established by studies of PKC activation by diacylglycerol (DAG) and phosphatyidylserine Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Allosteric Regulation by Membrane-binding Domains
[5, 13–19]. Biochemical and, ultimately, structural analysis is required to characterize these allosteric activation mechanisms. It is correspondingly harder to study these mechanisms as compared to simple targeting mechanisms. There are few structures of full-length, or nearly full-length, membrane-binding domain-containing proteins. Some of these structures are best interpreted in terms of targeting and positioning roles for the membrane-binding domain, with no large structural changes produced by membrane binding domain engagement. The structures of the C2 domain-containing proteins phospholipase Cδ [20, 21], cytosolic phospholipase A2 [22], PTEN [23] and phosphoinositide 3-kinase [24] fall into this category. The same might hold for the PH domain-containing lipid phosphatase myotubularin [25], although the lipid-binding properties of its outlier PH domain (formerly known as a “GRAM” domain) have not been reported to date. This chapter will focus on the handful of PH and PKC C1 domain-containing proteins for which structures are available, and for which there is at least some reason to believe that membrane binding leads to regulatory conformational changes. While only a few such structures are available, the small number of structures probably has more to do with the technical challenges of coaxing large, conformationally heterogeneous signaling proteins to form crystals. The total number of proteins in this regulatory class is probably far greater than is represented by existing structures.
2 How Membranes and PH Domains Regulate Rho Family-specific Guanine Nucleotide Exchange Factors (GEFs)
The GTP-binding proteins Cdc42, Rho and Rac are part of the Rho subfamily of small-molecular-weight Ras-like GTPases, and are at the center of many fundamental cellular events including the establishment of cell polarity, transcriptional regulation, actin cytoskeleton rearrangement, intracellular trafficking and endocytosis. These small G-proteins are intimately associated with phospholipid membranes [26–31]. The GTPases serve as molecular switches whereby the GTP bound form is active in signaling, while the GDP form is largely inert. Small GTPases have very poor intrinsic enzyme activity, and are helped through their catalytic cycle by GTPase-activating proteins (GAPs) and GEFs. GAPs turn off signaling by activating GTP hydrolysis, while GEFs turn the signal on by loading GTP onto the GTPases. A GTP-binding protein has two highly conserved regions that serve as the site for interactions with a variety of targets and regulatory proteins. These two regions are centered around the GTP-binding pocket, and have been dubbed switch I and switch II (Fig. ??), due to their propensity to undergo a conformational change upon binding to either a magnesium-coordinated GTP or GDP molecule [32]. The Rho family members are post-translationally modified at their C-terminus with a prenyl moiety that allows for their direct insertion into membrane compartments and organelles [33, 34].
2 How Membranes and PH Domains Regulate Rho Family 3
2.1 DH and PH Domain Rho GEFs
The Rho family-specific GEFs consist of a Dbl homology followed by a PH domain, and together they are referred to as a DHPH module. The minimal catalytic unit, the DH domain, is largely α-helical and possesses three conserved regions, referred to as CR1-3, that are critical for function. The mechanism of GEF-mediated guanine nucleotide exchange on the Rho family protein members by the DH domain is modeled as a three-step process. The DH domain first interacts with the conformational-rigid β2-β3 strands (Fig. ??) between switch I and II of their cognate GTPase. This “lock and key” association is followed by an “induced fit” mechanism that results in a conformational movement of switch I and switch II and consequently destabilization of the guanine nucleotide. The combined intrusion of Ala59 into the Mg 2+ -binding site and a steric clash of Ile33 with the nucleotide sugar promotes GDP dissociation (Fig. ??). The complex allows unobstructed access to the guanine nucleotide-binding pocket where subsequent GTP binding leads to remodeling of the switch regions and destabilization of the GEF/GTPase complex, resulting in the dissociation of the GEF and an active GTPase [35–37]. 2.2 Regulation of GEF Activity by PH Domains
While some isolated DH domains catalyze the exchange of GDP for GTP, the presence of the PH domain is required in others [38–41]. In the GEF-GTPase complex of Tiam1-Rac1, residues important for efficient GEF-stimulated guanine nucleotide exchange are present within the DH domain. A critical interaction between Arg66 within switch II of the GTP-binding protein, Rac1, forms an ion pair with Glu1239 within the DH domain of Tiam1, an interaction that is essential for the GEF activity on Rac1 [37]. However, in Dbs (Dbl’s big sister) the conserved Glu residue is absent. To restore GEF activity, the PH domain of these GEFs must contribute a hydrogen bond from Tyr889 and the carbonyl oxygen of Pro887 to a hydrogen bond network with Arg66 within switch II of Cdc42 (Fig. ??). However, mutation of Arg66 does not compromise the GEF activity of Dbs and it is thought that Tyr889 promotes GEF catalysis by stabilizing the electronic polarization of the imidazole group of His814 within the DH domain [37]. Thus, with the requirement of Tyr889 from the PH domain of Dbs in facilitating efficient guanine nucleotide exchange on its cognate GTP-binding protein, Cdc42 (or Rho), any alteration of the positioning of the PH domain relative to its DH domain would have a direct effect on the catalytic activity of the GEF. There are two leading models for the allosteric regulation of the GEF, which are not mutually exclusive. In the first, stabilizing interactions or structural rigidity imparted by the membrane on the GEFs (or other accessory proteins) to optimize GEF-stimulated guanine nucleotide exchange activity. In the second model, membrane binding to the PH domain induces conformational changes in the DHPH module that enhance GEF-stimulated guanine nucleotide exchange.
4 Allosteric Regulation by Membrane-binding Domains
2 How Membranes and PH Domains Regulate Rho Family 5
Dbs appears to fall into the first category. The PH domain of Dbs can rotate around the C-terminal helix (α6) with the DH domain. This rotation allows the PH domain in Dbs to directly interact with Cdc42 [37]. The crystal structure of the isolated unbound DHPH module of Dbs revealed a similar orientation of the DHPH module as observed when complexed with Cdc42 [42, 43]. Although the structures are similar, Worthylake et al. argue that the subtle differences between the GTPase-bound and unbound DHPH module of Dbs show how phosphoinositide and membrane binding to the PH domain restricts the conformational heterogeneity between the DH and PH domain [43]. This restriction in conformational flexibility would then favor direct interaction between the PH domain and GTPase to allow for elevated GEF-stimulated guanine nucleotide exchange [43]. Mutations that block the binding of the PH domain to phosphoinositides preserve GEFstimulated nucleotide exchange in vitro and maintain proper localization in the cell, but do not activate their cognate GTPases in vivo [39, 44]. A similar scenario applies to Trio [45]. The PH domain of Trio does not bind to phosphoinositides by itself, but RhoG greatly enhances the association of Trio with phosphoinositides [45]. In summary, first the GTPase is bound to the membrane, in close proximity to a phosphoinositide patch (Fig. ??). Second, the GEF is recruited to the GTPase by unobstructed access of the DH domain to the GTP-binding protein, however GEF activity is minimal. Finally, the PH domain engages the phosphoinositide directly through residues within the β1-β2 and β3-β4 loop, and stabilizes the catalytic loop contributed by the PH domain to assist in GEF activity. The case of Trio is a variation of the last step in which the basic tail of RhoG acts as a bridge by interacting with the PH domain of Trio and with membranes. Allosteric regulation of the PH domain within the DHPH module may occur by gross reorientations of the DH and PH domains. In the crystal structure of the isolated DHPH domain from mSos1 in the absence of a GTPase, the PH domain sterically blocks the main catalytic surface of the DH domain [41]. The crystal structure of the LARG DHPH module alone or bound to RhoA highlights a ← Fig. 1 Structural basis of Rho-GEFstimulated guanine-nucleotide exchange on Rho family GTP-binding proteins: structure of Cdc42 (red) bound to the DHPH module of Dbs (PDB: 1KZ7). Switch I (yellow), switch II (orange) and the $2-$3 (interswitch and colored charcoal) region that are responsible for the association with target/effector and regulator molecules. (A) Guanine nucleotide exchange occurs with the association of the DH domain (blue) (slate) with the $2-$3 “interswitch” concurrently with the conformational disruption of switch I and II in the GTPase. The conserved regions 1 and 2 (CR1 and CR2) that interact with the GTPase and facil-
itate GEF-stimulated guanine nucleotide exchange are highlighted in beige and hotpink, respectively. (B) Orthogonal view to (A), highlighting the relative accessibility of the nucleotide-binding pocket and the association of loop regions within the $1-$2 and $3-$4 strands that serve to coordinate membrane-bound phosphoinositides.1 (C) Dbs coordinates Arg66 within switch I (orange) of Cdc42 with residues contributed from the DH domain (His814) and PH domain (Tyr889 and Pro887, green) that are critical for GEF activity. All figures were rendered with MacPyMOL (www.pymol.org).
6 Allosteric Regulation by Membrane-binding Domains
conformational change that occurs upon association with the small GTPase. A roughly 30◦ rotation of the PH domain relative to the DH domain occurs through a bend in the flexible helix between the DH and PH domains. This conformational change allows for the PH domain to move into a position to directly contribute to efficient guanine nucleotide exchange [40]. Thus, in model 2, the GEF is first targeted to the membrane via direct electrostatic interactions contributed from the positively charged polarized surface of the PH domain (Fig. ??). Second, the membrane, assisted by additional protein effectors, induces a conformational change in the PH domain which in turn leads to a conformational change in the unstructured loop between the DH and PH domains, which then engages the membrane. Finally, the GEF is positioned to engage the GTPase and GEF-stimulated guanine nucleotide exchange occurs. In summary, the DHPH GEFs have a conserved modular structure, yet are regulated by diverse mechanisms. In some cases membrane engagement by the PH domain functions indirectly by positioning the DH; in other cases, structural changes are induced that rigidify the overall DHPH assembly and promote direct interactions between the PH domain and the small GTPase; and in at least one case it appears that conformational changes are required to relieve a steric block of the active site by the PH domain.
3 Regulation of G-protein Receptor Kinase (GRK) 2 Activity by Lipids and the Gβγ Subunit at the Membrane
GRKs are serine/threonine kinases that phosphorylate the C-terminal tail of activated G-protein-coupled receptors (GPCRs). The phosphorylation of activated GPCRs is followed by binding of specific arrestins to the phosphorylated receptors, which uncouples the receptors from their G-proteins and ultimately leads to the desensitization of these receptors [46]. At the level of G-proteins, GRKs prevent complex formation between Gα and Gβγ subunits. GRK2 (previously referred to as β-adrenergic receptor kinase) is an 80-kDa protein containing three modular domains: an N-terminal regulator of G-protein signaling (RGS) homology (RH) domain, a central protein kinase domain, and a C-terminal PH domain that is unique to GRK2 and GRK3 [47]. GRK2 is recruited from the cytosol to the plasma membrane by interacting with Gβγ and anionic phospholipids through its PH domain [48, 49]. The crystal structure of the GRK2-Gβγ complex (Fig. 2) has been determined in the presence of detergent micelles [50]. This structure suggests how GRK2 function is intricately regulated by the interaction between its domains and its interactions with Gqα, Gβγ and anionic lipid [50]. This structure illustrates how GRK2 is able to interact with Gα, Gβγ , GPCR and phospholipids simultaneously, and to down-regulate G-protein mediated signaling at the level of GPCRs and G-proteins. The kinase domain of GRK2 is very similar to the other known structures of AGC family of kinases, including protein kinase A (PKA) and protein kinase B (PKB). The
3 Regulation of G-protein Receptor
Fig. 2 Structural mechanism for allosteric regulation of GRK2. (A) Side view of the GRK2-G$( complex bound to Gq" and GPCR at the membrane. The PH domain of GRK2 is tan, the RH domain is violet and the kinase domain is yellow. The two helices ("10 and "11) that are separate from the rest of the RH domain in the primary sequence are colored magenta. The G$1 subunit is blue and G(2 is green. The model was built using the GRK2-G$1 (2 complex (PDB: 1OMW), the inactive crystal structure of rhodopsin (PDB: 1L9H) as activated GPCR (maroon) and a homology model of Gq" (cyan) built based on the structure of Gi" in com-
plex with RGS4 (PDB: 1AGR). The membrane is depicted as a translucent surface. (B) A close-up view of the kinase domain. The inactive kinase domain of GRK2 in yellow is superimposed on the active kinase structure of PKA (PDB: 1ATP) in cyan using the C-terminal lobes. The difference between the relative orientation of the lobes and the orientation of the C helix is apparent. The catalytically important Lys72 and Glu91 of PKA, the corresponding residues in GRK2, and the ATP are drawn as sticks. The hydrogen bond between Lys72 and Glu91 is drawn as a green dashed line. Two manganese ions of PKA are shown as white spheres.
7
8 Allosteric Regulation by Membrane-binding Domains
kinase domain in the complex is in an open, inactive conformation. The nucleotidebinding gate is disordered and the αC-helix is oriented as seen in other inactive kinase structures. The kinase activity of GRK2 is regulated by its interactions with the substrate, the membrane and protein ligands. Unlike the other AGC kinases, it is not regulated by phosphorylation on residues on the activation loop or other sites [51]. The kinase domain makes extensive contacts with RH domain. The α10 helix lies at the RH-kinase domain interface, and consists of several hydrophobic contacts and an ion pair. Any change in the conformation of RH domain could be relayed to the kinase domain. The phospholipid-binding site of the PH domain is adjacent to the RH domainPH domain interface. The structure also displays significant differences in the conformation of the C-terminal region of the domain compared to the nuclear magnetic resonance structure of the PH domain [52]. This conformational change could be a result of Gβγ binding to the GRK2 PH domain. It is possible that the binding of phospholipids and Gβγ to PH domain could be communicated to the rest of the protein through the RH–PH domain interface. This interface, consisting of hydrophobic interactions, a salt bridge and two ion pairs, plays a significant role in the allosteric regulation of GRK2. Site-directed mutagenesis of one of these ion pairs has been shown to impair the phospholipid-mediated activation of GRK2 [53]. The similarities between the inactive structure of c-Src [54, 55] and the RHkinase domain core of GRK2 led to the concept that binding of other proteins and/or domains to the RH domain could be coupled to the activity of the kinase domain. In particular, the similarity of the α10 helix to the Src homology-2 (SH2) kinase linker of c-Src places the α10 helix at an important position to regulate the enzymatic activity of GRK2. Changes in the rest of the protein produced by the interactions of Gα, Gβγ and phospholipids could change the conformation of the RH domain. The RH domain is in direct contact with both the PH and kinase domains, and thus able to coordinate various interactions with the activation of the kinase. A change in the conformation of RH domain resulting from the binding of phospholipids, Gβγ and Gα could then shift the kinase domain to the active conformation. In summary, the structure of GRK2 offers a clear-cut example of a positioning role, whereby the PH domain interacts with membranes and Gβγ to bring the kinase into proximity to its GPCR substrate. The observation of an inactive conformation even in the presence of detergent and Gβγ shows that these ligands are insufficient to fully activate the kinase. It seems clear that further activation by phospholipids or other ligands not present in the crystal structure must induce a conformational change in the kinase core in order to reposition the αC-helix for catalysis.
4 Lipid Activation of Rac-GAP Activity: β2-Chimaerin
DAG is a paradigmatic lipid second messenger in metazoan cell signaling, the first to be discovered [13–16]. DAG is a central mediator of downstream signaling by
4 Lipid Activation of Rac-GAP Activity: $2-Chimaerin
a host of hormones coupled through Gq and phospholipase Cβ, growth factors coupled to tyrosine kinase-linked receptors and phospholipase Cγ , and many other extra- and intracellular stimuli [16]. The PKC isozyme family has historically been the most intensively studied class of targets for DAG signaling [5, 13–16, 56, 57]. Conventional and novel PKC isozymes, and the protein kinase D isozymes, translocate to membranes and are activated by DAG and phorbol esters by virtue of their direct binding to motifs known as protein kinase C homology-1 (C1) domains [5, 15, 16, 56, 58, 59]. There are several major classes of C1 domain-containing DAG receptors in addition to PKC. One example is β2-chimaerin, a phorbol ester- and DAGresponsive GTPase-activating protein [60–64]. β2-chimaerin contains three conserved domains. The N-terminal SH2 domain is presumed competent to bind phosphotyrosine-containing proteins, but its physiological partner is unknown. The central C1 domain is followed by a C-terminal GAP domain with homology to many other Rho, Rac and Cdc42 GAPs. The GAP activity of β2-chimaerin is specific for Rac as opposed to other small G-proteins [65]. 4.1 The C1 Domain of β2-Chimaerin is Buried
The overall structure of β2-chimaerin [66] shows that the C1 domain is the linchpin that holds together the larger assembly of domains (Fig. 3). The C1 domain is in contact with both the SH2 and Rac-GAP domains, with the N-terminal region and with the SH2-C1 linker. Much of the surface of the C1 domain is deeply buried in contacts with the rest of the protein. The C1 domain surfaces buried in these contacts overlap extensively with the surfaces involved in phospholipid and phorbol ester binding. Of the six hydrophobic side-chains believed to insert into the membrane, four are completely buried, and two are partially buried in contacts with the rest of the protein. The main-chain NH and CO groups on Gly235 are presumed to interact with the 3- and 4-hydroxyls of phorbol ester, as seen for Gly253 of PKCδ. In β2-chimaerin, the amide group of the side-chain of Gln32 replaces this interaction and sterically blocks the phorbol ester-binding site. This implies that the observed conformation of β2-chimaerin is incompatible with phorbol ester and phospholipid membrane binding. The cost in free energy to break the extensive interactions between the C1 domain and these four protein regions must be considerable. This is consistent with the observation that intact β2-chimaerin translocates to membranes at higher doses of phorbol ester than required for the C1 domain alone or for proteins with less buried C1 domains. β2-chimaerin translocation to cell membranes requires a dose around 100-fold higher than translocation of PKCα [67]. β1-chimaerin, the product of alternative splicing of the same gene as β2-chimaerin, lacks the SH2 domain of β2-chimaerin and therefore has fewer intramolecular interactions with the C1 domain. β1-Chimaerin translocates to membranes at a much lower dose of phorbol ester than β2-chimaerin, consistent with a greater degree of solvent exposure of its C1 domain.
9
10 Allosteric Regulation by Membrane-binding Domains
Fig. 3 Allosteric regulation of the $2chimaerin at the membrane. (A) The overall structure of the $2-chimaerin: SH2 domain (red), C1 (blue), Rac-GAP (green) and linkers (tan). (B) Model for membrane-docked,
active $2-chimaerin. The geranylgeranyl group is taken from the structure of Rac (yellow) bound to its guanine nucleotide dissociation inhibitor.
Point mutants in any of the four regions that contact the C1 domain destabilize the inactive conformation and thereby decrease the EC50 for phorbol ester-induced translocation, with the sensitization ranging from 15- to 90-fold. The smallest enhancement, 15-fold, is seen in the Q32A mutant, which obliterates polar interactions, rather than the hydrophobic interactions lost in other mutants. The largest enhancements of phorbol ester sensitivity, 90-fold, are seen in the mutants L28A and I130A. Leu28 buries itself against the junction between αL and the C1 domain,
References
while Ile130 is buried in the αN/C1 interface. Mutating residues that stabilize more than one interdomain contact thus produces the largest sensitivity enhancements. Thus, the cooperative collapse of all of these interactions is a prerequisite for membrane binding. In other words, membrane binding by β2-chimaerin is coupled to a massive conformational change involving the entire protein. 4.2 Mechanism of Allosteric Rac-GTPase Activation by the C1 Domain
The β2-chimaerin structure, taken together with modeling of the β2-chimaerin/Rac transition state complex, shows how β2-chimaerin promotes the GTPase activity of Rac. Based on the model, Arg311 of β2-chimaerin acts as the “arginine finger”, reaches into the active site of Rac, and directly stabilizes the transition state for GTP hydrolysis. The observed β2-chimaerin structure appears to be inactive for catalysis in two respects. First, Pro21 and Pro22 directly occlude the Rac-binding site (Fig. 3). Second, the αF helix is locked in the unliganded conformation by contacts with Pro22 (Fig. 3). The liganded conformation of p50-Rho-GAP shows the likely conformation of the β2-chimaerin Rac-GAP domain in complex with substrate. The liganded complex predicts that a movement of the helix by around 6 Å in order to make contact with Rac. The diPro unit is structurally rigid, the local environment in the Rac-binding cleft is conformationally restrictive and it is not apparent how such a structural change could occur without moving the residues that connect the diPro unit to the rest of β2-chimaerin. The next two residues in sequence, Ile23 and Trp24, are wedged between the C1 and Rac-GAP domain and the N-terminus is thereby anchored to both domains via extensive hydrophobic interactions. These interactions need to be broken in order to free the diPro motif to move out of the GAP active site. In summary, β2-chimaerin provides the most clear-cut example available for a large-scale activating conformational change triggered by membrane engagement of a membrane binding domain. The membrane-penetrating and phorbol esterbinding sites on β2-chimaerin are occluded by intramolecular interactions with four separate regions of the protein. The Rac-binding site on β2-chimaerin is occluded. Rac binding requires the removal of the N-terminal region from the Rac-GAP domain active site. The N-terminal segment is tightly anchored by the C1 and Rac-GAP domain, and it participates directly in the occlusion of the phospholipid-binding site. Acidic phospholipid membranes compete with intramolecular interactions for binding to the C1 domain. When the former bind, the latter interactions are disrupted. The disruption of these intramolecular interactions allows the N-terminus sufficient flexibility to leave the Rac-GAP active site, removes steric inhibition of Rac binding, and permits the αF helix to adopt the active conformation.
Acknowledgment
W. J.S. thanks Juan Bonifacino for his support.
11
12 Allosteric Regulation by Membrane-binding Domains
References 1 E. A. NALEFSKI, J. J. FALKE, Protein. Sci. 1996, 5, 2375–2390. 2 M. J. REBECCHI, S. SCARLATA, Annu. Rev. Biophys. Biomol. Struct. 1998, 27, 503–528. 3 M. A. LEMMON, K. M. FERGUSON, Biochem. J. 2000, 350, 1–18. 4 M. A. LEMMON, Traffic 2003, 4, 201–213. 5 A. C. NEWTON, J. J. JOHNSON, Biochim. Biophys. Acta Rev. Biomembr. 1998, 1376, 155–172. 6 J. H. HURLEY, T. MEYER, Curr. Opin. Cell. Biol. 2001, 13, 146–152. 7 W. CHO, J. Biol. Chem. 2001, 276, 32407–32410. 8 J. H. HURLEY, S. MISRA, Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 49–79. 9 J. H. HURLEY, B. WENDLAND, Cell 2002, 111, 143–146. 10 A. SIMONSEN, A. E. WURMSER, S. D. EMR, H. STENMARK, Curr. Opin. Cell Biol. 2001, 13, 485–492. 11 J. H. HURLEY, J. A. GROBLER, Curr. Opin. Struct. Biol. 1997, 7, 557–565. 12 J. H. HURLEY, Y. TSUJISHITA, M. A. PEARSON, Curr. Opin. Struct. Biol. 2000, 10, 737–743. 13 U. KIKKAWA, A. KISHIMOTO, Y. NISHIZUKA, Annu. Rev. Biochem. 1989, 58, 31–44. 14 Y. NISHIZUKA, Nature 1988, 334, 661–665. 15 Y. NISHIZUKA, Science 1992, 258, 607–614. 16 Y. NISHIZUKA, FASEB J. 1995, 9, 484–496. 17 J. W. ORR, L. M. KERANEN, A. C. NEWTON, J. Biol. Chem. 1992, 267, 15263–15266. 18 J. W. ORR, A. C. NEWTON, J. Biol. Chem. 1994, 269, 8383–8387. 19 A. C. NEWTON, Trends Pharmacol. Sci. 2004, 25, 175–177. 20 L. O. ESSEN, O. PERISIC, R. CHEUNG, M. KATAN, R. L. WILLIAMS, Nature 1996, 380, 595–602. 21 J. A. GROBLER, L. O. ESSEN, R. L. WILLIAMS, J. H. HURLEY, Nat. Struct. Biol. 1996, 3, 788–795. 22 A. DESSEN, J. TANG, H. SCHMIDT, M. STAHL, J. D. CLARK, J. SEEHRA, W. S. SOMERS, Cell 1999, 97, 349–360. 23 J. O. LEE, H. J. YANG, M. M. GEORGESCU, A. DI CRISTOFANO, T. MAEHAMA, Y. G. SHI, J. E. DIXON, P. PANDOLFI, N. P. PAVLETICH, Cell 1999, 99, 323–334. 24 E. H. WALKER, O. PERISIC, C. RIED, L.
25
26
27
28 29 30 31 32
33
34 35 36
37
38 39
40 41
42
STEPHENS, R. L. WILLIAMS, Nature 1999, 402, 313–320. M. J. BEGLEY, G. S. TAYLOR, S. A. KIM, D. M. VEINE, J. E. DIXON, J. A. STUCKEY, Mol. Cell 2003, 12, 1391–1402. S. BAGRODIA, B. D´ERIJARD, R.J. DAVIS, R. A. CERIONE, J. Biol. Chem. 1995, 270, 27995–27998. O. A. COSO, M. CHIARIELLO, J. C. YU, H. TERAMOTO, P. CRESPO, N. XU, T. MIKI, J. S. GUTKIND, Cell 1995, 81, 1137–1146. A. HALL, Science 1998, 279, 509–514. C. S. HILL, J. WYNNE, R. TREISMAN, Cell 1995, 81, 1159–1170. R. KOZMA, S. AHMED, A. BEST, L. LIM, Mol. Cell Biol. 1995, 15, 1942–1952. A. J. RIDLEY, A. HALL, Cell 1992, 70, 389–399. A. M. De VOS, L. TONG, M. V. MILBURN, P. M. MATIAS, J. JANCARIK, S. NOGUCHI, S. NISHIMURA, K. MIURA, E. OHTSUKA, S. H. KIM, Science 1988, 239, 888–893. Y. FUKUMOTO, K. KAIBUCHI, Y. HORI, H. FUJIOKA, S. ARAKI, T. UEDA, A. KIKUCHI, Y. TAKAI, Oncogene 1990, 5, 1321–1328. T. NOMANBHOY, R. A. CERIONE, Biochemistry 1999, 38, 15878–15884. D. K. WORTHYLAKE, K. L. ROSSMAN, J. SONDEK, Nature 2000, 408, 682–688. J. T. SNYDER, D. K. WORTHYLAKE, K. L. ROSSMAN, L. BETTS, W. M. PRUITT, D. P. SIDEROVSKI, C. J. DER, J. SONDEK, Nat. Struct. Biol. 2002, 9, 468–475. K. L. ROSSMAN, D. K. WORTHYLAKE, J. T. SNYDER, D. P. SIDEROVSKI, S. L. CAMPBELL, J. SONDEK, EMBO J. 2002, 21, 1315–13126. Y. ZHENG, M. J. HART, R. A. CERIONE, Methods Enzymol. 1995, 256, 77–84. K. L. ROSSMAN, L. CHENG, G. M. MAHON, R. J. ROJAS, J. T. SNYDER, I. P. WHITEHEAD, J. SONDEK, J. Biol. Chem. 2003, 278, 18393–18400. R. KRISTELLY, G. GAO, J. J. TESMER, J. Biol. Chem. 2004, 279, 47352–47362. S. M. SOISSON, A. S. NIMNUAL, M. UY, D. BAR-SAGI, J. KURIYAN, Cell 1998, 95, 259–268. K. L. ROSSMAN, D. K. WORTHYLAKE, J. T. SNYDER, L. CHENG, I. P. WHITEHEAD, J. SONDEK, J. Biol. Chem. 2002, 277, 50893–50898.
References 43 D. K. WORTHYLAKE, K. L. ROSSMAN, J. SONDEK, Structure (Camb.) 2004, 12, 1078–1086. 44 M. A. BAUMEISTER, L. MARTINU, K. L. ROSSMAN, J. SONDEK, M. A. LEMMON, M. M. CHOU, J. Biol. Chem. 2003, 278, 11457–11464. 45 K. R. SKOWRONEK, F. GUO, Y. ZHENG, N. NASSAR, J. Biol. Chem. 2004. 46 K. L. PIERCE, R. T. PREMONT, R. J. LEFKOWITZ, Nat. Rev. Mol. Cell Biol. 2002, 3, 639–650. 47 J. A. PITCHER, N. J. FREEDMAN, R. J. LEFKOWITZ, Annu. Rev. Biochem. 1998, 67, 653–692. 48 S. K. DEBBURMAN, J. PTASIENSKI, J. L. BENOVIC, M. M. HOSEY, J. Biol. Chem. 1996, 271, 22552–22562. 49 J. A. PITCHER, K. TOUHARA, E. S. PAYNE, R. J. LEFKOWITZ, J. Biol. Chem. 1995, 270, 11707–11710. 50 D. T. LODOWSKI, J. A. PITCHER, W. D. CAPEL, R. J. LEFKOWITZ, J. J. G. TESMER, Science 2003, 300, 1256–1262. 51 L. N. JOHNSON, M. E. M. NOBLE, D. J. OWEN, Cell 1996, 85, 149–158. 52 D. FUSHMAN, T. NAJMABADI-HASKE, S. CAHILL, J. ZHENG, H. I. LEVINE, D. COWBURN, J. Biol. Chem. 1998, 273, 2835–2843. 53 C. V. CARMAN, L. S. BARAK, C. CHEN, L. LIU-CHEN, J. J. ONORATO, S. P. KENNEDY, M. G. CARON, J. L. BENOVIC, J. Biol. Chem. 2000, 275, 10443–10452. 54 W. Q. XU, S. C. HARRISON, M. J. ECK, Nature 1997, 385, 595–602. 55 F. SICHERI, I. MOAREFI, J. KURIYAN, Nature 1997, 385, 602–609.
56 D. RON, M. G. KAZANIETZ, FASEB J. 1999, 13, 1658–1676. 57 H. MELLOR, P. J. PARKER, Biochem. J. 1998, 332, 281–292. 58 Y. ONO, T. FUJII, K. IGARASHI, T. KUNO, C. TANAKA, U. KIKKAWA, Y. NISHIZUKA, Proc. Natl Acad. Sci. USA 1989, 86, 4868–4871. 59 J. H. HURLEY, A. C. NEWTON, P. J. PARKER, P. M. BLUMBERG, Y. NISHIZUKA, Protein Sci. 1997, 6, 477–480. 60 T. LEUNG, B. E. HOW, E. MANSER, L. LIM, J. Biol. Chem. 1994, 269, 12888–12892. 61 S. AHMED, J. LEE, R. KOZMA, A. BEST, C. MONFRIES, L. LIM, J. Biol. Chem. 1993, 268, 10709–10712. 62 L. B. ARECES, M. G. KAZANIETZ, P. M. BLUMBERG, J. Biol. Chem. 1994, 269, 19553–19558. 63 M. J. CALOCA, N. FERNANDEZ, N. E. LEWIN, D. X. CHING, R. MODALI, P. M. BLUMBERG, M. G. KAZANIETZ, J. Biol. Chem. 1997, 272, 26488–26496. 64 M. J. CALOCA, M. L. GARCIA-BERMEJO, P. M. BLUMBERG, N. E. LEWIN, E. KREMMER, H. MISCHAK, S. M. WANG, K. NACRO, B. BIENFAIT, V. E. MARQUEZ, M. G. KAZANIETZ, Proc. Natl Acad. Sci. USA 1999, 96, 11854–11859. 65 M. J. CALOCA, H. B. WANG, M. G. KAZANIETZ, Biochem. J. 2003, 375, 313–321. 66 B. CANAGARAJAH, F. COLLUCIO LESKOW, Y. S. J. HO, H. MISCHAK, L. SAIDI, M. G. KAZANIETZ, J. H. HURLEY, Cell 2004, 119, 407–418. 67 M. J. CALOCA, H. B. WANG, A. DELEMOS, S. M. WANG, M. G. KAZANIETZ, J. Biol. Chem. 2001, 276, 18303–18312.
13
1
Thermodynamics of Protein-Ligand Interactions R. B. Raffa
Temple University School of Pharmacy, Philadelphia, USA
Originally published in: Protein-Ligand Interactions. Edited by Hans-Joachim B¨ohm, Gisbert Schneider. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30521-6
1 Introduction
Used appropriately and judiciously, thermodynamic parameters can offer insight into the energetics of protein-ligand interactions that is not readily attainable by other means. The utility or application of thermodynamic analysis has traditionally been considered more the domain of (bio)chemistry than biology. However, the modern recognition of an interface in the case of protein–ligand interactions, particularly when the protein is an enzyme or a drug receptor, has kindled an integration with pragmatic benefit to basic understanding and to drug-discovery efforts [1]. Because the nature of most protein–ligand interactions involves relatively weak forces resulting from electrostatic attractions such as ion–ion, ion–dipole, dipole–dipole (hydrogen bonds), induced transient fluctuating dipoles (van der Waals), or hydrophobic effects, they are typically readily reversible and thus amenable to standard equilibrium thermodynamic analysis. Also convenient is that most protein-ligand interactions occur as closed systems, namely, they contain a fixed amount of matter, and the exchange of work is confined to expansion ( PdV ). Because other types of energy exchange, such as radiation, or other types of exchange of work, such as electrical, surface, or photophysical, are negligible (or are approximated to be), the thermodynamic analysis of protein-ligand interactions is simplified. This article provides a broad overview of the purpose and experimental approaches for determining thermodynamic parameters of protein–ligand interactions.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Thermodynamics of Protein-Ligand Interactions
2 Basic Thermodynamics of Protein-Ligand Interactions
Thermodynamics, originally the study of the more limited phenomena of heat and heat transfer, evolved to become the more broad study of energy and energy transfer with the recognition – through the cumulative work of Count Rumford (Benjamin Thompson), Robert Mayer, Sadi Carnot, James Joule, and others (see [2–4] for historical accounts) – that heat is a form of energy. A vast amount of experience and experimentation can be generalized in the following way (e.g., [5]): in any defined “system,” although the work done on the system (W) or the heat absorbed by the system (Q) in going from one “state” of the system to another varies with the path taken, the sum of W and Q is a constant and depends only on the initial and final “states” of the “system” under consideration. This generalization is formalized as follows: U = Q + W,
(1)
where U represents the change in the energy1) of the circumscribed “system.” This equation defines energy in terms of the measurable entities of heat and work and U as dependent only on the state of the system (i.e., independent of the path by which the system moves from one state to another). U around a closed path is zero, and only changes in energy can be measured (in terms of heat and work), not absolute values. The First Law of thermodynamics (colloquially, the law of “conservation of energy”; Mayer, Helmholtz) does not explain why or guarantee that a defined system change will occur spontaneously or, if it does, in which direction the change will occur. This shortcoming is addressed by the Second Law of thermodynamics. Again, a vast amount of experience and experimentation can be generalized by (Carnot, Kelvin, Clausius), or
(Q/T ) ≥ 0
(2)
d(Qreversible /T ) ≥ 0
(3)
where T is temperature in Kelvin. By defining change in “entropy” as S
Ssystem +
Ssurroundings ≥ 0,
1) U(or E) was previously termed the “internal” energy (no longer used). For a “closed” system (defined as one in which there is no exchange of mass with the ’surroundings’) at rest, U = Q+W if there is no other mechanism of exchange of energy. By convention, Q is the heat
Q/T, (4)
absorbed by the system (hence, positive if heat flows into the system and negative if heat flows out of the system) and W is the work done on the system (hence, positive if the surroundings do work on the system and negative if the system does work on the surroundings).
2 Basic Thermodynamics of Protein-Ligand Interactions
or d S ≥ 0.
(5)
Spontaneous change or equilibrium is described when the RHS of Eq. (4) or (5) is > or =0, respectively. To restrict the evaluation to measurable properties of the system rather than of the surroundings, free energy functions have been derived (Gibbs, Helmholtz). Most protein-ligand interactions occur at constant temperature and pressure, so that the only work is −PV. The second law then is represented by Ssystem −
(U + PV )system T
≥0.
(6)
Since U + PV is the change in “enthalpy2) for these conditions, S −
(H) ≥,0 T
(7)
which upon rearrangement becomes T S − H≥0.
(8)
With the definition of (J. Willard Gibbs) “free energy” as G = H − T S,
(9)
where G < 0 describes spontaneous change and G = 0 describes equilibrium.3) These and other fundamentals of thermodynamics are reviewed in several excellent texts [6–25]. In terms of protein-ligand interactions, energy changes occur in the dissociation of the ligand molecules from the molecules of the solvent and the association of ligand molecules with the protein molecules. Ligand with protein is associated with changes in H and S. In addition, because the solvent environment is structured due to hydrogen bonds, London forces, or van der Waals interactions, particularly near membrane surfaces, the leaving of ligand molecules is associated with a reversal of the solvation process, which generally involves a
2) Change in enthalpy is defined as H = U+ (PV), where P and V are the pressure and volume, respectively, of the system. (PV) is negligibly small in most proteinligand interactions, so H ≈ U, and the change in the enthalpy is used as an indication of the molecular forces involved in the interaction.
3) This is the fundamental criterion for a spontaneous transformation in a system, typical of most protein-ligand interactions, of constant temperature and pressure. The interaction proceeds spontaneously in the direction in which G<0. It is important to note that the rate of the interaction is not determined by the sign or magnitude G.
3
4 Thermodynamics of Protein-Ligand Interactions
decrease in entropy and an increase in energy level. Thus, the change in free energy upon protein-ligand interaction is the net result of dual rearrangement processes: first of the protein molecule (usually involving a change in degrees of freedom or exposure to water molecules) and then of the solvent molecules (usually involving a decrease in structural constraint and hence an increase in entropy).
3 Measurement of Thermodynamic Parameters
For an interaction between a protein (P) and a ligand (L) that forms a protein-ligand complex (PL) according to a simple, reversible, bimolecular step represented as k
1 PL P+L ⇔
k−1
(10)
the reaction can be characterized, with appropriate caveats, by the equilibrium constant (K eq = [PL]/[P][L]).4) In practice, the reciprocal of the equilibrium constant is commonly used and is termed the Michaelis constant (K M ) when the protein is an enzyme and the ligand is a substrate and is termed the dissociation constant (K d or K i ) when the protein is a receptor and the ligand is a neurotransmitter, hormone, or drug. The interaction can be visualized as a reaction-energy diagram as shown in Fig. 1. Changes in the energy coordinate (the ordinate) are plotted as a function of the position of the interaction as it proceeds in either direction along the reaction coordinate (the abscissa). This highly schematized representation indicates the overall change in energy (E) for the protein-ligand interaction and the activation energies for the association (E a ) and dissociation (E d ) steps. The diagram applies to the elementary step of the interaction. Associated processes, such as migration to the interaction site, catalytic activity (enzymes), activation of second-messenger transduction processes (receptors), etc., are not included. For the interaction represented by Eq. (10), the relationship between the change in free energy (G), change in enthalpy (H), and change in entropy (S) is given by Eq. (9). There are two major ways of obtaining the thermodynamic parameters. One way is by direct measurement of the heat of reaction, which for no PV work is the same as H. The recent development of highly sensitive calorimeters allows such measurement for a relatively wide variety of protein-ligand interactions and is described in more detail below. An alternative procedure employs a more indirect measure, which utilizes a simplified relationship (the van’t Hoff equation) between the thermodynamic parameters and the temperature dependence of the equilibrium constant of Eq. (10).
4) The relationship between these constants and the forward and reverse rate constants of the
interaction is not automatically known except for an elementary reaction step.
3 Measurement of Thermodynamic Parameters
Fig. 1 Reaction-energy diagram for the reversible interaction between a protein and a ligand that forms a protein-ligand complex. E is the overall change in energy for the interaction. Ea and Ed are the acti-
vation energies for the association and dissociation processes, respectively. Intermediate between the dissociated and associated components is a transition state comprised of an activated complex.
3.1 Calorimetric Determination of Thermodynamic Parameters
The use of calorimetry to measure the heat of a reaction is a time-honored technique. Presently, two modernized high-accuracy automated types of equipment are available with accompanying convenient software. One is known as “differential scanning calorimetry” (DSC), and the other is known as “isothermal titration calorimetry” (ITC). DSC measures the heat capacity (which at constant pressure is the temperature derivative of enthalpy) of the protein-ligand interaction under investigation by incrementally varying the temperature of the system over a specified range (the “scan”). Ultrasensitive isothermal titration microcalorimetry (the use of instruments for which the sensitivity is better than 1 µW) [26] measures the heat change that is associated with reactions in solution at a constant temperature and, by the sequential addition of ligand to the solution, also yields thermodynamic parameters. It is a well-characterized and widely accepted technique because the interaction is carried out at a constant pressure, VP = 0. Therefore, the energy change associated with the interaction is H the change in enthalpy (U = H + PV). An advantage of ITC over other methods is that it measures the enthalpy change directly. Other techniques, also described below, determine the enthalpy change indirectly. For this reason, DSC or ITC is the preferred method of obtaining interaction parameters, provided that the experimental conditions allow the use of these techniques. Because of the greater use of ITC for protein-ligand interactions to date, the details of this technique are provided below. In the standard ITC apparatus, the protein-ligand interaction proceeds in a sample cell of relatively small volume (usually 1–3 mL). One component (e.g., protein)
5
6 Thermodynamics of Protein-Ligand Interactions
of the reaction is placed in the reaction cell, and the other component (e.g., ligand) is added in stepwise fashion by an automated injection system in preset measured amounts for preset measured times. A built-in stirrer ensures that the reaction is continuously and well mixed. The reaction cell is composed of material that has high thermal conductivity such that energy changes (heat of reaction) that occur within the reaction cell are transmitted with minimal loss as changes in temperature. In modern ITC equipment, the change in temperature is measured as the amount of differential current (power) that is required to maintain the reaction cell at the same preset temperature as that of a reference cell filled with distilled water or the same buffer solution as the reaction cell. As a consequence of this design, the measurements are extremely precise because the dependent variable is power and essentially the only limitation is the electronic thermal motion. If the protein-ligand interaction is endothermic, more power (µcal s−1 ) is required relative to the reference cell. The power that is required, over baseline, comprises the raw data output of the ITC equipment. If the reaction is exothermic, less power is required, which is recorded as a downward deflection in output (Fig. 2). The overall interaction between a protein (enzyme or receptor) and a ligand (substrate, inhibitor, neurotransmitter, hormone, or drug) is carried out in a
Fig. 2 Diagrammatic representation of typical results obtained in an ITC study of a protein-ligand interaction. The raw data output (peak) accompanying each injection of ligand is the power (:cal s−1 ) that is required to maintain the sample cell at the same temperature as a reference cell. A downward deflection indicates an exothermic reaction; an upward deflection indi-
cates an endothermic reaction. Multiple ligand injections are made at preset intervals. The progressively smaller heat outputs correspond to progressively greater proteinligand binding until saturation is achieved. The residual deflections at the end of the run yield the heat of dilution, which is subtracted from the other deflections prior to further analysis.
3 Measurement of Thermodynamic Parameters
Fig. 3 The raw data output of ITC is transformed to show the heat exchange at each injection (kcal mol−1 of injectant), obtained by integration of the area of each “spike” in the raw data output, as a function of the
molar ratio of the protein-ligand binding interaction. The curve is then computergenerated as the best fit to either a one-site or multi-site binding model.
sequence of automated titrations. At each injection step, the power is recorded as a function of time. Each subsequent injection in the series is made after the power function returns to baseline. The output, therefore, forms an S-shaped curve, mirroring the progression of binding of the interacting species from initial injection to full saturation (Fig. 3). At the end of each run, all of the binding sites are occupied and no further heat of reaction is detected. Any residual power differential is a measure of the heat of solvation of the injected species. In modern ITC equipment, this heat is usually automatically subtracted from the heat of reaction. The raw data obtained for each injection (peak) are then integrated with respect to time, and the integrated heats that are derived from the raw data are plotted against the molar ratio of the interacting species. A best fit of the data is obtained using a non-linear algorithm. From this fit, the stoichiometry, K d , and H of the interaction are obtained. From the K d and H, the other thermodynamic parameters, G and S, are easily calculated from standard relationships. Additional details of the design and application of ITC are available in several excellent reviews [27–29].
3.2 van’t Hoff Determination of Thermodynamic Parameters 3.2.1 Relationship to Equilibrium Constant In the simplest case, the protein-ligand interaction can be represented as, or modeled as, a reversible bimolecular reaction such as depicted by P + L ⇔ PL. The
7
8 Thermodynamics of Protein-Ligand Interactions
change in Gibbs free energy (G) for the interaction in the direction indicated is related to the standard free energy change (G◦ ) by the following equation: G = G ◦ + RT ln(
[PL] ) [P][L]
(11)
where the brackets indicate concentration, R=1.99 cal/mol · K (=8.31 J/mol · K), and T is the absolute temperature in Kelvin (◦ C +273.15). Most protein-ligand interactions are examined at steady state, at which G = 0 (the process is not capable of producing work), so that Eq. (11) becomes G ◦ = − RT ln(
[PL] ) [P][L]
(12)
The ratio of complex concentration to the reactant concentrations can be represented by the equilibrium constant K eq , the reciprocal of the equilibrium constant (e.g., K M , K d , or K i ), or by some alternative designation in other types of studies. For the example of K d , substitution into Eq. (12) yields G ◦ = − RT ln(K eq ) = − RT ln(1/K d ) = RT ln(K d )
(13)
Hence, for the conditions under which most protein-ligand interactions are studied, Eq. (13) describes the relationship between the thermodynamic parameter G◦ and a reaction characteristic (the equilibrium constant) that can be measured experimentally. Because the change in Gibbs free energy is related to the change in enthalpy and entropy by G◦ = H◦ −TS◦ , Eq. (13) can be rearranged to ln(K d ) = (
S◦ H◦ 1 )( ) − R T R
(14)
Eq. (14) is an integrated form of the van’t Hoff equation d(lnK eq ) H◦ = dT RT 2
(15)
and is an approximation valid when H◦ and S◦ are not temperature dependent. Noting that Eq. (14) represents a linear relationship between ln (K d ) and 1/T with the y-intercept = − S◦ /R and the slope = H◦ /R, it is a common practice in thermodynamic analysis of protein-ligand interactions to determine K d at several different temperatures and then construct a “van’t Hoff plot” from which H◦ and S◦ are determined from the slope and y-intercept of the resultant data plotted as ln (K d ) against 1/T (which is a straight line if the heat capacity is independent of
3 Measurement of Thermodynamic Parameters
temperature). A smaller error in H◦ can be obtained if S◦ is determined first from the van’t Hoff plot and then H◦ from H◦ = G◦ + TS◦ . Not all such plots turn out to be linear, indicating that in those cases the heat capacity change (Cp ) is not independent of temperature for the interaction under study. It has also been suggested that H◦ values determined using the van’t Hoff plot method can differ from the same values determined using direct calorimetric measurement [30]. However, it has subsequently been reported that discrepancies are relatively minor [31]. 3.2.2 Obtaining the Equilibrium Constant In order to apply the van’t Hoff method of obtaining thermodynamic parameters, some means of measuring the association or dissociation constant of the protein-ligand interaction must be used. The basic principles and many of the experimental methodologies available for obtaining these constants have recently been summarized [32, 33] and are the subject of more extensive coverage in recent reviews (e.g., [34]) and monographs (e.g., [12, 35]). The methods include (extracted from [32] and [33]): Ĺ Equilibrium dialysis – Two compartments of a dialysis cell are divided by a semi-permeable membrane. The protein-ligand complex is allowed to associate or dissociate across the membrane until equilibrium is attained. By measuring the constituents of the interaction, the binding constant can be obtained from standard formulas. Ĺ Steady-state dialysis – The equilibrium dialysis technique is accelerated by having buffer flow at a constant rate on one side of the semi-permeable membrane and by stirring both sides in order to minimize the concentration gradients [36]. Ĺ Diafiltration – A type of dialysis equilibrium in which pressure is used to force the ligand-containing solution from one chamber into the protein-containing chamber [37]. Ĺ Ultrafiltration – Pressure or centrifugation is used to force a mixture of known total concentrations of protein and ligand through a semi-permeable membrane [38]. Ĺ Partition equilibrium – Separation occurs between two phases rather than across a semi-permeable membrane. Examples include partition between aqueous and lipid phases or partition between a liquid and a solid phase (e.g., where the binding sites are embedded on a solid matrix). Ĺ Gel (exclusion) chromatography – Counterpart to equilibrium dialysis when there is sufficient difference in size between protein and ligand and when the protein and protein-ligand complexes co-migrate. Ĺ Spectroscopy – Binding-induced changes in either a chromophore or fluorophore absorbance or emission are used to measure the ratio of free to bound ligand concentration. Examples include circular dichroism (differential absorption of left-and right-handed circularly polarized light), fluorescence emission (energy loss as radiation as a fluorophore returns to ground state from photon-excited
9
10 Thermodynamics of Protein-Ligand Interactions
Ĺ Ĺ
Ĺ
Ĺ
state) methods, including fluorescence anisotropy (binding of ligand changes the relative depolarization of the emission spectrum compared with that of a polarized exciting light). Electrophoresis – The components are separated on the basis of differential rates of migration toward an anode or cathode. Sedimentation equilibrium – An analytical ultracentrifuge is operated at a relatively slow speed that leads to a measurable equilibrium distribution of the constituents of a protein-ligand interaction. Radioligand binding – The most commonly used technique for the determination of binding to receptors is commonly called radioligand binding because of the use of a radioactive-labeled ligand for the quantification of the amount of bound material. As typically used, a radiolabeled ligand is incubated with the receptor preparation for a time sufficient for equilibrium to be attained. Bound and unbound ligands are then separated using any of a variety of techniques such as dialysis, centrifugation, or vacuum filtration (the most widely used method) (see [33] and [39] for details). Others – Affinity chromatography, biosensor techniques, and radioimmunoassay are among some of the other available techniques. In addition, perhaps a special mention should be made of the technique of estimating dissociation constants in pharmacological studies using irreversible antagonists (for the K d of an agonist) or a reversible antagonist (for the K d of the antagonist). These estimates, although not as intimate to the receptor-ligand interaction as some of the others, nevertheless have been used to some distinct advantage.
4 Applications 4.1 Calorimetric Determination of Thermodynamic Parameters
There are now well over 200 publications in which microcalorimetry has specifically been used to study protein-ligand interactions of a variety of types. A list of these studies is readily available by a MEDLINE search or from ITC equipment suppliers. Since the studies are too numerous to review here, perhaps a recent one might serve as a representative example of the technique and of its application. In this example [40] we determined the thermodynamic parameters associated with the binding of the reversible inhibitor 2 -CMP (2 -cytidine monophosphate) to RNAseA (ribonuclease A). We were specifically interested in the binding under conditions that were relatively “physiological,” i.e., at body temperature and in a buffer that contained multiple ions at roughly cellular concentrations. RNAses are exo- and endonucleases (EC 3.1.27.5), present in vertebrates and also in several bacteria [41–43], mold [44], and plant species [45, 46], that participate in
4 Applications 11
a variety of RNA-processing pathways. Several members of the RNAse superfamily, commonly referred to as the “non-secretory” type, function in predominantly intracellular roles, whereas others, termed the “secretory” type, have evolved [47] roles that are predominantly extracellular, presumably contributing to digestive and cytoprotective functions. (There are actually several systems of nomenclature for RNAses. This came about through historical factors, such as different names for the same RNAse being studied in different species and subsequently recognized as the same RNAse, identification of RNAse activity after naming the enzyme for other reasons, etc.). For the cytoprotective function of RNAses, cytotoxicity against external threats is a desirable and self-protective characteristic that is manifested under normal physiologic conditions. Usually, an intracellular ribonuclease inhibitor (RI) with exceptionally high affinity for RNAse protects the cell from any secretory RNAse that does not leave the cell. However, under two circumstances the secretory RNAses can be cytotoxic: failure of RI activity or unchecked RNAse activity. The first circumstance is a consequence of genetic defects that result in deficiencies in RNAse production or function. The second is a consequence of excess activity or inappropriate activity in pathological states. Perhaps the best-known example of the latter is the enhanced tumor growth that is attributed to angioneogenesis stimulated by the blood-borne RNAse angiogenin. However, there are other RNAses, specifically those designated as the ribonuclease 2 type, that are implicated in pathophysiological conditions where eosinophils appear in increased numbers, as in asthma and other inflammatory disorders in which tissue damage occurs as part of an allergic response [48–50]. Members of the human RNAse-A superfamily include Ĺ (“secretory”) pancreatic type (ribonucleases 1); Ĺ (“non-secretory” or “neurotoxin” type) liver, spleen, and urine (Us) RNAses (ribonucleases 2), also known as eosinophil-derived neurotoxin (EDN); Ĺ plasma RNAse (HT-29) (ribonucleases 4); Ĺ and angiogenins [47].
They constitute a group of homologous enzymes that display a preference for pyrimidine bases of RNA. Although some of the details are yet to be delineated, the catalytic mechanism of RNA cleavage by RNAses is hypothesized to occur as depicted in Fig. 4. The overall reaction is thought to occur in two steps [51]. In the first step, a 2 ,3 -cyclic phosphodiester is formed by a “transphosphorylation” reaction from the 5 carbon (starting from the base) to the 2 carbon of the next nucleotide in the RNA chain (Fig. 5). The catalytic reaction domain is formed by specific amino acid residues of the RNAse (Fig. 6), the details of which have been investigated by several strategies such as chemical modification and site-directed mutagenesis studies (e.g., [52–55]). The reaction products of the first step are not enzyme bound and therefore migrate into the solvent [56]. In the second step, which is believed to occur within the solvent, the product of the first step (2 ,3 -cyclic phosphodiester)
12 Thermodynamics of Protein-Ligand Interactions
Fig. 4 The proposed mechanism for the catalytic cleavage of RNA by RNase. The spheres represent amino acid residues of RNase or metal ions (e.g., Mg2+ ). Modified from [41].
is hydrolyzed to a 3 nucleotide [57, 58]. These reactions can then be represented as follows [51]: 1. Step 1: RNA ⇔ 2 ,3 -cyclic phosphodiesters + R-OH 2. Step 2: 2 3 -cyclic phosphodiesters →3 -phosphomonoesters.
Step 1 is the primary one that is catalyzed by RNAses. It is a fairly straightforward reaction and therefore is amenable to analysis by standard procedures [59]. RNase is also susceptible to inhibition by substances such as 2 -CMP. In our study [40],
Fig. 5 The proposed depolymerization reaction catalyzed by RNAse-A. The RNA backbone is indicated by the ellipsoid. Modified from [41].
4 Applications 13
Fig. 6 The catalytic cleft of bovine RNAse-A (indicated by the stippled region). A segment of RNA is oriented and held in the pocket formed by the amino acids indicated. Modified from [52].
we used ITC to determine the binding affinity and thermodynamic parameters associated with the reversible inhibition of RNAse-A by 2 -CMP at body temperature (37 ◦ C) and in a more “physiologically relevant” (i.e., multi-ion) buffer.5) These data ultimately might be helpful in drug-design efforts. Consistent isotherms with stable baselines were obtained. Maximal output to the injections of 2 -CMP was about −1.5 to –2.5 µcal/s, the negative deflection indicative of an exothermic reaction. As conventional for studies of this sort, the transposed data were plotted as the integrated heats (kcal/mol of 2 -CMP) for each injection against the 2 -CMP/RNAse-A molar ratio, and fitting parameters for the single-site nonlinear regression computer-fit of the raw data points yielded values for S (stoichiometry of the interaction), K eq , and H◦ for each run. The calculated stoichiometry was 5) Bovine pancreatic RNAse-A, 2 -CMP free acid (98% purity), Na+ , K+ , Ca+ , Mg2+ acetate, and glacial acetic acid (ACS or molecular biology grade) were purchased from commercial sources. The RNAse was dissolved in deionized water and was dialyzed twice for 4 h (in 20 mL solution) in a stirred 1-L beaker maintained at 1.5 ◦ C by immersion in an ice-bath. RNAse and salt stock solutions (in deionized water) were mixed such that the final concentrations were KCl (3 mM), CaCl2 (0.1 mM), NaAcetate (10 mM), K2 PO4 (3 mM), MgSO4 (0.4 mM), and KAcetate (50 mM) adjusted to pH 5.5 by dropwise addition of 50 mM HAcetate. The concentration of RNAse (0.04–0.05 mM), selected to be not much higher than the K d of interaction with 2 -CMP, was determined by quantitative UV spectrophotometre (277.5 nm; extinction coefficient ε = 9800 M−1 m−1 ). The concen-
tration of 2 CMP (1.2 mM), selected so that the c value (equal to the product of the binding constant and the total molar concentration of RNAse) would be between 1 and 500, was prepared in the same buffers as the RNAse-A and verified spectrophotometrically (260 nm, ε = 7400 M−1 cm−1 ). Solutions were degased at 36.5◦ C under vacuum (about 686 mmHg). The reference cell of the calorimeter contained degassed deionized water. The reaction cell contents were stirred at 400 rpm at 37◦ C throughout the experiment (the frictional heat of stirring is incorporated into the baseline). 2 -CMP was introduced into the reaction cell in a series of 35 4-µL injections, each delivered over 16 s at 3-min intervals.The equipment automatically adjusts for the change in volume. The data were evaluated (sampling rate 2 s−1 ) with computer software.
14 Thermodynamics of Protein-Ligand Interactions Table 1 Comparison of the dissociation constant and thermodynamic parameters obtained
for the 2 -CMP/RNAse-A interaction in multi-ion buffer and in a 50 mM potassium acetate buffer [61]
G◦ (kcal/mole) H◦ (kcal/mole) S◦ (kcal/mole · K) K d (nM) ∗
Multi-ion
Single-ion
−6.90±0.16* −15.7±2.0 −0.028±0.006 13.9±3.9
−7.46±0.10 −21.9±0.9 −0.047±0.003 5.6±1.0
Significant difference (P<0.05).
very close to 1:1, consistent with previous measures by others of a 1 to 1 interaction between 2 CMP and RNAse-A (e.g., [59]). The other estimated parameters, means (±S.D.) of triplicate runs, were K d = 13.9 (±3.9) µM; G◦ = −6.90 (±0.16) kcal/mol; H◦ (kcal/mol) = −15.7 (±2.0) kcal/mol; and S◦ = −0.028 (±0.006) kcal/mol · K. The observed negative entropy change is consistent with the location of the ribonucleolytic reaction active site within a cleft that binds and cleaves RNA [60]. The interaction proceeds because of a larger decrease in enthalpy. These results, which were determined in multi-ion buffer, were notably different from those determined in single-ion buffer [61] (Tab. 1). This single example, hopefully, serves as an example of the methodology of ITC and also a sense of its versatility. 4.2 van’t Hoff Determination of Thermodynamic Parameters
The van’t Hoff method has been the most commonly applied technique to determine thermodynamic parameters. A MEDLINE search of “van’t Hoff” reveals over 500 publications between 1966 and 2002. The application to enzyme reaction is well known. More recently, this method has been applied to ligand-receptor interactions [1]. Because these applications are less well known, a short summary is presented. There are also more caveats associated with such applications, a topic considered subsequently. The basic principles of thermodynamics of course apply to any chemical system, and in this sense the extension of the application of thermodynamic analysis to ligand-receptor interactions is straightforward. Ligand-receptor interactions involve a ligand molecule that has “affinity” for a receptor molecule in biological tissue. There is a requisite complementary 3-D shape for the ligand to able to “fit” the receptor and form chemical bonds – usually weak, reversible ones–with the receptor molecule. A subset of ligands, termed “agonists,” is also capable of inducing a biological effect by binding to receptors. Such molecules are said to have “intrinsic activity,” “efficacy,” or some similar term. Agonists can be “full” or “partial,” depending on their efficacy. Ligands that possess affinity but lack efficacy are “antagonists.” Such ligands do not activate measurable biological effects but block the agonist’s access to the receptor sites. Because it is not always possible to
4 Applications 15
control all variables precisely, the application of thermodynamic analysis to drugreceptor (pharmacological) interactions involves some care in both methodology and interpretation. Nevertheless, such an endeavor is often worthwhile if there is the opportunity to learn more about such systems than can be learned using other measures. The receptor concept was originated during the latter part of the 1800s and early 1900s, but it was the development of methodological techniques during the 1970s, in particular, radioligand binding techniques (e.g., [33]), that allowed the accurate determination of the number of drug-receptor binding complexes. With the wide commercial availability of relatively stable, radioactively labeled ligands, the technique is now almost routine (e.g., [35, 39]). The study published by Weiland et al. in Nature in 1979 [62] was perhaps the first to truly catch the attention of many biologists and remains probably the best-known thermodynamic study of drug-receptor interactions to many pharmacologists. In this study the authors measured the temperature dependency of the binding of 20 agonists and antagonists to the β-adrenoceptor located on turkey erythrocyte membranes. They reported that agonist binding affinity was greater at the lower of the two temperatures they examined. The calculated thermodynamic parameter values were reasonable for chemical reactions (G◦ = −6.19 to −12.51 kcal mol−1 , H◦ = −12.75 to −18.86 kcal mol−1 ). In contrast, it was found that antagonist binding to the β-adrenoceptor was largely “entropy driven” (the major contribution to the negative G◦ was due to a positive S◦ of 0.013 to 0.042 kcal mol−1 K−1 ). Thus, there was a clean distinction between agonist and antagonist binding that coincided quite nicely with prevailing views of the actions of agonists and antagonists at receptor sites -that agonists, but not antagonists, induce conformational changes in receptors and that this could account for the induction of the biological response by agonists but not antagonists. We now know, of course, that this does not hold for all receptor binding, but the early publications by Weiland et al. [62, 63] stand out as seminal in the field. Subsequent work has provided insight into a number of drug-receptor interactions. An example of the results of thermodynamic studies on one particular receptor type, the opioid receptor, is given in Tab. 2 [64–76]. Similar summaries of thermodynamic studies of other receptors can be found in Raffa [1]. Another application of the method, one that was brought to bear on a puzzling ligand-receptor question, was that reported by Wild et al. [75]. Although previous pharmacological studies had suggested the existence of more than one subtype of opioid δ receptor, only one had been cloned. Wild et al. [75] reasoned that a distinction could be demonstrated if two preparations, each containing a population of opioid δ receptors, had different temperature dependency of the dissociation constant. They measured the temperature dependence of the dissociation constant (using radioligand binding techniques) of the selective opioid δ receptor ligand [3 H]naltrindole in mouse brain tissue and mouse spinal cord tissue. Comparison of the two revealed that the van’t Hoff plots for mouse brain and mouse spinal cord had different slopes: one was positive and the other was negative. It was concluded that there are multiple subtypes of opioid δ receptors (at least functionally).
16 Thermodynamics of Protein-Ligand Interactions Table 2 Examples of thermodynamic studies of ligand interaction with opioid receptors
(from [1])
Agonists a. Radioligand binding β-endorphin DAMGO (µ) DAMGO DAMGO DAMGO DADLE δ DADLE Deltorphin Dihydromorphine Dihydromorphine DPDPE δ EKC κ EKC EKC (has) EKC (las) Etorphine Etorphine Methadone Morphine Ohmefentanyl Pentazocine PL017 SNC-80 (δ) SNC-80 (has) SNC-80 (las) Sufentanil b. Isolated Tissue DPDPE Antagonists a. Radioligand Binding CTAP Diprenorphine Diprenorphine Naloxone Naloxone Naloxone Naltrexone Naltriben (δ) Naltrindole (δ) Naltrindole Naltrindole Naltrindole Naltrindole TIPP (ψ) (δ)
Preparation
G◦
H◦
S◦
Reference
Rat brain Guinea pig brain Bovine adrenal Rat brain r-MOR/(CHO) Guinea pig brain Bovine adrenal Rat brain Rat brain Rat brain m-DOR-1 Guinea pig brain Bovine adrenal Frog brain Frog brain Rat brain Bovine adrenal r-MOR/(CHO) r-MOR/(CHO) r-MOR/(CHO) r-MOR/(CHO) r-MOR/(CHO) m-DOR-1 h-DOR/(CHO) h-DOR/(CHO) r-MOR/(CHO)
<0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0
>0 >0 >0 >0 <0 >0 >0 >0 >0 >0 <0 >0 >0 <0 <0 >0 >0 <0 <0 >0 <0 <0 <0 <0 <0 >0
>0 >0 >0 >0 >0 >0 >0 >0 >0 >0 >0 >0 >0 T-dep T-dep >0 >0 >0 >0 >0 >0 >0 >0 >0 >0 >0
64 65 66 67 68 65 66 67 69 67 70 65 66 71 71 72 66 68 68 68 68 68 70 73 73 68
MVD
<0
<0
>0
74
r-MOR/(CHO) Rat brain r-MOR/(CHO) Rat brain Rat brain h-DOR/(CHO) r-MOR/(CHO) h-DOR/(CHO) Mouse brain Mouse spinal cord NG 108-15 cells m-DOR-1 h-DOR/(CHO) m-DOR-1
<0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0 <0
>0 <0 >0 <0 <0 >0 <0 <0 >0 <0 >0 >0 >0 <0
>0 >0 >0 >0 >0 >0 >0 >0 >0 >0 >0 >0 >0 >0
68 72 68 69 67 73 68 73 75 75 75 70 73 70
5 Caveats 17 Table 2 Continued.
b. Isolated Tissue Naloxone Mixed Radioligand binding Bremazocine
Preparation
G◦
H◦
S◦
Reference
MVD
<0
<0
<0
76
h-DOR/(CHO)
<0
>0
>0
73
1 CHO = Chinese-hamster ovary cells; CTAP = D-Phe-Cys-Tyr-D-Trp-Arg-Thr-penicillamine-Thr-NH2 ; DAMGO = [D-Ala2 , NMePhe4 , Gly-ol5 ]enkephalin; DADLE = [D-Ala2 , D-Leu5 ]enkephalin; DPDPE = [D-Pen, 2,5 ]enkephalin; EKC = ethylketocyclazocine; has = high-affinity binding site; las = low-affinity binding site; m-DOR-1 = cloned δ receptor from mouse brain; MVD = mouse vas deferens; NG 108-15 = mouse neuroblastoma-rat glioma hybrid; PL017 = Tyr-Pro-NmePhe-D-Pro-NH2 ; SNC80 = (+)-4-[(aR)-a-((2S,5R)-4-allyl-2,5-dimethyl-1-piperazinyl)-3-methoxybenzyl]-N,N-diethylbenzamide; T-dep = temperature-dependent; TIPP(ψ) = Tyr-Tic(ψ)[CH2 NH]Phe-Phe-OH.
5 Caveats
The measurement of thermodynamic parameters for protein-ligand interactions can provide valuable insight into aspects of the interaction that are not easily obtainable by other techniques. However, as with all techniques, there are certain limitations in the approach – some related to the methodology and some related to the complexities of the systems under investigation. It is necessary to remember, for example, that the parameters determined apply to the overall reaction being measured. For most protein-ligand interactions, more than one process may be involved. For receptor-ligand interactions, this is almost certainly the case. For example, as a drug molecule interacts with a receptor and makes the transition from free to bound state, energy changes occur as the result of the alteration of the arrangement of receptor molecules as well as of the solvent molecular matrix from which the ligand leaves. Ion displacement, proton transfer, and other processes can be involved. The thermodynamic parameters that are measured for the interaction include these processes. Some of the more likely encountered possible limitations in thermodynamic analysis, particularly for ligand-receptor interactions, include the following: Ĺ Most receptors are membrane bound. Thus, the interaction of the receptor with the membrane must be considered (constraints on degrees of freedom, changes in the degrees of freedom upon ligand binding, etc.) Ĺ The thermodynamic analyses most often used, particularly the van’t Hoff method, require that measurements be made at steady-state conditions. In the case of radioligand binding determination of equilibrium constants, the time required for the protein-ligand interaction to reach steady state depends on the incubation temperature, and, therefore, the equilibrium constant must be
18 Thermodynamics of Protein-Ligand Interactions
Ĺ
Ĺ
Ĺ
Ĺ
determined for each temperature studied. For the most accurate results, the determination needs to be made at more than two temperatures in order to detect non-linearity. The integrated form of the van’t Hoff equation takes the simple form that is commonly used only if H◦ and S◦ for the interaction are not temperature dependent; otherwise, non-linearity in the van’t Hoff plot can arise. Meaningful information can still be obtained in such cases, but more complex analysis is required. The relevant affinity state is not always obvious. If the binding reaction is complicated by other processes, such as degradation of the ligand or internalization of the ligand, receptor, or both, then the data cannot be analyzed by simple thermodynamics methods – unless the system is defined in a way to incorporate these additional phenomena. Although thermodynamic parameters can be obtained for interaction mechanisms that are complex, interpretation of the results is greatly simplified when the interaction mechanism is simple. For example, tissues in which multiple receptor types are expressed will yield results different from tissues expressing only one type, unless a type-selective ligand is used. In radioligand binding studies, non-linear Scatchard plots or competition curves that have abnormally steep slopes imply complex binding phenomena, possibly involving multiple receptor types or affinity states. In such cases, the thermodynamic parameters should be separately determined for each receptor type or affinity state. The equilibrium (dissociation) constant (binding affinity) that is measured might depend upon the receptor affinity state, G protein coupling, allosteric influences, or other factors distal to the actual binding site. According to most present models, this is more likely for agonists than antagonists.
6 Summary
The determination of thermodynamic parameters of chemical reactions is extremely useful for the characterization and understanding of chemical reaction processes. The recent extension of this strategy to protein-ligand interactions has yielded equally significant insight into the more intimate details of these complicated and intransigent systems. In addition, the pragmatic application of the information obtained from thermodynamic data of protein-ligand interactions to novel drug-discovery efforts offers exciting new opportunities for creative and valuable work. Prior to the introduction of modern, automated, high-sensitivity calorimetry equipment, the van’t Hoff technique (which is based on the temperature dependence of the equilibrium constant of the reaction) was the primary experimental approach available to determine the thermodynamics of protein-ligand interactions. It remains a mainstay of such determinations. The technique requires the measurement of the equilibrium constant (or of its reciprocal, the dissociation constant),
References
and a large variety of methods have been developed to accomplish this. Radioligand binding is presently the most commonly used method for measuring the reaction constants of ligand-receptor interactions. In its most simplified form, the van’t Hoff equation assumes a temperature independence of enthalpy, and this requirement is unfortunately not always verified by experimentalists who use it. However, this problem can be easily avoided or overcome by appropriate experimental design or data analysis. The introduction of highly sensitive and automated calorimetric equipment has added new options for the measurement of thermodynamic parameters of proteinligand interactions. By varying either the temperature – as in differential scanning calorimetry (DSC) – or the ligand concentration – as in isothermal titra-tion calorimetry (ITC) – thermodynamic parameters are obtained directly. As with the van’t Hoff technique, there are limitations of practice and of interpretation. Perhaps the major pragmatic limitation at the present time is the relatively large amount of sample required for high-affinity interactions. As new strategies are devised to overcome these drawbacks, the application of calorimetric approaches will expand even further. The ever-increasing interest in the folding and interaction of large biomolecules with endogenous or designed ligands will provide the impetus for continued improvement and implementation of experimental approaches to determine the thermodynamics of protein-ligand interactions. The formalization of this interest in the new field of “proteinomics” will provide a framework for its development, and its application to drug-discovery efforts will demonstrate, as thermodynamics has always done, its utility.
References 1 R. B. RAFFA, Drug-Receptor Thermodynamics: Introduction and Applications, John Wiley & Sons, UK, 2001. 2 A. P. LIGHTMAN, Great Ideas in Physics, McGraw-Hill, USA, 2000. 3 M. GUILLEN, Five Equations that Changed the World, Hyperion, USA, 1995. 4 H. C. VON BAEYER, Warmth Disperses and Time Passes, Random House, USA, 1999. 5 I.M. KLOTZ, in R. B. RAFFA, ed., Drug-Receptor Thermodynamics: Introduction and Applications, John Wiley & Sons, UK, 2001. 6 H. C. van NESS, Understanding Thermody namics, Dover Publications, USA, 1969. 7 B. H. MAHAN, Elementary Chemical Thermodynamics, W.A. Benjamin, USA, 1964.
8 Ya. A. SMORODINSKY, (translated by V. I. KISIN), Temperature, Mir, USSR, 1984. 9 L. D. LANDAU, A. I. KITAIGORODSKY, (translated by M. GREENDLINGER), Molecules, Mir, USSR, 1980. 10 J. S. DUGDALE, Entropy and its Physical Meaning, Taylor & Francis, UK, 1996. 11 M. PLANCK, Treatise on Thermodynamics, Dover, USA, 1945. 12 I. M. KLOTZ, Ligand-Receptor Energetics: a Guide for the Perplexed, John Wiley & Sons, USA, 1997. 13 E. GRUNWALD, Thermodynamics of Molecular Species, John Wiley & Sons, USA, 1997. 14 G. G. HAMMES, Thermodynamics and Kinetics for the Biological Sciences, John Wiley & Sons, USA, 2000.
19
20 Thermodynamics of Protein-Ligand Interactions
15 M. GRAETZEL, P. INFELTA, The Bases of Chemical Thermodynamics, Universal, USA, 2000. 16 P. PERROT, A to Z of Thermodynamics, Oxford University Press, UK, 1998. 17 L. K. NASH, Elements of Chemical Thermodynamics, 2nd ed., Addison-Wesley, USA, 1970. 18 A. MACZEK, Statistical Thermodynamics, Oxford University Press, UK, 1998. 19 M. GOLDSTEIN, I. F. GOLDSTEIN, The Refrigerator and the Universe: Understanding the Laws of Energy, Harvard University Press, USA, 1993. 20 H. J. MOROWITZ, Foundations of Bioenergetics, Academic Press, USA, 1978. 21 Y.A. C ¸ ENGEL, M. A. BOLES, Thermodynamics: an Engineering Approach, 2nd ed., McGraw-Hill, USA, 1994. 22 S. I. SANDLER, Chemical and Engineering Thermodynamics, 2nd ed., John Wiley & Sons, USA, 1989. 23 G. N. LEWIS, M. RANDALL, Thermodynamics (revised by K. S. PITZER, L. BREWER), 2nd ed., McGraw-Hill, USA, 1961. 24 E. B. SMITH, Basic Chemical Thermodynamics, 4th ed., Clerendon, UK, 1990. 25 G. PRICE, Thermodynamics of Chemical Processes, Oxford University Press, 1998. 26 I. WADSO¨ , Thermochimica Acta 1995, 267, 45–59. 27 E. FREIRE, O. L. MAYORGA, M. STRAUME, Analytical Chem. 1990, 62, 950A–959A. 28 M. L. DOYLE, Curr. Opinion Biotech. 1997, 8, 31–35. 29 R.R.J. O’BRIEN, I. HAQ, J.E. LADBURY, in R.B. RAFFA, ed., Drug-Receptor Thermodynamics: Introduction and Applications, John Wiley & Sons, UK, 2001. 30 H. NAGHIBI, A. TAMURA, J. M. STURTEVANT, Proc. Natl. Acad. Sci. (USA) 1995, 92, 5597–5599. 31 J. R. HORN, D. RUSSELL, E. A. LEWIS, K. P. MURPHY, Biochem. 2001, 40, 1774–1778. 32 D.J. WINZOR, W.H. SAWYER in R. B. RAFFA, ed., Drug-Receptor Thermodynamics: Introduction and Applications, John Wiley & Sons, UK, 2001. 33 R.D. O’BRIEN, ed., The Receptors, a Comprehensive Treatise: vol. 1 General Principles and Procedures, Plenum, USA,
1979. 34 W. H. SAWYER, D. J. WINZOR, Curr. Protocols Protein Sci. A 1999, 5A, 1–40. 35 D. J. WINZOR, W. H. SAWYER, Quantitative Characterization of Ligand Binding, John Wiley & Sons, USA, 1995. 36 S. P. COLOWICK, F. C. WOMACK, J. Biol. Chem. 1969, 244, 774–776. 37 W. F. BLATT, S. M. ROBINSON, H. J. BIXLER, Analytical Biochem. 1968, 26, 151–173. 38 H. PAULUS, Analytical Biochem. 1969, 32, 91–100. 39 L. E. LIMBIRD, Cell Surface Receptors: a Short Course on Theory and Methods, 2nd ed., Kluwar, USA, 1996. 40 R. B. RAFFA, S. D. SPENCER, R. J. SCHULINGKAMP, Biochem. Pharmacol., 2002, 63, 1937–1939. 41 A.W. NICHOLSON, in G. D’ALESSIO, J.F. RIORDAN, eds., Ribonucleases: Structures and Functions, Academic Press, USA, 1997. 42 R. W. HARTLEY, in G. D’ALESSIO, J.F. RIORDAN, eds., Ribonucleases: Structures and Functions, Academic Press, USA, 1997. 43 M. IRIE, in G. D’ALESSIO, J.F. RIORDAN, eds., Ribonucleases: Structures and Functions, Academic Press, USA, 1997. 44 I. G. WOOL, in G. D’ALESSIO, J. F. RIORDAN, eds., Ribonucleases: Structures and Functions, Academic Press, USA, 1997. 45 P. A. BARIOLA, P. J. GREEN, in G. D’ALESSIO, J.F. RIORDAN, eds., Ribonucleases: Structures and Functions, Academic Press, USA, 1997. 46 S. K. PARRY, Y-H. LIU, A. E. CLARKE, E. NEWBIGIN, in G. DALESSIO, J.F. RIORDAN, eds., Ribonucleases: Structures and Functions, Academic Press, USA, 1997. 47 J. J. BEINTEMA, H. J. BREUKELMAN, A. CARSANA, A. FURIA, in G. DALESSIO, J.F. RIORDAN, eds., Ribonucleases: Structures and Functions, Academic Press, USA, 1997. 48 G. J. GLEICH C. R. ADOLPHSON, Adv. Immunol. 1986, 39, 177–253. 49 G. J. GLEICH, H. KITA, C. R. ADOLPHSON, IN M. M. FRANK, K. F. AUSTEN, H. N. CLAMAN, E. R. UNANUE, eds., Samter’s Immunologic Diseases, Little, Brown, USA, 1995. 50 M. R. SNYDER, G. J. GLEICH, in G. D’ALESSIO, J.F. RIORDAN, eds.,
References
51
52
53 54
55
56
57
58
59
60
61
Ribonucleases: Structures and Functions, Academic Press, USA, 1997. C. M. CUCHILLO, M. VILANOVA, M. V. NOGUE´ S, in G. DALESSIO, J.F. RIORDAN, eds., Ribonucleases: Structures and Functions, Academic Press, USA, 1997. N. RUSSO, R. SHAPIRO, Potent inhibition of mammalian ribonucleases by 3 ,5 -pyrophosphate linked nucleotides. J. Biol. Chem. 1999, 274, 14902–14908. M. S. STERN, M. S. DOSCHER, FEBS Letts. 1984, 171, 253–255. K. HAYDOCK, C. LIM, A. T. BRUNGER, M. KARPLUS, J. Amer. Chem. Soc. 1990, 112, 3826–3831. G. L. GILLILAND, in G. D’ALESSIO, J.F. RIORDAN, eds., Ribonucleases: Structures and Functions, Academic Press, USA, 1997. C. M. CUCHILLO, X. PARE´ S, A. GUASCH, T. BARMAN, F. TRAVERS, M. V. NOGUE´ S, FEBS Letts. 1993, 333, 207–210. F. M. RICHARDS, H. W. WYCKOFF HW in P. D. BOYER, ed., The Enzymes 3rd ed., vol. 4, Academic Press, USA, 1971. M. R. EFTINK, R. L. BILTONEN RL, in A. NEUBERGER, K. BROCKLEHURST, ed., Hydrolytic Enzymes, Elsevier, The Netherlands, 1987. T. S. WISEMAN, S. WILLISTON, J. F. BRANDTS, L.-N. LIN, (1989) Analytical Biochem. 1989, 179, 131–137. J. J. BEINTEMA, J. HOFSTEENGE, M. IWAMA, T. MORITA, K. OHGI, M. IRIE, R. H. SUGIYAMA, G. L. SCHIEVEN, C. A. DEKKER, D. G. GLITZ DG, Biochem. 1988, 27, 4530–4538. S. D. SPENCER, O. ABDUL, R. J. SCHULINGKAMP, R. B. RAFFA, J. Pharmacol. Exp. Ther. 2002 301, 925–929.
62 G. A. WEILAND, K. P. MINNEMANN, P. B. MOLINOFF, Nature (London) 1979, 281, 114–117. 63 G. A. WEILAND, K. P. MINNEMANN, P. B. MOLINOFF, Molec. Pharmacol. 1980, 18, 341–347. 64 P. NICOLAS, R. G. HAMMONDS Jr., S. GOMEZ, C. H. LI, Arch. Biochem. Biophys. 1982, 217, 80–86. 65 P. A. BOREA, G. M. BERTELLI, G. GILLI, Eur. J. Pharmacol. 1988, 146, 247–252. 66 N. BOURHIM, P. CANTAU, P. GIRAUD, E. CASTANAS, Compar. Biochem. Physiol. C.: Compar. Pharmacol. Toxicol. 1993, 105, 435–442. 67 G. F´ABIA´ N, S. BENYHE, J. FARKAS, M. SZUCS, J. Recept. Signal. Transduct. Res. 1996, 16, 151–168. 68 J. G. LI, R. B. RAFFA, P. CHEUNG, T. B. TZENG, L.-Y. LIU-CHEN, Eur. J. Pharmacol. 1998, 354, 227–237. 69 P. ZEMAN, G. TOTH, R. KVETNANSKY, Gen. Physiol. Biophys. 1987, 6, 237–248. 70 P. A. MAGUIRE, G. H. LOEW, Eur. J. Pharmacol. 1996, 318, 505–509. 71 J. MOITRA, H. A. OKTEM, A. BORSODI, J. Neurochem. 1995, 65, 798–801. 72 R. HITZEMANN, M. MURPHY, J. CURELL, Eur. J. Pharmacol. 1985, 108, 171–177. 73 K. D. WILD, S. K. YAGEL, R. B. RAFFA, Intl. Narcotics Res. Conf., Garmisch, 1998. 74 R. B. RAFFA, K. D. WILD, H. I. MOSBERG, F. PORRECA, Eur. J. Pharmacol. 1993, 244, 231–238. 75 K. D. WILD, F. PORRECA, H. I. YAMAMURA, R. B. RAFFA, Proc. Natl. Acad. Sci. (USA) 1994, 91, 12018–12021. 76 R. B. RAFFA, K. D. WILD, H. I. MOSBERG, F. PORRECA, J. Pharmacol. Exp. Ther. 1992, 263, 1030–1035.
21
1
Hydrogen Bonds in Protein-Ligand Complexes M. A. Williams, and J. E. Ladbury
University of College London, London, United Kingdom
Originally published in: Protein-Ligand Interactions.Edited by Hans-Joachim B¨ohm, Gisbert Schneider. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30521-6
1 Introduction 1.1 The Importance of Hydrogen Bonds
Hydrogen bonding plays a significant role in many chemical and biological processes, including ligand binding and enzyme catalysis. Consideration of hydrogenbonding properties in drug design is important because of their strong influence on specificity of binding, transport, adsorption, distribution, metabolization, and excretion properties of small molecules. Their ubiquity and flexibility make hydrogen bonds the most important physical interaction in systems of biomolecules in aqueous solution. Because hydrogen atoms comprise approximately one-half of the atoms within biological macromolecules and two-thirds of the atoms of the solvating water, hydrogen atoms, or protons, are found between almost every pair of noncovalently bonded heavy atoms in a biological system. Since the basic necessary condition for a hydrogen bond being present is that a proton lies between the electron clouds of two other atoms and modifies their interaction in a manner that is not explicable in terms of the van der Waals (dispersion-repulsion) effect, hydrogen bonds almost rival van der Waals interactions in number. Because van der Waals interactions occur unavoidably and with similar strength between all atoms, their contribution to selectivity of interactions largely lies in the shape selection caused by the repulsive component of the interaction. Consequently, from both an evolutionary and design perspective, modification of local hydrogen-bonding potential is the principal mechanism available for favorably enhancing the interactions between pairs of molecules. The popular notions of “hydrophobic” or “lipophilic” forces being important are merely a result of a nonatomic perspective. The hydrophobic
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Hydrogen Bonds in Protein-Ligand Complexes
forces, while being a simplifying concept, are a complex compound phenomenon resulting from redistribution and change in strength of water–water hydrogen bonds as solvent is released upon close approach of two apolar chemical groups. This may seem to be a somewhat enthusiastic view of the importance of hydrogen bonding, yet it is a consequence of regarding all forces from an atomic perspective and extrapolates a developing trend of the past decade, which has seen an increasing diversity of hydrogen-mediated interactions being considered significant in modulating the behavior of biological molecules. 1.2 Defining the Hydrogen Bond
The presently accepted definition of a hydrogen bond is due to Pimentel and McClellan’s classic 1960 book The Hydrogen Bond [1]. A hydrogen bond exists between a donor functional group (D H) and an atom or group of atoms (A) able to accept the bond, when there is evidence of association between the groups and that this is due to, or enhanced by, the presence of the hydrogen atom already covalently linked to D. This is clearly a rather broad definition, and the evidence for the bond itself can come from a wide variety of sources, e.g., X-ray crystallography, infrared spectroscopy, and calorimetry. Since 1960 a growing number of interactions have been characterized and encompassed by this definition. In the most familiar case of hydrogen bonding, an electronegative donor group (e.g., O, N, S) is considered capable of withdrawing electrons from the proton in a D H covalent bond, leaving the proton partially de-shielded, yielding a net partial positive charge, and resulting in the possibility of an electrostatic interaction between the proton and another electronegative group. This group has its electron density enhanced through induced polarization that is augmented by charge transfer to the proton. This simple electrostatic view of hydrogen bonding of electronegative atoms is originally due to Pauling [2] and is still the one embodied in almost all computational models of hydrogen bonding in biomolecular systems. However, hydrogen atoms have been subsequently shown to mediate many other interactions. Non-electronegative atoms such as carbon and silicon are know to act as donors, and in addition to the porbitals of electronegative atoms, delocalized π-orbitals of unsaturated or aromatic systems and negatively charged ionic species can act as acceptors (Tab. 1). Table 1 Potential hydrogen bond donor and acceptor groups classified according to their
strength of interaction
Very strong Strong Weak 1
Donor
Acceptor
N+ H3 , X+ -H, F H O H, N H, Hal H C H, S H, P H, M H
COO− , O− , N− , F− O C, O H, N, S C, F H, Hal− C C, Hal C, π S H, M, Hal M, Hal H, Se
X is any atom, Hal is any of the lighter halogens, and M is a transition metal.
2 Physical Character of Hydrogen Bonds 3
Theoretical efforts toward understanding the hydrogen bond have led to a progressive revision of the simple electrostatic picture from the 1970s onward [3, 4]. It is presently believed that in addition to the electrostatic contribution, there is a similarly sized covalent (quantum-mechanical) component to hydrogen bonds, which lies in the interaction between the empty σ * anti-bonding orbital of the donated hydrogen and highest occupied molecular orbital of the acceptor, which form a new shared orbital that is the dominant contribution to charge transfer in the interaction. This inherently quantum-mechanical view of the hydrogen bond seems to have received substantial confirmation from the recent experimental observation of peaks corresponding to the O H (1.72 Å) and O O (2.85 Å) distances in the anisotropic inelastic scattering of X-rays from the valence electrons of ice, indicating that electrons are being shared by atoms separated by these distances [5].
2 Physical Character of Hydrogen Bonds 2.1 Crystallographic Studies of Hydrogen Bonds
High-resolution X-ray and neutron crystallography have provided a great deal of information on the geometry of hydrogen bonds in both small molecule and protein crystals. X-rays or neutrons are scattered by collision with atomic electrons or nuclei, respectively, and in the ordered environment of a crystal produce diffraction patterns that can be interpreted to provide atomic positions. Neutron diffraction data are preferred for the study of hydrogen bonding, as the uncharged neutrons are scattered by direct collision with a proton (or deuteron) almost as efficiently as from heavier atoms. This is in contrast to the scattering of X-rays, which is very weak from the low electron density of hydrogen, rendering it invisible in all except the very highest-resolution (better than 1 Å) X-ray structures. However, neutron data have been historically much more difficult to obtain, and only 0.4% of small molecule and 0.05% of protein structures to date have been derived from neutron data. Advances in neutron production and detection technology and sample preparation promise to alleviate these experimental restraints, and many more neutron structures are expected in the future [6]. An example of the potential of the technique is the recent neutron diffraction structure of myoglobin [7], which shows in rich detail the extensive hydrogen bonding, the protonation states of histidines, and the orientation of water molecules in and around the hemebinding site. However, it will be some time before many such protein-ligand complexes are available for study. X-ray crystal structures numerically dominate the available molecular structural data, and our knowledge of hydrogen bonding geometry, particularly in proteins, is largely derived from them. Because the hydrogen atom is usually invisible in protein X-ray structures, in the vast majority of cases the positions of hydrogen atoms and the presence of a hydrogen bond is inferred from the proximity and expected
4 Hydrogen Bonds in Protein-Ligand Complexes
covalent geometry of donor and acceptor groups. Inherent errors in uncertainty of the donor and acceptor positions are also often larger than with neutron data, and biases in structure refinement procedures can lead to further uncertainty in local geometry. Clearly, without accurate atomic positions or even evidence of the presence of a hydrogen atom, in some cases the inferred hydrogen bonding may be erroneous and could lead to errors of interpretation. Small molecule X-ray crystal structures are usually determined by direct methods using little knowledge of the chemistry of the molecule other than its chemical composition and atom valances. Thus, the accuracy of the positions of heavy atoms depends only on the quality of the experimental data. Consequently, analysis of small molecule crystal structures provides unbiased information on both the relative orientation and separation of donor and acceptor groups (with errors in position typically 0.1 Å). However, the location of the hydrogen atom is not well determined even at very high resolution, as the electron density is not centered on the hydrogen atom but in the D H bond. D H bond lengths must be “normalized,” i.e., made equal to the average observed in chemically similar neutron-determined structures. This may produce small errors (<0.05 Å), as the presence and nature of a neighboring H-bond acceptor will affect the actual D H bond length, but such fine detail is not usually significant in comparison to other errors in structures. In the case of protein crystallography, considerably greater use is made of prior knowledge, of average covalent geometry, of torsion angle preferences, and of van der Waals radii of atoms in building a model whose properties are then compared to, and refined against, the experimental diffraction data until the agreement between the two is satisfactory. Consequently, some biases from the parameters representing the covalent geometry and non-bonded interactions are found in X-ray protein structures. The intrinsic uncertainty in the position of atoms in protein crystal structures is also usually greater and depends on whether the atom is part of the polypeptide backbone or in an amino acid side chain, and whether it is found in the interior or on the surface of the protein, where dynamic averaging of the position of atoms can be substantial. The average error in a set of coordinates derived from a 2.0–2.8 Å resolution electron density map is about 0.2–0.4 Å. Thus, for an interaction between a donor and acceptor, the error in measurement of a hydrogen bond distance can easily be 0.5 Å. Consequently, in some cases it can be very difficult to determine whether or not a hydrogen bond is present. 2.2 The Geometry of Hydrogen Bonds
Individual uncertainties in local structure are overcome in statistical surveys of large numbers of small molecule or protein structures, which are able to give a consistent and reliable picture of hydrogen bonding. Superposition of a particular donor–acceptor pair from many structures gives an anisotropic three-dimensional distribution of observed geometries. Differences between such distributions and those expected for a van der Waals interaction (more dispersed and isotropic) are diagnostic of the hydrogen-bonding capability of the selected groups where this may
2 Physical Character of Hydrogen Bonds 5
otherwise be in doubt. This is particularly true of weak hydrogen bond donors and acceptors, where deviation from van der Waals behavior may only be ascertained by superposition of large numbers of structures. The sheer weight of numbers means that the X-ray data are more informative, despite the poor visibility of hydrogen, and no significant differences from analysis of the much smaller number of neutron structures have been observed [8, 9]. Since hydrogen bond restraints are not used directly in structure refinement, even protein structures at relatively low resolution reveal unbiased information on the relative frequency of occurrence of particular types of hydrogen bond and orientation information on donor and acceptor groups (although distance information is less reliable due to the van der Waals parameters in the refinement force fields). Overall, studies of small molecule structures have greater accuracy and generality, which is particularly useful in thinking about a wide variety of potential ligands [10], and studies of protein structures give more specific information on amino acids and the significance of a particular type of hydrogen-bond interaction among all the competing interactions in proteins. The classic survey of hydrogen bonding in proteins was conducted by Baker and Hubbard [11]. This highlighted the importance of hydrogen bonds in forming not only helices and sheets but also the structure of loops, in binding water molecules, in recognition processes, and in catalysis. The most common bond in biological chemistry is the C O H N peptide backbone bond, which is usually substantially shorter than the van der Waals distance for O and N and is easily identified even in relatively low-resolution structures. Indeed, in the great majority of biological hydrogen bonds, the donating and accepting groups are either N or O atoms; however, weaker interactions including C H O or N, S H O and N H π electrons are observed [9]. In protein structures there is a systematic tendency toward a greater number of observed hydrogen bonds with increasing resolution. At 3.0 Å resolution, 85% of C O acceptors and 75% of N H donors form at least one hydrogen bond according to standard geometric criteria. However, at 1.5 Å resolution more than 95% of C O and 90% of N H are satisfied according to the same criteria, and virtually all are satisfied if weaker criteria are used that leave a greater margin for positional errors [12]. This means that inaccuracies in the vast majority of protein structures cause underestimation of the number of hydrogen bonds present. It is important to be aware of this in considering any particular protein-ligand structure. Indeed, there is a good case for forcing satisfaction of the weak criteria for classic hydrogen bond donors and acceptors in structure refinement and/or modeling studies of proteins, although this is not routinely done. Statistical analyses of both proteins and small molecules reveal that hydrogen bond stereochemistry is influenced by two major factors: the electronic configuration of the acceptors and the steric accessibility of the acceptors and donors. Considering the most common hydrogen bond in biological chemistry, there is a distinct preference for N H O C bonds to form in the O C RR’ plane and in the directions of the conventionally viewed sp2 lone pairs [13], with the proton lying within 30◦ of the plane and at 30–60◦ to the O C axis in the majority of cases. Note that in relation to the simple model of a hydrogen bond as an
6 Hydrogen Bonds in Protein-Ligand Complexes
electrostatic dipole-dipole interaction, which is embodied in most modeling software, the D H group does not lie along the dipole but points to the lone pairs. Note also that there is no absolute requirement for linearity of the acceptor–proton–donor system, as the σ * anti-bonding orbital of the H D, which is responsible for the covalent/charge transfer component of the bond, is a spherically symmetric orbital and imposes no directionality. The principal contributions to the observed directional preferences are the total electrostatic fields (accurately modeled as higher-order multi-poles [14]) and steric effects of donor, acceptor, and adjacent groups. The effect of electronic configuration of the acceptor is particularly apparent in the contrast of Ser/Thr and Tyr hydroxyl groups; the phenolic hydroxyl of tyrosine has a preference for near-plane position for donors and/or acceptors, as its sp2 hybridization leaves the lone pair electrons in the plane of the ring [15, 16], whereas serine and threonine hydroxyls have sp3 hybridization with two acceptor and one donor position at 120◦ spacing (the donating proton is usually trans to the carbon three covalent bonds away). The observed spatial distributions for the principal amino acid donor and acceptor groups are illustrated schematically in Fig. 1. Not all of the hydrogen-bonding potential of a particular amino acid is necessarily fulfilled. Although virtually all strong hydrogen-bonding groups form at least one hydrogen bond, and the very strong charged groups usually use all their capacity, R O H, R S H, C O and glutamate COO− form only one hydrogen bond [12]. It seems either that once the first interaction is made the second is less favorable energetically or that crowding makes it difficult to surround a group with the full complement of partners. The O H and S H groups tend either to donate or accept, and the main-chain carboxyl and glutamate COO− acceptors tend to be bifurcated. A bifurcated hydrogen bond occurs when a donated proton is close to two acceptor sites or a single acceptor is close to two donors. Analysis of high-resolution protein X-ray crystal structures shows that a quarter of all main-chain N H donors are bifurcated, i.e., they make more than the expected number of hydrogen bonds. Bifurcated donors occur systematically in both alpha-helices (most hydrogen bonds are of this type; the dominant component is N H O C(i-4 ), but longer distance N H O C(i-3 ), are also present) and in a substantial minority of β-sheet N H groups [17]. The possibility of such bifurcation should not be neglected in proteinligand structural studies, e.g., making one hydrogen bond to a bidentate acceptor may gain most of the binding energy. The above-mentioned hydrogen bond surveys considered only the classical strong hydrogen bond donors and acceptors O, N, and S, but recently the importance of weaker hydrogen bond groups has come to be more widely recognized. Weak–strong and weak–very-strong pairs of donor and acceptor are sufficiently energetically favored to compete with other biomolecular interactions and have an impact on protein structure and function. It has long been know that π-orbitals of aromatic ring systems can act as hydrogen bond acceptors (see the monumental review of weak hydrogen bonding by Desiraju and Steiner [18] for a detailed history), but it was not until the mid-1980s that they were unambiguously observed in several very high-resolution neutron and X-ray
2 Physical Character of Hydrogen Bonds 7
Fig. 1 Schematic illustration of the observed spatial distributions of hydrogen bond partners for several amino acids [13, 16]. Darker shading indicates an increased likelihood of finding a hydrogen bond partner at a particular position. Most groups have maximum likelihood of having their partner located in the plane of R’RD
or R’RA; only Ser/Thr/Cys/Lys have maxima out of plane. The Cys and Lys distributions are similar to Ser/Thr but with a greater and shorter D-A distance, respectively. The N H of the amide/His/Trp are also similar, and the O H group of tyrosine has a distribution similar to C O.
structures, including the observation by Perutz et al. [19] of a hydrogen bond from an asparagine side chain N H to an aromatic ring of the anti-sickling agent bezafibrate bound to hemoglobin. This complex also contained several other weak hydrogen bonds involving C H groups that help determine drug binding. The significance of these observations was not immediately appreciated, but a few further important examples, including the observation of several amino hydrogen
8 Hydrogen Bonds in Protein-Ligand Complexes
Fig. 2 Schematic of the observed hydrogen bonding to phosphotyrosine in a complex with the v-src SH2 domain [20]. Both very strong hydrogen bonds to the phosphate and weak bonds to the aromatic ring are important to binding.
bonds to the ring of the phosphotyrosine in its recognition by SH2 domains, as shown in Fig. 2 [20], and a later review by Perutz [21] stimulated both structural analyses and theoretical studies of D H π-acceptor systems in proteins [22, 23]. It has been shown that such bonds are relatively rare compared to the usual strong bonds. However, numerous examples are found where water and peptide D H π interactions are functional in stabilization of helix termini, strand ends, strand edges, beta-bulges, and regular turns. Side chain D H π hydrogen bonds are also formed in considerable numbers in α-helices and β-sheets. A recent survey [24] identified approximately 1 such weak bond for approximately every 11 aromatic residues (increasing to 1 per 6 residues for tryptophan because of its double-ring structure) in protein structures. There has been no systematic survey of such hydrogen bonds in protein-ligand complexes, but because many drugs contain a high proportion of atoms in ring structures, it is expected that there are many unrecognized cases in which they are important. Although the peptide Cα -H group has historically not been thought to form hydrogen bonds within proteins, recent experimental evidence and ab initio quantum calculations show it to be an effective proton donor, as it is activated by the neighboring peptide groups. Experimentally, Cα -H O contacts observed in
2 Physical Character of Hydrogen Bonds 9
neutron crystal-structure determinations of amino acids show the lengthening of the Cα -H bond with decreasing H O separation that is characteristic of hydrogen bonds [25]. Its binding energy to a water molecule has been calculated to lie in the range between 7.9 and 10.5 kJ mol−1 , comparable to the interactions between water molecules themselves, and a hydrogen bond to a charged lysine residue is significantly stronger than a conventional O H O interaction [26]. Both experimental and theoretical equilibrium Cα O bond lengths are about 3.3 Å, somewhat longer than the 2.7–3.0 Å of strong-donor/strong-acceptor pairs. 2.3 Infrared Spectroscopy of Hydrogen Bonds
The formation of a hydrogen bond changes the electron distribution in donor and acceptor groups, resulting in changes in the depth and shape of the potential wells corresponding to the covalent bonds in system. The changes in the covalent bonds are minor structurally and difficult to detect, as discussed above, but are readily observable via the change in the vibrational (infrared) spectrum of the system. Infrared spectroscopy is particularly useful in the study of homogeneous liquids and solids or dilute solutions of small molecules, where the simplicity of the system allows peaks or bands in the spectrum to be assigned to particular bonds via theoretical methods. Shifts in the D H stretching frequency are often the principal evidence of the reality of a group’s hydrogen-bonding ability, particularly in weak bonding cases and where insufficient statistical evidence is available from structural databases. Such shifts are also diagnostic of whether or not a hydrogen bond exists in an individual case, which can be difficult to decide on the basis of a single structure. Infrared spectroscopy is the primary method for monitoring the extent of hydrogen bond formation of particular groups in dilute solution in an inert solvent, the concentration and temperature dependence of which allows determination of the intrinsic thermodynamics of the formation of particular hydrogen bonds. The degree of change in a vibrational frequency can also be empirically related to hydrogen bond length and the energy of the bond [27, 28]. With these characteristics, infrared spectroscopy sounds like a highly appropriate technique for definitive characterization of individual hydrogen bonds in proteinligand structures. However, the difficulty of assigning a given infrared band to a particular hydrogen bond in a spectroscopically crowded heterogeneous biomolecular system has severely limited its use so far. Novel infrared techniques for spectral simplification such as ultraviolet resonance Raman spectroscopy, which selects polarizable aromatic groups [29], and Raman optical activity, which selects chiral centers [30], may see an increased use in understanding protein-ligand interactions in the future. For now, absorption spectroscopy remains a tool for fundamental, not protein-specific, investigation. In particular, it has provided considerable insight into apolar hydration processes via monitoring the signals from water (see Section 3.1).
10 Hydrogen Bonds in Protein-Ligand Complexes
2.4 NMR Studies of Hydrogen Bonds
Because of the ability to identify all individual C, N, and H atoms in the NMR spectra of proteins via isotopic labeling and multi-dimensional NMR techniques [31], NMR has become the key technique for investigation of biomolecular structure in solution. Several NMR spectral features are affected by hydrogen bonding, allowing hydrogen bonds to at least be identified and in some cases more accurately characterized than in high-resolution crystal structures. First, the chemical shift of a hydrogen-bonded proton is usually higher than an otherwise equivalent non-hydrogen bonded one, as the proton is de-shielded when it is withdrawn from the donor group. This can cause shifts of up to 6 ppm in the case of a very strong bond or bonds (e.g., a salt bridge). This effect is useful in comparative studies (i.e., with and without ligand) for the detection of hydrogen-bonding interactions on a ligand or protein. Because the H shift depends on many other conformational and environmental factors, however, it can be difficult to definitely ascribe the cause of a particular shift in the absence of a structure. If a series of closely related ligands is available (i.e., with and without the donors or acceptors in particular positions), comparison of spectra may determine which specific ligand and protein groups are interacting. Interpretational difficulties are, however, exacerbated by changes in solvent structure upon ligand binding. The change being measured is not simply due to the creation, or not, of a hydrogen bond but rather to the exchange of a hydrogen bond to water for the interaction with the ligand. This means that shifts in either direction can occur, reflecting changes in hydrogen bond strength. However, when a structure is available or in other cases where the binding partners are definitely identified, the hydrogen bond interaction and particularly its local energetic environment can be investigated in great detail [32]. Titratable groups (pK a near 7) are often of considerable importance to protein function and ligand binding. In particular, change of ionization state upon binding can have profound effects on the binding constant. NMR is the standard method of determining pK a of most groups in biomolecular complexes, by direct observation of changes in the apparent proton population or indirect observation of the donor atom chemical shift with pH [33]. Additional information on bond lengths (and energies) can be found from the quantitation of populations of protonated and deuterated hydrogen bonds in protein-ligand system in mixed H2 O/D2 O solvent [34]. Recently the direct coupling of nuclear magnetic energy levels across a hydrogen bond via their shared electrons has been detected as splitting of the NMR peaks, so-called 2 J (proton–acceptor) and 3 J (donor–acceptor) couplings. Such coupling is expected given the experimental observation and theoretical expectation of electron sharing (Section 1.2). A single coupling experiment allows direct identification of the hydrogen bonding partners (D, H, A), provided that they are isotopically (13 C, 15 N) labeled, and gives a very accurate determination of hydrogen bond length and angle based on a correlation with neutron structural data [35]. The requirement for isotope labeling and the general low sensitivity of the experiments have restricted their use so far to monitoring ligand-induced changes to the protein’s
2 Physical Character of Hydrogen Bonds 11
own hydrogen-bonding patterns [36], but with increased awareness of their utility and the resources to make labeled ligands, they could become a mainstay of future protein-ligand hydrogen-bonding studies.
2.5 Thermodynamics of Hydrogen Bonding
Observed crystal geometries represent a compromise between many competing forces, and the 3-D distribution observed for a particular donor-acceptor pair roughly (and through the blur of the positional uncertainties discussed in Section 2.1) represents a Boltzmann distribution on the potential energy surface of the hydrogen bond interaction. These distributions are generally rather broad (see Fig. 1), implying a shallow energy minimum or minima with depths of the same order of magnitude as thermal energies. Measurements of the energetics of individual hydrogen bonds can be made calorimetrically or spectroscopically in a dilute solution of the hydrogen-bonding moieties in an otherwise inert medium. Such experimental systems do not replicate the polarizing environment found in biological systems, but do allow comparison of the strengths of different types of hydrogen bonds and assessment of the ability of quantum theoretical methods for the prediction of those strengths. The success of theoretical methods in this simplified chemical milieu underpins confidence in their use in understanding the greater complexity of the situations found in biological systems. The diverse nature of the geometric and electronic character of the hydrogen bond is reflected in the range of its reported energetic value (Tab. 2). Intrinsic hydrogen bond strength differs by more than two orders of magnitude, depending upon the interacting partners. Even when the donor and acceptor atoms are the same, the dependence of hydrogen bonding on local electron distributions creates a variation of a factor of 10 for N H O C, depending on the nature of the substituents
Table 2 Examples of intrinsic hydrogen bond enthalpies measured for dilute solutions in
inert solvents Type
Example
−H(kj mol−1
Reference
Very strong. . .very strong Strong. . .very strong Strong. . .strong
F-H F− H O H (OH)− H O H OH2 R’RN H O CR1 R2 R O H O CR1 R2 R O H SR1 R2 R’RN H π RR’C-H OH2 RR’C H. π H S H SH2 CH4 SH2
163 96 21 14.5–18 14–32 17.5 4.5–1 6 9.2 5.8 4.5 1.7
18 18 18 28 28 28 21, 38 18 18 18 18
Weak. . .strong Weak. . .weak
12 Hydrogen Bonds in Protein-Ligand Complexes
attached to donor and acceptor (see [37] and references therein for a summary of substituent effects on donor and acceptor ability in drug-like molecules). 2.6 Experimental Thermodynamics of Biomolecular Hydrogen Bonds
Numerous studies have been performed to quantify the energetic contribution of hydrogen bonding in protein systems. Despite this significant experimental effort, many apparent discrepancies remain between studies on different systems. The main difficulty is that of isolating the contribution to binding from one or several defined hydrogen bonds. In aqueous solvent, hydrogen bond formation between the interacting moieties involves first breaking similar bonds with solvent water. This increases the complexity of the deconvolution of the components of the interaction that is required to give the energetic value for the formation of the protein-ligand hydrogen bond. In other words, instead of looking at the interaction D−H+A=D−H···A the actual event includes D H···O−H2 +H O H···A D H···A+H O H···O H2 . Rather than measuring the formation of one hydrogen bond in isolation, the experiment measures a rearrangement of already existing hydrogen bonds. This can have some counterintuitive consequences, e.g., the S H group is regarded as almost as effective a contributor to protein-ligand interactions as O H, since the energetic difference between being hydrated and forming an interfacial hydrogen bond is thought to be similar for both groups [39]. The most amenable way to measure the thermodynamic effect of hydrogen bonding in protein systems is to measure the differences between wild-type protein and a mutant form in which a residue is substituted, or a subtle chemical modification of a ligand is made, in order to remove selected hydrogen-bonding moieties. Interpretation of these data are complicated by the fact that hydrogen bonds affect, and are affected by, their local chemical environment, particularly in such adaptable and flexible entities as proteins. This causes an inherent difficulty in measuring the thermodynamic parameters for the contribution of a hydrogen bond to protein stability or protein-ligand interaction. Any experiment (e.g., mutagenesis) that creates a change in the disposition 1) of chemical groups surrounding the potential hydrogen bond site, 2) of the solvent, or 3) of the propensity of protein structures to conformationally adapt to binding, may affect the thermodynamic value obtained [40]. In the absence of structural information on each mutant, or modification, interpretation of such thermodynamic experiments is risky. As an alternative to these complexities, many studies have been carried out on model dilute solutions in media that supposedly mimic properties of the physio-chemical environments relevant to protein-protein and protein-ligand interactions.
2 Physical Character of Hydrogen Bonds 13
To emphasize the variation in reported experimental quantification of hydrogen bonds, we list several studies of the energetics (enthalpy or free energy) of hydrogen bond formation in biological interactions and related model systems. Early work by Schellman [41] on urea solutions arrived at a value of −6.3 kJ mol−1 for the H of formation of an amide hydrogen bond (i.e., a hydrogen bond between the N H and C O groups from amide linkages) in aqueous solution. Subsequently, the 1962 study of Klotz and Franzen [42] using dimerization of N-methyl-acetamide in aqueous solution suggested that the value of H for an amide hydrogen bond in water was near zero and that the overall free energy was unfavorable. However, later studies on δ-valerolactam [43, 44] calculated values of between −8 and −13 kJ mol−1 for the formation of a hydrogen bond in water. Scholtz et al. [45] also arrived at a favorable value for the H of formation of a hydrogen bond in water. Each of these studies is regularly quoted as the typical strong strong hydrogen bond energy for a solvent-exposed situation. Fersht et al., in a very well-known study of the binding of tyrosyl-tRNA synthetase to its substrate [39], found that deletion of a strong hydrogen bond donor or acceptor from the enzyme reduced the free energy of binding by only 2.1–6.3 kJ mol−1 , whereas removal of the partner of a very strong donor or acceptor weakened binding by an additional 12 kJ mol−1 . Data on mutants of ribonuclease T1 by Shirley et al. [46] similarly showed that a buried strong strong bond contributed −5.4 kJ mol−1 to conformational stability in the folded form. Here a different physical process is operating from that in the physiochemical studies of hydrogen bond formation in water because the groups are buried upon ligand binding or folding. In addition to loss of hydrogen bonds, buried groups lose long-range interactions with the surrounding water and gain long-range interactions with the protein. Several groups have tried to create physiochemical models of the burial process that allow partitioning of the two processes (bond formation and burial). The G for the transfer of a hydrogen-bonded peptide group from water to octanol (taken as a model of the protein interior) was determined as being about 4.6 kJ mol−1 , i.e., desolvation of the hydrogen-bonded pair is unfavorable [47]. In conjunction with the above mutagenesis data, this implies that the G contribution made by the hydrogen bond formed in water is favorable but is made less so by burial. However, it was suggested on the basis of model systems that the associated unfavorable H of dehydrating polar groups on burial might result in an overall unfavorable H for the burial of hydrogen-bonded polar groups [48, 49]. This is a contentious hypothesis, as dissolution of crystalline cyclic dipeptides [50] demonstrates that amide-hydroxyl hydrogen bonds give a favorable contribution to interaction from both H = −7.1 ± 1.3 kJ mol−1 and G = −2.4±0.8 kJ mol−1 . It is apparent from the above data that despite the many attempts to quantify the H and G for the formation of hydrogen bonds in biological systems, there is no convergence to a single value. This is not surprising, since the strength of hydrogen bonds depends on the local environment, which itself can be highly variable in biological systems. More data on H, G, and structural changes in series of similar complexes, categorized by type of hydrogen-bonding partners and local environmental change, are needed in order to provide a set of case histories from which
14 Hydrogen Bonds in Protein-Ligand Complexes
it may be possible to predict the effect of a proposed change by similarity to a previous case. An important feature of all these observed energy changes resulting from rearrangement of hydrogen bonds is that they are of the same order of magnitude as the thermal energies at room (or body) temperature (RT=2.48 kJ mol−1 at 298 K). The biological importance of hydrogen bonds lies not only in their specific structuring geometry but also in the possibility of rapid reorganization of hydrogen bonds (and consequent structural change) in response to environmental changes, such as ligand binding, assisting specificity of recognition.
3 Interactions with Water 3.1 Bulk and Surface Water Molecules
Thinking about the hydration of protein complexes is simplified by dividing water molecules into four classes: bulk water molecules that are not directly in contact with the biomolecules, surface water hydrogen bonded to the protein or ligand, surface water associated with apolar biomolecular groups, and buried water molecules that have no direct connection to the bulk solvent. Water molecules have a strong tendency to interact with each other and consequently cause the association of compounds with which they cannot interact as strongly. A bulk water molecule makes between four and five hydrogen bonds with neighboring water molecules, each contributing approximately −10 kJ mol−1 to the energy of bulk water. Bulk water molecules have another important property — they are easily reoriented in response to an electrostatic field. This reorientation serves to attenuate the attractive links between charged groups, which form the strongest hydrogen bonds found in protein-ligand systems. Such strong interactions could otherwise impose unacceptable rigidity on the molecules. Surface water molecules are distributed over the entire surface of the interacting molecules, are not usually particularly restricted in their motion, and exchange with the bulk solvent on a time scale of 10–300 picoseconds. Water molecules hydrogen bonded to surface polar groups are generally thought to have energetic behavior similar to bulk water, with subtle differences dependent on the strength of their hydrogen-bonding partner. A significant minority of external waters make more than one hydrogen bond with the macromolecular surface, and those water molecules hydrogen bonded to the main chain C O and N H groups generally appear at better-defined sites than those bound to amino acid side chains [51]. These structural observations probably reflect underlying thermodynamic preferences for particular polar sites to be hydrated. In ligand binding, water molecules at particularly favorable sites are usually retained or replaced by strong hydrogenbonding groups of the ligand in order to enhance binding affinity.
3 Interactions with Water
Water at apolar surfaces has rather distinctive thermodynamic properties, in particular an unusually low entropy and high heat capacity. Displacement of water from apolar surfaces to bulk during protein folding and ligand binding dominates observed heat-capacity changes of the whole system. Consequently, apolar surface hydration has received much more attention than that of polar surfaces. Water molecules associated with exposed apolar groups make only relatively weak direct interactions; the discontinuity in hydrogen-bonding options presented at the apolar surface means that they tend to become organized. Many biochemistry textbooks propagate the view that in order to satisfy their hydrogen bond potential, waters become ordered in rigid cages around the apolar groups in a similar manner to clathrate structures found in crystalline hydrates. However, this picture is being progressively challenged by recent studies in solution. Neutron diffraction studies of aqueous solutions imply that the layer of water around apolar groups is much less well organized and much more dynamic than the clathrate model would suggest [52]. Total internal reflection vibrational sum frequency spectroscopy (an infrared technique that selectively probes molecules at boundaries) has revealed that at an apolar surface, hydrogen bonding between adjacent water molecules is weakened relative to those in bulk water. It also shows that a substantial number of hydrogen bonds are lost (i.e., the H points into the surface) and confirms that this anisotropy of the environment results in substantial orientation of the waters at the interface [53]. This general weakening of hydrogen bonding at apolar surfaces is almost diametrically opposed to the clathrate picture of a cage of water surrounding hydrophobic groups. However, it makes sense of the observed, yet otherwise anomalous, increase in heat capacity of protein systems upon unfolding, i.e., as more apolar groups are exposed to solvent, more hydrogen bonds are weakened, the vibrational energy level spacing decreases, and the heat capacity increases [54]. Surface waters are displaced on formation of the protein-ligand complex and thus provide a favorable entropic contribution to the free energy of complex formation. In particular, displacement of the water found interacting on apolar surfaces makes a large contribution to the G and provides the driving force for many interactions (the hydrophobic effect). 3.2 Buried Water Molecules
High-resolution crystal structures have revealed the presence of water molecules in many cavities within proteins [55] and in the interfaces between proteins and ligands [56, 57]. Such water molecules usually make at least one strong hydrogen bond and in many cases fulfill their total hydrogen-bonding capacity. In cases were the water hydrogen bond capacity is not fulfilled by strongly interacting partners, waters are systematically found to take part in weaker interactions with Cα -H groups [58]. The functional role of buried water molecules is often difficult to ascertain, but in some cases their positioning is highly specific and is crucial to binding or function (this is seen particularly in catalytic active sites). On the other hand, water
15
16 Hydrogen Bonds in Protein-Ligand Complexes
molecules can act as a rather general mechanism to extend protein structure and/or to increase the promiscuity of the binding site. They are able to help accommodate polar hydrophilic side chain groups in what is often a largely hydrophobic interface. They can overcome steric problems of pairing of donor and acceptor groups at the interface. Networks of these waters can provide flexibility in recognition, and the additional hydrogen bonds formed may contribute toward the stability of the overall interaction. Since the buried water molecules usually form extensive interactions with the relatively rigid protein, they have greater difficulties in rearranging their hydrogen bonding than do surface water molecules. Consequently, they are usually kinetically trapped in the interface and have many orders of magnitude lower mobility than at the surface. There has been a widely held view that the entropic cost associated with this entrapment means that such buried water molecules are usually unfavorable to the overall free energy of binding and are found merely as a byproduct of the nonoptimal shape complementarity of ligand and protein. As a result, currently available drug-design programs work on dehydrated binding surfaces, assuming that liberating any water molecules that may have been visualized in the structure of the target and substrate (or cognate ligand) will provide a favorable contribution to G. This general approach has proven successful in several cases. Perhaps the best described of these is that of inhibitors to the HIV protease [59]. However, recent observations have suggested that this generally accepted treatment of interfacial water molecules is not always appropriate. For example, the binding of tripeptides with the general sequence of LysXxxLys to the protein OppA results in the entrapment of different numbers of water molecules in an isolated cavity depending on the side chain on the amino acid represented by Xxx [60, 61]. For example, if Xxx = Trp, the high-resolution X-ray structure reveals three water molecules that form a network hydrogen bonded into the cavity. Substituting Trp with Ala results in seven immobilized water molecules being found in the cavity. Although this would widely be expected to result in a less favorable G, the LysAlaLys peptide is in fact seen to bind significantly more tightly [60]. This has been rationalized by considering that although an entropic penalty is paid for the inclusion of additional water molecules, they are able to make optimal hydrogenbonding arrangements in the interface, producing a sufficiently favorable H as to overcome this. This view is supported by inspection of the crystal structures for OppA-LysXxxLys complexes, which have been solved for the cases where Xxx is any one of the 20 naturally occurring and for a few non-natural amino acids [61, 62]. In these structures, a subset of water molecules is seen to adopt the same position no matter which residue is in the Xxx position. Furthermore, on assessing the potential energy surface of the binding cavity, it appears that the water molecules adopt the positions of lowest enthalpy. These data and other recent studies (e.g., the extensive water-mediated hydrogen bonding in SH2 domain recognition of its specific peptide [57, 63], shown in Fig. 3) show that water molecules have to be incorporated into any rigorous ligand-design program. Effective ways of doing this have yet to be achieved and form a new frontier in the drug-development process.
4 Hydrogen Bonds in Drug Design 17
Fig. 3 The extensive water-mediated hydrogen bond network that mediates recognition of the specific cognate peptide pYEEI for vsrc SH2. The illustrated bonds are thought to be particularly favorable based on mod-
eling studies, and efforts to remove these waters by appropriately placed hydrogenbonding groups in peptidomimetic inhibitors persistently reduce binding affinity [57, 63].
4 Hydrogen Bonds in Drug Design 4.1 Diverse Effects of Hydrogen Bonding on Drug Properties
As we have seen above, single hydrogen bonds between ligand and protein or water have a substantial effect on binding affinity (5.7 kJ mol−1 is a factor of 10 and 11.4 kJ mol−1 is a factor of 100). However, good binding affinity is only one aspect of successful drug design. The rather specific geometric requirements of hydrogen bonding are able to create specificity of binding as well, which is important in avoiding binding to proteins other than the target. Hydrogen bonding also strongly influences the transport, adsorption, distribution, metabolism, and excretion properties of molecules. Lipinski’s analysis of drugs and development candidates [64] suggests that there is a finite limit to the number and nature of noncovalent interactions that a drug is expected to make with its environment. In particular two of the “rules” state that drugs should contain no more than 5 hydrogen bond donors and 10 acceptors. These rules arise because of the need to balance absorption and distribution properties with binding specificity within a
18 Hydrogen Bonds in Protein-Ligand Complexes
relatively small drug molecule. However, it is not clear why there is a numerical difference between donors and acceptors. The rules are based only on counting strong donors and acceptors (O, N, NH, OH) and make no modifications for variation in hydrogen bond strength due to activating neighbors or for the presence of other types of hydrogen-bonding groups. It seems, from the discussion that has gone on before concerning the variability of hydrogen-bonding groups, that the “rules” are not sufficiently subtle. A more quantitative measure of expected lipophilicities and permeabilities than Lipinski’s rules should probably be used in design strategies. Raevsky [65] has carried out an analysis of the donor and acceptor strength of many thousands of molecules, leading to a simple scoring system that describes these strengths. It was demonstrated that relatively accurate predictions of lipophilicity could be made from any structure on the basis of this scoring system. Similarly, in addition to steric bulk effects, both the H-bond donor and acceptor strength play an important role in explaining differences in permeability and absorption of neutral chemical compounds and drugs [66]. 4.2 Optimizing Inhibitor Affinity
If we consider only the process of improving the affinity of existing ligands or developing a high-affinity ligand de novo, then the incorporation of maximal numbers of hydrogen bonds appears likely to convey an advantage. Of course, a particular binding site will offer only a limited number of hydrogen-bonding opportunities based on the number, type, and disposition of the amino acids forming the site. A structure-based drug-design approach will take into account the structure of the complex of the target protein and a lead compound (often the enzyme substrate). Based on this information, the potential to incorporate additional or alternative groups on the ligand to enhance hydrogen bonding can be assessed. There are numerous programs that may facilitate this process, ranging from simple scoring functions to molecular dynamics simulations (Section 4.3). There are a plethora of examples in the literature whereby the involvement of hydrogen bonds has been modified to enhance ligand affinity. It is not the purpose of this article to provide a comprehensive review of these; instead, we select a few instructive examples (suitable for further reading) where hydrogen bonding has been demonstrated to have significant effect on the structure-based design process. In an early classic structural and thermodynamic study, the high-resolution crystal structures of a pair of thermolysin inhibitors revealed that they bound identically except for the appearance of one hydrogen bond that made a specific interaction [67, 68]. The comparison was specifically designed to produce an intrinsic hydrogenbonding energy. The hydrogen bond donor group of the tighter-binding inhibitor was replaced by an acceptor in order to maintain its hydrogen-bonding capability with water in the unbound state. The difference in affinity between the two interactions suggested that this hydrogen bond alone contributed a total of −16.8 kJ mol−1 to the free energy of binding, i.e., a factor of 840.
4 Hydrogen Bonds in Drug Design 19
Significant effort has been directed at the design of drugs to inhibit specific proteins involved in intracellular signal transduction pathways by targeting SH2 domains of relevant proteins. A number of studies report a rational approach to developing tyrosyl phosphate-peptidomimetic ligands that have properties suitable for drugs. One of the greatest challenges in this venture has been to design a suitable replacement for the phosphotyrosine (pY) residue. The phosphotyrosine contributes approximately 60% of the free energy of binding of peptide-based ligands to SH2 domain interactions [69, 70]; therefore, failure to adequately replace this group substantially compromises affinity. The phosphotyrosine moiety is involved in a high concentration of hydrogen bonds (see Fig. 2). Substitution of the pY by the non-hydrolyzable phosphomethyl phenylalanine (i.e., >C O PO3 H2 for >C CH2 PO3 H2 ) showed good resistance to phosphatase activity, but the loss of at least two hydrogen bonds to the phosphate oxygen compromised binding [71]. The best substitute for the phosphate group appeared to be via a sulfur group, OPSO2 H, the sulfur group presumably able to sustain the hydrogen bonds of the substituted oxygen [63, 72]. Calorimetric studies on both the free energy and the enthalpy of binding can be informative as to the source of binding energy because of the distinct thermodynamic characteristics of different processes, i.e., desolvation, water entrapment, or direct hydrogen bonding [73]. Work based on the interaction of FKBP and the immunosuppressive drug FK506 [74] suggested that a hydroxylcarbonyl hydrogen bond itself was enthalpically unfavorable even though the overall free energy for the bond formation is favorable because of a favorable S term for dehydration of the hydroxyl group. Structure-activity relationships for series of tricyclic inhibitors to farnesyl protein transferase revealed an interesting correlation between the en-thalpic contribution to binding and the increase of nonpolar surface resulting from addition of halogen atoms to the compounds. Nonetheless, the majority of the dominant enthalpy term appeared to be derived from hydrogen bonding, which incorporated a crucial water-mediated interaction [75]. An exhaustive study on novel serine protease inhibitors revealed the role of a multi-centered short hydrogen-bonding network in ligand recognition [76]. The appearance of eight inhibitor-enzyme or enzyme-water-inhibitor hydrogen bonds at the active site is a common feature of serine protease inhibitor binding observed in a large number of crystal structures of trypsin, thrombin, and urokinase-type plasminogen activator complexes. The short hydrogen-bonding networks were estimated to contribute approximately 7.1 kJ mol−1 to the free energy of binding. This seems rather small but is a differential effect, as many bonds to water of similar strength exist in the apo-enzymes. This work also emphasized the importance of the pK a values of groups in the binding site and the potential effects that these can have on hydrogen-bonding capability. The interaction of some of these inhibitors via hydrogen-bonding networks that incorporate water molecules also begs for the inclusion of interfacial water molecules into the drug-design process (as discussed in Section 3.2), where they can improve binding and specificity properties [56, 57].
20 Hydrogen Bonds in Protein-Ligand Complexes
4.3 Computational Tools for Hydrogen Bond Analysis and Design
There are several computational tasks that occur frequently in structure-based drug design, including high-throughput, structure-based searching for lead compounds that complement a given binding site; suggesting modifications to a known ligand in order to improve affinity or specificity; and post hoc rationalization of trends in experimentally well-characterized protein-ligand complexes. In each of these tasks, proper account needs to be taken of the effects of changes to hydrogen bonding as the ligand binds. Each of the three tasks requires similar structural and energetic computations, although the first is often carried out with a lower level of detail for reasons of speed. If we first consider the latter tasks, where the structure of a protein-ligand complex is already known, it is usually necessary to build the hydrogen atoms into the structure, determine which hydrogen bond donor and acceptor sites are present, bearing in mind possible bifurcation and steric accessibility issues, and then correctly score the contributions of hydrogen bonding and all other aspects of the structure to the thermodynamics of binding. It should be emphasized that no single accurate automated procedure yet exists to carry out all these computations, but many tools are available that can assist visualization and thinking. In building hydrogens into a structure, although it is fair to assume that all strong acceptors and donors are satisfied to a first approximation, it is often difficult to decide between alternative pairings and geometries. Glick and Goldblum [77] have described a strategy of ordered hydrogen placement that begins by adding non-rotatable hydrogens such as those of the peptide backbone according to known covalent geometry. Then water protons, polar side chain protons, and the C- and N-termini of a protein are added in such a manner as to maximize the number and strength of hydrogen bonds. Since there are many possible combinations, a sophisticated mixed stochastic/hierarchical search is employed to find the optimal configuration. The program was benchmarked successfully against several neutron structures of proteins. An alternative approach is to build all hydrogen according to known covalent geometry and then subject the system to a short period of molecular dynamics simulation, which allows the system to find a favorable low-energy configuration for the hydrogen bonding. This can be accomplished in most commercial biomolecular modeling packages (Sybyl, Insight, etc.), but note that molecular dynamics is essential in order to search thoroughly for alternatives to the initially built structure; energy minimization alone does not have sufficient flexibility. Molecular dynamics also has the potential advantage of allowing a flexible response of the protein in response to modifications of the ligand. Both methods require prior knowledge from NMR data or from the structure itself [12] or computation [78] of the pK a of all groups, which profoundly affect hydrogen-bonding patterns and consequent binding affinity [79]. When relying on computation of pK a values – since hydrogen bonding can significantly affect the pK a of His, Glu, and Asp – it may be necessary to iteratively cycle the pK a and hydrogen-bond calculations until a steady state is reached.
4 Hydrogen Bonds in Drug Design 21
The stochastic search and molecular dynamics approaches are both limited by how well hydrogen bonding is described within them. Generally, lone pair geometry and the interactions of weak acceptors are poorly represented in computational models of protein-ligand interactions. Accurate models of the local energetics of strong donor-weak acceptor pairs would be particularly useful, as the very existence of the bond depends on a fine balance of energetic considerations. Furthermore, it can be extremely difficult to decide on the basis of a medium-resolution X-ray structure alone whether or not such a bond is present or significant in binding. An alternative or additional procedure to attempting to model the actual position of every hydrogen bond is to create a 3-D contour map of the probability of finding a donor or acceptor at a particular position in the binding site. Taking advantage of the ready availability of 3-D modeling/visualization software, binding site probability maps are created by superimposing the individual probability maps for known donor and acceptor groups (which have been derived from structural databases, Section 2.2) on each donor and acceptor in the binding site. The 3-D site mapping idea has been incorporated into several programs: XSITE (based on data from the PDB and therefore actual protein-ligand complexes [80]) and ISOSTAR/SUPERSTAR (based on the CSD data and therefore on broader chemistry [81, 82]). These templates can be used to predict the potential positions at which a ligand could interact via a hydrogen bond and therefore could be used in conjunction with molecular similarity studies, pharmacophore query searching of databases, or de novo design algorithms [13]. An interesting variation on this approach is to create a hydrogen-bond probability map for a given ligand or set of ligands that contains all the feasible positions at which a complementary protein atom could be found. This can be useful in creating a model of a receptor where the structure of the receptor is not known from experiment [83]. The success of structure-based lead development is hampered by the inability to accurately relate changes in structure to the detailed energetics of binding. Ab in-itio quantum theory is quite successful at modeling hydrogen bond geometry and energetics, but it is too computationally demanding for routine use in ligand design. Consequently, simplified force fields in which the energy is expressed as a sum of Lennard-Jones and electrostatics interaction are widely used to model macromolecular systems. None of the standard force fields accurately model polarizability. Instead, parameters are averaged over a number of the configurations considered by quantum mechanics and are modified to account for an average polarizable environment. In the case of homogenous liquids such as water, this has been a successful strategy for the reproduction in calculation of the bulk properties of the liquid. However, the success of this temporal and spatial averaging strategy is unlikely to be extendable to discrete binding sites for water or other ligands that possess properties distinct from the bulk phase. Another defect of common force fields is that they place partial charges at atom positions and consequently model only the dipole character of local interactions. As we have seen, the preference for hydrogen bonding to lone pairs means that the dipole model does not have appropriate geometry. It has been found that more sophisticated electrostatic models (e.g., using distributed multi-poles) can produce correct orientation
22 Hydrogen Bonds in Protein-Ligand Complexes
information [84], but even then polarizability is not correctly included. Although such molecular-simulation-based rationalization of inhibitor binding energies can provide some insight and is a very active area of research (e.g., [85, 86]), it seems that it will be some time before sufficiently accurate models of hydrogen bonding are available to allow accurate physics-based predictions of binding free energy. High-throughput scoring of ligands is possible with lower levels of structural detail and greater margins of error in calculated binding affinity. Such scoring schemes partition the interaction energy of a protein-ligand system in a simple way, e.g., counting strong and very strong hydrogen bonds, contacts between, or surface area of, buried apolar groups and ignore the explicit water molecules. They can be quite successful in ranking compounds’ binding affinities (e.g., [87]) and consequently useful in the drug-design process. However, they are subject to a substantial margin of error and are usually very poor at identifying the thermodynamic (enthalpy or entropy) and structural source of changes in binding affinity [88]. We believe that improvements can be made by more detailed consideration of the strength of hydrogen bonds [37, 65] and the role of buried waters [56] and by parameterizing the scores against enthalpy and entropy data in addition to binding affinities.
5 Conclusion
Hydrogen bonds are crucial to the recognition of ligands by proteins. We have learned much structurally and energetically about proteins’ hydrogen-bonding capacity over the past 20 years, and this is beginning to make an effective contribution to drug-design strategies. However, hydrogen bonding remains a very active area of research, with new insights promised by the determination of more proteinligand structures of better quality, by emerging spectroscopic techniques, and the possibility of building a greater experimental databank of thermodynamic knowledge through advances in microcalorimetry. The integration of this knowledge into theoretical and computational molecular models will be an exciting and rewarding challenge in the coming decade.
References 1 G. C. PIMENTEL, A. L. MCCLELAN, The hydrogen bond, Freeman, San Fransisco 1960. 2 L. PAULING, The nature of the chemical bond and the structure of molecules and crystals, Cornell University Press, Ithaca, NY 1939. 3 K. MOROKUMA, Acc. Chem. Res., 1977, 10, 294–300.
4 D. A. SMITH, A brief history of the hydrogen bond, in: D. A. SMITH (Ed.) Modeling the hydrogen bond, American Chemical Society, Washington DC 1994, 1–5. 5 E. D. ISAACS, A. SHULKA, P. M. PLATZMAN, D. R. HAMANN, B. BARBIELLINI, C. A. TULK, Phys. Rev. Lett., 1999, 82, 600–603.
References 6 N. NIIMURA, Curr. Opin. Struct. Biol., 1999 9, 602–608. 7 F. SHU, V. RAMAKRISHNAN, B. P. SCHOENBORN, Proc. Nat. Acad. Sci. USA, 2000, 97, 3872–3877. 8 R. TAYLOR, O. KENNARD, W. VERSICHEL, J. Am. Chem. Soc., 1983, 105, 5761–5766. 9 G. A. JEFFERY, W. SAENGER, Hydrogen bonding in biological structures, Springer, Berlin 1991. 10 J. P. LOMMERSE, R. TAYLOR, Journal of Enzyme Inhibition, 1997, 11, 223–243. 11 E. N. BAKER, R. E. HUBBARD, Prog. Biophys. Molec. Biol., 1984, 44, 97–197. 12 I. K. MCDONALD, J. M. THORNTON, J. Mol. Biol., 1994 238, 777–793. 13 J. E. J. MILLS, P. M. DEAN, J. Computer-Aided Molecular Design, 1996, 10, 607–622. 14 J. B. O. MITCHELL, S. L. PRICE, J. Comp. Chem., 1990, 11, 1217–1233. 15 N. THANKI, J. M. THORNTON, J. M. GOODFELLOW, J. Mol. Biol., 1988, 202, 637–657. 16 J. A. IPPOLITO, R. S. ALEXANDER, D.W. CHRISTIANSON. J. Mol. Biol., 1990, 215, 457–471. 17 R. PREISSNER, U. EGNER, W. SAENGER, Febs Lett., 1991, 288, 192–196. 18 G. R. DESIRAJU, T. STEINER T., The Weak Hydrogen Bond-In Structural Chemistry and Biology, Oxford Science Publications, Oxford 1999. 19 M. F. PERUTZ, G. FERMI, D. J. ABRAHAM, C. POYART, E. BURSAUX, J. Am. Chem. Soc., 1986, 108, 1064–1078. 20 G. WAKSMAN, D. KOMINOS, S. C. ROBERTSON, N. PANT, D. BALTIMORE, R. B. BIRGE, D. COWBURN, H. HANAFUSA, B. J. MAYER, M. OVERDUIN, M. D. RESH, C. B. RIOS, L. SILVERMAN, J. KURIYAN, Science, 1992, 358, 646–653. 21 M. F. PERUTZ, Philos. Trans. Royal Soc. London A, 1993 345, 105–112. 22 J. B. O. MITCHELL, C. L. NANDI, I. K. MCDONALD, J. M. THORNTON, S. L. PRICE, J. Mol. Biol., 1994, 239, 315–331. 23 K. FLANAGAN, J. WALSHAW, S. L. PRICE, J. M. GOODFELLOW. Protein Eng., 1995, 8, 109–116. 24 T. STEINER, G. KOELLNER, J. Mol. Biol., 2001, 305, 535–557.
25 T. STEINER, J. Chem. Soc.-Perkin Transactions, 1995, 2, 1315–1319. 26 S. SCHEINER; T. KAR; Y. L. GU, J. Biol. Chem., 2001 276, 9832–9837. 27 G. A. JEFFERY, An introduction to hydrogen bonding, Oxford University Press, Oxford 1997. 28 S. N. VINOGRADOV, R. H. LINNELL, Hydrogen bonding, Van Nostrand Reinhold, New York 1971. 29 V. W. COULING, P. FISCHER, D. KLENERMAN, W. HUBER, Biophys. J., 1998, 75, 1097–1106. 30 L. D. BARRON, L. HECHT, E.W. BLANCH, A. F. BELL, Prog. Biophys. Mol. Biol., 2000, 73, 1–49. 31 G. M. CLORE, A. M. GRONENBORN, Curr. Opin. Chem. Biol., 1998, 2, 564–570. 32 P. M. NIETO, B. BIRDSALL, W. D. MORGAN, T. A. FRENKIEL, A. R. GARGARO, J. FEENEY, FEBS Lett., 1997, 405, 16–20. 33 J. FEENEY, Angew. Chem. Int. Ed. Engl., 2000, 39, 290–312. 34 T. K. HARRIS, Q. ZHAO, A. S. MILDVAN, J. Mol. Struct., 2000, 552, 97–109. 35 A. J. DINGLEY, F. CORDIER, S. GRZESIEK, Concepts Magn. Res., 2000, 13, 103–127. 36 F. CORDIER, C. Y. WANG, S. GRZESIEK, L. K. NICHOLSON, J. Mol. Biol., 2000, 304, 497–505. 37 E. GANCIA, J. G. MONTANA, D. T. MANALLACK, J. Molec. Graph. Modelling, 2001, 19, 349–362. 38 H. ADAMS, K. D. M. HARRIS, G. A. HEMBURY, C. A. HUNTER, D. LIVINGSTONE, J. F. MCCABE, Chem. Comm., 1996, 22, 2531–2532. 39 A. R. FERSHT, J.-P. SHI, J. KNILL-JONES, D. M. LOWE, A. J. WILKINSON, D. M. BLOW, P. BRICK, P. CARTER, M. M. Y. WAYE, G. WINTER, Nature, 1985 314, 235–238. 40 S. J. TEAGUE, A. M. DAVIES, Angew. Chem. Int. Ed. Engl., 1999, 38, 736–749. 41 J. A. SCHELLMAN, C. R. Trav. Carlsberg Ser. Chim., 1955, 29, 223–229. 42 I. M. KLOTZ, J. S. FRANZEN, J. Am. Chem. Soc., 1962, 84, 3461–3466. 43 H. SUSI, S. N. TIMASHEFF, J. S. ARD, J. Biol. Chem., 1964, 239, 3051–3054. 44 S. J. GILL, L. NOLL, J. Phys. Chem., 1972, 76, 3065–3068. 45 J. M. SCHOLTZ, S. MARQUSEE, R. L. BALDWIN, E. J. YORK, J. M. STEWART, M.
23
24 Hydrogen Bonds in Protein-Ligand Complexes
46 47 48 49 50 51
52 53 54 55
56 57
58 59
60
61
62 63 64
SANTORO, D. W. BOLEN, Proc. Natl. Acad. Sci. USA., 1991 88, 2854–2858. B. A. SHIRLEY, P. STANSSENS, U. HAHN, C. N. PACE, Biochemistry, 1992, 31, 725–732. K. A. SHARP, A. NICHOLLS, R. FRIEDMAN, B. HONIG, Biochemistry, 1991, 30, 9686–9697. A. S. YANG, K. SHARP, B. HONIG, J. Mol. Biol., 1992, 227, 889–900. G. I. MAKHATADZE, P. L. PRIVALOV, J. Mol. Biol., 1993, 232, 639–659. S. M. HABERMANN, K. P. MURPHY, Protein Sci., 1996, 5, 1229–1239. N. THANKI, J. M. THORNTON, J. M. GOODFELLOW, Protein Eng., 1990, 3, 495–508. J. L. FINNEY, A. K. SOPER, Chem. Soc. Reviews, 1994, 23, 1–10. L. F. SCATENA, M. G. BROWN, G. L. RICHMOND, Science, 2001, 292, 908–912. J. M. STURTEVANT, Proc. Natl. Acad. Sci. USA, 1977, 74, 2236–2240. M. A. WILLIAMS, J. M. GOODFELLOW, J. M. THORNTON, Protein Sci., 1994, 3, 1224–1235. J. E. LADBURY, Chem. & Biol., 1996, 3, 973–980. D. A. RENZONI, M. J. J. M. ZVELEBIL, T. LUNDBA¨ CK, J. E. LADBURY. In: J. E. LADBURY, & P. R. CONNELLY (Eds.) Structure-based drug design: Strategy, thermodynamics and modelling, Springer, Berlin 1997, 161–180. T. STEINER, Acta Cryst D, 1995, 51, 93–97. P. Y. S. LAM, P. K. JADHAV, C. J. EYERMANN, C. N. HODGE, Y. RU, L. T. BACHELER, J. L. MEEK, M. J. OTTO, M. M. RAYNER, Y. N. WONG, C.–H. CHANG, P. C. WEBER, D. A. JACKSON, T. R. SHARPE, S. ERICKSON-VIITANEN, Science, 1994, 263, 380–384. J. R. H. TAME, S. H. SLEIGH, A. J. WILKINSON, J. E. LADBURY, Nature Struct Biol., 1996, 3, 998–1001. S. H. SLEIGH, P. R. SEAVERS, A. J. WILKINSON, J. E. LADBURY, J. R. H. TAME, J. Mol. Biol., 1999, 291, 393–415. T. G. DAVIES, R. E. HUBBARD, J. R. H. TAME, Protein Sci., 1999, 8, 1–13. D. A. HENRIQUES, J. E. LADBURY, Arch. Biochem. Biophys., 2001, 390, 158–168. C. A. LIPINSKI, F. LOMBARDO, B. W. DOMINY, P. J. FEENEY, Adv. Drug Delivery Rev., 1997, 23, 3–25.
65 O. A. RAEVSKY, J. Phys. Organic. Chem., 1997, 10, 405–413. 66 O. A. RAEVSKY, K. J. SCHAPER, K.J. EUR. J. Med. Chem., 1998, 33, 799–807. 67 D. E. TRONRUD, H. M. HOLDEN, B. W. MATTHEWS, Science, 1987, 235, 571–574. 68 P. A. BARTLETT, C. MARLOWE, Science, 1987, 235, 569–571. 69 J. M. BRADSHAW, V. MITAXOV, G. WAKSMAN, J. Mol. Biol., 1999, 293, 971–985. 70 D. A. HENRIQUES, J. E. LADBURY, R. M. JACKSON, Protein Sci., 2000, 9, 1975–1985. 71 T. R. BURKE Jr., M.S. SMYTH, A. OTAKA, M. NOMIZU, P. P. ROLLER, G. WOLF, R. CASE, S. E. SHOELSON, Biochemistry, 1994, 33, 6490–6494 72 T. GILMER, M. RODRIGUEZ, S. JORDAN, R. CROSBY, K. ALLIGOOD, M. GREEN, M. KIMERY, C. WAGNER, D. KINDER, P. CHARIFSON, A. M. HASSELL, D. WILLARD, M. LUTHER, D. RUSNAK, D. D. STERNBACH, M. MEHROTRA, M. PEEL, L. SHAMPINE, R. DAVIS, J. ROBBINS, I. R. PATELL, D. KASSEL, W. BURKHART, M. MOYER, T. BRADSHAW, J. BERMAN, J. Biol. Chem., 1994, 269, 31711–31719 73 G. KLEBE, H.J. B¨OHM, J. Receptor Signal Transduction Res., 1997, 17, 459–473. 74 P. R. CONNELLY, R. A. ADALPE, F. J. BRUZZESE, S. P. CHAMBERS, M. J. FITZGIBBON, M. A. FLEMING, S. ITOH, D. J. LIVINGSTON, M. A. NAVIA, J. A. THOMSON, K. P. WILSON, Proc. Natl. Acad. Sci. USA., 1994, 91, 1964–1968. 75 C. L. STRICKLAND, P. C. WEBER, W. T. WINDSOR, Z. WU, H. V. LE, M. M. ALBANESE, C. S. ALVAREZ, D. CESARZ, J. DEL ROSARIO, J. DESKUS, A. K. MALLAMS, F. G. NJOROGE, J. J. PIWINSKI, S. REMISZEWSKI, R. R. ROSSMAN, A. G. TAVERAS, B. VIBULBHAN, R. J. DOLL, V. M. GIRIJAVALLABHAN, A.K. GANGULY. J. Med. Chem., 1999, 42, 2125–2135. 76 B. A. KATZ, K. ELROD, C. LUONG, M. J. RICE, R. L. MACKMAN, P. A. SPRENGELER, J. SPENCER, J. HATAYE, J. JANC, J. LINK, J. LITVAK, R. RAI, K. RICE, S. SIDERIS, E. VERNER, W. YOUNG, J. Mol. Biol., 2001, 307, 1451–1486.
References 77 M. GLICK, A. GOLDBLUM, Proteins: Struct. Funct. Gen., 2000, 38, 273–287. 78 D. BASHFORD, M. KARPLUS, J. Phys. Chem., 1991, 95, 9556–9561. 79 F. DULLWEBER, M. T. STUBBS, D. MUSIL, J. STURZEBECHER, G. KLEBE, J. Mol. Biol., 2001 313, 593–614. 80 R. A. LASKOWSKI, J. M. THORNTON, C. HUMBLET, J. SINGH, J. Mol. Biol., 1996, 259, 175–201. 81 I. J. BRUNO, J. C. COLE, J. P. M. LOMMERSE, R.S. ROWLAND; R. TAYLOR; M. L. VERDONK, J. Computer-Aided Molec. Design, 1997, 11, 525–537. 82 M. B¨OHM, G.J. KLEBE, J. Med. Chem., 2002 45, 1585–1597.
83 J. E. MILLS, T. D. J. PERKINS, P. M. DEAN, J Computer-Aided Molec. Design, 1997, 11, 229–242. 84 J. B. O. MITCHELL, C. L. NANDI, J. M. THORNTON, S. L. PRICE, J. SINGH, M. SNAREY, J. Chem. Soc. Faraday Trans., 1993, 89, 2619–2630. 85 P. KALRA, T. V. REDD, B. JAYARAM J. Med. Chem., 2001, 44, 4325–4338. 86 R. C. RIZZO, J. TIRADO-RIVES, W. L. JORGENSEN, J. Med. Chem., 2001, 44, 145–154. 87 H.J. B¨OHM, J. Computer-Aided Molec. Design, 1998, 12, 309–323. 88 J. R. H. TAME, J. Computer-Aided Molec. Design, 1999, 13, 99–108.
25
1
Enzyme Inhibitor Design D. W. Banner F. Hoffmann-La Roche Ltd, Basel, Switzerland
Originally published in: Protein-Ligand Interactions.Edited by Hans-Joachim B¨ohm, Gisbert Schneider. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30521-6
1 Introduction
The field of enzyme inhibition is one of the most fruitful sources of experimental information on the interaction of small chemical ligands with proteins. It is well known that the majority of pharmaceutical companies have a range of drugdevelopment projects where the active principle is an enzyme inhibitor. The reason for this is clear: many enzymes are well-characterized, soluble, stable proteins with an established assay suitable for either high-throughput screening or precise measurement of inhibition constants. Perhaps the most widely studied enzyme is thrombin. In this article, we will use the example of active site inhibition of thrombin to illustrate a range of principles of enzyme-inhibitor design. It will be left to the reader to perceive when the terms “enzyme” and “inhibitor” may be generalized to “receptor” and “ligand.” The modern drug-design and -development process is extremely complex. Here we will concentrate only on the molecular recognition aspects. Comprehensive surveys of the thrombin inhibitor patent literature have been made by Wiley and Fisher (pre-1997) [1] and Coburn (1997–2000) [2], and the clinical use of direct thrombin inhibitors has been extensively reviewed [3–11]. Thrombin residue numbering follows throughout the chymotryp-sinogen convention. The first principle of enzyme inhibitor design is “Use all the available information”. This information can be biological, functional, structural, chemical, or theoretical. There is such an immense amount of biological information on thrombin that it cannot be surveyed here: we focus on thrombin as a serine protease of the trypsin family and take fibrinogen to be its primary substrate. A convenient way to look at the information available is from the more general to the very specific. For
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Enzyme Inhibitor Design
Fig. 1 Thrombin active site regions as defined by the binding of fibrinopeptide of sequence D-F-L-A-E-G-G-G-V-R (from [1bbr])
thrombin, we may take four levels: the general catalytic mechanism; the particular substrate types processed; the structure of the protein; and, often forgotten, the flexibility of the protein required to achieve this function. Enzymes are biological catalysts: the active site exists to correctly position a substrate molecule so that functional groups on the enzyme may perform “chemistry” on it. All trypsin-like enzymes have a “catalytic triad” of aspartic acid 102, histidine 57, and serine 195 in which the serine Oγ is activated so that it may attack a suitably positioned substrate carbonyl carbon atom to form an acyl intermediate which is, in turn, attacked by water to release products. Thrombin substrates are normally peptidic, with nucleophilic attack of the serine Oγ being on the carbonyl carbon of an amide bond. The tetrahedral intermediate so formed is stabilized by two enzyme backbone hydrogen bonds from the -NHs of Gly193 and Ser195, which form the “oxyanion hole.” This catalytic mechanism is positioned next to a “recognition pocket” that has an aspartic acid (Asp189) at the bottom and is highly arginine specific (Fig. 1). Substrates normally have to be positioned quite accurately in the active site of an enzyme for catalysis to proceed quickly. In the case of thrombin, this is further achieved via an anti-parallel beta interaction between substrate peptide and enzyme residues 214–216. There is perhaps more three-dimensional structural information available on thrombin than any other enzyme. We will study selected examples of the use of such information, indicating the relevant Protein Data Bank (http://www.rcsb.org/pdb/) entries [1abc] for those who wish to visualize the structures in 3-D. Enzymes are intrinsically mobile. Catalysis proceeds stepwise, first by formation of a complex between enzyme and substrate(s), next by passage through one or more transition states, and finally by release of product(s). There is therefore a range of low-energy structures that may be regarded as possible targets for inhibition. These, when known, should be considered in the inhibitor design process. In
2 The Active Site 3
the case of thrombin, no large structural changes take place, but in other systems – particularly where multiple substrates, products, and cofactors are involved – it may be necessary to document and analyze a significant number of very different structural states. This may be done by determining X-ray structures of complexes with different combinations of functional and non-functional analogues of substrates, products, and cofactors. In particular, “transition state analogues” are of great value and should be synthesized whenever possible.
2 The Active Site
It has proved extremely useful for thrombin and many other enzymes to provide a standard nomenclature to describe the active site. The notation of Shechter and Berger is widely used for enzymes whose substrates are polymers: the positions of the polymer are named -P4-P3-P2-P1/P1 -P2 -P3 -P4 -, where/is the cleavage site, and the sequence for polypeptides runs from the N-to the C-terminus [12]. The corresponding pockets on the protein that are responsible for the recognition of these polymer elements are called “sub-sites” and are labeled . . . S2, S1, S1 , S2 . . . For thrombin, with fibrinogen as defining substrate, this is inconvenient. Inspection of complexes of thrombin with fibrinopeptide analogues [1bbr, 1dm4, 1fph, 1ucy, and 1ycp] shows the fibrinopeptide, the N-terminal fibrin cleavage product, to have a folded structure as illustrated in Fig. 1. (It is more convenient to use the notation of Fig. 2 [13]).
Fig. 2 Thrombin active site regions as defined by the binding of DPhe-Pro-Arg analogues, e.g., PPACK. The recognition pocket (S1) is clear. The proximal (P) hydrophobic pocket binds the proline side chain and thus corresponds to S2. The distal (D) hydrophobic pocket binds the D-Phe side chain. For PPACK 1 R1 CO.CH2 Cl.
4 Enzyme Inhibitor Design
3 The Heuristic Approach
It could be argued that in the ideal case and given the power of modern computational methods, one single X-ray crystal structure of thrombin should suffice to design thrombin inhibitors with the desired properties using “virtual screening” techniques. Certainly, examples of success using this approach are known and are given below (Section 5). The majority of thrombin inhibitors reported to date, however, were produced by classical medicinal chemistry, either alone or in combination with X-ray structural information. To facilitate the dialogue among the chemist, the modeler, and the crystallographer, it has proved most useful not only to define terms, as in Fig. 2, but also to develop “design rules.” Examples might be: “A terminal basic group is required to fit into the S1 pocket,” “Two hydrophobic groups are required to fill the P-and D-pockets,” or “A hydrogen-bond acceptor has to be positioned over the -NH of Gly 216.” Such rules may give valuable direction to design, particularly if they capture some aspect of the active site that is not particularly obvious but that is indicated by experiment. There is, however, a clear problem with the approach, namely, that it may not be necessary or desirable to obey all the rules at once. Thrombin, for example, has two large hydrophobic pockets (D and P) as well as the S1 pocket. It is thus relatively easy to generate molecules that bind tightly to the active site, that is, with inhibition constants in the low nanomolar range. The practical issue here (at least in the pharmaceutical industry) is not how to obtain better inhibition but rather how to produce compounds with optimal biological properties. There are many examples of thrombin inhibitors where, for instance, a less basic P1 group has been introduced in the attempt to improve oral availability, resulting in a “non-optimal” interaction with the S1 pocket. A further handicap associated with any rule-based approach to inhibitor design is that it tends strongly to lead to just one class of very similar molecules: the process often converges to a single (local) minimum. The best approach is “Try to extract helpful rules from the available data–but be prepared to break them!”
4 Mechanism-based Covalent Inhibitors
The penicillins are one of the most successful classes of drug. They are mechanismbased inhibitors of beta lactamases and penicillin-binding proteins (PBPs) involved in bacterial cell-wall synthesis. In brief, these enzyme inhibitors contain a lactam ring that opens on acylation of the active site serine; ring opening is followed by structural rearrangement of the inhibitor, and the best inhibitors are those where the rearrangement is such that the water attack required for deacylation is hindered. This results in very slow off-rates: the enzyme is covalently inhibited for a very long time. It might be thought that this principle could easily be transferred to other enzymatic systems, such as thrombin. This has been the subject of much effort, but
4 Mechanism-based Covalent Inhibitors Fig. 3. The simplest thrombin inhibitors, benzamidine 2 (left) and APPA, 3 p-amidino phenyl pyruvate.
the results have been generally disappointing. It turns out that lack of selectivity is a serious problem. Chemical compounds with suitable reactive centers can inhibit a wide variety of similar enzymes, with the risk of severe toxicity effects in vivo. For the penicillins this is not a problem, as humans do not possess homologues of beta lactamases and PBPs. We do, however, have many important thrombin-like serine proteases involved in vital functions, which must not be significantly inhibited. Can we, nevertheless, use mechanism-based inhibitors to study the molecular interactions in active sites? Unfortunately the simple answer appears to be negative: on- and off-rates and structural rearrangements are difficult to interpret in terms of the energetics of specific interactions. Although detailed spectroscopic studies have begun to shed light on these complex mechanisms (e.g., [14, 15]), much more work will be required before all the enthalpic and entropic effects can be unscrambled. Knowledge of nothing more than the catalytic mechanism and the P1 residue can indeed be used directly to design thrombin inhibitors. All that is required is an arginine analogue with an electrophilic center in the correct position. The simplest of these is APPA (Fig. 3). The seminal work of Bode and Huber produced crystal structures of both benzamidine and APPA bound to trypsin [3ptb, 1tpp] [16]. Thrombin has a very similar primary sequence to trypsin, with amino acid identities (similarities) of about 40% (55%), depending on species. The structures of the enzymes are also very similar, with 200 Ca positions superimposing with about 0.75 Å rms deviation. The remaining 60 thrombin residues are in surface loops which are much shorter in trypsin, where there are only 30 corresponding residues. Given this similarity, it is no surprise that both benzamidine and APPA show little selectivity among thrombin, trypsin, and the large number of closely related enzymes. About half of the thrombin structures in the PDB are active site complexes with covalent inhibitors and with other molecules bound to so-called “exo-sites” [17]. The majority of these contain PPACK 1 (Fig. 2) [18], which has been very widely used as a tool. This inhibitor forms a stable complex with thrombin, making covalent bonds to both Ser 195 and His 57. Here a word of warning must be given: a protein structure is distorted somewhat by inhibitor binding, and for covalent inhibitors, particularly doubly covalent ones, this distortion may be significant. For thrombin this is not a serious problem: in the PPACK complex, the Tyr-Pro-Pro-Trp lid of the P pocket adjusts its position so that the Trp side chain moves down by about 1
5
6 Enzyme Inhibitor Design
Å to better pack over the PPACK proline [13]. It is not unknown in other enzymes that the acylenzyme relaxes to a structure significantly different from the initial Michaelis complex. For investigation in structural detail of the acylation of elastase, a serine protease with the same active center as thrombin, consult [19–24]. While no structure of a true Michaelis complex of this kind of inhibitor has been reported, Skordalakes et al. have made a fascinating observation with a phosphonate tripeptide thrombin inhibitor of the structure of a trapped pentacovalent intermediate state which precedes the covalent intermediate [25]. Many research groups have started with PPACK and produced inhibitors of a less peptidic nature (to improve in vivo stability), and/or with conformational restriction (to tackle the entropy loss problem), and/or with a variety of “serine trap” functionalities, for example, aldehydes, boronic acids, α-keto amides and acids, αketo heterocycles, polyfluorinated ketones, and phosphonates. Structures of many of these are known in complex with thrombin but will not be reviewed here (see e.g., publications of C.A. Kettner and coworkers). As well as lack of specificity, these potential drugs suffer from slow on-rates and have not progressed to the market.
5 Parallel de novo Design of Inhibitors
As asserted above, it should easily be possible these days to progress from a crystal structure to a useful lead inhibitor quite quickly using in silico screening. A few years ago we used thrombin inhibition to test a conceptually simple de novo approach combining combinatorial docking and combinatorial chemistry [26]. Reductive amination was chosen as a convenient synthetic chemistry: a set of aldehydes and a set of amines were chosen on the criteria of size and availability, and all aldehydeamide reductive amination products were “synthesized” in silico and docked into the thrombin active site, and binding affinities were estimated using modified Ludi algorithms [27]. Ten of the predicted best inhibitors were then synthesized chemically and assayed for thrombin inhibition. The best compound 4, shown in Fig. 4, had a K i for thrombin of 95 nM, an encouraging result for a compound of molecular weight 317. The “amine moiety,” p-amino-benzamidine, which serves as the arginine mimetic, has on its own a K i of 34 µM for thrombin and 5.7 µM for trypsin, that is, the “needle,” as we have named such entities [28], is selective for trypsin by 5×. The full compound, however, has a K i for trypsin of 520 nM and is thus 5× selective for thrombin. To check the binding mode, an X-ray crystal structure of the complex was determined. This confirmed the binding mode predicted but with some significant differences in detail. The terminal phenyl group fits into the D pocket, but the ether oxygen is almost 2 Å further out into solution than expected. This is because the Tyr-Pro-ProTrp peptide sequence that forms the “lid” of the hydrophobic P pocket moves down over the central phenyl group in a way already seen for other inhibitors, notably PPACK. A number of principles are clear from this experiment. The first is that a straightforward approach to obtaining easy-to-synthesize inhibitors may be successful, so
5 Parallel de novo Design of Inhibitors
Fig. 4 (a) The reductive amination scheme. (b) The best compound 4 from combinatorial docking.
is worth trying. A second is that empirical estimates of binding energy such as implemented in Ludi are capable of giving useful results. A third is that the protein may adapt to the inhibitor rather than vice versa. This last principle may be generalized as “The system moves to the structure with the lowest free energy.” This is, of course, a well-known, fundamental principle of thermodynamics, so it is surprising that it is often overlooked. The problem arises because it is not yet possible to compute the behavior of a system consisting of protein, ligands, water, and, quite possibly, a variety of other solvent molecules for a time period long enough for the system to reach an energy minimum significantly different from that of the starting conformation. As a result, we take many approximations, which may not be valid. In particular, we know that we can express the change in free energy on binding as δG=E −T S
(1)
where E is the enthalpy change, T is the temperature, and S is the entropy change, but in practice we slip into the easier way of thinking of binding energy as simply enthalpic and ignore the entropic effects altogether. 5.1 Evolution of Inhibitors
Inhibitors may also be generated by the use of “molecular evolution.” This technology has been successfully applied to thrombin [29]. In brief, a synthetic scheme is chosen that potentially enables the synthesis of a very large combinatorial
7
8 Enzyme Inhibitor Design
library: in this case, a three-component Ugi-type reaction was selected. A small set of molecules is synthesized with a random choice of components and tested for thrombin inhibition. This “first generation” is “evolved” to a second and further generations. After about 20 or so generations, the “population” normally converges to a set of closely related thrombin inhibitors (for a detailed analysis of the method applied to thrombin, see [30]). In this way it is possible to generate low nanomolar thrombin inhibitors, potentially using different synthetic schemes where each scheme would give a different inhibitor series with a different chemical backbone (scaffold). The libraries for thrombin were “biased,” and the synthetic scheme was chosen so that the final molecules contain two hydrophobic groups (targeted to the D and P pockets), a basic group (targeted to the S1 pocket), and potential hydrogen-bonding groups (targeted at Gly216). It is therefore no surprise that the best inhibitors look “familiar” as thrombin inhibitors. The relevance for understanding molecular interactions is that the selection criterion for “survival of the fittest” is solely the measured K i for thrombin. It might thus be expected that the best inhibitors would indicate optimal interactions with the protein-binding site. In practice the information obtained is limited by the geometrical restraints imposed by the synthetic scheme chosen and by the restricted choice of building blocks. Use of only “affinity” as the selection criterion also drives the process towards larger and more hydrophobic compounds, which may not be “drug-like.” There is, of course, the possibility to include a bias towards lower molecular weight, or indeed towards any other property of the inhibitor that can be rapidly computed. A very informative practical exercise would be to use the approach with a number of synthetic schemes, to observe the binding modes of the best inhibitors by use of X-ray crystal structure determinations, and to superimpose the resulting structures. This would then enable a 3-D mapping of the active site on the assumption that inhibitor features occurring most frequently at a particular spatial location indicate the most favorable functional group to place at that point. Further, the range of relevant low-energy structures of the protein would also be mapped. The important principle here is that although a single ligand-protein complex contains much useful information, the overlay of a series of complexes – as independent of each other as possible – gives a much more objective picture of which interactions contribute most to affinity (and, in principle, to selectivity).
6 Inhibitors from Progressive Design
An alternative approach to finding good inhibitors is to start from some kind of small “anchor” building block and then “grow” the inhibitor. In principle, the anchor can be covalent; indeed, it has been suggested that, in the absence of any other starting point, a cysteine might be cloned into a protein of interest. Inhibitors could then be grown starting from a disulfide bridge as anchor. Once sufficient affinity is reached, the anchor could be abandoned altogether or replaced by a
6 Inhibitors from Progressive Design
Fig. 5 The Roche thrombin inhibitor, napsagatran 5
substituent targeted at the wild-type molecule. To my knowledge, such an approach has not been realized1) , but there is a large literature on metalloproteases where an anchor, such as hydroxamate, binding tightly to the catalytic zinc ion has been used. It has not, however, been particularly fruitful to progressively develop inhibitors starting from small, covalently bound thrombin inhibitors. As indicated in Section 3, PPACK-based thrombin inhibitors have been widely varied, but in most cases the D and P pocket groups were left alone and the groups interacting with Gly216, Ser195, or Asp189 were modified to improve biological properties. For more-or-less linear inhibitors, there always exists the possibility of “shuttle” optimization, i.e., working from one end to the other and back again. A very successful progressive design approach for thrombin is to start from a small building block, known or expected to bind non-covalently in a precise position in the active site, and expand this by progressive addition of substituents. We have called such building blocks “needles” and have described the discovery of a thrombin-specific needle and its evolution to a full-blown thrombin inhibitor with the use of sequential X-ray structural analyses (see [28]). The details will not be repeated here. The inhibitor, napsagatran, 5 is shown in Fig. 5. Napsagatran is, with K i for thrombin of 270 pM and K i for trypsin 1.9 µM, one of the most potent and selective thrombin inhibitors known and will be used here to illustrate a number of principles. First, the needle itself, amidino-piperidine, has K i for thrombin of 150 µM and for trypsin 360 µM and thus is 2.4× selective for thrombin. This contrasts with the classical needle benzamidine, which has K i for thrombin of 300 µM and 31 µM for trypsin and thus is 10× selective for trypsin. This is perhaps 1) Note added in proof: This concept has been published by D.A. Erlanson, A. C. Braisted, D. R. Raphael, M. Randal, R. M. Stroud, E.M. Gor-
don, J. A. Wells, Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 9367–9372.
9
10 Enzyme Inhibitor Design
unexpected, as it could be argued that benzamidine, being planar, is a much better analogue of the substrate arginine guanidinium group. The width of the recognition pocket (measured from Gly216 N to Cys191 carbonyl carbon and taken from highresolution X-ray structures) is 7.6 Å for PPACK, 8.0 Å for benzamidine, and 8.4 Å for napsagatran. The recognition pocket is thus not rigid and in thrombin is able to expand more easily than in trypsin. It is very difficult to see why this is so by inspecting X-ray structures. The only sequence difference between thrombin and trypsin in the whole of the recognition pocket is that thrombin has Ala at 190, whereas trypsin has Ser, but the effect of this difference on needle binding is obscure (see Section 9.1). It is quite possible that residues forming a second layer around the recognition pocket help determine the structural variability. Perhaps needle binding is a realm better covered by experiment than theory at the present time. Secondly, napsagatran bound to thrombin shows very good intramolecular interactions, and the bound conformation was observed to be very similar to that found in crystals of napsagatran alone. This points to the general principles of a “lockand-key” interaction between enzyme and inhibitor being favorable [31], which is equivalent to saying that the entropy loss of the inhibitor on binding should be low. The two hydrophobic substituents, naphthyl- and cyclopropyl-, pack well together. Such “hydrophobic collapse” of the inhibitor structure in solution is presumed to help pre-form the “key” to fit in the thrombin “lock” [32, 33]. Further, the carboxylate provides a “cap” to the needle, being positioned over the hydrophobic part of the piperidine, protecting it from interaction with water, and the conformation is stabilized by an intramolecular hydrogen bond from an -NH to the carboxylate. An important principle, much neglected, is that the conformation of a potential inhibitor in solution is also relevant. The observed inhibition constant is a measure of the free energy change in the whole system on mixing enzyme and inhibitor. It is very unfavorable if the inhibitor prefers a conformation in solution of substantially lower energy than when bound to the enzyme. In general, it is to be expected that, after complex formation, both enzyme and inhibitor will be in low-energy conformations. It seems unlikely, and also unnecessary, that both will be simultaneously in their lowest energy conformations. Just as the topological and geometrical constraints on protein folding lead not infrequently to amino acids – mostly but not always proline – being found in the cis rather than the energetically more favorable trans conformation, the global energy minimum attained in inhibitor binding may sometimes distribute local energy unevenly. A fundamental principle of inhibitor design, not always easy to achieve in practice, is that when docking enzyme and inhibitor, all reasonably low energy conformations have to be taken into account.
7 Lessons from Classical Inhibitors
As early as 1981, chemists at the Mitsubishi chemical company synthesized the thrombin inhibitor MD-805, 6 (Fig. 6), which has been extensively tested in
7 Lessons from Classical Inhibitors Fig. 6. The Mitsubishi compound MD-805 (argatroban) 6
humans under the name argatroban [34]. The compound, as might be seen from its structure, was produced by the classical medicinal chemistry approach starting with arginine-containing tripeptides. Secondary amines, ultimately the piperidine, were introduced to prevent processing as substrate. The structure in complex with alpha thrombin has been published by myself [13] and at higher resolution in complex with epsilon thrombin by Bode and Brandstetter [35]. Another early inhibitor is Napap, 7 (Fig. 7) [36]. The compound was synthesized as the racemate. The stereochemistry of the binding species was demonstrated by determining the structure in complex with alpha thrombin and was published by myself [13] and, at higher resolution in complex with epsilon thrombin, by Bode and Brandstetter [35]. This inhibitor can be considered a tetra-peptide analogue if the piperidine is taken as replacing a cyclized amino acid. Fig. 8 shows, in simplified form, the binding of the three inhibitors 5–7 to thrombin. All may be considered peptide analogues. All occupy roughly the same volume. Most of the interactions with thrombin are similar or identical. There is, nevertheless, a fundamental difference in the way they bind – the “binding modes” are not all the same. I take here a definition of “binding mode” as being determined by which substituents are in which pockets. If we take the arginine or analogue to
Fig. 7 D-Napap 7
11
12 Enzyme Inhibitor Design
Fig. 8 Sketch of the binding modes of napsagatran, 5, MD-805 6, and Napap 7 Table 1 Binding patterns of thrombin inhibitors
Napsagatran MD-805 Napap
S1
D pocket
P pocket
P1 P1 P1
P2 P2 P3
P1 P1 P1
be P1 and stretch the definitions somewhat, we have the following binding pattern according to where the residue side chains are found (Tab. 1). Two different binding modes are seen. For Napap the P1 residue lies substratelike towards Ser 195, but the D conformation (as opposed to the substrate L conformation) brings the piperidine corresponding to P1 into the P pocket; the glycine P2 acts as a spacer and makes the hydrogen bonds expected for a P3 residue, positioning the P3 naphthyl-sulfonyl correctly to fit into the D pocket. For MD-805 the P1 arginine tilts towards Gly216 and makes there the hydrogen bonds expected for a P3 residue, allowing the P1 piperidine to enter the P pocket and the P2 substituent the D pocket. Surprisingly, napsagatran formally has the same binding mode as MD-805, despite their different origins. MD-805 originated as a tripeptide with P2, P1, and P1 substituents. To prevent cleavage of the P1-P1 peptide bond, a secondary amide was introduced and optimized to the piperidine shown. What was not appreciated at the time was that the P1 arginine effectively jumped to P3. This is possible, as the structures show, but only if the arginine needle turns so that only one guanidine –NH2 interacts with Asp189 (Fig. 9). In retrospect, this example reinforces a number of important principles: 1. Expect the unexpected – the smallest change in an inhibitor can cause it to bind totally differently. 2. The tail does not wag the dog – in this case, the guanidine does not make “optimum” interactions with Asp189 but has to settle for “second best” in order to allow a large number of other favorable interactions. 3. As indicated above for napsagatran, “hydrophobic collapse” is a powerful driving force. Models that placed the P1 piperidine in the rather hydrophilic S1 region did not take this into account. 4. A water molecule at a well-defined position helps stabilize the bound conformation (Fig. 9).
7 Lessons from Classical Inhibitors
Fig. 9 Interactions at the bottom of the S1 (recognition) pocket. (Left) The canonical benzamidine hydrogen-bonding scheme with two hydrogen bonds to Asp189, one to the carbonyl of Gly219 and one to the con-
served water molecule. (Right) The guanidine of MD-805 makes interactions with the same hydrogen bond acceptors plus an extra water molecule.
5. The carboxylate substituent on the piperidine ring has the equatorial conformation and not the expected axial conformation that has lower energy taken in isolation. As previously discussed in detail [13], this allows better packing in the P pocket than any other conformation of this or any other stereoisomer. Modeling suggests that packing of the quinoline against the piperidine also gives preference to this conformation, so hydrophobic collapse may be the primary driving force. At any rate, there is a local conformation, which is not in its lowest energy state.
Napsagatran was evolved progressively from the 3-substituted amidino-piperidine needle. The exit vector from the needle is such that amide extension first interacts with the rim of the S1 pocket by accepting a hydrogen bond from the –NH of Gly217 at the front of the pocket. This excursion of the “extended needle” allows the central amino acid (regarding the extended needle as side chain) to achieve an “ideal P3” hydrogen-bonding interaction with Gly216, being in plane with Gly216 and close to the position of the P3 glycine in the fibrinopeptide structures. This repositioning of the central amino acid backbone then requires a smaller P pocket substituent. The possible energy/entropy losses associated with an extended needle are compensated by the fact that all torsional angles are close to ideal values and that the amide makes two good hydrogen bonds – to Gly217 and to the inhibitor carboxylate. Superimposition of the three lead inhibitor and fibrinopeptide structures indicates the following design rules for thrombin inhibitors: 1. There has to be a basic group in close interaction with Asp189. 2. There is a small volume deep in the P pocket that must be occupied by a hydrophobic group. 3. There is a small volume deep in the D pocket that must be occupied by a hydrophobic group (over CE3 of Trp215). 4. Non-aromatic residues are preferred in the P pocket.
13
14 Enzyme Inhibitor Design
5. 6. 7. 8.
Aromatic residues are preferred in the D pocket. Both “anti-parallel beta” hydrogen bonds to Gly216 must be made. A carboxylate or carbonyl is preferred near Ser195. It is not a requirement that the oxy-anion hole be occupied.
These heuristic rules are there to be broken but have to be kept in mind, as most will have to be satisfied most of the time. To progress further, we need to try to quantitate the energetics of the enzyme-inhibitor interaction.
8 Estimating the Energies of Interactions
There are many approaches to the energetics of intermolecular interactions (vide infra). Here we document some cases where thrombin inhibitors have been used to provide energy estimates. Obst, Diederich, and coworkers generated by rational design a series of thrombin inhibitors with rigid, bicyclic core structures [37]. These were further extended to tricyclic structures and modified specifically “to generate detailed information on the strength of individual intermolecular bonding interactions and their contribution to the overall free energy of complexation” [38]. The general formula of the inhibitors is given in Fig. 10. For details of synthesis and stereochemistry, please refer to the original articles. Here, two pairs of inhibitors will be presented for which there is evidence from high-resolution X-ray structures that the pairs bind in exactly the same way. This is most important, since, as discussed above, quite similar inhibitors might bind differently and also the protein might adapt to the inhibitor. The reference inhibitors 8a have carbonyl oxygens at both R1/R2 and R3/R4 and bind with the lower carbonyl oxygen accepting a hydrogen bond from the –NH of Gly216 (as modeled) and the upper carbonyl oxygen in the P pocket.
Fig. 10 General formula of the bi- and tricyclic thrombin inhibitors 8 from Obst et al. [37, 38] (simplified representation). 8a (R1/R2) = O, (R3/R4) = O; 8b R1 = H, R2 = H, (R3/R4) = O; 8c (R1/R2) = O, R3 = H, R4 CH2 (CH3 )2 .
8 Estimating the Energies of Interactions
A first inhibitor 8b was generated (in the bicyclic series) with R1 = H, R2 = H, i.e., the lactam carbonyl was replaced by a methylene group. The only difference in the X-ray structures of this and the reference bicyclic inhibitor was the lack of the carbonyl oxygen atom. The energy difference between the two compounds as derived from K i measurements is G = 0.8 kcal mol−1 . We can directly equate this with the energy of the lost hydrogen bond, since there are no other differences apparent. This is at the lower end of the generally accepted range (0.5–1.5 kcal mol−1 ). The authors propose a number of reasons for this. First, residue glycine 216 is planar, and the phenylamidinium residue of the inhibitor stacks parallel to it so that its π-electrons will tend to stabilize a partial positive charge on the –NH of Gly216. The resultant antiparallel dipoles of the glycine –NH and C O will then tend to stabilize each other. Second, because Gly216 is planar, repulsion between the inhibitor carbonyl group and the Gly216 carbonyl group will prevent the hydrogen bond from having optimal geometry (and thus optimal energy). Finally, the interaction between the methylene group on the inhibitor and the methylene group of Trp215 are assessed as positive, albeit small. There is a complementary way of looking at this problem. We may simply ask, does a water molecule prefer to bind here when no inhibitor is present? The answer seems to be “No,” which supports the above argument. A second inhibitor pair was synthesized, this time in the tricyclic series with R1/R2 as carbonyl, with R3 as H, and with variation of R4 with R stereochemistry. This was done since it was clear that a carbonyl group in the P pocket does not obey the rules for optimum binding. The reference tricyclic 8a compound is, in fact, quite potent, with a K i for thrombin of 90 nM, but with R4 as isopropyl 8c, the K i improved to 13 nM. This corresponds to an improvement in binding energy of G=1.1 kcal mol−1 . This is quite substantial, but given that the structure closely resembles the S2 valine of the natural substrate, it is perhaps less than might have been expected. Close inspection of the X-ray structures shows that the isopropyl substituent is fractionally too large and pushes the tricycle out by about 0.5 Å without a significant change in the protein structure. Cyclopropyl and ethyl are slightly better substituents, with K i ’s of 10 nM and 8 nM, respectively. Nevertheless, the carbonyl group does better than expected, probably because the P pocket is not fully closed. There is a smear of residual electron density between this carbonyl oxygen and the –NH3 of Lys60F, which can be interpreted as a poorly ordered water molecule partially solvating the carbonyl oxygen. Another inhibitor series has been used to estimate the value of P pocket interactions — the Boehringer Mannheim diaryl sulfonamides [39] (also reported by 3-Dimensional Pharmaceuticals [40–43]) (Fig. 11). The diaryl sulfonamide inhibitors were discovered by a screening exercise aimed at finding less basic thrombin inhibitors [44]. A crystal structure of the complex of thrombin with the R CH3 compound 9b (BM14.1248) shows the phenyl group in the D pocket, the central tolyl group in the P pocket, and the 4-aminopyridine in the S1 pocket (but not interacting directly with Asp189) [45] [1uvt]. 9b has a K i of 23 nM for thrombin, whereas the R H compound 9a has a K i of 300 nM [39]. This corresponds to G= 1.5 kcal mol−1 . As discussed by the authors [45], this is certainly lower than the energetic cost of a “hole” left by deleting a methyl group
15
16 Enzyme Inhibitor Design
Fig. 11 (Above) Diaryl sulfonamide inhibitors from Boehringer Mannheim, 9a R = H, 9b R = CH3 . (Below) A similar inhibitor from 3-Dimensional Pharmaceuticals, 10 R = CH3 .
ideally packed deep into a hydrophobic pocket. The P pocket is sufficiently flexible to compensate somewhat for the loss of the methyl group: the Tyr-Pro-Pro-Trp loop moves down by 1.0–1.5 Å (R. Engh, personal communication). The corresponding R CH3 compound 10 from 3-Dimensional Pharmaceuticals has a K i for thrombin of 11 nM and shows the same binding mode [40, 43]. In a similar way, using 4-TAPAP as template [35], a methyl group deep in the P pocket was shown to produce an affinity gain of 17 x with no change to the position of inhibitor binding [46]. We conclude from the above examples that up to 2 kcal mol−1 of binding energy may be obtained by placing a methyl or similar small hydrophobic group correctly in the P pocket. Something similar must be true of the D pocket, although examples with X-ray validation are missing. We also conclude that placing a hydrogen-bond acceptor correctly above the -NH of Gly 216 is worth only < 1 kcal mol−1 . We further observe that it is possible to obtain low nanomolar inhibition without making the canonical “benzamidine” hydrogen bonds (Fig. 9). For a detailed discussion of “non-canonical needles” and recognition pocket and P pocket flexibility, the reader should consult [41, 42, 45].
9 Water and Solvent
Hydrogen bonding in general is reviewed in the previous chapter. Here, the discussion will be limited to the role of water and other solvent molecules in inhibitor binding. The structure of the solvent around biological macromolecules has been reviewed in detail by Mattos and Ringe [47]. Serine proteases of the trypsin family have 21 conserved buried water molecules, as first reported by Sreenivasan and Axelsen
9 Water and Solvent Fig. 12. Inhibitors from Axys: 11a APC-8696 R1 = H, R2 =, X = CH; 11b APC-10302 R1 = Cl, R2 = , X = CH, 11c APC-1144 R1 = H, R2 = H, X = N.
[48]. These may be regarded as integral to the protein structure, and it might be expected that they would be difficult to displace. Dunitz [49] estimates the entropic cost of “freezing” in a water molecule as part of the protein structures as up to 2 kcal mol−1 , which is a quite significant penalty. High-resolution X-ray structures of thrombin in the Protein Data Bank show rather variable total numbers of water molecules, presumably according to the preferences of the depositors. A generally accepted number is around one water per amino acid [50], i.e., 300 for thrombin. Of these, the only ones of direct interest here are the conserved water at the bottom of the S1 pocket (Fig. 9); possible waters hydrogen bonding to the –NHs of Gly216 and Gly219; and whatever waters are in the S1, P, and D pockets and are normally displaced by inhibitors. Most of water molecules found around the active site do not appear to be particularly difficult to displace. In particular, there is no highly conserved water structure around the oxyanion hole or around Ser 195. These catalytically important features are not “frozen” in “ice-like” water but rather are intrinsically able to adapt to substrate, transition state, and product structures. An elegant, detailed structural description of an unusual multi-centered short hydrogen-bonding network, induced by the binding of Axys inhibitors of the 11c APC-1144 type (Fig. 12), is given by Katz et al. [51]. Those interested in the possibilities for interaction with the residues responsible for the catalytic mechanism and in the roles of water and pH should consult this reference. 9.1 Displacing a Tightly Bound Water
The conserved S1 water (Fig. 9) is one of the best defined, as judged by the low B-values (thermal disorder parameters) observed with benzamide- or guanidinetype inhibitors. It was long regarded as simply part of the protein until Katz et al. [52] produced inhibitors that displaced it. The objective was to improve selectivity between those serine proteases that have serine at position 190, e.g., uPA, trypsin, tryptase, and those that have alanine at 190, e.g., tPA, thrombin, factor X. The side chain of residue 190 is spatially close to the conserved water, and this region can be accessed by benzamidine substituents ortho- to the amidine (Fig. 12). Katz et al. [52] report binding constants for 11a APC-8696 of K i =130nM for trypsin and K i =320 nM for thrombin. The compound is thus trypsin selective by
17
18 Enzyme Inhibitor Design
2.5 ×. Upon introduction of the chlorine substituent to give 11b APC-10302, the binding constant for trypsin becomes K i = 230 nM and K i = 60 µM (60,000 nM) for thrombin. 11b is thus 260x trypsin selective. Katz et al. [52] provide a wealth of both binding data and X-ray structural data on uPA, thrombin, and trypsin and discuss in detail how the protein structure responds to inhibitor binding. This excellent set of high-resolution structures [1gjb, 1gjc, 1gj7, 1gj8, 1gj9, 1gja, 1gjd, 1gj4, 1gj5, and 1gj6] has recently been made public (May 2002) and will be a profitable subject for analysis. Only a preliminary summary can be given here. It might have been expected that the chlorine would interact favorably with the alanine 190 side chain in thrombin and less well with the serine 190–OH, giving selectivity for thrombin, but the inhibition constants reveal just the opposite. Trypsin is favored by G = 3 kcal mol−1 , although the conserved water is indeed removed from both enzymes and the general binding mode is the same. The conserved water donates hydrogen bonds to the carbonyl oxygen of residue 227 and to the π-electrons of Tyr228. Both of these interactions are lost when the water is displaced, but the energy change will be similar for both enzymes. The most important difference seems to be that in thrombin there is no fourth hydrogen bond partner for the amidino group, and this hydrogen bond is totally lost. Further, the chlorine is not quite large enough to fill the pocket left by the water, so a “hole” is generated, which as we have already seen, costs energy. In trypsin the nearer -NH2 of the inhibitor amidino group donates a hydrogen bond to the Ser190 -OH, which in turn donates a hydrogen bond to the -OH of Tyr228. The side chain of Tyr228 moves inward slightly and thus contacts the chlorine atom, filling the potential “hole.” The chlorine can thus displace the trypsin water at no net energy loss, as all hydrogen-bonding capabilities of the amidino group and the Ser190 Oy are satisfied and good close packing is achieved.
9.2 Binding of Solvent Molecules
Besides water, proteins also can bind a wide range of other molecules on their surface. Many of those found in structures in the Protein Data Bank have clearly been introduced to promote crystal formation. In recent years, nearly all crystal structures reported have been analyzed in the frozen state, and in many cases glycerol or another cryoprotectant has been added to aid in the crystal-freezing process. While the binding of such molecules from crystallization or freezing buffers may not be of direct biological relevance, specific binding sites are often identified that can deliver information on the preferred binding of small ligands, which then has predictive value for inhibitor design. A logical extension of this observation is to actively produce crystal structures in the presence of high concentrations of small “probe” molecules and thus produce an experimental binding map of the protein surface. This can then be used instead of, or in combination with, the theoretically derived functions used for in silico screening.
9 Water and Solvent
Fig. 13 Bis(5-amidino-2-benzimidazolyl) methane (BABIM) 12.
Mattos and Ringe have analyzed protein surfaces [53], reviewed “proteins in organic solvents” [54], and discussed the use of such information in inhibitor design [55]. English et al. have studied thermolysin in high concentrations of organic solvents [56, 57]. It is to be expected that this kind of experimental approach will be extensively used in the future, both to give design ideas in specific cases and to help improve prediction methods in general. Bartlett et al., for instance, have used increasing ethanol concentrations to help estimate the hydrophobic contribution to inhibitor binding [58]. Solvent or water molecules can be “recruited” by inhibitors to enable them to bind better. For example, the early thrombin-inhibitor complexes MD-805 and PPACK have water strongly bound to the guanidine group in the recognition pocket, in MD-805 it makes a bridge to the inhibitor carboxylate, while in PPACK it makes a bridge to the inhibitor N-terminal −NH3 . This, and the recruitment of common ions such as chloride, should be regarded as the norm and taken into account in the inhibitor design process. Ladbury [59] has analyzed the way inhibitors recruit water molecules but concluded that at present it is difficult to predict such behavior. An unusual observation is the recruitment of a zinc ion by serine protease inhibitors, reported by Katz et al., which is analyzed structurally in fine detail in [60–63]. The simplest compound, 12, showing this “delta effect” (greater potency in physiological buffers or plasma than in EDTA-containing assay buffers, i.e., an inverse plasma shift) is bis(5-amidino-2-benzimidazolyl) methane (BABIM) (Fig. 13). The “delta effect” gives increases of affinity of greater than 1000 x in the presence of Zn2+ . It will be interesting to see whether this paradigm can be extended to the recruitment of other (physiological) molecules. 9.3 Screening
While “screening” and “design” are commonly seen as opposite approaches to drug finding, it has to be pointed out here that screening by physical methods is an extremely useful way of mapping an active site. Abbott has developed this approach in extensive studies, principally on urokinase, using X-ray [64–73] and Raman screening methods [74] in addition to their SAR by NMR approach [75, 75– 77]. As higher throughput X-ray and NMR technologies are developed, it is to be expected that this kind of experimental approach will be used more and more.
19
20 Enzyme Inhibitor Design
Fig. 14 Symmetry of a typical thrombin inhibitor.
10 Structure-Activity Relationships (SAR)
In Section 7, the binding modes revealed by some crystal structures of thrombininhibitor complexes were discussed. Inhibitor studies on thrombin have been complicated by the tendency of small changes in the inhibitor backbone to change the binding mode [28, 79]. This occurs partly because the inhibitors have some internal symmetry, as depicted in Fig. 14. As this behavior is to be expected in other projects, it is perhaps worth commenting on a general principle — the use of structure-activity relationships (SAR). If even a few variants of a molecule are available, it is normally possible to identify the binding mode by inspecting the SAR. If there is any doubt, it may be worth deliberately making a few test compounds: normally, there is a position where substituents of one type are allowed in one binding mode but not in the other. A particular issue in thrombin inhibitors has been stereochemistry. The SAR of MD-805 and NAPAP and analogues, for example, was hard to understand without knowing that only the D isomer of NAPAP binds to thrombin. It has been very helpful on many occasions to determine X-ray structures by soaking racemic mixtures into crystals and observing which stereoisomer binds. It is now possible to produce a high-resolution structure of a thrombin-inhibitor complex in a day or so, which is quicker and easier than chiral separation. Where selectivity against related enzymes is an issue, which it certainly is for thrombin, it has been repeatedly found that this can only be understood, and thus improved, if the binding modes to these other enzymes can also be determined. This is particularly true where diastereomers are involved.
11 Present Clinical Status of Thrombin Inhibitors
Other than argatroban, the only non-peptidic thrombin inhibitors to have reached phase III clinical trials are the simple tripeptide analogue melagatran and its orally available pro-drug ximelagatran from AstraZeneca [5] (Fig. 15). For a recent clinical status review, see [80]. Given the enormous worldwide efforts over many years to develop thrombin inhibitors as anti-thrombotic drugs, this is rather disappointing. Until more results of early-phase clinical trials are published, it will not be clear whether the problem is
References
Fig. 15 Melagatran 13 and its pro-drug ximelagatran 14. Melagatran is the unsubstituted parent compound. Xi-melagatran has R1 = −CH2 CH3 , R2 = −OH.
that thrombin is a difficult drug target, e.g., because of bleeding risks, or whether the inhibitors proposed as clinical candidates simply do not have sufficiently drug-like properties. The suspicion is that tripeptide analogues with a strongly basic group have intrinsically poor pharmacokinetic and pharmacodynamic properties [8]. If this is true, new and different inhibitors are needed, and here a good understanding of the interactions between the inhibitor and the enzyme active site can contribute significantly to the identification of inhibitor molecules suitable for development as drugs. 12 Conclusions
Thrombin inhibition is a fruitful source of raw data for the study of molecular recognition. Several groups have determined, published, and deposited coordinates for sets of high-resolution X-ray crystal structures. In combination with binding and kinetic data, it is now possible to “map” the thrombin active site in some detail in terms of both structural changes and the energies of interactions. Heuristic models based on the binding of peptidomimetic inhibitors pointed to hydrophobic interactions in the D and P pockets and optimal hydrogen bonding to Gly216 and Asp189 as being vital for good inhibition. More recent experience, coming from a wide variety of sources, shows that for acceptable affinity the hydrophobic interactions have to be maintained and good surface complementarity is essential (no holes), but it is not a requirement that all possible hydrogen bonds are made to Gly216 and Asp189. Given the progress made so far, it is to be hoped that systems such as thrombin will continue to be actively employed to further our understanding of intermolecular interactions. Acknowledgments
I would like very much to thank all those excellent colleagues with whom I have had the great privilege to work and publish on thrombin, in particular Ulrike Obst for checking this manuscript. I thank also Fritz Winkler and Hans-Joachim B¨ohm for continued support and encouragement.
21
22 Enzyme Inhibitor Design
References 1 WILEY, M. R. and FISHER, M. J., Exp. Opin. Ther. Pat., 1997, 7 (11), 1265– 1282. 2 COBURN, C. A., Expert Opin. Ther. Pat., 2001, 11 (5), 721–738. 3 ANAND, S., Haemostasis, 1999, 29 (Suppl. 1), 76–78. 4 FRENCH, J. K. and WHITE, H. D., Curr. Card. Rep., 1999, 1 (3), 184–191. 5 GUSTAFSSON, D., et al., Thrombosis Res., 2001, 101 (3), 171–181. 6 HAUPTMANN, J., Nova Acta Leopold., 1999, 80 (311), 61–82. 7 HAUPTMANN, J. and STURZEBECHER, J., Thromb. Res., 1999, 93 (5), 203–241. 8 HAUPTMANN, J., Eur. J. Clin. Pharmacol., 2002, 57 (11), 751–758. 9 HIRSH, J., Am. Heart J., 2001. 142 (2, Suppl.), S3–S8. 10 MENEAR, K., Expert Opin. Invest. Drugs, 1999, 8 (9), 1373–1384. 11 SHEN, G. X., Curr. Drug Targets: Cardiovasc. & Haematol. Disord., 2001, 1 (1), 41–49. 12 SCHECHTER, I. and BERGER, A., Biochem. Biophys. Res. Commun., 1968, 32 (5), 898–902. 13 BANNER, D. W. and HADVARY, P., J. Biol. Chem., 1991, 266 (30), 20085–20093. 14 WILKINSON, A. S., et al., Biochemistry, 1999, 38 (13), 3851–3856. 15 CHITTOCK, R. S., et al., Biochem. J., 1999, 338 (Pt 1), 153–159. 16 MARQUART, M., et al., Acta Cryst., 1983, B39, 480–490. 17 BANNER, D. W., Nature, 2000, 404 (6777), 449–450. 18 KETTNER, C. and SHAW, E., Thrombosis Res., 1979, 14 (6), 969–973. 19 MATTOS, C., et al., Biochemistry, 1995, 34 (10), 3193–3203. 20 DING, X., et al., Biochemistry, 1994, 33 (31), 9285–9293. 21 DING, X., et al., Biochemistry, 1995, 34 (23), 7749–7756. 22 MATTOS, C., et al., Nat. Struct. Biol., 1994, 1 (1), 55–58. 23 PEISACH, E., et al., Science, 1995, 269 (5220), 66–69. 24 KATONA, G., et al., J. Biol. Chem., 2002, 14, 14.
25 SKORDALAKES, E., et al., J. Mol. Biol., 2001, 311 (3), 549–555. 26 BOHM, H. J., D. W. BANNER, and WEBER, L., J. Comput. Aided Mol. Des., 1999, 13 (1), 51–56. 27 BOHM, H. J., J. Mol. Recognit., 1993, 6 (3), 131–137. 28 HILPERT, K., et al., J. Med. Chem., 1994, 37 (23), 3889–3901. 29 WEBER, L., et al., Angew. Chem. Int. Ed. Engl., 1995, 107, 2453–2454. 30 ILLGEN, K., et al., Chem. Biol., 2000, 7 (6), 433–441. 31 VERLINDE, C. L. and HOL, W. G., Structure, 1994, 2 (7), 577–587. 32 HART, P.A. and RICH, D.H., Pract. Med. Chem., 1996, 393–412. 33 RICH, D.H., Perspect. Med. Chem. 1993, 15–25. 34 KIKUMOTO, R., et al., Biochemistry, 1984, 23 (1), 85–90. 35 BRANDSTETTER, H., et al., J. Mol. Biol., 1992, 226 (4), 1085–1099. 36 STURZEBECHER, J., et al., Thromb. Res., 1983, 29 (6), 635–642. 37 OBST, U., et al., Angew. Chem. Int. Ed. Engl., 1995, 34, 1739–1742. 38 OBST, U., et al., Chem. Biol., 1997, 4 (4), 287–295. 39 WEBER, I. R., et al., Bioorg. Med. Chem. Lett., 1998, 8 (13), 1613–1618. 40 LU, T., et al., Bioorg. Med. Chem. Lett., 1998, 8 (13), 1595–1600. 41 LU, T., et al., Bioorg. Med. Chem. Lett., 2000, 10 (1), 83–85. 42 LU, T., et al., Bioorg. Med. Chem. Lett., 2000, 10 (1), 79–82. 43 BONE, R., et al., J. Med. Chem., 1998, 41 (12), 2068–2075. 44 VON DER SAAL, W., et al., Bioorg. Med. Chem. Lett., 1997, 7 (10), 1283–1288. 45 ENGH, R. A., et al., Structure, 1996, 4 (11), 1353–1362. 46 BERGNER, A., et al., J. Enzyme Inhib., 1995, 9 (1), 101–110. 47 MATTOS, C. and RINGE, D., International Tables for Crystallography. 2001, 623– 647. 48 SREENIVASAN, U. and AXELSEN, P. H., Biochemistry, 1992, 31 (51), 12785–12791. 49 DUNITZ, J., Science, 1994, 264, 670–670.
References 50 CARUGO, O. and BORDO, D., Acta. Cryst. D: Biol Cryst, 1999, D55 (2), 479–483. 51 KATZ, B. A., et al., J. Mol. Biol., 2001, 307 (5), 1451–1486. 52 KATZ, B. A., et al., Chem. Biol., 2001, 8 (11), 1107–1121. 53 RINGE, D. and MATTOS, C., Med. Res. Rev., 1999, 321–331. 54 MATTOS, C. and RINGE, D., Curr. Opin. in Struct. Biol. 2001, 11 (6), 761–764. 55 MATTOS, C. and RINGE, D., Nature Biotechnol., 1996, 14 (5), 595–599. 56 ENGLISH, A. C., GROOM, C. R., and HUBBARD, R. E., Protein. Eng., 2001, 14 (1), 47–59. 57 ENGLISH, A. C., et al., Proteins: Struct. Funct. Gen., 1999, 37 (4), 628–640. 58 BARTLETT, P. A., et al., J. Am. Chem. Soc., 2002, 124 (15), 3853–3857. 59 LADBURY, J. E., Chem. Biol., 1996, 3 (12), 973–980. 60 KATZ, B. A., et al., Nature, 1998, 391 (6667), 608–612. 61 KATZ, B. A. and LUONG, C., J. Mol. Biol., 1999, 292 (3), 669–684. 62 JANC, J. W., et al., Biochemistry, 2000, 39 (16), 4792–4800. 63 KATZ, B.A., et al., in Special Publication – Royal Society of Chemistry. 2001, 211– 222. 64 NIENABER, V.L., et al., in U.S., 2001, (Abbott Laboratories, USA): US. p. 33 pp., Cont.-in-part of U.S. Ser. No. 36, 184. 65 MCCLELLAN, W.J., et al., in, Abstr. Pap. – Am. Chem. Soc. 2001. p. MEDI294.
66 ROCKWAY, T.W., et al., in, Abstr. Pap. – Am. Chem. Soc. 2000. p. MEDI-030. 67 NIENABER, V. L., et al., Structure (London), 2000, 8 (5), 553–563. 68 NIENABER, V., et al., J. Biol. Chem., 2000, 275 (10), 7239–7248. 69 NIENABER, V.L., et al., in PCT Int. Appl., 1999, (Abbott Laboratories, USA).: WO. p. 57. 70 NIENABER, V. L., MERSINGER, L. J., and KETTNER, C. A., Biochemistry, 1996, 35 (30), 9690–9699. 71 NIENABER, V. L., et al., Nature Biotech., 2000, 18 (10), 1105–1108. 72 HAJDUK, P. J., et al., J. Med. Chem., 2000, 43 (21), 3862–3866. 73 NIENABER, V. L., BOXRUD, P. D., and BERLINER, L. J., J. Prot. Chem., 2000, 19 (4), 327–333. 74 DONG, J., et al., Biochemistry, 2001, 40 (33), 9751–9757. 75 HAJDUK, P. J., et al., J. Am. Chem. Soc., 1997, 119 (25), 5818–5827. 76 FESIK, S. W., et al., Protein Eng., 1997, 10 (Suppl.), 73. 77 SHEPPARD, G.S., et al., in Book of Abstracts, 213th ACS National Meeting, San Francisco, April 13–17. 1997. p. MEDI-081. 78 PETROS, A. M. and FESIK, S. W., Methods Enzymol., 1994, 239 (Nuclear Magnetic Resonance, Pt. C), 717–739. 79 BANNER, D., et al., Perspect. Med. Chem. 1993, 27–43. 80 STEINMETZER, T., HAUPTMANN, J., and STURZEBECHER, J., Expert. Opin. Investig. Drugs., 2001, 10 (5), 845–864.
23
1
Protein Kinase Inhibitors Jerry L. Adams GlaxoSmithKline Pharmaceutical Medicinal Chemistry, Collegeville, USA
James Veal Serenex, Inc., Informatics and Computational Chemistry, Durham, USA
Lisa Shewchuk GlaxoSmithKline Pharmaceuticals Discovery Research, Research Triangle Park, USA
Originally published in: Protein Crystallography in Drug Discovery. Edited by Robert E. Babine and Sherin S. Abdel-Meguid. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30678-7
1 Introduction
The regulation of glycogen phosphorylase function by reversible phosphorylation was first described in the mid-1950s. However, the potential importance of protein kinases was not fully appreciated until the advent of DNA cloning and sequencing in the mid-1970s. The initial estimate of 1000–2000 human protein kinases made in 1987 has now been refined using the full human genome to 518 putative protein kinase genes or approximately 1.7% of all human genes [1]. With nearly a third of all proteins containing at least one covalently bound phosphate, it is easy to envisage an important regulatory role for the enzymes that attach (kinases) and remove (phosphatases) inorganic phosphate from proteins. Because kinases are involved in the regulation of all aspects of cellular function, protein kinase inhibition may be a widely applicable strategy for the treatment of disease. Protein kinases now figure prominently among the molecular targets currently being pursued by the pharmaceutical industry to treat a wide spectrum of diseases, these include cancer, diabetes and its complications, rheumatoid arthritis, Alzheimer’s, hypertension and stroke. The realization of the therapeutic promise of protein kinase inhibitors has proven to be a daunting task. Two major obstacles to be overcome were achieving access to these intracellular targets and selectivity of inhibition. With over 500 kinases in the human genome, it is not surprising that selectivity has proven to be the more Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Kinase Inhibitors
difficult of the two problems. However, as will be discussed in this review, X-ray crystallography has proven to be the central technology driving our understanding of the structural features governing inhibitor binding and providing direction in the search for selective kinase inhibitors.
2 Structure and Function 2.1 Tertiary Structure
A great level of structural knowledge has accumulated for protein kinases over the past decade (for various reviews see [2–9]). The minimal kinase domain consists of approximately 300 amino acids: for example, Cyclin-Dependent Kinase 2 (CDK2) has 298 residues (Fig. 1) [10, 11]. The kinase domain can be further divided into two lobes or sub-domains, N-terminal and C-terminal, with the lobes connected via a five to six residue "hinge" or "linker" region. The N-terminal lobe is the smaller subunit and consists of approximately 80 amino acids. It is primarily beta sheet in nature but does contain a significant alpha helix segment, termed the C-helix
Fig. 1 Stereo ribbon diagram of human CDK2 with bound ATP (1 HCK.PDB). Structural elements are colored as follows, glycinerich loop in dark blue, N-terminal beta sheet in cyan, the C-helix in yellow, linker or hinge region in magenta, activation loop in green and C-terminal lobe helices in red.
2 Structure and Function
(nomenclature derives largely from the initial Cyclic AMP-Dependent Kinase, PKA, structure [9, 12]). By contrast the C-terminal lobe, in addition to being substantially larger, is largely α-helical in nature. A binding cleft is formed between the two lobes, and ATP docks into this region. Hydrogen bonds are formed between the N6 exocyclic amino group of ATP and a backbone carbonyl oxygen of the hinge region noted above, as well as between the N1 of ATP and a backbone NH, again from the hinge region. Note that, in addition to these two groups, the hinge region also projects a second backbone carbonyl oxygen into the binding site in the region of adenine H2. This functionality is significant for synthetic ligand-hydrogen bonding interactions as discussed later. Beyond hydrogen bonds to the adenine ring, the ribose hydroxyls of ATP are often observed to show hydrogen bonds to a polar residue (Asp 86 of CDK2 [10, 11], Asp 1083 of insulin receptor kinase [13], and equivalent residues) that resides at the beginning of the C-terminal lobe as defined. The adenine and ribose rings are also sandwiched by a series of hydrophobic residues from both the N- and C-terminal lobes. In particular for CDK2, residues Val-18, Ala-31, Val-64, and Leu-134 have van der Waals contacts with ATP as do equivalent residues of other kinases. An additional residue, Phe-80 for CDK2, is also significant and contacts ATP in the vicinity of C5 and the N6 exocyclic amino group and has been termed the gatekeeper. This residue shows high variability across kinases. As brief examples, it is Met for insulin receptor kinase (IRK), Thr for p38, Gin for ERK2, and Val for VEGFR2 [14–19]. The gatekeeper nomenclature derives from the fact that the residue straddles both the ATP binding site and an additional region deep within the kinase that small molecule inhibitors often utilize (see Sections 2.3.1 and 4.2.2). The final component of ATP, the triphosphate group, projects as would be expected into a highly hydrophilic and open region framed by a set of strictly conserved residues involved in catalysis. It should be noted at this point that although the small molecule inhibitors designed for protein kinases are predominantly ATP competitive, they commonly occupy regions of substantial volume in addition to where ATP would normally bind. As a result, the "ligand" binding site, when referring to inhibitors, is not synonymous with the ATP binding site. 2.2 Catalysis and Substrate Binding
Crystallographic studies, in particular with non-hydrolyzable ATP and transition state analogues as well as substrate analogues, have provided a detailed understanding of the likely nature and mechanism of substrate binding and catalysis [13, 20–24]. As noted, the triphosphate group interacts with conserved residues of the protein kinase. More specifically, the α and β phosphates are oriented via a salt bridge with a lysine (Lys–33 of CDK2) and the lysine itself is further fixed in an appropriate location for catalysis by a conserved glutamic acid residue (Glu-51) residing on the C-helix. Asp-145 and Asn-132 of CDK2 and equivalent residues complete a coordination sphere between the phosphate groups and accompanying magnesium counterions.
3
4 Protein Kinase Inhibitors
Peptide substrate then docks onto the protein kinase, in general presumably occupying a cleft along the C-terminal lobe, as exemplified by the peptide inhibitor in the PKA-AMPPNP-PKI and IRK–ATP structures. Catalysis appears to be via a dissociative transition state mechanism and a planar phosphate intermediate [20, 22]. The incoming peptide hydroxyl is oriented via Asp-127, which is in turn further stabilized via a hydrogen bond to Asn-132. These latter two residues that are part of the signature-catalytic loop DxxxxN motif that designates proteins as potential protein kinases. 2.3 Regulation and Conformational Flexibility
Protein kinases are highly regulated owing to their central role in signal transduction [3, 5, 25]. While a full description of regulatory mechanisms is outside the remit of this review, it is nonetheless important to consider in the context of how it affects ligand binding. More specifically, regulatory control of protein kinases often involves significant conformational changes in structure that alter the shape of the binding site afforded to a small molecule inhibitor. Four mechanisms of regulation and their conformational effects will be mentioned: activation loop conformation, glycine-rich loop conformation, C-helix position and lobe orientation. 2.3.1 Activation Loop Conformation The original unliganded IRK structure [14] revealed a protein in which the ATP binding site was occluded by residues from the so called "activation loop" or "T loop". This C-terminal extended region is present in all protein kinase structures to date and, as defined here, begins with a well conserved Asp-Phe-Gly sequence motif (DFG starting with Asp-145 of CDK2 and Asp-1150 of IRK) and extends until just upstream of a consensus Ala/Thr-Pro-Asp/Glu triplet. The loop contains one or more phosphorylation sites, i.e., it is a substrate for another protein kinase or autophosphorylation via protein dimerization. It is useful to consider the activation loop as having a range of conformations, in equilibrium, with two relative extremes. One end of the equilibrium is as seen in the IRK-AMPPNP structure [13] and represents the catalytically active conformation. Specifically, the aspartic acid of the DFG motif is strictly conserved and involved in catalysis (Section 2.2), and so must be properly positioned to coordinate magnesium ions. The adjacent phenylalanine occupies a position adjacent to the C-helix and packs against a series of hydrophobic residues from other regions of the protein. The ATP site is unhindered by protein such that ATP can freely dock and form suitable hydrogen bonds to the linker strand with the γ -phosphate fully accessible. By contrast, at the other end of the equilibrium, a conformation is observed as represented by the initial IRK structure. The phenylalanine of the DFG motif is translocated by approximately 10 Å to roughly a position where, in an ATP -iganded structure, the ribose would reside.
2 Structure and Function
Scheme 1 Arrows indicate the hydrogen bonding pattern of the inhibitor to the kinase backbone determined by X-ray crystallography.
Additionally, the aspartic acid residue conformation is altered via rotation of the peptide backbone ζ angle. Phosphorylation of the activation loop appears to drive the equilibrium towards a catalytically active conformation due to the ability of the phosphorylated residue to form stabilizing salt bridges with a series of lysine or arginine residues. However, it should be noted that, based on the evidence to date, the activation loop can occupy a range of conformations including one close to an active conformation in the absence of phosphorylation and an inactive one despite phosphorylation. In short, it is a true equilibrium with energetics and relative populations of different forms varying from kinase to kinase depending on a range of factors including phosphorylation, additional protein and substrate binding, and internal energy of the conformation. The significance for ligand binding, particularly for small molecule inhibitors, is that the nature of the ATP pocket and adjacent regions is significantly altered. For example, STI–571 1 (Gleevec, Scheme 1), as described in more detail later, binds solely to an inactive type conformation [26, 27] with a portion of the ligand residing where the phenylalanine of the Abl DFG triad would reside in an active conformation. Inactive conformations have now been observed for a range of both serine/threonine and tyrosine kinases including Abl, IRK, p38, Tie2, and Twitchin kinase [14, 26–29] indicating that the phenomena is likely widespread across kinases, and so again emphasizing the importance to ligand design. One final point to note is that the region immediately upstream of the DFG motif is relatively constant to date across X-ray structures. In particular, the carbonyl oxygen of the residue preceding the aspartic acid is typically involved in a buried hydrogen bond to either a tyrosine phenol or histidine epsilon nitrogen. This hydrogen bond holds the carbonyl group fixed, and consequently, the backbone NH of the aspartic acid is also fixed via the rigid peptide bond. The major movement of the loop is, as a result, initiated via rotation around the ζ torsion of the aspartic acid. 2.3.2 Glycine Rich Loop The glycine-rich loop (G-rich loop) is defined by the GxGxxG motif that is highly prevalent in protein kinases and occurs in the first 10–20 residues of the kinase
5
6 Protein Kinase Inhibitors
domain. Other residues within this region, e.g., Thr-14 and Tyr-15 of CDK2, may be targets for phosphorylation and consequential inhibition of kinase activity. With respect to inhibitor ligand design, the important feature of this region is its inherent flexibility. In particular, ligand structures such as FGFR-SU5402 2 [30] and Abl-Gleevec 1 (Scheme 1) [26, 27] demonstrate the ability of the loop to mold or collapse onto the ligand to form favorable contacts, particularly with the aromatic residue that typically precedes the final glycine. It is difficult to determine based on structural evidence to date, but one can speculate that there is less likelihood of seeing a recurrent pattern for how the loop adjusts in conformation, as opposed to the activation loop conformational changes where patterns are clearly present.
2.3.3 C-Helix Orientation Recall that the C-helix contains a strictly conserved glutamic acid residue. It is apparent from multiple structures of different protein kinases that modulation of the conformation of this helix is a common regulatory theme. Additionally, there are multiple ways of regulating C-helix conformation, and as with other modes of regulation, ligand binding is affected. For CDK2 [10, 11], the C-helix is properly positioned by the binding of cyclin proteins to give a tertiary complex, upon ATP binding, that is catalytically active. In the absence of these proteins, the C-helix rotates on its axis, projecting the glutamic acid more towards the solvent, and translates away from the catalytic lysine. This conformational change creates additional space in a region adjacent to the ATP binding site that a small molecule inhibitor can occupy without steric interference. Similar patterns of C-helix displacement are also observed in Hck and Src Kinase X-ray structures [6, 31–35], but the degree of change is distinct. For Src and Hck, C-helix conformation is not modulated via interactions with cyclins but instead through phosphorylation and/or interaction with other protein domains. Another mechanism of modulating the C-helix conformation is employed in common for the so called AGC family of kinases: PKA, PKB, PRK, SGK1 and MSK1 are among the members of this family. These kinases utilize a hydrophobic motif (FxxF) beyond the kinase C-terminus to pack adjacent to the C-helix and lock it into an active conformation [12, 36–40]. For proper packing to occur, the evidence to date is that an acid functionality must be immediately adjacent and downstream of the second phenylalanine of the hydrophobic motif. This acidic group can be in the form of either a phosphorylated residue (e.g., PKB/SGK1), an aspartic acid (PRK2), or a C-terminus-free carboxylate (PKA). Crystallographic studies [38–40] for PKA and mutated PKB (S474D) show the acidic functionality to form a salt bridge or hydrogen bonds with residues from the N-terminal lobe [41] This interaction stabilizes the C-helix location There are two key features for ligand design considerations. First, as with the other mechanisms described above, the nature of the ligand binding site in the region of the C-helix is affected by the regulatory state However there is also a second feature unique to AGC family members. In addition to the hydrophobic motif they also have other conserved sequence motifs, which based on PKA and
2 Structure and Function
recent PKB structures impact the shape of the front of the ATP binding site in the vicinity of C2 of ATP Namely, a phenylalanine (PKA 327 PKB 439) that is upstream of the hydrophobic motif projects into the front of the ATP site stacking over a glycine-glycine step that is at the end of the hinge region and forms an aromatic herringbone type interaction with the C2/H2 of the adenine ring conformation [36] [37–41]. In other kinases this region is fully accessible, and commonly occupied by inhibitors, so a substantially different binding profile is thus possible for AGC family members, relative to other protein kinases.
2.3.4 Lobe Orientation An additional feature of protein kinase structure and function that impacts ligand binding is the orientation of the N-terminal lobe relative to the C-terminal lobe. Catalysis is, as noted, dependent on residues from both lobes, and as such, the proper spatial orientation between the domains is required for effective catalysis. In the absence of ATP, substrate, or other needed co-factors and adapter proteins, the domains are observed to orient differently via conformational changes in the hinge region as well as a few other confined sequence elements (e.g., an extended region just downstream of the C-helix). The net effect is to alter the shape of the ATP binding cleft and, consequently, the shape of the synthetic ligand needed for inhibition. This effect is perhaps observed to be most pronounced to date for ERK2 and p38 MAPK kinases [16–19, 42]. For p38, the binding site is substantially more open, relative to, e.g., CDK2 or IRK, and this feature as a result alters the profile for the type of ligand needed to inhibit p38 [43].
2.3.5 Solvent Channel The lysine–glutamic acid salt bridge has been described above, but one aspect that has not been emphasized is that this interaction occurs relatively deep in the binding site, in the vicinity of the N7/C8 positions of ATP. Additionally, the backbone NH of the aspartic acid that is part of the DFG motif also projects into the sterically accessible space adjacent to N7 of ATP. These two consistent features across protein kinases (Lys–Glu salt bridge and Asp NH) create a polar region buried within the ATP binding site. The hydrophilicity of this site can be further enhanced by the presence of polar amino acids in this region, for example at the gatekeeper residue or the residue immediately preceding the DFG motif. The net result is to favor and stabilize the presence of a solvent channel of two to three water molecules deep within the ATP binding site. The precise positioning of the water molecules is modulated by the types of amino acids in this region, and the number of water molecules can be expanded significantly for catalytically inactive conformations. Additionally, sterically bulky residues such as Phe-80 of CDK2 may block the presence of some of the water molecules. Inhibitors may either interact favorably with the solvent located in this region or displace it in part or in whole. An increasingly common motif is to target the Asp NH as will be discussed. Regardless of whether the solvent is to be displaced or retained, it is a key aspect of inhibitor design.
7
8 Protein Kinase Inhibitors
3 Crystallization 3.1 Defining the Construct
One of the most challenging aspects of obtaining a kinase crystal structure is designing the initial constructs. Recent advances in high-throughput cloning techniques, affinity purification tags and crystallization robots have however allowed for multiple constructs to be tested rapidly in parallel. Initial constructs are often designed based on previously solved crystal structures and usually contain only the catalytic kinase core. However, many kinases are multidomain proteins and inclusion of additional domains can increase both protein stability and solubility. The minimal kinase domain typically contains five β-strands and one α-helix in the N-terminal lobe and 7–8 α-helices in the C-terminal lobe. In tyrosine kinases, the N-terminal lobe usually starts near a highly conserved tryptophan, 14 residues upstream of the GxGxxG motif. The start of the N-terminus is less well conserved in serine/threonine kinases. Limited proteolysis has proven useful for identifying the boundaries of the Nand/or C-terminal lobes as well as identifying additional segments or domains that may provide stability. Typically the full length protein is first expressed and purified. The protein is then digested with a panel of proteases, both in the presence and absence of known ligands or inhibitors, and the digests are analyzed by SDSPAGE. The N- and C-terminus of the stable fragments are identified by N-terminal sequencing and mass spectrometry. In the case of EphB2 and TβR-1, an N-terminal juxtamembrane or regulatory domain was included in the crystallography construct in order to obtain data quality crystals [44, 45]. Regions C-terminal to the kinase domain were included in the CaM and Twitchin kinase constructs and were found to be pseudosubstrates in the crystal structure [46, 47]. The majority of kinases solved to date have been expressed using a baculovirus expression system in insect cells. Only a few kinases, including p38, ERK2, PAK and EphB2, have been successfully expressed in E. coli [17, 19, 45, 48]. Constructs typically contain an affinity tag to aid in the purification. Both His6 and cleavable GST tags [45, 48, 49] have been used successfully. Newer expression systems, inR R system, Gibco/Life Technologies Gateway system cluding the Invitrogen Echo or the Novagen pTriEx–1 cloning system, allow one to test multiple affinity tags in both E. coli rapidly and insect cells with minimal subcloning [50]. To improve the stability and/or solubility of the protein during expression and purification, peptide substrates, binding partners and ligands have been added. The structures of many cyclin-dependent kinases (CDKs) have been solved by co-expression and purification with protein activators and inhibitors. The structure of CDK5 was solved by co-expression with the protein activator p25 [51]. The structure of CDK6 was solved by reconstitution of a CDK6/cyclinD/pl8INK4c complex in vitro during the purification [52]. The solubility and stability of GSK3, during purification and crystallization, was significantly improved by addition
3 Crystallization
of a 39 amino acid peptide derived from the C-terminus of its binding partner FRAT1 [53]. 3.2 Mutagenesis
In some cases, mutation of surface or active site residues has been required to improve stability and solubility. Mutation of surface cysteines prevented the formation of disulfide-linked aggregates in FGFR1 [54]. Mutation of highly conserved, active site residues, to create a kinase-dead mutant, was required for a number of kinases whose over-expression proved to be toxic to cells. The conserved aspartic acid in the DFG motif was mutated to an asparagine in CDK5 while the conserved lysine in PAR was mutated to an arginine [48, 51]. Removal of flexible loops within the kinase domain has also been required for some kinases. Many receptor tyrosine kinases contain an additional domain within their C-terminal lobe, referred to as the kinase-insert domain. Removal of this highly charged domain was required to obtain data quality crystals of VEGFR2 [15]. Some proteins have not crystallized despite considerable effort. In such cases, mutation of a closely related kinase to resemble the kinase of interest has provided valuable structural information. Ikuta and co-workers mutated three residues within the ATP binding pocket of CDR2 to the corresponding residues in CDK4. The resulting mutant protein was crystallized and used to design CDK4-selective inhibitors [55]. 3.3 Phosphorylation
Heterogeneous phosphorylation is often a problem when kinases are expressed in insect cells. Multiple approaches have been used to solve this problem. Proteins have been completely dephosphorylated by incubation with λ protein phosphatase or alkaline phosphatase [38, 39, 56]. Ion exchange and isoelectric focusing chromatography have been used to separate proteins with multiple phosphorylation states. An γ -aminophenyl ATP-sepharose column was used to separate different phosphory-lated states of human c-Src [34]. Alternatively, serine/threonine or tyrosine phosphorylation sites can be mutated to alanine or phenylalanine, respectively [42]. For tyrosine kinases with multiple autophosphorylation sites, the active site aspartic acid can be mutated to an asparagine, creating a kinase dead mutant [57]. In many cases the active, phosphorylated form of the protein is preferred for structure-based drug design studies. Quantitative autophosphorylation has been achieved for many kinases including IRK, IGF-1 and VEGFR2 [13, 15, 49]. Residual ATP and ADP must be removed following phosphorylation to ensure homogeneity. Many serine/threonine kinases require phosphorylation by an upstream kinase for activation. This requires cloning the upstream kinase as well as additional purification steps after phosphorylation [38, 39, 58]. In some cases, the active form
9
10 Protein Kinase Inhibitors
of the protein can be mimicked by replacing phosphorylated serine or threonine residues with aspartic or glutamic acid [38, 39].
4 Inhibitor Design 4.1 Binding in ATP Cleft
The region of the ATP binding site is the current focus of kinase inhibitor drug discovery research. At first blush the pursuit of selective kinase inhibitors competitive with a substrate common to all kinases, ATP, and which is present at intracellular concentrations (2–5 mM) that are 10–1000-fold in excess of the Km , would appear to be a poor strategic approach. This belief was widely held into the mid-1990s at which time several reports emerged of kinase-selective ATP-competitive inhibitors. The structural basis for both binding in the ATP site and featares governing kinase selectivity was rapidly developed in subsequent X-ray crystallography studies and as discussed below provides a framework for current kinase discovery efforts. While many alternative binding sites exist which could provide the basis for non-ATP competitive approaches to kinase regulation, none are likely to prove as universally applicable as the compounds that bind into the ATP site. The primary attractiveness of the ATP site is that it affords a deep hydrophobic cleft for the adenine of ATP (or in some cases the guanine of GTP) that is common to all kinases and yet highly varied in shape. The commonality of binding elements provides inhibitor design features that can be applied to all kinases, whereas the exploitation of unique structural features provides the basis for achieving selectivity. At present there are >100 protein kinase-inhibitor structures in publicly available structural databases which span 28 kinases and a variety of inhibitor structural classes. Based upon an analysis of these data a classification of ATP binding regions has been proposed by Traxler and Furet [59] and Bower (personal communication from Michael J. Bower, December 1999). In the subsequent discussion a slightly modified version of this classification will be used to organize the trends seen across kinases and inhibitor classes (see Fig. 2). In the Traxler model, five sites were proposed of which three (adenine sugar and phosphate-binding sites) can be directly related to ATP and two additional lipophilic sites which lay outside of the region occupied by ATP. In the Bower model, an additional polar site on the surface of the protein was proposed. 4.1.1 ATP Binding Sites As previously described the purine binding site is well engineered to complement both the planar aromatic nature and hydrogen bonding pattern of adenine That the great majority of known inhibitors mimic these features highlights the essential role the purine binding site plays in the binding of both ATP and inhibitors The key features of a purine mimetic are hydrogen bonding functionality, which at a mini-
4 Inhibitor Design
Fig. 2 (A) Traxler/Bower Kinase inhibitor pharmacophore. The perspective given is a top down view from the N-terminal to Cterminal domain in which the plane of the adenine ring matches that of the page. A short segment of the kinase linker region is shown which includes the three hydrogen bonding residues (minus side chains) plus the gatekeeper residue. In this orienta-
tion the top of the graphic corresponds to the back of the pocket and the bottom of the figure to the front or solvent. HBA and HBD denote hydrogen-bond acceptor and donor, respectively The graphics were generated using WebLab ViewerLite from MSI and the indicated PDB files. (B) Complex of ATP, cyclin A and human CDK2 (PDB.l FIN).
mum mimics the hydrogen bond formed by N1 of adenine and a planar aromatic or heteroaromatic ring system (Fig 2B) Many inhibitors also contain a group which occupies the ribose-binding site and engages residues in the protein which either hydrogen bond to a ribose hydroxyl or to groups involved in the ligation of Mg and the α-phosphate of ATP. The isoquinoline sulfonamide 3 (Fig. 3A) is good example of an inhibitor which incorporates these binding elements [36]. While inhibitors must minimally contain a single hydrogen bond acceptor, more commonly seen are inhibitors which mimic the dual hydrogen bond donor–acceptor properties of adenine. Examples of donor–acceptor inhibitors (Scheme 2) are staurosporine 4 (NHCO of lactam) [60] and des-chloroflavopiridol 5 (carbonyl and phenolic hydroxyl) [61]. Also observed are alternate hydrogen bond donor–acceptor complexes in which the protein acceptor and donor atoms are located on the same amino acid residue (N+3 from the gatekeeper). Quercetin 6, whose phenolic oxygens bind to the kinase amide backbone, is an example of this alternative pattern [62] (Fig. 3B). Interestingly, in casein kinase 2 which utilizes GTP, as well as ATP, as the phosphate donor, this alternate hydrogen bond donor–acceptor pair has also been observed for the binding of GTP [63]. Finally, the structures of hymenialdisine 7 [64], indirbin-5-sulfonate 8 [65], NU6094 9 [66] and 10 (Fig. 5B) [67] in CDK2 provide examples of inhibitors that make all three possible hydrogen bonding contacts within the hinge region. The triphosphate binding site of kinases presents a conserved set of polar side chains to bind the phosphate portion of Mg–ATP and to position the γ -phosphate for transfer to the acceptor residue (Section 2.2). In general, the triphosphate site is largely unoccupied in the kinase-inhibitor complexes thus far described. A no-
11
12 Protein Kinase Inhibitors
Scheme 2 Arrows indicate the hydrogen bonding pattern of the inhibitor to the kinase backbone determined by X-ray crystallography.
table exception to this trend is balanol 11a, a potent polyphenolic natural product inhibitor of PKA and PKC. In the PKA structure, balanol occupies the entire ATP binding site, making extensive hydrogen bonding contacts to the glycine-rich loop and other polar phosphate-binding residues (Fig. 4A) [68]. A detailed investigation of synthetic balanol analogues revealed that the removal of polar groups which bound in the triphosphate pocket (11b–d, OH and C02 H) had no effect on PKA potency, although PKC inhibition was reduced by 10–1000-fold (Tab. 1) [69]. Also prepared was a 5 -deoxybalanol analogue, 11e, which lacks the phenolic group that hydrogen bonds the amide backbone in the hinge region and again this change had no effect on PKA inhibition. Using these balanol analogues and a number of the
Fig. 3 (A) Complex of the catalytic subunit of bovine PKA with 3, H-7, (PDB.l YRD). (B) Complex of quercetin, 6, with the Src family kinase, human HCK (PDB.2HCK).
4 Inhibitor Design
Fig. 4 (A) Complex of the catalytic subunit of activated murine PKA with balanol, 11a, (PDB.l BX6). (B) Complex of SB 203580, 12 in human p38 (PDB.1AU9).
isoquinoline sulfonamides, a theoretical model was developed to calculate ligandbinding affinities to PKA [70]. These calculations, which were able to reproduce most of the experimental data, found that the nonpolar binding interactions made the primary contribution to high-affinity binding. Hydrogen bonding interactions in general provided no net contribution to binding as the energetics of binding to the kinase were negated by the requirement to desolvate these polar groups. However, for the poorly solvated isoquinoline inhibitors, the single hydrogen bond made to the amide backbone in an otherwise lipophilic binding pocket does make a positive electrostatic contribution to binding. These conclusions, which are consistent with binding studies in other protein-small molecule systems, may provide
Table 1 Potency of balanol analogues prepared by Koide et al. against PKA and PKC [69]
13
14 Protein Kinase Inhibitors
Scheme 3
an explanation for the paucity of kinase inhibitors which make extensive contact in the highly polar triphosphate-binding pocket. 4.1.2 Gatekeeper-Dependent Binding Pocket The solution by Tong and co-workers of the co-crystal structure of SB 203580, 12, (Scheme 3) in p38, a serine/threonine kinase, provided the first structural data to explain how kinase selectivity could be achieved with an ATP-competitive inhibitor [71]. This structure revealed the binding of the fluorophenyl group into a pocket behind and orthogonal to the adenine of ATP. Residues Thr-106 and Lys-53 provide the most extensive contacts with the fluorophenyl group by forming the walls of the pocket above and below the plane of the aromatic ring (Fig. 4B). Recognizing that a limited number of kinases have side chains at position 106 the size of threonine or smaller, Tong proposed a potential role of Thr-106 in determining kinase selectivity. For example, the closely related ERK2, which has the larger glutamine side chain at position 106, is 1000-fold less sensitive to SB 220025, 13 (IC5O =20 µM) relative to p38 α [43]. The publication by Tong et al. was quickly followed by similar findings for several tyrosine kinases. Specifically, the co-crystal structures of PD 173074, 14, in FGFR114 (Fig. 5A) [72], PP1, 15, in Hck [32], PP2, 16, in Lck [73] and 15 in c-Src [74] all highlighted the significance of this newly discovered binding pocket. Further compelling evidence that a single amino acid residue, which is often termed the gatekeeper, as it controls access to this binding pocket, was the primary factor governing access to this pocket was provided by mutagenesis studies with p38γ [75], p38α [76] and c-Src [74]. Two important conclusions from these studies were: (1) that the introduction of a small gatekeeper residue into any protein kinase was sufficient to confer sensitivity to inhibitors that occupied the gatekeeper pocket, and (2) that kinase sensitivity to these inhibitors was closely related to the size of the gatekeeper residue, being greatest for glycine, followed by alanine and then serine and valine (Tab. 2). 4.1.3 Lipophilic Plug The lipophilic environment created by residues that form the adenine-binding pocket extends sufficiently beyond the opening of the pocket to create an additional
4 Inhibitor Design Table 2 The effect of site-specific mutagenesis of gatekeeper residues Met-106 (p38γ ) and
Thr-338 (Src) on inhibitors which occupy the gatekeeper specificity pocket [74, 75] p38γ /SAPK3–12
Src-15
Gatekeeper
IC50 (nM)
Gatekeeper
IC50 (nM)
Gly Ala Ser Thr Leu Gln Met (wildtype) Phe
30 10 50 300 45000 50 000 >100 000 >10 000
Gly Ala Ser (wildtype) Thr Val Ile Met Phe
5 5 400 100 100 5000 8000 8000
lipophilic binding area in front of the adenine. This pocket may not be accessible to inhibitors that bind into tightly closed adenine pockets. Inhibitor accessibility to this pocket may also be restricted for some kinase families. For example, the AGC family kinases (PKA, PKC, AKT) have a C-terminal tail that extends back into N-terminal domain and caps the adenine site with a phenylalanine (see Section 2.3). Inhibitor-kinase structures that bind an aryl ring in this site include 10 in CDK2 (Fig. 5B) [67], 9 in CDK2 [66], and 17 (Scheme 4) in p38 [77]. For these later two examples, the substitution of the inhibitor NH2 with NHPh, which places the aryl group in this lipophilic pocket, affords a 10-fold increase in potency. 4.1.4 Polar Surface Site The extension of the inhibitor beyond the lipophilic site that caps the adenine pocket has been successfully employed either to gain access to solvent (water) or to sites on the surface of the kinase. In general the inhibitor scaffolds that have been pursued as potential drug candidates are relatively flat lipophilic heteroaromatics, and the resulting compounds have very poor aqueous solubility. The attachment of a polar water-solubilizing group, typically a tertiary amine, to take advantage of this access to the solvent has been a key strategy for many kinase drug discovery efforts. Published examples for the anilinoquinazoline class of inhibitors are ZD1839 18, an EGFR inhibitor [78] and ZD6474 19, a VEGFR2 inhibitor [79]. While no crystal structures of these examples have been published, the recent publication of OSI774, 20, in complex with EGFR confirms the anticipated binding mode of these compounds [80]. The crystal structures of 14 (Fig. 5A) [72] and SU4984, 21, [30] in FGFR1 confirm that the water-solubilizing tertiary amines introduced onto these scaffolds project into the aqueous environment. As would be expected for functionality that remains water-solvated the introduction of these amines often has little impact on inhibitor potency. Conversely the observation of a significant effect on potency for functionality which extends beyond the lipophilic plug raises the expectation of a specific binding site on the protein surface. The hydrogen bonding interaction of an aryl sulfonamide with a
15
16 Protein Kinase Inhibitors
Scheme 4
backbone amide NH of Asp-86 appears to be the main driver for the > 100-fold enhancement in binding affinity seen for 10 (Fig. 5B) [67] and 24 (Fig. 6) [66]. 4.2 Conformational Considerations
The conservation of kinase structure and mechanism across the entire protein kinase family dictates a conservation of shape and electrostatics for the ATP binding site when ATP is bound to the catalytically active form of the kinase. However, no such commonality of features is required for the ATP binding site of kinases in their inactive state. In the last few years protein crystallography has provided many insights into the structural biology of protein kinase regulation. The picture that has emerged is complex. Multiple mechanisms have been described for regulation of kinase activity and the catalytic state of the protein is often the sum of competing mechanisms of activation and de-activation. The distortion of the kinase away from its active conformation is the common theme for these regulatory mechanisms. In many cases the conformational change required for activation is the direct result of a phosphorylation in the activation loop [81]. For the cyclin-dependent kinases an
4 Inhibitor Design
Fig. 5 (A) Structure of unphosphorylated FGFRl soaked with a solution of 14 (PDB 2FGI) (B) Structure of 10 in complex with inactive human CDK2 (PDB0.1 fvt).
Fig. 6 Structure-based design guided by iterative crystallography using activated cyclin A-bound Thr-160-phosphorylated CDK2.
additional event, the binding of an accessory protein, cyclins, is required to obtain an active conformation [82]. An in depth account of these mechanisms can be found in recent reviews by Huse and Kuriyan [5], Johnson and Lewis [83] and Engh and Bossemeyer [2]. The following sections consider the impact of conformational flexibility on inhibitor design. 4.2.1 Inhibitor-Induced Binding The structures of two potent PKA inhibitors, staurosporine 4 [84] and balanol 11a [68], reveal a protein binding surface complementary to that of the inhibitor. Noting the large protein-inhibitor contact surfaces in these structures, Engh and Bossemeyer investigated the relationship of buried surface area to inhibitor potency. The intuitive notion, that inhibitor potency should be related to binding area, can be theoretically derived as a logarithmic relationship between inhibitor potency and buried surface area [85]. When the data were plotted for PKA (log IC50 versus contact area), all of the compounds fell close to the theoretical line. One remarkable finding is that this correlation held for compounds whose binding induced a significant conformational change in PKA, for example 4 and 11a. Staurosporine is
17
18 Protein Kinase Inhibitors
a particularly instructive example as it is a potent inhibitor of many kinases and in all cases induces a protein conformation complementary to itself. The most likely explanation is that the induced conformations are energetically easily accessible, an assumption well-supported by structural studies which have examined the activation mechanisms of kinases. Several authors noting the tight fit of staurosporine have made reference to the phenomena of hydrophobic collapse. This picture, that the kinase envelopes and molds to the shape of inhibitor, fits well with the conclusions of Hunenberger et al. in which they noted that hydrophobics and not hydrogen bonding was the chief driver of inhibitor affinity [70]. When Engh expanded the analysis to include all the available kinase-inhibitor structures, the fit was poorer. Outliers included STI-571 1, which has the greatest buried surface area, but is only moderately potent and a number of compounds that bound more potently than would be expected based upon their surface contacts. An explanation for the relatively poor binding affinity of 1 is suggested from its structure as determined in the un-activated form of Abl [27]. This structure reveals that binding of the inhibitor forces the activation loop into a catalytically inactive conformation which appears to be energetically unfavorable relative to other active conformations. This conclusion was supported by inhibition studies demonstrating 1 to be a potent inhibitor of the un-activated kinase (K I =37 nM), but a weak inhibitor of the fully activated Tyr-393 phosphorylated Abl (K I =7000 nM). Moreover, the insensitivity of the fully activated kinase was predicted by modeling which suggested that Tyr-393 phosphorylation would favor the catalytically active conformation of the activation loop. That inhibitor binding can force the kinase into higher energy conformations is a further consequence of conformational plasticity that promises to play an important role in the design of selective kinase inhibitors. Finally, what about those inhibitors that bind better than predicted based upon an analysis of buried surface area? Examples given by Engh and Bossemeyer include CDK2-22, Hck-15, Lck-16 and p38-13 [2]. This tighter binding could reflect a higher degree of inhibitor-enzyme complementation. The structures of 13 and related pyridinylimidazoles in p38 offer support for this explanation. Unlike many of the structures discussed thus far, the binding of pyridinylimidazoles appears to induce very little change in p38 over that of the catalytically inactive apo-structure [43]. 4.2.2 What is the Most Appropriate Enzyme Form for Crystallography? The most pervasive mechanism for the regulation of kinases is that effected by other kinases and phosphatases that add and remove phosphate at multiple sites. Another common mechanism by which kinase activity and substrate binding (ATP and acceptor protein) are regulated is through the binding to proteins that lie both upstream and downstream to the kinase transmitting the signal. In most of these examples, the regulation of kinase activity is probably effected by conformational changes that either favor or disfavor the attainment of the catalytically active ATPbound conformation. In principle, consideration of these inactive conformations in designing an inhibitor is not required, if the sole mechanism of the inhibitor is to compete with ATP for the binding to the catalytically active form of the kinase. In practice, it is not that simple, as there are several examples known in
4 Inhibitor Design
which the inhibitors bind equally well to both the un-activated and activated kinase. Also, unfortunately in most cases it has proven much more difficult to purify and crystallize the active protein. Hence, many of the inhibitor-kinase structures have been solved with an inactive form of the kinase. Recently Davies and colleagues have reported their studies on the design of CDK2 inhibitors. Whereas previous structure-based design efforts had used the structure of inactive monomeric CDK2, prior to initiating their design efforts, the Davies group undertook a comparative study of both the inactive and activated cyclin Abound Thr-160-phosphorylated structures of CDK2 [86]. This comparison revealed two major changes which might affect inhibitor binding. They also solved the structures of an inhibitor, indirubin-5-sulfonate, bound to both kinase forms and performed calculations to quantify the effect of structural differences between these two forms on inhibitor binding. Having established an empirical basis for using the activated kinase as a tool in structure-based design, in a subsequent publication the authors presented a successful example of structure-based inhibitor design based upon the activated CDK2 structure [66]. Starting with a 12 µM inhibitor, NU2058 23, α >1000-fold improvement in potency was achieved for the optimized compound, NU6102 24 (Fig. 6). While the above example supports activated kinases as the most relevant form for structure-based design, the behavior of STI-571 1, which binds some 200-fold more potently to the un-activated form of Abl, suggests that inactive kinase structures may also have value [27]. The Abl-1 structure revealed that inhibitor binding locks the activation loop in an unusual inactive conformation opening up a new pocket deeper in the ATP site. Potential advantages of inhibitors that reform the ATP binding site are enhanced selectivity and the ability to lock the kinase into a nonactivatable conformation. However, these advantages are not fully realized for 1, as it is not uniquely selective (Abl, c-Kit and PDGFR are all equally inhibited) [87] and there are no data demonstrating that 1 inhibits activation. Another potential advantage is that the inhibitor may not have to compete with ATP for binding to the un-activated kinase. Given the high intracellular concentration of ATP, the cellular potency of inhibitors that are strictly competitive with ATP should be greatly attenuated, and as such binding of the inhibitor to an un-activated enzyme form with low ATP affinity might circumvent the disadvantage inherent to ATP-competitive inhibitors [88]. Recent Abl structural studies comparing the binding of 1 to PD 173955, 25, suggest that for this example the unique binding conformation may come at a high cost [26]. PD 173955, which has a reduced contact surface with the enzyme, does not bind in the pocket formed by movement of the activation loop. However, inhibition studies demonstrate that 25 inhibits both the un-activated and activated kinase equally well. The explanation proposed by the authors is that the larger binding surface of 1 is required to pay the penalty for binding to the novel inactive conformation. In contrast 25, which does not bind to this novel activation loop conformer, is an equipotent inhibitor of both un-activated and activated Abl. The authors propose that the differences in cellular potency of these two inhibitors (25 is some 10-fold more potent in a cellular assay) may be a reflection of their
19
20 Protein Kinase Inhibitors
Scheme 5
differing binding modes and conclude an advantage for the inhibitor which can bind to multiple enzyme forms (Scheme 5). The rationale of using the activated CDK2 structure to design 24 are clear. However, these studies leave unanswered the question as to whether this structure provided a result superior to that based upon the inactive kinase. Indeed, many groups have used the inactive form of CDK2 to successfully aid their lead optimization efforts. The Abl example illustrates how inactive kinase structures can lead to unique design possibilities. To reiterate, there are several mitigating factors that make any blanket statement on choice of enzyme form unwise. As will be outlined in the subsequent section, much progress has been made in kinase design irrespective of the activation state, a result which suggests that any structure is better than none. 4.2.3 Homology Models and Surrogate Kinases The conservation of tertiary structure and mechanism, which complicates the discovery of highly selective kinase inhibitors, provides for a portability of learning from one kinase to the next. Furthermore, the more closely two kinases are related, the more likely the information can be applied to the second kinase. One way in which structural knowledge is extrapolated from one kinase to the next is the process of homology modeling [89]. To construct a homology model the backbone tertiary structure of a known kinase is used as the template upon which the side chain residues of the target kinase are attached. Using various minimization algorithms, the homology model is refined and thereby provides
4 Inhibitor Design
a starting point for inhibitor docking or design. As discussed elsewhere in this chapter, homology modeling is a routinely used tool that can provide valuable insights. The application of inhibitor-binding modes from one kinase to the next is another example of the portability of structural information. The universal commonality of an adenine mimetic in inhibitor templates, and the corresponding engagement of common backbone-binding elements in the kinase provides a powerful rationale for this process. Structural proof of a common binding mode in PKA, c-Src, Lck, CDK2 and Chkl has been demonstrated for staurosporine, which potently inhibits all of these kinases [31, 73, 84, 90, 91]. Structural confirmation of a common binding mode for 15 and 16 to several members of the Src family of cytosolic tyrosine kinases has also been obtained [32, 73, 74]. In the above examples it can be argued that the unique structure of staurosporine or the close similarity of Src family members predisposes these systems to exhibit a common binding mode. However, even in highly unfavorable cases a "conservation" of binding modes has been observed. The anilinoquinazoline 26, which has a sub-nanomolar IC50 against EGFR, weakly inhibits p38 (IC50 =6.3 µM) and is all but inactive against CDK2 (IC5O =250 µM). Nonetheless, the binding mode observed for 27 in CDK2 (ICso-1 µM) and 28 in p38 (IC50 =5 µM) [42], is the same as that recently observed for the related anilinoquinazoline, 20, in EGFR [80]. There are several ways in which one may want to exploit the conservation of binding mode to advance inhibitor design. Two very common situations are the lack of protein crystallography for the target kinase at the initial stages of a new effort and the failure to obtain protein suitable for crystallography. To illustrate the later problem, despite the continued efforts of several groups to purify and crystallize Raf and MEK, the kinases upstream of ERK, no structures have been forthcoming On the other hand, multiple groups have succeeded in crystallizing CDK2, p38 and members of the Src family. In our own labs when the above situations arise we routinely use the binding mode obtained from the more structurally tractable kinase as a "surrogate" for the target kinase. Ikuta et al. have published a clever application of the surrogate kinase concept to assist in the design of CDK4-selective inhibitors Their approach was to design and crystallize a CDK2 mutant containing a region of the CDK4 ATP binding site in place of that of CDK2 This CDK4 mimic, which retained activity and demonstrated CDK4-like inhibitor selectivity, was used to design an inhibitor 29 that demonstrated high selectivity for CDK4 versus CDK2. The crystal structure of the CDK4 mimic-29 complex was consistent with their design which identified a single amino acid as the primary determinate for inhibitor selectivity [55]. Ideally, as in the case above, one would like to have available a surrogate kinase closely related to the target of interest, as this will make it more likely that the inhibitor will bind to the chosen surrogate and that information obtained will be more readily applied to the target. However, this is not a requirement. When faced with a new structural class of unknown binding mode, the solution of the first structure in any kinase is of great value. The process is not foolproof and as with any other co-crystal structure the utility of the information is dependent upon its predictive value.
21
22 Protein Kinase Inhibitors
4.3 Paradigms for Kinase Drug Discovery
One gauge of the impact of protein crystallography on kinase inhibitor design is the increasing frequency of its use. The majority of recent publications employ molecular modeling and report the use of protein kinase-inhibitor structures in the design process. Further evidence of the growing importance of crystallography is the increasing frequency of new kinase structures. In 2002 the list of newly solved structures included EFGR [80], AKT2 [38, 39], MAPKAPK2 [92], Aurora A [93], EphA2, and FAR [94]. The following discussion illustrates how this information is being applied to inhibitor design. 4.3.1 High Throughput Screening Where to start for the design of a kinase inhibitor? Does one follow the example set for proteases using a knowledge of mechanism and substrate to design inhibitors and then use iterative cycles of crystallography and design? For now and into the foreseeable future the answer is no. Instead, with rare exception, the starting point for kinase design has been a high throughput screening (HTS) lead. As has been true for other drug targets, natural products have proven to be a rich source of kinase inhibitor scaffolds. The solution of kinase crystal structures with natural products, such as staurosporine 4, L868276 5, quercetin 6, hymenaldisene 7, and balanol 11a (Scheme 2 and Tab. 1) have provided an understanding of binding in the ATP site which underpins much of the current design efforts. Furthermore, modifications of these natural products has led to the discovery of compounds which are being studied in clinical trials for the treatment of cancer (flavopyridol, 30) [95] and diabetic retenopathy (LY33351, 31) [96]. The screening of large compound libraries has also provided a rich source of new inhibitor scaffolds. An unusually successfully example is PTK 787 32, a KDR inhibitor screening lead which has advanced into clinical trials [97]. Major inhibitor classes found as kinase leads using HTS that have led to clinical candidates are the quinazolines [98], oxindoles [99], anilinopyrimidines [100] and the recently discovered diaryl ureas [29]. 4.3.2 Structure-Based Design In a 1995 publication Furet et al. at Ciba Geigy used the available PKA structures to predict the binding interactions of PKA with staurosporine [101]. This prediction, which was subsequently confirmed with a co-crystal structure, stands as an early illustration of how crystallography would guide the discovery of kinase inhibitors. With the increasing success of crystallographers, many inhibitor design efforts for a novel kinase can now begin with a similar level of knowledge. In the absence of such information, inhibitor-docking experiments using a homology model or alternatively, crystallography data from a surrogate kinase can be used to guide inhibitor design efforts. The following section details several examples illustrating the current status of structure-based design. For more comprehensive reviews on inhibitor design see Traxler and Furet [59], Toledo and Lydon [60], Scapin [102], and Garcia-Echeverria et al. [103].
4 Inhibitor Design
Fig. 7 The structure-based design of new inhibitor templates based upon a model of staurosporine bound to PKA.
The staurosporine 4 binding model developed by Furet et al. correctly identified the lactam amide as the donor–acceptor that bound to the linker backbone [101]. Furthermore, the authors hypothesized the indole rings to occupy the sugar and back hydrophobic pocket (referred to as the sugar and gatekeeper pockets in Fig. 2A). This model led the group to design less rigid analogues which retained a cyclic imide to maintain the essential backbone-binding element identified in staurosporine (Fig. 7). While the bisindolyl maleimides, 33, maintained the potent PKC inhibition seen with 4 [104], the dianilinophthalimides, 34, proved to favor inhibition of EGFR kinase [105]. The success of these new templates provided support for the original modeling and design hypothesis, which at this time had yet to be structurally verified, and furthermore demonstrated the possibility of achieving more selective kinase inhibition with an ATP-competitive inhibitor. Docking studies using the dianilinophthalimides in a homology model of the EGFR kinase led to the development of a model similar to that previously proposed for 4. Based upon this model, a sulfur-aromatic interaction of a meta-halo aryl group of the inhibitor with Cys-773 present in the ribose-binding pocket was proposed as a key-binding element [106]. Furthermore, since Cys-733 is uniquely present at this position in the EGFR family, this interaction could explain the enhanced selectivity of these compounds for EGFR kinase. Continuing their focus on EGFR, which had been identified as a therapeutically attractive target, the Novartis (Ciba-Geigy) group pursued two more related series, the pyrazolopyrimidines, 35 [106] and the pyrrolopyrimidines, 36 [107]. In both series a preference for a metahalo aryl group was demonstrated for the ring which was presumed by modeling to fit in the lipophilic pocket adjacent to Cys-773 (Fig. 8). Independently, 4-anilinoquinazolines were being pursued as EGFR inhibitors at Parke-Davis [108] and Zeneca [109]. The obvious structural similarities between the Novartis series and the quinazolines extended to the SAR, which indicated in both series a preference for a meta-halo group on one of the aryls. These similarities led the Parke-Davis group to propose that a common binding mode for 26 and 36 would be expected (Fig. 8). However, docking studies on the EGFR homology model developed at Parke-Davis led this group to propose an alternative to the Novartis model, in which the meta-halo aryl group was positioned in an inner lipophilic pocket next to the gatekeeper. In a comparison of both docking models
23
24 Protein Kinase Inhibitors
Fig. 8 The pyrrolopyrimidine- and anilinoquinazoline-EGFR binding models proposed by researchers at Novartis and Parke-Davis.
against all the Novartis data, the Parke-Davis model was better able to fit all the data [110]. A recent X-ray crystal structure of EGFR with anilinoquinazoline 20 matched the Parke-Davis model [80]. The successful design of CI-1033, 37, a selective inhibitor which binds irreversibly to EGFR through reaction of the ary-lamide with Cys-773, provides additional evidence to indicate that the Parke-Davis model will hold for other anilinoquinazolines and closely related scaffolds [98, 111]. However, further extrapolation of this result is premature. Changes in template substitution and differing interactions between kinases have been shown to affect the binding orientation of kinase inhibitor templates [43, 112]. It is important to note that in the 3–5 year period prior to crystallographic confirmation of these models, they were used successfully by several research groups to design new templates and to advance lead optimization programs. The modeling predictions of Trumpp-Kallmeyer at Parke-Davis stand out in this regard [113]. In a publication outlining the development of a binding model for two series of 6-aryl pyridopyrimidines 38 (Fig. 9), the authors present in detail the methodology and rationale used to develop the model. Because these new series potently inhibited FGFR and c-Src but also displayed significant activity against PDGFR and EGFR, homology models were built and docking studies performed for all four kinases. The analysis of docking studies were guided by the SAR and rationalize the selectivity for inhibition of the these four tyrosine kinases. Their conclusion of a 4-anilinoquinazoline-like binding mode for 38 (and related series) was verified in a crystal structure of 14 in FGFR1 [72]. Furthermore, the more general conclusion on the importance of the gatekeeper (Val-561 of FGFR1) as a key specificity residue for all kinase classes is now widely recognized. An excellent example illustrating how structural knowledge guides design is the transformation of the 6-aryl-pyridopyrimidones 38 from a tyrosine kinase inhibitorspecific template to favor inhibition of the cyclin-dependent kinases (Fig. 9) [114]. Docking of the 38 into CDK2, using the same binding mode developed for the tyrosine kinases, was not possible as access to the gatekeeper-dependent binding pocket was blocked by the large phenylalanine residue which is present in all
4 Inhibitor Design
Fig. 9 The transformation of a tyrosine kinase-selective inhibitor 38 into a CDK-selective template (39 and 40) was achieved by adjusting the template substitution pattern.
the cyclin kinases. The authors concluded this to be the primary reason these compounds were not cyclin-dependent kinase inhibitors, and if true, then pyridopyrimidones lacking this 6-aryl group should gain affinity for the cyclin kinases while losing affinity for tyrosine kinases. Pyridopyrimidines, 39 and 40, having hydrogen in place of aryl at the 6-position were synthesized and proved to be potent cyclin kinase inhibitors. Concurrent with these modeling efforts, several X-ray crystallographic studies were published for inhibitor-kinase complexes containing selective inhibitors. These included p38 [18, 43, 71], CDK2 [61, 82, 90, 115], several of the Src kinases [31, 32, 73, 74] and FGFR [30, 72]. The consistent theme of inhibitor-protein interactions (outlined in Sections 4.1/4.2) elucidated by these studies have provided the foundation for subsequent inhibitor design. Much of this design has been the reworking and optimization of templates discovered in screening, such as quinazolines, purines, diaryl imidazoles. Exemplary of this work are a structure-guided iterative design cycle described by Davies et al. for the optimization of 24, a puri-nyl inhibitor of CDK2 [116], and the design of CI-1033 37, an anilinoquinazoline containing a warhead acrylamide, which confers irreversible inhibition to the EGFR family of kinases [98]. 4.3.3 Mechanism-Based and Ligand Mimetic Design The skeptics that predicted the impossibility of a highly kinase-selective ATPcom-petitive inhibitor have now been silenced. However, the reasoning from which this skepticism arose remains to frustrate attempts to employ the rationale substrate-based design algorithms that have proven fruitful for other protein classes. Structure-based design efforts which rely on close analogues of either ATP or a peptide or a combination of both in a multi-substrate analogue have met with limited success, and there pursuit till now has been largely as an academic exercise to probe mechanistic details of the kinase reaction. Exemplary of this work is an ATP-peptide bi-substrate analogue inhibitor for which the kinetics and structure
25
26 Protein Kinase Inhibitors
have been determined with the insulin receptor kinase [117]. For further information on the status of these approaches see the reviews by Lawrence and Niu [88] and Parang and Cole [118]. 4.3.4 Computational Chemistry and Virtual Screening Honma et al. from Banyu Pharmaceuticals have published one of the few examples of kinase inhibitor design which begins with a de novo ligand-building program to generate novel ligands [119]. Realizing the limitations of the de novo lead generation program (LEGEND), Honma chose to filter the structures generated by the de novo software by developing a program (SEEDS) to extract the essential information from the de novo structures and search it against a database of commercially available and/or synthetically accessible ligands. The protein residues chosen as starting points for design of a CDK4-specific inhibitor were those that make the backbone donor–acceptor hydrogen bonds to adenine and the conversed lysine (Lys-33) and glutamic acid (Glu-51) which were known to be involved in the binding of several CDK2 inhibitors. The coordinates used were from a CDK4 homology model based upon CDK2. Following further visual filtering to eliminate undesired skeletons and functional groups, a set of 350 compounds were purchased and screened. Hits were ranked based upon their structural tractability as leads. Optimization of this lead was performed in an iterative manner, making small arrays followed by evaluation of SAR using docking back into the homology model to guide the selection of reagents for subsequent arrays. The end result was 41, a 42 nM inhibitor of CDK4, which was shown in an X-ray with CDK2 to mimic the main features of the proposed binding mode [119]. With the exception of the de novo ligand building, the design process used by Banyu to discover the CDK4 inhibitor provides a good introduction to the types of computational approaches that are routinely being employed with increasing success. Given the success of the de novo ligand design, it is worth considering why this has not been more widely used. Based upon our own experience, the most likely answer is that it has been tried but has yet to offer any advantage over the other available methods. In the Banyu implementation of de novo design, no “wet” chemistry was performed prior to screening of commercially available ligands. Given this restriction, one could have directly employed an alternate strategy of 3D pharmacophore searching of commercial databases. Such a process of pharmacophore generation and virtual screening has been described by Kinetix Pharmaceuticals [60]. Also Furet et al. from Novartis have reported on two examples where structure-based design was combined with database searching to discover leads for CDK1 and CDK2 [120, 121]. The development of kinase inhibitor libraries and the assembly of kinase screening sets is an another way the generic pharmacophore elements described above and in the Traxler/Bower model have been applied to advance lead discovery. To run a complete HTS screen of one million compounds is time consuming and expensive. As such there is considerable value in developing strategies to assemble smaller kinase inhibitor-enriched screening sets. For example, purvalanol B 22, a potent CDK2 inhibitor, was discovered in a purine-based combinatorial library [122]. Finally, it should be noted that the
4 Inhibitor Design
chances of getting a lead from HTS are increasing as the screening collections are populated with compounds prepared for other kinase inhibitor efforts. For example, compounds from the pyridinylimidazole class of p38 inhibitors have served as leads for c-Raf [123] and Alk5 [124]. For further details on the role of computational chemistry in kinase inhibitor structure-based design strategies and the range of computational tools being applied in this area see a recent overview by Woolfrey and Weston [89]. 4.4 Selectivity
A realistic goal for inhibitors binding in the ATP site would be to realize that in most cases very high selectivity is not obtainable, is unnecessary and undeterminable. That absolute selectivity is not required is demonstrated by 1, the first kinase inhibitor approved for use in humans, which in addition to inhibiting the Bcr-Abl target, also blocks c-Kit and PDGFR [87]. Such selectivity being undeterminable relates to the technical challenge of developing the methodology to assay all 518 putative human kinases. Setting aside further discussion of what level of selectivity is required for a specific target, the fact remains that if inhibitor selectivity is insufficient the undesired side-effects are expected to be intolerable. Hence, developing strategies to achieve the required level of selectivity for the given target will be a top priority for a kinase drug discovery effort. As noted earlier, unlike the phosphate-binding region, where the majority of residues are identical across all kinases, the adenine and ribose-binding pockets are lined by a less rigidly conserved set of residues. We have already covered examples illustrating the effect the gatekeeper has on the selectivity of inhibitors that access the gatekeeper-dependent binding pocket. The exploitation of Cys-773 in the EGFR receptor to develop irreversible inhibitors (37) is another example where a specific residue has been exploited to enhance potency and selectivity. The interpretation of X-ray structural data in light of inhibitor SAR led to the identification of these examples and promises to identify more differences that can be exploited to enhance selectivity (for additional examples see [30, 91]). However, in other cases where selective inhibition has been observed, elucidating the features responsible for selectivity has proven more difficult [64]. Looking further beyond the ATP binding site, several groups have discovered inhibitor binding sites at the solvent interface that impart selectivity. Two examples are the interaction of the sulfonamide of inhibitors 10 [67] and 24 [116] with D86 of CDK2 and the exploitation of the change from lysine (CDK2) to threonine (CDK4) at position of 89 to prepare a CDK4-selective inhibitor, 29 [55]. The conformational mobility of kinases, which is key for controlling the catalytic activity and substrate selectivity of kinases, may also prove to be key for obtaining inhibitor selectivity. Gleevec 1 [27], which inhibits Abl, and BIRB-796 42 [29], which inhibits p38α, both bind to catalytically inactive kinase conformations in which the phenylalanine of the conserved DFG motif in the activation loop is displaced by the binding of the inhibitor to the aspartate NH (Scheme 6). As noted
27
28 Protein Kinase Inhibitors
Scheme 6
in Section 2.3.1, this behavior has been observed for additional kinases and is likely to have applicability for the design of selective inhibitors of many kinases. Structural studies on the p38-VX-745, 43, complex have revealed an inhibitor-induced flip of the Gly-110 peptide geometry which allows the inhibitor carbonyl oxygen to form two hydrogen bonds to amide NH groups in the linker region [125] Researchers at Merck who first reported this observation speculate that the Gly-110 of p38 is uniquely suited for this conformational flip relative to kinases which do not have a glycine at this position and that this could explain the enhanced selectivity of 43 for p38 Several new classes of p38 inhibitors have been designed to take advantage of this feature and demonstrate improved kinase selectivity [125, 126].
5 Conclusion
Structural insights gained from crystallographic studies with individual members of the kinase family have resulted in the elucidation of a single, universally applicable kinase motif. Similarly, crystallographic studies of kinase-inhibitor complexes are providing an understanding of inhibitor affinity and selectivity that can be broadly applied. As reviewed in this chapter, when coupled with computational techniques, researchers have been able to design new inhibitor scaffolds and improve inhibitor selectivity. Recognition of the high strategic value of having crystallographic information at the beginning of a lead optimization effort is a driving force for improving the techniques to obtain crystalline kinases and the rapid solution of their structure, and if this fails, for the use of a surrogate kinase. This recognition and the elucidation of the general principles of kinase inhibitor binding is also driving the synthesis of kinase inhibitor libraries and the assembly of focus screening sets comprised of compounds which fit these kinase inhibitor descriptors. It is important to note that current methodologies for the generation of homology models are inadequate for the task of generating the novel conformations that have been revealed by protein crystallography. Consequently, the best starting point for a kinase structure-based lead optimization effort based upon a novel inhibitor scaffold is to solve the structure of the ligand-kinase complex. NMR
References
offers an alternative experimental technique for studying the solution dynamics of protein–ligand interaction and this technique has been successfully used in the design of novel inhibitors of several protein classes [127]. At present, the utility of NMR for the study of kinases is limited by the low resolution of structural information which may be gained and the general difficulty of the experiments. Another promising approach is the application of novel computational techniques to predict kinase-inhibitor structures. However, for this to be realized new approaches must be devised to account for the surprising conformational flexibility of kinases. Perhaps, by supplying a learning set of allowed motions, crystallography can provide the information required for the implementation of new conformational searching and inhibitor-docking algorithms. Protein crystallography is anticipated to continue its dominant role as a spectroscopic technique to drive the optimization of ATP-competitive kinase inhibitors. Moreover, structural insights revealed by crystallography are likely to be essential for maximizing the potential of alternative non-ATP-competitive inhibitors. Examples of non-ATP-competitive inhibitors that have been explored are antagonists of SH2-binding domains [128] and the MEK inhibitors first discovered by scientists at Parke-Davis [129]. In the case the SH2-binding domain, protein crystallography has provided a detailed understanding of the structural features required for high affinity binding. However, the identified binding motif presented significant challenges for the development of small molecular weight cell-permeable inhibitors, and thus far chemists have been unable to develop drug-like inhibitors of these protein-protein interaction domains. In contrast, while the binding site of the MEK inhibitors remains unknown, these compounds have been developed into a compound appropriate for clinical development [130]. In their role as proteins that integrate and transduce cell-signalling stimuli, kinases often have multiple activators and multiple substrates plus additional interactions which modulate their activity. Thus there exist many potential sites for intervention beyond the substrates involved in catalysis (ATP and phosphoacceptor). Finding leads which bind at these sites will require new, non-ATP-centric screening methodologies. Once identified, protein crystallography will again be expected to lead the way towards fully exploiting these new opportunities. References 1 MANNING, G., WHYTE, D. B., MARTINEZ, R., HUNTER, T., SUDARSANAM, S., Science 2002, 298, 1912–1934. 2 ENGH, R. A., BOSSEMEYER, D., Pharmacol. Ther. 2002, 93, 99–111. 3 HUBBARD, S. R., Prog. Biophys. Mol. Biol. 1999, 71, 343–358. 4 HUBBARD, S. R., Front Biosci. 2002, 7, 1315–1340. 5 HUSE, M., KURIYAN, J., Cell 2002, 109, 275–282.
6 SICHERI, F., KURIYAN, J., Curr. Opin. Struct. Biol. 1997, 7, 777–785. 7 SOWADSKI, J. M., EPSTEIN, L. F., LANKIEWICZ, L., KARLSSON, R., Pharmacol. Ther. 1999, 82, 157–164. 8 TAYLOR, S. S., KNIGHTON, D. R., ZHENG, J., Ten EYCK, L. F., SOWADSKI, J. M., Annu. Rev Cell Biol. 1992, 8, 429–462. 9 TAYLOR, S. S., RADZIO-ANDZELM, E., Structure 1994, 2, 345–355.
29
30 Protein Kinase Inhibitors
10 De BONDT, H. L., ROSENBLATT, J., JANCARIK, J., JONES, H. D., MORGAN, D. O., KIM, S. H., Nature 1993, 363, 595–602. 11 JEFFREY, P. D., RUSSO, A. A., POLYAK K. GIBBS, E., HURWITZ, J., MASSAGUE, J., PAVLETICH, N. P., Nature 1995, 376, 313–320. 12 KNIGHTON, D. R., ZHENG, J. H., TEN EYCK, L. F., ASHFORD, V. A., XUONG, N. H., TAYLOR, S. S., SOWADSKI, J. M., Science 1991, 253, 407–414. 13 HUBBARD, S.R., EMBO J. 1997, 16, 5572–5581. 14 HUBBARD, S. R., WEI, L., ELLIS, L., HENDRICKSON, W. A., Nature 1994, 372, 746–754. 15 MCTIGUE, M. A., WICKERSHAM, J.A., PINKO, C., SHOWALTER, R. E., PARAST, C. V., TEMPCZYK-RUSSELL, A., GEHRING, M. R., MROCZKOWSKI, B., KAN, C. C., VILLAFRANCA, J. E., APPELT, K., Structure Fold. Des. 1999, 7, 319–330. 16 WANG, Z., HARKINS, P. C., ULEVITCH, R. J., HAN, J., COBB, M. H., GOLDSMITH, E. J., Proc. Natl. Acad. Sci. USA 1997, 94, 2327–2332. 17 WILSON, K. P., FITZGIBBON, M. J., CARON, P. R., GRIFFITH, J. P., CHEN W. MCCAFFREY, P. G., CHAMBERS, S. P., SU, M. S., J. Biol. Chem. 1996, 271, 27696–27700. 18 WILSON, K. P., MCCAFFREY, P. G., HSIAO, K., PAZHANISAMY, S., GALULLO, V., BEMIS, G. W., FITZGIBBON, M. J., CARON, P. R., MURCKO, M. A., SU, M. S., Chem. Biol. 1997, 4, 423–431. 19 ZHANG, F., STRAND A., ROBBINS, D., COBB, M. H., GOLDSMITH, E. J., Nature 1994, 367, 704–711. 20 ABLOOGLU, AJ., TILL, J. H., KIM, K., PARANG, K., COLE, P. A., HUBBARD, S. R., KOHANSKI, R.A. J. Biol. Chem. 2000, 275, 30394–30398. 21 BOSSEMEYER, D., ENGH, R. A., KINZEL, V., PONSTINGL, H., HUBER, R., EMBO J. 1993, 12, 849–859. 22 COOK, A., LOWE, E. D., CHRYSINA, E. D., SKAMNAKI, V. T., OIKONOMAKOS, N. G., JOHNSON, L. N., Biochemistry 2002, 41, 7301–7311. 23 MADHUSUDAN, TRAFNY, E. A., XUONG, N. H., ADAMS, J. A., TEN EYCK, L. F., TAYLOR,
24
25 26
27
28
29
30
31
32
33 34 35
36
37
38
S. S., SOWADSKI, J. M., Protein Sci. 1994, 3, 176–187. TILL, J. H., ABLOOGLU, A. J., FRANKEL, M., BISHOP, S. M., KOHANSKI, R. A., HUBBARD, S. R., J. Biol. Chem. 2001, 276, 10049–10055. JOHNSON, L. N., NOBLE, M. E., OWEN, D. J., Cell 1996, 85, 149–158. NAGAR, B., BORNMANN, W. G., PELLICENA, P., SCHINDLER T. VEACH, D. R., MILLER, W. T., CLARKSON, B., KURIYAN, J., Cancer Res. 2002, 62, 4236–4243. SCHINDLER, T., BORNMANN, W., PELLICENA, P., MILLER, W. T., CLARKSON, B., KURIYAN, J., Science 2000, 289, 1938–1942. KOBE, B., HEIERHORST, J., FEIL, S. C., PARKER, M. W., BENIAN, G. M., WEISS, K. R., KEMP, B. E. EMBOJ. 1996, 15, 6810–6821. PARGELLIS, C., TONG, L., CHURCHILL, L., CIRILLO, P. F., GILMORE, T., GRAHAM, AG., GROB, P. M., HICKEY, E. R., MOSS, N., PAV, S., REGAN, J., Nat. Struct. Biol. 2002, 9, 268–272. MOHAMMADI, M., MCMAHON, G., SUN, L., TANG, C., HIRTH, P., YEH, B. K., HUBBARD, S. R., SCHLESSINGER, J., Science 1997, 276, 955–960. LAMERS, M. B., ANTSON, A. A., HUBBARD, R. E., SCOTT, R. K., WILLIAMS, D. H. J. Mol. Biol. 1999, 285, 713–725. SCHINDLER, T., SICHERI, F., PICO, A., GAZIT, A., LEVITZKI, A., KURIYAN, J., Mol. Cell 1999, 3, 639–648. SICHERI, F., MOAREEI, I., KURIYAN, J., Nature 1997, 385, 602–609. XU, W., HARRISON, S. C., ECK, M. J., Nature 1997, 385, 595–602. XU, W., DOSHI, A., LEI, M., ECK, M. J., HARRISON, S. C., Mol. Cell 1999, 3, 629–638. ENGH, R. A., GIROD, A., KINZEL, V., HUBER, R., BOSSEMEYER, D. J. Biol. Chem. 1996, 271, 26157–26164. FRODIN, M., ANTAL, T. L., DUMMLER, B. A., JENSEN, C. J., DEAK, M., GAMMELTOET, S., BIONDI, R. M., EMBOJ. 2002, 21, 5396–5407. YANG, J., CRON, P., GOOD, V. M., THOMPSON, V., HEMMINGS, B. A., BARFORD, D., Nat. Struct. Biol. 2002, 9, 940–944.
References 39 YANG, J., CRON, P., THOMPSON, V., GOOD, V. M., HESS, D., HEMMINGS, B. A., BAREORD, D., Mol. Cell 2002, 9, 1227–1240. 40 ZHENG, J., KNIGHTON, D. R., TEN EYCK, L. F., KARLSSON, R., XUONG, N., TAYLOR, S. S., SOWADSKI, J. M., Biochemistry 1993, 32, 2154–2161. 41 KNIGHTON, D. R., ZHENG, J. H., TEN EYCK, L. F., XUONG, N. H., TAYLOR, S. S., SOWADSKI, J. M., Science 1991, 253, 414–420. 42 SHEWCHUK, L., HASSELL, A., WISELY, B., ROCQUE, W., HOLMES, W., VEAL, J., KUYPER, L. F., J. Med. Chem. 2000, 43, 133–138. 43 WANG, Z., CANAGARAJAH, B. J., BOEHM, J. C., KASSISA, S., COBB, M. H., YOUNG, P. R., ABDEL-MEGUID, S., ADAMS, J. L., GOLDSMITH, E. J., Structure 1998, 6, 1117–1128. 44 HUSE, M., CHEN YG., MASSAGUE, J., KURIYAN, J., Cell 1999, 96, 425–436. 45 WYBENGA-GROOT, L. E., BASKIN, B., ONG, S. H., TONG, J., PAWSON T. SICHERI, F., Cell 2001, 106, 745–757. 46 GOLDBERG, J., NAIRN, A. C., KURIYAN, J., Cell 1996, 84, 875–887. 47 HU, S. H., PARKER, M. W., LEI, J. Y., WILCE, M. C., BENIAN, G. M., KEMP, B. E., Nature 1994, 369, 581–584. 48 LEI, M., LU, W., MENG, W., PARRINI, M. C., ECK, M. J., MAYER, B. J., HARRISON, S. C., Cdi 2000, 102, 387–397. 49 PAUTSCH, A., ZOEPHEL, A., AHORN, H., SPEVAK, W., HAUPTMANN, R., NAR, H., Structure (Camb.) 2001, 9, 955–965. 50 SULZENBACHER, G., GRUEZ, A., ROIG-ZAMBONI, V., SPINELLI, S., VALENCIA, C., PAGOT, F., VINCENTELLI, R., BIGNON, C., SALOMONI, A., GRISEL, S., MAURIN, D., HUYGHE, C., JOHANSSON, K., GRASSICK, A., ROUSSEL, A., BOURNE, Y., PERRIER, S., MIALLAU, L., CANTAU, P., BLANC, E., GENEVOIS, M., GROSSI, A., ZENATTI, A., CAMPANACCI, V., CAMBILLAU, C., Acta Crystallogr. D. Biol. Crystallogr. 2002, 58, 2109–2115. 51 TARRICONE, C., DHAVAN, R., PENG, J., ARECES, LB., TSAI, L. H., MUSACCHIO, A., Mol. Cell 2001, 8, 657–669. 52 JEEEREY, P. D., TONG L., PAVLETICH, N. P., Genes Dev. 2000, 14, 3115–3125.
53 BAX, B., CARTER, P. S., LEWIS C., GUY, A. R., BRIDGES, A., TANNER, R., PETTMAN, G., MANNIX C. CULBERT, A. A., BROWN, M. J., SMITH, D. G., REITH, A. D., Structure (Camb.) 2001, 9, 1143–1152. 54 MOHAMMADI, M., SCHLESSINGER, J., HUBBARD, S. R., Cell 1996, 86, 577–587. 55 IKUTA, M., KAMATA, K., FUKASAWA, K., HONMA, T., MACHIDA X. HIRAI, H., SUZUKI-TAKAHASHI, I., HAYAMA, T., NISHIMURA, S. J. Biol. Chem. 2001, 276, 27548–27554. 56 BINNS, K. L., TAYLOR, P. P., SICHERI, F., PAWSON T. HOLLAND, S. J., Mol. Cell Biol. 2000, 20, 4791–4805. 57 ARNOLD, L. D., DIXON, R., TALANIAN, R., GAZA-BULSECO, G., BUMP, N., SESHADRI, T., BELLAMACINA, C., ALLEN, K., HOFFKEN, W., RIEBEL, A., XU, Y., Abstract for Proceedings of the American Association for Cancer Research 2002, 43, 848. 58 BELLON, S., FITZGIBBON, M. J., FOX T. HSIAO, H. M., WILSON, K. P., Structure Fold. Des. 1999, 7, 1057–1065. 59 TRAXLER, P., FURET, P., Pharmacol. Ther. 1999, 82, 195–206. 60 TOLEDO, L. M., LYDON, N. B., Structure 1997, 5, 1551–1556. 61 De AZEVEDO, W. F., LECLERC, S., MEIJER L. HAVLICEK, L., STRNAD, M., KIM, S. H., Eur. J. Biochem. 1997, 243, 518–526. 62 TRAXLER, P., GREEN, J., METT, H., SEQUIN, U., FURET, P. J. Med. Chem. 1999, 42, 1018–1026. 63 NIFFIND, K., PUTTER, M., GUERRA, B., ISSINGER O. G., SCHOMBURG, D., Nat. Struct. Biol. 1999, 6, 1100–1103. 64 MEIJER, L., THUNNISSEN, A. M., WHITE, A. W., GARNIER, M., NIKOLIC, M., TSAI, L. H., WALTER, J., CLEVERLEY, K. E., SALINAS, P. C., WU, Y. Z., BIERNAT, J., MANDELKOW, E. M., KIM, S. H., PETTIT, G. R., Chem. Biol. 2000, 7, 51–63. 65 HOESSEL, R., LECLERC, S., ENDICOTT, J. A., NOBEL, M. E., LAWRIE, A., TUNNAH, P., LEOST, M., DAMIENS, E., MARIE, D., MARKO, D., NIEDERBERGER, E., TANG W. EISENBRAND, G., MEIJER, L., Nat. Cell Biol. 1999, 1, 60–67. 66 DAVIES, T. G., BENTLEY, J., ARRIS, C. E., BOYLE, F. T., CURTIN, N. J., ENDICOTT, J. A., GIBSON, A. E., GOLDING, B. T., GRIFFIN, R. J., HARDCASTLE, I. R.,
31
32 Protein Kinase Inhibitors
67
68
69
70
71
72
73
74
75
76
JEWSBURY, P., JOHNSON, L. N., MESGUICHE, V., NEWELL, D. R., NOBLE, M. E., TUCKER, J. A., WANG L. WHITFIELD, H. J., Nat. Struct. Biol. 2002, 9, 745– 749. BRAMSON, H. N., CORONA, J., DAVIS, S. T., DICKERSON, S. H., EDELSTEIN, M., FRYE, S. V., GAMPE, R. T., Jr., HARRIS, P. A., HASSELL, A., HOLMES, W. D., HUNTER, R. N., LACKEY, K. E., LOVEJOY, B., LUZZIO, M. J., MONTANA, V., ROCQUE, W. J., RUSNAK, D., SHEWCHUK L., VEAL, J. M., WALKER, D. H., KUYPER, LF. J. Med. Chem. 2001, 44, 4339–4358. NARAYANA, N., DILLER, T. C., KOIDE K., BUNNAGE, M. E., NICOLAOU, K. C., BRUNTON, L. L., XUONG, N. H., TEN EYCK, L. F., TAYLOR, S. S., Biochemistry 1999, 38, 2367–2376. KOIDE, K., BUNNAGE, M. E., GOMEZ, P. L., KANTER, J. R., TAYLOR, S. S., BRUNTON, L. L., NICOLAOU, K. C., Chem. Biol. 1995, 2, 601–608. HUNENBERGER, P. H., HELMS, V., NARAYANA, N., TAYLOR, S. S., MCCAMMON, J. A., Biochemistry 1999, 38, 2358–2366. TONG, L., PAV, S., WHITE, D. M., ROGERS, S., CRANE, KM., CYWIN, C. L., BROWN, M. L., PARGELLIS, C. A., Nat. Struct. Biol. 1997, 4, 311–316. MOHAMMADI, M., FROUM, S., HAMBY, J. M., SCHROEDER, M. C., PANEK, R. L., LU, G. H., ELISEENKOVA, A. V., GREEN, D., SCHLESSINGER, J., HUBBARD, S. R., EMBO J. 1998, 17, 5896–5904. ZHU, X., KIM, J. L., NEWCOMB, J. R., ROSE, P. E., STOVER, D. R., TOLEDO, L. M., ZHAO, H., MORGENSTERN, K. A., Structure Fold. Des. 1999, 7, 651–661. LIU, Y., BISHOP, A., WITUCKI, L., KRAYBILL, B., SHIMIZU, E., TSIEN, J., UBERSAX, J., BLETHROW, J., MORGAN, D. O., SHOKAT, KM., Chem. Biol. 1999, 6, 671–678. EYERS, P. A., CRAXTON, M., MORRICE, N., COHEN, P., GOEDERT, M., Chem. Biol. 1998, 5, 321–328. GUM, R. J., MCLAUGHLIN, M. M., KUMAR, S., WANG, Z., BOWER, M. J., LEE, J. C., ADAMS, J. L., LIVI, G. P., GOLDSMITH, E. J., YOUNG, P. R., J. Biol. Chem. 1998, 273, 15605–15610.
77 ADAMS, J. L., BOEHM, J. C., GALLAGHER, T. F., KASSIS, S., WEBB, E. F., HALL, R., SORENSON, M., GARIGIPATI, R., GRISWOLD, D. E., LEE, J. C., Bioorg. Med. Chem. Lett. 2001, 11, 2867–2870. 78 BARKER, A. J., GIBSON, K. H., GRUNDY, W., GODFREY, A. A., BARLOW, J. J., HEALY, M. P., WOODBURN, J. R., ASHTON, S. E., CURRY, B. J., Scarlett, L, Henthorn, L, Richards, L, Bioorg. Med. Chem. Lett. 2001, 11, 1911–1914. 79 HENNEQUIN, L. F., STOKES, E. S., THOMAS, A. P., JOHNSTONE C. PLE, P. A., OGILVIE, D. J., DUKES, M., WEDGE, S. R., KENDREW, J., CURWEN, J. O. J. Med. Chem. 2002, 45, 1300–1312. 80 STAMOS, J., SLIWKOWSKI, M. X., EIGENBROT, C. J. Biol. Chem. 2002, 277, 46265–46272. 81 HUBBARD, S. R., TILL, J. H., Annu. Rev. Biochem. 2000, 69, 373–398. 82 KIM, S. H., SCHULZE-GAHMEN, U., BRANDSEN, J., AZEVEDO, W. F. Jr., Prog Cell Cycle Res. 1996, 2, 137–145. 83 JOHNSON, L. N., LEWIS, R. J., Chem. Rev. 2001, 101, 2209–2242. 84 PRADE, L., ENGH, R. A., GIROD, A., KINZEL, V., HUBER, R., BOSSEMEYER, D., Structure 1997, 5, 1627–1637. 85 KLEBE, G., BOHM, H. J. J. Recept. Signal. Transduct. Res. 1997, 17, 459–73. 86 DAVIES, T. G., TUNNAH, P., MEIJER L., MARKO, D., EISENBRAND, G., ENDICOTT, J. A., NOBLE, M. E., Structure (Camb.) 2001, 9, 389–397. 87 DRUKER, B. J., Oncogene 2002, 21, 8541–8546. 88 LAWRENCE, D. S., NIU, J., Pharmacol. Ther. 1998, 77, 81–114. 89 WOOLEREY, J. R., WESTON, G. S., Curr. Pharm. Des 2002, 8, 1527–1545. 90 LAWRIE, A. M., NOBLE, M. E., TUNNAH, P., BROWN, N. R., JOHNSON, L. N., ENDICOTT, J. A., Nat. Struct. Biol. 1997, 4, 796–801. 91 ZHAO, B., BOWER, M. J., MCDEVITT, P. J., ZHAO, H., DAVIS, S. T., JOHANSON, K. O., GREEN, S. M., CONCHA, N. O., ZHOU, B. B. J. Biol. Chem. 2002, 277, 46609–46615. 92 MENG, W., SWENSON, L. L., FITZGIBBON, M. J., HAYAKAWA, K., TER HAAR, E., BEHRENS, A. E., FULGHUM, J. R., LIPPKE, J. A. J. Biol. Chem. 2002, 277, 37401–37405.
References 93 CHEETHAM, G. M., KNEGTEL, R. M., COLL, J. T., RENWICK, S. B., SWENSON L. WEBER, P., LIPPKE, J. A., AUSTEN, D. A. J. Biol. Chem. 2002, 277, 42419–2422. 94 NOWAKOWSKI, J., CRONIN, C. N., MCREE, D. E., KNUTH, M. W., NELSON, C. G., PAVLETICH, N. P., ROGERS, J., SANG, B. C., SCHEIBE, D. N., SWANSON, R. V., THOMPSON, D. A., Structure (Camb.) 2002, 10, 1659–1667. 95 SENDEROWICZ, A. M., Invest New Drugs 1999, 17, 313–320. 96 JIROUSEK, M. R., GILLIG, J. R., GONZALEZ, CM., HEATH, W. F., MCDONALD, J. H., III, NEEL, D. A., RITO, C. J., SINGH, U., STRAMM, L. E., MELIKIAN-BADALIAN, A., BAEVSKY, M., BALLAS, L. M., HALL, S. E., WINNEROSKI, L. L., FAUL, M. M. J. Med. Chem. 1996, 39, 2664–2671. 97 BOLD, G., ALTMANN, K. H., FREI, J., LANG, M., MANLEY, P. W., TRAXLER, P., WIETEELD, B., BRUGGEN, J., BUCHDUNGER, E., COZENS, R., FERRARI, S., FURET, P., HOEMANN, F., MARTINY-BARON, G., MESTAN, J., ROSEL, J., SILLS, M., STOVER, D., ACEMOGLU, F., BOSS, E., EMMENEGGER, R., LASSER, L., MASSO, E., ROTH, R., SCHLACHTER, C., VETTERLI, W. J. Med. Chem. 2000, 43, 2310–2323. 98 BRIDGES, A. J., Chem. Rev. 2001, 101, 2541–2572. 99 SUN, L., TRAN, N., LIANG C. HUBBARD, S., TANG, F., LIPSON, K., SCHRECK, R., ZHOU, Y., MCMAHON, G., TANG C. J. Med. Chem. 2000, 43, 2655–2663. 100 ZIMMERMANN, J., CARAVATTI, G., METT, H., MEYER T. MULLER, M., LYDON, N. B., FABBRO, D., J. Arch. Pharm. (Weinheim) 1996, 329, 371–376. 101 FURET, P., CARAVATTI, G., LYDON, N., PRIESTLE, J. P., SOWADSKI, J. M., TRINKS, U., TRAXLER, P., J. Comput. Aided Mol. Des. 1995, 9, 465–72. 102 SCAPIN, G., Drug Discov. Today 2002, 7, 601–611. 103 GARCIA-ECHEVERRIA, C., TRAXLER, P., EVANS, D. B., Med. Res. Rev. 2000, 20, 28–57. 104 BIT, R. A., DAVIS, P. D., ELLIOTT, L. H., HARRIS, W., HILL, C. H., KEECH, E., KUMAR, H., LAWTON, G., MAW, A., NIXON, J. S. J. Med. Chem. 1993, 36, 21–29.
105 TRINKS, U., BUCHDUNGER, E., FURET, P., KUMP, W., METT, H., MEYER, T., MULLER, M., REGENASS, U., RIHS, G., LYDON, N., J. Med. Chem. 1994, 37, 1015–1027. 106 TRAXLER, P., BOLD, G., FREI, J., LANG, M., LYDON, N., METT, H., BUCHDUNGER, E., MEYER, T., MUELLER, M., FURET, P., J. Med. Chem. 1997, 40, 3601–3616. 107 TRAXLER, P. M., FURET, P., METT, H., BUCHDUNGER, E., MEYER, T., LYDON, N. Med. Chem. 1996, 39, 2285–2292. 108 FRY, D. W., KRAKER, A. J., MCMICHAEL, A., AMBROSO, L. A., NELSON, J. M., LEOPOLD, W. R., CONNORS, R. W., BRIDGES, A. J., Science 1994, 265, 1093–1095. 109 GIBSON, K. H., GRUNDY, W., GODFREY, A. A., WOODBURN, J. R., ASHTON, S. E., CURRY, B. J., SCARLETT L., BARKER, A. J., BROWN, D. S., Bioorg. Med. Chem. Lett. 1997, 7, 2723–2728. 110 PALMER, B. D., TRUMPP-KALLMEYER, S., FRY, D. W., NELSON, J. M., SHOWALTER, H. D., DENNY, W. A. J. Med. Chem. 1997, 40, 1519–1529. 111 FRY, D. W., BRIDGES, A. J., DENNY, W. A., DOHERTY A. GREIS, K. D., HICKS, J. L., HOOK, K. E., KELLER, P. R., LEOPOLD, W. R., LOO, J. A., MCNAMARA, D. J., NELSON, J. M., SHERWOOD, V., SMAILL, J. B., TRUMPP-KALLMEYER, S., DOBRUSIN, E. M., Proc. Natl. Acad. Sci. USA 1998, 95, 12022–12027. 112 SCHULZE-GAHMEN, U., BRANDSEN, J., JONES, H. D., MORGAN, D. O., MEIJER L. VESELY, J., KIM, S. H., Proteins 1995, 22, 378–391. 113 TRUMPP-KALLMEYER, S., RUBIN, J. R., HUMBLET C. HAMBY, J. M., HOLLIS SHOWALTER, H. D. J. Med. Chem. 1998, 41, 1752–1763. 114 BARVIAN, M., BOSCHELLI, D. H., COSSROW, J., DOBRUSIN, E., FATTAEY A. FRITSCH, A., FRY, D., HARVEY, P., KELLER, P., GARRETT, M., LA, F., LEOPOLD, W., MCNAMARA, D., QUIN, M., TRUMPP-KALLMEYER, S., TOOGOOD, P., WU, Z., ZHANG, E. J. Med. Chem. 2000, 43, 4606–616. 115 DE, A. W., Jr., MUELLER-DIECKMANN, H. J., SCHULZE-GAHMEN, U., WORLAND, P. J., SAUSVILLE, E., KIM, S. H., Proc. Natl. Acad. Sci. USA 1996, 93, 2735–2740.
33
34 Protein Kinase Inhibitors
116 DAVIES, T. G., PRATT, D. J., ENDICOTT, J. A., JOHNSON, L. N., NOBLE, M. E., Pharmacol. Ther. 2002, 93, 125–133. 117 PARANG, K., TILL, J. H., ABLOOGLU, A. J., KOHANSKI, R. A., HUBBARD, S. R., COLE, P. A., Nat. Struct. Biol. 2001, 8, 37-H. 118 PARANG, K., COLE, P. A., Pharmacol. Ther 2002, 93, 145–157. 119 HONMA, T., HAYASHI, K., AOYAMA, T., HASHIMOTO, N., MACHIDA, T., FUKASAWA, K., IWAMA, T., IKEURA, C., IKUTA, M., SUZUKI-TAKAHASHI, I., IWASAWA, Y., HAYA-A, T., NISHIMURA, S., MORISHIMA, H., J. Med. Chem. 2001, 44, 4615–4627. 120 FURET, P., MEYER T. STRAUSS, A., RACCUGLIA, S., RONDEAU, J. M., Bioorg. Med. Chem. Lett. 2002, 12, 221–224. 121 FURET, P., MEYER T. MITTL, P., FRETZ, H. J. Comput. Aided Mol. Des. 2001, 15, 489–495. 122 GRAY, N. S., WODICKA, L., THUNNISSEN, A. M., NORMAN, T. C., KWON, S., ESPINOZA, F. H., MORGAN, D. O., BARNES, G., LE-CLERC, S., MEIJER, L., KIM, S. H., LOCKHART, D. J., SCHULTZ, P. G., Science 1998, 281, 533–538. 123 ADAMS, J. L., LEE, D., Curr. Opin. Drug Discov. Dev. 1999, 2, 96–109. 124 CALLAHAN, J. F., BURGESS, J. L., FORNWALD, J. A., GASTER, L. M., HARLING, J. D., HARRINGTON, F. P., HEER, J., KWON
125
126
127
128
129
130
C. LEHR, R., MATHUR A. OLSON, B. A., WEINSTOCK, J., LAPING, N. J. J. Med. Chem. 2002, 45, 999–1001. COLLETTI, S. L., FRIE, J. L., DIXON, E. C., SINGH, S. B., CHOI, B. K., SCAPIN, G., FITZGERALD, C. E., KUMAR, S., NICHOLS, E. A., O’KEEFE, S. J., O’NEILL, E. A., PORTER, G., SAMUEL K. SCHMATZ, D. M., SCHWARTZ, CD., SHOOP, W. L., THOMPSON, CM., THOMPSON, J. E., WANG, R., WOODS, A., ZALLER, D. M., DOHERTY, J. B., J. Med. Chem. 2003, 46, 349–352. STELMACH, J. E., LIU L., PATEL, S. B., PIVNICHNY, J. V., SCAPIN, G., SINGH, S., HOP, C. E., WANG, Z., STRAUSS, J. R., CAMERON, P. M., NICHOLS, E. A., O’KEEFE, S. J., O’NEILL, E. A., SCHMATZ, D. M., SCHWARTZ, CD., THOMPSON, CM., ZALLER, D. M., DOHERTY, J. B., Bioorg. Med. Chem. Lett. 2003, 13, 277–280. SHUKER, S. B., HAJDUK, P. J., MEADOWS, R. P., FESIK, S. W., Science 1996, 274, 1531–1534. CODY, W. L., LIN, Z., PANEK, R. L., ROSE, D. W., RUBIN, J. R., Curr. Pharm. Des. 2000, 6, 59–98. PANG, L., SAWADA, T., DECKER, S. J., SALTIEL, A. R. J. Biol. Chem. 1995, 270, 13585–13588. HERRERA, R., SEBOLT-LEOPOLD, J. S., Trends Mol. Med. 2002, 8, S27–S31.
1
Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes Robert E. Babine Suntory Pharmaceutical Research Laboratories, Cambridge, USA
Originally published in: Protein Crystallography in Drug Discovery. Edited by Robert E. Babine and Sherin S. Abdel-Meguid. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30678-7
1 Introduction 1.1 Nuclear Hormone Receptors: Ligand Binding Domains
Nuclear hormone receptors (NHR) are multidomain proteins that function as transcription factors. They contain a central DNA binding domain (DBD) responsible for targeting the receptor to highly specific DNA sequences comprising a response element. The DBD is surrounded by two activation domains: the activation function 1 (AF-1) domain that resides at the N-terminus and the activation function 2 (AF-2) domain that resides at the C-terminal ligand-binding domain (LBD) [1, 2]. NHRs for which no natural ligand is known are referred to as orphan nuclear hormone receptors. Regulation of gene transcription by nuclear receptors requires the recruitment of proteins characterized as co-regulators, with ligand-dependent exchange of co-repressors for co-activators serving as the basic mechanism for switching gene repression to activation [2]. Upon activation by a small molecule ligand, or hormone, NHRs dimerize with another ligated NHR and recruit coactivators to turn on target genes. The binding of a ligand thus can act as a switch between gene activation and gene repression. This biological property has made the ligand binding domains of nuclear hormone receptors important drug targets. Our understanding at the molecular level of how nuclear receptor ligands exert their effects has been dramatically enhanced by the elucidation of the crystal structures of the apo- and/or ligand-bound LBDs of several nuclear receptors (Tab. 1). These structures have revealed a common fold among LBDs, consisting of an antiparallel α-helical sandwich of 11–13 helices. Conventional nomenclature Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes Table 1 PDB codes of all nuclear hormone receptor-ligand-binding domains in the protein-
data bank as of December 2002 Protein
Apo structure
ER-α
Complex dimer structure
1A52, 1ERE, 1ERR, 1QKT, 1QKU, 3ERD, 3ERT, 1L2I, 1GWQ, 1GWR 1QKM, 1QKN, 1HJ1, 1L2J 1137, 1138, 1E3G 1M2Z, 1NHZ 1BSX
ER-β AR GR TR-β RAR-α RAR-γ RXR-α RXR-β PR VDR PPAR-α PPAR-γ PPAR-δ PXR ROR-β ROR-α HNF4-γ ERR3
Ligated structures
1DKF
1G1U, 1LBD
1PRG, 3PRG 2GWX 1ILG
1EXA, 1EXX, 1FCX, 1FCY, 1FCZ, 2LBD, 3LBD, 4LBD, 1FD0 1FBY, 1G5Y, 1MV9, 1MVC, 1MZN 1H9U 1A28, 1E3K 1DB1, 1IE8, 1IE9 1I7G, 1K7L, 1KKQ 1171, 2PRG, 4PRG, 1KNU 1GWX, 3GWX 1ILH 1K4W 1N83 1LV2
1DKF, 1FM6, 1FM9, 1K74
1FM6, 1FM9, 1K74
1KV6
ER Estrogen Receptor, AR Androgen Receptor, GR Glucocorticoid Receptor, TR Thyroid Receptor, RAR Retinoic Acid Receptor, RXR Retinoie X Receptor, PR Progesterone Receptor, VDR Vitamin D Receptor, PPAR Peroxisome Proliferator-Activated Receptor, PXR Pregnane X Receptor, ROR Retinoic Acid-Related Orphan Receptor, NHF-4 Hepatocyte Nuclear Factor 4, ERR Estrogen-Related Receptor
refers to these helices as helix-1, and helices-3 through 12. The region between helices 1 and 3 is variable and may contain zero, one or two helices. The helices fold to form a hydrophobic cavity into which the ligand can bind. The position of helix-12 relative to the other helices is dependent upon the presence or absence of a ligand and the nature of that ligand. The binding site that comprises the AF-2 domain is determined by the position of the C-terminal α-helix (helix-12). 1.2 Dimerization and Interactions with Co-activators and Co-repressors
The Glaxo group was the first to report the structure of a heterodimeric complex between two activated NHRs (peroxisome proliferator-activated receptor-γ ) PPARγ and (retinoid X receptor-α) RXR-α. The structure (PDB entry: 1FM6) of this hetero-dimer complex contains six components: the two receptor LBDs, their two respective ligands, and two peptides derived from the steroid receptor co-activator1 (SRC-1). These peptides contain a conserved LxxLL motif that is present in this
1 Introduction 3
Fig. 1 Heterodimeric complex between PPAR-( (green) bound to agonist rosiglita zone (red spheres), RXR-" (purple) bound to agonist cis-retinoic acid (green spheres). Peptides (orange) derived from the SRC-1 co-activator protein are also shown bound to both NHRs. From PDB entry 1 FM6.
class of co-activators. The complex is butterfly shaped, with both LBDs adopting the conserved "helical sandwich" fold previously reported for other ligand-bound nuclear receptors. PPAR-γ contains 13 α-helices and four short β-strands, while RXR-α is composed of 11 α-helices and two short β-strands [3]. This complex is illustrated in Fig. 1 For each NHR, the LxxLL motif of the SRC-1 peptides binds in a helical conformation to a groove on the protein defined by helices-3, -4 and -12 (Fig. 2). This groove is the binding site that comprises the AF-2 domain and it is determined by the position of helix-12. This "agonist" position of helix-12 is stabilized by the presence of an agonist ligand. Thus, a role of an activating ligand (agonist) is to stabilize a conformation of the LBD that allows binding of a co-activator protein. The Glaxo group has also reported the crystal structure of a ternary complex containing the PPAR-α ligand-binding domain bound to the antagonist GW6471 and a co-repressor motif derived from the SMRT protein (PDB entry 1KKQ) [4]. In this antagonist complex, helix-12 is found in a different position relative to the other helices than it is in the agonist structure. In this structure, the co-repressor motif adopts a three-turn α-helix and docks into a hydrophobic groove formed by helices3, -4, and -12 (Fig. 3). Superposition of an agonist- and antagonist-bound PPARα reveals that the co-repressor-binding site partly overlaps with the co-activatorbinding site. The additional helical turn in the co-repressor motif extends into the space that is left by the repositioning of helix-12, and prevents this helix from folding back into its active conformation. The binding of the co-repressor peptide is
4 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 2 A closer look at the PPAR-( rosiglitazone SRC-1 peptide complex (wall eyed stereo). The helical nature of the NHR LBD is apparent. The agonist ligand (red spheres) induces the agonist conformation of helix-12 (magenta) of the LDB. This pro-
tein conformation has a groove defined by helices-3 (blue), -4 (gray) and -12. The LxxLL motif of the SRC-1 co-activator peptide (orange) binds to that groove. From PDB entry 1FM6.
further reinforced by the antagonist, which blocks helix-12 from adopting an active conformation [4]. These structural studies have led to a proposal that nuclear receptors distinguish co-repressors from co-activators by the length of their helical interaction motifs [4]. The presence of an agonist ligand stabilizes a conformation of helix-12 that
Fig. 3 Structure of the complex between PPAR–" (green), an antagonist ligand (red spheres) and a peptide (orange) derived from the SMRTco-repressor protein (wall eyed stereo). The orientation is close to
that of Fig. 2The SMRT peptide binds as an extended helix and interacts with helix-12 (magenta) of PPAR-", which is in a much different conformation than it is in an agonist structure. From PDB entry 1 KKQ
2 Steroid Receptors
produces an AF-2 conformation that does not allow binding of co-repressors and allows the binding of co-activators. In the presence of an antagonist ligand, helix-12 is blocked from assuming the active AF-2 conformation, resulting in a larger pocket that can accommodate the three-turn α-helix of the co-repressor motif. The high degree of conservation of the co-repressor and co-activator interaction interfaces suggests that this is a model that applies to many of the nuclear receptors [4]. While not comprehensive, the remainder of this chapter will discuss in detail how ligands are capable of binding to NHRs and influencing the conformation of helix-12.
2 Steroid Receptors
While the chemical structures and biological properties of steroids have fascinated chemists and biologists alike for many decades, it is only recently that their receptors have been studied [1, 5]. This section will give an overview of the structural biology of steroid receptor ligand complexes focusing on the role of the ligand. 2.1 The Role of the Ligand 2.1.1 Estradiol Estrogen Receptor Complex Estradiol is the natural agonist ligand of the estrogen receptors (ER). There are two isotypes of the estrogen receptor, ER-α and ER-β. There are four structures of the complex between the LBD of ER-α and estradiol in the PBD. Expression, purification and crystallization of LBDs can often be problematic. Thus, the first structure solved (PDB entry 1ERE) [6] used protein in which the free cysteines were carboxymethylated. The refined structure showed at least one of the cysteines was modified. Another structure (PDB entry: 1A52) [7] used protein that was refolded. The structure of this protein showed an artifact where two LBDs were connected by an intermolecular disulfide bond. Through much experimentation, conditions were found to express, purify and crystallize ER-α LBD without modification or refolding. The protein structure was deposited as PDB entry 1QKU [8] and it is this protein structure that will be discussed. The mostly hydrophobic ligand estradiol is completely enclosed in the hydrophobic core of the ER-α LBD. This protein conformation with helix-12 folded over, and completely enclosing, the ligand-binding site is the agonist conformation of ER-α. While the ligand is completely enclosed within the ER-α protein, the fit is less than optimal; that is, there are several gaps between the surface of the ligand and the surface of the protein. It has been reported that the volume of the ligand-binding cavity (450 Å3 ) is nearly twice that of estradiol’s molecular volume (245 Å3 ). The length and breadth of the ligand is well matched by the receptor, but there are large unoccupied cavities opposite the α face of the B-ring and the β face of the C-ring [6]. The estradiol ligand makes direct van der Waals (VDW) contacts with residues from helix-3, helix-5, helix-7 and helix-11. While the presence of the agonist ligand
5
6 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
stabilizes the conformation of helix-12, the ligand makes no direct VDW contacts with this helix. The A-ring phenol makes hydrogen-bonding interactions with the side chains of Glu-353, located in the middle of helix-3, and Arg-394, located at the C-terminal end of helix-5. These two protein residues also interact with a common water molecule via hydrogen bonds. This arrangement is electrostatically complementary to the phenol and provides the basis for the known preference of an A-ring phenol group in ER ligands. The D-ring hydroxyl group makes a hydrogen bond with the side chain of His-524 that is located on helix-11. Thus, the two polar groups of the estradiol ligand make complementary polar interactions with the ER protein (Fig. 4). In the complex there are no unsatisfied polar interactions in the interior ligandbinding core of the protein. Thus, two types of contacts orient the ligand: hydrogen bonds at the two ends and hydrophobic VDW contacts along the body of the ligand
Fig. 4 (a) Schematic of estradiol binding to ER-" There are polar interactions at both ends of the ligand- the A-ring phenol group interacts with the side chain of Glu-353 of he lix-3 and the D-ring hydroxyl group interacts with His-524 (b) Numbering scheme for steroid ring systems.
2 Steroid Receptors
Fig. 5 Structure of the complex between ER-" and estradiol. The agonist ligand is held by polar interactions on opposite ends of the ligand and it stabilizes the agonist conformation of helix-12 (black). From PDB entry 1QKU.
[7]. The combination of non-polar interactions and the specific polar interactions account for the ability of ER to bind estradiol selectively with high affinity and to exclude other steroids. Estradiol is a pure agonist of estrogen receptor-a and the structure of this complex (PDB entry: 1QKU) [8] shows how an agonist ligand precisely and productively folds ER-α and stabilizes the unique agonist position of helix-12 (Figs. 4 and 5). When estradiol binds to ER-α, it interacts through the 17β-hydroxyl group with His-524, which in turn forms a hydrogen bond with the backbone carbonyl group of Glu-419 in the loop connecting helix-6 to helix-7. This glutamic acid contacts Glu-339 of helix-3 and Lys-531 of helix-11, forming a hydrogen bond network that favors the helix-12 agonist position. The loop between helix-1 and helix-3 accompanies the movement of helix-3. The precise positioning of helix-3 is an important feature for the constitution of the ligand-binding cavity. These interactions are essential for stabilizing the agonist conformation of helix-12. A triple mutant (C381S, C417S, C530S) of ER-α (PDB entry 1QKT) displays limited transcriptional activity upon estradiol stimulation. The structure of this triple mutant, in complex with estradiol, shows helix-12 in an antagonist conformation. This structure helped to elucidate some of the structural features required for agonist activity [8].
7
8 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
2.1.2 Other Estrogen Receptor Agonists Complexes There are numerous examples of nonsteroidal estrogen receptor agonists. Diethyl stilbene (DES; Fig. 6a) is one example of a nonsteroidal agonist ligand for ER-α; its structure has been determined (PDB entry: 3ERD) [9]. The structure of the ligand-binding core of the protein in this complex is nearly superimposable on the structure of the protein in the estradiol/ER-α complex. The A-ring of DES superimposes exactly with the A-ring of estradiol and results in interactions with Glu-353, Arg-394 and a structural water molecule. The D-ring phenol makes a hydrogen bond with His-524 but does so with a different geometry than does the hydroxyl group of estradiol (Fig. 6). Recall that in the estradiol/ER-α complex there are large unoccupied cavities opposite the a-face of the B-ring and the/Mace of the Cring. In the DES complex, the olefin group is out-of-plane with both aromatic rings. This results in the ethyl groups occupying the unoccupied cavities that are present in the estradiol complex and creating new unoccupied cavities in the positions filled by the B-, C- and D-rings of estradiol. Thus, like the estradiol complex, DES binds to the receptor by hydrogen bonding interactions at the two ends of the ligand and
Fig. 6 (a) Illustration of the polar interactions of DES with ER-". (b) Structure of raloxifene core (RalC). (c) Wall eyed stereo view of DES (black) superimposed on estradiol (gray).
2 Steroid Receptors
hydrophobic interactions in the middle. In addition, like the estradiol complex, there are several gaps between the surface of the ligand and the surface of the protein. The structure of another nonsteroidal ligand, raloxifene-core (RalC; Fig. 6b), has been determined in complex with ER-α (PDB entry: 1GWQ) [43]. In this complex, the structure of the ligand-binding core of the protein in this complex is nearly superimposable on the structure of the protein in the estradiol/ER-α and DES/ER-α complexes and the A-ring phenol binds in the same manner as that of estradiol and DES. The D-ring phenol hydrogen bonds with His-524 but does so differently from the way estradiol or DES does. The central hydrophobic core of RalC makes hydrophobic interactions and leaves several gaps between the surface of the ligand and the surface of the protein. The position occupied by the B-ring of estradiol is unoccupied in this complex. Upon examination of the structures of three complexes between ER-α and agonist ligands, some common features become apparent. The structure of the ligandbinding core of each protein is nearly identical. In each of the three examples discussed, the details of the protein-ligand interactions are different yet they bind to, and presumably stabilize, the same form of the receptor. Each complex is characterized by a less than optimal fit between the hydrophobic cores of the ligand and the protein. In each case the ligand is completely enclosed by the protein. There are no buried hydrophilic groups on either the protein or the ligand that are not involved in a hydrogen bond interaction with their ligand or protein partner, respectively. The role of the ligand is to precisely and productively fold ER-α and stabilize the unique agonist conformation of the protein. 2.1.3 Estrogen Receptor Antagonists Complexes Raloxifene (RAL) is an antagonist ligand for the estrogen receptors. The structures of the complex between RAL and both ER-α (PDB entry: 1ERR) [6] and ER-β (PDB entry: 1QKM) [11] have been reported. In contrast to the ER-α agonist structures discussed above, RAL is not completely enclosed by the protein. In these complexes, solved in the absence of a co-repressor or co-repressor peptide, helix-12 does not fold over the ligand but resides on the protein surface that is usually occupied by co-activator peptides in the agonist co-activator complexes (Fig. 7). For the RAL/ERα complex, the A-ring phenol of RAL interacts in the normal fashion with Glu-353, Arg-394 and a structural water molecule. The D-ring phenol interacts with the side chain of His-524 but does it differently than estradiol, DES or RalC (Fig. 8). In the agonist structures the side chain of His-524 forms hydrogen bonds to both a D-ring substituent and the backbone carbonyl of Glu-419. In the RAL complex His-524 interacts only with the ligand (a similar situation exists in the ER-β/RAL complex). While the B-ring of RAL overlaps with B-ring of estradiol the D-rings do not overlap and the phenol hydroxyl is displaced 5.1 A from the position of the 17β-OH of estradiol. The side chain of RAL extends out of the ligand binding core making extensive hydrophobic contacts with helix-3, helix-5, helix-11 and the loop between helix-11 and helix-12. In this structure, the piperazine (F) ring occupies some of the same space that helix-12 occupies in the agonist structures. In addition, a salt bridge between Asp-351 and the piperazine ring nitrogen is observed. Also of
9
10 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 7 Structure of the complex between ER-" (gray) and the selective antagonist raloxifene (solid spheres). In this structure, solved in the absence of a co-repressor peptide, helix-12 (bold) occupies the groove normally occupied by the LxxLL motif of co-activator peptides. From PDB entry 1ERR.
note is that for the agonists structure between ER-α and RalC and the antagonist structure between ER-α and RAL the A- and D-rings are reversed. For the protein ER-α, the structure of helix-3, helix-5 and the β-structure between helix-5 and helix-6, in contact with the ligand, is nearly identical in both the agonist and antagonist complex structure. However, the structure of helix-6 and the C-terminus of helix-11 that contact the ligand are different in the agonists
Fig. 8 Key polar interactions of the antagonistraloxifene with ER-".
2 Steroid Receptors
Fig. 9 Key polar interactions of the antagonist 4-hydroxytamoxifene with ER-".
and antagonist structures. As mentioned previously, helix-12 does not fold over the ligand and is in a completely different position in the antagonist structure. The structure of ER-α in complex with the selective antagonist 4hydroxytamoxifene (OHT; Fig. 9) has been reported (PDB entry 3ERT) [9]. This structure is quite similar to the raloxifene/ER-α complex structure with the E-ring protruding out of the ligand-binding pocket. The A-ring phenol interacts with Arg394 and Glu-353 in a manner similar to other ER-α ligand complexes (Fig. 9). The protein structure resembles that of the RAL complex in that helix-12 does not fold over the ligand but resides on the protein surface; helix-12 and the dimethylamino group makes a salt bridge with Asp-351. The conformation of the Ar-C C-Ar group of OHT is different from the Ar-C C-Ar conformation of DES. The structure of the pure antagonist ICI-164,384 (ICI; Fig. 10) in complex with rat ER-β has been reported (PDB entry: 1HJ1) [10]. ICI is a substituted estradiol
Fig. 10 Key polar interactions of the antagonist ICI-164,384 with ER-$.
11
12 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
with a 7-α-alkylamide group and it is this group that protrudes out of the ligandbinding pocket. To place this substituent out of the pocket requires that the ligand rotate approximately 180◦ (about the long hydroxyl to hydroxyl axis) in the binding site relative to estradiol. The A-ring phenol still interacts with the usual argi-nine and glutamine side chains; however, the D-ring alcohol interacts with His-430 and this histidine interacts with the backbone carbonyl of Glu-326 (Fig. 10). The primary consequence of the flipped binding mode of the ICI, compared with that of estradiol (normal mode), is an alteration in the nonpolar contacts made by the ligand’s steroidal core. The position adopted by the central framework of ICI readjusts to occupy the cavity maximally and position the ligand’s two hydroxyl groups so that they can optimally interact with the protein. Consequently, the Aring end of the steroid is shifted laterally away from helix-3 and toward helix-6 by about 1 A. In contrast, the D-ring end of the molecule adopts a very similar spatial position in the flipped and "normal" orientations. The ER-β cavity accommodates these changes with only minimal alterations in the positioning of the cavity-lining residues [10]. The N-(n-butyl)-N-methyl-undecanamide side chain of ICI exits the binding pocket in an identical manner to that observed previously for RAL. However, once free from the confines of the ligand-binding pocket, the flexible side chain of the ICI antagonist adopts a conformation that is distinct from that observed with the corresponding regions of RAL and OHT. The ICI side chain, unlike the corresponding regions of RAL and OHT, is not tethered to the LBD through a salt bridge with Asp-258 Helix-12 is invisible in the experimental electron density maps, suggesting that it is highly mobile. The lack of a stable orientation of helix-12 can be directly attributed to the binding mode of ICI. The protruding 7α-sub-stituent of ICI sterically prevents alignment of helix-12 over the cavity as with other antagonists, but, in addition, the positioning of the terminal amide portion of the side chain precludes helix-12 from adopting an alternate orientation along the co-activator-binding cleft [10] 2.1.4 Genistein — An ER-β Partial Agonist Genistein (GEN- Fig 11a) is an isoflavonoid natural product that is an ER-β selective partial agonist The structure of the complex between human ER-yS and genistein has been reported (PDB entry 1GKM) [11] GEN binds across the ligand-binding cavity in a manner similar to estradiol with ER-α The A-ring phenol of GEN interacts with Arg-346, Glu-305 and a structural water molecule in a manner the same as most ligands. The D-ring phenol forms a hydrogen bond with His-475, which in turn makes a hydrogen bond with the backbone carbonyl of Glu-371. The other D-ring phenol forms an intramolecular hydrogen bond with the keto group of the C-ring; these polar groups occupy the α-position off the C- and D-rings of estradiol (Fig. 11). Unlike estradiol and other agonist ligands, helix-12 does fold over and enclose the partial agonist genistein. In this structure, helix-12 occupies a position on the surface of the protein between helix-3 and helix-5. The primary determinant of helix-12 positioning appears to be the burial of its hydrophobic face against the protein. The positions of helix-12 in the presence of
2 Steroid Receptors
Fig. 11 (a) Key polar interactions of the partial agonist genistein with ER-$. (b) Structure of THC and its key interactions with ER-". THC is an ER-" agonist and an ER-$; antagonists (see text for details).
pure ER-α agonists, such as estradiol and diethylstilbestrol [6, 9], and that seen in the ER-β/GEN complex both fulfill this objective. In the archetypal "agonist" orientation, typified by the ER-α/estradiol complex, the N-terminal end of helix-12 appears to be stabilized by two hydrophilic interactions. The helix is capped by the side chain of a residue on helix-3, Asp-351, which interacts with the main chain amide of Leu-491. In addition, Tyr-537, located between helix-11 and -12, makes a hydrogen bond with helix-3 residue Asn-348. These two interactions effectively anchor the N-terminal end of helix-12 and position it so that Leu-540 can seal the ligand-binding cavity. This "agonist" conformation of helix-12 completely encloses the ligand within the binding cavity. The asparagine present at position 348 in ERα, which interacts with Tyr-537, is replaced by lysine-300 in ER-β. Consequently, the "agonist" orientation of helix-12 may be less favored in ER-β due to loss of the hydrogen bonding interaction made by this residue to Tyr-398. However, the origin of GEN’s "destabilizing" influence on helix-12 remains unclear [11]. 2.1.5 R,R-5,11-cis-Diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol: An ER-α Agonist and ER-β Antagonist The R,R-enantiomer of 5,11-cis-diethyl-5,6,11,12-tetrahydrochrysene-2,8-diol (THC; Fig. 11) exerts opposite effects on the transcriptional activity of the two estrogen
13
14 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
receptor (ER) subtypes, ER-α and ER-β. THC acts as an ER-α agonist and as an ER-β antagonist. The crystal structures of THC bound to both ER-α (PDB entry 1L2J) and ER-β (PDB entry: 1L2J) have been reported [12]. These two structures show how THC exerts different functional effects on the two isotypes of ER by stabilizing distinct conformations of the two receptors. The ER-α conformation favors co-activator binding, while the ER-β conformation precludes co-activator binding. The complex between the C-2 symmetric THC and ER-α shows the A-ring phenol interacting with Arg-394 and Glu-353 and the D-ring phenol interacting with His524 (Fig. 11b). Like other agonist structures, His-524 interacts with the backbone carbonyl of Glu-419. The two α-face ethyl substituents fill space in the middle of the ligand-binding core of the protein. The structure of the ER-α protein in the THC complex is nearly identical to its structure in the estradiol complex. For agonist structures, ligand binding stabilizes the conformations of several protein side chains that form part of the binding site for helix-12 or the loop between helix-11 and helix-12 [12]. THC binds to ER-β in a similar, but distinctly different manner than it does to ER-α. While the A-ring phenol makes very similar interactions in the ER-α and ER-β complexes, the D-ring fails to interact intimately with His-524 in the ER-β complex and it does not indirectly stabilize helix-12 in an agonist conformation. The B-ring ethyl group adopts a different conformation in the ER-β complex; this causes the core ring system to be tilted away from the floor of the binding pocket compared with the ER-α complex. As a result, the D-ring of THC fails to stabilize many of the ER-β side chains in a manner that stabilizes the agonist conformation of helix-12. Specifically, the side chain of His-475 fails to pack against that of Met-479 (equivalent to Met-528 in ER-α), causing the Met-479 side chain to be disordered. The backbone atoms of Met-479 and the residues flanking it adopt a random coil conformation (as compared with the helical conformation adopted by their counterparts in ER-α). As a result, both Leu-476 and Met-479 are not positioned appropriately to form interactions with relevant residues from helix-12 and the preceding loop that would stabilize the active conformation of helix-12. Thus, by positioning certain binding pocket residues in nonproductive conformations, the binding of THC to ER-β disfavors the agonist-bound conformation of helix-12 and shifts the equilibrium towards the inactive conformation of helix-12 [12]. 2.2 Structural Basis for Agonism and Antagonism and Partial Agonism: The Role of the Ligand
The proposed role of agonists is to stabilize a precise conformation of the estrogen receptor that allows for productive dimerization and co-activator binding. The structures of agonist-bound NHRs reveal that helix-12 folds over the ligand into a position that stabilizes recruitment of a co-activator.
2 Steroid Receptors
For the antagonist ligands OHT, RAL and ICI, the positioning of the ligand side chains precludes the agonist-bound conformation of helix-12 by steric hindrance. This mechanism of antagonist action has been referred to as "active antagonism". The ER-β antagonist THC lacks a bulky side chain, and in its complex with ER-β, helix-12 is not sterically precluded from adopting the agonist-bound conformation. Instead, THC antagonizes ER-β by stabilizing nonproductive conformations of key residues in the ligand-binding pocket, thereby disfavoring the equilibrium to the agonist-bound conformation of helix-12 and leading to stabilization of an inactive conformation of helix-12. This mechanism of antagonism has been referred to as "passive antagonism". There are many other examples of NHR ligands that act as antagonists even though they are smaller than the endogenous agonists of these NHRs; thus "passive antagonism" appears to be a common mechanism for antagonists [12]. The position of helix-12 for the ER-β antagonist THC is similar to the position of helix-12 for the ER-β partial agonist GEN. Based on the position of helix-12, ER-β in the conformation stabilized by THC and GEN should be incapable of interacting with co-activators and should not have agonist activity. However, GEN is a partial agonist and THC is a pure antagonist. The simplest model that explains these data is based on two hypotheses [12]. First, helix-12 in the unliganded ER-β LBD is in equilibrium between the inactive conformations observed in the THC and GEN ER-β structures and the active agonist-bound conformation. Second, ligands affect transcriptional activity by shifting this equilibrium rather than inducing a single static conformation of helix-12. Partial agonists, such as GEN, are incapable of shifting the equilibrium in favor of the active conformation to the same extent as full agonists, they are able only to increase the affinity of the receptor for co-activator incrementally. Partial agonist binding should permit the NHR to sample both inactive and active conformations of helix-12 [12] (see also [8]). On the other hand, passive antagonists may prevent the sampling of the agonist conformation or they may stabilize a protein conformation that allows binding to co-repressors [4]. There also exist a class of ligands referred to as SERMs (selective estrogen receptor modulators) that display tissue-selective pharmacology [13]. Raloxifene and tamoxifen are two clinically used SERMs for which structures are available. Crystal structures of the estrogen receptor bound to different ligands (estradiol, tamoxifen, or raloxifene) reveal that ligands of different sizes and shapes induce a spectrum of receptor conformational states. These states can be "interpreted" by the cellular complexion of co-regulators and the environment of the local promoter of the target gene [13]. While the role of agonist’s ligands is presumably understood, the structural basis of antagonism and partial agonism are less well understood. The structural basis of SERM activity is not understood at this time. 2.2.1 Progesterone/Progesterone Receptor Complex Similar to the estrogen/estrogen receptor complex, the 3-keto steroid progesterone binds to the progesterone receptor (PR) by hydrogen bonding interactions at the
15
16 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 12 Interactions of the 3-keto steroid progesterone with PR Gln725 from helix-3 forms a hydrogen bond with the 3-keto group of the ligand In ER the corresponding residue that interacts with the A-ring phenol is a glutamic acid- thus it appears that this interaction plays a key role in the selection of ketones by PR and phenols by ER.
two ends of the ligand and hydrophobic interactions in the middle of the ligand (Fig. 12). The two axial methyl substituents on the steroid core of progesterone contribute significantly to the overall shape complementarities between the steroid and PR. The 3-keto group of the progesterone A-ring forms hydrogen bonds to the side chains of Arg-766, Gln-725 and a water molecule that interacts with both Arg766 and Gln-725. The side chain carbonyl (D-ring 17β acetyl group) of progesterone makes a weak hydrogen bond with the side chain of Thr-894 (Fig. 12). There are no strong hydrogen bond interactions between the D-ring carbonyl oxygen atom of progesterone and the protein in PR, suggesting that the recognition of this group is made mainly through hydrophobic and steric interactions [14]. PR Arg-766 is located at the C-terminal end of helix-5 and aligns with ER-α Arg394. PR Gln-725 is on helix-3 and aligns with ER-α Glu-353. The difference between a glutamine (PR) and a glutamic acid (ER) at the same position of helix-3 plays a role in the preference of PR for a 3-keto steroid and the preference of ER for an Aring phenol. Sequence and structural alignments [15] show that androgen receptor (AR), glucocorticoid receptor (GR) and mineralocorticoid receptor (MR) also have a glutamine at this position in helix-3. These NHRs have 3-keto steroids as their natural ligands. However, some other NHRs that do not recognize 3-keto steroids, such as RXR, hepatocyte nuclear factor-4 (HNF-4) and Drosophila melanogaster nuclear receptor Ultraspiracle (dUSP), also have a glutamine at this position. The PR Thr-894 residue is located on helix-11 and is in the i+4 position relative to ER His-524. 2.2.2 Androgen Receptor Complexes The structure of the androgen receptor (AR) complex with the natural agonist dihydroxytestosterone (DHT; Fig. 13) has been reported (PDB entry 1137) [16]. It binds to AR by hydrogen bonding interactions at the two ends of the ligand
2 Steroid Receptors
Fig. 13 Interactions of DHT with AR. As i n other steroid NHR complexes, the ligand is oriented on both ends by polar interactions. Mutagenesis experiments have implicated Thr-877 as a key residue that determines ligand selectivity.
and hydrophobic interactions in the middle of the ligand. The two axial methyl substituents on the steroid core of DHT contribute significantly to the overall shape complementarities between the steroid and AR. The A-ring ketone group forms hydrogen bond interactions with Gln-711, Arg-752 and a structural water molecule that bridges those two residues. Met-745 makes VDW contacts with the A-ring and the C-19 β-methyl group. The D-ring hydroxyl group makes hydrogen bond interactions with the side chains of Asn-705, from helix-3, and the side chain of Thr-877 from helix-11 (Fig. 13). Although there is only 55% sequence identity between the LBDs of AR and PR, there is a 77% sequence similarity, and as expected, the three-dimensional structures of these two LBDs are very similar. The ligand binding cores of the two proteins are quite similar in size and shape. DHT binds to the AR LBD in an almost identical fashion to the way progesterone binds to the PR LBD. Both agonists interact with helices 3, 5 and 11 of their respective LBDs. The interactions of ring A of each ligand with its receptor are similar, specifically those with the side chains of Gln-711, Met-745 and Arg-752 in the AR LBD, corresponding to those with Gln-725, Met-759 and Arg-766 in the PR LBD, and a conserved water molecule. The interactions of AR and PR with ring C are also similar, with close contacts to the main-chain of Leu-704 (Leu-718 in the PR LBD) and side chain of Asn-705 (Asn-719 in the PR LBD). The contact between C18 of DHT and the Oyl of Thr-877 is unique to the wild-type AR LBD, as the corresponding cysteinyl side chain is pointed away from the steroid in the PR LBD structure. These D-ring
17
18 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 14 Key polar interactions of dexamethasone with GR.
polar interactions are presumably responsible for the ability of AR to select the proper ligands correctly. The AR point mutant T877A shows promiscuity for other steroid ligands such as progestins, estrogens and Cortisols that differ from DHT in substitution at position 17 on the D-ring. The structure of this mutant, complexed with DHT, has been reported (PDB entry 1138) and is nearly identical to the wild type structure. The replacement of T887 by alanine leaves additional space off the D-ring and may explain its promiscuity [16]. 2.2.3 Glucocorticoid Receptor The structure of the glucocorticoid receptor (GR) in complex with the agonist ligand dexamethasone (Fig. 14) has been reported (PDB entry: 1M2Z) [17]. As with the 3-keto steroid ligands for PR and AR, the A-ring carbonyl of dexamethasone forms direct hydrogen bonds to Arg-611 and Gln-570 (Fig. 14). Like other agoniststeroid receptor complexes, dexamethasone does not have a precise fit in the ligandbinding pocket and occupies 65% of the binding pocket volume. The high affinity binding of dexamethasone to GR is readily explained by the extensive hydrophobic and hydrophilic interactions between the ligand and the protein. One or more hydrophobic residues within the GR protein contact nearly every atom of the steroid core of dexamethasone. In addition, all of the hydrophilic groups of dexamethasone form hydrogen bonds with the protein. Compared with the ligand-binding pocket found in the PR, AR or ER structures, the GR pocket has an additional branch extending from the center in the side of the steroid pocket. This additional side pocket in GR is formed by the structural rearrangement of helices-6 and -7. The side chain of Asn-564 is oriented in a way to allow it to make hydrogen bonds to the C-ring 11-hydroxyl and the D-ring 24-hydroxyl. Furthermore, the 21-hydroxyl (off the C17 position) and the 22-carbonyl form hydrogen bonds with residues Gln-642 and Thr-739, respectively. The extensive hydrogen bond network between GR and the ligand observed here are likely to contribute to the high affinity binding of dexamethasone [17].
2 Steroid Receptors
2.2.4 Steroid Ligand Selectivity Steroid hormones such as Cortisol, corticosterone, testosterone, progesterone and estrogen share a similar core chemical structure but mediate distinct biological responses. Structural comparisons of GR, AR, PR, and ER have now provided some insight into how specificity is achieved by the steroid receptors. In the structures of steroid receptors complexed with steroid agonists, the core steroid template assumes a common orientation with the A-ring oriented toward a conserved arginine from helix-5 and the D-ring toward helix-12. However, many subtle differences in the secondary structure and the topology of the ligand-binding pockets exist in different steroid receptors. In particular, helices-6 and -7 of GR deviate significantly from ER, AR and PR and produce a unique side pocket in GR. This pocket may account for the GR selectivity of glucocorticoids, which have larger substituents at the C17α position compared with estrogen, progesterone and testosterone. Interestingly, the mineralocorticoids that selectively bind MR have similar substituents at the C17α position. To date no structure for MR has been reported; however, MR may also have a similar pocket for these large C17α substitutes, since the residues that form the GR side pocket are also conserved in MR (Fig. 15) [17].
Fig. 15 Structure of some steroid ligands.
19
20 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Besides the shape differences, the polar atoms are also distributed differently in the steroid pockets with respect to the specific protein-ligand hydrogen bonds. For example, the polar substituents in steroid hormones are often located at positions C3 or C17. As discussed above, in the PR structure, the C3 ketone accepts hydrogen bonds from Arg-766 and Gln-725. These residues are also present in AR, GR and MR; however, in ER, the glutamine is replaced by glutamate, which prefers to accept a hydrogen bond from the phenolic group in the C3 position of the A-ring of estradiol [7]. These differences in hydrogen bond formation may explain why PR, as well as AR, GR and MR prefer the steroid hormones with a ketone at the C3 position, whereas ER prefers a phenol group. The larger GR side pocket can best explain the selectivity of GR ligands with larger substituents at the C17 position. While MR presumably has a similar pocket to GR the selectivity of MR for mineralocorticoids can be attributed to the differences in hydrogen bonding patterns between the receptor and the ligands. The MR-selective steroids, corticoste-rone, aldosterone and 11-deoxycorticosterone, all lack the 17α-hydroxyl group, which forms a specific hydrogen bond with Gln-642 in the GR structure At this position, MR has a hydrophobic leucine (Leu-848) that should disfavor the presence of a polar hydroxyl in this region [17]. Many of the structures between steroid receptors and agonist ligands show a less than optimal steric fit between the ligand and the protein However it appears that the subtle differences in the shape and electronic properties of the ligand binding pockets of steroid receptors are sufficient to either select the correct ligand or to deselect the incorrect one. Structures exist for a common ligand R1881 bound to both PR (PDB entry; 1E3K) and AR (PDB entry: 1E3G) [18] This ligand shows good binding affinities to both proteins and the affinities are comparable to the affinity of the natural ligand for each protein (Fig 16) PR can bind R1881 as well as progesterone in both the AR and PR complexes R1881 binds similarly with the A-ring ketone interacting with Arg-766 in PR (Arg-752 in AR) and Gln-725 in PR (Gln-711 in AR) and a structural water molecule. The overall shape of the ligand-binding pocket is similar in both PR and AR and many of the nonpolar side chains contacting the ligand are the same in both proteins. There are a total of 18 amino acid residues in AR and PR that interact
Fig. 16 Examples of interactions in the D-rings of steroids that account for ligand selectivity.
3 The Vitamin D Receptor-Ligand Complexes
with the bound ligands (either R1881 or progesterone). Most of these residues are hydrophobic and interact mainly with the steroid scaffold, whereas a few are polar and may form hydrogen bonds to the polar atoms in the ligand. The 17β hydroxyl group of R1881 forms different hydrogen bonds, when bound to AR or PR. In AR, the 17β hydroxyl group is hydrogen-bonded to Asn-705 and Thr-877. The same pattern is observed in the PR-R1881 complex where the 17β hydroxyl group of R1881 also forms a hydrogen bond to Asn-719. In contrast to the AR complex, Cys-891 (Thr-877 in AR) shows a weak interaction with the 17β hydroxyl group of R1881. Thr-894 in PR is replaced by Leu-880 in AR, and a methyl group of this leucine makes a van der Waals contact with both the 17α methyl group and 17β hydroxyl group of R1881. This bulkier side chain in AR is very likely responsible for the failure of AR to recognize the 17β acetyl group of progesterone. While there is an extra polar residue (Thr-877 besides Asn-705 which is also present in PR) that can form an additional hydrogen bond to the 17β hydroxyl oxygen it is likely that the decrease in pocket volume caused by the change of Thr-894 to Leu-880 inhibits the binding of bulkier ligands such as progesterone [18] While there are few published examples that directly address issues of steroid selectivity some trends are apparent The steroid receptor-agonist-ligand complexes do not involve precise steric fits between the protein and the ligand In each case the ligand is completely enclosed within the ligand-binding core of the protein While the steric match is not precise there are few or no polar groups present that are not involved in a protein-ligand hydrogen bond Thus one element of selectivity involves having a polar protein group to complement a ligand polar group The presence of a glutamate on helix-3 (E353 on ER-α) is responsible for the preference of ER for ligands containing an A-ring phenol In PR AR GR and MR the corresponding position on helix-3 has a glutamine and is responsible for the exclusion of A-ring phenol containing ligands and the preference of A-ring ketone containing ligands. Subtle changes elsewhere in the ligand-binding pocket appear to be responsible for the B-, C- and D-ring preference. It is not clear whether ligand selectivity of steroid receptors is due to the exclusion of the wrong ligand(s) or to the selection of correct ligand.
3 The Vitamin D Receptor-Ligand Complexes
The vitamin D receptor (VDR) binds the vitamin D metabolite la,25dihydroxyvitamin D3 (VDX; Fig. 17) as its natural agonist ligand. VDX is a secosteroid hormone, that is, it is a steroid with an opened B-ring. The structure of the VDX/VDR complex has been reported (PDB entry: 1DB1) using a deletion mutant of VDR [19, 20]. The structure of the protein in the complex resembles a typical agonist conformation with helix-12 folded over and enclosing the ligand. The hydroxylated A-ring makes polar interactions with several groups on the protein. The pseudo-equatorial hydroxyl group forms hydrogen bonds with the side chains of Ser-237 and Arg-274. Arg-274 is located on helix-5 while Ser-237 is located
21
22 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 17 Polar interactions of VDX with VDR.
on he-lix-3, this is the equivalent position of Glu-353 of ER-α and Gln-725 of PR. The axial hydroxyl group forms hydrogen bonds with the phenol group of Tyr-143 and the hydroxyl group of Ser-278. Tyr-143 is located at the C-terminus of helix-1 and Ser-278 is located at the near the C-terminus of helix-5, this is the equivalent position of Arg-394 of ER-α and Arg-766 of PR. The tertiary hydroxyl group off the D-ring interacts with two different histidine side chains, His-305, which is located between helix-6 and helix-7, and His-397, which is located on helix-11. His-305 also hydrogen bonds to the side chain of Gln-400 from helix-11 (Fig. 17). Comparison of the agonist-bound structure of ER-α and VDR provides insight into how different NHRs can select different ligands. The gross structures of these two proteins differ in the relative positions of helix-7 and helix-11 relative to helix-3 and in the conformation of the protein between the end of helix-5 and the beginning of helix-7. These changes alter the size and shape of the binding pocket to better accommodate estradiol (ER-α) or VDX (VDR). Key side chain differences, such as VDR Ser-237 (Glu-353 in ER-α) and VDR Ser-278 (Arg-394 in ER-α), affect the electronic complementarity of the NHR for the appropriate ligand. The crystal structures of the ligand-binding domain of the vitamin D receptor complexed to VDX (Fig. 18) and the 20-epi analogues, MC1288 (PDB entry 1IE9) and KH1060 (PDB entry 1IE8), show that the protein conformation is identical, further suggesting that for a given LBD the agonist conformation is unique. In all complexes, the A- to D-ring moieties of the ligands adopt the same conformation and form identical contacts with the protein. The explanation of the super-agonist effect of the 20-epi analogs is to be found in higher stability and longer half-life of the active complex, thereby excluding different conformations of the ligand-binding domain [21]. Vitamin D3-resistant rickets is associated with mutations to VDR, some of which effect ligand binding and ligand-dependent transactivation. The missense mutation R274L causes a > 1000-fold reduction in VDX responsiveness and is no longer regulated by physiological concentrations of the hormone. There have been two reports of the use of the VDR structure to design selective ligands that bind the mutant receptor in preference to the wild type VDR [22, 23].
3 The Vitamin D Receptor-Ligand Complexes
In one study, it was proposed that the R274L mutation removed a polar interaction between the protein and the natural ligand VDX and created a hydrophobic hole in its place. SS-III, a derivative of VDX, was designed to fill that hole. It was prepared and found to be 286 times more potent than VDX against the mutant protein [22] (Fig. 18). In another study, computER-αided molecular design was used to generate a focused library of nonsteroidal analogues of the VDR agonist LG190155 (Fig. 19) that were uniquely designed to complement the R274L protein associated with Vitamin D3-resistant rickets. Half of the designed analogues exhibit substantial activity in the R274L mutant. The seven most active designed analogues (such as A-11; Fig. 19) were more than 16 to 526 times more potent than VDX in the
Fig. 18 (a) The R274L VDR mutation fails to bind the natural ligand VDX. (b) The ligand SS-III was designed and found to bind to this mutant protein.
23
24 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 19 Structures of ligands for VDR R274L mutant.
mutant receptor. Significantly, the analogues are selective for the nuclear VDR and did not stimulate cellular calcium influx which is associated with activation of the membrane-associated vitamin D receptor (Fig. 19) [23].
4 The Retinoic Acid Receptors RAR and RXR 4.1 Introduction
Retinoid acids (RAs), the active retinoid derivatives of vitamin A regulate complex gene networks The pleiotropic effects of active retinoids are transduced by their cognate nuclear receptors retinoid X receptors (RXRs) and retinoic acid receptors (RARs) which act as transcriptional regulators activated by two stereoisomers of retinoic acid (RA): 9-cis RA (9cRA) and all-trans RA (atRA). Among the nuclear receptors, RXR occupies a central position and plays a crucial role in many intracellular signaling pathways as a ubiquitous heterodimerization partner with numerous other members of this superfamily Whereas RARs bind both retinoic acid isomers RXRs exclusively bind 9cRA The various RAR (RAR-α, -β and -γ ) and RXR (RXR-α -β and -γ ) isotypes are encoded by different genes while their multiple isoforms which differ in their N-terminal region are the results of differential promoter usage and alternative splicing It has been shown that RXRs play a unique role among NHRs since they are able to heterodimerize with a number of members of the NHR superfamily [24] This section will give an overview of the structural biology of retinoid receptor-ligand complexes focusing on the role of the ligand.
4 The Retinoic Acid Receptors RAR and RXR 25
4.2 RAR-γ and RXR-α Retinoid Complexes
The structure of atRA bound to RAR-γ was one of the first reported structures of the ligand-binding domain of an NHR bound to a ligand (PDB entry: 2LBD) [25, 26]. Like the steroid receptors the hydrophobic atRA ligand is buried, but fits loosely, within the ligand binding hydrophobic core of the protein. The protein adopts a typical agonist-bound conformation with helix-12 folding over, and enclosing, the ligand. The carboxylate group of atRA makes hydrogen bonds with both the backbone NH and side chain OH of Ser-289. Ser-289 makes two hydrogen bonds to the guanidine side chain of Arg-278. The carboxylate group of atRA also forms a hydrogen bond with a structural water molecule which is also hydrogen bonded to the backbone carbonyl of Leu-233 Arg-278 is located at the end of helix-5 and Ser-289 is located on a turn between helix-5 and helix-6 (Fig. 20). Comparing the agonist-bound structures of RAR-γ and ER-α shows some significant differences There is a significant change in the positions of helices-1, -3, -6, -7 and -9, and a small change in the positions of helices-11 and -12 relative to helices-4 and -5 In addition the protein conformation between the end of helix-5 and the beginning of helix-7 is very different between ER-α and RAR-γ As with ER-α there is an arginine at the end of helix-5 (Arg-278 in RAR-γ Arg-394 in ER-α) Glu-353 on helix-3 of ER-α is replaced by Cys-237 in RAR-γ Unlike ER and the other steroid receptors this residue in RAR-γ (Cys-237) does not appear to play a key role in ligand recognition As was seen for VDR changes in the gross structure of the protein result in changes in the relative positions of the helices. These changes alter the size and shape of the binding pockets to better accommodate the appropriate ligands. Clearly different NHRs have evolved structures that are complementary to their respective ligands. RAR-γ is a permissive receptor and will also bind 9cRA as an agonist; the structure of the protein in this complex (PDB entry 3LBD) [27] is nearly identical to that
Fig. 20 Schematic of interactions between atRA and RAR-(.
26 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
in the atRA complex. The polar carboxylate group of 9cRA interacts with RAR-γ in almost the same manner that atRA does and the rings of 9cRA and atRA fill the same hydrophobic binding site. However, the less than perfect fit of these two ligands to RAR-γ allows the tetraene chain to adopt different conformations (20-methyl group on opposite sides of the pocket) in the two complexes. In each complex the 19-methyl group occupies approximately the same space. The difference in chain conformations, coupled with a loose steric fit, allows for either a cis or trans olefin to be accommodated at the 9–10 position (Fig. 21). In contrast to RAR, RXR can select 9cRA in preference to atRA. The structure of the RXR-α/9cRA complex has been reported (PDB entry 1FBY) [24]. This structure exhibits the typical agonist protein conformation with helix-12 folded over the ligand. The carboxylate of 9cRA forms hydrogen bonds to the backbone NH of Ala-237 and the guanidine group of Arg-316. In addition, the carboxylate makes a hydrogen bond to a water molecule; that water forms hydrogen bonds to the backbone carbonyl of Leu-309 and to another water that is hydrogen bonded to the side chain of Gln-275. Arg-316 is located at the end of helix-5 and is in the equivalent position to Arg-278 in RAR-α and Ala-237 is in the equivalent position
Fig. 21 (a) Schematic of the interactions between 9cRA and RAR-(. (b) Superposition of atRA (black) and 9cRA (gray) bound to RAR-(.
4 The Retinoic Acid Receptors RAR and RXR 27
Fig. 22 Schematic of the polar interactions between 9cRA and RXR-". The shape of the RXR-" protein is complementary to 9cRA and explains its selectivity.
to Ser-289 of RAR-α (Fig. 22). Of interest, Gln-275 is located on the same position of helix-3 as Gln-725 in PR. As discussed previously this glutamine is conserved in several other steroid receptors that bind 3-keto steroids and is equivalent to Glu-353 in ER-α. Recall that in the steroid receptors this residue plays a key role in A-ring recognition. However, in RXR-α this glutamine residue is solvent exposed and makes an indirect polar interaction with the ligand. The hydrophobic ligandbinding pocket has a distinct L-shape to it. The cis-olefm of 9cRA allows the ligand to bend and accommodate this binding site. While the fit of 9cRA to RXR-α is not precise, the shape of the binding site precludes the binding of incorrect ligands such as atRA. This readily explains why RXR-α is selective for 9cRA as opposed to at RA. 4.3 Selectivity of RAR Ligands and RAR Isotypes
A very elegant experiment has been reported where enantiomers of an unnatural RAR-γ -selective ligand have been determined [28]. Of special significance is the observation that one of the enantiomers is effectively inactive in the biochemical assays. While acquiring co-crystal structures of many active ligands with the same protein is now commonplace, obtaining structures of inactive (i.e., weakly active) compounds is more challenging. This work provides a high-resolution structure (PDB entry 1EXX) of the complex between an inactive compound (BMS270395, Kd ≈ 500 µM) and RAR-γ . The structure (PDB entry: 1EXA) of the more potent en-antiomer (BMS270394, Kd = 500 nM) and RAR-γ was also determined for direct comparison (Fig. 23). The conformational profiles and energies of the two enantiomers are, by definition, mirror images of each other. Thus, the ligand conformational arguments made by Klaholz et al. [28] are not correct. If one superimposes the mirror image of
28 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 23 Structures of enantiomeric ligands BMS270394 is a 500 nM RAR-( ligand while its enantiomer BMS270395 is a very poor RAR-( ligand.
the inactive enantiomer on the active isomer it is apparent that the bound conformations of the two ligands are similar. The A-rings (except for the fluorine), amide groups and alcohol moiety superimpose almost exactly. The major difference in the conformations is the torsion angle (C O)-CH-C(Ar)-C(Ar); it is 53◦ and −102◦ for the active and inactive enantiomer, respectively. A quick MM2 calculation predicts a 119◦ torsion angle is a low energy local conformational minimum [29]. These calculations suggest that it is unlikely that the conformation energy difference between the active and inactive enantiomers is close to 4 Kcal mol−1 . It would appear that ligand conformational strain does not play a dominant role in the large difference in affinity between the two enantiomers. Thus, the major component of the large difference in affinity between these two isomers must come from protein-ligand interactions. The structure of the more active enantiomer (BMS270394) shows that the ligand binds to RAR-γ in a typical agonist conformation (Fig. 24). The carboxylate group of the ligand forms hydrogen bonds to Ser-289 and a water molecule in the same
Fig. 24 Key interactions between the active enantiomer BMS270394 and RAR-(. A key interaction involves a hydrogen bond between the ligand hydroxyl group and the sulfur of Met-272. Notice that the linking amide group makes complementary interactions with the enzyme.
4 The Retinoic Acid Receptors RAR and RXR 29
fashion that atRA does. The hydrophobic CD ring system makes VDW contacts in the hydrophobic ligand-binding core. The hydroxyl group of the ligand makes a hydrogen bond (3.2 Å) with the thioether of Met-272. The amide group also interacts well with the protein, the NH hydrogen bonds to the backbone carbonyl of Leu-271 and the amide carbonyl interacts with three aromatic H atoms; two of these interactions are much shorter than the third. The shortest of these interactions (3.17 Å) is with Phe-230. The interactions of carbonyl oxygens with aromatic hydrogens are well precedented, favorable interactions in protein-ligand complexes. These are weakly polar interactions between the positive dipole of an aromatic C H bond and the electrons of a carbonyl oxygen. The fluorine atom exhibits short VDW contacts with the Cα and Cβ atoms of Ala-234. For the more active isomer, all of the ligand’s polar groups make complementary polar interactions with the protein, the fluorine atom comes into close VDW contact with the protein, and the hydrophobic groups are buried in hydrophobic parts of the protein. The structure of the weakly active enantiomer (BMS270395) shows that the ligand also binds to RAR-γ in a typical agonist conformation (Fig. 25). The carboxylate group of the ligand makes hydrogen bonds to Ser-289 and a water molecule in the same fashion that BMS270394 does. The hydrophobic CD ring system creates VDW contacts in the hydrophobic ligand-binding core in much the same manner as BMS270394. In addition, the hydroxyl group of the inactive ligand also forms a strong hydrogen bond (3.2 A) with the thioether of Met-272. If the "inactive" and the "active" enantiomer adopted the same bound conformation, the hydroxyl group would still have had room to fit, indicating that the interaction between the hydroxyl group and the methionine is quite significant in determining the structure of the complex. Thus, the "inactive" enantiomer makes many of the same interactions with the protein that the “active” enantiomer does. The major differences between
Fig. 25 Key interactions between the "inactive" enantiomer BMS270395 and RAR-(. Like the active enantiomer, a key interaction involves a hydrogen bond between the ligand hydroxyl group and the sulfur of Met-
272. In this isomer the amide group does not make favorable interactions with the protein and the A-ring shows some disorder in the positions of the fluorine atoms.
30 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
the two are the position of the linking amide group and the position and occupancy of the fluorine atom and the conformation of the ligand. In the "inactive" enantiomer complex, the amide group does not make complementary interactions with the protein, and unlike the "active" enantiomer, the polar amide group is buried in the protein without making favorable compensatory interactions. Also, unlike the "active" enantiomer, BMS270395 shows two different orientations for the fluorine atom, in both orientations there are sub-optimal interactions between the fluorine and the protein. While the aromatic A-rings are in nearly the same position in both ligands, subtle differences in the position of the A-ring appear to be responsible for the disorder in the fluorine atoms. Furthermore, while the bound conformations of the two ligands are different, as discussed above, the conformational differences are not likely to be too severe. It appears that the burial of the polar amide and the less than optimal fit of the fluorine atom are the key determinant of enantiomer selectivity. Both of these structures show a hydrogen bond to the thioether of Met-272. Clearly, this is an important recognition element for binding to RAR-γ . In both RAR-α and RAR-β, the residue corresponding to Met-272 is an isoleucine. BMS270394 is much less potent against both RAR-α and RAR-β. While the structure of BMS270394 with either RAR-α or RAR-β has not been reported, one can speculate that the burial of the hydroxyl group near the isoleucine is responsible for the weaker affinity of these ligands for these isotypes. A paper has been published that discusses the structural basis for isotype selectivity of the RARs [30]. This paper reported structures of three RAR ligands bound to RAR-γ , BMS184394 (PDB entry: 1FCX), an RAR-γ -selective agonist, CD564 (PDB entry 1FCY), an RAR-β/γ co-agonist and BMS181156 (PDB entry 1FCZ), a panagonist (Fig. 26). Residues that directly contact the ligand are mostly conserved
Fig. 26 Structures of RAR ligands. BMS184394 is an RAR-(-selective ligand, CD564 is an RAR–$/( co-agonist, and BMS181156 is an RAR panagonist.
4 The Retinoic Acid Receptors RAR and RXR 31
in all three isotypes. The only differences are that Met-272 in RAR-γ is replaced by an isoleucine in both RAR-α and -β, and that Ala-234 in RAR-γ is replaced by a serine in RAR-α. BMS181156 has approximately the same activity against all three RAR isotypes and it is the most potent of these three ligands against RAR-γ . The structare of its complex bound to RAR-γ shows the bicyclic ring system bound in a hydrophobic site similar to BMS270394 and the carboxylate interacting with Ser-289 in the same manner as other RAR-γ ligands. This ligand binds to RAR-γ in a manner similar to 9cRA with the carbonyl group of BMS181156 occupying the same space as the C19 methyl group of 9cRA. In the BMS181156 complex, the carbonyl group makes close contacts with the Cε atom of Phe-304 and the Cε atom of Met-272. These close contacts have been described as C H O C hydrogen bonds [30]. The conformation of the Met-272 side chain is different in this complex to that in the BMS270394 and BMS270395 complexes, in the BMS181156 complex the sulfur atom does not interact directly with the ligand. CD564 shows comparable affinity towards RAR-γ and RAR-β but is 40-fold less potent against RAR-α. It is 5-fold less potent against RAR-γ than BMS181156. The structare of its complex bound to RAR-γ shows the bicyclic ring system bound in a hydrophobic site similar to BMS181156 and the carboxylate interacting with Ser-289 in the same manner as other RAR-γ ligands. The carbonyl group of CD564 makes a close contact with Cζ of Phe-304, which is in a different conformation than it is in the BMS181156 complex. The loss in potency against RAR-α is attributed to a steric clash between the naphthalene ring of the ligand and a serine side chain. In RAR-γ (and RAR-fl), this residue is Ala-234 and, in the complex structare, this residue is in close contact with the naphthalene ring. For BMS181156 the acyclic linker avoids this steric clash allowing potent binding to RAR-α. BMS184394, a mixture of enantiomers, is 100-fold less potent than BMS181156 against RAR-γ ; however, it shows 10-fold selectivity over RAR-β and 100-fold selectivity over RAR-α. This ligand binds to RAR-γ , as a single enantiomer, in a manner nearly identical to CD564. The major difference in the ligand is the difference between a ketone and an alcohol. As with BMS270394 and BMS270395, the alcohol group of BMS184394 forms a hydrogen bond to the thioether of Met-272. The side chain of Met-272 is in the same conformation as it is in the BMS270394 and BMS270395 structures. Thus, this side chain adopts different conformations depending upon the presence or absence of a hydrogen bond between the ligand and the thioether. The data suggests that the interaction of a ligand alcohol group with the thioether of Met-272 plays a key role in RAR-γ selectivity. The significant loss of affinity upon going from a ketone (CD564) to an alcohol (BMS184394) suggests that the observed C H O C hydrogen bonds in the ketones play an important role in stabilizing the complex and the alcohols cannot take advantage of this type of interaction. For RAR-γ , the alcohol-containing ligands can gain sufficient affinity by forming a hydrogen bond with Met-272 and thus becomes a selective ligand. SR11254, the oxime of CD564, is a potent (4 nM) RAR-γ -selective agonist [31]. Its structure in complex with RAR-γ (PDB entry 1FD0) shows that it binds in a
32 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 27 Key interactions of the RAR-(-selective agonist with RAR-(.
manner nearly identical to CD564 (Fig. 27) [32]. The ligand binds as a ∼1:1 mixture of E and Z oximes. In both isomers, the hydroxyl group forms a hydrogen bond to the Met-272 thioether. The side chain of Met-272 resembles the conformation found in other complexes in which the ligand forms a hydrogen bond to the Met272 side chain. In this structure, close contacts between the oxime oxygens and protein methyl groups are observed. The Z-isomer interacts with the CE methyl group of Met-272 and the E-isomer interacts with the Cγ 2 methyl group of Ile-275. These have been characterized as C H O hydrogen bonds and are proposed to help stabilize the complex. The Klaholz and Moras paper is also noteworthy for its discussion of C H O hydrogen bonds [32]. The above studies provide a structural explanation for how very similar NHRs (i.e., the isotypes of RAR) can selectively bind different ligands. In each case a key polar interaction (ligand carboxylate) and the burial of a substantial amount of hydrophobic surface allows binding to occur. Tight binding occurs when there are no bad steric or electrostatic interactions. The case of enantiomer selectivity (BMS270394 vs. BMS270395) occurred when only one of the enantiomers could bury an amide group and make complementary interactions. RAR-γ and RAR-β differ by only one amino acid group in the ligand-binding pocket (RAR-γ Met272 is an lie in RAR-β). Ligands that can form a hydrogen bond to the thioether of Met-272 tend to be selective for RAR-γ . RAR-β and RAR-α differ by only one amino acid group in the ligand-binding pocket (RAR-γ Ala-234 is an Ala in RAR-β and a Ser in RAR-α). Selectivity for RAR-β over RAR-α can be achieved with
5 PPAR: Isotype-Selective Ligands
Fig. 28 Structures of RXR ligands.
ligands that make close VDW contacts with the side chain of Ala-234 in RAR-γ (or RAR-β). These ligands presumably bind poorly to RAR-α due to a bad steric contact. Thus, it is possible to achieve selectivity against very similar isotypes of the same NHR and it is possible to rationalize these selectivities based on structure. It is likely that in the future subtle selectivity will be rationally designed into ligands. 4.3.1 RXR Complexes with Unnatural Ligands The structures of two structurally related RXR-selective ligands have been reported. The complex between BMS-649, RXR-α and a co-activator peptide (PDB entry 1MVC) [33] and between LG-268 and RXR-β (PDB entry 1H9U) [34] are similar and will be discussed briefly below. The carboxylate group of both ligands interacts with the backbone NH of Ala-327 (RXR-α, Ala-398 RXR-β) and Arg-316 (RXR-α, Arg-387 RXR-β in the same manner that 9cRA does. The D-ring binds in the same place that the ring of 9cRA does (Fig. 28). The shape of the ligand is complementary to the L-shaped RXR ligand-binding pocket. This shape complimentarity explains why these ligands are selective for RXR over RAR. One significant difference between these two structures is the position of helix12; in the complex between RXR-α, BMS-649 and a co-activator peptide, helix-12 is in the standard agonist position. In the complex between LG-268 and RXR-β helix-12 is in a novel position not seen in other NHR structures. In this position LG-268 does not fold over and cap the ligand. Additional data indicate that LG-268 is unable to release co-repressors from RXR unless co-activators are also present; this suggests that certain RXR ligands alone may be inefficient at repositioning helix-12 [34].
5 PPAR: Isotype-Selective Ligands
Peroxisome proliferator-activated receptors (PPARs), are members of the steroid/retinoid nuclear receptor family of ligand-activated transcription factors. There are three isotypes of the PPAR family, PPAR-α, PPAR-δ and PPAR-γ . Various natural fatty acids (FAs) and eicosanoids serve as ligands for the PPARs.
33
34 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Interestingly, several unsaturated FAs that activate the PPARs in vitro have pharmacological effects similar to those reported for the synthetic PPAR ligands. Based upon these observations and the promiscuous ligand-binding properties of the PPARs, it has been suggested that these receptors serve as physiological sensors of lipid levels, linking FA concentrations to glucose and lipid homeostasis [35]. The biological profiles of each have led to a number of ligated structures having been reported. This section will focus upon how ligands can select between these three isotypes. Unlike the RARs, there are several amino acid differences in the subtypes that directly contact the ligand. Thus, there are more options for obtaining isotype selective ligands and it may be more challenging to obtain ligands that show high affinity for all isotypes. The structure of the FA eicosapentaenoic acid (EPA) bound to PPAR-δ has been determined (PDB entry 3GWX) [35]. The ligand-binding pocket of PPAR-δ assumes roughly a "Y" shape with each of the three arms approximately 12 Å in length. One arm of the Y is composed of a mix of hydrophobic residues and polar residues and is capped by two residues from helix-12. This is the only arm of the Y that is substantially polar in character. Another arm of the Y is composed of hydrophobic residues and is sealed by the last residue of helix-1, and the loop between helices-1 and -2. The third arm of the Y-shaped pocket is composed mainly of hydrophobic residues with a few polar residues. The interior of the ligand-binding pocket is accessible via a channel that is exposed to solvent [35] The EPA complex crystal has two molecules in the asymmetric unit; each independent molecule has two distinct conformations for the EPA ligand (Fig. 29). For EPA in each complex molecule, the acid group and the first eight carbons adopt very similar conformations. The hydrophobic tails of EPA diverge in the different bound conformations with each occupying a different arm of the Y. The carboxylate head group of EPA interacts with several polar residues in one arm of the Y. Surprisingly, the details of these interactions are different in each of the four copies of the EPA in the crystal. There are three polar residues that interact with the carboxylates, His-323 from helix-5, His-449 from helix-11 and Tyr-473 from helix-12. His-449 forms a buried hydrogen bond to the side chain of Lys-367. PPARA and the PPARs in general, use a helix-12 residue to make a specific hydrogen bond to agonist ligands, unlike many of the other NHRs. The structure of the fibrate ligand GW2433, a PPAR-α/δ co-agonist, has also been reported bound to PPAPwJ (PDB entry: 1GWX; Fig. 30) [35]. In the structure, the carboxylate forms hydrogen bonds to two histidines (His-323 and His-449) and to Tyr-473 from helix-12. The rest of the molecule makes substantial VDW contacts with all three branches of the Y-shaped ligand-binding pocket. These are mostly hydrophobic interactions between ligand and protein; however, a hydrogen bond between the ligand urea NH and the side chain of Cys-285 is observed. While GW2433 is a PPAR-α/δ co-agonist, the structure of its complex with PPAR-α has not been disclosed. As will be discussed below, the residue corresponding to His-323 in PPAR-α is a tyrosine (Tyr-314). AZ-242 is a PPAR-α/γ co-agonist and its structure has been determined in complex with both PPAR-α (PDB entry 1I7G) and PPAR-γ (PDB entry 1171) [36]. Like
5 PPAR: Isotype-Selective Ligands
Fig. 29 Interactions of EPA with PPA* (a) The carboxylate head group of the ligand interacts with three protein residues His-323 His-449 and Tyr-473 (b) Two distinct confermations of the hydrophobic tail of the ligand were observed The ligand-binding site of PPARU5 is Y-shaped and each conformation of EPA fills a different branch of the Y.
PPAR-α, both PPAR-α and PPAR-γ have Y-shaped ligand binding sites; however, AZ-242 only fills two of the three sites. PPAR-α has four polar residues that interact with the carboxylates, Tyr-314 from helix-5, His-440 from helix-11, Tyr-464 from helix-12 and Ser-280 from helix-3. His-449 forms a buried hydrogen bond to the side chain of Lys-358. Tyr-314, His-440 and Tyr-464 are in the equivalent positions as His-323, His-449 and Tyr-473 in PPAR-δ. Thus, a major difference between PPAR-α and PPAR-δ is the change of a histidine in PPAR-δ to a tyrosine in PPARα. The dihydrocinnamate group of AZ-242 has a different local conformation than that of the phenoxy acid group of GW2433, as a result the carboxy groups of both ligands interact with the protein in distinct manners. This may explain why AZ-242 is a PPAR-α agonist and not a PPAR-α5 agonist (Fig. 31). The structure of AZ-242 bound to PPAR-γ has also been reported. The conformation of the dihydrocinnamate group resembles the conformation found in the
35
36 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 30 Interactions of the PPAR-"/* co-agonist with PPAR-,5 The Yshape of the ligand is complementary to the Y-shape of the ligand binding pocket One of the three arms of the Y makes polar interactions with the carboxy-late head group of the ligand the interactions in the other two arms are mostly hydrophobic.
PPAR-α complex. In PPAR-γ , Tyr-314 in PPAR-α is replaced by His-323, thus the carboxylate interacts with two histidines (His-323 and His-449) and one tyrosine (Tyr-473). This arrangement of polar groups is the same as found in PPAR-δ; however, there are subtle differences in the position of each residue relative to each other in the two isotypes. In addition, one of the carboxylate oxygens also forms a
Fig. 31 Key polar interactions of PPAR-"/( co-agonist AZ-242 with PPAR-".
5 PPAR: Isotype-Selective Ligands
Fig. 32 Key polar interactions of PPAR-"/( co-agonist AZ-242 with PPAR-(. Notice that the equivalent residue of PPAR-" Tyr-314 is His323 in PPAR-(.
hydrogen bond with Ser-289. One difference between the two AZ-242 complexes is the conformation of the phenoxy "tail" of the ligand; this results in slight differences in the final position of the phenyl sulfonate group. One difference between PPAR-γ and PPAR-α is the presence of Cys-275 in PPAR-α, which is Gly-284 in PPAR-γ . The different tail conformations of the phenoxy linker allow the sulfonate group of AZ-242 to avoid the Cys-275 side chain in PPAR-α (Fig. 32). The thiazolidinediones (TZDs) are a class of antidiabetic agents that were subsequently shown to be potent PPAR-γ agonists. TZDs are usually selective for PPAR-γ although exceptions are known [37, 38]. The structure of the TZD rosiglitazone in complex with PPAR-γ has been reported both in the presence (PDB entry 1FM6) [3] and absence (PDB entry 2PRG) [39] of RXR-α ligated with 9-cis-retinoic acid. Both complexes are quite similar and the discussion will center around the heterodimeric complex. Like AZ-242 rosiglitazone occupies two arms of the Y-shaped liand-binding site. The polar TZD head forms hydrogen bonds with His-323 and Tyr-473 and also makes longer polar interactions with His-449 and Ser-289 (Fig 33). Thus, the TZD group mimics a carboxylate group. The nonpolar part of rosiglitazone makes VDW contacts within the ligand-binding site. The larger size of a TZD group compared with a carboxylate group usually allows it to be selective for PPARγ over the other isotypes. For PPAR-δ, the carboxylate binding site is narrower than in the other isotypes; thus the larger TZD head group cannot interact with the polar groups on the PPAR-S protein. The presence of the Tyr-314 precludes the TZD head group from making the proper polar contacts with the PPAR-α protein. Consistent with this proposal, the PPAR-γ H323Y mutant binds rosiglitazone 50-fold weaker than wild type PPAR-γ . In addition, the PPAR-α Y314H mutant binds rosiglitazone
37
38 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
Fig. 33 Key polar interactions of PPAR-(-selective agonist rosiglitazone with PPAR-( The TZD group acts as a carboxylate mimetic.
weakly while the wild type PPAR-α does not bind it to any measurable extent [40]. Clearly, the steric demands of the carboxylate-binding pocket play a major role in determining isotype selectivity. It has been speculated that the shift in the TZD head group caused by Tyr-314 in PPAR-α results in an unfavorable conformation in the remainder of the TZD ligand. The TZD KRP-297 has been reported to show dual PPAR-α/γ activity [41]. Unlike rosiglitazone, KRP-297 has a meta-substituted side chain across the central phenyl rings. The meta-substituted side chain may allow an improved fit in the PPAR-α protein [40]. Farglitazar (GI262570) is a PPAR-γ /α co-agonist that shows 1000-fold selectivity for PPAR-γ over the α-isotype. The structure of its complex with PPAR-γ has been reported in the heterodimeric complex with ligated RXR-α (PDB entry: 1FM9) [3]. Farglitazar binds to PPAR-γ in a similar manner that AZ242 does; the dihydrocinnamate moieties bind nearly identically and, like AZ242, the carboxylate interacts with His-449, Tyr-473, His-323 and Ser-289 (Fig. 34). The tails of the two ligands bind by VDW contacts in the same pocket although the details of the binding are different. Thus, only two of the three branches of the Y-shaped pocket are occupied. The most dramatic difference between these ligands is the size of the substituent adjacent to the carboxylate. For AZ242 it is an O-ethyl group, and for farglitazar it is the much larger amino benzophenone group. The side chain of Phe-363 moves in the farglitazar structure relative to the AZ242 structure to open a pocket that accommodates the second ring of the benzophenone group. A close analogue of farglitazar, GW409544, is also a co-agonist that binds only ten times poorer towards PPAR-α (Fig. 35). Neither farglitazar nor GW409544 bind to, or activate, PPAR-δ to any measurable extent. The structures of GW409544 bound to both PPAR-γ (PDB entry: 1K74) and to PPAR-α (PDB entry: 1K7L) have
5 PPAR: Isotype-Selective Ligands
Fig. 34 Key polar interactions of PPAR-(-selective agonist farglitazar with PPAR-(. The larger benzophenone group fits into a pocket created by the movement of Phe-363 relative to its position in the rosiglitazone complex.
Fig. 35 Structure of the PPAR-"/( co-agonist CW409544 and the PPAR-*-selective agonist CW501516.
39
40 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
been reported [40]. GW409544 binds to PPAR-γ in a manner very similar to, but not identical to, the way farglitazar binds to PPAR-γ . The α-amino side chains bind differentiy in the two structures. GW409544 binds to PPAR-α and PPAR-γ in nearly identical manners despite slight changes in key protein residues contacting the ligand. The equivalent residue to PPAR-γ Phe-363 is Ile-354 in PPAR-α, and as previously discussed PPAR-α His-323 is substituted by the larger Tyr-314 in PPAR-α. A modeling exercise proposed that for farglitazar to interact with the larger Tyr-314 in PPAR-α a shift in the ligand position would result in a steric clash between a benzophenone ring and Phe-273. The structure of the complex between GW409544 and PPAR-α suggests that adding three atoms to the vinylogous amide of GW409544 to generate farglitazar would result in a steric clash with Phe-273. Consistent with this proposal, the PPAR-α Y314H mutant potently binds farglitazar [40]. A very interesting discussion about the key factors that result in PPAR subtype specificity has been published by the Glaxo group [40]. This discussion is now summarized below. The PPAR-α and PPAR-δ ligand-binding pockets are significantly larger than the PPAR-δ pocket because of the narrowing of the pocket adjacent to helix-12. It is notable that only a handful of potent PPAR-δ ligands have been described. Ligands such as TZDs and L-tyrosine-based agonists like farglitazar show little or no binding to PPAR-α. In both cases, their acidic head groups seem to be too large to fit within the narrow PPAR-δ pocket. In contrast, the potent PPAR-δ agonist GW501516 contains an unsubstituted phenoxy-acetic acid head group that complements the narrow PPAR-δ ligand-binding pocket. Fibrate ligands, which generally bind to PPAR-δ only at high micromolar concentrations, contain small alkyl substituents adjacent to the carboxylate group. A PPAR-δ mutant M417V, which allows fibrate ligands to bind to PPAR-δ, has been reported [42]. This mutation is likely to increase the size of the PPAR-δ pocket to facilitate the binding of the small alkyl substituents adjacent to the carboxylate. Thus, the reduced size of the PPAR-δ pocket is a major determinant of ligand binding to this subtype. In comparison with PPAR-α, the PPAR-α and PPAR-γ ligand-binding pockets are closer in size and shape to each other. The Glaxo group has found that a major determinant of selectivity between these two subtypes is the substitution of Tyr-314 in PPAR-α for His-323 in PPAR-γ . These amino acids form part of the network of hydrogen-bonding residues that are involved in the activation of the receptor by its acidic ligands. Overlay of the PPAR-α and PPAR-γ crystal structures reveals that the larger volume of the Tyr-314 side chain in PPAR-α forces a 1.5 A shift in the position of the high-affinity ligand GW409544. The structurally related ligand farglitazar is unable to accommodate this shift because of a steric interaction with PPAR-α Phe-273. As a result, farglitazar shows 1000-fold selectivity for PPAR-γ over PPAR-α. The point mutations of Y314H in PPAR-α and H323Y in PPAR-γ demonstrate that these single amino acids are, in large part, responsible for determining the subtype selectivity of farglitazar. In each case, a 10–100-fold shift in the potency of the ligand was observed. Compared with farglitazar, GW409544 has three atoms removed to allow it to shift within the PPAR-α pocket without clashing
6 Conclusions
with Phe-273. The potent dual PPAR-α/γ agonist activity of GW409544 results from a complementary match of the re-engineered ligand with both the PPAR-α and PPAR-γ ligand-binding pockets. TZD ligands also respond to the point mutation of Y314H in PPAR-α and H323Y in PPAR-γ with a corresponding increase in PPAR-α and decrease in PPAR-γ activity, respectively. These data suggest that the TZDs, which do not contain the large N-substituents present in the L-tyrosinebased ligands, also have difficulty accommodating the 1.5 Å shift required to bind to PPAR-α. It was speculated that the shift in the TZD head group results in an unfavorable conformation in the remainder of the molecule. It is interesting to note that a single amino acid difference in PPARs has such a dramatic impact on ligand selectivity, given that the PPAR pocket is composed of more than 25 amino acids [40].
6 Conclusions
NHRs function as either activators or repressors of gene expression. Small molecule ligands (hormones) are the regulators of these proteins. Structural studies have characterized at least two distinct forms of ligated NHRs, agonist-bound NHR with a peptide derived from a co-activator and antagonist-bound NHR with a peptide derived from a co-repressor. These define two states for the NHR, the first one is the "on" state where genes are expressed and the second is the "off-state where gene expression is repressed. There are other structures that may, or may not, represent intermediate states between "on" and "off". The discussions above, while not comprehensive, have focused on how ligands can bind to and alter the conformational state of the NHR. More importantly, the discussion has focused on the important issue of ligand selectivity for a specific NHR. A system that is regulated by ligands is only effective if the proteins that are being regulated respond only to a specific ligand, or to just a few key ligands. Thus, the ability of estradiol to regulate ER-α and ER-β and not to regulate other related proteins such as GR and AR is the key to this regulatory system. The key trends about how ligands selectively bind to and agonize their NHR target protein will now be summarized. A striking observation for many of these agonist NHR complexes is the lack of high steric complementarity between the mostly hydrophobic ligand and the ligandbinding core of the protein. The complex between estradiol and ER-α shows that there are large unoccupied cavities between the ligand and the protein. In spite of this, estradiol is a potent and selective ligand for ER-α and the closely related ER-β. Another good example is found in the complexes between PPAR-γ and rosiglitazone (Fig. 33) and the larger ligand farglitazar (Fig. 34). Both are potent selective ligands; however, the larger ligand is 90 times more potent. While the polar thiazolidedione moiety of rosiglitazone plays a key role for its isotype selectivity for PPAR-γ , the larger benzophenone moiety of farglitazar is required for PPAR-γ
41
42 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
selectivity. Thus, for farglitazar a precise relative fit of the polar carboxylate functionality and one of the apolar groups work together to achieve PPAR-γ selectivity. As discussed previously, the Glaxo group has shown that changes to the benzophenone moiety of farglitazar produce ligands that agonize both PPAR-γ and PPAR-α. They have also determined that a single polar amino acid change between PPARγ (His-323) and PPAR-α (Tyr-314, equivalent to PPAR-γ His-323) is the primary determinant of PPAR-γ selectivity for both rosiglitazone and farglitazar. The conclusion reached is that the PPAR isotype selectivity is determined by the polar ligand group making a precise interaction with the protein, with the remainder of the ligand being well tolerated by the protein. Many steroid hormones are mostly hydrophobic molecules that have polar groups at opposite ends of the molecule (A- and D-rings). For these steroid receptors a key polar residue on a common position of helix-3 helps the estrogen receptors (Glu353 in ER-α) select for A-ring phenols, while PR, AR, GR and MR, which have a glutamine at this position, are selective for 3-keto A-rings (Figs. 4 and 12). Further ligand discrimination occurs primarily by additional polar interactions near the Dring (Figs. 12, 13, 14, and 15). For the steroid receptors, agonist selectivity appears to be determined by matching of polar groups on the ends of the ligand with complementary protein polar groups. The hydrophobic-α of the ligands and the apolar nature of the ligand binding site provides a mechanism for obtaining high affinity, provided there are no polar mismatches, in spite of less than optimal steric complementarity. Another good example of this is the case of two enantiomers (BMS270394 and BMS270394, Fig. 23) of the same ligand binding to RAR-γ . Both ligands orient the carboxylate, hydroxyl group, the A-, C- and D-rings in approximately the same manner. The linking amide group of the more potent isomer makes complementary interactions with the protein (Fig. 24) while the less potent isomer does not (Fig. 25). Thus we see a vivid example of where effective agonist binding occurs when a hydrophobic ligand makes a less than optimal steric fit in the core of the NHR and has the matching of polar ligand groups with complementary protein polar groups. The "inactive" isomer must bury the polar amide group in the protein interior and thus bind very weakly. Comparison of RAR and RXR shows distinctly different shapes of their ligandbinding pockets. Thus, RAR can effectively bind either all-trans-retinoic acid or 9-cis-retinoic acid as agonists, while RXR can only accommodate the bent ligand 9-cis-retinoic acid. These data make it tempting to speculate that selectivity is determined not by a precise match of the "right" ligand with the "right" NHR, but rather by the exclusion of "wrong" ligands by a mismatch of polar groups or an impossible steric fit. To conclude this chapter, the molecular recognition of ligands by NHRs is determined by a number of subtle factors. Agonism or antagonism is due to the ability of a given ligand to bind to the appropriate form of the protein. It is likely that nonpolar interactions provide binding affinity even in the absence of good shape complementarity. Selectivity appears to be guided by the avoidance of interactions, either polar or apolar, that are unfavorable.
References
Acknowledgements
I would like to thank Dr. Michael J. Rynkiewicz for careful review of this document and for several suggestions that helped shape the final version. References 1 OLEFSKY, J. M. Nuclear receptor minire-view series. Biol. Chem. 2001, 276, 36863–36864. 2 ROSENFELD, M. G., GLASS, C. K. Coregulator codes of transcriptional regulation by nuclear receptors. J. Biol. Chem. 2001, 276, 36865–36868. 3 GAMPE, R. T., Jr., MONTANA, V. G., LAMBERT, M. H., MILLER, A. B., BLEDSOE, R. K., MILBURN, M. V., KLIEWER, S. A., WILLSON, T. M., XU, H. E. Asymmetry in the PPARγ /RXRalpha crystal structure reveals the molecular basis of heterodimerization among nuclear receptors. Mot Cdl 2000, 5, 545–555. 4 XU, H. E., STANLEY, T. B., MONTANA, V. G., LAMBERT, M. H., SHEARER, B. G., COBB, J. E., MCKEE, D. D., GALARDI, CM., PLUNKET, K. D., NOLTE, R. T., PARKS, D. J., MOORE, J. T., KLIEWER, S. A., WILLSON, T. M., STIMMEL, J. B. Structural basis for antagonist-mediated recruitment of nuclear co-repressors by PPARalpha. Nature 2002, 415, 813–817. 5 EVANS, R. M. The Steroid and Thyroid Hormone Receptor Superfamily Science 1988, 240, 889–895. 6 BRZOZOWSKI, A., PIKE, A., DAUTER, Z., HUBBARD, R., BONN T. ENGSTROM, O., OHMAN, L., GREENE, G., GUSTAESSON, J., CARLQUIST, M. Molecular basis of agonism and antagonism in the oestrogen receptor. Nature 1997, 389, 753–758. 7 TANENBAUM, D. M., WANG, Y., WILLIAMS, S. P., SIGLER, P. B. Crystallographic comparison of the estrogen and progesterone receptor’s ligand binding domains. PNAS 1998, 95, 5998–6003. 8 GANGLOFF, M., RUFF, M., EILER, S., DUCLAUD, S., WURTZ, J. M., MORAS, D. Crystal structure of a mutant hERalpha ligand-binding domain reveals key structural features for the mechanism of partial agonism. J. Biol. Chem. 2001, 276, 15059–15065.
9 SHIAU, A. K., BARSTAD, D., LORIA, P. M., CHENG L. KUSHNER, P. J., AGARD, D. A., GREENE, G. L. The structural basis of estrogen receptor/coactivator recognition and the antagonism of this interaction by tamoxifen. Cell 1998, 95, 927–937. 10 PIKE, A. C., BRZOZOWSKI, A. M., WALTON, J., HUBBARD, R. E., THORSELL, A. G., LI, Y. L., GUSTAFSSON, J. A., CARLQUIST, M. Structural insights into the mode of action of a pure antiestrogen. Structure (Camb.) 2001, 9, 145–153. 11 PIKE, A. C. W., BRZOZOWSKI, A. M., HUBBARD, R. E., BONN T. THORSELL, A.-G., ENGSTROM, O., LJUNGGREN, J., GUSTAESSON, J.-A., CARLQUIST, M. Structure of the ligand-binding domain of oestrogen receptor beta in the presence of a partial agonist and a full antagonist. EMBO J. 1999, 18, 4608–4618. 12 SHIAU, A. K., BARSTAD, D., RADEK, J. T., MEYERS, M. J., NETTLES, K. W., KATZENELLENBOGEN, B. S., KATZENELLENBOGEN, A., AGARD, D. A., GREENE, G. L. Structural characterization of a subtype-selective ligand reveals a novel mode of estrogen receptor antagonism. Nat. Struct. Biol. 2002, 9, 359–364. 13 KATZENELLENBOGEN, B. S., KATZENELLENBOGEN, J. A. Biomedicine: Enhanced: Defining the S in SERMs. Science 2002, 295, 2380–2381. 14 WILLIAMS, S., SIGLER, P. Atomic structure of progesterone complexed with its receptor. Nature 1998, 393, 392–396. 15 WURTZ, J. M., BOURGUET W. RENAUD, J. P., VIVAT, V., CHAMBON, P., MORAS, D., GRONEMEYER, H. A canonical structure for the ligand-binding domain of nuclear receptors. Nat. Struct. Biol. 1996, 3, 87–94. 16 SACK, J. S., KISH K. F., WANG C. ATTAR, R. M., KIEEER, S. E., AN Y. WU, G. Y., SCHEEELER, J. E., SALVATI, M. E., KRYSTEK, S. R., Jr., WEINMANN, R., EINSPAHR, H. M.
43
44 Molecular Recognition of Nuclear Hormone Receptor-Ligand Complexes
17
18
19
20
21
22
23
Crystallographic structures of the ligandbinding domains of the androgen receptor and its T877A mutant complexed with the natural agonist dihydrotestosterone. PNAS 2001, 98, 4904– 4909. BLEDSOE, R. K., MONTANA, V. G., STANLEY, T. B., DELVES, C. J., APOLITO, C. J., MCKEE, D. D., CONSLER, T. G., PARKS, D. J., STEWART, E. L., WILLSON, T. M., LAMBERT, M. H., MOORE, J. T., PEARCE, KH., XU, H. E. Crystal structure of the glucocorticoid receptor ligand binding domain reveals a novel mode of receptor dimerization and coactivator recognition. Cell 2002, 110, 93–105. MATIAS, P. M., DONNER, P., COELHO, R., THOMAZ, M., PEIXOTO C. MACEDO, S., OTTO, N., JOSCHKO, S., SCHOLZ, P., WEGG A. BASLER, S., SCHAEER, M., EGNER, U., CARRONDO, M. A. Structural evidence for ligand specificity in the binding domain of the human androgen receptor. Implications for pathogenic gene mutations. J. Biol. Chem. 2000, 275, 26164–26171. ROCHEL, N., WURTZ, J. M., MITSCHLER, A., KLAHOLZ, B., MORAS, D. The crystal structure of the nuclear receptor for vitamin D bound to its natural ligand. Mol. Cell 2000, 5, 173–179. ROCHEL, N., TOCCHINI-VALENTINI, G., EGEA, P. F., JUNTUNEN, K., GARNIER, J.M., VIHKO, P., MORAS, D. Functional and structural characterization of the insertion region in the ligand binding domain of the vitamin D nuclear receptor. Eur. J. Biochem. 2001, 268, 971–979. TOCCHINI-VALENTINI, G., ROCHEL, N., WURTZ, J. M., MITSCHLER, A., MORAS, D. Crystal structures of the vitamin D receptor complexed to superagonist 20-epi ligands. PNAS 2001, 98, 5491–5496. SWANN, S. L., BERGH, J. J., FARACH-CARSON, M. C., KOH, J. T. Rational design of vitamin D3 analogues which selectively restore activity to a vitamin D receptor mutant associated with rickets. Org. Lett 2002, 4, 3863– 3866. SWANN, S. L., BERGH, J., FARACH-CARSON, M. C., OCASIO, C. A., KOH, J. T. Structurebased design of selective agonists for a
24
25
26
27
28
29 30
31
32
33
rickets-associated mutant of the vitamin D receptor. J. Am. Chem. Soc. 2002, 124, 13795–13805. EGEA, P. F., MITSCHLER, A., ROCHEL, N., RUEE, M., CHAMBON, P., MORAS, D. Crystal structure of the human RXR{alpha} ligand-binding domain bound to its natural ligand: 9-cis retinoic acid. EMBO J. 2000, 19, 2592–2601. RENAUD, J. P., ROCHEL, N., RUEE, M., VIVAT, V., CHAMBON, P., GRONEMEYER, R. MORAS, D. Crystal structure of the RAR-gamma ligand-binding domain bound to all-trans retinoic acid. Nature 1995, 378, 681–689. WAGNER, R. L., APRILETTI, J. W., MCGRATH, M. E., WEST, B. L., BAXTER, J. D., FLETTERICK, R. J. A structural role for hormone in the thyroid hormone receptor. Nature 1995, 378, 690–697. KILAHOLZ, B. P., RENAUD, J. P., MITSCHLER, A., ZUSI, C., CHAMBON, P., GRONEMEYER, R., MORAS, D. Conformational adaptation of agonists to the human nuclear receptor RAR gamma. Nat. Struct. Biol. 1998, 5, 199–202. KIAHOLZ, B. P., MITSCHLER, A., BELEMA, M., ZUSI C. MORAS, D. Enantiomer discrimination illustrated by high-resolution crystal structures of the human nuclear receptor hRARgamma. PNAS 2000, 97, 6322–6327. BABINE, R. E. Unpublished, 2002. KIAHOLZ, B. P., MITSCHLER, A., MORAS, D. Structural basis for isotype selectivity of the human retinoic acid nuclear receptor. J. Mol. Biol. 2000, 302, 155–170. YU, K. L., SPINAZZE, P., OSTROWSKI, J., CURRIER, S. J., PACK, E. J., HAMMER L. ROALSVIG, T., HONEYMAN, J. A., TORTOLANI, D. R., RECZEK, P. R., MANSURI, M. M., STARRETT, J. E., Jr., Retinoic acid receptor beta. gamma-selective ligands: synthesis and biological activity of 6-substituted 2-naphthoic acid retinoids. J. Med. Chem. 1996, 39, 2411–2421. KIAHOLZ, B., MORAS, D. C.-H.-O Hydrogen bonds in the nuclear receptor RAR-gamma – a potential tool for drug selectivity. Structure (Camb.) 2002, 10, 1197. EGEA, P. F., MITSCHLER A. MORAS, D. Molecular recognition of agonist ligands
References
34
35
36
37
38
by RXRs. Mol. Endocrinol. 2002, 16, 987–997. LOVE, J. D., GOOCH, J. T., BENKO, S., LI, C., NAGY, L., CHATTERJEE, V. K., EVANS, R. M., SCHWABE, J. W. The structural basis for the specificity of retinoid-X receptorselective agonists: new insights into the role of helix HU. J. Biol. Chem. 2002, 277, 11385–11391. XU, H. E., LAMBERT, M. R., MONTANA, V. G., PARKS, D. J., BLANCHARD, S. G., BROWN, P. J., STERNBACH, D. D., LEHMANN, J. M., WISELY, G. B., WILLSON, T. M., KLIEWER, S. A., MILBURN, M. V. Molecular recognition of fatty acids by peroxisome proliferator-activated receptors. Mol. Cell 1999, 3, 397–403. CRONET, P., PETERSEN, J. F., FOLMER, R., BLOMBERG, N., SJOBLOM, K., KARLSSON, U., LINDSTEDT, E. L., BAMBERG, K. Structure of the PPARalpha and -gamma ligand binding domain in complex with AZ 242; ligand selectivity and agonist activation in the PPAR family. Structure (Camb.) 2001, 9, 699–706. WILLSON, T. M., BROWN, P. J., STERNBACH, D. D., HENKE, B. R. The PPARs: from orphan receptors to drug discovery. J. Med. Chem. 2000, 43, 527–550. BROOKS, D. A., ETGEN, G. J., RITO, C. J., SHUKER, A. J., DOMINIANNI, S. J., WARSHAWSKY, A. M., ARDECKY, R., PATERNITI, J. R., TYHONAS, J., KARANEWSKY, D. S., KAUEEMAN, R. F., BRODERICK, C. L., OLDHAM, B. A., MONTROSE-RAEIZADEH, C., WINNEROSKI, L. L., FAUL, M. M., MCCARTHY, J.R. Design and synthesis of 2-methyl-2-[4-(2-[5-methyl-2-aryloxazol-4yl]ethoxy)phenoxy]propionic acids: a new
39
40
41
42
class of dual PPARalpha/gamma agonists. G. Med. Chem. 2001, 44, 2061–2064. NOLTE, R. T., WISELY, G. B., WESTIN, S., COBB, J. E., LAMBERT, M. H., KUROKAWA, R., ROSENEELD, M. G., WLLLSON, T. M., GLASS, C. K., MILBURN, M. V. Ligand binding and co-activator assembly of the peroxisome proliferator-activated receptor-gamma. Nature 1998, 395, 137–143. XU, H. E., LAMBERT, M. H., MONTANA, V. G., PLUNKET, K. D., MOORE, L. B., COLLINS, J. L., OPLINGER, J. A., KLIEWER, S. A., GAMPE, R. T., Jr., MCKEE, D. D., MOORE, J. T., WILLSON, T. M. Structural determinants of ligand binding selectivity between the peroxisome proliferator-activated receptors. Proc. Natl. Acad. Sci. USA 2001, 98, 13919–13924. MURAKAMI, K., TOBE, K., IDE, T., MOCHIZUKI, T., OHASHI, M., AKANUMA, Y., YAZAKI, Y., KADOWAKI, T. A novel insulin sensitizer acts as a coligand for peroxisome proliferator-activated receptor-alpha (PPAR-alpha) and PPAR-gamma: effect of PPAR-alpha activation on abnormal lipid metabolism in liver of Zucker fatty rats. Diabetes 1998, 47, 1841–1847. TAKADA, I., YU, R. T., XU, H. E., LAMBERT, M. H., MONTANA, V. G., KLIEWER, S. A., EVANS, R. M., UMESONO, K. Alteration of a single amino acid in peroxisome proliferator-activated receptor-{alpha} (PPAR{alpha}) generates a PPAR{delta} phenotype. Mol. Endocrinol. 2000, 14, 733–740.
45
1
The Determination of Equilibrium Dissociation Constants of Protein-Ligand Complexes by NMR Gordon C.K. Roberts
University of Leicester, Leicester, United Kingdom
Originally published in: BioNMR in Drug Research. Edited by Oliver Zerbe. Copyright ľ 2002 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30465-7
1 Introduction
In general terms, the essential requirement for measuring the dissociation constant of a protein-ligand complex spectroscopically is the identification of a signal (from either the protein or the ligand) which has a different value in the complex from that in the free protein or ligand. In this respect, NMR has real advantages, since of the many resonances from the protein and the ligand several are sure to change – in shift, relaxation and/or scalar coupling – on complex formation. Most NMR studies of complex formation are carried out in order to determine the magnitude and direction of these changes in NMR parameters and to use them to draw conclusions about the nature of the complex. Here we will consider only the more limited use of NMR to determine the equilibrium dissociation constant for complex formation. As we shall see, NMR is particularly useful for the measurement of relatively weak binding, typically dissociation constants (K d ) 10−5 M. Accurate measurement of K d requires the use of protein concentrations ≤K d , and the low sensitivity of NMR in terms of the concentrations required obviously sets a practical limit to the range of dissociation constants which can be measured; this is discussed further below. A key factor in the use of NMR for measuring dissociation constants is its sensitivity to the rate of “chemical exchange”. Complex formation necessarily involves the exchange of the nuclei or molecules being observed between (at least) two states – the free ligand or protein and the complex. The fact that the appearance of the NMR spectrum is sensitive not only to the position of this equilibrium but also to the rates involved has a major influence on the design of NMR experiments for measuring dissociation constants and on the analysis of such data.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Determination of Equilibrium Dissociation Constants of Protein-Ligand Complexes by NMR
2 Chemical Exchange and NMR
In the context of NMR, “chemical exchange” refers to any process in which a nucleus exchanges between two or more environments in which its NMR parameters (chemical shift, scalar or dipolar coupling, relaxation) differ. The effect of this exchange process on the appearance of the NMR spectrum depends on the rate of exchange relative to the magnitude of the difference in NMR parameters between the two states. The effects which exchange can have on the appearance of the spectra are illustrated in Fig. 1 for the simple case of exchange between two equally populated states. When the exchange is very slow on the “NMR timescale” – here relative to the magnitude of the chemical shift difference between the two states – two separate resonances are seen at the positions corresponding to the chemical shifts characteristic of the two states. At the other extreme, when the exchange is very fast relative to the chemical shift difference, a single resonance is seen, whose position is the average of the chemical shifts of the two states, weighted by their relative populations (see below). Between these extremes there are complex changes in line shape, which are very sensitive to the precise value of the rate of exchange. Detailed descriptions of these exchange effects can be found in a number of textbooks and reviews [1–7].
Fig. 1 Effects of the exchange rate on the appearance of the NMR spectrum from a system in which a nucleus is exchanging between two equally populated states in which it has a different chemical shift.
2 Chemical Exchange and NMR 3
When attempting to use NMR to measure a dissociation constant, the basic experiment will be to vary the ligand concentration in the presence of a fixed concentration of protein. (The converse experiment, varying the protein concentration, may sometimes be carried out, but is generally less satisfactory because of problems with protein solubility and aggregation.) What one sees in this experiment will depend critically on the rate of exchange of the molecules between the free state and the complex. If the exchange is slow on the “NMR timescale”, the two resonances, corresponding to the bound and free states, will remain in the same positions, but their relative intensities will change. If the exchange is very fast, the position of the single resonance will change progressively as the relative amounts of free protein or ligand change. It is important to recognise that the “NMR timescale” is a relative one. For a given equilibrium, different resonances will show slow, intermediate or fast exchange behavior, depending on how much their chemical shifts differ between the free and bound states. An example of this is shown in Fig. 2. This shows a small region of a superposition of a series of 1 H-15 N HSQC spectra of the calcium-binding regulatory protein S110B, alone and in the presence of increasing concentrations of a peptide corresponding to its binding site on the actin-capping protein CapZ [8]. Three residues whose 1 H and 15 N chemical shifts change on peptide binding are indicated. For Glu72, the chemical shift changes are small, and a progressive movement of the cross peak is seen, indicative of very fast exchange. This is also the case for Ser62, where the shift changes are larger, but here the progressive shift of the cross peak is accompanied by a marked broadening at intermediate concentrations, indicating that for this peak the exchange regime is between fast
Fig. 2 Expanded region of the overlaid 1 H15 N HSQC spectra from a titration of 15 Nlabeled S100B with CapZ peptide. The protein concentration was 1 mM, and the spectra shown are for peptide concentrations of
0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.5 and 3.0 mM. Chemical shift changes for the cross peaks of Ser62, Glu72 and Ala78 are indicated. From Ref. [8] with permission.
4 The Determination of Equilibrium Dissociation Constants of Protein-Ligand Complexes by NMR
and intermediate exchange. Finally, for Ala78, which shows the largest changes in 1 H and 15 N shifts, the broadening at intermediate peptide concentrations is such that the peak disappears completely, characteristic of intermediate exchange rates (cf. Fig. 1). These line-broadening effects can be very valuable as a means of determining the dissociation rate constant of the complex; detailed procedures for doing this are described elsewhere [9]. As described below, they can, however, present complications in the accurate determination of dissociation constants. Bearing in mind the relativity of the “NMR timescale”, it is useful to estimate the range of rates which will produce line-broadening effects. For chemical shift differences1) in the range 10–1000 Hz, intermediate exchange behavior will be observed for dissociation rate constants in the range 102 –104 s−1 ; assuming that the association rate constant is diffusion-limited, ∼107 –108 M−1 s−1 , this will correspond to dissociation constants in the range 10−3 –10−6 M.
3 The Basic Equations
The following descriptions will be couched primarily in terms of chemical shift changes, which are the most widely used for the measurement of dissociation constants; the general principles apply equally to measurements based on other NMR parameters, and these will be touched on later. k+1
E + L EL k−1
(1)
kd = kk −+ 11 = [E][L] [EL]
In discussing the effects of exchange on NMR spectra, an important parameter is the lifetime of a particular state: For a nucleus in the ligand molecule: Lifetime in state EL,
τEL = 1/k − 1
Lifetime in state L, τL = 1/k + 1 [E]
For a nucleus in the protein molecule: Lifetime in state EL,
τEL = 1/k − 1
Lifetime in state E, τE = 1/k + 1 [L]
Using Eq. (1), the lifetimes in the free state of the ligand and protein can be expressed more conveniently as: Lifetime in state L,
τL =
p k − 1 PEL
1) Note that in the context of exchange effects on NMR spectra chemical shift differences
Lifetime in state E, τE =
pE k − 1 PEL
must be measured in Hertz, not parts per million.
3 The Basic Equations
Where pX is the fractional population of species X, so that, for a ligand resonance, PEL =
[EL] LT
(2)
and pL =
[L] = 1 − pEL, LT
(3)
while for a protein resonance pEL =
[EL] ET
(4)
and pL =
[E] = 1 − pEL , ET
(5)
where LT and ET represent the total concentrations of ligand and protein, respectively. From Eq. (1), for a ligand resonance, 2 [EL] (ET + LT + Kd ) − (ET + LT + Kd ) − 4ET LT = pEL = LT 2LT
(6)
with an analogous equation for an enzyme resonance. In discussions of exchange effects in NMR, a single lifetime is often used to characterize the exchange process 1/τ = 1/τEL + 1/τL = k − 1 (1 + pEL / pL ) = k − 1 / pL
(7)
The spectrum in the presence of exchange in the absence of scalar coupling can be simulated by using McConnell’s modification of the Bloch equations [10–12]. (When there is strong scalar coupling, or when the coupling changes because of the exchange process, the density matrix approach must be used [5]; a number of line-shape fitting programs based on this approach are available [5].) For the simple case, the line shape (amplitude as a function of frequency, v) is given by the imaginary part of G(v), G(v) = iC[2PL PEL τ −τ 2 (PL αEL + PEL αL )] PL PEL −τ 2 αL αEL
(8)
where C is a scaling factor and αL = 2πi(vL − v) + 1/T2,L + pEL /τ ,
αEL = 2πi(vEL − v) + 1/T2,EL + pL /τ
5
6 The Determination of Equilibrium Dissociation Constants of Protein-Ligand Complexes by NMR
In general terms, the three exchange regimes are defined by: Slow exchange 1/τ 2π|vL − vEL | Intermediate exchange 1/τ ∼ 2π|vL − vEL | Fast exchange 1τ 2π|vL − vEL | Simpler expressions for the line shape are available for both the fast and slow exchange regimes (see Ref [9]). The possibility of measuring the dissociation equilibrium constant in each of these three regimes will now be considered.
4 Slow Exchange
In this regime, separate peaks are observed for the free and bound states (cf Fig. 1); when the relative amounts of these two states are altered by varying the ligand concentration, the positions of these two signals remain unaltered; only their relative intensity changes. In principle, this change in relative intensity can be used to estimate the dissociation constant; in practice this is only of any value under very restrictive conditions. First, the signals must be sufficiently well resolved to allow accurate intensity measurements; this can be achieved by appropriate isotope labeling – for example, using per-deuterated protein to study ligand resonances. Second, the signal-to-noise ratio must be sufficient to allow accurate intensity measurements at sub-stoichiometric ligand concentrations ([EL] ∼ 0.1ET ). This latter problem is compounded by the fact that accurate measurement of K d requires that ET < K d . Slow exchange implies a slow rate of dissociation, which in turn implies a relatively low value of K d and hence the need for a low protein concentration. For example, for a chemical shift difference of 100 Hz, observation of slow exchange would imply that k–1 ≤ 3×102 s−1 , so that (assuming a diffusion-limited association rate) K d ≤ 3×10−5 M, and one would need to make accurate intensity measurements at protein concentrations of ∼5 µM. Given the insensitivity of NMR, it is therefore very rarely practical to make reliable estimates of K d from spectra in the slow exchange regime.
5 Intermediate Exchange
The intermediate exchange regime covers a relatively narrow range of exchange rates, no more than a factor of two either side of the “coalescence point” (Eq. (9)). Within this regime the full line-shape equation must be used in order to extract the relative populations of the bound and free states, their chemical shifts and the rate of exchange between them. In principle good estimates of these parameters can be obtained from fitting a series of spectra at different ligand concentrations, but in practice limitations of resolution and signal-to-noise ratio – the signals can
6 Fast Exchange
be very broad at the coalescence point – mean that this is rarely achievable. In most circumstances a change in spectrometer frequency will be sufficient to shift the system into either slow exchange (at higher spectrometer frequency) or fast exchange (at lower spectrometer frequency), where the analysis will be easier.
6 Fast Exchange
The fast exchange regime is by far the most useful for the measurement of dissociation constants from changes in chemical shift, relaxation or other NMR parameters – a wide range of K d values, from 10 µM (or perhaps a little less) to ≥10 mM, can be accessible. 6.1 Very Fast Exchange
When the exchange rate is very fast, the chemical shifts and relaxation rates of the nuclei of the ligand and the protein are completely averaged between the bound and free states, and a weighted average spectrum is observed. For ligand resonances δobs,L = pL δL + pEL δEL
R2,obs,L = pL R2,L + pEL R2,EL,
(9)
(10)
where δ obs,L , δ L and δ EL are, respectively, the observed ligand chemical shift and the chemical shifts of the free ligand and of the ligand in the complex, with analogous definitions for R2 (=1/T 2 ), the transverse relaxation rate. Similarly for protein resonances δobs,E = pE δE + pEL δEL
(11)
R2,obs,E = pE R2,E + pEL R2,EL
(12)
From Eq. (9), for example, for a ligand resonance pEL =
δobs − δL δEL − δL
(13)
and this can be combined with Eq. (6) to give 2 δobs − δL (ET + LT + Kd ) − (ET + Lt + Kd ) − 4Et LT = δEL − δL 2LT
(14)
7
8 The Determination of Equilibrium Dissociation Constants of Protein-Ligand Complexes by NMR
and fitting Eq. (14) (or the analogous equations derived from Eqs. (10)–(12)) to the data (δ obs as a function of LT ) will yield estimates for K d and δ EL . This is the commonest approach used to determine K d by NMR; a detailed protocol for experiments of this kind is given elsewhere [9]. 6.2 The General Case of Fast Exchange
It is important to note that Eqs. (9)–(12) are valid only for very fast exchange. If the exchange is somewhat slower (“moderately fast” exchange), there will be an exchange contribution to the line width, and Eq. (10) for the ligand resonance2) will be replaced by R2,obs = PL R2,L + PEL R2,EL +
PEL PL2 4π 2 (δEL − δL )2 k−1
(15)
This equation predicts a different dependence of the line width on the ligand concentration from that predicted by Eq. (10). For large k–1 , Eq. (15) clearly reduces to Eq. (10), and, if changes in relaxation are being used to estimate K d , it is generally sensible to fit the data using Eq. (15) rather than Eq. (10) (the only exception being paramagnetic relaxation, see below). Most important, if the exchange contribution to the line width (the third term on the right hand side of Eq. (15)) is significant, Eqs. (9) and (14) will not be an accurate description of the change in chemical shift with ligand concentration. It has been shown by simulation experiments that the erroneous use of Eqs. (9) and (14) when exchange is not very fast can lead to errors of at least an order of magnitude in K d [13]. For example, Fig. 3 A shows a series of simulated spectra of a ligand resonance at increasing values of LT ; the chemical shift difference between the bound and free states is 100 Hz and k–1 is 500 s−1 . There is clearly a progressive change in chemical shift as the ligand concentration is increased, as would be expected for fast exchange, and the signals appear to have Lorentzian shapes. However, broadening of the resonance line in the middle of the titration is also clearly evident; this arises from the exchange contribution in Eq. (15), which has a maximum when pEL ≈ 1/3. While the chemical shifts obtained from these simulated spectra seem to be described reasonably satisfactorily by Eq. (14), as shown in Fig. 3 B, the K d value estimated from the least-squares fitting of Eq. (14) to the data is in error by almost a factor of two, even when, as here, data is available for a wide range of ligand concentrations (from 0.1ET to 30ET ) [13]. When, more realistically, only ligand concentrations greater than the protein concentration are used in the analysis, the error becomes a factor often for the parameters used in Fig. 3 and can be as much as 100 for somewhat lower values of k–1 /(δ EL – δ L ) [13].
2) Eq. (15) and the following discussion are couched in terms of ligand resonances, but ex-
actly the same considerations apply to protein resonances.
6 Fast Exchange
Fig. 3 Simulation of spectra of a ligand resonance in a system involving proteinligand complex formation. The parameters used for the simulation were: (*EL –*L ) 100 Hz, k−1 500 s−1 , K d 10−5 M, ET 10−3 M, LT (0.1–30) 10−3 M, free line width 5 Hz, bound line width 20 Hz. A (left): Simulated line shapes for LT 0.5 mM (◦), 1.5 mM (), 2.0 mM (+), 2.5 mM (×), 10 mM ().
B (right): Observed chemical shift (*obs –*L ) as a function of total ligand concentration, LT . The points are values calculated from the exact simulation, and the curve is the best fit to Eq. (14), assuming no exchange contribution; the value of K d estimated from the fit is 6.2×10−6 M. From Ref [13] with permission.
This is a potentially serious source of error for combinations of δ EL – δ L ), k–1 and K d which are very likely to be encountered in practice. It is thus essential to establish when it is safe to use Eq. (14) to analyze chemical shift changes. The simplest way to do this is to measure the line width of the resonance of interest as well as its position, and to analyze its dependence on ligand concentration (as described in Refs. [9, 13]) to estimate the magnitude of the exchange contribution (Eq. (15)); only when the exchange contribution is negligible – the very fast exchange condition is satisfied – is it safe to use Eq. (14). In the more general case of “moderately fast” exchange, where the simple Eq. (14) cannot be used, the alternative is either a full line-shape analysis or, more practically, a comparison of measured chemical shifts and line widths as a function of ligand concentration with those obtained from simulated spectra; detailed protocols for this approach are available [9]. Both in order to identify the exchange regime and to obtain accurate and precise values of K d , it is important to make measurements at as low a ligand concentration as possible. 6.3 Paramagnetic Relaxation
For those proteins which contain a paramagnetic center (e.g., heme iron, copper), usually at the active site, the bound ligand will experience a marked increase in relaxation rate due to dipolar interaction with the unpaired electron(s) of the paramagnetic center. While the paramagnetic relaxation effect is commonly used
9
10 The Determination of Equilibrium Dissociation Constants of Protein-Ligand Complexes by NMR
to obtain structural information (the distances between the paramagnetic center and ligand nuclei), it can also be used for the determination of K d (see, for example, Refs. [14, 15]). It is mentioned separately here because, since the magnetic moment of the electron is 1800 times that of the proton, it is, when present, a dominant effect. It magnitude is such that it can only be measured under fast exchange conditions, with a large excess of ligand over protein. This can simplify the analysis (for LT ET , pL ≈1), and it also allows one to use low enzyme concentrations, so that somewhat lower K d values can be measured. On the other hand, these same experimental conditions mean that a long extrapolation to R2,EL or R1,EL is required, and this limits the precision of the measurement.
7 Conclusions
As discussed above, NMR can be a useful tool for the determination of dissociation constants provided that its strengths and limitations are recognized. In practice, it is only useful under fast exchange conditions, and thus for relatively weak binding: K d 10 µM. The upper limit to the dissociation constants which can be measured is usually set by the solubility of the ligand molecule, since concentrations 10 K d are needed to obtain reliable estimates. Modest amounts of cosolvent (e.g., DMSO, dimethyl sulfoxide) are often used to ameliorate this problem, but most proteins will only tolerate small amounts < 10% of such cosolvents, and for accurate data it is important to keep the cosolvent concentration constant during the titration. For dissociation constants in the 10 µM–10 mM range, NMR is certainly competitive with other available methods; it is not dependent on the presence of a chromophore and does not require the separation of bound and free ligand. Provided that simple precautions, described above, are taken to ensure that the appropriate analysis is used, K d measurement by NMR is rapid, simple and accurate. The emphasis in the discussion has been on the analysis of changes in chemical shift and relaxation rate because these are the parameters most commonly used for K d measurements. Any protein-ligand binding process is bound to be accompanied by some change in chemical shift of a resonance or resonances from both the protein and the ligand, although of course the magnitude of such a shift change cannot in general be predicted. A number of resonances and indeed different spectrometer frequencies may need to be examined to find one with a shift large enough to be useful but small enough to be in fast exchange. Changes in relaxation rate are likely to be almost universal for ligand resonances, since the rotational correlation time of the ligand will increase substantially on binding to a macromolecule, though the magnitude of the effect can be complicated by internal motion in the complex. For the protein, on the other hand, changes in relaxation may occur but will be wholly dependent on local changes in mobility. The use of changes in relaxation to study ligand binding to proteins was one of the earliest biological applications of NMR [3, 16] and remains a valuable approach. In fact, of course, any NMR parameter which changes on complex formation and which shows fast exchange
References
behavior (being described by an equation analogous to Eqs. (9)–(12)) can be used to measure K d . Scalar coupling constants are likely to change if the conformation of the ligand or of protein side-chains changes on complex formation, and this can provide useful structural information; however, the magnitude of such changes is rarely large enough to be useful for K d determination. On the other hand, NMR experiments which are used to screen for binding, such as pulsed-fieldgradient measurements of translational diffusion or the measurement of the sign of intramolecular NOEs in the ligand (see, e.g., Ref [17]), can also readily be adapted for quantitative determination of K d – for example, see Ref. [18] for the use of translational diffusion measurements in this way.
References 1 KAPLAN, J. I. and FRAENKEL, G., NMR of Chemically Exchanging Systems. Academic Press, New York, 1980. 2 SANDSTROM, J., Dynamic NMR Spectroscopy. Academic Press, London, 1982. 3 JARDETZKY, O., and ROBERTS, G. C. K., NMR in Molecular Biology. Academic Press, New York, 1981. 4 SANDERS, J. K. M. and HUNTER, B. K., Modern NMR Spectroscopy. Oxford University Press, Oxford, 1987. 5 NAGESWARA RAO, B. D., Methods Enzymol. 1989, 176, 279–311. 6 LED, J. J., GESMAR, H., and ABILDGAARD, F., Methods Enzymol. 1989, 176, 311–329. 7 BERKOWITZ, B., and BALABAN, RS., Methods Enzymol. 1989, 176, 330–341. 8 KILBY, P. M., Van ELDIK, L. J., and ROBERTS, G. C. K., Protein Sci. 1997, 6, 2494–2503. 9 LIAN, L.-Y., and ROBERTS, G. C. K., in NMR of Macromolecules (ed. ROBERTS, G. C. K.), Oxford University Press, Oxford,
pp. 153–182, 1993. 10 MCCONNELL, H. M., J. Chem. Phys. 1958, 28, 430–435. 11 LEIGH, J. S., Jr., J. Magn. Reson. 1971, 4, 308–318. 12 MCLAUGHLIN, A. C. and LEIGH, J. S., Jr., J. Magn. Reson. 1973, 9, 296–305. 13 FEENEY, J., BATCHELOR, J. G., ALBRAND, J. P. and ROBERTS, G. C. K., J. Magn. Reson. 1979, 33, 519–529. 14 MODI, S., PRIMROSE, W. U., BOYLE, J. M. B., GIBSON, C. F., LIAN, L.-Y., and ROBERTS, G. C. K. Biochemistry 1995, 34, 8982–8988. 15 MODI, S., PAINE, M. J. I., SUTCLIFFE, M. J., LIAN, L.-Y., PRIMROSE, W. U., WOLF, C. R.. and ROBERTS, G. C. K., Biochemistry 1996, 35, 4540–4550. 16 FISCHER, J. J. and JARDETZKY, O., J. Am. Chem. Soc. 1965, 87, 3237–3281. 17 ROBERTS, G. C. K., Drug Discov. Today 2000, 5, 230–240. 18 TILLETT, M. L., HORSFIELD, M. A., LIAN L.-Y. and NORWOOD, T. J., J. Magn. Reson. 1998, 133, 379–384.
11
1
Protein-Lipid Interactions during Virus Entry by Membrane Fusion Alex L. Lai, Yinling Li, and Lukas K. Tamm University of Virginia, Charlottesville, USA
Originally published in: Protein-Lipid Interactions. Edited by Lukas K. Tamm. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31151-4
1 Introduction
A large number of animal viruses are enveloped by lipid bilayer membranes. Since enveloped viruses bud during biogenesis from specialized areas of the host cell surface, viral membrane envelopes are also highly specialized in terms of their lipid and protein compositions. Like the cell membranes from which they derive, they are enriched in sphingomyelin, phosphatidylcholine (PC) and cholesterol in the outer leaflet, and phosphatidylethanolamine (PE) and various negatively charged lipids in the inner leaflet. Viral membrane envelopes contain only a very small number, typically one to three, different integral membrane proteins. Most viral membrane envelope proteins are glycosylated and appear as elongated projections or spikes in electron micrographs of these viruses. The different viral spike glycoproteins have a number of specific tasks in viral cell entry. One function is to attach viruses to receptors on the surface of target cells. A second major function is to promote the fusion of viral and target cell membranes, either directly at the cell surface or after endocytosis with the endosomal membrane in a reaction that proceeds only at the lower pH that prevails in the endosome. In some viruses, both functions are packaged into a single protein, whereas in other viruses they are distributed between two different proteins. The atomic structures of the ectodomains of several viral spike glycoproteins have been solved by X-ray crystallography. According to these structures, viral membrane fusion proteins are generally grouped into two classes. Class I viral fusion proteins are elongated proteins characterized by trimeric bundles of helical hairpins with coiled-coil α-helical cores. In contrast, class II viral fusion proteins consist of three β-sheet domains that pair into dimers and form relatively flat lattices on the viral membrane surfaces (Fig. 1).
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
Fig. 1 Diagrams of representative membrane-enveloped viruses with type I and type II fusion proteins. (A) Influenza virus. The type I fusion spike glycoprotein is HA, which is a trimer consisting of three identical subunits. Each subunit is composed of HA1 and HA2 chains. The coiled-coil structures (green) in the stem are formed by the HA2 chains and the globular domains (yellow) at the tip of the molecule are formed by the HA1 chains. The virus also contains the matrix M1 protein (pink) and eight segments of single (−) strand RNA (blue) that together with the nucleocapsid protein NP (grey) form
ribonucleo-protein complexes. The envelope also contains neuraminidase NA (blue) and the proton channel M2 protein (not shown) in relatively small copy numbers. (B) Semliki Forest virus. The type II fusion glycoprotein is the E1 protein (red) that forms together with the receptor binding protein E2 (yellow) a T=4 icosahedral lattice on the membrane surface. The nucleocapsid consists of 240 copies of the capsid protein C (pink). A single (+) strand RNA (blue) contains the entire genome of this alpha-virus. The envelope also contains the small soluble E3 protein in stoichiometric amounts (not shown).
2 Fusion of Pure Lipid Bilayers
2 Fusion of Pure Lipid Bilayers
Pure lipid bilayers do not spontaneously fuse. The headgroups of phospholipids are highly hydrated and hydration repulsion prevents the spontaneous fusion of uncharged lipid bilayers. If bilayers are negatively charged, charge repulsion adds to the repulsive energy between lipid bilayers. At equilibrium, fluid-phase PC binds about 34 molecules of water and PE binds about eight waters per lipid [1]. Therefore, the equilibrium distance measured from headgroup phosphate to headgroup phosphate between two fluid-phase lipid bilayers is about 30 Å for PC bilayers and about 10 Å for PE bilayers. The hydration of negatively charged phosphatidylserine (PS) is similar or slightly larger than that of PC [2]. This hydration barrier must be overcome en route to membrane fusion. There are a few different views of how this could happen. These will be briefly discussed in the following. It has been postulated as early as in the late 1970s and early 1980s that membrane fusion may proceed through “point defects” [3, 4]. The nature of these point defects, however, was not so clear. Markin et al. proposed in a theoretical study that membrane fusion could proceed through an hourglass-shaped “lipid stalk” intermediate [5]. Since lipid stalks are thought to be transient structures that may exist only briefly, these structures have never been observed experimentally in membrane fusion. However, arrays of lipid stalks have recently been observed in X-ray diffraction experiments of aligned stacks of membranes at submaximal hydration [6, 7]. Under specialized experimental conditions, pure lipid bilayers can also be observed to form “hemifusion” diaphragms when they are closely apposed to each other. In hemifusion, the two distal leaflets of two approaching bilayers join to form a single bilayer in the central hemifused region. Depending on the experimental set-up, hemifusion diaphragms may be quite extended. Hemifusion has been observed electrophysiologically in planar bilayer experiments [8, 9] and in the surface forces apparatus when two mica-supported lipid bilayers are mechanically pushed together [10, 11]. The combination of electrophysiological and theoretical studies on pure lipid bilayer fusion led to the “stalk–pore” model of membrane fusion (reviewed in [12]). In this model, lipid stalks radially expand until the two distal monolayers contact each other to form a hemifusion diaphragm. The hemifusion diaphragm is thought to break at some point to form an initial fusion pore. The stalk and initial pore intermediates have different curvatures, whose energies have been calculated in numerous theoretical studies. The effect of lipid additives that alter the spontaneous curvature of lipid bilayers are consistent with the stalk–pore model. For example, lyso-PC (induces positive curvature) added to the proximal monolayers and oleic acid (induces negative curvature) added to the distal monolayers promote fusion, whereas lyso-PC added to the distal monolayers and oleic acid added to the proximal monolayers inhibit fusion [12]. These and a multitude of similar studies on many different reconstituted and biological fusion systems are often taken as direct evidence for the stalk–pore model of membrane fusion. However, the reader should be cautioned: neither theoretical calculations of energies of stalks nor the correlation between the effect of curvature agents on the calculated energy of stalks and membrane fusion prove that neatly organized stalks actually exist as
3
4 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
intermediates in fusion. They just prove that calculated energies with sometimes somewhat arbitrarily chosen boundary conditions are compatible with certain nonbilayer lipid structures and that certain lipophilic agents produce membrane situations (curvature, hydration, defects, etc.) that are conducive/inhibitory to membrane fusion. It is important to make this distinction because the stalk–pore model has become so popular recently that many take it for a fact rather than a model. One should also be aware that it is virtually impossible to distinguish experimentally between different stalk and hemifusion intermediates in fusion of biological membranes because, by definition, lipids can exchange freely between the two membranes through all hemi-fusion intermediates. If well-organized stalks exist, they are probably short-lived transient intermediates that according to the newer models progress directly to fusion pre-pores without the need for a transmonolayer contact hemifusion intermediate [13]. Extended hemifusion diaphragms may occur rather as off-pathway side-products under suboptimal conditions in biological membrane fusion [14–18]. Lentz et al. developed sophisticated methods to monitor many important details of the kinetics of fusion between pure lipid bilayers [19]. In this work, small unilamellar vesicles were induced to fuse by the addition of 5–20% polyethylene glycol of molecular weight 8000. Polyethylene glycol aggregates the vesicles by reducing the activity of water. It has been found that bilayers composed of PC, PE, sphingomyelin and cholesterol at a molar ratio of 35:30:15:20 fuse quite efficiently and in a non-leaky fashion [20]. The kinetics of outer leaflet lipid mixing (fast), transbilayer lipid movement (intermediate), inner leaflet mixing (intermediate) and contents mixing (slow) have been resolved [21]. Clearly, outer leaflet lipid mixing (around 0.5 min) and pore formation (4–6 min) are two distinct kinetic processes in these welldefined pure lipid bilayer model systems. A “problem” of the original stalk–pore model for membrane fusion is that hydrophobic “voids”, i.e. volumes of hydrophobic mismatch are created in the stalk, which may be energetically very costly [22, 23]. Different authors have tried to get around this problem by different means. Newer calculations have simply eliminated the voids by assuming intermediate structures with tilted lipids [13, 24] or reduced their effects by relaxing the geometry of the stalk [25]. In another effort to rescue the original stalk model, it has been postulated that the “voids” have to be filled by hydrophobic substances that reduce the high energy of the stalk and make it a viable intermediate in lipid bilayer fusion. Indeed, when small amounts of very long-chain lipids or alkanes were added to the lipid mixtures, the rates of fusion (contents mixing), but not hemifusion (lipid mixing) of pure lipid bilayers were increased up to about 2-fold [26]. This rate increase was additional to that achieved by negative curvature inducing lipids. Even if the rates are increased by these lipid additions, these experiments do not explain why bilayers in their absence are still able to fuse at fairly rapid rates. Finally self-consistent field theory of flexible amphiphilic chains has been used to model the formation of traditional lipid stalks [27]. The energetic barrier to forming a stalk derived from this theory was significantly smaller than that derived from the phenomenological continuum theories, but a large energetic barrier, which depended on the spontaneous intrinsic curvature of the amphiphile, was associated with the radial expansion of the stalk into a hemifusion diaphragm.
3 Viral Protein Sequences that Mediate Lipid Bilayer Fusion
Coarse-grained molecular simulations have been used recently to provide more insight into the microscopic details of transitions at lipid bilayer fusion junctions. In one study, lipids were modeled as amphiphilic three-segment rods and studied by Brownian dynamics [28]. In another set of studies, the amphiphiles were modeled as flexible co-polymer chains in a hydrophilic polymer solvent and their transformation from bilayers into fusion intermediates was studied by Monte-Carlo lattice simulations [29, 30]. Interestingly, the outcome of both approaches was similar and suggested a new fusion mechanism, which was different from the classical stalk–pore mechanism. After formation of an initial quite disordered lipid stalk between two closely apposed bilayers, a hole appeared in either one or both parent membranes next to the stalk and the stalk then grew asymmetrically around these holes to form the initial fusion pore. These initial small holes appear in the bilayer because the line tension is high near disorganized lipid stalks. Coarse-grained molecular dynamics simulations revealed a few additional new aspects of microscopic details in the evolution of lipid stalks and fusion pores [31, 32]. In one study [31], the stalk was initiated by the displacement of a few lipid molecules from their normal position in one of the two bilayers. Fusion could then proceed through the classical stalk–pore or the new stalk-hole mechanism. Which mechanism prevailed depended on the headgroup composition of the bilayers with negative curvatureinducing lipids favoring the stalk–pore and bilayer-forming lipids favoring the stalk-hole mechanism. In the other study [32], the lipid molecules became tilted and eventually splayed their aliphatic tails such that each chain was resident in different opposing leaflets. The nucleus of the initial stalk formed by lipid tail splaying was again seen to expand asymmetrically in a circle to gradually enclose an initial fusion pore. Whether specific details of each of these mechanisms will hold up for bilayers made of “real” phospholipids (i.e. not coarse-grained models) remains to be seen. However, the fact that so many different oversimplified models and methods of simulation independently led to very similar general results is very promising. Perhaps the recently observed spurious leakage in influenza hemagglutinin (HA)-mediated membrane fusion is a reflection of hole formation in the new stalk-hole model of membrane fusion and thus may provide experimental support for this mechanism [33]. New details of how fusion between pure lipid bilayers proceeds at the microscopic level will almost certainly emerge as computational simulation methods continue to be further refined and as computer power increases year-by-year.
3 Viral Protein Sequences that Mediate Lipid Bilayer Fusion 3.1 Fusion Peptides
Photoaffinity labeling studies have shown that the major regions of viral spike glycoproteins that interact with lipid bilayers are domains called “fusion peptides”
5
6 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
as well as the transmembrane anchors of the viral spike glycoproteins. While the transmembrane anchor domains are permanently inserted into the viral lipid bilayer, the fusion peptides interact with the lipid bilayer of target (and/or viral) membranes only upon activation of the fusion process. Although fusion peptides are quite hydrophobic, these polypeptide sequences are protected by other parts of the viral spike glycoproteins and therefore do not interact with membranes in their resting state. Many fusion peptides like those of the influenza and human immunodeficiency viruses are located at the extreme N-terminus of the fusion subunit of the viral spike glycoproteins. Others such as those of Ebola or Dengue virus are internal sequences of the respective fusion proteins. A list of a few selected fusion peptides is shown in Fig. 2 (see also [34, 35]). The sequences of fusion peptides are extremely well conserved within each family of viruses, but not between different families. There are, however, some general features that are common between fusion peptides of the different virus families: the propensities of glycines and alanines are high, large bulky aromatic residues are frequently found, and hydrophilic residues are found interspersed towards the Cterminal end of N-terminal fusion peptides and towards both ends of internal fusion
Fig. 2 Sequences of class I and II Nterminal and internal fusion peptides. Glycines and alanines, which are frequent in these sequences are highlighted. Bulky aromatic residues are underlined and hy-
drophilic residues are marked with a dot. HIV, human immunodeficiency virus; ASLV, avian sarcoma leukemia virus; TBE, tickborne encephalitis virus; SFV, Semliki Forest virus.
3 Viral Protein Sequences that Mediate Lipid Bilayer Fusion
peptides. Although it is not always totally clear where a fusion peptide sequence begins and ends, site-directed mutagenesis has shown that quite dramatic fusion phenotype changes are found with some only relatively mild single amino acid changes in the fusion peptide region. Most evidence for this comes from work with influenza virus [36–40], whose fusion protein HA contains a receptor binding subunit (HA1) and a fusion subunit (HA2). Glycines 1 and 4 of HA2 are particularly susceptible to fusion defects. For example, the G1S mutant causes hemifusion, whereas a G1V replacement is completely defective in fusion [39]. Large bulky hydrophobic residues can be replaced with other large bulky hydrophobic residues, but not with glycines [40]. It appears that a proper balance and spacing of glycines and large bulky hydrophobic residues in the fusion peptide is important to confer fusion activity to influenza HA2. Even a deletion of the first glycine is not tolerated in this fusion protein [41]. 3.2 Transmembrane Domains
A first indication that the transmembrane domain is also very important for fusion and not just for anchoring the fusion protein in the viral membrane came from a study in which the transmembrane domain of influenza HA2 was replaced with a glycosylphosphatidylinositol anchor [14]. This construct was able to induce hemifusion between HA expressing and red blood cells, but it was unable to complete the reaction to develop a full fusion pore. Subsequent studies in several laboratories established that there was little requirement on the actual sequence of the transmembrane domain, but that its length mattered. For example, HA constructs with only the N- or C-terminal half of the transmembrane domain present mediated hemifusion, but not full fusion, and full fusion was gradually recovered as the domain length was increased to its full length [42]. 3.3 Other Regions of the Fusion Protein
Some fusion proteins with N-terminal fusion peptides contain other segments of the ectodomain that may interact with lipids in the process of membrane fusion. For example, the kink region of the ectodomain of influenza HA2 (residues 106–112) that connects the inner and outer layer α-helices in the pH5 structure has been proposed to contribute to membrane fusion [43] either by pH-dependent lipid interactions [44] or protein–protein interactions [45]. Similarly, it has been suggested that paramyxoviruses contain an internal membrane-interactive segment in addition to the N-terminal fusion peptide [46]. This putative internal secondary fusion peptide, which comprises residues 208–229 of the Sendai virus F protein is located in between the N- and C-terminal heptad repeats that form coiled coils in the post-fusion structure [47]. This sequence also maps to the C-terminal end of the N-terminal heptad repeat and a short helix (N1) that is part of the neck in the pre-fusion structure of the F protein of the related Newcastle disease virus [48].
7
8 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
It has been reported that a tryptophan-rich region just N-terminal to the transmembrane domain of HIVgp41 contributes to membrane fusion [49, 50]. This segment has been predicted to lie in the interface of the external leaflet of the viral membrane. A similar interfacial juxta-membrane domain has been predicted to exist in Ebola virus [51]. It is interesting to note that juxta-membrane domains have also been implicated to contribute to fusion in the SNARE proteins syntaxin 1a and synaptobrevin 2, and that the aromatic residues of this segment may serve an important structural and functional role in this process [52]. Finally, HIV and other lenti- and retroviruses contain three consecutive helices on the cytoplasmic or inner side of the viral membrane that exhibit large hydrophobic moments that promote strong interactions with the inner surfaces of the viral membranes. However, rather than contributing to fusion, the predominant roles of these amphipathic helices appear to direct the intracellular targeting of the Env glycoproteins in virus assembly and their targeting to the perinuclear envelope after host cell entry [53]. In the case of influenza HA2, the cytoplasmic domain comprises only 13 residues, three of which are palmitoylated cysteines. Palmitoylation appears to be particularly important in intracellular vesicle traffic for correctly targeting HAs to the apical membrane and for subsequent virus assembly [54, 55]. However, a minor modulatory role in fusion pore opening has also been ascribed to the palmitoylated cytoplasmic tail of influenza HA2 [56].
4 Interactions of Fusion Peptides with Lipid Bilayers
N-terminal fusion peptides likely form independent folding units in membranes and, therefore, are sometimes also called fusion domains. The reason for this contention is that the linkers between fusion peptides and the structured ectodomains (1) contain several glycines, (2) are not ordered in crystal structures even if the residues are present, and (3) are susceptible to protease digestion. Moreover, the lipid bilayer and its interface constitutes a very different folding environment for membrane proteins or inserted peptides than the aqueous environment, in which ectodomains fold [57]. However, to obtain meaningful folding units of fusion peptide models in lipid bilayers, it is important to choose peptides that comprise the full length of the membrane-interactive fusion domain. Peptides can be quite polymorphic in membrane environments and different conformations and molecular properties may be expressed if shorter than full-length peptide models are chosen. This has been a particular problem in studies of the HIV envelope glycoprotein gp41 fusion peptide in lipid bilayers, where different authors have found quite diverging results. As seen in Fig. 2, this fusion peptide is quite long, but several studies have used relatively short model peptides to study the interactions with lipid bilayers. The hydrophobicity of the full-length HIV gp41 fusion peptide also makes it quite prone to aggregation in solution and in membranes, which adds further difficulties in its handling and potential reasons for differences in results that have been reported for this peptide. Therefore, choosing a certain length of a fusion
4 Interactions of Fusion Peptides with Lipid Bilayers
peptide model often represents a trade-off between a short sequence that is more easily handled and a long sequence that may better represent the membrane-bound structure of the full-length protein, but may form non-physiological aggregates before it is incorporated into the membrane. The results that are summarized below should be judged in the light of this background. 4.1 HIV Fusion Peptide-Bilayer Interactions
Some of the earliest studies on the HIV gp41 fusion peptide reported that the 16 most N-terminal residues inserted as an oblique α-helix into lipid bilayers [58, 59]. The oblique insertion correlated with the fusion activity because mutations with reduced activity inserted more parallel to the membrane surface as determined by polarized Fourier transform infrared (FTIR) spectroscopy. This finding was in agreement with a prediction from computer models of several viral fusion peptides that a tilted insertion into membranes might be a common feature of many viral fusion peptides [60]. The N-terminal segment of the simian immunodeficiency virus (SIV) fusion peptide analog was also found by neutron diffraction to insert as an oblique α-helix into lipid bilayers [61]. The first 23 residues of the HIV fusion peptide were also predominantly α-helical at low concentrations in membranes, but adopted an extended β-structure at higher concentrations [62, 63]. A still longer HIV fusion peptide analog (first 33 residues) was found by circular dichroism (CD) and FTIR spectroscopy to consist of about 30% α-helical and 50% β-structures [64]. Solution-state nuclear magnetic resonance (NMR) studies of the first 23-residue peptide revealed that these segments adopted predominantly α-helical structures in sodium dodecylsulfate (SDS) [65] and dodecylphosphocholine (DPC) micelles [66], but solid-state NMR found the first 23 residues of the HIV fusion peptide in parallel and antiparallel laterally associated β-sheet conformations when bound to lipid bilayer membranes [67–69]. Combining site-directed 13 C-FTIR spectroscopy and molecular modeling, Gordon et al. [70] suggested that the first 16 residues of the HIV fusion peptide are α-helical and the next seven residues extended under dilute conditions in lipid bilayers. In solution and at higher loading on membranes, the same peptide assembles into antiparallel β-sheets [71]. 4.2 Influenza Fusion Peptide Structure
CD and FTIR spectroscopy showed that the fusion peptide of influenza virus HA2 adopts about 50% helix in lipid bilayers [72–74]. This has been confirmed with 23residue models of the HA fusion peptides [75]. Several critical single site mutations in positions 1, 5 and 7 induce larger amounts of β-structure, when these peptides are incorporated into lipid bilayers [75, 76]. The relative proportion of β-structure is roughly anti-correlated with fusion activity when the same mutation is introduced into the full-length HA and fusion is measured between HA-expressing cells. The fusion activity is not so easily measured with the peptide models themselves because
9
10 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
generally only lipid mixing, but not contents mixing, are observed in liposome fusion assays with peptides. In contrast to their full-length analogs, fusion peptides cause contents leakage, i.e. pore formation, when added to lipid bilayers [77, 78]. Short of testing the effect of mutations in full-length fusion proteins, the best correlation between fusion-active and inactive peptide models is their ability to cause hemolysis in red blood cells [74, 76]. Why hemolysis induced by mutant fusion peptides is a better indicator for fusion activity of the full-length protein with the same mutation than lipid mixing and leakage assays of peptides with liposomes is not entirely clear. The influenza HA2 fusion peptide models discussed so far were all only poorly soluble in water and had to be added to liposomes from dimethylsulfoxide or combined with lipids in organic solvents for structural and functional measurements. To eliminate potential artifacts that might arise from solvent exposure of the peptides, we designed a new generation of fusion peptide models which had a very polar carrier peptide appended to the C-terminus of the fusion peptide via a flexible, i.e. glycine-rich, linker. The polar carrier peptide in a sense replaces the ectodomain of the fusion protein in these models. This strategy generates fusion peptides that are highly soluble in water or buffer, while retaining their ability to bind and insert into lipid bilayers with high affinity [79]. The solubilized 20-residue fusion peptide of influenza HA2 is largely random coil in solution, but adopts about 50% α-helical structure when bound to lipid bilayers as do the first-generation fusion peptides [79]. At very high surface concentrations on lipid bilayers, a fraction of the peptide is converted to extended self-associated β-structures at the membrane surface [80]. The new peptide design has permitted their atomic structures to be solved by solution NMR in DPC detergent micelle solutions at pH 7 and 5 [81]. The pH 5 structure is the physiologically relevant structure when influenza HA interacts with target membranes in the endosome, but the pH 7 structure is also informative because the differences between the two pH structures explain why influenza virus requires a low pH not only for releasing the fusion peptide by the conformational change of its ectodomain, but also for membrane fusion. Both structures are characterized by a well-defined N-terminal α-helix that extends to glutamate 11 (Fig. 3). Residues 11–13 form a turn and redirect the polypeptide chain so that it forms a “V” or “boomerang” with an opening angle of about 105◦ . The C-terminal arm does not form a regular secondary structure at pH 7, but residues 14–18 form a 310 -helix at pH 5. The inner volume of the boomerang is filled with bulky hydrophobic residues, a ridge of conserved glycines lines the N-terminal outer face and polar residues characterize the C-terminal outer face of the boomerang at pH 5, but not at pH 7. Therefore, folding of the C-terminal arm at pH 5 likely drives the N-terminal arm deeper into the membrane. These structures determined in detergent micelles by NMR have been confirmed by site-directed spin-labeling in lipid bilayers [81]. The spin-label electron paramagnetic resonance (EPR) measurements also determine the angle between the N-terminal α-helix and the membrane plane (23◦ at pH 7 and 38◦ at pH 5) and the depth of membrane penetration of the fusion peptide. These studies confirmed and substantially refined earlier spin-label studies on a HA2 construct with the fusion peptide and a differently designed fusion peptide
4 Interactions of Fusion Peptides with Lipid Bilayers 11
Fig. 3 Structures of the influenza HA fusion peptide in a lipid bilayer membrane at pH 5. The atomic structures of the wildtype, the glycine1-to-valine mutant (which causes a complete fusion defect), and the glycine1-to-serine mutant (which causes
hemifusion) fusion peptides are shown. All structures were determined by NMR in DPC micelles and their dispositions in lipid bilayers were measured by site-directed spinlabeling (adapted from [81, 86]).
[82, 83]. Fig. 3 shows that the Cα of Asn 12, which forms the apex of the boomerang, is located in the phosphate plane of a fluid lipid bilayer. The N-terminal arm penetrates to about the mid-plane of the lipid bilayer at pH 5, but to a shallower depth at pH 7. Two NMR structures of a different influenza HA2 fusion peptide analog are also available [84, 85]. Since two of the conserved glycines were replaced with glutamates (to make the peptides more soluble) and since in one study the peptide was bound to highly negatively charged SDS micelles, it is not surprising that many details of the structures are quite different, although some general aspects of the wild-type structure of Fig. 3 are preserved. 4.3 Influenza Fusion Peptide Mutants
The structures of the hemifusion-inducing fusion peptide mutant G1S and the fusion-blocking mutant G1V have also been determined recently by NMR in DPC micelles using the polar C-terminal carrier peptide approach [86]. The NMR structure of G1S is very similar to that of the wild-type (Fig. 3). It also forms an amphipathic boomerang with all bulky hydrophobic residues sequestered into the hydrophobic pocket of the structure. Only the glycine ridge on the N-terminal outer edge is disrupted. In contrast, G1V forms an irregular approximately linear
12 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
amphipathic helix (Fig. 3). Site-directed spin-labeling shows that this helix is oriented more parallel to the membrane surface. Apparently, an angled and deeply membrane inserted boomerang structure is necessary to promote hemifusion, but a preserved glycine ridge is further required to drive the hemifusion intermediate into a full fusion product. 4.4 Binding of Fusion Peptides to Lipid Bilayers
The binding of the solubilized fusion peptides to lipid bilayers has been measured by fluorescence spectroscopy and isothermal titration calorimetry [79, 80, 87]. The free energy of partitioning of the wild-type influenza HA2 fusion peptide into fluid lipid bilayers is −7.6 kcal mol−1 . In small unilamellar vesicles, this interaction is driven by enthalpy (−18.0 kcal mol−1 ) and opposed by entropy (10.4 kcal mol−1 ). Apparently, there is a real energetic affinity of the fusion peptide for highly curved membrane interfaces, rather than a classical hydrophobic effect-driven peptide association with the lipid bilayer. The entropy loss (−34.8 kcal mol−1 ) could be due to a combination of folding of the peptide and its partial immobilization (reduction of dimensionality of degrees of motional freedom) upon membrane binding. Interestingly, the enthalpy of binding of G1V is much reduced (−9.2 kcal mol−1 ), but that of G1S is similar to that of the wild-type fusion peptide (−15.9 kcal mol−1 ). Recent results from our laboratory show that measurements of binding enthalpy are more sensitive than measurements of free energy to discriminate between active and inactive fusion peptides (A. L. Lai and L. K. Tamm, unpublished results). 4.5 Sendai, Measles and Ebola Fusion Peptide–Bilayer Interactions
The secondary structures and lipid interactions of the N-terminal fusion peptides of the F proteins of two paramyxoviruses, Sendai and measles virus, have also been studied. The fusion peptide of Sendai virus is about 50% α-helical in lipid bilayers [88], but that of measles virus exhibited only weak membrane interactions, which is surprising given its close sequence similarity with the Sendai fusion peptide [89]. A study of the internal fusion peptide of the Ebola virus glycoprotein revealed only a limited amount of secondary structure, but attributed an important structural role to the central conserved proline [90]. A peptide corresponding to the putative secondary internal fusion peptide of Sendai virus is helical in membrane mimetic environments and promotes lipid mixing between liposomes [46]. A similar internal secondary fusion peptide has been postulated to exist in the F protein of measles virus and its interactions with lipid bilayers have been found to be stronger than those of the N-terminal fusion peptide [89]. This anomalous result may be explained if the more hydrophobic N-terminal fusion peptide was not adequately solubilized prior to membrane binding in these experiments. Whether the internal secondary “fusion peptides” of the F proteins of paramyxoviruses really contribute to fusion
4 Interactions of Fusion Peptides with Lipid Bilayers
by lipid interactions in the context of the full-length proteins is not yet known. Even for the primary internal fusion peptides, such as those shown in Fig. 2, it is presently not clear how well peptide models represent their true structures in the context of the membrane-bound forms of the entire fusion proteins. It may be possible to design looped peptide models with appropriately constrained distances between the ends of the loops in the cases of class II fusion proteins, for which the atomic structures of the post-fusion conformations of the entire ectodomains are known from X-ray crystallography such as, for example, for Semliki Forest and Dengue viruses (see Figs. 1 and 4) [91, 92].
4.6 Perturbation of Bilayer Structure by Fusion Peptides
How do fusion peptides alter the structure of lipid bilayers? How do they induce non-bilayer structures in lipid bilayers, which must occur in some form at intermediate stages of membrane fusion? These are questions of much current debate with few definite answers mostly because it is difficult to trap lipid–protein fusion intermediates for structural studies. A frequently held notion is that fusion peptides alter membrane curvature and make it more negative as required in the stalk–pore model of membrane fusion. For example, 20-residue models of the influenza HA2 fusion peptide decrease the transition temperature from bilayer to curved hexagonal phases and thus stabilize the negatively curved lipid phase [93]. The same result has been observed for the interaction of the 12-residue SIV gp32 fusion peptide with lipid bilayers that are prone to hexagonal phase formation [94]. Given the structure and position of the fusion peptide shown in Fig. 3, it is difficult to envisage how this structure could promote negative curvature in lipid bilayers. If anything, it is expected to promote positive curvature. The experimentally observed depression of the bilayer-to-hexagonal phase transition temperature may be explained if the hydrophobic peptides escaped into the interstitial spaces in the hexagonal phases, which they could not do if they were attached to polar ectodomains that must tie the fusion peptides to the membrane surface. Influenza HA2 fusion peptides have also been found to alter the hydration properties of bilayers at the level of the lipid ester carbonyl groups as measured by FTIR spectroscopy [75, 76]. It is likely that the observed lipid signals reflect a partial dehydration of the membrane surface by the peptides because they also induce more order in the lipid acyl chains [75, 76], which is a common consequence of headgroup dehydration in lipid bilayers. As discussed in Section 2, lipid dehydration could lead to rather disorganized lipid stalks connecting two bilayers. Such dynamically disorganized stalks may exhibit quite different structures than the lipid stalks with neatly curved surfaces that have been proposed in the standard stalk–pore mechanism of membrane fusion. In addition to changing lipid hydration, fusion peptides also decrease the rupture tension of lipid bilayers as has been demonstrated with the influenza HA2 fusion peptide [95]. A trimeric version of the HA2 fusion peptide is more effective in rupturing membranes than its monomeric version [96]. Similarly, a trimeric HIV gp41 fusion
13
14 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
Fig. 4 Structures of influenza virus HA (left) and Semliki Forest virus E1 protein (right) in their post-fusion state modeled into a possible fusion intermediate (A) and into a fusion pore (B). The membraneinserted fusion peptides are shown in black and the transmembrane domains in red. Note the large conformational changes of both proteins when compared to their
prefusion structures depicted on the virus surfaces in Fig. 1. The pink dots in the influenza HA fusion peptide structure denote glycines that may mediate helix interactions and the blue squares denote glutamates that may be responsible for the pHdependence of the fusion peptide penetration into lipid bilayers.
peptide construct induced more rapid lipid mixing than a fusion peptide dimer, which in turn was more active than the monomer [97]. It may be expected that further microscopic detail about fusion peptide–lipid interactions will be learned from molecular dynamics simulations, particularly those that are run for long enough times and large enough systems to allow for
5 Interactions of Transmembrane Domains with Lipid Bilayers
extended membrane transformations. Encouraging first steps in this direction have been taken by several groups. Kamath and Wong [98] have simulated the 16-residue HIV gp41 fusion peptide as an α-helix in a POPE bilayer. The peptide maintained an α-helical structure and became tilted in the bilayer during the 1.5 ns of the simulation. The fusion peptide increased the thickness of the proximal leaflet of the bilayer, but left the distal monolayer unperturbed. Huang et al. [99] performed an 18-ns simulation of the 20-residue influenza HA2 fusion peptide in a DMPC bilayer. Although they started from a helical rod structure, the peptide adopted a kinked and tilted helical structure in the bilayer similar to that observed by NMR and EPR spectroscopy [81]. In this study, the lipids of the proximal layer and closest to the N-terminus of the peptide were compressed in both leaflets of the bilayer relative to lipids in unperturbed bilayers or farther away from the N-terminus. Vaccaro et al. [100] started from the NMR structure of the influenza HA2 fusion peptide and simulated it for 5 ns in a POPC bilayer. The kinked tilted α-helical structure was maintained throughout the simulation and the order parameters of the lipid acyl chains were decreased leading to bilayer thinning. No differences between proximal and distal lipids have been reported in this study.
5 Interactions of Transmembrane Domains with Lipid Bilayers
Numerous mutagenesis studies indicate that the transmembrane domains of viral spike glycoproteins are more than just simple anchoring devices to attach these proteins to the viral membrane surfaces. A first indication for this comes from studies on expressed influenza HA, in which the transmembrane domain, i.e. a single α-helix, has been replaced with a glycosylphosphatidylinositol lipid anchor [14]. Cells expressing this mutant are arrested at the hemifusion stage in fusion assays with red blood cells. A similar result has been found for the parainfluenza type 2 fusion protein [101]. Since a deletion of the 12-residue cytoplasmic tail of HA2 still promotes membrane fusion [102], it is clear that the transmembrane domain contributes to the fusion reaction in a major way. This is true even if the cytoplasmic tail influences details of fusion pore opening, which are not yet completely understood [56]. The transmembrane domain must span both leaflets of the lipid bilayer because versions that span only half of the lipid bilayer cause hemifusion just as do the lipid-anchored HAs [42]. Despite the requirement for a near full-length transmembrane domain, there may be no specific sequence requirement of the influenza HA2 transmembrane domain to support membrane fusion [42]. However, some sequence requirements in this domain and appropriate acylations of three cysteines in the short cytoplasmic domain must be met to correctly target influenza HA to cholesterol/sphingomyelin-rich domains in the apical membrane during the biogenesis of influenza virus particles [54, 55]. In retroviruses, a conserved arginine or lysine in the middle of the transmembrane domain of the envelope glycoprotein appears to be important for folding and assembly of the protein in the membrane [103, 104] and/or its ability to support membrane fusion [105, 106]. Similarly, the
15
16 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
murine leukemia virus envelope glycoprotein requires a proline [107] and the vesicular stomatitis virus (VSV) G glycoprotein requires two glycines in the middle of the transmembrane domain in order to support the transition from the hemifusion intermediate to the full fusion product [108]. Perhaps, flexibility or kinks are necessary in the transmembrane domains of some fusion proteins for the completion of full membrane fusion. The transmembrane domain of influenza virus HA2 by itself is α-helical and oriented at an angle of 15◦ or less from the membrane normal [109]. It forms oligomers (dimers, trimers and tetramers) in SDS micelles and readily exchanges amide hydrogens when embedded in lipid bilayers. Therefore, it may form a small partially water-accessible helical bundle in membranes. Helical contacts and perhaps water accessibility may be mediated by three conserved serines (or two serines and a cysteine in some strains) that occur in heptad repeats in the central part of the transmembrane domain. Interestingly, the synthetic transmembrane domains of influenza HA2 also increase the lipid chain order parameters [109], which could be a consequence of a partial dehydration at the membrane surface as discussed above in the context of fusion peptide-membrane interactions. Lipid interactions of the tryptophan-rich juxta-membrane domains of HIV gp41 have also been examined. Peptides corresponding to this domain insert into membranes as amphipathic helices parallel to the membrane surface and competitively inhibit fusion promoted by full-length gp41 [110, 111].
6 Structure–Function (Fusion) Relationships of Membrane-interactive Viral Fusion Protein Domains 6.1 Fusion Peptide Mutants
Studies establishing correlations between the structures of membrane-interactive segments of fusion proteins in membranes and their ability to support membrane fusion are only in their beginning stages. The first residue, a glycine, of the influenza HA2 fusion peptide appears to be particularly important for supporting fusion. If glycine 1 is missing, fusion and viral infectivity is completely aborted [40, 41]. Molecular dynamics simulations indicate that the fusion peptide lacking the first glycine is more linear in structure and lies more parallel to the membrane surface compared to the wild-type fusion peptide structure [100]. This mutant also has a higher tendency to self-associate into β-structures on membrane surfaces than the wild-type fusion peptide [75]. If glycine 1 is substituted with a valine, fusion is also completely blocked [39]. This functional defect correlates with a more linear and more surface-located structure of the G1V fusion peptide in lipid bilayers ([86, 87], see also Fig. 3). The defective G1V fusion peptide also has an increased propensity for self-association on membrane surfaces. Substitution of glycine 1 with a serine causes a milder fusion phenotype: fusion can proceed to hemifusion, but not to
6 Structure–Function (Fusion) Relationships of Membrane-interactive Viral Fusion Protein Domains
full fusion [39]. Structural analysis reveals that the G1S fusion peptide still forms a kinked boomerang structure in lipid bilayers similar to that of the wild-type fusion peptide, but that the glycine ridge on the upper face of the N-terminal arm of the V-shaped molecule is disrupted ([86, 87], see also Fig. 3). Very recent data show that mutation of tryptophan 14 to an alanine also completely blocks fusion and that peptides with the W14A substitution are very flexible and lack a fixed angle between the two arms of the boomerang structure of the wild-type (A. Lai et al., unpublished results). Therefore, it appears that a deep insertion into the bilayer at a fixed oblique angle as accomplished by the boomerang structure is required to promote the initial joining of two bilayers (hemifusion), but that an intact glycine ridge may be additionally required to proceed to the formation and expansion of the fusion pore. Further experiments are needed to examine whether or not an intact glycine ridge on the N-terminal arm of the boomerang is an absolute structural requirement for influenza HA-mediated full membrane fusion. Glycine residues in the fusion peptide of HIV gp41 also appear to be critically important to support the fusion activity of the expressed fusion protein and infectivity of the virus [112]. The highly conserved glycines 10 and 13 are particularly sensitive to mutation, whereas the less conserved glycines 3 and 5 are more permissive to substitutions. Substitution of the bulky aliphatic side-chains of valine 2 and leucine 9 with the charged residues glutamate and arginine, respectively, also reduces the fusion activity of gp41 and the infectivity of HIV particles bearing these mutations [113]. The V2E substitution is particularly severe, whereas the L9R and A15E substitutions are milder and reduce syncytium formation to about 50%. The relatively conservative mutation of phenylalanine 11 to a tyrosine blocks syncytium formation of HIV gp41 expressing cells almost completely [114]. The V2E, L9R and F11Y mutations have been introduced into peptide models [115]. All three mutant peptides induced decreased levels of hemolysis of red blood cells compared to hemolysis induced by the wild-type gp41 fusion peptide. The V2E, L9R and F11Y mutant fusion peptides also exhibit significantly decreased contents of α-helix and increased contents of β-structure in lipid bilayers when compared to the wild-type fusion peptide [115]. Molecular dynamics simulations indicate that the V2E and L9R peptides may lie parallel on the membrane surface whereas the wild-type fusion peptide inserts as an oblique helix into lipid bilayers [98]. Therefore, it appears that some general structure-function relationships described for the influenza HA fusion peptide are recapitulated in the HIV gp41 fusion peptide. 6.2 Transmembrane Domain Mutants
As discussed above, the structural requirements on the transmembrane domain are likely less stringent than those on the fusion peptides. It appears that in some cases a full-length transmembrane domain of an almost generic sequence is sufficient to support fusion. However, there may be requirements for some flexibility in the middle of the transmembrane domain and perhaps the presence of some polar/apolar residues in heptad repeats to allow for appropriate homotypic and
17
18 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
heterotypic helix–helix interactions. An alignment of the sequences of the transmembrane domains of fourteen different subtype strains of influenza virus HAs shows that the N-terminal half of the domain is more conserved than the C-terminal half [109]. A heptad repeat with the conserved motif ILsIYSsbssSL or ILWISFsbssFL (s = small semipolar, i.e. G, A, S, T or C; b = branched aliphatic, i.e. I or V; underlines denote heptad repeats of apolar residues) is found in the N-terminal half, i.e. the half that resides in the outer leaflet of the viral lipid bilayer, in all fourteen strains. A conserved glycine or alanine is found eight or seven residues, respectively, downstream from these motifs in all strains. The C-terminal half of the transmembrane domain that resides in the inner leaflet of the viral lipid bilayer is otherwise highly variable between the strains and does not show features that distinguishes it from any generic transmembrane domain. Despite their good conservation, a few point mutations (e.g. W3 → A or s9 → L) in the above motifs did not affect fusion, nor did deletion of the first five residues of this motif [42]. As noted above, tryptophan 3 is not totally conserved and is a “s” in the other motif. Significant lipid–protein mismatch may occur in the deletion mutant and thereby destabilize the viral membrane and induce fusion in an unusual manner. The insensitivity of the G9L mutation in an H3 subtype HA is significant and interesting [42]. In contrast, changing glycine 10 into a leucine in an H2 subtype HA leads to a hemifusion phenotype [116, 117]. Apparently, a single glycine-to-leucine substitution in this region with several s residues is sufficient in some, but not in all, cases to reduce full fusion to hemifusion phenotypes. The high conservation of unusual motifs in the N-terminal half of the transmembrane domain of influenza HA remains intriguing and more drastic motif changes may lead to interesting new fusion mutants. Transmembrane domain glycines also seem to be important for fusion mediated by the VSV G protein. Two critical glycine substitutions in the transmembrane domain of this fusion protein block the transition from hemifusion to full fusion [108]. In the case of HIV gp41 and some other retroviruses, a conserved arginine or lysine residue in the center of the transmembrane domain appears to be important for fusion and viral infection [103–106].
7 Possible Mechanisms for Initiating the Formation of Viral Fusion Pores
Fusion is initiated by quite dramatic conformational changes in the ectodomains of the viral fusion proteins. These conformational changes are triggered by low pH in the endosome after receptor-mediated endocytosis for some viruses or by a receptor-mediated activation mechanism in other cases. Two examples for possible intermediate and final fusion structures of a class I and a class II fusion protein are shown in Fig. 4. These structures and their lateral organization in between two fusing membranes should be compared to the dramatically different resting structures that are present on the viral membrane surfaces and that are depicted in Fig. 1. In class I proteins, exemplified by influenza HA2, the coiled coils refold by executing large jack-knife motions (see, e.g. Fig. 2 in [118]). The pH 7 structure
7 Possible Mechanisms for Initiating the Formation of Viral Fusion Pores
of influenza HA2 is metastable. Energy is gained when HA2 assumes the pH 5 structure, which is why this conformational change has been characterized as “spring loaded” [119]. An important feature of the conformational change of class I fusion proteins is that the hydrophobic fusion peptide, which is protected in a hydrophobic pocket in the resting structure, becomes exposed and available for interaction with the target membrane upon fusion activation. In influenza HA2, for which most structural information is available, the conformational change is thought to occur in two steps: the core coiled-coils of HA2 extend in the Nterminal direction and thereby translocate the fusion peptides towards the top of the molecule where they become available for interaction with the target membrane [120]. Next, the outer layer helices refold to form helical hairpins with the core helices and thereby redirect the C-terminal ends with the attached transmembrane domains toward the N-terminal ends with the attached fusion peptides. These are the states shown in Fig. 4 on the left. Interestingly, the N- and C-terminal ends of the ectodomains form a tight cap structure [121] and disruption of this cap by mutagenesis also disrupts fusion and, therefore, is functionally important [122]. This is clear evidence that, at least in the case of influenza HA2, the fusion peptide and transmembrane domain, which each are only about nine residues away from the cap, end up inserted into the fused membranes in very close proximity to each other (Fig. 4, bottom left). Although not quite as much information is yet available on class II proteins, the final situation upon completion of fusion is likely very similar in class I and II fusion proteins [91, 92]. Class II proteins start out as lattices of dimers that lie almost flat on the viral membrane surface (Fig. 1). The dimer contacts and/or the receptor protein (E2 in the case of Semliki Forest virus) protect the fusion peptide loops from hydrophobic exposure. Each subunit in the dimer consists of four domains, i.e. a N-terminal domain, a middle domain that contains the fusion peptide at its tip, a C-terminal domain and the most C-terminal transmembrane domain. Upon activation, the N-terminal and middle domains re-associate into a trimer with the fusion peptide loops exposed close to each other on the tip of the pear-shaped molecule (Fig. 4, top right). The C-terminal domains become redirected and pack into the grooves between neighboring subunits at the base of the trimer. The C-terminal ends of the ectodomain, to which the transmembrane domains are attached, are not visible in the crystal structures, but likely pack into the upper parts of the grooves and thereby position the transmembrane domains very close to the fusion peptides (Fig. 4, bottom right). Therefore, a major purpose of the refolding of the ectodomains of class I and II viral membrane fusion proteins appears to be a mechanical device to (1) insert the fusion peptides into the target membrane, and (2) bring the viral membrane-inserted transmembrane domains into close proximity of the target membrane-inserted fusion peptides and thereby induce a merger of the two membranes. In the following, we briefly discuss protein–lipid and protein–protein interactions that may lead to hemifused and fully fused states similar to those depicted in Fig. 4. It is clear that the ectodomains with attached highly specific fusion peptides are sufficient to proceed to hemifusion, at least in the case of influenza HA-mediated
19
20 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
membrane fusion. A specific transmembrane domain is not required and a simple lipid anchor suffices to bring the two membranes into hemifusion contact, presumably by inducing “nipples” in one or both membranes and tilting the ectodomains. Nipple membrane protrusions have been observed by electron microscopy [123], and a tilting of the influenza HA ectodomains relative to the viral and target membranes has been measured by polarized IR spectroscopy under fusion conditions [124–126]. We do not believe that the boomerang-shaped structure of the fusion peptide is compatible with the membrane curvature required to induce a classical hourglass-shaped lipid stalk structure. We rather believe that a high concentration of fusion peptides in the contact zone between two membranes perturbs the bilayer structure by removing water molecules from this region. Therefore, the lipid stalk may be much more dynamic than previously thought and may resemble a “lipid mixer” rather than a highly organized fluid ordered structure. A cartoon of such a dynamic lipid (“mixer”) stalk is depicted in Fig. 5 for the case of influenza HA [127]. Perhaps the fusion loops of class II viral fusion proteins create similar perturbations of the lipids in between the two membranes that are about to fuse. We have postulated that the fusion peptides of class I viral fusion proteins may adopt a full transmembrane orientation and interact directly (presumably as continuous α-helices) with the transmembrane domains [118, 120]. This postulate is based on circumstantial evidence and direct evidence for a direct interaction of the fusion peptides and transmembrane domains in membranes is still lacking as is a detailed structure of the fusion pore. Proteinaceous pores of fusion proteins have been previously suggested for fusion mediated by influenza HA [128] and SNARE proteins [129]. The fundamental difference between the model suggested here and these earlier “gap junction-type” models is that in our model the different transmembrane segments are postulated to interact laterally in the same membrane and not in trans across two different membranes. Obviously, lateral helix association models cannot explain full fusion of type II viral fusion proteins. It appears that
Fig. 5 Dynamic lipid mixer stalk that may be responsible for the hemifusion intermediate in influenza HA-mediated membrane fusion (adapted from [127]). The glycines in the fusion peptide structures are indi-
cated with small dots. The larger spheres in the center are the headgroups of perturbed lipids whose apolar sidechains may rapidly flip between the upper and lower membranes.
References
in this case, insertion of the fusion loops into the membrane interface must suffice to attract the transmembrane domains to this destabilized membrane region. However, heteromeric lateral helix associations may still occur in class II fusion proteins because many of these proteins have other transmembrane domains in their vicinity in stoichiometric amounts, i.e. those of the receptor subunits – E2 in the case of Semliki Forest virus. The discussion about the structures that lead to the opening of fusion pores and the dilation of these pores to complete the membrane fusion reaction is reminiscent of a similar vivid discussion on the action of lytic peptides. Some antimicrobial peptides induce purely proteinaceous pores (barrel stave model), whereas others induce pores with more loosely arranged peptides with many interspersed lipids (carpet model). Clearly, more work needs to be done to define how exactly class I and II fusion proteins work on membranes and creative new combinations of structural and biophysical experimentation will be needed to establish what exact structures lead to membrane fusion in both classes of viral fusion proteins.
8 Acknowledgments
We thank the many members of the Tamm laboratory, present and past, who have contributed to our current understanding of the mechanisms of membrane fusion. This work was funded in part by grants from the National Institutes of Health.
References 1 J. F. NAGLE, S. TRISTRAM-NAGLE, Biochim. Biophys. Acta 2000, 1469, 159–195. 2 H. I. PETRACHE, S. TRISTRAM-NAGLE, K. GAWRISCH, D. HARRIES, V. A. PARSEGIAN, J. F. NAGLE, Biophys. J. 2004, 86, 1574–1586. 3 D. GINGELL, I. GINSBERG, in Membrane Fusion, POSTE, G. (ed.), Elsevier, Amsterdam, 1978. 4 S.W. HUI, T. P. STEWART, L. T. BONI, P. L. YEAGLE, Science 1981, 212, 921–923. 5 V. S. MARKIN, M. M. KOZLOV, V. L. BOROVJAGIN, Gen. Physiol. Biophys. 1984, 3, 361–377. 6 L. YANG, H. W. HUANG, Science 2002, 297, 1877–1879. 7 L. YANG, L. DING, H. W. HUANG, Biochemistry 2003, 42, 6631–6635. 8 L. V. CHERNOMORDIK, G. B. MELIKYAN, I. G. ABIDOR, V. S. MARKIN, Y. A. CHIZMADZHEV, Biochim. Biophys. Acta
1985, 812, 643–655. 9 L. V. CHERNOMORDIK, G. B. MELIKYAN, Y. A. CHIZMADZHEV, Biochim. Biophys. Acta 1987, 906, 309–352. 10 C. A. HELM, J. N. ISRAELACHVILI, P. M. MCGUIGGAN, Science 1989, 246, 919–922. 11 C. A. HELM, W. KNOLL, J. N. ISRAELACHVILI, Proc. Natl Acad. Sci. USA 1991, 88, 8169–8173. 12 L. CHERNOMORDIK, M. M. KOZLOV, J. ZIMMERBERG, J. Membr. Biol. 1995, 146, 1–14. 13 P. I. KUZMIN, J. ZIMMERBERG, Y. A. CHIZMADZHEV, F. S. COHEN, Proc. Natl Acad. Sci. USA 2001, 98, 7235–7240. 14 G. W. KEMBLE, T. DANIELI, J. M. WHITE, Cell 1994, 76, 383–391. 15 G. B. MELIKYAN, S. A. BRENER, D. C. OK, F. S. COHEN, J. Cell Biol. 1997, 136, 995–1005.
21
22 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
16 L. V. CHERNOMORDIK, V. A. FROLOV, E. LEIKINA, P. BRONK, J. ZIMMERBERG, J. Cell Biol. 1998, 140, 1369–1382. 17 E. LEIKINA, L. V. CHERNOMORDIK, Mol. Biol. Cell 2000, 11, 2359–2371. 18 R. M. MARKOSYAN, G. B. MELIKYAN, F. S. COHEN, Biophys. J. 2001, 80, 812–821. 19 J. LEE, B. R. LENTZ, Biochemistry 1997, 36, 6251–6259. 20 M. E. HAQUE, T. J. MCINTOSH, B. R. LENTZ, Biochemistry 2001, 40, 4340–4348. 21 K. O. EVANS, B. R. LENTZ, Biochemistry 2002, 41, 1241–1249. 22 D. P. SIEGEL, Biophys. J. 1993, 65, 2124–2140. 23 D. P. SIEGEL, Biophys. J. 1999, 76, 291–313. 24 Y. KOZLOVSKY, M. M. KOZLOV, Biophys. J. 2002, 82, 882–895. 25 V. S. MARKIN, J. P. ALBANESI, Biophys. J. 2002, 82, 693–712. 26 M. E. HAQUE, B. R. LENTZ, Biochemistry 2004, 43, 3507–3517. 27 K. KATSOV, M. MULLER, M. SCHICK, Biophys. J. 2004, 87, 3277–3290. 28 H. NOGUCHI, M. TAKASU, J. Chem. Phys. 2001, 115, 9547–9551. 29 M. MULLER, K. KATSOV, M. SCHICK, J. Chem. Phys. 2002, 116, 2342–2345. 30 M. MULLER, K. KATSOV, M. SCHICK, Biophys. J. 2003, 85, 1611–1623. 31 S. J. MARRINK, A. E. MARK, J. Am. Chem. Soc. 2003, 125, 11144–11145. 32 M. J. STEVENS, J. H. HOH, T. B. WOOLF, Phys. Rev. Lett. 2003, 91, 1881021–1881024. 33 V. A. FROLOV, A. Y. DUNINA-BARKOVSKAYA, A. V. SAMSONOV, J. ZIMMERBERG, Biophys. J. 2003, 85, 1725–1733. 34 J. J. SKEHEL, K. CROSS, D. STEINHAUER, D. C. WILEY, Biochem. Soc. Trans. 2001, 29, 623–626. 35 L. J. EARP, S. E. DELOS, H. E. PARK, J. M. WHITE, Curr. Topics Microbiol. Immunol. 2004, 285, 25–66. 36 M. J. GETHING, R. W. DOMS, D. YORK, J. WHITE, J. Cell Biol. 1986, 102, 11–23. 37 D. A. STEINHAUER, S. A. WHARTON, J. J. SKEHEL, D. C. WILEY, J. Virol. 1995, 69, 6643–6651. 38 C. SCHOCH, R. BLUMENTHAL, J. Biol. Chem. 1993, 268, 9267–9274.
39 H. QIAO, R. T. ARMSTRONG, G. B. MELIKYAN, F. S. COHEN, J. M. WHITE, Mol. Biol. Cell 1999, 10, 2759–2769. 40 K. J. CROSS, S. A. WHARTON, J. J. SKEHEL, D. C. WILEY, D. A. STEINHAUER, EMBO. J. 2001, 20, 4432–4442. 41 W. GARTEN, F. X. BOSCH, D. LINDER, R. ROTT, H. D. KLENK, Virology 1981, 115, 361–374. 42 R. T. ARMSTRONG, A. S. KUSHNIR, J. M. WHITE, J. Cell Biol. 2000, 151, 425–437. 43 R. F. EPAND, J. C. MACOSKO, C. J. RUSSELL, Y. K. SHIN, R. M. EPAND, J. Mol. Biol. 1999, 286, 489–503. 44 C. H. KIM, J. C. MACOSKO, Y. K. SHIN, Biochemistry 1998, 37, 137–144. 45 R. F. EPAND, C. M. YIP, L. V. CHERNOMORDIK, D. L. LEDUC, Y. K. SHIN, R. M. EPAND, Biochim. Biophys. Acta 2001, 1513, 167–175. 46 S. G. PEISAJOVICH, O. SAMUEL, Y. SHAI, J. Mol. Biol. 2000, 296, 1353–1365. 47 K. A. BAKER, R. E. DUTCH, R. A. LAMB, T. S. JARDETZKY, Mol. Cell 1999, 3, 309–319. 48 L. CHEN, J. J. GORMAN, J. MCKIMM-BRESCHKIN, L. J. LAWRENCE, P. A. TULLOCH, B. J. SMITH, P. M. COLMAN, M. C. LAWRENCE, Structure (Camb.) 2001, 9, 255–266. 49 K. SALZWEDEL, J. T. WEST, E. HUNTER, J. Virol. 1999, 73, 2469–2480. 50 I. MUNOZ-BARROSO, K. SALZWEDEL, E. HUNTER, R. BLUMENTHAL, J. Virol. 1999, 73, 6089–6092. 51 A. SAEZ-CIRION, M. J. GOMARA, A. AGIRRE, J. L. NIEVA, FEBS Lett. 2003, 533, 47–53. 52 D. H. KWEON, C. S. KIM, Y. K. SHIN, Nat. Struct. Biol. 2003, 10, 440–447. 53 S. S. CHEN, S. F. LEE, C. T. WANG, J. Virol. 2001, 75, 9925–9938. 54 P. SCHEIFFELE, M. G. ROTH, K. SIMONS, EMBO J. 1997, 16, 5501–5508. 55 S. LIN, H. Y. NAIM, A. C. RODRIGUEZ, M. G. ROTH, J. Cell Biol. 1998, 142, 51–57. 56 G. B. MELIKYAN, H. JIN, R. A. LAMB, F. S. COHEN, Virology 1997, 235, 118–128. 57 L. K. TAMM, H. HONG, in Protein Folding Handbook, BUCHNER, J., KIEFHABER, T. (eds), Wiley-VCH, Weinheim, 2005, Vol. 2, Part I, pp. 998–1031. 58 I. MARTIN, F. DEFRISE-QUERTAIN, E. DE-CROLY, M. VANDENBRANDEN, R.
References
59 60
61
62 63 64
65 66 67 68 69
70
71
72
73 74
75
76
77
BRASSEUR, J. M. RUYSSCHAERT, Biochim. Biophys. Acta 1993, 1145, 124–133. I. MARTIN, H. SCHAAL, A. SCHEID, J. M. RUYSSCHAERT, J. Virol. 1996, 70, 298–304. R. BRASSEUR, M. VANDENBRANDEN, B. CORNET, A. BURNY, J. M. RUYSSCHAERT, Biochim. Biophys. Acta 1990, 1029, 267–273. J. P. BRADSHAW, M. J. DARKES, T. A. HARROUN, J. KATSARAS, R. M. EPAND, Biochemistry 2000, 39, 6581–6585. M. RAFALSKI, J. D. LEAR, W. F. DEGRADO, Biochemistry 1990, 29, 7917–7922. A. SAEZ-CIRION, J. L. NIEVA, Biochim. Biophys. Acta 2002, 1564, 57–65. S. G. PEISAJOVICH, R. F. EPAND, M. PRITSKER, Y. SHAI, R. M. EPAND, Biochemistry 2000, 39, 1826–1833. D. K. CHANG, S. F. CHENG, W. J. CHIEN, J. Virol. 1997, 71, 6593–6602. K. F. MORRIS, X. GAO, T. C. WONG, Biochim. Biophys. Acta 2004, 1667, 67–81. J. YANG, C. M. GABRYS, D. P. WELIKY, Biochemistry 2001, 40, 8126–8137. J. YANG, D. P. WELIKY, Biochemistry 2003, 42, 11879–11890. J. YANG, M. PROROK, F. J. CASTELLINO, D. P. WELIKY, Biophys. J. 2004, 87, 1951–1963. L. M. GORDON, P. W. MOBLEY, R. PILPA, M. A. SHERMAN, A. J. WARING, Biochim. Biophys. Acta 2002, 1559, 96–120. L. M. GORDON, P. W. MOBLEY, W. LEE, S. ESKANDARI, Y. N. KAZNESSIS, M. A. SHERMAN, A. J. WARING, Protein Sci. 2004, 13, 1012–1030. M. MURATA, Y. SUGAHARA, S. TAKAHASHI, S. OHNISHI, J. Biochem. (Tokyo) 1987, 102, 957–962. J. D. LEAR, W. F. DEGRADO, J. Biol. Chem. 1987, 262, 6500–6505. S. A. WHARTON, S. R. MARTIN, R. W. RUIGROK, J. J. SKEHEL, D. C. WILEY, J. Gen. Virol. 1988, 69, 1847–1857. C. GRAY, S. A. TATULIAN, S. A. WHARTON, L. K. TAMM, Biophys. J. 1996, 70, 2275–2286. X. HAN, D. A. STEINHAUER, S. A. WHARTON, L. K. TAMM, Biochemistry 1999, 38, 15052–15059. M. E. HAQUE, A. J. MCCOY, J. GLENN, J. LEE, B. R. LENTZ, Biochemistry 2001, 40, 14243–14251.
78 M. E. HAQUE, B. R. LENTZ, Biochemistry 2002, 41, 10866–10876. 79 X. HAN, L. K. TAMM, Proc. Natl Acad. Sci. USA 2000, 97, 13097–13102. 80 X. HAN, L. K. TAMM, J. Mol. Biol. 2000, 304, 953–965. 81 X. HAN, J. H. BUSHWELLER, D. S. CAFISO, L. K. TAMM, Nat. Struct. Biol. 2001, 8, 715–720. 82 J. C. MACOSKO, C. H. KIM, Y. K. SHIN, J. Mol. Biol. 1997, 267, 1139–1148. 83 J. LUNEBERG, I. MARTIN, F. NUSSLER, J. M. RUYSSCHAERT, A. HERRMANN, J. Biol. Chem. 1995, 270, 27606–27614. 84 P. V. DUBOVSKII, H. LI, S. TAKAHASHI, A. S. ARSENIEV, K. AKASAKA, Protein Sci. 2000, 9, 786–798. 85 C. H. HSU, S. H. WU, D. K. CHANG, C. CHEN, J. Biol. Chem. 2002, 277, 22725–22733. 86 Y. LI, A. L. LAI, X. HAN, J. H. BUSHWELLER, D. S. CAFISO, L. K. TAMM, J. Virol., in press. 87 Y. LI, X. HAN, L. K. TAMM, Biochemistry 2003, 42, 7245–7251. 88 S. G. PEISAJOVICH, R. F. EPAND, R. M. EPAND, Y. SHAI, Eur. J. Biochem. 2002, 269, 4342–4350. 89 O. SAMUEL, Y. SHAI, Biochemistry 2001, 40, 1340–1349. 90 M. J. GOMARA, P. MORA, I. MINGARRO, J. L. NIEVA, FEBS Lett. 2004, 569, 261–266. 91 Y. MODIS, S. OGATA, D. CLEMENTS, S. C. HARRISON, Nature 2004, 427, 313–319. 92 D. L. GIBBONS, M. C. VANEY, A. ROUSSEL, A. VIGOUROUX, B. REILLY, J. LEPAULT, M. KIELIAN, F. A. REY, Nature 2004, 427, 320–325. 93 R. M. EPAND, R. F. EPAND, I. MARTIN, J. M. RUYSSCHAERT, Biochemistry 2001, 40, 8800–8807. 94 A. COLOTTO, I. MARTIN, J. M. RUYSSCHAERT, A. SEN, S. W. HUI, R. M. EPAND, Biochemistry 1996, 35, 980–989. 95 M. L. LONGO, A. J. WARING, D. A. HAMMER, Biophys. J. 1997, 73, 1430–1439. 96 W. L. LAU, D. S. EGE, J. D. LEAR, D. A. HAMMER, W. F. DEGRADO, Biophys. J. 2004, 86, 272–284. 97 R. YANG, M. PROROK, F. J. CASTELLINO, D. P. WELIKY, J. Am. Chem. Soc. 2004, 126, 14722–14723.
23
24 Protein-Lipid Interactions during Virus Entry by Membrane Fusion
98 S. KAMATH, T. C. WONG, Biophys. J. 2002, 83, 135–143. 99 Q. HUANG, C. L. CHEN, A. HERRMANN, Biophys. J. 2004, 87, 14–22. 100 L. VACCARO, K. CROSS, J. KLEINJUNG, S. STRAUSS, D. THOMAS, S. WHARTON, J. SKEHEL, F. FRATERNALI, Biophys. J. 2005, 88, 25–36. 101 S. TONG, R. W. COMPANS, Virology 2000, 270, 368–376. 102 H. JIN, G. P. LESER, J. ZHANG, R. A. LAMB, EMBO J. 1997, 16, 1236–1247. 103 D. A. EINFELD, E. HUNTER, J. Virol. 1994, 68, 2513–2520. 104 R. J. OWENS, C. BURKE, J. K. ROSE, J. Virol. 1994, 68, 570–574. 105 T. PIETSCHMANN, H. ZENTGRAF, A. RETHWILM, D. LINDEMANN, J. Virol. 2000, 74, 4474–4482. 106 J. T. WEST, P. B. JOHNSTON, S. R. DUBAY, E. HUNTER, J. Virol. 2001, 75, 9601–9612. 107 G. M. TAYLOR, D. A. SANDERS, Mol. Biol. Cell 1999, 10, 2803–2815. 108 D. Z. CLEVERLEY, J. LENARD, Proc. Natl Acad. Sci. USA 1998, 95, 3425–3430. 109 S. A. TATULIAN, L. K. TAMM, Biochemistry 2000, 39, 496–507. 110 D. J. SCHIBLI, R. C. MONTELARO, H. J. VOGEL, Biochemistry 2001, 40, 9570–9578. 111 A. SAEZ-CIRION, J. L. ARRONDO, M. J. GOMARA, M. LORIZATE, I. ILORO, G. MELIKYAN, J. L. NIEVA, Biophys. J. 2003, 85, 3769–3780. 112 M. D. DELAHUNTY, I. RHEE, E. O. FREED, J. S. BONIFACINO, Virology 1996, 218, 94–102. 113 E. O. FREED, D. J. MYERS, R. RISSER, Proc. Natl Acad. Sci. USA 1990, 87, 4650–4654.
114 L. BERGERON, N. SULLIVAN, J. SODROSKI, J. Virol. 1992, 66, 2389–2397. 115 P. W. MOBLEY, A. J. WARING, M. A. SHERMAN, L. M. GORDON, Biochim. Biophys. Acta 1999, 1418, 1–18. 116 G. B. MELIKYAN, S. LIN, M. G. ROTH, F. S. COHEN, Mol. Biol. Cell 1999, 10, 1821–1836. 117 G. B. MELIKYAN, R. M. MARKOSYAN, M. G. ROTH, F. S. COHEN, Mol. Biol. Cell 2000, 11, 3765–3775. 118 L. K. TAMM, X. HAN, Y. LI, A. L. LAI, Biopolymers 2002, 66, 249–260. 119 C. M. CARR, P. S. KIM, Cell 1993, 73, 823–832. 120 L. K. TAMM, Biochim. Biophys. Acta 2003, 1614, 14–23. 121 J. CHEN, J. J. SKEHEL, D. C. WILEY, Proc. Natl Acad. Sci. USA 1999, 96, 8967–8972. 122 H. E. PARK, J. A. GRUENKE, J. M. WHITE, Nat. Struct. Biol. 2003, 10, 1048–1053. 123 T. KANASEKI, K. KAWASAKI, M. MURATA, Y. IKEUCHI, S. OHNISHI, J. Cell Biol. 1997, 137, 1041–1056. 124 S. A. TATULIAN, P. HINTERDORFER, G. BABER, L. K. TAMM, EMBO J. 1995, 14, 5514–5523. 125 C. GRAY, L. K. TAMM, Protein Sci. 1997, 6, 1993–2006. 126 C. GRAY, L. K. TAMM, Protein Sci. 1998, 7, 2359–2373. 127 L. K. TAMM, J. CRANE, V. KIESSLING, Curr. Opin. Struct. Biol. 2003, 13, 453–466. 128 F. W. TSE, A. IWATA, W. ALMERS, J. Cell Biol. 1993, 121, 543–552. 129 X. HAN, C. T. WANG, J. BAI, E. R. CHAPMAN, M. B. JACKSON, Science 2004, 304, 289–292.
1
The Calponin Homology (CH) Domain Mario Gimona Consorzio Mario Negri Sud, Santa Maria Imbaro, Italy
Steven J. Winder University of Sheffield, Sheffield, United Kingdom
Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2
1 Introduction and Brief History
“Does Vav bind to F-actin through a CH domain?” This title of a 1995 paper by Castresana and Saraste [1] marks the birth of the calponin homology (CH) domain. With the help of structure-based sequence alignments the authors described a 100 residue long protein module that they found in a number of signaling and cytoskeletal proteins. The title of the manuscript prejudiced in a peculiar way the developments in the years to come. Not only did very few people know that the guanine nucleotide exchange factor Vav existed or even interacted with the actin cytoskeleton, but also the calponin community was surprised to find that the functionally almost dispensable N-terminal region of the calponin molecule would serve as the ‘mother of actin-binding modules’. At any rate, once the CH domain was born it stirred up the fields of cytoskeleton research and signaling and made a significant contribution towards a greater mutual understanding of the importance of upstream and downstream targets, respectively. Prior to this historical landmark, partial sequence similarities between the actinbinding domains of classical actin cross-linkers like α-actinin, filamin, spectrin, or fimbrin were noted [2–4] and the name-giving protein, calponin, likewise showed sequence similarities in parts of its N-terminal domain (see Figure 1 for sequence alignments and type classification). With the initial establishment of the CH domain, researchers began to look at actin-binding sites from a new perspective, and soon novel CH-domain family members were identified.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
Fig. 1
2 The Calponin Homology (CH) Domain
2 Structure of the Domain – The CH Domain Fold 3
However, the early years of the CH domain were troublesome. A persistent misapprehension of CH domain function, based on an oversimplified interpretation of actin-binding data, delayed a more intellectual discourse with this fascinating protein module [5]. The CH domain was stuck in mediating actin association for all kinds of proteins, irrespective of their subcellular localization or molecular context. More recent and more careful annotations of additional functional sites present on all types of CH domains not only sharpened researchers’ minds but also led to the revival of long-forgotten questions with respect to the regulation of actin binding. Recent years have seen a greater appreciation for and a more liberal view of the functional polymorphism displayed by CH-domain–containing molecules. The realization that actin-binding CH domains can bind the filaments in multiple ways [6–9] considerably shaped the current view on actin-binding modules of the CH-domain family (reviewed in [10]). The final hits aiding in establishing the CH domain as a multifaceted tool were added by the unequivocal determination of the microtubule-anchoring function of the EB1 CH domain [11] and the identification of CH domains in proteins of the nuclear mitotic apparatus (NuMA) [12].
2 Structure of the Domain – The CH Domain Fold
The CH domain is a compact globular fold comprising four major (A, C, E, G) and two or three minor helices (B, D, F) interconnected by loops of variable lengths (Figure 2a). The four major helices of between 10 and 20 residues form the core of the domain, with the smaller helices making minor structural contributions. Helices C, E, and G run roughly parallel to each other, with the N-terminal helix A being roughly perpendicular to them. The A and G helices are the ‘bread and butter’ sandwiching the ‘jam’ represented by the hydrophobic helices C and E (Figure 2a). Among the 15 different CH domain structures currently known (Table 1), which range from those present in a single copy to those of the double tandem domain in fimbrin and represent actin-binding, signaling, and microtubule binding proteins, the structural unit is highly conserved with an rmsd of ∼1 Å between Cα atoms of the four helices in all CH domains [13]. Despite strong structural conservation, CH domains can nonetheless be divided into distinct families, based on structural [7, 14] or sequence alignments [14]. In either situation, families arise due to differences in the lengths and positions of the minor helices and sequence ← Fig. 1 Sequence alignment of CH domains representing the individual types. Invariant key residues are highlighted, and the grey bars at the bottom indicate the positions of helices. The key residues tryptophan (W) at position 11 and the aspartate at the beginning of helix G are the most-conserved residues. Helix C usually follows the con-
sensus DGXXLXXL. The proline (P) residue terminating helix C is invariant in type2 and type3 CH domains and also in the fimtype and EBtype CH domains, but is missing in type1, type4, and type5 CH domains. The asparagine (N) at or within helix E is conserved, but the position varies significantly among the different CH domain types.
4 The Calponin Homology (CH) Domain
Fig. 2 Representative structures of single and tandem CH domains. (a) The single CH domain of calponin [13] with major helices colored individually, from blue at the N terminus (n) to red at the C terminus (c), and labeled from A to G. A short 310 helix unique to calponin is found between helices E and F, and there is no helix D. (b) The compact tandem CH-domain pair of plectin [20], colored and labeled as above.
CH1 is on the left and CH2 on the right; as in calponin there is no helix D in CH1. In comparison to other tandem CH-domain structures, helix A in CH1 is very long. This is probably not a feature unique to plectin, since other studies have suggested a long N-terminal helix for utrophin [19], but the extended amino terminal part of the helix is not part of the true CH domain fold.
variations in interconnecting loops. In all instances the integrity of the domain is maintained by hydrophobic interactions between the core helices, with only a tryptophan residue in helix A being absolutely conserved in all CH domains. This Trp residue generally forms nonpolar interactions with other aromatic or aliphatic residues in helices E and G, thus stabilizing helix A with respect to the triple-helical bundle formed by helices C, E, and G. The four or five residues that are conserved in character in all CH domains are mostly involved in helical packing of the core structure. Table 1 CH domain structures
Protein source Type of analysis
CH domain subtype
Calponin EB1 Rng2 β-Spectrin Utrophin α-Actinin Plectin Dystrophin Utrophin Fimbrin
type3 EBtype type3 type2 type2 type1/type2 tandem type1/type2 tandem type1/type2 tandem type1/type2 tandem type1/type2 tandem
NMR crystal crystal crystal crystal crystal crystal crystal crystal crystal
Reference 1H67 1PA7, 1UEG 1P2X, IP5S 1AA2, 1BKR 1BHD unpublished 1MB8 1DXX 1QAG 1AOA
[13] [17] [18, 76] [14, 16] [7] [22] [20] [23] [6] [21]
2 Structure of the Domain – The CH Domain Fold 5
2.1 Structures of Single CH Domains
The first structures of single CH domains to be solved were of CH2 in spectrin and utrophin [7, 16]; however, these were single CH domains belonging to actin-binding domains containing tandem pairs of CH domains and are discussed in more detail below. The first structure of a true single CH domain, and to date the only solution structure of a CH domain, was that of the archetypal CH domain from chicken gizzard calponin [13] (Figure 2a); more recently this was followed by the crystal structures of EB1, a microtubule binding protein [17], and Rng2, an IQGAP protein from yeast [18]. In all instances the CH domains of these proteins are not thought to be necessary for actin binding, despite the association of Rng2 and calponin with actin or actin-containing structures. Calponin does not require its CH domain for actin-binding and, as discussed below, the single CH domain is not an actinbinding domain per se. In EB1 however, this protein binds to microtubules and not to F-actin, but it is postulated that this interaction is mediated by hydrophobic residues in a similar manner to CH-domain interactions with F-actin [17]. It has been suggested that the interhelical loops, which in structural and sequence terms are the elements that show the most variation among CH domains, might confer the different properties on CH domains [19] and that flexibility in these regions might also confer unique properties, particularly in fimbrin. The solution structure of the calponin CH domain displayed very little if any flexibility in interhelical loops [13], and it is now accepted that it is probably conserved residues in the core helices that allow single CH domains to ‘locate’ on an actin filament. 2.2 Structures of Tandem CH Domains
The basic structure of individual CH domains within the tandem CH domaincontaining proteins recapitulates the sequence-derived phylogeny, in that the CH1 domains are structurally more similar to CH1 domains in other proteins than to the CH2 domain in the same protein. The main differences stem from the relative lengths of the core helices and the number and position of the secondary helices, including short 310 helices flanking helix C in CH2 domains and the presence of an additional helix D in CH2. These differences notwithstanding, the CH domains are remarkably similar in core structure [13]. The most striking and perhaps controversial feature of the tandem CH-domain structures so far elucidated is the positions of the first and second CH domains relative to each other. Fimbrin, plectin, and α-actinin all crystallized as compact monomers with a single molecule in the asymmetric unit [20–22], whereas dystrophin and utrophin crystallized in a more extended conformation and as antiparallel dimers [6, 23]. Biochemically, all the isolated actin-binding domains of these proteins are monomeric; furthermore, the interface between the antiparallel utrophin and dystrophin molecules was such that CH1 of one chain was juxtaposed with CH2 of the other chain in an orientation identical to the orientation of CH domains in fimbrin, α-actinin, and plectin. This
6 The Calponin Homology (CH) Domain
sort of conservation of interactions that exists between domains in proteins that adopt two different states, here, crystallographic monomer versus crystallographic dimer, is known as 3D domain swapping [24]. As such, this is not unusual – the controversy arises, however, as to whether these tandem CH-containing actinbinding domains can adopt different conformations in solution and whether there is significant reorganization of the CH domains upon binding to actin [25]. To add fuel to the controversy, several cryoelectron microscopy reconstructions of tandem CH domains with F-actin have yielded different configurations for the CH-domain orientation on actin. Initially this difference was thought to be a consequence of the length of the inter-CH domain linker [6], because in the first two tandem CH-domain structures to be published, those of fimbrin and utrophin [6, 21], the long linker of fimbrin was believed to allow the two CH domains to fold back on each other and the shorter linker in utrophin resulted in a more open structure. α-Actinin however, which has one of the shortest interdomain linkers of all, crystallized as a compact monomer [22], effectively ruling out the linker hypothesis. An alternative view might be that the conformation of the actin-binding domain reflects the function of the whole protein, i.e., actin-bundling proteins like fimbrin and α-actinin require a compact formation because they bind at right angles to the actin filament [26, 27], and dystrophin and utrophin, which are more akin to side-binding proteins [28, 29], require an open conformation. This argument would seem to support available cryoelectron microscopic data (see [10]); however, the recent crystal structure of plectin [20], which is likely to also be an F-actin side-binding protein to some extent, was found to be a compact monomer (Figure 2b). Although Pereda and colleagues demonstrated rather elegantly [20], as had been suggested previously for utrophin [8, 30], that the two CH domains can undergo movement and can rearrange upon binding to actin [31], further direct experimentation is required to resolve this fascinating problem.
3 Molecular and Signaling Function
CH domains are found in a wide range of molecules, but the unifying theme among them is their involvement in cytoskeletal structure, dynamics, and signaling. The enormous functional plasticity displayed by the otherwise structurally highly conserved domain argues that the CH domain represents a platform for a plethora of functional sites. 3.1 Actin-binding Domains
Sequence- and structure-based profiling has led to the established classification of CH domains into at least five distinct subfamilies [15, 32]. By far the largest number of CH domains belongs to the type1 and type2 classes. With a single exception (smoothelin), these CH domains occur in tandem, and this dual module
3 Molecular and Signaling Function
forms a high-affinity actin-binding domain (ABD) in a large variety of actin-binding and cross-linking proteins. The amino acid sequences of fimbrin-type CH domains are sufficiently different to place them in a separate subfamily, yet they follow the general consensus and arrangement of ABDs formed by the type1/type2 CH domains. Type1 and type2 CH domains differ not only in sequence, but also in their affinities for actin, as has been shown for the actin cross-linking protein α-actinin. Notably, the ‘type1 fim’ CH domains of fimbrin are able to associate additionally with the intermediate filament component vimentin [33] at a site encompassing residues 143–188 (corresponding to the type1 CH domain in the first ABD). ABDs bind one actin monomer in the filament, with affinities typically in the low micromolar range. When analyzed in isolation, CH domains from ABDs have divergent actin-binding characteristics. All type1 CH domains bind actin with significant affinity, but for type2 CH domains actin binding is almost undetectable. Nevertheless, both N- and C-terminal CH domains are required for the generation of a fully functional ABD, in which the type2 CH domain appears to contribute to the overall stability of the module [34]. Similarly, the ABD in plectin requires only the CH1 for actin binding and dimerization [35]. Here, the crucial importance of the ABS2 residing in the most C-terminal helix of the CH1 was shown for the first time. Interestingly, the CH2 in plectin appears to have a negative influence on actin binding of the ABD, because deletion of the second half increases actin binding of the plectin ABD; however, this may be a reflection of alternatively spliced exons in the first CH domain because, depending on the spliced exon, the affinity for actin can vary considerably [36]. Structural studies of CH domain proteins and their interactions with actin filaments have suggested that a conserved hydrophobic surface is implicated in binding to actin filaments. It is therefore believed that the general mode of binding and the molecular interface employed for contacting the actin filament is conserved among actin-binding domains formed by a CH domain tandem. When the isolated ABD from the actin cross-linking protein α-actinin was used as a molecular targeting vehicle, a variety of otherwise cytoplasmic components (e.g., GFP, Vav) could be targeted to the thin filaments of transfected fibroblasts. However, the similarly arranged ABD from the actin-network-stabilizing molecule filamin failed to serve as a strong targeting vehicle, although the domain is undoubtedly required for filamin binding to actin [37]. This example illustrates that there are functional differences among ABDs from different subfamilies of CH-domain proteins, likely reflecting the differences displayed in the amino acid sequence of this module [38]. One may thus hypothesize that functional diversity occurs among type1/type2 CH domain ABDs and that these differences may account for the subtle differences in actin affinity, mode of cross-linking, and site of attachment along the actin filament. 3.2 Single EB-type CH Domains Function as Microtubule Anchors
In total contrast to the CaP-family and Vav-family CH domains, the CH domains of EB-family proteins (namely EB1, EB2, EB3, RB1, and the yeast EB1 homolog
7
8 The Calponin Homology (CH) Domain
Bim1) have been shown to contact growing microtubule ends. End-binding (EB) proteins are evolutionarily conserved proteins that modulate microtubule dynamics by regulating dynamic instability. In particular, EB1 targets growing microtubule ends, where it is favorably positioned to regulate microtubule polymerization [39] and to confer molecular recognition of the microtubule end [40, 41]. Immunodepletion of EB1 from Xenopus egg extracts has been shown to reduce microtubule length, and this effect was reversed by readdition of recombinant EB1 [42]. EB1 also decreased microtubule catastrophe, resulting in increased polymerization and stable microtubules in interphase cells. The effect of EB1 on microtubule dynamics is highly conserved, suggesting that this protein family belongs to a core set of regulatory factors conserved in higher organisms. In Drosophila, EB1 has been shown to play a crucial role in mitosis by its ability to promote the growth and interactions of microtubules within the central spindle and at the cell cortex [43]. Finally, in Dictyostelium, the largest known EB1 homolog (57 kDa), termed DdEB1, also localizes along microtubules and at microtubule tips, centrosomes, and protruding pseudopods and was found at the spindle, spindle poles, and kineto-chores during mitosis. In addition, EB1 is involved in regulating the interaction of the tumor suppressor adenomatous polyposis coli (APC) with the microtubule apparatus [44, 45]. EB1 binds to the C terminus of the APC protein and may regulate accumulation in cortical clusters of extending membranes, but endogenous EB1 does not accumulate in the APC clusters [46]. 3.3 Kinases, Phospholipids, and Other Cytoskeletal Components
Tandem CH domains are also present in the actopaxin/parvin family [47, 48], but here, both CH domains branch into separate subfamilies (type4, type5) and expose (overlapping) binding sites for F-actin and the focal adhesion proteins paxillin and integrin-linked kinase (ILK) [49, 50]. Thus, the CH domains in this family display functions different from those of type1, type2, or type3 CH domains. Unfortunately, more detailed molecular information about the residues involved in these interactions is lacking. CH domains display different functions when present in single as compared to tandem motifs, and different proteins contain CH domains of different ‘types’. All ABDs strictly follow the consensus type1/type2 arrangement. However, the relative contribution of each CH domain to actin binding is not known. Detailed correlative sequence analysis has revealed that CH domains in ABDs can harbor additional conserved binding motifs for phosphatidylinositol (type2 CH domains; see below) and also novel autonomous actin-binding sequences, like the recently discovered DFRxxL motif from myosin light chain kinase [51], reviewed in [15]. These findings strongly suggest that even CH domains in ABDs may expose additional, as yet unidentified, features that contribute to the selectivity and specificity of binding to actin and possibly to other thin-filament-associated components. A perfect example of this are the actin-binding domains from plectin and dystonin, which bind β4 integrin [52]. Mutations in the integrin binding pocket, formed
3 Molecular and Signaling Function
by a sequence stretch unique to plectin and dystonin and corresponding to the region preceding and partially overlapping the ABS2 (residing in the C-terminal helix in the CH1 of plectin), significantly decrease β4 integrin binding but do not influence the F-actin binding ability of plectin’s ABD. Interestingly, actin binding and integrin β4 association are mutually exclusive in plectin, suggesting that CH domain function can be switched ‘on site’ — the mechanism, however, remains obscure. In contrast to the situation in type1 and type2 CH domains, the actin binding affinities of type3 CH domains are in the millimolar range, and researchers thus earlier questioned whether their physiological function is indeed related to actin binding. Although the contribution of the CH domain in calponin-family proteins is still an open question, we have seen an accumulation of data in recent years that support the original skepticism. It is clear now that the CH domain is neither sufficient nor necessary for actin binding activity in this protein family [53–56]. This development has helped to make manifest the view that type3 CH domains may serve a primarily regulatory function, and indeed several laboratories have identified a plethora of binding partners for type3 CH domains. In the namegiving protein calponin (CaP), for example, the type3 CH domain interacts with extracellular regulated kinases ERK1 and ERK2 in vitro [57] and is hypothesized to be modulated by an association with LIM kinase in vivo [58]. Work from the Gusev laboratory revealed that Hsp90 binds directly to the CH domain of smooth muscle h1CaP and affects actin binding. The authors hypothesized that, in the presence of Hsp90, CaP is trapped in a complex, which makes the molecule unavailable for interaction with G-actin. In this way, Hsp90 could decrease the CaP-induced polymerization of actin [59, 60]. Calponin interacts in vitro with the dimeric S100family members calcyclin (S100A6), which exposes two functional Ca2+ -binding EFhands per monomer, and S100A2. EF-hands have a similar amino acid arrangement as zinc fingers and require the presence of conserved Cys or His residues in a particular spatial arrangement. EF-hands of S100A2 can also bind Zn2+ , suggesting that Zn2+ binding is involved in the regulation of CaP via S100 molecules. In agreement with this scenario, the type3 CH domain in the Rho family nucleotide exchange factor Vav binds intramolecularly to a zinc-finger–like domain [61] and has been shown to interact with the guanine nucleotide dissociation inhibitor (RhoGDI) in a two-hybrid interaction screen [62]. Vav proteins are activated by an N-terminal deletion that removes all (in Vav-2) or part (in Vav-1) of the CH domain, and this deletion occurs naturally in the onco-vav gene [63]. The CH domain is required for the modulation or down-regulation of Vav’s GEF activity [64, 65] by mediating a conformational switch in the molecule, which involves interaction with the C-terminal zinc-finger domain and results in a steric block of the catalytic DH-PH domains [66]. Notably, the short loop connecting helices A and B in the Vav CH domain is responsible for this Vav-specific function, and replacement of the loop with the homologous region from CaP makes the molecule constitutively active [67]. Binding of the phosphoinositide PtdIns(4,5)-P2 negatively regulates actin binding and bundling activity of the antiparallel actin filament cross linker α-actinin, likely
9
10 The Calponin Homology (CH) Domain
by mediating changes in the molecular structure [68]. The conserved PiP2 binding site resides in the CH2 of α-actinin, in good agreement with the supporting function of CH2 domains in ABDs. Site-directed mutagenesis in this region has revealed three critical basic residues. Mutant proteins carrying sequence alterations in these amino acids, leading to defective PiP2 binding, display increased actin binding and bundling activity in vitro. 3.4 CH Domain-containing Proteins and Human Diseases 3.4.1 The Dystrophin ABD and Muscular Dystrophy Dystrophin is a large cytoskeletal linker protein that connects the subsarcolemmal actin cytoskeleton of skeletal muscle to the transmembrane adhesion receptor dystroglycan [69]. Mutations in dystrophin give rise to the crippling and fatal Xlinked disease Duchenne muscular dystrophy (DMD). The majority of mutations in the DMD gene give rise to premature stop codons, resulting in transcript instability and complete loss of protein. The milder allelic form of DMD, Becker muscular dystrophy, which is caused by in-frame deletion or missense point mutations, does allow the synthesis of mutated protein. Most point mutations in the CH domaincontaining actin-binding region of dystrophin lead to a relatively severe phenotype [70–72], emphasizing the importance of the actin-binding domain for dystrophin function. However, it is doubtful that in the majority of cases the actin-binding domain is functional at all, because the four missense mutations; L54R, A171P, A168D, and Y321N and in-frame deletions of exon 3 (residues 32–62) and exon 5 (residues 89–119) are all expected to disrupt either the hydrophobic core of the protein or its overall structure [23]. Consequently, these mutations have not been particularly useful in determining function. 3.4.2 The Filamin ABD and Otopalatodigital Syndromes Otopalatodigital (OPD) syndromes and related disorders are a diverse group of X-linked diseases affecting craniofacial, skeletal, brain, visceral, and urogenital structures. The affected gene encodes the cytoskeletal protein filamin A, with approximately half of the 17 mutations so far described being missense mutations residing in the CH2 domain of the filamin ABD [73]. No structure of a filamin CH domain is yet available, but mapping the mutations to a model of filamin based on the structures of spectrin or dystrophin CH2 suggested that some mutations were likely to not simply result in loss of actin-binding function [73], which points to additional and as-yet-unidentified roles for CH domains. 3.4.3 The α-Actinin ABD and Glomerulosclerosis Mutations in α-actinin 4 have been described in familial focal segmental glomerulosclerosis (FSGS). FSGS is a common nonspecific renal lesion characterized by decreasing kidney function and often leading to end-stage renal failure. α-Actinin 4 has been implicated in some cases of autosomal dominant FSGS, with point mutations occurring in helix G of CH2 [74]. All three of the mutations characterized –
4 Emerging Research Directions and Recent Developments
K228E, T232I, and S235P – are on the solvent-accessible surface of helix G and are not expected to affect core structure, but are also not in a region implicated in direct interactions with actin, and so presumably are involved in some other as-yet-unidentified role of α-actinin in the kidney. 3.4.4 The β-Spectrin ABD and Spherocytosis Hereditary spherocytosis (HS) includes a group of heterogeneous hemolytic anemias ranging in severity from asymptomatic to severe. In all cases the red blood cell has a distinct morphology, with varying degrees of surface area reduction leading to a spherocytic phenotype and osmotic fragility. Of the four characterized subsets of HS patients, two are characterized by a deficiency in β-spectrin. Several mutations in spectrin have been described and shown to be the molecular defect in HS with spectrin deficiency [75]. Of these mutations, two were found in the second CH domain, W182G and I220V, with both residues being important for maintaining the hydrophobic core of the CH domain. Changing Trp182 to Gly in the short helix B in particular would be expected to have a severe effect on the stability of the CH domain, with consequent effects on the function of the whole ABD. Many other proteins that contain CH domains are implicated in diseases, for example EB1 is a tumor-suppressor protein, and plectin is involved in epidermolysis bullosa with muscular dystrophy, but to date, disease-causing mutations in these proteins have not been identified within the CH domains.
4 Emerging Research Directions and Recent Developments
The presence of a CH domain in any given protein is still taken as a strong indication that the molecule associates with the actin cytoskeleton, despite controversial interpretations of binding data and subcellular localization studies which call this simplified view into question. It is more than evident, from the diverse list of binding partners that have been identified for the various CH domains, that the module is a platform for a number of interaction sites with cytoskeleton and signaling components. The most interesting study of the past decade may be the identification of the EB1 CH domain. EB CH domains strictly follow the consensus of conserved residues and, even though the protein has not been ascribed any direct association with the actin cytoskeleton, are placed in the CH-domain family tree, representing a separate branch. The work of Hayashi and Ikura has demonstrated that the module folds almost identically to the CH domains described thus far for spectrin, fimbrin, and calponin – but the CH domain in EB1 is a microtubule anchor instead. It is difficult to envisage any similarity in the surface profiles between an actin filament and a microtubule. Hence, morphofunctional plasticity, established during a coevolutionary process of the diverse eukaryotic cytoskeleton filament systems, and the factors regulating their assembly and dynamics, may be the key to understanding this apparent paradox. Other important work, such as with the plectin ABD [52], should serve as future guidelines on how meaningful studies can
11
12 The Calponin Homology (CH) Domain
be tailored to increase our knowledge of CH domain function and interactions. Future research activities might put greater emphasis on identifying the in vivo functions of the diverse CH domains and aim at determining in more detail the (sequence) parameters that drive functional plasticity in this family.
5 Conclusions
Not even ten years have gone by since the CH domain was delineated in detail. Like all other domains and modules analyzed in this book, the CH domain obeys the rule of protein module definition. It is a stable fold, and the function can be transported to other molecules by simple fusion of the coding sequence. We must, however, be aware of the fact that protein linguistics and positional semantics may lead us into novel territory in which the strict functional definitions may not hold. There is rapidly increasing evidence for the existence of a novel class of modules that fulfill the criteria of autonomous function but not of folding. These intrinsically unfolded protein (IUP) modules appear to fold exclusively at their ligands – and a good portion of these associate with the cytoskeleton! One should therefore keep in mind that the definition of domain borders is perhaps the most relevant parameter for assessing the proper in vivo function and that structurally dispensable flanking regions may contribute significantly to the fine-tuning of domain function.
Acknowledgements
Work cited in this chapter was funded by the BBSRC, MRC, and Wellcome Trust (SJW). We are grateful to Mike Broderick for critical reading of the manuscript and to Jose Pereda for providing Figure 1b. MG is recipient of the Marie Curie Excellence Grant MEXT-CT-2003-002573 of the European Union.
References 1 CASTRESANA, J., SARASTE, M., Does Vav bind to F-actin through a CH domain? FEBS Lett. 1995, 374, 149–151. 2 DE ARRUDA, M. V., WATSON, S., LIN, C. S., LEAVITT, J., MATSUDAIRA, P., Fimbrin is a homologue of the cytoplasmic phosphoprotein plastin and has domains homologous with calmodulin and actin gelation proteins. J. Cell Biol. 1990, 111, 1069–1079. 3 MATSUDAIRA, P., Modular organization of actin crosslinking proteins. Trends Biochem. Sci. 1991, 16, 87–92.
4 WAY, M., POPE, B., WEEDS, A. G., Evidence for functional homology in the F-actin binding domains of gelsolin and alpha-actinin: implications for the requirements of severing and capping. J. Cell Biol. 1992, 119, 835–842. 5 GIMONA, M., WINDER, S. J., Single CH domains are not actin binding domains. Current Biol. 1998, 8, R674–675. 6 KEEP, N. H., WINDER, S. J., MOORES, C. A., WALKE, S., NORWOOD, F. L. M., KENDRICK-JONES, J., Crystal structure of the actin-binding region of utrophin
References
7
8
9
10
11
12
13
14
15
16
17
reveals a head-to-tail dimer. Struct. Fold. Design 1999, 7, 1539–1546. KEEP, N. H., NORWOOD, F. L., MOORES, C. A., WINDER, S. J., KENDRICK-JONES, J., The 2.0 Å structure of the second calponin homology domain from the actin-binding region of the dystrophin homologue utrophin. J. Mol. Biol. 1999, 285, 1257–1264. GALKIN, V. E., ORLOVA, A., VAN LOOCK, M. S., RYBAKOVA, I. N., ERVASTI, J. M., EGELMAN, E. H., The utrophin actin-binding domain binds F-actin in two different modes: implications for the spectrin superfamily of proteins. J. Cell Biol. 2002, 157, 243–251. GALKIN, V. E., ORLOVA, A., VAN LOOCK, M. S., EGELMAN, E. H., Do the utrophin tandem calponin homology domains bind F-actin in a compact or extended conformation? J. Mol. Biol. 2003, 331, 967–972. WINDER, S. J., Structural insights into actin-binding, branching and bundling proteins. Curr. Opin. Cell Biol. 2003, 15, 14–22. BU, W., SU, L.-K., Characterization of functional domains of human EB1 family proteins. J. Biol. Chem. 2003, 278, 49721–49731. NOVATCHKOVA, M., EISENHABER, F., A CH domain-containing N-terminus in NuMA? Protein Sci. 2002, 10, 2281–2284. BRAMHAM, J., HODGKINSON, J., SMITH, B. O., Uhr´ın, D., BARLOW, P. N., WINDER, S. J., Solution structure of the calponin CH domain and fitting to the 3D helical reconstruction of F-actin: calponin. Struct. Fold. Design. 2002, 10, 249–258. BANUELOS, S., SARASTE, M., CARUGO, K. D., Structural comparisons of calponin homology domains: implications for actin binding. Structure 1998, 6, 1419–1431. GIMONA, M., DJINOVIC-CARUGO, K., KRANEWITTER, W. J., WINDER, S. J., Functional plasticity of CH domains. FEBS Lett. 2002, 513, 98–106. ˜ UELOS, S., DJINOVIC-CARUGO, K. D., BAN SARASTE, M., Crystal structure of a calponin homology domain. Nat. Struct. Biol. 1997, 4, 175–179. HAYASHI, I., IKURA, M., Crystal structure of the amino-terminal
18
19
20
21
22 23
24
25
26
microtubule-binding domain of end-binding protein 1 (EB1). J. Biol. Chem. 2003, 278, 36430–36434. WANG, C.-H., WALSH, M., BALASU-BRAMANIAN, M. K., DOKLAND, T., Expression, purification, crystallization and preliminary crystallographic analysis of the calponin-homology domain of Rng2. Acta Crystallogr. D. 2003, 59, 1809–1812. MORRIS, G. E., NGUYEN, T. M., NGUYEN, T. N., PEREBOEV, A., KENDRICK-JONES, J., WINDER, S. J., Disruption of the utrophin–actin interface by monoclonal antibodies and prediction of an actin-binding surface of utrophin. Biochem. J. 1999, 337, 119–123. GARCIA-ALVAREZ, B., BOBKOV, A., SONNENBERG, A., DE PEREDA, J. M., Structural and functional analysis of the actin binding domain of plectin suggests alternative mechanisms for binding to F-actin and integrin β4. Structure 2003, 11, 615–625. GOLDSMITH, S. C., POKALA, N., SHEN, W., FEDOROV, A. A., MATSUDAIRA, P., ALMO, S. C., The structure of an actin-cross-linking domain from human fimbrin. Nat. Struct. Biol. 1997, 4, 708–712. DJINOVIC-CARUGO, K., personal communication. NORWOOD, F. L. M., SUTHERLAND-SMITH, A. J., KEEP, N. H., KENDRICK-JONES, J., The structure of the N-terminal actin-binding domain of human dystrophin and how mutations in this domain may cause Duchenne or Becker muscular dystrophy. Struct. Fold. Design 2000, 8, 481–491. SCHULUNEGGER, M., BENNET, M., EISENBERG, D., Oligomer formation by 3D domain swapping: a model for protein assembly and disassembly. Adv. Protein Chem. 1997, 50, 61–122. SUTHERLAND-SMITH, A. J., MOORES, C. A., F. L., Norwood, HATCH, V., CRAIG, R., KENDRICK-JONES, J., LEHMAN, W., An atomic model for actin binding by the CH domains and spectrin repeat modules of utrophin and dystrophin. J. Mol. Biol. 2003, 329, 15–33. HANEIN, D., et al., An atomic model of fimbrin binding to F-actin and its implications for filament crosslinking
13
14 The Calponin Homology (CH) Domain
27
28
29
30
31
32
33
34
35
and regulation. Nat. Struct. Biol. 1998, 5, 787–792. YLA¨ NNE, J., SCHEFFZEK, K., YOUNG, P., SARASTE, M., Crystal Structure of the α-actinin rod reveals an extensive torsional twist. Struct. Fold. Design 2001, 9, 597–604. RYBAKOVA, I. N., AMMAN, K. J., ERVASTI, J. M., A new model for the interaction of dystrophin with F-actin. J. Cell Biol. 1996, 135, 661–672. RYBAKOVA, I. N., PATEL, J. R., DAVIES, K. E., YURCHENKO, P. D., ERVASTI, J. M., Utrophin binds laterally along actin filaments and can couple costameric actin with sarcolemma when over-expressed in dystrophin-deficient mice. Mol. Biol. Cell 2002, 13, 1512–1521. MOORES, C. A., KEEP, N. H., KENDRICK-JONES, J., Structure of the utrophin actin-binding domain bound to F-actin reveals binding by an induced fit mechanism. J. Mol. Biol. 2000, 297, 465–480. ORLOVA, A., RYBAKOVA, I. N., PROCHNIEWICZ, E., THOMAS, D. D., ERVASTI, J. M., EGELMAN, E. H., Binding of dystrophin’s tandem calponin homology domain to F-actin is modulated by actin’s structure. Biophys. J. 2001, 80, 1926–1931. KORENBAUM, E, RIVERO, F., Calponin homology domains at a glance. J. Cell Sci. 2002 115, 3543–3545. CORREIA, I., CHU, D., CHOU, Y. H., GOLDMAN, R. D., MATSUDAIRA, P., Integrating the actin and vimentin cytoskeletons: adhesion-dependent formation of fimbrin–vimentin complexes in macrophages. J. Cell Biol. 1999, 146, 831–482. MCGOUGH, A., WAY, M., DEROSIER, D., Determination of the alpha-actinin-binding site on actin filaments by cryoelectron microscopy and image analysis. J. Cell Biol. 1994, 126, 433–443. FONTAO, L., GEERTS, D., KUIKMAN, I., KOSTER, J., KRAMER, D., SONNENBERG, A., The interaction of plectin with actin: evidence for cross-linking of actin filaments by dimerization of the actin-binding domain of plectin. J. Cell Sci. 2001, 114, 2065–2076.
36 FUCHS, P., ZORER, M., REZNICZEK, G. A., SPAZIERER, D., OEHLER, S., CASTANON, M. J., HAUPTMANN, R., WICHE, G., Unusual 5 transcript complexity of plectin isoforms: novel tissue-specific exons modulate actin-binding activity. Hum. Mol. Genet. 1999, 8, 2461–2472. 37 GIMONA, M., unpublished. 38 STRADAL, T., KRANEWITTER, W., WINDER, S. J., GIMONA, M., CH domains revisited. FEBS Lett. 1998, 432, 134–137. 39 NAKAMURA, M., ZHOU, X. Z., LU, K. P., Critical role for the EB1 and APC interaction in the regulation of microtubule polymerization. Curr. Biol. 2001, 11, 1062–1067. 40 MORRISON, E. E., MONCUR, P. M., ASKHAM, J. M., EB1 identifies sites of microtubule polymerisation during neurite development. Brain Res. Mol. Brain Res. 2002, 98, 145–152. 41 BIENZ, M., The subcellular destinations of APC proteins. Nat. Rev. Mol. Cell Biol. 2002 3, 328–338. 42 TIRNAUER, J. S., GREGO, S., SALMON, E. D., MITCHISON, T. J., EB1–microtubule interactions in Xenopus egg extracts: role of EB1 in microtubule stabilization and mechanisms of targeting to microtubules. Mol. Biol. Cell 2002, 13, 3614–3626. 43 ROGERS, S. L., ROGERS, G. C., SHARP, D. J., VALE, R. D., Drosophila EB1 is important for proper assembly, dynamics, and positioning of the mitotic spindle. J. Cell Biol. 2002, 158, 873–884. 44 MIMORI-KIYOSUE, Y., SHIINA, N., TSUKITA, S., The dynamic behavior of the APC-binding protein EB1 on the distal ends of microtubules. Curr. Biol. 2000, 10, 865–868. 45 ASKHAM, J. M., MONCUR, P., MARKHAM, A. F., MORRISON, E. E., Regulation and function of the interaction between the APC tumour suppressor protein and EB1. Oncogene 2000, 19, 1950–1958. 46 BIENZ, M., Spindle cotton on to junctions, APC and EB1. Nat. Cell Biol. 2001, 3, E67–E69. 47 NIKOLOPOULOS, S. N., TURNER, C. E., Actopaxin, a new focal adhesion protein that binds paxillin LD motifs and actin and regulates cell adhesion. J. Cell Biol. 2000, 151, 1435–1448.
References 48 OLSKI, T. M., NOEGEL, A. A., KORENBAUM, E., Parvin, a 42 kDa focal adhesion protein, related to the alpha-actinin superfamily. J. Cell Sci. 2001, 114, 525–538. 49 NIKOLOPOULOS, S. N., TURNER, C. E., Integrin-linked kinase (ILK) binding to paxillin LD1 motif regulates ILK localization to focal adhesions. J. Biol. Chem. 2001, 276, 23499–23505. 50 NIKOLOPOULOS, S. N., TURNER, C. E., Molecular dissection of actopaxin–integrin–linked kinase–paxillin interactions and their role in subcellular localization. J. Biol. Chem. 2002, 277, 1568–1575. 51 HATCH, V., ZHI, G., SMITH, L., STULL, J. T., CRAIG, R., LEHMAN, W., Myosin light chain kinase binding to a unique site on F-actin revealed by three-dimensional image reconstruction. J. Cell Biol. 2001, 154, 611–617. 52 LITJENS, S. H. M., KOSTER, J., KUIKMAN, I., VAN WILPE, S., DE PEREDA, J. M., SONNENBERG, A., Specific binding of the plectin actin-binding domain to β4 integrin. Mol. Biol. Cell 2003, 14, 4039–4050. 53 GIMONA, M., MITAL, R., The single CH domain of calponin is neither sufficient nor necessary for F-actin binding. J. Cell Sci. 1998 111, 1813–1821. 54 FU, Y., LIU, H. W., FORSYTHE, S. M., KOGUT, P., MCCONVILLE, J. F., HALAYKO, A. J., CAMORETTI-MERCADO, B., SOLWAY, J., Mutagenesis analysis of human SM22: characterization of actin binding. J. Appl. Physiol. 2000, 89, 1985–1990. 55 GOODMAN, A., GOODE, B. L., MATSUDAIRA, P., FINK, G. R., The Saccharomyces cerevisiae calponin/transgelin homolog Scp1 functions with fimbrin to regulate stability and organization of the actin cytoskeleton. Mol. Biol. Cell 2003 14, 2617–2629. 56 WINDER, S. J., JESS, T., AYSCOUGH, K. R., SCP1 encodes an actin bundling protein in yeast. Biochem. J. 2003, 375, 287–295. 57 LEINWEBER, B. D., LEAVIS, P. C., GRABAREK, Z., WANG, C. L., MORGAN, K. G., Extracellular regulated kinase (ERK) interaction with actin and the calponin homology (CH) domain of actin-binding
58 59
60
61
62
63
64
65
66
67 68
proteins. Biochem. J. 1999, 344, 117–123. GRUBINGER, M., GIMONA, M., unpublished. MA, Y., BOGATCHEVA, N. V., GUSEV, N. B., Heat shock protein (hsp90) interacts with smooth muscle calponin and affects calponin binding to actin. Biochim. Biophys. Acta 2000, 1476, 300–310. BOGATCHEVA, N. V., MA, Y., UROSEV, D., GUSEV, N. B., Localization of calponin binding sites in the structure of 90 kDa heat shock protein (Hsp90). FEBS Lett. 1999, 457, 369–374. ZUGAZA, J. L., LOPEZ-LAGO, M. A., CALOCA, M. J., DOSIL, M., MOVILLA, N., BUSTELO, X. R., Structural determinants for the biological activity of vav proteins. J. Biol. Chem. 2002, 277, 45377–45453. GROYSMAN, M., SHIFRIN, C., RUSSEK, N., KATZAV, S., Vav, a GDP/GTP nucleotide exchange factor interacts with GDIs, proteins that inhibit GDP/GTP dissociation. FEBS Lett. 2000, 467, 75–80. KATZAV, S., CLEVELAND, J. L., HESLOP, H. E., PULIDO, D., Loss of the amino-terminal helix–loop–helix domain of the vav proto-oncogene activates its transforming potential. Mol. Cell. Biol. 1991, 11, 1912–1920. ABE, K., WHITEHEAD, I. P., O’BRYAN, J. P., DER, C. J., Involvement of NH2 -terminal sequences in the negative regulation of Vav signalling and transforming activity. J. Biol. Chem. 1999, 274, 30410–30418. YABANA, N., SHIBUYA, M., Adaptor protein APS binds the NH2 -terminal autoinhibitory domain of guanine nucleotide exchange factor Vav3 and augments its activity. Oncogene 2002, 21, 7720–7729. AGHAZADEH, B., LOWRY, W. E., HUANG, X. Y., ROSEN, M. K., Structural basis for relief of autoinhibition of the Dbl homology domain of protooncogene Vav by tyrosine phosphorylation. Cell 2000, 102, 625–633. KRANEWITTER, W. J., GRUBINGER, M., GIMONA, M., unpublished. FRALEY, T. S., TRAN, T. C., CORGAN, A. M., NASH, C. A., HAO, J., CRITCHLEY, D. R., GREENWOOD, J. A., Phosphoinositide binding inhibits α-actinin bundling
15
16 The Calponin Homology (CH) Domain
69
70
71
72
73
activity. J. Biol. Chem. 2003, 278, 24039–24045. WINDER, S. J., The membrane–cytoskeleton interface: the role of dystrophin and utrophin. J. Muscle Res. Cell Motil. 1997, 18, 617–629. KAPLAN, J. M., et al., Mutations in ACTN4, encoding α-actinin-4, cause familial focal segmental glomerulo-sclerosis. Nat. Genet. 2000, 24, 251–256. BEGGS, A. H., et al., Exploring the 73 molecular basis for variability among patients with Becker muscular dystrophy: dystrophin gene and protein studies. Am. J. Hum. Genet. 1991, 49, 54–67. PRIOR, T. W., BARTOLO, C., PEARL, D. K., 75 PAPP, A. C., SNYDER, P. J., SEDRA, M. S., BURGHES, A. H. M., MENDELL, J. R., Spectrum of small mutations in the dystrophin coding region. Am. J. Hum. Genet. 1995, 57, 22–33. ROBERTS, R. G., GARDNER, R. J., BOBROW, M., Searching for the 1 in 2,400,000: a
review of dystrophin gene point mutations. Hum. Mutat. 1994, 4, 1–11. 74 ROBERTSON, S. P., et al., Localized mutations in the gene encoding the cytoskeletal protein filamin A cause diverse malformations in humans. Nat. Genet. 2003, 33, 487–491. 75 HASSOUN, H., et al., Characterization of the underlying molecular defect in hereditary spherocytosis associated with spectrin deficiency. Blood 1997, 90, 398–406. 76 DOKLAND, T., personal communication.
Websites Directly Related to the Domain http://www.proteinmodules.org General platform site for protein domains and modules and official web site of the Protein Modules Consortium.
1
PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding Laurence A. Lasky, Nicholas J. Skelton, and Sachdev S. Sidhu
Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2
1 Introdution
Protein–protein binding interfaces are involved with virtually every aspect of intraand extracellular physiology. These interfaces range in size from the very large binding sites formed between antibody combining regions and their cognate antigens to the very small interfaces recognized by a variety of more compact motifs including, for example, the SH2 and SH3 domains. The vast majority of these protein modules bind to unmodified or post-translationally modified regions within proteins, as opposed to sites at the very amino- or carboxy-terminal ends of proteins. The potential benefits of recognition via the beginnings or ends of proteins are two-fold. First, recognition of these regions would allow for enhanced specificity by taking advantage of binding interactions involving the free ends of proteins. This might allow for a relatively small binding site, since specificity can be induced both by mainchain and sidechain interactions as well as by amino- or carboxy-type interactions. Second, recognition of terminal regions of proteins allows for the rest of the protein to be unencumbered by an internally bound protein-recognition motif. This would enable efficient scaffolding with other functionally related proteins together with simultaneous activity of the terminally-bound protein. It thus appeared likely to many investigators that scaffolding proteins that recognized the ends of other proteins were likely to exist. The discovery of a large family of proteins containing a compact domain, termed the PDZ domain, has fulfilled this expectation, at least with respect to C-terminal interactions. Determination of the sequences of several large proteins in the early 1990s led to the discovery of a 90–100 amino acid long domain that was repeated within the proteins and was conserved in many other molecules derived from a variety of organisms and cell types. The initial discovery of this motif was in three polypeptides: the mammalian protein post-synaptic density-95 (PSD-95), the Drosophila Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
melanogaster epithelial tumor suppressor protein, Discs Large (DLG), and the mammalian epithelial tight junction protein, zonula occludens-1 (ZO-1) [1–7]. Initial sequence comparisons revealed a reasonable degree of conservation between these diverse domains, including a highly conserved amino acid sequence, GLGF, and the motifs were initially referred to as GLGF domains. Subsequently, because this short sequence is not identical in all of these domains, as well as to honor the initial sites of discovery, the motifs were renamed PDZ domains (PSD-95, DLG, ZO-1). Large-scale genomic analyses, together with in silico bioinformatics work, has allowed for the determination of the number of potential PDZ domains in the organisms for which we have reasonably complete genomic DNA sequence information. Interestingly, and in contrast to both total gene numbers and genome sizes, the numbers of PDZ domains in various organisms seem to increase dramatically with increasing complexity, although the number of genomes so far analyzed is far too small to be certain of the validity of this trend. However, there are ∼90 domains in the nematode Caenorhabditis elegans, ∼130 in the fly D. melanogaster, and over 400 in the human genome. Some of this increased complexity can be explained by increasingly larger gene families. For example, although the nematode appears to have only two members of the LAP (leucine-rich repeat and PDZ domain) family, higher organisms appear to have additional homologs encoded within their genomes. This appears to be so with many other PDZ domain-containing proteins, including DLG, ZO-type, the MAGI proteins, and LIN-7, among others. The potential reasons for this increased complexity, including tissue specificity, diversity of function, etc., remain to be fully elucidated. PDZ domains are virtually always embedded in proteins that are assembled from multiple protein motifs (a complete list of PDZ-containing proteins together with other domains, functions, and annotations can be found at http://smart.emblheidelberg.de/). These protein motifs can include other PDZ domains, and there are examples of proteins containing only PDZ domains, such as MUPP1, in which the protein consists of 13 PDZ motifs [8]. Other multi-PDZ domain-containing proteins include INAD, par-3, and NHERF. It seems likely that this type of multi-PDZ protein would be involved in the assembly of functionally related proteins via C-terminal recognition, and this has been proven both genetically and biochemically for the INAD protein of Drosophila (see below), as well as several others. A second large family of PDZ domain-containing proteins is the MAGUK (membrane-associated guanylate kinase) family. This large subgroup of PDZ-containing proteins contains the founding members of the PDZ family, and it is unified by the inclusion of a protein motif with distant homology to guanylate kinases (the GUK domain), although there is no evidence that this domain has enzymatic activity. Many of these proteins, including DLG, ZO-1, and the MAGI (membrane-associated guanylate kinases with inverted orientation) proteins, are associated with the tight junctions of epithelial cells and are presumably involved in assembly and maintenance of these important structures. This large family contains proteins with other potential protein interaction domains, including, for example, SH3, WW, calmodulin kinase (CAMK), and L27 motifs. Finally, a third large group of PDZ proteins contains a variety of other sequence motifs (but not a
2 Structural Analysis of PDZ Domains 3
guanylate kinase-like domain) juxtaposed with one or more PDZ domains. Included in this family are proteins containing leucine-rich repeats (the LAP proteins), LIM, or crib motifs. A unifying feature of all of these PDZ-containing proteins is that the molecules contain a variety of protein interaction motifs with no clear evidence of enzymatic activity in any of the family members. These data, together with their localization to diverse intracellular sites, such as epithelial junctions and the neuronal post-synaptic density, suggest that the PDZ-containing proteins are predominately site-specific scaffolding proteins that assemble functional complexes, many of which appear to be important for the assembly, maintenance, and function of these subcellular anatomies.
2 Structural Analysis of PDZ Domains
Structural studies of many PDZ domains have identified a common 3D fold that undoubtedly provides the physical underpinning for the sequence motifs that can be used to identify the domains [2, 9]. Today, 34 independent structures of 20 different PDZ domains have been reported, and all consist of a six-stranded β barrel (strands βA through βF) capped by one short (α1) and one long (α2) helix (Figure 1) [10, 11]. Although the β-barrel core is common to all PDZ domain structures, there is sequence variation within the elements of regular secondary structure and also in the length and composition of the loops connecting them. In addition, a number of the domains also contain additional N- or C-terminal appendages that pack against the conserved core [9, 12, 13].
Fig. 1 Schematic view of the PDZ domain fold. In this structure of the Erbin PDZ domain, $ strands are shown as arrows and " helices as coils. The phage-derived peptide ligand (acTGWETWVCOOH ) is shown in stick form with the C-terminal carboxylate at the top of the view. Coordinates are taken from PDB accession number 1N7T.
4 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
Over half of the PDZ entries present in the structural database have a peptide ligand associated with the domain. In all instances, the GLGF motif of the domain wraps around the C-terminal carboxylate of the ligand [14]. The remainder of the ligand lies in the cleft between strand β2 and helix α2, with the peptide forming an antiparallel β-sheet interaction with β2. As a consequence of this fixed backbone hydrogen-bonding pattern, the ligand sidechains maintain a fixed register with respect to the PDZ domain and are able to contact only a limited subset of the domain sidechains (Figure 1). To standardize the categorization of all PDZ interactions, the ligand residues are numbered from 0 at the C terminus counting by increasingly negative integers toward the N terminus [15]. The importance of interactions at the 0 and −2 sites was appreciated from the earliest studies [16– 19] and led to the classification of PDZ domains into two classes based on their ligand preference at the −2 site. Subsequent studies have shown that the −1 site can also make a crucial contribution to binding [20–23]. Moreover, PDZ domains having a preference for ligands with a particular −3 sidechain are common [16, 20– 22, 24, 25], and preference for residues N-terminal to −3 have been observed, especially if the β2–β3 loop is long [13, 22, 26, 27]. Since the sidechains of ligand residues 0, −1, −2, and −3 all have relatively fixed interactions with their PDZ domain host, establishing the ‘rules’ for predicting ligand affinity and selectivity should be possible, given a suitable training set of PDZ domain–ligand interactions (e.g., see [28]). In addition to the ‘canonical’ ligand binding mode, the co-complex between αsyntrophin and neuronal nitric oxide synthetase (nNOS) PDZ domains indicates that non-C-terminal interactions are also possible. In this particular example, an ancillary β hairpin at the N terminus of nNOS lies in the peptide binding groove of α-syntrophin with the reverse turn of the hairpin positioned approximately where a C-terminal carboxylate would ordinarily reside [29]. More recently, structural studies of the seventh PDZ domain from GRIP demonstrate that its peptide-binding groove is not well-formed [30]. NMR chemical shift mapping suggests that GRASP1, the cognate ligand, binds to a hydrophobic surface formed by the strand β4–β5 hairpin and the face of helix α2 away from the peptide binding groove; precise details of the interaction were not reported. Finally, there have been reports that not all PDZ domains behave as independently folding units. Structures determined for tandem PDZ domains from PSD-95 and syntenin suggest that the domains can have a relatively fixed orientation with respect to each other, which may have implications for their function as scaffold proteins [31–33]. The quaternary structure observed for PDZ6 from GRIP-1 suggest alternative modes of such supramolecular assembly [34]. Finally, the PDZ4 and PDZ5 domains of GRIP-1 could not be expressed efficiently alone and had NMR spectra indicative of partial folding. However, the tandem domain expressed well and appeared highly folded by the same criteria [35]. Subsequent studies have shown that the two domains are intimately packed (with a different geometry than that observed in PSD-95 and syntenin tandem domains); the occluded peptide binding pocket of PDZ4 suggests that its primary function is not ligand binding, but rather, stabilization of the PDZ5 structure [36].
3 Analysis of PDZ Domain–Ligand Interactions with Mutagenesis and Synthetic Peptides 5
3 Analysis of PDZ Domain–Ligand Interactions with Mutagenesis and Synthetic Peptides
In vitro studies of PDZ domain–ligand interactions have proven invaluable for complementing and extending the knowledge acquired from structural analyses, and also for refining and even revising our views on physiologically relevant in vivo interactions. Studies with combinatorial peptide libraries, in particular, have been instrumental in defining the fine specificity of individual domains and revealing binding contributions from up to six C-terminal ligand residues. Early studies made use of synthetic peptide libraries [16] or a ‘peptides-on-plasmids’ library generated in Escherichia coli [24, 27–39], but more recently, C-terminal peptide libraries displayed on M13 [20] or lambda phage [40] have also been developed. Peptides fused to the C terminus of the D-capsid protein of lambda phage resulted in high valency display that enabled the selection of even very low-affinity ligands for the seven PDZ domains of the human INAD-like (INADL) protein [40]. Different degrees of consensus were observed, with different PDZ domains selecting ligands with homology at two, three, or four C-terminal residues. Unfortunately, the resulting low homology data, which may have been hampered by the high display levels of the peptides, were not as enlightening as the genetic/biochemical data derived from the homologous Drosophila protein INAD (see below). In contrast, low valency display achieved through C-terminal fusion to the M13 major coat protein yielded high-affinity peptide ligands that bound to MAGI-3 PDZ2 with affinities in the submicromolar range and showed a clear consensus in all four C-terminal positions [20]. C-terminal M13 phage display was also used to study the binding specificity of the Erbin PDZ domain, which had originally been isolated as a putative binding partner for ErbB-2 in a yeast two-hybrid screen (see below) [41]. Interestingly, the Erbin PDZ-binding consensus defined by phage display ([D/E][T/S]WVCOOH ) differed significantly from the C terminus of ErbB-2 (DVPVCOOH ) and instead matched the conserved C termini of three p120-related catenins (DSWVCOOH ). Subsequent in vitro affinity measurements with synthetic peptides showed that the Erbin PDZ domain binds to the catenin C-terminal sequence at least 100 times tighter than to the ErbB-2 C terminus. Although the yeast two-hybrid method is a powerful technology for discovering natural protein–protein interactions, experience with the Erbin PDZ domain shows that the method can be highly sensitive to even extremely low-affinity PDZ domain–ligand interactions and, thus, it is worthwhile corroborating yeast two-hybrid results with the results of other methods. In a subsequent study, NMR spectroscopy, phage display, and in vitro affinity measurements were combined to provide a detailed analysis of the structure and function of the Erbin PDZ domain [22]. The NMR co-complex with a phageoptimized peptide revealed five distinct binding sites on the protein, which accommodated the five C-terminal ligand residues (Figure 2). Binding assays with a panel of synthetic peptides showed that each of the five ligand sidechains contributes to binding, with the last two sidechains and the C-terminal carboxylate providing
6 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
Fig. 2 Summary of alanine-scanning mutagenesis data for Erbin PDZ domain binding to the phage-optimized peptide ligand. The solvent-accessible surface of Erbin residues that were mutated to alanine in a combinatorial phage library are colored according to their effect on ligand binding, with red, yellow, and blue indicating a large, moder-
ate, or no effect on binding, respectively. Likewise, the ligand sidechains are colored according to the effect that peptide alanine substitutions had on binding to wild-type Erbin PDZ domain, with red and yellow indicating a greater than 100-fold or greater than 10-fold effect on IC50 , respectively.
the majority of the binding energy. A combinatorial mutagenesis strategy was also used to assess the effects of alanine substitutions for 44 PDZ domain sidechains in and around the peptide-binding site. When the results of the peptide affinity and protein mutagenesis studies were mapped onto the NMR structure, they provided an extremely comprehensive view of the molecular elements involved in PDZ domain–ligand recognition (Figure 2). More traditional point-mutation studies have been used to understand the binding of α-syntrophin to both C-terminal peptides and nNOS, and also to investigate the impact of charge substitutions on ligand binding [42, 43]. In an interesting extension of such an approach, Ranganathan and colleagues looked at sites of pairwise covariation among several hundred PDZ domains [44]. This analysis identified energetically coupled pathways emanating from the peptide-binding groove to sites on the opposite side of the domain. Experimental validation of these pathways was afforded by ligand-binding measurements in double-mutant cycles; their presence raises the possibility of functionally relevant allosteric processes associated with ligand binding [44]. Finally, progress has also been made on the development of computational algorithms for predicting and engineering PDZ domain specificities [28]. Given the importance of PDZ domains in mediating numerous protein assemblies, attempts have been made to antagonize their interactions so as to elicit a biological response. Optimal peptide ligands identified from phage display
4 Molecular and Signaling Functions of PDZ Domains 7
libraries have been used to such ends, to block the interaction between Erbin and δ-catenin [21]; such reagents were able to induce distinct phenotypes in neuron outgrowth assays (K. Kosik, personal communication). Small peptides have also been used to disrupt the in vivo interaction between an NMDA receptor and a PDZ domain of PSD-95 [45]. Recently, organic mimics of C-terminal peptide ligands have been proposed that can bind in a selective manner to MAGI-3 PDZ2 and inhibit formation of complexes with PTEN peptides [46]. In a further recent development, halothane anesthetics have been found to bind to PSD-95 and PSD-93 and to inhibit interaction with NMDA receptor or nNOS. The binding was localized to the peptide-binding groove, explaining the mechanism of antagonism of PDZmediated protein assembly, and possibly also explaining the mechanism by which these anesthetics exert their physiological response [47].
4 Molecular and Signaling Functions of PDZ Domains
Although an ever-increasing diversity of PDZ-mediated interactions has emerged in the literature (for example, see [7]), we focus here on a few examples of interactions that have been validated by biochemical as well as more physiological methods, particularly genetics. As outlined in the structure section above, many of the interactions determined by biochemical techniques, such as the yeast two-hybrid and coprecipitation methods, are encumbered with numerous potential artifacts, such as very low, probably nonphysiological, binding affinities. Thus, although it is clear that many of the interactions predicted by these approaches are undoubtedly correct, there are likely to be many incorrectly described interactions as well. Luckily, there are a variety of physiologically well characterized PDZ-mediated interactions that allow for a number of interesting conclusions regarding the diverse functions of these motifs. 4.1 INAD as a Molecular Scaffold
Anyone who has attempted to swat a fly cannot help but marvel at the insect’s ability to escape certain death. This capability is in large part due to the rapid and efficient transduction of visual information. As in other organisms, vision in the fly D. melanogaster is initiated by the absorption of photons by the G protein-coupled receptor, rhodopsin. This results in a conformational change in the receptor, which induces the association with a Gq type of GTPase signaling protein. This interaction induces a modulation in downstream signaling molecules, including protein kinase C (PKC) and phospholipase C (PLC), which results in opening the transient receptor potential (TRP) and transient receptor potential-like (TRPL) calcium ion channels. The flow of ions through these channels is subsequently transmitted to the central nervous system, where the appropriate response to the initial visual input is engendered. Once the visual input signal is abolished, the system
8 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
undergoes a negative feedback response to intracellular calcium concentrations by down-regulating the activities of rhodopsin and phospholipase C [48]. It is not surprising that the efficiency of visual signal transduction as regulated by this large number of signaling proteins would be enhanced by placing the proteins very close to one another. Genetic studies of strains of Drosophila carrying mutations at the InaD locus suggested that this gene was critical for proper function of the light-gathering ommatidial cells in the eye [49–52]. Morphological analysis of TRP channel localization demonstrated that this critical visual component appears to be mis-localized in the eyes of flies with various InaD mutations [53]. Isolation of the gene and analysis of the protein encoded by the InaD locus demonstrated that this gene encodes a molecule containing five PDZ domains [49–52]. This result immediately suggested the possibility that the INAD protein was involved in assembly of components of the visual signal transduction system into a macromolecular structure that enhanced signaling efficiency by closely associating these components. Biochemical analysis has demonstrated that at least seven proteins bind to the INAD protein, with many of the proteins binding via an interaction between their C termini and the INAD PDZ domains [38, 54–59]. These results suggest that INAD functions as a scaffold to assemble the components of the fly visual signal transduction system into a ‘signalosome’ (Figure 3). Functional analyses suggest, as expected, that the INAD signalosome functions to enhance signaling efficiency, since mutations in the scaffolding protein result in significantly increased latency between visual stimulus and cellular response. Furthermore, this macromolecular complex appears to be also involved in termination of the visual response [62, 63]. In summary, INAD meets the expectations that multi-PDZ proteins can function as scaffolds to assemble macromolecular complexes that enhance signaling.
Fig. 3 The Drosophila visual signal transduction complex is assembled by the multiPDZ protein INAD (adapted from [48]). This figure illustrates that various cell-surface and intracellular proteins involved in signal transduction, including G proteins (Gq"), phospholipases (PLC), ion channels (TRP), are physically brought together by interactions between their C termini and the multi-
PDZ protein INAD. The complex is maintained in an appropriate subcellular location by interactions between the NINAC protein and the cytoskeleton (F-actin). In addition, PDZ–PDZ interactions appear to hold separate complexes together into a supercomplex, although this type of interaction is not mediated by C-terminal binding.
4 Molecular and Signaling Functions of PDZ Domains 9
4.2 LIN-7-Receptor Tyrosine Kinase Interactions and Subcellular Localization
Vulval development in C. elegans has provided an elegant system for analysis of the genetics and biochemistry of complex organ formation [62, 63]. Important early work in this system demonstrated that a homolog of the epidermal growth factor (EGF) receptor family, called LET-23, was critical for the formation of this intricate structure [64]. The activation of this receptor by LIN-3, an EGF-like molecule expressed by basolaterally localized gonadal anchor cells, suggested that the receptor was found on the basolateral surface of the vulval precursor cells, and in fact, this was found to be true [65]. The asymmetric basolateral orientation of this receptor was critical for completion of the vulval developmental program, since it allowed the receptor access to the adjacently produced activating factor. It appeared likely that the asymmetric localization of LET-23 was accomplished by a combination of both specific intracellular trafficking and retention mechanisms. An elegant analysis by the Kim laboratory soon provided strong evidence that the basolateral localization of LET-23 was accomplished by an interaction with a PDZ domain-containing protein, LIN-7 [66, 67]. This protein was found to be associated in a complex with two other PDZ domain-containing proteins, LIN-2 and LIN-10 [66, 68]. The assembly of this complex did not involve PDZ-binding interactions, a finding that suggested that the PDZ domains within the complex were free to interact with the C termini of other proteins. Importantly, it was clearly demonstrated that the C terminus of LET-23 was a type 1 PDZ-binding motif (TCLCOOH ), and it was established that this motif bound to the PDZ domain of LIN-7 [66–68]. Further genetic studies suggested that this interaction was critical for basolateral targeting of LET-23 and vulval development, providing the first molecular and genetic evidence for the role of PDZ-containing proteins in the subcellular targeting of signaling receptors in an epithelial cell [66–68] (Figure 4). Additional work suggested that the role of this interaction was to inhibit internalization of the bound receptor, consistent with the suggestion that the LIN-7-PDZ interaction was required for retention of the receptor at the basolateral surface of the polarized epithelial cell [66–68]. Because C. elegans LET-23 is a prototype of the EGF receptor family of higher organisms, it was important to establish the mechanisms by which these receptors, which include the oncogenically associated HER2-neu/ErbB-2 receptor, are asymmetrically maintained in polarized epithelial cells. An interesting initial study in this field was that of Borg and colleagues, who showed that Erbin, a member of the LAP family of PDZ-containing proteins, was involved in delivery and/or retention of the ErbB-2 receptor at the basolateral surface of epithelial cells [41]. However, as described above, further work demonstrated that the affinity of the ErbB-2-Erbin interaction was likely to be far too low to efficiently accomplish intracellular binding [21, 22]. In addition, several laboratories reported a high-affinity set of protein ligands for the Erbin PDZ domain, which included a variety of p120-type catenins [21, 69–71]. It thus appeared likely that Erbin was not involved in subcellular targeting of ErbB-2 [72], although initial data of Borg et al. did suggest that the ErbB-2 C
10 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
Fig. 4 The EGF-like receptor LET-23 is localized in the basolateral region of the epithelial cell by the LIN-2, -7, -10 complex of PDZ proteins. This figure illustrates the role of the multi-PDZ protein complex containing LIN-2, -7, and -10 in mediating the subcellular localization of the Caenorhabditis elegans EGF-like receptor LET-23. LET-23 is kept in the basolateral part of the vulval epithelial progenitor cell, where it binds to an EGF-like ligand and induces differentiation. LIN-7 binds to both the C terminus of LET-23 and potentially, by analogy with the mammalian system, to a site in the kinase domain and determines trafficking by the golgi complex to the basolateral region.
The PDZ complex is also involved in the inhibition of receptor-ligand endocytosis. In the absence of LIN-7, the default trafficking pathway for LET-23 is to the apical region of the cell. The figure also illustrates that Erbin, a PDZ protein previously thought to be involved in the subcellular localization of a LET-23-related receptor, HER2-neu (erb2), is now thought to interact with catenin-like proteins at the adherens junction (A. J.) of the epithelial cell (see Figure 5). Finally, the figure also suggests that a similar trafficking system is involved in the appropriate subcellular localization of other EGF-like receptors, including those in the EGFR family (i.e., erb2, EGFR, etc.).
terminus was critical for basolateral localization [41]. This conundrum was recently resolved in a report that clearly demonstrated that the mammalian homolog of the LIN-7 protein was involved in the basolateral targeting and retention of the ErbB2 protein [73]. Importantly, these investigators demonstrated that this targeting was accomplished by a bipartite signal in LIN-7, with an N-terminal domain involved in basolateral sorting and the PDZ domain involved in basolateral retention (Figure 4). Thus, the subcellular targeting of the EGF receptor tyrosine kinases, as well as, potentially, other receptor kinases, appears to involve PDZ domain interactions that are conserved from worms to humans [74]. Finally, it is likely that
4 Molecular and Signaling Functions of PDZ Domains 11
appropriate subcellular localization of other cell surface signaling molecules, such as for example, TGFα and the cystic fibrosis transmembrane regulator, is accomplished through interactions with PDZ-containing proteins such as GRASP and NHERF or CAP-70, respectively [75, 76]. 4.3 PDZ Domain Proteins and Epithelial Polarity Induction and Maintenance
The contrast between the aesthetically beautiful structure of the polarized epithelium and the grossly malformed appearance of the oncogenically transformed tissue could not be greater. In general, oncogenic transformation of epithelial cells is accompanied by a loss of normal polarization accompanied by increased proliferation, loss of the normal flat, single-cell epithelial morphology, and invasion of adjacent normal tissues by tumor cells [77]. Epithelial cells separate their polarized apical and basolateral regions by using complex junctional structures composed of adherens junctions and tight (septate) junctions [78]. The molecular mechanisms involved in formation of the polarized epithelium are complex and involve a multitude of intracellular and extracellular proteins, including a variety of signaling, scaffolding, and adhesion molecules. Notably, PDZ domain-containing molecules play major roles in the appropriate assembly and maintenance of the epithelium of all metazoan organisms [79–82]. It is not surprising that the powerful genetics of lower metazoan organisms, such as the fly and the nematode, have played a critical part in elucidating the roles of various PDZ domain-containing molecules in the assembly and maintenance of the epithelium [78]. Interest in PDZ proteins was stimulated early on, when it was found that mutations in two different molecules, discs large (DLG), a MAGUK-type PDZ protein, and Scribble, a LAP-type PDZ protein, resulted in the loss of epithelial integrity and a tumorigenic phenotype in Drosophila [83–87]. Furthermore, of the ∼50 known tumor suppressors in Drosophila, these two loci are the only places in which mutation leads to neoplasia in both imaginal discs and brain. These exciting results suggested that PDZ-containing molecules were likely to be involved in the assembly and/or maintenance of epithelial structures and that their loss correlated with a lack of morphological integrity as well as cellular hyper-proliferation and invasion, all hallmarks of oncogenically transformed cells. Importantly, further analysis revealed that not every cell with a PDZ protein mutation gave rise to a tumor, suggesting the need for additional oncogenic insults. Scribble and DLG both appear to be localized in the basolateral region of the epithelial cell. Another group of PDZ proteins, including the MAGUK protein Stardust (PALS-1), and the multi-PDZ protein Bazooka (Par-3) appear to be involved in formation of apical regions of the epithelial cell [88–90]. Elegant genetic and biochemical work from the Perrimon group has gone the farthest toward elucidating the roles of these various PDZ proteins in epithelial formation and maintenance [91]. These studies demonstrated that Bazooka is involved in the ‘apicalization’ of the cell, while the Scribble complex served to antagonize the Bazooka-induced formation of apical regions by repressing apicalization along the basolateral side of
12 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
the cell. These data were consistent with the subcellular localization of each of these proteins. Earlier work had demonstrated that the overexpression of a cell-surface protein called Crumbs increased apical formation, and a PDZ-mediated complex between Crumbs and the Stardust MAGUK counteracted the basolateral induction by Scribble [92–95]. Together, these data suggest the importance of PDZ domain proteins in the determination of this critical subcellular morphology, although the molecular mechanisms by which this control is mediated are still poorly understood (Figure 5). Although some of the PDZ interactions involved in this important signaling pathway have been elucidated, it will be fascinating to use newer methods, such as C-terminal phage display [20], to further dissect the molecular interactions
Fig. 5 The apicalbasolateral polarity of epithelial cells is in large part determined by a variety of PDZ proteins. Elegant genetic analyses in Drosophila and Caenorhabditis have demonstrated that a variety of PDZ-containing proteins, including Scribble (scrib), Discs Large (DLG), Lethal Giant Larvae (LGL), Erbin, and Bazooka are involved in the induction and maintenance of the apical-basal polarity of the epithelial cell. Scrib, LGL, and DLG are localized to the adherens junction, and their mutation leads to loss of apical–basal polarity and hyperproliferation that is tumor-like. Importantly, the oncogenic HPV viruses encode an E6 protein that is involved in mediating the degradation of these adherens junction components. Crumbs, an apical cell surface
protein that appears to induce apicalization, interacts with Bazooka via a PDZ-mediated binding event. Scrib induces a basolateral polarity, and the balance between scrib and the Bazooka–Crumbs complex appears to control apical-basal polarization. Erbin interacts via a PDZ-mediated interaction with p120-like catenins (i.e., *-catenin), and this interaction has been shown to regulate epithelial formation in C. elegans, possibly via an interaction with Ras-type GTPases. Finally, tight junction assembly and maintenance appears to be under the control of the ubiquitously expressed (in epithelia) ZO-1 and MAGI family of MAGUK-type PDZ proteins, which are also sensitive to E6-mediated degradation.
4 Molecular and Signaling Functions of PDZ Domains 13
induced by these proteins. In addition, a variety of other proteins, including the ZO-type MAGUKs as well as the MAGI family of proteins (for example, see [96]), are localized to the tight junctions of virtually all epithelial cells and are therefore likely to play an important role in epithelialization. Finally, although genetic evidence for a role for these PDZ proteins in human tumor formation is still lacking, recent data suggest that oncogenic human papilloma viruses (HPV) target a variety of PDZ proteins, including Scribble, DLG, and the MAGI proteins, for ubiquitinmediated proteasome degradation [97–101]. This targeting occurs via interactions between the C terminus of viral E6 protein, a ubiquitin ligase complex-associated protein, and various PDZ domains, suggesting that viruses can hijack cellular PDZ domains for their own nefarious purposes. A hallmark of HPV infection is loss of epithelial structure, hyperproliferation, and invasion again consistent with a role of these PDZ proteins in epithelial control. Although Scribble, a LAP family member, has clearly been shown to be critical for epithelial formation, other members of this family, including Erbin, are also likely to be important for epithelialization. The strongest genetic data for the function of Erbin comes from C. elegans, where the LET-413 protein, a likely homolog of mammalian Erbin, was clearly shown to be involved in epithelial formation [102, 103]. As described above, the probable binding partners for the Erbin PDZ domain are a group of p120-like catenins, including p0071, δ-catenin, and ARVCF, all of which contain a conserved C terminus (DSWVCOOH ), which was found to be a high-affinity Erbin PDZ-binding peptide [21, 22, 69–71]. Other important genetic data from C. elegans have recently been reported regarding JAC-1, the nematode homolog of the mammalian p120-related catenins. Mutation of this protein gives an epithelial phenotype that is reminiscent of the LET-413 mutations [104]. This result becomes even more interesting when viewed in the context of the work of Laura and colleagues, who predicted that the PDZ domain of LET-413 should bind with high affinity to the C terminus of JAC-1 (DSWVCOOH ) [21]. Together, these data suggest that a complex between LET-413 and JAC-1 is likely to be involved in epithelial regulation. Although the mechanism of this regulation remains to be fully elucidated, recent work by Huang and colleagues has demonstrated that the leucine-rich repeats of Erbin appear to be involved in the regulation of Ras GTPase activity [105, 106]. Epithelial integrity and intracellular communication are, in part, mediated by the cadherin family of adhesion molecules. Because Erbin is linked to this adhesion system via its PDZ-mediated catenin binding, the data of Huang et al. suggest a mechanism whereby epithelial adhesion might regulate Ras subcellular localization and activity [107]. 4.4 A Few Miscellaneous Examples: The Synapse, Disheveled, CARD MAGUKs, and Beta Adrenergic Receptors
In addition to the polarized epithelium, a second highly polarized cell type is the axon. In this cell type, the synapse corresponds to the highly polarized apical region of the epithelial cell, and it has been demonstrated that the synaptic and
14 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
post-synaptic regions of neurons contain a variety of signaling molecules, including neural transmitters and the many channels that they bind to. Importantly, the subcellular localization, signaling efficiency, and signalosome assembly of the various synaptic and post-synaptic signaling proteins appear to be accomplished by a variety of PDZ proteins [11, 108, 109]. Many potential PDZ-mediated interactions in neurons have been identified, but we discuss only two examples for which there is some genetic analysis of their function. By biochemical studies, a variety of excitatory receptors, including NMDA and glutamate receptors, have been found to be associated with PSD-95, one of the prototypical MAGUK-type PDZ proteins, and it has been assumed that their clustering and/or localization to the post-synaptic density requires an interaction with this MAGUK. Murine gene knockout studies of PSD-95, however, revealed that these receptors were correctly localized, but their signaling pathways with respect to synaptic plasticity seemed affected, although subtly [110]. Other ion channels, such as the potassium channel, also appear to interact via a C-terminal–PDZ domain interaction [11, 108, 109]. This interaction appeared, in cell culture assays, to be critical for the clustering and activity of these channels. However, recent murine knockout experiments have shown that the localization of these channels to the juxtaparanodal regions of myelinated axons does not require PSD-95 [111]. Thus, although PSD-95 is colocalized to these sites with the potassium channels, its presence is clearly not required for the localization of these channels. Together, these two examples highlight the importance of in vivo analysis in assessing the true function of PDZ-containing proteins. The signaling pathway regulated by the Wnt family of proteins has emerged as one of the most critical for numerous aspects of development, including the maintenance and expansion of stem cells. A critical component of this pathway is disheveled (Dsh/Dvl), a scaffolding protein containing a PDZ domain as well as a variety of other protein interaction motifs [112]. This protein is a critical component of the numerous pathways controlled by Wnt signaling, and mutations of the Dsh/Dvl gene result in various phenotypic changes in these pathways that mimic the loss of either Wnt protein or Wnt receptors [113–115]. Genetic analyses demonstrated that Dsh/Dvl was downstream from the Wnt/Wnt receptor but upstream from the β-catenin destruction complex, suggesting that Dsh/Dvl plays an important role in regulating this complex and its transcriptional activator, β-catenin. The role that the single Dsh/Dvl PDZ domain plays in regulating the Wnt pathway is controversial and appears to depend upon assay conditions, although much of this heterogeneity may be due to differences in the types of PDZ mutants that have been examined. Oddly, a bewildering array of proteins have been found to bind to the Dsh/Dvl PDZ domain [112]. Interestingly, these proteins appear to all have completely different C termini, suggesting that either this PDZ domain is unusually promiscuous or that most (or all) of these binding partners are artifactual. Recent data from our laboratory using the C-terminal phage-display method suggest that peptides that bind with high affinity to the Dsh/Dvl PDZ domain are quite different from the C-terminal sequences of these putative protein ligands (Zhang and Sidhu, unpublished observations). Thus, as with many other PDZ domain interactions reported in the literature, careful analysis of binding affinities is required
4 Molecular and Signaling Functions of PDZ Domains 15
to determine the validity of potential PDZ binding partners. Finally, a recently published study suggests that Dsh/Dvl is overexpressed in mesothelioma tumor cells, and this overexpression enhances β-catenin levels and cellular transformation [116]. As described above, the MAGUK-type PDZ domain-containing proteins are involved in the assembly of complexes that appear to be involved in polarization in both epithelial and neuronal cells. The CARD MAGUKs are a recently described family of proteins that contain MAGUK-type domains (including PDZ, SH3, and guanylate kinase domains) as well as a CARD (caspase recruitment domain) domain and a coiled-coil motif [117–121]. These proteins are encoded by multiple genes in mammals, and early data demonstrated that one CARMA-1 is expressed in cells of the immune system. Interesting early work demonstrated the function of CARMA-1 in the stabilization and activation of NFκB transcription factors, suggesting that this MAGUK may be involved in immune system function. This hypothesis was spectacularly supported by data from two groups who inactivated the CARMA-1 gene by using traditional knockout as well as a novel ENU-mediated point-mutation technique [122, 123]. Although some of the resultant phenotypes were different (likely due to the fact that one group introduced a point mutation in the coiled-coil domain and the other completely abolished expression), both types of mutant mice showed profound defects in immune system function, including loss of antigen receptor signaling in both B and T cells. Further analysis of the defects demonstrated that these mutant animals had defects in NFκB activation that appeared to be due to stabilization of the IκB inhibitor. Together, these data suggest that CARD MAGUKs are involved in antigen receptor signaling by controlling the stability of the inhibitor of NFκB. One possibility is that antigen receptor activation induces CARMA-1 to bring a protein degradation complex close to the IκB inhibitor. If the PDZ domain is critical to this event, inhibition of its binding activity by small-molecule antagonists may result in a decreased immune response, a potentially important result for diseases ranging from autoimmune syndromes to transplantation reactions. Heart failure is a major unmet medical problem with little or no effective treatment. The three beta-adrenergic receptors are G-coupled proteins that are regulated by the sympathetic nervous system, and it is likely that their modulation plays a role in the pathogenesis of heart failure [124]. Both beta 1 and 2 adrenergic receptors appear to be localized to specific subcellular localizations and are associated with downstream intracellular signaling molecules including, for example, G proteins, beta arrestin, and various kinases. This is highly reminiscent of other proteins that utilize PDZ domain-containing molecules to accomplish subcellular localization and signalosome assembly, and it has been shown that the C-terminal PDZ binding motifs of these two G-coupled proteins associate with PSD-95/MAGI-2 (beta 1) and the sodium–hydrogen exchange regulatory factor (NHERF) (beta 2) [125, 126]. Two major functional aspects of these interactions appear to be the control of subcellular localization and endocytosis as well as coupling to various G proteins [127, 128]. Interestingly, mutation of the PDZ binding site in the beta 2 receptor or blocking the interaction with a C-terminally derived peptide both result
16 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
in a change in intracellular coupling that is accompanied by increased contractility in response to ligand binding [129]. Increased contractility might be a useful response for heart failure patients, although other data suggest that modulation of intracellular coupling in this manner might also be detrimental to myocyte survival. In summary, these results provide for yet another interesting mechanism by which PDZ-containing scaffold or localization proteins play a critical role in cellular physiology and which might provide for a clinically useful application.
5 Conclusions
PDZ domains and the proteins they are found embedded in have emerged as major components of mechanisms for subcellular localization, signaling, and polarity induction. Although this chapter has not exhaustively discussed the huge number of reported PDZ interactions, we have attempted to discuss representative examples that have strong biological and/or genetic support. In addition, a number of examples exist, with more being presented all the time, which suggest that PDZ domains may be involved in a variety of pathogenic situations. For example, disruption of the interaction between the C terminus of an NMDA receptor and a PDZ domain of PSD-95 by a small peptide results in an inhibition of glutamate-mediated damage in an ischemic stroke model [45], suggesting that the detrimental effects of excitotoxicity may depend on PDZ-mediated receptor localization and/or signaling. The structural data discussed here suggest that the interface between the PDZ domain and its ligand(s) is likely to be among the smallest of protein–protein binding sites. Thus, although the dream of disrupting protein–protein interactions appears quite daunting for most targets, the possibility of identifying and enhancing the binding activity of small-molecule antagonists of PDZ-mediated interactions seems quite likely. It will be interesting in the future to identify such small-molecule antagonists in both model and disease-related systems and to examine their effects on PDZ-mediated biology. If these studies are successful, these inhibitors might be among the first drugs to target pathogenic protein interfaces. Acknowledgment
We thank David Wood for help with the figures.
References 1 CHO, K. O., HUNT, C. A., KENNEDY, M. B., The rat brain postsynaptic density fraction contains a homolog of the Drosophila discs-large tumor suppressor protein. Neuron 1992, 9, 929–942. 2 KENNEDY, M. B., Origin of PDZ (DHR,
GLGF) domains. Trends Biochem. Sci. 1995, 20, 350. 3 FANNING, A. S., ANDERSON, J. M., Protein modules as organizers of membrane structure. Curr. Opin. Cell Biol. 1999, 11, 432–439.
References 4 KORNAU, H. C., SEEBURG, P. H., KENNEDY, M. B., Interaction of ion channels and receptors with PDZ domain proteins. Curr. Opin. Neurobiol. 1997, 7, 368–373. 5 WOODS, D. F., BRYANT, P. J., Zo-1, DlgA and PSD95/SAP90: homologous proteins in tight, septate and synaptic cell junctions. Mech. Dev. 1993, 44, 889. 6 KIM, E., NIETHAMMER, M., ROTHSCHILD, A., JAN, Y. N., SHENG, M., Clustering of Shaker-type K+ channels by interaction with a family of membrane-associated guanylate kinases. Nature 1995, 378, 85–88. 7 NOURRY, C., GRANT, S. G. N., BORG, J. P., PDZ domain proteins: Plug and play! 2003, Sci. STKE 2003, re7. 8 ULLMER, C., SCHMUCK, K., FIGGE, A., LUBBERT, H., Cloning and characterization of MUPP1, a novel PDZ domain protein. FEBS Lett. 1998, 424, 63–68. 9 MORAIS CABRAL, J. H., et al., Crystal structure of a PDZ domain. Nature 1996, 382, 649–652. 10 HARRIS, B. Z., LIM, W. A., Mechanism and role of PDZ domains in signaling complex assembly. J. Cell Sci. 2001, 114, 3219–3231. 11 SHENG, M., SALA, C., PDZ domains and the organization of supramolecular complexes. Annu. Rev. Neurosci. 2001, 24, 1–29. 12 DANIELS, D. L., COHEN, A. R., ANDERSON, J. M., BRUNGER, A. T., Crystal structure of the hCASK PDZ domain reveals the structural basis of class II PDZ domain target recognition. Nat. Struct. Biol. 1998, 5, 317–325. 13 TOCHIO, H., HUNG, F., LI, M., BREDT, D. S., ZHANG, M., Solution structure and backbone dynamics of the second PDZ domain of postsynaptic density-95. J. Mol. Biol. 2000, 295, 225–237. 14 DOYLE, D. A., LEE, A., LEWIS, J., KIM, E., SHENG, M., MACKINNON, R., Crystal structures of a complexed and peptide-free membrane protein-binding domain: molecular basis of peptide recognition by PDZ. Cell 1996, 85, 1067–1076. 15 AASLAND, R., et al., Normalization of nomenclature for peptide motifs as
16
17
18
19
20
21
22
23
24
25
ligands of modular protein domains. FEBS Lett. 2002, 513, 141–144. SONGYANG, Z., et al., Recognition of unique carboxyl-terminal motifs by distinct PDZ domains. Science 1997, 275, 73–77. NIETHAMMER, M., KIM, E., SHENG, M., Interaction between the C-terminus of NMDA receptor subunits and multiple members of the PSD-95 family of membrane-associated guanylate kinases. J. Neurosci. 1996, 16, 2157–2163. MATSUMINE, A., et al., Binding of APC to the human homolog of the Drosophila discs large tumor suppressor protein. Science 1996, 272, 1020–1023. KORNAU, H. C., SCHENKER, L. T., KENNEDY, M. B., SEEBURG, P. H., Domain interaction between NMDA receptor subunits and the postsynaptic density protein PSD-95. Science 1995, 269, 1737–1740. FUH, G., PISABARRO, M. T., LI, Y., QUAN, C., LASKY, L. A., SIDHU, S. S., Analysis of PDZ domain–ligand interactions using carboxyl-terminal phage display. J. Biol. Chem. 2000, 275, 21486–21491. LAURA, R. P., et al., The Erbin PDZ domain binds with high affinity and specificity to the carboxyl termini of delta-catenin and ARVCF. J. Biol. Chem. 2002, 277, 12906–12914. SKELTON, N. J., et al., Origins of PDZ domain specificity: structure determination and mutagenesis of the Erbin PDZ domain. J. Biol. Chem. 2003, 278, 7645–7654. KARTHIKEYAN, S., LEUNG, T., LADIAS, J. A., Structural basis of the Na+/H+ exchanger regulatory factor PDZ1 interaction with the carboxyl-terminal region of the cystic fibrosis transmembrane conductance regulator. J. Biol. Chem. 2001, 276, 19683–19686. STRICKER, N. L., et al., PDZ domain of neuronal nitric oxide synthase recognizes novel C-terminal peptide sequences. Nat. Biotech. 1997, 15, 336–342. SCHULTZ, J., HOFFMULLER, U., KRAUSE, G., ASHURST, J., MACIAS, M. J., SCHMIEDER, P., SCHNEIDER-MERGENER, J., OSCHKINAT, H., Specific interactions
17
18 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
26
27
28
29
30
31
32
33
between the syntrophin PDZ domain and voltage-gated sodium channels. Nat. Struct. Biol. 1998, 5, 19–24. KOZLOV, G., BANVILLE, D., GEHRING, K., EKIEL, I., Solution structure of the PDZ2 domain from cytosolic human phosphatase hPTP1E complexed with a peptide reveals contribution of the β2–β3 loop to PDZ domain–ligand interactions. J. Mol. Biol. 2002, 320, 813–820. BIRRANE, G., CHUNG, J., LADIAS, J. A., Novel mode of binding by the Erbin PDZ domain. J. Biol. Chem. 2003, 278, 1399–1402. REINA, J., LACROIX, E., HOBSON, S. D., FERNANDEZ-BALLESTER, G., RYBIN, V., SCHWAB, M. S., SERRANO, L., GONZALEZ, C., Computer-aided design of a PDZ domain to recognize new target sequences. Nat. Struct. Biol. 2002, 9, 621–627. HILLIER, B. J., CHRISTOPHERSON, K. S., PREHODA, K. E., BREDT, D. S., LIM, W. A., Unexpected modes of PDZ domain scaffolding revealed by structure of nNOS–syntrophin complex. Science 1999, 284, 812–815. FENG, W., FAN, J.-S., JIANG, M., SHI, Y.-W., ZHANG, M., PDZ7 of glutamate receptor interacting protein binds to its target via a novel hydrophobic surface area. J. Biol. Chem. 2002, 277, 41140–41146. KANG, G., COOPER, D., DEVEDJIEV, Y., DEREWENDA, U., DEREWENDA, Z., Molecular roots of degenerate specificity in syntenin’s PDZ2 domain: reassessment of the PDZ recognition paradigm. Structure 2003, 11, 845–853. KANG, G., COOPER, D., JELEN, F., DEVEDJIEV, Y., DEREWENDA, U., DAUTER, Z., OTLEWSKI, J., DEREWENDA, Z. S., PDZ tandem of human syntenin: crystal structure and functional properties. Structure 2003, 11, 459–468. LONG, J.-F., TOCHIO, H., WANG, P., FAN, J.-S., SALA, C., NIETHAMMER, M., SHENG, M., ZHANG, M., Supramolecular structure and synergistic target binding of the N-terminal tandem PDZ domains of PSD-95. J. Mol. Biol. 2003, 327, 203–214.
34 IM, Y., PARK, S., RHO, S.-H., LEE, J. H., KANG, G., SHENG, M., KIM, E., EOM, S., Crystal structure of GRIP1 PDZ6-peptide complex reveals the structural basis for class II PDZ target recognition and PDZ domain-mediated multimerization. J. Biol. Chem. 2003, 278, 8501–8507. 35 ZHANG, Q., FAN, J.-S., ZHANG, M., Interdomain chaperoning between PDZ domains of GRIP. J. Biol. Chem. 2001, 276, 43216–43220. 36 FENG, W., SHI, Y., LI, M., ZHANG, M., Tandem PDZ repeats in glutamate receptor-interacting proteins have a novel mode of PDZ domain-mediated target binding. Nat. Struct. Biol. 2003, 10, 972–978. 37 STRICKER, N. L., SCHATZ, P., LI, M., Using the Lac repressor system to identify interacting proteins. Methods Enzymol. 1999, 303, 451–468. 38 VAN HUIZEN, R., et al., Two distantly positioned PDZ domains mediate multivalent INAD–phospholipase C interactions essential for G protein-coupled signaling. EMBO J. 1998, 17, 2285–2297. 39 WANG, S., RAAB, R. W., SCHATZ, P. J., GUGGINO, W. B., LI, M., Peptide binding consensus of the NHE-RF-PDZ1 domain matches the C-terminal sequence of cystic fibrosis transmembrane conductance regulator (CFTR). FEBS Lett. 1998, 427, 103–108. 40 VACCARO, P., BRANNETTI, B., MONTECCHI-PALAZZI, L., PHILIPP, S., CITTERICH, M. H., CESARENI, G., DENTE, L., Distinct binding specificity of the multiple PDZ domains of INADL, a human protein with homology to INAD from Drosophila melanogaster. J. Biol. Chem. 2001, 276, 42122–42130. 41 BORG, J. P., et al., ERBIN: a basolateral PDZ protein that interacts with the mammalian ERBB2/HER2 receptor. Nat. Cell Biol. 2000, 2, 407–414. 42 HARRIS, B. Z., HILLIER, B. J., LIM, W. A., Energetic determinants of internal motif recognition by PDZ domains. Biochemistry 2001, 40, 5921–5930. 43 HARRIS, B., LAU, F., FUJII, N., GUY, R., LIM, W. A., Role of electrostatic
References
44
45
46
47
48
49
50
51
52
53
interactions in PDZ domain ligand recognition. Biochemistry 2003, 42, 2797–2805. LOCKLESS, S. W., RANGANANTHAN, R., Evolutionarily conserved pathways of energetic connectivity in protein families. Science 1999, 286, 295–299. AARTS, M., et al., Treatment of ischemic brain damage by perturbing NMDA receptor–PSD-95 protein interactions. Science 2002, 298, 846–850. FUJII, N., HARESCO, J., NOVAK, K., SOKOE, D., KUNTZ, I., GUY, R., A selective irreversible inhibitor targeting a PDZ protein interaction domain. J. Am. Chem. Soc. 2003, 125, 12074–12075. FANG, M., et al., Synaptic PDZ domain-mediated protein interactions are disrupted by inhalation anesthetics. J. Biol. Chem. 2003, 278, 36669–36675. MONTELL, C., Visual transduction in Drosophila. Ann. Rev. Cell Dev. Biol. 1999, 15, 231–268. HUBER, A., SANDER, P., GOBERT, A., BAHNER, M., HERMANN, R., PAULSEN, R., The transient receptor potential protein (Trp), a putative store-operated Ca2+ channel essential for phosphoinositide-mediated photoreception, forms a signaling complex with NorpA, InaC and InaD. EMBO J. 1996, 15, 7036–7045. HUBER, A., SANDER, P., PAULSEN, R., Phosphorylation of the InaD gene product, a photoreceptor membrane protein required for recovery of visual excitation. J. Biol. Chem. 1996, 271, 11710–11717. SARAS, J., HELDIN, C. H., PDZ domains bind carboxy-terminal sequences of target proteins. Trends Biochem. Sci. 1996, 21, 455–458. SHIEH, B. H., NIEMEYER, B., A novel protein encoded by the InaD gene regulates recovery of visual transduction in Drosophila. Neuron 1995, 14, 201–210. CHEVESICH, J., KREUZ, A. J., MONTELL, C., Requirement for the PDZ domain protein, INAD, for localization of the TRP store-operated channel to a signaling complex. Neuron 1997, 18, 95–105.
54 HUBER, A., SANDER, P., BAHNER, M., PAULSEN, R., The TRP Ca2+ channel assembled in a signaling complex by the PDZ domain protein INAD is phosphorylated through the interaction with protein kinase C (ePKC). FEBS Lett. 1998, 425, 317–322. 55 SCOTT, K., ZUKER, C. S., Assembly of the Drosophila phototransduction cascade into a signalling complex shapes elementary responses. Nature 1998, 395, 805–808. 56 SHIEH, B. H., ZHU, M. Y., Regulation of the TRP Ca2+ channel by INAD in Drosophila photoreceptors. Neuron 1996, 16, 991–998. 57 SHIEH, B. H., ZHU, M. Y., LEE, J. K., KELLY, I. M., BAHIRAEI, F., Association of INAD with NORPA is essential for controlled activation and deactivation of Drosophila phototransduction in vivo. Proc. Natl. Acad. Sci. USA 1997, 94, 12682–12687. 58 TSUNODA, S., SIERRALTA, J., SUN, Y., BODNER, R., SUZUKI, E., BECKER, A., SOCOLICH, M., ZUKER, C. S., A multivalent PDZ-domain protein assembles signalling complexes in a G-protein– coupled cascade. Nature 1997, 388, 243–249. 59 XU, X. Z., CHOUDHURY, A., LI, X., MONTELL, C., Coordination of an array of signaling proteins through homo- and heteromeric interactions between PDZ domains and target proteins. J. Cell Biol. 1998, 142, 545–555. 60 WES, P. D., XU, X. Z., LI, H. S., CHIEN, F., DOBERSTEIN, S. K., MONTELL, C., Termination of phototransduction requires binding of the NINAC myosin III and the PDZ protein INAD. Nature Neuroscience 1999, 2, 447–453. 61 LI, H.-S., PORTER, J. A., MONTELL, C., Requirement for the NINAC kinase/myosin for stable termination of the visual cascade. J. Neurosci. 1998, 18, 9601–9606. 62 KORNFELD, K., Vulval development in Caenorhabditis elegans. Trends Genet. 1997, 13, 55–61. 63 EISENMANN, D. M., KIM, S. K., Signal transduction and cell fate specification during Caenorhabditis elegans vulval
19
20 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
64
65
66
67
68
69
70
71
72
development. Curr. Opin. Genet. Dev. 1994, 4, 508–516. AROIAN, R. V., KOGA, M., MENDEL, J. E., OHSHIMA, Y., STERNBERG, P. W., The let-23 gene necessary for Caenorhabditis elegans vulval induction encodes a tyrosine kinase of the EGF receptor subfamily. Nature 1990, 348, 693–699. HILL, R. J., STERNBERG, P. W., The gene lin-3 encodes an inductive signal for vulval development in C. elegans. Nature 1992, 358, 470–476. KAECH, S. M., WHITFIELD, C. W., KIM, S. K., The LIN-2/LIN-7/LIN-10 complex mediates basolateral membrane localization of the C. elegans EGF receptor LET-23 in vulval epithelial cells. Cell 1998, 94, 761–771. SIMSKE, J. S., KAECH, S. M., HARP, S. A., KIM, S. K., LET-23 receptor localization by the cell junction protein LIN-7 during C. elegans vulval induction. Cell 1996, 85, 195–204. HOSKINS, R., HAJNAL, A. F., HARP, S. A., KIM, S. K., The C. elegans vulval induction gene lin-2 encodes a member of the MAGUK family of cell junction proteins. Development 1996, 122, 97–111. JAULIN-BASTARD, F., et al., Interaction between Erbin and a catenin-related protein in epithelial cells. J. Biol. Chem. 2002, 277, 2869–2875. IZAWA, I., NISHIZAWA, M., TOMONO, Y., OHTAKARA, K., TAKAHASHI, T., INAGAKI, M., ERBIN associates with p0071, an armadillo protein, at cell–cell junctions of epithelial cells. Genes to Cells 2002, 7, 475–485. OHNO, H., HIRABAYASHI, S., IIZUKA, T., OHNISHI, H., FUJITA, T., HATA, Y., Localization of p0071-interacting proteins, plakophilin-related armadillo-repeat protein-interacting protein (PAPIN) and ERBIN, in epithelial cells. Oncogene 2002, 21, 7042–7049. DILLON, C., CREER, A., KERR, K., KUMIN, A., DICKSON, C., Basolateral targeting of ERBB2 is dependent on a novel bipartite juxtamembrane sorting signal but independent of the C-terminal ERBIN-binding domain. Mol. Cell. Biol. 2002, 22, 6553–6563.
73 SHELLY, M., MOSESSON, Y., CITRI, A., LAVI, S., ZWANG, Y., MELAMED-BOOK, N., AROETI, B., YARDEN, Y., Polar expression of ErbB-2/HER2 in epithelia: bimodal regulation by Lin-7. Developmental Cell 2003, 5, 475–486. 74 STRAIGHT, S. W., KARNAK, D., BORG, J. P., KAMBEROV, E., DARE, H., MARGOLIS, B., WADE, J. B., mLin-7 is localized to the basolateral surface of renal epithelia via its NH2 terminus. Am. J. Physiol. Renal Physiol. 2000, 278, F464–F475. 75 KUO, A., ZHONG, C., LANE, W. S., DERYNCK, R., Transmembrane transforming growth factor-alpha tethers to the PDZ domain-containing, golgi membrane-associated protein p59/GRASP55. EMBO J. 2000, 19, 6427–6439. 76 BEZPROZVANNY, I., MAXIMOV, A., PDZ domains: more than just a glue. Proc. Natl. Acad. Sci. USA 2001, 98, 787–789. 77 BISSELL, M. J., RADISKY, D., Putting tumours in context. Nat. Rev. Cancer 2001, 1, 46–54. 78 NAGAFUCHI, A. Molecular architecture of adherens junctions. Curr. Opin. Cell Biol. 2001, 13, 600–603. 79 BILDER, D., PDZ proteins and polarity: functions from the fly. Trends Genet. 2001, 17, 511–519. 80 HUMBERT, P., RUSSELL, S., RICHARDSON, H., Dlg, Scribble and Lgl in cell polarity, cell proliferation and cancer. BioEssays 2003, 25, 542–553. 81 WODARZ, A., Tumor supressors: Linking cell polarity and growth control. Curr. Biol. 2000, 10, R624–R626. 82 PEIFER, M., TEPASS, U., Which way is up? Nature 2000, 403, 611–612. 83 BILDER, D., PERRIMON, N., Localization of apical epithelial determinants by the basolateral PDZ protein Scribble. Nature 2000, 403, 676–680. 84 BILDER, D., LI, M., PERRIMON, N., Cooperative regulation of cell polarity and growth by Drosophila tumor suppressors. Science 2000, 289, 113–116. 85 WOODS, D. F., HOUGH, C., PEEL, D., CALLAINI, G., BRYANT, P. J., Dlg protein is required for junction structure, cell polarity, and proliferation control in
References
86
87
88
89
90
91
92
93
94
95
Drosophila epithelia. J. Cell Biol. 1996, 134, 1469–1482. WOODS, D. F., BRYANT, P. J., The discs-large tumor supressor gene of Drosophila encodes a guanylate kinase homolog localized at septate junctions. Cell 1991, 66, 451–464. JACOB, L., OPPER, M., BERNHARD, M., PHANNAVONG, B., MECHLER, B. M., Structure of the l(2)gl gene of Drosophila and delimination of its tumor suppressor domain. Cell 1987, 50, 215–225. PETRONCZKI, M., KNOBLICH, J. A., DmPAR-6 directs epithelial polarity and asymmetric cell division of neuroblasts in Drosophila. Nat. Cell Biol. 2001, 3, 43–49. KUCHINKE, U., GRAWE, F., KNUST, E., Control of spindle orientation in Drosophila by the Par-3–related PDZ-domain protein Bazooka. Curr. Biol. 1998, 8, 1357–1365. MULLER, H. A., WIESCHAUS, E., Armadillo, bazooka, and stardust are critical for early stages in formation of the zonula adherens and maintenance of the polarized blastoderm epithelium in Drosophila. J. Cell Biol. 1996, 134, 149–163. BILDER, D., SCHOBER, M., PERRIMON, N., Integrated activity of PDZ protein complexes regulates epithelial polarity. Nat. Cell Biol. 2003, 5, 53–58. HONG, Y., STRONACH, B., PERRIMON, N., JAN, L. Y., JAN, Y. N., Drosophila Stardust interacts with Crumbs to control polarity of epithelia but not neuroblasts. Nature 2001, 414, 634–638. BACHMANN, A., SCHNEIDER, M., THEILENBERG, E., GRAWE, F., KNUST, E., Drosophila Stardust is a partner of Crumbs in the control of epithelial cell polarity. Nature 2001, 414, 638–643. TEPASS, U., THERES, C., KNUST, E., Crumbs encodes an EGF-like protein expressed on apical membranes of Drosophila epithelial cells and required for organization of epithelia. Cell 1990, 61, 787–799. WODARZ, A., HINZ, U., ENGELBERT, M., KNUST, E., Expression of Crumbs confers apical character on plasma membrane
96
97
98
99
100
101
102
103
104
domains of ectodermal epithelia of Drosophila. Cell 1995, 82, 67–76. LAURA, R. P., ROSS, S., KOEPPEN, H., LASKY, L. A., MAGI-1: a widely expressed, alternatively spliced tight junction protein. Exp. Cell Res. 2002, 275, 155–170. NAKAGAWA, S., HUIBREGTSE, J. M., Human scribble (Vartul) is targeted for ubiquitin-mediated degradation by the high-risk papillomavirus E6 proteins and the E6AP ubiquitin-protein ligase. Mol. Cell. Biol. 2000, 20, 8244–8253. KIYONO, T., HIRAIWA, A., FUJITA, M., HAYASHI, Y., AKIYAMA, T., ISHIBASHI, M., Binding of high-risk human papillo-mavirus E6 oncoproteins to the human homologue of the Drosophila discs large tumor suppressor protein. Proc. Natl. Acad. Sci. USA 1997, 94, 11612–11616. GARDIOL, D., KUHNE, C., GLAUNSINGER, B., LEE, S. S., JAVIER, R., BANKS, L., Oncogenic human papillomavirus E6 proteins target the discs large tumour suppressor for proteasome-mediated degradation. Oncogene 1999, 18, 5487–5496. PIM, D., THOMAS, M., BANKS, L., Chimaeric HPV E6 proteins allow dissection of the proteolytic pathways regulating different E6 cellular target proteins. Oncogene 2002, 21, 8140–8148. MANTOVANI, F., BANKS, L., The human papillomavirus E6 protein and its contribution to malignant progression. Oncogene 2001, 20, 7874–7887. LEGOUIS, R., GANSMULLER, A., SOOKHAREEA, S., BOSHER, J. M., BAILLIE, D. L., LABOUESSE, M., LET-413 is a basolateral protein required for the assembly of adherens junctions in Caenorhabditis elegans. Nat. Cell Biol. 2000, 2, 415–422. MCMAHON, L., LEGOUIS, R., VONESCH, J.-L., LABOUESSE, M., Assembly of C. elegans apical junctions involves positioning and compaction by LET-413 and protein aggregation by the MAGUK protein DLG-1. J. Cell. Sci. 2001, 114, 2265–2277. PETTITT, J., COX, E. A., BROADBENT, I. D., FLETT, A., HARDIN, J., The Caenorhabditis
21
22 PDZ Domains: Intracellular Mediators of Carboxy-Terminal Protein Recognition and Scaffolding
105
106
107
108
109
110
111
112
113
114
115
116
elegans p120 catenin homologue, JAC-1, modulates cadherin-catenin function during epidermal morphogenesis. J. Cell Biol. 2003, 162, 15–22. HUANG, Y. Z., ZANG, M., XIONG, W. C., LUO, Z., MEI, L., Erbin suppresses the MAP kinase pathway. J. Biol. Chem. 2003, 278, 1108–1114. LI, W., HAN, M., GUAN, K.-L., The leucine-rich repeat protein SUR-8 enhances MAP kinase activation and forms a complex with Ras and Raf. Genes. Dev. 2000, 14, 895–900. KOLCH, W., Erbin: Sorting out ErbB2 receptors or giving Ras a break? Sci. STKE 2003, 2003, pe37. GARNER, C. C., NASH, J., HUGANIR, R. L., PDZ domains in synapse assembly and signalling. Trends Cell Biol. 2000, 10, 274–280. CRAVEN, S. E., BREDT, D. S., PDZ proteins organize synaptic signaling pathways. Cell 1998, 93, 495–498. MIGAUD, M., et al., Enhanced long-term potentiation and impaired learning in mice with mutant postsynaptic density-95 protein. Nature 1998, 396, 433–439. RASBAND, M. N., PARK, E. W., ZHEN, D., ARBUCKLE, M. I., POLIAK, S., PELES, E., GRANT, S. G., TRIMMER, J. S., Clustering of neuronal potassium channels is independent of their interaction with PSD-95. J. Cell Biol. 2002, 159, 663–672. WHARTON, K. A. Jr., Runnin’ with the Dvl: proteins that associate with Dsh/Dvl and their significance to Wnt signal transduction. Dev. Biol. 2003, 253, 1–17. SOKOL, S. Y., Analysis of Dishevelled signalling pathways during Xenopus development. Curr. Biol. 1996, 6, 1456–1467. YANAGAWA, S., VAN LEEUWEN, F., WODARZ, A., KLINGENSMITH, J., NUSSE, R., The dishevelled protein is modified by wingless signaling in Drosophila. Genes. Dev. 1995, 9, 1087–1097. KRASNOW, R. E., WONG, L. L., ADLER, P. N., Dishevelled is a component of the frizzled signaling pathway in Drosophila. Development 1995, 121, 4095–4102. UEMATSU, K., KANAZAWA, S., YOU, L., HE, B., XU, Z., LI, K., PETERLIN, B. M.,
117
118
119
120
121
122
123
124
125
126
MCCORMICK, F., JABLONS, D. M., Wnt pathway activation in mesothelioma: evidence of Dishevelled overexpression and transcriptional activity of beta-catenin. Cancer Res. 2003, 63, 4547–4551. POMERANTZ, J. L., DENNY, E. M., BALTIMORE, D., CARD11 mediates factor-specific activation of NF-kappaB by the T cell receptor complex. EMBO J. 2002, 21, 5184–5194. BERTIN, J., et al., CARD11 and CARD14 are novel caspase recruitment domain (CARD)/membrane-associated guanylate kinase (MAGUK) family members that interact with BCL10 and activate NF-kappa B. J. Biol. Chem. 2001, 276, 11877–11882. MCALLISTER-LUCAS, L. M., et al., Bimp1, a MAGUK family member linking protein kinase C activation to Bcl10-mediated NF-&κB induction. J. Biol. Chem. 2001, 276, 30589–30597. GAIDE, O., et al., CARMA1 is a critical lipid raft-associated regulator of TCR-induced NF-κB activation. Nat. Immunol. 2002, 3, 836–843. WANG, D., et al., A requirement for CARMA1 in TCR-induced NF-κB activation. Nat. Immunol. 2002, 3, 830–835. JUN, J. E., et al., Identifying the MAGUK protein Carma-1 as a central regulator of the humoral immune responses and atopy by genome-wide mouse mutagenesis. Immunity 2003, 18, 751–762. HARA, H., et al., The MAGUK family protein CARD11 is essential for lymphocyte activation. Immunity 2003, 18, 763–775. LEFKOWITZ, R. J., ROCKMAN, H. A., KOCH, W. J., Catecholamines, cardiac β-adrenergic receptors, and heart failure. Circulation 2000, 101, 1634–1637. HALL, R. A., et al., The beta2-adrenergic receptor interacts with the Na+/H+-exchanger regulatory factor to control Na+/H+ exchange. Nature 1998, 392, 626–630. CAO, T. T., DEACON, H. W., RECZEK, D., BRETSCHER, A., VON ZASTROW, M., A kinase-regulated PDZ-domain
References interaction controls endocytic sorting of the beta2-adrenergic receptor. Nature 1999, 401, 286–290. 127 XIANG, Y., DEVIC, E., KOBILKA, B., The PDZ binding motif of the beta 1 adrenergic receptor modulates receptor trafficking and signaling in cardiac myocytes. J. Biol. Chem. 2002, 277, 33783–33790. 128 CONG, M., PERRY, S. J., HU, L. A., HANSON, P. I., CLAING, A., LEFKOWITZ, R.
J., Binding of the β2 adrenergic receptor to N-ethylmaleimide– sensitive factor regulates receptor recycling. J. Biol. Chem. 2001, 276, 45145–45152. 129 XIANG, Y., KOBILKA, B., The PDZ-binding motif of the β2-adrenorecptor is essential for physiologic signaling and trafficking in cardiac myocytes. Proc. Natl. Acad. Sci. USA 2003, 100, 10776–10781.
23
1
PTB Domains Ben Margolis University of Michigan Medical School, Ann Arbor, USA
Linton M. Traub University of Pittsburgh School of Medicine, Pittsburgh, USA
Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2
1 Introduction
The phosphotyrosine binding (PTB) domain also known as the phosphotyrosine interaction domain (PID) was the second domain (after the SH2 domain) found to bind phosphotyrosine-containing peptides. Over time, scientists have come to realize that this domain does more than mediate phosphotyrosine-related signaling, because its binding is not always dependent on the presence of phosphotyrosine. The domain was first identified in the amino terminus of the Shc protein [1–3], where it was found to bind to an Asp-Pro-any amino acid-pTyr (NPxpoY) motif found in many activated growth factor receptors. Simultaneously, a binding domain on insulin receptor substrate 1 (IRS1) and insulin receptor substrate 2 (IRS2) was identified that also bound the NPxpoY motif [2, 4]. Although the NPxpoY binding regions in the IRS and Shc proteins do not have significant primary sequence homology, they have similar binding specificity and 3D structure. PTB domains with sequence homology to those found in IRS proteins are referred to as PTBI for PTB-domain-IRS-like to differentiate them from PTB domains that have sequence homology with the Shc PTB domain (Simple Modular Architecture Research Tool; http://smart.embl-heidelberg.de). A relatively small number of proteins have been identified that contain PTBI domains, and these primarily function in tyrosine kinase signal transduction. Many more PTB domains with sequence similarity to the Shc PTB domain have been identified [5], and they have more diverse cellular functions. Evolutionarily, PTB domains are not found in yeast or plants but appear in Drosophila and Caenorhabditis elegans genomes. This chapter discusses the
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 PTB Domains
function of proteins containing PTB and PTBI domains and examines the structural basis of their interactions with peptide and phospholipid ligands.
2 Function of PTB Domain Proteins
Proteins with PTB and PTBI domains are involved in multiple cellular functions. The following sections discuss their role in tyrosine kinase-dependent and -independent signaling, protein trafficking, and cell adhesion. A schematic representation of several PTB and PTBI domain proteins is displayed in Figure 1. 2.1 Role of PTB Domain Proteins in Tyrosine Kinase Signaling 2.1.1 Shc As described above, PTB domains are known to play an important role in tyrosine kinase signal transduction. This of course relates to the initial identification of the PTB domain in the tyrosine kinase substrate, ShcA, and has been reviewed previously [6]. The classical role of the Shc PTB domain is to allow Shc to bind to growth factor receptors, facilitating Shc phosphorylation [7, 8]. Once Shc is phosphorylated, it can bind to other downstream signal-transduction molecules. For example, the nerve growth factor receptor, TrkA, cannot bind efficiently to Grb2 and its binding partner, the Ras guanyl nucleotide exchange factor, Son of Sevenless (Sos). Grb2-Sos is the primary protein complex used by growth factor receptors to activate the Ras G protein. However, TrkA does have an NPxY motif that, once phosphorylated, can bind the PTB domain of Shc [8]. Once TrkA binds Shc, it mediates Shc tyrosine phosphorylation, allowing it to bind to the Grb2-Sos complex and activate Ras. Many growth factor receptors are known to utilize Shc and its paralogs as a crucial intermediary in signal transduction [9]. One of the better examples for the role of the PTB domain is found in Drosophila Shc [10]. In a genetic screen, N¨usslein-Volhard and coworkers identified Shc as a maternal gene important for embryonic development. They found that Shc signaled downstream of the Drosophila EGF and TOR tyrosine kinase receptors. Interestingly, one Shc mutant isolated had a mutation in the PTB domain and functioned similar to a null allele, confirming the crucial role of the PTB domain in Shc function. A second aspect to the function of the Shc PTB domain and other PTB domains is the ability to bind to phospholipids [11]. The PTB domain is highly related in structure to the pleckstrin homology (PH) domain, a domain known to bind to phospholipids [12]; indeed, the protein binding capabilities of the PTB domain may have evolved from its initial ability to bind phospholipids as a PH domain [13]. The lipid binding region of the PTB domain is distinct from the peptide binding region, indicating that the PTB domain can bind both lipids and proteins simultaneously. This can assist the targeting of Shc to the membrane while binding to tyrosinephosphorylated receptors. It has been shown that specific mutations within the
2 Function of PTB Domain Proteins
Fig. 1 Domain architecture of several representative PTB domain-containing proteins and other modular domains within the PTB and PTBI protein families. Many of the other modular domains are discussed in other chapters in this volume. More details on those not covered in this book can be found at one of the numerous domain databases such as SMART (http://smart.embl-heidelberg.de). PTB = phosphotyrosine binding domain;
PTBI = phosphotyrosine binding domain IRS-like; SH2 = Src homology 2 domain; WW = domain with two conserved tryptophans; PDZ = postsynaptic density 95/discs large/zona occludens-1 domain; SH3 = Src homology 3 domain; RGS = regulator of G protein signaling domain, RBD = Raf-like Ras-binding domain; Go Loco = G"I/O -Loco motif; SAM : sterile alpha motif; JBD = JNK binding domain; PH = pleckstrin homology domain.
PTB domain that disrupt phospholipid but not phosphotyrosine binding impair Shc tyrosine phosphorylation by growth factor receptors [11]. A unique aspect to Shc is that it also contains an SH2 domain in addition to a PTB domain (Figure 1). This organization is conserved throughout evolution, suggesting some important role for the SH2 domain. Yet there are very few examples of the SH2 domain
3
4 PTB Domains
rather than the PTB domain coupling Shc to tyrosine phosphorylated proteins. This suggests that the Shc SH2 domain may have an alternative role in Shc function, but the exact nature of this is unclear [9]. 2.1.2 Proteins with PTBI Domains Proteins with PTBI domains play a prominent role in signaling by receptor tyrosine kinases. As discussed previously, PTBI domains have a 3D structure similar to that of PTB domains, but there is limited primary sequence similarity between these domains. The classic family of proteins having this domain is the insulin receptor substrates (IRS). It had been known for several years that the insulin receptor had an NPxY motif centered on Tyr960 that was phosphorylated. It was also known that this motif was necessary for insulin-mediated signal transduction as well as phosphorylation of a 185-kDa substrate protein [14]. This substrate protein was later identified as IRS-1 [15] and shown to have a domain that can interact with the insulin receptor [16]. A similar domain was also identified in the related IRS-2 protein [4], and this domain is now referred to as the PTBI domain. The IRS family of proteins functions as adapters similar to Shc. However, although both Shc PTB and the IRS PTBI domains bind NPxpoY motifs, there are differences in the sequences surrounding the NPxpoY motif that are required for high-affinity binding [17]. IRS proteins contain multiple tyrosine-phosphorylation sites and can mediate multiple downstream signaling pathways. A complete discussion of IRS signaling is beyond the scope of this review, but it has been described in other publications [18]. However, a few points specifically related to the PTBI domains of these proteins can be noted. One of the surprises is that the role of PTBI in signaling by IRS proteins has been difficult to demonstrate. The amino terminus of IRS proteins has one PTBI domain that sits just carboxy-terminal to the PH domain (Figure 1). As mentioned previously, a primary function of PH domains is to bind phospholipid headgroups. In IRS-1, the PH domain is necessary for IRS tyrosine phosphorylation after insulin stimulation [19]. The PTB domain seems to be essential for high-affinity interactions between IRS-1 and insulin receptor, but the actual necessity for this interaction in insulin signaling has been difficult to demonstrate [19]. The picture becomes more confusing when one considers that the PTB domain of IRS proteins can bind phospholipids [20] and that the PH domain of IRS-1 also has protein binding partners [21]. The most reasonable explanation for the function of these domains comes from structural studies, which suggest that cooperation between PTB and PH domains mediates a high-affinity interaction between insulin receptor and IRS-1 [22]. Many of the studies using mutagenesis of these domains relied on overexpression and thus may miss the cooperative nature of domain interactions when the proteins are expressed at physiologic levels. Two other families of proteins with PTBI domains are also important in growth factor receptor signal transduction. One family is referred to as the fibroblast growth factor receptor substrate (FRS) family [23]. These proteins are also called Sucassociated neurotrophic factor-induced tyrosine-phosphorylated target (SNT). This protein family was first identified by its binding to p13suc1 agarose, an affinity reagent that binds to certain cyclin-dependent kinases [24]. This suggested that
2 Function of PTB Domain Proteins
this tyrosine-phosphorylated substrate of fibroblast growth factor (FGF) and Trk receptors might directly control the cell cycle, but later purification indicated it was a large scaffold protein that bound to growth factor receptors at the cell surface [23]. Proteins in this family have a single PTBI domain at their amino terminus that mediates interactions with FGF and Trk receptors, as well as other growth factor receptors [23, 25, 26]. Like Shc and IRS proteins, the FRS proteins become tyrosine-phosphorylated after association with growth factor receptors and then recruit signaling molecules onto the tyrosine phosphorylation sites [27, 28]. The binding specificity of the FRS PTBI domain is unique. On TrkA it binds to phosphorylated Y490, a classic NPxpoY site, and can compete with Shc for binding to this site [29]. In contrast, the binding to FGF receptor appears to be constitutive and not dependent on receptor phosphorylation [25, 27]. The peptide from the juxtamembrane domain of the FGF receptor binds to the FRS PTBI domain in a fashion completely different from that seen with the NPxpoY motif from the TrkA receptor [30] and represents a novel peptide-PTB domain interaction. One trend seen in previously discussed PTB domain proteins is also seen with the FRS family of proteins: both Shc and IRS proteins have the ability to bind membranes and protein peptide motifs either with their PTB domain alone or with the PTB domain in combination with another domain. The FRS proteins address this problem, not with an additional domain, but by adding a myristate group at the amino terminus that links them to membranes [23]. The other family of PTBI domain proteins is the DOK (downstream of kinase) proteins (Figure 1). The first DOK family member was identified as a 62-kDa tyrosine-phosphorylated protein that binds to the p120 RasGAP protein and contains an amino-terminal PH and PTB domain [31, 32]. Since then, an additional four members of the DOK family have been identified [33–40]. The PTB domain of DOK proteins can bind to the canonical NPxpoY motifs and to highly related motifs present in EGF receptor [41] and other tyrosine-phosphorylated proteins [38, 42]. However, there also appears to be unique phosphotyrosine-containing motifs that can bind these PTB domains, such as those found in Tie1 receptor [43]. As in IRS-1, there appears to be a need for cooperation between the PH and PTB domains to mediate the growth factor-induced tyrosine phosphorylation of DOK proteins [41]. However, in contrast to IRS and FRS family members, the primary role of DOK proteins focuses on the inhibition of growth-promoting signals. This is particularly evident for DOK signaling in cells of the hematopoietic lineage, where DOK family members inhibit signaling from tyrosine kinases to the MAP kinase-Erk pathway [36, 44]. However, there have also been reports of positive roles for DOK proteins in signal transduction pathways [38] and cell migration [45]. 2.1.3 Additional PTB Domain Proteins Involved in Tyrosine Kinase Signaling Other proteins with PTB domains appear to play a role in tyrosine kinase signaling, although the role of the PTB domain in these processes is not as clear. The EPS8 protein was first identified as a protein that became tyrosine-phosphorylated after EGF stimulation [46]. EPS8 appears to bind to the EGF receptor, but surprisingly, the PTB domain does not appear to be involved in this process [47]. EPS8 protein
5
6 PTB Domains
also contains an SH3 domain (Figure 1). An important insight came when it was determined that EPS8 can localize to sites of actin polymerization [48]. When EPS8 is knocked out in mice, the mice appear to develop normally, but there are defects in actin polymerization in response to growth factor receptors in knockout fibroblasts [49]. Based on several studies, it was concluded that EPS8 is in a complex with several proteins, including Sos, and mediates Rac activation after growth factor receptor stimulation in a PI-3 kinase-dependent fashion [50, 51]. EPS8 may also play a role in the endocytosis of growth factor receptors [52]. Another PTB domain known to be phosphorylated after EGF stimulation is Odin [53]. Odin also has SAM domains as well as multiple ankyrin repeats (Figure 1). It appears in the protein database in multiple isoforms with and without ankyrin repeats. The function of the PTB domain in this protein is unclear, because its removal does not affect Odin tyrosine phosphorylation, and it is not certain that this protein even associates with growth factor receptors [53]. A highly related gene called EB-1 was identified as a gene that is induced in hematopoietic cells transformed by the E2A-PBX oncoprotein, but the function of EB-1, as well as of Odin, is unknown [54]. Regulator of G protein signaling 12 (RGS12) is another signaling molecule that contains a PTB domain and is involved in tyrosine kinase signaling (Figure 1). RGS proteins function as GTPase activating proteins for the α subunits of heterotrimeric G proteins [55]. In RGS12, it has been reported that the PTB domain of this protein can bind to tyrosine-phosphorylated calcium channels [56]. The binding of the PTB domain to the calcium channel may modulate calcium channel activity as well as regulate the heterotrimeric G proteins that control the channel. 2.2 PTB Domain Proteins That Function Independent of Phosphotyrosine
The above signaling pathways involve the binding of PTB and PTBI domains to tyrosine-phosphorylated proteins. Indeed, these were the early signaling pathways that defined the function of PTB and PTBI domains. However, soon after the identification of this domain, it was found that PTB domains can also bind NPxY motifs independent of tyrosine phosphorylation [57]. This led to an expanded appreciation of the role of PTB domains in signaling, cell adhesion, and protein processing and trafficking. 2.2.1 PTB Domain Proteins That Bind APP The realization that PTB domains can bind nonphosphorylated NPxY motifs came with studies of amyloid precursor protein (APP). APP is of great interest because it is processed via proteases to amyloid beta, a protein deposited in the brains of Alzheimer disease patients. Evidence obtained over the past several years has strongly pointed to the deposition of amyloid beta as an important causative factor in Alzheimer’s disease [58]. Despite the known role of APP processing in Alzheimer’s disease, the physiological role of APP and highly related proteins is unclear.
2 Function of PTB Domain Proteins
However, the binding of PTB domains to APP has provided unique insights into the normal function of this protein. Initial studies pointed to the binding of two PTB domain proteins, FE65 and X11/Mint, to the NPxY motif of APP [57, 59, 60]. Later, several other PTB domain proteins, including Disabled (Dab), JIP-1, and Numb, were identified as proteins that can bind APP [61–66]. There are also reports of APP becoming tyrosinephosphorylated and binding Shc via an NPxpoY motif, but the physiologic relevance of this interaction is uncertain [67, 68]. A large number of studies have examined the effect of PTB domain proteins on APP processing to amyloid beta [64, 69– 72]. A weakness of all these studies is that they employ overexpression of the PTB domain proteins, and thus it is not clear that these proteins physiologically regulate APP processing normally in the brain. Nonetheless, a large amount of data has been obtained on the effects of X11/Mint proteins on the processing of APP. X11/Mint proteins have one PTB domain and two PDZ domains (Figure 1; see also Chapter 3). X11/Mint protein overexpression slows APP processing and reduces amyloid beta production, both in tissue culture and in vivo [69, 70, 73]. The mechanism of X11/Mint action in this regard is unclear; however, studies on the C. elegans homolog of X11/Mint proteins, known as Lin-10, are instructive [74, 75]. Lin-10 plays a role in proper trafficking of proteins in worm epithelia and brain, suggesting that X11/Mint might have a similar role in mammalian cells. The localization of X11/Mint in the Golgi fits with such a role for this protein [76, 77]. In addition, X11/Mint proteins also interact with Munc-18, a protein that modulates SNARE interactions and thus possibly vesicle fusion [78]. Munc-18 interaction also modulates the effects of X11/Mint on APP processing, adding to the hypothesis that altered trafficking of APP is at the core of X11 actions on APP processing [79, 80]. The processing of APP is known to occur in discrete intracellular compartments, and rerouting of APP in the cell by X11/Mint proteins could alter amyloid beta production [81]. A more exciting story has emerged on the interaction of the FE65 family of proteins with APP. FE65 family members have two PTB domains and an aminoterminal WW domain (Figure 1). The effects of FE65 on APP processing have also been studied, and it appears that FE65 usually increases amyloid beta production, but the effects may be variable depending on the situation [71, 82]. FE65 also interacts with another transmembrane protein called lipoprotein receptor-related protein (LRP), a member of the lipoprotein receptor family [83]. The amino-terminal PTB domain of FE65 interacts with LRP while the carboxy-terminal PTB binds to APP, allowing FE65 to act as a bridge linking LRP to APP [83, 84]. This is of significance, because a large body of literature suggests that LRP can control APP processing, and FE65 is an important link between these two proteins [85, 86]. Further studies revealed that FE65 is a nuclear protein and that its localization to the nucleus is restricted by binding to APP [87]. A breakthrough study by Cao and Sudhof [88] then demonstrated that processing of APP leads to translocation of FE65 to the nucleus bound to the cleaved intracellular domain of APP. It was demonstrated that this complex of FE65 bound to the intracellular domain of APP can affect gene transcription by binding to the Tip60 protein, a histone
7
8 PTB Domains
acetyltransferase [88]. This pathway can be modified by the interaction of APP with X11/Mint [89] and by the interaction of FE65 with LRP [90]. The concept of FE65 and APP as a unit that can control transcriptional activity has been confirmed by several other groups [91, 92]. These findings suggest a signal-transduction pathway in which the processing of APP and translocation of its intracellular domain to the nucleus leads to control of gene expression. Further work will be necessary to define what mechanisms are utilized to control the processing of APP and the nature of the genes induced. Other roles have also been invoked for FE65, including the control of actin bundling and cell migration [93], but these findings still await confirmation. APP has also been reported to interact with the PTB-domain proteins JNK interacting protein-1 (JIP-1) and JNK interacting protein-2 (JIP-2). JIPs were first identified as scaffolding proteins that can regulate the MAP kinase pathway that leads to JNK activation [94, 95]. There are several theories as to how JIP might control JNK activity, and it may have both inhibitory and activating effects on JNK activation [94–96]. JIP-1 and JIP-2 contain a PTB domain that can interact with APP [63, 64] and a carboxy terminus that can interact with the light chain of conventional kinesin, kinesin-1 [97]. In this fashion, APP, a transmembrane protein, can connect transport vesicles to a kinesin motor and assist vesicle movement along microtubules. Thus, another physiological role for APP, in addition to control of transcription, may involve vesicle trafficking along microtubules in neurons. Indeed, several reports have linked APP to kinesins, both directly and indirectly [98, 99]. The exact role of JIPs, and more specifically, the associated MAP pathway kinases in control of kinesin transport is an area of active investigation. It is interesting to note that another APP binding partner, X11/MINT, has also been implicated as binding to kinesins [100], indicating that APP may have a general role in coupling transport vesicles to kinesin motors. Thus, in conclusion, several PTB-containing proteins interact with the NPxY motifs of APP and give important clues to the physiological role of APP and its paralogs in transcription control and vesicle trafficking. The studies may also provide important additional clues to the pathogenesis of Alzheimer’s disease. 2.2.2 PTB Domain Proteins That Bind Integrins Another family of transmembrane proteins known to contain NPxY motifs are beta integrins, which serve to link the intracellular actin cytoskeleton with the extracellular matrix [101]. NPxY motifs are found in multiple copies in several of the beta integrins’ cytoplasmic tails [102, 103]. Recent studies have shown that several PTB-domain proteins can interact with integrins [103]. However, the best data are found for the interaction of integrin NPxY motifs with two proteins, integrins’ cytoplasmic domain associated protein-1 (ICAP1) [104, 105] and talin [106]. ICAP1 was first identified by yeast two-hybrid screening using the intracellular domain of beta integrins as bait [104, 105]. ICAP1α is a small phosphoprotein of 200 amino acids (Figure 1), with a C-terminal PTB domain and an N terminus that is the site of threonine phosphorylation [104, 107]. Several roles have been ascribed to ICAP1. One set of studies found that ICAP1 can function as a guanyl dissociation
2 Function of PTB Domain Proteins
inhibitor, sequestering the small G proteins, Cdc42 and Rac, in the cytoplasm and leading to altered actin dynamics [108]. In this regard it has been demonstrated that modulation of small G protein function can regulate ICAP phosphorylation [104]. ICAP may also alter cell adhesion and cell spreading by modulating the interactions of integrins with other members of the actin cytoskeleton [109]. One of the members of the actin cytoskeleton whose binding to integrins may be affected by ICAP1 is talin. Talin is a large protein (> 2500 amino acids) that interacts with integrins via its FERM domain [110]. FERM domains were originally identified in members of the Band 4.1 family of actin-binding proteins [111]. The FERM domain of talin interacts with the NPxY motifs in beta integrins [106]. It is of great interest that a crystal structure of this interaction reveals that this FERM domain interaction is very similar to PTB domain interactions and that a region of the FERM domain has a structure very similar to that of PTB and PH domains [112, 113]. Thus, some FERM-domain proteins may have interactions that mimic PTB domains and may greatly broaden the proteins that can interact with NPxY motifs. 2.2.3 PTB Domain Proteins That Control Endocytosis Nearly a decade before the discovery of the PTB domain, the Goldstein and Brown laboratory [114] showed that a single point mutation (the JD mutation) in the gene encoding the low density lipoprotein (LDL) receptor can prevent receptor internalization. The LDL receptor governs the internalization of plasma LDL particles, primarily in the liver, and targets internalized LDL for lysosomal degradation. The JD mutation substitutes a tyrosine for a cysteine residue within the cytosolic domain of the receptor [114], causing stagnation of the receptor on the cell surface and leading to hypercholesterolemia and coronary artery disease. More complete mapping revealed that an FxNPxY internalization sequence is required to drive the efficient endocytic uptake of the LDL receptor and that this sequence is both autonomous and transplantable [102]. Most other members of the LDL receptor superfamily contain at least one FxNPxY motif within the cytosolic portion but, because the PTB domain was originally presumed to be pTyr-specific, a link between PTB domain proteins and endocytosis from the cell surface was not immediately appreciated. It is now known that several PTB domain proteins, including Disabled-1 (Dab1), Disabled-2 (Dab2), FE65, and JIP-1/2, can bind to these nonphosphorylated FxNPxY sequences directly [61, 83, 115–117]. The PTB domain–endocytosis connection became apparex nt when Dab2 was found to colocalize with clathrin at internalization sites on the cell surface [117]. The C-terminal region of Dab2, following the PTB domain, is essentially unstructured and contains multiple interaction motifs that bind physically to clathrin, the clathrin-associated AP-2 adaptor complex, and several other endocytic components [117–119]. Clathrin and AP-2 are the two principal components of a sorting coat that assembles on a class of endocytic transport vesicles termed clathrin-coated vesicles [120]. Like the signaling PTB-domain scaffolding proteins, Dab2 appears to mesh LDL receptor cargo with assembling coated vesicles by simultaneously binding to an NPxY internalization signal, PtdIns(4,5)P2 , and the clathrin coat machinery. In so doing, Dab2 expands the cargo selective capacity of the major
9
10 PTB Domains
sorting adaptor AP-2. Indeed, overexpression of a tandem Dab2 PTB domain prevents uptake of LDL but not of transferrin, because the transferrin receptor uses a distinct internalization sequence that binds directly to the AP-2 adaptor [118]. In mice, targeted disruption of the Dab2 gene is lethal, but conditional disruption only in the embryo leads to viable animals with a proteinuria similar to, albeit milder than, that seen in megalin nullizygous mice [121, 122]. A member of the LDL receptor superfamily, megalin is a scavenger receptor that plays a vital role in the recovery of protein and vitamins from the renal glomerular filtrate [122]. Although there is evidence for Dab2 participating in signal transduction [121, 123–125], the trafficking activity appears phylogenetically conserved. Ce-Dab-1, the single Disabled ortholog in C. elegans, acts together with two LDL receptor-related proteins (Ce-LRP-1 and Ce-LRP-2) in the proper biosynthetic delivery of Egl17, a fibroblast growth factor [126]. The recent identification of a gene responsible for a rare form of hypercholesterolemia characterized by a recessive pattern of inheritance has fortified the connection between PTB domains and endocytosis. Autosomal recessive hypercholesterolemia patients have a clinical phenotype remarkably similar to that of familial hypercholesterolemia patients, but have no mutations in the LDL receptor. Instead, the defective gene encodes a 308-amino-acid adaptor protein, termed autosomal recessive hypercholesterolemia (ARH), with an N-terminal PTB domain related to the Dab1/2 and Numb PTB domains (Figure 1) [127, 128]. ARH patient lymphoblasts exhibit defective LDL uptake despite normal levels of LDL receptor, but retroviralmediated expression of exogenous ARH rescues LDL receptor activity in these cells [128]. The hypercholesterolemic phenotype of ARH−/− mice fed a high-fat diet and the prominent accumulation of the LDL receptor on the sinusoidal surface of hepatocytes in these animals also support a role for ARH in regulating the endocytic activity of the LDL receptor [129]. The ARH PTB domain interacts with FxNPxY sequences while the C-terminal region physically binds to AP-2 and clathrin [130–132]. In Xenopus oocytes, production of mutant ARH that cannot engage the endocytic machinery severely blunts the uptake of vitellogenin [132], a lipoprotein endocytosed by the vitellogenin receptor, a member of the LDL receptor superfamily with an FxNPxY internalization signal. Because ARH patients do not exhibit defects in LDL uptake in all tissues [133, 134] and the Dab2−/− phenotype is rather mild [121], it is possible that Dab2 and ARH are functionally redundant to some extent. The function of Numb is perhaps best understood in terms of the development of external sensory organs in Drosophila. Sensory bristles on the fly head are composed of four different cell types that all originate from a single sensory organ precursor (SOP) cell by asymmetric cell division. This results in the formation of two nonequivalent daughter cells with different cell fates. Numb is a modular PTB domain protein (Figure 1) that antagonizes signaling by the transmembrane receptor Notch [135]. The bristle phenotype of notch mutants is similar to that seen on Numb overexpression [136], and there is some evidence for Numb binding directly to Notch [137, 138]. During SOP cell mitosis, Numb partitions asymmetrically, repressing Notch activity in the cell preferentially enriched in
3 PTB Domain Structure
Numb, which is instrumental in establishing the different identities of the daughter cells. The PTB domain of Numb is necessary and sufficient for membrane association and asymmetric localization [139]. As in Dab2, the C-terminal region of Numb is likely disordered, and there are binary associations between Numb and the AP-2 adaptor [136, 140]. In fact, certain AP-2 (α-adaptin) mutant alleles phenocopy numb mutations, and Numb is necessary to drive asymmetric partitioning of AP-2 in the mitotic SOP cell [136]. Epistasis analysis reveals that AP-2 acts between Numb and Notch, leading to a model in which Numb-dependent endocytosis and degradation of Notch underlies the different levels of Notch signaling that cause distinct cell fates. The asymmetric distribution of Numb in mitotic cells may be controlled by NIP (Numb-interacting protein), a transmembrane protein that binds the PTB domain via two redundant NPxF-type sequences [141]. That the PTB domain is important in asymmetric cell division is also demonstrated by the gene dosage-dependent effect of overexpression of Numb-associated kinase (Nak), which engages the PTB domain via an FSNMSF sequence [142]. A significant complication of the Notch down-regulation model is that it has not been convincingly shown that the levels of Notch differ significantly between the two daughter cells. In a modified model, Numb may regulate Notch activity by binding to the multispanning transmembrane protein Sanpodo and promoting its internalization [143]. Endocytosis of Sanpodo prevents the protein from localizing to the plasma membrane where it can bind and assist Notch in signal transduction [143]. It is not yet known whether endocytic uptake of Sanpodo requires the Numb PTB domain, but the YTNPAF sequence at the N terminus of Sanpodo is highly suggestive. Irrespective, it seems clear that Numb, ARH, and Dab2 are globally similar adaptors that use the PTB domain to select designated cargo while synchronously interacting with the endocytic machinery.
3 PTB Domain Structure
The high-resolution atomic structures of the PTB domain from eight proteins (Shc [13], IRS-1 [144, 145], Numb [146], X11 [147], SNT-1 [30], Disabled-1 (Dab1) [148, 149], Disabled-2 (Dab2) [149], and Dok1 [150]) currently available reveal that, despite low primary sequence identity and conservation, all adopt a common basic fold. The canonical PTB domain core is composed of seven central β strands, folded into two antiparallel β sheets oriented nearly orthogonally to one another and capped by an α helix at one end of the β sandwich [151]. Typically, one sheet is composed of strands β1–β4 and the other of β5–β7, although β1 can be long and arched, allowing this strand to contribute to the β5–β7 sheet as well (Figures 2 and 3). As mentioned previously, the overall topological arrangement is homologous to that of a phosphoinositide interaction module, the PH domain [12], making the PTB domain a member of the PH superfold. Indeed, the core Cα atoms of the IRS-1 PTB domain and the dynamin, phospholipase C-δ1, and spectrin PH domains superimpose with an rms deviation of ∼1.0 Å [144].
11
12 PTB Domains
Fig. 2 Schematic representation of the structure of the Shc PTB domain bound to a TrkA NPxpoY phosphopeptide. Ribbon diagram with " helices colored green, $ strands cyan, and connecting loops gray. The TrkA phosphopeptide (HIIENPQpoYFSDA, gold) is shown in stick representation, with nitrogen atoms colored blue, oxygen red, and phosphorus magenta. The position of PTB domain sidechains involved in coordinating the pTyr 0 (pY0)
residue of the peptide are also shown in stick representation and colored according to the secondary structural elements from which they emanate. The locations of the NPxpoY peptide residues N-3, P-2, and pY0 are indicated in gold italic type. Notation of phosphotyrosine at position , indicated by pY0, should not be confused with the Seefeld convention for phosphotyrosine, poY.
In the PTB domain, the common flanking amphipathic C-terminal helix, along with the abutting β5 strand, play a central role in binding the NPx(po)Y peptide ligand. These two structural elements, in part, generate an elongated groove in the PTB domain that accounts for several of the generic features of PTB-NPx(po)Y engagement. In most cases, the ligand residues proximal to the Asn −3 residue are extended, participating in a β augmentation via backbone hydrogen bonds. This melds an extra antiparallel strand derived from the binding partner into one sheet of the β sandwich, between β5 and the C-terminal helix. The sidechain amide of the NPxY Asn −3 hydrogen bonds to the PTB domain, and the carbonyl oxygen is involved in the formation of the characteristic type I β turn. An intramolecular hydrogen bond to the mainchain amide of the −1 residue, and the propensity of Pro for tight turn formation, reorients the NPxY peptide backbone roughly 90◦ , bringing the Tyr into proximity with the loops connecting the β4–β5 and β6–β7 strands of the sandwich and stabilizing electrostatic interactions (Figures 2 and 3). Thus, the ligand is positioned in an L-shaped conformation and the generally
3 PTB Domain Structure
Fig. 3 Ribbon diagram illustrating the 3D structure of a ternary complex of the Dab1 PTB domain, an APP-derived NPxY peptide, and inositol 1,4,5-trisphosphate (Ins(1,4,5)P3 ). Helices are green, $ strands cyan, and connecting loops gray. The positions of PTB domain sidechains involved in accommodating the Tyr-5 (Y-5) residue of the YxNPxY peptide and in coordination of the Ins(1,4,5)P3 phosphates are shown
in stick representation colored according to the secondary structural elements from which they emanate. The APP peptide (NGYENPTYK, gold) and Ins(1,4,5)P3 are shown in stick form with nitrogen atoms blue, oxygen red, and phosphorus magenta. The locations of the NPxY peptide residues Y-5, N-3, P-2, and Y0 are indicated in gold italic type.
conserved mode of NPxY sequence recognition and engagement is illustrated by the extremely similar backbone conformations of a nonphosphorylated NGYNPTY peptide, derived from APP, in the X11 PTB and the ApoER2-derived FNFDNPVY peptide in the Dab1 PTB domain [148]. 3.1 Broad Binding Specificity
As discussed above, PTB domains are capable of binding a variety of NPx(po)Y-type sequences, discriminating between tyrosine-phosphorylated and nonphosphorylated versions of this motif, as well as engaging sequences unrelated to the NPxY consensus. Flexibility in recognition comes in part from specializations and variations in the basic PTB domain architecture in the form of additional strands and helices. A common addition, which differentiates the PTB from the PTBI domain, is the insertion of a β1 strand and following α helix between β1 and β2 (found in Shc, Numb, X11, Dab1, and Dab2). The insertion extends considerably the loop between the β1 and β2 strands and typically results in a second helix positioned over the β5/6/7 sheet, although the precise orientation differs among the PTB domains (Figures 2 and 3). In X11, residues from the N-terminal segment
13
14 PTB Domains
of the α1 helix make hydrophobic and electrostatic contributions to interactions with the N-terminal -5 and -8 residues of the bound, nonphosphorylated APP NPxY peptide [147]. The X11-APP interaction is also unusual because the NPxY tight turn is followed immediately by a short 310 helix that orients two aromatic sidechains (Phe +2 and Phe +3) for optimal placement in a hydrophobic pocket created by sidechains from the longer C-terminal region of the capping C-terminal ot2 helix [147]. Both these interactions contribute to the binding energy, because removal of the −7 and −8 residues of the APP peptide or substitution of Phe +2 or +3 for Ala, substantially diminish the affinity of the peptide for the X11 PTB domain [147]. Thus, X11 uses extended contacts to bind a nonphosphorylated NPxY sequence with high affinity (Kd ≈ 300 nM) [147]. A nonphosphorylatable Numb PTB domain ligand, GFSNMSFEDFP derived from the Numb binding partner Numb-associated kinase (Nak), uses a somewhat similar mechanism to complex with Numb [152]. 3.2 Diverse Modes of Engagement
As alluded to above, the FRS2/SNT1 PTBI domain binds either NPxpoY in TrkA receptor or a completely unrelated sequence in FGF receptor. NMR analysis of FRS2 complexed with a 22-residue peptide (HSQMAVHKLAKSIPLRRQVTVS) corresponding to a segment of the human FGF receptor uncovers a novel extended mode of ligand engagement [30] for this canonical PTBI domain scaffold. Despite the absence of the NPxY motif, the C-terminal portion of the FGF receptor peptide (QVTVS) forms the usual additional antiparallel β strand, but between β5 and β8, an extra strand unique to FRS2/SNT1 [30] that folds back over the capping C-terminal helix and assumes some of the function of the lower flanking a1 helix. Turning the peptide backbone nearly 90◦ , the proximal part of the FGF receptor ligand peptide extends over the face of the β5/6/7 sheet and turns again, embedding the N-terminal MAVH sequence in a hydrophobic depression created by sidechains projecting off the three loops joining β1–β2, β3–β4, and β6–β7. Together, this wraps the FGF receptor peptide around the PTB domain, employing extensive hydrophobic contacts all along the interaction surface as well as electrostatic interactions mainly at the turns [30]. Interestingly, deletion of β8 abolishes FGF receptor binding without disrupting the binding of NPxpoY-bearing neurotrophin receptors such as TrkA. The thermodynamics governing FGF receptor and neurotrophin peptide engagement are different [151], yet FRS2/SNT-1 does use the same general surface between β5 and a1/β8 to bind both, because FGF receptor and TrkB peptides compete with each other for PTB binding [30]. The high selectivity of FRS2 for phosphotyrosine is most probably due to two arginine residues structurally equivalent to the two sidechains (Arg212 and Arg227) in IRS1 that compensate for the negative charge of the phosphate [144, 145]. In FRS2, Arg63, located at the beginning of β5, and Arg78, projecting off the loop connecting β6 to β7, likely coordinate the phosphotyrosine, because mutation of FRS2 Arg63 or Arg78 disrupts TrkB binding without affecting FGF receptor interactions [30]. Similarly positioned basic sidechains for phosphotyrosine coordination are also
3 PTB Domain Structure
found in Shc (Arg67, Lys169, Arg175; Figure 2), which likewise selects NPxpoY ligands over NPxY [13, 153, 154], and in the Dok1 PTB domain [150]. Altogether, these structures provide a molecular explanation for phosphotyrosine-specific binding as well as for NPxY-unrelated sequence engagement. Another aspect of PTB domain versatility has been uncovered by NMR studies on the Shc PTB domain. Compared to the NPxpoY-liganded structure, the free Shc PTB domain is relatively unfolded: 25 N-terminal and 16 C-terminal residues of the domain are not ordered in the unliganded domain [155]. In addition, the vital β7 strand (equivalent to the canonical PTB domain β5 strand) is unstructured, and the α1 and α3 helices are partly unwound and separated, severely dismembering the NPxpoY engagement groove (Figure 2). Local disorder in the absence of bound ligand also splays the two β sheets apart, altering the location of important sidechains required for proper engagement of the phosphopeptide ligand [155]. This information suggests either an induced-fit type of ligand-dependent folding or equilibrium between the disordered and folded states, with ligand binding to and stabilizing only the folded conformation, or some degree of both [155]. It is not yet clear whether partial unfolding also occurs in other unliganded PTB domains. Although the Shc, IRS-1, Dok1, and FRS2 PTB domains preferentially bind phosphorylated partners, the Dab1 and Dab2 PTB domains have 10–100-fold higher selectivity for a nonphosphorylated NPxY peptide [61, 117]. And although X11 is Tyr-neutral, in that both Ala and pTyr are accommodated at the Tyr (0) position, these modifications to an APP peptide ligand severely inhibit binding to the Dab1 PTB domain [61]. The crystal structures of the Dab1 and Dab2 PTB domains nicely explain how the NPxY motif is selected over NPxpoY sequences in a phylogenetically distinct class of PTB-domain proteins including Numb, ARH [127] and the apoptosis protein CED-6/GULP [156]. An NPxY peptide derived from either ApoER2 [148] or APP [149] adopts the typical antiparallel β strand/tight turn conformation on the Dab1 PTB domain, and the orientation of the APP peptide upon the Dab1 or Dab2 PTB domain is very similar [149]. Strikingly, there is a lack of surface-exposed arginine residues in the immediate vicinity of the Tyr (0) hydroxyl group and, instead, a hydrogen bond between the Tyr sidechain and the mainchain carbonyl oxygen of Gly131 in Dab1, or Gly139 in Dab2, occurs [148, 149]. The β6 strand in Dab1 and 2 is also longer than in X11 and terminates closer to Tyr (0) and, together with the short loop joining β4–β5 and the tight turn connecting β6–β7, confines the aromatic portion of Tyr (0) with nearby PTB sidechains making hydrophobic contacts [148, 149] (Figure 3). Consequently, steric clashes would prevent the larger pTyr from engaging appropriately, allowing these PTB domains to effectively discriminate between NPxY and NPxpoY. Loop length and/or mobility in this region might allow some PTB domains to accommodate either form of tyrosine. In the X11-APP structure, the β6–β7 connecting loop is mobile and lacks good density [147]. This flexibility would allow more space to house the bulkier pTyr. Different PTB domains bind NPx(po)Y sequences with clear specificity; the Dok1 PTB domain, for example, does not bind NPx(po)Y peptides derived from IL-4 receptor or from TrkA, despite binding the related sequence WIENKLpoYGM derived from RET [150]. Additional ligand specificity/selectivity comes from
15
16 PTB Domains
peripheral alterations to the surface topology of the major interaction site. The marked preference for bulky hydrophobic sidechains at the −5 (Shc, X11, Dab1/2) or −6 position (Dok1) of the ligand reflects hydrophobic pockets appropriately positioned to accommodate these residues. For example, in Dab1 (and Dab2) Phe-5 packs into a hydrophobic site generated by Ile116, Ile151, Phe158, and the aliphatic portion of Arg155 [148, 149] (Figure 3). The surface of the Dok1 PTB domain is optimized for a sequence containing Leu at −1 [150]. All together, these features explain how different PTB domains preferentially recognize distinct peptide sequences despite an overall similar mode of engagement. 3.3 Phospholipid Binding
As previously discussed, the PTB domains of Shc [11], Dab1 [61], and Dab2 [118] also bind the acidic phospholipid phosphatidylinositol 4,5-bisphosphate (PtdIns(4,5)P2 ) noncompetitively with the NPxY ligand. The phosphoinositide interaction surface in the Disabled proteins is located at the opposite end of the second β sheet, allowing the crystallization of ternary PTB-NPxY-PtdIns(4,5)P2 /Ins(1,4,5)P3 complexes [148, 149]. The lipid-binding site is spatially related to the phosphoinositide-binding surface in the PH domain, where the negatively charged headgroup is coordinated by basic sidechains positioned to optimize hydrogen bonding to the appropriate phosphatidylinositol phosphate form [12]. Similarly, Lys, Arg, and His residues (Lys45, Arg76, His81, Lys82, Arg124, and Lys142; Figure 3) cluster to generate a basic surface patch on the Dab1 and 2 PTB domains involved in coordinating the phosphate groups [148, 149]. This coherent positively charged interaction surface is not found at the equivalent position in either X11 or Shc [148], but is likely to occur in Numb and ARH. Indeed, the PTB domains from both these proteins bind phosphoinositides in vitro [131, 157].
4 Conclusions
In summary, the PTB domain utilizes the basic PH-domain architectural scaffold to create a topologically conserved NPx(po)Y ligand binding cleft absent from the phosphoinositide binding PH domain. The peripheral mode of pTyr engagement, utilizing surface-exposed sidechains not absolutely conserved in primary sequence, is fundamentally different from that of the SH2 domain, which bears no structural relationship to the PTB domain. The mode of NPxY binding has allowed the evolution of different PTB domains specifically selecting either NPxpoY or NPxY, or displaying no bias for pTyr over Tyr, making the original PTB designation now a misnomer for this interaction module. Due to this ability to bind to diverse ligands, PTB-domain proteins are involved in multiple cellular functions. As described in Section 2, major categories of activity include tyrosine kinase signaling, cell adhesion, and protein trafficking. However,
References
there is still much to be learned about the functions of these proteins and their possible role in diseases. Already, the identification of the ARH protein has yielded important insights into cholesterol trafficking, and the multiple APP binding proteins are likely to lead to new discoveries in the pathogenesis of Alzheimer’s disease. Additionally, there are many PTB domains that we know very little about, such as Odin. Future work will focus on delineating the function of these proteins, and no one will be surprised if unique types of interactions are described for their PTB domains. Overall, the PTB domain is a good example of how studies based on the simple analysis of protein binding can evolve so as to yield many important discoveries in unrelated fields of study.
References 1 BLAIKIE, P., IMMANUEL, D., WU, J., LI, N., YAJNIK, V., MARGOLIS, B., A region in Shc distinct from the SH2 domain can bind tyrosine phosphorylated growth factor receptors. J. Biol. Chem. 1994, 269, 32031–32034. 2 GUSTAFSON, T. A., HE, W., CRAPARO, A., SCHAUB, C. D., O’NEILL, T. J., Phospho-tyrosine-dependent interaction of Shc and IRS-1 with the NPEY motif of the insulin receptor via a novel (non-SH2) domain. Mol. Cell. Biol. 1995, 15, 2500–2508. 3 KAVANAUGH, W. M., WILLIAMS, L. T., An alternative to SH2 domains for binding tyrosine-phosphorylated proteins. Science 1994, 266, 1862–1865. 4 SUN, X. J., et al., Role of IRS-2 in insulin and cytokine signalling. Nature 1995, 377, 173–177. 5 BORK, P., MARGOLIS, B., A phosphotyrosine interaction domain. Cell 1995, 80, 693–694. 6 BORG, J. P., MARGOLIS, B., Function of PTB domains. Curr. Top. Microbiol. Immunol. 1998, 228, 23–38. 7 BATZER, A. G., BLAIKIE, P., NELSON, K., SCHLESSINGER, J., MARGOLIS, B., The phosphotyrosine interaction domain of Shc binds an LXNPXY motif on the epidermal growth factor receptor. Mol. Cell. Biol. 1995, 15, 4403–4409. 8 DIKIC, I., BATZER, A. G., BLAIKIE, P., OBERMEIER, A., ULLRICH, A., SCHLESSINGER, J., MARGOLIS, B., Shc binding to nerve growth factor receptor is mediated by the phosphotyrosine
9
10
11
12
13
14
15
interaction domain. J. Biol. Chem. 1995, 270, 15125–15129. RAVICHANDRAN, K. S., Signaling via Shc family adapter proteins. Oncogene 2001, 20, 6322–6330. LUSCHNIG, S., KRAUSS, J., BOHMANN, K., DESJEUX, I., NUSSLEIN-VOLHARD, C., The Drosophila SHC adaptor protein is required for signaling by a subset of receptor tyrosine kinases. Mol. Cell 2000, 5, 231–241. RAVICHANDRAN, K. S., ZHOU, M. M., PRATT, J. C., HARLAN, J. E., WALK, S. F., FESIK, S. W., BURAKOFF, S. J., Evidence for a requirement for both phospholipid and phosphotyrosine binding via the Shc phosphotyrosine-binding domain in vivo. Mol. Cell. Biol. 1997, 17, 5540– 5549. LEMMON, M. A., Phosphoinositide recognition domains. Traffic 2003, 4, 201–213. ZHOU, M. M., et al., Structure and ligand recognition of the phosphotyrosine binding domain of Shc. Nature 1995, 378, 584–592. WHITE, M. F., LIVINGSTON, J. N., BACKER, J. M., LAURIS, V., DULL, T. J., ULLRICH, A., KAHN, C. R., Mutation of the insulin receptor at tyrosine 960 inhibits signal transmission but does not affect its tyrosine kinase activity. Cell 1988, 54, 641–649. SUN, X. J., et al., Structure of the insulin receptor substrate IRS-1 defines a unique signal transduction protein. Nature 1991, 352, 73–77.
17
18 PTB Domains
16 HE, W., O’NEILL, T. J., GUSTAFSON, T. A., Distinct modes of interaction of SHC and insulin receptor substrate-1 with the insulin receptor NPEY region via non-SH2 domains. J. Biol. Chem. 1995, 270, 23258–23262. 17 WOLF, G., et al., PTB domains of IRS-1 and Shc have distinct but overlapping binding specificities. J. Biol. Chem. 1995, 270, 27407–27410. 18 WHITE, M. F., IRS proteins and the common path to diabetes. Am. J. Physiol. Endocrinol. Metab. 2002, 283, E413–22. 19 YENUSH, L., MAKATI, K. J., SMITHHALL, J., ISHIBASHI, O., MYERS, M. G., WHITE, M. F., The pleckstrin homology domain is the principle link between insulin receptor and IRS-1. J. Biol. Chem. 1996, 271, 24300–24306. 20 TAKEUCHI, H., et al., PTB domain of insulin receptor substrate-1 binds inositol compounds. Biochem. J. 1998, 334, 211–218. 21 FARHANG-FALLAH, J., RANDHAWA, V. K., NIMNUAL, A., KLIP, A., BAR-SAGI, D., ROZAKIS-ADCOCK, M., The pleckstrin homology (PH) domain-interacting protein couples the insulin receptor substrate 1 PH domain to insulin signaling pathways leading to mitogenesis and GLUT4 translocation. Mol. Cell. Biol. 2002, 22, 7325–7336. 22 DHE-PAGANON, S., OTTINGER, E. A., NOLTE, R. T., ECK, M. J., SHOELSON, S. E., Crystal structure of the pleckstrin homology-phosphotyrosine binding (PH-PTB) targeting region of insulin receptor substrate 1. Proc. Natl. Acad. Sci. USA 1999, 96, 8378–8383. 23 KOUHARA, H., HADARI, Y. R., SPIVAK-KROIZMAN, T., SCHILLING, J., BAR-SAGI, D., LAX, I., SCHLESSINGER, J., A lipid-anchored Grb2-binding protein that links FGF-receptor activation to the Ras/MAPK signaling pathway. Cell 1997, 89, 693–702. 24 RABIN, S. J., CLEGHON, V., KAPLAN, D. R., SNT, a differentiation-specific target of neurotrophic factor-induced tyrosine kinase activity in neurons and PC12 cells. Mol. Cell. Biol. 1993, 13, 2203–2213. 25 XU, H., LEE, K. W., GOLDFARB, M., Novel recognition motif on fibroblast growth
26
27
28
29
30
31
32
33
factor receptor mediates direct association and activation of SNT adapter proteins. J. Biol. Chem. 1998, 273, 17987–17990. KUROKAWA, K., IWASHITA, T., MURAKAMI, H., HAYASHI, H., KAWAI, K., TAKAHASHI, M., Identification of SNT/FRS2 docking site on RET receptor tyrosine kinase and its role for signal transduction. Oncogene 2001, 20, 1929–1938. ONG, S. H., GUY, G. R., HADARI, Y. R., LAKS, S., GOTOH, N., SCHLESSINGER, J., LAX, I., FRS2 proteins recruit intracellular signaling pathways by binding to diverse targets on fibroblast growth factor and nerve growth factor receptors. Mol. Cell. Biol. 2000, 20, 979–989. HADARI, Y. R., GOTOH, N., KOUHARA, H., LAX, I., SCHLESSINGER, J., Critical role for the docking-protein FRS2 alpha in FGF receptor-mediated signal transduction pathways. Proc. Natl. Acad. Sci. USA 2001, 98, 8578–8583. MEAKIN, S. O., MACDONALD, J. I., GRYZ, E. A., KUBU, C. J., VERDI, J. M., The signaling adapter FRS-2 competes with Shc for binding to the nerve growth factor receptor TrkA. A model for discriminating proliferation and differentiation. J. Biol. Chem. 1999, 274, 9861–9870. DHALLUIN, C., et al., Structural basis of SNT PTB domain interactions with distinct neurotrophic receptors. Mol. Cell 2000, 6, 921–929. YAMANASHI, Y., BALTIMORE, D., Identification of the Abl- and rasGAP-associated 62-kDa protein as a docking protein, Dok. Cell 1997, 88, 205–211. CARPINO, N., WISNIEWSKI, D., STRIFE, A., MARSHAK, D., KOBAYASHI, R., STILLMAN, B., CLARKSON, B., p62(dok): a constitutively tyrosine-phosphorylated, GAP-associated protein in chronic myelogenous leukemia progenitor cells. Cell 1997, 88, 197–204. JONES, N., DUMONT, D. J., The Tek/Tie2 receptor signals through a novel Dok-related docking protein, Dok-R. Oncogene 1998, 17, 1097– 1108.
References 34 NELMS, K., SNOW, A. L., HU-LI, J., PAUL, W. E., FRIP, a hematopoietic cell-specific rasGAP-interacting protein phosphorylated in response to cytokine stimulation. Immunity 1998, 9, 13–24. 35 DI CRISTOFANO, A., et al., Molecular cloning and characterization of p56dok-2 defines a new family of RasGAP-binding proteins. J. Biol. Chem. 1998, 273, 4827–4830. 36 CONG, F., YUAN, B., GOFF, S. P., Characterization of a novel member of the DOK family that binds and modulates Abl signaling. Mol. Cell. Biol. 1999, 19, 8314–8325. 37 LEMAY, S., DAVIDSON, D., LATOUR, S., VEILLETTE, A., Dok-3, a novel adapter molecule involved in the negative regulation of immunoreceptor signaling. Mol. Cell. Biol. 2000, 20, 2743–2754. 38 GRIMM, J., SACHS, M., BRITSCH, S., DI CESARE, S., SCHWARZ-ROMOND, T., ALITALO, K., BIRCHMEIER, W., Novel p62dok family members, dok-4 and dok-5, are substrates of the c-Ret receptor tyrosine kinase and mediate neuronal differentiation. J. Cell Biol. 2001, 154, 345–354. 39 CAI, D., DHE-PAGANON, S., MELENDEZ, P. A., LEE, J., SHOELSON, S. E., Tw o new substrates in insulin signaling, IRS5/DOK4 and IRS6/DOK5. J. Biol. Chem. 2003, 278, 25323–25330. 40 FAVRE, C., GERARD, A., CLAUZIER, E., PONTAROTTI, P., OLIVE, D., NUNES, J. A., DOK4 and DOK5: new Dok-related genes expressed in human T cells. Genes Immunol. 2003, 4, 40–45. 41 JONES, N., DUMONT, D. J., Recruitment of Dok-R to the EGF receptor through its PTB domain is required for attenuation of Erk MAP kinase activation. Curr. Biol. 1999, 9, 1057–1060. 42 SONGYANG, Z., YAMANASHI, Y., LIU, D., BALTIMORE, D., Domain-dependent function of the rasGAP-binding protein p62Dok in cell signaling. J. Biol. Chem. 2001, 276, 2459–2465. 43 JONES, N., CHEN, S. H., STURK, C., MASTER, Z., TRAN, J., KERBEL, R. S., DUMONT, D. J., A unique autophosphorylation site on Tie2/Tek mediates Dok-R phosphotyrosine
44
45
46
47
48
49
50
51
52
binding domain binding and function. Mol. Cell. Biol. 2003, 23, 2658–2668. YAMANASHI, Y., TAMURA, T., KANAMORI, T., YAMANE, H., NARIUCHI, H., YAMAMOTO, T., BALTIMORE, D., Role of the rasGAP-associated docking protein p62(dok) in negative regulation of B cell receptor-mediated signaling. Genes. Dev. 2000, 14, 11–16. MASTER, Z., JONES, N., TRAN, J., JONES, J., KERBEL, R. S., DUMONT, D. J., Dok-R plays a pivotal role in angiopoietin-1dependent cell migration through recruitment and activation of Pak. EMBO J. 2001, 20, 5919–5928. FAZIOLI, F., MINICHIELLO, L., MATOSKA, V., CASTAGNINO, P., MIKI, T., WONG, W. T., DI FIORE, P. P., Eps8, a substrate for the epidermal growth factor receptor kinase, enhances EGF-dependent mitogenic signals. EMBO J. 1993, 12, 3799–3808. CASTAGNINO, P., BIESOVA, Z., WONG, W. T., FAZIOLI, F., GILL, G. N., DI FIORE, P. P., Direct binding of eps8 to the juxtamembrane domain of EGFR is phosphotyrosine- and SH2-independent. Oncogene 1995, 10, 723–729. PROVENZANO, C., GALLO, R., CARBONE, R., DI FIORE, P. P., FALCONE, G., CASTELLANI, L., ALEMA, S., Eps8, a tyrosine kinase substrate, is recruited to the cell cortex and dynamic F-actin upon cytoskeleton remodeling. Exp. Cell Res. 1998, 242, 186–200. SCITA, G., et al., EPS8 and E3B1 transduce signals from Ras to Rac. Nature 1999, 401, 290–293. SCITA, G., et al., An effector region in Eps8 is responsible for the activation of the Rac-specific GEF activity of Sos-1 and for the proper localization of the Rac-based actin-polymerizing machine. J. Cell Biol. 2001, 154, 1031–1044. INNOCENTI, M., FRITTOLI, E., PONZANELLI, I., FALCK, J. R., BRACHMANN, S. M., DI FIORE, P. P., SCITA, G., Phosphoinositide 3-kinase activates Rac by entering in a complex with Eps8, Abi1, and Sos-1. J. Cell Biol. 2003, 160, 17–23. LANZETTI, L., RYBIN, V., MALABARBA, M. G., CHRISTOFORIDIS, S., SCITA, G., ZERIAL, M., DI FIORE, P. P., The Eps8 protein
19
20 PTB Domains
53
54
55
56
57
58
59
60
61
coordinates EGF receptor signalling through Rac and trafficking through Rab5. Nature 2000, 408, 374–377. PANDEY, A., et al., Cloning of a novel phosphotyrosine binding domain containing molecule, Odin, involved in signaling by receptor tyrosine kinases. Oncogene 2002, 21, 8029–8036. FU, X., MCGRATH, S., PASILLAS, M., NAKAZAWA, S., KAMPS, M. P., EB-1, a tyrosine kinase signal transduction gene, is transcriptionally activated in the t(1;19) subset of pre-B ALL, which express oncoprotein E2a-Pbx1. Oncogene 1999, 18, 4920–4929. NEUBIG, R. R., SIDEROVSKI, D. P., Regulators of G-protein signalling as new central nervous system drug targets. Nat. Rev. Drug Discov. 2002, 1, 187–197. SCHIFF, M. L., et al., Tyrosine-kinasedependent recruitment of RGS12 to the N-type calcium channel. Nature 2000, 408, 723–727. BORG, J. P., OOI, J., LEVY, E., MARGOLIS, B., The phosphotyrosine interaction domains of X11 and FE65 bind to distinct sites on the YENPTY motif of amyloid precursor protein. Mol. Cell. Biol. 1996, 16, 6229–6241. HARDY, J., SELKOE, D. J., The amyloid hypothesis of Alzheimer’s disease: progress and problems on the road to therapeutics. Science 2002, 297, 353–356. FIORE, F., ZAMBRANO, N., MINOPOLI, G., DONINI, V., DUILIO, A., RUSSO, T., The regions of the Fe65 protein homologous to the phosphotyrosine interaction/ phosphotyrosine binding domain of Shc bind the intracellular domain of the Alzheimer’s amyloid precursor protein. J. Biol. Chem. 1995, 270, 30853– 30856. MCLOUGHLIN, D. M., MILLER, C. C. J., The intracellular cytoplasmic domain of the Alzheimer’s disease amyloid precursor protein interacts with phospho-tyrosine-binding domain proteins in the yeast two-hybrid system. FEBS Lett. 1996, 397, 197–200. HOWELL, B. W., LANIER, L. M., FRANK, R., GERTLER, F. B., COOPER, J. A., The disabled-1 phosphotyrosine-binding domain binds to the internalization
62
63
64
65
66
67
68
69
signals of transmembrane glycoproteins and to phospholipids. Mol. Cell. Biol. 1999, 19, 5179–5188. HOMAYOUNI, R., RICE, D. S., SHELDON, M., CURRAN, T., Disabled-1 binds to the cytoplasmic domain of amyloid precursor-like protein 1. J. Neurosci. 1999, 19, 7507–7515. MATSUDA, S., et al., c-Jun N-terminal kinase (JNK)-interacting protein-1b/islet-brain-1 scaffolds Alzheimer’s amyloid precursor protein with JNK. J. Neurosci. 2001, 21, 6597–6607. TARU, H., KIRINO, Y., SUZUKI, T., Differential roles of JIP scaffold proteins in the modulation of amyloid precursor protein metabolism. J. Biol. Chem. 2002, 277, 27567–27574. RONCARATI, R., et al., The gammasecretase-generated intracellular domain of beta-amyloid precursor protein binds Numb and inhibits Notch signaling. Proc. Natl. Acad. Sci. USA 2002, 99, 7102–7107. SCHEINFELD, M. H., RONCARATI, R., VITO, P., LOPEZ, P. A., ABDALLAH, M., D’ADAMIO, L., Jun NH2-terminal kinase (JNK) interacting protein 1 (JIP1) binds the cytoplasmic domain of the Alzheimer’s beta-amyloid precursor protein (APP). J. Biol. Chem. 2002, 277, 3767–3775. RUSSO, C., DOLCINI, V., SALIS, S., VENEZIA, V., ZAMBRANO, N., RUSSO, T., SCHETTINI, G., Signal transduction through tyrosine-phosphorylated C-termi-nal fragments of amyloid precursor protein via an enhanced interaction with Shc/Grb2 adaptor proteins in reactive astrocytes of Alzheimer’s disease brain. J. Biol. Chem. 2002, 277, 35282–35288. TARR, P. E., RONCARATI, R., PELICCI, G., PELICCI, P. G., D’ADAMIO, L., Tyrosine phosphorylation of the beta-amyloid precursor protein cytoplasmic tail promotes interaction with Shc. J. Biol. Chem. 2002, 277, 16798–16804. BORG, J. P., YANG, Y., DE TADDEO-BORG, M., MARGOLIS, B., TURNER, R. S., The X11alpha protein slows cellular amyloid precursor protein processing and
References
70
71
72
73
74
75
76
77
78
reduces Abeta40 and Abeta42 secretion. J. Biol. Chem. 1998, 273, 14761–14766. SASTRE, M., TURNER, R. S., LEVY, E., X11 Interaction with beta-amyloid precursor protein modulates its cellular stabilization and reduces amyloid beta-protein secretion. J. Biol. Chem. 1998, 273, 22351–22357. SABO, S. L., LANIER, L. M., IKIN, A. F., KHORKOVA, O., SAHASRABUDHE, S., GREENGARD, P., BUXBAUM, J. D., Regulation of beta-amyloid secretion by FE65, an amyloid protein precursorbinding protein. J. Biol. Chem. 1999, 274, 7952–7957. CHANG, Y., et al., Generation of the beta-amyloid peptide and the amyloid precursor protein C-terminal fragment gamma are potentiated by FE65L1. J. Biol. Chem. 2003, 278, 51100–51107. LEE, J. H., et al., The neuronal adaptor protein X11alpha reduces Abeta levels in the brains of Alzheimer’s APPswe Tg2576 transgenic mice. J. Biol. Chem. 2003, 278, 47025–47029. KAECH, S. M., WHITFIELD, C. W., KIM, S. K., The LIN-2/LIN-7/LIN-10 complex mediates basolateral membrane localization of the C. elegans EGF receptor LET-23 in vulval epithelial cells. Cell 1998, 94, 761–771. RONGO, C., WHITFIELD, C. W., RODAL, A., KIM, S. K., KAPLAN, J. M., LIN-10 is a shared component of the polarized protein localization pathways in neurons and epithelia. Cell 1998, 94, 751–759. BORG, J. P., LOPEZ-FIGUEROA, M. O., DE TADDEO-BORG, M., KROON, D. E., TURNER, R. S., WATSON, S. J., MARGOLIS, B., Molecular analysis of the X11-mLin-2/CASK complex in brain. J. Neurosci. 1999, 19, 1307–1316. WHITFIELD, C. W., BENARD, C., BARNES, T., HEKIMI, S., KIM, S. K., Basolateral localization of the Caenorhabditis elegans epidermal growth factor receptor in epithelial cells by the PDZ protein LIN-10. Mol. Biol. Cell 1999, 10, 2087–2100. OKAMOTO, M., SUDHOF, T. C., Mints, Munc18-interacting proteins in synaptic vesicle exocytosis. J. Biol. Chem. 1997, 272, 31459–31464.
79 HO, C. S., MARINESCU, V., STEINHILB, M. L., GAUT, J. R., TURNER, R. S., STUENKEL, E. L., Synergistic effects of Munc18a and X11 proteins on amyloid precursor protein metabolism. J. Biol. Chem. 2002, 277, 27021–27028. 80 HILL, K., et al., Munc18 interacting proteins: ADP-ribosylation factor-dependent coat proteins that regulate the traffic of beta-Alzheimer’s precursor protein. J. Biol. Chem. 2003, 278, 36032–36040. 81 KING, G. D., PEREZ, R. G., STEINHILB, M. L., GAUT, J. R., TURNER, R. S., X11alpha modulates secretory and endocytic trafficking and metabolism of amyloid precursor protein: mutational analysis of the YENPTY sequence. Neuroscience 2003, 120, 143–154. 82 ANDO, K., IIJIMA, K. I., ELLIOTT, J. I., KIRINO, Y., SUZUKI, T., Phosphorylationdependent regulation of the interaction of amyloid precursor protein with Fe65 affects the production of beta-amyloid. J. Biol. Chem. 2001, 276, 40353–40361. 83 TROMMSDORFF, M., BORG, J. P., MARGOLIS, B., HERZ, J., Interaction of cytosolic adaptor proteins with neuronal apolipoprotein E receptors and the amyloid precursor protein. J. Biol. Chem. 1998, 273, 33556–33560. 84 KINOSHITA, A., WHELAN, C. M., SMITH, C. J., BEREZOVSKA, O., HYMAN, B. T., Direct visualization of the gamma secretasegenerated carboxyl-terminal domain of the amyloid precursor protein: association with Fe65 and translocation to the nucleus. J. Neurochem. 2002, 82, 839–847. 85 HYMAN, B. T., STRICKLAND, D., REBECK, G. W., Role of the low-density lipoprotein receptor-related protein in beta-amyloid metabolism and Alzheimer disease. Arch. Neurol. 2000, 57, 646–650. 86 PIETRZIK, C. U., BUSSE, T., MERRIAM, D. E., WEGGEN, S., KOO, E. H., The cytoplasmic domain of the LDL receptor-related protein regulates multiple steps in APP processing. EMBO J. 2002, 21, 5691–5700. 87 MINOPOLI, G., DE CANDIA, P., BONETTI, A., FARAONIO, R., ZAMBRANO, N., RUSSO, T.,
21
22 PTB Domains
88
89
90
91
92
93
94
95
96
The beta-amyloid precursor protein functions as a cytosolic anchoring site that prevents Fe65 nuclear translocation. J. Biol. Chem. 2001, 276, 6545–6550. CAO, X., SUDHOF, T. C., A transcriptionally [correction of transcriptively] active complex of APP with Fe65 and histone acetyltransferase Tip60. Science 2001, 293, 115–120. BIEDERER, T., CAO, X., SUDHOF, T. C., LIU, X., Regulation of APP-dependent transcription complexes by Mint/X11s: differential functions of Mint isoforms. J. Neurosci. 2002, 22, 7340–7351. KINOSHITA, A., SHAH, T., TANGREDI, M. M., STRICKLAND, D. K., HYMAN, B. T., The intracellular domain of the low density lipoprotein receptor-related protein modulates transactivation mediated by amyloid precursor protein and Fe65. J. Biol. Chem. 2003, 278, 41182–41188. BAEK, S. H., OHGI, K. A., ROSE, D. W., KOO, E. H., GLASS, C. K., ROSENFELD, M. G., Exchange of N-CoR corepressor and Tip60 coactivator complexes links gene expression by NF-kappaB and betaamyloid precursor protein. Cell 2002, 110, 55–67. ZHAO, Q., LEE, F. S., The transcriptional activity of the APP intracellular domain– Fe65 complex is inhibited by activation of the NF-kappaB pathway. Biochemistry 2003, 42, 3627–3634. SABO, S. L., IKIN, A. F., BUXBAUM, J. D., GREENGARD, P., The Alzheimer amyloid precursor protein (APP) and FE65, an APP-binding protein, regulate cell movement. J. Cell Biol. 2001, 153, 1403–1414. WHITMARSH, A. J., CAVANAGH, J., TOURNIER, C., YASUDA, J., DAVIS, R. J., A mammalian scaffold complex that selectively mediates MAP kinase activation. Science 1998, 281, 1671– 1674. DICKENS, M., et al., A cytoplasmic inhibitor of the JNK signal transduction pathway. Science 1997, 277, 693–696. NIHALANI, D., MEYER, D., PAJNI, S., HOLZMAN, L. B., Mixed lineage kinase-dependent JNK activation is governed by interactions of scaffold protein JIP with MAPK module
97
98
99
100
101
102
103
104
105
components. EMBO J. 2001, 20, 3447–3458. VERHEY, K. J., MEYER, D., DEEHAN, R., BLENIS, J., SCHNAPP, B. J., RAPOPORT, T. A., MARGOLIS, B., Cargo of kinesin identified as JIP scaffolding proteins and associated signaling molecules. J. Cell Biol. 2001, 152, 959–970. INOMATA, H., NAKAMURA, Y., HAYAKAWA, A., TAKATA, H., SUZUKI, T., MIYAZAWA, K., KITAMURA, N., A scaffold protein JIP-1b enhances amyloid precursor protein phosphorylation by JNK and its association with kinesin light chain 1. J. Biol. Chem. 2003, 278, 22946–22955. KAMAL, A., ALMENAR-QUERALT, A., LEBLANC, J. F., ROBERTS, E. A., GOLDSTEIN, L. S., Kinesin-mediated axonal transport of a membrane compartment containing beta-secretase and presenilin-1 requires APP. Nature 2001, 414, 643–648. SETOU, M., NAKAGAWA, T., SEOG, D. H., HIROKAWA, N., Kinesin superfamily motor protein KIF17 and mLin-10 in NMDA receptor-containing vesicle transport. Science 2000, 288, 1796–1802. HYNES, R. O., Integrins: bidirectional, allosteric signaling machines. Cell 2002, 110, 673–687. CHEN, W. J., GOLDSTEIN, J. L., BROWN, M. S., NPXY, a sequence often found in cytoplasmic tails, is required for coated pit-mediated internalization of the low density lipoprotein receptor. J. Biol. Chem. 1990, 265, 3116–3123. CALDERWOOD, D. A., et al., Integrin beta cytoplasmic domain interactions with phosphotyrosine-binding domains: a structural prototype for diversity in integrin signaling. Proc. Natl. Acad. Sci. USA 2003, 100, 2272–2277. CHANG, D. D., WONG, C., SMITH, H., LIU, J., ICAP-1, a novel beta1 integrin cytoplasmic domain-associated protein, binds to a conserved and functionally important NPXY sequence motif of beta1 integrin. J. Cell Biol. 1997, 138, 1149–1157. ZHANG, X. A., HEMLER, M. E., Interaction of the integrin beta1 cytoplasmic domain with ICAP-1 protein. J. Biol. Chem. 1999, 274, 11–19.
References 106 CALDERWOOD, D. A., ZENT, R., GRANT, R., REES, D. J., HYNES, R. O., GINSBERG, M. H., The talin head domain binds to integrin beta subunit cytoplasmic tails and regulates integrin activation. J. Biol. Chem. 1999, 274, 28071–28074. 107 BOUVARD, D., BLOCK, M. R., Calcium/calmodulin-dependent protein kinase II controls integrin alpha5beta1-mediated cell adhesion through the integrin cytoplasmic domain associated protein-1 alpha. Biochem. Biophys. Res. Commun. 1998, 252, 46–50. 108 DEGANI, S., et al., The integrin cytoplasmic domain-associated protein ICAP-1 binds and regulates Rho family GTPases during cell spreading. J. Cell Biol. 2002, 156, 377–387. 109 BOUVARD, D., et al., Disruption of focal adhesions by integrin cytoplasmic domain-associated protein-1 alpha. J. Biol. Chem. 2003, 278, 6567–6574. 110 HORWITZ, A., DUGGAN, K., BUCK, C., BECKERLE, M. C., BURRIDGE, K., Interaction of plasma membrane fibronectin receptor with talin: a transmembrane linkage. Nature 1986, 320, 531–533. 111 CHISHTI, A. H., et al., The FERM domain: a unique module involved in the linkage of cytoplasmic proteins to the membrane. Trends Biochem. Sci. 1998, 23, 281–282. 112 GARCIA-ALVAREZ, B., et al., Structural determinants of integrin recognition by talin. Mol. Cell 2003, 11, 49–58. 113 PEARSON, M. A., RECZEK, D., BRETSCHER, A., KARPLUS, P. A., Structure of the ERM protein moesin reveals the FERM domain fold masked by an extended actin binding tail domain. Cell 2000, 101, 259–270. 114 DAVIS, C. G., LEHRMAN, M. A., RUSSELL, D. W., ANDERSON, R. G., BROWN, M. S., GOLDSTEIN, J. L., The J. D. mutation in familial hypercholesterolemia: amino acid substitution in cytoplasmic domain impedes internalization of LDL receptors. Cell 1986, 45, 15–24. 115 GOTTHARDT, M., et al., Interactions of the low density lipoprotein receptor gene family with cytosolic adaptor and
116
117
118
119
120
121
122
123
124
125
scaffold proteins suggest diverse biological functions in cellular communication and signal transduction. J. Biol. Chem. 2000, 275, 25616–25624. OLEINIKOV, A. V., ZHAO, J., MAKKER, S. P., Cytosolic adaptor protein Dab2 is an intracellular ligand of endocytic receptor gp600/megalin. Biochem. J. 2000, 347 Pt 3, 613–621. MORRIS, S. M., COOPER, J. A., Disabled-2 colocalizes with the LDLR in clathrin-coated pits and interacts with AP-2. Traffic 2001, 2, 111–123. MISHRA, S. K., KEYEL, P. A., HAWRYLUK, M. J., AGOSTINELLI, N. R., WATKINS, S. C., TRAUB, L. M., Disabled-2 exhibits the properties of a cargo-selective endocytic clathrin adaptor. EMBO J. 2002, 21, 4915–4926. MORRIS, S. M., ARDEN, S. D., ROBERTS, R. C., KENDRICK-JONES, J., COOPER, J. A., LUZIO, J. P., BUSS, F., Myosin VI binds to and localises with Dab2, potentially linking receptor-mediated endocytosis and the actin cytoskeleton. Traffic 2002, 3, 331–341. CONNER, S. D., SCHMID, S. L., Regulated portals of entry into the cell. Nature 2003, 422, 37–44. MORRIS, S. M., TALLQUIST, M. D., ROCK, C. O., COOPER, J. A., Dual roles for the Dab2 adaptor protein in embryonic development and kidney transport. EMBO J. 2002, 21, 1555–1564. LEHESTE, J. R., et al., Megalin knockout mice as an animal model of low molecular weight proteinuria. Am. J. Pathol. 1999, 155, 1361–1370. ZHOU, J., HSIEH, J. T., The inhibitory role of DOC-2/DAB2 in growth factor receptor-mediated signal cascade. DOC-2/DAB2-mediated inhibition of ERK phosphorylation via binding to Grb2. J. Biol. Chem. 2001, 276, 27793–27798. HOCEVAR, B. A., SMINE, A., XU, X. X., HOWE, P. H., The adaptor molecule Disabled-2 links the transforming growth factor beta receptors to the Smad pathway. EMBO J. 2001, 20, 2789–2801. HOCEVAR, B. A., MOU, F., RENNOLDS, J. L., MORRIS, S. M., COOPER, J. A., HOWE, P. H., Regulation of the Wnt signaling
23
24 PTB Domains
126
127
128
129
130
131
132
133
134
pathway by disabled-2 (Dab2). EMBO J. 2003, 22, 3084–3094. KAMIKURA, D. M., COOPER, J. A., Lipoprotein receptors and a Disabled family cytoplasmic adaptor protein regulate EGL–17/FGF export in C. elegans. Genes. Dev. 2003, 17, 2798–2811. GARCIA, C. K., et al., Autosomal recessive hypercholesterolemia caused by mutations in a putative LDL receptor adaptor protein. Science 2001, 292, 1394–1398. EDEN, E. R., et al., Restoration of LDL-receptor function in cells from patients with autosomal recessive hypercholeste-rolemia by retroviral expression of ARH1. J. Clin. Invest. 2002, 110, 1695–1702. JONES, C., HAMMER, R. E., LI, W. P., COHEN, J. C., HOBBS, H. H., HERZ, J., Normal sorting, but defective endocytosis of the LDL receptor in mice with autosomal recessive hypercholesterolemia. J. Biol. Chem. 2003, 278, 29024–29030. HE, G., GUPTA, S., MICHAELY, P., HOBBS, H. H., COHEN, J. C., ARH is a modular adaptor protein that interacts with the LDL receptor, clathrin and AP-2. J. Biol. Chem. 2002, 277, 44044–44049. MISHRA, S. K., WATKINS, S. C., TRAUB, L. M., The autosomal recessive hyper-cholesterolemia (ARH) protein interfaces directly with the clathrin-coat machinery. Proc. Natl. Acad. Sci. USA 2002, 99, 16099–16104. ZHOU, Y., ZHANG, J., KING, M. L., Xenopus ARH couples lipoprotein receptors with the AP-2 complex in oocytes and embryos and is required for vitellogenesis. J. Biol. Chem. 2003, 278, 44584–44592. COHEN, J. C., KIMMEL, M., POLANSKI, A., HOBBS, H. H., Molecular mechanisms of autosomal recessive hypercholesterolemia. Curr. Opin. Lipidol. 2003, 14, 121–127. SOUTAR, A. K., NAOUMOVA, R. P., TRAUB, L. M., Genetics, clinical phenotype, and molecular cell biology of autosomal recessive hypercholesterolemia. Arterioscler. Thromb. Vasc. Biol. 2003, 23, 1963–1970.
135 JAFAR-NEJAD, H., NORGA, K., BELLEN, H., Numb. “Adapting” notch for endocytosis. Dev. Cell 2002, 3, 155–156. 136 BERDNIK, D., TOROK, T., GONZALEZ-GAITAN, M., KNOBLICH, J., The endocytic protein #945;-adaptin is required for Numb-mediated asymmetric cell division in Drosophila. Dev. Cell 2002, 3, 221–231. 137 WAKAMATSU, Y., MAYNARD, T. M., JONES, S. U., WESTON, J. A., Numb localizes in the basal cortex of mitotic avian neuro-epithelial cells and modulates neuronal differentiation by binding to Notch-1. Neuron 1999, 23, 71–81. 138 ZHONG, W., FEDER, J. N., JIANG, M. M., JAN, L. Y., JAN, Y. N., Asymmetric localization of a mammalian numb homolog during mouse cortical neurogenesis. Neuron 1996, 17, 43–53. 139 KNOBLICH, J. A., JAN, L. Y., JAN, Y. N., The N terminus of the Drosophila Numb protein directs membrane association and actindependent asymmetric localization. Proc. Natl. Acad. Sci. USA 1997, 94, 13005–13010. 140 SANTOLINI, E., PURI, C., SALCINI, A. E., GAGLIANI, M. C., PELICCI, P. G., TACCHETTI, C., DI FIORE, P. P., Numb is an endocytic protein. J. Cell Biol. 2000, 151, 1345–1352. 141 QIN, H., PERCIVAL-SMITH, A., LI, C., JIA, C. Y., GLOOR, G., LI, S. S., A novel transmembrane protein recruits numb to the plasmic membrane in asymmetric cell division. J. Biol. Chem. 2003, in press. 142 CHIEN, C. T., WANG, S., ROTHENBERG, M., JAN, L. Y., JAN, Y. N., Numb-associated kinase interacts with the phosphotyrosine binding domain of Numb and antagonizes the function of Numb in vivo. Mol. Cell. Biol. 1998, 18, 598–607. 143 O’CONNOR-GILES, K. M., SKEATH, J. B., Numb inhibits membrane localization of Sanpodo, a four-pass transmembrane protein, to promote asymmetric divisions in Drosophila. Dev. Cell 2003, 5, 231–243. 144 ECK, M. J., DHE-PAGANON, S., TRUB, T., NOLTE, R. T., SHOELSON, S. E., Structure of the IRS-1 PTB domain bound to the
References
145
146
147
148
149
150
151
juxtamembrane region of the insulin receptor. Cell 1996, 85, 695–705. ZHOU, M. M., et al., Structural basis for IL-4 receptor phosphopeptide recognition by the IRS-1 PTB domain. Nat. Struct. Biol. 1996, 3, 388–393. LI, S. C., ZWAHLEN, C., VINCENT, S. J., MCGLADE, C. J., KAY, L. E., PAWSON, T., FORMAN-KAY, J. D., Structure of a Numb PTB domain–peptide complex suggests a basis for diverse binding specificity. Nat. Struct. Biol. 1998, 5, 1075–1083. ZHANG, Z., LEE, C. H., MANDIYAN, V., BORG, J. P., MARGOLIS, B., SCHLESSINGER, J., KURIYAN, J., Sequence-specific recognition of the internalization motif of the Alzheimer’s amyloid precursor protein by the X11 PTB domain. EMBO J. 1997, 16, 6141–6150. STOLT, P. C., JEON, H., SONG, H. K., HERZ, J., ECK, M. J., BLACKLOW, S. C., Origins of peptide selectivity and phosphoinositide binding revealed by structures of Disabled-1 PTB domain complexes. Structure (Camb) 2003, 11, 569–579. YUN, M., et al., Crystal structures of the dab homology domains of mouse disabled 1 and 2. J. Biol. Chem. 2003, 278, 36572–36581. SHI, N., et al., Structural basis for the specific recognition of RET by the Dok1 PTB domain. J. Biol. Chem. 2003, in press. YAN, K. S., KUTI, M., YAN, S., MUJTABA, S., FAROOQ, A., GOLDFARB, M. P., ZHOU, M. M., FRS2 PTB domain conformation regulates interactions with divergent neurotrophic receptors. J. Biol. Chem. 2002, 277, 17088–17094.
152 ZWAHLEN, C., LI, S. C., KAY, L. E., PAWSON, T., FORMAN-KAY, J. D., Multiple modes of peptide recognition by the PTB domain of the cell fate determinant Numb. EMBO J. 2000, 19, 1505–1515. 153 SONGYANG, Z., MARGOLIS, B., CHAUDHURI, M., SHOELSON, S. E., CANTLEY, L. C., The phosphotyrosine interaction domain of SHC recognizes tyrosine-phosphorylated NPXY motif. J. Biol. Chem. 1995, 270, 14863– 14866. 154 TRUB, T., CHOI, W. E., WOLF, G., OTTINGER, E., CHEN, Y., WEISS, M., SHOELSON, S. E., Specificity of the PTB domain of Shc for beta turn-forming pentapeptide motifs amino-terminal to phosphotyrosine. J. Biol. Chem. 1995, 270, 18205–18208. 155 FAROOQ, A., ZENG, L., YAN, K. S., RAVICHANDRAN, K. S., ZHOU, M. M., Coupling of folding and binding in the PTB domain of the signaling protein Shc. Structure (Camb) 2003, 11, 905–913. 156 SU, H. P., NAKADA-TSUKUI, K., TOSELLO-TRAMPONT, A. C., LI, Y., BU, G., HENSON, P. M., RAVICHANDRAN, K. S., Interaction of CED-6/GULP, an adapter protein involved in engulfment of apoptotic cells, with CED-1 and CD91/LRP. J. Biol. Chem. 2001, 277, 11772–11779. 157 DHO, S. E., FRENCH, M. B., WOODS, S. A., MCGLADE, C. J., Characterization of four mammalian numb protein isoforms. Identification of cytoplasmic and membrane-associated variants of the phosphotyrosine binding domain. J. Biol. Chem. 1999, 274, 33097–33104.
25
1
Phosphoserine/Threonine Binding Domains Andrew E. H. Elia, and Michael B. Yaffe
Massachusetts Institute of Technology, Cambridge, USA
Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2
1 Introduction
The orchestration of complex cellular events requires the timely assembly of multimolecular signaling complexes at specific locations within the cell. This temporal and spatial control is often achieved through phosphorylation-dependent binding and activation. Prior to the discovery of SH2 domains, protein phosphorylation was thought to regulate substrate binding and catalytic activity primarily by inducing allosteric changes in protein tertiary structure. A new view emerged in 1990, however, with the realization that binding of SH2 domains to tyrosine residues occurred only when they carried a phosphate moiety, introducing the idea that phosphorylation could function as a direct regulatory switch for protein-protein interactions [1–3]. This view was not immediately applied to serine/threonine kinase signaling, since SH2 and subsequently identified PTB domains were highly specific for phosphotyrosine [4]. Speculation ensued based on the idea that basic signaling mechanisms were different for Tyr and Ser/Thr phosphorylation events. The unanticipated finding in 1996 that 14-3-3 proteins recognize phosphorylated serine- and threonine-based motifs, however, rapidly dispelled this idea [5, 6]. Additional phosphoserine (pSer)/phosphothreonine (pThr)-binding modules have been discovered since 1996 and currently include five additional family members: WW domains, FHA domains, WD40 repeats, the Polo-box domain, and BRCT repeats [7–10]. These domains comprise a diverse structural group, demonstrating that numerous divergent tertiary folds have been capable of acquiring a phosphodependent binding function through evolution. Importantly, these domains all recognize phosphoserine or phosphothreonine within a unique consensus motif that directs the specificity of ligand binding. This chapter provides an overview of the identification, cellular function, and structural basis of binding for current Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Phosphoserine/Threonine Binding Domains
members of the pSer/pThr-binding family. Their expanding repertoire is leading to a more general appreciation of the role of phosphopeptide recognition domains in regulating the reversible assembly of multiprotein complexes.
2 The 14-3-3 Proteins 2.1 History and Functions
The first pSer/pThr-binding molecules to be identified were 14-3-3 proteins, a family of abundant polypeptides found in all eukaryotic cells. Initially discovered with no known function, the observation that ligand phosphorylation might be critical for 14-3-3 binding emerged from work on tryptophan hydroxylase [11], an enzyme involved in neurotransmitter biosynthesis. Widespread interest in 14-3-3 proteins subsequently grew when they were found to interact with Raf, the upstream activator of the classical MAP kinase pathway, and polyoma middle T antigen [12– 14] and to play an essential role in the DNA damage checkpoint of fission yeast [15]. Investigation of the 14-3-3 binding sites on Raf [5] and in-vitro peptide-library screening [6] led to the identification of two optimal phosphoserine/threoninecontaining motifs – RSxpoS/TxP and RxxxpoS/TxP – that are recognized by all 14-3-3 isotypes [6]. Over 100 proteins interact with 14-3-3, including various protein kinases and phosphatases, apoptotic factors, transcription factors, cell surface receptors, ion channels, cytoskeletal proteins, and metabolic proteins [16–18]. 14-3-3 regulates the function of its bound partners through a number of general mechanisms, including increasing or decreasing the ligand’s catalytic activity, facilitating or blocking molecular interactions between the ligand and other molecules, and regulating the subcellular localization of the bound protein. One of the earliest elucidated functions of 14-3-3 was the regulation of tryptophan hydroxylase activity. 14-3-3 was found to bind tryptophan hydroxylase upon calmodulin kinase II phosphorylation and to thereby activate the enzyme [19, 20]. Shortly thereafter, 14-3-3 proteins were found to play inhibitory roles toward other enzymes, including PKC [21] and apoptosis signal-regulating kinase 1 (ASK1) [22]. In some instances, 14-3-3 plays a dynamic role with different functions during the course of catalytic activation. For example, 14-3-3 helps to maintain the mitogenstimulated kinase Raf in an inactive but activatable conformation in resting cells. Upon cellular stimulation with growth factors, 14-3-3 is partly displaced from Raf to allow Raf activation, but 14-3-3 binding is not completely eliminated, because its continued presence is necessary for full catalytic activity [23–27]. Besides influencing enzymatic function, 14-3-3 proteins regulate some biological processes by modulating the interaction between two protein binding partners. A well studied example is the binding of 14-3-3 to the pro-apoptotic factor BAD [28, 29]. Exposure of cells to survival factors such as IL-3 results in the phosphorylation of BAD at three sites by the kinases Akt1, RSK1, and PKA [30–33].
2 The 14-3-3 Proteins
These phosphorylations cooperatively mediate 14-3-3 binding, which ultimately interferes with the ability of BAD to bind and inhibit the anti-apoptotic factor Bcl-2. The net outcome is Bcl2 release and the prevention of apoptosis. Another example of molecular interference by 14-3-3 involves its binding to IRS-1, which functions to inhibit the interaction between IRS-1 and phosphatidylinositol-3-kinase (PI3K), causing a decrease in insulin-stimulated PI3-K activity [34]. Although 14-3-3 functions as a molecular ‘blocker’ in these instances, it promotes protein binding in others by exploiting the presence of two phosphopeptide binding sites within its dimeric structure. Thus, a single dimeric 14-3-3 molecule can function as an adaptor by simultaneously binding two distinct ligands to bridge them together. Reports suggest that such 14-3-3-mediated complexes exist between Bcr and Raf [35] and between the kinases PKCζ and Raf [36]. 14-3-3 proteins regulate subcellular localization by inducing the cytoplasmic sequestration of some proteins and the nuclear translocation of others. A well characterized example of the former includes cytoplasmic retention of the mitotic phosphatase Cdc25C upon phosphorylation in response to DNA damage. Subsequent 14-3-3 binding and cytoplasmic tethering prevents access of Cdc25C to nuclear Cdc2-cyclin B and thereby inhibits mitotic entry [37–39]. Another example includes the 14-3-3-induced cytoplasmic sequestration of the pro-apoptotic transcription factor FKHRL-1, which occurs upon Akt phosphorylation in response to external cell survival stimuli [40, 41]. The general notion that 14-3-3 always promotes cytoplasmic localization, however, is contradicted by the observation that binding to the homeodomain transcription factor Tlx-2 induces its nuclear translocation [42]. Such diverse effects of 14-3-3 binding on subcellular localization may result from the fact that 14-3-3 proteins, by themselves, do not contain any functional nuclear export sequences (NES) or nuclear localization sequences (NLS). Instead, 14-3-3 molecules may regulate nucleo-cytoplasmic trafficking by exposing or masking such sequences within the bound target molecule, as has been observed for FKHRL-1 [41]. In this way, 14-3-3 proteins act as molecular ‘chauffeurs’, with the final subcellular destination of the 14-3-3-bound complex controlled by the bound protein passenger [41, 43]. 2.2 Structure and Binding
The X-ray structures of 14-3-3τ and ζ in the absence of bound ligand revealed the molecule to be a cup-shaped dimer (Figure 1a) [44, 45], with each monomeric subunit consisting of nine α helices. The dimer interface is formed from hydrophobic and salt-bridge interactions that conceal a large surface area of over 2000 Å2 . In all ligand-bound 14-3-3 structures [6, 46, 47], the peptide occupies one of two amphipathic grooves that line the central channel formed by interaction of the monomers. In phosphopeptide structures, the phosphate oxygens form ionic and hydrogen bonds with three basic residues, Lys49, Arg56, and Arg127, which form a positively charged pocket, and with Tyr128 (Figure 1b). The entire phosphopeptide mainchain is held in an extended conformation up to two residues after the phosphoserine, at which point there is a sudden change in chain direction. This ligand
3
4 Phosphoserine/Threonine Binding Domains
Fig. 1 Structural diversity among phosphoserine/threonine binding domains. In each panel, the bound phosphopeptide ligand is shown in stick representation with carbons colored yellow, nitrogens blue, oxygens red, and phosphate purple. Selected sidechains from each of the structures that interact with the phosphopeptide ligand are also shown in ball-and-stick representation with their carbon atoms colored cyan and the oxygen and nitrogen atoms colored as above. (A) Overview of 14-3-3 showing a phosphopeptide ligand bound to each monomeric subunit. (B) Close-up showing details of phosphate recognition by residues in the basic pocket of a 14-3-3 monomer.
(C) Overview and details of phosphopeptide binding by the Pin1 WW domain. (D) Structure of a pThr-containing peptide bound to the Rad53 N-terminal FHA domain. (E) Close-up showing the residues involved in the Rad53 N-terminal FHA-pThr phosphate interaction. (F) Overview of the Cdc4 WD40 repeats binding to a pThr-containing phosphopeptide. (G) Close-up showing WD40 residue sidechains important for pThr binding. (H) Structure of the Plk1 Polo-box domain bound to an optimal phosphopeptide. The critical Trp414 residue involved in phosphopeptide binding and Ser-1 selection is shown.
3 WW Domains
geometry is required to exit the 14-3-3 binding cleft and rationalizes why optimal 143-3-binding sites contain a proline at this position. This general mode of binding has been validated for a physiologic protein substrate by the recent structure of 14-3-3ζ in complex with serotonin N-acetyl transferase [48]. 14-3-3 proteins bind to their ligands with high affinity, having dissociation constants typically in the nanomolar range. A phosphopeptide binding site present within each monomer suggests that a dimeric 14-3-3 molecule might engage two distinct motifs within a single molecule to make use of an avidity effect. This type of bidentate interaction has been directly observed in 14-3-3 binding to serotonin N-acetyl transferase [48] and may occur in other 14-3-3 substrates that contain two or more phosphorylated 14-3-3 binding sites, such as Raf and BAD [5, 18, 26–28]. A synthetic phosphopeptide that contains tandem 14-3-3 consensus motifs binds to 14-3-3 over 30 times more tightly than the same peptide containing only one phospho-motif [6].
3 WW Domains
WW domains, named for the presence of two conserved tryptophan residues within their 40-residue sequence, can be grouped into six classes, whose members all recognize proline-rich sequences [49–51]. Only class IV WW domains exhibit a phospho-dependent binding function [52]. Three proteins with class IV domains have been described: the mitotic prolyl isomerase Pin1/Ess1, the splicing factor Prp40, and the HECT domain E3 ubiquitin ligase Nedd4/Rsp5. Phosphospecific binding to the WW domain of Pin1 has been most extensively studied. In addition to its WW domain, Pin1 contains a C-terminal prolyl isomerase domain, the X-ray crystal structure of which was solved prior to the recognition of its WW domain phospho-specificity. The structure of the isomerase domain in complex with an Ala-Pro dipeptide revealed a sulfate ion located 5 Å from the Cβ of Ala, suggesting that the isomerase might prefer phosphorylated substrates [53]. Indeed, Pin1 was found to catalyze the specific cis-trans isomerization of pSer/ThrPro bonds [54]. Furthermore, Pin1 was shown to interact with numerous mitotic phosphoproteins, including the important mitotic regulators Cdc25, Myt1, Wee1, Plk1/Plx1, and Cdc27 [55]. This phospho-dependent recognition was subsequently found to occur, in part, through the WW domain, defining it as the first modular phosphoserine/threonine-binding domain [8]. All class IV WW domains show specific binding to phosphoserine-proline or phosphothreonine-proline motifs that are created upon substrate phosphorylation by proline-directed kinases such as cyclin-dependent kinases and MAPKs. The effects of Pin1 on mitotic progression are complex, since it functions to delay mitotic entry but is also required for proper passage through mitosis [56]. The molecular mechanism underlying these functions is not completely understood but seems to involve regulation of the phosphatase Cdc25. Normal mitotic entry involves dephosphorylation and consequent activation of Cdc2 kinase, which is present within a complex containing cyclin B and a small regulatory
5
6 Phosphoserine/Threonine Binding Domains
subunit called Cks1, by a phosphorylated form of Cdc25. The activated Cdc2 complex, in turn, further phosphorylates Cdc25, increasing its activity and creating a positive feedback loop. Upon binding to phosphorylated Ser/Thr-Pro sites in Cdc25, Pin1 likely serves two functions to regulate this process. First, it catalytically induces a conformational change in Cdc25 that facilitates its dephosphorylation [57, 58], and second, it competes with Cks1 for binding to Cdc25, thereby limiting Cdc25 phosphorylation by Cdc2 [59, 60]. The cumulative effect is a decrease in Cdc25 phosphorylation and inhibition of Cdc25 activity during early mitotic entry. Pin1 plays additional roles outside of mitosis, one of which involves the regulation of β-catenin turnover and nuclear translocation. The WW domain of Pin1 inhibits interaction of adenomatous polyposis coli protein (APC) with β-catenin by binding to a phosphorylated Ser-Pro motif near the APC-binding site of β-catenin [61]. Since APC is necessary for the cytoplasmic assembly of β-catenin into a multimolecular complex that triggers its degradation, Pin1 binding serves to increase β-catenin stability in the nucleus [61]. Pin1 is also involved in activating the tumor-suppressor protein p53 upon DNA damage. Multiple types of genotoxic stress induce binding of Pin1 to three phosphorylated Ser/Thr-Pro sites in p53 [62, 63]. Pin1 then appears to effect a conformational change in p53 that increases both its stability and its transactivation function. Pin1-deficient cells are defective in both p53 activation and stabilization and in checkpoint control in response to DNA damage. The Pin1induced conformational changes in p53 may function to activate p53 by protecting it from interaction with the protein HDM2, which regulates p53 stability, and/or by directly influencing the ability of p53 to act as a transcription factor [62, 63]. WW domains fold into three antiparallel β strands, forming a single groove that recognizes proline-rich ligands in the context of a type II polyproline helix (Figure 1c) [64, 65]. Specificity for different proline-rich motifs is determined largely by residues within the loop regions that connect the β1/β2 and β2/β3 strands [65], somewhat akin to the mechanism of ligand binding utilized by FHA domains. The structure of the Pin1 WW domain bound to a YpoSPTpoSPS peptide from the C-terminal region of RNA polymerase II shows that phospho binding occurs through four hydrogen bonds between the peptide’s second phosphoserine and two residues in the β1/β2 loop (Arg17 and Ser16), along with one in the β2 strand (Tyr23) (Figure 1c) [66]. Because the majority of WW domains lack an Arg residue in the β1 loop, pSer/pThr binding by WW domains may be the exception rather than the rule. Pin1 WW domain selection for proline at the pSer/Thr +1 position is explained by the presence of a hydrophobic pocket formed by the aromatic amino acids Tyr23 and Trp34. The pSer-Pro backbone inserts into this pocket, where it is sterically clamped in a trans conformation.
4 FHA Domains
FHA (forkhead associated) domains were originally identified through sequence profiling as a region of homology in forkhead transcription factors [67]. They have since been found in a wide diversity of both prokaryotic and eukaryotic proteins.
4 FHA Domains
The necessity of ligand phosphorylation for FHA binding initially emerged from studies in Arabidopsis showing that a region encompassing the FHA domain of the protein KAPP (kinase-associated protein phosphatase) bound exclusively to the phosphorylated form of the receptor-like protein kinase RLK5 [68]. Interest in FHA domains increased in 1998 when Sun et al. [69] demonstrated that phospho-selective binding of the C-terminal FHA domain of Rad53 to the BRCT-containing protein Rad9 was necessary for DNA damage-induced G2/M arrest in Saccharomyces cerevisiae [69]. Firm evidence for FHA domains as phosphopeptide docking modules was secured when Durocher et al. [70] demonstrated that selective binding of the Rad53 FHA domain to phosphorylated Rad9 could be inhibited by exogenous peptides. These authors also showed that FHA domains from other proteins could also bind directly to isolated phosphopeptides. Oriented peptide library screening to discern consensus motifs for phosphothreonine peptide binding has allowed a tentative grouping of FHA domains into discrete classes based on the specificity at the pThr +3 position (three residues C-terminal from the phosphothreonine) [71]. FHA domain-containing proteins have been most extensively investigated in cell cycle control and in the cellular response to genotoxic damage. Three such proteins are the kinases Chk2, Rad53, and Dun1, all of which mediate cell cycle arrest at multiple checkpoints in response to DNA damage. The simple concept that these proteins employ their FHA domains for targeting substrates to their kinase domains, however, has not been supported by the available data. Instead, it appears that the FHA domains of these kinases are more important for targeting the kinases to scaffolds, where their own activating phosphorylation can occur. Mutations in the C-terminal FHA domain of Rad53 impair its ability to bind phosphorylated Rad9 after DNA damage and result in a loss of Rad53 activation [69]. Rad9 is thought to promote Rad53 activation by acting as an adaptor that either recruits Rad53 to a complex containing the activating kinase Mec1 [72] or brings separate Rad53 molecules close together to facilitate their trans-autophosphorylation [73]. In this regard, the FHA domain of Chk2 appears to regulate transautophosphorylation of its kinase domain by mediating direct homodimerization through interaction with ATM-phosphorylated T68 of Chk2 [74, 75]. Similarly, Dun1 s FHA domain is critical for direct phosphorylation and activation of Dun1 by Rad53 [76]. Two non-kinase mediators of checkpoint signaling that contain FHA domains are Nbs1 and MDC1. Nbs1 is a component of the Mre11-Rad50-Nbs1 (MRN) complex, which localizes to sites of DNA damage in order to coordinate DNA repair and checkpoint signaling. It contains an FHA domain and an adjacent BRCT domain, both of which are involved in targeting this complex to nuclear DNA damage foci [77, 78]. Recombinant FHA/BRCT domain of Nbs1 binds directly in vitro to ATM-phosphorylated histone H2AX, which likely mediates MRN localization to DNA damage sites in vivo [77]. Another recently identified mediator of checkpoint signaling containing an FHA domain is the molecule MDC1, which functions in the intra-S and G2-M checkpoints. Within minutes after ionizing radiation, MDC1 associates with the MRN complex and with numerous other DNA damageresponse proteins, including 53BP1 and BRCA1. Like Nbs1, it forms nuclear foci that colocalize with H2AX foci induced by DNA damage, and peptide binding data
7
8 Phosphoserine/Threonine Binding Domains
suggest that its FHA domain may also mediate direct binding to phosphorylated histone H2AX [79–81]. The X-ray crystal structure of the N-terminal Rad 53 FHA exhibits a core fold consisting of an 11-stranded β sandwich, with a strand topology essentially identical to that of the MH2 domain from SMAD signaling molecules (Figure 1d) [71]. Phosphopeptide binding occurs at one end of the domain, through interactions between selected residues in the phosphopeptide and loops that connect the β3/4, β4/5, and β6/7 strands. The phosphate moiety is held by five hydrogen bonds to Arg70, Ser85, Asn86, and Thr106 of Rad53 FHA1 (Figure 1e). Phospho-independent contacts between FHA domain sidechains and peptide backbone atoms maintain the peptide in an extended conformation. In the Rad53 FHA domain complex, the pThr +3 specificity is derived from salt bridging interactions of Arg83 with an aspartate at the pThr +3 residue of the bound phosphopeptide. FHA domains differ from other phosphothreonine-docking modules in that replacement of pThr with pSer completely eliminates phosphopeptide binding [71]. Curiously, though, the C-terminal Rad53 FHA domain binds phosphotyrosinecontaining peptides [82]. The biological significance of this finding is unclear, since yeast have limited tyrosine kinase signaling. However, it raises the interesting possibility that FHA domains in higher eukaryotes may function as dualspecificity phosphopeptide-binding modules. Interestingly, FHA domains appear to have an additional phospho-independent binding surface on the opposite side of the phosphopeptide-interacting groove. For the FHA domain of human Chk2, this surface is necessary, in conjunction with the phosphopeptide binding surface, for binding BRCA1 [83].
5 WD40 Repeats of F-box Proteins
Phospho-dependent substrate recognition has proven to play a key role in the regulation of protein ubiquitination. Skp1-Cullin-F-box (SCF) complexes are E3 ubiquitin ligases that facilitate the transfer of ubiquitin from a ubiquitin-conjugating enzyme (E2) to lysine residues on a substrate, ultimately forming polyubiquitylation chains that target the substrate for degradation by the proteasome. SCF complexes contain Skp1, Cul1/Cdc53, Roc1, and an F-box-containing protein [84–87]. In addition to an N-terminal F-box motif, most F-box proteins typically contain either WD40 domains or leucine-rich repeats (LRRs) that mediate phospho-specific substrate targeting. Whereas WD40 domains have been definitively shown to directly recognize phosphorylated motifs within substrates (reviewed in [7]), the exact role of LRRs in phospho-dependent substrate recognition is less clear, since they appear to additionally require the adaptor protein Csk1 [88]. The earliest studies implicating WD40 domains in phospho-binding came in 1997 from Ben-Neriah and coworkers, who discovered that phosphopeptides corresponding to sites within the NF-κB inhibitor, IkBcx, specifically inhibit IkBα ubiquitination and degradation mediated by the WD40-containing F-box protein β-TrCP [89, 90]. Subsequent work showed
5 WD40 Repeats of F-box Proteins
that β-TrCP in cellular lysates interacts with immobilized peptides in a phosphospecific fashion [91]. However, definitive proof was not attained until Sicheri, Tyers, and colleagues [92] crystallized the WD40 repeat of the F-box protein Cdc4 directly bound to a phosphopeptide ligand and Pavletich and colleagues [93] crystallized β-TRCP bound to a phosphopeptide from β-catenin. Phosphospecific binding by F-box proteins has been most extensively analyzed for Cdc4 and β-TrCP, two F-box proteins containing WD40 domains. In S. cerevisiae, studies on Cdc4 have focused on the substrate Sic1, a Cdk inhibitor that is specific for the S-phase kinase Clb5-Cdc28 kinase and whose degradation is necessary for S-phase entry. Phosphorylation of Sic1 by Cln1/2-Cdc28 kinases during G1 leads to its Cdc4-mediated ubiquitination and subsequent proteasomal degradation. Consequently, Clb5-Cdc28 becomes activated and drives S-phase entry [94]. The necessity of multisite phosphorylation for Sic1 recognition by Cdc4 has proven to play an important role in the kinetics of S-phase entry [95]. The optimal binding motif for Cdc4, which is LI-L/I-poT-P−ˆ(RK)4 (where ˆ(RK)4 denotes selection against basic residues in the next four positions), is not found in Sic1. Rather, nine suboptimal sites are present, six of which must be phosphorylated for Cdc4 binding. This multisite requirement makes the binding of Cdc4 to Sic1 an ultrasensitive process with a maximum theoretical Hill coefficient (nH ) of 6. When Cln-Cdc28 levels are subthreshold, phosphorylation is not sufficient to drive Sic1-Cdc4 complex formation, but when Cln-Cdc28 activity meets a critical threshold level, virtually all of the Sic1 is rapidly phosphorylated and degraded within a short time [95–97]. Another F-box protein with WD40 repeats, β-TrCP, plays roles in both developmental and inflammatory signaling pathways. It mediates the ubiquitination of phosphorylated β-catenin to down-regulate Wnt signaling [98] and of phosphorylated IkBcx to induce nuclear translocation of the transcription factor NFkB for induction of immunological genes [99]. β-TrCP recognizes a common DpoSGxxpoS motif, which is generated in β-catenin by GSK-3 phosphorylation and in IkBcx by IkK phosphorylation [99]. An X-ray crystal structure of a ternary complex consisting of Skp1, Cdc4, and a high-affinity phosphopeptide provides direct visualization of phosphoepitope binding to the WD40 domain of Cdc4 (Figure 1f) [92]. The WD40 repeats form an eight-bladed β propeller, a somewhat unusual finding since all previously solved WD40 structures possess only seven blades. The phosphopeptide binds in an extended conformation across one blade with the N terminus oriented toward the central pore of the WD40 domain and the C terminus oriented toward the outer rim. The phosphate moiety is bound by electrostatic interactions with the guanidinium groups of three arginines and with the hydroxyl group of a tyrosine residue (Figure 1g). The pThr +1 proline selection derives from insertion of the proline into a hydrophobic pocket whose Trp426 engages the proline pyrrolidine ring through a coplanar interaction in a manner strikingly similar to that which occurs in the Pin1 WW domain binding to a pThr +1 proline. The pThr −1 and −2 aliphatic selections are rationalized by hydrophobic binding, and the disfavor of basic residues in the C-terminal peptide stretch arises from repulsive forces delivered
9
10 Phosphoserine/Threonine Binding Domains
by a cluster of positively charged basic residues adjacent to the phosphate binding pocket [92]. The mechanism for cooperative binding between Cdc4 and phospho-Sic1 has yet to be determined. In principle, multiple phosphorylated sites could enhance binding through an avidity effect by engaging more than one pocket on Cdc4. Examining the Cdc4 surface, though, reveals no obvious additional sites for phosphate binding, because its most conserved region overlaps with the known basic phosphate binding pocket. The authors, therefore, prefer a model in which phospho-Sic1 engages a single phosphopeptide binding groove of Cdc4 [100, 101]. The presence of multiple Sic1 sites would strengthen binding in this model by increasing the local concentration of Sic1 once an initial contact is made. In this scenario, the rate of Sic1 diffusion away from Cdc4 upon dissociation of a single site would be overwhelmed by rebinding at a second site. Alternatively, cooperativity could be achieved by avidity binding of multiply phosphorylated Sic1 to multimerized Cdc4, for which some evidence exists [100].
6 Polo-box Domains
Polo-like kinases play important roles in cell cycle progression and in checkpoint pathways activated by DNA damage or mitotic spindle disruption. They are characterized by the presence of a conserved Ser/Thr kinase domain and a noncatalytic C-terminal region composed of two homologous ∼70–80-residue segments called Polo-boxes [102, 103]. For nearly 10 years, this C-terminal region was recognized as essential for the in vivo function of Plks [104–108]. Evidence suggested that it targeted polo kinases to substrates at particular locations within the cell. However, the molecular mechanism through which such localization occurred remained mysterious until identification of the Plk1 C terminus in a proteomic screen for novel phosphopeptide binding domains [9]. In this screen, an immobilized library of partially degenerate phosphopeptides was biased toward the phosphorylation motif for cyclin-dependent kinases (Cdks) and used to isolate phospho-binding domains that bind to proteins phosphorylated by Cdks. This screen revealed that both Polo-boxes of Plk1 function together as a single phosphoserine/threonine-binding domain, which has been termed the Polo-box domain (PBD). Oriented peptide library screening with PBDs from all canonical human Plks (Plk1 1, Plk2, Plk3) and from S. cerevisiae and Xenopus has demonstrated that PBDs recognize the common consensus motif S-[poT/poS]-(P/x) [109]. Peptide array studies have shown that serine at the (pThr or pSer) −1 position is absolutely necessary and that proline at the (pThr or pSer) +1 position is preferred but not required [9]. In contrast to other, more ubiquitous, phosphopeptide binding domains, PBDs occur only in Polo-like kinases, where they perform two critical functions regulating the activity of their adjacent kinase domains. In the basal state, they inhibit the catalytic activity of the kinase through intramolecular binding, but upon activation, they target the kinase domain to previously phosphorylated (primed) substrates or
6 Polo-box Domains
docking proteins. Plk1, which has been the most extensively studied among members of the polo kinase family, has a distinct subcellular localization pattern during mitosis, originating at centrosomes and kinetochores in prophase and moving to the spindle midzone during late stages of mitosis (reviewed in [110–113]). The PBD is necessary and sufficient for this localization pattern. Mutation of the phosphopeptide binding pocket in the PBD [109] or injection of its optimal phosphopeptide ligand into permeabilized cells [9] disrupts centrosomal localization, demonstrating that this process relies on phospho binding by the PBD. Evidence that phosphopeptide binding by the Plk1 PBD is necessary for mitotic progression emerged from the finding that mutation of its phospho-binding pocket prevents G2-M arrest when a dominant negative PBD construct is overexpressed in mammalian cells [109]. The mitotic phosphatase Cdc25 is one particular protein targeted by the PBD of Plk1 to regulate mitotic entry. During mitosis, Cdc25 is phosphorylated at five Ser/Thr-Pro sites in its N terminus [114, 115] by Cdks. The Plk1 PBD interacts selectively with the phosphorylated form of one of these sites, Thr130, which contains a conserved PBD consensus motif. Both Cdc2-cyclin B and Plk1 were previously shown to cooperate in phosphorylating and activating Cdc25 in mitotic entry [115–119]. It is attractive to envision a model for this cooperation by which low amounts of Cdc2/cyclinB activity during prophase are insufficient to fully activate Cdc25 but provide priming phosphorylation of Cdc25, creating a PBD docking site. Subsequent recruitment of Plk1 would further phosphorylate and activate Cdc25, which would dephosphorylate Cdc2/cyclin B to increase its activity, priming additional Cdc25 molecules for activation by Polo-like kinases, and resulting in a positive feedback loop [9]. Multiple lines of evidence suggest that a mutually inhibitory interaction exists between the PBD and the kinase domain in full-length Plk1. Deletion of the PBD increases the kinase activity of Plk1 about three fold [108, 120], and the isolated PBD interacts with and inhibits the isolated kinase domain in trans [107]. Furthermore, addition of the optimal PBD phosphopeptide to full-length Plk1 increases its kinase activity by a factor of about three, suggesting that binding of the PBD to primed phosphorylation sites not only serves to target the kinase domain to substrates but simultaneously activates the kinase domain by relieving an inhibitory intramolecular interaction [109]. Interaction of the PBD with the kinase domain does not appear to involve the phosphopeptide binding pocket, since its mutation does not affect the level of binding [109]. Furthermore, the kinase domain does not contain a PBD consensus motif, and mutation of Thr210 [121] within the kinase domain to Asp, as a mimic of phosphorylation, actually inhibits interaction of the PBD with the kinase domain [107]. Thus, disruption of the PBD-kinase interaction likely opens the Plk1 molecule up, to facilitate phosphorylation within its activation loop at Thr210 by upstream kinases [121–123]. The phosphorylated Plk1 would then become locked in the active state throughout the remainder of mitosis by preventing rebinding of the PBD to the kinase. X-ray crystal structures of the human Plk1 PBD bound to its optimal phosphopeptide ligand show that the two Polo-boxes of the PBD each comprise β 6 α structures (Figure 1h) [109, 124], resembling the single Polo-box found in Sak [125]. Together,
11
12 Phosphoserine/Threonine Binding Domains
both Plk1 Polo-boxes form a novel 12-stranded β-sandwich domain, to which the phosphopeptide binds within a conserved, positively charged cleft located at the edge of the Polo-box interface. Binding at this interface rationalizes the requirement of both Polo-boxes for efficient subcellular localization of Plk1 in vivo and also an observed 1:1 stoichiometry of PBD-ligand binding. Preceding the Poloboxes is a 45-residue region, termed the Polo cap, which wraps around Polo-box 2 like a hook tethering it to Polo-box 1 [109]. The phosphate group participates in eight hydrogen-bonding interactions, explaining the critical dependence on peptide phosphorylation for binding. Only two residues (His538 and Lys540) directly contact the phosphate, accounting for three of the hydrogen bonds, with the remaining hydrogen bonds formed by an extensive lattice of water molecule bridges. The structural basis for the high serine selectivity at the pThr −1 position results from three hydrogen bonds to the serine hydroxyl group, two of which arise from interactions with Trp414 mainchain atoms. This critical role of Trp414 in ligand binding explains prior observations that a W414F mutation eliminates both the centrosomal localization of Plk1 and its ability to complement a temperature-sensitive mutation in the Plk1 ortholog, Cdc5, in S. cerevisiae [104].
7 Conclusions
A remarkable amount of functional and structural diversity is observed in these examples of phosphoserine/phosphothreonine-binding domains. Most of these domains play at least some role in regulating normal cell progression or in halting the cell cycle after DNA damage. These observations underscore the critical role of protein phosphorylation-dependent cell signaling in the regulation of many aspects of cell division. The recent identification of tandem BRCT repeats as the newest member of the phosphoserine/threonine binding domain superfamily [10, 126] and the demonstration that mutations in the BRCT domains that eliminate phosphopeptide binding are also associated with an increased risk of breast and ovarian cancer further illustrate this point. The observation that so many different tertiary protein folds can adopt a phosphoserine/threonine binding function suggests that this property has been repeatedly ‘rediscovered’ during evolution and strongly argues that many additional phosphoserine/threonine binding domains remain to be found.
Acknowledgements
This work benefited from helpful discussions with members of the Yaffe laboratory. We apologize to the many researchers whose work was not cited here due to space limitations. Financial assistance from the NIH (grant GM60594) (MBY), and a Career Development Award from the Burroughs-Wellcome Fund (MBY), and the NIH Medical Scientist Training Program (AE) is gratefully acknowledged.
References
References 1 PAWSON, T., GISH, G. D., SH2 and SH3 domains: from structure to function. Cell 1992, 71, 359–362. 2 MAYER, B. J., GUPTA, R., Functions of SH2 and SH3 domains. Curr. Top. Microbiol. Immunol. 1998, 228, 1–22. 3 KURIYAN, J., COWBURN, D., Modular peptide recognition domains in eukaryotic signaling. Annu. Rev. Biophys. Biomol. Struct. 1997, 26, 259–288. 4 YAFFE, M. B., Phosphotyrosine-binding domains in signal transduction. Nat. Rev. Mol. Cell Biol. 2002, 3, 177–186. 5 MUSLIN, A. J., et al., Interaction of 14–3-3 with signaling proteins is mediated by the recognition of phosphoserine. Cell 1996, 84, 889–897. 6 YAFFE, M. B., et al., The structural basis for 14–3-3:phosphopeptide binding specificity. Cell 1997, 91, 961–971. 7 YAFFE, M. B., ELIA, A. E., Phosphoserine/threonine-binding domains. Curr. Opin. Cell Biol. 2001, 13, 131–138. 8 LU, P. J., et al., Function of WW domains as phosphoserine- or phosphothreonine-binding modules. Science 1999, 283, 1325–1328. 9 ELIA, A. E., CANTLEY, L. C., YAFFE, M. B., Proteomic screen finds pSer/pThr-binding domain localizing Plk1 to mitotic substrates. Science 2003, 299, 1228–1231. 10 MANKE, I. A., et al., BRCT repeats as phosphopeptide-binding modules involved in protein targeting. Science 2003, 302, 636–639. 11 FURUKAWA, Y., et al., Demonstration of the phosphorylation-dependent interaction of tryptophan hydroxylase with the 14–3-3 protein. Biochem. Biophys. Res. Commun. 1993, 194, 144–149. 12 FANTL, W. J., et al., Activation of Raf-1 by 14–3-3 proteins. Nature 1994, 371, 612–614. 13 FREED, E., et al., Binding of 14–3-3 proteins to the protein kinase Raf and effects on its activation. Science 1994, 265, 1713–1716.
14 PALLAS, D. C., et al., Association of polyomavirus middle tumor antigen with 14–3-3 proteins. Science 1994, 265, 535–537. 15 FORD, J. C., et al., 14–3-3 protein homologs required for the DNA damage checkpoint in fission yeast. Science 1994, 265, 533–535. 16 FU, H., SUBRAMANIAN, R. R., MASTERS, S. C., 14–3-3 proteins: structure, function, and regulation. Ann. Rev. Pharmacol. Toxicol. 2000, 40, 617–647. 17 TZIVION, G., SHEN, Y. H., ZHU, J., 14–3-3 proteins; bringing new definitions to scaffolding. Oncogene 2001, 20, 6331–6338. 18 VAN HEMERT, M. J., STEENSMA, H. Y., VAN HEUSDEN, G. P., 14–3-3 proteins: key regulators of cell division, signalling and apoptosis. BioEssays 2001, 23, 936–946. 19 ICHIMURA, T., et al., Brain 14–3-3 protein is an activator protein that activates tryptophan 5-monooxygenase and tyrosine 3-monooxygenase in the presence of Ca2+ , calmodulin-dependent protein kinase II. FEBS Lett. 1987, 219, 79–82. 20 ICHIMURA, T., et al., Molecular cloning of cDNA coding for brain-specific 14–3-3 protein, a protein kinase-dependent activator of tyrosine and tryptophan hydroxylases. Proc. Natl. Acad. Sci. USA 1988, 85, 7084–7088. 21 TOKER, A., et al., Protein kinase C inhibitor proteins: purification from sheep brain and sequence similarity to lipocortins and 14–3-3 protein. Eur. J. Biochem. 1990, 191, 421–429. 22 ZHANG, L., CHEN, J., FU, H., Suppression of apoptosis signal-regulating kinase 1-induced cell death by 14–3-3 proteins. Proc. Natl. Acad. Sci. USA 1999, 96, 8511–8515. 23 ROY, S., et al., 14–3-3 facilitates Ras-dependent Raf-1 activation in vitro and in vivo. Mol. Cell Biol. 1998, 18, 3947–3955. 24 LIGHT, Y., PATERSON, H., MARAIS, R., 14–3-3 antagonizes Ras-mediated Raf-1 recruitment to the plasma membrane to
13
14 Phosphoserine/Threonine Binding Domains
25
26
27
28
29
30
31
32
33
34
35
36
maintain signaling fidelity. Mol. Cell Biol. 2002, 22, 4984–4996. THORSON, J. A., et al., 14–3-3 proteins are required for maintenance of Raf-1 phosphorylation and kinase activity. Mol. Cell Biol. 1998, 18, 5229–5238. TZIVION, G., LUO, Z., AVRUCH, J., A dimeric 14–3-3 protein is an essential cofactor for Raf kinase activity. Nature 1998, 394, 88–92. ORY, S., et al., Protein phosphatase 2A positively regulates Ras signaling by dephosphorylating KSR1 and Raf-1 on critical 14–3-3 binding sites. Curr. Biol. 2003, 13, 1–20. ZHA, J., et al., Serine phosphorylation of death agonist BAD in response to survival factor results in binding to 14–3-3 not BCL-X(L). Cell 1996, 87, 619–628. DATTA, S. R., et al., Akt phosphorylation of BAD couples survival signals to the cell-intrinsic death machinery. Cell 1997, 91, 231–241. LIZCANO, J. M., MORRICE, N., COHEN, P., Regulation of BAD by cAMP-dependent protein kinase is mediated via phosphorylation of a novel site, Ser155. Biochem. J. 2000, 349, 547–557. TAN, Y., et al., BAD Ser-155 phosphorylation regulates BAD/Bcl-XL interaction and cell survival. J. Biol. Chem. 2000, 275, 25865–25869. DATTA, S. R., et al., 14–3-3 proteins and survival kinases cooperate to inactivate BAD by BH3 domain phosphorylation. Mol. Cell 2000, 6, 41–51. ZHOU, X. M., et al., Growth factors inactivate the cell death promoter BAD by phosphorylation of its BH3 domain on Ser155. J. Biol. Chem. 2000, 275, 25046–25051. KOSAKI, A., et al., 14–3-3beta protein associates with insulin receptor substrate 1 and decreases insulin-stimulated phosphatidylinositol 3 -kinase activity in 3T3L1 adipocytes. J. Biol. Chem. 1998, 273, 940–944. BRASELMANN, S., MCCORMICK, F., Bcr and Raf form a complex in vivo via 14–3-3 proteins. EMBO J. 1995, 14, 4839–4848. VAN DER HOEVEN, P. C., et al., 14–3-3 isotypes facilitate coupling of protein
37
38
39
40
41
42
43
44
45
46
47
48
kinase C-zeta to Raf-1: negative regulation by 14–3-3 phosphorylation. Biochem. J. 2000, 345, 297–306. KUMAGAI, A., DUNPHY, W. G., Binding of 14–3-3 proteins and nuclear export control the intracellular localization of the mitotic inducer Cdc25. Genes. Dev. 1999, 13, 1067–1072. LOPEZ-GIRONA, A., et al., Nuclear localization of Cdc25 is regulated by DNA damage and a 14–3-3 protein. Nature 1999, 397, 172–175. ZENG, Y., PIWNICA-WORMS, H., DNA damage and replication checkpoints in fission yeast require nuclear exclusion of the Cdc25 phosphatase via 14–3-3 binding. Mol. Cell Biol. 1999, 19, 7410–7419. BRUNET, A., et al., Akt promotes cell survival by phosphorylating and inhibiting a forkhead transcription factor. Cell 1999, 96, 857–868. BRUNET, A., et al., 14–3-3 transits to the nucleus and participates in dynamic nucleocytoplasmic transport. J. Cell Biol. 2002, 156, 817–828. TANG, S. J., et al., Association of the TLX-2 homeodomain and 14–3-3eta signaling proteins. J. Biol. Chem. 1998, 273, 25356–25363. MUSLIN, A. J., XING, H., 14–3-3 proteins: regulation of subcellular localization by molecular interference. Cell Signal 2000, 12, 703–709. LIU, D., et al., Crystal structure of the zeta isoform of the 14–3-3 protein. Nature 1995, 376, 191–194. XIAO, B., et al., Structure of a 14–3-3 protein and implications for coordination of multiple signalling pathways. Nature 1995, 376, 188–191. PETOSA, C., et al., 14–3-3 zeta binds a phosphorylated Raf peptide and an unphosphorylated peptide via its conserved amphipathic groove. J. Biol. Chem. 1998, 273, 16305–16310. RITTINGER, K., et al., Structural analysis of 14–3-3 phosphopeptide complexes identifies a dual role for the nuclear export signal of 14–3-3 in ligand binding. Mol. Cell 1999, 4, 153–166. OBSIL, T., et al., Crystal structure of the 14–3-3zeta:serotonin N-acetyltransferase
References
49
50
51
52
53
54
55
56
57
58
59
60
61
complex. a role for scaffolding in enzyme regulation. Cell 2001, 105, 257–267. SUDOL, M., et al., Characterization of a novel protein-binding module: the WW domain. FEBS Lett. 1995, 369, 67–71. SUDOL, M., SLIWA, K., RUSSO, T., Functions of WW domains in the nucleus. FEBS Lett. 2001, 490, 190–195. OTTE, L., et al., WW domain sequence activity relationships identified using ligand recognition propensities of 42 WW domains. Protein Sci. 2003, 12, 491–500. SUDOL, M., HUNTER, T., NeW wrinkles for an old domain. Cell 2000, 103, 1001–1004. RANGANATHAN, R., et al., Structural and functional analysis of the mitotic rotamase Pin1 suggests substrate recognition is phosphorylation dependent. Cell 1997, 89, 875–886. YAFFE, M. B., et al., Sequence-specific and phosphorylation-dependent proline isomerization: a potential mitotic regulatory mechanism. Science 1997, 278, 1957–1960. SHEN, M., et al., The essential mitotic peptidyl-prolyl isomerase Pin1 binds and regulates mitosis-specific phosphoproteins. Genes. Dev. 1998, 12, 706–720. LU, K. P., HANES, S. D., HUNTER, T., A human peptidyl-prolyl isomerase essential for regulation of mitosis. Nature 1996, 380, 544–547. ZHOU, X. Z., et al., Pin1-dependent prolyl isomerization regulates dephosphorylation of Cdc25C and tau proteins. Mol. Cell 2000, 6, 873–883. STUKENBERG, P. T., KIRSCHNER, M. W., Pin1 acts catalytically to promote a conformational change in Cdc25. Mol. Cell 2001, 7, 1071–1083. LANDRIEU, I., et al., p13(SUC1) and the WW domain of PIN1 bind to the same phosphothreonine-proline epitope. J. Biol. Chem. 2001, 276, 1434–1438. PATRA, D., et al., The Xenopus Suc1/Cks protein promotes the phosphorylation of G(2)/M regulators. J. Biol. Chem. 1999, 274, 36839–36842. RYO, A., et al., Pin1 regulates turnover and subcellular localization of
62
63
64
65
66
67
68
69
70
71
72
73
beta-catenin by inhibiting its interaction with APC. Nat. Cell Biol. 2001, 3, 793–801. ZACCHI, P., et al., The prolyl isomerase Pin1 reveals a mechanism to control p53 functions after genotoxic insults. Nature 2002, 419, 853–857. ZHENG, H., et al., The prolyl isomerase Pin1 is a regulator of p53 in genotoxic response. Nature 2002, 419, 849–853. MACIAS, M. J., et al., Structure of the WW domain of a kinase-associated protein complexed with a proline-rich peptide. Nature 1996, 382, 646–649. ZARRINPAR, A., LIM, W. A., Converging on proline: the mechanism of WW domain peptide recognition. Nat. Struct. Biol. 2000, 7, 611–613. VERDECIA, M. A., et al., Structural basis for phosphoserine-proline recognition by group IV WW domains. Nat. Struct. Biol. 2000, 7, 639–643. HOFMANN, K., BUCHER, P., The FHA domain: a putative nuclear signalling domain found in protein kinases and transcription factors. Trends Biochem. Sci. 1995, 20, 347–349. STONE, J. M., et al., Interaction of a protein phosphatase with an Arabidopsis serine-threonine receptor kinase. Science 1994, 266, 793–795. SUN, Z., et al., Rad53 FHA domain associated with phosphorylated Rad9 in the DNA damage checkpoint [see comments]. Science 1998, 281, 272–274. DUROCHER, D., et al., The FHA domain is a modular phosphopeptide recognition motif. Mol. Cell 1999, 4, 387–394. DUROCHER, D., et al., The molecular basis of FHA domain:phosphopeptide binding specificity and implications for phosphodependent signaling mechanisms. Mol. Cell 2000, 6, 1169–1182. SCHWARTZ, M. F., et al., Rad9 phosphory-lation sites couple Rad53 to the Saccharomyces cerevisiae DNA damage checkpoint. Mol. Cell 2002, 9, 1055–1065. GILBERT, C. S., GREEN, C. M., LOWNDES, N. F., Budding yeast Rad9 is an ATP-dependent Rad53 activating machine. Mol. Cell 2001, 8, 129–136.
15
16 Phosphoserine/Threonine Binding Domains
74 AHN, J. Y., et al., Phosphorylation of threonine 68 promotes oligomerization and autophosphorylation of the Chk2 protein kinase via the forkhead-associated domain. J. Biol. Chem. 2002, 277, 19389–19395. 75 XU, X., TSVETKOV, L. M., STERN, D. F., Chk2 activation and phosphorylation-dependent oligomerization. Mol. Cell Biol. 2002, 22, 4419–4432. 76 BASHKIROV, V. I., et al., Direct kinase-to-kinase signaling mediated by the FHA phosphoprotein recognition domain of the Dun1 DNA damage checkpoint kinase. Mol. Cell Biol. 2003, 23, 1441–1452. 77 KOBAYASHI, J., et al., NBS1 localizes to gamma-H2AX foci through interaction with the FHA/BRCT domain. Curr. Biol. 2002, 12, 1846–1851. 78 TAUCHI, H., et al., The forkhead-associated domain of NBS1 is essential for nuclear foci formation after irradiation but not essential for hRAD50–hMRE11–NBS1 complex DNA repair activity. J. Biol. Chem. 2001, 276, 12–15. 79 GOLDBERG, M., et al., MDC1 is required for the intra-S-phase DNA damage checkpoint. Nature 2003, 421, 952–956. 80 LOU, Z., et al., MDC1 is coupled to activated CHK2 in mammalian DNA damage response pathways. Nature 2003, 421, 957–961. 81 STEWART, G. S., et al., MDC1 is a mediator of the mammalian DNA damage checkpoint. Nature 2003, 421, 961–966. 82 WANG, P., et al., II. Structure and specificity of the interaction between the FHA2 domain of rad53 and phosphotyrosyl peptidesdagger [In Process Citation]. J. Mol. Biol. 2000, 302, 927–940. 83 LI, J., et al., Structural and functional versatility of the FHA domain in DNA-damage signaling by the tumor suppressor kinase Chk2. Mol. Cell 2002, 9, 1045–1054. 84 PETERS, J. M., SCF and APC: the yin and yang of cell cycle regulated proteolysis. Curr. Opin. Cell Biol. 1998, 10, 759–768.
85 WILLEMS, A. R., et al., SCF ubiquitin protein ligases and phosphorylation-dependent proteolysis. Philos. Trans R Soc. London B Biol. Sci. 1999, 354, 1533–1550. 86 WINSTON, J. T., et al., A family of mammalian F-box proteins. Curr. Biol. 1999, 9, 1180–1182. 87 DESHAIES, R. J., SCF and Cullin/Ring H2-based ubiquitin ligases. Annu. Rev. Cell Dev. Biol. 1999, 15, 435–467. 88 HARPER, J. W., Protein destruction: adapting roles for Cks proteins. Curr. Biol. 2001, 11, R431–435. 89 YARON, A., et al., Inhibition of NF-kappa-B cellular function via specific targeting of the I-kappa–B-ubiquitin ligase. EMBO J. 1997, 16, 6486–6494. 90 YARON, A., et al., Identification of the receptor component of the IkappaBalpha–ubiquitin ligase. Nature 1998, 396, 590–594. 91 WINSTON, J. T., et al., The SCFbeta–TRCP–ubiquitin ligase complex associates specifically with phosphorylated destruction motifs in IkappaBalpha and beta-catenin and stimulates IkappaBalpha ubiquitination in vitro [erratum appears in Genes. Dev. 1999, 13, 1050]. Genes. Dev. 1999, 13, 270–283. 92 ORLICKY, S., et al., Structural basis for phosphodependent substrate selection and orientation by the SCFCdc4 ubiquitin ligase. Cell 2003, 112, 243–256. 93 WU, G., et al., Structure of a beta-TrCP1–Skp1–beta-catenin complex: destruction motif binding and lysine specificity of the SCF(beta-TrCP1) ubiquitin ligase. Mol. Cell 2003, 11, 1445–1456. 94 TYERS, M., JORGENSEN, P., Proteolysis and the cell cycle: with this RING I do thee destroy. Curr. Opin. Genet. Dev. 2000, 10, 54–64. 95 NASH, P., et al., Multisite phosphorylation of a CDK inhibitor sets a threshold for the onset of DNA replication. Nature 2001, 414, 514–521. 96 DESHAIES, R. J., FERRELL, J. E. Jr., Multisite phosphorylation and the countdown to S phase. Cell 2001, 107, 819–822.
References 97 HARPER, J. W., A phosphorylation-driven ubiquitination switch for cell-cycle control. Trends Cell Biol. 2002, 12, 104–107. 98 MANIATIS, T., A ubiquitin ligase complex essential for the NF-kappaB, Wnt/Wingless, and Hedgehog signaling pathways. Genes. Dev. 1999, 13, 505–510. 99 KARIN, M., BEN-NERIAH, Y., Phosphorylation meets ubiquitination: the control of NF-[kappa]B activity. Annu. Rev. Immunol. 2000, 18, 621–663. 100 JACKSON, P. K., Ubiquitinating a phosphorylated Cdk inhibitor on the blades of the Cdc4 beta-propeller. Cell 2003, 112, 142–144. 101 KLEIN, P., PAWSON, T., TYERS, M., Mathematical modeling suggests cooperative interactions between a disordered polyvalent ligand and a single receptor site. Curr. Biol. 2003, 13, 1669–1678. 102 SEONG, Y. S., et al., A spindle checkpoint arrest and a cytokinesis failure by the dominant-negative polo-box domain of Plk1 in U-2 OS cells. J. Biol. Chem. 2002, 277, 32282–32293. 103 SONNHAMMER, E. L., et al., Pfam: multiple sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res. 1998, 26, 320–322. 104 LEE, K. S., et al., Mutation of the polo-box disrupts localization and mitotic functions of the mammalian polo kinase Plk. Proc. Natl. Acad. Sci. USA 1998, 95, 9301–9306. 105 LEE, K. S., SONG, S., ERIKSON, R. L., The polo-box-dependent induction of ectopic septal structures by a mammalian polo kinase, plk, in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 1999, 96, 14360–14365. 106 MAY, K. M., et al., Polo boxes and Cut23 (Apc8) mediate an interaction between polo kinase and the anaphase-promoting complex for fission yeast mitosis. J. Cell Biol. 2002, 156, 23–28. 107 JANG, Y. J., et al., Functional studies on the role of the C-terminal domain of mammalian polo-like kinase. Proc. Natl. Acad. Sci. USA 2002, 99, 1984–1989. 108 MUNDT, K. E., et al., On the regulation and function of human polo-like kinase
109
110
111
112
113
114
115
116
117
118
119
120
1 (PLK1): effects of overexpression on cell cycle progression. Biochem. Biophys. Res. Commun. 1997, 239, 377–385. ELIA, E. A. H., et al., The molecular basis for phosphodependent substrate targeting and regulation of Plks by the Polo-box domain. Cell 2003, 115, 83–95. NIGG, E. A., Polo-like kinases: positive regulators of cell division from start to finish. Curr. Opin. Cell Biol. 1998, 10, 776–783. GLOVER, D. M., OHKURA, H., TAVARES, A., Polo kinase: the choreographer of the mitotic stage? J. Cell Biol. 1996, 135, 1681–1684. GLOVER, D. M., HAGAN, I. M., TAVARES, A. A., Polo-like kinases: a team that plays throughout mitosis. Genes. Dev. 1998, 12, 3777–3787. DONALDSON, M. M., et al., The mitotic roles of Polo-like kinase. J. Cell Sci. 2001, 114, 2357–2358. KUMAGAI, A., DUNPHY, W. G., Regulation of the cdc25 protein during the cell cycle in Xenopus extracts. Cell 1992, 70, 139–151. IZUMI, T., MALLER, J. L., Elimination of cdc2 phosphorylation sites in the cdc25 phosphatase blocks initiation of M-phase. Mol. Biol. Cell 1993, 4, 1337–1350. KARAISKOU, A., et al., MPF amplification in Xenopus oocyte extracts depends on a two-step activation of cdc25 phosphatase. Exp. Cell Res. 1998, 244, 491–500. KARAISKOU, A., et al., Phosphatase 2A and polo kinase, two antagonistic regulators of cdc25 activation and MPF auto-amplification. J. Cell Sci. 1999, 112, 3747–3756. QIAN, Y. W., et al., The polo-like kinase Plx1 is required for activation of the phosphatase Cdc25C and cyclin B-Cdc2 in Xenopus oocytes. Mol. Biol. Cell 2001, 12, 1791–1799. QIAN, Y. W., et al., Activated polo-like kinase Plx1 is required at multiple points during mitosis in Xenopus laevis. Mol. Cell Biol. 1998, 18, 4262–4271. LEE, K. S., ERIKSON, R. L., Plk is a functional homolog of Saccharomyces cerevisiae Cdc5, and elevated Plk activity
17
18 Phosphoserine/Threonine Binding Domains
induces multiple septation structures. Mol. Cell Biol. 1997, 17, 3408–3417. 121 JANG, Y. J., et al., Phosphorylation of threonine 210 and the role of serine 137 in the regulation of mammalian polo-like kinase. J. Biol. Chem. 2002, 277, 44115–44120. 122 KELM, O., et al., Cell cycle-regulated phosphorylation of the Xenopus polo-like kinase Plx1. J. Biol. Chem. 2002, 277, 25247–25256. 123 ELLINGER-ZIEGELBAUER, H., et al., Ste20-like kinase (SLK), a regulatory kinase for polo-like kinase (Plk) during
the G2/M transition in somatic cells. Genes. Cells 2000, 5, 491–498. 124 CHENG, K. Y., et al., The crystal structure of the human polo-like kinase-1 polo box domain and its phosphopeptide complex. EMBO J. 2003, 22, 5757– 5768. 125 LEUNG, G. C., et al., The Sak polo-box comprises a structural domain sufficient for mitotic subcellular localization. Nat. Struct. Biol. 2002, 9, 719–724. 126 YU, X., et al., The BRCT domain is a phospho-protein binding domain. Science 2003, 302, 639–642.
1
EVH1/WH1 Domains Linda J. Ball
University of Oxford, Oxford, United Kingdom Urs Wiedemann, and J¨urgen Zimmermann Forschungsinstitut f¨ur Molekulare Pharmakologie, Berlin, Germany
Thomas Jarchau Universit¨at W¨urzburg, W¨urzburg, Germany
Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2
1 Introduction
Drosophila enabled (Ena)/vasodilator-stimulated phosphoprotein (VASP) homology 1 (EVH1) domains, sometimes referred to as Wiskott-Aldrich syndrome protein homology 1 (WH1) domains, are a family of small (∼115 residues; ∼13 kDa), noncatalytic, protein–protein interaction modules essential for linking their host proteins to various signaling pathways. Here, we take a close look at the sequence signatures and structural features that define EVH1 domains and distinguish them from related domains. Using information derived from multiple sequence alignments and phylogenetic trees, together with the available biological data and high-resolution structures of several representative domains and their complexes, we now suggest a classification scheme for EVH1 domains. Mechanisms of peptide–ligand recognition, modulation of binding specificity, and grouping of the domains in light of their evolutionary history are also discussed. To date, there are four main families of proteins that contain EVH1 domains: the actin regulatory Ena/VASP proteins, the synaptic scaffolding (Homer/Vesl) proteins, the Wiscott–Aldrich syndrome protein (WASP), the closely related N-WASP (also involved in modulating the actin cytoskeleton), and the Sprouty-related EVH1 domain-containing (Spred) proteins which regulate the Ras–MAP kinase pathway. The Ran binding domains (RanBDs) of the Ran binding proteins and nucleoporins possess a striking similarity to EVH1 domains in both sequence and structure. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 EVH1/WH1 Domains
A high degree of structural similarity to the pleckstrin homology (PH) domains and phosphotyrosine binding (PTB) domains, found in a widespread number of eukaryotic signaling proteins, is also observed, despite a sequence identity of less than 10%. This has led to suggestions that EVH1 domains may share a common ancestry with PH and PTB domains and may comprise one of six subfamilies belonging to the PH domain superfamily [1]. 2 Occurrence and Distribution of EVH1 Domains
Extensive searches of the SMART [2, 3], Pfam [4], SwissProt/SPTrembl [5], GenPept, and PIR databases [6] revealed a total of 519 EVH1 domain and RanBD sequences. After purging, 206 nonredundant sequences remained, 120 of which could be assigned to protein families. These were aligned based on the known 3D structures of representative members from each family, using the algorithm DALI [7]. Figure 1 Fig. 1 Sequence alignment of EVH1 domains and RanBD domains. Sequences were obtained from an extensive search of the SMART database [114] using EVH1 (called WH1 in SMART) and RanBD sequence sets, and a search of the SwissProt/SPTrembl, GenPept, and PIR databases using a hidden Markov model (HMM) [115], based on a sequence profile containing well known representatives of these domains. Eight sequences were added manually, either because they were not readily available from the databases or were not found by the three HMMs used. The VASP DD protein sequence, for example, is not yet available in the databases and was obtained from the original publication [15] and cross-checked against the Dictyostelium discoideum genome sequence. The resulting set of 519 domain sequences was edited manually to remove redundant sequences, yielding 206 nonredundant domain sequences, of which 120 were well annotated. From this set, representative sequences were selected for the alignment shown here, based on pairwise structural alignments using DALI [7] with manual refinement. Residues are colored as follows: red: acidic; blue: basic; green: hydrophobic; yellow: aromatic; purple: polar residues; dark green: Gly; and light brown: Pro. Sequences are grouped according to protein family, and secondary structure elements were obtained from the available structures using DSSP [116]. Asterisks denote sequences of domains shown in Figures 5
and 6. Sequence numbers and assignments of secondary structural elements above the alignment are according to the EVH1 domain of Mena (SP: Q03173, PDB: 1evh). The secondary structure assignment below the alignment represents the consensus conservation observed across the EVH1, RanBD, PTB, and PH families. The characteristic aromatic triad (Y16, W23, and F77; Mena numbering) of the Ena/VASP-family EVH1 domains are highlighted by boxes for comparison of the different groups. Numbers on the right represent the region of the sequence displayed. SwissProt accession codes are given in the figure. Abbreviations of the organism names are appended to each protein name as follows: AG = Anopheles gambiae, AT = Arabidopsis thaliana, BD = Babesia divergens, BR = Brachydanio rerio, BB = Babesia rodhaini, BT = Bos taurus, CA = Candida albicans, CE = Caenorhabditis elegans, CF = Canis familiaris, DD = Dictyostelium discoideum, DM = Drosophila melanogaster, EN = Encephalitozoon cuniculi, EG = Eremothecium gossypii, FR = Fugu rubripes, GG = Gallus gallus, GL = Giardia lamblia, HM = Hirudo medicinalis, HS = Homo sapiens, LE = Lycopersicon esculentum, LM = Leishmania major, MM = Mus musculus, NC = Neurospora crassa, OA = Ovis aries, PF = Plasmodium falciparum, PY = Patinopecten yessoensis, RN = Rattus norvegicus, SC = Saccharomyces cerevisiae, SP = Schizosaccharomyces pombe, SS = Sus scrofa domestica, XL = Xenopus laevis.
Fig. 1
2 Occurrence and Distribution of EVH1 Domains 3
4 EVH1/WH1 Domains Table 1 Number and origin of EVH1 and RanBD domains. The organism name abbreviations
are according to Figure 1
Protein family
Number of domains
Ena/VASP Spred Homer WASP/N-WASP RanBP/Nucleoporin Unclassified Total
19 7 20 16 58 86
Organisms in which domain has been found to date
HS MM RN CF GG DM CE HM DD HS MM DM HS MM RN OA BR XL DM CE HS MM RN BT BR DM CE DD SC SP EG HS MM BT SS BR XL DM EN LM PF SC SP CA AT LE BD HS MM RN BR FR XL AG DM CE PY GL PF DD NC SP AT BB
206
shows the alignment of a selection of these sequences, which were chosen to represent each of the main protein families and species in which EVH1 domains are found. A more thorough breakdown of the species distribution of these proteins is provided in Table 1. The sequences were then used to calculate a phylogenetic tree (Figure 2) which shows the relationships between EVH1 domains from the different host proteins [8–10]. To avoid crowding, only a reduced selection of 45 domains, representing all branches of the tree, is shown in the figure. The classification of the EVH1 domains of these proteins alone gives a biologically meaningful grouping of protein host families in terms of molecular function. The total number of sequences occurring in each major branch is given in brackets. Such a representation helps us to immediately see the groupings obtained based on sequence and structural alignments. Each of these different groups of host proteins is discussed below. 2.1 Proteins Containing EVH1 Domains
The EVH1 domains were first identified in the Ena/VASP protein family [11, 12], which includes Drosophila Ena, its mammalian orthologs (VASP, Mena, and EVL (Ena/VASP-like protein)), the Caenorhabditis elegans ortholog Unc-34, and the Dictyostelium DdVASP, among others (for reviews see [13–15]). The members of this family localize to highly dynamic areas of actin reorganization, such as the leading edge of lamellipodia, the tips of filopodia, adherens-type cell–matrix and cell–cell junctions, and other dynamic membrane regions (for reviews see [16, 17]), where they are involved in regulating the rate of actin polymerization. They are essential players in a wide range of physiological and developmental processes, including axon guidance [18–21], platelet aggregation [22, 23], T-cell activation, phagocytosis, and migration of neutrophils, fibroblasts, and neurons [24–29]. The Ena/VASP proteins are also engaged in actinbased intracellular motility and cell-to-cell spreading of the pathogenic bacterium
2 Occurrence and Distribution of EVH1 Domains 5
Fig. 2 Phylogentic tree of EVH1, RanBD, PH, and PTB domain sequences. The phylogenetic tree is based on the alignment shown in Figure 1 with known loop regions removed. The number of domain sequences known to date are given in brackets below each group. Representative sequences of the PH and PTB domains have been incorporated into the alignment based on a structural alignment using DALI [7]. The
sequence relatedness was calculated with PROTDIST using the Jones-Taylor-Thornton mutation model. This was then used by FITCH to calculate an unrooted tree in which branch lengths represent sequence similarity. All programs were part of the Phylip package version 3.6.a2 [10]. The tree was created using the TreeView application [8].
Listeria monocytogenes [30]. The proteins show both positive and negative regulatory activities toward actin-based processes [13, 16]. The Homer/Vesl proteins are a family of synaptic scaffolding proteins that are constitutively expressed in brain and enriched at excitatory synapses [31, 32]. They function as molecular adaptors, binding to and targeting neurotransmitter receptors and other scaffolding proteins to the post-synaptic density (PSD), a specialized protein complex at the synaptic junction [33, 34]. Homer/Vesl proteins have generated interest as potential mediators of synaptic plasticity having important implications for learning and memory formation [35]. Their N-terminal EVH1 domains are responsible for interactions with various receptors in the signaling pathways of these proteins. Like Ena/VASP, the WASP/N-WASP proteins are involved in spatial and temporal regulation of the actin cytoskeleton. These proteins were originally discovered in various cells of the hematopoietic system, where they play roles in promoting actin polymerization in response to upstream intracellular signals from the Cdc42 and phosphatidylinositol 4,5-bisphosphate (PIP2 ) signaling pathways [36–39]. Missense
6 EVH1/WH1 Domains
mutants in their N-terminal EVH1 domains (historically called WH1 domains) result in the X-linked recessive disorder Wiscott–Aldrich syndrome (WAS), characterized by immunodeficiency, eczema, and thrombocytopenia [36, 40–44]. These symptoms are consistent with cytoskeletal defects in hematopoetic cells and with roles for WASP in multiple actin-based motility processes [36]. A knockout mutation in the more ubiquitously expressed gene for N-WASP was lethal to developing embryos and resulted in defects in many actin-based processes. WASP/N-WASP proteins are also recruited by intracellular pathogens, such as the Shigella bacterium and the vaccinia virus, to support their own actin-based motility [45, 46]. The interaction occurs via the EVH1 domain as previously observed in the Ena/VASP proteins. The Spred family is the most recently discovered family of proteins found to contain EVH1 domains. So far, three paralogs of the Spred protein have been identified in mouse and humans, with Spred2 being the most ubiquitously expressed isoform [47]. Spred proteins suppress the signaling pathway of Ras-, Raf-, and mitogen-activated protein (MAP) kinase, which regulates differentiation in neuronal cells and myocytes [48]. Mechanistically, Spred inhibits early steps in the activation of MAP kinases, resulting in down-regulation of the Ras–MAP kinase signaling pathway [48]. The Drosophila ortholog of Spred is the AE33 protein, which is believed to be involved in regulating photoreceptor development [49]. The closest known relatives of the EVH1 domains are the Ran-binding domains (RanBDs) found in the Ran-binding protein/nucleoporin family of nuclear transport proteins. These proteins are essential for the trafficking of cytoplasmic proteins through the nuclear pore complex (NPC), the directionality of transport being tightly controlled by their interaction with the small GTP-binding protein Ran [50–52]. The RanBDs form a large family having a very high sequence homology to EVH1 domains [53–55] (Figure 1). The structurally related phosphoinositide-binding PH domains and the PTB domains, which bind primarily to peptides containing phosphorylated tyrosines, are not included in Figure 1, because of their low sequence identity to the EVH1 and RanBD families. 2.2 Modular Architecture of EVH1-containing Proteins: Domain Location, Domain Combinations, and Copy Number
Proteins containing EVH1, RanBD, PTB, and PH domains generally possess highly modular multidomain structures. Different domains within the same parent protein can therefore bind simultaneously to distinct interaction partners or substrates in response to specific signals. The combination of domain types that occur together within a host protein and the overall domain architecture are crucial factors in determining function. Figure 3 summarizes the domain structures for representative examples of the EVH1 domain-, RanBD-, PTB domain-, and PH domain-containing proteins.
2 Occurrence and Distribution of EVH1 Domains 7
The Ena/VASP proteins all share an overall tripartite domain structure. The Nterminal EVH1 domain binds FPPPP-containing motifs exposed on the surfaces of receptors or focal adhesion proteins such as zyxin [56], vinculin [57], and the Listeria surface protein ActA [30]. A central, low complexity, proline-rich region contains profilin and both SH3 and WW domain binding sites [58]. In the larger Drosophila Ena and mouse Mena sequences, an additional Gln-rich region precedes this (Figure 3). All share a conserved C-terminal EVH2 domain required for tetramerization and for F-actin and G-actin binding [59, 60]. The C-terminal region is responsible for making direct contacts with the actin cytoskeleton. The domain architecture of the Homer/Vesl family is more variable. Homer1 contains only an N-terminal EVH1 domain followed by a low complexity region. In contrast, its close relative Homer 2 consists of an N-terminal EVH1 domain, a low complexity linker region, and a leucine zipper motif at the C terminus responsible for clustering [33]. The EVH1 domain interacts with neurotransmitter receptors, binding selectively to PPxxF-containing motifs found in the C termini of group I metabotropic glutamate receptors (mGluRs), inositol-1,4,5-trisphosphate receptors (IP3Rs), ryanodine receptors (RyRs), and Shank family proteins [31, 61, 62]. The WASP/N-WASP family contain a more complex domain structure, comprising an N-terminal EVH1 domain, a short basic motif, a GTPase binding domain (GBD), a proline-rich region, and a C-terminal region containing either a verprolinhomology (VPH) and cofilin-acidic (CA) domain (WASP) or two tandem VPH domains followed by a CA domain (N-WASP). The latter region is often called, as a whole, the verprolin-cofilin-acidic (VCA) domain (Figure 3). The WASP EVH1 domain targets a 25-residue proline-rich peptide in the WASP interacting protein (WIP), which binds by wrapping itself around the entire WASP EVH1 domain [36]. This is in contrast to the interactions of Ena/VASP and Homer/Vesl EVH1 domains, which bind to much shorter (6–12 amino acids) target peptides. The Cterminal verprolin-cofilin-acidic (VCA) region interacts directly with and activates the actin-related protein (Arp)2/3 actin-nucleating complex to promote actin polymerization [63, 64]. The basic motif, GBD, and the proline-rich region have been shown to be involved in autoinhibitory interactions that block VCA activity. These interactions are relieved by binding of the upstream activators, phosphatidylinositol 4,5-bis-phosphate (PIP2 ), the GTPase Cdc42, or SH3 domains, respectively [36]. The Spred proteins possess a domain structure comprising an N-terminal EVH1 domain, a central c-Kit binding domain (KBD), and a C-terminal cysteine-rich Sprouty-related (SPR) domain [48] (Figure 3). To date, no binding partner has been identified for the EVH1 domain, although the fairly limited tissue distribution of some Spred isoforms (expressed in glandular epithelium, intestinal lymphoid tissues, tonsils, and, intriguingly, the invasive trophoblast of the developing embryo) suggests that their binding partners may be significantly different from those of the other EVH1 domains. Differences in sequence at specific positions known to be important for ligand recognition in the Ena/VASP and Homer/Vesl EVH1 domains also suggest that the Spred EVH1 domain may reveal a novel binding mode.
8 EVH1/WH1 Domains
Fig. 3
2 Occurrence and Distribution of EVH1 Domains 9
The C-terminal SPR domain is essential for localization of Spred to the plasma membrane [48]. Both the EVH1 and SPR domains are required for suppression of neuronal cell differentiation [48]. In all the protein families discussed above, EVH1 domains occur in single copies located exclusively at the N termini of their host proteins. Only one hypothetical protein from C. elegans (YKC2; SwissProt: p41993) was found that contains two putative EVH1 domains, separated by about 450 amino acids. This is in contrast to the RanBDs, PTB domains, and PH domains, which show a much broader range of location and occurrence [2, 65]. Many of the proteins containing RanBD, PTB, and PH domains are much larger than the EVH1-containing proteins and possess correspondingly more varied domain architectures. Selected examples are shown in Figure 3. The smallest member of the RanBP/nucleoporin proteins, nucleoporin 50, contains just one RanBD located at its C terminus. However, very little is understood about the structure and nature of the N terminus of these proteins. The RanBP2 proteins are up to an order of magnitude larger than the EVH1-containing proteins, with the largest composed of more than 3000 residues. The RanBDs in these large proteins are found at a variety of positions in the sequence and often occur several times within the same protein, separated by stretches of 150–450 amino acids. In light of the variable location and copy numbers of other domains, the conserved N-terminal location and singular occurrence of EVH1 domains raises interesting questions as to why this is so and whether it may be one of the defining features of EVH1 domains in general. One explanation may lie in the folding pathways of the EVH1 domain-containing proteins. Protein synthesis occurs from the N to the C terminus, and folding in eukaryotes is believed to occur mainly in a cotranslational manner. It is therefore favorable for autonomously folding, highly stable domains to be synthesized and leave the ribosome ahead of the low-complexity regions and
Fig. 3 Domain organization in proteins containing EVH1 domains, RanBDs, PTB, and PH domains. Domain abbreviations: EVH1 = Ena/VASP-homology 1; EVH2 = Ena/VASP-homology 2; GBD = GTP-binding domain; VPH = verprolin homology domain; CA = cofilin homology and acidic domain; KBD = kinase binding domain; SPR = sprouty-related domain; RanBD = ran binding domain; ZnF-RBZ = zinc finger domains; PPIase = proline isomerase; TPR (Pfam) = tetratricopeptide repeat; GRIP (Pfam) = golgin-97, RanBP2alpha, Imh1p, and P230/golgin-245; PTB = phosphotyrosine binding; PDZ = postsynaptic density/disc-large/ZO-1; SH2 = Src homology 2; SH3 = Src homology 3; PH = pleckstrin homology; RhoGAP = GTPase-
activator protein for Rho-like GTPases; DYNc = dynamin GTPase c; GED = dynamin GTPase effector domain. SwissProt accession codes for the specific examples shown in the figure are: VASP/Evl = P50552 (human); Ena/Mena = Q03173 (mouse); WASP = P42768 (human); N-WASP = O00401 (human); Homer1 = Q9Z216 (mouse); Homer2 = Q9UNT7 (human); Spred = Q7Z698 (human); nucleoporin 50 = O08587 (rat); RanBP2/Nup358 like = P49782 (human); RanBP2L/BS63 like = Q99666 (human); amyloid $ binding protein = P98084 (mouse); tensin = Q9UPS7 (human); Fe65 = O00213 (human); Rho GTPase-activating protein = Q7ZWQ2 (Xenopus); syntrophin = Q8BNW6 (mouse); dynamin-1 = Q05193 (human).
10 EVH1/WH1 Domains
oligomerization regions, which are also present in the EVH1 domain-containing proteins. Such regions are largely unstructured and, in the absence of their binding partners, are more prone to proteolysis and nonspecific aggregation, once released into the cytoplasm. In contrast, the domains that coexist in the host proteins of RanBDs, PTB, and PH domains are generally highly stable modules capable of independent folding. EVHI domains bind specifically to proline-rich motifs (PRMs) in peptides exposed on the surfaces of their binding partners. The N-terminal location of EVH1 domains is likely to facilitate their access to PRMs on bulky target molecules and may contribute to a segmental polarity of the EVH1 host protein that separates the adaptor module (EVH1) from its various effector domains. Certainly, given this strong preference for an N-terminal location, it is clear that EVH1 domains can occur only once in any given protein. The use of domain repeats, either in a cis configuration on the same polypeptide chain or in a trans configuration on identical chains in a quarternary structure, can achieve synergistic functions for the host protein, such as allowing the clustering of multiple binding partners or increasing the binding affinity for a single ligand. Since the Ena/VASP proteins tetramerize via their C-terminal EVH2 domains [59, 60], this brings together four EVH1 domains in each tetramer, thereby providing a mechanism for clustering of Ena/VASP proteins. Often, the target PRMs of EVH1 domains occur in close tandem repeats in the sequence of the binding partner. This could provide an additional mechanism for increasing binding affinity. Similarly, the Homer2 protein contains a leucine zipper motif at its C terminus, which is also involved in clustering [32, 66]. The EVH1 domains of each polypeptide chain in the oligomerized protein must therefore be located far away from the C-terminal oligomerization regions in the final structure, to allow unhindered access to their target peptides. This is clearly easier to achieve when these domains are well separated in the sequence. In summary, a conserved, unique N-terminal location appears to be a characteristic feature of EVH1 domains, setting them clearly apart from structurally related domains such as RanBDs, PTB, and PH domains. The EVH1 domains may confer a segmental polarity to their hosts that is required for functional or biogenetic reasons, resulting in the topological separation of this exposed terminal adaptor domain from the different types of genetically fused effector domains with which they coexist. 2.3 Classification of EVH1 Domains
To date, there is considerable discrepancy in the nomenclature and classification of EVH1 domains in the various databases. The name ‘EVH1 domain’ is frequently used interchangeably with the name ‘WH1 domain’, although the EVH1 nomenclature is generally the most widely used in the fields of biochemistry and molecular cell biology, owing to the early functional connotations resulting from the
2 Occurrence and Distribution of EVH1 Domains 11
identification of the first ligand [67]. Further confusion arises when different databases (for example, SMART, Pfam, InterPro [68]) refer to only one of these different names and classify them in conflicting schemes. Here, we refer to EVH1/WH1 domains as EVH1 domains and classify them based on their sequence conservation, domain co-occurrence, structural similarities, and ligand binding preferences, where this information is available. From the sequence alignment and phylogenetic analysis (Figures 1 and 2), it is clear that the EVH1 domains cluster into four main groups, which we have named after their primary host proteins: the Ena/VASP class, the Homer/Vesl class, the WASP/N-WASP class, and the Spred class. The Ena/VASP and Homer/Vesl classes have already been referred to as Class 1 and Class 2 EVH1 domains, based on their selectivities for peptides containing either FPPPP or PPxxF motifs, respectively [69]. We suggest keeping this nomenclature and adding the more distantly related WASP/N-WASP EVH1 domains (Figure 2), which bind a LPPPEPY-containing peptide in WIP, as Class 3, and the more recently described family of Spred EVH1 domains, which contain distinct putative ligand binding residues but for which no binding partner is yet known, as Class 4. Several EVH1 domains fall between these main classes, such as those from the Drosophila Still-life type 1 protein [70], the C. elegans hypothetical YKC2 protein, and the Dictyostelium RasGEFS. These could either provide putative evolutionary links between the different classes already known or be singular representatives of new, so-far-undiscovered families. Further detailed information on their structures, functions, and ligandbinding characteristics is needed before solid conclusions can be drawn about these proteins. The RanBDs cluster to form a large class of their own, which includes domains from the Ran-binding proteins and nucleoporins, as well as the HBA1 protein from the yeast Schizosaccharomyces pombe. Both the alignment in Figure 1 and the tree in Figure 2 show clearly that the RanBDs are most closely related to the EVH1 domains of the Homer/Vesl family, as well as to the RasGEFS and Drosophila Stilllife type 1 protein. However, despite this relationship, it would be inaccurate to include RanBDs as a subclass of EVH1 domains based on the criteria used here. It is possible that the RanBDs and EVH1 domains may instead stem from a common ancestor, which at some distant time may have descended from the same origin as the other PH and PH-like domains. The specific binding of the Homer EVH1 domain to C-terminal sequences has previously led to the suggestion that this EVH1 domain may be a divergent PDZ domain [31, 66]. Inspection of Homer EVH1 and PDZ domain sequences, however, reveals almost no similarity apart from a common four-amino-acid ‘typical’ motif constituted by turn-promoting residues, an invariant Phe, and a hydrophobic residue (GLGF in Homer). This motif is involved in C-terminal COOH recognition in PDZ domains [71, 72]. However, the different locations of this motif in the PDZ and Homer EVH1 structures rule out any functional relationship and make it unlikely that there is a meaningful evolutionary link between the two types of domain [53].
12 EVH1/WH1 Domains
3 Structures of EVH1 Domains and Their Complexes 3.1 High-resolution Structures of EVH1 and Related Domains
High resolution 3D structures of seven EVH1 domains representing three of the four classes of EVH1 domain, from the Ena/VASP, WASP/N-WASP, and Homer/Vesl families, have now been solved, several in complex with their target proline-rich ligand [33, 36, 73–77]. Structures of the RanBDs from both the large RanBP2 and the smaller, nucleoporin-like RanBP1 proteins are also known: the former in complex with Ran-GTP [54] and the latter in a ternary complex with Ran-GTP bound to RanGAP [78]. The structures of 5 PTB and 16 PH domains from a wide range of different host proteins are also available (for reviews see [79, 80]). Superposition of the Cα traces of representatives from each of the three classes of EVH1 domain for which structures are now known and the related RanBDs, PTB domains, and PH domains with the Class 1 EVH1 domain from murine Ena (Mena) are shown in Figure 4. The overall folds are essentially the same, forming a compact, parallel β sandwich capped along one side by a long α helix. Mutagenesis studies have highlighted several specific core residues as being important for stabilization of the EVH1 fold [81]. The main differences between the different families occur, not surprisingly, in the loop regions, where sequence variability is also highest (Figure 1). The structures of the different classes of EVH1 domains show very little difference (mean backbone rmsd values of VASP, Homer, and WASP EVH1 domains relative to the EVH1 domain of Mena are 1.39, 2.82, and 2.0 Å, respectively). The similarity to the RanBD fold is quite remarkable (rmsd 1.6 Å). Despite low sequence identity (∼10%), the PTB and PH domains show high structural homology to both EVH1 domains and RanBDs. The agreement is closest in the most highly structured regions (average rmsd values relative to Mena EVH1 are 2.7 and 2.4 Å, respectively) but much less in the loop regions, where long insertions and deletions occur. Additional elements of secondary structure are also present in several members of these more distantly related families (for example, the PTB domain of the human SHC protein shown in the figure). A distinguishing feature of EVH1 domains is the highly conserved triad of surface-exposed aromatic sidechains, Y16, W23, and F77 (outlined by boxes in the alignments in Figure 1; numbering relative to Mena). These come together in the 3D structure to form an aromatic cluster (shown in black in Figure 4), which provides a hydrophobic docking site for the proline-rich peptide ligands targeted by EVH1 domains. From the degree of conservation alone, it is clear that W23 is a highly important residue. It is completely conserved in all families of EVH1 domains and in almost all of the RanBDs. From inspection of the 3D structures, one can see that a large area of this sidechain makes important hydrophobic contacts with the core, thus involving it in fold stabilization. The mutation W23L in human VASP results in an insoluble, aggregated protein [75], in agreement with similar inactivating mutations analyzed in a yeast two-hybrid system [82]. Nevertheless,
3 Structures of EVH1 Domains and Their Complexes 13
Fig. 4 Comparison of 3D structures. Superpositions of the backbone (N, C", and C ) atoms of the EVH1 domain of Mena with representative members of each of the related groups shown in the phylogenetic tree. The Mena EVH1 domain (blue) is overlaid with (a) the Class 1 EVH1 domain from VASP (red); (b) the Class 2 Homer EVH1 domain (green); (c) the Class 3 N-WASP EVH1 domain (orange); (d) the first RanBD (RanBD1) of Nup358 (magenta); (e) the
PTB domain of Numb (yellow); and (f) the PH domain of DAPP1/PHISH (cyan). Structures were aligned by using DALI [7]. The locations of the exposed aromatic clusters are shown in black. PDB (and SwissProt) accession codes are Mena. EVH1 = 1evh (Q03173); VASP EVH1 = 1egx (P50552); Homer EVH1 = 1ddv (O88800); N-WASP EVH1 = 1mke (O08816); RanBD = 1rrp (P49792); PTB = 2nmb (P16554); PH = 1fao (Q9UHF2).
the indole proton of W23 (H1) is oriented toward the surface and is available as a H-bond-donating group in ligand interactions. Thus, in EVH1 domains and RanBDs, W23 fulfils a crucial role in domain stabilization while simultaneously exposing one functional group for ligand recognition. The other two residues of the triad are less strictly conserved. Variation in these allow corresponding variations in the geometry and properties of the binding site, enabling the different families of EVH1 domains to bind specifically to distinct consensus ligands. The conservation in fold and of important functional residues relates the above domains very closely. Furthermore, as this fold has been found only in signaling proteins and only in eukaryotes, this suggests that EVH1, Ran-binding, and PTB domains all comprise subfamilies of the PH superfamily, which may have arisen from a common distant ancestor [65]. Over time, these subfamilies have diverged, but have retained the stable PH fold as a scaffold upon which to build distinct
14 EVH1/WH1 Domains
specialized ligand-recognition sites, leading to the rich degree of functional diversity observed today [65]. 3.2 Structures of EVH1 Complexes and Determinants of Ligand Specificity
The EVH1 domains of Ena/VASP and Homer/Vesl families bind peptides that are 6–12 amino acids long and contain proline-rich motifs (PRMs) 4–6 amino acids long. The binding affinities of the core motifs in isolation are extremely low (K d values in the millimolar range), but are increased to biologically significant levels by the presence of additional core-flanking epitopes, which make additional contacts with the domain surface. The Ena/VASP EVH1 domains bind specifically to FPPPP motifs found in focal adhesion-associated proteins like zyxin [56], vinculin [57], and the Listeria ActA [30], whereas the Homer/Vesl EVH1 domains bind specifically to PPxxF motifs from the group I mGluRs [31], IP3R receptors [61], ryanodine receptors, and Shank proteins [62, 83, 84]. In contrast, the N-WASP EVH1 domain binds a much longer proline-rich peptide, having a minimum length of 25 residues (residues 461–485) and does not bind a 10-residue ligand of the Mena EVH1 domain from ActA, which contains a PRM (DFPPPPT) very similar to that found in the WIP peptide (DLPPPEP) [67]. It should be noted that the peptide-N-WASP complex was expressed from a single construct in which the 25-residue WIP peptide was fused by a 5-amino-acid linker to the N terminus of N-WASP [36]. This clearly favors binding for entropic reasons. It is not known whether binding occurs to an equivalent independent peptide, because isolated N-WASP EVH1 domains were found by these authors to be insoluble. The structures of representative complexes from each of the three families of EVH1 domains for which structures are now available are shown in Figure 5 [36, 73, 76]. No structure is yet available for the Spred EVH1 domains. The overall construction of the ligand-recognition sites is generally similar in all classes of EVH1 domains. The exposed Trp sidechain (W23; Mena numbering) is usually located at the centre of the aromatic triad (Figures 4 to 6) and is oriented in a plane almost perpendicular to the domain surface. On one or both sides, at approximately 90◦ to this plane and almost parallel to the domain surface, lie the flat rings of either Tyr or Phe sidechains. This perpendicular arrangement of aromatic rings results in rectangular hydrophobic pockets on each side of the Trp, well suited to the recognition of peptide ligands that adopt structures close to that of the left-handed PPII helix structure, characterized by backbone angles = −78◦ and = +146◦ [85–87]. The FPPPP-containing peptides bound by the Ena/VASP EVH1 domains and the LPPPEP region of the WIP peptide bound by the N-WASP EVH1 domain are good examples of this. The indole proton of the central Trp forms a hydrogen bond to a backbone carbonyl oxygen in the peptide, which anchors the ligand into place. The sidechains of the peptide residues surrounding this carbonyl (usually prolines) then pack closely into the rectangular hydrophobic pockets on either side of the W23 sidechain (Figure 6). In the Ena/VASP EVH1 domains, Y16 and F77 (Mena numbering) comprise the Trp-flanking aromatic residues, and the peptide
3 Structures of EVH1 Domains and Their Complexes 15
Fig. 5 Structures of complexes of EVH1 and related domains with their respective ligands. (a) Mena EVH1 domain with FPPPP peptide. (b) Homer EVH1 domain with the TPPSPF peptide. (c) N-WASP EVH1 domain with the minimal 25-residue peptide from WIP fused to its N terminus. Only the N-terminal 11 amino acids of this peptide (DLPPPEPYNQT) that bind in the ‘PRM-binding groove’ are shown. (d) The first RanBD (RanBD1) from Nup358 with the Ran protein. Only the C-terminal 10 amino acids (EVAQTTALPD) of Ran relevant to this discussion are shown. (e) The
PTB domain of the SHC protein with the 12-residue peptide HIIENPQpoYFSDA phosphorylated at tyrosine. (f) The PH domain from DAPP1/PHISH with inositol-1,3,4,5tetrakisphosphate. All classes of EVH1 domains and the RanBD share similar exposed clusters of aromatic sidechains in their peptide binding grooves, as described in the text. Numbers with asterisks refer to the equivalent positions in the sequence of Mena. PDB accession codes are as in Figure 4, except for the PTB domain complex (PDB: 1shc; SwissProt: P29353).
P(2) and P(5) of the Class 1 EVH1-binding motif FPPPP binds to either side of W23 (underlined residues are those whose sidechains make the closest contacts with the domain; yellow in Figure 6). The indole proton of W23 makes a hydrogen bond to the carbonyl of P(3). An additional hydrogen bond between the domain’s Q79 sidechain and the ligand P(2) supports the interaction. In each class of EVH1 domain, an H-bond–donating residue is almost always located at this position (Figure 1). The N-terminal F(1) of this sequence makes a further close hydrophobic contact with the domain, which is important for anchoring the peptide and for determining the orientation of the otherwise highly symmetric ligand (Figure 6). Variations in the geometries of the PRM binding sites give rise to the observed differences in ligand preference between the different domain families. The complex of the Homer EVH1 domain bound to the Class 2 EVH1-binding mo-
16 EVH1/WH1 Domains
Fig. 6 Comparison of peptide binding interfaces for the different classes of EVH1 domains (a-c) and RanBDs (d). Hydrogen bonds in the interface are shown by dotted lines where distances are less than 2.6 Å. The conserved W23 (Mena numbering) is colored pink. In all EVH1 domains the conserved W23 forms an important hydrogen bond via its indole proton to a carbonyl oxygen in the peptide backbone. The second conserved H-bond-donating residue Q79 (Mena) is shown in green. The location
of this residue is conserved in Ena/VASP and Homer/Vesl EVH1 domains, but differs in the EVH1 domains of N-WASP. (Asterisks indicate numbering when aligned to Mena). Ligand sidechains that make close hydrophobic contacts with the domain are colored yellow. The peptide sequence is shown above each complex. Hbond-acceptor residues are underlined and colored as follows: red = hydrogen bond to Trp; green = hydrogen bond to Gln (EVH1) or Arg (RanBD).
tif TPPxxF shows a very different binding mode from that of Ena/VASP EVH1 domains bound to their FPPPP motifs (Figure 6). In the Homer/Vesl proteins, I16 replaces Y16 (Mena), thereby altering the shape of the binding pocket that accommodates the sidechain of P(2) in the Ena/VASP EVH1 complexes [69]. Simultaneously, the hydrophobic residue M14 of Ena/VASP domain is replaced by the aromatic F14 in Homer/Vesl (Figures 1 and 5) providing a new binding pocket for the sidechain of P(3) in the Homer peptide. Hence, the loss of an aromatic sidechain at position 16 and the gain of a new aromatic sidechain at position 14 change the location and geometry of the PRM-recognition triad. The two consecutive prolines and terminal Phe of the TPPxxF motif make the closest hy-
3 Structures of EVH1 Domains and Their Complexes 17
drophobic contacts to the Homer EVH1 domain surface, with the exposed indole proton of Homer W24 forming a hydrogen bond to the carbonyl oxygen of the N-terminal peptide Thr (Figure 6). The complex of the N-WASP EVH1 domain with the N-terminal 11 amino acids of the 25-residue WIP minimal binding sequence 1 DLPPPEPYNQT11 is shown in Figures 5 and 6. The remainder of the peptide wraps around the back of the EVH1 domain [36] and is omitted from the figures for clarity. The PRM-binding triad of N-WASP is most closely related to that of Homer (also apparent from the sequence alignment in Figure 1). As in Homer, the conserved Y16 residue of the Ena/VASP EVH1 domains is lost (replaced by A48; equivalent to I16 in Homer) and thus no longer forms part of the peptide binding site. Instead, residues Y46 (14*), W54 (23*), and F104 (77*) in the N-WASP EVH1 domain are equivalent to F14, W24, and F74 in Homer (numbers with asterisks refer to the equivalent positions in Mena), forming almost identical aromatic clusters in the two proteins. However, the WIP peptide does not bind N-WASP in the same manner as observed for the Homer complex. Instead of making close contacts with Y46 (14*), the WIP peptide contacts a groove comprising W54 (23*), F104 (77*), T106 (79*), and Q113 (86*) on the N-WASP surface. The peptide P(3) and P(5) flank the exposed W54 sidechain on either side of the hydrogen bond between this residue and the carbonyl of P(3). An additional hydrogen bond from Q113 (86*) to the peptide E(6) helps to anchor the ligand into position (Figure 6). The most striking difference between the N-WASPWIP complex and the Ena/VASP and Homer/Vesl EVH1-peptide complexes is that the WIP peptide binds in the reverse orientation to that observed in the other complexes. One reason for this may be the Y(8) sidechain of the WIP peptide, which makes several close contacts with the domain surface, just as the F(1) of the FPPPP motif contacts the Class 1 EVH1 domains. Y(8) is also oriented so that a hydrogen bond may be possible from its hydroxyl proton to the backbone carbonyl of N-WASP G109 (82*). It is therefore likely that Y(8) plays a role in orienting the WIP ligand, and it will be interesting to find out whether other WASP ligands exist in which this pattern is conserved. Binding studies to date have shown that the Spred EVH1 domain does not bind the FPPPP motif recognized by the Ena/VASP EVH1 domains (Zimmermann et al., personal communication). The replacement of the conserved Class 1 EVH1 domain Y16 with an Arg in Spred is almost certainly an important factor in determining the ligand preference of this family (Figures 1, 4 and 5). Work in progress will reveal more details on the binding mode of this class of EVH1 domains in the future. All the substitutions seen in the exposed PRM-recognition sites are very conservative and have no noticeable effect on the overall stability of the fold. Nevertheless, they are critically placed and provide sufficient external variation to tailor the different families of EVH1 domains to recognize highly specific consensus target sequences. This allows EVH1 domains to be used as molecular adaptors by diverse families of host proteins to mediate their localization to very different signaling proteins. Interestingly, the PRM-binding interfaces of the EVH1 domains described above and in Figure 6 have many features in common with those of other protein
18 EVH1/WH1 Domains
interaction domains that bind specifically to proline-rich sequences. Examples include the well known families of the SH3, WW, GYF, and UEV domains, as well as the small, actin-binding, profilin protein. This mechanism for proline-rich peptide recognition is therefore not specific to EVH1 domains, but is rather widely used in many different signaling pathways (Ball et al., 2004).
3.3 Comparisons with RanBDs, PTB Domains, and PH Domains
There are four sites of contact between Ran and the RanBD. One of these involves a groove on the RanBD surface, which binds the C-terminal fragment of Ran. The interaction surface is analogous to the peptide interaction surface of the EVH1 domains. Figure 5 shows the first RanBD (RanBD1) of the nuclear pore complex protein Nup358 in complex with the Ran protein [54]. For clarity, only the 11 Cterminal amino acids 1 LEVAQTTALPD11 of Ran, which bind the groove relevant to this discussion, are shown in the figure. The RanBDs are the closest relatives to the EVH1 domains (as shown by the phylogenetic tree in Figure 2) and employ very similar mechanisms of ligand recognition. Both families of domains utilize clusters of exposed aromatic sidechains to recognize their peptide ligands. The completely conserved W23 of the EVH1 domain (Mena numbering) is also conserved in the majority of RanBDs, where it is also necessary for fold stabilization. As in the EVH1 domains, this important residue is flanked in RanBDs by two additional aromatics to form hydrophobic binding pockets that accommodate specific peptide sidechains (yellow in Figure 6). The interaction is supported by one or more additional H-bond-donating residues exposed at the ligand binding interface. The W1211 (W23*) sidechain of the Nup358 RanBD1 is positioned so that it can form a hydrogen bond with the backbone carbonyl of the peptide T(7) in the same way as observed for the EVH1 domains, but this distance in the NMR structure of the N-WASP–WIP complex is longer than usually expected for hydrogen bonds and the geometry of the interaction is not ideal. The interaction of the ligand with the domain is stabilized by an additional hydrogen bond between the domain R1284 (R90*) sidechain and the ligand P(10). The Ran protein binds RanBD in the same orientation as the Ena/VASP and Homer EVH1 domains. However, since the peptide fragment shown here is only a small fragment of a much larger ligand, there are likely to be other factors that also contribute to its binding orientation and affinity, which shall not be discussed here. For comparison, the PTB domain of the human SHC protein with a phosphotyrosine peptide from a TRKA receptor [88] and the PH domain of DAPP1/PHISH with inositol 1,3,4,5-tetrakisphosphate [89] are also shown in Figure 6. Although their folds are visibly very similar, it is clear that the binding sites of these domains have little if anything in common with those of the EVH1 domains or RanBDs. Thus, the aromatic peptide recognition clusters clearly evolved long after the above domains had diverged from the PH family.
4 Biological Function and Signaling Pathways Involving EVH1 Domains 19
4 Biological Function and Signaling Pathways Involving EVH1 Domains
The ability of proteins to target their binding partners in a highly specific manner is the basis for the assembly of multiprotein complexes required for signaling in all living cells. EVH1 domains mediate protein–protein interactions in a diverse range of signaling cascades, depending on their host protein and site of action. 4.1 Ena/VASP Interactions
The Ena/VASP proteins are involved in the generation and maintenance of cell polarization processes by localized actin polymerization and reorganization in a variety of specialized cytoskeletal substructures [14, 16]. During epithelial sheet formation, they bind FPPPP motifs within zyxin and vinculin via their EVH1 domains. This localizes the Ena/VASP proteins to premature epithelial contact sites called adhesion zippers, which are involved in sealing epithelial layers [29, 90]. Ena/VASP proteins are also involved in barrier formation in the endothelium [91, 92] and in regulating integrin-mediated platelet adhesion [22, 23]. Interactions between activated T-cells and antigen-presenting cells lead to the formation of a contact structure called the immunological synapse. Ena/VASP proteins are recruited by FPPPP motifs within the EVH1-binding protein Fyb/SLAP to a signaling complex which, together with WASP, supports localized actin-filament polymerization at these synapses [28]. Additionally, phagocytosis by macrophages involves assembly of phagocytic cups, to which Ena/VASP proteins are again recruited by the Fyb/Slap protein. This leads to the Fc cell-surface-receptor-mediated remodeling of the actin cytoskeleton that accompanies particle internalization [26]. The Ena/VASP proteins also have important roles in migration of neurons, neutrophils, and fibroblasts [14, 16]. During the establishment of synaptic contacts in developing nervous systems, growth cones of axons are navigated by receptormediated recognition of guidance proteins [93]. FPxxP binding sites for Class 1 EVH1 domains are found in the intracellular portion of the Drosophila and C. elegans axon guidance receptor Robo/Sax-3 [20, 94]. In the human and murine transmembrane semaphorin Sema6A-1 guidance protein [95], the corresponding EVH1 binding motifs are VPPKP. Furthermore, the C. elegans Ena/VASP ortholog Unc-34 is required for appropriate response of the axon guidance receptors Unc-5 and Unc-40/DCC to the guidance protein netrin [94, 96]. The locomotion of fibroblasts is a complex process requiring coordination of forward protrusion, attachment, contraction, and rear detachment, which translocates the cell body. Ena/VASP proteins were found to be negative regulators of cell migration [25]. At the subcellular level, they enhance F-actin polymerization at the leading edge of highly dynamic, transiently formed lamellipodia, to which they are localized by EVH1-mediated interactions [97]. Molecular understanding of these processes has greatly benefited from studies of the intracellular pathogenic bacterium Listeria monocytogenes [98, 99]. The EVH1 domains of the Ena/VASP
20 EVH1/WH1 Domains
proteins are recruited to the bacterial cell surface by tandem FPPPP motifs in the Listeria surface protein ActA [67]. There, they are involved in actin polymerization, resulting in the assembly of dynamic ‘comet’ tails, which generate the motile force necessary for bacterial propulsion through the cytoplasm [98, 99]. In summary, Ena/VASP proteins form a key link between signaling pathways and cytoskeletal dynamics by acting as regulators of actin filament assembly. These proteins are components of the actin polymerization machinery and are believed to function mechanistically by delivery of polymerization-competent actin monomers, exclusion of polymerization-terminating proteins, detachment of F-actin filaments, and/or inhibition of ‘Y’ branch formation in actin filament arrays, depending on the cellular context in which they are found [13, 100]. 4.2 Homer/Vesl Interactions
Homer/Vesl proteins are adaptor proteins involved in clustering, anchoring, and modulation of neurotransmitter receptor proteins in excitatory glutamatergic synapses of the central nervous system [84], which are among the most elaborate junctions existing between cells. The apical postsynaptic plasma membrane region of dendritic spines differentiates into an electrondense subcellular compartment called the postsynaptic density (PSD). The PSD contains, together with various multidomain PSD scaffold proteins, the NMDA (N-methyl-D-aspartate) and metabotropic glutamate receptors (mGluRs) [34]. The Class 2 EVH1 domains of Homer/Vesl proteins bind a PPxxF motif in the multidomain scaffold protein Shank/ProSAP [83], which is engaged in further interaction with NMDA receptors [62]. They also bind the Class 2 EVH1-binding motif (TPPSPF) in the C termini of group 1 mGluRs [31]. Thus, the clustering of Homer/Vesl proteins [59, 60] provides a mechanism for indirectly linking two glutamate receptor types. This may be relevant to synaptogenesis or to specific types of receptor crosstalk. The Homer/Vesl EVH1 domains are also known to bind the PPKKF motif within inositol trisphosphate receptors (IPRs) and the C termini of group 1 mGluRs [61]. Here, they are involved in the glutamate-induced release of Ca2+ from intracellular stores. The monomeric Homer1 isoform, which lacks a self-association domain [32, 66], is upregulated by neural activity [35]. This protein is expected to competitively disrupt the signaling complexes assembled by the oligomeric Homer/Vesl isoforms [61]. The Homer/Vesl proteins thus regulate the composition and stability of multiprotein complexes comprising different synaptic receptor proteins. This implies a role in the modulation of synaptic plasticity, which is clearly important with respect to learning and memory formation. 4.3 WASP/N-WASP Interactions
The founding member of the WASP/N-WASP proteins, the Wiskott-Aldrich syndrome (WAS) protein, was originally discovered during a search for the genetic
4 Biological Function and Signaling Pathways Involving EVH1 Domains 21
defect responsible for WAS, a rare, X-linked, recessive immunodeficiency disease with altered functions of many types of hematopoetic cells [101]. Among other roles, WASP is expected to be involved in biogenesis of the immunological synapse and in the early stages of T-cell cytoskeletal polarization (see above) [101]. The protein–protein interactions involved in these processes are not yet well understood. Understanding of the molecular function of WASP has been gained from analysis of cellular model systems [101], which identified WASP/N-WASP proteins as highly regulated effectors of Rho GTPases. Following release from an autoinhibitory state by Rho GTPases, WASP activates the actin-nucleating Arp2/3 complex via its C-terminal VCA effector domain. The EVH1 domain of N-WASP binds a consensus LPPPEPY motif in the C-terminal region of WIP and its homologs, CR16 and verprolin [36], to regulate N-WASP-mediated actin polymerization and filopodium formation. Various viral and bacterial pathogens have evolved strategies to hijack the actin-nucleating activity of Arp2/3, using pathogen-encoded virulence factors that are direct or indirect upstream activators of N-WASP [98]. Whereas the IcsA protein of Shigella flexneri directly mimics the host cell Rho GTPase Cdc42, to activate N-WASP [102], the viral membrane protein A36R of vaccinia virus [103] and the bacterial protein Tir of extracellular enteropathic Escherichia coli (EPEC) [104, 105] both recruit the host-encoded adaptor protein Nck, together with WIP and N-WASP, to initiate localized actin polymerization, which then supports either intracellular motility (Shigella, vaccinia) or contact formation to the host cell (EPEC). Pathogen-triggered binding of the EVH1 domain of N-WASP to PRMs of host proteins therefore functionally resembles the interaction of Ena/VASP EVH1 domains with the PRMs of the Listeria monocytogenes virulence factor ActA, discussed above. However, unlike the other virulence factors, ActA is able to activate the Arp2/3 complex directly.
4.4 Spred Interactions
Spred proteins were originally isolated in a screen for new tyrosine-kinase-binding proteins. The proteins are involved in regulation of differentiation in neuronal cells and myocytes [48]. The proteins were recently described as negative regulators of the signaling pathways of Ras, Raf, and mitogen-activated protein (MAP) kinase [47, 48]. Mechanistically, Spred inhibits early steps in the activation of MAP kinases by association with Ras, thereby suppressing phosphorylation and activation of Raf. This results in down-regulation of the Ras–MAP kinase signaling pathway [48]. The specific binding partner of the Spred EVH1 domain is currently unknown, although Spreds’ inhibitory activity is lost when their EVH1 domain is substituted for a Class 3 EVH1 domain of WASP, indicating a highly specific interaction. To date, only one other Spred family member, the Drosophila AE33 [49], has been identified. It is known to be involved in photoreceptor cell specification during Drosophila eye development. However, its role in these developmental pathways has not yet been elucidated.
22 EVH1/WH1 Domains
5 Emerging Research Directions and Recent Developments
One of the most important reasons for studying sequence similarities and obtaining detailed high-resolution structures of proteins and their complexes is that comparisons may then be made with proteins for which little functional data are available. This information also facilitates the annotation and classification of hypothetical proteins revealed by genome sequencing projects and provides a rational basis for the prediction of protein structures and the identification of candidate ligands for novel proteins. 5.1 Use of Sequence and Structural Data in Prediction of Binding Partners
Comparisons of sequence and fold reveal important relationships between different domains and provide insights into their evolutionary origins. Since the overall fold of a domain is determined by a small number of key residues in the hydrophobic core, structural alignments performed with algorithms such as DALI [106] and FSSP [107, 108] can provide important clues about function and ancestry even where sequence identity is very low. However, a fold similarity on its own is insufficient for the prediction of binding partners, as demonstrated in detail for the EVH1 domains discussed in this chapter. For this, close inspection of the conserved residues exposed on the surface of the domain can be very revealing. Often, groups of surface-exposed residues, although separated by many amino acids in the primary sequence, come together in the 3D structure to form a ligand binding site. These residues are usually conserved within a given domain family, and sometimes across many families that share common mechanisms of ligand recognition. Good examples are the PRM-binding domains, which include the SH3, WW, EVH1, GYF, and UEV domains and the single-domain actin-binding protein profilin. All six families of PRM-binding domains contain exposed groups of aromatic residues termed ‘aromatic clusters’ or ‘aromatic cradles’, which are responsible for binding proline-rich motifs [53, 109]. The presence of this cluster is now considered a signature for PRM recognition. Thus, if the structure of a new domain is solved, detailed comparisons with similar structures for which the function is better understood can narrow the search for binding partners significantly. A thorough understanding of the common features that define different types of domain–ligand interfaces is ultimately needed to enable us to derive rules for the prediction of binding partners from sequence information alone. 5.2 Use of Structural Data from Complexes to Guide the Rational Design of New Ligands
Knowledge of the molecular determinants of binding affinity and specificity is necessary for rational structure-based design of inhibitors to hinder interactions between specific proteins and to modulate cellular processes. To understand the
6 Conclusions 23
subtle factors that modify specificity and affinity, it is necessary to characterize, structurally and biochemically, the most important features of the interactions we wish to inhibit. Only then can this knowledge be used to guide the design of novel target peptides or nonpeptide molecules having new activities. SPOT analyses [110] have helped enormously in understanding which residues of a known binding sequence are critical for binding and which may be replaced with little or no consequence. This allows the derivation of consensus ligand sequences, as demonstrated for the FPPPP motifs recognized by Class 1 EVH1 domains. Here, the consensus peptide sequence was found to be FPxP (where is a hydrophobic residue and x is any residue), meaning that the central two residues can be replaced in searches for higher-affinity partners [69, 75]. Knowledge of variable and conserved residues is a crucial first step toward the prediction, modification, and design of peptide binding partners, as successfully demonstrated experimentally for EVH1 domains [111]. The design of inhibitors for EVH1 domains is currently under way. Such inhibitors should be useful in several different ways: (1) to study the effects of modulating EVH1-mediated signaling cascades; (2) as molecular tags to monitor the formation and dissociation of EVH1-mediated interactions within the cell; and (3) as lead molecules for the development of future generations of novel therapeutics. The dose-dependent modulation of EVH1 domain binding activity would mean that it should be possible to develop treatments of diseases for which partial inhibition of EVH1-mediated events would be desirable (for example, pathologically altered adhesion and motility in inflammatory diseases and metastatic states, or even the spreading of intracellular pathogens).
6 Conclusions
In this chapter we have taken a close look at the sequence signatures, domain occurrence/co-occurrence, and structural features that define the EVH1 domain. Subtle differences in the patterns and nature of exposed residues have been shown to lead to different ligand specificities. Combining sequence similarity, phylogeny, structural information, and ligand preference, we have classified the known EVH1 domains into four main categories. The stability of any domain fold is clearly an important factor in its evolution, and EVH1 domains are no exception. This is demonstrated by the high conservation of residues that comprise the hydrophobic core. The specific functionality then built upon the stable scaffold depends on subtle variations in the solvent-exposed residues, particularly those located within and around ligand binding sites. These can alter specificities dramatically, depending on their charge, hydrophobicity, size, and H-bond forming abilities. More distant relatives show greater differences in their patterns of surface residues, leading to widely different functions although they utilize a common global fold. Even seemingly minor variations at critical sequence locations can alter ligand binding specificities considerably. This provides an economical evolutionary mechanism for
24 EVH1/WH1 Domains
gradual functional diversification starting from a stable fold, as has been observed repeatedly during studies of the protein universe [112, 113]. Detailed studies of these variations provide clues about the domain’s ancestry and phylogeny. Furthermore, knowing which residues can be varied without destabilizing the domain fold, and which contextual combinations of domain folds are tolerated in different host protein repertoires, is a crucial first step in the understanding of protein evolution and protein–protein interactions in general. Such knowledge is essential for the rational design of inhibitors, interaction interfaces, and novel proteins in the future.
References 1 MURZIN, A. G., BRENNER, S. E., HUBBARD, T., CHOTHIA, C., Scop: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 1995, 247, 536–540. 2 SCHULTZ, J., COPLEY, R. R., DOERKS, T., PONTING, C. P., BORK, P., Smart: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 2000, 28, 231–234. 3 LETUNIC, I., GOODSTADT, L., DICKENS, N. J., DOERKS, T., SCHULTZ, J., MOTT, R., CICCARELLI, F., COPLEY, R. R., PONTING, C. P., BORK, P., Recent improvements to the Smart domain-based sequence annotation resource. Nucleic Acids Res. 2002, 30, 242–244. 4 BATEMAN, A., BIRNEY, E., CERRUTI, L., DURBIN, R., ETWILLER, L., EDDY, S. R., GRIFFITHS-JONES, S., HOWE, K. L., MARSHALL, M., SONNHAMMER, E. L., The Pfam protein families database. Nucleic Acids Res. 2002, 30, 276–280. 5 O’DONOVAN, C., MARTIN, M. J., GATTIKER, A., GASTEIGER, E., BAIROCH, A., APWEILER, R., High-quality protein knowledge resource: Swiss-Prot and Trembl. Brief Bioinform. 2002, 3, 275–284. 6 WU, C. H., et al., The protein information resource: an integrated public resource of functional annotation of proteins. Nucleic Acids Res. 2002, 30, 35–37. 7 HOLM, L., SANDER, C., Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 1993, 233, 123–138.
8 PAGE, R. D., Treeview: an application to display phylogenetic trees on personal computers. Comput. Appl. Biosci. 1996, 12, 357–358. 9 FELSENSTEIN, J., An alternating least squares approach to inferring phylogenies from pairwise distances. Syst. Biol. 1997, 46, 101–111. 10 FELSENSTEIN, J., PHYLIP: Phylogeny inference package (version 3.2). Cladistics 1989, 5, 164–166. 11 HAFFNER, C., JARCHAU, T., REINHARD, M., HOPPE, J., LOHMANN, S. M., WALTER, U., Molecular cloning, structural analysis and functional expression of the proline-rich focal adhesion and microfilament-associated protein VASP. EMBO J. 1995, 14, 19–27. 12 GERTLER, F. B., NIEBUHR, K., REINHARD, M., WEHLAND, J., SONARIO, P., Mena, a relative of VASP and Drosophila enabled, is implicated in the control of microfilament dynamics. Cell 1996, 87, 227–239. 13 KRAUSE, M., DENT, E. W., BEAR, J. E., LOUREIRO, J. J., GERTLER, F. B., Ena/VASP proteins: regulators of the actin cytoskeleton and cell migration. Annu. Rev. Cell Dev. Biol. 2003, 19, 541–564. 14 KWIATKOWSKI, A. V., GERTLER, F. B., LOUREIRO, J. J., Function and regulation of Ena/VASP proteins. Trends Cell Biol. 2003, 13, 386–392. 15 HAN, Y. H., CHUNG, C. Y., WESSELS, D., STEPHENS, S., TITUS, M. A., SOLL, D. R., FIRTEL, R. A., Requirement of a vasodilator-stimulated phosphoprotein family member for cell adhesion, the
References
16
17
18
19
20
21
22
23
24
formation of filopodia, and chemotaxis in dictyostelium. J. Biol. Chem. 2002, 277, 49877–49887. REINHARD, M., JARCHAU, T., WALTER, U., Actin-based motility: stop and go with Ena/VASP proteins. Trends Biochem. Sci. 2001, 26, 243–249. KRAUSE, M., BEAR, J. E., LOUREIRO, J. J., GERTLER, F. B., The Ena/VASP enigma. J. Cell Sci. 2002, 115, 4721–4726. GERTLER, F. B., DOCTOR, J. S., HOFFMANN, F. M., Genetic suppression of mutations in the Drosophila Abl proto-oncogene homolog. Science 1990, 248, 857–860. GERTLER, F. B., COMER, A. R., JUANG, J.-L., AHERN, S. M., CLARK, M. J., LIEBL, E. C., HOFFMANN, F. M., Enabled, a dosage-sensitive suppressor of mutations in the Drosophila abl tyrosine kinase, encodes an abl substrate with SH3 domain-binding properties. Genes. Dev. 1995, 9, 521–533. BASHAW, G. J., KIDD, T., MURRAY, D., PAWSON, T., GOODMAN, C. S., Repulsive axon guidance: Abelson and enabled play opposing roles downstream of the roundabout receptor. Cell 2000, 101, 703–715. DICKSON, B. J., Rho GTPases in growth cone guidance. Curr. Opin. Neurobiol. 2001, 11, 103–110. ASZODI, A., PFEIFER, A., AHMAD, M., GLAUNER, M., ZHOU, X. H., NY, L., ANDERSSON, K. E., KEHREL, B., OFFERMANNS, S., FASSLER, R., The vasodilator-stimulated phosphoprotein (VASP) is involved in cGMp- and cAMP-mediated inhibition of agonist-induced platelet aggregation, but is dispensable for smooth muscle function. EMBO J. 1999, 18, 37–48. HAUSER, W., KNOBELOCH, K.-P., EIGENTHALER, M., GAMBARYAN, S., KRENNS, V., GEIGER, J., GLAZOVA, M., RHODE, E., WALTER, U., ZIMMER, M., Megakaryocyte hyperplasia and enhanced agonist induced platelet activation in VASP knockout mice. Proc. Nat. Acad. Sci. USA 1999, 96, 8120–8125. ANDERSON, S. I., BEHRENDT, B., MACHESKY, L. M., INSALL, R. H., NASH,
25
26
27
28
29
30
31
32
G. B., Linked regulation of motility and integrin function in activated migrating neutrophils revealed by interference in remodelling of the cytoskeleton. Cell Motil. Cytoskeleton 2003, 54, 135–146. BEAR, J. E., LOUREIRO, J. J., LIBOVA, I., F¨ASSLER, R., WEHLAND, J., GERTLER, F. B., Negative regulation of fibroblast motility by Ena/VASP proteins. Cell 2000, 101, 717–728. COPPOLINO, M. G., KRAUSE, M., HAGENDORFF, P., MONNER, D. A., TRIMBLE, W., GRINSTEIN, S., WEHLAND, J., SECHI, A. S., Evidence for a molecular complex consisting of Fyb/Slap, Slp-76, Nck, VASP and WASP that links the actin cytoskeleton to fcgamma receptor signalling during phagocytosis. J. Cell. Sci. 2001, 114, 4307–4318. GOH, K. L., CAI, L., CEPKO, C. L., GERTLER, F. B., Ena/VASP proteins regulate cortical neuronal positioning. Curr. Biol. 2002, 12, 565–569. KRAUSE, M., SECHI, A. S., KONRADT, M., MONNER, D., GERTLER, F. B., WEHLAND, J., Fyn-binding protein (Fyb)/Slp76-associated protein (Slap), VASP proteins and the Arp2/3 complex link T cell receptor (TCR) signaling to the actin cytoskeleton. J. Cell Biol. 2000, 149, 181–194. VASIOUKHIN, V., BAUER, C., YIN, M., FUCHS, E., Directed actin polymerization is the driving force for epithelial cell–cell adhesion. Cell 2000, 100, 209–219. CHAKRABORTY, T., et al., A focal adhesion factor directly linking intracellularly motile Listeria monocytogenes and Listeria ivanovii to the actin-based cytoskeleton of mammalian cells. EMBO J. 1995, 14, 1314–1321. BRAKEMAN, P. R., LANAHAN, A. A., O’BRIEN, R., ROCHE, K., BARNES, C. A., HUGANIR, R. L., WORLEY, P. F., Homer: a protein that selectively binds metabotropic glutamate receptors. Nature 1997, 386, 284–288. XIAO, B., TU, J. C., PETRALIA, R. S., YUAN, J. P., DOAN, A., BREDER, C. D., RUGGIERO, A., LANAHAN, A. A., WENTHOLD, R. J., WORLEY, P. F., Homer regulates the association of group 1 metabotropic
25
26 EVH1/WH1 Domains
33
34
35
36
37
38
39
40
41
glutamate receptors with multivalent complexes of Homer-related, synaptic proteins. Neuron 1998, 21, 707–716. IRIE, K., NAKATSU, T., MITSUOKA, K., MIYAZAWA, A., SOBUE, K., HIROAKI, Y., DOI, T., FUJIYOSHI, Y., KATO, H., Crystal structure of the Homer1 family conserved region reveals the interaction between the EVH1 domain and own proline-rich motif. J. Mol. Biol. 2002, 318, 1117–1126. GARNER, C. C., NASH, J., HUGANIR, R. L., PDZ domains in synapse assembly and signalling. Trends Cell Biol. 2000, 10, 274–280. KATO, A., OZAWA, F., SAITOH, Y., HIRAI, K., INOKUCHI, K., Vesl, a gene encoding VASP/Ena family related protein, is upregulated during seizure, long-term potentiation and synaptogenesis. FEBS Lett. 1997, 412, 183–189. VOLKMAN, B. F., PREHODA, K. E., SCOTT, J. A., PETERSON, F. C., LIM, W. A., Structure of the N-WASP EVH1 domain– WIP complex: insight into the molecular basis of Wiskott–Aldrich syndrome. Cell 2002, 111, 565–576. CARLIER, M. F., DUCRUIX, A., PANTALONI, D., Signalling to actin: the cdc42-N-WASP-Arp2/3 connection. Chem. Biol. 1999, 6, R235–240. POLLARD, T. D., BLANCHOIN, L., MULLINS, R. D., Molecular mechanisms controlling actin filament dynamics in nonmuscle cells. Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 545–576. HIGGS, H. N., POLLARD, T. D., Activation by cdc42 and pip(2) of Wiskott–Aldrich syndrome protein (WASP) stimulates actin nucleation by Arp2/3 complex. J. Cell Biol. 2000, 150, 1311–1320. DERRY, J. M., OCHS, H. D., FRANCKE, U., Isolation of a novel gene mutated in Wiskott–Aldrich syndrome. Cell 1994, 78, 635–644. KOLLURI, R., SHEHABELDIN, A., PEACOCKE, M., LAMHONWAH, A. M., TEICHERT-KULISZEWSKA, K., WEISSMAN, S. M., SIMINOVITCH, K. A., Identification of WASP mutations in patients with Wiskott–Aldrich syndrome and isolated
42
43
44
45
46
47
48
49
thrombocytopenia reveals allelic heterogeneity at the was locus. Hum. Mol. Genet. 1995, 4, 1119–1126. VILLA, A., et al., X-linked thrombocytopenia and Wiskott–Aldrich syndrome are allelic diseases with mutations in the wasp gene. Nat. Genet. 1995, 9, 414–417. GREER, W. L., SHEHABELDIN, A., SCHULMAN, J., JUNKER, A., SIMINOVITCH, K. A., Identification of wasp mutations, mutation hotspots and genotype–phenotype disparities in 24 patients with the Wiskott–Aldrich syndrome. Hum. Genet. 1996, 98, 685–690. ZHU, Q., WATANABE, C., LIU, T., HOLLENBAUGH, D., BLAESE, R. M., KANNER, S. B., ARUFFO, A., OCHS, H. D., Wiskott–Aldrich syndrome/X-linked thrombocytopenia: Wasp gene mutations, protein expression, and phenotype. Blood 1997, 90, 2680–2689. SUZUKI, T., MIKI, H., TAKENAWA, T., SASAKAWA, C., Neural Wiskott–Aldrich syndrome protein is implicated in the actin-based motility of Shigella flexneri. EMBO J. 1998, 17, 2767–2776. MOREAU, V., FRISCHKNECHT, F., RECKMANN, I., VINCENTELLI, R., RABUT, G., STEWART, D., WAY, M., A complex of N-WASP and WIP integrates signalling cascades that lead to actin polymerization. Nat. Cell Biol. 2000, 2, 441–448. KATO, R., NONAMI, A., TAKETOMI, T., WAKIOKA, T., KUROIWA, A., MATSUDA, Y., YOSHIMURA, A., Molecular cloning of mammalian Spred-3 which suppresses tyrosine kinase-mediated ERK activation. Biochem. Biophys. Res. Commun. 2003, 302, 767–772. WAKIOKA, T., SASAKI, A., KATO, R., SHOUDA, T., MATSUMOTO, A., MIYOSHI, K., TSUNEOKA, M., KOMIYA, S., BARON, R., YOSHIMURA, A., Spred is a Sprouty-related suppressor of RAS signalling. Nature 2001, 412, 647–651. DEMILLE, M. M., KIMMEL, B. E., RUBIN, G. M., A Drosophila gene regulated by rough and glass shows similarity to Ena and VAS Gene, P. 1996, 183, 103–108.
References 50 GORLICH, D., Transport into and out of the cell nucleus. EMBO J. 1998, 17, 2721–2727. 51 DINGWALL, C., KANDELS-LEWIS, S., SERAPHIN, B., A family of Ran binding proteins that includes nucleoporins. Proc. Natl. Acad. Sci. USA 1995, 92, 7525–7529. 52 BEDDOW, A. L., RICHARDS, S. A., OREM, N. R., MACARA, I. G., The Ran/TC4 GTPase-binding domain: identification by expression cloning and characterization of a conserved sequence motif. Proc. Natl. Acad. Sci. USA 1995, 92, 3328–3332. 53 CALLEBAUT, I., COSSART, P., DEHOUX, P., Evh1/WH1 domains of VASP and WASP proteins belong to a large family including Ran-binding domains of the RanBP1 family. FEBS Lett. 1998, 441, 181–185. 54 VETTER, I. R., NOWAK, C., NISHIMOTO, T., KUHLMANN, J., WITTINGHOFER, A., Structure of a Ran-binding domain complexed with Ran bound to a GTP analogue: implications for nuclear transport. Nature 1999, 398, 39–46. 55 STEWART, M., BAKER, R. P., 1.9 Å resolution crystal structure of the Saccharomyces cerevisiae Ran-binding protein mog1p. J. Mol. Biol. 2000, 299, 213–223. 56 REINHARD, M., JOUVENAL, K., TRIQUIER, D., WALTER, U., Identification, purification and characterisation of a zyxin-related protein that binds the focal adhesion and microfilament protein VASP (vasodilator stimulated phospho-protein). Proc. Natl. Acad. Sci. USA 1995, 92, 7956–7960. 57 REINHARD, M., R¨UDIGER, M., JOKUSCH, B. M., WALTER, U., VASP interaction with vinculin: a recurring theme of interactions with proline rich motifs. FEBS Lett. 1996, 399, 103–107. 58 REINHARD, M., GIEHL, K., ABEL, K., HAFFNER, C., JARCHAU, T., HOPPE, V., JOCKUSCH, B. M., WALTER, U., The proline-rich focal adhesion and microfilament protein VASP is a ligand for profilins. EMBO J. 1995, 14, 1583–1589.
59 ZIMMERMANN, J., LABUDDE, D., JARCHAU, T., WALTER, U., OSCHKINAT, H., BALL, L. J., Relaxation, equilibrium oligomerization, and molecular symmetry of the VASP (336–380) EVH2 tetramer. Biochemistry 2002, 41, 11143–11151. 60 BACHMANN, C., FISCHER, L., WALTER, U., REINHARD, M., The EVH2 domain of the vasodilator stimulated phosphoprotein mediates tetramerization, F-actin binding and actin bundle formation. J. Biol. Chem. 1999, 274, 23549–23557. 61 TU, J. C., XIAO, B., YUAN, J. P., LANAHAN, A. A., LEOFFERT, K., LI, M., LINDEN, D. J., WORLEY, P. F., Homer binds a novel proline rich motif and links group 1 metabotropic glutamate receptors with IP3 receptors. Neuron 1998, 21, 717–726. 62 NAISBITT, S., KIM, E., TU, J. C., XIAO, B., SALA, C., VALTSCHANOFF, J., WEINBERG, R. J., WORLEY, P. F., SHENG, M., Shank, a novel family of postsynaptic density proteins that binds to the nmda receptor/psd-95/gkap complex and cortactin. Neuron 1999, 23, 569–582. 63 MACHESKY, L. M., INSALL, R. H., Signaling to actin dynamics. J. Cell Biol. 1999, 146, 267–272. 64 ROHATGI, R., MA, L., MIKI, H., LOPEZ, M., KIRCHHAUSEN, T., TAKENAWA, T., KIRSCHNER, M. W., The interaction between N-WASP and the Arp2/3 complex links cdc42-dependent signals to actin assembly. Cell 1999, 97, 221–231. 65 PONTING, C. P., SCHULTZ, J., COPLEY, R. R., ANDRADE, M. A., BORK, P., Evolution of domain families. Adv. Protein Chem. 2000, 54, 185–244. 66 KATO, A., OZAWA, F., SAITOH, Y., FUKAZAWA, Y., SUGIYAMA, H., INOKUCHI, K., Novel members of the VESL/Homer family of PDZ proteins that bind metabotropic glutamate receptors. J. Biol. Chem. 1998, 273, 23969–23975. 67 NIEBUHR, K., EBEL, F., FRANK, R., REINHARD, M., DOMANN, E., CARL, U. D., WALTER, U., GERTLER, F. B., WEHLAND, J., CHAKRABORTY, T., A novel proline-rich motif present in acta of Listeria monocytogenes and cytoskeletal
27
28 EVH1/WH1 Domains
68
69
70
71
72
73
74
75
76
77
proteins is the ligand for the Evh1 domain, a protein module present in the Ena/VASP family. EMBO J. 1997, 16, 5433–5444. MULDER, N. J., et al., Interpro: an integrated documentation resource for protein families, domains and functional sites. Brief Bioinform. 2002, 3, 225–235. BALL, L. J., JARCHAU, T., OSCHKINAT, H., WALTER, U., Evh1 domains: structure, function and interactions. FEBS Lett. 2002, 513, 45–52. SONE, M., HOSHINO, M., SUZUKI, E., KURODA, S., KAIBUCHI, K., NAKAGOSHI, H., SAIGO, K., NABESHIMA, Y., HAMA, C., Still life, a protein in synaptic terminals of Drosophila homologous to GDP–GTP exchangers. Science 1997, 275, 543–547. NOURRY, C., GRANT, S. G., BORG, J. P., PDZ domain proteins: plug and play! Sci. STKE 2003, 2003, RE7. VAN HAM, M., HENDRIKS, W., PDZ domains: glue and guide. Mol. Biol. Rep. 2003, 30, 69–82. PREHODA, K. E., LEE, D. J., LIM, W. A., Structure of the enabled/VASP homology 1 domain–peptide complex: a key component in the spatial control of actin assembly. Cell 1999, 97, 471–480. FEDOROV, A. A., FEDOROV, E., GERTLER, F., ALMO, S. C., Structure of Evh1, a novel proline rich ligand-binding module involved in cytoskeletal dynamics and neural function. Nat. Struct. Biol. 1999, 6, 661–665. BALLET, L. J., et al., Dual epitope recognition by the VASP Evh1 domain modulates polyproline ligand specificity and binding affinity. EMBO J. 2000, 19, 4903–4914. BENEKEN, J., TU, J. C., XIAO, B., NURIYA, M., YUAN, J. P., WORLEY, P. F., LEAHY, D. J., Structure of the Homer Evh1 domain–peptide complex reveals a new twist in polyproline recognition. Neuron 2000, 26, 143–154. BARZIK, M., CARL, U. D., SCHUBERT, W. D., FRANK, R., WEHLAND, J., HEINZ, D. W., The N-terminal domain of Homer/VESL is a new class II Evh1 domain. J. Mol. Biol. 2001, 309, 155–169.
78 SEEWALD, M. J., KORNER, C., WITTINGHOFER, A., VETTER, I. R., Rangap mediates GTP hydrolysis without an arginine finger. Nature 2002, 415, 662–666. 79 YAN, K. S., KUTI, M., ZHOU, M. M., PTB or not PTB: that is the question. FEBS Lett. 2002, 513, 67–70. 80 LEMMON, M. A., FERGUSON, K. M., ABRAMS, C. S., Pleckstrin homology domains and the cytoskeleton. FEBS Lett. 2002, 513, 71–76. 81 AHERN-DJAMALI, S. M., COMER, A. R., BACHMANN, C., KASTENMEIER, A. S., REDDY, S. K., HUA, P., BECKERLE, M. C., WALTER, U., HOFFMANN, F. M., Mutations in Drosophila enabled and rescue by human VASP indicate important functional roles for Evh1 and Evh2 domains. Mol. Biol. Cell 1998, 9, 2157–2171. 82 CARL, U. D., POLLMANN, M., ORR, E., GERTLER, F. B., CHAKRABORTY, T., WEHLAND, J., Aromatic and basic residues within the Evh1 domain of VASP specify its interaction with proline rich ligands. Curr. Biol. 1999, 9, 715–718. 83 TU, J. C., et al., Coupling of mglur/Homer and psd-95 complexes by the Shank family of postsynaptic density proteins. Neuron 1999, 23, 583–592. 84 XIAO, B., TU, J. C., WORLEY, P. F., Homer: a link between neural activity and glutamate receptor function. Curr. Opin. Neurobiol. 2000, 10, 370–374. 85 COWAN, P. M., MCGAVIN, S., Structure of polylproline. Nature 1955, 176, 501–503. 86 ADZUBEI, A. A., STERNBERG, M. J. E., Left-handed polyproline II helices commonly occur in globular proteins. J. Mol. Biol. 1993, 229, 472–493. 87 WILLIAMSON, M. P., The structure and function of proline rich regions in proteins. Biochemical J. 1994, 297, 249–260. 88 ZHOU, M. M., et al., Structure and ligand recognition of the phosphotyrosine binding domain of SH Nature, C. 1995, 378, 584–592. 89 FERGUSON, K. M., KAVRAN, J. M., SANKARAN, V. G., FOURNIER, E., ISAKOFF,
References
90
91
92
93
94
95
96
97
98
S. J., SKOLNIK, E. Y., LEMMON, M. A., Structural basis for discrimination of 3-phosphoinositides by pleckstrin homology domains. Mol. Cell. 2000, 6, 373–384. VASIOUKHIN, V., FUCHS, E., Actin dynamics and cell–cell adhesion in epithelia. Curr. Opin. Cell Biol. 2001, 13, 76–84. LAWRENCE, D. W., COMERFORD, K. M., COLGAN, S. P., Role of VASP in reestablishment of epithelial tight junction assembly after Ca2+ switch. Am. J. Physiol. Cell Physiol. 2002, 282, C1235–1245. COMERFORD, K. M., LAWRENCE, D. W., SYNNESTVEDT, K., LEVI, B. P., COLGAN, S. P., Role of vasodilator-stimulated phosphoprotein in PKA-induced changes in endothelial junctional permeability. FASEB J. 2002, 16, 583–585. SONG, H., POO, M., The cell biology of neuronal navigation. Nat. Cell Biol. 2001, 3, E81–88. YU, T. W., HAO, J. C., LIM, W., TESSIER-LAVIGNE, M., BARGMANN, C. I., Shared receptors in axon guidance: Sax-3/robo signals via UNC-34/enabled and a netrin-independent UNC-40/dcc function. Nat. Neurosci. 2002, 5, 1147–1154. KLOSTERMANN, A., LUTZ, B., GERTLER, F., BEHL, C., The orthologous human and murine semaphorin 6a-1 proteins (sema6a-1/sema6a-1) bind to the enabled/vasodilator stimulated phosphoprotein-like protein (EVL) via a novel carboxyl-terminal zyxin like domain. J. Biol. Chem. 2000, 275, 39647–39653. COLAVITA, A., CULOTTI, J. G., Suppressors of ectopic UNC-5 growth cone steering identify eight genes involved in axon guidance in Caenorhabditis elegans. Dev. Biol. 1998, 194, 72–85. BEAR, J. E., et al., Antagonism between Ena/VASP proteins and actin filament capping regulates fibroblast motility. Cell 2002, 109, 509–521. GOLDBERG, M. B., Actin-based motility of intracellular microbial pathogens.
99
100
101
102
103
104
105
106
107
108
109
Microbiol. Mol. Biol. Rev. 2001, 65, 595–626. CAMERON, L. A., GIARDINI, P. A., SOO, F. S., THERIOT, J. A., Secrets of actin-based motility revealed by a bacterial pathogen. Nat. Rev. Mol. Cell Biol. 2000, 1, 110–119. SAMARIN, S., ROMERO, S., KOCKS, C., DIDRY, D., PANTALONI, D., CARLIER, M. F., How VASP enhances actin-based motility. J. Cell Biol. 2003, 163, 131–142. THRASHER, A. J., WASP in immune-system organization and function. Nat. Rev. Immunol. 2002, 2, 635–646. SUZUKI, T., SASAKAWA, C., N-WASP is an important protein for the actin-based motility of Shigella flexneri in the infected epithelial cells. Jpn. J. Med. Sci. Biol. 1998, 51 Suppl., S63–68. FRISCHKNECHT, F., MOREAU, V., ROTTGER, S., GONFLONI, S., RECKMANN, I., SUPERTI-FURGA, G., WAY, M., Actin-based motility of vaccinia virus mimics receptor tyrosine kinase signalling. Nature 1999, 401, 926–929. KALMAN, D., WEINER, O. D., GOOSNEY, D. L., SEDAT, J. W., FINLAY, B. B., ABO, A., BISHOP, J. M., Enteropathogenic E. coli acts through WASP and arp2/3 complex to form actin pedestals. Nat. Cell Biol. 1999, 1, 389–391. GRUENHEID, S., DEVINNEY, R., BLADT, F., GOOSNEY, D., GELKOP, S., GISH, G. D., PAWSON, T., FINLAY, B. B., Enteropathogenic E. coli Tir binds Nck to initiate actin pedestal formation in host cells. Nat. Cell Biol. 2001, 3, 856–859. HOLM, L., SANDER, C., Dali: a network tool for protein structure comparison. Trends Biochem. Sci. 1995, 20, 478–480. HOLM, L., SANDER, C., The fssp database of structurally aligned protein fold families. Nucleic Acids Res. 1994, 22, 3600–3609. HOLM, L., SANDER, C., Dictionary of recurrent domains in protein structures. Proteins 1998, 33, 88–96. ZARRINPAR, A., BHATTACHARYYA, R. P., LIM, W. A., The structure and function of proline recognition domains. Sci. STKE 2003, 2003, RE8.
29
30 EVH1/WH1 Domains
110 FRANK, R., OVERWIN, H., Methods in Molecular Biology, Humana Press, Totowa, NJ 1996. 111 ZIMMERMANN, J., K¨UHNE, R., VOLKMER-ENGERT, R., JARCHAU, T., WALTER, U., OSCHKINAT, H., BALL, L. J., Design of N-substituted peptomer ligands for Evh1 domains. J. Biol. Chem. 2003, 278, 36810–36818. 112 KOONIN, E. V., WOLF, Y. I., KAREV, G. P., The structure of the protein universe and genome evolution. Nature 2002, 420, 218–223. 113 WOLF, Y. I., KAREV, G., KOONIN, E. V., Scale-free networks in biology: new
insights into the fundamentals of evolution? Bioessays 2002, 24, 105–109. 114 SCHULTZ, J., MILPETZ, F., BORK, P., PONTING, C. P., Smart, a simple modular architecture research tool: identification of signaling domains. Proc. Natl. Acad. Sci. USA 1998, 95, 5857–5864. 115 EDDY, S. R., Profile hidden Markov models. Bioinformatics 1998, 14, 755–763. 116 KABSCH, W., SANDER, C., Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22, 2577–2637.
1
SH3 Domains Bruce J. Mayer
University of Connecticut Health Center, Farmington, USA
Kalle Saksela University of Tampere, Tampere, Finland
Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2
1 Introduction
In many ways the Src homology 3 (SH3) domain serves as the archetype for the ever-growing family of modular protein binding domains. Much of what we know (or think we know) about the numerous other domains described in this volume can be traced back to concepts and experimental approaches first tested on the SH3 domain. It is one of the most commonly found modular domains in all eukaryotic genomes, attesting to its usefulness to the cell and its adaptability to a variety of specific circumstances. As befits a domain with such a long and rich history, a number of excellent reviews are available describing its structure, binding properties, and biological activities in great detail [1–3]. In this chapter we first discuss the historical context of our understanding of these domains and then focus on a few outstanding questions on which rapid progress is being made. SH3 domains are relatively short (∼60 residues) protein modules whose primary activity is to bind to proline-rich peptides. The domain is found in a large number of intracellular proteins, many of which are involved in signaling and regulation. Although many such examples contain a single SH3 domain, proteins with multiple copies are common, and as many as five SH3 domains can be found in a single protein. The overall structure of the domain is defined by a compact β sandwich consisting of five major strands, the first three of which are connected by variable loops that for historical reasons are generally referred to as the RT loop, the N-Src loop, and the distal loop; the fourth and fifth strands are separated by a 310 helix (Figure 1). A hydrophobic groove on the surface is defined by a series of highly
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 SH3 Domains
Fig. 1 Structure of the SH3 domain. (a) Sequences of most SH3 domains discussed in the text are aligned. At top, positions of $ strands and the 310 helix are indicated, along with the positions of the RT, N-Src, and distal loops. Dashes indicate gaps imposed to maximize alignment. Yellow shading indicates hydrophobic core residues, and orange indicates conserved residues defining the core ligand binding site. All sequences are for human proteins, except for Sem-5 (Caenorhabditis elegans). A roman numeral in parentheses after a name indicates which of multiple SH3 domains present in
that protein is shown (e.g., II indicates the second SH3 domain). (b) Ribbon diagram of N-terminal SH3 domain of Sem-5 (PDB number: 2SEM). Sidechains of the core ligand binding-site residues shaded orange in (a) are also depicted. (c) Surface rendering of Sem-5 SH3 in the same orientation as in (b). Again, core peptide binding-site residues are shaded in orange. In this orientation, the two X-P pockets are located at the top and middle of the domain, and the specificity pocket at the bottom. (b) and (c) were generated with the program MOLMOL.
conserved hydrophobic residues and is adapted to bind specifically to extended peptides that adopt the left-handed polyproline-2 (PPII) helical conformation. Proline is a defining feature of most SH3 binding sites, not only because its presence favors the PPII helix, but also because its unique N-substituted structure
2 Historical Perspective
allows it to make specific contacts not possible for the other 19 amino acids. Proline is also an excellent choice to mediate specific protein-protein interactions from a thermodynamic perspective - although quite hydrophobic, it is generally found on the surface of proteins, thus driving interactions with relatively hydrophobic binding partners; furthermore, its restricted rotational freedom means the entropic cost of binding is relatively low [4]. It is therefore no surprise that proline-rich regions are the targets of a number of other protein binding domains in addition to SH3. The specificity of the domain for particular short proline-rich peptides is generally modest, with affinities usually in the low micromolar range. Although it is clear that individual SH3 domains favor binding to particular sites and classes of sites, in general these differences are not black and white, but more akin to varying shades of gray. Specificity is addressed in more detail later in this chapter, but a few general themes are outlined here. First, the limited specificity of peptide-SH3 binding means that SH3-mediated interactions can be highly dependent on their environment. As detailed below, additional surfaces on the peptide or SH3 domain, or other domains on the two potential binding partners or even on other members of a multiprotein complex, can all confer much greater overall specificity to an SH3peptide interaction. Second, moderate affinities also imply that the interactions they mediate are highly dynamic, because off-rates are necessarily very rapid. This means that SH3-mediated interactions have the potential to quickly remodel, depending on subcellular localization and available binding partners.
2 Historical Perspective
SH3 domains owe some of their prominence to the fact that they are found in several ‘glamorous’ and therefore well-studied proteins. They were first noted as a region of sequence similarity between the Src and Abl families of nonreceptor tyrosine kinases, although what we now recognize as the SH3 was not at first defined as an independent domain, but rather as part of a poorly understood Nterminal modulatory domain of these oncogenic kinases. This region also contained what Pawson and colleagues in 1986 called the Src Homology 2, or SH2, domain [5] (the conserved kinase catalytic domain was termed SH1). This view changed in 1988, with the simultaneous cloning of the Crk oncogene and a gene for a phosphatidylinositol-specific phospholipase C, PLC-γ [6, 7]. In both instances, sequence analysis revealed the presence not only of SH2 domains, but also of a clearly distinct and independent region of similarity to the N terminus of Src, which soon was called the SH3 domain. Several factors focused immediate attention on the SH3 domain. Its presence in three completely different classes of proteins (tyrosine kinase, phospholipase, and an oncogene product with no apparent catalytic domain) implied that the domain was modular and therefore presumably conferred some independent function. Whatever this function was, it evidently worked in a variety of different protein
3
4 SH3 Domains
contexts. And this function clearly had the potential to regulate the activity of enzymes in which it was found, since mutations in this region of Src could have both positive and negative effects on the ability of Src variants to transform cells. Furthermore, it was likely that the domain played a role in signal transduction, because all three classes of proteins in which it was found figure prominently in cellular signaling pathways (many other SH3-containing examples were quickly noted, including Ras-GAP, the p85 phosphatidylinositol 3-kinase (PI3K) regulatory subunit, and several cytoskeleton-associated proteins). Hard as it may be to imagine now in the age of Prosite, SMART, and other such resources, the concept of modular proteins was novel at the time, and it was not immediately obvious that such modules might confer specific protein binding activity. In 1990, however, several groups found that the SH2 domain (which, although clearly distinct from SH3, was found in many of the same proteins) binds specifically to tyrosine-phosphorylated peptide sequences. Importantly, SH2 binds to denatured proteins on filters, implying that linear peptide determinants mediated binding [8]. This had practical implications, because in the era before yeast twohybrid screens, it meant that there was a straightforward way to clone the binding partners of such domains. In 1991 Cicchetti et al. [9] used a biotinylated, GSTtagged SH3 domain from Abl to probe a bacteriophage cDNA expression library. In this screen, they found two plaques that bound avidly to the GST-SH3 probe but not to a GST control. Rough deletion mapping of the corresponding clones, termed 3BP-1 and 3BP-2, rapidly focused attention on a proline-rich segment of 3BP-1 as the putative SH3 binding site. At the time this was a somewhat perplexing and troubling result, because prolinerich regions of proteins were not thought to be terribly interesting; certainly they were incapable of folding into globular domains and thus were relegated to relatively unstructured regions about which little was known. Subsequent mapping by alanine scanning of the 3BP-1 and 3BP-2 binding sites revealed that specific residues in the proline-rich region were required for binding [10] and the core PxxP motif (where x denotes any amino acid) identified in that study defines the great majority of SH3 binding sites. However, the biological significance of these SH3-ligand interactions remained dubious at best. At this point genetics and structural biology each independently made major contributions to buttress these initial biochemical studies, revealing the importance of SH3 domains and their role in signaling as well as the molecular details of the interactions that they mediated. 2.1 Genetics to the Rescue
As it turned out, the observation that SH3 domains could bind proline-rich peptides provided the final piece of a longstanding puzzle involving activation of Ras. Members of the Ras family of small GTP-binding proteins were originally isolated as oncogene products, and it was known that Ras activation (by GTP binding) is a critical switch in activating downstream signaling pathways, including those promoting
2 Historical Perspective
proliferation. But nothing was known about what actually triggered Ras activation in the cell. In the early 1990s, evidence from three genetic model organisms provided vital clues. In Caenorhabditis elegans, a so-called adaptor protein, Sem-5, was shown by Horvitz’s group [11] to lie upstream of Ras and downstream of the Let-23 receptor tyrosine kinase in the vulval differentiation pathway; sequencing Sem-5 revealed that it consisted entirely of two SH3 domains and an SH2 domain. At roughly the same time, Rubin’s group [12] showed that a Drosophila melanogaster protein called ‘Son of Sevenless’, or Sos, was upstream of Ras and downstream of the Sevenless receptor tyrosine kinase in the R7 photoreceptor differentiation pathway. Intriguingly, Sos showed sequence similarity to Cdc25 and related proteins from yeast, which had been shown to act as guanine nucleotide exchange factors (GEF) for Ras-like GTPases [13]. The sequence of Sos revealed an extensive proline-rich segment at its C terminus. With the insight that SH3 domains might bind proline-rich peptides, a plausible model for Ras activation was immediately apparent. Sem-5 (or Grb2 as it was called in vertebrate cells) would likely be recruited to bind via its SH2 domain to tyrosinephosphorylated receptors on the membrane. The SH3 domains in turn might be expected to bind Sos via its proline-rich tail and bring it along for the ride to the membrane. Because Ras is lipid-modified and therefore constitutively associated with membranes, recruitment of Sos to the membrane would lead to guanine nucleotide exchange on Ras and its activation via local concentration effects. Thus tyrosine phosphorylation (activation of receptor by ligand) could be coupled directly to Ras activation. This model was almost immediately validated by several groups [14] and now serves as the paradigm for changes in the subcellular localization and binding interactions of proteins as a driving force in signal transduction. The critical role of the Grb2-Sos interaction was conclusively demonstrated more recently by Pawson’s group [15], who showed that defects in mouse embryonic cells lacking Grb2 could largely be rescued by expression of a fusion protein in which the Grb2 SH2 domain was appended to Sos. 2.2 Structure and Specificity
The structural biologists quickly moved to characterize the SH3 domain, even before it had a genetically validated role in life-or-death decisions of the cell. In 1992 the groups of Saraste [16] and Schreiber [17] solved the first SH3 domain structures by X-ray crystallography and nuclear magnetic resonance (NMR), respectively, and the NMR studies provided the first clues about the binding site for proline-rich peptides. Within two years high-resolution structures of a number of SH3 domains bound to specific proline-rich peptide ligands were available. Along with SH2 domain structures that were solved around the same time, these studies exemplified a radical new approach to structural biology. Because of rapid advances in molecular biology, the emergence of enticing new protein targets, and the new insight that many proteins were assembled from modular functional domains, researchers quickly realized that much could be learned (and journal covers could be gained) by
5
6 SH3 Domains
rapidly solving the structures of individual modular pieces of important and ‘hot’ proteins. These structural studies also helped establish a new understanding of protein–protein interactions, which could be mediated by families of closely related modules, each binding to stereotypical, short, linear, peptide ligands [18]. This was a far cry from earlier prevailing notions that protein–protein interactions were mediated by extensive complementary surfaces requiring the folded tertiary structure of each partner. This insight had important ramifications, because it implied that we could learn a great deal about a protein and its potential binding partners merely by inspecting its primary sequence - if your favorite protein had an SH3 domain, you could be confident that it would bind to proteins having proline-rich PxxP motifs. SH3 domains also served as a crucial proving ground for new approaches to identifying specific ligands. This hunt was motivated both by basic questions about SH3-mediated signaling pathways and by the hope that specific high-affinity smallmolecule ligands for these domains might serve as lead compounds for drug development. The relatively small size of the SH3 domain and the ability to purify large amounts of the active domain from bacteria, along with its evident involvement in key biochemical pathways relevant to human health, made them ideal targets. Early on, Schreiber’s [19] group demonstrated that combinatorial libraries of beads, each coupled to a unique peptide sequence, could be used to select preferred ligands for the Src SH3 domain. Screening of phage display libraries (where each phage expressed on its surface a unique peptide sequence) also proved a very powerful approach, by which preferred binding partners could be enriched by multiple rounds of binding and amplification [20–22]. Both approaches revealed that a PxxP core binding motif was almost always present in preferred binding sites, generally flanked by a basic residue. One remarkable discovery that arose from these studies was that the conserved basic residue could be either N-terminal or C-terminal to PxxP: for example, RxxPxxP and PxxPxR could both be selected by the Src SH3 [23]. Structural analysis of SH3 domains bound to both classes of ligands (termed Class I and Class II ligands, respectively) showed that the domain could indeed bind ligands in both orientations at the same binding site (Figure 2). More specifically, for most SH3 domains the peptide-binding groove consisted of two hydrophobic slots, each normally occupied by an x-P dipeptide (where x was generally hydrophobic), and a third, negatively charged, ‘specificity’ pocket that engaged the basic residue of the ligand [23, 24]. Recent studies have suggested that the sidechain orientation of a highly conserved tryptophan residue in the SH3 domain dictates whether it can bind Class I, Class II, or both classes of ligands [25]. The fact that peptides could adopt either an N-to-C or C-to-N orientation was unprecedented and is a consequence of the pseudosymmetry of the left-handed PPII helical conformation. Similar behavior has since been observed for other domains that bind PPII helical ligands (WW, EVH1 domains). The central conundrum raised by many of the structural studies of SH3 domains and by the results of various screens for preferred binding partners was the
2 Historical Perspective
Fig. 2 Binding of Class I and Class II ligands to SH3 domains. Top, ball-and-stick diagrams of the structure of PPII helical ligands. Ligand residues are numbered according to the nomenclature of Lim et al. [24]. Five key residues of the core ligand peptide contact the surface of the SH3 domain, two in each of the two x-P pockets (green), and one in the specificity pocket (red). Below, positions of consensus residues in Class I and Class II binding sites are indicated. Depending on the binding orientation of the peptide, the PxxP-defining proline residues may occupy positions P0 and P3 (Class I) or P−1 and P2 (Class II). The other residue of the central x-P dipeptide (position P−1 in Class I and P0 in Class II) is generally hydrophobic (). In both cases residues
at positions P−2 and P1 face away from the SH3 domain and do not participate in binding. Proline residues are common in these positions, however, and they help to stabilize the PPII helical conformation of the peptide. Ionic interactions of the positively charged residue (R or K) in position P−3 with acidic residues in the specificity pocket determine the orientation of binding. For Class I sites, this basic residue is at the N terminus of the peptide; for Class II sites, it is at the C terminus. Other types of specificity-pocket interactions also exist, such as in the Class I peptides that bind Abl SH3, which have a hydrophobic residue at position P−3 . Atypical binding motifs that bind in a recognizably similar manner to Class I and Class II sites are also indicated.
realization that there really isn’t much room for specificity in the core peptidebinding groove. Only five core ligand residues directly contact the SH3 domain; two of these are the invariant prolines and one the basic residue (which is almost always arginine) (Figure 2). As will be more fully discussed in a later section, additional contacts between variable loops of the SH3 domain and ligand residues N-terminal or C-terminal to the core can greatly enhance specificity, and in a few examples, unconventional binding sites (such as PxxRxxKP motifs) can specifically bind to the core ligand-binding groove. But it is important to emphasize that, within the core binding site, the potential for specificity is highly limited, because only proline can be accommodated at two positions. Despite these constraints, however, a high degree of selectivity is achievable in vivo. A recent study by Lim’s group [26] showed that a yeast SH3 ligand peptide was absolutely selective for its known biologically relevant binding partner, not binding detectably to any of the other 26 SH3 domains of Saccharomyces cerevisiae. On the other hand, this peptide bound
7
8 SH3 Domains
robustly to a number of vertebrate SH3 domains, suggesting that high selectivity arose in the course of evolution through both positive and negative selection. The generality of this concept to other yeast SH3-ligand interactions or to the much more numerous potential interactions in metazoan proteomes remains to be seen. More recent structural studies have revealed yet another way in which the versatile SH3 domain can be used to assemble protein complexes in the cell. MAGUK proteins, which contain a PDZ domain and an SH3 domain followed by a guanylate kinase-like (GK) domain, serve as homo- or heteropolymeric scaffolds for assembly of large complexes on the membrane, for example, in organizing the structure of the neuromuscular junction. It was found that the SH3 domains of MAGUK proteins could bind in cis or in trans to the GK domain, but the mechanism was unclear, because there was no obvious PxxP binding motif on the GK domain. The subsequent X-ray crystal structures of the SH3-GK segment of PSD-95 revealed not only that the SH3 and GK domains associated tightly with each other, but more remarkably, that a sixth β strand, located C-terminal to the GK domain, contributed directly to the folded SH3 domain structure (the majority of which is N-terminal to the GK domain); in addition, a long hinge region was inserted between the fourth and fifth strands of SH3 [27, 28]. This, along with additional genetic and biochemical evidence, strongly suggests that the first four strands of the SH3 domain form a folding subunit that can bind in either cis or trans orientation to a second subunit composed of the fifth SH3 strand, the GK domain, and the sixth SH3 strand, by a process of 3D domain swapping [28]. Such a cis-trans isomerization is likely to underlie both the specificity and stability of the polymeric networks of assembled MAGUK proteins.
3 Predicting Binding Partners
The SH3 domain is the most common of the modular protein binding domains: the human proteome is predicted to encode almost 300 different SH3 domains (current best estimate is 284, unpublished data of Wang and Saksela). The number of SH3-encoding genes is somewhat lower, since many proteins contain multiple SH3 domains, and in some, such as Src and N-Src, alternative RNA splicing contributes to SH3 domain diversity by introducing sequence variation into the SH3 loop regions. The number of potential SH3 target proteins is more difficult to predict, but appears to be considerably higher than the number of SH3 domains, creating an enormous number of theoretically possible pairs of SH3-mediated protein interactions. Knowing which ones of these interactions actually occur and are biologically meaningful would go a long way toward understanding the wiring of signaling networks that regulate fundamental aspects of cellular physiology and are deregulated in many important diseases. As already discussed, the inherent specificity in protein recognition built into the ∼60 amino acids that constitute individual SH3 domains is only one of the many factors that determine the time, place, and partnerships of SH3-mediated
3 Predicting Binding Partners
protein interactions. Nevertheless, it is reasonable to assume that SH3-ligand pairs that have evolved to possess the highest binding affinity and selectivity are likely to be physiologically relevant. Therefore, in addition to screening for preferred SH3 binding partners from various expression libraries, much effort has also been invested in developing tools for predicting potential SH3-ligand pairs from sequence databases. Significant progress in this area was recently reported by Cesareni and colleagues [29], who combined experimental and computational approaches to predict SH3 interaction networks in yeast. This analysis was based on extensive phage-display screening of a random nonapeptide library to establish minimal consensus binding sequences for most (20 of the total predicted 27) SH3 domains of S. cerevisiae. These SH3 domains were also used in a large-scale yeast two-hybrid screening for preferred binding proteins from cDNA libraries. Of the total of 394 interactions predicted to be most likely based on computer analysis of potential target proteins, 59 were also found among the 233 SH3-ligand pairs obtained experimentally in the two-hybrid screens. In another study, this group generated a large library of Abl SH3 variants in which 12 residues that participate in ligand recognition (based on close examination of Abl SH3-peptide complex structures) were replaced with random combinations of amino acid that occur in the corresponding positions of other SH3 domains [30]. From this degenerate library, individual SH3 domains could be selected that bound well to a peptide (APTYPPPLPP) preferred by native Abl SH3 but not to another peptide (LSSRPLPTLPSP) known to be a good ligand for Src SH3. Conversely, SH3 domains with the opposite specificity, i.e., that bind to the Src peptide but not to the Abl peptide, could also be selected. The lessons learned from this exercise were subsequently applied to predicting the relative preference of the collection of S. cerevisiae SH3 domains for these two peptides. Gratifyingly, the three yeast SH3 domains that bind to Abl peptide in an ELISA assay were also predicted as the three most likely Abl peptide binders, whereas the top six predicted candidates for Src peptide binding included each of the five yeast SH3 domains that actually showed good binding to this peptide. A related computational approach was successfully developed by Brannetti et al. [31]. By combining a large amount of data from SH3-ligand structures and SH3 target peptide selection experiments, they generated an algorithm (SH3-SPOT) for searching databases to find optimal decapeptide sequences for any SH3 domain of interest. The validity of this approach was supported by the observation that the hallmarks of preferred target peptides for a given SH3 domain, such as Src, could be correctly predicted even if experimental data regarding that particular SH3 domain were excluded from the training dataset for the algorithm. We should note, however, that if the actual binding data for the entire Src family of SH3 domains were omitted from the training dataset, the accuracy for predicting peptides preferred by the Src SH3 domain was greatly reduced. Nevertheless, when this algorithm was used to examine protein sequence databases for potential partners for the SH3 domains of Abl, Hck, and Grb2, the top ten scoring decapeptides did include PxxP-containing regions of proteins, such as 3BP2 (Abl SH3), HIV-1 Nef and Cbl (Hck SH3), and
9
10 SH3 Domains
Sos2 (Grb2 SH3), which had been previously reported as relevant partners of the corresponding SH3-containing proteins. 3.1 Core Peptide Docking vs. Extended Interactions
Together, the studies described above clearly establish that the amino acid sequence of an SH3 domain can be used to help predict its target peptide preference, which in turn has predictive value for identifying relevant partner proteins for that SH3 domain. So, does this remarkable progress mean that we can now predict the partnerships among the SH3 domains and their ligands and start using this information to model cellular signaling networks? The answer to this question appears to be ‘yes and no’. Although such computational tools can be of great value in guiding research to identify the relevant SH3-mediated protein interactions underlying various normal and pathological processes in the cell, these efforts must ultimately rely on traditional (although perhaps increasingly systematic and high-throughput) experimental approaches. A major limitation of the computational approach is that it fails to take into account molecular contacts that are unique to individual SH3-ligand interactions, and which are therefore, by definition, not predictable by homology modeling. The overall contribution of such ‘atypical’ contacts in ligand recognition among natural SH3-mediated interactions as a whole is presently not known. However, in most instances for which the structural basis of unusually strong or selective SH3-ligand binding has been determined, it has been found to rely on molecular contacts outside the conserved core peptide binding pockets. Within the SH3 fold these additional specificity-determining contacts typically involve residues in the nonconserved regions of the RT and N-Src loops. This makes sense intuitively, since these are the only areas in which aligned primary structures of different SH3 domains completely diverge in sequence composition and also in length (Figure 1). When discussing the RT loop that connects the first and second β strands in the SH3 fold, however, we should note that this is a large structure (typically consisting of 18 amino acids, but varying between 16 and 26 in length in human SH3 domains), composed of a central highly variable loop flanked by constant stretches of four (N-terminal) and eight (C-terminal) relatively conserved amino acids with β strand character. The conserved RT loop residues provide key structural determinants for the third specificity pocket on the SH3 ligandbinding surface, which is in agreement with the relatively limited ligand selectivity provided by this surface. By contrast, the nonconserved residues in the central region of the RT loop have a greater capacity to engage in ligand-specific contacts (see below). An interesting example how nonconserved SH3 loop residues can enhance the affinity and specificity of ligand binding was reported by Cowburn’s group [32], who studied the complex of the Csk SH3 domain bound to a high-affinity (K d 0.8 µM) target peptide derived from Csk’s natural partner and regulator, PEP (proline-enriched phosphatase). This peptide contains a Class II core PxxP motif,
3 Predicting Binding Partners
but its strong and selective binding to the Csk SH3 also requires two hydrophobic amino acids located six residues after the PxxP motif (PPLPERTPESFIV). In fact, the consensus Class II PxxP motif of PEP is, although necessary, insufficient to mediate binding to the Csk SH3. The NMR structure revealed that after the PxxP motif, the PEP peptide contained a 310 helix, which helped to position the critical isoleucine and valine residues in a hydrophobic specificity pocket in Csk SH3. A key feature of this pocket is a ‘clamp’ formed by a lysine residue in the N-Src loop. Thus, the strong and functionally relevant SH3-mediated interaction between Csk and PEP has evolved to rely mainly on contacts involving the N-Src loop of the SH3 domain and determinants in the ligand that fall outside of the canonical PxxP motif, which by itself has insignificant affinity towards Csk SH3. Although not equally well understood in structural terms, a related story can be seen in the strong dependence of the second SH3 domain of the adapter protein Nck on a critical serine residue located C-terminal to the consensus Class II motifs in its target proteins such as PAK1 (PAPPMRNTS) [33]. Interestingly, this serine residue is also a major autophosphorylation site in activated PAK kinases, and Nck SH3 fails to bind significantly to peptides containing a phosphoserine in this position. Since membrane recruitment by Nck leads to PAK activation [34], modulation of affinity of the PAK binding site for Nck SH3 by autophosphorylation may represent an important negative feedback mechanism for PAK activation. Phosphorylation sites can be found in the vicinity of many other core SH3 binding motifs, and one should remember that regulation of SH3 binding affinity via modulation of noncanonical contacts involved in ligand recognition may be more common than is currently appreciated. Another informative study was reported by Kami et al. [35], who examined the basis of the unusually tight (K d 24 nM) association of the SH3 domain of p67phox with the proline-rich region of p47phox , another cytosolic component of NADPH oxidase. This binding involves a Class II PxxP motif in p47phox . However, the tight association requires 20 additional amino acids immediately C-terminal to the p47phox PxxP motif, whereas a 10-residue peptide consisting of only this PxxP motif binds to the p67phox SH3 domain with modest affinity (K d 20 µM). Structural analyses revealed that the 20 amino acids after the PxxP consensus sequence had a helix-turn-helix (HTH) structure, which cooperated with binding of the PPIIhelical PxxP peptide by making additional contacts with the area of p67phox SH3 bordered by the RT and N-Src loops. Even when separated from the PxxP sequence, the HTH peptide was shown to bind independently to p67phox SH3 with a K d of 10 µM. This led the authors to propose this interaction as an example of PxxPindependent SH3 binding. We should note, however, that interaction of the isolated HTH peptide with p67phox SH3 depends critically on the arginine residue that overlaps the Class II PxxP consensus motif of p47phox (PAVPPR). In the NMR structure, this arginine residue is seen to engage in canonical ionic interactions with the conserved negatively charged residues of the p67phox SH3 to coordinate binding of the p47phox PxxP-HTH peptide. Thus, this interaction could perhaps better be considered a novel example of a strategy to increase the modest affinity and target specificity provided by the core PxxP motif.
11
12 SH3 Domains
A somewhat different example of complex SH3-ligand recognition that is useful to consider here relates to the avid binding (K d 0.25 µM) of the SH3 domain of the Src family tyrosine kinase Hck to the HIV-1 pathogenicity factor Nef. Similar to the studies discussed above, Lee et al. [36] concluded that the Class II consensus PxxP motif in Nef is insufficient to mediate this strong binding, because a 12-mer peptide spanning this site (PVRPQVPLRPMT) has only a modest affinity (K d 91 µM) for Hck SH3. Interestingly, the closely related SH3 domain of Fyn showed similar modest binding to the 12-mer Nef PxxP peptide, but unlike Hck SH3, fails to bind to full-length Nef protein with high affinity (K d > 20 µM). Most of the striking difference in binding affinity between these two homologous SH3 domains could be assigned to a single residue in the variable region of the RT loop. When Arg96 in Fyn SH3 (numbering according to full-length Fyn) was replaced with an isoleucine, which is found in the corresponding position in the RT loop of Hck SH3, the resulting R96I mutant Fyn SH3 bound Nef with an affinity (K d 0.38 µM) almost as high as that of Hck SH3. The molecular basis of the binding selectivity and affinity provided by this Hck-specific isoleucine residue was revealed by a crystal structure of Fyn-R96I SH3 complexed to the core domain of Nef [37]. The sidechain of this isoleucine residue is inserted into a hydrophobic pocket between two α helices in Nef, where it is packed against sidechains of multiple nonpolar Nef residues, in particular, Leu87, Phe90, Trp113, and Ile114. Notably, and in contrast to the two examples of complex SH3-ligand recognition discussed above, here the specificity-determining residues are noncontiguous and not close to the PxxP consensus sequence in the polypeptide chain of the ligand. The experimental approach in the overwhelming majority of published studies on identification of SH3 recognition sites has been to narrow down the interaction to the shortest ligand peptide that can support efficient SH3 binding. Also, although coordinates for more than 150 structures involving an SH3 domain alone or in complex with a target peptide have been deposited in databanks, Nef remains the only native PxxP-containing intermolecular partner protein whose structure in complex with an SH3 domain has been solved. Thus, it is impossible to know whether stabilizing interactions between RT and/or N-Src loops of various SH3 domains and surfaces in their target proteins formed by contiguous or noncontiguous residues located at a distance from the core PxxP motif are widespread or if the mode of SH3 domain recognition by Nef is unusual. There are, however, several notable examples of 3D structures of an SH3 domain complexed with a native target protein in the context of an intramolecular interaction. These are the autoinhibited forms of the tyrosine kinases Hck, Src, and Abl [38–40]. In all three, binding of the SH3 domain to a target site present in the linker region connecting the SH2 and kinase domains plays a key role in locking the latter into a catalytically inactive state. Competition of this intramolecular interaction by high-affinity ligands for the SH3 domain, such as Nef for Hck [41], therefore promotes activation of these kinases. Although these structures provide us important insights into the regulation of tyrosine kinases, they are less informative for understanding how optimal SH3 binding specificity and affinity is achieved. Presumably because intermolecular competition for these intramolecular SH3
3 Predicting Binding Partners
interactions must be allowed, the molecular contacts involved in recognition of the internal SH3 binding sites in these kinases are clearly suboptimal. In fact, these interactions were not appreciated until the Hck and Src structures were solved, partly because the linker-region peptides of Hck, Src, and Abl that interact with their own SH3 domains substantially deviate from known SH3 binding consensus sequences, and only in Hck are two proline residues even present. This serves as a reminder that even relatively weak SH3-mediated interactions can play a vital role in regulating signaling pathways. 3.2 Atypical SH3 Docking Motifs
It is now increasingly evident that lack of a conventional PxxP motif in a ligand protein by no means rules out high-affinity SH3 binding. Indeed, another type of complication in predicting relevant target proteins of SH3 domains based on consensus binding motifs is the steadily increasing number of SH3 interactions that do not involve a bona fide Class I or Class II PxxP motif. For discussion’s sake, we classify these interactions into three somewhat arbitrary categories: (1) interactions that involve binding of the core PxxP-motif docking surface of the SH3 to a peptide that does not conform to a classic PxxP consensus; (2) interactions that are mediated by a defined peptide element in the ligand but do not involve the PxxP-motif docking surface of the SH3 domain; and (3) complex interactions, which may or may not involve residues contributing to the PxxP docking surface, but differ fundamentally from typical SH3 target peptide recognition. The last category includes the MAGUK protein SH3-GK interaction described above, Eps8 dimeriza-tion mediated via β strand exchange between two Eps8 SH3 domains [42], the integral role of the SH3 domain of p53 binding protein-2 (53BP2) in the 53BP2-p53 complex [43], and binding of the adaptor protein Grb2 and the guanine nucleotide exchange factor Vav via heterodimerization of their SH3 domains [44]. These interactions, while fascinating, are not further discussed here. Category 2 now consists of one well-characterized example, but, considering the evolutionary adaptability characteristic of modular protein interaction domains, it would not be surprising if additional examples came to light. Distel’s group [45] studied interactions between peroxins, which are proteins involved in biogenesis and translocation of peroxisomes, and noticed that Pex5p binds to the SH3 domain of Pex13p. Surprisingly, however, they found that a short (25 amino acids) αhelical element in Pex5p is not only necessary, but also sufficient, to mediate this interaction. Furthermore, this α-helical peptide does not compete with binding of yet another peroxin, Pex14p, which recognizes Pex13p SH3 via a canonical PxxP interaction. A recent determination of the NMR structure of Pex13p, together with chemical shift experiments, identified the binding site for Pex5p on the opposite face of the Pex13 SH3 domain, away from its PxxP binding surface [46], providing a structural explanation for the earlier observations. Reported SH3 interactions belonging to category 1 are steadily increasing in number. It appears that in some of them, evolution has resulted in an SH3 domain
13
14 SH3 Domains
having a ligand-binding surface that is specialized for recognizing atypical PxxPlike peptide motifs. This seems to occur in the Eps8 family SH3 domains, which (when in monomeric form) recognize ligands, such as Abi-1and RN-tre, via a motif conforming to the consensus PxxDY [47]. Instead of Class I or Class II consensus PxxP motifs, Eps8 SH3 also selects peptides containing the PxxDY consensus from random peptide libraries. The situation is similar with the SH3 domains of the CIN85/CMS family of adaptor proteins, which play important roles in endocytosis and signal transduction. Using a novel peptide screening method, Kurakin et al. [48] determined that the three SH3 domains of CIN85 specifically select the consensus sequence Px(P/A)xxR. They also mapped a region conforming to this consensus as the CIN85 binding site in its established partner Cbl-b and noted that the same consensus can be found in other known CIN85/CMS-interacting proteins, including c-Cbl, BLNK, AIP1, SB1, and CD2. Moreover, their database analyses found the Px(P/A)xxR motif in additional proteins, such as PAK2, ZO-2, and TAFII 70, which were subsequently confirmed to have the capacity to interact with CIN85 SH3 domains in a GST pulldown assay. This group also pointed out interesting similarities between this novel CIN85/CMS binding motif and another atypical proline-rich SH3 target sequence present in PAK2 (PPPVIAPRP), which serves as the binding site for the SH3 domains of α- and β-PIX (PAK interacting exchange factor) proteins [49], suggesting that PIX and CIN85/CMS SH3 domains might use similar strategies in target peptide recognition. The strength of binding of Px(P/A)xxR peptides to CIN85/CMS SH3 domains is not remarkable (Kd values close to 10 µM), whereas binding of a 22 amino-acid PAK peptide to aPIX SH3 was reported to be of unusually high affinity (Kd 24 nM [49]). This difference, however, could be explained by a differential contribution of residues outside the PxxP-like motifs of the peptides used for measuring these affinities. Another apparent example of specialization for recognition of a nonconsensus PxxP motif is the SH3 domain of amphiphysin (an adaptor protein involved in endocytosis). When used in screening peptide libraries, this SH3 domain selected the consensus sequence xRPxR, in which a hydrophobic () amino acid, and not proline, was preferentially selected in the first position [50, 51]. Although a proline residue is present in this position of the amphiphysin SH3 binding site in its partner protein dynamin (PSRPNR), the latter sequence still deviates from a classic Class II PxxP peptide, in which a hydrophobic residue generally precedes the second proline (PxPx[+], where + is R or K). In contrast to the atypical interactions discussed above, binding of SH3 domains to the proline-free RKxxYxxY motif found in the immune cell adaptor protein SKAP55 and its relative SKAP55-R [52] looks like an example of an opposite adaptation. SKAP55-derived peptides spanning this motif can bind to a number of different SH3 domains, such as those of Fyn and Fyb that prefer to select consensus Class I PxxP ligands in random peptide screening experiments. The affinity of RKxxYxxY peptide-mediated binding is relatively low (Kd 20–60 µM), although well in the range typical of SH3 binding to canonical PxxP peptides. Thus, the
3 Predicting Binding Partners
RKxxYxxY motif appears to have evolved as an alternative, proline-independent, means for connecting SKAP55 to cellular signaling proteins containing SH3 domains that otherwise recognize ligands with a Class I PxxP consensus. NMR chemical shifts indicate that the SKAP55 peptide makes contact with many but not all of the same residues involved in typical PxxP peptide binding. The spacing of the lysine and tyrosine residues in the SKAP55 peptide, and the failure of SH3 domains that preferentially bind Class II PxxP consensus sequences to bind this peptide, also suggest structural similarities with Class I PxxP peptide binding. Yet another interesting example of SH3 binding via an unorthodox binding site involves the consensus sequence PxxRxxKP, which was originally identified in the T-cell receptor-associated adaptor protein SLP-76 as the binding site for the Cterminal SH3 domain of Grb2 [53] and its relative GADS (Grb2-related adaptor downstream of Shc, also known as Mona, Grap2, GrpL, and Grf40) [54], but is present also in other signaling proteins, such as the deubiquitinating enzyme UBPY [55] and the GADS binding protein ASMH [56]. The structural basis of recognition of GADS SH3 by this motif was recently elucidated by two groups, leading to essentially identical conclusions. Feller’s [57] group used a 13-mer human SLP-76 peptide PAPSIDRSTKPPL, which binds the GADS SH3 domain with high affinity (K d 0.18 µM). The complex solved by Li’s group [58] included the 11-mer peptide APSIDRSTKPA also known to bind tightly to GADS SH3 (K d 0.24 µM; [54]). These studies revealed that the RSTK region of the SLP-76 peptide adopts a 310 helix conformation, which contributes to the binding in itself, but also positions the critical residues of the PxxRxxKP consensus into the three binding pockets of the PxxP-binding surface of the GADS SH3. The Ala-Pro residues of the SLP-76 peptide are accommodated by the first hydrophobic pocket, much like the first x-P dipeptide of a canonical Class II PxxP peptide (xPxPx[+]). Moreover, the role of the Ile residue in the SLP-76 peptide is analogous to that of the hydrophobic residue of the second canonical -P dipeptide. Remarkably, however, the second hydrophobic pocket of GADS SH3, where this Ile residue was found to sit, is too small to accommodate a canonical -P dipeptide, thus explaining why GADS SH3 interacts poorly with typical Class II PxxP peptides. After the 310 helix region in the SLP-76 peptide, however, the analog to the consensus PxxP motif recognition again appears, and the ionic interactions involving the lysine residue of the PxxRxxKP motif and the third specificity pocket of GADS SH3 were found to be similar to those seen in typical SH3-ligand complexes. Although some of the strategies of atypical PxxP-like peptides for contacting the PxxP binding surface of SH3 domains might inherently give rise to a stronger interaction than corresponding contacts of canonical PxxP peptides, the discussion above regarding additional stabilizing interactions that involve less-conserved SH3 regions applies equally well to these non-PxxP interactions. A good example of this seems to be the GADS SH3-PxxRxxKP interaction discussed above. The same SLP-76 peptide also binds to the C-terminal SH3 domain of Grb2 [53], albeit with a much lower affinity. Feller and colleagues [57] noted that this is probably not due to major differences in docking of the PxxRxxKP motif residues to the similar PxxP
15
16 SH3 Domains
binding surfaces of GADS and Grb2 SH3 domains. Instead, based on comparison of their structure with NMR cross-saturation data on Grb2 SH3-SLP-76 binding [35], they attribute the higher affinity of the GADS SH3 domain to the capacity of its RT and N-Src loops to engage in much more extensive interactions with the SLP-76 peptide.
4 Experimental Exploitation of SH3 Specificity
Insights gained from studies on natural SH3 binding specificity have already been successfully used to engineer molecules for the experimental and therapeutic manipulation of SH3-mediated interactions. For example, peptides with high affinity for individual SH3 domains (identified by screening random peptide libraries or by rational design) have been introduced into living cells to serve as competitors for natural ligands (see below). The core PxxP motif of the C3G protein binds strongly to the first SH3 domain of the adapter Crk (and CrkL), due an unusually tight coordination of a lysine residue in the C3G peptide by three acidic residues in Crk SH3 [59]. Kardinal et al. [60] used an optimized 28-amino-acid C3G peptide that binds to CrkLSH3 with very high affinity (35 nM) and fused it with a 17-residue amphipathic antennapedia-derived ‘shuttle’ peptide, which provided this fusion peptide with the capacity for receptor-independent entry into cells. When added to cultures of chronic myeloid leukemia (CML) cells, these cell-penetrating Crk/CrkL SH3 ligands disrupted Bcr-Abl/CrkL complexes and strongly reduced proliferation of both primary CML blast cells and cell lines established from Bcr-Abl-positive patients, suggesting that interaction of Crk SH3 with its effectors is important for proliferative signaling. The affinity of the C3G-derived peptide developed by Feller and colleagues towards an unrelated control SH3 domain (Grb2 SH3) was two orders of magnitude weaker (K d 4.27 µM) than that of Crk-CrkL. However, a concern remains that such peptides might also unintentionally target other SH3-mediated interactions that should be left untouched. Lim and colleagues [61] have reported an interesting and potentially useful solution to this problem. They noted that the entire shape of the sidechains of PxxP-defining proline residues was less important for SH3 recognition than the N-substitution on their backbone amide nitrogens. Although proline is the only N-substituted amino acid available to the cell, the organic chemist has a much richer palette from which to select. Accordingly, each of the consensus proline residues of an SH3 binding peptide from Sos (YEVPPPVPPRRR) was replaced by various nonnatural N-substituted residues (peptoids). Interestingly, when binding of these hybrid ligands to SH3 domains of Src, Grb2, Crk, and Sem-5 was examined, many of these peptoids were able to substitute for proline and support SH3 binding. Moreover, some of these substitutions led to remarkable changes in the relative affinities of these ligands for the different SH3 domains tested. One proline-to-peptoid substitution caused a greater than 100-fold increase in binding to Grb2 SH3 (to a K d of 40 nM), without any effect on binding to Src SH3. Other
4 Experimental Exploitation of SH3 Specificity 17
examples of significantly improved binding to only one or a subset of the tested SH3 domains were also seen. Equally important when considering the development of specific inhibitory ligands was that some proline-to-peptoid substitutions led to a substantial decrease in binding to one or two of the SH3 domains without affecting interaction with the others. These studies demonstrated that the pockets accommodating the conserved proline residues of SH3-ligand peptides present a major untapped source of additional selectivity that can be exploited in development of SH3-specific inhibitors by using peptoids as proline mimetics. In a follow-up study on peptide-peptoid ligands, Nguyen et al. [62] focused specifically on the issue of optimal selectivity, which they assert is more critical than overall affinity when developing therapeutically useful SH3 inhibitors. Combinatorial peptoid substitutions of multiple PxxP-defining as well as other proline residues were successfully introduced into the Sos peptide used in their previous study, and peptoids were also tested in other peptides that already contained flanking sequences optimized for binding to Src SH3 or Crk SH3. One of these ligands bound Crk SH3 with a remarkable affinity of K d 8 nM, while showing over 1000-fold weaker binding to both Src SH3 and Grb2 SH3 (K d 45.6 and 13.7 µM, respectively). Additional modifications of this ligand gave rise to variants that had lost part of their affinity for Crk SH3 (K d values of 0.5 µM and 1.98 µM), but more importantly, showed virtually no binding to the two other SH3 domains (K d values at or above 1 mM). Thus, within their test set of SH3 domains, Lim and colleagues succeeded in developing SH3 ligands with truly orthogonal binding selectivity. The success described above resulted partly from the higher selectivity of peptoids as compared to proline residues in recognizing the hydrophobic binding pockets of SH3 domains, but also relied on the unusually tight accommodation by the Crk SH3 domain of a Class II ‘consensus’ lysine residue presented in the context of an optimal peptide. Examples of similar specificity determinants close to the PxxP motifs of other SH3 ligands were discussed above. Nevertheless, in general, such selective PxxP-peptide flanking motifs are in short supply, and targeting of a peptide to bind to a specific SH3 domain remains a major challenge. This is particularly true if one would like to selectively target one SH3 domain but spare closely related domains. Indeed, it is hardly a coincidence that studies on enhancing the specificity of SH3-ligand recognition, be it via PxxP flanking sequence optimization or peptoid substitutions, have usually focused on SH3 domains representing members from widely divergent SH3 subfamilies. An innovative approach reported by Schreiber and colleagues [63] for generating SH3 ligands that could have unnaturally high capacities to discriminate different SH3 domains is to append to the N terminus of a short PxxP peptide (PLPPLP) a nonpeptide moiety generated by combinatorial synthesis from a diverse set of organic monomers. From a library of such compounds, several molecules were identified that bind to Src SH3 with affinities similar to those of optimized ligands selected by Src SH3 from random libraries of longer peptides. One of these hybrid compounds bound to Src SH3 with a K d of 3.4 µM and showed over 50-fold lower affinity for PI3K SH3 (K d 170 µM). The mode of recognition of
18 SH3 Domains
Src SH3 by these peptide/nonpeptide compounds was examined by NMR spectroscopy [64]. Similar to amino acid residues N-terminal to the conserved proline residues of Class II SH3 ligands, the nonpeptide moieties of these compounds contact the SH3 specificity pocket, but via binding interactions very different from those seen in peptide-SH3 complexes. Although the binding affinities reported by Combs et al. [63] are relatively modest compared to some of the engineered SH3 ligand peptides discussed earlier, this study provides a promising proof-ofconcept. Combinatorial synthesis of more complex nonpeptide moieties and/or using this strategy in combination with longer peptides already optimized by other means could be a powerful way of generating SH3 ligands with desired binding properties. An approach for engineering of SH3 binding specificity that is directly opposite to the studies described above was pursued by Hiipakka et al. [65]. Prompted by the critical role of the variable region of the RT loop of the Hck SH3 domain in binding to HIV-1 Nef (see above) and the sequence dissimilarity of this region in different SH3 domains, they generated a large phage library displaying Hck SH3 variants carrying six randomized residues in place of the variable region of the RT loop (residues EAIHHE; see Figure 1). From this library several individual SH3 domains (named RRT-SH3, for random RT loop) were selected that bind to Nef 30-to 40-fold better than wild-type Hck, thus pushing the affinity of this interaction into the low nanomolar range. When expressed in cells, these RRT-SH3 domains could potently inhibit intracellular functions of Nef [66]. Of note, the majority of cellular proteins that were found to coprecipitate with wild-type Hck SH3 failed to associate with the modified, Nef-targeted, RRT-SH3 domains. This suggests that the same variable region of the RT loop that plays a role in Nef binding is also involved in recognition of normal cellular partners of Hck SH3. Thus, the improved affinity toward one partner protein of Hck SH3, such as Nef, achieved via RT loop manipulation is also accompanied by enhanced binding selectivity in consequence of disrupted RT loop interactions with other potential partner proteins. This is an encouraging concept when considering the use of ligand-targeted RRT SH3 domains as competitive inhibitors for intracellular SH3-mediated processes. Among natural interactions, a critical role for SH3-specific determinants in the RT loop in ligand binding was conclusively demonstrated only for the Nef/Hck SH3 complex. However, recent unpublished work by Hiipakka and coworkers has established RRT-SH3 engineering as a feasible approach for generating tightly binding SH3 domains targeted against divergent SH3 binding proteins, such as Sos, PI3K, PAK2, and CD28, which are normally not high-affinity partners of Hck SH3. Thus, modifications involving six nonconserved residues in the RT loop are sufficient to target Hck SH3 to proteins having canonical PxxP motifs that normally have no particular preference for Hck SH3 or might even be poorly suited for binding. This highlights the potential of the RT loop in providing affinity and specificity to SH3-ligand recognition and suggests that the RRT-SH3 domain approach might be a generally applicable strategy for modulating SH3 interactions for research and perhaps even for therapeutic purposes.
References
5 Conclusion
SH3 domains have long been a testing ground for new ideas and new methods in protein analysis, and there is every indication that they will continue to play a prominent role. The power of newly available genomic, proteomic, and computational approaches, coupled with the detailed structural and biochemical information that is already available, are providing unprecedented insights into the normal biological roles of these domains. There can be little doubt that these insights will ultimately give rise to specific and effective strategies for rationally manipulating their binding activities in vivo, both in the laboratory and in the clinic.
Acknowledgements
BJM is grateful to M. Schiller for help in preparing Figure 1 and for helpful discussions. Research in BJM’s laboratory was partially supported by a grant from the National Institutes of Health (CA82258). KS would like to thank Marita Hiipakka and Jinghuan Wang for help and discussions, and his laboratory was supported by the Academy of Finland and Medical Research Fund of Tampere University Hospital.
References 1 LARSON, S. M., DAVIDSON, A. R., The identification of conserved interactions within the SH3 domain by alignment of sequences and structures. Protein Sci. 2000, 9, 2170–2180. 2 MAYER, B. J., SH3 domains: complexity in moderation. J. Cell Sci. 2001, 114, 1253–1263. 3 ZARRINPAR, A., BHATTACHARYYA, R. P., LIM, W. A., The structure and function of proline recognition domains. Sci. STKE 2003, 2003 (179), RE8. 4 KAY, B. K., WILLIAMSON, M. P., SUDOL, M., The importance of being proline: the interaction of proline-rich motifs in signaling proteins with their cognate domains. FASEB J. 2000, 14, 231–241. 5 SADOWSKI, I., STONE, J. C., PAWSON, T., A noncatalytic domain conserved among cytoplasmic protein-tyrosine kinases modifies the kinase function and transforming activity of fujinami sarcoma virus P130gag-fps . Mol. Cell. Biol. 1986, 6, 4396–4408.
6 STAHL, M. L., FERENZ, C. R., KELLEHER, K. L., KRIZ, R. W., KNOPF, J. L., Sequence similarity of phospholipase C with the non-catalytic region of src. Nature 1988, 332, 269–272. 7 MAYER, B. J., HAMAGUCHI, M., HANAFUSA, H., A novel viral oncogene with structural similarity to phospholipase C. Nature 1988, 332, 272–275. 8 MAYER, B. J., JACKSON, P. K., BALTIMORE, D., The noncatalytic src homology region 2 segment of abl tyrosine kinase binds to tyrosine-phosphorylated cellular proteins with high affinity. Proc. Natl. Acad. Sci. USA. 1991, 88, 627–631. 9 CICCHETTI, P., MAYER, B. J., THIEL, G., BALTIMORE, D., Identification of a protein that binds to the SH3 region of Abl and is similar to Bcr and GAP-rho. Science 1992, 257, 803–806. 10 REN, R., MAYER, B. J., CICCHETTI, P., BALTIMORE, D., Identification of a 10-amino acid proline-rich SH3 binding site. Science 1993, 259, 1157–1161.
19
20 SH3 Domains
11 CLARK, S. G., STERN, M. J., HORVITZ, H. R., C. elegans cell-signalling gene sem-5 encodes a protein with SH2 and SH3 domains. Nature 1992, 356, 340–344. 12 SIMON, M. A., BOWTELL, D. D. L., DODSON, G. S., LAVERTY, T. R., RUBIN, G. M., Ras1 and a putative guanine nucleotide exchange factor perform crucial steps in signaling by the sevenless protein tyrosine kinase. Cell 1991, 67, 701–716. 13 CRECHET, J. B., POULLET, P., MISTOU, M. Y., PARMEGGIANI, A., CAMONIS, J., BOY-MARCOTTE, E., DAMAK, F., JACQUET, M., Enhancement of the GDP–GTP exchange of RAS proteins by the carboxyl-terminal domain of SCD25. Science 1990, 248, 866–868. 14 MCCORMICK, F., How receptors turn ras on. Nature 1993, 363, 15–16. 15 CHENG, A. M., et al., Mammalian Grb2 regulates multiple steps in embryonic development and malignant transformation. Cell 1998, 95, 793–803. 16 MUSACCHIO, A., NOBLE, M., PAUPTIT, R., WIERENGA, R., SARASTE, M., Crystal structure of a Src-homology 3 (SH3) domain. Nature 1992, 359, 851–855. 17 YU, H., ROSEN, M. K., SHIN, T. B., SEIDEL-DUGAN, C., BRUGGE, J. S., SCHREIBER, S. L., Solution structure of the SH3 domain of Src and identification of its ligand-binding site. Science 1992, 258, 1665–1668. 18 HARRISON, S. C., Peptide–surface association: the case of PDZ and PTB domains. Cell 1996, 86, 341–343. 19 YU, H., CHEN, J. K., FENG, S., DALGARNO, D. C., BRAUER, A. W., SCHREIBER, S. L., Structural basis for the binding of proline-rich peptides to SH3 domains. Cell 1994, 76, 933–945. 20 SPARKS, A. B., QUILLIAM, L. A., THORN, J. M., DER, C. J., KAY, B. K., Identification and characterization of Src SH3 ligands from phage-displayed random peptide libraries. J. Biol. Chem. 1994, 269, 23853–23856. 21 FENG, S., KASAHARA, C., RICKLES, R. J., SCHREIBER, S. L., Specific interactions outside the proline-rich core of two classes of Src homology 3 ligands. Proc. Natl. Acad. Sci. USA 1995, 92, 12408– 12415.
22 RICKLES, R. J., BOTFIELD, M. C., WENG, Z., TAYLOR, J. A., GREEN, O. M., BRUGGE, J. S., ZOLLER, M. J., Identification of Src, Fyn, Lyn, PI3K and Abl SH3 domain ligands using phage display libraries. EMBO J. 1994, 13, 5598–5604. 23 FENG, S., CHEN, J. K., YU, H., SIMON, J. A., SCHREIBER, S. L., Two binding orientations for peptides to the Src SH3 domain: development of a general model for SH3–ligand interactions. Science 1994, 266, 1241–1247. 24 LIM, W. A., RICHARDS, F. M., FOX, R. O., Structural determinants of peptide-binding orientation and of sequence specificity in SH3 domains. Nature 1994, 372, 375–379. 25 FERNANDEZ-BALLESTER, G., BLANES-MIRA, C., SERRANO, L., The tryptophan switch: changing ligand-binding specificity from type I to type II in SH3 domains. J. Mol. Biol. 2004, 335, 619–629. 26 ZARRINPAR, A., PARK, S. H., LIM, W. A., Optimization of specificity in a cellular protein interaction network by negative selection. Nature 2003, 426, 676–680. 27 TAVARES, G. A., PANEPUCCI, E. H., BRUNGER, A. T., Structural characterization of the intramolecular interaction between the SH3 and guanylate kinase domains of PSD-95. Mol. Cell 2001, 8, 1313–1325. 28 MCGEE, A. W., DAKOJI, S. R., OLSEN, O., BREDT, D. S., LIM, W. A., PREHODA, K. E., Structure of the SH3-guanylate kinase module from PSD-95 suggests a mechanism for regulated assembly of MAGUK scaffolding proteins. Mol. Cell 2001, 8, 1291–1301. 29 TONG, A. H., et al., A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 2002, 295, 321–324. 30 PANNI, S., DENTE, L., CESARENI, G., In vitro evolution of recognition specificity mediated by SH3 domains reveals target recognition rules. J. Biol. Chem. 2002, 277, 21666–21674. 31 BRANNETTI, B., VIA, A., CESTRA, G., CESARENI, G., CITTERICH, M. H., SH3-SPOT: an algorithm to predict preferred ligand to different members of
References
32
33
34
35
36
37
38
39
40
41
42
the SH3 gene family. J. Mol. Biol. 2000, 298, 313–328. GHOSE, R., SHEKHTMAN, A., GOGER, M. J., JI, H., COWBURN, D., A novel, specific interaction involving the Csk SH3 domain and its natural ligand. Nat. Struct. Biol. 2001, 8, 998–1004. ZHAO, Z. S., MANSER, E., LIM, L., Interaction between PAK and Nck: a template for Nck targets and role of PAK autophosphorylation. Mol. Cell Biol. 2000, 20, 3906–3917. LU, W., KATZ, S., GUPTA, R., MAYER, B. J., Activation of Pak by membrane localization mediated by an SH3 domain from the adaptor protein Nck. Curr. Biol. 1997, 7, 85–94. KAMI, K., TAKEYA, R., SUMIMOTO, H., KOHDA, D., Diverse recognition of non-PxxP peptide ligands by the SH3 domains from p67(phox), Grb2 and Pex13p. EMBO J. 2002, 21, 4268– 4276. LEE, C.-H., LEUNG, B., LEMMON, M. A., SHENG, J., COWBURN, D., KURIYAN, J., SAKSELA, K., A single amino acid in the SH3 domain of Hck determines its high affinity and specificity in binding to HIV Nef protein. EMBO J. 1995, 14, 5006–5015. LEE, C.-H., SAKSELA, K., MIRZA, U. A., CHAIT, B. T., KURIYAN, J., Crystal structure of the conserved core of HIV-1 Nef complexed with a Src family SH3 domain. Cell 1996, 85, 931–942. SICHERI, F., MOAREFI, I., KURIYAN, J., Crystal structure of the Src family tyrosine kinase Hck. Nature 1997, 385, 602–609. XU, W., HARRISON, S. C., ECK, M. J., Three-dimensional structure of the tyrosine kinase c-Src. Nature 1997, 385, 595–602. NAGAR, B., et al., Structural basis for the autoinhibition of c-Abl tyrosine kinase. Cell 2003, 112, 859–871. MOAREFI, I., LEFEVRE-BERNT, M., SICHERI, F., HUSE, M., LEE, C.-H., KURIYAN, J., MILLER, W. T., Activation of the Src-family tyrosine kinase Hck by SH3 domain displacement. Nature 1997, 385, 650–653. KISHAN, K. V., SCITA, G., WONG, W. T., DI FIORE, P. P., NEWCOMER, M. E., The SH3 domain of Eps8 exists as a novel
43
44
45
46
47
48
49
50
51
intertwined dimer. Nat. Struct. Biol. 1997, 4, 739–743. GORINA, S., PAVLETICH, N. P., Structure of the p53 tumor suppressor bound to the ankyrin and SH3 domains of 53BP2. Science 1996, 274, 1001–1005. NISHIDA, M., NAGATA, K., HACHIMORI, Y., HORIUCHI, M., OGURA, K., MANDIYAN, V., SCHLESSINGER, J., INAGAKI, F., Novel recognition mode between Vav and Grb2 SH3 domains. EMBO J. 2001, 20, 2995–3007. BARNETT, P., BOTTGER, G., KLEIN, A. T., TABAK, H. F., DISTEL, B., The peroxisomal membrane protein Pex13p shows a novel mode of SH3 interaction. EMBO J. 2000, 19, 6382–6391. PIRES, J. R., HONG, X., BROCKMANN, C., VOLKMER-ENGERT, R., SCHNEIDER-MERGENER, J., OSCHKINAT, H., ERDMANN, R., The ScPex13p SH3 domain exposes two distinct binding sites for Pex5p and Pex14p. J. Mol. Biol. 2003, 326, 1427–1435. MONGIOVI, A. M., ROMANO, P. R., PANNI, S., MENDOZA, M., WONG, W. T., MUSACCHIO, A., CESARENI, G., DI FIORE, P. P., A novel peptide-SH3 interaction. EMBO J. 1999, 18, 5300–5309. KURAKIN, A. V., WU, S., BREDESEN, D. E., Atypical recognition consensus of CIN85/SETA/Ruk SH3 domains revealed by target-assisted iterative screening. J. Biol. Chem. 2003, 278, 34102– 34109. MANSER, E., et al., PAK kinases are directly coupled to the PIX family of nucleotide exchange factors. Mol. Cell 1998, 1, 183–192. GRABS, D., SLEPNEV, V. I., SONGYANG, Z., DAVID, C., LYNCH, M., CANTLEY, L. C., DE CAMILLI, P., The SH3 domain of amphiphysin binds the proline-rich domain of dynamin at a single site that defines a new SH3 binding consensus sequence. J. Biol. Chem. 1997, 272, 13419–13425. CESTRA, G., et al., The SH3 domains of endophilin and amphiphysin bind to the proline-rich region of synaptojanin 1 at distinct sites that display an unconventional binding specificity. J. Biol. Chem. 1999, 274, 32001–32007.
21
22 SH3 Domains
52 KANG, H., FREUND, C., DUKE-COHAN, J. S., MUSACCHIO, A., WAGNER, G., RUDD, C. E., SH3 domain recognition of a proline-independent tyrosine-based RKxxYxxY motif in immune cell adaptor SKAP55. EMBO J. 2000, 19, 2889–2899. 53 LEWITZKY, M., et al., The C-terminal SH3 domain of the adapter protein Grb2 binds with high affinity to sequences in Gab1 and SLP-76 which lack the SH3-typical P-x-x-P core motif. Oncogene 2001, 20, 1052–1062. 54 BERRY, D. M., NASH, P., LIU, S. K., PAWSON, T., MCGLADE, C. J., A high-affinity Arg-X-X-Lys SH3 binding motif confers specificity for the interaction between Gads and SLP-76 in T cell signaling. Curr. Biol. 2002, 12, 1336–1341. 55 KATO, M., MIYAZAWA, K., KITAMURA, N., A Deubiquitinating enzyme UBPY interacts with the SH3 domain of Hrs binding protein via a novel binding motif Px(V/I)(D/N)RxxKP. J. Biol. Chem. 2000, 275, 37481–37487. 56 ASADA, H., et al., Grf40, a novel Grb2 family member, is involved in T cell signaling through interaction with SLP-76 and LAT. J. Exp. Med. 1999, 189, 1383–1390. 57 HARKIOLAKI, M., et al., Structural basis for SH3 domain-mediated high-affinity binding between Mona/Gads and SLP-76. EMBO J. 2003, 22, 2571–2582. 58 LIU, Q., BERRY, D., NASH, P., PAWSON, T., MCGLADE, C. J., LI, S. S., Structural basis for specific binding of the Gads SH3 domain to an RxxK motif-containing SLP-76 peptide: a novel mode of peptide recognition. Mol. Cell 2003, 11, 471–481. 59 WU, X., KNUDSEN, B., FELLER, S. M., ZHENG, J., SALI, A., COWBURN, D., HANAFUSA, H., KURIYAN, J., Structural
60
61
62
63
64
65
66
basis for the specific interaction of lysine-containing proline-rich peptides with the amino-terminal SH3 domain of c-Crk. Structure 1995, 3, 215–226. KARDINAL, C., et al., Cell-penetrating SH3 domain blocker peptides inhibit proliferation of primary blast cells from CML patients. FASEB J. 2000, 14, 1529–1538. NGUYEN, J. T., TURCK, C. W., COHEN, F. E., ZUCKERMANN, R. N., LIM, W. A., Exploiting the basis of proline recognition by SH3 and WW domains: design of N-substituted inhibitors. Science 1998, 282, 2088–2092. NGUYEN, J. T., PORTER, M., AMOUI, M., MILLER, W. T., ZUCKERMAN, R. N., LIM, W. A ., Improving SH3 domain ligand selectivity using a non-natural scaffold. Chem. Biol. 2000, 7, 463–473. COMBS, A. P., KAPOOR, T. M., FENG, S., CHEN, J. K., DAUDE-SNOW, L. F., SCHREIBER, S. L., Protein structure-based combinatorial chemistry: discovery of non-peptide binding elements to Src SH3 domains. J. Am. Chem. Soc. 1996, 118, 287–288. FENG, S., KAPOOR, T. M., SHIRAI, F., COMBS, A. P., SCHREIBER, S. L., Molecular basis for the binding of SH3 ligands with non-peptide elements identified by combinatorial synthesis. Chem. Biol. 1996, 3, 661–670. HIIPAKKA, M., POIKONEN, K., SAKSELA, K., SH3 domains with high affinity and engineered ligand specificities targeted to HIV-1 Nef. J. Mol. Biol. 1999, 293, 1097–1106. HIIPAKKA, M., HUOTARI, P., MANNINEN, A., RENKEMA, G. H., SAKSELA, K., Inhibition of cellular functions of HIV-1 Nef by artificial SH3 domains. Virology 2001, 286, 152–159.
1
The SH2 Domain: A Prototype for Protein Interaction Modules Tony Pawson, and Gerald D. Gish
Samuel Lunenfeld Research Institute, Toronto, Canada
Piers Nash The University of Chicago, Chicago, USA
Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2
1 The Multidomain Nature of Signaling Proteins and Identification of the SH2 Domain
The discovery of viral oncogenes in the 1970s initiated an intense effort to understand the biochemical and functional properties of their protein products. The striking effects of such transforming proteins on cellular behavior suggested that understanding their modus operandi might reveal new principles underlying the molecular basis for cellular regulation and oncogenesis. Indeed, analysis of the Src, Abl, and Fps/Fes retroviral oncoproteins led to the realization that these proteins have intrinsic protein-tyrosine kinase activity, which correlates with their ability to transform cells from a normal to a cancerous phenotype [1–4]. Furthermore, the normal progenitors of these cytoplasmic proteins, as well as the cell surface receptors for growth factors and insulin, proved to be tyrosine kinases, suggesting that tyrosine phosphorylation is a common physiological mechanism through which cells respond to mitogenic or metabolic hormones [5–8]. Such hormones frequently induce receptor clustering, which stimulates receptor kinase activity, and this effect can be mimicked by oncogenic mutations that constitutively activate the receptor in the absence of an external ligand [9–11]. A logical approach toward understanding the normal and malignant properties of cytoplasmic or receptor tyrosine kinases (RTK) was therefore to pursue the biological consequences of tyrosine phosphorylation. By focusing on the tyrosine kinases themselves, it became apparent that these enzymes are activated by intermolecular autophosphorylation within their kinase domains, leading to a sustained increase in kinase activity [12–14]. Subsequent analysis has shown that such
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The SH2 Domain: A Prototype for Protein Interaction Modules
autophosphorylation causes a structural reorganization of the catalytic domain, primarily through movement of the activation loop, which otherwise occludes the kinase active site and interferes with ATP and substrate binding [15, 16]. These observations were consistent with a straightforward model in which tyrosine kinases might regulate the enzymatic activity of exogenous substrates through phosphorylation near their active sites. Curiously, however, receptor tyrosine kinases (RTK) themselves were often found to be the most abundant phosphotyrosine (pTyr)-containing proteins in growth factor-stimulated cells, raising the possibility that tyrosine phosphorylation might have unexpected biochemical functions. Furthermore, cytoplasmic protein-tyrosine kinases appeared to have extended polypeptide sequences in addition to the catalytic domain, suggesting that kinases such as Fps, Src, and Abl might have novel, noncatalytic properties. We therefore set out to test the notion that a cytoplasmic tyrosine kinase such as v-Fps might contain multiple functional domains. Our group had used site-directed mutagenesis, a technique that was then in its infancy, to establish that tyrosine phosphorylation within the C-terminal catalytic domain of the v-Fps cytoplasmic oncoprotein stimulates its kinase activity and transforming potential [13]. We devised a variation of this approach to scan the 130-kDa v-Fps retroviral protein for folded domains that participate in its transforming activity. To this end, we inserted a dipeptide (or oligopeptide) motif at any one of numerous sites throughout the v-Fps protein, with the idea that such an insertion might disrupt the folded structure, and consequently the function, of a globular domain into which it was incorporated, but leave distinct domains intact [17]. At the time we knew that v-Fps had an N-terminal region encoded by the retroviral Gag gene fused to sequences derived from the cellular Fps tyrosine kinase, with the kinase domain located at the extreme C terminus [18]. The insertion mutagenesis approach identified three v-Fps domains that appeared to collaborate to induce cellular transformation. Not surprisingly, insertions in the C-terminal region of v-Fps corresponding to the kinase domain led to a loss of kinase and transforming activities. Insertions in an N-terminal v-Fps sequence caused a significant loss of transforming activity without an obvious effect on catalytic efficiency. This region has more recently been termed the FCH (Fes/Cip4 homology) domain and has been implicated in cytoskeletal localization [19, 20]. Consistent with the rationale for this mutagenesis approach, insertions in the FCH domain destroyed recognition of the v-Fps protein by two anti-Fps monoclonal antibodies that recognize conformation-dependent epitopes [21]. In contrast, insertion in the Gag sequence or in the central region of v-Fps had little effect. Most interestingly, insertions (and subsequently deletions) in the sequence immediately N-terminal to the kinase domain influenced both kinase activity and substrate recognition and inhibited cellular transformation [17, 22]. Although this region was not necessary for kinase activity per se, it formed a protease-resistant structure that also contained the kinase domain, was important for full kinase activity in vivo, and promoted the tyrosine phosphorylation of selected cellular proteins [22–24]. Inspection of the amino acid sequences of Src and Abl showed that this region of 100 amino acids N-terminal to the v-Fps kinase domain was relatively
1 The Multidomain Nature of Signaling Proteins and Identification of the SH2 Domain 3
Fig. 1 The domain organization of several SH2-containing proteins. Arrowheads mark the positions of insertions that led to identification of the SH2 domain in vFps. Y kinase = tyrosine kinase domain; CC = coiled coil; Pro = SH3 binding motifs; BD = binding domain; actin = actin binding domain; PLC X/Yc = split PLC cat-
alytic domain; Tyr = tyrosine phosphorylation site; 4H = 4-helix bundle; EF = EF hard-like domain; TA = transactivation domain; PH = pleckstrin homology; SH = Src homology; DNA = DNA-binding domain; Actin = actin-binding domain; UBA = ubiquitin-associated domain.
conserved in these other cytoplasmic tyrosine kinases and similarly located adjacent to the kinase domain, and we therefore termed it the Src homology 2 (SH2) domain (with the kinase domain implicitly being the SH1 region) [22] (Figure 1). Taken together, these data indicated that v-Fps, and likely other cytoplasmic tyrosine kinases, were composed of multiple folded domains. The SH2 sequence was identified as a conserved noncatalytic domain, but one with the potential to regulate both tyrosine kinase activity and substrate recognition and to promote malignant transformation. Such results suggested that the ability of cytoplasmic tyrosine kinases to recognize their substrates is not an exclusive property of the catalytic domain, but also depends on modular noncatalytic domains that target the kinase to specific substrates for phosphorylation. Support for these ideas came from cloning of the viral Crk oncogene by Hidesaburo Hanafusa [25] and of cDNAs for mammalian phospholipase C (PLC) γ 1 and Ras GTPase-activating protein (GAP) [26–28]. Intriguingly, the v-Crk oncoprotein comprised only an SH2 domain and a sequence present in Src, Abl, PLC-γ 1, and RasGAP (but absent from Fps), which Hanafusa termed SH3 and which was subsequently shown by the Baltimore lab to bind proline-rich motifs [29]. Despite
4 The SH2 Domain: A Prototype for Protein Interaction Modules
this lack of intrinsic catalytic activity, the v-Crk oncoprotein stimulated endogenous tyrosine kinase activity in transformed cells and enhanced the phosphorylation of a 130-kDa protein (p130cas ) at tyrosine, consistent with the view that the SH2 and SH3 domains might nucleate the formation of multiprotein complexes involved in tyrosine kinase signaling. Sequence analysis indicated that both PLCγ 1 and RasGAP contain two SH2 domains and an SH3 domain. Taken with the earlier v-Fps data, these results suggested that SH2 domains are a common element of otherwise disparate enzymes and adaptors that function downstream of tyrosine kinases to control intracellular pathways such as those involved in phospholipid metabolism and Ras GTPase signaling. In support of this idea, activated RTKs were found to induce PLCγ 1 and RasGAP phosphorylation at tyrosine, and we also observed that RasGAP associates with two other pTyr-containing proteins in cells with activated or oncogenic tyrosine kinases [30–34]. The variable domain organization of proteins such as Fps, Src, Crk, PLCγ 1, and RasGAP indicated that the SH2 and SH3 domains might function in a cassettelike, combinatorial fashion and have broad functions in controlling cellular behavior in response to extracellular signals [35]. The precise means through which SH2 domains might link tyrosine kinases to specific biochemical pathways therefore became a central issue. In this regard, a number of groups found that activated receptor tyrosine kinases (RTK) physically associate with targets such as PLCγ , phosphatidylinositol (PI) 3 kinase, and RasGAP [30, 36]. Furthermore, autophosphorylation of the PDGF β receptor at a specific tyrosine site (Tyr751) in the noncatalytic kinase insert region proved to specifically recruit PI 3 kinase to the activated receptor [37]. Extrapolating from these data, we identified SH2 domains as the unifying element in coupling phosphorylated RTKs to their diverse intracellular targets. In these experiments, we found that isolated SH2 domains of different cytoplasmic signaling proteins, including PLCγ 1, RasGAP, and Src, share a common ability to bind selectively to activated, autophosphorylated RTKs, as well as to tyrosine-phosphorylated docking proteins (e.g., p62Dok-1 ) [38, 39]. Of interest, different SH2 domains bound distinct sets of pTyr-containing proteins from growth factor-stimulated cells, suggesting that SH2 domains impose an element of specificity in tyrosine kinase signaling through the selective recognition of tyrosine-phosphorylated proteins. These results indicated that endogenous SH2 domains are independently folding modules that bind selectively to activated RTKs and cytoplasmic proteins in a pTyr-dependent fashion. In a similar vein, recombinant v-Crk protein was found to associate with both tyrosine kinase activity and a series of pTyr-containing proteins from transformed cells [40, 41], consistent with the idea that aberrant binding of SH2 domains to pTyr-containing proteins stimulates malignant transformation. SH2 domain-pTyr interactions were thereby revealed as important in physiological signaling from RTKs and in the induction of malignant transformation by deregulated tyrosine kinase activity. Subsequent data confirmed that SH2 domains directly recognize short pTyrcontaining sequences. The binding sites for SH2 domains on the activated EGF receptor were mapped to the noncatalytic C-terminal tail of the receptor, which contains multiple autophosphorylation sites [42], and SH2 domains were found to bind to denatured pTyr-containing proteins and specific phosphopeptides corresponding to RTK autophosporylation sites [43–46].
3 Structure and Binding Properties of SH2 Domains 5
Fig. 2 Summary of the binding properties of several interaction domains.
2 SH2 Domains as a Prototype for Interaction Domains
SH2 domains can be viewed as the forerunner of the large family of interaction domains that control many aspects of signal transduction and cellular regulation, through their ability to bind variously to protein, phospholipid, nucleic acid, or small molecule ligands [47, 48] (Figure 2). Such domains have a modular design and mediate simple binary interactions, frequently by binding short peptide motifs. Through their ability to recognize a broad range of post-translational modifications and to discriminate between related binding sequences, interaction domains play a central role in the dynamic and specific cellular responses to extracellular and internal signals. Indeed, the scheme for RTK signaling, in which cytoplasmic sequences of the activated receptor recruit the interaction domains of target proteins, is broadly applicable to a wide range of cell surface receptors and intracellular regulators. Although rather simple at first glance, interaction domains are remarkably versatile. Individual domains can have multiple ligands, and when covalently linked they can function as adaptors to join otherwise distinct polypeptides into a common signaling pathway. The reiterated use of interaction domains can also generate multiprotein assemblies with complex properties. Furthermore, interaction domains can provide allosteric regulation and switch-like behavior to associated catalytic domains and target enzymatic domains to their physiological substrates. In addition, interaction domains can serve as the organizing principle for extended protein networks, which underlie the systems properties of eukaryotic cells [49].
3 Structure and Binding Properties of SH2 Domains
The majority of SH2 domains bind short phosphopeptide motifs, which are themselves components of larger polypeptides, such as activated RTKs [50, 51]. A critical feature of phosphopeptide recognition involves association of the SH2 domain with the phosphorylated tyrosine; this interaction alone provides about half the binding energy and therefore results in a 1000-fold increase in affinity as compared with an unphosphorylated peptide [52–54]. As a consequence, the physiological association of SH2 domains with their binding partners is usually phosphorylation-dependent. An analysis of SH2 domain binding sites on cellular phosphoproteins, as well as
6 The SH2 Domain: A Prototype for Protein Interaction Modules
the in-vitro binding preferences of SH2 domains for phosphorylated peptides, indicated that SH2 domains also bind selectively to residues immediately C-terminal to the pTyr, in a fashion that varies from one SH2 domain to another [52, 55]. For example, the SH2 domains of the p85 subunit of PI 3 kinase typically bind sites on activated RTKs that contain a Met that is three residues C-terminal to the pTyr (the +3 position), and in vitro they bind preferentially to phosphopeptides having the consensus motif pTyr-x-x-Met (where x is any amino acid). The generality of this concept was revealed by using SH2 domains to probe a phosphopeptide library in which a central pTyr is flanked by degenerate positions. This approach showed that different SH2 domains preferentially recognize distinct amino acids at the C-terminal +1 to +3 positions [56, 57]. This technique identified, not only residues that are favored by a particular SH2 domain, but also amino acids that are inhibitory to binding. This is relevant, because the specificity of a particular protein–protein interaction in vivo depends on both permissive forces that direct an interaction domain to its intended target as well as inhibitory effects that block binding to nonphysiological partners. The dominance of pTyr recognition is necessary for SH2-mediated interactions to be phospho-dependent, but also dictates that the affinity of SH2 domains for the C-terminal specificity residues is not too high, or complex formation could no longer be regulated by phosphorylation. Indeed, the dissociation constants of SH2 domains for their optimal phosphopeptide ligands are typically in the range of 500 nM-1 µM [58]. This relatively modest affinity may have additional biological consequences. Since signaling from cell surface receptors is highly dynamic, the protein–protein interactions that control intracellular pathways must not be too strong, or signaling events would become irreversible. A case in point is provided by the intramolecular interaction in the Src tyrosine kinase between the SH2 domain and a pTyr site (Tyr527) in the C terminus that induces an autoinhibited state (see below). The phosphorylated Tyr527 motif is a poor ligand for the Src SH2 domain, but the interaction proceeds because the two interacting partners are tethered within the same molecule. Replacing the Tyr527 site with an optimal binding motif for the Src SH2 domain yields a Src tyrosine kinase that is locked in the autoinhibited state and cannot be activated, because the inhibitory interaction is too strong [59]. In this example, an SH2-mediated interaction is designed to be weak, so that it can be readily reversed. Similarly, although specificity in protein–protein interactions is biologically important, there is often significant flexibility in the binding of interaction domains to their targets. A given SH2 domain may interact physiologically with phosphorylated sites on multiple different proteins, which may allow for cell-specific functions or the formation of more complex signaling networks within a single cell [60]. Conversely, a single phosphorylated motif on an RTK may potentially provide a docking site for several distinct SH2-containing targets [61]. Structural analysis of a number of SH2 domains in association with phosphopeptide ligands has revealed the molecular basis for these biochemical observations (Figure 3) [50, 51]. SH2 domains typically have an N-terminal region that forms an α helix (αA) and a central antiparallel β sheet containing strands βA, βB, βC,
3 Structure and Binding Properties of SH2 Domains 7
Fig. 3 Structure of the Src SH2 domain bound to a pTyr-Glu-Glu-Ile peptide (PDB: 1SPS) [67]. The surface of the SH2 domain is in pink, and the secondary structural elements of the SH2 domain in green. The "A helix is to the right and "B helix to the left. The Arg ($B5 residue critical for pTyr bind-
ing is in blue. The N-terminal pTyr of the peptide (yellow) occupies the pTyr binding pocket. The peptide runs over the central $ sheet of the SH2 domain, the +1 and +2 glutamates contact the surface of the domain, and the sidechain of the +3 Ile (to the left) fits in a hydrophobic pocket.
and βD. The C-terminal region has a smaller β sheet (with strands βD , βE, and (βF), followed by another α helix (αB), and a short β strand (βG) that associates with the core β sheet. This creates a bipartite structure with two discrete binding pockets located on either side of the core β sheet [62–64]. Residues in the most highly conserved N-terminal region of the SH2 domain form a positively charged pocket that makes numerous contacts with the pTyr residue. In terms of binding free energy, the most prominent interaction involves a bidentate ionic association between an invariant buried Arg (Arg(βB5) in the SH2 domain and two oxygens of the pTyr phosphate (Figure 4); substitution of this Arg effectively eliminates phosphopeptide binding [65, 66]. The second binding surface on the SH2 domain is much more variable between different SH2 domains and is responsible for the selective recognition of C-terminal residues. In the Src (or Lck) SH2 domain bound to a pYEEI peptide, the peptide runs orthogonally over the β sheet and adopts an extended conformation so that residues C-terminal to the pTyr contact the more variable specificity pocket. Here, the +1 and +2 glutamates make long-range contacts with the surface of the SH2 domain, while the sidechain of the +3 Ile is buried in a hydrophobic cavity [67, 68] (Figure 3).
8 The SH2 Domain: A Prototype for Protein Interaction Modules
Fig. 4 Different modes of SH2 domainphosphopeptide recognition shown for the Src, Grb2, and PLC-(1 (C terminal) SH2 domains (PDP: 1SPS; PDB: 1TZE; PDB: 2PLD) [67, 70, 72]. In each panel, the surface of the SH2 domain is in gray, and the Arg $B5 required for pTyr recognition in blue. The pTyr binding pocket is to the right, and the more variable surface involved in selective recognition of C-terminal residues is to the left. The phosphopeptides are in yel-
low, with the N-terminal pTyr to the right and the more C-terminal residues to the left. The +3 Ile of the poYEEI peptide occupies a hydrophobic pocket on the Src SH2 domain, as also shown in Figure 5. The poYVNV peptide bound to the Grb2 SH2 domain is forced into a $ turn by the Trp residue at the EF1 position (shown in red). The C-terminal residues of the poYIIPLPD peptide bound to the PLC-(1 SH2 domain occupy an extended hydrophobic cleft.
4 Different Modes of SH2 Domain-Phosphopeptide Recognition
The N-terminal SH2 domain of the Shp-2 tyrosine phosphatase and the C-terminal SH2 domain of phospholipase C-γ 1 also bind phosphopeptides, such as those from the platelet-derived growth factor receptor (PDGFR), in an extended conformation [69, 70]. However, in contrast to Src, the specificity region of these SH2 domains has a long hydrophobic groove that accommodates at least five residues C-terminal to the pTyr, primarily through hydrophobic interactions (Figure 4). In PLC-γ 1, the +1 Ile of the Tyr1021 motif from the βPDGFR projects into the start of this hydrophobic cleft, facilitated by a cysteine at the βD5 position of the SH2 domain,
4 Different Modes of SH2 Domain-Phosphopeptide Recognition 9
which provides space for the Ile sidechain. In the Src SH2 domain this space is filled by a tyrosine at βD5, which therefore selects against Ile at the +1 position [71]. The SH2 domain of the Grb2 adaptor has yet another binding mode, conferred by a Trp at the EF1 position (the first residue in the loop between the βE and βF strands) (Figure 4). In the Src SH2 domain the corresponding EF1 residue is a Thr that forms part of the +3 hydrophobic binding pocket. In Grb2, the EF1 Trp fills this pocket and projects from the surface of the SH2 domain, forcing the phosphopeptide into a β turn that is accommodated by an Asn at the +2 position of the peptide, which makes hydrogen bonds with backbone atoms of the SH2 domain [72–74]. In the SH2 domain of the Grb7 adaptor, an insertion in the EF loop also occludes the +3-binding pocket and imposes a β-turn conformation on the bound phosphopeptide, which favors a +2 Asn in a similar fashion to Grb2 [75]. These data show how subtle variations in the sequences and structures of SH2 domains can influence their binding preferences. The biological relevance of this selectivity is suggested by the fact that physiological binding sites for SH2-containing proteins often conform to those identified by in-vitro peptide-binding studies and structural analyses. To test whether the ability of SH2 domains to select specific phosphopeptide motifs is biologically significant, we examined the effects of manipulating their binding selectivity [76]. Examination of multiple SH2 domain sequences revealed the presence of a Trp at the EF1 position that was unique to the Grb2 SH2 domain and which we thought might account for its propensity to bind Asn (as subsequently validated by structural analysis). We found that replacing the EF1 Thr of the Src SH2 domain with Trp resulted in strong selection for peptides having an Asn at the +2 position; structural analysis subsequently revealed that this mutant form of the Src SH2 domain recognizes a physiological Grb2 binding site (pTyr-Val-Asn-Val) in a virtually identical fashion to Grb2 itself [77]. To explore the biological effects of this specificity switch, we made use of the fact that inactivation of the Grb2 gene in Caenorhabditis elegans (called sem-5) results in multiple defects, including abnormal development of the vulva, which can be rescued by reexpression of wildtype Grb2. We replaced the SH2 domain of Grb2 with that from Src and found that the chimeric Src-Grb2 adaptor, in which the Src SH2 domain is flanked by the Grb2 SH3 domains, was inefficient in rescuing the defect caused by loss of the endogenous gene. However, the same chimeric adaptor, but with Trp replacing Thr at the EF1 position of the Src SH2 domain, restored biological activity to nearly wild-type levels. This experiment showed that SH2 domain selectivity can be attributed to specific residues and demonstrated that the biological function of an SH2 domain correlates with its binding selectivity. The properties of SH2 domains discussed above are reflected in a number of interaction domains. The ability of post-translational modifications to regulate modular protein–protein interactions is shared by a variety of domains that selectively recognize peptide motifs after phosphorylation at tyrosine (PTB domain) or serine/threonine (FHA, WD40, MH2, Polo Box, BRCT, FF, and WW domains and 14-3-3 proteins, [78, 79], acetylation of lysine residues (bromo-domains), arginine methylation (chromodomains) [80], proline hydroxylation (VHL) [81, 82], or
10 The SH2 Domain: A Prototype for Protein Interaction Modules
monoubiquitination (UIM, CUE, UBA domains) [83]. Many of these domains show a similar structural organization to SH2 domains, in the sense that they have a cassette-like design and typically have distinct binding pockets for the modified amino acid and flanking residues, with the latter providing specificity among different members of the same domain class.
5 Signaling Pathways and Networks
SH2 domains are embedded in numerous proteins having widely different biochemical functions, which are nonetheless all directly involved in tyrosine kinase signaling owing to the pTyr-binding properties of their SH2 domains. This observation suggests an evolutionary rationale for the pervasive use of interaction domains in cell signaling. According to this scheme, new signaling pathways and cellular functions could be generated through the simple device of inserting a novel interaction domain into an already-existing polypeptide or by organizing interaction domains into new combinations. This would provide the relevant protein with innovative physical connections and would thereby link previously separate cellular processes into a common complex or pathway. This approach has several potential advantages. Incorporating an interaction domain into a specific protein is simpler than evolving a complex mechanism of allosteric regulation. In particular, the same interaction domain can be readily coupled to many different types of protein and thereby endow them with a common function, such as pTyr recognition. In addition, interaction domains can themselves provide autoregulatory and switch-like behavior when linked to catalytic domains, through a combination of intra- and intermolecular interactions [84]. In contrast, a regulatory device that depends entirely on conformational change (e.g., induced by phosphorylation) may not be easily transferred from one polypeptide to another. The various types of SH2 domain proteins can be organized into a number of different classes: Ĺ Adaptors: Proteins of the Grb2, Crk, and Nck families are composed exclusively of SH2 and SH3 domains and link specific pTyr motifs recognized by the adaptors’ SH2 domains to cellular pathways controlled by proteins that bind selectively to their SH3 domains. Each adaptor recruits a distinct set of SH3-binding proteins and thereby controls a different facet of cellular organization. Grb2, for example, binds to pTyr-X-Asn motifs through its SH2 domain (as discussed above), to the Sos guanine nucleotide exchange factors (GEF) of the Ras GTPase through its N-terminal SH3 domain [85–90], and to the Gab-1 or Gab-2 scaffolding proteins through its C-terminal SH3 domain [91, 92] (Figure 5). Neither Grb2 nor Sos are notably phosphorylated at tyrosine, and thus activation of Ras by exchange of GDP for GTP may depend primarily on physical recruitment of the Grb2-Sos complex to sites of RTK activation at the plasma membrane, where Ras is also localized. GTP-bound Ras has a number of targets,
5 Signaling Pathways and Networks 11
Fig. 5 A series of modular protein–protein and protein-phospholipid interactions mediate signaling from pTyr-X-Asn motifs on activated RTKs. The N-terminal and C-terminal SH3 domains of Grb2 bind proline-rich and basic motifs, respectively.
most significantly the Raf protein-serine/threonine kinase that stimulates the Erk MAP kinase pathway. In contrast, Grb2-associated Gab proteins become tyrosine-phosphorylated at sites that bind additional SH2 domain proteins [93], such as PI 3 kinase, which consequently generates PI-3,4,5-P3 in the plasma membrane. PIP3 , in turn, is recognized by the PH domain of the protein-serine/threonine kinase PKB/Akt [94], which is activated after recruitment to the membrane and phosphorylates a number of targets, such as Bad, Forkhead transcription factors, and Tuberin involved in apoptosis, cell cycle control, and cell growth [95]. Interestingly, the serine/threonine sites phosphorylated by PKB are often themselves directly recognized by 14-3-3 proteins, which regulate the activity of their associated phosphoproteins [96]. Because it can bind multiple targets through its SH3 domains, the Grb2 adaptor couples a single pTyr site to a series of interconnected signaling pathways, which act in a cooperative fashion to control the growth, proliferation, survival, and differentiation of growth factor-stimulated cells. These signaling events are assembled from simple, modular protein–protein and protein-phospholipid interactions, which act in concert to form a network with a broad influence on cell behavior. The SH2 and SH3 domains of Nck have different binding specificities compared with Grb2. Most strikingly, the Nck SH3 domains bind a series of proteins that regulate the actin cytoskeleton [97], such as N-WASP and the Pakserine/threonine kinase [98–100]. This correlates with the ability of Nck to couple tyrosine kinases to cytoskeletal reorganization in cultured cells and intact organisms [101, 102]. Crk, in contrast, links specific pTyr sites to proteins involved in membrane dynamics and cell adhesion, including activators of the Rap and Rac GTPases [103–108]. SH2/SH3 adaptors therefore provide a simple tool through
12 The SH2 Domain: A Prototype for Protein Interaction Modules
Ĺ
Ĺ
Ĺ
Ĺ
which tyrosine kinases influence a range of cellular activities. Many other types of cell surface receptors, such as TNF receptors, Toll-like receptors, guidance receptors, adhesion molecules and integrins, TGF(β receptors, and Wnt receptors, similarly associate with their cytoplasmic targets through modular interaction domains, often in the form of adaptor proteins [109, 110]. Scaffolds: Scaffold proteins bind multiple signaling components with complementary functions. This applies to SH2 domain proteins such as Blnk (SLP-65) and SLP-76 [111, 112], which act in B cells and T cells as hubs to organize a number of key signaling proteins (including Tec family tyrosine kinases, PLCγ , Nck, and Vav) [113–115] and therefore play a critical role in signaling from antigen receptors [116–118]. Enzymes: A number of RTK targets are enzymes with intrinsic SH2 domains. These include proteins that regulate phospholipid metabolism, such as phospholipase Cγ 1 and γ 2 and the SHIP 5 -inositol phosphatase [119]. PI 3 kinase has a noncatalytic p85 adaptor subunit that contains SH2 domains, separated by a coiled-coil region that associates with the p110 catalytic subunit [95, 120]; however, in contrast to other adaptors, p85 is a dedicated component of the PI 3-kinase complex. Other SH2-containing enzymes regulate Ras, Rho, and Rab family GTPases, as well as tyrosine phosphorylation [121–128]. These enzymes are commonly activated by the binding of their SH2 domains to pTyr sites, as discussed in more detail below, and are also frequently substrates for tyrosine phosphorylation. Regulators: A substantial family of SH2-containing proteins control the duration and localization of tyrosine kinase signals. c-Cbl is an E3 protein–ubiquitin ligase that promotes RTK monoubiquitination and internalization [129–131], and SOCS proteins inhibit cytokine receptor signaling both by directly binding and inhibiting the activity of receptor-associated JAK-tyrosine kinases and by inducing their ubiquitination [132]. The APS protein, in contrast, binds and stabilizes the activated form of the insulin receptor (see below). Transcription factors: The Stat proteins are SH2-containing transcriptional regulators that provide a direct link between activated cytokine receptors at the plasma membrane, which they bind through their SH2 domain, and gene expression in the nucleus, where they relocalize after phosphorylation and dimerization, as discussed below.
6 Plasticity of SH2 Domains
The human genome encodes some 115 SH2 domains. An analysis of existing exon/intron boundaries suggests that these SH2 domains likely evolved from a common ancestor, which may then have been selected for its versatility [133]. Consistent with this possibility, SH2 domains show significant flexibility in their binding properties, beyond their conventional capacity to discriminate between distinct pTyr-containing peptide motifs.
6 Plasticity of SH2 Domains 13
Fig. 6 The SAP SH2 domain acts as an adaptor to link the SLAM receptor in T cells to the Fyn tyrosine kinase, which consequently phosphorylates SLAM at sites that engage additional SH2-containing proteins.
Loss-of-function mutations in the SAP SH2 domain cause X-linked lymphoproliferative syndrome. R = Arg78; PM = plasma membrane.
The SAP/SH2D1A protein, for example, is composed almost entirely of a single SH2 domain [134], which binds to a tyrosine-based motif in the T-cell coreceptor SLAM [135] (Figure 6). Homotypic engagement of the SLAM extracellular region leads to tyrosine phosphorylation of its cytoplasmic tail, which regulates signaling pathways in stimulated T cells that control the pattern of cytokine production and proliferation of CD8+ T cells. This in turn ensures a limited lymphoid response to viral infection. Mutations in the human SAP gene (see below) have revealed that its product is essential for SLAM signaling, and this has uncovered unusual properties of the SAP SH2 domain. First, it binds preferentially to an extended motif having the consensus sequence Thr-Ile-(p)Tyr-x-x-Val/Ile, which conforms to its physiological binding site on SLAM (Tyr281) [136, 137]. Surprisingly, the SAP SH2 domain binds an unphosphorylated Tyr281 SLAM peptide with a K d of 300 nM, and there is only about a 3-fold increase in affinity upon tyrosine phosphorylation [136–138]. As a consequence, SAP can bind to both unphosphory-lated and phosphorylated forms of SLAM in T cells. Biochemical and structural analysis has shown that the SAP SH2 domain effectively has three binding pockets, one for the central (p)Tyr, another for the +3 Val, and a third that accommodates the Thr at the −2 position of the peptide ligand [136, 137] (Figure 7). It is this unusual interaction with an N-terminal peptide residue that enhances pTyr-independent binding to the SAP SH2 domain. Presumably, the great majority of SH2 domains do not exploit this potential to bind unmodified peptides, because a relatively high affinity for the unphosphorylated ligand would nullify the switchlike properties of tyrosine phosphorylation in inducing SH2 domain recognition. The SAP SH2 domain also has an adaptor function, due to a second binding surface that associates with the SH3 domain of a Src-like cytoplasmic tyrosine kinase, Fyn [139]. Although SH3 domains usually recognize proline-rich motifs, some bind to basic sites [140]; in the Fyn–SAP complex, the Fyn SH3 domain recognizes such a basic surface on the SAP SH2 domain, focused on Arg78 [141, 142]. This surface of the SAP SH2 domain does not overlap the conventional peptide binding site, and the SAP SH2 domain can therefore bind simultaneously
14 The SH2 Domain: A Prototype for Protein Interaction Modules
Fig. 7 Recognition by the SAP SH2 domain of a Tyr-based motif in SLAM in both its unphosphorylated and phosphorylated states. The peptide binding surface of the SAP SH2 domain is in grey, and the Arg $B5 residue in blue. The SH2 domain is
complexed with either the phosphorylated SLAM peptide (TIpYAQV) (top) or its unphosphorylated counterpart (bottom). Note the three binding pockets for the −2 Thr, (p)Tyr, and the +3 Val, respectively.
to SLAM and the Fyn tyrosine kinase. The SAP-binding site on the Fyn SH3 domain, however, overlaps the regions involved in PxxP recognition and the autoinhibitory intramolecular interaction that represses Fyn activity. SAP therefore bridges SLAM to active Fyn, which consequently phosphorylates SLAM at additional tyrosine sites for the recruitment of SH2-containing effectors such as the SHIP 5 -inositol phosphatase and RasGAP (Figure 6). A number of more general points can be taken from this scheme. First, the reiterated use of rather simple binary interactions can be employed to assemble more sophisticated multiprotein complexes. Second, interaction domains such as SH2 and SH3 can potentially employ their primary binding surface to engage distinct types of peptide ligand (i.e., phosphorylated or unphosphorylated peptides for SH2 domains, proline-rich or basic motifs for SH3 domains). Third, a single domain can use distinct surfaces to bind distinct types of ligand, as shown by the ability of the SAP SH2 domain to engage both SLAM and Fyn. These are themes that are frequently encountered in the analysis of interaction domains. For example, PTB domains can bind both tyrosine phosphorylated and unphosphorylated peptide ligands [143–145], as well as phospholipids [146]; indeed, a number of interaction domains with functions ranging from phosphoinositide recognition to the binding of proline-rich sequences have the same fold as PTB domains [147].
7 SH2 Domain Dimerization 15
7 SH2 Domain Dimerization
The versatility with which SH2 domains bind pTyr-containing sequences is exemplified by the SH2 domain of the APS adaptor, which recognizes the activation loop of the insulin receptor kinase (IRK) catalytic domain. Upon insulin stimulation, the activation loop of IRK is autophosphorylated on three tyrosine residues and is consequently displaced from an inhibitory conformation that occludes the active site. One of these phosphorylated tyrosines (pTyr1163) interacts with residues in the kinase domain to lock the activation segment in place, but the other two (pTyr1158 and pTyr1162) are exposed [148]. The APS SH2 domain binds to these two pTyr residues in an unusual fashion [149]. The C-terminal region of the APS SH2 domain forms an unusually long αB helix, and this allows two SH2 domain monomers to dimerize through an extensive hydrophobic interface. pTyr1158 of IRK occupies the normal pTyr-binding pocket of the SH2 domain, but the subsequent residues of the IRK activation loop are sterically blocked from following a typical path perpendicular to the central β sheet, due to the C-terminal helix of the other monomer in the APS SH2 dimer. Rather, the activation loop binds in the plane of the β sheet, facilitated by numerous specific interactions with the IRK kinase domain. Notably, there is a second pTyr-binding site in the APS SH2 domain, which is occupied by pTyr1162 of the IRK activation loop. Thus, two IRK chains are bridged by two (dimeric) APS SH2 domains. APS may therefore enhance and prolong the activated state of IRK by protecting the phosphorylated activation loop from phosphatases and stabilizing the active form of the kinase domain (Figure 8). These results emphasize several points made above (i.e., the ability of SH2 domains to bind multiple ligands – here, the phosphorylated IRK and another APS SH2 domain – and the plasticity of SH2 domain pTyr-binding sites), but also demonstrate that SH2 domains can form homodimers. The Grb10 SH2 domain also forms a dimer through an interface that involves the C-terminal α helix (αB) [150]. In addition the SH2 domains of the STAT transcription factors dimerize in response to cytokine stimulation (see below) [151]. In this case, however, dimerization is induced by phosphorylation of a tyrosine just C-terminal to the SH2 domain,
Fig. 8 The APS SH2 domain homodimerizes and binds the phosphorylated activation loop of the insulin receptor kinase (IRK). IRK is itself a heterotetramer; only the cytoplasmic regions of its $ subunits are shown.
16 The SH2 Domain: A Prototype for Protein Interaction Modules
which leads to reciprocal intermolecular SH2 domain-pTyr interactions between two STAT proteins. Oligomerization is a common theme among interaction domains, which frequently self-associate into homo- or heterodimers.
8 Tandem SH2 Domains
A number of proteins, including the Syk and ZAP-70 cytoplasmic tyrosine kinases, PLC-γ , RasGAP, and the Shp-1 and Shp-2 tyrosine phosphatases, have two SH2 domains arranged in tandem (see Figure 1) and bind preferentially to doubly phosphorylated sites in which the two pTyr residues are closely spaced. Tandem SH2 domains appear to bind such bisphosphorylated motifs with significantly increased affinity and specificity, as compared to the interaction of a single SH2 domain with a monophosphorylated peptide [152]. This principle is demonstrated by the Syk and ZAP-70 tyrosine kinases, which mediate signaling by immuno-receptors, notably the B-cell and T-cell antigen receptors (BCR/TCR), and Fc receptors [153]. Upon activation of antigen receptors, Src family tyrosine kinases phosphorylate BCR or TCR signaling subunits on ITAM sequences (for immuno-receptor tyrosine-based activation motif). ITAMs have the consensus YxxI/L-x6–8 -YxxI/L, and their efficient binding to the Syk or ZAP-70 tandem SH2 domains requires that both tyrosines be phosphorylated [154]. Each of the two pTyr motifs within an ITAM binds in a conventional orientation to a single SH2 domain, such that the N-terminal pTyr associates with the C-terminal SH2 domain, and the C-terminal pTyr with the Nterminal SH2 domain. Strikingly, the pTyr binding pocket of the N-terminal SH2 domain of ZAP-70 is formed in part by residues from the C-terminal SH2 domain [155]. As a consequence, binding of the tandem ZAP–70 SH2 domains to doubly phosphorylated ITAM motifs in the TCR complex is strongly dependent on the spacing of the two phosphorylated tyrosines. The Syk SH2 domains are autonomous and more flexible and can bind a wider range of ITAM motifs with variable spacing between the pTyr residues [156, 157]. These data make the point that multiple interaction domains can be joined to create more-specific protein–protein interactions than those displayed by any one domain alone. Indeed, many signaling proteins contain several distinct interaction domains, often with quite different binding properties, which can potentially bind cooperatively to their targets.
9 Composite and Complex Interaction Domains
The c-Cbl protein is an E3 protein–ubiquitin ligase with an N-terminal SH2 domain, which binds specific pTyr motifs on autophosphorylated RTKs, and a central ring domain that recruits an E2 ubiquitin ligase and thus induces the monoubiquitination of receptors associated with the SH2 domain. The c-Cbl SH2 domain is unusual, in the sense that it is embedded in a larger folded structure that also
10 Allosteric Regulation 17
contains a four-helix bundle and an EF hand [158]. Although phosphopeptides bind the SH2 domain in a conventional fashion, the associated subdomains may participate in target recognition. The SH2 domain of STAT transcription factors is also part of a larger structural unit, since, unlike most SH2 domains, those from STAT proteins are not functional when expressed in isolation from their surrounding sequences. The core region of STATs contains a helical coiled-coil sequence, a DNA binding region, and a linker region followed by the SH2 domain; C-terminal to this core lies a transactivation sequence that regulates transcription. Activated cytokine receptors at the plasma membrane induce phosphorylation of a STAT tyrosine residue located immediately C-terminal to the SH2 domain, and this results in STAT dimerization through reciprocal SH2–pTyr interactions. STAT dimers are recruited to the nucleus, where they bind specific promoters to regulate gene expression. Structural analysis of the dimeric core region shows that the four domains (coiled-coil, DNA binding, linker, SH2) are interlocked through a common hydrophobic core and extensive interfaces between the domains [159, 160]. Of interest, the phosphate binding loop of the SH2 domain interacts directly with the linker domain, which in turn associates with the DNA binding domain; thus, there is potential for direct coupling between pTyr binding to the SH2 domain and DNA binding. The sequence following the SH2 domain of one monomer is organized so that, upon phosphorylation, it can engage only the SH2 domain of a partner STAT, rather than undergoing intramolecular interaction. STAT proteins therefore provide an elegant example of a functionally and structurally complex polypeptide built from smaller protein–protein and protein-nucleic acid interaction domains.
10 Allosteric Regulation
In enzymes such as the Src and Abl cytoplasmic tyrosine kinases or the Shp-2 protein-tyrosine phosphatase (PTP), reversible intramolecular interactions between the SH2 and catalytic domains provide the basis for autoinhibition of enzymatic activity and for switch-like activation after these repressive contacts are broken. The Shp-2 tyrosine phosphatase has two tandem SH2 domains, followed by a phosphatase domain and a C-terminal tail. In the autoinhibited state, the N-terminal (N-) SH2 domain undergoes an extensive intramolecular interaction with the active site of the phosphatase domain [161]. The D E loop of the SH2 domain buries into the PTP catalytic cleft, directly engages the catalytic Cys nucleophile and the phosphate binding cradle of the active site, and locks the PTP domain in an open, inactive conformation. This interaction also distorts the conformation of the NSH2 domain by imposing movements of the αB helix and the EF loop that occlude the phosphopeptide binding cleft; as a consequence, although the phosphopeptide binding site of the N-SH2 domain is exposed to solvent in the autoinhibited state, it is not properly configured for ligand recognition. The C-SH2 domain, in contrast, makes minimal contact with the N-SH2 or PTP domains, but could promote Shp-2
18 The SH2 Domain: A Prototype for Protein Interaction Modules
activation by binding one pTyr site on a bisphosphorylated peptide. This would increase the local concentration of the doubly phosphorylated motif and promote binding of the second pTyr site to the N-SH2 domain. The resulting reorganization of the N-SH2 domain would then break its interaction with the PTP domain, resulting in activation of the phosphatase. In this scheme, the N-SH2 domain has two distinct ligand binding sites (for a phosphopeptide or the PTP domain), but, in contrast to the examples discussed above, these interactions display negative cooperativity and are mutually incompatible. In Shp-2, the specific recognition of a bisphosphorylated motif by the tandem SH2 domains is thus precisely coupled to activation of the phosphatase. This device ensures that Shp-2 phosphatase activity is turned on only as the SH2 domains bind an appropriate target. A related mechanism is exploited by the Src and Abl tyrosine kinases, although the details are very different. In both Src and Abl, an SH3 domain is followed by an SH2 domain, which precedes the kinase domain. In Src family kinases, autoinhibition involves phosphorylation of a Tyr (Tyr527) in the extreme C-terminal tail by another tyrosine kinase, Csk [124]. This induces an intramolecular interaction with the SH2 domain and positions the SH3 domain to bind the linker region between the SH2 and kinase domains (Figure 9) [162, 163]. In this autoinhibited conformation, the conventional binding clefts of both the SH2 and SH3 domains engage internal ligands and are therefore protected from adventitious interactions with other polypeptides. At the same time, these interactions inactivate the kinase domain, although in contrast to Shp-2, this is achieved indirectly. Dynamic simulations and mutagenesis experiments have suggested that the linker between the SH2 and SH3 domains forms a rigid clamp in the autoinhibited state [164]. Since the SH2 domain interacts with the large lobe of the kinase and the SH3 domain with the small lobe, this clamp prevents flexibility at the intervening active site required for the exchange of ATP and ADP. Also, in this autoinhibited conformation, the regulatory tyrosine in the activation loop is sequestered and therefore less prone to autophosphorylation and resulting kinase activation, and the C helix in the small lobe undergoes a rotation that removes a key glutamate from the active site. This autoinhibited state can be broken in several ways. Most simply, dephosphorylation of Tyr527 leads both to dissociation of the SH2 and SH3 domains, which can then bind exogenous proteins with appropriate motifs, and to activation of the kinase domain, which can phosphorylate targets engaged by the adjacent interaction domains. Alternatively, an exogenous protein with high-affinity binding motifs for the Src SH3 or SH2 domains can compete for the inhibitory intramolecular interactions and thereby activate and localize the kinase domain. Therefore, as with Shp-2, the autoinhibited state both sequesters the interaction domains and inactivates the catalytic domain, and activation involves coordinate stimulation of the kinase domain and tethering of the SH2/SH3 domains to potential substrates. In the v-Src oncoprotein, deletion of the C-terminal tail removes the autoinhibitory Tyr527 site, and the tyrosine kinase becomes constitutively active (Figure 9). The Abl kinase lacks an inhibitory phosphorylation site C-terminal to the kinase domain, but undergoes a very similar mode of autoinhibition as Src. In the Abl1b isoform, an N-terminal myristate group is inserted into the large helical lobe of the
11 SH2 Domains and Disease
Fig. 9 The SH2 and SH3 domains of the c-Src tyrosine kinase regulate its activity and substrate specificity. Activated c-Src can be targeted to its substrates through SH2/SH3-mediated interaction, which pro-
motes the progressive phosphorylation of proteins such as Cas. Loss of the regulatory C terminus converts Src into a constitutively active tyrosine kinase.
kinase, inducing a conformational distortion that creates a direct interface between the large lobe of the kinase domain and the SH2 domain [165]. This interaction, which involves the αA helix of the SH2 domain, is incompatible with phosphopeptide binding and positions the SH3 domain for interaction with the SH2-kinase linker, in a similar fashion to Src. The precise mechanism through which this inactivates the kinase domain is somewhat different for Abl than for Src, but the principles are the same. Interestingly, the specificity of the kinase inhibitor Gleevec (STI571/Imanitib) for Abl as compared with Src is due to the distinct autoinhibited conformation imposed on the Abl kinase by its associated SH2 domain [165]. Gleevec binds exclusively to Abl in the autoinhibited conformation and therefore disturbs the equilibrium between inactive and active kinases, with beneficial effects in the treatment of chronic myelogenous leukemia. Under physiological conditions, Abl could be activated through loss of the N-terminal myristate group, which would abolish the conformation of the large lobe required for SH2 binding, or through competition by exogenous proteins for interactions with the SH2 and SH3 domains [166]. The regulatory device in which interaction domains undergo an intramolecular interaction with a catalytic domain or another interaction surface, resulting in reversible autoinhibition, is emerging as a common theme in intracellular regulatory proteins. Similarly, signaling enzymes are typically localized within the cell and docked onto their substrates through noncatalytic protein–protein and proteinphospholipid interactions.
11 SH2 Domains and Disease
Mutations in either SH2 domains or their binding motifs are associated with a number of human disorders. Missense mutations in the gene for human Shp-2 (PTPN11) cause Noonan’s syndrome (NS), a developmental disorder associated
19
20 The SH2 Domain: A Prototype for Protein Interaction Modules
with cardiac defects, facial dysmorphism, and skeletal abnormalities [167]. The substitutions that cause NS map to the autoinhibitory interface between the NSH2 and PTP domains. As a consequence, these mutations abrogate the inhibitory interaction between the SH2 and phosphatase domains without disrupting pTyr binding or catalytic activity. NS therefore results from activating mutations that selectively block the ability of the N-SH2 domain to inhibit the PTP domain. Somatic mutations that activate Shp-2 are also common in juvenile myelomonocytic leukemia (JMML) and are present in other myeloid leukemias [168, 169]. The Shp-2 mutations in JMML are almost all in the N-SH2 domain and block autoinhibition, resulting in aberrant phosphatase activation. Although it is counterintuitive that a tyrosine phosphatase would promote malignant cell proliferation, Shp-2 activates signaling through growth stimulatory pathways such as Erk MAP kinase by dephosphorylating inhibitory pTyr sites. Loss-of-function mutations in the Btk cytoplasmic tyrosine kinase cause a B-cell immunodeficiency termed X-linked agammaglobulinemia (XLA), which is associated with a deficiency in mature B cells and susceptibility to bacterial infection. Btk has an N-terminal PH domain, followed by SH3, SH2, and kinase domains, which adopt an extended conformation with little interaction between the domains [170]. Btk is activated by the binding of its PH domain to PI-3,4,5-P3 at the membrane [171] and by tyrosine phosphorylation, and associates with the phosphorylated Blnk/SLP-65 scaffold through its SH2 domain [172]. Btk in turn stimulates a number of downstream pathways involved in B cell activation, for example, through the phosphorylation of PLC-γ 2 and consequent calcium release [173]. XLA missense mutations can affect either the PH, SH3, SH2, or kinase domains and result in a loss of ligand binding, kinase activity, or protein stability [174, 175]. A distinct inherited immune defect, X-linked lymphoproliferative syndrome (XLP; also called Purtilo’s syndrome or Duncan’s disease), results from mutations in the SAP/SH2D1A gene [135, 176, 177]. Young boys with XLP are unable to control infections of Epstein–Barr virus (EBV) and typically develop a fulminant infectious mononucleosis, which is usually fatal as a result of severe tissue necrosis and organ failure. Surviving patients commonly develop malignant lymphomas. Many SAP mutations cause substitutions in the SH2 domain, resulting in either a loss of binding to SLAM or protein destabilization (or both) (Figures 6 and 7) [178, 179]. In the absence of SAP, the Fyn tyrosine kinase cannot be recruited to SLAM or related receptors, and this aspect of lymphoid regulation is lost. SH2 domains are also indirectly influenced by a wide range of disease-causing mutations. Notably, oncogenic mutations in RTKs or cytoplasmic tyrosine kinases activate ectopic pTyr–SH2 domain interactions that promote intracellular signaling and lead to a malignant state. An example is the Tyr177 site located in the BCR region of the chimeric Bcr–Abl oncoprotein that causes chronic myelogenous leukemia. In the context of Bcr–Abl, Tyr177 is phosphorylated, and because it lies in a Tyr-x-Asn motif, recruits the SH2 domain of the Grb2 adaptor [180, 181], which in turn interacts through its SH3 domains with Sos and Gab-2 (Figure 10). These latter proteins stimulate the Ras-MAP kinase and PI 3 kinase pathways, in a fashion that correlates with malignancy [182].
11 SH2 Domains and Disease
Fig. 10 Aberrant SH2–pTyr interactions in cancer and bacterial infection. (a) Schematic of the Bcr–Abl tyrosine kinase. Tyr177 in the Bcr-encoded region is phosphorylated and binds Grb2. The Crkl adaptor binds a proline-rich motif in the C terminus and is itself a substrate for phosphoryla-
tion. (b) The TIR protein encoded by enteropathogenic Escherichia coli (EPEC) is phosphorylated at a tyrosine motif in its Cterminal cytoplasmic tail that binds the SH2 domain of the Nck adaptor, which recruits regulators of the actin cytoskeleton through its SH3 domains.
Similarly, pathogenic proteins encoded by human viruses or bacteria can form aberrant interactions with the SH2 domains of host cell proteins and thus rewire cellular functions, to the advantage of the pathogen. Examples include the latent membrane protein (LMP) 2A of EBV, which spans the membrane of infected B cells 12 times and has an N-terminal cytoplasmic region with three sites of tyrosine phosphorylation; these pTyr motifs bind to Src family tyrosine kinases and to the tandem SH2 domains of Syk. By recruiting these B-cell tyrosine kinases, LMP2A appears to interfere with normal signaling through the BCR and at the same time provide a survival signal [183, 184]. This helps to maintain viral latency by simultaneously blocking activation of the infected B cell, which would induce a lytic state, and keeping the cell alive [185]. In a related vein, the CagA protein of Helicobacter pylori is phosphorylated by Src family kinases at multiple tyrosine motifs that bind the SH2 domains of Shp-2, which is important for morphological changes and increased motility of infected epithelial cells [186]. As a further example, the Tir protein of enteropathogenic E. coli is inserted into the host plasma membrane and is phosphorylated at a cytoplasmic Tyr site that recruits the Nck SH2/SH3 adaptor and its binding partner N-WASP to stimulate actin polymerization [187]. As a consequence, the bacterium induces abnormal actin polymerization (Figure 10), resulting in massive surface protrusions to which the bacterium adheres. In a striking example of convergent evolution, the A36R protein of vaccinia virus employs a similar motif to recruit Nck and induce actin polymerization, which drives the virus particle through the cell [188]. These observations regarding SH2 domains establish a precedent for the role of protein–protein interactions in disease states, and indeed there are now many examples of mutations that cause human disease by interfering with the functions of interaction domains. Another consideration is that naturally occurring non-synonomous polymorphisms might be a predisposing factor in disease by influencing protein–protein interactions. For example, the SKG strain of mice spontaneously develop chronic arthritis, caused by a polymorphism that results in a Trp-to-Cys substitution at the start of the C-SH2 domain of ZAP-70 [189]. This impairs the association of ZAP-70 with TCR and leads to attenuated T-cell
21
22 The SH2 Domain: A Prototype for Protein Interaction Modules
signaling. This in turn favors the survival of autoreactive T cells that would otherwise be eliminated through negative selection and apoptosis. Finally, it is likely that small molecules that modify protein–protein interactions will become more prominent in the treatment of disease. Such compounds could potentially block disease-associated protein interactions, stabilize existing complexes, or drive the interactions of proteins that are not normally associated. Indeed, immunosuppressant drugs such as cyclosporin and rapamycin function in the latter mode, by bridging an interaction between immunophilins and the protein phosphatase calcinuerin or the Tor protein kinase, respectively, which leads to inhibition of these enzymes and blockage of T-cell activation. Although protein–protein interactions are generally considered to be problematic drug targets, recent data have suggested the feasibility of inhibiting specific protein–protein interactions in therapeutically useful ways. For example, a small molecule that blocks the binding of p53 to Mdm2 stimulates the p53 pathway and has antitumor activity against cancer cells that retain wild-type p53 [190]. 12 Conclusions
SH2 domains have a very simple function, namely, to recognize specific pTyrcontaining sequences and thereby to couple tyrosine kinases to intracellular pathways that control cell growth, proliferation, metabolism, and differentiation. However, they also exhibit complex properties, through their ability to regulate both the substrate specificity and catalytic activity of enzymes in which they are embedded. In addition, by acting cooperatively with other interaction domains and by providing the basis for networks of interacting proteins (as in activated lymphocytes), they can exert a broad effect on cellular behavior. Perhaps most importantly, these functional attributes of SH2 domains have yielded a framework for the discovery and analysis of a large family of interaction domains, which has crystallized a new way of thinking about the lexicon of cellular organization. References 1 ECKHART, W., HUTCHINSON, M. A., HUNTER, T., An activity phosphorylating tyrosine in polyoma T antigen immuno-precipitates. Cell 1979, 18, 925–933. 2 SEFTON, B. M., HUNTER, T., BEEMON, K., ECKHART, W., Evidence that the phosphorylation of tyrosine is essential for cellular transformation by Rous sarcoma virus. Cell 1980, 20, 807–816. 3 WITTE, O. N., DASGUPTA, A., BALTIMORE, D., Abelson murine leukaemia virus protein is phosphorylated in vitro to
form phosphotyrosine. Nature 1980, 283, 826–831. 4 PAWSON, T., GUYDEN, J., KUNG, T. H., RADKE, K., GILMORE, T., MARTIN, G. S., A strain of Fujinami sarcoma virus which is temperature-sensitive in protein phosphorylation and cellular transformation. Cell 1980, 22, 767–775. 5 USHIRO, H., COHEN, S., Identification of phosphotyrosine as a product of epidermal growth factor-activated protein kinase in A-431 cell membranes. J. Biol. Chem. 1980, 255, 8363–8365.
References 6 EK, B., WESTERMARK, B., WASTESON, A., HELDIN, C. H., Stimulation of tyrosine-specific phosphorylation by platelet-derived growth factor. Nature 1982, 295, 419–420. 7 PETRUZZELLI, L. M., GANGULY, S., SMITH, C. J., COBB, M. H., RUBIN, C. S., ROSEN, O. M., Insulin activates a tyrosine-specific protein kinase in extracts of 3T3-L1 adipocytes and human placenta. Proc. Natl. Acad. Sci. USA 1982, 79, 6792–6796. 8 HUNTER, T., COOPER, J. A., Epidermal growth factor induces rapid tyrosine phosphorylation of proteins in A431 human tumor cells. Cell 1981, 24, 741–752. 9 SCHREIBER, A. B., LIBERMANN, T. A., LAX, I., YARDEN, Y., SCHLESSINGER, J., Biological role of epidermal growth factor-receptor clustering: investigation with monoclonal anti-receptor antibodies. J. Biol. Chem. 1983, 258, 846–853. 10 DOWNWARD, J., YARDEN, Y., SCRACE, G., TOTTY, N., STOCKWELL, P., ULLRICH, A., SCHLESSINGER, J., WATERFIELD, M. D., Close similarity of epidermal growth factor receptor and v-erb-B oncogene protein sequences. Nature 1984, 307, 521–527. 11 STERN, D. F., HEFFERNAN, P. A., WEINBERG, R. A., p185, a product of the neu proto-oncogene, is a receptor-like protein associated with tyrosine kinase activity. Mol. Cell Biol. 1986, 6, 1729–1740. 12 ROSEN, O. M., HERRERA, R., OLOWE, Y., PETRUZZELLI, L. M., COBB, M. H., Phosphorylation activates the insulin receptor tyrosine protein kinase. Proc. Natl. Acad. Sci. USA 1983, 80, 3237–3240. 13 WEINMASTER, G., ZOLLER, M. J., SMITH, M., HINZE, E., PAWSON, T., Mutagenesis of Fujinami sarcoma virus: evidence that tyrosine phosphorylation of P130gag-fps modulates its biological activity. Cell 1984, 37, 559–568. 14 HUBBARD, S. R., TILL, J. H., Protein tyrosine kinase structure and function. Annu. Rev. Biochem. 2000, 69, 373–398.
15 HUBBARD, S. R., WEI, L., ELLIS, L., HENDRICKSON, W. A., Crystal structure of the tyrosine kinase domain of the human insulin receptor. Nature 1994, 372, 746–754. 16 MOHAMMADI, M., SCHLESSINGER, J., HUBBARD, S. R., Structure of the FGF receptor tyrosine kinase domain reveals a novel autoinhibitory mechanism. Cell 1996, 86, 577–587. 17 STONE, J. C., ATKINSON, T., SMITH, M. E., PAWSON, T., Identification of functional regions in the transforming protein of Fujinami sarcoma virus by in-phase insertion mutagenesis. Cell 1984, 37, 548–558. 18 SHIBUYA, M., HANAFUSA, H., Nucleotide sequence of Fujinami sarcoma virus: evolutionary relationship of its transforming gene with transforming genes of other sarcoma viruses. Cell 1982, 30, 787–795. 19 ASPENSTROM, P., A cdc42 target protein with homology to the non-kinase domain of FER has a potential role in regulating the actin cytoskeleton. Curr. Biol. 1997, 7, 479–487. 20 TAKAHASHI, S., INATOME, R., HOTTA, A., QIN, Q., HACKENMILLER, R., SIMON, M. C., YAMAMURA, H., YANAGAI, S., Role for Fes/Fps tyrosine kinase in microtubule nucleation through is Fes/CIP4 homology domain. J. Biol. Chem. 2004, 278, 49129–49133. 21 STONE, J. C., PAWSON, T., Correspondence between immunological and functional domains in the transforming protein of Fujinami sarcoma virus. J. Virol. 1985, 55, 721–727. 22 SADOWSKI, I., STONE, J. C., PAWSON, T., A non-catalytic domain conserved among cytoplasmic protein-tyrosine kinases modifies the kinase function and transforming activity of Fujinami sarcoma virus P130gag-fps. Mol. Cell Biol. 1986, 4396–4408. 23 DECLUE, J. E., SADOWSKI, G. S., MARTIN, G. S., PAWSON, T., A conserved domain regulates interactions of the v-fps protein-tyrosine kinase with the host cell. Proc. Natl. Acad. Sci. USA 1987, 84, 9064–9068.
23
24 The SH2 Domain: A Prototype for Protein Interaction Modules
24 KOCH, A., MORAN, M., SADOWSKI, I., PAWSON, T., The common src homology region 2 domain of cytoplasmic signaling proteins is a positive effector of v-fps tyrosine kinase function. Mol. Cell Biol. 1989, 9, 4131–4140. 25 MAYER, B. J., HAMAGUCHI, M., HANAFUSA, H., A novel viral oncogene with structural similarity to phospholipase C. Nature 1988, 332, 272–275. 26 STAHL, M. L., FERENZ, C. R., KELLEHER, K. L., KRIZ, R. W., KNOPF, J., Sequence similarity of phospholipase C with the non-catalytic region of src. Nature 1988, 332, 269–272. 27 TRAHEY, M., et al., Molecular cloning of two types of GAP complementary DNA from human placenta. Science 1988, 242, 1697–1700. 28 VOGEL, U. S., DIXON, R. A. R., SCHABER, M. D., DIEHL, R. E., MARSHALL, M. S., SCOLNICK, E. M., SIGAL, I. S., GIBBS, J. A., Cloning of bovine GAP and its interaction with oncogenic ras p21. Nature 1988, 335, 90–93. 29 REN, R., MAYER, B. J., CICCHETTI, P., BALTIMORE, D., Identification of a tenamino acid proline-rich SH3 binding site. Science 1993, 259, 1157–1161. 30 MARGOLIS, B., et al., EGF induces tyrosine phosphorylation of phospholipase C-II: a potential mechanism for EGF signalling. Cell 1989, 57, 1101–1107. 31 MEISENHELDER, J., SUH, P.-G., RHEE, S. G., HUNTER, T., Phospholipase C is a substrate for the PDGF and EGF receptor protein-tyrosine kinases in vivo and in vitro. Cell 1989, 57, 1109–1122. 32 WAHL, M. I., NISHIBE, S., SUH, P.-G., RHEE, S. G., CARPENTER, G., Epidermal growth factor stimulates tyrosine phosphorylation of phospholipase C (II) independently of receptor internalization and extracellular calcium. Proc. Natl. Acad. Sci. USA 1989, 86, 1568–1572. 33 MOLLOY, C. J., BOTTARO, D. P., FLEMING, T. P., MARSHALL, M. S., GIBBS, J. B., AARONSON, S. A., PDGF induction of tyrosine phosphorylation of GTPase activating protein. Nature 1989, 342, 711–714.
34 ELLIS, C., MORAN, M., MCCORMICK, F., PAWSON, T., Phosphorylation of GAP and GAP-associated proteins by transforming and mitogenic tyrosine kinases. Nature 1990, 343, 377–381. 35 PAWSON, T., Non-catalytic domains of cytoplasmic protein-tyrosine kinases: regulatory elements in signal transduction. Oncogene 1988, 3, 491–495. 36 KUMJIAN, D. A., WAHL, M. I., RHEE, S. G., DANIEL, T. O., Platelet-derived growth factor (PDGF) binding promotes physical association of PDGF receptor with phospholipase C. Proc. Natl. Acad. Sci. USA 1989, 86, 8232–8236. 37 KAZLAUSKAS, A., COOPER, J. A., Autophosphorylation of the PDGF receptor in the kinase insert region regulates interactions with cell proteins. Cell 1989, 58, 1121–1133. 38 ANDERSON, D., KOCH, C. A., GREY, L., ELLIS, C., MORAN, M. F., PAWSON, T., Binding of SH2 domains of phospholipase Cγ 1, GAP, and Src to activated growth factor receptors. Science 1990, 250, 979–982. 39 MORAN, M. F., KOCH, C. A., ANDERSON, D., ELLIS, C., ENGLAND, L., MARTIN, G. S., PAWSON, T., Src homology region 2 domains direct protein–protein interactions in signal transduction. Proc. Natl. Acad. Sci. USA 1990, 87, 8622–8626. 40 MAYER, B. J., HANAFUSA, H., Association of the v-crk oncogene product with phosphotyrosine-containing proteins and protein kinase activity. Proc. Natl. Acad. Sci. USA 1990, 87, 2638–2642. 41 MATSUDA, M., MAYER, B. J., FUKUI, Y., HANAFUSA, H., Binding of oncoprotein, p47gag-crk, to a broad range of phospho-tyrosine-containing proteins. Science 1990, 248, 1537–1539. 42 MARGOLIS, B., LI, N., KOCH, A., MOHAMMADI, M., HURWITZ, D. R., ZILBERSTEIN, A., ULLRICH, A., PAWSON, T., SCHLESSINGER, J., The tyrosine phosphorylated carboxyterminus of the EGF receptor is a binding site for GAP and PLC-gamma. EMBO J. 1990, 9, 4375–4380. 43 MAYER, B. J., JACKSON, P. K., BALTIMORE, D., The noncatalytic src homology
References
44
45
46
47
48
49
50
51
52
region 2 segment of abl tyrosine kinase binds to tyrosine-phosphorylated cellular proteins with high affinity. Proc. Natl. Acad. Sci. USA 1991, 88, 627–631. ESCOBEDO, J. A., KAPLAN, D. R., KAVANAUGH, W. M., TURCK, C. W., WILLIAMS, L. T., A phosphatidylinositol-3 kinase binds to platelet-derived growth factor receptors through a specific receptor sequence containing phosphotyrosine. Mol. Cell Biol. 1991, 11, 1125–1132. KOCH, C. A., MORAN, M. F., ANDERSON, D., LIU, X., MBAMALU, G., PAWSON, T., Multiple SH2-mediated interactions in v-src-transformed cells. Mol. Cell Biol. 1992, 12, 1366–1374. REEDIJK, M., LIU, X., van der GEER, P., LETWIN, K., WATERFIELD, M. D., HUNTER, T., PAWSON, T., Tyr721 regulates specific binding of the CSF-1receptor kinase insert to PI 3 -kinase SH2 domains: a model for SH2-mediated receptor-target interactions. EMBO J. 1992 11, 1365–1372. KOCH, C. A., ANDERSON, D., MORAN, M. F., ELLIS, C., PAWSON, T., SH2 and SH3 domains: Elements that control interactions of cytoplasmic signaling proteins. Science 1991, 252, 668–674. PAWSON, T., NASH, P., Assembly of cell regulatory systems through protein interaction domains. Science 2003, 300, 445–452. TONG, A. H., et al., A combined experimental and computational strategy to define protein interaction networks for peptide recognition modules. Science 2002, 295, 321–324. KURIYAN, J., COWBURN, D., Modular peptide recognition domains in eukaryotic signaling. Annu. Rev. Biophys. Biomol. Struct. 1997, 26, 259–288. BRADSHAW, J. M., WAKSMAN, G., Molecular recognition by SH2 domains. Adv. Protein Chem. 2002, 61, 161–210. PICCIONE, E., CASE, R. D., DOMCHEK, S. M., HU, P., CHAUDHURI, M., BACKER, J. M., SCHLESSINGER, J., SHOELSON, S. E., Phosphatidylinositol 3-kinase p85 SH2 domain specificity defined by direct phosphopeptide/SH2 domain binding. Biochemistry 1993, 32, 3197–3202.
53 LEMMON, M. A., LADBURY, J. E., Thermodynamic studies of tyrosylphosphopeptide binding to the SH2 domain of p56lck. Biochemistry 1994, 33, 5070–5076. 54 BRADSHAW, J. M., MITAXOV, V., WAKSMAN, G., Mutational investigation of the specificity determining region of the Src SH2 domain. J. Mol. Biol. 2000, 299, 521–535. 55 DOMCHEK, S. M., AUGER, K. R., CHATTERJEE, S., BURKE, T. R. Jr., SHOELSON, S. E., Inhibition of SH2 domain/phospho-protein association by a nonhydrolyzable phosphonopeptide. Biochemistry 1992, 31, 9865–9870. 56 SONGYANG, Z., et al., Identification of phosphotyrosine peptide motifs which bind to SH2 domains. Cell 1993, 72, 767–778. 57 SONGYANG, Z., et al., Specific motifs recognized by the SH2 domains of Csk, 3BP2, fes/fps, Grb2, SHPTP1, SHC, Syk and vav. Mol. Cell Biol. 1994, 14, 2777–2785. 58 LADBURY, J. E., LEMMON, M. A., ZHOU, M., GREEN, J., BOTFIELD, M. C., SCHLESSINGER, J., Measurement of the binding of tyrosyl phosphopeptides to SH2 domains: a reappraisal. Proc. Natl. Acad. Sci. USA 1995, 92, 3199–3203. 59 PORTER, M., SCHINDLER, T., KURIYAN, J., MILLER, W. T., Reciprocal regulation of Hck activity by phosphorylation of Tyr(527) and Tyr(416). Effect of introducing a high affinity intramolecular SH2 ligand. J. Biol. Chem. 2000, 275, 2721–2726. 60 PAWSON, T., Protein modules and signalling networks. Nature 1995, 373, 573–580. 61 PONZETTO, C., et al., A multifunctional docking site mediates signaling and transformation by the hepatocyte growth factor/scatter factor receptor family. Cell 1994, 77, 261–271. 62 WAKSMAN, G., et al., Crystal structure of the phosphotyrosine recognition domain (SH2) of the v-src tyrosine kinase complexed with tyrosine phosphorylated peptides. Nature 1992, 358, 646–653. 63 OVERDUIN, M., RIOS, C. B., MAYER, B. J., BALTIMORE, D., COWBURN, D.,
25
26 The SH2 Domain: A Prototype for Protein Interaction Modules
64
65
66
67
68
69
70
71
Three-dimensional solution structure of the src homology 2 domain of c-abl. Cell 1992, 70, 697–704. BOOKER, G. W., BREEZE, A. L., DOWNING, A. K., PANAYOTOU, G., GOUT, I., WATERFIELD, M. D., CAMPBELL, I. D., Structure of an SH2 domain of the p85α subunit of phosphatidylinositol-3-OH kinase. Nature 1992, 358, 684–687. MARENGERE, L. E. M., PAWSON, T., Identification of residues in GAP SH2 domains that control binding to tyrosine phosphorylated growth factor receptors and p62. J. Biol. Chem. 1992, 267, 22779–22786. MAYER, B. J., JACKSON, P. K., Van ETTERN R. A., BALTIMORE, D., Point mutations in the abl SH2 domain coordinately impair phosphotyrosine binding in vitro and transforming activity in vivo. Mol. Cell Biol. 1992, 12, 609–618. WAKSMAN, G., SHOELSON, S., PANT, N., COWBURN, D., KURIYAN, J., Binding of a high affinity phosphotyrosyl peptide in the src SH2 domain: crystal structures of the complexed and peptide-free forms. Cell 1993, 72, 779–790. ECK, M. I., SHOELSON, S. E., HARRISON, S. C., Recognition of a high affinity phosphotyrosyl peptide by the Src homology 2 domain of p56lck . Nature 1993, 362, 87–91. LEE, C.-H., KOMINOS, D., JACQUES, S., MARGOLIS, B., SCHLESSINGER, J., SHOELSON, S. E., KURIYAN, J., Crystal structures of peptide complexes of the aminoterminal SH2 domain of the Syp tyrosine phosphatase. Structure 1994, 2, 423–438. PASCAL, S. M., SINGER, A. U., GISH, G., YAMAZAKI, T., SHOELSON, S. E., PAWSON, T., KAY, L. E., FORMAN-KAY, J. D., Nuclear magnetic resonance structure of an SH2 domain of phospholipase C-gamma1 complexed with a high affinity binding peptide. Cell 1994, 77, 461–472. SONGYANG, Z., GISH, G., MBAMALU, G., PAWSON, T., CANTLEY, L. C., A single point mutation switches the specificity of group III Src homology (SH) 2 domains to that of group I SH2 domains. J. Biol. Chem. 1995, 270, 26029–26032.
72 RAHUEL, J., et al., Structural basis for specificity of GRB2-SH2 revealed by a novel ligand binding mode. Nat. Struct. Biol. 1996, 3, 586–589. 73 MCNEMAR, C., et al., Thermodynamic and structural analysis of phosphotyrosine polypeptide binding to Grb2-SH2. Biochemistry 1997, 36, 10006–10014. 74 OGURA, K., et al., Conformation of an Shc-derived phosphotyrosine-containing peptide complexed with the Grb2 SH2 domain. J. Biomol. NMR 1997, 10, 273–278. 75 IVANCIC, M., DALY, R. J., LYONS, B. A., Solution structure of the human Grb7-SH2 domain/erbB2 peptide complex and structural basis for Grb7 binding to ErbB2. J. Biomol. NMR 2003, 27, 205–219. 76 MARENGERE, L. E. M., SONGYANG, Z., GISH, G. D., SCHALLER, M. D., PARSONS, T., STERN, M. J., CANTLEY, L. C., PAWSON, T., SH2 domain specificity and activity modified by a single residue. Nature 1994, 369, 502–550. 77 KIMBER, M. S., NACHMAN, J., GISH, G., PAWSON, T., PAI, E., Structural basis for specificity switching by the Src SH2 domain. Molecular Cell 2000, 5, 1043–1049. 78 YAFFE, M. B., Phosphotyrosine-binding domains in signal transduction. Nat. Rev. Mol. Cell Biol. 2002, 3, 177–186. 79 YAFFE, M. B., ELIA, A. E., Phosphoserine/threonine-binding domains. Curr. Opin. Cell Biol. 2001, 13, 131–138. 80 MARMORSTEIN, E., Protein modules that manipulate histone tails for chromatin regulation. Nat. Rev. Mol. Cell Biol. 2001, 2, 422–432. 81 JAAKKOLA, P., et al., Targeting of HIF-alpha to the von Hippel–Lindau ubiquitylation complex by O2 -regulated prolyl hydroxylation. Science 2001, 292, 468–472. 82 IVAN, M., et al., HIFalpha targeted for VHL-mediated destruction by proline hydroxylation: implications for O2 sensing. Science 2001, 292, 464–468. 83 HICKE, L., DUNN, R., Regulation of membrane protein transport by
References
84
85
86
87
88
89
90
91
92
ubiquitin and ubiquitin-binding proteins. Ann. Rev. Cell Dev. Biol. 2003, 19, 141–172. DUEBER, J. E., YEH, B. J., CHAK, K., LIM, W. A., Reprogramming control of an allosteric signaling switch through modular recombination. Science 2003, 301, 1904–1908. OLIVIER, J. P., et al., A Drosophila SH2– SH3 adaptor protein implicated in coupling the sevenless tyrosine kinase to an activator of Ras guanine nucleotide exchange, Sos. Cell 1993, 73, 179–191. SIMON, M. A., DODSON, G. S., RUBIN, G. M., SH3–SH2–SH3 protein is required for p21ras activation and binds to sevenless and Sos proteins in vitro. Cell 1993, 73, 169–177. ROZAKIS-ADCOCK, M., FERNLEY, R., WADE, J., PAWSON, T., BOWTELL, D., The SH2 and SH3 domains of mammalian Grb2 couple the EGF-receptor to mSos1, an activator of Ras. Nature 1993, 363, 83–85. LI, N., BATZER, A., DALY, R., SKOLNIK, E., CHARDIN, P., BAR-SAGI, D., MARGOLIS, B., SCHLESSINGER, J., Guanine nucleotide releasing factor hSos1 binds to Grb2 and links receptor tyrosine kinases to Ras signaling. Nature 1993, 363, 85–88. RAABE, T., OLIVIER, J. P., DICKSON, B., LIU, X., GISH, G. D., PAWSON, T., HAFEN, E., Biochemical and genetic analysis of Drk SH2/SH3 adaptor protein of Drosophila. EMBO J. 1995, 14, 2509–2518. CHENG, A. M., et al., Mammalian Grb2 regulates multiple steps in embryonic development and malignant transformation. Cell 1998, 95, 793–803. LOCK, L. S., ROYAL, I., NAUJOKAS, M. A., PARK, M., Identification of an atypical Grb2 carboxylterminal SH3 domain binding site in Gab docking proteins reveals Grb2-dependent and –independent recruitment of Gab1 to receptor tyrosine kinases. J. Biol. Chem. 2000, 275, 31536–31545. SCHAEPER, U., FUCHS, K. P., SACHS, M., KEMPKES, B., BIRCHMEIER, W., Coupling of Gab1 to c-Met, Grb2, and Shp2 mediates biological responses. J. Cell Biol. 2000, 149, 1419–1432.
93 GU, H., NEEL, B. G., The “Gab” in signal transduction. Trends Cell Biol. 2003, 13, 122–130. 94 THOMAS, C. C., DEAK, M., ALESSI, D. R., van AALTEN, D. M., High-resolution structure of the pleckstrin homology domain of protein kinase B/akt bound to phosphatidylinositol (3, 4, 5)-trisphosphate. Curr. Biol. 2002, 12, 1256–1262. 95 CANTLEY, L. C., The phosphoinositide 3-kinase pathway. Science 2002, 296, 1655–1657. 96 YAFFE, M. B., How do 14–3-3 proteins work? Gatekeeper phosphorylation and the molecular anvil hypothesis. FEBS Lett. 2002, 513, 53–57. 97 RIVERA, G. M., BRICENO, C. A., TAKESHIMA, F., SNAPPER, S. B., MAYER, B. J., Inducible clustering of membrane-targeted SH3 domains of the adaptor protein Nck triggers localized actin polymerization. Curr. Biol. 2004, 14, 11–22. 98 BUDAY, L., WUNDERLICH, L., TAMAS, P., The Nck family of adapter proteins: regulators of actin cytoskeleton. Cell Signal 2002, 14, 723–731. 99 ZHAO, Z. S., MANSER, E., Interaction between PAK and Nck: a template for Nck targets and role of PAK autophosphorylation. Mol. Cell Biol. 2000, 20, 3906–3917. 100 ROHATGI, R., NOLLAU, P., HO, H. Y., KIRSCHNER, M. W., MAYER, B. J., Nck and phosphatidylinositol 4, 5-bisphosphate synergistically activate actin polymerization through the N-WASP-Arp2/3 pathway. J. Biol. Chem. 2001, 276, 26448–26452. 101 GARRITY, P. A., RAO, Y., SALECKER, I., MCGLADE, J., PAWSON, T., ZIPURSKY, S. L., Drosophila photoreceptor axon guidance and targeting requires the Dreadlocks SH2/SH3 adaptor protein. Cell 1996, 85, 639–650. 102 BLADT, F., AIPPERSBACH, E., GELKOP, S., STRASSER, G. A., NASH, P., TAFURI, A., GERTLER, F. B., PAWSON, T., The murine Nck SH2/SH3 adaptors are important for the development of mesoderm-derived embryonic structures and for regulating the cellular actin
27
28 The SH2 Domain: A Prototype for Protein Interaction Modules
103
104
105
106
107
108
109
110
111
112
network. Mol. Cell Biol. 2003, 23, 4586–4597. TANAKA, S., et al., C3G, a guanine nucleotide-releasing protein expressed ubiquitously, binds to the Src homology 3 domains of CRK and GRB2/ASH proteins. Proc. Natl. Acad. Sci. USA 1994, 91, 3443–3447. OHBA, Y., et al., Requirement for C3G-dependent Rap1 activation for cell adhesion and embryogenesis. EMBO J. 2001, 20, 3333–3341. KIYOKAWA, E., HASHIMOTO, Y., KOBAYASHI, S., SUGIMURA, H., KURATA, T., MATSUDA, M., Activation of Rac1 by a Crk SH3-binding protein, DOCK180. Genes. Dev. 1998, 12, 3331–3336. ICHIBA, T., KURAISHI, Y., SAKAI, O., NAGATA, S., GROFFEN, J., KURATA, T., HATTORI, S., MATSUDA, M., Enhancement of guaninenucleotide exchange activity of C3G for Rap1 by the expression of Crk, CrkL, and Grb2. J. Biol. Chem. 1997, 272, 22215–22220. REDDIEN, P. W., HORVITZ, H. R., CED-2/CrkII and CED-10/Rac control phagocytosis and cell migration in Caenorhabditis elegans. Nat. Cell Biol. 2000, 2, 131–136. FELLER, S. M., Crk family adaptors-signalling complex formation and biological roles. Oncogene 2001, 20, 6348–6371. PAWSON, T., NASH, P., Protein–protein interactions define specificity in signal transduction. Genes. Dev. 2000, 14, 1027–1047. PAWSON, T., Specificity in signal transduction: From phosphotyrosine–SH2 domain interactions to complex cellular systems. Cell 2004, 116, 191–203. JACKMAN, J. K., MOTTO, D. G., SUN, Q., TANAMOTO, M., TURCK, C. W., PELTZ, G. A., KOREZKY, G. A., FINDELL, P. R., Molecular cloning of SLP-76, a 76-kDa tyrosine phosphorylation associated with Grb2 in T cells. J. Biol. Chem. 1995, 270, 7029–7032. FU, C., TURCK, C. W., KUROSAKI, T., CHAN, A. C., BLNK: a central linker protein in B cell activation. Immunity 1998, 9, 93–103.
113 BUBECK WARDENBURG, J., PAPPU, R., BU, J. Y., MAYER, B., CHERNOFF, J., STRAUS, D., CHAN, A. C., Regulation of PAK activation and the T cell cytoskeleton by the linker protein SLP-76. Immunity 1998, 9, 607–616. 114 SU, Y. W., ZHANG, Y., SCHWEIKERT, J., KORETZKY, G. A., RETH, M., WIENANDS, J., Interaction of SLP adaptors with the SH2 domain of Tec family kinases. Eur. J. Immunol. 1999, 29, 3702–3711. 115 CHIU, C. W., DALTON, M., ISHIAI, M., KUROSAKI, T., CHAN, A. C., BLNK: molecular scaffolding through ‘cis’-mediated organization of signaling proteins. EMBO J. 2002, 21, 6461–6472. 116 PIVNIOUK, V., TSITSIKOV, E., SWINTON, P., RATHBUN, G., ALT, F. W., GEHA, R. S., Impaired viability and profound block in thymocyte development in mice lacking the adaptor protein SLP-76. Cell 1998, 94, 229–238. 117 CLEMENTS, J. L., YANG, B., ROSS-BARTA, S. E., ELIASON, S. L., HRSTKA, R. F., WILLIAMSON, R. A., KORETZKY, G. A., Requirement for the leukocyte-specific adapter protein SLP-76 for normal T cell development. Science 1998, 281, 416–419. 118 MINEGISHI, Y., ROHRER, J., COUSTAN-SMITH, E., LEDERMAN, H. M., PAPPU, R., CAMPANA, D., CHAN, A. C., CONLEY, M. E., An essential role for BLNK in human B cell development. Science 1999, 286, 1954–1957. 119 KALESNIKOFF, J., SLY, L. M., HUGHES, M. R., BUCHSE, T., RAUH, M. J., CAO, L. P., LAM, V., MUI, A., The role of SHIP in cytokine-induced signaling. Rev. Physiol. Biochem. Pharmacol. 2003, 149, 87–103. 120 FU, Z., ARONOFF-SPENCER, E., BACKER, J. M., GERFEN, G. J., The structure of the inter-SH2 domain of class IA phosphoinositide 3-kinase determined by site-directed spin labeling EPR and homology modeling. Proc. Natl. Acad. Sci. USA 2003, 100, 3275–3280. 121 HALL, C., et al., Alpha 2-chimerin, an SH2-containing GTPase-activating protein for the ras-related protein p21rac derived by alternate splicing of the
References
122
123
124
125
126
127
128
129
130
131
human n-chimerin gene, is selectively expressed in brain regions and testes. Mol. Cell Biol. 1993, 13, 4986–4998. CRESPO, P., SCHUEBEL, K. E., OSTROM, A. A., GUTKIND, J. S., BUSTELO, X. R., Phosphotyrosine-dependent activation of Rac-1 GDP/GTP exchange by the vav proto-oncogene product. Nature 1997, 385, 169–172. BARBIERI, M. A., KONG, C., CHEN, P. I., HORAZDOVSKY, B. F., STAHL, P. D., The SRC homology 2 domain of Rin1 mediates its binding to the epidermal growth factor receptor and regulates receptor endocytosis. J. Biol. Chem. 2003, 278, 32027–32036. BROWN, M. T., COOPER, J. A., Regulation, substrates and functions of Src. Biochem. Biophys. Acta 1996, 1287, 121–149. PENDERGAST, A. M., The Abl family kinases: mechanisms of regulation and signaling. Adv. Cancer Res. 2002, 85, 51–100. GREER, P., Closing in on the biological functions of Fps/Fes and Fer. Nat. Rev. Mol. Cell Biol. 2002, 3, 278–289. SMITH, C. I., ISLAM, T. C., MATTSSON, P. T., MOHAMED, A. J., NORE, B. F., VIHINEN, M., The Tec family of cytoplasmic tyrosine kinases: mammalian Btk, Bmx, Itk, Tec, Txk and homologs in other species. BioEssays 2001, 23, 436–446. NEEL, B. G., GU, H., PAO, L., The ‘Shping news: SH2 domain-containing tyrosine phosphatases in cell signaling. Trends Biochem. Sci. 2003, 28, 284–293. JOAZEIRO, C. A., WING, S. S., HUANG, H., LEVERSON, J. D., HUNTER, T., LIU, Y. C., The tyrosine kinase negative regulator c-Cbl as a RING-type, E2-dependent ubiquitin–protein ligase. Science 1999, 286, 309–312. HAGLUND, K., SIGISMUND, S., POLO, S., SZYMKIEWICZ, I., DI FIORE, P. P., DIKIC, I., Multiple monoubiquitination of RTKs is sufficient for their endocytosis and degradation. Nat. Cell Biol. 2003, 5, 461–466. MOSESSON, Y., SHTIEGMAN, K., KATZ, M., ZWANG, Y., VEREB, G., SZOLLOSI, J., YARDEN, Y., Endocytosis of receptor tyrosine kinases is driven by monoubiquitylation, not
132
133
134
135
136
137
138
139
140
141
polyubiquitylation. J. Biol. Chem. 2003, 78, 21323–21326. WORMALD, S., HILTON, D. J., Inhibitors of cytokine signal transduction. J. Biol. Chem. 2004, 279, 821–824. MANNING, C. M., MATHEWS, W. R., FICO, L. P., THACKERAY, J. R., Phospholipase C-gamma contains introns shared by src homology 2 domains in many unrelated proteins. Genetics 2003, 164, 433–442. LATOUR, S., VEILLETTE, A., Molecular and immunological basis of X-lined lymphoproliferative disease. Immunol. Rev. 2003, 192, 212–224. SAYOS, J., et al., The X-linked lymphoproliferative-disease gene product SAP regulates signals induced through the co-receptor SLAM. Nature 1998, 395, 462–469. POY, F., YAFFE, M. B., SAYOS, J., SAXENA, K., MORRA, M., SUMEGI, J., ECK, M., Crystal structures of the XLP protein SAP reveal a class of SH2 domains with extended, phosphotyrosine-independent sequence recognition. Molecular Cell 1999, 4, 555–561. HWANG, P. M., et al., A “three-pronged” binding mechanism for the SAP/SH2D1A SH2 domain: structural basis and relevance to the XLP syndrome. EMBO J. 2002, 21, 314–323. LI, S.-C., GISH, G., YANG, D., COFFEY, A. J., FORMAN-KAY, J. D., ERNBERG, I., KAY, L. E., PAWSON, T., Novel mode of ligand binding by the SH2 domain of the human XLP disease gene product SAP/SH2D1A. Curr. Biol. 1999, 9, 1355–1362. LATOUR, S., GISH, G., HELGASON, C. D., HUMPHRIES, R. K., PAWSON, T., VEILLETTE, A., Regulation of SLAM-mediated signal transduction by SAP, the X-linked lymphoproliferative gene product. Nat. Immunol. 2001, 2, 681–690. BERRY, D. M., NASH, P., LIU, S. K., PAWSON, T., A high-affinity Arg-X-X-Lys SH3 binding motif confers specificity for the interaction between Gads and SLP-76 in T cell signaling. Curr. Biol. 2002, 12, 1336–1341. LATOUR, S., RONCAGALLI, R., CHEN, R., BAKINOWSKI, M., SHI, X., SCHWARTZBERG, P., DAVIDSON, D., VEILLETTE, A., Binding
29
30 The SH2 Domain: A Prototype for Protein Interaction Modules
142
143
144
145
146
147
148
149
150
151
of SAP SH2 domain to FynT SH3 domain reveals a novel mechanism of receptor signalling in immune regulation. Nat. Cell Biol. 2003, 5, 149–154. CHAN, B., et al., SAP couples Fyn to SLAM immune receptors. Nat. Cell Biol. 2003, 5, 155–160. BORG, J. P., OOI, J., LEVY, E., MARGOLIS, B., The phosphotyrosine interaction domains of X11 and FE65 bind to distinct sites on the YENPTY motif of amyloid precursor protein. Mol. Cell Biol. 1998, 16, 6229–6241. DHALLUIN, C., et al., Structural basis of SNT PTB domain interactions with distinct neurotrophic receptors. Mol. Cell 2000, 6, 921–929. ZWAHLEN, C., LI, S.-C., KAY, L. E., PAWSON, T., FORMAN-KAY, J. D., Multiple modes of peptide recognition by the PTB domain of the cell fate determinant Numb. EMBO J. 2000, 19, 1505–1515. RAVICHANDRAN, K. S., ZHOU, M. M., PRATT, J. C., HARLAN, J. E., WALK, S. F., FESIK, S. W., BURAKOFF, S. J., Evidence for a requirement for both phospholipid and phosphotyrosine binding via the Shc phosphotyrosine-binding domain in vivo. Mol. Cell Biol. 1997, 17, 5540–5549. BLOMBERG, N., BARALDI, E., NILGES, M., SARASTE, M., The PH superfold: a structural scaffold for multiple functions. Trends Biochem. Sci. 1999, 24, 441–445. HUBBARD, S. R., Crystal structure of the activated insulin receptor tyrosine kinase in complex with peptide substrate and ATP analog. EMBO J. 1997, 16, 5572–5581. HU, J., LIU, J., GHIRLANDO, R., SALTIEL, A. R., HUBBARD, S. R., Structural basis for recruitment of the adaptor protein APS to the activated insulin receptor. Mol. Cell 2003, 12, 1379–1389. STEIN, E. G., GHIRLANDO, R., HUBBARD, S. R., Structural basis for dimerization of the Grb10 Src homology 2 domain: implications for ligand specificity. J. Biol. Chem. 2003, 278, 13257–13264. LEVY, D. E., DARNELL, J. E., Stats: transcriptional control and biological
152
153
154
155
156
157
158
159
160
impact. Nat. Rev. Mol. Cell Biol. 2002, 3, 651–662. OTTINGER, E. A., BOTFIELD, M. C., SHOELSON, S. E., Tandem SH2 domains confer high specificity in tyrosine kinase signaling. J. Biol. Chem. 1998, 273, 729–735. CHAN, A. C., SHAW, A. S., Regulation of antigen receptor signal transduction by protein tyrosine kinases. Curr. Opin. Immunol. 1996, 8, 394–401. GRUCZA, R. A., BRADSHAW, J. M., MITAXOV, V., WAKSMAN, G., Role for electrostatic interactions in SH2 domain recognition: salt-dependence of tyrosyl-phosphorylated peptide binding to the tandem SH2 domain of the Syk kinase and the single SH2 domain of the Src kinase. Biochemistry 2000, 39, 10072–10081. HATADA, M. H., et al., Molecular basis for interaction of the protein tyrosine kinase ZAP-70 with the T-cell receptor. Nature 1995, 377, 32–38. FUTTERER, K., WONG, J., GRUCZA, R. A., CHAN, A. C., WAKSMAN, G., Structural basis for Syk tyrosine kinase ubiquity in signal transduction pathways revealed by the crystal structure of its regulatory SH2 domains bound to a dually phosphorylated ITAM peptide. J. Mol. Biol. 1998, 281, 523–537. KUMARAN, S., GRUCZA, R. A., WAKSMAN, G., The tandem Src homology 2 domain of the Syk kinase: a molecular device that adapts to interphosphotyrosine distances. Proc. Natl. Acad. Sci. USA 2003, 100, 14828–14833. MENG, W., SAWASDIKOSOL, S., BURAKOFF, S. J., ECK, M. J., Structure of the amino-terminal domain of Cbl complexed to its binding site on ZAP-70 kinase. Nature 1999, 398, 22–23. CHEN, X., VINKEMEIER, U., ZHAO, Y., JERUZALMEI, D., DARNELL, J. E., KURIYAN, J., Crystal structure of tyrosine phosphorylated STAT-1 dimer bound to DNA. Cell 1998, 93, 827–839. BECKER, S., GRONER, B., MULLER, C. W., Three-dimensional structure of the Stat3beta homodimer bound to DNA. Nature 1998, 394, 145–151.
References 161 HOF, P., PLUSKEY, S., DHE-PAGANON, S., ECK, M. J., SHOELSON, S. E., Crystal structure of the tyrosine phosphatase SHP-2. Cell 1998, 92, 441–450. 162 SICHERI, F., MOAREFI, I., KURIYAN, J., Crystal structure of the Src family tyrosine kinase Hck. Nature 1997, 385, 582–585. 163 XU, W., HARRISON, S. C., ECK, M. J., Three-dimensional structure of the tyrosine kinase c-Src. Nature 1997, 385, 595–602. 164 YOUNG, M. A., GONFLONI, S., SUPERTI-FURGA, G., ROUX, B., KURIYAN, J., Dynamic coupling between the SH2 and SH3 domains of c-Src and Hck underlies their inactivation by C-terminal tyrosine phosphorylation. Cell 2001, 105, 115–126. 165 NAGAR, B., et al., Structural basis for the autoinhibition of c-Abl tyrosine kinase. Cell 2003, 112, 859–872. 166 HANTSCHEL, O., NAGAR, B., GUETTLER, S., KRETZSCHMAR, J., DOREY, K., KURIYAN, J., SUPERTI-FURGA, G., A myristoyl/phosphotyrosine switch regulates c-Abl. Cell 2003, 112, 845–857. 167 TARTAGLIA, M., et al., Mutations in PTPN11, encoding the protein tyrosine phosphatase SHP-2, cause Noonan syndrome. Nat. Genet. 2001, 29, 465–468. 168 TARTAGLIA, M., et al., Somatic mutations in PTPN11 in juvenile myelomonocytic leukemia, myelodysplastic syndromes and acute myeloid leukemia. Nat. Genet. 2003, 34, 148–150. 169 LOH, M. L., et al., Somatic mutations in PTPN11 implicate the protein tyrosine phosphatase SHP-2 in leukemogenesis. Blood 2004, 103, 2325–2331. 170 MARQUEZ, J. A., et al., Conformation of full-length Bruton tyrosine kinase (Btk) from synchrotron X-ray solution scattering. EMBO J. 2003, 22, 4616–4624. 171 SALIM, K., et al., Distinct specificity in the recognition of phosphoinositides by the pleckstrin homology domain of dynamin and the Bruton’s tyrosine kinase. EMBO J. 1996, 15, 6241–6250. 172 HASHIMOTO, S., et al., Identification of the SH2 domain binding protein of
173
174
175
176
177
178
179
180
181
Bruton’s tyrosine kinase as BLNK: functional significance of Btk-SH2 domain in B-cell antigen receptor-coupled calcium signaling. Blood 1999, 94, 2357–2364. WATANABE, D., HASHIMOTO, S., ISHIAI, M., MATSUSHITA, M., BABA, Y., KISHIMOTO, T., KUROSAKI, T., TSUKADA, S., Four tyrosine residues in phospholipase C-gamma 2, identified as Btk-dependent phosphorylation sites, are required for B cell antigen receptor-coupled calcium signaling. J. Biol. Chem. 2001, 276, 38595–38601. VERTIE, D., et al., The gene involved in X-linked agammaglobulinaemia is a member of the src family of protein-tyrosine kinases. Nature 1993, 361, 226–226. VIHINEN, M., MATTSSON, P. T., SMITH, C. I., Bruton tyrosine kinase (BTK) in X-linked agammaglobulinemia (XLA). Front Biosci. 2000, 5, D917–D928. MORRA, M., HOWIE, D., GRANDE, M. S., SAYOS, J., WANG, N., WU, C., ENGEL, P., TERHORST, C., X-linked lymphoproliferative disease: a progressive immunodeficiency. Annu. Rev. Immunol. 2001, 19, 657–682. COFFEY, A. J., et al., Host response to EBV infection in X-linked lymphoproliferative disease results from mutations in an SH2-domain encoding gene. Nature Gen 1998, 20, 129–135. MORRA, M., et al., Characterization of SH2D1A missense mutation identified in X-linked lymphoproliferative disease patients. J. Biol. Chem. 2001, 276, 36809–36816. LI, C., IOSEF, C., JIA, C. Y., GKOURASAS, T., HAN, V. K., SHUN-CHENG LI, S., Disease-causing SAP mutants are defective in ligand binding and protein folding. Biochemistry 2003, 42, 14885–14892. PUIL, L., LIU, X., GISH, G., MBAMALU, G., BOWTELL, D., PELICCI, P. G., ARLINGHAUS, R., PAWSON, T., BCR-ABL onco-proteins bind directly to activators of the ras signalling pathway. EMBO J. 1994, 13, 764–773. PENDERGAST, A. M., et al., BCR-ABL-induced oncogenesis is
31
32 The SH2 Domain: A Prototype for Protein Interaction Modules
182
183
184
185
186
mediated by direct interaction with the SH2 domain of the GRB-2 adaptor protein. Cell 1993, 75, 175–185. SATTLER, M., et al., Critical role for Gab2 in transformation by BCR/ABL. Cancer Cell 2002, 1, 479–492. ALBER, G., KIM, K.-M., WEISER, P., RIESTERER, C., CARSETTI, R., RETH, M., Molecular mimicry of the antigen receptor signalling motif by transmembrane proteins of the Epstein–Barr virus and the bovine leukaemia virus. Curr. Biol. 1993, 3, 333–339. LONGNECKER, R., MILLER, C. L., Regulation of Epstein–Barr virus latency by latent membrane protein 2. Trends Microbiol. 1996, 4, 38–42. MERCHANT, M., CALDWELL, R. G., LONGNECKER, R., The LMP2A ITAM is essential for providing B cells with development and survival signals in vivo. J. Virol. 2000, 74, 9115–9124. HIGASHI, H., et al., Helicobacter pylori CagA induces Ras-independent
187
188
189
190
morphogenetic response through SHP-2 recruitment and activation. J. Biol. Chem. 2004, 279, 17205–17216. GRUENHEID, S., DEVINNEY, R., BLADT, F., GOOSNEY, D., GELKOP, S., GISH, G. D., PAWSON, T., FINLAY, B. B., Enteropathogenic E. coli Tir binds Nck to initiate actin pedestal formation in host cells. Nat. Cell Biol. 2001, 3, 856–859. FRISCHKNECHT, F., MOREAU, V., ROTTGER, S., GONFLONI, S., RECKMANN, I., SUPERTI-FURGA, G., WAY, M., Actin-based motility of vaccinia virus mimics receptor tyrosine kinase signaling. Nature 1999, 401, 926–929. SAKAGUCHI, N., et al., Altered thymic T-cell selection due to a mutation of the ZAP-70 gene causes autoimmune arthritis in mice. Nature 2003, 426, 454–460. VASSILEV, L. T., et al., In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science 2004, 303, 844–848.
1
The WW Domain Marius Sudol
Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2
1 Introduction and Brief History of Module Discovery
As with the SH3 (Src homology 3) domain, the WW (tryptophan-tryptophan) domain also emerged from work on oncogenic viruses that was initiated at the Rockefeller University in New York in a virology laboratory focused on neoplastic transformation of cells by Rous sarcoma virus [1–3]. Biological analyses of host-dependent mutants of Rous sarcoma virus and site-directed mutation of the Src gene itself provided insights into the molecular mechanism of Src signaling [2, 4, 5]. Both approaches revealed a propensity of the N-terminal region of Src kinases to form discrete, apparently specific, protein-protein complexes [2]. The accumulated data prompted a hypothesis proposing that these complexes represent an important facet of Src function by regulating the enzymatic activity of Src [4, 5]. The identification of two protein-protein interaction modules, namely the SH2 and SH3 domains, within the very N-terminal region of Src kinases [6] reinforced the initial insight and prompted many laboratories to search for protein partners of SH domains. The WW domain was identified as an imperfect repeat of 38 amino acids in a differently spliced form of the Yes kinase-associated protein, YAP [7]. Surprisingly and uniquely, the 38 amino acids were added by a splicing event to one of the YAP isoforms that already had a single copy of the related sequence upstream [8]. The repeated sequence was distinct and easily noticed by visual inspection [8]. The Yes kinase is the closest relative of Src, and YAP was isolated as one of the SH3 domain-binding partners of Yes and Src kinases [9]. Examination of the consensus sequence in the first alignment of WW domains revealed the presence of two highly-conserved tryptophan (W) residues, spaced 20–22 amino acids apart, and led to its being named the WW domain [7]. Although intuitively we already considered this repeat a domain, at that point there were no data indicating that the short repeat of 38 amino acids mediated protein-protein Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The WW Domain
complexing or that it had a defined structure. Both results came swiftly to confirm our predictions. Using functional screens with a labeled WW domain of YAP as a probe, two partial cDNAs were isolated which had different sequences except for one short common region with a PPPPY motif, which was subsequently shown to be required for binding [10]. By sequentially replacing each of these five positions with alanine, a consensus sequence of PPxY was established [10]. Identification of the ligand prompted structural analysis of the domain-peptide complex and nuclear magnetic resonance analysis revealed a compact three-stranded β-sheet structure of the YAP WW domain [11]. At that point the sequence repeat with two conserved tryptophans became a ‘bona fide’ WW domain. It was clear to us that the domain was related to the SH3 domain by its ability to recognize proline-rich peptides, yet in many aspects it was distinct from it.
2 Structure of the WW Domain-Ligand Complex
High-resolution crystal structures and nuclear magnetic resonance structures of several WW domains and their complexes with ligands were solved [12–14]. Each WW domain folds as a compact triple-stranded β sheet [15] (Figures 1 and 2). The fold is stable in the absence of cofactors, ligands, or disulfide bridges. Based on the recognition of proline-rich ligands, the WW domains were classified into four groups (Table 1). Group I WW domains recognize a PPxY binding motif, Group II WW domains interact with PPLP motifs, and Group III WW domains
Fig. 1. Structure of the WW domain of dystrophin in complex with SPPPYV peptide derived from $dystroglycan. N and C termini of the domain are indicated. The sidechains of the first and second conserved tryptophan in the domain are indicated (W* and W**, respectively). The first conserved tryptophan is hidden behind the surface involved in ligand binding. Prolines and the tyrosine of the ligand are labeled in italics (P and Y). Note how the aromatic ring of the second tryptophan stacks with the nearby proline (for more details see [12, 15]). Fig. 2. Structure of the WW domain of dystrophin in complex with SPPPYV peptide derived from $dystroglycan. Note how the sidechain tyrosine (Y) in the second $ strand of the domain stacks with one of the prolines of the ligand. The tyrosine-proline and the second tryptophan-proline stacks constitute an aromatic cradle, a common molecular arrangement found in SH3, EVH1, and profilin complexes [12].
2 Structure of the WW Domain-Ligand Complex Table 1 Major classes of WW domains categorized according to whether they recognize
proline-rich or proline-containing ligands (for more details, see [16] and the text) Class of WW (Example)
Consensus
Ligand Protein
Class I (YAP) Class II (FE65) Class III (FBP) Class IV (Pin1)
PPxY PPLP RPPP(R) po(S/T)P
ErbB4 Mena SmB RNA Pol II
recognize polyprolines flanked or interrupted by R or K residues. Group IV is represented only by several WW domains and recognize a motif containing a proline residue preceded by phosphoserine or phosphothreonine, po(S/T)P [16]. For several members of Group I WW domains, the ligand phosphorylation on the tyrosine of the PPxY core negatively regulates complex formation in vitro and in cell culture (reviewed in [16]). The two signature tryptophans of the WW domain play different roles; the first one seems to be involved in maintaining the stability of the domain [15]; and the second conserved tryptophan participates in ligand binding [10, 15] (Figure 1). One interesting feature of the WW domain was revealed by a high-resolution crystal structure of the dystrophin WW domain in complex with the β-dystroglycan peptide [12]. It involves the second conserved tryptophan. The rings of aromatic sidechains of the WW domain form hydrophobic stacks with proline rings of the ligand (Figure 2). This arrangement is reinforced by a dominant hydrogen bond formed between a carbonyl oxygen of proline and the indole nitrogen in the tryptophan [12]. This structural arrangement, with minor variations, is a common feature found also among complexes of SH3 and EVH1 domains and profilin proteins. It was called an ‘aromatic cradle’ [12].
Fig. 3 Modular structures of proteins that harbor WW domains. As with other protein domains such as the SH2 and SH3 domains, the WW domain occurs in a variety of proteins that carry out diverse functions. YAP is a transcriptional coactivator. Nedd-4 is one of the E3-ubiquitin ligases. FE65 is an adaptor protein. Pin1 is the en-
zyme isoprolyl isomerase. Dystrophin is a cytoskeletal protein. TAD: transcription activation domain; C2: membrane-binding domain; HECT: E3-ubiquitin ligase domain; PTB: phosphotyrosine binding domain; ABD: actin binding domain composed of calponin homology 1 (CH1) and CH2 domains; SEPCTRIN REP: spectrin repeats.
3
4 The WW Domain
Using computer-aided screens of publicly accessible sequence databanks, we identified several hundred proteins or open reading frames that harbor WW domains. Examples of well defined and intensively studied WW domain-containing proteins are shown in Figure 3.
3 WW domains and Human Diseases 3.1 From Liddle’s Syndrome to Liddle’s Disease
In 1963, Liddle and colleagues described a rare syndrome of a salt-sensitive form of hypertension with clinical characteristics similar to a primary hyperaldosteronism [17]. The affected patients suffered from hypertension and metabolic alkalosis, both symptoms also observed in individuals with hyperproduction of aldosterone by the adrenal glands. However, Liddle demonstrated that his patients had normal levels of mineralocorticoids. Moreover, using an amiloride analog to selectively block a sodium channel, Liddle was able to decrease the blood pressure in patients ingesting a low-salt diet. At that point he proposed the term ‘pseudoaldosteronism’– today it is called the Liddle syndrome. In 1994 Shimkets et al. [18] were the first to demonstrate the genetic basis of Liddle’s syndrome. A mutation truncating the C-terminal region of the β subunit of ENaC (amiloride-sensitive sodium channel) was the first genetic lesion shown to correlate with Liddle disease. A subsequent study mapped other mutations, mostly deletions that encompass in common a 12-residue sequence containing the PPPNY motif [19]. At that time, the PPxY consensus motif was already recognized as a WW domain ligand core, and work in the laboratories of Rossier [20] and Rotin [21] revealed a WW domain-containing ubiquitin ligase, NEDD4, as a functional partner of ENaC. A simple scheme of the molecular basis of Liddle disease emerged and was confirmed by numerous biochemical and physiological studies [22, 23]. In the center of the scheme, the ENaC subunits (β and γ ) interact through their PPxY regions with WW domains of Nedd4 ubiquitin ligase (Figure 4). The complex is required for fast turnover of the ENaC channel and its degradation by a proteasome pathway. When the complex is compromised because of mutations in the PPxY cores, the high concentration of ENaC results in excessive activity of the channel, giving rise to sodium imbalance and hypertension. Interestingly, the systematic mutagenesis performed on the first proline-rich ligand of a WW domain, which defined the PPxY consensus [10], was reconfirmed in Nature. Missense single-point mutations in the gene encoding the β subunit of ENaC, which were reported to cause Liddle syndrome, mapped precisely to the core residues of the PPxY motif: P615S, P616L, and Y618H [24–26]. Independent patients, each harboring one of the three singlepoint mutations within the PPxY core showed full symptoms of Liddle disease. It is most likely that the ENaC-Nedd4 pair is not the only functional complex that is formed to regulate the channel. Other WW domains are known to interact with
3 WW domains and Human Diseases
Fig. 4 A simple scheme of the complex between the $ subunit of amiloride-sensitive sodium channel ENaC and Nedd-4 ubiquitin ligase. HECT is the ubiquitin ligase domain, C2 is a membrane-binding domain. The short sequences on the right indicate
the wild-type sequence of the PPxY motif that binds WW domains of Nedd4 followed by three point mutants described in individual patients with Liddle syndrome (see text for more details).
ENaC [27], and it would be interesting to understand how the channel is regulated by non-Nedd4-like proteins.
3.2 Amyloid Precursor Protein: APP and FE65
FE65 was isolated by Esposito et al. [28] in a screen designed to clone brain-enriched or brain-specific cDNAs. The structure of the FE65 protein predicted by the DNA sequence turned out to represent a typical adaptor protein composed of a WW domain and two PTB domains [29]. FE65 became a recognized entry on the signal transduction map when Russo et al. [30], and later, several other groups, showed a specific complex between APP and the second PTB of FE65. Recently, the APPFE65 complex was given a functional dimension. A report by Cao and Sudhof [31] elegantly documented that the γ -secretase-generated C-terminal fragment (CTF) of APP was able to translocate to the cell nucleus to form a multicomponent complex including FE65, the histone acetylase Tip60, and most likely, several other proteins (Figure 5). This complex was shown to activate transcription via Gal4 or LexA reporters only when the FE65 WW domain was intact, suggesting a requirement for the WW domain ligand in the process. Indeed, the list of FE65 WW domain ligands derived from the mapping data of human WW domains by AxCell Biosciences Corporation [27] contains mostly nuclear proteins that are able to regulate or affect transcription. It should be important to test the FE65WW domain ligands revealed by the WW domain mapping [27] as potentially critical components of the multicomponent APP-FE65 complex in regulating transcription. With the general availability of inexpensive advanced genomic chips, one should be able to decipher the differential profiles of gene expression regulated by APP-FE65 or APP-FE65 with mutated WW domains in neuronal cells. Finally, it would be critical to determine if any of the gene expression patterns that
5
6 The WW Domain
Fig. 5 Signaling by amyloid precursor protein (APP) complex. The carboxy-terminal fragment (CTF) of APP that is cleaved by (secretase forms a complex with FE65 adaptor protein. The complex translocates to the
cell nucleus and, together with other components such as Tip60 histone acetylase and FE65WW domain ligands (X-unknown), it regulates transcription (see text for a more detailed explanation).
are regulated by the APP-FE65 complex are also subject to changes in neuronal cells derived from brains affected by Alzheimer’s disease. 3.3 Dystrophin WW Domain and Muscular Dystrophy
The muscular dystrophies are a group of diseases that affect skeletal muscle and are characterized by progressive muscle wasting [32]. Duchenne and Becker muscular dystrophies are X-linked recessive diseases caused by mutations in the dystrophin gene [33]. This gene encodes several isoforms, among which is the 427-kD α isoform called dystrophin, which is expressed in muscle and brain. Dystrophin consists of three distinct regions: the N-terminal actin binding domain, the long spectrin-like rod region, and the cysteine-rich C-terminal region (Figure 3). Analysis of the dystrophin gene from affected patients has yielded a patho-functional map of dystrophin. According to the map, the most severe forms of Duchenne or Becker muscular dystrophies were observed when deletions affected either the N- or the C-terminal region of dystrophin. The identification of the WW domain in the C-terminal region of dystrophin immediately prompted a search for potential partner proteins within the known repertoire of proteins constituting the multicomponent dystrophinglycoprotein complex [34, 35]. The search was facilitated by the extensive biochemical data on the dystrophin-assembled complexes and the presumed presence of a PPxY core in the potential partners [34, 35]. β-Dystroglycan was instantly selected as the primary candidate ligand for the dystrophin WW domain [34]. At the C-terminal tail, which was shown to be required for interaction with dystrophin [35], the β-dystroglycan protein was found to harbor two PPxY cores. Subsequent biochemical and structural studies documented a direct contact between the most terminal PPxY core of β-dystroglycan and the dystrophin WW domain [12, 36].
4 Emerging Directions and Recent Developments
The study of the dystrophin-β-dystroglycan complex instructed us about two important and more general issues. We realized that some of the WW domains require longer flanking sequences to maintain their binding activity [12, 36]. We showed that a short N-terminal flank could cooperate with a very long C-terminal flanking sequence. Later, through the work of Chung and Campanelli [37], we realized that the very same WW domain could also retain binding activity when a much shorter C-terminal flanking sequence was ‘helped’ by somewhat longer N-terminal flank. The biology of the dystrophin WW domain-β dystroglycan complex is being investigated now in vivo. We are generating a knock-in mouse having a mutated WW domain of dystrophin. Based on limited patient data having deletions of the exons encoding the human WW domain, the expected phenotype of mutated mouse is that of muscular dystrophy of the Duchenne type (which in mice is much milder than in humans).
4 Emerging Directions and Recent Developments 4.1 AxCell’s Map
Recently a protein-protein interaction map for the human WW domain family was generated [27]: 57 human WW domains were expressed as GST fusions and probed with 1930 proline-rich peptides containing cores recognized by major classes of WW domains [16], with each peptide corresponding to the known protein or ORF in the human proteome. Colorimetric assay based on the alkaline phosphatasestreptavidin-biotin system was used to measure the apparent binding constants between GST-tagged WW domains and biotinylated peptides. A network of more than 69 000 interactions was deciphered and now serves as a blueprint for identification of new signaling pathways that utilize WW domains. This comprehensive effort [27] represents the first protein-protein interaction map of a domain in the human proteome. Apart from its value for projecting potential signaling routes and networks, the WW domain mapping provides a unique resource for understanding the structurefunction relationship of the WW domain-ligand interface. More directly, the data inform us about structural determinants of the WW domain, i.e., residues within the domain that are responsible for recognition of specific ligands. 4.2 ErbB4 Receptor Protein-Tyrosine Kinase and its WW Domain-containing Adaptor, YAP
We focused our research on ErbB4 when Carpenter and colleagues [38, 39] showed that the ErbB4 receptor protein-tyrosine kinase is proteolytically processed by
7
8 The WW Domain
membrane proteases in response to the cognate ligand, and the resulting C-terminal fragment (CTF) was found in the cell nucleus to regulate transcription. We were attracted by the relative simplicity of the Notch-type signaling and its relative novelty for the family of ErbB receptors. The decision to investigate ErbB4 signaling also stemmed from several observations, all pointing to the cognate complex between ErbB4 and YAP. Screening of molecular repertoires of proline-rich peptides on SPOT membranes, together with a screen of phage-displayed peptide libraries for ‘super binders’ to the first WW domain of YAP, resulted in a ψPPPPYP/R consensus sequence [40], where ψ (psi) designates aliphatic amino acids. This consensus matches the sequence of the most C-terminal PPxY motif (LPPPPYR) of ErbB4. We noted that ErbB4 contained three perfect PPxY motifs as potential binding sites of WW domains [41]. The Carpenter group [38] showed that CTF of ErbB4 has the potential to regulate transcription, but it had to be mutated to reveal its transcriptional regulation potential. Previously, YAP was shown to interact with PPxY containing sequences of nuclear proteins to regulate transcription [42]. Finally, the comprehensive mapping of human WW domains confirmed that the most-terminal PPxY of ErbB4 binds YAP with relatively high affinity [27]. We realized that we had in our hands a formerly missing link. With all the suggestive evidence and the available reagents, we documented that YAP was able to associate physically with the full-length ErbB4 receptor and functionally with the ErbB4 cytoplasmic fragment in the nucleus [41]. As expected, the YAP-ErbB4 complex was mediated by the first WW domain of YAP and the most C-terminal PPxY motif of ErbB4 [41]. The wild-type CTF of ErbB4 did not show transcriptional activity by itself. However, in the presence of YAP its transcriptional activity was revealed (Figure 6) [41]. Using immunofluorescent staining, we showed that the CTF of ErbB4 and full-length YAP partly colocalize in the cell nucleus, further supporting the validity of the functional complex.
Fig. 6 Signaling by ErbB4 receptor. Upon stimulation with ligand (NR-neuregulin), the ErbB4 receptor undergoes proteolytic processing that involves (-secretase. The released CTF fragment, which contains PPxY
motifs, forms a specific complex with the WW domain of YAP, a known cotranscriptional regulator. Both proteins translocate to the cell nucleus to regulate gene expression [41]. TAD: transcriptional activation domain.
5 Conclusions
To understand the ErbB4 signaling pathway, one should identify transcripts that are regulated by the ErbB4-YAP complex in response to the ligand (neuregulin) or TPA (tumor promoter) treatment – the latter is also known to initiate the proteolytic processing of ErbB4. Since the levels of ErbB4 and ErbB2 expression plus the resulting stoichiometry of ErbB4-ErbB2 heterodimers correlate with breast and cerebellar cancers [43], the precise description of the transcriptional program affected by the ErbB4-YAP complex should provide a molecular tool for a better understanding of these two types of cancers. 4.3 Membrane Proteins with PPxYs Implicated in Cancer
The examples of ENaC, β-dystroglycan, and especially the ErbB4 receptor as membrane proteins with distinct PPxY motifs within their cytoplasmic tails, which represent sites for formation of WW domain-mediated signaling complexes, turned out to be molecular scenarios amenable to fruitful experimental study. While studying these three complexes we kept searching for other membrane proteins that would contain PPxY motifs and could be implicated in human diseases. The most attractive candidate found in that search was the STAG1/PMEPA1 gene [44, 45]. The gene transcript and protein are robustly induced by androgen and found as over-expressed products in prostate and kidney tumors. The protein product is < 40 kD, has a short extracellular domain, and is followed by a transmembrane region and a relatively long cytoplasmic tail with two perfect PPxY motifs. Little is known about WW domain partners of the STAG1/PMEPA1 protein and nothing about the extracellular ligand(s). It is also not known if the protein can undergo ligand-induced dimerization or be phosphorylated on the tyrosines of the PPxY motifs. If indeed the up-regulation of the STAG1/PMEPA1 gene accelerates androgen pathways that promote neoplastic changes in prostate and kidney, then interfering with the downstream WW domain-mediated signals may have direct ramifications for cancer therapy.
5 Conclusions
The study of the WW domain has raised many important questions that are relevant to the entire field of modular protein domains. Several of these questions are mentioned here and, where possible, partial answers and comments follow. Are all WW domains and other protein domains well demarcated, independently folded, autonomous units that maintain their binding function even when removed from their original parent proteins? The data seem to indicate that these signature features of protein domains hold true for majority of them [e.g., 19, 27]. However, at one extreme there are domains that are truly exemplary ‘Lego blocks’ and at the opposite extreme there are a notable number of domains that require
9
10 The WW Domain
significantly longer flanking sequences to achieve full binding function – a Gaussian distribution it seems. Can a ‘gold standard’ of interaction for each domain-ligand complex be generated using techniques of alanine scan mutagenesis or molecular repertoires combined with high-throughput binding screens? More precisely, can we generate sequential deletions of proteins harboring the domains and their cognate ligands and create a comprehensive interaction matrix? Such data would identify minimal sequences required for domain-ligand binding and the overall binding profile of a specific complex. Is the specificity of domain-ligand pairs revealed only in the full protein and cell compartment contexts in which the interaction pair is presented? A lucid and elegant study from the laboratory of Lim [46] suggests that some protein domains do have unique cognate ligands and achieve fidelity in complex formation. The SH3 domain of Sho1, a yeast protein regulating the high-osmolarity stress-response pathway, and the proline-rich ligand Pbs2 seem to interact with perfect specificity. The remaining 26 SH3 domains present in the yeast proteome do not recognize Pbs2 in vitro or in vivo [46]. Perhaps, upon careful analysis of all known protein domains, we will find more examples of optimized specificity of domain-ligand binding within interaction networks, including those sets that are composed of overlapping recognition elements such as SH3 and WW domains, for example [47]. One could go even further by speculating that some, but not all, domainligand pairs have achieved fidelity of binding through the processes of negative and positive selection. Since promiscuous interactions may lead to evolutionarily significant disadvantages [47], one may suggest again that complex protein-protein interaction systems are under evolutionary pressure to achieve the highest degree of binding specificity. Does the apparent plasticity of protein modules, as revealed by engineered molecular repertoires of domains followed by successful selection toward new ligands [48–50], point to them as attractive molecular ‘capsules’ for tailored gene therapy? A successful interference with the Rous sarcoma virus budding process, by cis expression of the YAP WW domain that targeted PPxY-containing viral Gag, shows that an isolated WW domain can be used as a reagent to modulate in vivo processes [51]. I anticipate that using wild-type or mutated protein domains as modulators of signaling pathways will be fruitful. Can the comprehensive data of the WW domain protein-protein interaction map [27] be analyzed to decipher other rules of the ‘protein recognition code’ [52–54], given the data describing the fidelity in complex formation among some domain-ligand pairs and the apparently strong selection pressures on proteinprotein networks to achieve a high level of specificity? The protein-protein interaction map provides data that allow better understanding of specific residues within the WW domain binding surface, which are responsible for recognition of different classes of proline-rich ligands. However, truly general rules of a protein recognition code will probably emerge from concerted approaches that include structural, genetic, and biochemical analyses of biologically meaningful domain-ligand complexes.
References
A decade ago my laboratory entered the field of modular protein domains by serendipity. The analogy between modular protein domains and Lego blocks, which implicitly stipulates reiterated and combinatorial use of domains by Nature to create repertoires of functional complexes [6, 55], is an attractive, widely accepted concept. Only recently did I learn the origin of the name Lego – the word is an abbreviated fusion of two Danish words ‘leg godt’, meaning ‘have fun playing’. Indeed, designing and executing experiments with protein modules has been an engaging activity at times reminiscent of play. I look forward to new challenges and to further exploration of possibilities of tailoring the modular domains into powerful probes for profiling signaling events and for identifying leads for the development of therapeutic interventions.
Acknowledgements
I would like to thank all the members of my laboratory who contributed to the work on the WW domain. Special thanks are due to Aaron Einbond, Henry Chen, Xavier Espanel, and Akihiko Komuro for their outstanding dedication. Our work has been supported by grants from the NIH (CA45757; AR45626; DK62345), Human Frontier Science Program Organization (RG234), and Alzheimer’s and Muscular Dystrophy Associations. The help of Amjad Farooq with the WW structure figures is much appreciated. Comments of Drs. David Foster, Richard Jove, and Roman Osman significantly improved the manuscript. I apologize that many references are not included in this review. More than 300 papers have been published on the WW domain, and complete coverage of the subject was not possible.
References 1 SUDOL, M., The WW module competes with the SH3 domain? Trends Biochem. Sci. 1996, 21, 162–163. 2 JOVE, R., HANAFUSA, H., Cell transformation by the viral src oncogene. Annu. Rev. Cell Biol. 1987, 3, 31–56. 3 MAYER, B. J., SH3 domains: complexity in moderation. J. Cell Sci. 2001, 114, 1253–1263. 4 HIRAI, H., VARMUS, H. E., Site-directed mutagenesis of the SH2- and SH3-coding domains of c-src produces varied phenotypes, including oncogenic activation of c-src. Mol. Cell. Biol. 1990, 10, 1307–1318. 5 HIRAI, H., VARMUS, H. E., Mutations in src homology regions 2 and 3 of activated chicken c-src that result in preferential
transformation of mouse or chicken cells. Proc. Natl. Acad. Sci. USA 1990, 87, 8592–8596. 6 PAWSON, T., NASH, P., Assembly of cell regulatory systems through protein interaction domains. Science 2003, 300, 445–452. 7 BORK, P., SUDOL, M., The WW domain: a new signaling site in dystrophin? Trends Biochem. Sci. 1994, 19, 531–533. 8 SUDOL, M., BORK, P., EINBOND, A., KASTURY, K., DRUCK, T., NEGRINI, M., HUEBNER, K., LEHMAN, D., Characterization of the mammalian YAP (yes-associated protein) gene and its role in defining a novel protein module, the WW domain. J. Biol. Chem. 1995, 270, 14733–14741.
11
12 The WW Domain
9 SUDOL, M., Yes-associated protein (YAP65) is a proline-rich phosphoprotein that binds to the SH3 domain of the Yes proto-oncogene product. Oncogene 1994, 9, 2145–2152. 10 CHEN H. I., SUDOL, M., The WW domain of Yes-associated protein binds a novel proline-rich ligand that differs from the consensus established for SH3-binding modules. Proc. Natl. Acad. Sci. USA 1995, 92, 7819–7823. 11 MACIAS, M. J., HYVONEN, M., BARALDI, E., SCHULTZ, J., SUDOL, M., SARASTE, M., OSCHKINAT, H., The structure of the WW domain in complex with a proline-rich peptide. Nature 1996, 382, 646–649. 12 HUANG, X., ROY, F., ZHANG, R., JOACHIMIAK, A., SUDOL, M., ECK, M. J., Recognition of a proline motif in beta-dystroglycan by an ‘embedded’ WW domain in human dystrophin. Nat. Struct. Biol. 2000, 7, 634–638. 13 KANELIS, V., ROTIN, D., FORMAN-KAY, J. D., Solution structure of a Nedd4 WW domain–ENaC peptide complex. Nat. Struct. Biol. 2001, 8, 407–412. 14 MACIAS, M. J., GERVAIS, V., CIVERA, C., OSCHKINAT, H., Structural analysis of WW domains and design of a WW prototype. Nat. Struct. Biol. 2000, 7, 375–379. 15 MACIAS, M. J., WIESNER, S., SUDOL, M., WW and SH3 domains, two different scaffolds to recognize proline-rich ligands. FEBS Lett. 2002, 513, 30–37. 16 SUDOL, M., HUNTER, T., NeW Wrinkles for an Old Domain. Cell 2000, 103, 1001–1004. 17 LIDDLE, G. W., BLEDSOE, T., COPPAGE, W. J. S., A familial renal disorder simulating primary aldosteronism but with negligible aldosterone secretion. Trans. Assoc. Am. Physicians 1963, 76, 199–213. 18 SHIMKETS, R. A., et al., Liddle’s syndrome: heritable human hypertension caused by mutations in the beta-subunit of the epithelial Na channel. Cell 1994, 79, 407–414. 19 SUDOL, M., Structure and function of the WW domain. Prog. Biophys. Mol. Biol. 1996, 65, 113–132. 20 SCHILD, L., LU, Y., GAUTSCHI, I., SCHNEEBERGER, E., LIFTON, R. P., ROSSIER,
21
22
23
24
25
26
27
B. C., Identification of a PY motif in the epithelial Na channel subunits as target sequence for mutations causing channel activation found in Liddle syndrome. EMBO J. 1996, 15, 2381–2387. STAUB, O., DHO, S., HENRY, CORREA, P. C. J., ISHIKAWA, T., MCGLADE, J., ROTIN, D., WW domains of Nedd4 bind to the proline-rich PY motifs in the epithelial Na channel deleted in Liddle’s syndrome. EMBO J. 1996, 15, 2371–2380. SCHILD, L., CANESSA, C. M., SHIMKETS, R. A., GAUTSCHI, I., LIFTON, R. P., ROSSIER, B. C., A mutation in the epithelial sodium channel causing Liddle disease increases channel activity in the Xenopus laevis oocyte expression system. Proc. Natl. Acad. Sci. USA 1995, 92, 5699–5703. SNYDER, P. M. PRICE, M. P., MCDONALD, F. J., ADAMS, C. M., VOLK, K. A., ZEIHER, B. G., STOKES, J. B., WELSH, M. P., Mechanism by which Liddle’s syndrome mutations increase activity of a human epithelial Na channel. Cell 1995, 83, 969–978. HANSSON, J. H., SCHILD, L., LU, Y., WILSON, T. A., GAUTSCHI, I., SHIMKETS, R., NELSON-WILLIAMS, C., ROSSIER, B. C., LIFTON, R. P., A de novo missense mutation of the beta subunit of the epithelial sodium channel causes hypertension and Liddle syndrome, identifying a proline-rich segment critical for regulation of channel activity. Proc. Natl. Acad. Sci. USA 1995, 92, 11495–11499. TAMURA, H., SCHILD, L., ENOMOTO, N., MATSUI, N., MARUMO, F., ROSSIER, B. C., SASAKI, S., Liddle disease caused by a missense mutation of the beta subunit of the epithelial sodium channel gene. J. Clin. Invest. 1996, 97, 1780–1784. INOUE, J., IWAOKA, T., TOKUNAGA, H., TAKAMUNE, K., NAOMI, S., ARAKI, M., TAKAHAMA, K., YAMAGUCHI, K., TOMITA, K., A family with Liddle’s syndrome caused by a new missense mutation in the beta subunit of the epithelial sodium channel. J. Clin. Endocrinol. Metab. 1998, 83, 2210–2213. HU, H., et al., A map of WW domain family interactions. Proteomics 2004, 4, 643–655.
References 28 ESPOSITO, F., AMMENDOLA, R., DUILIO, A., COSTANZO, F., GIORDANO, M., ZAMBRANO, N., D’AGOSTINO, P., RUSSO, T., CIMINO, F., Isolation of cDNA fragments hybridizing to rat brain-specific mRNA’s. Dev. Neurosci. 1990, 12, 373–381. 29 SUDOL, M., SLIWA, K., RUSSO, T., Functions of WW domain in nucleus. FEBS Lett. 2001, 490, 190–195. 30 RUSSO, T., FARAONIO, R., MINOPOLI, G., De CANDIA, P., De RENZIS, S., ZAMBRANO, N., FE65 and protein network centered around the cytosolic domain of the Alzheimer’s beta-amyloid precursor protein. FEBS Lett. 1998, 434, 1–7. 31 CAO, X., SUDHOF, T. C., A transcriptionally active complex of APP with Fe65 and histone acetyltransferase Tip60. Science 2001, 293, 115–120. 32 O’BRIEN, K. F., KUNKEL, L. M., Dystrophin and muscular dystrophy, past, present and future. Mol. Genet. Metab. 2001, 74, 635–644. 33 SPENCE, H. J., CHEN, Y. J., WINDER, S. J., Muscular dystrophies, the cytoskeleton and cell adhesion. BioEssays 2002, 24, 542–552. 34 EINBOND, A., SUDOL, M., Towards prediction of cognate complexes between the WW domain and proline-rich ligands. FEBS Lett. 1996, 84, 1–8. 35 JUNG, D., YANG, B., MEYER, J., CHAMBERLAIN, J. S., CAMPBELL, K. P., Identification and characterization of the dystrophin anchoring site on beta-dystroglycan. J. Biol. Chem. 1995, 270, 27305–27310. 36 RENTSCHLER, S., LINN, H., DEININGER, K., BEDFORD, M. T., ESPANEL, X., SUDOL, M., The WW domain of dystrophin requires EF-hands region to interact with beta-dystroglycan. Biol. Chem. 1999, 380, 431–442. 37 CHUNG, W., CAMPANELLI, J. T., WW and EF hand domains of dystrophin-family proteins mediate dystroglycan binding. Mol. Cell. Bio. Res. Comm. 1999, 2, 162–171. 38 NI, C. Y., MURPHY, M. P., GOLDE, T. E., CARPENTER, G., Gamma-secretase cleavage and nuclear localization of ErbB-4 receptor tyrosine kinase. Science 2001, 294, 2179–2181.
39 NI, C. Y., YUAN, H., CARPENTER, G., Role of ErbB-4 carboxyl-terminus in gamma secretase cleavage. J. Biol. Chem. 2003, 278, 4561–4565. 40 LINN, H., ERMEKOVA, K., RENTSCHLER, S., SPARKS, A., KAY, B., SUDOL, M., Using molecular repertoires to identify high-affinity peptide ligands of the WW domain of human and mouse YAP. Biol. Chem. 1997, 378, 531–537. 41 KOMURO, A., NAGAI, M., NAVIN, N., SUDOL, M., WW domain-containing protein YAP associates with ErbB-4 and acts as a co-transcriptional activator for the carboxyl-terminal fragment of ErbB-4 that translocates to the nucleus. J. Biol. Chem. 2003, 278, 33334–33341. 42 YAGI, R., CHEN, L. F., SHIGESADA, K., MURAKAMI, Y., ITO, Y., A WW domain-containing yes-associated protein (YAP) is a novel transcriptional co-activator. EMBO J. 1999, 18, 2551–2562. 43 CARPENTER, G., ErbB-4: mechanism of action and biology. Exp. Cell. Res. 2003, 284, 66–77. 44 XU, L. L., SHANMUGAM, N., SEGAWA, T., SESTERHENN, I. A., MCLEOD, D. G., MOUL, J. W., SRIVASTAVA, S., A novel androgen-regulated gene, PMEPA1, located on chromosome 20q13 exhibits high level expression in prostate. Genomics 2000, 66, 257–263. 45 RAE, F. K., HOOPER, J. D., NICOL, D. L., CLEMENTS, J. A., Characterization of a novel gene, STAG1/PMEPA1, up-regulated in renal cell carcinoma and other solid tumors. Mol. Carcinogenesis 2001, 32, 44–53. 46 ZARRINPAR, A., PARK, S.-H., LIM, W., Optimization of specificity in cellular protein interaction network by negative selection. Nature 2003, 426, 676–680. 47 BEDFORD, M. T., CHAN, D. C., LEDER, P., FBP WW domains and the Abl SH3 domain bind to a specific class of proline-rich ligands. EMBO J. 1997, 16, 2376–2383. 48 ESPANEL, X., SUDOL, M., A single point mutation in a group I WW domain shifts its specificity to that of group II WW domains. J. Biol. Chem. 1999, 274, 17284–17289.
13
14 The WW Domain
49 TOEPERT, F., KNAUTE, T., GUFFLER, S., PIRES, J. R., MATZDORF, T., OSCHKINAT, H., SCHNEIDER-MERGENER, J., Combining SPOT synthesis and native peptide ligation to create large arrays of WW protein domains. Angew. Chem., Int. Ed. 2003, 42, 1136–1140. 50 KASANOV, J., PIROZZI, G., UVEGES, A. J., KAY, B. K., Characterizing Class I WW domains defines key specificity determinants and generates mutant domains with novel specificities. Chem. Biol. 2001, 8, 231–241. 51 PATNAIK, A., WILLS, J. W., In vivo interference of Rous sarcoma virus budding by cis expression of a WW domain. J. Virol. 2002, 76, 2789–2795. 52 SUDOL, M., From Src Homology modules to other signaling domains: proposal of the ‘protein recognition code’. Oncogene 1998, 17, 1469–1474. 53 TONG, A. H. Y., et al., A combined experimental and computational strategy to define protein interaction networks for
peptide recognition. Science 2002, 295, 321–324. 54 PANNINI, S., DENTE, L., CESARENI, G., In vitro evolution of recognition specificity mediated by SH3 domains reveals target recognition rules. J. Biol. Chem. 2002, 277, 21666–21674. 55 DAS, S., SMITH, T. F., Identifying nature’s protein Lego set. Adv. Protein Chem. 2000, 54, 159–183.
Web Resources Relating to the WW Domain WW domain page: http://www.www.bork.emblheidelberg.de/Modules/ww refs.html — this site was created in 1994 and is regularly updated by the Sudol and Bork laboratories. SMART database: http://smart.embl-heidelberg.de/ Pfam database: http://www.sanger.ac.uk/Software/Pfam/
1
High-Throughput Crystallography Harren Jhoti
Astex Technology Ltd., Cambridge, United Kingdom
Originally published in: Protein Crystallography in Drug Discovery. Edited by Robert E. Babine and Sherin S. Abdel-Meguid. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30678-7
1 Introduction
In the late 1990s the concept of high-throughput crystallography was first proposed. It was inspired by the significant achievements in the genomics arena in which the DNA of several organisms, including homo sapiens, were being sequenced using high-throughput methodologies [1]. In fact, earlier in the decade the broader application of “high-throughput” methodologies became commonplace within the pharmaceutical industry with the birth of high-throughput chemistry and screening [2]. During this period X-ray crystallography was unable to keep pace and the other drug discovery technologies being performed in a high-throughput mode became the focus of many pharmaceutical companies. More recently, however, there has been a resurgence in interest for using structure-based approaches, driven largely by major technological developments in crystallography, which has resulted in many new crystal structures of therapeutic targets. Furthermore, the ability to rapidly obtain crystal structures of a target protein in complex with small molecules is driving a new wave of structure-based drug design. The need for “high-throughput crystallography” was originally driven not by the drug discovery sector but by the “structural genomics” initiatives that were set-up with the aim of solving crystal structures of representatives from protein families for which little or no structural information was known [3]. In short, the original goal of the structural genomics projects was to obtain crystal structures for all known protein families. Most of these initiatives focused on different bacterial genomes but they all shared a common need to develop new methods and technologies ranging from molecular biology to protein production and structure determination. In this chapter some of these technology developments will be described briefly and how
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 High-Throughput Crystallography
they have enabled high-throughput X-ray crystallography to be applied within the drug discovery process discussed.
2 Technological Advances
There are many areas in which new technologies and methods are being developed to enable high-throughput structure determination by X-ray crystallography [4, 5]. The process from gene to crystal structure is clearly multidisciplinary and advances in molecular biology, biochemistry, crystallization, X-ray data collection and computational analysis underpin high-throughput X-ray crystallography. Many of these advances are being made in the public-funded initiatives focused on structural genomics. These programs began several years ago when Japan began building the RIKEN Genomic Sciences Center in Yokohama and the USA began funding structural genomics pilot projects under the NIGMS (National Institute of General Medical Sciences) Protein Structure initiative [6]. Similar programs are underway in other countries, for example, the Protein Structure Factory in Germany is focusing on solving structures of human proteins in collaboration with the German Human Genome Project (DHGP) and the European Union has recently funded the Structural Proteomics in Europe (SPINE) program, which is focused on both eukaryotic and prokaryotic proteins (www.spineurope.org). The main focus of these structural genomics initiatives is to automate all steps of structural biology and to determine structures of proteins, using either NMR or X-ray crystallography, for which no three-dimensional information exists [7]. In addition to these public-funded centers, some specialist biotechnology companies were also formed in 1999 to pursue structural genomics programs. These include Structural GenomiX and Syrrx (San Diego, CA, USA), who are making significant developments in the automation of streamlining the gene to crystal structure process [8]. Other companies, such as Astex Technology (Cambridge, UK) and Abbott Laboratories (Abbott Park, IL, USA) have also developed high-throughput crystallography as a novel approach for fragment-based lead discovery (see below). 2.1 Clone to Crystal
The process of going from the gene of interest to a suitable crystal of the protein can be broadly separated into two stages. Firstly, significant amounts of chemically (and conformationally) homogenous protein need to be generated in a robust and reproducible process. Secondly, the protein then needs to be crystallized in order for the X-ray diffraction experiments to be performed. These two stages will now be briefly discussed. 2.1.1 Protein Production Expression, purification and characterization of a novel protein in a quantity and form that is suitable for crystallization and X-ray analysis probably occupies 70–80%
2 Technological Advances 3
of the time in most academic structural biology groups. Consequently, methods for high-throughput parallel expression and purification are now being developed in many laboratories [9, 10]. Typically, 10–50 mg of protein are required to screen sufficient numbers of crystallization conditions to obtain the initial crystals. Traditionally, a handful of different DNA constructs would be generated, after analysis of the protein sequence, in an attempt to remove flexible regions of the protein that may hinder crystallization. Each construct would then be tested for expression in the host cell, usually Escherichia coli or insect cells, and the level of functional protein analyzed using bioassay and polyacrylamide gel electrophoresis (PAGE). In the past these different constructs would be analyzed sequentially, but recent developments in molecular biology, based on DNA recombination, now enable high-throughput approaches for cloning and expression where tens to hundreds of DNA constructs can easily be generated to test in parallel for high expression. Protein purification has also seen significant improvements owing to the development of affinity tags that allow proteins to be purified significantly faster and more efficiently [11]. Automated methods based on affinity chromatography, such as a nickel-nitrilotriacetic acid (Ni-NTA) column, are now available which can process samples in parallel using a 96-well format [10]. Most of these developments in protein expression have been designed to improve the throughput and efficiency of current in vivo methods. However, recent developments using in vitro systems have been reported and could potentially offer a new method for routine protein production. For several years there has been interest in bacterial and eukaryotic cell-free translation systems in which cellular components contained in artificial cells are used to generate protein but these systems have suffered from problems with stability and efficiency [12]. If they could be made to work, these systems could express proteins which would otherwise interfere with cell physiology and so offer advantages over traditional in vivo methods. For example, proteases that are toxic to the cell could be inhibited by small molecules during expression, and post-translational modifications such as phosphorylation could be better controlled using these cell-free systems. Furthermore, these systems can be used to produce stable-isotope labelling of proteins for NMR spectroscopy [12]. Recently, workers have overcome some of the stability problems and have developed a eukaryotic cell-free expression system, using components of the wheat seed, which was able to maintain productive translation for 14 days, resulting in several milligrams of protein [13]. These systems also lend themselves to robotic automation, which means the amounts of protein typically produced using this approach could be significantly scaled-up. 2.1.2 Crystallization Crystallization is often regarded as a slow, resource-intensive step with low success rates in obtaining good X-ray quality crystals. However, much of the failure during this step can be attributed to poor quality protein samples that often have some level of chemical or conformational heterogeneity. The use of biophysical methods, such as fluorescence, circular dichroism, dynamic light scattering and mass spectrometry, to characterize the protein sample rigorously is a key step before performing crystallization experiments [14, 15].
4 High-Throughput Crystallography
However, significant advances in automation have also improved the process of crystallization with the new generation of robots able to sample the multidimensional space efficiently by varying precipitant concentration, buffers and pH, all variables known to affect crystallization. For example, a crystallization robot called Agincourt, jointly developed by the Genomics Institute of the Novartis Research Foundation (San Diego, CA, USA) and Syrrx, is able to set-up and observe 60 000 samples per day. Several other groups, such as scientists at Emerald Biostructures (Bainbridge Island, WA, USA), the Protein Structure Factory (Berlin, Germany) as well as commercial vendors such as Gilson-Cyberlab (Middleton, USA), have developed similar robots. Each of these crystallization robots has been designed to optimize and speed-up the experimental process and can prepare and monitor many thousands of experiments per day. One key advantage of some of these robotic systems has been the capacity to perform nanocrystallization. For example, the Agincourt robot has been designed to dispense volumes in the 20–100 nL range, which significantly reduces the amount of protein required for screening of crystallization conditions [16]. Using this robot, crystals of Aurora-A, FAR and EphA2 protein kinases were grown in vapor diffusion experiments with 50 nL sitting drops, with 1–3 mg of protein being sufficient to set up over 1200 crystallization conditions [17]. The small drop size volume was also reported to accelerate crystal growth with maximal crystal size being reached within 48 hours rather than many days or weeks. These crystals were suitable for X-ray diffraction studies and were used to determine the structures of the protein kinases to high resolution [17]. Other groups have reported developing batch nanocrystallization methods where total crystallization volumes as low as 1 nL are being used [18]. One major technical challenge that nanocrystallization, in particular, has produced is that of automated crystal detection. Traditionally, a crystallographer would need to check each crystallization drop manually for signs of crystals or crystalline precipitant using a microscope. This clearly becomes impractical if the drops are 10–50 nL in size. Automated methods are being developed that allow the user to monitor and analyze the crystallization experiment [19]. Automatic crystal identification has remained a challenge as the size and morphology of crystals can vary greatly. Two approaches that are being developed for crystal detection involve edgedetection and birefringence as protein crystals often have sharp boundaries and are capable of altering polarized light [18, 20]. An ingenious method for analyzing the X-ray diffraction quality of a crystal while still in the crystallization drop has also been reported [21]. 2.2 Crystal to Structure
Once X-ray quality crystals have been grown data collection using several wavelengths or derivatives is required in order to obtain the protein structure. X-ray data collection has been revolutionized in the last decade by both better X-ray sources and detectors. Third generation synchrotrons are now available across the world
2 Technological Advances 5
which provide high intensity X-ray beams allowing the data collection time to be significantly reduced [22]. Synchrotron radiation coupled with charged-coupled device (CCD) detectors have allowed complete X-ray datasets for a crystal to be collected and processed within hours instead of days. Similarly, laboratory X-ray sources have also been significantly improved due largely to better optics [23]. Phase determination has also become dramatically easier by the application of synchrotron radiation to single and multi-wavelength anomalous diffraction techniques, known as SAD and MAD, respectively [24]. In fact, recent reports indicate that these anomalous diffraction methods can also be carried out on laboratory X-ray sources [25]. Elegant molecular biology methods for inserting heavy atoms such as selenium into the protein have been developed using auxotrophic cell expression systems [26]. This has replaced the approach of heavy atom derivatization of native protein crystals, which was often a major bottleneck in the novel structure determination of a protein. The percentage of protein structures solved using these methods, such as selenomethionine incorporation, has been increasing over the past decade [27]. Once a suitable electron density map has been generated many new methods of electron density interpretation and model-building are then available, which allow rapid and automated construction of protein models without the need for significant manual intervention [28]. In some cases, protein models have been automatically built into electron density maps in a matter of hours, rather than manually which often takes days or weeks. However, as much of this development has been in academic groups, most of these software tools focus only on the protein electron density and are unable to perform automated interpretation of ligand electron density. This has remained a significant problem for industrial groups who may be performing protein-ligand crystallography for as much as 80% of their time. To address this problem, specialized software has been developed such as the x-ligand module in Quanta from Accelrys Inc. (San Diego, CA, USA) R R and AutoSolve from Astex. AutoSolve allows automated analysis of the differences in electron density to detect binding of small molecules to the protein [29]. In the event of ligand binding, the software generates a correctly parameterized model for the ligand, which is then fitted into the electron density and the R protein-ligand complex refined. AutoSolve also identifies and models any solvent molecules and protein conformational movement that may have occurred on ligand binding. 2.3 Progress in Structural Genomics
From the beginning the structural biology community has been divided over the structural genomics programs, with many leading scientists critical of investing so heavily in these centers, and claiming these exercises are analogous to “stampcollecting“. The funding is truly unprecedented with Japan promising to fund its RI-KEN programs and eight new centers with $100 million per year over the next five years and the NIGMS funding its centers at level of $50 million a year [6]. The
6 High-Throughput Crystallography
critics also voiced concern over the stated goals of the programs, for example, the claim from NIGMS was to generate 10 000 structures for unique proteins in 10 years and was considered by some to be unrealistic. Many felt that solving protein crystal structures was very different from sequencing genomes, and that the gain in productivity seen from “industrialization” in the human genome project would not extend to structural biology. A recent conference to review the progress in structural genomics indicated that some of these initial goals were indeed over-ambitious [30]. Furthermore, the productivity of many of these centers has been disappointing with many reporting similar problems in the different parts of the “gene-to-structure process“. For example, the Northeast Structural Genomics Consortium (NESGC) led by Gaetano Montelione (Rutgers University, USA), reported that from 5187 DNA targets, 917 proteins have been purified from which only 50 structures have been solved using NMR or X-ray crystallography. Many other structural genomics centers reported similar attrition rates at this conference, which indicates that the new high-throughput technologies are not delivering, or at least not yet. The supporters of these public initiatives claim it is too early to pass judgement and point to 2004 when they believe the technology will be mature enough to be truly evaluated. The privately based biotechnology companies have, however, reported better progress leading some observers to conclude that these “industrial-scale” programs are not best performed in academic centers. Syrrx, for example, claim to have determined 56 structures of unique proteins in the last 8 months. Although most of these structures are thought to be of bacterial origin, the company has also determined the structures of some important human protein kinases such as Aurora-A, FAR and EphA2 [17]. Structural GenomiX also claim to have solved more than 100 crystal structures using their high-throughput crystallography platform, again most are believed to be of bacterial proteins [29]. Astex has focused its high throughput crystallography methods solely on human proteins that are key to drug discovery and recently announced that it had solved the structures of human cytochrome P450 isozymes 2C9 and 3A4. The human cytochrome P450 enzymes play a vital role in drug metabolism and so have been of key interest to pharmaceutical companies for many years. However, despite many attempts to determine their structures, these proteins have remained intractable yielding only to high throughput crystallography methods. Even if the original goals of the structural genomics initiatives prove to be overambitious, there is little doubt that the momentum, and technologies, generated by these programs will filter into the wider structural biology community and result in a significant increase in the number of protein structures. Currently, the Protein Data Bank (PDB) holds around 19 000 protein structures, most of which have been determined using X-ray crystallography, and this number is expected to rise exponentially in the coming years (Fig. 1) [31]. Owing to this growing wealth of protein structure data, it is increasingly possible that a therapeutic target of interest to drug discovery scientists will have had its three-dimensional structure determined. Therefore the focus is likely to shift from solving the structure to how protein crystal structures can be exploited in drug discovery.
3 High-throughput Crystallography in Lead Discovery 7
Fig. 1 Growth in the Protein Data Bank. For many years the number of protein structures being determined and deposited into the PDB was linear, however, with the advent of major technology advances over the last decade the deposition rate has become exponential. (Reproduced from the Protein Data Bank [31]).
3 High-throughput Crystallography in Lead Discovery
With the expectation that the structure of a drug target is probably going to be known, some groups have been developing methods for high-throughput crystallography for a very different purpose. Groups at Abbott and Astex have developed methods that focus on significantly improving the throughput of protein-ligand crystallography. This can be a bottleneck in drug discovery programs that utilize structure-based drug design methods. In many of these programs the structural information on the binding mode of compounds often lags behind the chemical synthesis, which reduces its impact. By obtaining crystal structures of proteinligand complexes in a timelier manner, lead optimization can be performed more effectively. Furthermore, although the structure of the native target protein is a useful start to guide a lead discovery program, the maximum value is only derived from structures of the protein in complexes with potential lead compounds. This is due to the fact that many proteins undergo some level of conformational movement on ligand binding which has proved very difficult to predict from the native structure alone. Also, water molecules often play a key role in the interactions between small molecules and proteins and their positions need to be established experimentally. The ability to determine crystal structures of protein-ligand complexes rapidly can not only effectively guide the lead optimization phase but also allows a new application of X-ray crystallography in drug discovery, that of a screening technology for fragment-based lead discovery [32].
8 High-Throughput Crystallography
3.1 Protein-Ligand Crystal Structures
The most reliable approach to determining the structure of a protein–ligand complex is either by co-crystallization or by soaking the ligand into the preformed crystal. However, when X-ray crystallography is used as a method for ligand screening, the soaking option is much preferred. After collecting the X-ray data from a protein crystal exposed to a ligand, the next step is to analyze and interpret the resulting electron density. This step is often time consuming and requires a crystallographer to spend several days assessing the data from a single protein-ligand experiment. This is a key bottleneck to using X-ray crystallography as a method for screening compounds. Technology advances have now been made to automate and accelerate this step. As mentioned earlier, software tools such as Quanta from Accelrys Inc. R (San Diego, CA, USA) and AutoSolve from Astex (Cambridge, UK) can assist the crystallographer in the analysis and interpretation steps. Another area of automation that has assisted in the high throughput crystallography of proteins and protein–ligand complexes involves hardware for mounting and removing crystals from X-ray detectors, tasks currently performed by scientists. Robotic systems have been developed which can remove cryogenically frozen crystals from a storage dewar, place and correctly align the crystal on the X-ray detector and remove after the X-ray data have been collected, ready for the next crystal. Several of these systems have been developed by groups at synchrotron beam-lines, where the data collection is often completed in a matter of minutes and the timeconsuming step has been scientists having to change crystals [33]. However, with the advent of more sensitive X-ray detectors and more powerful laboratory X-ray sources, in-house data collection times have also been significantly reduced, so that a crystal-changing robot can have a significant impact on productivity within a laboratory setting. In fact, the Automated Crystal Transport, Orientation and Retrieval (ACTOR) robot, which was developed by the Abbott group, was one of the first systems and was designed for use on a laboratory X-ray source [34]. ACTOR has subsequently been licensed to Rigaku-MSC (Woodlands, TX, USA), a leading X-ray equipment manufacturer, who has made further improvements and now markets this robot for both laboratory and synchrotron X-ray sources (Fig. 2). The integration of these software and hardware technological developments can allow protein-ligand crystallography to be performed with greater automation and at a significantly faster pace in the laboratory setting. For example, the Abbott group have reported collecting X-ray data from 18 crystals of dihydroneopterin aldolase in 42 hours using a Rigaku rotating anode source with a MarCCD detector (MarUSA, Evanston, IL, USA) and the original ACTOR system [34]. Similarly, the Astex group have a hardware configuration based on a Jupiter 140 CCD X-ray detector on a Rigaku Ru-300 HR 5KW rotating anode laboratory X-ray source with an ACTOR robot (all supplied by Rigaku-MSC). X-ray data were recently collected using this set-up from crystals of protein tyrosine phosphatase 1B (PTP1B), a key therapeutic target for type 2 diabetes, as part of a fragment screening campaign.
4 Fragment-Based Lead Discovery
Fig. 2 Automation for X-ray data collection. The ACTOR robot coupled with the new generation Jupiter CCD detectors from Rigaku-MSC (Woodlands, TX, USA) allows rapid, unmanned data collection from laboratory x-ray sources [33].
X-ray data from 52 PTP1B crystals were collected and processed in 48 hours without manual intervention (personal communication). These data were analyzed R simultaneously for ligand binding using AutoSolve , which in the event of binding is able to generate a refined protein-ligand crystal structure automatically. To further optimize the impact of protein-ligand data, crystal structures should then be automatically presented to the medicinal chemists using a web-based molecular viewer such as AstexViewerTM , which is capable of displaying key protein-ligand interactions and also electron density [35]. This level of automation and throughput is required if X-ray crystallography is to be used effectively as a screening technology for fragment-based lead discovery.
4 Fragment-Based Lead Discovery
There is growing interest in the use of molecular fragments for lead discovery. One reason for this interest is due to a problem that is evident in the nature of “hits” identified from traditional bioassay-based high throughput screens (HTS). The average MW of successful drugs in the World Drug Index is in the low 300s, which is similar to the average MW in current corporate compound collections [36]. This implies that corporate compound collections have evolved to be broadly “drug-like” instead of “lead-like” with respect to MW and other features. Therefore, an HTS hit from a compound collection with µM affinity towards the target may
9
10 High-Throughput Crystallography
well already have an average drug MW, however, it is likely that the MW will increase significantly during the lead optimization process, leading to poor druglike properties with respect to solubility, absorption and clearance [37]. In order to address this issue several groups have been developing methods to identify low MW fragments (MW 100–250) that could be efficiently optimized into novel lead compounds possessing good drug-like properties. These molecular fragments would by definition have limited functionality and will therefore exhibit weaker affinity (typically in the 100 µm-mM range). This level of affinity is outside of the normal HTS sensitivity range and as such cannot routinely be identified in standard bioassays due to the high concentration of compound that would be required, interfering with the assay and leading to significant false positives. However, biophysical methods such as NMR and X-ray crystallography are ideal for detecting weakly binding molecules. Different libraries of molecular fragments can be used to target a particular protein. For example, computational analysis of the proteinactive site using pharmacophoric mapping or protein-ligand docking methods can be used to generate a focused-fragment library for a target or target class. Alternatively, diversity-based fragment libraries can be used that have been generated by sampling scaffolds in known drug molecules [38]. As in conventional drug discovery, the use of different fragment libraries in a screening campaign is influenced by the varying importance of novelty in the initial lead compound. A key advantage of using molecular fragments for screening is the significant amount of chemical space that is sampled using a relatively small library of compounds, typically ranging from 500–1000 compounds. For example, if the binding of several heterocycles is probed against specific binding pockets in a protein, the discrimination between a binding and non-binding event is dependent solely on the molecular complementarity and not constrained or modulated by the heterocycle being part of a larger molecule. This is a far more comprehensive and elegant way to probe for new interactions than having the fragments attached to a rigid template, as might derive from a conventional combinatorial chemistry approach. One of the earliest and most successful applications of fragment-based lead discovery using biophysical methods is that of the NMR methods pioneered by Fesik and colleagues at Abbott and coined “SAR-by-NMR” [39, 40]. In determining structure-activity relationships (SAR) by NMR, perturbations to the NMR spectra of a protein are used to indicate that ligand binding is taking place and to give some indication of the location of the binding site. Once molecular fragments bound to the target protein have been identified they can be linked together using structurebased chemical synthesis to improve the affinity for the target. Compounds with nanomolar affinities for FK506-binding protein were discovered using SAR-byNMR where two ligands with micromolar affinities were tethered together [39]. 4.1 Fragment-Based Lead Discovery Using X-ray Crystallography
X-ray crystallography has the advantage of defining the ligand-binding sites with more certainty than NMR and the binding orientations of the molecular fragments
4 Fragment-Based Lead Discovery
play a critical role in guiding efficient lead optimization programs. Furthermore, ligands that bind away from the target active site can be ignored, which provides another advantage over some NMR methods. Fragment libraries can be screened as singlets or in cocktails by soaking pre-formed crystals of unliganded proteins using X-ray crystallography as the primary method of detection. As the output from an X-ray experiment is a visual description of the bound compound (its electron density) it is possible to screen cocktails of compounds without the need to deconvolute. An optimum cocktail size is typically between 4 and 8 and is defined by the tolerance of the protein crystals to organic solvents and the concentration at which you wish to screen each fragment. Given the affinity of many of the fragments is expected to be within the high micromolar to low millimolar range, the final concentration of each individual fragment is often within the 10–100 millimolar range. Some of the first experiments in which X-ray crystallography was used as a “screening tool” were reported by Verlinde and colleagues who exposed crystals of trypanosomal triosephosphate isomerase to cocktails of compounds in their search for inhibitors [41]. More recently, Nienaber and colleagues have described a method for screening using X-ray crystallography that focuses on soaking the target crystals with cocktails of compounds having differing shapes that can easily be distinguished by visual inspection of the electron density [42]. In these experiments crystals of the anticancer target urokinase were used to screen cocktails of 6–8 compounds in an attempt to discover novel inhibitors with improved pharmacokinetic properties. The active compounds were identified visually from the electron density map and the experiment repeated with the actives removed from the cocktail In this way, multiple binders in a cocktail could be identified. Three different compound classes were discovered using this approach, one of which, the 2-aminoquinoline provided the basis of a new optimized lead compound [42]. Fragment screening using X-ray crystallography can also be used in conjunction with other biophysical methods such as surface plasmon resonance (SPR), For example, SPR was used by Lesuisse and colleagues to identify novel phosphotyrosine mimetics to incorporate into inhibitors of the Src SH2 domain [43]. Over 150 fragments were evaluated for their binding affinity for Src SH2 using SPR and then soaked into crystals. From these experiments, 20 fragments were found to bind at the active site and were used design low nanomolar range Src SH2 inhibitors devoid of phosphate groups. However, to fully exploit X-ray crystallography as a screening approach it is desirable to implement an objective and automated process to address the key R bottleneck of data interpretation and analysis [29]. AutoSolve , from Astex, allows rapid and automated analysis of electron density from fragment-soaking experiments that use singlets and/or cocktails of compounds. Examples of elec R tron density that were unambiguously interpreted by AutoSolve are shown in Fig. 3. In each case the binding mode of the small-molecule fragment is clearly defined by the electron density, which means that although the affinity may be in the millimolar range, the binding is ordered, with key interactions being made R between the compound and the protein. In fact, AutoSolve requires no human
11
12 High-Throughput Crystallography
Fig. 3 Automatic analysis of ligand electron density. The electron density was interpreted and models of compounds automati R cally fitted using AutoSolve . Although the binding affinity is weak (AT000037 IC50 =46 :M; AT000056 IC50 = 1 mM) the frag-
ments bound into the pocket of Trypsin adopt a clearly ordered conformation. Electron density maps are contoured at 3F and density due to protein and solvent has been removed for clarity.
intervention if the quality of electron density is high, and can identify the correct compound bound at the active site from an experiment where the crystal has been exposed to a cocktail of compounds, therefore reducing the need to deconvolute (Fig. 4).
4.2 Structure-Based Optimization of Fragment Hits
When the binding of one or more molecular fragments has been determined in the proteinactive site, this provides a starting point for medicinal chemistry to optimize the interactions using a structure-based approach. The fragments can be combined onto a template or used as the starting point for “growing out” an inhibitor into other pockets of the protein (Fig. 5). The potency of the original weakly binding fragment can be rapidly improved using iterative structure-based chemical synthesis. For example, in one of our lead discovery programs targeted against the cancer target cyclin-dependent kinase 2, we identified an initial fragment, AT381 (MW=134), which exhibited an ICSB of 1 mM in an enzyme assay. Using the crystal structure of AT381 bound to CDK2 we were able to improve the potency
4 Fragment-Based Lead Discovery
Fig. 4 Analyzing fragment cocktails us R ing AutoSolve . A protein crystal was exposed to a cocktail of 8 fragments and the resultant electron density is shown below. AutoSolve attempts to fit each of the
8 molecules into the electron density and ranks the goodness of fit for each one. In this way the need for deconvolution of an active cocktail is reduced.
more than 100000-fold by synthesizing only 30 analogues (Fig. 6). The resulting compound, AT3851, had an ICSB = 30nM and was generated using an iterative cycle of design and synthesis (unpublished results). Compounds from this novel lead series are being further optimized to improve pharmacokinetic properties and show cellular activity against a range of normal and cancer cell lines including MRC5, HCT116 and HT29.
13
Fig. 5 Fragment-based lead generation. Once fragments have been identified bound into the active site they can be used as a start point for iterative structure-driven chemistry resulting in a drug-size lead compound. If two fragments are bound in 2 dif-
ferent pockets (a) then they could be used to decorate an appropriate scaffold (b). Alternatively, a single fragment can be rationally modified to occupy other neighbouring pockets (c).
Fig. 6 Structure-based lead optimization. Structure-based optimization of the initial fragment hit (AT381) allows affinity to be rapidly and efficiently built-up using focused chemical synthesis. The resulting compound
(AT3851) is a potent inhibitor of CDK2 and has good physicochemical properties and shows cell-based activity against a variety of cancer cell lines.
References
5 Conclusions
The recent increase in available protein crystal structures is driving a new wave of interest in structure-based drug design. This trend is expected to continue as crystal structures of more therapeutic drug targets are determined in the coming years. Technology advances in all aspects of structural biology are fuelling this surge of protein structure data. Many of these advances in protein production, crystallization and structure determination have been developed as part of the current structural genomics initiatives. High throughput crystallography methods are now being used to solve novel protein crystal structures and protein-ligand complexes. The ability to generate protein-ligand crystal structures rapidly allows not only accelerated lead optimization but a new application of X-ray crystallography, as a screening technology. This may have particular value for fragment-based lead discovery where the initial molecular fragments are likely to have an affinity too weak to enable detection using traditional bioassay-based methods. Initial data generated using X-ray crystallographic screening of molecular fragment libraries indicates that novel scaffolds can be identified and subsequently optimized using rapid structure-based synthesis to generate useful lead compounds. The potential of this fragment-based screening approach using X-ray crystallography may be significant, in particular against targets which have remained intractable using conventional screening methods. Acknowledgment
I wish to thank Mike Hartshorn and Ian Tickle who developed AutoSolve and Jeff Yon, Marc O’Reilly, Dominic Tisi and Andrew Sharff for contributions towards this chapter. I also appreciate the assistance of Emma Southern and Michelle Jones in the production of this text. References 1 International Human Genome Sequencing Consortium, Initial sequencing and analysis of the human genome. Nature 2000, 409, 860–921. 2 CAMPBELL, S. F., Science, art and drug discovery: a personal perspective. Clin. Sci. 2000, 99, 255–260. 3 BURLEY, S. K., An overview of structural genomics. Nat. Struct. Biol. Suppl. 2000, Nov., 932–934. 4 HEINEMANN, U., ILLING, C., OSCHKINAT, H., High-throughput three-dimensional protein structure determination. Curr. Opin. Biotech. 2001, 12, 348–354. 5 BLUNDELL, T. L., JHOTI, H., Abell, C, High-throughput crystallography for lead
6
7
8
9
discovery in drug design. Nat. Rev. Drug Disc. 2002, 1, 45–54. NORVELL, J. C., MACHALEK, A. Z., Structural genomics programs at the US National Institute of General Medical Sciences. Nat. Struct. Biol. 2000, 7, 931. VITKUP, D., MELAMUD, E., MOULT, J., Sander, C, Completeness in structural genomics. Nat. Struct. Biol. 2001, 8, 559–566. DRY, S., MCCARTHY, S., HARRIS, T., Structural genomics in the biotechnology sector. Nat. Struct. Biol. 2000, 7, 946–949. STEVENS, R. C., Design of high-throughput methods of protein
15
16 High-Throughput Crystallography
10
11
12
13
14
15
16
17
18
19
20
production for structural biology. Structure 2000, 8, R177–R185. LESLEY, S. A., High throughput proteo-mics: protein expression and purification in the post-genomic world. Protein Exp. Purif. 2001, 22, 159–164. CROWE, J., DOBELI, H., GENTZ, R., HOCHULI, E., STUBER D., HENCO, K., 6xHis-Ni-NTA chromatography as a superior technique in recombinant protein expression/purification. Methods Mol. Biol. 1994, 31, 371–387. KIGAWA YABUKI, T., YOSHIDA, Y., TSUTSUI, M., ITO, Y., SHIBATA, T., YOKOYAMA, S., Cell-free production and stable-isotope labeling of milligram quantities of proteins. FEBS Lett. 1999, 442, 15–19. SAWASAKI, et al., A cell-free protein synthesis system for high-throughput pro-teomics. PNAS 2002, 99, 14652–14657. van MIERLO, C. P. M., STEENSMA, E., Protein folding and stability investigated by fluorescence, circular dichroism and nuclear magnetic resonance spectroscopy the flavodoxin story. J. Biotech. 2000, 79, 281–298. DARCY, A., Crystallising proteins - a rational approach? Acta Crystallogr. 1994, D50, 469–471. STEVENS, R. C., High-throughput protein crystallisation. Curr. Opin. Struct. Biol. 2000, 10, 558–563. NOWAKOWSKI, J., CRONIN, C. A., MCREE, D. A., KNUTH, M. W., et al., Structures of the cancer-related aurora-A, FAK, and EphA2 protein kinases from nanovolume crystallography. Structure 2002, 10, 1659–1667. BODENSTAFF, E. R., HOEDEMAEKER, F. J., KUIL, M. E., de VRIND, H. P. M., ABRAHAMS, J. P., The prospects of protein nanocrystallography Acta Crystallogr. 2002, D58, 1901–1906. STEWART, L., CLARK, R., Behnke, C, High-throughput crystallisation and structure determination in drug discovery. Drug Disc. Today 2002, 7, 187–196. WILSON, J., Towards the automated evaluation of crystallisation trials. Acta Crystallogr. 2002, D58, 1907–1914.
21 WATANABE, N., Semi-automatic protein crystallisation system that allows in situ observation of X-ray diffraction from crystals in the drop. Acta Crystallogr. 2002, D58, 1527–1530. 22 HENDRICKSON, W., Synchrotron crystallography. Trends Biochem. Sci. 2000, 25, 637–643. 23 ARNDT, U. W., BLOOMER, A. C., New developments in X-ray optics for macromo-lecular crystallography using laboratory X-ray sources, Curr. Opin. Struct. Biol. 1999, 9, 609–614. 24 de LA FORTELLE, E., BRICOGNE, G., Maximum-likelihood heavy-atom parameter refinement for multiple isomorphous replacement and multiwavelength anomalous diffraction methods. Methods Enzy-mol. 1997, 276, 472–494. 25 LEMKE, C. T., S-SAD, Se-SAD and S/Se-SIRAS using Cu Ka radiation: why wait for synchrotron time. Acta Crystallogr. 2002, D58, 2096–2101. 26 DOUBLIE, S., Preparation of seleno-methionyl proteins for phase determination. Methods Enzymol. 1997, 276, 523–530. 27 OGATA, C. R., MAD phasing grows up. Nat. Struct. Biol, Synchrotron Supplm. 1998, 1997, 638–640. 28 PERRAKIS, A., MORRIS, R., LAMZIN, V. S., Automated protein model building combined with iterative structure refinement. Nat. Struct. Biol. 1999, 6, 458–63. 29 BLUNDELL, T.L., ABELL, C., CLEASBY, A., HARTSHORN, M.J., TICKLE, I.J., PARASINI, E., High-throughput crystallography for drug discovery. Proceedings of the Royal Society of Chemistry meeting. Cutting Edge Approaches to Drug Design, March 2001 (FLOWER, D., ed.) RSC Publications, Cambridge. 2001, 53–59. 30 SERVICE, R., Tapping DNA for structures produces a trickle. Science 2002, 298, 948–950. 31 BERMAN, H. M., The Protein Data Bank. Acta Crystallogr. 2002, D58, 899–907. 32 CARR, R., JHOTI, H., Structure-based screening of low-affinity compounds. DDT, 2002, 7, 522–527.
References 33 KARAIN, W. I., Automated mounting, centering and screening of crystals for high-throughput protein crystallography. Acta Crystallogr. 2002, D58, 1519–1522. 34 MUCHMORE, S.W. et al., Automated crystal mounting and data collection in protein crystallography. Structure 2000, 8, R243–R246. 35 HARTSHORN, M. J., AstexViewerTM: a visualisation aid for structure-based drug design. J. Comp. Aided Mol. Design, 2003, 16, 871–881. 36 OPREA T. I., Is there a difference between leads and drugs? A historical perspective, J. Chem. Inf. Comp. Sci. 2001, 41, 1308–1315. 37 LIPINSKI, C. A., LOMBARDO, F., DOMINY, B. W., FEENEY, P. J., Experimental and computational approaches to estimate solubility and permeability in drug discovery and development. Adv. Drug Delivery Rev. 2001, 46, 3–26. 38 FEJZO, J., LEPRE, C. A., PENG, J. W., BEMIS, G. W., MURCKO, M. A., MOORE, J. M., The SHAPES strategy, an NMR-based approach for lead generation in drug discovery. Chem. Biol. 1999, 6, 755–769.
39 SHUKER, S. B., HAJDUK, R. P., MEADOWS, R. P., FESIK, S. W., Discovering high-affinity ligands for proteins: SAR by NMR. Science 1996, 274, 1531–1534. 40 HAJDUK, P. J., MEADOWS, R. P., FESIK, S. W., NMR-based screening in drug discovery. Quart. Rev. Biophys. 1999, 32, 211–240. 41 VERLINDE, C., KIM, H., BERNSTEIN, B.E., MANDE, S.C., HOL, W.G.J., Antitrypano-somiasis drug development based on structures of glycolytic enzymes. Structure-based drug design (VEERAPANDIAN, P., ed.), Marcel Dekker, New York, 1997, 365–394. 42 NIENABER, V. L., RICHARDSON, P. R., KLIGHOFER, V., BOUSKA, J. L., GIRANDA, V. L., GREER, J., Discovering novel ligands for macromolecules using X-ray crystallographic screening. Nat. Biotech. 2000, 18, 1105–1108. 43 LESUISSE, D., LANGE, G., DEPREZ, P., BENARD, D., SCHOOT, B., DELETTRE, G., MARQUETTE, J.-P., BROTO, P. et al., SAR and X-ray. A new approach combining fragment-based screening and rational drug design: application to the discovery of nanomolar inhibitors of Src SH2, J. Med. Chem. 2002, 45, 2379–2387.
17
1
Micro-Crystallization of Proteins Carl L. Hansen
California Institute of Technology, Pasadena, USA
Morten Sommer California Institute of Technology, Pasadena, USA
Kyle Self Fluidigm Corporation, San Francisco, USA
James M. Berger University of California at Berkeley, Berkeley, USA
Stephen R. Quake California Institute of Technology, Pasadena, USA
Originally published in: Protein Crystallography in Drug Discovery. Edited by Robert E. Babine and Sherin S. Abdel-Meguid. Copyright ľ 2004 Wiley-VCH GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30678-7
1 Introduction
With the completion of the various genome projects including the human genome, X-ray crystallography is poised to be the workhorse in the next great endeavor: illuminating the cellular proteomes. Just as the invention of the microscope helped to bring about an understanding of life at the cellular scale [1], X-ray crystallography has allowed scientists to observe proteins at the atomic level, driving a structural revolution in the biological and medical sciences. Unprecedented technological advances in synchrotron X-ray sources, phasing techniques and computing power have further amplified the power of this technique [2, 3]. These innovations in diffraction methods and model generation have not however been matched by the development of techniques for reliably and rapidly crystallizing macromolecules. Obtaining diffraction-quality crystals remains a major bottleneck in structure determination [4], and has prevented X-ray crystallography from realizing its true potential. The development of generally applicable technologies that enable the robust routine crystallization of biological Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Micro-Crystallization of Proteins
macromolecules will bring about a paradigm shift in biology and drug discovery, allowing researchers not only to picture life at the molecular level, but also to open areas of investigation that were previously inaccessible. Emergent technologies utilizing micro-fluidics now have the potential to solve these problems on several levels, both by allowing researchers to conduct efficient assays on the nanoliter scale, and by pushing the development of novel crystallization techniques. The crystallization of a biological macromolecule is realized by manipulation of one or more chemical and thermodynamic variables, such that the solubility of a target molecule in a concentrated solution is reduced, thereby promoting a transition to the solid phase in the form of a well-ordered crystal. In principle, any thermodynamic variable that may directly, or indirectly, affect protein solubility may be used to induce crystallization. Variables that are most often manipulated include macromolecule concentration, ionic strength, identity and concentration of precipitating agents, pH, temperature and small-molecule additives. Together, these variables comprise a vast multi-dimensional chemical phase space that must be systematically explored to discover crystallization conditions. Natural macromolecule targets for crystallography are both large and extremely varied. Proteins, for example, are complicated polymers of amino acids with polar, non-polar, charged and aromatic residues that interact with each other and with the external environment. These complicated interactions result in a highly varied and complicated phase space diagram that cannot be determined a priori.Consequently, macromolecular crystallization requires a brute force approach in which as large a volume of chemical phase space as possible must be explored. For macromolecules possessing a large volume of phase space that is conducive to crystallization, there is a good chance that one of the randomly screened conditions will result in crystal formation. Since the first "hit" is not likely to be optimal, initial successes are usually of poor quality, and may consist of spherulites (Fig. 1A), phase separation (Fig. 1B), micro-crystals (Fig. 1B), needles, needle clusters, thin plates (Fig. 1C), plate stacks (Fig. 1D), or small single crystals (Fig. 1E). These starting conditions can be used to initiate focused and refined screening however, eventually producing diffraction quality crystals (Fig. 1F). For macromolecules that are more difficult to crystallize, initial screens may be too coarse to uncover promising conditions and further screening may be required. If the conditions required for crystallization are very specific, it may ultimately require many thousands of experiments before a hit is detected, if at all. The large number of experiments required to uncover successful crystallization conditions therefore is what represents the most formidable obstacle to determining the structure of many important biological macromolecules. For example, membrane proteins play a central role in cell signaling and are often excellent targets for small molecule therapeutics. Unfortunately, the structures of only a very few have been determined to date, primarily because of the difficulties associated with expressing, solubilizing and stabilizing these molecules in large quantities (>1 mg) [5–7]. Similarly, many proteins in the cell work as large complexes. Structural information of these targets provides invaluable information regarding complex reactions and protein-protein interactions but such assemblies
1 Introduction 3
Fig. 1 Examples of initial crystallization hits from a single on-chip screening experiment of a type II topoisomerase ATPase domain/ADP, 12 mg mL−1 . (A) Irregular spherulite. (B) Phase separation
and spherulites with nucleating microcrystals. (C) Thin plate clusters. (D) Thick plate stacks. (E) Well-formed microcrystals. (F) Large single crystals. All scale bars are 100 µm.
are exceedingly difficult to purify in large amounts, and at times must even be purified by processing kilogram quantities of native sources [8, 9]. Traditional techniques for the crystallization of macromolecules include concentration through slow dehydration (vapor diffusion), batch and dialysis methods and both liquid-liquid and liquid-gel diffusion experiments (for a review see [10]). Practical limitations of traditional fluid handling approaches require that the minimum volume per assay for these techniques ranges from 0.2 to 1 µL for vapor diffusion and micro-batch methods, and up to 50 µL for micro-dialysis. Since protein samples may only be available in milligram quantities, and target molecule concentrations are generally required in excess of 10 mg mL−1 , the ability to perform nanoliter scale assays routinely would not only help to promote a more comprehensive screening strategy, but would also allow for experimentation with ultra-low-abundance macromolecules. To date robotic automation has been the dominant strategy both for miniaturizing standard crystallization protocols and for fostering high-throughput crystallization [11, 12]. This allows more trials to be conducted with a given sample volume. State of the art liquid handling robots are capable of accurately dispensing low viscosity solutions in small volumes. These machines have been shown to increase throughput, reduce sample consumption, and improve reproducibility [4, 13–15]. Nonetheless, there are however several drawbacks that limit robotic success and applicability. Since dispensing robots are sensitive to solution viscosity and surface tension, they must be properly calibrated for each working fluid, and frequently have difficulty dispensing the highly viscous solutions (such as MPD or PEG) used in crystallography. Furthermore, these systems require significant dedicated space and are expensive to both purchase and maintain, making them unavailable to the vast majority of researchers, particularly academic labs. From a progress
4 Micro-Crystallization of Proteins
standpoint, these systems essentially only scale down the volumes needed to conduct screening by traditional methods employing low viscosity solutions. Robotics have not necessarily changed or improved upon the efficiency of the crystallization method itself.
2 Microfluidics – Method and Design
In the same way that miniaturization has impacted the electronics industry, microfluidic technologies promise to spark a revolution in the biological sciences by integrating ultra small-volume sample processing within a chip format. The use of nanoliter reaction volumes and highly scaleable parallel sample processing make microfluidic technologies ideally suited to protein crystallography, where the screening and processing of precious reagents is required. Beyond reduction in sample consumption, the unique properties of mixing and fluid flow at the micron scale also allow for the implementation of assays that are highly efficient at detecting crystallization conditions. Despite this enormous potential, several technical problems have prevented the realization of the full potential of microfluidic devices for protein crystallization. In order to perform assays on the nanoliter scale, a system for the accurate metering and dispensing of fluids is required. For such a system to be scaleable and to have general applicability, it must be insensitive to both the surrounding fluidic architecture and to the properties of the working fluids. In the case of protein crystallography, this latter requirement is particularly important since the solutions used in crystal trials cover a large range of chemistries, viscosities, ionic strengths and pH. Previous work on microfluidic metering has resulted in the development of electrokinetic metering systems that manipulate fluids by the application of an electric potential to the terminals of the device [16–19]. Using this technique, once the devices are calibrated, fluids may be accurately metered and mixed on the picoliter scale [16–21]. This method is not scaleable however, and there are additional fundamental limitations that prevent its universal application. For example, the electrokinetic force used in these devices depends strongly on both the properties of the working fluid and on its interaction with the channel walls. Small changes in pH or salt concentration that result from ion drift over time can lead to more than a ten-fold variation in injected volume. These systems are also dependent on the viscosity and on the fluidic resistance due the surrounding channel architecture. As a result, electrokinetic devices must be recalibrated for every fluid, and are not suitable for high-throughput screening applications with a diverse ensemble of unrelated fluids. Moreover, since electrokinetic systems have no active valves, reagents diffuse through junctions and channels over time. This leaking dilutes and contaminates samples over time, restricting the maximally achievable incubation times and the density at which assays may be integrated on chip. The problem of reagent storage is particularly acute for crystallization applications where assays may be required to incubate for several days or weeks.
2 Microfluidics – Method and Design 5
All of these problems may be addressed by using MEMS (microelectromechanical systems) fabrication techniques to incorporate active mechanical valves on chip. Traditional MEMS techniques, using "hard" materials such as glass or silicon, may be used to fabricate true, leak-proof mechanical valves. However, MEMS fabrication techniques are expensive and require many processing steps, rendering the integration of many valves on a chip a difficult and expensive process. Furthermore, since these valves are fabricated from hard materials, a large valve actuator is needed to achieve valve closure at attainable actuation forces. The large actuators of each valve and the low yield of the fabrication process impose practical constraints on the degree of integration that is possible in microfluidic devices made from traditional MEMS techniques. As an alternative to traditional MEMS techniques, multilayer soft lithography (MSL) enables the facile and inexpensive large-scale integration of valves on chip [22, 23]. MSL is an extension of the technique of soft lithography [24], which uses replica molding of non-traditional polymer materials from a micromachined master. Soft lithography may be used to quickly reproduce features as small as 80 nm, creating a soft and inexpensive negative of the original master. This technique has found diverse applications from patterned surface deposition methods to the fabrication of passive microfluidic devices [24–34]. MSL uses consecutive soft lithography molding and bonding steps to generate complex multilayer fluidic devices with active mechanical valves, pumps, mixers and flow control logic. Owing to their excellent optical properties, low surface energy, and low Young’s modulus, silicone rubbers have become the most popular material for multi-layer soft lithography. Silicone rubbers are typically formed from the combination of two liquid components that cross-link into a flexible solid upon curing. One such example polymer is General Electric RTV 615. Part A of this compound consists of a platinum catalyst with polydimethylsiloxane polymers that have been functionalized with vinyl groups. Part B contains silicon hydride (Si-H) groups that covalently bond to the vinyl groups of part A, cross-linking the polymers. When mixed at the stoichiometric ratio of 10:1 (A:B) there is an equal number of vinyl and silicon hydride groups, allowing all groups to become incorporated into a covalent bond. When combined at a different ratio, however, there remain active groups that do not participate in a covalent bond, and which instead may be used to bond two surfaces together permanently, creating a monolithic device. A process flow diagram illustrating the steps in MSL is shown in Fig. 2. Two negative molds, one defining the flow structure, and the other defining the control structure of valves, are first patterned on a silicon wafer using conventional photolithography, leaving 10 µm raised features of photo-resist. The flow layer master is then annealed so that the photo-resist is allowed to re-flow, creating rounded flow channels. Masters are reproduced by replica molding in silicone rubber. A 30:1 ratio of (A:B) silicone rubber, having excess vinyl groups, is spun onto the flow mold to a final thickness of 30 µm. A 3:1 silicone rubber layer, containing excess silicon hydride groups, is cast over the control mold to a thickness of approximately 7 mm. Once both layers are heated and allowed to partially cure, the structures solidify. The control layer is then peeled from the mold, punched to create valve
6 Micro-Crystallization of Proteins
Fig. 2 Flow diagram illustration of the technique of multilayer soft lithography.
access ports, aligned to the flow structure and the entire device is then heated once again, causing the excess vinyl and silicon hydride groups to covalently bond. The resulting monolithic device is peeled from the flow mold and channel access ports are punched into it prior to sealing the multilayer polymer to a glass substrate. The substrate itself may additionally have structural features, such as microwells of defined volumes that may be accessed through the molded channels, thus providing larger localized volumes for reactions to occur. When a control channel crosses over a flow channel, only a thin square membrane of elastomer separates the two, forming a valve (Fig. 3A). By pneumatic or hydraulic pressurization of the control channel, the membrane may be deflected down into the flow channel, causing it to seal against the glass substrate (Fig. 3B). Because of the compliance of the membrane, a hermetic seal may be easily achieved at moderate actuation pressures even in the presence of particulates or imperfections. The low Young’s modulus of the elastomer (≈1 MPa versus ≈100 GPa for crystalline silicon) allows for the closing of valves having areas as small as 100 µm2 , so that many thousands of active valves may be integrated on a chip smaller than a credit card [23]. The actuation pressure required to close a valve is dependent on the geometry of the valve junction. For a standard geometry of a 10 µm high flow channel, a 20 µm thick membrane, and a 100 µm ×100 µm valve, the required actuation pressure is approximately 6 psi. The strong dependence of actuation pressure on the channel and valve geometry may be exploited to engineer bridges (valves with very high actuation pressure) by simply tapering the thickness of a control channel in the vicinity of a flow channel that is not to be closed. By applying an intermediate actuation pressure, a single control channel may be used to close a plurality of selected flow channels without significantly impeding flow in adjacent lines. It is, however, not enough simply to create a device with many integrated valves. The realization of highly integrated and complex fluidic devices also requires that the problem of priming, or initially filling the device with fluid, be addressed. For microfluidic devices made from conventional hard materials such as silicon or glass, this requirement may preclude the use of multiply crossing, highly complex fluidic architectures. Such devices may therefore need to be primed using a flowthrough method, which requires an outlet through which displaced gas may be vented. The introduction of the priming fluid through a complex fluidic structure
2 Microfluidics – Method and Design 7
Fig. 3 Active valves fabricated by multilayer soft lithography. (A) Rendering of valve geometry; control channel (narrow diagonal) crosses flow channel (wider diagonal) forming a valve at the intersection. (B) Micrograph of an open valve showing control
(horizontal) and flow (vertical) structures. (C) Closed valve; application of pressure to control structure deflects the membrane and pinches off the flow structure, creating a fluidic seal.
may trap air bubbles at junctions and other channel features. Surface tension effects at the liquid-gas interfaces of these bubbles result in a high effective viscosity such that they may not be easily removed and can adversely affect the performance of the device. Furthermore, since a single priming fluid must pass through the entire device, it may subsequently contaminate or dilute sample solutions. These difficulties can be surmounted in silicone devices by exploiting the gas permeability of a soft silicone polymer [35, 36]. Arbitrarily complex, connected fluidic structures may be filled in minutes by a technique called pressurized outgas priming (POP). Using the POP technique, a fluid is injected into a closed channel structure, causing the gas ahead of it to be pressurized. Owing to the permeability of the elastomer, the pressurized gas quickly diffuses into the bulk material, allowing the priming fluid to completely fill the flow structure. Despite the low surface energy of PDMS (22 mN m−1 ), aqueous solutions may be easily introduced, at moderate pressures (1–8 psi), into channels having a minimum dimension of 1 µm, eliminating the need for surface modification protocols. Since no outlet is needed for venting the gas, dead-end reaction chambers and channels may be used. By incorporating integrated valves into the flow structure, the device
8 Micro-Crystallization of Proteins
Fig. 4 Barrier interface metering. (A) A set of microfluidic reaction chambers designed to mix two fluid samples at three distinct mixing ratios. Reaction chambers having approximate volumes of 5 nL, 12.5 nL and 20 nL are paired to produce final mixing ratios of 4:1, 1:1 and 1:4 while maintaining a constant reaction volume of 25 nL. (B) Dead-end filling of reaction chambers by the pressurized outgas priming (POP) method.
Top wells have been filled with dye (13 mM bromophenol blue sodium salt) and bottom wells are being filled with water. (C) Chambers are sealed by actuation of the containment valves and the interface valve is released, allowing the solutions to slowly mix by diffusion. (D) The complete diffusive mixing of an organic dye with water, resulting in three distinct concentrations in the different mixing chambers.
may be selectively primed with different solutions, allowing for greater design flexibility. The ability to fill a device with many different fluids in different channels and chambers allows for the implementation of a simple and robust geometric method of metering solutions. The principle behind this scheme is to set up a geometry in which two microfluidic chambers may be primed with different solutions and connected via a fluidic interface that is controlled by a microfabricated valve (Fig. 4). With a central interface valve closed, the two chambers are first dead end filled with two different solutions using the POP technique. Once both chambers have been completely filled, containment valves are actuated, isolating the chambers from the rest of the chip. The interface valve is then opened, creating a constricted fluidic connection between the chambers, and allowing them to mix by diffusion. In this way, the chambers effectively act as microfluidic measuring cups, precisely determining the mixing ratio by the relative chamber volumes. Since the metering depends only on the volume of the chambers, it is an inherently robust technique that is insensitive to resident fluid properties.
3 Utility of Microfluidics for Crystallization 9
3 Utility of Microfluidics for Crystallization
In the simple configuration shown in Fig. 4, mixing is completely diffusive and occurs on the order of an hour for small molecules in an aqueous solution. Although rapid mixing of solutions at the low Reynolds numbers presents a challenge, several schemes for fast and efficient mixing have been developed. In cases where rapid mixing is required, it is straightforward to implement the BIM scheme into a geometry such as a ring, and then enhance mixing by active pumping around the ring using a kneading or chaotic mixing configuration [37–39]. Nonetheless, for some applications it is desirable to suppress the confounding effects of convection and instead to allow mixing that is completely diffusive. One situation where this scheme is particularly useful lies in the crystallization of macromolecules. The simple geometric metering scheme shown in Fig. 4 has been used to develop a highly efficient microfluidic device for protein crystallization in ultra-small volume reactions. The crystallization chip implements 144 simultaneous metering and mixing reactions while consuming only 3.0 µL of protein solution. A layout of the chip (Fig. 5) shows 48 reaction centers (Fig. 4), each consisting of three pairs of microfluidic reaction chambers with relative volumes of 1:4, 1:1 and 4:1. Each pair of chambers is connected to the protein sample and one of 48 crystallization solution reservoirs. The chip has 480 integrated valves that are actuated through three
Fig. 5 Parallel architecture of crystallization chip and blow-up of a single unit cell. All unit cells are connected to a central sample port (yellow) and one of 48 unique reagent ports (blue). All interface and containment
valves are actuated in parallel through two control ports (red). An additional control port (red; top right) actuates safety valves in the unlikely contingency of a valve failure.
10 Micro-Crystallization of Proteins
separately addressable control lines. As a precaution, 48 safety valves are included at the solution inlets to avoid the unwanted loss of protein sample in the unlikely event of an interface valve failure. The remaining two lines simultaneously control all interface and containment valves. By virtue of this parallel architecture and the robustness of the BIM scheme, solutions of varying viscosity, surface tension, pH and ionic strength may be simultaneously metered and mixed at three different mixing ratios using only two hydraulic control lines. The chip is contained within a carrier device (Fig. 6), which facilitates loading, storage and interfacing with the control lines. The chip is secured in the base of the carrier with the safety and interface lines directly connected to two of three carrier interface pins that are externally accessible. The containment line interface pin is connected to the chip through a pressure accumulator. Once charged, this accumulator acts as an on-board pressure source to maintain actuation of the containment valves for several weeks, thereby allowing the chip to be stored and transported without the need for any external connections. The top of the carrier has two cavities with a raised lip around the periphery and stainless steel input ports for pressurization. The cavities mate with the 48 reagent wells, creating a seal against the compliant elastomer chip when the plates are pressed together. Once the carrier is assembled the cavities are pressurized, simultaneously injecting the 48 crystallizing agents into the chip. The protein sample is then loaded through a
Fig. 6 Crystallization chip inside carrier device. Pressure reservoir (bottom) allows for free transport and storage of chip. Interface pins (right) allow for facile loading and control of chip valves.
3 Utility of Microfluidics for Crystallization 11
single port located in the center of the chip. A 3 µL sample of protein is aspirated into a pipette tip, and the tip is inserted into the protein port. The tip is then pressurized through an adapter, injecting the protein into the 144 reaction chambers. Since the sample is introduced through a single port, there is very little lost at the interface, and a true economy of scale is realized with more than half the sample being used in the crystallization assays. While the most obvious advantage of microfluidics is the enormous reduction in sample consumption, the growth of protein crystals can also be fundamentally improved by taking advantage of the physical properties of fluid flow at the micron scale [36]. The successful crystallization of a macromolecule is determined both by thermodynamic and kinetic considerations. A concentrated solution of the target molecule must first be brought into a state of supersaturation in which the crystal phase is energetically favorable, and then kept in this state to allow crystal nucleation and growth to occur. Super-saturation is induced through the addition of a precipitating agent. Traditional precipitating agents include salts, polymers (e.g., polyethylene glycol), organic solvents, buffers and various additives, all of which are chosen specifically to manipulate thermodynamic variables such as solution pH, dielectric constant, salt concentration and effective protein concentration. Since there is currently no way to predict a priori which conditions will be favorable to crystallization, determination of conditions is a purely empirical process of trial and error. Moreover, because a thorough investigation of phase space by conventional techniques is impractical, the initial search is directed towards a sparse matrix or incomplete factorial sampling of likely crystallization agents [40, 41]. Once preliminary conditions have been identified, a refined search is used for optimizing crystal size and quality. The chance of success in this search depends both on the number of assays and the effectiveness of each assay. While there is no general strategy for predicting what conditions will give rise to crystallization, an understanding of the physics of crystal nucleation and growth may be used to design experiments that increase the chances of success. A hypothetical two-dimensional phase space illustrating the protein behavior as a function of protein concentration and precipitating agent concentration is shown in Fig. 7 [36]. This phase space represents the interaction of a given protein with a specific precipitating solution, and is therefore unique to that solution. The shape and the extent of the phase space region will vary dramatically depending on the precipitating agent and target molecule characteristics. For some precipitating agents there may exist a region of supersaturation bounded by precipitation and solubility curves. In this region the protein solution is out of equilibrium, and given sufficient time undergoes a phase transition to a crystalline solid. The time required for such a transition depends strongly on the degree of supersaturation defined as the ratio of the protein concentration to the maximum soluble concentration in equilibrium. This region of supersaturation may further be divided into a metastable and a labile region. Within the labile region, near the precipitation curve, rapid nucleation occurs, leading to the growth of a large number of small low quality crystals. In many cases these microcrystals may be too small to be distinguished from the amorphous precipitate, a problem that can cause
12 Micro-Crystallization of Proteins
Fig. 7 Schematic diagram of the evolution of vapor diffusion, microbatch and :fID experiments through a two-dimensional projection of phase space having protein concentration and precipitating agent concentration as variables. The phase space is divided into soluble (S), metastable (M), labile (L) and precipitation (P) regions. Microbatch and hanging-drop experiments start at point II where the target molecule is combined 1:1 with the precipitating agent. Microbatch experiments sample only a point in phase space since incubation under immiscible oil prevents subsequent concentra-
tion of reagents (green). In vapor diffusion experiments, equilibration with a reservoir of precipitating agent slowly concentrates the reagents, driving the sample monotonicaly into the super-saturation region (black). The evolution of a :FID reaction site having 3 different mixing ratios is shown. Curves represent the average state of both the sample side (blue) and precipitating agent side (purple) of each compound well. The final states (I, II, III) are determined by the mixing ratio. A decrease in protein concentration due to precipitation or crystal growth is not included in the figure.
promising conditions to be overlooked. In the metastable region, near the solubility curve, the growth of large, high quality crystals is supported, but nucleation is a rare event requiring impractically long incubation times. Experiments that stay in this region will likely appear as clear drops and will be unremarkable. The three-dimensional aggregation of target molecules into a critical nucleus from which crystal growth may proceed, is a process that requires a higher activation energy, and hence higher supersaturation, than the subsequent one- or two-dimensional nucleation needed for crystal facet growth. For this reason, an optimal crystal growth scheme should allow for initial nucleation by transiently high levels of supersaturation, followed by passage into lower supersaturation levels that support high quality crystal growth [42, 43]. The microfluidic reactor we designed provides exactly this property by implementing a microfluidic free-interface diffusion (µFID) between the precipitant and the protein solutions on chip. The opening of the microreactor interface valve that separates the protein from the precipitating solution creates a well-defined fluidic interface that allows for diffusive equilibration of the coupled micro-wells. The phase-space trajectory taken by each assay on the chip therefore depends on
3 Utility of Microfluidics for Crystallization 13
both the diffusion constants of the species involved, and the relative volume of the wells. Shortly after the interface valves are opened, the target macromolecule concentration in the target well changes very little, while that of the precipitating agent, which typically has a much larger diffusion constant, increases towards one of three final values determined by the mixing ratios. Subsequently, over a time of approximately 8–24 hours the target macromolecule concentration equilibrates, increasing on the precipitating agent side and decreasing on target macromolecule side, towards the final protein concentration that is again determined by the mixing ratios. The macromolecule targets in the µFID assay therefore travel a curved path through phase space, sampling efficient crystal nucleation conditions in the labile region prior to settling into a high quality growth regime in the metastable region. This trajectory can be contrasted with that of microbatch and hanging drop methods, which are the two most popular assays for crystallization screening, but have phase trajectories that are stagnant or monotonically increasing in supersaturation (Fig. 7). The kinetics of the chip µFID assays have many similarities to conventional free interface diffusion, a technique that has long been recognized as an efficient method for exploring crystallization space [43–45]. Despite the favorable kinetics of conventional free interface diffusion, this method has not achieved success in the protein crystallography community for a number of reasons. For example, conventional FID set-ups create a liquid interface in a capillary tube, so that solutions must be delicately introduced into opposite ends of the capillary by inserting a thin needle, making the technique labor intensive and ill-suited to high through-put screening. Furthermore, the diameter of the capillary must be of a substantial size (typically 100 prn), necessitating the use of relatively large sample volumes (generally > 5 µL). In addition, the introduction of the second fluid causes transient convection, which results in a poorly defined interface that may only be reduced by cumbersome procedures such as the introduction of hydro-gels or the freezing of the first solution. The drawbacks of the large volume that is required, delicate dispensing and the intrinsic poorly defined interface, have thus prohibited the use of FID as either a routine or large scale automated screening technique. Even when a well-defined fluidic interface can be created, buoyancy-driven convection due to density differences in the solutions causes complex mixing at the interface. To avoid this unwanted mixing, capillaries must be stored with the long axis parallel to gravity and the more dense solution on the bottom. This configuration creates a stable fluidic interface, but often causes nucleated crystals to fall away from the interface and out of the optimal growth conditions [10]. It has been proposed that free interface diffusion would therefore only realize its practical advantages in microgravity environments where gravity-induced convection is eliminated [10, 43]. However, the unusual properties of fluid flow in microfluidic devices make it both possible and practical to implement nearly ideal free interface diffusion conditions in terrestrial devices. At the length scales that are relevant in microfluidic devices, the Grashoff number (which measures the ratio of buoyant to viscous forces) is small so that convection is suppressed and mass transport is dominated by diffusion [46, 47]. Furthermore, the BIM method allows many near
14 Micro-Crystallization of Proteins
ideal fluidic interfaces to be established with negligible transient mixing. Lastly, since there are no significant concentration gradients parallel to gravity, nucleated crystals do not fall out of the ideal growth region. Conventional free interface diffusion achieves high transient levels of supersaturation, but has a complicated spatial/temporal gradient due to the constant cross-section of the capillary. This gradient couples the kinetics and thermodynamics of traditional free interface diffusion assays in a way that µFID does not. In the µFID assay, the fluidic interface is established between the two wells in a constricted channel where the cross-section is 10 µmx 100 µm. In contrast, the cross-section of a well is approximately 300 µm ×µ100 im. This constriction acts as a high-impedance connection between the two channels, localizing the concentration gradient only to the length of the connecting channel. Fig. 8 shows a finite difference time domain simulation of the diffusive equilibration of a low molecular weight dye in two micro-wells coupled by a constricted channel (PDE R ; The MathWorks Inc. of Natick, MA, USA). With the exception Toolbox, MATLAB of the region in close proximity to the inlet, no appreciable concentration gradient forms in the micro-well itself. In a protein crystallization experiment, this implies that as the wells equilibrate, the vast majority of the sample evolves simultaneously through a continuum of thermodynamic conditions. By monitoring the experiment
Fig. 8 Two-dimensional finite element modeling of diffusion of an organic dye in aqueous solution between two micro-wells connected by a constricted channel. Simulation shows the bulk of the concentration drop
occurring along the channel with no appreciable gradient within the micro-wells. For modeling purposes, constriction of the channel in the vertical dimension has been represented by a lateral constriction.
3 Utility of Microfluidics for Crystallization 15
over time it is therefore possible to determine the exact transient conditions that resulted in crystal nucleation. In the optimization of crystallization conditions, it is often desirable to slow down the equilibration process so that favorable conditions are approached more slowly to produce fewer nucleation events and larger crystals. In vapor diffusion experiments, this regime can be achieved by methods such as placing a thin layer of semi-permeable oil over the precipitant, increasing the size of the crystallization drop to slow equilibration, or by inclusion of a chemical additive such as glycerol to the crystallization drop in order to reduce the vapor pressure. While these techniques are effective in slowing down the equilibration of the drop with the reservoir, they allow for only coarse control of the equilibration kinetics. Conversely, microfluidic free interface diffusion allows for precise and straightforward control of the equilibration rate while decoupling the kinetics and thermodynamics of crystallization. The absence of any spatial gradient within the micro-wells allows for a simple analytical solution of the time-dependent evolution of the concentration in each chamber. The net transport of a diffusing species along the channel is equal to the product of the diffusive current density and the channel cross-sectional area. Since the channel is constricted in both height and width, the problem is onedimensional and the gradient along the channel is given by the difference in concentration divided by the channel length. The equation governing the change of concentration in a well of volume V 1 and concentration C1 (t) that is coupled to a second well of volume V 2 at concentration C2 (t) is therefore given by: dC1 (t) C1 (t) − C2 (t) = DA dt V1 L
(1)
Integration of this equation and application of the initial conditions C1 (0) = C0 , C2 (0) = 0, gives C1 =
C0 1+
V2 V1
+
C0 1+
V1 V2
V1 + V2 1 A × exp −D t V2 L V1
(2)
From this equation, it can be seen that for a given diffusion constant and relative well volume, the rate of equilibration depends only on two characteristic lengths: the length of the connecting channel and the ratio of the well volume to the channel cross-sectional area. Thus, modifying the channel geometry allows for intuitive and accurate control over the kinetics of equilibration without changing the chemistry of the solutions. For example, by making the channel twice as long, the reaction proceeds at one half the speed, whereas reducing the channel crosssectional area by a factor of two increases the equilibration time by the same factor. Furthermore, since these length scales only scale time in the exponent, the locus of concentrations (path through phase space) achieved during a complete equilibration depends only on the relative volume of the wells, and is independent of the channel geometry. The decoupling of the kinetics and thermodynamics of
16 Micro-Crystallization of Proteins
diffusive equilibration has important implications in crystal optimization where it is often desirable to approach crystallization conditions slowly while conserving the successful thermodynamic variables. A further advantage of the constricted channel between the two wells is that only a very small fraction of the protein sample is exposed to large transient gradients that occur shortly after the interface is established. In micro-batch or hangingdrop experiments, the sudden addition of the precipitating agent to the protein sample induces rapid convective mixing, causing large transient concentration gradients throughout the drop, and may often result in the immediate precipitation of the protein. In the µFID experiments only a very small volume of the protein is exposed to this gradient, so that more concentrated precipitating solutions should be used with negligible immediate precipitation, and hence ultimately higher levels of supersaturation may be achieved. Experiments comparing the effectiveness of the chip to two conventional macroscopic techniques, vapor diffusion and microbatch, show that the chip may be used to conduct large numbers of nanoliter-scale assays while equaling or increasing crystallization success rates of these methods. In a recent study [36], in-chip crystallization experiments were conducted on 11 model macromolecules including 7 commercially available crystallization standards (Lysosyme, Glucose Iso-merase, Xylanase, Thaumatin, Protease K, Bovine Trypsin and Beef Liver Catalase), 3 proteins with unpublished structures (bacterial primase catalytic core domain, type II topoisomerase ATPase domain and a mycobacterial RNase) and a bacterial 70S ribosome. Each target was screened against two or more standard sparse matrices of precipitants. To compare crystallization in chip against standard crystallization methods, crystallization experiments were repeated for 9 of the model macromolecules using the conventional micro-batch and hanging-drop techniques, thereby keeping the precipitant chemistries constant while varying the kinetic scheme for crystal growth. Successful crystallization conditions were identified in the µFID format for all model macromolecules tested, and crystal growth was observed at incubation times ranging from 3 hours to 7 days. High quality crystals of 4 different proteins grown in chips are shown in Fig. 9. A histogram comparing the number of successful experiments for each method (Fig. 10) shows that identical sparse matrix screens led to crystal growth more often in the chip than by conventional techniques in all but two cases [36]. For the bacterial primase catalytic core domain, 11 conditions producing needle crystals of dimensions greater than 100 µm were detected on chip; no hits were initially observed in either macroscopic method. Further on-chip experimentation, optimizing around the crystallization conditions, identified in initial screens crystals produced whose largest dimension exceeded 400 µm. These conditions were subsequently used to reproduce crystallization in microbatch and vapor diffusion formats, demonstrating that successful on-chip conditions may be exported to more conventional macroscopic techniques. During these trials, the chip also produced crystals of two targets that had not been seen by conventional screening. A previously unidentified crystal form of the bacterial 70S ribosome was obtained in three conditions of a sparse matrix of
3 Utility of Microfluidics for Crystallization 17
Fig. 9 High-quality protein crystals grown in the crystallization chip. (A) Glucose isomerase, 31 mg mL−1 in deionized water. Mixing ratio of 1:1 with 0.2 M magnesium acetate tetrahydrate, 20% w/v polyethylene glycol 8000, 0.1 M sodium cacadylate, pH 6.5. (B) Thaumatin, 50 mg mL−1 in 0.1 M ADA (Sigma Aldrich) pH 6.5. Mixing ratio of 1:1 with 0.8 M potassium sodium tartrate tetrahydrate, 0.1 M HEPES pH 7.5. (C)
Chicken egg white lysosyme, 50 mg mL−1 in 0.05 M potassium dihydrogen phosphate, 20% w/v polyethylene glycol 8000. (D) Crystals of type II topoisomerase ATPase domain/ADP showing the importance of mixing ratio as a screening parameter; 12 mg mL−1 in 100 mM sodium chloride, 20 mM TRIS pH 7.0. Mixing ratios of 4:1,1:1 and 1:4 with 0.2 M ammonium fluoride, 20% w/v polyethylene glycol 3350, pH 6.2.
Fig. 10 Histogram of crystallization hits for sparse matrix screens of model proteins. Number of screens tested on each protein are lysosyme (L)=2, glucose isomerase (Gl) = 2, protease K (PK) = 1, bovine
liver catalase (BLC) = 1, xylanase (X) = 2, bacterial primase catalytic core domain (BPC) = 3, bovine pancreas trypsin (BT) = 1, thaumatin (T) = 1, mycobacterial RNase (MR) = 3.
18 Micro-Crystallization of Proteins
precipitants (Hampton Crystal Screen I), demonstrating that large protein-nucleic acid complexes may be crystallized in chip (C. Hansen, A. Vila-Sanjurjo and J. Cate, personal communication). Crystals of a previously uncrystallized mycobacterial RNase were also obtained from a single experimental condition on chip, whereas no crystals had been observed for this sample despite prior extensive trials using traditional methods. Subsequent broad-based screening efforts for the RNase around this condition using the hanging-drop vapor diffusion set-ups proved successful, but only after the protein concentration was increased to >40mg mL−1 . Consistent with the localization of large initial concentration gradients to the relatively small volume of the connecting channel, µFID-based chip experiments result in reduced protein precipitation. It was observed that the mixing ratios and concentration of crystallization agents that lead to crystallization on chip often caused the protein to precipitate immediately in hanging drop and microbatch experiments. In the case of a type II topoisomerase ATPase domain, the final concentration of precipitating agent achievable in chip was 4 times greater than that possible for the microbatch method. Finally, crystal growth in µFID experiments was generally observed to be faster than in microbatch or hanging-drop. For the type II topoisomerase ATPase domain, crystal growth in microbatch required 2 weeks while crystals grown on chip with the same conditions appeared after only 4 hours of incubation. When crystals grew on chip in less than 12 hours, they were always observed on the protein side of the compound well, suggesting that the short crystallization times are due to the high degree of supersaturation achieved in the initial phase of diffusive equilibration. Despite the small volumes, crystals of sufficient size for crystallographic structure determination may be grown and harvested directly from the chip. Fig. 11
Fig. 11 X-ray diffraction pattern from a single thaumatin crystal grown and harvested from a 5 nL chip micro-well. The inset shows a clean reflection at 1.35 Å resolution.
References
shows a high-resolution diffraction pattern for a single thaumatin crystal grown from only 5 nL of protein solution [36]. The high-resolution data obtained from chip-grown crystals exceeds that of crystals grown by conventional ground-based techniques, and is comparable to that obtained from thaumatin crystals grown in space [48]. In general, crystallization experiments in space have suggested that microgravity environments may be well suited for the growth of large, high quality crystals [48–51]. The mechanism for the increased order of space-grown crystals is not completely known but is largely attributed to the lack of buoyancy-driven convection during crystal growth, an effect also present in terrestrial microfluidic devices [52–54]. Microfluidic technologies have the potential to play a dominant role in the development of high-throughput crystallography. While the simple parallel architecture of the present device is ideally suited to preliminary screening experiments, more sophisticated control and flow architectures will allow for increased functionality and applicability. Using the simple valve configuration as a building block, multivalve elements such as peristaltic pumps, rotary mixers and addressable arrays have already been created on chip [22, 23, 38, 55–57]. These components have further been incorporated into higher-level microfluidic mixing and metering tools that can process samples in the sub-nanoliter range with picoliter accuracy. Given the current rate of progress in the field these components will no doubt lead to new technologies with applications both in and beyond protein crystallization.
References 1 HOOKE, R., Micrographia 1665. 2 OGATA, C. M., MAD phasing grows up. Nat. Struct. Biol. 1998, 5 (S), 638–640. 3 CUSACK, S., et al., Small is beautiful: protein micro-crystallography. Nat. Struct. Biol. 1998, 5 (S), 634–637. 4 CHAYEN, N. E., SARIDAKIS, E., Protein crystallization for genomics: towards high-throughput optimization techniques. Acta. Crystallogr., Sect. D, Biol. Crystallogr. 2002, 58 (6), 921–927. 5 AI, X., CAFFREY, M., Membrane protein crystallization in lipidic mesophases: Detergent effects. Biophys. J. 2000, 79 (1), 394–405. 6 OSTERMEIER, C., MICHEL, H., Crystallization of membrane proteins. Curr. Opin. Struct. Biol. 1997, 7 (5), 697–701. 7 ROSENBUSCH, J. P., Stability of membrane proteins: Relevance for the selection of
8
9
10
11
12
appropriate methods for high-resolution structure determinations. Struct. Biol. 2001, 136 (2), 144–157. BAN, N., et al., A 9 angstrom resolution X-ray crystallographic map of the large ribosomal subunit. Cell 1998, 93 (7), 1105–1115. CRAMER, P., et al., Architecture of RNA polymerase II and implications for the transcription mechanism. Science 2000, 288 (5466), 640–649. MCPHERSON, A., Crystallization of Biological Macromolecules. 1st edn. Cold Spring Harbor Laboratory Press, 1999. RC, S., Development of high-throughput technologies for protein crystallography and structure-based drug design. Abstracts of the Papers of the American Chemical Society 2001, 221 (2), U38. RC, S., High-throughput protein crystallization. Curr. Opin. Struct. Biol. 2000, 10 (5), 558–563.
19
20 Micro-Crystallization of Proteins
13 ABOLA, E., et al., Automation of X-ray crystallography. Nat. Struct. Biol. 2000, 7, 973–977. 14 NYARSIK, M. U., et al., Development of a technology for automation and miniaturization of protein crystallization. J. Bio-technol. 2001, 85, 7–14. 15 SERVICE, R. F., Robots enter the race to analyse proteins. Science 2001, 292 (5515), 187–188. 16 KANHOLZ, A. E., et al., Quantitative analysis of molecular interaction in a microfluidic channel: The T-sensor. Anal. Chem. 1999, 71, 5340–5347. 17 JENSON, D., Smaller, faster chemistry. Nature 1998, 393, 735–737. 18 JACOBSON, S. C., MCKNIGHT, T. E., RAMSEY, J. M., Microfluidic devices for electrokinetically driven parallel and serial mixing. Anal. Chem. 1999, 71, 4455–4459. 19 ZHANG, C., MANZ, A., Narrow sample channel injectors for capillary electrophoresis on microchips. Anal. Chem. 2001, 73, 2656–2662. 20 DERTINGER, S. K. W., et al., Generation of gradients having complex shapes using microfluidic networks. Anal. Chem. 2001, 73, 1240–1246. 21 QUI, C. X., HARRISON, D. J., Integrated self-calibration via electrokinetic solvent proportioning for microfluidic immunoassays. Electrophoresis 2001, 22, 3949–3958. 22 UNGER, M., et al., Monolithic microfabricated valves and pumps by multilayer soft lithography. Science 2000, 288 (5463), 113–116. 23 THORSEN, T., MAERKL, S. J., QUAKE, S. R., Microfluidic large-scale integration. Science 2002, 298. 24 XIA, Y., WHITESIDES, G. M., Soft lithography. Angew. Chem. 1998, 37, 550–575. 25 KIM, Y., SUH, K., LEE, H., Fabrication of three-dimensional microstructures by soft molding. Appl. Phys. Lett. 2001, 79 (14), 2285–2287. 26 CHIU, D. T., et al., Patterned deposition of cells and proteins onto surfaces by using three-dimensional microfluidic systems. Proc. Natl. Acad. Sci. USA 2000, 97 (6), 2408–2413.
27 DELAMARCHE, E., Patterned delivery of immunoglobulins to surfaces using microfluidic networks. Science 1997, 276, 779–781. 28 LI, H. W., HUCK, W. T. S., Polymers in nanotechnology. Curr. Opin. Solid State Mater. Sci. 2002, 6 (1), 3–8. 29 BREHMER, M., et al., Soft lithography on block copolymer films. Abstracts of Papers of the American Chemical Society 2002, 224 (2), U502–U503. 30 TIEN, J., Nelson, C, Chen, C, Fabrication of aligned microstructures with a single elastomeric stamp. Proc. Natl. Acad. Sci. USA 2002, 99 (4), 1758–1762. 31 SHAH, H., et al., Direct generation of optical diffractive elements in perfluorocy-clobutane (PFCB) polymers by soft lithography. IEEE Photon. Technol. Lett. 2000, 12 (12), 1650–1652. 32 KANE, R., et al., Patterning proteins and cells using soft lithography. Biomaterials 1999, 20 (23/24), 2363–2376. 33 XIA, Y., GM, W., Soft lithography. Annu. Rev. Mater. Sci. 1998, 28, 153–184. 34 BRITTAIN, S., et al., Soft lithography and microfabrication. Phys. World 1998, 11 (5), 31–36. 35 MONAHAN, J., GEWIRTH, A. A., NUZZO, R. G., A method for filling complex polymeric microfluidic devices and arrays. Anal. Chem. 2001, 73, 3193–3197. 36 HANSEN, C. L., et al., A robust and scalable microfluidic metering method that allows protein crystal growth by free interface diffusion. PNAS 2002, 99, 16531–16536. 37 HE, B., et al., A picoliter-volume mixer for microfluidic analytical systems. Anal. Chem. 2001, 73 (9), 1942–1947. 38 CHU, H., UNGER, M. A., QUAKE, S. R., A microfabricated rotary pump. Biomed. Microdev. 2001, 3, 323–330. 39 STROOCK, A., et al., Chaotic mixer for microchannels. Science 2002, 295 (5555), 647–651. 40 CARTER, C. W., BALDWIN, E. T., Frick, L, Statistical design of experiments for protein crystal-growth and the use of a pre-crystallization assay. J. Cryst. Growth 1988, 90 (1-3), 60–73. 41 JANCARIK, J., KIM, S. H., Sparse-matrix sampling – A screening method for
References
42
43
44
45
46
47
48
49
50
crystallization of proteins. J. Appl. Crystallogr. 1991, 24, 409–11. LUFT, J. R., De TITTA, G. T., Kinetic aspects of macromolecular crystallization. Methods Enzymol. 1997, 276, 110–130. SALEMME, F. R., A free interface diffusion technique for the crystallization of proteins for X-ray crystallography. Arch. Bio-chem. Biophys. 1972, 151. SYGUSCH, J., et al., Protein crystallization in low gravity by step gradient diffusion method. J. Cryst. Growth 1996, 162 (3/4), 167–172. GARCIA-RUIZ, J. M., et al., A supersaturation wave of protein crystallization. J. Crystal. Growth 2001, 232 (1–4), 149–155. NERAD, B. A., SHLICHTA, P. J. Ground-based experiments on the minimization of convection during the growth of crystals from solution. J. Cryst. Growth 1986, 75, 591–608. GARCIA-RUIZ, J. M., et al., Agarose as crystallization media for proteins I: Transport processes. J. Cryst. Growth 2001, 232 (1–4), 165–172. BARNES, C. L., SNELL, E. H., KUNDROT, C. E., Thaumatin crystallization aboard the International Space Station using liquid-liquid diffusion in the Enhanced Gaseous Nitrogen Dewar (EGN). Acta Crys-tallogr., Sea D, Biol. Crystallogr. 2002, 58 (5), 751–760. WANG, Y. P., et al., Protein crystal growth in microgravity using a liquid/liquid diffusion method. Microgram Sci. Tcchnol. 1996, 9 (4), 281–283. MILLER, T. Y., HE, X. M., CARTER, D. C., A comparison between protein crystals
51
52
53
54
55
56
57
grown with vapor diffusion methods in microgravity and protein crystals using a gel liquid liquid diffusion ground-based method. J. Cryst. Growth 1992, 122 (1–4), 306–309. NG, J. D., et al., Comparative analysis of space-grown and earth-grown crystals of an aminoacyl-tRNA synthetase: space-grown crystals are more useful for structural determination. Acta Crystallogr., Sect. D, Biol. Crystallogr. 2002, 58 (4), 645–652. THOMAS, B. R., et al., Distribution coefficients of protein impurities in ferritin and lysozyme crystals – Self-purification in microgravity. J. Cryst. Growth 2000, 211 (1–4), 149– 156. LIN, H., et al., Lower incorporation of impurities in ferritin crystals by suppression of convection: Modeling results. Cryst. Growth Design 2001, 1 (1), 73–79. CHERNOV, A. A., GARCIA-RUIZ, J. M., THOMAS, B. R., Visualization of the impurity depletion zone surrounding apoferritin crystals growing in gel with holoferritin dimer impurity. J. Cryst. Growth 2001, 232 (1–4), 184–187. CHOU, H., et al., A microfabricated device for sizing and sorting DNA molecules. Proc. Natl. Acad. Sci. USA 1999, 96, 11–13. LIU, J., ENZELBERGER, M., QUAKE, S., A nanoliter rotary device for polymerase chain reaction. Electrophoresis 2002, 23 (10), 1531–1536. FU, A., et al., An integrated microfabricated cell sorter. Anal. Chem. 2002, 74 (11), 2451–2457.
21
1
NMR of Membrane-Associated Peptides and Proteins Reto Bader, and Mirjam Lerch University of Cambridge, Cambridge, United Kingdom
Oliver Zerbe ETH Z¨urich, Z¨urich, Switzerland
Originally published in: BioNMR in Drug Research. Edited by Oliver Zerbe. Copyright ľ 2002 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30465-7
1 The Biochemistry of Membrane Interactions 1.1 Introduction
Membranes play a central role in all forms of life. They form parts of functional units, establish communication between the inside and outside of cells and between different cells, or are involved in the transport of molecules into and out of cells. Most fundamental biochemical functions involve membranes at some point [1, 2]. Whereas both prokaryotes and eukaryotes have a plasma membrane that separates the extracellular from the intracellular fluid, higher organisms additionally possess membranes that surround the organelles such as the nucleus, mitochondria and lysosomes. The plasma membrane additionally serves as a matrix for integral membrane proteins such as ion channels or membrane-embedded receptors, e.g. the G-protein coupled receptors (GPCRs). Many pharmacologically important drugs exhibit their biological action through binding to these. Unfortunately, high-resolution NMR is presently incapable of elucidating structures of peptides binding to whole cells. The methods that have been developed to study this class of peptides or proteins have recently been reviewed in Refs. [3–9]. Table 1 contains an updated summary of structures of peptides determined in the presence of membrane mimetics by high-resolution NMR. We shall in the next sections describe models that have been successfully used in the past to determine the structure and dynamics of membrane-bound peptides. In order to understand which simplifications are inherent in the models used and what biological Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
25
1BKU, 1BYV, 1BZB n. a. n. a.
Calcitonin, Eel-CT, CT-M6
Membrane-perturbing toxins/antibiotics
34
n.a.
Galanin
n.a.
Alamethicin Trichorzianin Tritrypticin
Mastoparan Gramicidin A, B, C
n. a. n. a.
PACAP27 Melittin
n. a. 1JNO, 1JO3, 1JO4
1D6X
36
1F8P
14 15
20 19 13
27 26
39
n. a.
SDS
32
DMPC/DHPC SDS
SDS SDS SDS
DPC SDS
DPC
DPC
SDS
DPC
SDS
SDS
DPC
Detergent
30
22
Tubero-infundibular peptide Neuro-peptide Y
Parathyroid hormone (1–34) Peptide E
n.a.
Motilin
29
n.a.
Glucagon
Size [AA]
Peptide hormones/ neuropeptides
PDB No.
Peptide/class
Class
Table 1 Summary of peptide structures determined in micelle systems
Helix Two helices Two turns, Tyr/Trp at membrane-water interphase Helix Helix
Flexible N-terminus, amphiphilic C-terminal α-helix Helix, flexible N-terminus Bent helix
Two beta-turns, intermediate order parameters Two helices
Two helices
Hydrophobic N-terminus in micelle interior, amphi-philic α-helix at C-terminus parallel to surface Three turn-like regions at water-membrane inter-phase Amphipathic helix and extended region
Two hydrophobic patches
Result
50 109
106 107 108
104 105
83
103
102
101
99, 100
98
97
96
Ref.
2 NMR of Membrane-Associated Peptides and Proteins
Ligand-receptorcomplexes Varia
Signal sequences Fragments of peripheral membrane proteins Cell/organelle penetrating
Virus coat Fragment of integral membrane proteins
Bacteriophage coat proteins
57 79 48
1IOJ 1I5J 1JO5 n.a.
cPLA2-C2
138
8/29
1HZN
52
n.a.
TrFd chloroplast membrane transit peptide CCK-8/CCKA-recep-tor (329–357) Apolipoprotein C-I Apoligoprotein C-II β subunit of LH1
16
21 15
SDS SDS Zwittergent 3:12 DPC/glycerophosphocholine
DPC, DPG (phosphoglycol) DPC
SDS
SDS DPC/bicelles
DPC
45
n.a.
n.a. n.a.
DPC DPC
SDS
DPC, SDS (comparison) MPG DPC SDS
80 31
40, 42
n.a. 1AFO 1BL1
53 19 36, 71
50
1JAV 1BHA
2CPB, 2CPS
Penetratin
Glycophorin A(62–101) PTH receptor 1 (168–198) PTH receptor 1 (241–285) PhoE N-myristoylated ARF (1–15)
Ike HIV gp41 Bacterio-opsin (1–36), (1–71) Aβ(1–40), (1-42)
M13 fd
123 124 125 126
Two helices Two helices Two helices 8 antiparallel beta-strands
122
121
Ill-defined helix, extended
Asp/Arg-interactions
120
118 119
117
116 101
115
112 113 114
95, 110, 111
Helix
Helix-break-helix Amphipathic helix
Helix
Membrane-associated amphiphilic helix Alpha-helical homodimer Three helices
Amphipathic helix at surface, membrane spanning hydrophobic helix dito Membrane-associated helix Hydrophobic helix
1 The Biochemistry of Membrane Interactions 3
4 NMR of Membrane-Associated Peptides and Proteins
significance these differences have we will briefly review membrane biology and biophysics here. We shall see that most of the important differences concern the lipid composition, the functional groups and molecules presented on the surface, the curvature and charge of the membrane, and hence its membrane potential. It is obvious why the spectroscopist wants to investigate the structure of integral membrane proteins or enzymes, whose biological action is linked to the presence of phospholipids such as phospholipase, in a membrane-mimicking environment. Why such an environment should also be used for other peptides like hormones becomes more clear when we take into account the membrane compartment theory [10–12] as postulated by R. Schwyzer. This theory states that peptides that target membrane-embedded receptors associate with the membrane prior to receptor binding and that it is the membrane-bound conformation that is initially recognized by the receptor. As a consequence, parts of the ligands are optimized for membrane binding (the so-called address), other parts for receptor binding (the message). Hence, for all the classes of peptides or proteins listed above, useful structural information can only be expected when conditions in the NMR sample match the natural environment as closely as possible. Although solvent mixtures like chloroform/methanol may help to perfectly solubilize these peptides, their structure may be quite different from those of the membrane-bound form. 1.2 Biological Membranes
From X-ray studies it is known that many biological membranes are 5 nm thick [13, 14]. One of the first generalized and today well-accepted views of cell membranes is described by a fluid, dynamic mosaic of alternating globular proteins and phospholipid bilayer constituting a two-dimensional oriented viscous solution [15]. For comparison, the apparent effective viscosity of the membrane fluid phase is about 103 to 104 times that of water. The fluid mosaic model has been applicable to all functional membranes despite the tremendous diversity of membrane compositions and functions. Membranes usually exist in both the gel state and in the liquid crystalline state [16]. The former is characterized by a largely reduced mobility and is usually encountered at lower temperatures. The exact transition temperature depends on the lipid composition. Lateral diffusion for completely mobile molecules (such as the phospholipids themselves) is fast in the liquid crystalline state (∼106 s−1 ). Additionally, rapid rotation occurs about its long axis (∼109 s−1 ) [17–19]. In contrast, flip-flop transitions of lipids from the outer to the inner leaflet or vice versa are rare (∼10−5 s−1 ) [18, 20, 21], depending on the protein composition in the cell membrane. The major components of membranes comprise lipids, proteins and glycolipids or glycoproteins [1]. Biological membranes may contain between tens and hundreds of lipid species differing in the chemical nature of the head groups, chain lengths and degree of unsaturation. The lipids are long-chain aliphatic acids coupled to various moieties such as glycerol, sphingosine or ceramide. Common lipids are glycerophospholipids, glycoglycerolipids, phosphosphingolipids and sterols. They
1 The Biochemistry of Membrane Interactions
are either zwitterionic (neutral) or negatively charged. The exact composition of the membrane is tissue-dependent, and the cell maintains an asymmetry with respect to inner and outer leaflet through flipases, flopases and scramblases (see Kuypers et al. [22]). The choline-containing phospholipids sphingomyelin (SM) and phosphatidylcholine (PC) are mainly found in the outer monolayer, whereas the occurrence of most aminophospholipids [phosphatidylserine (PS) and phosphatidylethanolamine (PE)] is limited to the inner monolayer. The asymmetric distribution of charged and uncharged phospholipids contributes to the electric potential across the membrane, the so-called transmembrane potential, which varies between −20 and −200 mV in the resting state [23]. However, not only the head groups but also the fatty acid side-chain compositions are different in the two layers. Glycolipids and glycoproteins contain large heterosaccharide chains attached to the lipids or proteins and are believed to play a pivotal role in cell-cell recognition. Often, sialic acid is found at the terminus of the chain. The exact chemical nature of the membrane components as well as their concentration varies according to the cell function and organism as well as between the different organelle membranes. Despite the weakness and short-range nature of protein-lipid and lipid-lipid interactions, cells have nevertheless evolved means of laterally assembling into membrane-microdomains. Sphingolipid-cholesterol rafts serve to recruit a specific set of membrane proteins and exclude others [24]. Caveolae are deeply invaginated raft domains that are stabilized by caveolin protein oligomers (binding cholesterol) [25]. It should be noted that plants, fungi and most bacteria have a well-defined cell wall, which covers the plasma membrane. The bacterial cell wall consists of a complex polypeptide-polysaccharide matrix referred to as the peptidoglycan. In gram-positive bacteria it is up to 20 layers thick (15–80 nm), whereas gramnegative bacteria have a much thinner peptidoglycan layer which is covered by the outer membrane, again a lipid bilayer. Because of their high osmotic pressure, plant cells are protected by a stable cell wall. It is mainly built up of carbohydrates, namely protopectin, hemicelluloses and cellulose. The fungal cell wall contains chitin instead of cellulose. In animal cells, the integral membrane proteins and certain lipids bear short sugar chains that point outward from the plasma-membrane and form the socalled glycocalyx (or cell coat). These carbohydrates are thought to play an important role in mediating interactions with cells or the nonliving environment. Many animal cells are further surrounded by the so-called extracellular matrix (ECM) [26], which contains mainly fibrous proteins such as collagen, proteoglycans (protein-polysaccharide complexes), laminin and fibronectin. The latter is coupled to the plasma membrane through certain cell surface receptors such as integrins [27]. The cell surface additionally displays receptors responsible for cell-cell recognition [28]. Members of this class of receptors are selectins [29] that recognize specific carbohydrates from other cells in the presence of calcium. Other cell surface receptors belong to the immunoglobulin superfamily (IgSF) [30] that promote
5
6 NMR of Membrane-Associated Peptides and Proteins
calcium-independent cell-cell adhesion. The third important class are the calciumdependent cell adhesion molecules, the cadherins [31], which form dimers with cadherin molecules presented on the surfaces of other cells and hence promote aggregation of similar cell types. From the information given above it is obvious that cell surfaces display an enormous complexity. A perfect model to study the interaction of a peptide with a biological membrane would require knowledge about the cell membrane composition in that particular tissue. Even if such information were available it will most probably not be possible to fully mimic the biological environment. However, some important aspects may still be studied with the available models. Whenever possible, one should try to relate the information derived from such a model to information gained from biological data taken on real cells (cell-lines) such as binding affinities etc. in order to prove the validity of the model for the study of a particular aspect. 1.2.1 Protein-Membrane Interactions Peptides may bind to a membrane either by association to its surface or by insertion into its interior. The latter class comprises the integral membrane proteins whose structures are largely α-helical or β-barrel type (see Chap. 12 in Ref [32]). The topology of interaction of helices with membranes is displayed in Fig. 1.
Fig. 1 Schematic drawing of membrane association modes of peptides: A Integral membrane proteins: (1) major fd coat protein gpVIII of bac-teriophage M13 (pdb 1fdm), anchored by an 18-residue transmembrane hydrophobic helix; (2) bovine rhodopsin, a 7 trans-membrane domain (G-protein-coupled) receptor (pdb 1f88); (3) ion channel peptaibol Chrysospermin C (pdb 1ee7), and B Peripheral membrane proteins: (1) neuropeptide Y (pdb 1f8p),
associated via an amphi-pathic helix; (2) prostaglandin H2 synthase-1 (pdb 1prh); membrane-anchoring domain consists of a right-handed spiral of amphipathic helices forming a planar motif; (3) schematic drawing of a protein associated to an integral membrane protein (4) transducin $-(subunits (pdb 1aor); the (-subunit is farnesylated at cys-71, serving as membraneanchor.
1 The Biochemistry of Membrane Interactions
Binding of molecules to the membrane surface may be due to electrostatic or multiple hydrophobic interactions. Binding of proteins to exposed anionic phospholipids (mainly PS) is often characterized by low specificity and requires high anionic charge density (>10 mol%). The specific binding of proteins to individual phospholipid molecules requires low mol% and is clearly seen in the case of PI-binding PH domains [33]. For hydrophobic peptides that do not insert into the membrane interior, electrostatic attractions provided by anionic phospholipids become essential for peptide binding and insertion into membranes [34]. Interestingly, there is accumulated evidence for specific affinity of Trp and Tyr side chains for the water-membrane interface [35], a view which was recently supported by thermodynamic data [36]. An important question arises about the effects of phospholipid composition and the function of membrane-bound enzymes. The phospholipid composition and cholesterol content in cell membranes of cultured cells can be modified, either by supplementing the medium with specific lipids or by incubation with different types of liposomes. Direct effects of phospholipid structure have been observed on the activity of the Ca2+ -ATPase (due to changes in the phosphorylation and nucleotide binding domains) [37]. Evidence of a relationship between lipid structure and membrane functions also comes from studies with the insulin receptor [38]. Lipid alteration had no influence on insulin binding, but modified the kinetics of receptor autophosphorylation. Protein binding to lipids also results in ordering of lipids. However, the effect is short-range, basically including only the first shell of phospholipids around the protein [39]. Interactions are mediated by electrostatic forces and by hydrogen bonding. Accordingly, the effects are selective for appropriate lipid head groups.
1.3 Aggregate Structures of Lipids and their Biophysics
Since detergents contain both hydrophobic tails and hydrophilic head groups, they form well-defined aggregates such that the head groups are exposed to the solvent and the hydrophobic chains are shielded from solvent access. Depending on the relative volumes of head group and aliphatic chains either spherical aggregates, micelles, or bilayers, either vesicles or bicelles, are formed. Micelles spontaneously form when head groups are more bulky than the aliphatic chains such that the overall appearance resembles a cone, e.g. dodecylphosphocholine (DPC) or sodium dodecylsulfate (SDS). In contrast, glycerol derivatives contain two coupled aliphatic chains and hence have a more rectangular shape (e.g. the phosphatidylcholine derivatives). The latter are most advantageously arranged as bilayers or bicelles, whereas the former rapidly assemble as micelles [1]. The chemical structures of the most prominent membrane model compounds particularly suitable for studies of protein/membrane interactions by NMR are displayed in Fig. 2.
7
8 NMR of Membrane-Associated Peptides and Proteins
Fig. 2 Chemical structures of selected membrane model compounds: A dodecylphosphocholine (DPC), B sodium dodecylsulfate (SDS), C sodium taurodeoxylcholate (similar to CHAPS), D n-octyl-$-Dglucopyranoside and E lyso-dodecylphosphatidyl-glycerol.
1 The Biochemistry of Membrane Interactions
1.3.1 Micelles The classical picture of a micelle with a linear arrangement of the aliphatic chains extending radially from the center to form a sphere in which the surface is formed by the hydrophilic head groups leads to a higher atom density in the centre. Dill et al. have developed a statistical (lattice) model [40] in which the terminal carbon atoms may also be found more close to the surface. This model is more consistent with data from molecular dynamics simulation of fully solvated DPC [41, 42] or SDS [43]. In all simulations, C-12 is partially found close to the head groups and the order parameter extracted from the MD trajectories for C H bond vectors decreases continuously on going from C-1 to C-12. The chain mobility properties are more similar to the liquid crystalline phase observed for bilayers than to the gel phase. Accordingly, Beswick et al. [44] concluded from their relaxation measurements that the dynamical behavior of phosphocholine groups from DPC micelles at 12 ◦ C corresponds to the phosphatidyl head groups of natural lipids at 51 ◦ C. Interestingly, also surface-associated phospholipids were encountered during the simulation. The correlation time for overall tumbling of a DPC micelle was estimated by Beswick et al. [44], based on a micelle radius of 2.3 nm [45], to be 11 ns at 25 ◦ C and 17 ns at 12 ◦ C. Water molecules are absent from the hydrophobic interior, but both the choline and the phosphate headgroups are fully solvated [41]. Similarly, the first hydration shell of the sulfate headgroup of SDS is formed rather by water molecules than by sodium ions. Because of hydration the charge density due to the lipid headgroups is overcompensated by the water dipoles, thereby reducing the transmembrane potential by 50–100 mV across the lipid water interface and resulting in a negative potential at the aqueous side [42]. 1.3.2 Bicelles Bicelles are disc-shaped, mixed micelles [46] composed of bilayered long-chain phospholipids such as dimyristoylphosphatidylcholine (DMPC) surrounded by a rim of short-chain detergents such as dihexanoylphosphatidylcholine (DHPC) or bile salt-like CHAPSO. Their use as a membrane model was introduced by Prestegard in 1988 [47], and new applications were reviewed by Sanders et al. [48]. The ratio of [DMPC] to [DHPC] is denoted as q and determines the diameter of the bicelles [49]. The DMPC-containing bilayer regions are conformationally and dynamically similar to DMPC in liquid crystalline phases. Recent interest in these systems was driven by the fact that solutions with 2 < q < 10 containing 2–40% (w/w) phospholipids form a nematic phase over a certain temperature and pH range and the bi-celles in such a phase become aligned in the presence of a magnetic field. If peptides bind to them tightly they can only be used for solid-state NMR studies because of the large size of the disks. However, Vold [50] realized that when q is reduced to 0.5 isotropic solutions of bicelles [51] are generated for which bound peptides display line widths comparable to those from micelleassociated peptides in spite of its comparatively large diameter (approx. 80). This is a consequence of the high mobility of the phospholipids within the bilayer. The
9
10 NMR of Membrane-Associated Peptides and Proteins
viscosity of the resulting solution is only twice that of water at 35 ◦ C. In contrast to micelles, bicelles form (almost) true bilayers, and surface curvature is much less an issue in these systems. Bicelles may also be doped with up to 25% phosphatidylserine (PS) or phosphatidylglycerol (PG), resulting in negatively charged bicelles with very similar biophysical properties [52]. For the peptide mastoparan X the additional charges promoted a change in the membrane-association topology, changing from a parallel to a perpendicular orientation with respect to the bicelle surface [53]. 1.3.3 Vesicles or Liposomes Vesicles or liposomes are spherical compartments of bilayered/multilayered phospholipids enclosing a liquid (usually water). Because of their considerable size compared to micelles or bicelles the NMR study of peptides binding to them can only be performed in the solid state. However, they have been very successfully employed by Prestegard to investigate the transport of organic molecules through membranes [54]. In his elegant approach, resonances from molecules outside the vesicles were broadened through spin labels and could thereby be distinguished from those that had permeated into the interior. For the three membrane models micelles, bicelles and vesicles (liposomes) it is of prime importance to know whether the peptides or proteins remain structurally intact and biologically active in that environment. In general, the similarity to biological membranes decreases in the order: vesicles > bicelles > mixed micelles > micelles. Functional reconstitution of membrane proteins in membrane mimetics is a well-known problem to biochemists but their successful candidate for a specific protein may not present an environment suitable for high-resolution NMR. The success also depends on the class of peptide/protein to be studied. For surface-associated peptides, micelles containing head groups similar to biological membranes will be a good mimic as long as the required surface for interaction is not too large such that the surface curvature will become a problem. The question to what extent micelle binding induces curvature was recently addressed by Chou et al. [55], who compared the structure of a peptide fragment of the HIV-1 envelope protein gp41 when bound to micelles with that of the bicelle-bound form using dipolar couplings. In their work, a clear curvature of the micelle-associated helix was observed with the hydrophobic residues pointing to the inside. In contrast, a nearly straight helix was found in the case of the bicelle-associated species. Even for membrane-integrating peptides micelles may still be useful. Arseniev et al. was able to show that peptides comprising the amino acid sequence from transmembrane parts of bacterioopsin form intact helices in SDS micelles [56]. A number of publications tested detergents for their ability to retain protein function [57–59], but the used concentrations of detergents were mostly well below those required for structural studies. A recent publication of Sanders specifically addresses that question for detergents used in high-resolution NMR studies [60]. In the case of gramicidin A, Ketchem et al. discovered by using solid-state NMR techniques that the channel-forming peptide when bound to bilayers is structurally slightly different from that of the micelle-bound
2 The NMR Sample
form [61] and partly attributes the difference to the curvature of the micelle. For that reason bicelles are expected to better mimic biological membranes. The integral membrane protein E.coli diacylglycerol kinase was successfully reconstituted into bicelles [62].
2 The NMR Sample 2.1 Synthesis of Peptides and Proteins
Solid-phase peptide synthesis offers a fast and convenient route for many peptides when isotope-enriched compounds are not required. Classical synthesis additionally permits the use of non-natural amino acids and allows site-specific isotope labeling. Although Fmoc protected 15 N-labeled amino acids are commercially available, the cost of such synthesis is usually prohibitive, and the peptides from chemical synthesis require perdeuterated detergents and unfortunately exclude investigation of internal dynamics through measurement of 15 N relaxation. In solid-phase synthesis [63, 64], peptides are mainly synthesized by applying Fmoc/tert-butyl or Boc/benzyl protection group strategy on a robot. Coupling requires activating reagents such as 1,3-diisopropylcarbodiimide (DIC) and 1hydroxybenzotriazole (HOBt). Fmoc/tert-butyl strategy – representing the more convenient method – allows simultaneous removal of all side-chain protecting groups and cleavage of the peptide from the resin with trifluoroacetic acid. Alternatively, the peptide may be liberated from the resin under mildly acidic conditions, keeping the side chains protected. Cleavage from Wang or SASRIN resins will yield peptides with the free carboxy terminus, whereas the so-called Rink-amide resin will lead to C-terminal amides. In the last coupling step, the peptides may also be N-terminally modified, for example, by an acetyl group. The success of the coupling steps and hence the purity of the peptides will largely depend on the amino acid sequence, whereas hydrophobic peptides tend to be more difficult to synthesize and purify. Water-soluble peptides can frequently be purified with a single reversed-phase (RP)-C-18 HPLC run using an acetonitrile/water gradient. Smaller peptides are difficult to overexpress in E. coli and are additionally degraded in cells. These problems are commonly circumvented if fusion peptides are used from which the peptide of interest is cleaved biochemically. These constructs, e.g. the presently popular GB1 domain of the streptococcal protein G [65], may also help to solubilize the peptide and thereby improve the expression yield. We have successfully used the system introduced by Kohno [66] in which the peptide of interest is fused to decahistidine-tagged ubiquitin. The peptide is expressed in either M9 minimal medium containing 15 N-NH4 Cl (and 13 Cglucose) or isotope-enriched rich media. Separation of the fusion peptide from
11
12 NMR of Membrane-Associated Peptides and Proteins
others is achieved by Ni-affinity chromatography, after which the peptide is liberated from ubiquitin through the enzyme ubiquitin hydrolase. To achieve Cterminal amidation for recombinantly produced peptides an additional gly codon is added to the cDNA. After expression the Gly residue is enzymatically converted into the amide function through the peptidylglycine α-amidating enzyme [67]. Again, the resulting peptide can be purified with RP C-18 HPLC if water solubility is sufficient. 2.2 Choice of Detergent
Detergents commonly used to form micelles that are amenable to high-resolution NMR are summarized in Tab. 1, and the chemical structures of the most commonly used detergents are presented in Fig. 2. Unfortunately, only a few of those needed for use with nonisotopically enriched peptides are commercially available in deuterated form. Most frequently, the zwitterionic DPC or the negatively charged SDS have been used as membrane mimetics. Mixtures of DPC “doped” with small amounts of SDS may be used to modulate the charge distribution on the micelle surface. It should be emphasized here that SDS is a denaturing detergent (in contrast to CHAPS, DPC etc.) in which many membrane-bound peptides/enzymes lose their biological activity. Furthermore, the head group is not encountered in biological systems, it would hence seem to be much inferior to DPC, and is only recommended as an additive. Similarly to SDS, phosphatidylserine is negatively charged and hence presents a suitable additive for modulating the surface charge of the micelle, thereby influencing electrostatic interactions with charged peptides. CHAPSO, similarly to cholesterol, contains the rigid steroid skeleton and hence may influence the membrane rigidity substantially. It is important to note that many detergents such as β-octylglycoside or CHAPSO are not available in perdeuterated form and require 13 C, 15 N doubly labeled peptides/proteins for structural studies. When working with isotopically (possibly 15 N, 13 C) enriched peptides, the need for deuteration is avoided, and the above-mentioned detergents may be added. Other alternatives are the lyso-phosphatidylglycerol derivatives in which one of the fatty acid chains coupled to glycerol have been removed. Unfortunately, the phosphatidyl lipids are not available in fully deuterated form, and hence certain regions of the proton spectra will be covered by the glycerol protons. Opella et al. [3] have shown that the concentration of the detergent must be well above the critical micelle concentration. Ideally, the concentration should be such that each micelle is occupied on average by one peptide molecule. Otherwise, aggregation effects will lead to species with different aggregation states so that signal broadening or doubling may result. However, it has been argued that this effect is less important for smaller peptides than for larger proteins whose molecular weight is similar to that of the complete micelle. For very high concentrations of detergent (>500 mM) both viscosity and aggregation number increase rapidly. In the case of
2 The NMR Sample Table 1 Commonly used detergents for work with micelles including some important bio-
physical properties CMC [mM]
Detergent (MW) SDS (288, 313a ) DPC (351, 389a ) Deoxycholate sodium (489) β-D-Octylglucoside (292) β-D-Decylglucoside (322) Zwittergent 3:12 (336) MPG (myristoyllyso-phosphatidylglycerol) CHAPS (615) CHAPSO (631) a
Aggregation number [N]
0–0.05 M Na+
0.1–0.2 M Na+
Ref.
0–0.01 M Na+
0.1–0.2 M Na+
Ref.
3–10 1.1 2–6
0.9–2
60–75 56 1.5–12
100–150
1–4
127–129 45 127, 130–132
129 45 132, 133
20–25
19–25
127, 128
84
134
2–3
2.0
2–4
1.4
Calbiochem manual 127, 135, 136
55
136
n.a.
n.a.
6–10 8.0
3–5 4.1
4–14
137, 138
127, 137, 138, 137
5–25
) fully deuterated form
(charged) SDS, the ionic strength of the solution increases to high values so that sensitivity suffers, pulse-lengths increase, and sample heating during decoupling or spin-locking occurs. One positive side-effect of working with detergent solutions is that peptide samples tend to be stable over longer periods at higher temperatures. 2.3 Choice of pH
In order to match physiological conditions, the selected pH should be close to neutral. However, because of elevated amide proton exchange at that pH for peptides with solvent-accessible amide protons, a value in the range 3–5 is usually chosen, also depending on the solubility at the pH. For globular peptides, most amide protons are involved in rather stable hydrogen bonds or are protected from solvent access and a much more basic pH may be used. Amide proton exchange is both base- and acid-catalyzed. The mechanism for base catalysis is clear and involves abstraction of the proton from the amide nitrogen forming an imidate anion. During acid catalysis the proton may either attack the nitrogen or the oxygen and depends on the electron-withdrawing/donating capabilities of nearby groups. Titration curves were established for labile protons in peptides a long time ago [68] (for reviews see Refs. [69, 70]). It soon became obvious in studies of micelle-bound peptides that exchange kinetics are different
13
14 NMR of Membrane-Associated Peptides and Proteins
in that environment. O’Neil et al. have determined pH titration profiles for M13 Coat protein in the presence of SDS micelles [71] and found that the minimum is shifted by 1-1.5 pH units. They also discovered that the apparent pH in the vicinity of the surface is increased by 2 pH units compared to the bulk solution. Spyrocopolous et al. determined the profiles for a set of model tripeptides [72] and concluded from their data that highly hydrophobic peptides that are likely to be placed more deeply in the hydrophobic interior display exchange kinetics reduced at least 25-fold. Perrin et al. have attributed the differences in amide proton exchange to the extent by which the transition state during catalysis, e.g. the positively or negatively charged species, are stabilized by the environment [73]. Accordingly, the pH minimum and kmin depend on the type of detergent used and hence will be different for positively charged, negatively charged or neutral micelles. All other effects, e.g. counterions, the head groups, the concentration of the surfactant etc., influence amide proton exchange to a much smaller extent. The pH of course will also influence the protonation state of side chains, and hence it will possibly influence both the structure of the peptide and the affinity of the peptide toward the micelle. The latter can also determine the exchange kinetics between the free and the micelle-bound state and may thereby largely influence the line widths encountered. In our view we recommend choosing the pH from a series of spectra. The pH should then be selected such that amide protons accessible to solvent are still observable. We generally found values in the pH range 5–6 gave good spectra when working with DPC micelles, assuming reasonable affinity toward the micelle. Figure 3 displays a comparison of one-dimensional spectra of a neuropeptide in the presence of either DPC or SDS micelles at various pH values. Note that the signal at approx. 8.6 ppm due to HE1 of His vanishes in SDS at a much higher pH compared to DPC. In general, spectra in SDS still yield reasonable quality at neutral pH in contrast to those recorded in DPC. 2.4 Choice of Temperature
It is advantageous to set the temperature for the measurement of peptide/micelle or peptide/bicelle systems at higher values consistent with stability of the peptide. We found that values in the range 30–45 ◦ C gave spectra with reasonable line widths. At these temperatures TOCSY spectra with mixing times of 80 ms still gave good S/N for the majority of correlations on 36 amino acid peptides. Amide proton exchange is still acceptable at pH 5–6 at these temperatures, and NOESY spectra display strong cross peaks for proximate protons. In contrast, H,D exchange is very fast on lyophilized samples of peptides with µM binding affinities to the membranes and hence required 15 N labeled peptides in order to record [15 N, 1 H]-HSQC spectra. Only with these experiments is it possible to assign the exchange to specific sites within the available time (2–20 min). A series of one-dimensional proton spectra of a neuropeptide recorded at various temperatures between 280 and 320 K are displayed in Fig. 4.
2 The NMR Sample
Fig. 3 Proton one-dimensional spectra displaying the region of amide resonances of 0.5 mM Ala31, Pro32-NPY recorded in 300 mM SDS (left) or 300 mM DPC (right) at various values of the pH.
2.5 Salt Concentrations
High salt content favors higher aggregation numbers of the micelle, and hence one should try to work at low salt concentrations (<50 mM). Since peptide phospholipid interactions are often electrostatic in nature, salt may be added to modulate the binding affinity. This may be advantageous if exchange between the free and the membrane-bound state of the peptide is slow on the NMR time scale such that
Fig. 4 Proton one-dimensional spectra displaying the region of amide resonances of 0.5 mM Ala31,Pro32-NPY in 300 mM DPC at various temperatures.
15
16 NMR of Membrane-Associated Peptides and Proteins
considerable line broadening occurs. We have found that addition of CaCl2 at low concentrations (5 mM) reduced the binding affinity of neuropeptide Y toward the micelle considerably (see below) without deteriorating the physical properties of the sample significantly. 2.6 Practical Tips for Sample Preparation
One inherent property of peptides that interact with membranes is that selfassociation or even aggregation will interfere with solubilization by organic solvents or micelles. The preparation, purification and sample preparation of extremely hydrophobic (often transmembrane) peptides is nontrivial and has been addressed by only a few papers [74–79]. The first problem encountered once the peptide has been successfully synthesized is that standard purification protocols fail. Although very hydrophobic peptides are soluble in acids such as TFA, these harsh conditions are not suitable for purification, because they can reduce column life times and denature native protein structures. Hence residual acid has to be removed, and many peptides can then be redissolved in mixtures of water and tert-butanol. Peptides with a strong tendency to aggregate may be dissolved either in trifluoroethanol (TFE), hexafluoroisopropanol (HFIP), mixtures of 1-propanol and 1-butanol, 20% acetic acid or 70–90% formic acid. Water-soluble peptides can easily be purified using standard C18 reverse-phase high pressure liquid chromatography running a water:acetonitrile gradient in the presence of 0.1% TFA. Again, difficulties arise with hydrophobic peptides. Firstly, it is critical to load the peptide onto the column in a deaggregated state applying one of the methods mentioned above. Secondly, standard RP-HPLC conditions may be inappropriate because of irreversible column binding or very broad peaks. However, subsititution of acetonitrile/water by either isopropanol:acetonitrile (2:3)/ water (0.1% TFA) or 1-propanol: 1-butanol (2: 1)/water (0.1% TFA) and using a C4 or even C1 column improved the purification for a variety of hydrophobic peptides. To gain optimal yields it is important to avoid aggregation of the peptides after injection. Therefore, the starting solvent for the HPLC purification (i.e., the percentage of organic solvent/water) has to be adjusted according to the hydrophobicity of the peptide. For hydrophobic peptides that are exclusively soluble in strong acids, Tomich [74] developed a purification scheme in which the peptide is cleaved from the resin in the presence of sodium dodecylsulfate. The precipitate is redissolved by dilute acetic acid and purified to homogeneity using a detergent-based HPLC protocol. At the same time, SDS is quantitatively exchanged by n-octyl-β-D-glucopyranoside. Alternatively, hydrophobic peptides may be purified by high-performace gel filtration using e.g. the Superdex Peptide HR 10/30 column (Pharmacia Biotech). With such columns isotropic elution with 70% formic acid is possible [79]. A general procedure has been reported by Killian et al. for the incorporation of hydrophobic peptides into micelles [75]. Therein, the peptides were first dissolved
3 The Structure and Dynamics of Membrane-Associated Peptides – A Case Study of Neuropeptide Y (NPY)
in TFA for deaggregation, dried under nitrogen and redissolved as 5 mM (clear) solutions in TFE or HFIP. The solution is subsequently diluted 1:1 by addition of an aqueous solution containing a varying SDS concentration (typically 500 mM, depending on the protein concentration). Water is then added to yield a 16:1 ratio of water to TFE (HFIP) by volume. Upon addition of excess water the peptide loses its solubility, but at the same time the micelles are re-formed, resulting in the homogeneous incorporation of the peptide. Vortexing and lyophilization by rapid freezing (liquid nitrogen) is followed by drying overnight under vacuum and redissolving in water to yield clear solutions. A crucial prerequisite for the successful incorporation of peptides into micelles is that they persist in the deaggregated (mostly monomeric) form during the whole sample preparation procedure. Peptides have their lowest solubility at their pI, and therefore aggregation may be prevented by adjusting the pH of any aqueous solution (if possible by addition of buffers) such that the net charge of the peptide is at a maximum. A very nice example is reported by Zhang et al. [79] studying a 33-mer peptide derived from the extracellular loop II of the kappa opioid receptor containing 4 and 8 residues that may be positively or negatively charged, respectively. Upon purification by the Superdex-FPLC-method (see above), the peptide was redissolved in 500 mM ammonium bicarbonate (pH 8.4) and again lyophilized. The resulting powder could be directly dissolved in a 144 mM DPC solution (50 mM phosphate buffer at pH 6.8) at a concentration of 1.9 mM, and the transmembrane segments of the peptide incorporated well into the micelle interior. Although detailed protocols for preparation of small unilamellar vesicles (SUVs) are published [80–82], their preparation is more time-demanding and complicated. In contrast, bicelles are prepared similarly to micelles with little effort.
3 The Structure and Dynamics of Membrane-Associated Peptides – A Case Study of Neuropeptide Y (NPY) 3.1 Introduction
The structural analysis of membrane-associated peptides comprises two steps: (a) the elucidation of the three-dimensional fold of the peptide and (b) the determination of the membrane-peptide interface. We will use our results gained for the 36 amino acid residue neuropeptide Y (NPY) [83] to demonstrate the approaches that can be used. NPY regulates important pharmacological functions such as blood pressure, food intake or memory retention and hence has been subject of many investigations (for a review see Ref. [84]). It targets the so-called Y receptors that belong to the class of seven transmembrane receptors coupled to G-proteins (GPCRs).
17
18 NMR of Membrane-Associated Peptides and Proteins
3.2 Structure Determination of Micelle-Bound NPY
Resonance assignment largely follows the procedure developed by the W¨uthrich laboratory [85]. The repertoire of experiments that may be used depends on whether nonlabeled peptides, 15 N-labeled peptides or 13 C,15 N-doubly labeled peptide are available. When no labeling has been performed, the experiments for resonance assignments solely belong to the class of homonuclear experiments (mainly DQFCOSY, DQ-spectra, TOCSY and NOESY etc.) [86]. 15 N labeling permits the use of 15 N-edited three-dimensional spectra. Although signal overlap in the amide region is not so severe as to make the use of 15 N labeling mandatory, it was required for the determination of the internal dynamics, and it also facilitated the experiments for determination of the membrane interface. We found a (two-dimensional) NOE-relayed [15 N,1 H]-HSQC [87] particularly useful to rapidly assign the 15 N, 1 H correlation map in the helical part of the peptide. In other cases, e.g. for the resonance assignment of micelle-bound bovine polypeptide (bPP) (unpublished data), we have additionally used 15 N-edited TOCSY and NOESY spectra for assignment. In our experience resonance assignment is speeded up so much in the case of 15 N-labeled peptides that the additional effort in the laboratory is quickly paid for. The HSQC spectra will also help to immediately recognize signal overlap. The structure calculation of NPY bound to DPC micelles was based upon a 75 ms NOESY spectrum. Figure 5 shows expansions of spectral regions that were heavily used for backbone assignment. Sufficiently tight binding (mM range) of the peptide to the DPC micelle gives strong cross peaks and requires the use of comparably short mixing times in order to avoid spin diffusion. We have always used watergatetype solvent suppression techniques [88] to avoid saturation transfer, which rapidly takes place for residues not involved in the membrane interface at the pH 6 used. The position of helical residues is confirmed by an analysis of 3 JNHα couplings,
Fig. 5 Expansion from NOE-relayed [15 N,1 H]-HSQC spectrum of NPY.
3 The Structure and Dynamics of Membrane-Associated Peptides – A Case Study of Neuropeptide Y (NPY)
Fig. 6 Superposition of the backbone representation of the 20 lowest energy DYANA structures of NPY/DPC.
which could conveniently be extracted from either the [1 H, 1 H]-NOESY or [15 N, 1 H]-HSQC spectra using the INFIT method [89]. The three-dimensional structure of NPY when bound to the membrane is shown in Fig. 6. It comprises an α-helix for residues 16 to 36 which is very well defined, and a flexible N-terminal part of the molecule. When comparing the structure of the DPC-micelle bound form to the structure in free solution, it is obvious that the α-helix is much more stable. In addition, the C-terminus of the helix comprising residues 32–36, which is flexible in solution, adopts an α-helical fold, and the Tyr36 is oriented such that it interacts with the water-membrane interface. 3.3 The Determination of the Topology of the Membrane-NPY Interface 3.3.1 Spin Labels Spin labels contain unpaired electrons that by highly efficient electron-nuclear spin dipolar coupling lead to accelerated transverse or longitudinal relaxation. The effect is rather far-reaching (at least up to 10 Å). Micelle-integrating spin labels, such as the doxyl-substituted stearates, are commercially available. The stearates are offered with the spin label attached at different positions along the fatty acid chain: 5, 7, 12. To probe the vicinity of the membranewater interface, the 5-doxylstearate is the most suitable. In the literature the free carboxylate has mostly been used. However, this may specifically interact e.g. with Arg or Lys side chains, and hence the methylester serves as a good control reagent. The 12-doxylstearate introduces some ambiguity into the analysis of the data because C-12 spends some time in the vicinity of the interface, as is known from MD trajectories. Beswick et al. have argued that at acidic pH the stearates are uncharged and hence their radial position is less well defined [44]. They propose the
19
20 NMR of Membrane-Associated Peptides and Proteins
use of 1-palmitoyl-2-stearoyl-(10-doxyl)-sn-glycero-3-phosphocholine, which readily integrates into the micelle. Mn2+ ions may serve to yield a “negative print” since they will broaden all protons which are water accessible. The use of spin labels is also nicely described in the review by Damberg et al. [90]. The doxylstearates may be prepared as concentrated (e.g. 0.3 M) methanolic stock solutions. They should be added such that their final concentration in the micelle solution is about one spin label per micelle. When possible, the spin label should be added to the micelle solution before the peptide to ensure that it integrates into the micelles properly and does not bind specifically with the peptide. In principle, spin label effects can be observed in both homonuclear and heteronuclear spectra. When no isotope-enriched peptide is available, integration of TOCSY spectra in the presence and in the absence of spin label gives information of spatial proximity of protons to the spin label. This method allows one to analyze data from both side-chain as well as from backbone protons. We found that analysis of [15 N,1 H]-HSQC spectra is very fast and reliable and superior to the use of homonuclear spectra. Because of the good signal separation and the large 1 JHN coupling, cross peaks are present even in the spin label containing samples and can still be reliably integrated. Moreover, the positioning of side chains does not unambiguously enable the molecule to orient on the micelle. The so-called snorkelling effect of Arg or Lys side chains [35] has been observed frequently and may lead to erroneous interpretation of the data. The relative signal intensities from a [15 N,1 H]-HSQC spectrum of NPY recorded in the presence of 5-doxylstearate is displayed in Fig. 7. It reveals two important
Fig. 7 Ratio of intensities of cross peaks in the [15 N,1 H-HSQC spectrum in the presence of 5-doxylstearate relative to those in the absence of the spin label (a). Ratio of intensities in the presence of both spin label and 5 mM CaCl2 to those in the presence of spin label but absence of salt (b).
3 The Structure and Dynamics of Membrane-Associated Peptides – A Case Study of Neuropeptide Y (NPY)
points. Firstly, the reduction due to the spin label decreases from the start of the C-terminal α-helix up to the N-terminus continuously, proving that the N-terminus diffuses freely in solution and does not interact with the membrane. Secondly, the signal reduction along the sequence is periodic, nicely reflecting the pattern to be expected for the association of an α-helix parallel to a membrane surface. The residues affected to the greatest extent are those containing hydrophobic side chains (e.g. Leu, Ile) and are contained in the helical segment comprising residues 17–36. For a peptide inserting into the membrane interior with its backbone, a set of reduced signals from a continuous stretch of the amino acid sequence is observed. Interestingly, upon addition of 5 mM CaCl2 , the affinity of NPY toward the DPC micelle is significantly reduced. Fig. 7b clearly displays the signal regain for the residues that have been most strongly affected by the spin label. From the spin label studies we were able to conclude that NPY associates specifically with its hydrophobic side of the C-terminal α-helix parallel to the membrane surface. 3.3.2 Amide H,D Exchange Significantly reduced rates for proton-deuterium exchange prove that the corresponding amide proton is either involved in stable hydrogen bonds, shielded from solvent access or both. Because of the ambiguity in interpretation, additional information about the persistence of hydrogen bonds stemming from structure calculations or from relaxation data should be available. For NPY it was possible to follow the H,D exchange for certain sites by redissolving the lyophilized sample in deuterated water. Although order parameters derived from 15 N relaxation data indicate that the α-helix is stable between residues 17–33, residual signals in the 15 N, 1 H correlation map were only visible for residues which form the hydrophobic side of the amphiphilic helix, e.g. Leu24, Ile28 Tyr27 and Ile31. This observation indicates that shielding from solvent access is the primary factor slowing down amide exchange and not formation of stable hydrogen bonds. The effect observed is much larger than the intrinsic differences of amide proton exchange rates due to neighboring atom effects. The residues mostly shielded from solvent access were identical or adjacent to those that were mostly affected by the spin label. From the spin label and H,D exchange data it is possible to position NPY on the membrane such that the hydrophobic side is pointing toward the membrane surface. Once again, availability of 15 N uniformly labeled NPY has to be emphasized. In order to yield reasonably narrow lines and to avoid the need to reassign resonances of the peptide at a different temperature, the experiments were again performed at 37 ◦ C. Only with the labeled peptide were we able to assign reduced exchange rates to specific sites in the peptide at that temperature. In our experimental setup, lyophilized 1 mM peptide was mixed with ice-cooled deuterated water and inserted into a magnet that was pre-shimmed and whose probe was tuned and matched. Acquisition of the first two-dimensional data set with sufficient resolution was finished approx. 5 min after dissolving the sample in D2 O. The spectrum is
21
22 NMR of Membrane-Associated Peptides and Proteins
Fig. 8 [15 N,1 H]-HSQC reference spectrum of 15 N labeled NPY (a), and (b) same as a but recorded 10 min after dissolving the lyophilized sample in deuterated water.
displayed in Fig. 8. Even with µM affinity for the membrane, H,D exchange is fast enough to exclude the possibility of recording the much less sensitive 1 H,1 H homonuclear correlation spectra needed to derive the assignments on nonlabeled material. 3.4 Measurement of Internal Dynamics of NPY/DPC
NMR techniques are unique in their ability to resolve internal dynamics with site-specific probes. Backbone dynamics may be derived from relaxation data of 15 N nuclei. Relaxation data are conveniently measured in experiments that utilize [15 N,1 H]-HSQC-derived pulse sequences and hence can be performed within less than a week of total instrument time with a 1 mM sample (at one field). The underlying principle of the measurement has been recently reviewed by Palmer [91]. Whereas 15 N longitudinal and transverse relaxation rates can be determined with sufficient precision, the determined values of the 15 N{1 H}-NOE differ significantly from the true values for residues involved in fast amide proton exchange. A comparison between the values for the 15 N{1 H}-NOE for NPY in solution in the presence and in the absence of DPC is displayed in the Fig. 9. The comparison reveals two striking differences. Firstly, the (negative) values of the NOE for residues of the
3 The Structure and Dynamics of Membrane-Associated Peptides – A Case Study of Neuropeptide Y (NPY)
Fig. 9 Comparison of 15 N{1 H}-NOE between NPY free in solution (black bars) and NPY bound to micelles (grey bars) (top), and (bottom) generalized order parameter S2 for NPY bound to DPC micelles.
unstructured N-terminus that do not interact with the DPC micelle surface are larger. This result is most probably due to increased saturation transfer from the water and results from increased exchange of amide protons at the used pH of 6.0 compared to that used in the absence of DPC (pH 3.1). Secondly, the values for residues from the C-terminal pentapeptide are negative in the case of NPY free in solution whereas they are positive in the micelle-bound form. This clearly indicates that the C-terminal pentapeptide is significantly rigidified upon binding to the micelle. The result is supported by the structure calculation that displays rather low RMSD values for that part.
23
24 NMR of Membrane-Associated Peptides and Proteins
Physical insight into motional properties derived from the relaxation data is commonly derived from a Lipari-Szabo type analysis [92, 93]. The analysis may only be conducted when the time scale for internal and overall dynamics differs by at least one order of magnitude. In addition, as noted by Schurr et al. [94], anisotropic reorientation of the molecules may cause the analysis to erroneously reveal slow internal motions even in the absence of these. Both prerequisites may not be fulfilled for NPY in the absence of DPC, and we have therefore restricted our analysis to the above-mentioned comparison of the heteronuclear NOE. However, we feel that such an analysis is justified for NPY bound to DPC micelles. Firstly, the micelle itself reorients isotropically in solution, and binding of the hormone to the spherical micelle is not expected to change that fact dramatically. Secondly, the overall tumbling of NPY bound to a DPC micelle is slow enough (8.96 ns) to separate internal from overall dynamics. Fitting of the relaxation data was well possible within the experimental error limits with small residuals excluding residues from the N terminus. The generalized order parameters S2 from a Lipari-Szabo type analysis are displayed in Fig. 9 (bottom). Their values are above 0.8 in the segment comprising residues 18–34. For residues 18, 19, 21, 23–28, 30 and 32 only a single variable, S2 , was required in order to yield reasonable fits to the experimental data. The data indicate that internal motions of the backbone are limited to values encountered in stable α-helices of globular proteins and show that the isolated helix upon binding to the membrane-mimicking micelle is remarkably rigidified. This result is in accordance with a comparison of the scalar 3 JNHα couplings that indicate rotationally averaged backbone angles in the case of free NPY and values reduced below 6 Hz for most C-terminal residues upon binding to DPC micelles. Moreover, the values drop only slightly toward the C terminus, indicating that although the backbone becomes more flexible it is still structured. Since it is this part of the molecule that is believed to be involved in forming contacts to the receptor, we believe that this result has biological significance and indicates a possible role of the membrane for recognition of NPY by its membrane-embedded receptor. The view that tight binding of helices to membranes is accompanied by a rigidification of the backbone is supported by the analysis of the backbone dynamics of M13 phage coat protein gVIIIp bound to SDS micelles as determined by Papvoine et al. [95]. They found even higher values for the micelle-spanning helix (S2 = 0.96) compared to that of the membrane-associated helix (S2 = 0.51). We have recently gained confidence that our results on NPY are biologically relevant from our investigations into the Ala31,Pro32 mutant of NPY, which selectively binds to the Y5 receptor subtype. Using the strategy described above, we discovered that, although in general structural similarity for NPY and the mutant is large, significant differences exist in the C-terminal pentapeptide (the message part). In the mutant, the second last membrane anchor is removed and the backbone of the mutant forms a longer (flexible!) loop comprising residues 30–36, which is again anchored via the C-terminal Tyr36 amide. Thereby, the position of residues Arg33 and Arg35 is much less well defined and their distance to the membrane-water
References
interface is greater on average. As a consequence, binding of these residues to an Asp receptor residue at the membrane-water interface is reduced in the Y1 receptor.
References 1 1 GENNIS, R. B. Biomembranes: Molecular Structure and Function; 1st ed.; Springer: New York, 1989. 2 YEAGLE, P. L. The membranes of cells; Academic Press: San Diego, 1993. 3 OPELLA, S. J.; KIM, Y.; MCDONNELL, P. Methods Enzymol. 1994, 239, 536–560. 4 HENRY, G. D.; SYKES, B. D. Methods Enzymol. 1994, 239, 515–535. 5 KLASSEN, R. B.; OPELLA, S. J. Methods in Molecular Biology 1997, 60, 271–297. 6 MARASSI, F. M.; OPELLA, S. J. Curr. Opin. Struct. Biol. 1998, 8, 640–648. 7 OPELLA, S. J. Nat. Struct. Biol. 1997, 4, 845–848. 8 WATTS, A. Curr. Opin. Biotechnol. 1999, 10, 48–53. 9 BALEJA, J. D. Anal. Biochem. 2001, 288, 1–15. 10 SCHWYZER, R. J. Receptor Res. 1991, 11, 45–57. 11 SCHWYZER, R. J. Mol. Recognit. 1995, 8, 3–8. 12 MORODER, L.; ROMANO, R.; GUBA, W.; MIERKE, D. F.; KESSLER, H.; DELPORTE, C.; WINAND, J.; CHRISTOPHE, J. Biochemistry 1993, 32, 13551–13559. 13 EDIDIN, M.; FINEAN, J. B. E.; MICHELL, R. H. in Membrane Structure 1981, 37–82. 14 WALKER, J. E.; CARNE, A. F.; SCHMITT, H. W. Nature 1979, 278. 15 SINGER, S. J.; NICOLSON, G. L. Science 1972, 175, 720–731. 16 CHAPMAN, D.; BENGA, G. in Biological Membranes; CHAPMAN, D., ed.; Academic Press: New York, 1984; Vol. 5, pp 1–56. 17 VAZ, W. L. C.; GOODSAIDZALDOUONDO, F.; JACOBSON, K. FEBS Lett. 1984, 1974, 199–207. 18 OP DEN KAMP, J. A. F. Annu. Rev. Biochem. 1979, 48, 47–71. 19 GANONG, B. R.; BELL, R. M. Biochemistry 1984, 23, 4977–4983. 20 MCNAMEE, M. G.; MCCONNEL, H. M. Biochemistry 1973, 12, 2957–2958.
21 KORNBERG, R. D.; MCCONNEL, H. M. Biochemistry 1971, 10, 1111–1120. 22 KUYPERS, F. A. Curr. Opin. Hematol. 1998, 5, 122–131. 23 AZZONE, G. F.; PIETROBON, D.; ZORATTI, M. Curr. Top. Bioenerg. 1984, 13, 1–77. 24 HARDER, T.; SIMONS, K. Curr. Opin. Cell Biol. 1997, 9, 534–542. 25 PARTON, R. G. Science 2001, 293, 2404–2405. 26 SAKAI, L. Y. Sci. Med. 1995, 2, 58–67. 27 HYNES, R. O. Cell 1992, 69, 11–25. 28 CHOTHIA, C.; JONES, E. Y. Annu. Rev. Biochem. 1996, 66, 823–862. 29 TEDDER, T. F. FASEB J. 1995, 9, 866–873. 30 UYEMURA, K.; ASOU, H.; YAZAKI, T.; TAKEDA, Y. Essays Biochem. 1996, 31, 37–48. 31 GEIGER, B.; AYALON, O. Annu. Rev. Cell Biol. 1993, 8, 307–332. 32 BRANDEN, C.; TOOZE, J. Introduction to Protein Structure; Garland Publisher: New York, 1999. 33 BACKLUND, B. M.; WIKANDER, G.; PEETERS, T. L.; GRASLUND, A. Biochim. Biophys. Acta 1994, 1190, 337–344. 34 LIU, L. P.; DEBER, C. M. Biochemistry 1997, 36, 5476–5482. 35 KILLIAN, J. A.; VON HEIJNE, G. Trends Biochem. Sci. 2000, 25, 429–434. 36 WIMLEY, W. C.; WHITE, S. H. Nat. Struct. Biol. 1996, 3, 842–848. 37 LEE, A. G.; DALTON, K. A.; DUGGLEBY, R. C.; EAST, J. M.; STARLING, A. P. Biosci. Rep. 1995, 15, 289–298. 38 MEUILLET, E. J.; LERAY, V.; HUBERT, P.; LERAY, C.; CREMEL, G. Biochim. Biophys. Acta 1999, 1454, 38–48. 39 ESMANN, M.; WATTS, A.; MARSH, D. Biochemistry 1985, 24, 1386–1393. 40 DILL, K. A.; FLORY, P. J. Proc. Natl. Acad.
25
26 NMR of Membrane-Associated Peptides and Proteins
Sci. USA 1981, 78, 676–680. 41 WYMORE, T.; GAO, X. F.; WONG, T. C. J. Mol. Struct. 1999, 485/486, 195–210. 42 TIELEMAN, D. P.; van der SPOEL, D.; BERENDSEN, H. J. C. J. Phys. Chem. 2000, 104, 6380–6388. 43 MACKERELL, A. D. J. Phys. Chem. 1995, 99, 1846–1855. 44 BESWICK, V.; GUEROIS, R.; CORDIER-OCHSENBEIN, F.; COIC, Y.-M.; HUYANH-DINH, T.; TOSTAIN, J.; NOEL, J.-P.; SANSON, A.; NEUMANN, J.-M. Eur. Biophys. J. 1998, 28, 48–58. 45 LAUTERWEIN, J.; B¨OSCH, C.; BROWN, L. R.; W¨UTHRICH, K. Biochim. Biophys. Acta 1979, 556, 244–264. 46 SMALL, D. M. Gastroenterology 1967, 52, 607–609. 47 RAM, P.; PRESTEGARD, J. H. Biochim. Biophys. Acta 1988, 940, 289–294. 48 SANDERS, C. R.; PROSSER, R. S. Structure 1998, 6, 1227–1234. 49 CAREY, M. C.; SMALL, M. D. Am. J. Med. 1970, 49, 590–606. 50 VOLD, R. R.; PROSSER, R. S.; DEESE, A. J. J. Biomol. NMR 1997, 9, 329–335. 51 GLOVER, K. J.; WHILES, J. A.; WU, G.; YU, N.; DEEMS, R.; STRUPPE, J. O.; STARK, R. E.; KOMIVES, E. A.; VOLD, R. R. Biophys. J. 2001, 81, 2163–2171. 52 STRUPPE, J.; WHILES, J. A.; VOLD, R. R. Biophys. J. 2000, 78, 281–289. 53 WHILES, J. A.; BRASSEUR, R.; GLOVER, K. J.; MELACINI, G.; KOMIVES, E. A.; VOLD, R. R. Biophys. J. 2001, 80, 280–293. 54 ALGER, J. R.; PRESTEGARD, J. H. Biophys. J. 1979, 28, 1–13. 55 CHOU, J. J.; KAUFMAN, J. D.; STAHL, S. J.; WINGFIELD, P. T.; BAX, A. J. Am. Chem. Soc. 2002, 124, 2450–2451. 56 PERVUSHIN, K. V.; OREKHOV, V.; POPOV, A. I.; MUSINA, L.; ARSENIEV, A. S. Eur. J. Biochem. 1994, 219, 571–583. 57 CASEY, J. R.; REITHMEIER, R. A. Biochemistry 1993, 32, 1172–1179. 58 KESSI, J.; POIREE, J. C.; WEHRLI, E.; BACHOFEN, R.; SEMENZA, G.; HAUSER, H. Biochemistry 1994, 33, 10825–10836. 59 WOMACK, M. D.; KENDALL, D. A.; MACDONALD, R. C. Biochim. Biophys.
60
61 62 63 64
65
66
67
68 69 70 71 72 73 74
75
76
77
78
Acta 1983, 733, 210–215. VINOGRADOVA, O.; SONNICHSEN, F.; SANDERS, C. R., 2nd J. Biomol. NMR 1998, 11, 381–386. KETCHEM, R. R.; HU, W.; CROSS, T. A. Science 1993, 261, 1457–1460. SANDERS, C. R., 2nd; LANDIS, G. C. Biochemistry 1995, 34, 4030–4040. MERRIFIELD, B. Method. Enzymol. 1997, 289, 3–13. FIELDS, G. B.; COLOWICK, S. P. Solid-Phase Peptide Synthesis; Academic Press: London, 1997. HUTH, J. R.; BEWLEY, C. A.; BELINDA, M. J.; HINNEBUSCH, A. G.; CLORE, G. M.; GRONENBORN, A. M. Protein Sci. 1997, 6, 2359–2364. KOHNO, T.; KUSUNOKI, H.; SATO, K.; WAKAMATSU, K. J. Biomol. NMR 1998, 12, 109–121. EIPPER, B. A.; MILGRAM, S. L.; HUSTEN, E. J.; YUN, H.-Y.; MAINS, R. E. Protein Sci. 1993, 2, 489–497. W¨UTHRICH, K.; WAGNER, G. J. Mol. Biol. 1979, 130, 1–18. ENGLANDER, S. W.; KALLENBACH, N. R. Q. Rev. Biophys. 1984, 16, 521–655. DEMPSEY, C. E. Prog. NMR Spectrosc. 2001, 39, 135–170. O’NEIL, J. D.; SYKES, B. D. Biochemistry 1988, 27, 2753–2762. SPYRACOPOULOS, L.; O’NEIL, J. D. J. J. Am. Chem. Soc. 1994, 116, 1395–1402. PERRIN, C. L.; CHEN, J.-H.; OHTA, B. K. J. Am. Chem. Soc. 1999, 121, 2448–2455. TOMICH, J. M.; CARSON, L. W.; KANES, K. J.; VOGELAAR, N. J.; EMERLING, M. R.; RICHARDS, J. H. Anal. Biochem. 1988, 174, 197–203. KILLIAN, J. A.; TROUARD, T. P.; GREATHOUSE, D. V.; CHUPIN, V.; LINDBLOM, G. FEBS Lett. 1994, 348, 161–165. SMITH, S. O.; JONAS, R.; BRAIMAN, M.; BORMANN, B. J. Biochemistry 1994, 33, 6334–6341. RIGBY, A. C.; BARBER, K. R.; SHAW, G. S.; GRANT, C. W. Biochemistry 1996, 35, 12591–12601. GLOVER, K. J.; MARTINI, P. M.; VOLD, R. R.; KOMIVES, E. A. Anal. Biochem. 1999, 272, 270–274.
References 79 ZHANG, L.; DEHAVEN, R. N.; GOODMAN, M. Biochemistry 2001, 41, 61–68. 80 OLSON, F.; HUNT, C. A.; SZOKA, F. C.; VAIL, W.; MAYHEW, E.; PAPHADJOPOULOS, D. Biochim. Biophys. Acta 1980, 601, 559–571. 81 BANGHAM, A. D.; STANDSIH, M. M.; WATKINS, J. C. J. Mol. Biol. 1965, 13, 238–252. 82 HUANG, C. Biochemistry 1969, 8, 344–352. 83 BADER, R.; BETTIO, A.; BECK-SICKINGER, A. G.; ZERBE, O. J. Mol. Biol. 2001, 305, 307–329. 84 BECK-SICKINGER, A. G.; JUNG, G. Biopolymers 1995, 37, 123–142. 85 W¨UTHRICH, K. NMR of Proteins and Nucleic Acids, 1st ed.; Wiley: New York, 1986. 86 KESSLER, H.; GEHRKE, M.; GRIESINGER, C. Angew. Chem. Int. Ed. Engl. 1988, 27, 490–536. 87 GRONENBORN, A. M.; BAX, A.; WINGFIELD, P. T.; CLORE, G. M. FEBS Lett. 1989, 243, 93–98. 88 PIOTTO, M.; SAUDEK, V.; SKLENAR, V. J. Biosmol. NMR 1992, 2, 661–665. 89 SZYPERSKI, T.; G¨UNTERT, P.; OTTING, G.; W¨UTHRICH, K. J. Magn. Reson. 1992, 99, 552–560. 90 DAMBERG, P.; JARVET, J.; GRASLUND, A. Methods Enzymol. 2001, 339, 271–285. 91 PALMER, A. G. Annu. Rev. Biophys. Biomol. Struct. 2001, 30, 129–155. 92 LIPARI, G.; SZABO, A. J. Am. Chem. Soc. 1982, 104, 4546–4559. 93 LIPARI, G.; SZABO, A. J. Am. Chem. Soc. 1982, 104, 4559–4570. 94 SCHURR, J. M.; BABCOCK, H. P.; FUJIMOTO, B. S. J. Magn. Reson. B 1994, 105, 211–224. 95 PAPAVOINE, C. H.; AELEN, J. M.; KONINGS, R. N.; HILBERS, C. W.; Van de VEN, F. J. Eur. J. Biochem. 1995, 232, 490–500. 96 BRAUN, W.; WIDER, G.; LEE, K. H.; W¨UTHRICH, K. J. Mol. Biol. 1983, 169, 921–948. 97 JARVET, J.; ZDUNEK, J.; DAMBERG, P.; GRASLUND, A. Biochemistry 1997, 36, 8153–8163. 98 OHMAN, A.; LYCKSELL, P. O.; JUREUS, A.; LANGEL, U.; BARTFAI, T.; GRASLUND, A.
99
100
101
102 103
104
105 106
107
108
109
110
111 112
113
114
Biochemistry 1998, 37, 9169–9178. MOTTA, A.; ANDREOTTI, G.; AMODEO, P.; STRAZZULLO, G.; CASTIGLIONE MORELLI, M. A. Proteins 1998, 32, 314–323. HASHIMOTO, Y.; TOMA, K.; NISHIKIDO, J.; YAMAMOTO, K.; HANEDA, K.; INAZU, T.; VALENTINE, K. G.; OPELLA, S. J. Biochemistry 1999, 38, 8377–8384. PELLEGRINI, M.; BISELLO, A.; ROSENBLATT, M.; CHOREV, M.; MIERKE, D. F. Biochemistry 1998, 37, 12737–12743. YAN, C.; DIGATE, R. J.; GUILES, R. D. Biopolymers 1999, 49, 55–70. PISERCHIO, A.; USDIN, T.; MIERKE, D. F. J. Biol. Chem. 2000, 275, 27284–27290. INOOKA, H.; OHTAKI, T.; KITAHARA, O.; IKEGAMI, T.; ENDO, S.; KITADA, C.; OGI, K.; ONDA, H.; FUJINO, M.; SHIRAKAWA, M. Nat. Struct. Biol. 2001, 8, 161–165. BROWN, L. R.; W¨UTHRICH, K. Biochim. Biophys. Acta 1981, 647, 95–111. FRANKLIN, J. C.; ELLENA, J. F.; JAYASINGHE, S.; KELSH, L. P.; CAFISO, D. S. Biochemistry 1994, 33, 4036–4045. CONDAMINE, E.; REBUFFAT, S.; PRIGENT, Y.; SEGALAS, I.; BODO, B.; DAVOUST, D. Biopolymers 1998, 46, 75–88. SCHIBLI, D. J.; HWANG, P. M.; VOGEL, H. J. Biochemistry 1999, 38, 16749–16755. TOWNSLEY, L. E.; TUCKER, W. A.; SHAM, S.; HINTON, J. F. Biochemistry 2001, 40, 11676–11686. PAPAVOINE, C. H.; CHRISTIAANS, B. E.; FOLMER, R. H.; KONINGS, R. N.; HILBERS, C. W. J. Mol. Biol. 1998, 282, 401–419. ALMEIDA, F. C.; OPELLA, S. J. J. Mol. Biol. 1997, 270, 481–495. WILLIAMS, K. A.; FARROW, N. A.; DEBER, C. M.; KAY, L. E. Biochemistry 1996, 35, 5145–5157. SCHIBLI, D. J.; MONTELARO, R. C.; VOGEL, H. J. Biochemistry 2001, 40, 9570–9578. PERVUSHIN, K. V.; OREKHOV, V.; POPOV, A. I.; MUSINA, L.; ARSENIEV, A. S. Eur. J. Biochem. 1994, 219, 571–583.
27
28 NMR of Membrane-Associated Peptides and Proteins
115 SHAO, H.; JAO, S.; MA, K.; ZAGORSKI, M. G. J. Mol. Biol. 1999, 285, 755–773. 116 MACKENZIE, K. R.; PRESTEGARD, J. H.; ENGELMAN, D. M. Science 1997, 276, 131. 117 PISERCHIO, A.; BISELLO, A.; ROSENBLATT, M.; CHOREV, M.; MIERKE, D. F. Biochemistry 2000, 39, 8153–8160. 118 CHUPIN, V.; KILLIAN, J. A.; BREG, J.; de JONGH, H. H.; BOELENS, R.; KAPTEIN, R.; de KRUIJFF, B. Biochemistry 1995, 34, 11617–11624. 119 LOSONCZI, J. A.; TIAN, F.; PRESTEGARD, J. H. Biochemistry 2000, 39, 3804–3816. 120 BERLOSE, J. P.; CONVERT, O.; DEROSSI, D.; BRUNISSEN, A.; CHASSAING, G. Eur. J. Biochem. 1996, 242, 372–386. 121 WIENK, H. L.; WECHSELBERGER, R. W.; CZISCH, M.; de KRUIJFF, B. Biochemistry 2000, 39, 8219–8227. 122 PELLEGRINI, M.; MIERKE, D. F. Biochemistry 1999, 38, 14775–14783. 123 ROZEK, A.; SPARROW, J. T.; WEISGRABER, K. H.; CUSHLEY, R. J. Biochemistry 1999, 38, 14475–14484. 124 MACRAILD, C. A.; HATTERS, D. M.; HOWLETT, G. J.; GOOLEY, P. R. Biochemistry 2001, 40, 5414–5421. 125 SORGEN, P. L.; CAHILL, S. M.; KRUEGER-KOPLIN, R. D.; KRUEGER-KOPLIN, S. T.; SCHENCK, C. C.; GIRVIN, M. E.
126
127 128 129 130
131
132
133 134
135 136 137
138
Biochemistry 2001, 41, 31–41. XU, G. Y.; MCDONAGH, T.; YU, H. A.; NALEFSKI, E. A.; CLARK, J. D.; CUMMING, D. A. J. Mol. Biol. 1998, 280, 485–500. BRITO, R. M.; VAZ, W. L. C. Anal. Biochem. 1986, 152, 250. CHATTOPADHYAY, A.; LONDON, E. Anal. Biochem. 1984, 139, 408. MYSELS, K. J.; PRINCEN, L. H. J. Phys. Chem. 1959, 63, 1696–1700. HELENIUS, A.; MCCASLIN, D. R.; FRIES, E.; TANFORD, C. Methods Enzymol. 56, 734. SMALL, M.; NAIR, P. P.; KRITCHEVSKY, D. in The Bile Acids: Chemistry, Physiology, and Metabolism 1971, 249. THOMAS, D. C.; CHRISTIAN, S. D. J. Colloid Interf. Sci. 1980, 78, 466. SMALL, D. M. Adv. Chem. 1968, 84, 31. ROSEVEAR, P.; Van AKEN, T.; BAXTER, J.; FERGUSON-MILLER, S. Biochemistry 1980, 19, 4108–4115. MAST, R. C.; HAYNES, L. V. J. Colloid Interf. Sci. 1975, 53, 35. HERRMANN, K. W. J. Colloid Interf. Sci. 1966, 22, 352. HJELMELAND, L. M.; NEBERT, D. W.; OSBORNE, J. C., Jr. Anal. Biochem. 1983, 130, 72–82. STARK, R. E.; LEFF, P. D.; MILHEIM, S. G.; KROPF, A. J. Phys. Chem. 1984, 88, 6063–6067.
1
Determination of Protein Dynamics Using 15N Relaxation Measurements David Fushman
University of Maryland, College Park, USA
Originally published in: BioNMR in Drug Research. Edited by Oliver Zerbe. Copyright ľ 2002 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30465-7
1 Introduction
One of the fundamental problems in understanding life at the molecular level is the relationship between structure, dynamics, and function in complex molecular systems like proteins. Proteins are molecular machines that perform most of the functions and control all key events in a living cell. Globular proteins are well packed and adopt ordered three-dimensional structures. Most importantly, they possess a variety of motions such as bond vibrations, side-chain rotations, segmental motions, domain movements etc. Although static three-dimensional structures alone provide extremely valuable information on the organization and interaction of protein molecules, it is motion that is required for most proteins to work. Dynamics are vital to protein function, which depends on alteration in three-dimensional structure in response to specific molecular interactions. The binding of a ligand to a protein (e.g., a substrate to an enzyme) generally results in changes in the protein’s structure and dynamics; these changes can be local or also can involve other regions of the protein molecule. The relative orientation and motions of domains within many proteins are key to the control of multivalent recognition, or the assembly of protein-based cellular machines. In order to understand how proteins work, we need to know what motions are present in a protein and how they are related to the protein’s biological function. Detailed knowledge of protein dynamics is, therefore, crucial to the understanding of the mechanisms of protein function, including such events as protein folding, ligand recognition, allostery, and catalysis. Protein dynamics are extremely complex and difficult to analyze, because a variety of motions take place in the same molecule and at the same time. The key problem here is to determine which of these motions in a protein are essential for its
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Determination of Protein Dynamics Using 15N Relaxation Measurements
biological function. In order to address this issue, we need detailed knowledge of protein dynamics. Multidimensional NMR methods, combined with isotope labeling, can provide access to virtually every atom in a molecule, unique for protein structural studies. This not only allows characterization of the structure and interaction of proteins in their native milieu, but also provides unparalleled possibilities to obtain a complete atomic-level resolution picture of protein dynamics in a time range from picoseconds up to seconds, the range where most motions relevant to protein function take place. A significant number of 15 N and 13 C relaxation studies have been performed on a large number of proteins in the last decade, initiated by the pioneering work of Refs. [1] and [2]. In this article we review some experimental and analytical approaches to protein dynamics by 15 N relaxation measurements and outline the current picture of protein dynamics revealed by NMR.
2 Spectroscopic Techniques
Typical experiments involved in 15 N relaxation studies include measurements of the transverse (R2 = 1/T 2 ) and longitudinal (R1 = 1/T 1 ) relaxation rates of the 15 N nuclear spin and of the 1 H- 15 N cross-relaxation rate, the latter being usually measured via the steady-state 15 N{1 H} nuclear Overhauser effect (NOE) (see e.g. Refs. [2–4]). These are HSQC-type experiments that are based on INEPT magnetization transfer in the 1 H-15 N pair and contain relaxation-specific pulse sequences inserted during the time period when magnetization is in the 15 N dimension. Typical NMR pulse sequences for these experiments can be found in Ref [5]. Traditionally, all relaxation measurements are performed at a single resonance field (spectrometer), although more accurate approaches might require measurements of these parameters at multiple fields [6]. Some practical aspects of these measurements will be briefly mentioned here. T 2 measurements usually employ either Carr-Purcell-Meiboom-Gill (CPMG) [7, 8] spin-echo pulse sequences or experiments that measure spin relaxation (T 1ρ ) in the rotating frame. The time delay between successive 180◦ pulses in the CPMG pulse sequence is typically set to 1 ms or shorter to minimize the effects of evolution under the heteronuc-lear scalar coupling between 1 H and 15 N spins [3]. T 1 measurements are usually based on inversion-recovery-type experiments (see e.g. Ref. [9]). The longitudinal magnetization relaxes to its equilibrium value, which is not zero. The accuracy of T 1 determination then could depend on the duration of the longest relaxation delay used for the measurements. In order to circumvent this problem, these experiments are usually performed so that the contribution from the equilibrium magnetization is subtracted in the course of the experiment by alternating the receiver phase simultaneously with the sign of the initial longitudinal 15 N magnetization [2]. In a different approach, two sets of T 1 experiments, one with positive and one with negative initial longitudinal 15 N magnetization, are
3 Accuracy and Precision of the Method 3
stored separately and then analyzed simultaneously [9]. As shown in Ref. [9], signal intensities measured in the heteronuclear NOE experiment can be used to assess the equilibrium magnetization for the T 1 experiment. Cross-correlation effects between 15 N CSA and 1 H-15 N dipolar interactions [10] will result in different relaxation rates for the two components of the 15 N spin doublet, which could significantly complicate the analysis of the resulting bi-exponential decay of the decoupled signal in T 1 or T 2 experiments. To avoid this problem, 180◦ 1 H pulses are applied during the 15 N relaxation period [3, 4], which effectively averages the relaxation rates for the two components of the 15 N spin doublet. The steady-state heteronuclear 15 N{1 H} NOEs are determined as a ratio of crosspeak intensities in two experiments, with and without presaturation of amide 1 H nuclear spins, usually referred to as NOE and NONOE experiments, respectively. Magnetization exchange between amide protons and water protons could affect the equilibrium 1 H magnetization in the NONOE experiment, and thereby increase the measured NOE values. The effect could be considerable for solvent-exposed parts of the backbone and could render the NOE values inaccurate. These systematic errors could be minimized by using water flip-back pulses in order to avoid saturation of H2 O magnetization [11]. The NOE data are generally more susceptible to errors than R1 and R2 because (i) the NOE experiments start with the equilibrium 15 N magnetization that is ∼10 fold lower than that of (1 H) in the R1 and R2 experiments, hence relatively low sensitivity, and (ii) the NOE values are derived from only two sets of measurements, whereas R1 and R2 data are obtained from fitting multiple sets of data; the latter is expected to result in a more efficient averaging of experimental errors.
3 Accuracy and Precision of the Method
Here we mention some practical aspects that could be useful for improving the accuracy and precision of the method. 3.1 Sampling Schemes
Typically, a series of several 2D spectra are recorded with various relaxation delays, ranging from very short to the longest delays, which usually correspond to 1.5–2 relaxation times. The delay values are usually selected so that they are uniformly distributed over this time interval or so that the signal values are uniformly spread. Another sparse sampling strategy proposed in [12] is based on an optimal sampling scheme for a monoexponential decay function; a five-point variant of this strategy uses one measurement at a very short relaxation delay and four measurements at 1.3 T 2 (or T 1 ).
4 Determination of Protein Dynamics Using 15N Relaxation Measurements
3.2 Peak Integration
Relaxation rates for a particular 15 N nucleus are determined from a monoexponential fit of the corresponding signals in a series of 2D spectra. Strictly speaking, peak volume, not intensity, is a direct measure of the NMR signal. However, peak volume integration is usually not straightforward and could be prone to errors. Assuming that the measurements are performed on the same sample and under identical temperature conditions, peak widths are same for a given peak in all experiments of the series, so that the peak volumes are linearly related to intensities. This allows one to monitor NMR signal decay by measuring peak intensities, which proved to be a more accurate procedure than peak volume integration because of the computational errors associated with the latter.
3.3 Estimation of Experimental Errors
Since relaxation rates are derived from the measured peak intensities, the experimental errors in these parameters are typically assessed based on the level of noise in the NMR spectra. This noise represents experimental uncertainties in the measured peak intensities and can be measured by integrating areas in a 2D spectrum that are free from signals. Errors in peak intensities can also be obtained from duplicate measurements [13]. The uncertainties in R1 or R2 can then be derived using standard methods of error estimation for fitting parameters, see e.g. Ref [14]. The experimental errors in NOE values can be assessed as follows:
σ N O E = |N O E | (δ N O E /IN O E )2 + (δ N O N O E /IN O N O E )2 where I and δ denote the peak intensity and the level of experimental noise, respectively, in the NOE and NONOE experiments (recall that NOE = INOE /INONOE ). Experimental errors in relaxation parameters could also be derived from the comparison of their values measured in multiple separate measurements [15]. The experimental precision of relaxation measurements depends on many factors: protein size and concentration, signal line width, number of transients per increment, etc. The higher the concentration and/or narrower the line width, the better is the signal-to-noise ratio. Typically, for a 1 mM sample of a relatively small protein (e.g. ubiquitin, Mw = 8.6 kDa) at 25 ◦ C (proton T 2 ≈ 50 ms), relaxation measurements with 32 scans per t1 increment could provide experimental uncertainty in R2 or R1 of about 1%. Heteronuclear 15 N{1 H} NOE measurements require a greater number of scans (>128) to reach reasonable levels of experimental precision (<2%). A complete set of R1 , R2 , and NOE measurements under these conditions typically takes less than 1 week of measurement time. Relaxation measurements at lower protein concentrations will require a significant increase in the number of scans, hence longer experimental time. For small to medium size proteins (up to ∼ 15–20 kDa), relaxation data of reasonable precision could be obtained using conventional NMR probes for protein concentrations as low as 500
3 Accuracy and Precision of the Method 5
µM or even 250 µM [9]. Bigger proteins will require a greater number of scans, as the line-width scale is approximately linear with protein molecular weight. In addition, the decrease in T 2 (particularly 1 H) results in greater losses due to spin relaxation during the magnetization transfer (INEPT), which will result in the decrease in the signal. Additional gain in the signal-to-noise ratio could be obtained by reducing T 2 s, which could be achieved, for example, by raising the temperature if the protein’s stability permits, or by perdeuterating side chains. Increasing protein concentration usually helps to improve the sensitivity or reduce the required measurement time, although it could also promote protein aggregation (see below). 3.4 Noise Reduction
It proved helpful for the purpose of noise reduction to perform relaxation experiments in an interleaved fashion, as one pseudo-3D experiment, where the 2D planes in the F2 dimension correspond to various relaxation delays. The acquisition order (3-2-1) is selected so that cycling through various relaxation delays (in R1 or R2 experiments) or through NOE/NONOE 2D planes is performed prior to incrementing the evolution period in the indirect dimension (F1) (see e.g. Ref. [16]). The resulting pseudo-3D spectrum can be processed as a set of 2D spectra in t1 and t3 dimensions, and then analyzed in the usual way. This procedure reduces the noise arising from switching from one 2D experiment to the other and helps minimize temperature variations between the spectra acquired with different relaxation delays. It will also be helpful in those cases when protein degradation occurs in the course of measurements; a reduction in the signal due to sample instability will then affect the line shape in F1 but not the relaxation rate. 3.5 Temperature Control
Since both the overall and local protein dynamics depend on temperature, proper temperature control is essential for accurate interpretation of 15 N relaxation data [17]. Real temperature in the sample could be measured using calibration standards, e.g. methanol and/or ethyl glycol (e.g. Ref [18]). However, even in this case, temperature consistency is a significant technical problem, as the actual sample temperature could vary from one experiment to the other. For example, in the T 2 experiment using CPMG, temperature variations between the 2D data sets could arise from the differences in the duration of relaxation delays and hence in the number of 180◦ -pulses applied. Temperature stability could be improved by using longer recycling delays between the transients, although at the cost of increased measurement time. Further temperature equalization could be obtained by inserting a compensating set of 180◦ pulses at the beginning of the recycling delay. The compensating pulses could be applied at a frequency far from the carrier position. This helps maintain the same temperature level between data sets corresponding to different relaxation delays, e. g. in the course of T 2 measurement, but, at the same
6 Determination of Protein Dynamics Using 15N Relaxation Measurements
Fig. 1 Illustration of the temperature sensitivity of 15 N relaxation parameters, R1 , R2 , and NOE, as indicated. Shown are the relative deviations in these relaxation parameters from their values at 25 ◦ C as a function of temperature in the range of ±3 ◦ C. The expected variations in R1 and R2 due to temperature deviations of as little as ±1 ◦ C are already greater than the typ-
ical level of experimental precision (±1%) of these measurements (indicated by the dashed horizontal lines). For simplicity, only temperature variation of the overall tumbling time of the molecule (due to temperature dependence of the viscosity of water) is taken into account; the effect of temperature variations on local dynamics is not considered here.
time, can introduce another problem as it will amplify temperature differences between T 2 and T 1 and NOE experiments. Temperature consistency between measurements performed on different spectrometers is particularly critical for accurate interpretation of the data (see Refs. [19, 20] for post-acquisition temperature consistency tests). However, temperature control and equalization are also important for the combined analysis of T 1 , T 2 , and NOE data measured on the same spectrometer, because of the possible temperature differences between these measurements. Fig. 1 illustrates the sensitivity of relaxation parameters to temperature variations. Accurate measurement of protein dynamics requires that all experiments be done at the same temperature. To improve temperature consistency between T 1 , T 2 , and NOE measurements on the same spectrometer, these could be performed in an interleaved fashion, e.g. as one pseudo-3D experiment, similarly to that described above.
4 Basic Equations
The measured spin relaxation parameters (longitudinal and transverse relaxation rates, R1 and R2 , and heteronuclear steady-state NOE) are directly related to power spectral densities (SD). These spectral densities, J(ω), are related via Fourier transformation with the corresponding correlation functions of reorientional motion. In the case of the backbone amide 15 N nucleus, where the major sources of
4 Basic Equations
7
relaxation are dipolar interaction with directly bonded 1 H and 15 N CSA, the standard equations read [21]: R1 = 3(d 2 + c 2 )J (ωN ) + d 2 [3J (ωN ) + J (ωH − ωN ) + 6J (ωH − ωN )]
(1)
1 1 R2 = (d 2 + c 2 )[4J (0) + 3J (ωN )] + d 2 [J (ωH − ωN ) + 6J (ωH ) + 6J (ωH + ωN )] + Rex 2 2 N O E = 1 − |γH /γN |d 2 [6J (ωH + ωN ) − J (ωH − ωN )]/R1 .
(3)
Here d = −[µ/(4π)])γ H γ N h/(4πr 3 HN ) and c = −ωN CSA/3 represent contributions from the 15 N-1 H dipolar coupling and 15 N CSA, respectively; Rex is the conformational exchange contribution (if any) to measured R2 . Equations (1–3) assume that the same autocorrelation function can be used to describe the spin relaxationrelevant modulation of both 1 H-15 N dipolar interaction and 15 N CSA. Corrections to these equations due to the noncollinearity of the two interactions are described in Ref [22]. Equations (1–3) are widely used for protein dynamics analysis from relaxation measurements. The primary goals here are (A) to measure the spectral densities J(ω) and, most important, (B) to translate them into an adequate picture of protein dynamics. The latter goal requires adequate theoretical models of motion that could be obtained from comparison with molecular dynamics simulations (see for example Ref. [23]). However, accurate analysis of experimental data is an essential prerequisite for such a comparison. In addition, cross-correlation effects between 15 N CSA and 1 H- 15 N dipolar interaction could be measured, see e.g. Refs. [24–26]: ηxy − dc[4J (0) + 3J (ωN )]P2 (cosβ)
(4)
ηz − 3dc J (ωN )P2 (cosβ).
(5)
Here β is the angle between the unique principal axis of the 15 N CSA tensor and the HN-bond, and P2 (x) =1/2 (3x2 − 1) is the second Legendre polynomial. Although these measurements do not provide additional information (compared to Eqs. (1–3) ) in terms of sampling the spectrum of protein motions, the η-terms could be used for the determination of the 15 N CSA tensor [19] and also for the identification of amide groups undergoing conformational exchange [26, 27] (see below). If the β angle was known, these parameters could also be used in addition to or in lieu of R2 and R1 to analyze protein dynamics. As one can see from Eqs. (1–5) , the number of experimentally measured parameters is less than the number of unknowns: independent characterization of the nine parameters in the right-hand side of Eqs. (1–5) from only five measurements is impossible. Therefore, several approaches were suggested in order to make possible characterization of protein dynamics from experimental data measured at a single resonance frequency. A “low-resolution” picture of protein dynamics
(2)
8 Determination of Protein Dynamics Using 15N Relaxation Measurements
can be obtained assuming constant values for r NH (1.02A) and 15 N CSA (−160 or −170 ppm) and using various useful parametrizations for J(ω) [28–31]. The two major approaches, the model-free analysis and SD mapping will be briefly discussed here.
5 The Model-Free Approach
Recent progress in protein dynamics studies by NMR was greatly facilitated by the invention of the “model-free” formalism [28, 32]. In this approach, the local dynamics of a protein are characterized by an order parameter, S, measuring the amplitude of local motion on a scale from 0 to 1, and the correlation time of the motion, τ loc . The model-free expression for the correlation function of local motion reads Cloc (t) = S2 + (1 − S2 )exp( − t/τloc ).
(6)
In the extended model-free approach [29], the local dynamics are deconvolved into a fast and a slow motion τ fast τ slow , S2 = S2 slow S2 fast ): 2 2 − S2 )exp( − t/τslow ) + (1 − Sfast )exp( − t/τfast ). Cloc (t) = S2 + (Sfast
(7)
Formally, S2 represents a decrease in the autocorrelation function caused by the motion: S2 = 0 corresponds to completely unrestricted motion of a bond (N–H in this case), while S2 = 0 is expected if the bond reorientations are frozen. It was shown recently that the order parameter may be related to the statistical mechanical properties of a protein molecule [33–35]; hence, changes in the NMR-derived order parameters can indicate localized contributions to overall molecular entropy. In the absence of a correlation between the local dynamics and the overall rotational diffusion of the protein, as assumed in the model-free approach, the total correlation function that determines the 15 N spin-relaxation properties (Eqs. 1–5 ) can be deconvolved (τ fast , τ slow τ c ): C(t) = Covrl (t)Cloc (t),
(8)
where Covrl (t) denotes an autocorrelation function of the 15 N-1 H dipolar vector interaction in the case of rigid body rotation, which is generally anisotropic (e.g. Ref. [36]). In the simplest case of isotropic overall rotation with a correlation time τ c , Covrl (t) = 15 e − t/τc . The model-free approach is essentially based on a parametrization of the spectral densities using a small number of fitting parameters, which then allows Eqs. 1–3 to be solved. The analysis of experimental data using this method will be discussed in a later section.
6 Reduced Spectral Densities Mapping 9
6 Reduced Spectral Densities Mapping
This approach is based on the idea of SD mapping [37, 38] that exploits the linear relationship (Eqs. 1–3 ) between the measured relaxation parameters and the spectral densities. The spectral density components, J(0), J(ωN ), and J(ωH ) could be determined directly from the relaxation data, R1 , R2 , and NOE, using the reduced spectral density approach [30, 31] based on the observation that the spectral densities are slowly varying functions of the frequency around wH : J (0.87ωH ) =
(9)
R1 − 7d 2 J (0.921ωH ) 3(d 2 + c 2 )
(10)
2R2 − R1 − 6d 2 J (ωH ) 4(d 2 + c 2 )
(11)
J (ωN ) =
J (0) =
1 γN | |(1 − N O E )R1 5d 2 γH
According to [31], the following two methods could be used in order to relate the high-frequency components of the spectral density in Eqs. 9a–9c : i) J(ωH ) ≈ J(0.87ωH ) ≈ J(0.921ωH ); or ii) J(ω) ∞ 1/ω2 for ω ≈ ωH , which leads to J(0.921ωH ) = 0.8923J(0.87ωH ) and J(ωH ) = 0.7569J(0.87ωH ) in Eqs. (10) and (11), respectively.
These two methods yield lower and upper bounds for J(0) and J(ωN ), but have no effect on the values of J(0.87ωH ) directly determined from the experimental data. Note that the derivation of J(0.87ωH ) (Eq. 9) does not involve any assumption about the values of 15 N CSA. This approach yields spectral densities. Although it does not require assumptions about the correlation function and therefore is not subjected to the limitations intrinsic to the model-free approach, obtaining information about protein dynamics by this method is no more straightforward, because it involves a similar problem of the physical (protein-relevant) interpretation of the information encoded in the form of SD, and is complicated by the lack of separation of overall and local motions. To characterize protein dynamics in terms of more palpable parameters, the spectral densities will then have to be analyzed in terms of model-free parameters or specific motional models derived e.g. from molecular dynamics simulations. The SD method can be extremely helpful in situations when no assumption about correlation function of the overall motion can be made (e.g. protein interaction and association, anisotropic overall motion, etc.; see e.g. Ref. [39] or, for the determination of the 15 N CSA tensor from relaxation data, Ref. [27]).
10 Determination of Protein Dynamics Using 15N Relaxation Measurements
7 Multi-Field Approach
Both approaches outlined above provide a solution to Eqs. 1–3 by reducing the number of necessary dynamics-related parameters to three and by using certain constant values for the dipolar interaction and 15 N CSA, also assuming Rex = 0 (in the SD mapping). The dynamic picture derived from such analyses could be inaccurate if these parameters vary from one site to another. For example, the sitespecific variations in 15 N CSA become important for the analysis of relaxation data at higher fields when the CSA contribution becomes comparable to that from the dipolar interaction, and for TROSY applications [6]. A more accurate approach to protein dynamics should allow independent derivation of protein spectral densities, site-specific 15 N CSA and dipolar interactions, as well as Rex values. An approach has been suggested recently based on relaxation measurements at multiple fields (spectrometers) [6, 19, 20]. This method is a natural extension of previous approaches [37, 40] in that all major components in Eqs. (1–5) , and not only SDs, are treated as fitting parameters. The approach is based on the fact that some terms in these equations are either field-independent [d, θ , and J(0)] or scale with field B0 in a known way (c ∞ B0 , Rex Rex ∞ B02 ). Therefore, when combining data measured at several fields, the number of experimental parameters increases faster than the number of unknowns, so that already a complete set of three-field data theoretically could allow determination of almost all unknowns in Eqs. 1–3 [6]. This could provide a useful tool for protein dynamics analysis free from the oversimplifying assumptions. The utility of this approach was demonstrated for the full characterization of the spectral densities, 15 N CSA tensor, and Rex in ubiquitin, unbiased by the assumption of constant CSA [19, 20]. Similar approaches were recently used by other groups [41, 42]. The necessity to use multiple-field data for accurate analysis of protein motions in multidomain systems was recently demonstrated in Ref. [42].
8 Strategies for the Analysis of Protein Dynamics from 15 N Relaxation Data
The NMR-relevant motions that contribute to nuclear spin relaxation are a combination of local dynamics and the overall molecular tumbling in solution. Accurate analysis of NMR data requires a proper separation of these contributions. The strategies for such analysis are discussed below. As a practical example illustrating the methods described here, we use 15 N relaxation data for the Pleckstrin Homology (PH) domain from the β-adrenergic receptor kinase-1 (βARK). The structure of this protein (PDB access code 1 bak) was solved by NMR [43], and its dynamics were characterized by N relaxation measurements [43] as well as computer molecular dynamics simulations [23, 44]. The βARK PH domain structure comprises seven β-strands forming a β-sandwich flanked on one side by the C-terminal α-helix (Fig. 4b). Some of the loops connecting β-strands, as well as the extended N- and
8 Strategies for the Analysis of Protein Dynamics from 15 N Relaxation Data 11
Fig. 2 Summary of the 15 N relaxation data for the $ARK PH domain measured at 500 MHz (see Ref. [23]): a R1 , b R2 , and c hetero-nuclear NOE versus protein sequence.
C-termini, are unstructured and flexible in solution. A summary of 15 N relaxation data collected at 500 MHz is presented in Fig. 2. It is worth mentioning that the analytical approaches outlined here and currently used to treat relaxation data assume that the overall and local dynamics are not coupled. While this is a reasonable assumption for small, compact proteins, it might not be true for systems possessing large-amplitude internal motions, e.g. multidomain systems and proteins with significant intersegmental flexibility. The approaches to these systems are starting to emerge [42, 45].
9
12 Determination of Protein Dynamics Using 15N Relaxation Measurements
Overall Tumbling
The overall tumbling of a protein molecule in solution is the dominant source of NH-bond reorientations with respect to the laboratory frame, and hence is the major contribution to 15 N relaxation. Adequate treatment of this motion and its separation from the local motion is therefore critical for accurate analysis of protein dynamics in solution [46]. This task is not trivial because (i) the overall and internal dynamics could be coupled (e.g. in the presence of significant segmental motion), and (ii) the anisotropy of the overall rotational diffusion, reflecting the shape of the molecule, which in general case deviates from a perfect sphere, significantly complicates the analysis. Here we assume that the overall and local motions are independent of each other, and thus we will focus on the effect of the rotational overall anisotropy. The anisotropy of the overall tumbling will result in the dependence of spinrelaxation properties of a given 15 N nucleus on the orientation of the NH-bond in the molecule. This orientational dependence is caused by differences in the apparent tumbling rates sensed by various internuclear vectors in an anisotropically tumbling molecule. Assume we have a molecule with the principal components of the overall rotational diffusion tensor Dx , Dy , and Dz (x, y, and z denote the principal axes of the diffusion tensor), and let Dx < Dy < Dz . For an internuclear vector parallel to the z-axis, its reorientations will be caused by molecular rotations around the x- and y-axes, but not z-axis, so that the apparent rotational diffusion rate for the vector will be determined by D(z) = (Dx + Dy )/2 (assuming small degree of anisotropy). On the contrary, reorientations of a vector perpendicular to the z-axis (e.g. along the x-axis) will be determined by D(x) = (Dz + Dy )/2 > D(z) and hence will proceed faster than for the previous internuclear vector. These differences in the tumbling rates will lead to differences in the spin relaxation rates for the corresponding pairs of nuclei; the effect will increase with the anisotropy of the molecule. Theoretical expressions describing this phenomenon and relating spin relaxation rates to the overall rotational diffusion tensor can be found in Ref. [36]. These results are used to accurately analyze protein dynamics in the case of overall rotational anisotropy (e.g. Refs. [15, 23, 43, 47]). The structural information in the form of internuclear vector orientations can also be used in the structure refinement procedures, where the relaxation-derived orientational constraints for individual bond vectors could complement the “short-range” NOE information [48], or in the determination of interdomain orientation in multidomain systems [49, 50].
10 How Can We Derive the Rotational Diffusion Tensor of a Molecule from Spin-Relaxation Data? 10.1
10 How Can We Derive the Rotational Diffusion Tensor of a Molecule from Spin-Relaxation Data? 13
Theoretical Background
Since the characterization of the overall rotational diffusion is a prerequisite for a proper analysis of protein dynamics from spin-relaxation data, we first focus on the theoretical basis of the method being used. Several approaches to determination of the overall rotational diffusion tensor from 15 N relaxation data were suggested in the literature [15, 47, 49, 51–53]. The approach described here uses the orientational dependence of the ratio of spinrelaxation rates [49]
2R2 (ωN τx )2 3/4 −1 = 3J4J(ω(0)N ) = 1+(ω × − 1) 2 {1+ (ωN τx )2 +(1+ 16 ε)2 R1 N τx ) 11 19 2 5 3 2 4+ ε+ ε + εsin θ 18 54 ε [4+3ε+ 29 ε 2 − sin2 θ (1+ (ω 3τ )2 +(1+ 2 2 )]}. 3+2ε+[1+ 13 ε(2 − 3sin2 θ)]2 N x 3 ε)
ρ≡(
(12)
Here ε ≡ D /D⊥ – 1, τx− 1 ≡6D⊥, θ is the angle between an NH bond and the unique axis (associated with D ) of the rotational diffusion tensor, and ωN is the 15 N resonance frequency. Given the orientation { , } of the principal axes of the diffusion tensor with respect to the protein coordinate frame, the angle θ for each NH bond can be determined as θ = cos − 1 (λix cossin + λiy sinsin + λiz cos )
(13)
where {λix , λiy λiz } are the coordinates (in the protein coordinate frame) of a unit vector in the direction of the NH bond. These basic equations assume that the diffusion tensor is axially symmetric. In the general case of rotational anisotropy (presence of the rhombic component), the corresponding equations are more complex (see e.g. Ref. [36]), but the main idea that structural information is encoded in the relaxation rates still holds. Here R 2 and R 1 are the transverse and longitudinal relaxation rates modified to subtract the contributions from high-frequency motions (see e.g. [20, 49]):
R1 = R1 − 7(
R2 = R2 −
0.87 2 2 ) d J (0.87ωH ) = 3(c 2 + d 2 )J (ωN ) 0.921
13 0.87 2 2 3 ( ) d J (0.87ωH ) = (c 2 + d 2 )[2J (0) + J (ωN )] 2 0.955 2
(14)
(15)
where d2 J(0.87ωH ) is determined according to Eq. (9), and the conformational exchange contributions to R2 are assumed to be negligible. Equation (10) is exact in the absence of local motion. It provides a good approximation for the protein core residues, characterized by restricted mobility in the protein backbone: in this case both J(0) and J(ωN ) scale as S2 , and therefore the contribution from local motion to ρ in Eq. (12) is canceled out. The approach described here is rather general and accommodates both large (ωN τ x 1) and small (ωN τ x ≤ 1) molecules and molecules with significant rotational anisotropy. Fig. 3 illustrates how well Eq. (12) describes the expected values for R 2 /R 1 in a wide range of anisotropies.
14 Determination of Protein Dynamics Using 15N Relaxation Measurements
Fig. 3 Demonstration of the accuracy of Eq. (12) for a wide range of rotational anisotropies, D /D⊥ , from 1 to 10. Symbols correspond to synthetic experimental data generated assuming overall tumbling with J c = 5 ns and various degrees of anisotropy as indicated. Model-free pa-
rameters typical of restricted local backbone dynamics in protein core, S2 = 0.87, J loc = 20 ps, were used to describe the effect of local motions. The 1 H resonance frequency was set to 600 MHz. The solid lines correspond to the right-hand-side expression in Eq. (12).
It is worth mentioning that parameter ρ is insensitive, to first order approximation, to modulation of the residue-specific 15 N chemical shift anisotropy tensor and/or dipolar interaction, as the (d2 + c2 ) term in the R 2 /R 1 ratio is canceled out. The noncollinearity of the CSA and dipolar tensors will require corrections to Eqs. (12) and 14–15 for high degrees of rotational anisotropy (D /D⊥ > 1.5), as described in detail in Ref. [22]. Equation (10) allows the determination of the principal values of the diffusion tensor if the orientation of its principal axes frame is known. The latter information is included there in implicit form, via Eq. (13). The problem is that neither the principal axes nor principal values are known a priori and are to be determined simultaneously, as outlined below. 10.2 Derivation of the Diffusion Tensor when Protein Structure is Known
When the structural information is available, then the components of the diffusion tensor can be derived from the minimization of the target function,
χ2 =
ρ exp − ρ calc i [ i ]2 σ pi i
(16)
10 How Can We Derive the Rotational Diffusion Tensor of a Molecule from Spin-Relaxation Data? 15
Fig. 4 The results of the determination of the rotational diffusion tensor of the $SARK PH domain: A fit of the orientational dependence of the experimental values of D for the $SARK PH domain and B a ribbon representation of the 3D structure of the protein, with the orientation of the diffusion axis indicated by a rod. Shown in A are the experimental (symbols) and the best fit (line) values of D (Eq. (12)) for those residues belonging to the well-defined secondary structure elements. The rotational diffusion tensor characteristics derived here are J x = 9.29 ± 0.11 ns and D /D⊥ = 1.39 ± 0.05 (resulting in J c = 8.23± 0.15 ns),
= 88 ± 4◦ and = 236 ± 4.5◦ . The probability that an improvement in the fit compared to the isotropic model could occur by chance is 2×10−13 . The rotational diffusion axis is oriented along the "-helix axis, and the angle between the two is about 20◦ . This plot also illustrates a clear separation between two sets of NH vectors, those belonging to the helix (hence oriented nearly parallel to the diffusion axis) and those in the $-strands. Because of the architecture of the PH domain fold, the latter NH vectors are almost all orthogonal to the helix and hence to the diffusion axis.
Here ρ exp is the experimentally determined value of (2R 2 /R 1 −1)−1 derived directly from the experimental data according to Eqs. 14–15 and (9), and ρ calc is its predicted value calculated using Eqs. (12) and (13) for a given set of the principal axes and principal values of the diffusion tensor; σ ρ denotes the experimental errors in ρ exp , and the sum runs over all available internuclear vectors. The minimization of the target function Eq. (16) for an axially symmetric tensor requires a search in a fourdimensional space of the principal values {D , D⊥ } and the Euler angles {, }. This method is implemented in our computer program R2R1. Its application for the determination of the rotational diffusion tensor for the βARK PH domain is illustrated in Fig. 4. The characteristics of the rotational diffusion tensor thus obtained can then be used as input parameters for the analysis of protein dynamics using computer program DYNAMICS, as described below. In the most general case of a completely anisotropic diffusion tensor, six parameters have to be determined for the rotational diffusion tensor: three principal values and three Euler angles. This determination requires an optimization search in a six-dimensional space, which could be a significantly more CPU-demanding procedure than that for an axially symmetric tensor. Possible efficient approaches
16 Determination of Protein Dynamics Using 15N Relaxation Measurements
to this problem suggested recently include a simulated annealing procedure [54] and a two-step procedure [55]. 10.3 What Can We Do when Protein Structure is not Known? Preliminary Characterization of the Diffusion Tensor
In the absence of accurate structural information, the analysis based on anisotropic diffusion as discussed above cannot be applied. The use of the isotropic overall model is still possible (see below) because it does not require any structural knowledge. However, the isotropic model has to be validated, i.e. the degree of the overall rotational anisotropy has to be determined prior to such an analysis. An estimate of the degree of rotational anisotropy can be obtained from the spread of the experimental values of ρ, assuming that the minimal and maximal observed values of ρ (ρ min and ρ max ) correspond to NH vectors oriented parallel (θ = 0) and perpendicular (θ = 90◦ ) to the rotational diffusion axis. We obtain:
D⊥ =
ωN 3
ρmin 9 ρmax /ρmin − 1 and D /D⊥ = 1 + 3 − 4ρmin 4 3 − 4ρmin
(17)
where the latter equation assumes ε 1. Note that the assignment of ρ max to the orthogonal orientation of NH vectors is not valid for strong anisotropies (e.g. D /D⊥ > 4 in Fig. 3). The diffusion tensor characteristics for the βARK PH domain derived using this approach are D /D⊥ = 1.45, τ x = 9.36 ns, and τ x = 8.15 ns, in reasonable agreement with those derived using the actual fit (see Fig. 4). These considerations assume that (i) we are dealing with a prolate molecule (D > D⊥ hence ε > 0), (ii) that the conformational exchange contribution is negligible, and (iii) that we have a uniform distribution of the orientations for the set of NH vectors available in the molecule. In the case of an oblate ellipsoid of revolution (ε < 0), ρ min and ρ max in these equations have to be interchanged, as the corresponding orientations will be flipped by 90◦ . Amide groups undergoing conformational exchange have to be excluded from this analysis: the presence of Rex contributions to R2 will lower the apparent values of ρ min and hence could render this estimate inaccurate. The issue of incomplete orienta-tional sampling is more complex and depends on the structure of the molecule (see Ref. [56]). In addition, it is worth mentioning that, even in the case of a uniform distribution, the number of vectors oriented parallel to any given direction is small, so that ρ max could be underestimated. Here we assumed that the rotational diffusion tensor is axially symmetric. The presence of a rhombic component could be identified by the shape of the distribution of the values of ρ (see e.g. Refs. [55, 57]). 10.4
10 How Can We Derive the Rotational Diffusion Tensor of a Molecule from Spin-Relaxation Data? 17
Isotropic Overall Model
The isotropic model is justified when the estimated degree of the overall rotational aniso-tropy is small. A D /D⊥ ratio of less than 1.1–1.2 could probably be considered as a reasonable value for the isotropic model, although an anisotropy as small as 1.17 can be reliably determined from 15 N relaxation measurements, as demonstrated in Ref. [15]. In the isotropic model, the overall rotational diffusion is characterized by a single parameter, the overall correlation time τ c . The following steps could be used to determine τ c . (i) Preliminary estimate of τ c . A preliminary estimate of τ c is obtained as an average over the apparent correlation time values (τ ci ) for each NH bond; the latter could be calculated as [58]: 1 τci = 2ωN
6R2i −7 R1i
(18)
here subscript i assigns numbers to all NH-vectors. This value is more accurate if the R 2 and R 1 values, corrected for high-frequency components, are used (Eqs. 14–15 ). A more refined estimate of τ c is then obtained as an average over those residues that have their τ ci values within 1–1.5 standard deviations from the mean. (ii) Optimization of τ c via model selection. This τ c estimate can then be used as a starting value for the optimization procedure using computer program DYNAMICS, which is performed as follows. For a range of τ c values (either automatically predetermined on a grid around the initial τ c estimate or input manually), the program performs model selection for all residues (see below). For each value of τ c , the total residuals of fit and the number of degrees of freedom are obtained 2 χi2 and d f tot = d f . The optimal value as a sum over all residues: χtot = i i i of τ c is determined as the one that yields the highest total number of degrees of freedom, df tot . Usually, a relatively narrow range of τ c values satisfies this criterion. The width of this range, typically about 0.1–0.2 ns, reflects the precision of τ c estimation by this method. Further optimization is then possible based on the 2 minimum of χtot in this τ c range. For the βARK PH domain example considered here, the τ c estimate using Eq. (18) gave 8.22 ns, while the model-selection optimization yielded τ c = 8.27 ns, both values in good agreement with those derived using a more accurate fit (Fig. 3). Note that, unlike other approaches [59], the optimization procedure implemented here performs model selection for each value of τ c and is not restricted to the model selection determined for the starting value of τ c .
11
18 Determination of Protein Dynamics Using 15N Relaxation Measurements Table 1 Parameters of the models of local motion for the relaxation data analysis using com-
puter program DYNAMICS 2 2 Model Sfast s fast Sslow s slow Rex Npar
A B C D E F G H
X X X X X X X X
0 X 0 X 0 X 0 X
1 1 1 1 X X X X
X X X X
0 0 X X 0 0 X X
1 2 2 3 3 4 4 5
Model Selection for NH Bond Dynamics
Here we describe the model selection algorithm that is used to derive microdynamic (model-free) parameters for each NH group from 15 N relaxation data. It is implemented in our program DYNAMICS [9]. Given the overall rotational diffusion tensor parameters (isotropic or anisotropic) derived as described above, this analysis is performed independently for each NH-group in order to characterize its local mobility. The method is based on the model-free parameterization of the spectral density. Depending on the number of available experimental parameters, up to 7 different models for SD could be considered, listed in Tab. 1 [9]. These models are nested; the search starts with the simplest model and proceeds to the models of increasing degree of complexity (number of fitting parameters). An “Ockham’s razor” principle is assumed here: if more than one model is consistent with the data, the simplest model is preferred. For each model of motion, all parameters are determined from fitting, based on the simplex algorithm, to minimize the following target function: exper χi2 = (Rki − Rkcalc )2 /σk2i (19) i k where the subscript k assigns numbers to all measured relaxation parameters available for residue i, σ k is the corresponding experimental uncertainty in Rk , and the superscripts “exper” and “calc” denote experimental and calculated values, respectively. A reasonably good fit is expected to result in the χ 2 i values about exper the number of “degrees of freedom”, d f i = Ni − Nicalc , where the Ns indicate the numbers of experimental and fitting parameters, respectively. The goodness of fit test for each model and their comparison with one another are performed based on the χi2 and df i values using the F-statistics [60] as described in detail in Ref. [9]. Since no F-statistics comparison is possible when the number of degrees exper of freedom becomes zero (i.e. when Ni = Nicalc ), the list of available models is limited. For example, for a typical set of relaxation measurements at a single field, exper = 3, so that only models A through C allow a statistically reliable assessment. Ni
11 Model Selection for NH Bond Dynamics 19
Fig. 5 Model-free characteristics of the backbone dynamics in the $ARK PH domain: a squared order parameters, b local correlation time, and c conformational exchange contributions to R2 . These parameters were derived from 15 N relaxation data at 500 MHz (Fig. 2) using computer program DYNAMICS, as described in the text. Amides belonging to the well-defined secondary structure elements were treated assuming the anisotropic overall tumbling model with the axi-ally symmetric diffusion tensor. The principal values and orienta-
tion of the latter are given in the legend to Fig. 4. The rest of the amides, comprising those in the flexible loops and termini, were treated assuming an isotropic model, with J c = 8.27 ns. A constant value of −170 ppm for 15 N CSA was assumed here. In those cases when the “extended” model [29] (model E in Tab. 1) was selected, in addition to the total order parameter (in this 2 S2 ) represented by solid case, S2 = Sfast slow 2 characsquares, the order parameter Sfast terizing fast motion of the NH bond is also shown in a (open circles).
Usually, models D and E are also included, and their acceptance conditions are somewhat arbitrarily set to a very low acceptance level for the χ 2 i , usually of the order of ≤10−2 . The summary of microdynamic parameters derived from 15 N relaxation data for the βARK PH domain is presented in Fig. 5.
20 Determination of Protein Dynamics Using 15N Relaxation Measurements
Typically, residues from secondary structural elements such as α-helices or βsheets may be fitted to a simple one-parameter (S2 ) model with low residuals. Since increased mobility reduces R2 rates in general, increased R2 values well above those otherwise found in residues from stable secondary structure indicate conformational exchange contributions. Residues from loops frequently require at least a two-parameter fit to take into account that there is a substantial contribution from internal motions that are not very fast. Residues displaying such motional behavior are often placed in sequence right before or after the termini of α-helices or β-sheets. Especially NH bonds with decreased values for the heteronuclear NOE quite often display motions on two different time scales and hence need the three-parameter 2 2 ,τslow and Sfast . This behavior is typical for residues at the unstructured Nfit Sslow and C-termini and in the loop regions.
12 Accuracy and Precision of the Model-Free Parameters
Let us briefly discuss possible limitations in the accuracy and precision of the dynamic picture of a protein derived using the model-free approach. Selection of a particular model of motion for each NH bond strongly depends on site-specific experimental values of relaxation parameters and is based on statistical tests that give preference to one model over another only with a certain probability. Experimental errors (systematic or random) in R1 , R2 , and NOE could therefore influence both the model selection and the actual values of the microdynamic parameters (S2 , τ loc , Rex etc.). The relationship between the errors in the relaxation rates and those in the model-free parameters is complex. For example, assume that the value of τ c is fixed (τ c = 5 ns), the “true” values of the model-free parameters are S2 = 0.87, τ loc = 50 ps, and there is no conformational exchange. A 10% increase in R2 is likely to give rise to Rex terms (0.7 s−1 ); a similar increase in R1 will bias model selection toward the “extended” model with reduced S2 (0.81) and with τ slow in the nanosecond range (1.9 ns), whereas a simultaneous increase in both R1 and R2 will result in increased S2 (0.95) and τ loc (270 ps). A 10% increase in NOE will cause only a slight change in S2 (0.88). Note also that the values of τ loc could be very sensitive to small variations in the relaxation rates [58]. Additional limitations in the accuracy of the derived dynamic parameters could be related to the limitations in the analytical approaches. For example, neglect of the overall rotational anisotropy could lead to considerable errors in the modelfree parameters, as illustrated earlier [46]. As also shown in Ref. [6], the model-free parameters could be in error if the site-specific variations in N CSA are not properly taken into account, particularly at higher fields (>600 MHz 1 H frequency). It is worth mentioning that the “model-free” approach provides a semiquantitative picture of protein dynamics, as it is practically impossible, with the means of a simple model having a limited number of fitting parameters, to fully and accurately describe the great variety of (often convoluted) motions which take place in a real protein. The order parameters, for example, allow a robust
14 Conformational Exchange 21
identification of mobile and rigid regions in a protein, but caution should be exercised when using order parameters to quantify changes in protein dynamics (e.g. upon ligand binding) or differences in backbone dynamics between various proteins.
13 Motional Models
The derivation of the SD or model-free parameters is just the beginning of the analysis of protein dynamics: what one finally wants to achieve is a picture of protein dynamics in terms of motional models. The order parameter in the model-free approach allows a simple geometrical interpretation depending on a particular model of motion. For example, if a wobbling-in-cone model is used, then S is related to the semiangle α of the cone as S =1 /2 cos α(cos α + 1). For the rotation on a cone model, S =1 /2 3 cos2 α − 1) . A Gaussian angle fluctuation model [61] provides another way to look at NH-vector motion. This one is of particular interest because in some cases [62] the backbone motions observed in molecular dynamics simulations can be represented as a solid-body rocking of the peptide plane around the Cα i−1 −Cα i axis (crankshaft-type motions). Which model provides the best representation for local mobility in a particular group remains unclear, as a detailed picture of protein dynamics is yet to be painted. This information is not directly available from NMR measurements that are necessarily limited by the number of experimentally available parameters. Additional knowledge is required in order to translate these experimental data into a reliable motional picture of a protein. At this stage, molecular dynamic simulations could prove extremely valuable, because they can provide complete characterization of atomic motions for all atoms in a molecule and at all instants of the simulated trajectory. This direction becomes particularly promising with the current progress in computational resources, when the length of a simulated trajectory approaches the NMR-relevant time scales [23, 63, 64].
14 Conformational Exchange
Conformational exchange in the intermediate time-range comprises those motions that modulate the isotropic contributions to spin Hamiltonian, already averaged by fast local and global motions. For example, conformational exchange could be produced by a two-or multiple-site interchange between two or multiple conformations characterized by different values of the isotropic chemical shift of the nucleus under observation. By their nature, these processes involve activation barriers. The conformational exchange leads to line-broadening effects, i.e. increased R2 ; its contribution (Rex Eq. (2)) to R2 can be assessed e.g. using the CPMG pulse sequence
22 Determination of Protein Dynamics Using 15N Relaxation Measurements
or rotating frame relaxation measurements. The processes of conformational exchange could take place over several time scales, but in practice their observation is limited to the range from µs to ms. The upper limit is determined by the experimental setup, principally by the repetition rate of the refocusing pulses in the CPMG pulse train. It can be varied (e.g. Refs. [9, 65]) in a limited range, which minimizes the effects of evolution due to heteronuclear scalar coupling (∼94 Hz between 1 H and 15 N) [3, 4] on the one hand and avoids sample heating from the pulse train on the other. Recent approaches [66] allow an extension of the time window for Rex observations beyond the 1 ms range. The lower limit of the accessible conformational exchange time scale depends on the lower limit of estimating Rex , which is set by a level of precision of the R2 measurement: Rex > δR2 , where δR2 is the experimental error. Interestingly, this µs–ms range of motions is considered to be relevant to protein function. It should be mentioned that rotational anisotropy of the molecule will result in an increase in the R2 values for NH vectors having particular orientation with respect to the diffusion tensor frame [46]. This increase could be misinterpreted as conformational exchange contributions, and, vice versa, small values of Rex , usually of the order or 1 s−1 or less, could be mistaken for the manifestation of the rotational anisotropy. Therefore, identification of residues subjected to conformational exchange is critical for accurate analysis of relaxation data. Additional approaches are necessary to distinguish between the two effects. As suggested earlier [27] (see also Ref. [26]), a comparison between R2 and the cross-correlation rate ηxy could serve this purpose, as ηxy contains practically the same combination of spectral densities as R2 , but does not contain the Rex term. The field dependence of Rex (Rex ∞ B0 2 in the fast exchange limit) can be used for these purposes [9, 16, 20], for example, by utilizing the following combination of relaxation rates (see Fig. 6)
2R2 − R1 = 4d 2 J (0) + 2[Rex + 2c 2 J (0)],
(20)
where the terms in the brackets scale as B02 , although a more accurate analysis has yet to separate the conformational exchange contribution from the CSA term having similar field dependence (see Ref. [20]). Experimental approaches to direct characterization of the conformational exchange motions in proteins have been suggested earlier [67–69]. The most recent methods [66, 70–73] are based on a relaxation-compensated version of CPMG that alleviates the previous restriction on the duration of the refocusing delay due to evolution of magnetization from scalar couplings and dipole-dipole cross-correlations.
15 Effects of Self-Association 15
N relaxation studies of proteins are usually performed at protein concentrations in the millimolar range. Although the invention of the cryoprobe promises to eventually allow measurements at significantly lower protein concentrations, a
15 Effects of Self-Association 23
Fig. 6 Identification of the residues undergoing conformational exchange based on the field dependence of relaxation parameters: a the comparison of 2R 2 –R 1 values measured at three fields, 9.4, 11.7, and 14.1 T (shown as grey, white, and black bars, respectively); b the values of Rex + 2c2 J(0) at 500 MHz directly derived from the difference between the 2R 2 –R 1 values (Eq. (20)) at 600 and 500 MHz (as described in Ref. [16]). This analysis does not require structural information. A strong increase in 2R 2 –R 1 with the field in panel a could be indicative of the conformational exchange, as further quantified in b. Although the contributions from CSA and Rex cannot be sepa-rated based on these data
alone, the expected level of 2c2 J(0) can be estimated based on 15 N CSA values. The horizontal lines in b indicate the expected level of 2c2 J(0) calculated for 15 N CSA values of −170 ppm (solid line) and −220 ppm (dashed line) and using model-free approach with S2 = 0.87 and the overall (isotropic) and local correlation times of 8.27ns and 50 ps, respectively. The latter CSA value corresponds to the maximum absolute 15 N CSA value observed in proteins [19]. Note the good correlation between the residues exhibiting elevated values in panel b and those having significant Rex contributions as determined from the model-free analysis (Fig. 5c).
millimolar-range level of protein concentration is currently required for measurements using conventional NMR probes to provide good sensitivity within a reasonable measurement time period. Protein aggregation that could occur under these conditions poses a problem for a robust data analysis. The presence of a protein in different aggregation states differing in their relaxation properties may severely complicate the analysis. The problem is at least two-fold. First, given an exchange process between the monomeric and dimerized/oligomerized forms, one has to determine the degree to which the measured relaxation rates sample the
24 Determination of Protein Dynamics Using 15N Relaxation Measurements
Fig. 7 Illusrration of the idea of the “random dimer” model. For each molecule the orientation of a selected NH vector is indicated; only three of an infinite number of possibilities of interdomain orientations in a dimer are shown.
relaxation parameters for each of the states. Second, a complication arises from the overall anisotropy of the dimer, which must be taken into account even in those cases when the monomer is clearly isotropic. What could be done in this case? As suggested in Ref. [9], in those cases when the dimerization is not apparently specific and the exchange is fast on the NMR-measurement time scale (R2 from tens to hundreds of ms), this problem can be solved by using a “random-dimer” model. In this model, the resulting spectral density function is assumed to be a weighted sum of the monomeric and dimeric (anisotropic) SD, the latter being averaged over all possible (random) orientations sampled by each monomeric molecule in various dimers which it forms during the spin-relaxation time (illustrated in Fig. 7). The averaged spectral density function can be written in the following form [9]: J (ω) = J mono (ω) pmono + J dimer (ω) pdimer ,
(21)
where pmono and pdimer denote the fraction of protein molecules in the monomeric and dimerized states, respectively. This approach accounts for both fast monomerdimer exchange and anisotropy of the dimer rotation. It happens to be the model of choice in the case of the dynamin PH domain [9].
16 Using 13 C Relaxation to Study Protein Dynamics
As demonstrated in this article, 15 N relaxation measurements provide access to backbone dynamics in proteins. The 15 N nuclei can also be used to gain information about side-chain motions [72], although their use as reporter groups here is inevitably limited to nitrogen-containing side chains (e.g., Trp, Asn, Gln, and Arg). Carbon is much more abundant in proteins, and therefore 13 C relaxation is potentially richer in the informational content about protein motions, both in the backbone and in the side chains. Both 15 N and 13 C relaxation data are necessary to fully characterize protein dynamics. Studying protein dynamics using 13 C relaxation is a real challenge. The 13 C relaxation measurements in uniformly labeled proteins are much more complicated than the 15 N measurements, primarily because of the magnetization transfer due to scalar coupling between adjacent 13 C nuclei, because of the contribution to 13 C relaxation from dipolar interactions between adjacent carbons, and because of the technical problems of achieving effective homonuclear decoupling of the 13 C nuclei. 13 C relaxation measurements in unlabeled protein samples (using 13 C natural abundance) [1, 74] help to avoid these
17 Conclusions 25
problems, although these studies require highly soluble proteins and an extensive measurement time. An enhancement can be achieved here by random fractional 13 C labeling [75] or by site-specific 13 C-labeling [76, 77] at the cost of complex sample preparation. An approach that overcomes the multiple-spin problems in the CH2 and CH3 groups by fractional 2 H and uniform 13 C labeling was suggested in Ref. [78]. Another problem with 13 C relaxation is the influence of other, not directly attached, protons, so that the isolated spin-system approximation might not be valid. This problem could partially be addressed by fractional deuteration of side chains, although the extent of random incorporation of 2 H at various side-chain positions requires further analysis. 13 C relaxation measurements on the carbonyl and α-carbons are expected to complement our understanding of the backbone dynamics [79, 80]. While the experimental techniques for such measurements have been developed [81–86], the informational content of the first studies is limited, mostly because of the necessity to correctly assess contributions from various sources, predominantly the 13 C CSA and dipolar contributions from directly attached and through-space coupling to neighboring hydrogens and 13 C’s. Site-specific fluctuations in the 13 C CSA tensors [87] and in the geometry of the local environment severely complicate this analysis. Side-chain mobility is of particular interest because groups responsible for protein function are in many cases located in side chains rather than in the backbone. Gaining an insight into side-chain dynamics, therefore, could be necessary for understanding the relationship between protein dynamics and function. In contrast to protein backbone dynamics, the dynamics in the side chains have not been studied systematically, although promising experimental approaches are starting to emerge [72, 73, 78, 88, 88, 88–90].
17 Conclusions
A diverse picture of protein dynamics has emerged from 15 N NMR relaxation studies. MD simulations related to NMR relaxation of proteins performed in the last decade have primarily focused on motions in the protein backbone. Some earlier results may require reinterpretation using anisotropic models. Briefly, the backbone motions in the elements of well-defined secondary structure (β-strands, α-helices, tight turns) are largely restricted in their experimental or simulated amplitude (0.8 < S2 < 1, S2 ≈ 0.87 on average), and mostly consist, in simulations, of ps time-range bond librations and restricted fluctuations of torsion angles involving rocking of the peptide plane [23, 91]. These correspond to oscillations within one potential well, and can be modeled either by diffusion within a cone or “wobbling in a cone” with the cone semiangle of ∼17◦ or by an oscillating motion within a harmonic potential, e.g. a Gaussian axial fluctuation model [61]. Dynamics in the loops and in the termini, on the other hand, are largely influenced by fluctuations of the backbone dihedral angles (, ψ) that are less restricted in flexible segments than in the protein core. These motions are much slower, with a characteristic
26 Determination of Protein Dynamics Using 15N Relaxation Measurements
time in the ns range, and lead to greater amplitudes of bond reorientation. Not surprisingly, the complete picture of motion is more complex, and in many cases resembles infrequent jumps between various energetic wells. The picosecond timescale bond librations are still present (as e.g. Sfast , τ fast in Eq. (7)), and in some cases can be deconvolved from the slower motions [9, 23]. Protein termini, if they happen not to be fully folded, are virtually unrestricted, with the order parameter values asymptotically approaching zero toward a flexible terminus. Negative heteronuclear steady-state NOEs observed in these residues can be directly used as an indicator of unfolded parts of a protein construct [9]. Acknowledgment
I am grateful to David Cowburn for many inspiring and helpful discussions. Computer programs DYNAMICS and R2R1 described here are available from the author upon request.
References 1 NIRMALA, N. and G. WAGNER. J. Am. Chem. Soc., 1988, 110: p. 7556–7558. 2 KAY, L. E., D. A. TORCHIA, and A. BAX. Biochemistry, 1989, 28(23): p. 8972–8979. 3 KAY, L., L. K. NICHOLSON, F. DELAGLIO, A. BAX, and D. TORCHIA. J. Magn. Reson., 1992, 97: p. 359–375. 4 PALMER, A., N. SKETON, W. CHAZIN, P. WRIGHT, and M. RANCE. Mol. Phys., 1992, 75: p. 699–711. 5 CAVANAGH, J. W. J. FAIRBROTHER, A. J. P. III, and N. J. SKELTON, Protein NMR Spectroscopy. 1996, San Diego: Academic Press. 6 FUSHMAN, D. and D. COWBURN, Nuclear magnetic resonance relaxation in determination of residue-specific 15 N chemical shift tensors in proteins in solution: protein dynamics, structure, and applications of transverse relaxation optimized spectroscopy, in Methods Enzymol. T. James, U. Schmitz, and V. Doetsch, Editors. 2001. p. 109–126. 7 CARR, H. and E. M. PURCELL. Phys. Rev., 1954, 94: p. 630. 8 MEIBOOM, S. and D. GILL. Rev. Sci. Instrum., 1958, 29: p. 688. 9 FUSHMAN, D., S. CAHILL, and D. COWBURN. J. Mol. Biol., 1997, 266: p. 173–194. 10 GOLDMAN, M. J. Magn. Reson., 1984, 60: p. 437–452.
11 GRZESIEK, S. and A. BAX. J. Am. Chem. Soc., 1993, 115: p. 12593–12594. 12 JONES, J. A. J. Magn. Reson., 1997, 126: p. 283–286. 13 SKELTON, N., A. PALMER, M. AKKE, J. KORDEL, M. RANCE, and W. CHAZIN. J. Magn. Reson., 1993, B102: p. 253–264. 14 PRESS, W. H., S. A. TEUKOLSKY, W. T. VETTERLING, and B. P. FLANNERY, Numerical Recipes in C. 1992, NY: Cambridge University Press. 15 TJANDRA, N., S. E. FELLER, R. W. PASTOR, and A. BAX. J. Am. Chem. Soc., 1995, 117: p. 12562–12566. 16 CAMARERO, J. A., D. FUSHMAN, S. SATO, I. GIRIAT, D. COWBURN, D. P. RALEIGH, and T. W. MUIR. J. Mol. Biol., 2001, 308: p. 1045–1062. 17 GAGNE, S., S. TSUDA, L. SPYRACOPOULOS, L. KAY, and B. SYKES. J. Mol. Biol., 1998, 278: p. 667–686. 18 BRAUN, S., H.-O. KALINOWSKI, and S. BERGER, 150 and more basic NMR experiments. 1998, Weinheim: Wiley-VCH, 596 pp. 19 FUSHMAN, D., N. TJANDRA, and D. COWBURN. J. Am . Chem. Soc., 1998, 120(42): p. 10947–10952. 20 FUSHMAN, D., N. TJANDRA, and D. COWBURN. J. Am. Chem. Soc., 1999, 121: p. 8577–8582.
References 21 ABRAGAM, A., The Principles of Nuclear Magnetism. 1961, Oxford: Clarendon Press. 22 FUSHMAN, D. and D. COWBURN. J. Biomol. NMR, 1999, 13: p. 139–147. 23 PFEIFFER, S., D. FUSHMAN, and D. COWBURN. J. Am. Chem. Soc., 2001, 123: p. 3021–3036. 24 TJANDRA, N. and A. BAX. J. Am. Chem. Soc., 1997, 119: p. 8076–8082. 25 TJANDRA, N. and A. BAX. J. Am. Chem. Soc., 1998, 119: p. 9566–9567. 26 KROENKE, C. D., J. P. LORIA, L. K. LEE, M. RANCE, and A. G. I. PALMER. J. Am. Chem. Soc., 1998, 120: p. 7905–7915. 27 FUSHMAN, D. and D. COWBURN. J. Am. Chem. Soc., 1998, 120: p. 7109– 7110. 28 LIPARI, G. and A. SZABO. J. Am. Chem. Soc., 1982, 104: p. 4546–4559. 29 CLORE, G. M., A. SZABO, A. BAX, L. E. KAY, P. C. DRISCOLL, and A. M. GRONENBORN. J. Am. Chem. Soc, 1990, 112: p. 4989– 4936. 30 ISHIMA, R. and K. NAGAYAMA. Biochemistry, 1995, 34: p. 3162–3171. 31 FARROW, N., O. ZHANG, A. SZABO, D. TORCHIA, and L. KAY. J. Biomol. NMR, 1995, 6: p. 153–162. 32 LIPARI, G. and A. SZABO. J. Am. Chem. Soc., 1982, 104: p. 4559–4570. 33 AKKE, M., R. BRUSCHWEILER and A. PALMER. J. Am. Chem. Soc., 1993, 115: p. 9832–9833. 34 LI, Z., S. RAYCHAUDHURI and A. J. WAND. Protein Science, 1996, 5: p. 2647–2650. 35 YANG, D. and L. E. KAY. J. Mol. Biol., 1996, 263: p. 369–382. 36 WOESSNER, D. J. Chem. Phys, 1962, 37: p. 647–654. 37 PENG, J. and G. WAGNER. J. Magn. Reson., 1992, 94: p. 82–100. 38 PENG, J. and G. WAGNER. Biochemistry, 1992, 31: p. 8571–8586. 39 FARROW, N., O. ZHANG, J. FORMAN-KAY, and L. KAY. Biochemistry, 1995, 34: p. 868–878. 40 VIS, H., C. E. VORGIAS, K. S. WILSON, R. KAPTEIN, and R. BOELENS. J. Biomol. NMR, 1998, 11(3): p. 265–277. 41 KROENKE, C. D., M. RANCE, and A. G. I. PALMER. J. Am. Chem. Soc., 1999, 121(43): p. 10119–10125.
42 BABER, J. L., A. SZABO, and N. TJANDRA. J. Am. Chem. Soc., 2001, 123(17): p. 3953–3959. 43 FUSHMAN, D., T. NAJMABADI-HASKE, S. CAHILL, J. ZHENG, H. LEVINE III and D. COWBURN. J. Biol. Chem., 1998, 273(5): p. 2835–2843. 44 PFEIFFER, S., D. FUSHMAN, and D. COWBURN. Proteins, 1999, 34: p. 206–217. 45 TUGARINOV, V., Z. LIANG, Y. SHAPIRO, J. H. FREED, and E. MEIROVITCH. J. Am. Chem. Soc., 2001, 123: p. 3055–3063. 46 FUSHMAN, D. and D. COWBURN, Studying protein dynamics with NMR relaxation, in Structure, Motion, Interaction and Expression of Biological Macromolecules, R. SARMA and M. SARMA, Eds. 1998, Adenine Press: Albany, NY. p. 63–77. 47 COPIE, V., Y. TOMITA, S. K. AKIYAMA, S. AOTA, K. M. YAMADA, R. M. VENABLE, R. W. PASTOR, S. KRUEGER, and D. A. TORCHIA. J. Mol. Biol., 1998, 277: p. 663–682. 48 TJANDRA, N., D. S. GARRETT, A. M. GRONENBORN, A. BAX, and G. M. CLORE. Nat. Struct. Biol., 1997, 4(6): p. 443– 449. 49 FUSHMAN, D., R. XU, and D. COWBURN. Biochemistry, 1999, 38: p. 10225– 10230. 50 HWANG, P. M., N. R. SKRYNNIKOV, and L. E. KAY. J. Biomol. NMR, 2001, 20: p. 83–88. 51 BRU¨ SCHWEILER, R., X. LIAO, and P. E. WRIGHT. Science, 1995, 268(5212): p. 886–889. 52 LEE, L. K., M. RANCE, W. J. CHAZIN, and A. G. PALMER III. J. Biomol. NMR, 1997, 9(3): p. 287–298. 53 BLACKLEDGE, M., F. CORDIER, P. DOSSET, and D. MARION. J. Am. Chem. Soc., 1998, 120: p. 4538–4539. 54 HUS, J. C., D. MARION, and M. BLACKLEDGE. J. Mol. Biol., 2000, 298(5): p. 927–936. 55 GHOSE, R., D. FUSHMAN, and D. COWBURN. J. Magn. Reson., 2001, 149: p. 214–217. 56 FUSHMAN, D., R. GHOSE, and D. COWBURN. J. Am. Chem. Soc., 2000, 122(43): p. 10640–10649. 57 CLORE, G. M., A. M. GRONENBORN, and A. BAX. J. Magn. Reson., 1998, 133(1): p. 216–221.
27
28 Determination of Protein Dynamics Using 15N Relaxation Measurements
58 FUSHMAN, D., R. WEISEMANN, H. THURING, and H. RUTERJANS. J. Biomol. NMR, 1994, 4: p. 61–78. 59 MANDEL, A. M., M. AKKE, and A. G. I. PALMER. J. Mol. Biol., 1995, 246: p. 144–163. 60 DRAPER, N. R. and H. SMITH, Applied regression analysis. 1981, New York: John Wiley & Sons. 709 pp. 61 BRUSCHWEILER, R. and P. WRIGHT. J. Am. Chem. Soc., 1994, 116: p. 8426–8427. 62 LIENIN, S. F., T. BREMI, B. BRUTSCHER, R. BRU¨ SCHWEILER, and R. R. ERNST. J. Am. Chem. Soc., 1998, 120(38): p. 9870– 9799. 63 CHATFIELD, D. C., A. SZABO, and B. R. BROOKS. J. Am. Chem. Soc., 1998, 120: p. 5301–5311. 64 DUAN, Y. and P. A. KOLLMAN. Science, 1998, 282: p. 740–744. 65 OREKHOV, V., K. V. PERVUSHIN, and A. S. ARSENIEV. Eur. J. Biochem., 1994, 219(3): p. 887–896. 66 LORIA, J. P., M. RANCE, and A. G. I. PALMER. J. Am. Chem. Soc., 1999, 121: p. 2331–2332. 67 AKKE, M. and A. G. PALMER III,. J. Am. Chem. Soc., 1996, 118: p. 911–912. 68 ZINN-JUSTIN, S., P. BERTHAULT, M. GUENNEUGUES, and H. DESVAUX. J. Biomol. NMR, 1997, 10: p. 363–372. 69 ISHIMA, R., P. WINGFIELD, S. STAHL, J. KAUFMAN, and D. A. TORCHIA. J. Am. Chem. Soc., 1998, 120: p. 10534–10542. 70 MILLET, O., J. LORIA, C. D. KROENKE, M. PONS, and A. G. PALMER. J. Am. Chem. Soc., 2000, 122: p. 2867–2877. 71 WANG, C., M. J. GREY, and A. G. I. PALMER. J. Biomol. NMR, 2001, 21: p. 361–3666. 72 MULDER, F. A. A., N. R. SKRYNNIKOV, B. HON, F. W. DAHLQUIST, and L. E. KAY. J. Am. Chem. Soc., 2001, 123: p. 967–975. 73 SKRYNNIKOV, N. R., F. A. A. MULDER, B. HON, F. W. DAHLQUIST, and L. E. KAY. J. Am. Chem. Soc., 2001, 123: p. 4556–4566. 74 PALMER, A., M. RANCE, and P. WRIGHT. J. Am. Chem. Soc, 1991, 113: p. 4371–4380.
75 WAND, A., J. URBAUER, R. MCEVOY, and R. BIEBER. Biochemistry, 1996, 35: p. 6166–6125. 76 NICHOLSON, L. K., L. E. KAY, D. M. BALDISSERI, J. ARANGO, P. E. YOUNG, A. BAX, and D. A. TORCHIA. Biochemistry, 1992, 31(23): p. 5253–5263. 77 SIIVARI, K., M. ZHANG, A. PALMER, and H. VOGEL. FEBS Lett., 1995, 366: p. 104– 109. 78 YANG, D. and L. KAY. J. Magn. Reson., 1996, B110: p. 213–218. 79 FISCHER, M. W. F., L. ZENG, Y. PANG, W. HU, A. MAJUMDAR, and E. R. P. ZUIDERWEG. J. Am. Chem. Soc., 1997, 119: p. 12629–12642. 80 FISCHER, M. W. F., L. ZENG, A. MAJUMDAR, and E. R. P. ZUIDERWEG. Proc. Natl. Acad. Sci, USA, 1998, 95: p. 8016–8019. 81 ZHENG, L., M. FISCHER, and E. ZUIDERWEG. J. Biomol. NMR, 1996, 7: p. 157–162. 82 CORDIER, F., B. BRUTCHER, and D. MARION. J. Biomol. NMR, 1996, 7: p. 163–168. 83 ALLARD, P. and T. H¨ARD. J. Magn. Reson., 1997, 126: p. 48–57. 84 DAYIE, K. T. and G. WAGNER. J. Am. Chem. Soc., 1997, 119: p. 7797–7806. 85 YAMAZAKI, T., R. MUHANDIRAM, and L. KAY. J. Am. Chem. Soc, 1994, 116: p. 8266–8278. 86 ENGELKE, J. and H. RUTERJANS. J. Biomol. NMR, 1995, 5: p. 173–182. 87 PANG, Y. and E. R. ZUIDERWEG. J. Am. Chem. Soc., 2000, 122: p. 4841–4842. 88 ISHIMA, R., J. M. LOUIS, and D. A. TORCHIA. J. Am. Chem. Soc., 1999, 121: p. 11589–11590. 89 ISHIMA, R., A. P. PETKOVA, J. M. LOUIS, and D. A. TORCHIA. J. Am. Chem. Soc., 2001, 123: p. 6164–6171. 90 MULDER, F. A. A., B. HON, A. MITTERMAIER, F. W. DAHLQUIST, and L. E. KAY. J. Am. Chem. Soc., 2002, 124: p. 1443–1451. 91 FUSHMAN, D., O. OHLENSCHLA¨ GER, and H. R¨UTERJANS. J. Biomol. Struct. & Dyn., 1994, 4: p. 61–78.
1
Membrane Protein Crystallization Mediated by Antibody Fragments Carola Hunte, and Cornelia M¨unke
Max-Planck-Institute of Biophysics, Frankfurt, Germany
Originally published in: Molecular Biology in Medicinal Chemistry. Edited by Theodor Dingermann, Dieter Steinhilber and Gerd Folkers. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30431-8
1 Introduction
After decoding the sequence of the human genome [1, 2], and as more and more genomes of pathogens, e.g. Helicobacter pylori [3], Mycobacterium tuberculosis [4] and Plasmodium falciparum [5], become available, hopes have been raised that drug discovery will be more empowered. Most drugs act at the level of proteins and great efforts are now put into structural genomics projects to accelerate experimental protein structure determination by means of high-throughput X-ray crystallography and nuclear magnetic resonance (NMR). High-resolution structures are used for drug discovery (1) by rational drug design, (2) by analysis of target-ligand complex structures to validate hits or rationally improve ligands that have been obtained by high-throughput screening of combinatorial libraries, or (3) by in silico ligand screening, which allows the reduction of experimental costs and speeds up the discovery process. Last, but not least, high-resolution structures provide crucial information for the understanding of the molecular mechanism of proteins that are present in the cell in a large variety of specific architectures–the basis for their highly diverse functions. The most widely used method for the determination of the atomic structure of proteins larger than 30 kDa (including membrane proteins) is X-ray crystallography, which requires well-ordered three-dimensional (3-D) crystals grown from protein solution. Despite the enormous interest in structures of membrane proteins, which provide targets for the majority of drugs, structures of less than 50 integral membrane proteins have been determined since 1985 when the first membrane protein structure of the bacterial photosynthetic reaction center was presented [6]. In contrast, several thousand structures of soluble proteins have already been analyzed by X-ray crystallography. Only 13 X-ray structures of unrelated polyProtein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Membrane Protein Crystallization Mediated by Antibody Fragments
topic membrane proteins from the inner membrane of bacteria and mitochondria and from eukaryotic membranes are available [7]; for a regularly updated list, see www.mpibp-frankfurt.mpg.de/michel/public/memprotstruct.html. This small number is in contrast to the high proportion of integral membrane proteins predicted from an analysis of genomes of eubacterial, archaean and eukaryotic organisms, in which 20–30% of the open reading frames encode integral membrane proteins [8]. Obtaining well-ordered 3-D crystals of membrane proteins suitable for high-resolution structural studies is notoriously difficult due to the amphipathic surface of the macromolecules. The hydrophobic membrane-spanning region of the protein is in contact with the acyl chains of the phospholipid bilayer, whereas the polar surfaces are exposed to the polar head groups of the lipids and to the aqueous phases. For purification and crystallization purposes, membrane proteins are solubilized from the lipid bilayer in their native conformation by the use of detergents. Detergent molecules are amphiphilic, consisting of a polar or charged head group and a hydrophobic tail. At low concentrations, they are present as monomers in aqueous solution and form a monolayer at the water–air interface. Above a certain concentration of the detergent, which is called the critical micelle concentration (CMC), self-association occurs and micelles are formed. In these spherical aggregates, the hydrophobic tails of the detergent molecules are oriented towards the micelle interior and the hydrophilic head groups point outwards. Membrane proteins are soluble in micellar detergent solutions and form protein–detergent complexes. A detergent micelle covers the hydrophobic surface of the membrane protein in a belt-like manner, where the polar detergent head groups face the aqueous phase (Fig. 1; for review, see [9, 10]). Starting with detergent–protein complexes, membrane proteins can be reconstituted into a phospholipid bilayer by concomitant removal of detergent and addition of phospholipid either by dialysis or solid-phase adsorption. Under certain empirically determined conditions the respective protein forms well-ordered twodimensional (2-D) arrays, so-called 2-D crystals, which can be used for structure determination by electron microscopy [11, 12]. In this way several structures of membrane proteins at medium resolution (around 5–10 Å) have already been obtained. Structure determination at near atomic resolution by electron microscopy is possible [13–15], but is cumbersome. The 2-D crystallization trials need less protein and it might be faster to obtain crystals. However, once good 3-D crystals are available, rapid structure determination at high resolution by X-ray crystallography is more likely. The amphipathic surface of membrane proteins allows 3-D crystals to be formed in two ways ([16], see Fig. 1). Type I crystals are in principal ordered stacks of 2-D crystals that contain the ordered protein molecules within the membrane plane. Type II crystals can be formed by membrane proteins integrated in detergent micelles in a similar manner to soluble proteins. In the latter case, crystal contacts are made by the polar surfaces of those protein domains that protrude from the detergent micelle. Here, crystallization conditions are very similar to those for soluble proteins. Mixed type I and II crystals are possible, although most membrane protein crystals belong to type II.
1 Introduction 3
Fig. 1 Basic types of 3-D membrane protein crystals. Type I crystals consist of ordered stacks of 2-D crystals that are present in a bilayer. The hydrophobic surface of membrane proteins is covered by a micelle in the detergent-solubilized state. These membrane protein–detergent complexes form so-called type II 3-D crystals. Stable crystal
contacts are made between the hydrophilic surfaces of the protein. In a modification of type II (type IIb), tightly and specifically bound antibody fragments add a hydrophilic domain to the complex. This surface expansion provides additional sites for crystal contacts and space for the detergent micelle (From [29].)
Obviously, the detergent micelle requires space in the crystal lattice. Although attractive interactions between the micelles might stabilize the crystal packing [17, 18], they cannot contribute to rigid crystal contacts. Thus, proteins with small hydrophilic domains are especially difficult to crystallize, e.g. many transporters or channels that consist mainly of transmembrane helices with very short solvent exposed loops. A strategy to increase the probability of obtaining well-ordered type II crystals is to enlarge the polar surface of the membrane protein by the attachment of polar domains with specifically binding antibody fragments. This approach, developed by Hartmut Michel et al., was first successfully applied for crystallization
4 Membrane Protein Crystallization Mediated by Antibody Fragments
and structure determination of the cytochrome c oxidase (COX) from Paracoccus denitrificans with a bound antibody Fv (fragment variable) fragment [19, 20]. In addition to the specific detergent requirements for 3-D crystallization of membrane proteins, the second main bottleneck for structural analysis is the supply of sufficient amounts of protein. Crystallization in 3-D consumes at least 10 mg of pure protein per crystallization screen. Nearly all membrane protein structures that have been determined are of proteins that occur in high abundance in their natural environment, e.g. proteins of the respiratory chain or photosynthetic membranes. However, most membrane proteins, e.g. receptors and transporters, are present in the cell at low concentration. Using recombinant production systems, insertion of the protein into a membrane limits the production yield, unless one chooses production via inclusion body formation and refolding. The latter has been very successfully applied for the unusually stable β-barrel membrane proteins from the outer bacterial membranes [21, 22]. Good progress has been made in the development of overexpression systems for membrane proteins from bacterial sources, e.g. for the structure determination of the mechanosensitive ion channel from M. tuberculosis produced in Escherichia coli [23]. Further development of high-level overexpression systems for the production of recombinant membrane proteins from eukaryotic sources is crucial for the successful exploration of their structures [24, 25]. In addition, recombinant techniques have the advantage that proteins can be engineered with affinity tags to allow easy purification. The sequence can also be modified to provide a homogenous protein source, e.g. by removal of glycosylation sites. Furthermore, with the growing number of available genome sequences, the recombinant system allows an additional strategy: to express and purify homologous proteins from several organisms and test which protein is suitable for crystallization. For successful structure determination of the mechanosensitive channel MscL, homologs from nine prokaryotic species were subjected to crystallization trials [23]. Homologous proteins from different species might not only feature differences in essential criteria like production yield or protein stability, but small variations in surface properties might be responsible for the crystallization success, as crystal contacts are made by the protein surface. Several reviews are available that describe the specific requirements for crystallization of membrane proteins [16, 18, 26–29]. In this article, we focus on describing the crystallization of membrane proteins mediated by antibody fragments, as this promising approach has already been successfully used for structure determination of five membrane protein–antibody complexes [30]. 2 Crystallization of Membrane Proteins Mediated by Antibody Fragments 2.1 Overview
Following the successful antibody fragment-mediated crystallization of bacterial COX, this technique was used to determine several structures of membrane proteins that will be described below.
2 Crystallization of Membrane Proteins Mediated by Antibody Fragments 5
COX from P. denitrificans was crystallized as a complex with a recombinant antibody Fv fragment [19]. The structure of this complex was determined at 2.8-Å resolution, and contained all four subunits of COX and the two polypeptides of the Fv fragment [20, 31]. Fv fragments are involved in all crystal contacts. However, the lack of direct protein–protein interactions along the c-axis causes anisotropy in diffraction and problems in reproducing the crystals. Better-ordered crystals were obtained with a COX preparation containing only the two catalytically essential subunits I and II. The crystals diffract X-rays to below 2.5-Å resolution. The new crystal packing involves direct interaction between cytoplasmic and periplasmic surfaces of adjacent COX molecules. All further contacts are mediated exclusively by the Fv fragment. In all of these structures, the COX–Fv fragment co-complex was purified by indirect immunoaffinity chromatography (see below). The co-complex is formed by mixing solubilized membranes with Fv fragment containing periplasm and purified in a single step by streptavidin-affinity chromatography making use of the strep-tag fused to the Fv fragment [32]. The second membrane protein crystallized using this technique is the yeast cytochrome bc1 complex (QCR), a fundamental component of the mitochondrial respiratory chain. As in the case of bacterial COX, the complex was crystallized with a recombinant Fv fragment. The latter binds to the extrinsic domain of the catalytic subunit Rip1p, the so-called Rieske protein [33]. The parental monoclonal antibody 18E11 was obtained by standard hybridoma techniques by immunizing the mouse with detergent-solubilized and purified yeast QCR (Hunte, unpublished results). The antibody does not recognize the protein in Western Immunoblot after sodium dodecylsulfate–polyacrylamide gel electrophoresis (SDS–PAGE) separation of the complex, but binds to a native discontinuous epitope. Although this multisubunit complex (230 kDa per monomer) has a large hydrophilic domain protruding in the matrix space (about 86 kDa) and a considerable hydrophilic portion present in the intermembrane space (about 46 kDa), crystals of yeast QCR suitable for X-ray analysis were obtained only in complex with the Fv fragment. The structure of the co-complex was determined at 2.3-Å resolution [33, 34]. The bound Fv fragments result in a spacious crystal packing (space group C2) with a high solvent content of 74%, providing space for the large detergent micelle of the detergent undecylmaltopyranoside (Fig. 2). The main crystal contacts are mediated by the antibody fragments. The same preparation was used to crystallize yeast QCR with its substrate cytochrome c bound [35]. The different crystallization conditions resulted in an altered crystal lattice (P21). Again, the Fv fragments contribute the major interactions between adjacent molecules. They neither hinder cytochrome c binding nor interfere with QCR activity (Hunte, unpublished results). Crystallization and structure determination of the potassium channel KcsA was achieved with a bound Fab (fragment antibody binding) fragment [36]. The monoclonal antibody used was obtained by hybridoma technology and clones were selected that recognize the native homotetrameric, but not the monomeric, form of the channel. Crystallization was performed with proteolytic Fab fragments generated by papain proteolysis and purified by anion-exchange chromatography. The channel–Fab fragment complex crystals allowed an elegant structure solution. The
6 Membrane Protein Crystallization Mediated by Antibody Fragments
Fig. 2 Antibody fragments are essential for the crystallization of the yeast cytochrome bc1 complex, as crystal contacts (white arrows) of the dimeric yeast cytochrome bc1 complex–Fv18E11 co-complex are mediated by the Fv fragment (encircled) (from [83]).
The fragment binds to the extrinsic domain of subunit Rip1p and generously provides space for the detergent micelle in which the transmembrane portion of the complex (dotted lines) is embedded, as schematically depicted in the insert.
phases were determined by molecular replacement using a published structure of another Fab fragment, as the immunoglobulin fold is highly conserved. In addition, the Fab fragment, at around 54 kDa, represents the largest portion of the complex, compared to single channel subunit of around 11 kDa. The crystals of the complex diffracted substantially better (below 2Å) than the previous crystals of the KcsA K+ channel alone (3.2-Å resolution [37]). The Fab fragment binds to the extracellular surface of the channel subunit. All crystal contacts are between Fab fragments, leaving space for the central molecule of the channel with its relative large detergent micelle of decylmaltoside. The size of a Fab fragment appears to be well suited for the crystallization of channels and other membrane proteins that do not contain domains of significant size protruding from the lipid bilayer.
2 Crystallization of Membrane Proteins Mediated by Antibody Fragments 7
2.2 Generation of Antibodies Suitable for Co-crystallization
Up to now, antibody fragments successfully used for co-crystallization of membrane proteins were derived from monoclonal antibodies obtained by classical hybridoma technology [38]. To achieve stable complex formation for crystallization, parental antibodies as well as their derivatives have to (1) bind the antigen in its native conformation, (2) have a high binding affinity to allow for stable stoichiometric complexes and (3) form a rigid complex with their native antigen. Antibodies binding to a discontinuous epitope as opposed to a linear one are preferred. The N- and C-termini of polypeptides as well as exposed loops are often recognized as epitopes in an immune response. However, these peptides are often flexible and binding of an antibody molecule would very likely introduce a mobile domain detrimental for crystallization. Specificity for a discontinuous epitope can be easily checked in Western immunoblot analysis after separation of the denatured protein by SDS–PAGE, in which the antibody should not bind to the linearized polypeptide.
2.2.1 Immunization One of the first important steps for the successful generation of specific hybridoma clones is the immunization of animals with antigen to generate a sufficient antibody titer. Detergent-solubilized and purified membrane proteins have been successfully used as antigens to obtain high-affinity binders for native membrane proteins, e.g. for generating monoclonal antibody 18E11 specific for the yeast QCR [33] or various monoclonal antibodies specific for the Na+ /H+ antiporter NhaA [39]. Generally, long-term immunization protocols are favorable to obtain many highaffinity binders. In a standard protocol applied for generating antibodies against yeast QCR and NhaA [33, 39, 40], 10-week-old female BALB/c mice were immunized by intraperitoneal injection of highly purified, detergent solubilized protein [100 µg protein in 50% (v/v) adjuvant (ABM2; Pan Biotech, Aidenbach, Germany) at a final volume of 250 µl]. The initial immunization was followed by four injections with protein suspension at 4-week intervals [50 µg protein in 50% (v/v) adjuvant (ABM1; Pan Biotech) at a final volume of 250 µl]. For the final boost the latter injection was repeated on 3 consecutive days and mice sacrificed the following day for removal of the spleen. Blood sera of the mice were taken prior to immunization and 10 days after each injection. After the third immunization, the blood serum showed clear positive signals up to a serum dilution of 105 in a standard enzyme-linked immunosorbent assay (ELISA) using purified NhaA as antigen [39]. It is evident that such an immunization regime in small mammals for highly homologous membrane proteins (i.e. derived from human, mouse, rat and monkey) may be less likely to produce antibodies because of self-tolerance mechanisms. However, there are several ways to stimulate the immune response, e.g. genetic immunization [41–43] or simply the use of improved lipid nanoparticle-based adjuvants [44].
8 Membrane Protein Crystallization Mediated by Antibody Fragments
2.2.2 Selection of Antibodies Antibodies suitable for co-crystallization of membrane proteins have to recognize their respective antigen as a protein–detergent complex. Thus, specific requirements are imposed on the selection of conformation-specific monoclonal antibodies. The type and concentration of detergent is crucial to keep the protein in its native conformation. Some detergents may prevent proteins from binding to plastic and polystyrene surfaces used as common ELISA supports. In addition, adsorption to the solid phase can cause partial denaturation of the protein [45]. When generating monoclonal antibodies against yeast QCR, 35 clones were obtained that recognize the denatured protein in Western immunoblots and only one clone was selected that specifically recognizes the native protein (Hunte, unpublished results). There, a standard ELISA was applied in which detergent-solubilized complex was coated to the polysterene matrix. To improve the yield of native binders for membrane proteins, an easy and robust strategy, the so-called his-tag ELISA, has been devised for proper immobilization and presentation of the “native” antigen of his tagged recombinant membrane proteins, as demonstrated for the antiporter NhaA. The protein is immobilized on Ni2+ -NTA-coated ELISA plates using the same buffer conditions established for the purification of the antiporter using immobilized metal-affinity chromatography [39]. Under these conditions, in the presence of the mild detergent dodecyl-β-D-maltopyranoside, the protein is properly folded and fully functional, and it can be subjected to screening for antibodies. Starting from more than 2000 hybridoma clones, implementation of his xstag ELISA (as opposed to a standard ELISA) allowed the isolation of six different monoclonal antibodies of the IgG type. All of the antibodies obtained are conformation sensitive and bind the detergent-solubilized antiporter as well as its membrane-bound form. Only one of these antibodies is positive in Western blot analysis after SDS–PAGE separation of the protein. However, this antibody (1F6) binds to a linear, but surface exposed, epitope located at the N-terminus of the enzyme [46]. Interestingly, monoclonal antibody 1F6 binds NhaA in a pH-dependent manner that mirrors the pH dependence of the activation of the protein and the associated conformational change as probed by trypsin digestion [47]. It can be used for immunoaffinity purification of the transporter from crude membrane extracts by binding NhaA to immobilized monoclonal antibody 1F6 and eluting it specifically with a pH shift. His-tag ELISA is a powerful method that combines the sensitivity, specificity and high signal/noise ratio of a common immunosorbent assay with recombinant protein chemistry. It can easily be adapted to all recombinant proteins purified with the help of a his tag. To our knowledge it is the first example of selection of antibodies against conformational epitopes using a detergent-solubilized membrane protein, i.e. the most common state for the growth of type II 3-D crystals. Alternative methods for the presentation of membrane proteins in their native conformation have been reported, such as selection with lipid reconstituted samples [48], native enriched membrane fractions or intact cells [49]. Tailored screening is the key step in selecting suitable antibodies from hybridoma fusions or recombinant antibody libraries. Specifically adapted screens allow the selection of antibodies specific for a defined protein conformation. Correctly
2 Crystallization of Membrane Proteins Mediated by Antibody Fragments 9
selected antibodies could help to trap a flexible protein in a fixed conformation. Cocrystallization with antibody fragments has been demonstrated to facilitate structure determination of the flexible gp120 domain [50]. 2.2.3 Antibody Fragments Native antibodies are unsuitable for co-crystallization attempts. They are flexible in the linker regions connecting the variable and constant domains [51], and their bivalent binding mode is in most cases undesirable. Monovalent antibody fragments can be generated by proteolytic cleavage of the whole antibody producing two Fab fragments per antibody molecule. A sufficient amount of starting material of the monoclonal antibody is either obtained by ascites production or by cell culture techniques with the producing murine hybridoma cell line. The monoclonal antibodies are then purified, e.g. using Protein A-affinity chromatography. In the following step, the Fab fragments are commonly generated with the protease papain that specifically cuts in the hinge region. Fab fragments can be purified by Protein A depletion of the split constant antibody domain, followed by size-exclusion or ion-exchange chromatography. The Fab fragments are relatively easy to obtain by proteolysis, but care has to be taken to produce homogenous fragment preparations suitable for crystallization. Residual parts of the flexible linker and in some cases glycosylation can hinder crystallization attempts. Despite these problems, a large number of those Fab fragments alone and in complex with the respective hapten or antigen, including one membrane protein (the potassium channel KcsA [36]), have been crystallized. In the future, a promising alternative for raising monoclonal antibodies in mice is the adaptation of phage- or ribosome-display technology [48, 52, 53] to the in vitro selection of ligands for membrane proteins. Up to now, examples of successful selection of antibodies specific for membrane proteins via phage display of nonimmune libraries are rare. We anticipate that other strategies may be worth pursuing to establish phage display-mediated screening of antibodies against these targets, i.e. (1) generation of more diverse (more than 1010 different elements) antibody libraries, (2) use of a modified helper phage that allows multivalent presentation of the antibodies [54] and (3) screening for ligands after a single round of bio-panning followed by in vitro affinity maturation [55]. 2.2.3.1 Cloning of Antibody Fragments The smallest antigen-binding domain of an antibody, i.e. the Fv fragment, consists of the variable domain of the heavy and light chain, VH and VL , respectively. Fv fragments cannot be obtained by proteolysis, but have to be produced as recombinant proteins. For this purpose, the genes encoding VH and VL have to be cloned and expressed starting from hybridoma cell lines. The cloning of Fv fragments by reverse transcriptase-polymerase chain reaction (RTPCR) is described here [32, 56] (Fig. 3). mRNA or total RNA is isolated from murine hybridoma cell lines producing the antibody of interest. In the following step, cDNA synthesis can be simply performed by using oligo(dT) as a primer. To amplify VH and VL genes, degenerated primers are used that match the 5 end of framework 1 and the 3 end of framework 4, i.e. corresponding to the N-termini of the heavy and
10 Membrane Protein Crystallization Mediated by Antibody Fragments
Fig. 3 Schematic depiction of the strategy for cloning and expression of Fv fragments from monoclonal hybridoma cell lines by RT-PCR. The genes encoding for the variable domains of the heavy and light chains (VH and VL , respectively) are inserted in the bicistronic plasmid pASK68 for periplas-
mic expression in E. coli [32]. The signal sequences OmpA and PhoA are used to target the polypeptides to the periplasm. A strep-tag is attached to the heavy chain for purification; the myc-tag fusion allows detection of the light chain with the monoclonal antibody 9E10.
2 Crystallization of Membrane Proteins Mediated by Antibody Fragments 11
light chain of the antibody molecule and the C-termini of the variable domains, respectively. The sequence of the framework regions is highly conserved, and a set of primer pairs was constructed to specifically amplify genes encoding for the Vλ chain, Vκ chain subclasses 1, 2, 3 and 5, Vκ chain subclass 4 and 6, and VH [32]. These primers contain unique restriction sites, which allow insertion of the PCR product into the plasmid pASK68 that is used for production of the Fv fragment in the periplasm of E. coli (see below). Antisense-directed RNase H digestion can be used to suppress amplification of the nonfunctional mRNA transcripts introduced in hybridoma cell lines during fusion with the nonsecreting myeloma cell line [56]. This fast, robust approach was used to clone several antibody fragments, including the Fv fragments of monoclonal antibody 7E2 specific for COX of P. denitrificans [32], monoclonal antibody 18E11 specific for yeast QCR [33], and monoclonal antibodies 2C5 and 5H4 specific for the antiporter NhaA from E. coli [57]. However, in some cases this approach failed to amplify the desired genes. Therefore, a new set of primers was designed that accounts for the different sequences of the subgroups as listed by the Kabat databank [58] (Fig. 4). A specific forward primer was constructed for every subgroup of antibodies. The position of the backward primer was shifted downstream into the highly conserved constant region. Furthermore, the restriction sites were omitted for the sake of higher accuracy. PCR products were then cloned by blunt-end or TA ligation into sequencing vectors. After validation of the antibody sequence, restriction sites were introduced by PCR to match the desired expression vector. A similar strategy based on degenerated primers can be used for cloning of Fab fragments [59]. Furthermore, hybrid Fab fragments can be produced by insertion of the VH and VL genes into a vector that already contains constant domains of an antibody molecule [57]. Alternative PCR strategies and primer sets including cloning by rapid amplification of cDNA ends (RACE) have been published [60–62]. Often, heavy and light chain are covalently joined by introduction of a linker peptide, generating so-called sc (single-chain) Fv fragments [63] or by connecting the chains via a disulfide bridge (dsFv fragments [64]). Although these measures are intended to stabilize the recombinant fragments, they may in some cases negatively affect the binding properties of the fragment. Fragments Fv7E2 and Fv18E11, which have been used for crystallization of COX and QCR, respectively, do not contain a linker or an additional disulfide bond. See Fig. 4. 2.2.3.2 Production of Antibody Fragments A variety of expression systems have been used for the production of recombinant antibody fragments, e.g. E. coli [65], Pichia pastoris [66] or mammalian cells [67]. A routine procedure for the production of Fv fragments is periplasmic expression in E. coli [32, 65]. The encoding genes of the variable domains of heavy and light chain are inserted in the dicistronic operon of the plasmid pASK68 allowing periplasmic expression in E. coli under the control of the inducible lac promoter (see Fig. 3 in [32]). N-terminal fusion of the OmpA and PhoA signal sequence facilitate secretion of the heavy and light chain to the periplasm. A streptavidin-binding peptide (strep-tag) is appended at the Cterminus of the VH domain for purification via streptavidin-affinity chromatography [68], while a myc-tag (recognized by the monoclonal antibody 9E10) is present at
12 Membrane Protein Crystallization Mediated by Antibody Fragments
Fig. 4 Primer sets used for amplification of VL - and VH -encoding genes of murine monoclonal antibodies. (A) Degenerated primers with unique restriction sites have been designed to clone the genes compatible for insertion into the plasmid pASK68 used for periplasmic expression in E. coli (see Fig. 3. in [32]).
the C-terminus of the VL domain and is used for detection of the antibody fragment in a Western blot or ELISA. When expressing an antibody fragment that binds to a membrane protein naturally residing in the cytoplasmic membrane of the host, one should consider that the antibody may inhibit the protein activity and thereby growth of the cell. In the case of the antiporter NhaA, even though most antibodies had an inhibitory effect on the activity of NhaA [39], periplasmic expression of Fv and Fab fragments was possible [57] as the transporter is vital only at certain growth conditions, e.g. at high sodium or lithium concentration, or at alkaline pH in the presence of sodium. In addition, the respective epitopes may be facing the extracellular side or not be accessible.
2 Crystallization of Membrane Proteins Mediated by Antibody Fragments 13
Fig. 4 (B) Amplification of VH and VL genes is considerably improved using the subgroup specific primer set. Using the same cDNA template, every primer combination is tried in parallel at different temperatures. In most cases the correct gene is amplified by a distinct primer combination.
14 Membrane Protein Crystallization Mediated by Antibody Fragments
Fig. 4 (C) Corresponding amino acid sequences of antibodies of different subgroups for the complementary part of primer set II. The residues of VL and VH are numbered according to Kabat et al. [58].
In bacterial expression systems, Fv and Fab fragments are normally produced in the periplasmic space, where oxidizing conditions allow the formation of the required disulphide bonds essential for proper folding and activity of the antibodies. Protein yields vary for the different antibody fragments, with maximum yields of 1.5 mg protein/l of bacterial culture for stable Fv fragments. Fab fragments are expressed at an even lower yield (0.1–0.4 mg/l of culture), most likely due to the higher complexity of their disulphide bond patterns and the larger mass. Their production creates a bottleneck for co-crystallization trials. Production of proteolytic Fab fragment is a multistep process, and is time consuming and costly. Often the yield of pure protein is below 30% of the starting protein sample. To overcome these limitations, yield optimization of recombinant Fab production would be advantageous. Fab fragments have been produced as chimeric proteins with the CH 1 and CL constant domains of another antibody, i.e. monoclonal antibody D1.3 specific for lysozyme, by subcloning the VH and VL domains of three different antibodies into the vector pASK85-D1.3 [57]. This plasmid provides a his tag at the C-terminus of the heavy chain for purification via immobilized metal-affinity chromatography. It
2 Crystallization of Membrane Proteins Mediated by Antibody Fragments 15
is designed for periplasmic expression in E. coli under the control of the tetracycline repressor/promoter system [68]. Monoclonal antibody 2C5 antibody fragments are produced at good levels of 1.5 and 0.5 mg/l in Fv and Fab formats, respectively. Antibody fragments of 5H4 were produced at much lower yields (0.1 and 0.03 mg/l for Fv and Fab fragments, respectively). Exchange of charged and bulky residues at the N-terminus of the variable domain of the light chain improved the folding and solubility properties of the latter antibody [69]. The yield could be considerably increased by expression of the same hybrid fragments in an oxidizing bacterial cytoplasm using the E. coli strain FA113 and the newly constructed expression vector pFAB1 [57]. This leads to high yield of 10–30 mg pure, functional Fab fragments per liter of E. coli culture and these are of suitable quality for crystallization studies. In addition, scFv fragments derived from phage display libraries have recently been successfully expressed in a similar strain co-expressing molecular chaperones [70]. 2.3 Protein Purification and Crystallization
Starting from a good natural or recombinant protein source, the purification procedure has to ensure pure and homogenous protein preparations in sufficient yield. In general, the methods for membrane protein purification are the same as those applied for soluble proteins except that integral membrane proteins are purified as protein–detergent complexes. Membrane proteins are kept soluble in micellar detergent solutions at a usual detergent working concentration of 1.5-to 2-fold CMC. The choice of detergent for solubilization and purification is important, and should be carefully chosen. Most purification materials tolerate a wide range of detergents. Affinity purification systems have the advantage of high speed and high specificity compared to standard chromatographic procedures such as anion-exchange or size-exclusion chromatography. For specific applications, e.g. for crystallization, recombinant antibody fragments can be used in indirect immunoaffinity chromatography for one-step purification of membrane protein–antibody complexes as outlined below. 2.3.1 Indirect Immunoaffinity Purification Recombinant antibody fragments are versatile tools for the structural analysis of proteins. They can be used for the preparation of membrane protein complexes, yielding highly purified material suitable for crystallization in a single immunoaffinity chromatography step [32, 71]. Compared with classical methods like ion-exchange or size-exclusion chromatography, immunoaffinity chromatography offers an excellent purification factor – usually between 100 and 10,000 – due to the high specificity of the interaction between antigen and antibody or antibody fragment. However, due to the strength of this interaction, harsh elution protocols are usually required if the antibodies or their fragments are covalently coupled to activated matrices or trapped by immobilized Protein A or G. Engineered antibody fragments, however, can be noncovalently and reversibly bound to an appropriate matrix via an affinity tag, e.g. the strep-tag [72]. The protein is consequently eluted
16 Membrane Protein Crystallization Mediated by Antibody Fragments
Fig. 5 Schematic depiction of the purification of a multisubunit membrane protein in complex with an antibody fragment (strep-tagged Fv fragment) by indirect immunoaffinity chromatography. Solubilized cellular membranes are combined with periplasmic extract of the Fv fragmentproducing E. coli strain and the so-called cocomplex is formed. The mixture is applied to a streptavidin-affinity column and the co-
complex as well as the Fv fragment alone are bound via the strep-tag. They are specifically eluted by competitive displacement with an appropriate ligand after removing all other proteins in a washing step. Surplus Fv can be separated by size-exclusion chromatography resulting in pure complex of membrane protein and Fv fragment. (From [71].)
as a co-complex with the antibody fragment (Fig. 5) and can be used as such for crystallization trials. This indirect immunoaffinity chromatography approach combines the advantages of antigen/antibody interaction and affinity tag techniques, i.e. high specificity and well established, fast purification protocols of broad applicability. The technique allows isolation of the protein from its natural source, as well as from expression systems in its native form without the attachment of affinity markers. Obviously, the purified protein is present in a complex with the bound
2 Crystallization of Membrane Proteins Mediated by Antibody Fragments 17
antibody fragment, which should not affect the biological activity of the protein unless this is desired. COX from P. denitrificans [32] as well as the quinol oxidases from P. denitrificans and E. coli [73] have been efficiently purified by one-step indirect immunoaffinity chromatography using recombinant antibody fragments without affecting the functional properties of these membrane protein complexes. In contrast, to the COX co-complexes, streptavidin-affinity chromatography cannot be used for purification of the QCR–Fv fragment co-complex. A single transmembrane helix anchors subunit Rip1p at the periphery of the trans-membrane portion of the complex and “pulling” at its extrinsic domain via the Fv fragment results in disintegration of QCR to a certain extent (Hunte, unpublished results). Therefore, QCR and Fv fragments were purified separately and the co-complex subsequently formed with surplus Fv fragments, which are then removed by size-exclusion chromatography [33, 71]. 2.3.2 Crystallization Reassuringly, membrane protein crystals are obtained with standard crystallization techniques and conditions that are routinely used for soluble proteins, and finding the optimal crystallization conditions appears not to be the limiting step (for reviews, see [27–29, 74, 75]). Thorough biochemical characterization of the protein preparation in combination with comprehensive screening for the most suitable detergent may be the most efficient strategy to cope with the difficulties of membrane protein crystallization. Protein heterogeneity has to be avoided; it can be caused by proteolysis, different oligomerization states or post-translational modifications, e.g. glycosylation. The crystallization of the water channel AQP1 was performed with fully deglycosylated protein samples [76]. In general, crystallization may be hampered by flexible termini or loops and proteolytic cleavage or recombinant products omitting these termini might be useful. For instance, crystallization of the potassium channel KcSA was achieved only after proteolytic cleavage of 35 C-terminal amino acids [36, 37]. Furthermore, restrained mutagenesis at the polar surface of the outer membrane proteins OmpA and OmpX was used to optimize crystallization [21]. Another important aspect should be mentioned here. Many membrane proteins change their conformation considerably during biological activity. For successful crystallization it may be necessary to fix certain conformations or mobile domains, e.g. by the use of specific inhibitors. The mobile extrinsic domain of subunit Rip1p of the cytochrome bc1 complex was not resolved in the first structure of the complex from bovine mitochondria [77]. The domain’s structure was only visible after its immobilization either by crystal contacts or by addition of Qo site-specific inhibitors [33, 78]. In general, various means can be used to improve the stability of a protein. If no conditions and detergents can be found to keep the protein stable or suitable for crystallization, selection for stable mutants is advisable [79]. Antibody fragments can fix a defined protein conformation or stabilize a flexible domain to aid crystallization attempts if suitable antibodies have been selected. Conditions applied for crystallization of membrane proteins in complex with antibody fragments comprise various pH, precipitant and salt concentrations
18 Membrane Protein Crystallization Mediated by Antibody Fragments
indicating the high stability of the antigen–antibody complexes [30]. No data are available for the actual binding constants of the successful examples; however, in all cases, antibody fragments are used in molar excess during purification and stoichiometric complexes are obtained using gel filtration as a final purification step. The use of stoichiometric antibody–antigen complexes for crystallization clearly indicates the high affinity of the antibodies. 2.3.3 Alternative Approaches for Expansion of the Polar Surface If a recombinant expression system is available, the addition of fusion proteins appears to be a fast alternative to enlarge the solvent-accessible surface of a membrane protein and thereby the potential for crystallization. However, protein fusions are connected via a linker region with the protein of interest. This linker will lead to the introduction of a flexible domain, a feature detrimental for crystallization. In an attempt to improve the crystal quality of membrane-bound cytochrome bo3 ubiquinol oxidase from E. coli, the protein was produced as a C-terminal fusion of protein Z to subunit IV [80]. The fusion protein crystallized at similar conditions compared to the native complex, but the crystals were even less ordered. Molecular replacement trials have not been successful and protein Z could not be placed in the model. This may be due to the low resolution of the diffraction data (6 Å) or may indicate disorder of protein Z. The fusion strategy to enlarge the hydrophilic surface could be of general help and valuable for structural genomics projects if fusion partners can be attached in a rigid manner. It has been tried to add the fusion partner not at the protein termini, but between two transmembrane helices. The soluble cytochrome b562 was inserted in one of the inner cytoplasmic loops of lactose permease, a secondary transport protein with 12 transmembrane helices [81]. The “red permease” has transport activity and 2-D crystals were obtained [82], but good diffracting 3-D crystals have not been reported. Choosing physiologically stable reaction partners or stabilizing protein–protein interactions may be an alternative to increase the solvent-exposed domains of membrane proteins for crystallization.
3 Conclusions
In recent years, structural membrane protein research has been a rapidly evolving field, paving the way for rational drug discovery. However, the number of membrane protein structures at near atomic resolution is still small compared with those of soluble proteins. Antibody fragments are used to enlarge the hydrophilic surface of membrane proteins, thereby improving the chances for 3-D crystallization. High-resolution structures of three different membrane proteins have already been obtained with this method. All antibody fragments used for successful cocrystallization of membrane proteins so far have been derived from monoclonal antibodies obtained by classical hybridoma technology. They recognize native,
References
nonlinear epitopes of their respective antigen and bind with high affinity. To establish this strategy as a general tool in crystallization of membrane proteins, recombinant antibody techniques have to be optimized to allow fast access to high-affinity binders. Antibody fragment-mediated crystallization may be especially helpful for membrane proteins that consist mainly of transmembrane helices with very short solvent-exposed loops, as well as for trapping flexible proteins in a fixed conformation.
Acknowledgments
This work was supported by the German Research Foundation (SFB472) and the Max Planck Society. We thank H. P´alsd´ottir and E. Padan for critically reading the manuscript.
References 1 LANDER E. S., et al., Initial sequencing and analysis of the human genome, Nature 2001, 409, 860–921. 2 VENTER J. C., et al., The sequence of the human genome, Science 2001, 291, 1304–1351. 3 TOMB J. F., et al., The complete genome sequence of the gastric pathogen Helicobacter pylori, Nature 1997, 388, 539–547. 4 COLE S. T., et al., Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence, Nature 1998, 93, 537–544. 5 GARDNER M. J., et al., Genome sequence of the human malaria parasite Plasmodium falciparum, Nature 2002, 419, 498–511. 6 DEISENHOFER J., EPP O., MIKI K., HUBER R., MICHEL H., Structure of the protein subunits in the photosynthetic reaction center of Rhodopseudomonas viridis at 3 A resolution, Nature 1985, 8, 618–624. 7 BYRNE B., IWATA S., Membrane protein complexes, Curr Opin Struct Biol 2002, 12, 239–243. 8 WALLIN E., VON HEIJNE G., Genome-wide analysis of integral membrane proteins from eubacterial archaean and eukaryotic organisms, Protein Sci 1998, 7, 1029–1038. 9 GARAVITO R. M., FERGUSON-MILLER S., Detergents as tools in membrane
10
11
12
13
14
15
biochemistry, J Biol Chem 2001, 276, 32403–32406. ZULAUF M., Detergent phenomena in membrane protein crystallization, in:MICHEL H., ed., Crystallization of Membrane Proteins, 54–71, CRC Press, Boca Raton, FL, 1991. WALZ T., GRIGORIEFF N., Electron crystallography of two-dimensional crystals of membrane proteins, J Struct Biol 1998, 121, 142–161. KUHLBRANDT W., Two-dimensional crystallization of membrane proteins: a practical guide, in: HUNTE C., VON JAGOW G., SCHA¨ GGER H., eds, Membrane Protein Purification and Crystallization, 253–284, Elsevier Science, Amsterdam, 2003. HENDERSON R., BALDWIN J. M., CESKA T. A., ZEMLIN F., BECKMANN E., DOWNING K. H., Model for the structure of bacteriorhodopsin based on high resolution electron cryomicroscopy, J Mol Biol 1990, 219, 899–929. KUHLBRANDT W., WANG D. N., FUJIYOSHI Y., Atomic model of plant light-harvesting complex by electron crystallography Nature 1994, 67, 614–621. MURATA K., MITSUOKA K., HIRAI T., WALZ T., AGRE P., HEYMANN J. B., ENGEL A., FUJIYOSHI Y., Structural determinants of water permeation through aquaporin-1, Nature 2000, 407, 599–605.
19
20 Membrane Protein Crystallization Mediated by Antibody Fragments
16 MICHEL H., Crystallization of membrane proteins, Trends Biochem Sci 1983, 8, 56–59. 17 ROTH M., ARNOUX B., DUCRUIX A., REISS-HUSSON F., Structure of the detergent phase and protein detergent interactions in crystals of the wild-type (strain Y) Rhodobacter sphaeroides photochemical reaction center, Biochemistry 1991, 30, 9403–9413. 18 OSTERMEIER C., MICHEL H., Crystallization of membrane proteins, Curr Opin Struct Biol 1997, 7, 697–701. 19 OSTERMEIER C., IWATA S., LUDWIG B., MICHEL H., Fv fragment mediated crystallization of the membrane protein bacterial cytochrome c oxidase, Nat Struct Biol 1995, 2, 842–846. 20 IWATA S., OSTERMEIER C., LUDWIG B., MICHEL H., Structure at 28-Angstrom resolution of cytochrome c oxidase from Paracoccus denitrificans, Nature 1995, 76, 660–669. 21 PAUTSCH A., VOGT J., MODEL K., SIEBOLD C., SCHULZ G. E., Strategy for membrane protein crystallization exemplified with OmpA and OmpX, Proteins Struct Function Genet 1999, 34, 167–172. 22 SCHULZ G. E., Beta-barrel membrane proteins, Curr Opin Struct Biol 2000, 30, 443–447. 23 CHANG G., SPENCER R. H., LEE A. T., BARCLAY M. T., REES D. C., Structure of the MscL homolog from Mycobacterium tuberculosis: a gated mechano-sensitive ion channel, Science 1998, 282, 2220–2206. 24 GRISSHAMMER R., TATE C. G., Overexpression of integral membrane proteins for structural studies, Q Rev Biophys 1995, 28, 315–422. 25 PADAN E., HUNTE C., REILANDER H., Production and purification of recombinant membrane proteins, in: HUNTE C., VON JAGOW G., SCHA¨ GGER H., eds, Membrane Protein Purification and Crystallization, 55–83, Elsevier Science, Amsterdam, 2003. 26 MICHEL H., Crystallization of membrane proteins, in: ROSSMANN M. G., ARNOLD E., eds, International Tables of Crystallography F: Macromolecular Crystallography, 94–99, Kluwer, Dordrecht, 2001.
27 KUHLBRANDT W., 3-dimensional crystallization of membrane proteins, Q Rev Biophys 1988, 21, 429–477. 28 GARAVITO R. M., PICOT D., LOLL P. J., Strategies for crystallizing membrane proteins, J Bioenerget Biomemb 1996, 28, 13–27. 29 HUNTE C., MICHEL H., Membrane protein crystallization, in: HUNTE C., VON JAGOW G., SCHA¨ GGER H., eds, Membrane Protein Purification and Crystallization, 143–160, Elsevier Science, Amsterdam, 2003. 30 HUNTE C., MICHEL H., Crystallisation of membrane proteins mediated by antibody fragments, Curr Opin Struct Biol 2002, 12, 503–508. 31 HARRENGA A., MICHEL H., The cytochrome c oxidase from Paracoccus denitrificans does not change the metal center ligation upon reduction, J Biol Chem 1999, 274, 33296–33299. 32 KLEYMANN G., OSTERMEIER C., Ludwig B, Skerra A, Michel H, Engineered Fv fragments as a tool for the one-step purification of integral multisubunit membrane protein complexes, BioTechnology 1995, 13, 155–160. 33 HUNTE C., KOEPKE J., LANGE C., ROSSMANITH T., MICHEL H., Structure at 23 angstrom resolution of the cytochrome bc1 complex from the yeast Saccharomyces cerevisiae co-crystallized with an antibody Fv fragment, Struct Fold Des 2000, 8, 669–684. 34 LANGE C., NETT J. H., TRUMPOWER B. L., HUNTE C., Specific roles of tightly bound phospholipids in the yeast cytochrome bc1 complex structure, EMBO J 2001, 20, 6591–6600. 35 LANGE C., HUNTE C., Crystal structure of the yeast cytochrome bc1 complex with its bound substrate cytochrome c, Proc Natl Acad Sci USA 2002, 99, 2800–2805. 36 ZHOU Y. F., MORAIS-CABRAL J. H., KAUFMAN A., MACKINNON R., Chemistry of ion coordination and hydration revealed by a K channel+ Fab complex at 20 angstrom resolution, Nature 2001, 414, 43–48. 37 DOYLE D. A., CABRAL J. M., PFUETZNER R. A., KUO A. L., GULBIS J. M., COHEN S. L., CHAIT B., MACKINNON R., The structure of the potassium channel: molecular basis of
References
38
39
40
41
42
43
44
45
46
47
K conduction and selectivity, Science 1998, 280, 69–77. KOHLER G., MILSTEIN C., Continuous cultures of fused cells secreting antibody of predefined specificity, Nature 1975, 256, 495–497. PADAN E., VENTURI M., MICHEL H., HUNTE C., Production and characterization of monoclonal antibodies directed against native epitopes of NhaA the Na+ /H+ -antiporter of Escherichia coli, FEBS Lett 1998, 44, 53–58. VENTURI M., HUNTE C., Monoclonal antibodies for the structural analysis of the Na+ /H+ antiporter NhaA from Escherichia coli, Biochim Biophys Acta Biomembr 2003, 1610, 46–50. TANG D. C., De VIT M., JOHNSTON S. A., Genetic immunization is a simple method for eliciting an immune response, Nature 1992, 356, 152–154. PEET N. M., MCKEATING J. A., RAMOS B., KLONISCH T., DESOUZA J. B., DELVES P. J., LUND T., Comparison of nucleic acid and protein immunization for induction of antibodies specific for HIV-1 gp120, Clin Exp Immunol 1997, 109, 226–232. KILPATRICK K. E., CUTLER T., WHITEHORN E., DRAPE R. J., MACKLIN M. D., WITHERSPOON S. M., SINGER S. and HUTCHINS J. T., Gene gun delivered DNA-based immunizations mediate rapid production of murine monoclonal antibodies to the Flt-3 receptor, Hybridoma 1998, 17, 569–576. SINGH M., O’HAGAN D. T., Recent advances in vaccine adjuvants, Pharm Res 2002, 6, 715–728. GOLDBERG M. E., DJAVADI-OHANIANCE L., Methods for measurement of antibody/antigen affinity based on ELISA and RIA, Curr Opin Immunol 1993, 5, 278–281. VENTURI M., RIMON A., GERCHMAN Y., HUNTE C., PADAN E., MICHEL H., The monoclonal antibody 1F6 Identifies a pH-dependent conformational change in the hydrophilic NH2 terminus of NhaA Na+ /H+ -antiporter of Escherichia coli, J Biol Chem 2000, 275, 4734–4742. GERCHMAN Y., RIMON A., PADAN E. J., A pH-dependent conformational change of NhaA Na+ /H+ antiporter of Escherichia
48
49
50
51
52
53
54
55
56
coli involves loop VIII IX, plays a role in the pH response of the protein, and is maintained by the pure protein in dodecyl maltoside, Biol Chem 1999, 274, 24617–24624. MIRZABEKOV T., KONTOS H., FARZAN M., MARASCO W., SODROSKI J., Paramagnetic proteoliposomes containing a pure, native and oriented seven-transmembrane segment protein, CCR5, Nat Biotechnol 2000, 18, 649–654. PEIPP M., SIMON N., LOICHINGER A., BAUM W., MAHR K., ZUNINO S. J., FEY G. H., An improved procedure for the generation of recombinant singlechain Fv antibody fragments reacting with human CD13 on intact cells, J Immunol Methods 2001, 251, 161–176. KWONG P. D., WYATT R., MAJEED S., ROBINSON J., SWEET R. W., SODROSKI J., HENDRICKSON W. A., Structures of HIV-1 gp120 envelope glycoproteins from laboratory-adapted and primary isolates, Struct Fold Des 2000, 8, 1329–1339. LESK A. M., CHOTHIA C., Elbow motion in the immunoglobulins involves a molecular ball-and-socket joint, Nature 1988, 35, 188–190. VITI F., NILSSON F., DEMARTIS S., HUBER A., NERI D., Design and use of phage display libraries for the selection of antibodies and enzymes, Methods Enzymol 2000, 326, 480–485. HANES J., JERMUTUS L., PLUCKTHUN A., Selecting and evolving functional proteins in vitro by ribosome display, Methods Enzymol 2000, 328, 404–430. RONDOT S., KOCH J., BREITLING F., DUBEL S. A., Helper phage to improve single-chain antibody presentation in phage display, Nat Biotechnol 2001, 19, 75–78. De WILDT R. M., MUNDY C. R., GORICK B. D., TOMLINSON I. M., Antibody arrays for high-throughput screening of antibody antigen interactions, Nat Biotechnol 2000, 18, 989–994. OSTERMEIER C., MICHEL H., Improved cloning of antibody variable regions from hybridomas by an antisense-directed RNase H digestion of the P3-X63-Ag8653 derived pseudogene mRNA, Nucleic Acid Res 1996, 24, 1979–1980.
21
22 Membrane Protein Crystallization Mediated by Antibody Fragments
57 VENTURI M., SEIFERT C., HUNTE C., High level production of functional antibody Fab fragments in an oxidizing bacterial cytoplasm, J Mol Biol 2002, 315, 1–8. 58 KABAT E. A., WU T. T., REID-MILLER M., PERRY H. M., GOTTESMAN K. S., FOELLER C., 1991, Sequences of Proteins of Immunological Interest, 5th edn, US Department of Health and Human Services, NIH, Bethesda, MD. 59 ENGBERG J., JENSEN L. B., YENIDUNYA A. F., BRANDT K., RIISE E., Phage-display libraries of murine antibody Fab fragments, in: KONTERMANN R., D¨UBEL S., eds, Antibody Engineering, 65–92, Springer, Berlin, 2001. 60 BURMESTER J., PLU¨ CKTHUN A., Construction of scFv fragments from hybridoma or spleen cells by PCR assembly, in: KONTERMANN R., D¨UBEL S., eds, Antibody Engineering, 19–40, Springer, Berlin, 2001. 61 BREITLING F., MOOSMAYER D., BROCKS B., D¨UBEL S., Construction of scFv from hybridoma by two-step cloning, in: KONTERMANN R., D¨UBEL S., eds, Antibody Engineering, 41–55, Springer, Berlin, 2001. 62 BRADBURY A., Cloning hybridoma by RACE, in: KONTERMANN R., D¨UBEL S., eds, Antibody Engineering, 56–61, Springer, Berlin, 2001. 63 BIRD R. E., HARDMAN K. D., JACOBSON J. W., JOHNSON S., KAUFMAN B. M., LEE S. M., POPE S. H., RIORDAN G. S., WHITLOW M., Single-chain antigen-binding proteins, Science 1988, 242, 423–426. 64 BRINKMANN U., REITER Y., JUNG S. H., LEE B., PASTAN I., A recombinant immunotoxin containing disulfide-stabilized Fv fragment, Proc Natl Acad Sci USA 1993, 90, 7538–7542. 65 SKERRA A., PLU¨ CKTHUN A., Assembly of functional immunoglobulin Fv fragments in Escherichia coli, Science 1988, 249, 1038–1040. 66 RIDDER R., SCHMITZ R., LEGAY F., GRAM H., Generation of rabbit monoclonal antibody fragments from a combinatorial phage display library and their production in the yeast Pichia pastoris, Biotechnology 1995, 13, 255–260. 67 TRILL J. J., SHATZMAN A. R., GANGULY S., Production of monoclonal antibodies in
68
69 70
71
72
73 74
75
76
77
78
COS and CHO cells, Curr Opin Biotechnol 1995, 6, 553–560. SCHMIDT T. G. M., SKERRA A., One-step affinity purification of bacterially produced proteins by means of the “Strep tag” and immobilized re-combinant core streptavidin, J Chromatogr A 1994, 676, 337–345. VENTURI M., PhD thesis, University of Frankfurt, 2000. JURADO P., RITZ D., BECKWITH J., de LORENZO V., FERNANDEZ L. A., Production of functional single-chain Fv antibodies in the cytoplasm of Escherichia coli, J Mol Biol 2002, 320, 1–10. HUNTE C., KANNT A., Antibody fragment mediated crystallization of membrane proteins, in: HUNTE C., VON JAGOW G., SCHA¨ GGER H., eds, Membrane Protein Purification and Crystallization, 205–218, Elsevier Science, Amsterdam, 2003. SCHMIDT T. G. M., SKERRA A., The random peptide library-assisted engineering of a C-terminal affinity peptide useful for the detection and purification of a functional Ig Fv fragment, Protein Eng 1993, 6, 109–122. OSTERMANN T., PhD thesis, University of Frankfurt, 2000. MCPHERSON S. A., 1990, Current approaches to macromolecular crystallization, Eur J Biochem 189, 1–23. MICHEL H., General and practical aspects of membrane protein crystallization, in: MICHEL H., ed., Crystallization of Membrane Proteins, 73–88, CRC Press, Boca Raton, FL, 1991. SUI H. X., WALIAN P. J., TANG G., OH A., JAP B. K., Crystallization and preliminary X-ray crystallographic analysis of water channel AQP1, Acta Crystallogr D Biol Crystallogr 2000, 56, 1198–1200. XIA D., YU C. A., KIM H., XIAN J. Z., KACHURIN A. M., ZHANG L., YU L., DEISENHOFER J., Crystal structure of the cytochrome bc1 complex from bovine heart mitochondria, Science 1997, 277, 60–66. ZHANG Z. L., HUANG L. S., SHULMEISTER V. M., CHI Y. I., KIM K. K., HUNG L. W., CROFTS A. R., BERRY E. A., KIM S. H., Electron transfer by domain movement in
References stockbroker bc1 , Nature 1998, 392, 677–684. 79 BOWIE J. U., Stabilizing membrane proteins, Curr Opin Struct Biol 2001, 11, 397–402. 80 BYRNE B., ABRAMSON J., JANNSON M., HOLMGREN E., IWATA S., Fusion protein approach to improve the crystal quality of cytochrome bo3 ubiquinol oxidase from Escherichia coli, Biochim Biophys Acta 2000, 1459, 449–455. 81 PRIVE G. G., KABACK H. R., Engineering the lac permease for purification and
crystallization, J Bioenerg Biomembr 1996, 28, 29–34. 82 ZHUANG J., PRIVE G. G., WERNER G. E., RINGLER P., KABACK H. R., ENGEL A., Two-dimensional crystallization of Escherichia coli lactose permease, J Struct Biol 1999, 125, 63–75. 83 HUNTE C., Insights from the structure of the yeast cytochrome bc1 complex: crystallization of membrane proteins with antibody fragments, FEBS Lett 2001, 504, 126–132.
23
1
13C- and 15N-Isotopic Labeling of Proteins Christian Klammt, Frank Bernhard, and Heinz R¨uterjans
Johann Wolfgang Goethe-University, Frankfurt, Germany
Originally published in: Molecular Biology in Medicinal Chemistry. Edited by Theodor Dingermann, Dieter Steinhilber and Gerd Folkers. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30431-8
1 Introduction
Nuclear magnetic resonance (NMR) spectroscopy has experienced a tremendous growth over the past decade, and has been developed as a widely used technique with enormous potential for the study of the structure and dynamics of biological macromolecules. In particular, structural analysis of proteins has greatly benefited from recent advancements of high field solution NMR. Improvements in the hardware and the development of NMR spectrometers with field strengths up to 900 MHz considerably increased the sensitivity and reduced the requirements for high amounts of samples. Furthermore, major milestones in the study of proteins have been the use of isotope labeling and the development of multidimensional techniques. The structure of small proteins below a limit of approximately 10 kDa could be analyzed without the need for special isotope labeling after the development of homonuclear 1 H two-dimensional (2-D) experiments in the late 1970s and early 1980s. However, beyond that limit, the increased complexity due to line broadening and 1 H chemical shift overlap prevents detailed structural analysis of larger proteins by these techniques. The structural analysis of large proteins therefore requires multinuclear labeling and special strategies in order to replace the prevalent NMR inactive isotopes 14 N and 12 C in proteins by the NMR active isotopes 13 C and 15 N [1]. Protocols have been developed to generate either uniformly 13 C/15 N-labeled proteins or to introduce the label only at specific sites. In addition, heteronuclear multidimensional NMR experiments have been designed that utilize the relatively large one- and two-bond scalar couplings introduced by the label [2, 3]. Using the 13 C and/or 15 N chemical shifts, the overlapping 1 H resonances can now be routinely separated into three or four dimensions. The combination of these
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 13C- and 15N-Isotopic Labeling of Proteins
techniques addresses the limitations imposed on the study of large proteins and permits structure determination of proteins up to a molecular weight of approximately 22 kDa [4, 5]. This limit could be extended up to 40 kDa by the random incorporation of 2 H into nonexchangeable sites of proteins [6–9]. In this chapter, recently developed strategies for the incorporation of 15 N and 13 C labels into proteins resulting in highly sensitive spectra and facilitating structural analysis of high-molecular-weight systems are discussed. Conventional labeling strategies require the overproduction of proteins in bacterial or eukaryotic expression systems and we give an overview of the most advanced systems as they are indispensable prerequisites for the efficient incorporation of isotopes into proteins. In addition to structural studies, NMR plays an increasingly important role in the description of protein dynamics, and relationships between dynamics and function. We therefore also address techniques of selective isotope labeling for the study of protein side-chain dynamics and protein–ligand interactions. The second part of the chapter focuses on newly developed in vitro techniques, utilizing cell-free extracts for the generation of protein samples. While cell-free production of proteins in an analytical scale has been possible for several years, preparative protein production using extracts from Escherichia coli or other sources has become one of the most powerful tools for the production of labeled samples suitable for NMR analysis. We therefore will especially address the recent advancements in cell-free expression and labeling strategies.
2 Expression Systems for the In Vivo Incorporation of 13 C and 15 N Labels into Proteins
Apart from technical barriers, the quality and stability of protein samples often limit their analysis by NMR technologies. The relatively low sensitivity of most NMR spectrometers requires sample concentrations starting from 0.2 mM and many proteins aggregate long before they reach that limit. In addition, samples must remain stable for at least several days until a complete set of measurements is finished. In order to obtain optimal resolution, NMR analysis usually requires relatively high temperatures of 15-25◦ C, which could also affect protein stability. Due to the high costs involved in the preparation of multinuclear-labeled proteins, one sample should be used for several experiments and aggregation, denaturation or even degradation of the proteins due to spurious amounts of proteases or due to autoproteolysis needs to be reduced to a minimum. Reasonably high yields of sample preparations are therefore required, which in conventional in vivo expression systems should reach a minimum of several milligrams of protein per liter of culture medium. In order to obtain the best protein production rates, the choice of the optimal expression system is therefore of primary importance. The most commonly used organism for labeled recombinant protein production is still the enterobacterium E. coli. An immense variety of expression systems and vectors have been designed for protein production in E. coli, and this large selection is certainly one of the major advantages for choosing an E. coli expression system. The
2 Expression Systems for the In Vivo Incorporation of 13 C and 15 N Labels into Proteins
predominant eukaryotic systems for protein production are based on the highly productive species Pichia pastoris. Labeling of proteins in mammalian cells is only rarely considered because of the high costs due to low yields concomitant with the need for expensive labeling precursors. Labeled protein samples were initially generated by growing the cells in defined minimal media supplemented with specifically labeled compounds. Common label precursors include 15 NH4 Cl, (15 NH4 )2 SO4 , K15 NO3 and Na15 NO3 for nitrogen labeling, and NaH13 CO3 , [13 C]glycerol, [13 C]glucose and [13 C]acetate for carbon labeling. A variety of supplements such as trace element mixtures and vitamin cocktails have been tested to enhance the cellular growth in minimal media and to improve the productivity. Furthermore, single- or double-labeled mixtures of amino acids or complex algal or microbial hydrolyzates have been added to culture media [10–13] in order to produce uniformly labeled proteins. In order to reduce the isotope costs, cells may be grown in unlabeled rich media to high cell densities, harvested and then suspended in low volumes of labeled defined medium for protein production. High levels of isotope incorporation concomitant with a 4- to 8-fold increase in yield were obtained in E. coli with this technique [14, 15]. 2.1 Protein Production and 13 C and 15 N-labeling in E. coli Expression Systems
Techniques for isotope enrichment in E. coli have been available for several years [16–18] and the vast majority of labeled proteins analyzed by NMR spectroscopy are still being produced by heterologous expression in various E. coli hosts yielding up to 50% of the total cell protein. Despite considerable improvements in eukaryotic and cell-free expression systems, bacterial expression remains the fastest and most economical system for the production of proteins. Standard methods of generating heteronuclear-labeled samples in E. coli use M9 minimal medium [19] or modified derivatives of it-supplemented with [13 C]glucose and [13 C]glycerol for carbon labeling, and (15 NH4 )2 SO4 and 15 NH4 Cl for nitrogen labeling. An important advantage when using E. coli expression is the elaborated selection of vectors and host strains available. Upon overproduction of proteins, it is often essential that the transcription of the target gene can be repressed as tightly as possible until the cells reach the most suitable growth phase for induction. Leaky expression prior to induction could either favor the accumulation of mutations in the target gene in order to suppress the production of unwanted proteins, or even damage or kill the cells due to toxic effects. The most popular inducible promoters in E. coli expression systems are listed in Tab. 1. Repression of the strong E. coli lac promoter is performed by interaction of its repressor LacI with specific operator sequences in Plac , thus preventing the E. coli RNA polymerase from binding. The LacI protein is released from its operator by the addition of isopropyl β-thiogalactoside (IPTG), a nonmetabolizable analog of the natural inducer. A disadvantage of the lac promoter is its relatively high background expression. This could be partly suppressed by an increasing copy number of LacI in expression hosts either by introducing additional copies of the lacI gene on a
3
4 13C- and 15N-Isotopic Labeling of Proteins Table 1 Commonly used promoters for protein overproduction in E. coli
Promoter
Origin
Repression
Induction
Plac
E. coli
LacI
Para Ptet φ1-φ10
E. coli E. coli, Tn10 phage T7 of T7 RNA polymerase phage λ
AraR TetR repression/inactivation
isopropyl β-thiogalactoside (IPTG) L-arabinose tetracycline requires T7 RNA polymerase, IPTG heat inducible, temperature shift of 42◦ C
λPL
λ cI repressor
plasmid or by increasing the expression of the chromosomal lacI gene by an up mutation of its native promoter, called lacIq . Several derivatives, like the tac or trc promoter, have been constructed by fusion of Plac with other promoters, e.g. from the tryptophan operon. In expression systems based on the highly specific T7 promoters, the protein production is induced by the supply of T7 RNA polymerase [20]. In principal, the T7 RNA polymerase can be provided in several ways. A common method is the expression in specially designed strains carrying a chromosomal copy of T7 gene 1, encoding T7 RNA polymerase. This construct, termed “DE3”, represents a tran-scriptional fusion of gene 1 with the IPTG-inducible lac promoter. To ensure tight repression, gene 1 is accompanied by an extra copy of the lacIq gene. In order to eliminate any background of T7 RNA polymerase due to leaky expression, the plasmid-borne gene encoding for T7 lysozyme can be provided. T7 lysozyme efficiently binds and inactivates T7 RNA polymerase, and this expression system even allows the production and labeling of toxic proteins in E. coli. While the structural analysis of the native protein is predominantly the major task of a labeling approach, most samples were initially produced as translational fusions to other proteins or to artificial peptide-tags (Tab. 2). Those strategies can provide Table 2 Popular tags and carrier proteins for protein overproduction in fusion systems
Fusion
Effect on solubilitya
Purification
Poly(His)6 -tag Strep-tag Maltose-binding protein NusA Glutathione S-transferase Thioredoxin Protein G (B1 domain) λ head protein D Protein A (Z domain)
no effect no effect ++ ++ + + + + +
immobilized metal chelate chromatography affinity chromatography affinity chromatography
a
Effect on the solubility of the target protein.
affinity chromatography
affinity chromatography
2 Expression Systems for the In Vivo Incorporation of 13 C and 15 N Labels into Proteins
valuable advantages for the fast and efficient purification of the labeled protein or for its stabilization [21]. Furthermore, the addition of a suitable fusion partner to the N-terminus of the target protein can significantly enhance its production. Fusion proteins usually include a recognition site for a highly specific protease like enterokinase or the TEV protease in the linker between the two proteins. If necessary, the fusion partners could then be cleaved off from the target proteins after purification, leaving the nonmodified labeled protein for analysis. The most frequent peptide-tag used for an easy purification is a poly(His)6 -tag added to the N- or C-terminus of the protein. Such small tags usually do not disturb the conformation or activity of proteins and their removal after purification might not even be necessary. Proteins containing a poly(His)6 -tag can easily be purified from a crude cell extract by forming a chelate complex with metal ions like Ni2+ or Cu2+ immobilized on a chromatography column. The imidazole moieties of adjacent histidine residues are crucial for firm chelate formation and the bound proteins can later be eluted from the column with a competing imidazole gradient. The production of proteins with a poly(His)6 -tag has become a routine technique for the heterologous expression of recombinant genes in any host. Only approx. 25% of overproduced proteins are initially suitable or stable enough for structural analysis [22] and most experiments must be optimized in order to obtain sufficient yields of correctly folded proteins. General parameters of optimization are the selection of suitable expression hosts and vectors, variation of growth conditions and inducer concentrations, construction of fusion proteins, and coexpression of helper proteins. A frequently encountered problem upon heterologous expression in E. coli is the formation of inclusion bodies – large aggregates of unfolded inactive protein. Protein targeting as insoluble inclusion bodies could be a strategy to circumvent bactericidal activities of the overproduced protein [23]. Several protocols have been described to solubilize inclusion bodies after purification in strong denaturants and to refold denatured proteins into their correct 3-D conformation [24, 25]. However, for each protein the refolding strategy has to be optimized and might represent an intensive laborious trial-and-error procedure giving less than satisfactory results. Thus, it is often intended to maximize the production of proteins in completely soluble form. Fortunately, in many cases, the formation of inclusion bodies during expression can be prevented by C-terminal linkage of the target protein to a highly soluble fusion partner. Several carrier proteins are able to enhance the solubility of an otherwise insoluble proteins (Tab. 2); in systematic screens for proteins suitable to confer solubility on insoluble target proteins fused to their C-terminal ends, the E. coli maltose-binding protein MalE [26] and NusA [27] proved to be the most efficient. As large carrier proteins have to be removed through specific proteases prior to NMR analysis, the cleavage of the fusion proteins can reintroduce problems with solubility and the released target proteins sometimes tend to precipitate soon after they are no longer covalently attached to a carrier. In addition, the use of large carrier proteins would waste a considerable amount of the precious label precursors. Smaller carrier proteins like the phage λ head protein D [28] or domains of Staphylococcus Protein A [29] or Streptococcus Protein G [30] might therefore be used as noncleavable tags to enhance the stability
5
6 13C- and 15N-Isotopic Labeling of Proteins Table 3 Modifications of E. coli expression systems
Problems/requirements
Modifications
Disulfide bridge formation
expression in trx- and gor - strains, providing an oxidizing environment in the cytoplasm coexpression of disulfide isomerases like DsbA, DsbC or PDI periplasmic expression using signal peptides coexpression of chaperones like GroELS and/or DnaKJ/GrpE coexpression of corresponding tRNAs
Chaperone-dependent folding Rare codon usage
of smaller proteins or peptides. Fusions with the B1 domain of Protein G even improved NMR spectra of labeled proteins [30]. As well as enhancing solubility, carrier proteins can be used to facilitate the production of small peptides in E. coli and to stabilize them against degradation [23]. Misfolding can be a particular problem with the expression of eukaryotic proteins in E. coli, especially if they have several disulfide bonds, consist of multiple subunits or contain prosthetic groups. A popular strategy to address such problems is the coexpression of accessory helper proteins (Tab. 3). If folding of the overproduced protein depends on chaperones, the increase of chaperone concentrations by coexpression of GroEL/ES, DnaKJ/GrpE or other chaperone systems could be highly advantageous. Low production rates due to translational pausing or misincorporation of amino acids as a result of rare codon usage of the overproduced protein could be optimized by the coexpression of corresponding tRNAs. A frequent problem, which in many cases still cannot be completely solved when using E. coli expression systems, is the correct formation of disulfide bonds in proteins. Strategies for the optimization of disulfide bond formation in vivo include the export of overproduced proteins into the oxidizing periplasm by addition of appropriate export signal sequences, expression in specially designed strains lacking thioredoxin and glutathione reductase and thus providing an oxidizing cellular environment [31], and coexpression of disulfide isomerases like the DsbA and DsbC proteins of E. coli or the eukaryotic chaperone protein disulfide isomerase (PDI). 2.2 13 C- and 15 N-labeling of Proteins in P. pastoris
Yeast cells combine many advantages of prokaryotic and eukaryotic expression systems. They are small, robust and easy to handle. The growth rate of yeast cells is much faster compared with other eukaryotic cells and the fermentation of yeast cells is usually finished within a couple of days. Furthermore, yeasts do not require complex supplements in the growth media and can be propagated in simple inexpensive media or on agar plates like E. coli cells. Their excellent growth on defined minimal media make them especially attractive for the production of labeled proteins for NMR analysis. In addition, due to the secretion capabilities of P. pastoris, recombinant proteins might be easily purified in a single step [32]. Many proteins
2 Expression Systems for the In Vivo Incorporation of 13 C and 15 N Labels into Proteins
of eukaryotic origin need to undergo specific posttranslational modifications like glycosylation or lipidation, disulfide bridge formation, or posttranslational processing. The modifications are often essential for the stability or enzymatic activity of proteins [33], or for the adoption of their native conformation. The failure to form correct disulfide bridges can retard or even block the folding process of proteins, resulting in the formation of inclusion bodies or in their degradation. The production of modified proteins cannot be addressed in a satisfactory manner by using prokaryotic hosts like E. coli. Expression in yeast cells like the methylotrophic P. pastoris may therefore be the method of choice when searching for a fast-growing host with the potential for the production of modified and glycosylated proteins [34, 35]. However, the fact that the protein modification pattern obtained after expression in yeast might differ from that usually present in the native protein has to be considered, e.g. yeasts are only capable of attaching mannose-rich glycans, representing the core polysaccharide of glycoproteins. Labeling approaches should take into account that heterologous protein production in P. pastoris is correlated with a high cell density. Up to 100-fold increased yields can be obtained by fermentation of the cells in bioreactors as compared to shake flask cultures [36–38]. The induction of heterologous protein production should be started after reaching high cell densities. Yields of more than 100 mg protein/l culture medium have been reported [34, 38]. In shake flasks, the reported yields of uniformly labeled heterologous proteins vary from 2 mg/l in case of the Vaccinia virus complement control protein [39] to up to 27 mg/l for tick anticoagulant protein [36]. The average protein yield in yeast using shake flasks therefore only reaches the lower limit of comparable protein production in E. coli. It is furthermore important to realize that if compared with protein production in E. coli, much higher amounts of labeled isotopes are required to obtain a reasonable growth of the yeast cells, and at least 5- to 10-fold higher concentrations of (15 NH4 )2 SO4 and [13 C]glucose are routinely used when growing the cells in shake flasks. To produce 90 mg of a 13 C-labeled fragment of thrombomodulin in P. pastoris, 143 g [13 C]glycerol or 162 g [13 C]glucose had to be used by growing the cells in a 1.25-l fermentor [34]. Therefore, based on economic aspects, labeling approaches with yeasts are only competitive in cases when the proteins cannot be produced in E. coli. In defined minimal medium, the sole nitrogen source for P. pastoris usually is NH4 OH, which in addition serves as a base to neutralize considerable amounts of acid produced by the yeast during growth [34]. However, due to economic reasons, NH4 OH has to be replaced by (15 NH4 )2 SO4 for 15 N-labeling of proteins in P. pastoris. The required high cell density demands the addition of high amounts of (15 NH4 )2 SO4 and more than 40-folds concentration as compared with fermentations of E. coli had to be used. To avoid any negative effects on cell growth arising from the increased ionic strength of the medium through the formation of K2 SO4 , the (15 NH4 )2 SO4 had to be added in several portions at different times [38], and optionally also in combination with an exchange of medium before induction of protein expression [34]. For 13 C-labeling of proteins using the strong AOX1 promoter, two usually carbon sources are required during fermentation – glycerol or glucose to obtain a high
7
8 13C- and 15N-Isotopic Labeling of Proteins
cell density and methanol for the induction of the AOX1 promoter. Although P. pastoris is able to grow on methanol as the sole carbon source, the slow doubling time would prevent high cell densities and requires the use of alternate carbon sources like glycerol or glucose. While glucose could certainly help to increase the cell density, it represses the strong AOX1 promoter, leading to a decrease in heterologous protein production. However, [13 C]glucose could be used in labeling experiments as an initial carbon source until the desired high cell density is reached. The cells may then be further fermented with [13 C]glycerol as the carbon source and induced with [13 C]methanol [34]. Up to 30% of the carbon from the labeled protein originates from methanol and therefore it is crucial to induce the protein production with [13 C]-methanol in order to obtain fully labeled proteins, especially when using a yeast strain with a mut+ genotype [34]. 2.3 13 C- and 15 N-labeling of Proteins by Expression in Chinese Hamster Ovary (CHO) Cells
Compared to yeast cells, CHO cells represent the alternative of a more advanced system enabling the generation of labeled samples containing all of the possible modifications specific to eukaryotic proteins. Like with yeast cells, recombinant proteins can be efficiently secreted into the medium and protein production rates of more than 100 mg/l from CHO cells have been reported [40]. However, mammalian cells usually require amino acids, vitamins, cofactors and in many cases a complex serum as essential additives to the medium. To reduce costs, CHO cells could also be grown by replacing the highly expensive isotopically labeled amino acids with labeled bacterial hydrolyzates or algal extracts and sufficient yields of recombinant protein highly enriched with isotopes can be obtained [10]. To avoid dilution of the isotope label with unlabeled amino acids from the serum, a dialysis step should be included to remove low-molecular-weight compounds from the serum [10, 41]. A further important and probably cost-intensive factor is the requirement of relatively high concentrations of labeled glutamine, which is essential for the metabolic pathways of several amino acids and nucleic acid components. Although CHO cells represent an expensive host for the generation of labeled proteins, they might be the best choice for the overexpression of complex glycoproteins or of proteins that cannot be produced in E. coli [41–43]. 2.4 13 C- and 15 N-labeling of Proteins in Other Organisms
A further option to produce labeled glycosylated proteins can be the overproduction in the slime mould Dictyostelium discoideum. This host is able to feed on bacterial cells like E. coli that could then be pre-enriched with isotopic labels using standard protocols. Suitable expression vectors for D. discoideum are available and a glycosylated 16-kDa protein has already been labeled with 13 C/15 N to a high extent [44]. As the growth rate of D. discoideum is rather low, the complete experiment took more
2 Expression Systems for the In Vivo Incorporation of 13 C and 15 N Labels into Proteins
than 10 days. However, if compared to labeling approaches using yeast cells as a host, lower amounts of labeled precursors are needed to obtain sufficient protein for NMR measurements, and D. discoideum might therefore be considered as an economic alternative to produce labeled glycoproteins. While the previously discussed expression systems require relatively expensive mixtures of labeled precursors, the photoautotrophic cyanobacterium Anabaena sp. can be grown in defined media containing Na15 NO3 and NaH13 CO3 as the sole nitrogen and carbon sources. A 24-kDa domain of the E. coli gyrase B subunit was successfully overproduced in Anabaena sp. by using a specially designed shuttle vector system and more than 90% 13 C/15 N label incorporation was obtained [45]. The yield of the protein (approximately 3-6 mg/l) was found to be comparable to that obtained in E. coli. It is noteworthy that the expressed gene was under control of the E. coli tac promoter, which is also functional in Anabaena. While 15 N-labeling in Anabaena can be carried out with standard growth protocols, 13 C-labeling is problematic under aerobic conditions because of the ability of endogenous CO2 fixation, resulting in incomplete 13 C-labeling of less than 30% [45]. However, 13 Cenrichment of more than 90% could be obtained after growing the Anabaena cells with nitrogen gas aeration under controlled anaerobic conditions. The fermentation of Anabaena requires illumination and the duration is about 5 times longer compared to E. coli because of the slower doubling time. However, as less expensive labeled precursors can be used, the cost could be reduced to only about 10%. Heterologous protein production in Anabaena might therefore be an option for proteins with a low production rate for which higher volumes of medium have to be used. In addition, it has to be considered that the non-protonated nature of the final carbon supply might also be advantageous and cost-effective for the generation of perdeuterated protein samples necessary for the structural analysis of large proteins or dynamic studies. Although heterologous protein production is the most common approach, small peptide products have also been labeled in their natural host. A crucial prerequisite is that the organisms can be adapted to grow on defined media supplemented with suitable labeled precursors. Examples of successful approaches are the labeling of cyclosporin A in Tolypocladium inflatum [46], alamethicin in Trichoderma viride [47], bellenamine in Streptomyces nashvillensis [48] and cyanophycin in Synechocystis sp. [49]. 2.5 Strategies for the Production of Selectively 13 C- and 15 N-labeled Proteins
NMR spectra of uniformly labeled proteins become increasingly complex with the increasing size of the proteins. Several methods have been established to selectively label only specific domains, amino acids or even single nuclei of the protein in order to simplify the spectral analysis. Using in vivo labeling techniques, this can be done by feeding the host cells with specifically designed label precursors, which are obtained in most cases by chemical synthesis. The resulting spectral simplification facilitates the unambiguous assignment of resonances in large proteins. Choosing
9
10 13C- and 15N-Isotopic Labeling of Proteins
the correct labeling strategy can therefore be crucial to accelerate the assignment of resonances in proteins. 2.5.1 Selective Labeling of Amino Acids In principal, any amino acid residue in an overproduced protein can be specifically labeled by providing excessive amounts of the amino acid containing the desired label in the growth medium. Resonances of single amino acids can be easily identified by comparison with spectra of a protein with other labeled amino acids and such approaches could considerably accelerate the resonance assignments of large proteins. However, the high costs for purified labeled amino acids prevent the application of this technique for routine use of structure determinations. The selective labeling strategy is suited for the dynamic and functional analysis of selected amino acid residues e.g. during ligand interactions. Specific labeling is also useful in the analysis of protein denaturation. Unfolded proteins generally have a low dispersion of resonances, but the tracking of a limited number of labeled residues, ideally well distributed throughout the protein, could enable the analysis of the unfolding process. In a previous study, selective labeling with [15 N]isoleucine was used to detect folding intermediates of a complex protein [50]. As an alternatively to specific labeling of amino acids, in reverse isotope labeling an excess of one or several nonlabeled amino acids is added to a growth medium in combination with commonly used general labeling compounds like 15 NH4 Cl or [13 C]glucose. An interesting application is the reverse labeling of aromatic residues like phenylalanine or tyrosine as their side-chains are frequently involved in ligandbinding interfaces or positioned in hydrophobic cores of proteins, making distance restraints for the environment of these residues highly valuable [51, 52]. The reverse isotope-labeling approach clearly depends on the complexity of the analyzed protein and is generally useful for smaller proteins containing only few of the analyzed amino acid residues. The application of amino acid-specific or selective isotope-labeling strategies is limited by scrambling effects of the isotope label to other types of residues. Specific auxotrophic mutants of E. coli can be used for the overproduction and specific labeling of recombinant proteins in order to minimize this problem [53]. 2.5.2 Specific Isotope Labeling with 13 C Amino acids labeled at selected carbon positions can be added to defined growth media for their incorporation into the recombinant protein. These partially labeled compounds were usually generated by chemical synthesis and introduced to optimize the spectral resolution of large proteins in specific NMR experiments. Aromatic side-chain protons may be difficult to assign if the number of aromatic residues increases. Phenylalanine residues 13 C-labeled only at the ε position could thus help to rapidly assign the aromatic ring protons [54]. Furthermore, fully 13 Clabeled proteins have undesirable 13 Cα –13 Cβ scalar couplings. Incorporation of amino acids labeled in the 13 C backbone overcome this problem [55]. Fractional 13 C-labeling of amino acids can be achieved by growing the protein-producing cells with a defined mixture of 13 C- and 12 C-labeled carbon sources [56]. The analysis
2 Expression Systems for the In Vivo Incorporation of 13 C and 15 N Labels into Proteins
of internal side-chain dynamics within proteins in uniformly 13 C-labeled proteins is also complicated due to interference effects between different contributing relaxation interactions as well as by contributions from 13 C–13 C scalar and dipolar couplings. To solve this problem, proteins can be synthesized with an alternating 12 C–13 C–12 C-labeling pattern by an elaborate isotope labeling procedure using either [2-13 C]glycerol or [1,3-13 C2 ]glycerol as the sole carbon source [57]. Further carbon-selective labeling strategies are highly valuable in combination with selective protonation to analyze the structure of large proteins [52]. 2.5.3 Segmental Isotope Labeling Despite advances in heteronuclear multidimensional NMR techniques, the increased complexity of spectra due to a lack of resolution and increased overlap of signals having similar chemical shifts still limits the structural analysis of large proteins. However, a promising approach to further extend the actual size limit of the structural evaluation of proteins by NMR spectroscopy is the segmental- or block-labeling technique. In principal, partially labeled full-length proteins can be produced by the posttranslational in vitro ligation of nonlabeled domains together with a labeled domain obtained in different expression experiments. Resonances of single domains of larger proteins may be sequentially assigned and the complete structure of the protein could be obtained by a combinatorial approach, dividing a large target protein into parts of manageable size. In contrast to a structural elucidation of isolated protein domains, one should bear in mind that the structure of labeled domains with the segmental-labeling technique is analyzed in the context of the complete protein, thus providing information of the structural and functional interaction between domains while preserving the overall structural character of the protein. The ligation process is catalyzed by self-splicing enzymes – the inteins [58]. Inteins are insertion sequences cleaved off after translation through self-excision, leaving the flanking protein regions, the exteins, joined together through native peptide bond formation. Depending on the nature of the intein and on the solubility of the target domains, several modified protocols of the segmental labeling strategy have been established. Ĺ The intein is cut in the middle of the sequence within a flexible loop region and the two fragments are expressed separately as fusions to the target domains. The isolated denatured proteins are mixed and the two intein fragments efficiently associate during refolding, resulting in the ligation of the two target domains [59, 60]. This strategy is especially applicable if the intein fusions are expressed as insoluble inclusion bodies and if an efficient protocol for refolding is available. Using different inteins at each splice junction, even proteins containing central labeled domains can be generated [61]. Ĺ The chemical ligation of folded proteins under native conditions uses an ethyl αthioester at the C-terminal end of a recombinant protein, generated by cleavage of an intein fusion, for the ligation with a second protein or peptide containing a cysteine residue at its N-terminal end [62]. Both ligation partners can be
11
12 13C- and 15N-Isotopic Labeling of Proteins
combined in a correctly folded conformation at a physiological pH and this strategy principally could be extended to link more than two domains together. Ĺ The mini intein from the dnaE gene of the cyanobacterium Synechocystis sp. comprises less than 200 amino acid residues, and is divided into N- and Cterminal fragments that are expressed separately. If combined after purification, the two fragments efficiently interact under native conditions to catalyze the splicing and ligation reaction of covalently attached proteins [63, 64]. Target domains to be ligated have to be expressed as fusion N- and C-terminal to the intein fragments. Although in vitro ligation yields as high as 90% have been obtained, several potential problems should be considered when starting a segmental labeling approach. While most enzymatically important residues are located within the inteins, an efficient excision and ligation mechanism also requires some residues from the extein, and the reaction is therefore not completely independent of the sequence of newly formed protein junctions [64]. The C-terminal extein requires a cysteine, threonine or serine residue at its N-terminal end. This sequence requirement must not affect the structure or activity of the analyzed protein. Refolding and ligation conditions must be optimized for each individual protein, and the splicing/ligation junction should be located in a loop region ensuring high flexibility. In principal, block labeling might apply best to proteins composed of independently folded domains having distinct biochemical properties.
3 Cell-free Isotope Labeling
Cell-free protein synthesis is an attractive and promising alternative to the conventional technologies for protein production using bacterial or eukaryotic cell cultures. In contrast to in vivo gene expression methods where protein synthesis is carried out in the cellular context surrounded by cell walls and membranes, cell-free protein synthesis provides a completely open system. This allows direct control and access to the reaction at any time. Compounds to improve protein production or to stabilize recombinant proteins, e.g. chaperones, chemicals, detergents or protease inhibitors, can easily be added without considering side-effects of the cellular metabolism or transport problems through the cell membrane. Most cellular functions with the exception of transcription and translation need not be maintained during cell-free protein expression. In principal, applications can therefore be extended to proteins or conditions that would not be tolerated by living organisms. While in vivo protein production is often limited by the formation of insoluble inclusion bodies or protein instability caused by intracellular proteases, the cell-free system offers new possibilities for the synthesis of complex proteins (Tab. 4). Protein folding and stability can be promoted by direct addition of chaperones, PDIs or necessary cofactors. In contrast, overexpression of chaperones in vivo can lead to cell filamentation or other undesirable phenotypes that can be detrimental for the viability of E. coli cells and protein expression [65]. Reaction parameters such as pH, redox
3 Cell-free Isotope Labeling 13 Table 4 Potential advantages and disadvantages of cell-free protein synthesis
Potential advantages
Disadvantages
Completely open system (easy access) → customization of reaction conditions to synthesized protein Incorporation of labeled, glycoslylated, modified or unnatural amino acids Direct translation of PCR products → high-throughput screening Production of toxic proteins Production of membrane bound proteins
Often low productivity
Expression of proteins requiring cofactors Easy addition of chaperones or PDI Miniaturization (e.g. 50 µl reactions) Working without living organisms → no growth restrictions Allowing a direct isolation of products to shorten time required for preparing purified proteins
Complex system compared to in vivo protein synthesis Expensive reaction compared to in vivo expression Small number of commercial systems and kits Difficult standardization because of multitude of reaction components High costs
potential and ionic strength can be determined without concern for harmful effects on the growth and viability of cells, and with the certainty that these parameters will directly influence the relevant reactions. This new opportunity of in vitro gene expression allows full control and high flexibility of conditions, and offers new potentials for difficulties associated with cytotoxicity, proteolytic degradation or improper folding and aggregation of synthesized proteins. The production of cytotoxic proteins [66], membrane proteins [67, 68] as well as the production of functional antibodies using PDI and chaperones [69] or functional viruses [70] had been reported. Most of all, cell-free protein synthesis offers completely new possibilities for the incorporation of labeled [71–74], glycosylated [74, 75], modified or unnatural amino acids [77]. Another promising potential for cell-free synthesis can be found in its suitability for high-throughput expression of proteins by direct translation of linear polymerase chain reaction (PCR) products [78]. Current limitations of cellfree systems are connected with their high complexity, high costs and often low productivity. 3.1 Components of Cell-free Expression Systems
In cell-free protein production, all components involved in gene expression and protein synthesis have to be added to the reaction mixture (Fig. 1). Components like DNA, nucleotide triphosphates (NTPs), messenger RNA (mRNA), transfer RNA (tRNA), aminoacyl-tRNA synthetases (ARSases), polymerases, ribosomes, transcription and translation factors like initiation factors (IFs), elongation factors (EFs) or release factors (RFs) and amino acids have to be optimized with regard to their concentration, and optimal salt and pH environment. The process of transcription/
14 13C- and 15N-Isotopic Labeling of Proteins
Fig. 1 Schematic to illustrate cell-free protein synthesis, showing the coupled process of transcription and translation in a bacterial system. Initially, cells are grown, lysed and the cell extract is prepared containing ribosomes, translation factors like initiation factors (IFs), elongation factors (EFs) and release factors (RFs), acetate kinase and aminoacyl-tRNA synthetases (ARSases). Substrates like amino acids, the energy-regenerating system components or nucleoside triphosphates (NTPs) and salts are then added to the extract, and protein synthesis is initiated by adding the template DNA. The DNA is transcribed by
an added RNA polymerase. Added tRNA is loaded with amino acids by ARSases and they are used in the translation of mRNA. The incorporation of some stable isotope-labeled amino acids (dark) can easily be done in the cell-free system, leading to a selective isotope-labeled protein. Regeneration of ATP and GTP and even the NTPs in the cell-free system is achieved by an ATP-regenerating energy system based on the hydrolysis of high-energy substrates in the presence of their kinases. Chaperones can easily be added to the reaction mixture to assist the folding of the target protein.
translation requires large amounts of free energy supplied by the hydrolysis of the triphosphates ATP and GTP. Therefore, in vitro protein synthesis requires an ATPregenerating energy system to maintain the triphosphate concentration. For this purpose, high-energy phosphate donors such as acetyl phosphate (AcP), creatine phosphate (CrP) or phosphoenol pyruvate (PEP) in the presence of their kinases (acetate kinase, creatine kinase and pyruvate kinase) have been used.
3 Cell-free Isotope Labeling 15
A crucial component is a cell extract based on crude cell lysate which contains the necessary reaction components such as IFs, EFs, RFs, ARSases, tRNA and enzymes for energy regeneration like acetate kinase. Cell-free expression systems are classified according to the origin of their extract. In principle, functional in vitro systems can be prepared from any cell type, but many factors contribute to the protein production efficiency. The most common in vitro reactions are based on extracts made from E. coli, wheat germ or rabbit reticulocyte lysates (Tab. 5). E. coli extracts consist of the so-called S30 supernatant fraction, named after the soluble fraction when centrifuged at 30,000 g, containing endogenous ribosomes, enzymes like acetate kinase and factors necessary for transcription and translation, ARSases, tRNA and mRNA. Endogenous mRNA is removed from the ribosomes during preincubation of the crude cell extract in a “run-off” step and destroyed by endogenous ribonucleases [79]. Another way to deploy bacterial extracts is described in the Gold–Schweiger system [80, 81]. Ribosomes are added to the supernatant fraction of a S100 extract especially purified from endogenous amino acids and nucleic acids by ion-exchange chromatography. This system provides a very low
Table 5 Bacterial and eukaryotic cell-free expression systems
Bacterial systems
Eukaryotic systems
Escherichia coli
Wheat germ
Rabbit reticulocyte
S30 extract preparation
S100 extract preparation
Wheat germ extract
Rabbit reticulocyte lysates
Supernatant fraction at 30,000 g centrifugation of E. coli extract preincubated to detract endogenous DNA and mRNA Contains: RNA polymerase, ribosomes, tRNAs, ARSases, translation factors 24-38◦ C (optimum at 37◦ C)
Supernatant fraction at 100,000 g centrifugation deprived of all nucleic acids, e.g. by DEAE cellulose treatment Contains: RNA polymerases, ARSases and translation factors
Directly used for expression of endogenous or exogenous templates
Treated with micrococcal Ca2+ dependent RNase
Contains: Ribosomes, tRNAs, ARSases, translation factors
Contains: Ribosomes, tRNAs, ARSases, translation factors
20-27◦ C up to 32◦ C
30-38◦ C
1
Ribosomes, tRNAs added 24-38◦ C (optimum at 37◦ C)
Reaction temperature. References. To be added: DNA (plasmid with appropriate promotor for SP6, T7 RNA-polymerase) + polymerase (bacteriophage SP6 or T7 RNA polymerase) or mRNA. Amino acids. Energy components: ATP and GTP. NTP-regeneration system: PEP + PK or CP + CrP or AcP. Formyltetrahydrofolate or its congener, e.g. methenyltetrahydrofolate or folinic acid. SH-compound: mercaptoethanol or dithiothreitol. Mg2+ and K+ in optimal concentrations. In some cases: Ca2+ and NH4 + in optimal concentrations. cAMP. Polyamines, e.g. spermidine. 2
16 13C- and 15N-Isotopic Labeling of Proteins
background due to endogenous synthesis and better-controlled conditions at the expense of more complicated preparation. The E. coli system functions well in a temperature range of 24-38◦ C with an optimum at 37◦ C [79, 82, 83]. Wheat germ extract possesses a low level of endogenously expressed messengers and therefore can be directly used for expression of endogenous [84] or exogenous templates. In the wheat germ system, the optimum temperature is in the range 20-27◦ C [85, 86], but can be increased to up to 32◦ C for higher expression of some templates [87]. Reticulocyte extract is prepared by directly lysing blood cells of anemic rabbits; this increases the number of proerythrocytes or reticulocytes that are subsequently treated with micrococcal Ca2+ -dependent RNase to remove endogenous mRNA [88]. This system works in a temperature range of 30-38◦ C [89]. With regard to the reaction temperature, it should be noted that apart from any effect on the enzymatic process of transcription/translation and mRNA degradation, the temperature affects the folding of the synthesized protein. To assist the folding of proteins, chaperones like the GroEL/ES system [90] or DnaK and DnaJ [69] can be added to the reaction mixture. Usually, lower temperatures will lead to higher yields of recombinant protein in the absence of chaperones, but not necessarily in their presence [91]. Furthermore, to assist disulfide bond formation, PDI can be added to the transcription/translation mixture [69]. Less common translation systems based on yeast extracts [92], mammalian cells like human HeLa and mouse L-cells [93] or on tobacco chloroplasts, which strongly depend on exogenously added mRNA [94], have high levels of degradation and relatively low protein yields. With regard to the cell-free labeling of proteins, only E. coli and wheat germ extracts have been used. Using a very elaborate approach, Shimizu et al. developed a cell-free translation system reconstructed from purified poly(His)-tagged translation factors [95]. Their system, termed the “protein synthesis using recombinant elements” (PURE) system, contains 32 individually purified components with high specific activity, allowing efficient protein production. An advantage of the PURE system, apart from the absence of inhibitory substances such as nucleases, proteases and enzymes that hydrolyze nucleoside triphosphates, is the simple purification of the synthesized protein by removing tagged protein factors by affinity chromatography. One limitation of the cell-free system is the degradation of exogenous added mRNA. Various RNase activities present in cell extracts usually restrict the lifetime of mRNA, and subsequently the efficiency of protein synthesis is inhibited. This problem could be solved by the periodical reintroduction of messengers into the reaction mixture in a simple translation system [96]. Coupled transcription–translation systems, where mRNA is continuously synthesized from DNA templates added to the reaction mixture, can be advantageous to translation systems containing presynthesized messengers. Direct transcription in the reaction mixture may be executed from appropriate promoters by endogenous E. coli RNA polymerase, or by exogenous phage RNA polymerase in bacterial systems [97] or in eukaryotic systems [82, 98]. To avoid rapid messenger degradation, especially in E. coli cell-free systems, partially ribonuclease-depleted extracts or RNase inhibitors are used. A good choice of template DNA in the prokaryotic system is circular plasmid DNA. In the wheat germ system with lower nuclease activities, both
3 Cell-free Isotope Labeling 17
plasmid DNAs [99] and linear PCR fragments function well [100–102]. The translational efficiency of mRNA depends on its structural features. Most cDNA sequences can be sufficiently well expressed without addition of translational enhancers. The sequence of interest only needs to be provided with a favorable Kozak sequence in eukaryotic cell-free systems and the Shine–Dalgarno sequence in prokaryotic cell-free systems, which are the respective sequences upstream of the ATG codon responsible for translation initiation. A further problem in cell-free protein synthesis is the high consumption of biochemical energy provided by ATP and GTP. Creatine phosphate concomitant with creatine kinase is usually used for ATP and GTP regeneration in eukaryotic cell-free systems, whereas the combination of PEP and pyruvate kinase, acetyl phosphate and acetate kinase or a combination of both has been applied for bacterial in vitro protein synthesis. The acetyl phosphate energy system may have the advantage that the ATP level is maintained twice as long as in the presence of PEP. Since acetate kinase is present at sufficient levels in bacterial extracts, it does not need to be added exogenously in the E. coli system [103]. Studies of the biochemical energy levels in different cell-free systems observed a high rate of triphosphate hydrolysis to mono- and diphosphates during protein synthesis in wheat germ extracts [104] as well as in an E. coli S30 extract [105]. It is reported that more than 80% of ATP and GTP hydrolysis in the wheat germ system initially occurs independently of protein synthesis and it was suggested that acid phosphatases are responsible for the nonspecific hydrolysis of the nucleotide triphosphates [86, 104]. Recent extensive studies to increase the protein yield of cell-free reactions have focused on the composition of the reaction mixture, especially the amino acids composition [106, 107], and methods to prepare the cell extract. For example, endogenous phosphatases have been removed by using S30 extract prepared from the spheroplasts of E. coli [108] or by immunodepletion of the phosphatase [109]. Approaches to concentrate the extract components by ultrafiltration [110] or by dialysis [74] have also been reported. 3.2 Cell-free Expression Techniques
In cell-free reactions carried out in a “batch” mode, the reaction conditions change as a result of substrate consumption and the accumulation of products. Translation stops as soon as any essential substrate is exhausted, or any product or by-product reaches an inhibiting concentration. Actually, the bacterial cell-free systems are active only for 10-30 min at 37◦ C. Systems based on rabbit reticulocyte lysates or wheat germ extract are typically capable of working for up to 1 h. However, in vitro protein-synthesizing systems in batch mode work well for most analytical purposes, but short lifetimes and low productivities limit their application for the synthesis of preparative amounts of protein. The reasons for the low yield are degradation of mRNA, depletion of nucleotide triphosphates and accumulation of their hydrolyzates. Prolonged reaction times in cell-free expression were first achieved by Spirin et al. [111–113] by using a continuous-flow cell-free (CFCF) translation device (Fig. 2A). The basic idea is to continuously supply amino acids,
18 13C- and 15N-Isotopic Labeling of Proteins
Fig. 2 Illustration of cell-free reactors for continuous flow (A and B) and continuous exchange (C and D) cell-free expression systems. (A) Schematic drawing of a direct-flow CFCF reactor where substrates are supplied and products are removed by the flow of a feeding solution, forced by a pump. (B) Y-flow CFCF reactor, containing two ultrafiltration membranes of different pore size, separating protein product and low-molecular-weight waste outflows [123]. (C) CECF reactor design with explanation of feeding solution (light grey) and reaction mixture components (dark grey). Compartments
are separated by a semipermeable dialysis membrane, providing the diffusion exchange of substrates and low-molecular-weight products, and the retention of reaction mixture components. (D) Flow-exchange column reactor consisting of semipermeable mini-vesicles packed in a column, containing the reaction mixture surrounded by feeding solution, changed by flow through the column. Low-molecular-weight products and substrates are changed by diffusion through the mini-vesicle membrane [113]. Modified after Shirokov et al. [134].
3 Cell-free Isotope Labeling 19
energy-regenerating components (AcP, CrP or PEP) and NTPs in a feeding solution, and continuously remove small molecule byproducts (mainly products of triphosphate hydrolysis like inorganic phosphates and nucleoside monophosphates) by active (forced) flow of the feeding solution across an ultrafiltration membrane (molecular weight cut-off in the range of 10-300 kDa). In this case, all products, including the protein synthesized, are continuously removed from the reaction compartment if the pore size is large enough (Fig. 2A). Permanent stirring of the reaction mixture and in some set-ups upright flow of the feeding solution are applied to minimize membrane clogging [114]. The CFCF system can function for more than 20 hours and results in preparative protein expression of about 0.1-1 mg protein/ml reaction volume or higher. The template for this system can either be mRNA [113, 115], DNA transcribed by endogenous bacterial RNA polymerase or added phage RNA polymerase [82, 113, 114], or self-replicating RNA [116]. Using the CFCF system, proteins like bacteriophage MS2 coat protein [111], brome mosaic virus coat protein [111], calcitonin polypeptide [113], globin [117], functionally active dihydrofolate reductase (DHFR) [112, 113, 118, 119], chloram-phenicol acetyltransferase (CAT) [105, 113, 114, 116, 120, 121] interleukin (IL)-2 [122] and IL-6 [115] have successfully been synthesized. The important advantage of the CFCF system is the continuous removal of synthesized proteins from the reaction mixture, which can result in 80-85% purity of the protein product as previously demonstrated in the outflow of a bacterial CFCF system synthesizing DHFR [119] and in a wheat germ CFCF system synthesizing IL-6 [115]. The synthesized protein diffuses out of the CFCF reactor in a large volume of the effluent and is quite diluted. To reduce the dilution effect, the so-called Y-flow reactor with a split outflow has been proposed [123]. The Y-flow reactor (Fig. 2B) has two membranes with different pore sizes. Initially the low-molecular-weight products are removed through a small-pore membrane at a high rate and the synthesized protein is subsequently collected through a large-pore membrane at a low rate. Here, the flow of the protein product is controlled by a separate pump. Nevertheless, a number of laboratories attempting CFCF expression have had difficulties in establishing this complex system. The main problems are RNA degradation when using bacterial extracts, even in the coupled transcription/translation mode, and low efficiency of initiation complex formation, which might cause leakage and therefore loss of some translation components by ultrafiltration or the blockage of the ultrafiltration membrane [91]. However, there is an alternative way of carrying out prolonged protein synthesis, namely by using diffusion instead of pumping to supply substrates and remove lowmolecular-weight products (Fig. 2C). This set-up, using a reaction mixture separated from a feeding solution by applying a dialysis membrane, is called a continuousexchange cell-free system (CECF) [124] or a semicontinuous-flow cell-free system (SFCF) [121]. The simplest device for the continuous supply of substrates and the removal of low-molecular-weight products by passive exchange with a feeding mixture is a dialysis bag [124]. In practice, simple homemade dialysis bags or R R standard commercial dialyzers, such as the MicroDialyzer and DispoDialyzer from Spectrum, can be used successfully. The pore size of the reactor membranes
20 13C- and 15N-Isotopic Labeling of Proteins
is usually in the range of 10–50 kDa. However, better performance of reactors with a larger pore membrane has been reported [74]. Stirring of either the feeding solution or both, the feeding and reaction mixture, is necessary for efficient supply of substrates. Using this approach, Kim and Choi [121] reported the synthesis of 1.2 mg/ml of CAT over 14 h in E. coli S30 extract. The product yield, quantified by ELISA, exceeded the yield of the analogous batch reaction by 10–12 times. More recently, Kigawa et al. [74] succeeded in synthesizing CAT and Ras proteins in amounts up to 6 mg/ml over 18 h in their version of the bacterial CECF system, and Madin et al. [96] reached yields of 1-4 mg/ml for several functionally active proteins like DHFR, green fluorescent protein (GFP), luciferase and RNA replicase of tobacco mosaic virus (TMV). Figure 3 describes the cell-free synthesis of GFP in a CECF R method using a MicroDialyzer . The advantage of the CECF system is the accumulation of the synthesized product in the reaction mixture. The synthesis of proteins and polypeptides fused with GFP provide a direct and demonstrative way to visualize product accumulation by fluorescence of the GFP moiety. The synthesis of a HIV protein, the so-called Nef antigen, fused with GFP [125] and an antibacterial polypeptide Cecropin P1 fused
Fig. 3 Cell-free synthesis of GFP in a bacterial CECF system using a 100-:l MicroDia R lyzer . The kinetic points display the mean of three reactions and error bars indicating the deviation of the mean. The reaction was carried out at 30◦ C using a 17-fold amount of feeding mixture compared to the reaction mixture volume and under the following reaction conditions: 0.05% NaN3 , 2% phosphoenolglycol, 196 :M folinic acid, 2 mM 1,4-dithiothreitol, 1.2 mM ATP, 0.8 mM each of CTP, GTP and UTP, 20 mM PEP, 20
mM acetyl phosphate, 1 mM of each amino acid, except the amino acids arginine, aspartic acid, cysteine, glutamic acid, methionine and tryptophan, for which 2 mM was used, 100 mM HEPES-KOH (pH 8.0), 2.8 mM EDTA, 1 × complete protease inhibitor (Roche), 280 mM K+ , 13 mM Mg2+ , 40 :g/ml pyruvate kinase, 500 :g/ml tRNA (E. coli), 3 U/:l T7-RNA polymerase, 0.3 U/:l RNasin, 35% S30 extract (E. coli) and 15 :g/ml plasmid.
3 Cell-free Isotope Labeling 21
with GFP [126], both in the bacterial CECF T7 transcription/translation system, have been reported. Reactors combining both exchange and flow have also been developed. In one version, the reaction mixture is encapsulated into polysaccharide minivesicles that can be packed into a column, where feeding solution is passed through the column (Fig. 2D). In this case, product–substrate exchange across the vesicle walls takes place during the flow [113]. More sophisticated versions of the CECF reactor are being developed in order to meet the demands of scientists and biotechnologists. The first commercial CECF reactor has recently been launched on the market by Roche Diagnostics. The Roche CECF system promises a high protein yield of more than 2 mg/ml. Other commercial systems focus on the batch mode, where recent improvements have been made. For example, a novel NTP regeneration system, avoiding accumulation of inorganic phosphate by adding pyruvate oxidase, which generates AcP from pyruvate and inorganic phosphate directly in the reaction mixture, has been proposed by Kim and Swartz [127]. Later they showed that addition of oxalate, a potent inhibitor of PEP synthetase, substantially increases the yield of CAT synthesis through the enhanced supply of ATP by about 47% [128]. Furthermore, Kim and Swartz developed a “fed-batch” mode where coordinated addition of PEP, magnesium, and the amino acids arginine, cysteine and tryptophan resulted in a final concentration of cell-free synthesized CAT that was more than 4fold compared to a batch reaction [107]. As a result of these improvements it became possible to synthesize about 350 µg/ml CAT [107] or 450 µg/ml recombinant DNA human protein thrombopoietin [106] in a batch reaction.
3.3 Specific Applications of the Cell-free Labeling Technique
Conventional in vivo labeling techniques are often accompanied by low protein yields due to retarded growth in a minimal medium and, in the case of selective isotope labeling, by scrambling effects that drastically reduce the efficiency and selectivity of labeling. A major advantage of cell-free labeling techniques therefore is the absence of any scrambling effects or metabolic conversion of labeled amino acids [71, 73, 129]. The selective incorporation of isotope-labeled amino acids in cell-free synthesized proteins for NMR research has already been demonstrated by several groups [71, 73, 74, 129–131]. Similarly, in the reverse isotope-labeling technique, most amino acids are single or dual (15 N, 13 C) labeled, except for a few residues [74]. Figure 4(A) illustrates the 2-D 1 H–15 N heteronuclear single-quantum correlation spectroscopy (HSQC) NMR spectra of the cell-free reverse isotopelabeled C-terminal DNA-binding domain of the transcriptional response regulator RcsB (cRcsB) which has been uniformly 15 N-labeled except for asparagine (N) and glutamine (Q) residues. The uniformly 15 N-labeled cRcsB expressed in vivo is shown in Fig. 4(B). The 2-D 1 H–15 N HSQC spectrum of the labeled cRcsB protein synthesized in vitro is consistent with that of the uniformly labeled protein synthesized in vivo (only the asparagine and glutamine peaks are missing).
22 13C- and 15N-Isotopic Labeling of Proteins
Fig. 4 Comparison of the 2-D 1 H -15 N HSQC spectra of the in vitro and in vivo generated purified C-terminal domain of the bacterial transcriptional regulator RcsB (cRcsB) using a Bruker DRX-800 MHz NMR spectrometer with a cryoprobe. (A) Cellfree 15 N-reverse isotopic-labeled cRcsB synthesized in an optimized CECF system R using a DispoDialyzer . All amino acids except asparagine (N) and glutamine (Q)
were 15 N-labeled. As expected, the N and Q residues (circles) are absent in the 2-D 1 H -15 N HSQC spectrum. (B) Uniformly 15 N isotopic-labeled cRcsB expressed in vivo. This spectrum is in good agreement with that of the reverse cell-free isotopically labeled cRcsB, and N and Q residues can easily be identified. (NMR spectra kindly ¨ provided by Frank Lohr.).
3 Cell-free Isotope Labeling 23
Cell-free expression is highly suited for the generation of protein samples labeled only at distinct residues. Site-specific labeling is extremely useful to simplify the observation and resonance assignment procedures for specified amino acid residues of particular interest, and can also be used for analyzing local structures of large proteins and protein–protein interactions. Ellman et al. demonstrated for the first time the incorporation of a particular 13 C-labeled residue at a suppressible termination codon by translating the modified sequence in an in vitro system supplemented with the charged suppressor tRNA. Subsequently it became possible to track the labeled residues upon denaturation and refolding of the protein by NMR spectroscopy [130]. Milligram quantities of site-specific isotope labeled protein can be obtained in a cell-free system involving the amber suppression strategy [73]. The E. coli amber suppressor tRNA can be prepared by in vitro transcription with T7 RNA polymerase and later aminoacylated with the appropriate purified E. coli amino ARSase and the cognate labeled amino acid. The codon for the selected amino acid residue in the protein was previously changed into an amber codon by standard techniques. Segmental labeling in vivo is limited by specific requirements like organization of the target protein into domains, presence of specific residues and folding problems. Pavlov et al. suggested a cell-free technique based on in vitro translation of matrixcoupled mRNAs, which principally is devoid of any sequence and conformational requirements [72]. The size of the labeled region is controlled by codon usage and no intrinsic upper limit to the size of proteins that can be isotope-labeled in selected regions exists. This method is based on the usage of translation mixtures depleted of either one amino acid and/or its tRNA and/or its amino ARSase and consists of three steps. Using column-coupled template RNA the reaction mixture can easily be exchanged. Initially, the unlabeled N-terminal region, using an unlabeled reaction mixture, is synthesized up to the first codon without a matching amino acyl-tRNA in the extract. Here the ribosomes pause and the translation mixture can be exchanged against a mix containing isotope-labeled amino acids, now deficient for a different amino acid, tRNA or amino ARSase. Translation resumes, thereby labeling the region until the ribosomes encounter the next codon without the corresponding tRNA. In the last step, the C-terminal part is synthesized without any label and the protein is released from the ribosome. However, the technique results in low protein yields, as it can, at the very best, produce protein stoichiometric to the immobilized mRNA. A very promising advantage of cell-free isotope labeling is the possibility of in situ NMR measurement as described by Guignard et al. They showed NMR analysis of in vitro-synthesized proteins without any chromatographic purification and with minimal sample handling using an optimized CECF reaction combined with the sensitivity of a cryoprobe [131]. As expected, they observed no cross peak for any excess 15 N amino acid and only the target protein was enriched with the isotopelabeled residues. They suggest a new possibility for inexpensive high-throughput protein analysis applicable in large-scale proteomics, where selectively labeled proteins can be expressed in 0.5 ml of reaction medium using small quantities of labeled amino acids and analyzed by NMR. All steps from the expression to the completed NMR spectra were done in less than 24 h.
24 13C- and 15N-Isotopic Labeling of Proteins
Due to the exceptional advantages of cell-free protein synthesis, isotope labeling can be achieved for proteins that are normally difficult to express. Using cell-free labeling techniques, it might become possible to synthesize and isotope label proteins that are toxic, require cofactors or chaperones for adopting an active conformation, or even membrane proteins. Subunit labeling of oligomers, further research on disulfide bridge formation or protein oxidation should be possible in the near future.
References 1 D¨OTSCH V., WAGNER G. 1998. New approaches to structure determination by NMR spectroscopy. Curr Opin Struct Biol 8, 619–23. 2 PERVUSHIN K., RIEK R., WIDER G., W¨UTHRICH K. 1997. Attenuated T2 relaxation by mutual cancellation of dipole–dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proc Natl Acad Sci USA 94, 12366–71. 3 TJANDRA N., BAX A. 1997. Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science 278, 1111–4. 4 CLORE G. M., GRONENBORN A. M.. 1991. Structures of larger proteins in solution: three- and four-dimensional heteronuclear NMR spectroscopy. Science 252, 1390–9. 5 BAX A. 1994. Multidimensional nuclear-magnetic-resonance methods for protein studies. Curr Opin Struct Biol 4, 738–44. 6 CLORE G. M., GRONENBORN A. M.. 1997. NMR structures of proteins and protein complexes beyond 20,000 Mr. Nat Struct Biol 4, 849–53. 7 KAY L. E., GARDNER K. H.. 1997. Solution NMR spectroscopy beyond 25 kDa. Curr Opin Struct Biol 7, 722–31. 8 GARDNER K. H., KAY L. E.. 1998. The use of 2 H, 13 C, 15 N multidimensional NMR to study the structure and dynamics of proteins. Annu Rev Biophys Biomol Struct 27, 357–406. 9 WIDER G., W¨uthrich K. 1999. NMR spectroscopy of large molecules and
10
11
12
13
14
15
16
multimolecular assemblies in solution. Curr Opin Struct Biol 9, 594–601. HANSEN A. P., PETROS A. M., MAZAR A. P., PEDERSON T. M., RUETER A., FESIK S. W.. 1992. A practical method for uniform isotopic labeling of recombinant proteins in mammalian cells. Biochemistry 31, 12713–8 SAILER M., HELMS G. L., HENKEL T., NIEMCZURA W. P., STILES M. E., VEDERAS J. C.. 1993. 15 N- and 13 C-labeled media from Anabaena sp. for universal isotopic labeling of bacteriocins: NMR resonance assignments of leucocin A from Leuconostoc gelidum and Nisin A from Lactococcus lactis. Biochemistry 32, 310–8. REILLY D., FAIRBROTHER W. J.. 1994. A novel isotope labeling protocol for bacterially expressed proteins. J Biomol NMR 4, 459–62. BRANDNER G., PHALIP V., KELLENBERGER E., ZHANG C. C.. 1999. Optimization of a cyanobacterial culture for production of isotope-labeled recombinant proteins. Biotechnol Tech 13, 177–80. CAI M., HUANG Y., SAKAGUCHI K., CLOE G. M., GRONENBORN A. M., CRAIGIE R. 1998. An efficient and cost-effective isotope labeling protocol for proteins expressed in Escherichia coli. J Biomol NMR 11, 97–102. MARLEY J., LU M., BRACKEN C. 2001. A method for efficient isotopic labeling of recombinant proteins. J Biomol NMR 20, 71–5. MUCHMORE D. C., MCINTOSH L. P., RUSSEL C. B., ANDERSON D. E., DAHLQUIST F. W.. 1989. Expression and nitrogen-15 labeling of proteins for proton and nitrogen-15 nuclear
References
17
18
19
20
21
22
23
24
25
magnetic resonance. Methods Enzymol 177, 44–73. IKURA M., KAY L. E., BAX A. 1990. A novel approach for sequential assignment of 1 H, 13 C, and 15 N spectra of proteins: heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin. Biochemistry 29, 4659–67. VENTERS R. A., CALDERONE T. L., SPICER L. D., FIERKE C. A.. 1991. Uniform 13 C isotope labeling of proteins with sodium acetate for NMR studies: application to human carbonic anhydrase II. Biochemistry 30, 4491–4. SAMBROOK J., FRITSCH E. F., MANIATIS T. 1989. Molecular Cloning: A Laboratory Manual, 2nd edn, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. STUDIER F. W., ROSENBERG A. H., DUNN J. J., DUBENDORF J. W.. 1990. Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol 185, 60–89. NILSSON J., STAHL S., LUNDEBERG J., UHLEN M., NYGREN P. A.. 1997. Affinity fusion strategies for detection, purification and immobilization of recombinant proteins. Protein Express Purif 11, 1–16. CHRISTENDAT D., YEE A., DHARAMSI A., KLUGER Y., SAVCHENKO A., CORT J. R., BOOTH V., MACKERETH C. D., SARIDAKIS V., EKIEL I., KOZLOV G., MAXWELL K. L., WU N., MCINTOSH L. P., GEHRING K., KENNEDY M. A., DAVIDSON A. R., PAI E. F., GERSTEIN M., EDWARDS A. M., ARROWSMITH C. H.. 2000. Structural proteomics of an archaeon. Nat Struct Biol 7, 903–9. MAJERLE A., KIDRIC J., JERALA R. 2000. Production of stable isotope enriched antimicrobial peptides in Escherichia coli: an application to the production of a 15 N-enriched fragment of lactoferrin. J Biomol NMR 18, 145–51. De BERNARDEZ C. E.. 1998. Refolding of recombinant proteins. Curr Opin Biotechnol 9, 157–63. LILIE H., SCHARZ E., RUDOLPH R. 1998. Advances in refolding of proteins produced in E coli. Curr Opin Biotechnol 9, 497–501.
26 KAPUST R. B., WAUGH D. S.. 1999. Escherichia coli maltose binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused. Protein Sci 8, 1668–74. 27 DAVIS, D. D., ELISEE C., NEWHAM D. M., HARRISON R. G.. 1999. New fusion protein systems designed to give soluble expression in Escherichia coli. Biotechnol Bioeng 65, 382–8. 28 FORRER P., JAUSSI R. 1998. High-level expression of soluble heterologous proteins in the cytoplasm of Escherichia coli by fusion to the bacteriophage Lambda head protein D. Gene 224, 45–52. 29 JANSSON M., LI Y. C., JENDEBERG L., ANDERSON S., MONTELIONE B. T., NILSSON B. 1996. High-level production of uniformly 15 N- and 13 C-enriched fusion proteins in Escherichia coli. J Biomol NMR 7, 131–41. 30 ZHOU P., LUGOVSKOY A. A., WAGNER G. 2001. A solubility-enhancement tag (SET) for NMR studies of poorly behaving proteins. J Biomol NMR 20, 11–4. 31 BESSETTE P. H., ASLUND F., BECKWITH J., GEORGIOU G. 1999. Efficient folding of proteins with multiple disulfide bonds in the Escherichia coli cytoplasm. Proc Natl Acad Sci USA 96, 13703–8. 32 De LAMOTTE F., BOZE H., BLANCHARD C., KLEIN C., MOULIN G., GAUTIER M. F., DELSUC M. A.. 2001. NMR onitoring of accumulation and folding of 15 N-labeled protein overexpressed in Pichia pastoris. Protein Express Purif 22, 318–24. 33 WYSS D. F., WAGNER G. 1996. The structural role of sugars in glycoproteins. Curr Opin Biotechnol 7, 409–16. 34 WOOD M. J., KOMIVES E. A.. 1999. Production of large quantities of isotopically labeled protein in Pichia pastoris by fermentation. J Biomol NMR 13, 149–59. 35 CEREGHINO J. L., CREGG J. M.. 2000. Heterologous protein expression in the methylotrophic yeast Pichia pastoris. FEMS Microbiol Rev 24, 45–66. 36 LAROCHE Y., STORME V., De MEUTTER J., MESSENS J., LAUWEREYS M. 1994. High-level secretion and very efficient
25
26 13C- and 15N-Isotopic Labeling of Proteins
37
38
39
40
41
42
43
isotopic labeling of tick anticoagulant peptide (TAP) expressed in the methylotrophic yeast Pichia pastoris. Biotechnology 12, 1119–24. WHITE C. E., HUNTER M. J., MEININGER D. P., WHITE L. R., KOMIVES E. A.. 1995. Large-scale expression, purification and characterization of small fragments of thrombomodulin: the roles of the sixth domain and of methionine 388. Protein Eng 8, 1177–87. Van den BURG H. A., de WIT P. J. G. M., VERVOORT J. 2001. Efficient 13 C/15 N double labeling of the avirulence protein AVR4 in a methanol-utilizing strain (Mut+ .) of Pichia pastoris. J Biomol NMR 20, 251–61. WILES A. P., SHAW G., BRIGHT J., PERCZEL A., CAMPELL I. D., BARLOW P. N.. 1997. NMR studies of a viral protein that mimics the regulation of complement activation. J Mol Biol 272, 253–65. HENSLEY P., MCDECITT P. J., BROOKS I., TRILL J. J., FEILD J. A., MCNULTY D. E., CONNOR J. R., GRISWOLD D. E., KUMAR N. V., KOPPLE K. D.. 1994. The soluble form of E-selectin is an asymmetric monomer. Expression, purification, and characterization of the recombinant protein. J Biol Chem 269, 23949–58. LUSTBADER J. W., BIRKEN S., POLLAK S., POUND A., CHAIT B. T., MIRZA U. A., RAMNARAIN S., CANFIELD R. E., BROWN J. M.. 1996. Expression of human chorionic gonadotropin uniformly labeled with NMR isotopes in Chinese hamster ovary cells: an advance toward rapid determination of glycoprotein structures. J Biomol NMR 7, 295–304. ARCHER S. J., BAX A., ROBERTS A. B., SPORN M. B., OGAWA Y., PIEZ K. A., WEATHERBEE J. A., TSANG M. L., LUCAS R., ZHENG B., WENKER J., TORCHIA D. A.. 1993. Transforming growth factor β1, NMR signal assignments of the recombinant protein expressed and isotopically enriched using Chinese hamster ovary cells. Biochemistry 32, 1152–63. WYSS D. F., DAYIE K. T., WAGNER G. 1997. The counterreceptor binding site of human CD2 exhibits an extended surface patch with multiple
44
45
46
47
48
49
50
51
conformations fluctuating with millisecond to microsecond motions. Protein Sci 6, 534–42. CUBBEDU L., MOSS C. X., SWARBRICK J. D., GOOLEY A. A., WILLIAMS K. L., CURMI P. M. G., SLADE M. B., MABBUTT B. C.. 2000. Dictyostelium discoideum as expression host: isotopic labeling of a recombinant glycoprotein for NMR studies. Protein Express Purif 19, 335–42. DESPLANCQ D., KIEFFER B., SCHMIDT K., POSTEN C., FORSTER A., OUDET P., STRUB J. M., Van DORSSELAER A., WEISS E. 2001. Cost-effective and uniform 13 C- and 15 N-labeling of the 24-kDa N-terminal domain of the Escherichia coli gyrase B by overexpression in the photoautotrophic cyanobacterium Anabaena sp. PCC 7120. Protein Express Purif 23, 207–17. SENN H., WEBER C., KOBEL H., TRABER R. 1991. Selective 13 C-labelling of cyclosporin A. Eur J Biochem 199, 653–8. YEE A. A., O’NEIL J. D.. 1992. Uniform 15 N labeling of a fungal peptide: the structure and dynamics of an alamethicin by 15 N and 1 H NMR spectroscopy. Biochemistry 31, 3135–43. IKEDA Y., NAGANAWA H., KONDO S., TAKEUCHI T. 1992. Biosynthesis of bellenamine by Streptomyces nashvillensis using stable isotope labeled compounds. J Antibiot (Tokyo) 45, 1919–24. SUAREZ C., KOHLER S. J., ALLEN M. M., KOLODNY N. H.. 1999. NMR study of the metabolic 15 N isotopic enrichment of cyanophycin synthesized by the cyanobacterium Synechocystis sp. strain PCC 6308. Biochim Biophys Acta 1426, 429–38. BIEKOFSKY R. R., MARTIN S. R., MCCORMICK J. E., MASINO L., FEFEU S., BAYLEY P. M., FEENEY J. 2002. Thermal stability of calmodulin and mutants studied by 1 H–15 N HSQC NMR measurements of selectively labeled [15 N]Ile proteins. Biochemistry 41, 6850–9. VUISTER G. W., KIM S. J., WU C., BAX A. 1994. 2D and 3D NMR-study of phenylalanine residues in proteins by reverse isotope labeling. J Am Chem Soc 116, 9206–10.
References 52 GOTO N. K., KAY L. E.. 2000. New developments in isotope labeling strategies for protein solution NMR spectroscopy. Curr Opin Struct Biol 10, 585–92. 53 WAUGH D. S.. 1996. Genetic tools for selective labeling of proteins with alpha-15 N-amino acids. J Biomol NMR 8, 184–92. 54 WANG H., JANOWICK D. A., SCHKERYANTZ J. M., LIU X., FESIK S. W.. 1999. A method for assigning phenylalanines in proteins. J Am Chem Soc 121, 1611–2. 55 COUGHLIN P. E., ANDERSON F. E., OLIVER E. J., BROWN J. M., HOMANS S. W., POLLAK S., LUSTBADER J. W.. 1999. Improved resolution and sensitivity of triple-resonance NMR methods for the structural analysis of proteins by use of a backbone-labeling strategy. J Am Chem Soc 121, 11871–4. 56 ATREYA H. S., CHARY K. V. R.. 2001. Selective “unlabeling” of amino acids in fractionally 13 C labeled proteins: an approach for stereospecific NMR assignments of CH3 groups in Val and Leu residues. J Biomol NMR 19, 267–72. 57 LEMASTER D. M., KUSHLAN D. M.. 1996. Dynamical mapping of E. coli thioredoxin via 13 C NMR relaxation analysis. J Am Chem Soc 118, 9255–64. 58 PERLER F. B.. 1998. Protein splicing of inteins and hedgehog autoproteolysis: structure, function, and evolution. Cell 92, 1–4. 59 YAMAZAKI T., OTOMO T., ODA N., KYOGOKU Y., UEGAKI K., ITO N., ISHINO Y., NAKAMURA H. 1998. Segmental isotope labeling for protein NMR using peptide splicing. J Am Chem Soc 120, 5591–2. 60 OTOMO T., TERUYA K., UEGAKI K., YAMAZAKI T., KYOGOKU Y. 1999. Improved segmental isotope labeling of proteins and application to a larger protein. J Biomol NMR 14, 105–14. 61 OTOMO T., ITO N., KYOGOKU Y., YAMAZAKI T. 1999. NMR observation of selected segments in a larger protein: central-segment isotope labeling through intein-mediated ligation. Biochemistry 38, 16040–4.
62 XU R., AYERS B., COWBURN D., MUIR T. W.. 1999. Chemical ligation of folded recombinant proteins: segmental isotopic labeling of domains for NMR studies. Proc Natl Acad Sci USA 96, 388–93. 63 WU H., HU Z., LIU X. Q.. 1998. Protein transsplicing by a split intein encoded in a split DnaE gene of Synechocystis sp. PCC6803. Proc Natl Acad Sci USA 95, 9226–31. 64 EVANS T. C., MARTIN D., KOLLY R., PANNE D., SUN L., GHOSH I., CHEN L., BENNER J., LIU X. Q., XU M. Q.. 2000. Protein transsplicing and cyclization by a naturally split intein from the dnaE gene of Synechocystis species PCC6803. J Biol Chem 275, 9091–4. 65 BLUM P., ORY J., BAUERNFEIND J., KRSKA J. 1994. Physiological consequences of DnaK and DnaJ overproduction in Escherichia coli. J Bacteriol 174, 7436–44. 66 STIEGE W., ERDMANN V. A.. 1995. The potentials of the in vitro biosynthesis system. J Biotechnol 41, 81–90. 67 FALK M. M., BUEHLER L. K., KUMAR N. M., GILULA N. B.. 1997. Cell-free synthesis and assembly of connexins into functional gap junction membrane channels. EMBO J 16, 2703–16. 68 HUPPA J. B., PLOEGH H. L.. 1997. In vitro translation and assembly of a complete T cell receptor-CD3 complex. J Exp Med 186, 393–403. 69 RYABOVA L. A., DESPLANCQ D., SPIRIN A. S., PLUCKTHUN A. 1997. Functional antibody production using cell-free translation: effects of protein disulfide isomerase and chaperones. Nat Biotechnol 15, 79–84. 70 MOLLA A., PAUL A. V., WIMMER E. 1991. Cell-free, de novo synthesis of poliovirus. Science 254, 1647–51. 71 KIGAWA T., MUTO Y., YOKOYAMA S. 1995. Cell-free synthesis and amino acid-selective stable isotope labelling of proteins for NMR analysis. Biomol NMR 2, 129–34. 72 PAVLOV M. Y., FREISTROFFER D. V., EHRENBERG M. 1997. Synthesis of region-labelled proteins for NMR studies by in vitro translation of column-coupled mRNAs. Biochimie 79, 415–22.
27
28 13C- and 15N-Isotopic Labeling of Proteins
73 YABUKI T., KIGAWA T., DOHMAE N., TAKIO K., TERADA T., ITO Y., LAUE E. D., COOPER J. A., KAINOSHO M., YOKOYAMA S. 1998. Dual amino acid-selective and site-directed stable-isotope labelling of the human c-Ha-Ras protein by cell-free synthesis. J Biomol NMR 11, 295–306. 74 KIGAWA T., YABUKI T., YOSHIDA Y., TSUTSUI M., ITO Y., SHIBATA T., YOKOYAMA S. 1999. Cell-free production and stable-isotope labelling of milligram quantities of proteins. FEBS Lett 442, 15–9. 75 JOSEPH S. K., BOEHNING D., PIERSON S., NICCHITTA C. V.. 1997. Membrane insertion, glycosylation, and oligomerization of inositol triphosphate receptors in a cell-free translation system. J Biol Chem 272, 1579–588. 76 DUSZENKO M., KANG X., BOHNE U., HOMKE R., LEHNER M. 1999. In vitro translation in a cell-free system from Trypanosoma brucei yields glycosylated and glycosylphosphatidyl-inositol-anchored proteins. Eur J Biochem 266, 789–97. 77 PARK Y., LUO J., SCHULTZ P. G., KIRSCH J. F.. 1997. Noncoded amino acid replacement probes of the aspartate aminotransferase mechanism. Biochemistry 36, 10517–25. 78 OHUCHI S., NAKANO H., YAMANE T. 1998. In vitro method for the generation of protein libraries using PCR amplification of a single DNA molecule and coupled transcription/translation. Nucleic Acids Res 26, 4339–346. 79 ZUBAY G. 1973. In vitro synthesis of protein in microbial systems. Annu Rev Genet 7, 267–87. 80 GOLD L. M., SCHWEIGER M. 1969. Synthesis of phage-specific x- and β-glucosyl transferases directed by T-even DNA in vitro. Proc Natl Acad Sci USA 62, 892–8. 81 GOLD L. M., SCHWEIGER M. 1971. Synthesis of bacteriophage-specific enzymes directed by DNA in vitro. Methods Enzymol 20, 537–42. 82 BARANOV V. I., SPIRIN A. S.. 1993. Gene expression in cell-free systems on preparative scale. Methods Enzymol 217, 123–42.
83 KIM D. M., KIGAWA T., CHOI C. Y., YOKOYAMA S. 1996. A highly efficient cell-free protein synthesis system from Escherichia coli. Eur J Biochem 239, 881–6. 84 ROBERTS B. E., PATERON B. M.. 1973. Efficient translation of tobacco mosaic virus RNA and rabbit globin 9S RNA in a cell-free system from commercial wheat germ. Proc Natl Acad Sci USA 70, 2330–4. 85 ANDERSON C. W., STRAUS J. W., DUDOCK B. S.. 1983. Preparation of a cell-free protein-synthesizing system from wheat germ. Methods Enzymol 101, 635–44. 86 KAWARASAKI Y., NAKANO H., YAMANE T. 1994. Prolonged cell-free protein synthesis in a batch system using wheat germ extract. Biosci Biotechnol Biochem 58, 1911–3. 87 TULIN E. E., KEN-ICHI T., SHIN-ICHIRO E. 1995. Continuously coupled transcription-translation system for the production of rice cytoplasmic aldolase. Biotechnol Bioeng 45, 511–6. 88 PELHAM H. R. B., JACKSON R. J.. 1976. An efficient mRNA-dependent translation system from reticulocyte lysates. Eur J Biochem 67, 247–56. 89 WOODWARD W. R., IVEY J. L., HERBERT E. 1974. Protein synthesis with rabbit reticulocyte preparations. Methods Enzymol 30, 724–31. 90 TSALKOVA T., ZARDENETA G., KUDLICKI W., KRAMER G., KORWITZ P. M., HARDESTY B. 1993. GroEL and GroES increase the specific enzymatic activity of newly-synthesized rhodanese if present during in vitro transcription/translation. Biochemistry 32, 3377–80. 91 JERMUTUS L., RYABOVA L. A., PLUCKTHUN A. 1998. Recent advantages in producing and selecting functional proteins by using cell-free translation. Curr Opin Biotechnol 9, 534–48. 92 IIZUKA N., SUGIURA M. 1997. Translation-component extract from Saccharomyces cerevisieae: effects of L-A RNA, 5 cap, 3 poly(A) tail on translational efficiency of mRNAs. Methods 11, 353–60. 93 TOLUNAY H. E., YANG L., KEMPER W. M., SAFER B., ANDERSON W. F.. 1984.
References
94
95
96
97
98
99
100
101
102
Homologous globin cell-free transcription system with comparison of heterologous factors. Mol Cell Biol 4, 17–22. HIROSE T., SUGIURA M. 1996. Cis-acting element and trans-acting factors for accurate translation of chloroplast psbA mRNAs: development of an in vitro translation system from tobacco chloroplasts. EMBO J 15, 1687–695. SHIMIZU Y., INOUE A., TOMARI Y., SUZUKI T., YOKOGAWA T., NISHIKAWA K., UEDA T. 2001. Cell-free translation reconstituted with purified components. Nat Biotechnol 19, 751–5. MADIN K., SAWASAKI T., OGASAWARA T., ENDO Y. 2000. A highly efficient and robust cell-free protein synthesis system prepared from wheat embryos: plants apparently contain a suicide system directed at ribosomes. Proc Natl Acad Sci USA 97, 559–64. CHEN H. Z., ZUBAY G. 1983. Prokaryotic coupled transcription–translation. Methods Enzymol 101, 674–90. CRAIG D., HOWELL M. T., GIBBS C. L., HUNT T., JACKSON R. J.. 1992. Plasmid cDNA-directed synthesis in a coupled eukaryotic in vitro transcription–translation system. Nucleic Acids Res 20, 4987–995. NEVIN D. E., PRATT J. M.. 1991. A coupled in vitro transcription–translation system for the exclusive synthesis of polypeptides from the T7 promotor. FEBS Lett 291, 259–63. LESLEY S. A., BROW M. A., BURGESS P. R.. 1991. Use of in vitro protein synthesis from polymerase chain reaction-generated templates to study interaction of Escherichia coli transcription factors with core RNA polymerase and for epitope mapping of monoclonal antibodies. J Biol Chem 266, 2632–8. BURKS E. A., CHEN G., GEORGIOU G., IVERSON B. L.. 1997. In vitro scanning saturation mutagenesis of an antibody binding pocket. Proc Natl Acad Sci USA 94, 412–7. MARTEMYANOV K. A., SPIRIN A. S., GUDKOV A. T.. 1997. Direct expression of
103
104
105
106
107
108
109
110
111
112
PCR products in a cell-free translation/transcription system: synthesis of antibacterial peptide cecropin. FEBS Lett 414, 268–70. RYABOVA L. A., VINOKUROV L. M., SHEKHOVTSOVA E. A., ALAKHOV Y. B., SPIRIN A. S.. 1995. Acetyl phosphate as an energy source for bacterial cell-free translation systems. Anal Biochem 226, 184–6. YAO S. L., SHEN X. C., SUZUKI E. 1997. Biochemical energy consumption by wheat germ extract during cell-free protein synthesis. J Ferment Bioeng 84, 7–13. KITAOKA Y., NISHIMURA N., NIWANO M. 1996. Cooperativity of stabilized mRNA and enhanced translational activity in the cell-free system. J Biotechnol 48, 1–8. PATNAIK R., SWARTZ J. R.. 1998. E. coli-based in vitro transcription/translation: in vivo-specific synthesis rates and high yields in a batch system. Bio-Techniques 24, 862–8. KIM D. M., SWARTZ J. R.. 2000. Prolonging cell-free protein synthesis by selective reagent additions. Biotech Prog 16, 385–90. KANG S. H., OH T. J., KIM R. G., KANG T. J., HWANG S. H., LEE E. Y., CHOI C. Y.. 2000. An efficient cell-free protein synthesis system using periplasmic phosphatase-removed S30 extract. J Microbiol Methods 43, 91–6. KAWARASAKI Y., NAKANO H., YAMANE Y. 1998. Phosphatase-immunodepleted cell-free protein synthesis system. J Biotechnol 61, 199–208. NAKANO H., TANAKA T., KAWARASAKI Y., YAMANE T. 1994. An increased rate of cell-free protein synthesis by condensing wheat germ extract with ultrafiltration membranes. Biosci Biotechnol Biochem 58, 631–4. SPIRIN A. S., BARANOV V. I., RYABOVA L. A., OVODOV S. Y., ALAKHOV Y. B.. 1988. A continuous cell-free translation system capable of producing polypeptides in high yield. Science 242, 1162–4. BARANOV V. I., MOROZOV I. Y., ORTLEPP S. A., SPIRIN A. S.. 1989. Gene expression in a cell-free system on the preparative scale. Gene 84, 463–6.
29
30 13C- and 15N-Isotopic Labeling of Proteins
113 SPIRIN A. S.. 1991. Cell-free protein synthesis bioreactor. In: TODD P., SIKDAR S. K., BEER M., eds, Frontiers in Bioprocessing II, 31–43. American Chemical Society, Washington, DC. 114 KIGAWA T., YOKOYAMA S. 1991. A continuous cell-free protein synthesis system for coupled transcription–translation. J Biochem 110, 166–8. 115 VOLYANIK E. V., DALLEY A., MC KAY I. A., LEIGH I., WILLIAMS N. S., BUSTIN S. A.. 1993. Synthesis of preparative amounts of biologically active interleukin-6 using a continuous-flow cell-free translation system. Anal Biochem 214, 289–94. 116 RYABOVA L., VOLYANIK E., KURNASOV P., SPIRIN A. S., WU Y., KRAMER F. R.. 1994. Coupled replication–translation of amplifiable messenger RNA: A cell-free protein synthesis system that mimics viral infection. J Biol Chem 269, 1501–5. 117 RYABOVA L. A., ORTLEB S. A., BARANOV V. I.. 1989. Preparative synthesis of globin in a continuous cell-free translation system from rabbit reticulocytes. Nucleic Acids Res 17, 4412. 118 ENDO Y., OTSUZUKI S., ITO K., MIURA K. 1992. Production of an enzymatic active protein using a continuous flow cell-free system. J Biotech 25, 221–30. 119 KUDLICKI W., KRAMER G., HARDESTY B. 1992. High efficiency cell-free synthesis of proteins: refinement of the coupled transcription/translation system. Anal Biochem 206, 389–93. 120 RYABOVA L. A., MOROZOV I. Y., SPIRIN A. S.. 1998. Continuous-flow cell-free translation, transcription–translation, and replication–translation system. I. Methods Mol Biol 77, 179–93. 121 KIM D. M., CHOI C. Y.. 1996. A semi-continuous prokaryotic coupled transcription/translation system using a dialysis membrane. Biotechnol Prog 12, 645–9. 122 KOLOSOV M. I., KOLOSOVA I. M., ALOKHOV V. Y., OVODOV S. Y., ALOKHOV Y. B.. 1992. Preparative in vitro synthesis of bioactive human interleukin-2 in a continuous flow translation system. Biotech Appl Biochem 16, 125–33.
123 BIRYUKOV S. V., SIMONENKO P. N., SHIROKOV V. A., MAJOROV S. G., SPIRIN A. S.. 1999. Method of preparing polypeptides in cell-free system and devise for its realization. PCT WO 00 50436. 124 ALAKOV Y. B., BARANOV V. I., OVODOV S. J., RYABOVA L. A., SPIRIN A. S., MOROZOV I. J.. 1995. Method of preparing polypeptides in cell-free translation system. US Patent 5478730. 125 CHEKULAYEVA M. N., KURNASOV O. V., SHIROKOV V. A., SPIRIN A. S.. 2001. Continuous-exchange cell-free protein-synthesizing system: synthesis of HIV-1 antigen Nef. Biochem Biophys Res Commun 280, 914–7. 126 MARTEMYANOV K. A., SHIROKOV V. A., KURNASOV O. V., GUDKOV A. T., SPIRIN A. S.. 2001. Cell-free production of biologically active polypeptides: application to the synthesis of antibacterial polypeptide cecropin. Protein Expr Purif 21, 456–61. 127 KIM D. M., SWARTZ J. R.. 1999. Prolonging cell-free protein synthesis with a novel ATP regeneration system. Biotech Bioeng 66, 180–8. 128 KIM D. M., SWARTZ J. R.. 2000. Oxalate improves protein synthesis by enhancing ATP supply in a cell-free system derived from Escherichia coli. Biotechnol Lett 22, 1537–42. 129 BETTON J. M., PALIBRODA N., NAMANE A., METZLER T., BˆARZU O.. 2002. Selective labelling of proteins in the RTS 500 System. In SPIRIN A. S., ed., Cell-free Translation Systems, 219–5, Springer, Berlin. 130 ELLMAN J. A., VOLKMAN B. F., MENDEL B., SCHULTZ P. G., WEMMER D. E.. 1992. Site-specific isotopic labeling of proteins for NMR studies. J Am Chem Soc 141, 7959–961. 131 GUIGNARD L., OZAWA K., PURSGLOVE S. E., OTTING G., DIXON N. E.. 2002. NMR analysis of in vitro-synthesized proteins without purification: a high-throughput approach. FEBS Lett 524, 159–62. 132 LEDERMAN M., ZUBAY G. 1967. DNA-directed peptide synthesis in vitro. I. A comparison of T2 and E. coli
References DNA- directed peptide synthesis. Biochem Biophys Acta 149, 253–8. 133 De VRIES J. K., ZUBAY G. 1967. DNA-directed peptide synthesis. II. The synthesis of the α-fragment of the enzyme β-galactosidase. Proc Natl Acad Sci USA 57, 1010–2.
134 SHIROKOV V., SIMONENKO P. N., BIRYUKOV S. V., SPIRIN A. S.. 2002. Continuous-flow and continuous-exchange cell-free translation systems and reactors. In SPIRIN A. S., ed., Cell-free Translation Systems, 91–108, Springer, Berlin.
31
1
Studying Protein Folding and Aggregation by Laser Light Scattering Klaus Gast, and Andreas J. Modler Universit¨at Potsdam, Golm, Germany
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The term laser light scattering is used here for recent developments in both static light scattering (SLS) and dynamic light scattering (DLS). In principle, DLS can be performed with conventional light sources [1], however, laser light is mandatory in practice, and recent progress in laser technology has greatly improved the capability of both DLS and SLS. Static or “classical” light scattering, which attains molecular parameters from the angular and concentration dependence of the time-averaged scattering intensity, was one of the main tools for the determination of molecular mass in the middle of the last century. However by the 1970s the method had somewhat lost its significance when it became possible to measure the molecular mass of proteins and other small biological macromolecules with high precision by electrospray ionization (ESI) or matrix-assisted laser desorption/ionization (MALDI) mass spectrometry. Furthermore these alternative methods required less material and less careful preparation of the samples. SLS is now restoring its popularity due mostly to significant technical improvements. Well-calibrated instruments equipped with lasers and sensitive solid-state detectors allow precise measurements over a wide range of particle masses (103 –108 kDa) using sample volumes of the order of 10 µL. The coupling with separation devices (e.g., size exclusion or other chromatographic techniques and field-flow fractionation) has provided the biomolecular community with a powerful molecular analyzer, particularly for polydispersed materials. The advent of DLS in the early 1970s led to great expectations concerning the study of dynamic processes in solution. A readily measurable quantity by this procedure is the translational diffusion coefficient of macromolecules from which the hydrodynamic Stokes radius RS can be calculated. Up to the 1990s, however, the Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Studying Protein Folding and Aggregation by Laser Light Scattering
method appeared to be experimentally much more difficult than SLS. Laboratorybuilt or commercially available DLS photometers consisted of gas lasers, expensive, huge correlation electronics and subsequent data evaluation was tedious because of severe requirements on computation: powerful online computers were not generally available. The first “compact” devices (DLS-based particle sizers) were only applicable to strongly scattering systems. This situation has fundamentally changed during the last 10 years. A sophisticated single-board correlator, which can be easily plugged into a desktop computer, is essentially the only additional element needed to upgrade an SLS to a DLS instrument. As a consequence of these developments, laser light scattering can now be considered as a standard laboratory method. Several compact instruments are on the market, which are valuable tools for protein analysis. Nevertheless, for special applications, including also particular studies of protein folding and aggregation, a more flexible laboratory-built set-up assembled of commercially available components is preferred. We would like to stress that laser light scattering investigations are entirely non-invasive and can be done under any solvent conditions. However, the measurements must be done with solutions that are free of undesired large particles and bubbles. Particular care has to be taken with proteins that absorb light in the visible region. It is worth mentioning that SLS measurements can also be performed using conventional fluorescence spectrometers including stopped-flow fluorescence devices in the case of kinetic experiments. Important information in the case of protein folding studies can be obtained especially from DLS, which yields the molecular dimensions of proteins in terms of the Stokes radius. However, the combined application of SLS and DLS is very useful for distinguishing changes in the dimensions of individual protein molecules due to molecular folding or association. In this review we will first deal with the measurement of hydrodynamic dimensions, intermolecular interactions, and the state of association of proteins in differently folded states. The application of kinetic laser light scattering to the study of folding kinetics is also discussed. A special section is dedicated to the study of protein aggregation accompanying misfolding. Useful protocols for some standard applications of light scattering to protein folding studies are listed in Section 6.
2 Basic Principles of Laser Light Scattering 2.1 Light Scattering by Macromolecular Solutions
Much of the experimental methodology, skilful practice, and basic theoretical approaches for static light scattering had already been developed in the first half of the last century. The birth of dynamic light scattering can be dated to the middle
2 Basic Principles of Laser Light Scattering
of the 1960s, and the next decade yielded much of the fundamental groundwork on the principles and practice of DLS from researchers in the fields of physics, macromolecular biophysics, biochemistry, and biology. This short introduction to both methods is an attempt to supply fellow researchers who are not familiar with these techniques with an intuitive understanding of the basic principles and the applicability to protein folding and aggregation problems. We will therefore only recall some of the more fundamental equations. A more detailed understanding can be achieved on the basis of numerous excellent textbooks (see, for example, Refs. [2–7]). The theory of light scattering can be approached from the so-called single-particle analysis or density fluctuation viewpoint. The single-particle analysis approach, which will be used here, is simpler to visualize and is adequate for studying the structure and dynamics of macromolecules in diluted solutions. The fluctuation approach is appropriate for studying light scattering in liquids. Furthermore, it is useful to distinguish light scattering from small particles and molecules (of d < λ/20) and large particles, whose maximum dimension, d, are comparable to or larger than the incident wavelength λ (the wavelength of blue-green light is about 500 nm). The former case is easier to handle, but much information about the structure is not accessible. Proteins used for folding studies entirely fulfill the conditions of the first case, provided we exclude large protein complexes or aggregates. The concepts of light scattering from small particles in vacuum can be applied to the scattering of macromolecules in solution by considering the excess of different quantities (e.g., light scattering, polarizibility, refractive index) of macromolecules over that of the solvent. The strength of the scattering effect primarily depends on the polarizibility α of a small scattering element, which might be the molecule itself if it is sufficiently small. Larger molecules are considered as consisting of several scattering elements. The oscillating electric field vector → E 0 (t, → r )= → E 0 ·e −1(ωt−→k0 ·→r ) of the incident light beam induces a small oscillating dipole with the dipole moment → p(t)=α· → E (t)·ω is the angular frequency and → k0 is the wave vector with the magnitude | → k0 |=2π·n0 /λ. This oscillating dipole re-emits electromagnetic radiation, which has the same wavelength in the case of elastic scattering. The intensity of scattered light at distance r is Is =
4π 4 n40 α 2 . .I0 λ4 r 2
(1)
where λ is the wavelength in vacuum, n0 is the refractive index of the surrounding medium and I0 =| → E 0 |2 is the intensity of the incident vertically polarized laser beam. Equation (1) differs only by a constant factor for the case of unpolarized light. In general, the scattered light is detected at an angle with respect to the incident beam in the plane perpendicular to the polarization of the beam (Figure 1). The wave vector pointing in this direction is → ks where → k0 and → ks have the same magnitude.
3
4 Studying Protein Folding and Aggregation by Laser Light Scattering
Fig. 1 Schematic diagram of the optical part of an light scattering instrument.
Of particular importance is the vector difference → q = → k0 − → k S , the socalled scattering vector, which determines the spatial distribution of the phases φ1 = → q · → r i of the scattered light wave emitted by individual scattering elements i (Figure 1). The magnitude of → q is q =(4π·n/λ)sin(θ/2). The phases play an essential role in the total instantaneous intensity, which results from the superposition of light waves emitted by all scattering elements within the scattering volume v defined by the primary beam and the aperture of the detector. The instantaneous intensity fluctuates in time for nonfixed particles, like macromolecules in solution, due to phase fluctuations φi (t)= → q · → r i (t) caused by changes in their location → r i (t). SLS measures the time average of the intensity, thus the term “time-averaged light scattering” would be more appropriate, although “static light scattering” appears the accepted convention now. SLS becomes q-dependent for large particles when light waves emitted from scattering elements within a individual particle have noticeable phase differences. DLS analyses the above-mentioned temporal fluctuations of the instantaneous intensity of scattered light. Accordingly, DLS can measure several dynamic processes in solution. It is evident that only those changes in the location of scattering elements lead to intensity fluctuations that produce a sufficiently large phase shift. This is always the case for translational diffusion of macromolecules. The motion of segments of chain molecules and rotational motion can be studied for large structures. Rotational motion of monomeric proteins and chain dynamics of unfolded proteins are practically not accessible by DLS. 2.2 Molecular Parameters Obtained from Static Light Scattering (SLS)
The expression for the light scattering intensity from a macromolecular solution can be derived from Eq. (1) by summation over the contributions of all macromolecules in the scattering volume v and substituting the polarizability α by related physical
2 Basic Principles of Laser Light Scattering
parameters. The latter can be done by applying the Clausius-Mosotti equation to macromolecular solutions, which leads to n2 −n20 =4π·N ·α where n and n0 are the refractive indices of the solution and the solvent, respectively. N is the number of particles (molecules) per unit volume and can be expressed by N = N A ·c/M. N a is Avogadro’s number, c and M are the weight concentration and the molecular mass (molar mass) of the macromolecules, respectively. Using the approximation n + n0 ∼ 2n0 we get n2 –n0 2 ∼ 2n0 · (∂n/∂c) · c. ∂n/∂c is the specific refractive index increment of the macromolecules in the particular solvent. The excess scattering of the solution over that of the solvent Iex = Isolution -Isolvent of noninteracting small molecules is then Ie x =
4π 2 n20 (∂n/∂c)2 v v · 2 ·c·M·I0 =H· 2 ·c·M·I0 λ4 NA r r
(2)
The optical constant H = 4π 2 n0 2 (∂n/∂c)2 /λ4 N A depends only on experimental parameters and the scattering properties of the molecules in the particular solvent, which is reflected by ∂n/∂c. The exact knowledge of ∂n/∂c, the dependence of n on protein concentration in the present case, is very important for absolute measurements of the molecular mass. For proteins in aqueous solvents of low ionic strength it is about 0.19 cm3 g−1 and practically does not depend significantly on the amino acid sequence. However, it is markedly different in solvents containing high concentrations of denaturants. The instrument parameters in Eq. (2) can be eliminated by using the Rayleigh (excess) ratio Rq = (Iex /I0 )(r 2 /v). In practice, Rq of an unknown sample is calculated from the scattering intensities of the sample and that of a reference sample of known Rayleigh ratio Rref by Rq =Rref · f corr ·(Iex /Iref )
(3)
where f corr is an experimental correction factor accounting for differences in the refractive indices of sample and reference sample [5]. In the more general case including intermolecular interactions and large molecules, which have a refractive index np and satisfy the condition 4π(np –n0 )d/λ 1 of the Rayleigh-Debye approximation, static light scattering data are conveniently presented by the relation H·c 1 +2A2 ·c = Rq M·P(q )
(4)
P(q) is the particle scattering function, which is mainly expressed in terms of the product q ·RG , where RG =R2 1/2 is the root mean square radius of gyration of the particles. Analytical expressions for P(q) are known for different particle shapes, whereas instead of RG also other characteristic size-dependent parameters are used (e.g., length L for rods or cylinders). In the limit q ·RG 1, the approximation P(q )−1 =1+(q ·RG )2 /3 can be used to estimate RG from the angular dependence of Rq . A perceptible angular dependence of the scattering intensity can only be
5
6 Studying Protein Folding and Aggregation by Laser Light Scattering
expected for particles with RG > 10 nm. Thus, light scattering is not an appropriate method to monitor changes of RG during unfolding and refolding of small monomeric proteins. A substantial angular dependence is observed for large protein aggregates, however. The concentration dependence of the right hand side of Eq. (4) yields the second virial coefficient A2 , which reflect the strength and the type of intermolecular interactions. The usefulness of measuring A2 will be discussed below. A2 is positive for predominant repulsive (covolume and electrolyte effects) and negative for predominant attractive intermolecular interactions. In general, both extrapolation to zero concentration and zero scattering angle (q = 0) are done in a single diagram (Zimm plot) for calculations of M from Eq. (4). The primary mass “moment” or “average” obtained is the weight-average molar mass M= i c i Mi / i c i in the case of polydisperse systems. This has to be taken into consideration for proteins when monomers and oligomers are present. Molar masses of proteins used in folding studies (i.e., M < 50 000 g mol−1 ) can be determined at 90◦ scattering angle because the angular dependence of Rq is negligible. Measurements at different concentration are mandatory, however, because remarkable electrostatic and hydrophobic interactions may exist under particular environmental conditions. In the following, the values of parameters measured at finite concentration are termed apparent values, e.g., Mapp .
2.3 Molecular Parameters Obtained from Dynamic Light Scattering (DLS)
Information about dynamic processes in solution are primarily contained in the temporal fluctuations of scattered electric field → E s (t). The time characteristics of these fluctuations can be described by the first-order time autocorrelation function g (1) (τ )= → E s (t)· → E s (t+τ ). The brackets denote an average over many products of → E s (t) with its value after a delay time τ ·g (1) (τ ) is only accessible in the heterodyne detection mode, where the scattered light is mixed with a small portion of the incident beam on the optical detector. This experimentally complicated detection method must be used if particle motions relative to the laboratory frame, e.g., the electrophoretic mobility in an external electric field, are measured. The less complicated homodyne mode, where only the scattered light intensity is detected, is normally the preferred optical scheme. This has the consequence that only the second-order intensity correlation function g (2) (τ ) = I(t)·I(t + τ ) is directly available. Under particular conditions, which are met in the case of light scattering from dilute solutions of macromolecules, the Siegert relation g (2) (τ ) = 1 + |g (1) (τ )|2 can be used to obtain the normalized first-order correlation function g (1) (τ ) from the measured g 2 (τ ). Analytical forms of g 1 (τ ) have been derived for different dynamic processes in solution. As we have already indicated above only translational diffusional motion essentially contributes to the fluctuations of the scattered light in the case of monomeric proteins and we can reasonably neglect rotational effects. g (1) (τ )
2 Basic Principles of Laser Light Scattering
for identical particles with a translational diffusion coefficient D has the form of an exponential g (1) (τ )=e−q
2.D.τ
(5)
D is related to the hydrodynamic Stokes radius RS by the Stokes-Einstein equation Rs=
k·T 6π·η·D
(6)
where k is Boltzmann’s constant, T is the temperature in K, and η is the solvent viscosity. g (1) (τ ) for a polydisperse solution containing L different macromolecular species (or aggregates) with masses Mi , diffusion coefficients Di and weight concentrations ci is g (1) (τ )=(
L
ai ·e −q
2
·Di ·τ
)/S
(7a)
i=1
L ai is the normalization factor. The weights ai = ci · Mi = ni ·Mi 2 where S= i=1 reflect the c × M dependence of the scattered intensity (see Eq. (2)). ni is the number concentration (molar concentration) of the macromolecular species. Accordingly, even small amounts of large particles are considerably represented in the measured g (1) (τ ). The general case of an arbitrary size distribution, which results in a distribution of D, can be treated by an integral 2·D·τ g (τ )=( a(D)·e −q d D)/s
(7b)
Equation (7b) has the mathematical form of a Laplace transformation of the distribution function a(D). Thus, an inverse Laplace transformation is needed to reconstruct a(D) or the related distribution functions c(D) and n(D). This is an ill-conditioned problem from the mathematical point of view because of the experimental noise in the measured correlation function. However, there exist numerical procedures termed “regularization”, which allow a researcher to obtain stabilized, “smoothed” solutions. A widely used program package for this purpose is “CONTIN” [8]. Nevertheless, the distributions obtained can depend sensitively on the experimental noise and parameters used during data evaluation procedure in special cases. Thus, it might be more appropriate to use simpler but more stable data evaluation schemes like the method of cumulants [9], which yields the z-averaged diffusion coefficient D and higher moments reflecting the width and asymmetry of the distribution. D can be obtained simply from the limiting slope of the logarithm of g(1) (τ ), viz. D=−q −2 · ddτ (ln|g (1) (τ )|)τ →0 .. This approach is very useful for rather narrow distributions.
7
8 Studying Protein Folding and Aggregation by Laser Light Scattering
D, like M, exhibits a concentration dependence, which is usually written in the form D(c)=D0 (1+k D ·c)
(8)
where kD is the diffusive concentration dependence coefficient, which can be used to characterize intermolecular interactions. However, kD differs from A2 . The concentration dependence of D and other macromolecular parameters was discussed in more detail by Harding and Johnson [10]. Extrapolation to zero protein concentration yielding D0 is essential in order to calculate the hydrodynamic dimensions in terms of RS for individual protein molecules. 2.4 Advantages of Combined SLS and DLS Experiments
Many of the modern laser light scattering instruments allow both SLS and DLS to be measured in one and the same experiment. This experimental procedure is more complicated, since the optimum optical schemes for SLS and DLS are different. Briefly, DLS needs focused laser beams and only a much smaller detection aperture can be employed because of the required spatial coherence of the scattered light. This reduces the SLS signal and demands higher beam stability. Combined SLS/DLS methods have two main advances for protein folding studies. The first one concerns the reliability of measurements of the hydrodynamic dimensions during folding or unfolding. The observed changes of RS could partly or entirely result from an accompanying aggregation reaction, which can be excluded if the molecular mass remains essentially constant. The second advantage is the capacity to measure molecular masses of proteins in imperfectly clarified solutions that contain protein monomers and an unavoidable amount of aggregates. This situation is often met, for example during a folding reaction or when the amount of protein is too small for an appropriate purification procedure. In this case, SLS alone would measure a meaningless weight average mass. The Stokes radii of monomers and aggregates are mostly well separated, allowing the researcher to estimate the weighting ai for monomers and aggregates fitting the measured g (1) (τ ) by Eq. (7a). Knowledge of the weighting allows the contributions of monomers and aggregates to the total scattering intensity to be distinguished and, therefore, the proper molecular mass of the protein to be obtained.
3 Laser Light Scattering of Proteins in Different Conformational States – Equilibrium Folding/Unfolding Transitions 3.1 General Considerations, Hydrodynamic Dimensions in the Natively Folded State
The molecular parameter of interest in folding studies is primarily the hydrodynamic Stokes radius RS . Additional direct estimations of the molecular mass
3 Laser Light Scattering of Proteins in Different Conformational States – Equilibrium Folding/Unfolding Transitions
from SLS data are recommended to check whether the protein is and remains in the monomeric state during folding or unfolding. Formation of a dimer of two globular subunits could easily be misinterpreted as a swollen monomer since its Stokes radius is only larger by a factor of 1.39 than that of the monomer [11]. Hydrodynamic radii in the native state have been measured for many proteins. For globular proteins, a good correlation with the mass or the number of amino acids was found [12] (see also below). This has encouraged some researchers and manufacturers of commercial DLS devices to estimate M from RS . This procedure has to be applied with great care and clearly contradicts the application of DLS to folding studies. For correct estimations of molecular dimensions in terms of RS before and after a folding transition, extrapolation of D to zero protein concentration should be done before applying Eq. (6). The concentration dependence of D indicated by kD possibly changes during unfolding/refolding transitions as it is shown in Figure 2 for thermal unfolding of RNase T1. Strong changes in kD are often observed in the case of pH-induced denaturation. Apparent Stokes radii, RS,app , are measured if extrapolation is not possible due to certain experimental limitations. 3.2 Changes in the Hydrodynamic Dimensions during Heat-induced Unfolding
The first DLS studies of the thermal denaturation of proteins were reported by Nicoli and Benedek [13]. These authors investigated heat-induced unfolding of lysozyme in the pH range between 1.2 and 2.3 and different ionic strength. The size transition coincided with the unfolding curve recorded by optical spectroscopic probes. The average radius increased by 18% from 1.85 to 2.18 nm. The same size increase was also found for ribonuclease A (RNase A) at high ionic strength. The proteins had intact disulfide bonds. A good candidate for thermal unfolding/refolding studies is RNase T1, since the transition is completely reversible even according to light scattering criteria [14]. The unfolding transition curves measured at three different protein concentrations at pH 7 are shown in Figure 2a. The concentration dependence of D in the native and unfolded states is shown in Figure 2b. The increase of the diffusive concentration dependence coefficient from 49 to 87 mL g−1 on the transition to the unfolded state indicates a strengthening of the repulsive intermolecular interactions. The Stokes radii obtained after extrapolation to zero protein concentration are 1.74 nm and 2.16 nm for the native and unfolded states, respectively. The corresponding increase in RS of 24% is somewhat larger than those for lysozyme and RNase A. This might be due to the fact that RNase T1 differs in the number and position of the disulfide bonds as compared with lysozyme and RNase A. The thermal unfolding behavior of the wild-type λ Cro repressor (Cro-WT) is more complex, but it allows us to demonstrate the advantage of combined SLS/DLS experiments. The native Cro-WT is active in the dimeric form. Spectroscopic and calorimetric studies [15] (and references cited therein) revealed that thermal unfolding proceeds via an intermediate state. The corresponding changes of the apparent
9
10 Studying Protein Folding and Aggregation by Laser Light Scattering
Fig. 2 Heat-induced unfolding of ribonuclease T1 in 10 mM sodium cacodylate, pH 7, 1 mM EDTA. a) Temperature dependence of the apparent Stokes radius at concentrations of 0.8 mg mL−1 (open circles), 1.9 mg mL−1 (filled squares), and 3.9 mg mL−1 (filled diamonds). The continuous lines were obtained by nonlinear least squares
fits according to a two-state transition yielding T m = 51.0 ± 0.5◦ C and Hm ± 497 ± 120 kJ mol−1 (average over all three fits). b) Concentration dependence of the diffusion coefficient D at 20◦ C (open square) and 60◦ C (filled square). Linear fits to the data yield the diffusive concentration dependence coefficient kD (see text).
Stokes radius and the relative scattering intensity, which reflects changes in the average mass, are quite different at low and high concentrations (Figure 3). At low concentration, Cro-WT first unfolds partly, but remains in the dimeric form between 40◦ C and 55◦ C according to the increase in RS between 40◦ C and 55◦ C and the nearly constant scattering intensity. Above 55 ◦ C, both RS and the relative scattering intensity decrease due to the dissociation of the dimer. At high concentration, the increase of RS during the first unfolding step is much stronger and is
3 Laser Light Scattering of Proteins in Different Conformational States – Equilibrium Folding/Unfolding Transitions
Fig. 3 Heat-induced unfolding and association/dissociation of Cro repressor wild-type at 1.8 mg mL−1 (filled squares) and 5.8 mg mL−1 (open squares) in 10 mM sodium cacodylate, pH 5.5.
accompanied by an increase in the scattering intensity. This is due to the formation of dimers of partly unfolded dimers. At temperatures above 55 ◦ C, dissociation into monomers is indicated by the decrease of both RS and scattering intensity. The transient population of a tetramer is a peculiarity of thermal unfolding of Cro-WT. Unfortunately, many proteins aggregate upon heat denaturation, thus preventing reliable measurements of the dimensions. 3.3 Changes in the Hydrodynamic Dimensions upon Cold Denaturation
Since the stability curves for proteins have a maximum at a characteristic temperature, unfolding in the cold is a general phenomenon [16]. However, easily attainable unfolding conditions in the cold exist only for a few proteins. Cold denaturation under destabilizing conditions was reported for phosphoglycerate kinase from yeast (PGK) [17, 18] and barstar [19, 20]. The increases in RS upon cold denaturation for
11
12 Studying Protein Folding and Aggregation by Laser Light Scattering
Fig. 4 Relative expansion, RS,app /RS,ref , upon cold denaturation for barstar (pseudo wild-type), 2.4 mg mL−1 , in 50 mM Tris/HCl, pH 8, 0.1 M KCl, 2.2 M urea (open circles) and phosphoglycerate kinase from yeast, 0.97 mg mL−1 , in 20 mM
sodium phosphate, pH 6.5, 10 mM EDTA, 1 mM DTT, 0.7 M GdmCl (filled squares). RS,ref is the Stokes radius in the reference state at the temperature of maximum stability under slightly destabilizing conditions.
PGK [18] and barstar (unpublished results) are shown in Figure 4. The size increase is a three-state transition in the case of PGK, because of the independent unfolding of the two domains in the cold. Both proteins aggregate on heating preventing to compare the results with those for heat-induced unfolding. 3.4 Denaturant-induced Changes of the Hydrodynamic Dimensions
Before the advent of DLS, most data relating to the hydrodynamic dimensions of proteins under strongly denaturing conditions had been obtained by measuring intrinsic viscosities [21]. In a pioneering DLS experiment on protein denaturation, Dubin, Feher, and Benedek [22] studied the unfolding transition of lysozyme in guanidinium chloride (GdmCl). These authors meticulously analyzed the transition at 31 GdmCl concentrations between 0 M and 6.4 M in 100 mM acetate buffer, pH 4.2, at a protein concentration of 10 mg mL−1 . It was found that RS,app increases during unfolding by 45% and 86% in the case of intact and reduced disulfide bonds, respectively. Meanwhile, the hydrodynamic dimensions at high concentrations of GdmCl have been measured for many proteins lacking disulfide bonds by using DLS. The results will be discussed below in connection with scaling (i.e., power) laws, which can be derived from this data. Here we consider briefly the influence of disulfide bonds and temperature on the dimensions of proteins in the highly
3 Laser Light Scattering of Proteins in Different Conformational States – Equilibrium Folding/Unfolding Transitions
Fig. 5 Temperature dependence of the relative expansion RS,app /RS;nat of RNase A and RNase T1 in the unfolded state at high concentrations of GdmCl. RNase T1, 2.2 mg mL−1 , in 10 mM sodium cacodylate, pH 7, 1 mM EDTA, 5.3 M GdmCl (open circles), RNase A, 2.9 mg mL−1 , in 50 mM
MES, pH 5.7, 1 mM EDTA, 6 M GdmCl (open squares), and RNase A (broken disulfide bonds), 2.9 mg mL−1 , in 50 mM MES, pH 6.5, 1 mM EDTA, 6 M GdmCl (filled squares). RNase A, 2.5 mg mL−1 , with broken disulfide bonds is highly unfolded even in the absence of GdmCl (filled diamonds).
unfolded state. Such experiments were done with RNase A [23]. Figure 5 shows the temperature dependence of the relative dimensions (RS,app /RS,native for RNase A with intact and with reduced disulfide bonds. The slight compaction with increasing temperature is typical for proteins in highly unfolded states: for example, similar results are obtained with RNase T1 (Figure 5). Surprisingly, RNase A without disulfide bonds is already unfolded in the absence of GdmCl and has larger dimensions than RNase A with intact disulfide bonds in 6 M GdmCl. 3.5 Acid-induced Changes of the Hydrodynamic Dimensions
Many proteins can be denatured by extremes of pH, and most of the studies on this have focused on using acid as denaturant. Proteins respond very differently to acidic pH. For example, lysozyme remains native-like, some other protein adopt the molten globule conformation with hydrodynamic dimensions about 10% larger than in the native state, and many proteins unfold into an expanded conformation. Fink et al. [24] have introduced a classification scheme for the unfolding behavior under acidic conditions. The molecular mechanisms of acid denaturation have been studied in the case of apomyoglobin by Baldwin and coworkers [25]. The dimensions of selected acid-denatured proteins are listed below in Table 1. Some proteins are more expanded in the acid-denatured than in the chemical-denatured state, e.g., PGK (RS,unf /RS,native = 2:49) and apomyoglobin (RS,unf /RS,native = 2.05). Furthermore, those proteins exhibit strong repulsive intermolecular interactions, as reflected by large values of kD (e.g., 550 mL g−1 for PGK at pH 2). This means,
13
14 Studying Protein Folding and Aggregation by Laser Light Scattering Table 1 Stokes radii RS , relative compactness RS /RS,native , and diffusive concentration depen-
dence coefficients kD of selected proteins in differently folded states determined by DLS Protein/state Bovine α-lactalbumin Native state (holoprotein) Unfolded state (5 M GdmCl) A-state (pH 2, 50 mM KCl) Molten globule (neutral pH) Molten globule (15% v/v TFE) TFE-state (40% v/v TFE) Kinetic molten globule PGK Native state Unfolded state (2 M GdmCl) Cold denatured state (1 ◦ C, 0.7 M GdmCl) Acid denatured state (pH 2) TFE-state (50% v/v TFE, pH 2) Apomyoglobin Native state Acid denatured (U-form, pH 2) Molten globule (I-form, pH 4) RNase T1 Native state Unfolded state (5.3 M GdmCl) Heat denatured (60 ◦ C) RNase A Native state Unfolded state (6 M GdmCl, nonreduced SS) Unfolded state (6 M GdmCl, reduced SS) TFE-state (70% v/v TFE) Barstar Native state Unfolded state (6 M urea) Cold-denatured state (1 ◦ C, 2.2 M urea)
Rs (nm)
Rs Rs,native
kD (mL g−1 )
References
1.88 2.46 2.08 2.04 2.11 2.25 1.99
1.31 1.11 1.09 1.12 1.20 1.06
−7 −9 15 62 −100 −10 −60
29 29 29 29 33 33 33
2.97 5.66 5.10 7.42 7.76
1.91 1.72 2.50 2.61
0.8 – – 552 1030
44 44 44 34 34
2.09 4.29 2.53
2.05 1.21
26 520 104
30 30 30
1.74 2.40 2.16
1.38 1.24
49 – 87
14 14 14
1.90 2.60 3.14 2.28
1.37 1.65 1.20
≈0 – – 15
56 56 56 33
1.72 2.83 2.50
1.65 1.45
5 50 −3
87 87 87
Experimental errors were omitted for brevity. Errors in RS are typically less than ±2%. Errors in kD are of the order of ±10% except for kD values close to zero. For exact solvent conditions we refer to the original literature.
that Dapp is twice as large as D0 at a concentration of about 2 mg mL−1 . This underlines the importance of extrapolation of the measured quantity to zero protein concentration. 3.6 Dimensions in Partially Folded States – Molten Globules and States
From the wealth of partly folded states existing for many proteins under different destabilizing conditions, DLS data obtained for some particular types will be
3 Laser Light Scattering of Proteins in Different Conformational States – Equilibrium Folding/Unfolding Transitions
considered here. Molten globule states [26–28] have been most extensively studied. DLS data on the hydrodynamic dimensions were published for the molten globule states of two widely studied proteins, α-lactalbumin [29] and apomyoglobin [30]. The results including data for the kinetic molten globule of α-lactalbumin are shown in Table 1. Data also exist on the geometric dimensions in terms of the radius of gyration, RG , for both α-lactalbumin [31] and apomyoglobin [30, 32] in different conformational states. Only a few DLS studies on the hydrodynamic dimension of proteins in aqueous solvents containing structure-promoting substances like trifluoroethanol (TFE) and hexafluoroisopropanol (HFIP) have been reported so far. The Stokes radii at high fluoroalcohol content, at which the structure stabilizing effect saturates, are very different for various proteins. An increase in RS of only about 20% was found for α-lactalbumin and RNase A [33] with intact disulfide bonds. By contrast, PGK at pH 2 and 50% (v/v) TFE has a Stokes radius RS = 7:76 nm exceeding that of the native state by a factor of 2.6 [34]. According to the high helical content estimated from CD data and the high charge according to the large value of kD it is conceivable that the entire PGK molecule consists of a long flexible helix. Fluoroalcohols may have a different effect at low volume fractions. In the case of α-lactalbumin, a molten globule-like state was found at 15% (v/v) TFE [33]. This state revealed a native-like secondary structure and a Stokes radius 10% larger than that of native protein. Furthermore, strong attractive intermolecular interactions were indicated by a large negative diffusive concentration dependence coefficient, kD = −100 mL g−1 , which result presumably from the exposure of hydrophobic patches in this state. kD was close to zero both in the absence and at high percentage of TFE. 3.7 Comparison of the Dimensions of Proteins in Different Conformational States
The hydrodynamic dimensions in different equilibrium states of five proteins frequently used in folding studies are summarized in Table 1. Some general rules become evident from these data, e.g., heat-denatured proteins are relatively compact as compared with those denatured by high concentration of denaturants or cold-denatured proteins. In this connection, a particular class of proteins should be mentioned, namely the so-called intrinsically unstructured proteins [35]. These proteins are essentially unfolded according to their CD spectra and have much larger Stokes radii under physiological conditions than RS;glob expected for a globular proteins of the same molecular mass. Typical examples are prothymosin α and α-synuclein, with relative dimensions RS /Rglob of 1.77 [36] and 1.52 (unpublished results), respectively. 3.8 Scaling Laws for the Native and Highly Unfolded States, Hydrodynamic Modeling
A systematic dependence of the hydrodynamic dimensions on the number of amino acids N aa or molecular mass M can be established for the dimension of globular
15
16 Studying Protein Folding and Aggregation by Laser Light Scattering
proteins in the native state and also for proteins lacking disulfide bonds in the unfolded state induced by high concentrations of GdmCl or urea. Scaling (i.e., “power”) laws for native globular proteins have been published by several authors. 1/3 on the basis of Damaschun et al. [37] obtained the relation RS [nm]=0.362·Naa the Stokes radii calculated for more than 50 globular proteins deposited in the protein databank. Uversky [38] found the dependence RS [nm]=0.0557·M0.369 using experimental data from the literature and Stokes radii measured with a carefully 0.369 calibrated size exclusion column (this is consistent with RS [nm]=0.325·Naa for a mean residue weight of 119.4 g mol−1 ). An advantage of DLS is that no calibration of the methods is needed. The relation between RS and N aa determined by Wilkins 0.29 et al. [39] is RS [nm]=0.475·Naa . The scaling laws for the native state are useful for the classification of proteins for which the high-resolution structure is not known or cannot be obtained. Deviation from the scaling law indicates that the protein has no globular shape or does not adopt a compactly folded structure under the given environmental conditions. In contrast to the native state, proteins in unfolded states populate a large ensemble of configurations, which are restricted only by the allowed ψ angles and nonlocal interactions of the polypeptide chain. In the absence of (or at balanced) nonlocal interactions (i.e., in the unperturbed or theta state), polypeptide chains behave as random chains and their conformational properties can be described by the rotational isomeric state theory [40]. Whether or not unfolded proteins are indeed true random chains is a matter of debate [41]. The dimension of a denatured protein is an important determinant of the folding thermodynamics and kinetics [42]. Thus, the chain length dependence of the hydrodynamic dimension in the form of scaling laws is an important experimental basis for the characterization of unfolded polypeptide chains. A first systematic analysis of the chain length dependence of the hydrodynamic dimensions was done by Tanford et al. [43] by measuring the intrinsic viscosities of unfolded proteins. The Stokes radii measured by DLS [44] for 12 proteins denatured by high concentrations of GdmCl are shown in Figure 0.5 was obtained. Scaling laws 6. From the data the scaling law RS [nm]=0.28·Naa were also published by other authors. Uversky [38] derived RS [nm]=0.28·M0.502 0.502 (corresponding to RS [nm]=0.315·Naa ) from Stokes radii measured by size ex0.57 clusion chromatography. Wilkins et al. [39] found RS [nm]=0.221·Naa using Stokes radii mostly measured by pulse field gradient NMR. By analysing published Stokes radii measured by different methods Zhou [42] obtained the relation 0.522 RS [nm]=0.2518·Naa . Two aspects that are closely related to measurements of the hydrodynamic dimensions of proteins are hydrodynamic modeling [45, 46] and the hydration problem [47, 48]. Particularly for globular proteins, both are important for the link between high-resolution data from X-ray crystallography or NMR and hydrodynamic data. Protein hydration appears as an adjustable parameter in recent hydrodynamic modeling procedures [49]. The average values over different proteins are 0.3 g water per g protein or in terms of the hydration shell 0.12 nm. Model calculations are also in progress for unfolded proteins [42].
4 Studying Folding Kinetics by Laser Light Scattering
Fig. 6 Stokes radii RS of 12 proteins unfolded by 6 M GdmCl in dependence on the number of amino acid residues Naa . The linear fit yields the scaling law RS [nm] = 0.28 · Na 0.5 . 1: apocytchrome c, 2: staphylokinase, 3: RNase A, 4: apomyoglobin, 5:
chymotrypsinogen A, 6: glyceraldehydes-3phosphate dehydrogenase, 7: aldolase, 8: pepsinogen, 9: streptokinase, 10: phosphoglycerate kinase, 11: bovine serum albumin, 12: DnaK. Disulfide bonds, if existing in native proteins, were reduced.
4 Studying Folding Kinetics by Laser Light Scattering 4.1 General Considerations, Attainable Time Regions
An important characteristic of the folding process is the relation between compaction and structure formation. Therefore, it is intriguing to monitor compactness during folding by measuring the Stokes radius using DLS, which is the fastest among the hydrodynamic methods. Furthermore, RS and also the radius of gyration, RG , are direct measures of the overall molecular compactness. The first studies of this kind were measurements of the compaction during lysozyme refolding by continuous-flow DLS [50]. In general, the basic experimental techniques of continuous- and stopped-flow experiments can also be applied for laser light scattering. There are no restrictions from general physical principles concerning the accessible time range for SLS. The data acquisition time T A , which is needed to obtain a sufficiently high signal-to-noise ratio (S/N), depends solely on the photon flux and can be minimized by increasing the light level. However, there are two fundamental processes that limit the accessible time scales in the case of DLS. These problems are discussed in detail by Gast et al. [51] and will be outlined only briefly in this chapter.
17
18 Studying Protein Folding and Aggregation by Laser Light Scattering
First, unlike other methods including SLS, the enhancement of the signal is not sufficient to increase the (S/N), since the measured signal is itself a statistical quantity. At light levels above the shot-noise limit the signal-to-noise ratio of the measured time correlation function depends on the ratio of T A to the correlation time τ c of the diffusion process by (S/N) = (T A /τ c ). τ c is of the order of tens of microseconds for proteins, leading to a total acquisition time T A ∼ 5-10 s in order to achieve an acceptable (S/N). This time can be shortened by splitting it into N S “shots” in a stopped-flow experiment. N S = 100 is an acceptable value leading to acquisition times of 50-100 ms during one “shot.” This corresponds to the time resolution in the case of the stopped-flow technique. The second limiting factor becomes evident when the continuous-flow method is used. Excellent time resolutions have been achieved with this method in the case of fluorescence measurements [52] and small-angle X-ray scattering [53, 54]. Continuous-flow experiments are especially useful for slow methods, since the time resolution depends only on the aging time between mixer and detector and not on the speed of data acquisition. Increasing the flow speed in order to get short aging times also reduces the resting time of the protein molecules within the scattering volume. However, to avoid distortions of the correlation functions, this time must be longer than the correlation time corresponding to the diffusional motion of the molecules. This limits the flow speed to about 15 cm s−1 under typical experimental conditions [51], resulting in aging times of about 100 ms. The benefit of continuous-flow measurements over stopped-flow measurements does not exist for DLS. Thus, DLS is only suitable for studying late stages of protein folding in the time range > 100 ms. However, this time range is important for folding and oligomerization or misfolding and aggregation (see Section 5). Two examples for the compaction of monomeric protein molecules during folding will be now be considered.
4.2 Hydrodynamic Dimensions of the Kinetic Molten Globule of Bovine α-Lactalbumin
The hydrodynamic dimensions of equilibrium molten globule states are well characterized (Table 1). It is interesting to compare the dimensions in these states with that of the kinetic counterpart. Stopped-flow DLS investigations [29] were done with the Ca2+ -free apoprotein, which folds much slower than the holoprotein. The time course of the changes of the apparent Stokes radius at a protein concentration of 1.35 mg mL−1 is shown in Figure 7a. Measurements at different protein concentrations (Figure 7b) revealed the “sticky” nature of the kinetic molten globule as compared to the completely unfolded and refolded states. The kinetic molten globule appears to be slightly more compact than the equilibrium molten globules. The dimensions of the natively folded apo- and the holoprotein are practically the same. The changes of the dimensions of holo α-lactalbumin were measured later by time-resolved X-ray scattering [55].
4 Studying Folding Kinetics by Laser Light Scattering
Fig. 7 a) Kinetics of compaction of bovine "-lactalbumin after a GdmCl concentration jump from 5 M to 0.5 M, final protein concentration 1.35 mg mL−1 , 50 mM sodium cacodylate, pH 7, 50 mM NaCl, 2 mM EGTA, T = 20 ◦ C. The data can be fitted by a single exponential with a decay time J = 43 ± 6 s. b) Concentration dependence
of the diffusion coefficient D for the apoform of bovine "-lactalbumin under native conditions (filled squares), in the presence of 5 M GdmCl (filled diamonds), and for the kinetic molten globule (open circles). The buffer was same as in (a). The Stokes radii obtained from D extrapolated to zero concentration are shown.
4.3 RNase A is Only Weakly Collapsed During the Burst Phase of Folding
In the previous section, it was demonstrated that α-lactalbumin collapses very rapidly into the molten globule with dimensions close to that of the native state.
19
20 Studying Protein Folding and Aggregation by Laser Light Scattering
Fig. 8 Kinetics of compaction of RNase A after a GdmCl concentration jump from 6 M to 0.67 M, final protein concentration 1.9 mg mL−1 , 50 mM sodium cacodylate, pH 6, T = 10 ◦ C. The single exponential fit yields a decay time J = 28 ± 5 s.
A different folding behavior was observed for RNase A [56]. The Stokes radius of RNase is 2.56 nm in the unfolded state at 6 M GdmCl and decreases only to 2.34 nm during the burst phase (Figure 8). Most of the compaction towards the native state occurs at later stages in parallel with the final arrangement of secondary structure and the formation of tertiary structure.
5 Misfolding and Aggregation Studied by Laser Light Scattering 5.1 Overview: Some Typical Light Scattering Studies of Protein Aggregation
Misfolding and aggregation of proteins have been regarded for a long time as undesirable side reactions and the importance of these processes became evident only recently in connection with recombinant protein technology and the observation that the development of particular diseases correlates with protein misfolding and misassembly [57]. The basic principles of protein aggregation have been considered for example by De Young et al. [58]. Light scattering is the most sensitive method in detecting even small amounts of aggregates in macromolecular solutions. From the numerous applications of light scattering to aggregation phenomena only a few selected applications to proteins will be discussed briefly before turning to the more special phenomenon of the combined processes of misfolding and aggregation. Light scattering studies of aggregation are mostly kinetic experiments. In contrast to the kinetic schemes discussed in Section 4, measurements of the intensity,
5 Misfolding and Aggregation Studied by Laser Light Scattering
which indicate the increase in the particle mass, are preferred in this case. Typical experiments of this kind have been reported by Zettlmeissl et al. [59], who studied the aggregation kinetics of lactic dehydrogenase using a stopped-flow laser light scattering apparatus. These authors thoroughly investigated the effect of enzyme concentration on the kinetics of aggregation and obtained an apparent reaction order of 2.5 from the concentration dependence of the initial slopes of the light scattering signal. Fast aggregation events in the millisecond time range during refolding of interleukin 1β were measured by Finke et al. [60] using a stopped-flow fluorescence device for the kinetic light scattering experiments. Many light scattering investigations have been done in connection with heatinduced aggregation of proteins. Jøssang et al. [61] studied the aggregation kinetics of human immunoglobulins by measuring the increase in the relative mass and the relative Stokes radius. From the changes of both quantities the authors obtained quite detailed information about the aggregation process, which could be consistently described by Smoluchowski’s coagulation theory [62]. The heat aggregation of the β-lactoglobulin protein was very intensively investigated by light scattering [63–65]. These studies are examples of very careful light scattering investigations of protein aggregation. In many of these studies size-exclusion chromatography has been used in addition to light scattering. This is very important in order to follow the consumption of the initial species (mostly monomers) during aggregation (see below). Monomer consumption cannot be easily derived from light scattering data, since the measured weight averaged mass is dominated by the growing large particles. If the protein aggregates or polymerization products become large enough, further structural information about the growing species can be obtained from the angular dependence of the light scattering intensity. A careful analysis of fibrin formation using stopped-flow multiangle light scattering was reported by Bernocco et al. [66]. An interesting question is: how can protein aggregation be influenced? Suppression of citrate synthase aggregation and facilitation of correct folding has been demonstrated by Buchner et al. [67] using a commercial fluorometer as a light scattering device. 5.2 Studying Misfolding and Amyloid Formation by Laser Light Scattering 5.2.1 Overview: Initial States, Critical Oligomers, Protofibrils, Fibrils Conformational conversion of proteins into misfolded structures with an accompanying assembly into large particles, mostly of fibrillar structure called amyloid, is presently an expanding field of research. In this section of the review we will try to sketch the potentials and limits of laser light scattering for elucidating these processes. It is clear beforehand that SLS and DLS are best applied for this purpose mostly in conjunction with other methods, particularly with those sensitive to changes in secondary structure (CD, FTIR, NMR) and methods giving evidence of the morphology of the growing species such as electron microscopy (EM) and/or
21
22 Studying Protein Folding and Aggregation by Laser Light Scattering
atomic force microscopy (AFM). Nevertheless, we will concentrate mainly upon the contribution from the light scattering studies in the following. Amyloid formation comprises different stages of particle growth. The information that can be obtained from light scattering experiments strongly depends on the quality of the initial state [68]. A starting solution containing essentially the conversion competent monomeric species is a prerequisite for studies of early stages involving the formation of defined oligomers, later called critical oligomers, and protofibrillar structures. These structures became interesting recently because of their possible toxic role. The size distributions and the kinetics of formation of these rather small particles can be characterized very well by light scattering. A careful experimental design (e.g., multi-angle light scattering or model calculations regarding the angular dependence of the scattered light intensity) is needed for the characterization of late products such as large protofibrils or long “mature” fibrils. The problems involved in light scattering from long fibrillar structures have been discussed in a review article by Lomakin et al. [69]. Quantitative light scattering studies of fibril formation have been reported for a few protein systems. Some of them will be discussed now. 5.2.2 Aggregation Kinetics of Aβ Peptides Aβ peptide is the major protein component found in amyloid deposits of Alzheimer’s patients and comprises 39-43 amino acids. The first light scattering investigations of Aβ aggregation kinetics were reported by Tomski and Murphy [70] for synthetic Aβ(1–40). According to their SLS and DLS data, these authors modeled the assembly kinetics as formation of rods with a diameter of 5 nm built up from spontaneously formed small cylinders consisting of eight Aβ monomers. The kinetic aggregation model was based on Smoluchowski’s theory in the diffusionlimited case. Later, the same group [71] studied the concentration dependence of the aggregation process. They found that a high molecular weight species is formed rapidly when Aβ(1-40) is diluted from 8 M urea to physiological conditions. The size of this species was largest and constant at the lowest concentration (70 µM), while it was smaller and was growing with time at higher concentrations. Furthermore, dissociation of Aβ monomers from preformed fibrils was observed. Detailed light scattering investigations of Aβ(1-40) fibrillogenesis in 0.1 M HCl have been reported by Lomakin et al. [72–74]. Aβ(1-40) fibrillization was found to be highly reproducible at low pH. In a review article [69], these authors thoroughly analyzed the requirements to obtain good quantitative data of the fibril length from the measured hydrodynamic radius. Adequate measurements can be done if the fibril length does not exceed about 150 nm. Rate constants for fibril nucleation and elongation were determined from the measured time evolution of the fibril length distribution. The initial fibril elongation rate varied linearly with the initial peptide concentration c0 below a critical concentration c* = 0.1 mM and was constant above c*. From these observations particular mechanisms for Aβ fibril nucleation were derived. Homogeneous nucleation within small (d ∼ 14 nm) Aβ micelles was proposed for c0 > c* and heterogeneous nucleation on seeds for c < c0 . From
5 Misfolding and Aggregation Studied by Laser Light Scattering
the temperature dependence of the elongation rate an activation energy of 23 kcal mol−1 was estimated for the proposed monomer addition to fibrils. Finally, it should be mentioned that reproducibility of Aβ fibrillogenesis in vitro is a serious problem in general due to possible variations in the starting conformation and the assembly state of the Aβ preparations [68].
5.2.3 Kinetics of Oligomer and Fibril Formation of PGK and Recombinant Hamster Prion Protein Presently more than 60 proteins and peptides are known that form fibrillar structures under appropriate environmental conditions. Many of these proteins are not related to any disease. Particularly interesting are those which have served as model proteins in folding studies and could also be good candidates for studying basic principles of structure conversion and amyloid formation. The potentials of laser light scattering for this purpose will be demonstrated here for PGK from yeast and recombinant prion protein from Syrian hamster, ShaPrP90–232 . A well-defined conversion competent state free of any “seeds” can be achieved for these proteins under specific environmental conditions thus allowing studies of early stages of misfolding and aggregation. Conversion of PGK starts from a partially folded state at pH 2, at and above room temperature and in the presence of defined amounts of salt [34]. The time dependence of the relative increases in the average mass and the average Stokes radius after adding Na-TCA are shown in Figure 9a. Two growth steps are clearly visible. The mass and size transition curves scale linearly with concentration [75]. Thus, the growth process is a second-order reaction. Both growth steps can be well described within the framework of Smoluchowski’s coagulation theory. During the first stage, clustering towards oligomers consisting of about 8-10 PGK molecules occurs. These oligomers, termed “critical oligomers”, assemble into protofibrillar structures during the second stage. Such a behavior could be inferred from the relation between mass and size (Figure 9c), which yields an idea of the dimensionality of the growth process. Electron microscopy at selected growth stages have confirmed this idea [75]. The mutual processes of growth and secondary structure conversion were analyzed by relating the increase in mass to the changes in the far-UV CD [75]. Recombinant PrPC can be converted into a β-rich aggregated structure at low pH and under destabilizing conditions [76–78]. The relative changes in the mass and size for ShaPrP90–232 in sodium phosphate, pH 4.2, after a GdmCl concentration jump from 0 M to 1 M are shown in Figure 9b. The aggregation process resembles that of PGK with respect to the occurrence of a well-populated oligomeric state. The transient size maximum during oligomer formation is an indication of a concentration dependent overshoot phenomenon. Extrapolation of this size maximum to zero protein concentration yielded an octamer as the smallest oligomer [79]. The subsequent slow increases in size and mass result from the formation of protofibrillar structures [79]. The change in particle morphology is again indicated by a kink in the dimensionality plot (Figure 9c).
23
24 Studying Protein Folding and Aggregation by Laser Light Scattering
Fig. 9 Amyloid assembly of PGK and recombinant Syrian hamster PrP90–232 . a); Relative increases in mass (squares) and Stokes radius (circles) during assembly of PGK, c = 2.7 mg mL−1 , into fibrillar structures in 10 mM HCl, pH 2, 9.1 mM sodium trichloroacetate, T = 20 ◦ C. Structural conversion was initiated by adding sodium trichloroacetate. Mmon and RS,mon are the molecular mass and the Stokes radius of the monomer, respectively. b) Relative increases in mass (squares) and Stokes ra-
dius (circles) during assembly of recombinant Syrian hamster PrP90–232 , c = 0.59 mg mL−1 , in 20 mM sodium acetate, pH 4.2, 50 mM NaCl, 1 M GdmCl, 20 ◦ C. Structural conversion was initiated by a GdmCl concentration jump from 0 to 1 M. c) Dimensionality plots, M(t)/Mmon versus RS (t)/RS,mon for the data shown in (a) and (b) for PGK (squares) and Syrian hamster PrP90–232 (circles), respectively. The fit for the data of PGK corresponds to a dimensionality of 1.6.
5 Misfolding and Aggregation Studied by Laser Light Scattering
Fig. 10 Amyloid assembly monitored by size-exclusion chromatography. a) PGK (c = 0.14 mg mL−1 , 10 mM HCl, pH 2, 190 mM NaCl) and b) recombinant Syrian hamster PrP90–232 (c = 0.08 mg mL−1 , in 20 mM
sodium acetate, pH 4.2, 50 mM NaCl, 1 M GdmCl) at room temperature. The conversion process was triggered either by addition of salt (PGK) or GdmCl (prion protein), respectively.
The concentration dependence of the growth process revealed remarkable kinetic differences as compared to PGK according to the estimated apparent reaction order of 3. The differences in the formation of the critical oligomers are well documented by size exclusion chromatography (Figure 10). Different oligomeric states can be detected during aggregation of PGK, while only monomers and the critical oligomer (octamer) are populated in the case of ShaPrP90–232 . Accordingly, the octamer is the smallest stable oligomer. Complementary FTIR and CD studies [79] have led to the conclusion that the misfolded secondary structure is stabilized within the octamer. The sole appearance of monomers and octamers and/or higher oligomers tells us that the dimensionality of the first growth process in the case of ShaPrP90–232 (Figure 9c) is due to changes in the population of two discrete species.
25
26 Studying Protein Folding and Aggregation by Laser Light Scattering
Kinks in the dimensionality plot are also observed when various growth stages are not clearly separated in plots of M and RS versus time (Figure 9a,b) because of comparable growth rates. This is a clear advantage of this data representation. However, one must be careful in relating the (apparent) dimensionality derived from these plots to simple geometrical models. 5.2.4 Mechanisms of Misfolding and Misassembly, Some General Remarks Viewed from the perspective of chemical kinetics amyloid formation is a complex polymerization reaction, since assembly into amyloid fibrils is accompanied by secondary structure rearrangement into a predominant β-sheet structure. β-sheet strands are needed for association at the edges of monomeric subunits to build up the fibrillar end products [80]. This is in marked contrast to polymerization reactions of low-molecular-weight compounds, where the monomers possess already the functional groups for polymerization at initiation of the reaction [81] and to polymerization of proteins like actin, microtubulin [82], and sickle cell hemoglobin [83]. In the latter cases the proteins possess the assembly-competent structure immediately after the start of the reaction. These processes are usually interpreted in the framework of a nucleation polymerization mechanism [82, 83]. This interpretation has been overtaken to rationalize the kinetics of amyloid formation [84] and only little attention has been paid to the relation between secondary and quaternary structure formation in models of amyloid fibril assembly. The latest model proposals have put their focus especially on this point and have emphasized the population of intermediate long-lived oligomeric assemblies on the pathway to amyloid fibrils [75, 85].
6 Protocols and Instrumentation 6.1 Laser Light Scattering Instrumentation 6.1.1 Basic Experimental Set-up, General Requirements The experimental set-up for light scattering measurements in macromolecular solutions is relatively simple (Figure 1). A beam of a continuous wave (cw) laser is focused into a temperature-stabilized cuvette. A temperature stability better than ± 0.1◦ C is required for DLS experiments because of the temperature dependence of the solvent viscosity. The scattered light intensity is detected at one or different scattering angles using either photomultiplier tubes or avalanche photodiode detectors (APD). Though photomultiplier tubes have still some advantages, an important feature of APDs is the much higher quantum yield (>50%) within the wavelength range of laser light between 400 and 900 nm. The expense of a light scattering system depends essentially on the type of the laser and the complexity of the detection system. For example, simple systems detect the light only at 90◦ or in the nearly backward scattering direction, while
6 Protocols and Instrumentation
sophisticated devices are able to monitor the scattered light simultaneously at many selectable scattering angles. An important feature is the construction of the sample cell holder, which should allow the use of different cell types. As discussed earlier, for folding studies and molecular weight determinations of proteins detection at one scattering angle is sufficient. Multi-angle detection is recommended for studies of protein assembly, when the size of the formed particles exceeds about 50 nm. A criterion for choosing the appropriate cell type is the decision between batch and flow experiments and the available amount of protein. Batch experiments require the lowest amount of substance, since standard microfluorescence cells with volumes of the order of 10 µL can be used. These cells have the further advantage that the protein concentration can be measured in the same cell. Flow-through cells, which are essential for stopped-flow experiments and on-line coupling to particle separation devices, like HPLC, FPLC, and field-flow fractionation, are also useful for batch experiments in the following respect. A direct connection of these cell to a filter unit are the best way to obtain perfectly cleaned samples. The cells must be cleaned first by flushing a large amount of water and solvent through the filter and the cell. The protein sample is then applied without removing the filter unit. Several cell types can be used including standard fluorescence flow-through cells, sample cells used in stopped-flow systems, or special flow-through cells provided by manufacturers of light scattering equipments. Most of these flow cells have the disadvantage of a restricted angular range. An exception is the cell used in the Wyatt multiangle laser light scattering (MALLS) device. Cylindrical cells in connection with an index matching bath have to be used for precise measurements at different, particularly small scattering angles. In the case of SLS experiments, measurements of the photocurrent with subsequent analog-to-digital conversion as well as photon-counting techniques are used. The latter technique is preferred in DLS experiments. An appropriate correlation electronics is no longer a problem for modern dynamic light scattering devices since practically ideal digital correlators can be built up. The minimum protein concentration needed in a DLS experiment depends on the molecular mass and the optical quality of the instrument, particularly on the available laser power. But, even at sufficiently high laser power of about 0.5 W the experiments become impractical below a certain concentration. This happens when the excess scattering of the protein solution falls below the scattering of pure water, e.g., for a protein with M = 10 kDa at concentrations below 0.25 mg mL−1 . Thus, DLS experiments can be done without special efforts at concentrations satisfying the condition c × M ≥ 5 kDa mg mL−1 . For SLS experiments the value of c × M can be about one order of magnitude lower. Removal of undesired contaminations, such as dust, bubbles, or other large particles, is a crucial step in preparing samples for light scattering experiments. Large particles can readily removed by membrane filters having pore sizes of about 0.1 µxm or by centrifugation at 10 000 g or higher, while size-exclusion chromatography or similar separation techniques have to be applied when a protein solution contains oligomers or small aggregates only slightly larger than the monomeric protein. Fluids used to clean the scattering cell must also be free of all particulates.
27
28 Studying Protein Folding and Aggregation by Laser Light Scattering
In an SLS experiment, the time-averaged scattering intensities of solvent, solution, and reference sample (scattering standard) have to be measured. Data evaluation has to be done according to Eqs (3) and (4). Commercial instruments are in general equipped with software packages that generate Zimm plots according to Eq. (4). Additionally, the software contains the Rayleigh ratios for scattering standards at the wavelength of the instrument. A widely used scattering standard is toluene. The Rayleigh ratios of toluene and other scattering standards at different wavelength are given in Ref [5]. Instrument calibration can also be done using an appropriate protein solution. However, the protein must be absolutely monomeric under the chosen solvent conditions. DLS measurements demand slightly more experience concerning experiment duration and data evaluation. Calibration is not needed. The basis for obtaining stable and reliable results is an adequately measured correlation function. A visual inspection of this primary data is recommended before any data evaluation is done. The signal-to-noise ratio that is needed depends on the complexity of the system under study. For example, short measurement times are acceptable only in the case of unimodal narrow size distributions, when a noisy correlation function can be well fitted by a single exponential. In practise, two data evaluation schemes are mainly used, the method of cumulants [9] yielding an average Stokes radius and the second and third moments of the size distribution and/or the approximate reconstruction of the size distribution by an inverse Laplace transformation, mostly using the program CONTIN [8]. The corresponding software packages are available from the author or are provided by the manufacturers of the instruments. CONTIN allows the reconstruction of a distribution function in terms of scattering power, weight concentration, or number (molar) concentration. Though the latter two types are more instructive, they should only be calculated when a reasonable particle shape can be anticipated within the considered size range. 6.1.2 Supplementary Measurements and Useful Options For the estimation of the molecular mass from SLS, and the diffusion coefficient and the Stokes radius from DLS, additional physical quantities are required. For the calculation of the optical constant in Eq. (2) the refractive index of the solvent and the refractive index increment of the protein in the particular solvent must be known. The former can be measured easily by an Abbe refractometer, whereas an differential refractometer is needed for precise measurements of ∂n/∂c. Measurements of ∂n/∂c are often more expensive than the light scattering experiment itself. A comprehensive collection of ∂n/∂c values can be found in [86]. ∂n/∂c = 0.19 mL g−1 is a good approximation for proteins in aqueous solution at moderate (<200 mM) salt concentrations. For calculations of RS from DLS data via Eq. (6) the dynamic viscosity η of the solvent must be known. η can be obtained from the kinematic viscosity v (e.g., measured by an Ubbelohde type viscometer) and the density p (measured by a digital densitometer, e.g., DMA 58, Anton Paar, Austria) by η = v × p. A very useful option for light scattering instruments is the on-line coupling with FPLC, HPLC, or field-flow fractionation (FFF) and a concentration
6 Protocols and Instrumentation
detector, which might be a UV absorption monitor or a refractive index detector. This allows direct measurements of M during the flow. In the case of known ∂n/∂c for the particular solvent conditions, the molecular mass can be obtained from the output signals of the SLS and refractive index detectors by M = ke × (∂n/∂c)−1 × (output)SLS /(output)RI , where ke is the instrument calibration constant. A parallel estimation of RS is possible, when the scattering is strong enough allowing a sufficiently precise DLS experiment during a few minutes. A further option is the coupling to a mixer allowing kinetic light scattering experiments. Two schemes can be used. Either a mixing device is coupled to a light scattering apparatus equipped with a flow cell [66] or the light scattering apparatus is built-up around a stopped-flow device originally constructed for fluorescence detection [56].
6.1.3 Commercially Available Light Scattering Instrumentation There are several companies that produce light scattering instruments and optional units. Many of them have taken into consideration the special demands for studies on protein solutions. Accordingly, most of the systems can handle very small sample volumes, have the sensitivity to study low concentrations of monomeric proteins, and can work in batch and flow mode. For those who are interested in applying SLS and DLS to folding and aggregation studies we have listed several companies and the corresponding websites in alphabetic order.
ALV-Laservertriebsgesellschaft m.b.H., Germany: http://www.alvgmbh.de Brookhaven Instruments Corporation, USA: http://www.bic.com Malvern Instruments Ltd., UK: http://www.malvern.co.uk Precision Detectors, USA: http://www.lightscatter.com Proterion Corporation, USA (formerly Protein Solutions): http://www. proterion.com Ĺ Wyatt Technology Corporation, USA: http://www.wyatt.com, visit also http://www.wyatt.de Ĺ Ĺ Ĺ Ĺ Ĺ
6.2 Experimental Protocols for the Determination of Molecular Mass and Stokes Radius of a Protein in a Particular Conformational State
The purpose of the protocols is to demonstrate critical steps of combined SLS/DLS measurements. If a commercial instrument with sophisticated data evaluation software is used, the individual steps may be involved, but slightly differently arranged in the software protocol. The protocols are based on the assumption that the instrument provides an average over time T of the scattered intensity IT and a set of N normalized correlation functions g (2) (τ ,T A calculated during an accumulation time T A and is equipped with the software to reconstruct the size distribution (CONTIN or similar packages).
29
30 Studying Protein Folding and Aggregation by Laser Light Scattering
Protocol 1
Determination of the apparent molecular mass Mapp , diffusion coefficient Dapp , and the corresponding Stokes radius Rapp of a protein at a given concentration cP at 20 ◦ C.
Equipment and reagents
SLS/DLS instrument with thermostatted cell holder UV spectrophotometer, preferentially single-beam instrument Abbe-type refractometer Flow-through cell for fluorometry, light path 1.5 mm (V = 25 µL, e.g., Hellma 176.152) and ultra-microfluorescence cell, light path 1.5 mm (V = 12 µL, e.g., Hellma 105.252). Ĺ Whatman filter Anotop and Anodisc, d = 10 mm, pore size 0.1 µm with appropriate filter holders, syringes 1-2 mL (laboratory centrifuge and pipettes for method B). Ĺ Suitable buffer, e.g., 50 mM Tris/HCl, pH 8, 100 mM KCl (the viscosity η = 1:018 cP at 20◦ C is not very different from that of water, η = 1:002 cP). Ĺ Ĺ Ĺ Ĺ
(A) Measurements with flow-through cells 1. Prepare the protein solution of the required concentration (suitable range 1-5 mg mL−1 ). 2. Wet the Anotop filter with distilled water and connect it to the flow-through cell (do not disconnect until the end of the experiment). 3. Flush at least 5 mL distilled water through filter and cell. 4. Insert the cell into the light scattering instrument and check for purity (the scattering intensity must be very low and “spikes” in intensity should be totally absent. Otherwise, repeat step 3. 5. Fill a syringe with 1-2 mL buffer and flush it through filter and cell (note that the actual dead volume of filter and cell is larger than the specified volume of the cell). 6. Insert the cell into the light scattering instrument and measure the scattering intensity of the buffer, I T,buffer . In some cases, it could be useful to run a short DLS experiment in order to check that the baseline is flat. 7. Measure the scattering intensity of the reference sample (toluene) in a separate cuvette of similar geometry. 8. Insert the flow-through cell with buffer into the UV spectrometer and measure the blank spectrum (note that the center height of the cell fits both the SLS/DLS and UV spectrophotometer). 9. Measure the refractive index of the buffer with the Abbe-type refractometer. 10. Flush about 1 mL protein solution slowly through filter and cell in order to replace the buffer by the protein solution (the protein solution that has passed filter and cell can be used to prepare a sample of lower concentration).
6 Protocols and Instrumentation
11. Measure the absorption spectrum of the protein solution in the flow-through cell, subtract the blank spectrum, and calculate the actual protein concentration in the cell on the basis of known extinction coefficients. 12. Insert the cell into the SLS/DLS instrument and start data acquisition. It is recommended to record the correlation function not in a single long run, but rather by superposition of N short runs (of about 10 s). This enables one to get a better overview about the stability and quality of the sample. Furthermore, good acquisition software allows to discard distorted runs. Though a correlation function is clearly built up within 10 s, it is mostly too noisy for adequate data evaluation. 13. Store all data and prepare data evaluation by entering additional parameters like temperature, refraction index, and viscosity. 14. Data evaluation, especially estimations of size distribution, should be done immediately giving the chance to continue the experiment. This could be useful if the resulting size distribution is still very unstable against variation in data evaluation parameters. 15. The size distribution should consist of one dominating peak in the present case. Otherwise, the protein is strongly prone to aggregation. The influence of stable aggregates of clearly distinct size can be eliminated in many cases as discussed at the end of Section 2. 16. Calculate M using Eqs. (??), and (4), the Rayleigh ratio of the reference sample (the value depends on the wavelength used in the instrument), and ∂n/∂c = 0.19 mL g−1 . (B) Measurements with ultra-microfluorescence cells Although Scheme A is superior from the light scattering point of view, a rather large amount of protein is needed. This can be circumvented by using ultra-micro cells. Here, we repeat only those steps that concern the preparation of buffer and protein solutions. Two approaches are applicable. (B1) Preparation using micro-filter units Several manufacturers (e.g. Proterion Corporation or Wyatt Technology Corporation) deliver micro-filter holder for Anodisc filter membranes with dead volumes less than 10 µL. This allows sample volumes as small as 25 µL to be filtered. The following steps have to be done. 1. Insert a dry filter membrane into the filter holder and wet the filter with buffer and filter about 100 µL buffer. 2. Flush the fluorescence cell with a strong nitrogen stream in order to remove any dust. 3. Filter about 50 µL buffer into the cell and check for purity in the light scattering instrument. Do the same measurements with the buffer and reference sample as indicated under Scheme A. (Empty the cell, wash is with distilled water, dry it with ethanol (Uvasol) and nitrogen and repeat steps 2 and 3 if dust or bubbles can be detected.)
31
32 Studying Protein Folding and Aggregation by Laser Light Scattering
4. Empty the cell carefully (washing and drying is not necessary) and filter about 25 µL protein solution into the cell. 5. Close the cell, determine the concentration and measure SLS and DLS as under Scheme A. 6. This procedure is nearly as good as filtration according to Scheme A, but the presence of dust and bubbles cannot be totally excluded. (B2) Preparation by centrifugation This procedure has to be applied when only about 15 µL of protein solution are available and differs from Scheme B1 only by step 4. The protein solution centrifugated about 15 min at 10 000 g or more has to be transferred into the scattering cell. An alternative is careful centrifugation within the scattering cell (note that ultra-micro cells are rather expensive!).
Protocol 2
Determination of the molecular mass, diffusion coefficient D, Stokes radius RS and virial coefficients of a protein in a particular state. This experimental procedure is a repetition of the sequences given in Protocol 1 for at least four protein concentrations. Scheme A is recommended, whereas a stock solution of 1 mL with the highest concentration is needed. Additional data evaluation concerns extrapolation of (H × c)/Rq (Eq. (4) with P(q) ≈ 1) and D to zero protein concentration.
Acknowledgments
The authors are grateful to Stephen Harding, University Nottingham, for valuable comments and suggestions.
References 1 JAKEMAN, E., PUSEY, P. N. & VAUGHAN, J. M. (1976). Intensity fluctuation light-scattering spectroscopy using a conventional source. Optics Commun. 17, 305–308. 2 HUGLIN, M. (1972). Light Scattering from Polymer Solutions, Academic Press, New York. 3 KRATOCHVIL, P. (1987). Classical Light Scattering from Polymer Solutions, Elsevier, Amsterdam. 4 SCHMITZ, K. S. (1990). An Introduction to Dynamic Light Scattering by
5 6 7
8
Macro-molecules. Academic Press, New York. CHU, B. (1991). Laser Light Scattering. Academic Press, New York. BROWN, W. (1993). Dynamic Light Scattering. Claredon Press, Oxford. BERNE, B. J. & PECORA, R. (2000). Dynamic Light Scattering with Applications to Chemistry, Biology, and Physics. Dover Publications, Mineola, New York. PROVENCHER, S. W. (1982). CONTIN-a general-purpose constrained regularization program for inverting
References
9
10
11
12
13
14
15
16
17
noisy linear algebraic and integralequations. Com. Phys. Commun. 27, 229–242. KOPPEL, D. E. (1972). Analysis of macromolecular polydispersity in intensity correlation spectroscopy: the method of cumulants. J. Chem. Phys. 57, 4814–4820. HARDING, S. E. & JOHNSON, P. (1985). The concentration dependence of macromolecular parameters. Biochem. J. 231, 543–547. GARCIA BERNAL, J. M. & de LA TORRE, J. G. (1981). Transport-properties of oligomeric subunit structures. Biopolymers 20, 129–139. CLAES, P., DUNFORD, M., KENNEY, A. & PENNY, V. (1992). An on-line dynamic light scattering instrument for macromolecular characterization. In Laser Light Scattering in Biochemistry (HARDING, S. E., SATTELLE, D. B. & BLOOMFIELD, V. A., eds), pp. 66–76. Royal Society of Chemistry, Cambridge. NICOLI, D. F. & BENEDEK, G. B. (1976). Study of the thermal denaturation of lysozyme and other globular proteins by light-scattering spectroscopy. Biopolymers 15, 2421–2437. GAST, K., ZIRWER, D., DAMASCHUN, H., HAHN, U., M¨ULLER-FROHNE, M., WIRTH, M. & DAMASCHUN, G. (1997). Ribonuclease T1 has different dimensions in the thermally and chemically denatured states: a dynamic light scattering study. FEBS Lett. 403, 245– 248. FABIAN, H., F¨ALBER, K., GAST, K., REINSTADLER, D., ROGOV, V. V., NAUMANN, D., ZAMYATKIN, D. F. & FILIMONOV, V. V. (1999). Secondary structure and oligomerization behavior of equilibrium unfolding intermediates of the lambda Cro repressor. Biochemistry 38, 5633–5642. PRIVALOV, P. L. (1990). Cold denaturation of proteins. Crit. Rev. Biochem. Mol. Biol. 25, 281–305. GRIKO, Y. V., VENYAMINOV, S. Y. & PRIVALOV, P. L. (1989). Heat and cold denaturation of phosphoglycerate kinase (interaction of domains). FEBS Lett. 244, 276–278.
18 DAMASCHUN, G., DAMASCHUN, H., GAST, K., MISSELWITZ, R., M¨ULLER, J. J., PFEIL, W. & ZIRWER, D. (1993). Cold denaturation-induced conformational changes in phosphoglycerate kinase from yeast. Biochemistry 32, 7739–7746. 19 AGASHE, V. R., UDGAONKAR, J. B. (1995). Thermodynamics of denaturation of barstar: evidence for cold denaturation and evaluation of the interaction with guanidine hydrochloride. Biochemistry 34, 3286–3299. 20 WONG, K. B., FREUND, S. M. V. & FERSHT, A. R. (1996). Cold denaturation of barstar: H-1, N-15 and C-13 NMR assignment and characterisation of residual structure. J. Mol. Biol. 259, 805–818. 21 TANFORD, C. (1968). Protein denaturation. Adv. Protein Chem. 23, 121–282. 22 DUBIN, S. B., FEHER, G. & BENEDEK, G. B. (1973). Study of the chemical denaturation of lysozyme by optical mixing spectroscopy. Biochemistry 12, 714–719. 23 N¨OPPERT, A., GAST, K., M¨ULLER-FROHNE, M., ZIRWER, D. & DAMASCHUN, G. (1996). Reduced-denatured ribonuclease A is not in a compact state. FEBS Lett. 380, 179–182. 24 FINK, A. L., CALCIANO, L. J., GOTO, Y., KUROTSU, T. & PALLEROS, D. R. (1994). Classification of acid denaturation of proteins: intermediates and unfolded states. Biochemistry 33, 12504–12511. 25 BARRICK, D., HUGHSON, F. M. & BALDWIN, R. L. (1994). Molecular mechanism of acid denaturation-the role of histidine-residues in the partial unfolding of apomyoglobin. J. Mol. Biol. 237, 588–601. 26 PTITSYN, O. B. (1995). Molten globule and protein folding. Adv. Protein Chem. 47, 83–229. 27 ARAI, M. & KUWAJIMA, K. (2000). Role of the molten globule state in protein folding. Adv. Protein Chem. 53, 209–282. 28 KUWAJIMA, K. & ARAI, M. (2000). The molten globule state: the physical picture and biological significance. In Mechanisms of Protein Folding (PAIN, R. H., ed.), pp. 138–174. Oxford University Press, Oxford. 29 GAST, K., ZIRWER, D., M¨ULLER-FROHNE, M. & DAMASCHUN, G. (1998). Compactness of
33
34 Studying Protein Folding and Aggregation by Laser Light Scattering
30
31
32
33
34
35
36
37
38
the kinetic molten globule of bovine alpha-lactalbumin: a dynamic light scattering study. Protein Sci. 7, 2004–2011. GAST, K., DAMASCHUN, H., MISSELWITZ, R., M¨ULLER-FROHNE, M., ZIRWER, D. & DAMASCHUN, G. (1994). Compactness of protein molten globules: temperature-induced structural changes of the apomyoglobin folding intermediate. Eur. Biophys. J. 23, 297–305. KATAOKA, M., KUWAJIMA, K., TOKUNAGA, F. & GOTO, Y. (1997). Structural characterization of the molten globule of alpha-lactalbumin by solution x-ray scattering. Protein Sci. 6, 422–430. KATAOKA, M., NISHII, I., FUJISAWA, T., UEKI, T., TOKUNAGA, F. & GOTO, Y. (1995). Structural characterization of the molten globule and native states of apomyoglobin by solution X-ray scattering. J. Mol. Biol. 249, 215–228. GAST, K., ZIRWER, D., M¨ULLER-FROHNE, M. & DAMASCHUN, G. (1999). Trifluoroethanol-induced conformational transitions of proteins: insights gained from the differences between alpha-lactalbumin and ribonuclease A. Protein Sci. 8, 625–634. DAMASCHUN, G., DAMASCHUN, H., GAST, K. & ZIRWER, D. (1999). Proteins can adopt totally different folded conformations. J. Mol. Biol. 291, 715–725. WRIGHT, P. E. & DYSON, H. J. (1999). Intrinsically unstructured proteins: Re-assessing the protein structure-function paradigm. J. Mol. Biol. 293, 321–331. GAST, K., DAMASCHUN, H., ECKERT, K., SCHULZE-FORSTER, K., MAURER, H. R., MULLER, F. M., ZIRWER, D., CZARNECKI, J. & DAMASCHUN, G. (1995). Prothymosin alpha: A biologically active protein with random coil conformation. Biochemistry 34, 13211–13218. DAMASCHUN, G., DAMASCHUN, H., GAST, K., MISSELWITZ, R., ZIRWER, D., G¨UHRS, K. H., HARTMANN, M., SCHLOTT, B., TRIEBEL, H. & BEHNKE, D. (1993). Physical and conformational properties of staphylokinase in solution. Biochim. Biophys. Acta 1161, 244–248. UVERSKY, V. N. (1993). Use of fast protein size-exclusion liquid-chromatography to
39
40
41
42
43
44
45
46
47
48
49
50
study the unfolding of proteins which denature through the molten globule. Biochemistry 32, 13288–13298. WILKINS, D. K., GRIMSHAW, S. B., RECEVEUR, V., DOBSON, C. M., JONES, J. A. & SMITH, L. J. (1999). Hydrodynamic radii of native and denatured proteins measured by pulse field gradient NMR techniques. Biochemistry 38, 16424– 16431. FLORY, P. J. (1969). Statistical Mechanics of Chain Molecules. John Wiley & Sons, New York. BALDWIN, R. L. & ZIMM, B. H. (2000). Are denatured proteins ever random coils? Proc. Natl Acad. Sci. USA 97, 12391–12392. ZHOU, H. X. (2002). Dimensions of denatured protein chains from hydrodynamic data. J. Phys. Chem. 106, 5769–5775. TANFORD, C., KAWAHARA, K. & LAPANJE, S. (1966). Proteins in 6 M guanidine hydrochloride. Demonstration of random coil behavior. J. Biol. Chem. 241, 1921–1923. DAMASCHUN, G., DAMASCHUN, H., GAST, K. & ZIRWER, D. (1998). Denatured states of yeast phosphoglycerate kinase. Biochemistry (Moscow) 63, 259–275. BYRON, O. (2000). Hydrodynamic bead modeling of biological macromolecules. Methods Enzymol. 321, 278–304. GARCIA DE LA TORRE, J., HUERTAS, M. L. & CARRASCO, B. (2000). Calculation of hydrodynamic properties of globular proteins from their atomic-level structure. Biophys. J. 78, 719–730. HARDING, S. E. (2001). The hydration problem in solution biophysics: an introduction. Biophys. Chem. 93, 87–91. HALLE, B. & DAVIDOVIC, M. (2003). Biomolecular hydration: From water dynamics to hydrodynamics. Proc. Natl Acad. Sci. USA 100, 12135–12140. GARCIA DE LA TORRE, J. (2001). Hydration from hydrodynamics. General considerations and applications of bead modelling to globular proteins. Biophys. Chem. 93, 159–170. FENG, H. P. & WIDOM, J. (1994). Kinetics of compaction during lysozyme refolding studied by continuous-flow quasielastic
References
51
52
53
54
55
56
57
58
59
60
light scattering. Biochemistry 33, 13382–13390. GAST, K., ZIRWER, D. & DAMASCHUN, G. (2000). Time-resolved dynamic light scattering as a method to monitor compaction during protein folding. In Data Evaluation in Light Scattering of Polymers (HELMSTEDT, M. & GAST, K., eds), pp. 205–220. Wiley-VCH, Weinheim. SHASTRY, M. C. R., LUCK, S. D. & RODER, H. (1998). A continuous-flow capillary mixing method to monitor reactions on the microsecond time scale. Biophys. J. 74, 2714–2721. SEGEL, D. J., BACHMANN, A., HOFRICHTER, J., HODGSON, K. O., DONIACH, S. & KIEFHABER, T. (1999). Characterization of transient intermediates in lysozyme folding with time-resolved small-angle X-ray scattering. J. Mol. Biol. 288, 489–499. POLLACK, L., TATE, M. W., DARNTON, N. C., KNIGHT, J. B., GRUNER, S. M., EATON, W. A. & AUSTIN, R. H. (1999). Compactness of the denatured state of a fast-folding protein measured by submillisecond small-angle x-ray scattering. Proc. Natl Acad. Sci. USA 96, 10115–10117. ARAI, M., ITO, K., INOBE, T., NAKAO, M., MAKI, K., KAMAGATA, K., KIHARA, H., AMEMIYA, Y. & KUWAJIMA, K. (2002). Fast compaction of alpha-lactalbumin during folding studied by stopped-flow X-ray scattering. J. Mol. Biol. 321, 121–132. N¨OPPERT, A., GAST, K., ZIRWER, D. & DAMASCHUN, G. (1998). Initial hydrophobic collapse is not necessary for folding RNase A. Folding Des. 3, 213–221. JAENICKE, R. & SECKLER, R. (1997). Protein misassembly in vitro. Adv. Protein Chem. 50, 1–59. De YOUNG, L. R., DILL, K. A. & FINK, A. L. (1993). Aggregation of globular proteins. Acc. Chem. Res. 26, 614–620. ZETTLMEISSL, G., RUDOLPH, R. & JAENICKE, R. (1979). Reconstitution of lactic dehydrogenase. Noncovalent aggregation vs. reactivation. 1. Physical properties and kinetics of aggregation. Biochemistry 18, 5567–5571. FINKE, J. M., ROY, M., ZIMM, B. H. & JENNINGS, P. A. (2000). Aggregation events
61
62
63
64
65
66
67
68
69
70
occur prior to stable intermediate formation during refolding of interleukin 1 beta. Biochemistry 39, 575–583. JOSSANG, T., FEDER, J. & ROSENQVIST, E. (1985). Heat aggregation kinetics of human IgG. J. Chem. Phys. 82, 574–589. V. SMOLUCHOWSKI, M. (1917). Versuch einer mathematischen Theorie der Koagulationskinetik kolloider L¨osungen. Z. Phys. Chem. 92, 129–168. AYMARD, P., NICOLAI, T., DURAND, D. & CLARK, A. (1999). Static and dynamic scattering of beta-lactoglobulin aggregates formed after heat-induced denaturation at pH 2. Macromolecules 32, 2542–2552. LE BON, C., NICOLAI, T. & DURAND, D. (1999). Kinetics of aggregation and gelation of globular proteins after heat-induced denaturation. Macromolecules 32, 6120–6127. BAUER, R., CARROTTA, R., RISCHEL, C. & OGENDAL, L. (2000). Characterization and isolation of intermediates in beta-lactoglobulin heat aggregation at high pH. Biophys. J. 79, 1030–1038. BERNOCCO, S., FERRI, F., PROFUMO, A., CUNIBERTI, C. & ROCCO, M. (2000). Polymerization of rod-like macromolecular monomers studied by stopped-flow, multiangle light scattering: Set-up, data processing, and application to fibrin formation. Biophys. J. 79, 561–583. BUCHNER, J., SCHMIDT, M., FUCHS, M., JAENICKE, R., RUDOLPH, R., SCHMID, F. X. & KIEFHABER, T. (1991). GroE facilitates refolding of citrate synthase by suppressing aggregation. Biochemistry 30, 1586–1591. FEZOUI, Y., HARTLEY, D. M., HARPER, J. D., KHURANA, R., WALSH, D. M., CONDRON, M. M., SELKOE, D. J., LANSBURY, P. T., FINK, A. L. & TEPLOW, D. B. (2000). An improved method of preparing the amyloid beta-protein for fibrillogenesis and neurotoxicity experiments. Amyloid 7, 166–178. LOMAKIN, A., BENEDEK, G. B. & TEPLOW, D. B. (1999). Monitoring protein assembly using quasielastic light scattering spectroscopy. Methods Enzymol. 309, 429–459. TOMSKI, S. J. & MURPHY, R. M. (1992). Kinetics of aggregation of synthetic
35
36 Studying Protein Folding and Aggregation by Laser Light Scattering
71
72
73
74
75
76
77
78
beta-amyloid peptide. Arch. Biochem. Biophys. 294, 630–638. MURPHY, R. M. & PALLITTO, M. R. (2000). Probing the kinetics of beta-amyloid self-association. J. Struct. Biol. 130, 109–122. LOMAKIN, A., CHUNG, D. S., BENEDEK, G. B., KIRSCHNER, D. A. & TEPLOW, D. B. (1996). On the nucleation and growth of amyloid beta-protein fibrils: Detection of nuclei and quantitation of rate constants. Proc. Natl Acad. Sci. USA 93, 1125– 1129. LOMAKIN, A., TEPLOW, D. B., KIRSCHNER, D. A. & BENEDEK, G. B. (1997). Kinetic theory of fibrillogenesis of amyloid beta-protein. Proc. Natl Acad. Sci. USA 94, 7942– 7947. KUSUMOTO, Y., LOMAKIN, A., TEPLOW, D. B. & BENEDEK, G. B. (1998). Temperature dependence of amyloid beta-protein fibrillization. Proc. Natl Acad. Sci. USA 95, 12277–12282. MODLER, A. J., GAST, K., LUTSCH, G. & DAMASCHUN, G. (2003). Assembly of amyloid protofibrils via critical oligomers-a novel pathway of amyloid formation. J. Mol. Biol. 325, 135– 148. HORNEMANN, S. & GLOCKSHUBER, R. (1998). A scrapie-like unfolding intermediate of the prion protein domain PrP(121-231) induced by acidic pH. Proc. Natl Acad. Sci. USA 95, 6010–6014. MORILLAS, M., VANIK, D. L. & SUREWICZ, W. K. (2001). On the mechanism of alpha-helix to beta-sheet transition in the recombinant prion protein. Biochemistry 40, 6982–6987. BASKAKOV, I. V., LEGNAME, G., BALDWIN, M. A., PRUSINER, S. B. & COHEN, F. E. (2002). Pathway complexity of prion protein assembly into amyloid. J. Biol. Chem. 277, 21140–21148.
79 SOKOLOWSKI, F., MODLER, A. J., MASUCH, R., ZIRWER, D., BAIER, M., LUTSCH, G., MOSS, D. A., GAST, K. & NAUMANN, D. (2003). Formation of critical oligomers is a key event during conformational transition of recombinant Syrian hamster prion protein. J. Biol. Chem. 278, 40481–40492. 80 RICHARDSON, J. S. & RICHARDSON, D. C. (2002). Natural beta-sheet proteins use negative design to avoid edge-to-edge aggregation. Proc. Natl Acad. Sci. USA 99, 2754–2759. 81 FLORY, P. J. (1953). Principles of Polymer Chemistry. Cornell University Press, Ithaca. 82 OOSAWA, F. & ASAKURA, S. (1975). Thermodynamics of the Polymerization of Protein. Academic Press, London. 83 EATON, W. A. & HOFRICHTER, J. (1990). Sickle cell hemoglobin polymerization. Adv. Protein Chem. 40, 63–279. 84 JARRETT, J. T. & LANSBURY, P. T., Jr. (1993). Seeding “one-dimensional crystallization” of amyloid: a pathogenic mechanism in Alzheimer’s disease and scrapie? Cell 73, 1055–1058. 85 SERIO, T. R., CASHIKAR, A. G., KOWAL, A. S., SAWICKI, G. J., MOSLEHI, J. J., SERPELL, L., ARNSDORF, M. F. & LINDQUIST, S. L. (2000). Nucleated conformational conversion and the replication of conformational information by a prion determinant. Science 289, 1317–1321. 86 THEISEN, A., JOHANN, C., DEACON, M. P. & HARDING, S. E. (2000). Refractive Increment Data-book. Nottingham University Press, Nottingham. 87 GAST, K., MODLER, A. J., DAMASCHUN, H., KRO¨ BER, R., LUTSCH, G., ZIRWER, D., GOLBIK, R. & DAMASCHUN, G. (2003). Effect of environmental conditions on aggregation and fibril formation of barstar. Eur. Biophys. J. 32, 710–723.
1
Spectroscopic Techniques to Study Protein Conformation Franz Schmid Universit¨at Bayreuth, Bayreuth, Germany
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag Gmbh & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Optical spectroscopy provides the standard techniques for measuring the conformational stabilities of proteins and for following the kinetics of unfolding and refolding reactions. Practically all unfolding and refolding reactions are accompanied by changes in absorbance, fluorescence, or optical activity (usually measured as the circular dichroism). These optical properties scale linearly with protein concentration. Spectra and spectral changes can be measured with very high accuracy in dilute protein solutions, and spectrophotometers are standard laboratory equipment. Therefore the methods of optical spectroscopy will remain the basic techniques for studying protein folding. Light absorbance occurs in about 10−15 s and the excited states show lifetimes of around 10−8 s. Optical spectroscopy is thus well suited for measuring the kinetics of ultrafast processes in folding (e.g., in temperature-jump or pressure-jump experiments). Proteins absorb and emit light in the UV range of the spectrum. The absorbance originates from the peptide groups, from the aromatic amino acid side chains and from disulfide bonds. The fluorescence emission originates from the aromatic amino acids. Some proteins that carry covalently linked cofactors, such as the heme proteins, show absorbance in the visible range. In absorption, light energy is used to promote electrons from the ground state to an excited state, and when the excited electrons revert back to the ground state they can lose their energy in the form of emitted light, which is called fluorescence. When a chromophore is part of an asymmetric structure, or when it is immobilized in an asymmetric environment, left-handed and right-handed circularly polarized light are absorbed to different extents, and we call this phenomenon circular dichroism (CD). Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Spectroscopic Techniques to Study Protein Conformation
The energies of the ground and the activated state depend on the molecular environment and the mobility of the chromophores, and therefore the optical properties of folded and unfolded proteins are usually different. Spectroscopy is a nondestructive method, and, if necessary, the samples can be recovered after the experiments. This chapter provides an introduction into the spectral properties of amino acids and proteins as well as practical advice for the planning and the interpretation of spectroscopic measurements of protein folding and stability.
2 Absorbance
Light absorption can occur when an incoming photon provides the energy that is necessary to promote an electron from the ground state to an excited state. The energy (E) of light depends on its frequency v (E = hn) and is thus inversely correlated with its wavelength λ (E = hc/λ). Light in the 200-nm region has sufficient energy for exciting electrons that participate in small electronic systems such as double bonds. Aromatic molecules have lower lying excited states, and therefore they also absorb light in the region between 250 and 290 nm, which is called the “near UV” or “aromatic” region of a spectrum. The amino acids phenylalanine, tyrosine, and tryptophan absorb here, as well as the bases in nucleic acids. The aromatic region is used extensively to follow the conformational transitions of proteins and nucleic acids. Light in the visible region of the spectrum can only excite electrons that participate in extended delocalized π systems. Such systems do not occur in normal proteins, but can be introduced by site-specific labeling. The basic aspects of protein absorption are very well covered in Refs. [1–6]. The absorbance (A) is related to the intensity of the light before (Io ) and after (I) passage through the protein solution: A=−log10 (I/Io )
(1)
The absorbance depends linearly on the concentration of the absorbing molecules in a solution, and absorbance can be measured very easily and with very high accuracy. Therefore absorbance spectroscopy is the most common method for determining the concentration of biological macromolecules in solution. The relation between A and the molar concentration c is given by the Beer–Lambert relationship: A=ε×c×l
(2)
where l is the pathlength in cm, and ε is the molar absorption coefficient, which can be determined experimentally or calculated for proteins by adding up the contributions of the constituent aromatic amino acids of a protein [7–10].
2 Absorbance
Fig. 1 Absorption spectrum of lysozyme. The spectrum was measured at a protein concentration of 0.65 :M in 10 mM K-phosphate buffer pH 7.0, 25 ◦ C, in a 1-cm cell. The inset provides an expanded view of the spectrum in the near-UV region. It was measured with 5.5 :M lysozyme under otherwise identical conditions.
2.1 Absorbance of Proteins
Proteins absorb light in the UV range. As an example, the absorption spectrum of lysozyme is shown in Figure 1. The peptide bonds of the protein backbone absorb strongly between 180 and 230 nm. Figure 1 shows the descending slope of this absorption. The shoulder near 220 nm also contains significant contributions from the aromatic residues of lysozyme. The inset of Figure 1 provides an expanded view of the near-UV region of the spectrum. The absorption here is caused by the aromatic side chains of tyrosine, tryptophan, and phenylalanine. Disulfide bonds display a very weak absorbance band around 250 nm. Figure 1 emphasizes that proteins absorb much more strongly in the far-UV region than in the near-UV (the “aromatic”) region, which is commonly used to measure protein concentrations and to follow protein unfolding. Spectra of the individual aromatic amino acids are shown in Figure 2; their molar absorptions at the respective maxima in the near UV are compared in Table 1. The contributions of phenylalanine, tyrosine, and tryptophan to the absorbance of a protein are strongly different. In the aromatic region the molar absorption of phenylalanine (λmax = 258 nm, 258 = 200 M−1 cm−1 ) is smaller by an order of magnitude compared with those of tyrosine (λmax = 275 nm, 275 = 1400 M−1 cm−1 ) and tryptophan (λmax = 280 nm, 280 = 5600 M−1 cm−1 ), and it is virtually zero above 270 nm. The near-UV spectrum of a protein is therefore dominated by the contributions of tyrosine and tryptophan. The shape of the absorbance spectrum of a protein reflects the relative contents of the three aromatic residues in the molecule.
3
4 Spectroscopic Techniques to Study Protein Conformation
Fig. 2 Ultraviolet absorption spectra of the aromatic amino acids in a 1-cm cell in 0.01 M potassium phosphate buffer, pH 7.0 at 25 ◦ C. (a) 1 mM phenylalanine, (b) 0.1 mM tyrosine, (c) 0.1 mM tryptophan. The figure is taken from Ref. [6].
The absorbance maximum near 280 nm is used for determining the concentrations of protein solutions and the 280–295 nm range is used to follow protein unfolding reactions. In this wavelength range only tyrosine and tryptophan contribute to the observed absorbance and to the changes caused by conformational changes. With the distinct fine structure of their absorbance (Figure 2) the phenylalanine residues contribute “wiggles” in the 250–260 nm region, and they influence the ratio A260 /A280 , which is often used to identify contaminating nucleic acids in protein preparations. The aromatic amino acids absorb only below 310 nm, and therefore pure protein solutions should not show absorbance at wavelengths greater than 310 nm. Proteins that are devoid of tryptophan should not absorb above 300 nm (cf. Figure 2). A sloping baseline in the 310–400 nm region usually originates from light scattering when large particles, such as aggregates are present in the protein solution. A plot Table 1 Absorbance and fluorescence properties of the aromatic amino acids
Compound
Tryptophan Tyrosine Phenylalanine a b
Absorbance
Fluorescence
Sensitivity
kmax (nm)
max (M−1 cm−1 )
kmax (nm)
φFb
max × φ F b (M−1 cm−1 )
280 275 258
5600 1400 200
355 304 282
0.13 0.14 0.02
730 200 4
In water at neutral pH; data are from Ref. [26]. F , fluorescence quantum yield.
2 Absorbance
of log10 A as a function of log10 wavelength (above 310 nm) reveals the contribution of light scattering, and the linear extrapolation to shorter wavelength can be used to correct the measured absorbance for the contribution of scattering [5]. Figure 3 reflects the strong influence of tryptophan residues on absorbance spectra. It compares the absorption spectra of the wild-type form (Figure 3a) and the Trp59Tyr variant (Figure 3b) of ribonuclease T1. The wild-type protein contains one tryptophan and nine tyrosines, the Trp59Tyr variant contains 10 tryosines and no tryptophan. The Trp59Tyr replacement reduces the absorbance in the 280–290 nm region significantly (Figure 3c), and the absorbance is essentially zero above 300 nm. In folded proteins, most of the aromatic residues are buried in the hydrophobic core of the molecule, and upon unfolding they become exposed to the aqueous solvent. The spectral changes that accompany unfolding are dominated by a shift of the absorption spectra of the aromatic amino acids towards lower wavelengths
Fig. 3 Ultraviolet absorption spectra of a) the wild-type form, and c) the Trp59Tyr variant of ribonuclease T1. The spectra of the native proteins (in 0.1 M sodium acetate, pH 5.0) are shown by the continuous lines. The spectra of the unfolded proteins (in 6.0 M GdmCl in the same buffer) are shown by broken lines. The difference spectra between
the native and unfolded forms are shown in b) and d). Spectra of 15 :M protein were measured at 25◦ C in 1-cm cuvettes in a double-beam instrument with a bandwidth of 1 nm at 25◦ C. The spectra of the native and unfolded proteins were recorded successively, stored, and subtracted. The figure is taken from Ref [6].
5
6 Spectroscopic Techniques to Study Protein Conformation
(a blue shift). In addition, small changes in intensity and peak widths are observed as well. The blue shift upon unfolding is typically in the range of 3 nm. It leads to maxima in the difference spectra between native and unfolded proteins in the descending slope of the original spectrum, that is in the 285–288 nm region for tyrosine and around 291–294 nm for tryptophan [11]. Figure 3 shows also the spectra for the unfolded forms of the two variants of ribonuclease T1. The difference spectra in Figure 3b,d show that the maximal differences in absorbance are indeed observed in the descending slopes of the original spectra in the 285–295 nm region. The difference spectrum of the Trp59Tyr variant (Figure 3d) shows a prominent maximum at 287 nm. It is typical for proteins that contain tyrosine residues only. In the wild-type protein with its single tryptophan (Figure 3b) additional signals are observed between 290 and 300 nm. They originate from the absorbance bands of tryptophan, which are visible as shoulders in the protein spectrum (Figure 3a). Difference spectra between the native and the unfolded form, such as those for the variants of ribonuclease T1 in Figure 3b,d are observed for most proteins. Generally, these differences are small, but they can be determined with very high accuracy. The contributions of phenylalanine residues to difference spectra are very small. In some cases they are apparent as ripples in the spectral region around 260 nm (cf. Figure 3b,d). They are easily mistaken as instrumental noise. Proteins are usually unfolded by increasing the temperature or by adding destabilizing agents such as urea or guanidinium chloride (GdmCl). These changes not only unfold the protein, but they also influence the spectral properties of the protein solution. The refractive index of the solution decreases slightly with increasing temperature, the protein concentration decreases because the solution expands upon heating, and the ionization state of dissociable groups can change. All these effect are very small, and in combination they often lead to an almost temperature independent protein absorbance in the absence of unfolding. Chemical denaturants have a significant influence on the absorbances of tyrosine and tryptophan at 287 nm and at 291 nm, respectively, where protein unfolding is usually monitored. The refractive index of the solvent increases as a function of the concentration of GdmCl or urea, which leads to a slight shift to higher wavelengths (a red shift) of the spectra and thus to an increase in absorbance at 287 and 291 nm. A291 of tryptophan increases by about 50% when 7 M GdmCl is added. Similar increases are also observed for tyrosine at 287 nm and when urea is used as the denaturant. The GdmCl and urea dependences of tyrosine and tryptophan absorbance are found in Ref. [6]. The spectrum of the unfolded form of a protein and its denaturant dependence can be modeled by mixing tyrosine and tryptophan in a ratio as in the protein under investigation and by measuring the absorbance of this mixture as a function of the denaturant concentration. Such data provide adequate base lines for denaturant-induced unfolding transitions. The absorbance changes upon unfolding are measured in the steeply descending slopes of the absorbance spectra (cf. Figure 3) and consequently they depend on the instrumental settings (in particular on the bandwidth). The reference data must therefore be determined under the same experimental settings as the actual unfolding transition.
2 Absorbance
2.2 Practical Considerations for the Measurement of Protein Absorbance
The solvents used for spectroscopic measurements should be transparent in the wavelength range of the experiment. Water absorbs only below 170 nm, but several of the commonly used buffers absorb significantly already in the region below 230 nm. It is therefore important to select appropriate buffers for measurements in this spectral region and to carefully match buffer concentrations in the sample and the reference cell. Absorbance values of commonly used salt and buffer solutions are found in Table 2. Denaturants such as urea and GdmCl are used in very high concentrations (typically 1–10 M) in protein unfolding experiments. Therefore, the purity of these unfolding agents is of utmost importance. Even impurities in the 1 ppm range are present in the same range of concentration as the protein to be studied (often near 1 µM). Such impurities (such as cyanate in urea solutions) might lead to chemical modifications of the protein. In addition, they may absorb light and thus interfere with the measurement of protein difference spectra. Solutions for spectroscopy should be filtered through 0.45-µm filters or centrifuged to remove dust particles. For measuring difference spectra such as in protein unfolding studies, a double-beam spectrophotometer and a set of matched quartz cells should be used. The equivalence of two cuvettes is easily tested by Table 2 Absorbance of various salt and buffer solutions in the far-UV region
Compound
No absorbance above (nm) Absorbance of a 0.01 M solution in a 0.1-cm cell at: 210 nm 200 nm 190 nm
NaClO4 NaF, KF Boric acid NaCl Na2 HPO4 NaH2 PO4 Na-acetate Glycine Diethylamine NaOH Boric acid, NaOH Tricine Tris Hepes Pipes Mops Mes Cacodylate
pH 12 pH 9.1 pH 8.5 pH 8.0 pH 7.5 pH 7.0 pH 7.0 pH 6.0 pH 6.0
170 170 180 205 210 195 220 220 240 230 200 230 220 230 230 230 230 210
0 0 0 0 0 0 0.03 0.03 0.4 >0.5 0 0.22 0.02 0.37 0.20 0.10 0.07 0.01
0 0 0 0.02 0.05 0 0.17 0.1 >0.5 >2 0 0.44 0.13 0.5 0.49 0.34 0.29 0.20
0 0 0 >0.5 0.3 0.01 >0.5 >0.5 >0.5 >2 0.09 >0.5 0.24 >0.5 0.29 0.28 0.29 0.22
180 nm 0 0 0 >0.5 >0.5 0.15 >0.5 >0.5 >0.5 >2 0.3 >0.5 >0.5 >0.5 >0.5 >0.5 >0.5 >0.5
Buffers were titrated with 1 M NaOH or 0.5 M H2 SO4 to the indicated pH values. Data are from Ref. [6].
7
8 Spectroscopic Techniques to Study Protein Conformation
placing aliquots of the same solution (e.g., a solution of the protein of interest) into sample and reference cuvette. The difference in absorbance between both cells should ideally be zero (or <0.5% of the measured absorbance) in the wavelength range of interest. 2.3 Data Interpretation
Absorbance spectroscopy is an accurate method, but usually spectral changes such as those observed upon protein unfolding cannot be interpreted in molecular terms. The size and shape of difference spectra depends on the nature and the number of the aromatic amino acids as well as on the degree of burial of their side chains in the interior of the native protein. Changes can be assigned to particular aromatic residues only for proteins that carry only a single tyrosine or tryptophan residue, for example, or for mutants that differ from the reference protein only in a single aromatic residue (such as the ribonuclease T1 variants in Figure 3). In the early days of protein spectroscopy there were efforts to determine by difference spectroscopy the number of aromatic residues that are buried in the native protein and become exposed upon unfolding. Donovan [3] presented estimates for the change in absorbance produced by the transfer of a residue from the protein interior to water. He suggested numbers of −700 M−1 cm−1 at 287 nm for tyrosine and −1600 M−1 cm−1 at 292 nm for tryptophan. These values were derived by assuming that the transfer from the interior of a protein to an aqueous environment is equivalent to a transfer from 120% ethylene glycol to water. In a protein the absorbance changes upon unfolding depend on the environment of the aromatic residues in the folded protein, which are different for the individual residues, and therefore the numbers given above are, at best, reasonable first approximations. In summary, the molecular interpretation of absorption differences is often vague. The usefulness of absorption spectroscopy for studies of protein folding rather lies in the favourable properties of absorbance and the ease of the measurements. Absorbance changes linearly with concentration, and even small differences in absorbance can be measured with very high accuracy and reproducibility. Moreover, absorbance can be followed with a very high time resolution after rapid mixing or after perturbations such as a temperature or a pressure jump. Thus, absorbance spectroscopy remains to be a powerful technique for determining the stability of proteins and for following the kinetics of protein unfolding and refolding.
3 Fluorescence
A molecule emits fluorescence when, after excitation, an electron returns from the first excited state (S1 ) back to the ground state (S0 ). At about 10−8 s, the lifetime
3 Fluorescence
of the excited state is very long compared with the time for excitation, which takes only about 10−15 s (on the human time scale this would be equivalent to the times of 1 year and 1 second). An excited molecule thus has ample time to relax into the lowest accessible vibrational and rotational states of S1 before it returns to the ground state. As a consequence, the energy of the emitted light is always smaller than that of the absorbed light. In other words, the fluorescence emission of a chromophore always occurs at a wavelength that is higher than the wavelength of its absorption. Electrons can revert to the ground state S0 also by nonradiative processes (such as vibrational transitions), and in some instances transitions to the excited triplet state T1 occur. The S1 to T1 transition is facilitated by magnetic perturbations, such as the collision of an activated molecule with paramagnetic molecules or with molecules with polarizable electrons. This leads to the phenomenon of fluorescence quenching. Good collisional quenchers are iodide ions, acrylamide, or oxygen. Quenching can also occur intramolecularly, in proteins by the side chains of Cys and His and by amides and carboxyl groups. All conformational transitions that change the exposure to quenchers in the solvent or within the protein will thus lead to fluorescence changes. Another factor that leads to a decrease in emission is fluorescence resonance energy transfer (FRET), also called F¨orster transfer. The efficiency of this energy transfer between two chromophores depends, among other factors, on the extent of overlap between the fluorescence spectrum of the donor and the absorption spectrum of the acceptor, and, in particular, on the distance between donor and acceptor. In combination, these radiationless routes of deactivation compete with light emission, and therefore the quantum yields (F ) of most chromophores are much smaller than 1. F is equal to the ratio of the emitted to the absorbed photons. Its maximal value is thus 1. Good introductions into the principles of fluorescence of biological samples are found in Refs. [2, 12, 13]. 3.1 The Fluorescence of Proteins
The fluorescence of proteins originates from the phenylalanine, tyrosine, and tryptophan residues. Emission spectra for the three aromatic amino acids are shown in Figure 4, and their absorption and emission properties are summarized in Table 1. The excitation spectra correspond to the respective absorption spectra (Figure 2). The fluorescence intensity of a particular chromophore depends on both its absorption at the excitation wavelength and its quantum yield at the wavelength of the emission. The product ε max × F is thus a good parameter to characterize the fluorescence “sensitivity” of this chromophore. In proteins that contain all three aromatic amino acids, fluorescence is usually dominated by the contribution of the tryptophan residues because its sensitivity parameter is 730, which is much higher than the corresponding values which are 200 for tyrosine and only 4 for phenylalanine (Table 1). In proteins, phenylalanine fluorescence is practically not
9
10 Spectroscopic Techniques to Study Protein Conformation
Fig. 4 Fluorescence spectra of the aromatic amino acids in 0.01 M potassium phosphate buffer, pH 7.0 at 25 ◦ C. For the measurements, 100 :M phenylalanine, 6 :M tyrosine, and 1 :M tryptophan were used. Excitation was at 257 nm (Phe), 274 nm (Tyr), and 278 nm (Trp). The figure is taken from Ref. [6].
observable, because the other two aromatic residues absorb strongly around 280 nm, where phenylalanine emits, and because its emission is almost completely transferred to tyrosine and tryptophan by FRET. The fluorescence of tyrosines is often undetectable in proteins that also contain tryptophan, (i) because tryptophan shows a much stronger emission than tyrosine (Table 1), (ii) because in folded proteins tryptophan emission is often shifted to lower wavelengths and thus masks the contribution of tyrosine at 303 nm, and (iii) because FRET can occur from tyrosine to tryptophan in the compact native state of a protein [13, 14]. 3.2 Energy Transfer and Fluorescence Quenching in a Protein: Barnase
The small bacterial ribonuclease barnase provides an instructive example for how strongly protein fluorescence is influenced by the local environment of the chromophores by quenching and by energy transfer, in this case transfer between individual tryptophan residues. Barnase contains three tryptophan residues at positions 35, 71, and 94 (Figure 5a). Trp94 is in close contact with His18, Trp71 is about 10 Å away from Tyr92, and Trp35 is about 25 Å away from the other two tryptophan residues. The fluorescence of wild-type barnase increases strongly between pH 6 and pH 9, following a titration curve with an apparent pK of 7.75. This was suggested to reflect the quenching of Trp94 emission by His18. Fersht and coworkers [15] constructed single mutants of barnase, one in which His18 was mutated to a glycine and then three mutants in which one tryptophan residue at a time was mutated to another aromatic or aliphatic residue. They found that the fluorescence of barnase increased more than 2.5-fold when His17 was mutated to Gly (Figure 5b), which suggests that His18 does indeed act as an
3 Fluorescence
Fig. 5 a) The structure of barnase showing the locations of the three Trp residues and of His18. b) Fluorescence emission spectra of barnase and the variants with single mutations of His18 and of its three Trp residues. All spectra were recorded in
10 mM Bis/Tris buffer, pH 5.5 at the same protein concentration of 4 :M. Under these conditions His18 is protonated. The maximal emission of the wild-type protein (W.T.) is set as 100. The figure is taken from Ref. [15].
11
12 Spectroscopic Techniques to Study Protein Conformation
intramolecular quencher of Trp94 fluorescence. The observed fluorescence increase was, however, much higher than expected under the implicit assumption that all three tryptophan residues contribute equally to the fluorescence of the wild-type protein. The subsequent mutations of the individual tryptophan residues revealed that their contributions to the overall fluorescence are indeed strongly different. The Trp35Phe mutation alone decreased the emission by 80% (Figure 5b), which indicates that Trp35 is the major contributor to the fluorescence of wild-type barnase, and that the emission of the other two tryptophan residues is largely abolished in the folded protein. Indeed, the replacement of Trp71 by a tyrosine decreased the emission only by 15% (Figure 5b). Trp71 is close to Trp94 (Figure 5a) and loses most of its emission because of energy transfer to Trp94. Trp94 in turn is efficiently quenched by the neighboring His18. The replacement of Trp94 by Phe or Leu prevents both energy transfer from Trp71 and the quenching of Trp94 by His18. As a consequence, the substitution of Trp94 led to a twofold increase in fluorescence (Figure 5b). It reflects the increase in the fluorescence of Trp71, which, in the Trp94Phe variant, no longer loses emission by energy transfer to Trp94. In summary, this mutational analysis reveals how, in wild-type barnase, Trp71 and Trp94 lose most of their emission via quenching by His18. It demonstrates nicely that protein fluorescence is strongly influenced by energy transfer and by intramolecular quenching, which render fluorescence an excellent probe for protein folding. 3.3 Protein Unfolding Monitored by Fluorescence
The extended lifetime of the excited state is a key to understanding the changes that occur during a protein folding transition. As described above, a broad range of interactions or perturbations can influence the fluorophore while it is in the activated state. In particular, singulet-to-triplet inter system crossing can occur, or energy transfer to an acceptor. Fluorescence emission is consequently much more sensitive to changes in the environment of the chromophore than absorption. The changes in fluorescence during the unfolding of a protein are often very large. In proteins that contain tryptophan both shifts in wavelength and changes in intensity are usually observed. Depending on the environment in the folded protein the tryptophan emission of a native protein can be greater or smaller than the emission of free tryptophan in aqueous solution, and fluorescence can increase or decrease upon unfolding, depending on the protein under investigation. The emission maximum of solvent-exposed tryptophan is near 350 nm, but in a hydrophobic environment, such as in the interior of a folded protein, tryptophan emission occurs at lower wavelengths (indole shows an emission maximum of 320 nm in hexane [16]). As an example the emission spectra of native and of unfolded ribonuclease T1 are shown in Figure 6. As mentioned, ribonuclease T1 contains nine tyrosine residues and only one tryptophan (Trp59), which is inaccessible to solvent in the native protein [17].
3 Fluorescence
Fig. 6 Fluorescence emission spectra of native (—) and of unfolded (— — —) ribonuclease T1. Native ribonuclease T1 (1.4 :M) was in 0.1 M sodium acetate pH 5.0, the sample of unfolded protein contained 6.0 M GdmCl in addition. Fluorescence was
excited at a) 278 nm and b) 295 nm. The bandwidths were 3 nm for excitation and 5 nm for emission. Spectra were recorded at 25◦ C in 1 × 1 cm cells in a Hitachi F-4010 fluorimeter. The figure is taken from Ref [6].
The fluorescence of the tryptophan residues can be investigated selectively by excitation at wavelengths higher than 295 nm. Tyrosine does not absorb above 295 nm because its absorbance spectrum is blue shifted (as compared with tryptophan, Figure 2) and is thus not excited at 295 nm. A comparison of the emission spectra observed after excitation at 280 nm and at 295 nm gives information about the contributions of the tryptophan and the tyrosine residues to the fluorescence of a protein. The data for ribonuclease T1 in Figure 6 show that the shapes of the fluorescence spectra observed after excitation at 278 nm and 295 nm are virtually identical. The measured emission originates almost completely from the single Trp59, which is inaccessible to solvent and hence displays a strongly blue-shifted emission maximum near 320 nm. Tyrosine emission is barely detectable in the spectrum of the native protein, because energy transfer to tryptophan occurs. Unfolding by GdmCl results in a strong decrease in tryptophan fluorescence and a concomitant red shift of the maximum to about 350 nm. The distances between the tyrosine residues and Trp59 increase, and energy transfer becomes less efficient. As a consequence, the tyrosine fluorescence near 303 nm becomes visible in the spectrum of the unfolded protein when excited at 278 nm (Figure 6a), but not when excited at 295 nm (Figure 6b). The examples in Figure 6 indicate that, unlike the absorbance changes, the fluorescence changes upon folding can be very large and the contribution of the tryptophan residues can be studied selectively by changing the excitation wavelength. These features, together with its very high sensitivity, make fluorescence measurements extremely useful for following conformational changes in proteins. Multiple emission bands do not necessarily originate from different tryptophan residues of a folded protein. Figure 6 shows that even single tryptophan residues, such as Trp59 of ribonuclease T1 can give rise to several emission bands.
13
14 Spectroscopic Techniques to Study Protein Conformation
In contrast to ribonuclease T1, where we found a fivefold decrease of tryptophan fluorescence, the fluorescence of the FK 506 binding protein FKBP12 increases about 12-fold upon unfolding [18]. Figure 7 shows fluorescence spectra recorded between 0 M and 7.1 M urea. Incidentally both ribonuclease T1 and FKBP12 have a single tryptophan at position 59. The emission of Trp59 of FKBP12 is barely detectable in the folded protein (Figure 7), but as soon as the protein enters the transition zone (around 3 M urea, cf. inset a in Figure 7) the fluorescence increases strongly and its maximum shifts from 330 nm to 350 nm. In the experiments shown in Figure 7 the fluorescence of the tyrosine residues of FKBP12 was suppressed by exciting Trp59 selectively by light of 294 nm. At this wavelength tyrosine absorption is almost zero (cf. Figure 2). In addition, at 294 nm the absorbance of the samples is very low, the inner filter effect (cf. Section 3.5) becomes negligible, and the fluorescence emission remains linear over a wide range of protein concentrations (cf. inset b of Figure 7).
Fig. 7 Urea-induced denaturation of FKBP12 measured by intrinsic tryptophan fluorescence at 25 ◦ C. A) Samples containing 50 :M FKBP12 in 75 mM phosphate/25 mM glycine buffer, 1 mM DTT, pH 7.2 and the following concentrations of urea. A)–J) 0.0, 3.1, 3.4, 3.7, 4.1, 4.4, 4.7, 5.1, 6.1, and 7.1 M. Excitation was at 294 nm to minimize inner filter effects. Inset a) Equilibrium
unfolding curve based on the integrated emission intensities between 304 and 500 nm. The dashed lines represent linear fits to the pre- and post-transitional baselines. Inset b) A plot of the fluorescence intensity at 350 nm versus protein concentration to examine the linear dependence of the emission on protein concentration. The figure is taken from Ref. [18].
3 Fluorescence
Fig. 8 Fluorescence emission spectra of 1.5 :M of the native (——) and of the unfolded (----) Trp59Tyr variant of ribonuclease T1. This protein contains 10 Tyr residues. The native protein was in 0.1 M sodium
acetate, pH 5.0; the unfolded sample contained 6.0 M GdmCl in addition. Fluorescence was excited at 278 nm, the spectra were recorded as in Figure 6. The figure is taken from Ref. [6].
The fluorescence maximum of tyrosine remains near 303 nm, irrespective of its molecular environment. Therefore the unfolding of proteins that contain only tyrosine is usually accompanied by changes in the intensity, but not in the wavelength of emission. As an example, the fluorescence emission spectra of the Trp59Tyr variant of ribonuclease T1 in the native and unfolded states are shown in Figure 8. A decreased tyrosine fluorescence in the native state (as in Figure 8) is frequently observed. It is thought to originate from hydrogen bonding of the tyrosyl hydroxyl group and/or the proximity of quenchers, such as disulfide bonds in the folded state [19]. 3.4 Environmental Effects on Tyrosine and Tryptophan Emission
The fluorescence of the exposed aromatic amino acids of a protein depends on the solvent conditions. Solvent additives that promote the S1 → T1 transition (such as acrylamide or iodide) quench the fluorescence, and this effect can be used to measure the exposure of tryptophan side chains to the solvent [20]. The fluorescence generally decreases with increasing temperature. This decrease is substantial. To a first approximation, tyrosine and tryptophan emission decrease >1% per degree increase in temperature [6]. Fluorescence is therefore generally not suited for following the heat-induced unfolding of a protein. Denaturants such as GdmCl and urea also influence the fluorescence of tyrosine and tryptophan. The dependences on the concentration of GdmCl and urea of tryptophan and tyrosine emission are given in Ref. [6]. As in absorbance experiments (cf. Section 1) the emission of the unfolded protein can be represented well by an
15
16 Spectroscopic Techniques to Study Protein Conformation
appropriate mixture of tyrosine and tryptophan. The dependence of the emission of this mixture on denaturant concentration is often very useful to determine the baseline equivalent to the unfolded protein for the quantitative analysis of unfolding transition curves. Protein fluorescence can also be affected by salts. Charged denaturants, such as GdmCl, can thus act in a twofold manner: as a denaturant and as a salt. To identify the salt effect, it is advisable to titrate the protein in control experiments with the appropriate concentrations of NaCl or KCl. Salt effects of GdmCl are often strongest at low GdmCl concentration and thus they can severely affect the baseline of the native protein in equilibrium unfolding experiments. 3.5 Practical Considerations
Fluorescence is an extremely sensitive technique, so it is essential to avoid contamination of cuvettes and glassware with fluorescing substances. Deionized and quartz-distilled water should be used and plastic containers should be avoided because they may leach out fluorescing additives. Also, laboratory detergents usually contain strongly fluorescing substances. Fluorescing impurities are easily identified when running buffer controls. In water, a Raman scattering peak is observed. It originates from the excitation of an O H vibrational mode of the H2 O molecule, and thus it is present in all aqueous solvents. The Raman peak of water is weak and occurs at a constant frequency difference, not at a constant wavelength difference from the exciting light. The position of the Raman peak as a function of the excitation wavelength can be found in Ref. [6]. After excitation at 280 nm the Raman peak occurs at 310 nm, and therefore it overlaps with the emission spectra of the aromatic amino acids, in particular with tyrosine emission. The Raman peak of the solvent should always be measured separately and subtracted from the measured spectra. As in absorption spectroscopy impurities and dust particles should be avoided as much as possible and transparent buffers and solvents should be used (cf. Section 2). A single dust particle moving slowly through the light beam can cause severe distortions of the fluorescence signal. Continuous stirring during the fluorescence measurement is strongly recommended. Stirring keeps dust particles in rapid motion and thereby minimizes their distorting effect on the signal. In addition it continually transports new protein molecules into the small volume of the cell that is illuminated by the excitation light beam and thereby averages-out photochemical decomposition reactions over the entire sample. Stirring also improves the thermal equilibration in the cuvette, which is important, regarding the strong temperature dependence of fluorescence. Fluorescence increases with fluorophore concentration in a linear fashion only at very low concentration. Nonlinearity at higher concentration is caused by innerfilter effects. The exciting light is attenuated by the absorption of the protein and of the solvent. The extent of this attenuation depends on the absorbance of the sample at the wavelength of excitation, and therefore it can be varied by a shift in the excitation
4 Circular Dichroism
wavelength. Further attenuation originates from reabsorption of fluorescent light by protein and solvent molecules. This effect is usually small, because the absorption of proteins and solvents beyond 300 nm is almost zero. Inner-filter effects are negligible when the absorbance of the sample at the excitation wavelength is smaller than 0.1 absorbance units. Additional practical advice on how to measure protein fluorescence is found in Ref. [6].
4 Circular Dichroism
Circular dichroism (CD) is a measure for the optical activity of asymmetric molecules in solution and reflects the unequal absorption of left-handed and righthanded circularly polarized light. The CD signals of a particular chromophore are therefore observed in the same spectral region as its absorption bands. A good introduction to the basic principles of CD is provided by Refs. [2] and [21]. Proteins show CD bands in two spectral regions. The CD in the far UV (170–250 nm) originates largely from the dichroic absorbance of the amide bonds, and therefore this region is usually termed the amide region. The aromatic side chains also absorb in the far UV and accordingly they also contribute to the CD in this region. These contributions may dominate the spectra of proteins that show a weak amide CD and/or a high content of aromatic residues. The CD bands in the near UV (250–300 nm) originate from the aromatic amino acids and from small contributions of disulfide bonds. Figure 9 shows the CD spectra of pancreatic ribonuclease for the native and the denaturant-unfolded forms. The two spectral regions give different kinds of information about protein structure. The CD in the amide region reports on the backbone (i.e., the secondary) structure of a protein and is used to characterize the secondary structure and changes therein. In particular the α-helix displays a strong and characteristic CD spectrum in the far-UV region. The contributions of the other elements of secondary structure are less well defined. The content of the different secondary structures in a protein can be calculated from its CD spectrum in the amide regions. Methods for these calculations are found in Ref. [22]. The aromatic residues have planar chromophores and are intrinsically symmetric. When they are mobile, such as in short peptides or in unfolded proteins, their CD is almost zero. In the presence of ordered structure, such as in a folded protein, the environment of the aromatic side chains becomes asymmetric, and therefore they show CD bands in the near UV. The signs and the magnitudes of the aromatic CD bands cannot be calculated; they depend on the immediate structural and electronic environment of the immobilized chromophores. Therefore the individual peaks in the complex near-UV CD spectrum of proteins usually cannot be assigned to specific amino acid side chains. However, the near-UV CD spectrum represents a highly sensitive criterion for the native state of a protein. It can thus be used as a fingerprint of the correctly folded conformation.
17
18 Spectroscopic Techniques to Study Protein Conformation
Fig. 9 CD spectra of pancreatic ribonuclease (from red deer) at 10 ◦ C in the native (——) and in the unfolded state (----). The native protein was in 0.1 M sodium cacodylate/HCl, pH 6.0, the unfolded protein was in the same buffer plus 6.0 M GdmCl.
a) CD in the aromatic region. The protein concentration was 78 :M in a 1-cm cell. b) CD in the peptide region. 28 mM protein in a 0.1-cm cell. The figure is taken from Ref. [6].
4.1 CD Spectra of Native and Unfolded Proteins
Figure 10 shows representative spectra in the far-UV region for all-helical (a), for all-β (b), and for unstructured proteins (c). Helical proteins show rather strong CD bands with minima near 222 and 208 nm and a maximum near 195 nm. β proteins show weaker and more dissimilar spectra. The shape of the CD spectrum of a β
4 Circular Dichroism
protein depends, among other factors, on the length and orientation of the strands and on the twist of the sheet. Most spectra show, however, a positive ellipticity between 190 and 200 nm. In proteins with helices and sheets usually the helical contributions dominate the CD spectrum. Unordered proteins and peptides show weak CD signals above 210 nm, but a pronounced minimum between 195 and 200 nm. The strong difference near 195 nm between the CD of folded and unfolded proteins should, in principle, provide an ideal probe for monitoring protein unfolding transitions. In practice, however, most unfolding reactions are monitored near 220 nm. Wavelengths below 200 nm are almost never used, because most buffers and virtually all denaturants absorb strongly below 200 nm (Table 2), and therefore the advantage of having a strong change in signal is more than offset by the very high background absorbance of the solvent. The aromatic amino acid side chains absorb not only in the near-UV, but also in the far-UV region [23]. Accordingly, in native proteins they also contribute to the CD in the 180–230 nm region, which has traditionally been ascribed to only the amide bonds in the protein backbone. The aromatic contributions to the far-UV CD are often modest, and for helical proteins it is usually adequate to neglect them in the analyses, because the CD of helices is very strong (cf. Figure 10, top). For β proteins with their small and diverse CD signals (cf. Figure 10, middle), the aromatic bands can lead to problems and therefore the contents of β structure as calculated from the CD spectra can be seriously wrong. Carlsson and coworkers employed a mutational approach to analyze the contributions of the seven tryptophan residues to the amide CD spectrum of carbonic anhydrase [24]. They found strong contributions of these tryptophan residues to the far-UV spectrum of this protein.
4.2 Measurement of Circular Dichroism
The difference between the absorbances of right- and left-handed circularly polarized light of a protein sample is extremely small. In the far-UV region it lies in the range of 10−4 −10−6 absorbance units in samples with a total absorbance of about 1.0. This requires that less than 0.1% of the absorbance signal be measured accurately and reproducibly. Therefore highly sensitive instruments are necessary and careful sample preparation is important. CD instruments use a high-frequency photoelectric modulator to generate alternatively the two circularly polarized light components. This leads to an alternating current contribution to the photomultiplier signal that is proportional to the CD of the sample. A detailed description of the principles of CD measurements is found in Refs. [6] and [25]. Since the measured CD is so small, an extremely high stability of the CD signal is required. In general the instrument should be allowed to warm up for 30–60 min, and the stability of the baseline with and without the cuvette should be checked regularly.
19
20 Spectroscopic Techniques to Study Protein Conformation
Fig. 10 Representative CD spectra of a) helical proteins, b) proteins with $-sheet structure, and c) unordered proteins and peptides. The figure is taken from Ref. [22].
4 Circular Dichroism
When CD is measured in the far-UV region particular attention should be paid to a careful selection of solvents and buffers. The contributions of buffers and salts to the total absorbance of the sample should be as small as possible. This poses severe restrictions on the choice of solvents for CD measurements because many commonly used salts and buffers absorb strongly in the far-UV region. Absorbance values for several of them are given in Table 2. Common denaturants, such as urea and GdmCl also absorb strongly in the far UV. Therefore they cannot be used for CD measurements below 210 nm. Oxygen also absorbs in the far UV, but precautions such as purging the cell compartment with nitrogen is necessary only when measurements are performed below 190 nm. Protein concentration, pathlength and solvent must be carefully selected to obtain good CD spectra and to avoid artifacts. A good signal-to-noise ratio is achieved when the total absorbance is around 1.0. The magnitude of the CD depends on the protein concentration and therefore its contribution to the total optical density of the sample should be as high as possible, and the absorbance of the solvent should be as small as possible. Therefore, for measurements in the far-UV region, short path lengths (0.2–0.01 cm) and increased protein concentrations should be used. Cells of 0.1 cm are usually sufficient for measurements down to about 200 nm. When preparing samples for CD measurements it should be remembered that protein absorbance is about tenfold higher in the far UV when compared with the aromatic region. Artifacts can arise easily from an excessive absorbance of the sample or from light scattering caused by protein aggregates in the sample. Whenever the optical density of the sample becomes too high, the instrument cannot discriminate properly between the right and the left circularly polarized light components, and the CD signal decreases in size. This phenomenon, called absorption flattening, is a serious problem below 200 nm. It is good practice to ascertain that the measured CD is linearly related with path length and protein concentration. 4.3 Evaluation of CD Data
CD is recorded either as the difference in absorbance of right- and left-handed circularly polarized light, A = AL — AR or as ellipticity, obs , in degrees. Data in both formats can be converted to the molar values, i.e., to the differential molar circular dichroic extinction coefficient, ε = ε L — ε R and to the molar ellipticity, [ ]. ε and [ ] are interrelated by Eq. (3). [ ]=3300×ε
(3)
It should be noted that the concentration standards are different for [ ] and ε. ε is the differential absorbance of a 1 mol L−1 solution in a 1-cm cell, whereas [ ] is the rotation in degrees of a 1 dmol cm−3 solution and a path length of 1 cm. CD of proteins in the amide region is usually given as mean residue ellipticity, [ ]MRW , which is based on the concentration of the sum of the amino acids in the protein solution under investigation. The concentration of residues is obtained by
21
22 Spectroscopic Techniques to Study Protein Conformation
multiplying the molar protein concentration with the number of amino acids. For data in the aromatic region, different units are used in the literature. Frequently, aromatic CD is given as ε, but [ ], based on the protein concentration, and [ ]MRW , based on the residue concentration are also found. The molar ellipticity, [ ], or the residue ellipticity, [ ]MRW , are calculated from the measured (in degrees) by using Eqs. (4) or (5): [ ]=
×100×Mr c×l
[ ]MRW =
(4)
×100×Mr c×l ×NA
(5)
where is the measured ellipticity in degrees, c is the protein concentration in mg mL−1 , l is the pathlength in cm, and Mr is the protein molecular weight. N A is the number of amino acids per protein. [ ] and [ ]MRW have the units degrees × cm2 × dmol−1 . The factor 100 in Eqs. (4) and (5) originates from the conversion of the molar concentration to the dmol cm−3 concentration unit. The relation between [ ] and ε, given in Eq. (3) allows an immediate transformation of raw data into A values and vice versa by the relationship: = 33 (AL − AR ). This implies that a measured ellipticity of 10 millidegrees is equivalent to a A of only 0.0003 absorbance units. Practical aspects of how to measure and analyze CD spectra are discussed in detail in Ref. [6]. The analysis of CD spectra is often used to determine the secondary structure of folded proteins or of folding intermediates. References to computer programs and critical evaluations of the methods can be found in Ref. [22].
References 1 D. B. WETLAUFER, Adv. Protein Chem., 1962, 17, 303–421. 2 C. R. CANTOR, P. R. SCHIMMEL, Biophysical Chemistry. 1980, San Francisco: W. H. Freeman. 3 J. W. DONOVAN, Meth. Enzymol., 1973, 27, 497–525. 4 S. B. BROWN, Ultraviolet and visible spectroscopy. In An Introduction to Spectroscopy for Biochemists, S. B. BROWN, Editor. 1980, Academic Press: London. pp. 14–69. 5 W. COLON ´ , Meth. Enzymol., 1999, 309, 605–632. 6 F. X. SCHMID, Optical spectroscopy to characterize protein conformation and conformational changes. In Protein
7 8
9
10
Structure: A practical approach, T. E. CREIGHTON, Editor. 1997, Oxford University Press: Oxford. pp. 261–298. H. EDELHOCH, Biochemistry, 1967, 6, 1948–1954. S. C. GILL, P. H. VON HIPPEL, Anal. Biochem., 1989, 182, 319–326. C. N. PACE, F. X. SCHMID, How to determine the molar absorbance coefficient of a protein. In Protein Structure: A practical approach, T. E. CREIGHTON, Editor. 1997, Oxford University Press: Oxford. pp. 253–260. C. N. PACE, F. VAJDOS, L. FEE, G. GRIMSLEY, T. GRAY, Protein Sci., 1995, 4, 2411–2423.
References 11 S. YANARI, F. A. BOVEY, J. Biol. Chem., 1960, 235, 2818–2826. 12 G. R. PENZER, Molecular emission spectroscopy. In An Introduction to Spectroscopy for Biochemists, S. B. BROWN, Editor. 1980, Academic Press: London. pp. 70–114. 13 J. R. LAKOWICZ, Principles of Fluorescence Spectroscopy. 1999, New York: Kluwer/Plenum Publishers. 14 P. WU, L. BRAND, Anal. Biochem., 1994, 218, 1–13. 15 R. LOEWENTHAL, J. SANCHO, A. R. FERSHT, Biochemistry, 1991, 30, 6775–6779. 16 F. W. J. TEALE, Biochem. J., 1960, 76, 381–387. 17 U. HEINEMANN, W. SAENGER, Nature, 1982, 299, 27–31. 18 D. A. EGAN, T. M. LOGAN, H. LIANG, E. MATAYOSHI, S. W. FESIK, T. F. HOLZMAN, Biochemistry, 1993, 32, 1920–1927. 19 R. W. COWGILL, Biochim. Biophys. Acta, 1967, 140, 37–43. 20 M. R. EFTINK, C. A. GHIRON, Anal. Biochem., 1981, 114, 199–227. 21 G. D. FASMAN, ed. Circular Dichroism and the Conformational Analysis of
22
23
24
25
26
Biomolecules. 1996, Plenum Press: New York. S. Y. VENYAMINOV, J. T. YANG, Determination of protein secondary structure. In Circular Dichroism and the Conformational Analysis of Biomolecules, G. D. FASMAN, Editor. 1996, Plenum Press: New York. pp. 69–107. R. W. WOODY, A. K. DUNKER, Aromatic and cystine side chain circular dichroism in proteins. In Circular Dichroism and the Conformational Analysis of Biomolecules, G. D. FASMAN, Editor. 1996, Plenum Press: New York. pp. 109–157. P. O. FRESKGARD, L. G. MARTENSSON, P. JONASSON, B. H. JONSSON, U. CARLSSON, Biochemistry, 1994, 33, 14281–14288. W. C. JOHNSON, Circular dichroism and its empirical application to biopolymers. In Methods of Biochemical Analysis, D. GLICK, Editor. 1985, John Wiley: New York. pp. 61–163. M. R. EFTINK, Fluorescence techniques for studying protein structure. In Methods of Biochemical Analysis, C. H. SUELTER, Editor. 1991, John Wiley: New York. pp. 127–205.
23
1
Ion Channel Structural Studies Randal B. Bass Amgen Inc., Seattle, USA
Robert H. Spencer Merck Reserch Laboratories, West Point, USA
Originally published in: Expression and Analysis of Recombinant Ions Channels. Edited by Jeffrey J. Clare and Derek J. Trezise. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31209-2
1 Introduction
Biological membranes are fascinating and essential components for all living cells. Whether they are actively transporting molecules into or out of the cell, transducing a signal, or allowing solutes to cascade down a concentration gradient, decades of work in numerous laboratories has illustrated the remarkable properties of these proteins that operate within what is essentially a thin layer of very complicated grease. Since the first atomic resolution structure of a membrane protein was reported over 20 years ago (the bacterial photosynthetic reaction center from Rhodopseudomonas viridis [1]), a wealth of information about how membrane proteins function has been gained from the determination of high-resolution structures from diverse classes of membrane proteins. The ability to visualize the structure of these proteins, as has been done for their soluble counterparts, has allowed keener understanding of their folding and functions. However, it has only been in the last decade that significant progress on the structure of ion channel proteins has been seen. Many advances in the field of membrane protein structural biology have contributed to the growing number of these proteins in the databank. For ion channels, breakthroughs on solving their structures have primarily come from the identification of bacterial homologs which could be expressed in sufficient quantities for labor-intensive crystallization trials. Although general structural information on mammalian ion channels has been obtained using electron microscopy, nearly all of the X-ray crystal structures of ion channels have come from bacterial
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Ion Channel Structural Studies
sources – primarily due to the difficulties associated with their expression in heterologous systems and establishing biochemical conditions to stabilize them outside the lipid membrane. However, very recently the crystal structure of the first mammalian K+ channel has been reported and will hopefully be a harbinger of advances yet to come [2]. In this chapter, we hope to provide a broad overview of the issues and advances made in pursuing membrane proteins such as ion channels for structural studies, and touch on topics that are specifically relevant to their expression, biochemistry and structural analysis. In summarizing these advances, we also aim to highlight those researchers who are at the cutting edge and changing the way in which the sometimes daunting problem of obtaining high-resolution structures is being approached. So just how many membrane proteins are out there, and is it really worth pursuing more? Most estimates that have come from sequencing the genomes from everything from humans to bacteria suggest that roughly 30% of all proteins in the genomes are membrane proteins. For humans, that means around 10 000 of the ∼30 000 protein-coding genes can be found in the lipid bilayer. Several authors have noted that the growth of structures for membrane proteins follows a similar trajectory to those of soluble proteins. In the 20 years following the first high-resolution structure of myoglobin (determined by Kendrew et al. in 1958 [3]), 132 stable protein structures had been solved whereas by 2003, in a similar period since the first membrane protein structure, 75 unique membrane protein structures had been solved [4]. Progress on the structures of membrane proteins is clearly slower than for soluble proteins, but it is possible that the end of 2005 will see the high-resolution determination of the 100th unique membrane protein. Many of these are structures of homologous proteins from other organisms, thus the number of unique folds is somewhat smaller. In stark contrast, the Protein Data Bank (www.rcsb.pdb.org) currently has nearly 30 000 deposited protein structures in total. One of the great hopes of membrane protein structural biology that drives the desire for more structures is that the knowledge gained can be applied to the treatment of disease and help refine the development of better, targeted drugs. Determining the structure of membrane proteins will not only continue to enlighten the molecular and structural basis for their functions, but is also likely to have a broader impact by facilitating lead optimization for the development of novel therapies. Due to the critical role that ion channels play in neuronal and cell signaling and ion homeostasis, they are, and will continue to be, valuable molecular targets for the treatment of human disease. Approximately one third of currently marketed drugs produce their pharmacological actions via ion channels; this also includes around 15% of the top 100 best-selling medicines [5]. However, the development of novel, selective ion channel modulators has been difficult with only a single new ion channel drug being approved since 1997; the exception be R ing ziconotide (Prialt ), a recently approved Ca2+ -channel blocking peptide for the treatment of severe chronic pain. Given current estimates of over 400 ion channel genes encoded within the human genome, there is an abundance of ion channel targets [6], and yet we have no structural information for any of these proteins. For that matter, there currently is not a single X-ray structure of a human
1 Introduction 3
transmembrane protein. Fortunately, we have the structure of a handful of bacterial and archaeal ion channels at atomic resolution which have been extremely enlightening and valuable for molecular modeling purposes, and now the first structure of the first mammalian K+ channel [2]. However, high-resolution structures of the many other mammalian ion channels, especially human, remain a significant need. A major breakthrough for structural biology, as for all branches of biology, was the molecular biology revolution that allowed the recombinant expression of virtually any protein. Reviewing the methods used to obtain protein samples used in crystallization attempts, one can quickly understand that the ability to clone, express and purify recombinant proteins is nearly universal in its application to obtaining structures. This technique has been widely applied in membrane proteins with great success. However, unlike the myriad of soluble bacterial and eukaryotic proteins successfully expressed in recombinant systems, there has been only limited success for the expression, purification and solution of a eukaryotic membrane protein crystal structure produced from a recombinant system. Certainly there are many cases of expression of membrane proteins in these systems, some of which have been useful for structural studies using electron microscopy (EM) [7–9]. The recent structure of the rat Kv1.2-β2 solved from protein expressed in the yeast Pichia pastoris is a tremendous advance and represents a significant breakthrough in the field that may open the door for many other mammalian membrane proteins [2, 10]. Why are the membrane proteins so recalcitrant to expression in these systems when the soluble proteins are not? One potential issue may be the vastly different lipids that make up eukaryotic bilayers, more specifically mammalian, compared to those that are found in prokaryotes. Whereas human cell membranes are composed largely of phosphatidylcholine, phosphatidylethanolamine, sphingomyelin and cholesterol, the E. coli inner membrane is mainly composed of phosphatidylethanolamine (75 %) [11]. The remainder of the lipid content in E. coli is largely cardiolipin and phosphatidylglycerol, neither of which are prevalent in human bilayers [12]. Even with a fantastic expression system, once a eukaryotic membrane protein enters the inner membrane of E. coli, it finds a lipid environment far different from the one in which it would normally reside. Thus, it is not surprising that many eukaryotic membrane proteins fail to express in E. coli, and if they do express they typically end up in inclusion bodies. This presents a new problem. Unlike soluble proteins, the successful refolding from inclusion bodies of a complex membrane protein containing multiple transmembrane helices has not yet been widely reported, although there is a proprietary technology reported by m·phasys GmbH (T¨ubingen, Germany) which suggests this approach may be feasible. Another promising advance was recently reported using a membrane-integrating protein called Mistic (membrane integrating sequence for translation of integral membrane protein constructs) identified from Bacillus subtilis that, when linked to the N-terminus of a eukaryotic membrane protein, produced significant quantities of several target proteins, including a voltage-gated K+ channel [13]. Although relatively hydrophilic, Mistic folds autonomously within the bacterial membrane and appears to facilitate the integration and folding of attached eukaryotic ‘cargo’ membrane proteins that
4 Ion Channel Structural Studies
have been extremely difficult to express in prokaryotes. This and other nascent technologies should be extremely valuable in moving forward the entire field of membrane protein structural biology. Certainly it is possible for mammalian membrane proteins to be expressed and correctly folded in bacterial cells, as several examples have been cited in the literature [8, 9, 14]. However, it is likely that many will not. This fact underscores the need to broaden the search for suitable candidates for expression in these systems. One means to address this is using a multiple orthologue approach such as that applied in the Rees [15–17] and Chang laboratories [18, 19]. In all of these cases, orthologous bacterial proteins were expressed in E. coli to find combinations that could produce sufficient quantities of protein that were properly folded, and would produce crystals of sufficient diffraction quality. If these examples of bacterial membrane proteins are any indicator, the successful expression of eukaryotic membrane proteins is also likely to require this same approach. 2 Expression of Membrane Proteins for Structural Studies 2.1 Mammalian Expression
For mammalian membrane proteins, including ion channels, it seems intuitive to seek to isolate these from native tissues or to express them heterologously in cell culture. However, since ion channels are so efficient (with flux rates of ∼107 ions s−1 ) mammalian cells natively express relatively low levels of these proteins, typically 10–1000 channels of any one type per cell, and thus are not good sources for structural studies. Nevertheless, there are a few examples (see Table 2, below) where sufficient quantities have been obtained for single-particle electron microscopy (EM) including the TRPC3 cation channel [20], L-type Ca2+ channel [21], type 1 IP3 receptor [22–24], and the ryanodine receptor [25–27]. There are also a few nonmammalian examples of abundant natural sources of ion channels which have been exploited for EM structures, including the nicotinic acetylcholine receptor (nAChR) from Torpedo ray [28] and the voltage-gated Na+ channel from electric eel [29]. Stably transfected recombinant cell lines are also relatively poor sources of ion channel protein as expression levels at the cell membrane rarely exceed a few thousand channels per cell. Additionally, there can often be a large pool of immature, incorrectly processed or folded protein remaining within the cell (see Chapter 4) resulting in a heterogeneous protein sample upon purification. One way to resolve this is by purifying only the mature, fully glycosylated channels. For example, in EM studies of both the L-type (dihydropyridine) and IP3 receptor-type Ca2+ channels, wheat germ lectin affinity chromatography was commonly used during their purification [30–32], and was also employed for the 20 Å resolution structure of human CFTR [33]. Another approach is to simply remove oligosaccharide altogether. In fact, given the heterogeneous nature of glycosylation the removal of oligosaccharide is likely
2 Expression of Membrane Proteins for Structural Studies
to be beneficial, perhaps even essential, for high-resolution structural studies. This was found to be critical for obtaining well diffracting 2D crystals of human aquaporin-1 [34, 35] and, in the cases of Shaker, aquaporin-1, Kv1.3, and SkM1 the effects of deglycosylation on the function of these channels is minimal [36–39]. Additionally, synchrotron radiation circular dichroism spectroscopy and thermal denaturation studies on the E. electricus Na+ channel show no net change in protein secondary structure resulting from the deglycosylation procedure [40]. In summary, given the current limitations on protein processing and expression level, the development of improved mammalian expression systems is likely to be required before this becomes a realistic option for the production of membrane proteins for structural studies. Viral expression systems, such as Semliki Forest virus (SFV) or BacMam, may provide a solution in the future. 2.2 Insect Expression
Employing insect cells for the expression of eukaryotic channels has shown some potential and has several advantages compared to mammalian systems since the cell culture is simpler and easier to scale up. The Drosophila Shaker K+ channel was one of the first ion channels successfully expressed heterologously in insect cells for the purpose of biochemical and structural studies [41]. The human Kv1.1 channel has been functionally expressed using baculovirus in Sf9 cells and proven useful for biochemical studies [42]. Based on whole-cell currents, the cell membrane contains no more than a few thousand channels per cell, but it is possible that a much larger intracellular pool of protein may be recoverable for structural studies. It has also been applied for biochemical and biophysical studies of CFTR, the glycine receptor, the KATP channel, and the heteromeric amiloride Na+ channel complex [43–46]. However, in spite of these advances, the general utility of insect cells for producing reasonable quantities of intact mammalian ion channel protein for structural studies is still in question. Although, the first images of the tetrameric structure of a voltage-gated K+ channel came from protein expressed in Sf9 cells, there has been little progress for other mammalian ion channels [47]. In the case of CFTR, no structural information has been gained for this channel expressed in insect cells. However, recently, 2D crystals of this channel were reported using heterologous expression in mammalian cells (BHK cells) that provided structural information at a resolution of 20 Å [33]. In spite of these issues, it should be recognized that the crystal structure of many soluble proteins has been obtained using protein derived from heterologous expression in insect cells and thus may yet prove similarly valuable for membrane proteins. 2.3 Yeast Expression
Although the possibility of using yeast for the production of mammalian membrane proteins has been pursued for many years, it was not until very recently that this was successfully applied for material suitable for structural studies. In 2003, Parcej
5
6 Ion Channel Structural Studies
and Eckhardt-Strelau reported the expression, purification and structure of the rat Kv1.2 channel in complex with the Kvβ2 subunit at 21 Å resolution using single-particle electron microscopy [10]. The Kv1.2-β2 protein was expressed in the methylotrophic yeast Pichia pastoris and found to be in a relatively homogeneous form following the removal of a consensus N-glycosylation site and several putative phosphorylation sites. This is certainly one of the first breakthrough methods for the production of milligram quantities of eukaryotic membrane protein that appears to be both properly folded and functionally active based on ligand-binding analysis. This method was recently adapted for the determination of the crystal structure of this channel at 2.9 Å resolution [2] – a landmark achievement. It is certainly hoped that this system can be broadly used to explore the structure of many other mammalian membrane proteins in addition to ion channels. 2.4 Bacterial Expression
The heterologous expression of proteins in E. coli cells is certainly one of the most widely used systems applied to soluble proteins. However, in the case of membrane proteins, it is only in the past decade that significant progress has been seen on the successful application of this system for X-ray and EM structural studies. It was not until 1998 that the first successful X-ray crystal structures of ion channel proteins were determined (KcsA and MscL), and these paved the way for many other ion channel structures that were determined using similar methods [15, 48]. As clearly summarized in Table 1, with a single exception [2], all of the known ion channel structures determined using X-ray crystallography were derived from protein expressed and purified using E. coli as the host. The primary advantage of using a prokaryotic expression system is the ability to easily scale-up the process and the tremendously lower costs of production. In general, relatively traditional methods have been successfully employed for the expression of bacterial membrane proteins in E. coli. Plasmids such as the pET (Novagen, Madison, WI) or pQE (Qiagen, Valencia, CA) vectors are the most commonly utilized for expression in suitable E. coli strains [BL-21 (DE3), for pET vectors; XL-1 or SG13009 (pREP4) for pQE vectors]. Although the process appears relatively straightforward, the over expression of ion channels, or other membrane proteins, is often toxic to E. coli, requiring optimization of media, temperature, cell density and induction conditions for each protein of interest. Additionally, we have observed that over expression of these proteins will often result in cellysis at later stages of induction. Thus, it is necessary to manage this process carefully and consistently by harvesting the cells before this occurs. For bacterial membrane proteins, the use of E. coli will continue to be an important means for bringing forward new structures. However, the expression of eukaryotic membrane proteins in E. coli is still a significant challenge. There are scattered reports of limited success for a few mammalian membrane proteins, but none of these has yet been successful at providing any structural information using either EM or X-ray methods. The recent report on the expression of several
2003 2005
KvAP 58 Kv1.2-P-2 2
b
1BL8 1MSL 1K4C 1KPK 1MXM 1LNQ
1KPL 1OTS 1P7B
1ORQ 2A79
K+ Non-selective K+ Cl− Non-selective K+
Cl− Cl− K+
K+ K+
Stoichiomety for pore-forming domain Complex with Fab fragment
2002 2003 2003
StClC 70 EcClC 72 KirBac1.1 73
a
1998 1998 2001 2002 2002 2002
48 15 69 70 17 71
KcsA MscL KcsAb EcCIC MscS Mthk
3.2 2.9
3.0 2.5 3.6
3.2 3.5 2.0 3.5 3.9 3.3
4 4
2 2 4
4 5 4 2 6 4
6 6
17 17 2
2 2 2 17 3 2
A. pemix R. norvegicus
S. lividans M. tuberculosis S. lividans E. coli E. coli M. thermoautotrophicum S. typhimurium E. coli B. pseudomallei
Ion Resolution TM α-helices Name Ref. Year Selectivity PDB ID (Å) Subunitsa (per subunit) Species
His6 , C-term His6 , C-term His6 , C-term
His6 , C-term His10 , N-term His6 , C-term His6 , C-term His6 , N-term His6 , C-term
Yes No
Yes Yes No
Yes No Yes Yes No Yes
n-Decyl-β-D-maltoside n-Decyl-β-D-maltoside n-Decyl-β-D-maltoside + n-Tridecyl-β-Dmaltoside n-Decyl-β-D-maltoside n-Dodecyl-β-D-maltoside
n-Decyl-β-D-maltoside n-Dodecyl-β-D-maltoside n-Decyl-β-D-maltoside n-Decyl-β-D-maltoside Fos-Choline-14 n-Decyl-β-D-maltoside
Tag Detergent Affinity Tag Removal (Extraction)
E. coli His6 , C-term P. pastoris His6 , N-term
E. coli E. coli E. coli
E. coli E. coli E. coli E. coli E. coli E. coli
Host
Table 1 Ion channel structures determined by X-ray crystallography. Examples of crystal structures determined for ion channels using X-ray crystallography
n-Octyl-β-D-glucoside n-Decyl-β-D-maltoside
n-Octyl-β-D-maltoside n-Decyl-β-D-maltoside Cymal-4, HEGA-10
LDAO n-Dodecyl-β-D-maltoside n-Decyl-β-D-maltoside n-Octyl-β-D-maltoside Fos-Choline-14 LDAO
Detergent (Crystallization)
2 Expression of Membrane Proteins for Structural Studies 7
a
Shaker 5-HT3 receptor Kv1.3 KcsA Cx43 EcCIC Voltage-sensitive Na+ channel Shaker L-type Ca2+ channel nAChR Kv1.2-P2 CFTR IP3 receptor Kv4.2 / KChlP2 KvAP AMPA receptor Ryanodine receptor TRPC3
78 21 28 10 33 79 80 81 82 83 20
47 74 38 75 76 77 29
n/a n/a n/a 6 7.5 6.5 19 25 23 4.0 21 20 15 21 10.5 42 10 30
K+ Cation K+ K+ Non-selective Cl− Na+
K+ Ca2+ Cation K+ Cl− Ca2+ K+ K+ Na+ Ca2+ Cation 4 1 5 4 1 4 4 4 5 4 4
4 5 4 4 12 2 1 6 24 4 6 12 6 6 6 3 6–8 6
6 4 6 2 4 17 24 D. melanogaster O. cuniculus T. mamorata R. norvegicus H. sapiens M. musculus H. sapiens A. pernix R. norvegicus O. cuniculus M. musculus
D. melanogaster M. musculus H. sapiens S.lividans R. norvegicus E. coli E. electricus
Resolution TM α-helices Ref. Ion Selectivity (Å) Subunitsa (per subunit) Species
Stoichiomety for pore-forming domain
2001 2003 2003 2003 2004 2004 2004 2004 2005 2005 2005
1994 1995 1997 1998 1999 2001 2001
Year Name
crystallography from two-dimensional protein crystals
COS Native Native P. pastoris BHK Native COS7 E. coli Native Native HEK293
Sf9 (Baculovirus) NG 108-15 CV-1 (Vaccinia) E. coli BHK E. coli Native
Host
1D4, C-term – – His9 , N-term His10 , C-term – 1D4, C-term His6 , C-term – – FLAG, C-term
– – His6 , N-term His6 , N-term – His10 , C-term –
N – – N N – N Y – – N
– – N N – Y –
CHAPS Digitonin – Tween-80 n-Dodecyl-β-D-maltoside CHAPS CHAPS n-Decyl-β-D-maltoside CHAPS or n-Decyl-β-D-maltoside CHAPS n-Decyl-β-D-maltoside
CHAPS n-Dodecyl-β-D-maltoside CHAPS n-Dodecyl-β-D-maltoside Tween-20 n-Dodecyl-β-D-maltoside Lubrol- PX
Tag Affinity Tag Removal Detergent
Table 2 Ion channel structures determined by electron microscopy. Examples of ion channel structures determined using either single particle electron microscopy or electron
8 Ion Channel Structural Studies
3 The Detergent Factor
eukaryotic membrane proteins in E. coli, including a GPCR and an ion channel, using the autonomously folding integral membrane protein, Mistic, is very encouraging [13]. It is hoped that this and similar advances will pave the way for the structure determination of the many eukaryotic membrane proteins that still evade our grasp.
3 The Detergent Factor
Successful crystallization is only possible if the membrane protein, once removed from the bilayer, is properly folded in a detergent. The list of detergents used in successful crystallizations is both lengthy and diverse. A table of successfully used detergents is shown in Fig. 1. The α-helical type membrane proteins have seen the greatest success n-octyl-β-D-glucopyranoside (OG), N,N-dimethyldodecylamine-Noxide (LDAO) and n-dodecyl-β-D-maltopyranoside (DDM). Octylglucoside stands out as the detergent that has been used in the most successful crystallizations. However, it is critical to note that the success rates partly reflect the length of time that each of these have been available. Thus, it is not clear if there something universal about OG and LDAO that makes them so useful, or simply if they have they seen the greatest success because they have been utilized in membrane protein biochemistry for much longer. Similarly, chain length trends for the detergents, where multiple chains lengths are available, have been frequently speculated about. Again, it is important to note that the many variations within a given detergent family, such as the glucosides, maltosides and so forth, have come about only recently, precluding clear conclusions. While it is possible to observe some general correlations, for example α-helical-type proteins are more apt to crystallize in glucopyranosides then they are in polyoxyethylenes, few other trends beyond such cursory conclusions can be defined at present. It is no surprise that removal of membrane proteins from their native lipid environment poses one of the most serious challenges to maintaining a native fold conducive for crystallization trials. Certainly one of the most significant advances for membrane protein crystallography is the availability of a myriad of high quality, diverse detergents. In this respect, the work of the scientists at Anatrace (Maumee, OH) is of particular note. The development of several new detergent series, CYMAL, FOS-CHOLINE, FOS-MEA, CYFOS and C-HEGA was aided through a NIH Small Business Innovative Research grant. Recognition by the funding agencies of the need for better reagents for novel research is to be congratulated and encouraged. The effects of detergent on membrane proteins cannot be overstated. Proper choice of detergent will greatly impact on whether crystallization is possible. The choice of detergent can be broken into two critical components. The first step is to identify a detergent that can successfully extract the protein from the bilayer. Membrane proteins often contain multiple subunits so the detergent must be able to extract the protein while maintaining both its tertiary and its quaternary structure. As illustrated in Fig. 2A, one can design screens to identify optimal expression
9
10 Ion Channel Structural Studies
Fig. 1 Graph of detergents successfully used for membrane protein structure determination. Data from the compilation of known membrane protein structures at http://www.mpibpfrankfurt.mpg.de/michel/public/ memprotstruct.html. Beta-type proteins refer to outer membrane proteins of prokary-
otes with transmembrane spanning beta strands. Alpha-type proteins, found in both prokaryotic and eukaryotic membrane proteins, span the bilayer with alpha helices. The x-axis denotes the number of successful structure determinations in each of the listed detergents.
constructs using multiple homologs, expression vectors, varying types or location of affinity tags, using a few key detergents. Once an optimized expression system has been identified, one can then broadly screen for detergent solubility, relative purity and integrity using a 96-well format as illustrated in Fig. 2B. Further analysis of the structural integrity of the protein must then be carried out. While spectroscopic techniques such as circular dichroism are helpful for determining the state of tertiary structure, assessing quaternary structure can be more challenging. Size-exclusion chromatography, although low resolution, works quite well for this, in spite of the presence of detergent micelles. Macromolecules and
3 The Detergent Factor
Fig. 2 Expression and detergent screening. (A) Schematic diagram for expression construct optimization and detergent screening. Multiple constructs, homologs, and detergents can be simultaneously screened to rapidly identify the optimal conditions for expression and solubilization. Ideally, each open reading frame (ORF) is cloned into several expression vectors that place an affinity tag at either the N- or C-terminus. These are transfected into a series of bac-
terial strains developed for protein expression. Following induction of protein expression (typically using a range of conditions), culture samples are individually aliquoted, lysed and solubilized with a panel of unique detergents. Cleared lysates are analyzed by Western blot using an antibody to the affinity tag to determine relative expression for each construct expressed in each bacterial strain, as well as their relative detergent solubility.
11
12 Ion Channel Structural Studies
Fig. 2 (Continued) Expression and detergent screening. (B) Schematic diagram of a high throughput detergent screening procedure. To determine the relative solubility and yield for a membrane protein using a broad series of detergents, a 96-well extraction and purification procedure can be performed using a robotic liquid-handling system or manually. Briefly, a panel of unique detergents is added to each well of a deep 96-well block containing the cell lysate. After shaking gently, affinity resin is added across the wells to bind the target protein. The resin is then transferred to a
96-well filter plate where it is washed, and subsequently eluted into a collection plate (available from several vendors including Novagen, San Diego, CA; Qiagen,Valencia, CA;Vi-vascience, Inc., Edgewood, NY). Protein recovery can be evaluated by UV spectrometry or other protein assay, and purity can be visualized by high-throughput SDS-PAGE (Invitrogen, Carlsbad, CA). Protein identity can be confirmed by Western blot analysis and mass spectrometry, and functional and/or structural integrity can be evaluated by size exclusion chromatography (SEC) or ligand-binding.
complexes of over 200 kDa can be routinely resolved with size-exclusion chromatography and perhaps more importantly, aggregated protein in the sample can be identified. The presence of aggregated protein is a significant obstacle to successful crystallization. Every attempt should be made to eliminate it, or significantly reduce it, either by reengineering constructs, altering growth and production of the protein or by chromatography, prior to setting up crystallization trials. To assess whether the protein of interest is maintained as a higher order, oligomeric structure in any detergent one can crudely, but quickly, address this using ultra filters of an appropriate molecular weight cutoff and simply look for the oligomeric protein in the retentate versus the filtrate (i.e. monomeric protein will pass through the filter). A more visual approach uses bifunctional cross linking reagents to covalently link subunits together and subsequently visualize them on
4 Purification
Fig. 3 Crosslinking to determine proper oligomeric state. Polyacrylamide gel showing crosslinking of two homologs of MscL from M. tuberculosis (denoted Tb) and E. coli (Eco) with disuccinimidyl suberate (DSS). Shown on the left side of the gel are the positions of the molecular weight markers. The two left hand lanes are monomers
of the homologs without addition of the bifunctional cross-linking reagents DSS. The two right lanes are with the addition of DSS, showing a clear ladder of monomer, dimer, trimer, tetramer and finally pentamer that is the full oligomeric structure of MscL, as seen in the crystal structure. Reprinted with permission from Ref [15].
SDS-PAGE gels, as shown in Fig. 3. This technique has been used to determine the oligomeric state of the ion channel MscL [15]. An enhancement of this technique is cross linking in conjunction with capillary electrophoresis (CE) employing a sieving gel. The advantage of this latter technique is the ability to easily quantify peaks in an electropherogram and deduce the stoichiometry of a complex macromolecule. Although densitometry of Coomassie-stained gels can also accomplish this, CE offers the advantage of high resolution, ease of quantification, and excellent separation, even between subunits of similar molecular weight. On a practical note, although it may be advantageous to make membrane preparations for large-scale purification of membrane proteins, this technique may be unnecessary and misleading if used in conjunction with detergent screens. Care must be taken whenever sonication is used for detergent screening to avoid false positive results. Heavy sonication can produce very small (∼5 µM diameter) vesicles that can remain suspended while attempting to spin down a detergent screen trial. If membrane preparation by sonication is employed, it is necessary to apply ultracentrifugation to pellet any small membrane vesicles, prior to gel and analysis.
4 Purification
The first membrane protein structures were all derived from naturally abundant proteins. For example Deisenhofer’s original membrane protein structure, the photosynthetic reaction center from Rhodopseudomonas virdis, is found in purple
13
14 Ion Channel Structural Studies
patches in the membrane in a highly concentrated “pre-purified” form [1]. Additionally, the first GPCR, bovine rhodposin, occurs naturally at high concentration in the outer rod segments and is relatively pure in that membrane. Although determining the structure of both these proteins certainly presented numerous challenges, considerable additional challenges are presented by proteins found in eukaryotic plasma membrane where many proteins of interest to both the academic and commercial structural biologist reside, such as transporters, channels and receptors. These proteins are found typically in low abundance and in a membrane that is host to hundreds of other proteins. It would be difficult, if not impossible, to obtain structural information for these proteins without the use of recombinant technology in conjunction with affinity purification tags. Many affinity tag systems are commercially available, such as FLAG, maltose binding protein, streptavidin, thioredoxin, and many more. A near universal tool for this approach is the use of polyhistidine tags and immobilized metal chelate chromatography (IMAC). So pervasive is its application, that it has been a significant technology for the high throughput world of structural genomics as well as a core technique for many membrane protein crystallography laboratories. The use of these tags is now so routine that they need not be thoroughly discussed here. Indeed, as shown in Tables 1 and 2, nearly all of the recombinant expressed ion channel proteins for which structures have been determined were prepared using some type of polyhistidine tag. However, a few specific points regarding the use of polyhistidine tags for membrane proteins should be made. Although hexahistidine (His6 ) tags were originally used, His8 , His10 and even longer tags are now more common although, in our hands, the utility of tags longer than His10 is not clear. A critical advantage of polyhistidine tags is their ability to efficiently bind metal columns in the presence of detergents. Although other affinity tags can also be used in the presence of detergent in virtually all cases, our experience supports the use of polyhistidine tags in conjunction with the >60 detergents that are routinely used in our screens. In short, the presence of detergents above their critical micellar concentration (CMC) does not preclude the use of this type of tag, though it should be noted that the efficiency with which the tag binds the column is often diminished. It is this reduction in affinity that has led to the use of vectors encoding the elongated tags listed above. In the light of lowered affinity in the presence of detergent, very high levels of imidazole should be carefully tested when washing and eluting the column, in order to obtain maximum purity while balancing the need for high yield. These opposing forces can easily be monitored, even at very small scale, by employing Western blots with anti-hexahistidine antibodies which readily bind the longer polyhistidine tags, and which are commercially available and inexpensive. Regardless of the affinity tag used, the ability to manipulate the detergent in this capture step is often of great importance. The detergent used for extraction from the membrane may not be one in which the protein remains stable for long periods of time, or it may not prove successful in crystallization. At the affinity purification step, which is usually the first step after extraction, detergent concentration is often high (e.g. 0.1% or more), depending on the CMC of the detergent. This step is therefore an opportunity to
5 Crystallization
lower the detergent concentration to a level at which, though still above the CMC, the protein remains stable and which allows for at least the possibility of crystal formation [19–23]. The organization of protein molecules into a three-dimensional crystal lattice requires the formation of intermolecular crystal contacts which stabilize its architecture. However, due to associated detergent, there is significantly less available surface area on purified membrane proteins for making these necessary contacts. The crystallization and X-ray structure of soluble proteins fused to large affinity tags, such as the maltose-binding protein (MBP), thioredoxin (TRX), or glutathione-Stransferase (GST), has recently been reported, and it has been suggested that these tags may aid membrane protein crystallization by increasing the available hydrophilic surface area [49]. This technique would be analogous to the approach taken by Kaback and coworkers using cytochrome b562 fused to lac permease [50], or Iwata and coworkers attaching protein Z to cytochrome bo3 to facilitate crystallization [51]. This approach has not yet been proven successful for obtaining high-resolution structures of any membrane protein, and moreover, in the two examples mentioned, their structures were finally solved as His-tag fusions [52, 53]. However, since MPB is targeted to the periplasm, some have speculated that an N-terminal fusion with MBP could potentially “drag” the first transmembrane spanning region through the bilayer, providing a critical foothold to proper folding and further insertions into the membrane. In addition another possible benefit is the “chaperone-like effect” provided by MBP, enhancing the likelihood of a proper fold. The successful expression of the human Na+ /glucose co-transporter illustrates both the use of an alternative affinity tag, as well as the expression of functional human membrane protein in E. coli [9]. The authors used a FLAG-tagged protein expressed in E. coli to generate 0.3 mg of pure protein per liter of culture. The use of a lac promoter system, in simple LB medium, at reduced temperature (20 ◦ C), and a bacterial strain defective in the outer membrane protein OmpT, illustrates the successful expression and purification of a human membrane protein in a bacterial system. Perhaps more importantly, the protein produced was functional when reconstituted into liposomes. This level of expression of a functional protein illustrates the utility of bacterial expression when one considers that a scale up to a fermentor would easily achieve the protein production required for crystallization attempts. Moreover, a functional assay, such as sugar transport in this example, adds to the likelihood that this method can produce material that has the ability to be crystallized.
5 Crystallization
In general, once successfully purified in a detergent-containing buffer system in which it is stable, the crystallization of a membrane protein, though challenging, is not that much different from that of soluble proteins. Crystals for membrane proteins are obtained either in batch mode or from vapor diffusion, with both hanging
15
16 Ion Channel Structural Studies
and sitting drop formats being successfully used. Sitting drops have the advantage of easier use with 96-well plates, enabling faster screening of more conditions, and smaller drop sizes, thus preserving valuable supplies of painstakingly purified protein as far as possible. An online list of crystal screening kits and the composition of each tube or well is maintained at the Structural Biology Laboratories at the University of Uppsala (http://xray.bmc.uu.se/markh/php/xtalscreens.php). Mem R brane protein specific screens have been developed, most notably the MembFac screen originally developed by Michael Stowell and commercialized by Hampton Research, which was the first attempt at compiling the conditions used to successfully crystallize membrane proteins into a single screen [54]. As the number of successful structures continues to increase so does the breadth of conditions used for successful crystal formation. Making use of the commercially available screens is an excellent place to begin. When attempting to determine suitable conditions, if availability of material allows, it is worth trying them all, since they are relatively inexpensive and cover a wide range of precipitants, salts, buffers and pH. Commercial screens in 96-well format from Hampton Research, Nextal Biotechnologies, Emerald Biostructures and others are available, facilitating simple application of this higher throughput format. Microfluidics has the potential to vastly extend the amount of crystallization screening possible but, as of now, these systems are still in their infancy. The purification and crystallization of the light-harvesting complex of photosystem II (LHC-II) illustrates several techniques useful for the extraction and stabilization of membrane proteins [55]. In this case detergent (n-nonyl-β-D-glucoside), lipid [digalactosyldiacylglycerol (DGDG)], and chlorophyll were used to solubilize the complex. The use of a combination of detergent, a critical plant lipid present in the thylakoid membrane where the complex is found, as well as chlorophyll, is readily validated from the structure. Adjacent trimers of the LHC-II complex are largely mediated by two pairs of DGDG and two pairs of chlorophyll molecules. The crystallization solution contained a second detergent, N,N-bis-(3-D-gluconamidopropyl) deoxycholamide (BigCHAP). This illustrates a highly complicated extraction and stabilization of a membrane protein but perhaps most striking is the resulting crystalline lattice. The packing of the complexes resembles that of the highly symmetric (in this case icosahedral particles) packing of viruses. In a ‘Type-II’ membrane protein crystal the detergent micelle forms a torus around the membrane-spanning region and crystal contacts are mediated by polar regions of the molecule, as described above. In the case of LHC-II, the complex exists in a proteoliposome and contacts between adjacent molecules are made through hydrophobic contacts at the lipid. The resultant ‘Type III’ membrane protein crystal is clearly formed by the judicious addition of lipid, cofactor and detergents. This may prove especially important in maintaining the correct fold of eukaryotic membrane proteins expressed in organisms that do not have bilayers with a similar lipid composition. The recent Kv1.2-β2 structure also highlights the potential importance of exogenous lipid, including phosphatidylcholine, phosphatidylethanolamine and phosphatidylglycerol, which was reported to be essential during the purification and crystallization process [2].
6 Use of Antibody Fragments
Thus, the use of lipids in membrane protein crystallography has proved useful and should be explored further.
6 Use of Antibody Fragments
Crystal formation by membrane proteins, even assuming a stable homogenous protein is purified, can be hampered if limited polar surface area is present. The hydrophobic surface of membrane spanning regions is coated by the detergent micelle, allowing solubilization of the protein. The polar head groups of the detergent are of course identical and lack a regular structure that can promote crystal formation and growth. Polar protein surfaces are responsible for mediating crystal contacts which can be minimal for many integral membrane proteins. One way of enlarging these regions is to use antibody fragments, either Fab domains, or Fv fragments specific for the membrane protein. These antibody fragments bind specifically, and therefore regularly, to the membrane protein and provide additional polar surface area. The fragments are also rigid in structure, allowing a higher probability of producing a regular three-dimensional lattice. The application of antibody fragments, both Fab and Fv, has been successful with membrane proteins [56–58]. One route to produce these is to simply use a well established hybridoma protocol to generate suitable monoclonal antibodies and then create Fab fragments using proteolytic cleavage [58]. This technique works well, as long as the proteolytic enzymes cut uniformly and produce a homogenous Fab sample. Alternatively, variable region fragments (Fv), can be generated, either as independent light and heavy chain fragments or as a single chain with a linker sequence. Both forms of variable region fragments are readily expressed in E. coli. This latter approach can be initiated from the more well established hybridoma/Fab approach described above, by cloning of the variable region of the antibodies using RT-PCR amplification of the mRNA from the hybridoma cell line. Several well characterized examples are in the literature [59]. Regardless of the route used to create antibody fragments, one feature that likely contributes to crystal formation is the ability of the fragments to lock a single conformation of the target protein. Of course any perturbation of one protein by another, although sometimes required for a successful structure determination, leads to speculation as to the usefulness of the resulting structure. Transmembranespanning regions are particularly at risk of structural perturbation from antibody fragments since these regions are often more mobile, partly due to the lack of constraints that normally exist in the membrane. However, some membrane proteins, notably channels and transporters, are inherently flexible in these regions due to their function. The ability to “lock” the structure in a particular conformation using an antibody fragment is an extra benefit, in addition to increasing polar surface area, which can promote a successful crystallization. However, it should be noted that, for one of the better known structures determined in this manner (KvAP), significant skepticism remained about the relevance of the resulting model [60–62].
17
18 Ion Channel Structural Studies
Another technique to generate antibodies is to use filamentous phage. Phage display offers several advantages in the creation of antibody fragments, most notably speed and recombinant expression of the resulting fragments. Fab or scFv fragments can be generated in as little as one month. Recently R¨othlisberger and coworkers created a phage display library that produces Fab fragments that are stable in detergent [63]. Screening of the library was carried out in the presence of 0.1% dodecylmaltoside, with subsequent binding analysis using Biacore performed in buffer containing the same concentration of detergent. The authors also provide guidance as to which affinity tags are most useful in the screening process. The resulting antibody fragments, Fab in this case, were shown to bind to their target in the low nanomolar range in a conformationally dependent manner.
7 Generation of First Diffraction Datasets
One striking thing that has been observed for a variety of membrane protein structures is the variability in the quality of diffraction from crystal to crystal, as well as the sensitivity of the crystals to cryoprotection. This latter complication may be due to the high solvent content of membrane protein crystals which can typically reach 80% [15]. First diffraction images can produce less than inspiring data, when compared with a typical soluble protein. However, any diffraction is a significant step towards a structure and even the weakest of spots, often clustered just around the beamstop, are cause for celebration. Once any diffraction is obtained, a critical foothold in the process is reached, since this represents a milestone against which further modifications can be assessed. Now it is possible to assess any changes and ask, is this better or not? A long stream of blank images will likely precede this milestone, from which it is difficult to assess the impact of any changes made. As shown in Fig. 4, it can be seen that the first diffraction image may only produce a few spots around the beamstop, and cannot be successfully indexed with any confidence. However, the process of crystal screening and cryoprotection optimization is pursued to improve their diffraction qualities (Fig. 4B). Finally, in the case of MscL, heavy metal soaks of the crystals produced higher resolution reflections that produced the final structure (Fig. 4C and D). A similar approach has been successfully applied for the structures of the bacterial transporters MsbA and EmrE, bacterial homologs to the ABC transporter family [18, 19]. Advances in automation are having a major impact on several aspects of membrane protein crystallography, particularly for screening of crystals. 96 and 384-well format crystallization trays now allow researchers to rapidly set up thousands of crystallization trials using a variety of robotic liquid-handling platforms such as the Mosquito (TTP Labtech, UK) or Hydra II (Matrix Tech. Corp., NH), but these need to be complemented with crystal imaging technology. The limit in the number of trials is often determined not by the chemical matrices that can be thought up for crystallization, but rather by restrictions on the amount of purified protein that can be produced. Since it is likely that many tens or hundreds of crystals will need to be
7 Generation of First Diffraction Datasets 19
Fig. 4 Examples of X-ray diffraction images. (A) Initial diffraction pattern from a single crystal of the MscL ion channel protein from M. tuberculosis (Tb-MscL) prior to the optimization of crystallization and cryoprotection conditions. (B) Following optimization of the crystallization and cryoprotection conditions for a crystal of Tb-MscL that diffracted to a limiting resolution of 7 Å. (C) Diffraction pattern of a Tb-MscL crystal after soaking with the heavy atom
compound, Na3 Au(S2 O3 )2 . Note the significant improvement in the diffraction limit, as compared to (B), extending to 3.5 Å. This image also illustrates the significant anisotropy and intensity decay of the reflections often observed with membrane proteins. (D) Ribbon diagram of the resulting 3.5 Å structure of the M. tuberculosis MscL, a mechanosensitive channel, reproduced with permission from Ref. [15].
20 Ion Channel Structural Studies
produced and screened, obtaining sufficient protein preparations from the outset is critical. Robots are also finding good use in the screening of crystals at the synchrotron. Anyone who has screened at the synchrotron will tell you that most of the time spent screening is taken up by mounting crystals and closing up the experimental hutch. With current state-of-the-art beam lines, the time taken to obtain the test diffraction image can be as short as 10 s. In contrast, getting the hutch open, mounting and centering the crystal and closing up the hutch takes many minutes. Robots such as the ACTOR system (Rigaku/MSC, TX) mounted at the beam line are equipped with cryo-pucks that hold dozens of crystals and are now in use at several synchrotrons. Due to the efficiency of not having to manipulate the crystals manually, one can now screen crystals at the beam line remotely. A Dewar flask loaded with pucks and dozens of crystals is shipped to the synchrotron facility and a technician can load up the robot. Automatic mounting and centering of the crystals at a preset exposure time will facilitate the screening of hundreds of crystals within a few hours. Once screened, each crystal is safely recovered and returned to the Dewar by the robot for future data collection. This is without doubt the way of the future since it massively increases the “open shutter” time at the beam lines, enhancing the efficiency of a seriously taxed resource.
8 Selenomethionine Phasing of Membrane Proteins
Congratulations! You’ve just successfully crystallized a membrane protein and got a dataset. Now all you have to do is obtain your phases and start model building. Membrane proteins involved in respiration and light harvesting have metal-containing cofactors that can be useful for phase determination. However, many membrane proteins, channels, receptors and transporters, typically do not. Although traditional multiple isomorphous replacement methods for obtaining phase information on membrane proteins and channels have been used, an increasingly useful technique widely used in soluble proteins, and just as valuable for membrane proteins, is selenomethionine multiple anomalous diffraction phasing (SeMet MAD). In spite of the often modest resolution of membrane proteins, this technique has seen spectacular successes and should not be discounted as a useful route for phase determination. A possible complication to using this technique is the large size of many membrane proteins. The larger the protein the higher the likelihood that a large number of methionine residues, and thus selenomethionines, will be present. Successful phasing using selenomethionine and MAD phasing have been reported for membrane proteins with up to 42 methionines per asymmetric unit, all at a resolution of 4 Å, underscoring the utility of this approach [17]. Incorporation of selenomethionine has been shown in E. coli (reviewed in Ref. [64]), in yeast [65], baculovirus [66], and mammalian tissue culture (BioXtal, Switzerland). In the Rees Lab at Caltech, bacterial expression of membrane proteins with incorporation of selenomethionine started with an M9 glucose minimal media with
9 MAD Phasing and Edge Scanning
subsequent media augmentation. A modified M9 medium, containing 50 mg L−1 of selenomethionine and the other 19 amino acids at 40 µg ml−1 (or 80 µg, for D,L-amino acids) was sufficient for nearly complete incorporation of selenomethionine. As with any change in growth medium, optimization of induction reagents, induction time and growth temperature should be undertaken. The addition of naturally derived media components, such as tryptone and yeast extract, should be avoided since these reagents and others contain methionine, and will likely reduce the level of selenomethionine incorporation. As one would expect, the yield of recombinant protein in minimal media is considerably lower than that for rich media. Sufficient cell mass for purification and crystallization can require scaling up, in our case up to a 60 L fermentor. Similarly, for yeast expression, both extensive strain selection as well as media screening produced high incorporation rates [65]. In some cases, cells can be grown in methionine-containing media with subsequent transfer to selenomethioninecontaining media, as was done for the first selenomethionine crystal structure that utilized insect cells and baculovirus infection. In this latter case, cells were grown and infected in methionine-rich media, then grown in methionine-depleted media and finally grown in selenomethionine-containing media. This approach is also readily applied to bacterial expression if cell growth in selenomethionine minimal media is insufficient. Current high throughput efforts in structural genomics have applied selenomethionine labeling right from the beginning. In fact, as with soluble proteins, some have suggested that it is advantageous to simply move immediately to expression of membrane proteins in selenomethionine incorporating systems. While this surely would speed the process to structure solution, it seems highly unlikely that the limitations imposed in media used for selenomethionine bioincorporation would lead to the successful membrane protein expression. A far more fruitful approach would seem to rely on a combination of multiple targets and expression systems, as mentioned throughout this chapter and a return to selenomethionine once diffraction quality crystals have been obtained.
9 MAD Phasing and Edge Scanning
A fluorescent scan of the crystal should be obtained before data are collected. Multiple wavelength anomalous diffraction (MAD) beam lines are equipped with X-ray fluorescence detectors and obtaining the values of f and f for the anomalous scattering atom(s) aids in finding the precise wavelength you should use for collecting the peak and inflection datasets. Although it is possible to look up these values, the dataset will be better for choosing the wavelengths from the scan. The Blu-Ice program (developed at SSRL) can perform an edge scan and calculate these values for the user [67]. A minimum of two wavelengths are required to do MAD phasing (hence “multiple” wavelength anomalous diffraction); however, in practice,
21
22 Ion Channel Structural Studies
the peak wavelength, inflection, and a high-energy remote wavelength are usually collected. The peak wavelength is, as it sounds, the peak of the f plot. A second wavelength, at the inflection point, is the wavelength at which the f value inflects at a minimum (very near the half maximal point on the f plot. The final wavelength is a high energy remote, usually collected at a lower wavelength distant from the edge of X-ray absorption of the anomalous scattering atom. Finally, a critical factor is to obtain highly redundant data. Anomalous scattering only produces relatively small differences between pairs of reflections (Bijvoet pairs). By obtaining data redundancy of 4-fold (each reflection measured 4 times), 5-fold or even higher, each of these pairs is measured multiple times, so the small differences in the intensity of each Bijvoet pair are statistically significant. Even high symmetry space groups, where a small angular wedge of data can produce a complete dataset, benefit, in terms of phasing, by collecting as much data as possible. Of course, the maximum would be to collect 360◦ of data at each of the wavelengths chosen. If radiation damage is not a problem, as determined by integrating and scaling the data and obtaining the Rmerge of each dataset, then this approach should be used. Since the anomalous signal from selenium is small compared to heavy atoms used in multiple isomorphous replacement, such as gold, mercury and platinum, the data from a selenomethionine MAD experiment need special treatment. Searching for the anomalous scattering atoms, selenium in this case, which have relatively small differences in the intensities of Bijvoet pairs is enhanced by allowing the search programs (Solve, for example) to use local scaling. Local scaling of the data, scaling using a reciprocal space sphere of neighboring reflections, allows the best signal to noise ratio when obtaining these slight differences in intensities. In practice, what this means is stronger anomalous signals accurately measured to the highest resolution for that dataset, which will increase the chances of solving the crystal structure.
10 Negative B-factor Application (Structure Factor Sharpening)
Anisotropic decay in the intensity of high resolution reflections has been observed frequently for membrane proteins. Since many membrane protein crystals diffract to only modest resolution (3–4 Å) obtaining the most from these weak reflections in the highest resolution shell is crucial. A successful approach that has been used with great success is structure factor sharpening (the application of a negative B-factor to the observed structure factor, F obs ). The CCP4 program can be run to apply an overall B-factor and scale to the data that will increase the F obs of the high-resolution reflection in a disproportional manner relative to the low-resolution reflections [68]. However, a disadvantage is that the application of negative B-factors increases the noise in electron density maps. Nevertheless, this technique is most valuable if noncrystallographic symmetry is present, since this excess noise is averaged out leaving maps that can be of excellent quality, thus producing a better model.
References
11 Conclusions
In spite of the somewhat daunting task of crystallizing a membrane protein such as an ion channel, there are emerging tools and techniques that will, without doubt, make structures of these intriguing proteins more common. Ion channels and membrane proteins in general have such tremendous therapeutic potential to be unlocked by the availability of high-resolution structures; one can expect the sheer number and speed of their solutions to only increase in the coming years.
References 1 J. DEISENHOFER, O. EPP, K. MIKI, R. HUBER, H. MICHEL, X-ray structure analysis of a membrane protein complex. Electron density map at 3 A resolution and a model of the chromophores of the photosynthetic reaction center from Rhodopseudomonas viridis, J. Mol. Biol. 1984, 180, 385–398. 2 S. B. LONG, E. B. CAMPBELL, R. MACKINNON, Crystal Structure of a Mammalian Voltage-Dependent Shaker Family K+ Channel, Science 2005, 309, 897–903. 3 J. C. KENDREW, G. BODO, H. M. DINTZIS, R. G. PARRISH, H. WYCKOFF, D. C. PHILLIPS. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis, Nature 1958, 181, 662– 666. 4 S. H. WHITE, The progress of membrane protein structure determination, Protein Sci. 2004, 13, 1948–1949. 5 M. WORTMAN, The Re-Emergence of Ion Channel Drug Discovery, in Start Up: Windhover’s Review of Emerging Medical Ventures, 2003, p. 9. 6 J. C. VENTER, et al., The sequence of the human genome, Science 2001, 291, 1304–1351. 7 R. GRISSHAMMER, J. TUCKER, Quantitative evaluation of neurotensin receptor 15 purification by immobilized metal affinity chromatography, Protein Expr. Purif. 1997, 11, 53–60. 8 G. FIERMONTE, V. DOLCE, F. PALMIERI, Expression in Escherichia coli, functional characterization, and tissue distribution of isoforms A and B of the phosphate
9
10
11
12
13
14
15
16
carrier from bovine mitochondria, J. Biol. Chem. 1998, 273, 22782–22787. M. QUICK, E. M. WRIGHT, Employing Escherichia coli to functionally express, purify, and characterize a human transporter, Proc. Natl. Acad. Sci.U S A 2002, 99, 8597–8601. D. N. PARCEJ, L. ECKHARDT-STRELAU, Structural characterisation of neuronal voltage-sensitive K+ channels heterologously expressed in Pichia pastoris, J. Mol. Biol. 2003, 333, 103–116. M. A. LESNEY, HTS: Seeking Signals Through the Noise, Modern Drug Discov. 2002, 5. C. TANFORD (ed.), The Hydrophobic Effect: Formation of Micelles and Biological Membranes, John Wiley & Sons, New York, 1980. T. P. ROOSILD, J. GREENWALD, M. VEGA, S. CASTRONOVO, R. RIEK, S. CHOE, NMR structure of Mistic, a membrane-integrating protein for membrane protein expression, Science 2005, 307, 1317–1321. J. F. WHITE, L. B. TRINH, J. SHILOACH, R. GRISSHAMMER, Automated large-scale purification of a G protein-coupled receptor for neurotensin, FEBS Lett. 2004, 564, 289–293. G. CHANG, R. H. SPENCER, A. T. LEE, M. T. BARCLAY, D. C. REES, Structure of the MscL homolog from Mycobacterium tuberculosis: a gated mechanosen-sitive ion channel, Science 1998, 282, 2220–2226. K. P. LOCHER, A. T. LEE, D. C. REES, The E. coli BtuCD structure: a framework for
23
24 Ion Channel Structural Studies
17
18
19
20
21
22
23
24
25
26
ABC transporter architecture and mechanism, Science 2002, 296, 1091–1098. R. B. BASS, P. STROP, M. BARCLAY, D. C. REES, Crystal structure of Escherichia coli MscS, a voltage-modulated and mechanosensitive channel, Science 2002, 298, 1582–1587. G. CHANG, C. B. ROTH, Structure of MsbA from E. coli: a homolog of the multidrug resistance ATP binding cassette (ABC) transporters, Science 2001, 293, 1793–1800. C. MA, G. CHANG, Crystallography of the integral membrane protein EmrE from Escherichia coli, Acta Crystallogr. D 2004, 60, 2399–2402. K. MIO, T. OGURA, Y. HARA, Y. MORI, C. SATO, The non-selective cation-permeable channel TRPC3 is a tetrahedron with a cap on the large cytoplasmic end, Biochem Biophys Res Commun 2005, 333, 768–777. M. WOLF, A. EBERHART, H. GLOSSMANN, J. STRIESSNIG, N. GRIGORIEFF, Visualization of the domain structure of an L-type Ca2+ channel using electron cryo-microscopy, J. Mol. Biol. 2003, 332, 171–182. P. C. DA FONSECA, S. A. MORRIS, E. P. NEROU, C. W. TAYLOR, E. P. MORRIS, Domain organization of the type 1 inositol 1,4,5-trisphosphate receptor as revealed by single-particle analysis, Proc. Natl. Acad. Sci.U S A 2003, 100, 3936– 3941. Q. X. JIANG, E. C. THROWER, D. W. CHESTER, B. E. EHRLICH, F. J. SIGWORTH, Three-dimensional structure of the type 1 inositol 1,4,5-trisphosphate receptor at 24 A resolution, EMBO J. 2002, 21, 3575–3581. SERYSHEVA II, D. J. BARE, S. J. LUDTKE, C. S. KETTLUN, W. CHIU, G. A. MIGNERY, Structure of the type 1 inositol 1,4,5-trisphosphate receptor revealed by electron cryomicroscopy, J. Biol. Chem. 2003, 278, 21319–21322. Serysheva II, S. L. HAMILTON, W. CHIU, S. J. LUDTKE, Structure of Ca2+ release channel at 14 A resolution, J. Mol. Biol. 2005, 345, 427–431. E. V. ORLOVA, SERYSHEVA II, M. van HEEL, S. L. HAMILTON, W. CHIU, Two structural configurations of the skeletal muscle
27
28
29
30
31
32
33
34
35
36
calcium release channel. Nat. Struct. Biol. 1996, 3, 547–552. SERYSHEVA II, M. SCHATZ, M. van HEEL, W. CHIU, S. L. HAMILTON, Structure of the skeletal muscle calcium release channel activated with Ca2+ and AMP-PCP. Biophys. J. 1999, 77, 1936–1944. A. MIYAZAWA, Y. FUJIYOSHI, N. UNWIN, Structure and gating mechanism of the acetylcholine receptor pore, Nature 2003, 423, 949–955. C. SATO, Y. UENO, K. ASAI, K. TAKAHASHI, M. SATO, A. ENGEL, Y. FUJIYOSHI, The voltage-sensitive sodium channel is a bell-shaped molecule with several cavities. Nature 2001, 409, 1047–1051. SERYSHEVA II, S. J. LUDTKE, M. R. BAKER, W. CHIU, S. L. HAMILTON, Structure of the voltage-gated L-type Ca2+ channel by electron cryomicroscopy, Proc. Natl. Acad. Sci.U S A 2002, 99, 10370–10375. M. C. WANG, G. VELARDE, R. C. FORD, N. S. BERROW, A. C. DOLPHIN, A. KITMITTO, 3D structure of the skeletal muscle dihydropyridine receptor, J. Mol. Biol. 2002, 323, 85–98. C. C. CHADWICK, A. SAITO, S. FLEISCHER, Isolation and characterization of the inositol trisphosphate receptor from smooth muscle, Proc. Natl. Acad. Sci.U S A 1990, 87, 2132–2136. M. F. ROSENBERG, A. B. KAMIS, L. A. ALEKSANDROV, R. C. FORD, J. R. RIORDAN, Purification and crystallization of the cystic fibrosis transmembrane conductance regulator (CFTR), J. Biol. Chem. 2004, 279, 39051–39057. A. K. MITRA, A. N. van HOEK, M. C. WIENER, A. S. VERKMAN, M. YEAGER, The CHIP28 water channel visualized in ice by electron crystallography, Nat. Struct. Biol. 1995, 2, 726–729. G. REN, V. S. REDDY, A. CHENG, P. MELNYK, A. K. MITRA, Visualization of a water-selective pore by electron crystallography in vitreous ice, Proc. Natl. Acad. Sci.U S A 2001, 98, 1398–1403. L. SANTACRUZ-TOLOZA, Y. HUANG, S. A. JOHN, D. M. PAPAZIAN, Glycosylation of shaker potassium channel protein in insect cell culture and in Xenopus oocytes, Biochemistry 1994, 33, 5607– 5613.
References 37 A. N. van HOEK, M. C. WIENER, J. M. VERBAVATZ, D. BROWN, P. H. LIPNIUNAS, R. R. TOWNSEND, A. S. VERKMAN, Purification and structure-function analysis of native, PNGase F-treated, and endo-betagalactosidase-treated CHIP28 water channels, Biochemistry 1995, 34, 2212–2219. 38 R. H. SPENCER, et al., Purification, visualization, and biophysical characterization of Kv1.3 tetramers, J. Biol. Chem. 1997, 272, 2389–2395. 39 E. BENNETT, M. S. URCAN, S. S. TINKLE, A. G. KOSZOWSKI, S. R. LEVINSON, Contribution of sialic acid to the voltage dependence of sodium channel gating. A possible electrostatic mechanism, J. Gen. Physiol. 1997, 109, 327–343. 40 N. B. CRONIN, A. O’REILLY, H. DUCLOHIER, B. A. WALLACE, Effects of deglycosylation of sodium channels on their structure and function, Biochemistry 2005, 44, 441– 449. 41 K. KLAIBER, N. WILLIAMS, T. M. ROBERTS, D. M. PAPAZIAN, L. Y. JAN, C. MILLER, Functional expression of Shaker K+ channels in a baculovirus-infected insect cell line, Neuron 1990, 5, 221–226. 42 R. A. GUBITOSI-KLUG, D. J. MANCUSO, R. W. GROSS, The human Kv1.1 channel is palmitoylated, modulating voltage sensing: Identification of a palmitoylation consensus sequence, Proc. Natl. Acad. Sci.U S A 2005, 102, 5964–5968. 43 C. E. BEAR, C. H. LI, N. KARTNER, R. J. BRIDGES, T. J. JENSEN, M. RAMJEESINGH, J. R. RIORDAN, Purification and functional reconstitution of the cystic fibrosis transmembrane conductance regulator (CFTR), Cell 1992, 68, 809–818. 44 M. CASCIO, N. E. SCHOPPA, R. L. GRODZICKI, F. J. SIGWORTH, R. O. FOX, Functional expression and purification of a homomeric human alpha 1 glycine receptor in baculovirus-infected insect cells, J. Biol. Chem. 1993, 268, 22135–22142. 45 M. V. MIKHAILOV, P. PROKS, F. M. ASHCROFT, S. J. ASHCROFT, Expression of functionally active ATP-sensitive K-channels in insect cells using baculo-virus, FEBS Lett. 1998, 429, 390–394.
46 U. S. RAO, A. MEHDI, R. E. STEIMLE, Expression of amiloride-sensitive sodium channel: a strategy for the coexpression of multimeric membrane protein in Sf9 insect cells, Anal. Biochem. 2000, 286, 206–213. 47 M. LI, N. UNWIN, K. A. STAUFFER, Y. N. JAN, L. Y. JAN, Images of purified Shaker potassium channels, Curr. Biol. 1994, 4, 110–115. 48 D. A. DOYLE, et al., The structure of the potassium channel: molecular basis of K+ conduction and selectivity, Science 1998, 280, 69–77. 49 D. R. SMYTH, M. K. MROZKIEWICZ, W. J. MCGRATH, P. LISTWAN, B. KOBE, Crystal structures of fusion proteins with large-affinity tags, Protein Sci. 2003, 12, 1313–1322. 50 J. ZHUANG, G. G. PRIVE, G. E. WERNER, P. RINGLER, H. R. KABACK, A. ENGEL, Two-dimensional crystallization of Escherichia coli lactose permease, J. Struct. Biol. 1999, 125, 63–75. 51 B. BYRNE, J. ABRAMSON, M. JANSSON, E. HOLMGREN, S. IWATA, Fusion protein approach to improve the crystal quality of cytochrome bo(3) ubiquinol oxidase from Escherichia coli, Biochim. Biophys. Acta 2000, 1459, 449–455. 52 J. ABRAMSON, I. SMIRNOVA, V. KASHO, G. VERNER, H. R. KABACK, S. IWATA, Structure and mechanism of the lactose permease of Escherichia coli, Science 2003, 301, 610–615. 53 J. ABRAMSON, et al.,The structure of the ubiquinol oxidase from Escherichia coli and its ubiquinone binding site, Nat. Struct. Biol. 2000, 7, 910–917. 54 M. H. STOWELL, in UCLA Crystallization Workshop 1993. 55 Z. LIU, et al., Crystal structure of spinach major light-harvesting complex at 2.72 A resolution, Nature 2004, 428, 287–292. 56 S. IWATA, C. OSTERMEIER, B. LUDWIG, H. MICHEL, Structure at 2.8 A resolution of cytochrome c oxidase from Paracoccus denitrificans, Nature 1995, 376, 660–669. 57 C. HUNTE, J. KOEPKE, C. LANGE, T. ROSSMANITH, H. MICHEL, Structure at 2.3 A resolution of the cytochrome bc(1) complex from the yeast Saccharomyces cerevisiae co-crystallized with an antibody
25
26 Ion Channel Structural Studies
58
59
60
61
62
63
64
65
66
67
68
Fv fragment, Struct. Fold. Des. 2000, 8, 669–684. Y. JIANG, A. LEE, J. CHEN, V. RUTA, M. CADENE, B. T. CHAIT, R. MACKINNON, X-ray structure of a voltage-dependent K+ channel, Nature 2003, 423, 33–41. G. KLEYMANN, C. OSTERMEIER, B. LUDWIG, A. SKERRA, H. MICHEL, Engineered Fv fragments as a tool for the one-step purification of integral multi-subunit membrane protein complexes, Biotechnology (N Y) 1995, 13, 155–160. L. G. CUELLO, D. M. CORTES, E. PEROZO, Molecular architecture of the KvAP voltage-dependent K+ channel in a lipid bilayer, Science 2004, 306, 491–495. F. BEZANILLA, The voltage-sensor structure in a voltage-gated channel, Trends Biochem. Sci. 2005, 30, 166–168. M. LAINE, D. M. PAPAZIAN, B. ROUX, Critical assessment of a proposed model of Shaker, FEBS Lett. 2004, 564, 257–263. D. ROTHLISBERGER, K. M. POS, A. PLUCKTHUN, An antibody library for stabilizing and crystallizing membrane proteins–selecting binders to the citrate carrier CitS, FEBS Lett. 2004, 564, 340–348. W. A. HENDRICKSON, J. R. HORTON, D. M. LEMASTER, Selenomethionyl proteins produced for analysis by multi-wavelength anomalous diffraction (MAD): a vehicle for direct determination of three-dimensional structure, EMBO J. 1990, 9, 1665–1672. D. A. BUSHNELL, P. CRAMER, R. D. KORNBERG, Selenomethionine incorporation in Saccharomyces cerevisiae RNA polymerase II, Structure (Camb.) 2001, 9, R11–R14. J. J. BELLIZZI, J. WIDOM, C. W. KEMP, J. CLARDY, Producing selenomethionine-labeled proteins with a baculovirus expression vector system, Struct. Fold. Des. 1999, 7, R263–R267. T. M. MCPHILLIPS, et al., Blu-Ice and the Distributed Control System: software for data acquisition and instrument control at macromolecular crystallography beamlines, J. Synchrotron Radiat. 2002, 9, 401–406. Collaborative Computational Project, N. The CCP4 suite: programs for protein
69
70
71
72
73
74
75
76
77
78
79
80
crystallography, Acta Crystallogr. D 1994, 50, 760–763. Y. ZHOU, J. H. MORAIS-CABRAL, A. KAUFMAN, R. MACKINNON, Chemistry of ion coordination and hydration revealed by a K+ channel-Fab complex at 2.0 A resolution, Nature 2001, 414, 43–48. R. DUTZLER, E. B. CAMPBELL, M. CADENE, B. T. CHAIT, R. MACKINNON, X-ray structure of a ClC chloride channel at 3.0 A reveals the molecular basis of anion selectivity, Nature 2002, 415, 287–294. Y. JIANG, A. LEE, J. CHEN, M. CADENE, B. T. CHAIT, R. MACKINNON, Crystal structure and mechanism of a calcium-gated potassium channel, Nature 2002, 417, 515–522. R. DUTZLER, E. B. CAMPBELL, R. MACKINNON, Gating the selectivity filter in ClC chloride channels, Science 2003, 300, 108–112. A. KUO, et al., Crystal structure of the potassium channel KirBac1.1 in the closed state, Science 2003, 300, 1922–1926. F. G. BOESS, R. BEROUKHIM, I. L. MARTIN, Ultrastructure of the 5-hydroxy-tryptamine3 receptor, J Neurochem 1995, 64, 1401–1405. H. L. LI, et al., Two-dimensional crystallization and projection structure of KcsA potassium channel, J. Mol. Biol. 1998, 282, 211–216. V. M. UNGER, N. M. KUMAR, N. B. GILULA, M. YEAGER, Three-dimensional structure of a recombinant gap junction membrane channel, Science 1999, 283, 1176–1180. J. A. MINDELL, M. MADUKE, C. MILLER, N. GRIGORIEFF, Projection structure of a ClC-type chloride channel at 6.5 A resolution, Nature 2001, 409, 219–223. O. SOKOLOVA, L. KOLMAKOVA-PARTENSKY, N. GRIGORIEFF, Three-dimensional structure of a voltage-gated potassium channel at 2.5 nm resolution, Structure (Camb.) 2001, 9, 215–220. C. SATO, et al. Inositol 1,4,5-trispho-sphate receptor contains multiple cavities and L-shaped ligand-binding domains, J. Mol. Biol. 2004, 336, 155–164. L. A. KIM, J. FURST, D. GUTIERREZ, M. H. BUTLER, S. XU, S. A. GOLDSTEIN, N. GRIGORIEFF, Three-dimensional structure of I(to); Kv4.2-KChIP2 ion channels by
References electron microscopy at 21 Angstrom resolution, Neuron 2004, 41, 513–519. 81 Q. X. JIANG, D. N. WANG, R. MACKINNON, Electron microscopic analysis of KvAP voltage-dependent K+ channels in an open conformation, Nature 2004, 430, 806–810. 82 T. NAKAGAWA, Y. CHENG, E. RAMM, M. SHENG, T. WALZ, Structure and different
conformational states of native AMPA receptor complexes, Nature 2005, 433, 545–549. 83 M. SAMSO, T. WAGENKNECHT, P. D. ALLEN. Internal structure and visualization of transmembrane domains of the RyR1 calcium release channel by cryo-EM, Nat. Struct. Mol. Biol. 2005, 12, 539– 544.
27
1
Structural Analysis of Fibrous Proteins Thomas Scheibel Technische Universit¨at M¨unchen, Garching, Germany
Louise C. Serpell University of Sussex, Falmer, United Kingdom
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Elucidating protein fiber assembly and analyzing the acquired regular fibrous structures have engaged the interest of scientists from different disciplines for a long time. The complexity of the problem, together with the paucity of interpretable data, has generated a large volume of literature concerned with speculations about possible assembly mechanisms and structural conformations. Fibrous proteins are essential building blocks of life, providing a scaffold for our cells, both intracellularly and extracellularly. However, despite nearly a century of research of the assembly mechanisms and structures of fibrous protein, we have only recently begun to get insights. In the last 10 years, atomic models for some of the most important fibrous proteins have been proposed. We are increasingly knowledgeable about how fibrous proteins form and how they function (for example, the interaction of actin and myosin to produce muscular movement). Development of methods specific to the study of fibrous proteins has enabled these investigations, since fibrous proteins are generally insoluble and often heterogeneous in length, and therefore methods used for examining the structure of globular proteins cannot be used. In this article, an overview of some commonly used methods to examine the assembly and structure of fibrous proteins is given. However, additional methods can be suitable to analyze fibrous proteins. Among the methods not discussed are solid-state NMR (ssNMR), near-field scanning optical microscopy (NSOM), scanning transmission electron microscopy (STEM), and light microscopy, for all of which reviews can be found elsewhere. Other complementary methods, specific for Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Structural Analysis of Fibrous Proteins
the particular protein, include cross-linking studies and antibody-binding epitope studies. These are valuable techniques that can help us to interpret structural data and contribute to the understanding of the assembly pathway, but due to space limitations they can not be discussed in this article.
2 Overview: Protein Fibers Formed in vivo 2.1 Amyloid Fibers
The term “amyloid” describes extracellular, fibrillar protein deposits associated with disease in humans. The deposits accumulate in various tissue types and are defined by their tinctorial and morphological properties [1]. Although the involved proteins are distinct with respect to amino acid sequence and function, the fibrils formed by different amyloidogenic proteins are structured similarly. This observation is remarkable since the soluble native forms of these proteins vary considerably: some proteins are large, some are small, and some are largely α-helical while some are largely β-sheet . Some are intact in the fibrous form, while others are at least partially degraded. Some are cross-linked with disulfide bonds and some are not. Amyloid fibers are unusually stable and unbranched, range from 6 to 12 nm in diameter, and specifically bind the dye Congo red (CR), showing an applegreen birefringence in polarized light [2, 3]. Amyloid fibers are also rich in β-sheet structure and produce a characteristic “cross-β” X-ray diffraction pattern, consistent with a model in which stacked β-sheets form parallel to the fiber axis with individual β-strands perpendicular to the fiber axis [1, 4]. Diseases characterized by amyloids are referred to as amyloidoses; these diseases are slow in onset and degenerative and can be found not only in humans but also in most mammals. Some of the amyloidoses are familial, while some are associated with medical treatment (e.g., hemodialysis) or infection (prion diseases), and some are sporadic (e.g., most forms of Alzheimer’s disease). Certain diseases, such as the amyloidoses associated with the protein transthyretin, can be found in both sporadic and familial forms. In addition to diseases with extracellular deposits, there are others, notably Parkinson’s and Huntington’s disease, that appear to involve very similar aggregates, which are intracellular, and these diseases are therefore not included in the strict definition of amyloidoses [5, 6]. Amyloidogenic proteins are not found only in mammals. Studies of Saccharomyces cerevisiae have resulted in the discovery of proteins that have properties related to those of mammalian prions [7, 8]. These yeast prions have been found to convert in vitro into ordered aggregates with all the characteristics of amyloid fibers [9]. Also, amyloidogenic peptides are found in egg envelopes (chorions) of invertebrates. More than 95% of the chorion’s dry mass consists of proteins that have remarkable mechanical and chemical properties that protect the oocyte and the developing embryo from a wide range of environmental hazards. About
2 Overview: Protein Fibers Formed in vivo 3
200 proteins have been detected and classified into two major classes, A and B. Interestingly, within the chorion, protein fibrils can be detected with characteristics of amyloid fibrils. These features include a high proportion of intermolecular β-sheet, positive staining and green birefringence with CR, and detection of long, unbranched fibrils with a diameter of 4–6 nm [10, 11]. 2.2 Silks
Arthropods produce a great variety of silks that are used in the fabrication of structures outside of the body of the animal. Silks are typically fibrous filaments or threads with diameters ranging from a few hundred nanometers to 50 µm. Insects and spiders use their silks for lifelines, nests, traps, and cocoons. Characteristically, the silk is produced in specialized glands and stored in a fluid state in the lumen of the gland. As the fluid passes through the spinning duct, a rapid transformation to the solid state takes place and the silk becomes water-insoluble [12–15]. However, some of the materials that are called silks are not produced by silk glands in the traditional sense but by extrusions of the guts, e.g., in antlions, lacewings, and many beetles [16]. The major components of traditional silks are fibrous proteins, which in the final insoluble state have a high content of regular secondary structure. Wide variations in composition and structure have been found among different silks. Therefore, the classification of silks is often based on the predominant form of regular secondary structure in the final product as determined by X-ray diffraction studies. Silks are classified as beta silks (which are rich in β-sheets; see Section 3.1.4), alpha silks (which resemble the X-ray diffraction pattern obtained from αkeratin and which have a much lower content of Gly and a much higher content of charged residues than any other silk; see Section 3.2.4), Gly-rich silks (which form a polyglycine II-like structure), and collagen-like silks (which resemble the X-ray diffraction pattern given by collagen; see Section 3.2.1). While silks from the class of Arachnida (spiders) are generally β-silks, in the class of Insecta all four classes of silk can be found, e.g., β-silks in the family of Bombycidae (moths, includes Bombyx mori, whose caterpillar is the silkworm), α-silks in the family of Apidae (bees), Gly-rich silks in the family of Blennocampinae (subspecies of sawflies), and collagen-like silks in the family of Nematinae (subspecies of sawflies) [15, 17–20]. In contrast, silks from extrusions of the guts do not form traditional silk filaments or threads; rather, they are hardened fibrous feces. 2.3 Collagens
Collagens are a family of highly characteristic fibrous proteins found in all multicellular animals. In combination with elastin (see Section 2.8), mucopolysaccharides, and mineral salts, collagens form the connective tissue responsible for the structural integrity of the animal body. The characteristic feature of a typical collagen
4 Structural Analysis of Fibrous Proteins
molecule is its long, stiff, triple-stranded helical structure, in which three collagen polypeptide chains, called α-chains, are wound around each other in a rope-like superhelix (see Section 3.2.1). The polypeptide chains in collagens have a distinctive conformation that is intimately related to their structural function [21–25]. All collagen polypeptides have a similar primary structure, in which every third amino acid is Gly (Gly-Xaa-Yaa)n with a predominance of Pro residues at position Xaa and Yaa. Many of the Pro residues, as well as occurring Lys residues at the position Yaa, are hydroxylated as a posttranslational modification. The repeating sequence of the collagen chain is the reason for the three-dimensional structure of collagens, which has a slightly twisted, left-handed, threefold helical conformation like that of poly(Pro) II and poly(Gly) II and entirely different from that of an αhelix [25–27]. Collagen polypeptide chains typically have about 1000 residues, but considerable variation occurs in certain types. Polypeptide chains of 1000 residues produce a collagen triple helix 1.4 nm in diameter and 300 nm in length. In vivo the triple helix of collagens is assembled correctly because the polypeptide chains are synthesized with non-helical, globular extensions of just over 100 residues at the amino-terminal and 300 residues at the carboxy-terminal ends. The carboxyterminal extensions associate specifically and serve to align the three polypeptide chains and to nucleate assembly of the triple helix. Next, collagens aggregate side by side into microfibrils, which are thin in diameter (10–300 nm) and hundreds of micrometers long. The collagen fibrils often aggregate further into larger, cable-like bundles with several micrometers in diameter [21–23, 27]. The main types of collagen found in connective tissue are types I, II, III, V, and XI, with type I being the principal collagen of skin and bone and by far the most common. These are the fibrillar collagens and have the rope-like structure. Types IX and XII are called fibril-associated collagens, as they decorate the surface of collagen fibrils. They are thought to link these fibrils to one another and to other components of the extracellular matrix. Types IV and VII are network-forming collagens. Type IV molecules assemble into a felt-like sheet or meshwork that constitutes a major part of mature basal lamina, while type VII molecules form dimers that assemble into specialized structures called anchoring fibrils, which help to attach the basal lamina of multi-layered epithelia to the underlying connective tissue [21–23]. In some metazoans, collagens of ectodermal origin are used to construct materials outside the confines of the living organism [28, 29]. Gooseberry sawfly silk [30], selachian egg capsules [31], worm cuticle collagens [32, 33], spongin [34], and mussel byssus are a few examples of such exoskeletal collagenous materials. These threads were dubbed “collagenous” on the basis of fiber X-ray diffraction [35, 36], hydroxyproline content [37], amino acid composition [38, 39], and shrinkage temperatures [40]. The primary structure of collagenous proteins found in the mussel byssus threads, however, encodes unprecedented natural block copolymers with three major domain types: a central collagen domain, flanking elastic or stiff domains, and histidine-rich terminal domains. The elastic domains have sequence motifs that strongly resemble those of elastin (see Section 2.8), and the stiff domains closely resemble amorphous glycine-rich regions of spider silks (see Section 2.2) [41, 42].
2 Overview: Protein Fibers Formed in vivo 5
2.4 Actin, Myosin, and Tropomyosin Filaments
All eukaryotic species contain actin, which is the most abundant protein in many eukaryotic cells. Actin filaments (F actin) can form both stable and labile structures in cells. Stable actin filaments are a crucial component of the contractile apparatus of muscle cells in so-called myofibrils (see below). Many cell movements, however, depend on labile structures constructed from actin filaments. Actin filaments are about 8 nm wide and consist of a tight helix of uniformly oriented actin molecules also known as globular actin (G actin). Like a microtubule (see Section 2.7), an actin filament is a polar structure with two structurally different ends. The so-called minus end is relatively inert and slow growing, while the so-called plus end is faster growing [43, 44]. Some lower eukaryotes, such as yeasts, have only one actin gene encoding a single protein. All higher eukaryotes have several isoforms encoded by a family of actin genes. At least six types of actin are present in mammalian tissue, which fall into three classes depending on their isoelectric point: α actins are found in various types of muscle, whereas β and γ actins are the principal constituents of non-muscle cells. Although there are subtle differences in the properties of different forms of actin, the amino acid sequences have been highly conserved in evolution and all assemble into filaments in vitro. Polymerization of actin in vitro requires ATP as well as both monovalent and divalent cations, such as K+ and Mg2+ . ATP hydrolysis weakens the bonds in the polymer (F actin) and thereby promotes depolymerization [43]. Myofibrils are cylindrical structures 1–2 µm in diameter, which in addition to thin filaments formed by actin contain thick filaments made of muscle isoforms of myosin II. Muscle myosin has two heads with ATPase and motor activity and a long, rod-like tail. A myosin II molecule is composed of two identical heavy chains, each of which is complexed to a pair of light chains. The amino-terminal portion of the heavy chain forms the motor-domain head, while the carboxy-terminal half of the heavy chain forms an extended α-helix. Two heavy chains associate by twisting their α-helical tail domains together into a coiled coil to form a stable dimer that has two heads and a single rod-like tail [45–47]. A major role of the rod-like tail is to allow the molecules to polymerize into bipolar filaments. Myosin II filaments play an important role in muscle contraction, drive membrane furrowing during cell division, and generate tension in stress fibers. Tropomyosins are a group of proteins that bind to the sides of actin filaments and (together with troponins) regulate the interaction of the filaments with myosin in response to Ca2+ . They are parallel, α-helical, coiled-coil dimers. Vertebrates typically express approximately 20 different tropomyosins from alternative splicing from about four tropomyosin genes [48]. Tropomyosins exist as both homodimers and heterodimers held together by a series of salt bridges and hydrophobic interactions. They may also interact through their amino and carboxyl termini in solution (especially in low-ionic-strength buffers). There are two major groups of tropomyosin, the high- and low-molecular-weight tropomyosins. The former type exists primarily in muscle tissues (cardiac, smooth, and skeletal), whereas non-muscle tissues
6 Structural Analysis of Fibrous Proteins
express both high- and low-molecular-weight tropomyosins. Tropomyosins bind cooperatively to actin filaments [49–51], probably because there is end-to-end binding of the tropomyosin molecules, but possibly also because tropomyosin restricts the movement of the filament, causing the filament repeat distance to be made more regular [52]. The binding of tropomyosin to actin is strongly Mg2+ -dependent [49]. High-molecular-weight tropomyosins consist most often of 284 amino acids and bind seven actin monomers, while the low-molecular-weight tropomyosins consist of about 245–250 amino acids and span six actins. The thick filaments of invertebrate muscles contain, in addition to myosin, the protein paramyosin, a rod-like, coiled-coil molecule similar to myosin [53]. Paramyosin filaments play an important part in certain molluscan and annelid muscles because they allow these muscles to maintain tension for prolonged periods without a large expenditure of energy. 2.5 Intermediate Filaments/Nuclear Lamina
Intermediate filaments are tough, durable, 10-nm wide protein fibers found in the cytoplasm of most, but not all, animal cells, particularly in cells that are subject to mechanical stress [54]. Unlike actin and tubulin, which are globular proteins, monomeric intermediate filament proteins are highly elongated, with an aminoterminal head, a carboxy-terminal tail, and a central rod domain. The central rod domain consists of an extended α-helical region containing long tandem repeats of a distinct amino acid sequence motif called heptad repeats. These repeats promote the formation of coiled-coil dimers between two parallel α-helices. Next, two of the coiled-coil dimers associate in an antiparallel manner to form a tetrameric subunit. The antiparallel arrangement of dimers implies that the tetramer and the intermediate filaments formed thereof are non-polarized structures. This distinguishes intermediate filaments from microtubules (see Section 2.7) and actin filaments (see Section 2.4), which are polarized and whose functions depend on this polarity [55–58]. The cytoplasmic intermediate filaments in vertebrate cells can be grouped into three classes: keratin filaments, vimentin/vimentin-related filaments, and neurofilaments. By far the most diverse family of these proteins is the family of α-keratins, with more than 20 distinct proteins in epithelia and over eight more so-called hard keratins in nails and hair. The α-keratins are evolutionarily distinct from β-keratins found in bird feathers, which have an entirely different structure and will not be further discussed in this context. Based on their amino acid sequence, α-keratins can be subdivided into type I (acidic) and type II (neutral/basic) keratins, which always form heterodimers of type I and type II but never homodimers [59]. Unlike keratins, the class of vimentin and vimentin-related proteins (e.g., desmin, peripherin) can form homopolymers [60]. In contrast to the first two classes of intermediate filaments, the third one, neurofilaments, is found uniquely in nerve cells, wherein they extend along the length of an axon and form its primary cytoskeletal component [61]. The nuclear lamina is a meshwork of intermediate filaments that lines the inside surface of the inner nuclear membrane in eukaryotic cells. In mammalian cells the nuclear lamina is composed of lamins, which are homologous to other intermediate
2 Overview: Protein Fibers Formed in vivo 7
filament proteins but differ from them in several ways. Lamins have a longer central rod domain, contain a nuclear localization signal, and assemble into a twodimensional, sheet-like lattice, which is unusually dynamic. This lattice rapidly disassembles at the start of mitosis and reassembles at the end of mitosis [61, 62]. 2.6 Fibrinogen/Fibrin
Fibrinogen is a soluble blood plasma protein, which during the process of clotting is modified to give a derived product termed fibrin. Fibrinogen is a multifunctional adhesive protein involved in a number of important physiological and pathological processes. Its multifunctional character is connected with its complex multi-domain structure. Fibrinogen consists of two identical subunits, each of which is formed by three nonidentical polypeptide chains, Aα, Bβ, and γ , all of which are linked by disulfide bonds and assemble into a highly complex structure [63–65]. Conversion of fibrinogen into fibrin takes place under the influence of the enzyme thrombin. The proteolysis results in spontaneous polymerization of fibrin and formation of an insoluble clot that prevents the loss of blood upon vascular injury. This clot serves as a provisional matrix in subsequent tissue repair. The polymerization of fibrin is a multi-step process. First, two stranded protofibrils are generated, which associate with each other laterally to produce thicker fibrils. The fibrils branch to form the fibrin clot, which is subsequently stabilized by the formation of intermolecular cross-linkages [66]. Fibrinogen is a rather inert protein, but after conversion fibrin interacts with different proteins, such as tissue-type plasminogen activator, fibronectin, and some cell receptors, and participates in different physiologically important processes, including fibrinolysis, inflammation, angiogenesis, and wound healing [67–69]. 2.7 Microtubules
Microtubules are stiff polymers formed from molecules of tubulin. Microtubules extend throughout the cytoplasm and govern the location of membrane-bounded organelles and other cell components. Particles can use the microtubules as tracks, and motor proteins, such as kinesin and dynein, can move, e.g., organelles along the microtubules [70, 71]. The protein building blocks of microtubules, tubulins, are heterodimers consisting of two closely related and tightly linked globular polypeptides, α-tubulin and β-tubulin. In mammals there are at least six forms of α-tubulin and a similar number of forms of β-tubulin. The different tubulins are very similar and copolymerize into mixed microtubules in vitro, although they can have distinct cellular locations. A microtubule can be regarded as a cylindrical structure with a 25-nm diameter, which is built from 13 linear protofilaments, each composed of alternating α- and β-tubulin subunits and bundled in parallel to form the cylinder. Since the 13 protofilaments are aligned in parallel with the same polarity, the microtubule itself is a polar structure and it is possible to distinguish both ends [71–74]. Microtubules reflect dynamic protein polymers that can polymerize and
8 Structural Analysis of Fibrous Proteins
depolymerize depending on their biological function. Tubulin will polymerize into microtubules in vitro, as long as Mg2+ and GTP are present, by addition of free tubulin to the free end of the microtubule. Hydrolysis of the bound GTP takes place after assembly and weakens the bonds that hold the microtubule together. Without stabilization by cellular factors capping the free ends, microtubules become extremely unstable and disassemble [73, 75]. 2.8 Elastic Fibers
The proteins elastin, fibrillin, and resilin occur in the highly elastic structures found in vertebrates and insects, which are at least five times more extensible than a rubber band of the same cross-sectional area. Elastin is a highly hydrophobic protein that, like collagen, is unusually rich in Pro and Gly residues, but unlike collagen it is not glycosylated and contains little hydroxyproline and no hydroxylysine. Elastin molecules are secreted into the extracellular space and assemble into elastic fibers close to the plasma membrane. After secretion, the elastin molecules become highly cross-linked between Lys residues to generate an extensive network of fibers and sheets. In its native form, elastin is devoid of regular secondary structure [76–79]. The elastic fibers are not composed solely of elastin, but the elastin core is covered with microfibrils that are composed of several glycoproteins including the large glycoprotein fibrillin [80, 81]. Microfibrils play an important part in the assembly of elastic fibers. They appear before elastin in developing tissues and seem to form a scaffold on which the secreted elastin molecules are deposited. As the elastin is deposited, the microfibrils become displaced to the periphery of the growing fiber [81]. Elastin and resilin are not known to be evolutionarily related, but their functions are similar. Resilin forms the major component in several structures in insects possessing rubber-like properties. Similar to elastin, the polypeptide chains are three-dimensionally cross-linked and are devoid of regular secondary structure. The amino acid composition of resilin resembles that of elastin insofar as the Gly content is high, but otherwise there is little sequence homology between these two proteins [19, 79, 82]. 2.9 Flagella and Pili
Prokaryotes and archaea use a wide variety of structures to facilitate motility, most of which include fibrillar proteins. The best studied of all prokaryotic motility structures is the bacterial flagellum, which is composed of over 20 different proteins. The bacterial flagellum is a rotary structure driven from a motor at the base and a protein fiber acting as a propeller. The propeller consists of three major substructures: a filament, a hook, and a basal body. The filament is a tubular structure composed of 11 protofilaments about 20 nm in diameter, and the protofilaments are typically built by a single protein called flagellin. Less commonly, the filament is composed
2 Overview: Protein Fibers Formed in vivo 9
of several different flagellins [83–85]. Strikingly, monomeric flagellin does not polymerize under physiological conditions. The amino- and carboxy-terminal regions of flagellin monomers are known to be intrinsically unfolded, which prevents flagellin from assembling into a nucleus for polymerization [86, 87]. Flagellin assembly is regulated through a combination of transcriptional, translational, and posttranslational regulatory mechanisms too complex to be explained in detail in this chapter. Only the unusual filament growth should be mentioned, since individual flagellin monomers are added to the growing polymer not at the base closest to the cell surface, but at the distal tip furthest from the cell, probably by passing through the hollow interior of the tubule [85, 88]. Motility is also conferred by flagella in the domain archaea, yet these structures and the sequences of the involved proteins bear little similarity to their bacterial counterparts. Rather, archaeal flagella demonstrate sequence similarity to another bacterial motility apparatus named pili. Both types of proteins have extremely hydrophobic amino acids over the first 50 amino acids and, in contrast to bacterial flagellin, are made as preproteins with short, positively charged leader peptides that are cleaved by leader peptidases. In archaea multiple flagellins are found that form a filament with a diameter of approximately 12 nm [89, 90]. Pili are responsible for various types of flagellar-independent motility, such as twitching motility in prokaryotes, and they are formed of pilins. The outer diameter of a pilus fiber is about 6 nm, and, unlike in bacterial flagellum, there is no large channel in the center of the fiber. Therefore, pili grow by adding monomers at the base closest to the cell surface. The amino terminus of pilin forms α-helices, which compose the core of the pilus fiber. The outside of the pilus fiber is made of β-sheets packed against the core and an extended carboxy-terminal tail [90–92]. 2.10 Filamentary Structures in Rod-like Viruses
Filamentous phages comprise a family of simple viruses that have only about 10 genes and grow in gram-negative bacteria. Different biological strains have been isolated, all of which have a similar virion structure and life cycle. The most prominent filamentous phage is bacteriophage fd (M13), which infects E. coli. The wild-type virion is a flexible rod about 6 nm in diameter and 800–2000 nm in length. Several thousand identical α-helical proteins form a helical array around a single-stranded circular DNA molecule at the core. It is assumed that the protein subunits forming the helical array pre-assemble within the membrane of the host into ribbons with the same inter-subunit interactions as in the virion. A ribbon grows by addition of newly synthesized protein subunits at one end, and at the same time the other end of a ribbon twists out of the membrane and merges into the assembling virion [93–95]. Some proteins of plant viruses, such as the tobacco mosaic virus (TMV), also form self-assembled macromolecules. The virion of TMV is a rigid rod (18 × 300 nm) consisting of 2130 identical virion coat protein (CP) subunits stacked in a helix
10 Structural Analysis of Fibrous Proteins
around a single strand of plus-sense RNA. Depending upon solution conditions, TMV CP forms three general classes of aggregates, the 4S or A protein composed of a mixture of low-ordered monomers, dimers, and trimers; the 20S disk or helix composed of approximately 38 subunits; and an extended virion-like rod. Under cellular conditions, the 20S helix makes up the predominant CP aggregate [96–99]. In the absence of RNA, the structure within the central hole of the 20S filament is disordered, but upon RNA binding structural ordering occurs, which locks the nucleoprotein complex into a virion-like helix and allows assembly to proceed [100]. Virion elongation occurs in both the 5 and 3 directions of the RNA. Assembly in the 5 direction occurs rapidly and is consistent with the addition of 20S aggregates to the growing nucleoprotein rod. Assembly in the 3 direction occurs considerably slower and is thought to involve the addition of 4S particles [97, 99]. 2.11 Protein Fibers Used by Viruses and Bacteriophages to Bind to Their Hosts
Certain viruses and bacteriophages use fibrous proteins to bind to their host receptors. Examples are adenovirus, reovirus, and the T-even bacteriophages, such as bacteriophage T4. The fibers of adenovirus, reovirus, and T4 share common structural features: they are slim and elongated trimers with asymmetric morphologies, and the proteins consist of an amino-terminal part that binds to the viral capsid, a thin shaft, and a globular carboxy-terminal domain that attaches to a cell receptor. Strikingly, the fibers formed of these proteins are resistant to proteases, heat, and, under certain conditions, sodium dodecyl sulfate (SDS) [101]. The assembly of the tail spike from Salmonella phage P22 is the most extensively studied model system and has allowed detailed study of in vitro and in vivo folding pathways in light of structural information [102].
3 Overview: Fiber Structures
Structural elucidation of fibrous proteins is nontrivial and often involves the use of several structural techniques combined to derive a model structure. Structural studies have often involved X-ray crystallography from single crystals of monomeric or complexed proteins to elucidate the structure of fibrous protein subunits. Electron microscopy with helical image reconstruction and X-ray fiber diffraction have then been used to examine the structure of the fiber, and, where possible, modeling of a crystal structure has been matched with the electron microscopy maps or X-ray diffraction data. Protein fiber structures that have been extensively studied include natural proteins such as actin, tubulin, collagen, intermediate filaments, silk, myosin, and tropomyosin as well as misfolded protein fibers such as amyloid, paired helical filaments, and serpin polymers. In this section, a few selected examples will be discussed.
3 Overview: Fiber Structures
3.1 Study of the Structure of β-sheet-containing Proteins 3.1.1 Amyloid Amyloid fibrils extracted from tissue are extremely difficult to study due to contaminants and the degree of order within the fibrils themselves. For this reason, structural studies of amyloid fibrils have concentrated on examining the structure of amyloid formed in vitro from various peptides with homology to known amyloidforming proteins. It was clear from the early work of Eanes and Glenner [2] that amyloid fibrils resembled cross-β silk (see Section 3.3.1), in that the fiber diffraction pattern gave the cross-β diffraction signals. This means that the diffraction pattern consists of two major diffraction signals at 4.7 Å and around 10 Å, on perpendicular axes, that correspond to the spacing between hydrogen-bonded β-strands and the intersheet distance, respectively. Models of amyloid fibrils have been built using X-ray diffraction data [1, 103–106], solid-state NMR data [107–109], and electron microscopy [110, 111]. Many of these models have a common core structure in which the β-strands run perpendicular to the fiber axis and are hydrogen-bonded along the length of the fiber. Whether the amyloid fibrils are composed of parallel or antiparallel β-sheets may vary depending on the composite protein. X-ray diffraction and FTIR data from various amyloid fibrils have suggested an antiparallel β-sheet arrangement for some amyloid fibrils formed from short peptides [105]. Solid-state NMR data from full-length Aβ peptides appear to be consistent with a parallel arrangement [108]. Alternative models of amyloid fibril core structure involving β-helical structure have been suggested [112, 113]. The mature amyloid fibril is thought to be composed of several protofilaments; evidence for this comes from electron microscopy [111, 114] and atomic force microscopy data [115]. 3.1.2 Paired Helical Filaments Paired helical filaments (PHFs) are found deposited in disease and are composed of the microtubule-associated protein Tau [116]. PHFs are found in neurofibrillary tangles (NFT) in Alzheimer’s disease brains as well as in a group of neurodegenerative diseases collectively known as the tauopathies. Electron microscopy structural work on PHFs has revealed that the filaments are composed of two strands twisted around one another, giving the characteristic appearance [117]. Straight filaments (SF) have also been isolated and are thought to be composed of the same building blocks. Calculated cross-sectional views showed that C-shaped subunits make up each strand for both PHFs and SFs [117]. More recently, X-ray and electron diffraction from synthetic and isolated PHFs have revealed a cross-β structural arrangement, similar to that shown for amyloid [118], indicating that the core structure of PHFs may share the generic amyloid-like structure. 3.1.3 β-Silks All silk fibroins appear to contain crystalline regions or regions of long-range order, interspersed with short-range-ordered, non-crystalline regions [14]. Silks spun by different organisms vary in the arrangement, proportions, and length of
11
12 Structural Analysis of Fibrous Proteins
the crystalline and non-crystalline regions. Generally, β-pleated sheets are packed into tight crystalline arrays with spacer regions of more loosely associated β-sheets, α-helices, and/or β-spirals [14]. X-ray fiber diffraction has been used extensively to examine the structure of silks made by different spiders and insects [14]. The crystalline domains of Bombyx mori fibroin silk are composed of repeating glycinealanine residues and form an antiparallel arrangement of β-pleated strands that run parallel to the fiber axis. For this structural arrangement, a model was proposed by Marsh, Corey, and Pauling [119], in which the intersheet distance is about 5.3 Å, which accommodates the small side chains. A recent refinement of the model [120] showed that the methyl groups of the alanine residues alternately point to either side of the sheet structure along the hydrogen-bonding direction, so that the structure consists of two antipolar, antiparallel β-sheet structures. Major ampullate (dragline) silk from the spider Nephila clavipes has been examined by X-ray diffraction and solid-state NMR, and, again, the silk structure is dominated by β-strands aligned parallel to the fiber axis [121]. Tussah silk, which is the silk of the caterpillar of the moth Antherea mylitta, also has an antiparallel arrangement of β-strands parallel to the fiber axis and gives an X-ray diffraction pattern similar to that of Bombyx mori silk, although some significant differences are observed. These differences are thought to arise from different modes of sheet packing due to the high polyalanine content of Tussah silk [19]. Cross-β silk was identified in the egg stalk of the green lacewing fly, Chrysopa flava [122]. The X-ray diffraction pattern was examined in detail and found to arise from an arrangement where the β-strands run orthogonal to the fiber axis and hydrogen bonding extends between the strands forming a β-sheet that runs the length of the fiber. A model was proposed by Geddes et al. [4] in which the β-pleated strand folded back on to itself to form antiparallel β-sheet ribbons. The length of the β-strands was about 25 Å, giving the ribbon a width of 25 Å. Several ribbons then stacked together in the crystallite. This model has been used as the basis of the amyloid core structure (see Section 3.1.1).
3.1.4 β-Sheet-containing Viral Fibers Adenovirus fiber shaft, P22 tail spike, and bacteriophage T4 short and long fibers are known to be predominantly β-sheet-structured [101]. The shaft regions of the adenovirus and bacteriophage T4 fibers contain repeating sequences, indicating that they might fold into regular, repeating structures. The adenovirus type-2 fiber shaft has been shown to fold into a novel triple β-spiral. The crystal structure of a four-repeat fragment was solved and showed that each repeat folded into an extended β-strand that runs parallel to the fiber axis [123]. Each β-strand is followed by a sharp β-turn enabled by the positioning of a proline or glycine residue, and the turn is then followed by another β-strand, which runs backward at an angle of about 45◦ to the fiber shaft axis. This fold results in an elongated molecule with β-strands spiraling around the fiber axis. A model of the fiber shaft suggests a diameter of around 15–22 Å and a length of around 300 Å. This fold is also thought to be included in other viral fiber structures (e.g., reovirus structure).
3 Overview: Fiber Structures
Several proteins have been shown to fold into a parallel β-helix, including pectate lyase [124] and the bacteriophage P22 tail spike [125]. These have righthanded, single-chain helices that have a bean-shaped cross-section, and the βstrands run perpendicular to the long axis. Left-handed β-helices (e.g., UDP Nacetylglucosamine acyltransferase) have a triangular cross-section, as each turn of the helix is made up of three β-strands, connected by short linkers [126]. A triple β-helix fold has been identified in part of the short-tail fiber of bacteriophage T4, gp12. It has a triangular cross-section and is made up of three intertwined chains [127]. The β-strands are again perpendicular to the long axis. This arrangement has also been identified in a region of P22 tail spike, and a similar fold has been found in bacteriophage T4 protein gp5. The arrangement forms the needle used by T4 to puncture host cell membranes [128]. Very recently, the structure of the T4 baseplate-tail complex was determined to 12 Å by cryo-electron microscopy, and this has enabled the known crystal structures to be fitted into the cryo-electron microscopy map to build up a full structure of the T4 tail tube and the arrangement of proteins at the baseplate [129]. 3.2 α-Helix-containing Protein Fibers 3.2.1 Collagen The structure of the collagen superfamily has been studied for over 50 years. Collagen has a complex supramolecular structure, involving the interaction of different molecules giving different functions. It has a characteristic periodic structure, forming many levels of association [130]. The primary structure consists of a repeating motif, Gly-Xaa-Yaa, interspersed by more variable regions. The secondary structure involves the central domain of collagen α-chain folding into a tight, right-handed helix with an axial residue-to-residue distance determined by the spacing of the Pro residues (Xaa) and hydroxyprolines (Yaa). This arrangement results in a Gly residue row following a slightly left-handed helix winding around the coil, and this is responsible for the packing to form the tertiary structure, a triple helix, where the Glys are buried inside the supercoiled molecules [130]. Depending on the collagen type, the triple helices may be comprised of differing collagen chains or of three identical chains [130]. The triple helix is a long, rod-like molecule around 1.5 nm wide. Recently, the crystal structure of the collagen triple helix formed synthetically from [(Pro-Pro-Gly)10 ]3 peptide was solved to a resolution of 1.3 Å [25] (Figure 1). The triple helix undergoes fibrillogenesis to form supramolecular structures. Nonhelical ends that do not contain the Gly-Xaa-Yaa sequence are known as the telopeptides. Many models of the molecular packing have been presented (summarized in Ref. [130]). The most recent model of type I collagen appears to fit data from X-ray diffraction, electron microscopy, and cross-linking studies [130]. The proposed model is a one-dimensional, staggered, left-handed microfibril. The packing allows for interconnections between the microfibrils via the telopeptides [131]. This model was deduced by careful testing of several models to compare calculated and observed X-ray diffraction data [132].
13
14 Structural Analysis of Fibrous Proteins
Fig. 1. Collagen triple helix model: the crystal structure of [(Pro-Pro-Gly)10 ]3 (PDB: 1K6F.pdb) [25]. The three chains of the collagen-like peptide are represented as space-filling models and a ribbon model.
3.2.2 Tropomyosin Tropomyosin molecules are arranged within the thin filament of muscle bound to actin filaments (see below). The molecules are two-chain, α-helical, coiled-coil proteins around 400 Å in length that link end to end to form continuous strands. The crystal structure of full-length tropomyosin was solved from low-resolution X-ray diffraction data with a contribution from electron microscopy, revealing a pitch of around 140 Å [133, 134]. 3.2.3 Intermediate Filaments Knowledge of the structural detail of the intermediate filament (IF) is incomplete at present. However, the primary sequence of the central rod domain of the cytoplasmic IF proteins (cytokeratin 18, cytokeratin 8, vimentin, neurofilament L protein, and nuclear laminin), reveals the heptad repeat characteristic of coiled-coil structures [58]. Coiled coils were first identified by X-ray diffraction from oriented α-keratin in hair and quill [135]. A model for the structure of vimentin has been proposed, using crystallographic structure of short fragments of conserved regions of vimentin [136]. The structure reveals the four coiled-coil regions interspersed with linker regions. The dimers assemble to form tetramers, which then associate into rod-like filaments (about 60 nm long and 16 nm in diameter), known as unitlength filaments (ULFs). Finally, the ULFs anneal longitudinally to form extended filaments [58]. These mature filaments undergo some internal rearrangement that results in the final diameter being 11 nm. The arrangement and association of the dimers has been examined using cross-linking studies as well as analysis of
3 Overview: Fiber Structures
X-ray fiber diffraction patterns and electron micrographs [58]. From these studies it has been proposed that two oppositely oriented, left-handed coiled coils may wind around one another to form a left-handed super helix with a pitch of around 140–180 Å. Cryo-electron microscopy and X-ray diffraction from hard α-keratin have suggested that the IFs have a low density core with a diameter of around 3 nm. It has been suggested that the ULFs may be a circular assembly of tetramers arranged as a hollow cylinder. Upon compaction, when the ULFs anneal into an extended filament, the lateral packing of the dimers alters and four dimers form octameric protofilaments that pack more densely, resulting in a much narrower hollow center [58]. 3.3 Protein Polymers Consisting of a Mixture of Secondary Structure 3.3.1 Tubulin The structure of microtubules has been studied using cryo-electron microscopy, electron diffraction, and X-ray crystallography. The structure of a bacterial homologue of tubulin called FtsZ has also been solved using X-ray diffraction [137], and synthetic polymers of FtsZ have been studied by electron microscopy and image reconstruction [138]. Tubulin exists as αβ heterodimers that polymerize into protofilaments that make up the microtubule. The crystal structure of tubulin heterodimers was solved by electron crystallography [139] from two-dimension crystals of zinc-induced tubulin sheets in ice, and the structural interpretation was supported by the X-ray crystal structure of FtsZ [137]. Each monomer is a compact ellipsoid made up of three domains, an amino terminal nucleotide-binding domain, a smaller second domain, and a helical carboxy terminal domain [140]. The tubulin heterodimers polymerize in a polar fashion to make longitudinal protofilaments that in turn bundle to form the microtubule. The dimer structure was docked into 20-Å maps from cryo-electron microscopy of microtubules, creating a near-atomic model of the microtubule [140]. This model revealed that the carboxy terminal helices form a crest on the outside of the protofilaments and that the long loops define the microtubule lumen [140]. The microtubule is polar, with plus and minus ends, and the structure showed that the exchangeable nucleotide bound to β-tubulin was exposed at the plus end of the microtubule, whereas the proposed catalytic residue in α-tubulin was exposed at the minus end. Cryo-electron microscopy and helical image reconstruction techniques have proved particularly powerful in examining the arrangement and interaction of motor proteins bound to microtubules [141]. Three-dimensional maps of microtubules decorated with motor domains or kinesin in different nucleotide states could be compared to naturally occurring microtubules (Figure 2) [141]. 3.3.2 Actin and Myosin Filaments The structure of actin filaments has been extensively studied using X-ray diffraction and electron microscopy [142]. Early work by Hanson and Lowry [143] revealed the basic geometry of F-actin by negative stain electron microscopy. Major advances
15
16 Structural Analysis of Fibrous Proteins
Fig. 2 Structure of the microtubule from electron microscopy image reconstruction. The microtubule is shown in green with directly bound kinesin heads in yellow and tethered heads of kinesin dimers in orange. Thanks to Linda A. Amos, LMB, Cambridge for the donation of this figure.
were made when the crystal structure of actin was solved in complex with DNase I [144]. Since then, a more detailed picture of the structure of the actin filament has been built up [145, 146] by positioning the atomic resolution crystal structure against X-ray diffraction data. The filament is composed of actin monomers arranged in a left-handed helix with 13 monomers per six turns. The pitch is about 59 Å and the separation between monomers is about 27.5 Å. Two strands of actin monomers slowly twist around one another, with a cross-over repeat of 360–380 Å. Tropomyosin chains bind along the actin filament in the grooves between the two actin strands. Each tropomyosin has a troponin complex bound, and the positioning of the troponin marks a 385-Å repeat along the axis of the actin filaments. X-ray fiber diffraction has been utilized to examine changes in the structure of muscle fibers in order to try to understand the mechanism of muscle contraction [147]. Muscular force is produced when myosin filaments interact with actin filaments by an ATP-dependent process. Time-resolved, low-angle diffraction patterns from muscle in various states can be taken and the positions and intensities of signal can be compared [147] to allow conclusions to be made about the movement of various components of the muscle fibers. Myosin filaments make up the thick filaments of striated muscle and the structure has also been examined using X-ray diffraction and electron microscopy. Modeling of the arrangement of the myosin heads from X-ray and electron microscopy data have been performed, but the arrangement of the heads has been inconclusive. Four arrangements of the heads are possible, whereby the heads may be parallel or antiparallel. In each case the two heads may lie on the same helical track in a compact arrangement or on neighboring tracks in a splayed arrangement [148]. Recent work on electron microscopy data from filaments of the spider Brachypelma smithi (tarantula) [149] tested the four possible arrangements by fitting the crystal structure of myosin heads [150] into the map generated from electron microscopy
4 Methods to Study Fiber Assembly 17
of negatively stained thick filaments. A good fit was obtained when the myosin heads were arranged antiparallel and packed on the same helical ridge.
4 Methods to Study Fiber Assembly 4.1 Circular Dichroism Measurements for Monitoring Structural Changes Upon Fiber Assembly 4.1.1 Theory of CD Circular dichroism (CD) is a phenomenon that gives information about the unequal absorption of left- and right-handed circularly polarized light by optically active molecules in solution [151–154]. CD signals are observed in the same spectral regions as the absorption bands of a particular compound, if the respective chromophores or the molecular environment of the compound are asymmetric. CD signals of proteins occur mainly in two spectral regions. The far-UV or amide region (170–250 nm) is dominated by contributions of the peptide bonds, whereas CD signals in the near-UV region (250–320 nm) originate from aromatic amino acids. In addition, disulfide bonds give rise to minor CD signals around 250 nm. The two main spectral regions give different kinds of information about protein structure. CD signals in the amide region contain information about the peptide bonds and the secondary structure of a protein and are frequently employed to monitor changes in secondary structure in the course of structural transitions [155–157]. Since many fiber-forming proteins undergo certain conformational changes upon fiber assembly (e.g., from an intrinsically unfolded state to a β-sheet-rich fiber as seen in some amyloid-forming proteins; see Figure 3), far-UV CD measurements can be used to monitor the kinetics of fiber assembly.
Fig. 3 Far-UV CD spectra of the yeast prion protein Sup35p-NM [9]. The green line represents the spectrum of soluble protein that depicts a random coil conformation. The black line represents the spectrum of protein after fiber assembly, which reflects a $-sheet-rich conformation.
18 Structural Analysis of Fibrous Proteins
CD signals in the near-UV region are observed when aromatic side chains are immobilized in a folded protein and thus transferred to an asymmetric environment. Since Tyr and Phe residues yield in very small near-UV CD signals, only Trp residues lead to utilizable near-UV CD spectra. The CD signal of aromatic residues is very small in the absence of ordered structure (e.g., in short peptides) and therefore serves as a good tool for monitoring fiber assembly of peptides (in case Trp residues are present). The magnitude as well as the wavelengths of the aromatic CD signals cannot be predicted, since they depend on the structural and electronic environment of the immobilized chromophores. Therefore, the individual peaks in the complex near-UV CD spectrum of a protein usually cannot be assigned to transitions in the vicinity of specific amino acid side chains. However, the near-UV CD spectrum represents a highly sensitive criterion for the structural state of a protein. It can thus be used as a fingerprint for either conformational state, the soluble and the fibrous one [156, 157]. 4.1.2 Experimental Guide to Measure CD Spectra and Structural Transition Kinetics Preferably only fused quartz cuvettes that display high transparency below 200 nm and are not birefringent should be used. Their quality can be examined by recording the CD of the thoroughly cleaned empty cuvettes, which should coincide with the instruments’ baseline in the absence of the cell. Deviations occur if the cell is birefringent or, more likely, if traces of impurities, such as residual proteins, cover the glass surface. The contribution of buffers and salts to the total absorbance of the sample should be as small as possible. This poses severe restrictions on the choice of solvents for CD measurements, particularly in the far-UV region. Therefore, absorbance values for salts and buffers have to be determined in the far-UV region prior to the measurement. Common denaturants such as urea and guanidinium chloride absorb strongly in the far-UV region, which restricts their use to wavelengths above 210 nm. Only highly purified salts and denaturants should be used to avoid interference from absorbing contaminants. Oxygen absorbs light below 200 nm, which makes it necessary to degas solvents for measurements in this region prior to the CD measurement. The cell compartment and the optics of the instrument should be purged with nitrogen [152, 156]. The following considerations are important for an optimal choice of protein concentration: path length and solvent condition. A good signal-to-noise ratio is achieved when the absorbance is around 1.0, but in general it should stay below 2.0. The magnitude of the CD depends on the protein concentration, and therefore its contribution to the total optical density of the sample should be as high as possible. This implies that the absorbance of the solvent should be as small as possible. Absorbance spectra of the samples prior to the CD experiment should be recorded to determine the protein concentration necessary for the calculation of the molar CD and to select the optimal conditions for the CD measurement. Most buffers are transparent in the near-UV region, and path lengths of 1 cm or more can be selected in this spectral region. For measurements in the far-UV region, short path lengths (0.2–0.01 cm) and increased protein concentrations should be
4 Methods to Study Fiber Assembly
employed to minimize the contributions of the solvent to the total absorbance. As a rule, 0.1-cm cells can be used for the spectral region down to 200 nm. For measurements extending below 200 nm, 0.01-cm cells are advisable. Solutions to be employed for CD measurements should be filtered or centrifuged to remove dust or other solid suspended particles. The presence of light scattering in CD measurements leads to artifacts that result in anomalous, red-shifted spectra. Since light scattering increases during assembly of protein fibers, this point has to be taken into consideration before evaluating the received data. When protein samples are exposed to UV light around 220 nm for an extended time, as is frequently the case when measuring far-UV CD spectra or conformational transition kinetics, photochemical damage can occur. It is advisable to examine some functional or structural properties of the protein under investigation after long-term CD measurements [152, 155, 156]. 4.2 Intrinsic Fluorescence Measurements to Analyze Structural Changes 4.2.1 Theory of Protein Fluorescence Fluorescence emission is observed when, after excitation, an electron returns from the first excited state back to the ground state. As the lifetime of the excited state is long, a broad range of interactions or perturbations can influence this state and thereby the emission spectrum. Fluorescence is thus an excellent probe to investigate conformational changes of proteins and fiber assembly [158–162]. The fluorescence of proteins originates from Phe, Tyr, and Trp residues, which show the following absorbance and fluorescence properties in water at neutral pH: Trp λmax (abs) = 280 nm, ε max (abs) = 5600 M−1 cm−1 , λmax (fluor) = 355 nm; Tyr λmax (abs) = 275 nm, ε max (abs) = 1400 M−1 cm−1 , λmax (fluor) = 304 nm; Phe λmax (abs) = 258 nm, ε max (abs) = 200 M−1 cm−1 , λmax (fluor) = 282 nm. In proteins that contain all three aromatic amino acids, fluorescence is usually dominated by the contribution of the Trp residues because both their absorbance at the wavelength of excitation and their quantum yield of emission are considerably higher than the respective values for Tyr and Phe residues. Phe fluorescence is typically not observed in native proteins due to its low intensity. The other factor in fluorescence is energy transfer between residues. Therefore, Phe fluorescence is also not observed because its emission is efficiently quenched by energy transfer to the other two aromatic amino acids. In proteins that contain both Tyr and Trp residues, fluorescence of Tyr is barely detectable due to the strong emission of Trp, due to wavelength shifts of Trp emission towards the wavelength of Tyr emission, and due to non-radiative energy transfer from Tyr to Trp residues in compactly folded proteins. Changes in protein conformation very often lead to large changes in fluorescence emission, since fluorescence of aromatic amino acids depends on their environmental conditions. In Trp-containing proteins, both shifts in emission wavelength and changes in intensity can generally be observed upon conformational changes. In hydrophobic environments, such as in the interior of a folded protein or protein fibers, Trp emission occurs at lower wavelengths than in aqueous ones. The
19
20 Structural Analysis of Fibrous Proteins
fluorescence of Trp residues can be investigated selectively by excitation at wavelengths greater than 295 nm. Due to the red shift and the increased intensity of the absorbance spectrum of Trp residues, when compared with that of Tyr residues, protein absorbance above 295 nm originates almost exclusively from Trp residues. Unlike absorbance, the changes in fluorescence upon folding can be very large, which makes intrinsic Trp fluorescence useful as a sensitive probe for conformational changes of proteins upon fiber assembly and in analyzing the kinetics thereof. The fluorescence maximum of Tyr residues remains around 303 nm irrespective of its molecular environment. Therefore, conformational changes of proteins are accompanied by changes in the Tyr emission intensity, but not in the wavelength of emission, which makes Tyr fluorescence a less sensitive tool for studying conformational changes in comparison to Trp fluorescence [159, 160, 163, 164]. 4.2.2 Experimental Guide to Measure Trp Fluorescence Emission is usually observed at a right angle to the excitation beam. Therefore, fluorescence cuvettes are manufactured with polished surfaces on all four sides. Quartz cells are required for work in the UV region. Fluorescence is an extremely sensitive technique, so it is mandatory to avoid contamination of cuvettes and glassware with fluorescent substances. Deionized and quartz-distilled water should be used, and plastic containers that may leach out fluorescent additives should be avoided. Also, laboratory detergents usually contain strongly fluorescing substances. Fluorometer cells should be cleaned with 50% nitric acid. If detergents are used for cleansing, extensive rinsing with distilled water is required. Any solvent can be used for fluorescent measurements, provided that it does not absorb in the spectral regions of the excitation and emission of the fluorophore to be examined, which implies that the buffers or solvent additives should not absorb or fluoresce at wavelengths greater than 250 nm. Because of the high sensitivity of fluorescence measurements, impurities and dust have to be avoided entirely. A single dust particle moving slowly through the light beam can cause severe distortions of the signal due to light scattering. It is helpful to stir the sample solution continuously during the fluorescence measurement, using tiny magnetic stirring bars placed at the bottom of the cuvette. However, in the case of fiber assembly kinetics, extreme care has to be taken that stirring does not influence the assembly/aggregation process. Magnetic stirrers that fit underneath the cuvette in the sample compartment of the instrument and cuvettes with specially designed bottoms for magnetic bars are commercially available. Protein emission spectra are broad and without detectable fine structure. Changes upon structural transitions frequently involve the entire fluorescence spectrum. Therefore, a variety of wavelengths and wide emission slits can be used to follow conformational changes of proteins. The fluorescence intensity generally decreases with increasing temperature. This decrease is substantial, with Tyr emission decreasing ≥1% per degree increase in temperature and an even more pronounced temperature dependence of Trp fluorescence. In contrast, there is no linear relationship between fluorophore concentration and emission intensity. The
4 Methods to Study Fiber Assembly
actual form of the concentration dependence varies with the optical geometry of the fluorometer and the cell. The nonlinearity is caused by inner filter optical effects. Between the entrance window of the cells and the volume element of the sample, from which fluorescence emission is collected, the exciting light is attenuated by absorbance of the protein and the solvent. In strongly absorbing samples (i.e., at high protein concentrations), the emission intensity is actually decreased, as almost the entire incident light is absorbed before it can reach the center of the cuvette [158, 159, 161]. Since growing fibers are large enough to scatter light, light scattering as measured in the spectrofluorometer can be used as an analytic tool to investigate fiber assembly kinetics (see Section 4.5). 4.3 Covalent Fluorescent Labeling to Determine Structural Changes of Proteins with Environmentally Sensitive Fluorophores 4.3.1 Theory on Environmental Sensitivity of Fluorophores Fluorescence spectra and fluorescence quantum yields are generally more dependent on the environment than are absorption spectra and extinction coefficients. For example, coupling a single fluorescein label to a protein reduces its quantum yield by approximately 60%, but decreases its extinction coefficient by only approximately 10%. Interactions between two adjacent fluorophores or between a fluorophore and other species in the surrounding environment can produce environment-sensitive fluorescence. The most common environmental influences on fluorescence properties are fluorophore-fluorophore interactions, solvent polarity, proximity and concentrations of quenching species, and pH of the aqueous medium. Environmental influences on fluorescence by quenching are the result of a bimolecular process that reduces the fluorescence quantum yield without changing the fluorescence emission spectrum. Fluorescence resonance energy transfer (FRET) is a strongly distance-dependent, excited-state interaction in which emission of one fluorophore is coupled to the excitation of another. Some excited fluorophores interact to form excimers, which are excited-state dimers that exhibit altered emission spectra [159, 165]. Several chemical groups in side chains of amino acids can be used to covalently cross-link fluorophores. Among the most important are amine and thiol groups. Amine-reactive groups are widely used to prepare protein and peptide bioconjugates for immunochemistry, fluorescence in situ hybridization, cell tracing, receptor labeling, and fluorescent analogue cytochemistry. In contrast, thiol-reactive dyes are principally used to prepare fluorescent proteins and peptides for probing biological structure, function, and interactions. Since the thiol functional group can be labeled with high selectivity, thiol modification is often employed to estimate both inter- and intramolecular distances using FRET. In proteins, thiol groups (also called mercaptans or sulfhydryls) are present in Cys residues. The thiol-reactive functional groups are primarily alkylating reagents including iodoacetamides, maleimides, benzylic halides, and bromomethyl ketones. Among the dyes that are particularly sensitive to environmental and conformational changes are benzoxadiazole, naphthalene, and
21
22 Structural Analysis of Fibrous Proteins
pyrene derivatives [166]. When protein conjugates of these dyes are denatured or undergo a change in conformation, a decrease in fluorescence intensity and a shift in emission to longer wavelengths are often observed. The use of covalently linked fluorophores recently allowed the investigation of the degradation, aggregation, and fiber assembly kinetics of αβ peptides and of the amyloid-forming NM-region of the yeast prion protein Sup35 [167–169]. These reports indicate the power of covalently attached fluorophores in determining fiber assembly mechanisms. In addition to the broadly applicable dyes discussed above, some fluorophores are specifically crafted to label actin (e.g., phallotoxins) or tubulin, but these fluorophores are not further discussed in the context of this chapter [166]. 4.3.2 Experimental Guide to Labeling Proteins With Fluorophores We will focus in this section on binding of reagents that can be coupled to thiol groups on proteins or peptides to give thioether-coupled products. Most of the commercially available fluorophores react rapidly at near-neutral pH and usually can be coupled to thiol groups selectively in the presence of amine groups. Haloalkyl reagents (primarily iodoacetamides) are among the most frequently used reagents for thiol modification. In most proteins, the site of reaction is at Cys residues that either are intrinsically present or result from reduction of cystines. In addition, Met residues can sometimes react with haloalkyl reagents. Maleimides are similar to iodoacetamides in their application as reagents for thiol modification. However, they are more thiol-selective than iodoacetamides, since they do not react with His or Met residues. For labeling in general, the protein should be dissolved in a suitable buffer at 50–100 µM at pH 8.0–8.5 (10–100 µM tris-(hydroxymethyl)-aminomethane [Tris] or N-(2-hydroxyethyl)-piperazine-N -2-ethansulfonic acid [HEPES]) at room temperature. In this pH range, the protein thiol groups are sufficiently nucleophilic that they react almost exclusively with the reagent in the presence of the more numerous protein amines, which are protonated and relatively unreactive. Reduction of disulfide bonds in the protein is best carried out at this stage. A 10-fold molar excess of a reducing agent, such as 1,4-dithiothreitol (DTT) or tris-(2-carboxyethyl) phosphine (TCEP), is usually sufficient. If DTT is used, then dialysis is required to remove the excess DTT prior to introducing the reactive dye. It is not necessary to remove excess TCEP during conjugation with iodoacetamides, haloalkyl derivatives, or maleimides. It is advised to carry out thiol modifications in an oxygen-free environment, because thiols can be oxidized to disulfides, especially when the protein has been treated with DTT prior to thiol modification. A 10- to 20-µM stock solution of the reactive dye in a suitable solvent should be prepared immediately prior to use, and all stock solutions have to be protected from light as much as possible. Sufficient protein-modification reagent is added to give approximately 10–20 moles of reagent for each mole of protein. The reaction should proceed for 2 h at room temperature or overnight at 4 ◦ C. For haloalkyl reactive dyes and to a lesser extent maleimides, it is essential to protect the reaction mixture from light as much as possible. Upon completion of the reaction with the protein, an excess of glutathione, mercaptoethanol, or other soluble low-molecular-weight thiols can
4 Methods to Study Fiber Assembly
be added to consume excess thiol-reactive reagent. The conjugate should be separated on a gel-filtration column, such as a Sephadex G-25 column or equivalent matrix, or by extensive dialysis to remove non-protein-bound dye. It is important to calculate the degree of labeling by determining both protein concentration and concentration of label (through the fluorophore-specific absorption) (for an overview, see Ref. [170]). After successfully labeling the protein, the fluorescence emission of the dye can be monitored upon fiber assembly. In case the reagent was linked at a site that changes its conformation upon fiber assembly, kinetics thereof can be monitored. Commonly used dyes for such investigations are 6-acryloyl-2dimethylaminonaphathlene (acrylodan) and N,N -dimethyl-N-(iodoacetyl)-N -(7nitrobenz-2-oxa-1,3-diazol-4-yl) ethylene diamine (IANBD amide). 4.4 1-Anilino-8-Naphthalensulfonate (ANS) Binding to Investigate Fiber Assembly 4.4.1 Theory on Using ANS Fluorescence for Detecting Conformational Changes in Proteins Fluorescent dyes can be used to investigate conformational changes of proteins without the necessity of conjugation. Some substances, such as the N-benzyl derivatives of 3-chloro-6-methoxy-9 aminoacridine and amino naphthalene sulfonic acids, do not fluoresce in aqueous solution but are highly fluorescent when adsorbed onto proteins, with the intensity of fluorescence depending on the structural state of the protein [171, 172]. Such fluorescent dyes exist in a soluble non-planar form, in which the radiative transition is forbidden. The adsorbed dye, on the other hand, exists in a planar configuration, in which the transition is allowed [173]. 1-Anilino-8naphthalensulfonate (ANS) is the most commonly used fluorescent dye for analyzing conformational changes of proteins. ANS is brilliantly fluorescent in solvents such as isopropanol, but it shows practically no fluorescence in water. ANS can bind to nonpolar (hydrophobic) sites in proteins through its anilinonaphthalene group and therefore reflects an excellent tool to investigate the unfolding of a protein, in which usually tightly packed hydrophobic patches become surface-exposed. However, ANS binding to proteins also depends on proteins’ cationic charge and solution pH, since the ANS sulfonate group is also involved in the ANS-protein interaction [174, 175]. Importantly, the dependence of ANS binding on electrostatic interaction between the sulfonate group and protein cationic groups does not require preexisting hydrophobic sites on or in the protein to start the binding reaction. Moreover, ANS fluorescence is quenched by water, so fluorescence is profoundly affected if a change in protein hydration or ANS accessibility to water occurs during binding [176]. Therefore, ANS can be used not only to monitor protein unfolding but also to analyze structural changes upon fiber assembly. 4.4.2 Experimental Guide to Using ANS for Monitoring Protein Fiber Assembly ANS has been used to study the partially folded states of globular proteins as well as the binding pockets of a number of carrier proteins and enzymes. ANS has
23
24 Structural Analysis of Fibrous Proteins
also been used to characterize fibrillar forms of proteins, such as fibronectin, islet amyloid polypeptide, or infectious isoforms of the prion protein PrP [177–180]. After binding to fibers, ANS can show an increased fluorescence and a blue shift of its fluorescence maxima. Protein concentrations should be in the range of 0.5–20 µM and ANS should be added in a 10- to 100-fold excess. The concentration of ANS can be determined by using a molar extinction coefficient of 6.8 × 103 M−1 cm−1 at 370 nm in methanol [181]. In a 1-cm cell the fluorescence inner-filter effect (see Section 4.2.2) becomes significant at ∼30 µM ANS. Therefore, correction factors have to be used for the fluorescence of ANS when its total concentration exceeds 10 µM. The magnitude of the inner-filter effect must be experimentally determined for each instrument and whenever the instrumental configuration is altered. The magnitude of the correction depends on the wavelength range and the path length, but not on slit-width or sample turbidity for most of the instruments [182]. To determine whether ANS binds to the fibrous protein, but not to the soluble counterpart, fluorescence emission spectra should be recorded from 400–650 nm with an excitation wavelength of 380 nm. After measuring ANS fluorescence spectra before and after fiber assembly, it can be determined whether ANS serves as a tool for investigating fiber assembly of the chosen protein. Extreme care has to be taken in determining whether ANS fluorescence changes are based solely on fiber formation or whether ANS is also able to bind non-fibrous assembly intermediates. If ANS binding is specific to the fibrous form of the protein, the wavelength with maximal fluorescence differences before and after assembly can be used to monitor fiber assembly kinetics. 4.5 Light Scattering to Monitor Particle Growth
Hydrodynamic studies can be carried out to obtain qualitative descriptions of the shape and dimensions of macromolecules and to gain precise information on molecular geometry [183]. The interpretation of hydrodynamic measurements is always based on an assumption that the real molecule must be presented by a particle of simpler shape, characterized by a small number of geometric parameters that can be determined experimentally. As a result, the choice of an inadequate model may result in incorrect interpretation. 4.5.1 Theory of Classical Light Scattering Classical light scattering is an absolute method for the determination of the molar mass of intact macromolecules, and it is particularly suited for studying large macromolecular assemblies, up to a maximum of 50 × 106 Mr . Beyond this maximum, the theory of classical light scattering, known as the “Rayleigh-Gans-Debye” approximation, is not valuable. Classical light scattering is the total, or time-integrated, intensity of light scattered by a macromolecular solution compared with the incident intensity for a range of concentrations and/or angles. The intensity of light scattered by a protein solution is measured as a function of angle with a lightscattering photometer. If the macromolecule solution is heterogeneous, as it is
4 Methods to Study Fiber Assembly
usually the case for fiber assembly reactions, Mr will be a weight-average molar mass Mw , which makes it difficult to determine precise fiber assembly kinetics [184, 185]. However, in combination with field-flow fractionation (FFF) (see Section 4.6), classical light scattering serves as a powerful tool to investigate the growth of protein fibers. The combination of classical light scattering with FFF is realized by using multi-angle laser light-scattering photometers (MALS). In addition, it is necessary to have a concentration detector, which is usually a highly sensitive differential refractometer equipped with a flow cell. Taken together, FFF-MALS is most valuable for the analysis of heterogeneous and polydispersed protein mixtures (see Section 4.6) [186–188]. 4.5.2 Theory of Dynamic Light Scattering The principle of dynamic light scattering (DLS) experiments is based on the high intensity, monochromaticity, collimation, and coherence of laser light. The primary parameter that comes from DLS measurements is the translational diffusion coefficient D. In DLS measurements, laser light is directed into a thermostated protein solution, and the intensity is recorded at either a single angle (90◦ ) or multiple angles using a photomultiplier/photodetector. The intensities recorded will fluctuate with time, caused by Brownian diffusive motions of the macromolecules. This movement causes a “Doppler” type of wavelength broadening of the otherwise monochromatic light incident on the protein molecules. Interference with light at these wavelengths causes a fluctuation in intensity, which depends on the mobility of the protein molecules. An autocorrelator evaluates this fluctuation by a normalized intensity autocorrelation function as a function of delay time. The decay of the correlation, averaged over longer time intervals, can then be used to obtain the value of D [185]. To obtain molar mass information from the value of D, a calibration of log D versus log Mr is produced, based on globular protein standards. The logarithmic plot of the normalized autocorrelation decay is a straight line for homodispersed solutions, but will tend to curve for solutions with assembling protein fibers. The spread of diffusion coefficients is indicated by a parameter known as the polydispersity factor, which can be evaluated by computer programs. DLS is particularly valuable for the investigation of changes in macromolecular systems, when the time-scale of changes is minutes or hours and not seconds or shorter, which makes it valuable for the investigation of fiber assembly [184, 189–192]. 4.5.3 Experimental Guide to Analyzing Fiber Assembly Using DLS Solutions have to be as free as possible from dust and supramolecular aggregates. This requirement is met by filtering all solutions with filters of appropriate size (0.1–0.45 µM). An optimal protein concentration is 2 mg mL−1 for a 30-kDa protein, with proportionally lower concentrations for larger proteins. The diffusion coefficient is a sensitive function of temperature and the viscosity of the solvent in DLS measurements. Therefore, measurement and calibration have to be made at the same temperature and solvent viscosity conditions. The diffusion coefficient measured at a single concentration is an apparent one (Dapp ) because of nonideality effects (finite volume and charge). These effects become vanishingly small as
25
26 Structural Analysis of Fibrous Proteins
the concentration approaches zero. Importantly, for non-globular proteins (such as growing fibers) a multi-angle instrument has to be used [185]. 4.6 Field-flow Fractionation to Monitor Particle Growth 4.6.1 Theory of FFF Field-flow fractionation (FFF) is a one-phase chromatography in which highresolution separation is achieved within a very thin flow against which a perpendicular force field is applied. This technique will be described in more detail, but it has to be mentioned that other types of fields are also used. In the cross-field technique, the flow and sample are confined within a channel consisting of two plates that are separated by a spacer foil with a typical thickness of 100–500 µm. The upper channel plate is impermeable, while the bottom channel plate is permeable and is made of a porous frit. An ultrafiltration membrane covers the bottom plate to prevent the sample from penetrating the channel. Within the flow channel, a parabolic flow profile is created due to the laminar flow of the liquid: the stream moves slower closer to the boundary edges than it does at the center of the channel flow. When the perpendicular force field is applied to the flowing, laminar stream, the analytes are driven towards the boundary layer of the channel, the so-called accumulation wall. Diffusion associated with Brownian motion in turn creates a counteracting motion. Smaller particles, which have higher diffusion rates, tend to reach an equilibrium position farther away from the accumulation wall. Thus, the velocity gradient flowing inside the channel separates different sizes of particles. Smaller particles move much more rapidly than larger particles due to their higher diffusion coefficients, which results in smaller particles eluting before larger ones. This is exactly the opposite of a size-exclusion chromatography separation, in which large molecules elute first. With FFF separation there is no column media to interact with the samples. Even for very high-molar-mass polymers, such as protein fibers, no shearing forces are applied. The entire separation is gentle, rapid, and non-destructive, without a stationary phase that may interact, degrade, or alter the sample [193–196]. The knowledge of the actual molecular shape is of great importance for analyzing proteins using FFF. Regular and simple geometrical shapes with well-defined boundary surfaces are formed by just a few macromolecules. In contrast, most dissolved macromolecules, especially protein fibers, have to be described in terms of an asymmetrical, expanded and inhomogeneous coil. Considering the complex nature of protein fiber conformations, two parameters are important: the root-mean-square radius and the hydrodynamic radius of the macromolecule. Although a theory predicts the hydrodynamic size of eluting species as a function of elution time, absolute determination of size without reference to standards, assumptions about the conformation of the particles, etc., can be obtained only by combination of FFF with multi-angle light-scattering (MALS) detectors (see Section 4.3) [186–188]. FFF in combination with MALS allows separation of protein fibers of different sizes and therefore serves as a tool to monitor assembly and elongation of protein fibers.
4 Methods to Study Fiber Assembly
4.6.2 Experimental Guide to Using FFF for Monitoring Fiber Assembly FFF reflects a unique tool for analyzing fiber assembly, since it is independent of sample impurities and sample composition. The amount of sample injected can vary widely (0.5–500 mg) because sample size has a negligible effect on the separation efficiency, as long as it is sufficiently large to yield a good detector response and sufficiently small to avoid overloading (a few micrograms is usually a suitable amount for injection). Unlike for other techniques described in this chapter, no general experimental guidelines can be provided, since the flexible operation of FFF has to be adapted for specific needs. For the two most important parameters, cross-flow and channel flow, the optimum flow rates have to be adjusted specifically for each protein investigated. At fixed flow rates, retention time significantly increases with increasing molar mass, accompanied by a growth in bandwidth. At low flow rates, separation power is sacrificed unnecessarily and rather narrow peaks are obtained. At high flow rates, detectability decreases due to increasing dilution [196–198]. 4.7 Fiber Growth-rate Analysis Using Surface Plasmon Resonance 4.7.1 Theory of SPR Surface plasmon resonance (SPR) is an optical technique that is a valuable tool for investigating biological interactions. SPR offers real-time in situ analysis of dynamic surface events and thus is, for instance, capable of defining rates of fiber growth. When light traveling through an optically dense medium (e.g., glass) reaches an interface between this medium and a medium of a lower optical density (e.g., air), a phenomenon of total internal reflection back into the dense medium occurs [199]. Although the incident light is totally internally reflected, a component of this light, the evanescent wave, penetrates the interface into the less dense medium to a distance of one wavelength [200]. In SPR measurements a monochromatic, polarized light source is used, and the interface between the two optically dense media is coated with a thin metal film (less than one wavelength of light thick). The choice of metal used is critical, since the metal must exhibit free electron behavior. Suitable metals include silver, gold, copper, and aluminum. Silver and gold are the most commonly used, since silver provides a sharp SPR resonance peak and gold is highly stable [201]. The evanescent wave of the incoming light is able to couple with the free oscillating electrons (plasmons) in the metal film at a specific angle of incidence, and thus the plasmon resonance is resonantly excited. This causes energy from the incident light to be lost to the metal film, resulting in a reduction in the intensity of reflected light, which can be detected by a two-dimensional array of photodiodes or charge-coupled detectors. Therefore, if the refractive index directly above the metal surface changes by the adsorption of a protein layer, a change in the angle of incidence required to excite a surface plasmon will occur [202]. By monitoring the angle at which resonance occurs during an adsorption process with respect to time, an SPR adsorption profile can be obtained. The difference between the initial and the final SPR angle gives an indication of the extent of
27
28 Structural Analysis of Fibrous Proteins
adsorption, and the positive gradient of the SPR adsorption curve determines the rate of adsorption, allowing analysis of, e.g., assembly kinetics of a growing protein fiber. 4.7.2 Experimental Guide to Using SPR for Fiber-growth Analysis One of the most important parameters for SPR measurements is the surface where the molecules interact. The commercial availability of sensor chips with robust and reproducible surfaces makes SPR convenient to use. The most commonly used surface is carboxymethyl dextran (CMD) bound to a gold substrate. CMD is covalently bound to the sensor chip surface via carboxyl moieties of dextran. Functional groups on the ligand that can be used for coupling proteins include NH2 , SH, CHO, and COOH. CMD sensor chips can be regenerated by selective dissociation of the bound protein from the covalently immobilized ligand. Other commercially available sensor chips have: surfaces made of dextran matrices with a low degree of carboxylation (providing fewer negative charges), shorter carboxymethylated dextran matrices (valuable for large protein assemblies), chips with coupled nitrilotriacetic acid (designed to bind histidine-tagged molecules), chips with pre-immobilized streptavidin (for capturing biotinylated ligands), and plain gold surfaces (no dextran or hydrophobic coating), which provides the freedom to design customized surface chemistry such as self-assembled monolayers (SAM) [203]. The specific coupling chemistry has to be selected for each experiment individually. In the case of fiber growth kinetics, it is necessary to couple initiating molecules, which could be seeds, nuclei, or monomeric protein, in a manner that polymerization is not sterically hindered. This implies some fundamental knowledge on the polymerization process of the protein to be investigated. Experimental design and data processing dramatically influence the quality of SPR-derived data. However, when properly utilized, SPR can be a powerful tool for determining kinetic and equilibrium constants of molecular interactions [204]. After kinetic analysis of fiber assembly, the SPR chips can be microscopically investigated by atomic force microscopy (AFM) to obtain additional morphological insights into the assembled protein fibers (see Sections 4.8 and 5.5) [205]. 4.8 Single-fiber Growth Imaging Using Atomic Force Microscopy 4.8.1 Theory of Atomic Force Microscopy Atomic force microscopy (AFM) measures the height of a solid probe over a specimen surface. AFM has been developed as a tool for imaging biological specimens and dynamic measurements of biological interactions. AFMs operate by measuring attractive or repulsive forces between a tip and a sample, with the force between the sample and the tip causing the cantilever to bend. The magnitude of this force depends on the distance between the tip and the sample and on the chemical and mechanical properties (i.e., stiffness) of the sample. Deflection of a laser beam focused onto the end of the cantilever detects the bending of the cantilever. Since the sample is mounted on top of a ceramic piezoelectric crystal (piezo), a feedback
4 Methods to Study Fiber Assembly
loop between the laser detector and the piezo controls the vertical movement of the sample. Depending on the AFM design, scanners are used to translate either the sample under the cantilever or the cantilever over the sample. By scanning in either way, the local height of the sample is measured. Three-dimensional topographical maps of the surface are then constructed by plotting the local sample height versus horizontal probe tip position [206]. AFM can achieve a resolution of 10 pm and, unlike electron microscopes, can image samples in air and under liquids, thereby allowing the imaging of hydrated specimens [207, 208]. This feature can be used to monitor fiber growth as previously shown for amyloid fibril formation [115, 209–211]. In its repulsive “contact” mode, the instrument lightly touches a tip at the end of a leaf spring or “cantilever” to the sample. As a raster scan drags the tip over the sample, a detection apparatus measures the vertical deflection of the cantilever, which indicates the local sample height. Thus, in contact mode the AFM measures hard-sphere repulsion forces between the tip and sample. In tapping mode, the cantilever oscillates at its resonant frequency (often hundreds of kilohertz) and is positioned above the surface, so that it taps the surface for only a very small fraction of its oscillation period. The very short time over which this contact occurs means that lateral forces are dramatically reduced as the tip scans over the surface. Tapping mode is commonly used for poorly immobilized or soft samples. Thus, in both modes the surface topography of the biological specimen is recorded and its elevations and depressions can be accurately monitored [207, 208, 212]. 4.8.2 Experimental Guide for Using AFM to Investigate Fiber Growth Experiments should be carried out in tapping mode in liquid using a fluid cell, since this mode allows for higher-resolution liquid imaging than does the contact mode. The proteins should be dissolved in low-salt buffers (it is best to avoid any salts) such as 10 µM Tris/HCl or 5 µM potassium phosphate. Protein concentrations should be around 50 µg mL−1 to obtain high-quality images. Also, atomically flat surfaces should be used to immobilize the assembling proteins. Mica is a commonly used surface material for protein attachment and fibril growth. Since slight drifts during repeated scans can occur in experiments that take several hours, common points easily visible on the surface should be chosen as a reference (such as protein aggregates or other impurities). The use of short, narrow-legged silicon nitride cantilevers (nominal spring constant 0.032 N/m) is emphasized. The best imaging results can be obtained using a tapping frequency in the range of 9 to 10 kHz. Typical scan rates are between 1 and 2 Hz, with drive amplitudes in the range of 150 to 300 mV [115, 210]. 4.9 Dyes Specific for Detecting Amyloid Fibers 4.9.1 Theory on Congo Red and Thioflavin T Binding to Amyloid Congo red (CR) is a diazo dye that binds relatively specifically to many amyloid proteins [213, 214]. The absorbance spectrum of CR changes upon binding to amyloid
29
30 Structural Analysis of Fibrous Proteins
Fig. 4 Absorption spectra of CR in buffer (without protein) (black line), of CR in the presence of soluble yeast Sup35p-NM (blue line), and of CR in the presence of amyloid fibers assembled with Sup35pNM (red line).
(Figure 4). Binding interactions of CR and the concomitant spectral shift depend on the structure of the amyloid, specifically the β-sheet conformation [215]. Protein fibers, which lack or contain only a minor proportion of β-sheet secondary structure, do not stain with CR. However, β-sheet secondary structure is not sufficient for CR binding, since no soluble β-sheet-rich globular protein is known to bind CR. It is suggested that CR binding occurs through a combination of both hydrophobic and electrostatic interactions [216]. The spectral shift upon binding might be caused by the stacking of several CR molecules in a defined manner along the amyloid fibers. Importantly, CR is a well-established inhibitor of fibril formation for several proteins, such as Aβ, amylin, prions, and insulin, and therefore it cannot be used to monitor fiber assembly in real time [216]. The mechanism of assembly inhibition is so far unsolved, but it could be shown that CR in addition to fibrous protein binds to native or partially unfolded conformational states and stabilizes them, which might prevent fiber formation. However, binding to the non-fibrous proteins does not lead to any spectral shift. Although binding of CR to amyloid fibers formed by Aβ peptides and insulin was highly specific and could be quantified [214], the fact that CR also binds to native, partially unfolded conformations of several proteins indicates that it must be used with caution as a diagnostic test for the presence of amyloid fibrils in vitro. The benzothiazoles thioflavin S and thioflavin T (ThT) are other classical amyloid stains that bind specifically to β-sheet-rich fibers but not to monomeric or oligomeric intermediates [217, 218]. ThT proved to be particularly attractive because of a substantial hypochromic shift of the excitation maximum of the bound dye from 336 to 450 nm, permitting selective excitation of the bound species. ThT fluoresces only when it is bound to many but not all amyloid fibrils and fibrillar β-sheet species [218]. The binding reaction is rapid (completed within 30 s) and is sensitive enough to avoid significant contributions from light scattering. The advantage of ThT over CR is that ThT does not inhibit fiber assembly and can
4 Methods to Study Fiber Assembly
therefore also be used as a dye for real-time fiber growth, although it is important to investigate in each particular case whether ThT stimulates assembly kinetics or stabilizes the fibers [219].
4.9.2 Experimental Guide to Detecting Amyloid Fibers with CR and Thioflavin Binding In a typical CR binding assay, CR binding can be quantified by a specific spectral shift. The classic CR birefringence used for histochemistry is only a qualitative technique that cannot be used for quantification. For spectral shift analysis, the amyloid fibrils should be dissolved in phosphate buffer, mixed with a solution of CR in phosphate buffer to yield a final CR concentration of 2–20 µM and incubated for 10 min, since an absorption maximum is reached after 10 min of incubation time [214]. The ratio of CR (in micromolar) to amyloid fibrils (in micrograms per milliliter, determined by the total concentration of soluble protein) should not fall below 1:5. Therefore, the total protein concentration should not exceed 100 µg µL−1 . The absorbance of control samples containing only CR (without protein) and only protein fibrils (without CR) should be monitored at 540 nm and 477 nm, respectively. Calculating the zeroed (to the controls) absorbances of the amyloid fibrils incubated with CR at 540 nm and 477 nm with the formula (A540 /25295) – (A477 /46306) leads to the moles CR bound per liter. From this, the moles CR bound per mole protein can be calculated. For ThT binding experiments, a stock solution of ThT is prepared at a concentration of 1 mM in double-distilled water and stored at 4 ◦ C protected from light to prevent quenching until it is used. For high-throughput measurements, ThT fluorescence can be determined in 96-microwell plates using a fluorescence plate reader 20 µM ThT is added. To the protein fiber solutions to be incubated in 96-microwell polystyrene plates with flat bottoms. A sample volume of 200 µL is added to each well. Five replicates corresponding to five wells are measured for each sample to minimize the well-to-well variation. The plates should be covered by ELAS septum sheets and incubated without agitation or shaking. Fluorescence measurements should be performed every 30 min (excitation at 450 nm, emission at 482 nm). The read time for each well is 1 s, and the total read time for the whole plate is approximately 3 min. Protein fiber adsorption to the plates has to be investigated by incubating protein fiber solutions in the plate for 10 min at 37 ◦ C with stirring and by determining the protein concentration before and after incubation by measuring the UV absorbance. Alternatively fibril formation can be performed in glass vials. Ten-microliter aliquots are withdrawn from the glass vials and added directly to a fluorescence cuvette (1-cm path length semi-micro quartz cuvette) containing 1 mL of a ThT mixture (5 µM ThT, 50 mM Tris buffer, and 100 mM NaCl pH 7.5). Emission spectra are recorded immediately after addition of the aliquots to the ThT mixture from 470 nm to 560 nm (excitation at 450 nm). For each sample, the signal can be obtained as the ThT intensity at 482 nm, from which a blank measurement recorded prior to addition of protein to the ThT solution is subtracted.
31
32 Structural Analysis of Fibrous Proteins
5 Methods to Study Fiber Morphology and Structure 5.1 Scanning Electron Microscopy for Examining the Low-resolution Morphology of a Fiber Specimen 5.1.1 Theory of SEM Scanning electron microscopy (SEM) involves the electron beam being passed over a conductive surface in a series of parallel lines, in a manner similar to that used in television imaging. Electrons leave the source and are focused onto the surface of a specimen as a very fine point. The probing electron beam penetrates only less than a few microns into the specimen and the electrons interact with the specimen surface, inducing a variety of radiations. Each leaves the specimen in a variety of directions and can be counted by a detector and displayed. The direction is not important; only the number of photons or electrons leaving the spot on the sample needs to be measured. The probing beam is moved to an adjacent spot on the specimen and the information is output to the display. This continues rapidly across the specimen, each point being amplified and magnified to form the image [220]. The type of information collected depends on the particular radiation being used as the information signal. The types of radiation utilized include visible light, X-rays, backscattered-electron current, and induced-specimen current [220]. Most commonly, secondary electrons emitted by the interaction of the beam are collected by a detector and the three-dimensional topographic image is built up. This technique, known as emissive mode, is commonly used for large specimens to examine morphological details. Coating the specimen in a conductive layer, usually gold coating, improves the specimen image, since the lower the atomic number of the specimen, the better the emission of secondary electrons [221]. The technique has been used extensively for the observation of silk fibers’ overall morphology. In general, however, the range of resolution (typically 100 nm to 1 mm) is lower than is required for examination of most protein fibers but is ideally suited for topographic imaging. SEM allows a large depth of field compared to the optical microscope, permitting the visualization of rough surfaces in focus across the whole specimen. In addition, the contrast within scanning micrographs provides a three-dimensional effect when the specimen is tilted with respect to the electron beam. 5.1.2 Experimental Guide to Examining Fibers by SEM SEM gives information only about the external surface of the specimen. Usually, the specimen requires fixation and dehydration. Fixation is commonly performed using osmium tetroxide or formaldehyde-glutaraldehyde as a fixative, depending on the nature of the material being examined. Dehydration is usually carried out by passing the specimen though a graded series of alcohol. Freeze-drying may be used to remove water from a specimen, and this is best achieved by freeze-drying from a nonpolar solvent rather than from water (e.g., amyl acetate can be substituted for water).
5 Methods to Study Fiber Morphology and Structure
After freeze-drying, the specimen is warmed to room temperature before air is readmitted to the vacuum chamber to prevent rehydration. The specimen is often coated with a thin layer of carbon to maintain the dried state and finally coated with a layer of heavy metal. Possibly, dehydration and coating may be performed within the same vacuum chamber. Critical point drying is also a method used for dehydration of some specimens. This involves replacing the water within a specimen first by a dehydrating agent, such as ethyl alcohol, followed by an intermediate liquid (e.g., amyl acetate) and a transitional liquid (e.g., carbon dioxide). This transitional liquid will be removed from the specimen at a “critical point” of temperature and pressure, when it becomes converted to a gas. Air drying is used less often due to the surface tension effects causing distortion in the specimen, but it is occasionally used for certain specimens. In order to improve the electrical potential of the specimen surface, specimens are coated with a thin layer of conducting materials, since the dehydrated biological specimen usually conducts poorly and may also build up beam-induced charge, which results in artifacts. Usually, a specimen will be coated with a thin layer of carbon, followed by an outer layer of a heavy metal, commonly gold. The thin layer of gold is applied by vacuum evaporation from a tungsten filament. The method of data collection will depend considerably upon the type of specimen [220]. In general, for biological specimens, it is thought that the acceleration should be kept low (10 kV) to reduce damage. The current should also be kept low, while enabling the collection of a relatively noise-free image. Astigmatism (i.e., a different direction focus at different points) is a very common limitation to resolution and therefore should be prevented as much as possible by maintaining the microscope to a high standard and corrected during data collection. Selection of the final aperture depends on the magnification and the depth of focus required. A large aperture (200 micron) is appropriate for high-resolution work, whereas at low magnification, depth of focus may be the prime requirement and therefore a small final aperture may be used (50 micron). It may prove useful to image the specimen at different tilts to examine different angles of the material. Data recording can be on film or, more commonly, on CCD camera. The images may then be examined in detail (Figure 5). 5.2 Transmission Electron Microscopy for Examining Fiber Morphology and Structure 5.2.1 Theory of TEM Electron micrographs represent a projection of the density distribution of the specimen onto a plane [222]. Therefore, an electron microscope image may contain sufficient information to generate the three-dimensional structure of an object. Transmission electron microscopy (TEM) is commonly used for examination of macromolecular protein assemblies, from single particles, such as viruses, to fibers, such as microtubules. Materials for TEM must be specially prepared to a thickness that allows electrons to transmit through the sample in a manner similar to how light is transmitted through materials in conventional optical microscopy.
33
34 Structural Analysis of Fibrous Proteins
Fig. 5 SEM of major and minor ampullate silks collected from the garden cross spider Araneus diadematus.
Since the wavelength of electrons is much smaller than that of light, the optimal resolution attainable for TEM images is many orders of magnitude better than that of a light microscope. This means that TEMs can reveal the finest details of internal structure – in some cases as small as individual atoms. The energy of the electrons in the TEM determines the relative degree of penetration of electrons in a specific sample or, alternatively, influences the thickness of material from which useful information may be obtained. Conventional electron microscopes tend to be run at 80–200 kV. The electron dose must be kept low to reduce radiation damage to the biological specimen. Thus, the contrast from a micrograph of biological specimens can be very poor, and therefore negative staining is required to visualize the sample. The negative stains are composed of heavy metals (such as uranyl acetate) that scatter the electrons and are electron-opaque, which results in a dark image of the stain deposited around a protein molecule or fiber. The image visualized within the electron microscope is two-dimensional. Image reconstruction techniques can be applied to produce three-dimensional images by single-particle averaging for globular molecules or by helical image reconstruction for fibers. Further analysis of electron micrographs is useful to exceed what is possible by visual interpretation to overcome problems such as signal detection in the presence of noise and interpretation of phase contrast images. For further discussion on analysis of images, see Section 5.3 on cryo-electron microscopy. 5.2.2 Experimental Guide to Examining Fiber Samples by TEM The way in which the sample is prepared can vary depending on the stability of the specimen. The sample must be dehydrated since it is viewed in a vacuum. In some cases the sample is fixed prior to preparing microscope grids. The grids
5 Methods to Study Fiber Morphology and Structure
used for TEM can consist of a layer of plastic, such as Formvar or Pioloform, overlaid with carbon or simply a thin carbon film. The grids may be purchased ready-made or can be made to the user’s specifications. Detailed procedures for making electron microscope grids can be found in Ref. [223]. Improved contrast and resolution can be obtained by using a thin-layer support, and this is best achieved by making carbon support films on mica that are then transferred to the grid. Alternatively, the carbon can be evaporated onto grids covered with a plastic film. The plastic film can then either be dissolved, leaving the thin carbon film, or left in place, resulting in more robust grids but a higher background upon imaging of a specimen. 5.2.2.1 Negative Staining for Imaging Fibrous Proteins As previously mentioned, it is usually necessary to provide contrast for visualization of a biological specimen by negative staining. These are usually heavy metal salts such as uranyl acetate, potassium phosphotungstate and ammonium molybdate. Detailed considerations of the advantages of different stains and choice of concentration are summarized in Ref. [223]. Uranyl salts give the greatest amplitude contrast but have slightly larger microcrystallinity/granularity after drying than some other negative stains, whereas potassium phosphotungstate gives fine granularity but lower image contrast [223]. Spreading agents may be a useful part of preparation of some samples. Surfactants such as n-octyl-β-glucopyranoside can be used to evenly disperse the molecules. The spreading agent can be added to the biological sample or to the negative stain. However, glow-discharge treatment of the microscope grid may be sufficient to allow even spread of the sample. The staining procedure varies considerably among different researchers. However, the simplest technique involves the placement of droplets (∼20 µL) of the specimen, several of filtered water and stain in a row, onto parafilm. The grid can then be placed facedown on each droplet in turn and for a specified amount of time. Alternatively, droplets (∼4 µL) may be placed onto an upturned grid and blotted away using filter paper before the next droplet is placed on the grid. 5.2.2.2 Metal Shadowing for Enhanced Contrast of Fibrous Proteins The contrast provided by negative-stain techniques may be inadequate for visualization of rodshaped particles, and metal shadowing may increase the contrast [224], allowing better imaging of the morphological features. Metal shadowing may be unidirectional or rotary. The protein to be examined must be in a solution of glycerol (50% v/v) and then be sprayed onto a piece of freshly cleaved mica and dried. The protein-covered mica is coated with platinum or another heavy metal (such as tantalum/tungsten) by evaporation and then coated with a layer of carbon. The metal/carbon replicas are then floated off onto cleaned grids and finally examined under the electron microscope. Contact prints of the micrographs provide the best images for further analysis (Figure 6). Details of this procedure can be found in Ref. [224].
35
36 Structural Analysis of Fibrous Proteins
Fig. 6. Platinum carbon shadowing micrographs of wild-type alpha-synuclein fibrils (assembled as previously described [250]).
5.3 Cryo-electron Microscopy for Examination of the Structure of Fibrous Proteins 5.3.1 Theory of Cryo-electron Microscopy Cryo-electron microscopy allows the imaging of samples in a hydrated state encased in a thin layer of ice. No stain is required and contrast is produced by the density difference between the molecule and the vitreous (non-crystalline) ice. The samples benefit from the fact that they are not flattened by gravity, are unstained, and can be viewed at near-physiological conditions (i.e., hydrated). High-resolution data can be collected from samples at liquid nitrogen temperatures using low-dose techniques, which minimize radiation damage to the sample. Cryo-electron micrographs usually show a low signal-to-noise ratio due to the low dose of electrons, the lack of stain, and the use of small defocuses. Therefore, cryo-electron micrographs benefit from additional analysis such as Fourier filtering and image-reconstruction techniques. High-resolution structures (to 6 Å) have been elucidated by single-particle analysis from cryo-electron micrographs of large protein molecules. Helical image reconstruction from electron micrographs can allow the elucidation of fibrous structures. 5.3.2 Experimental Guide to Preparing Proteins for Cryo-electron Microscopy Cryo-electron microscopy grid preparation involves a certain amount of development to find the ideal conditions for examination of a particular protein sample. Grids are made from a holey carbon film, and the size of the holes and thickness of the carbon should be optimized for the protein molecule’s size, as the thickness of the ice layer in which the protein is encased should be ideal for the
5 Methods to Study Fiber Morphology and Structure
size of the protein. The thickness of the ice within a carbon hole will depend on the diameter of the holes, the thickness of the carbon, and the treatment of the grid during freezing. Often, grids are glow-discharged to spread out the water layer, resulting in thinner ice, or amylamine is used to increase the thickness. Finally, the technique used for grid blotting before the sample is flash-frozen will affect the ice thickness. Holey carbon grids can be purchased or made by a procedure similar to that for making carbon films for negative-stain electron microscope grids [223]. The concentration of the fiber sample should be optimized for cryo-electron microscopy first by negative-stain EM. Ideally, the concentration should be such that several fibers are found in the scope of a micrograph but that they rarely cross over. A droplet of the fiber sample is then placed onto the optimally prepared holey carbon grids, blotted, and plunged into liquid ethane cooled by liquid nitrogen. If the procedure is carried out correctly, the grid will be covered in a thin layer of vitreous ice. The grid should be stored in liquid nitrogen until required. Transfer to the cryo-stage specimen holder for the electron microscope must be performed at low humidity and at a temperature lower than −160 ◦ C to prevent ice accumulation on the grid and to preserve the vitreous water. Specialized equipment to transfer the grid to the electron microscope specimen holder is available (Oxford Instruments, Gatan). The cryo-TEM must also have low vapor content; this is achieved by a liquid nitrogen pre-cooled anti-contaminator system to maintain the vitreous ice. The temperature of the specimen should then be maintained at less than −125 ◦ C to prevent transformation of the vitreous water to crystalline cubic ice. The appearance of crystalline ice may not be visible in the microscope but can be observed as diffraction spots upon electron diffraction. Usually, cryo-electron micrographs will be collected at various defocuses in order to collect a full dataset. This is because at each defocus the contrast transfer function will pass through zero, giving rings of uncollected data in the power spectrum. 5.3.3 Structural Analysis from Electron Micrographs Data collection for revealing images of protein fibers may include systematically tilting the specimen to produce a series of images with different projections of the material. However, it may be possible to reconstruct the structure of a helical molecule using only a single image, since the high symmetry of a helical fiber allows observation of many views simultaneously [225]. In order to analyze the micrograph images, they must be digitized (some electron microscopes have fitted CCD cameras, thus avoiding this step). Ideally, a fiber image will be scanned such that the fiber is vertical in order to avoid interpolation resulting in loss of data at the later stages. Examination of images can be carried out using a suitable program such as Ximdisp [226]. Regions of fiber images may be selected and must be boxed and floated. This involves masking off all density not belonging to the particle and surrounding the particle box to reduce the density step at the edge of the box [227]. Fiber images may benefit from Fourier filtering and/or background subtraction. The fibers may be straightened by a number of established techniques [228]. If necessary, the image may be interpolated. Interactive Fourier transforms are calculated to look for repeating features with the fiber. If a diffraction pattern is observed, then further analysis may be carried out on the fiber
37
38 Structural Analysis of Fibrous Proteins
image using helical image reconstruction [227, 229, 230]. The determination of the position and tilt of the helix axis must first be performed, followed by indexing of the layer line spot positions. The indexing is tested and phases are measured, allowing a reconstruction to be carried out [227, 229, 230]. This results in a threedimensional density function for the object. This methodology can be followed in detail [222, 229]. 5.4 Atomic Force Microscopy for Examining the Structure and Morphology of Fibrous Proteins 5.4.1 Experimental Guide for Using AFM to Monitor Fiber Morphology Minimal sample preparation is necessary for visualizing amyloid fibers by atomic force microscopy (AFM) [231]. A clean, uncontaminated solution of the sample can be placed onto freshly cleaved mica, glass, or other suitable smooth support. The substrate should be thoroughly cleaned with an excess of buffer (salt-free is best) or water. In order to get good contrast and to reduce mechanical damage of the soft biological materials, the samples can be stabilized by adding covalent cross-linking agents or certain cations that are able to link the constituents of the sample to each other or to the substrate. Cooling can also stiffen the sample to reduce sample disruption and damage. Tapping mode is most commonly used for imaging relatively soft biomolecules. The force generated between the sample and tip depends on the amplitude of free oscillation and the set point amplitude [232]. The observed amplitude of oscillation is maintained by adjusting the vertical position of the sample. The strength of tapping must be optimized for a particular sample, since tapping too hard will reduce image resolution and quality due to damage to the tip or protein. Tapping too softly might cause adhesive forces between the sample and cantilever tip, resulting in damping of oscillation and causing spurious height data for the sample [232]. Once these conditions are optimized, image data may be collected. Visual representations of the data are often generated by assigning a color or brightness to the relative height in the image. This will enable clear images of the fiber in question to be observed above the background smooth surface (Figure 7). Height measurements may be taken from the data, but width cannot be measured accurately due to the shape and size of the tip. Of course, height measurements taken from specimens in air may be affected by dehydration of the specimen, by the effect of gravity, or by the force of the cantilever compressing soft protein. However, morphological features may be clearly observed. 5.5 Use of X-ray Diffraction for Examining the Structure of Fibrous Proteins 5.5.1 Theory of X-Ray Fiber Diffraction X-ray diffraction involves the use of proteins in an ordered arrangement. The amount of information that can be obtained is directly related to the degree of order
5 Methods to Study Fiber Morphology and Structure
Fig. 7 (a) AFM image of amyloid fibers assembled with biotinylated yeast Sup35p-NM on streptavidin scanned in contact mode [251]. The scale bar gives the color code for the height information. (b) AFM images of
major ampullate silk from Araneus diadematus on mica scanned in tapping mode. The left image reflects the sample height above the surface, and the right image depicts the tip deflection.
39
40 Structural Analysis of Fibrous Proteins
within the diffracting object. The most ordered arrangement will be in a crystal where each identical protein molecule is arranged periodically on a threedimensional lattice. Diffraction from crystals results in a pattern comprised of sharp spots. However, fibrous proteins usually do not crystallize due to their size, insolubility, or heterogeneity. Fibrous proteins form partly ordered structures, and standard methods of crystal structure analysis are not applicable to fibers due to the degree of disorder and number of overlapping reflections. However, X-ray fiber diffraction can be performed. Details concerning the theory of X-ray fiber diffraction may be found in Ref. [19]. For a single, isolated chain molecule, the crystalline lattice extends in only one dimension, parallel to the axis of the fiber (z-direction). The scattering appears on layer lines, and the layer planes have a separation of the reciprocal of the periodic repeat along the fiber axis. Since a single-chain molecule has no periodicity in the x- and y-directions, the scatter is a continuous function. In practice, however, a fiber specimen is composed of several chain molecules with rotational averaging. The molecules may be axial in register but arranged at all possible azimuthal orientations. These molecules are related to one another, but not by a crystallographic lattice. Therefore, a collection of chain molecules gives the scattering effect for the isolated molecules plus the effects caused by the orientation of the molecules and the packing, size, and mutual disposition of ordered regions. Fibers are cylindrically symmetrical, so each reflection is spread out into a disk and intercepts the Ewald sphere at two symmetrical points. This means that the diffraction pattern is symmetrical. On the fiber diffraction diagram, the direction parallel to the fiber axis is known as the meridian and the perpendicular direction is the equator. Fibers are crystalline along the fiber axis and this produces strong, sharp reflections on the meridian. The degree of disorientation of the fibers relative to one another will produce more or less arcing of reflections. The appearance of diffraction arcs depends on the arrangement of crystallites (microstructure) within the material, known as the texture. Individual fibers may be aligned relative to one another, thus reducing disorientation and reducing the spread of the arcs. A sample in which the fibers have not been aligned at all will produce a diffraction pattern where the reflections appear as rings, much like a powder diffraction pattern. X-ray diffraction from a partially aligned specimen will allow distinction between the meridional and equatorial reflections. Even further orientation of the fiber within the specimen may produce a series of lines of intensity known as layer lines, running perpendicular to the meridian of the pattern. The intensities on closely spaced layer lines may overlap. In fact, it is rare to find non-overlapping reflections. If the fiber is mounted with the fiber axis normal to the X-ray beam, the diffraction pattern will be symmetrical above and below the equator (Figure 8). However, there will be regions of reciprocal space for which no data is recorded. By tilting the specimen, additional data can be collected. 5.5.2 Experimental Guide to X-Ray Fiber Diffraction 5.5.2.1 Sample Preparation Fibers in general are weakly diffracting, and therefore specimen preparation is particularly important. The specimen should be of an optimum size and thickness to fill the beam. Otherwise background scatter may
5 Methods to Study Fiber Morphology and Structure
Fig. 8 X-ray fiber diffraction from amyloid fibrils from a short peptide homologous to the central region of A$ peptide showing the cross-$ diffraction pattern.
obscure weak reflections. The amount of information that can be obtained from an X-ray fiber diffractogram is closely associated with the degree of alignment. Some fibrous proteins occur naturally in a highly oriented form (e.g., keratin and collagen). Often regions of orientation are quite limited, and therefore it can be useful to use a microfocus X-ray beam for data collection. Orientation may be improved by physical manipulation or by chemical treatment. For example, the orientation of Chrysopa egg stalk silk fiber orientation was increased by stretching in water or urea [4]. Preparation of oriented films and fibers from soluble fibrous proteins has involved a number of procedures that include stretching, rolling, casting films, drawing fibers from precipitate, shearing, magnetic alignment, centrifugation, and capillary flow. Long-chain molecules can be drawn from the solution in a manner similar to that used for making diffraction samples of DNA. Samples that cannot be pulled into long fibers may be aligned by hanging a drop between two wax-plugged capillary tubes arranged 1–2 mm apart [1]. Casting flat sheets is often used as a method for aligning polymers [233–235], and this can be achieved by placing a high concentration of a fiber solution on a suitable surface and allowing the solution to dry to form a thin film that can be removed from the surface (such as Teflon). The film will show alignment of the fibers in the direction parallel to the film surface but no alignment in the perpendicular direction. Further improvement to alignment of fiber bundles or films may be achieved by annealing in high humidity. Some fibrous protein specimens are diamagnetically anisotropic [236] and may benefit from magnetic alignment [105, 237, 238]. The effect of the magnetic field on a single molecule is small, and therefore it is necessary to embed the molecules in a liquid crystal so that the molecules act together and to use a high-energy magnet [239].
41
42 Structural Analysis of Fibrous Proteins
The alignment can be fixed in the specimen by slow drying while the specimen is still in the magnetic field. 5.5.2.2 Data Collection Since fibrous proteins are often weak scatterers, long exposure times may be needed. Cameras specially designed for the particular qualities of the specimen may be helpful. The arrangement of the apparatus will depend on whether a high- or low-angle diffraction is being collected. The use of a helium chamber and an evacuated camera can be helpful to reduce air scatter as well as to allow the humidity around the specimen to be controlled if required. In the past, the recording of diffraction data was performed using photographic film. This allowed the placement of several films in the detector to record a range of intensities. However, it is now more common to use detectors used for crystallography, such as a MarResearch image plate or CCD cameras. If film has been used, then the diffraction image must be digitized and converted to an appropriate format for data analysis. 5.5.2.3 Data Analysis There are a number of fiber diffraction processing programs available to facilitate processing of fiber diffraction data. CCP13 [240] is a collection of programs designed to analyze high-quality diffraction patterns. FIT2D (http://www.esrf.fr/computing/expg/subgroups/data analysis/FIT2D) can be used to analyze one- and two-dimensional datasets. The CCP4 suite [241] is intended for protein crystallography, but many of its components such as ipdisp and Mosflm can be utilized for examining the diffraction pattern. The data from the detector must be converted into a format compatible with the other programs in the chain. Examples of such programs are XCONV (CCP13) [147], marcvt (http://www.marresearch.com/ software.htm) and Denzo [242]. The diffraction image must be carefully centered to avoid a systematic error while measuring spot locations. Mosflm allows this to be done by manually moving the image with respect to a series of overlaid concentric circles. XFIX (CCP13) allows the user to find the center of the pattern and to calibrate the specimen to film distance. One may also wish to correct for the shape of the image plate and determine the sample’s tilt and rotation [243]. Background subtraction is often beneficial to correct for detector fog, white radiation, and X-ray scattering from air, camera components, the sample holder, and amorphous material in the specimen such as solvent and disordered polymer [244, 245]. The positions of the reflection can then be assessed. An initial survey can be carried out by noting spot locations and resolutions. This is straightforward in Mosflm, since the resolutions are calculated by clicking on spot maxima using a zoomed image. Resolution calculation can also be done by hand using an approximation to Bragg’s law. This is the result of the product of the wavelength of the X-rays and the distance between the sample and the image plate, divided by the distance of the diffraction spot from the center of the diffraction pattern. For highquality diffraction patterns containing diffraction peaks on layer lines, the CCP13 suite of programs allows the user to quadrant-average the diffraction pattern given the calculated tilt/rotation of the image. This image can then be processed further
5 Methods to Study Fiber Morphology and Structure
to include background subtraction and modeling of the position, size, shape, and intensity of the reflections. 5.5.2.4 Model Building An X-ray diffraction pattern from an unknown structure cannot directly lead to a structure since only amplitudes, but not phases, can be measured from the diffractogram. One method of solving this phase problem is by trial and error, so that from consideration of the diffraction pattern, symmetry, and chemical composition, a model is built and refined until a calculated diffraction can be generated that is similar to the observed [239]. A final model will predict the observed diffraction pattern so well that the possibility of the existence of another model that also predicts the diffraction pattern is virtually eliminated. Fibers by their nature are highly symmetrical and consist of repeating units, which reduces the number of possibilities for arrangement. Structural information about the subunits may already be available, either from crystal structures of individual monomers or from other biophysical techniques that indicate a secondary or tertiary structure. A molecular structure may be modeled using software such as LALS [246] or Cerius2 (http://www.accelrys.com/cerius2/cerius248/), which then allows the calculation of a simulated diffractogram to compare with the empirical data. Fxplor (http://www.molbio.vanderbilt.edu/fiber/software.html) is an extension of X-Plor, a program useful for atomic model refinement. 5.6 Fourier Transformed Infrared Spectroscopy 5.6.1 Theory of FTIR Fourier transformed infrared spectroscopy (FTIR) has been used to gain information about the conformation of fibrous proteins. Proteins yield three major infrared bands of interest that arise from the peptide backbone. These are the amide I absorption at 1600–1700 cm−1 , arising from the C=O stretch; the amide II absorption at 1500–1600 cm−1 , arising from the N=H deformation; and the amide III absorption at 1200–1350 cm−1 . A N=H stretch also absorbs at about 3300 cm−1 [247]. The presence of hydrogen bonding within the sample shifts the energies of the peptide variations, therefore allowing the determination of the presence of α-helical and β-sheet conformations in theory. However, this is complicated in practice due to coupling between individual peptide vibrations, which leads to splitting of the bands into a series of bands. The symmetry of α-helical fibers leads to a distinctive set of bands upon aggregation from the monomer [247, 248], whereas the spectra observed from the formation of all-β-sheet protein fibers are more subtle [248]. Observed and calculated infrared spectra from α-helix and antiparallel and parallel β-structures can be found in Ref. [247]. 5.6.2 Experimental Guide to Determining Protein Conformation by FTIR FTIR data collection is complicated by the fact that the bending vibrations H–O–H from water are near 1640 cm−1 , which can obscure the amide I band. This problem can be overcome by examination of the protein in the solid state, when little water
43
44 Structural Analysis of Fibrous Proteins
is present, or by exchanging the hydrogen for deuterium. The D–O–D bending vibration is shifted to 1220 cm−1 , but care must be taken to fully exchange the H for D, since partial exchange will result in bands in the amide I region [248]. However, for examining the structure of ex vivo samples of fibrous proteins, exchange of deuterium may not be possible, and in order to overcome this, the signal-to-noise resolution of the data must be high. For solution samples, the use of a short path length (4–10 µm) and high protein concentration (1–50 mg mL−1 ) should reduce the water absorbance relative to the protein signal at 1640 cm−1 [248]. To minimize scattering, samples may be prepared in dry state in KBr pellets by combining oven-dried KBr with the protein. The sample is then kept under vacuum and pressed (5–10 kbar) using a Carver press to form a thin, translucent pellet. This pellet can then be transferred to the transmission holder and a spectrum can be collected [248]. The spectrum is then compared with the spectrum collected from a KBr-only pellet. However, this harsh method may cause denaturation of the protein. Alternative methods include attenuated total reflectance (ATR) FTIR, diffuse reflectance FTIR, and transmission-mode FTIR from a thin film. For ATR-FTIR, the sample is placed on a surface composed of a material of high refractive index, which is known as the internal reflection element (IRE) and is often composed of germanium or zinc selenide. The beam penetrates to the boundary between the IRE and the sample and is not transmitted through the sample. This makes this method ideal for examining fibrous proteins, and the method may be used for samples prepared as a thin film or suspension [249]. For ATR experiments, the maximum amount of light throughput is required and the amount of water vapor is reduced (by purging the system with dry nitrogen or dry air) to increase the sensitivity. Oberg et al. [249] recommend the use of out-of-compartment, horizontal, trapezoid-shaped IREs. To prepare a sample for ATR-FTIR, thin films can be prepared of the fibrous protein using a solution (50–100 µL of 0.5–1.0 mg mL−1 ) or suspension placed onto the IRE and dried using nitrogen gas. The thin films are partly hydrated, maintaining the protein structure. Further, the limited-depth penetration of the ATR method reduces the contribution of water to the spectra [248]. Lyophilized proteins may also be spread onto the IRE for collection of spectra from powdered proteins. Alternatively, the diffuse reflectance (DRIFT) method can be used for dry samples [248]. In contrast to the ATR method, in transmission-mode FTIR, the beam passes directly through the sample [248]. High protein concentrations (2–60 mg mL−1 ) and small path lengths (less than 10 µm) are required, although for a D2 O sample, lower protein concentrations and longer path lengths (25–50 µm) may be used. Thin films can be prepared by drying the protein suspension or solution onto thin silver chloride disks. Data analysis of FTIR spectra involves some expertise. Water or buffer subtraction must be performed before analysis of the spectrum. The conformation of the polypeptide chain has a signature in the amide I band. However, the bands may overlap, making interpretation difficult. This can be tackled in two ways. One approach is resolution enhancement methods or deconvolution to decrease the widths of the infrared bands and to allow determination of the overlapping compo-
6 Conclusions
nents. A second approach is to use pattern-recognition methods for which software packages are available (e.g., GRAMS, Galactic Corp.).
6 Conclusions
In the present chapter, fiber-forming proteins and methods to determine their fiber assembly and fiber morphology have been described. There is a wide variation among fibrous proteins with regard to the content of regular secondary structure. Fibers of proteins such as collagen and tropomyosin with highly elongated molecules tend to have a high content of regular secondary structure (generally greater than 90%). In contrast, fibers of globular proteins, such as actin, feather keratin, and flagellin, generally have much lower contents of regular structure (often less than 50%). In another group of proteins, comprising elastin, resilin, and the matrix proteins of mammalian keratins, the content of regular secondary structure is very small. In these cases the protein fibers are cross-linked and have rubber-like elastic properties. So far, any attempt to correlate sequence with structure and function in fibrous proteins is highly speculative. Therefore, from a given sequence it cannot be concluded whether a protein forms fibers or not. However, the structure-function relation based on sequence data is the aim of many biologists, chemists, physicists, and material scientists. One main conformation-determining factor in fibrous proteins appears to be regions with repeating sequences. The simplest examples of repeating sequences are found among the silks, which have, for instance, runs of Glyn . The corresponding sections of the chains adopt conformations that appear to be identical to those observed in synthetic homopolypeptides. Four other conformation-determining factors in fibrous proteins are interactions involving side chains, interactions between apolar side chains, hydrogen bond formation, and the content and distribution of Gly and Pro residues. Aggregation studies on amyloids, collagens, myofibrils, and silks illustrate the capacity of fibrous proteins for self-assembly into filaments and other organized aggregates. The mode of aggregation is generally found to be sensitive to changes in pH, ionic strength, and the presence of other solutes, indicating that the assembly process depends upon a fine balance among various types of weak secondary bonding. Packing considerations are also clearly involved, but in the case of assembly in aqueous environment, these will be less important than in a solvent-free crystalline phase. Several features of the self-assembly process in vivo are still not well understood, such as nucleation and termination of fiber formation. For termination of fiber formation, two possibilities can be envisaged: the assembly process could be intrinsically self-limiting, with the information residing in the individual protein molecules, or the size of the aggregate is potentially unlimited, but regulation of size is achieved through some external agency.
45
46 Structural Analysis of Fibrous Proteins
The future aim in investigating fibrillar proteins is to understand the kinetics of fiber assembly (which will be a basis for controlling fiber assembly) and to gain more insights into fibrous structure to learn about sequence-structure relationships. Such knowledge will be a basis for elucidating the relevance of assembly processes in cellular development, cellular structure, and cellular stability; for assessing fundamental influences leading to protein-folding diseases; and for developing products through material science.
Acknowledgements
T.S. is supported by the Deutsche Forschungsgemeinschaft (SFB 563 A9); SCHF 603/4-1, the Fonds der Chemischen Industrie, and the Leonhard-Lorenz-Stiftung. L.S. is funded by a Welcome Trust Research Career Development fellowship. The authors would like to thank Linda Amos for the donation of Figure 2, Bettina Richter for technical assistance, and Daniel H¨ummerich, Sumner Makin, and Thusnelda Stromer for critical reading of the manuscript.
References 1 SUNDE, M. & BLAKE, C. (1997). The structure of amyloid fibrils by electron microscopy and X-ray diffraction. Adv. Protein Chem. 50, 123–159. 2 EANES, E. D. & GLENNER, G. G. (1968). X-ray diffraction studies on amyloid filaments. J. Histochem. Cytochem. 16, 673–677. 3 TEPLOW, D. B. (1998). Structural and kinetic features of amyloid beta-protein fibrillogenesis. Amyloid 5, 121–142. 4 GEDDES, A. J., PARKER, K. D., ATKINS, E. D. & BEIGHTON, E. (1968). “Cross-beta” conformation in proteins. J. Mol. Biol. 32, 343–358. 5 ROCHET, J. C. & LANSBURY, P. T. (2000). Amyloid fibrillogenesis: themes and variations. Curr. Opin. Struct. Biol. 10, 60–68. 6 DOBSON, C. M. (2001). The structural basis of protein folding and its links with human disease. Philos. Trans. R. Soc. Lond. B Biol. Sci. 356, 133–145. 7 LINDQUIST, S. (1997). Mad cows meet psichotic yeast: the expansion of the prion hypothesis. Cell 89, 495–498. 8 WICKNER, R. B., EDSKES, H. K., MADDELEIN, M. L., TAYLOR, K. L. &
9
10
11
12
13
14
MORIYAMA, H. (1999). Prions of yeast and fungi. Proteins as genetic material. J. Biol. Chem. 274, 555–558. SCHEIBEL, T. (2002). [PSI]-chotic yeasts: protein-only inheritance of a yeast prion. in Recent Research in Molecular Microbiology 1 (ed. PANDALAI, S. G.), Hindustan Publ. Corp., Delhi, 71–89. ICONOMIDOU, V. A., VRIEND, G. & HAMODRAKAS, S. J. (2000). Amyloids protect the silkmoth oocyte and embryo. FEBS Lett. 479, 141–145. PODRABSKY, J. E., CARPENTER, J. F. & HAND, S. C. (2001). Survival of water stress in annual fish embryos: dehydration avoidance and egg envelope amyloid fibers. Am. J. Physiol. Regul. Integr. Comp. Physiol. 280, R123–131. GUHRS, K. H., WEISSHART, K. & GROSSE, F. (2000). Lessons from nature – protein fibers. J. Biotechnol. 74, 121–134. KNIGHT, D. P. & VOLLRATH, F. (2001). Comparison of the spinning of selachian egg case ply sheets and orb web spider dragline filaments. Biomacromolecules 2, 323–334. CRAIG, C. L. & RIEKEL, C. (2002). Comparative architecture of silks,
References
15
16 17
18
19
20
21
22
23
24
25
26
27
fibrous proteins and their encoding genes in insects and spiders. Comp. Biochem. Physiol. B Biochem. Mol. Biol. 133, 493–507. VALLUZZI, R., WINKLER, S., WILSON, D. & KAPLAN, D. L. (2002). Silk: molecular organization and control of assembly. Philos. Trans. R. Soc. Lond. B Biol. Sci. 357, 165–167. VOLLRATH, F. (1999). Biology of spider silk. Int. J. Biol. Macromol. 24, 81–88. WARWICKER, J. O. (1960). Comparative studies of fibroins II. The crystal structures of various fibroins. J. Mol. Biol. 2, 350–362. RUDALL, K. M. & KENCHINGTON, W. (1971). Arthropod silks: The problem of fibrous proteins in animal tissues. Annu. Rev. Entomol. 16, 73–96. FRASER, R. D. & MACRAE, T. P. (1973) in Conformation in fibrous proteins and related synthetic polypeptides (eds. HORECKER, B., KAPLAN, N. O., MARMUR, J. & SCHERAGA, H. A.) Academic Press London, 293–343. VOLLRATH, F. (2000). Strength and structure of spiders’ silks. J. Biotechnol. 74, 67–83. BURGESON, R. E. (1988). New collagens, new concepts. Annu. Rev. Cell Biol. 4, 551–577. van der REST, M. & GARRONE, R. (1991). Collagen family of proteins. FASEB J. 5, 2814–2823. MAYNE, R. & BREWTON, R. G. (1993). New members of the collagen superfamily. Curr. Opin. Cell Biol. 5, 883–890. FRATZL, P., MISOF, K., ZIZAK, I., RAPP, G., AMENITSCH, H. & BERNSTORFF, S. (1998). Fibrillar structure and mechanical properties of collagen. J. Struct. Biol. 122, 119–122. BERISIO, R., VITAGLIANO, L., MAZZARELLA, L. & ZAGARI, A. (2002). Recent progress on collagen triple helix structure, stability and assembly. Protein Pept. Lett. 9, 107–116. CREIGHTON, T. E. (1993) in Proteins – structures and molecular properties W. H. Freeman New York, 193–198. BRODSKY, B. & RAMSHAW, J. A. (1997). The collagen triple-helix structure. Matrix Biol. 15, 545–554.
28 GROSS, J. (1963). Comparative biochemistry of collagen. Comp. Biochem. 5, 307–345. 29 ADAMS, E. (1978). Invertebrate collagens. Science 202, 591–598. 30 SPIRO, R. G., LUCAS, F. & RUDALL, K. M. (1971). Glycosylation of hydroxylysine in collagens. Nat. New Biol. 231, 54–55. 31 KNIGHT, D. P., FENG, D., STEWART, M. & KING, E. (1993). Changes in the macromolecular organization in collagen assemblies during secretion in the nidamental gland and formation of the egg capsule wall in the dogfish Scyliorhinus canicula. Phil. Trans. R. Soc. Lond. B 341, 419–436. 32 KRAMER, J. M., COX, G. N. & HIRSCH, D. (1992). Comparisons of the complete sequences of two collagen genes from Caenorhabditis elegans. Cell 30, 599–606. 33 MANN, K., GAILL, F. & TIMPL, R. (1992). Amino acid sequence and cell adhesion activity of fibrilforming collagen from the tube worm Riftia pachyptila living at deep sea hydrothermal vents. Eur. J. Biochem. 210, 839–847. 34 GARRONE, R. (1978). Phylogenesis of connective tissue. In Frontiers of matrix biology Vol. 5 (ed. ROBERT, L.) Karger Basel, 1–250. 35 MERCER, E. H. (1952). Observations on the molecular structure of byssus fibers. Austr. J. Mar. Freshwater Res. 3, 199–204. 36 RUDALL, K. M. (1955). The distribution of collagen and chitin. Symp. Soc. Exp. Biol. 9, 49–71. 37 MELNICK, S. C. (1958). Occurrence of collagen in the phylum Mollusca. Nature 181, 1483. 38 DEVORE, D. P., ENGEBRETSON, G. H., SCHACTELE, C. F. & SAUK, J. J. (1984). Identification of collagen from byssus threads produced by the sea mussel Mytilus edulis. Comp. Biochem. Physiol. 77B, 529–531. 39 BENEDICT, C. V. & WAITE, J. H. (1986). Location and analysis of byssal structural proteins. J. Morphol. 189, 261–270. 40 PIKKARAINEN, J., RANTANEN, J., VASTAMAKI, M., LAPIAHO, K., KARI, A. & KULONEN, E. (1968). On collagens of invertebrates with special reference to
47
48 Structural Analysis of Fibrous Proteins
41
42
43
44
45
46 47
48
49
50
51
52
53
Mytilus edulis. Eur. J. Biochem. 4, 555–560. QIN, X. X. & WAITE, J. H. (1995). Exotic collagen gradients in the byssus of the mussel Mytilus edulis. J. Exp. Biol. 198, 633–644. COYNE, K. J., QIN, X. X. & WAITE, J. H. (1997). Extensible collagen in mussel byssus: a natural block copolymer. Science 277, 1830–1832. CARLIER, M. F. (1991). Actin: protein structure and filament dynamics. J. Biol. Chem. 266, 1–4; KABSCH, W. & VANDEKERCKHOVE, J. (1992). Structure and function of actin. Annu. Rev. Biophys. Biomol. Struct. 21, 49–76. JANMEY, P. A., SHAH, J. V., TANG, J. X. & STOSSEL, T. P. (2001). Actin filament networks. Results Probl. Cell. Differ. 32, 181–199. CHENEY, R. E., RILEY, M. A. & MOOSEKER, M. S. (1993). Phylogenetic analysis of the myosin superfamily. Cell Motil. Cytoskeleton 24, 215–223. TITUS, M. A. (1993). Myosins. Curr. Opin. Cell. Biol. 5, 77–81. BERG, J. S., POWELL, B. C. & CHENEY, R. E. (2001). A millennial myosin census. Mol. Biol. Cell. 12, 780–794. PERRY, S. V. (2001). Vertebrate tropomyosin: distribution, properties and function. J. Muscle Res. Cell Motil. 22, 5–49. WEGNER, A. (1979). Equilibrium of the actin-tropomyosin interaction. J. Mol. Biol. 131, 839–853. YANG, Y.-Z., KORN, E. D. & EISENBERG, E. (1979). Cooperative binding of tropomyosin to muscle and Acanthamoeba actin. J. Biol. Chem. 254, 7137–7140. ZOT, A. S. & POTTER, J. D. (1987) Structural aspects of troponintropomyosin regulation of skeletal muscle contraction. Ann. Rev. Biophys. Biophys. Chem. 16, 535–539. STOKES, D. L. & DEROSIER D. J. (1987). The variable twist of actin and its modulation by actin-binding proteins. J. Cell Biol. 104, 1005–1017. PANTE, N. (1994). Paramyosin polarity in the thick filament of molluscan smooth muscles. J. Struct. Biol. 113, 148–163.
54 ALBERS, K. & FUCHS, E. (1992). The molecular biology of intermediate filament proteins. Int. Rev. Cytol. 134, 243–279. 55 CHOU, Y. H., BISCHOFF, J. R., BEACH, D. & GOLDMAN, R. D. (1990). Intermediate filament reorganization during mitosis is mediated by p34cdc2 phosphorylation of vimentin. Cell 62, 1063–1071. 56 STEWART, M. (1993). Intermediate filament structure and assembly. Curr. Opin. Cell Biol. 5, 3–11. 57 HERRMANN, H. & AEBI, U. (2000). Intermediate filaments and their associates: multi-talented structural elements specifying cytoarchitecture and cytodynamics. Curr. Opin. Cell Biol. 12, 79–90. 58 STRELKOV, S. V., HERRMANN, H. & AEBI, U. (2003). Molecular architecture of intermediate filaments. Bioessays 25, 243–251. 59 COULOMBE, P. A. (1993). The cellular and molecular biology of keratins: beginning a new era. Curr. Opin. Cell Biol. 5, 17–29. 60 GARROD, D. R. (1993). Desmosomes and hemidesmosomes. Curr. Opin. Cell Biol. 5, 30–40. 61 NIGG, E. A. (1992). Assembly and cell cycle dynamics of the nuclear lamina. Semin. Cell Biol. 3, 245–253. 62 RZEPECKI, R. (2002) The nuclear lamins and the nuclear envelope. Cell. Mol. Biol. Lett. 7, 1019–1035. 63 PRIVALOV, P. L. & MEDVED, L. V. (1982). Domains in the fibrinogen molecule. J. Mol. Biol. 159, 665–683. 64 DOOLITTLE, R. F. (1984). Fibrinogen and fibrin. Annu. Rev. Biochem. 53, 195–229. 65 MEDVED, L. V., LITVINOVICH, S., UGAROVA, T., MATSUKA, Y. & INGHAM, K. (1997). Domain structure and functional activity of the recombinant human fibrinogen gamma-module (gamma148–411). Biochemistry 36, 4685–4693. 66 BUDZYNSKI, A. Z. (1986). Fibrinogen and fibrin: biochemistry and pathophysiology. Crit. Rev. Oncol. Hematol. 6, 97–146. 67 MOSHER, D. F. & JOHNSON, R. B. (1983). Specificity of fibronectin-fibrin
References
68
69
70
71
72
73
74
75
76
77
78
79
80
81
cross-linking. Ann. N.Y. Acad. Sci. 408, 583–594. VARADI, A. & PATTHY, L. (1983). Location of plasminogen-binding sites in human fibrin(ogen). Biochemistry 22, 2440–2446. BACH, T. L., BARSIGIAN, C., YAEN, C. H. & MARTINEZ, J. (1998). Endothelial cell VE-cadherin functions as a receptor for the beta15–42 sequence of fibrin. J. Biol. Chem. 273, 30719–30728. SKOUFIAS, D. A. & SCHOLEY, J. M. (1993). Cytoplasmic microtubule-based motor proteins. Curr. Opin. Cell. Biol. 5, 95–104. MANDELKOW, E. & MANDELKOW, E. M. (1995). Microtubules and microtubuleassociated proteins. Curr. Opin. Cell. Biol. 7, 72–81. AMOS, L. A. & BAKER, T. S. (1979). The three-dimensional structure of tubulin protofilaments. Nature 279, 607–612. NOGALES, E. (2001). Structural insight into microtubule function. Annu. Rev. Biophys. Biomol. Struct. 30, 397–420. JOB, D., VALIRON, O. & OAKLEY, B. (2003). Microtubule nucleation. Curr. Opin. Cell. Biol. 15, 111–117. MANDELKOW, E. M. & MANDELKOW, E. (1992). Microtubule oscillations. Cell. Motil. Cytoskeleton 22, 235–244. GOSLINE, J. M. (1976). The physical properties of elastic tissue. Int. Rev. Connect Tissue Res. 7, 211–249. ELLIS, G. E. & PACKER, K. J. (1976). Nuclear spin-relaxation studies of hydrated elastin. Biopolymers 15, 813–832. URRY, D. W. (1988). Entropic elastic processes in protein mechanisms, II. Simple (passive) and coupled (active) development of elastic forces. J. Protein Chem. 7, 1–34. TATHAM, A. S. & SHEWRY, P. R. (2002). Comparative structures and properties of elastic proteins. Philos. Trans. R. Soc. Lond. B Biol. Sci. 357, 229–234. CLEARY, E. G., & GIBSON, M. A. (1983). Elastin-associated microfibrils and microfibrillar proteins. Int. Rev. Connect Tissue Res. 10, 97–209. VAUGHAN, L., MENDLER, M., HUBER, S., BRUCKNER, P., WINTERHALTER, K. H., IRWIN, M. I. & MAYNE, R. (1988). D-periodic distribution of collagen type
82
83
84
85
86
87
88
89
90
91
92
93
94
IX along cartilage fibrils. J. Cell Biol. 106, 991–997. ARDELL, D. H. & ANDERSEN, S. O. (2001). Tentative identification of a resilin gene in Drosophila melanogaster. Insect. Biochem. Mol. Biol. 31, 965–970. MORGAN, D. G., OWEN, C., MELANSON, L. A. & DEROSIER, D. J. (1995). Structure of bacterial flagellar filaments at 11 A resolution: packing of the alpha-helices. J. Mol. Biol. 249, 88–110. MIMORI, Y., YAMASHITA, I., MURATA, K., FUJIYOSHI, Y., YONEKURA, K., TOYOSHIMA, C. & NAMBA, K. (1995). The structure of the R-type straight flagellar filament of Salmonella at 9 A resolution by electron cryomicroscopy. J. Mol. Biol. 249, 69–87. ALDRIDGE, P. & HUGHES, K. T. (2002). Regulation of flagellar assembly. Curr. Opin. Microbiol. 5, 160–165. VONDERVISZT, F., KANTO, S., AIZAWA, S. & NAMBA, K. (1989). Terminal regions of flagellin are disordered in solution. J. Mol. Biol. 209, 127–133. NAMBA, K., YAMASHITA, I. & VONDERVISZT, F. (1989). Structure of the core and central channel of bacterial flagella. Nature 342, 648–654. YONEKURA, K., MAKI-YONEKURA, S. & NAMBA, K. (2002). Growth mechanism of the bacterial flagellar filament. Res. Microbiol. 153, 191–197. THOMAS, N. A., BARDY, S. L. & JARRELL, K. F. (2001). The archaeal flagellum: a different kind of prokaryotic motility structure. FEMS Microbiol. Rev. 25, 147–174. BARDY, S. L., NG, S. Y. & JARRELL, K. F. (2003). Prokaryotic motility structures. Microbiology 149, 295–304. FOREST, K. T. & TAINER, J. A. (1997). Type-4 pilus-structure: outside to inside and top to bottom – a mini-review. Gene 192, 165–169. WALL, D. & KAISER, D. (1999). Type IV pili and cell motility. Mol. Microbiol. 32, 1–10. MARVIN, D. A. & WACHTEL, E. J. (1976). Structure and assembly of filamentous bacterial viruses. Philos. Trans. R. Soc. Lond. B Biol. Sci. 276, 81–98. MARVIN, D. A. (1998). Filamentous phage structure, infection and assembly.
49
50 Structural Analysis of Fibrous Proteins
95
96
97
98
99
100
101
102
103
104
105
Curr. Opin. Struct. Biol. 8, 150–158. PEDERSON, D. M., WELSH, L. C., MARVIN, D. A., SAMPSON, M., PERHAM, R. N., YU, M. & SLATER, M. R. (2001). The protein capsid of filamentous bacteriophage PH75 from Thermus thermophilus. J. Mol. Biol. 309, 401–421. DURHAM, A. C., FINCH, J. T. & KLUG, A. (1971). States of aggregation of tobacco mosaic virus protein. Nat. New Biol. 1971, 229, 37–42. SCHUSTER, T. M., SCHEELE, R. B., ADAMS, M. L., SHIRE, S. J., STECKERT, J. J. & POTSCHKA, M. (1980). Studies on the mechanism of assembly of tobacco mosaic virus. Biophys. J. 32, 313–329. DIAZ-AVALOS, R. & CASPAR, D. L. (1998). Structure of the stacked disk aggregate of tobacco mosaic virus protein. Biophys. J. 74, 595–603. CULVER, J. N. (2002). Tobacco mosaic virus assembly and disassembly: determinants in pathogenicity and resistance. Annu. Rev. Phytopathol. 40, 287–308. BUTLER, P. J. (1984). The current picture of the structure and assembly of tobacco mosaic virus. J. Gen. Virol. 65, 253–279. MITRAKI, A., MILLER, S. & van RAAIJ, M. J. (2002). Review: conformation and folding of novel beta-structural elements in viral fiber proteins: the triple beta-spiral and triple beta-helix. J. Struct. Biol. 137, 236–247. BETTS, S. & KING, J. (1999). There’s a right way and a wrong way: in vivo and in vitro folding, misfolding and subunit assembly of the P22 tailspike. Structure Fold. Des. 7, R131–139. INOUYE, H., FRASER, P. & KIRSCHNER, D. (1993). Structure of β-Crystallite Assemblies by Alzheimer β Amyloid Protein Analogues: Analysis by X-Ray Diffraction. Biophys. J. 64, 502–519. BLAKE, C. & SERPELL, L. (1996). Synchrotron X-ray studies suggest that the core of the transthyretin amyloid fibril is a continuous b-helix. Structure 4, 989–998. SIKORSKI, P., ATKINS, E. D. T. & SERPELL, L. C. (2003). Structure and textures of fibrous crystals of the Abeta(11–25)
106
107
108
109
110
111
112
113
114
115
fragment from Alzheimer’s Abeta amyloid protein. Structure 11, 1–20. MALINCHIK, S., INOUYE, H., SZUMOWSKI, K. & KIRSCHNER, D. (1998). Structural analysis of Alzheimer’s b(1–40) amyloid: proto-filament assembly of tubular fibrils. Biophys. J. 74, 537–545. GRIFFITHS, J. M., ASHBURN, T. T., AUGER, M., COSTA, P. R., GRIFFIN, R. G. & LANSBURY, P. T. (1995). Rotational resonance solid state NMR elucidates a structural model of pancreatic amyloid. J. Am. Chem. Soc. 117, 3539–3546. BALBACH, J. J., PETKOVA, A. T., OYLER, N. A., ANTZUKIN, O. N., GORDON, D. J., MEREDITH, S. C. & TYCKO, R. (2002). Supramolecular structure in full-length Alzheimer’s β-amyloid fibrils: evidence for parallel b-sheet organisation from solid state nuclear magnetic resonance. Biophys. J. 83, 1205–1216. BENZINGER, T. L. S., GREGORY, D. M., BURKOTH, T. S., MILLER-AUER, H., LYNN, D. G., BOTTO, R. E. & MEREDITH, S. C. (2000). Two-dimensional structure of b-amyloid (10–35) fibrils. Biochemistry 39, 3491–3499. SERPELL, L. & SMITH, J. (2000). Direct visualisation of the beta-sheet structure of synthetic Alzheimer’s amyloid. J. Mol. Biol. 299, 225–231. JIMENEZ, J. L., GUIJARRO, J. I., ORLOVA, E., ZURDO, J., DOBSON, C., SUNDE, M. & SAIBIL, H. (1999). Cryo-electron microscopy structure of an SH3 amyloid fibril and model of the molecular packing. EMBO J. 18, 815–821. LAZO, N. & DOWNING, D. (1998). Amyloid fibrils may be assembled from b-helical protofibrils. Biochemistry 37, 1731–1735. PERUTZ, M. F., FINCH, J. T., BERRIMAN, J. & LESK, A. (2002). Amyloid fibers are water-filled nanotubes. Proc Natl Acad Sci USA 99, 5591–5595. SERPELL, L., SUNDE, M., BENSON, M., TENNENT, G., PEPYS, M. & FRASER, P. (2000). The protofilament substructure of amyloid fibrils. J Mol Biol. 300, 1033–1039. GOLDSBURY, C., KISTLER, J., AEBI, U., ARVINTE, T. & COOPER, G. J. (1999) Watching amyloid fibrils grow by
References
116
117
118
119
120
121
122
123
124
125
126
127
time-lapse atomic force microscopy. J. Mol. Biol. 285, 33–39. CROWTHER, R. (1993). TAU Protein and Paired Helical Filaments of Alzheimers Disease. Curr. Opin. Struct. Biol. 3, 202–206. CROWTHER, R. (1991). Structural Aspects of Pathology in Alzheimers Disease. Biochim. Biophys. Acta 1096, 1–9. BERRIMAN, J., SERPELL, L. C., OBERG, K. A., FINK, A. L., GOEDERT, M. & CROWTHER, R. A. (2003). Tau filaments from human brain and from in vitro assembly of recombinant protein show cross-beta structure. Proc. Natl. Acad. Sci. USA 100, 9034–9038. MARSH, R. E., COREY, R. B. & PAULING, L. (1955). An investigation of the structure of silk fibroin. Biochim. Biophys. Acta 16, 1–34. TAKAHASHI, Y., GEHOH, M. & YUZURIHA, K. (1999). Structure refinement and diffuse streak scattering of silk (Bombyx mori). Int. J. Biol. Macromol. 24, 127–138. PARKHE, A. D., SEELEY, S. K., GARDNER, K., THOMPSON, L. & LEWIS, R. V. (1997). Structural studies of spider silk proteins in the fiber. J. Mol. Recognit. 10, 1–6. PARKER, K. & RUDALL, K. (1957). The silk of the egg-stalk of the green lacewing fly. Nature 179, 905–907. van RAAIJ, M. J., MITRAKI, A., LAVIGNE, G. & CUSACK, S. (1999). A triple b-spiral in the adenovirus fiber shaft reveals a new structural motif for a fibrous protein. Nature 401, 935–938. YODER, M., KEEN, N. & JURNAK, F. (1993). New domain motif – the structure of pectate lyase-C, a secreted plant virulence factor. Science 260, 1503–1506. STEINBACHER, S., SECKLER, R., MILLER, D., STEIPE, B., HUBER, R. & REINEMER, P. (1994). Crystal-structure of P22 tailspike protein – interdigitated subunits in a thermostable trimer. Science 265, 383–386. RAETZ, C. & RODERICK, S. (1995). A left-handed parallel b-helix in the structure of UDP-N-Acetylglucosamine acyltransferase. Science 270, 997–1000. van RAAIJ, M. J., SCHOEHN, G., JAQUINOD, M., ASHMAN, K., BURDA, M. R. & MILLER,
128
129
130
131
132
133
134
135 136
137
138
S. (2001). Crystal structure of a heat- and protease-stable part of the bacteriophage t4 short tail fiber. J. Mol. Biol. 314, 1137–1146. KANAMARU, S., LEIMAN, P. G., KOSTYUCHENKO, V. A., CHIPMAN, P. R., MESYANZHINOV, V. V., ARISAKA, F. & ROSSMAN, M. G. (2002). The structure of the bacteriophage T4 cell-puncturing device. Nature 415, 553–557. KOSTYUCHENKO, V. A., LEIMAN, P. G., CHIPMAN, P. R., KANAMARU, S., van RAAIJ, M. J., ARISAKA, F., MESYANZHINOV, V. V. & ROSSMAN, M. G. (2003). Threedimensional structure of the bacteriophage T4 baseplate. Nat. Struct. Biol. 10, 688–693. OTTANI, V., MARTINI, D., FRANCHI, M., RUGGERI, A. & RASPANTI, M. (2002). Hierarchical structures in fibrillar collagens. Micron 33, 587–596. WESS, T. J., HAMMERSLEY, A. P., WESS, L. & MILLER, A. (1998). A consensus model for molecular packing of type I collagen. J. Struct. Biol. 122, 92–100. WESS, T. J., HAMMERSLEY, A. P., WESS, L. & MILLER, A. (1995). Type I Collagen packing conformation of the triclinic unit cell. J. Mol. Biol. 248, 487–493. PHILLIPS, G. N., FILLERS, J. P. & COHEN, C. (1986). Tropomyosin crystal structure and muscle regulation. J. Mol. Biol. 192, 111–131. WHITBY, F. G., KENT, H., STEWART, F., STEWART, M., XIE, X., HATCH, V., COHEN, C. & PHILLIPS, G. N. (1992). Structure of tropomyosin at 9 Angstroms resolution. J. Mol. Biol. 227, 441–452. CRICK, F. H. C. (1952). Is alpha-keratin a coiled coil. Nature 170, 882–883. STRELKOV, S. V., HERRMAN, H., GEISLER, N., WEDIG, T., ZIMBELMANN, R., AEBI, U. & BURKHARD, P. (2002). Conserved segments 1A and 2B of the intermediate filament dimer: their atomic structures and role in filament assembly. EMBO J. 21, 1255–1266. LOWE, J. & AMOS, L. A. (1998). Crystal structure of the bacterial cell-division protein FtsZ. Nature 391, 203–206. LOWE, J. & AMOS, L. A. (1999). Tubulin-like protofilaments in Ca2+
51
52 Structural Analysis of Fibrous Proteins
139
140
141
142
143
144
145
146
147
148
induced FtsZ sheets. EMBO J. 18, 2364–2371. NOGALES, E., WOLF, S. & DOWNING, K. H. (1998). Structure of the alpha-beta tubulin dimer by electron crystallography. Nature 391, 199–203. NOGALES, E., WHITTAKER, M., MILLIGAN, R. A. & DOWNING, K. H. (1999). Highresolution model of the microtubule. Cell 96, 79–88. HIROSE, K., AMOS, W. B., LOCKHART, A., CROSS, R. A. & AMOS, L. A. (1997). Three-dimensional cryo-electron microscopy of 16-protofilament microtubules: structure, polarity and interaction with motor proteins. J. Struct. Biol. 118, 140–148. SQUIRE, J. & MORRIS, E. (1998). A new look at thin filament regulation in vertebrate skeletal muscle. FASEB J. 12, 761–771. HANSON, E. J. & LOWRY, J. (1963). The structure of F-actin and of actin filaments isolated from muscle. J. Mol. Biol. 6, 48–60. KABSCH, W., MANNHERZ, H. G., SUCK, D., PAI, E. F. & HOLMES, K. C. (1990). Atomic structure of the actin filament. Nature 347, 44–49. HOLMES, K., POPP, D., GERHARD, W. & KABSCH, W. (1990). Atomic Model of the Actin Filament. Nature 347, 44–49. LORENZ, M., POOLE, K., POPP, D., ROSENBAUM, G. & HOLMES, K. (1995). An Atomic Model of the Unregulated Thin Filament Obtained by X-ray Fiber Diffraction on Oriented Actin-Tropomyosin Gels. J. Mol. Biol. 246, 108–119. SQUIRE, J. M., KNUPP, C., AL-KHAYAT, H. A. & HARFORD, J. J. (2003). Millisecond time-resolved low-angle X-ray fiber diffraction: a powerful high-sensitivity technique for modelling real-time movements in biological macromolecular assemblies. Fib. Diff. Rev. 11, 28–35. MORRIS, E., SQUIRE, J. & FULLER, G. (1991). The 4-Stranded Helical Arrangement of Myosin Heads on Insect (Lethocerus) Flight Muscle Thick Filaments. J. Struct. Biol. 107, 237–249.
149 PADRON, R., ALAMO, L., MURGICH, J. & CRAIG, R. (1998). Towards an atomic model of the thick filaments of muscle. J. Mol. Biol. 275, 35–41. 150 RAYMENT, I., RYPNIEWSKI, W., SCHMIDT-BASE, K., SMITH, R., TOMCHECK, D., BENNING, M., WINKELMANN, D., WESENBERG, G. & HOLDEN, H. (1993). 3D-Structure of Myosin Subfragment-1: A Molecular Motor. Science 261, 50–57. 151 ADLER, A. J., GREENFIELD, N. J. & FASMAN, G. D. (1973). Circular dichroism and optical rotatory dispersion of proteins and polypeptides. Methods Enzymol. 27, 675–735. 152 JOHNSON, W. C. (1990). Protein secondary structure and circular dichroism: a practical guide. Proteins 7, 205–214. 153 KUWAJIMA, K. (1995). Circular dichroism. Methods Mol. Biol. 40, 115–135. 154 WOODY, R. W. (1995). Circular dichroism. Methods Enzymol. 246, 34–71. 155 BLOEMENDAL, M. & JOHNSON, W. C. (1995). Structural information on proteins from circular dichroism spectroscopy possibilities and limitations. Pharm. Biotechnol. 7, 65–100. 156 GREENFIELD, N. J. (1996). Methods to estimate the conformation of proteins and polypeptides from circular dichroism data. Anal. Biochem. 235, 1–10. 157 KELLY, S. M. & PRICE, N. C. (2000). The use of circular dichroism in the investigation of protein structure and function. Curr. Protein Pept. Sci. 1, 349–384. 158 PENZER, G. R. (1980) in An introduction to spectroscopy for biochemists (ed. BROWN, S. B.) Academic Press London, 70–114. 159 ROYER, C. A. (1995). Fluorescence spectroscopy. Methods Mol. Biol. 40, 65–89M. 160 ROY, S. & BHATTACHARYYA, B. (1995). Fluorescence spectroscopic studies of proteins. Subcell. Biochem. 24, 101–114. 161 SCHMID, F. X. (1997) in Protein structure – a practical approach (ed. CREIGHTON, T. E.) Oxford University Press, 261–298. 162 MCLAUGHLIN, M. L. & BARKLEY, M. D. (1997). Time-resolved fluorescence of constrained tryptophan derivatives:
References
163
164
165
166
167
168
169
170 171
172
173
174
implications for protein fluorescence. Methods. Enzymol. 278, 190–202. BEECHEM, J. M. & BRAND, L. (1985). Time-resolved fluorescence of proteins. Annu. Rev. Biochem. 54, 43–71. PAPP, S. & VANDERKOOI, J. M. (1989). Tryptophan phosphorescence at room temperature as a tool to study protein structure and dynamics. Photochem. Photobiol. 49, 775–784. MATHIES, R. A., PECK, K. & STRYER, L. (1990). Optimization of high-sensitivity fluorescence detection. Anal. Chem. 62, 1786–1791. HAUGLAND, R. P. (1996) in Handbook of fluorescent probes and research chemicals (ed. SPENCE, M. T. Z.) Molecular Probes Inc. GORMAN, P. M., YIP, C. M., FRASER, P. E. & CHAKRABARTTY, A. (2003). Alternate aggregation pathways of the Alzheimer beta-amyloid peptide: Abeta association kinetics at endosomal pH. J. Mol. Biol. 325, 743–757. LEISSRING, M. A., LU, A., CONDRON, M. M., TEPLOW, D. B., STEIN, R. L., FARRIS, W. & SELKOE, D. J. (2003). Kinetics of amyloid beta-protein degradation determined by novel fluorescence- and fluorescence polarization-based assays. J. Biol. Chem. 278, 37314–37320. SCHEIBEL, T., BLOOM, J. & LINDQUIST, S. L. The elongation of yeast prion fibers involves separable steps of association and conversion. Proc. Natl. Acad. Sci. USA, 101, 2287–2292. HERMANSON, G. (1996) in Bioconjugate techniques Academic Press London. UDENFRIEND, S. (1962) in Fluorescence assay in biology and medicine (eds. HORECKER, B., KAPLAN, N. O., MARMUR, J. & SCHERAGA, H. A.) Academic Press London, 223–229. SLAVIK, I. (1982). Anilinonaphthalene sulfonate as a probe of membrane composition and function. Biochim. Biophys. Acta 694, 1–25. F¨ORSTER, T. (1951) in Fluoreszenz organischer Verbindungen, Vandenhoeck and Ruprecht G¨ottingen. MATULIS, D. & LOVRIEN, R. (1998). 1-Anilino-8-naphthalene sulfonate anion-protein binding depends
175
176
177
178
179
180
181
182
183
184
primarily on ion pair formation. Biophys. J. 72, 422–429. MATULIS, D., BAUMANN, C. G., BLOOMFIELD, V. A. & LOVRIEN, R. (1999). 1-anilino-8- naphthalene sulfonate as a protein conformational tightening agent. Biopolymers 49, 451–458. KIRK, W., KURIAN, E. & PRENDERGAST, F. (1996). Affinity of fatty acid for (r)rat intestinal fatty acid binding protein: further examination. Biophys. J. 70, 69–83. SAFAR, J., ROLLER, P. P., GAJDUSEK, D. C. & GIBBS, C. J. (1984). Scrapie amyloid (prion) protein has the conformational characteristics of an aggregated molten globule folding intermediate. Biochemistry 33, 8375–8383. LITVINOVICH, S. V., BREW, S. A., AOTA, S., AKIYAMA, S. K., HAUDENSCHILD, C. & INGHAM, K. C. (1998). Formation of amyloid-like fibrils by self-association of a partially unfolded fibronectin type III module. J. Mol. Biol. 280, 245–258. KAYED, R., BERNHAGEN, J., GREENFIELD, N., SWEIMEH, K., BRUNNER, H., VOELTER, W. & KAPURNIOTU, A. (1999). Conformational transitions of islet amyloid polypeptide (IAPP) in amyloid formation in vitro. J. Mol. Biol. 287, 781–796. BASKAKOV, I. V., LEGNAME, G., BALDWIN, M. A., PRUSINER, S. B. & COHEN, F. E. (2002). Pathway complexity of prion protein assembly into amyloid. J. Biol. Chem. 277, 21140–21148. MANN, C. J. & MATTHEWS, C. R. (1993). Structure and stability of an early folding intermediate of Escherichia coli trp aporepressor measured by far-UV stopped-flow circular dichroism and 8-anilino-1-naphthalene sulfonate binding. Biochemistry 32, 5282–5290. SUBBARAO, N. K. & MACDONALD, R. C. (1993). Experimental method to correct fluorescence intensities for the inner filter effect. Analyst 118, 913–916. BENOIT, H., FREUND, L. & SPACH, G. (1967) in: Poly-α-amino acids (ed. FASMAN, G. D.) Marcel Dekker Inc. New York, 105–155. SHEN, C. L., SCOTT, G. L., MERCHANT, F. & MURPHY, R. M. (1993). Light scattering
53
54 Structural Analysis of Fibrous Proteins
185
186
187
188
189
190
191
192
193
194
195
analysis of fibril growth from the amino-terminal fragment beta(1–28) of beta-amyloid peptide. Biophys. J. 65, 2383–2395. HARDING, S. E. (1997) in Protein structure – a practical approach (ed. CREIGHTON, T. E.) Oxford University Press, 219–251. WYATT, P. J. (1991). Combined differential light scattering with various liquid chromatography separation techniques. Biochem. Soc. Trans. 19, 485. ROESSNER, D. & KULICKE, W. M. (1994). On-line coupling of flow field-flow fractionation and multi-angle laser light scattering J. Chromatogr. 687, 249–258. WYATT, P. J. & VILLALPANDO, D. (1997). High-Precision Measurement of Submicrometer Particle Size Distributions. Langmuir 13, 3913–3914. HARDING, S. E. (1986). Applications of light scattering in microbiology. Biotechnol. Appl. Biochem. 8, 489–509. WALSH, D. M., LOMAKIN, A., BENEDEK, G. B., CONDRON, M. M. & TEPLOW, D. B. (1997). Amyloid beta-protein fibrillogenesis. Detection of a protofibrillar intermediate. J. Biol. Chem. 272, 22364–22372. TSENG, Y., FEDOROV, E., MCCAFFERY, J. M., ALMO, S. C. & WIRTZ, D. (2001). Micromechanics and ultrastructure of actin filament networks crosslinked by human fascin: a comparison with alpha-actinin. J. Mol. Biol. 310, 351–366. KITA, R., TAKAHASHI, A., KAIBARA, M. & KUBOTA, K. (2002). Formation of fibrin gel in fibrinogen-thrombin system: static and dynamic light scattering study. Biomacromolecules 3, 1013–1020. GIDDINGS, J. C., YANG, F. J. & MYERS, M. N. (1977). Flow field-flow fractionation as a methodology for protein separation and characterization. Anal. Biochem. 81, 394–407. WAHLUND, K. G. & GIDDINGS, J. C. (1987). Properties of an asymmetrical flow field-flow fractionation channel having one permeable wall. Anal. Chem. 59, 1332–1339. GIDDINGS, J. C. (1989). Field-flow fractionation of macromolecules. J. Chromatogr. 470, 327–335.
196 LIU, M. K., LI, P. & GIDDINGS, J. C. (1993). Rapid protein separation and diffusion coefficient measurement by frit inlet flow field-flow fractionation. Protein Sci. 2, 1520–1531. 197 NILSSON, M., BIRNBAUM, S. & WAHLUND, K. G. (1996). Determination of relative amounts of ribosome and subunits in Escherichia coli using asymmetrical flow field-flow fractionation. J. Biochem. Biophys. Methods 33, 9–23. 198 ADOLPHI, U. & KULICKE, W. M. (1997). Coil dimensions and conformation of macromolecules in aqueous media from flow field-flow fractionation/multi-angle laser light scattering illustrated by studies on pullulan. Polymer 38, 1513–1519. 199 KRETCHMANN, E. (1971). The determination of the optical constants of metals by excitation of surface plasmons. Z. Phys. 241, 313–324. 200 FAGERSTAM, L. G., FROSTELL-KARLSSON, A., KARLSSON, R., PERSSON, B. & RONNBERG, I. (1992). Biospecific interaction analysis using surface plasmon resonance detection applied to kinetic, binding site and concentration analysis. J. Chromatogr. 597, 397–410. 201 De BRUIJN, H. E., KOOYMAN, R. P. & GREVE, J. (1992). Choice of metal and wavelength for SPR sensors: some considerations. Appl. Opt. 31, 440–442. 202 GREEN, R. J., FRAZIER, R. A., SHAKESHEFF, K. M., DAVIES, M. C., ROBERTS, C. J. & TENDLER, S. J. (2000). Surface plasmon resonance analysis of dynamic biological interactions with biomaterials. Biomaterials 21, 1823–1835. 203 RICH, R. L. & MYSZKA, D. G. (2000). Advances in surface plasmon resonance biosensor analysis. Curr. Opin. Biotechnol. 11, 54–61. 204 MYSZKA, D. G., WOOD, S. J., BIERE, A. L. (1999) Analysis of fibril elongation using surface plasmon resonance biosensors. Methods Enzymol. 309, 386–402. 205 VIKINGE, T. P., HANSSON, K. M., BENESCH, J., JOHANSEN, K., RANBY, M., LINDAHL, T. L., LIEDBERG, B., LUNDSTOM, I. & TENGVALL, P. (2000). Blood plasma coagulation studied by surface plasmon resonance. J. Biomed. Opt. 5, 51–55.
References 206 BINNIG, G., QUATE, C. F. & GERBER, C. (1986). Atomic force microscope. Phys. Rev. Lett. 56, 930–933. 207 STOLZ, M., STOFFLER, D., AEBI, U. & GOLDSBURY, C. (2000). Monitoring biomolecular interactions by time-lapse atomic force microscopy. J. Struct. Biol. 131, 171–180. 208 YANG, Y., WANG, H. & ERIE, D. A. (2003). Quantitative characterization of biomolecular assemblies and interactions using atomic force microscopy. Methods 29, 175–187. 209 HARPER, J. D., LIEBER, C. M. & LANSBURY, P. T. (1997). Atomic force microscopic imaging of seeded fibril formation and fibril branching by the Alzheimer’s disease amyloid-beta protein. Chem. Biol. 4, 951–959. 210 BLACKLEY, H. K., SANDERS, G. H., DAVIES, M. C., ROBERTS, C. J., TENDLER, S. J. & WILKINSON, M. J. (2000). In-situ atomic force microscopy study of beta-amyloid fibrillization. J. Mol. Biol. 298, 833–840. 211 DEPACE, A. H., WEISSMAN, J. S. (2002). Origins and kinetic consequences of diversity in Sup35 yeast prion fibers. Nat. Struct. Biol. 9, 389–396. 212 DA SILVA, L. P. (2002). Atomic force microscopy and proteins. Protein Pept. Lett. 9, 117–126. 213 PUCHTLER, H. & SWEAT, F. (1965). Congo red as a stain for fluorescence microscopy of amyloid. J. Histochem. Cytochem. 13, 693–694. 214 KLUNK, W. E., JACOB, R. F. & MASON, R. P. (1999). Quantifying amyloid by congo red spectral shift assay. Methods Enzymol. 309, 285–305. 215 DELELLIS, R. A., GLENNER, G. G. & RAM, J. S. (1968). Histochemical observations on amyloid with reference to polarization microscopy. J. Histochem. Cytochem. 16, 663–665. 216 KHURANA, R., UVERSKY, V. N., NIELSEN, L. & FINK, A. L. (2001). Is Congo red an amyloid-specific dye? J. Biol. Chem. 276, 22715–22721. 217 NAIKI, H., HIGUCHI, K., HOSOKAWA, M., TAKEDA, T. (1989). Fluorometric determination of amyloid fibrils in vitro using the fluorescent dye, thioflavin T1. Anal. Biochem. 177, 244–249.
218 LEVINE, H. (1997). Stopped-flow kinetics reveal multiple phases of thioflavin T binding to Alzheimer beta (1–40) amyloid fibrils. Biochem. Biophys. 342, 306–316. 219 BAN, T., HAMADA, D., HASEGAWA, K., NAIKI, H. & GOTO, Y. (2003). Direct observation of amyloid fibril growth monitored by thioflavin T fluorescence. J. Biol. Chem. 278, 16462–16465. 220 HAYES, T. L. (1973). Scanning electron microscope techniques in biology. In Advanced Techniques in Biological Electron Microscopy (ed. KOEHLER, J. K.). Springer-Verlag, Berlin, Heidelberg, New York. 221 ECHLIN, P. (1971). The application of SEM to biological research. Phil. Trans. Soc. Lond. B 261, 51–59. 222 DEROSIER, D. J. & MOORE, P. B. (1970). Reconstruction of three-dimensional images from electron micrographs of structures with helical symmetry. J. Mol. Biol. 52, 355–369. 223 HARRIS, J. R. (1997). Negative staining and cryoelectron microscopy. Royal Microscopical Society, Microscopy handbook series, Bios scientific publishers limited, Oxford. 224 GLENNEY, J. R. (1987). Rotary metal shadowing for visualizing rod-shaped proteins. In Electron microscopy in molecular biology (eds. SOMMERVILLE, J. & SCHEER, U.), IRL press, Oxford, Washington DC, 167–178. 225 DEROSIER, D. J. & KLUG, A. (1968). Reconstruction of three-dimensional structures from electron micrographs. Nature 217, 130–134. 226 SMITH, J. M. (1999). XIMDISP – A visualization tool to aid in structure determination from electron micrograph images. J. Struct. Biol. 125, 223–228. 227 CROWTHER, R. A., HENDERSON, R. & SMITH, J. (1996). MRC Image processing programs. J. Struct. Biol. 116, 9–16. 228 EGELMAN, E. H. (1986). An algorithm for straightening images of curved filamentous structures. Ultramicroscopy 19, 367–373. 229 STEWART, M. (1988). Computer image processing of electron micrographs of biological structures with helical
55
56 Structural Analysis of Fibrous Proteins
230
231
232
233
234
235
236
237
238
symmetry. J. Electron microscopy technique 9, 325–358. FRANK, J. (1973). Computer Processing of Electron Micrographs. In Advanced Techniques in Biological Electron Microscopy (ed. KOEHLER, J. K.). Springer-Verlag, Berlin, Heidelberg, New York. STINE, W. B., SNYDER, S. W., LADROR, U. S., WADE, W. S., MILLER, M. F., PERUN, T. J., HOLZMAN, T. F. & KRAFFT, G. A. (1996) The nanometer-scale structure of amyloid-beta visualized by atomic force microscopy. J. Protein Chem. 15, 193–203. DING, T. T. & HARPER, J. D. (1999). Analysis of amyloid-beta assemblies using tapping mode atomic force microscopy under ambient conditions. In Methods in Enzymology: Amyloid, prions and other protein aggregates (ed. WETZEL, R.) Academic press New York, 309, 510–525. SIKORSKI, P. & ATKINS, E. D. T. (2001). The three-dimensional structure of monodisperse 5-amide nylon 6 crystals in the lambda phase. Macromolecules 34, 4788–4794. KREJCHI, M. T., COOPER, S. J., DEGUCHI, Y., ATKINS, E. D. T., FOURNIER, M. J., MASON, T. L. & TIRRELL, D. A. (1997). Crystal structures of chain-folded antiparallel beta-sheet assemblies from sequence-designed periodic polypeptides. Macromolecules 17, 5012–5024. FANDRICH, M. & DOBSON, C. (2002). The behaviour of polyamino acids reveals an inverse side-chain effect in amyloid structure formation. EMBO Journal 21, 5682–5690. WORCESTER, D. (1978). Structural Origins of Diamagnetic Anisotropy in Proteins. Proc. Natl. Acad. Sci. USA 75, 5475–5477. TORBET, J., FREYSSINET, J.-M. & HUDRY-CLERGRON, G. (1981). Oriented Fibrin Gels Formed by Polymerization in Strong Magnetic Fields. Nature 289, 91–93. TORBET, J. & DICKENS, M. J. (1984). Orientation of skeletal muscle actin in strong magnetic field. FEBS Lett. 173, 403–406.
239 MARVIN, D. A. & NAVE, C. (1982). X-ray fiber diffraction. In Structural molecular biology (eds. DAVIES, D. B., SRENGER, W. & DANYLUK, S. S.). Plenum Publishing corporation. 240 SQUIRE, J., AL-KHAYAT, H., ARNOTT, S., CRAWSHAW, J., DENNY, R., DIAKUN, G., DOVER, D., FORSYTH, T., HE, A., KNUPP, C., MANT, G., RAJKUMAR, G., RODMAN, M., SHOTTON, M. & WINDLE, A. (2003). New CCP13 software and the strategy behind further developments: stripping and modelling of fiber diffraction data. Fiber diffraction review 11, 13–19. 241 CCP4. (1994). The CCP4 Suite Programs for Crystallography. Acta Cryst.D 50, 760–763. 242 OTWINOWSKI, Z. & MINOR, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. In Methods in Enzymology, (eds. CARTER, C. W.& SWEET, R. M.) Academic press New York 276, 307–326. 243 FRASER, R. D. B., MACRAE, T. P., MILLER, A. & ROWLANDS, R. J. (1976). Digital Processing of Fiber Diffraction Patterns. J. Appl. Cryst. 9, 81–94. 244 MILLANE, R. P. & ARNOTT, S. (1985). Background subtraction in X-ray fiber diffraction patterns. J. Appl. Cryst. 18, 419–423. 245 IVANOVA, M. I. & MAKOWSKI, L. (1998). Iterative low-pass filtering for estimation of the background in fiber diffraction patterns. Acta Cryst. A 54, 626–631. 246 OKADA, K., NOGUCHI, K., OKUYAMA, K. & ARNOTT, S. (2003). WinLALS for a linked-atom least squares refinement program for helical polymers on Windows PCs. Computational Biology and Chemistry 3, 265–285. 247 CANTOR, C. R. & SCHIMMEL, P. R. (1980). In Biophysical Chemistry. Part II: Techniques for the study of biological structure and function. W. H. Freeman and company New York. 248 SESHADRI, S., KHURANA, R. & FINK, A. L. (1999). Fourier transform infrared spectroscopy in analysis of protein deposits. In Methods in Enzymology: Amyloid, prions and other protein aggregates (ed. WETZEL, R.) Academic press New York, 309, 559–576.
References 249 OBERG, K. A. & FINK, A. L. (1998). A new attenuated total reflectance Fourier transform infrared spectros-copy method for the study of proteins in solution. Anal. Biochem. 256, 92–106. 250 SERPELL, L. C., BERRIMAN, J., JAKES, R., GOEDERT, M. & CROWTHER, R. A. (2000).
Fiber diffraction of synthetic α-synuclein filaments shows amyloidlike cross-b conformation. Proc. Natl. Acad. Sci. USA 97, 4897–4902. 251 SCHEIBEL, T., KOWAL, A., BLOOM, J. & LINDQUIST, S. (2001). Bi-directional amyloid fiber growth for a yeast prion determinant. Curr. Biol. 11, 366–369.
57
1
Computational Analysis of Modular Protein Architectures Rune Linding, Ivica Letunic, Toby J. Gibson, and Peer Bork EMBL-Heidelberg, Heidelberg, Germany
Originally published in: Modular Protein Domains. Edited by Giovanni Cesareni, Mario Gimona, Marius Sudol and Michael Yaffe. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30813-2 The sections in this article are
1 Introduction
In this chapter we discuss some of the bioinformatics and computational tools that exist for finding modular domains and their cognate ligands. We begin with a general discussion of protein architecture and relate this to the modular model of protein function. We pay particular attention to the SMART, ELM, GlobPlot, and DisEMBL resources, because these are the ones we are most familiar with. We apologize to those involved with all the great resources out there that we have not covered in this chapter.
2 Protein Architecture: Sequence, Structure, and Function
Nature seems to present protein functions in two states: structured and unstructured. Proteins are heteropolymers of amino acids; the sequence of amino acids determines not only the structure and folding of a protein but also the lack of structure. Molecular functions in proteins are associated with structural units, e.g., modular globular domains. However, an emerging large group of functional sites are found primarily in unstructured parts of proteins. In this chapter we explore how these sites can be identified in proteins and describe some of the computational tools that can be used to analyze protein sequences.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Computational Analysis of Modular Protein Architectures
2.1 The Modular Model of Protein Function
Multidomain proteins predominate in eukaryotic proteomes. The basic hypothesis in what one might call the modular model for protein function is that individual functions assigned to different sequence segments (often domains) combine to create a complex function for the whole protein. The term ‘modular’ refers directly to the autonomous nature of the individual folding units determining these functions, whereas the term ‘globular’ describes the structural state of a domain whether or not it is modular. A dogma in structural biology is that atomic structure determines function. The modular model has grown out of this paradigm; however, we know that at the fold level this is not always true, one can find structures belonging to the same fold (e.g. in SCOP) that have completely different functions: an example of this is glutathione-S-transferase and S-crystallin. They share 75% sequence similarity and the same fold, but the former is an enzyme and the latter a structural protein [4]. This is mainly a problem when one is trying to infer function from sequence; having the atomic structure solved frequently helps to define the function. Because single-domain globular proteins are often, although not always, less difficult to crystallize, for a long time they dominated our perceptions of typical protein structure (although fibrous proteins like collagen were of course well known). Gradually, as protein sequences have accumulated, the monodomain view of protein structure has been replaced by the realization that most proteins are multidomain, at least in higher eukaryotes. Multidomain architectures are usual for transmembrane receptors, signaling proteins, cytoskeletal proteins, chromatin proteins, transcription factors, and so forth. Multidomain proteins can be described as consisting of a series of modules or globular domains and a set of short linear functional sites. An archetypal protein of the modular model is Src (Figure 1). Src consists of three globular domains: SH3, SH2, and a tyrosine kinase domain (itself consisting of two structural domains), and eight known functional sites including four phosphorylation sites and three ligands of modular domains (SH3, SH2, and cyclin). 2.2 Partitioning of Protein Space
It is becoming increasingly clear that many functionally important protein segments occur outside globular domains [27, 99]. The set (or space) of all observations of protein structure and function is partitioned into two subspaces (Figure 2). The first consists of globular units having binding pockets, active sites, and interaction surfaces. The second subspace consists of nonglobular segments such as sorting signals, post-translational modification sites, and protein ligands (e.g., SH3 or WW ligands). Globular units are built of regular secondary structural elements and contribute the majority of the structural data deposited in PDB. The globular function space is described very well by domain databases such as SMART [49] and Pfam [11].
2 Protein Architecture: Sequence, Structure, and Function
Fig. 1 Domain and functional site architecture of the well known proto-oncogenic protein kinase Src. About 60%–80% of proteins from higher eukaryotes have analogous modular architectures [6]. Even though ‘only’ ∼30 000 ORFs are predicted to be in the human genome, most of these have
several splice variants and, in addition, several functional sites, e.g., post-translational modification sites (PTMs). These sites exist in various states and thereby increase the number of different functional isoforms several fold. The arrows show only the approximate locations of the functional sites.
Fig. 2 A conceptual model of protein structure and function observation space. Many functions and their structures can be assigned to two different subspaces. A linear/nonglobular one and a globular one. It is important to notice that these two sub-
spaces are not detached from each other – a good example is disordered loops that can protrude from a globular domain. Along the borderline (white) we find coiled coils, repeat proteins (often forming rods), and single transmembrane helices (TM1).
3
4 Computational Analysis of Modular Protein Architectures
In contrast, the nonglobular subspace encompasses disordered, unstructured, and flexible regions lacking regular secondary structure. Functional sites within the nonglobular space are known as linear motifs catalogued by ELM [71], PROSITE [80], and Scansite [101]. This group of sites includes protein interaction sites, cell compartment targeting signals, post-translational modification sites, and cleavage sites. Traditionally, protein function and interaction have been studied from a domaincentric view, and in fact, most large datasets that deal with protein interaction have also focused on this type of interaction [2]. This is because methods such as affinity purification tend to isolate ‘sticky’ and ‘nontransient’ protein complexes. The methods for isolating transient interactions, such as the binding of a cognate ligand to its modular domain, are different, and much smaller in-vivo datasets exist. However, the nontransient network is only a part, and perhaps even a subpart, of the interaction networks within the cell. We hope that this book will encourage the scientific community to focus on collecting large amounts of data on domain-ligand interactions, since only by having these can we try to obtain a full picture of the cellular protein networks. Below, we discuss how to annotate and analyze protein sequences for modular domains and linear motifs. Methods for finding domains and predicting their function are described first, followed by a description of how to identify unstructured regions and their potential functions.
3 Analyzing Globular Domains
The past three decades have seen relatively steady levels of domain discovery [18]. It seems likely that most of the more common mobile protein domains have already been described in the literature (see, Figure 3). However, because there is a large number of domains that are detectable only in a relatively small number of proteins, we predict that various domains are still hiding in many proteins and that each genome might harbor its own repertoire of species-specific or at least lineage-specific domains [26]. Since the late 1970s, homology search has been an extremely powerful computational technique for assigning novel functions to proteins. In addition to database search methods that were introduced in the early 1980s, an awareness of conserved entities such as local motifs was incorporated into software that was able to scan dedicated collections of such motifs in the mid 1980s (see, e.g., PROSITE, 1985, for an early resource). Yet, the statistics of more sophisticated search methods such as BLAST are still struggling with compositional biases due to nonglobularity or multiple occurrences of such structural entities within a search sequence. However, without discrimination of the rationale behind functionality (e.g., short functional motifs that can change quickly in time, such as glycosylation patterns vs. essential catalytic residues that stay conserved over billions of years), homology searches with the aim of function prediction remain limited. Thus, in this chapter
3 Analyzing Globular Domains
Fig. 3 Presence of SMART domains in the Homo sapiens proteome. Domains present in more than 20 proteins are shown. Domain names are displayed for only a few of the 314 domains.
we first define globular domains and then describe resources that help to annotate known domains. Finally, we explain the analysis options of one of these resources, SMART. 3.1 Globularity of Domains
Both ‘domain’ and ‘globular’ are terms defined in structural protein biology. They have since been used in other contexts such as function description and sequence analysis. Domains were first defined in structural terms, since early X-ray structures showed separate entities with defined subfunctions connected by flexible regions. The term globular refers to the globular structural state: a protein globule can be depicted as a soluble sphere having a hydrophobic core. An operational definition of globular proteins can be found in [19]: Most natural proteins in solution are much smaller in their dimensions than comparable polypeptides with random or repetitive conformations and have roughly spherical shapes; hence they are generally referred to as globular. Their physical properties do not change gradually as the environment is altered (e.g., by changes in temperature, pH or pressure) as do the properties of random polypeptides. Instead, globular proteins usually exhibit little or no change, until a point is reached at which there is a sudden drastic change and, invariably, a loss of biological function. This phenomenon is known as denaturation.
Structural biologists relate the term globular to domains that are compact and fold independently of the remainder of the host protein. Theoretical definitions of
5
6 Computational Analysis of Modular Protein Architectures
domains exist [85, 86, 88], including several algorithms for classifying domains and folds [43, 87]. Resources for finding globular domains with determined tertiary structure include SCOP (fold level) [54], ASTRAL (domains from SCOP) [16], SUPERFAMILY (hidden Markov models of SCOP domains) [35], and CATH (architecture level) [37, 65]. However, these resources are confined to the structural knowledge base PDB [96]. Therefore, only a subset of the total fold and structure space is described; the remaining domains are to a certain extent described in Pfam and SMART. 3.2 Resources for Analysis of Globular Domains
There are numerous domain databases available that can be useful for detection and analysis of globular domains in your favorite protein sequence. They can be separated into several categories: Ĺ Databases such as SMART (http://smart.embl.de/ [49]) and PFAM (http:// www.sanger.ac.uk/software/pfam/ [11]) primarily make use of hand-edited sequence alignments representing single protein domains with well defined borders at the sequence level. PROSITE (http://www.expasy.org/prosite/ [80]) is also a handmade resource, but it contains a much more heterogeneous set of domains and motifs although, for linear motifs, it has been superseded by ELM. Ĺ Other databases rely on various automatic methods to generate their domain signatures. This is so for ProDom (http://www.toulouse.inra.fr/prodom.html [79]) and BLOCKS (http://www.blocks.fhcrc.org/ [38]). Such resources predict domains that do not always correspond to known structural and globular domains and for this purpose may not be as sensitive. However, these resources are of substantial discovery value, since they collect conserved sequence segments that might specify novel functions. Ĺ In addition to the search functionality of the databases themselves, several metaservers allow users to search multiple domain databases. The Interpro database, at the EBI (http://www.ebi.ac.uk/interpro/ [61]) allows searching of the PROSITE, PFAM, PRINTS, ProDom, and SMART model collections, and the Conserved Domain Database (CDD) at the NCBI (http://www.ncbi.nlm.nih.gov/ structure/cdd/cdd.shtml [57]) allows searching of profiles derived from SMART and PFAM using a modified version of the BLAST algorithm.
We describe the SMART resource in greater detail, because it focuses on modular signaling domains. 3.3 SMART: Simple Modular Architecture Research Tool
The explosion of sequence data increases the need for computational sequenceanalysis tools that annotate novel genes with predicted functions. Function prediction, however, is fraught with potential pitfalls, such as variable sequence
3 Analyzing Globular Domains
divergence, nonequivalent functions of homologs, and nonidentical multidomain architectures [25]. Detecting nonenzymatic regulatory domains is essential to predicting a protein’s cellular role, binding partners, and subcellular localization. Such domains can be divergent in sequence and occur in contrasting multidomain contexts. This leads to difficulties in unraveling the evolution and function of multidomain proteins. To help in solving these problems, SMART has been developed to identify and annotate protein domains, particularly those in eukaryotes that are genetically mobile and difficult to detect. 3.3.1 The SMART Alignment Set Domain detection in SMART relies on multiple sequence alignments of representative family members. Alignment Construction Protocol The starting point for constructing a multiple sequence alignment that optimally represents a domain family is an alignment of divergent family members based on known tertiary structures, where possible, or from homologs identified in a PSI-BLAST [3] analysis. These alignments are optimized manually and, after construction of a hidden Markov model (HMM), used to search current sequence databases (Figure 4). Each sequence of the alignment is also used as a query in a PSI-BLAST search. All sequences that are significantly similar [as detected by HMM (E < 0.01) or PSI-BLAST (E < 0.001) searches] are added to the alignment using the sequence-versus-HMM alignment method of HMMer. Alignments are checked manually for potential false positives or misassembled protein sequences derived from genomic sources. From this alignment, one of each sequence pair sharing > 67% identity is deleted to reduce redundancy. The resulting alignment is used as a starting point for a subsequent round of searches. This iterative procedure is pursued until no new homologs are detected. Searching Method To maximize the sensitivity of domain and repeat detection, SMART uses hidden HMMer models as implemented in the HMMer software package (http://hmmer.wustl.edu/). HMMER provides statistically sound E values, thus giving a robust estimate of the significance of a domain hit. [The E value represents the number of sequences having a score ≥X that would be expected absolutely by chance. The E value connects the score (X) of an alignment between a user-supplied sequence and a database sequence, generated by any algorithm, with how many alignments having similar or greater scores would be expected from a search of a random sequence database of equivalent size.] From a database search with an HMM derived from the SMART alignment, the highest per-protein E value of identified true positives (Ep ) and the lowest per-protein E value of predicted true negatives (En ) are stored within the SMART database. Similarly, for two or more repeats in a protein, the lowest E value of a false positive repeat (Er ) is stored. To ensure that the E value thresholds are independent of database size, the size of the protein database used when deriving the thresholds is also recorded. SMART predicts a domain homolog within any sequence that either has an E value < Ep or else when Ep < E value < En and E value < 1.0. If no repeat threshold is defined, all hits in a protein are reported; otherwise only those with E values < Er are shown.
7
8 Computational Analysis of Modular Protein Architectures
Fig. 4 SMART alignment representing the MAM domain.
3 Analyzing Globular Domains
Domain Coverage Originally, SMART was intended as a tool for the analysis of domains involved in eukaryotic signal transduction [76]; it was later expanded to detect domains of extracellular proteins and bacterial two-component regulatory systems. Gradually, various domains associated with DNA, RNA, chromatin, and cytoskeletal functions have been added. Over the past few years, to augment the SMART domain set, several semi-automatic search methods to identify new and biologically interesting domains were developed. The current release of SMART (version 4.0) includes nearly 700 protein domains. 3.3.2 SMART Relational Database System The core of SMART is a relational database management system (RDBMS) powered by PostgreSQL (http://www.postgresql.org/), which stores information on SMART domains and the underlying nonredundant protein database. Protein Database Basic components of SMART’s source sequence database are the Swiss-Prot and SP-TrEMBL [10] protein sets, which have been used by SMART since its inception. This set was recently expanded by inclusion of all proteomes available in the Ensembl collection [17]. Sequences from all sources are compared, and a nonredundant set of proteins with multiple identifiers per sequence is generated. Sequences are retrievable and linkable via any of the original identifiers. Domain Database The SMART domain database stores information on each domain’s presence in all proteins in the relational database. Each domain’s hit borders, raw bit score, and Expect (E) value are recorded, together with the protein accession code, description, and species name. In addition to domain information, other intrinsic features of each protein, such as transmembrane regions, coiled coils, signal peptides, and internal repeats are included. 3.3.3 Web Interface SMART provides a web-based interface to its underlying relational database and HMMer-based search engine. There are two principal ways of using SMART: individual sequence analysis and domain architecture analysis. Here we describe major features of the current SMART (version 4.0). Sequence Analysis SMART uses the CRC64 algorithm to calculate checksums for all user-supplied sequences. If a matching checksum is found in the SMART database, precalculated results are displayed. If there is no match, HMMer software is used to scan the sequence with all SMART profiles. It is also possible to include Pfam profiles in the search. Resulting schematic protein representations (Figure 5) are easy to interpret: a gray line shows the protein backbone, and different colored shapes represent domains and features that are confidently predicted. If a user-supplied sequence has a matching checksum identified, several important features become available in the main results page.
9
10 Computational Analysis of Modular Protein Architectures
Fig. 5 SMART representation of mouse tyrosine protein kinase TEC (ENSMUSP00000006349). Gray lines show the protein backbone, with domains represented by different colored shapes. Intron positions are indicated by vertical lines showing the amino acid location and the intron phase. Intron positions are taken from Ensembl gene predictions.
Where available, intron positions are shown in schematic protein figures. For proteins that match any of the Ensembl predictions, SMART shows intron positions as vertical colored lines (Figure 5). This information is retrieved from a precalculated mapping of Ensembl gene structures to protein sequences. Extra information may be associated with the sequence. If multiple IDs are associated with the same sequence, users receive a list of all IDs with links to corresponding source databases. Since SMART incorporates Ensembl genomes, users also receive a list of alternative splices of the gene encoding the analyzed protein (if there are any). It is possible to either display SMART protein annotation for any of the alternative splices or obtain a graphical multiple sequence alignment of all of them (Figure 6). Orthology information: SMART provides orthology information for all Ensemblpredicted proteins. These relationships are distinct from those provided by Ensembl. There are two separate sets of orthologs for each protein: 1 : 1 reciprocal best matches in other genomes and orthologous groups with reciprocal best hits from all genomes analyzed (i.e., each of these proteins has exactly one
Fig. 6 SMART graphical alignment of alternative splice variants of Mus musculus procollagen gene ENSMUSG00000026141. Domain and intron positions are adjusted according to gaps in the alignment (black boxes).
3 Analyzing Globular Domains
Fig. 7 SMART representation of an orthologous group alignment. Orthologous proteins from different species are aligned using Clustal W. Domains, intrinsic features, and introns are mapped onto the alignment with their positions adjusted according to gaps (black boxes). This tool allows easy visual comparison of intron positions and
their relations to protein features. Proteins displayed: H. sapiens ENSP00000306893, M. musculus ENSMUSP00000034225, Rattus norvegicus ENSRNOP00000015337, Fugu rubripes SINFRUP00000143191, Drosophila melanogaster CG6827-PA, and Anopheles gambiae, ENSANGP00000009390.
ortholog in all six genomes). Orthologous groups are displayed as graphical multiple sequence alignments (Figure 7). All orthology information is extracted from all-against-all Smith–Waterman similarities for combined proteomes, using a previously described method [103].
Domain Architecture Analysis (Architecture SMART and Alert SMART) Architecture SMART allows users to search for specific domain architectures using AND/NOT logic. Since the SMART database includes intrinsic protein features as well as precalculated results for Pfam [11] domains, these can be used together with SMART domains. For example, it is possible to identify receptor tyrosine kinases by searching for proteins that contain both a tyrosine kinase domain and a predicted transmembrane region (query “TyrKc AND TRANS”, Figure 8). In addition to standard domain querying, SMART can be used to find proteins based on gene ontology (GO [8]) terms associated with domains. Associations of domains with GO are taken from Interpro [61]. Querying with GO terms is a twostep process. In the first step, the user obtains a list of domains matching the GO terms entered. After selecting the domains of interest from the list, proteins
11
12 Computational Analysis of Modular Protein Architectures
Fig. 8 Using intrinsic features in domain architecture queries. The SMART database was queried for all proteins containing a tyrosine kinase domain and a transmembrane region (“TyrKc AND TRANS”), and 660 proteins were found,including the four
displayed here. The color of domain names correlates with both subcellular localization (blue = extracellular, black = intracellular)and catalytic activity (red = catalytically active).
containing those domains are displayed. As with standard domain querying, results can be limited to specific taxonomic ranges. Finding Proteins with Similar Domain Architecture SMART can search for all proteins that have the same domain architecture as the query (having all the domains of the query protein in the same co-linear order) or that have an identical set of domains (at least one of all domain types of the query protein, irrespective of order). Identification of proteins having identical or near-identical domain architectures as the query sequence may improve predictions of protein functions. This feature also reveals, by using a taxonomic breakdown, the phyletic distribution of a given architecture. 3.3.4 Application of SMART Apart from its use as a web tool, SMART has been applied to large-scale annotation projects, such as annotation of the human, mouse, and mosquito draft genome sequences [48, 95, 103]. It was also used in the investigation of single domain families in model organisms [40] and for the study of sequence conservation in multiple alignments [67]. In conjunction with genomic data, SMART was used for the study of conservation of gene (i.e., intron/exon) structure [13]. SMART has also been incorporated into other domain and protein family resources that are used for the primary annotation of sequence databases. It is a component database of Interpro [61], which contributes to the annotation of Swiss-Prot
3 Analyzing Globular Domains
sequences [10], and of the Conserved Domain Database (CDD), which contributes to the annotation of RefSeq sequences [70]. 3.4 Other Features and Resources 3.4.1 Globular Repeats Repeats can be hard to detect in protein sequences: most of them are short, and their sequences are often highly divergent. The numbers of repeats in different proteins are extremely variable. Finally, defining the first and last residues of a repeat is more contentious than for a domain, since repeats are more prone to circular permutation than are domains, particularly within closed structures [74], and to partial truncation, resulting in non-integer repeat numbers. Repeat detection methods are often incorporated into domain prediction servers (for example, SMART uses the Prospero [60] program from the Ariadne package), but dedicated protein repeat prediction servers also exist, e.g., REPeats (http://www.embl-heidelberg.de/∼andrade/papers/rep/search.html [5]). The GlobPlot method described in Section 4.1.4 is also fairly capable in detecting repeats. 3.4.2 Domain Interaction Prediction Although homology-based methods are a primary source of globular domain discovery, protein-protein interactions are also being used and becoming more popular for this purpose. One of the servers for exploration of such interactions is STRING. The STRING database (http://string.embl.de/) is dedicated to proteomewide prediction of protein-protein associations [94]. It is an integrated resource relying on a wide range of experimental and computational datasets to make reliable interaction predictions. It contains genomic context associations (derived from genome comparisons), interactions derived from coexpression analysis, and various types of high-throughput experimental data, all of which are stringently bench-marked by using a common reference. 3.4.3 No Domains? If no domains are found by, e.g., SMART or Pfam, this does not mean that your favorite protein does not contain any higher fold or globular domains. Most often, it simply indicates the presence of nonannotated domains, which of course has potentially higher discovery value. However, several resources exist to help identify potential domain boundaries and give hints as to the structure of what might be hidden in the sequence. Secondary Structure Prediction: Good Prediction of secondary structure is the most mature of any structure prediction strategy, and accuracies of up to ∼80% can be achieved [20, 21, 68]. An initial BLAST search to find homologous proteins is important to get a better idea of the function and to build a sequence set for a multiple alignment that can be used on secondary structure prediction servers
13
14 Computational Analysis of Modular Protein Architectures
such as PredictProtein (PROFsec, http://www.predictprotein.org/) and JPRED (http://www.compbio.dundee.ac.uk/∼www-jpred/). Tertiary Structure Prediction: Difficult Prediction of tertiary structure and folds is still error-prone and difficult; however, having good secondary structure predictions at hand can assist this analysis. Perhaps the best approach is to submit the sequence to one of the homology-based prediction servers, such as the 3D-JURY metaserver (http://bioinfo.pl/Meta/ [34]) or SWISS-MODEL (http://www.expasy.org/ swissmod/SWISS-MODEL.html [77]). Other resources can be found on the websites for the evaluation competitions CASP (http://predictioncenter.llnl.gov/ casp5/Casp5.html) and CAFASP(http://bioinfo.pl/cafasp/). Other Sequence Features: Narrowing Down Domain Boundaries Single transmembrane segments (TM1), coiled coils, and low-complexity regions are all incorporated in the SMART server. Sometimes low-complexity regions are disordered (see Section 4.1.3). Coiled coils are also disordered sequences; however, they behave like globular units after the coiled-coil structure is formed, which is a very clear example of disorder–order transition. At EMBL in Heidelberg we have two additional methods that are useful in the definition of potential domain boundaries: DomCut [84] and GlobPlot [52], see Section 4.1.4. Another resource for potential domain boundary prediction is DomPred (http://bioinf.cs.ucl.ac.uk/dompred/ [58]). Many proteins are entirely and natively unstructured and without globular domains, and the rest of this chapter is dedicated to the analysis of this part of protein space.
4 Analyzing Nonglobular Protein Segments
Since most attention in assigning function to proteins has been on globular domains, there are relatively few tools for analyzing the nonglobular protein space. Structural biology has tended to avoid unstructured proteins and regions (e.g., by removing them in recombinants), which has led to a skew toward globular proteins in structural datasets. However, this neglect is not confined to structural biology – bioinformatics has also tended to keep nonglobular function prediction under the academic carpet. Although resources are readily available for revealing globular domains in sequences, until recently there has not been any comprehensive collection of short functional sites/motifs comparable to the globular domain resources. Yet these are just as important for the function of multidomain proteins. Indeed, it is impossible for a researcher to find a list of currently known motifs – going through the literature to retrieve them is impractical without foreknowledge in more areas than any one person has. This neglect is primarily due to the fact that short sequence motifs are
4 Analyzing Nonglobular Protein Segments
statistically insignificant and difficult to handle compared to domains for which accurate sequence models can be produced.
4.1 Unstructured Regions: Protein Disorder
The approach to finding functional sites is fundamentally different from the one described above for globular domains. Since linear motifs are often shorter than 10 amino acids, they overpredict massively even if they are described by using artificial neural networks or other sensitive probabilistic methods. However, linear motifs are context-dependent in the sense that they are functional only if they are exposed for interaction with a modular domain or in the right cell compartment. Structurally they prefer to be in nonglobular or disordered regions of the protein, both of which can be detected fairly accurately. A typical functional site is shown in Figure 9; notice the linear unstructured and flexible protein backbone, a requirement for the CSK kinase to be able to modify the tyrosine. In the following we discuss how to find potentially nonglobular areas, including those that appear structurally disordered, and how to predict functional sites in them.
Fig. 9 The C-terminal CSK Tyrphosphorylation site in Src (PDB: 1fmk) in the closed conformation bound to the SH2 domain. This linear motif (red) shows the general features of an ELM: it is linear in sequence and structure space. The se-
quence of the instance of this functional site is TEPQYQPGE. In the ELM resource this is called the MOD TYR CSK functional site and is described by the pattern [TAD][EA].Q(Y)[QE].[GQA][PEDLS]. The image was created with PyMOL (Table 2).
15
16 Computational Analysis of Modular Protein Architectures
Recently, it has become possible to analyze natively unstructured proteins by methods such as NMR. Besides their high content of functional sites, disordered and nonglobular regions are exciting for many other reasons. 4.1.1 What Role Does Protein Disorder Play in Biology? Target Selection In the post-genomic era, discovery of novel domains and functional sites in proteins is of growing importance. One focus of structural genomics initiatives is to solve structures for novel domains and thereby increase the coverage of fold and structure space [14]. During the target selection process in structural genomics/biology, it is important to consider intrinsic protein disorder, because disordered regions (at the N and C termini or even within domains) often lead to difficulties in protein expression, purification, and crystallization. It is therefore essential to be able to predict which regions of a target protein are potentially disordered/unstructured. IDPs (Intrinsically Disordered Proteins) Although IDPs (also known as intrinsically unstructured proteins) are under-researched, an increasing number are being found. These are proteins or domains that, in their native state, are either entirely disordered or contain large disordered regions. More than 100 such proteins are known, including Tau, prions, Bcl-2, p53, 4E-BP1, and HMG proteins (see Figure 14) [7, 47, 56, 90, 91]. Protein disorder is important for understanding protein function as well as protein folding pathways [67, 92]. Although little is understood about the cellular and structural meaning of IDPs, they are thought to become ordered only when bound to another molecule (e.g., CREB–CBP complex [72]) upon changes in the biochemical environment [27, 29]. Function of Disorder and IDPs The current view on protein disorder is that it allows for more interaction partners and modification sites [53, 90, 99]. However, we have not been able to confirm this hypothesis by analyzing a large interaction dataset (unpublished results). This might be because such datasets are enriched in nontransient interactions, but interactions carried out by disordered proteins are transient. Perhaps disordered proteins have evolved to provide a simple solution to having large intermolecular interfaces while keeping smaller protein, genome, and cell sizes [36]. It has been proposed that having several relatively low-affinity linear interaction sites allows for a flexible, subtle regulation as well as accounting for specificity and cooperative binding effects [31]. In light of the modular model described in Section 2.1, we can see how these sites can be used in a combinatorial manner to generate a very large set of potential interaction environments. Protein Disorder and Disease Structural disorder in proteins is now known to play a central role in diseases mediated by protein misfolding and aggregation [12, 45, 78]. Amylogenic diseases such as Alzheimer’s, Type II diabetes, and BSE are thought to be related to the occurrence of short linear motifs in unfolded regions. These motifs
4 Analyzing Nonglobular Protein Segments
are important for initiation of the formation of the amyloid fibers that cause great harm to the cellular environment, particularly in brain tissue. There are several proposed peptide models for these motifs, and the structural context in which they occur are under investigation [24, 55, 63, 83]. Other diseases such as Parkinson’s, Huntington’s, and serpinopathies are related to misfolding of proteins. The understanding of protein misfolding is related to analysis of the unstructured ensemble or the unfolded state of a polypeptide. This state can be analyzed in natively disordered proteins [30]. How does one characterize protein disorder and nonglobular regions? The field of protein disorder studies has, so far, failed to reach any agreement on this. 4.1.2 What is Protein Disorder? No commonly agreed definition of protein disorder exists. The thermodynamic definition of disorder in a polypeptide chain is the random-coil structural state. The random-coil state can best be understood as the structural ensemble spanned by a given polypeptide in which all degrees of freedom are used within the conformational space. However, even under extremely denaturing solvent conditions, such as 8 M urea, this theoretical state is not observed in solvated proteins [46, 66, 89]. Proteins in solution thus seem to always retain a certain amount of residual structure. Protein disorder is observed by a variety of experimental methods, such as X-ray crystallography; NMR, Raman, and CD spectroscopy; and hydrodynamic measurements [29, 82]. In vivo studies of disorder are possible with NMR spectroscopy on living cells (e.g., anti-sigma factor FlgM [22]). Each of these methods detects different aspects of disorder, resulting in several operational definitions of protein disorder (see [90] for a review). Regions without regular secondary structure can be predicted by the NORSp (nonregular structure) server [53]; however, as the authors point out, such regions are not necessarily disordered. Structures such as the Kringle domain (PDB: 1krn) are almost entirely without regular secondary structure in their native state, but they still have tertiary structure in which the basic building block is coils. These loopy proteins are not necessarily IDPs, since they can still form a well defined globular tertiary structure. In our work we have used four definitions of protein disorder: Ĺ Loops/coils as defined by DSSP [44]. Residues are assigned to one of several secondary structure types. For this definition we consider residues in an αhelix (H), 310 helix (G), or (β-strand (E) to be ordered and all other states (T, S, B, I) to be in loops (also known as coils). Loops/coils are not necessarily disordered (e.g., turns); however, protein disorder is found only within loops. It follows that one can use loop assignments as a necessary but not sufficient requirement for disorder; a disorder predictor based entirely on this definition is thus promiscuous. Ĺ Hot loops constitute a refined subset of the above: namely, those loops having a high degree of mobility as determined from Cα temperature factors (B factors).
17
18 Computational Analysis of Modular Protein Architectures
It follows that highly dynamic loops should be considered disordered. Several attempts have been made to try to use B factors for disorder prediction [15, 28, 32, 93, 104], but there are many pitfalls in doing so, because B factors can vary greatly within a single structure due to the effects of local packing and structural environment. Recent progress in deriving propensity scales for residue mobility based on B factors [81] has encouraged us to use B factors for defining protein disorder. The details for hot loops can be found in the methods part of [51]. Ĺ Missing coordinates/remark465 in X-ray structure, as defined by remark465 entries in the PDB. Nonassigned electron densities most often reflect intrinsic disorder and were used early, for disorder prediction [50]. Ĺ Russell–Linding propensities are parameters based on the hypothesis that the tendency for disorder can be expressed as P = RC – SS where RC and SS are the propensities for a given amino acid to be in random coil and regular secondary structure, respectively. This scale was defined during the development of the GlobPlot predictor described in Section 4.1.4. Figure 10 shows the disorder propensities for each amino acid by our four definitions of disorder. A more detailed discussion of these values can be found in [51], but in general, hydrophobic residues promote order according to all definitions of
Fig. 10 Propensities of the amino acids to be disordered, according to the definitions used in DisEMBL and GlobPlot (sorted by hot loop preference). This scale directly reflects the datasets used for training; however, it is only a rough approximation of what the DisEMBL neural networks use in predicting disorder. Error bars correspond to the 25th and 75th percentiles as estimated by stochastic simulation. The
Russell–Linding scale is an absolute scale. Methionine suffers a bias in the remark465 dataset for at least two reasons: (1) often the N-terminal methionine is missing; and (2) some structures are solved using selenomethionine derivatives for phasing, which can lead to deletion of the residue in the PDB entry. The same bias is seen in ([29], Figure 10).
4 Analyzing Nonglobular Protein Segments
disorder. Disorder-promoting residues include proline, lysine, serine, threonine, and methionine. 4.1.3 Methods for Finding Protein Disorder Several other attempts have been made to predict disorder. Perhaps the earliest were methods of finding regions of low complexity. Although many such regions are structurally disordered, the correlation is far from perfect, because regions of low sequence complexity are not always disordered (and vice versa) [27]. Likely the strongest evidence for this correlation comes from the fact that low-complexity regions are rarely seen in protein 3D structures [75]. Methods to predict low complexity, like SEG [98] and CAST [69], are thus often used for this purpose. Methods using hydrophobicity can also give hints about disordered regions, because lowcomplexity regions are typically exposed and rarely hydrophobic. The first tool designed specifically for prediction of protein disorder was PONDR (predictor of naturally disordered regions, http://www.pondr.com [32, 33, 73]). It is based on artificial neural networks. PONDR is, however, not freely accessible to academics. Refer to [59] for a recent evaluation of disorder prediction (DisEMBL was published after CASP5). Prediction of protein tertiary structure may be an alternative route to disorder prediction, although such methods are computationally intensive and error-prone. Moreover, such methods are usually designed to predict the structure of globular domains, and their behavior with other sequences can be unpredictable. At the EMBL in Heidelberg we have developed methods for finding unstructured regions from sequence data alone. These tools were primarily developed for use in the ELM project to help find regions potentially containing functional sites. However, these tools are now being used by several structural genomics initiatives and laboratories around the world who are either studying IDPs or trying to optimize their recombinant protein expression vectors by cutting out disordered segments. 4.1.4 GlobPlotting GlobPlot was invented specifically to aid the ELM project; however, it proved to be of much wider interest [52]. From the beginning we wanted a graphical tool that could generate easy-to-interpret plots of the tendency within a sequence for structured or lack of structure. The basis for GlobPlot was the Russell–Linding scale mentioned earlier in Section 4.1.2. The combination of random coil and secondary structure in the Russell–Linding scale enhanced the discrimination of the graphs and was the key factor in the success of this scale at detecting both disorder and globular packing. GlobPlot is not intended to be a competitor in secondary structure prediction, because it cannot give the same level of detail as can be obtained from secondary structure prediction based on multiple alignment. GlobPlot is an ab initio method, i.e., it requires only one sequence and can therefore be applied to novel sequences having no homologs, i.e., it does not use multiple alignment. The basic algorithm behind GlobPlot is beautifully simple and very fast: each amino acid ai has a defined
19
20 Computational Analysis of Modular Protein Architectures
propensity P(ai )R (see Russell–Linding in Figure 10). Given a protein sequence of length L, we define a sum function Dis(ai ) as follows: Dis (ai )=
L
P(ai )
j =1
where P(ai ) is the propensity for the ith amino acid. The GlobPlot webserver plots the function, and the graphs are referred to as globplots. Before plotting, the digitalsmoothing Savitzky-Golay algorithm is used to reduce noise on the curve. Analyzing a GlobPlot Reading globplots is fairly easy, but different from, e.g., hydropathy plots, in that globplots are cumulative-sum curves rather than derivative curves. Because GlobPlot plots this running sum, the graph is analyzed by looking at the slope. The numbers on the ordinate do not matter, they equal the running sum, and we are interested only in whether or not a given segment of the graph is disordered. The latter is seen by the decrease or increase in the slope, because that is how the Russell–Linding scale works: negative values correspond to ordered residues, and positive values indicate disorder-promoting amino acids. We designed GlobPlot like this because we think it results in profile-like, intuitive plots. In particular, we wanted to avoid a high-variation curve such as the derivative curve. The globplot in Figure 11 is a good example of one of these profile-like curves: the GlobPlot plot for mucin predicts that the central part of the protein is almost completely disordered (using the Russell–Linding disorder definition) – this is probably why this protein is so slimy. Domain detection with GlobPlot is as easy as finding protein disorder, since both features are shown in the plot. To help you to navigate and understand the plots, the webserver overlays the graph with any predicted SMART domains. In domain hunting situations, you would look for downhill regions in the graph. As seen in Figure 12, GlobPlot can detect potential domains: notice the downhill slope whenever a domain is found by SMART/Pfam. GlobPlot often detects additional sequence to be ordered, this is because SMART and Pfam use only the most conserved sequence part of a domain to generate their hidden Markov models for the domain. This indicates that GlobPlotting is useful for domain boundary definition. 4.1.5 Prediction of Multiple Types of Disorder with DisEMBL The performance of GlobPlot encouraged us to refine our approach and predict disorder in a more traditional biocomputational manner by training artificial neural network predictors for the various definitions of disorder mentioned above. This work led to the DisEMBL disorder predictor ensemble. DisEMBL is a computational tool for prediction of disordered/unstructured regions within a protein sequence [51]. DisEMBL currently provides three alternative disorder definitions: hot loops, coils, and missing coordinates as defined in Section 4.1.2. The coils predictor is used primarily as a filter to require disorder to be within coil-predicted regions (see Section 4.1.2). DisEMBL is a highly accurate predictor, predicting more than 60% of hot loops with fewer than 2% false positives [51].
4 Analyzing Nonglobular Protein Segments
Fig. 11 Globplot of human mucin 5 protein (Swiss-Prot: MU5B HUMAN). Most of this slimy protein is highly disordered. Since GlobPlot plots a running sum of the propensity to disorder, the graph is analyzed by looking at its slope. The numbers
on the ordinate axis do not really matter, it is the uphill or downhill tendency that should be read. Referring to Figure 10 indicates that disorder-promoting propensities are positive, so ‘uphill’ on the graph is equivalent to disorder.
Hot Loops ‘Hot loops’ is a novel definition of disorder based on X-ray data. We think that it will prove difficult to pull out a much more precise definition of disorder based on crystallographic data. An example of hot loop results is shown in Figure 14, where we mapped the probabilities shown in Figure 13 onto the structure of nonhistone chromosomal protein 6A from yeast. It is remarkable that a definition based on X-ray data can predict so well for NMR structures, arguing that this novel definition of disorder is relevant. We also showed this correlation earlier, as well as a comparison of the correlations between our alternative definitions of protein disorder [51]. Using DisEMBL DisEMBL is freely available via a web interface (http://dis. embl.de/) and can be downloaded for use in large-scale studies. The web interface is fairly straightforward to use, you can submit a sequence or enter the SwissProt/SWALL accession (e.g., P08630) or entry code (e.g., HMG1 HUMAN). The server fetches the sequence and description of the polypeptide from an ExPASy server using Biopython.org software. The probability of disorder is shown graphically, as illustrated at Figure 13. The random expectation levels for the different predictors are shown on the graph as horizontal lines, but should merely be considered absolute minima. The default parameters are set for optimal prediction and
21
22 Computational Analysis of Modular Protein Architectures
Fig. 12 Globplot of human CREB-binding protein (CBP HUMAN). About half of the sequence appears to be disordered, with long flexible regions observed at the N and C termini. The flexible region just after the
KIX domain might be important for induced binding of the pKID domain of CREB to CBP [23, 72]. For further discussion of disorder in CBP/CREB see Wright et al. [99].
should be changed only in rare situations. On-line documentation of the various settings is provided at http://dis.embl.de/help.html. If the query protein sequence is very long, > 1000 residues, you can download the predictions and use a local graphing/plotting tool such as Grace or OpenOffice.org to plot and zoom the data. A future version of DisEMBL may include a web applet for interactive plotting and zooming of the graphs. GlobPlot and DisEMBL The GlobPlot algorithm is very simple and intuitive, which is appealing. Although it was originally designed for prediction of protein disorder, the Russell–Linding propensity scale functions just as well for detection of domain boundaries, repeats, and other globular features. The Russell–Linding scale and the SMART domain overlay feature are unique to GlobPlot. DisEMBL is more accurate than GlobPlot in coil prediction, which is related to the Russell–Linding scale. It furthermore provides the novel hot-loop definition of disorder. The two methods complement each other, since they approach disorder prediction differently. In general, we urge you to submit your sequences to both tools. 4.1.6 Design of Protein Expression Vectors As mentioned earlier, protein disorder is related to problems during protein expression, purification, and crystallization. Other tools such as TANGO
4 Analyzing Nonglobular Protein Segments
Fig. 13 Sample output from the DisEMBL web server, showing predictions for yeast nonhistone chromosomal protein 6A (high mobility group protein, Swiss-Prot: NHPA YEAST). The green curve shows the predictions obtained for missing coordinates, red for the hot loop network, and blue for coils. The horizontal lines correspond to the random expectation level for
each predictor: for coils and hot loops the prior probabilities were used, and a neural network score of 0.5 was used for remark465. From this plot it is seen that the N-terminal tail of the protein is especially predicted to be disordered. See Figure 14 for a mapping of the hot loop predictions onto the structure of this protein.
Fig. 14 DisEMBL hot loop predictions mapped on the NMR structure of nonhistone chromosomal protein 6A (high mobility group protein, PDB: 1cg7; model 1, Swiss-Prot: NHPA YEAST). The predicted probabilities are indicated with a color scale
going from blue to red, where red corresponds to the most likely disordered regions and blue to ordered regions. The unstructured tail clearly shows the highest disorder scores (see Figure 13). Surface plot generated with VMD (see Table 2).
23
24 Computational Analysis of Modular Protein Architectures
(http://tango.embl.de/) deal with protein cross-beta aggregation which is different from the disorder in solvated proteins that our tools predict. We believe that identification of potential disordered regions should provide a good basis for setting up expression vectors and/or comparing the data with obtained structural data. However, currently we cannot assess which of the definitions of disorder is most appropriate for design of protein expression vectors. We thus strongly encourage feedback on successes and failures in using DisEMBL for expression and structural analysis of proteins. 4.2 Function Prediction for Nonglobular Protein Segments
Having identified candidate unstructured regions, one can start searching for function in them. Most functions correlate with short linear peptide motifs that are used for cell compartment targeting, protein-protein interaction, regulation by phosphorylation, acetylation, glycosylation, and a host of other post-translational modifications. See Figure 15 for an overview of the many functions these sites perform. The number of known categories of functional sites has increased dramatically in the past few years, and it is obvious that there are more to be discovered. These sites are usually short and often reveal themselves in multiple sequence alignments as short patches of conservation, leading to their definition as linear motifs. In addition to occurring outside globular domains, some sites, e.g., phosphorylation sites, are often found in exposed flexible loops protruding from globular domains.
Fig. 15 Main classes of functional sites. Functional sites are as varied and numerous as domains are. On a proteome level we expect at least five sites per protein, resulting in about 150 000 instances in the human proteome. This indicates the presence of a gigantic and complex interaction and regulatory system.
4 Analyzing Nonglobular Protein Segments
Considering the abundance of targeting signals and post-translational modification sites, it is reasonable to assume that there are more functional sites than globular domains in a higher eukaryotic proteome. 4.2.1 Available Resources ELM is the largest collection of linear motifs, followed by Scansite and PROSITE [64, 80, 101]. Scansite is a very capable resource focusing on cell signaling. It complements ELM in using position-specific scoring matrices (PSSMs) for prediction, which are more sensitive than the regular expressions ELM uses. However, Scansite does not provide an annotated database similar to ELM. A series of individual predictors of functional sites can be found at http:// www.cbs.dtu.dk/services/ which is hosted by the Center for Biological Sequence Analysis in Denmark. The CBS focuses on providing high-performance neural network predictors but without any annotated knowledgebase interface, taking a complementary approach to the other resources. The PROSITE database has collected a number of linear protein motifs, representing them as regular expressions. PROSITE patterns have been very useful but suffered from severe overprediction; more recently the database has emphasized globular domain annotation at the expense of linear motifs. Also of interest are protein interaction databases such as BIND and DIP [9, 100]. More informative protein interaction databases that store known instances of linear motifs include MINT [102] and Phospho.ELM at http://phospho.elm.eu.org/. Databases of instances are not directly useful for prediction but provide valuable data-mining resources. It was recently demonstrated that short functional sites or protein features are crucial for the classification of protein function [41]. The Protfun method is an ab initio method for prediction of higher functional classes based on sequence features alone [42].
4.3 The Eukaryotic Linear Motif Resource: ELM
In this section we describe the ELM resource in detail, since it is the largest resource for linear motifs. The Eukaryotic Linear Motif server (http://elm.eu.org/ [71]) is a new bioinformatics resource for investigating candidate short functional motifs in eukaryotic proteins. Some of the concepts used within ELM are defined in Table 1. An example of the concepts used in practice can be seen in Figures 16 and 17. Linear motifs are short (usually < 10 amino acids) and therefore difficult to evaluate, since the usual significance assessments are inappropriate. Therefore, the ELM resource deploys logical context filters to eliminate false positives. The prediction strategy ELM uses is what we call knowledge-based decision support (KBDS). The basic idea is that, since we cannot discriminate ELMs based on sequence matching, we can use a knowledge base of contextual information regarding functional
25
26 Computational Analysis of Modular Protein Architectures Table 1 Definitions of concepts used in the ELM resource. Functional sites are, as opposed
to, e.g., active sites, short and linear in sequence and structure space. In the ELM resource we describe functional sites as linear motifs. Here, the linear motif is shown as a regular expression or pattern, but it could as well have been another type of sequence model, e.g., a hidden Markov model Concept
Definition
Example
A functional site
A set of short linear (sub)sequences that can be related to a molecular function The common pattern of a set of linear (sub)sequences that can be related to a molecular function An instance of an ELM in a particular polypeptide
LIG RBBD: Rb pocket interacting sequence
An ELM An ELM instance
[LI].C.[DE] RBB1 HUMAN: LVCHE
sites and ELMs to filter out false positives. This knowledge base is created/curated manually from the scientific literature. Currently, KBDS filters are in place for cell compartment, globular domain clash, and taxonomic range. In favorable instances, the filters can reduce the number of retained matches by an order of magnitude or more.
Fig. 16 LIG RBBD is a functional site responsible for interaction with retinoblastoma (Rb) family proteins. Rb proteins are known for repression of E2F proteins, which are required for transcription of proteins important in the cell cycle. The figure shows the SV40 large T antigen interacting with the Rb pocket (PDB: 1gh6).
4 Analyzing Nonglobular Protein Segments
Fig. 17 The location of the instance (LFCSE) of LIG RBBD in the SV40 large T antigen protein is C-terminal to the DnaJ globular domain.
4.3.1 ELM Annotation – ‘Site seeing’ All data input is by hand curation, performed by trained molecular biologists. Annotating each ELM is called ‘Site seeing’ and includes the processes shown in Figure 18. To promote interoperability with other bioinformatics resources, ELM uses three public annotation standards. Gene ontology (GO) identifiers are used for cell compartment, molecular function, and biological process [8, 39], and the NCBI taxonomy database identifiers [97] are used for taxonomic nodes at the apex of phylogenetic groupings in which an ELM occurs. Annotations of ELM instances are assigned ontology terms from the Proteomics Standards Initiative Molecular Interaction ontologies for evidence methods (HUPO.org). In the future the ELM resource will be able to report known instances of ELMs with details about what kind of experiments were performed to show the instance, with links to the relevant literature. The motif patterns are currently represented as POSIX regular expressions (usable in the Python and PERL languages), analogous to PROSITE patterns, but with a different syntax. For example, the FxDxF motif, which is responsible for the binding of accessory endocytic proteins to the alpha subunit of adaptor protein complex AP-2, has a consensus sequence of F-x-D-x-F and is written F. D. F. Linear motifs in ELM will in the future include motif descriptions according to the Seefeld convention nomenclature for linear motifs (see [1]). In the future, ELM might incorporate HMMs or other sensitive search methods; nevertheless, linear motifs will continue to overpredict and require alternative approaches for reducing the levels of false positives.
4.3.2 ELM Resource Architecture The core of the ELM resource is a relational database, powered by PostgreSQL, storing data about linear motifs. Figure 19 outlines how the ELM server is implemented. The user submits a protein sequence to the server and receives a list of matching ELMs that have been filtered to remove false positives (it may naturally include false negatives and residual false positives). Matched motifs are usually not statistically significant, and overprediction occurs despite filtering; hence matches should be considered to represent potential true instances of functional sites and should be used as guides to experimental determination.
27
28 Computational Analysis of Modular Protein Architectures
Fig. 18 The flow of the ‘siteseeing’ process typically involves extensive literature searches, BLAST runs, multiple alignment of relevant protein families, perusal of Swiss-Prot and other online databases, and, where practical, discussion with experimentalists from the field. The empty box symbolizes additional future strategies.
4.3.3 Knowledge-based Decision Support (KBDS): ELM Filtering Sequence-matching methods find many false – but apparently plausible – instances of ELMs that somehow are not recognized by their cognate binding/modification domains. There are two explanations for this: Ĺ One obvious reason why a sequence that matches a motif is not a true functional site is that the motif does not fully and accurately represent the functional site. This can partly be solved by deploying more sophisticated sequence models such as PSSMs or artificial neural networks, an approach used by Scansite and CBS. Ĺ Another reason is that the sequence matches (potential ELM instances) occur in an irrelevant context. They may match a sequence from a wrong cellular compartment or from a species that does not use this functional site. As we have seen, the structural context is also of great importance for linear motifs to be reachable so as to be functional.
4 Analyzing Nonglobular Protein Segments
Fig. 19 Flowchart of the ELM server. Dashed boxes indicate the four stages from input to results. As the server is further developed, more filters will be added to allow more query-dependent data to be retrievable.
It is possible to develop context filters that remove such false positives. In ELM we do most of this inside the knowledge database that the ‘siteseeing’ process is building. The ELM database is designed to accommodate these types of filters or KBDS modules. Currently, three filters are installed on the ELM server. These filters are not completely accurate and introduce false negatives occasionally, although we try to avoid this as much as possible. In general, the approach in ELM is to predict as few false positives as possible, but it is even more important to avoid false negatives. Cell Compartment Filter In ELM every linear motif is annotated with GO terms for the set of cell compartments in which it is known to function. For example, KDEL is a signal for retention of the host protein by the endoplasmic reticulum, whereas the SUMO site applies to proteins in the nucleus and the PML body. The user specifies the compartments in which the query protein functions, and all matches for ELMs not found in these compartments are filtered out. In the future ELM may support prediction of compartments using LOC3D [62].
29
30 Computational Analysis of Modular Protein Architectures
Globular Domain Filter: A Two-track Filtering Strategy Globular domains identified with the SMART and Pfam (domain subset) resources are used for filtering out ELMs. This filter has two tracks: Ĺ a domain filter, Ĺ an ELM rescue or reinstate module.
The domain filter works simply by removing all ELMs within the boundaries of the SMART/Pfam domains matching the same sequence, since they are false positives. The primitive assumption here is that sites within globular units are not accessible and therefore not functional, clearly an oversimplification. ELMs can occur inside certain domains, e.g., the internal tyrosine phosphorylation sites in the active loops of tyrosine kinase domains, as is described in Section 2.1. This later group of ELMs are to a certain extent being ‘rescued’ by the ELM rescue module, i.e., for some ELMs certain SMART/Pfam domains are simply not used for filtering in the domain filter. Given the limited accuracy of the domain filter, the unfiltered results are provided on the results front page. In many situations, users can investigate surface accessibility by examining an available 3D structure, by using a good-quality 2D structure prediction [20, 21, 68], or perhaps by using a homology modeling server such as SWISS-MODEL or the 3D-JURY metaserver [34, 77]. We are currently developing better domain filters, e.g., using surface accessibility from known structures to discriminate false from true positives. A good example of how this might work is shown in Figure 20.
Fig. 20 The V-1 Nef protein (magenta) in complex with wild-type Fyn SH3 (red) domain (PDB: 1avz) contains two potential SH3 ELMs. Residue numbers are given as well as the accessibility (acc) of the high-
lighted fragments. The yellow sequence is the only one that binds to the Fyn partner. The cyan putative ELM is covered by a loop and has a accessibility of 81% (compared to the 98% of the true binding domain).
4 Analyzing Nonglobular Protein Segments
Taxonomic Filtering Some types of functional sites are found in all eukaryotes, e.g., the ER retention signal KDEL is universal. But others are restricted to specific eukaryotic taxa. Perhaps most strikingly, the large receptor tyrosine kinase multigene family is found only in metazoa. Each ELM is annotated with one or more NCBI taxonomy nodes to indicate its known phylogenetic distribution. The user provides the query species, and all ELMs that are not assigned to its lineage are filtered out. 4.3.4 Using ELM The public ELM webserver allows you to retrieve filtered as well as unfiltered raw results. This approach should encourage you to think critically about ELM server results. Figure 21 shows the ELM server output using the human Src sequence as a query. This example indicates the potential of the KBDS approach for improving motif searches. A pipeline interface to ELM prediction for use in proteome analysis is currently being developed and implemented; this pipeline and the results will be made available as soon as possible. The predictive power of the ELM resource can be enhanced by harnessing it to other data, including experimental results. For example, many protein kinase recognition sites are among those which severely overpredict. If a protein is known
Fig. 21 Sample output from the ELM server. The query sequence was Src (SwissProt: SRC HUMAN). The surviving ELMs are shown in blue, and the motifs that have been filtered out are shown in grey. This figure illustrates only how the globular domain (green) filter works: of the 103 ELMs in the resource at the time of writing, 27 match the sequence, but two are removed by the species filter, seven by the compartment fil-
ter, and five by the SMART/Pfam domain filter. Of the remaining 14 ELMs that survive the filtering, six are known to be true, and two are false negatives, i.e., not predicted by ELM (the C-terminal SH2 ligand and the autophosphorylation site within the tyrosine kinase domain; compare with Figure 1). The functionality of the ELM rescue module is not shown in this figure.
31
32 Computational Analysis of Modular Protein Architectures Table 2 Some resources referred to in this chapter. For more information please see the
individual websites Resource
Function classes
htpp://
SMART Pfam Interpro CDD
globular modular domains globular modular domains Globular domain meta server globular modular domains
PROSITE
Domain signatures and a few linear motifs Secondary structure prediction
smart.embl.de www.sanger.ac.uk/Software/Pfam/ www.ebi.ac.uk/interpro/ web.ncbi.nlm.nih.gov/Structure/odd/ odd.shtml www.expasy.ch/sprot/prosite.html
PredictProtein ELM PyMOL VMD Scansite
Functional sites, ELMs, linear motifs Very nice and easy to use molecule viewer and renderer Feature rich molecule viewer
www.predictprotein.org elm.eu.org pymol.sourceforge.net www.ks.uiuc.edu/Research/vmd/
NetNglyc
Phosphorylation and signaling motifs Enzyme categories and higher functional classes N-glycosylation motifs
www.cbs.dtu.dk/services/NetNGlyc
PredictNLS
Nuclear localization signals
cubic.bioc.columbia.edu/predictNLS
SignalP
www.cbs.dtu.dk/services/SignalP
PSORT
Cleavage sites & signal/non-signal peptide prediction Protein sorting signals
Sulfinator
Tyrosine sulfation motifs
us.expasy.org/tools/sulfinator
GlobPlot
Protein disorder and globularity
globplot.embl.de
DisEMBL
Protein disorder
dis.embl.de
GO
biological function, component and process Genome browsing
www.geneontology.org
Protfun
Ensemble
scansite.mit.edu www.cbs.dtu.dk/services/ProtFun
psort.nibb.ac.jp
www.ensembl.org
Phospho.ELM
Instances of Ser/Thr/Tyr phosphorylation
phospho.elm.eu.org
Perl
Script oriented language widely used in bioinformatics Highly object oriented language designed for large projects
www.perl.com
Hidden Markov Model software suite Bioinformatics modules for perl
hmmer.wustl.edu
Python HMMER Biopython Bioperl
Bioinformatics modules for Python
www.python.org
www.biopython.org www.bioperl.org
Acknowledgements
not to be phosphorylated, kinase sites can all be ignored; but if it is known to be phosphorylated, then the kinase-site matches can be targeted for experimental testing. Mass spectrometry can be a useful tool for revealing post-translational modifications. ELM can provide synergism with appropriate experiments and can help in mapping out a research program. In this way, the ELM resource should become increasingly useful to the research community
5 URLs
In Table 2 we have listed some URLs we thought might be useful for you to explore. Many more links can be found in the annual database (http://nar. oupjournals.org/content/vol31/issue1/) and webserver (http://nar.oupjournals. org/content/vol31/issue13/) open access issues of Nucleic Acids Research.
6 Conclusions
We hope that you now have a pretty clear idea of how to approach the analysis of your favorite modular protein. In this chapter we have not discussed all available resources for analysis of proteins, we apologize to the authors of these resources. The mapping of globular domains should be considered mature – methods such as Pfam and SMART are highly reliable for determining potential domains in a sequence. The prediction of functional sites is a much younger field, although advancements have been made with nonsequence approaches such as the KBDS system in ELM. The paradigm behind this chapter and the modular model of protein function is that sequence determines structure, which again determines function. This is clearly true in many instances; however, like any dogma, it is ultimately wrong and misleading. Our view of protein function is still very primitive. We expect the modular model to be enveloped by a more holistic model. It does indeed seem as if nature is presenting molecular functions in two modes: structured domains that are folded and in which the fold/structure determines the function of the domain/protein, and an unstructured mode like the one we see for ELMs. These are modular units which seem to behave like autonomous bit/information strings carried within the host protein to accommodate certain functions or tuning of the host structure – they are themselves unstructured and only their sequence determines their function.
Acknowledgements
This work was partly supported by EU grant QLRI-CT-2000-00127. Thanks to Kresten Lindorff-Larsen, Sophie Chabanis-Davidson, Sara Quirk, and Francesca
33
34 Computational Analysis of Modular Protein Architectures
Diella for commenting on this chapter. Thanks to Pal ˚ Puntervoll and Manuela Helmer Citterich for figures. Finally, we are deeply grateful to FreeBSD.org, (bio)Python.org, PostgreSQL.org, Debian.org, Gentoo.org, and Apache.org for fantastic open free software.
References 1 AASLAND, R., et al., Normalization of nomenclature for peptide motifs as ligands of modular protein domains. FEBS Lett. 2002, 513, 141–144. 2 ALOY, P., RUSSELL, R., The third dimension for protein interactions and complexes. Trends Biochem. Sci. 2002, 27, 633–638. 3 ALTSCHUL, S., MADDEN, T., SCHAFFER, A., ZHANG, J., ZHANG, Z., MILLER, W., LIPMAN, D., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402. 4 ANDRADE, M., Bioinformatics and Genomes Current Perspectives. Horizon Scientific Press 2003. 5 ANDRADE, M., PONTING, C., GIBSON, T., BORK, P., Homology-based method for identification of protein repeats using statistical significance estimates. J. Mol. Biol. 2000, 298, 521–537. 6 APIC, G., GOUGH, J., TEICHMANN, S., An insight into domain combinations. Bioinformatics 2001, 17 Suppl 1, S83–89. 7 ARITOMI, M., KUNISHIMA, N., INOHARA, N., ISHIBASHI, Y., OHTA, S., MORIKAWA, K., Crystal structure of rat Bcl-xL: implications for the function of the Bcl-2 protein family. J. Biol. Chem. 1997, 272, 27886–27892. 8 ASHBURNER, M., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000, 25, 25–29. 9 BADER, G., BETEL, D., HOGUE, C., BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 2003, 31, 248–250. 10 BAIROCH, A., APWEILER, R., The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28, 45–48.
11 BATEMAN, A., et al., The Pfam protein families database. Nucleic Acids Res. 2002, 30, 276–280. 12 BATES, G., Huntingtin aggregation and toxicity in Huntington’s disease. Lancet 2003, 361, 1642–1644. 13 BETTS, M., GUIGO, R., AGARWAL, P., RUSSELL, R., Exon structure conservation despite low sequence similarity: a relic of dramatic events in evolution? EMBO J. 2001, 20, 5354–5360. 14 BRENNER, S., Target selection for structural genomics. Nat. Struct. Biol. 2000, 7 Suppl, 967–969. 15 BROOKS, B., KARPLUS, M., Normal modes for specific motions of macromolecules: application to the hinge-bending mode of lysozyme. Proc. Natl. Acad. Sci. USA 1985, 82, 4995–4999. 16 CHANDONIA, J., WALKER, N., LO CONTE, L., KOEHL, P., LEVITT, M., BRENNER, S., ASTRAL compendium enhancements. Nucleic Acids Res. 2002, 30, 260–263. 17 CLAMP, M., et al., Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res. 2003, 31, 38–42. 18 COPLEY, R., DOERKS, T., LETUNIC, I., BORK, P., Protein domain analysis in the era of complete genomes. FEBS Lett. 2002, 513, 129–134. 19 CREIGHTON, T., Proteins Structures and Molecular Properties, 2nd edit. Freeman, New York 1993. 20 CUFF, J., BARTON, G., Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 2000, 40, 502–511. 21 CUFF, J., CLAMP, M., SIDDIQUI, A., FINLAY, M., BARTON, G., JPred: a consensus secondary structure prediction server. Bioinformatics 1998, 14, 892–893. 22 DEDMON, M., PATEL, C., YOUNG, G., PIELAK, G., FlgM gains structure in living
References
23
24
25
26
27
28
29
30
31
32
33
34
cells. Proc. Natl. Acad. Sci. USA 2002, 99, 12681–12684. DEMAREST, S., MARTINEZ-YAMOUT, M., CHUNG, J., CHEN, H., XU, W., DYSON, H., EVANS, R., WRIGHT, P., Mutual synergistic folding in recruitment of CBP/p300 by p160 nuclear receptor coactivators. Nature 2002, 415, 549–553. DOBSON, C., Protein misfolding and human disease. Scientific World Journal 2002, 2 (1 Suppl 2), 132. DOERKS, T., BAIROCH, A., BORK, P., Protein annotation: detective work for function prediction. Trends Genet 1998, 14, 248–250. DOERKS, T., COPLEY, R., SCHULTZ, J., PONTING, C., BORK, P., Systematic identification of novel protein domain families associated with nuclear functions. Genome Res. 2002, 12, 47–56. DUNKER, A., BROWN, C., LAWSON, J., IAKOUCHEVA, L., OBRADOVIC, Z., Intrinsic disorder and protein function. Biochemistry 2002, 41, 6573–6582. DUNKER, A., et al., Protein disorder and the evolution of molecular recognition: theory, predictions and observations. Pac. Symp. Biocomput. 1998, 473–484. DUNKER, A., et al., Intrinsically disordered protein. J. Mol. Graph. Model 2001, 19, 26–59. DYSON, H., WRIGHT, P., Equilibrium NMR studies of unfolded and partially folded proteins. Nat. Struct. Biol. 1998, 5 Suppl, 499–503. EVANS, P., OWEN, D., Endocytosis and vesicle trafficking. Curr. Opin. Struct. Biol. 2002, 12, 814–821. GARNER, E., CANNON, P., ROMERO, P., OBRADOVIC, Z., DUNKER, A., Predicting disordered regions from amino acid sequence: common themes despite differing structural characterization. Genome Inform Ser Workshop Genome Inform 1998, 9, 201–213. GARNER, E., ROMERO, P., DUNKER, A., BROWN, C., OBRADOVIC, Z., Predicting binding regions within disordered proteins. Genome Inform. Ser. Workshop Genome Inform. 1999, 10, 41–50. GINALSKI, K., ELOFSSON, A., FISCHER, D., RYCHLEWSKI, L., 3D-Jury: a simple
35
36
37
38
39
40
41
42
43
44
45
approach to improve protein structure predictions. Bioinformatics 2003, 19, 1015–1018. GOUGH, J., The SUPERFAMILY database in structural genomics. Acta Crystallogr. D Biol. Crystallogr. 2002, 58, 1897–1900. GUNASEKARAN, K., TSAI, C., KUMAR, S., ZANUY, D., NUSSINOV, R., Extended disordered proteins: targeting function with less scaffold. Trends Biochem. Sci. 2003, 28, 81–85. HARRISON, A., PEARL, F., SILLITOE, I., SLIDEL, T., MOTT, R., THORNTON, J., ORENGO, C., Recognizing the fold of a protein structure. Bioinformatics 2003, 19, 1748–1759. HENIKOFF, J., GREENE, E., PIETROKOVSKI, S., HENIKOFF, S., Increased coverage of protein families with the blocks database servers. Nucleic Acids Res. 2000, 28, 228–230. HILL, D., BLAKE, J., RICHARDSON, J., RINGWALD, M., Extension and integration of the gene ontology (GO): combining GO vocabularies with external vocabularies. Genome Res. 2002, 12, 1982–1991. HILL, E., BROADBENT, I., CHOTHIA, C., PETTITT, J., Cadherin superfamily proteins in Caenorhabditis elegans and Drosophila melanogaster. J. Mol. Biol. 2001, 305, 1011–1024. JENSEN, L., USSERY, D., BRUNAK, S., Functionality of system components: conservation of protein function in protein feature space. Genome Res. 2003, 13, 2444–2449. JENSEN, L. J., et al., Prediction of human protein function from post-translational modifications and localization features. J. Mol. Biol. 2002, 319, 1257–1265. JONASSEN, I., EIDHAMMER, I., TAYLOR, W., Discovery of local packing motifs in protein structures. Proteins 1999, 34, 206–219. KABSCH, W., SANDER, C., Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22, 2577–2637. KAPLAN, B., RATNER, V., HAAS, E., Alpha-synuclein: its biological function
35
36 Computational Analysis of Modular Protein Architectures
46
47
48
49
50
51
52
53
54
55
56
57
and role in neurodegenerative diseases. J. Mol. Neurosci. 2003, 20, 83–92. KLEIN-SEETHARAMAN, J., et al., Long-range interactions within a nonnative protein. Science 2002, 295, 1719–1722. KUSSIE, P., GORINA, S., MARECHAL, V., ELENBAAS, B., MOREAU, J., LEVINE, A., PAVLETICH, N., Structure of the MDM2 oncoprotein bound to the p53 tumor suppressor transactivation domain. Science 1996, 274, 948–953. LANDER, E., et al., Initial sequencing and analysis of the human genome. Nature 2001, 409, 860–921. LETUNIC, I., et al., Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res. 2002, 30, 242–244. LI, X., OBRADOVIC, Z., BROWN, C., GARNER, E., DUNKER, A., Comparing predictors of disordered protein. Genome Inform Ser Workshop Genome Inform 2000, 11, 172–184. LINDING, R., JENSEN, L., DIELLA, F., BORK, P., GIBSON, T., RUSSELL, R., Protein disorder prediction: implications for structural proteomics. Structure 2003, 11, 1453–1459. LINDING, R., RUSSELL, R., NEDUVA, V., GIBSON, T., GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res. 2003, 31, 3701–3708. LIU, J., TAN, H., ROST, B., Loopy proteins appear conserved in evolution. J. Mol. Biol. 2002, 322, 53–64. LO CONTE, L., BRENNER, S., HUBBARD, T., CHOTHIA, C., MURZIN, A., SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res. 2002, 30, 264–267. LOPEZ DE LA PAZ, M., GOLDIE, K., ZURDO, J., LACROIX, E., DOBSON, C., HOENGER, A., SERRANO, L., De novo designed peptide-based amyloid fibrils. Proc. Natl. Acad. Sci. USA 2002, 99, 16052–16057. LOPEZ GARCIA, F., ZAHN, R., RIEK, R., WUTHRICH, K., NMR structure of the bovine prion protein. Proc. Natl. Acad. Sci. USA 2000, 97, 8334–8339. MARCHLER-BAUER, A., et al., CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 2003, 31, 383–387.
58 MARSDEN, R., MCGUFFIN, L., JONES, D., Rapid protein domain assignment from amino acid sequence using predicted secondary structure. Protein Sci. 2002, 11, 2814–2824. 59 MELAMUD, E., MOULT, J., Evaluation of disorder predictions in CASP5. Proteins 2003, 53 Suppl 6, 561–565. 60 MOTT, R., Accurate formula for P-values of gapped local sequence and profile alignments. J. Mol. Biol. 2000, 300, 649–659. 61 MULDER, N., et al., The InterPro Database, 2003, brings increased coverage and new features. Nucleic Acids Res. 2003, 31, 315–318. 62 NAIR, R., ROST, B., LOC3D: annotate sub-cellular localization for protein structures. Nucleic Acids Res. 2003, 31, 3337–3340. 63 NILSSON, M., DOBSON, C., In vitro characterization of lactoferrin aggregation and amyloid formation. Biochemistry 2003, 42, 375–382. 64 OBENAUER, J., CANTLEY, L., YAFFE, M., Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res. 2003, 31, 3635–3641. 65 ORENGO, C., PEARL, F., THORNTON, J., The CATH domain structure database. Methods Biochem Anal 2003, 44, 249–271. 66 PAPPU, R., SRINIVASAN, R., ROSE, G., The Flory isolated-pair hypothesis is not valid for polypeptide chains: implications for protein folding. Proc. Natl. Acad. Sci. USA 2000, 97, 12565–12570. 67 PEI, J., GRISHIN, N., AL2CO: calculation of positional conservation in a protein sequence alignment. Bioinformatics 2001, 17, 700–712. 68 POLLASTRI, G., PRZYBYLSKI, D., ROST, B., BALDI, P., Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins 2002, 47, 228–235. 69 PROMPONAS, V., ENRIGHT, A., TSOKA, S., KREIL, D., LEROY, C., HAMODRAKAS, S., SANDER, C., OUZOUNIS, C., CAST: an iterative algorithm for the complexity analysis of sequence tracts: complexity
References
70
71
72
73
74
75
76
77
78
79
80
analysis of sequence tracts. Bioinformatics 2000, 16, 915–922. PRUITT, K., MAGLOTT, D., RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 2001, 29, 137–140. PUNTERVOLL, P., et al., ELM server: a new resource for investigating short functional sites in modular eukaryotic proteins. Nucleic Acids Res. 2003, 31, 3625–3630. RADHAKRISHNAN, I., PEREZ-ALVARADO, G., PARKER, D., DYSON, H., MONTMINY, M., WRIGHT, P., Solution structure of the KIX domain of CBP bound to the transactivation domain of CREB: a model for activator:coactivator interactions. Cell 1997, 91, 741–752. ROMERO, P., OBRADOVIC, Z., KISSINGER, C. R., VILLAFRANCA, J., DUNKER, A., Identifying disordered proteins from amino acid sequences. Proc. IEEE Int. Conf. Neural Networks 1997, 1, 90–95. RUSSELL, R., PONTING, C., Protein fold irregularities that hinder sequence analysis. Curr. Opin. Struct. Biol. 1998, 8, 364–371. SAQI, M., STERNBERG, M., Identification of sequence motifs from a set of proteins with related function. Protein Eng. 1994, 7, 165–171. SCHULTZ, J., MILPETZ, F., BORK, P., PONTING, C., SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl. Acad. Sci. USA 1998, 95, 5857–5864. SCHWEDE, T., KOPP, J., GUEX, N., PEITSCH, M., SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res. 2003, 31, 3381–3385. SCHWEERS, O., SCHONBRUNN-HANEBECK, E., MARX, A., MANDELKOW, E., Structural studies of tau protein and Alzheimer paired helical filaments show no evidence for beta structure. J. Biol. Chem. 1994, 269, 24290–24297. SERVANT, F., BRU, C., CARRERE, S., COURCELLE, E., GOUZY, J., PEYRUC, D., KAHN, D., ProDom: automated clustering of homologous domains. Brief Bioinform 2002, 3, 246–251. SIGRIST, C., CERUTTI, L., HULO, N., GATTIKER, A., FALQUET, L., PAGNI, M.,
81
82
83
84
85
86
87
88 89
90
91
92
BAIROCH, A., BUCHER, P., PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform. 2002, 3, 265–274. SMITH, D., RADIVOJAC, P., OBRADOVIC, Z., DUNKER, A., ZHU, G., Improved amino acid flexibility parameters. Protein Sci. 2003, 12, 1060–1072. SMYTH, E., SYME, C., BLANCH, E., HECHT, L., VASAK, M., BARRON, L., Solution structure of native proteins with irregular folds from Raman optical activity. Biopolymers 2001, 58, 138–151. STEFANI, M., DOBSON, C., Protein aggregation and aggregate toxicity: new insights into protein folding, misfolding diseases and biological evolution. J. Mol. Med. 2003, 81, 678–699. SUYAMA, M., OHARA, O., DomCut: prediction of inter-domain linker regions in amino acid sequences. Bioinformatics 2003, 19, 673–674. TAYLOR, W., Protein structural domain identification. Protein Eng. 1999, 12, 203–216. TAYLOR, W., A deeply knotted protein structure and how it might fold. Nature 2000, 406, 916–919. TAYLOR, W., Defining linear segments in protein structure. J. Mol. Biol. 2001, 310, 1135–1150. TAYLOR, W., LIN, K., Protein knots: a tangled problem. Nature 2003, 421, 25. TEILUM, K., KRAGELUND, B., POULSEN, F., Transient structure formation in unfolded acylcoenzyme A–binding protein observed by site-directed spin labeling. J. Mol. Biol. 2002, 324, 349–357. TOMPA, P., Intrinsically unstructured proteins. Trends Biochem. Sci. 2002, 27, 527–533. UVERSKY, V., Natively unfolded proteins: a point where biology waits for physics. Protein Sci. 2002, 11, 739–756. VERKHIVKER, G., BOUZIDA, D., GEHLHAAR, D., REJTO, P., FREER, S., ROSE, P., Simulating disorder–order transitions in molecular recognition of unstructured proteins: where folding meets binding. Proc. Natl. Acad. Sci. USA 2003, 100, 5148–5153.
37
38 Computational Analysis of Modular Protein Architectures
93 VIHINEN, M., TORKKILA, E., RIIKONEN, P., Accuracy of protein flexibility predictions. Proteins 1994, 19, 141–149. 94 VON MERING, C., HUYNEN, M., JAEGGI, D., SCHMIDT, S., BORK, P., SNEL, B., STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 2003, 31, 258–261. 95 WATERSTON, R., et al., Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420, 520–562. 96 WESTBROOK, J., FENG, Z., CHEN, L., YANG, H., BERMAN, H., The Protein Data Bank and structural genomics. Nucleic Acids Res. 2003, 31, 489–491. 97 WHEELER, D., et al., Database resources of the National Center for Biotechnology Information: 2002 update. Nucleic Acids Res. 2002, 30, 13–16. 98 WOOTTON, J., Nonglobular domains in protein sequences: automated segmentation using complexity measures. Comput Chem 1994, 18, 269–285. 99 WRIGHT, P., DYSON, H., Intrinsically unstructured proteins: reassessing the protein structure–function paradigm. J. Mol. Biol. 1999, 293, 321–331.
100 XENARIOS, I., SALWINSKI, L., DUAN, X., HIGNEY, P., KIM, S., EISENBERG, D., DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res. 2002, 30, 303–305. 101 YAFFE, M., LEPARC, G., LAI, J., OBATA, T., VOLINIA, S., CANTLEY, L., A motif-based profile scanning approach for genome-wide prediction of signaling pathways. Nat. Biotechnol. 2001, 19, 348–353. 102 ZANZONI, A., MONTECCHI-PALAZZI, L., QUONDAM, M., AUSIELLO, G., HELMER-CITTERICH, M., CESARENI, G., MINT: a molecular interaction database. FEBS Lett. 2002, 513, 135–140. 103 ZDOBNOV, E., et al., Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science 2002, 298, 149–159. 104 ZOETE, V., MICHIELIN, O., KARPLUS, M., Relation between sequence and structure of HIV-1 protease inhibitor complexes: a model system for the analysis of protein flexibility. J. Mol. Biol. 2002, 315, 21–52.
1
Mass Spectrometry for the Detection and Identification of Proteins Simone K¨onig, Jens Grote, Martin Zeller, Manfred Fobker, Michael Erren and Stefan Lorkowski University of M¨unster, M¨unster, Germany
J¨org Haberland and Paul Cullen Ogham GmbH, M¨unster, Germany
Erk Gedig XanTec bioanalytics GmbH, M¨unster, Germany
Stephen W. Pursley DakoCytomation, Inc., Fort Collins, USA
John E. Pearl The Trudeau Institute, New York, USA
Originally published in: Analysing Gene Expression. Edited by Stefan Lorkowski, Paul Cullen. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30488-2
1 Introduction
Mass spectrometry (MS) is one of the most important tools in proteomic analyses today. It has moved out of the physicist’s basement into the biological laboratory. Companies and technology platforms offering protein mass spectrometry continue to be founded and funded. In this section, the reader will be introduced to this technique stressing protein and peptide analysis, although mass spectrometry is of equal importance for the study of other biomolecules such as DNA or low-molecular weight metabolites. The focus of the following sections will be on what is understood as proteomic mass spectrometry: the analysis and identification of proteins separated by gel electrophoresis, although the application of mass spectrometry in protein analysis is much wider and includes the measurement of intact proteins and their complexes. A distinction should be drawn between the huge demand for proteomic analyses of known proteins, which has lead to the commercialisation of the method, and special applications from biomedical research such as de novo seProtein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Mass Spectrometry for the Detection and Identification of Proteins Table 1 History of mass spectrometry of biomolecules. For more detail, see the 50 year com-
memorative book review of the American Society for Mass Spectrometry from 2002 and the internet site ‘A history of mass spectrometry’ at http://masspec.scripps.edu/hist.html Year
Technical acquirement
1960s 1965
Analysis of volatile compounds only. Analysis of permethylated peptide. At 1.4 kilodalton, largest natural compound analysed by mass spectrometry. Mass spectrometry of large underivatised peptides becomes possible. First complete protein spectrum obtained (insulin). Analysis of snake toxins of 13 kilodaltons. Analysis of proteins above 25 kilodaltons not possible. Sensitivity in low to mid-picomolar range. Analysis of almost any biomolecule possible.
1974 1982 1983 After 1988
quencing, the analysis of post-translational modifications, or cross-linking studies, where mass spectrometry is also of enormous value. The latter methods will be described only briefly and the reader is referred to the literature for further reading. Since the 1960s, when derivatisation was still necessary in order to measure peptides with mass spectrometry, solutions were found to the problem of ionising huge molecules such as proteins and detecting them using mass spectrometry. Increasingly, mass spectrometry instrumentation has been adapted to demands of proteomic analyses and spectrometers have become benchtop size and much more user-friendly (Table 1).
2 How does mass spectrometry work?
Mass spectrometers measure molecular weight. To do this, molecules are first ionised and then separated according to their mass-to-charge ratio (m/z). Each atom has a characteristic mass and the symbol u (mass unit or unified atomic mass unit) represents by convention one twelfth of the mass of the most abundant naturally occurring stable isotope of carbon. The mass of a molecule is the sum of the atomic masses (in Dalton) of all the atoms of the elements composing it. The unit Dalton is widely used in biochemistry and is also accepted in the field of mass spectrometry [16]. 2.1 Ionisation of molecules
Ionisation is the first step in mass spectrometric analysis of molecules. Although several other ionisation techniques are still in use, matrix-assisted laserdesorption/ionisation (MALDI) and electrospray ionisation (ESI) are the main techniques employed in proteomics. Other ionisation techniques are atmospheric pressure chemical ionisation, electron impact, chemical ionisation, fast-atom bombardment, and thermospray.
2 How does mass spectrometry work?
In MALDI, the sample is spotted together with a matrix onto a metal target and dried. The sample holder is then loaded into the mass spectrometer. Ionisation is achieved by directing a pulsed laser beam onto the sample. Most ions detected carry a single charge (see Figure 1). The matrix is a low-molecular weight compound, that can absorb some of the energy from the laser light (for instance, ultraviolet light). This prevents decomposition of the sample and also facilitates vaporisation and ionisation of the analyte molecules (Figure 2). In electrospray ionisation, a liquid sample is exposed to a strong electric field. Ionisation is achieved for instance by applying a high voltage of up to six kilovolts directly to the solution via a metal wire placed inside a capillary. Highly charged droplets are formed, which are desolvated by heat or gas. In contrast to MALDI, ESI creates more ions which bear a multiple charge. Intact protein can take up as many as 50 protons so that their spectra show a distribution of charged state (Figure 3). 2.2 Detection of ions
Ions are separated in electric and magnetic fields in a high vacuum. Modern mass spectrometers are based on a variety of technical designs and show differences in mass accuracy or resolution. The most important features for proteomics are to resolve the isotopes of peptides and to detect them with an error as small as possible (routinely better than 50 parts per million). Ion separation can be described by the following fundamental equation E kin = z e V = 1/2m v 2 where E kin is the kinetic energy of the ions, V is the accelerating voltage, z is the number of elementary charges e (1.60217733 × 10−19 Coulomb), m is the molecular mass of the ions, and v is the velocity of the ions. This means that ions in electric and magnetic fields in a high vacuum are separated according to their mass-to-charge ration. Technical solutions for ion detection mainly used in proteomics are ion traps, time-of-flight (TOF), and quadrupoles, whereas detection in other analytical fields is mainly performed by quadrupole systems, double-focusing sector, and Fourier transformion cyclotron resonance (FT-ICR). 2.3 Appearance of spectra
Modern instruments can routinely determine monoisotopic peptide masses at high accuracy. This is necessary to search databases effectively. For intact proteins, usually their average mass is measured, although instruments with very high resolving power such as Fourier transformion cyclotron resonance instruments have the capability to resolve individual peaks (Figures 4 and 5).
3
4 Mass Spectrometry for the Detection and Identification of Proteins Table 2 Atomic masses and abundances for biologically relevant isotopes [25]
Isotope
Atomic weight
Natural abundance [%]
12 C
12.0107 13.003354826 1.007825035 2.014101779 14.003074002 15.00010897 15.99491463 16.9991312 17.9991603 30.9737620 31.97207070 32.97145854 33.96786665 35.96708062
98.93 1.07 99.9885 0.0115 99.632 0.368 99.757 0.038 0.205 100 94.93 0.76 4.29 0.02
13 C 1H 2H 14 N 15 N 16 O 17 O 18 O 31 P 32 S 33 S 34 S 36 S
Most elements found in biological compounds (12 C, 1 H, 14 N, 16 O) have lowabundant isotopes (13 C, 2 H, 15 N, 17 O, 18 O). It is important to realise that for masses larger than two kilodaltons the 12 C isotope is no longer the most abundant peak in an isotopic distribution of a biomolecule. Isotopes of other elements alter the isotopic distribution significantly (i.e. bromine and chlorine; see Table 2). 2.4 Modes of operation
Mass spectrometers can be operated in many ways in order to solve a specific problem. On one hand, the molecular weight of the analyte molecules can be determined. In this case, every compound present in the sample should in principle generate a separate peak. This is not possible, however, because factors such as ion suppression, adduct formation, and instrument settings restrict mass spectrometric output. For this reason, sample preparation and purification are very important and sample preparation has often to be adapted to the needs of mass spectrometry. An important feature of some types of mass spectrometers is their capability to select and to fragment certain ions by collision in the gas phase. In this way, peptides can be sequenced. The product ions generate characteristic spectra, which can be searched against databases (Figures 6 and 7, Table 3). Further options available in triple-quadrupole spectrometers are the neutral loss scan and the parent ion scan which are of importance for monitoring specific ions. This is important, for example, in phosphorylation analysis (Figure 8).
3
3 Proteomic mass spectrometry Table 3 Amino acid residues and their mass
Amino acid
Abbreviation
Symbol
Monoisotopic mass [g/mol]
Average mass [g/mol]
Glycine Alanine Serine Proline Valine Threonine Cysteine Isoleucine Leucine Asparagine Aspartic acid Glutamine Lysine Glutamic acid Methionine Histidine Phenylalanine Arginine Tyrosine Tryptophan
Gly Ala Ser Pro Val Thr Cys Ile Leu Asn Asp Gln Lys Glu Met His Phe Arg Tyr Trp
G A S P V T C I L N D Q K E M H F R Y W
57.021 71.037 87.032 97.053 99.068 101.048 103.009 113.084 113.084 114.043 115.027 128.059 128.095 129.043 131.040 137.059 147.068 156.101 163.063 186.079
57.052 71.079 87.078 97.117 99.133 101.105 103.145 113.160 113.160 114.104 115.089 128.131 128.174 129.116 131.199 137.142 147.177 156.188 163.176 186.213
Proteomic mass spectrometry
The immediate goal in proteomics is the automated analysis and comparison of large sets of proteins. It is of advantage to digest the separated proteins of interest enzymatically, because (i) resulting peptides can be easily extracted and separated, (ii) a characteristic set of peptides is generated for each protein, and (iii) mass spectrometric determination of peptide masses (termed peptide mapping or peptide mass fingerprint) and sequence analysis allows protein identification. The protease used predominantly is trypsin, but other cleavage methods may be more appropriate for specific analysis problems (for an overview for digestion methods, see Table 4). Proteins may be enzymatically digested directly in the gel spot after SDS-based polyacrylamide gel electrophoresis. The resulting peptide is extracted after detection by silver or Coomassie staining that is generally sufficient for protein identification (Figure 9). Silver staining of gels allows the detection of one to ten nanograms of protein, whereas Coomassie staining is less sensitive and allows the detection of 50 to 100 nanograms. Since it is possible to identify less than one picomole of a single protein using mass spectrometry, any bands detected by silver or Coomassie staining can be identified. Proteins may also be digested after electroblotting using a similar procedure. The peptides are purified using solid-phase extraction and subjected to mass spectrometric analysis. In general, a MALDI peptide map is measured first. The
5
6 Mass Spectrometry for the Detection and Identification of Proteins Table 4 Proteolytic agents
Proteolytic agents
Cleavage site
Chemical
BNPS skatol Cyanogen bromide
Tryp. . .X Met. . .X
Enzymatic
Trypsin Chymotrypsin Endoproteinases • Lys-C • Glu-C • Asp-N Endopeptidase • Asp-C Carboxypeptidase B
Lys/Arg. . .X; X=Pro Phe/Tyr/Trp. . .X and other hydrophobic residues Lys. . .X; X=Pro Glu. . .X X. . .Asp Asp. . .x Carboxy-terminal residues, in particular Arg/Lys
list of monoisotopic peptide masses is used for a database search that will often result in the confident identification of the protein with high confidence. Because of the statistical basis of the search, the number of false-positive results can be quite high. Several factors may hinder protein identification: (i) modifications of the protein, (ii) multiple proteins in a single spot, (iii) unusual contaminants, and (iv) difficulties in accessing database information due to properties of search algorithms or database entries. It is therefore advisable to perform fragmentation analysis using electrospray ionisation tandem mass spectrometry (ESI-MS/MS). Sequence information gained for only one peptide may allow protein identification. Detailed information on mass spectrometric analyses of proteins separated by polyacrylamide gel electrophoresis can be found in specialist literature: in-gel digest with spot excision is described by Rosenfeld et al. [24] and Shevchenko et al. [26], and digest of electro-blotted proteins with spot excision is referenced by Fernandez et al. [8] and Stults et al. [27]. The combination of in-gel and in-membrane digest without spot excision is described in detail by Binz et al. [5]. Two-dimensional polyacrylamide gel electrophoresis is of limited use in the separation of very acidic or basic proteins, very large or small proteins, or membrane proteins. In such cases, the combination of high-performance liquid chromatography and tandem mass spectrometry (HPLC-MS/MS) has been used. Using this approach, complex protein mixtures are digested and then automatically separated and fragmented by a high-performance liquid chromatography system that is coupled to a tandem mass spectrometer. Thus, a sequence of different chromatographic separation methods can be combined (e.g., two-dimensional liquid chromatography, 2D-LC). Detailed information on mass spectrometric analyses of proteins separated by high-performance liquid chromatography can be found in Link et al. [15], Peng & Gygi [21] and Washburn et al. [28]. Peptides are often separated with high-performance liquid chromatography using reversed phase material (silica particles covered with chemically-bonded hydrocarbon chains of varying length from four, C-4 column packing material, to eighteen carbons, C-18 column packing material). In HPLC-MS/MS experiments, both the ultraviolet absorption and the mass spectrometric signal (total ion current)
3 Proteomic mass spectrometry
are used for peptide detection. If their values reach preset values, they trigger the start of an tandem mass spectrometric experiment. In this way, more than 100 proteins have been identified in a single run (Figure 10). Peak lists generated with peptide mapping or tandem mass spectrometric fragmentation are the experimental data used to screen databases. Special search engines have been developed for this purpose that access the large public protein and genomic databases. The search programs differ in their algorithms and constraints. With this approach only known proteins can be found. Tandem mass spectrometric spectra of peptides from unknown proteins serve to find sequence tags, which can then help access the protein via classical cloning approaches (Figure 11). 3.1 Data analysis tools for mass spectroscopic experiments
Several software tools for mass spectrometric analyses are available via the internet. They allow to search databases with peptide mass fingerprinting or sequencing data. Short stretches of sequence (tags) can also be used as search input. Examples are: Protein Prospector This tool was developed by the San Francisco Mass Spectrometry Facility (University of California, California, USA) and which is publicly available at http://prospector.ucsf.edu/ ([6, 10]). Mascot This search engine is offered by Matrix Science Ltd. (London, United Kingdom) and is freely accessible at http://www.matrixscience.com ([10, 20]). PROWL This tool is being developed in a collaborative fashion by ProteoMetrics, LLC, and the Rockefeller University (New York, New York, USA) and is provided at http://prowl.rockefeller.edu/ ([7, 10]). 3.2 Proteomics servers and databases
Proteomic data are provided by several servers that are accessible via the internet. In the following paragraphs, some of the most popular servers are listed. ExPASy (Expert Protein Analysis System). The ExPASy server of the Swiss Institute of Bioinformatics (SIB; Geneva, Switzerland) is dedicated to the analysis of protein sequences and structures as well as two-dimensional polyacrylamide gel electrophoresis. The server is available at http://www.expasy.org/ and provides a broad range of useful proteomic tools and databases. European Bioinformatics Institute. The European Bioinformatics Institute (EBI; Cambridge, United Kingdom) of the European Molecular Biology Laboratory (EMBL) is accessible at http://www.ebi.ac.uk/ and manages databases of biological data including nucleic acid, protein sequences and biological structures as well as several tools of molecular biological analyses. SWISS-PROT. SWISS-PROT at the Swiss Institute of Bioinformatics (SIB) provides a curated protein sequence database, with a high level of annotations including protein function, domain structure, post-translational modification and variants. It
7
8 Mass Spectrometry for the Detection and Identification of Proteins
shows a minimal redundancy and high level of integration with other databases. The database is available at http://www.expasy.org/sprot/. The current SWISS-PROT release 40.13 (state: March, 2002) contains 106,734 entries. TrEMBL. The computer-annotated supplement of SWISS-PROT, TrEMBL, contains all translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT and is available at http://www.ebi.ac.uk/trembl/. The current TrEMBL release 19.13 (state: March, 2002) contains 592,934 entries. National Center for Biotechnology Information. The National Center for Biotechnology Information (NCBI; Bethesda, Maryland, USA) is a US American resource for molecular biology information and is available at http://www.ncbi.nlm.nih.gov/. NCBI creates public databases, conducts research in computational biology, develops software tools for analysing genome data, and disseminates biomedical information. GenBank – a very popular nucleic acid sequence database available at the National Center for Biotechnology Information – is an annotated collection of all publicly available DNA, RNA and protein sequences. The GenBank flat file release 128.0 (state February, 2002) contains 15,465,325 loci and 17,089,143,893 bases from 15,465,325 reported sequences. Protein Information Resource. The Protein Information Resource (PIR; National Biomedical Research Foundation, Washington DC, USA) is available at http://pir.georgetown.edu/. The provided PIR-International Protein Sequence Database (PIR-PSD) is a comprehensive, non-redundant, expertly-annotated, fully classified and extensively cross-referenced protein sequence database in the public domain. Auxiliary databases provide integration of sequences, functional, and structural information to support genomics and proteomics research. The current release 72.00 (state: April, 2002) contains 283,174 entries. The Non-redundant REFerence Protein Database (PIR-NREF) provided by the Protein Information was designed to provide a timely and comprehensive collection of all protein sequence data, keeping pace with the genome sequencing projects and containing source attribution and minimal redundancy. The database contains all sequences stored in PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. Non-redundancy is achieved based on clustering by sequence identity and taxonomy at the species level. The NREF report provides source attribution with protein identification numbers and names from associated databases, in addition to protein sequence, taxonomy, and bibliography. The current beta-release (state: March, 2002) contains 888,230 entries.
4 An example: Identification of an isolated and unknown protein
Following one-dimensional polyacrylamide gel electrophoresis, a Coomassiestained band of 18 kilodaltons was excised. A tryptic digest of the excised gel band was performed and the peptides were extracted and purified for mass spectrometry. MALDI-MS was performed with a reflectron-time-of-flight instrument with delayed extraction, nitrogen-ultraviolet laser at 337 nanometres and α-cyano4-hydroxy cinnamic acid as a matrix. The experiment was run at an accuracy better than 50 parts per million with low femtomole sensitivity. The database search us-
5 Is quantification possible with mass spectrometry?
ing the MALDI peptide map and all of the major search engines did not provide a conclusive answer with and without species and molecular weight constraints (Figure 12). The remaining sample digest was subjected to nanospray electrospray ionisation tandem mass spectrometry using an ion trap with a mass error of 0.4 Dalton. A search using the resulting tandem mass spectrometry spectra of the peptides led to the identification of albumin and haemoglobin peptide ions from Bos taurus. The MALDI map was then assigned accordingly (Figure 13). Thus, only contaminant proteins had been isolated on the gel so far. The mass spectrometric analysis was therefore complicated by the fact that two proteins (the protein of interest and a contamination) were co-eluting in PAGE and that the species differed from the species expected. An additional difficulty was the fact that two proteins were coeluting in the gel.
5 Is quantification possible with mass spectrometry?
Although quantification is generally difficult in mass spectrometry, an approach has been proposed to identify and quantify proteins in a high-throughput environment. With isotope-coded affinity tags (ICATTM ; [9]), differences in the relative protein abundance between two different samples can be compared using
9
10 Mass Spectrometry for the Detection and Identification of Proteins
multi-dimensional liquid chromatography followed by mass spectrometry. ICATTM has been described in detail in a previous section of this chapter. This recent technology is thought to enhance the detection of differential protein expression, in particular compared to the classical approach using two-dimensional polyacrylamide gel electrophoresis.
6 Analysis of protein modifications
The biological function of a protein is determined not only by its amino acid sequence but also by post-translational modifications, more than 200 of which have been described to date. Among these, the major tasks for analysis concern phosphorylation, glycosylation and disulphide bridges. The analysis of modifications is based on the standard approach explained above with adaptations depending on the specific modification. In the analysis of disulphide bridges, it may be possible to obtain fragmentation spectra of the linked peptides via HPLC-MS/MS in order to assign the site, possibly utilising a comparison with the reduced and alkylated protein in which the disulphide link is absent. 6.1 Phosphorylation
Protein phosphorylation is a common and important modification of proteins, because nearly all aspects of cell life are influenced by reversible protein phosphorylation. Changes in the state of protein phosphorylation are regulated by the interplay of two types of counteracting enzyme activities: kinases, which catalyse the addition of phosphate from a nucleoside triphosphate donor to specific residues, mainly serine, threonine or tyrosine, while phosphatases catalyse the removal of specific phosphate groups. Defining the sites of phosphorylation in a protein and the extent of phosphorylation at each specific site is an analytical challenge for a number of reasons. First, for many phosphoproteins, especially those involved in signalling, in cells their copy number is very low. Second, individual sites are often only partially phosphorylated and third, in mass spectrometry, phosphorylated peptides usually show lower response in positive ion mode due to the electronegativity of phosphate groups. Solutions to improve mass spectrometric detection of protein phosphorylations involve the specific detection of characteristic marker ions for phosphopeptides and the specific enrichment of phosphopeptides using immobilised metal ion affinity chromatography (IMAC). Once the phosphopeptide is isolated, tandem mass spectrometric analysis can determine the modified sites (Figure 14). For further information on protein phosphorylation, the interested reader is referred to Hunter [12]. Detailed information on the selective enrichment of phosphopeptides is described by Andersson & Porath [1] and Posewitz & Tempst [22], whereas the comparative MALDI map before and after treatment with phosphatase
7 Access to protein tertiary structure
is described by Zhang et al. [31]. Further information on selective tandem mass spectrometers for phosphopeptide-specific ions is provided by Neubauer & Mann [19] and Annan et al. [2]. 6.2 Glycosylation
Glycosylated proteins are very abundant among secreted and membrane-bound proteins. The biological role of the attached sugar chains (glycans) involves conformational stability, protection against degradation as well as essential molecular and cellular recognition. Glycans are branched structures consisting of different carbohydrate residues. They can either be attached to asparagines residues in the consensus sequence Asn-X-Ser/Thr/Cys (N-linked; where X can be any amino acid) or to serine or threonine (O-linked). Each site may vary in the glycan structures attached, creating micro-heterogeneity. In addition, different sites may only be partially glycosylated. Depending on the information required (carbohydrate portion or protein part) different analysis strategies are available. They involve combinations of: (i) the release of N-glycans using peptide-N4 -(N-acetyl-β-glucosaminyl) asparagines amidase F (PNGase F), (ii) the cleavage of O-glycans on reductive β-elimination, and (iii) the separation of the peptide and sugar fractions using reversed-phase high-performance liquid chromatography (RP-HPLC). For further information on protein glycosylation, the interested reader is referred to Bill et al. [4]. Strategies for glycoprotein analysis are described in Rademaker & Thomas-Oates [23].
7 Access to protein tertiary structure
In analogy to the detection of disulphide bridges, chemically cross-linked proteins can be analysed. Using a variety of linkers, amino acid distances can be defined in folded proteins. The potential of this simple approach for the analysis of the protein tertiary structure was recognised early on, and parallel advances in bioinformatics, mass spectrometry and databases of protein structure have fuelled strong renewed interest in the field. The main advantage of mass spectrometry is the low sample consumption in comparison to the traditional structure determination techniques of X-ray crystallography and nuclear magnetic resonance spectroscopy (NMR). In addition, crystallisation and intense labelling are not required. There is no upper weight limit for the protein under study. The basic idea is the combination of ‘easily’ gained experimental data with molecular modelling. Structure generation will be driven by minimisation of three types of potential energy terms. (i) Experimental constraints of data obtained by mass spectrometric analyses such as long and short range constants, site-specific secondary structure elements, hydrogen exchange and charging patterns. (ii) Globally applicable covalent restraints for proteins such as bond length and angles, improper angles and van-der-Waals terms. (iii) Database potential; a database can contain information on any form of
11
12 Mass Spectrometry for the Detection and Identification of Proteins
constraining knowledge including function, tendency of a given stretch of amino acids to form a specific secondary structure, phylogenetic information and so on. The method is still in its infancy. Templates for homologous structures available in three-dimensional databases are used for initial attempts. Current information on high-throughput protein fold identification by using experimental constraints derived from intra-molecular cross-links and mass spectrometry are described in Young et al. [30] and detailed three-dimensional macromolecular structure data of proteins can be found in the Protein Data Bank (PDB; Research Collaboration for Structural Bioinformatics, USA) at http://www.rcsb.org/pdb/ [3]. The single international repository for processing and distribution contains data primarily determined experimentally.
8 Analysis of intact proteins and non-covalent protein complexes
In many cases it is advisable to measure the uncleaved protein, because this may provide information that is lost in digests. Based on the mass spectrum, one can determine if a protein is pure or partially or completely modified (Figure 15). The number of attached groups can also be determined (usually the number of phosphorylated sites), if the resolution of the instrument is sufficient and the protein is not too large or inhomogeneous. Useful electrospray ionisation mass spectra can be measured for proteins of up to 70 kilodaltons in size. For higher mass proteins this appears to be difficult. The reason is not mass spectrometry, but the incapability to purify higher mass proteins in such a way that the charge state distribution can be resolved. Matrix-assisted laser-desorption/ionisation is more tolerant, but mass accuracy also suffers from that effect. Generally, the experiment is carried out under acidic conditions to support ionisation. By using volatile buffers such as ammonium bicarbonate, non-covalent protein complexes can also be measured. Although questions remain, correlations of the solution structure and the gas phase structure have been shown and mass spectrometry can be a valuable tool to study protein interaction [11].
9 Affinity mass spectrometry
Interfacing bio-specific interaction analysis based on surface plasmon resonance and mass spectrometry allows the combination of information on binding events and the determination of the molecular weight of the interacting molecules. Online and offline approaches have been demonstrated, but the use of bio-specificallycoated matrix-assisted laser-desorption/ionisation targets seems to be most promising with respect to protein chip development. Ultimately, the molecular weight of isolated bio-specific markers could be measured first followed by on-target enzymatic digestion directly on the same sample holder providing immediate protein identification [13, 17, 18, 29].
References
10 Conclusion
Mass spectrometry will continue to be of major importance in protein analysis. The technique complements established procedures in protein biochemistry and it opens up new possibilities to investigate the living organism. The analysis of proteins separated using polyacrylamide gel electrophoresis is already a routine operation in many laboratories. Special experimental setups allow the detection of post-translational protein modifications. Even intact proteins and their natural complexes with other proteins can be measured and bioaffinity mass spectrometry is well on the way facilitating interaction analysis. Mass spectrometry convinces by its non-ambiguous result – the molecular weight. Although sample preparation is sometimes a hindrance, ways will be found to increase the impact of the technique further.
References 1 ANDERSSON, L, and PORATH, J. Isolation of phosphoproteins by immobilized metal (Fe3+ ) affinity chromatography. Analytical Biochemistry 1986; 154(1): 250–254. 2 ANNAN, R. S., HUDDLESTON, M. J., VERMA, R., DESHAIES, R. J., and CARR, S. A. A multidimensional electrospray MS-based approach to phosphopeptide mapping. Analytical Chemical 2001; 73(3) 393–404. 3 BERMAN, H. M., WESTBROOK, J., FENG, Z., GILLILAND, G., BHAT, T. N., WEISSIG, H., SHINDYALOV, I. N., and BOURNE, P. E. The Protein Data Bank. Nucleic Acids Research 2000; 28(1): 235–242. 4 BILL, R. M., REVERS, L., and WILSON, I. B. H. Protein glycosylation. BILL, R. M., REVERS, L., and WILSON, I. B. H., editors. Kluwer Academic Publishers, Boston, New York, USA 1998. 5 BINZ, P.-A., M¨ULLER, M., WALTHER, D., BIENVENUT, W. V., GRAS, R., HOOGLAND, C., BOUCHET, G., GASTEIGER, E., FABBRETTI, R., GAY, S., PALAGI, P., WILKINS, M. R., ROUGE, V., TONELLA, L., PAESANO, S., ROSSELLAT, G., KARMIME, A., BAIROCH, A., SANCHEZ, J. C., APPEL, R. D., and HOCHSTRASSER, D. F. A molecular scanner to automate proteomic research and to display proteome images. Analytical Chemistry 1999; 71(21): 4981–4988. 6 CLAUSER, K. R., BAKER, P. R., and BURLINGAME, A. L. Role of accurate mass
7
8
9
10
11
12
measurement (+/− 10 ppm) in protein identification strategies employing MS or MS/MS and database searching. Analytical Chemistry 1999; 71(14): 2871–2882. FENYO, D., ZHANG, W., CHAIT, B. T., and BEAVIS, R. C. Internet-based analytical chemistry resources. A model project. Analytical Chemistry 1996; 68(23): 721A–726A. FERNANDEZ, J., ANDREWS, L., and MISCHE, S. M. An improved procedure for enzymatic digestion of polyvinylidene difluoride-bound proteins for internal sequence analysis. Analytical Biochemistry 1994; 218(1): 112–117. GYGI, S. P., RIST, B., GERBER, S. A., TURECEK, F., GELB, M. H., and AEBERSOLD, R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nature Biotechnology 1999; 17(10): 994–999. HANDLEY, J. Software for MS protein identification. Analytical Chemistry 2002; 74(5): 159A–162A. HERNANDEZ, H., and ROBINSON, C. V. Dynamic protein complexes: Insights from mass spectrometry. Journal of Biological Chemistry 2001; 276(50): 46685–46688. HUNTER, T. Protein kinases and phosphatases: The Yin and Yang of
13
14 Mass Spectrometry for the Detection and Identification of Proteins
13
14
15
16
17
18
19
20
protein phosphorylation and signaling. Cell 1995, 80(2): 225–236. K¨ONIG, S., GROTE, J., and GEDIG, E. High-capacity dextran hydrogels for protein mass spectrometry. International Biotechnology Laboratory 2002; 20(1): 10. K¨ONIG, S., ZELLER, M., PETER-KATALINIC, J., ROTH, J., SORG, C., and VOGL, T. Use of non-specific cleavage products for protein sequence analysis as shown on calcyclin isolated from human granulocytes. Journal of the American Society for Mass Spectrometry 2001; 12(11): 1180–1185. LINK, A. J., ENG, J., SCHIELTZ, D. M., CARMACK, E., MIZE, G. J., MORRIS, D. R., GARVIK, B. M., and YATES, J. R., 3rd . Direct analysis of protein complexes using mass spectrometry. Nature Biotechnology 1999; 17(7): 676–682. MILLS, I., CVITASˇ , T., HOMANN, K., KALLAY, N. and KUCHITSU, K. Quantities, units and symbols in physical chemistry. 2nd edition. International Union of Pure and Applied Physical Chemistry Division. Blackwell Science, Oxford, United Kingdom 1993. NEDELKOV, D., and NELSON, R. W. Exploring the limit of detection in biomolecular interaction analysis mass spectrometry (BIA/MS): detection of attomole amounts of native proteins present in complex biological mixtures. Analytica Chimica Acta 2000; 423(1): 1–7. NELSON, R. W., JARVIK, J. W., TAILLON, B. E., and TUBBS, K. A. BIA/MS of epitope-tagged peptides directly from E-coli lysate: Multiplex detection and protein identification at low-femtomole to subfemtomole levels. Analytical Chemistry 1999, 71(14): 2858–2865. NEUBAUER, G., and MANN, M. Mapping of phosphorylation sites of gel-isolated proteins by nanoelectrospray tandem mass spectrometry: potentials and limitations. Analytical Chemistry 1999; 71(1): 235–242. PERKINS, D. N., PAPPIN, D. J., CREASY, D. M., and COTTRELL, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999;
20(18): 3551–3567. 21 PENG, J., and GYGI, S. P. Proteomics: the move to mixtures. Journal of Mass Spectrometry 2001; 36(10): 1083–1091. 22 POSEWITZ, M. C., and TEMPST, P. Immobilized Gallium(III) affinity chromatography of phosphopeptides. Analytical Chemistry 1999; 71(14): 2883–2892. 23 RADEMAKER, G. J., and THOMAS-OATES, J. Analysis of glycoproteins and glycopeptides using fast-atom bombardment. In: Methods in Molecular Biology. CHAPMAN, J. R., editor. Humana Press, Totowa, New Jersey 1996; 61 24 ROSENFELD, J., CAPDEVIELLE, J., GUILLEMOT, J. C., and FERRARA, P. In-gel digestion of proteins for internal sequence analysis after one- or two-dimensional gel electrophoresis. Analytical Biochemistry 1992; 203(1): 173–179. 25 ROSMAN, K. J. R., and TAYLOR, P. D. P. (Commission on Atomic Weights and Isotopic Abundances). Isotopic compositions of the elements. Pure and Applied Chemistry 1998; 70(1): 217–236. 26 SHEVCHENKO, A., WILM, M., VORM, O., and MANN, M. Mass spectrometric sequencing of proteins from silver-stained polyacrylamide gels. Analytical Chemistry 1996; 68(5): 850–858. 27 STULTS, J. T., HENZEL, W. J., WONG, S. C., and WATANABE, C. Identification of electroblotted proteins by peptide mass searching of a sequence database. In: Mass spectrometry in the biological sciences. BURLINGAME, A. L., CARR, S. A., editors. Humana Press, Totowa, New Jersey, USA 1996. 28 WASHBURN, M. P., WOLTERS, D., and YATES, J. R., 3rd . Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotechnology 2001; 19(3): 242–247. 29 WILLIAMS, C., and ADDONA, T. A. The integration of SPR biosensors with mass spectrometry: Possible applications for proteome analysis. Trends in Biotechnology 2000; 18(2): 45–48. 30 YOUNG, M., TANG, N., HEMPEL, J. C., OSHIRO, C. M., TAYLOR, E. W., KUNTZ, I. D., GIBSON, B. W., and DOLLINGER, G. High throughput protein fold
References identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry. Proceedings of the National Academy of Sciences of the USA 2000; 97(11): 5802–5806. 31 ZHANG, X., HERRING, C. J., ROMANO, P. R.,
SZCZEPANOWSKA, J., BRZESKA, H., HINNEBUSCH, A. G., and QIN, J. Identification of phosphorylation sites in proteins separated by polyacrylamide gel electrophoresis. Analytical Chemistry 1998; 70(10): 2050–2059.
15
1
Protein Microarrays Simone K¨onig
University of M¨unster, M¨unster, Germany
J¨org Haberland Ogham GmbH, M¨unster, Germany Jens Grote, and Martin Zeller University of M¨unster, M¨unster, Germany
Erk Gedig XanTec bioanalytics GmbH, M¨unster, Germany Manfred Fobker, and Michael Erren University of M¨unster, M¨unster, Germany
Stephen W. Pursley DakoCytomation, Inc., Fort Collins, USA
John E. Pearl The Trudeau Institute, New York, USA
Paul Cullen Ogham GmbH, M¨unster, Germany
Stefan Lorkowski University of M¨unster, M¨unster, Germany
Originally published in: Analysing Gene Expression. Edited by Stefan Lorkowski, Paul Cullen. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30488-2
1 Introduction
The high-throughput tools of the genomic era have yielded a wealth of genetic information that is of use in the diagnosis and treatment of diseases. However, while DNA-based technologies such as micro-capillary and miniature array platforms are
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Microarrays
most useful tools [1, 2], their ability to describe complex dynamic processes on the protein level is quite limited. One main reason for this is that there is often a poor correlation between the level of gene expression at the mRNA level, cellular protein abundance and protein activity [3]. Furthermore, post-translational modifications of proteins, including phosphorylation, glycosylation, acetylation, lipidation and proteolysis, are often the key to function and affinity, but cannot be quantified by mRNA-based measurements. Finally, drug/target affinity studies that are an integral part of the drug discovery process are performed using the protein targets themselves rather than their encoding mRNA or DNA. For these reasons, major efforts have been made in the past few years to identify and quantify the complete proteome of a cell or a tissue including all post-translational modifications at any given time [4], and to make every component of this diverse and numerous population accessible for interaction studies with potential agonists, antagonists or drug candidates. At present, the identification and quantification of the protein complement of cells or tissues is mainly carried out using two-dimensional gel electrophoresis and mass spectrometry-based methods [5]. Affinity studies rely either on two-hybrid experiments [4] or on protein affinity chromatography [7]. Together, these approaches have been the workhorses of protein analysis for many years. However useful these methods have been, they suffer from a number of disadvantages including limited analytical speed and throughput, modest sensitivity and limited flexibility. For this reason, much effort is now focused on the adaptation of highly sensitive techniques with a massive parallelisation potential such as DNA microarray technology in order to use them in protein assays and affinity studies [1]. Early experiments employing protein microarrays date back to 1989 when Roger Ekins demonstrated that parallel microarray assays can deliver sensitive and selective results [9]. A few years before, fundamental work with a direct optical detection technology called surface plasmon resonance (SPR) had shown that protein interactions can be detected in real-time and without the use of labels on the sensorchip’s surface [10]. Also in the 1980s, advances in mass spectroscopy made it possible to exactly determine the molecular masses of proteins from solid matrices – the matrix-assisted laser-desorption/ionisation (MALDI) spectrometer was born [11, 12, 12]. In the following years, the latter two methods independently matured to become powerful analytical tools for protein analysis. The rapid progress in the field of miniature DNA arrays in the mid 1990s has spurred the development of – mostly fluorescence-based – biochip readers. Consequently, most of today’s protein arrays employ fluorescence as their detection principle [13]. However, unlike DNA which can be labelled without a significant alteration of its affinity, proteins are much more delicate, and, in some cases, their structure is greatly affected by chemical labels. Therefore, label-free methods such as the above-mentioned surface plasmon resonance, MALDI or a combination of both, biomolecular interaction analysis mass spectrometry, BIAMS [14], have become increasingly popular in recent years. In addition to the further improvement of these detection platforms, one of the most challenging tasks today is the development of highly specific sensorchip surfaces, which are a prerequisite for sensitive and selective assays. The construction
2 Principles and basics, typical results and applications 3
of large protein libraries still represents a major problem, too, for as yet there is no simple method of protein amplification, such as the PCR amplification of DNA, available.
2 Principles and basics, typical results and applications
Generally speaking, protein biochips can be considered as chromatography on a two-dimensional solid surface (a chip) combined with a sufficiently sensitive detection technique to analyse the binding of (bio)molecules from sample solutions (see Figure ?? for the workflow of protein microarray operation). In most cases, the surface is coated with proteins or protein binding molecules immobilised in
Fig. 1 Workflow in protein biochip operation. After proteins have been immobilised to a solid surface, specific binding and dissociation of ligands can be monitored in realtime. In addition, it is possible to quan-
tify or characterise bound ligands in later steps. The following abbreviation is used in the figure: Matrix-assisted laser desorption/ionisation, MALDI.
4 Protein Microarrays
a regular pattern on surface-modified planar substrates. Piezoelectric or ring-pin spotters are used to fabricate such arrays. An alternative approach works with chip surfaces of different charge and hydrophobicity, each of which attracts proteins in a quasi-specific manner thanks to intermolecular attractive forces such as electrostatic interaction, hydrophobic interaction, co-ordinate covalent and hydrogen bonding [15]. Depending on the detection technique employed, the chips are made of glass, metal or plastic. The samples are either directly deposited upon the chip’s active areas, or they are introduced in a continuous flow stream, or they are injected into a continuously mixed cuvette. The latter two methods require flow chambers or cuvette systems to ensure a sufficient convection that enhances the binding rate of high molecular weight analyte molecules. Discrimination between specific and nonspecific binding is achieved either by washing sequences of increasing stringency, or, preferably, by the inherent bio-inertness of the surface coating which at best should suppress non-specific binding almost quantitatively. Alternatively, the sample solution can be pretreated and purified by microfluidic devices before it reaches the protein chip surface [11, 15]. While DNA is stable enough to be spotted on almost every surface without losing its specific hybridisation properties, proteins tend to denature if the surfaces to which they are to be attached exhibit a high energy and/or charge density. Therefore, protein chips require a biocompatible coating prior to the immobilisation step. This coating must provide accessible covalent coupling sites, a structure-retaining immobilisation matrix, low non-specific binding and the option of quantitative regenerability after each binding cycle. The coating should also shield the chip substrate, thus enhancing the chemical resistance and providing a stable, homogeneous support for further modification. Finally, to allow a reliable interpretation of the experimental data, properties such as surface energy, charge and chemical functionality must be well-defined over the entire chip area. Typically, the cleaned chip surface is first covered with a linker layer to which a bioinert matrix is then covalently coupled. The matrix usually consists of hydrophilic compounds that contain functional groups for immobilisation of proteins. Frequently used matrix polymers are polysaccharides, poly L-lysine [17], polyethyleneglycol or proteins, for example, avidin [19] or bovine serum albumin [13]. Figure ?? shows an example of a dextran-coated surface plasmon resonance sensor chip. The polysaccharide chains are covalently attached to a linker layer in a brush-like architecture, serve as an immobilisation matrix for ligands, and stabilise the chip surface against non-specific binding. Depending on their biochemical functionality, protein chips and microarrays have been divided into protein function and protein detecting systems [19]; (also see Table 1). Protein function arrays consist of native proteins which can be obtained from cDNAs by in vitro transcription/translation or some other technique and which are spotted onto a suitable surface. The principle of a typical procedure is shown in Figure ??. Protein function arrays may be used to probe the function or binding properties of native proteins. An interesting application for such arrays is the mapping of all proteins of a specific organ onto one array in order to study the
2 Principles and basics, typical results and applications 5
Fig. 2 The molecular structure of a hydrogel-coated surface plasmon resonance protein biochip. A covalently-attached polysaccharide monolayer serves as an immobilisation matrix for receptor molecules and reduces the possibility of non-specific
interactions with the surface of the biochip. The binding of ligands is detected in realtime via modulations in the intensity of a reflected laser beam. Reprinted with kind
permission from XanTec bioanalytics GmbH (M¨unster, Germany).
interactions of small molecules with each arrayed protein. This type of application is relevant to a problem frequently encountered in the pharmaceutical industry, namely, whether a given drug candidate binds tightly to proteins other than its intended target [20]. Another example is illustrated in Figure ??. DNA-binding Table 1 The two main classes of protein microarrays and their applications
Type of protein array
Immobilised elements
Applications
Protein function
Proteins Peptides Glycosides Protein/nucleic acid conjugates Antibodies Single chain variable antibodies Protein and nucleic acid aptamers Small molecules Molecular imprinted materials
Drug discovery Antibody characterisation Organ and disease specific arrays Autoimmune diagnostics Second generation proteomics Pharmacokinetics Protein profiling Diagnostics Environmental and food analysis
Protein detection
6 Protein Microarrays
Fig. 3 Process for the production of protein libraries. Expression of modified cDNAs yields tagged proteins which can be readily immobilised to chip surfaces or microtiter wells. The figure is reprinted
with kind permission of Sense Proteomic Ltd. (Babraham, Cambridge, United Kingdom).
Fig. 4 Differential DNA binding to single nucleotide polymorphisms (SNP)-encoded mutants in an array format. Protein arrays showing A) the abundance of attached protein detected by labelled specific antibody and B) differential DNA binding to the pro-
teins due to SNP-encoded mutations in some members of the panel. The figure is
reprinted with kind permission of Sense Proteomic Ltd. (Babraham, Cambridge, United Kingdom).
2 Principles and basics, typical results and applications 7
proteins of the same type showing different kinds of mutations in their peptide sequence are arrayed on a solid surface. Afterwards, the array is incubated with labelled DNA molecules that are able to bind to the arrayed proteins. The mutations of the proteins influence their affinity of binding the DNA molecules. Using this procedure, proteins can be identified that bind with a high affinity to a certain DNA molecule. In protein-detecting arrays, ligands that bind with high affinity and specificity to each protein are arrayed on the appropriate surface. Today, such ligands are usually antibodies or single chain variable fragments. The production of tens of thousands of characterised antibodies is a time-consuming and expensive process. For this reason, new methods for the generation of single-chain Fv (scFv) antibody libraries are currently being developed [9]. Other approaches, for example, molecular imprinted materials, might become a promising alternative in the future. The ready manufacture of such artificial antibodies, which are much more stable than their natural counterparts may become feasible in the future. However, the affinity of this kind of ligands as they currently exist is too small to make them a serious alternative to antibodies at the present time. Arrays of the protein-detecting type will have wide applications in the field of proteomics and pharmacokinetics, as they allow for a protein profiling, i.e. they give quantitative information about the relative abundance of any given protein in an organism at a given time. Such arrays can also be used in diagnostics and for food and environmental analysis. In addition to the biospecifity of the surface and the immobilised ligand, the sensitivity of protein chips critically depends on the detection method (see Table 2 for an overview). Compared with DNA oligonucleotides, the molecular weight span of the analytes to be detected is much larger and can range from several hundred Dalton in applications such as drug discovery to some hundred kilodalton. As a consequence, the detection principles developed for DNA arrays may not be suitable for analytes other than DNA. Moreover, in many cases it is not only the concentration of the ligand that is of interest, but also the binding constants or even the molecular weight. The most commonly used detection method today is fluorescence [13]. Many variants exist, including fluorescence lifetime, polarisation or fluorescence resonance energy transfer (FRET). An advantage of fluorescence-based methods is the high sensitivity which allows even single molecule detection. Arrays are normally constructed on glass when using fluorescence detection, since glass auto-fluorescence is low. The array readers can detect up to five wavelengths and use confocal optics to enhance the signal-to-noise ratio. A higher sensitivity can be achieved with fluorescence planar waveguide (FPWG) detection [23]. This technique relies upon fluorescence excitation in the evanescent field of light which propagates in a planar waveguide. The advantage of this detection method is that only surface-bound fluorophores are excited, and not any labels or other fluorescent moieties found in the free solution. Despite the high sensitivity, there are several problems associated with fluorescence detection. First, due to the chemical heterogeneity of proteins, some proteins
8 Protein Microarrays Table 2 Suitable detection techniques for biomolecular interaction analysis
Information
Transducer classification
Examples
Quantitative
Label-free detection
Micro-refractometry
Surface plasmon resonance (SPR) Grating coupler Resonant mirror Mach-Zehnder interferometers
Reflectometry
Brewster angle spectrometry Ellipsometry Reflectrometric interferometry
Atomic force microscopy
–
Detection with labels
Micro-gravimetry
Quartz microbalance
Fluorimetry
Fluorescence immunoassay (FIA) Fluorescence lifetime assays Fluorescence polarisation (FPIA) Fluorescence resonance energy transfer (FRET) Fluorescence planar waveguide (FPWG)
Luminometry
Luciferase ‘glow’ assays
Densitometry
Particle-based assays
Colourimetry
Enzyme-linked immunosorbent assay (ELISA)
Scintigraphy
Radio immunoassay (RIA)
Amperometry Qualitative
–
Mass spectrometry
Matrix-assisted laser-desorption/ionisation (MALDI) Surface-enhanced laser-desorption/ionisation (SELDI)
Evanescent field spectrometry
Attenuated total reflection (ATR) couplers
label far more than others, which makes it necessary to obtain calibration curves for each arrayed protein prior to measurement. Secondly, as the chemical labelling process converts positively charged lysine side chains to amides, and as the labels themselves are relatively hydrophobic, the surface characteristics of the protein are greatly changed. In the worst case, the labelled proteins are denaturised and may no longer bind to the immobilised ligand. This effect becomes more problematic with analytes of low molecular weight. An extreme example is provided by the low molecular weight compounds analysed in drug discovery. It is clear that labelling with a one kilodalton fluorophore may drastically alter the affinity profiles of such small organic molecules. More traditional protein quantification methods such as enzyme-linked immunosorbent assay (ELISA) detect coloured products of enzymatic reactions. Enzymatic detection is usually sensitive due to the signal amplification provided by enzyme activity. However, the bulky enzyme may alter the diffusion properties of
2 Principles and basics, typical results and applications 9
the labelled molecules. Enzyme-linked detection has been used to detect proteins on both filter [6] and glass arrays [3]. Protein arrays have also been detected using incorporation of radioactive isotopes [13]. Isotopic labelling is sensitive, because the emitted radioactivity can be integrated over time and the background is low. Moreover, isotopic labels do not alter the labelled protein’s structure or activity. Disadvantages of isotopic labelling are high cost, lengthy preparations and the extra precautions required when working with radioactivity. A limitation of both isotopically labelled and enzyme-linked reactions is the fact that only the end point of the binding reaction is quantified. It is impossible to measure the complete binding curve in order to obtain kinetic data. Additionally, the amount of capture material in each spot must be tightly controlled to enable quantitative comparison between spots. The best way to avoid problems with labels, of course, would be to use a detection platform that requires no sample labelling whatsoever. Several technologies have been developed to allow the direct detection of binding events at the sensor surface without the use of labels. In addition to electromechanical or electro-acoustic approaches such as the quartz microbalance [27] or surface acoustic wave (SAW) sensors [28], direct optical biosensors are being increasingly used for the thermodynamic and kinetic characterisation of affinity reactions on chip surfaces. There are two groups of methods. The most frequently-used detection principle measures the small refractive index changes at the sensor surface that occur when biomolecules with a refractive index of approximately 1.5 replace water of which the refractive index is 1.33. This method is called micro-refractometry. Examples for micro-refractometric sensors are surface plasmon resonance (see Figure ??
Fig. 5 Surface plasmon resonance biosensor set-up. Polarised laser light is shone at different angles onto a gold-coated sensor chip. At a critical angle nSPR , which depends on the refractive index close to the sensor chip surface, a resonance effect occurs and a sharp decrease in the intensity of the reflected light is measured. As the
refractive index changes upon binding of biomolecules, biomolecular interactions can be monitored in real-time via the shift of the angle n. The following abbreviation is used in the figure: Surface plasmon resonance, SPR. The figure is reprinted with kind permission of XanTec bioanalytics ¨ GmbH (Munster, Germany).
10 Protein Microarrays
for the principle of this method), the grating coupler [23], the resonant mirror [8] and the Mach-Zehnder interferometer (MZI) [5]. The sensitivity of these devices allows the detection of differences in refractive index of as little as 10−6 , corresponding to mass changes of a few picograms per square millimetre. It is thus possible to quantify proteins in femtomolar concentrations even in matrices with a high non-specific background [32]. The second approach, reflectometry, is based on changes of the optical properties of the chip surface itself which leads to spectral or polarisation shifts of light reflected at the chip surface. This method includes the Brewster angle spectrometry, ellipsometry and reflectometric interferometry [14]. Surface plasmon resonance biosensors have been commercially available since 1990 and have now become part of established methodology. In addition to their label-free operation, a major advantage of this technology lies in the possibility of real-time data acquisition. The binding curves obtained in this fashion can be kinetically evaluated and allow the rapid assessment of kinetic data such as the association and dissociation rate constants ka and kd respectively (see Figure ??). While the direct optical techniques described above are well suited to obtaining quantitative information on the concentration or the kinetic characteristics of a specific biomolecule, they do not provide qualitative data such as the molecular weight or amino acid sequence of detected analytes. Over the past few years, this gap has been filled by the use of matrix-assisted laser-desorption/ionisation mass spectrometry instrumentation which, under certain conditions, is able to desorb surface-bound proteins. In addition to directly modified mass spectrometry, enhanced and simplified procedures for sample extraction and clean-up facilitated effective on-probe investigation of biomolecules. For example, an approach termed
Fig. 6 Assessment of kinetic data derived from binding curves obtained by surface plasmon resonance biosensors. Important kinetic data, such as the association rate constant ka and the dissociation rate constant kd can be obtained from surface plasmon resonance sensograms by first calculating the dissociation constants K D of different binding curves and then plotting these constants versus the correspond-
ing ligand concentration. The association rate constant ka is then calculated from the slope of the resulting straight line and its y-intercept. The following abbreviations are used in the figure: millidegrees, mdeg; seconds, s; nanomolar, nM; rate constant, R; time, t. The figure is reprinted with kind permission of XanTec bioanalytics GmbH ¨ (Munster, Germany).
3 Conclusion 11
surface-enhanced laser-desorption/ionisation (SELDI) [15] makes it possible to perform analyte-specific sample cleanup directly on a MALDI target. Another promising technique, biomolecular interaction analysis mass spectrometry (BIA/MS), combines mass spectrometry with realtime SPR detection [26]. Both approaches allow a molecular mass determination of bound analyte molecules with an accuracy of some 100 parts per million. If an additional enzymatic proteolysis step is performed, the amino acid sequence of bound proteins can also be determined.
3 Conclusion
In recent years, protein biochips and protein arrays have become a valuable addition to the portfolio of modern bioanalytical tools. Although the technology is still in its infancy, many types of experiments have already been reported, such as the measurement of protein/protein, protein/DNA, protein/RNA and protein/small molecule interactions. The main advantage of current protein array technologies over conventional analytical devices is the possibility of screening thousands of proteins against one sample and of obtaining a wide variety of information which would be extremely tedious or even impossible to gain using other techniques. However, the method has weaknesses which are either inherent or awaiting a technical solution. A fundamental problem is that reactions that occur in solutions are examined on the surface of the protein array. Mass transfer limitations during analyte binding [35] and steric hindrance on the chip surface can complicate the kinetic analysis of biomolecular interactions. The latter is likely to become a serious problem when the analytes under investigation form larger complexes with other proteins, a phenomenon frequently observed in living systems [2]. What is more, the immobilisation process includes at least some chemical modification, attachment to bulky binding domains, or even denaturing on the chip surface, and may therefore alter the binding characteristics of the immobilised ligand. The problems associated with protein labelling can be avoided by the use of label-free detection technologies, which will probably gradually replace conventional detection methods. Furthermore, although the experimental methodology has greatly advanced, the challenge remains to generate a significant number of capture reagents for the arrays. It is certainly a daunting task to express and purify thousands of different proteins, and some proteins will inevitably prove refractory to biochemical manipulation. Nonetheless, the effort will be worthwhile. In the future, multi-analyte assays will be capable of simultaneously measuring the expression of very large numbers of genes and proteins. This is likely to reveal important information on the causation, progression, and response to treatment of disease [7]. Protein function arrays will have a considerable impact on drug discovery, and it is possible that in the coming ten to twenty years protein arrays will become a mainstay of medical diagnostics.
12 Protein Microarrays
References 1 SCHENA, M. Genome analysis with gene expression microarrays. Bioessays 1996; 18(5): 424–431. 2 SHALON, D., SMITH, S. J., and BROWN, P. O. A DNA microarray system for analyzing complex DNA samples using two-colour fluorescent probe hybridization. Genome Research 1996; 6(7): 639–645. 3 GYGI, S. P., ROCHON, Y., FRANZA, B. R., and AEBERSOLD, R. Correlation between protein and mRNA abundance in yeast. Molecular and Cellular Biology 1999; 19(3): 1720–1730. 4 LOPEZ, M. F. Better approaches to finding the needle in a haystack: optimizing proteome analysis through automation. Electrophoresis 2000; 21(6): 1082–1093. 5 PANDLEY, A., and MANN, M. Proteomics to study genes and genomes. Nature 2000; 405(6788): 837–846. 6 BARTEL, P. S., and FIELDS, S. Analyzing protein-protein interactions using the two-hybrid system. Methods in Enzymology 1995; 254: 241–263. 7 FORMOSA, T., and ALBERTS, B. M. The use of affinity chromatography to study proteins involved in bacteriophage T4 genetic recombination. Cold Spring Harbor Symposia on Quantitative Biology 1984; 49: 363–370. 8 ABBOTT, A. A post-genomic challenge: learning to read patterns of protein synthesis. Nature 1999; 402(6763): 715–720. 9 EKINS, R., CHU, F., and MICALLEF, J. High specific activity chemiluminescent and fluorescent markers: their potential application to high sensitivity and ‘multi-analyte’ immunoassays. Journal of Bioluminescence and Chemiluminescence 1989; 4(1): 59–78. 10 LIEDBERG, B., NYLANDER, C, and LUNDSTRO¨ M, I. Surface plasmon resonance for gas detection and biosensing. Sensors and Actuators 1983; 4: 299–304. 11 KARAS, M., and HILLENKAMP, F. Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Analytical Chemistry 1988; 60(20): 2299–2301.
12 TANAKA, K., WAKI, H., IDO, Y., AKITA, S., YOSHIDA, Y. and YOSHIDA, T. Protein and polymer analysis up to m/z 100.000 by laser ionization and time-of-flight mass spectrometry. Proceeds of the 2nd Japan-China Joint Symposium on Mass Spectrometry. Osaka, Japan 1987: 185–187. 13 MACBEATH, G., and SCHREIBER, S. Printing proteins as microarrays for high-throughput function determination. Science 2000; 289(5485): 1760–1763. 14 NELSON, R. W., NEDELKOV, D., and TUBBS, K. A. Biomolecular interaction analysis mass spectrometry. Analytical Chemistry 2000; 72(11): 404A–411A. 15 MERCHANT, M., and WEINBERGER, S. R. Recent advancements in surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. Electrophoresis 2000; 21(6): 1164–1177. 16 FIGEYS, D. Array and lab on a chip technology for protein characterization. Current Opinion in Molecular Therapeutics 1999; 6(1): 685–694. 17 GUETENS, G., van CAUWENBERGHE, K., de BOECK, G., MAES, R., TJADEN, U. R., van der GREEF, J., HIGHLEY, M., van OOSTEROM, A. T., and de BRUIJN, E. A. Nanotechnology in bio/clinical analysis. Journal of Chromatography. B, Biomedical Sciences and Applications 2000; 739(1): 139–150. 18 HAAB, B. B., DUNHAM, M. J., and BROWN, P. O. Protein microarrays for highly parallel detection and quantitation of specific proteins and antibodies in complex solutions. Genome Biology 2000; 1(6):RESEARCH0004.1-RESEARCH0004.13. 19 ROWE, C. A., SCRUGGS, S. B., FELDSTEIN, M. J., GOLDEN, J. P., and LIGLER, F. S. An array immunosensor for simultaneous detection of clinical analytes. Analytical Chemistry 1999; 71(2): 433–439. 20 KODADEK, T. Protein microarrays: prospects and problems. Chemistry and Biology 2001; 8(2): 105–115. 21 LECITRA, E. J., and LIU, J. O. A three-hybrid system for detecting small
References
22
23
24
25
26
27
28
29
30
ligand-protein interactions. Proceedings of the National Academy of Sciences of the USA 1996; 93(23): 12817–12821. De WILDT, R., MUNDY C. R., GORICK, B. D., and TOMLINSON, I. M. Antibody arrays for high throughput screening of antibody-antigen interactions. Nature Biotechnology 2000; 18(9): 989–994. NEUSCHAFER, D., BUDACH, W., BAR, E., PAWLAK, M., and DUVENECK, G. Planar waveguides as efficient transducers for bioaffinity sensors. Proceedings of SPIE International Society for Optical Engineering 1996; 2836(12): 221–234. B¨USSOW, K., CAHILL, D., NIETFELD, W., BANCROFT, D., SCHERZINGER, E., LEHRACH, H., and WALTER, G. A method for global protein expression and antibody screening on high-density filters of an arrayed cDNA library. Nucleic Acids Research 1998; 26(21): 5007–5008. ARENKOV, P., KUKHTIN, A., GEMMELL, A., VOLOSHCHUK, S., CHUPEEVA, V, and MIRZABEKOV, A. Protein microchips: use for immunoassay and enzymatic reactions. Analytical Biochemistry 2000; 278(2): 123–131. GE, H. UPA, an universal protein array system for quantitative detection of protein-protein, protein-DNA, protein-RNA and protein ligand interactions. Nucleic Acid Research 2000; 28(2): E3. RICKERT, J., WEISS, T., KRAAS, W., JUNG, G., and G¨OPEL, W. A new affinity biosensor: Self assembled thiols as selective monolayer coatings of quartz crystal microbalances. Biosensors and Bioelectronics 1996; 11(6–7): 591–598. TOM-MOY, M., BAER, R. L., SPIRA-SOLOMON, D., and DOHERTY, T. P. Atrazin measurement using surface transverse wave devices. Analytical Chemistry 1995; 67(7): 1510–1514. LUKOSZ, W., and TIEFENTHALER, K. Sensitivity of integrated optical grating and prism couplers as (bio)chemical sensors. Sensors and Actuators 1988; 15: 273–281. CUSH, R. CRONIN, J. M., GODDARD, N. J., MAULE, C. H., MOLLOY J., and STEWARD, J.
31
32
33
34
35
36
37
38
The resonant mirror: A novel optical biosensor for direct sensing of biomolecular interactions. Part I: Principle of operation and associated instrumentation. Biosensors and Bioelectronics 1993; 8(7–8): 347–354. BROSINGER, F., FREIMUTH, H., LACHER, M., EHRFELD, W., GEDIG, E., KATERKAMP, A., SPENER, F., and CAMMANN, K. A label free affinity sensor with compensation of unspecific protein interactions by a highly sensitive integrated optical Mach-Zehnder interferometer on silicon. Sensors and Actuators B 1997; 44: 350– 355. SCHNEIDER, B. H., DICKINSON, E. L., VACH, M. D., HOIJER, J. V., and HOWARD L. V. Highly sensitive optical immunoassays in human serum. Biosensors and Bioelectronics 2000; 15(1–2): 13–22. GRA¨ FE, D., ELENDER, G., GRAU, W., DOBSCHAL, H.-J., and BERTHEL, G. Anordnung zum Nachweis biomolekularer Reaktionen und Wechselwirkungen. DE19828547 1998. NELSON, R. W., JARVIK, J. W., TAILLON, B. E., and TUBBS, K. A. BIA/MS of epitope-tagged peptides directly from E-coli lysate: Multiplex detection and protein identification at low-femtomole to subfemtomole levels. Analytical Chemistry 1999, 71(14): 2858–2865. SCHUCK, P., and MINTON, A. P Analysis of mass transport limited binding kinetics in evanescent wave biosensors. Analytical Biochemistry 1996; 240(2): 262–272. ALBERTS, B. The cell as a collection of protein machines: preparing the next generation of molecular biologists. Cell 1998; 92(3): 291–294. CAHILL, D. J. Protein and antibody arrays and their medical applications. Journal of Immunological Methods 2001; 250(1–2): 81–91. TANAKA, K., WAKI, H., IDO, Y., AKITA, S., YOSHIDA, Y. and YOSHIDA, T. Protein and polymer analysis up to m/z 100,000 by laser ionization time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 1988; 2(8): 151–153.
13
1
Separation Strategies for Proteomics Simone K¨onig University of M¨unster, M¨unster, Germany
J¨org Haberland Ogham GmbH, M¨unster, Germany
Jens Grote and Martin Zeller University of M¨unster, M¨unster, Germany
Erk Gedig XanTec bioanalytics GmbH, M¨unster, Germany
Manfred Fobker and Michael Erren University of M¨unster, M¨unster, Germany
Stephen W. Pursley DakoCytomation, Inc., Fort Collins, USA
John E. Pearl The Trudeau Institute, New York, USA
Paul Cullen Ogham GmbH, M¨unster, Germany
Stefan Lorkowski University of M¨unster, M¨unster, Germany
Originally published in: Analysing Gene Expression. Edited by Stefan Lorkowski, Paul Cullen. Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30488-2
1 Introduction
The simple but most important question for the choice of procedure for the separation of proteins is: what has to be separated? Is it a single protein, a group of proteins or the whole proteome of a cell or tissue? The answer to that question defines: (i)
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Separation Strategies for Proteomics
The technique to be used. The three most relevant techniques are conventional slab gel electrophoresis including isoelectric focussing (IEF), sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) and two-dimensional PAGE, highperformance liquid chromatography (HPLC) and capillary electrophoresis (CE). (ii) The separation principle, for example, ion exchange, size exclusion, or affinity. (iii) The need to combine several successive methods to a multi-dimensional separation procedure. The following sections give an overview about current procedures in the field of protein separation. Some commonly used applications are presented in more detail.
2 Conventional slab gel electrophoresis 2.1 Isoelectric focusing (IEF)
All proteins are ampholytes. The net charge of each protein is the sum of the positive and negative charges of the amino groups and carboxy acid groups, respectively. For every protein there exists a pH at which the net charge is zero. This is called the isoelectric point and usually ranges between pH 3 and pH 11. In isoelectric focusing (IEF), proteins migrate within an inert polyacrylamide matrix containing a stable gradient according to their net charge under denaturing conditions and stop migrating when they reach their individual isoelectric point (Figure 1). Two approaches are available to provide a stable pH gradient within the matrix. The historical earlier method includes the use of carrier ampholytes (CA-IEF), initially described for slab gel electrophoresis [1]. In this method, a mixture of low molecular weight ampholytes covering the pH of interest is embedded into the carrier matrix of polyacrylamide. During a first gel run, these carrier ampholytes find their individual positions between cathode and anode where they supply a buffered pH that corresponds to their respective isoelectric point. A major problem with the use of carrier ampholytes is a time-dependent shift of the expected gradient called the ‘cathodic drift’ [2]. This delay of the gradient generally decreases the reproducibility of CA-IEF. The introduction of an immobilised pH gradient (IPG-IEF) has abolished this problem [3]. In this variant, buffering acryloyl monomers are covalently bound to the acrylamide matrix during gel production. The buffering components are immobilised at fixed positions providing a stable and reproducible pH gradient. Another advantage of IPG-IEF is that large amounts of protein (up to 5 mg) can be applied so that more low abundance proteins become detectable. On the other hand, the resolution of some hydrophobic (e.g., membrane) proteins is poor and others get lost when separated by immobilised pH gradient [4]. The most important
2 Conventional slab gel electrophoresis
Fig. 1 Isoelectric focusing (IEF). IEF is a method to separate proteins according to their net charge by electrophoresis. The gel matrix provides a stable and continuous pH gradient. A) At low pH, the amino groups of proteins are positively charged while the carboxylic acid groups are uncharged resulting in a positive net charge of the protein. At low pH, the situation is reversed.
The amino groups remain uncharged while the carboxylic acid groups carry a negative charge resulting in a negative net charge of the protein. B) The application of an electric field causes the migration of the charged protein through the gel matrix along the pH gradient. C) At the isoelectric pH, the protein has no net charge and accumulates to a sharp band.
characteristic of IEF is that it represents the first dimension of two-dimensional gels and is therefore an essential step in analysis of protein expression. 2.2 Sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) and Western blotting
Proteins differ in three physical parameters: molecular weight, shape, and charge. SDS-PAGE is a common and simple method to determine the approximate molecular mass of a polypeptide or protein and its subunits or to verify the success of a protein preparation. This is possible as the influences of shape and charge on the migration behaviour of a polypeptide can be eliminated. SDS and β-mercaptoethanol are the chemical reagents that are used to ensure that the migration of proteins is proportional to the logarithm of their molecular masses [27]. β-Mercaptoethanol is one component of the buffer in which the sample is heated before being applied to the gel. β-Mercaptoethanol reduces disulphide bridges to sulfhydryl groups so
3
4 Separation Strategies for Proteomics
Fig. 2 Sodium dodecyl sulfate-based polyacrylamide gel electrophoresis (SDS-PAGE). A) Polypeptides applied to a gel slot accumulate to a small band during their migration through the electric field in the stacking gel forwarded by the electric field. B) In the presence of SDS, polypeptides are separated according to their molecular masses as the
anionic detergent SDS binds to polypeptide chains giving them an overall negative charge. Protein bands are usually stained with Coomassie Blue. The example shows the typical band patterns from crude cell extract (lane on the left) to the over 90 percent purified protein (lane on the right) occurring during a protein purification.
that proteins lose their tertiary and quarternary structure. SDS is present in all components of the electrophoresis. The anionic detergent dodecyl sulphate has a negatively charged hydrophilic head region and a long hydrophobic tail region. The latter interacts with hydrophobic regions of a polypeptide and gives the surface an overall negative charge via its charged head regions that neutralises the positive charges of the polypeptide chain. The net effect of this treatment is to produce polypeptide or protein molecules whose migration behaviour is a pure function of their molecular mass. SDS-PAGE is not suitable for all proteins, however. For example, the molecular mass of some very large proteins and proteins containing large amounts of carbohydrates cannot be determined using this method. Most commonly, SDS-PAGE is implemented as a discontinuous electrophoresis in slab gels (Figure 2) according to the procedure described by Laemmli [5]). During the first part of the electrophoresis, the sample is concentrated in a small sharp band by means of a stacking gel. The stacked proteins then enter the separation gel at almost the same time. Both the stacking gel and the separation gel consists of a porous polyacrylamide matrix, but the pores in the separation gel are smaller. Usually, the stacking separation gel contains four (to separate very large proteins) to fifteen (to separate small proteins) percent acrylamide depending on the protein of interest. Polypeptides smaller than 15 kilodalton should be separated in tricine gels [6]. Proteins can be visualised by staining with Coomassie Blue dye or silver. Silver staining is more sensitive than other non-radioactive staining methods.
2 Conventional slab gel electrophoresis
For the unambiguous identification of a polypeptide within a gel and for numerous downstream applications, it is often useful to transfer the separated polypeptides electrophoretically out of the gel matrix onto the surface of a membrane. This process is called Western blotting [7]. The gel and the blotting buffer are first equilibrated in transfer buffer and the gel is laid in direct contact onto the blotting membrane that is usual of polyvinyldifluoride (PVDF) or nitrocellulose. This transfer leads to the quantitative immobilisation of the previously separated proteins on the surface of the membrane. Usually, this is carried out electrophoretically in a submersed blotting chamber. The success of the transfer is checked by reversible staining of proteins on the membrane using Ponceau S red stain. Western blotting is also called “immunoblotting”, as the most frequent downstream application is the immunochemical identification of relevant proteins using antibodies directed against the protein of interest that allow identification of the separated protein (Figure 3). 2.3 Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE)
One of the greatest challenges of the proteome analysis is the reproducible separation of the up to predicted 50,000 to 100,000 different cellular proteins while maintaining the relative amounts of these proteins, the abundance of which varies between ten and 106 copies in a single cell. The most widely used and most suitable method for quantitatively comparing changes in protein profiles of cells, tissues, or whole cell organisms is 2D-PAGE [8]. Originally described by O’Farrell [9], 2D-PAGE is a multi-dimensional separation method that requires two successive operations. In the first dimension, proteins are separated according to their charge by isoelectric focusing. Suitable precasted polyacrylamide gel strips containing an immobilised pH gradient [3] are available from several commercial suppliers. In the second dimension, that is carried out in right angle to the first dimension, the pre-separated proteins are further separated according to their molecular masses by SDS-PAGE (Figure 4). After staining, the spots can be excised from the gel, digested with the endoproteinase trypsin and the resulting peptides can be characterised by mass spectrometry. Proteins are then identified by comparing the mass of the peptide fragments with predictions from either mRNA sequences or protein databases [10]. Limited resolution is currently the main drawback of two-dimensional PAGE. Resulting spots are visualised by staining. Coomassie Blue facilitates only the visualisation of high abundant proteins, i.e. 105 copies per cell or approximately ten nanograms per square millimetre gel. Silver staining by contrast visualises about 0.01 nanograms per square millimetre gel (103 copies per cell). The most sensitive detection method is immunoblotting that may detect as little as ten protein copies per cell. The usefulness of immunoblotting in protein profiling is limited as it is a highly sequence specific procedure. A new generation of fluorescent dyes called SYPRO (SYPRO is an artificial word; S and Y are the initials of the person that invented the dye and PRO comes from protein) does not bind directly to the
5
6 Separation Strategies for Proteomics
Fig. 3 Western blotting procedure. A) Blotting. All polypeptides separated by polyacrylamide gel electrophoresis are quantitatively transferred to a the surface of a blotting membrane by electrophoresis. This procedure is called Western blotting. B)
Temporary visualisation. The efficiency of the transfer can be controlled by staining polypeptides reversible using Ponceau S red stain. C) Immunochemical detection. Polypeptides of interest are visualised and identified by specific antibodies.
protein but to the protein-associated SDS and provides sensitivities comparable to those of silver staining [3]. Even the most sensitive staining method available only allows the visualisation of about 10,000 spots per gel. Vuong et al. postulated that the detectable proteins are species represented with more than 1,000 copies per cell. [12] Assuming that a single cell contains approximately 105 different mRNA molecules each of them encoding for in average five different proteins one can calculate that the 10,000 spots which are maximally visualised on a two-dimensional gel represent only about one fifth of the potential number of gene products [12]. Staining methods
3 High-performance liquid chromatography (HPLC)
Fig. 4 Two-dimensional polyacrylamidegel electrophoresis (2D-PAGE). In the first dimension, polypeptides are separated according to their charge using isoelectric focusing (IEF). The following second di-
mension separates polypeptides according only to their molecular masses by sodium dodecyl sulphate-based polyacrylamide gel electrophoresis (SDS-PAGE).
must therefore be developed that allow the detection of low abundance proteins. The application of narrow pH range ultrazoom gels can enhance the resolution and improve the detection of low abundance proteins [8]. The methods mentioned and described so far can be summarised as the classical tools used for the analysis of differential protein expression. 2.4 Two-dimensional difference gel electrophoresis (2D-DIGE)
Two-dimensional difference gel electrophoresis (2D-DIGE) offers an improved comparability of quantitative changes of protein expression patterns in different samples. In principle, differences between two protein samples are detected by fluorescently tagging the samples with different dyes having distinguishable emission maxima as well as by co-separation and visualisation in a single conventional twodimensional gel [14]. Differences between the samples are analysed by overlaying the two false colour marked fluorescent channels.
3 High-performance liquid chromatography (HPLC)
High-performance liquid chromatography (HPLC) is a popular technique for the analysis of proteins and peptides because it is easy to use and is not limited by volatility or stability of the sample compound. HPLC was developed in the mid1970s and quickly improved with the development of column packing materials and the additional convenience of online detectors. Computers and automation added to the convenience of HPLC during the last two decades. The past decade has seen
7
8 Separation Strategies for Proteomics
Fig. 5 Column chromatography. A cylindrical glass, plastic, or metal column is filled with a permeable solid matrix (solid phase) that is immersed in solvent (mobile phase). The sample is applied to the top of the column. Large amounts of solvent are pumped
(or flow by gravity) through the column. Different molecule species are separated according to different migration rates through the column and are fractionally collected in different tubes.
a vast undertaking in the development of micro-columns and other specialised columns for separation. The dimensions of a typical HPLC column are at least xd 100 millimetres or more in length with an internal diameter of between three and five millimetres. In principle, a sample is injected upstream of the column as part of the mobile phase and is pumped over a column that is composed of a solid phase and a mobile phase with a constant flow rate. Downstream of the column, the separated sample compounds are detected online using ultraviolet absorption, fluorescence or mass spectrometry and the eluate is collected in fractions (Figure 5). The main advantages of HPLC as compared to previously used methods such as open-column chromatography, paper chromatography or thin-layer chromatography are velocity, reproducibility and high resolution. The choice of solid and mobile phase determines the success of the separation effort. The solid phase in HPLC refers to the solid support contained within a column over which the mobile phase containing the sample flows. As the sample
3 High-performance liquid chromatography (HPLC)
9
10 Separation Strategies for Proteomics
Fig. 6 Different methods used in column chromatography. A) Ion exchange chromatography (IEC) means the reversible electrostatic binding of solubilised ions to ions of the opposite charge that are covalently bound to the solid matrix of the column. As an example, the principle of cation exchange is shown. While cations bind to the matrix, anions do not (upper panel). The strength of the association between the cations and the matrix depends on the ion strength and the pH of the mobile phase. The lower panel shows the elution of bound polypeptides with a linear salt gradient. The increasing concentration of concurrent cations causes the elution of the more weakly bound polypeptides first while stronger bound polypeptides dissociate and elute later in the order of their increasing affinity to the matrix of the column. B) In size exclusion chromatography (SEC; also called gel filtration), the sample passes an inert matrix that is porous. Small molecules that enter the pores completely
are the most retarded and delayed. Depending on the pore size of the matrix, larger molecules are retarded to a lesser extent or not at all and migrate through the column faster (upper panel). The elution of a SEC column (lower panel) uses the same solvent that the sample was dissolved in. No further component is required. C) Affinity chromatography makes use of a highly specific interaction between a molecule covalently bound to the inert matrix and the molecule of interest in the applied sample (upper panel). The matrix-bound molecular species may for instance be an antibody or a natural binding partner of the protein that has to be isolated. The high affinity makes it possible to wash away undesired components efficiently after the specific binding has taken place. Subsequently, purified protein can be eluted from affinity chromatography columns using either an extreme pH value or highly concentrated salt solutions (lower panel).
flows across the stationary phase, migration of the sample components is affected by non-covalent interactions of the compounds with the stationary phase. Samples that interact more strongly with the stationary phase than with the mobile phase will have a longer retention time and vice versa. The separation principles employed in the stationary phase include affinity chromatography, size exclusion chromatography and ion exchange chromatography. In addition, a polar solute may be used with a apolar stationary phase (so-called reverse phase chromatography), or an apolar solute may be used with a polar stationary phase (normal phase chromatography) (Figure 6). 3.1 Ion exchange chromatography (IEC)
Ion exchange chromatography (IEC) operates on the basis of selective exchange of ions in the sample with counterions in the stationary phase (Figure 6A). IEC is performed with columns containing charge-bearing functional groups attached to a polymer matrix. The functional ions are permanently bonded to the column and each has a counterion attached. The sample is retained by replacing the counterions of the stationary phase with the ionised sample. The sample is eluted from the column by changing the properties of the mobile phase so that the mobile phase will displace the sample ions from the stationary phase, i.e. by changing the ion strength or the pH. The use of gradient elution is favourable because of equivalent resolution as polyacrylamide gel electrophoresis and increased loading capacity when compared to size exclusion chromatography. Originally reserved for
3 High-performance liquid chromatography (HPLC)
the analysis of proteins, IEC has become a useful tool in peptide analysis. The development of strong cation-exchange (SCX) columns allows the efficient retention and separation of peptides [6]. 3.2 Size exclusion chromatography (SEC)
Size exclusion chromatography (SEC) operates by separating compounds on the basis of their size. The stationary phase consists of porous beads. The larger compounds are excluded from the interior of the beads and, thus, elute first. The small compounds enter the beads and elute according to their ability to exit through the bead pores (see Figure 6B). The column may be either silica or non-silica-based. The use of SEC (as well as IEC) is well-suited for use with biologically active proteins, such as enzymes, hormones, and antibodies, since each protein has its own unique structure and these techniques may be performed under physiological conditions. Full recovery of activity after exposure to the chromatography may be achieved, and availability of SEC columns is diverse enough to allow fractionation from ten to one thousand kilodaltons. Extremely basic or hydrophobic proteins may not exhibit true SEC character since the columns tend to have slight hydrophobicity and an anionic character [1]. 3.3 Affinity chromatography
Affinity chromatography operates by using immobilised biochemicals that have a specific affinity to the compound(s) of interest (Figure 6C). Separation occurs as the mobile phase and sample pass over the stationary phase. The sample compound of interest is retained as the rest of the impurities and mobile phase pass through the column. The compound is then eluted by changing the ion strength, pH, temperature and other characteristics of the mobile phase. Affinity chromatography is therefore the method of choice to separate a specific protein or a group of proteins sharing a common affinity characteristic [10]. 3.4 Reverse phase chromatography (RPC)
Reverse phase chromatography (RPC) operates on the basis of hydrophilicity and lipophilicity. The stationary phase consists of silica-based packings with n-alkyl chains covalently bound. For example, the octadecyl ligand that is often used is symbolised by C-18. The more hydrophobic the matrix on each ligand, the greater is the tendency of the column to retain hydrophobic moieties. Thus, hydrophilic compounds elute more quickly than hydrophobic compounds. The use of RPC revolutionised peptide purification. Crude tissue extracts may be loaded directly onto the RPC system and mobilised by gradient elution although it is usual to perform fragmentation prior to RPC. A common mobile phase for RPC
11
12 Separation Strategies for Proteomics
of peptides is a gradient of 0.1 percent of trifluoroacetic acid in water to 0.1 percent trifluoroacetic acid in an organic solvent such as acetonitrile [16]. 3.5 Normal phase chromatography
Normal phase chromatography operates on the basis of hydrophilicity and lipophilicity by using a polar stationary phase and a less polar mobile phase. Thus, hydrophobic compounds elute more quickly than hydrophilic compounds. 4 Capillary electrophoresis (CE) and capillary electrochromatography (CEC)
Like HPLC, capillary electrophoresis (CE) and capillary electrochromatography (CEC) are powerful tools for peptide and protein sample separation [14, 20]. The separation of compounds in CE relies on the same principles as in conventional electrophoresis techniques, but the separations take place within a silica capillary (column) with only 25 to 50 micrometres in diameter. The driving force of ion separation in CE is the applied voltage of up to 35 kilovolt. This high voltage causes partial deprotonation of the silica capillary wall, which ultimately results in the electroendoosmotic flow (EOF) of ions [2]. While CE separates sample compounds in buffer, in CEC the capillary column contains an additional stationary phase, that makes most conventional slab gel and HPLC techniques (e.g., IEF, SDS-PAGE, affinity chromatography) accessible to the capillary system. One advantage of CE and CEC as compared to HPLC is the necessity for only very small sample volumes of a few nanolitres. Another advantage is that the EOF in CE and CEC causes a nearly flat band profile while HPLC may show band broadening effects caused by transchannel diffusion in the broader column. In two-dimensional gel electrophoresis, proteins are first resolved, then digested with trypsin and the peptide fragments are analysed using mass spectrometry. HPLC and CE/CEC techniques have in common that the proteins are digested first and then separated afterwards. The advantage is that peptides are more soluble and easier to separate than the parent proteins, especially when there are hydrophobic and membrane proteins. The disadvantage is that tryptic digests may easily contain a million peptides to be analysed. No single chromatographic or electrophoretic method is capable of separating such a complex mixture. Excellent overviews about the HPLC of proteins and the separation of biomolecules are given by Reginer [16] and Turkova [16, 22], respectively.
5 Multi-dimensional separations
Each human cell contains at least 10,000 proteins at any given time [5]. The concentrations of these proteins vary with a dynamic range of approximately fivefold. Two-dimensional PAGE, the current method of choice to investigate whole
6 New approaches
proteomes, can resolve up to 10,000 spots with the disadvantage that low abundance proteins and hydrophobic proteins are poorly or not represented (see above). As no single chromatographic or electrophoretic method is likely to resolve such a complex mixture of a cell’s or tissue’s proteins, a multi-dimensional method which employs two or more orthogonal separation techniques (i.e. conventional slab gel electrophoresis, HPLC or CE) or separation methods with different mechanisms of separation (ion exchange, size exclusion, affinity chromatography, etc.) will significantly improve the chances of resolving such a complex mixture of cell proteins into its individual components. All of the above mentioned separation techniques and mechanisms can be combined theoretically in multi-dimensional approaches, each with its own advantages. For example, the advantages of HPLC-combined CE multi-dimensional approaches are that they can be automated, are sensitive, reproducible, fast and quantitative. An overview of multi-dimensional approaches has been provided by Issaq et al. [9].
6 New approaches 6.1 Isotope-coded affinity tag method (ICATTM )
The isotope-coded affinity tag method (ICATTM ) is a new strategy for quantifying differential protein expression (Figure 7). Two different protein mixtures are labelled, each with a different derivative of the ICATTM reagent. The ICATTM reagent consists of three structural elements: (i) Biotin as an affinity tag, which is used to isolate ICATTM -labelled peptides. (ii) A linker that can incorporate two stable isotopes, e.g., hydrogen in light ICATTM reagent (d0-ICAT) and deuterium in heavy reagent (d8-ICAT). (iii) A reactive group specific toward thiol groups, i.e., to cysteine residues of proteins and peptides. The protein samples are differentially labelled by virtue of the differences that exist in their cysteine residues. The samples are then combined and fragmented by means of trypsin digestion. The modified peptides are then isolated by affinity chromatography using immobilised avidin. The mixture is analysed by mass spectrometry delivering the ratio of the ion intensities for an ICATTM -labelled pair and therefore quantifies the relative abundance of its parent protein in the original samples. In addition, a tandem mass spectrum reveals the sequence of the peptide and unambiguously identifies the protein. This strategy results in the quantification and identification of all protein compounds in a mixture. Theoretically, ICATTM is applicable to protein mixtures as complex as the entire proteome [7]. 6.2 Multi-epitope ligand-Kartographie (MELK)
Although it is itself not a method to separate proteins, or peptides, the proprietary topological proteomics technology MELK (Multi-epitope ligand-Kartographie;
13
14 Separation Strategies for Proteomics
Fig. 7 Principle of the isotope-coded affinity tag (ICATTM ) method. Two samples are covalently labelled via their cysteine residues, one with the light hydrogen-containing ICAT reagent, and the other with the heavy deuterium-containing ICAT reagent. The two samples are pooled in a 1:1 ratio and polypeptides in the pool are subjected to an endoproteinase digest. The resulting fragments containing an ICATTM label are separated from unlabelled fragments by affinity
chromatography. As the labelling with heavy ICATTM reagent gives a higher molecular mass to a certain peptide in sample 2 as compared to the same peptide in sample 1, the relative amount of each peptide in the two samples can be determined by tandem mass spectrometry (MS/MS). The identification of the respective protein by mass spectrometry (MS) delivers the amino acid sequence.
Mel-Tec GmbH, Magdeburg, Germany) is mentioned here, because of its innovative character (Figure 8). In a high-throughput process suitable for whole cell protein fingerprinting, MELK permits the deciphering of protein networks, their protein components and function in single intact cells delivering three-dimensional distribution patterns of marker proteins in cells to be compared. The technique is capable of detecting and tracking selected proteins independently of their biochemical properties, expression levels, or localisation. A fundamental advance of MELK is that proteomes of intact cells are analysed where context-depending protein information is preserved and expression profiles of selected proteins can be compared in situ. These topological information gets lost when cells are homogenised as it is usual for other applications in proteome analysis [20].
7 Conclusion
For the last four decades, parallel developments in conventional slab gel electrophoresis and in HPLC/CE techniques have contributed enormously to the science of protein separation. Individual methods within these techniques are
References
Fig. 8 Multi-epitope ligand-Kartographie (MELK). In contrast to conventional methods used to investigate proteomes, MELK not only determines the cellular abundance of a selected protein species but also allows visualisation of its cellular distribution. The figure schematically shows that MELK al-
lows the differential localisation of four chosen proteins in the cellular status I and II of intact cells A), respectively, while conventional methods using homogenised cells B) only determine the (in this case) unchanged abundance of these proteins in status I and II of the cells.
well suited for separating single proteins or groups of proteins. However, the separation of a whole proteome from a cell or tissue is an ambitious task, and, currently, there is no established method available to meet this challenge. In practice, three different approaches are actually used to deal with whole proteomes. The first strategy is to improve the most common conventional method of choice, i.e. 2D-PAGE. This method is fundamentally lacking in speed, sensitivity, reproducibility, and precision. The second strategy is to combine two or more of the established methods of separation successfully in socalled multi-dimensional approaches in order to achieve the desired effect. So far, this path has been used to probe many complex questions. A third strategy is to develop new techniques for comparing different proteomes. ICAT is and MELK seems to be a promising new tool for taking up the great challenge of proteomics.
15
16 Separation Strategies for Proteomics
References 1 ROBERTSON, E. F., DANNELLY, H. K., MALLOV, P. J., and REEVES, H. C. Rapid isoelectric focusing in a vertical polyacrylamide minigel system. Analytical Biochemistry 1987; 167(2): 290–294. 2 RILBE, H. Rapid isoelectric focusing in density gradient columns. Annals of the New York Academy of Sciences 1973; 209: 80–93. 3 BJELLQUIST, B., EK, K., REGHETTI, P. G., GIANAZZA, E., and GORG, A. Isoelectric focusing in immobilized pH gradients: principle, methodology and some applications. Journal of Biochemical and Biophysical Methods 1982; 6(4): 317–339. 4 MOLLOY M. P. Two-dimensional electrophoresis of membrane proteins using immobilized pH gradients. Analytical Biochemistry 2000; 280(1): 1–10. 5 LAEMMLI, U. K. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 1970; 227(259): 680–685. 6 SCHA¨ GGER, H., and JAGOW, G. Tricine-sodium dodecyl sulfate-polyacrylamide gel electrophoresis for the separation of proteins in the range from 1 to 100 kDa. Analytical Biochemistry 1987; 166(2): 368–379. 7 TOWBIN, H., STAEHELIN, T., and GORDON, J. Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications. Proceedings of the National Academy of Sciences of the USA 1979; 76(9): 4350–4354. 8 QUADRONI, M., and JAMES P. Proteomics and automation. Electrophoresis 1999; 20(4–5): 664–677. 9 O’FARRELL, P. H. High resolution two-dimensional electrophoresis of proteins. Journal of Biological Chemistry 1975; 250(10): 4007–4021. 10 YATES, J. R., 3rd . Mass spectrometry and the age of the proteome. Journal of Mass Spectrometry 1998; 33(1): 1–19. 11 BERGGREN, K. N., CHERNOKALSKAYA, E., LOPEZ, M. F., BEECHEM, J. M., and PATTON, W. F. Comparison of three different fluorescent visualization strategies for detecting Escherichia coli ATP synthase
12
13
14
15
16
17
18
19
20
21
22
subunits after sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Proteomics 2001; 1(1): 54–65. VUONG, G. L., WEISS, S. M., KAMMER, W., PRIEMER, M., VINGRON, M., NORDHEIM, A., and CAHILL, M. A. Improved sensitivity proteomics by postharvest alkylation and radioactive labelling of proteins. Electrophoresis 2000; 21(13): 2594–2605. HOVING, S., VOSHOL, H., and van OOSTRUM, J. Towards high performance two-dimensional gel electrophoresis using ultrazoom gels. Electrophoresis 2000; 21(13): 2617–2621. UNLU, M., MORGAN, M. E., and MINDEN, J. S. Difference gel electrophoresis: a single gel method for detecting changes in protein extracts. Electrophoresis 1997; 18(11): 2071–2077. CRIMMINS, D. L. Applications of strong cation-exchange (SCX)-HPLC in synthetic peptide analysis. Methods in Molecular Biology 1994; 36: 53–64. BARTH, H. G., BOYES, B. E., and JACKSON, C. Size exclusion chromatography. Analytical Chemistry 1994; 66(12): 595R–620R. LABROU, N., and CLONIS, Y. D. The affinity technology in downstream processing. Journal of Biotechnology 1994; 36(2): 95–119. REGNIER, F. E. High-performance liquid chromatography of proteins. Methods in Enzymology 1983; 91: 137–190. PRETORIUS, V, HOPKINS, B. J., and SCHIEKE, J. D. Electro-osmosis – A new concept for highspeed liquid chromatography. Journal of Chromatography 1974; 99(1): 23–30. WU, J. T., HUANG, P., LI, M. X., and LUBMAN, D. M. Protein digest analysis by pressurized capillary electrochromatography using an ion trap storage/reflectron time-of-flight mass detector. Analytical Chemistry 1997; 69(15): 2908–2913. BARTLE, K. D., and MYERS, P. Theory of capillary electrochromatography. Journal of Chromatography A 2001; 916(1–2): 3–23. TURKOVA, J. Bioaffinity chromatography. In: Analytical and preparative separation
References methods of biomacromolecules. ABOUL-ENEIN, H. Y., editors. Marcel Dekker, New York, New York, USA 1999: 99–165. 23 CELIS, J. E., and GROMOV, P. 2D protein electrophoresis: can it be perfected? Current Opinions in Biotechnology 1999; 10(1): 16–21. 24 ISSAQ, H. J. The role of separation science in proteomics research. Electrophoresis 2001; 22(17): 3629–3638. 25 HAN, D. K., ENG, J., ZHOU, H., and AEBERSOLD, R. Quantitative profiling of
differentiation-induced microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nature Biotechnology 2001; 19(10): 946–951. 26 SCHUBERT, W. Ultrasensitive Hochdurchsatz-analyse von Proteomen auf Einzelzellniveau. Transkript Laborwelt 2000; 4: 42–43. 27 WEBER, K., and OSBORN, M. The reliability of molecular weight determinations by dodecyl sulfate-polyacrylamide gel electrophoresis. Journal of Biological Chemistry 1969; 244(16): 4406–4412.
17
1
Mechanisms of Protein Evolution Andreas Sebastian Bommarius Georgia Institute of Technology, Atlanta, USA
Bettina R. Riebel Emory University School of Medicine, Atlanta, USA
How does nature generate new enzyme functions? Several possible mechanisms for the development of novel function have been observed: (i) dual-functionality proteins (“moonlighting proteins” and catalytic promiscuity); (ii) gene duplication; (iii) horizontal gene transfer; and (iv) circular permutation of structure. There are three dominant principles of enzyme evolution: 1. Retained substrate specificity (binding), changed chemistry: A new enzyme evolves to supply substrate from an available precursor by evolution of enzyme using the substrate. The underlying hypothesis states that metabolic pathways evolve backwards: A → B → C, E A → B is new, E B → C is old enzyme; 2. Retained chemistry, changed substrate specificity (binding): Nature selects protein from a pool of enzymes whose mechanism provide a partial reaction or stabilization strategy for intermediates or transition states. Evolution decreases the proficiency of the reaction catalyzed by the progenitor. The underlying hypothesis states that chemical mechanism dominance starts with a low level of promiscuous activity and that once evolved it is beneficial for nature to utilize it over and over again. 3. Active-site architecture retained: In this case, the active site is able to support alternative reactions with shared functional groups in a different mechanistic or metabolic context. The study of α/β-barrel proteins, also termed (αβ)8 -barrel or TIM barrel proteins, is important because (i) the structure is found in 10% of all known structures, 68% of them enzymes; (ii) residues responsible for stability are separate from residues responsible for function; and (iii) the general scaffold catalyzes several classes of catalysis: oxidoreductases, hydrolases, lyases, and isomerases.
Originally published in: Biocatalysis. Andreas Sebastian Bommarius and Bettina R. Riebel. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim Print ISBN: 3-527-30344-1 Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Mechanisms of Protein Evolution
1 Introduction
How does nature generate new enzyme functions? How is a protein selected which already binds a desired substrate? How is the protein mutated until the catalytic function is up to the right level (this procedure corresponds to the generation and enhancement of activity in catalytic antibodies), or is an enzyme chosen that by accident already features a low level of activity and that, by further mutation, is selected towards the target (in analogy to protein engineering)? Ultimately, is it binding or is it chemical mechanism that is the decisive starting point for the development of novel enzymatic function? There are more than 10 000 protein structures known but only about 30 3D structure types. This might be traced to a limited number of possible stable polypeptide structures but most probably reflects the evolutionary history of the diversity of proteins. There are structural motifs which repeat themselves in a multitude of enzymes which are otherwise neither structurally nor functionally related, such as TIM barrel proteins, four-helix bundle proteins, Rossmann folds, or α/β-folds of hydrolases Figure 1. Since comparatively few structural types are found among proteins it appears likely that the evolutionary history of an enzyme can be traced through its structure, even if mutations have clouded the primary structure during the course of evolutionary time. Many different enzymes can evolve from a common protein structure. An enzyme catalyzing a certain activity can otherwise feature very different properties in different organisms, such as a melting point T m for GAPDH of between 50 ◦ C in the American lobster and 98 ◦ C in an archaeum from a deep-sea vent. While the essential structural components of GAPDH are conserved, the detailed amino acid sequence is not. 1.1 Congruence of Sequence, Function, Structure, and Mechanism
What determines protein function? Is it determined by the amino acid sequence of the protein? Or by three-dimensional structure? Or is the chemical mechanism most important? The probable answer is that protein function is not determined by any one factor alone but a combination of factors that, as yet, cannot be predicted well at all. Therefore, all the elucidated cases which are discussed in this chapter should not be treated as generalizable. The following examples illustrate the lack of general congruence between different factors impacting on protein function: Ĺ Lack of congruence of structure and mechanism: Common structure does not imply a common mechanism! The β-barrel structures triosephosphate isomerase and xylose isomerase function by hydride transfer through enol, whereas aldose reductase performs hydride transfer through a metal ion.
1 Introduction
Fig. 1 Repeatedly used structural motifs of protein 3D structures (Branden, 1999): a) "/$-fold, for the redox protein flavodoxin; b) closed "/$ barrel structure, for triose
phosphate isomerase; c) Rossmann fold between substratebinding domain (left) and dinucleotidebinding domain (right).
Ĺ Lack of congruence of sequence and structure with function: Common sequence and structure, indeed identity of the protein itself, do not imply a unique function! Each pair, o-succinylbenzoate synthase (OSBS)-N-acetylamino acid racemase, and lens crystallin-lactate dehydrogenase, share sequence and structure but differ in function.
3
4 Mechanisms of Protein Evolution
Ĺ Lack of congruence of function and structure: A common reaction catalyzed by two enzymes does not imply a common structure or sequence! Dehydrogenation is the reaction common to alcohol DH from horse liver (HLADH) and alcohol DH from yeast (YADH), but they have neither a common structure nor a common sequence. Ĺ Lack of congruence of structure and structure: In several cases, no single structure of an enzyme exists. Several (at least two) structures can be verified during the catalytic cycle! An example is distinct lipases, with an open (hydrophilic) and with a closed lid (hydrophobic).
Especially interesting is the question of congruence with respect to sequence and structure: while it was accepted long ago that each protein sequence folds into a unique 3D structure, the implicit assumption was that similar sequences are congruent with similar structures. Indeed, for sequences with more than 30% identical residues, a similar structure is predicted [42]. This high robustness of structures with respect to residue exchanges explains the scope available for variety in evolution. Surprisingly, most structurally similar proteins have less than 45% pairwise sequence identity, similar folds can often arise from protein pairs below 25–30% of pairwise sequence identity and, in one study, most pairs of structurally similar proteins even clustered around 8–9% sequence identity, a level expected for two random, unrelated sequences [41]. In conclusion, most pairs of similar structures have sequence identity as low as randomly related sequences while on average only 3–4% of all residues are “anchor” residues crucial for maintaining the structure. The position of protein function has undergone a radical change since the advent of conveniently accessible databases in the 1990s with a plethora of sequence (and increasingly, also structural) information. As Figure 2 illustrates, protein function before the advent of databases was the starting point for characterizing a protein in most cases. Information about function was available before sequence, mechanism, or biochemical details of the protein were known. With the available databases, assignment of protein function is now often the endpoint of the investigation, which often starts as the investigation of an annotated function [12]. Certainly, both DNA and amino acid sequences are known before the assignment of (experimentally verified) function and, lastly, biochemical studies.
2 Search Characteristics for Relatedness in Proteins 2.1 Classification of Relatedness of Proteins: the -log Family
The simplest measures for comparison of two proteins, the degree of sequence identity (identical amino acids in a certain position in the sequence) and similarity (similar amino acids in a certain position), do not specify the relationship between
2 Search Characteristics for Relatedness in Proteins
Fig. 2 Congruence between sequence, structure, function, and mechanism.
the two proteins. The following terms help to characterize such relationships; examples for each term are provided in Table 1: Ĺ Homologs: enzymes which derive from a common ancestor and thus are related. A high degree of sequence homology allows identification from sequence databases, whereas highly divergent homologs with respect to sequence need additional proof from common 3D structures. Ĺ Orthologs: homologs in different species that catalyze the same reaction. Ĺ Paralogs: homologs in the same species that have diverged from each other. Ĺ Analogs: enzymes catalyzing the same reaction but that are structurally unrelated. The terms above imply the mode of relationship but not a timeline. Evolution can be thought of as divergent or convergent in time: Divergent evolution refers to two or more species that originate from a common ancestor but that are becoming more and more dissimilar over time. Examples include: Ĺ even-toed (with an even number of toes on each hoof) hoofed mammals. During the Eocene epoch, only one species of even-toed hoofed mammal existed. Table 1 Examples of different relatedness in proteins
Category
Examples
References
Homologs
subtilisin in B. subtilis, S. carlsberg, B. liquifaciens MLE and MR in Pseudomonas putida D-lactate DH and FDH in yeast subtilisin in B. subtilis, S. carlsberg, B. liquifaciens MLE and MR in Pseudomonas putida HL-ADH and YADH
v.d. [17, 47, 48] v.d. [47] [17] many
Orthologs Paralogs Analogs
HL-ADH: horse liver alcohol dehydrogenase; Y-ADH: yeast alcohol dehydrogenase; FDH: formate dehydrogenase; D-Lactate DH: D-lactate dehydrogenase; MLE: muconate-lactonizing enzyme; MR: mandelate racemase.
5
6 Mechanisms of Protein Evolution
Fig. 3 Specific example of divergent evolution: "/$-barrel enzymes [18].
However, by the Pleistocene epoch, multiple species had diverged from this original species, such as the bison, pronghorn, chevrotain, peccary, wild boar, hippopotamus, giraffe, deer, and camel. Ĺ many, possibly all, α/β-barrel or TIM barrel enzymes. This fold functions as a generic scaffold catalyzing 15 different enzymatic functions. The numbers in Figure 3 of the α/β-barrel fold indicate the different locations of the active site in four proteins that have this fold. These four proteins – xylose isomerase, aldose reductase, enolase, and adenosine deaminase – carry out very different enzymatic functions, in four of the main E.C. classes (1.-.-, 3.-.-, 4.-.-, and 5.-.-). Convergent evolution occurs when species with different ancestors evolve to look similar. This happens as a result of adaptation to living in similar environments. Examples of convergent evolution include: Ĺ the body shape of dolphins, sharks, and penguins; red tube-shaped flowers of plants pollinated by hummingbirds; the long bills of hummingbirds and sunbirds designed to reach the nectar at the base of flowers. Ĺ the structures of two carbonic anhydrases with the same enzymatic function but with different folds.
2 Search Characteristics for Relatedness in Proteins
Fig. 4 Specific example of convergent evolution [18].
2.2 Classification into Protein Families
The protein sequence and structure databases are now sufficiently representative for the strategies nature uses to evolve new catalytic functions to be identified. Another way to classify related enzymes concerns grouping into different levels of structural similarity: Ĺ Fold: a common fold between proteins is indicated by the same major secondary structures in the same spatial arrangement and the same topological connection. Ĺ Families: a group of homologous enzymes catalyzing the same reaction (mechanism and substrate specificity); there is often > 30% sequence identity. Ĺ Superfamilies: a group of homologous enzymes that catalyze either (i) the same reaction with differing substrate specificity, or (ii) different reactions with a shared mechanistic attribute (transition state, intermediate); there is often < 20% sequence identity. Ĺ Suprafamilies: a group of homologous enzymes that catalyze different overall reactions which do not share common mechanistic attributes and do not share a high level of sequence identity (< 20%) but are metabolically linked such as in successive reactions. Active-site residues may be conserved but perform different functions in family members. Tables 2 and 3 list known superfamilies and suprafamilies with some of their family members.
7
8 Mechanisms of Protein Evolution Table 2 Known superfamilies and some of their members [37]
Superfamily
Examples
References
Enolases
mandelate racemase (MR), muconate-lactonizing enzyme (MLE), N-acetylamino acid racemase (NAAR) urease, amidase, carbamoylase, phosphotriesterase, chlorohydrolase hydroxynitrile lyase (HNL), haloalkane dehalogenase crotonase, enoyl-CoA hydratase, β-hydroxyisobutyryl-CoA hydrolase glyoxalase I, dioxygenase
[17, 37]
Amidohydrolases α/β-hydrolases crotonases vicinal-oxygen chelates (VOC) Fe-dependent oxidase/oxygenases
Holm, [23]
[33] Babbitt, [4]
cephalosporin synthase, isopenicillin N-synthase (IPNS)
Two types of functionally distinct suprafamilies have been identified [15]: Ĺ suprafamilies whose members catalyze successive transformations in the tryptophan and histidine biosynthetic pathways, and Ĺ suprafamilies whose members catalyze different reactions in different metabolic pathways.
An understanding of the structural bases for the catalytic diversity observed in super- and suprafamilies may provide the basis for discovering the functions of proteins and enzymes in new genomes as well as providing guidance for in-vitro evolution/engineering of new enzymes. 2.3 Dominance of Different Mechanisms
Independently of the mode of evolution of new function, several different mechanisms can be dominant, in either substrate specificity, or chemical mechanisms, or active-site architecture. Table 3 Known suprafamilies and some of their members [15]
Suprafamily
Examples
References
His biosynthesis
PFACRI and ImGPS OMPDC, R5PE, HUMPS
[27] [15]
PFACRI: N -[(5 -phosphoribosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide isomerase; ImGPS: imidazole 3-glycerol phosphate synthase; OMPDC: orotidine 5 -phosphate decarboxylase; R5PE: ribulose 5-phosphate epimerase; HUMPS: hex-3-ulose monophosphate synthase.
2 Search Characteristics for Relatedness in Proteins
1. Substrate specificity dominance: The first and best known dominant mechanism in evolution concerns substrate specificity. The hypothesis underlying this mechanism states that metabolic pathways evolve backwards [Eq. (1), where E1 is the new enzyme and E2 is the old enzyme].
E1 E2 A −→ B −→ C
(1)
New enzyme evolves to supply substrate from an available precursor by evolution of enzyme using the substrate. Old and new enzymes are members of the same suprafamily. This hypothesis was first espoused by [24]. 2. Chemical mechanism dominance: This second mechanism assumes that it is not binding but chemical mechanism that is most difficult to create anew. Once a chemical mechanism has been evolved it is beneficial for nature to utilize it over and over again. Nature selects protein from a pool of enzymes whose mechanism provides partial reaction or stabilization strategy for intermediates or transition states. Evolution decreases the proficiency of the reaction catalyzed by the progenitor. The underlying hypothesis of chemical mechanism dominance is that it starts with a low level of promiscuous activity [Eq. (2)]. (kcat /kuncat )progenitor /(kcat /kuncat )successor 1(old); < 1(new)
(2)
Old and new enzymes developed by this mechanism are members of the same superfamily. The main champions of this mechanism are [37]. Readers are also encouraged to consult the reference on catalytic promiscuity by [35]. 3. Dominance of active-site architecture: In some cases, the intuitive notion that the active-site architecture is rarely the dominant influence will play the most important role. In such a case, the active site is able to support an alternative reaction with shared functional groups in a different mechanistic or metabolic context. Old and new enzymes are going to be members of a single suprafamily. Binding of proteins to putative substrate molecules was already claimed more than 50 years ago to be the most important property of future enzymes [24]; catalysis was described as a consequence of evolution from an existing binding situation and was thought to be the simpler step. However, after analysis of more than 100 000 confirmed or predicted enzyme sequences and characterization of more than 4000 3D structures of enzymes, only about 30 enzyme 3D structure types are known (see above). The work on mechanistic similarity of superfamily members demonstrates that acquisition of novel substrate specificity on the basis of existing catalytic mechanisms seems to be the more relevant way of generating new enzymatic activity. Catalysis, even in its sometimes initial rudimentary way, seems to be at least as important as a basis for new enzymes as binding to a specific substrate, which seems to be acquired more easily in foreseeable timeframes.
9
10 Mechanisms of Protein Evolution
3 Evolution of New Function in Nature
In nature, several possible mechanisms for the development of novel function have been observed. They can be summarized as:
Ĺ Ĺ Ĺ Ĺ
dual functionality proteins (“moonlighting proteins” and catalytic promiscuity), gene duplication, horizontal gene transfer, and circular permutation of structure.
Evolution counteracts simplistic notions of a static world in which all effects have one cause and where linear thinking and operations are dominant. According to the central dogma of biology, if all information flows from DNA to RNA to protein. As reverse transcriptases exist and transform RNA back into DNA, we know the central dogma is not strictly correct. In this chapter, we will encounter several situations which collectively challenge the notion of one gene-one protein-one structure-one mechanism-one function Figure 5 summarizes possible options for evolution of function [45]. Evolutionary optimization can occur by residue hopping of a functional residue to a different domain, strand, or subunit of the protein (a). Alternatively, residue hopping results in a new function (b). Recruitment of catalytic groups from different proteins for activity optimization might result in two novel and different enzymes (c, speciation). As discussed below, gene duplication can occur and different functions can develop on the proteins coded for by the genes (d, duplication and divergence). Lastly, in a circular protein the N- and C-termini can fuse and open up new termini in a different position on the protein while conserving the catalytic residues (e, circular permutation). One example, although not yet for optimization, is the shift of the catalytic lysine from one β strand to a neighboring strand: catalytic competency is maintained despite spatial reorganization of the active site with respect to substrate; this case could cover a) or b) [54]. Duplication and divergence and circular permutation are covered below.
3.1 Dual-Functionality Proteins 3.1.1 Moonlighting Proteins Proteins with more than one function, also called moonlighting proteins, are much more frequent than expected. Such proteins challenge the chain of assumptions of “one gene-one protein-one function”. The state of affairs has been summarized in
3 Evolution of New Function in Nature
Fig. 5 Hypotheses regarding the nonequivalence of functional residues that play the same catalytic role in enzyme homologs [45]. Each shape represents a protein structure, and each “segment” in the protein structure represents a position in the fold at which a functional residue can be situ-
ated. Circles denote functional residues. “F” and “F1”, etc., denote particular functions, and “optF” denotes the same function as “F”, but optimized. “newF” denotes a new function., “N” and “C” denote the N- and C-termini, respectively.
a review article [25]. Examples of moonlighting protein include: Ĺ Ĺ Ĺ Ĺ Ĺ Ĺ
serum albumin: no activity vs. Kemp elimination; lens crystallins vs. lactate DH; thymidylate synthase vs. translation inhibitor; thrombin (protease) vs. cell surface receptor ligand; aconitase vs. iron-responsive element binding protein; and uracil-DNA-glycosidase vs. glyceraldehyde-3-phosphate DH.
How can a cell differentiate which of the two functions a protein molecule is supposed to exert? There are several modes of differentiation, all depicted in Figure 6.
11
12 Mechanisms of Protein Evolution
Fig. 6 Examples of mechanisms of switching between one function and the other [25]: a) different functions in different locations of the cell; b) proteins with enzymatic activity in cell cytoplasm but growth factor function when secreted; c) proteins with different functions
when expressed by different cell types; d) different activity of protein caused by binding of substrate, product, or cofactor; e) protein with different function in monomer than in multimer; f) different function through interaction with different polypeptides; and g) proteins with different binding sites for different substrates.
3 Evolution of New Function in Nature Table 4 Examples of promiscuous enzymes
Enzyme
First function
Second function
Subtilisin
peptidase
esterase
Aspartate amino-transferase (AAT) Phytase Hydroxynitrile lyase (HNL) Rubisco
Decarboxylases (PDC, ADC) Chymotrypsin O-Succinyl-benzoate synthase (OSBS)
Comment
common to serine and cysteine proteases transamination decarboxylation in wt, transamination dominates phosphomono- sulfoxidation sulfoxidation aided esterase by vanadate addition oxynitrilase esterase catalytic triad Ser-His-Asp carboxylase Oxygenase (addition of (addition of CO2 ) O2 ) decarboxylase carboligation thiamine pyrophos-phate (TPP) catalysis amidation phosphotriesterase synthase racemization identical protein: (OSBS) (NAAR) racemization fortuitous?
Reference
many
[36, 55]
[46] [49] [44]
[8]
many [37]
PDC: pyruvate decarboxylase; ADC: acetoacetate decarboxylase; NAAR: N-acetylamino acid racemase; OSBS: O-succinylbenzoate synthase.
3.1.2 Catalytic Promiscuity In contrast to, and possibly as a subset of, moonlighting proteins, other proteins exist which feature not just two functions but even two different catalytic functions. Such proteins challenge the assumption of “one protein-one mechanism-one function”. In contrast to moonlighting proteins, which often are unknown to a researcher working in biocatalysis, proteins demonstrating catalytic promiscuity are fairly common and have also, in fact, been discussed in previous chapters of this book. Table 3 lists some examples of promiscuous enzymes, for a review, readers are referred to [35]. In cases of promiscuous activity, one species in the pathway often acts as a common intermediate for both mechanisms leading to different activities. An enamine between PLP and the substrate branches off into a decarboxylation or a transamination, a TPP-bound intermediate can react either to decarboxylation or to carboligation, and the triad Ser-His-Asp in hydroxynitrile lyase is responsible for both the main function, oxynitrilation, and the subordinate function, ester hydrolysis. It can be assumed that many more promiscuous functionalities will be discovered in the coming years.
13
14 Mechanisms of Protein Evolution
3.2 Gene Duplication
Possibly the most important source of novel enzyme activity is divergence of genetic material from a common ancestral source by gene duplication, twofold incorporation of the same DNA material upon duplication of the cell. Gene duplication can be viewed from the perspectives of post-duplication and of the extent of the genomic region involved. The former encompasses (i) duplications of single structural genes followed by divergent evolution, (ii) genes which exist in several copies within each genome but which remain essentially identical to each other in DNA sequence and in function, and (iii) in eukaryotes only, relatively short sequences of DNA of various lengths that are each repeated many times (106 ) and which are not identical. The latter can be classified as (i) partial or internal gene duplications, (ii) complete gene duplications, (iii) partial chromosomal duplications (regional duplications), (iv) complete chromosomal duplications and, lastly, (v) polyploidy or whole-genome duplications. The idea of gene duplications as a major means of creating new genes and ultimately new function is about 70 years old and connected with the name of John B. S. Haldane, who suggested in 1932 that a redundant duplicate of a gene may acquire divergent mutations and eventually emerge as a new gene. 3.3 Horizontal Gene Transfer (HGT)
Comparison of gene sequences is the primary way to study evolution of microbes. Most early studies assumed species evolved primarily by vertical inheritance, normal chromosomal inheritance, from parent or parents. We have been thinking of inheritance as vertical inheritance but we now know that horizontal gene transfer, gene transfer from species to species or genus to genus, has played a significant role in evolution. Drawing gene-based family trees for microbes offers a challenge owing to horizontal gene transfer. Bacteria and other single-celled creatures regularly pick up genetic material from organisms outside their species, even distant relatives. However, the recent sequencing of microbial genomes reveals that horizontal transfers have occurred far more often than most researchers had appreciated. In a study of the common gut bacterium E. coli, investigators calculate that nearly 20% of its DNA originated in other microbes. In 1859, the great Charles Darwin in his seminal work The Origin of Species discussed the universal phylogenetic tree, a taxonomy based on genes and pheno-type. Figure 7, incidentally the only Figure in The Origin of Species, illustrates Darwin’s idea. More than a century later, Erwin Zuckerkandl and Linus Pauling at Caltech (Pasadena/CA, USA) in 1965 came up with the idea of molecular phylo-genetics, a taxonomy based on molecular sequences rather than mostly on pheno-type. A few years later, in the 1970s, Carl Woese (University of Illinois, Urbana-Champaign/IL, USA) developed what today is called the standard model, a taxonomy based on the sequence relationships of small-subunit ribosomal RNA (ssu RNA) Figure 8.
3 Evolution of New Function in Nature
Fig. 7 Universal phylogenetric tree [9].
Since then, the standard model has been called into question; W. Ford Doolittle (Dalhousie University, Halifax, Nova Scotia, Canada) in 1999 advanced the notion of chimerism. Chimerism renders the role of the universal tree unclear Figure 9. At the beginning of evolution, cellular entities (progenotes) were very simple and information processing systems inaccurate. Enhanced levels of horizontal gene transfer were responsible for the evolutionary dynamic rather than vertical inheritance caused by the mutation rates (themselves high). Organismal lineages, and so organisms as we know them, did not exist at these early stages. With increasingly complex and precise biological structures and processes evolving, the mutation rate and the scope and level of lateral gene transfer decreased. One after the other, the various subsystems of the cell became refractory to horizontal gene transfer, with the translation apparatus probably crystallizing first and the evolutionary dynamic gradually becoming that characteristic of modern cells. The universal ancestor at
Fig. 8 Standard model of taxonomy [53].
15
16 Mechanisms of Protein Evolution
Fig. 9 Chimerism in the “tree of life” [10].
the base of the universal phylogenetic tree, therefore, is not a discrete entity but a diverse community of cells that survives and evolves as a biological unit [53]. It is the community as a whole, the ecosystem, which evolves. As a cell design becomes more complex and interconnected, a critical point is reached where a more integrated cellular organization emerges, a critical point, termed the “Darwinian threshold”, is reached at which vertically generated novelty can and does assume greater importance. How does horizontal gene transfer occur? Table 5 describes the levels and mechanisms of HGT. Horizontal gene transfer (HGT) is a natural genetic transformation that is believed to be the essential mechanism for the attainment of genetic plasticity in many species. Kanamycin, streptomycin, and tetracycline resistance genes found in Gram-negative and Gram-positive bacteria are essentially identical in nucleotide sequence. Thus, the development of resistance against antibiotics in organisms can act as proof of the existence of horizontal gene transfer. Bacteria of the genus Agrobacterium have developed a special mechanism that allows transfer of genes Table 5 Levels and mechanisms of horizontal gene transfer
Level
Mechanism
Recombination within species Hybridization between species Interspecies exchange Plasmids and extrachromosomal elements Organelle to nucleus Viruses Transposons
passive absorption mating/conjunction scavenging from environment plasmid exchange organelle to nucleus exchange viruses and viroids transposable elements fusion of cells
4 "/$J-Barrel Proteins as a Model for the Investigation of Evolution 17
from the bacterium to higher plant chromosomes. Horizontal gene transfer is suspected of contributing up to 50% of bacterial genetic material, which dramatically underscores the importance of this pathway for the development of evolution. As much as HGT affects the concept of a “tree of life” and even calls into question the notion of a prokaryote species, current thinking argues that each organism has a core of genes, responsible for critical functions, which is not transferred. The surprisingly large extent of gene swapping has researchers asking whether comparisons of genes indicate how closely related different microbes are and if such analyses can truly point the way back to a universal ancestor. Sequence comparisons suggest the recent horizontal transfer of many genes among diverse species, including transfer across the boundaries of phylogenetic “domains”. Thus determining the phylogenetic history of a species cannot be done conclusively by determining evolutionary trees for single genes. The wide transfer of genes that must have occurred naturally weakens the argument that human intervention through genetic engineering is placing genes where they have never been before. 3.4 Circular Permutation
In proteins with a symmetric structure, circular permutation can account for the shift of active-site residues over the course of evolution. A very good model of symmetric proteins are the (βα)8 -barrel enzymes with their typical eightfold symmetry. Circular permutation is characterized by fusion of the N and C termini in a protein ancestor followed by cleavage of the backbone at an equivalent locus around the circular structure. Both fructose-bisphosphate aldolase class I and transaldolase belong to the aldolase superfamily of (αβ)8 -symmetric barrel proteins; both feature a catalytic lysine residue required to form the Schiff base intermediate with the substrate in the first step of the reaction. In most family members, the catalytic lysine residue is located on strand 6 of the barrel, but in transaldolase it is not only located on strand 4 but optimal sequence and structure alignment with aldolase class I necessitates rotation of the structure and thus circular permutation of the β-barrel strands [26].
4 α/βJ-Barrel Proteins as a Model for the Investigation of Evolution 4.1 Why Study α/β-Barrel Proteins?
The study of α/β-barrel proteins, also called TIM barrels after the first enzyme investigated in this class, triose phosphate isomerase, or (βα)8 -barrel enzymes after the maximally eightfold symmetry axis, is important for the following three
18 Mechanisms of Protein Evolution
Fig. 10 Separation of catalysis and stability domains in barrel proteins [22].
reasons: 1. the structure has been found in 10% of all known structures, and 68% of these known structures are enzymes; 2. the residues responsible for stability are separate from the residues responsible for function Figure 10; and 3. the general scaffold catalyzes all classes of catalysis: oxidoreductases, hydrolases, lyases, and isomerases Figure 3.
The main characteristics of α/β-barrel proteins are: Ĺ a high degree of symmetry: they consist of a circular or elliptical eightfold repeat of (βα) units, with eight β-strands are surrounded by eight α-helices, each strand being H-bonded to two neighboring strands; Ĺ separation of purpose: the β/α-loops (follow each β-strand) are important for function, the α/β-loops (follow each α-helix) are important for stability; the active site is found at the C-terminal end of the β-strands; and Ĺ evolutionary relationship: evolution has been divergent for most α/β-barrel families, from a less efficient and specific common ancestor for enzymes involved in central metabolism [11].
α/β-Barrel enzymes can be classified in six families (listed in Table 7 for simplicity; [40]) or even 21 superfamilies with 76 sequence families [34], determined by (i) the domains in the β/α-loops, (ii) the shape of the barrel, and (iii) the location of extra sheets or helices such as enolases with their ββαα(βα)6 topology instead of the (βα)8 -barrel fold. Owing to the importance of the fold and rapid progress, the subject of α/β-barrel proteins has been reviewed extensively. Structural evidence for evolution of the β/α barrel scaffold by gene duplication and fusion is provided by [30]. Aspects of stability, catalytic versatility, and evolution of the (βα)8 -barrel fold have been covered by [21]. The prevalence of divergent evolution in (βα)8 -barrel enzymes
4 "/$J-Barrel Proteins as a Model for the Investigation of Evolution 19 Table 6 α/βBarrel proteins [40]
Family
Name
Cofactor
A A A
flavocytochrome b2 glycolate oxidase trimethylamine dehydrogenase ribulose-1,5-biphosphate carboxylase/oxygenase mandelate racemase
FMN, heme oxidation FMN oxidation FMN, 4Fe- 4S oxidative demethylation
A B
B B B B B B B B
C C C C
C D D
D E E F F * * PSR PSR PSR
Mg2+ Mg2+
muconate cycloisomerase chloromuconate cycloisomerase α-amylase
Mn2+ Mn2+
β-amylase oligo-1,6-glucosidase cyclodextrin glycosyl-transferase pyruvate kinase xylose isomerase
none none Ca2+
N-(5 -phosphoribosyl) anthranilate isomerase indole-3-glycerol phosphate synthase tryptophan synthase (α-unit) triose phosphate isomerase
none
narbonin fructose bisphosphate aldolase 2-dehydro-3-deoxyphosphogluconate aldolase N-acetylneuraminate lyase aldose reductase 3α-hydroxysteroid dehydrogenase adenosine deaminase phosphotriesterase (1 →3) -β-glucanase (1→3,1→4)-β-glucanase endo-β-N-acetylglucoseaminidase urease old yellow enzyme
Ca2+
Mg2+ , K+ Mg2+
none
Chemical reaction
formation of an enediol(ate) addition of either CO2 or O2 /C C bond cleavage proton abstraction to generate a carbanionic intermediate reprotonation of the intermediate cycloisomerization cycloisomerization endohydrolysis of 1,4-α-d -glucoside bonds hydrolysis of 1,4-α-d -glucoside bonds hydrolysis of 1,6-α-d-glucoside bonds formation (and hydrolysis) of 1,4-α-d-glucoside bonds proton abstraction phosphorylation hydride transfer between adjacent carbons during aldose/ketose isomerization Amadori rearrangement
none
decarboxylative closure of the indole ring aldol cleavage proton transfer between two adjacent carbons during aldose/ketose isomerization no known enzymatic activity aldol cleavage
none
aldol cleavage
none NADPH NAD(P)H
aldol cleavage reduction reduction
Zn2+ Zn2+ none none
hydrolysis hydrolysis hydrolysis hydrolysis
none none
Ni2+ FMN, NADP
20 Mechanisms of Protein Evolution Table 7 α/β Barrel enzyme families [40]
Family
Characteristics
Examples
A
barrels nearly circular small helix between β-strand 8 and α-helix 8 domain covering N-terminus of the barrel major axis near β-strand 1 MR, MLE are missing final α-helix domains cover the C-terminal end of the barrel domain blocks N-terminus of the barrel composed of a single domain major axis near β-strand 3 small helix between β-strand 8 and α-helix 8 have an α-helix preceding the α/β barrel this helix blocks the N-terminal end of the barrel major axis of ellipse points towards β-strand large loops at the carboxyl-terminal end of the barrel after β-strands 4 and 7 bind NAD long thin barrel structure group of α-helices in the carboxyl-terminal loop between β-strand 1 and α-helix 1 require divalent transition metal ion
glycolate oxidase, trimethylamine dehydrogenase
B
C
D
E
F
mandelate racemase, muconate cycloisomerase, xylose (glucose) isomerase tryptophan synthase, triose phosphate isomerase (TIM), PRAI, IGPS (see Figure 13 fructose-biphosphate aldolase, 2keto-3-deoxy-6-phospho-gluconate aldose reductase, 3α-hydroxysteroid dehydrogenase
phosphotriesterase, adenosine deaminase
* Not all the characteristics refer to each member of the family. There are other α/β-barrel enzymes that are in a family.
has been argued [19]. Recent work has examined natural, designed, and directed evolution of function in several super-families of (βα)8 -barrel-containing enzymes [32]. Lastly, α/β-barrel proteins are interesting because there exists at least one very good example with this class of fold for all modes of evolution of biocatalytic function: gene duplication, horizontal gene transfer, and circular permutation. The α/β-barrel template allows enzyme improvement through both protein engineering by site-directed mutagenesis and directed evolution by random mutagenesis. Several studies have been performed with the goal of enzyme improvement, most notably by [27], who changed substrate specificity of HisA to that of PRAI (see Figure 13) although the two enzymes feature only 10% sequence identity (for an example outside the realm of α/β-barrels, see [31]). 4.2 Example of Gene Duplication: Mandelate and α-Ketoadipate Pathways
Mandelate racemase (MR) enables some strains of the common soil bacterium Pseudomonas putida to utilize mandelate from decomposing plant matter as a carbon source. MR is the first of five enzymes in the bacterial pathway that converts mandelate to benzoate. Then benzoate is broken down by another set of five
4 "/$J-Barrel Proteins as a Model for the Investigation of Evolution 21
enzymes of the ensuing β-ketoadipate pathway to compounds that can be used to generate ATP, the cell’s major source of chemical energy. During the investigation of the mandelate metabolism (Figure 11), a team of eminent bioscientists collaborating from several independent research groups, among them George Kenyon from the University of California at San Francisco (UCSF), John Gerlt from the University of Illinois at Urbana-Champaign/IL, John Kozarich from Merck (Rahway/NJ), and Greg Petsko from MIT (Cambridge/MA), later Brandeis University (Waltham/MA), all USA, noticed several surprises when characterizing the enzymes of the mandelate and β-ketoadipate pathways [38]: Ĺ MR, an α/β-barrel protein, features a high degree of similarity with muconatelactonizing enzyme (MLE) of the β-ketoadipate metabolism: 26% sequence identity, the same symmetry and organization of subunits (octamer), a divalent metal ion requirement (Mn2+ for MLE, Mg2+ for MR). As MLE is almost ubiquitous among Pseudomonas, compared to 5% frequency of MR, it can be concluded that MLE is the older enzyme and probably the precedessor of MR. Ĺ As all enzymes of the mandelate path are located on one operon, the other four enzymes could also be identified, cloned, and sequenced. (S)-Mandelate dehydrogenase [(S)-ManDH] features similarities to glycolate oxidase and flavocytochrome b2 (a lactate-DH): both are α/β-barrel proteins and require FMN as co-factor as does (S)-ManDH. It is reasonable to conclude that in this case nature has also sequestered a template that already catalyzes a chemistry similar to the desired reaction. Ĺ Benzoylformate decarboxylase (BFD), a TPP-dependent enzyme, features a high degree of sequence similarity with a whole family of other TPP-dependent decarboxylases, among them pyruvate decarboxylase (PDC), one of the most important enzymes in metabolism. Ĺ After comparison of DNA sequences of an open reading frame (orf) of the mandelate pathway with databases it was found that there exists sequence similarity of an otherwise unannotated piece of sequence with some amide hydrolases. It is suspected that this DNA sequence piece codes for a mandelate amidase which catalyzes hydrolysis of mandelamide into mandelate and ammonium ion. This hypothesis recently was borne out and the mandelamide hydrolase found [32].
The following conclusions can be drawn from these findings: Ĺ The mandelate and β-ketoadipate pathways are a good example of gene duplication, with strong evidence of the former evolving from the latter: the congruence of MR and MLE, (S)-ManDH and benzoate dihydrodiol dehydrogenase, and possibly benzoyl formate decarboxylase and protocatechuate decarboxylase. Ĺ Evolution of MR from MLE and possibly (S)-ManDH from benzoate dihydrodiol dehydrogenase points to the relevance of the enzyme mechanism for catalytic reactivity. If the chemistry is right, the remaining necessary enhancement of specificity, i.e., the enhancement of binding of substrate, is the simpler task, in contrast to the procedure for design of catalytic antibodies.
22 Mechanisms of Protein Evolution
Fig. 11 Mandelate and $-ketoadipate metabolism [38].
4 "/$J-Barrel Proteins as a Model for the Investigation of Evolution 23
4.2.1 Description of Function MR from Pseudomonas putida ATCC 12633 catalyzes the reversible interconversion of (R)-mandelate and (S)-mandelate, by removing a hydrogen ion from one side of mandelate and attaching it to the other. In the mandelate pathway, (S)-mandelate dehydrogenase is specific for the (S)-enantiomer, thus requiring the racemase to allow (R)-mandelate oxidation to benzoate. MR therefore allows utilization of either enantiomer of mandelate. There is persuasive evidence that the reaction catalyzed by MR proceeds via a two-base mechanism in which Lys166 abstracts the α proton from (S)-mandelate [28] and His297 the α proton from (R)-mandelate [29]. MR is not inhibited by pathway intermediates, indicating that it has no regulatory role. Reversible inhibition was, however, observed to occur when carboxylic acids having an aromatic substituent were incubated with the enzyme prior to addition of mandelic acid. So what is the source of the power of MR catalysis? Many enzymes catalyze the abstraction of a proton in the α-position to a carbonyl group, such as in a carboxylic acid. The enzymatic rates of abstraction, however, are much higher than could be inferred from the ApK a between substrate in solution and the base in the active center [Eq. (3)] (kB T/h = 6.2 × 10−1 s−1 ).
k = (k B T/ h)exp{−[G = /RT ) + 2.303 · pK a ]}
(3)
For carboxylic acids, G= > 30 kJ mol−1 , so pK a < 2–5. The catalytic rate constants of the reaction, however, are often between 102 and 105 s−1 . Through analysis of known and well estimated pK a data, electrophilic catalysis can be demonstrated with a cycle process (Figure 12) to suffice as a quantitative explanation for the observed rates.
Fig. 12 Cycle process: pK a values of mandelic acid at different degrees of protonation [13].
24 Mechanisms of Protein Evolution
If K E signifies the concentrations of enol and keto tautomers in a carboxylic acid, then pK E is the difference between the pK a values of the α-protons of the keto tautomer and the hydroxyl group of the enol tautomer. pK E , however, is also the difference in pK a values between the α-protons and the proton of the carbonyl group of the carbonyl-protonated acid, that is deemed to be decisive [13] for the kinetics of abstraction of a proton, rather than the pK a value of the substrate in solution. The pK E of mandelic acid (15.4) links the pK a of the α-proton of the keto tautomer (22.0) with the pK a value of the enol tautomer (6.6). The pK E value also links the pK a value of the α-proton with that of the carbonyl-bound proton of the protonated mandelic acid. If the pK a value of the carbonyl-bound proton of the protonated mandelic acid is assumed to be about −8.0 then the pK a value of the α-proton is about 7.4. This value matches well with the pK a -values of Lys and His residues which have been assigned recently in the active center of mandelate racemase, so electrophilic catalysis alone is able to explain the catalytic power of mandelate racemase. 4.3 Exchange of Function in the Aromatic Biosynthesis Pathways: Trp and His Pathways
In the biosynthesis pathways of histidine and tryptophan, the enzymes HisA (N [(5 -phosphoribosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide isomerase) and TrpF (PRAI, phosphoribosylanthranilate isomerase) (Figure 13), both of which are (βα)8 -barrels, both catalyze Amadori rearrangements of a thermolabile aminoaldose into the corresponding aminoketose (Figure 13). All four enzymes in Figure 13, His A, His F, TrpF, and TrpC, have a similar (βα)8 -barrel structure. Site-directed mutagenesis of conserved residues at the active sites of both HisA and TrpF and steady-state enzyme kinetics of variants pointed to the following residues as catalytically important [20]: aspartate-8, aspartate-127, and threonine-164 for the HisA reaction, cysteine-7 and aspartate-126 for the TrpF reaction. In conjunction with a three-dimensional structure of an enzyme-product analog conjugate, a similar reaction mechanism involving general acid-base catalysis and a Schiff base intermediate is postulated for both enzymes. With the goal of creating an ancestor-like enzyme for N -[(5 phosphoribosyl)formimino]-5-aminoimidazole-4-carboxamide ribonucleotide isomerase (HisA; E.C. 5.3.1.16) and phosphoribosylanthranilate isomerase (TrpF; E.C. 5.3.1.24), HisA variants were generated by random mutagenesis and selection via an auxotroph [27]. Several variants catalyzed the TrpF reaction and one variant even retained significant HisA activity, the quadruple mutant Asn24Asp, Arg97Gly, His175Tyr, and Met236Ile. A single amino acid exchange, Asp127Val, close to the active site was found to be able to establish TrpF activity on the HisA scaffold. According to these results, HisA and TrpF may indeed have evolved from an ancestral enzyme of broader substrate specificity. HisA and HisF both consist of α/β-barrels with a twofold repeat pattern. It is likely that these proteins evolved by twofold gene duplication and gene fusion from
4 "/$J-Barrel Proteins as a Model for the Investigation of Evolution 25
Fig. 13 Evolution within the His and Trp biosynthesis pathways [27].
a common half-barrel ancestor [30]. The strongest evidence for this idea, besides the twofold sequence repeat pattern, stems from the catalytic aspartate residue which is located at the end of β-strand 1 in the HisA-N-terminal half (HisA-N) and HisF-N, and on β-strand 5 in HisA-C and HisF-C. Furthermore, the individual purified half-barrel proteins HisF-N and HisF-C, catalytically inactive but featuring a well-defined secondary and tertiary structure, assemble into the cata-lytically fully active full barrel HisF-NC [21]. As a summary of all the current results, the picture in Figure 14 emerges.
26 Mechanisms of Protein Evolution
Fig. 14 Evolution of "/$-barrel protein scaffold [30].
The earliest currently discernible ancestor was a (βα)4 -half barrel protein which, through gene duplication, generates two initially identical half-barrels which fuse and form an ancestral α/β-barrel protein. This full-barrel ancestor already featured or developed catalytic activity on intermediates of the histidine biosynthesis pathway. The second gene duplication step leads to the diversification of the ancestral α/β-barrel protein into two enzymes with distinct catalytic activities, in this case into HisA and HisF catalyzing successive reaction in the histidine biosynthesis pathway. Therefore, the Horowitz hypothesis seems to have been demonstrated on the example of HisA and HisF as one possibility for the development of novel enzymatic activity.
Suggested Further Reading C. C. DARWIN, On the Origin of Species by Means of Natural Selection or the Preservation of Favored Races in the Struggle for Life (The Origin of Species), John Murray, London, 1859. J. A. GERLT and F. M. RAUSHEL, Evolution of function in (β/α)(8 )-barrel enzymes, Curr.
Opin. Chem. Biol. 2003, 7, 252–264. B. H¨OCKER, C. J¨URGENS, M. WILMANNS, and R. STERNER, 2001, Stability, catalytic versatility and evolution of the (β α)(8)-barrel fold, Curr. Opin. Biotechnol. 12, 376–381.
References
References 1 G. ALAGONA, C. GHIO, and P. A. KOLLMAN, Do enzymes stabilize transition states by electrostatic interactions or p K a balance: the case of triose phosphate isomerase (TIM)? J. Am. Chem. Soc. 1995, 117, 9855–9862. 2 F. H. ARNOLD, Directed evolution: creating biocatalysts for the future, Chem. Eng. Sci. 1996, 51, 5091–5102. 3 P. C. BABBITT, M. S. HASSON, J. E. WEDEKIND, D. R. PALMER, W. C. BARRETT, G. H. REED, I. RAYMOND, D. RINGE, G. L. KENYON, and J. A. GERLT, The enolase superfamily: a general strategy for enzymecatalyzed abstraction of the α-protons of carboxylic acids, Biochemistry 1996, 35, 16489–16501. 4 P. C. BABBITT and J. A. GERLT, Understanding enzyme superfamilies: chemistry as the fundamental determinant in the evolution of new catalytic activities, J. Biol. Chem. 1997, 272, 30591–30594. 5 P. C. BABBITT and J. A. GERLT, Understanding enzyme superfamilies, Biochemistry 1999, 36, 000. 6 S. BORMAN, Enzyme’s activity designed to order, Chem. Eng. News 2000, Feb. 21, 35–36. 7 C. BRANDEN and J. TOOZE, Introduction to Protein Science, Garland Publishing, London, New York, 1991. 8 H. BRUHN, M. POHL, J. GROETZINGER, and M.-R. KULA, The replacement of Trp392 by alanine influences the decarboxylase/carboligase activity and stability of pyruvate decarboxylase from Zymomonas mobilis, Eur. J. Biochem. 1995, 234, 650–655. 9 C. C. DARWIN, On the Origin of Species by Means of Natural Selection or the Preservation of Favored Races in the Struggle for Life (The Origin of Species), John Murray, London, 1859. 10 W. F. DOOLITTLE, Phylogenetic classification and the universal tree, Science 1999, 284, 2124–2128. 11 G. K. FARBER and G. A. PETSKO, The evolution of α/β-barrel enzymes, Trends Biochem. Sci. 228–234. 1990,
12 S. M. FIRESTINE, A. E. NIXON, and S. J. BENKOVIC Threading your way to protein function, Chem. Biol. 1996, 3, 779–783. 13 J. A. GERLT, J. W. KOZARICH, G. L. KENYON, and P. G. GASSMAN, Electrophilic catalysis can explain the unexpected acidity of carbon acids in enzymecatalyzed reactions, J. Am. Chem. Soc 1991, 113, 9667–9669. 14 J. A. GERLT and P. C. BABBITT, Mechanistically diverse enzyme superfamilies: the importance of chemistry in the evolution of catalysis, Curr. Opin. Chem. Biol. 1998, 2, 607–612. 15 J. A. GERLT and P. C. BABBITT, Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally diverse suprafamilies, Annu. Rev. Biochem. 2001, 70, 209–246. 16 J. A. GERLT and F. M. RAUSHEL, Evolution of function in (β/α)(8 )-barrel enzymes, Curr. Opin. Chem. Biol. 2003, 7, 252–264. 17 M. S. HASSON, I. SCHLICHTING, J. MOULAI, K. TAYLOR, W. BARRETT, G. L. KENYON, P. C. BABBITT, J. A. GERLT, G. A. PETSKO, and D. RINGE, Evolution of an enzyme active site: the structure of a new crystal form of muconate lactonizing enzyme compared with mandelate racemase and enolase, Proc. Natl. Acad. Sci. USA 1998, 95, 10396–10401. 18 H. HEGYI and M. GERSTEIN, http:// bioinfo.mbb.yale.edu/genome/ foldfunc/Fig1-struct.html), 1998. 19 M. HENN-SAX, B. HOCKER, M. WILMANNS, and R. STERNER, Divergent evolution of (/Ja)8 -barrel enzymes, Biol. Chem. 2001, 382, 1315–1320. 20 M. HENN-SAX, R. THOMA, S. SCHMIDT, M. HENNIG, K. KIRSCHNER, and R. STERNER, Two (/3a) (8 )-barrel enzymes of histidine and tryptophan biosynthesis have similar reaction mechanisms and common strategies for protecting their labile substrates, Biochemistry 2002, 41, 12032–120342. 21 B. H¨OCKER, S. BEISMANN-DRIEMEYER, S. HETTWER, A. LUSTIG, and R. STERNER, Dissection of a (/?a)8 -barrel enzyme into
27
28 Mechanisms of Protein Evolution
22
23
24
25
26
27
28
29
30
31
two folded halves, Nature Struct. Biol. 2001, 8, 32–36. B. H¨OCKER, C. J¨URGENS, M. WILMANNS, and R. STERNER, Stability, catalytic versatility and evolution of the (beta alpha)(8)-barrel fold, Current Opin. Biotechmol. 2001, 12, 376–381. L. HOLM and C. SANDERS, An evolutionary treasure; unification of a broad set of amidehydrolases related to urease, Protein: Structure, Function, and Genetics 1997, 28, 72–82. N. H. HOROWITZ, On the evolution of biochemical syntheses, Proc. Natl. Acad. Sci. USA 1945, 31, 153–157. C. J. JEFFERY, Moonlighting proteins, Trends Biochem. Sci. (TIBS), 1999, 24, 8–11. J. JIA, W. HUANG, U. SCHOERKEN, H. SAHM, G. A. SPRENGER, Y. LINDQVIST, G. SCHNEIDER, Crystal structure of transaldolase B from Escherichia coli suggests a circular permutation of the a/(5-barrel within the class I aldolase family, Structure 1996, 4, 715–724. C. J¨URGENS, A. STROM, D. WEGENER, S. HETTWER, M. WILMANNS, and R. STERNER, Directed evolution of a (beta alpha)(8)-barrel enzyme to catalyze related reactions in two different metabolic pathways, Proc. Natl. Acad. Sci. 2000, 97, 18, 9925–9930. A. T. KALLARAKAL, B. MITRA, J.W. KOZARICH, J. A. GERLT, J. G. CLIFTON, G. A. PETSKO, and G. L. KENYON, Mechanism of the reaction catalyzed by mandelate racemase: structure and mechanistic properties of the J. Mutant, Biochemistry 1995, 34, 2788–2797. J. A. LANDRO, A. T. KALLARAKAL, S. C. RANSOM, J. A. GERLT, J. W. KOZARICH, D. J. NEIDHART, and G. L. KENYON, Mechanism of the reaction catalyzed by mandelate racemase. 3. Asymmetry in reactions catalyzed by the H297N mutant, Biochemistry 1991, 30, 9274–9281. D. LANG, R. THOMA, M. HENN-SAX, R. STERNER, and M. WILMANNS, Structural evidence for evolution of the beta/alpha barrel scaffold by gene duplication and fusion, Science 2000, 289, 1546–1550. G. MACBEATH, P. KAST and D. HILVERT, Redesigning enzyme topology by directed
32
33
34
35
36
37
38
39
40
evolution, Science 1998, 279, 1958–1961. M. J. MCLEISH, M. M. KNEEN, K. N. GOPALAKRISHNA, C. W. KOO, P. C. BABBITT, J. A. GERLT, G. L. KENYON, Identification and characterization of a mandelamide hydrolase and an NAD(P)+ -dependent benzaldehyde dehydrogenase from Pseudomonas putida ATCC 12633, J Bacteriol. 2003, 185, 2451–2456. G. MULLER-NEWEN, U. JANSSEN, and W. STOFFEL, Enoyl-CoA hydratase and isomerase form a superfamily with a common active-site glutamate residue, Eur. J. Biochem. 1995, 228, 68–73. N. NAGANO, C. A. ORENGO, and J. M. THORNTON, One fold with many functions: The evolutionary relationships between TIM barrel families based on their sequences, structures and functions, J. Mol. Biol. 2002, 321, 741–765. P. J. O’BRIEN and D. HERSCHLAG, Catalytic promiscuity and the evolution of new enzymatic activities, Chem. Biol. 1999, 6, R91–R105. S. OUE, A. OKAMOTO, T. YANO, and H. KAGAMIYAMA, Redesigning the substrate specificity of an enzyme by cumulative effects of the mutations of non-active site residues, J. Biol. Chem. 1999, 274, 2344–2349. D. R. PALMER, J. B. GARRET, V. SHARMA, R. MEGANATHAN, P. C. BABBITT, and J. A. GERLT, Unexpected divergence of enzyme function and sequence: ’ N-Acetylamino Acid Racemase’ is o-succinylbenzoate synthase, Biochemistry 1999, 38, 4252–4258. G. A. PETSKO, G. L. KENYON, J. A. GERLT, D. RINGE, and J. W. KOZARICH, On the origin of enzymatic species, Trends Biochem. Sci. (TIBS) 1993, 18, 372–376. G. A. PETSKO, Design by necessity, Nature 2000, 403, 606–607. E. QUE´ ME´ NEUR, M. MOUTIEZ, B. CHARBONNIER and A. M´ENEZ, Engineering cyclophilin into a proline-specific endopeptidase, Science 1998, 391, 301–304. D. REARDON, and G.K. FARBER, Protein motifs. 4. The structure and evolution of alpha/beta barrel proteins, FASEB J. 1995, 9, 497–503.
References 41 B. ROST, Protein structures sustain evolutionary drift, Fold. Des. 1997, 2, S19–S24. 42 C. SANDER and R. SCHNEIDER, Database of homology-derived protein structures and the structural meaning of sequence alignment, Proteins: Struct. Funct. Genet. 1991, 9, 56–68. 43 J. D. STEVENSON, S. LUTZ, and S. J. BENKOVIC, Retracing enzyme evolution in the (βα)8 -barrel scaffold, Angew. Chem. Int. Ed. 2001, 40, 1854–1856. 44 I. STORROE, and B. R. MCFADDEN, Ribulose diphosphate carboxylase/oxygenase in toluene-permeabilized Rhodospirillum rubrum, Biochem. J. 1983, 212, 45–54. 45 A. E. TODD, C. A. ORENGO, and J. M. THORNTON, Plasticity of enzyme active sites, Trends Biochem. Sci. 2002, 27, 419–426. 46 F. van de VELDE, L. KONEMANN, F. van RANTWIJK, and R. A. SHELDON, Enantioselective sulfoxidation mediated by vanadium-incorporated phytase: a hydrolase acting as a peroxidase, Chem. Commun. 1998, (17), 1891–1892. 47 C. VON DER OSTEN, S. BRANNER, S. HASTRUP, L. HEDEGAARD, M. D. RASMUSSEN, H. BISGARD-FRANTZEN, S. CARLSEN, and J. M. MIKKELSEN, Protein engineering of subtilisins to improve stability in detergent formulations, J. Biotechnol. 1993, 28, 55–68.
48 C. VINALS, X. De BOLLE, E. DEPIEREUX, and E. FEYTMANS, Knowledge-based modeling of the d-lactate dehydrogenase three-dimensional structure, Proteins 1995, 21, 307–318. 49 U. G. WAGNER, M. HASSLACHER, H. GRIENGL, H. SCHWAB, and C. KRATKY, Mechanism of cyanogenesis: the crystal structure of hydroxynitrile lyase from Hevea brasiliensis, Structure 1996, 4, 811–822. 50 R. K. WIERENGA, The TIM-barrel fold: a versatile framework for efficient enzymes, FEBS Lett. 2001, 492, 193–198. 51 M. K. WINSON and D. B. KELL, Going places: forces and natural molecular evolution, TIBTECH 1996, 14, 323–325. 52 C. WOESE, The universal ancestor, Proc. Natl. Acad. Sci. USA 1998, 95, 6854–6859. 53 C. R. WOESE, On the evolution of cells, Proc. Natl. Acad. Sci. USA 2002, 99, 8742–8747. 54 N. J. WYMER, L. V. BUCHANAN, D. HENDERSON, N. MEHTA, C. H. BOTTING, L. POCIVAVSEK, C. A. FIERKE, E. J. TOONE, and J. H. NAISMITH, Directed evolution of a new catalytic site in 2-keto-3-deoxy-6-phosphogluconate aldolase from Escherichia coli, Structure 2001, 9, 1–10. 55 T. YANO, S. OUE, and H. KAGAMIYAMA, Directed evolution of an aspartate aminotransferase with new substrate specificities, Proc. Natl. Acad. Sci. USA, 1998, 95, 5511–5515.
29
1
Predicting Free Energy Changes of Mutations in Proteins Raphael Guerois CEA Saclay, Gif-sur-Yvette, Frankreich
Joaquim Mendes, and Luis Serrano EMBL Heidelberg, Heidelberg, Germany
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
The ability to predict with good accuracy the effect of mutations on protein stability or complex formation is needed in order to design new proteins, as well as to understand the effect of single nucleotide polymorphisms on human health. In the following sections we will summarize what is known and can be used for this purpose. We will start by describing the forces that act on proteins, an understanding of which is a prerequisite to understanding the following section dealing with prediction methods. Finally we will briefly summarize other possible unexpected results of making mutants, such as changes in “in vivo” stability or inducing protein aggregation.
1 Physical Forces that Determine Protein Conformational Stability
In this section, we will briefly discuss the physical forces that determine the conformational stability of proteins. This section is intended to serve as a basis for the understanding of the following sections, and no effort has been made to comprehensively cover the relevant literature. However, many excellent reviews that thoroughly cover this topic have been written, several of which are cited in the titles to the various subsections for the interested reader; more specific references are given in the text.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Predicting Free Energy Changes of Mutations in Proteins
1.1 Protein Conformational Stability [1]
Under physiological conditions, proteins exist in a biologically active state (the native (N) state), which is in dynamic equilibrium with a biologically inactive state (the denatured (D) state). The relative thermodynamic stability of the N state with respect to the D state is measured by the folding free energy: Gfold = GN –GD . Although GN and GD are individually large numbers, Gfold is always small – typically on the order of 5–20 kcal mol−1 . This cancellation of large values is due to enthalpy-entropy compensation. The N state of proteins is, therefore, only marginally stable with respect to the D state, this stability being determined by a delicate balance between the forces that stabilize the N state and those that stabilize the D state. 1.2 Structures of the N and D States [2–6]
A detailed understanding of protein conformational stability requires that the structures of all microstates involved be known. Since the N and D states contribute on the same footing to Gfold , the structures of both states are, in principle, required. The N state of proteins consists of a compact structure in which the polypeptide chain folds up upon itself. This structure is frequently described as a single, unique conformation, which is a feature typical of proteins, compared with synthetic polymers. It is this feature that allows the structure of the N state to be determined experimentally with high accuracy, using methods such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. However, the N state corresponds more accurately to an ensemble of structurally very similar conformations, which manifests itself experimentally as flexibility. Two levels of flexibility can be distinguished [7]: vibrational flexibility, corresponding to structural fluctuations within the same energy well on the energy hypersurface, and conformational flexibility, corresponding to structural transitions between different energy wells. Vibrational flexibility can be observed, for example, in the atomic B-factors of high-resolution X-ray models, whereas conformational flexibility can be observed in these models in the alternative rotamer conformations adopted predominantly by surface side chains. In contrast to the N state, little is known structurally about the D state of proteins. Proteins can be denatured by a variety of means, including extremes of high or low temperatures, extremes of pH, addition of organic solvents, and addition of chemical denaturants. However, the denatured state relevant to determining protein stability is that which exists in equilibrium with the N state under physiological conditions. It has been extremely difficult to observe this state experimentally, since it exists at very low concentration relative to the N state. Initially, the D state was conceived of as an ensemble of structurally very diverse, totally unfolded conformations of the polypeptide chain, which are highly solvated. These conformations
1 Physical Forces that Determine Protein Conformational Stability
were believed to be completely random and, therefore, the ensemble would be devoid of any structural features, i.e., a random coil state. This conception was based on data from a variety of experimental methods that indicate that almost all tertiary and secondary structure present in the N state is disrupted upon denaturation (probed by fluorescence, UV absorption, and near-UV circular dichroism), and that the polypeptide chain expands to a size approximately equal to that expected for a random coil (probed by viscometry, gel filtration, and small angle X-ray scattering). However, several inconsistencies with the random coil model of the D state appeared: (i) significant amounts of secondary structure can be detected in the D state of some proteins using far-UV circular dichroism even under relatively strong denaturing conditions, (ii) the D state of some proteins is only slightly expanded relative to the N state and much more compact than would be expected for a random coil, (iii) some single-point mutations can drastically alter the compactness of the D state, and (iv) the free energy change arising from some point mutations cannot be rationalized when a random coil model is assumed. In relation to point (iv), point mutants have been found for which the folding enthalpy is reduced to less than 50% of the wild-type value and the folding entropy increases to a value that almost perfectly compensates the enthalpy change, such that there is practically no difference in Gfold of the mutant and wild-type proteins [8]. But by definition, there are no side chain-side chain interactions in a random coil state and, therefore, the change in free energy of the D state accompanying mutation depends only on the identity of the mutated residue in the mutant and wild-type proteins, and not on the particular amino acid sequence; as such, under the random coil assumption, the change in stability produced by a mutation can be rationalized based solely on the structure of the N state. However, to explain the changes in enthalpy and entropy in these mutants solely in terms of effects on the native state requires (1) more than half of the chemical bonds stabilizing the wild type N state to be broken in the mutant N state, and (2) an enormous increase in the vibrational entropy of the mutant N state in order to avoid raising its free energy. Crystallographic studies of the N state of many mutant proteins have invariably shown only modest changes in atomic coordinates and thermal B-factors, making this interpretation highly implausible. A wealth of studies over the past decade or so have made considerable steps forward in elucidating the structure of the D state. These studies are based on a variety of NMR experiments. In contrast to other spectroscopic and hydrodynamic studies mentioned above, which yield data that correspond to a global property of the protein, NMR spectroscopy provides information about individual residues. The most important of these studies have directly observed the D state under physiological conditions. This has been accomplished by destabilizing the N state relative to the D state, such that both exist in similar concentrations or the D state predominates. The destabilization has been achieved by particular pH conditions [9, 10], a single-point mutation [11, 12], or the deletion of residues at the N- and C-termini [5, 13–15]. The conclusion of these studies is that considerable amounts of secondary structure exist in the D state. This structure can be
3
4 Predicting Free Energy Changes of Mutations in Proteins
either native-like or nonnative-like. It is not persistent but, rather, fluctuating. It can exist up to high concentrations of chemical denaturants, but eventually melts out at very high concentrations. Recent studies using paramagnetic probes or residual dipolar couplings have found that, in addition to local secondary structure, a considerable amount of nonlocal structure exists in the D state. For some proteins, the D state appears to adopt the same overall topology of the N state. This nonlocal structure persists even at very high concentrations of denaturant [16]. In the scope of protein engineering it was shown that the specific destabilization of nonnative residual structure in the denatured state could bring about a dramatic stabilization in an SH3 domain [17]. The current conception of the D state under physiological conditions is that of a highly compact ensemble with considerable local and nonlocal structure, which, for some proteins, may already be poised to adopt the N state. The recent systematic survey of 21 well-characterized proteins based on kinetics and protein engineering data also strongly suggests that a majority of proteins studied so far follow such behavior [18].
1.3 Studies Aimed at Understanding the Physical Forces that Determine Protein Conformational Stability [1, 2, 8, 19–26]
A large number of studies have been undertaken to assess the relative importance of different physical forces in stabilizing the N and D states of proteins. Two types of study have been performed: (1) studies in which Gfold is decomposed into the respective Hfold and Sfold components and these in turn into various component forces, and (2) studies in which the change in Gfold produced by point mutations is decomposed into component forces. These studies provided many important insights and improved the understanding of protein conformational stability considerably. However, some of their conclusions are still very controversial. The controversy arises from several factors: (i) the large enthalpy-entropy compensation accompanying folding leads to a very small Gfold the consequence of which is that a small error in the estimation of a particular force can lead to a stabilizing force being interpreted as being destabilizing, or vice versa, (ii) the impossibility of uniquely decomposing experimental thermodynamic quantities into intramolecular and solvation contributions, (iii) the limited knowledge of the structural details of the D state and the lack of adequate models for it, and (iv) the considerable plasticity of the N state, i.e., its ability to readjust structurally to accommodate point mutations. In spite of the controversy, since Gfold is always small, even forces that contribute only a few kcal mol−1 make a large contribution to Gfold and, thus, tip the balance in favor of the N state or of the D state. Therefore, as a general conclusion that has emerged mainly from mutant studies, each protein uses a different strategy to tweak this stability of the N state, taking advantage of small increases in stability afforded by a particular type of force.
1 Physical Forces that Determine Protein Conformational Stability
1.4 Forces Determining Conformational Stability [1, 2, 8, 19–27]
In their natural environment, most globular proteins are immersed in an aqueous medium. As such, to understand the conformational stability of proteins it does not suffice to consider the intramolecular interactions of the polypeptide chain in the N and D states. One must also consider the intermolecular interactions between the polypeptide chain in each of these states with the water molecules of the aqueous medium, as well as the alterations in the interactions of the water molecules among themselves induced by the presence of the protein. These latter interactions, which contain both enthalpic and entropic components, are usually all collected together in the solvation free energy. Apart from the intramolecular interaction energy and the solvation free energy, one must also take into account the entropy of the N and D states when attempting to understand protein conformational stability. 1.5 Intramolecular Interactions
The intramolecular interactions that stabilize the N and D states of a protein are predominantly nonbonded forces. Bonded forces are essential for maintaining the covalent structure of the polypeptide chain, but are less important in determining stability. 1.5.1 van der Waals Interactions The core of proteins in the N state is very densely packed – as dense as in organic crystals. As such, intramolecular van der Waals interactions are very important in stabilizing this state. Since the D state is significantly expanded with respect to the N state, intramolecular van der Waals interactions should be much fewer and weaker because it is a short-ranged interaction and, therefore, should play a much less important role in stabilizing this state. 1.5.2 Electrostatic Interactions Charge-charge interactions Electrostatic interactions between charged groups in proteins can only contribute significantly to Gfold if they are nonlocal, i.e., between groups on residues significantly separated in sequence. For neighboring residues, the free energy should be very similar in both the N and D states because charged groups are usually highly exposed in the N state as well. Salt bridges form in the N state between groups of opposite charge that are in direct contact and simultaneously establish hydrogen bonds with each other. It is frequently argued that buried salt bridges cannot stabilize the N state because the energy gained by the electrostatic interaction, even though high due to the interior of the protein having a low dielectric constant, cannot compensate for the dehydration of two charged groups. However, recent calculations suggest that salt bridges with favourable geometry are likely to be stabilizing anywhere in the protein [28]. On the other hand, it is frequently argued that salt bridge formation
5
6 Predicting Free Energy Changes of Mutations in Proteins
at the surface of the protein cannot contribute much to the stability of the N state, because the energy gained by the electrostatic interaction in the aqueous medium of high dielectric constant cannot compensate for the loss of side chain entropy involved in fixing the side chains in the correct orientation for the salt bridge to form. However, if most of the entropy of the side chains is already lost prior to the formation of the salt bridge due to packing with other side chains at the surface, then salt bridge formation does not lead to an entropy loss and is considerably stabilizing. On a similar note, entropic cooperativity can occur when networks of surface salt bridges are formed. Thus, although formation of the first salt bridge requires the loss of entropy of the two side chains involved, formation of a second salt bridge from a third residue to the already bridged pair requires only loss of the entropy of one side chain. Salt bridge networks are very common in hyperthermophilic proteins, and are thought to be important in stabilizing the N state of these proteins [1]. Long-ranged electrostatic interactions between groups not directly involved in a salt bridge with each other can also contribute to stabilize or destabilize the N state relative to the D state. Thus, too many charges of the same type at the surface of the protein destabilize the N state with respect to the D state, because in the more expanded D state the like charges are further apart and repel each other less strongly. On the other hand, some studies indicate that the distribution of charges at the surface of the protein can be optimized such that the overall long-ranged electrostatic interactions stabilize the N state [29–31]. It is frequently difficult to predict the effect of mutations that aim at increasing protein stability by optimizing electrostatic interactions among charged groups on the surface of the N state of the protein. In fact, the stability increase can be considerably smaller than predicted and, in some cases, a decrease in stability is observed when an increase was predicted. These results suggest that favorable charge-charge interactions occur in the D state, and that the free energy of this state may be decreased even more than that of the N state by the mutations in charged residues that had aimed at improving the electrostatics of the N state [32]. Hydrogen bonding The N state of proteins is extensively hydrogen bonded and intramolecular hydrogen bonding is an important interaction in stabilizing this state. In the D state, intramolecular hydrogen bonds are not persistent and, therefore, probably do not contribute significantly to the stability of this state. Most hydrogen bonds in proteins involve strong hydrogen bond donors and acceptors, and are of type N H O and O H O. However, there is some structural evidence that the α-hydrogen of the polypeptide backbone can participate in C H O hydrogen bonds, which, although somewhat weaker, could make a significant contribution to the stability of the N state [33, 34]. Dipole interactions Molecular mechanic calculations indicate that intramolecular dipole-dipole interactions between polar groups not directly hydrogen bonded to each other can contribute substantially to stabilizing the N state of proteins [22].
1 Physical Forces that Determine Protein Conformational Stability
These interactions were also found to represent a major stabilizing interaction in the D state, using an extended chain conformation as a model for this state [22]. It is possible that this would also be the case if a more realistic model of the D state were used, since this state is currently thought to contain considerable local and nonlocal structures similar to that in the N state (see above). Intramolecular charge-dipole interactions can also contribute significantly to stabilizing the N state of proteins. Thus, mutating a residue at the C-terminus of an α-helix to a positively charged amino acid, or mutating a residue at the N-terminus of an α-helix to a negatively charged amino acid, is found to stabilize the N state of the protein through charge-dipole interactions between the charged residues and the net dipole moment of the helix. Presumably, in these cases, the helix is not formed or only partially formed in the D state and, therefore, the charge-dipole interactions either do not exist or are much weaker in this state. Aromatic interactions The aromatic groups of aromatic residues have a net electric quadrupole moment. This arises from the electronic distribution being concentrated above and below the aromatic plane, leading to a net negative charge above and below this plane and a net positive charge within the plane and on the aromatic hydrogen atoms. The quadrupole moment is responsible for the particular interactions in which aromatic groups are involved. Aromatic groups in direct contact can interact with each other through aromaticaromatic interactions. Two stabilizing geometries are found in proteins: in one, the hydrogen atoms at the edge of one group point towards the center of the other, in the other the two aromatic planes are stacked on top of each other parallel but offcentered. Intramolecular aromatic-aromatic interactions can be considerably strong and can stabilize the N state of proteins significantly. In fact, aromatic residues exist most frequently in the core of native proteins and are in contact with each other more frequently than random. Aromatic groups can also be involved in cation-π interactions within proteins [35]. In this interaction, a positive ion is in close proximity to the negative partial charge at the center of the aromatic ring, thus interacting favorably with the quadrupole moment. Intramolecular cation-π interactions can contribute considerably to the stability of the N state of proteins. The aromatic hydrogen atoms can also interact with the sulfur atom of Cys and Met residues through a kind of weak hydrogen bond as has been experimentally shown [36]. Aromatic groups can also participate in a variety of hydrogen bonds of differing strengths [37]. 1.5.3 Conformational Strain Conformational strain is a consequence of bonded forces. Presumably, in the D state of proteins strained conformations do not exist because it is expanded and very flexible. Conformational strain can, however, occur in the tightly packed N state of proteins when: (i) a torsion angle or a bond angle deviates significantly from its equilibrium value, (ii) the side chain rotamer selected in the N state is
7
8 Predicting Free Energy Changes of Mutations in Proteins
not the one of lowest local energy, and (iii) there is overlap between atoms leading to steric repulsion. The first two of these are coined χ-strain and rotamer strain by Penel and Doig [38], and can make a considerable contribution to destabilizing the N state of proteins. These two types of strain can totally counterbalance the stabilizing effect of a disulfide bridge if this is under strain in the N state. As for the third type of strain, mutations that attempt to fit too large a residue into an internal cavity of a protein can result in so much atomic overlap that the protein does not fold. In less dramatic cases, it can lead to a significant decrease in the stability of the N state. 1.6 Solvation
The effects of solvation on protein conformational stability can be quantified through the solvation free energy of the N and the D states, defined as the free energy of transferring the polypeptide chain in each of these states from vacuum into aqueous solution. The free energies of transfer of model compounds indicate that the solvation of polar and charged groups in proteins is strongly favorable, that of aromatic groups is slightly favorable, and that of aliphatic groups is unfavorable. The favorable solvation of polar and charged groups is due to a negative enthalpy associated with favorable van der Waals interactions, hydrogen bond formation, and dipole-dipole or charge-dipole interactions with the solvent molecules. The slightly favorable solvation of aromatic groups is also enthalpic in origin and arises from favorable van der Waals and weak hydrogen bonds to the solvent molecules. The unfavorable solvation of aliphatic groups is due to the hydrophobic effect. This is a complex effect, which is still not completely understood. At room temperature, it is predominantly entropic in nature, involving the ordering of water molecules around the solute, but at other temperatures has both entropic and enthalpic contributions. The solvation data from these model compounds indicates that the solvation of polar and aromatic groups stabilizes the D state more than the N state, because in the D state such groups are more exposed to solvent. In contrast, solvation of aliphatic groups strongly destabilizes the D state with respect to the N state, because a large fraction of these groups is buried in the protein core in the N state but considerably exposed in the D state. 1.7 Intramolecular Interactions and Solvation Taken Together
A change in a thermodynamic quantity of folding, such as Gfold ; Hfold , or Sfold , resulting from a point mutation cannot be decomposed experimentally into the effects of intramolecular interactions and those of solvation. Therefore, it is common to assess the importance of the combined effects of intramolecular interactions and solvation on protein stability, of the different classes of chemical groups. Aliphatic groups stabilize the N state and destabilize the D state because they form more and stronger intramolecular van der Waals interactions and are
1 Physical Forces that Determine Protein Conformational Stability
excluded from solvent in the core of the protein in the N state. It is frequently argued that the stabilization afforded by the intramolecular van der Waals interactions in the N state is almost exactly offset by the intermolecular van der Waals interactions of the polypeptide chain with water in the D state (contained in the solvation free energy). Therefore, van der Waals interactions as a whole would not contribute to protein conformational stability, and the stability afforded by aliphatic groups is due to the destabilizing effect of hydrophobic solvation in the D state. However, the enthalpy of dissolution of organic crystals in water (as a model of the unfolding of the densely packed protein core) is unfavorable. Privalov and coworkers have interpreted this result as indicating that the van der Waals interactions in the crystal are stronger than between the organic molecule and water, as a consequence of the much less dense, open structure of water, which leads to fewer and weaker contacts than in the crystal [39]. They extend this explanation to protein stability, arguing that the intramolecular van der Waals interactions in the N state are stronger than the intermolecular van der Waals interactions with water in the D state and, therefore, van der Waals interactions are an important force in stabilizing the N state. However, based on molecular dynamics simulations, Karplus and coworkers suggested a different interpretation of the dissolution data [22]. They find that the intramolecular van der Waals interaction energy in the crystal is approximately the same as the intermolecular van der Waals energy of interaction between the organic molecule and water. Therefore, they argue that van der Waals interactions hardly contribute to protein conformational stability, and that the unfavorable enthalpy of dissolution results from changes in water structure associated with cavity formation. Although, the net contribution of aliphatic groups is to destabilize the D state through the hydrophobic effect and to stabilize the N state through van der Waals interactions, in some mutants an inverse hydrophobic effect has been observed for very exposed hydrophobic residues in the N state [40]. This has been interpreted as resulting from the hydrophobic residues being more buried in the D state than in the N state. Aromatic groups probably stabilize the N state more than they do the D state, because they are most frequently buried in the N state establishing favorable intramolecular van der Waals and aromatic interactions, which strongly outweigh the slightly favorable solvation when exposed in the D state. In fact, the very unfavorable free energies of transfer of aromatic compounds from the pure liquid into water, in spite of their favorable solvation free energy, is thought to result from the much more favorable aromatic-aromatic interactions in the pure liquid. There is still controversy on the contribution of polar groups to protein conformational stability. Some authors defend that the favorable solvation of these groups when exposed to solvent in the D state almost exactly cancels the favorable intramolecular hydrogen bond and dipolar interactions they establish in the N state, and therefore their net contribution to Gfold is practically zero. Other authors argue that the intramolecular interactions of the polar groups in the N state outweigh their favorable solvation in the D state, and conclude that polar group interactions
9
10 Predicting Free Energy Changes of Mutations in Proteins
contribute about the same to Gfold as do the interactions of aliphatic and aromatic groups. Whatever its net contribution to protein stability, hydrogen bonding is always important for achieving a specific fold, since any unsatisfied hydrogen bond donor or acceptor group in the N state of the protein largely destabilizes this state with respect to the D state, in which such groups establish hydrogen bonds with water molecules.
1.8 Entropy
The entropy of the D state of proteins is considerably larger than that of the N state, because it consists of a structurally much more diverse ensemble of conformations. As such, entropy stabilizes the D state with respect to the N state. One may consider the entropy of a polypeptide chain to be composed of backbone entropy and side chain entropy, although they are coupled. Mutants that change the entropy of the backbone in the D state can affect conformational stability substantially. Thus, Xaa-Gly mutations lead to an increased flexibility of the D state, which corresponds to an increase in entropy and, consequently a decrease in GD . Since the mutation does not affect the entropy of the N state to the same degree, GN is approximately constant. Therefore, Gfold decreases in absolute value, representing a decrease in the stability of the N state with respect to the D state. Xaa-Pro mutations induce exactly the opposite effect. Thus, they lead to a decreased flexibility of the D state, with a consequent decrease in entropy and associated increase in GD . Gfold increases in absolute value, representing an increase in the stability of the N state with respect to the D state. Xaa-Cys mutations designed to form additional disulfide bridges can also drastically reduce the entropy of the D state, thus enhancing stability [41]. However, the disulfide bridge must have the correct geometry in the N state, otherwise conformational strain can totally offset the gain in stabilization afforded by the reduction in entropy of the D state. The change in side chain entropy upon folding is thought to be a major factor determining α-helix and β-sheet propensities of the various amino acids. The conformational freedom of side chains is lost to a differing degree upon folding from the D to the N state. In the D state, side chains are considerably flexible and can probably access most rotamer states. However, particular interactions in the D state can limit the conformational freedom of side chains [42]. In the N state, totally buried side chains and side chains involved in salt bridges, for example, have very little conformational freedom. However, totally exposed residues may maintain most of the freedom they had in the D state, whereas partially exposed residues may have an intermediate flexibility. Although it is frequently assumed, when interpreting protein conformational stability, that side chain entropy is totally lost upon folding, the differing degrees of side chain flexibility in the N state as well as in the D state should be considered to gain a detailed understanding of the effect
1 Physical Forces that Determine Protein Conformational Stability
of side chain entropy on stability. Clearly, this will only be totally feasible with the advent of more realistic models of the D state. 1.9 Cavity Formation
The effect of cavity-forming mutations on protein conformational stability can be difficult to interpret, since it involves a simultaneous change of several interactions. If a hydrophobic cavity that is accessible to solvent is formed, there will be a reduction in the number of intramolecular van der Waals interactions and the exposure of a hydrophobic surface to solvent. All other factors held constant, such mutations will, in general, be destabilizing. If a polar cavity is formed, it is frequently observed that structural water molecules fill the cavity in the mutant. If the group deleted from the wild-type protein was polar, the water molecule(s) may substitute for its interactions and there may be no change in stability. However, if the deleted group was hydrophobic, the exposure of polar groups to solvent, which in the wild-type protein had unsatisfied hydrogen bonding, and the removal of an exposed hydrophobic group will, in general, lead to a stabilization of the mutant.
1.10 Summary
Many different forces participate in determining the final stability of a protein, which results from the sum of these forces in the native and denatured states. In general terms, entropy favors the denatured state, while desolvation of apolar groups, van der Waals interactions, hydrogen bonds, and electrostatic interactions favor the native state. Experimentally it has been very difficult to assign absolute values to all these terms, since any mutation will affect many of these terms simultaneously. Thus, breaking a buried H-bond by mutating a Ser into Ala not only breaks the H-bond but also changes the solvation properties of the denatured state, produces a cavity in the native state, and reduces the van der Waals interactions in this state. The native structure can relax, thereby increasing backbone entropy locally, while decreasing side chain entropy by eliminating one degree of freedom of the Ser residue. As a consequence, when protein scientists speak of the H-bond contribution, or of the hydrophobic contribution of an amino acid group, they are not speaking in pure physical terms, but rather are just referring to the main contributor to the interactions being analyzed. Studies over many different proteins and mutants show that all forces contribute to protein stability and that different proteins could use different combinations of forces to achieve stability. In fact, and contrary to what many groups have postulated, it has been shown that it is easier to stabilize a protein through substitutions on the surface than in the protein core [43–45]. A good example of how regular folded structures can be achieved without
11
12 Predicting Free Energy Changes of Mutations in Proteins
using hydrophobic amino acids is the formation of β-sheet amyloid fibrils by polyGln polypeptides [46]. In this case without any doubt the H-bonding and van der Waals interactions dominate over the hydrophobic contribution.
2 Effect of Point Mutations on in vitro Protein Stability 2.1 General Considerations on Protein Plasticity upon Mutation
Over the past 20 years the protein engineering method was used as an efficient means to probe the chemistry underlying protein structure and interactions. The large amount of thermodynamic data obtained experimentally offers an appealing support for the development and assessment of predictive methods that estimate the effect of point mutation on protein stability. On average, single deletion or conservative point mutations can alter the stability of a protein by 10–20% of its total stability (the upper limit is around 50% of the total stability) and is supposed not to cause too drastic changes in the three-dimensional structure. The observation that the structure is little affected by single deletion or conservative point mutations has been obtained from extensive works of different groups that have systematically solved the structure of the mutated protein [47–52]. The mutation of buried residues is expected to be most deleterious to protein structure and is discussed briefly here. As regards mutants in which a chemical group is deleted, slight adjustments usually partially accommodate changes in side-chain volume, but only to a limited degree [53–55]. In fact, it appears that full compensations are unlikely because it is difficult to reconstitute the equivalent set of interactions that were present in the wild-type structure (Figure 1). In the case of a new group inserted in a protein core, the changes in structure tend to accommodate the introduced side chains by rigid-body displacements of groups of linked atoms, achieved through relatively small changes in torsion angles (Figure 2). It is rare, for instance, that a side chain close to the site of substitution changes to a different rotamer [56]. In a sense these observations are consistent with the fact that proteins in their native state are in a deep energy well. Alternative structures are higher in energy and not populated as long as the destabilization effect is limited. Only when several side chains are mutated at one time are wider rearrangements observed in the set of interactions that stabilize protein structure [55, 57]. In cases of extreme perturbation (such as the insertion of amino acids inside a helix), experimental observations reveal an amazing plasticity of proteins that can successfully change and adapt their structures (Figure 3), [58]. Such modifications are still far from being predicted by computational methods. Yet, for the majority of the point mutants considered by the predictive algorithms, the assumption that the structure does not undergo large changes is a reasonable one, although some exceptions have also been shown [59].
2 Effect of Point Mutations on in vitro Protein Stability
Fig. 1 Representation of the structural perturbation of side-chains in the core of the T4 lysozyme upon deletion of chemical groups. The wild-type (PDB: 4LZM) is in yellow and the mutant F153A and L99A (PDB: 1L89) is in blue.
2.2 Predictive Strategies
A wide range of different strategies exist to account for the effect of point mutations on protein stability [60]. The methods can be divided into three classes: (i) statistically based methods (SEEF) that rely on the analysis of either sequence or structure databases to transform probabilities into energies, (ii) protein engineering based methods (EEEF), which are empirically based methods that take advantage of thermodynamic data obtained on protein mutants, and (iii) physically based methods (PEEF) that rely on the derivation of energy terms from model compounds and of rigorous treatment of thermodynamics to calculate free energy changes upon mutation. In the following we consider each of the three approaches, analyzing the bases and limits of their current predictive power and attempting to draw out the critical features that should be implemented for future progress.
13
14 Predicting Free Energy Changes of Mutations in Proteins
Fig. 2 Representation of the structural perturbation of side-chains in the core of the T4 lysozyme upon insertion of chemical groups. The wild-type (PDB: 4LZM) is in yellow and the mutant A129W (PDB: 1QTD) is in blue.
2.3 Methods 2.3.1 From Sequence and Multiple Sequence Alignment Analysis Several works suggest that the information obtained from multiple sequence alignments could be converted into knowledge-based potentials helpful for guiding protein engineering. Indeed, sequence profiles include in an implicit manner a combination of important protein traits, such as stability, structural specificity, and solubility. For instance, protein engineering experiments based the predictive approach solely on multiple sequence alignments to achieve a successful stabilization of several protein folds [61, 62]. Looking at the reason for the success of this strategy, it was observed that in 25% of the cases the increase in stability could be explained by an enhanced propensity of the substituted residue for the local backbone conformation at the mutated site [61]. This conclusion was reached based on the predictions of local structures using the I-sites library [63]. It would be interesting to use libraries of local structures such as the I-sites library [63] or strategies such as those used in AGADIR [64] to account implicitly for the energetic properties of local structures that cannot be described adequately by the basic α or β secondary structure classes. This may also be an interesting way to
2 Effect of Point Mutations on in vitro Protein Stability
Fig. 3 Representation of the structural perturbation induced by the insertion of a triplet of amino acids inside a helix of the T4 lysozyme. The wild-type (PDB: 4LZM) is in yellow and the mutant (PDB: 1L74) is in blue with the inserted region in red.
calculate the energy of the unfolded state when some residual structure is likely to be formed. However, used as a unique source of information, multiple sequence alignments are unlikely to provide quantitative estimates of the effect of a mutation on protein stability. Yet, if the structure of the protein target is unknown they become the only useful method for rationally modifying the sequence (and might be useful to improve the sample properties as shown in the field of structural genomics [65].
15
16 Predicting Free Energy Changes of Mutations in Proteins
2.3.2 Statistical Analysis of the Structure Databases Statistical potentials derived from the protein structure database have been widely used in the field of protein structure prediction. As mentioned by Lazaridis and Karplus [66], they have the advantage of being quick to compute, of accounting implicitly for complex effects that may be difficult to consider otherwise, and of perhaps being less sensitive to errors or atom displacement in the protein structure. In the case of protein design however, pairwise statistical potentials for residue-residue interactions may not be adequate for the structural stringency that is required [67, 68]. An elegant method has been developed that may overcome this limitation [69]. A four-body statistical potential, SNAPP, was developed based on the Delaunay tesselation of protein structure. The protein structure is divided into tetrahedra whose edges represent all nearest neighbors of side chain centroids. Quite strong correlations were obtained in the prediction of point mutations of hydrophobic core residues. Yet as discussed further, prediction of the effect of point mutations on buried hydrophobic residues is not the most difficult to achieve. Simple counting methods were shown to perform quite well also [70]. The tessellation strategy might be more difficult to extend to surface accessible positions and a four-body potential method might be less successful for polar residues. Statistical potentials based on the linear combination of database-derived potentials such as the PoPMuSiC algorithm have also been shown to have good predictive power, especially at fully exposed locations in proteins [71–73]. This program (accessible at babylone.ulb.ac.be/popmusic/) was successfully used to guide the engineering of an α 1 -antitrypsin protein [74] and helped to identify that cation-π interactions are likely to play a key role in the ability of proteins to undergo domain swapping [75]. The latter is an interesting example of how database-derived potentials can reveal the role of specific interaction types that would have not been detected with atomic potentials that do not explicitly integrate this type of interaction.
2.3.3 Helix/Coil Transition Model The evolution of the algorithms based on the helix/coil transition model illustrates beautifully the extent to which the field of prediction could benefit from protein engineering data. Because of its apparent simplicity, the helix structure has long been recognized to be an excellent model for understanding the fundamentals of protein structure and folding and for probing the effects of mutations on protein stability. The fact the helices are mainly stabilized through local interactions prompted many groups to tackle the analysis of small helical peptides whose sequences could be extensively varied. Short monomeric peptides present less context dependence than proteins. As mentioned in the previous section, changes in local interactions are also likely candidates to contribute to the stability of the native state and the denatured ensemble. Such systems have been successfully used to dissect the contribution of local interactions to helix stability (for reviews see Refs. [76] and [77], and have then been used to analyze β-hairpin and β-sheet formation (for a review see Ref. [77]).
2 Effect of Point Mutations on in vitro Protein Stability
Nowadays spectroscopic analysis, either by circular dichroism (CD) or nuclear magnetic resonance (NMR), has been carried out for a large number of short peptides encompassing helices of natural proteins. While short-range interactions are not enough to determine a single definite helix structure in a peptide, they do determine helix propensities, experimentally observed as helical populations in pepti-des that are different for every sequence and for each residue in the peptide. Thus, accurate predictions of helix stability require a statistical mechanics approach in which all the possible helical conformations in a peptide and all the energy contributions are taken into account. Its simplest version, postulated by Zimm and Bragg [78], used equilibrium constants characteristic of each amino acid to represent the nucleation and elongation of helical segments. Later versions of the helix/coil transition theory algorithms include detailed interaction terms such as capping interactions, side chain-side chain interactions, i, i + 3 and i, i + 4 electrostatic effects, interaction of charged groups with the helix dipole, etc. (for reviews see Refs. [76, 77, 79, 80]. These terms have been introduced, either to follow experimental data [64, 76, 81– 86], or to correspond to the statistical analysis of the protein database [84]. The latest version of one of these algorithms, AGADIR1s-2, includes: local motifs, a position dependence of the helical propensities for some of the 20 amino acids, an electrostatic model that takes into consideration all electrostatic interactions up to 12 residues in distance in the helix and random coil conformations and the effect of ionic strength [64]. This algorithm predicts with an overall standard deviation value of 6.6% (maximum helix is 100%), the CD helical content of 778 peptides (223 correspond to wild-type and modified protein fragments), as well as the conformational shifts of the C H protons, and the 13 C and 15 N J coupling values (web access at www.embl-heidelberg.de/Services/serrano/agadir/agadir-start.html). The important point for our discussion here is that these methods, originally developed to predict the properties of peptides, could also be used to predict the effect of mutation on protein stability. Several experimental results are now confirming that the use of AGADIR could help understanding or designing the effect of mutations on protein stability [43–45]. The development of an AGADIR-like program for the quantitative prediction of β-hairpin stability has been somewhat less successful than in the case of helices. Several nonaggregating peptides exhibiting a large amount of β-hairpin structure were studied in the 1990s and allowed experimental analysis of the specific interactions both at the turn regions and in the strands [87–91]. In general, a good correlation was found between the experimental data and the statistical analysis of the protein database. As more experimental information is available, the description of hairpin formation in a beta/coil transition algorithm might be undertaken. However, due to the complexity of the interactions present in β-sheets, a residue-based empirical description such as the one used in AGADIR is unlikely to be successful. For example, the energetic contribution of a pairwise interstrand side-chain interaction appears to be dependent on the face of the β-sheet on which it is located, and intrinsic propensities are likely to display a significant position dependence along the strands as well as in the different turn positions, and will vary for the different turn types. Furthermore parallel β-sheets will differ in their properties from the
17
18 Predicting Free Energy Changes of Mutations in Proteins
antiparallel β-sheets. Taken together, these different contributions would yield a huge number of parameters to be experimentally determined for any empirical model of β-sheet formation. As suggested in the next section, a rational approach to quantitatively predict the effect of mutations on β-hairpin and β-sheet stability will be more successful if, in addition to the residue based description used in AGADIR, an atomic-based description is included. Simple peptide models adopting β-hairpin and conformations hence constitute crucial systems for testing and refining such algorithms.
2.3.4 Physicochemical Method Based on Protein Engineering Experiments The physicochemical reasoning methods originate from the rationalization of results obtained from a large quantity of protein engineering data. They have been designed from the assembly of empirical rules that are likely to reflect most of the stability variations in proteins observed upon mutation. The resolution of more than 100 crystal structures of point mutants and the creation of databases gathering all available mutant information such as the Protherm [92] or ASEdb [93] databases constituted decisive steps in pushing forward a large-scale validation of these predictive strategies. We have identified four independent methods recently published that tackled the challenge of developing tools for predicting G upon any type of mutation: the SPMP [94, 95], the FOLD-X energy function (FOLDEF) [96] accessible on the web at fold-x.embl-heidelberg.de), the Kortemme et al. method [97], and the Lomize et al. method [98]. The aim of these methods is to estimate free energies from a single structural conformation. This means that they not only consider the potential energies associated with the interactions but also integrate effects such as the protein confor-mational entropy or the solvation. Because the exploration of the conformational space is limited in these methods, much of these effects have to be accounted for in an implicit manner. Also, the structure of the denatured state is not taken into account explicitly, rather interactions with close neighbours are neglected. Hence, in these methods, an initial fitting procedure is required to adjust the weights of the various explicit and implicit contributions to protein stability. In the training procedure, the structures of the mutants were either modeled from the wild-type structure (FOLDEF, Kortemme et al.) or obtained from experimentally determined X-ray structures (SPMP, Lomize et al.). The fact that most mutations are conservative (involving the deletion or the substitution of chemical groups) prevents large modeling issues. Table 1 summarizes the major factors included in the methods and briefly presents the way the energetic contributions were implemented. A global agreement is reached between all the methods regarding the minimal set of terms that are to be considered. For each term, however, depending on the method, a variable number of parameters were used in the fit. Apart from the work by Lomize et al., in which the different interactions were independently fitted with respect to the atom types (18 parameters all in all), the total number of adjustable parameters is around 8.
ASA-based model (2 parameters) Distance-dependent value. Three terms depend on protein or water interactions
Solvation
Explicit hydrogen bond
None
Van der Waals
Derived from average of model compounds analysis Constant value. Fitted
371 mut. + 233 in complexes None Protein atomic density
667 mut. + 82 in complexes None Protein atomic density from the atom volumic contact parameter Set by the atom volumic contact (atomic density)
56 mut. with X-ray structures High B-factor residues ASA based
Blind test database Excluded mutations Accessibility estimation
Lennard-Jones with two parameters for attractive and repulsive terms Lazaridis, and Karplus solvation model Angular, distance, and solvent-accessible dependence. Different terms for backbone and side chains. Four parameters
743 mut.
339 mut.
54 mut. with X-ray structures
8
8
Kortemme et al.
9
FOLD-X
Number of adjustable parameters Training database
SPMP
(Continued)
Lennard-Jones like. Nine terms depending on atom types ASA-based model (fizve parameters) 10–12 Lennard-Jones like potential. Three terms depending on atom types
106 mut. with X-ray structures 10 mut. with X-ray structures Water-accessible residues ASA based
18
Lomize et al.
Table 1 Summary of the implementation of the energetic factors included in four different methods for predicting free energy changes upon mutation
2 Effect of Point Mutations on in vitro Protein Stability 19
Propensities from Ramachandran plot statistics Yes. From statistical table
Two-states propensity α/β
Yes. From statistical table
Yes from X-ray
Yes
Backbone entropy
Side chain entropy
Explicit water
Cavity Other specificities
Yes. Position of water bridges predicted None Helix capping term Helix dipole term
Coulombic dielectric constant varying with atomic density
FOLD-X
None
SPMP
Electrostatics except H-bonds
Table 2 (Continued)
None Allows side chain flexibility (Monte-Carlo)
None
Coulombic dielectric constant varying with interatomic distance Propensities from Ramachandran plot statistics None
Kortemme et al.
None Explicit account of torsion strains in side chains
Yes. From statistical analysis (α/β differences) None
None
None
Lomize et al.
20 Predicting Free Energy Changes of Mutations in Proteins
2 Effect of Point Mutations on in vitro Protein Stability
A major feature of these methods is that the amplitudes of the various terms are scaled with respect to the solvent proximity. The scaling factor is calculated either from the accessible surface area or the density of protein atoms, which are two highly correlated parameters. In the case of the entropy terms (derived from statistical analyses of the structures database) the scaling accounts implicitly for the fact that, at the surface, side chains or backbone loops are more flexible. In the FOLD-X algorithm, this conformational entropy is also dependent on the existence of hydrogen bond interactions that are likely to freeze the conformations. The contributions of the solvation effects (derived or consistent with model compound analysis) appear to be approximated in a reasonable manner by the use of the scaling factors. Penalties associated with the burial of polar groups represent the largest contribution to the destabilization of the protein tested in these methods. The implicit treatment of solvation effects is, however, questioned to account for the effect of so-called structural water (or water bridges), which are water molecules making more than two hydrogen bonds with the protein atoms. In the SPMP and FOLD-X methods, an explicit treatment of water bridges was found absolutely required for predicting the G of mutants in which a water bridge had been altered. The SPMP method detailed the effect of water even further by considering explicitly the entire water shell network surrounding the protein [99]. This method relied on the X-ray structure of the mutants obtained at very low temperature so that the thermal fluctuations do not impede the observation of water molecules network around the protein. When the structure of the mutant is not known, a fast predictive method for positioning the water molecules is required. The one proposed in [100] was for instance adapted in the FOLD-X method. It is important to note that in their study, Kortemme et al. suggested that the method they used for hydrogen bond calculation allowed an implicit treatment of the effect of these structural water molecules [97]. The role and the way to compute the effects of hydrogen bonds in protein stability has been a matter of debate for some years [101, 102]. In all the methods based on protein engineering analysis, an explicit consideration of the hydrogen bond with distance and angular constraints was used. Other recent studies also suggest that, in order to score a unique structural conformation, the angular and distance constraints inherent to the formation of hydrogen bonds are better modeled by a specific expression than by a global term accounting for all the electrostatic interactions [103, 104]. Different databases were used to test these potentials, which makes a ranking comparison between the methods rather difficult. For the FOLDEF, SPMP, and Kortemme et al. methods a precision ranging from 0.7 to 1 kcal mol−1 of standard deviation between prediction and experiments was obtained. The fourth method (Lomize et al.) produced very low rmsd errors between theoretical and experimental (0.4 kcal mol−1 and 0.57 kcal mol−1 for the training database or the blind test database respectively). Yet the mutant databases used in the latter studies were limited to fully buried residues (test carried out on solvent accessible residues led to mean unsigned errors of 1.4 kcal mol−1 ) and the blind test database was restricted to 10 point mutants.
21
22 Predicting Free Energy Changes of Mutations in Proteins
Interestingly, each method has specifically focused on at least one term of the energy function: (i) the SPMP method has developed specific analyses for explicit water molecules, (ii) the FOLD-X focused on the way entropy penalties are applied, (iii) the Kortemme et al. method derived a distance and angular dependent expression for hydrogen bonds, and (iv) the Lomidze et al. method used a specific Lennard Jones potential for each atom type interaction. A combination of these specific developments may push forward the current limits of the predictive algorithms. The way hydrogen bonds strength and charge-charge electrostatic interactions are modulated with solvent accessibility and with long-range interactions is probably one of the lines of improvement that can still be achieved in these methods. Yet, to be precise enough, this improvement will require the inclusion of side chain or backbone conformational exploration as was already partly included in the Kortemme et al. method. An interesting test case to those questions has been proposed in the study of a surface salt bridge K11/E34 in ubiquitin and of the mutant in the reverse orientation E11/K34 [31]. It is clearly shown that long-range interactions contribute to the strength of these salt bridges and that a simple model can predict the effect of the surrounding on this type of interaction. Another track for improvement is related to the selection of the mutants tested in the fitting procedure. The fitting of the adjustable parameters tends to prioritize mutants with a larger G. Contexts or interactions that involve smaller G or interactions that are little represented in the database may tend to be underoptimized. For instance, in the case of mutations made at the surface of a protein, the amplitudes of G are on average smaller than those for the buried ones. Also, mutations which alter the cap of a helix or the partners of a structural water are relatively rare, but their corresponding G can be quite large. In all those cases, the fitting procedure is likely to limit the optimization of the parameters. A strategy to improve the current algorithms might be to restrict the databases to mutants that specifically address certain classes of interactions, such as the interaction with water bridges or the very solvent exposed ones.
2.3.5 Methods Based only on the Basic Principles of Physics and Thermodynamics These methods use a combination of basic principles of physics and thermodynamics to reproduce with the highest fidelity possible the complex balance of interactions occurring in proteins. With respect to the more empirical methods described above, they restrict as far as possible the number of implicit terms they have to consider in the calculation. The great interest of such approaches is that the origin of any perturbation can then be analyzed and understood in the greatest detail. In principle, the free energy change between two states can be determined by evaluating the partition function in each state. In molecular systems as complex as proteins, however, one cannot sample thoroughly the conformational space to obtain accurate free energies. One of the strategies used instead is to break the change into many smaller steps, so that the free energy
2 Effect of Point Mutations on in vitro Protein Stability
change for each step is evaluated by perturbation or thermodynamic integration methods [105–107]. In this method, the G between a state A and B can be expressed with respect to the Hamiltonian of each state, HA and HB by defining a λ factor so that H(λ)=λHB +(1−λ)HA
(1)
as follows: 1. Using the free energy perturbation method
G=G B −G A =
1
−RT lne −H /RT λ , where H =Hλ+dλ −Hλ
(2)
λ=0
2. Using the thermodynamic integration method
G=
λ=1 λ=0
∂H dλ ∂λ
(3)
This methodology has been applied to a wide range of free energy calculations dedicated to the study of enzymatic reactions, conformational changes, binding of proteins, nucleic acids or small molecules [108], and also of the effects of mutations on protein stability. Compared with other fields of application, the estimation of protein stability is relatively difficult. Indeed, contrary to the analysis of binding between macromolecules, the estimation of the G of folding requires the energy calculation of a highly flexible and still poorly characterized state, the denatured state ensemble. Since the early 1990s, molecular dynamics strategies coupled to free energy calculations have been used to analyze the stability changes of about 30 different point mutations in experimentally studied proteins [109–119]. The agreements between experimental and predictions have been good in the large majority of the cases observed, although significant discrepancies could also be observed [120]. The importance of the denatured state in the free energy calculation has been highlighted in many of the most recent calculations. It supports the experimental observation of the existence of residual structure, often with native like properties, in the denatured state of protein as reported in the first section. The systematic calculations carried out on 10 different mutants of chymotrypsin inhibitor 2 (CI2), with various conditions for the representation of the denatured protein emphasize the importance of considering a proper denatured state structure ensemble [118]. This analysis illustrates that when the denatured state is taken into account in the calculation, the predictions are much more successful (correlation going from 0.68 to 0.82). Besides, depending on the model, either an extended tripeptide or a more realistic full-length simulated structure of the denatured state, clear
23
24 Predicting Free Energy Changes of Mutations in Proteins
improvements in the simulations could be obtained. Hence, in the case of the hydrophobic mutations considered there, the quality of the prediction was shown to depend on the model of the denatured state. Similarly, the works by Kitao’s group [114, 116] showed a better agreement between experimental values and theory when they changed their reference state from a pentapeptide with an extended conformation to a more realistic pentapeptide model with the native like conformation. As stated above, the advantage of molecular dynamics simulations or alternative methods that are based on the basic principle of physics is that they allow a detailed inspection of the origin of the stabilization or destabilization. Yet, the way free energies can be divided into energy (such as Lennard Jones, electrostatic, etc.) or residue terms is not a trivial issue. Several theories regarding the path dependence of the free energy components have been advanced [107, 121–125]. Under controlled conditions and proper choice of the λ coupling parameter it seems possible to extract some meaningful information. Unexpected results were observed, for instance, in the case of mutations involving apolar residues. The reason for the loss of stability in the V57A mutant in CI2 or the L56F mutant in T4 lysozyme could be interpreted as a larger variation in the electrostatic component. These particular mutants were not well predicted by the empirical-based methods and this fact emphasizes the advantage of mixing various approaches to get further insight in the origin of protein stabilization or destabilization. Because it is based on molecular dynamics simulation, the free energy calculation method can account for the change in structure provoked by a mutation and does not depend on the assumption that the structure is unchanged upon mutation. Yet compared with the less sophisticated empirical methods the computer time required to run a calculation on one mutant is several orders of magnitude larger. The most exhaustive study is still limited to eight to ten mutants. It would, however, be of great interest to get results from the calculation of a larger number of point mutants so that a statistical estimation of the error made by these methods could be more easily made. To speed up and overcome the sampling problems mixed strategies using Monte-Carlo sampling coupled to molecular dynamics have been proposed [117]. It is also important to mention alternative strategies using optimized methods for extensive sampling of the side chain conformational space. In this case however the backbone is considered rigid. Instead of extracting free energies from molecular dynamic simulation, these methods account for the conformational entropy component by an extensive sampling of side chain conformations. Different levels of sophistication in the force field can be included in these methods [126–128]. For instance the method in Ref. [128] included an implicit solvation term that allowed the calculation of a large number of point mutants not only at solventburied positions but also at solvent-exposed ones. This line of research is, to our view, one of the most promising for the prediction of the effect of point mutation on protein stability since it combines a high sampling quality and a force field that can depend both on physical and statistical terms to achieve the highest prediction rate.
3 Mutation Effects on in vivo Stability
3 Mutation Effects on in vivo Stability
A frequently forgotten aspect of proteins is that in general they need to perform their function in vivo. The cell has evolved a series of mechanisms to ensure the removal of proteins that have already carried out their function or of polypeptide chains that are faulty at synthesis or become misfolded. Thus there are some cryptic signals in proteins that are recognized by a cell and determine the half-life of a protein or target for destruction. Knowing about these signals is important since they could determine the outcome of an experiment (i.e., yield, protein activity, etc.). In general these signals could be classified as follows: (1) N-terminal rules, (2) C-terminal rules, and (3) PEST sequences. However, it is possible that many other spurious signals exist and result in rapid degradation of an otherwise stable protein. 3.1 The N-terminal Rule
The N-end rule relates the in vivo half-life of a protein to the identity of its N-terminal residue. Similar but distinct versions of the N-end rule operate in all organisms examined, from mammals to fungi and bacteria. In eukaryotes, the N-end rule pathway is a part of the ubiquitin system. In Escherichia coli, three genes have been identified, clpA, clpP, and aat, as part of the degradation machinery involved in the N-terminal rule. In Table 2 we show data for the half-life of a protein having different N-terminal amino acids. For a review see Ref. [129]. Approximate in vivo half-lives of X-β-galactosidase (βgal) proteins in E. coli at 36◦ C [130] and in S. cerevisiae at 30◦ C [131, 132]. A question mark at Pro indicates its uncertain status in the N-end rule (see text). From Ref. [129].
3.2 The C-terminal Rule
In E. coli it has been found that when the ribosome encounters a problem in protein synthesis it attaches an 11 amino acid peptide (the ssrA degradation tag) to the Cterminus of the stalled protein chain and then the protein is released and targeted for degradation by the ClpX system [133]. Consequently it has been found that proteins containing an unstructured C-terminal segment with a hydrophobic amino acid at the C-terminus could be targeted for degradation by the ClpX machinery and that the efficiency of the degradation is related to the hydrophobic composition of the last five amino acids [134]. 3.3 PEST Signals
Polypeptide regions rich in proline (P), glutamic acid (E), serine (S), and threonine (T) (PEST) target intracellular proteins for destruction [135]. Attachment of PEST
25
26 Predicting Free Energy Changes of Mutations in Proteins Table 2 The N-end rule in E. coli and Saccharomyces cerevisiae
In vivo half-life of X-βgal (min) Residue X in X-gal Arg Lys Phe Leu Tr p Ty r His Ile Asp Glu Asn Gln Cys Ala Ser Thr Gly Va l Pro Met
in E. coli
in S. cerevisiae
2 2 2 2 2 2 >600 >600 >600 >600 >600 >600 >600 >600 >600 >600 >600 >600 ? >600
2 3 3 3 3 10 3 30 3 30 3 10 >1200 >1200 >1200 >1200 >1200 >1200 ? >1200
sequences to proteins results in the fusion protein being degraded from 2- to almost 40-fold faster than the parental molecule [136]. 4 Mutation Effects on Aggregation
When making a mutation or designing a protein, a frequently forgotten aspect is that the folded protein is in equilibrium with the denatured state and more important that the denatured state, or folding intermediates, can lead to aggregation. It is important at this point to make the distinction with the random process of precipitation, which is related to issues of solubility and does not appear to alter the structure of the proteins, as is seen for example by salting out using ammonium sulfate salts. Protein aggregation has long been thought of as an unspecific process caused by the formation of nonnative contacts between protein folding intermediates. This view was supported by the wide variation of morphologies of aggregates that were observed by techniques such as electron microscopy and atomic force microscopy. Spectroscopically, two major types of aggregates are observed: the ones that involve the formation of β-sheet [137] and the ones that retain the native spectrum [138], although in a few cases helical aggregates can also be found. Both principal forms of initial aggregates can ultimately be converted to amyloid-like
References
fibers that are invariably rich in cross-beta structure [137]. It seems therefore that nearly all proteins when aggregating will end up in a universal cross-beta structure but that the mechanism of getting there is determined by the competition between retaining the native structure and forming cross-β contacts. Mutations that favor protein aggregation can be classified into those that destabilize the protein and increase the concentration of aggregation prone species (denatured and intermediate conformations) [139], or those that increase per se the tendency of an amino acid sequence to aggregate [140]. Analysis of the effect of different mutations on the aggregation tendency of unfolded Acp and other peptides [141] has illustrated the importance of secondary structure propensity, hydrophobicity, and charges in the process. Recently, a statistical mechanics algorithm, TANGO, based on simple physicochemical principles of β-sheet formation extended by the assumption that the core regions of an aggregate are fully buried, has been developed to predict protein aggregating regions. TANGO predicts with surprisingly good accuracy the regions experimentally described to be involved in the aggregation of 176 peptides of over 20 proteins. The predictive capacities of TANGO are further illustrated by two examples: the prediction of the aggregation propensities of Aβ1–40 and Aβ1–42 and in several disease-related mutations of the Alzheimer’s β-peptide as well as the prediction of the aggregation profile of human acyl phosphatase. Thus, by capturing the energetics of structural parameters observed to contribute to protein aggregation and taking into account competing conformations, like α-helix and β-turn formation, it is possible to identify protein regions susceptible of promoting protein aggregation. The success of TANGO shows that the underlying mechanism of cross-β formation aggregates is universal [142]. It is expected that improvements of this or similar algorithms developed by other groups in conjunction with other software used to measure the impact of mutations on protein stability will allow us to design new proteins taking into account all the states and interactions.
References 1 VIEILLE, C. and ZEIKUS, G. J. (2001). Hyperthermophilic enzymes: sources, uses, and molecular mechanisms for thermostability. Microbiol Mol Biol Rev 65, 1–43. 2 DILL, K. A. (1990). Dominant forces in protein folding. Biochemistry 29, 7133–7155. 3 DILL, K. A. and SHORTLE, D. (1991). Denatured states of proteins. Annu Rev Biochem 60, 795–825. 4 SHORTLE, D. (1996). The denatured state (the other half of the folding equation) and its role in protein 12 stability. FASEB J 10, 27–34.
5 SHORTLE, D. (2002). The expanded denatured state: an ensemble of conformations trapped in a locally encoded topological space. Adv Protein Chem 62, 1–23. 6 MILLETT, I. S., DONIACH, S., and PLAXCO, K. W. (2002). Toward a taxonomy of the denatured state: small angle scattering studies of unfolded proteins. Adv Protein Chem 62, 241–262. 7 KARPLUS, M., ICHIYE, T., and PETTITT, B. M. (1987). Configurational entropy of native proteins. Biophys J 52, 1083–1085. 8 STURTEVANT, J. M. (1994). The thermodynamic effects of protein
27
28 Predicting Free Energy Changes of Mutations in Proteins
9
10
11
12
13
14
15
16
17
18
mutations. Curr Opin Struct Biol 4, 69–78. KORTEMME, T., KELLY, M. J., KAY, L. E., FORMAN-KAY, J., and SERRANO, L. (2000). Similarities between the spectrin SH3 domain denatured state and its folding transition state. J Mol Biol 297, 1217–1229. LIETZOW, M. A., JAMIN, M., JANE DYSON, H. J., and WRIGHT, P. E. (2002). Mapping long-range contacts in a highly unfolded protein. J Mol Biol 322, 655–662. KLEIN-SEETHARAMAN, J., OIKAWA, M., GRIMSHAW, S. B. et al. (2002). Long-range interactions within a nonnative protein. Science 295, 1719–1722. MAYOR, U., GROSSMANN, J. G., FOSTER, N. W., FREUND, S. M., and FERSHT, A. R. (2003). The denatured state of Engrailed Homeodomain under denaturing and native conditions. J Mol Biol 333, 977–991. ACKERMAN, M. S. and SHORTLE, D. (2002). Molecular alignment of denatured states of staphylococcal nuclease with strained polyacrylamide gels and surfactant liquid crystalline phases. Biochemistry 41, 3089–3095. MOK, Y. K., KAY, C. M., KAY, L. E., and FORMAN-KAY, J. (1999). NOE data demonstrating a compact unfolded state for an SH3 domain under non-denaturing conditions. J Mol Biol 289, 619–638. CHOY, W. Y. and FORMAN-KAY, J. D. (2001). Calculation of ensembles of structures representing the unfolded state of an SH3 domain. J Mol Biol 308, 1011–1032. SHORTLE, D. and ACKERMAN, M. S. (2001). Persistence of native-like topology in a denatured protein in 8 M urea. Science 293, 487–489. MOK, Y. K., ELISSEEVA, E. L., DAVIDSON, A. R., and FORMAN-KAY, J. D. (2001). Dramatic stabilization of an SH3 domain by a single substitution: roles of the folded and unfolded states. J Mol Biol 307, 913–928. SANCHEZ, I. E. and KIEFHABER, T. (2003). Hammond behavior versus ground state effects in protein folding: evidence for narrow free energy barriers and residual
19
20
21
22
23
24
25
26
27
28
29
30
31
structure in unfolded states. J Mol Biol 327, 867–884. CREIGHTON, T. E. (1991). Stability of folded conformations. Curr Opin Struct Biol 1, 5–16. MATTHEWS, B. W. (1991). Mutational analysis of protein stability. Curr Opin Struct Biol 1, 17–21. FERSHT, A. R. and SERRANO, L. (1993). Principles of protein stability derived from protein engineering experiments. Curr Opin Struct Biol 3, 75–83. LAZARIDIS, T., ARCHONTIS, G., and KARPLUS, M. (1995). Enthalpic contribution to protein stability: insights from atom-based calculations and statistical mechanics. Adv Protein Chem 47, 231–306. MAKHATADZE, G. I. and PRIVALOV, P. L. (1995). Energetics of protein structure. Adv Protein Chem 47, 307–425. HONIG, B. and YANG, A. S. (1995). Free energy balance in protein folding. Adv Protein Chem 46, 27–58. JAENICKE, R., SCHURIG, H., BEAUCAMP, N., and OSTENDORP, R. (1996). Structure and stability of hyperstable proteins: glycolytic enzymes from hyperthermophilic bacterium Thermotoga maritima. Adv Protein Chem 48, 181–269. PACE, C. N., SHIRLEY, B. A., MCNUTT, M., and GAJIWALA, K. (1996). Forces contributing to the conformational stability of proteins. FASEB J 10, 75–83. CREIGHTON, T. E. (1993). Proteins: Structures and Molecular Properties, 2nd edn. W. H. Freeman and Co., New York. KUMAR, S. and NUSSINOV, R. (1999). Salt bridge stability in monomeric proteins. J Mol Biol 293, 1241–1255. MARTIN, A., KATHER, I., and SCHMID, F. X. (2002). Origins of the high stability of an in vitro-selected cold-shock protein. J Mol Biol 318, 1341–1349. TORREZ, M., SCHULTEHENRICH, M., and LIVESAY, D. R. (2003). Conferring thermostability to mesophilic proteins through optimized electrostatic surfaces. Biophys J 85, 2845–2853. MAKHATADZE, G. I., LOLADZE, V. V., ERMOLENKO, D. N., CHEN, X., and THOMAS, S. T. (2003). Contribution of
References
32
33
34
35
36
37
38
39
40
41
surface salt bridges to protein stability: guidelines for protein engineering. J Mol Biol 327, 1135–1148. PACE, C. N., ALSTON, R. W., and SHAW, K. L. (2000). Charge-charge interactions influence the denatured state ensemble and contribute to protein stability. Protein Sci 9, 1395–1398. SCHEINER, S., KAR, T., and GU, Y. (2001). Strength of the Calpha H O hydrogen bond of amino acid residues. J Biol Chem 276, 9832–9837. CHAMBERLAIN, A. K. and BOWIE, J. U. (2002). Evaluation of C H O hydrogen bonds in native and misfolded proteins. J Mol Biol 322, 497–503. PLETNEVA, E. V., LAEDERACH, A. T., FULTON, D. B., and KOSTIC, N. M. (2001). The role of cation-pi interactions in biomolecular association. Design of peptides favoring interactions between cationic and aromatic amino acid side chains. J Am Chem Soc 123, 6232– 6245. VIGUERA, A. R. and SERRANO, L. (1995). Side-chain interactions between sulfur-containing amino acids and phenylalanine in alpha-helices. Biochemistry 34, 8771–8779. SCHEINER, S., KAR, T., and PATTANAYAK, J. (2002). Comparison of various types of hydrogen bonds involving aromatic amino acids. J Am Chem Soc 124, 13257–13264. PENEL, S. and DOIG, A. J. (2001). Rotamer strain energy in protein helices – quantification of a major force opposing protein folding. J Mol Biol 305, 961–968. MAKHATADZE, G. I. and PRIVALOV, P. L. (1995). Energetics of protein structure. Adv Protein Chem 47, 307–425. BOWLER, B. E., MAY, K., ZARAGOZA, T., YORK, P., DONG, A., and CAUGHEY, W. S. (1993). Destabilizing effects of replacing a surface lysine of cytochrome c with aromatic amino acids: implications for the denatured state. Biochemistry 32, 183–190. MATSUMURA, M. and MATTHEWS, B. W. (1991). Stabilization of functional proteins by introduction of multiple disulfide bonds. Methods Enzymol 202, 336–356.
42 CHOY, W. Y., SHORTLE, D., and KAY, L. E. (2003). Side chain dynamics in unfolded protein states: an NMR based 2H spin relaxation study of delta131delta. J Am Chem Soc 125, 1748–1758. 43 MUNOZ, V., CRONET, P., LOPEZ-HERNANDEZ, E., and SERRANO, L. (1996). Analysis of the effect of local interactions on protein stability. Fold Des 1, 167–178. 44 VILLEGAS, V., ZURDO, J., FILIMONOV, V. V., AVILES, F. X., DOBSON, C. M., and SERRANO, L. (2000). Protein engineering as a strategy to avoid formation of amyloid fibrils. Protein Sci 9, 1700–1708. 45 TADDEI, N., CHITI, F., FIASCHI, T. et al. (2000). Stabilisation of alpha-helices by site-directed mutagenesis reveals the importance of secondary structure in the transition state for acylphosphatase folding. J Mol Biol 300, 633–647. 46 SCHERZINGER, E., LURZ, R., TURMAINE, M. et al. (1997). Huntingtin-encoded polyglutamine expansions form amyloid-like protein aggregates in vitro and in vivo. Cell 90, 549–558. 47 TAKANO, K., SCHOLTZ, J. M., SACCHETTINI, J. C., and PACE, C. N. (2003). The contribution of polar group burial to protein stability is strongly context-dependent. J Biol Chem 278, 31790–31795. 48 TAKANO, K., YAMAGATA, Y., and YUTANI, K. (2001). Contribution of polar groups in the interior of a protein to the conformational stability. Biochemistry 40, 4853–4858. 49 TAKANO, K., YAMAGATA, Y., and YUTANI, K. (2003). Buried water molecules contribute to the conforma-tional stability of a protein. Protein Eng 16, 5–9. 50 TAKANO, K., YAMAGATA, Y., FUNAHASHI, J., HIOKI, Y., KURAMITSU, S., and YUTANI, K. (1999). Contribution of intra- and intermolecular hydrogen bonds to the conformational stability of human lysozyme. Biochemistry 38, 12698–12708. 51 YAMAGATA, Y., KUBOTA, M., SUMIKAWA, Y., FUNAHASHI, J., TAKANO, K., FUJII, S., and YUTANI, K. (1998). Contribution of hydrogen bonds to the conformational stability of human lysozyme: calorimetry and X-ray analysis of six tyrosine →
29
30 Predicting Free Energy Changes of Mutations in Proteins
52
53
54
55
56
57
58
59
60
phenylalanine mutants. Biochemistry 37, 9355–9362. MATTHEWS, B. W. (1995). Studies on protein stability with T4 lysozyme. Adv Protein Chem 46, 249–278. TAKANO, K., YAMAGATA, Y., FUJII, S., and YUTANI, K. (1997). Contribution of the hydrophobic effect to the stability of human lysozyme: calorimetric studies and X-ray structural analyses of the nine valine to alanine mutants. Biochemistry 36, 688–698. ERIKSSON, A. E., BAASE, W. A., ZHANG et al. (1992). Response of a protein structure to cavity-creating mutations and its relation to the hydrophobic effect. Science 255, 178–183. BALDWIN, E., XU, J., HAJISEYEDJAVADI, O., BAASE, W. A., and MATTHEWS, B. W. (1996). Thermodynamic and structural compensation in “size-switch” core repacking variants of bacteriophage T4 lysozyme. J Mol Biol 259, 542–559. LIU, R., BAASE, W. A., and MATTHEWS, B. W. (2000). The introduction of strain and its effects on the structure and stability of T4 lysozyme. J Mol Biol 295, 127–145. GASSNER, N. C., BAASE, W. A., and MATTHEWS, B. W. (1996). A test of the “jigsaw puzzle” model for protein folding by multiple methionine substitutions within the core of T4 lysozyme. Proc Natl Acad Sci USA 93, 12155–12158. VETTER, I. R., BAASE, W. A., HEINZ, D. W., XIONG, J. P., SNOW, S., and MATTHEWS, B. W. (1996). Protein structural plasticity exemplified by insertion and deletion mutants in T4 lysozyme. Protein Sci 5, 2399–2415. CONSONNI, R., SANTOMO, L., FUSI, P., TORTORA, P., and ZETTA, L. (1999). A single-point mutation in the extreme heat- and pressure-resistant sso7d protein from sulfolobus solfataricus leads to a major rearrangement of the hydrophobic core. Biochemistry 38, 12709–12717. MENDES, J., GUEROIS, R., and SERRANO, L. (2002). Energy estimation in protein design. Curr Opin Struct Biol 12, 441–446.
61 RATH, A. and DAVIDSON, A. R. (2000). The design of a hyperstable mutant of the Abp1p SH3 domain by sequence alignment analysis. Protein Sci 9, 2457–2469. 62 WANG, Q., BUCKLE, A. M., and FERSHT, A. R. (2000). Stabilization of GroEL minichaperones by core and surface mutations. J Mol Biol 298, 917–926. 63 BYSTROFF, C. and BAKER, D. (1998). Prediction of local structure in proteins using a library of sequence-structure motifs. J Mol Biol 281, 565–577. 64 LACROIX, E., VIGUERA, A. R., and SERRANO, L. (1998). Elucidating the folding problem of alpha-helices: local motifs, long-range electrostatics, ionic-strength dependence and prediction of NMR parameters. J Mol Biol 284, 173–191. 65 PEDELACQ, J. D., PILTCH, E., LIONG, E. C. et al. (2002). Engineering soluble proteins for structural genomics. Nat Biotechnol 20, 927–932. 66 LAZARIDIS, T. and KARPLUS, M. (2000). Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10, 139–145. 67 BETANCOURT, M. R. and THIRUMALAI, D. (1999). Pair potentials for protein folding: choice of reference states and sensitivity of predicted native states to variations in the interaction schemes. Protein Sci 8, 361–369. 68 THOMAS, P. D. and DILL, K. A. (1996). An iterative method for extracting energy-like quantities from protein structures. Proc Natl Acad Sci USA 93, 11628–11633. 69 CARTER, C. W., Jr, LEFEBVRE, B. C., CAMMER, S. A., TROPSHA, A., and EDGELL, M. H. (2001). Four-body potentials reveal protein-specific correlations to stability changes caused by hydrophobic core mutations. J Mol Biol 311, 625–638. 70 SERRANO, L., KELLIS, J. T., Jr, CANN, P., MATOUSCHEK, A., and FERSHT, A. R. (1992). The folding of an enzyme. II. Substructure of barnase and the contribution of different interactions to protein stability. J Mol Biol 224, 783–804. 71 GILIS, D. and ROOMAN, M. (2000). PoPMuSiC, an algorithm for predicting
References
72
73
74
75
76
77
78
79
80
81
82
83
protein mutant stability changes: application to prion proteins. Protein Eng 13, 849–856. GILIS, D. and ROOMAN, M. (1997). Predicting protein stability changes upon mutation using database-derived potentials: solvent accessibility determines the importance of local versus non-local interactions along the sequence. J Mol Biol 272, 276–290. GILIS, D. and ROOMAN, M. (1996). Stability changes upon mutation of solvent-accessible residues in proteins evaluated by database-derived potentials. J Mol Biol 257, 1112–1126. GILIS, D., MCLENNAN, H. R., DEHOUCK, Y., CABRITA, L. D., ROOMAN, M., and BOTTOMLEY, S. P. (2003). In vitro and in silico design of alpha1-antitrypsin mutants with different conformational stabilities. J Mol Biol 325, 581–589. DEHOUCK, Y., BIOT, C., GILIS, D., KWASIGROCH, J. M., and ROOMAN, M. (2003). Sequence-structure signals of 3D domain swapping in proteins. J Mol Biol 330, 1215–1225. CHAKRABARTTY, A. and BALDWIN, R. L. (1995). Stability of alpha-helices. Adv Protein Chem 46, 141–176. SERRANO, L. (2000). The relationship between sequence and structure in elementary folding units. Adv Protein Chem 53, 49–85. ZIMM, B. H. and BRAGG, J. K. (1959). Theory of the phase transition between helix and random coil in polypeptide chains. J Chem Phys 31, 526–535. MUNOZ, V. and SERRANO, L. (1995). Helix design, prediction and stability. Curr Opin Biotechnol 6, 382–386. DOIG, A. J. (2002). Recent advances in helix-coil theory. Biophys Chem 101–102, 281–293. LIFSON, S. and ROIG, A. (1961). On the helix-coil transition in polypeptides. J Chem Phys 34, 1963–1974. LOMIZE, A. L. and MOSBERG, H. I. (1997). Thermodynamic model of secondary structure for alpha-helical peptides and proteins. Biopolymers 42, 239–269. ANDERSEN, N. H. and TONG, H. (1997). Empirical parameterization of a model for predicting peptide helix/coil
84
85
86
87
88
89
90
91
92
93
equilibrium populations. Protein Sci 6, 1920–1936. MISRA, G. P. and WONG, C. F. (1997). Predicting helical segments in proteins by a helix-coil transition theory with parameters derived from a structural database of proteins. Proteins 28, 344–359. MUNOZ, V. and SERRANO, L. (1994). Elucidating the folding problem of helical peptides using empirical parameters. Nat Struct Biol 1, 399–409. MUNOZ, V. and SERRANO, L. (1995). Analysis of i; i + 5 and i; i + 8 hydrophobic interactions in a helical model peptide bearing the hydrophobic staple motif. Biochemistry 34, 15301–15306. RAMIREZ-ALVARADO, M., BLANCO, F. J., and SERRANO, L. (1996). De novo design and structural analysis of a model beta-hairpin peptide system. Nat Struct Biol 3, 604–612. De ALBA, E., RICO, M., and JIMENEZ, M. A. (1997). Cross-strand side-chain interactions versus turn conformation in beta-hairpins. Protein Sci 6, 2548–2560. RAMIREZ-ALVARADO, M., BLANCO, F. J., NIEMANN, H., and SERRANO, L. (1997). Role of beta-turn residues in beta-hairpin formation and stability in designed peptides. J Mol Biol 273, 898–912. De ALBA, E., RICO, M., and JIMENEZ, M. A. (1999). The turn sequence directs beta-strand alignment in designed beta-hairpins. Protein Sci 8, 2234–2244. LACROIX, E., KORTEMME, T., LOPEZ DE LA PAZ, M., and SERRANO, L. (1999). The design of linear peptides that fold as monomeric beta-sheet structures. Curr Opin Struct Biol 9, 487–493. GROMIHA, M. M., UEDAIRA, H., AN, J., SELVARAJ, S., PRABAKARAN, P., and SARAI, A. (2002). ProTherm, thermodynamic database for proteins and mutants: developments in version 3.0. Nucleic Acids Res 30, 301–302. THORN, K. S. and BOGAN, A. A. (2001). ASEdb: a database of alanine mutations and their effects on the free energy of binding in protein interactions. Bioinformatics 17, 284–285.
31
32 Predicting Free Energy Changes of Mutations in Proteins
94 TAKANO, K., OTA, M., OGASAHARA, K., YAMAGATA, Y., NISHIKAWA, K., and YUTANI, K. (1999). Experimental verification of the ‘stability profile of mutant protein’ (SPMP) data using mutant human lysozymes. Protein Eng 12, 663–672. 95 FUNAHASHI, J., TAKANO, K., and YUTANI, K. (2001). Are the parameters of various stabilization factors estimated from mutant human lysozymes compatible with other proteins? Protein Eng 14, 127–134. 96 GUEROIS, R., NIELSEN, J. E., and SERRANO, L. (2002). Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 320, 369–387. 97 KORTEMME, T. and BAKER, D. (2002). A simple physical model for binding energy hot spots in protein-protein complexes. Proc Natl Acad Sci USA 99, 14116–14121. 98 LOMIZE, A. L., REIBARKH, M. Y., and POGOZHEVA, I. D. (2002). Interatomic potentials and solvation parameters from protein engineering data for buried residues. Protein Sci 11, 1984–2000. 99 FUNAHASHI, J., TAKANO, K., YAMAGATA, Y., and YUTANI, K. (2002). Positive contribution of hydration structure on the surface of human lysozyme to the conformational stability. J Biol Chem 277, 21792–21800. 100 PITT, W. R. and GOODFELLOW, J. M. (1991). Modelling of solvent positions around polar groups in proteins. Protein Eng 4, 531–537. 101 HONIG, B. and YANG, A. S. (1995). Free energy balance in protein folding. Adv Protein Chem 46, 27–58. 102 MYERS, J. K. and PACE, C. N. (1996). Hydrogen bonding stabilizes globular proteins. Biophys J 71, 2033–2039. 103 KORTEMME, T., MOROZOV, A. V., and BAKER, D. (2003). An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J Mol Biol 326, 1239–1259. 104 MOROZOV, A. V., KORTEMME, T., and BAKER, D. (2003). Evaluation of models
105
106
107
108
109
110
111
112
113
114
of electrostatic interactions in proteins. J Phys Chem B 107, 2075–2090. BEVERIDGE, D. L. and DICAPUA, F. M. (1989). Free energy via molecular simulation: applications to chemical and biomolecular systems. Annu Rev Biophys Biophys Chem 18, 431–492. KOLLMAN, P. A. (1993). Free energy calculations: applications to chemical and biochemical phenomena. Chem Rev 93, 2395–2417. BRADY, G. P. and SHARP, K. A. (1995). Decomposition of interaction free energies in proteins and other complex systems. J Mol Biol 254, 77–85. WANG, W., DONINI, O., REYES, C. M., and KOLLMAN, P. A. (2001). Biomolecular simulations: recent developments in force fields, simulations of enzyme catalysis, protein-ligand, protein-protein, and protein-nucleic acid noncovalent interactions. Annu Rev Biophys Biomol Struct 30, 211–243. TIDOR, B. and KARPLUS, M. (1991). Simulation analysis of the stability mutant R96H of T4 lysozyme. Biochemistry 30, 3217–3228. PREVOST, M., WODAK, S. J., TIDOR, B., and KARPLUS, M. (1991). Contribution of the hydrophobic effect to protein stability: analysis based on simulations of the Ile-96–-Ala mutation in barnase. Proc Natl Acad Sci USA 88, 10880–10884. SNEDDON, S. F. and TOBIAS, D. J. (1992). The role of packing interactions in stabilizing folded proteins. Biochemistry 31, 2842–2846. YAMAOTSU, N., MORIGUCHI, I., KOLLMAN, P. A., and HIRONO, S. (1993). Molecular dynamics study of the stability of staphylococcal nuclease mutants: component analysis of the free energy difference of denaturation. Biochim Biophys Acta 1163, 81–88. YAMAOTSU, N., MORIGUCHI, I., and HIRONO, S. (1993). Estimation of stabilities of staphylococcal nuclease mutants (Met32 ! Ala and Met32 ! Leu) using molecular dynamics/free energy perturbation. Biochim Biophys Acta 1203, 243–250. SUGITA, Y., KITAO, A., and GO, N. (1998). Computational analysis of thermal
References
115
116
117
118
119
120
121
122
123
stability: effect of Ile ! Val mutations in human lysozyme. Fold Des 3, 173–181. SUGITA, Y. and KITAO, A. (1998). Improved protein free energy calculation by more accurate treatment of nonbonded energy: application to chymotrypsin inhibitor 2, V57A. Proteins 30, 388–400. SUGITA, Y. and KITAO, A. (1998). Dependence of protein stability on the structure of the denatured state: free energy calculations of I56V mutation in human lysozyme. Biophys J 75, 2178–2187. PITERA, J. W. and KOLLMAN, P. A. (2000). Exhaustive mutagenesis in silico: multicoordinate free energy calculations on proteins and peptides. Proteins 41, 385–397. PAN, Y. and DAGGETT, V. (2001). Direct comparison of experimental and calculated folding free energies for hydrophobic deletion mutants of chymotrypsin inhibitor 2: free energy perturbation calculations using transition and denatured states from molecular dynamics simulations of unfolding. Biochemistry 40, 2723–2731. FUNAHASHI, J., SUGITA, Y., KITAO, A., and YUTANI, K. (2003). How can free energy component analysis explain the difference in protein stability caused by amino acid substitutions? Effect of three hydrophobic mutations at the 56th residue on the stability of human lysozyme. Protein Eng 16, 665–671. SPENCER, D. S. and STITES, W. E. (1996). The M32L substitution of staphylococcal nuclease: disagreement between theoretical prediction and experimental protein stability. J Mol Biol 257, 497–499. MARK, A. E. and van GUNSTEREN, W. F. (1994). Decomposition of the free energy of a system in terms of specific interactions. Implications for theoretical and experimental studies. J Mol Biol 240, 167–176. SMITH, P. A. and van GUNSTEREN, W. F. (1994). When are free energy components meaningful? J Phys Chem 98, 13735–13740. BORESCH, S., ARCHONTIS, G., and KARPLUS, M. (1994). Free energy
124
125
126
127
128
129
130
131
132
133
134
simulations: the meaning of the individual contributions from a component analysis. Proteins 20, 25–33. BORESCH, S. and KARPLUS, M. (1995). The meaning of component analysis: decomposition of the free energy in terms of specific interactions. J Mol Biol 254, 801–807. BRADY, G. P., SZABO, A., and SHARP, K. A. (1996). On the decomposition of free energies. J Mol Biol 263, 123–125. LEE, C. and LEVITT, M. (1991). Accurate prediction of the stability and activity effects of site-directed mutagenesis on a protein core. Nature 352, 448–451. LEE, C. (1994). Predicting protein mutant energetics by self-consistent ensemble optimization. J Mol Biol 236, 918–939. MENDES, J., BAPTISTA, A. M., CARRONDO, M. A., and SOARES, C. M. (2001). Implicit solvation in the self-consistent mean field theory method: sidechain modelling and prediction of folding free energies of protein mutants. J Comput Aided Mol Des 15, 721–740. VARSHAVSKY, A. (1996). The N-end rule: functions, mysteries, uses. Proc Natl Acad Sci USA 93, 12142–12149. TOBIAS, J. W., SHRADER, T. E., ROCAP, G., and VARSHAVSKY, A. (1991). The N-end rule in bacteria. Science 254, 1374–1377. BACHMAIR, A., FINLEY, D., and VARSHAVSKY, A. (1986). In vivo half-life of a protein is a function of its amino-terminal residue. Science 234, 179–186. BACHMAIR, A. and VARSHAVSKY, A. (1989). The degradation signal in a short-lived protein. Cell 56, 1019–1032. FLYNN, J. M., LEVCHENKO, I., SEIDEL, M., WICKNER, S. H., SAUER, R. T., and BAKER, T. A. (2001). Overlapping recognition determinants within the ssrA degradation tag allow modulation of proteolysis. Proc Natl Acad Sci USA 98, 10584–10589. FLYNN, J. M., NEHER, S. B., KIM, Y. I., SAUER, R. T., and BAKER, T. A. (2003). Proteomic discovery of cellular substrates of the ClpXP protease reveals five classes of ClpX-recognition signals. Mol Cell 11, 671–683.
33
34 Predicting Free Energy Changes of Mutations in Proteins
135 ROGERS, S., WELLS, R., and RECHSTEINER, M. (1986). Amino acid sequences common to rapidly degraded proteins: the PEST hypothesis. Science 234, 364–368. 136 LOETSCHER, P., PRATT, G., and RECHSTEINER, M. (1991). The C terminus of mouse ornithine decarboxylase confers rapid degradation on dihydrofolate reductase. Support for the pest hypothesis. J Biol Chem 266, 11213–11220. 137 DOBSON, C. M. (2002). Getting out of shape. Nature 418, 729–730. 138 BOUSSET, L., THOMSON, N. H., RADFORD, S. E., and MELKI, R. (2002). The yeast prion Ure2p retains its native alpha-helical conformation upon assembly into protein fibrils in vitro. EMBO J 21, 2903–2911. 139 BOOTH, D. R., SUNDE, M., BELLOTTI, V. et al. (1997). Instability, unfolding and
aggregation of human lysozyme variants underlying amyloid fibrillogenesis. Nature 385, 787–793. 140 CHITI, F., TADDEI, N., BUCCIANTINI, M., WHITE, P., RAMPONI, G., and DOBSON, C. M. (2000). Mutational analysis of the propensity for amyloid formation by a globular protein. EMBO J 19, 1441–1449. 141 CHITI, F., STEFANI, M., TADDEI, N., RAMPONI, G., and DOBSON, C. M. (2003). Rationalization of the effects of mutations on peptide and protein aggregation rates. Nature 424, 805–808. 142 FERNANDEZ-ESCAMILLA, A. M., ROUSSEAU, F., SCHYMKOWITZ, J., and SERRANO, L. (2004). Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol 22, 1302–1306.
1
Denaturation of Proteins by Urea and Guanidine Hydrochloride C. Nick Pace, Gerald R. Grimsley, and J. Martin Scholtz Texas A&M University, College Station, USA
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The ability of urea to denature proteins has been known since 1900 [1]. Urea is also effective in denaturing more complex biological systems: “A dead frog placed in saturated urea solution becomes translucent and falls to pieces in a few hours” [2]. (It was not reported if frog denaturation is reversible!) The even greater effectiveness of guanidine hydrochloride (guanidinium chloride (GdmCl)) as a protein denaturant was first reported in 1938 by Greenstein [3]. In this article we discuss why urea and GdmCl have proven to be so useful to protein chemists and summarize what we have learned over the past 100 years. Tanford [4] was the first to study quantitatively the unfolding of proteins by urea. The diagram in Figure 1 is from Tanford’s paper; it shows that when a protein unfolds many nonpolar side chains and peptide groups that were buried in the folded protein are exposed to solvent in the unfolded state. Figure 1 also gives the measured values of the free energy of transfer, Gtr , of a leucine side chain and a peptide group from water to the solvents shown. Note that Gtr of both are negative in the presence of urea and GdmCl. This is true of almost all the constituent groups of a protein, and explains why urea and GdmCl denature proteins, and why GdmCl is a stronger denaturant than urea. It is also clear why both folded and unfolded proteins are generally more soluble in urea and GdmCl solutions than they are in water. Most of this article will focus on urea denaturation, however GdmCl denaturation will be discussed where appropriate.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Denaturation of Proteins by Urea and Guanidine Hydrochloride
Fig. 1 Schematic from Tanford [4] illustrating the changes in accessibility of peptidegroups (rectangles) andsidechains (circles) on protein unfolding. The closed symbols represent groups that are accessible
to solvent and the open symbols represent groups that are not accessible to solvent. See Wang and Bolen [88] for osmolyte, and Pace et al. [56] for denaturant, Gtr values.
2 How Urea Denatures Proteins
Urea is remarkably soluble, 10.5 M, in water at 25 ◦ C. Experimental [5–7] and theoretical studies [8–11] have improved our understanding of urea solutions and how urea and water interact with peptide groups and nonpolar side chains. Based on neutron diffraction studies, Soper et al. [5] show that urea incorporates readily into water through hydrogen bonding and there is great similarity in the spatial distribution of water around water, urea around water, urea around urea, and water around urea. They estimate that each urea forms ≈ 5.7 hydrogen bonds. In addition, urea hydrogen bonds with itself, forming chains and clusters in water. (Using a similar approach, Mason et al. [12] show that the Gdm ion has no recognizable hydration shell and is one of the most weakly hydrated cations yet characterized. The Gdm ion can form only three undistorted hydrogen bonds with water.) Urea is nearly three times larger than water, and can form more hydrogen bonds. This may account for the favorable Gtr for the transfer of peptide groups from water to urea solutions. This question is considered experimentally and theoretically in Zou et al. [7]. Why Gtr values are favorable for the transfer of nonpolar side chains from water to urea solutions was more difficult to explain. Muller [13] suggested that
2 How Urea Denatures Proteins
urea molecules replace some of the water molecules in the hydration shell formed around nonpolar solutes, and this changes the hydrogen bonding and van der Waals interactions in such a way as to increase the solubility of the nonpolar solute. This idea is supported by the experimental results of Soper et al. [5], and by the theoretical calculations of Laidig and Daggett [14]. In the absence of denaturant, Timasheff states [15]: “The fact is that there is no rigid shell of water around a protein molecule, but rather there is a fluctuating cloud of water molecules that are thermodynamically affected more or less strongly by the protein molecule.” Crystallographers are able to see some water molecules that are bound tightly enough to be observed, but the fact that no water molecules are observed near many polar groups shows that they do not see them all [16]. When crystal structures are determined in the presence of urea and GdmCl, some denaturant molecules are seen bound to the protein, mainly by hydrogen bonds [17, 18]. These denaturant molecules sometimes link adjacent polar groups on the protein and reduce the conformational flexibility, as shown by a decrease in the temperature factors. There is also experimental evidence that denaturants (and water) reduce the flexibility of the denatured state by transiently linking parts of the protein molecule through hydrogen bonds to polar groups in the protein [19]. Halle’s group has used magnetic relaxation dispersion techniques (MRD) to gain a better understanding of protein hydration at the molecular level. For example, Modig et al. [6] summarize: . . . the urea 2 H and water 17 O MRD data support a picture of the denatured state where much of the polypeptide chain participates in clusters that are more compact and more ordered than a random coil but nevertheless are penetrated by large numbers of water and urea molecules. These solvent-penetrated clusters must be sufficiently compact to allow side-chains from different polypeptide segments to come into hydrophobic contact, while, at the same time, permitting solvent molecules to interact favorably with peptide groups and with charged and polar side chains. The exceptional hydrogen-bonding capacity and small size of water and urea molecules are likely essential attributes in this regard. In such clusters, many water and urea molecules will simultaneously interact with more than one polypeptide segment and their rotational motions will therefore be more strongly retarded than at the surface of the native protein.
We now have a good understanding of how urea denatures proteins because of careful studies and thoughtful analyses from the labs of Timasheff [20], Record [21], and Schellman [22]. If the excluded volume of the protein is taken into account, the hydrodynamic properties of urea denatured proteins suggest that they approach a randomly coiled conformation [23]. Consequently, the accessible surface area of the denatured state is considerably greater than that of the native state. (This is shown clearly in figure 9 in Goldenberg [23].) Studies by Timasheff [20] and Record [21] have shown that urea binds preferentially to proteins over water, so the concentration of urea near the protein is greater than the concentration in the bulk solvent. Thus, as the urea concentration increases, the denatured state will be favored because it has a much greater surface to interact with the urea (Figure 1). Both groups suggest that the dominant contribution is due to preferential binding of urea (or Gdm ions) to the newly exposed peptide groups [20, 21].
3
4 Denaturation of Proteins by Urea and Guanidine Hydrochloride
Schellman [22] has extended his previous studies to show clearly how osmolytes such as trimethylamine-N-oxide (TMAO) or denaturants such as urea act to stabilize or destabilize proteins. The two key factors are the excluded volumes of the cosolvents and their contact interactions with the protein. Because the molecules of interest are all larger than water, the excluded volume effect will always favor the native state. In contrast, contact interactions will generally favor the denatured state. For urea, the contact interaction term is greater than the excluded volume effect, so urea acts as a protein denaturant. For osmolytes, the reverse is true and osmolytes stabilize proteins. Schellman [24] was the first to show that because the direct interaction of urea molecules with proteins is so weak, denaturation should be represented by a solvent exchange mechanism rather than solely by the binding of the denaturant to the protein molecule. His K av – 1 term is a measure of the contact interaction for the average site on a protein. For RNase T1, this term is 0.12 for TMAO, 0.22 for urea, and 0.40 for guanidinium ion. The larger value of the contact term reflects the greater preferential binding of the Gdm ion compared to urea, and this is what makes GdmCl a better denaturant than urea [22]. In summary, it is now clear that urea denatures proteins because there is a preferential interaction with urea compared with water, and the denatured state has more sites for interaction. Consequently, proteins unfold as the urea concentration is increased.
3 Linear Extrapolation Method
A typical urea denaturation curve is shown in Figure 2A. When a protein unfolds by a two-state mechanism, the equilibrium constant, K, can be calculated from the experimental data using: K = [(y)N −(y)]/[(y)−(y)D ]
(1)
where (y) is the observed value of the parameter used to follow unfolding, and (y)N and (y)D are the values (y) would have for the native state and the denatured state under the same conditions where (y) was measured. In the original analyses of urea denaturation curves [25, 26], log K was found to vary linearly as a function of log [urea] and the slope of the plot was denoted by n, and the midpoint of the curve by (urea)1/2 (where log K = 0). These parameters could then be used to calculate the standard free energy of denaturation G 0 =−RT ln K
(2)
and its dependence on urea concentration using [4]: d(G 0 )d(urea)=RT n/(urea)1/2
(3)
3 Linear Extrapolation Method
Fig. 2 A) Urea denaturation curve for RNase Sa determined at 25 ◦ C, pH 5.0, 30 mM sodium acetate buffer as described by Pace et al. [78]. The solid line is based on an analysis of the data using Eq. (5). B) G as a function of urea molarity. The G val-
ues were calculated from the data in the transition region as described in the text. The G(H2 O) and m-values can be determined by fitting the data in B to Eq. (4), or by analyzing the data in A with Eq. (5).
This equation was used by Alexander and Pace [25] to estimate the differences in stability among three genetic variants of a protein for the first time, and was the forerunner of the linear extrapolation method. The linear extrapolation method (LEM) was first used to analyze urea and GdnHCl denaturation curves by Greene and Pace [27]. When G◦ is calculated
5
6 Denaturation of Proteins by Urea and Guanidine Hydrochloride
as a function of urea concentration using data such as those shown in Figure 2A, G◦ is found to vary linearly with urea concentration as shown in Figure 2B. Based on results such as these for several proteins, this equation was proposed for analyzing the data: G=G(H2 O)−m[urea]
(4)
where G(H2 O) is an estimate of the conformational stability of a protein that assumes that the linear dependence continues to 0 M denaturant, and m is a measure of the dependence of G on urea concentration, i.e., the slope of the plot shown in Figure 2B. The same approach has recently been used for measuring the stability of RNA molecules [28]. In the early days, (y)N and (y)D were obtained by extrapolating the pre- and posttransition baselines into the transition region, and then using Eq. (1) to calculate K and then G◦ . However, Santoro and Bolen [29] had a better idea and proposed that nonlinear least squares be used to directly fit data such as those shown in Figure 2A. With their approach, six parameters are used to fit the data: a slope and an intercept for the pretransition and posttransition regions, which are generally linear, and G(H2 O) and m for the transition region, leading to Y = {(YF + mF [urea]) + (YU + mU [urea]) · exp − ((G(H2 O) − m · [urea])/RT )}/ (1 + exp − (G(H2 O) − m · [urea])/RT ))
(5)
where yF and yU are the intercepts and mF and mU the slopes of the pre- and post-transition baselines, and G(H2 O) and m are defined by Eq. (4). The linear extrapolation method is now widely used for estimating the conformational stability of proteins and for measuring the difference in stability between proteins differing slightly in structure. (The original how-to-do-it paper on solvent denaturation [30] has been cited over 1200 times.) In addition, the m-value has taken on a life of its own [31]. These topics will be discussed further below.
4 G(H2 O)
The conformational stability of RNase Sa at 25 ◦ C and pH 5 is ∼7.0 kcal mol−1 . This corresponds to one unfolded molecule for each 135 000 folded molecules (K = 7.4 × 10−6 ). To study the equilibrium between the folded and unfolded conformations by conventional techniques requires destabilizing the protein so that both conformations are present at measurable concentrations. With urea denaturation this is done by increasing the urea concentration. It is clear from Figure 2A that the unfolding equilibrium can be studied only near 7 M urea, and that a long extrapolation is needed to estimate G◦ in the absence of urea, G(H2 O). The same is true for thermal denaturation. The unfolding equilibrium can be studied only near 55 ◦ C (Figure 3A). Thermal denaturation curves can be analyzed to obtain
4 G(H2 O) 7
Fig. 3 A) Thermal denaturation curve for RNase Sa determined at pH 5.0 in 30 mM sodium acetate buffer as described [78]. The solid line is based on a nonlinear least squares analysis of the data similar to that described for Eq. (5), but for thermal denaturation curves [78]. B) G as a function of temperature. The G values were cal-
culated from data in the transition region as described in the text, and the data were analyzed to determine T m and Hm as described [78]. The Cp value is from Pace et al. [78]. The G(25 ◦ C) value (filled circle) was calculated using Eq. (6) which is also shown in B.
8 Denaturation of Proteins by Urea and Guanidine Hydrochloride Table 1 Comparison of G(T) values from DSC with G(H2 O) values from UDC
pH
T (◦ C) Tm a (◦ C) Hm a (kcal mol −1 ) G(T)a (kcal mol −1 ) G(H2 O)b (kcal mol −1 )
2.8 2.8 2.8 2.8 2.8 3.0 3.55 4.0 5.0 6.0 7.0
17.1 21.1 24.9 27.8 25.0 25.0 25.0 25.0 25.0 25.0 25.0
44.9 44.9 44.9 44.9 44.9 49.1 54.5 56.1 58.6 60.3 61.8
79.4 79.4 79.4 79.4 79.4 82.7 91.5 94.2 99.1 100.7 102.3
5.5 4.9 4.3 3.7 4.5 5.1 6.7 7.2 8.1 8.5 8.9
5.4 4.9 4.3 3.5 4.3 5.2 6.4 7.3 7.9 8.6 9.1
a The T m , Hm , and Cp = 1.15 kcal mol−1 K−1 values are from Pace et al. [36]. They were used in Eq. (6) to calculate the G(T) values. b The first four G(H2 O) values are from Pace and Laurents [39], and the last five G(H2 O) values are interpolated from figure 6 in Pace et al. [56].
the midpoint of the thermal denaturation curve, T m , and the enthalpy change at T m , Hm . These two parameters plus the heat capacity change for folding, Cp , can then be used in the Gibbs-Helmholtz equation [32]: G(T )=Hm (1−T/Tm )+C p [T −Tm −T In(T/Tm )]
(6)
to calculate G at any temperature T, G(T). The temperature dependence of G and the value of G(25 ◦ C) for the denaturation of RNase Sa are shown in Figure 3B. Note that the value of G(25 ◦ C) from the thermal denaturation experiment agrees with the G(H2 O) value from the urea denaturation experiments within experimental error, which is about ±0.3 kcal mol−1 (see Eftink and Ionescu [33] for an excellent discussion of the problems most frequently encountered in analyzing solvent and thermal denaturation curves). Previously, we used RNase T1 to show that estimates of G(T) based on DSC experiments were in good agreement with estimates of G(H2 O) from urea denaturation [24, 35]. In a more recent study using RNase A, we compared G(T) values from DSC determined in Makhatadze’s laboratory with G(H2 O) values determined using urea denaturation and the LEM [36] in our laboratory. The agreement is remarkably good (Table 1). Most recently, we have compared conformational stabilities estimated from hydrogen exchange rates measured under native state conditions with those from thermal or solvent denaturation [37]. Again, the conformational stabilities are in good agreement. There is considerable experimental evidence that proteins are more extensively unfolded in solutions of urea than they are in water (see, for example, Qu et al. [38]). Thus, it is surprising that conformational stabilities determined under conditions where the denatured state ensembles differ would be the same. This suggests that the denatured state ensemble that exists under physiological conditions is
4 G(H2 O) 9
Fig. 4 Thermal unfolding curves for ecHPr in various urea concentrations at pH 7.0 (10 mM K/Pi ) as monitored by the change in ellipticity at 222 nm and converted to fraction folded, f F . The error on any single point is smaller than the size of the symbol. The
urea concentrations are 0 M (filled circle), 1 M (open circle), 2 M (filled triangle), 3 M (open triangle), 3.5 M (filled square), and 4 M (open square). From Nicholson and Scholtz [41].
thermodynamically equivalent to the ensembles that exist after thermal or urea denaturation, even though they do not appear to be structurally equivalent. This is supported by studies where both thermal and urea induced unfolding are used to study the conformational stability of a protein. This approach had its beginnings in a study that showed that results from thermal and urea denaturation could be combined to get a reasonable estimate of Cp [39]. The method was then extended to characterize the stability of the HPr proteins from Bacillus subtilis [40] and Escherichia coli [41] as a function of both urea concentration and temperature. Some typical results are shown in Figure 4. More recently, Felitsky and Record [42] used this approach to characterize the stability of the Lac repressor DNA-binding domain in the most detailed study of the stability of a protein yet published. A related approach has been described by Ibarra-Molera and Sanchez-Ruiz [43]. The excellent agreement between conformational stabilities based on DSC results and the LEM suggests that the LEM is a reliable method for estimating the conformational stability of a protein. The Record group has developed an approach for analyzing the results of urea and GdmCl denaturation studies in terms of preferential interaction coefficients that is firmly based on thermodynamics. This method is called the local-bulk domain model and they show that this analysis “justifies the use of linear extrapolation to estimate (≤ 5% error) the stability of the native protein
10 Denaturation of Proteins by Urea and Guanidine Hydrochloride
in the absence of a denaturation” [21]. The history of this area is summarized in a delightful review titled: “Fifty years of solvent denaturation” by Schellman [44]. Two other methods have been used to analyze urea denaturation curves. They both give estimates of G(H2 O) that are larger than those based on the LEM. The differences are not large with urea, but they are with GdmCl. One is an extrapolation based on Tanford’s model [4] that is discussed in the next section. This method uses Gtr values for transfer of the constituent groups of a protein from water to urea solutions. Unfortunately there is uncertainty in the Gtr value to use for a peptide group [38], and this introduces uncertainty into the G(H2 O) values obtained by this method [45]. Experiments are underway to determine a more reliable value of Gtr for a peptide group [7] (Auton and Bolen, personal communication). Makhatadze [46] has shown that the data on which this method is based are consistent with the LEM when urea but not GdnHCl is used as the denaturant. The other method that can be used to determine G(H2 O) is the denaturant binding model first used by Aune and Tanford [47]. The plots of log K vs. log [urea] mentioned above show that more urea molecules are bound by the denatured state than by the native state. Also, urea increases the solubility of all of the constituent parts of a protein, and it is possible to account for the enhanced solubility in terms of urea binding, even for leucine side chains (see table XI in Tanford [48]). Thus, it might seem reasonable to try to analyze urea denaturation curves in terms of the denaturant binding model. Generally, all of the urea binding sites are considered identical and independent and K = 0.1 is used for the binding constant [30]. However, this is surely not stoichiometric binding and it is unlikely that the binding sites are identical and independent. The enhanced solubility of the model compound data requires “binding constants” in the range 0.05–0.3 [46, 48, 49]. Schellman has shown convincingly that a solvent exchange model is more reasonable than a stoichiometric binding model, and this model is consistent with the LEM. (Schellman’s recent article [22] gives references to his earlier work. It is also reviewed in Timasheff’s paper [50].) At first it was puzzling that the LEM worked so well. It seemed likely that there would sometimes be binding sites for urea molecules on folded proteins of at least moderate affinity and this would lead to nonlinearity that would cause the LEM to give erroneous results. Apparently, this rarely occurs. Our understanding of how urea denatures proteins has improved, thanks to the approaches of Timasheff [20], Record [21], and Schellman [22], which all show now why the LEM method works as well as it does. G(H2 O) values from GdmCl denaturation studies are less reliable, as pointed out most recently by Makhatadze [46]. This conclusion was reached earlier by Schellman and Gassner [51] who said:
Finally this study as well as a number of others indicates that urea has a number of advantages over guanidinium chloride as far as thermodynamic interactions are concerned. The free energy and enthalpy functions of both proteins and model compounds are more linear; the solutions more ideal; extrapolations to zero concentration are more certain; and least-squares analysis of the data is more stable.
5 m-Values
One problem is that the ionic strength cannot be controlled with GdmCl, and this has been shown to affect the results in several studies (see, for example, IbarraMolero et al. [52], Monera et al. [53], Santoro and Bolen [54]). In summary, the agreement between G(T) values from DSC and thermal denaturation studies and G(H2 O) values from urea denaturation studies analyzed by the LEM suggests that either method can be used to reliably measure the conformational stability of a protein. In addition, we now have a good understanding of why the LEM works so well.
5 m-Values
The thermodynamic cycle in Figure 5 shows that the dependence of G on denaturant concentration is determined by the groups in a protein that are exposed to solvent in the denatured state, D, but not in the native state, N4 . In support of this, Myers et al. [55] showed that there is a good correlation between m-values and the change in accessible surface area on unfolding. Since g tr; i values are available for most of the constituent groups in a protein, the equation at the bottom of Figure 5 can be used to calculate the dependence of G on denaturant concentration. The m-value is an experimental measure of the dependence of G on denaturant concentration. Consequently, for proteins that unfold by a two-state mechanism, α can be estimated by comparing the calculated and measured m-values. Thus, α
Fig. 5 A thermodynamic cycle connecting protein folding in water and urea. N and D denote the native and denatured states, G and G(H2 O) are defined by Eq. (4), ni is the total number of groups of type i in the protein, "i is fraction of groups of type i
that are exposed in D but not N, and gtr,i is the free energy of transfer of groups of type i from water to a given concentration of urea. " is the average change in exposure of all of the peptide groups and uncharged side chains in the protein [56].
11
12 Denaturation of Proteins by Urea and Guanidine Hydrochloride
is the fraction of buried groups that must become exposed to solvent on unfolding to account for the measured m-value [56]. In the original LEM paper [27], this approach was used to estimate α for four proteins. It was clear that differences among the α values for individual proteins revealed differences in the accessibility of the denatured state and that these differences depended to some extent on the number and location of the disulfide bonds in the protein. Interest in m-values was stimulated by a series of papers from the Shortle laboratory [57, 58]. (A review of these studies was titled: “Staphylococcal nuclease: a showcase of m-value effects.” [31].) Their studies of mutants of staphylococcal nuclease (SNase) showed a wide range of m-values. Those with m-values 5% or more greater than wild type were designated m+ mutants (≈25%) and those with m-values 5% or more less than wild type were designated m+ mutants (≈ 50%). Their interpretation was that m+ mutants unfolded to a greater extent than wild type and vice versa for m+ mutants. These papers were extremely important because they focused attention on the denatured state which had largely been ignored in discussions of protein stability. (Another review from the Shortle lab [58] was titled: “The denatured state (the other half of the folding equation) and its role in protein stability”) However, this interpretation is only straightforward if all of the mutants unfold by a two-state mechanism. Shortle and Meeker [57] chose to assume a two-state mechanism to analyze their results even though the m+ mutants did not appear to unfold by a two-state mechanism. It is now clear that some of the m− mutants unfold by a three-state mechanism [59–62]. The presence of an intermediate state will generally lower the m-value [63]. One important consequence of this is that the estimates of G(H2 O) will be too low when a two-state mechanism is assumed and the mechanism is, in fact, three-state. In the case of the m− mutant V66L + G88V + G79S, a two-state analysis indicates that this mutant is 3.0 kcal mol−1 LESS stable than wild type [57], but a three-state analysis indicates that that this mutant is 0.8 kcal mol−1 MORE stable than wild type [60]. This is also true for other m− mutants [64]. This means that the (G) values for the m− mutants are not correct. Since 50% of the SNase mutants are m− , this calls into question the interpretation of many of the (G) values determined for SNase. In Table 2, we have compiled m and α values for a selection of proteins. In all cases, the folding of the proteins is thought to closely approach a twostate mechanism. The value of α expected for unfolding a protein depends on the model assumed for the denatured state. If the accessibility of the denatured state is modeled as a tripeptide using the Lee and Richards [65] approach, a value of α≈ 0.7 would be expected. However, if a compact denatured state based on fragments with native-state conformations is assumed, a α ≈ 0.53 would be expected [66]. Note in Table 2 that most of the α values are considerably less than 0.53. This suggests that the urea-denatured states of proteins are less completely unfolded than in the hypothetical state that Creamer et al. [66] thought might be a lower limit for the compactness of the denatured state at the midpoint of a thermal denaturation curve in water. However, because of uncertainty in the Gtr value for a peptide group and for some of the side chains, interpreting the absolute values of α is hazardous. In contrast, differences among the α values for
5 m-Values
individual proteins or for the same protein under different conditions are more trustworthy. For the proteins without disulfide bonds, the α values in Table 2 range from a low of 0.26 for thioredoxin to a high of 0.51 for calbindin. Thus, if these proteins unfold by a two-state mechanism, calbindin unfolds to a much greater extent than thioredoxin. Barnase also unfolds more completely than most of the other proteins in Table 2. In support of this, we have shown that the tryptophan and tyrosine side chains of urea-denatured barnase are more accessible to a perturbant than those of several other proteins that have lower α values [67]. The results in Table 2 suggest that urea-denatured proteins are not completely unfolded and that they unfold to different extents. Table 2 Change in accessibility (α) for urea denaturation calculated from the measured
m-values using Tanford’s modela
Protein
Residues
Disulfides
(Urea)1/2
m
α
RNase Sa (0K) (pH 7)b RNase Sa (0K) (pH 3)b RNase Sa (3K) (pH 7)b RNase Sa (3K) (pH 3)b RNase Sa (5K) (pH 7)b RNase Sa (5K) (pH 3)b RNase T1c RNase Ad Lysozyme (Hen)c SH3 domainc Calbindinc HPr (E. coli)e Acyl phosphatasef λ Repressorc RNase T1d FKBPc Thioredoxinc Barnasec Che Yc FABPc SNaseg ApoMbh RNase Hc DHFR (E. coli)c T4 Lysozymec
96 96 96 96 96 96 104 124 129 62 75 88 99 102 104 107 108 110 129 131 149 153 155 159 163
1 1 1 1 1 1 2 4 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6.44 1.48 4.07 1.47 5.31 1.30 5.30 6.98 6.80 3.74 5.30 4.32 4.54 4.42 3.34 4.18 6.70 4.49 3.51 5.43 2.56 2.05 3.88 3.10 6.30
0.99 1.75 0.99 2.00 1.14 2.21 1.21 1.40 1.29 0.77 1.13 1.14 1.25 1.09 1.64 1.46 1.30 1.94 1.60 1.77 2.36 2.49 1.93 1.90 2.00
0.32 0.41 0.29 0.46 0.36 0.50 0.32 0.35 0.29 0.34 0.51 0.45 0.39 0.38 0.38 0.42 0.26 0.50 0.36 0.40 0.44 0.47 0.35 0.32 0.38
a Tanford’s model is described in Tanford [4] and the details of the approach used to calculate α are given in Pace et al. [56]. (Urea)1/2 values are given in M; m values are given in kcal mol−1 M−1 . b The (urea)1/2 and m-values are from Shaw [82]. c The original reference for these proteins is given in Myers et al. [55]. d See Pace et al. [56]. e See Nicholson and Scholtz [41]. f See Taddei et al. [83]. g See Shortle [31]. h See Hughson and Baldwin [84].
13
14 Denaturation of Proteins by Urea and Guanidine Hydrochloride
We were surprised to find that for RNases A and T1 [56] and barnase [67], the m-value increases markedly as the pH is lowered from 7 to 3. This indicates that the denatured states interact more extensively with urea as the pH is lowered, and we suggested that the denatured state ensemble expands at low pH due to electrostatic repulsion among the positive charges. In support of this interpretation, Privalov et al. [68] have shown that the intrinsic viscosity of unfolded proteins increases at low pH. We have recently obtained results with RNase Sa that provide further support for this idea [69]. RNase Sa is an acidic protein with a pI = 3.5 that contains no lysine residues. We have prepared a triple mutant that we call 3K (D1K, D17K, E41K) with a pI = 6.4, and a quintuple mutant that we call 5K (D1K, D17K, D25K, E41K, E74K) with a pI = 10.2. At pH 3, the estimated net charges are +8 for wild-type RNase Sa, +11 for 3K, and +13 for 5K. The m-values for the three proteins at pH 7 and pH 3 are given in Table 2. For all three proteins, the m-value is considerably greater at pH 3 than at pH 7, and the α values increase from 0.32 for wild-type RNase Sa to 0.36 for 3K to 0.50 for 5K. This is consistent with an increase in accessibility to denaturant caused by an expansion of the denatured state due to electrostatic repulsion among the positive charges. As the pH increases from 3 to 7, the carboxyl groups are titrated and both negative and positive charges are present. Now attractive charge–charge interactions are possible, and the decrease in the m-values suggests that the denatured state ensemble is more compact because of attractive Coulombic interactions. In support of this, the α value at pH 7 is lowest for 3K where the number of positive and negative charges is almost equal that for wild-type RNase Sa, which has an excess of negative charges, or 5K, which has an excess of positive charges. This and other evidence led us to conclude that long-range electrostatic interactions are important in determining the denatured state ensemble [69]. We suggest that when the hydrophobic and hydrogen bonding interactions that stabilize the folded state are disrupted, the unfolded polypeptide chain rearranges to compact conformations that improve long-range electrostatic interactions. These charge–charge interactions in the denatured state will reduce the net contribution of electrostatic interactions to protein stability but will help determine the denatured state ensemble. If this idea is correct, then long-range charge–charge interactions in the denatured state may play an important role in the mechanism of protein folding. 6 Practical Considerations 6.1 How to Choose the Best Denaturant for your Study
Both urea and GdmCl will increase the solubility of the folded and unfolded states of a protein. If the denaturant is just going to be used to increase the solubility or unfold your protein of interest, GdmCl is preferred since it is a more potent denaturant and is chemically stable. Guanidinium thiocyanate is an even more potent denaturant than GdmCl and can be used for the same purposes. (The reason for the greater potency is discussed in Mason et al. [12].)
6 Practical Considerations
For quantitative studies of protein stability, we previously preferred results from urea and GdmCl unfolding over thermal unfolding studies for several reasons: the product of unfolding seemed to be better characterized, unfolding was more likely to closely approach a two-state mechanism, and unfolding was more likely to be completely reversible. However, in the cases that have been carefully investigated, the two techniques seem to give estimates of G(H2 O) that are in good agreement [36, 70]. After getting the equipment set up and learning the procedures, either type of unfolding curve can be determined and analyzed in a single day. Consequently, the safest course is to determine both types of unfolding curves whenever possible. Also, we generally prefer urea over GdmCl because salt effects can be investigated, and sometimes reveal interesting information [46, 51, 71, 72]. 6.2 How to Prepare Denaturant Solutions
Both urea and GdmCl can be purchased commercially in highly purified forms. However, some lots of these denaturants may contain fluorescent or metallic impurities. Methods are available for checking the purity of GdmCl and for recrystallization if necessary [73]. A procedure for purifying urea has also been described [74]. GdmCl solutions are stable for months, but urea solutions slowly decompose to form cyanate and ammonium ions in a process accelerated at high pH [75]. The cyanate ions can react with amino groups on proteins [76]. Consequently, a fresh urea stock solution should be prepared for each unfolding curve and used within the day. Table 3 summarizes useful information for preparing urea and GdmCl stock solutions. We prepare urea stock solutions by weight, and then check the concentration by refractive index measurements using the equation given in Table 3 Information for preparing urea and GdmCl stock solutions
Property
Urea
GdmCl
Molecular weight Solubility (25.0 ◦ C) d = d0 a Molarityb
60.056 10.49 M 1 + 0.2658W + 0.0330W2 117.66(N) + 29.753(N)2 + 185.56(N)3
95.533 8.54 M 1 + 0.2710W + 0.0330W2 57.147(N) + 38.68(N)2 − 91.60(N)3
0.495 0.755 1.103
1.009 1.816 –
Grams of denaturant per gram of water to prepare 6M 8M 10 M a
W is the weight fraction denaturant in the solution, d is the density of the solution and d0 is density of water [85]. b N is the difference in refractive index between the denaturant solution and water (or buffer) at the sodium D line. The equation for urea solutions is based on data from Warren and Gordon [86], and the equation for GdmCl solutions is from Nozaki [73].
15
16 Denaturation of Proteins by Urea and Guanidine Hydrochloride
Table 3. The preparation of a typical urea stock solution is outlined below. Since GdmCl is quite hygroscopic, it is more difficult to prepare stock solutions by weight. Consequently, the molarity of GdmCl stock solutions is generally determined from the refractive index measurement (step 5 below). 6.2.1 Preparation of a urea stock solution This describes the preparation of ∼100 mL of ∼10 M urea stock solution containing 30 mM MOPS buffer, pH 7.0. We use a top loading balance with an accuracy of about +0.02 g. 1. Add ≈60 g of urea to a tared beaker and weigh (59.91 g). Now add 0.69 g of MOPS buffer (sodium salt), 1.8 mL of 1 M HCl and ≈52 mL of distilled water and weigh the solution again (114.65 g). 2. Allow the urea to dissolve and check the pH. If necessary, add a weighed amount of 1 M HCl to adjust to pH 7.0. 3. Prepare 30 mM MOPS buffer, pH 7.0. 4. Determine the refractive index of the urea stock solution (1.4173) and of the buffer (1.3343). Therefore, N = 1.4173 − 1.3343 = 0.0830. 5. Calculate the urea molarity from DN using the equation given in Table 3: M = 10.08. 6. Calculate the urea molarity based on the recorded weights. The density is calculated with the equation given in Table 1: weight fraction area (W) = 59.91= 114.65 = 0.5226; therefore d/d0 = 1.148. Therefore volume = 114.65/1.148 = 99.88 mL. Therefore urea molarity = 59.91/60.056/.09988 = 9.99 M.
6.3 How to Determine Solvent Denaturation Curves
Figure 2A shows a typical urea unfolding curve for RNase Sa using CD to follow unfolding. We will refer to the physical parameter used to follow unfolding as y for the following discussion. The curves can be conveniently divided into three regions: (i) The pretransition region, which shows how y for the folded protein, yF , depends upon the denaturant. (ii) The transition region, which shows how y varies as unfolding occurs. (iii) The posttransition region, which shows how y for the unfolded protein, yU , varies with denaturant. All of these regions are important for analyzing unfolding curves. As a minimum, we recommend determining four points in the pre- and posttransition regions, and five points in the transition region. Of course, the more points determined the better defined the curve. With thermodynamic measurements, it is essential that equilibrium is reached before measurements are made, and that the unfolding reaction is reversible. The time required to reach equilibrium can vary from seconds to days, depending upon the protein and the conditions. For example, the unfolding of RNase T1 requires hours to reach equilibrium at 25 ◦ C, as compared to only minutes for RNase Sa. For any given protein, the time to reach equilibrium is longest at the midpoint
6 Practical Considerations
of the transition and decreases in both the pre- and posttransition regions. To ensure that equilibrium is reached, y is measured as a function of time to establish the time required to reach equilibrium. To test the reversibility of unfolding, allow a solution to reach equilibrium in the posttransition region and then by dilution return the solution to the pretransition region and measure y. If the reaction is reversible, the value of y measured after complete unfolding should be within ±5% of that determined directly. For many proteins, unfolding by urea and GdmCl is completely reversible. We have left RNase T1 in 6 M GdmCl for over 3 months and found that the protein will refold completely on dilution. Proteins containing free sulphydryl groups present special problems. If the protein contains only free –SH groups and no disulfide bonds, then a reducing agent such as 10 mM dithiothreitol (DTT) can be added to ensure that no disulfide bonds form during the experiments. For proteins containing both free –SH groups and disulfide bonds, disulfide interchange can occur and this may lead to irreversibility. Disulfide interchange can be minimized by working at low pH [77]. If the unfolding of the protein of interest requires considerable time to reach equilibrium, then each point shown in Figure 2A will have to be determined from a separate solution, prepared volumetrically using the best available pipettes. A fixed volume of protein stock solution is mixed with the appropriate volumes of buffer, and urea or GdmCl stock solutions. The protein and buffer solutions are prepared by standard procedures. After the solutions for measurement have been prepared (see the protocol below), they are incubated until equilibrium is reached at the temperature chosen for determining the unfolding curve. After the measurements have been completed, it is good practise to measure the pH of the solutions in the transition region. If the protein equilibrates quickly, as is the case with RNase Sa, titration methods can be used to determine the denaturation curve. This was the method used to obtain the curve in Figure 2A. This protocol is described in detail in Pace [78] and Scholtz [40]. If the amount of protein is limited, instruments have been developed to follow the unfolding of a protein simultaneously with several spectral techniques using a single protein solution [79]. 6.3.1 Determining a Urea or GdmCl Denaturation Curve 1. Prepare three solutions: a denaturant stock solution, a protein stock solution and a buffer solution. 2. Prepare the sample solutions volumetrically (e.g., with Rainin EDP2 pipettes) in clean, dry test tubes. The solutions made to determine a urea unfolding curve for E41H RNase Sa, along with the resulting CD measurements, are given in Table 4. 3. Allow these solutions to equilibrate at the temperature chosen for the experiment until they reach equilibrium. (This is best determined in a separate experiment as described in the text.) 4. Measure the experimental parameter being used to follow unfolding on the solutions in order of increasing denaturant concentration. Leave the cuvette in the spectrophotometer, and do not rinse between samples. Simply remove the
17
18 Denaturation of Proteins by Urea and Guanidine Hydrochloride Table 4 Experimental results from a urea denaturation curvea
Ureab (mL)
Bufferc (mL)
Proteind (mL)
Ureae (M)
h 234 (mdeg)f
0.298 0.795 1.193 1.267 1.342 1.392 1.466 1.542 1.615 1.690 1.740 1.789 1.839 2.087 2.286
2.152 1.655 1.257 1.183 1.108 1.058 0.984 0.909 0.835 0.760 0.710 0.661 0.611 0.363 0.164
0.050 0.050 0.050 0.050 0.050 0.050 0.050 0.050 0.050 0.050 0.050 0.050 0.050 0.050 0.050
1.2 3.2 4.8 5.1 5.4 5.6 5.9 6.2 6.5 6.8 7.0 7.2 7.4 8.4 9.2
18 18 16 14 12 10 7 4 1 −2 −4 −4 −5 −5 −6
a Only two or three solutions are shown for the pre- and posttransition regions. In the actual experiment, a total of 32 solutions were prepared and measured. b 10.06 M urea stock solution (30 mM MOPS, pH 7.0). c 30 mM MOPS buffer, pH 7.0. d 5.0 mg mL−1 RNase Sa E41H stock solution (30 mM MOPS, pH 7.0). e Urea molarity = 10.06 ((mL urea)/(mL urea + mL protein + mL buffer)). f Circular dichroism measurements at 234 nm using an Aviv 62DS spectropolarimeter.
old solution carefully with a pasteur pipette with plastic tubing attached to the tip, and then add the next sample. Leaving the cuvette in position improves the quality of the measurements, and the error introduced by the small amount of old solution is negligible. 5. Plot these results to determine if any additional points are needed. If so, prepare the appropriate solutions and make the measurements as described for the original solutions. 6. Measure the pH of the solutions in the transition region. 7. Analyze the results as described below. 6.3.2 How to Analyze Urea or GdmCl Denaturant Curves We assume a two-state folding mechanism for the discussion here (this will be correct for many small globular proteins). Consequently, for any of the points shown in Figure 2A, only the folded and unfolded conformations are present at significant concentrations, and f F + f U = 1, where f F and f U represent the fraction protein present in the folded and unfolded conformations, respectively. Thus, the observed value of y at any point will be y = yF f F + yU f U , where yF and yU represent the values of y characteristic of the folded and unfolded states, respectively, under the conditions where y is being measured. Combining these equations gives:
f U =(yF −y)/(yF −yU )
(A1)
6 Practical Considerations Table 5 Analysis of a urea denaturation curvea
Urea (M)
y
yF
yU
fF
fU
K
G (kcal mol−1 )
6.26 6.51 6.76 7.01 7.26 7.51 7.76 8.01 8.26 8.51 8.75 9.00
9.16 8.38 7.35 6.17 4.45 2.75 1.15 −0.66 −2.16 −3.14 −3.94 −4.55
12.01 12.08 12.16 12.23 12.30 12.38 12.45 12.52 12.60 12.67 12.74 12.82
−10.53 −10.22 −9.88 −9.55 −9.23 −8.91 −8.58 −8.26 −7.94 −7.61 −7.29 −6.96
0.87 0.83 0.78 0.72 0.64 0.55 0.46 0.37 0.28 0.22 0.17 0.12
0.13 0.17 0.22 0.28 0.36 0.45 0.54 0.63 0.72 0.78 0.83 0.88
0.14 0.20 0.28 0.39 0.57 0.83 1.16 1.73 2.55 3.53 4.98 7.16
1.15 0.95 0.76 0.56 0.33 0.11 −0.09 −0.33 −0.56 −0.75 −0.95 −1.17
a
Data points were from the transition region of the unfolding curve shown in Figure 2A.
The equilibrium constant, K, and the free energy change, G, can be calculated using K = f U / f F = f U /(1− f U )=(yF −y)/(yF −yU )
(A2)
G=−RT InK =−RT In[(yF −y)/(y−yU )]
(A3)
and
where R is the gas constant (1.987 cal mol−1 K−1 or 8.3144 J K−1 mol−1 ) and Tis the absolute temperature. Values of yF and yU in the transition region are obtained by extrapolating from the pre- and posttransition regions in Figure 2A. Usually yF and yU are linear functions of denaturant concentration, and a least-squares analysis is used to determine the linear expressions for yF and yU . The calculation of f U , K, and G from the data in Figure 2A is illustrated in Table 5. Values of K can be measured most accurately near the midpoints of the curves, and the error becomes substantial for values outside the range 0.1–10. Consequently, we generally only use G values within the range ±1.5 kcal mol−1 . See Eftink and Ionescu [33] and Allan and Pielak [80] for discussions of the problems most frequently encountered in analyzing solvent and thermal denaturation curves. As discussed above, the simplest method of estimating the conformational stability in the absence of urea, G(H2 O), is to use the LEM, which typically gives results such as those shown in Figure 2B. We strongly recommend that values of G(H2 O), m, and [D]1/2 be given in any study of the unfolding of a protein by urea or GdmCl. Remember also that if the LEM is used, one can apply Eq. (5) and use nonlinear least squares [29] to conveniently obtain the unfolding parameters.
19
20 Denaturation of Proteins by Urea and Guanidine Hydrochloride
6.4 Determining Differences in Stability
It is frequently of interest to determine differences in conformational stability among proteins that differ slightly in structure. The structural change might be a single change in amino acid sequence achieved through site-directed mutagenesis, or a change in the structure of a side chain resulting from chemical modification. In Table 6, we present results from urea unfolding studies of wild-type RNase Sa and a mutant that differs in amino acid sequence by one residue. Two different methods of calculating the differences in stability, (G), are illustrated. The midpoints of urea, (urea)1/2 , and thermal, Tm , unfolding curves can be determined quite accurately and do not depend to a great extent on the unfolding mechanism [30]. In contrast, measures of the steepness of urea unfolding curves, m, cannot be determined as accurately, and deviations from a two-state folding mechanism will generally change these values. Consequently, differences in stability determined by comparing the G(H2 O) values can have large errors. However, when comparing completely different proteins or forms of a protein that differ markedly in stability, no other choice is available. This approach is illustrated by the first column of (G) values in Table 6. There is also danger in drawing conclusions about the conformational stabilities of unrelated proteins based solely on the midpoints of their unfolding curves. For example, lysozyme and myoglobin have similar G(H2 O) values at 25 ◦ C, but a much higher concentration of GdmCl is needed to denature lysozyme because the m-value is much smaller. Likewise, lysozyme and cytochrome c are unfolded at about the same temperature, even though lysozyme has a much larger value of G(H2 O) at 25 ◦ C [81]. A second approach that can be used to determine differences in stability of singlesite mutants is to take the difference between [D]1/2 values and multiply this by the average of the m-values. This is illustrated by the (G) values in the last column of Table 6. The rationale here is that the error in measuring the m-values should generally be greater than any differences resulting from the effect of small changes in structure on the m-value. However, substantial differences in m-values between Table 6 Determining differences in conformational stabilitya
Protein
G(H2 O)b
(G)c
md
(Urea)1/2 e
(Urea)1/2 f
(G)g
Wild-type RNase Sa N44 → A RNase Sa
6.09 4.12
– 2.0
1170 1174
5.20 3.51
– 1.69
– 2.0
a
Data from Hebert et al. [87]. From Eq. (4), in kcal mol−1 . c Difference between the G(H2 O) values in kcal mol−1 . d From Eq. (4), in cal mol−1 M−1 . e Midpoint of the urea unfolding curve in M. f Difference between the (urea)1/2 values in M. g From (urea)1/2 x 1172 (the average of the two m-values) in kcal mol−1 . b
References
proteins differing by only one amino acid in sequence have been observed with some proteins, i.e., SNase, so one must use caution when applying this method.
7 Conclusions
Protein denaturants are used for a variety of purposes by protein chemists when they need to solubilize or denature proteins. Urea and GdmCl are the denaurants most often used. The LEM has proven to be a reliable method for measuring the conformational stability of a protein or the difference in stability between two proteins that differ slightly in structure. In addition, studies of the m-values of proteins, especially SNase, have focused attention on the denatured state and we are slowly gaining a better understanding of the denatured state ensemble.
8 Acknowledgments
This work was supported by grants GM-37039 and GM-52483 from the National Institutes of Health (USA), and grants BE-1060 and BE-1281 from the Robert A. Welch Foundation. We thank many colleagues and coworkers over the years that have made important contributions to the field of protein folding.
References 1 SPIRO, K. (1900). Uber die beeinflussung der eiweisscoagulation durch stickstoffhaltige substanzen. Z Physiol Chem 30, 182–199. 2 RAMSDEN, W. (1902). Some new properties of urea. J Physiol 28, 23–27. 3 GREENSTEIN, J. P. (1938). Sulfhydryl groups in proteins. J Biol Chem 125, 501–513. 4 TANFORD, C. (1964). Isothermal unfolding of globular proteins in aqueous urea solutions. J Am Chem Soc 86, 2050–2059. 5 SOPER, A. K., CASTNER, E. W., and LUZAR, A. (2003). Impact of urea on water structure: a clue to its properties as a denaturant? Biophys Chem 105, 649–666. 6 MODIG, K., KURIAN, E., PRENDERGAST, G., and HALLE, B. (2003). Water and urea interactions with the native and unfolded forms of a β-barrel protein. Protein Sci 12, 2768–2781.
7 ZOU, Q., BENNION, B. J., DAGGETT, V., and MURPHY, K. P. (2002). The molecular mechanism of stabilization of proteins by TMAO and its ability to counteract the effects of urea. J Am Chem Soc 124, 1192–1202. 8 TSAI, J., GERSTEIN, M., and LEVITT, M. (1996). Keeping the shape but changing the charges: A simulation study of urea and its isosteric analogs. J Chem Phys 104, 9417–9430. 9 TIRADO-RIVES, J., OROZCO, M., and JORGENSEN, W. L. (1997). Molecular dynamics simulations of the unfolding of barnase in water and 8 M aqueous urea. Biochemistry 36, 7313–7329. 10 SOKOLIC, F. I. A. (2002). Concentrated aqueous urea solutions: A molecular dynamics study of different models. J Chem Phys 116, 1636–1646.
21
22 Denaturation of Proteins by Urea and Guanidine Hydrochloride
11 BENNION, B. J. and DAGGETT, V. (2003). The molecular basis for the chemical denaturation of proteins by urea. Proc Natl Acad Sci USA 100, 5142–5147. 12 MASON, P. E., NEILSON, G. W., DEMPSEY, C. E., BARNES, A. C., and CRUICKSHANK, J. M. (2003). The hydration structure of guanidinium and thiocyanate ions: implications for protein stability in aqueous solution. Proc Natl Acad Sci USA 100, 4557–4561. 13 MULLER, N. (1990). A model for the partial reversal of hydrophobic hydration by addition of a urea-like cosolvent. J Phys Chem 94, 3856–3859. 14 LAIDIG, K. E. and DAGGETT, V. (1996). Testing the modified hydration-shell hydrogen-bond model of hydrophobic effects using molecular dynamics simulation. J Phys Chem 100, 5616–5619. 15 TIMASHEFF, S. N. (2002). Protein hydration, thermodynamic binding, and preferential hydration. Biochemistry 41, 13473–13482. 16 KARPLUS, P. A. and FAERMAN, C. (1996). Ordered water in macro-molecular structure. Curr Opin Struct Biol 4, 770–776. 17 PIKE, A. C. W. and ACHARYA, K. R. (1994). A structural basis for the interaction of urea with lysozyme. Protein Sci 3, 706–710. 18 DUNBAR, J., YENNAWAR, H. P., BANERJEE, S., LUO, J., and FARBER, G. K. (1997). The effect of denaturants on protein structure. Protein Sci 6, 1727–1733. 19 MAKHATADZE, G. I. and PRIVALOV, P. L. (1992). Protein interactions with urea and guanidinium chloride. A calorimetric study. J Mol Biol 226, 491–505. 20 TIMASHEFF, S. N. and XIE, G. (2003). Preferential interactions of urea with lysozyme and their linkage to protein denaturation. Biophys Chem 105, 421–448. 21 COURTENAY, E. S., CAPP, M. W., SAECKER, R. M., and RECORD, M. T., Jr. (2000). Thermodynamic analysis of interactions between denaturants and protein surface exposed on unfolding: interpretation of urea and guanidinium chloride m-values and their correlation with changes in accessible surface area (ASA) using preferential interaction coefficients and
22
23
24
25
26
27
28
29
30
31
32
33
the local-bulk domain model. Proteins Suppl 4, 72–85. SCHELLMAN, J. A. (2003). Protein stability in mixed solvents: a balance of contact interaction and excluded volume. Biophys J 85, 108–125. GOLDENBERG, D. P. (2003). Computational simulation of the statistical properties of unfolded proteins. J Mol Biol 326, 1615–1633. SCHELLMAN, J. A. (1987). Selective binding and colvent denaturation. Biopolymers 26, 549–559. ALEXANDER, S. S. and PACE, C. N. (1971). A comparison of the denaturation of bovinelactoglobulins A and B and goat-lactoglobulin. Biochemistry 10, 2738–2743. PACE, C. N. and TANFORD, C. (1968). Thermodynamics of the unfolding of β-lactoglobulin A in aqueous urea solutions between 5 and 55. Biochemistry 7, 198–208. GREENE, R. F. J. and PACE, C. N. (1974). Urea and guanidine hydrochloride denaturation of ribonuclease, lysozyme, alpha-chymotrypsin, and beta-lactoglobulin. J Biol Chem 249, 5388–5393. SHELTON, V. M., SOSNICK, T. R., and PAN, T. (1999). Applicability of urea in the thermodynamic analysis of secondary and tertiary RNA folding. Biochemistry 38, 16831–16839. SANTORO, M. M. and BOLEN, D. W. (1988). Unfolding free energy changes determined by the linear extrapolation method. 1. Unfolding of phenylmetanesulfonyl α-chymotrypsin using different denaturants. Biochemistry 27, 8063–8068. PACE, C. N. (1986). Determination and analysis of urea and guanidine hydrochloride denaturation curves. Methods Enzymol 131, 266–280. SHORTLE, D. (1995). Staphylococcal nuclease: a showcase of m-value effects. Adv Protein Chem 46, 217–247. BECKTEL, W. J. and SCHELLMAN, J. A. (1987). Protein stability curves. Biopolymers 26, 1859–1877. EFTINK, M. R. and IONESCU, R. (1997). Thermodynamics of protein unfolding:
References
34
35
36
37
38
39
40
41
42
43
questions pertinent to testing the validity of the two-state model. Biophys Chem 64, 175–197. YU, Y., MAKHATADZE, G. I., PACE, C. N., and PRIVALOV, P. L. (1994). Energetics of ribonuclease T1 structure. Biochemistry 33, 3312–3319. HU, C.-Q., STURTEVANT, J. M., THOMSON, J. A., ERICKSON, R. E., and PACE, C. N. (1992). Thermodynamics of ribonuclease T1 denaturation. Biochemistry 31, 4876–4882. PACE, C. N., GRIMSLEY, G. R., THOMAS, S. T., and MAKHATADZE, G. I. (1999). Heat Capacity Change for Ribonuclease A Folding. Protein Sci 8, 1500–1504. HUYGHUES-DESPOINTES, B. M., SCHOLTZ, J. M., and PACE, C. N. (1999). Protein conformational stabilities can be determined from hydrogen exchange rates. Nat Struct Biol 6, 910–912. QU, Y., BOLEN, C. L., and BOLEN, D. W. (1998). Osmolyte-driven contraction of a random coil protein. Proc Natl Acad Sci USA 95, 9268–9273. PACE, C. N. and LAURENTS, D. V. (1989). A new method for determining the heat capacity change for protein folding. Biochemistry 28, 2520–2525. SCHOLTZ, J. M. (1995). Conformational stability of HPr: the histidine-containing phosphocarrier protein from Bacillus subtilis. Protein Sci 4, 35–43. NICHOLSON, E. M. and SCHOLTZ, J. M. (1996). Conformational stability of the Escherichia coli HPr protein: test of the linear extrapolation method and a thermodynamic characterization of cold denaturation. Biochemistry 35, 11369–11378. FELITSKY, D. J. and RECORD, M. T., Jr. (2003). Thermal and urea-induced unfolding of the marginally stable lac repressor DNA-binding domain: a model system for analysis of solute effects on protein processes. Biochemistry 42, 2202–2217. IBARRA-MOLERO, B. and SANCHEZ-RUIZ, J. M. (1996). A model-independent, nonlinear extrapolation procdure for the characterization of protein folding energetics from solvent-denaturation data. Biochemistry 35, 14689–14702.
44 SCHELLMAN, J. A. (2002). Fifty years of solvent denaturation. Biophys Chem 96, 91–101. 45 ZHANG, O. and FORMAN-KAY, J. D. (1995). Structural characterization of folded and unfolded states of an SH3 domain in equilibrium in aqueous buffer. Biochemistry 34, 6784–6794. 46 MAKHATADZE, G. I. (1999). Thermodynamics of protein interactions with urea and guanidinium hydrochloride. J Phys Chem B 103, 4781–4785. 47 AUNE, K. C. and TANFORD, C. (1969). Thermodynamics of the denaturation of lysozyme by guanidine hydrochloride II. dependence on denaturant concentration at 25◦ . Biochemistry 8, 4586–4590. 48 TANFORD, C. (1970). Protein denaturation. C. Theoretical models for the mechanism of denaturation. Adv Protein Chem 24, 1–95. 49 LIEPINSH, E. and OTTING, G. (1994). Specificity of urea binding to proteins. J Am Chem Soc 116, 9670–9674. 50 TIMASHEFF, S. N. (2002). Thermodynamic binding and site occupancy in the light of the Schellman exchange concept. Biophys Chem 101–102, 99–111. 51 SCHELLMAN, J. A. and GASSNER, N. C. (1996). The enthalpy of transfer of unfolded proteins into solutions of urea and guanidinium chloride. Biophys Chem 59, 259–275. 52 IBARRA-MOLERO, B., LOLADZE, V. V., MAKHATADZE, G. I., and SANCHEZ-RUIZ, J. M. (1999). Thermal versus guanidine-induced unfolding of ubiquitin. An analysis in terms of the contributions from charge–charge interactions to protein stability. Biochemistry 38, 8138–8149. 53 MONERA, O. D., KAY, C. M., and HODGES, R. S. (1994). Protein denaturation with guanidine hydrochloride or urea provides a different estimate of stability depending on the contributions of electrostatic interactions. Protein Sci 3, 1984–1991. 54 SANTORO, M. M. and BOLEN, D. W. (1992). A test of the linear extrapolation of unfolding free energy changes over an extended denaturant concentration range. Biochemistry 31, 4901–4907.
23
24 Denaturation of Proteins by Urea and Guanidine Hydrochloride
55 MYERS, J. K., PACE, C. N., and SCHOLTZ, J. M. (1995). Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci 4, 2138–2148. 56 PACE, C. N., LAURENTS, D. V., and THOMSON, J. A. (1990). pH dependence of the urea and guanidine hydrochloride denaturation of ribonuclease A and ribonuclease T1. Biochemistry 29, 2564–2572. 57 SHORTLE, D. and MEEKER, A. K. (1986). Mutant forms of staphylococcal nuclease with altered patterns of guanidine hydrochloride and urea denaturation. Proteins 1, 81–89. 58 SHORTLE, D. (1996). The denatured state (the other half of the folding equation) and its role in protein stability. FASEB J 10, 27–34. 59 GITTIS, A. G., W. E. STITES, E. E. LATTMAN. (1993). The phase transition between a compact denatured state and a random coil state in staphylococcal nuclease is first-order. J Mol Biol 232, 718–724. 60 CARRA, J. H. and PRIVALOV, P. L. (1995). Energetics of denaturation and m values of staphylococcal nuclease mutants. Biochemistry 34, 2034–2041. 61 IONESCU, R. M., M. R. EFTINK. (1997). Global analysis of the acid-induced and urea-induced unfolding of staphylococcal nuclease and two of it variants. Biochemistry 35, 1129–1140. 62 BASKAKOV, I., BOLEN, D. W. (1998). Forcing thermodynamically unfolded proteins to fold. J Biol Chem 273, 4831–4834. 63 PACE, C. N. (1975). The stability of globular proteins. CRC Crit Rev Biochem 3, 1–43. 64 CARRA, J. H., ANDERSON, E. A., and PRIVALOV, P. L. (1994). Three-state thermodynamic analysis of the denaturation of staphylococcal nuclease mutants. Biochemistry 33, 10842–10850. 65 LEE, B. and RICHARDS, F. M. (1971). The interpretation of protein structures: estimation of static accessibility. J Mol Biol 55, 379–400. 66 CREAMER, T. P., SRINIVASAN, R., and ROSE, G. D. (1997). Modeling unfolded states of proteins and peptides. II. backbone
67
68
69
70
71
72
73
74
75
76
77
solvent accessibility. Biochemistry 36, 2832–2835. PACE, C. N., LAURENTS, D. V., and ERICKSON, R. E. (1992). Urea denaturation of barnase: pH dependence and characterization of the unfolded state. Biochemistry 31, 2728–2734. PRIVALOV, P. L., TIKTOPULO, E. I., VENYAMINOV, S., GRIKO YU, V., MAKHATADZE, G. I., and KHECHINASHVILI, N. N. (1989). Heat capacity and conformation of proteins in the denatured state. J Mol Biol 205, 737–750. PACE, C. N., ALSTON, R. W., and SHAW, K. L. (2000). Charge–charge interactions influence the denatured state ensemble and contribute to protein stability. Protein Sci 9, 1395–1398. THOMSON, J. A., SHIRLEY, B. A., GRIMSLEY, G. R., and PACE, C. N. (1989). Conformational stability and mechanism of folding of ribonuclease T1. J Biol Chem 264, 11614–11620. PACE, C. N. and GRIMSLEY, G. R. (1988). Ribonuclease T1 is stabilized by cation and anion binding. Biochemistry 27, 3242–3246. YANG, M., FERREON, A. C., and BOLEN, D. W. (2000). Structural thermodynamics of a random coil protein in guanidine hydrochloride. Proteins Suppl, 44–49. NOZAKI, Y. (1972). The preparation of guanidine hydrochloride. In Methods in Enzymology (HIRS, C. H. W. and TIMASHEFF, S. N., eds), Vol. 26, pp. 43–50. Academic Press, New York. PRAKASH, V., LOUCHEUX, C., SCHEUFELE, S., GORBUNOFF, M. J., and TIMASHEFF, S. N. (1981). Interactions of proteins with solvent components in 8 M urea. Arch Biochem Biophys 210, 455–464. HAGEL, P., GERDING, J., FIEGGEN, W., and BLOEMENDAL, H. (1971). Cyanate formation in solutions of urea. I. Calculation of cyanate concentrations at different temperature and pH. Biochim Biophys Acta 243, 366–373. STARK, G. R. (1965). Reactions of cyanate with functional groups of proteins. 3. Reactions with amino and carboxyl groups. Biochemistry 4, 1030–1036. GRAY, W. R. (1997). Disulfide bonds between cysteine residues. In Protein
References
78
79
80
81
82
Structure: A Practical Approach, 2nd edn (CREIGHTON, T. E., ed.), pp. 165–186. IRL Press, Oxford. PACE, C. N., HEBERT, E. J., SHAW, K. L. et al. (1998). Conformational stability and thermodynamics of folding of ribonucleases Sa, Sa2 and Sa3. J Mol Biol 279, 271–286. SAITO, Y. and WADA, A. (1983). Comparative study of GuHCl denaturation of globular proteins. I. Spectroscopic and chromatographic analysis of the denaturation curves of ribonuclease A, cytochrome c, and pepsinogen. Biopolymers 22, 2105–2122. ALLEN, D. L. and PIELAK, G. J. (1998). Baseline length and automated fitting of denaturation data. Protein Sci 7, 1262–1263. PACE, C. N. and SCHOLTZ, J. M. (1997). Measuring the conformational stability of a protein. In Protein Structure: A Practical Approach, 2nd edn. (CREIGHTON, T. E., ed.), pp. 299–321. IRL Press, Oxford. SHAW, K. L. (2000). Reversing the net charge of ribonuclease sa. Texas A&M University, PhD dissertation. College Station, TX.
83 TADDEI, N., CHITI, F., PAOLI, P. et al. (1999). Thermodynamics and kinetics of folding of common-type acylphosphatase: comparison to the highly homologous muscle isoenzyme. Biochemistry 38, 2135–2142. 84 HUGHSON, F. M. and BALDWIN, R. L. (1989). Use of site-directed mutagenesis to destabilize native apomyoglobin relative to folding intermediates. Biochemistry 28, 4415–4422. 85 KAWAHARA, K. and TANFORD, C. (1966). Viscosity and density of aqueous solutions of urea and guanidine hydrochloride. J Biol Chem 241, 3228–3232. 86 WARREN, J. R. and GORDON, J. A. (1966). On the refractive indices of aqueous solutions of urea. J Phys Chem 70, 297–300. 87 HEBERT, E. J., GILETTO, A., SEVCIK, J. et al. (1998). Contribution of a conserved asparagine to the conformational stability of ribonucleases Sa, Ba, and T1. Biochemistry 37, 16192–16200. 88 WANG, A., BOLEN, D. W. (1997). A naturally occurring protective system in urea-rich cells: mechanism of osmolyte protection of proteins against urea denaturation. Biochemistry 36, 9101–9108.
25
1
Thermal Unfolding of Proteins George I. Makhatadze Penn State University, Hershey, USA
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Protein folding/unfolding reactions and protein–ligand interactions, like any other chemical reactions, are accompanied by heat effects. At a constant pressure, these heat effects are described by the enthalpy of the process. Temperature-induced changes in the enthalpy of the macroscopic states of a protein are of particular significance. Temperature and enthalpy are coupled extensive and intensive thermodynamic parameters. Thus, the functional relationship between enthalpy and temperature f = H(T) includes all the thermodynamic information on the macroscopic states in the system. This information can be extracted using the formalism of equilibrium thermodynamics. The enthalpy H(T) function of the system can be determined experimentally from the calorimetric measurements of the heat capacity at constant pressure, Cp (T), of a system over a broad temperature range, as: Cp (T )=
∂H ∂T
T
and thus H(T )= p
Cp (T )dT +H(T0 )
(1)
T0
The heat capacity function of a system, Cp (T), is directly measured by differential scanning calorimetry (DSC), making DSC the method of choice to study the conformational stability of biological macromolecules in general and proteins in particular. To study biological macromolecules the DSC instrumentation must satisfy a number of requirements, the most important of which is that measurements must be performed on dilute (< 1 mg mL−1 /0.1 weight %) protein solutions to minimize intermolecular interactions, and must use a small volume because in most cases only a few milligrams of protein are available. With these requirements Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Thermal Unfolding of Proteins
Fig. 1 Schematic design of the cell assembly of a contemporary DSC instrument. The matched sample cell (C1) and reference cell (C2) of the total fill type with inlet capillary tubes for filling and cleaning are surrounded by inner (IS) and outer (OS) shields. Two main heating elements (H1) on the cells are supplying electrical power for scanning upward in temperature. The auxiliary heating
elements (H2) are activated by a feedback loop to equilibrate the difference in temperatures between cells (i.e., to maintain T 1 = 0 ◦ C). Additional heaters (H3) on the inner shield (IS) maintain the temperature difference between the cell and the shield T 2 = 0 ◦ C. The feedback loops are controlled by a PC board.
satisfied, the partial heat capacity of a protein in aqueous solution is on the order of 0.03–0.5%, which is measured on the background of more than 99.5% heat capacity of solvent. Contemporary DSC instruments [1, 2] are designed to satisfy these extreme sensitivity, reproducibility, and signal stability requirements (Figure 1). Extreme sensitivity of the instrumentation implies that particular care should be taken over experimental procedures for sample preparation and routine operation of DSC [3, 4]. Some essential considerations for setting up a DSC experiment, and discussion of critical factors are given in Section 7. For those interested in collection of numerical data of thermodynamic parameters such as Gibbs free energy change, enthalpy change, heat capacity change, transition temperature etc., two excellent sources of information are Refs. [5] and [6].
2 Two-state Unfolding
First we consider in details the simplest case – a two-state model of protein unfolding. Some of the more complex cases will be discussed in Section 6.
2 Two-state Unfolding
For a monomolecular two-state process: KU N ↔ U the equilibrium constant is defined by the concentrations of protein in the native and unfolded states at a given temperature: K (T )=
[U] [N]
(2)
while the fraction of the molecules in the native, F N , and unfolded, F U , states is defined as: FN (T )=
[N] [U] and FU (T )= [N]+[U] [N]+[U]
(3)
Combining Eqs. (2) and (3), the populations of the native and unfolded states are defined by the equilibrium constant as: FN (T )=
1 K (T ) and FU (T )= 1+K (T ) 1+K (T )
(4)
The equilibrium constant of the unfolding reaction, K(T), on the other hand, is related to the Gibbs energy change upon unfolding as: G(T ) K (T ) =exp − RT
(5)
The Gibbs energy of unfolding, G(T), is defined as: G(T )=H(T )−T ·S(T )
(6)
where H(T) is the enthalpy change and S(T) is the entropy changes upon protein unfolding. The temperature dependence of both parameters is defined by the heat capacity change upon unfolding, Cp : Cp =
dS(T ) dH(T ) =T · dT dT
(7)
Assuming that Cp is independent of temperature we can write: H(T )=H(T0 )+Cp (T −T0 ) T S(T )=S(T0 )+Cp ·1n T0
(8) (9)
3
4 Thermal Unfolding of Proteins
where H (T 0 ) and S (T 0 ) are the enthalpy and entropy changes at a reference temperature T 0 . For a two-state process it is convenient to set the reference temperature at which fractions for the native, F N , and unfolded, F U , protein are equal. Correspondingly, at this temperature, which is usually called the transition temperature, T m , the equilibrium constant is equal to 1 and thus the Gibbs energy is equal to zero: G(Tm )=−RT 1nK (Tm )=H(Tm )−Tm S(Tm )=0
(10)
which allows the entropy of unfolding at T m to be expressed as: S(Tm )=
H(Tm ) Tm
(11)
and the entropy function as S(T )=
T H(Tm ) +Cp ·In Tm Tm
(12)
Thus the temperature dependence of the Gibbs energy can be written as: G(T )=
Tm Tm −T ·H(Tm )+Cp ·(T −Tm )+T ·Cp ·In Tm T
(13)
The importance of Eq. (13) is that if just three parameters, T m , H(T m ), and Cp , are known, all thermodynamic parameters H, S, and G for the two-state process can be calculated for any temperature. Figure 2 shows the temperature dependence of the partial heat capacity, Cp,pr (T) of a hypothetical protein. It consists of two terms: prg
Cp,pr (T )=Cp (T )+Cp (T ) exc
(14)
where Cp prg (T) is the so-called progress heat capacity (sometimes also called the “chemical baseline”) which is defined by the intrinsic heat capacities of protein in different states, and exc is the excess heat capacity which is caused by the heat absorbed upon protein unfolding. The progress heat capacity for a two-state unfolding can be defined as: prg Cp (T ) −FN ·Cp,N (T ) +FU ·Cp,U (T )
(15)
where Cp,N (T) and Cp,U (T) are the temperature dependencies of the heat capacities of the native and unfolded states, respectively. It is important to emphasize that the heat capacity of the unfolded state is higher than that of the native state, Cp,U > Cp,N . Extensive studies of the heat capacities of the native and unfolded states of small globular proteins have established some general properties [7]. The heat capacity function for the native state appears to be a linear function of temperature with a
2 Two-state Unfolding
Fig. 2 A) Temperature dependence of the partial heat capacity, C p,pr (T), of a hypothetical protein that has the following thermodynamic parameters of unfolding: T m = 65 ◦ C, H = 250 kJ mol−1 , C 5 kJ mol−1 p K−1 Cp,N (T) and C p,U (T) are the temperature dependencies of the heat capacities of the native and unfolded states, respectively, and C p prg (T) is the progress heat
capacity (sometimes also called the “chemical baseline”). B) The excess heat capacity, C p (T)exc , is obtained as a difference between C p pr (T) and C prg (T). The area under C p (T)exc is the enthalpy of unfolding, and temperature at which C p (T)exc is maximum defines the transition temperature, Tm.
rather similar slope 0.005–0.008 J K−2 g−1 when calculated per gram of protein. The absolute values for Cp,N vary between 1.2 and 1.80 J K−1 g−1 at 25 ◦ C. For the unfolded state, the absolute value at 25 ◦ C is always higher than that for the native state (Cp,U > Cp,N and ranges between 1.85 and 2.2 J K−1 g−1 . The heat capacity of the unfolded state has nonlinear dependence on temperature. It increases with increase of temperature and reaches a plateau between 60 and 75 ◦ C. Above this temperature Cp,U Cp,U remains constant with the typical values ranging between 2.1 and 2.5 J K−1 g−1 . The excess heat capacity function can thus be calculated as a difference between experimental and progress heat capacities. The excess heat capacity function is very important as it provides the temperature dependence of the fraction of protein in the native and unfolded states: T ( ) exc Qi (T ) 0 C p T dT = ∞ FN (T ) =1−FU (T ) = Qtot ( ) exc 0 C p T dT
(16)
5
6 Thermal Unfolding of Proteins
and the temperature dependence of the equilibrium constant ∞ K (T ) = TT 0
Cp (T ) exc dT
Cp (T ) exc dT
(17)
The temperature dependence of the equilibrium constant defines the van’t Hoff enthalpy of a two-state process as: HvH =−R·
d InK (T ) d (1/T )
(18)
The area under the exc is the calorimetric (model independent) enthalpy of unfolding: Hcal =
∞
Cp (T ) exc dT
(19)
0
Comparison of the two enthalpies – van’t Hoff enthalpy calculated for a two-state transition and model-free calorimetric enthalpy of unfolding – is the most rigorous test of whether protein unfolding is a two-state process [8–11]. If calorimetric enthalpy is equal to van’t Hoff enthalpy Hcal (T m ) = HvH (T m ), the unfolding process is a two-state process. If calorimetric enthalpy is larger than the van’t Hoff enthalpy, Hcal (T m ) > HvH (T m ), the unfolding process is more complex, such as being not monomolecular or involving intermediate states (see Section 6). Instead of the van’t Hoff enthalpy, it is more practical to use the fitted enthalpy of two-state unfolding. This can be done using the following approach. The excess enthalpy function (relative to the native state) is defined as [2, 8]: Hexc =FU ·H (T )
(20)
where H(T) is the enthalpy difference between native and unfolded states. By definition, the excess heat capacity function is a temperature derivative of the excess enthalpy at constant pressure: Cp exc =
∂ FU ∂Hexc =H· ∂T ∂T
(21)
Note that the way exc is defined by Eq. (14), the temperature dependence of H is in the Cp prg (T) function. Combining Eqs. (4, 8), and (21) it is easy to show that the excess heat capacity exc is defined as: Cp (T ) exc =
Hfit2 K · R·T 2 (1+K )2
(22)
3 Cold Denaturation
where the enthalpy function is defined as H(T): H (T ) =Hfit (Tm ) +Cp · (T −Tm )
(23)
The experimental partial molar heat capacity function, Cp,pr (T), is then fitted to the following expression: Cp,pr (T ) =FN (T ) ·Cp,N (T ) +Cp (T ) exc +FU (T ) ·Cp,U (T )
(24)
The heat capacity functions for the native and unfolded states can be represented by the linear functions of temperature as: Cp,N (T ) =AN ·T +BN
(25)
Cp,U (T ) =AU ·T +BU
(26)
There are seven parameters to fit: T m ; Hfit ; Cp ; AN ; AU ; BN , and BU . 3 Cold Denaturation
Initial studies of the thermodynamics of protein unfolding [9, 12–15] done on small globular proteins established that equilibrium protein unfolding is a twostate process, i.e., HvH = Hcal and that the protein unfolding is accompanied by a large positive heat capacity change. Direct consequence of large heat capacity changes upon unfolding is that both enthalpy and entropy functions are strongly temperature dependent which in turn leads to a temperature-dependent Gibbs energy function (Figure 3). Both the enthalpy and entropy are expected to be zero at temperature T h and T s . By definition, the slope of temperature dependence of the G function is equal to −S: ∂G (T ) =−S (T ) ∂T Thus at T s S = 0 and G has a maximum at T s , so that G(T s ) = H(T s ). The enthalpy function crosses zero at T h , which is always lower than T s (i.e., T h < T s ), however since T s − T h = H(T s )=Cp , this difference is only a few degrees. There is a simple relationship between T h and the transition temperature T m : T m – T h = H(T m )/Cp . The fact that G function has a maximum at T s means that it can cross zero at two different temperatures: once at temperature T m > T s and again at T m < T s . T m is the temperature of denaturation observed upon heating the solution. T m is the temperature of protein denaturation upon cooling, the so-called cold denaturation. There is a simple relationship between T m and T m . The equation T m H ≈ T m − 2 · H(T m )/Cp overestimates the temperature of cold denaturation only by few degrees.
7
8 Thermal Unfolding of Proteins
Fig. 3 Temperature dependencies of Gibbs energy, G, enthalpy, H, and entropic contribution, T · S, for the same hypothetical protein as in Figure 2. Thermodynamics functions were calculated according to equations using the following parameters: T m
= 65◦ C, H = 250 kJ mol−1 , C p =5 kJ mol−1 K−1 . Other abbreviations: T m temperature of cold denaturation, T h temperature at which enthalpy is zero, T s temperature at which entropy is zero.
Cold denaturation was predicted and later its existence confirmed based on the extrapolation of thermodynamic functions obtained from the study of heat denaturation process [8, 16–19]. Experimental validation of the cold denaturation phenomenon once again demonstrated the power of statistical thermodynamics in the analysis of the conformational properties of proteins.
4 Mechanisms of Thermostabilization
From the point of view of applied properties of proteins, two parameters are important – the thermodynamic stability as measured by the Gibbs energy and the thermostability as measured by the transition temperature. Depending on the relationship between the T s ,T m and absolute values of G, three extreme models [20–24] account for the increase in G and/or T m and the relationship between these two parameters (Figure 4). Ĺ Model I Increase in thermostability is accompanied by little or no change in T s but with an increase in stability at T s . This can be achieved by lowering the entropy of unfolding without changes in the enthalpy of unfolding or in Cp . Ĺ Model II Increase in thermostability is accompanied by little or no change in T s and no changes in stability at T s , but the temperature dependence of G
4 Mechanisms of Thermostabilization
Fig. 4 Thermodynamic models for thermostabilization relative to the reference protein. Detailed thermodynamic parameters for the reference protein (R), and models I, II and III are given in Table 1.
becomes less steep. This can be achieved by lowering Cp , but both enthalpy and entropy will change as well. Ĺ Model III Increase in thermostability is accompanied by the increase in T s but no changes in stability at T s , or in the temperature dependence of G function. The increase in stability is due to the decrease in both enthalpic and entropic factors, but decrease in T · S is larger than in H, while Cp remains unchanged.
Of course these three models represent “extreme” cases and the real systems can use combination of all three simultaneously. Nevertheless, understanding the contributions of different factors to the thermodynamics of proteins in terms of Cp , H, and S allows strategies for rational modulation of protein stability
9
10 Thermal Unfolding of Proteins Table 1 Thermodynamic parameters used in the simulation of different models for ther-
mostabilization shown in Figure 4 T m T s C p (kJ H(T m ) G(T S ) (kJ H(65 ◦ C) T·S(65 ◦ C) G(65 ◦ C) (kJ mol−1 ) (kJ mol−1 ) (kJ mol−1 ) (◦ C) (◦ C) mol−1 K −1 ) (kJ mol−1 ) mol−1 ) Reference Model 1 Model 2 Model 3
65 75 75 75
18 19 18 27
5.00 5.00 3.44 5.00
250 300 212 254
17.6 24.6 17.6 17.6
250 250 177 204
250 242 172 197
0 7.9 5.6 6.6
and/or thermostability to be designed. These strategies are discussed in the next section.
5 Thermodynamic Dissection of Forces Contributing to Protein Stability
The ultimate goal of the field is to be able to understand the determinants of protein stability in such details that will allow the prediction of the stability profile (i.e., G) of any protein under any conditions such as temperature, pH, ionic strength, etc. For example, the slope of the G dependence on temperature is −S, which in turn is dependent on temperature and this dependence is defined by Cp . Thus one needs to know not only a value for G at one particular temperature, but also S, H, and Cp to calculate the G(T) function. Another potential complexity arises from the fact that G being the difference between the enthalpic (H) and entropic (T · S) terms, is subject to the so-called enthalpy–entropy compensation phenomenon [25–27]. It has been observed in a number of systems that changes in the enthalpy of a system are often compensated by the changes in the entropy in such a way that the resulting changes in G are very small. One particularly striking example of such compensation is the effect of heavy (D2 O) and ordinary light (H2 O) water on protein stability [28]. Heavy water as a solvent provides probably the smallest possible perturbation to the properties H2 O. The stability (G) of proteins in H2 O is very similar to the stability of proteins in D2 O (Figure 5). Such similarity in the G values might lead to the conclusion that D2 O did not perturb the properties of the system relative to H2 O. However, if one looks at the corresponding changes in enthalpy (Figure 5) and entropy (not shown), it is clear that D2 O has a profound effect on the system, and decreases the enthalpy of unfolding by ∼20%. This decrease in enthalpy is accompanied by a decrease in entropy in such a way that the resulting G is largely unchanged. This result underlines the importance of the analysis of the protein stability in terms of enthalpy, entropy, and heat capacity, so that underlying physicochemical processes are not masked by the enthalpy-entropy compensations. The effect of D2 O on protein stability also underlines the importance of the solvent (or interactions of
5 Thermodynamic Dissection of Forces Contributing to Protein Stability
Fig. 5 Comparison of the unfolding thermodynamics for three different proteins ribonucleases A (RNaseA), hen-egg white lysozyme (Lyz) and cytochrome c (Cyt), in ordinary (H2 O, squares) and heavy (D2 O, circles) water. A) Dependence of the transition temperature, T m , on the pH/pD. B) Dependence of the enthalpy of unfolding,
H, on the transition temperature, T m . The slopes which represent the heat capacity change upon unfolding (C p ,kJK−1 mol−1 ) are 4.8 (d-RNase), 5.2 (h-RNase), 6.4 (dLyz), 6.7 (h-Lyz), 4.7 (d-Cyt c) and 5.5 (h-Cyt c). C) Dependence of the Gibbs energy of unfolding at G (25 ◦ C) on the pH/pD. Reproduced with permission from Ref. [28].
protein groups with the solvent) for protein stability. Even such small perturbation in the properties of solvent such as D2 O versus H2 O produces dramatic changes in the enthalpies and entropies of unfolding. Thus in analysis of the factors important for proteins stability one needs to consider not only the interactions between protein groups but also interactions between protein groups and the solvent. The next three sections will discuss the contribution of different types of interactions to the heat capacity, enthalpy and entropy changes upon protein unfolding.
11
12 Thermal Unfolding of Proteins
5.1 Heat Capacity Changes, Cp
It is currently believed that the heat capacity changes upon unfolding are entirely defined by the interactions with the solvent of the proteins groups that were buried in the native state but become exposed upon unfolding. Both polar and nonpolar groups seem to contribute although with different sign and magnitude. It appears that nonpolar groups have a large positive contribution to the heat capacity of unfolding, while the polar groups have smaller (for a group of similar size) and negative contribution to Cp . These estimates of Cp for polar and nonpolar groups were obtained from the study of thermodynamics of transfer for model compounds [7, 29–35]. To link the data on model compounds to the results obtained on proteins, the heat capacity changes for model compounds were computed per unit of solvent accessible surface area, and then used as coefficients to calculated the hydration heat capacity for proteins. It has been shown that the Cp values calculated are in a very good agreement with the experimentally determined heat capacity changes for proteins. From this correlation it was concluded that the Cp of protein unfolding is caused exclusively by the hydration and that the disruption of internal interactions in the native state do not contribute to the Cp . This conclusion was in disagreement with the estimates done by Sturtevant [36], who showed that the changes in vibrational entropy of protein upon unfolding should contribute 20–25% to the experimentally observed changes in Cp . The common belief that Cp is entirely defined by the hydration was questioned by experiments of Loladze et al. [37]. These authors measured experimentally the effects of single amino acid substitutions at fully buried positions of ubiquitin on the Cp of this protein. It was shown that substitution of the fully buried nonpolar residues with the polar ones leads to a decrease in Cp , while the substitutions of fully buried polar residues with nonpolar ones leads to an increase in Cp . These results provided direct experimental support for the model compound data (that indicated difference in the sign for the Cp of hydration for polar and nonpolar groups) to be qualitatively applicable to proteins. Quantitatively, however, the results of substitutions on Cp were not possible to predict based on the model compound data and surface area proportionality. This led to the conclusion that the dynamic properties of the native state also contribute to Cp [37, 38]. At this point quantitative correlation between native state dynamics and Cp awaits experimental validation. 5.2 Enthalpy of Unfolding, H
The enthalpy of protein unfolding is positive, which means that enthalpically the native state is favored over the unfolded state. Two major types of interactions can be considered as a possible source of enthalpy: interactions with the solvent and internal interactions between protein groups in the native state.
5 Thermodynamic Dissection of Forces Contributing to Protein Stability 13
Interactions with the solvent by the protein groups newly exposed upon unfolding of the native protein structure are expected to be enthalpically unfavorable (enthalpically hydration of the unfolded state is favored over the hydration of the native state). This follows from the data on the transfer of the model compounds from the gas phase into water; for both polar and nonpolar groups, the enthalpy of transfer is negative [28, 32, 34, 39, 48]. The absolute magnitude of the effects for a polar group and a nonpolar group of the same size is very different. It is far more enthalpically unfavorable to transfer a polar group into water that it is for a nonpolar group. This large difference in the enthalpy of hydration of polar and nonpolar groups in proteins was recently demonstrated experimentally by measuring the effects of nonpolar to polar substitutions at fully buried sites on the enthalpy of unfolding [40]. If the enthalpy of dehydration is unfavorable, it is clear that the internal interactions between protein groups in the native state are enthalpically very stabilizing. The internal interactions are the hydrogen bonding and packing interactions between groups in the protein interior. The packing of the protein core is as good as the packing of organic crystals. Calculations show that the packing density in proteins is very similar to that for organic crystals [41, 42]. Melting or sublimation of a crystal is accompanied by large enthalpy, and one can expect that disruption of packing interactions in the protein interior will also lead to large positive enthalpy. Direct evidence for this was obtained by measuring the effects of substitutions that perturb packing in the protein interior [40]. It was shown that cavity creating substitutions in the protein core led to substantial decrease in the enthalpy of unfolding. Hydrogen bonding also makes a favorable contribution to the protein stability [43, 44]. The net effect of disruption of hydrogen bonding inside the protein is difficult to measure. However, substitutions that disrupt hydrogen bonding with minimal (but not negligible) perturbations in the packing and hydration indicate that there is a significant enthalpic effect associated with the disruption of internal hydrogen bonds [40]. One needs to keep in mind that this effect is for disruption of hydrogen bonds in the protein interior and does not include the effect of hydration. These two effects are very difficult to estimate separately. The combined effect, i.e., enthalpy of disruption of hydrogen bonds in the protein interior followed by exposure (hydration) of hydrogen bonding groups to the solvent is more straightforward for the experimental analysis. Recent advances in calorimetric instrumentation and the development of novel model systems have provided estimates for the enthalpy of hydrogen bonding upon helix-coil transition [45–47]. The measured value is relatively small (∼3 kJ mol−1 ) but positive, indicating that the favorable enthalpy of internal hydrogen bonding is larger than the unfavorable enthalpy of hydration. To summarize, native protein is enthalpically more stable than the unfolded state. Internal interactions (packing and hydrogen bonding) are enthalpically stabilizing, while exposure to the solvent upon unfolding of the buried in the native state groups is enthalpically destabilizing.
14 Thermal Unfolding of Proteins
5.3 Entropy of Unfolding, S
Protein unfolding is accompanied by increase in entropy. This entropy increase is caused by two factors. One factor is the interaction of the newly exposed groups with the solvent in the unfolded state. The other factor is the increase in the number of accessible conformations in the unfolded state relative to the native state. Interactions with the solvent by both polar and nonpolar groups occur with the negative entropy [33, 34, 48]. The molecular details of these effects just start to become clear but in a simplified way they can be described by the increase in order of water molecules around the solute (polar or nonpolar) relative to the bulk water. The molecular mechanism of this ordering is different for polar and nonpolar groups [49]. For a group of the same size, the entropy decrease due to hydration for polar groups appears (based on the model compound study) to be larger than that for the nonpolar groups. Such an effect is expected intuitively, based on consideration that the polar groups will have more ordering effect on the water dipole. This decrease in entropy due to hydration of protein groups newly exposed to solvent in the unfolded state is overcompensated by the increase (relative to the native state) in the configurational freedom of the polypeptide chain in the unfolded state. There are several different estimates of the magnitude of the configurational entropy in proteins [50–55]. The actual value will probably be strongly context dependent, and will depend on the conformations that can be populated within the unfolded state ensemble. Overall, hydration of both polar and nonpolar groups (the latter sometimes called hydrophobic effect) entropically favors the native state, while the increase in conformational freedom favors the unfolded state. On balance, the protein structure is stabilized enthalpically. Large enthalpic stabilization is offset by entropic destabilization, rendering the protein molecule marginally stable [56]. One can only speculate why proteins evolve to be marginally stable. From the biological standpoint, low stability might be advantageous for rapid regulation of protein levels in the cell, because low stability most certainly will facilitate protein degradation by proteases or more complex cellular machineries. Transport of proteins among different cellular compartments involves folding–unfolding and thus low stability will facilitate transport across the membrane [57]. Low stability will probably permit conformational transitions between different functional states, and thus will be advantageous for regulation. Low stability will also have important consequences for the kinetics of folding/unfolding reactions and thus might be important in preventing accumulation of intermediate states that have detrimental properties, such as a tendency to aggregate [57–59]. In discussing protein stability we largely concentrate on the residues that are buried in the native state as a major source for the stability. Indeed, interactions in the core of the native protein and interactions of these residues with the solvent in the unfolded state provide the dominant effects for protein stability. This does
6 Multistate Transitions
not mean, however, that the surface residues do not contribute to stability [60– 63]. They are in most cases not dominant factors defining protein energetics, yet are significantly large relative to the overall protein stability (G) and thus they are important for modulating protein stability and, in the case of charge–charge interactions, enhancing protein solubility.
6 Multistate Transitions
In 1978 Biltonen and Freire [8] proposed an universal mechanism for equilibrium unfolding of proteins based on standard statistical thermodynamics approach. They considered a general reaction in which an initial native state, N, undergoes conformational transition to the final unfolded state, U, via a series of intermediate states, Ii : N↔I1 ↔I2 · · ·↔In ↔U and showed that the excess enthalpy of the systems (enthalpies of all states in the system relative to the initial state) can be written as: Hexc =
n
Fi ·Hi
(27)
i=0
where F i is the fractional population of i-th state and Hi is the enthalpy of the i-th state relative to the initial state. Keeping in mind that since Hexc =
T T0
Cpexc ·dT
(28)
knowledge of the excess heat capacity function provides a unique description of the thermodynamic properties of the system. One practical approximation for the multistate transitions is that when the transitions involve many states, it is convenient to assume that the thermodynamic domains behave in an independent manner, with each individual transition approximated by a two-state transition. In such case the excess heat capacity function can be simply written as: Cpexc =
(Hi )2 i
RT 2
·
Ki (1+K i )2
(29)
where K i is the equilibrium constant for the i-th reaction. The following sections discuss several of the commonest cases for the unfolding process other than the simple two-state case.
15
16 Thermal Unfolding of Proteins
6.1 Two-state Dimeric Model
In this model only the native dimer, N2 , and unfolded monomer, U, are significantly populated. K N2 ↔ 2·U The equilibrium constant is then defined as: K=
[U]2 [N2 ]
(30)
where [N2 ] and [U] are the concentrations of native dimers and monomers, respectively. The fraction of protein in these two states can be presented as: 1 [N2 ]= ·PT ·FN2 and [U]=PT ·FU 2
(31)
where FN2 and F U are the fractions of native dimer and unfolded monomers, respectively, and PT , the total protein concentration expressed per mole of monomers is: PT =2·[N2 ]+[U]
(32)
Combining Eqs. (30) and (32) allows the equilibrium constant to be expressed as a function of total protein concentration and a fraction of unfolded protein as: K=
2·PT ·FU2 1−FU
(33)
Solving a simple quadratic equation for F U , we arrive at: FU =
K 2 +8·PT ·K −K 4PT
(34)
The excess enthalpy of the system is related to the enthalpy of unfolding as: Hexc =FU ·H (T )
(35)
which allows calculation of the excess heat capacity function by simple analytical differentiation: Cpexc =
∂ FU H (T )2 FU · (1−FU ) ∂Hexc =H· = ∂T ∂T R·T 2 2−FU
(36)
6 Multistate Transitions
Thus the experimental partial molar heat capacity profile can be expressed as: Cp,pr (T ) =FN (T ) ·Cp,N (T ) +Cpexc (T ) +FU (T ) ·Cp,U (T )
(37)
where the heat capacity functions for the native and unfolded states are expressed according to Eqs. (25–26). Calculation of the standard thermodynamic quantities such as Gibbs energy can be done by defining the reference temperature T m as a temperature at which FN2 = 0.5, and thus according to the Eq. (33) the equilibrium constant at this temperature is equal to PT . The standard Gibbs energy can then be expressed as: G 0 =
Tm −T Tm −R·T ·InK (Tm ) ·H (Tm ) +Cp · (T −Tm ) +T ·Cp ·In Tm T
(38)
It must be emphasized that the G0 function in this case is refers to a 1 mol L−1 concentration of dimers. 6.2 Two-state Multimeric Model
For the reaction of unfolding of homo-oligomeric system that occurs without significant population of intermediate states, i.e. K Nn ←→ n·U it is easy to show that the excess heat capacity function can be expressed simply as
Cpexc =
H (T )2 FU · (1−FU ) R·T 2 n−FU · (n−1)
(39)
where n is the order of the reaction. Obviously, for n = 2, Eq. (39) is reduced to Eq. (36) 6.3 Three-state Dimeric Model
In this model, not only the native dimer, N2 , and the unfolded monomer, U, but also a native monomer, N, are significantly populated. Kd KU N2 ←→ 2·N ←→ 2·U
17
18 Thermal Unfolding of Proteins
Equilibrium constants of dissociation, K d , and of unfolding, K U , are then defined as: K d=
[N]2 [U] and K U = [N2 ] [N]
(40)
where [N2 ], [N], and [U] are the concentrations of the native dimers, the native monomers and the unfolded monomers, respectively. The fraction of protein in these three states can be presented as: 1 [N2 ]= FN2 ·PT 2
(41)
[N]=FN ·PT
(42)
[U]=PT ·FU
(43)
where FN2 , F N , and F U are the fractions of native dimer, native monomer, and unfolded monomer, respectively, and PT , the total protein concentration expressed per mole of monomers, is: PT=2[N2 ]+[N]+[U]
(44)
Combining Eqs. (40)–(44) allows the equilibrium constants to be expressed as a function of total protein concentration and fractions of native and unfolded protein as: K d=
2·FN2 ·PT FN2
K U=
(45)
FU
(46)
K d ·FN2 2·PT
Keeping in mind that FN2 + F N + F U = 1 we can combine Eqs. (45) and (46) to get: FN2 =1−
K d ·FN2 −K U · 2·PT
K d ·FN2 2·PT
(47)
Substituting 1 + K U = q and solving the quadratic equation we arrive at the following expressions for the fractions of proteins in the native dimer, native monomer, and unfolded monomer: Kd 8·PT ·[ q 2 + −q ]2 FN2 = 8·PT Kd
(48)
6 Multistate Transitions
Kd 8·PT FN = ·[ q 2 + −q ] 4·PT Kd K U ·K d 8·PT FU = ·[ q 2 + −q ] 4·PT Kd
(49)
(50)
The excess enthalpy function is defined as: Hexc =Hd ·FN + (Hd +HU ) ·FU
(51)
which allows calculation of the excess heat capacity function as a temperature derivative of the excess enthalpy. The corresponding Gibbs energies can be expressed as: Td ) ( −R·T ·InK (Td ) ·H Td +Cp · T −Tp +T ·Cp ·In T Tm −T Tm G U = ·H (Tm ) +Cp · (T −Tm ) +T ·Cp ·In Td T
Td −T G 0d = Td
(52)
(53)
where G0 d is the standard Gibbs energy of dissociation of native dimer into native monomers, and GU is the monomolecular Gibbs energy of unfolding of monomers. For convenience the reference temperatures are defined as follows: T d is defined as a temperature at which F N2 = F N 2 , and thus according to the Eq. (45) the equilibrium constant is equal to 2PT ; T m is the temperature at which F N = F U and thus the equilibrium constant is equal to 1. 6.4 Two-state Model with Ligand Binding
In this model, the native state, N, but not the unfolded state, U, can bind a ligand, L. The ligand does not undergo any conformational transition upon binding. Kd KU NL ←→ N+L ←→ U+L The equilibrium constants of dissociation, K d , and of unfolding, K U , are then defined as: K d=
[N]·[L] [U] and K U = [NL] [N]
(54)
where [NL], [N], [U], and [L] are the concentrations of ligated, and unligated native states, unfolded state, and free ligand concentration, respectively. The fraction of protein in these three states can be presented as:
19
20 Thermal Unfolding of Proteins
[N]=FN ·PT
(55)
[U]=FU ·PT
(56)
[NL]=FNL ·PT
(57)
where the total concentration of the protein, PT , is PT =[NL]+[N]+[U]
(58)
Similarly, the total concentration of the ligand, LT , can be expressed as: L T =[NL]+[L]
(59)
Combining Eqs. (54–59) allows the equilibrium constants to be expressed as a function of total protein and ligand concentrations and fractions of native and unfolded protein as: K d=
FN · ( L T −FNL ·PT ) FNL
(60)
K U=
FU · ( L T −FNL ·PT ) K d ·FNL
(61)
Keeping in mind that F NL + F N + F U = 1 we can combine Eqs. (60–61) and write: FNL +
K d ·K U FNL K d ·FNL + =1 ( L T −FNL ·PT ) ( L T −FNL ·PT )
(62)
Equation (62) is a simple quadratic equation and can be solved for F NL to give:
q 2 −4·PT ·L T −q FNL = 2·PT
(63)
where q = PT + LT + K d + K U K d The fractions of the unligated native and of the unfolded states can be expressed as:
q 2 −4·PT ·L T −q
FN = 2·L T ·PT −PT · q 2 −4·PT ·L T −q K d·
q 2 −4·PT ·L T −q
FU = 2·L T ·PT ·PT · q 2 −4·PT ·L T −q K U ·K d ·
(64)
(65)
6 Multistate Transitions
The excess enthalpy function is defined as: Hexc =Hd ·FN + (Hd +HU ) ·FU
(66)
which allows calculation of the excess heat capacity function as a temperature derivative of the excess enthalpy. The corresponding Gibbs energies can be expressed as: G 0d =H (T0 ) +Cp · (T −T0 ) ·S (T0 ) −T ·Cp ·In G U =
T0 T
−R·T ·InK d (T0 )
Tm −T Tm ·H (Tm ) +Cp · (T −Tm ) +T ·Cp ·In Tm T
(67) (68)
where G0 d is the standard Gibbs energy of dissociation, GU is the monomolecular Gibbs energy of unfolding, T 0 is the reference temperature for the dissociation reaction, and T m is the temperature at which F N = F U and thus the equilibrium constant is equal to 1. In practice, the GU function is determined from the DSC experiments performed in the absence of the ligand, and then is used to fit the DSC profiles obtained in the presence of ligand. 6.5 Four-state (Two-domain Protein) Model
In this model, the protein consists of two interacting domains A and B, each of which unfolds as a two state.
The equilibrium constants for the four reactions can be expressed through the concentrations of four states, native with both A and B domains folded (AN BN ), domain A folded domain B unfolded (AN BU ), domain A unfolded domain B folded (AU BN ), and both domains unfolded (AU BU ), as K 1=
[AU BN ] [AN BN ]
(69)
K 2=
[AN BU ] [AN BN ]
(70)
K 3=
[AU BU ] [AU BN ]
(71)
K 4=
[AU BU ] [AN BU ]
(72)
21
22 Thermal Unfolding of Proteins
It is important to emphasize that the four equilibrium constants are not fully independent since the following relationship is valid K 1 · K 3 = K 2 · K 4 . The fraction of each of the four states can be written as: FAN BN =
[AN BN ] [AN BN ]+[AU BN ]+[AN BU ]+[AU BU ]
(73)
FAU BN =
[AU BN ] [AN BN ]+[AU BN ]+[AN BU ]+[AU BU ]
(74)
FAN BU =
[AN BU ] [AN BN ]+[AU BN ]+[AN BU ]+[AU BU ]
(75)
FAU BU =
[AU BU ] [AN BN ]+[AU BN ]+[AN BU ]+[AU BU ]
(76)
Combining Eqs. (69)–(72) and (73)–(76) we can relate the fraction of individual states to the corresponding equilibrium constants: FAN BN =
1 1+K 1 +K 2 +K 3
(77)
FAU BN =
K1 1+K 1 +K 2 +K 1 ·K 3
(78)
FAN BU =
K2 1+K 1 +K 2 +K 1 ·K 3
(79)
FAU BU =
K 1 ·K 3 1+K 1 +K 2 +K 1 ·K 3
(80)
Difference in K 1 and K 4 or K 2 and K 3 defines the energy of interaction between domains, Gint . K4 G 4 −G int =exp − K int R·T T G 4 =H (TA) · 1+ +Cp,A · (T −TA −T ·In (T/TA )) TA K3 G 3 −G int K 2= =exp − K int R·T T G 3 =H (TB ) · 1+ +Cp,B · (T −TB −T ·In (T/TB )) TB G int K int =exp − R·T K 1=
G int =Hint −T ·Sint
(81) (82) (83) (84) (85) (86)
7 Differential Scanning Calorimetry (DSC) 23
The reference temperatures T A and T B are defined as the temperatures at which the each of the two intermediate states, AU BN and AN BU that have one of the domains unfolded awhile the other one folded, is 50% populated. It is important to emphasize that the energy of interaction as defined above also includes possible conformational changes in each domain upon their interactions. The excess enthalpy of the system can be written as: Hexc = (HA −Hint ) ·FAU BN + (HB −Hint ) ·FAU BU + (HA +HB −Hint ) ·FAU BU
(87)
and the excess heat capacity function can be obtained by differentiation. Cpexc (T ) =
∂Hexc dFAU BN dFAN BU = (HA −Hint ) · + (HB −Hint ) · ∂T dT dT dFAN BU + (HA +HB −Hint ) · dT
(88)
Note that after one of the domains melts, the interaction energy goes to zero. Thus, Gint has apparent effect only on a domain with lower T m .
7 Differential Scanning Calorimetry (DSC) 7.1 How to Prepare for DSC Experiments
The DSC instrument should be turned on prior to the experiment and several (at least three) baseline scans with both cells filled with the buffer should be recorded. The last two profiles must be overlapping. This establishes the thermal history and verifies that the instrument is working within specifications. The pre-experiment baseline scans should be done at the scan rate that will be used for the protein scan. The heating rate is one of the most important parameters. Several considerations have to be taken into account. First, the higher the scan rate, the higher the sensitivity of the instrument. The increase in sensitivity is linear with respect to the heating rate (i.e., the sensitivity at a heating rate of 2◦ min−1 is twice as high as that at 1◦ min−1 ), but the increase in sensitivity does not mean an increase in the signal-to-noise ratio. Second, if the expected transition is very sharp (e.g., within a few degrees) a high heating rate will affect the shape of the heat absorption profile and lead to an error in the determination of all thermodynamic parameters for this transition. This error will be particularly large for the transition temperature. Third, the higher the heating rate the less time the system has to equilibrate. So for slow unfolding/refolding processes it is advisable to use low heating rates.
24 Thermal Unfolding of Proteins
In practice, for small globular proteins exhibiting reversible temperature induced unfolding, a maximal (on most DSC instruments) heating rate of 2◦ min−1 is acceptable. For large proteins lower heating rates (0.5–1◦ min−1 ) are more appropriate. In the case of fibrillar proteins, which usually exhibit very narrow transitions, a heating rate of 0.1–0.25◦ min−1 is most suitable. The purity of the protein sample is very important. Contamination with other proteins, nucleic acids, or lipids can affect both the shape of the calorimetric profiles and the estimation of the thermodynamic parameters of unfolding through the errors in determination of the sample concentration. PADG electrophoresis, UV/VIS spectroscopy and mass spectroscopy should be used to validate the purity of the protein sample. Since calorimetric experiments are performed over a broad temperature range, the pH of the buffer should have a small temperature dependence. For this reason for example, very popular “physiological” Tris buffer is not the best choice. Glycine, sodium/potassium acetate, sodium/potassium phosphate, sodium cacodilate, and sodium citrate are the buffers of choice. Equilibration of protein with buffer is a vital part of the sample preparation for calorimetric experiments. One should keep in mind that the DSC instrument measures the heat capacity difference between buffer and protein solutions. Thus any small inequality in the buffer composition will result in an incorrect absolute value and an incorrect slope of the heat capacity profile. The best way to achieve equilibrium between buffer and protein solution is dialysis. The protein sample should be dialyzed against at least two changes of buffer. The buffer solutions should be filtered prior to dialysis using a 0.45 µm filter to eliminate all insoluble material. Following dialysis, all insoluble particles should be carefully removed from the protein solution. This can be achieved by centrifugation at ∼12 000 g in a microcentrifuge for ∼15–20 min. Knowledge of the exact protein concentration after dialysis is very important. Among all available methods for determination of protein concentration, spectrophotometric methods seem to be most accurate and rapid. This method requires knowledge of the extinction coefficient of the protein at a given wavelength. The extinction coefficient can be calculated from the number of aromatic residues and disulfide bonds in a protein using an empirical equation [64]: 0.1%,1cm = 5690·NTrp +1280·NTyr +120·NSS /MW ε280nm
(89)
where MW is the molecular weight of the protein in daltons. A more elaborate procedure is described by Pace et al. [65] and improves the accuracy of the calculations. For proteins that do not contain any aromatic residues, the method that is based on absorption of peptide backbone gives fast and accurate results. Light scattering in spectrophotometric measurements of protein concentration can be a significant problem, but can be taken into account using a procedure suggested by Winder and Gent [66].
7 Differential Scanning Calorimetry (DSC) 25
7.2 How to Choose Appropriate Conditions
The most frequent problem in DSC experiments is irreversibility of the unfolding transition. This particularly applies to large proteins that have tendencies to aggregate after unfolding. Sensitivity of contemporary DSC instruments provides reliable heat capacity profiles at concentrations as low as 0.3 mg mL−1 for proteins of the molecular range 6–17 kDa. The concentration for larger proteins can be even lower. Low protein concentration leads to decreased probability for aggregation. However, even at low protein concentrations, finding the proper solvent conditions is not a simple task. A rule of thumb is that at a pH value close to the isoelectric point of the protein, aggregation after unfolding is highly probable. High concentrations of neutral salts also promote aggregation. In many cases, aggregation can be avoided by keeping the ionic strength of the solvent very low (10–50 mM) and by using pH values a couple of units away (usually below) from the isoelectric point. 7.3 Critical Factors in Running DSC Experiments
Two important pieces of information should be established prior to the analysis of the calorimetric profiles. This will help to decide whether the formalism of equilibrium thermodynamics is applicable to the studied protein: 1. Reversibility of the unfolding transition under several conditions should be examined by rescanning the sample. Reversibility depends on the upper temperature of heating. The unfolding transition can be considered reversible if rescanning after the sample has been heated to the temperature just past the completion of the transition leads to the recovery of at least 85–90% of the initial endotherm. If no endotherm is observed after rescanning, the unfolding reaction is irreversible. Thermodynamic analysis for irreversible transitions is possible only in special cases [11, 67]. 2. Equilibrium conditions of reversible unfolding should be examined to find the optimal scan rate that permits establishment of equilibrium between native and unfolded states at all temperatures (i.e., scan rate should not be faster than the folding/unfolding rates). This can be examined by comparing the heat capacity profiles obtained on exactly the same sample at two different heating rates, usually 2◦ min−1 and 0.5◦ min−1 . If the difference in the transition temperature exceeds 0.5–0.7 ◦ C, additional experiments at a lower heating rate (i.e., 0.1◦ min−1 ) should be performed and compared with a 0.5◦ min−1 experiment. If there is still a difference in the transition temperature all experiments at a given solvent composition should be performed at several different heating rates and the transition temperatures should be extrapolated to a zero heating rate (for more details see Ref. [68]).
26 Thermal Unfolding of Proteins
References 1 D. S. GODSELL and A. J. OLSON, Trends Biochem. Sci. 1993, 18, 65–68. 2 K. A. DILL, Biochemistry 1990, 29, 7133–7155. 3 W. KAUZMANN, Adv. Protein Chem. 1959, 14, 1–63. 4 C. N. PACE, Biochemistry 2001, 40, 310–313. 5 R. JAENICKE and H. LILIE, Adv. Protein Chem. 2000, 53. 6 K. E. NEET and D. E. TIMM, Protein Sci. 1994, 3, 2167–2174. 7 R. SECKLER, Assembly of multi-subunit structures. In Mechanisms of Protein Folding (R. H. PAIN, ed.), pp. 279–308. Oxford University Press, Oxford, 2000. 8 R. JAENICKE, Biol. Chem. 1998, 379, 237–243. 9 C. N. PACE, Methods Enzymol. 1986, 131, 266–280. 10 J. U. BOWIE and R. T. SAUER, Biochemistry 1989, 28, 7139–7143. 11 D. E. TIMM and K. E. NEET, Protein Sci. 1992, 1, 236–244. 12 I. JELESAROV and H. R. BOSSHARD, J. Mol. Recogn. 1999, 12, 13–18. 13 E. FREIRE, O. L. MAYORGA and M. STRAUME, Anal. Chem. 1990, 62, 950A-959A. 14 T. WISEMAN, S. WILLISTON, J. F. BRANDTS and L.-N. LIN, Anal. Biochem. 1989, 179, 131–137. 15 K. W. PLAXCO and C. M. DOBSON, Curr. Opin. Struct. Biol. 1996, 6, 630–636. 16 C. F. BERNASCONI, Relaxation Kinetics, 1st edn. Academic Press, New York, 1976. 17 H. WENDT, C. BERGER, A. BAICI, R. M. THOMAS and H. R. BOSSHARD, Biochemistry 1995, 34, 4097–4107. 18 T. JONSSON, C. D. WALDBURGER and R. T. SAUER, Biochemistry 1996, 35, 4795–4802. 19 R. ZWANZIG, Proc. Natl Acad. Sci. USA 1997, 94, 148–150. 20 M. E. MILLA and R. T. SAUER, Biochemistry 1994, 33, 1125–1133. 21 T. JONSSON, C. D. WALDBURGER and R. T. SAUER, Biochemistry 1996, 35, 4795–4802. 22 A. K. SRIVASTAVA and R. T. SAUER, Biochemistry 2000, 39, 8308–8314.
23 M. E. MILLA, B. M. BROWN, C. D. WALDBURGER and R. T. SAUER, Biochemistry 1995, 34, 13914–13919. 24 C. D. WALDBURGER, T. JONSSON and R. T. SAUER, Proc. Natl Acad. Sci. USA 1996, 93, 2629–2634. 25 I. A. HOPE and K. STRUHL, EMBO J. 1987, 6, 2781–2784. 26 A. I. DRAGAN and P. L. PRIVALOV, J. Mol. Biol. 2002, 321, 891–908. 27 I. JELESAROV and H. R. BOSSHARD, J. Mol. Biol. 1996, 263, 344–358. 28 H. WENDT, L. LEDER, H. H¨ARMA¨ , I. JELESAROV, A. BAICI and H. R. BOSSHARD, Biochemistry 1997, 36, 204–213. 29 E. K. O’SHEA, K. J. LUMB and P. S. KIM, Curr. Biol. 1993, 3, 658–667. 30 P. L. PRIVALOV and S. A. POTHEKIN, Methods Enzymol. 1986, 131, 4–51. 31 R. S. SPOLAR and M. T. RECORD, Jr., Science 1994, 263, 777–784. 32 G. I. MAKHATADZE and P. L. PRIVALOV, Adv. Protein Chem. 1995, 47, 307–425. 33 G. SCHREIBER and A. R. FERSHT, Nature Struct. Biol. 1996, 3, 427–431. 34 H. X. ZHOU, Biopolymers 2001, 59, 427–433. 35 H. X. ZHOU, Protein Sci. 2003, 12, 2379–2382. 36 H. R. BOSSHARD, D. N. MARTI, and I. JELESAROV, J. Mol. Recogn. 2004, 17, 1–16. 37 K. J. LUMB and P. S. KIM, Science 1995, 268, 436–439. 38 P. PHELAN, A. A. GORFE, J. JELESAROV, D. N. MARTI, J. WARWICKER and H. R. BOSSHARD, Biochemistry 2002, 41, 2998–3008. 39 D. N. MARTI and H. R. BOSSHARD, J. Mol. Biol. 2003, 330, 621–637. 40 C. C. MOSER and P. L. DUTTON, Biochemistry 1988, 27, 2450–2461. 41 E. D¨URR, I. JELESAROV and H. R. BOSSHARD, Biochemistry 1999, 38, 870–880. 42 R. KOREN and G. G. HAMMES, Biochemistry 1976, 15, 1165–1171. 43 M. VIJAYAKUMAR, K. Y. WONG, G. SCHREIBER, A. R. FERSHT, A. SZABO and H. X. ZHOU, J. Mol. Biol. 1998, 278, 1015–1024.
References 44 H. R. BOSSHARD, E. D¨URR, T. HITZ and I. JELESAROV, Biochemistry 2001, 40, 3544–3552. 45 J. A. ZITZEWITZ, O. BILSEL, J. B. LUO, B. E. JONES and C. R. MATTHEWS, Biochemistry 1995, 34, 12812–12819. 46 R. A. KAMMERER, T. SCHULTHESS, R. LANDWEHR et al. Proc. Natl Acad. Sci. USA 1998, 95, 13419–13424. 47 J. K. MYERS and T. G. OAS, J. Mol. Biol. 1999, 289, 205–209. 48 R. A. KAMMERER, V. A. JARAVINE, S. FRANK et al. J. Biol. Chem. 2001, 276, 13685–13688. 49 J. A. ZITZEWITZ, B. IBARRA-MOLERO, D. R. FISHEL, K. L. TERRY and C. R. MATTHEWS, J. Mol. Biol. 2000, 296, 1105–1116. 50 B. A. KRANTZ and T. R. SOSNICK, Nature Struct. Biol. 2001, 8, 1042–1047. 51 X. CHENG, M. L. GONZALEZ and J. C. LEE, Biochemistry 1993, 32, 8130–8139. 52 S. M. DOYLE, E. H. BRASWELL and C. M. TESCHKE, Biochemistry 2000, 39, 11667–11676. 53 D. H. KIM, D. S. JANG, G. H. NAM et al. Biochemistry. 2000, 39, 13084–13092. 54 M. S. GITTELMAN and C. R. MATTHEWS, Biochemistry 1990, 29, 7011–7020. 55 C. J. MANN, S. XHAO and C. R. MATTHEWS, Biochemistry 1995, 34, 14573–14580. 56 L. M. GLOSS and C. R. MATTHEWS, Biochemistry 1997, 36, 5612–5623. 57 L. M. GLOSS and C. R. MATTHEWS, Biochemistry 1998, 37, 15990–15999. 58 L. M. GLOSS, B. R. SIMLER and C. R. MATTHEWS, J. Mol. Biol. 2001, 312, 1121–1234. 59 A. C. CLARK, S. W. RASO, J. F. SINCLAIR, M. M. ZIEGLER, A. F. CHAFFOTTE and T. O. BALDWIN, Biochemistry 1997, 36, 1891–1899. 60 E. D¨URR and H. R. BOSSHARD, Protein Sci. 2000, 9, 1410–1415. 61 R. M. THOMAS, H. WENDT, A. ZAMPIERI and H. R. BOSSHARD, Progr. Colloid.
62 63 64 65
66
67
68 69 70 71 72 73
74
75
76 77
78 79
Polym. Sci. 1995, 99, 24–30. D. N. MARTI, M. LU, H. R. BOSSHARD and I. JELESAROV, J. Mol. Biol. 2004, 336, 1–8. D. C. CHAN, D. FASS, J. M. BERGER and P. S. KIM, Cell 1997, 89, 263–273. D. L. BURNS and H. K. SCHACHMAN, J. Biol. Chem. 1982, 257, 8648–8654. M. G. MATEU, M. M. S. Del PINO and A. R. FERSHT, Nature Struct. Biol. 1999, 6, 191–198. C. BODENREIDER, N. KELLERSHOHN, M. E. GOLDBERG and A. MEJEAN, Biochemistry 2002, 41, 14988–14999. C. R. JOHNSON, P. E. MORIN, C. H. ARROWSMITH and E. FREIRE, Biochemistry 1995, 34, 5309–5316. A. R. FERSHT, Curr. Opin. Struct. Biol. 1997, 7, 3–9. T. A. LARSEN, A. J. OLSON and D. S. GOODSELL, Structure 1998, 6, 421–427. C. J. TSAI, S. L. LIN, H. J. WOLFSON and R. NUSSINOV, Protein Sci. 1997, 6, 53–64. V. DAGGETT and A. R. FERSHT, Trends Biochem. Sci. 2003, 28, 18–25. K. S. THOMPSON, C. R. VINSON and E. FREIRE, Biochemistry 1993, 32, 5491–5496. K. BACHHAWAT, M. KAPOOR, T. K. DAM and A. SUROLIA, Biochemistry 2001, 40, 7291–7300. D. APIYO, K. JONES, J. GUIDRY and P. WITTUNG-STAFSHEDE, Biochemistry 2001, 40, 4940–4948. E. BARBAR, B. KLEINMAN, D. IMHOFF, M. LI, T. S. HAYS and M. HARE, Biochemistry 2001, 40, 1596–1605. L. A. WALLACE and H. W. DIRR, Biochemistry 1999, 38, 16686–16694. L. A. WALLACE, N. SLUIS-CREMER and H. W. DIRR, Biochemistry 1998, 37, 5320–5328. L. A. MARKY and K. J. BRESLAUER, Biopolymers 1987, 26, 1601–1620. P. KUZMIC, Anal. Biochem. 1996, 237, 260–273.
27
1
Pressure–Temperature Phase Diagrams of Proteins Wolfgang Doster Technische Universit¨at M¨unchen, Garching, Germany
Josef Friedrich Technische Universit¨at M¨unchen, Freising, Germany
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
One of the most peculiar features of proteins is their marginal stability within a narrow range of thermodynamic conditions. Biologically active structures can be disrupted by increasing or decreasing the temperature, the pressure, the pH or by adding denaturants. By disrupting a structure one can study its architecture and energetics. In this article we focus on the thermodynamic aspects of temperature and pressure denaturation. A variation of temperature inevitably leads to a simultaneous change of entropy and volume through thermal expansion. The advantage of using pressure as a thermodynamic variable is that volume-dependent effects can be isolated from temperature-dependent effects. Just as the increase in temperature drives the system in equilibrium towards states of higher entropy, a pressure increase will bias the ensemble of accessible states towards those with a smaller volume. This is the meaning of Le Chatelier’s principle. Thus, if the protein has the lower volume in its denatured form, the native structure will become unstable above a critical pressure. Similarly the opposite effect, stabilization of the native state by pressure, is sometimes associated with heat denaturation. Packing defects in the water-excluding native state, reorganization of the solvent near exposed nonpolar side chains, and electrostriction by newly formed charges act as volume reservoirs which can lower the volume in the unfolded state. While dissociation of oligomeric proteins into subunits occurs at low pressures, i.e., below 200 MPa [1], pressures above 300 MPa are required to denature monomeric
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Pressure–Temperature Phase Diagrams of Proteins
proteins. A number of useful review articles on this subject have been published recently (see, for example, Ref. [2]). The isothermal compressibility of water (0.56/GPa) is quite small. At 11 000 m below sea level and pressures near 100 MPa, the density of water is only 5% higher than at ambient pressure. On the other hand, the density at the freezing transition changes by 9%. Moreover, proteins are 5–10 times less compressible than water. As a result, pressure-induced volume changes in proteins are quite small, typically 0.5% of the total volume at the unfolding transition. For monomeric proteins the difference corresponds approximately to the volume of five water molecules (5 × 18 mL mol−1 ). Thus, minor volume changes often induce large structural rearrangements, where essentially incompressible matter is displaced. Therefore pressure can be a powerful structural gear. On the other hand, the microscopic interpretation of protein volume changes is still one of the difficult and unresolved questions of the field [3, 4]. Part of the problem arises from the fact that the denatured state itself is not very well defined [5]. It is generally assumed that thermal denaturing leads to a random coil state. As to pressure-induced denaturation and cold denaturation, there are indications that the denatured state is still compact and resembles the molten globule state [5–7]. How such a situation can be reconciled with the simple thermodynamic description of protein stability based on a two-state model has to be investigated in each case. Below we present a thermodynamic frame for describing and interpreting protein stability phase diagrams. The main features are illustrated using our own results obtained with various optical techniques as well as with neutron scattering.
2 Basic Aspects of Phase Diagrams of Proteins and Early Experiments
The most striking thermodynamic properties of proteins are related to the hydrophobic effect. These are Ĺ the large and positive heat capacity increment upon unfolding [8–10] and Ĺ the phenomenon of cold denaturation.
The latter was first predicted by Brandts [11, 12] based on studies of heat denaturation on ribonuclease. The transition temperatures for cold denaturation are usually below the freezing point of most aqueous solutions at ambient pressure. However, under high pressure, water remains liquid down to much lower temperatures (−18 ◦ C at 200 MPa, Figure 1). Thus cold denaturation can be studied conveniently at elevated pressures under mild denaturing conditions, without the need to add denaturants. The phase diagram of water (Figure 1) illustrates the relevance of the pressuretemperature plane for locating states of structural stability and for classifying their thermodynamic properties.
2 Basic Aspects of Phase Diagrams of Proteins and Early Experiments
Fig. 1 The phase diagram of water: For the boundary from the liquid to hexagonal ice (I) one has dP/dT = S/V < 0, unlike for the technically relevant modifications, II and III where the volume discontinuity is small.
The slope of the phase boundary of liquid water and ice I is negative: Increase in pressure extends the stability range of the liquid phase. The slope of the phase boundary is given by the ratio of the negative entropy change and the increase in specific volume at the freezing transition (Eq. (3)). The expansion (+9%) causes substantial damage when biological material is frozen. In contrast, the slope of the liquid to ice III boundary is large and positive, since the corresponding volume change is small and negative. The application of high pressure to the liquid and then cooling into ice III minimizes the damage by volume changes (−3%) at the freezing transition. Proteins, consisting of a few thousand atoms, are mesoscopic objects. Experience and theoretical analysis shows [13] that they can be labeled by well-defined thermodynamic averages, such as the specific heat capacity, the compressibility, and the thermal expansion. The structural reorganization may then be regarded as a change of the macroscopic state of the system. Such changes occur in a highly cooperative manner. Since the native and denatured states of proteins differ not only in heat capacity but also in volume and entropy, the transition between the two states must involve a discontinuity in the first derivatives of the thermodynamic potential. This is the signature of a first-order phase transition. Consequently, the native and denatured states of a single-domain protein may be interpreted as two phases of a macroscopic system, which differ in structural order. For a single protein molecule the transformation can only be abrupt and not gradually. From this point of view a cooperative domain of a protein resembles a crystal, which has,
3
4 Pressure–Temperature Phase Diagrams of Proteins
however, a critical size, since it can only exist as a whole. Experimentally, however, one usually deals with protein ensembles. The properties of the ensemble, which are reflected in the heterogeneity of the characteristic parameters of a protein, such as its energy, structure, compressibility, etc., render some specific features to the denaturing transition compared with phase transitions of thermodynamic systems: The transition is usually rather broad in the variables P or T (e.g., see Figures 9 and 10). This dispersion of the transition region is directly related to the heterogeneity of the protein ensemble, which has its roots in the fact that a protein, although a large molecule, is not an infinite system. The equilibrium constant K DN (T, P) follows from the pressure analog of the van’t Hoff equation with ∂T → ∂P and S → −V: ∂(lnK DN )/∂P = −V/RT The transition width, determined by the magnitude of the transition volume V, is finite for a mesoscopic protein molecule, in contrast to the zero width of a macroscopic first-order transition. Hawley, in 1971, was the first to interpret the curve on which the free energy difference between native and unfolded state vanishes as an elliptical phase boundary. This provided a natural explanation for the so-called re-entrant phase behavior of heat and cold denaturation [14]. Figure 2 shows the contours of the Gibbs free energy G(P, T) versus pressure and temperature for chymotrypsinogen. Note that at moderate pressure levels (< 150 MPa), the temperature of heat denaturation increases slightly with pressure. Hence, applying pressure to the thermally denatured protein may actually drive the protein back into the native state. Upon further increasing the pressure, however, unfolding may occur again. In other words, the denatured phase is re-entered, irrespective of the fact that the pressure is monotonously changed. Phase diagrams with this property are called “re-entrant.” A similar behavior is found if the temperature is changed. For instance, if one starts out at sufficiently high pressure levels (200 MPa) in the denatured state and lowers the temperature, one enters the native regime which is, however, left again at even lower temperatures to re-enter the denatured regime again. This latter transition is called “cold denaturation”. Re-entrant phase diagrams of perfect elliptical shape have been observed for liquid crystals by Cladis et al. [15] and by Klug and Whalley [16]. The elliptical shape was analyzed in detail by Clark [16]. In 1973, Zipp and Kauzmann, in a seminal paper, investigated the pressuretemperature–pH phase diagram of myoglobin [17]. They also observed re-entrant behavior, but did not attempt to fit the phase boundaries using ellipses. Depending on pH, the boundaries vary strongly in shape and show nonelliptical distortions, as shown in Figure 3. In recent years, a significant number of studies have suggested diagrams of elliptical shape for protein denaturation or for even more complex units, such as bacteria [18].
2 Basic Aspects of Phase Diagrams of Proteins and Early Experiments
Fig. 2 Contours of constant chemical potential : = : D – : N between the denatured (D) and the native (N) state of chymotrypsinogen (after Hawley [14]).
Fig. 3 Phase boundaries (: = 0) of myoglobin at various pH values, the native state is stable inside the contours, the denatured state outside (after Zipp and Kauzmann [17]).
5
6 Pressure–Temperature Phase Diagrams of Proteins
The stability phase diagrams of proteins and related phenomena have been discussed in several review articles [19–21]. Our goal is to eludicate the physical basis and thermodynamics of re-entrant protein phase diagrams, the conditions for elliptical shapes, and the limitations of this approach.
3 Thermodynamics of Pressure–Temperature Phase Diagrams
The phase equilibrium between the native and the denatured state N D is controlled by the difference in the chemical potentials µ – µD – µN . The ratio of concentrations of the denatured to the native form assuming a dilute solution is given by: K DN = cD /cN = (−µ/RT )
(1)
The chemical potential is the driving force that induces transport and transformations of a substance. It is analogous to the electrical potential, which can only induce currents because of charge conservation. Since the concentration of components, water, and protein is fixed (only pressure and temperature are varied), we restrict ourselves to one-component systems. The change dµi with temperature and pressure obeys the Gibbs–Duhem relation for each phase D and N: dµN = SN dT + VN dP = −RT d(ln cN /c0 ) dµD = −SD dT + VD dP = −RT d(ln cD /c0 )
(2)
where SN , Sd and V N , V D are the partial molar entropy and volume of the protein in solution in the native and denatured phase, respectively. Since entropy and volume are generally positive quantities, the chemical potential always decreases with increasing temperature, while it increases with increasing pressure, as shown in Figures 4 and 6. The potentials of two phases, differing in entropy, will thus cross at a particular temperature where a transition to the phase with the lower potential occurs. A similar change in phase will take place with pressure when the volumes of the two phases differ. Thus the more compact phase will be stable at high pressure. For proteins in particular, SD > SN at higher temperature because of an excess in configurational entropy of the unfolded chain. Thus µD has the more pronounced temperature dependence and will fall below µN at T H , the heat denaturation temperature. For myoglobin it is suggested in Figure 4 that the peculiar thermodynamic behavior of proteins arises from the strongly temperature-dependent entropy of the unfolded phase: The slope of µD (T) and thus the entropy SD (T) decreases with decreasing temperature.
3 Thermodynamics of Pressure–Temperature Phase Diagrams
Fig. 4 The chemical potentials of the denatured (D) and the native (N) state of myoglobin and the resulting : per mole of amino acid residue. Parameters [24]: : 0N = −0.2 kJ mol−1 , : 0D = 0.2 kJ mol−1 ,
SC = 14 J K−1 mol−1 , Cp = 75 J K−1 mol−1 . T H , T C , and T M are the temperatures for heat denaturation, for cold denaturation and for maximum stability.
On a microscopic scale this reflects the ordering force of nonpolar groups on surface water molecules. Since the corresponding hydrophobic effect is less pronounced in the native state, SN is expected to be almost temperature independent. This discrepancy explains why the potential surfaces µD and µN meet again at T C , the cold denaturation temperature, as shown in Figure 4. The chemical potential difference, µ = µD −µN , consequently assumes the shape of a convex curve, which is close to an inverted parabola with the maximum at Tm , the temperature of maximum stability (Figure 5).
Fig. 5 a) Schematic representation of the difference : = : D – : N of the chemica potential as a function of temperature for various pressures as deduced from Figure 4. b) The phase boundary as schematically deduced from the various pairs of points with : = 0 of Figure 5a for negative values of V = V D — V N .
7
8 Pressure–Temperature Phase Diagrams of Proteins
Changes in pressure displace the difference potential surface µ vertically, depending on the unfolding volume V as shown in Figure 5a. Since the phase transition occurs at constant chemical potential, µ(P, T) = 0, or cN = cD , while pressure and temperature are varied, one derives, from Eq. (2), a differential equation for the phase boundary, the Clausius-Claperon equation: (dP/dT )µ = 0 = S/V
(3)
where S = SD –SN and V = VD −VN can depend on T and P. The phase boundary (Figure 5b) will assume the shape of an inverted parabola if the unfolding volume V is negative and constant in P and T. For positive unfolding volumes a regular parabola, open to the high pressure side, would be obtained excluding denaturation by pressure. The latent quantities S and V are composed of intrinsic contributions of the protein and those of the hydration shell, while the values of bulk water in both phases cancel. The entropy difference reflects mainly the increase in configurational entropy of the chain, SC , and the entropy change SH in the hydration shell, due to exposure of buried residues. The same reasoning applies to the latent volume. Hence, we can write: S = SC + SH V = VC + VH
(4)
We argue that the closed phase boundaries displayed in Figures 2 and 3 are the consequence of the variation of SH and V H with temperature and pressure due to the hydrophobic effect. Exposure of nonpolar groups upon unfolding tends to immobilize water molecules, thereby decreasing their entropy. This mechanism is strongly temperature dependent. Its experimental signature is the positive heat capacity increment at constant pressure, Cp > 0, of the denatured relative to the native form. In simple terms, to increase the temperature of the unfolded phase requires additional entropy to continuously melt the immobilized water [8– 10, 22, 23]. However, above a certain temperature, T 0 , approximately 140 ◦ C, the ordering effct has vanished. Below T 0 , the hydration entropy change, SH (T, T 0 ), is rapidly decreasing with decreasing temperature. Assuming Cp to be approximately independent of temperature, one obtains for the change in “hydration entropy” [24]: SH (T) =
T T0
CP T0 dT = −CP ·ln T T
(5)
and thus from Eqs. (2)–(5): µ = µ0 −(T −T0 ).[SC −Cp ln(T0 /T )] + V ·P
(6)
3 Thermodynamics of Pressure–Temperature Phase Diagrams
The hydration term ∞ Cp is always negative below T 0 , stabilizing the denatured form. This is known as the “wedging effect” of water, which softens the native structure. The combined entropy difference S vanishes at the maximum of µ, at the temperature of maximum stability T M (Figure 4): (∂µ/∂ T )T = TM = −S = 0
(7)
At T M the negative hydration entropy SH just compensates for the positive configurational term SC : SC = Cp ln(T0 /TM )
(8)
S in Eq. (7) depends on pressure and temperature according to: dS(P,T ) = (∂S/∂ T )dT + (∂S/∂ P)dP = (Cp /T )·dT −αp·dP = 0
(9)
where we have introduced the volume expansion increment αp = (∂V/∂T)P=const . Equation (9) defines the slope of a boundary S(T; P) = constant in the P-T plane: (dP/dT )S = const = C p /(T ·αp)
(10)
The special S = 0 line, separating regions of positive and negative transition entropies, is fixed by the condition (dP = dT)µ=0 = 0 as shown in Figure 11. Since the slope in Figure 11 is positive and since both Cp and V are positive, it follows that αp > 0. Integrating Eq. (10) yields the effect of pressure on the temperature of maximum stability: TM = TM0 ·exp(P·αp /Cp )
(11)
For reasonably low pressures one has for the S = 0 line: T M ∼ T M0 (1 + P · αp/Cp ). Expanding the entropic part of the chemical potential (Eq. (6)) about T M ∼ T M0 yields an approximate parabola in T with negative curvature (Figure 5a): 1 µ(T ,P) = µM − Cp ·(T −TM )2 /T + V ·P = −RT ln(c D /c N ) 2
(12)
For V < 0, increasing pressure diminishes the stability range of the native state. If V is constant, one obtains the parabolic pressure-temperature phase diagram, P(T)µ=0 , shown in Figure 5b. Experimental data, like those shown in Figure 2, 3, and 11 suggest a re-entrant phase behavior not only as a function of temperature but also as a function of pressure. This leads to a closed or “ellipsoidal” phase boundary with V depending on pressure and temperature. The volume of the denatured form, i.e. the slope of
9
10 Pressure–Temperature Phase Diagrams of Proteins
Fig. 6 Effect of pressure on the chemica potential :(÷ 4) for the parameters of cytochrome c at 280 K (Table 1 and Eq. 12, 17). The volume of the denatured state decreases with pressure to below the value of
the native protein, which leads to a crossing at high pressure PH . A second crossing occurs at negative pressure PL · PM is the pressure of maximum stability.
µD (P), in Figure 6 increases with decreasing pressure. Thus a second crossing of the D–N potentials may occur at low or even negative pressure denoted by PL . Thus the unfolded protein is more easily stretched than the compact native state. Moreover, near P = 0, a pressure increase will stabilize the native state. In analogy to the temperature (Eq. (7)) we can define a pressure of maximum stability, PM , according to: (∂µ/∂P) P = PM = V = 0
(13)
Combining Eqs. (4) and (13) shows that the changes in configurational and hydration volume cancel at P = PM . The small observed unfolding volumes may result from such compensation effects. Equation (13) allows determination of the V(P, T) = 0 line, separating regions of positive and negative volume changes: (∂V/∂ T )dT + (∂V/∂ P)dP = αp·dT −βT ·dP = 0
(14)
(dP/dT )V = const = αp /βT
(15)
thus
3 Thermodynamics of Pressure–Temperature Phase Diagrams
The V = 0 line follows from the additional condition: (dP/dT)µ=0 = ∞. For the pressure of maximum stability one obtains: PM = PR + (αp/βT )(T −Tr )
(16)
β T denotes the increment in the compressibility coefficient: β T = −(∂V/∂P). T r denotes an unspecified reference temperature. For second-order phase transitions, sometimes associated with molten globule denaturation [25, 26], there is no discontinuity in the extensive quantities. Equations (9) and (14) then become the so-called Ehrenfest equations: (dP/dT)S = 0 = (dP/dT)V = 0 . For most proteins cold denaturation occurs at subzero temperatures in the unstable or supercooled region of the water phase diagram. For this reason cold denaturation was formerly considered to be irrelevant to protein science. Similarly, low-pressure denaturation at PL (T) occurs for most proteins in a large temperature range where pressure is negative. Liquids under tension are unstable, although metastable states can exist, since the creation of a new liquid–gas interface involves the crossing of an energy barrier. The first experiments were performed by Berthelot in 1850 using a spinning capillary [27]. Stretched states of water at pressures as low as −280 bars have been observed using the same method [28]. Unfolding experiments with protein solutions at negative pressures still need to be performed. The folding of unstable proteins such as nuclease conA remains incomplete at positive pressure, while negative pressure may further stabilize the native state [29]. At sufficiently high temperature, low-pressure denaturation may occur in the positive range (Figure 11) and in this case it is easily accessible experimentally. Pressure- and temperature-dependent transition volumes originate from finite increments in the partial compressibility β T and the volume expansion coefficient αp. Expanding the latent volume V about a reference point (T r Pr ) yields: V (P,T ) = Vr −βT ·(P−Pr ) + αp·(T −Tr )
(17)
The major part of the change in the compressibility upon global transformations of proteins is due to hydration processes [30]. The greater the hydration, the smaller the partial compressibility of protein and hydration shell. This is the rule for protein solutions at normal temperature and pressure. For native proteins the intrinsic compressibility is as low as that of organic solids [31–33]. The outer surface contribution to the measured partial compressibility is quite negative. Thus complete unfolding leads to a considerable decrease in the partial compressibility due to the loss of intramolecular voids and the expansion of the surface area contacting the bulk water [34]. In this low pressure/high temperature regime β T is negative, while αp is mostly positive. This results in a positive unfolding volume (Eq. (17)) as indicated in Figure 6. Consequently moderate pressures stabilize the native state. A number of studies suggest that high-pressure denaturation leads to incomplete unfolding and structures resembling those of molten globule states (MG) [5–7, 35]. The N → MG transition is generally accompanied by an increase in compressibility.
11
12 Pressure–Temperature Phase Diagrams of Proteins
The tendency towards a positive β T at higher pressures due to partial unfolding may lead to the observed negative unfolding volumes. Combining Eqs. (12) and (17), the phase boundary µ(T, P) = 0 with respect to a reference point (T r , Pr ) assumes the form of a second-order hyperface: − 12 Cp ·(T −Tr )2 /T − 12 βT ·(P−Pr )2 + αp·(T −Tr )·(P−Pr ) −Sr ·(T −Tr ) + Vr ·(P−Pr ) + µr = 0
(18)
For approximately constant increments Cp , β T , αp, the phase boundary assumes parabolic (β T = 0), hyperbolic or closed elliptical shapes. The basic condition to obtain an elliptical phase diagram is given by: (Cp )(βT )−(αp)2 >0
(19)
Since experiments show that Cp > 0, Eq. (19) requires the compressibility change to be positive: β T > 0. A hyperbolic shape would imply a negative left-hand side in Eq. (19). Thus far only elliptical diagrams, in some cases with distortions (e.g., Figure 3) have been discussed. A complete solution would require data in the range where pressure is negative: the data shown in Figure 2 are also consistent with a parabolic shape.
4 Measuring Phase Stability Boundaries with Optical Techniques 4.1 Fluorescence Experiments with Cytochrome c
Protein denaturing processes can be investigated with many techniques. The most convenient ones concerning the determination of the whole stability diagram are spectroscopic techniques. Almost all methods have been used: IR, Raman, NMR, absorption and fluorescence spectroscopy. However, data covering complete phase diagrams are quite rare. Some of them were reviewed above. The various techniques have their specific advantages as well as disadvantages. For instance, with vibrational spectroscopy one obtains information on structural changes such as the weakening of the amide I band [19, 20, 36, 37], with NMR one obtains information on hydrogen exchange and the associated protection factors [6] from which conclusions on structural details of the denatured state can be drawn. However, as a rule no information on the structure disrupting mechanism under pressure is obtained. The situation with absorption and fluorescence spectroscopy is different. In optical spectroscopy it is the spectral position and the linewidth of an electronic transition that is measured. From these quantities it is, as a rule, quite difficult to extract any structural information. On the other hand, one obtains detailed information on the interaction of the electronic states with the respective environment. Since
4 Measuring Phase Stability Boundaries with Optical Techniques
Fig. 7 Sketch of a fluorecence experiment for measuring the phase diagram of proteins.
these interactions are known in detail, it is possible to extract from their behavior under pressure and temperature changes information on the mechanism of phase crossing. Compared with other spectroscopic techniques, fluorescence spectroscopy is also the most sensitive. Hence, it is possible to work at very low concentration levels so that aggregation processes can easily be avoided. In most experiments tryptophan is used as a fluorescent probe molecule. With chromoproteins this is, generally speaking, impossible since fast energy transfer to the chromophore quenches the UV fluorescence to a high degree. In the following we describe fluorescence experiments on a cytochrome c-type protein in a glycerol/water matrix [38]. The native heme iron was substituted by Zn in order to make the protein strongly fluorescent in the visible range. As a short notation for this protein we use the abbreviation Zn-Cc. The set-up for a fluorescence experiment is simple and is shown in Figure 7. The sample is in a temperature- and pressure-controlled diamond anvil cell. Pressure can be varied up to about 2 GPa. Its magnitude is determined from the pressure shift of the ruby fluorescence, for which reference values can be taken from the literature [39]. The temperature is controlled by a flow thermostat between −20 ◦ C and 100 ◦ C. Excitation is carried out into the Soret band at 420 nm with light from a pulsed dye laser pumped by an excimer laser. The fluorescence is collected in a collinear arrangement, dispersed in a spectrometer and detected via a CCD camera. The quantities of interest for measuring the protein stability phase diagram are the spectral position and the width of the S1 → S0 00-transition (587 nm, Figure 8). Generally speaking the spectral changes for Zn-Cc are rather small. Nevertheless the measured changes of the spectral shifts and widths are accurate within about 3%. Figure 8 shows the fluorescence spectrum of Zn-Cc between 550 and 700 nm as it changes with pressure. The sharp line around 695 nm is the fluorescence from ruby. 4.2 Results
Figures 9 and 10 show typical results for the changes in the first and second moment (position of the maximum and width) of the 00-band under pressure and temperature variation.
13
14 Pressure–Temperature Phase Diagrams of Proteins
Fig. 8 Fluorescence spectrum of Zn-cytochrome c in glycerol/water at ambient temperature as it changes with pressure. The stability diagram was determined from the changes of the first and second moment of the 00-band at 584 nm. The fluorescence from ruby is shown as well. It serves for gauging the pressure.
Fig. 9 The behavior of first and second moment of the fluorescence under pressure variation at ambient temperature.
4 Measuring Phase Stability Boundaries with Optical Techniques
Fig. 10 The behavior of the first and second moment of the fluorescence under temperature variation at ambient pressure.
If pressure is increased the band responds with a red shift, although a very small one (Figure 9). Around 0.75 GPa the red shifting regime changes rather abruptly into a blue shifting regime which levels off beyond 1 GPa. A qualitatively similar behavior is observed for the band shift under a temperature increase: a red shifting phase (in this case more pronounced) is followed by a blue shifting phase (Figure 10). Quite interesting is the behavior of the bandwidth: An increase of pressure leads to a rather strong increase in the width (Figure 9). This is the usual behavior and just reflects the fact that an increase in density results in an increase in the molecular interactions due to their strong dependence on distance. However, if the pressure is increased beyond about 0.9 GPa, the band all of a sudden narrows. This narrowing levels off beyond about 1 GPa. As can be seen from the figure, the maximum in the bandwidth coincides with the midpoint of the blue shifting regime. The thermal behavior of the bandwidth is different (Figure 10): There is no narrowing phase, but there is kind of a kink in the increase of the bandwidth with temperature around 65 ◦ C. Again, this change in the thermal behavior of the width coincides rather well with the midpoint in the thermally induced blue shifting phase. For measuring the complete phase diagram we performed a series of pressure scans from ambient pressure up to about 1.5 GPa at various temperatures and a series of temperature scans from a few centigrades up to about 90 ◦ C at various pressure levels. The respective pattern of the changes of the fluorescence always showed the sigmoid-like behavior as discussed above. We associate the midpoint
15
16 Pressure–Temperature Phase Diagrams of Proteins
Fig. 11 The stability phase diagram of Zn-cytochrome c. The solid line is an elliptic least square fit to the data points. The straight lines mark the points where S and V change sign.
of the blue shifting regime with the stability boundary between the native and the denatured state of the protein due to reasons discussed below. The complete stability diagram is shown in Figure 11. The solid line represents an elliptic least square fit to the experimental points. The two solid lines S = 0 and V = 0 cut the stability diagram at points where the volume and the entropy difference change sign (see also Eqs. (11) and (16)). For instance, for all data points to the left of the S = 0 line the entropy in the denatured state SD is smaller than SN , the entropy in the native state (see below). Figures 12 and 13 show specific features of the fluorescence behavior which we want to discuss separately. Figure 12 shows the behavior of the first and second moment for cold denaturation at ambient pressure. The point which we want to stress is that the band shift no longer reflects the qualitative change in its behavior. Instead, we observe a blue shift which increases in a nonlinear fashion with decreasing temperature. From such a behavior it is hard to determine a transition point. The bandwidth, on the other hand, shows a quite unusual behavior: it narrows with decreasing temperature, but around −3 ◦ C it runs through a minimum and starts broadening upon further decreasing the temperature. The unusual behavior concerns the broadening with decreasing temperature. This broadening obviously originates from an additional state which becomes populated below −3 ◦ C. Naturally, this state must be the denatured state of the protein. Hence, we associate this minimum in the bandwidth with the stability boundary.
4 Measuring Phase Stability Boundaries with Optical Techniques
Fig. 12 The behavior of the first and second moment of the fluorescence for cold denaturation at ambient pressure.
Fig. 13 The first and second moment of the fluorescence under a temperature variation at high pressure (0.7 GPa). The pattern shows features characteristic for re-entrant phase crossing.
17
18 Pressure–Temperature Phase Diagrams of Proteins
The temperature dependence of the moments in Figure 13 follows a rather complex pattern. This pattern comes from the re-entrant character of the phase diagram which has been stressed in Section 2. At a pressure level of 0.7 GPa, the protein is denatured to a high degree if the temperature is below around 10 ◦ C (see Figure 11). Hence, upon increasing the temperature, the protein enters the stable regime. This is most clearly reflected in a narrowing of the bandwidth due to the fact that the denatured state becomes less and less populated. At the same time the band maximum undergoes a red shift in line with the observation that the denatured state is blue shifted from the native one. However, around 25 ◦ C the red shifting phase turns into a blue shifting phase, signaling that the protein re-enters the denatured regime. The bandwidth data support this conclusion. Interestingly, the data convey the impression that there are more than just two transitions involved; their nature, however, is not clear.
5 What Do We Learn from the Stability Diagram? 5.1 Thermodynamics
The eye-catching feature in Figure 11 is the nearly perfect elliptic shape of the stability diagram in agreement with the simple model developed in Section 3. The elliptic shape implies that the characteristic thermodynamic parameters of the protein, namely the increments of the specific heat capacity, the compressibility and the thermal expansion are rather well-defined material parameters, i.e., do not significantly depend on temperature and pressure. In addition, since the model is based on just two states, we conclude from the elliptic shape that the protein investigated, namely Zn-Cc, can be described as a two-state folder. Applying the Clausius–Clapeyron equation (Eq. (3)) to the elliptic phase boundary, we see that there are distinct points characterized by (dP/dT)µ=0 = ∞ and by (dP/dT)µ=0 = 0. The former condition separates the regime with positive V from the regime with negative V. The latter one separates the regime with positive S from the regime with negative S (Eqs. (7) and (13)). Let us consider thermal denaturation at ambient pressure.S is definitely positive because the entropy of the chain increases upon denaturation and the entropy of the solvent increases as well. Since the slope of the phase boundary is positive, V must be positive, too. However, moving along the phase boundary, we eventually cross the V = 0 line. For all points above that line, V is negative. In other words: application of pressure destabilizes the native state and favors the denatured state. Below the V = 0-line, V is positive. Accordingly, application of pressure favors the native state. In a similar way, for all points to the right of the S = 0 line, S is positive. A positive entropy change upon denaturation is what one would straightforwardly expect. However, to the left of the S = 0 line, the entropy change is negative,
5 What Do We Learn from the Stability Diagram?
meaning that the entropy in the denatured state is smaller than in the native state. As was discussed above, the negative entropy change is due to an upcoming ordering of the hydration water as the temperature is decreased: The exposure of hydrophobic groups causes the water molecules in the hydration shell to become more and more immobilized due to the formation of a stronger hydrogen network. Right at the point where the S = 0 line cuts the phase diagram, the chain entropy and the entropy of the hydration shell exactly cancel, as we have stressed above. There is another interesting outcome from the elliptic shape of the phase diagram: Eq. (19) tells us that the change in the compressibility upon denaturation has to have the same sign as the change in the specific heat. The change in the specific heat is positive meaning that the heat capacity is larger in the unfolded state. As outlined above, this is due to the fact that an increase of the temperature in the denatured state requires the melting of the ordered immobilized hydration shell. Accordingly, β has to be positive, meaning that the compressibility in the denatured state is larger than in the native state. As to the absolute values of the thermodynamic parameters which determine the phase diagram, they can be determined only if one of the parameters is known. In the following section we will show how the equilibrium constant can be determined as a function of temperature and pressure and how this can be exploited to extract the thermodynamic parameters on an absolute scale. 5.2 Determination of the Equilibrium Constant of Denaturation
The body of information that can be extracted from the thermodynamics of protein denaturation reveals surprising details. However, it should be stressed again that all the modeling is based on two important assumptions, namely that folding and denaturing is described within the frame of just two states, N and D, and that these two states are in thermal equilibrium. Neither of these assumptions is straightforward. It is well known that the number of structural states, even of small proteins is, is extremely large [40–46] and the communication between the various states can be very slow so that the establishment of thermal equilibrium is not always ensured. We understand the two-state approximation on the basis of a concept which we call “state lumping.” In simple terms this means that the structural phase space can be partitioned into two areas comprising, on the one hand, all the states in which the protein is functioning and, on the other hand, the area in which the protein is dead. We associate the native state N and the denatured state D with these two areas. In order for the “state lumping” concept to work, the immediate consequence is that all states within the two and between the two areas are in thermal equilibrium. Whether this is true or not can only be proven by the outcome of the experiments. For Zn-Cc this concept seems to hold sufficiently well. In our experiments, typical waiting times for establishing equilibrium were of the order of 20 minutes. In order to make sure that this time span is reasonable, we measured for some points on the phase boundary also the respective kinetics. However, we also stress that at
19
20 Pressure–Temperature Phase Diagrams of Proteins
high pressures and low temperatures (left side of the phase diagram, Figure 11) we could not get reasonable data and we attributed this to the fact that equilibrium could not be reached within the experimental time window. Assuming that equilibrium is established and the two-state approximation holds with sufficient accuracy, we can determine the equilibrium constant as a function of pressure and temperature from the fluorescence experiments and from the fact that the phase diagram has an elliptic shape. We proceed in the following way: The fluorescence intensity F(λ) in the transformation range is a superposition from the two states N and D with their respective fluorescence maximum at λN and λD : F (λ) = pN aN exp[−(λ−λN )2 /2σN2 ] + pD aD exp[−(λ−λD )2 /2σD2 ]
(20)
where pN and pD are the population factors of the two states N and D, aN and aD are the respective oscillator strengths. Changes of the temperature or the pressure of the system cause a change in the population factors pN and pD which, in turn, leads to a change in the first moment of F(λ), that is in the spectral position of the band maximum, λmax . As to Zn-Cc, we can make use of some of its specific properties: First, the chromophore itself is quite rigid, hence, it is not likely affected by pressurizing or heating the sample. Accordingly, it is safe to assume that the oscillator strengths in the native and in the denatured state are not significantly different from each other. Second, the spectral changes induced by varying the temperature or the pressure are rather small compared to the total width of the long wavelength band (Figure 8). In addition, the wavelengths λN and λD are rather close and well within the inhomogeneous band. As a consequence, the exponentials in Eq. (20) are roughly of the same magnitude irrespective of the value of λ. Along these lines of reasoning, λmax is readily determined from the condition dF/dλ = 0: λmax = pN (T ,P)λN + pD (T ,P)λD = pN (T , p)[λN −λD ]−λD
(21)
The last term on the right hand side of Eq. (21) holds because we restrict our evaluation to an effective two-state system. According to the above equation, the band maximum in the transition region is determined by a population weighted average of λN and λD . Since λN and λD depend on pressure and temperature themselves, we have to determine the respective edge values in the transition region. How this is done is shown in Figure 14 for thermal denaturation at ambient pressure. Similar figures are obtained for any parameter variation. Having λmax as a function of pressure or temperature from the experiment and knowing the respective edge values λN and λD , the population factors pN and pD can readily be evaluated as a function of pressure and temperature using Eq. (21). From the population factors the equilibrium constant K follows immediately: K DN (T ,P) = [1− pN (T ,P)]/ pN (T ,P)
(22)
Once K is known, µ (or, equivalently, G mol−1 ) can be determined as a function of pressure and temperature from Eq. (1). For a fixed pressure, say pi, µ(T, Pi)
5 What Do We Learn from the Stability Diagram?
Fig. 14 Determination of the edge values 8N and 8D .
forms an inverted parabola in T, as was shown in Section 3. The same is true for µ(P, Ti). So we determined µ(T) and µ(P) by fitting parabolas to the few data points (Figures 15 and 16) under the constraints that both branches of these parabolas have to go through the phase boundaries of the stability diagram (Figure 11). From the first and second derivative of µ with respect to temperature and pressure all the thermodynamic parameters, namely V (T, P), S(T, P). Cp , β and α, can be determined. For Zn-Cc this parameters are listed in Table 1. We took Cp from the equilibrium constant because it seems to be the most accurate parameter (Figure 15). As a matter of fact our value is rather close to what was measured by Makhatadze and Privalov for unfolding native cytochrome c [10]. All the other parameters are determined from the phase diagram. Note that β has the same sign as Cp , as required for an elliptic shape of the diagram. 5.3 Microscopic Aspects
Having determined the complete set of thermodynamic parameters which govern denaturation of Zn-Cc, we may now proceed with exploring the microscopic driving forces of denaturation. We stress that the pattern in the behavior of the first moment of the fluorescence 00-transition is qualitatively very similar for thermal and pressure denaturation: A red shifting regime upon an increase in both parameters is followed by a blue shifting regime which signals the onset of the transformation range. The solvent shift of an optical transition is mainly determined by two types
21
22 Pressure–Temperature Phase Diagrams of Proteins
Fig. 15 Temperature dependence of the difference of the Gibbs free energy. G. (Note that, for a one-component system, G/Mol = :).
Fig. 16 Pressure dependence of the difference in the Gibbs free energy. G.
5 What Do We Learn from the Stability Diagram? Table 1 Thermodynamic parameters for the denaturation of Zn-cytochrome c
Parameters Cp (kJ mol−1 K−1 ) β(cm3 mol−1 GPa−1 ) α (cm3 mol−1 K−1 ) V (0.91 GPa, 298 K) (cm3 mol−1 ) S (0.91 GPa, 298 K) (kJ mol−1 K−1 ) V (0.1 MPa, 333 K) (cm3 mol−1 ) S (0.1 MPa, 333 K) (kJ mol−1 K−1 )
From the fit 5.87 148.1 0.139 −74.6 −0.263 65.0 0.530
of interactions, namely the dispersive and the higher order electrostatic interaction (dipole-induced dipole). Both types of interaction are very short ranged. They fall off with distance R as R−6 . From the pioneering papers by Bayliss, McRae and Liptay [47–49], it is well known that the dispersive interaction is always red shifting because the polarizability in the excited state is higher than in the ground state. However, electrostatic interactions can cause shifts in both directions, to the red as well as to the blue, depending on how the dipole moment of the chromophore changes in the excited state compared with the ground state. Accordingly, in the blue shifting regime, the electrostatic interaction of the probe with its environment obviously increases compared with the dispersive interaction. As a consequence, we conclude that polar groups with a sufficiently large dipole moment have to come close to the chromophore to induce this shift. As to thermal denaturation, such an interpretation seems to fit quite well into the scenario. Baldwin [22], for instance, could show that thermal denaturation of a protein is remarkably well described in analogy to the solvation of liquid hydrocarbons in water. This “oil droplet model” accounts for the temperature dependence of the hydrophobic interaction: The entropic driving force for folding, which comprises the major part of the hydrophobic interaction, decreases as the temperature increases. This driving force is associated with the formation of an ordered structure of the water molecules surrounding the hydrophobic amino acids. At sufficiently high temperature this ordered structure melts away so that the change of the unfolding entropy of the polypeptide chain takes over and the protein attains a random coil-like shape [5]. Since a random coil is an open structure, the chromophore may now be exposed to water molecules. Water molecules have a rather high permanent dipole moment, hence, may become polarized through the electrostatic interaction with the chromophore. Along these lines of reasoning, it seems straightforward that the observed blue shifting regime of the fluorescence in the thermal denaturation of Zn-Cc comes from the water molecules of the solvent. However, as was pointed out by Kauzmann [50], the “oil droplet model” has severe shortcomings, despite its success in explaining the specific features of thermal denaturation. It almost completely fails in explaining the specific features of pressure induced denaturation. Pressure denaturation is governed by the volume change V (Section 3). In this respect hydrophobic molecules behave quite differently from
23
24 Pressure–Temperature Phase Diagrams of Proteins
proteins: V for protein unfolding is negative above a few hundred MPa (see, for instance, Figure 11 and discussions in Sections 3 and 5.1), whereas it is positive in the same pressure range for transfering hydrophobic molecules into water. There are two possible consequences: Either the “oil droplet model” is completely wrong, or the pressure denatured state is different from the thermally denatured state. Indeed, Hummer and coworkers could show that the latter possibility is true [51, 52]. On the basis of their calculations they suggested that, as pressure is increased, the tretrahedral network of H-bonds in the solvent becomes more and more frustrated so that the pressure-induced inclusion of water molecules within the hydrophobic core becomes energetically more and more favorable. As a consequence, the contact configuration between hydrophobic molecules is destabilized by pressure whereas the solvent-separated contact configuration (a water molecule between two hydrophobic molecules) is stabilized. In simple words this means that water is pressed into the protein, the protein swells but does not completely unfold into a random coil conformation but rather seems to retain major parts of its globular shape. The important conclusion is that pressure denaturation is based on the presence of water. Larger solvent molecules, such as glycerol, can obviously not penetrate the protein, hence may act as stabilizing elements against pressure [53]. It seems that this view on the microscopic aspects of pressure denaturation is convincingly supported by our experiments: First, we stress again that the general pattern in the variation of the first moment with pressure is very similar to the respective variation with temperature. Since, in the latter case, we attributed this pattern to water molecules approaching the chromophore, it is quite natural to associate the pressure-induced pattern with the same phenomenon. Second, the magnitude of the change in the first moments is about the same for thermally and pressure-induced denaturation. Accordingly, the change in the respective interaction has to be very similar. This means, that, on average, the number and the distances of the additionally interacting water molecules have to be the same. In order to get additional support for this reasoning we performed pressure tuning hole burning experiments [54]. With these experiments it is possible to estimate the size of an average interaction volume of the chromophore with its environment. We found that the respective radius is about 4–5 Å. Accordingly, application of high pressure forces water molecules from the hydration shell into the interior of the protein where they fill the voids around the chromophore in the heme pocket. High-pressure (0.9 GPa) MD simulations (M. Reif and Ch. Scharnagl, in preparation) are in full agreement with this view of pressure denaturation. Summarizing this section we state that fluorescence experiments in combination with other optical experiments and computer simulations [21] provide a detailed insight into the processes involved in pressure-induced protein denaturation. 5.4 Structural Features of the Pressure-denatured State
From the discussion above it is evident that pressure denaturation leads to a structural state which is different from the respective one obtained by thermal
6 Conclusions and Outlook
Fig. 17 Comparative neutron scattering experiments at ambient pressure and at high pressure. Plotted is the neutron structure factor H(Q) as a function of Q. Data are for myoglobin. The important feature is the partial persistence of the protein helical structure (peak at Q = 0.7 A−1 ) in the pressure-denatured state.
denaturation. It seems that many structural features of the native protein remain preserved under pressure denaturation. Experimental evidence comes from NMR experiments [6], but also from IR experiments [19, 20, 36, 37]. Figure 17 shows another experiment [55], namely a neutron-scattering experiment with myoglobin as a target from which the conservation of native structural elements in the denatured state is directly seen. Myoglobin denatures at pH 7 in the range between 0.3 and 0.4 GPa as judged from the optical absorption spectrum of the heme group. The corresponding changes in the protein structure and at the protein-water interface can be deduced from the neutron structure factor H(Q) of a concentrated solution of myoglobin in D2 O (Figure 17). Q denotes the length of the scattering vector. The large maximum around Q = 2 Å−1 arises from OO correlations of water near the protein interface. This maximum increases with density. The smaller maximum near Q = 0.7 A−1 reflects helix-helix correlations. The contrast is provided by the negative scattering length density of the protein hydrogen atoms versus the positive scattering length density of C, N, and O. The most remarkable feature is the persistence of this maximum above the transition at 0.45 GPa, indicating residual secondary structure in the pressure-unfolded form.
6 Conclusions and Outlook
Folding and denaturing processes of proteins are extremely complicated due to the complex nature of the protein molecules with their huge manifold of structural
25
26 Pressure–Temperature Phase Diagrams of Proteins
states. Hence, it is surprising that some of the characteristic features of equilibrium thermodynamics associated with the folded and denatured state are already revealed on the basis of simple models. Most important in this context is the reduction of the folding and denaturing processes to just two essential states, namely the native state and the denatured state. To reconcile this crude assumption with the large structural phase space, we introduced the concept of “state lumping.” Another important approximation concerns the vanishing higher (higher than two) derivatives of the chemical potential, rendering pressure- and temperatureindependent state parameters (specific heat, compressibility, thermal expansion) to the protein. The result of these simplifying assumptions is an elliptically shaped stability phase diagram, provided that the sign of the change in the specific heat capacity and in the compressibility is the same. Our experiments on the stability of a modified cytochrome c in a glycerol/water solvent showed an almost perfect ellipse from which the regimes with positive and negative volume changes as well as positive and negative entropy changes could be deduced. We tried to shed some light on the microscopic aspects responsible for these regimes. A dominant force is the hydrophobic interaction which depends not only on temperature but also on pressure [56]. The characteristic structural features of the pressure-denatured state are different from the respective ones of the thermally denatured state. In both cases, however, water molecules come very close to the chromophore. As to the open questions in context with the stability phase diagram, we refer to the elliptic shape. So far only closed diagrams have been observed, and in most cases an elliptic shape was an appropriate description. The question comes up how general this observation is and what the microscopic implications are. Admittedly, up to now few experiments have measured the full diagram. Another problem concerns negative pressure. From an experimental point of view negative pressures as low as −200 bar seem to be feasible. Denaturation under negative pressure, that is under conditions where the relevant interactions are weakened, would add valuable information on pressure-induced denaturing forces. Finally the role of the solvent has to be addressed in more detail. The solvent may have a dramatic influence on the hydrophobic interaction, the size of the solvent molecules may play an important role in all processes involving pressure, and, last but not least, understanding the influence of the solvent may also shed light on the behavior of membrane proteins.
Acknowledgment
The authors acknowledge financial support from the DFG (FOR 358, A1/2), from the Fonds der Chemischen Industrie (J.F.) and from the Bundesministerium f¨ur Bildung und Forschung 03DOE2M1. Scientific input to this work came from many of our friends. In particular we want to thank J. M. Vanderkooi, H. Lesch, and G. Job and F. Herrmann for the idea to base thermodynamics on entropy and chemical potential.
References
References 1 G. WEBER, H. DRICKAMMER, Q. Rev.; Biophys. 1983, 16, 89–112. 2 Special Issue on “Frontiers in high pressure biochemistry and biophysics”, Biochim. Biophys. Acta 2002, 1595, Issue 1–2. 3 C. A. ROYER, Biochim. Biophys. Acta 2002, 1595, 201–209. 4 M. GERSTEIN, C. GOTHIA, Proc. Natl Acad. Sci. USA 1996, 93, 10167–10172. 5 A. FERSHT, Structure and Mechanism in Protein Science, W. H. Freeman, New York, 1999. 6 D. P. NASH, J. JONAS, Biochemistry 1997, 36, 14375–14383. 7 G. J. A. VIDUGIRIS, J. L. MARKLEY, C. A. ROYER, Biochemistry 1995, 34, 4909–4912. 8 P. L. PRIVALOV, Adv. Protein Chem. 1979, 33, 167–241. 9 P. L. PRIVALOV, S. J. GILL, Adv. Protein Chem. 1988, 39, 191–234. 10 G. I. MAKHATADZE, P. L. PRIVALOV, Adv. Protein Chem. 1995, 47, 307–425. 11 J. F. BRANDTS, J. Am. Chem. Soc. 1964, 86, 4302–4314. 12 J. F. BRANDTS, J. Am. Chem. Soc. 1965, 87, 2759–2760. 13 A. COOPER, Prog. Biophys. Mol. Biol. 1984, 44, 181–214. 14 S. A. HAWLEY, Biochemistry 1971, 10, 2436–2442. 15 P. E. CLADIS, R. K. BOGARDUS, W. B. DANIELS, G. N. TAYLOR, Phys. Rev. Lett. 1977, 39, 720–723. 16 D. D. KLUG, E. J. WHALLEY, J. Chem. Phys. 1979, 71, 1874–1877. 17 A. ZIPP, W. KAUZMANN, Biochemistry 1973, 12, 4217–4228. 18 H. LUDWIG, W. SCIGALLA, B. SOJKA, in High Pressure Effects in Molecular Biophysics and Enzymology, J. L. MARKLEY, D. NORTHROP, C. A. ROYER (Eds), Oxford University Press, Oxford, 1996, 346–363. 19 K. HEREMANS, L. SMELLER, Biochim. Biophys. Acta 1998, 1386, 353–370. 20 Y. TANIGUCHI, N. TAKEDA, in High Pressure Effects in Molecular Biophysics and Enzymology, J. L. MARKLEY, D. NORTHROP, C. A. ROYER (Eds), Oxford
21 22 23 24 25 26 27 28 29
30 31 32
33 34 35 36
37
38
39
40
University Press, Oxford, 1996, 87–95. E. PACI, Biochim. Biophys. Acta 2002, 1595, 185–200. R. BALDWIN, Proc. Natl Acad. Sci. USA 1986, 83, 8069–8072. R. BALDWIN, N. MULLER, Proc. Natl Acad. Sci. USA 1992, 89, 7110–7113. P. L. PRIVALOV, Biofizika 1987, 32, 742–746. Y. V. GRIKO, P. PRIVALOV, J. Mol. Biol. 1994, 235, 1318–1325. Th. KIEFHABER, R. L. BALDWIN, J. Mol. Biol. 1995, 252, 122–132. M. BERTHELOT, Ann. Chim. 1850, 30, 232–237. L. J. BRIGGS, J. Appl. Phys. 1950, 21, 721–722. M. EFTINK, G. RAMSEY, in High Pressure Effects in Molecular Biophysics and Enzymology, J. MARKLEY, D. NORTHROP, C. A. ROYER (Eds), Oxford University Press, Oxford, 1996, 62. T. V. CHALIKIAN, K. J. BRESLAUER, Curr. Opin. Struct. Biol. 1998, 8, 657–664. K. GEKKO, H. NOGUCHI, J. Phys. Chem. 1979, 83, 2706–2714. B. GAVISH, E. GRATTON, C. J. HARDY, Proc. Natl Acad. Sci. USA 1983, 80, 750–754. J. ZOLLFRANK, J. FRIEDRICH, J. Opt. Soc. Am. B. 1992, 9, 956–961. D. P. KHARAKOZ, A. P. SARVAZYAN, Biopolymers 1993, 33, 11–25. G. J. A. VIDUGIRIS, C. A. ROYER, Biophys. J. 1998, 75, 463–470. G. PANICK, R. MALESSA, R. WINTER, G. RAPP, K. J. FRYE, C. A. ROYER, J. Mol. Biol. 1998, 275, 389–402. H. HERBERHOLD, S. MARCHAL, R. LANGE, C. H. SCHEYING, R. F. VOGEL, R. WINTER, J. Mol. Biol. 2003, 330, 1153–1164. H. LESCH, H. STADLBAUER, J. FRIEDRICH, J. M. VANDERKOOI, Biophys. J. 2002, 82, 1644–1653. M. EREMETS, High Pressure Experimental Methods, Oxford University Press, Oxford, 1996. J. D. BRYNGELSON, J. N. ONUCHIC, N. D. SOCCHI, P. G. WOLYNES, Proteins Struct. Funct. Genet. 1995, 21, 167–195.
27
28 Pressure–Temperature Phase Diagrams of Proteins
41 H. FRAUENFELDER, F. PARAK, R. D. YOUNG, in Annu. Rev. Biophys. Chem. 1988, 17, 451–479. 42 H. FRAUENFELDER, S. G. SLIGAR, P. G. WOLYNES, Science 1991, 254, 1598–1603. 43 H. FRAUENFELDER, D. THORN LEESON, Nat. Struct. Biol. 1998, 5, 757–759. 44 H. FRAUENFELDER, Nat. Struct. Biol. 1995, 2, 821–823. 45 A. E. GARCIA, R. BLUMENFELD, G. HUMMER, J. A. KRUMMHANSL, Physica D 1997, 107, 225–239. 46 K. A. DILL, H. S. CHAN, Nat. Struct. Biol. 1997, 4, 10–19. 47 N. S. BAYLISS, J. Chem. Phys. 1950, 18, 292–296. 48 N. S. BAYLISS, E. G. MCRAE, J. Phys. Chem. 1954, 58, 1002–1006. 49 W. LIPTAY, Z. Naturforsch. 1965, 20a, 272–289.
50 W. KAUZMANN, Nature 1987, 325, 763–764. 51 G. HUMMER, S. GARDE, A. E. GARCIA, M. E. PAULALITIS, L. R. PRATT, Proc. Natl Acad. Sci. USA 1998, 95, 1552–1555. 52 G. HUMMER, S. GARDE, A. E. GARCIA, M. E. PAULALITIS, L. R. PRATT, J. Phys. Chem. 1998, 102, 10469–10482. 53 A. C. OLIVEIRA, L. P. GASPART, A. T. DA POIAN, J. L. SILVA, J. Mol. Biol. 1994, 240, 184–187. 54 H. LESCH, J. SCHLICHTER, J. FRIEDRICH, J. M. VANDERKOOI, Biophys. J. 2004, 86, 467–472. 55 W. DOSTER, R. GEBHARDT, A. K. SOPER, in Advances in High Pressure Bioscience and Biotechnology, R. WINTER (Ed.), Springer, Berlin, 2003, 29. 56 K. A. DILL, Biochemistry 1990, 29, 7133–7155. 57 N. A. CLARK, J. Physique 1979, 40, 345–349.
1
Conformational Properties of Unfolded Proteins Patrick J. Fleming and George D. Rose Johns Hopkins University, Baltimore, USA
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The protein folding reaction, U(nfolded) N(ative), is a reversible disorder order transition. Proteins are disordered (U) at high temperature, high pressure, extremes of pH, or in the presence of denaturing solvents; but they fold to uniquely ordered, biologically relevant conformers (N) under physiological conditions. This folding transition is highly cooperative such that individual molecules within the population are predominantly fully folded or fully unfolded; partially folded chains are transitory and rare. Notably, no covalent bonds are made or broken during folding/unfolding; in effect, the transition is simply a re-equilibration in response to changes in temperature, pressure, pH, or solvent conditions. Currently, there are more than 20 000 examples of native proteins in the protein databank. In contrast, the unfolded population, by its very nature, resists ready structural characterization. In this sense, the folding reaction might be more appropriately denoted as ? N. This chapter traces thinking about the unfolded state from Pauling’s and Wu’s early suggestions in the 1930s, through the work of Tanford and Flory in the 1960s to the present moment. Early work gave rise to the random coil model for the unfolded state, as described below. Confirmatory findings established this model as the conceptual anchor point for thinking about unfolded proteins – until recently, perhaps. In the past few years, results from both theory and experiment indicate the existence of conformational bias in the unfolded state, a condition that is not addressed by the random coil model. If unfolded conformers are biased toward their native conformation sufficiently, then the random coil model is likely to be superseded by newer, more specific models. Though controversial, such a conceptual shift appears to be underway, as we attempt to describe.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Conformational Properties of Unfolded Proteins
Fig. 1 The folding transition. The folding reaction of a typical, small biophysical protein is a highly cooperative, all-or-none transition. At the transition midpoint, half the ensemble is folded and half is unfolded; the population of partially folded/unfolded molecules is negligible. In this idealized
plot of an actual experiment [1], the population is followed by a conformational probe (e.g., circular dichroism or fluorescence) as a function of denaturant concentration. Upon addition of sufficient denaturant, the probe signal reaches a plateau, indicating that the transition is complete.
1.1 Unfolded vs. Denatured Proteins
The term unfolded protein is generic and inclusive, and it can range from protein solutions in harsh denaturants to protein subdomains that undergo transitory excursions from their native format via spontaneous fluctuations. While conceptually complete, this range is too diverse to be practically useful, and it requires further specification. Accordingly, the field has focused more specifically on denatured proteins, the population of unfolded conformers that can be studied at equilibrium under high concentrations of denaturing solvents, high temperature, high pressure, high/low pH, etc. Early experiments of Ginsburg and Carroll [1] and Tanford [2] demonstrate that such conditions can give rise to a defined equilibrium population in which the unfolding transition is complete (Figure 1). In this chapter, we use both terms and rely on the context for specificity.
2 Early History
The fact that protein molecules can undergo a reversible disorder order transition originated early in the last century, in ideas proposed by Wu [3, 4] and Mirsky and Pauling [5]. Both papers propose that a theory of protein structure is tantamount to a theory of protein denaturation. In particular, these authors recognized that many
3 The Random Coil
disparate physical and chemical properties of proteins are abolished coordinately upon heating. This was unlikely to be mere coincidence. Both Wu and Mirsky and Pauling hypothesized that such properties are a consequence of the protein’s structure and are abolished when that structure is melted. Their hypothesis was later confirmed by Kauzmann and Simpson [6], at which point the need for an apt characterization of the melt became clear, and protein denaturation emerged as a research discipline. A widely accepted view assumes that unfolded polypeptide chains can explore conformational space freely, with constraints arising only from short-range local restrictions and longer range excluded volume effects. To a good first approximation, short-range local restrictions refer to repulsive van der Waals interactions between sequentially adjacent residues (i.e., steric effects) captured by the well-known Ramachandran map for a dipeptide [7]. Longer range excluded volume effects also refer to repulsive van der Waals interactions, in this case those between nonbonded atoms that are distant in sequence but juxtaposed in space as the chain wanders at random along a Brownian walk in three dimensions [8, 9]. This random coil model has conditioned most of the thinking in the field. It is important to realize that the random coil model need not imply an absence of residual structure in the unfolded population. Kauzmann’s famous review raised the central question about structure in the unfolded state explicitly [10]: For instance, one would like to know the types of structures actually present in the native and denatured proteins. . . . The denatured protein in a good solvent such as urea is probably somewhat like a randomly coiled polymer, though the large optical rotation of denatured proteins in urea indicates that much local rigidity must be present in the chain (p. 4).
3 The Random Coil
A chain molecule is a freely jointed random coil if it traces a random walk in threedimensional space, in incremental steps of fixed length. The random coil model has enjoyed a long and successful history in describing unfolded proteins. By definition, a random coil polymer has no strongly preferred backbone conformations because energy differences among its sterically accessible backbone conformations are of order ∼kT. Accordingly, the energy landscape for such a polymer can be visualized as an “egg crate” of high dimensionality, and a Boltzmann-weighted ensemble of such polymers populates this landscape uniformly. More than others, this elegant theory was developed by Flory [11], pp. 30–31; [12], pp. 991–996, and advanced by Tanford [2, 9, 13], who demonstrated that proteins denatured in 6 M guanidinum chloride (a strong denaturant) appear to be structureless, random chains. Tanford’s pioneering studies established a compelling framework for interpreting experimental protein denaturation that would survive largely unchallenged for the next 30 years.
3
4 Conformational Properties of Unfolded Proteins
Often, the term random coil is used synonymously with the freely jointed chain model (described below), in which there is no correlation between the orientation of two chain monomers at any length scale. That is, configurational properties of a freely jointed chain, such as its end-to-end distance, are Gaussian distributed at all chain lengths. In practice, no actual polymer chain is freely jointed. More realistic models, such as Flory’s rotational isomeric-state model, have Gaussiandistributed chain configurations only in the infinite chain limit ([11], pp. 30–31; [12], pp. 991–996). These distinctions notwithstanding, the main characteristic of the random coil holds in all cases, both ideal and real: the unfolded state is structurally featureless because the number of available conformers is large and the energy differences among them are small. 3.1 The Random Coil–Theory
Statistical descriptions are the natural way to characterize a large heterogeneous population, such as an ensemble of unfolded proteins. A few key ideas are mentioned here, but they are no substitute for the many excellent treatments of this subject [8, 11, 12, 14, 15]. The fundamental model is the freely jointed chain (or freely jointed random coil or random flight), a linear polymer of n adjoining links, each of fixed length, with complete freedom of rotation at every junction (Figure 2). From this definition, it follows that the angles at link junctions (i.e., bond angles) are completely uncorrelated. This model is completely general because it neglects chemical constraints, and therefore its scope is not restricted to any particular type of polymer chain. However, additional constraints such as chain thickness or hindered bond rotation can be added as appropriate, resulting in more specific models. What can be said about a polymer chain that is devoid of chemistry? The freely jointed chain is equivalent to Brownian motion with a mean free path of fixed length, as described by Einstein-Smoluchowski theory [16]. The basic relationship governing both freely jointed chains and Brownian particles is:
√ r 2 =l n
(1)
Fig. 2. A freely jointed chain. The chain is comprised of links, each of fixed length, l, with freedom of rotation at every junction. For a chain of n links, the vector from the beginning to the end, r n , (shown as the long arrow) is given by summing the links, n li: rn= l i and | l i |=l i =1
3 The Random Coil
where r 2 is the root-mean-square end-to-end distance (see Figure 2), l is the link length, and n is the number of links in the polymer. In other words, the distance between termini increases as the square root of the number of chain links: doubling the distance requires four times as many links. The end-to-end distance measures the size of a polymer coil. Another such measure is the radius of gyration, RG , the root-mean-square distance of link termini from their common center of gravity: 1 2 R n+1 i=0 Gi n
RG2 =
(2)
where RGi is the distance of link i from the center of gravity and n is the number of links in the polymer. According to a theorem of Lagrange in 1783, RG can be rewritten in terms of the individual link vectors, r ij , without explicit reference to the center of gravity ([11], appendix A). RG2 =
N N 1 2 r 2n2 i=1 j=1 ij
(3)
The two measures are related: RG2 =
r2 as n→∞ 6
(4)
For a freely jointed chain, the values of such configurational measures are Gaussian distributed. Of course, no real chain is freely jointed. The chemical bonds in real chains restrict motion; bond rotations are never random. Also, each link of a real chain occupies a finite volume, thereby reducing the free volume accessible to remaining links. Accordingly, ideal chains descriptions must be modified if they are to accommodate such real-world constraints. A strategy for accommodating restricted bond motion is to depart from physical chain links and instead re-represent the chain as though it were comprised of longer, uncorrelated virtual links. The idea underlying this strategy is as follows: a short chain segment (e.g., a dipeptide) is somewhat rigid [7], but a sufficiently long segment is flexible. Therefore, the chain becomes flexible at some length between the dipeptide and the longer peptide. This leads to the idea of an effective segment, leffective , also called a Kuhn segment, the length scale at which chain segments approach independent behavior and correlated orientations between them dwindle away. A chain of length L contains L/leffective Kuhn segments and can be approximated as a freely jointed chain of Kuhn segments: 2 r 2 =l effective
L l effective
=l effective L
(5)
5
6 Conformational Properties of Unfolded Proteins
A closely related idea is defined in terms of the chain’s persistence length, the length scale over which correlations between bond angles “persist”. In effect, the chain retains a “memory” of its direction at distances less than the persistence length. Stated less anthropomorphically, the energy needed to bend the chain through a 90◦ angle diminishes to ≈kT/2 at its persistence length, and thus ambient-temperature fluctuations can randomize the chain direction beyond this length. The size of a Kuhn segment is approximately two persistence lengths (i.e., directional correlations die away in either direction). Current models strive to capture the properties of real chains with more detail than idealized, freely jointed chains can provide. For example, no actual chemical bond is a freely swiveling joint. To treat bond restrictions more realistically, Flory devised the rotational isomeric state approximation ([11], p. 55), in which bond angles are restricted to discrete values, chosen to correspond to known potential minima (e.g., gauche+ , gauche− , and trans). A real polymer chain cannot evade itself. Inescapably, the volume occupied by a chain element is excluded from occupancy by other chain elements. Otherwise, a steric clash would ensue: nonbonded atoms cannot occupy the same space at the same time. This excluded volume effect is substantial for proteins and results in a major departure from ideal chain dimensions (Eqs. (1)–(4)). As real polymers fluctuate, contracted coils have more opportunities to experience excluded volume steric clashes than expanded coils, perturbing the chain toward larger dimensions than those expected for ideal polymers. Chain dimensions are also perturbed by the nature of the solvent. A good solvent promotes chain expansion by favoring chain:solvent interactions over chain:chain interactions. Conversely, a poor solvent promotes chain contraction by favoring chain:chain interactions over chain:solvent interactions. Flory introduced the idea of a -solvent in which, on average, chain:chain interactions exactly counterbalance chain:solvent interactions, leading to unperturbed chain behavior. He pointed out that the notion of a -solvent for a liquid is analogous to the Boyle point for a real gas, the temperature at which a pair of gas molecules follow an ideal isotherm because repulsion arising from volume exclusion is compensated exactly by mutual attraction ([11], p. 34). Flory provided a simple relationship that relates the coil dimensions to solvent quality [11]. For a random coil polymer with excluded volume, the radius of gyration, RG , is given by: RG =R0 nv
(6)
where R0 is a constant that is a function of the chain’s persistence length, n is the number of links, and v is the scaling factor of interest that depends on solvent quality. Values of v range from 0.33 for a collapsed, spherical molecule in poor solvent through 0.5 at the -point (Eq. (1)) to 0.6 in good solvent. Protein molecules are amphipathic, and their interactions with solvent are complex. However, on balance, denaturing agents such as urea and guanidinum chloride can be considered good solvents. Using Eq. (6), the degree to which unfolded
3 The Random Coil
proteins are random coil polymers in denaturing solvents can assessed by measuring v, the main topic of Section 3.2. 3.1.1 The Random Coil Model Prompts Three Questions The random coil model set the stage for much of the contemporary theoretical work on unfolded proteins. A key question was brought into sharp focus by Levinthal [17]: if the random coil model holds, how can an unfolded protein discover its native conformation in biological real time? In particular, if unfolded protein molecules wander freely in a vast and featureless energy landscape, with barriers of order ∼kT, then three related questions arise: 1. The kinetic question How does a protein discover its native conformation in biological real time? If restricted solely to the two most populated regions for a dipeptide, a 100-residue backbone would have 2100 ∼ = 1030 conformers. With −13 bond rotations of order 10 s, the mean waiting time en route to the native conformation would be prohibitive just for the backbone. In actuality, experimentally determined folding times range from milliseconds to seconds. 2. The thermodynamic question How does a protein compensate for the large conformational entropy loss on folding? With 2100 ∼ = 1030 conformers, the entropic price required to populate a single conformation ∼ = 30 × R ln(10) ∼ = 40 kcal mol−1 at room temperature, a conservative estimate. 3. The dynamic question How does a protein avoid meta-stable traps en route to its native conformation? An equivalent way of asking this question is: why do proteins have a unique native conformation instead of a Boltzmann-distributed ensemble of native conformations?
Many investigators have sought to provide answers to these questions. Two notable examples are mentioned here, though only in bare outline. 3.1.2 The Folding Funnel Following earlier work of Frauenfelder et al. [18], who suggested an analogy between proteins and spin glasses, Wolynes and coworkers introduced the notion of a folding funnel [19] to describe the progress of a protein population that traverses its energy landscape en route to the folded state. The favorable-high-entropy, unfavorable-high-energy unfolded state is conceptualized as a wide funnel mouth, while the unfavorable-low-entropy, favorable-low-energy native state corresponds to a narrow funnel spout. According to this conception, sloping funnel walls guide the population downhill toward the folded state from any starting point, answering question 1. During this downhill trajectory, lost entropy is progressively compensated by favorable pairwise interactions, answering question 2. Finally, meta-stable traps can be avoided if the funnel walls are sufficiently smooth [20], answering question 3. As a corollary, it is postulated that evolutionary pressures screen protein sequences, selecting those which can fold successfully in a funnel-like manner [21]. The folding funnel evokes a graphic portrait of folding dynamics and
7
8 Conformational Properties of Unfolded Proteins
thermodynamics but is not intended to address specific structural questions, such as whether a region of interest will be helix or sheet. 3.1.3 Transition State Theory Fersht and coworkers imported transition state theory from small molecule chemical reactions into protein folding [22]. Akin to the folding funnel, transition state kinetics focus on the entire population, with the transition state species pictured at the top of an energy barrier which separates U from N. But, unlike the folding funnel, only a few key residues comprise the organizational “tipping point”, viz., those that participate in the transition state. Questions 1–3 are not at issue for small molecule chemical reactions: (1) the mean waiting time for a reaction to occur depends upon a bond vibration, (2) after barrier crossing, the process is steeply downhill, and (3) intermediates between reactant and product are unstable because bond making/breaking energies are large. To the degree that the transition state approximation holds for protein folding [23], similar answers will obtain. Transition state theory, expressed in the Eyring rate equation, transforms timedependent kinetics into time-independent thermodynamics via an internal ticking clock: the rate of product formation depends upon the frequency of vibration of a critical bond. In contrast, no covalent bonds are made or broken in a folding reaction, and structure accretion is incremental and hierarchic en route from U to N [23, 24]. Not surprisingly then, the application of transition state theory to protein folding is complex [25]. Confidence in the application of the transition state approximation to protein folding comes from its success in describing simplified folding reactions [26] and the thermal unfolding of a β-hairpin [27]. However, recent work also illustrates the complexities. The transition state can be shifted dramatically without changing a protein’s amino acid sequence [28, 29]. In simulations, the folding reaction can produce a broad ensemble of transition states instead of a single, well-defined species [30]. This blurring of the lines is further compounded by other work showing that the transition state resembles nearby folding intermediates [31] or is simply a distorted form of the native state [32]. 3.1.4 Other Examples The preceding examples illustrate how the random coil model has informed current thinking about unfolded proteins and the folding transition. The search for answers to the three questions has motivated other studies as well. In yet another example that focuses on question 3, Sali et al. [33, 34] analyzed the density of states in lattice simulations of folding and found a large energy gap – the e–gap – separating the native state (i.e., the ground state) from the nearest nonnative state (i.e., the 1st excited state). This finding rationalizes the predominance of the native state. 3.1.5 Implicit Assumptions from the Random Coil Model Unfolded state models utilized in computer simulations often incorporate random coil assumptions implicitly. Four such assumptions are mentioned here.
3 The Random Coil
1. The unfolded landscape is smooth. If the energy differences among sterically accessible backbone conformations are of order ∼kT, the landscape will be devoid of kinetic traps and conformational bias. This assumption simplifies strategies for exploring the unfolded state in simulations. 2. The isolated-pair hypothesis is valid. Lattice models provide a way to count conformational alternatives explicitly, and they have been used extensively [14]. Most often, residues are represented as single lattice points (i.e., all residues are sterically equivalent on a lattice). Consequently, residue-specific steric restrictions beyond the dipeptide are either underweighted or ignored. This practice is valid to the degree that local steric repulsion does not extend beyond nearest chain neighbors, an assumption made explicit in Flory’s isolated-pair hypothesis [11], which posits that each , ψ pair is sterically independent. 3. The Go approximation holds. A simplifying idea, introduced by Go [35], computes the energy of a conformation by rewarding native-like contacts while ignoring nonnative contacts, i.e., fortuitous nonnative contacts are not allowed to develop into kinetic traps. For simulations, this is a useful artifice that can be rationalized in a featureless landscape, where nonnative contacts dissolve as easily as they form. 4. Peptide backbone solvation is uniform. In other words, solvent water does not induce conformational bias in the unfolded state. If the interaction with water were energetically favored by some particular backbone conformation(s), then the unfolded landscape would be preferentially populated by these favored conformers, in violation of the featureless, random coil model.
These four assumptions are examined in Section 4. 3.2 The Random Coil–Experiment
Is a denatured protein aptly described as a random coil? It was Charles Tanford’s experimental work that convinced the field. In numerous studies, Tanford demonstrated that proteins denatured in 6 M guanidinum chloride (GdmCl) have coil dimensions that obey simple scaling laws, consistent with random coil behavior. His masterful review of protein denaturation in Advances in Protein Chemistry [9, 13] is required reading for anyone interested in this topic. In essence, the experimental strategy involves measuring coil dimensions for unfolded proteins in solution, fitting them to Eq. (6), and determining whether the scaling exponent, v, is consistent with a random coil polymer with excluded volume in good solvent. The excluded volume can be obtained directly from any practical technique that depends upon the colligative properties of the polymer solution, such as osmotic pressure. Using such techniques, concentration-dependent deviations from ideality arising from solute:solvent interactions are measured. To extract the excluded volume, the chemical potential of the polymer solution is expanded as a power series in solute concentration – the virial expansion. For purely repulsive interactions, the molar excluded volume is given by the second virial coefficient
9
10 Conformational Properties of Unfolded Proteins
[36]. As mentioned above, excluded volume increases chain dimensions, with v ranging from 0.33 for a collapsed, spherical molecule in poor solvent through 0.5 at the -point to 0.6 in good solvent. Tanford documented many experimental pitfalls [9]. His investigations emphasized the importance of eliminating any potential residual structure in the unfolded protein by showing that the unfolding transition is complete. In fact, some residual structure is evident in heat-denatured proteins [37], but it can be eliminated in 6 M GdmCl. He also cautioned that the radius of gyration alone is an insufficient criterion for assessing random coil behavior, pointing out that a helical rod and a random polypeptide chain have similar values of RG at chain lengths approximating those of ribonuclease and lysozyme. 3.2.1 Intrinsic Viscosity In classic studies, Tanford used the intrinsic viscosity to determine coil dimensions. The intrinsic viscosity of a protein solution measures its effective hydrodynamic volume per gram in terms of the specific viscosity ([38], chapter 7). In particular, if η is the viscosity of the solution and η0 is the viscosity of solvent alone, the specific viscosity, ηsp = (η–η0 )/η0 , is the fractional change in viscosity produced by adding solute. While ηsp is the quantity of interest, it is expressed in an experimentally inconvenient volume fraction concentration scale. This is remedied by converting to the intrinsic viscosity, [η], which is ηsp normalized by the protein concentration, c, η at infinite dilution: [η]= limc→0 csp . Specific viscosity is a pure number, so intrinsic viscosity has units of reciprocal concentration, milliliters per gram. The intrinsic viscosity is not a viscosity per se but a viscosity increment owing to the volume fraction of solution occupied by the protein, like ηsp . It measures the hydrated protein volume, which will be much larger for randomly coiled molecules than for compactly folded ones; [η] scales with chain length, n. The dependence of [η] on n is conformation dependent, and Tanford took advantage of this fact. The relevant equation is:
[η]=K n*
(7)
where K is a constant that depends upon the molecular weight, but only slightly. Intrinsic viscosity is closely related to RG , and Eqs. (6) and (7) have a similar form. If unfolded proteins retain residual structure, each in their own way, the relation between [η] and n is expected to be idiosyncratic. Conversely, a series of proteins that conform to Eq. (7) is indicative of random coil behavior. In fact, for a series of 15 proteins denatured in GdmCl, a plot of log[η] vs. log n describes a straight line with slope 0.666, the exponent in Eq. (7) [9]. The linearity of this series and the value of the exponent are strong evidence in favor of random coil behavior. 3.2.2 SAXS and SANS Small angle X-ray scattering (SAXS) can be used to measure the radius of gyration, RG , directly [39]. Molecules in a protein solution scatter radiation like tiny antennae
3 The Random Coil
([40], chapter 7). In idealized situations, particles scatter independently (Rayleigh scattering), but significant interference occurs when intramolecular distances are of the same order as the wavelength of incident radiation, λ. This is the situation that obtains when proteins are irradiated with X-rays, and it is the basis for all experimental scattering techniques that yield RG . In this case, the quantity of interest is P(θ ), the ratio of measured intensity to the intensity expected for independent scattering by particles much smaller than λ as a function of the scattering angle, θ . For a solution of scatterers, P(θ )=
N N 1 sinhij n2 i−1 j=1 hr ij
(8)
where n is the number of scattering centers, r ij is the distance between any pair of centers i and j, and h is a function of the wavelength and scattering angle: h=
θ 4π sin λ 2
(9)
The double sum over all scattering centers is immediately reminiscent of Eq. (3), in which RG is rewritten in terms of individual vectors, without explicit reference
Fig. 3 The relationship between chain length and the radius of gyration, RG , for a series of denatured proteins is well described by Eq. (6). Data points were taken in [39], obtained using SAXS. The fitted curve has a value of v = 0.61 ± 0.03, in close agreement with theory. This figure reproduces the one in Millett et al. ([39]), but with omission of outliers.
11
12 Conformational Properties of Unfolded Proteins
to their center of gravity. Van Holde et al. ([40], p. 321) show that P(θ )=
N N N N h2 2 1 (1)− r n2 i=1 j=1 6n2 i=1 j=1 ij
(10)
where the first term is unity and RG is directly related to the double sum in the second term, as in Eq. (3). Millett et al. used SAXS to determine RG for a series of proteins under both denaturing and native conditions [39]. Disulfide cross-links, if any were reduced in the denatured species. Their experimentally determined values of RG were fit to Eq. (6), giving values of v = 0.61 ± 0.03 for the denatured proteins and v = 0.38 ± 0.05 for their native counterparts (Figure 3). These values are remarkably close to those expected from theory, viz., v = 0.6 for a random coil with excluded volume in good solvent and v = 0.33 for a collapsed, spherical molecule in poor solvent. The SAXS data provide the most compelling evidence to date in favor of the random coil model for denatured proteins.
4 Questions about the Random Coil Model
The random coil model would seem to be on firm ground at this point. However, recent work from both theory and experiment has raised new questions about the validity of the model – questions that provoke considerable controversy. Are they mere quibbles, or are they the prelude to a deeper understanding of the unfolded state? Familiarity conditions intuition. At this point, the random coil model has conditioned our expectations for several decades. Should we be surprised that the dimensions of unfolded proteins are well described by a single exponent? Size matters here. As Al Holtzer once remarked, a steel I-beam behaves as a Gaussian coil if you make it long enough. But, at relevant length scales, the fact that proteins and polyvinyl behave similarly is quite unanticipated. After all, proteins adopt a unique folded state, whereas nonbiological polymers do not. Flory emphasized this difference [11]: “Synthetic analogs of globular proteins are unknown. The capability of adopting a dense globular configuration stabilized by self-interactions and of transforming reversibly to the random coil are characteristics peculiar to the chain molecules of globular proteins alone” (p. 301). The new questions center around the possibility of conformational bias and/or residual structure in unfolded proteins [11], even those unfolded in strong denaturing solvents like 6 M GdmCl [42]. We turn now to this discussion. 4.1 Questions from Theory
Superficially, the question of whether polypeptide chains are true random coils seems amenable to straightforward analysis by computer simulation. In principle,
4 Questions about the Random Coil Model
chains of n residues could be constructed, one at a time, using some plausible model (e.g., the Flory rotational isomeric model) to pick backbone dihedral angles. The coil dimensions and other characteristics of interest could then be analyzed by generating a suitable population of such chains. In practice, the excluded volume problem precludes this approach for chains longer than ∼20 residues, where the likelihood of encountering a steric clash increases sharply, killing off nascent chains before they can elongate. Naively, one might think that the problem can be solved by randomly adjusting offending residues until the clash is relieved, but this tactic biases the overall outcome. In fact, the only unbiased tactic is to rebuild the chain from scratch, resulting almost invariably in other clashes at new sites for chain lengths of interest. Such problems have thwarted attempts to analyze the unfolded population via simulation and modeling. 4.1.1 The Flory Isolated-pair Hypothesis Nearly all theoretical treatments of the unfolded state assume that local steric repulsion does not extend beyond nearest chain neighbors. This simplifying assumption was made explicit in Flory’s isolated-pair hypothesis ([11], p. 252), which posits that each , ψ pair is sterically independent of its neighbors. Recently, the isolated-pair hypothesis was tested by exhaustive enumeration of all sterically accessible conformations of short polyalanyl chains [43]. To count, , ψ space was subdivided into a uniform grid. Every grid square, called a mesostate, encloses a 60◦ × 60◦ range of , ψ values, with 36 mesostates in all. Each such mesostate was sampled extensively and at random to determine whether alanyl dipeptides with , ψ values in this range are sterically allowed. Only 14 mesostates are populated; the remaining 22 mesostates experience ubiquitous steric clashes throughout their entire range. Reconstruction of allowed , ψ space from mesostate sampling recreates the dipeptide map [7] and provides an acceptance ratio for each mesostate (Figure 4). The acceptance ratio, i , is the fraction of sterically allowed samples for mesostate i, ranging from 0 to 1. These s were then used to test the isolated-pair hypothesis. Specifically, short polypeptide chains of length n = 3. . .7 were tested by enumerating all possible strings over the 14 allowed mesostates and sampling them extensively. If the isolated-pair hypothesis holds, then , ψ angles in each mesostate are independent, and the fraction of sterically allowed conformers for each string is given by n the product of individual acceptance ratios, i . But, if there are steric clashes n i between nonnearest neighbors in the string, then string < i , invalidating the i hypothesis. From this analysis, the isolated-pair hypothesis was found to be valid in the upper left quadrant of , ψ space but invalid in all other allowed regions. This finding makes sense physically: upon adopting , ψ values from the upper left quadrant, the chain is extended, like a β-strand, and nonnearest neighbors are separated. However, with , ψ values from any of the remaining five allowed mesostates (see Figure 4), the chain is contracted, like a helix or turn, and nonnearest neighbors are juxtaposed, enhancing opportunities for steric interference.
13
14 Conformational Properties of Unfolded Proteins
Fig. 4 Testing the Flory isolated-pair hypothesis. , R space for a dipeptide was subdivided into 36 alphabetically labeled coarse-grain grid squares, called mesostates. Treating the atoms as hard spheres, a Ramachandran plot (shown in gray) was computed by generating 150 000 randomly chosen , R conformations
within each mesostate and testing for steric collisions [48]. Twenty-two mesostates have no allowed population; in each of these cases, every , R value results in a steric collision. In the 14 remaining mesostates, the fraction of sterically allowed samples, 0 < i ≤ 1, was determined, as shown.
The failure of the isolated-pair hypothesis for short peptides (n = 7) challenges the random coil model, possibly in a major way. Steric restrictions obtain in the folded and unfolded states alike. The failure of the hypothesis for contracted chains implies that such conformers will be reduced selectively in the unfolded state, resulting in a population that is more extended than random coil expectations. Structurally, this shift in the population will result in a more homogeneous ensemble of unfolded conformers, and thermodynamically, it will reduce the entropy loss accompanying folding. But is it significant? Studies of van Gunsteren et al. [44] and Sosnick and his colleagues [45] concur that the size of conformational space that can be accessed by unfolded molecules is restricted in peptides. However, Ohkubo and Brooks [46] argue that restrictions become rapidly insignificant as chain lengths grow beyond n ∼ = 7, with negligible consequences for the random coil model. In an inventive approach to the problem, Goldenberg simulated populations of protein-sized chains [47] by adapting a standard software package that generates three-dimensional models from NMR-derived distance constraints. He
4 Questions about the Random Coil Model
analyzed the resultant unfolded state population using several measures, including coil dimensions, and found them to be well described as random coils. A note of caution is in order, however, because a substantial fraction of the conformers generated by this method fall within sterically restricted regions of , ψ space [47]. 4.1.2 Structure vs. Energy Duality Often, the complex interplay between structure and energy has confounded simulations. Small changes in structure can give rise to large changes in energy, and conversely. From a structural point of view, two conformers are distinguishable when their , ψ angles differ. From a thermodynamic point of view, two conformers are indistinguishable when they can interconvert via a spontaneous fluctuation. This structure–energy duality has contributed confusion to the Levinthal paradox (Section 3.1.1) and many related size estimates of the unfolded population because a single energy basin can span multiple conformers. For example, most sterically accessible conformers of short polyalanyl chains in good solvent [43] are quite extended, as expected in the absence of stabilizing intramolecular interactions. The , ψ values for these conformers are densely distributed over a broad region in the upper left quadrant of the , ψ map, as shown in Figure 5. When energy differences among these structures are calculated using a simple soft-sphere potential, the population partitions largely into two distinct energy basins, one that includes βstrands and another that includes polyproline II helices [48]. All conformers within each basin can interconvert spontaneously at room temperature (i.e., Ai, j ≤ RT
Fig. 5 A single energy basin can span multiple conformers. Most sterically accessible conformers of short polyalanyl chains in good solvent are extended. Using a softsphere potential, the Boltzmann-weighted population for the alanyl dipeptide is pre-
dominantly in the upper left quadrant of , R space and partitions into two distinct energy basins, one that includes polyproline II helices (larger) and another that includes $strands (smaller) [48]. At 300 K, conformers with each basin interconvert spontaneously.
15
16 Conformational Properties of Unfolded Proteins
at 300 K, where Ai, j is the Helmholtz free energy difference between any two conformers, i and j, R is the universal gas constant and T the temperature in Kelvin). Thus, apparent structural diversity is reduced to two thermodynamically homogeneous populations.
4.1.3 The “Rediscovery” of Polyproline II Conformation More than three decades ago, Tiffany and Krimm proposed that disordered peptides are comprised of left-handed polyproline II (PII ) helical segments interspersed with bends [49, 50]. They were led to this prescient proposal by the similarity between the optical spectra of PII helices and nonprolyl homopolymers. Even earlier, Schellman and Schellman had already argued that the spectrum of unfolded proteins was unlikely to be that of a true random coil [51]. Following these early studies, the ensuing literature disclosed a noticeable similarity between the spectra of PII and unfolded proteins, but such suggestive hints failed to provoke widespread interest – until recently. See Shi et al. for a thorough review [52]. The designation “polyproline” can be misleading. The circular dichroism (CD) spectrum, characteristic of actual polyproline or collagen peptides, has a pronounced negative band near 200 nm and a positive band near 220 nm. However, similar spectra can be obtained from peptides that are neither “poly” [53] nor proline-containing [50]. The P II conformation is a left-handed helix with three residues per turn (, ψ ∼ = −75◦ , +145◦ ), resulting in three parallel columns spaced uniformly around the long axis of the helix. This helix has no intrasegment hydrogen bonds, and, in solution, significant fluctuations from the idealized structure are to be expected. The P II conformation is forced by sterics in a polyproline sequence, but it is adopted readily by proline-free sequences as well [58]. Only three repetitive backbone structures are sterically accessible in proteins: αhelix, β-strand and PII -helix [7]. In the folded population, α-helices and β-strands are abundant, whereas PII -helices are rare. More specifically, isolated residues with PII , ψ values are common in the non-α, non-β regions, accounting for approximately one-third of the remaining residues, but longer runs of consecutive residues with PII , ψ-values are infrequent [55]. This finding can be rationalized by the fact that PII -helices cannot participate in hydrogen bonds in globular proteins. Hydrogen bonds are eliminated because the spatial orientation of backbone donors and acceptors is incompatible with both intrasegment hydrogen bonding within PII -helices and regular extra-segment hydrogen bonding between PII helices and the three repetitive backbone structures. Upon folding, those backbone polar groups deprived of hydrogen-bonded solvent access can make compensatory hydrogen bonds in α-helices and strands of β-sheet, but not in PII -helices. Recent work by Creamer and coworkers focused renewed attention on PII [24, 55, 56], raising the question of whether fluctuating PII conformation might contribute substantially to the unfolded population in proteins [50]. Studies performed during the past few years lend support to this idea, as described next.
4 Questions about the Random Coil Model
4.1.4 PII in Unfolded Peptides and Proteins The blocked peptide, N-acetylalanine-N -methylamide, is a popular backbone model. Many groups have found PII to be an energetically preferred conformation for this peptide in water [57–60, 62]. Does this finding hold for longer chains? Again using alanine as a model, Pappu and Rose analyzed the conformational preferences of longer blocked polyalanyl chains, N-acetyl-Alan -N -methylamide (n ≤ 7) [48]. To capture nonspecific solvent effects, they minimized chain:chain interactions, mimicking the chain’s expected behavior in good solvent. At physiological temperature, only three energy basins were needed to span ∼75% of the population, and within each basin, the population of structures was homogeneous. Notably, the basin corresponding to PII structure was dominant. Pappu and Rose [48] used soft-sphere repulsion (the repulsive term in a LennardJones potential) to calculate energy. More extensive testing using detailed force fields was performed by Sosnick and coworkers [45]. It is often assumed that the backbone is solvated uniformly in the unfolded state and that the energy of solvent stripping upon folding is not a significant consideration. This assumption follows directly from the random coil model, in which unfolded conformers are readily interconvertible. However, if unfolded state conformers exhibit conformational biases, it becomes important to question this assumption. Is solvation free energy conformation dependent? A series of papers by Avbelj and Baldwin [63–65] offered a fresh perspective on this issue, motivated by an inconsistency between the measured energy of peptide hydrogen bond formation [66] and the corresponding energy derived from a simple thermodynamic cycle [67]. Specifically, their analysis uncovered a large enthalpy deficit (−7.6 kcal mol−1 ) upon helix formation that could not be reconciled with data from typical model compounds, such as acetamide derivatives [65]. One or more terms had to be missing. Avbelj and Baldwin’s work prompted a re-examination of peptide solvation in proteins by a number of groups, including themselves [63–65]. Of particular interest are a series of unrelated simulations [26, 45, 63–65, 68–70], all of which reach a common conclusion: water interacts preferentially with PII peptides, imparting a previously unsuspected conformational bias. In sum, both peptide:solvent interactions and peptide:peptide interactions [48] favor PII conformers. In the former case, water is simply a better solvent for PII than for other conformers, e.g., β-strands and α-helices. In the latter case, PII affords the chain greater entropic freedom (i.e., more “wiggle room”). 4.2 Questions from Experiment
Early NMR studies provided evidence for residual structure in the denatured state of both proteins [71, 72] and peptides excised from proteins [73]. However, the structured regions seen in proteins were not extensive. Furthermore, most isolated peptides lacked structure, and the few exceptions did not always retain the conformation adopted in the native protein [74].
17
18 Conformational Properties of Unfolded Proteins
Peptide studies tell a similar story. A prime example involves the assessment of autonomous stability in the α-helix. Early evidence indicated that the cooperative unit for stable helix formation is ∼100 residues [75], a length that exceeds the average protein helix (∼12 residues) by almost an order of magnitude. Consequently, the prevailing view in the 1970s was that protein-sized helical peptides would be random coils in isolation. This view was reversed in the 1980s, after Bierzynski et al. [76], expanding upon earlier work by Brown and Klee [77], demonstrated helix formation in water at near-physiological temperature for residues 1–13 of ribonuclease, a cyanogen bromide cleavage product. This finding prompted a reevaluation of helix propensities in peptides [78, 79, 81, 81] and motivated numerous biophysical studies of peptides [82]. Summarizing this large body of work, there is evidence for structure in some short peptides in aqueous solvent at physiological temperature, but it is marginal at best and, more often, undetectable altogether. 4.2.1 Residual Structure in Denatured Proteins and Peptides The limited success of these early attempts to detect residual structure strengthened the conviction that denaturation abolishes structure and reinforced the notion that the unfolded state is a random coil. Consequently, the field was stunned when Shortle and Ackerman [83] demonstrated the persistence of native-like structure in staphylococcal nuclease under strongly denaturing conditions (8 M urea). Shortle and Ackerman’s finding was based on evidence from residual dipolar couplings in oriented gels. However, their interpretation that these data provide evidence of global organization was questioned recently by Annila and coworkers [88]. The ultimate conclusions from such work are still unclear, but the perspective has definitely changed and many recent experiments now find evidence for substantial residual structure in the denatured state (e.g., [31, 86, 86–89]). In a similar vein, Shi et al. reanalyzed a blocked peptide containing seven consecutive alanine residues for the presence of residual structure [52]. This peptide is too short to form a stable α-helix and should therefore be a random coil. Contrary to this expectation, the peptide is largely in PII conformation, in agreement with predictions from theory [48]. While not all residues are expected to favor the PII conformation [56], this result shows that the unfolded state is predominantly a single conformer, at least in the case of polyalanine. 4.3 The Reconciliation Problem
The measured radii of gyration, RG , of denatured proteins have values [39] that are consistent with those expected for a random coil with excluded volume in good solvent (Section 3.2.2). Yet, experimental evidence in both proteins [83] and peptides [52] suggests the presence of residual structure in the unfolded population. How are these seemingly contradictory findings to be reconciled? Millett et al. refer to this as the reconciliation problem ([39], see their discussion, p. 257). Paradox is often a prelude to perception. Equation (6), in its generality, necessarily neglects the chemical details of any particular polymer type. Accordingly, the
4 Questions about the Random Coil Model
resultant chain description is insensitive to short-range order, apart from the proportionality constant R0 , which is a function of the persistence length (Section 3.1). Sterically induced local order, encapsulated in R0 , is surely present in unfolded proteins [43, 91], but can it rationalize the apparent contradiction between random coil RG values and global residual structure [83]? One possible explanation is that multiple regions of local structure dominate the ensemble average to such an extent that they are interpreted as global organization [84, 92]. The coil library may provide a useful clue to the resolution of this puzzle. The coil library is the collection of all nonrepetitive elements in proteins of known structure, that fraction of native structure which remains after α-helix and β-sheet are removed. Given that the library is composed of fragments extracted from solved structures, it is surely not “coil” in the polymer sense. However, the term “coil library” is intended to convey the hypothesis that such fragments do, in fact, represent the full collection of accessible chain conformers in unfolded proteins [93]. Taken to its logical conclusion, this hypothesis posits that the coil library is a collection of structured fragments in folded proteins and, at the same time, a collection of unstructured fragments in unfolded proteins. If so, this library, together with α-helix, β-strand, and PII helix, represents an explicit enumeration of accessible conformers from which the unfolded ensemble might be reconstructed [64]. At this writing, the reconciliation problem remains an ongoing question. Regardless of the eventual outcome, this paradox appears to be moving the field in an informative direction. 4.4 Organization in the Unfolded State – the Entropic Conjecture
Are there general principles that lead to organization in the unfolded state? If accessible conformational space is vast and undifferentiated, the entropic cost of populating the native basin exclusively will be large. However, if the unfolded state is largely restricted to a few basins, with nonuniform, sequence-dependent basin preferences, then entropy can function as a chain organizer. Consider two thermodynamic basins, i and j. The Boltzmann-weighted ratio of their populations, ni /nj , is given by (wi /wj )e−βU , where ni and nj designate the number of conformers in i and j, wi and wj are the degeneracies of state, β is the Boltzmann factor, and U is the energy difference between the two basins. Both entropy and enthalpy contribute to this ratio. If wi and wj are conformational biases (i.e., the number of isoenergetic ways the chain can adopt conformations i and j), and wi /wj is the dominant term in the Boltzmann ratio, the entropy difference, Sconf = R ln(wi /wj ), would promote organization in the unfolded population. In particular, if PII is a dominant conformation in polyalanyl peptides, then it is also likely to be favored in unfolded proteins, in which case the unfolded state is not as heterogeneous as previously believed. The usual estimate of about five accessible states per residue in an unfolded protein is based on a familiar argument: the free energy difference between the folded and unfolded populations, Gconf , is a small difference between large value of Hconf and TSconf [13, 14]. If Gconf ∼ = −10 kcal
19
20 Conformational Properties of Unfolded Proteins
mol−1 (a typical value) and Hconf = 100 kcal mol−1 , then the counterbalancing TSconf is also ∼100 kcal mol−1 . Then Sconf ∼ = 3.33 entropy units per residue for a 100-residue protein at 300 K. Assuming Sconf = R ln W, the number of states per residue, W, is 5.34. However, instead of a reduction in the number of distinct states, this entropy loss on folding could result from a reduction in the degeneracy of a single state, providing the , ψ space of occupied regions in the unfolded population is further constricted upon folding. For example, a residue in PII is within a room-temperature fluctuation of any sterically allowed , ψ value in the upper left quadrant of the dipeptide map ([70]). Consequently, different , ψ values from these regions would be thermodynamically indistinguishable and therefore not distinct states at all. As a back-of-the-envelope approximation, consider a residue that can visit any allowed region of the upper left quadrant in the unfolded state. Upon folding, let this residue be constrained to lie within ±30◦ of ideal β-sheet , ψ values. The reduction in , ψ space would be a factor of 5.58, approximating the value attributed to distinct states. Similar, but less approximate, estimates can be obtained when the unfolded populations are Boltzmann weighted. 4.4.1 Steric Restrictions beyond the Dipeptide It has long been believed that local steric restrictions do not extend beyond the dipeptide boundary [7], but, on re-analysis, this conviction requires revision [41, 43, 96] (see Section 4.1.1). In fact, systematic steric restrictions operate over chain regions of several adjacent residues, and they serve to promote organization in unfolded protein chains. Two recent lines of investigation focus on identifying the physical basis for longer range, sterically induced ordering. In a series of remarkable papers, Banavar, Maritan and their colleagues show that chain thickness alone imposes stringent, previously unrecognized restrictions on conformational space [97–99]. All the familiar secondary structure motifs emerge automatically when the protein is represented as a self-avoiding tube, coaxial with the main chain, and a single inequality is imposed on all triples of Cα atoms [97, 99]. The further addition of simple hydrogen bond and hydrophobic terms is sufficient to generate the common super-secondary structures [98]. These straightforward geometric considerations demonstrate that sequence-independent steric constraints predispose proteins toward their native repertoire of secondary and super-secondary structural motifs. Investigating the atomic basis for longer range steric restrictions, Fitzkee and Rose found that a direct transition from an α-helix to a β-strand causes an unavoidable steric collision between backbone atoms [91]. Specifically, a nonnearest neighbor collision occurs between the carbonyl oxygens of an α-residue at position i (Oα i ) and a β-residue at position i + 3 (Oβ i+3 ). This restriction also holds for the transition from α-helix to PII . These simple steric constraints have pervasive organizational consequences for unfolded proteins because they eliminate all structural hybrids of the form . . . αααβ . . . and . . . αααPII . . . , pushing the unfolded population toward pure segments of α,β, and PII interconnected by irregular regions such as those found in the coil library.
5 Acknowledgments 21
5 Future Directions
The early analysis of steric restrictions in the alanyl dipeptide (more precisely, the compound Cα-CO-NH–CαHR–CO-NH-Cα, which has two degrees of backbone freedom like a dipeptide) by Ramachandran et al. [100] has become one of those rare times in biochemistry where theory is deemed sufficient to validate experiment [101]. The fact that the dipeptide map is based only on “hard sphere” repulsion alone led some to underestimate the generality of this work, but not Richards, who commented [102]: For chemically bonded atoms the distribution is not spherically symmetric nor are the properties of such atoms isotropic. In spite of all this, the use of the hard sphere model has a venerable history and an enviable record in explaining a variety of different observable properties. As applied specifically to proteins, the work of G. N. Ramachandran and his colleagues has provided much of our present thinking about permissible peptide chain conformations.
The notion that repulsive interactions promote macromolecular organization is not limited to the alanyl dipeptide. Space-filling models [103], which represent each atom literally as a hard sphere, were central to Pauling’s successful model of the α-helix [104] and have widespread application throughout chemistry. Much of the theory of liquids is based on the organizing influence of repulsion interactions [105]. Despite such successes, the existence of sterically induced chain organization has had little influence on models of the unfolded state owing to the strongly held conviction that local steric restrictions extend no further than adjacent chain neighbors. Of course, long-range excluded volume effects do affect the population [8, 14], as reflected in the exponent of Eq. (6), but they are not thought to play any role in biasing unfolded proteins toward specific conformations. Given the finding of local steric restrictions beyond the dipeptide (Section 4.4.1), it is time to reanalyze the problem. Re-analysis will involve at least three steps: (1) analysis of local steric restrictions beyond the dipeptide, (2) characterization of elements in the coil library, and (3) combination of the results from these two steps. To the extent that useful insights emerge from this prescription, the folding problem may not be as intractable as previously thought.
Acknowledgments
We thank Buzz Baldwin, Nicholas Fitzkee, Haipeng Gong, Nicholas Panasik, Kevin Plaxco, and Timothy Street for stimulating discussion and The Mathers Foundation for support.
22 Conformational Properties of Unfolded Proteins
References 1 GINSBURG, A. and CARROLL, W. R. 1965. Some specific ion effects on the conformation and thermal stability of ribonuclease. Biochemistry 4, 2159–2174. 2 TANFORD, C., PAIN, R. H., and OTCHIN, N. S. 1966. Equilibrium and kinetics of the unfolding of lysozyme (muramidase) by guanidine hydrochloride. J. Mol. Biol. 15, 489–504. 3 EDSALL, J. T. 1995. Hsien Wu and the first theory of protein denaturation (1931). In Advances in Protein Chemistry (eds D. S. EISENBERG, and F. M. RICHARDS), pp. 1–26. Academic Press, San Diego. 4 WU, H. 1931. Studies on denaturation of proteins. XIII. A theory of denaturation. Chin. J. Physiol. V, 321–344. 5 MIRSKY, A. E. and PAULING, L. 1936. On the structure of native, denatured, and coagulated proteins. Proc. Natl Acad. Sci. USA 22, 439–447. 6 SIMPSON, R. B. and KAUZMANN, W. 1953. The kinetics of protein denaturation. J. Am. Chem. Soc. 75, 5139–5192. 7 RAMACHANDRAN, G. N. and SASISEKHARAN, V. 1968. Conformation of polypeptides and proteins. Adv. Protein Chem. 23, 283–438. 8 FLORY, P. J. 1953. Principles of Polymer Chemistry. Cornell University Press, New York. 9 TANFORD, C. 1968. Protein denaturation. Adv. Protein Chem. 23, 121–282. 10 KAUZMANN, W. 1959. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1–63. 11 FLORY, P. J. 1969. Statistical Mechanics of Chain Molecules. Wiley, New York. 12 CANTOR, C. R. and SCHIMMEL, P. R. 1980. Biophysical Chemistry. Part III: The Behavior of Biological Macro-molecules. Freeman, New York. 13 TANFORD, C. 1970. Protein denaturation. Part C. Theoretical models for the mechanism of denaturation. Adv. Protein Chem. 24, 1–95. 14 CHAN, H. S. and DILL, K. A. 1991. Polymer principles in protein structure and stability. Annu. Rev. Biophys. Chem. 20, 447–490.
15 DILL, K. A. and SHORTLE, D. 1991. Denatured states of proteins. Annu. Rev. Biochem. 60, 795–825. 16 EINSTEIN, A. 1956. Investigations on the Theory of Brownian Movement. Dover Publications, New York. 17 LEVINTHAL, C. 1969. How to fold graciously. Mossbauer spectroscopy in Biological Systems, Proceedings. University of Illinois Bull. 41, 22–24. 18 ANSARI, A., BERENDZEN, J., BOWNE, S. F., FRAUENFELDER, H., IBEN, I. E., SAUKE, T. B., SHYAMSUNDER, E., and YOUNG, R. D. 1985. Protein states and proteinquakes. Proc. Natl Acad. Sci. USA 82, 5000–5004. 19 BRYNGELSON, J. D., ONUCHIC, J. N., SOCCI, N. D., and WOLYNES, P. G. 1995. Funnels, pathways and the energy landscape of protein folding-a synthesis. Proteins Struct. Funct. Genet. 21, 167–195. 20 DILL, K. A. and CHAN, H. S. 1997. From Levinthal to pathways to funnels. Nat. Struct. Biol. 4, 10–19. 21 TIANA, G., BROGLIA, R. A., and SHAKHNOVICH, E. I. 2000. Hiking in the energy landscape in sequence space: a bumpy road to good folders. Proteins 39, 244–251. 22 ITZHAKI, L. S., OTZEN, D. E., and FERSHT, A. R. 1995. The structure of the transition state for folding of chymotrypsin inhibitor 2 analysed by protein engineering methods: evidence for a nucleation-condensation mechanism for protein folding. J. Mol. Biol. 254, 260–288. 23 BALDWIN, R. L. and ROSE, G. D. 1999. Is protein folding hierarchic? II. Folding intermediates and transition states. Trends Biochem. Sci. 24, 77–83. 24 BALDWIN, R. L. and ROSE, G. D. 1999. Is protein folding hierarchic? I. Local structure and peptide folding. Trends Biochem. Sci. 24, 26–33. 25 CIEPLAK, M. and HOANG, T. X. 2003. Universality classes in folding times of proteins. Biophys. J. 84, 475–488. 26 DOYLE, R., SIMONS, K., QIAN, H., and BAKER, D. 1997. Proteins Struct. Funct. Genet. 29, 282–291.
References 27 MUNOZ, V., THOMPSON, P. A., HOFRICHTER, J., and EATON, W. A. 1997. Nature 390, 196–199. 28 SHASTRY, M. C. and UDGAONKAR, J. B. 1995. The folding mechanism of barstar: evidence for multiple pathways and multiple intermediates. J. Mol. Biol. 247, 1013–1027. 29 VIGUERA, A. R., SERRANO, L., and WILMANNS, M. 1996. Different folding transition states may result in the same native structure. Nat. Struct. Biol. 3, 874–880. 30 LAZARIDIS, T. and KARPLUS, M. 1997. Science 278, 1928–1931. 31 KAZMIRSKI, S. L., WONG, K. B., FREUND, S. M. V., TAN, Y. J., FERSHT, A. R., and DAGGETT, V. 2001. Protein folding from a highly disordered denatured state: the folding pathway of chymotrypsin inhibitor 2 at atomic resolution. Proc. Natl Acad. Sci. USA 98, 4349–4354. 32 SANCHEZ, I. E. and KIEFHABER, T. 2003. Origin of unusual -values in protein folding: evidence against specific nucleation sites. J. Mol. Biol. 334, 1077–1085. 33 SALI, A., SHAKHNOVICH, E., and KARPLUS, M. 1994. Kinetics of protein folding -a lattice model study of the requirements for folding to the native state. J. Mol. Biol. 5, 1614–1636. 34 SALI, A., SHAKHNOVICH, E. I., and KARPLUS, M. 1994. How does a protein fold? Nature 477, 248–251. 35 GO, N. 1984. The consistency principle in protein structure and pathways of folding. Adv. Biophys. 18, 149–164. 36 SCHELLMAN, J. A. 2002. Fifty years of solvent denaturation. Biophys. Chem. 96, 91–101. 37 AUNE, K. C., SALAHUDDIN, A., ZARLENGO, M. H., and TANFORD, C. 1967. J. Biol. Chem. 242, 4486–4489. 38 van HOLDE, K. E. 1971. Physical Biochemistry. Prentice-Hall, Englewood Cliffs, NJ. 39 MILLETT, I. S., DONIACH, S., and PLAXCO, K. W. 2002. Toward a taxonomy of the denatured state: small angle scattering studies of unfolded proteins. Adv. Protein Chem. 62, 241–262.
40 van HOLDE, K. E., JOHNSON, W. C., and HO, P. S. 1998. Physical Biochemistry. Prentice-Hall, Upper Saddle River, NJ. 41 BALDWIN, R. L. and ZIMM, B. H. 2000. Are denatured proteins ever random coils? Proc. Natl Acad. Sci. USA 97, 12391–12392. 42 PLAXCO, K. W. and GROSS, M. 2001. Unfolded, yes, but random? Never! Nat. Struct. Biol. 8, 659–660. 43 PAPPU, R. V., SRINIVASAN, R., and ROSE, G. D. 2000. The flory isolated-pair hypothesis is not valid for polypeptide chains: implications for protein folding. Proc. Natl Acad. Sci. USA 97, 12565–12570. 44 van GUNSTEREN, W. F., B¨URGI, R., PETER, C., and DAURA, X. 2001. The key to solving the protein-folding problem lies in an accurate description of the denatured state. Angew. Chem. Int. Ed. 40, 351–355. 45 ZAMAN, M. H., SHEN, M. Y., BERRY, R. S., FREED, K. F., and SOSNICK, T. R. 2003. Investigations into sequence and conformational dependence of backbone entropy, inter-basin dynamics and the Flory isolated-pair hypothesis for peptides. J. Mol. Biol. 331, 693–711. 46 OHKUBO, Y. Z. and BROOKS, C. L. 2003. Exploring Flory’s isolated-pair hypothesis: statistical mechanics of helix-coil transitions in polyalanine and C-peptide from RNase A. Proc. Natl Acad. Sci. USA 100, 13916–13921. 47 GOLDENBERG, D. P. 2003. Computational simulation of the statistidal properties of unfolded proteins. J. Mol. Biol. 326, 1615–1633. 48 PAPPU, R. V. and ROSE, G. D. 2002. A simple model for polyproline II structure in unfolded states of alanine-based peptides. Protein Sci. 11, 2437–2455. 49 TIFFANY, M. L. and KRIMM, S. 1968. Biopolymers 6, 1767–1770. 50 TIFFANY, M. L. and KRIMM, S. 1968. New chain conformations of poly(glutamic acid) and polylysine. Biopolymers 6, 1379–1382. 51 SCHELLMAN, J. A. and SCHELLMAN, C. G. 1964. In The Proteins: Composition, Structure and Function, 2nd edn, Vol 2, pp. 1–37. Academic Press, New York.
23
24 Conformational Properties of Unfolded Proteins
52 SHI, Z., WOODY, R. W., and KALLENBACH, N. R. 2002. Is polyproline II a major backbone conformation in unfolded proteins? Adv. Protein Chem. 62, 163–240. 53 MADISON, V. and SCHELLMAN, J. A. 1970. Diamide model for the optical activity of collagen and polyproline I and II. Biopolymers 9, 65–94. 54 CREAMER, T. P. 1998. Left-handed polyproline II helix formation is (very) locally driven. Proteins 33, 218–226. 55 STAPLEY, B. J. and CREAMER, T. P. 1999. A survery of left-handed polyproline II helices. Protein Sci. 8, 587–595. 56 RUCKER, A. L., PAGER, C. T., CAMPBELL, M. N., QUALLS, J. E., and CREAMER, T. P. 2003. Host-guest scale of left-handed polyproline II helix formation. Proteins 53, 68–75. 57 ANDERSON, A. G., and HERMANS, J. 1988. Microfolding: conformational probability map for the alanine dipeptide in water from molecular dynamics simulations. Proteins 3, 262–265. 58 DROZDOV, A. N., GROSSFIELD, A., and PAPPU, R. V. 2004. The role of solvent in determining conformational preferences of alanine dipeptide in water. J. Am. Chem. Soc. 126, 2574–2581. 59 GRANT, J. A., WILLIAMS, R. L., and SCHERAGA, H. A. 1990. Ab initio self-consistent field and potential-dependent partial equalization of orbital electronegativity calculations of hydration properties of N-acetyl-N -methyl-alanineamide. Biopolymers 30, 929–949. 60 HAN, W.-G., JALKANEN, K. J., ELSTNER, M., and SUHAI, S. 1998. Theoretical study of aqueous N-acetyl-L-alanine N -methylamide: structures and raman, VCD and ROA spectra. J. Phys. Chem. B 102, 2587–2602. 61 JALKANEN, K. J. and SUHAI, S. 1996. N-Acetyl-L-Alanine N -methylamide: a density functional analysis of the vibrational absorption and birational circular dichroism spectra. Chem. Phys. 208, 81–116. 62 POON, C.-D. and SAMULSKI, E. T. 2000. Do bridging water molecules dictate the structure of a model dipeptide in
63
64
65
66
67
68
69
70
71
72
aqueous solution? J. Am. Chem. Soc. 122, 5642–5643. AVBELJ, F. and BALDWIN, R. L. 2002. Role of backbone solvation in determining thermodynamic beta propensities of the amino acids. Proc. Natl Acad. Sci. USA 99, 1309–1313. AVBELJ, F. and BALDWIN, R. L. 2003. Role of backbone solvation and electrostatics in generating preferred peptide backbone conformations: Distributions of phi. Proc. Natl Acad. Sci. USA 100, 5742–5747. AVBELJ, F., LUO, P., and BALDWIN, R. L. 2000. Energetics of the interaction between water and the helical peptide group and its role in determining helix propensities. Proc. Natl Acad. Sci. USA 97, 10786–10791. SCHOLTZ, J. M., MARQUSEE, S., BALDWIN, R. L., YORK, E. J., STEWART, J. M., SANTORO, M., and BOLEN, D. W. 1991. Calorimetric determination of the enthalpy change for the alpha-helix to coil transition of an alanine peptide in water. Proc. Natl Acad. Sci. USA 88, 2854–2858. BALDWIN, R. L. 2003. In search of the energetic role of peptide hydrogen bonds. J. Biol. Chem. 278, 17581–17588. GARCIA, A. E. 2004. Characterization of non-alpha helical conformations in Ala peptides. Polymer 45, 669–676. KENTSIS, A., GINDIN, T., MEZEI, M., and OSMAN, R. 2004. Unfolded state of polyalanine is a segmented polyproline II helix. Proteins 55, 493–501. MEZEI, M., FLEMING, P. J., SRINIVASAN, R., and ROSE, G. D. 2004. Polyproline II helix is the preferred conformation for unfolded polyalaine in water. Proteins 55, 502–507. GARVEY, E. P., SWANK, J., and MATTHEWS, C. R. 1989. A hydrophobic cluster forms early in the folding of dihydrofolate reductase. Proteins Struct. Funct. Genet. 6, 259–266. NERI, D., BILLETER, M., WIDER, G., and WUTHRICHT, K. 1992. NMR determination of residual structure in a urea-denatured protein, the 434-repressor. Science 257, 1559–1563.
References 73 DYSON, H. J., SAYRE, J. R., MERUTKA, G., SHIN, H. C., LERNER, R. A., and WRIGHT, P. E. 1992. Folding of peptide fragments comprising the complete sequence of proteins. Models for initiation of protein folding II. Plastocyanin. J. Mol. Biol. 226, 819–835. 74 DYSON, H. J., RANCE, M., HOUGHTEN, R. A., LERNER, R. A., and WRIGHT, P. E. 1988. Folding of immunogenic peptide fragments of proteins in water solution. I. Sequence requirements for the formation of a reverse turn. J. Mol. Biol. 201, 161–200. 75 ZIMM, B. H. and BRAGG, J. K. 1959. Theory of the phase transition between helix and random coil in polypeptide chains. J. Chem. Phys. 31, 526–535. 76 BIERZYNSKI, A., KIM, P. S., and BALDWIN, R. L. 1982. A salt-bridge stabilizes the helix formed by isolated C-peptide of RNase A. Proc. Natl Acad. Sci. USA 79, 2470–2474. 77 BROWN, J. E. and KLEE, W. A. 1971. Helix-coil transition of the isolated amino terminus of ribonuclease. Biochemistry 10, 470–476. 78 LYU, P. C., LIFF, M. I., MARKY, L. A., and KALLENBACH, N. R. 1990. Side chain contributions to the stability of alpha-helical structure in peptides. Science 250, 669–673. 79 MERUTKA, G., LIPTON, W., SHALONGO, W., PARK, S. H., and STELLWAGEN, E. 1990. Effect of central-residue replacements on the helical stability of a monomeric peptide. Biochemistry 29, 7511–7515. 80 O’NEIL, K. T. and DEGRADO, W. F. 1990. A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids. Science 250, 646–651. 81 PADMANABHAN, S., MARQUSEE, S., RIDGEWAY, T., LAUE, T. M., and BALDWIN, R. L. 1990. Relative helix-forming tendencies of nonpolar amino acids. Nature 344, 268–270. 82 SCHOLTZ, J. M. and BALDWIN, R. L. 1992. The mechanism of alpha-helix formation by peptides. Annu. Rev. Biophys. Biomol. Struct. 21, 95–118. 83 SHORTLE, D. and ACKERMAN, M. 2001. Persistence of native-like topology in a
84
85
86
87
88
89
90
91
92
93
denatured protein in 8 M urea. Science 293, 487–489. LOUHIVUORI, M., PAAKKONEN, K., FREDRIKSSON, K., PERMI, P., LOUNILA, J., and ANNILA, A. 2003. On the origin of residual dipolar couplings from denatured proteins. J. Am. Chem. Soc. 125, 15647–15650. DAGGETT, V., LI, A., ITZHAKI, L. S., OTZEN, D. E., and FERSHT, A. R. 1996. Structure of the transition state for folding of a protein derived from experiment and simulation. J. Mol. Biol. 257, 430–440. GARCIA, P., SERRANO, L., DURAND, D., RICO, M., and BRUIX, M. 2001. NMR and SAXS characterization of the denatured state of the chemotactic protein CheY: implications for protein folding initiation. Protein Science 10, 1100–1112. SANCHEZ, I. E., and KIEFHABER, T. 2003. Origin of Unusual f-values in protein folding: evidence against specific nucleation sites. J. Mol. Biol. 334, 1077–1085. SRIDEVI, K., LAKSHMIKANTH, G. S., KRISHNAMOORTHY, G., and UDGAONKAR, J. B. 2004. Increasing stability reduces conformational heterogeneity in a protein folding intermediate ensemble. J. Mol. Biol. 337, 669–711. YI, Q., SCALLEY-KIM, M. L., ALM, E. J., and BAKER, D. 2000. NMR characterization of residual structure in the denatured state of protein L. J. Mol. Biol. 299, 1341–1351. SHI, Z., OLSON, C. A., ROSE, G. D., BALDWIN, R. L., and KALLENBACH, N. R. 2002. Polyproline II structure in a sequence of seven alanine residues. Proc. Natl Acad. Sci. USA 2002, 9190–9195. FITZKEE, N. C. and ROSE, G. D. 2004. Steric restrictions in protein folding: an α-helix cannot be followed by a contiguous β-strand. Protein Sci. 13, 633–639. ZAGROVIC, B., SNOW, C. KHALIQ, S., SHIRTS, M., and PANDE, V. 2002. Native-like mean structure in the unfolded ensemble of small proteins. J. Mol. Biol. 323, 153–164. SMITH, L. J., BOLIN, K. A., SCHWALBE, H., MACARTHUR, M. W., THORNTON, J. M., and DOBSON, C. M. 1996. Analysis of main chain torsion angles in proteins:
25
26 Conformational Properties of Unfolded Proteins
94
95
96
97
98
99
prediction of NMR coupling constants for native and random coil conformations. J. Mol. Biol. 255, 494–506. BRANDTS, J. F. 1964. The thermodynamics of protein denaturation. I. The denaturation of chymotrypsinogen. J. Am. Chem. Soc. 86, 4291–4301. BRANDTS, J. F. 1964. The thermodynamics of protein denaturation. II. A model of reversible denaturation and interpretations regarding the stability of chymotrypsinogen. J. Am. Chem. Soc. 86, 4302–4314. SRINIVASAN, R. and ROSE, G. D. 1999. A physical basis for protein secondary structure. Proc. Natl Acad. Sci. USA 96, 14258–14263. BANAVAR, J., MARITAN, A., MICHELETTI, C., and TROVATO, A. 2002. Geometry and physics of proteins. Proteins 47, 315–322. HOANG, T. X., TROVATO, A., SENO, F., BANAVAR, J., and MARITAN, A. 2004. What determines the native state folds of proteins? submitted. MARITAN, A., MICHELETTI, C., TROVATO, A., and BANAVAR, J. 2000. Optimal
100
101
102
103
104
105
shapes of compact strings. Nature 406, 287–290. RAMACHANDRAN, G. N., RAMAKRISHNAN, C., and SASISEKHARAN, V. 1963. Stereochemistry of polypeptide chain configurations. J. Mol. Biol. 7, 95–99. LASKOWSKI, R. A., MACARTHUR, M. W., MOSS, D. S., and THORNTON, J. M. 1993. PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291. RICHARDS, F. M. 1977. Areas, volumes, packing, and protein structure. Annu. Rev. Biophys. Bioeng. 6, 151–176. KOLTUN, W. L. 1965. Precision spacefilling atomic models. Biopolymers 3, 665–679. PAULING, L., COREY, R. B., and BRANSON, H. R. 1951. The structures of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc. Natl Acad. Sci. USA 37, 205–210. CHANDLER, D., WEEKS, J. D., and ANDERSEN, H. C. 1983. The van der Waals picture of liquids, solids and phase transformations. Science 220, 787–794.
1
Kinetic Mechanisms in Protein Folding Annett Bachmann, and Thomas Kiefhaber University of Basel, Basel, Switzerland
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The understanding of protein folding landscapes requires the characterization of local saddle points (transition states) and minima (intermediates) on the free energy surface between the ensemble of unfolded states and the native protein. This article discusses experimental approaches to analyze kinetic data for protein folding reactions, to identify folding intermediates and to assign their role in the folding process. The methods discussed mainly use concepts from classical reaction kinetics and from physical organic chemistry to analyze the dynamics of protein folding and unfolding. It could be argued that these methods may not be applicable to the protein folding process, which represents a conformational search on a complex multidimensional free energy surface. However, results from a variety of different studies suggest that efficient folding proceeds through partially folded intermediates and that protein folding in solution encounters major free energy barriers. In addition, protein folding transition states are usually structurally well-defined local maxima along different testable reaction coordinates [1]. This suggests that we can indeed derive information on protein folding barriers using the classical concepts.
2 Analysis of Protein Folding Reactions using Simple Kinetic Models
The aim of kinetic studies is trying to find the minimal model capable of describing the experimental data by ruling out as many alternative models as possible. It is thereby assumed that the mechanism involves a finite number of kinetic species that are separated by energy barriers significantly larger than the thermal energy Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Kinetic Mechanisms in Protein Folding
(>5kB T), but each kinetic species may consist of different conformations in rapid equilibrium. Thus, the transitions between large ensembles of states can be treated using classical reaction kinetics and the interconversion between two species, Xi and Xj , can be described with microscopic rate constants for the forward (kij ) and reverse (kji ) reactions kij
Xi X j
(1)
kji
2.1 General Treatment of Kinetic Data
Experimental studies on the mechanism of protein folding have focused mainly on monomeric single domain proteins as model systems. Folding of oligomeric proteins and multidomain proteins is discussed in Chapter 27 and Chapter 2 in Part 2,. Folding kinetics of monomeric proteins have the major advantage that all observed reactions are of first order. Consequently the time-dependent change in intensity of an observed signal (P) can be represented as a sum of n exponentials with observable rate constants (λi ) and corresponding amplitudes (Ai ) Pt −P∞ =
n
AI ·e−λi t
(2)
i=1
where Pt is the observed intensity at time t and P∞ is the intensity at t → ∞. The apparent rate constants, λi , are functions of the microscopic rate constants, kij , which depend on external parameters like temperature, pressure, and denaturant concentration. The amplitudes Ai additionally depend on the initial concentrations of the kinetic species (see Protocols, Section 6). Generally, any kinetic mechanism with n different species connected by first-order reactions results in n −1 observable rate constants. Applied to protein folding with fixed initial (U) and final (N) state the observation of n −1 apparent rate constants thus indicates the presence of n −2 transiently populated intermediate states. A more detailed treatment of the analysis of kinetic data in protein folding is given in Refs. [2] and [3] and general treatments of kinetic mechanisms are given in Refs. [4] and [5]. 2.2 Two-state Protein Folding
Experimental investigations have revealed simple two-state folding with single exponential folding and unfolding kinetics for more than 30 small proteins (Figure 1; for an overview see Ref. [6]): kf
UN kU
(3)
2 Analysis of Protein Folding Reactions using Simple Kinetic Models 3
Fig. 1 Free energy profile for a hypothetical two-state protein folding reaction. The ground states U and N are separated by a free energy barrier with the transition state as point along the reaction coordinate with
the highest free energy. The activation free ◦ energies for passing the barrier from Gf ‡ ◦‡ and Gf , respectively, are reflected in the microscopic rate constants kf or ku (see Eq. (8)).
where kf and ku are the microscopic rate constants for the folding and unfolding reactions, respectively. The single observable macroscopic rate constant (λ) for this mechanism is readily derived as λ=kf +kU
(4)
The equilibrium of the reaction is determined by the flows from the native state N to the unfolded state U and vice versa: kf ·[U]eq =kU ·[N]eq
(5)
and the equilibrium constant (K) for folding is k:=
[N]eq [U]eq
=
kf kU
(6)
A straightforward parameter that can be inferred from two-state behavior is the free energy for folding (G◦ ) which is connected by the van’t Hoff relation with the equilibrium constant (K) G o =−RT ·ln(K )=−RT ·ln
[N]eq [U]eq
=−RT ·ln
kf kU
(7)
Thus, protein stability can be measured in two different ways, either by equilibrium methods or by kinetic measurements. The agreement between the free energies for
4 Kinetic Mechanisms in Protein Folding
folding inferred from equilibrium and kinetic measurements is commonly used as a test for two-state folding (see Figure 2). The same formalism can be applied to obtain information on the free energy of ◦ activation (G t ) for a reaction using transition state theory k=k0 ·e−G
ot
/RT
(8)
where k0 is the pre-exponential factor and represents the maximum rate constant for a reaction in the absence of free energy barriers. Combination of Eqs. (6) and (8) allows us to gain information on the free energy changes along the reaction coordinate (see Figure 1). ◦‡
◦‡
G ◦ =G f −G U
(9)
In the original Eyring theory the pre-exponential factor corresponds to the frequency of a single bond vibration [7, 8]: k0 =κ
kB T ≈6·1012 s−1 at 25◦ C h
(10)
where κ represents the transmission coefficient, which is usually set to 1. Since protein folding involves formation and breakage of weak intramolecular interactions rather than changes in covalent bonds, the Eyring prefactor will not be useful for the analysis of protein folding reactions. The pre-exponential factor for protein folding reactions will be strongly influenced by intrachain diffusion processes but will also depend on the location of the transition state on the free energy landscape and on the nature of the rate-limiting step for a particular protein. It is likely around 107 –108 s−1 [9]. Experimental analysis of protein folding reactions usually makes use of the effect of chemical denaturants like urea and GdmCl on the rate and equilibrium constants. Empirically, a linear correlation between the denaturant concentration ([D]) and both the equilibrium free energy (G◦ ) [10, 11] and the activation free energies ◦‡ ◦‡ [12] for folding (G f ) and unfolding (G U ) is found (Figure 2). Accordingly, for two-state folding a plot of the logarithm of the single observable (apparent) rate constant, λ, vs. [D] yields a V-shaped curve, commonly called chevron plot [13, 14] (Figure 2C). This plot gives information on the refolding reaction at low denaturant concentrations (the refolding limb) and information on the unfolding reaction at high denaturant concentrations (the unfolding limb). Accordingly, proportionality constants, mx , are defined as meq =
∂G ◦ ∂[Denaturant]
(11)
◦‡
mf ,U =
∂G f ,U ∂[Denaturant]
(12)
2 Analysis of Protein Folding Reactions using Simple Kinetic Models 5
Fig. 2 Equilibrium and kinetic data for twostate tendamistat folding and unfolding at pH 2.0. A) GdmCl-induced equilibrium transition of wild-type tendamistat at pH 2.0 measured by far-UV circular dichroism. B) Plot of RT ln K eq for the data in the transition region result in a linear dependency on the denaturant concentration. The slope corresponds to −meq (see Eq. (11)). C) GdmCl dependence of the (see logarithm
of the single observable folding rate constant (8 = kf + ku ) gives a V-shaped profile commonly termed a chevron plot. This reveals a linear dependence between [GdmCl] and the ln(kf,u ) as indicated by the dashed lines. The values for K eq and meq derived from kinetic (panel C) and equilibrium data (panel B) are identical, which is a strong support for a two-state folding mechanism.
6 Kinetic Mechanisms in Protein Folding
Since the meq -values were shown to be proportional to the changes in accessible surface area (ASA) upon unfolding of the protein [15], the kinetic m-values are believed to reflect the changes in ASA between the unfolded state and the transition state (mf ) and the native state and the transition state (mu ). This can be used to characterize the solvent accessibility of a protein folding transition state. The m-values allow an extrapolation of the data to any given denaturant concentration using G ◦ (D)=G ◦ (H2 O)+meq ·[D]
◦‡
◦‡
◦‡
◦‡
(13)
G f (D)=G f (H2 O)+mf ·[D]
G U (D)=G U (H2 O)+mU ·[D]
(14)
(15)
where (H2 O) denotes the values in the absence of denaturant and (D) those at a given denaturant concentration [D]. Combining Eq. (9) with Eqs (13)–(15) shows that the comparison of the kinetic and equilibrium free energies and mvalues provides a tool to test for the validity of the two-state mechanism (see Figure 2): ◦‡
◦‡
G ◦ (H2 O)=G f (H2 O)−G U (H2 O)
(16)
meq =mf −mU
(17)
and
2.3 Complex Folding Kinetics
For simple two-state folding reactions the microscopic rate constants are readily obtained from the chevron plot (cf. Figure 2) and the properties of the transition barriers can be analyzed using rate equilibrium free energy relationships [1, 16– 19] (REFERs). However, folding of most proteins is more complex with several observable rate constants. These complex folding kinetics can have different origins. The transient population of partially folded intermediates, which frequently occurs under native-like solvent conditions (i.e., at low denaturant concentrations), will lead to the appearance of additional kinetic phases (see Section 2.3.2) and to a downward curvature in the chevron plot (Figure 3A) [13, 20–23]. Kinetic coupling between two-state folding and slow equilibration processes in the unfolded state
2 Analysis of Protein Folding Reactions using Simple Kinetic Models 7
Fig. 3 Different origins for curvatures in chevron plots: A) Population of a transient intermediate during folding of hen egg white lysozyme. At low denaturant concentrations two rate constants are observed in refolding (82 and 81 ; open triangles and open circles). For unfolding of native lysozyme a single kinetic phase is observed (filled circles). 82 at higher denaturant concentrations (filled triangles) corresponds to unfolding of the intermediate and can only be detected when the intermediate is transiently populated by a short refolding pulse and then unfolded at high denaturant
concentrations in sequential mixing experiments [52] (interrupted refolding experiments). B) Chevron plot for ribonuclease T1 W59Y. The curvature in the refolding limb at low denaturant concentrations is caused by kinetic coupling between prolyl isomerization and protein folding [24, 25]. Only the slowest refolding phase is shown. The fraction of faster folding molecules with all prolyl peptide bonds in the native conformation could not be detected since the experiments were carried out by manual mixing. Data for lysozyme were taken from Ref. [52] and data for RNase T1 were taken from Ref. [25].
provides another source for multiphasic folding kinetics [2, 24, 25] (Section 2.3.1). Since these possible origins for complex folding kinetics have far-reaching effects on the molecular interpretation of the kinetic data it is crucial to discriminate between them and to elucidate the origin of complex kinetics. With the knowledge of the correct folding mechanism the microscopic rate constants (kij ) connecting the individual states can be determined.
8 Kinetic Mechanisms in Protein Folding
2.3.1 Heterogeneity in the Unfolded State The unfolded state of a protein consists of a large ensemble of different conformations which are in rapid equilibrium, and can thus be treated as a single kinetic species as long as their interconversion is faster than the kinetic reactions leading to the native state. Studies on the time scale of intrachain diffusion in unfolded polypeptide chains in water showed that these processes occur on the 10–100 ns time scale [9, 26], which is significantly faster than protein folding reactions [27, 28]. On the other hand, slow interconversion reactions between the different unfolded conformations lead to kinetic heterogeneity. This was first observed by Garel and Baldwin [29] who showed that both fast and slow refolding molecules exist in unfolded ribonuclease A (RNase A). In 1975 Brandts and coworkers [30] showed with their classical double jump experiments that slow and fast folding forms of RNase A are caused by a slow equilibration process in the unfolded state and proposed cis-trans isomerization at Xaa-Pro peptide bonds as the molecular origin. This was confirmed later with proline mutations [31] and catalysis of the slow reactions by peptidyl-prolyl cis-trans isomerases [32]. The fast folding molecules have all Xaa-Pro peptide bonds in their native isomerization states, whereas slow-folding molecules contain nonnative prolyl isomers, which have to isomerize to their native isomerization state during folding. This isomerization process is slow with time constants (τ = 1/λ) around 10–60 s at 25 ◦ C, which usually limits the folding reaction of the unfolded molecules with nonnative prolyl isomers. Since the equilibrium population of the cis isomer in Xaa-Pro peptide bonds is between 7 and 36%, depending on the Xaa position, [33] all proteins with prolyl residues should show a significant population of slow folding molecules. Prolyl isomerization has indeed been observed in folding of many proteins [34]. Whenever folding and proline isomerization have similar rate constants or when folding is slower than isomerization, a pronounced curvature is observed in the refolding limb of the chevron plot, which looks similar to the effect of transiently populated intermediates (Figure 3B). The effect on chevron plots is treated quantitatively in Refs. [2, 35, 36]. In addition to prolyl isomerization also nonprolyl peptide bond isomerization has recently been shown to cause slow parallel folding pathways [37, 38]. Cis-trans isomerization of nonprolyl peptide bonds has long been speculated to cause slow steps in protein folding, since it is an intrinsically slow process with a high activation energy [30, 33]. The large number of peptide bonds in a protein leads to a significant fraction of unfolded molecules with at least one nonnative cis isomer, although the cis isomer is only populated to about 0.15% in equilibrium in the unfolded state [39]. Studies on a slow folding reaction in a prolyl-free tendamistat variant revealed that folding of about 5% of unfolded molecules is limited by the cis-trans isomerization of nonprolyl peptide bonds [38] (Figure 4). The slow equilibrium between cis and trans nonprolyl peptide bonds in the unfolded state was revealed by double jump experiments [38]. In these experiments (Figure 4C) the protein is rapidly unfolded at high denaturant concentrations often in combination with low pH. After various times, unfolding is stopped by dilution to refolding conditions. The resulting folding kinetics are monitored by spectroscopic probes. If a slow isomerization reaction in the unfolded state occurs, the slow folding molecules will be formed slowly after
2 Analysis of Protein Folding Reactions using Simple Kinetic Models 9
Fig. 4 Complex folding kinetics caused by nonprolyl cis-trans isomerization in the unfolded state of a proline-free variant of tendamistat. GdmCl dependence of the apparent rate constants, 81 (open circles), 82 (filled circles) for folding and unfolding (A) and the respective refolding amplitudes (A1 (open circles), A2 (filled circles) are shown. The lines in panels A and B represent the global fit of both apparent rate constants and amplitudes to the analytical solution of a linear three-state model. The fit reveals a GdmCl-independent isomerization reaction and a GdmCl-dependent folding reaction (see Ref. [38]). C) Double jump experiments to detect the slow isomerization process in the unfolded state. Appearance of the fast-refolding (open cir-
cles) and the slow-refolding (filled circles) molecules during unfolding. Native protein was unfolded at high denaturant concentrations for the indicated time. Under the applied conditions unfolding was complete after 150 ms. After various unfolding times refolding was initiated by a second mixing step and the amplitudes of the two refolding reactions (open circles, filled circles) in the presence of 0.8 M GdmCl (cf. panel A) were measured. The results show that the slow-refolding phase (filled circles) is formed slowly in the unfolded state whereas the fast-refolding phase is formed with the same time constant as that with which unfolding of tendamistat occurs. Data were taken from Ref. [38].
10 Kinetic Mechanisms in Protein Folding
unfolding. Thus, the amplitude of the slow reaction will increase with unfolding time as observed for the proline free tendamistat variant shown in Figure 4C. Figure 4 further reveals that nonprolyl isomerization has a rate constant around 1 s−1 , which is significantly faster than prolyl peptide bond isomerization. This is in accordance with rate constants for nonprolyl isomerization measured in model peptides [39]. The results on the effect of nonprolyl isomerization on protein folding imply that isomerization at non prolyl peptide bonds will dramatically effect folding of large proteins. For a protein with 500 amino acid residues more than 50% of the unfolded molecules have at least one nonnative peptide bond isomer. Since the isomerization rate constant is around 1 s−1 , it will mainly affect the early stages of folding and folding of fast folding proteins [38]. Another cause of kinetic heterogeneity in the unfolded state was identified in cytochrome c, where parallel pathways were shown to be due to the exchange of the heme ligands in the unfolded state [40–42]. 2.3.2 Folding through Intermediates As discussed above, the number of kinetic species, n, is related to the number of observable rate constants, n − 1, and thus, with unfolded and native state as the initial and final states, the number of intermediates is given by n − 2. Thus, for a single observable rate constant there are no intermediates, corresponding to two-state folding. The observation of two apparent rate constants indicates a single intermediate, etc. It should be noted that these considerations only apply if there is a kinetically homogeneous unfolded state with rapidly interconverting conformations. In the case of heterogeneous populations of unfolded molecules separated by slow interconversion reactions like prolyl isomerization, the number of observable rate constants is additionally correlated with the number of unfolded species. In the following we will describe experiments that allow the elucidation of mechanisms of fast folding molecules (i.e., of molecules with all Xaa-Pro bonds in the native isomerization state). The first step in elucidating a folding mechanism is the determination of the number of exponentials, i.e., the number of observable rate constants, λ, needed to describe the kinetics. In principle, the number of observable rate constants should not depend on the probe used to monitor the folding reaction, as observed for lysozyme refolding shown in Figure 5, which shows the same two observable rate constants measured by a number of different probes. However, some reactions may not cause a signal change in a specific probe, if, for example, an intermediate shows the same spectral properties as the native or unfolded state. Thus, kinetics should be monitored using different probes. In addition, it is crucial to test for burst-phase reactions (i.e., for processes occurring within the experimental deadtime) (Figure 5). These reactions are observed for many proteins during refolding at low denaturant concentrations and are usually caused by a considerable compaction of the polypeptide chain [43, 44] (Figure 5E) and concomitant formation of significant amounts of secondary structure [45] (see Chapter 23). Whether rapid collapse represents a distinct step on a folding pathway or whether it is the response of the unfolded state to the change in solvent conditions upon refolding is currently
2 Analysis of Protein Folding Reactions using Simple Kinetic Models 11
Fig. 5 Lysozyme refolding observed with different probes: A) Intrinsic tryptophan fluorescence above 320 nm after excitation at 280 nm; B) ANS fluorescence; C) far-UV circular dichroism at 225 nm; D) near-UV circular dichroism at 289 nm; E) changes in radius of gyration measured by smallangle X-ray scattering (SAXS). The lines of constant signal represent the signal of the unfolded state. The traces are recorded at 23 ◦ C, pH 5.2. Data in panels A-D were recorded by stopped-flow mixing. SAXS
data were recorded by continuous-flow and stopped-flow mixing. All data can be fitted globally with 82 = 34 s−1 ; 8−1 = 4.3 s−1 demonstrating that all probes detect the same two kinetic phases. In addition, major signal changes in the dead time of mixing are detected by all probes except near-UV CD, indicating a fast reaction that cannot be resolved by the experiments (burst phase with 8 > 2000 s−1 ). Data were taken from Refs. [43] and [44].
12 Kinetic Mechanisms in Protein Folding
under discussion. Results on the folding of α-lactalbumin [46–48], apo-myoglobin [49], and lysozyme [44] suggest that burst-phase intermediates unfold cooperatively, indicating that they are separated by a barrier from the ensemble of random coil conformations, which are populated at high denaturant concentrations. To test for aggregation side reactions during refolding, which are occasionally also observed for small proteins, [50], the concentration dependence of the folding reaction should be tested. The next step in determining a folding mechanism is to locate the intermediates on the folding pathway. If a single folding intermediate is observed two apparent rate constants will be observed for all possible three-state mechanisms shown in Scheme 1. The triangular mechanism shown in panel A represents the general three-state model whereas mechanisms B (on-pathway intermediate) and C (off-pathway intermediate) represent special cases of mechanism A with one of the three equilibria between the individual states being too slow to influence the kinetics. Thus, if only mechanisms B or C are considered for a particular protein, the assumption is made that one of the equilibration processes between the three states is significantly slower than the other two. A consequence from the mathematical analysis of the mechanisms shown in Scheme 1 is that the two observable rate constants for the triangular mechanism cannot be simply related to individual microscopic rate constants but are functions of several microscopic rate constants (see Protocols, Section 6). Thus it is essential to elucidate the correct folding mechanism and to determine all microscopic rate constants in order to draw conclusions on the individual transition states between the kinetic species. Since all mechanisms shown in Scheme 1 give rise to two experimentally observable rate constants, it is impossible to exclude the triangular mechanism on the basis of direct spectroscopic measurements of the folding kinetics. However, in the triangular mechanism the native molecules are produced in both the faster and
Scheme 1
2 Analysis of Protein Folding Reactions using Simple Kinetic Models 13
the slower kinetic phase whereas in the on-pathway mechanism the native state is produced with a lag phase. Thus, discrimination between these mechanisms requires to specifically monitor the time course of the native state. Experiments, which selectively measure the population of the individual kinetic species during folding, have been introduced by Schmid [51] and were initially used to detect slow and fast folding pathways during RNase A folding. They are commonly termed interrupted refolding experiments. These experiments make use of the fact that each state of the protein has its characteristic stability and unfolding rate constant. Interrupted refolding experiments consist of two consecutive mixing steps (Figure 6). In a first step refolding is initiated from completely unfolded protein. The folding reaction is allowed to proceed for a certain time (ti ) after which the solution is transferred to a high concentration of denaturant. This results in the unfolding of all native molecules and of partially folded states which have accumulated during the refolding step. Each state (N or I) is identified by its specific unfolding rate constant at high concentrations of denaturant, where unfolding is virtually irreversible, so that little kinetic coupling occurs. The amplitudes of the observed unfolding reactions reflect the amounts of the respective species present at time (ti ) when refolding was interrupted. Thus, varying the time (ti ) allowed for refolding gives the time course of the populations of native protein and of the intermediate during the folding process. With the use of highly reproducible sequential mixing stopped-flow instruments these experiments also became feasible on a faster time scale and allow the discrimination between sequential and triangular folding mechanism on fast folding pathways [22, 52].
Fig. 6 Principle of interrupted refolding experiments to measure the time course of the population of a folding intermediate (I) and of native proteins (N) starting from completely unfolded protein (U). The experiments are described in detail in the text.
14 Kinetic Mechanisms in Protein Folding
The discrimination between the off-pathway mechanism and the triangular mechanism is usually difficult, since also in the off-pathway model some molecules may fold directly from U to N, depending on the ratio of the rate constants kUN and kUI [52]. The discrimination between these mechanisms usually requires analysis of the denaturant dependence of all folding and unfolding rate constants in combination with the time course of the different species [52]. The data are then fitted to the analytical solutions of the differential equations describing the different models. For folding reactions of monomeric proteins all reactions are of first order and the analytical solutions of the differential equations can be obtained for mechanisms with four species or less. These solutions are derived in many kinetic textbooks and rate constants and amplitudes for three-state and four-state mechanisms are given in Refs. [24] and [52]. The solutions for the three-state models in Scheme 1 are given in the Protocols (Section 6.1–6.3). A particularly useful source for the analytical solution of a vast number of different kinetic mechanisms is the article by Szabo [4]. For more complex mechanisms numerical methods have to be applied to analyze the data. As an example for the discrimination between the different kinetic models shown in Scheme 1, the elucidation of the folding mechanism of lysozyme will be discussed in detail in Section 3. 2.3.3 Rapid Pre-equilibria The analysis for mechanisms B and C in Scheme 1 simplifies when equilibration between U and I occurs on a much faster time scale than folding to the native state [53, 54]. In this case, formation of the intermediate can be treated as a rapid pre-equilibrium. The apparent rate constants for mechanism B (on-pathway intermediate) can thus be approximated by
λ1 =kUI +kIU ,λ2 =
1 kUI kIN +kNI with K UI = 1+1/K UI kIU
(18)
and for mechanism C (off-pathway intermediate) as λ1 =kUI +kIU ,λ2 =(1−
1 )kUN +kNU 1+1/K UI
(19)
where 1/(1 + (1/K UI )) represents the fraction of intermediate (f (I)) in the preequilibrium. Since I is productive in mechanism B and nonproductive in mechanism C, the rate of formation of N depends on f (I) and 1 −f (I), respectively. Due to the commonly observed strong denaturant dependence of the microscopic rate constants, the simplifications made above might not be valid at some denaturant concentrations. Therefore, the simplified treatment of the data should only be performed if formation and unfolding of the intermediate is significantly faster than the kIN and kNI under all experimental conditions. Otherwise, the exact solutions of the three state model must be used to fit the data [52] (see Protocols, Section 6). Under conditions where the simplifications given in Eqs. (18) and (19) hold and the triangular mechanism has been ruled out, these equations can provide a simple
2 Analysis of Protein Folding Reactions using Simple Kinetic Models 15
tool for identifying off-pathway intermediates. Destabilization of the intermediate leads to a larger fraction of productive (unfolded) molecules in the off-pathway model. Thus, addition of denaturant will speed up folding, when the resulting destabilization of the intermediate exceeds the deceleration of the U → N reaction. As a consequence, an increase in the folding rate with increasing concentrations of denaturant points to the transient accumulation of an off-pathway intermediate. This behavior has been observed for intermediates trapped by nonnative disulfide bonds, [55] by nonnative proline isomers [56] and by nonspecific aggregation [50]. 2.3.4 Folding through an On-pathway High-energy Intermediate Many apparent two-state folders were shown to fold through on-pathway highenergy intermediates, which cannot be detected directly with spectroscopic methods, since they are less stable than N and U [57, 58] (Figure 7A). Their existence can be inferred from kinks in chevron plots, which are not accompanied by additional kinetic phases nor by burst-phase reactions [1, 18, 19, 57–61]. Figure 7B shows such a kink in the chevron plot for a tendamistat variant [57]. The changing slope in the unfolding limb of the chevron plot indicates the existence of two consecutive transitions states, TS1 and TS2, and a “hidden” high-energy intermediate (I*) and thus provides an effective way to identify on-pathway intermediates [1, 18, 19, 57– 61]. The detection of a high-energy on-pathway intermediate results in a linear three-state model: kUI
kIN
kIU
kNI
U I∗ N
Fig. 7 Transition barrier for apparent twostate folding through a high-energy intermediate and the consequences on the shape of the chevron plot. A) The reaction coordinate of tendamistat comprises two discrete transition states and a high-energy intermediate. B) A single folding rate constant is observed which shows a kink in the denaturant dependence of the chevron-plot which
(20)
indicate the switch between the two alternative transition states [57]. The black line represents a fit of the data to a three-state model as described in the Protocols (Eqs (??)–(42)). The lines labeled TS1 and TS2 correspond to constructed chevron plots for the two transition states. Data were taken from Ref. [57].
16 Kinetic Mechanisms in Protein Folding
The kinetics at a single denaturant concentration cannot be distinguished from two-state folding (Eq. (??)). However, the presence of a kink in the chevron plot and the resulting altered slope at high denaturant concentrations allows a characterization of both transition states [57] (see Protocols, Section 6.4). Since I* does not become populated at any conditions, its stability can not be determined. However, the difference in free energy between both transition states (G ◦TS2/TS1 ) and the difference in their ASA (mTS2 –mTS1 ) can be obtained by fitting the GdmCl dependence of ln λ to the analytical solutions of the three-state model shown in Eq. (20) (see Protocols, Section 6.4). Figure 7B shows the results of a three-state fit for the kinked chevron and the constructed chevron plots for the individual transition states for the tendamistat variant.
3 A Case Study: the Mechanism of Lysozyme Folding
In the following, we will use lysozyme as a case study to discuss how sequential mixing experiments in combination with the denaturant dependence of the apparent rate constants allow a quantitative analysis of complex folding kinetics and the determination of all microscopic rate constants. 3.1 Lysozyme Folding at pH 5.2 and Low Salt Concentrations
Lysozyme consists of two structural subdomains, the α-domain with exclusively α-helical structure and the β-domain with predominantly β-structure [62, 63]. Starting from GdmCl-unfolded disulfide intact protein, large changes in far-UV CD and fluorescence signals are observed within the first millisecond of refolding [43, 44, 64, 65] (Figure 5A-D). Time-resolved small angle X-ray scattering experiments show that this burst-phase reaction leads to a globular state, with a significantly smaller radius of gyration (RG ) than the unfolded protein [43, 44] (Figure 5E). In a subsequent reaction, with a time constant of 30 ms (at pH 5.2 and 20 ◦ C), an intermediate state (IN ) is formed, as observed by strong quenching in tryptophan fluorescence to a level below that of the native state [43, 44, 66], by an increase in the far-UV CD signal [64, 65], by small changes in near-UV CD [44] and by a further compaction of the polypeptide chain [43, 44]. Pulsed hydrogen/ deuterium exchange experiments showed that this intermediate has native-like helical structure in the α-domain whereas the β-sheet structures are not formed [67]. The intermediate converts to the native state with a relaxation time of about 400 ms. Double-jump experiments showed that the slow folding reaction of lysozyme is not caused by slow equilibration processes coupled to the folding reaction [52] (Figure 8A). There had been controversial reports as to the role of IN in the folding process and it remained unclear whether it represents an obligatory intermediate on the folding pathway for all lysozyme molecules [66, 67]. In addition, a reaction on the 10 ms time scale was observed in pulsed hydrogen exchange
3 A Case Study: the Mechanism of Lysozyme Folding 17
Fig. 8 Double jump (A) and interrupted refolding experiments (B, C) for lysozyme folding at pH 5.2. Double jump experiments show that amplitude of the slow folding reaction (A1 , open circles) is formed with the same time constant as that at which unfolding of native lysozyme occurs. For comparison the unfolding kinetics directly monitored by the change in intrinsic Trp fluorescence is shown (solid line). B) Time course of the population of native lysozyme,
N (filled circles) and of the kinetic intermediate, IN (open circles) measured in interrupted refolding experiments. C) The early time region of the kinetics shows that the faster process (t = 30 ms) produces both native molecules and the partially folded intermediate IN . The lag phase for formation of N, expected for a linear on-pathway model shown in Scheme 1B (. . .), is not observed. Data were taken from Ref. [52].
18 Kinetic Mechanisms in Protein Folding
experiments [67–69], which leads to formation of molecules with native hydrogen bonding network and was thus proposed to produce an intermediate with nativelike hydrogen bonding network [70]. However, this reaction was not observed with any other spectral probe or in SAXS experiments (see Figure 5), which obscured its interpretation. The observation of two apparent rate constants (treating the burst-phase collapse as a pre-equilibrium uncoupled from the slower reactions) cannot rule out any of the mechanisms shown in Scheme 1. However, comparing mechanism A with mechanisms B and C offers a simple way to distinguish obligatory from nonobligatory folding intermediates. In the case of an obligatory intermediate, all molecules have to fold through this state, resulting in a lag phase in the formation of native molecules [5]. Direct spectroscopic measurements or measurements of changes in RG are not able to monitor formation of native lysozyme directly, since changes in spectroscopic and geometric properties occur in all folding steps (see above). Further, the use of inhibitor binding to detect formation of active lysozyme did not give any clear-cut results since binding is too slow under the applied experimental conditions [22, 66]. Thus, interrupted refolding experiments were used to monitor the time course of native protein and of IN . Two unfolding reactions were observed in interrupted refolding experiments [22, 52]. One of them corresponds to the well-characterized unfolding kinetics of native lysozyme, and a second much faster one corresponds to unfolding of the intermediate (see Figure 6). The faster reaction is only observed at short and intermediate refolding times but is not observed when native lysozyme is unfolded. The collapsed state unfolds too fast under all conditions to be measured by stopped-flow unfolding. The results show that the intermediate is not obligatory for lysozyme folding, since no lag phase in forming native lysozyme is observed [22] (Figure 8B,C). Rather, formation of IN and formation of 20% of the native molecules occur with the faster kinetic phase (τ = 30 ms). The slower process (τ = 400 ms) reflects the interconversion of the intermediate to the native state, resulting in the remaining 80% of the molecules folding to the native state [22, 52]. As discussed above, identical apparent rate constants for forming the intermediate and native molecules on a fast pathway are expected, since any three-state model gives rise to only two observable rate constants. The lack of an initial lag phase during formation of native lysozyme ruled out a linear on-pathway model, but it could not distinguish between the triangular model and the off-pathway model [22]. Performing a least-squares fit of the denaturant dependence of the two apparent rate constants to the analytical solutions of both models (see Protocols, Section 6) revealed that only the triangular model was able to describe the data [52] (Figure 9). In the case of the off-pathway model, unfolding of the intermediate should become the rate-limiting step for folding at very low denaturant concentrations. This would predict an increase in the folding rate with increasing denaturant concentration, which was, however, not observed (Figure 9). Fitting the data to the circular three-state model allowed the determination of all six microscopic rate constants (Figure 10). For this analysis it was crucial to know the unfolding rate constant of the intermediate at high denaturant
3 A Case Study: the Mechanism of Lysozyme Folding 19
Fig. 9 GdmCl dependence of the apparent rate constants of lysozyme folding at pH 5.2. 82 (open and filled triangles) and 81 (open and filled circles) are fit to the analytical solutions of the triangular model (Scheme 1A,—) and to the linear off-pathway model (Scheme 1C, . . .). The
fits show that the off-pathway model is not able to describe the data. To increase the stability of the intermediate and thus allow a better distinction between the onand the off-pathway model the experiments were carried out in the presence of 0.5 M Na2 SO4 . Data were taken from Ref. [52].
concentrations (λ2 in Figures 3A and 9). Since unfolding of native lysozyme is single exponential at high denaturant concentrations and determined by λ1 , the faster rate constant (λ2 ) had to be obtained in sequential mixing experiments [52]. In these experiments the intermediate was populated by a short (100 ms) refolding pulse and then unfolded at high denaturant concentrations to yield λ2 (see Figure 6). The triangular folding mechanism for lysozyme raises the question for the origin for the kinetic partitioning into a fast direct pathway and a slow pathway through IN at the stage of the collapsed state. Fluorescence and NMR measurements revealed nonnative short- and long-range interactions around Trp residues in denaturantunfolded lysozyme [44, 71] and in the collapsed state [43]. These interactions involve Trp62 and Trp63 which are located in the β-domain and may cause distinct subpopulations in the collapsed state and prevent structure formation in the β-domain of part of the collapsed molecules. This model is supported by the folding kinetics of a W62Y mutant, which has weakened hydrophobic interactions in the unfolded state and shows faster folding kinetics [72]. Also replacement of Trp108 by tyrosine accelerates folding of lysozyme [72] supporting a model of nonnative short-range and long-range interactions, which prevent fast folding of the β-domain. The final conundrum in the folding mechanisms of lysozyme at pH 5.2 was the origin of the 10 ms reaction that was only observed in pulsed hydrogen exchange experiments [67–69]. This reaction was interpreted as formation of a native-like intermediate with secondary structure formed both an the α- and β-domains [70]. It is very difficult to imagine formation of a native-like state to occur without changes in any spectroscopic parameters. A quantitative analysis of all reactions
20 Kinetic Mechanisms in Protein Folding
Fig. 10 Folding mechanism of lysozyme at pH 5.2 and low salt concentrations obtained by analyzing a large number of experimental data using various probes to detect refolding (see Figure 5), the GdmCl dependence of 81 and 82 (Figure 3A) and the results from interrupted refolding experiments (Figure 8B,C). Adapted from Refs [3, 43] and [52].
occurring during the pulsed hydrogen exchange experiments revealed, however, that the 10 ms reaction is an artifact of incomplete hydrogen exchange, which was misinterpreted as fast protection of protons [73]. The reason for the incomplete exchange was an acceleration of all folding rate constants and a change in the folding mechanism under the conditions of the exchange pulse (pH 9–10; see below). The determination of the folding mechanism and all microscopic rate constants for folding and unfolding under the conditions of the exchange pulse allowed a quantitative calculation of the exchange kinetics expected on the basis of the rate constants determined from spectroscopic probes. All microscopic rate constants for all conditions applied during the exchange experiments and the rate constants for H/D exchange of the individual amide protons in lysozyme [74] were used to calculate exchange kinetics. The results quantitatively reproduced the experimentally measured time course of amide occupancy when the folding rate constants from spectroscopic experiments (Figures 3A and 5) were used [73]. This showed that no kinetic phases in addition to the ones observed by spectroscopic probes are required to quantitatively explain the hydrogen exchange kinetics.
3 A Case Study: the Mechanism of Lysozyme Folding 21
3.2 Lysozyme Folding at pH 9.2 or at High Salt Concentrations
The procedure described above to elucidate the three-state mechanism for lysozyme folding at pH 5.2 can also be applied to elucidate the mechanism of more complex folding reactions with more than one intermediate. For lysozyme folding, a third kinetic reaction is observed at high salt concentrations [75] and at high pH (>8.5) [73], indicating the stabilization of an additional intermediate under these conditions. Thus, a four-state model with an additional rapid pre-equilibrium is required to describe the folding kinetics under these conditions. Again interrupted refolding experiments were able to monitor the time course of the population of all intermediates and of native molecules (Figure 11B). Together with the denaturant
Fig. 11 Folding of lysozyme at pH 9.2. A) Effect of GdmCl on the three apparent rate constants 81 , 82 , and 83 measured by a combination of single and sequential mixing experiments [73]. B) Time course of the two kinetic intermediates and the native state
during refolding at 0.6 M GdmCl. A global fit to the mechanism shown in Scheme 3 can describe the data (solid lines in all panels) and allowed all the microscopic rate constants to be determined (see Scheme 3).
22 Kinetic Mechanisms in Protein Folding
Scheme 2 Mechanism and rate constants for lysozyme folding at pH 5.2 in the presence of 0.85 M NaCl. Adapted from Ref. [75].
dependence of all three rate constants obtained from single and sequential mixing stopped-flow refolding and unfolding experiments (Figure 11A) this allowed the identification of the minimal kinetic mechanism for lysozyme folding under these conditions. The results showed that the additional intermediate induced at high salt concentrations or at high pH is located on a third parallel folding pathway. The four-state mechanisms shown in Schemes 2 and 3 show the minimal models describing lysozyme folding at pH 9.2 and high salt concentrations, respectively. In addition the microscopic rate constants obtained from global fits of all data are shown.
Scheme 3 Mechanism and rate constants for lysozyme folding at pH 9.2. Adapted from Ref. [73].
4 Non-exponential Kinetics
This case study demonstrates the need for methods capable of detecting the time course of individual kinetic species during the folding process. Direct spectroscopic measurements are usually not able to provide this information. In addition, the folding studies on lysozyme show that for complex reactions it is usually impossible to assign experimentally observed rate constants directly to individual steps on folding pathways, since they are complex functions of several microscopic rate constants. In order to determine all microscopic rate constants for a mechanism it is essential to combine results from interrupted refolding experiments and from the denaturant-dependence of all observable folding and unfolding reactions in a global fit to the solutions of various possible models (see Protocols, Section 6).
4 Non-exponential Kinetics
Non-exponential kinetics were first reported by Kohlrausch [76] for the relaxation of glass fibers after stretching. He described a broad distribution of relaxation times covering time scales of several orders of magnitude. Such non-exponential kinetics can be treated with a stretched exponential term of the kind β
A=A0 e−(k·t)
(21)
The stretch factor (β) indicates a time-dependent change in the rate constant (k), with β = 1 corresponding to the special case of single exponential kinetics. The kinetics become increasingly stretched with decreasing β. Stretched exponential behavior, which has also been referred to as “strange” kinetics [77], has recently been observed both in theoretical [78, 79] and in experimental work on biological systems [80]. The best-studied experimental model is the structural relaxation of myoglobin at low temperature [80] or at high solvent viscosity [81]. A prerequisite for strange kinetics is a rough energy landscape with significant barriers between a large number of local minima [78]. Thus, it was argued that protein folding reactions might exhibit stretched behavior, under conditions where local minima on the folding landscape become stabilized (e.g., at low temperature). Up to date there are only a few experimental reports on strange kinetics in protein folding, which describe the refolding of ubiquitin [82], phosphoglycerate kinase [82], and a WW domain at low temperature [83] measured in laser temperature-jump experiments on the microsecond time scale. The difficulty of unambiguously assigning the observed kinetics to stretched behavior may be seen from the fact that the kinetics only cover about 2.5 orders of magnitudes of life-times and can also be described by the sum of two or three exponentials. Nevertheless, these first indications of non-exponential behavior in protein folding on the very fast time scale provide an interesting new aspect to the understanding of energy landscapes for protein folding (see Chapter 14for a more detailed discussion of non-exponential kinetics in protein folding).
23
24 Kinetic Mechanisms in Protein Folding
5 Conclusions and Outlook
A quantitative treatment of folding kinetics is a prerequisite for elucidating mechanisms of protein folding. We have seen that concepts from classical reaction kinetics can be applied to protein folding as long as different ensembles of states are separated by significant barriers (>5kT). In most cases a discrimination between different folding mechanisms is not possible without the determination of the time course of native molecules. Thus, interrupted refolding experiments in combination with the complete denaturant dependence of all observable rate constants should be routinely used to analyzed complex folding reactions. 6 Protocols–Analytical Solutions of Three-state Protein Folding Models 6.1 Triangular Mechanism
The most general three-state mechanism is the triangular model is given by
(22)
The characteristic equation for this mechanism is
with
f (λ)=λ2 −λ(kUI +kIU +kUN +kNU +kNI +kIN )+(γ1 +γ2 +γ3 )=0
(23)
γ1 =kNI kIU +kNU kIU +kIN kNU γ2 =kNU kUI +kUI kNI +kUN kNI γ3 =kUI kIN +kIU kUN +kUN kIN
(24)
The apparent rate constants, λ1 and λ2 are the solution of Eq. (23) and are given by: √ −B± B2 −4C λ1,2 = 2
(25)
with B = −(kUN + kNU + kUI + kIU + kIN + kNI ) and C = γ 1 + γ 2 + γ 3 . The time-dependent behavior of the different kinetic species for refolding starting from the unfolded state (U 0 = 1; I0 ; N 0 = 0) is given by UI )−(γ2 +γ3 ) −λ1 t 3 (kUN +kUI ) −λ2 t U(t)=U0 ( λ1 (kUNλ+k e + γ2 +γ e + λγ1 λ1 2 ) λ2 (λ1 −λ2 ) 1 (λ1 −λ2 ) γ2 −kUI λ1 −λ1 t kUI λ2 −γ2 −λ2 t I(t)=U0 ( λ1 (λ1 −λ2 ) e + λ2 (λ1 −λ2 ) e + λγ1 λ2 2 ) λ2 −γ3 −λ2 t UN λ1 −λ1 t e + λkUN e + λγ1 λ3 2 ) N(t)=U0 ( λγ31−k (λ1 −λ2 ) 2 (λ1 −λ2 )
(26)
6 Protocols–Analytical Solutions of Three-state Protein Folding Models
where U(t), I(t), and N(t) represent the concentrations of the respective species at a given time t. The pre-exponential factors give the amplitude of the observable rate constant (see Eq. (2)) for monitoring the time course of the respective species and the constant terms represent the equilibrium concentrations of each species. The principle of microscopic reversibility provides an additional constraint for fitting the experimental data to the triangular mechanism kUI +kIN +kNU =1 kIU +kNI +kUN
(27)
The time-dependent behavior of the different species for starting from the native state (N 0 = 1; U 0 = 0; I0 = 0) or from the intermediate (I0 = 1; U 0 = 0; N 0 = 0) are obtained by substituting the respective microscopic rate constant in Eq. (26). 6.2 On-pathway Intermediate
kUI
kIN
kIU
kNI
UIN
(28)
The characteristic equation for the on-pathway mechanism is given by f (λ)=λ2 −λ·(kUI +kIU +kIN +kNI )+(kUI kIN +kUI kNI +kIU kNI )=0
(29)
The apparent rate constants, λ1 and λ2 are the solution of Eq. (29) (see Section 6.1). The time-dependent behavior of the different kinetic species for refolding starting from the unfolded state (U 0 = 1; I0 = 0; N 0 = 0) is given by IN −λ2 ) −λ2 t IN −kNI λ1 t e + kUI (kλ2NI(λ+k e + kλNI1 λkIU ) U(t)=U0 ( kUIλλ11(λ−k1 −λ 2) 1 −λ2 ) 2
−λ1 −λ1 t kUI (λ2 −kNI −λ2 t kUI kNI I(t)=U0 ( kλUI1 (λ(k1NI−λ e + λ2 (λ1 −λ2 ) e + λ1 λ2 ) 2) kUI kIN UI kIN e−λ2 t + kλUI1 λkIN ) N(t)=U0 ( λ1 (λ1 −λ2) e−λ2 t + λ−k 2 (λ1 −λ2 ) 2
(30)
For initial conditions I0 = 1; U 0 = 0; N 0 = 0 we yield (kNI −λ1 ) −λ1 t kIU (λ2 −kNI ) −λ2 t kNI kIU e + λ2 (λ1 −λ2 ) e + λ1 λ2 ) U(t)=I0 ( kIU λ1 (λ1 −λ2
1 )(kNI −λ1 ) −λ1 t NI )(kUI −λ2 ) −λ2 t I(t)=I0 ( (kUIλ−λ e + (λ2 −k e + kλUI1 λkNI ) λ2 (λ1 −λ2 ) 1 (λ1 −λ2 ) 2
(31)
UI −λ1 ) −λ1 t 2 −kUI ) −λ2 t e + kλIN2 (λ e + kλUI1 λkIN ) N(t)=I0 ( kλIN1 (k (λ1 −λ2 ) (λ1 −λ2 ) 2
The solutions for starting from the native state (N 0 = 1; U 0 = 0; I0 = 0) are the same as for starting from the unfolded state with replacing the respective microscopic rate constants in Eq. (30).
25
26 Kinetic Mechanisms in Protein Folding
6.3 Off-pathway Mechanism
kIU
kUN
kUI
kNU
I U N
(32)
The characteristic equation for the off-pathway mechanism is given by f (λ)=λ2 −λ(kUI +kIU +kUN +kNU )+(kIU kUN +kIU kNU +kUI kNU )=0
(33)
The apparent rate constants, λ1 and λ2 are the solution of Eq. (33) (see Section 6.1). The time-dependent behavior of the different kinetic species for refolding starting from the unfolded state (U 0 = 0; I0 = 0; N 0 = 0) is given by kNU 1 )(kNI −λ1 ) −λ1 t NU (kIU −λ2 ) −λ2 t e + (λ2 −k e + kIU ) U(t)=I0 ( (kIU −λ λ1 (λ1 −λ2 λ2 (λ1 −λ2 ) λ1 λ2 (kNU −λ1 ) −λ1 t kUI (λ2 −kNU ) −λ2 t kNU kUI I(t)=I0 ( kUI e + λ2 (λ1 −λ2 ) e + λ1 λ2 ) λ1 (λ1 −λ2 )
(34)
(kIU −λ1 ) −λ1 t kUN (λ2 −kIU ) −λ2 t kIU kUN e + λ2 (λ1 −λ2 ) e + λ1 λ2 ) N(t)=U0 ( kUN λ1 (λ1 −λ2 )
The solutions for starting from the native state (N 0 = 1; U 0 = 0; I0 = 0) or from the intermediate (I0 = 1; U 0 = 0; N 0 = 0) are obtained in the same way as discussed in Section 6.2. 6.4 Folding Through an On-pathway High-Energy Intermediate
Apparent two-state folding through a high-energy intermediate (see Figure 7A) can be described by a three-state on-pathway model kUI
kIN
kIU
kNI
U I∗ N
(35)
As discussed above, the kinetics of folding through a high-energy intermediate cannot be distinguished from two-state folding, when monitored at a single denaturant concentration. However, the presence of a kink in the chevron plot and the resulting altered slope at high denaturant concentrations (Figure 7B) allows a characterization of both transition states [57]. Comparing two-state folding (Eq. (??)) with Eq. (??) allows the rate constants kf and ku to be expressed for apparent two-state folding as a function of the microscopic rate constants kij for the three-state model for the different regimes shown in Figure 7A. At low denaturant concentrations (TS1 limit) formation of the high-energy intermediate (kUI ) is rate-limiting for folding (kUI kIN ) and thus we can approximate kf in the TS1 limit (kf(TS1)) as kf (TS1)=kUI
(36)
6 Protocols–Analytical Solutions of Three-state Protein Folding Models
The unfolding reaction in the TS1 limit requires going through I*. Since I* is higher in free energy than N (kNI kIN ) and TS1 represents the highest barrier, the crossing of transition state 2 can be treated as a pre-equilibrium. Thus, ku in the TS1 limit (ku (TS1)) is given by kU (TS1)=
kNI kIN −1 k I U =kNI ( ) kIN * kIU
(37)
For folding and unfolding at high denaturant concentrations TS2 represents the highest barrier (TS2 limit). Thus, in analogy to Eqs. (36) and (37), kf and ku can be expressed in the TS2 limit as: kf (TS2)=
kIN kUI kIN =kUI kIU kIU
(38)
kU (TS2)=kNI
(39)
These considerations show that the presence of a kink in the chevron plot allows the determination of the ratio kIU /kIN , which is equivalent to the equilibrium constant between the two transition states (K TS1 /TS2 ) and can be converted into the difference in free energy between the two transition states (G ◦TS1/TS2 ) kIN /kIU =kTS1/TS2 =e−(1/RT )(G
◦‡
(TS2)−G ◦‡ (TS1))
◦
=e−GTS2/TS1 /RT
(40)
Accordingly, the stability of I* cannot be determined, since it does not become populated, but the difference in free energy between both transition states (G ◦TS1/TS2 ) can be obtained. It should be noted that this analysis does not make any assumptions on the stability of the hypothetical intermediate besides that it is always less stable than U and N. Fitting the GdmCl dependence of ln λ to the analytical solutions of the three-state model (Eq. (29)) with λ1 and λ2 given by Refs [4] and [24] λ1,2 =
√ −B± B 2 −4C 2
(41)
with B=−(kUI +kIU +kIN +kNI ) C=kUI (kIN +kNI )+kIU kNI
(42)
therefore allows the determination of kUI and kNI and their denaturant dependencies mUI and mNI , respectively. It further yields the parameters kIN /kIU and mIN —mIU . Since only the ratios kIN /kIU and mIN —mIU are defined for folding through a high-energy intermediate, these ratios have to be used for data fitting
27
28 Kinetic Mechanisms in Protein Folding
[57]. Folding through a high-energy intermediate shows apparent two-state behavior and thus only the smaller one of the two apparent rate constants (λ1 ) should have the complete folding and unfolding amplitude (A1 = 1) whereas A2 = 0 (see Ref. [57]).
7 Acknowledgments
We thank Manuela Sch¨atzle and Beat Fierz for comments on the manuscript and all members of the Kiefhaber lab for help and discussion.
References 1 S´ANCHEZ, I. E. & KIEFHABER, T. (2003). Hammond behavior versus ground state effects in protein folding: evidence for narrow free energy barriers and residual structure in unfolded states. J. Mol. Biol. 327, 867–884. 2 KIEFHABER, T. (1995). Protein folding kinetics. In Methods in Molecular Biology, Vol. 40: Protein Stability and Folding Protocols (SHIRLEY, B. A., ed.), pp. 313–341. Humana Press, Totowa, NJ. 3 BIERI, O. & KIEFHABER, T. (2000). Kinetic models in protein folding. In Protein Folding: Frontiers in Molecular Biology 2nd edn (PAIN, R., ed.), pp. 34–64. Oxford University Press, Oxford. 4 SZABO, Z. G. (1969). Kinetic characterization of complex reaction systems. In Comprehensive Chemical Kinetics (BAMFORD, C. H. & TIPPER, C. F. H., eds), Vol. 2, pp. 1–80. vols. Elsevier Publishing Company, Amsterdam. 5 MOORE, J. W. & PEARSON, R. G. (1981). Kinetics and Mechanisms. John Wiley & Sons, New York. 6 JACKSON, S. E. (1998). How do small single-domain proteins fold? Folding Design 3, R81-R91. 7 EYRING, H. (1935). The activated complex in chemical reactions. J. Chem. Phys. 3, 107–115. 8 EVANS, M. G. & POLANYI, M. (1935). Some applications of the transition state method to the calculation of reaction velocities, especially in solution. Trans. Faraday Soc. 31, 875–885.
9 KRIEGER, F., FIERZ, B., BIERI, O., DREWELLO, M. & KIEFHABER, T. (2003). Dynamics of unfolded polypeptide chains as model for the earliest steps in protein folding. J. Mol. Biol. 332, 265–274. 10 GREENE, R. F. J. & PACE, C. N. (1974). Urea and guanidine-hydrochloride denaturation of ribonuclease, lysozyme, alpha-chyomtrypsinn and beta-lactoglobulin. J. Biol. Chem. 249, 5388–5393. 11 SANTORO, M. M. & BOLEN, D. W. (1988). Unfolding free energy changes determined by the linear extrapolation method. 1. Unfolding of References phenylmethanesulfonyl alpha-chymotrypsin using different denaturants. Biochemistry 27, 8063–8068. 12 TANFORD, C. (1970). Protein Denaturation. Part C. Theoretical models for the mechanism of denaturation. Adv. Prot. Chem. 24, 1–95. 13 IKAI, A., FISH, W. W. & TANFORD, C. (1973). Kinetics of unfolding and refolding of proteins. II. Results for cytochrome c. J. Mol. Biol. 73, 165–184. 14 MATTHEWS, C. R. (1987). Effect of point mutations on the folding of globular proteins. Methods Enzymol. 154, 498–511. 15 MYERS, J. K., PACE, C. N. & SCHOLTZ, J. M. (1995). Denaturant m values and heat capacity changes: relation to changes in accessible surface areas of protein unfolding. Protein Sci. 4, 2138–2148. 16 LEFFLER, J. E. (1953). Parameters for the description of transition states. Science
References 117, 340–341. 17 LEFFLER, J. E. & GRUNWALD, E. (1963). Rates and Equilibria of Organic Reactions. Dover, New York. 18 JENCKS, W. P. (1969). Catalysis in Chemistry and Enzymology. McGraw-Hill, New York. 19 S´ANCHEZ, I. E. & KIEFHABER, T. (2003). Non-linear rate-equilibrium free energy relationships and Hammond behavior in protein folding. Biophys. Chem. 100, 397–407. 20 IKAI, A. & TANFORD, C. (1973). Kinetics of unfolding and refolding of proteins. I. Mathematical Analysis. J. Mol. Biol. 73, 145–163. 21 TANFORD, C., AUNE, K. C. & IKAI, A. (1973). Kinetics of unfolding and refolding of proteins. III. Results for lysozyme. J. Mol. Biol. 73, 185–197. 22 KIEFHABER, T. (1995). Kinetic traps in lysozyme folding. Proc. Natl Acad. Sci. USA 92, 9029–9033. 23 KHORASANIZADEH, S., PETERS, I. D. & RODER, H. (1996). Evidence for a three-state model for protein folding from kinetic analysis of ubiquitin variants with altered core residues. Nat. Struct. Biol. 3, 193–205. 24 KIEFHABER, T., KOHLER, H. H. & SCHMID, F. X. (1992). Kinetic coupling between protein folding and prolyl isomerization. I. Theoretical models. J. Mol. Biol. 224, 217–229. 25 KIEFHABER, T. & SCHMID, F. X. (1992). Kinetic coupling between protein folding and prolyl isomerization. II. Folding of ribonuclease A and ribonuclease T1. J. Mol. Biol. 224, 231–240. 26 BIERI, O., WIRZ, J., HELLRUNG, B., SCHUTKOWSKI, M., DREWELLO, M. & KIEFHABER, T. (1999). The speed limit for protein folding measured by triplet-triplet energy transfer. Proc. Natl Acad. Sci. USA 96, 9597–9601. 27 BIERI, O. & KIEFHABER, T. (1999). Elementary steps in protein folding. Biol. Chem. 380, 923–929. 28 KUBELKA, J., HOFRICHTER, J. & EATON, W. A. (2004). The protein folding speed limit. Curr. Opin. Struct. Biol. 14, 76–88. 29 GAREL, J. R. & BALDWIN, R. L. (1973). Both the fast and slow folding reactions of
30
31
32
33
34
35
36
37
38
39
ribonuclease A yield native enzyme. Proc. Natl Acad. Sci. U.S.A. 70, 3347–3351. BRANDTS, J. F., HALVORSON, H. R. & BRENNAN, M. (1975). Consideration of the possibility that the slow step in protein denaturation reactions is due to cis-trans isomerism of proline residues. Biochemistry 14, 4953–4963. KIEFHABER, T., GRUNERT, H. P., HAHN, U. & SCHMID, F. X. (1990). Replacement of a cis proline simplifies the mechanism of ribonuclease T1 folding. Biochemistry 29, 6475–6480. LANG, K., SCHMID, F. X. & FISCHER, G. (1987). Catalysis of protein folding by prolyl isomerase. Nature 329, 268–270. REIMER, U., SCHERER, G., DREWELLO, M., KRUBER, S., SCHUTKOWSKI, M. & FISCHER, G. (1998). Side-chain effects on peptidylprolyl cis/trans isomerization. J. Mol. Biol. 279, 449–460. BALBACH, J. & SCHMID, F. X. (2000). Proline isomerization and its catalysis 12.1 Kinetic Mechanisms in Protein Folding in protein folding. In Protein Folding: Frontiers in Molecular Biology (PAIN, R., ed.). Oxford University Press, Oxford. KIEFHABER, T., QUAAS, R., HAHN, U. & SCHMID, F. X. (1990). Folding of ribonuclease T1. 1. Existence of multiple unfolded states created by proline isomerization. Biochemistry 29, 3053–3061. KIEFHABER, T., QUAAS, R., HAHN, U. & SCHMID, F. X. (1990). Folding of ribonuclease T1. 2. Kinetic models for the folding and unfolding reactions. Biochemistry 29, 3061–3070. ODEFEY, C., MAYR, L. & SCHMID, F. X. (1995). Non-prolyl cis/trans peptide bond isomerization as a rate-determinig step in protein unfolding and refolding. J. Mol. Biol. 245, 69–78. PAPPENBERGER, G., AYGU¨ N, H., ENGELS, J. W., REIMER, U., FISCHER, G. & KIEFHABER, T. (2001). Nonprolyl cis peptide bonds in unfolded proteins cause complex folding kinetics. Nat. Struct. Biol. 8, 452–458. SCHERER, G., KRAMER, M. L., SCHUTKOWSKI, M., REIMER, U. & FISCHER, G. (1998). Barriers to rotation of
29
30 Kinetic Mechanisms in Protein Folding
40
41
42
43
44
45
46
47
48
49
50
51
secondary amide peptide bonds. J. Am. Chem. Soc. 120, 5568–5574. COLON, W., WAKEM, L. P., SHERMAN, F. & RODER, H. (1997). Identification of the predominant non-native histidine ligand in unfolded cytochrome c. Biochemistry 36, 12535–12541. YEH, S.-R., TAKAHASHI, S., FAN, B. & ROUSSEAU, D. L. (1998). Ligand exchange in unfolded cytochrome c. Nat. Struct. Biol. 4, 51–56. YEH, S.-R. & ROUSSEAU, D. L. (1998). Folding intermediates in cytochrome c. Nat. Struct. Biol. 5, 222–228. SEGEL, D., BACHMANN, A., HOFRICHTER, J., HODGSON, K., DONIACH, S. & KIEFHABER, T. (1999). Characterization of transient intermediates in lysozyme folding with time-resolved small angle X-ray scattering. J. Mol. Biol. 288, 489–500. BACHMANN, A. & KIEFHABER, T. (2002). Test for cooperativity in the early kinetic intermediate in lysozyme folding. Biophys. Chem. 96, 141–151. KUWAJIMA, K. (1989). The molten globule state as a clue for understanding the folding and cooperativity of globular-protein structure. Proteins Struct. Funct. Genet. 6, 87–103. WU, L. C., PENG, Z.-Y. & KIM, P. S. (1995). Bipartite structure of the a-lactalbumin molten globule. Nat. Struct. Biol. 2, 281–286. WU, L. C. & KIM, P. S. (1998). A specific hydrophobic core in the α-lactalbumin molten globule. J. Mol. Biol. 280, 175–182. LUO, Y. & BALDWIN, R. L. (1999). The 28-111 disulfide bond contrains the a-lactalbumin molten globule and weakens its cooperativity of folding. Proc. Natl Acad. Sci. USA 96, 11283–11287. LUO, Y., KAY, M. S. & BALDWIN, R. L. (1997). Cooperativity of folding of the apomyoglobin pH 4 intermediate studied by glycine and proline mutations. Nat. Struct. Biol. 4, 925–929. SILOW, M. & OLIVEBERG, M. (1997). Transient aggregates in protein folding are easilty mistaken for folding intermediates. Proc. Natl Acad. Sci. USA 94, 6084–6086. SCHMID, F. X. (1983). Mechanism of folding of ribonuclease A. Slow refolding
52
53
54 55
56
57
58
59
60
61
62
63
is a sequential reaction via structural intermediates. Biochemistry 22, 4690–4696. WILDEGGER, G. & KIEFHABER, T. (1997). Three-state model for lysozyme folding: triangular folding mechanism with an energetically trapped intermediate. J. Mol. Biol. 270, 294–304. HILL, T. L. (1974). The sliding filament model of contraction of striated muscle. Progr. Biophys. Mol. Biol. 28, 267–340. HILL, T. L. (1977). Free Energy Transduction in Biology. Academic Press, London. WEISMANN, J. S. & KIM, P. S. (1991). Reexamination of the folding of BPTI: predominance of native intermediates. Science 253, 1386–1393. KIEFHABER, T., GRUNERT, H. P., HAHN, U. & SCHMID, F. X. (1992). Folding of RNase T1 is decelerated by a specific tertiary contact in a folding intermediate. Proteins Struct. Funct. Genet. 12, 171–179. BACHMANN, A. & KIEFHABER, T. (2001). Apparent two-state tendamistat folding is a sequential process along a defined route. J. Mol. Biol. 306, 375–386. S´ANCHEZ, I. E. & KIEFHABER, T. (2003). Evidence for sequential barriers and obligatory intermediates in apparent two-state protein folding. J. Mol. Biol. 325, 367–376. JENCKS, W. P. (1980). When is an intermediate not an intermediate? Enforced mechanisms of general acid-base catalyzed, carbonation, carbanion, and ligand exchange reactions. Acc. Chem. Res. 13, 161–169. KIEFHABER, T., BACHMANN, A., WILDEGGER, G. & WAGNER, C. (1997). Direct measurements of nucleation and growth rates in lysozyme folding. Biochemistry 36, 5108–5112. WALKENHORST, W. F., GREEN, S. & RODER, H. (1997). Kinetic evidence for folding and unfolding intermediates in staphylococcal nuclease. Biochemistry 36, 5795–5805. BLAKE, C. C. F., KOENIG, D. F., MAIR, G. A., NORTH, A. C. T., PHILLIPS, D. C. & SARMA, V. F. (1967). Structure of hen egg-white lysozyme. Nature 206, 757–761. BLAKE, C. C. F., MAIR, G. A., NORTH, A. C. T., PHILLIPS, D. C. & SARMA, V. F. (1967).
References
64
65
66
67
68
69
70
71
72
On the conformation of the hen egg-white lysozyme molecule. Proc. R. Soc. B 167, 365–377. KUWAJIMA, K., HIRAOKA, Y., IKEGUCHUI, M. & SUGAI, S. (1985). Comparison of the transient folding intermediates in lysozyme and α-lactalbumin. Biochemistry 24, 874–881. CHAFFOTTE, A. F., GUILLOU, Y. & GOLDBERG, M. E. (1992). Kinetic resolution of peptide bond and side-chain far-UV CD during folding of HEWL. Biochemistry 31, 9694–9702. ITZHAKI, L. S., EVANS, P. A., DOBSON, C. M. & RADFORD, S. E. (1994). Tertiary interactions in the folding pathway of hen lysozyme: kinetic studies using fluorescent probes. Biochemistry 33, 5212–5220. RADFORD, S. E., BUCK, M., TOPPING, K. D., DOBSON, C. M. & EVANS, P. A. (1992). Hydrogen exchange in native and denatured states of hen egg-white lysozyme. Proteins 14, 237–248. MIRANKER, A. D., ROBINSON, C. V., RADFORD, S. E., APLIN, R. T. & DOBSON, C. M. (1993). Detection of transient protein folding populations by mass spectroscopy. Science 262, 896–900. MATAGNE, A., CHUNG, E. W., BALL, L. J., RADFORD, S. E., ROBINSON, C. V. & DOBSON, C. M. (1998). The origin of the α-domain intermediate in the folding of hen lysozyme. J. Mol. Biol. 277, 997–1005. KULKARNI, S. K., ASHCROFT, A. E., CAREY, M., MASSELOS, D., ROBINSON, C. V. & RADFORD, S. E. (1999). A near-native state on the slow refolding pathway of hen lysozyme. Protein Sci. 8, 35–44. KLEIN-SEETHARAMAN, J., OIKAWA, M., GRIMSHAW, S. B., WIRMER, J., DUCHARDT, E., UEDA, T., et al. (2002). Long-range interactions within a nonnative protein. Science 295, 1719–1722. ROTHWARF, D. M. & SCHERAGA, H. A. (1996). Role of non-native aromatic and
73
74
75
76
77
78
79
80
81
82
83
hydrophobic interactions in the folding of hen egg white lysozyme. Biochemistry 35, 13797–13807. BIERI, O. & KIEFHABER, T. (2001). Origin of apparent fast and non-exponential kinetics of lysozyme folding measured in pulse labeling experiments. J. Mol. Biol. 310, 919–935. BAI, Y., MILNE, J. S., MAYNE, L. & ENGLANDER, W. (1993). Primary structure effects on peptide group hydrogen exchange. Proteins Struct. Funct. Genet. 17, 75–86. BIERI, O., WILDEGGER, G., BACHMANN, A., WAGNER, C. & KIEFHABER, T. (1999). A salt-induced intermediate is on a new parallel pathway of lysozyme folding. Biochemistry 38, 12460–12470. KOHLRAUSCH, R. (1847). Uber das 410. 12.1 Kinetic Mechanisms in Protein Folding Dellmann’sche Elektrometer. Ann. Phys. Chem. 11, 353–405. SHLESINGER, M. F., ZASLAVSKY, G. M. & KLAFTER, J. (1993). Strange kinetics. Nature 363, 31–37. SAVEN, J. G., WANG, J. & WOLYNES, P. G. (1994). Kinetics of protein folding: The folding dynamics of globally connected rough energy landscapes with biases. J. Chem. Phys. 101, 11037–11043. SKOROBOGATIY, M., GUO, H. & ZUCKERMANN, M. (1998). Non-Arrhenius behavior in the realaxation of model proteins. J. Chem. Phys. 109, 2528–2535. AUSTIN, R. H., BEESON, K. W., EISENSTEIN, L., FRAUENFELDER, H. & GUNSALUS, I. C. (1975). Dynamics of ligand binding to myoglobin. Biochemistry 14, 5355–5373. HAGEN, S. J., HOFRICHTER, J. & EATON, W. A. (1995). Protein kinetics in a glass at room temperature. Science 269, 959–962. SABELKO, J., ERVIN, J. & GRUEBELE, M. (1999). Observation of strange kinetics in protein folding. Proc. Natl Acad. Sci. USA 96, 6031–6036. YANG, W. Y. & GRUEBELE, M. (2003). Folding at the speed limit. Nature 423, 193–197.
31
1
Association and Folding of Small Oligomeric Proteins Hans-Rudolf Bosshard University of Zurich, Z¨urich, Switzerland
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Our current understanding of protein folding has been mostly obtained through the study of monomeric, globular proteins. However, oligomeric proteins are probably more abundant in nature. There is strong evolutionary pressure for monomeric proteins to associate into oligomers [1]. Many results and conclusions acquired from the study of monomeric proteins can be applied to oligomeric proteins. For example, the stabilities of both monomeric and oligomeric proteins result from a delicate balance between opposing forces, the major contributors to stability being the hydrophobic effect, van der Waals interactions, and hydrogen bonds among polar residues [2–4]. In contrast to monomeric proteins, the stabilizing and destabilizing forces operate not only within single subunits but also between the subunits of an oligomeric protein; that means, both intramolecular and intermolecular forces contribute to stability. Nevertheless, there are no fundamental differences between the mechanisms of folding of monomeric and oligomeric proteins. For both types of proteins singular as well as multiple folding pathways are observed with no, a few, or many intermediates and sometimes with nonnative, dead-end products. However, one difference is important: At least one step in the folding of an oligomeric protein must be concentration dependent. This is because the formation of an oligomeric protein includes folding and association; and folding and association are concurrent reactions. In some cases, folding and association can be studied as separate and virtually independent reactions, in others they are fully coupled into a single reaction step (“two-state” mechanism or “single-step” mechanism), yet most often they can be distinguished but are not independent of each other. The degree of coupling between folding and association largely depends on the balance between the intra- and intermolecular forces that stabilize the oligomeric Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Association and Folding of Small Oligomeric Proteins
structure. If the isolated monomeric subunits are stabilized by strong intramolecular forces and the folded monomeric subunits themselves are thermodynamically stable folded structures, then folding and association are separate steps: formation of the oligomer corresponds to the association of preformed folded monomers. If the isolated monomeric subunits are unstable and stability is gained only by intermolecular forces within the oligomeric structure, then folding and association are intimately dependent on each other and are tightly coupled. The concurrence of intra- and intermolecular reactions and the dependence of folding on concentration adds an extra level of complexity to folding. This is perhaps the main reason why our knowledge of the stability and folding of multisubunit proteins is still rudimentary and often of only a qualitative nature. Indeed, quantitative thermodynamic and kinetic analyses have been restricted to mainly dimeric proteins. Reliable quantitative thermodynamic and kinetic data on trimeric and higher order oligomeric structures are still scarce. The present chapter focuses on the detailed, quantitative in vitro analysis of the thermodynamics and kinetics of folding of small dimeric proteins and protein motifs, with only brief and passing reference to trimeric and tetrameric structures. The emphasis is on concepts and methods. We do not present a comprehensive review of the assembly of dimeric proteins but select appropriate examples to illustrate experimental methods, concepts and principles. In vitro and in vivo folding and association of oligomeric and multimeric proteins has been reviewed before (see Refs [5–8]).
2 Experimental Methods Used to Follow the Folding of Oligomeric Proteins
In principle, the methods used in the study of monomeric proteins can be used to follow dissociation and unfolding of oligomeric proteins under equilibrium conditions and to measure the kinetics of folding, unfolding, and refolding. The main difference is in data analysis, which has to consider the association step that is lacking in monomeric proteins. 2.1 Equilibrium Methods
Chemical unfolding by urea or guanidinium chloride (GdmCl) is the most commonly used method to measure the fraction of folded and unfolded protein, respectively, and to calculate equilibrium constants. Free energies of unfolding at standard conditions, GU (H2 O), are obtained by linear extrapolation to zero denaturant as is done for monomeric proteins [9] (see Appendix: Section A2.2 for more details). Extrapolation is performed on any suitable signal change paralleling the denaturant-induced change of molecular state; examples are far-UV circular dichroism (CD) to follow secondary structure changes, or tryptophan fluorescence to monitor molecular packing density and polarity. In applying linear extrapolation
2 Experimental Methods Used to Follow the Folding of Oligomeric Proteins
one tacitly assumes that the effects of the denaturant on both unfolding and dissociation of the oligomeric protein are linear. This assumption is difficult to check. Extrapolating the same value of GU (H2 O) from urea and GdmCl unfolding is a good indication of a linear denaturant dependence of dissociation and unfolding [10, 11]. Another popular equilibrium method is thermal unfolding. If the equilibrium between unfolded subunits and folded oligomer is reversible, the change with temperature of an appropriate spectroscopic signal such as fluorescence emission or CD can be used to determine the fraction of protein in the folded and unfolded states and, occasionally, also the fraction of intermediate states. The immediate result from the thermal melting curve of a protein is its midpoint temperature of unfolding, T m . If the reversible unfolding of an oligomeric protein follows a twostate mechanism, half of the protein is unfolded and dissociated and half is folded and associated at T m . (This does not apply to the midpoint T m of a melting curve obtained by differential scanning calorimetry [12].) The melting curve provides the fraction of unfolded, monomeric protein, f M , from which the equilibrium unfolding constant K U and the free energy of unfolding GU = −RT ln K U is calculated. Note that in using K U and GU , the folded, native state is the reference state. From the temperature dependence of the equilibrium constants of unfolding, the van’t Hoff enthalpy of the dissociation-unfolding transition at T m can be calculated (see Appendix: Sections A2.3 and A2.4). From differential scanning calorimetry experiments the calorimetric enthalpy can be obtained, which is model independent and may differ from the van’t Hoff enthalpy if the unfolding reaction is not twostate. Extrapolation of enthalpies and free energies to standard conditions (for example 25 ◦ C) are based on the formalism of equilibrium thermodynamics (see Appendix: Section A2.5). Such extrapolation can be inaccurate if it extends over a large temperature range and the heat capacity change is not accurately known. The fraction of protein in the oligomeric and monomeric state may also be obtained by following an appropriate signal after simple dilution. This approach is an added benefit of the concentration dependence of the monomer/oligomer equilibrium and can be used under benign buffer conditions, that is, in the absence of denaturant and at ambient temperature. If dissociation and unfolding are fully coupled, the dilution method yields the overall free energy of unfolding plus dissociation. However, if different free energies of unfolding are obtained from the dilution method and from chemical or thermal unfolding, this indicates that folding and association are not, or only partly, coupled. For example, a lower free energy of unfolding from dilution than from chemical unfolding is an indication for a folding intermediate that is populated under benign conditions but not in the presence of chemical denaturant. In other words, the oligomeric protein dissociates into (partly) folded subunits during dilution under benign conditions, yet it dissociates into unfolded subunits in the presence of a denaturant. The difference between GU (H2 O) from dilution and GU (H2 O) from chemical unfolding can then be ascribed to the unfolding free energy of the subunits. Other methods to monitor the fraction of folded and unfolded oligomeric proteins under benign buffer conditions are sedimentation equilibrium analysis,
3
4 Association and Folding of Small Oligomeric Proteins
gel-filtration chromatography, light-scattering analysis, and isothermal titration calorimetry. In the last method, the heat evolved or taken up when two (or more) molecules interact with each other is measured and the reaction enthalpy is calculated from the integrated heat change. Isothermal titration calorimetry can only be applied to heterodimeric proteins whose subunits do not form homomeric structures. The procedure yields the change of the enthalpy, the Gibbs free energy and the subunit stoichiometry of the oligomeric structure [12–14]. It should be noted that all the methods based on dissociation under benign buffer conditions work only if the association between the subunits is relatively weak and association/dissociation occurs in the micromolar to nanomolar concentration range. If an oligomeric protein assembles at less than nanomolar concentration, spectroscopic signal changes and calorimetric heat changes may be too weak to monitor by dilution. 2.2 Kinetic Methods
The same kinetic methods used for monomeric proteins can be applied to study the time course of folding and association or unfolding and dissociation of oligomeric proteins. Time-resolved methods include conventional stopped-flow and quenchedflow techniques for the time range of milliseconds to seconds, and fast relaxation methods like temperature jump to follow more rapid reactions. Variation of the observation signal allows different aspects of the reaction to be investigated. For example, far-UV CD spectroscopy, pulse-labeling NMR and mass spectroscopy are used to follow secondary structure formation and formation of hydrogen bonds; intrinsic and extrinsic fluorescence measurements monitor molecular packing; fluorescence anisotropy and small angle X-ray estimate changes in molecule size; ligand binding reports the gain or loss of biological activity. (See Ref. [15] for spectroscopic methods used to study protein folding.) As for monomeric proteins, unfolding and refolding is often studied by rapid dilution into denaturant and out of denaturant, respectively. This necessitates extrapolation to standard conditions in water. To avoid extrapolation, unfolding under benign buffer conditions may be studied from relaxation after a rapid change of temperature, pH or simply after rapid dilution (see Refs [16] and [17] and Appendix: Section A3). If an oligomeric protein is composed of different subunits, folding can be studied in a very direct way by rapid mixing of the different subunits. This means there is no need to first unfold the heterooligomeric protein and thereafter follow its refolding by returning to folding conditions. The rates of dissociation of an oligomeric protein can also be measured by subunit exchange. To this end, the protein has to be labeled with a fluorescence tag or another appropriate marker. Folded protein with the marker is rapidly diluted with a large excess of folded protein lacking the marker. The change of the signal of the marker is then proportional to the rate of dissociation of the marked protein provided that reassociation is much more rapid than dissociation, which can be achieved by increasing the protein concentration [17, 18].
3 Dimeric Proteins
3 Dimeric Proteins
The simplest folding mechanism of a dimeric protein is a two-state or one-step reaction between two unfolded monomers Mu and the folded dimer Df as described by Eq. (1). kf
2Mu Df ku
K d=
(1)
ku [Mu ]2 = kf [Df ]
Here and elsewhere, we use the notation M for monomer and D for dimer. The subscripts indicate unfolded and folded, respectively. For a monomeric protein both kf and ku are monomolecular rate constants (s−1 ). In the case of a dimeric protein, kf is a bimolecular rate constant (M−1 s−1 ) and only ku is unimolecular (s−1 ). Square brackets indicate molar concentrations. Also, in contrast to a monomeric protein, the equilibrium dissociation constant K d has the unit of a concentration. It is related to the thermodynamic parameters of unfolding by the well-known thermodynamic equation: G U =HU −T SU =−RT lnK d
(2)
Equation (1) says that the folding and association reactions are completely linked. In practice, this means that the association of the monomeric subunits and their folding to the native conformation of the final dimer cannot be experimentally separated. However, this does not exclude the fleeting existence of intermediates. Indeed, there are several instances where folding is thermodynamically two-state but kinetics reveal transitory intermediates (see the examples in Table 3). Hence, a more physically meaningful mechanism for the formation of a dimeric protein is described by Eq. (3). k1
k2
k3
k−1
k−2
k−3
2Mu 2Mi D1 Df [Mu ] k−1 [Mi ]2 k−2 [Di ] k−3 K d1 = K d2 =
(3)
Here, monomeric and dimeric intermediates (subscript i) are at equilibrium with the unfolded monomer Mu and the folded dimer Df , respectively. Intermediate Mi can be considered as an association-competent subset of conformational forms of the monomer, perhaps a set of partly folded monomers. Only Mi associates to the dimeric intermediate Di , which then rearranges to the final folded dimer Df . The first and third equilibria are unimolecular, the middle equilibrium is bimolecular. If the first equilibrium is far on the side of Mu and the last is far on the side of Df , Eq. (3) simplifies to the one-step Eq. (1). In this case, only Mu and Df are detected
5
6 Association and Folding of Small Oligomeric Proteins
under equilibrium conditions, but the kinetics of folding and unfolding may exhibit different phases indicative of transitory formation of intermediates. 3.1 Two-state Folding of Dimeric Proteins
Several dimeric proteins fold according to Eq. (1), exhibiting two-state transition between two unfolded monomers and a folded dimer. For monomeric proteins it has been shown that a two-state folding model is justified if the protein rapidly equilibrates between many unfolded conformations prior to complete folding [19]. By the same reasoning, two-state folding seems justified if intermediates Mi and Di of Eq. (3) are ensembles of barely populated conformations at rapid equilibrium with Mu and Df , respectively. One has to keep in mind that two-state folding is an operational concept: The sole observation of a folded dimer and an unfolded monomer under a given set of experimental conditions does not rule out the existence of folding intermediates. Table 1 lists operational criteria that typically apply to two-state folding. At least one of criteria i–iv referring to equilibrium experiments, and one of criteria v–vii referring to kinetic experiments, as well as criterion viii should be met to demonstrate two-state folding. The calorimetric and van’t Hoff equality criterion ix is more difficult to test because thermal unfolding has to be over 90% reversible, which is not always the case. The thermodynamic and kinetic formalism for two-state folding is relatively simple and has often been described (see the Appendix for details). As for monomeric proteins, the equilibrium dissociation constant K d is obtained from the fraction of monomeric protein f M , or the fraction of dimeric protein f D = 1–f M . For a monomeric protein the equilibrium constant, defined as K d = f M /(1–f M ), is independent of the protein concentration. In contrast, to determine K d of a dimeric Table 1 Criteria for two-state folding
(i)
Equilibrium unfolding followed with different physical probes such as light absorption, fluorescence emission, circular dichroism yield superimposable curves (ii) Equilibrium measurements are dependent on protein concentration (iii) The change of the free energy of unfolding with denaturant concentration is linear (iv) The reciprocal midpoint temperature of unfolding (1/T m ) is linearly proportional to the logarithm of the total protein concentration (v) Refolding kinetics is concentration dependent, unfolding kinetics is concentration independent (vi) Kinetics of folding and unfolding exhibit single phases, and these account for the entire amplitude of the signal changes of folding and unfolding, respectively (vii) The change of the unfolding and refolding rate constants with denaturant is linear (viii) Equilibrium constants determined experimentally are the same within error as those calculated from the ratio of the association and dissociation rate constants (ix) The enthalpy change of the dimer to monomer transition obtained by van’t Hoff analysis of thermal unfolding curves is the same within error as the calorimetrically measured enthalpy change
3 Dimeric Proteins
protein, the protein concentration has to be known. Moreover, the relationship between K d and concentration is different for homodimeric and heterodimer proteins, as shown by Eqs (4) and (5) (see also Appendix: Section A1). K d=
2[Pt ] f M2 1− f M
(4)
K d=
[Pt ] f M2 2 (1− f M )
(5)
In the above equations, [Pt ] is the total protein concentration expressed as the total monomer concentration: [Pt ] = [M] + 2[D]. It is important to note that the protein concentration has to be accounted for in analyzing the thermodynamic and kinetic data. For example, the well-known linear extrapolation of chemical unfolding data to zero denaturant concentration has to consider protein concentration because GU is not zero at the transition midpoint of chemical unfolding (see Appendix: Section A2.2). 3.1.1 Examples of Dimeric Proteins Obeying Two-state Folding Table 2 lists five representative examples of small dimeric proteins for which twostate folding has been demonstrated based on several of the criteria listed in Table 1. The Arc repressor of bacteriophage P22 is a small homodimeric protein composed of two 53 residue polypeptide chains [10]. Figure 1 shows that the fluorescence and CD changes induced by denaturant are superimposable (criterion i of Table 1). The position of the unfolding trace is dependent on protein concentration as predicted for a bimolecular reaction (criterion v of Table 1). The change of unfolding free energies with denaturant concentration is linear (criterion iii of Table 1) and extrapolation to water yields the same free energy of unfolding for both urea and GdmCl unfolding, supporting of the validity of the linear extrapolation approach for both the unfolding reaction and the dissociation of dimers into monomers. Figure 2 shows the urea dependence of unfolding and refolding rates of the dimeric Arc repressor. The rate of refolding increases linearly when the denaturant concentration is lowered; the rate of unfolding increases linearly when the denaturant concentration increases (criterion vii). Refolding is concentration dependent, unfolding is concentration independent, and the single kinetic phases account for the entire amplitude of refolding and unfolding (criterion vi). Finally, the free energy of unfolding and dissociation extrapolated from equilibrium measurements and calculated from the ratio of the refolding and unfolding rate are very similar, as demanded by criterion viii. In the middle range of denaturant concentrations, both unfolding and refolding reactions have to be taken into account when analyzing the unfolding and refolding traces, respectively. The analysis was performed using Eq. (A29) [20]. Note that, unlike in the case of a monomeric protein, the crossover point of the unfolding and refolding lines in Figure 2 do not yield the midpoint concentration of the denaturant at which [M] = [D]/2 and GU = 0. This is because the refolding reaction and hence the midpoint of denaturation depends on the protein concentration.
7
kt
Equilibrium analysisa
Kinetic analysisa
GU = 44 kJ mol−1 , Cp = 3–4 kJ M−1 kf = 2 × 107 M−1 s−1 , ku = 0.2 s−1 , Gu = 46 kJ mol−1 [28] K−1 , HU (25◦ C) = 120 kJ M−1 [27]
a
Unless stated, thermodynamic and kinetic parameters refer to ambient temperature (20 to 25◦ C)
Nerve growth factor GU = 81 kJ mol−1 [11] (homodimer) Mannose-binding GU = 55 kK mol−1 , Cp = 13 kJ M−1 lectin (heterodimer) K−1 , HU (66◦ C) = 728 kJ M−1 [73]
Leucine zipper AB (heterodimer)
Arc repressor GU = 40 kJ mol−1 , Cp = 6.7 kJ M−1 kf = 9 × 106 M−1 s−1 , ku = 0.2 s−1 , (homodimer) Gu = 41 kJ mol−1 [20] K−1 , HU (54◦ C) = 297 kJ M−1 [10] kf = 4 × 100 M−1 s−1 , ku = 3.3 × 10−3 GCN4 leucine zipper GU (5◦ C) = 39 kJ mol−1 [72] (homodimer) s−1 , GU (5 ◦ C) = 43 kJ mol−1 [45]
Protein
ku
Table 2 Examples of dimeric proteins folding through a simple two-state mechanism: 2Mu D f
Cp -values from experiment and calculated from the buried subunit interface agree well
Many native-like interactions in the dimeric transition state [21–23] Calorimetric analysis indicates two monomolecular pretransitions at 20 ◦ C and 50◦ C before the main bimolecular transition [26] Evidence for association-competent, partly helical monomer [48, 49] Folding, but not unfolding, is strongly dependent on ionic strength indicating electrostatistically assisted association [28]
Comments
8 Association and Folding of Small Oligomeric Proteins
3 Dimeric Proteins
Fig. 1 Equilibrium unfolding of dimeric Arc repressor according to a two-state mechanism. A) Guanidinium chloride (GdmCl) denaturation curves monitored by fluorescence (squares) and CD (circles) are superimposable and the solid line is described by Eq. (4) for two-state folding. B) Urea denaturation is concentration dependent as predicted by Eq. (4). Experiments were performed with 1.6 µM (squares) and 16 µM (circles) total protein concentration. C) Free
energies of unfolding measured with different total peptide concentrations (different symbols) are linearly dependent on GdmCl concentration (lower data) and urea concentration (upper data). Linear extrapolation of urea and GdmCl according to Eq. (A12) yields the same free energy of unfolding of GU (H2 O) = 46 kJ mol−1 , supporting the validity of the linear extrapolation method. Figure adapted from Ref. [10] with permission.
9
10 Association and Folding of Small Oligomeric Proteins
Fig. 2 Urea dependence of the rate constants of refolding (squares, left ordinate) and unfolding (circles, right ordinate) of Arc repressor. Empty symbols refer to data analysis neglecting unfolding (Eq. (A22)) or refolding (Eq. (A25)). Dotted symbols refer
to data analysis including both unfolding and refolding as described by Eqs (A27) and (A29). Experiments were performed at pH 7.5 and 20 ◦ C. Figure adapted from Ref. [20] with permission.
Extrapolation of the folding rate constant of the Arc repressor to zero denaturant yields kf of about 9 × 106 M−1 s−1 , which is about two orders of magnitude below the diffusion limit. Moreover, the folding rate is nearly independent of solvent viscosity. It seems that only a small fraction of collisions of unfolded monomers are productive, otherwise the rate should approach the diffusion limit and depend on solvent viscosity. This is supported by a dimeric transition state of folding exhibiting regions of native-like structure. In particular, there are native-like backbone interactions in the transition state of folding of the Arc repressor [21–23]. Mutating a cluster of buried charged residues forming a complex salt bridge network into hydrophobic residues speeds up folding, makes the folding rate viscosity dependent and brings the structure of the dimeric transition state closer to the unfolded state [24]. The leucine zipper domain of the yeast transcriptional activator protein GCN4 forms a coiled-coil structure in which two α-helices are wound around each other to form a superhelix [25]. This leucine zipper domain of GCN4 folds by a twostate mechanism. However, calorimetric analysis indicates two monomolecular pretransitions at 20 ◦ C and 50 ◦ C before the main concentration-dependent, bimolecular transition takes place [26]. Another coiled coil whose folding has been analyzed in much detail is the heterodimeric leucine zipper AB [27, 28]. This short, designed leucine zipper consists of two different 30 residue peptide chains whose sequences are related to the GCN4 leucine zipper [25]. One chain is acidic (A-chain) and the other is basic (B-chain) so that inter-subunit electrostatic interactions are maximized [27, 29]. Figure 3 shows thermal unfolding of leucine zipper AB. The sigmoidal shape is described by Eq. (5) for two-state unfolding of a heterodimeric
3 Dimeric Proteins
Fig. 3 Temperature-induced unfolding of the heterodimeric leucine zipper AB. The unfolding transition was monitored by the change in the CD signal at 222 nm. From T m = 343 K and [Pt ] = 3:9 × 10−5 M, the dissociation constant K d of 1 × 10−5 M
at T m is calculated, corresponding to the free energy of unfolding of GU = 32:9 kJ mol−1 . To calculate K d at any temperature T, the pre- and post-transition slopes (thin lines) have to be taken into account. Figure adapted from Ref. [27] with permission.
protein. The midpoint of the thermal transition yields the transition temperature T m where half of the protein is dimeric and half is monomeric. The enthalpy of unfolding at T m ; HU ,m , is obtained from the data in Figure 3 by plotting ln K versus T according to the van’t Hoff equation (Eq. (A14)). A further test for two-state unfolding is provided by a plot of 1/T m against the logarithm of the total protein concentration, as shown in the inset of Figure 3 (Eq. (A18)). To obtain the unfolding free energy, GU (T), at any temperature T, the data from the thermal unfolding curve of Figure 3 need to be extrapolated by the GibbsHelmholtz equation describing the change of GU with temperature. T T +Cp T −Tm −T < ln G U (T ) =HU,m 1− Tm Tm −RT ln[K d (Tm ) ]
(6)
The last term of Eq. (6) accounts for the concentration dependence of GU . This term is missing in the equation used for monomeric proteins. Figure 4 shows the thermal stability curve of the heterodimeric leucine zipper AB. The curve was constructed using data sets from thermal unfolding in the range 340–350 K (T m was varied with protein concentration) and from isothermal titration calorimetry performed between 283 and 313 K. Since neither peptide chain alone associates to a homodimer, the formation of the heterodimeric “AB-zipper” can be followed by
11
12 Association and Folding of Small Oligomeric Proteins
Fig. 4 Stability curve of the heterodimeric leucine zipper AB at pH 7.2 and 15 mM ionic strength. The curve was calculated by fitting experimental values of GU obtained by thermal unfolding (squares) and isothermal titration calorimetry (circles) with the help of Eq. (6). The maximum stability is
at T s = 36 ◦ C, GU = 0 is at T g = 120 ◦ C and T g = −42◦ C. The solid line is a best fit in the range of the experimental data, the dashed line is the extrapolation to T g and T g . Surface contour model of a leucine zipper is from 2ZTA.pdb. Figure adapted from Ref. [27] with permission.
titrating one chain to the other by isothermal titration calorimeter. The experiment yields the enthalpy and free energy changes of the coupled association and folding reaction [27]. In this way, the data from thermal unfolding can be complemented with thermodynamic data pertaining to a temperature range in which the folded protein dominates. The stability of the leucine zipper peaks at 309 K (36 ◦ C) and decreases above and below this temperature. This is an example of cold denaturation visible already at ambient temperature. Extrapolation of the stability curve yields the temperatures of heat and cold denaturation, T g and T g , at which GU = 0. The shape of the stability curve is governed by the heat capacity change, CP , of the unfolding reaction. CP of 4.3 kJ M−1 K−1 is obtained from a fit of the data in Figure 4 using Eq. (6). The positive value of CP implies that the unfolded monomeric peptide chains can take up more heat per degree K than the folded leucine zipper. The specific CP normalized per g of protein is 0.6 J g−1 K−1 . This is of the same magnitude as the average specific CP for the unfolding of monomeric globular proteins [30] and points to similar types of inter- and intramolecular forces stabilizing dimeric and monomeric proteins. There have been many attempts to calculate CP from structural data based on the amount of polar and nonpolar surface buried at the interface of a dimeric protein [31, 32].
3 Dimeric Proteins
Fig. 5 Dependence of the rate constants and equilibrium dissociation constants of folding and association of the leucine zipper AB on ionic strength. A) Rate constant kf of coupled association and folding. B) Rate constant ku of coupled dissociation and unfolding. C) Equilibrium dissociation constant calculated as K d = ku /kf . The ionic
strength is expressed as the logarithm of the mean rational activity coefficient of NaCl [33]. Rate constants were determined under benign buffer conditions by rapid mixing of the unfolded peptide chains. Kinetic traces were analyzed with the help of Eq. (A29). Figure adapted from Ref. [28] with permission.
The kinetics of folding and unfolding of the leucine zipper AB are presented in Figure 5. This dimeric leucine zipper is characterized by several interhelical chargecharge interactions (salt bridges). Therefore, the rates of association/folding and dissociation/unfolding depend on ionic strength. The rate of association and folding decreases steeply when the ionic strength is raised (panel A) while dissociation and unfolding is only weakly ionic strength dependent (panel B). Consequently, the equilibrium constant is dominated by the ionic strength effect on the rate of
13
14 Association and Folding of Small Oligomeric Proteins
folding (panel C). The strong decrease with ionic strength of the rate of folding could indicate a general effect on the free energy of activation of the folding and association reaction due to long-range electrostatic attraction between the oppositely charged, unfolded polypeptide chains [33]. A small effect of the ionic strength on the rate of unfolding and dissociation has been interpreted by the presence of nativelike electrostatic interactions in the transition state of folding [34, 35]. However, in the case of several leucine zippers, interhelical salt bridges contribute negligibly to the stability of the coiled-coil structure, or even destabilize the folded dimer [36– 39]. Therefore, the small ionic strength effect on the dissociation rate constant ku confirms a minor role of electrostatic forces in stabilizing the folded structure [28]. The bimolecular rate constant of association and folding extrapolated to zero ionic strength is very large: kf =; 9 × 109 M−1 s−1 , larger than the rate of about 109 M−1 s−1 estimated for the diffusion-limited association of two short, uncharged polypeptide chains [40–42]. The extremely large value of kf in the absence of salt points to electrostatic acceleration of the association step as reported for other protein association reactions [33, 43]. Such very rapid association may indicate that the interaction is not much geometrically restricted. In other words, the transition state of folding of the leucine zipper is not well structured, which has been supported by kinetic transition state analysis [44]. The folding and association of leucine zipper AB as well as of other leucine zippers [45] are very tightly coupled and very rapid, yet there seems to be a transitory population of association-competent monomers preceding the actual associationfolding reaction [Mu → Mi in Eq. (3)]. There is indirect evidenced for a small helix content in the association-competent conformation of the peptide chains forming a leucine zipper [46–49]. The impact of preformed helix on the rate of association and folding of a coiled coil has been manipulated by engineering a metal-binding site into the helix sequence to stabilize the helix conformation [50]. 3.2 Folding of Dimeric Proteins through Intermediate States
As noted above, two-state folding is an operational concept since the lack of observable intermediates (or ensembles of intermediates) may have experimental reasons. Detection of intermediates depends on their stability as well as on the sensitivity and time resolution of thermodynamic and kinetic detection methods. Table 3 lists representative examples of dimeric proteins folding by a multistate mechanism, that means through at least one detectable intermediate. A good indication for a thermodynamic intermediate is the disparity between the free energy of unfolding determined by different methods. For example, the dimeric cAMP receptor protein (CRP) exhibits different unfolding curves depending on whether unfolding is observed by a spectroscopic probe or by sedimentation equilibrium analysis in the ultracentrifuge (Figure 6). The midpoint of the denaturant-induced unfolding curve obtained by measuring the monomer/dimer distribution in the ultracentrifuge is at 2 M GdmCl. However, the midpoint shifts to 3 M GdmCl when unfolding and dissociation is followed from the change of the far-UV CD spectrum, from
3 Dimeric Proteins Table 3 Examples of dimeric proteins folding through intermediate states
Protein
Mechanism
Commentsa
cAMP receptor protein (CPR) (homodimer) [51]
2Mu → 2Mi → Df
Disulfoferredoxin (homodimer) [74] LC8, light chain of dynein (homodimer) [75] SecA dimeric bacterial ATPase (homodimer) [52]
2Mu → 2Mi → Df
Thermodynamically stable monomeric intermediate. GU = 30 kJ mol−1 for Df → 2Mi and 50 kJ mol−1 for 2Mi → 2Mu . Equilibrium transition between folded dimer and monomeric intermediate from sedimentation equilibrium, overall unfolding and dissociation from unfolding by denaturant GU = 23 kJ mol−1 for Df → 2M, and 50 kJ mol−1 for 2Mi → 2Mu GU = 35 kJ mol−1 for Df → 2Mi and 62 kJ mol−1 for 2Mi → 2Mu
2Mu → 2Mi → Df 2Mu → Di → Df
Ketosteroid isomerase (homodimer) [53]
2Mu → 2Mi → Di → Df
Glutathione transferase A1-1 (homodimer) [76, 77]
2Mu → 2Mi → Di → Di → Df
Trp repressor protein (TRP) (homodimer) [54, 58]
2Mu → 2Mi → Di → Df
Bacterial luciferase (heterodimer) [59]
Mαu → Mαi , Mβu → Mβi Mαi + Mβi → Di → Df
a
Plateau in denaturant unfolding curve indicates intermediate shown to be dimeric by sedimentation equilibrium. GU = 35 kJ mol−1 for Df → Di and 59 kJ mol−1 for Di → 2Mu Two-state folding under equilibrium conditions with GU = 90 kJ mol−1 . Four-state folding in kinetic analysis with k = 60 s−1 for Mu → Mi , k = 5:4 × 104 M−1 s−1 for 2Mi → Di , k = 0:017 s−1 for Di → Df Under equilibrium conditions folding is two-state [77]. Folding through a monomeric and two dimeric intermediates; folding pathway complicated by proline cis → trans isomerization in Mu and Mi [76]. Rapid transition to two classes of association-competent monomers which associate in a combinatorial way to three different dimeric intermediates, and these rearrange in the rate-limiting step to the folded dimer [58]. Under equilibrium conditions, folding appears two state [54] Under equilibrium conditions folding is two state. Folding proceeds through monomeric and dimeric intermediates. The β-subunit can irreversibly form monomeric and dimeric dead-end species
Unless stated, thermodynamic and kinetic parameters refer to ambient temperature (20–25 ◦ C).
fluorescence emission or from fluorescence anisotropy. The higher midpoint corresponds to unfolding and dissociation, the lower midpoint to dissociation alone, which does not significantly change the CD and fluorescence signals [51]. Thus, CRP folds through a thermodynamically stable monomeric intermediate. The overall free energy of unfolding and dissociation is GU = 80 kJ mol−1 , that for
15
16 Association and Folding of Small Oligomeric Proteins
Fig. 6 Guanidinium chloride-induced unfolding of the homodimeric cAMP receptor protein (CPR) followed by different spectroscopic methods shows a sharp transition at 3 M GdmCl (filled circles, left ordinate). In contrast, the monomer/dimer ratio followed by sedimentation equilibrium analysis has a transition midpoint at 2 M GdmCl
(empty circles, right ordinate). The latter midpoint is assigned to an equilibrium between folded dimer and (partly folded) monomeric intermediate. The midpoint at 3 M GdmCl is assigned to the equilibrium between the fully folded dimer and the fully unfolded monomers. Figure redrawn from Ref. [51] with permission.
dissociation alone (from ultracentrifugation) is GU = 30 kJ mol−1 . The difference of 50 kJ mol−1 is the free energy of unfolding of the stable monomeric intermediate (Table 3). The fourth protein in Table 3, dimeric bacterial enzyme SecA, folds through a thermodynamically stable dimeric intermediate, which appears as a plateau in the denaturant unfolding curve [52]. A larger amount of energy is necessary for the unfolding and dissociation reaction Di → 2Mu than for unfolding of Df into Di , in accord with a thermodynamically stabilized dimeric intermediate. In the kinetics of folding, the monomolecular transition Di → Df is rate-determining since even at low protein concentration the bimolecular association 2Mu → Di is very rapid [52]. The first four proteins of Table 3 fold through thermodynamically stable intermediates. In many cases, however, the equilibrium distribution between folded dimer and unfolded monomer is well described by a two-state mechanism yet transitory intermediates are detectable in the kinetic analysis. Representative examples in Table 3 are the ketosteroid isomerase, the glutathione transferase A1–1 and the tryptophan repressor protein (TRP). Equilibrium analysis of all these proteins conforms to two-state folding and fulfills at least one of criteria i–iv but none of criteria v–viii listed in Table 1. Thus, dimeric ketosteroid isomerase is stabilized by GU = 90 kJ mol−1 under equilibrium conditions [53]. However, the time course of unfolding exhibits three kinetic phases, a slow monomolecular phase Df → Di , a bimolecular phase Di → 2Mi , and a rapid monomolecular phase Mi → Mu . Thus, two additional states are seen in kinetics. One may say that the simple Eq. (1) applies
3 Dimeric Proteins
Fig. 7 Complex folding scheme for tryptophan repressor protein (TRP) from Escherichia coli through three “folding channels.” Following the formation of two ensembles of partly folded monomeric intermediates, Mi fast and Mi slow , folding to the dimer is controlled by three kinetic phases (“channels”). These are due to the association of two molecules of Mi fast or Mi slow or of Mi fast + Mi slow . “Fast,” “slow,” and
“medium” refer to the rate of the final conformational rearrangement from the dimeric intermediate populations Di fast ; Di slow , and Di medium to the unique folded dimer Df . Di slow is a native-like intermediate and Di fast and Di medium are nonnative intermediates. Only the slow folding reaction through native-like intermediate Di slow is reversible. Figure adapted from Ref. [58] with permission.
to the enzyme at equilibrium yet the more realistic Eq. (3) to the time-dependent nonequilibrium situation of folding and unfolding. Folding of TRP has been studied in much detail [54–58]. Matthews and coworkers have formulated the folding scheme shown in Figure 7 based on a large series of kinetic experiments. In a burst phase following the removal of denaturant, unfolded monomeric TRP rearranges to two different ensembles of partly folded monomers, Mi fast and Mi slow . These rapidly combine to the three dimeric intermediates Di fast , Di slow , and Di medium . The unique folded dimer is formed in a fast, a medium, and a slow reaction from Di fast , Di medium , and Di slow , respectively. The rate of this final unimolecular reaction exhibits very little dependence on denaturant concentration suggesting an isomerization reaction proceeding with no detectable change in solvent-accessible surface area. The structural basis for the three “folding channels” is the presence of two prefolded monomeric species that randomly associate in different pairwise combinations. The intermediates differ in their compactness. The fast and medium channels access the native structure through the most compact yet least native-like intermediates, Di fast and Di medium , respectively. The slow channel, on the other hand, proceeds through a more “direct” folding route since the dimeric intermediate Di slow is native-like. Only this slow route is reversible.
17
18 Association and Folding of Small Oligomeric Proteins
Fig. 8 Complex folding scheme for the heterodimeric bacterial luciferase composed of an "-subunit and a $-subunit. Folding proceeds through the monomeric intermediates M"i and M $i and the dimeric inter-
mediate Di . Monomeric (M $x ) and dimeric (D$x ) dead-end species formed from the $subunit intermediate M $i slow the folding and reduce the folding yield. Figure adapted from Ref. [59] with permission.
The example of TRP shows that the native structure can be accessed through alternative routes, that folding through nonnative states can accelerate folding, and that nonnative states need not lead into kinetic traps [58]. Kinetic traps have been detected in the folding of bacterial luciferase, the last example in Table 3. Figure 8 shows the complex folding scheme for this heterodimeric enzyme composed of an α- and a β-subunit. The unfolded monomeric subunits Mαu and Mβu are converted to the monomeric intermediates Mαi and Mβi , which associate to the heterodimeric intermediate Di in a bimolecular reaction followed by the monomolecular rearrangement to the final folded dimer Df . However, the βsubunit can enter into a dead-end trap as indicated by the irreversible reactions Mβi → Mβx and 2Mβi → Dβx where Dβx is a dead-end homodimer of the β-subunit [59]. 4 Trimeric and Tetrameric Proteins
We only briefly discuss a few examples of small trimeric and tetrameric proteins. For statistical reasons, any reaction higher than second order is very slow. Therefore, simultaneous association of three polypeptide chains to a trimer is statistically unlikely, and simultaneous association of four chains to a tetramer is impossible. Thus, trimers (Tr) and tetramers (Te) are most likely to form by consecutive bimolecular reactions. For a trimer one can write: k1
k2
k−1
k−2
3M D+M Tr [M]2 k−1 [M][D] k−2 K d1 = = = K d2 = [D] k1 [Tr] k2
two−step folding of trimer
(7)
4 Trimeric and Tetrameric Proteins
Fig. 9 Concentration dependence of the CD signal at 222 nm of the trimeric coiled coil LZ16A. The solid line is a best fit for a monomer-dimertrimer equilibrium according to Eq. (7) with K d1 = 7.7 × 10−7 M and K d2 = 2.9 × 10−6 M. These values are in good agreement with the K 1 and K 2 calcu-
lated from the rate constants k1 = 7.8 × 104 M −1 s−1 , k−1 = 1.5 × 10−2 s−1 , k2 = 6.5 × 105 M−1 s−1 , and k−2 = 1.1 s−1 . Inset: backbone structure of dimer and trimer; ribbon models are from 2ZTA.pdb (dimer) and 1g cm.pdb (trimer). Figure adapted from Ref. [60] with permission.
Subscripts u, f, and i have been omitted in Eq. (7) to leave open the conformational state of the three species. Of course, there can be monomeric, dimeric, and trimeric intermediates, making the folding mechanism very complex. Three-state or two-step folding described by Eq. (7) has been reported for the designed homotrimeric coiled-coil protein LZ16A composed of three 29-residue peptide chains [60]. In the range of 1–100 µM total protein concentration, monomer, dimer, and trimer are populated in a concentration-dependent equilibrium [61]. Figure 9 shows the decrease of the CD spectral minimum as a function of total protein concentration. The data are best fit by two equilibria with K d1 = 7.7 × 10−7 M and K d2 = 2.9 × 10−6 M, where K d1 refers to the monomer/dimer equilibrium and K d2 to the dimer/trimer equilibrium. K d1 < K d2 means that the dimer is a stable intermediate dominating at lower protein concentration. For example, at 10 µM total protein concentration, there is 50% dimer, 35% trimer, and 15% monomer. From kinetic refolding experiments the four rate constants defined by Eq. (7) have been determined and were found to be in agreement with the equilibrium data (see the legend to Figure 9).
19
20 Association and Folding of Small Oligomeric Proteins
So it may seem surprising that a trimeric protein can apparently fold in a singlestep reaction, which means by a two-state folding mechanism according to Eq. (8). kf
3Mu Tr f ,K d = ku
[Mu ]3 ku = [Tr f ] k f
one−step folding of trimer
(8)
Here, kf is a trimolecular association rate constant (M−2 s−1 ) and ku is a monomolecular dissociation rate constant. One-step folding according to Eq. (8) occurs if the dimer (Eq. (7)) is very unstable and scarcely populated so that the first equilibrium of Eq. (7) is far on the side of the unfolded monomer and the second far on the side of the folded trimer. Under these conditions, trimer formation appears twostate. It should be noted that 3M → Tr can appear two-state despite the presence of a dimeric intermediate if 2M → D is spectroscopically silent under the chosen detection conditions so that the entire spectral signal change is caused by M + D → Tr [60]. An example of a homotrimeric protein folding through only a single reaction step is the six-helix bundle at the core of the gp41 envelope protein of the human and simian immunodeficiency viruses [62]. The structure is shown in Figure 10 and is composed of three “hairpins” formed by two α-helices linked through a short peptide loop [63]. The trimer folds from three unfolded monomers in a single
Fig. 10 Crystal structure of a six-helix bundle composed of three “hairpins.” Each “hairpin” comprises two helices linked by a short loop; the loop is not visible in this crystal structure from Ref. [63]. Left: Axial view looking down the threefold axis of the helix bundle. Right: Lateral view. The trimer folds from unfolded monomers in a single
step according to Eq. (8). GU (H2 O) determined by denaturant unfolding is 116 kJ mol−1 , and 114 kJ mol−1 when calculated from kinetic rate constants. The trimolecular rate constant of association is 1.3 × 1015 M−2 s−1 and the unimolecular rate constant of dissociation is 1.08 × 10−5 s−1 [62].
4 Trimeric and Tetrameric Proteins
step according to Eq. (8). Equilibrium and kinetic data are in excellent agreement: GU (H2 O) determined by denaturant unfolding is 116 kJ mol−1 , and 114 kJ mol−1 when calculated from kinetic rate constants. The transition state of folding seems closer to the unfolded than the folded state. Only 20–40% of the surface buried in the folded trimer is already buried in the transition state of folding [62]. Formation of the catalytic trimer of aspartate transcarbamoylase is at the other extreme of possible folding mechanisms. Here, partly folded monomers are rapidly formed initially and thereafter rearrange in a slow step to association-competent monomers. These monomeric intermediates are thermodynamically stable. They associate through at least two steps to the folded trimeric enzyme [64]. The folding scheme is 3Mu → 3Mi →→ Trf . Finally, we consider two tetrameric proteins: the C-terminal 30-residue domain of the tumor suppressor protein p53 [65] and a small bacterial dihydrofolate reductase composed of four 78-residue polypeptide chains [66]. The C-terminal p53 domain folds into a homotetramer in a thermodynamically fully reversible reaction. Interestingly, under equilibrium conditions, only folded tetramer and unfolded
Fig. 11 Qualitative reaction coordinate diagram for the folding of the tetrameric protein domain from tumor suppressor p53. Under equilibrium conditions, only a single transition between unfolded monomer (Mu ) and folded tetramer (Tef ) is seen [67]. Kinetic refolding and unfolding experiments show a single bimolecular association reac-
tion and a single monomolecular dissociation reaction, which could be assigned as indicated [65]. Ribbon model of tetrameric C-terminal domain of tumor suppressor p53 from sak.pdb. See the text for a more detailed discussion. Figure adapted from Ref. [65] with permission.
21
22 Association and Folding of Small Oligomeric Proteins
monomer are observed [67]. However, the time course of folding shows two consecutive bimolecular reactions with two kinetic, dimeric intermediates (Figure 11) [65]. Folding can be described as an association of dimers, in accord with the crystal structure, which has been likened to a dimer of dimers. The two intermediates shown in Figure 11 were deduced from extensive -value analysis [65]. The first bimolecular association reaction is between highly unstructured monomers and can be described by a “nucleation-condensation” mechanism [68]. The first dimeric intermediate Di,1 is formed through an early transition state that has little similarity to the native structural organization of the dimer. The second dimeric intermediate Di; 2 is highly structured and already includes most of the structural features of the native protein. The transition Di,2 → Tef is an example of a “framework mechanism” in which stable preformed structural elements “diffuse and collide” [68]. A final example of a tetrameric protein is the bacterial R67 dihydrofolate reductase [66]. Here, two monomeric intermediates (the formation of one being due to slow isomerization of peptidylproline bonds) precede dimer formation. Thereafter, two dimers associate to the folded tetramer. The folding reaction can be summarized as 4Mu → 4Mi,1 → 4Mi,2 → 2Di → Te.
5 Conclusions
The mechanisms of folding of small oligomeric proteins is highly variable. At one extreme is the fully concerted association of unfolded monomers to folded dimers, trimers and even tetramers. Such folding without a thermodynamically stable intermediate is surprisingly frequent. Even in the case of the tetrameric p53 domain, whose folding proceeds through several kinetic folding intermediates among which the final dimeric intermediate is very native-like, none of the intermediates is sufficiently stable to be noticed at equilibrium. The reason for such scarcity of stable folding intermediates may be due, in part, to destabilizing nonpolar surface exposed in the transitory intermediates. This surface becomes buried only in the final folded oligomer. Indeed, intermolecular or intersubunit forces dominate the stability of many oligomeric proteins. However, the interface between the subunits of the folded oligomeric proteins is highly variable. Shape complementarity between the contacting interfaces is a common feature and a main contributor to the specificity of intersubunit interactions. In a survey of 136 homodimeric proteins, a mixture of hydrophobic and intersubunit contacts was identified [69]. Small hydrophobic patches, polar interactions and water molecules are scattered over the intersubunit contact area in two-thirds of the proteins studied. Only in one-third of the dimers there was a single, large and contiguous hydrophobic core between the subunits. Though the hydrophobic effect plays an important role in subunit interactions, it seems to be less strong than that observed in the interior of monomeric proteins [70]. Only in a few cases, oligomerization conforms to the
5 Conclusions
association of preformed stable subunits. The case of the trimeric coiled-coil LZ16A at equilibrium with a stable dimeric coiled coil is probably an exception. From studies addressing the nature of the transition states of folding, it follows that the same principles deduced for monomeric proteins [71] apply also to oligomers. Folding of the subunits and their concurrent association to the oligomeric protein occurs by diffusion and collision of preformed structural frameworks (“framework model”) as well as by simultaneous nucleation and condensation. This is not surprising since the folded structure is always based on an extensive interplay of secondary and tertiary interactions in monomeric proteins and of secondary, tertiary, and quaternary interactions in oligomeric proteins. Because there is no intrinsic difference between tertiary and quaternary interactions there is no principal difference of folding between monomers and oligomers. The main difference is in the association step. Whereas the folding of a monomeric protein can be described by one or several consecutive or parallel monomolecular (single exponential) reactions, the folding of an oligomeric protein exhibits at least one concentration-dependent reaction step. On the one hand this feature complicates the thermodynamic and kinetic analysis. On the other hand the concentration dependence adds a useful variable to assist in the measurement of folding since the monomer/oligomer equilibrium can be studied by simple dilution under benign buffer conditions. Comparison of the data from dilution and from chemical or heat denaturation can give information about the contribution of intermolecular and intramolecular forces to the stability of the oligomeric protein. Also, determination of thermodynamic parameters such as the free energy of unfolding or the heat capacity change of unfolding is facilitated when the protein concentration can be adduced as an additional variable. Finally, in the case of heterooligomeric proteins, folding can be probed by simple mixing of the different subunits.
Appendix – Concurrent Association and Folding of Small Oligomeric Proteins
In this Appendix we have put together a set of equations for the analysis of thermodynamic and kinetic data of two-state folding. Many equations are equivalent to those for the folding of monomeric proteins except for an additional term accounting for protein concentration. Derivation of equilibrium association constants for two-state folding of oligomeric systems and the relationship between equilibrium constants and the thermodynamic parameters G, H, and S have been described in more detail by Marky and Breslauer [78]. The folded (native) state is taken as the reference state so that the thermodynamic parameters refer to the unfolding reaction (GU , HU , etc.). A special form of Eq. (A29) describing reversible two-state folding of a homodimer has been published by Milla and Sauer [20]. The method of deducing kinetic rate constants from the shift of a pre-existing equilibrium is briefly summarized (method of Bernasconi [16]). The kinetic and equilibrium equations presented here are useful for the analysis of simple reaction steps, or steps that are kinetically or thermodynamically well separated. If the reaction steps are linked, numerical integration can be used to
23
24 Association and Folding of Small Oligomeric Proteins
analyze kinetic and equilibrium data. There are good computer programs to aid kinetic analysis, some of the programs are freely available on the Web (for example DynaFit [79]). Depending on the scatter of experimental data, numerical fitting of linked reactions may result in large errors and distinguishing between different models can be very difficult. Numbering of equations Equations appearing only in this Appendix have the prefix A. Other equations are numbered as in the main text. Abbreviations M, monomer; D, dimer; N, n-mer; [Pt ], total protein concentration expressed as total concentration of monomer; f M , fraction of free monomer defined as [M]/[Pt ]. Subscripts; u, unfolded; i, intermediate; f, folded; m, midpoint; K d , equilibrium dissociation constant (=1/K).
A1 Equilibrium Constants for Two-state Folding
Association and dissociation of oligomeric proteins results in concentrationdependent equilibria. Expressions for the equilibrium constants of homooligomeric and heterooligomeric proteins are different. A1.1 Homooligomeric Protein
For a homooligomeric protein, the equilibrium transition between n identical, unfolded monomeric subunits M and the folded n-meric protein N is described by nMN
(A1)
The equilibrium dissociation constant K d is defined as K d=
[M]n [N]
(A2)
Defining the fraction of free monomeric subunits as fM=
[M] (0≤ f M ≤1) [Pi ]
(A3)
we can write K d as K d=
n[Pt ]n−1 f Mn 1− f M
(A4)
A2 Calculation of Thermodynamic Parameters from Equilibrium Constants 25
where [Pt ] = [M] + n[N] is the total protein concentration expressed as total monomer concentration. At f M = 0.5 we have K d, f M =0.5 =n ([Pt ]/2)n−1
(A5)
A1.2 Heterooligomeric Protein
For a heterooligomeric protein, the equilibrium transition between n different, unfolded monomeric subunits M1 , M2 . . . Mn and the folded n-meric protein N is described by M1 +M2 + . . . +Mn N
(A6)
The equilibrium dissociation constant K d is defined as K d=
[M1 ][M2 ] . . . [Mn ] [N]
(A7)
The corresponding expression in terms of the fraction f M of free monomeric subunits is K d=
[Pt ]n−1 f Mn nn−1 (1− f M )
(A8)
and at f M = 0.5 K d, f M =0.5 =
[Pt ] 2n
n−1 (A9)
In Eqs (A8) and (A9), [Pt ] = [M1 ] + [M2 ] + . . . + [Mn ] + n[N]. The difference between Eqs (A4), (A5) and (A8), (A9) reflects the statistical difference between homooligomeric and heterooligomeric equilibria. Equations (4) and (5) of the main text correspond to Eqs (A4) and (A8), respectively. A2 Calculation of Thermodynamic Parameters from Equilibrium Constants A2.1 Basic Thermodynamic Relationships
The changes of free energy, enthalpy, and entropy are related by the well-known equation G U =HU −T SU =−RT lnK d
(2)
26 Association and Folding of Small Oligomeric Proteins
The superscript◦ indicating standard conditions is omitted for simplicity. The van’t Hoff enthalpy of unfolding is HUvH =RT 2
d lnK d dT
(A10)
The heat capacity change at constant pressure, CP , is defined by Cp =
∂H dT
(A11) p
The calculation of thermodynamic parameters from equilibrium constants is based on Eqs (2), (A10), and (A11). A2.2 Linear Extrapolation of Denaturant Unfolding Curves of Two-state Reaction
Linear extrapolation is performed by the same procedure as applied to denaturant unfolding of monomeric proteins, except that the concentration has to be taken into account. The linear extrapolation method is discussed in 3. Linear extrapolation of the free energy of two-state unfolding of homooligomeric and heterooligomeric proteins is described by −RT < ln −RT < ln
n[Pt ]n−1 f Mn 1− f M
=G U ( H2 O) −m[denaturant]
[Pt ]n−1 f Mn =G U ( H2 O) −m[denaturant] nn−1 (1− f M )
homooligomer
heterooligomer
(A12)
(A13)
The term on the left hand side of Eq. (A12) follows from Eqs (2) and (A4), that on the left hand side of Eq. (A13) from Eqs (2) and (A8). The slope m is the change of GU with denaturant concentration (J mol−1 M−1 ). A2.3 Calculation of the van’t Hoff Enthalpy Change from Thermal Unfolding Data
From the change of K d with T, the van’t Hoff enthalpy is calculated according to Eq. (A2). In practice, one calculates HU vH at T m from a plot of ln K d against 1/T. With Eq. (2) one obtains the Arrhenius equation ln
HU,m SU + RT R
(A14)
where HU,m is the van’t Hoff enthalpy at T m . Only data points in a narrow interval around T m should be used to calculate HU,m from the slope HU,m /R since
A2 Calculation of Thermodynamic Parameters from Equilibrium Constants 27
Eq. (A14) neglects the change of the enthalpy with T (heat capacity change assumed to be zero). A2.4 Calculation of the van’t Hoff Enthalpy Change from the Concentration-dependence of Tm
Since association of an oligomeric protein is concentration dependent, T m changes with concentration. Measuring T m for varying [Pt ] is another way to calculate HU,m . From Eqs (A5) and (A(A9)) it follows that at T m , K d is described by K d (Tm ) =n ([Pt ]/2)n−1 K d (Tm ) = ([Pt ]/2n)n−1
homooligomer
(A15)
heterooligomer
(A16)
From Eqs (??), and (A16) one obtains after rearranging [78]: S− (n−1) R< ln<2+R lnn 1 (n−1) R = < ln<[Pt ]+ Tm Hm Hm 1 (n−1) R S− (n−1) R ln2n = ln[Pt ]+ Tm Hm Hm
homooligomer
heterooligomer
(A17) (A18)
A2.5 Extrapolation of Thermodynamic Parameters to Different Temperatures: Gibbs-Helmholtz Equation
The calculation of the van’t Hoff enthalpy described above refers to the transition temperature T m . To obtain the thermodynamic parameters at any other temperature, the change of H and S with T has to be taken into account according to the Gibbs-Helmholtz equation: T T ( ) +Cp T −Tm −T < ln G U T =HU,m 1− Tm Tm −RT ln[K d (Tm ) ]
(6)
The term −RTln[K d (T m )] on the right hand side is necessary to account for the concentration dependence of GU . K d (T m ) corresponds to Kd at f M = 0.5. Depending on the number of subunits and on whether the protein is a homo- or a heterooligomer, the appropriate form of K d from Eq. (A5) or (A9) is inserted in Eq. (6). For example, the correction term for a homodimeric protein is −RT ln([Pt ]). The concentration-independent form of the Gibbs-Helmholtz equation is obtained by using the reference temperature T g at which GU = 0 T T +Cp T −Tg ln G U (T ) =H Tg 1− Tg Tg
(A19)
28 Association and Folding of Small Oligomeric Proteins
A3 Kinetics of Reversible Two-state Folding and Unfolding: Integrated Rate Equations
The following integrated rate equations for two-state folding of a dimeric protein are used to calculate the folding rate constant kf and the unfolding rate constant ku from kinetic traces. Depending on experimental conditions, only folding (A3.1), only unfolding (A3.2), or both (A3.3) have to be taken into account. A3.1 Two-state Folding of Dimeric Protein
kf 2M → D
(A20)
The time course of folding is given by d[M] =−k f [M]2 dt
(A21)
After integration, the time course of disappearance of the monomer fraction becomes f M (t ) =
[M0 ] [Pt ] k f t+1
(A22)
where [M0 ] is the initial monomer concentration at t = 0. [M0 ] = [Pt ] if [D] = 0 at the beginning of the folding reaction. Equation (A22) is used to obtain kf from a folding trace expressed as f M (t) if the unfolding reaction can be neglected (for example at low denaturant concentration). A3.2 Two-state Unfolding of Dimeric Protein
ku D → 2M
(A23)
The time course of unfolding is given by d[D] =−ku [D] dt
(A24)
After integration, the time course of disappearance of the dimer fraction f D 1–f M becomes f D (t) =
2[D0 ] exp (−ku t ) [Pt ]
(A25)
A3 Kinetics of Reversible Two-state Folding and Unfolding: Integrated Rate Equations
where [D0 ] is the initial dimer concentration at t = 0. [D0 ] = [Pt ]/2 if [M] = 0 at t = 0. Equation (A25) is used to obtain ku from an unfolding trace expressed as f D (t) = 1–f M (t) if refolding can be neglected (for example at high denaturant concentration). A3.3 Reversible Two-state Folding and Unfolding A3.3.1 Homodimeric protein kf
2Mu D
(1)
ku
The time course of disappearance of the monomer can be written as d[M] =−k f [M]2 +2ku [D] dt
(A26)
which can be recast into d[M] =dt a+b[M]+c[M]2
(A27)
a=ku [M0 ],b=−ku ,c=−k f
(A28)
where
After integration, the time course of disappearance of the monomer fraction is (b+s ) ( Z−1) [Pt ]2c (1−Z) √ 2c[M0 ]+b−s <exp (s t) and s = b 2 −4ac where Z= 2c[M0 ]+b+s f M (t) =
(A29)
Equation (A29) is used to obtain kf and ku from a folding trace expressed as f m (t) under conditions where both folding and unfolding have to be taken into account (for example at intermediate denaturant concentration). A3.3.2 Heterodimeric protein kf
M1 +M2 D
(A30)
ku
The change of [M1 ] with time can be written as d[M1 ] =−k f [M1 ][M2 ]+ku [D] dt
(A31)
29
30 Association and Folding of Small Oligomeric Proteins
The time course of disappearance of the fraction of M1 or M2 is again described by Eq. (A27) except that the constants a, b, and c are different. If [M1 ] = [M2 ], a, b, and c are a=ku [M0 ]/2,b=−ku ,c=−k f
(A32)
If [M1 ] = [M2 ] a=ku [M1,0 ],b=k f [M1,0 ]−[M2,0 ] −ku ,c=−k f
(A33)
a=ku [M2,0 ],b=k f [M2,0 ]−[M1,0 ] −ku ,c=−k f
(A34)
or
Equation (A33) is valid for f M1 (t) and (A34) for f M2 (t); subscript 0 indicates concentration at t = 0.
A4 Kinetics of Reversible Two-state Folding: Relaxation after Disturbance of a Preexisting Equilibrium (Method of Bernasconi)
Rate equations can be simplified if the reaction corresponds to a relaxation after a relatively small disturbance of the equilibrium. The procedure, pioneered by Bernasconi [16], allows rate constants to be obtained in those cases where there is no analytical solution of the differential rate equation. In practise, a pre-existing equilibrium between folded and unfolded protein is rapidly disturbed, for example by rapid dilution or rapid increase of temperature (T-jump), and the subsequent relaxation to the new equilibrium is observed. This relaxation is characterized by the relaxation time τ = 1/kapp , where kapp (s−1 ) is the apparent rate constant of the exponential decay to the new equilibrium. The number of relaxation times equals the number of reaction steps. Table 4 lists a few equations for different reversible folding reactions. They describe the relationship between the experimentally measured relaxation time τ and the individual rate constants of the reversible foldingunfolding reaction. The main difficulty with this procedure is that the equilibrium concentrations of the reactants at the new equilibrium, that is after relaxation, have to be known (concentrations with overbars in Table 4). These concentrations can be calculated if the equilibrium constants are known from independent experiments. Alternatively, a fitting procedure can be used in which initial estimates of the equilibrium constants are iteratively adapted until a best fit is obtained.
References Table 4 Relaxation times for various reversible folding reactions
Equation for relaxation timea
Protein kf
2M D ku kf
nM N ku
kf
M1 +M2 D
ku k2 2M Di D k−1 k−2 k1
1 =4k f [M]+ku τ 1 =n2 k f [M]n−1 +ku τ 1 =k f [M1 ]+[M2 ] +ku τ τ1 < τ2 : bimolecular association is faster than monomolecular rearrangement
(A35) (A36) (A37) (A38)
1 1 4k1 k2 [M] +k−2 =4k1 [M] +k−1 , = τ1 τ2 4k1 ([M]+k−1 ) k1
k2
k−1
k−2
2M Di D
τ1 > τ2 : bimolecular association is slower than monomolecular rearrangement
(A39)
1 1 k−1 k−2 =k1 +k−1 , =4k1 [M]+ τ1 τ2 k2 +k−2 a
Concentrations with overbars are equilibrium concentrations after relaxation to the new equilibrium
Acknowledgments
I thank Ilian Jelesarov and Daniel Marti for fruitful discussions and helpful comments on the manuscript. Work from my own laboratory has been supported in part by the Swiss National Science Foundation.
References 1 D. S. GODSELL and A. J. OLSON, Trends Biochem. Sci. 1993, 18, 65–68. 2 K. A. DILL, Biochemistry 1990, 29, 7133–7155. 3 W. KAUZMANN, Adv. Protein Chem. 1959, 14, 1–63. 4 C. N. PACE, Biochemistry 2001, 40, 310–313. 5 R. JAENICKE and H. LILIE, 2000, 53. 6 K. E. NEET and D. E. TIMM, Protein Sci. 1994, 3, 2167–2174. 7 R. SECKLER, Assembly of multi-subunit structures. In Mechanisms of Protein Folding (R. H. PAIN, ed.), pp. 279–308. Oxford University Press, Oxford, 2000. 8 R. JAENICKE, Biol. Chem. 1998, 379, 237–243.
9 C. N. PACE, Methods Enzymol. 1986, 131, 266–280. 10 J. U. BOWIE and R. T. SAUER, Biochemistry 1989, 28, 7139–7143. 11 D. E. TIMM and K. E. NEET, Protein Sci. 1992, 1, 236–244. 12 I. JELESAROV and H. R. BOSSHARD, J. Mol. Recogn. 1999, 12, 13–18. 13 E. FREIRE, O. L. MAYORGA and M. STRAUME, Anal. Chem. 1990, 62, 950A–959A. 14 T. WISEMAN, S. WILLISTON, J. F. BRANDTS and L.-N. LIN, Anal. Biochem. 1989, 179, 131–137. 15 K. W. PLAXCO and C. M. DOBSON, Curr. Opin. Struct. Biol. 1996, 6, 630–636. 16 C. F. BERNASCONI, Relaxation Kinetics, 1st edn. Academic Press, New York, 1976.
31
32 Association and Folding of Small Oligomeric Proteins
17 H. WENDT, C. BERGER, A. BAICI, R. M. THOMAS and H. R. BOSSHARD, Biochemistry 1995, 34, 4097–4107. 18 T. JONSSON, C. D. WALDBURGER and R. T. SAUER, Biochemistry 1996, 35, 4795–4802. 19 R. ZWANZIG, Proc. Natl Acad. Sci. USA 1997, 94, 148–150. 20 M. E. MILLA and R. T. SAUER, Biochemistry 1994, 33, 1125–1133. 21 T. JONSSON, C. D. WALDBURGER and R. T. SAUER, Biochemistry 1996, 35, 4795–4802. 22 A. K. SRIVASTAVA and R. T. SAUER, Biochemistry 2000, 39, 8308–8314. 23 M. E. MILLA, B. M. BROWN, C. D. WALDBURGER and R. T. SAUER, Biochemistry 1995, 34, 13914–13919. 24 C. D. WALDBURGER, T. JONSSON and R. T. SAUER, Proc. Natl Acad. Sci. USA 1996, 93, 2629–2634. 25 I. A. HOPE and K. STRUHL, EMBO J. 1987, 6, 2781–2784. 26 A. I. DRAGAN and P. L. PRIVALOV, J. Mol. Biol. 2002, 321, 891–908. 27 I. JELESAROV and H. R. BOSSHARD, J. Mol. Biol. 1996, 263, 344–358. 28 H. WENDT, L. LEDER, H. H¨ARMA¨ , I. JELESAROV, A. BAICI and H. R. BOSSHARD, Biochemistry 1997, 36, 204–213. 29 E. K. O’SHEA, K. J. LUMB and P. S. KIM, Curr. Biol. 1993, 3, 658–667. 30 P. L. PRIVALOV and S. A. POTHEKIN, Methods Enzymol. 1986, 131, 4–51. 31 R. S. SPOLAR and M. T. RECORD, Jr., Science 1994, 263, 777–784. 32 G. I. MAKHATADZE and P. L. PRIVALOV, Adv. Protein Chem. 1995, 47, 307–425. 33 G. SCHREIBER and A. R. FERSHT, Nature Struct. Biol. 1996, 3, 427–431. 34 H. X. ZHOU, Biopolymers 2001, 59, 427–433. 35 H. X. ZHOU, Protein Sci. 2003, 12, 2379–2382. 36 H. R. BOSSHARD, D. N. MARTI, and I. JELESAROV, J. Mol. Recogn. 2004, 17, 1–16. 37 K. J. LUMB and P. S. KIM, Science 1995, 268, 436–439. 38 P. PHELAN, A. A. GORFE, J. JELESAROV, D. N. MARTI, J. WARWICKER and H. R. BOSSHARD, Biochemistry 2002, 41, 2998–3008. 39 D. N. MARTI and H. R. BOSSHARD, J. Mol. Biol. 2003, 330, 621–637.
40 C. C. MOSER and P. L. DUTTON, Biochemistry 1988, 27, 2450–2461. 41 E. D¨URR, I. JELESAROV and H. R. BOSSHARD, Biochemistry 1999, 38, 870–880. 42 R. KOREN and G. G. HAMMES, Biochemistry 1976, 15, 1165–1171. 43 M. VIJAYAKUMAR, K. Y. WONG, G. SCHREIBER, A. R. FERSHT, A. SZABO and H. X. ZHOU, J. Mol. Biol. 1998, 278, 1015–1024. 44 H. R. BOSSHARD, E. D¨URR, T. HITZ and I. JELESAROV, Biochemistry 2001, 40, 3544–3552. 45 J. A. ZITZEWITZ, O. BILSEL, J. B. LUO, B. E. JONES and C. R. MATTHEWS, Biochemistry 1995, 34, 12812–12819. 46 R. A. KAMMERER, T. SCHULTHESS, R. LANDWEHR et al. Proc. Natl Acad. Sci. USA 1998, 95, 13419–13424. 47 J. K. MYERS and T. G. OAS, J. Mol. Biol. 1999, 289, 205–209. 48 R. A. KAMMERER, V. A. JARAVINE, S. FRANK et al. J. Biol. Chem. 2001, 276, 13685–13688. 49 J. A. ZITZEWITZ, B. IBARRA-MOLERO, D. R. FISHEL, K. L. TERRY and C. R. MATTHEWS, J. Mol. Biol. 2000, 296, 1105–1116. 50 B. A. KRANTZ and T. R. SOSNICK, Nature Struct. Biol. 2001, 8, 1042–1047. 51 X. CHENG, M. L. GONZALEZ and J. C. LEE, Biochemistry 1993, 32, 8130–8139. 52 S. M. DOYLE, E. H. BRASWELL and C. M. TESCHKE, Biochemistry 2000, 39, 11667–11676. 53 D. H. KIM, D. S. JANG, G. H. NAM et al. Biochemistry. 2000, 39, 13084–13092. 54 M. S. GITTELMAN and C. R. MATTHEWS, Biochemistry 1990, 29, 7011–7020. 55 C. J. MANN, S. XHAO and C. R. MATTHEWS, Biochemistry 1995, 34, 14573–14580. 56 L. M. GLOSS and C. R. MATTHEWS, Biochemistry 1997, 36, 5612–5623. 57 L. M. GLOSS and C. R. MATTHEWS, Biochemistry 1998, 37, 15990–15999. 58 L. M. GLOSS, B. R. SIMLER and C. R. MATTHEWS, J. Mol. Biol. 2001, 312, 1121–1134. 59 A. C. CLARK, S. W. RASO, J. F. SINCLAIR, M. M. ZIEGLER, A. F. CHAFFOTTE and T. O. BALDWIN, Biochemistry 1997, 36, 1891–1899.
References 60 E. D¨URR and H. R. BOSSHARD, Protein Sci. 2000, 9, 1410–1415. 61 R. M. THOMAS, H. WENDT, A. ZAMPIERI and H. R. BOSSHARD, Progr. Colloid. Polym. Sci. 1995, 99, 24–30. 62 D. N. MARTI, M. LU, H. R. BOSSHARD and I. JELESAROV, J. Mol. Biol. 2004, 336, 1–8. 63 D. C. CHAN, D. FASS, J. M. BERGER and P. S. KIM, Cell 1997, 89, 263–273. 64 D. L. BURNS and H. K. SCHACHMAN, J. Biol. Chem. 1982, 257, 8648–8654. 65 M. G. MATEU, M. M. S. Del PINO and A. R. FERSHT, Nature Struct. Biol. 1999, 6, 191–198. 66 C. BODENREIDER, N. KELLERSHOHN, M. E. GOLDBERG and A. MEJEAN, Biochemistry 2002, 41, 14988–14999. 67 C. R. JOHNSON, P. E. MORIN, C. H. ARROWSMITH and E. FREIRE, Biochemistry 1995, 34, 5309–5316. 68 A. R. FERSHT, Curr. Opin. Struct. Biol. 1997, 7, 3–9. 69 T. A. LARSEN, A. J. OLSON and D. S. GOODSELL, Structure 1998, 6, 421–427.
70 C. J. TSAI, S. L. LIN, H. J. WOLFSON and R. NUSSINOV, Protein Sci. 1997, 6, 53–64. 71 V. DAGGETT and A. R. FERSHT, Trends Biochem. Sci. 2003, 28, 18–25. 72 K. S. THOMPSON, C. R. VINSON and E. FREIRE, Biochemistry 1993, 32, 5491–5496. 73 K. BACHHAWAT, M. KAPOOR, T. K. DAM and A. SUROLIA, Biochemistry 2001, 40, 7291–8300. 74 D. APIYO, K. JONES, J. GUIDRY and P. WITTUNG-STAFSHEDE, Biochemistry 2001, 40, 4940–4948. 75 E. BARBAR, B. KLEINMAN, D. IMHOFF, M. LI, T. S. HAYS and M. HARE, Biochemistry 2001, 40, 1596–1605. 76 L. A. WALLACE and H. W. DIRR, Biochemistry 1999, 38, 16686–16694. 77 L. A. WALLACE, N. SLUIS-CREMER and H. W. DIRR, Biochemistry 1998, 37, 5320–5328. 78 L. A. MARKY and K. J. BRESLAUER, Biopolymers 1987, 26, 1601–1620. 79 P. KUZMIC, Anal. Biochem. 1996, 237, 260–273.
33
1
Disulfide Bond Formation in Prokaryotes Jean-Francois Collet, and James C. Bardwell University of Michigan, Ann Arbor, USA
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Oxidation of two cysteine residues leads to the formation of a disulfide bond and the concomitant release of two electrons. The formation of disulfide bonds is a required step in the folding pathway of many secreted proteins. It takes place in the eukaryotic endoplasmic reticulum or in the bacterial periplasm. In contrast, this oxidation reaction is harmful to most cytoplasmic proteins and may lead to protein misfolding and aggregation. Both eukaryotic and prokaryotic cells possess mechanisms to ensure that cytoplasmic cysteines are kept reduced. These mechanisms involve enzymes of the thioredoxin and glutaredoxin systems. This review will focus on disulfide bond formation in bacteria, taking Escherichia coli as a model. In the first two sections we will discuss disulfide bond formation and isomerization in the periplasm. In the third section, we will present some of the techniques that have been used to study thiol-disulfide chemistry.
2 Disulfide Bond Formation in the E. coli Periplasm 2.1 A Small Bond, a Big Effect
In both eukaryotes and prokaryotes, structural disulfide bonds are found in extracytoplasmic proteins. The presence of a covalent bond between two cysteine residues confers a higher stability to these proteins and is often required for their correct folding. One striking example of the importance of disulfide bonds for the correct Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Disulfide Bond Formation in Prokaryotes
Fig. 1 The correct folding of the flagellum component FlgI requires a disulfide bond.
folding of secreted proteins is the E. coli flagellar protein FlgI, a component of the bacterial flagellum. The folding of FlgI requires a disulfide bond. When this disulfide is not introduced in FlgI, FlgI cannot be properly folded and a functional flagellum cannot be assembled [1]. This results in the complete loss of motility as shown in Figure 1. 2.2 Disulfide Bond Formation Is a Catalyzed Process
Disulfide bonds can be formed spontaneously by molecular oxygen according to the following equation: 2 R−SH + 1/2 O2 →RS − SR + H2 O However, this reaction is very slow, occurring on the hour timescale in vitro. Such reaction times are unrealistic in vivo, where proteins are synthesized and folded within minutes. This discrepancy led Anfinsen in the early 1960s to postulate the existence of catalysts of disulfide bond formation. In 1964, he discovered the first catalyst for disulfide bond formation, the eukaryotic protein disulfide isomerase (PDI). PDI is a multi-domain enzyme that is active in the eukaryotic endoplasmic reticulum. The eukaryotic disulfide formation pathway has been recently reviewed by Sevier and Kaiser and will not be discussed here [2]. In E. coli, disulfide bonds are introduced exclusively in the periplasm. The enzymes responsible are members of the Dsb (disulfide bond) protein family: DsbA and DsbB, which are involved in disulfide bond formation, and DsbC, DsbG, and DsbD, which are involved in disulfide bond isomerization.
2 Disulfide Bond Formation in the E. coli Periplasm 3
2.3 DsbA, a Protein-folding Catalyst
DsbA is a 21-kDa soluble protein that is required for disulfide bond formation in the E. coli periplasm. In agreement with the central role of this protein in the oxidative protein-folding pathways, dsbA− strains exhibit pleiotropic phenotypes [3]. For instance, dsbA− mutants are hypersensitive to benzylpenicillin, dithiothreitol, and metals [4, 5]. They show reduced levels of several secreted proteins such as the outer-membrane OmpA and alkaline phosphatase [3]. Like many other oxidoreductases, DsbA has a thioredoxin-like fold and a catalytic CXXC motif. In agreement with DsbA acting as a thiol-oxidase, the CXXC motif is found to be predominantly oxidized in vivo [6]. Biochemical and biophysical studies have shown that the disulfide bond present in DsbA is very unstable [7] and can be rapidly transferred to newly translocated proteins (Figure 2). With a redox potential around −120 mV [7], DsbA is the second most oxidizing protein after DsbB (see below). The oxidizing power of DsbA originates from the low pK a of Cys30, the first cysteine of the CXXC motif [8]. This residue has a pK a of ≈3, which is much more acidic than the usual pK a of cysteine residues (≈9). Due to its low pK a value, Cys30 is in the thiolate-anion state at physiological pH. Structural studies have shown that this thiolate anion, which is highly accessible in the folded protein, is stabilized by electrostatic interactions [9]. The most important electrostatic interactions take place between Cys30 and the third residue of the CXXC motif, His32. Mutations of this single His32 residue increase the stability of the oxidized protein, thereby decreasing the redox potential of DsbA and the reactivity of the enzyme [10]. The stabilization of the thiolate anion is a key characteristic of DsbA, as it drives the reaction towards the reduction of DsbA and the transfer of the disulfide bond to the folding protein. In addition to the thioredoxin-like domain, DsbA contains a second compactly folded helical domain [9]. This domain covers the CXXC active site and contains
Fig. 2 Disulfide bond transfer from DsbA to the target protein. Oxidized DsbA is very unstable and reacts rapidly with unfolded proteins entering the periplasm. The disulfide bond is transferred onto the target protein and DsbA is reduced.
4 Disulfide Bond Formation in Prokaryotes
several residues that form a hydrophobic patch. This patch, together with a long and relatively deep groove running in the thioredoxin domain below the active site, is likely to interact in a hydrophobic manner with unfolded protein substrates [11]. Such hydrophobic interactions have indeed been reported to take place. Using NMR, Couprie et al. showed that DsbA binds a model peptide in a hydrophobic way [12]. In a mixed disulfide between DsbA and the substrate protein Rnase T1, the conformational stability of Rnase T1 is decreased by 5 kJ mol−1 and the conformational stability of DsbA is increased by 5 kJ mol−1 , suggesting that DsbA interacts with the unfolded protein by preferential noncovalent interactions [13]. Proteolysis experiments also showed that oxidized DsbA is more flexible than reduced DsbA. This flexibility might help the oxidized protein to establish interactions with different partner proteins by accommodating the peptide-binding groove to various substrates. On the other hand, the increased rigidity of the reduced form could facilitate the release of oxidized products [14]. 2.4 How is DsbA Re-oxidized?
After the transfer of the disulfide bond to target proteins, DsbA is reduced and therefore incapable of donating a disulfide bond. The protein responsible for the re-oxidation of DsbA is an inner-membrane protein called DsbB [4, 15]. Mutants lacking DsbB accumulate DsbA in the reduced state and exhibit the same pleiotropic phenotypes as dsbA− mutants, indicating that DsbA and DsbB are on the same pathway [15]. This was confirmed in vitro when we purified DsbB to homogeneity and showed that it is able to catalytically oxidize DsbA in the presence of molecular oxygen [16]. DsbB is a 21-kDa protein that is predicted to have four transmembrane segments and two loops protruding into the periplasm. Each of these periplasmic domains contains a conserved pair of cysteine residues: Cys41 and Cys44 in domain 1 and Cys104 and Cys130 in domain 2. These cysteines can undergo oxidationreduction cycles. They are essential for activity, as removal of any one of these cysteines causes the accumulation of reduced DsbA [17–19]. The first pair of cysteine residues (Cys41–Cys44) is present in a CXXC motif, a motif that is often found in thioredoxin-like folds. However, the first periplasmic loop of DsbB contains no other recognizable similarity to thioredoxin, and because it is predicted to be only ≈17 residues long, it is far too short to contain a thioredoxin fold. 2.5 From Where Does the Oxidative Power of DsbB Originate?
In vivo data obtained by Kobayashi et al. suggested that DsbB is recycled by the electron transport chain [20]. We confirmed this hypothesis by successfully reconstituting the complete disulfide bond formation system in vitro [16, 21]. Using purified components, we showed that electrons flow from DsbB to ubiquinone, then onto cytochrome oxidases bd and bo, and finally to molecular oxygen. Under
2 Disulfide Bond Formation in the E. coli Periplasm 5
Fig. 3 Oxidation pathway in the disulfide periplasm. Reduced proteins are oxidized by DsbA. DsbA is re-oxidized by the inner-membrane protein DsbB. Under aerobic conditions, DsbB is re-oxidized by
molecular oxygen in a ubiquinone- and cytochrome oxidase-dependent reaction. Under anaerobic conditions, electrons flow to menaquinone and then to alternative terminal acceptors.
anaerobic conditions, electrons are transferred from DsbB to menaquinone then to fumarate or nitrate reductase (see Figure 3). DsbB is a key element of the disulfide bond formation pathway, as it uses the oxidizing power of quinones to generate disulfides de novo [22]. This is a novel catalytic activity that seems to be the major source of disulfide bonds in prokaryotes. We showed that purified DsbB contains a single, highly specific quinone-binding site and has a KM of 2 µM for quinones [22]. The quinone-binding domain of DsbB seems to be in close proximity to residues 91–97 of the protein [23]. This segment is in the second periplasmic domain of DsbB, in a loop connecting transmembrane helices 3 and 4. Another residue, R48, is also likely to be involved in quinone binding. R48 mutants exhibit a major defect in catalyzing disulfide bond formation in vivo [24]. We purified some of these DsbB mutant proteins and showed that the R48H mutant has a KM value for quinones that is seven times greater than that of wild-type DsbB, indicating that R48 is involved in quinone binding [24]. 2.6 How Are Disulfide Bonds Transferred From DsbB to DsbA?
An interesting question is how disulfide bonds are transferred from DsbB to DsbA. Under aerobic conditions, the CXXC motif of DsbB is very difficult to reduce with dithiothreitol when quinones are present in the preparation [25]. This is one of the major arguments in favor of a direct oxidation of Cys41 and Cys44 by quinones.
6 Disulfide Bond Formation in Prokaryotes
Fig. 4 DsbB transfers electrons by a succession of thiol-disulfide exchange reactions. Step 1: a disulfide bond is transferred from the C-terminal cysteine residues of DsbB (Cys104 and Cys130) to DsbA. Cys104 and Cys130 are reduced and DsbA is oxidized. Step 2: a disulfide bond is transferred
from the N-terminal cysteine residues of DsbB (Cys41 and Cys44) to the C-terminal cysteine residues. Cys104–Cys130 are oxidized and Cys41–Cys44 are reduced. Step 3: Cys41 and Cys44 are re-oxidized by quinone reduction. Electrons are transferred to the electron transport system.
Then, according to a model first proposed by Ito and coworkers [18, 19, 26], the generated disulfide bond is transferred to the cysteine pair Cys104–Cys130 and then onto DsbA (Figure 4). Indeed, a mixed disulfide between Cys30 of DsbA and Cys104 of DsbB has been isolated [18]. This model has been a matter of controversy, and results obtained recently by us and two other labs do not entirely agree with it [26– 28]. However, in a paper published recently, Glockshuber and coworkers reported strong arguments in favor of the proposed original mechanism [29]. They showed that Cys104–Cys130 of a quinone-depleted DsbB can indeed directly transfer a disulfide to DsbA and is then re-oxidized by the Cys41–Cys44 disulfide bond. They determined the redox potential of Cys41–Cys44 and Cys104–Cys130 disulfide bonds of DsbB to be −69 mV and −186 mV, respectively. This high redox potential of the Cys41–Cys44 disulfide bond provides strong support for the above-mentioned model, because this redox potential lies between that of ubiquinone (+100 mV) and that of DsbA. This redox potential makes the Cys41–Cys44 disulfide bond the most oxidizing disulfide ever identified in proteins. The redox potential of Cys104–Cys130 is about 40 mV more negative than the redox potential of DsbA, but the very oxidizing Cys41–Cys44 disulfide bond could
2 Disulfide Bond Formation in the E. coli Periplasm 7
provide the thermodynamic driving force that allows the energetically uphill oxidation of DsbA by the Cys104–Cys130 disulfide in DsbB to occur. 2.7 How Can DsbB Generate Disulfide by Quinone Reduction?
Another very intriguing and still unanswered question is, how can DsbB form disulfide bonds by reducing quinone molecules? We recently showed that purified DsbB has a purple color, presumably due to the presence of a quinhydrone, a purple charge-transfer complex consisting of a hydroquinone and a quinone in a stacked conformation [30]. This finding suggested that the DsbB mechanism involves two quinone molecules. According to our model, one of these quinones is very tightly bound to DsbB, whereas the other one is exchangeable. The resident quinone would be directly involved in disulfide bond formation and would undergo oxidation-reduction cycles: it would be reduced during the generation of a disulfide bond and then re-oxidized by an exchangeable quinone derived from the oxidized quinones pool (see Figure 5).
Fig. 5 Model for the quinone reductase activity of DsbB. The quinone reductase activity of DsbB involves two quinone molecules: a tightly bound quinone and an exchangeable quinone. The two quinones are initially reduced. Step 1: the reduced exchangeable quinone is replaced by an oxidized quinone derived from the quinone pool. A purple quinhydrone-like charge-transfer complex is formed between the reduced and the oxi-
dized quinone molecules. Step 2: Electrons are transferred from DsbA to the C-terminal disulfide in DsbB. Then, electrons are transferred to the N-terminal disulfide of DsbB. Step 3: the N-terminal cysteine residues reduce the resident quinone. The two reduced quinone molecules do not form a stable complex, and the transient quinone is exchanged with an oxidized one.
8 Disulfide Bond Formation in Prokaryotes
3 Disulfide Bond Isomerization 3.1 The Protein Disulfide Isomerases DsbC and DsbG
The powerful oxidant DsbA can introduce nonnative disulfide bonds into proteins with more than two cysteines. It is important to quickly resolve these incorrect disulfides and thus prevent the accumulation of misfolded proteins. Cells therefore contain a “disulfide correction system,” which, in E. coli, involves the two protein disulfide isomerases DsbC and DsbG. The proteins are homologous (≈30% amino acid identity) and share many common properties. They seem to differ, however, in their substrate specificity. We showed, for instance, that RNAse I and MepA, which contain eight and six conserved cysteines, respectively, are substrates for DsbC but not for DsbG [31]. DsbC and DsbG are dimeric proteins with a thioredoxin-like domain and a CXXC active site motif. In contrast to DsbA, the two cysteine residues of these motifs are kept reduced in the periplasm [32]. This allows the proteins to reduce nonnative disulfides and to isomerize them to the correct form. According to the proposed mechanism for DsbC and DsbG action (Figure 6), the first cysteine residue attacks
Fig. 6 Disulfide bond isomerization by DsbG. (A) Reduced DsbG reacts with a nonnative disulfide. (B) A mixed disulfide is formed between DsbG and the target protein. The mixed disulfide is resolved either by attack of another cysteine of the substrate (C) or by attack of the second cysteine of DsbG (D).
3 Disulfide Bond Isomerization 9
a nonnative disulfide bond in the substrate protein, leading to the formation of a mixed disulfide. This mixed disulfide can then be resolved either by the attack of the second active site cysteine of the isomerase or by attack of a third cysteine of the target protein. In the first case, the nonnative disulfide in the substrate protein is reduced and the isomerase is released in the oxidized state. In the second case, alternative disulfide bonds form in the substrate protein and the isomerase is released in the reduced state. DsbC and DsbG have been shown to be able to correct nonnative disulfide bonds both in vitro and in vivo. They have been found to assist in the refolding of eukaryotic recombinant proteins expressed in E. coli, such as bovine pancreatic trypsin inhibitor (three disulfide bonds), RNAse A (four disulfide bonds), and mouse urokinase (12 disulfide bonds) [22, 33–35]. We have recently shown that DsbC is required to fold at least two native E. coli proteins, RNAse I (four disulfides) and MepA (six conserved cysteines) [31]. Curiously, so far in vivo substrates of DsbG are restricted to eukaryotic proteins expressed in E. coli, and its physiological substrates are still unknown. It seems likely that DsbG has a very limited substrate specificity, which makes the identification rather difficult. Alternatively, it is conceivable that DsbG has an in vivo function that is not related to its ability to act on disulfide bonds. Further work is needed to further characterize the function and substrate specificity of DsbC and DsbG. This should help us to understand why E. coli has evolved two homologous protein disulfide isomerases. 3.2 Dimerization of DsbC and DsbG Is Important for Isomerase and Chaperone Activity
An important feature of DsbC and DsbG is the presence of a dimerization domain at the amino-terminus of the protein [36]. Although no catalytically active cysteine residues are present in this domain, this region is important for the isomerase activity of DsbC. This was suggested by limited proteolysis experiments performed on DsbC, which showed that monomeric DsbC lacking the amino-terminal domain is inactive as an isomerase and cannot catalyze the formation of correct disulfide bonds in proteins such as RNase A and bovine pancreatic trypsin inhibitor [36]. In particular, this amino-terminal domain is thought to be important for peptide binding. This was suggested by the crystal structure of DsbC, which showed that DsbC is a V-shaped dimer, with each monomer forming one arm of the V [37]. The dimerization domains form an extended cleft in the DsbC structure. The surface of this cleft is predominantly composed of hydrophobic residues, which are predicted to bind misfolded substrates. These hydrophobic interactions are proposed to stabilize the complex formed between DsbC and the target protein until the target protein reaches a more stable conformation. Depending on the redox state of the CXXC motif, the hydrophobic cleft switches between an open and a closed conformation [38]. This allows DsbC to adapt to the size and shape of the substrate proteins. In addition to the isomerase activity, both DsbC and DsbG have been reported to have a chaperone activity. DsbC can assist in the refolding of lysozyme and promotes the in vitro reactivation of denatured D-glyceraldehyde-3-phosphate dehydrogenase [39].
10 Disulfide Bond Formation in Prokaryotes
DsbG is able to prevent the aggregation of citrate synthase and luciferase in vitro [40]. Dimerization seems to be required for this activity, as truncated DsbC variants that lack the dimerization domain have no chaperone activity [36]. Furthermore, addition of the DsbC amino-terminal domain to DsbA or thioredoxin, two oxidoreductases that do not exhibit chaperone activity, causes their dimerization and confers some chaperone activity [41]. The formation of similar V-shaped dimers with a hydrophobic cleft by these hybrid proteins is likely to be the cause of this new activity. 3.3 Dimerization Protects from DsbB Oxidation
If DsbB were able to oxidize the active-site cysteines of DsbC and DsbG, the proteins would no longer be able to act as disulfide isomerases. However, we and others have found that DsbB is unable to oxidize DsbC and DsbG either in vivo or in vitro [22, 32, 35, 42]. The observed dimerization of DsbC and DsbG appears to be the molecular basis for this discrimination, as disruption of the dimer interface produces a monomeric DsbC protein that is now oxidized by DsbB both in vivo and in vitro and that can substitute for DsbA in the cell [43]. 3.4 Import of Electrons from the Cytoplasm: DsbD
In order to stay reduced and active in the oxidizing environment of the periplasm, DsbC and DsbG have to be kept reduced. This reduction of DsbC and DsbG depends on the protein DsbD (Figure 7). DsbD is an innermembrane protein of 59 kDa. It is composed of three domains (Figure 7): a periplasmic amino-terminal domain (α), a central membranous domain with eight transmembrane segments (β), and a periplasmic carboxy-terminal domain with a thioredoxin-like fold (γ ). Each of these domains harbors a conserved pair of cysteine residues. These six conserved cysteine residues are important for DsbD activity, as removal of any one of them leads to the accumulation of oxidized DsbC and DsbG in the E. coli periplasm [44, 45]. Rietsche et al. showed that the cytoplasmic thioredoxin system is the source of reducing equivalents that keep DsbD reduced [32, 34]. The thioredoxin system, which consists of thioredoxin (TrxA), thioredoxin reductase (TrxB), and NADPH, is responsible for maintaining cytoplasmic proteins in a reduced state. Thus, the function of DsbD is to transfer electrons across the inner membrane, using the cytoplasmic pool of NADPH as a source of electrons. The mechanism by which DsbD transfers electrons across the membrane is still unclear. in vivo and in vitro experiments suggest that electrons are transferred via a cascade of disulfide bond exchange reactions. According to the current model [46], the first step is the reduction of the two conserved cysteine residues within the transmembrane β domain by cytoplasmic TrxA. Then, electrons are successively transferred to the γ and α domains and finally onto DsbC or DsbG. Most of the steps that have been proposed by this model are well characterized, especially the reactions taking place in the cytoplasm and in the periplasm. The isolation of a mixed
3 Disulfide Bond Isomerization 11
Fig. 7 Disulfide bond isomerization pathway in the periplasm. DsbC and DsbG are kept reduced by DsbD. DsbD is kept reduced by the thioredoxin system in the cytoplasm. The proposed direction for the electron flow is shown by the arrows.
disulfide between TrxA and one of the conserved cysteines of the β domain is an argument in favor of a direct interaction between TrxA and the β domain [46]. Using purified components, we showed that electrons flow from the cysteines of the γ domain to the α domain and then to DsbC or DsbG [47]. This was further confirmed by the determination of the redox potential of the cysteine residues in the γ domain (−241 mV) and in the α domain (−229 mV). These lie midway between the redox potential of thioredoxin (−270 mV) and that of DsbC (−134 mV) [47]. The fact that the cysteine residues of the α domain are more reducing than the cysteine residues of the a domain is consistent with the reaction being thermodynamically driven. The structure of the γ domain has been determined, and residues that might be important for the interaction with the α domain have been identified [48]. The interaction between the α domain and DsbC or DsbG is also well characterized. in vitro, cysteines of the α domain can efficiently reduce DsbC and DsbG with a Km of 5 µM, and the structure of a complex between the a domain and DsbC has been solved [38]. The membranous part of the DsbD reaction is still obscure. Removal of the β domain leads to the accumulation of oxidized γ and a domains, which is a strong argument that the β domain is required to transfer electrons to the periplasmic domains [46]. However, how electrons are transferred from the β domain to the γ domain is still unclear. Is it by pure disulfide exchange? Is a cofactor, such as
12 Disulfide Bond Formation in Prokaryotes
a quinone required as with DsbB? So far there is no evidence that the cysteine residues of the β domain interact directly with the cysteine residues of the γ domain, and attempts to isolate mixed disulfides between the two domains have been unsuccessful. Moreover, according to a recent report, both cysteine residues of the β domain are on the cytoplasmic side of the membrane [49]. If this is the case, how can they reduce the cysteines of the γ domain, which are in the periplasm? We purified the β domain and showed that it can transfer electrons from TrxA to the cysteines of the γ domain in vitro [47]. However, the small amount of pure β domain that we were able to produce and the relatively low activity of the preparation prevented us from either proving or ruling out the presence of a cofactor. The important question of how electrons are transferred across the membrane is still unanswered and further work, such as crystallization studies and biochemical characterization of the β domain, is required. 3.5 Conclusions
Before the discovery of DsbA in 1991, it was thought that disulfide bond formation in the periplasm was an uncatalyzed process. Since the discovery of this enzyme, much work has been done and most of the important steps in the pathways of disulfide bond formation and isomerization in the E. coli periplasm have been unraveled. However, some very interesting questions remain unanswered. These mainly concern the mechanism of action of the two membrane proteins of the Dsb protein family, DsbB and DsbD. DsbB generates disulfides de novo via quinone reduction. More work is required to establish exactly how it performs this key reaction, which appears to be the source of the vast majority of disulfides in E. coli. DsbD acts to transport disulfides across the membrane, but how it does so is almost entirely unclear. Characterization of the membranous domain of DsbD will help us to understand how DsbD solves this unique problem. There are many examples of proteins involved in the transport of metals, sugars, or even complex polypeptides across the membranes, and some of these proteins are very well characterized. In contrast, DsbD and its homologues are the only proteins known to transport disulfides, and their mechanism is still obscure. It is not surprising that these two membrane proteins are the least understood of the family of Dsb proteins, as membrane proteins are traditionally difficult to work with. However, both DsbB and DsbD can be purified in large amounts, so further biochemical progress is likely. References 1 DAILEY, F. E. & BERG, H. C. (1993). Mutants in disulfide bond formation that disrupt flagellar assembly in Escherichia coli. Proc. Natl. Acad. Sci. USA 90, 1043–1047. 2 SEVIER, C. S. & KAISER, C. A. (2002). Formation and transfer of disulphide
bonds in living cells. Nat Rev Mol Cell Biol 3, 836–847. 3 BARDWELL, J. C., MCGOVERN, K. & BECKWITH, J. (1991). Identification of a protein required for disulfide bond formation in vivo. Cell 67, 581– 589.
References 4 MISSIAKAS, D., GEORGOPOULOS, C. & RAINA, S. (1993). Identification and characterization of the Escherichia coli gene dsbB, whose product is involved in the formation of disulfide bonds in vivo. Proc Natl Acad Sci USA 90, 7084–7088. 5 STAFFORD, S. J., HUMPHREYS, D. P. & LUND, P. A. (1999). Mutations in dsbA and dsbB, but not dsbC, lead to an enhanced sensitivity of Escherichia coli to Hg2+ and Cd2+ . FEMS Microbiol Lett 174, 179–184. 6 KISHIGAMI, S., AKIYAMA, Y. & ITO, K. (1995). Redox states of DsbA in the periplasm of Escherichia coli. FEBS Lett 364, 55–58. 7 ZAPUN, A., BARDWELL, J. C. & CREIGHTON, T. E. (1993). The reactive and destabilizing disulfide bond of DsbA, a protein required for protein disulfide bond formation in vivo. Biochemistry 32, 5083–5092. 8 GRAUSCHOPF, U., WINTHER, J. R., KORBER, P., ZANDER, T., DALLINGER, P. & BARDWELL, J. C. (1995). Why is DsbA such an oxidizing disulfide catalyst?. Cell 83, 947–955. 9 MARTIN, J. L., BARDWELL, J. C. & KURIYAN, J. (1993). Crystal structure of the DsbA protein required for disulphide bond formation in vivo. Nature 365, 464– 468. 10 GUDDAT, L. W., BARDWELL, J. C., GLOCKSHUBER, R., HUBER-WUNDERLICH, M., ZANDER, T. & MARTIN, J. L. (1997). Structural analysis of three His32 mutants of DsbA: support for an electrostatic role of His32 in DsbA stability. Protein Sci 6, 1893–1900. 11 GUDDAT, L. W., BARDWELL, J. C., ZANDER, T. & MARTIN, J. L. (1997). The uncharged surface features surrounding the active site of Escherichia coli DsbA are conserved and are implicated in peptide binding. Protein Sci 6, 1148–1156. 12 COUPRIE, J., VINCI, F., DUGAVE, C., Qu inverted question markem inverted question MARKENEUR, E. & MOUTIEZ, M. (2000). Investigation of the DsbA mechanism through the synthesis and analysis of an irreversible enzyme-ligand complex. Biochemistry 39, 6732–6742. 13 FRECH, C., WUNDERLICH, M., GLOCKSHUBER, R. & SCHMID, F. X. (1996).
14
15
16
17
18
19
20
21
22
23
Preferential binding of an unfolded protein to DsbA. EMBO J 15, 392–398. VINCI, F., COUPRIE, J., PUCCI, P., QUEMENEUR, E. & MOUTIEZ, M. (2002). Description of the topographical changes associated to the different stages of the DsbA catalytic cycle. Protein Sci 11, 1600–1612. BARDWELL, J. C., LEE, J. O., JANDER, G., MARTIN, N., BELIN, D. & BECKWITH, J. (1993). A pathway for disulfide bond formation in vivo. Proc Natl Acad Sci USA 90, 1038–1042. BADER, M., MUSE, W., ZANDER, T. & BARDWELL, J. (1998). Reconstitution of a protein disulfide catalytic system. J Biol Chem 273, 10302–10307. GUILHOT, C., JANDER, G., MARTIN, N. L. & BECKWITH, J. (1995). Evidence that the pathway of disulfide bond formation in Escherichia coli involves interactions between the cysteines of DsbB and DsbA. Proc Natl Acad Sci USA 92, 9895– 9899. KISHIGAMI, S., KANAYA, E., KIKUCHI, M. & ITO, K. (1995). DsbA-DsbB interaction through their active site cysteines. Evidence from an odd cysteine mutant of DsbA. J Biol Chem 270, 17072–17074. KISHIGAMI, S. & ITO, K. (1996). Roles of cysteine residues of DsbB in its activity to reoxidize DsbA, the protein disulphide bond catalyst of Escherichia coli. Genes Cells 1, 201–208. KOBAYASHI, T., KISHIGAMI, S., SONE, M., INOKUCHI, H., MOGI, T. & ITO, K. (1997). Respiratory chain is required to maintain oxidized states of the DsbA-DsbB disulfide bond formation system in aerobically growing Escherichia coli cells. Proc Natl Acad Sci USA 94, 11857–11862. BADER, M., MUSE, W., BALLOU, D. P., GASSNER, C. & BARDWELL, J. C. (1999). Oxidative protein folding is driven by the electron transport system. Cell 98, 217–227. BADER, M. W., XIE, T., YU, C. A. & BARDWELL, J. C. (2000). Disulfide bonds are generated by quinone reduction. J Biol Chem 275, 26082–26088. XIE, T., YU, L., BADER, M. W., BARDWELL, J. C. & YU, C. A. (2002). Identification of the ubiquinone-binding domain in the
13
14 Disulfide Bond Formation in Prokaryotes
24
25
26
27
28
29
30
31
32
33
34
disulfide catalyst disulfide bond protein B. J Biol Chem 277, 1649–1652. KADOKURA, H., BADER, M., TIAN, H., BARDWELL, J. C. & BECKWITH, J. (2000). Roles of a conserved arginine residue of DsbB in linking protein disulfidebond-formation pathway to the respiratory chain of Escherichia coli. Proc Natl Acad Sci USA 97, 10884–10889. KOBAYASHI, T. & ITO, K. (1999). Respiratory chain strongly oxidizes the CXXC motif of DsbB in the Escherichia coli disulfide bond formation pathway. EMBO J 18, 1192–1198. INABA, K. & ITO, K. (2002). Paradoxical redox properties of DsbB and DsbA in the protein disulfide-introducing reaction cascade. EMBO J 21, 2646–2654. REGEIMBAL, J. & BARDWELL, J. C. (2002). DsbB catalyzes disulfide bond formation de novo. J Biol Chem 277, 32706–32713. KADOKURA, H. & BECKWITH, J. (2002). Four cysteines of the membrane protein DsbB act in concert to oxidize its substrate DsbA. EMBO J 21, 2354–2363. GRAUSCHOPF, U., FRITZ, A. & GLOCKSHUBER, R. (2003). Mechanism of the electron transfer catalyst DsbB from Escherichia coli. EMBO J 22, 3503–3513. REGEIMBAL, J., GLEITER, S., TRUMPOWER, B. L., YU, C. A., DIWAKAR, M., BALLOU, D. P. & BARDWELL, J. C. (2003). Disulfide bond formation involves a quinhydrone-type charge-transfer complex. Proc Natl Acad Sci USA 100, 13779–13784. HINIKER, A. & BARDWELL, J. C. (2004). In vivo substrate specificity of periplasmic disulfide oxidoreductases. J Biol Chem 279, 12967–12973. RIETSCH, A., BESSETTE, P., GEORGIOU, G. & BECKWITH, J. (1997). Reduction of the periplasmic disulfide bond isomerase, DsbC, occurs by passage of electrons from cytoplasmic thioredoxin. J Bacteriol 179, 6602–6608. ZAPUN, A., MISSIAKAS, D., RAINA, S. & CREIGHTON, T. E. (1995). Structural and functional characterization of DsbC, a protein involved in disulfide bond formation in Escherichia coli. Biochemistry 34, 5075–5089. RIETSCH, A., BELIN, D., MARTIN, N. & BECKWITH, J. (1996). An in vivo pathway
35
36
37
38
39
40
41
42
43
44
for disulfide bond isomerization in Escherichia coli. Proc Natl Acad Sci USA 93, 13048–13053. BESSETTE, P. H., COTTO, J. J., GILBERT, H. F. & GEORGIOU, G. (1999). In vivo and in vitro function of the Escherichia coli periplasmic cysteine oxidoreductase DsbG. J Biol Chem 274, 7784–7792. SUN, X. X. & WANG, C. C. (2000). The N-terminal sequence (residues 1–65) is essential for dimerization, activities, and peptide binding of Escherichia coli DsbC. J Biol Chem 275, 22743–22749. MCCARTHY, A. A., HAEBEL, P. W., TORRONEN, A., RYBIN, V., BAKER, E. N. & METCALF, P. (2000). Crystal structure of the protein disulfide bond isomerase, DsbC, from Escherichia coli. Nat Struct Biol 7, 196–199. HAEBEL, P. W., GOLDSTONE, D., KATZEN, F., BECKWITH, J. & METCALF, P. (2002). The disulfide bond isomerase DsbC is activated by an immunoglobulin-fold thiol oxidoreductase: crystal structure of the DsbC-DsbDalpha complex. EMBO J 21, 4774–4784. CHEN, J., SONG, J. L., ZHANG, S., WANG, Y., CUI, D. F. & WANG, C. C. (1999). Chaperone activity of DsbC. J Biol Chem 274, 19601–19605. SHAO, F., BADER, M. W., JAKOB, U. & BARDWELL, J. C. (2000). DsbG, a protein disulfide isomerase with chaperone activity. J Biol Chem 275, 13349–13352. ZHAO, Z., PENG, Y., HAO, S. F., ZENG, Z. H. & WANG, C. C. (2003). Dimerization by domain hybridization bestows chaperone and isomerase activities. J Biol Chem 278, 43292–43298. JOLY, J. C. & SWARTZ, J. R. (1997). In vitro and in vivo redox states of the Escherichia coli periplasmic oxidoreductases DsbA and DsbC. Biochemistry 36, 10067–10072. BADER, M. W., HINIKER, A., REGEIMBAL, J., GOLDSTONE, D., HAEBEL, P. W., RIEMER, J., METCALF, P. & BARDWELL, J. C. (2001). Turning a disulfide isomerase into an oxidase: DsbC mutants that imitate DsbA. EMBO J 20, 1555–1562. STEWART, E. J., KATZEN, F. & BECKWITH, J. (1999). Six conserved cysteines of the membrane protein DsbD are required for the transfer of electrons from the
References cytoplasm to the periplasm of Escherichia coli. EMBO J 18, 5963–5971. 45 CHUNG, J., CHEN, T. & MISSIAKAS, D. (2000). Transfer of electrons across the cytoplasmic membrane by DsbD, a membrane protein involved in thioldisulphide exchange and protein folding in the bacterial periplasm. Mol Microbiol 35, 1099–1109. 46 KATZEN, F. & BECKWITH, J. (2000). Transmembrane electron transfer by the membrane protein DsbD occurs via a disulfide bond cascade. Cell 103, 769–779. 47 COLLET, J. F., RIEMER, J., BADER, M. W. &
BARDWELL, J. C. (2002). Reconstitution of a disulfide isomerization system. J Biol Chem 277, 26886–26892. 48 KIM, J. H., KIM, S. J., JEONG, D. G., SON, J. H. & RYU, S. E. (2003). Crystal structure of DsbDgamma reveals the mechanism of redox potential shift and substrate specificity(1). FEBS Lett 543, 164–169. 49 KATZEN, F. & BECKWITH, J. (2003). Role and location of the unusual redox-active cysteines in the hydrophobic domain of the transmembrane electron transporter DsbD. Proc Natl Acad Sci USA 100, 10471–10476.
15
1
Folding of Membrane Proteins Lukas K. Tamm and Heedeok Hong
University of Virginia, Charlottesville, USA
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The major forces that govern the folding and thermodynamic stability of watersoluble proteins have been studied in significant detail for over half a century [1, 2]. Even though the reliable prediction of protein structure ab initio from amino acid sequence is still an elusive goal, a huge amount of information on the mechanisms by which soluble proteins fold into well-defined three-dimensional structures has been obtained over this long time period [3]. In marked contrast, our current knowledge on thermodynamic and mechanistic aspects of the folding of membrane proteins is several orders of magnitude less advanced than that of soluble proteins, although integral membrane proteins comprise about 30% of open reading frames in prokaryotic and eukaryotic organisms [4]. There are several reasons why the research on membrane protein folding lags far behind similar research on soluble proteins. First, the first structure of a soluble protein (myoglobin) was solved in 1958, but the first membrane protein structure (photosynthetic reaction center) was only determined 27 years later in 1985. Even today, of the approximately 25 000 protein structures in the Protein Structure Databank (PDB), only about 140 (i.e., less than 0.6%) are structures of integral membrane proteins. Second, membrane proteins are more difficult to express, purify, and handle in large batches than soluble proteins. They require the presence of nondenaturing detergents or lipid bilayers to maintain their native structure, which adds an additional layer of complexity to these systems compared with soluble protein systems. The requirement for detergents or membranes also explains why it is more difficult to crystallize membrane proteins or subject them to nuclear magnetic resonance (NMR) for high-resolution structural studies. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Folding of Membrane Proteins
Third, it is very difficult to completely denature membrane proteins [5]. Solvent denaturation with urea or guanidinium chloride (GdmCl) is generally not applicable to membrane proteins because secondary structures that are buried in the membrane are often not accessible to these denaturants. Membrane proteins are also quite resistant to heat denaturation, and even if they can be (partially) denatured, they are prone to irreversible aggregation. These properties have not helped to make membrane proteins favorable targets for folding studies. However, as will be shown in this article, new and different techniques are being developed to study the folding of membrane proteins. If this is done with sufficient care, membrane proteins can be brought back “into the fold” and very useful and much needed information can be gathered on the molecular forces that determine their thermodynamic stability and mechanisms that lead to their native structures. Even though most helical membrane proteins are inserted into biological membranes by means of the translocon [6], knowledge of the energetics and kinetics of membrane protein structure formation are important from a basic science standpoint and for structure prediction and protein engineering of this still neglected class of proteins, which constitutes the largest fraction of all targets of currently available drugs. To understand the folding of membrane proteins, we first need to understand the dynamic structure of the lipid bilayer (i.e., the “solvent”) into which these proteins fold. Fluid lipid bilayers are highly dynamic, yet well-ordered two-dimensional arrays of two layers of lipids with their polar headgroups exposed towards water and their apolar fatty acyl chains shielded from water and facing the center of the bilayer. This structure [7, 8] is maintained almost entirely by the hydrophobic effect [9]. The hydrocarbon chains in the core of the bilayer are quite well ordered, but their order degrades towards the midplane of the bilayer. This region comprises a width of about 30 Å in a “typical” bilayer of dioleoylphosphatidylcholine. The two interfaces of the bilayer are chemically and structurally quite complex and comprise a width of about 15 Å each. These regions contain the phosphate and choline headgroups, the glycerol backbone and about 25 molecules of bound, but disordered water for each phosphatidylcholine (PC) headgroup. The lipids are free to diffuse laterally (∼1 µm2 s−1 ) and rotationally (∼1 µs−1 ), but basically do not flip-flop across the membrane. The molecular packing of lipids in a fluid lipid bilayer is maintained mechanically by a combination of three types of opposing forces: lateral chain repulsions in the core region, headgroup repulsions, and surface tension at the polar–nonpolar interface [10]. These forces create a lateral pressure profile along the membrane normal that cannot be directly measured, but that has been calculated based on the known magnitudes of the relevant forces [11]. This packing and lateral pressure profile has profound consequences on the lipid hydrocarbon chain dynamics [12] and the folding of membrane proteins as will be summarized in later sections of this article. Therefore, the mechanical and dynamical properties as well as the chemical properties of membranes change dramatically as the composition of chemical groupings changes along the bilayer. To dehydrate and bury a single peptide bond in the lipid bilayer is energetically unfavorable and costs 1.15 kcal mol−1 [13]. The energetic gain of folding a
1 Introduction 3
polypeptide into an internally hydrogen-bonded secondary structure in membranes is much smaller (i.e., of the order of −0.1 to −0.4 kcal per residue) [14–17]. This shows that partitioning of the bare backbone (e.g., polyglycine) into membranes is energetically unfavorable irrespective of whether the peptide is folded or not. Productive membrane insertion of a polypeptide segment is only promoted if the segment contains a sufficiently large number of apolar side chains, which by virtue of their favorable partitioning can overcome the energy cost of transferring the peptide backbone into the membrane. As a rule of thumb, the equivalent of a minimum of five leucine substitutions are required to offset the unfavorable energy of trying to insert a 20-residue polyalanine helix (see below). Forming intramolecular hydrogen bonds is moderately favorable at the interface and presumably quite favorable in the apolar core of the bilayer [18]. The energetic cost of opening a hydrogen bond in a membrane likely depends on the concentration of water or the probability of finding another hydrogen-bonding partner in that location. Since water and other hydrogen-bond acceptors are rare in membranes, one generally finds only fully hydrogen-bonded secondary structures in membrane proteins. Therefore, the predominant building blocks of membrane proteins are transmembrane (TM) helices and closed TM β-barrels. Other fully hydrogen-bonded secondary structures such as 310 -helices and π-bulges are rare, but do occasionally occur in membrane proteins. Since the α-helix and closed β-sheets are the main building blocks and since these two elementary secondary structures are rarely mixed in membrane proteins, we treat α-helical and β-barrel membrane proteins (for examples of two prototype membrane protein structures, see Figure 1) in separate sections of this article. These sections are preceded by a general section on the thermodynamics
Fig. 1 Three-dimensional structures of representative "-helical and $-barrel membrane proteins. Left: Bacteriorhodopsin from Halobacterium salinarium (PDB code: 1C3W). Right: Transmembrane domain of
OmpA (1QJP) from Escherichia coli. The aromatic side chains (Trp, Tyr, Phe) are concentrated at the bilayer–water interfaces as shown in both structures.
4 Folding of Membrane Proteins
of side-chain partitioning, which applies to both classes of membrane proteins and which also forms the basis of membrane protein topology and structure prediction algorithms.
2 Thermodyamics of Residue Partitioning into Lipid Bilayers
The partitioning of apolar amino acid side chains into the membrane provides the major driving force for folding of α-helical and β-barrel membrane proteins. Folding of membrane proteins is always coupled to partitioning of the polypeptide from water into the membrane environment. A large number of hydropathy scales have been developed over the years. The different scales have been derived based on very different physical or statistical principles and, therefore, differ for different residues sometimes quite significantly. It is beyond the scope of this article to review the different scales that are in use. The only scales that are based on comprehensive partitioning measurements of peptides into membranes and membrane-mimetic environments are the two Wimley & White (WW) scales, namely the WW interface [19] and the WW octanol scale [13, 20]. The WW interface scale refers to partitioning into the interface region of phosphatidylcholine bilayers and the octanol scale refers to partitioning into octanol, which is thought to be a good mimetic of the core of the lipid bilayer. As has been demonstrated for the hydrophobic core of soluble proteins [21], the dielectric constant in the core of a lipid bilayer may be as high as 10 due to water penetration (i.e., considerably larger than 2 in pure hydrocarbon and closer to that of wet octanol). Both scales are whole-residue scales, which means that they take into account the thermodynamic penalty of also partitioning the peptide backbone into the respective membrane locations. Based on these partition data, one may group the amino acid residues into three groups, namely those that do not favor membranes (Ala, Ser, Asn, Gly, Glu− , Asp− , Asp0 , Lys+ , Arg+ , His+ ), those that favor the interface (Trp, Tyr, Cys, Thr, Gln, His0 , Glu0 ), and those that favor the hydrocarbon core (Phe, Leu, Ile, Met, Val, Pro). Sliding a 19-residue window of summed WW hydropathy values over the amino acid sequences of membrane proteins and using the thermodynamic G = 0 cutoff predicts TM helices with high accuracy [13]. The residues that fall in the first and third group are not surprising. However, the residues that prefer the interface deserve a few comments. Statistical analyses and model studies of membrane proteins have indicated for quite some time that the aromatic residues Trp and Tyr residues have a strong preference for membrane interfaces [22] (see also Figure 1). It has been suggested that their bulkiness, aromaticity and slightly polar character are the major factors that determine their preferred location in the interface [23]. Hydrogen bonding is probably not important for this localization. The moderately polar residues in this group are not surprising. However, the uncharged form of glutamic acid interestingly is apolar enough to localize into the interface. The differential partitioning of glutamate depending on its charge state likely has
3 Stability of $-Barrel Proteins
profound consequences on the pK a value of this residue at membrane surface because the pKa is now coupled to partitioning. A manifestation of this shifted pK a is seen, for example, when the binding and membrane penetration of the fusion peptide of influenza hemagglutinin is examined [16]. This peptide contains two Glu residues in its N-terminal portion. A shift of the pH from 7 to 5 is sufficient to bind the peptide more tightly and more deeply (i.e., deep enough to cause membrane fusion at pH 5, but not at pH 7).
3 Stability of β-Barrel Proteins
All β-barrel membrane proteins whose structures are known at high resolution are outer membrane proteins of gram-negative bacteria [24]. However, several mitochondrial and chloroplast outer membrane proteins, namely those that are part of the protein translocation machinery of these membranes also appear to be β-barrel membrane proteins [25, 26]. Bacterial outer membrane proteins are synthesized with an N-terminal signal sequence and their translocation through the inner membrane is mediated by the translocon consisting of the SecY/E/G complex with the assistance of the signal recognition particle and the SecA ATPase, which together constitute the prokaryotic protein export machinery [27]. Helical membrane proteins that reside in the inner membrane have contiguous stretches of about 20 hydrophobic residues that form TM helices, which presumably can exit the translocon laterally into the lipid bilayer [6]. The TM strands of outer membrane proteins on the other hand are composed of alternating hydrophilic and hydrophobic residues [28]. Therefore, they lack the signal for lateral exit from the translocon into the inner membrane and become secreted into the periplasmic space, where they are presumably greeted by periplasmic chaperones such as Skp and SurA [29, 30]. Skp binds unfolded outer membrane proteins, prevents them from aggregation, but does not appear to accelerate their insertion into lipid model membranes (D. Rinehart, A. Arora, and L. Tamm, unpublished results). The outer membrane proteins of gram-negative bacteria may be grouped into six families according to their functions [31]. Structures of members of each of these families have been solved by X-ray crystallography or NMR spectroscopy. The six families with representative solved structures are: (i) the general porins such as OmpF and PhoE [32, 33], (ii) passive sugar transporters such as LamB and ScrY [34, 35], (iii) active transporters of siderophores such as FepA, FecA, and FhuA [36– 39] and of vitamin B12 such as BtuB [40], (iv) enzymes such as the phospholipase OmpLA [41], (v) defensive proteins such as OmpX [42], and (vi) structural proteins such OmpA [43]. All outer membrane proteins form β-barrels. The β-barrels of these proteins consist of even numbers of β-strands ranging from 8 to 22. The average length of the TM β-strands is 11 amino acid residues in trimeric porins and 13–14 residues in monomeric β-barrels. Since the strands are usually inclined at about 40◦ from the membrane normal, they span about 27–35 Å of the outer membrane, respectively.
5
6 Folding of Membrane Proteins
Most outer membrane proteins have long extracellular loops and short periplamsic turns that alternatingly connect the TM β-strands in a meandering pattern. The loops exhibit the largest sequence variability within each outer membrane protein family [31]. The largest and smallest outer membrane proteins are monomeric, but others such as OmpLA are dimers and the porins are trimers of β-barrels. Two general approaches have been taken to examine the thermodynamic stability of β-barrel membrane proteins: solvent denaturation by urea and guanidinium chloride and differential scanning calorimetry (DSC). In the following, we will discuss the thermodynamic stability of the simple monomeric β-barrel protein OmpA and then extend the discussion to complexities that arise by the presence of the central plug domain of FepA and FhuA and trimer formation in the case of porins. The most extensive thermodynamic stability studies have been carried out with the outer membrane protein A (OmpA) of Escherichia coli. OmpA is an abundant structural protein of the outer membrane of gram-negative bacteria. Its main function is to anchor the outer membrane to the peptidoglycan layer in the periplasmic space and thus to maintain the structural integrity of the outer cell envelope [44]. The 325-residue protein is a two-domain protein whose N-terminal 171 residues constitute an eight-stranded β-barrel membrane-anchoring domain and whose C-terminal 154 residues form the periplasmic domain that interacts with the peptidoglycan. The three-dimensional structure of the TM domain has been solved by X-ray crystallography [43] and, more recently, by solution NMR spectroscopy [45]. The structure resembles a reverse micelle with most polar residues facing the interior, where they form numerous hydrogen bonds and salt bridges, and the apolar residues facing the lipid bilayer [43]. Two girdles of aromatic side chains, tryptophans and tyrosines, delineate the rims of the barrel where they interact with the two membrane interfaces (Figure 1). Extensive mutagenesis studies show that OmpA is quite robust against many mutations especially in the loop, turn, and lipid bilayer-facing regions [46]. OmpA can even be circularly permutated without impairing its assembly and function in outer membranes [47]. Dornmair et al. [48] showed that OmpA could be extracted from the outer membrane and denatured and solubilized in 6–8 M urea and that the protein spontaneously refolded into detergent micelles by rapid dilution of the denaturant. They subsequently showed that urea-unfolded OmpA can also be quantitatively refolded into preformed lipid bilayers by rapid dilution of urea [49]. Recently, we could achieve the complete and reversible refolding of OmpA in lipid bilayers [50]. The reaction was shown to be a coupled two-state membrane partition folding reaction with the unfolded state in urea being completely dissociated from the membrane and the folded state completely integrated into the membrane (Figure 2). This represents the first example of an integral membrane protein, for which the thermodynamic stability could be quantitatively accessed. To achieve this goal, a urea-induced equilibrium folding system was developed for OmpA. The system consisted of small unilamellar vesicles composed of 92.5 mol% phos;phatidylcholine and 7.5 mol% phosphatidylglycerol in weakly basic (pH 7.0–10) and low ionic strength conditions. These conditions facilitate both the folding and unfolding reactions and dissociation of the unfolded form from the
3 Stability of $-Barrel Proteins
Fig. 2 Cartoon depicting the folding of OmpA into lipid bilayers. Left path: Folding into most bilayers is a thermodynamic two-state process. Right path: Folding into thin bilayers is multi-state (i.e., at least one equilibrium intermediate occurs). Bilayer forces acting on OmpA folding are indicated with arrows. The large black arrows indicate lateral bilayer pressure imparted on the lipid/protein interface in the hydrophobic core of the bilayer. Increasing this pressure increases the thermodynamic stability of the protein. The small black arrows in-
dicate lipid deformation forces caused by hydrophobic mismatch between the protein and unstressed bilayers. These forces decrease the thermodynamic stability of the protein. Water molecules penetrate more easily into the hydrophobic core of thin bilayers (blue arrows) and stabilize equilibrium intermediates until, in very thin bilayers, complete unfolding is no longer observed. The unfolded state in urea is dissociated from the membrane. Adapted from Ref. [50].
membrane surface. Folding of OmpA is monitored by tryptophan fluorescence or CD spectroscopy, or by sodium dodecylsulfate polyacrylamide gel electrophoresis (SDS-PAGE) (Figure 3). Each of these techniques monitors a different aspect of the folding reaction. Trp fluorescence reports on the insertion of tryptophans into the lipid bilayer, far-UV CD on the formation secondary structure, and SDS-PAGE on the completion of the tertiary β-barrel structure. If samples are not boiled prior to loading onto the SDS gels, they run at an apparent molecular mass of 30 kDa if the protein is completely folded, but at 35 kDa if it is unfolded or incompletely folded. This shift on SDS gels has proven to be a very useful assay for tertiary structure formation of OmpA and other outer membrane proteins [51]. Complete refolding as measured by the SDS-PAGE shift correlates with the reacquisition of the ion
7
8 Folding of Membrane Proteins
Fig. 3 Unfolded fraction of OmpA as a function of urea concentration obtained from the average fluorescence emission wavelength 8, far-UV CD spectroscopy, and the SDS-PAGE shift assay. These mea-
sures of lipid binding, secondary structure, and tertiary structure, respectively, superimpose in equilibrium measurements although they develop at different times in kinetic experiments. Adapted from Ref. [50].
channel activity of OmpA [52]. Since unfolding/refolding curves as a function of denaturant concentration measured by three techniques that report on vastly different kinetic processes superimpose, it is very likely that folding/unfolding is a thermodynamic two-state process that can be analyzed using a two-state equilibrium folding model. From such an analysis, Hong and Tamm [50] found a free energy of folding G◦ H2O = −3.4 kcal mol−1 and an m-value, which describes the cooperativity parameter of the folding/unfolding reaction, of m = 1.1. The rather small G◦ H2O of OmpA, which is of the same order of magnitude as that of water-soluble proteins, is perhaps surprising in view of the quite extreme heat resistance of this and other β-barrel membrane proteins. However, if one simply calculates the free energy of transfer of all residues that are transferred into the lipid bilayer with the augmented WW hydrophobicity scale, one finds that the net G◦ H2O amounts to only about –1 kcal mol−1 . This value may be further decreased by adding a few (negative) kcal mol−1 for folding, which would bring the prediction close to the experimentally determined value. The folding contribution may be the sum of contributions for secondary structure formation, estimated to be about 80 × −0.2 kcal mol−1 , and on the order of 10 kcal mol−1 of entropy and other costs for packing polar residues in the core of the protein. The take-home message from these measurements and theoretical considerations is that the overall stability of membrane proteins is not as large as one might have anticipated, but rather similar in magnitude to that of soluble proteins of similar size. As is true for soluble proteins, the thermodynamic stability of this and perhaps most membrane proteins is determined by the sum of many relatively large thermodynamic contributions that ultimately cancel to yield a relatively small net free energy of folding. The study by Hong and Tamm [50] also examined the effect of lipid bilayer forces on the thermodynamic stability of OmpA. The effects of elastic lipid deformation
3 Stability of $-Barrel Proteins
Fig. 4 Dependence of A) the free energy of folding G◦ H2O and B) the m-value reflecting the cooperativity of folding on the hydrophobic thickness of phosphatidylcholine bilayers. Filled circles: saturated acyl chain
series with small lateral bilayer pressure. Open circles: double-unsaturated acyl chain series with increasing lateral bilayer pressure as hydrophobic thickness decreases. Adapted from Ref. [50].
due to a mismatch of the hydrophobic length of the protein compared with the bilayer thickness and due to curvature stress imposed by cone-shaped lipids were investigated. Increasing the thickness of the lipid bilayer increases the stability (i.e., decreases G◦ H2O ) of OmpA. It also increases the cooperativity (m-value) of folding (Figure 4). The decrease in G◦ H2O is −0.34 kcal mol−1 per Å of increased bilayer thickness, which converts to −4 cal mol−1 per Å2 of increased hydrophobic contact area. This is only about 20% of the standard value for the hydrophobic effect [9]. We conclude that elastic energy due to lipid deformation counteracts the energy gain that would be expected for a full development of the hydrophobic effect. Including cone-shaped lipids with a smaller cross-sectional headgroup than acyl chain area increases the lateral pressure within lipid bilayers [53]. Increasing the lateral pressure in membranes by including lipids with increasing relative amounts
9
10 Folding of Membrane Proteins
Fig. 5 Effect of increasing mol fraction of phosphatidylethanolamine (C16:0 C18:1 PE) in phosphatidylcholine (C16:0 C18:1 PC) bilayers on the thermodynamic stability of OmpA. A) Unfolding curves measured by Trp fluo-
rescence. B) Dependence of G◦ H2O and m-value of OmpA folding on the coneshaped and lateral pressure-inducing lipid C16:0 C18:1 PE. Adapted from Ref. [50].
of cis double bonds in the hydrocarbon region increased the stability (i.e., decreased G◦ H2O ) of OmpA from −3.4 to −7 kcal mol−1 and more negative values when phosphatidylethanolamines were included as cone-shape lipids (Figure 5). The mvalue exhibited an increase that was correlated with the G◦ H2O decrease, in the case of bilayer thickening, but a decrease when the bilayer pressure was increased with cis double bonds. The various bilayer forces that act on OmpA and modulate its thermodynamic stability in membranes are illustrated in Figure 2. Gram-negative bacteria possess active transport systems for the uptake of iron–siderophore complexes, vitamin B12, and other essential nutrients. The systems consist of active ATP-requiring transporters in the inner membrane, the TonB linker protein that spans the periplasm, and a family of proteins called TonBdependent transporters in the outer membrane. The TonB-dependent transporters share common structural features. They are monomeric 22-stranded β-barrels. The lumina of these large-diameter barrels are filled with a N-terminal “cork” or “plug” domain with a globular mixed α-helix/β-structure. The plug domains are tightly inserted into the barrels and make extensive salt bridge and hydrogen bond contacts
3 Stability of $-Barrel Proteins
with the inner barrel wall. The mechanism of substrate transport through these transporters is presently not well understood. A thermal denaturation study using DSC of FhuA demonstrated that the plug domain and the surrounding β-barrel are autonomous folding units [54]. In the absence of bound substrate (ferrichrome), a reversible transition centered at 65 ◦ C was well separated from an irreversible transition centered at 74 ◦ C. The binding of the substrate increased the lower transition temperature to 71 ◦ C, while the higher transition temperature was not changed. Since the higher temperature transition was accompanied by a significant change of the CD signal at 198 nm and 215 nm, this transition was assigned to the denaturation of the β-barrel. The lower temperature transition was assigned to the denaturation of the plug domain and neighboring loops emerging from the β-barrel because antibodies targeting unfolded outer loops bound to FhuA only above the lower transition temperature of 65 ◦ C. A deletion mutant (21–128) that lacked the plug domain underwent a thermal denaturation at a lower temperature (T m = 62 ◦ C) than the wild-type protein, implying that the presence of the plug stabilizes the barrel structure. The stability of FepA in Triton-X100 micelles was probed by solvent-induced denaturation and EPR spectroscopy [55, 56]. In these studies, FepA was functionally refolded from the denatured state by dialysing the denaturant in the presence of TX-100 micelles. When GdmCl- and urea-induced denaturation was monitored by site-directed spin-label EPR spectroscopy at an extracellular loop site, unfolding occurred in a sharp transition at 2.0 M GdmCl or 5.5 M urea, respectively. The free energies of unfolding were approximately 6 kcal mol−1 with both denaturants. The rate of unfolding of the substrate-bound protein was significantly smaller than that of the free protein. This confirms that the substrate has a stabilizing effect on FepA as has also been observed by DSC with FhuA. When residues pointing towards the center of the barrel were spin-labeled, their EPR spectra indicated completely mobile residues in 4 M GdmCl. However, residues facing the detergent were still quite immobilized at this GdmCl concentration. The authors argued that denatured FepA retained substantial residual hydrophobic interactions with the detergent micelle. This observation is reminiscent of residual hydrophobic interactions that have been observed in soluble proteins at high denaturant concentrations [57]. However, in these studies with FepA, the degree of denaturation was unfortunately not recorded by a global method such as CD spectroscopy and therefore, it is not clear whether the transition is two-state and whether fully denaturing conditions have been reached in this work. Porins facilitate the general diffusion of polar solutes (< 400 Da) across the outer membranes of gram-negative bacteria. The porins are composed of 16-stranded β-barrels that assemble into rigid homotrimers whose trimer interfaces are formed by close van der Waals packing of nonpolar residues and aromatic ring stacking interactions [32, 33]. The lumen of each monomer is aqueous and surrounded by hydrophilic residues. An interesting structural and functional feature of the porins is that the long loop L3 folds back into the lumen of the channel where it engages in a hydrophobic contact with a few residues inside the barrel. The loop forms a constriction in the channel and its acidic residues together with a cluster of basic
11
12 Folding of Membrane Proteins
residues on the opposite channel wall are thought to exert a transverse electric field across the channel opening. The shorter loop L2 forms a “latch” that reaches over from one subunit to a neighboring subunit and thereby further stabilizes inter-subunit interactions. Porins are extremely stable towards heat denaturation, protease digestion, and chemical denaturation with urea and GdmCl [31]. For example, extraction of OmpF from the E. coli outer membrane requires the heating of outer membranes in mixtures of isopropanol and 6 M GdmCl at 75 ◦ C for more than 30 min [58]. The trimer structure itself is maintained up to 70 ◦ C in 1% SDS [59]. A mutagenesis study combined with DSC and SDS-PAGE shift assays demonstrates that the inter-subunit salt bridge and hydrogen bonding interactions involving loop L2 contribute significantly to the trimer stability [59]. For example, mutations breaking the salt bridge between Glu71 and Arg100 decrease the trimer–monomer transition temperature from 72 to 47–60 ◦ C and Hcal from 430 to 200–350 kcal mol−1 . A similar behavior was observed when residues 69–77 of L2 were deleted. The energetic contribution of the nonpolar contact between neighboring barrel walls has not yet been investigated in porins, but would be interesting to examine and compare with lateral associations of interacting TM helices (see below).
4 Stability of Helical Membrane Proteins
In 1990, Popot and Engelman proposed a two-stage model for the folding of αhelical membrane proteins [60]. In the framework of this model, TM helices are thought to be independently folded units (stage I) that may laterally associate into TM helix bundles (stage II). The model was inspired by the finding of the same group that the seven-TM-helix protein bacteriorhodopsin (BR) can be split into helical fragments that would still integrate as TM helices and then laterally assemble into a seven-helix protein that was capable of binding to cofactor retinal just as the wild-type protein [61]. Since then, it has been realized that more stages are needed to describe the folding of helical proteins. The interface is a membrane compartment that may be important in early stages of folding [62]. In addition, cofactors and polypeptide hinges and loops can be additional factors that sometimes play important roles in late stages of helical membrane protein folding [63]. Although the real world of membrane proteins is clearly more complex, the two-stage model has some merits because it is conceptually simple and because it appears to correctly describe the hierarchy of folding of at least a few simple membrane proteins. Therefore, we describe in this section how individual TM helices are established and then proceed in the following section with lateral helix associations and higher order structural principles of membrane protein folding. As indicated above, open hydrogen bonds are energetically unfavorable in membranes. Sequence segments with a sufficient number of apolar residues are driven
5 Helix and Other Lateral Interactions in Membrane Proteins
into membranes by the hydrophobic effect where they fold into TM or interfacial α-helices. The thermodynamic gain of folding a 20-residue peptide into a TM helix is about −5 kcal mol−1 , if we take −0.25 kcal mol−1 as the per-residue increment of forming an helix in membranes [16]. Generally and in contrast to soluble protein α-helices, TM helices are quite rich in glycines. Prolines are also not uncommon in TM helices, where they often induce bends, but do not completely break the helix. Finally, serines, threonines, and cysteines are frequently interspersed into otherwise apolar TM sequences. These latter residues often form hydrogen bonds with backbone carbonyls and may therefore have a more apolar character in TM helices than in aqueous environments. It is generally not possible to obtain interpretable results on the stability of helical integral membrane proteins from thermal denaturation studies because thermal denaturation of these proteins usually leads to irreversibly denatured states [5]. However, the thermodynamic stabilities of a few polytopic helical membrane proteins have been studied by partial denaturation with strong ionic detergents. Early refolding studies by Khorana and coworkers showed that BR could be quantitatively refolded by step-wise transfer from organic solvent to SDS to renaturing detergents or lipids [64, 65]. These protocols were later refined to measure the kinetics of refolding of BR upon transfer from the ionic detergent SDS into mixed lipid/detergent micelles [66, 67]. It should be kept in mind that although tertiary contacts are lost, most secondary structure is retained or perhaps even partially refolded in SDS. Therefore, the reactions observed with these protocols report only on the last stages of membrane protein folding. In 1997, Lau and Bowie developed a system to reversibly unfold native diacylglycerol kinase (DAGK) in n-decyl-β-maltoside (DM) micelles by the addition of SDS [68]. In these carefully controlled studies, the ratios of the denaturing and native structure-supporting detergents were systematically varied, so that reversibility of the partial unfolding reaction could be demonstrated. By analyzing their data with the two-state model of protein folding, these authors determined the stability of the three-helix TM domain of DAGK in DM relative to SDS to be −16 kcal mol−1 . This large (negative) value is somewhat surprising and indicates an extremely high thermodynamic stability of DAGK. A similar study has been carried out recently with BR [69].
5 Helix and Other Lateral Interactions in Membrane Proteins
The packing density and the hydrophobicity of amino acid residues involved in the intramolecular contact of α-helical membrane proteins are similar to those in the hydrophobic cores of water-soluble proteins. The basic principles of the hydrophobic organization of residues in water-soluble proteins appear to hold true also for membrane proteins. However, the energy gain of the hydrophobic effect has already been expended on inserting individual α-helices into membranes and cannot be a driving force for helix association. From comprehensive mutagenesis work on the TM domain of glycophorin A [70] and later from genome-wide
13
14 Folding of Membrane Proteins
database searches [71], it has become clear that tight van der Waals contacts between complementary helix surfaces provide a strong driving force for lateral helix association in membranes. This force is perhaps best understood if one considers the possibility that lipids even in their fluid liquid-crystalline state do not conform as well to corrugated exposed helix interaction sites as the complementary helix binding partner. Lipids also gain entropy upon release from a helix surface into the bulk liquid-crystalline bilayer (“lipophobic effect”). Since thermodynamic driving forces are always determined by the energetic differences between two states, it is the unfavorable lipid solvation of corrugated helix surfaces that ultimately drives helix association. Helices of membrane proteins are rarely oriented exactly perpendicular to the plane of the membrane. They are frequently tilted by about 20◦ from the membrane normal in structurally rugged membrane proteins [72] and by much larger angles in some recently solved structures of ion channels and complex transporters. TM helices interact with each other with left- or right-handed crossing angles [72, 73]. The specificity of helix interactions in membranes is thought to arise from complementary knob-in-the-hole van der Waals interactions. Glycines provide the holes and the β-branched side-chains of valine and isoleucine provide the most frequent knobs on the complementary surfaces. The generality of this interaction is supported by the frequent occurrence of GxxxG motifs in interacting TM segments [71]. The interaction energy of the helix dimer of the glycophorin A TM domain has been measured in C8E5 micelles by analytical ultracentrifugation and was found to be −9 kcal mol−1 [74]. Although this measurement offers a rare and important glimpse on the thermodynamics of TM helix-helix interactions, the value was determined in detergent micelles and may be different in lipid bilayers. Subsequent alanine scanning experiments indicated that mutation of Gly79 and Gly83 resulted in the largest energy penalty (1.5–3 kcal mol−1 relative to wild-type) for dimer formation of glycophorin [75]. The helix interaction “knobs” Leu75 and Ile76 also cost more than 1 kcal mol−1 in stabilization energy when replaced by alanines. Another alanine scanning mutagenesis study examined the effect of all side chains of helix B of BR on the thermodynamic stability of helix interactions in this protein [69]. The most destabilizing mutations were Y57A (3.7 kcal mol−1 relative to wt) and T46A (2.2 kcal mol−1 relative to wt). These were followed by I45A, F42A, and K41A (1.9–1.6 kcal mol−1 ). Apparently, very different types of side chains (polar, apolar, aromatic, charged) can cause large effects. These studies show that helix packing interactions may be more complex (and less predictable) than previously thought based on extrapolations from the glycophorin paradigm. A remarkable and important result from the Faham et al. [69] study is that each Å2 of buried surface area yields about 26 cal mol−1 of stabilizing energy. This is about the same number as observed for the burying of residues in soluble proteins, which is usually thought to be driven by the hydrophobic effect. Since the hydrophobic effect is already consumed by inserting side chains into a hydrophobic environment (SDS micelle) and since polar residues essentially yield the same number, the 26 cal mol−1 per Å2 , packing energy may be more universal than previously thought and may reflect van der Waals, hydrophobic, and lipophobic interactions. Alternatively,
6 The Membrane Interface as an Important Contributor to Membrane Protein Folding 15
water may access helical surfaces in partially denaturing SDS micelles better than in native structure-supporting DMPC/CHAPSO micelles, in which case the result may still conform to the classical hydrophobic effect. An additional mechanism of helix interaction is thought to arise from interhelix hydrogen bonds. Side chain-to-side chain (Asn, Gln, Asp, Glu, His) hydrogen bonds have been introduced into engineered TM helical bundles and were found to stabilize homodimers and homotrimers [76–79]. Ser and Thr side chains were also examined in these host-guest model sequences, but were insufficient to support helix association. It has also been suggested that backbone Cα-H to backbone carbonyl hydrogen bonds might stabilize membrane proteins [80]. However, although such hydrogen bonds are observed quite frequently in crystal structures of membrane (and soluble) proteins, they may merely allow close packing and may not make significant energetic contributions to the association of neighboring helices as has been demonstrated for one such bond in BR [81]. Pore loops are found to intercalate into open angled helical membrane protein structures as exemplified by the KcsA potassium channel structure [82]. The loops are generally also fixed to the helical framework of the protein by side chain-to-side chain hydrogen bonds, namely a hydrogen bond from a loop Tyr to a neighboring helix Trp and Thr in the case of KcsA. Finally, many membrane protein structures contain cofactors and pigments that likely contribute significantly to the stability and interaction specificity of these structures. Intersubunit interactions that determine quaternary structures of membrane proteins generally follow the same helix packing rules as those determining their tertiary structures [83]. In case of BR, van der Waals packing between nonpolar surfaces, hydrogen bonding, and aromatic ring stacking of tyrosines in the bilayer interface all contribute to trimer formation.
6 The Membrane Interface as an Important Contributor to Membrane Protein Folding
In this and the following sections we turn to pathways and possible intermediates of membrane protein folding. Jacobs and White [84] proposed a three-step thermodynamic model of membrane protein folding based on structural and partitioning data of small hydrophobic peptides. In this model, the first three steps of membrane protein folding are thought include partitioning into the interface, folding in the interface, and translocation across the membrane to establish a TM helix. The model was later extended to a four-step model, which added helix–helix interactions as the fourth step [62]. The model is rather a thermodynamic concept than an actual pathway of folding of constitutive helical membrane proteins because their TM helices are inserted with the assistance of the translocon in biological membranes. However, the concept is relevant for the insertion of helical toxins into membranes, which occurs without the assistance of other proteins, and for the placement of interfacial helices into membranes. Separation of a partition and folding step in the interface is probably also mostly semantic because in all observable reactions these two processes appear to be coupled. Nevertheless, to think of protein folding as a
16 Folding of Membrane Proteins
coupled process of two components is useful for separating the thermodynamic contributions of each component [14–16]. Partition-folding coupling was observed more than 20 years ago for toxic peptides [85], peptide hormones [86], signal peptides [87, 88], and membrane fusion peptides [89]. Beta-sheet forming proteins and peptides may also insert and fold in the membrane interface before they become completely inserted into and translocated across the membrane [90–93]. An example of the respective contributions of G; H, and S to partitioning and folding of an α-helical fusion peptide in lipid bilayers is shown in Figure 6.
Fig. 6 Thermodynamics of partition-folding of the helical fusion peptide from influenza hemagglutinin. The lighter bars represent contributions from partitioning and the darker bars represent contributions from the coil → "-helix transition to G; H, and TS upon peptide insertion into mixed phosphatidylcholine/phosphatidylglycerol bi-
layers. P8 through P20 are 8- to 20-residue peptides of the same sequence, and G1S and G1V are single amino acid mutations in position 1 of P20 that cause hemi-fusion or are defective in causing fusion, respectively. The numbers in the bars refer to the respective energies in kcal mol−1 . Adapted from Ref. [16].
8 Mechanisms of $-Barrel Membrane Protein Folding
7 Membrane Toxins as Models for Helical Membrane Protein Insertion
There are several proteins that assume globular structures in solution, but upon interaction with a membrane receptor or at low pH insert into and sometimes translocate across membranes. Among them are several colicins, which are translocated across outer bacterial membranes via porins and subsequently inserted into the inner membrane where they form toxic pores. Other toxins such as diphtheria, tetanus, and botulinum toxins spontaneously insert their translocation domains into mammalian cell membranes and thereby permit the translocation of linked catalytic domains into the cells. Since these proteins form helical structures in solution and in membranes, they are good models to study the refolding of soluble helical proteins into membranes. The channel-forming colicins are perhaps in this regard the best-studied bacterial toxins. The structures of the soluble forms of the channel domains of several colicins are known [94]. They consist of a central hydrophobic helical hairpin that is surrounded by eight amphipathic helices. Refolding at the membrane interface has been implicated in many models of colicin insertion into lipid bilayers. An early study found a molten globule intermediate at an early stage of colicin A insertion into membranes [95]. The helices were formed, but tertiary contacts were not. Subsequent fluorescence studies indicated that the helices of colicin E were located in the interface, but widely dispersed in this state [96, 97]. The central hydrophobic helical hairpin of colicin A appears to be more deeply inserted, but does not assume a complete TM topology [98]. How the voltage-gated ion channel is opened from this stage is still unclear. Apparently, long stretches of sequence translocate across the membrane upon the application of a membrane potential by a process that is still poorly understood structurally and mechanistically [99, 100]. The process of membrane insertion of the translocation domain of diphtheria toxin likely resembles that of the colicins. The structure of the soluble form features a hydrophobic helical hairpin similar to that of the colicins [101]. It is thought that this hairpin inserts first into the lipid bilayer. The membrane-bound protein can exist in two conformations with an either shallow or deeply inserted helical hairpin [102]. Similar to the colicins, diphtheria toxin switches between two very different topologies in membranes and their relative populations depend on the applied membrane potential [103]. Again, the translocation mechanism is only poorly understood at the present time. However, despite many still unexplored details, the bacterial toxins appear to be an interesting class of facultative membrane proteins, from which we expect to learn a great deal about the mechanisms and energetics of refolding helical proteins in lipid bilayers.
8 Mechanisms of β-Barrel Membrane Protein Folding
The mechanism of folding and membrane insertion of the eight-stranded β-barrel protein OmpA has been studied by a number of kinetic experiments. The kinetics of
17
18 Folding of Membrane Proteins
folding into lipid bilayers composed of dimyristoylphosphatidylcholine in the fluid phase were found to be rather slow (i.e., on the order of many minutes at 30 ◦ C) [104]. A more detailed kinetic study carried out with dioleoylphosphatidylcholine (DOPC) lipid bilayers in the 2–40◦ C temperature range revealed three distinct kinetic phases [105]. The fastest phase detected by tryptophan (Trp) fluorescence changes had a time constant of 6 min and was rather independent of temperature. This phase was attributed to the initial binding of the unfolded protein to the bilayer surface. A second phase was strongly temperature dependent and had time constants in the 15 min to 3 h range (40-2 ◦ C). The activation energy determined from an Arrhenius plot was 11 kcal mol−1 . This phase presumably corresponds to a deeper insertion, but not yet complete translocation of the β-strands in the lipid bilayer. The slowest phase was observed by the SDS gel-shift assay, which reports on the completion of the β-barrel. Complete folding in DOPC bilayers was only observed at temperatures greater than 30 ◦ C, had a time constant of about 2 h, and took about 6 h to go to completion at 37 ◦ C. The translocation process of Trps of OmpA across the lipid bilayer was monitored by time-resolved Trp fluorescence quenching (TDFQ) [106]. In this technique, quenchers of Trp fluorescence are placed at different depths into the membrane and the time course of passage of Trps past these zones of quenchers is followed. When the technique was applied to single Trp mutants that were placed on different membrane-crossing β-hairpins, it was found that all four β-hairpins of OmpA crossed the membrane following the same time course (i.e., using a synchronized concerted mechanism of folding and insertion) [107]. This result is in accordance with the notion that interstrand hydrogen bonds and the barrel itself have to form while the protein translocates across the membrane. Again and very similar to the partition–folding coupling that was discussed in the context of helical membrane protein folding at membrane interfaces, we observe a folding–translocation coupling for this class of membrane proteins. Thus the mechanism of β-barrel membrane protein folding in lipid bilayers is very different from the two (or more) stage models discussed above for helical membrane protein folding. The requirement that the barrel needs to be completely folded in order to translocate across the membrane probably also explains why the time constants of this process are so slow and the activation energies are so high. Folding and translocation of the barrel requires a large defect to be created in the membrane in order to insert a barrel of the size of OmpA. Moreover, the membrane provides a 100- to 1000-fold more viscous environment than water for folding and inserting membrane proteins. In another kinetic study, the dependence of the rates of folding and insertion of OmpA on the bilayer thickness was examined [108]. As one might expect from the material properties of lipid bilayers, the folding and insertion rates increased significantly as the bilayer thickness decreased. When the bilayers were sufficiently thin, folding of OmpA into large unilamellar vesicles was observed, whereas in average or thick bilayers complete folding and insertion occurs only in small unilamellar vesicles. Small vesicles are more strained and therefore exhibit more defects than large vesicles, which permits the quantitative refolding of OmpA in small, but not in large unilamellar vesicles if the hydrocarbon thickness of the bilayer is more
9 Experimental Protocols 19
than 20 Å (i.e., that of diC12 -phosphatidylcholine). Interestingly, the hydrophobic thickness of outer bacterial membranes is thought to be thinner than that of the inner membranes [109]. 9 Experimental Protocols 9.1 SDS Gel Shift Assay for Heat-modifiable Membrane Proteins
The heat modifiability can be defined as a property of a protein whose electrophoretic mobility depends on the treatment of heat. It is characteristic for many bacterial outer membrane proteins [110]. For example, detergent-solubilized OmpA from E. coli migrates on SDS-polyacrylamide gels as a 30-kDa form without boiling, but as a 35-kDa form if boiled in SDS. Likewise, the iron transporter FepA shows an apparent molecular mass of 58 kDa if not boiled, but exhibits its actual molecular mass of 81 kDa after boiling [55]. The correlation between conformational states and electrophoretic mobilities on SDS-PAGE has been studied most extensively for unboiled samples OmpA. The 30-kDa form represents the membrane-inserted native state, which also exhibits a characteristic ion channel activity [45, 49, 52]. The 35-kDa form may represent a completely denatured state in the presence or absence of lipids [49–51], a misfolded state in the absence of lipid [111], a surface-adsorbed partially folded state [49, 90], or a kinetic intermediate at a stage before the complete insertion into membranes [105, 108]. 9.1.1 Reversible Folding and Unfolding Protocol Using OmpA as an Example 1. Dilute OmpA in 8 M urea more than 100-fold into sonicated small unilamellar vesicles (SUV) composed of the desired lipids to give a final OmpA concentration of 12 µM and lipid-to-protein molar ratio of 800. 2. Incubate for 3 h at 37 ◦ C for refolding. 3. Divide the refolded protein-lipid complex into 10 aliquots. 4. Add appropriate amounts of a freshly made 10 M urea and buffer (10 mM glycine, pH 10, 2 mM EDTA) solutions to the aliquots to reach a 2.5-fold dilution of OmpA and urea concentrations in the range of interest. 5. Incubate reactions overnight at 37 ◦ C. 6. Terminate reactions by mixing the samples with an equal volume of 0.125 M Tris buffer, pH 6.8, 4% SDS, 20% glycerol, and 10% 2-mercaptoethanol at room temperature. 7. Without boiling, load equilibrated samples on 12.5% SDS polyacrylamide gels, run electrophoresis, and stain with Coomassie Blue. 8. Measure fractions folded and unfolded, F and U, by densitometry and fit to two-state equilibrium folding equation:
[U]=
([U]+[F])exp({m[urea]−G ◦ H2O }/RT ) 1+exp({m[urea]−G ◦ H2O }/RT )
The fit parameters are the free energy of unfolding, G◦ H2O , and the m-value.
(1)
20 Folding of Membrane Proteins
Fig. 7 Reversible urea-induced unfolding/refolding of OmpA in lipid bilayers composed of 92.5% phosphatidylcholine (C16:0 C18:1 PC) and 7.5% phosphatidylglycerol (C16:0 C18:1 PG) at pH 10 and 37 ◦ C monitored by the SDS-PAGE shift assay. A) Unfolding (upper panel) and refolding
(lower panel) of OmpA. The 30-kDa form represents the native state, and the 35-kDa form represents denatured states. B) Unfolded fractions as a function of urea concentration estimated from A. The data are fitted with the two-state equilibrium folding equation Eq. (1).
An example of a reversible unfolding/refolding reaction using this protocol is shown in Figure 7. 9.2 Tryptophan Fluorescence and Time-resolved Distance Determination by Tryptophan Fluorescence Quenching
Steady-state and time-resolved Trp fluorescence spectroscopy have been widely used to the study folding and stability of membrane proteins. The relatively large spectral shifts and fluorescence intensity changes of Trps as a result of changing local environments have made Trp fluorescence an excellent tool for such studies. Figure 8A shows typical fluorescence spectral changes of OmpA–lipid complexes equilibrated in various urea concentrations. The unfolding transition curves are
9 Experimental Protocols 21
Fig. 8 Urea-induced unfolding of OmpA in lipid bilayers composed of 92.5% phosphatidylcholine (C16:0 C18:1 PC) and 7.5% phosphatidylglycerol (C16:0 C18:1 PG) at pH 10 and 37 ◦ C monitored by Trp fluorescence. A) Fluorescence emission spectra of OmpA as a function of urea concentration. The protein and lipid concentrations were
1.2 µM and 10 µM, respectively, and the urea concentration ranged from 0 to 8 M (left to right). B) Unfolding transition monitored by the average emission wavelength 8 (Eq. (2)) calculated from the spectra in A and fitted with the two-state equilibrium folding equation Eq. (3).
constructed as a function of urea by parametrizing each spectrum to an average emission wavelength: λ= i
(Fi λi )
i
(2)
(λi )
and fitting the data to the two-state equilibrium folding equation: λ=
λ F +λU Q1R exp[m([denaturant]−Cm )/RT ] 1+ Q1R exp[m([denaturant]−Cm )/RT ]
(3)
22 Folding of Membrane Proteins
Inclusion of the weighting factor Q R = (total fluorescence intensity in native state)/(total fluorescence intensity in denatured state) makes λ proportional to species concentrations and therefore permits λ to be taken as a measure of the fraction of folded protein [112]. The free energy of unfolding is calculated from the fitted denaturant concentration at the mid-point of the transition, Cm , and the m-value using G◦ H2O = mCm . Figure 8B shows an example of equilibrium unfolding of OmpA measured by Trp fluorescence. The degree of solvent exposure of Trp residues of membrane proteins and thus their degree of immersion in the lipid bilayer is often determined by the effect of polar (e.g., cesium, iodide, bromide, acrylamide) or nonpolar (e.g., molecular oxygen) quenchers on the fluorescence spectra. A more informative method to determine the depth of Trps in membranes involves the use of spin-labeled or brominated lipids as quenchers. Phospholipids selectively labeled with bromines or spin labels in various positions of the acyl chains are commercially available. Depths in membranes of specific Trps are often measured by using a whole set of these lipids and analyzing the resulting spectra with the “parallax” or “distribution analysis” methods [113]. These steady-state methods may be extended to measure the time dependence of Trp positions and more specifically the translocation of Trps across lipid bilayers during the folding of membrane proteins [106]. Reliable distances in membranes are obtained by the time-resolved distance determination by fluorescence quenching (TDFQ) technique because the positions and distributions of the bromines in the bilayer are known from X-ray diffraction studies [114].
9.2.1 TDFQ Protocol for Monitoring the Translocation of Tryptophans across Membranes 1. Mix solutions of DOPC and one each of DOPC plus 4,5-, 6,7-, 9,10-, and 11,12diBrPC, respectively, at molar ratios of 7:3 (DOPC:m,n-diBrPC) in chloroform. 2. Dry the mixtures on the bottom of test tubes under a stream of nitrogen and further in a desiccator under a high vacuum overnight. 3. Disperse dried lipids in 1.0 mL buffer (10 mM glycine, pH 8.5, 1 mM EDTA, 150 mM NaCl) to a lipid concentration of 2 mg mL−1 . 4. Prepare SUVs by sonication of the lipid dispersions for 50 min using the microtip of a Branson ultrasonifier at 50% duty cycle in an ice/water bath. 5. Remove titanium dust by low-speed centrifugation (e.g., at 6000 rpm for 10 min on a tabletop centrifuge) and store vesicles at 4 ◦ C overnight for equilibration. 6. Measure blank fluorescence emission spectra for each vesicle solution at 1 mM lipid concentration and at the controlled temperature of interest with instrument settings for Trp fluorescence (e.g., excitation at 290 nm with 2-nm slit; emission from 310 to 370 nm with 5-nm slit). 7. Add small volume of concentrated OmpA (stock is typically 20–50 mg mL−1 ) in 8 M urea to give a final OmpA concentration of 1.4 µM and mix well. 8. Record fluorescence spectra with above settings every 2–5 min. (The first spectrum can be taken 30 s after mixing.) Repeat measurements for each of the five vesicle solutions.
9 Experimental Protocols 23
9. Measure fluorescence intensities of smoothed spectra at 330 nm as a function of time and normalize values for each of the quenching vesicles to the corresponding values of vesicles without lipid quencher. 10. Fit normalized fluorescence intensities to the Gaussian distribution equation:
F (d Q ) 1 d Q −dTrp 2 S =exp − √ exp − F0 2 σ σ 2π
(4)
F 0 and F(dQ ) are the fluorescence intensities in the absence and presence of a particular m,n-diBrPC quencher. dQ are the known distances of the quenchers from the bilayer center, and dTrp is the distance of the fluorophor from the center. The dispersion σ is a measure of the width of the distribution of the fluorophor in the membrane, which in turn depends on the sizes and thermal fluctuations of the fluorophors and quenchers. S is a function of the quenching efficiency and quencher concentration in the membrane. The fits yield dTrp ,σ , and S. 11. Plot dTrp and σ as a function of time.
The accuracy of the distribution analysis could be improved if more than four membrane-bound quenchers at different depths in the membrane and/or a quencher attached to the polar lipid headgroup were used. Figure 9 shows two
Fig. 9 Measurement of the translocation rate of single trypophans of OmpA into and across lipid bilayers composed of phosphatidylcholine (diC18:1 PC) by time-resolved fluorescence quenching (TDFQ). A) Time course of the movement of Trp7 of OmpA into lipid bilayer at 2 ◦ C. This Trp stays on
the cis side and does not cross the bilayer. B) Time course of the movement of Trp143 across the lipid bilayer at 30 ◦ C. This Trp translocates across the bilayer with a time constant of 3.3 min at 30 ◦ C. The solid lines are fits of the data to exponential functions. Adapted from Ref. [107].
24 Folding of Membrane Proteins
examples of Trp movements of OmpA into and across lipid bilayers, respectively, measured by TDFQ using four different bromine quenchers in the acyl chain region. 9.3 Circular Dichroism Spectroscopy
Circular dichroism (CD) refers to the differential absorption of the right- and lefthanded circular polarized lights by a sample that exhibits molecular asymmetry. Far-UV CD spectroscopy has been used to characterize the secondary structures of proteins including membrane proteins because the differential amide absorptions of α-helices, β-sheets, turns, and random coils in the range between 180 and 250 nm yield characteristic CD signals that can be used to determine their secondary structure compositions. The CD spectra of α-helices exhibit minima at 208 and 222 nm. Beta-sheets show a weaker minimum at 216 nm. Both of these structures exhibit maxima just below 200 nm, but the spectra of random coils are negative in the entire accessible far-UV spectral range with a minimum just below 200 nm. CD spectroscopy of membrane proteins is difficult because the proteo-liposomes, which are necessary for functional reconstitution of membrane proteins, are strong scatterers of UV light resulting in overwhelming spectral noise. Therefore, only SUVs, which are typically obtained by sonication and which are typically 30 nm in diameter, or protein–detergent mixed micelles are recommended for recording CD spectra of membrane proteins. Even with these samples, CD spectra cannot usually be recorded to below 200 nm, which limits the accuracy of secondary structure determinations. Good results can be obtained with 0.05-mm pathlength cells and protein concentrations of 0.5–1.0 mg mL−1 in buffers that do not absorb in the far UV. Relatively crude estimates of secondary structure content can be made from the spectral shapes in the 200–240 nm spectral region and using the following rule of thumb values for 100% secondary structure: θ 222 = −30 000 deg cm2 dmol−1 for α-helix and θ 216 = −15 000 deg cm2 dmol−1 for β-sheet. 9.4 Fourier Transform Infrared Spectroscopy
Fourier transform infrared (FTIR) spectroscopy is a useful technique for determining the conformation and orientation of membrane proteins and lipids in lipid bilayers. The method has no light scattering artifacts and works well with turbid and even solid samples. Since the absorption bands of liquid water overlap with several bands that are of interest in protein and membrane spectroscopy, it is convenient to work in D2 O solutions. In conventional transmittance spectroscopy, the strong absorbances of residual H2 O and bulk D2 O are minimized by using high protein concentrations and measuring cells of short path lengths. Another very convenient method for suppressing unwanted water signals is to record attenuated total reflection (ATR) FTIR spectra. In this method, the IR beam is
9 Experimental Protocols 25
reflected within an IR-transparent internal reflection element such as a germanium plate. An evanescent wave of the same frequency as the incoming IR light builds up at the germanium-water interface. The electric field of the evanescent wave decreases exponentially away from the interface with a characteristic decay length, dp , which is on the order of a few hundred nanometers in typical applications. This wave probes with high sensitivity samples that are adsorbed at the germanium-water interface such as a reconstituted protein-lipid bilayer. Since absorptions from the bathing buffer solution are greatly suppressed in this surface sensitive technique, small amounts of H2 O in D2 O are not a problem in ATR-FTIR spectroscopy. An additional advantage is that the orientation of membrane proteins and lipids in the bilayers can be obtained by using polarized light. Microgram quantities of membrane proteins in single supported bilayers are sufficient to record high quality spectra. Finally, buffer conditions can be varied in situ using a perfusable ATR measuring cell. A comprehensive description of ATR-FTIR spectroscopy on supported membranes has appeared [115]. The most widely used conformation-sensitive vibrational mode of proteins is the amide I band (1600–1700 cm−1 ). The secondary structures of proteins are correlated with the amide I frequencies, which enables the assignment of the major secondary structure elements of proteins by spectral decomposition of this band (α-helix: 1648–1660 cm−1 ; parallel and antiparallel β-sheet: 1625–1640 cm−1 ; anti–parallel β-sheet: 1675-1695 cm−1 ; aggregated strands: 1610-1628 cm−1 ; turns: 1660-1685 cm−1 ; unordered structures: 1652-1660 cm−1 in H2 O and 1640–1648 cm−1 in D2 O). The peak corresponding to the α-helix is often not well resolved from nearby peaks of turns and unordered structures. However, β-sheet can usually be reliably assigned due to its distinctive components at about 1630 cm−1 and 1690 cm−1 . The following bands of lipid vibrations are well separated from protein amide bands and are frequently analyzed in studies of protein–lipid interactions: symmetric and antisymmetric methylene stretches at 2850 cm−1 and 2920 cm−1 , respectively, and ester carbonyl stretch at 1730 cm−1 . 9.4.1 Protocol for Obtaining Conformation and Orientation of Membrane Proteins and Peptides by Polarized ATR-FTIR Spectroscopy 1. Fill liquid ATR cell containing germanium plate with D2 O buffer (e.g., 5 mM HEPES/10 mM MES pH 7.3, 135 mM NaCl) and record FTIR spectra at vertical and horizontal polarizations with a spectral resolution of 2 cm−1 to obtain reference spectra. 2. Deposit lipid monolayer on germanium plate by the Langmuir-Blogett technique at 32 mN m−1 . 3. Add proteoliposomes (typically about 0.5 mM lipid and 20 µg mL−1 protein in same H2 O buffer) to supported monolayer in assembled ATR cell and incubate with gentle shaking for 1 h. The proteoliposomes will fuse with the monolayer during this time and form a complete planar bilayer on the surface of the germanium plate.
26 Folding of Membrane Proteins
4. Flush excess proteoliposomes out of measuring cell with 10 volumes of D2 O buffer. This step also exchanges H2 O for D2 O. 5. Incubate exchanged sample for 1 h at room temperature to equilibrate labile amide protons with deuteriums. This is done in the FTIR spectrometer and the spectrometer compartment is purged with dry air or nitrogen at the same time to remove water vapor. 6. Collect the sample spectra using the same experimental settings as reference spectra at vertical and horizontal polarizations. 7. Ratio sample and reference spectra and calculate absorption spectra.
The following steps describe a recipe to extract secondary structure information from complex (overlapped) amide I bands 8. Calculate second-derivative spectra by differentiation with 11-point SavitskyGolay smoothing. The minima in the second derivatives indicate possible component peaks. 9. Using only prominent and clearly identified component peaks as starting points, carry out spectral decomposition by least-squares fitting with Gaussian component line shapes. Use as few components as possible to reproduce the experimental line shape. Accept only fits that keep the Gaussian peak components centered close to the component frequencies that were identified in the second derivative spectra. 10. Integrate spectral components, assign to secondary structures, and calculate area fractions to indicate respective secondary structure fractions. The following steps describe how to obtain orientations of secondary structure elements of proteins and orientations (order parameters) of lipid chemical groups 11. Calculate ATR dichroic ratio for the amide I bands and symmetric or antisymmetric methylene stretching band intensities according to ATR RamideI =
A (v)dv // A⊥ (v)dv
(5)
A// (v) A⊥ (v)
(6)
and ATR RLipid =
12. The order parameter is defined as
Sθ =(3cos2 θ −1)/2
(7)
where θ is the angle between the axis of rotational symmetry of an axially symmetric molecule (long axis of the lipid, α-helix, β-barrel, etc.) and the normal
9 Experimental Protocols 27
to the germanium plate and the angular brackets denote a time and space average over all angles in a given sample. 13. Calculate the order parameter for an amide I band component as: SamideI =
ATR E x2 −RamideI E y2 +E z2
(8)
ATR E x2 −RamideI E y2 −E z2
where E x , E y , and E z are the electric field amplitudes at the germanium-buffer interface. For a 45◦ germanium plate with a thin attached membrane in D2 O buffer E x 2 = 1.9691, E y 2 = 2.2486, and E 2 = 1.8917. For α-helices, where the angle between the amide I transition dipole moment and the helix axis is 39◦ , Eq. (8) transforms to SH =2.46
ATR E x2 −RamideI E y2 +E z2
(9)
ATR E x2 −RamideI E y2 −E z2
SH is the order parameter that describes the helix orientation relative to the germanium plate and membrane normal. For β-sheets, there is no helical symmetry and Eq. (8) describes the order parameter of the transition dipole moment of the amide I band, which roughly coincides with the order parameter of the amide carbonyl bond orientations. 14. Lipid order parameters are obtained from the experimental dichroic ratios of the symmetric and/or antisymmetric methylene stretch vibrations: SL =−2
E x2 −RLATR E y2 +E z2
(10)
E x2 −RLATR E y2 −E z2
Lipid order parameters should always be recorded along with those of embedded proteins because they provide essential controls of the bilayer quality. Results on protein orientations are only meaningful if lipid orientations (to the germanium plate) are satisfactory. 15. Lipid-to-protein ratios in supported bilayers Use the following form of BeerLambert’s law modified for polarized ATR spectroscopy to calculate the lipid/protein ratio in supported bilayers [116]: L /P=0.208(nres −1)
2980 1−SamideI 2800 A⊥ (vL )dv 1+SL /2 1690 A⊥ (vamideI )dv
(11)
1600
where nres is the number of residues in the polypeptide and vL and vamideI are the wavenumbers of the lipid methylene stretch and the protein amide I vibrations, respectively.
28 Folding of Membrane Proteins
Acknowledgment
We thank members of the Tamm laboratory, past and present, for their contributions and many discussions. The work from our laboratory was supported by grants from the National Institutes of Health.
References 1 DILL, K. A. (1990). Dominant forces in protein folding. Biochemistry 29, 7133–7155. 2 BALDWIN, R. L. (1999). Protein folding from 1962 to 1982. Nature Struct. Biol. 6, 814–817. 3 FERSHT, A. (1999). Structure and Mechanism in Protein Science. Freeman, New York. 4 WALLIN, E. & VON HEIJNE, G. (1998). Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 7, 1029–1038. 5 HALTIA, T. & FREIRE, E. (1995). Forces and factors that contribute to the structural stability of membrane proteins. Biochim. Biophys. Acta 1228, 1–27. 6 Van DEN BERG, B., CLEMONS, W. M., Jr., COLLINSON, I., KODIS, Y., HARTMAN, E., HARRISON, S. C. & RAPOPORT, T. A. (2004). X-ray structure of a proteinconducting channel. Nature 427, 37–44. 7 WHITE, S.H. & WIENER, M.C. (1996). The liquid-crystallographic structure of fluid lipid bilayer membranes. In Biological Membranes: A Molecular Perspective from Computation and Experiment (MERZ, K.M. & ROUX, B., eds), pp. 127–144. Birkhauser, Boston. 8 NAGLE, J. F. & TRISTRAM-NAGLE, S. (2000). Structure of lipid bilayers. Biochim. Biophys. Acta 1469, 159–195. 9 TANFORD, C. (1980). The Hydrophobic Effect, 2nd edn. Wiley, New York. 10 MARSH, D. (1996). Lateral pressure in membranes. Biochim. Biophys. Acta 1286, 183–223. 11 CANTOR, S. C. (1999). Lipid composition and the lateral pressure profile in bilayers. Biophys. J. 76, 2625–2639.
12 BROWN, M. F., THURMOND, R. L., DODD, S. W., OTTEN, D. & BEYER, K. (2002). Elastic deformation of membrane bilayers probed by deuterium NMR relaxation. J. Am. Chem. Soc. 124, 8471–8484. 13 JAYASINGHE, S., HRISTOVA, K. & WHITE, S. H. (2001). Energetics, stability, and prediction of transmembrane helices. J. Mol. Biol. 312, 927–934. 14 LADOKHIN, A. S. & WHITE, S. H. (1999). Folding of amphipathic α-helices on membranes: energetics of helix formation by melittin. J. Mol. Biol. 285, 1363–1369. 15 WIEPRECHT, T., APOSTOLOV, O., BEYERMANN, M. & SEELIG, J. (1999). Thermodynamics of the α-helix-coil transition of amphipathic peptides in a membrane environment: implications for the peptide-membrane binding equilibrium. J. Mol. Biol. 294, 785–794. 16 LI, Y., HAN, X. & TAMM, L. K. (2003). Thermodynamics of fusion peptide-membrane interactions. Biochemistry 42, 7245– 7251. 17 WIMLEY, W. C., HRISTOVA, K., LADOKHIN, A. S., SILVESTRO, L., AXELSEN, P. H. & WHITE, S. H. (1998). Folding of β-sheet membrane proteins: A hydrophobic hexapeptide model. J. Mol. Biol. 277, 1091–1110. 18 BENTAL, N., SITKOFF, D., TOPOL, I. A., YANG, A.-S., BURT, S. K. & HONIG, B. (1997). Free energy of amide hydrogen bond formation in vacuum, in water, and in liquid alkane solution. J. Phys. Chem. B 101, 450–457. 19 WIMLEY, W. C. & WHITE, S. H. (1996). Experimentally determined hydrophobicity scale for proteins at
References
20
21
22
23
24
25
26
27
28
29
30
membrane interfaces. Nature Struct. Biol. 3, 842–848. WIMLEY, W. C., CREAMER, T. P. & WHITE, S. H. (1996). Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides. Biochemistry 35, 5109–5124. DWYER, J. J., GITTIS, A. G., KARP, D. A., LATTMAN, E. E., SPENCER, D. S., STITES, W. E. & GARCIA-MORENO, E. B. (2000). High apparent dielectric constants in the interior of a protein reflect water penetration. Biophys. J. 79, 1610–1620. KILLIAN, J. A. & VON HEIJNE, G. (2000). How protein adapt to a membrane-water interface. Trends Biochem. Sci. 25, 429–434. YAU, W.-M., WIMLEY, W. C., GAWRISCH, K. & WHITE, S. H. (1998). The preference of tryptophan for membrane interfaces. Biochemistry 37, 14713–14718. SCHULZ, G. E. (2002). The structure of bacterial outer membrane proteins. Biochim. Biophys. Acta 1565, 308–317. GABRIEL, K., BUCHANAN, S. K. & LITHGOW, T. (2001). The alpha and the beta: protein translocation across mitochondrial and plastid outer membranes. Trends Biochem. Sci. 26, 36–40. PASCHEN, S. A., WAIZENEGGER, T., STAN, T., PREUSS, M., CYRKLAFF, M., HELL, K., RAPAPORT, D. & NEUPERT, W. (2003). Evolutionary conservation of biogenesis of β-barrel membrane proteins. Nature 426, 862–866. DANESE, P. N. & SILHAVY, T. J. (1998). Targeting and assembly of perplasmic and outer-membrane proteins in Escherichia coli. Annu. Rev. Genet. 32, 59–94. WIMLEY, W. C. (2002). Toward genomic identification of β-barrel membrane proteins: Composition and architecture of known structures. Protein Sci. 11, 301–312. CHEN, R. & HENNING, U. (1996). A periplasmic protein (Skp) of Escherichia coli selectively binds a class of outer membrane proteins. Mol. Microbiol. 19, 1287–1294. LAZAR, S. W. & KOLTER, R. (1996). SurA assists the folding of Escherichia coli
31
32
33
34
35
36
37
38
39
outer membrane proteins. J. Bacteriol. 178, 1770–1773. KOEBNIK, R., LOCHER, K. P. & van GELDER, P. (2000). Structure and function of bacterial outer membrane proteins: barrels in a nutshell. Mol. Microbiol. 37, 239–253. COWAN, S. W., SCHIRMER, T., RUMMEL, G., STEIERT, M., GHOSH, R., PAUPTIT, R. A., JANSONIUS, J. N. & ROSENBUSCH, J. P. (1992). Crystal structures explain functional properties of two E. coli porins. Nature 358, 727–733. WEISS, M. S., ABELE, U., WECKESSER, J., WELTE, W., SCHILTZ, E. & SCHULZ, G. E. (1991). Molecular architecture and electrostatic properties of a bacterial porin. Science 254, 1627–1630. SCHIRMER, T., KELLER, T. A., WANG, Y. F. & ROSENBUSCH, J. P. (1995). Structural basis for sugar translocation through maltoporin channels in 3.1 A resolution. Science 267, 512–514. FORST, D., WELTE, W., WACKER, T. & DIEDERICHS, K. (1998). Structure of the sucrose-specific porin ScrY from Salmonella typhimurium and its complex with sucrose. Nature Struct. Biol. 5, 37–46. FERGUSON, A. D., HOFMANN, E., COULTON, J. W., DIEDERICHS, K. & WELTE, W. (1998). Siderophore-mediated iron transport: crystal structure of FhuA with bound lipopolysaccharide. Science 282, 2215–2220. LOCHER, K. P., REES, B., KOEBNIK, R., MITSCHLER, A., MOULINIER, L. & ROSENBUSCH, J. P. (1998). Transmembrane signaling across the ligand-gated FhuA receptor: crystal structures of free and ferrichromebound states reveal allosteric changes. Cell 95, 771–778. BUCHANAN, S. K., SMITH, B. S., VENKATRAMANI, L. et al. (1999). Crystal structure of the outer membrane active transporter FepA from Escherichia coli. Nature Struct. Biol. 6, 56–63. FERGUSON, A. D., CHAKRABORTY, R., SMITH, B. S., ESSER, L., van DERHELM, D. & DEISENHOFER, J. (2002). Structural basis of gating by the outer membrane
29
30 Folding of Membrane Proteins
40
41
42
43
44
45
46
47
48
49
transporter FecA. Science 295, 1658–1659. CHIMENTO, D. P., MOHANTY, A. K., KADNER, R. J. & WIENER, M. C. (2003). Substrate-induced transmembrane signaling in the cobalamin transporter BtuB. Nature Struct. Biol. 10, 394–401. SNIJDER, J. J., UBARRETXENA-BALANDIA, I., BLAAUW, M. et al. (1999). Structural evidence for dimerization- regulated activation of an integral membrane protein phospholipase. Nature 401, 717–721. VOGT, J. & SCHULZ, G. E. (1999). The structure of the outer membrane protein OmpX from Escherichia coli reveals possible mechanisms of virulence. Structure 7, 1301–1309. PAUTSCH, A. & SCHULZ, G. E. (1998). Structure of the outer membrane protein A transmembrane domain. Nature Struct. Biol. 5, 1013–1017. SONNTAG, I., SCHWARZ, H., HIROTA, Y. & HENNING, U. (1978). Cell envelope and shape of Escherichia coli: multiple mutants missing the outer membrane lipoprotein and other major outer membrane proteins. J. Bacteriol. 136, 280–285. ARORA, A., ABILDGAARD, F., BUSHWELLER, J. H. & TAMM, L. K. (2001). Structure of outer membrane protein A transmembrane domain by NMR spectroscopy. Nature Struct. Biol. 8, 334–338. KOEBNIK, R. (1999). Membrane assembly of the Escherichia coli outer membrane protein OmpA: Exploring sequence constraints on transmembrane β-strands. J. Mol. Biol. 285, 1801–1810. KOEBNIK, R. & KRA¨ MER, L. (1995). Membrane assembly of circularly permuted variants of the E. coli outer membrane protein OmpA. J. Mol. Biol. 250, 617–626. DORNMAIR, K., KIEFER, H. & J¨AHNIG, F. (1990). Refolding of an integral membrane protein. OmpA of Escherichia coli. J. Biol. Chem. 265, 18907–18911. SURREY, T. & J¨AHNIG, F. (1992). Refolding and oriented insertion of a membrane protein into a lipid bilayer. Proc. Natl Acad. Sci. USA 89, 7457–7461.
50 HONG, H. & TAMM, L. K. (2004). Elastic coupling of integral membrane protein stability to lipid bilayer forces. Proc. Natl Acad. Sci. USA 101, 4065–4070. 51 SCHWEIZER, M., HINDENNACH, I., GARTEN, W. & HENNING, U. (1978). Major proteins of the Escherichia coli outer cell envelope membrane. Interaction of protein II with lipopolysaccharide. Eur. J. Biochem. 82, 211–217. 52 ARORA, A., RINEHART, D., SZABO, G. & TAMM, L. K. (2000). Refolded outer membrane protein A of Escherichia coli forms ion channels with two conductance states in planar lipid bilayers. J. Biol. Chem. 275, 1594–1600. 53 GRUNER, S. (1985). Intrinsic curvature hypothesis for biomembrane lipid composition: a role for nonbilayer lipids. Proc. Natl Acad. Sci. USA 82, 3665–3669. 54 BONHIVERS, M., DESMADRIL, M., MOECK, G. S., BOULANGER, P., COLOMER-PALLAS, A. & LETELIER, L. (2001). Stability studies of FhuA, a two-domain outer membrane protein from Escherichia coli. Biochemistry 40, 2606–2613. 55 KLUG, C. S., SU, W., LIU, J., KLEBBA, P. E. & FEIX, J. B. (1995). Denaturant unfolding of the ferric enterobactin receptor and ligand-induced stabilization studied by site-directed spin labeling. Biochemistry 34, 14230–14236. 56 KLUG, C. S. & FEIX, J. B. (1998). Guanidine hydrochloride unfolding of a transmembrane β-strand in FepA using site-directed spin labeling. Protein Sci. 7, 1469–1476. 57 ACKERMAN, M. S. & SHORTLE, D. (2001). Persistence of native-like topology in a denatured protein in 8 M urea. Science 293, 487–489. 58 SURREY, T., SCHMID, A. & J¨AHNIG, F. (1996). Folding and membrane insertion of the trimeric β-barrel protein OmpF. Biochemistry 35, 2283–2288. 59 PHALE, P. S., PHILLIPPSEN, A., KIEFHABER, T., KOEBNIK, R., PHALE, V. P., SCHIRMER, T. & ROSENBUSCH, J. P. (1998). Stability of trimeric OmpF porin: The contributions of the latching loop L2. Biochemistry 37, 15663–15670. 60 POPOT, J.-L. & ENGELMAN, D. M. (1990). Membrane protein folding and
References
61
62
63
64
65
66
67
68
69
70
oligomerization: the two-stage model. Biochemistry 29, 4031–4037. POPOT, J. L., GERCHMAN, S. E. & ENGELMAN, D. M. (1987). Refolding of bacteriorhodopsin in lipid bilayers. A thermodynamically controlled two-stage process. J. Mol. Biol. 198, 655–676. WHITE, S. H. & WIMLEY, W. C. (1999). Membrane protein folding and stability: Physical principles. Annu. Rev. Biophys. Biomol. Struct. 28, 319–365. ENGELMAN, D. M., CHEN, Y., CHIN, C.-N. et al. (2003). Membrane protein folding: beyond the two stage model. FEBS Lett. 555, 122–125. HUANG, K. S., BAYLEY, H., LIAO, M. J., LONDON, E. & KHORANA, H. G. (1981). Refolding of an integral membrane protein. Denaturation, renaturation, and reconstitution of intact bacteriorhodopsin and two proteolytic fragments. J. Biol. Chem. 256, 3802–3809. LONDON, E. & KHORANA, H. G. (1982). Denaturation and renaturation of bacteriorhodopsin in detergents and lipid-detergent mixtures. J. Biol. Chem. 257, 7003–7011. BOOTH, P. J., FLITSCH, S. L., STERN, L. J., GREENHALGH, D. A., KIM, P. S. & KHORANA, H. G. (1995). Intermediates in the folding of the membrane protein bacteriorhodopsin. Nature Struct. Biol. 2, 139–143. LU, H. & BOOTH, P. J. (2000). The final stages of folding of the membrane protein bacteriorhodopsin occur by kinetically indistinguishable parallel folding paths that are mediated by pH. J. Mol. Biol. 299, 233–243. LAU, F. W. & BOWIE, J. U. (1997). A method for assessing the stability of a membrane protein. Biochemistry 36, 5884–5892. FAHAM, S., YANG, D., BARE, E., YOHANNAN, S., WHITELEGGE, J. P. & BOWIE, J. U. (2004). Side-chain contributions to membrane protein structure and stability. J. Mol. Biol. 335, 297–305. LEMMON, M. A., FLANAGAN, J. M., TREUTLEIN, H. R., ZHANG, J. & ENGELMAN, D. M. (1992). Sequence specificity in the
71
72
73
74
75
76
77
78
79
80
dimerization of transmembrane alpha-helices. Biochemistry 31, 12719–12725. SENES, A., GERSTEIN, M. & ENGELMAN, D. M. (2000). Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions. J. Mol. Biol. 296, 921–936. BOWIE, J. U. (1997). Helix packing in membrane proteins. J. Mol. Biol. 272, 780–789. POPOT, J. L. & ENGELMAN, D. M. (2000). Helical membrane protein folding, stability, and evolution. Annu. Rev. Biochem. 69, 881–922. FLEMING, K. G., ACKERMAN, A. L. & ENGELMAN, D. M. (1997). The effect of point mutations on the free energy of transmembrane alpha-helix dimerization. J. Mol. Biol. 272, 266–275. FLEMING, K. G. & ENGELMAN, D. M. (2001). Specificity in transmembrane helix–helix interactions can define a hierarchy of stability for sequence variants. Proc. Natl Acad. Sci. USA 98, 14340–14344. ZHOU, F. X., COCCO, M. J., RUSS, W. P., BRU¨ NGER, A. T. & ENGELMAN, D. M. (2000). Interhelical hydrogen bonding drives strong interactions in membrane proteins. Nature Struct. Biol. 7, 154–160. CHOMA, C., GRATKOWSKI, H., LEAR, J. D. & DEGRADO, W. F. (2000). Asparagine-mediated self-association of a model transmembrane helix. Nature Struct. Biol. 7, 161–166. GRATKOWSKI, H., LEAR, J. D. & DEGRADO, W. F. (2001). Polar side chains drive the association of model transmembrane peptides. Proc. Natl Acad. Sci. USA 98, 880–885. ZHOU, F. X., MERIANOS, H. J., (2001). Polar residues drive association of polyleucine transmembrane helices. Proc. Natl Acad. Sci. USA 98, 2250–2255. SENES, A., UBARRETXENA-BELANDIA, I. & ENGELMAN, D. M. (2001). The Calpha –H O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions. Proc. Natl Acad. Sci. USA 98, 9056–9061.
31
32 Folding of Membrane Proteins
81 YOHANNAN, S., FAHAM, S., YANG, D., GROSFELD, D., CHAMBERLAIN, A. K. & BOWIE, J. U. (2004). A C(alpha)–H O hydrogen bond in a membrane protein is not stabilizing. J. Am. Chem. Soc. 126, 2284–2285. 82 DOYLE, D. A., MORAIS CABRAL, J., PFUETZNER, R. A. et al. (1998). The structure of the potassium channel: molecular basis of K+ conduction and selectivity. Science 280, 69–77. 83 STOWELL, M. H. B. & REES, D. C. (1995). Structure and stability of membrane proteins. Adv. Protein Chem. 46, 279–311. 84 JACOBS, R. E. & WHITE, S. H. (1989). The nature of the hydrophobic binding of small peptides at the bilayer interface: implications for the insertion of transbilayer helices. Biochemistry 28, 3421–3437. 85 VOGEL, H. (1981). Incorporation of melittin into phosphatidylcholine bilayers. Study of binding and conformational changes. FEBS Lett. 134, 37–42. 86 KAISER, E. T. & KEZDY, F. J. (1984). Amphiphilic secondary structure: design of peptide hormones. Science 223, 249–255. 87 BRIGGS, M. S., CORNELL, D. G., DLUHY, R. A. & GIERASCH, L. M. (1986). Conformations of signal peptides induced by lipids suggest initial steps in protein export. Science 233, 206–208. 88 TAMM, L. K. (1991). Membrane insertion and lateral mobility of synthetic amphiphilic signal peptides in lipid model membranes. Biochim. Biophys. Acta 1071, 123–148. 89 LEAR, J. D. & DEGRADO, W. F. (1987). Membrane binding and conformational properties of peptides representing the NH2 terminus of influenza HA-2. J. Biol. Chem. 262, 6500–6505. 90 RODIONOVA, N. A., TATULIAN, S. A., SURREY, T., J¨AHNIG, F. & TAMM, L. K. (1995). Characterization of two membrane-bound forms of OmpA. Biochemistry 34, 1921–1929. 91 VECSEY-SEMJEN, B., LESIEUR, C., MOLLBY, R. & van DERGOOT, F. G. (1997). Conformational changes due to membrane binding and channel
92
93
94
95
96
97
98
99
100
101
formation by staphylococcal alpha- toxin. J. Biol. Chem. 272, 5709–5717. TERZI, E., H¨OLZEMANN, G. & SEELIG, J. (1994). Reversible random coil-beta-sheet transition of the Alzheimer beta-amyloid fragment (25–35). Biochemistry 33, 1345–1350. BISHOP, C. M., WALKENHORST, W. F. & WIMLEY, W. C. (2001). Folding of beta-sheets in membranes: specificity and promiscuity in peptide model systems. J. Mol. Biol. 309, 975–988. ZAKHAROV, S. D. & CRAMER, W. A. (2002). Colicin crystal structures: pathways and mechanisms for colicin insertion into membranes. Biochim. Biophys. Acta 1565, 333–346. van DERGOOT, F. G., GONZALEZ-MANAS, J. M., LAKEY, J. H. & PATTUS, F. (1991). A “molten-globule” membrane-insertion intermediate of the poreforming domain of colicin A. Nature 354, 408–410. ZAKHAROV, S. D., LINDEBERG, M., GRIKO, Y. et al. (1998). Membrane-bound state of the colicin E1 channel domain as an extended two- dimensional helical array. Proc. Natl Acad. Sci. USA 95, 4282–4287. ZAKHAROV, S. D., LINDEBERG, M. & CRAMER, W. A. (1999). Kinetic description of structural changes linked to membrane import of the colicin E1 channel protein. Biochemistry 38, 11325–11332. LAKEY, J. H., DUCHE, D., GONZALEZ-MANAS, J. M., BATY, D. & PATTUS, F. (1993). Fluorescence energy transfer distance measurements. The hydrophobic helical hairpin of colicin A in the membrane bound state. J. Mol. Biol. 230, 1055–1067. QIU, X. Q., JAKES, K. S., KIENKER, P. K., FINKELSTEIN, A. & SLATIN, S. L. (1996). Major transmembrane movement associated with colicin Ia channel gating. J. Gen. Physiol. 107, 313–328. KIENKER, P. K., QIU, X., SLATIN, S. L., FINKELSTEIN, A. & JAKES, K. S. (1997). Transmembrane insertion of the colicin Ia hydrophobic hairpin. J. Membr. Biol. 157, 27–37. WEISS, M. S., BLANKE, S. R., COLLIER, R. J. & EISENBERG, D. (1995). Structure of the
References
102
103
104
105
106
107
108
isolated catalytic domain of diphtheria toxin. Biochemistry 34, 773–781. WANG, Y., MALENBAUM, S. E., KACHEL, K., ZHAN, H., COLLIER, R. J. & LONDON, E. (1997). Identification of shallow and deep membrane-penetrating forms of diphtheria toxin T domain that are regulated by protein concentration and bilayer width. J. Biol. Chem. 272, 25091–25098. SENZEL, L., GORDON, M., BLAUSTEIN, R. O., OH, K. J., COLLIER, R. J. & FINKELSTEIN, A. (2000). Topography of diphtheria toxin’s T domain in the open channel state. J. Gen. Physiol. 115, 421–434. SURREY, T. & J¨AHNIG, F. (1995). Kinetics of folding and membrane insertion of a β-barrel membrane protein. J. Biol. Chem. 270, 28199–28203. KLEINSCHMIDT, J. H. & TAMM, L. K. (1996). Folding intermediates of a beta-barrel membrane protein. Kinetic evidence for a multi-step membrane insertion mechanism. Biochemistry 35, 12993–13000. KLEINSCHMIDT, J. H. & TAMM, L. K. (1999). Time-resolved distance determination by tryptophan fluorescence quenching (TDFQ): Probing intermediates in membrane protein folding. Biochemistry 38, 4996–5005. KLEINSCHMIDT, J. H., DEN BLAAUWEN, T., DRIESSEN, A. J. M. & TAMM, L. K. (1999). Outer membrane protein A of Escherichia coli inserts and folds into lipid bilayers by a concerted mechanism. Biochemistry 38, 5006–5016. KLEINSCHMIDT, J. H. & TAMM, L. K. (2002). Secondary and tertiary structure formation of the beta-barrel membrane protein OmpA is synchronized and
109
110
111
112
113
114
115
116
depends on membrane thickness. J. Mol. Biol. 324, 319–330. RAETZ, C. R. & WHIFIELD, C. (2002). Lipopolysaccharide endotoxins. Annu. Rev. Biochem. 71, 635–700. HINDENNACH, I. & HENNING, U. (1975). The major proteins of the Escherichia coli outer cell envelope membrane. Preparative isolation of all major membrane proteins. Eur. J. Biochem. 59, 207–213. KLEINSCHMIDT, J. H., WIENER, M. C. & TAMM, L. K. (1999). Outer membrane protein A of E. coli folds into detergent micelles, but not in the presence of monomeric detergent. Protein Sci. 8, 2065–2071. MANN, C. J., ROYER, C. A. & MATTHEWS, C. R. (1993). Tryptophan replacements in the trp aporepressor from Escherichia coli: probing the equilibrium and kinetic folding models. Protein Sci. 2, 1853–1861. LONDON, E. & LADOKHIN, A. S. (2002). Measuring the depth of amino acid residues in membrane-inserted peptides by fluorescence quenching. Curr. Topics Membr. 52, 89–115. MCINTOSH, T. J. & HOLLOWAY, P. W. (1987). Determination of the depth of bromine atoms in bilayers formed from bromolipid probes. Biochemistry 26, 1783–1788. TAMM, L. K. & TATULIAN, S. A. (1997). Infrared spectroscopy of proteins and peptides in lipid bilayers. Q. Rev. Biophys. 30, 365–429. TAMM, L. K. & TATULIAN, S. A. (1993). Orientation of functional and nonfunctional PTS permease signal sequences in lipid bilayers. A polarized attenuated total reflection infrared study. Biochemistry 32, 7720–7726.
33
1
Molecular Dynamics Simulations to Study Protein Folding and Unfolding Amedeo Caflisch, and Emanuele Paci University of Z¨urich, Z¨urich, Switzerland
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Proteins in solution fold in time scales ranging from microseconds to seconds. A computational approach to folding that should work, in principle, is to use an atombased model for the potential energy (force field) and to solve the time-discretized Newton equation of motion (molecular dynamics, MD [1]) from a denatured conformer to the native state in the presence of the appropriate solvent. With the available simulation protocols and computing power, such a trajectory would require approximately 10–100 years for a 100-residue protein where the experimental transition to the folded state takes place in about 1 ms. Hence, there is a clear problem related to time scales and sampling (statistical error). On the other hand, we think that current force fields, even in their most detailed and sophisticated versions, i.e., explicit water and accurate treatment of long-range electrostatic effects, are not accurate enough (systematic error) to be able to fold a protein on a computer. In other words, even if one could use a computer 100 times faster than the currently fastest processor to eliminate the time scale problem, most proteins would not fold to the native structure because of the large systematic error and the marginal stability of the folded state typically ranging from 5 to 15 kcal mol−1 . Interestingly, only designed peptides of about 20 residues have been folded by MD simulations (see Section 2.1) using mainly approximative models of the solvent (see Section 3.4). Alternatively, protein unfolding which is a simpler process than folding (e.g., the unfolding rate shows Arrhenius-like temperature dependence whereas folding does not because of the importance of entropy, see Section 2.1.2) can be simulated on shorter time scales (1–100 ns) at high temperature or by using a suitable perturbation.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
MD simulations can provide the ultimate detail concerning individual atom motion as a function of time. Hence, future improvements in force fields and simulation protocols will allow specific questions about the folding of proteins to be addressed. The understanding at the atomic level of detail is important for a complicated reaction like protein folding and cannot easily be obtained by experiments. Yet, experimental approaches and results are essential in validating the force fields and simulation methods: comparison between simulation and experimental data is conditio sine qua non to validate the simulation results and very helpful for improving force fields. This chapter cannot be comprehensive. Results obtained by using atom-based force fields and MD are presented whereas lattice models [2] as well as off-lattice coarse-grained models (e.g., one interaction center per residue) [3] are not mentioned because of size limitations. It is important to note that the impact of MD simulations of folding and unfolding is increasing thanks to faster computers, more efficient sampling techniques, and more accurate force fields as witnessed by several review articles [1, 4] and books [5–7].
2 Molecular Dynamics Simulations of Peptides and Proteins 2.1 Folding of Structured Peptides
Several comprehensive review articles on MD simulations of structured peptides have appeared recently [8–10]. Here, we first focus on simulation results obtained in our research group and then discuss the Trp-cage, a model system that has been investigated by others. 2.1.1 Reversible Folding and Free Energy Surfaces 2.1.1.1 β-Sheets The reversible folding of two designed 20-residue sequences, beta3s and D PG, having the same three-stranded antiparallel β-sheet topology was simulated [11, 12] with an implicit model of the solvent based on the accessible surface area [13]. The solution conformation of beta3s (TWIQNGSTKWYQNGSTKIYT) has been studied by NMR [14]. Nuclear Overhauser enhancement spectroscopy (NOE) and chemical shift data indicate that at 10◦ C beta3s populates a single structured form, the expected three-stranded antiparallel β-sheet conformation with turns at Gly6-Ser7 and Gly14-Ser15, (Figure 1) in equilibrium with the denatured state. The β-sheet population is 13–31% based on NOE intensities and 30–55% based on the chemical shift data [14]. Furthermore, beta3s was shown to be monomeric in aqueous solution by equilibrium sedimentation and NMR dilution experiments [14]. D PG is a designed amino acid sequence (Ace-VFITSD PGKTYTEVD PG-OrnKILQ-NH), where D P are D-prolines and Orn stands for ornithine. Circular dichroism and chemical shift data have provided evidence that D PG adopts the expected
2 Molecular Dynamics Simulations of Peptides and Proteins
Fig. 1 Number of clusters as a function of time. The “leader” clustering procedure was used with a total of 120 000 snapshots saved every 0.1ns (thick line and square symbols). The clustering algorithm which uses the C" RMSD values between all pairs of structures was used only for the first 8 µs (80 000 snapshots) because of the computational requirements (thin line and circles). The diamond in the bottom left corner shows the average number of conform-
ers sampled during the folding time which is defined as the average time interval between successive unfolding and refolding events. The insets show a backbone representation of the folded state of beta3s with main chain hydrogen bonds in dashed lines, and the average effective energy as a function of the fraction of native contacts Q which are defined in [11]. Figure from Ref. [33].
three-stranded antiparallel β-sheet conformation at 24◦ C in aqueous solution [15]. Moreover, D PG was shown to be monomeric by equilibrium sedimentation. Although the percentage of β-sheet population was not estimated, NOE distance restraints indicate that both hairpins are highly populated at 24 ◦ C. In the MD simulations at 300 K (started from conformations obtained by spontaneous folding at 360 K) both peptides satisfy most of the NOE distance restraints (3/26 and 4/44 upper distance violations for beta3s and D PG, respectively). At a
3
4 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
temperature value of 360 K which is above the melting temperature of the model (330 K), a statistically significant sampling of the conformational space was obtained by means of around 50 folding and unfolding events for each peptide [11, 12]. Average effective energy and free energy landscape are similar for both peptides, despite the sequence dissimilarity. Since the average effective energy has a downhill profile at the melting temperature and above it, the free energy barriers are a consequence of the entropic loss involved in the formation of a β-hairpin which represents twothirds of the chain. The free energy surface of the β-sheet peptides is completely different from the one of a helical peptide of 31 residues, Y(MEARA)6 (see below). For the helical peptide, the folding free energy barrier corresponds to the helix nucleation step, and is much closer to the fully unfolded state than for the β-sheet peptides. This indicates that the native topology determines to a large extent the free energy surface and folding mechanism. On the other hand, the D PG peptide has a statistically predominant folding pathway with a sequence of events which is the inverse of the one of the most frequent pathway for the beta3s peptide. Hence, the amino acid sequence and specific interactions between different side chains determine the most probable folding route [12]. It is interesting to compare with experimental results on two-state proteins. Despite a sequence identity of only 15%, the 57-residue IgG-binding domains of protein G and protein L have the same native topology. Their folded state is symmetric and consists mainly of two β-hairpins connected such that the resulting four-stranded β-sheet is antiparallel apart from the two central strands which are parallel [16]. The φ value analysis (see Section 2.3 for a definition of φ value) of protein L and protein G indicates that for proteins with symmetric native structure more than one folding pathway may be consistent with the native state topology and the selected route depends on the sequence [16]. Our MD simulation results for the two antiparallel three-stranded β-sheet peptides (whose sequence identity is also 15%) go beyond the experimental findings for protein G and L. The MD trajectories demonstrate the existence of more than one folding pathway for each peptide sequence [12]. Interestingly, Jane Clarke and collaborators [17] have recently provided experimental evidence for two different unfolding pathways using the anomalous kinetic behavior of the 27th immunoglobulin domain (β-sandwich) of the human cardiac muscle protein titin. They have interpreted the upward curvature in the denaturant-dependent unfolding kinetics as due to changes in the flux between transition states on parallel pathways. In the conclusion of their article [17] they leave open the question “whether what is unusual is not the existence of parallel pathways, but the fact that they can be experimentally detected and resolved.” 2.1.1.2 α-Helices Richardson et al. [18] have analyzed the structure and stability of the synthetic peptide Y(MEARA)6 by circular dichroism (CD) and differential scanning calorimetry (DSC). This repetitive sequence was “extracted” from a 60amino-acid domain of the human CstF-64 polyadenylation factor which contains 12 nearly identical repeats of the consensus motif MEAR(A/G). The CD and DSC data were insensitive to concentration indicating that Y(MEARA)6 is monomeric
2 Molecular Dynamics Simulations of Peptides and Proteins
in solution at concentrations up to 2 mM. The far-UV CD spectrum indicates that the peptide has a helical content of about 65% at 1 ◦ C. The DSC profiles were used to determine an enthalpy difference for helix formation of 0.8 kcal mol−1 per amino acid. The length of Y(MEARA)6 makes it difficult to study helix formation by MD simulations with explicit water molecules. Therefore, multiple MD runs were performed with the same implicit solvation model used for the β-sheet peptides [13]. The simulation results indicate that the synthetic peptide Y(MEARA)6 assumes a mainly α-helical structure with a nonnegligible content of π-helix [149]. This is not inconsistent with the currently available experimental evidence [18]. A significant π-helical content was found previously by explicit solvent molecular dynamics simulations of the peptides (AAQAA)3 and (AAKAA)3 [19], which provides further evidence that the π-helical content of Y(MEARA)6 is not an artifact of the approximations inherent to the solvation model. An exponential decay of the unfolded population is common to both Y(MEARA)6 [149] and the 20-residue three-stranded antiparallel β-sheet [14] previously investigated by MD at the same temperature (360 K) [11]. The free energy surfaces of Y(MEARA)6 and the antiparallel β-sheet peptide differ mainly in the height and location of the folding barrier, which in Y(MEARA)6 is much lower and closer to the fully unfolded state. The main difference between the two types of secondary structure formation consists of the presence of multiple pathways in the α-helix and only two predominant pathways in the three-stranded β-sheet. The helix can nucleate everywhere, with a preference for the C-terminal third of the sequence in Y(MEARA)6 . Furthermore, two concomitant nucleation sites far apart in the sequence are possible. Folding of the three-stranded antiparallel β-sheet peptide beta3s started with the formation of most of the side chain contacts and hydrogen bonds between strands 2 and 3, followed by the 1–2 interstrand contacts. The inverse sequence of events, i.e., first formation of 1–2 and then 2–3 contacts was also observed, but less frequently [11]. The free energy barrier seems to have an important entropic component in both helical peptides and antiparallel β-sheets. In an α-helix, it originates from constraining the backbone conformation of three consecutive amino acids before the first helical hydrogen bond can form, while in the antiparallel β-sheet it is due to the constraining of a β-hairpin onto which a third strand can coalesce [11]. Therefore, the folding of the two most common types of secondary structure seems to have similarities (a mainly entropic nucleation barrier and an exponential folding rate) as well as important differences (location of the barrier and multiple vs. two pathways). The similarities are in accord with a plethora of experimental and theoretical evidence [20] while the differences might be a consequence of the fact that Y(MEARA)6 has about 7–9 helical turns whereas the three-stranded antiparallel β-sheet consists of only two “minimal blocks”, i.e., two β-hairpins. 2.1.2 Non-Arrhenius Temperature Dependence of the Folding Rate Small molecule reactions show an Arrhenius-like temperature dependence, i.e., faster rates at higher temperatures. Protein folding is a complex reaction involving many degrees of freedom; the folding rate is Arrhenius-like at physiological
5
6 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
temperatures, but deviates from Arrhenius behavior at higher temperatures [20]. To quantitatively investigate the kinetics of folding, MD simulations of two model peptides, Ace-(AAQAA)3 -NHCH3 (α-helical stable structure) and Ace-V5 D PGV5 NH2 (β-hairpin), were performed using the same implicit solvation model [13]. Folding and unfolding at different temperature values were studied by 862 simulations for a total of 4 µs [21]. Different starting conformations (folded and random) were used to obtain a statistically significant sampling of conformational space at each temperature value. An important feature of the folding of both peptides is the negative activation enthalpy at high temperatures. The rate constant for folding initially increases with temperature, goes through a maximum at about T m , and then decreases [21]. The non-Arrhenius behavior of the folding rate is in accord with experimental data on two mainly alanine α-helical peptides [22, 23], a β-hairpin [24], CI2, and barnase [25], lysozyme [26, 27], and lattice simulation results [28–30]. It has been proposed that the non-Arrhenius profile of the folding rate originates from the temperature dependence of the hydrophobic interaction [31, 32]. The MD simulation results show that a non-Arrhenius behavior can arise at high values of the temperature in a model where all the interactions are temperature independent. This has been found also in lattice simulations [28, 29]. The curvature of the folding rate at high temperature may be a property of a reaction dominated by enthalpy at low temperatures and entropy at high temperatures [30]. The non-Arrhenius behavior for a system where the interactions do not depend on the temperature might be a simple consequence of the temperature dependence of the accessible configuration space. At low temperatures, an increase in temperature makes it easier to jump over the energy barriers, which are rate limiting. However, at very high temperatures, a larger portion of the configuration space becomes accessible, which results in a slowing down of the folding process. 2.1.3 Denatured State and Levinthal Paradox The size of the accessible conformational space and how it depends on the number of residues is not easy to estimate. To investigate the complexity of the denatured state four molecular dynamics runs of beta3s were performed at the melting temperature of the model (330 K) for a total simulation time of 12.6 µs [33]. The simulation length is about two orders of magnitude longer than the average folding or unfolding time (about 85 ns each), which are similar because at the melting temperature the folded and unfolded states are equally populated. The peptide is within 2.5Å Cα root mean square deviation (RMSD) from the folded conformation about 48% of the time. Figure 1 shows the results of a cluster analysis based on Cα RMSD. There are more than 15000 conformers (cluster centers) and it is evident that a plateau has not been reached within the 12.6 µs of simulation time. However, the number of significantly populated clusters (see Ref. [12] for a detailed description) converges already within 2 µs. Hence, the simulation-length dependence of the total number of clusters is dominated by the small ones. At each simulation interval between an unfolding event and the successive refolding event additional conformations are sampled. More than 90% of the unfolded state conformations are in
2 Molecular Dynamics Simulations of Peptides and Proteins
small clusters (each containing less than 0.1% of the saved snapshots) and the total number of small clusters does not reach a plateau within 12.6 µs. Note that there is also a monotonic growth with simulation time of the number of snapshots in the folded-state cluster. After 12.6 µs (and also within each of the four trajectories) the system has sampled an equilibrium of folded and unfolded states despite a large part of the denatured state ensemble has not yet been explored. In fact, the average folding time converges to a value around 85 ns which shows that the length of each simulation is much larger than the relaxation time of the slowest conformational change. Interestingly, in the average folding time of about 85 ns beta3s visits less than 400 clusters (diamond in Figure 1). This is only a small fraction of the total amount of conformers in the denatured state. It is possible to reconcile the fast folding with the large conformational space by analyzing the effective energy, which includes all of the contributions to the free energy except for the configurational entropy of the protein [11, 34]. Fast folding of beta3s is consistent with the monotonically decreasing profile of the effective energy (inset in Figure 1). Despite the large number of conformers in the denatured state ensemble, the protein chain efficiently finds its way to the folded state because native-like interactions are on average more stable than nonnative ones. In conclusion, the unfolded state ensemble at the melting temperature is a large collection of conformers differing among each other, in agreement with previous high temperature molecular dynamics simulations [8, 35]. The energy “bias” which makes fast folding possible does not imply that the unfolded state ensemble is made up of a small number of statistically relevant conformations. The simulations provide further evidence that the number of denatured state conformations is orders of magnitudes larger than the conformers sampled during a folding event. This result also suggests that measurements which imply an average over the unfolded state do not necessarily provide information on the folding mechanism.
2.1.4 Folding Events of Trp-cage Very small proteins are ideal systems to validate force fields and simulation methodology. Neidigh et al. [36] have truncated and mutated a marginally stable 39-residue natural sequence thereby designing a 20-residue peptide, the Trp-cage, that is more than 95% folded in aqueous solution at 280 K. The stability of the Trp-cage is due to the packing of a Trp side chain within three Pro rings and a Tyr side chain. Moreover, the C-terminal half contains four Pro residues which dramatically restrict the conformational space, i.e., entropy, of the unfolded state [36, 37]. Four MD studies have appeared in the 12 months following the publication of the Trp-cage structure [38–41]. All of the simulations were started from the completely extended conformation and used different versions of the AMBER force field and the generalized Born continuum electrostatic solvation model [42]. Two simulations were run with conventional constant temperature MD at 300 K [40] and 325 K [38], a third study used replica exchange MD with a range of temperatures from 250 K to 630 K [41], and in the fourth paper distributed computing simulations at 300 K with full water viscosity were reported [39].
7
8 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
An important problem of the three constant temperature studies is that the Trpcage seems to fold to a very deep free energy minimum and no unfolding events have been observed [38–40]. Moreover, only one folding event is presented by Simmerling et al. [38] and Chowdhury et al. [40]. The poor statistics does not allow to draw any conclusions on free energy landscapes or on the folding mechanism of the Trp-cage. Another potential problem is the discrepancy between the most stable state sampled by MD and the NMR conformers. Only in two of the four MD studies NOE distance restraints were measured along the trajectories and about 20% were found to be violated [40, 41]. Moreover, as explicitly stated by the authors, the native state sampled by distributed computing contains a π-helix (instead of the α-helix) and the Trp is not packed correctly in the core [39]. These discrepancies are significant because the Trp-cage has a very small core and a rather rigid Cterminal segment. 2.2 Unfolding Simulations of Proteins 2.2.1 High-temperature Simulations Since the early work of Daggett and Levitt [43] and Caflisch and Karplus [44], several other high-temperature simulation studies have been concerned with exploring protein unfolding pathways. Several comprehensive review articles exist on this simulation protocol [45] which has been widely used since. Recent MD simulations at temperatures of 100 ◦ C and 225 ◦ C of a three-helix bundle 61-residue protein, the engrailed homeodomain (En-HD), by Daggett and coworkers [46, 47] have been used to analyze a folding intermediate at atomic level of detail. The unfolding halflife of the En-HD at 100 ◦ C has been extrapolated to be about 7.5 ns, a time scale that can be accessed by MD simulations with explicit water molecules. Also, unfolding simulations in the presence of explicit urea molecules have shown that the protein (barnase) remains stable at 300 K but unfolds partially at moderately high temperature (360 K) [48]. The results suggested a mechanism for urea induced unfolding due to the interaction of urea with both polar and nonpolar groups of the protein. 2.2.2 Biased Unfolding Because of the limitations on simulation times and height of the barriers to conformational transitions in proteins, a number of methods, alternative to the use of high, nonphysical temperatures, have been proposed to accelerate such transitions by the introduction of an external time-dependent perturbation [49–54]. The perturbation induce the reaction of interest in a reasonable amount of time (the strength of the perturbation is inversely proportional to the available computer time). These methods have been used for studying not only protein unfolding at native or realistic denaturing conditions, but also large conformational changes between known relevant conformers [51]. Their goal is to generate pathways which are realistic, in spite of the several orders of magnitude reduction in the time required for the conformational change. They are not alternatives to methods to compute free
2 Molecular Dynamics Simulations of Peptides and Proteins
energy profiles along defined pathways. The external perturbation is usually applied to a function of the coordinates which is assumed to vary monotonically as the protein goes from the native to the nonnative state of interest. For certain perturbations the unfolding pathways obtained have been shown to depend on the nature of the perturbation and the choice of the reaction coordinate [52]; this is even more the case when the perturbation is strong and the reaction is induced too quickly for the system to relax along the pathway. A perturbation which is particularly “gentle” since it exploits the intrinsic thermal fluctuations of the system and produces the acceleration by selecting the fluctuations that correspond to the motion along the reaction coordinate has also been used to unfold proteins [55, 56]. This perturbation has been employed, in particular, to expand α-lactalbumin by increasing its radius of gyration starting from the native state, and generate a large number of low-energy conformers that differ in terms of their root mean square deviation, for a given radius of gyration. The resulting structures were relaxed by unbiased simulations and used as models of the molten globule and more unfolded denatured states of α-lactalbumin based on measured radii of gyration obtained from nuclear magnetic resonance experiments [57]. The ensemble of compact nonnative structures agree in their overall properties with experimental data available for the α-lactalbumin molten globule, showing that the native-like fold of the α-domain is preserved and that a considerable proportion of the antiparallel β-sheet in the β-domain is present. This indicated that the lack of hydrogen exchange protection found experimentally for the β-domain [58] is due to rearrangement of the β-sheet involving transient populations of nonnative β-structures in agreement with more recent infrared spectroscopy measurements [59]. The simulations also provide details concerning the ensemble of structures that contribute as the molten globule unfolds and shows, in accord with experimental data [60], that the unfolding is not cooperative, i.e., the various structural elements do not unfold simultaneously. 2.2.3 Forced Unfolding Unfolding by stretching proteins individually has become routinely mainly thanks to the advent of the atomic force microscopy technology [61]. This peculiar way of unfolding proteins opened new perspectives on protein folding studies. Experiments are usually performed on engineered homopolyproteins, and the I27 domain from titin has become the reference system for this type of studies. Experiments measure force-extension profiles, and show typical “saw-tooth” profiles, where peaks are due to the sudden unfolding of individual domains, sequentially in time, causing a drop in the recorded force. These profiles are generally interpreted assuming that the unfolding event is determined by a single barrier which is decreased by the external force. For a detailed description of the experimental techniques and of the most recent results on forced unfolding of single molecules (by atomic force microscopy and optical tweezers) see Chapter 31. To provide a structural interpretation of the typical saw-tooth-like spectra measured in single molecule stretching experiments, various simulation techniques have been proposed, where detailed all-atom models of proteins are stretched by pulling two atoms apart [62, 63], differing mainly in the way the solvent is treated.
9
10 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
In some cases simulation can effectively explain the force patterns measured (see Ref. [64] for a review). For all the proteins experimentally unfolded by pulling, only a simple saw-tooth pattern has been recorded related to the sudden unfolding when the protein was pulled beyond a certain length, i.e., a simple two-state behavior. Simulations showed a more complex behavior [63] with possible intermediates on the forced unfolding pathways for certain proteins.1) Simulation has been used [65] to compare forced unfolding of two protein classes (all-β-sandwich proteins and all-α-helix bundle proteins). In particular, simulations suggested that different proteins should show a significantly different forced unfolding behavior, both within a protein class and for the different classes and dramatic differences between the unfolding induced by high temperature and by external pulling forces. The result was shown to be correlated to the type of perturbation, the folding topologies, the nature of the secondary and tertiary interactions and the relative stability of the various structural elements [65]. Improvements in the AFM technique combined with protein engineering methods have now confirmed that chemical (or thermal) and forced unfolding occur through different pathways and that forced unfolding is related to crossing of a free energy barrier which might not be unique, but might change with force magnitude or upon specific mutations [66]. It should be borne in mind, however, that the forced unfolding of proteins is a nonequilibrium phenomenon strongly dependent on the pulling speed, and, since time scales in simulations and experiments are very different, the respective pathways need not to be the same. Recently, through a combination of experimental analysis and molecular dynamics simulations it has been shown that, in the case of mechanical unfolding, pathways might effectively be the same in a large range of pulling speeds or forces [67, 68], thus providing another demonstration of the robustness of the energy landscape (i.e., the funnel-like shape of the free energy surface sculpted by evolution is not affected by the application of even strong perturbations [69, 70]). In two recent papers [71, 72] it has been shown that proteins resist differently when pulled in different directions. In both cases the experiments have been complemented with simulations, with either explicit or implicit solvents. In both cases the behavior observed experimentally is qualitatively reproduced. This fact strongly suggests, although does not prove it, that in this particular case the forced unfolding mechanisms explored in the simulations is the same as that which determines the experimentally measured force. Difference between solvation models is discussed in detail in Section 3.4 in the context of forced unfolding simulations, the disadvantage of an explicit solvation model [62, 72, 73] relative to an implicit [63, 65, 67, 68, 71] is not only that of being much slower, but also to provide an environment which relaxes slowly relative to the fast unraveling of the protein under force. Moreover, properly hydrating with
1) The presence of “late” intermediates on the forced unfolding pathway was first observed [63] in the 10th domain of fibronectin type III from fibronectin (FNfn3). A more complex pattern than equally spaced peaks in the force ex-
tension profile was predicted to arise from the presence of a kinetically metastable state. Most recent experimental results (J. Fernandez, personal communication) confirm the behavior predicted by the simulation.
2 Molecular Dynamics Simulations of Peptides and Proteins
explicit water a partially extended protein requires a large quantity of water, thus requiring a very large amount of CPU time for a single simulation. Implicit solvent models, on the other hand, allow unfolding to be performed at much lower forces (or pulling speeds) and multiple simulations to be used to study the dependence of the results on the initial conditions and/or on the applied force. 2.3 Determination of the Transition State Ensemble
The understanding of the folding mechanism has crucially advanced since the development of a method which provide information on the transition state [74] (see Chapter 13). The method allows the structure of the transition state at the level of single residue to be probed by measuring the change in folding and unfolding rates upon mutations. The method provides a so-called φ-value for each of the mutated residues which is a measure of the formation of native structure around the residue: a φ-value of 1 suggests that the residue is in a native environment at the transition state while a φ-value of 0 can be interpreted as a loss of the interactions of the residue at the transition state. Fractional φ-values are more difficult to interpret, but have been shown to arise from weakened interactions [75] and not from a mixture of species, some with fully formed and some with fully broken interaction. As we discussed in Section 2.1, the use of high temperature makes it possible to observe the unfolding of a protein by MD on a time scale which can be simulated on current computers. Valerie Daggett and collaborators [76] first had the idea of performing a very high-temperature simulation and looking for a sudden change in the structure of the protein along the trajectory, indicating the escape from the native minimum of the free energy surface. The collections of structures around the “jump” were assumed to constitute a sample of the putative transition state. Assuming that the experimental φ-values correspond in microscopic terms to fraction of native contacts, they found a good agreement between calculated and experimental values. This approach was initially applied to the protein CI2, a small two-state proteins which has been probably the most thoroughly studied by experimental φ-value analysis; it has been subsequently improved and extended to the study of several proteins for which experimental φ-values were available (see Ref. [77] for review and other references). Another related method has been used recently [78] to unfold a protein by hightemperature simulation (srcSH3 in the specific case) and determine a putative transition state by looking for conformations where the difference between calculated and experimental φ-values was smallest. Both methods presented above have the advantage of providing structures extracted from an unfolding trajectory and thus the fast refolding or complete unfolding from these structures (a property of transition states) has been reported [78, 79]. But both approaches only provide few transition state structures, because a long simulation is required to generate each member of the transition state ensemble, while the transition state can be a quite broad ensemble for some proteins [80].
11
12 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
Fig. 2 Comparison between the native state structure (left) and the most representastate structure (left) and the most representative structures of the transition state ensemble of AcP, determined by all-atom molecular dynamics simulations [82]. Native secondary
structure elements are show in color (the two "-helices are plotted in red and the $sheet in green). The three key residues for folding are shown as gold spheres [81, 82]. Figure from Ref. [69].
In a recent development, it has been shown that the amount of information that can be obtained from experimental measurements can be expanded further by using the data to build up phenomenological energy functions to bias computer generated trajectories. With this approach (see Section 3.2 for a more detailed description of the technique), conformations compatible with experimental data are determined directly during the simulations [81, 82], rather than being obtained from filtering procedures such as those discussed above [78]. The incorporation of experimental data into the energy function creates a minimum in correspondence of the state observed experimentally and therefore allows for a very efficient sampling of conformational space. The transition state for folding of acylphosphatase (see Figure 2) was determined in this way [81, 82], showing that the network of interactions that stabilize the transition state is established when a few key residues form their native-like arrangement. Based on this computational technique, a general approach in which theory and experiments are combined in an iterative manner to provide a detailed description of the transition state ensemble has been recently proposed [83]. In the first iteration, a coarse-grained determination of the transition state ensemble (TSE) is carried out by using a limited set of experimental φ-values as constraints in a molecular dynamics simulation. The resulting model of the TSE is used to determine the additional residues whose φ-value measurement would provide the most information for refining the TSE. Successive iterations with an increasing number
3 MD Techniques and Protocols
of φ-value measurements are carried out until no further changes in the properties of the TSE are detected or there are no additional residues whose φ-values can be measured. The method can be also used to find key residues for folding (i.e., those that are most important for the formation of the TSE). The study of the transition state represents probably the most interesting example of how experiment and molecular dynamics simulations complement each other in understanding and visualizing the folding mechanisms in terms of relevant structures involved. Simulations are performed with approximate force fields and unfolding induced using artificial means (such as high temperature or other perturbations). At this stage in the development of MD simulations, the experiment provides evidence that what is observed in silico is consistent with what happens in vitro. On the other hand, and particularly in the case in which the full ensemble of conformations compatible with the experimental results is generated [82, 83], the simulation suggests further mutations to increase the resolution of the picture of the transition state, and allows detailed hypothesis of the mechanisms, such as the identity and structure of the residues involved in the folding nucleus [150].
3 MD Techniques and Protocols 3.1 Techniques to Improve Sampling
A thorough sampling of the relevant conformations is required to accurately describe the thermodynamics and kinetics of protein folding. Since the energetic and entropic barriers are higher than the thermal energy at physiological temperature, standard MD techniques often fail to adequately sample the conformational space. As already mentioned in this chapter, even for a small protein it is currently not yet feasible to simulate reversible folding with a high-resolution approach (e.g., MD simulations with an all-atom model). The practical difficulties in performing such brute force simulations have led to several types of computational approaches and/or approximative models to study protein folding. An interesting approach is to unfold starting from the native structure [84–86] but detailed comparison with experiments [47] is mandatory to make sure that the high-temperature sampling does not introduce artifacts. In addition, a number of approaches to enhance sampling of phase space have been introduced [87, 88]. They are based on adaptive umbrella sampling [89], generalized ensembles (e.g., entropic sampling, multicanonical methods, replica exchange methods) [90], modified Hamiltonians [91–93], multiple time steps [94], or combinations thereof. 3.1.1 Replica Exchange Molecular Dynamics Replica exchange is an efficient way to simulate complex systems at low temperature and is the simplest and most general form of simulated tempering [95]. Sugita and
13
14 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
Okamoto have been the first to extend the original formulation of replica exchange into an MD-based version (REMD), testing it on the pentapeptide Metenkephalin in vacuo [96]. The basic idea of REMD is to simulate different copies (replicas) of the system at the same time but at different temperatures values. We recently applied a REMD protocol to implicit solvent simulations of a 20residue three-stranded antiparallel β-sheet peptide (beta3s) [97]. Each replica evolves independently by MD and every 1000 MD steps (2 ps), states i, j with neighbor temperatures are swapped (by velocity rescaling) with a probability wij = exp(–) [96], where ≡ (β i – β j )(E j −E i ), β = 1/kT and E is the potential energy. During the 1000 MD steps the Berendsen thermostat [98] is used to keep the temperature close to a given value. This rather tight coupling and the length of each MD segment (2 ps) allow the kinetic and potential energy of the system to relax. High temperature simulation segments facilitate the crossing of the energy barriers while the low-temperature ones explore in detail the conformations present in the minimum energy basins. The result of this swapping between different temperatures is that high-temperature replicas help the low-temperature ones to jump across the energy barriers of the system. In the beta3s study eight replicas were used with temperatures between 275 and 465 K [97]. The higher the number of degrees of freedom in the system the more replicas should be used. It is not clear how many replicas should be used if a peptide or protein is simulated with explicit water. The transition probability between two temperatures depends on the overlap of the energy histograms. The histograms’ width √ depends on 1/ N (where N is the size of the system). Hence, the number of replicas required to cover a given temperature range increases with the size. Moreover, in order to have a random walk in temperature space (and then a random walk in energy space which enhances the sampling), all the temperature exchanges should occur with the same probability. This probability should be at least of 20–30%. To optimize the efficiency of the method, one should find the best compromise between the number of replicas to be used, the temperature space to cover and the acceptance ratios for temperature exchanges. In the literature there is no clear indication about the selection of temperatures and empirical methods are usually applied (weak point of the method). The choice of the boundary temperatures depends on the system under study. The highest temperature has to be chosen in order to overcome the highest energy barriers (probably higher in explicit water) separating different basins; the lowest temperature to investigate the details of the different basins. Sanbonmatsu and Garcia have applied REMD to investigate the structure of Metenkephalin in explicit water [99] and the α-helical stabilization by the arginine side chain which was found to originate from the shielding of main-chain hydrogen bonds [100]. Furthermore, the energy landscape of the C-terminal β-hairpin of protein G in explicit water has been investigated by REMD [101, 102]. Recently, a multiplexed approach with multiple replicas for each temperature level has been applied to large-scale distributed computing of the folding of a 23-residue miniprotein [103]. Starting from a completely extended chain, conformations close to the
3 MD Techniques and Protocols
NMR structures were reached in about 100 trajectories (out of a total of 4000) but no evidence of reversible folding (i.e., several folding and unfolding events in the same trajectory) was presented [103].
3.1.2 Methods Based on Path Sampling A very promising computational method, called transition path sampling (reviewed in Ref. [104]) has been recently used [105] to study the folding of a β-hairpin in explicit solvent. The method allows in principle the study of rare events (such as protein folding) without requiring knowledge of the mechanisms, reaction coordinates, and transition states. Transition path sampling focuses on the sampling not of conformations but of trajectories linking two conformations or regions (possibly basins of attraction) in the conformational space. Other methods focus on building ensemble of paths connecting states; the stochastic path approach [106] and the reaction path method [107] have been also used to study the folding of peptides and small proteins in explicit solvent. The stochastic path ensemble and the reaction path methods introduce a bias in the computed trajectories but allow the exploration of long time scales. All the methods mentioned above are promising but rely on the choice of a somewhat arbitrary initial unfolded conformation beside the final native one.
3.2 MD with Restraints
A method to generate structures belonging to the TSE ensemble discussed in Section 2.3 consists in performing molecular dynamics simulations restrained with a pseudo-energy function based on the set of experimental φ-values. The φvalues are interpreted as the fraction of native contacts present in the structures that contribute to the TSE. With this restraint the TSE becomes the most stable state on the potential energy surface rather than being an unstable region, as it is for the true energy function of the protein. This procedure is conceptually related to that used to generate native state structures compatible with measurements from nuclear magnetic resonance (NMR) experiments, in that pseudo-energy terms involving experimental restraints are added to the protein force field [108, 109]. The main difference is that an approach is required to sample a broad state compatible with some experimental restraints, rather than a method to search for an essentially unique native structure. The method is based on molecular dynamics simulations using an all-atom model of the protein [110, 111] and an implicit model for the solvent [112] with an additional term in the energy function: ρ=
1 exp (φi −φi )2 Nφ i∈E
(1)
15
16 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
where E is the list of the N φ available experimental φ-values, φ i exp . The φ i -value of amino acid i in the conformation at time t is defined as φi (t)=
Ni (t) Ninat
(2)
where N i (t) is the number of native contacts of i at time t and N i nat the number of native contacts of i in the native state. Molecular dynamics simulations are then performed to sample all the possible structures compatible with the restraints. The structures thus generated are not necessarily at the transition state for folding for the potential used. They provide instead a structural model of the experimental transition state, including all possible structures compatible with the restraints derived from the experiment. The experimental information provided by the φ-values might not be enough to restrain the sampling to meaningful structures (e.g., when only few mutations have been performed). In such circumstance, other experimentally measured quantities, such as the m-value, which is related to the solvent accessible surface, must be used to restrain the sampling or to a posteriori select meaningful structures. This type of computational approach relies on the assumption implicit in Eq. (2). This consists in approximating a φ-value, measured as a ratio of free energy variations upon mutation, as a ratio of side-chain contacts. A definition based on side-chains is appropriate since experimental φ-values are primarily a measure of the loss of side-chain contacts at the transition state, relative to the native state. Although simply counting contacts, rather than calculating their energies, is a crude approximation [113], it has been shown that there is a good correlation between loss of stability and loss of side-chain contacts within about 6Å on mutation [114]. Also, Shea et al. [115] have found in their model calculations that this approximation for estimating φ-values from structures is a good one under certain conditions. A more detailed relation between experimental φ-values and atomic contacts could in principle be established by using the energies of the all-atom contacts made by the side chain of the mutated amino acid. The same approach can be extended to generate the structures corresponding to other unfolded or intermediate states as the site-specific information provided by the experiment is steadily increasing (see Chapters 20 and 21). 3.3 Distributed Computing Approach
As mentioned in the introduction, the problem of simulating the folding process of any sequence from a random conformation is mainly a problem of potentials and computer time. Duan and Kollman [116] have showed that a huge effort in parallelizing (on a medium-scale, 256 processors) an MD code and exploiting for several months a several million dollars computer (a Cray T3E) could lead to the simulation of 1 µs of the small protein villin headpiece. Even approaching the
3 MD Techniques and Protocols
typical experimental folding times (which is, however, larger than 1 ms for most proteins), a statistical characterization of the folding process is still impossible in the foreseeable future. Developing a large-scale parallelization method seems the most viable approach, as the cost of fast CPUs decreases steadily and their performances approach those of much more expensive mainframes. Time being sequential, MD codes are not massively parallelizable in an efficient way. A good scaling is usually obtained for large systems with explicit water and a relatively small number of processors (between 2 and 100, depending on the program and the problem studied). One approach has been proposed that allows the scalability of a MD simulation to be pushed to the level of being able to use efficiently a network of heterogeneous and loosely connected computer [117]. The approach (called distributed computing) exploits the stochastic nature of the folding process. In general protein folding involves the crossing of free energy barriers. The approach is most easily understood assuming that the proteins have a single barrier and a single exponential kinetic (which is the case for a large number of small proteins [118]). The probability that a protein is folded after a time t is P(t) = 1 –exp(–kt), where k is the folding rate. Thus, for short times, and considering M proteins or independent simulations, the probability of observing a folding event is Mkt. So, if M is large, there is a sizable probability of observing a folding event on simulations much shorter than the time constant of the folding process [119]. The folding rate could then in principle be estimated by running M independent simulations (starting from the completely extended conformation with different random velocities) for a time t and counting the number N of simulations which end up in the folded state as k = N/(Mt). Simulations have been reported where the folding rate estimated in this way (assuming that partial refolding counts as folding) is in good agreement with the experimental one (see, for example, Ref. [39]). However, it has been argued [120] that even for simple two-state proteins, folding has a series of early conformational steps that lead to lag phases at the beginning of the kinetics. Their presence can bias short simulations toward selecting minor pathways that have fewer or faster lag steps and so miss the major folding pathways. This fact has been clearly observed by comparing equilibrium and fast folding trajectories simulations [121] for a 20-residue three-stranded antiparallel β-sheet peptide (beta3s). It was found that the folding rate is estimated correctly by the distributed computing approach when trajectories longer than a fraction of the equilibrium folding time are considered; in the case of the 20-residue peptide studied within the frictionless implicit solvation model used for the simulations, this time is about 1% of the average folding time at equilibrium. However, careful analysis of the folding trajectories showed that the fastest folding events occur through high-energy pathways, which are unlikely under equilibrium conditions (see Section 2.1.1). Along these very fast folding pathways the peptide does not relax within the equilibrium denatured state which is stabilized by the transient presence of both native and nonnative interactions. Instead, collapse and formation of native interactions coincides and, unlike at equilibrium, the formation of the two β-hairpins is nearly simultaneous.
17
18 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
These results demonstrate that the ability to predict the folding rate does not imply that the folding mechanisms are correctly characterized: the fast folding events occur through a pathway that is very unlikely at equilibrium. However, extending the time scale of the short simulations to 10% of the equilibrium folding time, the folding mechanism of the fast folding events becomes almost indistinguishable from equilibrium folding events. It must be stressed that this result is not general but concerns the specific peptide studied; the explicit presence of solvent molecule (and the consequent friction), might decrease the differences between equilibrium and shortest folding events. Unfortunately, this kind of validation of the distributed computing approach is not possible for a generic protein in a realistic solvent, as equilibrium simulations are not feasible. An alternative method to use many processors simultaneously to access time scales relevant in the folding process by MD simulations has recently been proposed by Settanni et al. [122]. The method is based on parallel MD simulations that are started from the denatured state; trajectories are periodically interrupted, and are restarted only if they approach the transition (or some other target) state. In other words, the method choses trajectories along which a cost function decreases. The effectiveness of such an approach was shown by determining the transition state for folding an SH3 domain using as cost function the deviation between experimental and computed φ-values (Eq. (1) in Section 3.2). The method can efficiently use a large number of computers simultaneously because simulations are loosely coupled (i.e., only the comparison between final conformations, needed periodically to choose which trajectory to restart, involve communications between CPUs). This method can also be extended to complex nondifferentiable cost functions. 3.4 Implicit Solvent Models versus Explicit Water
Incorporating solvent effects in MD and Monte-Carlo simulations is of key importance in quantitatively understanding the chemical and physical properties of biomolecular processes. Accurate electrostatic energies of proteins in an aqueous environment are needed in order to discriminate between native and nonnative conformations. An exact evaluation of electrostatic energies considers the interactions among all possible solute-solute, solute-solvent, and solvent-solvent pairs of charges. However, this is computationally expensive for macromolecules. Continuum dielectric approximations offer a more tractable approach [123–127]. The essential concept in continuum models is to represent the solvent by a high dielectric medium, which eliminates the solvent degrees of freedom, and to describe the macromolecule as a region with a low dielectric constant and a spatial charge distribution. The Poisson equation provides an exact description of such a system. The increase in computation speed for a finite difference solution of the Poisson equation [128–131] with respect to an explicit treatment of the solvent is remarkable but still not enough for effective utilization in computer simulations of macromolecules. The generalized Born (GB) model was introduced to facilitate
3 MD Techniques and Protocols
an efficient evaluation of continuum electrostatic energies [42]. It provides accurate energetics and the most efficient implementations are between five and ten times slower than in vacuo simulations [132–134]. The essential element of the GB approach is the calculation of an effective Born radius for each atom in the system which is a measure of how deeply the atom is buried inside the protein. This information is combined in a heuristic way to obtain a correction to the Coulomb law for each atom pair [42]. For the integration of energy density, necessary to obtain the effective Born radii, both numerical [42, 132, 135] and analytical [134, 136, 137] implementations exist. The former are more accurate but slower than the latter [135]. Moreover, analytical derivatives that are required for MD simulations are not given by numerical implementations. For efficiency reasons empirical dielectric screening functions are the most common choice in MD simulations with implicit solvent. One kind of solvation model is based on the use of a dielectric function that depends linearly on the distance r between two charges ((r) = αr) [138, 139] or has a sigmoidal shape [140, 141]. Although very fast, these options suffer from their inability to discriminate between buried and solvent exposed regions of a macromolecule and are therefore rather inaccurate. A distance and exposure dependent dielectric function has been proposed [142]. Recently, an approach based on the distribution of solute atomic volumes around pairs of charges in a macromolecule has been proposed to calculate the effective dielectric function of proteins in aqueous solution [143]. The simulation results presented in Section 2.1 were obtained using an implicit solvent model based on a fast analytical approximation of the solvent accessible surface (SAS) [13] and the CHARMM force field [110]. The former drastically reduces the computational cost with respect to an explicit solvent simulation. The SAS model is based on the approximation proposed by Lazaridis and Karplus [112] for dielectric shielding due to the solvent, and the surface area model for the hydrophobic effect introduced by Eisenberg and McLachlan [144]. Electrostatic screening effects are approximated by a distance-dependent dielectric function and a set of partial charges with neutralized ionic groups [112]. An approximate analytical expression [145] is employed to calculate the SAS because an exact analytical or numerical computation of the SAS is too slow to compete with simulations in explicit solvent. The SAS model is based on the assumptions that most of the solvation energy arises from the first water shell around the protein [144] and that two atomic solvation parameters are sufficient to describe these effects at a qualitative level of accuracy. Within these assumptions, the SAS energy term approximates the solute-solvent interactions (i.e., it should account for the energy of cavity formation, solute-solvent dispersion interactions, and the direct (or Born) solvation of polar groups). The two atomic solvation parameters were optimized by performing 1 ns MD simulations at 300 K on six small proteins [13]. It is important to underline that the structured peptides discussed in Section 2.1 were not used for the calibration of the SAS atomic solvation parameters. The SAS model is a good approximation for investigating the folded and denatured state (large ensemble of conformers) of structured peptides. Its limitations, in particular for highly charged peptides and large proteins, have been discussed [13].
19
20 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
The most detailed and physically sound approaches (e.g., explicit solvent and particle mesh Ewald treatment of the long-range electrostatic interactions [146]) are still approximations and might introduces artifacts (see, for example, Ref. [147]). All solvation models, even those computationally most expensive, are approximations and their range of validity is difficult to explore. It is likely that most proteins will unfold fast relative to the experimental time scale if one could afford long (e.g., 100 ns) explicit water MD simulations even at room temperature. Some evidence of this instability has been recently published [148].
4 Conclusion
It is a very exciting time for studying protein folding using multidisciplinary approaches rooted in physics, chemistry, and computer science. The time scale gap between folding in vitro and in silico is being continuously reduced and this will bring interesting surprises. We expect an increasing role of MD simulations in the elucidation of protein folding thanks to further improvements in force fields and solvation models.
References 1 KARPLUS, M. & MCCAMMON, J. A. (2002). Molecular dynamics simulations of biomolecules. Nature Struct. Biol. 9, 646–652. 2 DILL, K. A. & CHAN, H. S. (1997). From Levinthal to pathways to funnels. Nature Struct. Biol. 4, 10–19. 3 MIRNY, L. & SHAKHNOVICH, E. (2001). Protein folding theory: From lattice to all-atom models. Annu. Rev. Biophys. Biomol. Struct. 30, 361–396. 4 DAGGETT, V. & FERSHT, A. R. (2003). Is there a unifying mechanism for protein folding? Trends Biochem. Sci. 28, 18–25. 5 CREIGHTON, T. E. (1992). Protein Folding. W. H. Freeman & Co., New York. 6 MERZ Jr, K.M. & LEGRAND, S. M. (1994). The Protein Folding Problem and Tertiary Structure Prediction. Birkh¨auser, Boston. 7 PAIN, R.H., ed. (2000). Mechanisms of Protein Folding. Oxford University Press, Oxford. 8 SHEA, J. E. & BROOKS III, C. L. (2001). From folding theories to folding proteins: A review and assessment of simulation studies of protein folding
9
10
11
12
13
14
and unfolding. Annu. Rev. Phys. Chem. 52, 499–535. GALZITSKAYA, O. V., HIGO, J. & FINKELSTEIN, A. V. (2002). α-helix and β-hairpin folding from experiment, analytical theory and molecular dynamics simulations. Curr. Protein Pept. Sci. 3, 191–200. GNANAKARAN, S., NYMEYER, H., PORTMAN, J., SANBONMATSU, K. Y. & GARCIA, A. E. (2003). Peptide folding simulations. Curr. Opin. Struct. Biol. 13, 168–174. FERRARA, P. & CAFLISCH, A. (2000). Folding simulations of a three-stranded antiparallel β-sheet peptide. Proc. Natl Acad. Sci. USA 97, 10780–10785. FERRARA, P. & CAFLISCH, A. (2001). Native topology or specific interactions: What is more important for protein folding? J. Mol. Biol. 306, 837–850. FERRARA, P., APOSTOLAKIS, J. & CAFLISCH, A. (2002). Evaluation of a fast implicit solvent model for molecular dynamics simulations. Proteins 46, 24–33. De ALBA, E., SANTORO, J., RICO, M. & JIMENEZ, M. A. (1999). De novo design
References
15
16
17
18
19
20
21
22
23
24
of a monomeric three-stranded antiparallel β-sheet. Protein Sci. 8, 854–865. SCHENCK, H. L. & GELLMAN, S. H. (1998). Use of a designed triple-stranded antiparallel β-sheet to probe β-sheet cooperativity in aqueous solution. J. Am. Chem. Soc. 120, 4869–4870. MCCALLISTER, E. L., ALM, E. & BAKER, D. (2000). Critical role of β-hairpin formation in protein G folding. Nature Struct. Biol. 7, 669–673. WRIGHT, C. F., LINDORFF-LARSEN, K., RANDLES, L. G. & CLARKE, J. (2003). Parallel protein-unfolding pathways revealed and mapped. Nature Struct. Biol. 10, 658–662. RICHARDSON, J. M., MCMAHON, K. W., MACDONALD, C. C. & MAKHATADZE, G. I. (1999). MEARA sequence repeat of human CstF-64 polyadenylation factor is helical in solution. A spectroscopic and calorimetric study. Biochemistry 38, 12869–12875. SHIRLEY, W. A. & BROOKS III, C. L. (1997). Curious structure in “canonical” alanine-based peptides. Proteins 28, 59–71. KARPLUS, M. (2000). Aspects of protein reaction dynamics: Deviations from simple behavior. J. Phys. Chem. B 104, 11–27. FERRARA, P., APOSTOLAKIS, J. & CAFLISCH, A. (2000). Thermodynamics and kinetics of folding of two model peptides investigated by molecular dynamics simulations. J. Phys. Chem. B 104, 5000–5010. CLARKE, D. T., DOIG, A. J., STAPLEY, B. J. & JONES, G. R. (1999). The α-helix folds on the millisecond time scale. Proc. Natl Acad. Sci. USA 96, 7232–7237. LEDNEV, I. K., KARNOUP, A. S., SPARROW, M. C. & ASHER, S. A. (1999). α-Helix peptide folding and unfolding activation barriers: A nanosecond UV resonance raman study. J. Am. Chem. Soc. 121, 8074–8086. ˜ OZ, V., THOMPSON, P. A., MUN HOFRICHTER, J. & EATON, W. A. (1997). Folding dynamics and mechanism of β-hairpin formation. Nature 390, 196–199.
25 OLIVEBERG, M., TAN, Y. J. & FERSHT, A. R. (1995). Negative activation enthalpies in the kinetics of protein folding. Proc. Natl Acad. Sci. USA 92, 8926–8929. 26 SEGAWA, S. & SUGIHARA, M. (1984). Characterization of the transition state of lysozyme unfolding. I. Effect. Biopolymers 23, 2473–2488. 27 MATAGNE, A., JAMIN, M., CHUNG, E. W., ROBINSON, C. V., RADFORD, S. E. & DOBSON, C. M. (2000). Thermal unfolding of an intermediate is associated with non-Arrhenius kinetics in the folding of hen lysozyme. J. Mol. Biol. 297, 193–210. 28 KARPLUS, M., CAFLISCH, A., SALI, A. & SHAKHNOVICH, E. (1995). Protein dynamics: From the native to the unfolded state and back again. In Modelling of Biomolecular Structures and Mechanisms (PULLMAN, A., JORTNER, J. & PULLMAN, B., eds), pp. 69–84, Kluwer Academic, Dordrecht, The Netherlands 29 KARPLUS, M. (1997). The Levinthal paradox: Yesterday and today. Folding Des. 2, S69–S75. 30 DOBSON, C. M., SALI, A. & KARPLUS, M. (1998). Protein folding: A perspective from theory and experiment. Angew. Chem. Int. Ed. 37, 868–893. 31 SCALLEY, M. L. & BAKER, D. (1997). Protein folding kinetics exhibit an Arrhenius temperature dependence when corrected for the temperature dependence of protein stability. Proc. Natl Acad. Sci. USA 94, 10636–10640. 32 CHAN, H. S. & DILL, K. A. (1998). Protein folding in the landscape perspective: Chevron plots and non-Arrhenius kinetics. Proteins 30, 2–33. 33 CAVALLI, A., HABERTHU¨ R, U., PACI, E. & CAFLISCH, A. (2003). Fast protein folding on downhill energy landscape. Protein Sci. 12, 1801–1803. 34 DINNER, A. R., SALI, A., SMITH, L. J., DOBSON, C. M. & KARPLUS, M. (2000). Understanding protein folding via free-energy surfaces from theory and experiment. Trends Biochem. Sci. 25, 331–339. 35 WONG, K. B., CLARKE, J., BOND, C. J. et al. (2000). Towards a complete description of the structural and dynamic properties
21
22 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
36
37
38
39
40
41
42
43
44
45
46
of the denatured state of barnase and the role of residual structure in folding. J. Mol. Biol. 296, 1257–1282. NEIDIGH, J. W., FESINMEYER, R. M. & ANDERSEN, N. H. (2002). Designing a 20-residue protein. Nature Struct. Biol. 9, 425–430. GELLMAN, S. H. & WOOLFSON, D. N. (2002). Mini-proteins Trp the light fantastic. Nature Struct. Biol. 9, 408–410. SIMMERLING, C., STROCKBINE, B. & ROITBERG, A. E. (2002). All-atom structure prediction and folding simulations of a stable protein. J. Am. Chem. Soc. 124, 11258–11259. SNOW, C. D., ZAGROVIC, B. & PANDE, V. S. (2002). The Trp cage: Folding kinetics and unfolded state topology via molecular dynamics simulations. J. Am. Chem. Soc. 124, 14548–14549. CHOWDHURY, S., LEE, M. C., XIONG, G. & DUAN, Y. (2003). Ab initio folding simulation of the Trp-cage mini-protein approaches NMR resolution. J. Mol. Biol. 327, 711–717. PITERA, J. W. & SWOPE, W. (2003). Understanding folding and design: replica-exchange simulations of “Trp-cage” miniproteins. Proc. Natl Acad. Sci. USA 100, 7587–7592. STILL, W. C., TEMPCZYK, A., HAWLEY, R. C. & HENDRICKSON, T. (1990). Semianalytical treatment of solvation for molecular mechanics and dynamics. J. Am. Chem. Soc. 112, 6127–6129. DAGGETT, V. & LEVITT, M. (1993). Protein unfolding pathways explored through molecular dynamics simulations. J. Mol. Biol. 232, 600–619. CAFLISCH, A. & KARPLUS, M. (1994). Molecular dynamics simulation of protein denaturation: Solvation of the hydrophobic cores and secondary structure of barnase. Proc. Natl Acad. Sci. USA 91, 1746–1750. DAGGETT, V. & FERSHT, A. (2003). Opinion: The present view of the mechanism of protein folding. Nature Rev. Mol. Cell Biol. 4, 497–502. MAYOR, U., JOHNSON, C. M., DAGGETT, V. & FERSHT, A. R. (2000). Protein folding and unfolding in microseconds to nanoseconds by experiment and
47
48
49
50
51
52
53
54
55
56
57
simulation. Proc. Natl Acad. Sci. USA 97, 13518–13522. MAYOR, U., GUYDOSH, N. R., JOHNSON, C. M. et al. (2003). The complete folding pathway of a protein from nanoseconds to microseconds. Nature 421, 863–867. CAFLISCH, A. & KARPLUS, M. (1999). Structural details of urea binding to barnase: A molecular dynamics analysis. Structure 7, 477–488. HAO, M.-H., PINCUS, M. R., RACHOVSKY, S. & SCHERAGA, H. A. (1993). Unfolding and refolding of the native structure of bovine pancreatic trypsin inhibitor studied by computer simulations. Biochemistry 32, 9614–9631. HARVEY, S. C. & GABB, H. A. (1993). Conformational transition using molecular dynamics with minimum biasing. Biopolymers 33, 1167–1172. SCHLITTER, J., ENGELS, M., KRUGER, P., JACOBY, E. & WOLLMER, A. (1993). Targeted molecular dynamics simulation of conformational change. Application to the TR transition in insulin. Mol. Simulations 10, 291–308. H¨UNENBERGER, P. H., MARK, A. E. & van GUNSTEREN, W. F. (1995). Computational approaches to study protein unfolding: Hen egg white lysozyme as a case study. Proteins 21, 196–213. FERRARA, P., APOSTOLAKIS, J. & CAFLISCH, A. (2000). Computer simulations of protein folding by targeted molecular dynamics. Proteins 39, 252–260. FERRARA, P., APOSTOLAKIS, J. & CAFLISCH, A. (2000). Targeted molecular dynamics simulations of protein unfolding. J. Phys. Chem. B 104, 4511–4518. PACI, E., SMITH, L. J., DOBSON, C. M. & KARPLUS, M. (2001). Exploration of partially unfolded states of human α-lactalbumin by molecular dynamics simulation. J. Mol. Biol. 306, 329–347. MARCHI, M. & BALLONE, P. (1999). Adiabatic bias mlecular dynamics: A method to navigate the conformational space of complex molecular systems. J. Chem. Phys. 110, 3697–3702. WILKINS, D. K., GRIMSHAW, S. B., RECEVEUR, V., DOBSON, C. M., JONES, J. A. & SMITH, L. J. (1999). Hydro-dynamic radii of native and denatured proteins
References
58
59
60
61
62
63
64
65
66
67
68
measured by pulse NMR techniques. Biochemistry 38, 16424–16431. KUWAJIMA, K. (1996). The molten globule state of α-lactalbumin. FASEB J. 10, 102–109. TROULLIER, A., REINSTA¨ DLER, D., DUPONT, Y., NAUMANN, D. & FORGE, V. (2000). Transient nonnative secondary structures during the refolding of α-lactalbumin by infrared spectroscopy. Nature Struct. Biol. 7, 78–86. SCHULMAN, B., KIM, P. S., DOBSON, C. M. & REDFIELD, C. (1997). A residue-specific NMR view of the non-cooperative unfolding of a molten globule. Nature Struct. Biol. 4, 630–634. RIEF, M., GAUTEL, M., OESTERHELT, F., FERNANDEZ, J. M. & GAUB, H. E. (1997). Reversible unfolding of individual titin immunoglobulin domains by AFM. Science 276, 1109–1112. LU, H., ISRALEWITZ, B., KRAMMER, A., VOGEL, V. & SCHULTEN, K. (1998). Unfolding of titin immunoglobulin domains by steered molecular dynamics simulation. Biophys. J. 75, 662–671. PACI, E. & KARPLUS, M. (1999). Forced unfolding of fibronectin type 3 modules: An analysis by biased molecular dynamics simulations. J. Mol. Biol. 288, 441–459. ISRALEWITZ, B., GAO, M. & SCHULTEN, K. (2001). Steered molecular dynamics and mechanical functions of proteins. Curr. Opin. Struct. Biol. 11, 224–230. PACI, E. & KARPLUS, M. (2000). Unfolding proteins by external forces and high temperatures: The importance of topology and energetics. Proc. Natl Acad. Sci. USA 97, 6521–6526. WILLIAMS, P. M., FOWLER, S. B., BEST, R. B. et al. (2003). Hidden complexity in the mechanical properties of titin. Nature 422, 446–449. FOWLER, S., BEST, R. B., TOCA-HERRERA, J. L. et al. (2002). Mechanical unfolding of a titin Ig domain: Structure of unfolding intermediate revealed by combining AFM, molecular dynamics simulations, NMR and protein engineering. J. Mol. Biol. 322, 841–849. BEST, R. B., FOWLER, S., TOCA-HERRERA, J. L., STEWARD, A., PACI, E. & CLARKE, J.
69
70
71
72
73
74
75
76
77
78
(2003). Mechanical unfolding of a titin Ig domain: Structure of transition state revealed by combining atomic force microscopy, protein engineering and molecular dynamics simulations. J. Mol. Biol. 330, 867–877. VENDRUSCOLO, M. & PACI, E. (2003). Protein folding: Bringing theory and experiment closer together. Curr. Opin. Struct. Biol. 13, 82–87. VENDRUSCOLO, M., PACI, E., KARPLUS, M. & DOBSON, C. M. (2003). Structures and relative free energies of partially folded states of proteins. Proc. Natl Acad. Sci. USA 100, 14817–14821. BROCKWELL, D. J., PACI, E., ZINOBER, R. C. et al. (2003). Pulling geometry defines the mechanical resistance of a β-sheet protein. Nature Struct. Biol. 10, 731–737. CARRION-VAZQUEZ, M., LI, H., LU, H., MARSZALEK, P. E., OBERHAUSER, A. F. & FERNANDEZ, J. M. (2003). The mechanical stability of ubiquitin is linkage dependent. Nature Struct. Biol. 10, 738–743. LU, H. & SCHULTEN, K. (2000). The key event in force-induced unfolding of titin’s immunoglobulin domains. Biophys. J. 79, 51–65. FERSHT, A. R., MATOUSCHEK, A. & SERRANO, L. (1992). The folding of an enzyme. I. Theory of protein engineering analysis of stability and pathway of protein folding. J. Mol. Biol. 224, 771–782. FERSHT, A. R., ITZHAKI, L. S., ELMASRY, N. F., MATTHEWS, J. M. & OTZEN, D. E. (1994). Single versus parallel pathways of protein folding and fractional structure in the transition state. Proc. Natl Acad. Sci. USA 91, 10426–10429. LI, A. & DAGGETT, V. (1994). Characterization of the transition state of protein unfolding by use of molecular dynamics: Chymotrypsin inhibitor 2. Proc. Natl Acad. Sci. USA 91, 10430–10434. DAGGETT, V. (2002). Molecular dynamics simulations of the protein unfolding/ folding reaction. Acc. Chem. Res. 35, 422–429. GSPONER, J. & CAFLISCH, A. (2002). Molecular dynamics simulations of
23
24 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
79
80
81
82
83
84
85
86
87
88
89
protein folding from the transition state. Proc. Natl Acad. Sci. USA 99, 6719– 6724. DEJONG, D., RILEY, R., ALONSO, D. O. & DAGGETT, V. (2002). Probing the energy landscape of protein folding/unfolding transition states. J. Mol. Biol. 319, 229–242. FERSHT, A. R. (1999). Structure and Mechanism in Protein Science: A Guide to Enzyme Catalysis and Protein Folding. W. H. Freeman & Co., New York. VENDRUSCOLO, M., PACI, E., DOBSON, C. M. & KARPLUS, M. (2001). Three key residues form a critical contact network in a protein folding transition state. Nature 409, 641–645. PACI, E., VENDRUSCOLO, M., DOBSON, C. M. & KARPLUS, M. (2002). Determination of a transition state at atomic resolution from protein engineering data. J. Mol. Biol. 324, 151–163. PACI, E., CLARKE, J., STEWARD, A., VENDRUSCOLO, M. & KARPLUS, M. (2003). Self-consistent determination of the transition state for protein folding. Application to a fibronectin type III domain. Proc. Natl Acad. Sci. USA 100, 394–399. DAGGETT, V. & LEVITT, M. (1993). Realistic simulations of native-protein dynamics in solution and beyond. Annu. Rev. Biophys. Biomol. Struct. 22, 353–380. CAFLISCH, A. & KARPLUS, M. (1995). Acid and thermal denaturation of barnase investigated by molecular dynamics simulations. J. Mol. Biol. 252, 672–708. LAZARIDIS, T. & KARPLUS, M. (1997). “New View” of protein folding reconciled with the old through multiple unfolding simulations. Science 278, 1928–1931. FRENKEL, D. & SMIT, B. (1996). Understanding Molecular Simulation, 2nd edition, Academic Press, London. BERNE, B. J. & STRAUB, J. E. (1997). Novel methods of sampling phase space in the simulation of biological systems. Curr. Opin. Struct. Biol. 7, 181–189. BARTELS, C. & KARPLUS, M. (1997). Multidimensional adaptive umbrella sampling: Applications to main chain and side chain peptide conformations. J. Comput. Chem. 18, 140–1462.
90 MITSUTAKE, A., SUGITA, Y. & OKAMOTO, Y. (2001). Generalized-ensemble algorithms for molecular simulations of biopolymers. Biopolymers 60, 96–123. 91 WU, X. & WANG, S. (1998). Self-guided molecular dynamics simulation for efficient conformational search. J. Phys. Chem. B 102, 7238–7250. 92 APOSTOLAKIS, J., FERRARA, P. & CAFLISCH, A. (1999). Calculation of conformational transitions and barriers in solvated systems: Application to the alanine dipeptide in water. J. Chem. Phys. 110, 2099–2108. 93 ANDRICIOAEI, I., DINNER, A. R. & KARPLUS, M. (2003). Self-guided enhanced sampling methods for thermodynamic averages. J. Chem. Phys. 118, 1074–1084. 94 SCHLICK, T., BARTH, E. & MANDZIUK, M. (1997). Biomolecular dynamics at long timesteps: bridging the time-scale gap between simulation and experimentation. Annu. Rev. Biophys. Biomol. Struct. 26, 181–222. 95 MARINARI, E. & PARISI, G. (1992). Simulated tempering: A new Monte Carlo scheme. Europhys. Lett. 19, 451–458. 96 SUGITA, Y. & OKAMOTO, Y. (1999). Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 314, 141–151. 97 RAO, F. & CAFLISCH, A. (2003). Replica exchange molecular dynamics simulations of reversible folding. J. Chem. Phys. 119, 4035–4042. 98 BERENDSEN, H. J. C., POSTMA, J. P. M., van GUNSTEREN, W. F., DINOLA, A. & HAAK, J. R. (1984). Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81, 3684–3690. 99 SANBONMATSU, K. Y. & GARCIA, A. E. (2002). Structure of Met-enkephalin in explicit aqueous solution using replica exchange molecular dynamics. Proteins 46, 225–234. 100 GARCIA, A. E. & SANBONMATSU, K. Y. (2002). Alpha-helical stabilization by side chain shielding of backbone hydrogen bonds. Proc. Natl Acad. Sci. USA 99, 2782–2787. 101 GARC´IA, A. E. & SANBONMATSU, K. Y. (2001). Exploring the energy landscape
References
102
103
104
105
106
107
108
109
110
111
112
113
of a hairpin in explicit solvent. Proteins 42, 345–354. ZHOU, R., BERNE, B. J. & GERMAIN, R. (2001). The free energy landscape for β hairpin folding in explicit water. Proc. Natl Acad. Sci. USA 98, 14931–14936. RHEE, Y. M. & PANDE, V. S. (2003). Multiplexed-replica exchange molecular dynamics method for protein folding simulation. Biophys. J. 84, 775–786. BOLHUIS, P. G., CHANDLER, D., DELLAGO, C. & GEISSLER, P. L. (2002). Transition path sampling: Throwing ropes over rough mountain passes, in the dark. Annu. Rev. Phys. Chem, 53, 291–318. BOLHUIS, P. G. (2003). Transition-path sampling of β-hairpin folding. Proc. Natl Acad. Sci. USA 100, 12129–12134. ELBER, R., MELLER, J. & OLENDER, R. (1999). Stochastic path approach to compute atomically detailed trajectories: Application to the folding of C peptide. J. Phys. Chem. B, 103, 899–911. EASTMAN, P., GRONBECH-JENSEN, N. & DONIACH, S. (2001). Simulation of protein folding by reaction path annealing. J. Chem. Phys. 114, 3823–3841. W¨UTHRICH, K. (1989). Protein structure determination in solution by nuclear magnetic resonance spectroscopy. Science 243, 45–50. CLORE, G. M. & SCHWIETERS, C. D. (2002). Theoretical and computational advances in biomolecular NMR spectroscopy. Curr. Opin. Struct. Biol. 12, 146–153. BROOKS, B. R., BRUCCOLERI, R. E., OLAFSON, B. D., STATES, D. J., SWAMINATHAN, S. & KARPLUS, M. (1983). CHARMM: A program for macromolecular energy, minimization and dynamics calculations. J. Comput. Chem. 4, 187–217. NERIA, E., FISCHER, S. & KARPLUS, M. (1996). Simulation of activation free energies in molecular dynamics system. J. Chem. Phys. 105, 1902–1921. LAZARIDIS, T. & KARPLUS, M. (1999). Effective energy function for protein dynamics and thermodynamics. Proteins 35, 133–152. PACI, E., VENDRUSCOLO, M. & KARPLUS, M. (2002). Native and nonnative
114
115
116
117
118
119
120
121
122
123
124
interactions along protein folding and unfolding pathways. Proteins 47, 379–392. COTA, E., HAMILL, S. J., FOWLER, S. B. & CLARKE, J. (2000). Two proteins with the same structure respond very differently to mutation: The role of plasticity in protein stability. J. Mol. Biol. 302, 713–725. SHEA, J.-E., ONUCHIC, J. N. & BROOKS III, C. L. (1999). Exploring the origins of topological frustration: Design of a minimally frustrated model of fragment B of protein A. Proc. Natl Acad. Sci. USA 96, 12512–12517. DUAN, Y. & KOLLMAN, P. A. (1998). Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 282, 740–744. SHIRTS, M. & PANDE, V. (2000). COMPUTING: Screen savers of the world unite! Science 290, 1903–1904. JACKSON, S. E. (1998). How do small single-domain proteins fold? Folding Des. 3, R81–R91. PANDE, V. S., BAKER, I., CHAPMAN, J. et al. (2003). Atomistic protein folding simulations on the submillisecond time scale using worldwide distributed computing. Biopolymers 68, 91–109. FERSHT, A. R. (2002). On the simulation of protein folding by short time scale molecular dynamics and distributed computing. Proc. Natl Acad. Sci. USA 99, 14122–14125. PACI, E., CAVALLI, A., VENDRUSCOLO, M. & CAFLISCH, A. (2003). Analysis of the distributed computing approach applied to the folding of a small b peptide. Proc. Natl Acad. Sci. USA 100, 8217–8222. SETTANNI, G., GSPONER, J. & CAFLISCH, A. (2004). Formation of the folding nucleus of an SH3 domain investigated by loosely coupled molecular dynamics simulations. Biophys. J. 86, 1691–1701. ROUX, B. & SIMONSON, T. (1999). Implicit solvent models. Biophys. Chem. 78, 1–20. GILSON, M. K. (1995). Theory of electrostatic interactions in macromolecules. Curr. Opin. Struct. Biol. 5, 216–223.
25
26 Molecular Dynamics Simulations to Study Protein Folding and Unfolding
125 TOMASI, J. & PERSICO, M. (1994). Molecular interactions in solution: An overview of methods based on continuous distributions of the solvent. Chem. Rev. 94, 2027–2094. 126 CRAMER, C. J. & TRUHLAR, D. G. (1999). Implicit solvation models: Equilibria, structure, spectra, and dynamics. Chem. Rev. 99, 2161–2200. 127 OROZCO, M. & LUQUE, F. J. (2000). Theoretical methods for the description of the solvent effect in biomolecular systems. Chem. Rev. 100, 4187–4226. 128 WARWICKER, J. & WATSON, H. C. (1982). Calculation of the electric potential in the active site cleft due to α-helix dipoles. J. Mol. Biol. 157, 671–679. 129 GILSON, M. K. & HONIG, B. H. (1988). Energetics of charge-charge interactions in proteins. Proteins 3, 32–52. 130 BASHFORD, D. & KARPLUS, M. (1990). pKa’s of ionizable groups in proteins: Atomic detail from a continuum electrostatic model. Biochemistry 29, 10219–10225. 131 DAVIS, M. E., MADURA, J. D., LUTY, B. A. & MCCAMMON, J. A. (1991). Electrostatics and diffusion of molecules in solutionsimulations with the University-ofHouston-brownian dynamics program. Comput. Phys. Comm. 62, 187–197. 132 SCARSI, M., APOSTOLAKIS, J. & CAFLISCH, A. (1997). Continuum electrostatic energies of macromolecules in aqueous solutions. J. Phys. Chem. B 101, 8098–8106. 133 BASHFORD, D. & CASE, D. A. (2000). Generalized Born models of macromolecular solvation effects. Annu. Rev. Phys. Chem. 51, 129–152. 134 LEE, M. S., FEIG, M., SALSBURY, F. R. & BROOKS III, C. L. (2003). New analytic approximation to the standard molecular volume definition and its application to generalized Born calculations. J. Comput. Chem. 24, 1348–1356. 135 LEE, M. S., SALSBURY, F. R. & BROOKS III C. L. (2002). Novel generalized Born methods. J. Chem. Phys. 116, 10606–10614. 136 QIU, D., SHENKIN, P. S., HOLLINGER, F. P. & STILL, W. C. (1997). The GB/SA
137
138
139
140
141
142
143
144
145
146
continuum model for solvation. A fast analytical method for the calculation of approximate Born radii. J. Phys. Chem. A 101, 3005–3014. DOMINY, B. N. & BROOKS III, C. L. (1999). Development of a generalized Born model parametrization for proteins and nucleic acids. J. Phys. Chem. B 103, 3765–3773. WARSHEL, A. & LEVITT, M. (1976). Theoretical studies of enzymic reactions: dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. J. Mol. Biol. 103, 227–249. GELIN, B. R. & KARPLUS, M. (1979). Side-chain torsional potentials: effect of dipeptide, protein, and solvent environment. Biochemistry 18, 1256–1268. MEHLER, E. L. (1990). Comparison of dielectric response models for simulating electrostatic effects in proteins. Protein Eng. 3, 415–417. WANG, L., HINGERTY, B. E., SRINIVASAN, A. R., OLSON, W. K. & BROYDE, S. (2002). Accurate representation of B-DNA double helical structure with implicit solvent and counterions. Biophys. J. 83, 382–406. MALLIK, B., MASUNOV, A. & LAZARIDIS, T. (2002). Distance and exposure dependent effective dielectric function. J. Comput. Chem. 23, 1090–1099. HABERTHU¨ R, U., MAJEUX, N., WERNER, P. & CAFLISCH, A. (2003). Efficient evaluation of the effective dielectric function of a macromolecule in aqueous solution. J. Comput. Chem. 24, 1936–1949. EISENBERG, D. & MCLACHLAN, A. D. (1986). Solvation energy in protein folding and binding. Nature 319, 199–203. HASEL, W., HENDRICKSON, T. F. & STILL, W. C. (1988). A rapid approximation to the solvent accessible surface areas of atoms. Tetrahedron Comput. Methodol. 1, 103–116. DARDEN, T. A., YORK, D. M. & PEDERSEN, L. (1993). Particle mesh Ewald: An N log(N) method for computing Ewald sums. J. Chem. Phys. 98, 10089–10092.
References 147 WEBER, W., HUNENBERGER, P. H. & MCCAMMON, J. A. (2000). Molecular dynamics simulations of a polyalanine octapeptide under Ewald boundary conditions: Influence of artificial periodicity on peptide conformation. J. Phys. Chem. B 104, 3668– 3675. 148 FAN, H. & MARK, A. E. (2003). Relative stability of protein structures determined by X-ray crystallography or NMR spectroscopy: a molecular dynamics
simulation study. Proteins 53, 111– 120. 149 HILTPOLD, A., FERRARA, P., GSPONER, J. & CAFLISCH, A. (2000). Free energy surface of the helical peptide Y(MEARA)6. J. Phys. Chem. B 104, 10080–10086. 150 LINDORFF-LARSEN, K., VENDRUSCOLO, M., PACI, E. & DOBSON, C. M. (2004). Transition states for protein folding have native topologies despite high structural variability. Nature Struct. Mol. Biol 11, 443–449.
27
1
Molecular Dynamics Simulations of Proteins and Peptides Paul Tavan, Heiko Carstens, and Gerald Mathias Ludwig-Maximilians-Universit¨at M¨unchen, M¨unchen, Germany
Originally published in: Protein Folding Handbook. Part I. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
When viewed from the standpoint of theoretical physics, proteins are extremely complex materials. They are inhomogeneous, non-isotropic, and exhibit dynamic processes covering many spatial and temporal scales (instead of only one or two), that is, they do not offer any of the nice features that make other materials readily accessible to simplified or coarse-grained descriptions. Therefore, theoretical approaches towards protein and peptide dynamics have to be based on microscopic descriptions. Here, atomistic molecular dynamics (MD) simulations, which are based on molecular mechanics (MM) force fields, currently represent the standard [1]. It should be noted, however, that these MM methods represent a first level of coarse graining, because they try to account for the electronic degrees of freedom mediating the interactions between the atoms by simplified parametric descriptions coded into standard MM force fields (for a recent review see Ponder and Case[2]). In this article we want to address the question, to what extent can MM-MD simulations currently or in the near future contribute to the understanding of protein folding? Although simulations have been published (for a review see Ref. [3]) that allegedly describe folding processes of peptides or small proteins, there are also serious arguments (e.g., in Ref. [2]) that the available MM force fields may not be accurate enough for qualitatively correct descriptions. This would imply that the published simulations, instead of providing a “virtual reality” of folding processes, actually represent “real artifacts.” Thus, it is worthwhile addressing the issue of accuracy. Because there are many excellent textbooks and review articles on MD simulation methods for liquids [4] and proteins (Ref. [2] and references therein), there is no Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Molecular Dynamics Simulations of Proteins and Peptides
point in duplicating this material. Instead, we have chosen to outline our own perspective of the issue, which has been and still is guiding our longstanding efforts to develop and improve the required computational methods. We start with a sketch of the basic physics of protein structure and dynamics. This sketch will serve to identify key challenges that have to be mastered by any attempt aiming at a theoretical description of these complicated processes. 2 Basic Physics of Protein Structure and Dynamics
The molecular machinery of life, designed by Darwinian selection in the process of evolution, essentially consists of proteins. Like all machines, these nano-machines are composed of rigid parts that define a specific three-dimensional structure and flexible parts that enable the particular functional dynamics for which the respective protein has been designed. Because proteins are polymer molecules, “flexibility” here implies a liquid state, like the one found in polymer melts. Therefore, under physiological thermodynamic conditions, proteins are generally found in a mixed state close to the melting transition between the solid and liquid phases. The solid “secondary” substructures of proteins, for example the α-helices and βsheets, are shaped by attractive interactions between strong electric dipoles, which are attached to the locally stiff and planar peptide groups within the polypeptide chains (Figure 1 illustrates the dipole–dipole interactions within a β-sheet). Here,
Fig. 1 Organization of the dipole–dipole interactions within a $-sheet. As explained by the resonance structures shown in Figure 3, the peptide groups form rigid platelets (indicated by the dashed lines) and exhibit strong electric dipole moments (arrows). Such dipoles attract each other, if their arrangement is parallel and axial (→→) or
antiparallel and equatorial (↑↓). The structural stability of $-sheets is caused by a dipolar organization integrating both arrangements in an optimized fashion. Here, in contrast to "-helices, which exclusively exhibit the (→→)-arrangement, the local dipoles do not add up to a macroscopic dipole.
2 Basic Physics of Protein Structure and Dynamics 3
the specific sequence of amino acid residues selects the respective secondary structures or, alternatively, precludes the formation of such rigid structures by inducing a local melting of the backbone into a flexible loop or random coil. Therefore, the specific combination and three-dimensional organization of solid and liquid building blocks into the complex structure and dynamics characteristic for proteins is predetermined by their amino acid sequence. Considering the thermodynamic state, proteins resemble a mixture of ice and liquid water, which, likewise, is characterized by a competition between enthalpic dipolar binding interactions attempting to generate solid order and temperature-driven entropic unbinding interactions attempting to generate liquid disorder. Nevertheless, there is a distinct difference concerning the complexity of the system: whereas a water–ice mixture homogeneously consists of only one molecular species, proteins are made up of a specific sequence of at least the 20 different residues coded in the genes, not counting all the additional cofactors, which also contribute to the complexity of protein structure and dynamics.
2.1 Protein Electrostatics
When attempts are made at a theoretical description, a first difficulty becomes apparent here. The dipole–dipole interactions (commonly called hydrogen bonds), which provide the main contribution to the folding of the polypeptide chains into specific three-dimensional structures, provide binding energies not much larger than the thermal energy kB T and it is only the combination of many different dipole–dipole interactions that generates the stability of a specific structure. Therefore, the precise description of the many weak electrostatic interactions within a protein is of key importance, since these interactions determine whether particular solid structures are formed or not. This instance is only a specific manifestation of the general fact that electrostatic interactions are dominant in shaping protein structure and dynamics [5]. In contrast, the weaker van der Waals interactions are less important as they are nonspecifically acting between all chemically nonbonded atoms in a similar way. To obtain a quantitative estimate, consider a hypothetical protein with N = 100 residues, whose native conformation lies the typical energy value of E = 20 kJ mol−1 below either a compact misfolded conformation or the huge variety of unfolded ones. Then the acceptable error per residue allowing us still to identify the correct conformation is approximately E/N 1/2 or 2 kJ mol−1 (see Ref. [2] and references therein), where the errors are assumed to be randomly distributed. Systematic errors in the description of the electrostatics will reduce the acceptable error down to E/N or 0.2 kJ mol−1 . Thus, great care has to be taken in the description of intraprotein electrostatics if one wants to generate a “virtual reality.” Its quantitatively correct description constitutes the first key problem for theoretical approaches.
4 Molecular Dynamics Simulations of Proteins and Peptides
2.2 Relaxation Times and Spatial Scales
Besides the strong variations of the physicochemical (predominantly electrostatic) properties among the residues, there is a second important difference between a water–ice equilibrium and a protein, which adds to the indicated complexity. In liquid water the molecules are free to move individually, which confines the relaxation time of water in response to an external perturbation (e.g., to a change of the electric field) to the scale of a few picoseconds. Furthermore, in pure water at room temperature spatially ordered structures are found only up to a distance of about 1.5 nm from a given water molecule; beyond this distance water starts to behave like a dielectric continuum [6]. In contrast, in a protein the motions of the molecular components are strongly constrained by the polymeric connectivity and by solid substructures. These constraints lead to relaxation times that are larger than those in liquid water by up to 12 orders of magnitude or more. Motions that change the conformation of a protein proceed through thermally activated rotations around the covalent single bonds linking the Cα atoms to the adjacent peptide groups of the backbone. At room temperature such rotations take at least a few 10–100 ps, which is why a protein can start to statistically sample its available conformational space only at time scales above 1 ns. Here, the number of atoms involved in a conformational transition defines a spatial scale and the time required for this transition increases rapidly with the size of the spatial scale. Up to now it is not yet clear whether the resulting time scales of conformational transitions represent a well-ordered hierarchy, within which distinct time scales can be associated to specific processes, or a continuum extending from nanoseconds to seconds. Whereas the former alternative would immediately suggest coarse-grained approaches to protein dynamics, the latter would make them difficult to construct. But whichever of these alternative pictures may be true, the vastly different temporal and spatial scales of molecular motions in proteins generate the second key problem for theoretical descriptions by simulation techniques. 2.3 Solvent Environment
Up to this point we have looked at the problem of describing the native structure and dynamics of proteins as if they could be understood as isolated objects. Assuming this hypothesis we have identified two key problems with which any attempt of a realistic description is confronted. The first is posed by the high accuracy required for the modeling of the electrostatics within a protein and the second by the huge span of time scales that have to be covered. But the issue is even more complicated, since proteins cannot be isolated from their surroundings. Proteins acquire their native structure and functional dynamics solely within their physiological environment, which in the case of soluble proteins is characterized by certain ranges of temperature, pressure, pH, and concentrations of ions and of other molecular components within the aqueous solvent. Although these ranges
2 Basic Physics of Protein Structure and Dynamics 5
may vary from protein to protein, there are specific physiological ranges for each of them. Therefore, any attempt to provide a quantitative description for the dynamic processes of protein folding and function has to properly include the environment at its physiological thermodynamic state. 2.4 Water
The key component of the environment is liquid water, whose adequate representation in computer simulations represents a challenge on its own (see Ref. [7] for a critical review and Ref. [6] for recent results). A water molecule is nearly spherical and very small (approximately the size of an oxygen atom), it carries large electrical dipole and quadrupole moments, and is strongly polarizable by electric fields generated, for example, by other water molecules, ions, or polar groups in its surroundings. The large dipole moment causes the unusually large dielectric constant of 78, by which liquid water scales down electrostatic interactions. For proteins and peptides this dielectric mean-field property of water is important, because they are usually polyvalent ions and therefore generate a reaction field by polarization of the aqueous environment, which modifies the electrostatic interactions within the protein. The proper description of these very long-ranged effects thus represents the first problem specifically posed by MD simulations of protein–solvent systems. The sizable quadrupole moment of the water molecule shapes the detailed structures of the hydration shells of solute molecules and, in particular, also those of proteins. If, as suggested by Brooks [8], water molecules actively support secondary structure formation, for example, by bridging nascent β-sheets, then an accurate modeling of the quadrupole moment is required for the understanding of the detailed folding pathways. The strong electronic polarizability of the water molecules poses yet another challenge. As illustrated by the computational result in Figure 2, due to polarization the dipole moment increases from a value of 1.85 D for a water molecule isolated in the gas-phase to a value of about 2.6-3.0 D in the liquid phase [7, 9, 10]. Correspondingly the dipole moment of a water molecule will sizably change when it moves from a hydration shell of an ion through bulk water towards a non-polar surface of a protein. 2.5 Polarizability of the Peptide Groups and of Other Protein Components
The polarization-induced changes of the dipole moments of the water molecules sketched above are important in the hydrophobic interactions [11], which are commonly believed to provide a key driving force in protein folding (see Ref. [12] for a discussion). Here, particularly the dipole-induced-dipole interactions, for example, between ordered hydration shells and nonpolar surfaces, are expected to provide large contributions. Therefore, not only the electronic polarizability of the water
6 Molecular Dynamics Simulations of Proteins and Peptides
Fig. 2 Distribution of the dipole moment of a water molecule dissolved in pure water as obtained from a hybrid MD simulation combining density functional theory (DFT) for the given water molecule with an MM description of the aqueous environment. The DFT description of an isolated water molecule yields a dipole moment of 1.92 D, which is very close to the experimental
value of 1.85 D. Thus, the induced dipole moment determined by the calculations is 0.85 D, that is about 40% of the gas-phase value. In liquid water the fluctuations of the dipole moment generated by the electronic polarization are 6.5% as measured by F : /:. For technical details and explanations see Ref. [9].
molecules, but also that of the various protein components has to be included for a quantitative description of the hydrophobic effect. Furthermore, the electronic polarizability of the components can stabilize local structures in proteins. For instance, about 50% of the binding energy between a positive charge and an aromatic ring is due to polarization (as shown in Ref. [2] for the representative example of potassium bound to benzene). Particularly the polarizability of the peptide groups can be expected to decide about the relative stabilities of the secondary structure motifs in proteins. This claim is proven by the following simple argument: We have stated further above (and illustrated in Figure 1 for the case of a β-sheet) that the dipole–dipole interactions within a protein, particularly those between the peptide groups of the backbone, shape the secondary structures of proteins. The peptide groups, in addition to exhibiting strong dipole moments, are highly polarizable as one can deduce from the resonance structures depicted in Figure 3. In response to the electric field generated by the dipole moments of neighboring peptide groups, with which a peptide group interacts in a given secondary structure motif (cf. Figure 1), an induced dipole moment will be added to the static dipole moment of the given group, and will correspondingly strengthen or weaken its dipole–dipole interactions (H-bonds). Here, the size and direction of the induced dipole moment will be determined by the spatial distribution and orientation of the neighboring dipoles and partial charges, that is, predominantly by the surrounding secondary structure. As is also apparent from Figure 3, the electronic polarization will concomitantly change the force constants and equilibrium lengths of the chemical bonds
2 Basic Physics of Protein Structure and Dynamics 7
Fig. 3 B-electron resonance structures of a peptide group explaining its strong dipole moment and polarizability. An external electric field can modify the relative weights, by which these resonance structures contribute to the electronic wave function. Correspondingly, it will sizably change the dipole moment and the force constants.
(O C N H) within the peptide groups. This effect is apparent in the vibrational spectra of proteins, which are dominated by the normal modes of the peptide groups giving rise to so-called amide bands. The shape of these broad bands is a signature of the differences in polarization experienced by the various peptide groups and, therefore, can be used for an approximate determination of secondary structure content (see Ref. [13] for further information). Peaks of the so-called amide-I band, for instance, which belongs to the C O stretching vibrations of the backbone, are found in the spectral range between 1620 and 1700 cm−1 , indicating that the polarization can change the associated C O force constants by 5%. Conversely, (i) the fact that the large and specific frequency shifts of the amide modes can indicate the secondary structure environment of the corresponding peptide groups, and (ii) the resonance argument in Figure 3 uniquely demonstrate that the electronic polarization will entail sizably different strengths (≥5%) of the dipolar interactions (“H-bonds”) within the various secondary structure motifs. This proves our above conjecture that the electronic polarizability provides a differential contribution to the relative stabilities of secondary structure motifs. As a result, theoretical descriptions of protein dynamics have to include the electronic polarizability. In addition to this general insight, the above considerations have led us also to quantitative estimates as to how much the polarizability contributes to the electrostatics. We found numbers ranging between 50% (charge on top of an aromatic ring) and 5% (peptide groups in different secondary structure motifs). These contributions systematically and differentially change the electrostatic interactions within protein–solvent systems and do not represent random variations. This instance is obvious if one compares the interaction of a dipolar group with a charged and a nonpolar group, respectively. Furthermore, it has been demonstrated above for the dipolar interactions within secondary structure motifs. Below, we will refer to the thus established 5% lower bound of the polarizability contribution to protein electrostatics for purposes of accuracy estimates. Finally, we would like to add that enzymatic catalysis in proteins, which is a prominent feature of these materials, is essentially an effect of polarizability. Here, a substrate is noncovalently attached in a specific orientation to a binding pocket and a highly structured electric field, which is generated by a particular arrangement
8 Molecular Dynamics Simulations of Proteins and Peptides
of charged, polar, and nonpolar residues, exerts a polarizing strain on its charge distribution such that a particular chemical bond either breaks or is formed. This example underlines the decisive contribution of the electronic polarizability to protein structure, dynamics and function.
3 State of the Art
By considering the basic physics of protein–solvent systems we have identified key challenges that have to be met by MD-based theoretical descriptions of protein and peptide dynamics. The majority of these problems are posed by (i) the complex electrostatics in such systems and comprise (a) the long range of these interactions, (b) their shielding by the surrounding solvent, and (c) their local variability, which is caused by differential electronic polarization in these inhomogeneous and nonisotropic materials. Further challenges derive from the need (ii) to properly account for the thermodynamic conditions and (iii) to cover the relevant time scales. During the past 25 years, since the first MD simulation of a protein was reported [14], the true complexities of the challenge have become more and more apparent. By now some of the problems have been solved, whereas others are still awaiting a solution. Quite na¨ıvely, the first MD simulation dealt with a protein embedded in vacuum. However, the importance of the aqueous environment and the correct choice of the thermodynamic conditions soon became apparent [15, 16]. 3.1 Control of Thermodynamic Conditions
In the theory of liquids and long before the history of protein simulations began, MD simulation systems representing a well-defined thermodynamic ensemble had been set up by filling periodic boxes with MM models of the material of interest (cf. the first simulation of liquid water by Rahman and Stillinger [17]). Whereas the temperature (or internal energy) is trivially accessible in any MD approach, up to now the pressure (or volume) can be controlled solely (cf. Mathias et al. [18] for a discussion) through the use of periodic boundary conditions (PBCs). This control can be executed safely and poses no problems. 3.2 Long-range Electrostatics
PBCs generate an unbounded simulation system of finite size with the topology of a three-dimensional torus. Therefore, they create the problem of artificial selfinteractions whenever the interactions among the simulated particles are long ranged as is the case for electrostatics. To avoid such self-interactions one has to follow the minimum image convention (MIC), which dictates that all interactions beyond the MIC-distance RMIC = L/2 should be neglected, where L is the size of the
3 State of the Art
periodic simulation box [4]. On the other hand, a corresponding truncation of the long-ranged electrostatic interactions in the so-called straight-cutoff (SC) approach leads to serious structural and energetic artifacts, even if the simulated system consists solely of pure water [18–20]. For solutions of polyvalent ions, like proteins or nucleic acids, the SC-artifacts become even worse. Nevertheless, up to about a decade ago MD simulations of pure water and protein–solvent systems used to apply the SC approach with small cutoff distances RC ≈ 1 nm (i.e., RC RMIC ) mainly for computational reasons: In an N-particle system the number of long-range interactions scales like N 2 and, therefore, seems to imply an intractable computational effort. However, in the sequel, the unacceptably large errors connected with this crude approximation forced investigators to ignore the MIC and to apply lattice sum (LS) methods [21–23] to the computation of the long-range electrostatics. LS methods take advantage of the periodicity of the simulated system and can evaluate the correspondingly periodic electrostatic potential at a computational effort scaling with N log N. On the other hand, the neglect of the MIC now introduces periodicity artifacts into the description of nonperiodic systems such as liquids and proteins in solution [24]. However, for large simulation systems with L > 6 nm in the case of pure water and L 2d in the case of a solvated protein of diameter d, the periodicity artifacts should become negligible or small, respectively [6, 18]. Today, we can even give absolute error bounds for the accuracy of the electrostatics computation in MD simulations. This is due to the recent development of a new algorithmic approach, which avoids the use of a periodic potential and, nevertheless, provides an accurate and computationally efficient alternative to the LS methods [18]. Therefore, it allows us to estimate, for example, the size of the LS artifacts and, in combination with LS results, to actually measure the SC errors. Because a comparably accurate alternative to LS had been lacking before, such accuracy measurements had been impossible. Figure 4 sketches the basic concept of this recent multiple-scale approach, which combines toroidal boundary conditions with a reaction field correction (TBC/RF). To provide an example for such error bounds, we consider the PBC/LS, TBC/RF, and SC simulations on large water systems (L = 8-12 nm, N = 34000-120 000), which have been executed with two different programs (GROMACS [27], EGOMMII [28]) under otherwise identical conditions (see Refs. [6, 18]). The enthalpies of vaporization H per water molecule calculated by PBC/LS and TBC/RF showed relative deviations of at most 3/1000, that is 0.1 kJ mol−1 . This deviation is smaller by a factor of 2 than the acceptable systematic error per residue estimated in Section 1 and by a factor of 20 below the acceptable random error. In contrast, the H value calculated by SC using a cutoff radius RC ≈ 4 nm (system size L = 8 nm) deviated by 1.2 kJ mol−1 from the PBC/LS and TBC/RF values, which is larger by a factor of 6 than the acceptable systematic error introduced in Section 1. This example quantitatively proves that SC simulations grossly miss the accuracy required for protein structure prediction in purely dipolar systems, whereas here the two other methods are acceptable. For solutions containing ions and particularly proteins similar comparisons of PBC/LS and TBC/RF treatments will soon
9
10 Molecular Dynamics Simulations of Proteins and Peptides
Fig. 4 Concept of the TBC/RF approach to the computation of long-range electrostatics in MD simulations of proteins in solution [18]. Each atom of the periodic simulation system of size L containing a protein of diameter d in solution is surrounded by a sphere of radius RC = RMIC = L/2. The polarization of the dielectric out-
side of that sphere generates a reaction field acting on the central charge, whose interactions with the charges within the sphere are rapidly calculated by fast, hierarchical, and structure-adapted multipole expansions. Here, the applied SAMM algorithm [25, 26] scales linearly with N.
provide clues as to which measures can be used to avoid computational artifacts and to achieve the required accuracy in these more complicated cases. As a result, the longstanding problem of how the long-range electrostatics of protein–solvent systems can be treated in MD simulations with sufficient accuracy and manageable computational effort will find a final answer in the near future. Furthermore, we can safely state that the remaining accuracy problems connected with long-range electrostatics computation are much smaller than those associated with the MM force fields to which we will now turn our attention. 3.3 Polarizability
As shown in Sections 3.4–3.5, many components of a protein–solvent system exhibit a significant electronic polarizability and, therefore, their electrostatic signatures, as measured by their multipole moments, will strongly vary with their position within these complex and inhomogeneous systems. In spite of the decisive part played by the electronic polarizability in protein structure, dynamics, and function, all available MM force fields still use static partial charges to model the electrostatic properties of the molecules (see, however, the ongoing development reported by Kaminski et al. [29]). Thus, they try to account for the electronic polarizability in a mean field fashion, which is invalid in inhomogeneous and noniso-tropic materials. The examples discussed in Sections 3.4–3.5 suggest a lower bound of 5% for the size of the relative errors, which the neglect of polarization introduces into the description of the electrostatics.
3 State of the Art
Fig. 5 Geometries of three different nonpolarizable MM models for water. Black circles indicate the positions of negative and open circles those of the positive partial charges. The large circle indicates the van der Waals radius. A) TIP3P, B) TIP4P (both Jorgensen et al. [33]), C) TIP5P [36].
To convert this relative error into an absolute number, we have conducted a sample TBC/RF simulation for bovine pancreatic trypsin inhibitor (BPTI) [30] in water at a temperature of 300 K and a pressure of 1 bar using the set-up described in Ref [31]. Here, a periodic simulation box of 8 nm side length, filled with 17329 rigid TIP3P water molecules [32] (cf. Figure 5) and a CHARMM22 model [33] of BPTI (N res = 58 amino acid residues), was equilibrated for 1 ns. We took this equilibrated system as the starting point for a 100 ps simulation and evaluated the average total electrostatic energy E e,tot of the protein atoms. Less than 1% of all electrostatic interactions between the protein charges occur within a given residue. Due to the so-called 1-3 exclusion commonly used in MM force fields [33] they act at comparable distances like the interactions with charges residing at neighboring residues or solvent molecules. Therefore, we estimate the intra-residue contribution to E e,tot to be sizably smaller than at most 20% of E e ,tot. As a result, the expression E e,res ≈ 0.8 × E e,tot /Nres should represent a lower limit for the average electrostatic energy E e,res per residue. Inserting the computed number, we find E e,res ≈ 160 kJ Mol−1 . With the 5% lower bound for the systematic error, which is expected to be caused by the neglect of polarizability, we thus find a lower limit to the “polarizability error” of 8 kJ mol−1 per residue. This polarizability error is by a factor 4 larger than the acceptable random error and exceeds the acceptable systematic error, which should be more relevant here, by a factor of 40. Therefore, the neglect of the electronic polarizability is the key drawback of the available MM force fields and, up to now, prevents quantitative MD descriptions of protein structure. Furthermore, it is the reason why the parameterization of such force fields is confronted with insurmountable difficulties. As outlined in Section 2.5, the electronic polarizability of the peptide groups provides a differential contribution to the relative stabilities of secondary structure motifs. As a result, the common nonpolarizable MM force fields can either describe the stability of α-helices or that of β-sheets at an acceptable accuracy, but never both.
11
12 Molecular Dynamics Simulations of Proteins and Peptides
Correspondingly and as illustrated in Ponder and Case [2] by sample simulations, such force fields either favor α-helical structures (e.g., AMBER ff94 [34]) or βsheets (e.g., OPLS-AA [35]), or else fail concerning intermediate structures (e.g., CHARMM22 [33]). 3.4 Higher Multipole Moments of the Molecular Components
For computational reasons, MM force fields apply the partial charge approximation, according to which the charge distribution within the molecular components of a simulation system is modeled by charges located at the positions of the nuclei. Although this approximation should be sufficiently accurate in most instances (if the problem of the missing polarizability is put aside), it will cause problems concerning the description of local structures, whenever “electron lone pairs” occur at oxygen atoms [29]. Important examples for this chemical motif are the peptide C O groups (cf. Figure 3) and the water molecule. At the small oxygen atoms the lone pairs cause large higher multipole moments, whose directional signature cannot be properly represented within the partial charge approximation. Although the electric field generated by these multipole moments rapidly decays beyond the first shell of neighboring atoms, it steers the directions of the H-bonds within that shell and, thus, shapes the corresponding local structures. The consequences of this subtle effect are yet to be explored for proteins. For water, however, whose experimental and theoretical characterization has always been and still is at a much higher stage than that of proteins, the importance of this effect is well known. 3.5 MM Models of Water
In an excellent review, Guillot [7] has recently given “a reappraisal of what we have learnt during three decades of computer simulations on water.” Within the partial charge approximation, MM models of water comprise three charges at the positions of its three atoms (“three point models”). By symmetry and neutrality of the molecule, in this case three parameters (one charge, bond angle, and bond length) suffice to specify the charge distribution (cf. Figure 5A). However, these simple three-point models, which are commonly used in MD simulations of protein–water systems, do not correctly account for the higher multipole moments and, therefore, miss essential features of the correlation functions in pure water (see, for example, Ref. [6]). Because of this defect and of the missing polarizability, they cannot be expected to provide an adequate description of the hydration shell structures surrounding proteins. For an improved description of the higher multipole moments, four- and fivepoint models have been suggested positioning partial charges away from the positions of the nuclei. Figure 5 compares the geometries of correspondingly extended models with that of a simple three-point model. In addition, a series of polarizable models using “fluctuating charges” or inducible dipoles have been proposed.
3 State of the Art
Guillot gives an impressive table listing about 50 different MM models of water and, subsequently, classifies the performance of these models concerning the reproduction of experimentally established properties in simulations. Although Guillot mainly discusses pure water, he also shortly comments on the topic of transferability, which is the unsolved problem of how to construct an MM water model performing equally well for all kinds of solute molecules. Summarizing his review he arrives at the conclusion that “one has a taste of incompletion, if one considers that not a single water model available in the literature is able to reproduce with a great accuracy all the water properties. Despite many efforts to improve this situation very few significant progress can be asserted.” In his view this is due to “the extreme complexity of the water force field in the details, and their influence on the macroscopic properties of the condensed phase.” Nevertheless, being a committed scientist he does not suggest that we should quit attempts at an MM description but instead adds several suggestions as to how one should account for the polarizability and for other details in a better way. 3.6 Complexity of Protein–Solvent Systems and Consequences for MM-MD
The state of the MM-MD descriptions of pure water sketched above is sobering when one turns to protein–solvent systems, whose complexity exceeds that of pure water by orders of magnitude. This enormous increase of complexity is caused only in part by the much larger spatial and temporal scales characterizing protein dynamics (cf. Section 2.1). More important is the combinatorial complexity, which is posed by the fact that one has to parameterize the interactions between a large variety of chemically different components (water, peptide groups, amino acid residues, ions, etc.) into a single effective energy function in a balanced way. If already in the case of a one-component liquid like water subtle details of the MM models determine the macroscopic properties, a difficulty which, up to now, has prevented the construction of a model capable of explaining all known properties, then one may ask whether the MM approach will ever live up to its promises concerning proteins at all. This critical question gains support by our above estimate concerning the size of the errors introduced by the neglect of the electronic polarizability characteristic for the common force fields. Our conservative estimate suggests that nowadays not even a single and most basic property of soluble proteins, namely their structure, can be reliably predicted by extended MM-MD simulations. 3.7 What about Successes of MD Methods?
However, this disastrous result on the accuracy of current MM-MD simulations does not imply that their results are always completely wrong. Instead it may sometimes happen that numbers computed by MD for certain observables surprisingly agree with experimental data within a few per cent. Examples are the peptide simulations presented by Daura et al. [37], which have shown a mixture of peptide
13
14 Molecular Dynamics Simulations of Proteins and Peptides
conformations compatible with NMR data, the simulation of AFM experiments on the rupture force required to pull a biotin substrate out of its binding pocket in streptavidin [38], or the simulation of the light-induced relaxation dynamics in small cyclic peptides [39] (see Section 3 for a more detailed discussion of this example). In such cases, the apparent “success” of an MD simulation may be due (i) to the choice of an observable that happens to be extremely stable with respect to the sizable inaccuracies of the MM force field, or (ii) to the fortunate cancellation of many different errors [40]. Therefore, one cannot readily conclude from such favorable examples that the MD approach towards biomolecular dynamics has “come of age” [41]. Instead, one must expect that observables computed by unrestricted MD simulations differ either quantitatively from experimental data by 10–50% or are qualitatively wrong. Furthermore, the deficiencies of the MM force fields do not imply that results of MD simulations that are restricted by experimental data are wrong. On the contrary, this type of simulation [42] has led to the greatest success of the MD method and is largely responsible for its widespread application on proteins. In this experimentally restricted MD approach the chemical expertise that is coded into the various parameters of the MM force field (providing standard values for the bond lengths, bond angles, dihedral angles, for the elasticity of these internal coordinates etc.) is combined with a large set of experimental constraints (e.g., from X-ray diffraction or more-dimensional NMR). Shortly after their suggestion in 1987, the corresponding MD-based “annealing simulations” have become the standard tool in the computation of protein and peptide structures from X-ray and NMR data and have acquired a key role in structural biology, because they have enabled the automation of protein structure refinement. In this context, the deficiencies of the MM force fields play a minor role as they are largely corrected by the experimental constraints. Note that the protein electrostatics and its shielding by the solvent are usually ignored in these experimentally restricted simulations, because here one neither wants nor has to deal with the associated complexities. Finally, another type of restricted MD simulation also does not suffer too much from the deficiencies of the MM force fields. These are the so-called QM/MM hybrid methods, in which a small fragment of a simulation system is treated by quantum mechanics (QM) and the large remainder by MM (the basic concepts of this approach were described in a seminal paper by Warshel and Levitt in 1976 [43]). Here, the attention is restricted to the properties of the QM fragment, which are calculated as exactly as possible by numerical solution of the electronic Schr¨odinger equation. The electronic polarization of the QM fragment is obtained by importing the electric field generated by the MM environment as an external perturbation into the QM Hamiltonian. If this perturbation changes a certain property of the QM fragment by 10%, then even 10% errors in the external field calculated by MM will entail only errors of 1% in the QM result. To provide an example we consider the vibrational spectra of molecules. Due to the development [44, 45] and widespread accessibility [46] of density functional theory (DFT) highly accurate computations of intramolecular force fields and of associated gas-phase vibrational spectra have become feasible at a moderate
3 State of the Art
Fig. 6 Comparison of DFT (A) and DFT/MM (B) computations with experimental data for the vibrational spectra of the C O (solid lines) and C C (dashed lines) modes of quinone molecules in the gas-phase and in a protein, respectively. The DFT results pertain to p-benzoquinone [48] and the DFT/MM results to the ubiquinone
QA in the reaction center of Rb. sphaeroides [49]. Both computational results agree very well with the observations. In particular, the DFT/MM treatment correctly predicts the sizable 60 cm−1 red shift of one of the C O modes, which is caused by the protein environment through electrostatic polarization of the quinone.
computational effort. On average, the spectral positions of the computed lines deviate by only about 1.5% from the observations, which is sufficient for a safe assignment of observed bands to normal modes [47, 48]. Note that calculated molecular structures are more exact by orders of magnitude than the spectra, because the molecular structure is much less sensitive to inaccuracies of the QM treatment than the intramolecular force field. In condensed phase, the computation of vibrational spectra [49, 50] has recently been enabled by the development of a DFT/MM hybrid method [9]. The perturbation argument given further above entails the expectation that DFT/MM can describe condensed-phase vibrational spectra at an accuracy comparable to the one previously achieved by DFT for the gas-phase spectra. This expectation is confirmed by the results shown in Figure 6 and by the infrared spectra of p-bezoquinone on water, which have been calculated from a QM/MM hybrid MD trajectory [50]. Thus, although the common nonpolarizable MM force fields are not accurate enough for unrestricted MD simulations of biomolecules, they are valuable in more restricted settings. 3.8 Accessible Time Scales and Accuracy Issues
Concerning MM-MD simulations of protein or peptide folding, the insufficient accuracy of MM force fields has the following consequence: If we had computer power enough to simulate the folding dynamics of a small protein or peptide in solution for milliseconds or seconds starting from a random coil conformation, we would have to expect at a high probability that the conformation predicted by the
15
16 Molecular Dynamics Simulations of Proteins and Peptides
simulation has nothing to do with the native conformation, that is, represents a “real artifact.” In this respect it is, in a sense, fortunate that computational limitations have previously restricted MM-MD simulations of protein–solvent systems to the time scale of a few nanoseconds. Because simulations usually start from a structure determined by X-ray crystallography or by more-dimensional NMR, and because in proteins small-scale conformational transitions only begin to occur at this time scale, the actual metastability of a MM-MD protein model is likely to become apparent only in much more extended simulations than in the ones that are currently feasible. At the accessible nanosecond time scale, the large-scale and slow relaxation processes changing a metastable structure are simply “frozen out,” and protein simulations are strongly restricted by the accessible time scales. Structural stabilities observed in short-time simulations are due to this restriction and cannot be attributed to the particular quality of the force field. Nevertheless, the apparent stability of protein structures even in grossly erroneous MM-MD simulations (like in those applying the SC truncation to the long-range electrostatics) may have led scientists to believe in the validity of MD results. For the field of MD simulations on protein dynamics this was fortunate, because it has created a certain public acceptance (cf. e.g., Ref. [41]) and has attracted researchers. On the other hand it was also quite unfortunate, because the urgent necessity to improve the MD methods to meet the challenges outlined in Section 1 remained an issue of concern only in a small community of methods developers. However, the speed of computation has been increasing by a factor of 2 each year during the past decades and, according to the plans of the computer industry, will continue to exhibit this type of exponential growth within the coming decade. For the available MM force fields this progress would imply that MD simulations spanning about 0.5 ms become feasible, because the current technology already allows series of MD simulations spanning 10 ns each for protein–solvent systems comprising 104 –105 atoms. Therefore, the deficiencies of the current MM force fields are becoming increasingly apparent in unrestricted long-time simulations through changes of supposedly stable protein or peptide conformations. This experience will drive the efforts to construct better force fields, which must be more accurate by at least one order of magnitude, although such an improvement necessarily will imply a larger computational effort and, therefore, will impede the tractability of large proteins at extended time scales. Here, the accurate and computationally efficient inclusion of the electronic polarizability will be of key importance. 3.9 Continuum Solvent Models
The above discussion on the accessible time scales was based on the assumption that a microscopic model of the aqueous environment is employed to account for the important solvent–protein interactions. This approach represents the state of the art and, at the same time, an obstacle in accessing more extended time scales, because in the corresponding simulation systems 90% of the atoms must belong
3 State of the Art
to the solvent (for an explanation see Section 2.2 and Mathias et al. [18]). Thus, one simulates mainly a liquid solvent slightly polluted by protein atoms, and most of the computer time is spent on the calculation of the forces acting on the solvent molecules. It is tempting to get rid of this enormous computational effort by applying socalled implicit solvent models. Here, one tries to replace the microscopic modeling of the solvent by mean field representations of its interactions with the solute protein, which are mainly the electrostatic free energy of solvation and the hydrophobic effect caused by the solvent. The hydrophobic effect is usually included by an energy term proportional to the surface of the solute. Much more complicated is the computation of the electrostatic free energy of solvation. Here one has to find solutions to the electrostatics problem of an irregularly shaped cavity filled with point charges (the solute) and embedded in a dielectric and possibly ionic continuum (the solvent). Standard numerical methods to solve the corresponding Poisson and Poisson-Boltzmann equations are much too expensive to be used in MD simulations and have many other drawbacks (see Egwolf and Tavan [31] for a discussion). To circumvent the problem of actually having to solve a partial differential equation (PDE) at each MD time step, the so-called generalized Born methods (GB) have been suggested (for a review see Bashford and Case [51]). GB methods introduce screening functions that are supposed to describe the solvent-induced shielding of the electrostatic interactions within the solute and are empirically parameterized using sets of sample molecules. The required parameters then add to the combinatorial complexity of the force field parameterization. The GB approximation enables the computation of impressive MD trajectories covering several hundred microseconds [52]. However, not surprisingly, the resulting free energy landscapes drastically differ from those obtained with explicit solvent [53, 54]. Thus, the GB methods apparently oversimplify the complicated electrostatics problem for the sake of computational efficiency and, therefore, fail in structure prediction. A physically correct method to compute the free energy of solvation and, in particular, the forces on the solute atoms in implicit solvent MD simulations has recently been developed by Egwolf and Tavan [31, 55]. In this analytical “one-parameter” approach the reaction field arising from the polarization of the implicit solvent continuum is represented by Gaussian dipole densities localized at the atoms of the solute. These dipole densities are determined by a set of coupled equations, which can be solved numerically by a self-consistent iterative procedure. The required computational effort is comparable to that of introducing the electronic polarizability into MM force fields. This effort is by many orders of magnitude smaller than that of standard numerical methods for the solution of PDEs. The resulting reaction field forces agree very well with those of explicit solvent simulations, and the solvation free energies match closely those obtained by the standard numerical methods [31]. However, this progress is only a first step towards MD simulations with implicit solvent. For its use in MD, one has to correct the violations of the reaction principle resulting from the reaction field forces. In
17
18 Molecular Dynamics Simulations of Proteins and Peptides
addition, the surface term covering the hydrophobic forces has to be parameterized in order to match the free energy landscapes found in explicit solvent simulations. 3.10 Are there Further Problems beyond Electrostatics and Structure Prediction?
Up to this point, our analysis has been focused on the complex electrostatics in protein–solvent systems and the accuracy problems that are caused by the electrostatics in the construction and numerical solution of MM-MD models. In this discussion we have taken the task of protein structure prediction as our yardstick to judge the accuracy of the available force fields and computational techniques. If MD is supposed to provide a contribution to protein folding or to the related problem of structure-based drug design, this yardstick is certainly relevant. However, there are many more possible fields of application for MD simulation than just these two. Each of them has its own accuracy requirements pertaining to different parts of the force field, as has been illustrated by the various examples of restricted MD-based methods given in Section 2.5. If MD is taken seriously as a method for generating a virtual reality of the con-formational dynamics that enables, for instance, energy-driven processes of protein function, then additional aspects of the MM force fields become relevant. Because this conformational dynamics involves torsions around chemical bonds and requires evasive actions to circumvent steric hindrances in these strongly constrained macromolecules, the potential functions mapping the mechanical elasticity and flexibility must be adequately represented for correct descriptions of the kinetics. Here, these so-called “bonded” potential functions must be balanced with the van der Waals potentials of the various atoms and with the modeling of the electrostatics. To illustrate the delicate dependence of simulation descriptions on these details of the force field parameterization, we will now return to one of the examples mentioned in Section 2.5.
4 Conformational Dynamics of a Light-switchable Model Peptide
An important obstacle impeding the improvement of MM force fields is the lack of sufficiently specific and detailed experimental data on the conformational dynamics in proteins and peptides. Structural methods like X-ray or NMR render only information on the equilibrium fluctuations or on the equilibrium conformational ensemble, respectively. In X-ray, a structurally and temporally resolved monitoring of relaxation processes may become accessible through the Laue technique, but this approach is still in its infancy. On the other hand, time-resolved spectroscopic methods for the visible and infrared spectral regions are well-established and, with the recent development of two-dimensional correlation techniques [56], promising perspectives are opening up.
4 Conformational Dynamics of a Light-switchable Model Peptide 19
Fig. 7 Chemical structures of a small cyclic peptide (cAPB) and scheme of the lightinduced photoisomerization, which changes the isomeric state of the azobenzene dye integrated into the backbone of the peptide from cis to trans. The isomerization widens the angle of the stiff linkage between the chromophore and the peptide from about
80◦ to 180◦ and elongates the chromophore by 3.3 Å. The strain exerted by the chromophore after photoisomerization forces the peptide to relax from an ensemble of helical and loop-like conformations into a stretched $-sheet (see Figure 9 for structural details).
To become useful for the evaluation of MM force fields, these techniques must be applied to small peptide–solvent systems of strongly restricted complexity, because, otherwise, the multitude of the many weak interactions characteristic for large proteins in solution prevents the unique identification of force field deficiencies. Furthermore, the dynamical processes monitored spectroscopically should proceed rapidly on the time scale of at most nanoseconds, because this time scale is already accessible to MD simulations. Adopting this view, Sp¨orlein et al. [39] have recently presented an integrated approach towards the understanding of peptide conformational dynamics, in which femtosecond time-resolved spectroscopy on model peptides with built-in light switches has been combined with MD simulation of the light-triggered motions. It has been applied to monitor the light-induced relaxation dynamics occurring on subnanosecond time scales in a peptide that was backbone-cyclized with an azobenzene derivative as optical switch and spectroscopic probe. Figure 7 depicts the chemical structure of this molecule (cAPB) and schematically indicates the changes of geometry that are induced by the photoisomerization of the azobenzene dye. The femtosecond spectra [39] allows us to clearly distinguish and characterize the subpicosecond photoisomerization of the chromophore, the subsequent dissipation of vibrational energy and the subnanosecond conformational relaxation of the peptide. It was interesting to see to what extent these processes can be described by MM-MD simulation. 4.1 Computational Methods
To enable the MD simulation of the photochemical cis-trans isomerization of the chromophore and of the resulting peptide relaxations, an MM model potential has
20 Molecular Dynamics Simulations of Proteins and Peptides
been constructed (see Refs [39] and [57] for details), which drives the chromophore along an inversion coordinate at one of the central nitrogen atoms from cis to trans and, concomitantly, deposits the energy of 260 kJ mol−1 , which is equal to that of the absorbed photon, into the cAPB molecule. This approach serves to model the photoexcitation of the dye, its relaxation on the excited-state surface, and its ballistic crossing back to the potential energy surface of the electronic ground state through a conical intersection. To describe the subsequent dissipation of the absorbed energy into the solvent and the relaxation of the peptide from the initial cis-ensemble of conformations to the target trans-conformation, MM parameters for the chromophore, the peptide and the surrounding solvent were required. Apart from a slight modification [57], the peptide parameters were adopted from the CHARMM22 force field [33]. Like in the experimental set-up, dimethyl-sulfoxide (DMSO) was used as the solvent. For this purpose an all-atom, fully flexible DMSO model was designed by DFT/MM techniques [9], was characterized by TBC/RF-MD simulations, and was compared with other MM models known in the literature [57]. Likewise, the MM parameters of the chromophore were determined by series of DFT calculations. Figure 8 illustrates the subtle differences that may result from such a procedure of MM parameter calculation, taking the stiffness of the central N N bond with respect to torsions as a typical example. The stiffer potential E 1 has been employed in Sp¨orlein et al. [39] and the consequences of choosing the slightly softer potential E 2 will be discussed further below. A larger number of MD simulations, each spanning 1-10 ns (some even up to 40 ns), have been carried out to characterize the equilibrium conformational ensembles in the cis and trans states as well as the kinetics and pathways of their photo-induced conversion [57]. The simulation system was periodic, shaped as a rhombic dodecahedron (inner diameter of 54Å), and was initially filled with 960 DMSO molecules. The best 10 NMR solutions [58] for the structure of the cAPB peptide were taken as starting structures for the simulations of the photoisomerization. They were placed into the simulation system, overlapping DMSO molecules
Fig. 8 Two different MM potentials obtained from fits to DFT potential curves for the torsion of the chromophore around the central N N double bond by a dihedral angle [57]. The stiff potential E1 results from a fit of a cosine to the DFT
potential curve in the vicinity of the transconfiguration ( = 180◦ ) and the softer potential E2 from a fit of a Fourier expansion aiming at an improved description of the DFT potential curve in a wider range of torsion angles g[70◦ , 300◦ ].
4 Conformational Dynamics of a Light-switchable Model Peptide 21
were removed, and the solvent was allowed to adjust in a 100 ps MD simulation at 300 K to the presence of the rigid peptide. Then the geometry restraints initially imposed on the peptide were slowly removed during 50 ps, the system was equilibrated for another 100 ps at 300 K and 1 bar, it was heated to 500 K for 50 ps, and cooled within 200 ps to 300 K. In this way 10 different starting structures were obtained for the 1 ns simulations of the cis-trans isomerization, which was initiated by suddenly turning on the model potential driving the inversion reaction. To obtain estimates for the equilibrium conformational ensembles in the cis and trans states, 10 ns MD simulations were carried out at 500 K starting from the best NMR structures modified by the equilibration procedure outlined above. At this elevated temperature a large number of conformational transitions occur in the peptide during the simulated 10 ns. In all simulations the TBC/RF approach (cf. Section 2.2) was used with a dielectric constant of 45.8 modeling the DMSO continuum at large distances (for further details see Ref. [57]). Due to the lack of space, the methods applied for the statistical analysis of the simulation data cannot be presented in detail here. Instead a few remarks must suffice: The 10 ns trajectories at 500 K have been analyzed by a new clustering tool, which is based on the computation of an analytical model of the configuration space density sampled by the simulation. The model is a Gaussian mixture [59], enables the construction of the free energy landscape and allows the identification of a hierarchy of conformational substates. The associated minima of the free energy are then characterized by prototypical conformations [57]. Concerning the characterization of the photo-induced relaxation dynamics, we will restrict our analysis to the temporal evolution of the total energy left in the cAPB molecule after photoisomerization and will omit structural aspects. 4.2 Results and Discussion
Figure 9 shows the prototypical conformations of cis- and trans-cAPB in DMSO at 500 K and 1 bar, which have been obtained from the 10 ns simulations by the procedures sketched in the preceding section. Whereas cis-cAPB exhibits a series of different conformations, which are dynamically related by thermal flips around the dihedral angles at the Cα atoms of the backbone, there is only a single trans-cAPB conformer. Although the depicted conformers derive from high-temperature simulations, they are compatible [57] with the NOE-restraints observed by NMR [58]. This agreement may not be taken as a big surprise, because the conformational flexibility of peptide is sizably restricted by the strong covalent linkages within the cyclic structure. Apparently, in the given case these restrictions suffice to compensate those inaccuracies of the force field, which involve weaker dipolar interactions or the neglect of the electronic polarizability. Figure 9 illustrates the process that is driven by the photoisomerization. The photons are absorbed by an ensemble of cis-conformers, and the geometry change of the chromophore forces the peptide backbone to relax through flips around the bonds at the Cα atoms towards the trans-conformation. In this conformation, the
22 Molecular Dynamics Simulations of Proteins and Peptides
Fig. 9 Conformational ensembles in the cis and trans states of the cAPB peptide as obtained from 10 ns simulations at 500 K. Each backbone structure represents a different local minimum of the free energy land-
scape. The gray-scale indicates the depth of the respective minimum. Whereas there is only one such minimum in trans, there are many in cis (see Ref. [57] for details).
structure of the peptide is that of a β-sheet, up to turns at the covalent linkage with the chromophore. Therefore, the process may be considered as protein folding “en miniature.” As far as the photo-induced relaxation dynamics is concerned, Figure 10 illustrates a surprising and, as we will argue below, also somewhat frustrating result, reported previously in Ref. [39]. The kinetics of energy relaxation calculated by MD for the cis-trans photoisomerization of cAPB agrees nearly perfectly with that of a corresponding spectroscopic observable! As far as the match within the first picosecond is concerned, this agreement simply reflects the careful design of the MM model potential driving the ballistic photoisomerization and, therefore, is what one should expect, if the assumption of an inversion reaction is correct. From a global fit of all spectroscopic data, covering a wide spectral range, four time constants have been obtained for the relaxation processes, and the analysis of the spectra has allowed to assign elementary processes to these kinetics [39]. Thus, according to spectroscopy, the ballistic isomerization proceeds within 230 fs, a second relaxation channel from the excited state surface to the electronic ground state (not included in the MM model by construction) takes 3 ps, the dissipation of heat from the chromophore into the solvent and the peptide chain proceeds within 15 ps, and a 50 ps kinetics is assigned to conformational transitions in the peptide. As shown in Ref. [60] further conformational transitions in the peptide occur above the time scale of 1 ns until the relaxation process is complete. The limited statistics of the MD data shown in Figure 10 allows us to determine two time constants within the first nanosecond. These time constants are 280 fs for the isomerization of the chromophore and 45 ps for the conformational relaxation of the peptide form the cis-ensemble towards the trans-conformation. The corresponding two-exponential decay is depicted by the solid fit curve in Figure 10. Due to the proximity of the cooling kinetics (15 ps) to that of the conformational dynamics (50 ps) these processes cannot be distinguished in the MD data on
4 Conformational Dynamics of a Light-switchable Model Peptide 23
Fig. 10 Comparison of the energy relaxation kinetics following cis-trans isomerization as observed by femtosecond spectroscopy (black dots, monitoring the spectral position of the main absorption band) and as
calculated by MD (black pixels and fit to the MD trajectories, monitoring the total energy of the cAPB molecule) on a linearlogarithmic time scale [39].
cAPB. According to the simulations, after 1 ns the trans-conformation has not yet been reached in most trajectories. Thus, within the first nanosecond the relaxation kinetics of the simulated ensemble quantitatively agrees with the observations to the extent, to which the limited MD statistics allows the identification of kinetic constants. Further above we have stated that this result has been somewhat frustrating for us. Representing the theory group that had been involved in the design of the model system and of its experimental characterization from the very beginning about a decade ago, we had hoped that our integrated approach [39] would provide hints as to how one has to improve the MD methods and MM force fields. This hope had been a key driving force in the original set-up of this interdisciplinary approach. Instead, the results have merely validated our MD methods, raising the question whether (i) the integrated approach is sensitive enough to the details of the MM force field or (ii) our skepticism with respect to the quality of MD descriptions is justified. To check these questions we have conducted a series of simulations in which certain details of the applied MM force fields were changed one by one [57]. From the huge pile of results collected in this way we have selected a typical example which provides enough evidence to answer both questions. In this example we have changed a single model potential of the MM force field, that is, we have replaced the stiff torsion potential E 1 by the slightly weaker potential E 2 (see Figure 8 for plots of these function and Section 3.1 for further explanation). One can expect that this potential is important for transmitting the mechanical strain from the chromophore to the peptide after cis-trans isomerization. If it is stiff (E 1 ), then
24 Molecular Dynamics Simulations of Proteins and Peptides
Fig. 11 Comparison of the energy relaxation kinetics calculated by MD for two slightly different force fields: The original model potential E1 is replaced by E2 (cf. Figure 8). The fit curve belonging to E1 has been taken from Figure 10. The dashed fit curve
to the 10 MD trajectories (pixels) calculated with E2 exhibits three kinetic constants. Besides showing a 190 fs ballistic isomerization (exp: 230 fs), it separates an 11 ps heat dissipation (exp: 15 ps) from a 190 ps conformational relaxation (exp: 50 ps).
the strain is strong and the relaxation should be fast, and if it is weaker (E 2 ) the relaxation should become slowed down. The above expectation is confirmed by the MD results for the softer potential E 2 shown in Figure 11 and by the kinetic data listed in the associated caption. In the case of E 2 , the prolonged time scale (190 ps) of the conformational relaxation, which is caused by the slightly softer spring, has enabled us to additionally identify the kinetics of heat dissipation (11 ps) in the data. The results demonstrate that a small change of a single model potential can change the kinetic time constant calculated for the conformational relaxation from 45 ps to 190 ps, that is, by a factor of 4. This example answers the two questions voiced above, because it provides evidence that (i) our integrated approach is sensitive enough to detect small changes in a force field and (ii) that our skepticism with respect to the quality of MM force fields is well-justified. The latter is apparent from the fact that both fit procedures for obtaining a torsion potential from DFT calculations, the one that has led to E 1 , and the one yielding E 2 , represent a priori valid choices. It was actually a matter of luck that we happened to choose the right one for the first few simulations published in Ref. [39]. As we now definitely know [57], only a series of further lucky choices in the force field used for our original simulations enabled the surprisingly perfect match. On the other hand, the perfect match obtained in our first attempt to describes the light-induced relaxation in the cAPB peptide also demonstrates that, in principle, MD simulations can give quantitative descriptions. In this sense it gives credit to the method and justifies further efforts aiming at the improvement of the force fields and the extension of time scales.
References
Summary
As of today, MM force fields are not yet accurate enough to enable a reliable prediction of protein and peptide structures by MD simulation. Here the key deficiency is the neglect of the electronic polarizability in the MM force fields. However, MM-MD is valid in restricted settings, for instance, when complemented by experimental constraints, when applied to systems of reduced complexity, or in the restricted context of QM/MM hybrid calculations. In particular, the development of the DFT/MM methods has generated key tools that are required for the improvement of the MM force fields, because they allow highly accurate computations of intramolecular force fields of molecules polarized by a condensed phase environment. The latter has been demonstrated by computations of IR spectra in condensed phase, whose accurate description critically depends on the quality of the computed force field. For the validation of future MM force fields more specific experimental characterizations of the materials are necessary. Thus, the improvement of the computational methods has to rely on close cooperation between scientists working in experimental and theoretical biophysics, chemistry, biochemistry, and beyond. Acknowledgments
Financial support by the Volkswagen-Stiftung (Project I/73 224) and by the Deutsche Forschungsgemeinschaft (SFB533/C1) is gratefully acknowledged. The authors would like to thank W. Zinth, F. Siebert, L. Moroder, C. Renner, and J. Wachtveitl for fruitful discussions. References 1 M. KARPLUS, J. A. MCCAMMON, Nature Struct. Biol. 2002, 9, 646–652. 2 J. W. PONDER, D. A. CASE, Adv. Protein Chem. 2003, 66, 27–85. 3 S. GNANAKARAN et al., Current Opinion in Struct. Biol. 2003, 13, 168–174. 4 M. P. ALLEN, D. J. TILDESLEY, Computer Simulations of Liquids. Clarendon, Oxford, 1987. 5 A. WARSHEL, S. T. RUSSEL, Q. Rev. Biophys. 1984, 17, 283–422. 6 G. MATHIAS, P. TAVAN, J. Chem. Phys. 2004, 120, 4393–4403. 7 B. GUILLOT, J. Mol. Liquids 2002, 101, 219–260. 8 C. L. BROOKS, Acc. Chem. Res. 2002, 35, 447–454. 9 M. EICHINGER, P. TAVAN, J. HUTTER, M. PARRINELLO, J. Chem. Phys. 1999, 110, 10452–10467.
10 K. LAASONEN, M. SPRIK, M. PARRINELLO, R. CAR, J. Chem. Phys. 1993, 99, 9080–9089. 11 C. TANFORD, The Hydrophobic Effect: Formation of Micelles and Biological Membranes. Wiley, New York, 1980. 12 P. L. PRIVALOV, in Protein Folding, Ed: T. E. CREIGHTON, Freeman, New York, 1992. 13 F. SIEBERT, Meth. Enzymol. 1995, 246, 501–526. 14 J. A. MCCAMMON, B. R. GELIN, M. KARPLUS, Nature 1977, 267, 585–590. 15 W. F. van GUNSTEREN, H. J. C. BERENDSEN, J. Mol. Biol. 1984, 176, 559–564. 16 M. LEVITT, R. SHARON, Proc. Natl Acad. Sci. USA 1988, 85, 7557–7561. 17 A. RAHMAN, F. H. STILLINGER, J. Chem. Phys. 1971, 55, 3336–3359. 18 G. MATHIAS, B. EGWOLF, M. NONELLA, P. TAVAN, J. Chem. Phys. 2003, 118, 10847–10860.
25
26 Molecular Dynamics Simulations of Proteins and Peptides
19 I. G. TIRONI, R. SPERB, P. E. SMITH, W. F. van GUNSTEREN, J. Chem. Phys. 1995, 102, 5451–5459. 20 P. H. H¨UNENBERGER, W. F. van GUNSTEREN, J. Chem. Phys. 1998, 108, 6117–6134. 21 T. A. DARDEN, D. YORK, L. PEDERSEN, J. Chem. Phys. 1993, 98, 10089– 10092. 22 U. ESSMANN et al., J. Chem. Phys. 1995, 103, 8577–8593. 23 B. A. LUTY, I. G. TIRONI, W. F. van GUNSTEREN, J. Chem. Phys. 1995, 103, 3014–3021. 24 P. H. H¨UNENBERGER, J. A. MCCAMMON, Biophys. Chem. 1999, 78, 69–88. 25 C. NIEDERMEIER, P. TAVAN, J. Chem. Phys. 1994, 101, 734–748. 26 C. NIEDERMEIER, P. TAVAN, Mol. Simul. 1996, 17, 57–66. 27 E. LINDAHL, B. HESS, D. VAN DER SPOEL, J. Mol. Mod. 2001, 7, 306–317. 28 G. MATHIAS et al., EGO-MMII Users Guide. Lehrstuhl f¨ur BioMolekulare Optik, LMU M¨unchen, Oettingen-strasse 67, D-80538 M¨unchen, unpublished. 29 G. A. KAMINSKI et al., J. Comput. Chem. 2002, 23, 1515–1531. 30 M. MARQUART et al., Acta Crystallogr. Sec. 1983, 39, 480–490. 31 B. EGWOLF, P. TAVAN, J. Chem. Phys. 2003, 118, 2039–2056. 32 L. JORGENSEN et al., J. Chem. Phys. 1983, 79, 926–935. 33 A. D. MACKERELL et al., J. Phys. Chem. B 1998, 102, 3586–3616. 34 W. D. CORNELL et al., J. Am. Chem. Soc. 1995, 115, 9620–9631. 35 G. A. KAMINSKI, R. A. FRIESNER, J. TIRADO-RIVES, W. L. JORGENSEN, J. Phys. Chem. B 2001, 105, 6474–6487. 36 M. W. MAHONEY, W. L. JORGENSEN, J. Chem. Phys. 2000, 112, 8910–8922. 37 X. DAURA et al., J. Am. Chem. Soc. 2001, 123, 2393–2404. 38 H. GRUBMU¨ LLER, B. HEYMANN, P. TAVAN, Science 1996, 271, 997–999.
39 S. SPO¨ RLEIN et al., Proc. Natl Acad. Sci. USA 2002, 99, 7998–8002. 40 W. F. van GUNSTEREN, A. E. MARK, J. Chem. Phys. 1998, 108, 6109–6116. 41 H. J. C. BERENDSEN, Science 1996, 271, 954–955. 42 A. T. BRU¨ NGER et al., Science 1987, 235, 1049–1053. 43 A. WARSHEL and M. LEVITT, J. Mol. Biol 1976, 103, 227–249. 44 P. HOHENBERG, W. KOHN, Phys. Rev B 1964, 136, 864–870. 45 W. KOHN, L. J. SHAM, Phys. Rev. A 1965, 140, 1133–1138. 46 M. J. FRISCH et al., Gaussian 98, Gaussian, Inc., Pittsburgh PA, 1998. 47 J. NEUGEBAUER, B. A. HESS, J. Chem. Phys. 2003, 118, 7215–7225. 48 M. NONELLA, P. TAVAN, Chem. Phys. 1995, 199, 19–32. 49 M. NONELLA, G. MATHIAS, M. EICHINGER, P. TAVAN, J. Phys. Chem. B 2003, 107, 316–322. 50 M. NONELLA, G. MATHIAS, P. TAVAN, J. Phys. Chem. A 2003, 107, 8638–8647. 51 D. BASHFORD, D. A. CASE, Annu. Rev. Phys. Chem. 2000, 51, 129–152. 52 C. D. SNOW, N. NGUYEN, V. S. PANDE, M. GRUEBELE, Nature 2002, 420, 102–106. 53 R. H. ZHOU, B. J. BERNE, Proc. Natl Acad. Sci. USA 2002, 99, 12777–12782. 54 H. NYMEYER, A. E. GARCIA, Proc. Natl Acad. Sci. USA 2003, 100, 13934–13939. 55 B. EGWOLF, P. TAVAN, J. Chem. Phys. 2004, 120, 2056–2068. 56 S. WOUTERSEN, P. HAMM, J. Phys. Chem. B 2000, 104, 11316–11320. 57 H. CARSTENS, Dissertation, Ludwig-Maximilians-Universit¨at M¨unchen, 2004. 58 C. RENNER et al., Biopolymers 2000, 54, 489–500. 59 M. KLOPPENBURG, P. TAVAN, Phys. Rev. E 1997, 55, 2089–2092. 60 S. Sp¨orlein, Dissertation, Ludwig-Maximilians-Universit¨at M¨unchen, 2001.
1
Folding and Association of Multi-domain and Oligomeric Hauke Lilie Martin-Luther-Universit¨at Halle/Wittenberg, Halle, Germany
Robert Seckler Universit¨at Potsdam, Golm, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Folding and association of nascent or refolding polypeptide chains are spontaneous and autonomous processes. All information required for the formation of the native three-dimensional structure of a given protein or protein complex is encoded in the amino acid sequence. The structure of a protein is the result of an optimum partitioning of the nonpolar and polar parts of the polypeptide chain between regions of high and low dielectric constant in the solvent environment [1]. In aqueous solution this leads to a minimization of hydrophobic surface area. This basic principle governs both structure formation of monomeric proteins and association of subunits of oligomeric proteins. In many cases the biological activity of proteins depends on their association state. The association can result in homo- or hetero-oligomers that may be either stable or only transiently formed during their specific functional cycle. The Protein Data Bank (pdb) of protein structures contains more than 2000 oligomeric structures, the largest of which comprise virus shells, and homomeric or heteromeric protein complexes containing more than 20 subunits, such as ferritin (24 identical subunits, [2]), the chaperonin complex GroEL/ES (21 subunits α14β7, [3]), and dihydrolipoyl acetyltransferase (60 subunits, [4]). For quite a number of multi-domain and even oligomeric proteins, folding and assembly pathways as well as their thermodynamic stabilities and dissociation constants have been determined. This chapter deals with the general principles of structure formation and stabilization of multi-domain and oligomeric proteins and the experimental approaches to analyze these reactions. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Folding and Association of Multi-domain and Oligomeric
2 Folding of Multi-domain Proteins 2.1 Domain Architecture
Domains are compact substructures within protein molecules. There are different definitions of “domain” based on analysis of the three-dimensional structure of the respective protein, the autonomous folding units of a protein, and functions such as cofactor binding or resistance to proteolysis. Interestingly, in a number of cases the boundaries of protein domains correlate with intron-exon boundaries on the genetic level [5, 6]. This suggests that exon shuffling (and thus domain shuffling) is an important mechanism in generating functional diversity during evolution of multi-domain proteins [1]. A hierarchical classification of domain structures (CATH) has been proposed, using as parameters the secondary structure composition (C for class), the gross arrangement of secondary structure (A for architecture), the sequential connectivities (T for topology), and the structural and functional similarity (H for homology) [7–9]. By this procedure 27000 known domain structures can be grouped into 1800 sequence families, 700 fold groups, 245 superfamilies, and about 30 different architectures (Figure 1) of a total of three major classes (all-α, α/β, all-β). Based on the three-dimensional structure and structure acquisition of proteins, domains are considered as cooperative units in the folding process. In this concept, the individual domains of a multi-domain protein can be considered simply as small, one-domain proteins that fold to a native-like structure independently of the remaining part of the protein. This holds for both in vitro folding and cotranslational folding. The latter has been shown for the folding and assembly of nascent antibody chains [10]. Based on this kind of vectorial mechanism with welldefined assembly modules, large polypeptide chains speed up folding by many orders of magnitude. At the same time, possible wrong long-range interactions that would lead to an incorrectly folded protein are minimized. Methods used to investigate the structure formation process of multi-domain proteins are the same as for simple model proteins and will not be discussed here. The most important techniques are listed in Table 1. A detailed description of some of these methods can be found in several chapters of Volume I of this handbook. Detailed analyses on the structure, thermodynamic stability, and folding of multi-domain proteins have been performed, e.g., for γ B-crystallin, lysozyme, α-lactalbumin, the light chain of antibodies, and many other proteins [11, 12]. From these studies it can be concluded that in most cases the domains fold independently kinetically and that they are thermodynamically stable entities. The complete refolding, however, often requires an additional kinetic phase involving intramolecular domain association. If the interaction sites of the domains are very large, then domain-domain interactions may contribute significantly to the stability of the respective domains, and thus the isolated domains are thermodynamically unstable on their own. In such cases the folding and structure of the partner
2 Folding of Multi-domain Proteins
Fig. 1 Architectures of domains for the mainly ", mainly $, and "$ classes of the CATH database [7–9].
3
4 Folding and Association of Multi-domain and Oligomeric
Fig. 1 (Cont.)
2 Folding of Multi-domain Proteins Table 1 Experimental approaches to study folding of domain proteins
Assay
Methods
Spectroscopy
Circular dichroism (CD) In the far UV region In the near UV region Fluorescence NMR (combined with H/D exchange) and 2D-NMR Enzymatic activity and/or ligand binding Limited proteolysis or denaturant
Function Stability
Information
secondary structure information environment of aromatic side chains tertiary and quaternary structure changes of the environment of single amino acids monitoring the native state (sometimes can be active as well) only folded domains or the native state might be resistant to proteolytic degradation or increased denaturant concentration
domain represent an essential template for structure formation of the other domain [13, 14]. One might suggest that proteins with very similar structure should show similar folding characteristics. This, however, is not the case. An intriguing example is the comparison of hen egg white lysozyme (HEWL) and α-lactalbumin. Both proteins with a molecular mass of approximately Mr = 14 400 are structurally almost identical, containing an α-helical and a β-sheet domain. However, both their denaturant-induced equilibrium transitions and their folding characteristics are quite different. HEWL shows an equilibrium transition according to a classical two-state model, i.e., only the native and the completely denatured state of the protein are populated in the transition region. In contrast, α-lactalbumin shows a three-state unfolding transition (in the absence of Ca2+ ions) according to the model N ←→ I ←→ U, where N is the native state, I is a partially structured folding intermediate, and U is the completely denatured protein [15]. The equilibrium intermediate I retains the α-helical domain, whereas the β-sheet domain is denatured. This same kind of intermediate can also be observed during the refolding kinetics of α-lactalbumin. It has been used as a model structure for the definition of the molten globule, a type of intermediate often observed in the folding of proteins. Under strong refolding conditions (low residual concentration of denaturant), HEWL also folds via intermediates, which, however, do not resemble a classical molten globule [16]. Furthermore, in the case of HEWL, the intermediate is not obligatory; the protein can fold directly from the denatured to the native state. Due to kinetic competition, a fraction of the denatured molecules folds to an intermediate state, which subsequently undergoes a structural conversion to the native state [17]. 2.2 γ -Crystallin as a Model for a Two-domain Protein
γ -Crystallin as an eye-lens protein of vertebrates belongs to a superfamily of β-sheeted proteins with a Greek key topology. It is a two-domain protein of a
5
6 Folding and Association of Multi-domain and Oligomeric
Fig. 2 Stability and folding of (-crystallin (adapted from Ref. [18]). (A) Urea-induced equilibrium transition of (-crystallin. The protein at a concentration of 0.2 mg mL−1 in 10 mM HCl (pH 2), 0.1 M NaCl was incubated at different urea concentrations. Subsequently, the samples were analyzed
using CD () and sedimentation (s20,w: ). (B) Kinetics of folding/unfolding of (crystallin (,) measured by fluorescence and HPLC gel filtration. The relaxation times of the kinetics are plotted against the urea concentration used for folding/unfolding.
molecular mass of 21 kDa. The two domains, connected by a short peptide linker, are highly homologous (cf. Figure 5). γ -Crystallin is extremely stable, retaining its native structure at pH 1–10 or in 7 M urea at neutral pH [18]. The denaturantinduced equilibrium transition at neutral pH yields a broad transition and a partially irreversible denaturation. At acidic pH, however, the overall denaturation process splits into two transitions, both of which are fully reversible [18–20]. Making use of sedimentation analysis, intrinsic protein fluorescence, and circular dichroism, the urea-dependent equilibrium unfolding at pH 2 can be quantitatively described by a three-state transition N ↔ I ↔ U, with the intermediate populated at 2–4 M urea (Figure 2). The kinetic data on folding and unfolding of γ -crystallin fit well to the biphasic equilibrium transition. Below 3 M urea, the denaturation kinetic is monophasic, representing the reaction N → I. At higher urea concentrations, the unfolding reaction becomes biphasic, the faster reaction corresponding to the structural conversion N → I and the slower one to I → U. Similar effects are observed for the refolding: at 3 M urea, the folding reaction is monophasic (U → I), while under strongly native conditions both reactions U → I and I → N can be observed. On the other hand, when starting either denaturation or refolding from the intermediate state, the respective kinetics are monophasic. Summarizing the dependence of the rate constants (or relaxation times) of folding and unfolding on the urea concentration, two separate V-shaped profiles (chevron plot) are obtained (Figure 2). The respective reactions correspond to structural transition of N → I, I → N, U → I, and I → U. Structural interpretation of these data, however, is difficult. It is not possible to assign any of the equilibrium transitions and kinetic reactions to a certain domain of this two-domain protein. In order to correlate the mechanism with the specific
2 Folding of Multi-domain Proteins
Fig. 3 Stability of a single domain of (-crystallin. Urea-induced equilibrium transition of (-crystallin and its single domains measured in 10 mM HCl pH 2, 0.1 M NaCl. Complete (-crystallin (), the isolated N-terminal domain (), and the isolated C-terminal domain ().
structural properties of γ -crystallin, the recombinantly produced, isolated domains were investigated [19]. Both the C-terminal and the N-terminal domains have been shown to be monomeric in their isolated states as well as if simultaneously incubated even at very high protein concentrations. The thermodynamic stability and folding and unfolding reactions of the isolated N-terminal domain of γ -crystallin resemble those of the intermediate I of the full-length protein [18, 19], indicating that the intermediate I consists of a native-like structured N-terminal and a completely unfolded C-terminal domain. The isolated C-terminal domain, however, is much less stable than it is in the context of the whole protein (Figure 3). In fact, even in denaturant-free buffer, a significant fraction of the isolated C-terminal domain is unfolded. The difference in the stability of the C-terminal domain in its isolated form and in the context of the full-length protein implies a significant contribution of the domain interactions (Gint ) to the overall free energy of stabilization (Gγ ), in addition to the sum of the intrinsic stabilities of the isolated domains (GN , GC ): G γ =G N +G C +G int Under the given conditions (pH 2) the stability GC amounts to only −4.7 kJ mol−1 , whereas the Gint is −16 kJ mol−1 . Thus, at low pH the C-terminal domain of γ -crystallin is mainly stabilized by an intramolecular association with the N-terminal domain. This intramolecular interface is characterized by a welldefined hydrophobic core of Phe 56, Val 132, Leu 145, and Val 170. If Phe 56 of the N-terminal domain is replaced by either Ala, thus creating a cavity in the hydrophobic core, or Asp, placing a non-saturated charge in the contact site, the C-terminal domain is considerably destabilized, again emphasizing the importance of the interaction free energy of stabilization (Gint ) in stabilizing the structure of the C-terminal domain of γ -crystallin.
7
8 Folding and Association of Multi-domain and Oligomeric
2.3 The Giant Protein Titin
Titin is a protein located in the sarcomere of vertebrate striated muscle cells. A single molecule of titin spans half the sarcomere from the Z-disc to the M-line, extending about 1 mm in length (Figure 4). The protein consists mostly of several hundred copies of β-sheet domains of the immunoglobulin type and of fibronectin type III (Fn3); the molecular mass ranges from 3 MDa to 4 MDa, depending on the source of titin. The function of titin varies along the sarcomere. In the M-line it forms an integral part of the protein meshwork, while in the A-band it probably regulates the assembly of the thick filaments and might be involved in the interfilament interaction of myosin and the actin/nebulin thin filament [21, 22]. In the I-band, titin serves as an elastic connection between the Z-disc and the thick filaments. It is hypothesized that these elastic properties depend on partial unfolding/refolding of titin during muscle tension. In order to obtain insight into the molecular properties of this giant protein, single immunoglobulin and Fn3 domains as well as tandem repeats of these two domains have been cloned and biophysically analyzed. The individual domains vary considerably in sequence. Using fluorescence and circular dichroism as probes in equilibrium unfolding experiments, the thermodynamic stability of a variety of these domains was determined to be in the range of 8.6–42 kJ mol−1 [23]. Similarly, the rates of chemical unfolding of individual domains vary by almost six orders of magnitude [23]. It has been speculated that this dramatic variation in folding behavior is important in ensuring independent folding of domains in the multidomain protein titin. Site-directed mutagenesis of each position in a protein together with a detailed analysis of the folding and unfolding kinetics of these mutants permit the structural mapping of the transition state of folding. Such an analysis for the titin domain I27 reveals that the transition state of folding is highly structured and remarkably native-like regarding its solvent accessibility. Only a small region close to the Nterminal region of the domain remains completely unfolded. How do the molecular properties of the single domains of titin correlate with the function of the whole molecule? Titin is involved in muscle tension and elasticity and thus is subjected to mechanical forces. The elasticity of a protein can be measured by atomic force microscopy (AFM) [22]. To this end, the protein is fixed at one end on a surface and at the other end on the tip of a so-called cantilever (Figure 4). Moving the cantilever slowly and on an Angstrom scale from the surface and thus continuously increasing the mechanical force leads to a stretching of the fixed protein. The resistance of the protein to this stretching results in a bending of the cantilever that is optically detected. At some point a protein domain unfolds, the resistance is partially released, and the cantilever returns to its starting position. This procedure is repeated as often as domains are located between the static surface and the cantilever. The experimental result is a sawtooth-shaped curve, in which the width of a single sawtooth reflects the difference in length of the folded and unfolded domains (28 nm for the titin domain I27) and the height
Fig. 4 (A) Schematic representation of a sarcomere of striated muscle. (B–D) Single-molecule AFM measurements of an engineered polyprotein of the titin domain 127. (B) Structure of the 127 lg-like domain, (C, D) Schematic representation of the sequence of events during the stretching of 127 multi-domain protein. Stretching the ends of the polyprotein with an atomic force microscope sequentially unfolds the protein modules, generating a sawtooth pattern in the force-extension relationship that reveals the
mechanical characteristics of the protein. (1) An anchored multi-domain protein composed of four 127 domains. The protein is relaxed. (2) Stretching this protein requires a force that is measured as a deflection of the cantilever. (3) The applied force triggers unfolding of a domain, increasing the contour length of the protein and relaxing the cantilever back to its resting position. (4) Further stretching removes the slack and brings the protein to its new contour length Lc 1.
2 Folding of Multi-domain Proteins 9
10 Folding and Association of Multi-domain and Oligomeric
of the peak corresponds to the mechanical force necessary for domain unfolding (Figure 4). The immunoglobulin domain I27 of titin unfolds at 210 pN, and the Fn3 domain I28 unfolds at ca. 260 pN. However, the hetero-oligomer (I27–I28)4 is slightly stabilized, hence the mechanical force to denature the domain I28 in this construct increases to 306 pN [24]. Such a stabilization of the Fn3 domain I28 in the context of the neighboring I27 domain was also observed using chemical denaturation methods [25]. Comparison of the chemically induced unfolding and the mechanically induced unfolding reveals similar mechanisms of denaturation. The forced unfolding starts with the detachment of the N-terminal β-strand from the otherwise stable β-sheet. This N-terminal β-strand is not structured in the transition state of chemical denaturation. Site-directed mutagenesis studies show that the region important for kinetic stability is very similar in both cases, although there are also significant differences in the forced and chemically induced unfolding transition states [26, 27]. Presently, it seems that chemical and mechanical denaturation processes are related, at least for proteins evolved to respond to mechanical stress. However, in the case of barnase, a globular two-domain protein, mechanically and chemically induced unfolding seem to proceed quite differently on the structural level [28].
3 Folding and Association of Oligomeric Proteins 3.1 Why Oligomers?
Oligomers consist of noncovalently associated subunits of identical origin (homooligomers) or different origin (hetero-oligomers). Two or more monomers form dimers, trimers, tetramers, etc. Higher oligomers may possess topological units (e.g., the dodecameric aspartate transcarbamylase (ATCase) from E. coli consists of two catalytic trimers and three regulatory dimers). Tetramers of glyceraldehyde-3phosphate dehydrogenase and yeast pyruvate decarboxylase are formed as dimers of dimers. Compared to other tetramerization pathways such as the addition of monomers to dimers or trimers, the specific topology of a tetramer as a dimer of dimers leads to kinetics of association that seem to be less prone to competition with off-pathway reactions of nonnative assemblies or accumulation of association intermediates [29]. The advantage of oligomeric proteins compared to their non-associated monomeric counterparts may be found on the functional, structural, and genetic levels. Oligomerization permits an allosteric regulation of enzyme activity. In multienzyme complexes, the catalytic efficiency of the overall process may be significantly increased. This would especially be the case if labile intermediates were channeled from one active site to the other. These functional advantages, however, can be conferred not only by oligomeric structures but also by large multi-domain proteins. Indeed, a few such very large multi-domain proteins consisting of more
3 Folding and Association of Oligomeric Proteins
than 10 000 amino acids are known [30]. There is no fundamental structural or functional distinction between domains and subunits. The eukaryotic fatty acid synthase contains the whole assembly line of fatty acid synthesis [31, 32]; similarly, many non-ribosomal peptides are synthesized by very large proteins (e.g., the cyclosporin synthetase, in which all the different enzyme functions are located in one polypeptide chain) [33]. However, these very large multi-domain proteins are rare: most large protein structures are formed by association of subunits. Oligomerization of small gene products is by far more economical; this is quite obvious in the case of viruses, where a small protein, and thus a small gene, is sufficient for building up a large virus capsid of the size of several megadaltons. Furthermore, errors occurring during protein synthesis can be eliminated more easily in the case of oligomeric structures. Another possible advantage of oligomeric protein complexes versus large multi-domain proteins may result from differences in folding. It has been shown that domains in the context of a larger polypeptide chain fold more slowly than the same domain in its isolated state [34–36]. This difference may result from nonnative long-range interactions with other parts of the polypeptide chain and/or diffusional limitations of a long polypeptide during folding. In the case of the central domain of bacteriophage P22 tail spike protein, the folding of this β-helix domain is accelerated about twofold upon removal of the N-terminal, structurally unrelated domain. This slightly accelerated folding increases the yield of folding significantly both in vitro and in vivo [36, 37]. 3.2 Inter-subunit Interfaces
The increasing number of high-resolution crystal structures of oligomeric proteins and protein complexes available in the Protein Data Bank allows a statistical analysis of the properties of interfaces of subunits and domains [38–41]. The majority of the subunit interfaces have a surface of 1000–3000 Å 2 . The major part of this surface is usually hydrophobic, conveying the stability of interaction; polar interactions, including hydrogen bonding and ionic interactions, as well as surface complementarity serve as specificity factors of interaction [43, 44]. In fact, the molecular properties of the interface of stably associated subunits regarding the distribution of hydrophobic versus polar residues are very similar to that of the hydrophobic core of proteins and, conversely, very different from that of the solvent-accessible surface. In contrast to the interfaces responsible for stable subunit association, the contact sites of proteins that interact only transiently with other proteins are less hydrophobic [41]. A qualitative analysis of 136 subunit interfaces of homodimeric proteins [45] showed that about 30% are characterized by a well-defined hydrophobic core, surrounded by polar interactions [45, 46]. However, the majority of interfaces consist of multiple small, hydrophobic patches interspersed by hydrogen bonds and polar interactions [45]. Such large differences in interface structure might correspond to differences in interface stability and in folding and assembly mechanisms, as proposed by
11
12 Folding and Association of Multi-domain and Oligomeric
Nussinov and coworkers [40, 47]. This is illustrated by the tetrameric yeast pyruvate decarboxylase (PDC). The topology of PDC is reminiscent of a dimer of dimers. The interface between the dimers that form the tetramer is largely polar, and the tetramer spontaneously dissociates to dimers upon dilution. The dissociation constant of this dimer-tetramer equilibrium is 8.1 µM [48]. The interface between the monomers, however, is strongly hydrophobic; the dimer does not dissociate into monomers just by dilution of the protein. Only in the presence of 2–3 M urea can folded monomers be populated [48]. Unfortunately, there are only very few cases where thermodynamic, kinetic, and crystallographic data of a certain protein can be combined in order to assess these questions. Recent studies suggest that, in a given interface, a few amino acids may play a dominant role in determining the free energy of stabilization of the interaction. These residues are called hot spots of the binding surface [46]. Whereas in most cases the association of proteins is mediated by globular, nativelike structured domains, a growing number of protein-protein interactions have been discovered in which one of the partners is unstructured in its isolated state. These natively unstructured proteins can be grouped into two classes, one with random coil structures and the other with a pre-molten globule structure [13]. Only upon association with their targets do these proteins develop an ordered and biologically active structure. Examples are several ribosomal proteins that transform to a rigid well-folded conformation only in the presence of the ribosomal RNA [49, 50]. In the case of cyclin-dependent kinase inhibitor p21, the N-terminal fragment lacks stable secondary and tertiary structure in the free solution state. However, when bound to its target, Cdk 2, it adopts a well-ordered conformation [51]. Besides the usual subunit interaction via globular domains, there is also another major type of protein-protein interaction mediated by special helical structures: the so-called coiled coils [52]. Furthermore, completely artificially designed peptides have been described, conferring heterodimerization of proteins fused to them. These peptides are based on only a stretch of eight charged amino acids and an additional cysteine. Two of these peptides containing complementary charged side chains associate with each other due to their polyionic interaction; subsequently, the cysteines can form a disulfide bond, thus stabilizing the heterodimerization [53–55]. 3.3 Domain Swapping
As pointed out by Bennett et al. [56–58], the gradual accumulation of random mutations on the surface of a monomeric protein that are required to create a suitable subunit interface and stabilize a dimer cannot be accomplished within a biologically feasible time. At least in some cases the evolution of domains offers a key to understanding the mechanism to proceed from monomers to dimers. Starting from a two-domain monomer, domain swapping permits the transition from monomers to dimers (or any other association state) simply by switching from an inter-domain to an inter-subunit interaction. In this scheme, the interface
3 Folding and Association of Oligomeric Proteins
Fig. 5 Domain swapping. (A) The monomer (I) with its primary inter-domain interface may participate in interchain interactions, generating domain-swapped dimers (II). Stabilizing mutations in the linker re-
gion allow stable dimers (III) to be formed [56]. (B) $-crystallin dimer (thin line) as potential product of swapped domains of (-crystallin (thick line) (adapted from Ref. [11]).
that is almost impossible to create by random mutagenesis in evolution is available a priori. Based on the already-existing interaction site, additional mutations might then stabilize the domain-swapped dimer compared to the monomer (Figure 5). The concept of domain swapping may be illustrated by βγ -crystallins. βγ crystallins are members of the superfamily of β-sheeted proteins containing the Greek key motif. γ -crystallin is monomeric even at extremely high protein concentrations, whereas β-crystallin forms dimers or higher oligomers. Both show highly homologous two-domain structures. Comparison of the structures of γ - and β-crystallins suggests that the β dimers are the product of domain swapping and association of a γ ancestor (Figure 5). This hypothesis is supported by the fact that 60% of the interface residues of the N- and C-terminal domains of γ -crystallin are identical or highly conserved compared to those of the N- and C-terminal domains of the two subunits of β-crystallin [59]. Domain swapping does not correlate necessarily with structural domains. Instead, the swapped portion can vary from a large globular domain, as described for βγ -crystallins, to a structural element as short as a single helix or β-strand.
13
14 Folding and Association of Multi-domain and Oligomeric
In fact, of the ca. 40 structurally characterized cases of domain-swapped proteins, most swapped domains are either at the N- or C-terminus and are diverse in their primary and secondary structure [60]. Hsp33, for example, forms a dimer in which a C-terminal arm of one monomer consisting of two α-helices wraps around a second monomer [61, 62]. This C-terminal arm is connected to the remaining part of the protein by a putatively flexible hinge region. The chaperone-active dimer is in equilibrium with a less active or even inactive monomer (K D =0.6 µM, [63]). Unfortunately, the structure of the Hsp33 monomer is not known. Thus, at present it is only speculation that in the monomeric structure the C-terminal arm folds back on the monomer. Higher association states of domain-swapped proteins are found in several virus capsids, in which capsomers swap β-strands, or pathologic aggregates in the case of α1-antitrypsin [64]. Although domain swapping is an intriguing concept in understanding evolution of protein dimerization and oligomerization, it can be applied only in a limited number of cases. It cannot explain hetero-oligomerization of subunits encoded by widely differing genes as these subunits do not share a common inter-domain interface or different parallel routes of dimerization, as shown for bovine seminal ribonuclease, which exists in both a domain-swapped and a non-swapped dimeric form [65, 66].
3.4 Stability of Oligomeric Proteins
The laws of thermodynamics require that association equilibria be shifted towards the dissociated state at low protein concentrations. Indeed, for several oligomers it has been shown that the dissociation constants of dimerization are in the nanomolar to micromolar range, and thus the dissociation can be achieved directly by dilution experiments. Examples are phosphofructokinase [67], dimeric Arc repressor [68], Hsp33 [63], and yeast hexokinase [69, 70]. In some cases this association equilibrium is functionally and physiologically relevant. Dimeric Hsp33 is active as a molecular chaperone, whereas the monomeric form is less active or non-active [63, 71]. The K D describing the monomer-dimer equilibrium is in the range of the physiological concentration of Hsp33 in E. coli [63]. In the case of yeast hexokinase, the monomeric state of the protein is less active than the dimeric form. Interestingly, the dimerization is favored in the presence of the substrate glucose [69]. Most oligomeric proteins, however, dissociate only at concentrations too low to observe dissociation by mere dilution [1]. This finding suggests that subunit interactions account for a large fraction of the conformational stability of oligomeric proteins; sub-nanomolar dissociation equilibrium constants for dimerization correspond to standard free energy changes of >50 kJ mol−1 . In such cases, dissociation is achievable only by denaturation. Whereas the denaturant-induced equilibrium transition of denaturation of monomeric proteins is independent of the protein concentration, the midpoint of transition of oligomeric proteins varies with protein
3 Folding and Association of Oligomeric Proteins
Fig. 6 Guanidine-dependent equilibrium transitions of the dimeric antibody domain CH3. Samples were incubated for six days at 20◦ C in 0.1 M Tris-HCI (pH 8.0) and varying GdmCI concentrations. For the fluorescence measurements the excitation wavelength was set to 280 nm and emission was analyzed at 355 nm. The CH3 dimer concentrations were 0.2 mM (open circles), 1.6 :M (open squares), and 4.1 :M (open tri-
angles). For the CD experiment, the ellipticity was recorded at 213 nm (filled triangles) at a protein concentration of 4.1 :M. All measurements were carried out at 20◦ C Inset: From the transitions, the stability G could be calculated at the respective GdmCl concentrations. These G values were plotted against the respective GdmCl concentration and linear extrapolated to 0 M GdmCl [42].
concentration. In the simplest case of a reversibly denaturing dimeric protein, this transition can be ascribed to the reaction N 2 ←→ 2U. The linear extrapolation method, used for analysis of the transition, yields a straight line with a slope depending on the protein concentration used (Figure 6). These lines describing the denaturation transition at different protein concentrations will intersect the γ -axis at the same value that corresponds to the stability of the dimeric native state in the absence of denaturant [72]. Oligomeric states often are not only thermodynamically stable but also kinetically stabilized, i.e., the activation energy of dissociation under native conditions is very high. Dissociation rates in denaturant-free buffer have been estimated by extrapolation to be as low as 10−14 s−1 [73].
3.5 Methods Probing Folding/Association
The reconstitution of oligomeric proteins comprises folding of subunits and their association. Therefore, all methods suited to analyze folding of proteins and domains can be applied to oligomeric proteins as well (cf. Table 1). In the following section, only methods that detect association reactions are discussed. In using these methods, one should be very careful, because these techniques do not always
15
16 Folding and Association of Multi-domain and Oligomeric
distinguish between specific association and unspecific aggregation. The latter, however, will often be found to compete with proper refolding and association.
3.5.1 Chemical Cross-linking Highly reactive bifunctional chemicals can be used to covalently crosslink subunits of an oligomeric structure with each other. In the case that cross-linking proceeds much faster than oligomerization, it is possible to get time-resolved information of the association reaction by cross-linking aliquots of the reaction at different time points. The amount of cross-linked oligomers can subsequently be analyzed on SDS-PAGE. Glutaraldehyde has been shown to be ideally suited as a cross-linker for several reasons. It forms a Schiff base with lysine side chains, which are usually located on the surface of a folded subunit. The reaction is very fast and efficient. It is not dependent on a specific distance between two lysine residues because glutaraldehyde exists in differently polymerized forms in aqueous solutions; therefore, the distance between the two reactive aldehyde groups of glutaraldehyde varies with its polymerization state. For a given protein, the glutaraldehyde concentration, reaction time, temperature, and protein concentration need to be optimized in order to guarantee that non-associated subunits are not cross-linked artificially. A protocol for glutaraldehyde cross-linking is given in the Appendix.
3.5.2 Analytical Gel Filtration Chromatography Gel filtration is a standard technique to determine the molecular mass of proteins. Using it for kinetic analyses of association, however, poses problems. Gel filtration is a comparatively slow method; the time resolution of kinetic measurement will therefore be low. The protein analyzed will be diluted on the column, thus shifting a possible association equilibrium towards the dissociated species. Only if the dissociation of the oligomer is much slower than the time it takes to carry out the gel-filtration chromatography will significant results be obtained (the rate of dissociation might be slowed down by cooling the column). Gel-filtration chromatography has been used successfully to characterize, among others, the reconstitution of β-galactosidase [74], P22 tail spike protein [75], and an antibody Fab fragment [76].
3.5.3 Scattering Methods Light scattering, small-angle X-ray scattering, and neutron scattering provide information on particle size and shape. They can be used online without the necessity of taking aliquots during reconstitution for analysis. However, scattering methods need high protein concentrations (small-angle X-ray and neutron scattering, several milligrams per milliliter; light scattering, depending on the size of the oligomer, not less than 0.1 mg mL−1 ). At these high protein concentrations, unspecific aggregation is often favored over correct reconstitution. Light scattering has been used
3 Folding and Association of Oligomeric Proteins
predominantly to monitor formation of large assemblies such as viruses or protein filaments. More often it is used as a method to follow aggregation [77, 78]. 3.5.4 Fluorescence Resonance Energy Transfer Fluorescence resonance energy transfer (FRET) can be measured if the emission spectrum of a fluorophore overlaps with the absorption spectrum of another fluorophore. Then the energy of the excited donor fluorophore will not be emitted as fluorescence but will be transferred to the acceptor fluorophore. In this setup, the donor fluorophore will be excited and the fluorescence of the acceptor is measured. The transfer efficiency is inversely proportional to the sixth power of the distance between the two fluorophores, thus permitting measurement of distances within proteins [79] and monitoring of protein complex formation [80]. FRET measurements have to be carefully controlled. The fluorescence dyes that are coupled to the respective subunits are bulky hydrophobic chromophores and might induce association or aggregation of the coupled proteins artificially. The FRET technique has become increasingly important to study protein-protein interaction in the cellular context. To this end, the proteins in question are fused to different variants of the green fluorescent protein, e.g., BFP and GFP. If these fusion proteins associate in vivo, FRET can be observed and quantified by fluorescence microscopy. 3.5.5 Hybrid Formation The concentration of unassembled subunits can be measured by their ability to form hybrid oligomers. Originally, the technique used a large excess of chemically modified or isoenzyme subunits added after varied reconstitution time to trap unassembled subunits [81–84]. Problems such as slow folding of added subunits and preferential formation of homo-oligomers are associated with this approach. More-reliable data may be obtained with naturally occurring or site-directed mutant proteins with altered net charge but unaffected folding and association kinetics [37, 75, 85]. The technique may be illustrated using the trimeric tail spike protein from phage P22. Wild-type and mutant tail spikes with altered charge were separately denatured by urea at pH 3. Subsequently, both proteins were diluted separately into neutral buffer to remove the denaturants and start refolding. After different times, aliquots from both solutions were mixed, reconstitution of native trimers was allowed to reach completion, and the differently charged hybrids and homooligomers were separated by electrophoresis [75]. In samples mixed early during renaturation, wild-type homotrimers, hybrids, and mutant homotrimers formed at the statistically expected ratio of 1:2:2:1, whereas in samples mixed late, only homo-oligomers were detected. The half-time of assembly can be estimated from the amount of hybrid present in samples mixed at intermediate renaturation times. It should be noted that hybrid formation measures only stably associated protomers; those in rapid exchange between loosely associated oligomers will be equivalent to free subunits.
17
18 Folding and Association of Multi-domain and Oligomeric
3.6 Kinetics of Folding and Association 3.6.1 General Considerations The reconstitution of oligomeric proteins from the denatured state generally constitutes a series of uni- and bimolecular steps. For a dimeric protein, reconstitution involves the following steps (neglecting possible off-pathway side reactions):
2U→2M→M2 →N2 where U is the denatured state. M represents a folded monomer, M2 a nonnative dimer, and N2 the native dimer. The folding reactions preceding subunit assembly are identical to those of single-domain or multi-domain proteins. They consist of a hydrophobic collapse, formation of secondary structure, and the merging of domains, leading to association-competent monomers. As in the case of monomeric proteins, these reactions may occur as sequential or parallel reactions on multiple pathways [86, 87]. All these reactions follow first-order kinetics and do not depend on protein concentration. In contrast, the association of subunits (2M→M2 ) is a bimolecular and therefore probably a second-order reaction. Subsequently, the dimer may need to undergo further unimolecular rearrangements to reach the final native state (M2 →N2 .0). The assembly of larger oligomers can be described accordingly as a series of multiple uni- and bimolecular steps. The assembly of a homotrimer is expected to proceed via a dimeric intermediate: 3M↔M2 +M→M3 In this reaction scheme, the first association resulting in a dimer, in which only one of two interfaces is buried, is considered a fast equilibrium. This first association step is unfavorable and the dimeric intermediate is not populated in equilibrium. The second association step will pull this equilibrium towards the native trimeric state. In this concept, the overall process follows apparent third-order kinetics [84]. A tetramer may be envisioned as a dimer of dimers. Thus, the scheme for this reaction is just an extension of that shown for a dimeric protein [1, 88]:
4U→4M→2M2 →2M2 →M4 →N4 In the case of multimers such as fibrillary proteins or virus capsids, the simplest model comprises a stepwise addition of monomers to the growing chain: 2A↔A2 ,A2 +A↔A3 ,A3 +A↔A4 ,. . . In this scheme the equilibrium constants of association may be all identical. However, the first steps of multimerization are often thermodynamically disfavored, i.e., the equilibrium is on the side of the non-associated species. Only after
3 Folding and Association of Oligomeric Proteins
reaching a certain size of the oligomer structure does further addition of monomers to this nucleus become thermodynamically favorable. This would lead to an overall assembly process characterized by a nucleation/polymerization reaction. 3.6.2 Reconstitution Intermediates Because the half-time of association is concentration-dependent, the nature of the rate-limiting step of reconstitution of oligomeric proteins may change with protein concentration [1]. At high protein concentrations, the formation of associationcompetent subunits is rate-limiting; at low protein concentrations, dimerization is rate-limiting, and the overall reaction follows second-order kinetics. Under these conditions, highly structured, association-competent subunits often are populated kinetically during refolding/assembly. Such intermediates can be observed spectroscopically. In general, secondary structure as monitored by far-UV circular dichroism is formed quickly upon folding/association. Furthermore, the secondary structural content of the folded subunits does not change significantly upon subsequent association. Thus, the change in CD signal follows first-order kinetics, reaching the signal of the native protein as subunit folding is completed. In contrast, if reconstitution is analyzed by reactivation, i.e., regain of enzyme activity, no reactivation is observed during subunit folding provided that the non-associated subunits do not possess enzyme activity by themselves. The folding of the subunits results in a lag-phase in regain of activity; reactivation occurs upon association. Thus, the overall reactivation is described by a uni-bimolecular process that follows a sigmoidal time course, with the lag-phase determined by first-order subunit folding (Figure 7) [89]. Another method often used to analyze protein folding is fluorescence. However, whereas circular dichroism monitors secondary structure formation almost exclusively on the subunit level and, in most cases, enzymatic activity corresponds to the native state after assembly, fluorescence may detect both subunit folding and association, depending on the localization of the fluorophores in the quaternary structure. Therefore, the interpretation of fluorescence data might be more difficult. On the other hand it allows the detection of the association process even if the protein is enzymatically inactive. In some cases the folded but non-associated subunit already shows ligand binding or partial enzyme activity. Examples are mammalian aldolases [90], lactate dehydrogenase [91], the P22 tail spike protein [92], and yeast hexokinases [69, 70]. Monomeric folding intermediates of oligomeric proteins often possess large interfaces for subunit association. This subunit interaction is quite similar to the domain-domain interactions in multi-domain proteins. As shown before for γ crystallin, the association can contribute substantially to the overall stability of the subunits. Thus, it is not surprising that folded subunits usually are not detectable as equilibrium unfolding intermediates if the native protein is subjected to high temperature or chemical denaturants [1, 88]. Structured monomers of lactate dehydrogenase have been obtained at low pH in the presence of the stabilizing salt sodium sulfate [93–95]. Another possibility to obtain stably folded monomeric subunits is based on protein engineering, as has been demonstrated for triose
19
20 Folding and Association of Multi-domain and Oligomeric
Fig. 7 Folding and association pathway of yeast invertase. (a) Changes in circular dichroism (open triangles), fluorescence emission (broken line), and enzymatic activity (open circles) occur with very different kinetics during the reconstitution of homodimeric yeast invertase. (b) The first-order
folding rate obtained from the fluorescence change can be used to fit a sequential unibimolecular mode to the reactivation data obtained at different protein concentrations (full lines). Subunit concentrations were 8.5 nM (circles), 17 nM (triangles), and 68 nM (diamonds) [89].
phosphate isomerase [96, 97], tetrameric human aldolase [98], and P22 tail spike protein [92]. In all these cases the engineered monomeric form retained some enzyme activity, but, as expected, the variants were strongly destabilized compared to the oligomeric wild-type form. 3.6.3 Rates of Association The maximum rate of association of two molecules is determined by their random diffusional collision. For a common protein domain, this limit is about 109 M−1 s−1 [99]. Indeed, this limit is reached in the binding of chaperones to their substrate proteins. Chaperones recognize a wide variety of substrate proteins mainly by hydrophobic interactions. Since these interactions do not need a pre-orientation of chaperone and substrate, most of the diffusional collisions result in complex formation. However, this is quite different for the specific association of folded subunits.
3 Folding and Association of Oligomeric Proteins Table 2 Rates of association of some oligomeric proteins
Protein
Triosephosphate isomerase Malate dehydrogenase (mitochondrial) Invertase β-Galactosidase Alcohol dehydrogenase (liver) Phosphoglycerate mutase Lactate dehydrogenase Arc repressor Trp aporepressor
Reaction
k(M−1 s−1 )
Reference
27 33
2M→D 2M→D
3×105 3×104
[1] [1]
76 116 40 27
2M→D 2M→D 2M→D 2M→D 2D→T 2D→T 2M→D 2M→D
1×104 4×103 2×103 6×103 3×04 2×104 1×107 3×108
[89] [74] [1] [103] [103] [1] [102] [101]
Subunit mass (kDa)
36 62 12
Here, the pre-formed interfaces require an exact rotational alignment for association, and this reduces the diffusion-controlled rate by several orders of magnitude. Brownian dynamics simulations estimate the rate of association of model proteins to be 106 M−1 s−1 [100]. Most second-order rate constants determined experimentally are even smaller (Table 2). There are, however, two examples published – the core domain of trp repressor and the arc repressor – that show faster dimerization than expected (Table 2) [101, 102]. In both cases folding is tightly coupled to association; the association-competent monomers are only partially structured. 3.6.4 Homo- versus Heterodimerization The molecular interaction of subunits in oligomers can be considered as specific as the formation of the hydrophobic core of a protein. The structure of a protein or subunit determines the kinetic and thermodynamic characteristics of its interaction with other molecules. Therefore, a competition between homo- and heterodimerization is expected only if the different subunits of the oligomer are highly homologous. The most prominent example of such a competition is the reconstitution of heterodimeric bacterial luciferase from Vibrio harveyi. The two subunits consisting of a (βα)8 barrel are structurally homologous and probably result from gene duplication [104]. The association is mediated by two helices from each subunit forming a fourhelix bundle at the contact site [105, 106]. The reconstitution of luciferase from the urea-denatured state is fully reversible. It comprises as rate-limiting steps the folding of the subunits on the monomeric level, the association of the subunits to an inactive heterodimer (at low protein concentrations), and a conformational rearrangement on the dimeric level leading to the native state [107–110]. As shown by far-UV circular dichroism, the inactive dimer seems to be incompletely folded. Whereas the reconstitution of luciferase is quantitative if the subunits are refolded together, refolding of the subunits separately with a subsequent assembly does not lead to active heterodimer [107, 111]. In a detailed study, it could be shown
21
22 Folding and Association of Multi-domain and Oligomeric
that active heterodimeric luciferase can be assembled upon folding of the β-subunit in the presence of the pre-folded α-subunit, but not vice versa [73]. In the absence of the α-subunit, the β-subunit forms a homodimer that is almost as stable thermodynamically as the heterodimer. Furthermore, the β 2 homodimer is kinetically very stable, and under native conditions the dissociation does not occur on a biological or experimental timescale [73]. Upon reconstitution, the association of the β 2 homodimer and the αβ heterodimer compete with each other. The almost exclusive formation of the heterodimer is due to the kinetics of association: the rate of heterodimerization is more than tenfold higher than that of the β 2 homodimerization. The structures of the homo-and heterodimer are almost identical regarding the contacts within the interfaces; most of the inter-subunit hydrogen bonds and water-mediated contacts are conserved between the two dimers [105, 106]. Thus, from the structural point of view, there is no obvious explanation for the different association kinetics of homoand heterodimeric luciferase. Mammalian lactate dehydrogenase (LDH) may serve as another example of the competition of homo- and hetero-oligomerization. LDH is a tetrameric protein that exists in two homomeric variants: in skeletal muscle (LDH-M4 ) and in the heart (LDH-H4 ). If these two isoforms are reconstituted together from the denatured monomers, the product is not only the two homomeric tetramers M4 and H4 but also all types of hybrids. Even more surprising, the distribution of all the variants fits to a statistical process, e.g., the ratio of M4 , M3 H1 , M2 H2 , M1 H3 , and H4 is 1:4:6:4:1 [1, 94]. The beauty of this system is that it can be used to analyze the two different association steps upon reconstitution; at different time points of the reconstitution of LDH-M4 , an excess of LDH-H subunits can be added to the reaction. From the ratios of the hybrids reconstituted after addition of LDH-H subunits to the refolding LDH-M, the amount of monomeric, dimeric, and tetrameric M-subunits can be calculated [94]. Distinguishing homo- from hetero-oligomers is not always a simple matter, because in order to obtain competition of homo- and hetero-oligomerization, the subunits have to be highly homologous. The different hybrids of LDH could be separated by native electrophoresis, and in the case of luciferase the two subunits show different pI values, allowing a separation of the different subunits by ion-exchange chromatography [109]. Furthermore, enzymatic activity or varying stability of the different species may be used to distinguish between homo- and hetero-oligomers.
4 Renaturation versus Aggregation
Folding and association of multi-domain and oligomeric proteins are complex processes involving partially structured folding intermediates. These processes can be accompanied by nonspecific side reactions leading to aggregation. The molecular basis for aggregation is that intermediates in refolding/reassembly may engage in wrong intermolecular rather than in correct intramolecular or assembly
4 Renaturation versus Aggregation
Fig. 8 Folding and aggregation of reduced lysozyme. (A) Yield of enzymatic activity as a function of turkey lysozyme concentration during oxidative refolding. (B) Kinetics of reactivation of turkey lysozyme. (C) Kinetics of commitment of aggregation of denatured and reduced lysozyme. Starting protein concentration for renaturation was 0.185 mg mL−1 (•) and 0.74 mg mL−1 (◦). At different time points of renaturation,
the samples were further diluted 10-fold and incubated overnight. (D) Kinetics of commitment of renaturation. Renaturation was performed with denatured and reduced turkey lysozyme. At different time points, a high concentration of denatured and reduced hen lysozyme was added. After completion of renaturation, the activity of turkey lysozyme was determined (adapted from Ref. [114]).
interactions. Thus, aggregation is at least a second-order or even higher-order reaction, depending strongly on the concentration of folding intermediates [112, 113]. Therefore, the kinetic partitioning between folding and aggregation favors aggregation at high protein concentrations. In contrast, at low protein concentrations, folding dominates over aggregation (Figure 8). Using reduced lysozyme as a model system, it could be shown that commitment to aggregation is a fast reaction preceding the commitment to renaturation; early folding intermediates were especially prone to aggregation [114]. After a certain intermediate state is reached, slow conformational changes lead exclusively to the native state (Scheme 1 [114]).
23
24 Folding and Association of Multi-domain and Oligomeric
Scheme 1
As shown by circular dichroism and Fourier transform infrared spectroscopy (FTIR), aggregated proteins possess a high content of secondary structure and probably of native-like structured protein domains [115, 116]. Aggregation is not an artifact of in vitro refolding; the same phenomenon occurs in vivo. It is especially obvious in the case of inclusion body formation upon highlevel expression of recombinant proteins in microorganisms and the accumulation of fibrillary aggregates in human diseases [115–118]. Cells have developed a set of chaperones and folding catalysts to deal with aggregation. These two protein classes are discussed in detail in several chapters of this book.
5 Case Studies on Protein Folding and Association 5.1 Antibody Fragments
Antibodies are hetero-oligomeric proteins consisting of two light chains and two heavy chains. These four polypeptides are covalently linked by several disulfide bonds. The light chain consists of two domains and the heavy chain of four domains, with each domain adopting a typical immunoglobulin-like topology of a two-sheeted β structure. Folding of these domains has been shown to be a highly cooperative and reversible process involving an intermediate with already nativelike secondary structure [119, 120]. The two antibody domains of the light chain (κ-type) and the CH 1 and CH 3 domain of the heavy chain (γ 1-type) contain conserved cis-prolines. In the denatured state (depending on the denaturation time) equilibration between the cis and trans configuration occurs. Since the two configurations are separated by a high-energy barrier, re-isomerization is a slow process. This is reflected by an additional slow folding phase observed during refolding [119] that can be accelerated by prolyl isomerases [121, 122]. As soon as immunoglobulin domains are covalently linked, as in the case of the immunoglobulin light chain, which is composed of two domains, an additional slow folding phase is detected that may be due to interactions of the individual domains [120]. An additional level of complexity is observed in the case of the antibody Fab fragment, which consists of the light chain and the two N-terminal domains of the heavy chains, the Fd fragment (Figure 9). The two polypeptide chains associate noncovalently via contact sites between both the variable domains and the
5 Case Studies on Protein Folding and Association
Fig. 9 Structure of the Fab fragment of the monoclonal antibody MAK33 (6/IgG1) (pdb: 1FHE5, [123]). The light chain and the Fd chain are colored in green and yellow, respectively. Proline residues in trans con-
figuration are highlighted in gray, those in cis configuration in red. The intermolecular disulfide bond connecting the C-terminus of the two chains is not resolved in the structure.
constant domains of each chain. Precise assembly of the two chains is required for antigen-binding activity [124, 125]. Furthermore, a disulfide bond connects the C-terminal domains of both chains covalently. Despite this molecular complexity, the regain of antigen-binding activity of a Fab fragment, measured by ELISA, follows simple first-order kinetics [76, 122]. Spectroscopic analyses, however, reveal a complex folding behavior, which is expected for a multi-domain protein. The Fab fragment contains 22 prolyl residues, five of them in the cis configuration in the native state. Hence, upon refolding all these prolines need to isomerize to their native peptide bond configuration; this reaction dominates the overall folding process. Consequently, as described for the single-antibody domains, the folding of the Fab fragment can be catalyzed by peptidyl-prolyl-cis/trans isomerases [122].
25
26 Folding and Association of Multi-domain and Oligomeric
Fig. 10 Folding of Fab, not containing the C-terminal disulfide bond between the two chains. Refolding was achieved by diluting the denatured protein into 0.1 M Tris, pH 8, either starting from a long-term denatured protein in the absence (◦) or presence
() of a fivefold molar excess of native light chain or starting from short-term denatured protein in the absence (•) or presence () of a fivefold molar excess of native light chain. The data were fitted to a single firstorder or a serial first-order reaction [76].
If the disulfide between the two chains of the Fab fragment is deleted (with the remaining intra-domain disulfides still intact), then the reconstitution comprises both folding of the four domains and association of the two subunits. In this case, the regain of antigen-binding activity follows a sigmoidal time course (Figure 10). The kinetics of reconstitution does not depend on protein concentration, indicating that the association of the subunits is not rate-limiting [76]. Instead, the reconstitution process can be described by a serial first-order reaction with one of the phases identical to the proline-determined refolding of the disulfide-bonded Fab described before (k = 0.04 min−1 , Scheme 2). The other phase describes the slow formation of an association-competent state of the light chain. It is known that structure formation of the two domains of the light chain is a fast process compared to the overall folding/reconstitution of the whole Fab fragment, even though folding of the light chain itself is limited by prolyl isomerization [121]. If the prolyl isomerization is catalyzed (or excluded from the folding/assembly reaction by short-term denaturation), the time course of reconstitution of the Fab fragment switches from a sigmoidal to a monophasic first-order kinetic; prolyl isomerization is not rate-limiting anymore. The yield of reconstitution does not change. The yield is determined by aggregation of the Fd fragment. Because peptidyl-prolyl-cis/trans isomerases catalyze the rate-limiting step occurring after subunit association of the Fab fragment, they do not effect the kinetic competition of off-pathway aggregation and productive reconstitution. The situation changes if the Fab reconstitution is performed in the presence of an excess of native light chain. Under these conditions, the rate-limiting step
5 Case Studies on Protein Folding and Association
Scheme 2 Folding/reconstitution of the non-disulfide-bonded Fab fragment. U and N donate the denatured and the native state of the Fab fragment. Fd marks the position at which the association of light chain and Fd along the reconstitution pathway takes place. D describes an intermediate of associated light chain and Fd fragment, and
I represents non-associated folding intermediates. The subscribed c and t reflect the cis and trans configuration of the rate-limiting proline peptide bond, respectively. The values given at two of the arrows indicate the rate constants (min−1 ) of the respective reactions (adapted from Ref. [76]).
before subunit association vanishes, and the regain of Fab activity becomes identical to the prolyl isomerization-determined folding of the Fab fragment, in which the two chains are disulfide-linked. This shows that (1) the rate-limiting step of reconstitution on the monomeric level is the folding of the light chain, presumably corresponding to a specific intramolecular pairing of the two folded domains [120], and (2) the rate-limiting prolyl isomerization on the level of the already associated subunits occurs in the Fd part of the Fab fragment (Scheme 2). The folding/reconstitution of the disulfide-bonded and the non-disulfide-bonded Fab fragment comprise domain folding and domain interactions that are identical on the molecular level; however, in one case the domain pairing occurs intramolecularly, in the other case intermolecularly. The rate-limiting folding of the light chain can be observed only for the non-disulfide-bonded Fab fragment, clearly indicating that in the disulfide-bonded form the folding of the light chain is intramolecularly chaperoned by the nearby Fd part of the Fab fragment. On the other hand, the reconstitution of the non-disulfide-bonded Fab shows that the isomerization of at least one prolyl residue of the Fd part obligatorily occurs only after subunit association. Thus, only the association with the folded light chain provides the structural information for the isomerization state of the respective proline in the Fd part. This is an intriguing example of how folding and association of subunits are interrelated. Whereas in the case of the Fab fragment the association of the two polypeptides precedes prolyl isomerization, it is different for the antibody domain CH 3. The isolated CH 3 domain forms a dimer. Under conditions where prolyl isomerization does not contribute to the folding kinetics, formation of the β-sandwich structure is a slow process, even compared to other antibody domains, while the subsequent
27
28 Folding and Association of Multi-domain and Oligomeric
association of the folded monomers is fast. After long-term denaturation, the majority of the unfolded CH 3 molecules reach the native state in two serial reactions involving the re-isomerization of one proline bond to the cis-configuration. The folded species with the wrong isomer accumulates as a monomeric intermediate; the proline isomerization is a prerequisite for dimerization of the CH 3 domain [42]. 5.2 Trimeric Tail Spike Protein of Bacteriophage P22
The tail spike protein (TSP) of bacteriophage P22 from Salmonella typhimurium has been a paradigm in studying protein folding and association in vivo since Jonathan King’s early attempts to solve the problem of phage morphogenesis by genetic methods [126–130]. The beauty of both the system and the approach is that it has provided insight not only into the principles of protein self-organization in the cytoplasm but also into the mechanisms of misfolding, aggregation, and inclusion body formation [85, 115, 131–134]. It has been shown that in vitro reconstitution is clearly related to the situation in the living cell [115, 132]. The tail spike is a multifunctional trimeric protein composed of identical 72-kDa polypeptide chains, attached to the viral capsid by their 108-residue N-terminal domain. Six of these trimers assemble onto the virus head to form the tail, completing the infectious phase of the phage. In the absence of heads, tail spikes accumulate as soluble protein in the bacterial cytosol. Their assembly and the competing formation of inclusion bodies at high temperature have been studied in detail using temperature-sensitive (ts) mutants. In this connection, the assembly of viable phage was used as an in vitro assay for proper folding. Other means to follow the folding and assembly pathway of TSP, in addition to spectroscopic and hydrodynamic techniques, are its endorhamnosidase activity (directed toward the O-antigen of the host), its reactivity with monoclonal antibodies, and the thermal stability and detergent-resistance of the mature protein. Crystal structures of wild-type TSP and mutant variants of the protein have been determined at high resolution [135, 136]. The major part of the trimer represents a perfect right-handed parallel beta-helix; the side-by-side association of the single polypeptide chains is illustrated in Figure 11. Evidently this part of the molecule can fold separately in the subunits, whereas the interdigitated C-terminal part (Figure 11) can be formed only after subunit association. The native TSP trimer in the absence of denaturants is stable up to temperatures well above 80 ◦ C. However, there are numerous point mutations that prevent phage proliferation at the much lower temperature of 40 ◦ C. Interestingly the ts mutations do not affect protein synthesis, and the mutant proteins are perfectly stable and biologically active at 40 ◦ C, once synthesized and assembled at lower temperature. However, after expression at the restrictive temperature, they are inactive and react with antibodies specific for the denatured protein. Thus, they have been coined “temperature-sensitive for folding” mutants (tsf ). A second class of mutants represents global intragenic suppressors (su) of the tsf phenotype, showing an increase in the yield of correctly folded and assembled TSP at high
5 Case Studies on Protein Folding and Association
Fig. 11 Structure of the tail spike protein (TSP) from Salmonella bacteriophage P22. (A) TSP trimer assembled from the structures of the N-terminal (above) and C-terminal (below) fragments [136]. (B) Section through the parallel $-helix in the central domain of the trimer. (C) Section through the C-terminal domain where the
subunits form mixed $-sheets. (D) Single TSP subunit with sites of tsf mutations in the main body and in the “dorsal fin” (red), sites of global suppressors of the tsf phenotype in the central part of the $-helix (yellow), and sites of lethal mutations (green) [134, 135].
temperature at the expense of ligand-binding affinity or thermal stability of the native state [131, 132, 137]. Figure 11 shows that all these mutations are located in the parallel beta-helix domain and the associated “dorsal fin domain” of the fishlike structure. From the point of view of the folding mechanism, this means that these parts of the polypeptide chain must acquire a native-like conformation at an early stage of the folding pathway [133, 135, 138]. The mechanism shown in Scheme 3 describes the folding and association of TSP, starting from the unfolded (or nascent) polypeptide chains (U):
29
30 Folding and Association of Multi-domain and Oligomeric
Scheme 3
I represents a loosely folded conformation exhibiting native-like secondary and tertiary structure, but it is still quite thermolabile and susceptible to misfolding (I →I*) and subsequent irreversible aggregation (I* →A); this pathway is preferred at high temperature. At low temperature, polypeptide chains end up in the native trimer (N) via the “structured monomer” (M). Folding mutations affect the thermal stability of the intermediate (tsf mutations destabilizing and tsf suppressor mutations stabilizing I), thus altering the partitioning between the two pathways. In M, the β-helix is preformed in a native-like conformation [92, 133, 138]. Subunit assembly then leads to the incompletely folded, metastable “protrimer” (PT) in which the polypeptides are stably associated but still incompletely folded (i.e., prone to proteolysis and SDS denaturation) [139]. Because PT formation is independent of protein concentration, a unimolecular isomerization reaction must be rate-limiting [85]. The “maturation” of the trimer, in which the protein acquires the high thermal stability and SDS resistance of the native protein, is associated with a high activation energy and occurs with identical rates both in vitro and in vivo [37, 139]. Thus, the reconstitution from unfolded polypeptide chains may be considered a valid model for the self-assembly of the tail spike part of the phage in its host, supporting the conclusion that neither codon frequencies nor co-translational reactions can play a significant role in this part of the phage morphogenesis. The folding and assembly of TSP confirm the general observation that misfolding and irreversible aggregation are intimately related in the sense that at high protein concentrations, kinetic competition of folding and association occurs. In the case of TSP, decreasing the stability of an on-pathway intermediate by mutation or high temperature leads to accumulation of less-structured or misfolded conformations that aggregate at a faster rate. As TSP has been the paradigm for folding studies in vitro and in vivo, this might be the right place to consider briefly the significance of chaperones on the above reconstitution reactions. Incompletely folded TSP chains can be trapped on the E. coli chaperonin GroEL in the absence of ATP. However, even an excess of GroEL/GroES increases the yield of TSP reconstitution only marginally at temperatures higher than 30◦ C [140]. Overexpression of the chaperone system in Salmonella does not suppress tsf mutations [141, 142]. On recombinant expression in E. coli, TSP mutants impaired in folding or assembly are proteolyzed more rapidly when GroE is simultaneously overexpressed, thereby preventing aggregation of TSP in inclusion bodies [134]. The irreversible aggregation reaction itself has been studied by initiating refolding at intermediate denaturant concentrations at which a large fraction of refolding proteins can be induced to aggregate at low temperature [93, 115, 132, 143, 144]. The misassembly process can be described as a nucleated linear polymerization reaction involving partially folded or misfolded rather than
References
fully unfolded polypeptide chains. It is specific so that TSP does not form mixed aggregates with other proteins (e.g., P22 coat protein), in agreement with the observation that overexpression or heterologous expression of proteins in bacteria commonly leads to inclusion bodies consisting of a single or very few polypeptide species [145].
References 1 JAENICKE, R. (1987). Folding and association of proteins. Prog. Biophys. Mol. Biol., 49, 117–237. 2 LAWSON, D. M., ARTYMIUK, P. J., YEWDALL, S. J., SMITH, I. M., LIVINGSTONE, I. C., TREFFRY, A., LUZZAGO, A., LEVI, S., AROSIO, P., CESARENI et al. (1991). Solving the structure of human H ferritin by genetically engineering intermolecular crystal contacts. Nature, 349, 541–544. 3 XU, Z., HORWICH, A. L. & SIGLER, P. B. (1997). The crystal structure of the asymmetric GroEL-GroES-(ADP)7 chaperonin complex. Nature, 388, 741–750. 4 IZARD, T., AEVARSSON A., ALLEN, M. D., WESTPHAL, A. H., PERHAM, R. N., de KOK, A. & HOL, W. G. (1999). Principles of quasi-equivalence and euclidean geometry govern the assembly of cubic and dodecahedral cores of pyruvate dehydrogenase complexes. Proc. Natl Acad. Sci. USA, 96, 1240–1245. 5 GILBERT, W. (1978). Why genes in pieces? Nature, 271, 501. 6 GILBERT, W. (1985). Genes-in-pieces revisited. Science, 228, 823–824. 7 ORENGO, C. A., MICHIE, A. D., JONES, S., JONES, D. T., SWINDELLS, M. B. & THORNTON, J. M. (1997). CATH a hierarchic classification of protein domain structures. Structure, 5, 1093–1108. 8 THORNTON, J. M., ORENGO, C. A., TODD, A. E. & PEARL, F. M. (1999). Protein folds, functions and evolution. J. Mol. Biol., 293, 333–342. 9 ORENGO, C. A., BRAY, J. E., BUCHAN, D. W., HARRISON, A., LEE, D., PEARL, F. M., SILLITOE, I., TODD, A. E. & THORNTON, J. M. (2002). The CATH protein family database: a resource for structural and
10
11
12
13
14
15
16
17
18
functional annotation of genomes. Proteomics, 2, 11–21. BERGMAN, L. W. & KUEHL, M. W. (1979). Formation of intermolecular disulfide bonds on nascent immuno-globulin polypeptides. J. Biol. Chem., 254, 5690–5694. JAENICKE, R. (1999) Stability and folding of domain proteins. Prog. Biophys. Mol. Biol., 71, 155–241. KUWAJIMA, Y. (1989). The molten globule state as a clue for understanding the folding and cooperativity of globular-protein structure. Proteins, 6, 87–103. UVERSKY, V. N. (2002). Natively unfolded proteins: A point where biology waits for physics. Prot. Science, 11, 739–756. NAMBA, K. (2001). Roles of partly unfolded conformations in macro-molecular self-assembly. Genes to Cells, 6, 1–12. IKEGUCHI, M., KUWAJIMA, K. & SUGAI, S. (1986). Ca2+-induced alteration in the unfolding behavior of alpha-lactalbumin. J. Biochem., 99, 1191–1201. SEGEL, D. J., BACHMANN, A., HOFRICHTER, J., HODGSON, K. O., DONIACH, S. & KIEFHABER, T. (1999). Characterization of transient intermediates in lysozyme folding with time-resolved small-angle X-ray scattering. J. Mol. Biol., 288, 489–499. WILDEGGER, G. & KIEFHABER, T. (1997). Three-state model for lysozyme folding: triangular folding mechanism with an energetically trapped intermediate. J. Mol. Biol., 270, 294–304. RUDOLPH, R., SIEBENDRITT, R., NESSLAUER, G., SHARMA, A. K. & JAENICKE, R. (1990). Folding of an all-beta protein: independent domain folding in gamma II-crystallin from calf eye lens.
31
32 Folding and Association of Multi-domain and Oligomeric
19
20
21
22
23
24
25
26
27
Proc. Natl Acad. Sci. USA, 88, 2854–2858. MAYR, E. M., JAENICKE, R. & GLOCKSHUBER, R. (1997). The domains in gamma B-crystallin: identical fold-different stabilities. J. Mol. Biol., 269, 260–269. PALME, S., SLINGSBY, C. & JAENICKE, R. (1997). Mutational analysis of hydrophobic domain interactions in gamma B-crystallin from bovine eye lens. Protein Sci., 6, 1529–1536. GUITERREZ-CRUZ, G., Van HEERDEN, A. H. & WANG, K. (2001). Modular motif, structural folds and affinity profiles of the PEVK segment of the human fetal skeletal muscle titin. J. Biol. Chem., 276, 7442–7449. SMITH, D. A. & REDFORD, S. (2000). Protein folding: Pulling back the frontiers. Current Biol., 10, 662–664. HEAD, J. G., HOUMEIDA, A., KNIGHT, P. J., CLARKE, A. R., TRINICK, J. & BRADY, R. L. (2001). Stability and folding rates of domains spanning the large A-band super-repeat of titin. Biophys. J., 81, 1570–1579. LI, H., OBERHAUSER, A. F., FOWLER, S. B., CLARKE, J. & FERNANDEZ, J. M. (2000). Atomic force microscopy reveals the mechanical design of a modular protein. Proc. Natl Acad. Sci. USA, 97, 6527– 6531. POLITOU, A. S., GAUTEL, M., IMPROTA, M., VANGELISTA, L. & PASTORE, A. (1996). The elastic I band region of titin is assembled in a “modular” fashion by weakly interacting Ig-like domains. J. Mol. Biol., 255, 604–616. FOWLER, S. B. & CLARKE, J. (2001). Mapping the folding pathway of an immunoglobulin domain: structural detail from Phi value analysis and movement of the transition state. Structure, 9, 355–366. BEST, R. B., FOWLER, S. B., HERRERA, J. L., STEWARD, A., PACI, E. & CLARKE, J. (2003). Mechanical unfolding of a titin Ig domain: structure of the transition state revealed by combining atomic force microscopy, protein engineering and molecular dynamics simulations. J. Mol. Biol., 330, 867–877.
28 BEST, B., LI, B., STEWARD, A., DAGGETT, V. & CLARKE, J. (2001). Can non-mechanical proteins withstand force? Stretching barnase by atomic force microscopy and molrcular dynamics simlation. Biophys. J., 81, 2344–2356. 29 POWERS, E. T. & POWERS, D. L. (2004). A perspective of mechanisms of protein tetramer formation. Biophys. J., 85, 3587–3599. 30 SECKLER, R. (2000). Assembly of multi-subunit strucutres. In: Mechanisms of Protein Folding, Ed.: R. H. PAIN, 2nd ed., Oxford Press, pp. 279–308. 31 BRANDEN, C.-I. and TOOZE, J. (1999). Introduction to protein structure, (2nd edn). Garland, New York, p. 410. 32 SMITH, S. (1994). The animal fatty add synthase: one gene, one polypeptide, seven enzymes. FASEB J., 8, 1248–1259. 33 WEBER, G., SCHORGENDORFER, K., SCHNEIDER-SCHERZER, E. & LEITNER, E. (1994). The peptide synthetase catalyzing cyclosporin production in Tolypocladium niveum is encoded by a giant 45.8-kilobase open reading frame. Curr. Genet., 26, 120–125. 34 DAUTRY-VARSAT, A. & GAREL, J. R. (1978). Refolding of a bifunctional enzyme and its monofunctional fragment. Proc. Natl Acad. Sci. USA, 75, 5979–5982. 35 CREIGHTON, T. E. (1992). Proteins: structures and molecular properties, (2nd edn). W. H. Freeman, New York, p. 309–325. 36 MILLER, S., SCHULER, B. & SECKLER, R. (1998). Phage P22 tailspike protein: removal of head binding domain unmasks effects of folding mutations on native-state thermal stability. Prot. Sci., 7, 2223–2232. 37 DANNER, M., FUCHS, A., MILLER, S. and SECKLER, R. (1993). Folding and assembly of phage P22 tailspike endo-rhamnosidase lacking the N-terminal, head-binding domain. Eur. J. Biochem., 215, 653–661. 38 JONES, S. & THORNTON, J. M. (1997). Analysis of protein-protein interaction sites using surface patches. J. Mol. Biol., 272, 121–132. 39 TSAI, C. J., LIN, S. L., WOLFSON, H. J. & NUSSINOV, R. (1997). Studies of
References
40
41
42
43
44
45
46
47
48
49
50
51
protein-protein interfaces: a statistical analysis of the hydrophobic effect. Protein Sci., 6, 53–64. TSAI, C. J., XU, D. & NUSSINOV, R. (1997). Structural motifs at protein-protein interfaces: protein cores versus two-state and three-state model complexes. Prot. Sci., 6, 1793–1805. LE CONTE, L., CHOTHIA, C. & IANIN, J. (1999). The atomic structure of protein-protein recognition sites. J. Mol. Biol., 285, 2177–2198. THIES, M., MAYER, S., AUGUSTINE, J. G., FREDERICK, C. A., LILIE, H. & BUCHNER, J. (1999). Folding and association of the antibody domain CH 3: Prolyl isomerization preceeds dimerization. J. Mol. Biol., 293, 67–79. CHOTHIA, C. and JANIN, J. (1975). Principles of protein-protein recognition. Nature, 256, 705–708. DILL, K. A. (1990). Dominant forces in protein folding. Biochemistry, 29, 7133–7155. LARSEN, T. A., OLSON, A. J. & GOODSELL, D. S. (1998). Morphology of protein-protein interfaces. Structure, 6, 421–427. DELANO, W. L. (2002). Unravelling hot spots in the binding interface: progress and challenges. Curr. Opin. Struct. Biol., 12, 14–20. XU, D., TSAI, C. J. & NUSSINOV, R. (1998). Mechanism and evolution of protein dimerization. Protein Sci., 7, 533–544. KILLENBERG-JABS, M., JABS, A., LILIE, H., GOLBIK, R. & HUBNER, G. (2001). Active oligomeric states of pyruvate decarboxylase and their functional characterization. Eur. J. Biochem., 268, 1698–1704. VENYAMINOV, S. Yu., GUDKOV, A. T., GOGIA, Z. V. & TUMANOVA, L. G. (1981). Absorption and cicular dichroism spectra of individual proteins from Escherichia coli ribosomes. Pushkino, Russia. YUSUPOV, M. M., YUSUPOVA, G. Z., BAUCOM, A., LIEBERMAN, K., EARNEST, T. N., CATE, J. H. & NOLLER, H. F. (2001). Science, 292, 883–896. KRIWACKI, R. W., HENGST, L., TENNANT, L., REED, S. I. & WRIGHT, P. E. (1996). Structural studies of p21 in the free and
52
53
54
55
56
57
58
59
60
61
Cdk2-bound state: Conformational disorder mediates binding diversity. Prog. Natl Acad. Sci. USA, 93, 11504–11509. YU, Y. B. (2002). Coiled coils: stability, specificity, and drug delivery potential. Adv. Drug Deliv. Rev., 54, 1113–1129. RICHTER, S., STUBENRAUCH, K., LILIE, H. & RUDOLPH, R. (2001). Polyionic fusion peptides function as specific dimerization motifs. Prot. Engineering, 14, 775–783. STUBENRAUCH, K., GLEITER, S., BRINKMANN, U., RUDOLPH, R. & LILIE, H. (2001). Tumor cell specific targeting and gene transfer by recombinant polyoma virus like particle/antibody conjugates. Biochem. J., 356, 867–873. KLEINSCHMIDT, M., RUDOLPH, R. & LILIE, H. (2003). Design of a modular immunotoxin connected by polyionic adapter peptides. J. Mol. Biol., 327, 445–452. BENNETT, M. J., CHOE, S. & EISENBERG, D. (1994). Domain swapping: entangling alliances between proteins. Proc. Natl Acad. Sci. USA, 91, 3127–3131. BENNETT, M. J., SCHLUNEGGER, M. P. & EISENBERG, D. (1995). 3D domain swapping: a mechanism for oligomer assembly. Protein Sci., 4, 2455–2468. SCHLUNEGGER, M. P., BENNETT, M. J. & EISENBERG, D. (1997). Oligomer formation by 3D domain swapping: a model for protein assembly and misassembly. Adv. Protein Chem., 50, 61–122. BAX, B., LAPATTO, R., NALINI, V., DRIESSEN, H., LINDLEY, P. V., MAHADEVAN, D., BLUNDELL, T. L. & SLINGSBY, C. (1990). X-ray analysis of beta B2-crystallin and evolution of oligomeric lens proteins. Nature, 347, 776–779. LIU, Y. & EISENBERG, D. (2202). 3D domain swapping: As domains continue to swap. Prot. Science, 11, 1285–1299. VIJAYALAKSHMI, J., MUKHERGEE, M. K., GRAUMANN, J., JAKOB, U. & SAPER, M. A. (2001). The 2.2 A crystal structure of Hsp33: A heat shock protein with a redox-regulated chap-erone activity. Structure, 9, 367–375.
33
34 Folding and Association of Multi-domain and Oligomeric
62 KIM, S. J., JEONG, D. G., CHI, S. W., LEE, J. S. & RYA, S. E. (2001). Crystal structure of proteolytic fragments of the redox-sensitive Hsp33 with constitutive chaperone activity. Nat. Struct. Biol., 8, 459–466. 63 GRAUMANN, J., LILIE, H., TANG, X., TUCKER, K. A., HOFFMANN, J., JANAKIRAMAN, V., SAPER, M., BARDWELL, J. C. A. & JAKOB, U. (2001). Activation of the redox regulated chaperone Hsp33 a two step mechanism. Structure, 9, 377–387. 64 LOMAS, D. A., EVANS, D. L., FINCH, J. T. & CARRELL, R. W. (1992). The mechanism of Z alpha 1-antitrypsin accumulation in the liver. Nature, 357, 541–542. 65 D’ALESSIO, G. (1995). Oligomer evolution in action? Nat. Struct. Biol., 2, 11–13. 66 PICCOLI, R., TAMBURRINI, M., PICCIALLI, G., DIDONATO, A., PARENTE, A. & D’ALESSIO, G. (1992). The dual-mode quaternary structure of seminal RNase. Proc. Natl Acad. Sci. USA, 89, 1870–1874. 67 KONO, N., UYEDA, K. & OLIVER, R. M. (1973). Chicken liver phosphofructo-kinase. I. Crystallization and physico-chemical properties. J. Biol. Chem., 248, 8592–8602. 68 BOWIE, J. U. & SAUER, R. T. (1989). Equilibrium dissociation and unfolding of the Arc repressor dimer. Biochemistry, 28, 7139–7143. 69 GOLBIK, R., NAUMANN, M., OTTO, A., M¨ULLER, E.-C., BEHLKE, J., REUTER, R., H¨UBNER, G. & KRIEGEL, T. M. (2001). Regulation of phosphotransferase activity of hexokinase 2 from Saccha-romyces cerevisiae by modification at serine-14. Biochemistry, 40, 1083–1090. 70 BAER, D., GOLBIK, R., HUEBNER, G., LILIE, H., MUELLER, E. C., NAUMANN, M., OTTO, A., REUTER, R., BREUNIG, K. D. & KRIEGEL, T. M. (2003). The unique hexokinase of Kluyveromyces lactis: Isolation, molecular characterization and evaluation of a role in glucose signaling. J. Biol. Chem., 278, 39280–39286. 71 HOFFMANN, J. H., LINKE, K., GRAF, P. C., LILIE, H. & JAKOB, U. (2004). Identification of a redox regulated
72
73
74
75
76
77
78
79
80
81
chaperone network. EMBO J., 23, 160–168. NEET, K. E. & TIMM, D. E. (1994). Conformational stability of dimeric proteins: quantitative studies by equilibrium denaturation. Protein Sci., 3, 2167–2174. SINCLAIR, J. F., ZIEGLER, M. M. & BALDWIN, T. O. (1994). Kinetic partitioning during protein folding yields multiple native states. Nature Struct. Biol., 1, 320–326. NICHTL, A., BUCHNER, J., JAENICKE, R., RUDOLPH, R. & SCHEIBEL, T. (1998). Folding and association of beta-galactosidase. J. Mol. Biol., 282, 1083–1091. FUCHS, A., SEIDERER, C. & SECKLER, R. (1991). In vitro folding pathway of phage P22 tailspike protein. Biochemistry, 30, 6598–6604. LILIE, H., RUDOLPH, R. & BUCHNER, J. (1995). Association of antibody chains at different stages of folding; Prolyl isomerization occurs after formation of quaternary structure. J. Mol. Biol., 248, 190–201. CLELAND, J. L. & WANG, D. I. (1990). Refolding and aggregation of bovine carbonic anhydrase B: quasi-elastic light scattering analysis. Biochemistry, 29, 11072–11078. BUCHNER, J., GRALLERT, H. & JAKOB, U. (1998). Analysis of chaperone function using citrate synthase as nonnative substrate protein. Methods Enzymol., 290, 323–338. STRYER, L. (1978) Fluorescence energy transfer as a spectroscopic ruler. Annu. Rev. Biochem., 47, 819–846. HASSIEPEN, U., FEDERWISCH, M., MULDERS, T., LENZ, v. J., GATTNER, H. G., KRUGER, P. & WOLLMER, A. (1998). Analysis of protein self-association at constant concentration by fluorescence-energy transfer. Eur. J. Biochem., 255, 580–587. TENENBAUM-BAYER, H. & LEVITZKI, A. (1976). The refolding of lactate dehydrogenase subunits and their assembly to the functional tetrarner. Biochim. Biophys. Acta, 445, 261–279.
References 82 CARDENAS, J. M., HUBBARD, D. R. and ANDERSON, s. (1977). Subunit structure and hybrid formation of bovine pyruvate kinases. Biochemistry, 16, 191. 83 GROSSMAN, S. H., PYLE, J. & STEINER, R. J. (1981). Kinetic evidence for active monomers during the reassembly of denatured creatine kinase. Biochemistry, 20, 6122–6128. 84 BURNS, D. L. & SCHACHMAN, H. K. (1982). Assembly of the catalytic trimers of aspartate transcarbamoylase from folded monomers. J. BioI. Chem., 257, 8638–8647. 85 DANNER, M. & SECKLER, R. (1993). Mechanism of phage P22 tailspike protein folding mutations. Protein Sci., 2, 1869–1881. 86 WEISSMAN, J. S. (1995). All roads lead to Rome? The multiple pathways of protein folding. Chem. Biol., 2, 255–260. 87 DILL, K. A. & CHAN, H. S. (1997). From Levinthal to pathways to funnels. Nat. Struct. Biol., 4, 10–19. 88 JAENICKE, R. & LILIE, H. (2000). Folding and association of oligomeric and multimeric proteins. Adv. Prot. Chem., 53, 329–401. 89 KERN, G., KERN, D., JAENICKE, R. & SECKLER, R. (1993). Kinetics of folding and association of differently glycosylated variants of invertase from Saccharomyces cerevisiae. Prot. Science, 2, 1862–1867. 90 RUDOLPH, R., WESTHOF, E. & JAENICKE, R. (1977). Kinetic analysis of the reactivation of rabbit muscle aldolase after denaturation with guanidine-HCl. FEBS Lett., 73, 204–206. 91 OPITZ, U., RUDOLPH, R., JAENICKE, R., ERICSSON, L. & NEURATH, H. (1987). Proteolytic dimers of porcine muscle lactate dehydrogenase: characterization, folding, and reconstitution of the truncated and nicked polypeptide chain. Biochemistry, 26, 1399–13406. 92 MILLER, S., SCHULER, B. & SECKLER, R. (1998). A reversibly unfolding fragment of P22 tailspike protein with native structure: the isolated beta-helix domain. Biochemistry, 37, 9160–9168. 93 JAENICKE, R. & RUDOLPH, R. (1986). Refolding and association of oligo-meric
94
95
96
97
98
99
100
101
102
103
proteins. Methods Enzymol., 131, 218–250. HERMANN, R., RUDOLPH, R. & JAENICKE, R. (1982). The use of subunit hybridization to monitor the reassociation of porcine lactate dehydrogenase after acid dissociation. Hoppe Seylers. Z. Physiol. Chem., 363, 1259–1265. GERL, M., RUDOLPH, R. & JAENICKE, R. (1985). Mechanism and specificity of reconstitution of dimeric lactate dehydrogenase from Limulus polyphemus. Biol. Chem. Hoppe Seyler, 366, 447–454. BORCHERT, T. V., ABAGYAN, R., JAENICKE, R. & WIERENGA, R. K. (1994). Design, creation, and characterization of a stable, monomeric triosephosphate isomerase. Proc. Natl Acad. Sci. USA, 91, 1515–1518. SCHLIEBS, W., THANKI, N., JAENICKE, R. & WIERENGA, R. K. (1997). A double mutation at the tip of the dimer interface loop of triosephos-phate isomerase generates active monomers with reduced stability. Biochemistry, 36, 9655–9662. BEERNINK, P. T. & TOLAN, D. R. (1996). Disruption of the aldolase A tetramer into catalytically active monomers. Proc. Natl Acad. Sci. USA, 93, 5374–5379. KOREN, R. & HAMMES, G. G. (1976). A kinetic study of protein-protein interactions. Biochemistry, 15, 1165–1171. NORTHRUP, S. H. & ERICKSON, H. P. (1992). Kinetics of protein-protein association explained by Brownian dynamics computer simulation. Proc. Natl Acad. Sci. USA, 89, 3338–3342. GLOSS, L. M. & MATTHEWS, C. R. (1998). Mechanism of folding of the dimeric core domain of Escherichia coli trp repressor: a nearly diffusion-limited reaction leads to the formation of an on-pathway dimeric intermediate. Biochemistry, 37, 15990–15998. MILLA, M. E. & SAUER, R. T. (1994). P22 Arc repressor: folding kinetics of a single-domain, dimeric protein. Biochemistry, 33, 1125–1131. HERRMANN, R., RUDOLPH, R., JAENICKE, R., PRICE, N. C. & SCOBBIE, A. (1983). The reconstitution of denatured
35
36 Folding and Association of Multi-domain and Oligomeric
104
105
106
107
108
109
110
111
phosphoglycerate mutase. J. Biol. Chem., 258, 11014–11019. FISHER, A. J., RAUSHEL, F. M., BALDWIN, T. O. & RAYMENT, I. (1995). Three-dimensional structure of bacterial luciferase from Vibrio harveyi at 2.4A resolution. Biochemistry, 34, 6581– 6586. THODEN, J. B., HOLDEN, H. M., FISHER, A. J., SINCLAIR, J. F., WESENBERG, G., BALDWIN, T. O. & RAYMENT, I. (1997). Structure of the beta 2 homodimer of bacterial Iuciferase from Vibrio harveyi: X-ray analysis of a kinetic protein folding trap. Protein Sci., 6, 13–23. TANNER, J. J., MILLER, M. D., WILSON, K. S., TU, S. C. & KRAUSE, K. L. (1997). Structure of bacterial luciferase beta 2 homodimer: implications for flavin binding. Biochemistry, 36, 665–672. WADDLE, J. J., JOHNSTON, T. C. & BALDWIN, T. O. (1987). Polypeptide folding and dimerization in bacterial-luciferase occur by a concerted mechanism in vivo. Biochemistry, 26, 4917–4921. ZIEGLER, M. M., GOLDBERG, M. E., CHAFFOTTE, A. F. & BALDWIN, T. O. (1993). Refolding of luciferase subunits from urea and assembly of the active heterodimer. Evidence for folding intermediates that precede and follow the dimerization step on the pathway to the active form of the enzyme. J. Biol. Chem., 268, 10760–10765. BALDWIN, T. O., ZIEGLER, M. M., CHAFFOTTE, A. F. & GOLDBERG, M. E. (1993). Contribution of folding steps involving the individual subunits of bacterial luciferase to the assembly of the active heterodimeric enzyme. J. Biol. Chem., 268, 10766–10772. CLARK, A. C., SINCLAIR, J. F. & BALDWIN, T. O. (1993). Folding of bacterial luciferase involves a non-native heterodimeric intermediate in equilibrium with the native enzyme and the unfolded subunits. J. Biol. Chem., 268, 10773–10779. SINCLAIR, J. F., WADDLE, J. J., WADDILL, E. F. & BALDWIN, T. O. (1993). Purified native subunits of bacterial luciferase are active in the bioluminescence reaction
112
113
114
115
116
117
118
119
120
121
122
but fail to assemble into the alpha beta structure. Biochemistry, 32, 5036–5044. ZETTLMEISSL, G., RUDOLPH, R. & JAENICKE, R. (1979). Reconstitution of lactic dehydrogenase. Noncovalent aggregation vs. reactivation. 1. Physical properties and kinetics of aggregation. Biochemistry, 18, 5567–5571. De BERNADEZ CLARK, E., SCHWARZ, E. & RUDOLPH, R. (1999). Inhibition of Aggregation Side Reactions during in vitro Protein Folding. Methods in Enzymology, 309, 217–236. GOLDBERG, M. E., RUDOLPH, R. & JAENICKE, R. (1991). A kinetic study of the competition between renaturation and aggregation during the refolding of denatured-reduced egg white lysozyme. Biochemistry, 30, 2790–2797. JAENICKE, R. & SECKLER, R. (1997). Protein misassembly in vitro. Adv. Protein Chem., 50, 1–59. WETZEL, R. (1997). Domain stability in immunoglobulin light chain deposition disorders. Adv. Protein Chem., 50, 183–242. SUNDE, M. & BLAKE, C. (1997). The structure of amyloid fibrils by electron microscopy and X-ray diffraction. Adv. Protein Chem., 50, 123–159. KIEFHABER, T., RUDOLPH, R., KOHLER, H. H. & BUCHNER, J. (1991). Protein aggregation in vitro and in vivo: a quantitative model of the kinetic competition between folding and aggregation. Biotechnology (N.Y.), 9, 825–829. GOTO, Y. & HAMAGUCHI, K. (1982). Unfolding and refolding of the constant fragment of the immuno-globulin light chain. J. Mol. Biol., 156, 891–910. TSUNENAGA, M., GOTO, Y., KAWATA, Y. & HAMAGUCHI, K. (1987). Unfolding and refolding of a type kappa immunoglobulin light chain and its variable and constant fragments. Biochemistry, 26, 6044–6051. LANG, K., SCHMID, F. X. & FISCHER, G. (1987). Catalysis of protein folding by prolyl isomerase. Nature, 329, 268–270. LILIE, H., LANG, K., RUDOLPH, R. & BUCHNER, J. (1993). Prolyl isomerases
References
123
124
125
126
127
128
129
130
131
catalyze antibody folding in vitro. Prot. Sci., 2, 1490–1496. AUGUSTINE, J. G., de La CALLE, A., KNARR, G., BUCHNER, J. & FREDERICK, C. A. (2001). The crystal structure of the fab fragment of the monoclonal antibody MAK33. Implications for folding and interaction with the chaperone bip. J. Biol. Chem., 276, 3287–3294. PADLAN, E. A., COHEN, G. H. & DAVIES, D. R. (1986). Antibody Fab assembly: the interface residues between CH 1 and CL . Mol. Immunol., 23, 951–960. CHOTHIA, C., NOVOTNY, Y., BRUCCOLERI, R. & KARPLUS, M. (1985). Domain association in immunoglo-bulin molecules. The packing of variable domains. J. Mol. Biol., 186, 651–663. SMITH, D. H., BERGET, P. B. & KING, J. (1980). Temperature-sensitive mutants blocked in the folding or subunit assembly of the bacteriophage P22 tail-spike protein. I. Fine-structure mapping. Genetics, 96, 331–352. GOLDENBERG, D. P. & KING, J. (1981). Temperature-sensitive mutants blocked in the folding or subunit of the bacteriophage P22 tail spike protein. II. Active mutant proteins matured at 30 degrees C. J. Mol. Biol., 145, 633– 651. SMITH, D. H. & KING, J. (1981). Temperature-sensitive mutants blocked in the folding or subunit assembly of the bacteriophage P22 tail spike protein. III. Intensive polypeptide chains synthesized at 39 degrees C. J. Mol. Biol., 145, 653–676. KING, J. and YU, M. H. (1986). Mutational analysis of protein folding pathways: the P22 tailspike endo-rhamnosidase. Methods Enzymol., 131, 250–266. BETTS, S. & KING, J. (1999). There’s a right way and a wrong way: in vivo and in vitro folding, misfolding and subunit assembly of the P22 tailspike. Structure Fold. Des., 7, 131–139. MITRAKI, A., FANE, B., HAASE PETTINGELL, C., STURTEVANT, J. & KING, J. (1991). Global suppression of protein folding defects and inclusion body formation. Science, 253, 54–58.
132 MITRAKI, A., DANNER, M., KING, J. & SECKLER, R. (1993). Temperature-sensitive mutations and second-site suppressor substitutions affect folding of the P22 tailspike protein in vitro. J. Biol. Chem., 268, 20071–20075. 133 SCHULER, B. & SECKLER, R. (1998). P22 tailspike folding mutants revisited: effects on the thermodynamic stability of the isolated beta-helix domain. J. Mol. Biol., 281, 227–234. 134 SECKLER, R. (1998). Folding and function of repetitive structure in the homotrimeric phage P22 tailspike protein. J. Struct. Biol., 122, 216–222. 135 STEINBACHER, S., SECKLER, R., MILLER, S., STEIPE, B., HUBER, R. & REINEMER, P. (1994). Crystal structure of P22 tailspike protein: interdigitated subunits in a thermostable trimer. Science, 265, 383–386. 136 STEINBACHER, S., MILLER, S., BAXA, U., BUDISA, N., WEINTRAUB, A., SECKLER, R. & HUBER, R. (1997). Phage P22 tailspike protein: crystal structure of the head-binding domain at 2.3A, fully refined structure of the endorhamnosidase at 1.56 A resolution, and the molecular basis of O-antigen recognition and cleavage. J. Mol. Biol., 267, 865–880. 137 BAXA, U., STEINBACHER, S., WEINTRAUB, A., HUBER, R. & SECKLER, R. (1999). Mutations improving the folding of phage P22 tailspike protein affect its receptor binding activity. J. Mol. Biol., 293, 693–701. 138 BEISSINGER, M., LEE, S. C., STEINBACHER, S., REINEMER, P., HUBER, R., YU, M. H. & SECKLER, R. (1995). Mutations that stabilize folding intermediates of phage P22 tailspike protein: folding in vivo and in vitro, stability, and structural context. J. Mol. Biol., 249, 185–194. 139 GOLDENBERG, D. & KING, J. (1982). Trimeric intermediate in the in vivo folding and subunit assembly of the tail spike endorhamnosidase of bacteriophage P22. Proc. Natl Acad. Sci. USA, 79, 3403–3407. 140 BRUNSCHIER, R., DANNER, M. & SECKLER, R. (1993). Interactions of phage P22
37
38 Folding and Association of Multi-domain and Oligomeric
tailspike protein with GroE molecular chaperones during refolding in vitro. J. Biol. Chem., 268, 2767–2772. 141 GORDON, C. L., SATHER, S. K., CASJENS, S. & KING, J. (1994). Selective in vivo rescue by GroEL/ES of thermolabile folding intermediates to phage P22 structural proteins. J. Biol. Chem., 269, 27941–27951. 142 SATHER, S. K. & KING, J. (1994). Intracellular trapping of a cytoplasmic folding intermediate of the phage P22 tailspike using iodoacetamide. J. Biol. Chem., 269, 25268– 25276.
143 SPEED, M. A., WANG, D. I. & KING, J. (1995). Multimeric intermediates in the pathway to the aggregated inclusion body state for P22 tailspike polypeptide chains. Protein Sci., 4, 900–908. 144 SPEED, M. A., MORSHEAD, T., WANG, D. I. & KING, J. (1997). Conformation of P22 tailspike folding and aggregation intermediates probed by monoclonal antibodies. Protein Sci., 6, 99–108. 145 SPEED, M. A., WANG, D. I. & KING, J. (1996). Specific aggregation of partially folded polypeptide chains: the molecular basis of inclusion body composition. Nat. Biotechnol., 14, 1283–1287.
1
Chaperone Function in Vitro Johannes Buchner, and Stefan Walter Technische Universit¨at M¨unchen, Garching, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Originally, the term molecular chaperone had been coined by Laskey to describe the role of nucleoplasmin in the association of histones [1, 2]. Later, the concept was extended to heat shock proteins by John Ellis [3, 4]. He also invented the term “chaperonin” for GroE (Hsp60) and related proteins. Over the years, molecular chaperone has acquired additional meanings, some of which represent useful extensions of the original definition, whereas others, such as “chemical chaperones”, may cause some confusion. In the first section of this chapter, we will discuss the distinct features of a molecular chaperone. We will then describe the types of reactions that can be facilitated by molecular chaperones in general and present some important functional aspects. The chapter concludes with a discussion of the principal assays that can be used to investigate chaperone function in vitro. Molecular chaperones are a functionally related family of proteins assisting conformational changes in proteins without becoming part of the final structure of the client protein [5–9]. Originally, the focus of the concept was on the correct association of proteins and the prevention of unspecific aggregation of polypeptide chains during protein folding. Interestingly, all major chaperones belong to the class of heat shock or stress proteins, i.e., they are synthesized in response to stress situations such as elevated temperatures. Under these conditions, proteins tend to aggregate in vivo. This was demonstrated for E. coli strains lacking the heat shock response [10]. In these bacteria, elevated temperature caused the accumulation of proteins in large insoluble aggregates, which resemble the so-called inclusion bodies sometimes obtained upon overexpression of recombinant proteins [11]. These experiments showed that under stress there is a strong influence of molecular chaperones on protein homeostasis. However, this seems to hold true also for Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Chaperone Function in Vitro
Fig. 1 Model for assisted protein folding by molecular chaperones. Several processes generate unfolded polypeptide chains (red), which can fold via partially structured intermediates (orange, green) to the native, biologically active state (right). Depending on the conditions and the nature of the polypeptide, a significant fraction of the molecules may form aggregates (top). The yield of cellular protein folding is significantly improved by the action of general molecular chaperones (bottom, blue).
They bind to nonnative polypeptides and thus prevent their aggregation. Further, they may alter the conformation of the bound polypeptide so that once it dissociates, it is less prone to aggregation. Often, release is triggered by a change in the nucleotide state of the chaperone or by ATP hydrolysis. Recently, it was shown that some chaperones of the Hsp100 class such as ClpB and Hsp104 are able to reverse aggregation (top).
physiological conditions: many molecular chaperones have homologues that are constitutively expressed [12], and some of them are essential for viability [13, 14]. This finding suggests that these helper proteins are of general importance for the folding and assembly of newly synthesized polypeptide chains. Furthermore, the observation of strong aggregation of overexpressed proteins under otherwise physiological conditions is consistent with the notion of a balance between protein synthesis/folding and chaperone expression, which has been developed through evolution and can be modulated only within a certain range. At first glance, the concept outlined above is in conflict with the paradigm of autonomous, spontaneous protein folding established by Anfinsen and many others [15, 16]. The apparent contradiction is that if the acquisition of the unique threedimensional structure of a protein is governed by its amino acid sequence and the resulting interactions between amino acid side chains, there should be no need for molecular chaperones (Figure 1). However, additional factors have to be taken into account. The most important is the possibility of unspecific intermolecular interactions during the folding process of proteins [17, 18]. This reaction depends on the concentration of unfolded or partially folded polypeptide chains and their
1 Introduction 3
folding kinetics. Thus, either conditions under which proteins fold slowly or high concentrations of folding proteins will favor aggregation [19]. In consequence, the fraction of molecules reaching the native state will decrease. The basic function of molecular chaperones is to counteract this nonproductive side reaction and thereby increase the efficiency of folding (Figure 1). Molecular chaperones do not provide steric information for the folding process and thus do not violate the concept of autonomous folding. (The information for folding is encoded solely in the amino acid sequence.) Rather, molecular chaperones assist the spontaneous folding of proteins. Interestingly, the principal possibility of assisted folding had already been hypothesized by Anfinsen in 1963 [20]. It is difficult to assess what fraction of the cellular proteome depends on molecular chaperones to reach its active conformation. But clearly not all newly synthesized proteins need their assistance. While for the chaperone role model GroE (see Chapter 20) it was shown that 40% of the E. coli proteins can be influenced in their folding [21], in the cell only about 10–15% may need this assistance under physiological conditions [22, 23]. Similar values were reported for other chaperones [24, 25]. Under conditions where proteins are destabilized (such as heat shock), the requirement for chaperone assistance is clearly increased. There are only a few classes of general molecular chaperones (Figure 2) that have to take care of a large number of different proteins, implying that chaperones have to be promiscuous in their interaction with nonnative proteins, i.e., they must be able to recognize and handle a large variety of different proteins. With the exception of small Hsps, all of them possess an intrinsic ATPase activity (Table 1). ATP-dependent structural changes in molecular chaperones are essential for the conformational processing of client proteins. While some basic functional features
Fig. 2 Classes of general molecular chaperones. From left to right: Hsp70 is a monomeric chaperone consisting of an ATPase domain and a peptide-binding domain. The bacterial chaperone GroEL forms two rings comprised of seven subunits each. In the cross-section shown here, two subunits of each ring are depicted. Interactions between the subunits are mediated by the equatorial domains in the center of the molecule, which also contain the ATPase
site. The dimeric Hsp90 chaperone consists of three domains. While the C-terminal domains are responsible for dimerization, the N-terminal domains bind and hydrolyze ATP. The small Hsps form large oligomeric structures that contain 24–32 identical subunits. They are the only general chaperones without ATPase function. Chaperones of the Hsp100 class form ring-shaped homohexamers. Each subunit contains two similar, but not identical, ATPase sites.
1 0.02–0.04 (DnaK) [131] 0.04 (Ssa1) [132]
1
3 (GroEL) [67]
1–7 (bovine CCT) [130] 67 (GroEL) [134]
ATPases per subunit ATPase rate per subunit (min−1 )
2
Can form oligomers in the absence of ATP. Some Hsp100s contain only one ATPase/monomer. 3 Unpublished data from our labs.
1
50 (Ssa1)
Dimer
Monomer
Homo 14mer (GroEL) Hetero 16mer (archaea/eukaryotes)
Quaternary structure
T M (◦ C)
Hop/Sti1 p23/Sba1 Cdc37
Hsp40 GrpE (E. coli) Hop/Sti1
GroES (E. coli) Prefoldin (human)
Co-chaperones
62 (Hsp82)
0.05 (Hsp90) [60]
1.0 (Hsp82) [54]
1
69 (DnaK) 72 (human 81 (Hsp82) 85 Hsp72) (human Hsp90 a)
48 (Hsp104)
10–30 (ClpB) [133]
70 (Hsp104) [93]
2
Dynamic hexamer
Hsp70/Hsp40 (Hsp104) DnaK/DnaJ/GrpE (ClpB)
102 (Hsp104) 96 (ClpB) 84 (ClpA)
1AM1 (Hsp82 ATPase 1QVR (ClpB T. 1DKZ (DnaK thermophilus)[128] domain) [126] 1HK7 peptide-binding (Hsp82 middle domain)[33] 1S3X(Hsp72 ATPase domain) [127] domain) [150]
1AON (GroEL) [40] 1A6D (thermosome) [125]
ClpB (bacteria) ClpA (bacteria)
Hsp82 (S. cerevisiae) Hsp90 (human)
Ssa1 (S. cerevisiae) Hsp72 (higher eukaryotes)
CCT/TRiC (eukaryotes)
Hsp104 (S. cerevisiae)
HtpG (E. coli)
Hsp100
DnaK (E. coli)
Hsp90
GroEL (eubacteria)
Hsp70
Molecular mass of 57 (GroEL) 57–60 (human CCT-α) the monomer (kDa)
Structure: PDB accession number
Representatives
Hsp60
Table 1 Properties of general chaperones
75 (Hsp26) [135]
Dynamic 24mer (Hsp26) 32mer (α-crystallin)
Not known
16.5 (Hsp16.5) 24 (Hsp26) 20 (bovine α-crystallin)
1SHS (Hsp16.5) [129] 1GME (Hsp16.9) [151]
Hsp16.5 (M. jannashii) Hsp26 (S. cerevisiae) α-crystallin
sHsp
4 Chaperone Function in Vitro
2 Basic Functional Principles of Molecular Chaperones
seem to be conserved between different classes of chaperones, the structure and mode of action are completely different among individual chaperones.
2 Basic Functional Principles of Molecular Chaperones
The classification of an enzyme is usually based on the type of reaction it catalyzes and the substrate(s) it turns over. In the field of protein folding and molecular chaperones, applying this concept is rather difficult for three reasons. First, though we currently lack a detailed understanding of the mechanisms of protein folding, especially in the context of a living cell, we know it is a rather complex process involving multiple pathways, intermediates, and potential side reactions. As a consequence, molecular chaperones can interfere with and assist protein folding in many different ways, and in most cases our knowledge of how chaperones influence protein folding on a molecular level is very limited. A second problem arises from the heterogeneity of the substrates. As already mentioned, the in vivo substrates of most chaperones are not known in detail. However, it is clear that they include a substantial fraction of the proteome. Finally, it is very difficult to assess which conformational state(s) of these client proteins interact with a certain chaperone. In classic enzymology, reaction intermediates can often be trapped by changing the solvent conditions or temperature. Due to their low intrinsic stability, this is in general not possible for nonnative proteins. Here, a change in solvent conditions will almost always result in a change in the conformational state of the client protein. Under “folding” conditions, the nonnative protein will continue to “react” until a final state (e.g., native structure or aggregate) is reached. Therefore, great care must be taken when the structure of these intermediates is studied. Otherwise, the result will most likely not reflect the conformation of the protein at the time the sample was taken [26–29]. In the following sections, we present key features that many chaperones have in common and that we consider as the functionally most relevant. 2.1 Recognition of Nonnative Proteins
The most fascinating trait of a molecular chaperone is its ability to discriminate between the folded and nonnative states of a protein. It would be deleterious for the cell if chaperones would bind to native proteins or fail to recognize the accumulation of the nonnative protein species. Thus, the promiscuous general chaperones have to use a property for recognition and binding that is common to all nonnative proteins. This is the exposure of hydrophobic amino acid side chains, the hallmark of unfolded or partially folded protein structures. In the unfolded state, extended stretches of polypeptide chains containing hydrophobic residues can be recognized by chaperones. This is well documented for proteins of the Hsp70 family [30, 31]. In partially folded proteins, these residues are often clustered in areas that are part
5
6 Chaperone Function in Vitro
of a domain/subunit interface in the folded protein, and they will become buried in subsequent folding/association events. These regions can thus be utilized by molecular chaperones to identify potential client polypeptides, i.e., to distinguish a nonnative conformation from a native protein. The respective binding site of the chaperone usually contains a number of hydrophobic residues that interact with the substrate [32, 33]. The low specificity of hydrophobic interactions is the basis for the substrate promiscuity of molecular chaperones. Since the extended exposure of hydrophobic surfaces in an unfolded or partially folded state is an important determinant for its probability to aggregate, the shielding of these surfaces upon binding to a chaperone is a very efficient means to prevent aggregation. Besides hydrophobicity, a polypeptide may have to meet additional requirements in order to bind to a chaperone. It was shown for GroEL that negative charges on the substrate significantly weaken interaction with the chaperone due to electrostatic repulsion [34]. In the case of DnaK (Hsp70), the preferred substrate is a stretch of hydrophobic residues flanked by basic amino acids [35, 36]. Another important aspect that can influence the affinity of a polypeptide towards a chaperone is its conformational plasticity. Due to their promiscuity, chaperones cannot provide an optimal binding surface for each of their client proteins. Consequently, the polypeptide, and to some extent the chaperone itself, may have to adjust its conformation to establish tight binding [37]. Because this process requires a certain degree of flexibility, it may serve as a second checkpoint to discriminate between native and nonnative conformations. An important consequence of the unspecific recognition of exposed hydrophobic residues is that chaperones will not bind just one specific nonnative conformation of a protein. In contrast, it has to be assumed that all conformations exhibiting a certain degree of hydrophobicity will be recognized. In the case of GroE, the conformations that are stably bound range from completely unfolded to a nativelike dimer [38, 39]. The sites of interaction with nonnative proteins are known for Hsp70 and GroEL. In the crystal structure of the protein-binding domain of DnaK (Hsp70) in complex with a short peptide [33], the peptide adopts an extended conformation and the hydrophobic side chains of the peptide make contact with the binding groove, but there are also polar interactions between Hsp70 and the peptide backbone. In the case of GroEL, the analysis of numerous point mutations together with the crystal structure indicated that the nonnative protein binds to hydrophobic residues of the apical domains, which are located on the top of the inner rim of the GroEL cylinder [32, 40]. This conclusion was confirmed several years later by the X-ray structure of a GroEL-peptide complex [41, 42]. Detailed descriptions of the substrate-binding sites and their properties can be found in the chapters dedicated to the respective molecular chaperones. A notable exception from the theme of hydrophobic binding motifs is chaperones involved in the unfolding of proteins. Many of their client proteins adopt a fully folded conformation and thus do not expose hydrophobic surfaces. At the same time, an unambiguous signal is required to tell the chaperone which protein should be unfolded for subsequent degradation. Evolution has solved this problem by
2 Basic Functional Principles of Molecular Chaperones
attaching specific tags, such as ubiquitin [43, 44] or the ssrA peptide [45], to these proteins, which serve as recognition sites for the respective degradation systems [46]. Little is known about the structure/sequence motifs recognized by chaperones involved in disaggregation, such as ClpB or Hsp104, as it has proved difficult to obtain stable chaperone-substrate complexes in these cases. 2.2 Induction of Conformational Changes in the Substrate
Prevention of aggregation by (transient) binding of partially folded proteins clearly is an important part of chaperone function in vivo. But in a number of cases, this “buffering capacity” appears not to be the primary task. This is especially true for the ATP-consuming chaperones. An increasing amount of evidence suggests that the energy provided by ATP hydrolysis is used to induce conformational changes in the bound polypeptide. For the ClpA chaperone, Horwich and coworkers demonstrated that it can actively unfold ssrA-tagged, native GFP [47]. In this case, ATP hydrolysis is required to overcome the free energy of folding. In the case of the Hsp90 system, an immature conformation of a bound steroid hormone receptor (a client protein) is converted into a conformation that is capable of binding the hormone ligand [48, 49]. ATP binding and hydrolysis are essential for this process, as they seem to control the association of the chaperone with its various cofactors [50]. During the whole sequence of reactions, the receptor is assumed to remain bound to the Hsp90 chaperone. For the general chaperones, we know that conformational changes occur in the chaperone upon both ATP binding and hydrolysis [51–54]. However, much less is known about how and to what extent these changes result in conformational rearrangements in substrate proteins associated with the chaperone. As mentioned above, this is mainly because the species involved are often only transiently populated and very labile and thus are not easily susceptible to methods of structure determination. Also, the bound target protein may represent an ensemble of conformations [39]. Two mechanisms can be envisaged as to how a chaperone causes a structural change in a client protein. In both cases, a conformational change in the chaperone is induced by the binding of a ligand (ATP or a cofactor) or by ATP hydrolysis. This in turn results in a change in the nature/geometry of the substrate-binding site that weakens the interaction with the bound polypeptide. As the system ligand/chaperone tends to assume a state of minimum free energy, the substrate has two choices: (1) it may undergo structural rearrangements to adopt a conformation that has a higher affinity towards the binding site or (2) it may start to fold. Both processes will lower the free energy of the system, but in the first case the energy drop primarily results from interactions between chaperone and substrate, whereas in the second case it results solely from interactions within the client polypeptide. Although these rearrangements are driven by thermodynamics, kinetics may be important as well in defining whether binding or folding dominates for a given polypeptide conformation [55–57].
7
8 Chaperone Function in Vitro
An alternative to this “passive” model is that the conformational change in the chaperone generates a “power stroke” that is used to perform work on the substrate, e.g., by moving apart two sections of the bound protein. If the substrate is a single polypeptide, this could result in a forced unfolding reaction [58]. If the substrate is an oligomeric species, e.g., a small aggregate, the power stroke may cause its dissociation into individual molecules. This “active” mechanism requires that the substrate remains associated with the chaperone during the conformational switch, which is not required in the first model. After the power stroke, the substrate will probably undergo a series of conformational relaxation reactions. For both models, the concept of general chaperones requires that the conformational change in the substrate be encoded by its amino acid sequence rather than being forced onto the substrate by the chaperone. The chaperone merely enables the polypeptide to use its inherent folding information.
2.3 Energy Consumption and Regulation of Chaperone Function
Most of the general chaperones require ATP to carry out their task (Table 1). Examples of ATP-hydrolyzing chaperones are GroEL; the Hsp104/ClpB proteins, which participate in the dissociation of protein aggregates; Hsp90; and Hp70, which is involved in a large variety of reactions ranging from the general folding of nascent polypeptide chains to specific functions. As stated above, the cycle of ATP binding and hydrolysis drives an iterative sequence of conformational changes in the chaperone, which may induce structural rearrangements in a bound polypeptide or trigger binding/release of client proteins and co-chaperones. In contrast to the conformational changes that occur within both the chaperone and a bound substrate protein, the ATPase activity of molecular chaperones is experimentally very easily accessible. Because of its crucial functional importance, ATP turnover is usually modulated by both the substrate and helper chaperones. Stimulation of ATP hydrolysis in the presence of unfolded proteins or peptides has been observed, e.g., for Hsp70, Hsp90, and Hsp104 [30, 59–61]. Apparently, the binding of the substrate induces a conformational change in these chaperones that results in a state with a higher intrinsic ATPase activity. This “activation on demand” may serve as a means to reduce the energy consumption by the empty chaperones. In contrast, ATP hydrolysis by GroEL is only marginally affected by the presence of protein substrates. A well-characterized example of the regulation of ATP hydrolysis by cochaperones is the DnaK system of E. coli. The ATPase cycle, and thus the cycle of substrate binding and release, is governed by two potential rate-limiting steps: the hydrolysis reaction and the subsequent exchange of ADP by ATP [62]. Both steps can be accelerated by specific cofactors, namely DnaJ and GrpE [59, 63, 64]. The rates of nucleotide exchange and hydrolysis determine how fast a substrate is processed and how long it is on average associated with the ADP and ATP states of DnaK, respectively.
3 Limits and Extensions of the Chaperone Concept
While Hsp70 is subject mainly to positive regulation, other chaperone systems appear to include negative regulators as well. Hsp90, which plays an essential role in the activation of many regulatory proteins such as hormone receptors and kinases, is controlled by a large number of cofactors. Two of these co-chaperones, p23 and Hop/Sti1, were found to decrease the ATPase activity by stabilizing specific conformational states of Hsp90 [50, 65]. Intriguingly, Hop/Sti1 from S. cerevisiae seems to have a dual regulatory function: besides its inhibitory effect on Hsp90, Sti1 was also shown to be a very potent activator of Hsp70 [66]. Another example of a negative regulator is GroES, which reduces the ATP hydrolysis of its partner protein GroEL to 10–50% depending on the conditions used [67, 68]. The role of GroES in the functional cycle of the GroE chaperone, however, goes far beyond its modulation of ATP turnover, as it is essential for the encapsulation of a polypeptide in the folding compartment of the GroE particle. The example of GroES is instructive, as it demonstrates that regulation of ATP turnover may not be the most important function of a cofactor. Nevertheless, the combination of experimental simplicity and functional relevance makes the ATPase of a general chaperone a preferred choice for mechanistic studies. Some chaperones are regulated by temperature. Hsp26, a small heat shock protein from S. cerevisiae, as no chaperone activity at 30 ◦ C, the physiologic temperature for yeast. At this temperature, Hsp26 forms hollow spherical particles consisting of 24 subunits [69]. Under heat stress, e.g., at 42 ◦ C, these large complexes dissociate into dimers that exhibit chaperone activity and bind to nonnative polypeptides. Another interesting example of temperature regulation is the hexameric chaperone/protease degP of the E. coli periplasm [70]. At low temperature the protein acts as a chaperone, whereas at elevated temperature it possesses proteolytic activity. The regulatory switch is thought to involve the reorientation of inhibitory domains, which block access to the active site of the protease at low temperature [71]. Interestingly, switching can also be achieved by the addition of certain peptides. A further variation of the scheme of activating chaperones by stress conditions is the regulation of Hsp33. Here, oxidative stress leads to the formation of disulfide bonds within the chaperone and its subsequent dimerization, which is a prerequisite for efficient substrate binding [72].
3 Limits and Extensions of the Chaperone Concept
The term chaperone and the underlying concept are attractive. Therefore, it is not surprising that the term itself has been proved to be rather promiscuous, as it is increasingly used for different proteins and even non-proteinaceous molecules. In its most basic interpretation, it has been applied to all proteins that have an effect on the aggregation of another protein or that influence the yield of correct folding. Most of these proteins have hydrophobic patches on their surface and thus, irrespective of their biological function, will associate with nonnative proteins. However, to qualify for a molecular chaperone, additional requirements should be met. This includes
9
10 Chaperone Function in Vitro
the defined stoichiometry of the binding reaction and the regulated release of the bound polypeptide (e.g., triggered by binding of ATP). An extension of the term chaperone are the so-called intramolecular chaperones. These are regions of proteins that are required for their productive folding without being part of the mature protein because they are cleaved off after folding is complete. Well-investigated examples include the pro-regions of subtilisin E and α-lytic protease [73, 74]. The variation of the chaperone concept is that an intramolecular chaperone is specific for one protein and is covalently attached to its target. Furthermore, these chaperones act only once: after completing their job, they are cleaved off and degraded. These properties do not agree with the basic definition of a chaperone. Chemical chaperone is a term that has become fashionable in recent years. It refers to small molecules (not proteins or peptides) that are added to refolding reactions to increase the efficiency of protein folding in vitro. For instance, the folding yields for many enzymes can be improved by adding a substrate to the refolding buffer [75–77]. These compounds are thought to stabilize critical folding intermediates by binding to the active site, thereby enhancing folding. Besides these rather specific ligands, a number of substances, the most prominent of them being Tris and arginine, were found to promote the yield of folding in many cases when present in large excess over the folding proteins. In contrast to molecular chaperones, both chemical chaperones and intramolecular chaperones cannot be regulated in their activity. 3.1 Co-chaperones
For many chaperones, cofactors/co-chaperones exist that seem to directly influence the function of the chaperone. Some of them are essential constituents of their respective chaperone system, while others only improve its efficiency. Often, the role of cofactors is not limited to the regulation of the ATPase function but involves the recognition of nonnative proteins. These proteins thus have features of a chaperone. The DnaJ/Hsp40 co-chaperones seem to bind unfolded proteins and transfer them to Hsp70 [62, 78]. At the same time, as mentioned above, binding of DnaJ/Hsp40 stimulates ATP hydrolysis by Hsp70. Far more complex is the situation for the Hsp90 system. Here a plethora of cofactors exist, many of which seem to be able to recognize and bind nonnative proteins as demonstrated by in vitro chaperone assays [79, 80]. There is good reason to assume that they exert this function also within the Hsp90 chaperone complex, i.e., they interact with the substrate when it is associated with Hsp90. The term co-chaperone implies that the function of these proteins is restricted to the context of their respective chaperones. However, recent evidence suggests that this may not be true in all cases. One example is the Hsp90 co-chaperone p23, which clearly has a distinct function in the Hsp90 system but also seems to be able to disassemble transcription factor complexes bound to DNA [81]. Similarly, Cdc37 was reported to exhibit an autonomous chaperone function [82].
4 Working with Molecular Chaperones
It is important to remember that while in vitro chaperone assays can be used to detect chaperone properties in a cofactor, their functional relevance in the context of a chaperone complex has to be addressed in subsequent experiments. 3.2 Specific Chaperones
In addition to the general chaperones, a number of proteins exist for which the basic definition of chaperone function applies. However, these chaperones are dedicated to one or a few target proteins. In these cases, recognition is highly specific and does not necessarily involve hydrophobic interactions. An example for this class of chaperones is Hsp47, a resident protein of the endoplasmic reticulum that is a dedicated folding factor for collagen [83]. It interacts with immature collagen, is required for its correct folding and association, and is not part of the final collagen trimer. Thus, all the criteria for a molecular chaperone are fulfilled. Collagen release is not triggered by ATP hydrolysis but by changes in the pH upon transit from the ER to the Golgi [84, 85]. Another well-studied example is the pili assembly proteins of gram-negative bacteria. Here, unassembled pili subunits are bound by specific chaperones that prevent their premature association in the periplasmic space. These chaperones bind very specifically and can be considered as surrogate subunits.
4 Working with Molecular Chaperones 4.1 Natural versus Artificial Substrate Proteins
An important point in studying molecular chaperones is the choice of substrate proteins. For many of them it is clear that they do not represent authentic in vivo substrates for the chaperone studied. The question is whether this poses a limitation on the validity of the results obtained with these proteins. For the promiscuous general chaperones and co-chaperones, the interaction with the substrate protein is relatively unspecific. As a consequence, any unfolded protein can be regarded as a bona fide substrate. Interestingly, almost all mechanistic studies on GroE have been performed with “artificial” substrate proteins. Using a standard set of model substrates has a number of important advantages: (1) they are commercially available at a fairly low cost, (2) they are very well characterized in their folding and aggregation properties, and (3) they allow a direct comparison among chaperones to assess functional differences (Table 2). Moreover, they likely reflect the spectrum of polypeptides that chaperones may encounter in the cell. The situation is different for chaperones with high substrate specificity. Here, assays with model substrate proteins may allow analysis of certain aspects of their
11
Sus scrofa 2×49 Photinus pyralis 60.7 Bos taurus 5.8
Citrate synthase Luciferase Insulin
33 2×50.5
Rhodanese Rubisco
Bos taurus R. rubrum
2×33 40.7
Malate dehydrogenase Sus scrofa Maltose binding protein E. coli
14
Bos taurus
Alpha-lactalbumin
68
S. cerevisiae
1.8 1.11
0.2 1.6
1.56 0.59 1.06
1.7
2.61
Molecular mass A280 nm (1 mg (kDa) mL−1 , 1 cm),
Alpha-glucosidase
Organism
Table 2 Protein substrates for molecular chaperones
Yes No
Yes No
Yes Yes Yes
Yes
Yes
Yes (toxic) Yes, but uses 14 CO2
Yes No (but change in Trp fluorescence upon folding)
Yes (expensive) Yes (expensive) No
Yes, but awkward
Yes
Good Good
Good Good
Good Very poor Good
Very good
Good
Commercially Activity assay available General available handling
Permanently unfolded derivate (RCMLA) Model substrate Very sensitive Useful only for aggregation assays Model substrate Spontaneous folding is decelerated by chaperones Model substrate First substrate ever used for in vitro assays
Remarks
[56, 97, 147] [139, 148, 149]
[99, 145] [55, 146]
[87, 107, 139] [140–143] [114, 115, 144]
[137, 138]
[115, 136]
References
12 Chaperone Function in Vitro
References
mechanisms. However, it is desirable to use authentic substrate proteins whenever possible. 4.2 Stability of Chaperones
There seems to be a common misconception that chaperones are exceptionally stable proteins because they are active under conditions, such as heat shock, that lead to the unfolding of many proteins. However, it has to be considered that the heat shock response is able to protect organisms only a few degrees above their optimum growth temperature. In general, there is no selection pressure on any intracellular protein to be stable above the maximum temperature tolerated by the host. As outlined above, many chaperones are functionally important under both physiological and stress conditions, i.e., they have to be active over a broad temperature range. From a mechanistic point of view, flexibility and conformational changes are of exceptional importance for molecular chaperones since they have to interact with a large variety of substrate proteins. For the reasons given, it is therefore plausible that chaperones cannot be extremely stable and rigid proteins (Table 1). Acknowledgments
We would like to thank V. Grimminger, B. B¨osl, and K. Ruth for contributing to this chapter and M. Haslbeck and H. Wegele for critically reading the manuscript. Work in the authors’ labs was supported by grants from the DFG to J.B. and S.W. and by grants from the Fonds der Chemischen Industrie to J.B.
References 1 EARNSHAW, W. C., HONDA, B. M., LASKEY, R. A., & THOMAS, J. O. (1980). Assembly of nucleosomes: the reaction involving X. laevis nucleoplasmin. Cell 21, 373–383. 2 DINGWALL, C. & LASKEY, R. A. (1990). Nucleoplasmin: the archetypal molecular chaperone. Semin. Cell Biol. 1, 11–17. 3 ELLIS, R. J. & HEMMINGSEN, S. M. (1989). Molecular chaperones: proteins essential for the biogenesis of some macromolecular structures. Trends Biochem. Sci. 14, 339–342. 4 ELLIS, R. J. & van der VIES, S. M. (1991). Molecular chaperones. Annu. Rev. Biochem. 60, 321–347.
5 GETHING, M. J. & SAMBROOK, J. (1992). Protein folding in the cell. Nature 355, 33–45. 6 HARTL, F. U. (1996). Molecular chaperones in cellular protein folding. Nature 381, 571–579. 7 BUKAU, B. & HORWICH, A. L. (1998). The Hsp70 and Hsp60 chaperone machines. Cell 92, 351–366. 8 HARTL, F. U. & HAYER-HARTL, M. (2002). Molecular chaperones in the cytosol: from nascent chain to folded protein. Science 295, 1852–1858. 9 WALTER, S. & BUCHNER, J. (2002). Molecular chaperonescellular machines for protein folding. Angew. Chem. Int. Ed Engl. 41, 1098–1113.
13
14 Chaperone Function in Vitro
10 GRAGEROV, A. I., MARTIN, E. S., KRUPENKO, M. A., KASHLEV, M. V., & NIKIFOROV, V. G. (1991). Protein aggregation and inclusion body formation in Escherichia coli rpoH mutant defective in heat shock protein induction. FEBS Lett. 291, 222–224. 11 SCHOEMAKER, J. M., BRASNETT, A. H., & MARSTON, F. A. (1985). Examination of calf prochymosin accumulation in Escherichia coli: disulphide linkages are a structural component of prochymosin-containing inclusion bodies. EMBO J. 4, 775–780. 12 INGOLIA, T. D. & CRAIG, E. A. (1982). Drosophila gene related to the major heat shock-induced gene is transcribed at normal temperatures and not induced by heat shock. Proc. Natl. Acad. Sci. U.S.A 79, 525–529. 13 FAYET, O., ZIEGELHOFFER, T., & GEORGOPOULOS, C. (1989). The groES and groEL heat shock gene products of Escherichia coli are essential for bacterial growth at all temperatures. J. Bacteriol. 171, 1379–1385. 14 BORKOVICH, K. A., FARRELLY, F. W., FINKELSTEIN, D. B., TAULIEN, J., & LINDQUIST, S. (1989). hsp82 is an essential protein that is required in higher concentrations for growth of cells at higher temperatures. Mol. Cell Biol. 9, 3919–3930. 15 ANFINSEN, C. B., HABER, E., SELA, M., & WHITE, F. H. (1961). The kinetics of formation of native ribonuclease during oxidation of the reduced polypetide chain. Proc. Natl. Acad. Sci. U.S.A. 47, 1309–1314. 16 ANFINSEN, C. B. (1973). Principles that govern the folding of protein chains. Science 181, 223–230. 17 JAENICKE, R. (1987). Folding and association of proteins. Prog. Biophys. Mol. Biol. 49, 117–237. 18 SECKLER, R. & JAENICKE, R. (1992). Protein folding and protein refolding. FASEB J. 6, 2545–2552. 19 KIEFHABER, T., RUDOLPH, R., KOHLER, H. H., & BUCHNER, J. (1991). Protein aggregation in vitro and in vivo: a quantitative model of the kinetic
20
21
22
23
24
25
26
27
28
competition between folding and aggregation. Biotechnology 9, 825–829. EPSTEIN, C. J., GOLDBERGER, R. F., & ANFINSEN, C. B. (1963). Genetic Control of Tertiary Protein Structure: Studies with Model Systems. Cold Spring Harbor Symp. Quant. Biol. 28, 439–449. VIITANEN, P. V., GATENBY, A. A., & LORIMER, G. H. (1992). Purified chaperonin 60 (groEL) interacts with the nonnative states of a multitude of Escherichia coli proteins. Protein Sci. 1, 363–369. HORWICH, A. L., LOW, K. B., FENTON, W. A., HIRSHFIELD, I. N., & FURTAK, K. (1993). Folding in vivo of bacterial cytoplasmic proteins: role of GroEL. Cell 74, 909–917. HOURY, W. A., FRISHMAN, D., ECKERSKORN, C., LOTTSPEICH, F., & HARTL, F. U. (1999). Identification of in vivo substrates of the chaperonin GroEL. Nature 402, 147–154. TETER, S. A., HOURY, W. A., ANG, D., TRADLER, T., ROCKABRAND, D., FISCHER, G., BLUM, P., GEORGOPOULOS, C., & HARTL, F. U. (1999). Polypeptide flux through bacterial Hsp70: DnaK cooperates with trigger factor in chaperoning nascent chains. Cell 97, 755–765. HASLBECK, M., BRAUN, N., STROMER, T., RICHTER, B., MODEL, N., WEINKAUF, S., & BUCHNER, J. (2004). Hsp42 is the general small heat shock protein in the cytosol of Saccharomyces cerevisiae. EMBO J. 23, 638–649. ROBINSON, C. V., GROSS, M., EYLES, S. J., EWBANK, J. J., MAYHEW, M., HARTL, F. U., DOBSON, C. M., & RADFORD, S. E. (1994). Conformation of GroEL-bound alpha-lactalbumin probed by mass spectrometry. Nature 372, 646–651. ZAHN, R., PERRETT, S., & FERSHT, A. R. (1996). Conformational states bound by the molecular chaperones GroEL and secB: a hidden unfolding (annealing) activity. J. Mol. Biol. 261, 43–61. GOLDBERG, M. S., ZHANG, J., SONDEK, S., MATTHEWS, C. R., FOX, R. O., & HORWICH, A. L. (1997). Native-like structure of a protein-folding intermediate bound to the chaperonin
References
29
30
31
32
33
34
35
36
37
38
GroEL. Proc. Natl. Acad. Sci. U.S.A 94, 1080–1085. CHEN, J., WALTER, S., HORWICH, A. L., & SMITH, D. L. (2001). Folding of malate dehydrogenase inside the GroEL-GroES cavity. Nat. Struct. Biol. 8, 721–728. FLYNN, G. C., POHL, J., FLOCCO, M. T., & ROTHMAN, J. E. (1991). Peptide-binding specificity of the molecular chaperone BiP. Nature 353, 726–730. LANDRY, S. J., JORDAN, R., MCMACKEN, R., & GIERASCH, L. M. (1992). Different conformations for the same polypeptide bound to chaperones DnaK and GroEL. Nature 355, 455–457. FENTON, W. A., KASHI, Y., FURTAK, K., & HORWICH, A. L. (1994). Residues in chaperonin GroEL required for polypeptide binding and release. Nature 371, 614–619. ZHU, X., ZHAO, X., BURKHOLDER, W. F., GRAGEROV, A., OGATA, C. M., GOTTESMAN, M. E., & HENDRICKSON, W. A. (1996). Structural analysis of substrate binding by the molecular chaperone DnaK. Science 272, 1606–1614. KATSUMATA, K., OKAZAKI, A., TSURUPA, G. P., & KUWAJIMA, K. (1996). Dominant forces in the recognition of a transient folding intermediate of alpha-lactalbumin by GroEL. J. Mol. Biol. 264, 643–649. GRAGEROV, A., ZENG, L., ZHAO, X., BURKHOLDER, W., & GOTTESMAN, M. E. (1994). Specificity of DnaK-peptide binding. J. Mol. Biol. 235, 848–854. CROUY-CHANEL, A., KOHIYAMA, M., & RICHARME, G. (1996). Specificity of DnaK for arginine/lysine and effect of DnaJ on the amino acid specificity of DnaK. J. Biol. Chem. 271, 15486–15490. WANG, Z., FENG, H., LANDRY, S. J., MAXWELL, J., & GIERASCH, L. M. (1999). Basis of substrate binding by the chaperonin GroEL. Biochemistry 38, 12537–12546. GERVASONI, P., STAUDENMANN, W., JAMES, P., GEHRIG, P., & PLUCKTHUN, A. (1996). beta-Lactamase binds to GroEL in a conformation highly protected against hydrogen/deuterium exchange. Proc. Natl. Acad. Sci. U.S.A 93, 12189–12194.
39 LILIE, H. & BUCHNER, J. (1995). Interaction of GroEL with a highly structured folding intermediate: iterative binding cycles do not involve unfolding. Proc. Natl. Acad. Sci. U.S.A 92, 8100–8104. 40 BRAIG, K., OTWINOWSKI, Z., HEGDE, R., BOISVERT, D. C., JOACHIMIAK, A., HORWICH, A. L., & SIGLER, P. B. (1994). The crystal structure of the bacterial chaperonin GroEL at 2.8 Å Nature 371, 578–586. 41 BUCKLE, A. M., ZAHN, R., & FERSHT, A. R. (1997). A structural model for GroEL-polypeptide recognition. Proc. Natl. Acad. Sci. U.S.A 94, 3571–3575. 42 CHEN, L. & SIGLER, P. B. (1999). The crystal structure of a GroEL/peptide complex: plasticity as a basis for substrate diversity. Cell 99, 757–768. 43 CHIN, D. T., KUEHL, L., & RECHSTEINER, M. (1982). Conjugation of ubiquitin to denatured hemoglobin is proportional to the rate of hemoglobin degradation in HeLa cells. Proc. Natl. Acad. Sci. U.S.A 79, 5857–5861. 44 PARAG, H. A., RABOY, B., & KULKA, R. G. (1987). Effect of heat shock on protein degradation in mammalian cells: involvement of the ubiquitin system. EMBO J. 6, 55–61. 45 KARZAI, A. W., ROCHE, E. D., & SAUER, R. T. (2000). The SsrA-SmpB system for protein tagging, directed degradation and ribosome rescue. Nat. Struct. Biol. 7, 449–455. 46 van NOCKER, S., DEVERAUX, Q., RECHSTEINER, M., & VIERSTRA, R. D. (1996). Arabidopsis MBP1 gene encodes a conserved ubiquitin recognition component of the 26S proteasome. Proc. Natl. Acad. Sci. U.S.A 93, 856–860. 47 WEBER-BAN, E. U., REID, B. G., MIRANKER, A. D., & HORWICH, A. L. (1999). Global unfolding of a substrate protein by the Hsp100 chaperone ClpA. Nature 401, 90–93. 48 PRATT, W. B. & TOFT, D. O. (1997). Steroid receptor interactions with heat shock protein and immunophilin chaperones. Endocr. Rev. 18, 306–360.
15
16 Chaperone Function in Vitro
49 PRATT, W. B. & TOFT, D. O. (2003). Regulation of signalling protein function and trafficking by the hsp90/hsp70-based chaperone machinery. Exp. Biol. Med. 228, 111–133. 50 RICHTER, K., MUSCHLER, P., HAINZL, O., REINSTEIN, J., & BUCHNER, J. (2003). Sti1 is a non-competitive inhibitor of the Hsp90 ATPase. Binding prevents the N-terminal dimerization reaction during the atpase cycle. J. Biol. Chem. 278, 10328–10333. 51 LIBEREK, K., SKOWYRA, D., ZYLICZ, M., JOHNSON, C., & GEORGOPOULOS, C. (1991). The Escherichia coli DnaK chaperone, the 70-kDa heat shock protein eukaryotic equivalent, changes conformation upon ATP hydrolysis, thus triggering its dissociation from a bound target protein. J. Biol. Chem. 266, 14491–14496. 52 XU, Z., HORWICH, A. L., & SIGLER, P. B. (1997). The crystal structure of the asymmetric GroEL-GroES-(ADP)7 chaperonin complex. Nature 388, 741–750. 53 RANSON, N. A., FARR, G. W., ROSEMAN, A. M., GOWEN, B., FENTON, W. A., HORWICH, A. L., & SAIBIL, H. R. (2001). ATP-bound states of GroEL captured by cryo-electron microscopy. Cell 107, 869–879. 54 RICHTER, K., MUSCHLER, P., HAINZL, O., & BUCHNER, J. (2001). Coordinated ATP hydrolysis by the Hsp90 dimer. J. Biol. Chem. 276, 33689–33696. 55 HARDY, S. J. & RANDALL, L. L. (1991). A kinetic partitioning model of selective binding of nonnative proteins by the bacterial chaperone SecB. Science 251, 439–443. 56 WEISSMAN, J. S., KASHI, Y., FENTON, W. A., & HORWICH, A. L. (1994). GroEL-mediated protein folding proceeds by multiple rounds of binding and release of nonnative forms. Cell 78, 693–702. 57 FRIEDEN, C. & CLARK, A. C. (1997). Protein folding: how the mechanism of GroEL action is defined by kinetics. Proc. Natl. Acad. Sci. U.S.A 94, 5535–5538. 58 SHTILERMAN, M., LORIMER, G. H., & ENGLANDER, S. W. (1999). Chaperonin
59
60
61
62
63
64
65
66
67
function: folding by forced unfolding. Science 284, 822–825. JORDAN, R. & MCMACKEN, R. (1995). Modulation of the ATPase activity of the molecular chaperone DnaK by peptides and the DnaJ and GrpE heat shock proteins. J. Biol. Chem. 270, 4563–4569. MCLAUGHLIN, S. H., SMITH, H. W., & JACKSON, S. E. (2002). Stimulation of the weak ATPase activity of human hsp90 by a client protein. J. Mol. Biol. 315, 787–798. CASHIKAR, A. G., SCHIRMER, E. C., HATTENDORF, D. A., GLOVER, J. R., RAMAKRISHNAN, M. S., WARE, D. M., & LINDQUIST, S. L. (2002). Defining a pathway of communication from the C-terminal peptide binding domain to the N-terminal ATPase domain in a AAA protein. Mol. Cell 9, 751–760. GAMER, J., MULTHAUP, G., TOMOYASU, T., MCCARTY, J. S., RUDIGER, S., SCHONFELD, H. J., SCHIRRA, C., BUJARD, H., & BUKAU, B. (1996). A cycle of binding and release of the DnaK, DnaJ and GrpE chaperones regulates activity of the Escherichia coli heat shock transcription factor sigma32. EMBO J. 15, 607–617. LIBEREK, K., MARSZALEK, J., ANG, D., GEORGOPOULOS, C., & ZYLICZ, M. (1991). Escherichia coli DnaJ and GrpE heat shock proteins jointly stimulate ATPase activity of DnaK. Proc. Natl. Acad. Sci. U.S.A 88, 2874–2878. PACKSCHIES, L., THEYSSEN, H., BUCHBERGER, A., BUKAU, B., GOODY, R. S., & REINSTEIN, J. (1997). GrpE accelerates nucleotide exchange of the molecular chaperone DnaK with an associative displacement mechanism. Biochemistry 36, 3417–3422. SULLIVAN, W. P., OWEN, B. A., & TOFT, D. O. (2002). The influence of ATP and p23 on the conformation of hsp90. J. Biol. Chem. 277, 45942–45948. WEGELE, H., HASLBECK, M., REINSTEIN, J., & BUCHNER, J. (2003). Sti1 is a novel activator of the Ssa proteins. J. Biol. Chem. 278, 25970–25976. GRAY, T. E. & FERSHT, A. R. (1991). Cooperativity in ATP hydrolysis by GroEL is increased by GroES. FEBS Lett. 292, 254–258.
References 68 TODD, M. J., VIITANEN, P. V., & LORIMER, G. H. (1994). Dynamics of the chaperonin ATPase cycle: implications for facilitated protein folding. Science 265, 659–666. 69 HASLBECK, M., WALKE, S., STROMER, T., EHRNSPERGER, M., WHITE, H. E., CHEN, S., SAIBIL, H. R., & BUCHNER, J. (1999). Hsp26: a temperature-regulated chaperone. EMBO J. 18, 6744–6751. 70 SPIESS, C., BEIL, A., & EHRMANN, M. (1999). A temperature-dependent switch from chaperone to protease in a widely conserved heat shock protein. Cell 97, 339–347. 71 KROJER, T., GARRIDO-FRANCO, M., HUBER, R., EHRMANN, M., & CLAUSEN, T. (2002). Crystal structure of DegP (HtrA) reveals a new protease-chaperone machine. Nature 416, 455–459. 72 JAKOB, U., MUSE, W., ESER, M., & BARDWELL, J. C. (1999). Chaperone activity with a redox switch. Cell 96, 341–352. 73 KOBAYASHI, T. & INOUYE, M. (1992). Functional analysis of the intramolecular chaperone. Mutational hot spots in the subtilisin propeptide and a second-site suppressor mutation within the subtilisin molecule. J. Mol. Biol. 226, 931–933. 74 JASWAL, S. S., SOHL, J. L., DAVIS, J. H., & AGARD, D. A. (2002). Energetic landscape of alpha-lytic protease optimizes longevity through kinetic stability. Nature 415, 343–346. 75 MENDOZA, J. A., ROGERS, E., LORIMER, G. H., & HOROWITZ, P. M. (1991). Unassisted refolding of urea unfolded rhodanese. J. Biol. Chem. 266, 13587–13591. 76 ZHI, W., LANDRY, S. J., GIERASCH, L. M., & SRERE, P. A. (1992). Renaturation of citrate synthase: influence of denaturant and folding assistants. Protein Sci. 1, 522–529. 77 AGHAJANIAN, S., WALSH, T. P., & ENGEL, P. C. (1999). Specificity of coenzyme analogues and fragments in promoting or impeding the refolding of clostridial glutamate dehydrogenase. Protein Sci. 8, 866–872.
78 SZABO, A., KORSZUN, R., HARTL, F. U., & FLANAGAN, J. (1996). A zinc fingerlike domain of the molecular chaperone DnaJ is involved in binding to denatured protein substrates. EMBO J. 15, 408–417. 79 BOSE, S., WEIKL, T., BUGL, H., & BUCHNER, J. (1996). Chaperone function of Hsp90-associated proteins. Science 274, 1715–1717. 80 FREEMAN, B. C., TOFT, D. O., & MORIMOTO, R. I. (1996). Molecular chaperone machines: chaperone activities of the cyclophilin Cyp-40 and the steroid aporeceptor-associated protein p23. Science 274, 1718–1720. 81 FREEMAN, B. C. & YAMAMOTO, K. R. (2002). Disassembly of transcriptional regulatory complexes by molecular chaperones. Science 296, 2232–2235. 82 KIMURA, Y., RUTHERFORD, S. L., MIYATA, Y., YAHARA, I., FREEMAN, B. C., YUE, L., MORIMOTO, R. I., & LINDQUIST, S. (1997). Cdc37 is a molecular chaperone with specific functions in signal transduction. Genes Dev. 11, 1775–1785. 83 NAGATA, K. (1996). Hsp47: a collagen-specific molecular chaperone. Trends Biochem. Sci. 21, 22–26. 84 SAGA, S., NAGATA, K., CHEN, W. T., & YAMADA, K. M. (1987). pH-dependent function, purification, & intracellular location of a major collagen-binding glycoprotein. J. Cell Biol. 105, 517–527. 85 NATSUME, T., KOIDE, T., YOKOTA, S., HIRAYOSHI, K., & NAGATA, K. (1994). Interactions between collagen-binding stress protein HSP47 and collagen. Analysis of kinetic parameters by surface plasmon resonance biosensor. J. Biol. Chem. 269, 31224–31228. 86 GOLOUBINOFF, P., CHRISTELLER, J. T., GATENBY, A. A., & LORIMER, G. H. (1989). Reconstitution of active dimeric ribulose bisphosphate carboxylase from an unfolded state depends on two chaperonin proteins and Mg-ATP. Nature 342, 884–889. 87 BUCHNER, J., SCHMIDT, M., FUCHS, M., JAENICKE, R., RUDOLPH, R., SCHMID, F. X., & KIEFHABER, T. (1991). GroE facilitates refolding of citrate synthase by suppressing aggregation. Biochemistry 30, 1586–1591.
17
18 Chaperone Function in Vitro
88 SIGLER, P. B., XU, Z., RYE, H. S., BURSTON, S. G., FENTON, W. A., & HORWICH, A. L. (1998). Structure and function in GroEL-mediated protein folding. Annu. Rev. Biochem. 67, 581–608. 89 EHRNSPERGER, M., HERGERSBERG, C., WIENHUES, U., NICHTL, A., & BUCHNER, J. (1998). Stabilization of proteins and peptides in diagnostic immunological assays by the molecular chaperone Hsp25. Anal. Biochem. 259, 218–225. 90 LANGER, T., PFEIFER, G., MARTIN, J., BAUMEISTER, W., & HARTL, F. U. (1992). Chaperonin-mediated protein folding: GroES binds to one end of the GroEL cylinder, which accommodates the protein substrate within its central cavity. EMBO J. 11, 4757–4765. 91 HOROVITZ, A., FRIDMANN, Y., KAFRI, G., & YIFRACH, O. (2001). Review: allostery in chaperonins. J. Struct. Biol. 135, 104–114. 92 STARK, G. R. (1965). Reactions of cyanate with functional groups of proteins. 3. Reactions with amino and carboxyl groups. Biochemistry 4, 1030–1036. 93 GRIMMINGER, V., RICHTER, K., IMHOF, A., BUCHNER, J., & WALTER, S. (2004). The prion curing agent guanidinium chloride specifically inhibits ATP hydrolysis by Hsp104. J. Biol. Chem. 279, 7378–7383. 94 TODD, M. J. & LORIMER, G. H. (1995). Stability of the asymmetric Escherichia coli chaperonin complex. Guanidine chloride causes rapid dissociation. J. Biol. Chem. 270, 5388–5394. 95 S¨ORBO, B. H. (1953). Crystalline Rhodanese: The Enzyme Catalyzed Reaction. Acta Chem. Scand. 7, 1137–1145. 96 PLOEGMAN, J. H., DRENT, G., KALK, K. H., & HOL, W. G. (1978). Structure of bovine liver rhodanese. I. Structure determination at 2.5Å resolution and a comparison of the conformation and sequence of its two domains. J. Mol. Biol. 123, 557–594. 97 MENDOZA, J. A., ROGERS, E., LORIMER, G. H., & HOROWITZ, P. M. (1991). Chaperonins facilitate the in vitro folding of monomeric mitochondrial
98
99
100
101
102
103
104
105
106
rhodanese. J. Biol. Chem. 266, 13044–13049. JAENICKE, R., RUDOLPH, R., & HEIDER, I. (1979). Quaternary structure, subunit activity, & in vitro association of porcine mitochondrial malic dehydrogenase. Biochemistry 18, 1217–1223. STANIFORTH, R. A., CORTES, A., BURSTON, S. G., ATKINSON, T., HOLBROOK, J. J., & CLARKE, A. R. (1994). The stability and hydrophobicity of cytosolic and mitochondrial malate dehydrogenases and their relation to chaperonin-assisted folding. FEBS Lett. 344, 129–135. RODERICK, S. L. & BANASZAK, L. J. (1986). The three-dimensional structure of porcine heart mitochondrial malate dehydrogenase at 3.0-Å resolution. J. Biol. Chem. 261, 9461–9464. RANSON, N. A., DUNSTER, N. J., BURSTON, S. G., & CLARKE, A. R. (1995). Chaperonins can catalyse the reversal of early aggregation steps when a protein misfolds. J. Mol. Biol. 250, 581–586. LEE, G. J., ROSEMAN, A. M., SAIBIL, H. R., & VIERLING, E. (1997). A small heat shock protein stably binds heat-denatured model substrates and can maintain a substrate in a folding-competent state. EMBO J. 16, 659–671. GOLOUBINOFF, P., MOGK, A., ZVI, A. P., TOMOYASU, T., & BUKAU, B. (1999). Sequential mechanism of solubilization and refolding of stable protein aggregates by a bichaperone network. Proc. Natl. Acad. Sci. U.S.A 96, 13732–13737. REMINGTON, S., WIEGAND, G., & HUBER, R. (1982). Crystallographic refinement and atomic models of two different forms of citrate synthase at 2.7 and 1.7Å resolution. J. Mol. Biol. 158, 111–152. HASLBECK, M., SCHUSTER, I., & GRALLERT, H. (2003). GroE-dependent expression and purification of pig heart mitochondrial citrate synthase in Escherichia coli. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 786, 127–136. SRERE, P. A., BRAZIL, H., & GONEN, L. (1963). Citrate synthase assay in tissue homogenates. Acta Chem. Scand. 17, 129–134.
References 107 BUCHNER, J., GRALLERT, H., & JAKOB, U. (1998). Analysis of chaperone function using citrate synthase as nonnative substrate protein. Methods Enzymol. 290, 323–338. 108 WIECH, H., BUCHNER, J., ZIMMERMANN, R., & JAKOB, U. (1992). Hsp90 chaperones protein folding in vitro. Nature 358, 169–170. 109 SHAO, F., BADER, M. W., JAKOB, U., & BARDWELL, J. C. (2000). DsbG, a protein disulfide isomerase with chaperone activity. J. Biol. Chem. 275, 13349–13352. 110 CYR, D. M., LU, X., & DOUGLAS, M. G. (1992). Regulation of Hsp70 function by a eukaryotic DnaJ homolog. J. Biol. Chem. 267, 20927–20931. 111 PANSE, V. G., SWAMINATHAN, C. P., SUROLIA, A., & VARADARAJAN, R. (2000). Thermodynamics of substrate binding to the chaperone SecB. Biochemistry 39, 2420–2427. 112 SCHOLZ, C., STOLLER, G., ZARNT, T., FISCHER, G., & SCHMID, F. X. (1997). Cooperation of enzymatic and chaperone functions of trigger factor in the catalysis of protein folding. EMBO J. 16, 54–58. 113 SANGER, F. (1949). Fractionation of oxidized insulin. Biochem. J. 44, 126–128. 114 HORWITZ, J., HUANG, Q. L., DING, L., & BOVA, M. P. (1998). Lens alpha-crystallin: chaperone-like properties. Methods Enzymol. 290, 365–383. 115 STROMER, T., EHRNSPERGER, M., GAESTEL, M., & BUCHNER, J. (2003). Analysis of the interaction of small heat shock proteins with unfolding proteins. J. Biol. Chem. 278, 18015–18021. 116 SCHONFELD, H. J. & BEHLKE, J. (1998). Molecular chaperones and their interactions investigated by analytical ultracentrifugation and other methodologies. Methods Enzymol. 290, 269–296. 117 BLOND-ELGUINDI, S., CWIRLA, S. E., DOWER, W. J., LIPSHUTZ, R. J., SPRANG, S. R., SAMBROOK, J. F., & GETHING, M. J. (1993). Affinity panning of a library of peptides displayed on bacteriophages reveals the binding specificity of BiP. Cell 75, 717–728. 118 FOURIE, A. M., SAMBROOK, J. F., & GETHING, M. J. (1994). Common and
119
120
121
122
123
124
125
126
127
128
divergent peptide binding specificities of hsp70 molecular chaperones. J. Biol. Chem. 269, 30470–30478. KNARR, G., GETHING, M. J., MODROW, S., & BUCHNER, J. (1995). BiP binding sequences in antibodies. J. Biol. Chem. 270, 27589–27594. RUDIGER, S., GERMEROTH, L., SCHNEIDER-MERGENER, J., & BUKAU, B. (1997). Substrate specificity of the DnaK chaperone determined by screening cellulose-bound peptide libraries. EMBO J. 16, 1501–1507. RUDIGER, S., SCHNEIDER-MERGENER, J., & BUKAU, B. (2001). Its substrate specificity characterizes the DnaJ co-chaperone as a scanning factor for the DnaK chaperone. EMBO J. 20, 1042–1050. KORNBERG, A., SCOTT, J. F., & BERTSCH, L. L. (1978). ATP utilization by rep protein in the catalytic separation of DNA strands at a replicating fork. J. Biol. Chem. 253, 3298–3304. LANZETTA, P. A., ALVAREZ, L. J., REINACH, P. S., & CANDIA, O. A. (1979). An improved assay for nanomole amounts of inorganic phosphate. Anal. Biochem. 100, 95–97. NORBY, J. G. (1988). Coupled assay of Na+ ,K+ -ATPase activity. Methods Enzymol. 156, 116–119. DITZEL, L., LOWE, J., STOCK, D., STETTER, K. O., HUBER, H., HUBER, R., & STEINBACHER, S. (1998). Crystal structure of the thermosome, the archaeal chaperonin and homolog of CCT. Cell 93, 125–138. PRODROMOU, C., ROE, S. M., O’BRIEN, R., LADBURY, J. E., PIPER, P. W., & PEARL, L. H. (1997). Identification and structural characterization of the ATP/ADP-binding site in the Hsp90 molecular chaperone. Cell 90, 65–75. MEYER, P., PRODROMOU, C., HU, B., VAUGHAN, C., ROE, S. M., PANARETOU, B., PIPER, P. W., & PEARL, L. H. (2003). Structural and functional analysis of the middle segment of hsp90: implications for ATP hydrolysis and client protein and cochaperone interactions. Mol. Cell 11, 647–658. LEE, S., SOWA, M. E., WATANABE, Y. H., SIGLER, P. B., CHIU, W., YOSHIDA, M., &
19
20 Chaperone Function in Vitro
129
130
131
132
133
134
135
136
137
TSAI, F. T. (2003). The structure of ClpB: a molecular chaperone that rescues proteins from an aggregated state. Cell 115, 229–240. KIM, K. K., KIM, R., & KIM, S. H. (1998). Crystal structure of a small heat-shock protein. Nature 394, 595–599. MELKI, R., ROMMELAERE, H., LEGUY, R., VANDEKERCKHOVE, J., & AMPE, C. (1996). Cofactor A is a molecular chaperone required for beta-tubulin folding: functional and structural characterization. Biochemistry 35, 10422–10435. RUSSELL, R., JORDAN, R., & MCMACKEN, R. (1998). Kinetic characterization of the ATPase cycle of the DnaK molecular chaperone. Biochemistry 37, 596–607. LOPEZ-BUESA, P., PFUND, C., & CRAIG, E. A. (1998). The biochemical properties of the ATPase activity of a 70-kDa heat shock protein (Hsp70) are governed by the C-terminal domains. Proc. Natl. Acad. Sci. U.S.A 95, 15253–15258. BARNETT, M. E., ZOLKIEWSKA, A., & ZOLKIEWSKI, M. (2000). Structure and activity of ClpB from Escherichia coli. Role of the amino-and-carboxyl-terminal domains. J. Biol. Chem. 275, 37565–37571. SURIN, A. K., KOTOVA, N. V., KASHPAROV, I. A., MARCHENKOV, V. V., MARCHENKOVA, S. Y., & SEMISOTNOV, G. V. (1997). Ligands regulate GroEL thermostability. FEBS Lett. 405, 260–262. STROMER, T., FISCHER, E., RICHTER, K., HASLBECK, M., & BUCHNER, J. (2004). Analysis of the regulation of the molecular chaperone Hsp26 by temperature-induced dissociation: the N-terminal domain is important for oligomer assembly and the binding of unfolding proteins. J. Biol. Chem. 279, 11222–11228. HOLL-NEUGEBAUER, B., RUDOLPH, R., SCHMIDT, M., & BUCHNER, J. (1991). Reconstitution of a heat shock effect in vitro: influence of GroE on the thermal aggregation of alpha-glucosidase from yeast. Biochemistry 30, 11609–11614. HAYER-HARTL, M. K., EWBANK, J. J., CREIGHTON, T. E., & HARTL, F. U. (1994).
138
139
140
141
142
143
144
145
Conformational specificity of the chaperonin GroEL for the compact folding intermediates of alpha-lactalbumin. EMBO J. 13, 3192–3202. LINDNER, R. A., KAPUR, A., & CARVER, J. A. (1997). The interaction of the molecular chaperone, alpha-crystallin, with molten globule states of bovine alpha-lactalbumin. J. Biol. Chem. 272, 27722–27729. SCHMIDT, M., BUCHNER, J., TODD, M. J., LORIMER, G. H., & VIITANEN, P. V. (1994). On the role of groES in the chaperonin-assisted folding reaction. Three case studies. J. Biol. Chem. 269, 10304–10311. SCHRODER, H., LANGER, T., HARTL, F. U., & BUKAU, B. (1993). DnaK, DnaJ and GrpE form a cellular chaperone machinery capable of repairing heat-induced protein damage. EMBO J. 12, 4137–4144. BUCHBERGER, A., SCHRODER, H., HESTERKAMP, T., SCHONFELD, H. J., & BUKAU, B. (1996). Substrate shuttling between the DnaK and GroEL systems indicates a chaperone network promoting protein folding. J. Mol. Biol. 261, 328–333. ZOLKIEWSKI, M. (1999). ClpB cooperates with DnaK, DnaJ, & GrpE in suppressing protein aggregation. A novel multi-chaperone system from Escherichia coli. J. Biol. Chem. 274, 28083–28086. DUMITRU, G. L., GROEMPING, Y., KLOSTERMEIER, D., RESTLE, T., DEUERLING, E., & REINSTEIN, J. (2004). DafA Cycles Between the DnaK Chaperone System and Translational Machinery. J. Mol. Biol. 339, 1179–1189. FARAHBAKHSH, Z. T., HUANG, Q. L., DING, L. L., ALTENBACH, C., STEINHOFF, H. J., HORWITZ, J., & HUBBELL, W. L. (1995). Interaction of alpha-crystallin with spin-labeled peptides. Biochemistry 34, 509–516. MILLER, A. D., MAGHLAOUI, K., ALBANESE, G., KLEINJAN, D. A., & SMITH, C. (1993). Escherichia coli chaperonins cpn60 (groEL) and cpn10 (groES) do not catalyse the refolding of mitochondrial
References malate dehydrogenase. Biochem. J. 291, 139–144. 146 SPARRER, H., LILIE, H., & BUCHNER, J. (1996). Dynamics of the GroEL-protein complex: effects of nucleotides and folding mutants. J. Mol. Biol. 258, 74–87. 147 HAYER-HARTL, M. K., WEBER, F., & HARTL, F. U. (1996). Mechanism of chaperonin action: GroES binding and release can drive GroEL-mediated protein folding in the absence of ATP hydrolysis. EMBO J. 15, 6111–6121. 148 VIITANEN, P. V., LUBBEN, T. H., REED, J., GOLOUBINOFF, P., O’KEEFE, D. P., & LORIMER, G. H. (1990). Chaperonin-facilitated refolding of ribulosebis-phosphate carboxylase and ATP hydrolysis by chaperonin 60
(groEL) are K+ dependent. Biochemistry 29, 5665–5671. 149 RYE, H. S., BURSTON, S. G., FENTON, W. A., BEECHEM, J. M., XU, Z., SIGLER, P. B., & HORWICH, A. L. (1997). Distinct actions of cis and trans ATP within the double ring of the chaperonin GroEL. Nature 388, 792–798. 150 SRIRAM, M., OSIPIUK, J., FREEMAN, B., MORIMOTO, R., & JOACHIMIAK, A. (1997). Human Hsp70 molecular chaperone binds two calcium ions within the ATPase domain. Structure 5, 403–414. 151 van MONTFORT, R. L., BASHA, E., FRIEDRICH, K. L., SLINGSBY, C., & VIERLING, E. (2001). Crystal structure and assembly of a eukaryotic small heat shock protein. Nat. Struct. Biol. 8, 1025–1030.
21
1
Peptidyl-prolyl cis/trans Isomerases Gunter Fischer Max Planck Research Unit for Enzymology of Protein Folding, Halle/Saale, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The proper function of proteins within living cells depends on how those macromolecules adopt a unique three-dimensional structure and on how spatial-temporal control underlies the mechanism of protein folding in vivo. The numerous contact points formed and released in the course of protein folding are based on countless rotational movements. Most rotations of covalent bonds have proven to be intrinsically very fast and do not need further rate acceleration by either external factors or intramolecular assistance. Potentially, external influences on rotational rates could be mediated by a wide variety of physical and chemical means such as enzyme catalysis, catalysis by low-molecular-mass compounds, heat, mechanical forces, and supportive microenvironments. Acid-base catalysis, torsional strain, and proximity and field effects are major forces with the potential to act in intramolecular assistance. Submillisecond burst phase signals measured in the kinetics of protein-folding experiments and triplet-triplet energy transfer rates measured for side chain contacts of oligopeptides have been widely interpreted in terms of the fast formation of collapsed folding intermediates and secondary structures formation, both involving countless conformational interconversions [1–3]. Thus, in the evolutionary history, attempts to invent mechanisms of rate acceleration for the formation of collapsed polypeptides by accessory molecules appeared to be futile. However, in contrast to the information provided by its graphical representation, a protein backbone possesses a considerable degree of rigidity that exists independently of the three-dimensional fold. Despite the formal single-bond, fast-rotation nature of its covalent linkages, the repeating unit of the polypeptide backbone suffers from at least one slow bond rotation. It is the high rotational barrier of the carbon-nitrogen linkage of the peptide bond (angle ω) that casts serious doubt on Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Peptidyl-prolyl cis/trans Isomerases
the validity of the structure-based treatment of a polypeptide chain as an array of single-bond units. Considering the influences on the dynamics of biochemical reactions such as protein folding/restructuring and protein-ligand/protein-protein interactions, the electronic structure of the C N bond is best described as a partial double bond. Under this classical view, the peptide bond exhibits characteristic features of E-Z geometrical isomerism: the existence of chemically and physically distinct cis/trans isomers, a high barrier to rotation, and a distinct chemistry of the cis/trans interconversion [4, 5]. By virtue of having two thermodynamically stable chain arrangements of different chemical reactivity separated by an energetic barrier, peptide bond cis/trans isomerization resembles a molecular switch. Switching is manifested in the form of directed mechanical movement of the polypeptide chain. This fact is best illustrated by analysis of the protein crystal structure database for homologous proteins with a different isomeric state of a certain prolyl residue. A subset of structures has been created that has a particular proline residue either in the cis or the trans isomeric state for similar proteins [6]. Structural alignment of matched pairs of isomeric proteins generates three classes with respect to positionspecific distribution of Cα-atom displacements around the isomeric proline imide bond. Besides small bidirectional chain displacements, there is a class of 12 protein pairs in which the structural changes are unidirectional relative to the isomerizing bond. The magnitude of the isomer-specific distance effect exceeds 3.0±2.0 Å at sequence positions remote from the prolyl bond. Interestingly, the magnitude of the intramolecular isomer-specific Cα atom displacements reveals a lever-arm amplification of distance effects because there is only a cis-specific chain contraction of 0.8 Å for the Cα atoms directly involved in the prolyl bond. A type of directed oscillating movement within the backbone in its native conformation, if a fast kinetic step precedes it, offers a structural basis for the explanation of the isomer-specific control of the bioactivity. The findings indicate that cellular folding catalysts may provide a previously unrecognized level of control for the biological activity of the equilibrium-folding pathway of native proteins, which exists independently of the de novo folding pathway. Chemically, peptide bond cis/trans isomerization is reminiscent of other important biological events, such as the retinal-mediated photocycle, carotenoids in antioxidant processes, and phytochrome-based reactions, which, like retinal-based, display isomerization-linked geometrical distortions that are transmitted to amino acid residues of their respective binding protein [7]. Whereas visible light absorption gives rise to the very fast cis/trans isomerization-mediated photocycle of the protein-bound 11-cis-retinal chromophore, which finally results in the formation of a G-protein-interacting state of rhodopsin, the rate of peptide bond cis/trans isomerization is naturally insensitive to irradiation at wavelengths > 220 nm [8]. Thus, light irradiation is also unable to decrease the double-bond character of the bonds determining protein backbone rigidity, with the consequent lack of directed atomic displacements as well as structural “backbone liquification” or facilitating torsional movements by “lubrication.” However, specific peptide bond cis/trans isomerases have evolved to trigger a similar type of structural response and, thereby, to have a specific contribution to lowering of the high rotational barrier of peptide
2 Peptidyl-prolyl cis/trans Isomerization
bonds [9, 10]. Unlike photochemical triggering, where a very slow dark reaction at the carbon-carbon double bond after switching off the light is to occur, enzymecatalyzed peptide cis/trans isomerizations tend to be both fast and reversible under most conditions. It should be noted that the peptide bond cis/trans isomerases identified up to now could not shift a cis/trans equilibrium but could accelerate re-equilibration after transient chemical or physical perturbbations. Except for enzyme concentrations much larger than substrate concentrations, where the Michaelis complexes of the substrate and the product will confer an individual isomer ratio, peptide bond cis/trans isomerases must be considered simple catalysts in bioreactions. Obviously, enzymatic coupling to an energy-consuming process would allow generating a permanent non-equilibrium composition of cis/trans isomers free in solution. Such multifunctional peptide bond cis/trans isomerases have not yet been found but are likely to exist as isolated enzymes or hetero-oligomeric complexes. From the point of view of biological importance, when a polypeptide binds to a native protein, the resulting complex forms a global ensemble that must seek its own minimum free-energy fold. Consequently, the rules underlying the process of de novo protein folding do not change, in principle, for chain rearrangements induced in native proteins during ligand association, enzyme catalysis, and other reactions. Since peptide bond cis/trans isomers are frequently found to exist in a dynamic equilibrium in the context of a native fold, the data collected for the isomerization process in the refolding of denatured proteins are principally applicable for the analysis of native-state cis/trans isomerizations. Most importantly, native-state isomers might exhibit significant differences in their biological activity [6, 11–13]. In an approach to understanding the nature of catalyzed cis/trans isomerizations in cells, two major effects have to be assumed: (1) accelerated re-equilibration of a transiently perturbed cis/trans isomer ratio and (2) enhanced conformational dynamics in the presence of a catalyst in an already equilibrated mixture of cis/trans isomers. The increase in the rate of re-equilibration brought about by the catalyst according to effect (1) requires a fast-equilibrium, perturbing-type reaction prior to the cis/trans isomerization in order for isomer-specific reaction control to occur. In contrast, effect (2) permanently enables a small segment of the polypeptide backbone to escape the steric restrictions imposed by the partial double-bond character of the C N linkage. Concomitantly, the position of the substrate chain oscillates in a directed manner with a frequency determined by kcat values of the reversible enzyme reaction. Because of dynamic coupling within the enzyme-substrate or enzyme-product complexes, specific amino acid residues of the enzyme might oscillate with the same frequency [14].
2 Peptidyl-prolyl cis/trans Isomerization
Proline is the only one of the 20 canonical amino acids that is involved in the formation of an imidic peptide bond at its N-terminal tail because the peptide bond
3
4 Peptidyl-prolyl cis/trans Isomerases
Fig. 1 cis/trans isomerization of a prolyl bond and free-energy profiles of the spontaneous and enzyme-catalyzed reactions.
nitrogen is in an N-alkylated state (Figure 1). Essential features of the peptidyl-prolyl peptide bond and the peptidyl-prolyl bond’s cis/trans isomerization, which we subsequently refer to as prolyl bond and prolyl isomerization, respectively, are dynamics and unique conformational constraints. Among all peptide bonds, prolyl bonds should be afforded special attention regarding conformational polymorphism of proline-containing polypeptides. Regardless of their sequential context, the cis and trans prolyl conformers in unstructured or partially structured polypeptide chains exhibit energetic differences that tend to be small, thus leading to a comparable level of isomers in solution. The slow rate of cis/trans interconversion obtained in refolding studies plays a significant role in determining the uniqueness of prolyl isomerization because just one covalent bond serves to control an enormous body of structural changes from the seconds time range of slow folding steps in small globular proteins [15] to the hours time range for the zipper-like mechanism of
2 Peptidyl-prolyl cis/trans Isomerization
the triple-helix formation of collagen [16, 17] or protein-protein associations [18]. The classical experiment conducted by Brandts et al. led us to realize that a prolyl isomerization is also involved in protein denaturation [19]. The experimental data reveal that the relatively few examples of functional coupling between protein folding and prolyl isomerization are seemingly at odds with the large number of proline residues in proteins. According to the marked differences of the cis/trans isomer ratios between the unfolded and the folded states, every proline residue might cause slow folding events unless intramolecular assistance greatly enhances the isomerization rate. This ambiguity is of more than theoretical interest since the whole body of slowly isomerizing prolyl bonds can be seen as a potential molecular basis of metastable states of proteins and classed as potential targets for enzyme catalysis. Obviously, the existence of non-rate-limiting prolyl isomerizations must take into account the potential impact of intramolecular catalysis by amino acid side chains and steric strain in polypeptides, but only a few examples have been identified [20, 21]. An alternative explanation has been to suggest that the prolyl isomerization is silent in common folding probes. This case is easily conceivable, but nevertheless, it is likely that formation of bioactivity might be affected in many cases [22–24]. In principle, peptides or proteins with n proline residues can form 2n isomers unless structural constraints (such as the three-dimensional fold of proteins) stabilize one isomer strongly relative to the others. While being a highly abundant motif in proteins, a polyproline stretch tends to be a conformationally uniform segment but is largely undefined in its isomerization dynamics [25, 26]. For example, all theoretically predicted cis/trans isomers could be identified by 1 H-NMR spectroscopy for the octapeptide Ile-Phe-Pro-Pro-Val-Pro-Gly-Pro-Gly, which is derived from the prolactin receptor [27]. Counter to the expectations drawn from statistics, a decreasing fractional isomer probability among the 16 isomeric forms does not correlate with a decreasing number of trans peptide bonds in the respective isomer. In many cases among prolyl isomers presented to other proteins for interactions to occur, some isomers might lack productive binding. In fact, proteins have evolved highly efficient means of selecting a particular structure among an ensemble of conformers of ligands. Consequently, isomer-specific enzymes and transporters, such as proteases [28], protein phosphatases [29], proline-directed protein kinases [30], and the peptide transporter Pept1 [31], that favor molecules with trans prolyl bonds at critical positions discriminating the coexisting cis isomer have been found. However, bioactivity is not stringently linked with chains with all-trans peptide bond conformations. In several cases the optimal receptor-ligand interaction requires appropriately positioned cis prolyl bonds [32–35]. Since bioactivity of polypeptides exhibits specificity for the slowly interconverting prolyl isomers, enzymes have evolved (1) to accelerate the interconversion of the isomers and (2) to establish a new level of conformational control. In the years following the discovery of the first peptidyl-prolyl cis/trans isomerase (PPIase) in pig kidney in 1984, three different families of PPIases have been characterized, which now collectively form the enzyme class EC 5.2.1.8 [9, 28, 36]. In
5
6 Peptidyl-prolyl cis/trans Isomerases
Fig. 2 Presently known peptidyl-prolyl cis/trans isomerase in humans classified according to amino acid sequence-based similarities. The individual enzymes have been iden.epsied by the SWISS-Prot/TREMBL accession numbers.
humans, the families of FK506-binding proteins (FKBPs) and cyclophilins have many members, whereas the human genome probably restricts the total number of the parvulins to not more than two members (Figure 2). Sequence similarity of the catalytic domains forms the signature of family membership. Across families, the catalytic domains are unrelated to each other in their amino acid sequences, confer distinct substrate specificities, and prove to be binders for different types of inhibitors. In their three-dimensional structures, parvulins appeared to be closely related to FKBPs in that, by comparison, a FKBP superfold could be extracted [37]. Family members of the ubiquitous enzyme class of PPIases are found in
3 Monitoring Peptidyl-prolyl cis/trans Isomerase Activity 7
abundance in virtually all organisms and subcellular compartments. Within families, the amino acid sequences are highly conserved phylogenetically. Organisms simultaneously express paralogs within the families that encompass one or more PPIase domains complemented with other functional polypeptide segments. Except for targeting motifs and metal-ion-binding polypeptide stretches, the functional interplay between the supplementing domains of the larger PPIases and the catalytic domain(s) is still unclear. Since some prototypic PPIases are proven drug targets either by blockage of the active site or by utilizing the PPIase as a presenter protein of the drug [38], knowledge about functional consequences of the supplementary domains in the multidomain-like PPIases is urgently needed. Recently, the secondary amide peptide bond cis/trans isomerases (abbreviated APIases) were added to the class of peptide bond cis/trans isomerases, indicating that an expanded set of polypeptide substrates is potentially sensitive to biocatalysis of a conformational interconversion in the cell.
3 Monitoring Peptidyl-prolyl cis/trans Isomerase Activity
The chemical and physical principles described for the detection of isomer-specific reactions and native-state isomerizations of prolyl bond-containing oligopeptide chains apply, in general, to all PPIase assays. In all cases, the rate enhancement in the presence of the PPIase quantitatively reflects the enzyme activity. As a monomolecular reaction, the spontaneous cis/trans isomerization conforms to firstorder kinetics with a specific rate constant ko (s−1 ) that is independent of the substrate concentration. With some exceptions, the isomer ratio cannot be extracted from the data of the spectrometric PPIase assays, but NMR-based methods can be utilized. Under favorable conditions, where the concentration of the active PPIase is much smaller than the total substrate concentration and the total substrate concentration is much smaller than the K m value, Eq. (1) provides a quantitative estimate of enzyme catalysis. Such conditions also include first-order kinetics of the PPIasecatalyzed progress curves. kcat /K m =(kob −ko )/E o
(1)
The kcat /K m (L mol−1 s−1 ) value is the specificity constant for the enzymatic catalysis of the interconversion of the cis/trans isomers of the substrate, kob (s−1 ) is the first-order rate constant of the catalyzed interconversion, ko (s−1 ) is the first-order rate constant of the spontaneous cis/trans interconversion, and Eo (mol L−1 ) is the concentration of active enzyme molecules. However, active-site titration methods cannot be employed for PPIases except for cyclophilins. Cyclosporin A proved to be useful for fluorescence-based active-site titration of those cyclophilins with a tryptophan residue corresponding to position 121 of Cyp18 [39]. Detailed
8 Peptidyl-prolyl cis/trans Isomerases
inspection of the catalyzed reaction will reveal the specific meaning of the kcat /K m value: either that the reversible cis/trans interconversion determines this constant or that there is a direct coupling to an irreversible process, with preceding trans to cis or cis to trans interconversions, that plays a role in determining the available enzymatic parameters. The presence of PPIases with high affinity for a substrate (K m [S]) is indicated by deviations of the experimental data from strict first-order kinetics in the initial phase of the progress curve for the catalyzed interconversion. Several assays make use of the transient production of pure cis or trans isomers or at least of an isomer ratio different from that expected based on the steadystate reaction conditions. A comparison between the PPIase assays revealed that the higher the deviation from the equilibrium ratio of isomers, the higher the signal-to-noise ratio of the assay. Rate enhancement measurements usually require monitoring of reaction kinetics over an extended period of time. It is vital to reliable activity measurements that kinetic traces be followed over at least three half-times of the reaction. Use of the assays as endpoint-determination methods has not yet been established. Assaying PPIases with protein substrates has proven to be more difficult. Threedimensional protein structures derived from crystals reveal a defined isomeric state for every prolyl peptide bond being in either cis or trans conformation, with the majority of prolyl bonds in trans form [6, 40]. In some cases the crystallization process was found to be a crucial determinant of the conformational state of prolyl bonds in the sample [6]. The seeming homogeneity results only partially from isomer-specific stabilization effects by intramolecular long-range interactions or isomer-specific crystal lattice sampling. When probed by NMR spectroscopy in solution, many proteinaceous prolyl bonds show conformational polymorphism, where the isomeric proteins interconvert in a reversible reaction. Typically, these native-state isomerizations exhibit a wide range of isomer ratio values. The rate of the spontaneous interconversion of isomers is usually found to be slow on the NMR time-scale. Due to a considerable isomer-specific dispersion of proton signals adjacent to the isomeric bond, dynamic 1 H-NMR methods provide powerful tools for locating isomerizing peptide bonds in polypeptides. Rate constants of interconversion are also available from this probe, but NMR methods are limited to rather high substrate concentrations. Dynamic NMR methods generally do not require prior chemical perturbation of isomer composition by chemical means. Instead, transferred magnetization to isomers is detected. However, one-dimensional 1 HNMR spectroscopy also has been used in combination with classical solvent-jump techniques when subsequent monitoring of the time course of an isomer-specific signal intensity is to occur [41]. 19 F nuclei in labeled amino acids can also serve as a probe, as was shown for a PPIase-catalyzed isomerization reaction in a derivative of the tissue hormone bradykinin [42]. Based on the combination of dynamic 1 H-NMR spectroscopy and conformational polymorphism of proteins, PPIase activity can be easily determined if the number of isomerizing prolyl bonds is limited (preferably to one or two prolyl bonds) and the isomer ratio does not deviate much from unity. Because one-dimensional 1 H-NMR
3 Monitoring Peptidyl-prolyl cis/trans Isomerase Activity 9
experiments reveal slow dynamics in spontaneous prolyl isomerization, line broadening is not significant for the cis and trans signals. However, in the presence of PPIases, signals experience line broadening and coalescence [43] that can be quantified by comparison with simulated line shapes based on quantum-mechanical density matrix formalism [44, 45]. The two-dimensional NOESY experiments are more sensitive to catalytically enhanced interconversion rates than are one-dimensional NMR techniques. Exchange cross-peak intensity at different mixing times gives access to the rate constants of interconversion [46, 47]. Isomer-specific proteolysis has been widely employed to assay PPIase activity with oligopeptide substrates. This robust method can be considered as the standard test for routine determination of enzyme activity for all families of PPIases in their pure state as well as in crude cell extracts and biological fluids [9, 48]. The assay is based on the fact that a linear oligopeptide containing a single prolyl bond usually consists of a mixture of two slowly interconverting conformers, the cis and the trans prolyl isomers, in solution. Depending on the nature of the amino acid flanking the proline residue of substrates in the -Xaa-Pro-Ala- moiety, the percentage of cis conformer is in the range of 6% to 38% in aqueous solutions [41]. Slow re-equilibration after rapid disturbance of the conformational equilibrium is a common feature of most prolyl isomerizations. Proteases such as chymotrypsin, various elastases, trypsin, subtilisin, clostripain, and others liberate 4-nitroaniline from -Pro-Yaa-4-nitroanilides (typically, Yaa Phe, Tyr, Leu, Arg, Lys, Val, Ala) only if a trans prolyl bond exists in the penultimate amino acid position from the scissile linkage. Isomer-specific hydrolysis by a protease can serve as a chemical means of displaycing the isomer distribution in a very efficient manner, resulting in a transient population of about 100% cis isomer. A protease has not yet been detected that discriminates to the isomer level, hydrolyzing substrates with a cis prolyl bond exclusively. To ensure rapid cleavage of the entire population of the trans isomeric substrate, the protease concentration is set at a high value. The kcat value of the proteolytic reaction determines the limiting cleavage rate at very high protease concentrations and defines an upper limit estimate of isomerization rates accessible to this assay. Thus, after consumption of the entire population of the trans isomer using this strong proteolytic pulse, the cis isomer of the peptide is left over. Such a reaction cannot continue to completeness, which is defined by consumption of the total amount of substrate, unless a parallel reaction couples. The coupled reaction is the first-order cis to trans interconversion, which precedes the rapid proteolytic consumption of the trans isomer formed during the conformational interconversion as a rate-limiting step after the initial phase of the reaction. Biphasic kinetics can be monitored under these conditions, where the PPIase-mediated rate acceleration will reside in the slow phase of the reaction (Figure 3). It is clear that only a small fraction of the total absorbance of a protease-coupled reaction can be utilized for assaying PPIases. Due to the distinctive characteristics of the proteasecoupled assay, the part of absorbance released in the dead time of mixing at 390 nm contributes to the unwanted background signal. Substitution of 4-nitroaniline by fluorogenic leaving groups or the use of intramolecularly quenched fluorescent protease substrates allows for fluorescence-based protease-coupled PPIase assays
10 Peptidyl-prolyl cis/trans Isomerases
Fig. 3 First-order progress curve of a typical proteolytic PPlase assay. Zero-time t0 is taken at the time point of injection of the assay peptide. The time period required for sample maxing is depicted on the time axis. By extrapolation, the time zero inter-
cept value At and the absorbance at infinite reaction time Aoo become available, the difference of which gives an estimate of the cis/trans ratio if it is set relative to the amplitude of absorbance of the very rapid kinetic phase visible at zero time.
[49, 50]. The signal-to-noise ratio of this modified assay also suffers the limitations of UV/Vis detection. Deficits of the protease-coupled assay such as proteolytic degradation of PPIases and substrates are partially avoided by carefully selecting the helper protease. However, most PPIases may have evolved to accelerate cis/trans interconversions at or near the cis/trans equilibrium of a cellular substrate [44]. The protease-coupled assay utilizes reaction conditions that necessarily lead to a quasi-irreversible isomerization under non-equilibrium conditions. In contrast, small physical perturbations of a cis/trans equilibrium can serve to realize PPIase assays under more natural conditions. Provided that isomer-specific differences of UV/Vis spectra, NMR signals, fluorescence quenching, or chemical properties (pKa values) exist, the time courses of re-equilibration of prolyl isomers may provide a signal that is determined in rate by the sum of the first-order rate constants for the cis to trans and trans to cis interconversion. Transient deviations from the actual equilibrium conditions have been achieved mainly by pH and solvent jumps [51–53]. Given the experimental fact that signal amplitudes arising from solvent perturbations are usually very small, improvement in signal quality of the assays is expected to be associated with covalent modifications that serve as a means for potentially increasing the cis/trans isomer ratio. By intramolecular disulfide bridging, 2-aminobenzoyl-Cys-Phe-Pro-Ala-Cys-Ala-4-nitroanilide or 2-aminobenzoylCys-Phe-Pro-Ala-Cys-3-nitro-tyrosineamide experienced about 100% cis isomer for the Phe-Pro moiety, which declined after disulfide ring reduction to an open-chain
3 Monitoring Peptidyl-prolyl cis/trans Isomerase Activity 11
cis isomer level of about 10% [54]. In the presence of high concentrations of reductive agent, such as dithiothreitol or tris-(2-carboxyethyl)-phosphine, ring opening is a rapid reaction followed by the slow equilibration of the Phe-Pro moiety. Efficient fluorescence quenching of the 2-aminobenzoyl residue by the 4-nitroanilide moiety requires the cis conformation. Thus, subsequent to the reductive pulse, the time course of the fluorescence increase corresponds to the cis/trans interconversion in the open-chain peptide that is able to reveal the presence of a PPIase [55]. Fortunately, most substrates and enzymes are not susceptible to interference from the reductive agents. Quite early in the development of PPIase assays, it was noted that slow kinetic phases of refolding of chemically denatured RNases are sensitive to PPIase catalysis [56–58]. The reaction progress curves may in turn serve as a tool for assaying PPIase activity. For this to occur, the PPIase must recognize partially folded intermediates that result in limited access to the catalyzable prolyl bond. Both wild-type ribonuclease T1 (abbreviated wtRNaseT1) and its site-directed mutagenized protein variants turned out to be especially suitable substrates for peptide bond cis/trans isomerases. Two prolyl bonds of RNase T1 (Tyr38-Pro39 and Ser54-Pro55) adopt the cis conformation in the native state but display a diminished cis population in the unfolded state. Thus, the trans to cis isomerization is a quasi-irreversible process because the reverse isomerization is disfavored under strongly native refolding conditions. The refolding kinetics can be probed by fluorescence, CD, UV/Vis, and real-time NMR spectroscopy and the regain of enzymatic activity [58–60]. Recently, time-resolved FTIR-difference spectroscopy has also been applied successfully for the spontaneous reaction [61]. A complete kinetic folding model has been established by unfolding-refolding double mixing experiments [62, 63]. If RNase T1 is denatured and reduced and the cysteine side chains are carboxmethylated, it is still able to adopt an active, native-like conformation, but only if high ionic strength applies. Thus, instead of utilizing denaturing agents or extreme thermal conditions, variable concentrations of NaCl allow for shuttling between native and unfolding conditions. In general, the PPIase activity does not reflect significant interference with the shuttling conditions. Using the remarkable features of RNaseT1, this refolding reaction is the established standard assay for PPIase catalysis on proteinaceous substrates in the refolding and unfolding directions [64]. Tendamistat provides an example where PPIase catalysis of native-state prolyl isomerization can be probed by fluorescence spectroscopy using double-jump techniques. In interrupted refolding experiments, a very slow folding reaction, with an amplitude of about 12%, was detected that is still present, although at a reduced level of amplitude, in the native state. It indicates conformational heterogeneity at the proline positions 7 and 9. Catalysis by cyclophilin identifies this very slow reaction as a limiting prolyl isomerization. The very low cis/trans ratio of about 0.02 for these prolyl bonds suggests that the native-state isomerization would probably escape detection by NMR spectroscopy [65]. Accelerated recovery of enzyme activity can be used to identify and quantitate PPIase catalysis on prolyl bond-limited refolding phases that are spectroscopically
12 Peptidyl-prolyl cis/trans Isomerases
silent. Monitoring the time-resolved appearance of native muscle acyl phosphatase exemplifies this approach [22]. 4 Prototypical Peptidyl-prolyl cis/trans Isomerases 4.1 General Considerations
This article addresses some fundamental principles underlying the biochemical functions of PPIases based on the discussion of prototypical enzymes. Prototypical PPIases are monomerically active enzymes ranging in size from 92 to 177 amino acids, which, per definition, do not contain any targeting signal or supplementary domain. Functional and structural studies have been performed mostly for the prototypical enzymes, whereas present knowledge of the enzymatic properties of multidomain PPIases is rather limited. Four different molecular mechanisms can be hypothesized to define the biochemical basis of the physiological effects known to be exerted by the members of the three families of PPIases: (1) the catalysis of prolyl isomerization, (2) a holding function for unfolded polypeptides, (3) a presenter-protein function for physiological ligands, and (4) a proline-directed binding function for other proteins. The underlying reactions might control, either in isolation or collectively, the physiological role of prototypical and domain-complemented enzymes. A general consensus has not yet been reached to which extent the biochemical principles above contribute to the generation of physiological signals by PPIases [36]. In vitro experiments frequently reveal that peptide bond cis/trans isomerases encompassing more domains than just the catalytic module are able to prevent chain aggregation in typical chaperone assays [66–69]. Proline residues are not essential for sequestration of the polypeptide chain. However, polypeptide sequestration by PPIases is due to the presence of an extended stretch of catalytic subsites, also termed secondary binding sites. Classically, many polypeptide-utilizing enzymes form extensive contacts both in the active site and through secondary binding sites. All these interacting sites are closely linked to the activation of the catalytic machinery, but they will not give rise to physiological functions independent of the catalyzed reaction. More specifically, the properties of maintaining and catalyzing unfolded chains are not mutually exclusive. Therefore, although an important determinant of catalysis, the holding function plays an auxiliary role in contributing to the physiological effects of PPIases [36]. Immunosuppressive PPIase inhibitors produce a multitude of effects in transplantation medicine, reflecting the causal relationship between physiological effects and the multifunctional nature of the biochemical platform of PPIases [70–72]. The molecular mechanism underlying immunosuppression mediated by PPIase inhibitors was found to be associated with a gain of function in the form of drug activation by PPIase/inhibitor complex formation. This example unequivocally
4 Prototypical Peptidyl-prolyl cis/trans Isomerases
demonstrates the importance of drug target control by presenter molecules (according to the biochemical principle 3). Consequently, the most prominent effect of the Cyp18 and FKBP inhibitors CsA and FK506, respectively, i.e., the prevention of clonal expansion after antigenic stimulation of T cells, has been discussed exclusively in terms of this presenter-protein strategy. The active complex is targeted to the inhibition of the protein phosphatase calcineurin. Structurally the PPIase/drug complex was thought to present a composite surface, which is next to active-site residues of the PPIase, to other cytosolic binding partners, whereas no independent role could be elucidated for the simultaneously occurring blockage of the catalytic machinery of the PPIase [73]. Care must be taken, however, when interpreting the drug-mediated immunosuppression as a model for physiological PPIase function in humans, as the microbial drugs do not belong to the normal constituents of mammalian cells. The fact of the matter is that an authentic polypeptide ligand utilizing the presenter-protein function of a PPIase has not yet been identified, thus challenging the view of a presenter-protein function of PPIases in normal cell life. Moreover, gain of function appears to be released in a few special cases, such as the Cyp18/cyclosporin A complex. When substituted in the 3 position, cyclosporin derivatives inhibit calcineurin on their own but are inactive in the Cyp18/drug complexes. Of the most important structural parameters, only the ring conformation of the cyclosporins proved to be a decisive feature [74]. Only recently has attention turned to the role of inhibiting the catalytic function of PPIases for the interpretation of drug effects under physiological and pathophysiological conditions [75–78]. Small cell-permeable effectors represent in principle powerful tools for identification of functional links between catalyzed prolyl isomerization and physiological processes in living cells. Most attempts to unravel physiological functions of PPIases in vivo have been done by this pharmacological approach. In particular, the immunosuppressants cyclosporin A (CsA), FK506, and rapamycin, as well as their non-immunosuppressive derivatives, have been proposed as biochemical markers, which may provide results useful for dissecting functions in vivo. These compounds are competitive inhibitors of many cyclophilins or FKBPs present in human tissues, but to a different extent. There are only a few inhibitors available for the enzymes of the parvulin family [53, 79, 80]. A longstanding aspiration of researchers in the field has been the design and synthesis of inhibitors capable of selectively inhibiting closely related PPIase paralogs, but success is still limited at this stage. Among all known enzymes, PPIases exhibit a unique reaction profile with a large number of potentially reactive sites in a polypeptide substrate and a low degree of electronic differences between substrate and product. Regarding a microscopic description, the minimal kinetic mechanism of the enzyme-catalyzed prolyl isomerization of protein and peptide substrates operates on the basis of three chemical transitions. Accordingly, the two Michaelis complexes, the trans and cis isomers in their enzyme-bound state, are able to interconvert to each other with or without forming a non-Michaelis complex-like intermediate. In the simplest case, the catalytic rate enhancement is due to enzyme-mediated stabilization of the partially rotated state of the prolyl bond. However, the limited access to transition-state
13
14 Peptidyl-prolyl cis/trans Isomerases
analogue inhibitors and the minimal activities of catalytic antibodies generated against surrogates of twisted peptide bonds [81, 82] do not support an exclusive role of noncovalent transition-state stabilization for catalysis. Large inverse solvent deuterium isotope effects (J. Fangh¨anel and G. Fischer, in preparation), the pH dependence of kcat /K m values [83, 84], and the results of site-directed mutagenesis [85–87] point to the involvement of distinct catalytic groups in the mechanism of rate enhancement. Therefore, it is likely that the desolvation, the twisted transitionstate stabilization, and the acid-base catalysis are acting in combination rather than as isolated routes to enzymatic catalysis [88]. It appears likely that only physiological substrates correctly recruiting the complete ensemble of secondary binding sites provided by the PPIase are able to make the catalytic machinery fully active [84]. PPIases have been shown to exhibit pronounced subsite recruitment, supporting the presence of many enzyme-substrate-contacting sites in the transition state of catalysis, but for most PPIases knowledge remains limited and fragmentary. In the reported cases, this secondary specificity contributes much to efficient catalysis [9, 53]. With a few exceptions, the parent five-membered proline ring represents a major determinant of catalytic specificity [89]. The active sites of some eukaryotic members of the parvulin family impose considerable constraints on substrate selection. The marked preference for substrates containing the (PO3 H2 )Thr-Pro-/ (PO3 H2 )Ser-Pro-moiety illustrate a unique ability of several parvulins to isomerize polypeptides targeted by the large superfamily of proline-directed protein kinases, including cyclin-dependent and mitogen-activated protein kinases [90–92]. As a common property of PPIases, stereospecificity causes detrimental effects on catalysis when D-amino acid substitution in the P1 and P1 positions is to occur [93]. 4.2 Prototypic Cyclophilins
Prototypic cyclophilins consist of a single catalytic domain with no N-terminal and C-terminal extensions bearing a discrete biochemical function. Their amino acid sequences are highly conserved from yeast to humans, with a smaller degree of conservation across prokaryotic taxa [94]. Currently, this group of cyclophilins comprises six members in human cells, with polypeptide chains of 160–177 amino acids long whose biochemical functions have been at least partially characterized. Many additional prototypic cyclophilins can be derived from the complete human genome, but their protein expression must await characterization. Cytosolic Cyp18 (CypA) is the most abundant constitutively expressed cyclophilin in the cytosol of mammal cells. Unexpectedly, Cyp18 can be extracellularly secreted in response to inflammatory stimuli [95] and is recognized by the chemotaxis cell-surface receptor CD147 only if the Cyp18 active site is functional [96]. Other members are the nuclearly localized Cyp19.2 (CypH, USA-Cyp, Snu-Cyp, Cyp20) [97]; the cancerrelated Cyp18.1a (COAS2-Cyp) [98]; the two splicing variants of the gene PPIL3, Cyp18.6 and Cyp18.1 (CypJ) [99]; and Cyp18.2a (CypZ) [100]. Among the bestcharacterized prototypic cyclophilins, Cyp19.2 is biochemically dissected along with resolved crystal structures [101–103].
4 Prototypical Peptidyl-prolyl cis/trans Isomerases
Cyp18 represents a thermally and chemically rather stable protein that contains a well-defined PPIase active site, a hidden catalytic triad typical of serine proteases [104], and a poorly defined second peptide ligand-binding site [105]. The calcium/magnesium-dependent nuclease activity reported for cyclophilins [106, 107] is not a property of the native Cyp18 structure but might result from a nuclease contamination of Cyp18 preparations [108]. The binding partner of U4/U6-60K and hPrp18, the splicosomal Cyp19.2, contains two well-defined protein-protein interaction sites: the PPIase site and an oppositely located peptide-binding site. Both sites are believed to act in conjunction [102]. The Cyp18 concentration approaches high levels in the brain but is also remarkable in other tissues and cells, as could be exemplified by the measurement of 5–10 µg/mg−1 total protein in kidney tubules and endothelial cells [109–111]. In contrast to FKBP12, Cyp18 is rather promiscuous toward substrates of a series of-Xaa-Pro- with comparable kcat /K m values for tetrapeptides, while the Xaa position varied within the natural amino acids [112, 113]. Specificity constants approach high values of about >107 M−1 s−1 , implying a nearly diffusion-controlled bimolecular reaction, and the energy barrier of Cyp18-catalyzed isomerization is almost entirely determined by the activation entropy. General acid catalysis by Arg55 has been argued to be the major driving force of catalyzed isomerization for Cyp18 [85, 114]. Another critical residue in hydrogen bonding distance to the rotating prolyl bond is His126, but its specific function has not yet been explored. Kinetic secondary deuterium isotope effects for substrates with an H/D exchange at the Cα position of the amino acid preceding proline are found to be similar between the spontaneous and the Cyp18-catalyzed reaction. The occurrence of a tetrahedral intermediate on the catalytic pathway is ruled out by the isotope effect kH /kD > 1 for these substrates [83]. However, a detailed picture of Cyp18 catalysis has not yet begun to emerge. Most strikingly, all genes encoding cyclophilins could be deleted without seriously affecting viability of Saccharomyces cerevisiae [115]. Similarly, disruption of both copies of the mouse gene encoding for Cyp18 indicates the nonessential character of this protein [116]. On the other hand, mutations of the Cpa1 and Cpa2 proteins alone, the Cyp18 homologues in Cryptococcus neoformans, conferred dramatic phenotypes in that cell growth, mating, virulence, and cyclosporin A toxicity were affected. Cpa1/Cpa2 double mutants exhibited severe synthetic defects in growth and virulence [117]. Cyp18 itself has pronounced biochemical and physiological effects on living cells. When Cyp18 is secreted by vascular smooth muscle cell in response to oxidative stress, it mediates typical effects of reactive oxygen species: the activation of extracellular signal-regulated kinase (ERK1/2) and stimulation of cell growth [118]. Whole-cell experiments assaying the phosphorylation status of extracellular signal-regulated protein kinases in the presence of a panel of wild-type Cyp18 and Cyp18 variants served to display possible enzymatically controlled initiation of downstream signaling events. The extracellular part of the multifunctional transmembrane glycoprotein CD147 (basigin) is considered to be the docking site of
15
16 Peptidyl-prolyl cis/trans Isomerases
Cyp18 [95]. Whereas externally added wild-type Cyp18 fully enhances the level of ERK phosphorylation in CHO.CD147 cells, but not in CHO cells lacking induced CD147 expression, Cyp18 variants are effective only according to their level of residual PPIase activity. Site-directed mutagenesis of the CD147 extracellular domain revealed that the Asp179-Pro180 segment was the Cyp18-directed substrate region. Neutrophil chemotaxis, a downstream signaling effect of CD147 stimulation by Cyp18, is also dependent on the intact Asp179-Pro180 recognition segment [96]. Another topic concerns the intracellularly localized fraction of Cyp18 that controls the functional expression of homo-oligomeric α7 neuronal nicotinic and type 3 serotonin receptors in a manner dependent on its PPIase activity in Xenopus laevis oocytes [119]. An antisense deoxynucleotide effectively reduced Cyp18 expression in a rat neuroblastoma cell line and provided the first indication that this enzyme participates in the activation of the caspase cascade in neuronal cells [120]. Functional analysis revealed that overexpression of Cyp18 inhibited prolactininduced Rac activation, while simultaneously prolonging Jak2 phosphorylation. This effect indicates that Cyp18 might be involved in the various prolactin receptormediated signaling pathways [121]. Cyp18 physically interacts with many protein ligands in cells, some of which may involve substantial chain reorganization during functional interaction affecting their bioactivity. If ligand binding is strong enough, as realized by the retinoblastoma gene product RB, it can interfere with the blockage of CsA-mediated NF-AT dephosphorylation if the Cyp18 concentration is limiting [122]. Direct evidence for the functional role of Cyp18 has come from signals associated with oxidative stress. In particular, stimulation of the ligand activity was found to occur when Cyp18 binds to the thiol-specific antioxidant protein Aop1 [123]. The familial amyotrophic lateral sclerosis (FALS)-associated mutant Cu/Zn superoxide dismutase-l (SOD) induces apoptosis of neuronal cells in culture associated with high levels of reactive oxygen species. An increased protein turnover might result from altered cellular free radical balance that requires an increased level of folding-supportive enzymes. Thus, the apoptosis-enhancing effect of cyclosporin A and FK506, as well as non-immunosuppressive analogues of cyclosporin A, in PC12 cells infected with the AdV-SOD Val148Gly- and SOD Ala4Val variants might result from the increased need for PPIase activity in this oxidative stress situation. The importance of enzymatic activity in mutant SOD-mediated apoptosis was supported by experiments showing that overexpressed wild-type Cyp18, but not Cyp18 Arg55Ala variant with a reduced catalytic activity of about 0.1% of the wild-type enzyme, protected cells from death after SOD Val148Gly expression [124]. Interleukin-2 tyrosine kinase catalytic activity is inhibited according to Cyp18 PPIase activity level. NMR structural studies combined with mutational analysis show that a proline-dependent conformational switch within the Itk SH2 domain regulates substrate recognition and mediates regulatory interactions with the active site of the PPIase. Cyp18 and Itk form a stable complex in Jurkat T cells that is disrupted by treatment with cyclosporin A [125].
4 Prototypical Peptidyl-prolyl cis/trans Isomerases
Despite the fact that the Cyp18 gene is regarded as a housekeeping gene, the regulated expression of Cyp18 mRNA in the brain points to stress control of expression [126]. Many Cyp18 homologues are subject to stress regulation [127– 129]. Similarly, proteome analyses detected upregulation of Cyp18 in the higher passages of fetal skin cells and downregulation in the fibroblasts of older adults [130, 131]. A proteome-based analysis of potential mitogens to human bone marrow stromal cells secreted by PC3 cancer cells identified Cyp18 [132]. Both the prototypic COAS-Cyp and Cyp18 belong to the highly overexpressed proteins in human tumors [98, 133, 134]. Secretion of prototypic cyclophilins appears to be involved in host cell defense against parasitic invasions, because a Toxoplasma gondii Cyp18 homologue induces IL-12 production in dendritic cells via chemokine receptor signaling [135]. In other parasites, Cyp18 inactivation causes anti-parasite effects [136]. Cyclophilins and FKBPs have frequently been found to be involved in protein trafficking. Yeast Cyp17 (Cpr1p) was identified as the protein that is required for fructose-1,6-bisphosphatase import into intermediate transport vesicles. Mutants lacking Cpr1p were defective in this import function, but addition of the purified Cyp17 restored import function in the Cpr1p mutants [137]. The three-dimensional structure of human Cyp18 revealed an eight-stranded, antiparallel β-barrel capped by two short helices, with the active-site groove located on one side of the barrel. Many three-dimensional structures of Cyp18 exist either in the free state or in complex with a variety of inhibitors and oligopeptide substrates, and the structure of a ternary complex of Cyp18/CsA/calcineurin has also been resolved [138–140]. Generally, the Cyp18 fold does not change much in response to ligand binding. Complexes of Cyp18 with physiological ligands have been realized for human Cyp18 bound to HIV-1 capsid protein fragments [141, 142]. HIV-1 replication requires capsid protein-mediated host cell Cyp18 packaging [143]. The binding portion of the capsid protein fragments extend over nine residues, at least, with a Gly89-Pro90-Ile91 moiety bound to the primary catalytic site of Cyp18. As already discussed above, such extended ground-state binding sites may have the potential to sequester unfolded polypeptide chains, thus realizing the properties of a holding chaperone. The reaction conditions, however, determine whether Cyp18 is able to prevent aggregation [144] or remains refractory to unfolded chains [145]. In response to their additional domains, larger cyclophilins frequently exhibit chaperone-like function. The mobility of the binding loop of the capsid protein fragments causes native-state isomers to exist in the unbound state. Cyp18 effectively exerts control on the dynamics of native-state prolyl isomerization by increasing the isomerization rate [146, 147]. It can be hypothesized that the Cyp18/capsid protein interaction protects HIV-1 from the host cell restriction factor Ref-1 by a catalyzed distortion of a capsid protein segment representing the Ref-1 docking site [148]. A Cyp18-mediated rate enhancement has been established as a standard probe for the identification of prolyl-bond-limited steps in refolding [149–154]. This enzyme is also a major component of the mixture of folding helper proteins of the oxidative refolding chromatography used, for example, for “empty” MHC class
17
18 Peptidyl-prolyl cis/trans Isomerases
I-like molecules to assemble at physiological temperatures in the absence of ligand [155–157]. Cyp18 constitutes a common target protein for the immunosuppressants cyclosporin A and sanglifehrin, which both inhibit the enzyme by binding to the deep hydrophobic pocket of the PPIase active site [158–161]. Besides cyclosporin A and its derivatives, only a few other Cyp18 inhibitors have been reported, but at a reduced level of inhibitory potency [162–164]. To date the Cyp18/cyclosporin A complex forms a sole example among the Cyp18/inhibitor complexes in that complex formation results in a gain of function that is exclusively targeted to the Ser/Thr protein phosphatase calcineurin [71]. 4.3 Prototypic FK506-binding Proteins
The FK506-binding proteins (FKBPs) are a family of PPIases in which the prototypic enzymes have been found to bind to and become inhibited by the peptidomacrolides FK506 and rapamycin. The group of prototypic FKBPs is much smaller than those of Cyp18 because FKBP12, FKBP12.6, and a third, uncharacterized FKBP12 encoded on human chromosome 6 are the only members. FKBP12 differs from FKBP12 in 16 out of 107 amino acids. The genomes of lower eukaryotes, eu-bacteria, and archaea encode homologues that have, in some cases, a bulge insertion in the flap region of FKBP12, conferring holding chaperone properties to the prototypic enzymes [165]. The property is most easily explained with largely extended catalytic subsites required for optimal recognition of the cellular substrates in these organisms. In the catalytic modules of FKBPs, in view of the sequence similarity among members of the FKBP family, affinity to FK506 is less well conserved [87]. As was already found for cyclophilins, sequence variations at the positions that are crucial for binding the inhibitors and enzyme activity are relatively small. The human members of the FKBP family are depicted in Figure 2. The concentration of FKBP12 in T cells is extraordinarily high for an enzyme (about 20 mM) [28]. Furthermore, there is an increased level of FKBP12 in neurons situated in areas of degeneration [166]. Mimicking the intrinsically high concentrations of authentic FKBP12 in the male reproductive tract, recombinant FKBP12 enhances the curvilinear velocity of immature sperm, suggesting a role for FKBP12 in motility initiation [167]. FKBP12 also possesses chemotactic effects on neutrophils unless the active site is blocked by FK506 [168]. Native FKBP12 contains seven trans-prolyl peptide bonds, and the cis to trans isomerizations of some or all of them constitute the slow, rate-limiting event in folding. Its refolding process from a chemically denatured state constituted the first example of an autocatalytic formation of a native protein from kinetically trapped species with nonnative proline isomers [169, 170]. Knowledge of FKBP12 and FKBP12.6 structures either alone or complexed with ligands exists, but the structure of an FKBP12/oligopeptide substrate complex is
4 Prototypical Peptidyl-prolyl cis/trans Isomerases
still missing [171]. Thus, the FK506-binding site was taken to represent the catalytic site. A docking procedure to FKBP12 was used for a potential substrate: the prolinerich prolactin receptor-derived octapeptide. When bound to FKBP12, the docked peptide reveals a type I β-turn conformation and many hydrophobic contacting sites [172]. The major structural elements surrounding the hydrophobic catalytic cleft are a concave five-stranded antiparallel β-sheet, wrapping around a short alpha-helix, and a large flap region. In most preferred substrates, hydrophobic side chains are positioned toward the N-terminus of the reactive prolyl bond [83]. Unlike Cyp18 and Par10, however, kcat K m values of FKBP12 do not achieve the diffusion-controlled limit. Both PPIase activity and FK506 inhibition proved to be similar for FKBP12.6 and FKBP12 [173]. Consequently, steady-state catalytic data of prototypic FKBPs suggest an unassisted conformational twist mechanism with rate enhancement due in part to desolvation of the peptide bond at the active site [88, 174]. Comparison of the effects of point mutations on kcat K m values for tetrapeptide substrates revealed catalytically important side chains of FKBP12 [86]. Several FKBP12 variants (Tyr82Phe, Trp59Ala, and Asp37Val) experienced marked reduction (about 10-fold) in the specificity constant kcat /K m , with the kcat value being the most severely affected parameter. None of the point mutations confers complete catalytic inactivity to the enzyme variants. In comparison to Cyp18, FKBP12 confers a more diverse gain of function because the FKBP12/FK506 complex inhibits the protein phosphatase calcineurin, but the structurally related FKBP12/rapamycin complex inactivates the human target of rapamycin (mTOR) protein, a cell-cycle-specific protein kinase known to be involved in signaling pathways that promote protein synthesis and cell growth [175]. The downstream events that follow the inactivation of TOR result in the blockage of cell-cycle progression at the juncture of the G1 and S phases. Besides the natural products FK506 and rapamycin and their derivatives, many other inhibitors of several totally unrelated substance classes have been developed, indicating a highly promiscuous binding affinity of the active site of FKBP12 [176–178]. That is, a rather limited number of contacting sites suffice to account for specificity and affinity. The dissociation constants for the FKBP12 complexes of dimethyl sulfoxide and 5-diethylamino-2-pentanone of 20 mM and 500 µM, respectively, highlight the unique ligand-affinity pattern of the FKBP12 catalytic cleft [179]. Most potent compounds are derived from α-ketoamides, from which the potent neuroactive inhibitor GPI 1046 (3-(3-pyridyl)-1-propyl (2S)-1-(3,3-dimethyl-1,2-dioxopentyl)-2-pyrrolidinecarboxylate) was derived [180–183]. Displacement of enzymebound water molecules upon inhibitor binding is a major driving force for the tight binding of α-ketoamides to the active site [184]. Some FKBPs, notably FKBP12, have defined roles in regulating ion channels, protein transport, or cell signaling. Using cells from FKBP12-deficient (FKBP12(−/−) ) mice, cell-cycle arrest in the G(1) phase could be detected. These cells can be rescued by FKBP12 transfection [185]. The transgenic mice do not show abnormal development of skeletal muscle but have severe dilated cardiomyopathy and ventricular septal defects that mimic a human congenital heart disorder [186]. In contrast,
19
20 Peptidyl-prolyl cis/trans Isomerases
disruption of the FKBP12.6 gene in mice results in cardiac hypertrophy in male mice, but not in females. However, female FKBP12.6(−/−) mice treated with an estrogen receptor antagonist develop cardiac hypertrophy similar to that of male mice [187]. Physically interacting proteins of FKBP12 and FKBP12.6 include the intracellular calcium release channels [188, 189], the type II TGF-β receptor [190], the EGF receptor [191], the HMG-box proteins 1 and 2 [192], aspartokinase [193], FAP48, a fragment of the glomuvenous malformation-related glomulin [194, 195], and the transcription factor YY1 [196]. An interacting protein similar to FAP48, the FAP48 homologue FIP37, has been found for the prototypic plant FKBP12 in Arabidopsis thaliana [197]. Two transmembrane receptor protein kinases have been shown to be downregulated in their activating phosphorylation in the presence of FKBP12. FKBP12 can affect TGF-β signaling by preventing Ser/Thr phosphorylation of the type I TGF-β receptor by the type II TGF-β receptor via binding to a Leu-Pro motif of the type I TGF-β receptor [190]. Type I TGF-β receptor affinity for FKBP12 is abolished for a receptor molecule phosphorylated on the GS box. The modulation of the EGF receptor signaling by FKBP12 is directed against the tyrosine autophosphorylation of the cytoplasmic domain of the receptor [191]. FK506 and rapamycin abolish the FKBP12 effect to the same extent in EGF receptor-rich A431 fibroblasts, leading to an increase in cell growth. In both cases, by impeding the activation of receptors in the absence of native ligands, FKBP12 may provide a safeguard against leaky signaling, resulting in a basal level of receptor phosphorylation. In cell signaling, the function of prototypic FKBPs emerges as catalysts affecting the rate of decay of metastable states of receptor proteins (C. Schiene-Fischer, in preparation). The sarco(endo)plasmic reticulum, the main Ca2+ storage compartment, contains two different families of Ca2+ release channels: the ryanodine receptors (RyR) and the inositol 1,4,5-trisphosphate receptors (IP3 R), both of which share affinity binding to prototypic FKBPs. Mammalian tissues express three large, high-conductance ryanodine receptors known as skeletal muscle (RyR1), cardiac muscle (RyR2), and brain (RyR3). They are able to control dynamically the level of intracellular Ca2+ by releasing Ca2+ from the intracellular sarco(endo)plasmic reticulum. The skeletal muscle ryanodine receptor (RyR1) can be isolated as a heterooligomer containing FKBP12 whereas the cardiac ryanodine receptor (RyR2) selectively associates with FKBP12.6. However, RyR2 isolated from many vertebrates contains both FKBP12 and FKBP12.6 [198]. The structural response to mutation suggests that considerable hydrophobicity of tryptophan 59 (which is a phenylalanine in FKBP12.6) contributes to the specificity of binding between FKBP12 isoforms and ryanodine receptors. More specifically, the triple mutant (Gln31Glu/As-n32Asp/Phe59Trp) of FKBP12.6 was found to lack selective binding to RyR2, comparable to that of FKBP12. In complementary studies, mutations of FKBP12 to the three critical amino acids of FKBP12.6 conferred selective binding to RyR2 [199].
4 Prototypical Peptidyl-prolyl cis/trans Isomerases
The 4:1 molar ratio of FKBP12/tetrameric RyR indicates the stoichiometric binding of one FKBP12 molecule to one RyR subunit. The heterocomplex dissociation by titration with FK506 reveals that the interaction involves the active site of FKBP12. The prototypic FKBPs are regulatory constituents of the RyR that can dramatically affect RyR function, e.g., by overexpression [200] or deletion [186, 200]. For example, when FKB12 is bound to RyR1, it inhibits the channel by stabilizing its closed state. Taking a more detailed view, FKBP12 deficiency causes longer mean open times and greater open probability in cardiac myocytes of the ryanodine receptor when compared with reference cells. Generally, the Ca2+ leak rate increases considerably [187]. Marks et al. reported that the FKBP12 Phe36Tyr variant is still able to bind but not to modulate RyR channeling, pointing to a crucial role of the unperturbed enzyme activity of FKBP12 [201]. In a canine model of pacing-induced heart failure, the ratio of FKBP12.6 per failing RyR2 is significantly decreased. A conformational change was suggested for RyR2 chains concomitant with abnormal Ca2+ leak of the channel [202]. Ca2+ stores of the endoplasmic reticulum become superfilled after co-expression of FKBP12.6, but not of FKBP12, in CHOhRyR2 cells. The effects of FKBP12.6 on hRyR2-mediated intracellular Ca2+ handling could be antagonized using the FKBP inhibitor rapamycin [203]. 4.4 Prototypic Parvulins
The parvulins represent the most recently discovered PPIase family [204, 205]. The most typical parvulin known to date is the E. coli Par10, a monomerically active enzyme comprising 92 amino acids in its mature form. Parvulin sequences are characterized by two conserved histidine motifs (three to five amino acids in each segment), which are separated by a stretch of about 70–85 amino acids containing a few other conserved residues. There is no sequence similarity to either cyclo-philins or FKBPs. In various plants, a parvulin member corresponds to the prototypical form (it is 117–119 amino acids in length) because it does not show an N-terminal or C-terminal extension when compared to Par10 [47, 91, 206]. The difference of Par10 in chain length is due to a single polypeptide segment of 23 amino acids inserted in the N-terminal half of the protein that adopt a loop conformation in the α1/β1 region [207]. In two other eukaryotic parvulins, functional domains complement N-terminally the catalytic domain, a DNA-binding domain, and a (PO3 H2 )Ser/Thrdirected type IV WW domain in Par14 and Pin1, respectively. So far, the existence of only these two genes encoding parvulin proteins has been described in the human genome, i.e., they have a low frequency compared with both Cyp and FKBP sequences (Figure 2). Originally, an activity-based screening approach for cytosolic extracts identified Par10, and it was not shown to be a predicted ORF of the E. coli genome. Besides plants, no other organism was reported to possess a prototypic parvulin. However, the concept of ORF identification is confounded by several factors. The procedure may permit small proteins to escape detection due to their size falling below some arbitrary researcher-defined minimum cutoff, or there may
21
22 Peptidyl-prolyl cis/trans Isomerases
be an inability to precisely define a promoter or translational start [208]. Thus, future identification of new prototypic enzymes is predicted. Many larger parvulins have been characterized in prokaryotes. It is of interest that Par10 is as active as Cyp18 toward oligopeptide substrates but that proteins are catalyzed less efficiently. When compared to Cyp18, Par10 was found to have a much narrower substrate specificity at the position preceding proline (measured by kcat /K m ), a finding reminiscent of FKBP12 [204]. However, unlike other PPIase families, the parvulins exhibit an extraordinary pattern of substrate specificity across the family members, ranging from the positively charged arginine side chain to the double-negative phosphoserine (phosphothreonine) side chain for Par14 and plant Par13 or Pin1, respectively. Like FKBP12, Par10 accelerates its own refolding in an autocatalytic fashion, and the rate of refolding increases 10-fold when the concentration of parvulin is increased from 0.5 to 3.0 µM [209]. In larger parvulins, such as E. coli SurA, two parvulin-like domains exist, one of which, by itself, is devoid of PPIase activity [210]. The enzymatically active parvulin-like domain protrudes out from the core domain about 25–30Å in the three-dimensional structure, implicating an isolated action onto substrates [211]. The three-dimensional structure of Par10 is not yet published, but sequence similarity suggests a common structural pattern shared by human Par14, Pin1, the active domain of SurA, and Arabidopsis thaliana Par13. Analyses of the tertiary structures of the parvulins revealed the existence of an FKBP-like superfold that is also found for the E. coli GreA transcript cleavage factor of 17.6 kDa [37]. When measuring 2D-1 H-15 N HSQC NMR spectra of human Par14 in the presence of a substrate, site-directed chemical shift changes in the protein were identified [212]. Hydrophobic amino acids in positions Met90, Val91, and Phe94 have been identified to be substrate-sensitive residues that match those of the active site of Pin1 [213]. However, the basic cluster of two neighboring arginines and a lysine, which collectively determine the -(PO3 H2 )Ser/ThrPro- directed substrate specificity of Pin1 and plant Par13, is absent in Par10 and Par14. The putative biological functions of human Pin1, which has highly conserved homologues in eukaryotes, are best considered in the context of the manifold of known phosphoprotein targets [214]. Cellular processes include cell cycle progression, transcriptional regulation, cell proliferation and differentiation, DNA damage response, p53 functions, Alzheimer’s disease, and other tauopathies. On these Pin1 topics, excellent comprehensive reviews have been compiled recently [24, 214, 215]. Although plant prototypic parvulins lack the type IV WW domain, they are able to rescue the lethal mitotic phenotype of a temperature-sensitive mutation in the Pin1 homologue ESS1/PTF1 gene in Saccharomyces [91, 206]. Plant Par13 proteins are expressed in high abundance, predominantly in rapidly dividing cells. Expression of apple Par13 is tightly associated with cell division during both apple fruit development in vivo and cell cultures in vitro. An oxidative stress phenotype could be observed for E. coli Par10 deletion strains. The results of complementation experiments with several Par10 variants suggest a correlation between residual enzyme activity and the capability to rescue protein
References
most strongly associated with the putative Par10 function (B. Hernandez-Alavarez, unpublished results).
5 Conclusions
Elucidation of the molecular mechanisms active in conformational regulation of bioactivity as applied in catalytic fashion to polypeptide chains of various folding states by peptide bond cis/trans isomerases is an emerging research field. Regarding safety of conformational control, the high-resistance on/off switch of a prolyl bond offers unique advantages over other conformational rearrangements within the polypeptide backbone. Thus, the topic extends from de novo protein folding to allosteric control and protein metastability and includes processes that govern the spatial-temporal control of bioactivity in cell life. Research should make use of highly multidisciplinary approaches involving physicochemical techniques and bioorganic chemistry as well as molecular and cell biology. Fundamental probes characterizing prolyl isomerizations in larger polypeptide chains are complicated, are in part insufficiently developed, and suffer from severe limitations regarding sensitivity and selectivity. Advances in current detection methods on the one-bond level of conformational couplings for proteins in vivo will be of essence if the molecular aspect of regulation is to be explored. Experiments combining gene deletion with variant enzyme gene transfection and the small-molecule approach with monofunctional PPIase inhibitors must be supplemented with new techniques allowing the externally controlled switching of a peptide bond conformation.
References 1 EATON, W. A., MUNOZ, V., HAGEN, S. J., JAS, G. S., LAPIDUS, L. J., HENRY, E. R. & HOFRICHTER, J. (2000). Fast kinetics and mechanisms in protein folding. Annu. Rev. Biophys. Biomol. Struct 29, 327–359. 2 BIERI, O., WIRZ, J., HELLRUNG, B., SCHUTKOWSKI, M., DREWELLO, M. & KIEFHABER, T. (1999). The speed limit for protein folding measured by triplet-triplet energy transfer. Proc. Natl. Acad. Sci. USA 96, 9597–9601. 3 KRIEGER, F., FIERZ, B., BIERI, O., DREWELLO, M. & KIEFHABER, T. (2003). Dynamics of unfolded polypeptide chains as model for the earliest steps in protein folding. J. Mol. Biol. 332, 265–274. 4 DUGAVE, C. & DEMANGE, L. (2003). Cis/trans isomerization of organic
5
6
7
8
molecules and biomolecules: implications and applications. Chem. Rev. 103, 2475–2532. FISCHER, G. (2000). Chemical aspects of peptide bond isomerization. Chem. Soc. Rev. 29, 119–127. REIMER, U. & FISCHER, G. (2002). Local structural changes caused by peptidyl-prolyl cis/trans isomerization in the native state of proteins. Biophys. Chem. 96, 203–212. NEUTZE, R., PEBAY PEYROULA, E., EDMAN, K., ROYANT, A., NAVARRO, J. & LANDAU, E. M. (2002). Bacterio-rhodopsin: a high-resolution structural view of vectorial proton transport. Biochim. Biophys. Acta 1565, 144–167. SONG, S. H., ASHER, S. A., KRIMM, S. & SHAW, K. D. (1991). Ultraviolet
23
24 Peptidyl-prolyl cis/trans Isomerases
9
10
11
12
13
14
15
16
17
18
19
resonance raman studies of trans-peptides and cis-peptides photochemical consequences of the twisted pi-star excited state. J. Am. Chem. Soc. 113, 1155–1163. FISCHER, G., BANG, H. & MECH, C. (1984). Determination of enzymatic catalysis for the cis/trans isomerization of peptide bonds in proline-containing peptides. Biomed. Biochim. Acta 43, 1101–1111. SCHIENE-FISCHER, C., HABAZETTL, J., SCHMID, F. X. & FISCHER, G. (2002). The hsp70 chaperone DnaK is a secondary amide peptide bond cis/trans isomerase. Nat. Struct Bio.l 9, 419–424. ANDREOTTI, A. H. (2003). Native state proline isomerization. Biochemistry 42, 9515–9524. STUKENBERG, P. T., KIRSCHNER, M. W. (2001). Pin1 acts catalytically to promote a conformational change in Cdc25. Mol. Cell 7, 1071–1083. NG, K. K. S. & WEIS, W. I. (1998). Coupling of prolyl peptide bond isomerization and Ca2+ binding in a C-type mannose-binding protein. Biochemistry 37, 17977–17989. EISENMESSER, E. Z., BOSCO, D. A., AKKE, M. & KERN, D. (2002). Enzyme dynamics during catalysis. Science 295, 1520–1523. SCHMID, F. X. (1992). The mechanism of protein folding. Curr. Opin. Struct. Biol. 2, 21–25. B¨ACHINGER, H. P. (1987). The influence of peptidyl-prolyl cis/trans isomerase on the in vitro folding of type III collagen. J. Biol. Chem. 262, 17144–17148. STEINMANN, B., BRUCKNER, P. & Superti FURGA, A. (1991). Cyclo-sporin A slows collagen triple-helix formation in vivo: indirect evidence for a physiologic role of peptidyl-prolyl cis/trans-isomerase. J. Biol. Chem. 266, 1299–1303. THIES, M. J. W., MAYER, J., AUGUSTINE, J. G., FREDERICK, C. A., LILIE, H. & BUCHNER, J. (1999). Folding and association of the antibody domain C(H)3: Prolyl isomerization preceeds dimerization. J. Mol. Biol. 293, 67–79. BRANDTS, J. F., HALVORSON, H. R. & BRENNAN, M. (1975). Consideration of the possibility that the slow step in protein denaturation reactions is due to
20
21
22
23
24
25
26
27
28
cis/trans isomerism of proline residues. Biochemistry 14, 4953–4963. REIMER, U., ELMOKDAD, N., SCHUTKOWSKI, M. & FISCHER, G. (1997). Intramolecular assistance of cis/trans isomerization of the histidine-proline moiety. Biochemistry 36, 13802–13808. TEXTER, F. L., SPENCER, D. B., ROSENSTEIN, R. & MATTHEWS, C. R. (1992) Intramolecular Catalysis of a Proline Isomerization Reaction in the Folding of Dihydrofolate Reductase. Biochemistry 31, 5687–5691. CHITI, F., TADDEI, N., GIANNONI, E., van NULAND, N. A. J., RAMPONI, G. & DOBSON, C. M. (1999). Development of enzymatic activity during protein folding – Detection of a spectroscopically silent native-like intermediate of muscle acylphosphatase. J. Biol. Chem. 274, 20151–20158. YAN, S. Z., BEELER, J. A., CHEN, Y. B., SHELTON, R. K. & TANG, W. J. (2001). The regulation of type 7 adenylyl cyclase by its C1b region and Escherichia coli peptidylprolyl isomerase, SlyD. J. Biol. Chem. 276, 8500–8506. LU, K. P., LIOU, Y. C. & VINCENT, I. (2003). Proline-directed phosphoryla-tion and isomerization in mitotic regulation and in Alzheimer’s Disease. Bioessays 25, 174–181. REIERSEN, H. & REES, A. R. (2001). The hunchback and its neighbours: proline as an environmental modulator. Trends Biochem. Sci. 26, 679–684. CREAMER, T. P. & CAMPBELL, M. N. (2002). Determinants of the polyproline II helix from modeling studies. Adv. Prot. Chem. 62, 263–282. ONEAL, K. D., CHARI, M. V., MCDONALD, C. H., COOK, R. G., YULEE, L. Y., MORRISETT, J. D. & SHEARER, W. T. (1996). Multiple cis/trans conformers of the prolactin receptor proline-rich motif (prm) peptide detected by reverse-phase HPLC, CD and NMR spectroscopy. Biochem. J. 315, 833–844. FISCHER, G. (1994). Peptidyl-prolyl cis/trans isomerases and their effectors. Angew. Chem. Intern. Ed. Engl. 33, 1415–1436.
References 29 ZHOU, X. Z., KOPS, O., WERNER, A., LU, P. J., SHEN, M. H., STOLLER, G., K¨ULLERTZ, G., STARK, M., FISCHER, G. & LU, K. P. (2000). Pin1-dependent prolyl isomerization regulates dephosphorylation of Cdc25C and tau proteins. Mol. Cell 6, 873–883. 30 WEIWAD, M., K¨ULLERTZ, G., SCHUTKOWSKI, M. & FISCHER, G. (2000). Evidence that the substrate backbone conformation is critical to phosphorylation by p42 MAP kinase. FEBS Lett. 478, 39–42. 31 BRANDSCH, M., THUNECKE, F., K¨ULLERTZ, G., SCHUTKOWSKI, M., FISCHER, G. & NEUBERT, K. (1998). Evidence for the absolute conformational specificity of the intestinal H+ /peptide symporter, PEPT1. J. Biol. Chem. 273, 3861–3864. 32 FENG, Y. Q., HOOD, W. F., FORGEY, R. W., ABEGG, A. L., CAPARON, M. H., THIELE, B. R., LEIMGRUBER, R. M. & MCWHERTER, C. A. (1997). Multiple conformations of a human inter-leukin-3 variant. Protein. Sci. 6, 1777–1782. 33 KELLER, M., BOISSARD, C., PATINY, L., CHUNG, N. N., LEMIEUX, C., MUTTER, M. & SCHILLER, P. W. (2001). Pseudoproline-containing analogues of morphiceptin and endomorphin-2: Evidence for a cis Tyr-Pro amide bond in the bioactive conformation. J. Med. Chem. 44, 3896–3903. 34 NIELSEN, K. J., WATSON, M., ADAMS, D. J., HAMMARSTROM, A. K., GAGE, P. W., HILL, J. M., CRAIK, D. J., THOMAS, L., ADAMS, D., ALEWOOD, P. F. & LEWIS, R. J. (2002). Solution structure of mu-conotoxin PIIIA, a preferential inhibitor of persistent tetrodotoxin-sensitive sodium channels. J. Biol. Chem. 277, 27247–27255. 35 BRAUER, A. B. E., DOMINGO, G. J., COOKE, R. M., MATTHEWS, S. J. & LEATHERBARROW, R. J. (2002). A conserved cis peptide bond is necessary for the activity of Bowman-Birk inhibitor protein. Biochemistry 41, 10608–10615. 36 FISCHER, G. & AUMU¨ LLER, T. (2003). Regulation of peptide bond cis/trans isomerization by enzyme catalysis and its implication in physiological
37
38
39
40
41
42
43
44
45
processes. Rev. Physiol Biochem. Pharmacol. 148, 105–150. SEKERINA, E., RAHFELD, J. U., M¨ULLER, J., FANGHA¨ NEL, J., RASCHER, C., FISCHER, G. & BAYER, P. (2000). NMR solution structure of hPar14 reveals similarity to the peptidyl-prolyl cis/trans isomerase domain of the mitotic regulator hPin1 but indicates a different functionality of the protein. J. Mol. Biol. 301, 1003–1017. SCHREIBER, S. L., LIU, J., ALBERS, M. W., ROSEN, M. K., STANDAERT, R. F., WANDLESS, T. J., SOMERS, P. K. (1992). Molecular recognition of immunophilins and immunophilinligand complexes. Tetrahedron 48, 2545–2558. HUSI, H. & ZURINI, M. G. M. (1994). Comparative binding studies of cyclophilins to cyclosporin A and derivatives by fluorescence measurements. Anal. Biochem. 222, 251–255. MACARTHUR, M. W. & THORNTON, J. M. (1991). Influence of proline residues on protein conformation. J. Mol. Biol. 218, 397–412. REIMER, U., SCHERER, G., DREWELLO, M., KRUBER, S., SCHUTKOWSKI, M. & FISCHER, G. (1998). Side-chain effects on peptidyl-prolyl cis/trans isomerization. J. Mol. Biol. 279, 449–460. LONDON, R. E., DAVIS, D. G., VAVREK, R. J., STEWART, J. M. & HANDSCHUMACHER, R. E. (1990). Bradykinin and its Gly6 analogue are substrates of cyclophilin: a fluorine-19 magnetization transfer study. Biochemistry 29, 10298–10302. H¨UBNER, D., DRAKENBERG, T., FORSEN, S. & FISCHER, G. (1991). Peptidyl-prolyl cis/trans isomerase activity as studied by dynamic proton NMR spectroscopy. FEBS Lett. 284, 79–81. KERN, D., KERN, G., SCHERER, G., FISCHER, G. & DRAKENBERG, T. (1995). Kinetic analysis of cyclophilin-catalyzed prolyl cis/trans isomerization by dynamic NMR spectroscopy. Biochemistry 34, 13594–13602. VIDEEN, J. S., STAMNES, M. A., HSU, V. L. & GOODMAN, M. (1994). Thermodynamics of cyclophilin catalyzed peptidyl-prolyl isomerization
25
26 Peptidyl-prolyl cis/trans Isomerases
46
47
48
49
50
51
52
53
54
by NMR spectroscopy. Biopolymers 34, 171–175. BAINE, P. (1986). Comparison of rate constants determined by two-dimensional NMR spectroscopy with rate constants determined by other NMR techniques. Magn. Res. Chem. 24, 304–307. LANDRIEU, I., DEVEYLDER, L., FRUCHART, J. S., ODAERT, B., CASTEELS, P., PORTETELLE, D., VANMONTAGU, M., INZE, D. & LIPPENS, G. (2000). The Arabidopsis thaliana PIN1At gene encodes a single-domain phosphorylationdependent peptidyl-prolyl cis/trans isomerase. J. Biol. Chem. 275, 10577–10581. K¨ULLERTZ, G., LUTHE, S. & FISCHER, G. (1998). Semiautomated microtiter plate assay for monitoring peptidyl-prolyl cis/trans isomerase activity in normal and pathological human sera. Clin. Chem. 44, 502–508. GAREL, J. R. (1980). Evidence for involvement of proline cis/trans isomerization in the slow unfolding reaction of RNase A. Proc. Natl Acad. Sci. USA 77, 795–798. GRACIA-ECHEVERRIA, C., KOFRON, J. L., KUZMIC, P., KISHORE, V. & RICH, D. H. (1992). Continuous fluorimetric direct (uncoupled) assay for peptidyl-prolyl cis/trans-isomerases. J. Am. Chem. Soc. 114, 2758–2759. JANOWSKI, B., WOLLNER, S., SCHUTKOWSKI, M. & FISCHER, G. (1997). A protease-free assay for peptidyl-prolyl cis/trans isomerases using standard peptide substrates. Anal. Biochem. 252, 299–307. GRACIA-ECHEVERRIA, C., KOFRON, J. L., KUZMIC, P. & RICH, D. H. (1993) A continuous spectrophotometric direct assay for peptidyl-prolyl cis/trans-isomerases. Biochem. Biophys. Res. Commun. 191, 70–75. ZHANG, Y. X., F¨USSEL, S., REIMER, U., SCHUTKOWSKI, M. & FISCHER, G. (2002). Substrate-based design of reversible Pin1 inhibitors. Biochemistry 41, 11868–11877. WEISSHOFF, H., FROST, K., BRANDT, W., HENKLEIN, P., MUGGE, C. & FROMMEL, C. (1995). Novel disulfide-constrained
55
56
57
58
59
60
61
62
63
64
pentapeptides as models for beta-VIa turns in proteins. FEBS Lett. 372, 203–209. FISCHER, G., SCHUTKOWSKI, M., & K¨ULLERTZ, G. (2001) Peptidic substrates for directly assaying enzyme activities. Patent DE 100 23743A1. LIN, L. N., HASUMI, H. & BRANDTS, J. F. (1988). Catalysis of proline isomerization during protein-folding reactions. Biochim. Biophys. Acta 956, 256–266. LANG, K., SCHMID, F. X. & FISCHER, G. (1987). Catalysis of protein folding by prolyl isomerase. Nature 329, 268–270. KIEFHABER, T., QUAAS, R., HAHN, U. & SCHMID, F. X. (1990). Folding of ribonuclease T1. 2. Kinetic models for the folding and unfolding reactions. Biochemistry 29, 3061–3070. KIEFHABER, T., QUAAS, R., HAHN, U. & SCHMID, F. X. (1990). Folding of ribonuclease T1. Existence of multiple unfolded states created by proline isomerization. Biochemistry 29, 3053–3061. BALBACH, J., STEEGBORN, C., SCHINDLER, T. & SCHMID, F. X. (1999). A protein folding intermediate of ribonuclease T-1 characterized at high resolution by 1D and 2D realtime NMR spectroscopy. J. Mol. Biol. 285, 829–842. MORITZ, R., REINSTADLER, D., FABIAN, H. & NAUMANN, D. (2002). Time-resolved FTIR difference spectroscopy as tool for investigating refolding reactions of ribonuclease T1 synchronized with trans to cis prolyl isomerization. Biopolymers 67, 145–155. KIEFHABER, T. & SCHMID, F. X. (1992). Kinetic coupling between protein folding and prolyl isomerization. 2. folding of ribonuclease-A and ribonuclease-T1. J. Mol. Biol. 224, 231–240. MAYR, L. M., ODEFEY, C., SCHUTKOWSKI, M. & SCHMID, F. X. (1996). Kinetic analysis of the unfolding and refolding of ribonuclease Tl by a stopped-flow double-mixing technique. Biochemistry 35, 5550–5561. M¨UCKE, M. & SCHMID, F. X. (1992). Enzymatic catalysis of prolyl isomerization in an unfolding protein. Biochemistry 31, 7848–7854.
References 65 PAPPENBERGER, G., BACHMANN, A., MULLER, R., AYGUN, H., ENGELS, J. W. & KIEFHABER, T. (2003). Kinetic mechanism and catalysis of a native-state prolyl isomerization reaction. J. Mol. Biol. 326, 235–246. 66 BOSE, S., WEIKL, T., BUGL, H. & BUCHNER, J. (1996). Chaperone function of hsp90-associated proteins. Science 274, 1715–1717. 67 FREEMAN, B. C., TOFT, D. O. & MORIMOTO, R. I. (1996). Molecular chaperone machines – chaperone activities of the cyclophilin cyp-40 and the steroid aporeceptor-associated protein p23. Science 274, 1718–1720. 68 SCHOLZ, C., STOLLER, G., ZARNT, T., FISCHER, G. & SCHMID, F. X. (1997). Cooperation of enzymatic and chaperone functions of trigger factor in the catalysis of protein folding. EMBO J. 16, 54–58. 69 SCHAFFITZEL, E., R¨UDIGER, S., BUKAU, B. & DEUERLING, E. (2001). Functional dissection of Trigger factor and DnaK: Interactions with nascent polypeptides and thermally denatured proteins. Biol. Chem. 382, 1235–1243. 70 SCHREIER, M. H., BAUMANN, G. & ZENKE, G. (1993). Inhibition of T-cell signaling pathways by immunophilin drug complexes – are side effects inherent to immunosuppressive properties. Transplant. Proc. 25, 502–507. 71 SCHREIBER, S. L., ALBERS, M. W. & BROWN, E. J. (1993). The cell cycle, signal transduction, and immuno-philin ligand complexes. Acc. Chem. Res. 26, 412–420. 72 HAMILTON, G. S. & STEINER, J. P. (1998). Immunophilins: Beyond immunosuppression. J. Med. Chem. 41, 5119–5143. 73 VOGEL, K. W., BRIESEWITZ, R., WANDLESS, T. J. & CRABTREE, G. R. (2001). Calcineurin inhibitors and the generalization of the presenting protein strategy. Adv. Protein Chem. 56, 253–291. 74 BAUMGRASS, R., ZHANG, Y., ERDMANN, F., THIEL, A., RADBRUCH, A., WEIWAD, M. & FISCHER, G. (2004). Substitution in position 3 of cyclosporin A abolishes the cyclophilin-mediated gain-offunction mechanism but not
75
76
77
78
79
80
81
82
immunosuppression. J. Biol. Chem., in press. STEINER, J. P., HAMILTON, G. S., ROSS, D. T., VALENTINE, H. L., GUO, H. Z., CONNOLLY, M. A., LIANG, S., RAMSEY, C., LI, J. H. J., HUANG, W., HOWORTH, P., SONI, R., FULLER, M., SAUER, H., NOWOTNIK, A. C. & SUZDAK, P. D. (1997). Neurotrophic immunophilin ligands stimulate structural and functional recovery in neurodegenera-tive animal models. Proc. Natl. Acad. Sci. USA 94, 2019–2024. BARTZ, S. R., HOHENWALTER, E., HU, M. K., RICH, D. H. & MALKOVSKY, M. (1995). Inhibition of human immunodeficiency virus replication by non-immunosuppressive analogs of cyclosporin A. Proc. Natl. Acad. Sci. USA 92, 5381–5385. CHRISTNER, C., HERDEGEN, T. & FISCHER, G. (2001). FKBP ligands as novel therapeutics for neurological disorders. Mini Rev. Med. Chem. 1, 377–397. WALDMEIER, P. C., FELDTRAUER, J. J., QIAN, T. & LEMASTERS, J. J. (2002). Inhibition of the mitochondrial permeability transition by the nonimmunosuppressive cyclosporin derivative NIM811. Mol. Pharmacol. 62, 22–29. UCHIDA, T., TAKAMIYA, M., TAKAHASHI, M., MIYASHITA, H., IKEDA, H., TERADA, T., MATSUO, Y., SHIROUZU, M., YOKOYAMA, S., FUJIMORI, F. & HUNTER, T. (2003). Pin1 and Par14 peptidyl-prolyl isomerase inhibitors block cell proliferation. Chem. Biol. 10, 15–24. HENNIG, L., CHRISTNER, C., KIPPING, M., SCHELBERT, B., R¨UCKNAGEL, K. P., GRABLEY, S., K¨ULLERTZ, G. & FISCHER, G. (1998). Selective inactivation of parvulin-like peptidyl-prolyl cis/trans isomerases by juglone. Biochemistry 37, 5953–5960. YLI-KAUHALUOMA, J. T., ASHLEY, J. A., LO, C. H. L., COAKLEY, J., WIRSCHING, P. & JANDA, K. D. (1996). Catalytic antibodies with peptidyl-prolyl cis/trans isomerase activity. J. Am. Chem. Soc. 118, 5496–5497. MA, L. F., HSIEH-WILSON, L. C. & SCHULTZ, P. G. (1998). Antibody catalysis
27
28 Peptidyl-prolyl cis/trans Isomerases
83
84
85
86
87
88
89
90
of peptidyl-prolyl cis/trans isomerization in the folding of RNase T1. Proc. Natl. Acad. Sci. USA 95, 7251–7256. STEIN, R. L. (1993). Mechanism of enzymic and non-enzymic prolyl cis/trans isomerization. Adv. Protein Chem. 44, 1–24. SCHUTKOWSKI, M., BERNHARDT, A., ZHOU, X. Z., SHEN, M. H., REIMER, U., RAHFELD, J. U., LU, K. P. & FISCHER, G. (1998). Role of phosphorylation in determining the backbone dynamics of the serine/threonine-proline motif and Pin1 substrate. Biochemistry 37, 5566–5575. ZYDOWSKY, L. D., ETZKORN, F. A., CHANG, H. Y., FERGUSON, S. B., STOLZ, L. A., HO, S. I. & WALSH, C. T. (1992). Active site mutants of human cyclophilin-a separate peptidyl-prolyl isomerase activity from cyclosporin-A binding and calcineurin inhibition. Protein Sci. 1, 1092–1099. DECENZO, M. T., PARK, S. T., JARRETT, B. P., ALDAPE, R. A., FUTER, O., MURCKO, M. A. & LIVINGSTON, D. J. (1996). FK506-binding protein mutational analysis – defining the active-site residue contributions to catalysis and the stability of ligand complexes. Protein Eng. 9, 173–180. TRADLER, T., STOLLER, G., RUCKNAGEL, K. P., SCHIERHORN, A., RAHFELD, J. U. & FISCHER, G. (1997). Comparative mutational analysis of peptidyl-prolyl cis/trans isomerases – active sites of Escherichia coli trigger factor and human FKBP12. FEBS Lett. 407, 184–190. KRAMER, M. L. & FISCHER, G. (1997). FKBP-like catalysis of peptidyl-prolyl bond isomerization by micelles and membranes. Biopolymers 42, 49–60. KERN, D., SCHUTKOWSKI, M. & DRAKENBERG, T. (1997). Rotational barriers of cis/trans isomerization of proline analogues and their catalysis by cyclophilin. J. Am. Chem. Soc. 119, 8403–8408. YAFFE, M. B., SCHUTKOWSKI, M., SHEN, M. H., ZHOU, X. Z., STUKENBERG, P. T., RAHFELD, J. U., XU, J., KUANG, J., KIRSCHNER, M. W., FISCHER, G., CANTLEY, L. C. & LU, K. P. (1997). Sequence-specific and phosphorylation-dependent proline
91
92
93
94
95
96
97
98
99
isomerization – a potential mitotic regulatory mechanism. Science 278, 1957–1960. METZNER, M., STOLLER, G., RUCKNAGEL, K. P., LU, K. P., FISCHER, G., LUCKNER, M. & K¨ULLERTZ, G. (2001). Functional replacement of the essential ESS1 in yeast by the plant parvulin DlPar13. J. Biol. Chem. 276, 13524–13529. LU, K. P., LIOU, Y. C. & ZHOU, X. Z. (2002). Pinning down proline-directed phosphorylation signaling. Trends Cell Biol. 12, 164–172. SCHIENE, C., REIMER, U., SCHUTKOWSKI, M. & FISCHER, G. (1998). Mapping the stereospecificity of peptidyl-prolyl cis/trans isomerases. FEBS Lett. 432, 202–206. GALAT, A. (2003). Peptidylprolyl cis/trans isomerases (immunophilins): biological diversity-targets-functions. Curr. Top. Med. Chem. 3, 1315–1347. BUKRINSKY, M. I. (2002). Cyclophilins: unexpected messengers in intercellular communications. Trends Immunol. 23, 323–325. YURCHENKO, V., ZYBARTH, G., O’CONNOR, M., DAI, W. W., FRANCHIN, G., HAO, T., GUO, H. M., HUNG, H. C., TOOLE, B., GALLAY, P., SHERRY, B. & BUKRINSKY, M. (2002). Active site residues of cyclophilin A are crucial for its signaling activity via CD147. J. Biol. Chem. 277, 22959–22965. HOROWITZ, D. S., KOBAYASHI, R. & KRAINER, A. R. (1997). A new cyclophilin and the human homologues of yeast prp3 and prp4 form a complex associated with U4/U6 snRNPs. RNA 3, 1374–1387. MEZA-ZEPEDA, L. A., FORUS, A., LYGREN, B., DAHLBERG, A. B., GODAGER, L. H., SOUTH, A. P., MARENHOLZ, I., LIOUMI, M., FLORENES, V. A., MAELANDSMO, G. M., SERRA, M., MISCHKE, D., NIZETIC, D., RAGOUSSIS, J., TARKKANEN, M., NESLAND, J. M., KNUUTILA, S. & MYKLEBOST, O. (2002). Positional cloning identifies a novel cyclophilin as a candidate amplified oncogene in 1q21. Oncogene 21, 2261–2269. ZHOU, Z., YING, K., DAI, J., TANG, R., WANG, W., HUANG, Y., ZHAO, W., XIE, Y. & MAO, Y. (2001). Molecular cloning and
References
100
101
102
103
104
105
106
107
characterization of a novel peptidylprolyl isomerase (cyclophilin)-like gene (PPIL3) from human fetal brain. Cytogenet. Cell Genet. 92, 231–236. OZAKI, K., FUJIWARA, T., KAWAI, A., SHIMIZU, F., TAKAMI, S., OKUNO, S., TAKEDA, S., SHIMADA, Y., NAGATA, W., WATANABE, T., TAKAICHI, A., TAKAHASHI, E., NAKAMURA, Y. & SHIN, S. (1996). Cloning, expression and chromosomal mapping of a novel cyclophilin-related gene (PPIL1) from human fetal brain. Cytogenet. Cell Genet. 72, 242–245. HOROWITZ, D. S., LEE, E. J., MABON, S. A. & MISTELI, T. (2002). A cyclophilin functions in pre-mRNA splicing. EMBO J. 21, 470–480. REIDT, U., WAHL, M. C., FASSHAUER, D., HOROWITZ, D. S., LUHRMANN, R. & FICNER, R. (2003). Crystal structure of a complex between human spliceosomal cyclophilin H and a U4/U6 snRNP-60K peptide. J. Mol. Biol. 331, 45–56. REIDT, U., REUTER, K., ACHSEL, T., INGELFINGER, D., LUHRMANN, R. & FICNER, R. (2000). Crystal structure of the human U4/U6 small nuclear ribonucleoprotein particle-specific SnuCyp-20, a nuclear cyclophilin. J. Biol. Chem. 275, 7439–7442. WALLACE, A. C., LASKOWSKI, R. A. & THORNTON, J. M. (1996). Derivation of 3D coordinate templates for searching structural databases: application to Ser-His-Asp catalytic triads in the serine proteinases and lipases. Protein Sci. 5, 1001–1013. DEMANGE, L., MOUTIEZ, M., VAUDRY, K. & DUGAVE, C. (2001). Interaction of human cyclophilin hCyp-18 with short peptides suggests the existence of two functionally independent subsites. FEBS Lett. 505, 191–195. MONTAGUE, J. W., HUGHES, F. M. & CIDLOWSKI, J. A. (1997). Native recombinant cyclophilins A, B, and C degrade DNA independently of peptidylprolyl cis/trans-isomerase activity – potential roles of cyclophilins in apoptosis. J. Biol. Chem. 272, 6677–6684. NICIEZA, R. G., HUERGO, J., CONNOLLY, B. A. & SANCHEZ, J. (1999). Purification,
108
109
110
111
112
113
114
characterization, and role of nucleases and serine proteases in Streptomyces differentiation – Analogies with the biochemical processes described in late steps of eukaryotic apoptosis. J. Biol. Chem. 274, 20366–20375. SCHMIDT, B., TRADLER, T., RAHFELD, J. U., LUDWIG, B., JAIN, B., MANN, K., R¨UCKNAGEL, K. P., JANOWSKI, B., SCHIERHORN, A., K¨ULLERTZ, G., HACKER, J. & FISCHER, G. (1996). A cyclophilin-like peptidyl-prolyl cis/trans isomerase from Legionella pneumophila characterization, molecular cloning and overexpression. Mol. Microbiol. 21, 1147–1160. RYFFEL, B., FOXWELL, B. M., GEE, A., GREINER, B., WOERLY, G. & MIHATSCH, M. J. (1988). Cyclosporine relationship of side effects to mode of action. Transplantation 46, 90S–96S. GOLDNER, F. M. & PATRICK, J. W. (1996). Neuronal localization of the cyclophilin a protein in the adult rat brain. J. Comp. Neurol. 372, 283–293. ARCKENS, L., Van der GUCHT, E., Van DEN BERGH, G., MASSIE, A., LEYSEN, I., VANDENBUSSCHE, E., EYSEL, U. T., HUYBRECHTS, R. & VANDESANDE, F. (2003). Differential display implicates cyclophilin A in adult cortical plasticity. Eur. J. Neurosci. 18, 61–75. BERGSMA, D. J., EDER, C., GROSS, M., KERSTEN, H., SYLVESTER, D., APPELBAUM, E., CUSIMANO, D., LIVI, G. P., MCLAUGHLIN, M. M., KASYAN, K., PORTER, T. G., SILVERMAN, C., DUNNINGTON, D., HAND, A. PRITCHETT, W. P., BOSSARD, M. J. BRANDT, M. & LEVY, M. A. (1991). The cyclophilin multigene family of peptidyl-prolyl isomerases. Characterization of three separate human isoforms J. Biol. Chem. 266, 23204–23214. HARRISON, R. K. & STEIN, R. L. (1992). Mechanistic studies of enzymic and nonenzymic prolyl cis/trans isomerization. J. Am. Chem. Soc. 114, 3464–3471. HOWARD, B. R., VAJDOS, F. F., LI, S., SUNDQUIST, W. I. & HILL, C. P. (2003). Structural insights into the catalytic mechanism of cyclophilin A. Nat. Struct. Biol. 10, 475–481.
29
30 Peptidyl-prolyl cis/trans Isomerases
115 DOLINSKI, K., MUIR, S., CARDENAS, M. & HEITMAN, J. (1997). All cyclophilins and FK506 binding proteins are, individually and collectively, dispensable for viability in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 94, 13093–13098. 116 COLGAN, J., ASMAL, M. & LUBAN, J. (2000). Isolation, characterization and targeted disruption of mouse Ppia: Cyclophilin A is not essential for mammalian cell viability. Genomics 68, 167–178. 117 WANG, P., CARDENAS, M. E., COX, C. M., PERFECT, J. R. & HEITMAN, J. (2001). Two cyclophilin A homologs with shared and distinct functions important for growth and virulence of Cryptococcus neoformans. EMBO Rep. 2, 511–518. 118 JIN, Z. G., MELARAGNO, M. G., LIAO, D. F., YAN, C., HAENDELER, J., SUH, Y. A., LAMBETH, J. D. & BERK, B. C. (2000). Cyclophilin A is a secreted growth factor induced by oxidative stress. Circ. Res. 87, 789–796. 119 HELEKAR, S. A. & PATRICK, J. (1997). Peptidyl-prolyl cis/trans isomerase activity of cyclophilin a in functional homo-oligomeric receptor expression. Proc. Natl. Acad. Sci. USA 94, 5432–5437. 120 CAPANO, M., VIRJI, S. & CROMPTON, M. (2002). Cyclophilin-A is involved in excitotoxin-induced caspase activation in rat neuronal B50 cells. Biochem. J. 363, 29–36. 121 SYED, F., RYCYZYN, M. A., WESTGATE, L. & CLEVENGER, C. V. (2003). A novel and functional interaction between cyclophilin A and prolactin receptor. Endocrine 20, 83–89. 122 CUI, Y., MIRKIA, K., FU, Y. H. F., ZHU, L., YOKOYAMA, K. K. & CHIU, R. (2002). Interaction of the retinoblastoma gene product, RB, with cyclophilin A negatively affects cyclosporin-inhibited NFAT signaling. J. Cell. Biochem. 86, 630–641. 123 JASCHKE, A., MI, H. F. & TROPSCHUG, M. (1998). Human T cell cyclophilin18 binds to thiol-specific antioxidant protein AOP1 and stimulates its activity. J. Mol. Biol. 277, 763–769. 124 LEE, J. P., PALFREY, H. C., BINDOKAS, V. P., GHADGE, G. D., MA, L., MILLER, R. J. & ROOS, R. P. (1999). The role of
125
126
127
128
129
130
131
132
133
immunophilins in mutant superoxide dismutase-1-linked familial amyo-trophic lateral sclerosis. Proc. Natl. Acad. Sci. USA 96, 3251–3256. BRAZIN, K. N., MALLIS, R. J., FULTON, D. B. & ANDREOTTI, A. H. (2002). Regulation of the tyrosine kinase Itk by the peptidyl-prolyl isomerase cyclophilin A. Proc. Natl. Acad. Sci. USA 99, 1899–1904. PORTER, R. H., BURNET, P. W., EASTWOOD, S. L. & HARRISON, P. J. (1996). Contrasting effects of electro-convulsive shock on mRNAs encoding the high affinity kainate receptor subunits (KA1 and KA2) and cyclophilin in the rat. Brain Res. 710, 97–102. SYKES, K., GETHING, M. J. & SAMBROOK, J. (1993). Proline isomerases function during heat shock. Proc. Natl. Acad. Sci. USA 90, 5853–5857. K¨ULLERTZ, G., LIEBAU, A., R¨UCKNAGEL, P., SCHIERHORN, W., DIETTRICH, B., FISCHER, G. & LUCKNER, M. (1999). Stress-induced expression of cyclophilins in proembryonic masses of Digitalis lanata does not protect against freezing thawing stress. Planta 208, 599–605. ANDREEVA, L., HEADS, R. & GREEN, C. J. (1999). Cyclophilins and their possible role in the stress response. Int. J. Exp. Pathol. 80, 305–315. VUADENS, F., CRETTAZ, D., SCELATTA, C., SERVIS, C., QUADRONI, M., BIENVENUT, W. V., SCHNEIDER, P., HOHLFELD, P., APPLEGATE, L. A. & TISSOT, J. D. (2003). Plasticity of protein expression during culture of fetal skin cells. Electrophoresis 24, 1281–1291. BORALDI, F., BINI, L., LIBERATORI, S., ARMINI, A., PALLINI, V., TIOZZO, R., PASQUALI-RONCHETTI, I. & QUAGLINO, D. (2003). Proteome analysis of dermal fibroblasts cultured in vitro from human healthy subjects of different ages. Proteomics 3, 917–929. ANDERSEN, H., JENSEN, O. N. & ERIKSEN, E. F. (2003). A proteome study of secreted prostatic factors affecting osteoblastic activity: identification and characterisation of cyclophilin A. Eur. J. Cancer 39, 989–995. CAMPA, M. J., WANG, M. Z., HOWARD, B., FITZGERALD, M. C. & PATZ, E. F. (2003).
References
134
135
136
137
138
139
140
141
Protein expression profiling identifies macrophage migration inhibitory factor and cyclophilin A as potential molecular targets in non-small cell lung cancer. Cancer Res. 63, 1652–1656. LIM, S. O., PARK, S. J., KIM, W., PARK, S. G., KIM, H. J., KIM, Y. I., SOHN, T. S., NOH, J. H. & JUNG, G. (2002). Proteome analysis of hepatocellular carcinoma. Biochem. Biophys. Res. Commun. 291, 1031–1037. ALBERTI, J., VALENZUELA, J. G., CARRUTHERS, V. B., HIENY, S., ANDERSEN, J., CHAREST, H., SOUSA, C. R. E., FAIRLAMB, A., RIBEIRO, J. M. & SHER, A. (2003). Molecular mimicry of a CCR5 binding-domain in the microbial activation of dendritic cells. Nat. Immunol. 4, 485–490. DUTTA, M., DELHI, P., SINHA, K. M., BANERJEE, R. & DATTA, A. K. (2001). Lack of abundance of cytoplasmic cyclosporin A-binding protein renders free-living Leishmania donovani resistant to cyclosporin A. J. Biol. Chem. 276, 19294–19300. BROWN, C. R., CUI, D. Y., HUNG, G. G. C. & CHIANG, H. L. (2001). Cyclophilin a mediates Vid22p function in the import of fructose-1,6-bisphosphatase into vid vesicles. J. Biol. Chem. 276, 48017–48026. TAYLOR, P., HUSI, H., KONTOPIDIS, G. & WALKINSHAW, M. D. (1997). Structures of cyclophilin-ligand complexe. Prog. Biophys. Mol. Biol. 67, 155–181. HUAI, Q., KIM, H. Y., LIU, Y. D., ZHAO, Y. D., MONDRAGON, A., LIU, J. O. & KE, H. M. (2002). Crystal structure of calcineurin-cyclophilin-cyclosporin shows common but distinct recognition of immunophilin-drug complexes. Proc. Natl. Acad. Sci. USA 99, 12037–12042. JIN, L. & HARRISON, S. C. (2002). Crystal structure of human calcineurin complexed with cyclosporin A and human cyclophilin. Proc. Natl. Acad. Sci. USA 99, 13522–13526. GAMBLE, T. R., VAJDOS, F. F., YOO, S. H., WORTHYLAKE, D. K., HOUSEWEART, M., SUNDQUIST, W. I. & HILL, C. P. (1996). Crystal structure of human cyclophilin A bound to the aminoterminal domain of HIV-1 capsid. Cell 87, 1285–1294.
142 ZHAO, Y. D., CHEN, Y. Q., SCHUTKOWSKI, M., FISCHER, G. & KE, H. M. (1997). Cyclophilin A complexed with a fragment of HIV-1 gag protein insights into HIV-1 infectious activity. Structure 5, 139–146. 143 FRANKE, E. K., YUAN, H. E. H. & LUBAN, J. (1994). Specific incorporation of cyclophilin A into HIV-1 virions. Nature 371, 359–362. 144 OU, W. B., LUO, W., PARK, Y. D. & ZHOU, H. M. (2001). Chaperone-like activity of peptidyl-prolyl cis/trans isomerase during creatine kinase refolding. Protein Sci. 10, 2346–2353. 145 KERN, G., KERN, D., SCHMID, F. X. & FISCHER, G. (1995). A kinetic analysis of the folding of human carbonic anhydrase II and its catalysis by cyclophilin. J. Biol. Chem. 270, 740–745. 146 BOSCO, D. A., EISENMESSER, E. Z., POCHAPSKY, S., SUNDQUIST, W. I. & KERN, D. (2002). Catalysis of cis/trans isomerization in native HIV-1 capsid by human cyclophilin A. Proc. Natl. Acad. Sci. USA 99, 5247–5252. 147 REIMER, U., DREWELLO, M., JAKOB, M., FISCHER, G. & SCHUTKOWSKI, M. (1997). Conformational state of a 25-mer peptide from the cyclophilin-binding loop of the HIV type 1 capsid protein. Biochem. J. 326, 181–185. 148 TOWERS, G. J., HATZIIOANNOU, T., COWAN, S., GOFF, S. P., LUBAN, J. & BIENIASZ, P. D. (2003). Cyclophilin A modulates the sensitivity of HIV-1 to host restriction factors. Nat. Med. 9, 1138–1143. 149 SCHMID, F. X. (2002). Prolyl isomerases. Adv. Prot. Chem. 59, 243–282. 150 CHITI, F., MANGIONE, P., ANDREOLA, A., GIORGETTI, S., STEFANI, M., DOBSON, C. M., BELLOTTL, V. & TADDEI, N. (2001). Detection of two partially structured species in the folding process of the amyloidogenic protein beta 2-microglobulin. J. Mol. Biol. 307, 379–391. 151 GOLBIK, R., FISCHER, G. & FERSHT, A. R. (1999). Folding of barstar C40A/C82A/P27A and catalysis of the peptidyl-prolyl cis/trans isomerization by
31
32 Peptidyl-prolyl cis/trans Isomerases
152
153
154
155
156
157
158
159
human cytosolic cyclophilin (Cyp18). Protein Sci. 8, 1505–1514. VEERARAGHAVAN, S., NALL, B. T. & FINK, A. L. (1997). Effect of prolyl isomerase on the folding reactions of staphylococcal nuclease. Biochemistry 36, 15134–15139. READER, J. S., Van NULAND, N. A. J., THOMPSON, G. S., FERGUSON, S. J., DOBSON, C. M. & RADFORD, S. E. (2001). A partially folded intermediate species of the beta-sheet protein apo-pseudoazurin is trapped during proline-limited folding. Protein Sci. 10, 1216–1224. MARTIN, A. & SCHMID, F. X. (2003). A proline switch controls folding and domain interactions in the gene-3-protein of the filamentous phage fd. J. Mol. Biol. 331, 1131–1140. ALTAMIRANO, M. M., GARCIA, C., POSSANI, L. D. & FERSHT, A. R. (1999). Oxidative refolding chromatography: folding of the scorpion toxin Cn5. Nature Biotechnol. 17, 187–191. ALTAMIRANO, M. M., WOOLFSON, A., DONDA, A., SHAMSHIEV, A., BRISENO-ROA, L., FOSTER, N. W., VEPRINTSEV, D. B., De LIBERO, G., FERSHT, A. R. & MILSTEIN, C. (2001). Ligand-independent assembly of recombinant human CD1 by using oxidative refolding chromatography. Proc. Natl. Acad. Sci. USA 98, 3288–3293. KARADIMITRIS, A., GADOLA, S., ALTAMIRANO, M., BROWN, D., WOOLFSON, A., KLENERMAN, P., CHEN, J. L., KOEZUKA, Y., ROBERTS, I. A. G., PRICE, D. A., DUSHEIKO, G., MILSTEIN, C., FERSHT, A., LUZZATTO, L. & CERUNDOLO, V. (2001). Human CD1d-glycolipid tetramers generated by in vitro oxidative refolding chromatography. Proc. Natl. Acad. Sci. USA 98, 3294–3298. SCHUTKOWSKI, M., W¨OLLNER, S. & FISCHER, G. (1995). Inhibition of peptidyl-prolyl cis/trans isomerase activity by substrate analog structures: Thioxo tetrapeptide-4-nitroanilides. Biochemistry 34, 13016–13026. SANGLIER, J. J., QUESNIAUX, V., FEHR, T., HOFMANN, H., MAHNKE, M., MEMMERT, K., SCHULER, W., ZENKE, G., GSCHWIND, L., MAURER, C. & SCHILLING, W. (1999). Sanglifehrins A, B, C and D, novel cyclophilin-binding compounds isolated
160
161
162
163
164
165
166
167
from Streptomyces sp A92-308110-I. Taxonomy, fermentation, isolation and biological activity. J. Antibiot. 52, 466–473. SEDRANI, R., KALLEN, J., CABREJAS, L. M. M., PAPAGEORGIOU, C. D., SENIA, F., ROHRBACH, S., WAGNER, D., THAI, B., EME, A. M. J., FRANCE, J., OBERER, L., RIHS, G., ZENKE, G. & WAGNER, J. (2003). Sanglifehrin-cyclophilin interaction: Degradation work, synthetic macrocyclic analogues, X-ray crystal structure, and binding data. J. Am. Chem. Soc. 125, 3849–3859. ZHANG, L. H. & LIU, J. O. (2001). Sanglifehrin A, a novel cyclophilin-binding immunosuppressant, inhibits IL-2-dependent T cell proliferation at the G(1) phase of the cell cycle. J. Immunol. 166, 5611–5618. WU, Y. Q., BELYAKOV, S., CHOI, C., LIMBURG, D., THOMAS, B. E., VAAL, M., WEI, L., WILKINSON, D. E., HOLMES, A., FULLER, M., MCCORMICK, J., CONNOLLY, M., MOELLER, T., STEINER, J. & HAMILTON, G. S. (2003). Synthesis and biological evaluation of non-peptidic cyclophilin ligands. J. Med. Chem. 46, 1112–1115. DEMANGE, L., MOUTIEZ, M. & DUGAVE, C. (2002). Synthesis and evaluation of Gly psi(PO2R-N)pro-containing pseudopeptides as novel inhibitors of the human cyclophilin hCyp-18. J. Med. Chem. 45, 3928–3933. WANG, H. C., KIM, K., BAKHTIAR, R. & GERMANAS, J. P. (2001). Structure-activity studies of ground- and transition-state analogue inhibitors of cyclophilin. J. Med. Chem. 44, 2593–2600. MARUYAMA, T. & FURUTANI, M. (2000). Archaeal peptidyl-prolyl cis/trans isomerases (PPIases). Front. Biosci. 5, D821–D836. AVRAMUT, M. & ACHIM, C. L. (2002). Immunophilins and their ligands: insights into survival and growth of human neurons. Physiol. Behav. 77, 463–468. WALENSKY, L. D., DAWSON, T. M., STEINER, J. P., SABATINI, D. M., SUAREZ, J. D., KLINEFELTER, G. R. & SNYDER, S. H. (1998). The 12 kD FK506 binding protein FKBP12 is released in the male
References
168
169
170
171
172
173
174
175
176
reproductive tract and stimulates sperm motility. Mol. Med. 4, 502–514. LEIVA, M. C. & LYTTLE, C. R. (1992). Leukocyte chemotactic activity of FKBP and inhibition by FK506. Biochem. Biophys. Res. Commun. 186, 1178–1183. VEERARAGHAVAN, S., HOLZMAN, T. F. & NALL, B. T. (1996). Autocatalyzed protein folding. Biochemistry 35, 10601–10607. SCHOLZ, C., ZARNT, T., KERN, G., LANG, K., BURTSCHER, H., FISCHER, G. & SCHMID, F. X. (1996). Autocatalytic folding of the folding catalyst FKBP12. J. Biol. Chem. 271, 12703–12707. DEIVANAYAGAM, C. C. S., CARSON, M., THOTAKURA, A., NARAYANA, S. V. L. & CHODAVARAPU, R. S. (2000). Structure of FKBP12.6 in complex with rapamycin. Acta Crystallogr. D Biol. Crystallogr. 56, 266–271. SOMAN, K. V., HANKS, B. A., TIEN, H., CHARI, M. V., ONEAL, K. D. & MORRISETT, J. D. (1997). Template-based docking of a prolactin receptor proline-rich motif octapeptide to FKBP12 – Implications for cytokine receptor signaling. Protein Sci. 6, 999–1008. LAM, E., MARTIN, M. M., TIMERMAN, A. P., SABERS, C., FLEISCHER, S., LUKAS, T., ABRAHAM, R. T., OKEEFE, S. J., ONEILL, E. A. & WIEDERRECHT, G. J. (1995). A novel FK506 binding protein can mediate the immuno-suppressive effects of FK506 and is associated with the cardiac ryanodine receptor. J. Biol. Chem. 270, 26511–26522. PARK, S. T., ALDAPE, R. A., FUTER, O., DECENZO, M. T. & LIVINGSTON, D. J. (1992). PPIase catalysis by human FK506-binding protein proceeds through a conformational twist mechanism. J. Biol. Chem. 267, 3316–3324. KIRKEN, R. A. & WANG, Y. L. (2003). Molecular actions of sirolimus: Sirolimus and mTor. Transplant. Proc. 35, 227S–230S. YANG, W., ROZAMUS, L. W., NARULA, S., ROLLINS, C. T., YUAN, R., ANDRADE, L. J., RAM, M. K., PHILLIPS, T. B., van SCHRAVENDIJK, M. R., DALGARNO, D., CLACKSON, T. & HOLT, D. A. (2000). Investigating protein-ligand interactions with a mutant FKBP possessing a
177
178
179
180
181
182
183
designed specificity pocket. J. Med. Chem. 43, 1135–1142. CHRISTNER, C., WYRWA, R., MARSCH, S., K¨ULLERTZ, G., THIERICKE, R., GRABLEY, S., SCHUMANN, D. & FISCHER, G. (1999). Synthesis and cytotoxic evaluation of cycloheximide derivatives as potential inhibitors of FKBP12 with neuroregenerative properties. J. Med. Chem. 42, 3615–3622. DRAGOVICH, P. S., BARKER, J. E., FRENCH, J., IMBACUAN, M., KALISH, V. J., KISSINGER, C. R., KNIGHTON, D. R., LEWIS, C. T., MOOMAW, E. W., PARGE, H. E. & PELLETIER, L. A. (1996). Structure-based design of novel, urea-containing FKBP12 inhibitors. J. Med. Chem. 39, 1872–1884. BURKHARD, P., TAYLOR, P. & WALKINSHAW, M. D. (2000). X-ray structures of small ligand-FKBP complexes provide an estimate for hydrophobic interaction energies. J. Mol. Biol. 295, 953–962. WEI, L., WU, Y. Q., WILKINSON, D. E., CHEN, Y., SONI, R., SCOTT, C., ROSS, D. T., GUO, H., HOWORTH, P., VALENTINE, H., LIANG, S., SPICER, D., FULLER, M., STEINER, J. & HAMILTON, G. S. (2002). Solid-phase synthesis of FKBP12 inhibitors: N-sulfonyl and N-carbamoylprolyl/pipecolyl amides. Bioorg. Med. Chem. Lett. 12, 1429–1433. HAMILTON, G. S., HUANG, W., CONNOLLY, M. A., ROSS, D. T., GUO, H., VALENTINE, H. L., SUZDAK, P. D. & STEINER, J. P. (1997). FKBP12-binding domain analogues of FK506 are potent, nonimmunosuppressive neurotrophic agents in vitro and promote recovery in a mouse model of Parkinsons-disease. Bioorg. Med. Chem. Lett. 7, 1785–1790. YAMASHITA, D. S., OH, H. J., YEN, H. K., BOSSARD, M. J., BRANDT, M., LEVY, M. A., NEWMANTARR, T., BADGER, A., LUENGO, J. I. & HOLT, D. A. (1994). Design, synthesis and evaluation of dual domain FKBP ligands. Bioorg. Med. Chem. Lett. 4, 325–328. HOLT, D. A., LUENGO, J. I., YAMASHITA, D. S., OH, H. J., KONIALIAN, A. L., YEN, H. K., ROZAMUS, L. W., BRANDT, M., BOSSARD, M. J., LEVY, M. A., EGGLESTON, D. S., LIANG, J., SCHULTZ, L. W., STOUT,
33
34 Peptidyl-prolyl cis/trans Isomerases
184
185
186
187
188
189
190
191
T. J. & CLARDY, J. (1993). Design, synthesis, and kinetic evaluation of high-affinity FKBP ligands and the X-ray crystal structures of their complexes with FKBP12. J. Am. Chem. Soc. 115, 9925–9938. CONNELLY, P. R., ALDAPE, R. A., BRUZZESE, F. J., CHAMBERS, S. P., FITZGIBBON, M. J., FLEMING, M. A., ITOH, S., LIVINGSTON, D. J., NAVIA, M. A., THOMSON, J. A. & WILSON, K. P. (1994). Enthalpy of hydrogen bond formation in a protein-ligand binding reaction. Proc. Natl. Acad. Sci. USA 91, 1964–1968. AGHDASI, B., YE, K. Q., RESNICK, A., HUANG, A., HA, H. C., GUO, X., DAWSON, T. M., DAWSON, V. L. & SNYDER, S. H. (2001). FKBP12, the 12-kDa FK506-binding protein, is a physiologic regulator of the cell cycle. Proc. Natl. Acad. Sci. USA 98, 2425–2430. SHOU, W. N., AGHDASI, B., ARMSTRONG, D. L., GUO, Q. X., BAO, S. D., CHARNG, M. J., MATHEWS, L. M., SCHNEIDER, M. D., HAMILTON, S. L. & MATZUK, M. M. (1998). Cardiac defects and altered ryanodine receptor function in mice lacking FKBP12. Nature 391, 489–492. XIN, H. B., SENBONMATSU, T., CHENG, D. S., WANG, Y. X., COPELLO, J. A., JI, G. J., COLLIER, M. L., DENG, K. Y., JEYAKUMAR, L. H., MAGNUSON, M. A., INAGAMI, T., KOTLIKOFF, M. I. & FLEISCHER, S. (2002). Oestrogen protects FKBP12.6 null mice from cardiac hypertrophy. Nature 416, 334–337. DARGAN, S. L., LEA, E. J. A. & DAWSON, A. P. (2002). Modulation of type-1 Ins(1,4,5)P-3 receptor channels by the FK506-binding protein, FKBP12. Biochem. J. 361, 401–407. MARKS, A. R. (2002). Ryanodine receptors, FKBP12, and heart failure. Front. Biosci. 7, D970–D977. CHEN, Y. G., LIU, F. & MASSAGUE, J. (1997). Mechanism of TBF-beta receptor inhibition by FKBP12. EMBO J. 16, 3866–3876. LOPEZILASACA, M., SCHIENE, C., K¨ULLERTZ, G., TRADLER, T., FISCHER, G. & WETZKER, R. (1998). Effects of FK506-binding protein 12 and FK506 on autophosphorylation of epidermal
192
193
194
195
196
197
198
199
growth factor receptor. J. Biol. Chem. 273, 9430–9434. DOLINSKI, K. J. & HEITMAN, J. (1999). Hmo1p, a high mobility group 1/2 homolog, genetically and physically interacts with the yeast FKBP12 prolyl isomerase. Genetics 151, 935–944. ALARCON, C. M. & HEITMAN, J. (1997). FKBP12 physically and functionally interacts with aspartokinase in Saccharomyces cerevisiae. Mol. Cell. Biol. 17, 5968–5975. CHAMBRAUD, B., RADANYI, C., CAMONIS, J. H., SHAZAND, K., RAJKOWSKI, K. & BAULIEU, E. E. (1996). FAP48, a new protein that forms specific complexes both immunophilins FKBP59 and FKBP12 – prevention by the immunosuppres-sant drugs FK506 and rapamycin. J. Biol. Chem. 271, 32923–32929. BROUILLARD, P., BOON, L. M., MULLIKEN, J. B., ENJOLRAS, O., GHASSIBE, M., WARMAN, M. L., TAN, O. T., OLSEN, B. R. & VIKKULA, M. (2002). Mutations in a novel factor, glomulin, are responsible for glomuvenous malformations (“glomangiomas”). Am. J. Hum. Genet. 70, 866–874. YANG, W. M., INOUYE, C. J. & SETO, E. (1995). Cyclophilin A and FKBP12 interact with YY1 and alter its transcriptional activity. J. Biol. Chem. 270, 15187–15193. FAURE, J. D., GINGERICH, D. & HOWELL, S. H. (1998). An Arabidopsis immunophilin, atFKBP12, binds to atFIP37 (FKBP interacting protein) in an interaction that is disrupted by FK506. Plant J. 15, 783–789. JEYAKUMAR, L. H., BALLESTER, L., CHENG, D. S., MCINTYRE, J. O., CHANG, P., OLIVEY, H. E., ROLLINS-SMITH, L., BARNETT, J. V., MURRAY, K., XIN, H. B. & FLEISCHER, S. (2001). FKBP binding characteristics of cardiac microsomes from diverse vertebrates. Biochem. Biophys. Res. Commun. 281, 979–986. XIN, H. B., ROGERS, K., QI, Y., KANEMATSU, T. & FLEISCHER, S. (1999). Three amino acid residues determine selective binding of FK506-binding protein 12.6 to the cardiac ryanodine
References
200
201
202
203
204
205
206
receptor. J. Biol. Chem. 274, 15315–15319. PRESTLE, J., JANSSEN, P. M. L., JANSSEN, A. P., ZEITZ, O., LEHNART, S. E., BRUCE, L., SMITH, G. L. & HASENFUSS, G. (2001). Overexpression of FK506-Binding protein FKBP12.6 in cardiomyocytes reduces ryanodine receptor-mediated Ca2+ leak from the sarcoplasmic reticulum and increases contractility. Circ. Res. 88, 188–194. GABURJAKOVA, M., GABURJAKOVA, J., REIKEN, S., HUANG, F., MARX, S. O., ROSEMBLIT, N. & MARKS, A. R. (2001). FKBP12 binding modulates ryanodine receptor channel gating. J. Biol. Chem. 276, 16931–16935. YANO, M., ONO, K., OHKUSA, T., SUETSUGU, M., KOHNO, M., HISAOKA, T., KOBAYASHI, S., HISAMATSU, Y., YAMAMOTO, T., NOGUCHI, N., TAKASAWA, S., OKAMOTO, H. & MATSUZAKI, M. (2000). Altered stoichiometry of FKBP12.6 versus ryanodine receptor as a cause of abnormal Ca2+ leak through ryanodine receptor in heart failure. Circulation 102, 2131–2136. GEORGE, C. H., SORATHIA, R., BERTRAND, B. M. A. & LAI, F. A. (2003). In situ modulation of the human cardiac ryanodine receptor (hRyR2) by FKBP12.6. Biochem. J. 370, 579–589. RAHFELD, J. U., SCHIERHORN, A., MANN, K. & FISCHER, G. (1994). A novel peptidyl-prolyl cis/trans isomerase from Escherichia coli. FEBS Lett. 343, 65–69. RAHFELD, J. U., R¨UCKNAGEL, K. P., SCHELBERT, B., LUDWIG, B., HACKER, J., MANN, K. & FISCHER, G. (1994). Confirmation of the existence of a third family among peptidyl-prolyl cis/trans isomerases – Amino acid sequence and recombinant production of parvulin. FEBS Lett. 352, 180–184. YAO, J. L., KOPS, O., LU, P. J. & LU, K. P. (2001). Functional conservation of phosphorylation-specific prolyl isomerases in plants. J. Biol. Chem. 276, 13517–13523.
207 LANDRIEU, I., WIERUSZESKI, J. M., WINTJENS, R., INZE, D. & LIPPENS, G. (2002). Solution structure of the single-domain prolyl cis/trans isomerase PIN1At from Arabidopsis thaliana. J. Mol. Biol. 320, 321–332. 208 RAY, W. C., MUNSON, R. S, & DANIELS, C. J. (2001). Tricross: using dot-plots in sequence-id space to detect uncataloged intergenic features. Bioinformatics 17, 1105–1112. 209 SCHOLZ, C., RAHFELD, J., FISCHER, G. & SCHMID, F. X. (1997). Catalysis of protein folding by parvulin. J. Mol. Biol. 273, 752–762. 210 ROUVIERE, P. E. & GROSS, C. A. (1996). SurA, a periplasmic protein with peptidyl-prolyl isomerase activity, participates in the assembly of outer membrane porins. Genes Dev. 10, 3170–3182. 211 BITTO, E. & MCKAY, D. B. (2002). Crystallographic structure of SurA, a molecular chaperone that facilitates folding of outer membrane porins. Structure 10, 1489–1498. 212 TERADA, T., SHIROUZU, M., FUKUMORI, Y., FUJIMORI, F., ITO, Y., KIGAWA, T., YOKOYAMA, S. & UCHIDA, T. (2001). Solution structure of the human parvulin-like peptidyl-prolyl cis/trans isomerase, hPar14. J. Mol. Biol. 305, 917–926. 213 RANGANATHAN, R., LU, K. P., HUNTER, T. & NOEL, J. P. (1997). Structural and functional analysis of the mitotic rotamase Pin1 suggests substrate recognition is phosphorylation dependent. Cell 89, 875–886. 214 RYO, A., LIOU, Y. C., LU, K. P. & WULF, G. (2003). Prolyl isomerase Pin1: a catalyst for oncogenesis and a potential therapeutic target in cancer. J. Cell Sci. 116, 773–783. 215 MALESZKA, R., LUPAS, A., HANES, S. D. & MIKLOS, G. L. G. (1997). The dodo gene family encodes a novel protein involved in signal transduction and protein folding. Gene 203, 89–93.
35
1
Proteins Import into Mitochondria Walter Neupert Ludwig-Maximillians-Universit¨at M¨unchen, M¨unchen, Germany
Michael Brunner Heidelberg University, Heidelberg, Germany
Kai Hell Ludwig-Maximillians-Universit¨at M¨unchen, M¨unchen, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The eukaryotic cell is divided into a number of subcellular compartments, the cell organelles. Mitochondria represent one of them. In contrast to most other organelles, mitochondria are delineated by two membranes: the outer membrane, which forms a kind of envelope, and the inner membrane, which usually forms numerous invaginations, the cristae. The membranes enclose two compartments: the matrix space surrounded by the inner membrane and the intermembrane space between the inner and outer membranes. Mitochondria fulfill a variety of important functions. They are often called powerhouses of the cell, as they generate ATP by the process of oxidative phosphorylation. The free energy in the hydrolysis of ATP to ADP is used to drive a plethora of cellular reactions. In addition, many other metabolic processes take place within mitochondria, such as fatty acid oxidation, the Krebs cycle, parts of the urea cycle, and the biosynthesis of cofactors and amino acids. The biogenesis of Fe/S clusters is an essential process for cell viability that is also located within mitochondria. In order to perform these functions, the mitochondria have a specific set of proteins. Mitochondria harbor their own genome; this, however, codes for only a small subset of mitochondrial proteins: eight in Saccharomyces cerevisiae and 13 in human. Most mitochondrial proteins are encoded by nuclear DNA and synthesized in the cytoplasm as precursor proteins. They contain specific targeting sequences, which Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Proteins Import into Mitochondria
are recognized by receptor proteins on the surface of the mitochondrial outer membrane. Proteinaceous machineries, termed protein translocases, mediate the transport of the precursor proteins into and across mitochondrial membranes [1–6]. The translocation processes across membranes require a driving force. In the case of import of proteins into the mitochondrial matrix, energy is provided by two sources. The electrochemical potential across the inner membrane drives the transfer of the targeting signals of preproteins across the inner membrane. ATP in the matrix fuels a proteinaceous machinery, the import motor. The import motor is associated with the translocase at the trans side of the inner membrane and completes the translocation of the precursor proteins across the mitochondrial membranes. Proteins are translocated into mitochondria in a largely unfolded state. In this respect, their import into mitochondria differs from their translocation into peroxisomes or their transport by the TAT translocase in bacteria and chloroplasts [7, 8], but it is similar to most translocation pathways in the endoplasmatic reticulum (ER) and chloroplasts and to the Sec translocase in bacteria [9–13]. Unfolding of proteins during translocation across the mitochondrial outer membrane (OM) and inner membrane (IM) is mediated by the mitochondrial import machinery [14, 15]. Protein unfolding by the mitochondrial import motor has become a paradigm for protein unfolding, a physiological process occurring in virtually every cell. Unfolding of proteins is also essential for other cellular processes, in particular protein degradation. Here, we discuss the import of proteins into mitochondria and the unfolding of proteins by the mitochondrial import motor during the translocation process.
2 Translocation Machineries and Pathways of the Mitochondrial Protein Import System
Three translocation machineries that mediate the import of precursor proteins into mitochondria are known (see Figure 1). The translocase of the outer membrane (TOM complex) functions in the recognition of precursor proteins, their insertion into the outer membrane, and in translocation across the outer membrane (for details, see [16–18]). Chaperones have been reported to interact with the precursor proteins in the cytosol to keep them in a translocation-competent state and to support their transport to the TOM complex [19–22]. The process of translocation across the TOM complex is followed by sorting proteins either directly to the intermembrane space or to two translocases of the inner membrane (TIM): the TIM23 complex and the TIM22 complex [1, 3, 23, 24]. The TIM23 complex transports proteins with typical mitochondrial targeting sequences across or into the mitochondrial inner membrane, while the TIM22 complex inserts a subclass of inner-membrane proteins with internal targeting signals into the membrane. This subclass includes the proteins of the carrier family and components of the inner-membrane translocases such as Tim22 and Tim23. An interesting case is the sorting of β-barrel proteins of the outer membrane. They first use the TOM complex and are then inserted and assembled in the membrane with the help of the recently discovered TOB/SAM complex [25–28].
2 Translocation Machineries and Pathways of the Mitochondrial Protein Import System 3
Fig. 1 Translocation machineries of the mitochondrial protein import system. The translocase of the outer membrane (TOM complex) mediates the translocation of proteins across and into the outer membrane. All proteins analyzed so far enter mitochondria via the TOM complex. The TOM complex consists of the receptor subunits Tom20, Tom70, and Tom22 and the general import pore. The general import pore is made up of the core subunit Tom40, the integral part of the Tom22 receptor and of the small TOM proteins Tom5, Tom6, and Tom7. There are two translocases of the inner membrane, the TIM23 and TIM22 complexes. The TIM22 complex inserts polytopic proteins with internal targeting signals into the inner membrane. It is composed of two sub-complexes. The peripherally attached Tim9/10/12 subcomplex is associated with the membrane-integrated components Tim22, Tim54, and Tim18. The soluble Tim9/10 complex and the Tim8/13 com-
plex in the intermembrane space cooperate with the Tim22 complex in the transport of carrier proteins and of Tim23, respectively. The TIM23 complex mediates the translocation of proteins with N-terminal targeting sequences across or into the inner membrane. Tim50 and the N-terminal domains of Tim23 guide the precursor proteins to the translocation channel in the inner membrane, which is formed by the integral Cterminal segments of Tim23 and Tim17. Translocation is completed by the import motor of the TIM23 translocase. Tim44 recruits the Tim14/Tim16 (Pam18/Pam16) subcomplex and mtHsp70 to the translocation channel. The co-chaperone Mge1 interacts with the mtHsp70 chaperone. The TOB complex (topogenesis of outer-membrane $-barrel proteins) inserts and assembles $barrel proteins into the outer membrane. Two subunits have been identified: the main poreforming component Tob55 (Sam50) and Mas37.
4 Proteins Import into Mitochondria
We later focus on the translocation pathway of proteins destined for the mitochondrial matrix. These proteins are translocated across the inner membrane by the TIM23 complex. 2.1 Import of Proteins Destined for the Mitochondrial Matrix
Matrix-targeted proteins in most cases have amino-terminal presequences that have the potential to form an amphipathic α-helix [29, 30]. These targeting signals direct the proteins to mitochondria by binding to the surface-exposed receptor proteins of the TOM complex on the cytoplasmic side of the outer membrane, the cis side [28]. The receptor protein Tom20 preferentially recognizes presequence proteins, whereas the Tom70 receptor protein binds mostly proteins with internal targeting signals [28]. From the initial receptor proteins, the preproteins are transferred to the general import pore (GIP) via the receptor domain of Tom22 [32, 33]. GIP is made up of five other membrane-embedded components of the TOM complex: Tom40, Tom22, and the three small subunits Tom7, Tom6, and Tom5 [2, 20]. The only essential component of the TOM complex in Saccharomyces cerevisiae, Tom40, is able to form a conducting pore in the membrane and is the main constituent of the protein-conducting channel [1, 37]. Following translocation through this channel, the preprotein binds to the “transbinding site” on the inner face of the outer membrane [38, 39]. It has been suggested that the trans-binding site has a higher affinity for preproteins than do the preprotein-binding sites of the TOM complex at the surface of the outer membrane. This would drive translocation of the presequence across the outer membrane [8, 38, 41, 42]. Next, the presequences interact with the TIM23 complex of the inner membrane. The TIM23 complex translocates the presequences across the inner membrane in a membrane potential–dependent () and ATP-dependent manner. Three functions have been assigned to the . First, it triggers dimerization of the Tim23 presequence receptor [4]. Second, it exerts an electrophoretic effect and drives the translocation of the presequence across the inner membrane [44]. Third, it can support unfolding of protein domains on the surface of the mitochondria [45]. Once the presequence has reached the matrix side of the inner membrane, complete translocation of proteins into the matrix is driven by ATP hydrolysis [4, 44, 47, 48]. Inner-membrane proteins containing a matrix-targeting signal followed by one hydrophobic transmembrane segment are also imported via the TIM23 pathway. They are not completely translocated across the inner membrane but instead are laterally inserted into the inner membrane [49, 50]. If the hydrophobic segment of these proteins is located directly after the matrix-targeting signal, their import is independent of matrix ATP [34, 50]. The TIM23 complex can be structurally and functionally divided into three parts: the hydrophilic domains in the intermembrane space, the membrane-embedded part, and the mitochondrial import motor on the matrix side of the inner membrane. The amino-terminal domain of Tim23 spans the mitochondrial outer membrane
2 Translocation Machineries and Pathways of the Mitochondrial Protein Import System 5
and links the outer membrane with the inner membrane [22]. The conserved intermembrane space domain of the single transmembrane protein Tim50 facilitates the transfer of the preprotein from the TOM complex to the TIM23 complex, probably to the receptor domain of Tim23 in the intermembrane space [37, 54, 55]. The integral part of the complex is made up by the two proteins with four transmembrane domains, Tim17 and Tim23, which most likely form the protein-conducting channel [5, 18, 19, 27, 60–64]. The mitochondrial import motor consists of at least five components: Tim44, mtHsp70, Tim14/Pam18, Tim16/Pam16, and Mge1 (see Figure 1) [4, 24, 32, 59, 68, 69]. Tim44 is crucial for recruiting mtHsp70 to the outlet of the protein-conducting channel [62, 68, 71, 72]. The chaperone mtHsp70 is the central component of the import motor. Like all DnaK-type chaperones, it consists of an N-terminal adenine nucleotide-binding domain, a peptide-binding domain, and a C-terminal “lid” domain [11, 61, 75, 76]. When ATP is bound to mtHsp70, the peptide-binding domain is in an open conformation [11, 40]. MtHsp70 in the ATP form binds to the incoming polypeptide chain emerging from the channel [68, 69, 72]. Conversion to the ADP form of mtHsp70 results in the closure of the peptide-binding domain and high-affinity binding of the polypeptide chain [11, 79]. This efficient trapping at the outlet of the channel inhibits backsliding of the polypeptide chain and drives its translocation across the mitochondrial membranes. Tim14 is a membrane-bound DnaJ homologue that is associated with the TIM23 translocase [24, 68, 69]. It interacts with Tim44 and mtHsp70 in an ATP-dependent manner and stimulates the ATPase activity of mtHsp70, thereby promoting the efficient binding of the polypeptide. Then the mtHsp70 bound to the polypeptide chain dissociates from Tim44 [72, 80–82]. Tim44 is present in the TIM23 translocase as a dimer, and therefore a second mtHsp70 is already present at the translocation pore [83]. Dissociation of the first mtHsp70 and further translocation allow the second one to bind to the next incoming segment of the polypeptide chain. Such “hand-over-hand” action results in efficient translocation. Tim44 can be viewed as an organizer that integrates the import motor with the components of the translocation channel. It recruits mtHsp70, Tim14, and Tim16 to the Tim17-Tim23 subcomplex [68] and interacts with the incoming precursor protein and transfers it to mtHsp70 [72]. Tim16 is a J-domain-related protein that has similarity to Tim14 and forms a stable subcomplex with Tim14 [32, 59]. It is structurally important, as it mediates the interaction of Tim14 with the TIM23 complex. The nucleotide-exchange factor Mge1 mediates the release of ADP from mtHsp70 [8, 65, 85, 86], which allows binding of an ATP and thereby release of mtHsp70 from the polypeptide chain. Thus, the mtHsp70 can be reused for a next cycle of binding and release. Mge1 seems to regulate the binding of mtHsp70 to Tim44 [69, 80]. MtHsp70 also binds in the ADP form to Tim44. Mge1 dissociates such an unproductive interaction, as ADP-bound mtHsp70 cannot bind incoming polypeptides. Indeed, cross-linking experiments showed that mtHsp70 binds to Tim44 in the presence of ATP in intact mitochondria [68]. Following translocation of the precursor protein into the mitochondrial matrix, the targeting signal is cleaved by the matrix-processing peptidase and, in some cases,
6 Proteins Import into Mitochondria
the mitochondrial intermediate peptidase [4]. Then the extended polypeptide chain folds into its native form, which in many cases is supported by the mitochondrial chaperones mtHsp70, Hsp60, and peptidyl-prolyl cis/trans isomerase [14, 55, 89, 90].
3 Import into Mitochondria Requires Protein Unfolding
The structures of the isolated native TOM complex or the TOM core complex of the mitochondrial outer membrane and the TIM22 complex were analyzed by singleparticle electron microscopy. The observed structures suggested pores with an internal diameter of about 20–25 Å for the TOM complex and stain-filled pits with a diameter of 16Å for the TIM22 complex [2, 63, 83, 93]. The integral membrane core component Tom40 appears to be the subunit forming the pore of the TOM complex, with a diameter of about 22–25 Å [1, 37]. Such a pore diameter is consistent with observations that precursors containing bulky elements can be transported into mitochondria. Examples are precursors containing branched polypeptides or oligonucleotides linked to the C-terminus of a precursor or precursors containing neighboring β-strands that were covalently altered by chemical cross-linking prior to import [94–96]. It was also reported that the AAC precursor protein and Tim23 cross the TOM complex in a loop conformation [15, 30, 99]. In other studies experimental evidence suggests that the BCS1 precursor protein is imported across the outer and inner membrane as a loop structure, formed by hydrophobic interactions between its transmembrane domain and the subsequent amphipathic α-helical targeting signal [31, 101]. However, it seems impossible that translocation pores with the observed diameters import fully folded proteins. Indeed, experiments have clearly demonstrated that tightly folded domains cannot be imported. The first evidence was obtained using a fusion protein consisting of the first 22 amino acid residues of the cytochrome oxidase subunit IV presequence fused to mouse dihydrofolate reductase (DHFR). This protein was imported into isolated mitochondria like an authentic mitochondrial precursor protein [50]. Binding of methotrexate, a specific ligand of DHFR, leading to a stably folded DHFR domain inhibited translocation of the precursor protein into mitochondria [26]. Further studies demonstrated that mitochondrial precursor proteins containing the tightly folded heme-binding domain of (HBD) of cytochrome b2 or the bacterial ribonuclease barnase as folded passenger proteins are transported into mitochondria only in an unfolded state [35, 39, 49, 77]. In an elegant study by Schwartz et al. [94], rigid gold clusters of specific dimensions were cross-linked to the C-terminus of a precursor protein and their import was studied. Above a certain size, these clusters were not imported, confirming that the translocases cannot transport large structures. Clusters of specific dimensions allowed estimating the internal diameter of the TOM complex to be between 20 Å and 26 Å . Comparison with the diameter observed in the electron microscopy studies suggests that the pore diameter of the TOM complex does not alter significantly during translocation. The TIM23 translocase in the inner membrane might
4 Mechanisms of Unfolding by the Mitochondrial Import Motor
behave differently. Calculations based on channel conductance and size exclusion suggest that renatured recombinant Tim23 forms a channel with an even smaller inner diameter of 13 Å [64]. The pore of the TIM23 complex is narrower than the TOM complex. However, the TIM23 complex appears not to have a well-defined rigid pore, as preproteins cross-linked to 20-Å gold clusters were still imported by the TIM23 complex, albeit with strongly reduced efficiency [108]. Most precursor proteins are imported posttranslationally into mitochondria. Therefore, they may fold in the cytoplasm prior to import. The requirement of unfolding of preproteins prior to their import has been observed not only in vitro but also in vivo. When a hybrid protein of the amino-terminal portion of the mitochondrial precursor protein cytochrome b2 and of DHFR was expressed in intact yeast cells and the specific ligand methotrexate was added to the cells, the precursor protein was arrested in mitochondrial contact sites, demonstrating the importance of protein unfolding for translocation in vivo [109]. It has also been shown that the heme-binding domain, an authentic mitochondrial domain, folds in the cytosol [10]. Further research is needed to determine which endogenous mitochondrial proteins are held in a translocation-competent unfolded state by cytosolic chaperones and which ones fold and have to be unfolded prior to their translocation into mitochondria. However, it is obvious that some precursor proteins have to be unfolded by the mitochondrial import machinery.
4 Mechanisms of Unfolding by the Mitochondrial Import Motor
The mitochondrial import motor drives the unfolding of protein domains and the translocation of polypeptides across the mitochondrial membranes. Two models have been proposed to explain how the mitochondrial import motor exerts its function (see Figure 2). In the “targeted Brownian ratchet model,” the import motor acts as a molecular ratchet that traps the incoming polypeptide chain directly at the outlet of the protein-conducting channel of the inner membrane [72, 92, 112]. In the “power-stroke model” or “pulling model,” the mitochondrial import motor exerts a power stroke perpendicular to the membrane and generates a pulling force on the polypeptide chain [38, 39]. 4.1 Targeted Brownian Ratchet
Proteins partially unfold and fold in the range of milliseconds, a process also termed “thermal breathing”. If an unfolded segment is exposed due to “thermal breathing” during protein import, the unfolded segment of the polypeptide chain can move into the translocation channel due to Brownian motion [82]. Within the translocation channel it can oscillate in both directions [82]. MtHsp70 of the import motor binds to segments of the polypeptide chain as they are emerging from the translocation channel at the matrix side. According to this model, the
7
8 Proteins Import into Mitochondria
4 Mechanisms of Unfolding by the Mitochondrial Import Motor
import motor acts as a ratchet that permits forward movement but not backsliding of the polypeptide chain [61, 72, 75, 80, 83, 94]. Thereby it transduces spontaneous Brownian movement into vectorial transport. The polypeptide chain is trapped and refolding of the polypeptide on the outside of mitochondria is impossible. Such a mechanism requires a very efficient trapping process. The attachment of mtHsp70 to the TIM23 complex by Tim44 targets the trapping device to the site of the emerging polypeptide and assures immediate binding and trapping. In summary, the targeted molecular ratchet unfolds proteins, because it inhibits the refolding of spontaneously unfolded segments of the protein at the outside of mitochondria [11, 35]. 4.2 Power-stroke Model
In the power-stroke model, the action of the mitochondrial import motor generates a force on the polypeptide chain spanning the mitochondrial membranes. This force is transmitted by the polypeptide chain and is required for unfolding of protein ←-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Fig. 2 The “Brownian ratchet” and the “power-stroke” models for unfolding and translocation of precursor proteins into mitochondria. Stage I: Membrane potential– dependent translocation of the N-terminal targeting signal of a precursor protein into the matrix and its cleavage by the mitochondrial processing peptidase generate a preprotein spanning the mitochondrial outer and inner membranes. The folded domain of this early translocation intermediate is on the surface of mitochondria. Stage II: When the preprotein emerges from the translocation channel, it is bound by the mitochondrial heat shock protein 70 (mtHsp70), which is recruited by Tim44 to the translocation channel. ATP hydrolysis by mtHsp70 is stimulated by Tim14. The conversion of the ATP-bound form of mtHsp70 to the ADP-bound form leads to a conformational change of the peptidebinding domain and the tight binding of the preprotein. “Brownian ratchet” model: Stage IIIa: Thermal fluctuations lead to fast local unfolding and refolding of the preprotein, termed “thermal breathing.” Stage IVa: The mtHsp70 bound to the preprotein dissociates from Tim44. Unfolded segments of the preprotein can slide back and forth in the translocation channel due to random
thermal fluctuations (Brownian oscillations). This exposes a new segment of the incoming polypeptide chain at the exit of the import channel, which is trapped by a second mtHsp70 bound to the second Tim44 in the TIM23 complex. Tim14 stimulates ATP hydrolysis of this mtHsp70 and allows efficient trapping. Thus, backsliding of the preprotein is prevented, and therefore the preprotein cannot refold outside of mitochondria. Stage Va: Another mtHsp70 is recruited by Tim44 and traps the next segment of the oscillating preprotein at the exit of the channel. “Power-stroke” model: Stage IIIb: MtHsp70 exerts a pulling force perpendicular to the inner membrane, which leads to unfolding of the preprotein on the surface of mitochondria and transport of a segment of the preprotein into the matrix space. The force is generated by a conformational change of mtHsp70. Stage IVb: Tim14 stimulates ATP hydrolysis of the second recruited mtHsp70. The ADP-bound form of mtHsp70 binds tightly to a newly exposed segment of the preprotein. Stage Vb: The first mtHsp70 dissociates from Tim44 and the second mtHsp70 undergoes a conformational change, generating a pulling force.
9
10 Proteins Import into Mitochondria
domains on the outside of mitochondria [38, 47, 53, 103, 117]. An ATP-induced conformational change of mtHsp70 exerts the pulling force, which is transferred by a lever arm to the polypeptide chain. Following the power stroke, mtHsp70 has to be released from Tim44 so that the next mtHsp70 can bind and exert another power stroke. Backsliding of the polypeptide chain before binding of the next mtHsp70 must be prevented. It was proposed that the protein-conducting channel fulfills such a function [13, 19]. Multiple cycles of a power stroke would result in a regular stepwise translocation.
5 Studies to Discriminate between the Models
Studies to discriminate between the two models have addressed the mode of unfolding of preproteins on the outside of mitochondria during the protein translocation as well as mechanistic aspects of the mitochondrial import motor. 5.1 Studies on the Unfolding of Preproteins
Before discussing protein unfolding in the context of mitochondrial protein import, two modes of unfolding will be distinguished: global and local unfolding of proteins. Global unfolding results in a completely unfolded polypeptide chain without any structured elements. The equilibrium between the folded and unfolded states is described by the thermodynamic stability of a protein. The thermodynamic stability of folded proteins seems to correlate with their resistance against protease digestion, which requires extensive unfolding of domains [35]. In contrast, local unfolding, also termed “breathing,” is characterized by perpetual thermal fluctuations within a protein [119]. These fluctuations occur in the time range of nanoseconds to seconds and result in local unfolding events. Binding of denaturants such as urea [119], chemical modification of side chains of amino acid residues [35], or interaction with other proteins such as chaperones leads to the trapping of such local unfolded segments with varying efficiencies [49, 68]. These segments cannot refold once they are trapped. Further unfolding occurs and eventually a completely unfolded state results within milliseconds to minutes. 5.1.1 Comparison of the Import of Folded and Unfolded Proteins In order to understand the unfolding of precursor proteins during import, proteins containing folded domains were analyzed in comparison with the same proteins harboring the domains in an unfolded state. In most cases mouse cytosolic dihydrofolate reductase (DHFR) was used as passenger protein and was fused to a mitochondrial presequence [50]. The unfolded state was acquired by addition of urea prior to the import reaction [25, 96] or by using a mutant form, which does not stably fold [123]. The import process was started by allowing the presequence to translocate across the TIM23 complex with the help of . In this way the
5 Studies to Discriminate between the Models
precursor chain was threaded into the translocation pores of the TOM and TIM23 complexes. Import of the unfolded form of the protein was always faster than import of the folded counterpart [33, 35, 61, 68, 123, 125, 126]. According to the Brownian ratchet model, movement of the polypeptide chain in the import channels is stopped when a folded domain reaches the TOM complex. Only when local spontaneous unfolding events occur will unfolded segments be exposed and then be able to enter the translocation channels. The kinetics of this breathing determines the average time period for which the forward movement of a folded domain is halted. Experimental observations support this. Folded, but not unfolded, DHFR-containing precursor proteins do pause upon import [35]. The import kinetics of preproteins with folded DHFR show a notable lag phase. After unfolding by urea, no lag phase in the import was observed [68]. Furthermore, passenger proteins such as HBD and titin (see below and Section 5.1.3) showed lag phases with different time periods [35, 94]. It should be noted that most significant lag phases were observed for proteins with short amino-terminal presequences in front of the folded domain, such as cyt b2 (47)DHFR [68]. In these cases the presequences in front of the folded domain are not long enough to interact with the import motor, and therefore local spontaneous unfolding events of the folded domain cannot be trapped or promoted by the import motor in the initial stage of the translocation process. Preproteins with longer presequences interact with the import motor, spontaneous local unfolding events are trapped, and backsliding of the polypeptide chain is inhibited. In the case of pSu9(1–69)DHFR, a fusion protein consisting of the presequence of the subunit 9 of Neurospora F1 F0 -ATPase and mouse DHFR, the rate of import was determined for different forms of the DHFR passenger. When wild-type DHFR was present, import was much slower than when an unfolded mutant form of DHFR was present [35]. A sequence of 69 amino acid residues was shown to be sufficient to reach the mitochondrial import motor; about 50 residues were reported to be required to span both the outer and inner membranes [82, 108]. Therefore, the faster import of the mutant form must have been due to the unfolded state of the passenger DHFR. One would expect that the kinetics of import of folded proteins after the lag phase is similar to those of urea-unfolded proteins. Indeed, this was observed, demonstrating that a preprotein following the initial unfolding events reflects the behavior of an unfolded protein [68]. According to the power-stroke model, a folded domain will be unfolded by a power stroke of the import motor. One power stroke should be sufficient to cause a nearly complete collapse of the folded structure. The power stroke is supposed to be exerted by a fast conformational change. Assuming one fast power stroke, it is difficult to explain that the lag phases of import are in the range of minutes. In addition, unfolding and import of preproteins were compared with regard to their temperature dependence [35]. Import of folded and unfolded DHFR at 30 ◦ C occurred with similar rates. The permanently unfolded mutant variant of DHFR has a twofold lower import rate into mitochondria at 10 ◦ C than at 30 ◦ C. This result indicates that the import motor has a moderate temperature dependence.
11
12 Proteins Import into Mitochondria
On the other hand, the import rates of folded DHFR were 30-fold faster at 30 ◦ C than at 10 ◦ C, and there was very little import at 4 ◦ C. In the case of a power-stroke mechanism, the temperature dependence of the import would be determined by that of the import motor and is expected to be similar for the import of folded and unfolded proteins. This was not observed. According to the Brownian ratchet model, import of folded DHFR depends on its spontaneous unfolding at the N-terminus. The kinetics of N-terminal breathing was analyzed by measuring the modification by N-ethylmaleimide (NEM) of the cysteine residue in position 7 of DHFR [35]. This cysteine is buried and therefore inaccessible to NEM in the folded protein. Upon breathing of the DHFR, the N-terminal segment moves out and can be modified by NEM. The modified protein cannot refold and is sensitive to added protease, in contrast to the folded protein. The measured rates of amino-terminal unfolding confirmed, for all temperatures analyzed, that unfolding is faster than import. Furthermore, the amino-terminal breathing is strongly reduced at low temperature. At 30 ◦ C the unfolding at the N-terminus was observed to be ca. 14-fold faster than at 10 ◦ C. The temperature dependence of the import rates can be explained by the temperature dependence of spontaneous unfolding, but not by the temperature dependence of the import motor. The heme-binding domain of cytochrome b2 (HBD) was another passenger domain studied in this respect. The domain was fused to mitochondrial presequences and its import characteristics were investigated [35]. The behavior of the HBD differed from that of DHFR. The import rates of HBD passenger proteins were only twofold higher at 30 ◦ C than at 10 ◦ C, and import was still quite efficient at 0–4 ◦ C. Thus, import of the HBD displayed a very weak temperature dependence. This correlates with a faster local unfolding of HBD than of DHFR [35]. The rate of local unfolding of HBD was measured by chemical modification of residues buried in the folded domain. Interestingly, local unfolding of HBD was less temperaturedependent than that of DHFR. This is consistent with the finding that pausing of preproteins containing HBD, as described for DHFR-containing constructs, was not observed. When the global thermodynamic stability of the proteins was assessed by their protease resistance, HBD was found to be much more stable than DHFR [35]. In conclusion, the kinetics of spontaneous local unfolding and not the thermodynamic stability of a folded domain is the crucial parameter for the efficiency of import. The observed results with DHFR and HBD are in agreement with those from experiments using bacterial RNase barnase as a passenger protein [49]. When the import of barnase fused to mitochondrial presequences was analyzed, it was observed that its import kinetics were faster than that of fusion proteins consisting of DHFR and presequences, although the thermodynamic stability of barnase is higher than that of DHFR. In addition, it was observed that import of barnase into mitochondria results in an unfolding pathway that differs from the one of global unfolding in solution [49]. During import the barnase was unfolded from its N-terminus. The amino-terminal structure of barnase consists of an α-helix on the surface of the folded barnase structure and can therefore unfold with a relatively high probability [47]. Such an arrangement is also found in the HBD. In contrast,
5 Studies to Discriminate between the Models
the amino-terminal β-strand of DHFR is buried in a central β-stranded core and therefore does not unfold easily. The amino-terminal structures of DHFR on the one hand and HBD and barnase on the other hand appear to be the reason for their different import and unfolding behavior. In conclusion, spontaneous unfolding together with a targeted ratchet mechanism is sufficient to explain the temperature dependence of import, the pausing of import at folded domains, and the amino-terminal unfolding behavior of the protein domains analyzed. 5.1.2 Import of Preproteins With Different Presequence Lengths Precursor proteins that have segments of different length in front of a folded DHFR domain were imported with very different rates [35, 77]. Rather slow import was observed for precursors with 60 residues in front of DHFR, whereas precursors with segments of 90 or more residues had 10- to 40-fold higher import rates. The import rate increased dramatically at lengths between 60–70 residues due to accessibility of the presequence to the molecular import motor. To span both mitochondrial membranes and to reach the matrix space, a preprotein needs a minimal length of ca. 50–60 residues in front of folded DHFR. This allows binding of mtHsp70, which, together with the other motor proteins, promotes forward movement and unfolding of the DHFR domain. In contrast, precursors containing less than 50–60 residues in front of DHFR (referred to as shorter precursors) are not able to reach the matrix without unfolding of a sufficiently long segment [35, 77]. Without unfolding, mtHsp70 cannot bind and support import. In the Brownian ratchet model, spontaneous unfolding and forward movement by Brownian motion occur with longer as well as with shorter precursors. However, with a longer precursor mtHsp70 can trap a segment of the precursor at the outlet of the import channel. Therefore, backward sliding is inhibited and refolding of the DHFR is strongly impaired. In contrast, shorter precursors can slide back and refold because mtHsp70 does not reach the precursor. Backsliding and refolding compete with forward movement and import, so that overall import is much slower [35]. According to the power-stroke model, longer precursors have faster import kinetics because an amino-terminal segment of the polypeptide chain reaches into the mitochondrial matrix and mtHsp70 can exert a pulling force and cause a collapse of the folded domain on the mitochondrial surface [47, 77]. Thus, both models explain the higher import rates of precursors having longer presequences in front of a folded domain compared to the rates of those with short presequences. The Brownian ratchet model, in contrast to the power-stroke model, predicts that the dependence of import rates on the length of the N-terminal stretch varies among folded domains, which have a different rate of spontaneous unfolding. This was indeed observed with DHFR and HBD as passenger proteins [35]. 5.1.3 Import of Titin Domains Strong evidence in favor of the targeted Brownian ratchet mechanism was obtained using the immunoglobulin (Ig)-like domains of muscle titin as a passenger protein. The Ig-like domains are tightly folded. Experiments with atomic force microscopy
13
14 Proteins Import into Mitochondria
indicated that mechanical pulling with forces of 200–300 pN is needed to unfold these domains [12, 67, 72, 128]. However, import of these domains into isolated mitochondria occurs efficiently when they are combined with a mitochondrial targeting sequence [94]. This raises the question of what kind of force a hypothetical power stroke of mtHsp70 could generate. The well-known motor proteins myosin or kinesin generate forces of about 5 pN [129, 130]. Thus, a reasonable assumption would estimate similar forces for mtHsp70. Furthermore, one single stroke and therefore hydrolysis of one ATP should lead to the complete collapse and hence unfolding of a folded domain. Estimating a stroke length of 3.5 nm, the equivalent of ca. 10 extended amino acid residues, the energy in one ATP can exert a force of about 14 pN [94]. Although the force measurements were performed under in vitro conditions and the values may be slightly different in the biological system, it appears impossible that the required force in the range of 200–300 pN could be generated by a power stroke of mtHsp70. It also has to be kept in mind that rapid cycles of mtHsp70 binding and release of the incoming polypeptide chain take place at the import site, when a folded domain on the outside of the mitochondria blocks forward movement [35, 94]. This excludes titin unfolding with low force by a slow power stroke in the minute range. 5.1.4 Unfolding by the Mitochondrial Membrane Potential Some precursor proteins have a targeting sequence of such length in front of a folded domain that they do not reach into the matrix to interact with mtHsp70. However, if they are of sufficient length that their positively charged presequences are exposed in the import channel of the inner membrane, then the membrane potential across the inner membrane can act on them. Since the electrical potential is negative at the inner surface of the membrane, the positively charged presequence is driven towards the matrix. It has been reported that such an action of on the presequence can result in unfolding of a protein on the surface of mitochondria [45]. According to this view, spontaneous local unfolding events at the N-terminus of the folded domain can be trapped by the membrane potential. The activation barrier for spontaneous local unfolding events at the N-terminus of the folded domain may be lowered by the force generated by the action of the membrane potential on the presequence [14, 45]. However, this intriguing possibility has yet to be verified. 5.2 Mechanistic Studies of the Import Motor 5.2.1 Brownian Movement of the Polypeptide Within the Import Channel The polypeptide chain can move within the translocation channels of the TOM and TIM23 complexes in forward and reverse direction. Studies using intact mitochondria, isolated outer-membrane vesicles, and purified TOM complex demonstrate that the presequence of a precursor protein can translocate across the TOM complex and bind to the trans side of the complex independent of an energy source [38, 132]. In the absence of a membrane potential, the preprotein interacts with
5 Studies to Discriminate between the Models
a component of the inner-membrane translocase, Tim50, as demonstrated by a cross-linking approach [54, 55]. However, the membrane potential is needed for further translocation across the inner membrane because it appears to be involved in the gating of the TIM23 complex [4, 64]. Subsequently, the polypeptide chain traverses the outer and inner membranes at the same time [26, 107]. Intermediates that span both membranes were used to demonstrate retrograde movement of the polypeptide chain within the translocation channels [35, 80–82]. When mitochondria were depleted of ATP, mtHsp70 was unable to arrest the intermediates in the matrix, and these intermediates slid back and fell out of the import channels. Forward movement of the polypeptide chain without the function of mtHsp70 was supported by experiments with specific precursor proteins. A certain group of precursor proteins is sorted into the inner membrane by arrest of their singletransmembrane segment in the translocase and subsequent lateral release into the lipid bilayer, a process termed stop-transfer pathway [49, 50]. In these precursors the transmembrane segment follows the amino-terminal presequence in a short distance of about 20 amino acid residues. The enzyme D-lactate dehydrogenase (D-LD) [50] and a mutant form of subunit 5A of cytochrome oxidase [34] are such preproteins. Neither matrix ATP nor a functional mtHsp70 is needed for import of these preproteins. It appears that the polypeptide chain moves in the forward direction until the hydrophobic transmembrane segment is trapped in the translocation channel of the TIM23 complex. These proteins apparently do not require the mtHsp70 motor. Another group of precursor proteins was genetically engineered. A presequence was fused to a segment of 25 or 50 residues of glutamate or glycine residues followed by the coding sequence of mouse DHFR [94]. Stretches of these amino acid residues could not be bound by mtHsp70, which is consistent with the substrate specificity of the binding pocket of its bacterial homologue DnaK [40, 79]. Although mtHsp70 was not able to bind and therefore to “pull” on the preprotein, import of these fusion proteins into mitochondria occurred efficiently. These experiments support the view that movement of polypeptide chains within the import channels in a forward and reverse direction can occur without the mtHsp70 and a power stroke being involved. 5.2.2 Recruitment of mtHsp70 by Tim44 Tim44 anchors the mtHsp70 chaperone to the TIM23 complex at the matrix side of the inner membrane [62, 71, 72]. An analysis of the interaction of Tim44 with mtHsp70 may help to discriminate between the two models. MtHsp70 consists of an ATPase domain, a peptide-binding domain (PBD), and a carboxy-terminal domain or “lid” [11, 42]. According to the Brownian ratchet model, any of these domains could interact with Tim44 to allow mtHsp70 to act as a targeted trapping device. In the power-stroke model, Tim44 is expected to bind the ATPase domain of mtHsp70, so that an ATP hydrolysis–dependent conformational change could generate a vectorial movement. In this case the PBD could move relative to the membrane anchor and function as a lever arm. Such a pulling of the polypeptide chain would be impossible if the PBD were bound to Tim44.
15
16 Proteins Import into Mitochondria
In two studies, the interaction between Tim44 and mtHsp70 was analyzed by co-immunoprecipitation experiments following the import of domains of mtHsp70 and hybrids between mtHsp70 and mitochondrial homologues of mtHsp70 into isolated mitochondria [61, 76]. The PBD of mtHsp70 in combination with the various ATPase domains were co-precipitated with Tim44 with an efficiency of about 50%, whereas the ATPase domains without the PBD were not precipitated or only minor quantities of about 10% precipitated. When a construct consisting of the peptide-binding and lid domain was imported, no interaction was observed; the functionality of the construct, however, was not shown [61, 76]. Another study was performed with hybrids between mtHsp70 and the closely related DnaK from E. coli, which were expressed in mitochondria [75]. In addition, truncated versions of mtHsp70 were imported into mitochondria. Coimmunoprecipitation analysis revealed that the PBD alone binds to Tim44. Together with the lid domain, the PBD was essential to transmit information on the nucleotide state of the ATPase domain to the rest of the chaperone. In addition, a mutant form of mtHsp70, the ssc1-2 mutant, was isolated [55]. The ssc1-2 mutation is present within the PBD. The interaction of mtHsp70 with Tim44, and therefore mitochondrial import, is compromised in the ssc1-2 mutant [33, 55, 72, 137]. Intragenic suppressors were isolated that restore binding of the mutant form of mtHsp70 to Tim44 [125]. The suppressor mutations also reside within the PBD. The position of the mutations in the ssc1-2 mutant is consistent with an interaction between the PBD and Tim44. In summary, it appears that the PBD of mtHsp70 cannot function as a lever arm, as it seems to bind to Tim44. 5.2.3 Import Without Recruitment of mtHsp70 by Tim44 According to the targeted Brownian ratchet model, the polypeptide chain will be trapped directly at the outlet of the import channel. Therefore, mtHsp70 has to be recruited to this site. It was suggested that mtHsp70 can act as a ratchet in the mitochondrial matrix and import unfolded, but not folded, proteins without anchoring of mtHsp70 to Tim44 [68, 80, 125]. A trapping mechanism was proposed to be sufficient for import of unfolded proteins, but import of folded domains would require a Tim44mtHsp70 interaction supporting the power-stroke model [47, 111, 125]. However, it has to be kept in mind that a Tim44-mtHsp70 interaction is equally essential for a targeted ratchet. The argumentation was based on experiments with two conditional mutant strains. One strain carries a mutant form of Tim44 and the other strain, ssc1-2, carries a mutation in the gene encoding mtHsp70. At non-permissive temperature these strains are unable to grow. When isolated mitochondria of the Tim44 mutant were incubated at non-permissive temperature, unfolded preproteins were imported, but import of folded preproteins was strongly reduced [9]. The same was found with ssc1-2 mutant mitochondria after incubation at nonpermissive temperature. At non-permissive temperature a prolonged binding of mtHsp70 to unfolded polypeptide chains was detected, but mtHsp70 did not interact with Tim44 anymore [125]. However, the prolonged binding does not reflect a stronger trapping capacity of the mutant Hsp70, Ssc1-2p, because its association
5 Studies to Discriminate between the Models
and dissociation rates towards the polypeptide are not strongly altered [70]. Rather, folding of the polypeptide is affected, and therefore more unfolded precursor is present, which then interacts with mtHsp70. Binding to Tim44 and import of folded precursors could be restored by intragenic suppressors of ssc1-2 [125]. Two explanations for this behavior were discussed. First, binding to the polypeptide chain by mtHsp70 without recruitment to the import channel could be sufficient for import of unfolded, but not folded, proteins. Second, there might be residual activities in the mutant forms that are sufficient for import of unfolded precursors. The ssc1-2 strain and a strain with reduced levels of functional Tim44 were used in another study that demonstrated that the activity of import of high amounts of recombinant unfolded precursor (picomoles) was drastically reduced in these mitochondria [61]. The level of reduction in import activity in mitochondria of the ssc1-2 strain might depend on the precursor protein studied and on subtle experimental differences, as a modest reduction was observed in another report [68]. Still, these mitochondria imported radio labeled, unfolded precursors that were produced in a cell-free translation system (femtomoles) [61]. Residual activities of the conditional mutants were able to promote import of unfolded, but not folded, preproteins in small quantities. These experiments indicate that Tim44 and mtHsp70 are essential for efficient import [61]. In conclusion, both proposed models, the targeted molecular ratchet and the power-stroke mechanism, require the Tim44-mtHsp70 interaction to mediate efficient import of unfolded proteins. Therefore, these studies [61, 68, 80, 125] are not suited to discriminate between these models.
5.2.4 MtHsp70 Function in the Import Motor MtHsp70 in the ATP-bound form is recruited by Tim44 to the import sites and efficiently binds to the incoming polypeptide chains. Tim14 then stimulates ATP hydrolysis, which leads to a conformational change in mtHsp70 so that the PBD of the ADP-bound form closes. Repeated cycles lead to inward movement of the polypeptide chain. Two classes of interactions of the unfolded polypeptide chain with mtHsp70s can be distinguished. The first class is mediated by Tim44 together with Tim14 at the outlet of the import channel. Even low-affinity complexes, which lead to short-lived interactions of substrate with mtHsp70 and which do not contribute to efficient trapping of the chain, may be formed. The on-rate of mtHsp70 to the substrate is high due to the presence of the J-domain protein Tim14 in the import motor, but the off-rate is also high due to the low binding affinity. Thus, there is a high rate of binding and release. Only a few binding sites of very high affinity are present in proteins [117, 122]. High-affinity binding can contribute to efficient trapping of segments exposed in the matrix space. This explains why high ATP levels in the matrix are necessary to find a pool of complexes of mtHsp70 in the import motor associated with the translocating precursor [54, 72, 81]. High ADP levels lead to a loss of this pool, as the ADP form of this mtHsp70 is rapidly released and Tim44-mediated cycling of the ATP form cannot occur.
17
18 Proteins Import into Mitochondria
The second class comprises high-affinity interactions. MtHsp70 entering the precursor at the import motor will remain tightly bound to a few segments of an unfolded polypeptide chain. These longer-lived complexes remain stable even after movement into the mitochondrial matrix. Moreover, dissociation of these complexes is dependent on the action of the nucleotide-exchange factor Mge1 and subsequent ATP uptake. Once mtHsp70 is released, but the protein still does not reach a folded state, mtHsp70 will rebind to the high-affinity binding sites with the help of the mitochondrial matrix co-chaperone Mdj1. High ATP in the matrix favors the open ATP-bound form of mtHsp70, and therefore steady-state levels of high-affinity complexes mediated by Mdj1 are low. In conclusion, the mode of interaction of mtHsp70 with preproteins apparently depends on its location and its partner proteins. When ATP levels are high, Tim44mediated interactions with preproteins at the import sites are observed, whereas at low ATP levels, interactions with preproteins in the matrix mediated by Mdj1 are observed [81, 82]. This demonstrates the specialized nature of the mitochondrial import motor, which allows efficient trapping of incoming precursor proteins. The recruitment of mtHsp70 and the J-protein Tim14 by Tim44 directly to the trans-location channel allows efficient trapping, and even precursor segments with lower affinity and short-lived interactions can contribute. On the other hand, Mdj1supported interactions of mtHsp70 with a few high-affinity sites of the polypeptide generate long-lived complexes in the matrix. These latter complexes may be able to prevent large-scale retrograde movements of the preprotein, but not the small-scale oscillations in the import channel that have to be ratcheted, in particular to promote unfolding at the mitochondrial surface. 6 Conclusions
Mitochondrial precursor proteins are imported in a largely unfolded state. The translocases of the mitochondrial membranes are not able to transport proteins in a folded state across the membranes. There are at least three mechanisms that result in an unfolded polypeptide chain. First, co-translational import makes protein folding prior to translocation impossible. Second, molecular chaperones can interact with newly synthesized proteins. They protect the proteins against aggregation and keep them in an unfolded, translocation-competent state. Third, unfolding of proteins is mediated by the molecular import motor. It remains to be investigated which substrates have to be unfolded by the mitochondrial import motor in vivo and which substrates are kept in a translocation-competent state by cytosolic chaperones. It will be interesting to see how it is determined whether a protein folds or interacts with chaperones prior to the import into mitochondria. The Brownian ratchet mechanism is widely accepted as a mechanism to drive protein translocation across membranes. This is the case not only for the mitochondrial system but also for translocation into the endoplasmic reticulum [53, 74, 125]. In mitochondria, protein translocation and protein unfolding on the surface of
References
mitochondria are driven by the mitochondrial import motor. The targeted Brownian ratchet mechanism is sufficient to explain all experimental observations on the function and structure of the mitochondrial protein import motor. One of these functions is the unfolding of proteins during mitochondrial import. Folded proteins do not have requirements for their import that are different from or additional to those of folded proteins. It is still debated whether the import motor can exert a power stroke or whether it can switch to a power-stroke mode, in particular for the import of folded proteins. So far direct evidence for a power stroke has not been presented. It cannot be excluded that there is a weak pulling force generated by the import motor that assists in protein translocation. However, such a form does not seem to be either helpful or necessary. In the case of a power stroke, it should be possible to measure directly the force generated by the import motor. In order to support the power-stroke model, experimental proof for a lever-arm function of mtHsp70 would be required. In addition, experimental proof for an import in regular steps that correlates with the lever-arm dimensions would be required. As it was suggested that the import motor could switch from the trapping to the pulling mode to achieve unfolding by one power stroke, such a switching mechanism must be supported by experimental evidence as well. A promising approach to address the mechanism of the import motor will be to reconstitute the import motor with purified components. Initial experiments have already been performed [69]. The observations made in this study are consistent with a Brownian ratchet, but not with a power-stroke mode. However, this study did not include the complete set of import motor components. The discovery of two new essential components of the import motor indicates that the motor is more complex than previously thought [24, 32, 59, 68, 69]. The search for further possible components of the motor will be a major task in the future, as it might provide us with new insights into the action of the import motor during the trans-location process. For example, the membrane integration of the J-protein Tim14, one of the new components, clearly demonstrates that the ATP hydrolysis occurs at the membrane and not in the matrix space. After having identified the complete set of components, it will be necessary to reconstitute the mitochondrial import motor. This will allow us to analyze the system under experimentally defined conditions. One characteristic feature of the import motor is its association with the membraneintegrated translocation channel. Therefore, in the long run the import motor will have to be reconstituted into proteoliposomes together with the components of the translocation channel.
References 1 BAUER, M. F., HOFMANN, S., NEUPERT, W. and BRUNNER, M. (2000) Protein translocation into mitochondria: the role of TIM complexes. Trends Cell Biol., 10, 25–31.
2 ENDO, T., YAMAMOTO, H. and ESAKI, M. (2003) Functional cooperation and separation of translocators in protein import into mitochondria, the double-membrane bounded
19
20 Proteins Import into Mitochondria
3
4
5
6
7
8
9
10
11
12
13
14
15
organelles. J. Cell Sci., 116, 3259– 3267. JENSEN, R. and DUNN, C. (2002) Protein import into and across the mitochondrial inner membrane: role of the TIM23 and TIM22 translocons. Biochim. Biophys. Acta, 1592, 25. NEUPERT, W. (1997) Protein import into mitochondria. Annu. Rev. Biochem., 66, 863–917. SCHATZ, G. and DOBBERSTEIN, B. (1996) Common principles of protein translocation across membranes. Science, 271, 1519–1526. TRUSCOTT, K. N., BRANDNER, K. and PFANNER, N. (2003) Mechanisms of protein import into mitochondria. Curr. Biol., 13, R326–R337. HOLROYD, C. and ERDMANN, R. (2001) Protein translocation machineries of peroxisomes. FEBS Lett., 501, 6–10. ROBINSON, C. and BOLHUIS, A. (2001) Protein targeting by the twin-arginine translocation pathway. Nat. Rev. Mol. Cell Biol., 2, 350–356. DRIESSEN, A. J., MANTING, E. H. and van der DOES, C. (2001) The structural basis of protein targeting and translocation in bacteria. Nat. Struct. Biol., 8, 492–498. JARVIS, P. and SOLL, J. (2002) Toc, tic, and chloroplast protein import. Biochim. Biophys. Acta, 1590, 177–189. JOHNSON, A. E. and HAIGH, N. G. (2000) The ER translocon and retrotranslocation: is the shift into reverse manual or automatic?. Cell, 102, 709–712. KEEGSTRA, K. and FROEHLICH, J. E. (1999) Protein import into chloroplasts. Curr. Opin. Plant Biol., 2, 471–476. MATLACK, K. E., MOTHES, W. and RAPOPORT, T. A. (1998) Protein translocation: tunnel vision. Cell, 92, 381–390. MATOUSCHEK, A. (2003) Protein unfolding –an important process in vivo?. Curr. Opin. Struct. Biol., 13, 98–109. NEUPERT, W. and BRUNNER, M. (2002) The protein import motor of mitochondria. Nat. Rev. Mol. Cell Biol., 3, 555–565.
16 PASCHEN, S. A. and NEUPERT, W. (2001) Protein import into mitochondria. IUBMB Life, 52, 101–112. 17 PFANNER, N. and CHACINSKA, A. (2002) The mitochondrial import machinery: preprotein-conducting channels with binding sites for presequences. Biochim. Biophys. Acta, 1592, 15. 18 RAPAPORT, D. (2002) Biogenesis of the mitochondrial TOM complex. Trends Biochem. Sci., 27, 191–197. 19 DESHAIES, R. J., KOCH, B. D., WERNER-WASHBURNE, M., CRAIG, E. A. and SCHEKMAN, R. (1988) A subfamily of stress proteins facilitates translocation of secretory and mitochondrial precursor polypeptides. Nature, 332, 800–805. 20 HACHIYA, N., KOMIYA, T., ALAM, R., IWAHASHI, J., SAKAGUCHI, M., OMURA, T. and MIHARA, K. (1994) MSF, a novel cytoplasmic chaperone which functions in precursor targeting to mitochondria. EMBO J., 13, 5146–5154. 21 MURAKAMI, H., PAIN, D. and BLOBEL, G. (1988) 7 0-kD heat shock-related protein is one of at least two distinct cytosolic factors stimulating protein import into mitochondria. J. Cell. Biol., 107, 2051–2057. 22 YOUNG, J. C., HOOGENRAAD, N. J. and HARTL, F. U. (2003) Molecular chaperones Hsp90 and Hsp70 deliver preproteins to the mitochondrial import receptor Tom70. Cell, 112, 41–50. 23 KOEHLER, C. M. (2000) Protein translocation pathways of the mitochondrion. FEBS Lett., 476, 27–31. 24 REHLING, P., PFANNER, N. and MEISINGER, C. (2003) Insertion of hydrophobic membrane proteins into the inner mitochondrial membrane –a guided tour. J. Mol. Biol., 326, 639–657. 25 GENTLE, I., GABRIEL, K., BEECH, P., WALLER, R. and LITHGOW, T. (2004) The Omp85 family of proteins is essential for outer membrane biogenesis in mitochondria and bacteria. J. Cell Biol., 164, 19–24. 26 KOZJAK, V., WIEDEMANN, N., MILENKOVIC, D., LOHAUS, C., MEYER, H. E., GUIARD, B., MEISINGER, C. and PFANNER, N. (2003) An essential role of Sam50 in the protein sorting and assembly machinery of the
References
27
28
29
30
31
32
33
34
35
mitochondrial outer membrane. J. Biol. Chem., 278, 48520–48523. PASCHEN, S. A., WAIZENEGGER, T., STAN, T., PREUSS, M., CYRKLAFF, M., HELL, K., RAPAPORT, D. and NEUPERT, W. (2003) Evolutionary conservation of biogenesis of beta-barrel membrane proteins. Nature, 426, 862–866. WIEDEMANN, N., KOZJAK, V., CHACINSKA, A., SCHONFISCH, B., ROSPERT, S., RYAN, M. T., PFANNER, N. and MEISINGER, C. (2003) Machinery for protein sorting and assembly in the mitochondrial outer membrane. Nature, 424, 565–571. LEE, C. M., NEUPERT, W. and STUART, R. A. (2000) Mitochondrial targeting signals. In J. A. F. OP DEN KAMP, (ed.), Protein, Lipid and Membrane Traffic: Pathways and Targeting. IOS Press, Amsterdam, pp. 151–159. VON HEIJNE, G. (1986) Mitochondrial targeting sequences may form amphiphilic helices. EMBO J., 5, 1335–1342. ENDO, T. and KOHDA, D. (2002) Functions of outer membrane receptors in mitochondrial protein import. Biochim. Biophys. Acta, 1592, 3. H¨ONLINGER, A., K¨UBRICH, M., MOCZKO, M., G¨ARTNER, F., MALLET, L., BUSSEREAU, F., ECKERSKORN, C., LOTTSPEICH, F., DIETMEIER, K., JACQUET, M. and PFANNER, N. (1995) The mitochondrial receptor complex: Mom22 is essential for cell viability and directly interacts with preproteins. Mol. Cell. Biol., 15, 3382–3389. KIEBLER, M., KEIL, P., SCHNEIDER, H., van der KLEI, I. J., PFANNER, N. and NEUPERT, W. (1993) The mitochondrial receptor complex: a central role of MOM22 in mediating preprotein transfer from receptors to the general insertion pore. Cell, 74, 483–492. AHTING, U., THUN, C., HEGERL, R., TYPKE, D., NARGANG, F. E., NEUPERT, W. and NUSSBERGER, S. (1999) The TOM core complex: the general protein import pore of the outer membrane of mitochondria. J. Cell Biol., 147, 959–968. DEKKER, P. J., RYAN, M. T., BRIX, J., MULLER, H., HONLINGER, A. and PFANNER, N. (1998) Preprotein
36
37
38
39
40
41
42 43
44
translocase of the outer mitochondrial membrane: molecular dissection and assembly of the general import pore complex. Mol. Cell. Biol., 18, 6515–6524. AHTING, U., THIEFFRY, M., ENGELHARDT, H., HEGERL, R., NEUPERT, W. and NUSSBERGER, S. (2001) Tom40, the poreforming component of the protein-conducting TOM channel in the outer membrane of mitochondria. J. Cell Biol., 153, 1151–1160. HILL, K., MODEL, K., RYAN, M. T., DIETMEIER, K., MARTIN, F., WAGNER, R. and PFANNER, N. (1998) Tom40 forms the hydrophilic channel of the mitochondrial import pore for preproteins. Nature, 395, 516–521. MAYER, A., NEUPERT, W. and LILL, R. (1995) Mitochondrial protein import: Reversible binding of the presequence at the trans side of the outer membrane drives partial translocation and unfolding. Cell, 80, 127–137. RAPAPORT, D., NEUPERT, W. and LILL, R. (1997) Mitochondrial protein import. Tom40 plays a major role in targeting and translocation of preproteins by forming a specific binding site for the presequence. J. Biol. Chem., 272, 18725–18731. BOLLIGER, L., JUNNE, T., SCHATZ, G. and LITHGOW, T. (1995) Acidic receptor domains on both sides of the outer membrane mediate translocation of precursor proteins into yeast mitochondria. EMBO J., 14, 6318–6326. RAPAPORT, D., KUNKELE, K. P., DEMBOWSKI, M., AHTING, U., NARGANG, F. E., NEUPERT, W. and LILL, R. (1998) Dynamics of the TOM complex of mitochondria during binding and translocation of preproteins. Mol. Cell. Biol., 18, 5256–5262. SCHATZ, G. (1997) Just follow the acid chain. Nature, 388, 121–122. BAUER, M. F., SIRRENBERG, C., NEUPERT, W. and BRUNNER, M. (1996) Role of Tim23 as voltage sensor and presequence receptor in protein import into mitochondria. Cell, 87, 33–41. MARTIN, J., MAHLKE, K. and PFANNER, N. (1991) Role of an energized inner membrane in mitochondrial protein
21
22 Proteins Import into Mitochondria
45
46
47
48
49
50
51
52
53
import. Delta psi drives the movement of presequences. J. Biol. Chem., 266, 18051–18057. HUANG, S., RATLIFF, K. S. and MATOUSCHEK, A. (2002) Protein unfolding by the mitochondrial membrane potential. Nat. Struct. Biol., 9, 301–307. HERRMANN, J. M. and NEUPERT, W. (2000) What fuels polypeptide translocation? An energetical view on mitochondrial protein sorting. Biochim. Biophys. Acta, 1459, 331–338. MATOUSCHEK, A., PFANNER, N. and VOOS, W. (2000) Protein unfolding by mitochondria. The Hsp70 import motor. EMBO Rep., 1, 404–410. PFANNER, N. and GEISSLER, A. (2001) Versatility of the mitochondrial protein import machinery. Nat. Rev. Mol. Cell. Biol., 2, 339–349. MILLER, B. R. and CUMSKY, M. G. (1993) Intramitochondrial sorting of the precursor to yeast cytochrome c oxidase subunit Va. J. Cell Biol., 121, 1021– 1029. ROJO, E. E., GUIARD, B., NEUPERT, W. and STUART, R. A. (1998) Sorting of D-lactate dehydrogenase to the inner membrane of mitochondria. Analysis of topogenic signal and energetic requirements. J. Biol. Chem., 273, 8040–8047. G¨ARTNER, F., VOOS, W., QUEROL, A., MILLER, B. R., CRAIG, E. A., CUMSKY, M. G. and PFANNER, N. (1995) Mitochondrial import of subunit Va of cytochrome c oxidase characterized with yeast mutants –Independence from receptors, but requirement for matrix hsp70 translocase function. J. Biol. Chem., 270, 3788–3795. DONZEAU, M., KALDI, K., ADAM, A., PASCHEN, S., WANNER, G., GUIARD, B., BAUER, M. F., NEUPERT, W. and BRUNNER, M. (2000) Tim23 links the inner and outer mitochondrial membranes. Cell, 101, 401–412. GEISSLER, A., CHACINSKA, A., TRUSCOTT, K. N., WIEDEMANN, N., BRANDNER, K., SICKMANN, A., MEYER, H. E., MEISINGER, C., PFANNER, N. and REHLING, P. (2002) The mitochondrial presequence translocase: an essential role of Tim50 in
54
55
56
57
58
59
60
61
directing preproteins to the import channel. Cell, 111, 507–518. MOKRANJAC, D., PASCHEN, S. A., KOZANY, C., PROKISCH, H., HOPPINS, S. C., NARGANG, F. E., NEUPERT, W. and HELL, K. (2003a) Tim50, a novel component of the TIM23 preprotein translocase of mitochondria. EMBO J., 22, 816–825. YAMAMOTO, H., ESAKI, M., KANAMORI, T., TAMURA, Y., NISHIKAWA, S. and ENDO, T. (2002) Tim50 is a subunit of the TIM23 complex that links protein translocation across the outer and inner mitochondrial membranes. Cell, 111, 519–528. BERTHOLD, J., BAUER, M. F., SCHNEIDER, H. C., KLAUS, C., DIETMEIER, K., NEUPERT, W. and BRUNNER, M. (1995) The MIM complex mediates preprotein translocation across the mitochondrial inner membrane and couples it to the mtHsp70/ATP driving system. Cell, 81, 1085–1093. DEKKER, P. J., KEIL, P., RASSOW, J., MAARSE, A. C., PFANNER, N. and MEIJER, M. (1993) Identification of MIM23, a putative component of the protein import machinery of the mitochondrial inner membrane. FEBS Lett., 330, 66–70. DEKKER, P. J., MARTIN, F., MAARSE, A. C., BOMER, U., MULLER, H., GUIARD, B., MEIJER, M., RASSOW, J. and PFANNER, N. (1997) The Tim core complex defines the number of mitochondrial translocation contact sites and can hold arrested preproteins in the absence of matrix Hsp70-Tim44. EMBO J., 16, 5408–5419. EMTAGE, J. L. and JENSEN, R. E. (1993) MAS6 encodes an essential inner membrane component of the yeast mitochondrial protein import pathway. J. Cell Biol., 122, 1003–1012. MAARSE, A. C., BLOM, J., KEIL, P., PFANNER, N. and MEIJER, M. (1994) Identification of the essential yeast protein MIM17, an integral mitochondrial inner membrane protein involved in protein import. FEBS Lett., 349, 215–221. MILISAV, I., MORO, F., NEUPERT, W. and BRUNNER, M. (2001) Modular structure of the TIM23 preprotein translocase of mitochondria. J. Biol. Chem., 276, 25856–25861.
References 62 RYAN, K. R. and JENSEN, R. E. (1993) Mas6p can be cross-linked to an arrested precursor and interacts with other proteins during mitochondrial protein import. J. Biol. Chem., 268, 23743–23746. 63 RYAN, K. R., MENOLD, M. M., GARRETT, S. and JENSEN, R. E. (1994) SMS1, a high-copy suppressor of the yeast mas6 mutant, encodes an essential inner membrane protein required for mitochondrial protein import. Mol. Biol. Cell, 5, 529–538. 64 TRUSCOTT, K. N., KOVERMANN, P., GEISSLER, A., MERLIN, A., MEIJER, M., DRIESSEN, A. J., RASSOW, J., PFANNER, N. and WAGNER, R. (2001) A presequence – and voltage-sensitive channel of the mitochondrial preprotein translocase formed by Tim23. Nat. Struct. Biol., 8, 1074–1082. 65 D’SILVA, P. D., SCHILKE, B., WALTER, W., ANDREW, A. and CRAIG, E. A. (2003) J protein cochaperone of the mitochondrial inner membrane required for protein import into the mitochondrial matrix. Proc. Natl. Acad. Sci. USA, 100, 13839–13844. 66 FRAZIER, A., DUDEK, J., GUIARD, B., VOOS, W., LI, Y., LIND, M., MEISINGER, C., GEISSLER, A., SICKMANN, A., MEYER, H. E., BILANCHONE, V., CUMSKY, M. G., TRUSCOTT, K. N., PFANNER, N. and REHLING, P. (2004) Pam16 plays an essential role in the mitochondrial protein import motor. Nature Struct. Mol. Biol., 11, 226–233. 67 KOZANY, C., MOKRANJAC, D., SICHTING, M., NEUPERT, W., and HELL, K. (2004) The J-domain related co-chaperone Tim16 is a constituent of the mitochondrial TIM23 preprotein translocase. Nature Struct. Mol. Biol., 11, 234–241. 68 MOKRANJAC, D., SICHTING, M., NEUPERT, W. and HELL, K. (2003) Tim14, a novel key component of the import motor of the TIM23 protein translocase of mitochondria. EMBO J., 22, 4945–4956. 69 TRUSCOTT, K. N., VOOS, W., FRAZIER, A. E., LIND, M., LI, Y., GEISSLER, A., DUDEK, J., MULLER, H., SICKMANN, A., MEYER, H. E., MEISINGER, C., GUIARD, B., REHLING, P. and PFANNER, N. (2003) A J-protein is
70
71
72
73
74
75
76
77
an essential subunit of the presequence translocase-associated protein import motor of mitochondria. J. Cell Biol., 163, 707–713. KRONIDOU, N. G., OPPLIGER, W., BOLLIGER, L., HANNAVY, K., GLICK, B. S., SCHATZ, G. and HORST, M. (1994) Dynamic interaction between Isp45 and mitochondrial hsp70 in the protein import system of the yeast mitochondrial inner membrane. Proc. Natl. Acad. Sci. U.S.A., 91, 12818–12822. RASSOW, J., MAARSE, A. C., KRAINER, E., KUBRICH, M., MULLER, H., MEIJER, M., CRAIG, E. A. and PFANNER, N. (1994) Mitochondrial protein import: biochemical and genetic evidence for interaction of matrix hsp70 and the inner membrane protein MIM44. J. Cell Biol., 127, 1547–1556. SCHNEIDER, H. C., BERTHOLD, J., BAUER, M. F., DIETMEIER, K., GUIARD, B., BRUNNER, M. and NEUPERT, W. (1994) Mitochondrial Hsp70/MIM44 complex facilitates protein import. Nature, 371, 768–774. BUKAU, B. and HORWICH, A. L. (1998) The Hsp70 and Hsp60 chaperone machines. Cell, 92, 351–366. KRIMMER, T., RASSOW, J., KUNAU, W. H., VOOS, W. and PFANNER, N. (2000) Mitochondrial protein import motor: the ATPase domain of matrix Hsp70 is crucial for binding to Tim44, while the peptide binding domain and the carboxy-terminal segment play a stimulatory role. Mol. Cell. Biol. 20, 5879–5887. MORO, F., OKAMOTO, K., DONZEAU, M., NEUPERT, W. and BRUNNER, M. (2002) Mitochondrial protein import: molecular basis of the ATP-dependent interaction of MtHsp70 with Tim44. J. Biol. Chem., 277, 6874–6880. STRUB, A., ROTTGERS, K. and VOOS, W. (2002) The Hsp70 peptide-binding domain determines the interaction of the ATPase domain with Tim44 in mitochondria. EMBO J., 21, 2626–2635. GRAGEROV, A., ZENG, L., ZHAO, X., BURKHOLDER, W. and GOTTESMAN, M. E. (1994) Specificity of DnaK-peptide binding. J. Mol. Biol., 235, 848–854.
23
24 Proteins Import into Mitochondria
78 LIU, Q., D’SILVA, P., WALTER, W., MARSZALEK, J. and CRAIG, E. A. (2003) Regulated cycling of mitochondrial Hsp70 at the protein import channel. Science, 300, 139–141. 79 RUDIGER, S., BUCHBERGER, A. and BUKAU, B. (1997a) Interaction of Hsp70 chaperones with substrates. Nat. Struct. Biol., 4, 342–349. 80 SCHNEIDER, H. C., WESTERMANN, B., NEUPERT, W. and BRUNNER, M. (1996) The nucleotide exchange factor MGE exerts a key function in the ATP-dependent cycle of mt-Hsp70-Tim44 interaction driving mitochondrial protein import. EMBO J., 15, 5796–5803. 81 UNGERMANN, C., GUIARD, B., NEUPERT, W. and CYR, D. M. (1996) The delta psiand Hsp70/MIM44-dependent reaction cycle driving early steps of protein import into mitochondria. EMBO J., 15, 735–744. 82 UNGERMANN, C., NEUPERT, W. and CYR, D. M. (1994) The role of Hsp70 in conferring unidirectionality on protein translocation into mitochondria. Science, 266, 1250–1253. 83 MORO, F., SIRRENBERG, C., SCHNEIDER, H. C., NEUPERT, W. and BRUNNER, M. (1999) The TIM17.23 preprotein translocase of mitochondria: composition and function in protein transport into the matrix. EMBO J., 18, 3667–3675. 84 LALORAYA, S., GAMBILL, B. D. and CRAIG, E. A. (1994) A role for a eukaryotic GrpE-related protein, Mge1p, in protein translocation. Proc. Natl. Acad. Sci. U.S.A., 91, 6481–6485. 85 NAKAI, M., KATO, Y., IKEDA, E., TOHE, A. and ENDO, T. (1994) Yge1p, a eukaryotic Grp-E homolog, is localized in the mitochondrial matrix and interacts with mitochondrial Hsp70. Biochem. Biophys. Res. Commun. 200, 435–442. 86 WESTERMANN, B., PRIP-BUUS, C., NEUPERT, W. and SCHWARZ, E. (1995) The role of the GrpE homologue, Mge1p, in mediating protein import and protein folding in mitochondria. EMBO J., 14, 3452–3460. 87 CHENG, M. Y., HARTL, F. U., MARTIN, J., POLLOCK, R. A., KALOUSEK, F., NEUPERT,
88
89
90
91
92
93
94
95
W., HALLBERG, E. M., HALLBERG, R. L. and HORWICH, A. L. (1989) Mitochondrial heat-shock protein hsp60 is essential for assembly of proteins imported into yeast mitochondria. Nature, 337, 620–625. KANG, P. J., OSTERMANN, J., SHILLING, J., NEUPERT, W., CRAIG, E. A. and PFANNER, N. (1990) Requirement for hsp70 in the mitochondrial matrix for translocation and folding of precursor proteins. Nature, 348, 137–143. OSTERMANN, J., HORWICH, A. L., NEUPERT, W. and HARTL, F. U. (1989) Protein folding in mitochondria requires complex formation with hsp60 and ATP hydrolysis. Nature, 341, 125–130. RASSOW, J., MOHRS, K., KOIDL, S., BARTHELMESS, I. B., PFANNER, N. and TROPSCHUG, M. (1995) Cyclophilin 20 is involved in mitochondrial protein folding in cooperation with molecular chaperones Hsp70 and Hsp60. Mol. Cell. Biol., 15, 2654–2662. KUNKELE, K. P., HEINS, S., DEMBOWSKI, M., NARGANG, F. E., BENZ, R., THIEFFRY, M., WALZ, J., LILL, R., NUSSBERGER, S. and NEUPERT, W. (1998) The preprotein translocation channel of the outer membrane of mitochondria. Cell, 93, 1009–1019. MODEL, K., PRINZ, T., RUIZ, T., RADERMACHER, M., KRIMMER, T., KUHLBRANDT, W., PFANNER, N. and MEISINGER, C. (2002) Protein Translocase of the Outer Mitochondrial Membrane: Role of Import Receptors in the Structural Organization of the TOM Complex. J. Mol. Biol., 316, 657–666. REHLING, P., MODEL, K., BRANDNER, K., KOVERMANN, P., SICKMANN, A., MEYER, H. E., KUHLBRANDT, W., WAGNER, R., TRUSCOTT, K. N. and PFANNER, N. (2003a) Protein insertion into the mitochondrial inner membrane by a twin-pore translocase. Science, 299, 1747–1751. SCHWARTZ, M. P., HUANG, S. and MATOUSCHEK, A. (1999) The structure of precursor proteins during import into mitochondria. J. Biol. Chem., 274, 12759–12764. VESTWEBER, D. and SCHATZ, G. (1988) Mitochondria can import artificial
References
96
97
98
99
100
101
102
103
104
precursor proteins containing a branched polypeptide chain or a carboxy-terminal stilbene disulfonate. J. Cell Biol., 107, 2045–2049. VESTWEBER, D. and SCHATZ, G. (1989) DNA-protein conjugates can enter mitochondria via the protein import pathway. Nature, 338, 170–172. CURRAN, S. P., LEUENBERGER, D., SCHMIDT, E. and KOEHLER, C. M. (2002) The role of the Tim8p-Tim13p complex in a conserved import pathway for mitochondrial polytopic inner membrane proteins. J. Cell Biol., 158, 1017–1027. ENDRES, M., NEUPERT, W. and BRUNNER, M. (1999) Transport of the ADP/ATP carrier of mitochondria from the TOM complex to the TIM22.54 complex. EMBO J., 18, 3214–3221. WIEDEMANN, N., PFANNER, N. and RYAN, M. T. (2001) The three modules of ADP/ATP carrier cooperate in receptor recruitment and translocation into mitochondria. EMBO J., 20, 951–960. FOELSCH, H., GUIARD, B., NEUPERT, W. and STUART, R. A. (1996) Internal targeting signal of the BCS1 protein: a novel mechanism of import into mitochondria. EMBO J., 15, 479–487. STAN, T., BRIX, J., SCHNEIDER-MERGENER, J., PFANNER, N., NEUPERT, W. and RAPAPORT, D. (2003) Mitochondrial protein import: recognition of internal import signals of BCS1 by the TOM complex. Mol. Cell. Biol., 23, 2239–2250. HURT, E. C., PESOLD-HURT, B. and SCHATZ, G. (1984) The cleavable prepiece of an imported mitochondrial protein is sufficient to direct cytosolic dihydrofolate reductase into the mitochondrial matrix. FEBS Lett., 178, 306–310. EILERS, M. and SCHATZ, G. (1986) Binding of a specific ligand inhibits import of a purified precursor protein into mitochondria. Nature, 322, 228–232. GAUME, B., KLAUS, C., UNGERMANN, C., GUIARD, B., NEUPERT, W. and BRUNNER, M. (1998) Unfolding of preproteins upon import into mitochondria. EMBO J., 17, 6497–6507.
105 GLICK, B. S., WACHTER, C., REID, G. A. and SCHATZ, G. (1993) Import of cytochrome b2 to the mitochondrial intermembrane space: the tightly folded heme-binding domain makes import dependent upon matrix ATP. Protein Sci., 2, 1901–1917. 106 HUANG, S., RATLIFF, K. S., SCHWARTZ, M. P., SPENNER, J. M. and MATOUSCHEK, A. (1999) Mitochondria unfold precursor proteins by unraveling them from their N-termini. Nat. Struct. Biol., 6, 1132–1138. 107 MATOUSCHEK, A., AZEM, A., RATLIFF, K., GLICK, B. S., SCHMID, K. and SCHATZ, G. (1997) Active unfolding of precursor proteins during mitochondrial protein import. EMBO J., 16, 6727–6736. 108 SCHWARTZ, M. P. and MATOUSCHEK, A. (1999) The dimensions of the protein import channels in the outer and inner mitochondrial membranes. Proc. Natl. Acad. Sci. U.S.A., 96, 13086–13090. 109 WIENHUES, U., BECKER, K., SCHLEYER, M., GUIARD, B., TROPSCHUG, M., HORWICH, A. L., PFANNER, N. and NEUPERT, W. (1991) Protein folding causes an arrest of preprotein translocation into mitochondria in vivo. J. Cell Biol., 115, 1601–1609. 110 BOMER, U., MEIJER, M., GUIARD, B., DIETMEIER, K., PFANNER, N. and RASSOW, J. (1997) The sorting route of cytochrome b2 branches from the general mitochondrial import pathway at the preprotein translocase of the inner membrane. J. Biol. Chem., 272, 30439–30446. 111 NEUPERT, W., HARTL, F. U., CRAIG, A. and PFANNER, N. (1990) How do polypeptides cross the mitochondrial membranes?. Cell, 63, 447–450. 112 SIMON, S. M., PESKIN, C. S. and OSTER, G. F. (1992) What drives the translocation of proteins? Proc. Natl. Acad. Sci. USA, 89, 3770–3774. 113 GLICK, B. S. (1995) Can Hsp70 proteins act as force-generating motors?. Cell, 80, 11–14. 114 OKAMOTO, K., BRINKER, A., PASCHEN, S. A., MOAREFI, I., HAYER-HARTL, M., NEUPERT, W. and BRUNNER, M. (2002) The protein import motor of
25
26 Proteins Import into Mitochondria
115
116
117
118
119
120
121
122
123
124
mitochondria: a targeted molecular ratchet driving unfolding and translocation. EMBO J., 21, 3659–3671. JENSEN, R. E. and JOHNSON, A. E. (1999) Protein translocation: is Hsp70 pulling my chain? Curr. Biol., 9, R779–782. PFANNER, N. and MEIJER, M. (1995) Protein sorting. Pulling in the proteins. Curr. Biol., 5, 132–135. VOOS, W. and ROTTGERS, K. (2002) Molecular chaperones as essential mediators of mitochondrial biogenesis. Biochim. Biophys. Acta, 1592, 51. CHAUWIN, J. F., OSTER, G. and GLICK, B. S. (1998) Strong precursorpore interactions constrain models for mitochondrial protein import. Biophys. J, 74, 1732–1743. ZHUANG, X., HA, T., KIM, H. D., CENTNER, T., LABEIT, S. and CHU, S. (2000) Fluorescence quenching: A tool for single-molecule protein-folding study. Proc. Natl. Acad. Sci. USA, 97, 14241–14244. LIM, J. H., MARTIN, F., GUIARD, B., PFANNER, N. and VOOS, W. (2001) The mitochondrial Hsp70-dependent import system actively unfolds preproteins and shortens the lag phase of translocation. EMBO J., 20, 941–950. EILERS, M., HWANG, S. and SCHATZ, G. (1988) Unfolding and refolding of a purified precursor protein during import into isolated mitochondria. EMBO J., 7, 1139–1145. OSTERMANN, J., VOOS, W., KANG, P. J., CRAIG, E. A., NEUPERT, W. and PFANNER, N. (1990) Precursor proteins in transit through mitochondrial contact sites interact with hsp70 in the matrix. FEBS Lett., 277, 281–284. VESTWEBER, D. and SCHATZ, G. (1988) Point mutations destabilizing a precursor protein enhance its post-translational import into mitochondria. EMBO J., 7, 1147– 1151. GAMBILL, B. D., VOOS, W., KANG, P. J., MIAO, B., LANGER, T., CRAIG, E. A. and PFANNER, N. (1993) A dual role for mitochondrial heat shock protein 70 in membrane translocation of preproteins. J. Cell Biol., 123, 109–117.
125 VOISINE, C., CRAIG, E. A., ZUFALL, N., VON AHSEN, O., PFANNER, N. and VOOS, W. (1999) The protein import motor of mitochondria: Unfolding and trapping of preproteins are distinct and separable functions of matrix Hsp70. Cell, 97, 565–574. 126 VOOS, W., VON AHSEN, O., MULLER, H., GUIARD, B., RASSOW, J. and PFANNER, N. (1996) Differential requirement for the mitochondrial Hsp70-Tim44 complex in unfolding and translocation of preproteins. EMBO J., 15, 2668–2677. 127 RASSOW, J., HARTL, F. U., GUIARD, B., PFANNER, N. and NEUPERT, W. (1990) Polypeptides traverse the mitochondrial envelope in an extended state. FEBS Lett., 275, 190–194. 128 CARRION-VAZQUEZ, M., OBERHAUSER, A. F., FOWLER, S. B., MARSZALEK, P. E., BROEDEL, S. E., CLARKE, J. and FERNANDEZ, J. M. (1999) Mechanical 129 LI, H., OBERHAUSER, A. F., FOWLER, S. B., CLARKE, J. and FERNANDEZ, J. M. (2000) Atomic force microscopy reveals the mechanical design of a modular protein. Proc. Natl. Acad. Sci. USA, 97, 6527–6531. 130 MARSZALEK, P. E., LU, H., LI, H., CARRION-VAZQUEZ, M., OBERHAUSER, A. F., SCHULTEN, K. and FERNANDEZ, J. M. (1999) Mechanical unfolding intermediates in titin modules. Nature, 402, 100–103. 131 SMITH, D. A. and RADFORD, S. E. (2000) Protein folding: pulling back the frontiers. Curr. Biol., 10, R662–664. 132 SPUDICH, J. A. (1994) How molecular motors work. Nature, 372, 515–518. 133 SPUDICH, J. A. (2001) The myosin swinging cross-bridge model. Nat. Rev. Mol. Cell Biol., 2, 387–392. 134 STAN, T., AHTING, U., DEMBOWSKI, M., KUNKELE, K. P., NUSSBERGER, S., NEUPERT, W. and RAPAPORT, D. (2000) Recognition of preproteins by the isolated TOM complex of mitochondria. EMBO J., 19, 4895–4902. 135 RASSOW, J., GUIARD, B., WIENHUES, U., HERZOG, V., HARTL, F. U. and NEUPERT, W. (1989) Translocation arrest by reversible folding of a precursor protein imported into mitochondria. A means to
References
136
137
138
139
140
141
142
143
144
quantitate translocation contact sites. J. Cell Biol., 109, 1421–1428. HARTL, F. U. and HAYER-HARTL, M. (2002) Molecular chaperones in the cytosol: from nascent chain to folded protein. Science, 295, 1852–1858. VOOS, W., GAMBILL, B. D., GUIARD, B., PFANNER, N. and CRAIG, E. A. (1993) Presequence and mature part of preproteins strongly influence the dependence of mitochondrial protein import on heat shock protein 70 in the matrix. J. Cell Biol., 123, 119–126. MERLIN, A., VOOS, W., MAARSE, A. C., MEIJER, M., PFANNER, N. and RASSOW, J. (1999) The J-related segment of tim44 is essential for cell viability: a mutant Tim44 remains in the mitochondrial import site, but inefficiently recruits mtHsp70 and impairs protein translocation. J. Cell Biol., 145, 961–972. RASSOW, J. and PFANNER, N. (2000) The protein import machinery of the mitochondrial membranes. Traffic, 1, 457–464. BOMER, U., MAARSE, A. C., MARTIN, F., GEISSLER, A., MERLIN, A., SCHONFISCH, B., MEIJER, M., PFANNER, N. and RASSOW, J. (1998) Separation of structural and dynamic functions of the mitochondrial translocase: Tim44 is crucial for the inner membrane import sites in translocation of tightly folded domains, but not of loosely folded preproteins. EMBO J., 17, 4226–4237. LIU, Q., KRZEWSKA, J., LIBEREK, K. and CRAIG, E. A. (2001) Mitochondrial Hsp70 Ssc1: role in protein folding. J. Biol. Chem., 276, 6112–6118. RUDIGER, S., GERMEROTH, L., SCHNEIDER-MERGENER, J. and BUKAU, B. (1997b) Substrate specificity of the DnaK chaperone determined by screening cellulose-bound peptide libraries. EMBO J., 16, 1501–1507. SCHMID, D., BAICI, A., GEHRING, H. and CHRISTEN, P. (1994) Kinetics of molecular chaperone action. Science, 263, 971–973. MATLACK, K. E., MISSELWITZ, B., PLATH, K. and RAPOPORT, T. A. (1999) BiP acts as a molecular ratchet during posttranslational transport of
145
146
147
148
149
150
151
152
153
preproalpha factor across the ER membrane. Cell, 97, 553–564. BLAKLEY, R. L. (1995) Eukaryotic dihydrofolate reductase. Adv. Enzymol. Relat. Areas Mol. Biol., 70, 23–102. BOLLIGER, L., DELOCHE, O., GLICK, B. S., GEORGOPOULOS, C., JENO, P., KRONIDOU, N., HORST, M., MORISHIMA, N. and SCHATZ, G. (1994) A mitochondrial homolog of bacterial GrpE interacts with mitochondrial hsp70 and is essential for viability. EMBO J., 13, 1998–2006. CYR, D. M., UNGERMANN, C. and NEUPERT, W. (1995) Analysis of mitochondrial protein import pathway in Saccharomyces cerevisiae with translocation intermediates. Methods Enzymol., 260, 241–252. DAUM, G., BOHNI, P. C. and SCHATZ, G. (1982) Import of proteins into mitochondria. Cytochrome b2 and cytochrome c peroxidase are located in the intermembrane space of yeast mitochondria. J. Biol. Chem., 257, 13028–13033. HERRMANN, J. M., FOELSCH, H., NEUPERT, W. and STUART, R. A. (1994) Isolation of yeast mitochondria and study of mitochondrial protein translation. In CELIS, J. E. (ed.), Cell Biology: A Laboratory Handbook. Academic Press, San Diego, Vol. 1, pp. 538–544. KURZ, M., MARTIN, H., RASSOW, J., PFANNER, N. and RYAN, M. T. (1999) Biogenesis of Tim proteins of the mitochondrial carrier import pathway: differential targeting mechanisms and crossing over with the main import pathway. Mol. Biol. Cell, 10, 2461–2474. OEFNER, C., D’ ARCY, A. and WINKLER, F. K. (1988) Crystal structure of human dihydrofolate reductase complexed with folate. Eur. J. Biochem., 174, 377–385. PALLOTTI, F. and LENAZ, G. (2001) Isolation and subfractionation of mitochondria from animal cells and tissue culture lines. Methods Cell Biol., 65, 1–35. PELHAM, H. R. B. and JACKSON, R. J. (1976) An efficient mRNA-dependent translation system from reticulocyte lysates. Eur. J. Biochem., 67, 247– 256.
27
28 Proteins Import into Mitochondria
154 STAMMERS, D. K., CHAMPNESS, J. N., BEDDELL, C. R., DANN, J. G., ELIOPOULOS, E., GEDDES, A. J., OGG, D. and NORTH, A. C. (1987) The structure of mouse L1210 dihydrofolate reductase-drug complexes and the construction of a model of human enzyme. FEBS Lett., 218, 178–184. 155 TOUCHETTE, N. A., PERRY, K. M. and MATTHEWS, C. R. (1986) Folding of dihydrofolate reductase from Escherichia coli. Biochemistry, 25, 5445–5452. 156 VESTWEBER, D., BRUNNER, J., BAKER, A. and SCHATZ, G. (1989) A 42K outer-membrane protein is a component of the yeast mitochondrial protein
import site. Nature, 341, 205–209. 157 WERNER, S. and NEUPERT, W. (1972) Functional and biogenetical heterogeneity of the inner membrane of rat-liver mitochondria. Eur. J. Biochem., 25, 379–396. 158 WIENHUES, U., KOLL, H., BECKER, K., GUIARD, B. and HARTL, F.-U. (1992) Protein targeting to mitochondria. In A Practical Approach to Protein Targeting. IRL (Oxford University Press), pp. 135–159. 159 XIA, Z. X. and MATHEWS, F. S. (1990) Molecular structure of flavocytochrome b2 at 2.4 A resolution. J. Mol. Biol., 212, 837–863.
1
Hsp70 Chaperones
Elizabeth A. Craig, and Peggy Huang University of Wiconsin, Madison, USA
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Hsp70s are ubiquitous molecular chaperones that function in a wide variety of physiological processses, including protein folding, protein translocation across membranes, and assembly/disassembly of multimeric protein complexes. Hsp70s rarely, if ever, function alone, but rather with J-protein (also known as Hsp40 or DnaJ-like protein) partners. All eukaryotic, and the vast majority of prokaryotic, genomes encode Hsp70 and J-proteins. Most, in fact, encode multiple Hsp70 and J-proteins, indicative of the evolution of Hsp70s/J-proteins such that they function in a variety of cellular processes. For example, the yeast genome encodes 14 Hsp70s (Figure 1), while the mouse genome encodes at least 12; the number of J-proteins is even more numerous, with the yeast and mouse genomes encoding approximately 20 and 23, respectively [1]. The larger number of J-proteins likely reflects the fact that a single Hsp70 can function with more than one J-protein, driving the function of an Hsp70 in a particular cellular role. In fact, in some cases, J-proteins act as specificity factors, modulating the activity of Hsp70s or delivering specific substrates to them. This proliferation of Hsp70 and J-protein genes reflects the divergence in their biological functions, the subject of this chapter. In most cases, the basic biochemical properties of Hsp70s and J-proteins utilized when carrying out their functions are the same. Diverse cellular roles result from harnessing these basic biochemical properties in different ways. For example, by tethering a chaperone to a particular cellular location or substrate protein, or by evolving the substrate specificity of these chaperones, they are optimized to function in a particular cellular function. The general structure of Hsp70s has been conserved [2, 3]: a highly conserved N-terminal ATPase domain, followed by a less conserved domain having a cleft in which hydrophobic peptides can bind, and a variable C-terminal 10-kDa region. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Hsp70 Chaperones
Fig. 1 Phylogenetic tree of Hsp70 proteins from the yeast Saccharomyces cerevisiae. Subcellular locations, functions, and J-protein partners are indicated for each Hsp70 family. The tree was created using MegAlign software from DNASTAR (Madison, WI) with the clustal method. See text for references regarding evidence for information on particular genes and their functions. Locations or functions that should be considered tentative assignments are in parentheses.
2 “Soluble” Hsp70s/J-proteins Function in General Protein Folding
ATP binding and hydrolysis in the N-terminus regulate the interaction of the C-terminus with unfolded polypeptides, with ATP hydrolysis stabilizing the interaction. J-proteins transiently interact with the ATPase domain of ATP-bound Hsp70 via their highly conserved J-domains and stimulate its ATPase activity, thus facilitating the chaperone cycle. Some J-proteins themselves are also able to bind poly-peptide substrates, delivering them to Hsp70s. In addition to J-proteins, some Hsp70s also require nucleotide releasing factors to facilitate the exchange of ADP from ATP [2, 3]. In this article, we first focus on Hsp70/J-proteins’ role in the folding of newly synthesized proteins and the recovery of proteins partially denatured by stress, a function carried out by the abundant soluble Hsp70/J-proteins found in all major cellular compartments. Secondly, we discuss Hsp70s/J-proteins, tethered to specific sites within the cell, that are involved in the early stages of protein folding and translocation across membranes. Thirdly, we discuss examples in which Hsp70/Jproteins function in specific cellular processes where folded proteins, rather than unfolded polypeptides, serve as substrates for Hsp70s/J-proteins. In such cases, specific effects on protein conformation, disassembly of multimeric protein complexes, and/or regulation of activity are the typical effects of chaperone action. At the end of the article, we discuss two examples in which a member of the yeast Hsp70 family has evolved a function distinct from that of a chaperone, raising the possibility that during evolution some such proteins have been usurped to play other, perhaps regulatory, roles.
2 “Soluble” Hsp70s/J-proteins Function in General Protein Folding
The “classic” notion of an Hsp70 is an abundant, likely stress-inducible, soluble protein that primarily facilitates the refolding of proteins partially denatured by stress or the folding of newly synthesized proteins. Such Hsp70s are ubiquitous in the cytosol of eubacteria, eukarya, and some archaea, as well as in the matrix of mitochondria and the lumen of the endoplasmic reticulum (ER). 2.1 The Soluble Hsp70 of E. coli, DnaK
The E. coli Hsp70 DnaK is an abundant protein even under optimal growth conditions, making up about 1% of cellular protein. It is also inducible when cells encounter a stress, such as a rapid temperature increase, at which point it can become 3% of cellular protein, making it the most abundant cellular chaperone [4]. DnaK works with its J-protein partner DnaJ and its nucleotide releasing factor GrpE to perform chaperone functions in the cell. The evidence that the DnaK system is critical for preventing the aggregation of cellular proteins upon heat stress and for assisting in the refolding of denatured proteins is incontrovertible [5–10]. At high
3
4 Hsp70 Chaperones
temperatures, 15–25% of cytosolic proteins, comprising 150–200 different species of polypeptides, aggregate after a temperature upshift of cells lacking DnaK [4, 11]. DnaK also plays a role in the folding of newly synthesized proteins, but that function is not unique to DnaK and can be performed by other chaperones, particularly the ribosome-associated chaperone trigger factor. Cells lacking either DnaK or trigger factor are viable and little or no protein aggregation is seen [12, 13]. But a double mutant is inviable; upon depletion of trigger factor in cells lacking DnaK, a dramatic increase in aggregation of more than 40 proteins was observed prior to cell death [12, 13]. Experiments using pulse-labeling of newly synthesized proteins to track the flux of newly synthesized proteins through DnaK revealed that between 5% and 18% of newly made proteins, mainly polypeptides larger than 30 kDa, could be immuno-precipitated with DnaK-specific antibodies in wild-type cells [12]. But, in the absence of trigger factor, this number increased two- to threefold, up to as much as 36% [13]. Together these results suggest that DnaK acts as a chaperone for newly synthesized chains but that function can be taken over by other chaperones, particularly trigger factor, in DnaK’s absence. In turn, DnaK is able to take over some of the chaperone activities of trigger factor in its absence. Such overlaps in chaperone function likely are common and reflect a general plasticity of the chaperone networks. 2.2 Soluble Hsp70s of Major Eukaryotic Cellular Compartments 2.2.1 Eukaryotic Cytosol While the data for eukaryotic cells is less extensive than that available for E. coli, Hsp70s, working with J-protein co-chaperones, are thought to be important for protein folding in the cytosol of eukaryotes as well. In yeast the major cytosolic family of Hsp70s, the Ssa’s, is most closely related to the abundant cytosolic Hsp70s of mammalian cells, Hsc70 and Hsp70. Ssa’s are thought to participate in “bulk” protein folding in the cytosol, mainly in cooperation with the J-protein Ydj1. Perhaps the most compelling evidence that Ssa proteins are involved in protein folding in the cytosol is the fact that yeast cells carrying a temperature-sensitive SSA mutant accumulate newly synthesized ornithine transcarbamylase in a misfolded, but soluble, form after shift to the non-permissive temperature [14]. In mammalian tissue culture cells, Hsc70 has also been implicated in bulk folding of newly synthesized proteins. It has been found to interact with ∼20% of newly synthesized polypeptides [15, 16]. In addition, overexpression of human Hsp70 and the J-protein Hdj2 enhances refolding of heat-inactivated luciferase in vivo [17]. Interestingly, in several eukaryotic model systems for protein aggregation, including overexpression of disease-related proteins such as huntingtin and ataxin, Hsp70s and J-proteins have been found to colocalize with such proteins and/or overexpression of the chaperones have been found to decreases their aggregations (reviewed in Ref. [3]. While in many cases Hsp70s, along with their co-chaperone J-protein, are thought to function alone in protein folding, there are many indications that they also
2 “Soluble” Hsp70s/J-proteins Function in General Protein Folding
function cooperatively with other chaperone systems to assist posttranslational protein folding. In particular, Hsp70s are thought to function with the Hsp90 system, interacting with Hsp90 client proteins prior to interacting with Hsp90 itself [18]. In yeast both the Ssa and the Sse classes of Hsp70s have been implicated in such cooperation [19, 20]. 2.2.2 Matrix of Mitochondria As perhaps expected of an organelle evolved from eubacteria, the major Hsp70 and its J-protein partner in the mitochondrial matrix (called Ssc1 and Mdj1, respectively, in yeast) are more closely related to the bacterial Hsp70 DnaK and its J-protein partner DnaJ than to other Hsp70s of the same species. A nucleotide release factor, related to E. coli GrpE, called Mge1 completes the prokaryotic-type chaperone triad of mitochondria. This chaperone triad is involved in the folding of proteins synthesized on cytosolic ribosomes and imported into the matrix [21–24] as well as of proteins encoded by mitochondrial DNA and translated on mitochondrial ribosomes [23, 25]. Ssc1 also has a second role: a portion of Ssc1 is tethered to the mitochondrial inner membrane and is critical for the process of protein translocation across membranes. 2.2.3 Lumen of the Endoplasmic Reticulum Approximately 30% of the proteins synthesized on cytosolic ribosomes are translocated into ER, where posttranslational modification, folding, and assembly take place. These include secreted and cell-surface proteins as well as resident proteins of the endocytic and exocytic organelles. Hsp70s in the ER interact with these newly translocated proteins, assisting in their folding and assembly and preventing their aggregation. BiP (also known as GRP78) in mammalian cells and its yeast homologue, Kar2, are the best-studied ER lumenal Hsp70s that have been shown to interact with a number of secretory pathway proteins to promote their folding and assembly [26–28]. BiP/Kar2 have also been found to play a role in retaining proteins in the ER that do not mature properly and in directing them to back across the ER membrane to be degraded by the proteasome, a process known as ER quality control (reviewed in Refs. [29–31]). In addition, the interaction between unfolded proteins and BiP/Kar2 has been shown to serve as a sensor for ER stress [32]. Working with ER-membrane-associated J-proteins (Sec63 and Mtj1), BiP/Kar2 are also involved in the early step of protein translocation into ER (see Sections 3.1 and 3.2 for more discussion). In addition to Kar2, a second Hsp70, Lhs1 (also known as Ssi1 or Cer1), has also been found to function in the lumen of the ER in yeast. Unlike KAR2, LHS1 is not an essential gene and is not induced by heat shock. However, lhs1 strains are coldsensitive for growth and accumulate precursor forms of several secretory proteins [33, 34]. Lhs1 is thus believed to play important roles in protein translocation into the ER. Increased ER protein aggregation and degradation after heat stress have also been found in strains lacking Lhs1, suggesting a role in the refolding of denatured proteins in the ER as well [35]. Genetic analysis also strongly imply that it plays a role in unfolded protein response (UPR) in the ER [34, 36]. A number
5
6 Hsp70 Chaperones
of genetic interactions have been reported between Lhs1 and Kar2 [33, 34, 37], indicating a complex relationship between these two Hsp70s. Recently, Steel et al. reported that the ATPase activities of Kar2 and Lhs1 are coupled and coordinately regulated. It was shown that Lhs1 stimulates Kar2’s ATPase activity by serving as a nucleotide exchange factor, which could be a novel function of Hsp70s [38]. Because the mammalian homologues of Kar2 and Lhs1, BiP and GRP170, both bind to unfolded immunoglobulin chains [39, 40], their chaperone functions might be coupled in vivo by this coordinated regulation.
3 “Tethered” Hsp70s/J-proteins: Roles in Protein Folding on the Ribosome and in Protein Translocation
Examples of Hsp70s that are targeted to particular positions within the cell have been known for some time. Of these, the best-studied is the tethering of organellar Hsp70s to the import channel, where they play critical roles in the import of proteins from the cytosol into both the ER and mitochondria. In these cases, Hsp70 binding to unfolded proteins provides a driving force for the import process by favoring vectorial movement of the translocating polypeptide into the organelle, preventing movement back towards the cytosol. In other cases, chaperones tethered to their site of action appear to play a role similar to that of soluble chaperones, but their targeting places them in very close proximity to their protein substrate. One such example is the ribosome-associated Hsp70/J-protein pair Ssb/Zuo1 of yeast, discussed in Section 3.2. 3.1 Membrane-tethered Hsp70/J-protein
Translocation across both the endoplasmic reticulum membrane and the mitochondrial inner membrane requires passage through channels whose narrow width demands that proteins be substantially unfolded. In both cases, organellar Hsp70s tethered to the translocation channel play a significant role in protein translocation into the ER lumen and the mitochondrial matrix. In the ER, Hsp70 is important for both SRP-dependent co-translational and posttranslational translo-cation [41, 42]. In mitochondria, Hsp70 is particularly important for translocation of proteins that are tightly folded [43] and thus need to be unfolded before translocation. In the ER of yeast, Sec63, a polytopic membrane J-protein with a substantial cytosolic domain and a lumenal J-domain, is thought to tether the lumenal Hsp70 Kar2 to the channel, as well as to stimulate its ATPase activity [44]. In mitochondria, the roles of tether and J-protein are separated into (or at least shared by) two proteins, the tether being Tim44 [45–47] and the J-protein being Pam18 [48–50]. Both are associated with the import channel. Tim44 is a peripheral membrane protein, while Pam18 is an integral membrane protein with an N-terminal domain extending into the intermembrane space and a J-domain extending into the matrix.
3 “Tethered” Hsp70s/J-proteins: Roles in Protein Folding on the Ribosome and in Protein Translocation
Pam18 stimulates the ATPase activity of Ssc1; Tim44 does not, but also does not interfere with stimulation by Pam18. While the exact mechanism of Hsp70 function at the channel remains controversial, there is no debate over the idea that Hsp70 makes use of the same basic biochemical properties utilized in other chaperone activities such as protein folding and binding of short hydrophobic peptide segments, which are regulated by ATP binding and hydrolysis, which in turn is modulated by J-proteins [51–53]. This binding drives vectorial movement of the polypeptide chain by sterically blocking movement back through the narrow channel. Minimally, the act of tethering to the channel allows an extremely high local concentration at the channel. It has also been argued that Hsp70 exerts a force upon the incoming chain, making use of an ATP-dependent conformational change to “pull” it into the organelle. Proof of such an unprecedented activity for an Hsp70 awaits more sophisticated biophysical experiments than have been carried out to date. Interestingly, however, there is evidence for an additional and unusual role for Hsp70 at the translocon in the ER membrane: gating the channel [54]. The lumenal Hsp70 is responsible for sealing the translocon pore, thereby maintaining the permeability barrier of a non-translocating pore. Mechanistically, how this occurs and which domains of Hsp70 are involved have not been established. Hsp70s and J-proteins also play a role in protein import into organelles from the cytosolic side. There are multiple reports, both in vivo and in vitro and in both yeast and mammalian cells, that indicate that soluble cytosolic chaperones facilitate translocation, presumably by preventing the premature folding of precursor proteins [51]. While soluble cytosolic chaperones play significant roles in this process, it is intriguing that the J-protein Ydj1 is farnesylated [55] and a portion is found to be associated with the membrane. Whether this association is important for aiding translocation is not clear. However, an intriguing means of targeting chaperoneassociated proteins to mitochondria has been reported by Young et al. [56]. Tom70 is an outer mitochondrial membrane receptor for proteins with internal mitochondrial targeting signals that are present in certain mitochondrial proteins such as membrane carrier proteins of the inner membrane. Tom70 is also a tetratrico peptide repeat (TPR) domain-containing protein. The TPR domain of Tom70 interacts with the C-terminus of Hsp70, targeting it to the mitochondria and thus facilitating the interaction of Tom70 with the targeting signal of the protein. Thus, Tom70 serves to target cytosolic Hsp70 (and Hsp90 in mammalian cells), carrying its cargo to the mitochondrial outer membrane.
3.2 Ribosome-associated Hsp70/J-proteins
One can imagine more than one possible function for molecular chaperones associated with ribosomes. Three examples are discussed below. While none of them has been thoroughly mechanistically defined, they represent three ways that chaperones may function in protein translation and folding of newly synthesized
7
8 Hsp70 Chaperones
poly-peptides: directly interacting with the nascent chain, modulating protein transloca-tion across membranes, and in the process of translation itself. The advantage of tethering molecular chaperones near the exit site of the ribosome in terms of protein folding is perhaps the easiest to envision. As protein translation occurs in a vectorial fashion, a newly synthesized polypeptide emerging from the ribosome cannot fold until the entire domain is synthesized. Being exposed to the crowded cellular environment, ribosome-bound nascent chains are thus prone to misfolding and/or aggregation. In the yeast S. cerevisiae, two types of Hsp70s (Ssb and Ssz1) and one type of J-protein (Zuo1) have been found to nearly quantitatively associate with ribosomes, with Ssz1 and Zuo1 forming a surprisingly stable heterodimer known as RAC (ribosome-associated complex) [57–59]. The ribosome association of Ssb and RAC is believed to situate them in close proximity to nascent chains, thus facilitating their chaperone functions. Actually, Ssb is believed to bind to the ribosome near the polypeptide exit tunnel since it can be cross-linked to short ribosome-bound nascent chains with a length only slightly longer than that required to span the ribosome exit tunnel [60]. It has been shown that Ssb and Zuo1 bind ribosomes independently, whereas Ssz1 associates with ribosome via its interaction with Zuo1 [59]. Genetic analysis has implied that Ssb, Zuo1, and Ssz1 function together in the same biological pathway, as any combinatory deletions of the three cause the same phenotypes [60, 61]. A chaperone system containing two Hsp70s and one J-protein such as this had not been observed before. How do they function together to chaperone nascent chains on ribosomes? Current evidence suggests that this chaperone triad is actually a variation of the classic Hsp70/J-protein pair, in which the J-protein’s role is played by RAC, the Zuo1-Ssz1 complex. Ssb is believed to be the main Hsp70 chaperone interacting with ribosome-bound nascent chains, since it can be cross-linked to nascent chains of between 54 and 152 amino acids [60, 62] and its peptide-binding domain has been shown to be important for its in vivo function [63]. In contrast, Ssz1, the other Hsp70 in this system, has not been observed to interact with nascent chains, and its entire putative peptide-binding domain is dispensable [60]. However, compelling evidence has shown the importance of Ssz1 in working with Zuo1 to assist Ssb’s function, as Ssb’s cross-link to nascent chain depends on the presence of a functional RAC [61]. In addition, Zuo1 carrying an alteration in its J-domain, even though bound to Ssz1, could not restore the cross-link by Ssb, indicating that Zuo1 is the J-protein partner for Ssb [61]. Collectively, it has been proposed that Ssz1 functions as a cofactor/regulator of Zuo1, modifying the ability of Zuo1 to stimulate Ssb’s chaperone activity [60]. This model is supported by the recent finding that Ssb’s ATPase activity can be stimulated by Zuo1 only when it is in complex with Ssz1 (P. Huang, personal communication). Genetic analysis also suggests more central roles for Ssb and Zuo1 than for Ssz1, since overexpression of Zuo1 or Ssb can partially compensate for the absence of Ssz1, but Ssz1 overexpression cannot compensate for the lack of either Zuo1 or Ssb [61]. It seems that Ssz1 has evolved to function differently from classic Hsp70s (see Section 6.1 for more discussion).
4 Modulating of Protein Conformation by Hsp70s/J-proteins 9
It should be noted that only around half of the total Ssb proteins are found to be ribosome-associated, whereas all of Zuo1 and Ssz1 are found on the ribosomes [57–59]. This suggests that Ssb might play additional roles in the cytosol or that it could move along with the elongating nascent chains to facilitate their folding. It was recently proposed that Ssb might cooperate with the yeast chaperonin system (TriC) in the cytosol to fold a specific class of substrates containing WD domains [64]. A ribosome-associated J-protein, Mtj1p, from dog pancreas microsomes has recently been identified and characterized [65]. Mtj1p is thought to function very differently from Zuo1. It is an ER membrane protein having a single transmembrane domain; its J-domain, which extends into the ER lumen, interacts with the ER lumenal Hsp70 BiP. The large cytosolic domain of Mtj1p interacts with translating and non-translating ribosomes at low ionic strength, up to 200 mM KCl. Using truncation constructs, the ribosome-interacting region of Mtj1p was mapped to a highly charged N-terminal region (amino acids 176–194) in its cytosolic domain. Intriguingly, the interaction of Mtj1p with the ribosome appears to affect translation. Translation of a number of proteins in reticulocyte lysates is dramatically decreased when either Mtj1p’s cytosolic domain or a peptide derived from its ribosome-interacting region is present. It has been proposed that Mtj1p acts during co-translational protein transport into ER to recruit both the ribosome and BiP to the translocon complex. In the process, it could help to modulate translation and facilitate any number of aspects of the translocation process, including: facilitating the handover of nascent polypeptides from the signal recognition particle (SRP) to the translocon complex, transmitting signals from the ribosome to BiP, or regulating lumenal gating of the translocon [65]. Ribosome association has also been observed for a portion of yeast Hsp70 Ssa and one of its J-protein partners, Sis1 [66, 67]. However, the exact nature of their ribosome association has not been established. The data indicate that Ssa interacts with Sis1 and poly(A)-binding protein (Pab1) via its C-terminal 10-kDa domain [66, 68, 69], and its interaction with Sis1 and Pab1 occurs preferentially on translating ribosomes [66]. Supporting a meaningful role for Sis1 on ribosomes, genetic interactions have been reported between mutations in genes encoding ribosomal subunit proteins and mutations in SIS1 [66, 67]. A dramatic decrease in interaction between Pab1 and the translation initiation factor, eIF4G, has also been observed in strains carrying a temperature-sensitive allele of SSA growing at non-permissive temperature [66]. Collectively, the data suggest that Ssa and Sis1 may function in translation initiation on yeast ribosomes, possibly via interaction with Pab1.
4 Modulating of Protein Conformation by Hsp70s/J-proteins
Although the examples are limited, Hsp70 and J-proteins have evolved in some instances to interact with proteins that are fully, or nearly fully, folded. In this manner, chaperones function in specific, and very diverse, cellular functions,
10 Hsp70 Chaperones
including the biogenesis of Fe/S proteins, regulation of the heat shock response, and regulation of initiation of DNA replication. In some cases an abundant chaperone that functions in generic protein folding, interacting with many unfolded protein substrates, will also interact with one or a few specific folded proteins. The interaction of DnaK with the heat shock transcription factor σ 32 and the function of mammalian Hsc70 in clathrin uncoating are such examples. In other cases, such as the biogenesis of Fe/S centers, Hsp70 and J-proteins have evolved to work in a single, specific function. 4.1 Assembly of Fe/S Centers
Although in vitro Fe/S clusters can be formed in proteins directly from elemental iron and sulfur, in vivo this task falls to highly conserved and complex assembly systems. Central to this system is a scaffold protein onto which a cluster is assembled and then transferred to a recipient protein [70, 71]. The scaffold, called Isu in eukaryotes and IscU in prokaryotes, is a substrate protein for a specialized Hsp70:J-protein pair (Ssq1:Jac1 in eukaryotes and Hsc66:Hsc20 in prokaryotes) [72, 73]. IscU/Isu proteins are small, highly conserved proteins, with the mammalian and bacterial proteins having about 70% amino acid identity. One of the stretches (LPPVK) of complete identity among the proteins from diverse organisms contains the binding site for the specialized Hsp70. The binding site for the J-protein has been elusive; the site may not be a continuous stretch of amino acids and thus is technically more difficult to define. In both eukaryotes and prokaryotes, the J-protein and the Hsp70 of this pair are very specialized. In fact, there is no evidence of the existence of substrate proteins in addition to the Fe/S cluster scaffold protein. This J-protein:Hsp70 pair plays a very important role in assembly of Fe/S clusters, an essential cellular process [71, 74]. Bacteria have redundant systems for Fe/S cluster assembly, so the lack of Hsc66:Hsc20 has a fairly modest effect on the activity of Fe/S-containing enzymes in the cell [75]. However, JAC1 is essential in most yeast strain backgrounds, and cells lacking Ssq1 have greatly reduced Fe/S-containing enzyme activity. Based on the extensive data available concerning Hsp70:J-protein function, it would be logical to propose that the chaperones alter the conformation of the scaffold proteins to facilitate either the assembly of the cluster or its transfer to a recipient protein. Recent results from the Lill laboratory [76], utilizing the labeling of Isu with radioactive iron, suggest that the chaperones may function in aiding cluster transfer. Strains depleted of Ssq1 or Jac1 accumulated higher than normal amounts of iron associated with Isu, while those having a decreased amount of proteins known to be needed for assembly of the cluster, such as the sulfur-transfer protein Nfs1, had lower levels. But, the mechanistic role of chaperones in the process remains to be clarified. While it is clear that these chaperones involved in Fe/S cluster assembly are quite specialized, it is not so clear whether the abundant “generic” chaperones of the same cellular compartment can carry out this role as well. The data from bacteria suggest that the generic DnaK/DnaJ/GrpE system functions completely separately
4 Modulating of Protein Conformation by Hsp70s/J-proteins 11
from the Hsc66/Hsc20 system [77]. But in yeast it appears that there is at least some functional overlap between the systems. Overexpression of the abundant Hsp70 of the mitochondrial matrix Ssc1, which also functions in protein translocation and folding of many matrix proteins, can substantially suppress the effects of the absence of Ssq1 [78]. Also, Ssq1 and Ssc1 appear to share the nucleotide release factor Mge1 [73]. Interestingly, it is not clear whether mammalian mitochondria have a specialized Hsp70 for Fe/S assembly, as no obvious homologue of Ssq1 has been found in these genomes. However, a homologue of Jac1 does exist. Perhaps in mammalian cells the major Hsp70, which is involved in protein translocation and general protein folding, also fulfils the role of facilitating Fe/S cluster assembly, using the Jac1 homologue as its alternative J-protein partner. 4.2 Uncoating of Clathrin-coated Vesicles
Regulated dynamic interaction among plasma membranes and cellular membrane compartments is accomplished by transport vesicles. Budding and fusion of vesicles require a series of assembly and disassembly cycles of the coat protein complex that provide the mechanical support for membrane bending and cargo selection. Hsc70, the constitutive form of Hsp70, has been known to play important roles in uncoating of clathrin-coated vesicles (CCVs), the major class of transport vesicles involved in endocytosis and synaptic transmission (for review, see Ref. [79]). Hsc70, the major cytosolic Hsp70 of mammalian cells, promotes the disassembly of clathrin coats by binding to clathrin in an ATP-dependent manner, thereby releasing monomeric clathrin triskelia and other coat proteins from CCVs [80–82]. Like other members of the Hsp70 family, Hsc70 works with a J-protein partner in this process. The specialized type of J-protein involved is called auxilin, a name that encompasses brain-specific auxilin 1 and the ubiquitously expressed auxilin 2 (also called G-cyclin-associated kinase, or GAK) [83–88]. It is believed that auxilin binds to clathrin first and then recruits Hsc70 to CCVs via its J-domain, which then stimulates ATP hydrolysis and clathrin binding to Hsc70, resulting in release of clathrin monomers from CCVs [89–91]. Auxilin homologues have been identified in several species, including yeast, C. elegans, and mammals; they all contain a clathrin-binding domain and a Cterminal J-domain [87, 89, 90, 92–94]. Recent studies indicate that, as expected, the J-domain is important for interaction with Hsc70 [95] but that it also has features different from previously studied J-domains [96]. The structure of the bovine auxilin J-domain has been solved and was found to contain an extra N-terminal helix and a long loop inserted between helices I and II in addition to conserved J-domain features. Surface plasmon resonance analysis of auxilin mutants reveals that several positively charged residues in the long loop are also important for Hsc70 binding. Recently, the results of a flurry of in vivo studies, including RNA interference in nematode cells [90] and dominant-negative interference experiments in mammalian cells [97], suggest that Hsc70s and auxilin play a more general role in clathrin dynamics than just uncoating of CCVs. Auxilin is released from CCVs
12 Hsp70 Chaperones
during uncoating, but Hsc70 remains associated with free clathrin, possibly performing additional functions such as preventing clathrin-spontaneous polymerization or priming it for membrane recruitment by adaptor complex [98]. In addition, inactivation of auxilin in yeast also results in a phenotype similar to cells lacking clathrin [93]. Besides their functions with clathrin, Hsc70 and auxilin have recently been found to have unanticipated roles in other stages of vesicle transport. For example, Newmyer et al. [99] found that Hsc70 and auxilin can interact with GTP-bound dynamin, a protein well-known for its importance in vesicle formation. Two domains in auxilin were identified as being involved in dynamin binding; overexpression of these domains in vivo inhibited endocytosis without changing the dynamics or distribution of clathrin. It is thus suggested that Hsc70 and auxilin also participate in the early step of CCV formation [99]. In addition, Hsc70 has been suggested to play a role in neurotransmitter exocytosis. Mutations in Hsc70-4, a major cytosolic Hsp70 of Drosophila, were found to impair nerve-evoked neurotransmitter release in vivo. Genetic analysis suggested that Hsc70-4 might cooperate with cysteine string protein, a specialized J-protein on the membrane of secretary vesicles, to increase the Ca2+ sensitivity of neurotransmitter release upon vesicle fusion [100]. Additionally, a block in exocytosis has also been observed in squid synapses expressing dominant-interfering Hsc70 mutant proteins [95]. However, the critical protein substrates of Hsc70 during exocytosis are not yet known. 4.3 Regulation of the Heat Shock Response
While there are indications that Hsp70s play a role in regulation of the heat shock response in a variety of organisms, this role is by far best understood in bacteria. The heat shock response in E. coli is transcriptionally controlled by the heat shock promoter-specific transcription factor σ 32 , a subunit of RNA polymerase (reviewed in Refs. [101, 102]). Upon heat shock, the level and activity of σ 32 are rapidly increased due to the altered translatability of the σ 32 gene (rpoH) and elevated stability of σ 32 protein. The shutoff of the response results from the reversal of these effects, a decrease in the level and activity of σ 32 . The E. coli Hsp70 system, DnaK/DnaJ/GrpE, has been found to regulate the heat shock response by directly associating with σ 32 and modulating its degradation by proteases. In non-heat-shocked cells, σ 32 has an extremely short half-life of less than 1 minute. After heat shock, the half-life rapidly increases to, and remains at, eightfold, until the shutoff phase of the heat shock response begins [103, 104]. Several proteases are involved in σ 32 degradation, including Lon (la), ClpAP and HslVU, and, most significantly, the ATP-dependent metalloprotease FtsH (previously known as HflB) [105–108]. The DnaK chaperone system was first implicated in the regulation of the heat shock response because strains carrying mutations in any one of the genes encoding components of the DnaK/DnaJ/GrpE system showed elevated expression of heat shock-inducible genes [109]. Later it was proposed that the DnaK and co-chaperones present σ 32 to the proteases, thus negatively regulating the stability of σ 32 . This model provides an attractive mechanism that links the level of
4 Modulating of Protein Conformation by Hsp70s/J-proteins 13
σ 32 to the level of unfolded proteins, the stress sensed by the DnaK chaperone system. Upon heat shock, less DnaK and DnaJ would be available for σ 32 association and degradation, since more substrates (e.g., unfolded proteins) are competing for binding to the chaperones, resulting in the increased stability of σ 32 [110]. This model predicts that the actual levels of the DnaK system proteins directly affect the heat shock response, which has been proven to be the case [111]. A significant amount of work has been done to understand the interaction between the DnaK system and σ 32 , but exactly how the DnaK system modulates the degradation of σ 32 by FtsH remains unclear. It has been reported that when binding to RNA polymerase, σ 32 is resistant to FtsH-mediated degradation [111]. It is thus possible that the DnaK system affects the stability of σ 32 by preventing its binding to the RNA polymerase, hence indirectly increasing the chance of σ 32 degradation by FtsH. However, it has been shown that a mutant version of σ 32 that cannot bind to RNA polymerase, but interacts normally with DnaK/DnaJ/GrpE, has a normal stability in the cells lacking DnaK or cells lacking both DnaJ and the DnaJ homologue ClbA [112], suggesting a more active role of the DnaK system in promoting the degradation of σ 32 . It was recently reported that the in vitro interaction between DnaK and σ 32 is highly temperature-sensitive, possibly due to a destabilized structural element in σ 32 at elevated temperature [113], thus providing another way of regulating the stability of σ 32 by the DnaK system. The actual regions of σ 32 that interact with DnaK and DnaJ, and thus are involved in regulation, remain elusive. in vitro, DnaK and the co-chaperone DnaJ bind independently to free σ 32 , and it has been proposed that σ 32 interacts similarly to any other substrate, with σ 32 binding/release controlled by the ATP hydrolysis cycle [114]. Screening of a σ 32 -derived peptide library for chaperone binding sites revealed two binding sites in the highly conserved RpoH box in σ 32 , between residues 133 and 140 [115, 116]. Peptides corresponding to this region bind to DnaK and can be degraded by FtsH in vitro. However, introduction of alterations within these regions in the full-length σ 32 protein did not affect DnaK binding in vitro or its degradation in vivo and in vitro [117]. Instead, these mutants of σ 32 show decreased affinity for RNA polymerase and reduced transcriptional activity. It is thus still unclear whether the RpoH box in σ 32 is directly involved in DnaK binding in vivo. In addition to the turnover of σ 32 , the DnaK system has also been suggested to modulate the activity of σ 32 directly, since overexpression of DnaK and DnaJ in a ftsH strain leads to reduced activity of σ 32 , while the level of σ 32 remains unchanged [108]. This type of modulation has also been reported in Agrobacterium tumefaciens, in which RpoH, the σ 32 homologue, is normally quite stable and a change of the level of DnaK and DnaJ specifically changes the activity of RpoH [118]. However, little is known about how the activity of σ 32 or RpoH is altered by the DnaK system. 4.4 Regulation of Activity of DNA Replication-initiator Proteins
Many plasmid-encoded DNA initiator proteins exist as monomers and dimers in equilibrium. Monomers activate initiation of DNA replication by binding to
14 Hsp70 Chaperones
specific DNA-binding sites around the origin of replication, whereas dimers are inactive due to altered DNA-binding specificities. The role of the Hsp70 chapterone machinery in regulating the activity of DNA replication initiators was first demonstrated by the work of Wickner et al. [119–122] in the E. coli system of RepA, the replication-initiator protein of plasmid P1, motivated by earlier in vivo observations that mutations in dnaK, dnaJ, or grpE resulted in plasmid P1 instability [123]. The DnaK/DnaJ/GrpE chaperone machinery altered the equilibrium between dimer and monomer, destabilizing dimeric RepA in vitro and rendering it more active in origin binding [119–122]. How DnaK/DnaJ/GrpE interacts with the folded RepA dimer and promotes its monomerization is still not fully understood. Both DnaJ and DnaK bind to a RepA dimer, but at different sites [119, 124]. It was proposed that DnaJ tags the RepA dimer for recognition by DnaK, which then stimulates the monomerization of RepA in an ATP-dependent reaction [121]. The DnaK chaperone system has also been shown to assist the monomerization of other replication initiation proteins, including RepA of plasmid P7 [122] and TrfA of RK2 plasmid [125]. Konieczny et al. [125] reported that ClpB, the Clp/Hsp100 family, cooperates with the DnaK chaperone system to convert TrfA dimers into active monomers. The activation of TrfA seems to be specific for bacterial chaperones (ClpB, DnaK, DnaJ, and GrpE) since the corresponding yeast mitochondrial homologues (Hsp78, Ssc1, Mdj1, and Mge1) did not activate TrfA in an in vitro system reconstituted with purified components. Recent identification of similarities between RepA and the origin-recognition complex (ORC) subunit from eukaryotes and archaea raises the question as to whether Hsp70s play a more general role in DNA replication. Giraldo et al. [126] found that the C-terminal domain of ScOrc4p, a subunit of S. cerevisiae ORC, shares sequence and potential structural similarity with the N-terminal domain of RepA, including a common winged-helix domain and a leucine-zipper dimerization motif. Furthermore, they found that ScOrc4p can interact with yeast Hsp70 both in vivo and in cell lysates, although it is not clear with which Hsp70 it interacts [126]. Collectively, these results suggest Hsp70’s role in regulating the assembly of active ORC in eukaryotes, possibly through a mechanism involving disassembly of ORC subunits. In addition to monomerization of initiators for DNA replication, Hsp70/J-protein systems have long been known to play important roles in the assembly and activation of viral DNA replicative machinery in both prokaryotes and eukaryotes (see review in Ref. [127]). In the case of bacteriophage λ replication in E. coli, it was proposed that DnaJ interacts with DnaB-λP-λO-DNA complex and facilitates recognition by DnaK, which then stimulates DnaB helicase activity by catalyzing the release of λP from the initiation complex [120, 128–130]. In mammalian cells, Hsp70 and J-protein have also been found to enhance the origin binding of UL9, the origin-binding protein of herpes simplex virus type 1 [131]. In addition, hTid-1, a human homologue of DnaJ, was found to associate with UL9 and promote the multimer formation from dimeric UL9 [132]. The DNA binding and activity of helicase E1 from human papilloma virus is also promoted by human Hsp70 and Jprotein [133, 134]. In summary, mounting evidence suggests that Hsp70/J-protein
6 Hsp70s/J-proteins – When an Hsp70 Maybe Isn’t Really a Chaperone 15
systems interact with folded initiator proteins and play regulatory roles in DNA replication.
5 Cases of a Single Hsp70 Functioning With Multiple J-Proteins
The previous sections discuss many physiological roles of Hsp70s and their partner J-proteins (Figure [1]). In some of the cases discussed, specific functions depend upon a functional specificity inherent in the Hsp70 itself. The Hsp70 evolved in Fe/S cluster biogenesis and the one evolved to bind to ribosomes, and to be regulated like ribosomal proteins, are examples of such specialized Hsp70s. Although dispersed in different sections of this chapter, there are also multiple examples of a single Hsp70 functioning with different J-proteins in very different physiological roles. Examples exist in mitochondria, the ER, and the cytosol of eukaryotes. In mitochondria, the abundant Hsp70 Ssc1 is one such example. A portion of Ssc1 is tethered to the membrane and functions in translocation of proteins into the matrix with the membrane J-protein Pam18 [48–50]. The remainder of Ssc1 is soluble and functions with the soluble J-protein Mdj1 in the folding of imported and mitochondrially encoded proteins [21, 24]. In the ER, the predominant Hsp70, Kar2/BiP, functions with several J-proteins. Analogous to the case in the mitochondria, a portion of Kar2 is tethered to the translocon and acts with the integral membrane J-protein Sec63 in protein translocation. As a soluble protein in the lumen, it has the J-protein partners Scj1 and Jem1. Multiple examples exist in the cytosol as well. The best-studied example in mammalian systems is the involvement of Hsc70 not only in protein folding and translocation into mitochondria but also in vesicle transport. In the former process, Hsc70 works with J-proteins Hdj1 and Hdj2, among others; in vesicle transport it works with auxilin and cysteine-string proteins (described in Section 2.2). Similar examples exist in the yeast cytosol, where the Ssa class of Hsp70s has multiple J-protein partners, including Ydj1, Sis1, and Swa2 (yeast auxilin). Ydj1 and Sis1 are particularly interesting examples, in that the J-proteins show extensive similarity beyond their J-domains but are functionally distinct, in the sense that the functions of the essential Sis1 cannot be carried out by Ydj1. Interestingly, this functional specificity has been conserved, as the mammalian homologue of Sis1, Hdj1, can rescue a sis1 strain, but the homologue of Ydj1, Hdj2, cannot [135, 136]. The biochemical basis of this functional difference has yet to be defined.
6 Hsp70s/J-proteins – When an Hsp70 Maybe Isn’t Really a Chaperone
As we learn more about the complex workings of cells, the idea has emerged that genes have been duplicated and evolved to serve additional functions. Such evolution includes the examples described above, as Hsp70 and J-proteins have
16 Hsp70 Chaperones
evolved to serve specialized functions utilizing their inherent ability to interact with hydrophobic sequences in proteins in their peptide-binding clefts. However, examples have also been found in which Hsp70s have evolved to carry out functions that are no longer dependent on their ability to interact with substrate proteins. Below we discuss two examples in which an Hsp70 forms a heterodimer with another protein, perhaps serving a regulatory role. 6.1 The Ribosome-associated “Hsp70” Ssz1
Ssz1 (previously known as Pdr13) from S. cerevisiae is an example of a protein with sequence similarity to Hsp70s that clearly functions in ways not expected of an Hsp70 chaperone. Ssz1 is typically classified as an Hsp70 because it contains a conserved N-terminal ATPase domain that shares around 35% sequence identity with ATPase domains of Hsp70s across species and because its size of 60 kDa is close to that expected of an Hsp70. However, Ssz1’s C-terminal region has little sequence similarity with known Hsp70s, rendering its overall sequence similarity to Hsp70s only 20–25%. Ssz1’s cellular properties also differ significantly from those of well-studied Hsp70s. First, although Ssz1 interacts with a J-protein, the ribosome-associated Zuo1, the complex, called the ribosome-associated complex (RAC), is surprisingly stable [59]. Typically, Hsp70-J-protein interactions are quite transient, while the RAC subunits cannot be separated without denaturation. But most compelling is the fact that Ssz1’s C-terminal putative peptide-binding domain is not required for its in vivo function. The C-terminal truncated Ssz1, having only the ATPase domain, fully rescues the phenotypes of a ssz1 strain [60]. Up to now, there is no evidence to indicate that Ssz1 interacts with nascent chains or has any chaperone activity. However, there is every indication that Ssz1 has an important in vivo role working with Zuo1, but as a component required for Zuo1’s ability to act as the J-protein partner of the ribosome-associated Hsp70 Ssb. Both Ssz1 and Zuo1 are required for Ssb cross-link to nascent chains on the ribosomes [61]. Genetic analysis also suggests that Ssz1, Zuo1, and Ssb work in the same biological pathway, as the phenotypes of any single mutant are the same as cells lacking all three [60, 61]. Therefore, it has been proposed that Ssz1 functions as a modulator of Zuo1, priming it to interact with Ssb in chaperoning newly synthesized polypeptides on the ribosomes [60, 61] (see Section 2.2 on ribosome-associated Hsp70s). Consistent with this idea, Zuo1 stimulates Ssb’s ATPase activity only when it is in complex with Ssz1, supporting the idea that Ssz1 functions as a modulator of a J-protein (P. Huang, personal communication). The only other known in vivo activity of Ssz1 is its role in pleiotropic drug resistance (PDR), the resistance to multiple drugs with unrelated structures or modes of action [137]. PDR occurs because cells increase the efflux of drugs, primarily because of the increase in activity of the transcription factor Pdr1, which acts on a
6 Hsp70s/J-proteins – When an Hsp70 Maybe Isn’t Really a Chaperone 17
number of genes, including those encoding ABC transporters, which are responsible for drug efflux. Actually, Ssz1 (previously known as Pdr13) was first isolated from a genetic screen for genes that, when overexpressed from a multicopy plasmid, increased PDR in a Pdr1-dependent fashion [137]. Ssz1’s activation of PDR is independent of Zuo1 or Ssb and does not require its C-terminal putative peptidebinding domain (H. Eisenman, personal communication). Although it remains an enigma as to how Ssz1 mediates Pdr1-dependent transcription, its ability to do so without its putative peptide-binding domain indicates that it does not act as a chaperone in this role.
6.2 Mitochondrial Hsp70 as the Regulatory Subunit of an Endonuclease
The major mitochondrial Hsp70, Ssc1, from S. cerevisiae serves as another example of an Hsp70 that carries out a function other than that of a conventional chaperone. Ssc1 forms a stable heterodimer with the endonuclease SceI [138]. SceI functions as a multisite-specific endonuclease in yeast mitochondria, where it cleaves more than 30 sites on mitochondrial DNA to induce homologous recombination upon zygote formation [139]. Ssc1 not only increased the thermostability and endonuclease activity of SceI but also broadened its sequence specificity, as in the absence of Ssc1, SceI is a unisite-specific endonuclease [140]. Although the mechanism by which Ssc1 alters the substrate specificity of SceI is not thoroughly understood, progress has been made in understanding the interaction between Ssc1 and SceI. The interaction between Ssc1 and SceI is more stable in the presence of ADP than in the presence of ATP or in the absence of nucleotides [140]. SceI was found to interact with Ssc1’s ATPase domain; in addition, interaction of full-length Ssc1 could not be competed away by an excess of an unfolded protein substrate. Thus, SceI is not a substrate of Ssc1 but rather a protein partner. However, only when full-length Ssc1 is complexed with SceI does it confer multisite-specific endonuclease activity. It remains unclear whether Ssc1’s ATPase activity plays any role in regulating SceI’s activity and specificity. However, no J-proteins seem to be required for Ssc1’s effect on endonuclease activity, as in vitro reconstituted SceI/Ssc1 heterodimers can recapitulate the multisite-specific endonuclease activity observed in mitochondrial extracts. A similar function of mitochondrial Hsp70 in regulating endonuclease specificity has been found in another yeast strain, Saccharomyces uvarum. SuvI, the ortholog of the SceI in S. uvarum, is a mitochondrial endonuclease that forms a stable heterodimer with the homologue of Ssc1, called mtHsp70, resulting in a change from a unisite-specific to a multisite-specific endonuclease [141]. In the absence of mtHsp70, SuvI and SceI have the same unisite sequence specificity but show different multisite specificities upon binding to mtHsp70. These findings suggest that mtHsp70 can regulate the variety of sites cleaved by regulating the sequence specificity of endonucleases.
18 Hsp70 Chaperones
7 Conclusions
This article attempts to summarize the explosion of information and to outline emerging themes regarding the variety of functions of Hsp70s and J-proteins in the cell. Elegant biochemical experiments carried out over the past 15 years have established the basic biochemical parameters of the Hsp70:J-protein partnership and serve as the foundation for understanding the in vivo function of the myriad of individual J-protein:Hsp70 pairs that function in the cell. Clearly, much work remains, and certainly many surprises await us. References 1 OHTSUKA, K. & HATA, M. (2000). Mammalian HSP40/DNAJ homo-logues: cloning of novel cDNAs and a proposal for their classification and nomenclature. Cell Stress Chaperones 5, 98–112. 2 BUKAU, B. & HORWICH, A. L. (1998). The Hsp70 and Hsp60 chaperone machines. Cell 92, 351–366. 3 HARTL, F. & HAYER-HARTL, M. (2002). Molecular chaperones in the cytosol: from nascent chain to folded protein. Science 295, 1852–1858. 4 HESTERKAMP, T. & BUKAU, B. (1998). Role of the DnaK and HscA homo-logues of Hsp70 chaperones in protein folding in E. coli. EMBO J. 17, 4818–4828. 5 GAITANARIS, G. A., PAPAVASSILIOU, A. G., RUBOCK, P., SILVERSTEIN, S. J. & GOTTESMAN, M. E. (1990). Renaturation of denatured lambda repressor requires heat shock proteins. Cell 61, 1013–1020. 6 SKOWYRA, D., GEORGOPOULOS, C. & ZYLICZ, M. (1990). The E. coli dnak gene product, the hsp70 homologue, can reactivate heat-inactivated RNA polymerase in an ATP hydrolysis-dependent manner. Cell 62, 939–944. 7 LANGER, T., LU, C., ECHOLS, H., FLANAGAN, J., HAYER, M. K. & HARTL, F. U. (1992). Successive action of DnaK, DnaJ, and GroEL along the pathway of chaperone-mediated protein folding. Nature 356, 683–689. 8 SCHRODER, H., LANGER, T., HARTL, F.-U. & BUKAU, B. (1993). DnaK, DnaJ and GrpE form a cellular chaperone
9
10
11
12
13
machinery capable of repairing heat-induced protein damage. EMBO Journal 12, 4137–4144. ZIEMIENOWICZ, A., SKOWYRA, D., ZEILSTRA-RYALLS, J., FAYET, O., GEORGOPOULOS, C. & ZYLICZ, M. (1993). Both the Escherichia coli chaperone systems, GroEL/GroES and DnaK/DnaJ/GrpE, can reactivate heat-treated RNA polymerase. Different mechanisms for the same activity. Journal of Biological Chemistry 268, 25425–25431. SZABO, A., LANGER, T., SCHRODER, H., FLANAGAN, J., BUKAU, B. & HARTL, F. U. (1994). The ATP hydrolysis-dependent reaction cycle of the Escherichia coli Hsp70 system DnaK, DnaJ, and GrpE. Proc. Natl. Acad. Sci. USA 91, 10345–10349. MOGK, A., TOMOYASU, T., GOLOUBINOFF, P., FUDIGER, S., RODER, D., LANGEN, H. & BUKAU, B. (1999). Identification of thermolabile Escherichia coli proteins: prevention and reversion of aggregation by DnaK and ClpB. EMBO J. 18, 6934–6949. TETER, S. A., HOURY, W. A., ANG, D., TRADLER, T., ROCKABRAND, D., FISCHER, G., BLUM, P., GEORGOPOULUS, C. & HARTL, F. U. (1999). Polypeptide Flux through Bacterial Hsp70: DnaK Cooperates with Trigger Factor in Chaperoning Nascent Chains. Cell 97, 755–765. DEUERLING, E., SCHULZE-SPECKING, A., TOMOYASU, T., MOGK, A. & BUKAU, B. (1999). Trigger factor and DnaK
References
14
15
16
17
18
19
20
21
22
cooperate in folding of newly synthesized proteins. Nature 400, 693–696. KIM, S., SCHILKE, B., CRAIG, E. & HORWICH, A. (1998). Folding in vivo of a newly translated yeast cytosolic enzyme is mediated by the SSA class of cytosolic yeast Hsp70 proteins. Proc. Natl. Acad. Sci. USA 95, 12860–12865. THULASIRAMAN, V., YANG, C.-F. & FRYDMAN, J. (1999). In vivo newly translated polypeptides are sequestered in a protected folding environment. EMBO J. 18, 85–95. EGGERS, D. K., WELCH, W. J. & HANSEN, W. J. (1997). Complexes between Nascent Polypeptides and Their Molecular Chaperones in the Cytosol of Mammalian Cells. Molecular Biology of the Cell 8, 1559–1573. MICHELS, A. A., KANON, B., KONINGS, A. W., OHTSUKA, K., BENSAUDE, O. & KAMPINGA, H. H. (1997). Hsp70 and Hsp40 chaperone activities in the cytoplasm and the nucleus of mammalian cells. J. Biol. Chem. 272, 33283–33289. PRATT, W. B. & TOFT, D. O. (2003). Regulation of signaling protein function and trafficking by the hsp90/hsp70-based chaperone machinery. Exp Biol Med 228, 111–133. CHANG, H. C. & LINDQUIST, S. (1994). Conservation of Hsp90 macromolecular complexes in Saccharomyces cerevisiae. J. Biol. Chem. 269, 24983–24988. LIU, X. D., MORANO, K. A. & THIELE, D. J. (1999). The yeast Hsp110 family member, Sse1, is an Hsp90 cochaperone. J. Biol. Chem. 274, 26654–26660. KANG, P. J., OSTERMANN, J., SHILLING, J., NEUPERT, W., CRAIG, E. A. & PFANNER, N. (1990). Requirement for hsp70 in the mitochondrial matrix for translocation and folding of precursor proteins. Nature 348, 137–143. WESTERMANN, B., PRIP-BUUS, C., NEUPERT, W. & SCHWARZ, E. (1995). The role of the GrpE homologue, Mge1p, in mediating protein import and protein folding in mitochondria. EMBO J. 13, 1998–2006.
23 WESTERMANN, B., GAUME, B., HERRMANN, J. M., NEUPERT, W. & SCHWARZ, E. (1996). Role of the mitochondrial DnaJ homologue Mdj1p as a chaperone for mitochondrially synthesized and imported proteins. Mol Cell Biol 16, 7063–7071. 24 LIU, Q., KRZEWSKA, J., LIBEREK, K. & CRAIG, E. A. (2001). Mitochondrial Hsp70 Ssc1: role in protein folding. J. Biol. Chem. 276, 6112–6118. 25 HERMANN, J., STUART, R., CRAIG, E. & NEUPERT, W. (1994). Mitochondrial heat shock protein 70, a molcular chaperone for proteins encoded by mitochondrial DNA. J. Cell Biol. 127, 893–902. 26 HAAS, I. G. & WABL, M. (1983). Immunoglobulin heavy chain binding protein. Nature 306, 387–389. 27 SIMONS, J. F., FERRO-NOVICK, S., ROSE, M. D. & HELENIUS, A. (1995). BiP/Kar2p serves as a molecular chaperone during carboxypeptidase Y folding in yeast. J. Cell. Biol. 130, 41–49. 28 HENDERSHOT, L., WEI, J., GAUT, J., MELNICK, J., AVIEL, S. & ARGON, Y. (1996). Inhibition of immunoglobulin folding and secretion by dominant negative BiP ATPase mutants. Proc. Natl. Acad. Sci. USA 93, 5269–5274. 29 ELLGAARD, L. & HELENIUS, A. (2001). ER quality control: towards an understanding at the molecular level. Curr Opin Cell Biol 13, 431–437. 30 BRODSKY, J. L., WERNER, E. D., DUBAS, M. E., GOECKELER, J. L., KRUSE, K. B. & MCCRACKEN, A. A. (1999). The requirement for molecular chaperones during endoplasmic reticulum-associated protein degradation demonstrates that protein export and import are mechanistically distinct. J. Biol. Chem. 274, 3453– 3460. 31 PLEMPER, R. K., BOHMLER, S., BORDALLO, J., SOMMER, T. & WOLF, D. H. (1997). Mutant analysis links the translocon and BiP to retrograde protein transport for ER degradation. Nature 388, 891–895. 32 BERTOLOTTI, A., ZHANG, Y., HENDERSHOT, L. M., HARDING, H. P. & RON, D. (2000). Dynamic interaction of BiP and ER stress transducers in the unfolded-
19
20 Hsp70 Chaperones
33
34
35
36
37
38
39
40
41
protein response. Nat Cell Biol 2, 326–332. BAXTER, B. K., JAMES, P., EVANS, T. & CRAIG, E. A. (1996). SSI1 encodes a novel Hsp70 of the Saccharomyces cerevisiae endoplasmic reticulum. Mol. Cell Biol. 16, 6444–6456. CRAVEN, R. A., EGERTON, M. & STIRLING, C. J. (1996). A novel Hsp70 of the yeast ER lumen is required for the efficient translocation of a number of protein precursors. EMBO J. 15, 2640–2650. SARIS, N., HOLKERI, H., CRAVEN, R. A., STIRLING, C. J. & MAKAROW, M. (1997). The Hsp70 homologue Lhs1p is involved in a novel function of the yeast endoplasmic reticulum, refolding and stabilization of heat-denatured protein aggregates. J. Cell. Biol. 137, 813–824. TYSON, J. R. & STIRLING, C. J. (2000). LHS1 and SIL1 provide a lumenal function that is essential for protein translocation into the endoplasmic reticulum. EMBO J. 19, 6440–6452. HAMILTON, T. G. & FLYNN, G. C. (1996). Cer1p, a novel Hsp70-related protein required for posttranslational endoplasmic reticulum translocation in yeast. J. Biol. Chem. 271, 30610–30613. STEEL, G. J., FULLERTON, D. M., TYSON, J. R. & STIRLING, C. J. (2004). Coordinated Activation of Hsp70 Chaperones. Science 303, 98–101. LIN, H. Y., MASSO-WELCH, P., DI, Y. P., CAI, J. W., SHEN, J. W. & SUBJECK, J. R. (1993). The 170-kDa glucose-regulated stress protein is an endoplasmic reticulum protein that binds immunoglobulin. Mol Biol Cell 4, 1109–1119. MEUNIER, L., USHERWOOD, Y. K., CHUNG, K. T. & HENDERSHOT, L. M. (2002). A subset of chaperones and folding enzymes form multiprotein complexes in endoplasmic reticulum to bind nascent proteins. Mol Biol Cell 13, 4456–4469. PANZNER, S., DREIER, L., HARTMANN, E., KOSTKA, S. & RAPOPORT, T. A. (1995). Posttranslational protein transport in yeast reconstituted with a purified complex of Sec proteins and Kar2p. Cell 81, 561–570.
42 YOUNG, B. P., CRAVEN, R. A., REID, P. J., WILLER, M. & STIRLING, C. J. (2001). Sec63p and Kar2p are required for the translocation of SRP-dependent precursors into the yeast endoplasmic reticulum in vivo. EMBO J. 20, 262–271. 43 VOISINE, C., CRAIG, E. A., ZUFALL, N., VON AHSEN, O., PFANNER, N. & VOOS, W. (1999). The protein import motor of mitochondria: unfolding and trapping of preproteins are distinct and separable functions of matrix Hsp70. Cell 97, 565–574. 44 CORSI, A. K. & SCHEKMAN, R. (1997). The lumenal domain of Sec63p stimulates the ATPase activity of BiP and mediates BiP recruitment to the translocon in Saccharomyces cerevisiae. J. Cell. Biol. 137, 1483–1493. 45 SCHNEIDER, H.-C., BERTHOLD, J., BAUER, M. F., DIETMEIER, K., GUIARD, B., BRUNNER, M. & NEUPERT, W. (1994). Mitochondrial Hsp70/MIM44 complex facilitates protein import. Nature 371, 768–774. 46 RASSOW, J., MAARSE, A., KRAINER, E., KUBRICH, M., MULLER, H., MEIJER, M., CRAIG, E. & PFANNER, N. (1994). Mitochondrial protein import: biochemical and genetic evidence for interaction of matrix Hsp70 and the inner membrane protein Mim44. J. Cell Biol. 127, 1547–1556. 47 LIU, Q., D’SILVA, P., WALTER, W., MARSZALEK, J. & CRAIG, E. A. (2003). Regulated cycling of mitochondrial Hsp70 at the protein import channel. Science 300, 139–141. 48 D’SILVA, P. D., SCHILKE, B., WALTER, W., ANDREW, A. & CRAIG, E. A. (2003). J protein cochaperone of the mitochondrial inner membrane required for protein import into the mitochondrial matrix. Proc. Natl. Acad. Sci. USA 100, 13839–13844. 49 MOKRANJAC, D., SICHTING, M., NEUPERT, W. & HELL, K. (2003). Tim14, a novel key component of the import motor of the Tim23 protein translocase of mitochondria. EMBO J. 22, 4945–4956. 50 TRUSCOTT, K. N., VOOS, W., FRAZIER, A. E., LIND, M., LI, Y., GEISSLER, A., DUDEK, J., MULLER, H., SICKMANN, A., MEYER, H.
References
51
52
53
54
55
56
57
58
59
E., MEISINGER, C., GUIARD, B., REHLING, P. & PFANNER, N. (2003). A J-protein is an essential subunit of the presequence translocase-associated protein import motor of mitochondria. J. Cell. Biol. 163, 707–713. FEWELL, S. W., TRAVERS, K. J., WEISSMAN, J. S. & BRODSKY, J. L. (2001). The action of molecular chaperones in the early secretory pathway. Ann. Rev. Gen. 35, 149–191. NEUPERT, W. & BRUNNER, M. (2002). The protein import motor of mitochondria. Nat Rev Mol Cell Biol 8, 555–565. MATOUSCHEK, A., PFANNER, N. & VOOS, W. (2000). Protein unfolding by mitochondria. The Hsp70 import motor. EMBO Rep 5, 404–410. HAMMAN, B. D., HENDERSCHOT, L. M. & JOHNSON, A. E. (1998). BiP maintains the permeability barrier of the ER membrane by sealing the lumenal end of the translocon pore before and early in translocation. Cell 92, 747–758. CAPLAN, A. J., TSAI, J., CASEY, P. J. & DOUGLAS, M. G. (1992). Farnesylation of YDJ1p is required for function at elevated growth temperatures in S. cerevisiae. J. Biol. Chem. 267, 18890–18895. YOUNG, J. C., HOOGENRAAD, N. J. & HARTL, F.-U. (2003). Molecular Chaperones Hsp90 and Hsp70 Deliver Proteins to the Mitochondrial Import Receptor Tom70. Cell 112, 41–50. NELSON, R. J., ZIEGELHOFFER, T., NICOLET, C., WERNER-WASHBURNE, M. & CRAIG, E. A. (1992). The translation machinery and seventy kilodalton heat shock protein cooperate in protein synthesis. Cell 71, 97–105. YAN, W., SCHILKE, B., PFUND, C., WALTER, W., KIM, S. & CRAIG, E. A. (1998). Zuotin, a ribosome-associated DnaJ molecular chaperone. EMBO J. 17, 4809–4817. GAUTSCHI, M., LILIE, H., FUNFSCHILLING, U., MUN, A., ROSS, S., LITHGOW, T., RUCKNAGEL, P. & ROSPERT, S. (2001). RAC, a stable ribosome-associated complex in yeast formed by the DnaK-DnaJ homologues Ssz1p and zuotin. Proc. Natl. Acad. Sci. USA 98, 3762–3767.
60 HUNDLEY, H., EISENMAN, H., WALTER, W., EVANS, T., HOTOKEZAKA, Y., WIEDMANN, M. & CRAIG, E. (2002). The in vivo function of the ribosome-associated Hsp70, Ssz1, does not require its putative peptide-binding domain. Proc. Natl. Acad. Sci. USA 99, 4203–4208. 61 GAUTSCHI, M., MUN, A., ROSS, S. & ROSPERT, S. (2002). A functional chaperone triad on the yeast ribosome. Proc. Natl. Acad. Sci. USA 99, 4209–4214. 62 PFUND, C., LOPEZ-HOYO, N., ZIEGELHOFFER, T., SCHILKE, B. A., LOPEZ-BUESA, P., WALTER, W. A., WIEDMANN, M. & CRAIG, E. A. (1998). The Molecular Chaperone SSB from S. cerevisiae is a Component of the Ribosome-Nascent Chain Complex. EMBO Journal 17, 3981–3989. 63 PFUND, C., HUANG, P., LOPEZ-HOYO, N. & CRAIG, E. (2001). Divergent functional properties of the ribosome-associated molecular chaperone Ssb compared to other Hsp70s. Mol. Biol. Cell 12, 3773–3782. 64 SIEGERS, K., B¨OLTER, B., SCHWARZ, J. P., B¨OTTCHER, U., GUHA, S. & HARTL, F.-U. (2003). TRiC/CCT cooperates with different upstream chaperones in the folding of distinct protein classes. EMBO J. 22, 5230–5240. 65 DUDEK, J., VOLKMER, J., BIES, C., GUTH, S., MULLER, A., LERNER, M., FEICK, P., SCHAFER, K. H., MORGENSTERN, E., HENNESSY, F., BLATCH, G. L., JANOSCHECK, K., HEIM, N., SCHOLTES, P., FRIEN, M., NASTAINCZYK, W. & ZIMMERMANN, R. (2002). A novel type of co-chaperone mediates transmembrane recruitment of DnaK- like chaperones to ribosomes. EMBO J. 21, 2958–2967. 66 HORTON, L. E., JAMES, P., CRAIG, E. A. & HENSOLD, J. O. (2001). The yeast hsp70 homologue Ssa is required for translation and interacts with Sis1 and Pab1 on translating ribosomes. J. Biol. Chem. 276, 14426–14433. 67 ZHONG, T. & ARNDT, K. T. (1993). The yeast SIS1 protein, a DnaJ homologue, is required for initiation of translation. Cell 73, 1175–1186. 68 QIAN, X., LI, Z. & SHA, B. (2001). Cloning, expression, purification and
21
22 Hsp70 Chaperones
69
70
71
72
73
74
75
76
77
preliminary X-ray crystallographic studies of yeast Hsp40 Sis1 complexed with Hsp70 Ssa1 C-terminal lid domain. Acta Crystallogr D Biol Crystallogr 57, 748–750. QIAN, X., HOU, W., ZHENGANG, L. & SHA, B. (2002). Direct interactions between molecular chaperones heat-shock protein (Hsp) 70 and Hsp40: yeast Hsp70 Ssa1 binds the extreme C-terminal region of yeast Hsp40 Sis1. Biochem J. 361, 27–34. FRAZZON, J. & DEAN, D. R. (2003). Formation of iron-sulfur clusters in bacteria: an emerging field in bioinorganic chemistry. Current Opinion in Chemical Biology 7, 166–173. LILL, R. & KISPAL, G. (2000). Maturation of cellular FeaS proteins: an essential function of mitochondria. Trends in Biochemical Sciences 25, 352–356. HOFF, K. G., TA, D. T. L. T. T., SILBERG, J. J. & VICKERY, L. E. (2002). Hsc66 substrate specificity is directed toward a discrete region of the iron-sulfur cluster template protein IscU. J. Biol. Chem. 277, 27353–27359. DUTKIEWICZ, R., SCHILKE, B., KNIESZNER, H., WALTER, W., CRAIG, E. A. & MARSZALEK, J. (2003). Ssq1, a mitochondrial Hsp70 involved in iron-sulfur (Fe/S) center biogenesis: Similarities to and differences from its bacterial counterparts. J. Biol. Chem. 278, 29719–29727. CRAIG, E. A. & MARSZALEK, J. (2002). A specialized mitochondrial molecular chaperone system: a role in formation of Fe/S centers. Cell Mol Life Sci. 59, 1658–1665. SCHWARTZ, C. J., DJAMAN, O., IMLAY, J. A. & KILEY, P. J. (2000). The cysteine desulferase, IscS, has a major role in in vivo Fe–S cluster formation in Escherichia coli. Proc. Natl. Acad. Sci. USA 97, 9009–9014. MUHLENHOFF, U., GERBER, J., RICHHARDT, N. & LILL, R. (2003). Components involved in assembly and dislocation of iron-sulfur clusters on the scaffold protein Isu1p. EMBO J. 22, 4815–4825. SILBERG, J. J., HOFF, K. G. & VICKERY, L. E. (1998). The Hsc66-Hsc20 chaperone
78
79
80
81
82
83
84
85
86
87
system in Escherichia coli: chaperone activity and interactions with the DnaK-DnaJ-grpE system. J Bacteriol 180, 6617–6624. VOISINE, C., SCHILKE, B., OHLSON, M., BEINERT, H., MARSZALEK, J. & CRAIG, E. A. (2000). Role of the mitochondrial Hsp70s, Ssc1 and Ssq1, in the maturation of Yfh1. Molecular and Cellular Biology 10, 3677–3684. CREMONA, O. (2001). Live stripping of clathrin-coated vesicles. Dev Cell 1, 592–594. SCHLOSSMAN, D. M., SCHMID, S. L., BRAELL, W. A. & ROTHMAN, J. E. (1984). An enzyme that removes clathrin coats: purification of an uncoating ATPase. J. Cell. Biol. 99, 723–733. BAROUCH, W., PRASAD, K., GREENE, L. E. & EISENBERG, E. (1994). ATPase activity associated with the uncoating of clathrin baskets by Hsp70. J. Biol. Chem. 269, 28563–28568. HANNAN, L. A., NEWMYER, S. L. & SCHMID, S. L. (1998). ATP- and cytosol-dependent release of adaptor proteins from clathrin-coated vesicles: A dual role for Hsc70. Mol Biol Cell 9, 2217–2229. AHLE, S. & UNGEWICKELL, E. (1990). Auxilin, a newly identified clathrin-associated protein in coated vesicles from bovine brain. J. Cell. Biol. 111, 19–29. UNGEWICKELL, E., UNGEWICKELL, H., HOLSTEIN, S. E., LINDNER, R., PRASAD, K., BAROUCH, W., MARTIN, B., GREENE, L. E. & EISENBERG, E. (1995). Role of auxilin in uncoating clathrin-coated vesicles. Nature 378, 632–635. KANAOKA, Y., KIMURA, S. H., OKAZAKI, I., IKEDA, M. & NOJIMA, H. (1997). GAK: a cyclin G associated kinase contains a tensin/auxilin-like domain. FEBS Lett 402, 73–80. KIMURA, S. H., TSURUGA, H., YABUTA, N., ENDO, Y. & NOJIMA, H. (1997). Structure, expression, and chromosomal localization of human GAK. Genomics 44, 179–187. GREENER, T., ZHAO, X., NOJIMA, H., EISENBERG, E. & GREENE, L. E. (2000). Role of cyclin G-associated kinase in
References
88
89
90
91
92
93
94
95
96
uncoating clathrin-coated vesicles from non-neuronal cells. J. Biol. Chem. 275, 1365–1370. UMEDA, A., MEYERHOLZ, A. & UNGEWICKELL, E. (2000). Identification of the universal cofactor (auxilin 2) in clathrin coat dissociation. Eur J Cell Biol 79, 336–342. PISHVAEE, B., COSTAGUTA, G., YEUNG, B. G., RYAZANTSEV, S., GREENER, T., GREENE, L. E., EISENBERG, E., MCCAFFERY, J. M. & PAYNE, G. S. (2000). A yeast DNA J protein required for uncoating of clathrin-coated vesicles in vivo. Nat Cell Biol 2, 958–963. GREENER, T., GRANT, B., ZHANG, Y., WU, X., GREENE, L. E., HIRSH, D. & EISENBERG, E. (2001). Caenorhabditis elegans auxilin: a J-domain protein essential for clathrin-mediated endocytosis in vivo. Nat Cell Biol 3, 215–219. ZHAO, X., GREENER, T., AL-HASANI, H., CUSHMAN, S. W., EISENBERG, E. & GREENE, L. E. (2001). Expression of auxilin or AP180 inhibits endocytosis by mislocalizing clathrin: evidence for formation of nascent pits containing AP1 or AP2 but not clathrin. J Cell Sci 114, 353–365. HOLSTEIN, S. E., UNGEWICKELL, H. & UNGEWICKELL, E. (1996). Mechanism of clathrin basket dissociation: separate functions of protein domains of the DnaJ homologue auxilin. J. Cell. Biol. 135, 925–937. GALL, W. E., HIGGINBOTHAM, M. A., CHEN, C., INGRAM, M. F., CYR, D. M. & GRAHAM, T. R. (2000). The auxilin-like phosphoprotein Swa2p is required for clathrin function in yeast. Curr Biol 10, 1349–1358. LEMMON, S. K. (2001). Clathrin uncoating: Auxilin comes to life. Curr Biol 11, R49–R52. MORGAN, J. R., PRASAD, K., JIN, S., AUGUSTINE, G. J. & LAFER, E. M. (2001). Uncoating of clathrin-coated vesicles in presynaptic terminals: roles for Hsc70 and auxilin. Neuron 32, 289–300. JIANG, J., TAYLOR, A. B., PRASAD, K., ISHIKAWA-BRUSH, Y., HART, P. J., LAFER, E. M. & SOUSA, R. (2003). Structure-function analysis of the auxilin
97
98
99
100
101
102
103
104
105
106
J-domain reveals an extended Hsc70 interaction interface. Biochemistry 42, 5748–5753. NEWMYER, S. L. & SCHMID, S. L. (2001). Dominant-interfering Hsc70 mutants disrupt multiple stages of the clathrin-coated vesicle cycle in vivo. J. Cell. Biol. 152, 607–620. JIANG, R., GAO, B., PRASAD, K., GREENE, L. E. & EISENBERG, E. (2000). Hsc70 chaperones clathrin and primes it to interact with vesicle membranes. J. Biol. Chem. 275, 8439–8447. NEWMYER, S. L., CHRISTENSEN, A. & SEVER, S. (2003). Auxilin-dynamin interactions link the uncoating ATPase chaperone machinery with vesicle formation. Dev Cell 4, 929–940. BRONK, P., WENNIGER, J. J., DAWSON-SCULLY, K., GUO, X., HONG, S., ATWOOD, H. L. & ZINSMAIER, K. E. (2001). Drosophila Hsc70-4 is critical for neurotransmitter exocytosis in vivo. Neuron 30, 475–488. ARSENE, F., TOMOYASU, T. & BUKAU, B. (2000). The heat shock response of Escherichia coli. Int J Food Microbiol 55, 3–9. LUND, P. A. (2001). Regulation of expression of molecular chaperones. In Moleuclar Chaperones in the Cell, pp. 235–256. Oxford University Press Inc., New York. STRAUS, D. B., WALTER, W. A. & GROSS, C. A. (1989). The activity of sigma 32 is reduced under conditions of excess heat shock protein production in Escherichia coli. Genes Dev 3, 2003–2010. MORITA, M. T., KANEMORI, M., YANAGI, H. & YURA, T. (2000). Dynamic interplay between antagonistic pathways controlling the sigma 32 level in Escherichia coli. Proc. Natl. Acad. Sci. USA 97, 5860–5865. HERMAN, C., THEVENET, D., D’ARI, R. & BOULOC, P. (1995). Degradation of sigma 32, the heat shock regulator in Escherichia coli, is governed by HflB. Proc. Natl. Acad. Sci. USA 92, 3516–3520. TOMOYASU, T., GAMER, J., BUKAU, B., KANEMORI, M., MORI, H., RUTMAN, A. J., OPPENHEIM, A. B., YURA, T., YAMANAKA, K., NIKI, H. et al. (1995). Escherichia coli
23
24 Hsp70 Chaperones
107
108
109
110
111
112
113
114
115
FtsH is a membrane-bound, ATP-dependent protease which degrades the heat-shock transcription factor sigma 32. EMBO J. 14, 2551–2560. KANEMORI, M., NISHIHARA, K., YANAGI, H. & YURA, T. (1997). Synergistic roles of HslVU and other ATP-dependent proteases in controlling in vivo turnover of sigma32 and abnormal proteins in Escherichia coli. J Bacteriol 179, 7219–7225. TATSUTA, T., TOMOYASU, T., BUKAU, B., KITAGAWA, M., MORI, H., KARATA, K. & OGURA, T. (1998). Heat shock regulation in the ftsH null mutant of Escherichia coli: dissection of stability and activity control mechanisms of sigma32 in vivo. Mol Microbiol 30, 583–593. STRAUS, D., WALTER, W. & GROSS, C. A. (1990). DnaK, DnaJ, and GrpE heat shock proteins negatively regulate heat shock gene expression by controlling the synthesis and stability of sigma 32. Genes Dev 4, 2202–2209. BUKAU, B. (1993). Regulation of the Escherichia coli heat-shock response. Mol Microbiol 9, 671–680. TOMOYASU, T., OGURA, T., TATSUTA, T. & BUKAU, B. (1998). Levels of DnaK and DnaJ provide tight control of heat shock gene expression and protein repair in Escherichia coli. Mol Microbiol 30, 567–581. TATSUTA, T., JOOB, D. M., CALENDAR, R., AKIYAMA, Y. & OGURA, T. (2000). Evidence for an active role of the DnaK chaperone system in the degradation of sigma(32). FEBS Lett 478, 271–275. CHATTOPADHYAY, R. & ROY, S. (2002). DnaK-sigma 32 interaction is temperature-dependent. Implication for the mechanism of heat shock response. J. Biol. Chem. 277, 33641–33647. LAUFEN, T., MAYER, M. P., BEISEL, C., KLOSTERMEIER, D., MOGK, A., REINSTEIN, J. & BUKAU, B. (1999). Mechanism of regulation of hsp70 chaperones by DnaJ cochaperones. Proc. Natl. Acad. Sci. USA 96, 5452–5457. NAKAHIGASHI, K., YANAGI, H. & YURA, T. (1995). Isolation and sequence analysis of rpoH genes encoding sigma 32 homologues from gram negative
116
117
118
119
120
121
122
123
124
bacteria: conserved mRNA and protein segments for heat shock regulation. Nucleic Acids Res 23, 4383–4390. MCCARTY, J. S., RUDIGER, S., SCHONFELD, H. J., SCHNEIDER-MERGENER, J., NAKAHIGASHI, K., YURA, T. & BUKAU, B. (1996). Regulatory region C of the E. coli heat shock transcription factor, sigma32, constitutes a DnaK binding site and is conserved among eubacteria. J Mol Biol 256, 829–837. ARSENE, F., TOMOYASU, T., MOGK, A., SCHIRRA, C., SCHULZE-SPECKING, A. & BUKAU, B. (1999). Role of region C in regulation of the heat shock gene-specific sigma factor of Escherichia coli, sigma32. J Bacteriol 181, 3552–3561. NAKAHIGASHI, K., YANAGI, H. & YURA, T. (2001). DnaK chaperone-mediated control of activity of a sigma(32) homologue (RpoH) plays a major role in the heat shock response of Agrobacterium tumefaciens. J Bacteriol 183, 5302–5310. WICKNER, S. H. (1990). Three Escherichia coli heat shock proteins are required for P1 plasmid DNA replication: formation of an active complex between E. coli DnaJ protein and the P1 initiator protein. Proc. Natl. Acad. Sci. USA 87, 2690–2694. WICKNER, S., HOSKINS, J. & MCKENNEY, K. (1991). Function of DnaJ and DnaK as chaperones in origin-specific DNA binding by RepA. Nature 350, 165–167. WICKNER, S., HOSKINS, J. & MCKENNEY, K. (1991). Monomerization of RepA dimers by heat shock proteins activates binding to DNA replication origin. Proc. Natl. Acad. Sci. USA 88, 7903–7907. WICKNER, S., SKOWYRA, D., HOSKINS, J. & MCKENNEY, K. (1992). DnaJ, DnaK, and GrpE heat shock proteins are required in oriP1 DNA replication solely at the RepA monomerization step. Proc. Natl. Acad. Sci. USA 89, 10345–10349. TILLY, K. & YARMOLINSKY, M. (1989). Participation of Escherichia coli heat shock proteins DnaJ, DnaK, and GrpE in P1 plasmid replication. J Bacteriol 171, 6025–6029. KIM, S. Y., SHARMA, S., HOSKINS, J. R. & WICKNER, S. (2002). Interaction of the
References
125
126
127
128
129
130
131
132
133
DnaK and DnaJ chaperone system with a native substrate, P1 RepA. J. Biol. Chem. 277, 44778–44783. KONIECZNY, I. & LIBEREK, K. (2002). Cooperative action of Escherichia coli ClpB protein and DnaK chaperone in the activation of a replication initiation protein. J. Biol. Chem. 277, 18483–18488. GIRALDO, R. & DIAZ-OREJAS, R. (2001). Similarities between the DNA replication initiators of Gram-negative bacteria plasmids (RepA) and eukaryotes (Orc4p)/archaea (Cdc6p). Proc. Natl. Acad. Sci. USA 98, 4938–4943. SULLIVAN, C. S. & PIPAS, J. M. (2001). The virus-chaperone connection. Virology 287, 1–8. ALFANO, C. & MCMACKEN, R. (1989). Heat shock protein-mediated disassembly of nucleoprotein structures is required for the initiation of bacteriophage lambda DNA replication. J. Biol. Chem. 264, 10709–10718. LIBEREK, K., OSIPIUK, J., ZYLICZ, M., ANG, D., SKORKO, J. & GEORGOPOULOS, C. (1990). Physical interactions between bacteriophage and Escherichia coli proteins required for initiation of lambda DNA replication. J. Biol. Chem. 265, 3022–3029. ZYLICZ, M., ANG, D., LIBEREK, K. & GEORGOPOULOS, C. (1989). Initiation of lambda DNA replication with purified host- and bacteriophage-encoded proteins: the role of the dnaK, dnaJ and grpE heat shock proteins. EMBO J. 8, 1601–1608. TANGUY LE GAC, N. & BOEHMER, P. E. (2002). Activation of the herpes simplex virus type-1 origin-binding protein (UL9) by heat shock proteins. J. Biol. Chem. 277, 5660–5666. EOM, C. Y. & LEHMAN, I. R. (2002). The human DnaJ protein, hTid-1, enhances binding of a multimer of the herpes simplex virus type 1 UL9 protein to oris, an origin of viral DNA replication. Proc. Natl. Acad. Sci. USA 99, 1894–1898. LIU, J. S., KUO, S. R., MAKHOV, A. M., CYR, D. M., GRIFFITH, J. D., BROKER, T. R.
134
135
136
137
138
139
140
141
& CHOW, L. T. (1998). Human Hsp70 and Hsp40 chaperone proteins facilitate human papillomavirus-11 E1 protein binding to the origin and stimulate cell-free DNA replication. J. Biol. Chem. 273, 30704–30712. LIN, B. Y., MAKHOV, A. M., GRIFFITH, J. D., BROKER, T. R. & CHOW, L. T. (2002). Chaperone proteins abrogate inhibition of the human papilloma-virus (HPV) E1 replicative helicase by the HPV E2 protein. Mol Cell Biol 22, 6592–6604. MARCHLER, G. & WU, C. (2001). Modulation of Drosophila heat shock transcription factor activity by the molecular chaperone DROJ1. EMBO J. 20, 499–509. LOPEZ, N., ARON, R. & CRAIG, E. A. (2003). Specificity of Class II Hsp40 Sis1 in Maintenance of Yeast Prion [RNQ(+)]. Mol Biol Cell 14, 1172–1181. HALLSTROM, T. C., KATZMANN, D. J., TORRES, R. J., SHARP, W. J. & MOYE-ROWLEY, W. S. (1998). Regulation of transcription factor Pdr1p function by an Hsp70 protein in Saccharomyces cerevisiae. Mol Cell Biol 18, 1147–1155. MORISHIMA, N., NAKAGAWA, K., YAMAMOTO, E. & SHIBATA, T. (1990). A subunit of yeast site-specific endonuclease SceI is a mitochondrial version of the 70-kDa heat shock protein. J. Biol. Chem. 265, 15189–15197. SHIBATA, T., NAKAGAWA, K. & MORISHIMA, N. (1995). Multi-site-specific endonucleases and the initiation of homologous genetic recombination in yeast. Adv Biophys 31, 77–91. MIZUMURA, H., SHIBATA, T. & MORISHIMA, N. (1999). Stable association of 70-kDa heat shock protein induces latent multisite specificity of a unisite-specific endonuclease in yeast mitochondria. J. Biol. Chem. 274, 25682–25690. MIZUMURA, H., SHIBATA, T. & MORISHIMA, N. (2002). Association of HSP70 with endonucleases allows the expression of otherwise silent mutations. FEBS Lett 522, 177–182.
25
1
Protein Folding in the Endoplasmic Reticulum Ying Shen and Linda M. Hendershot St. Jude Children’s Research Hospital, Memphis, USA
Kyung Tae Chung Dongeui University, Pusan, South Korea
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The endoplasmic reticulum (ER) is a membrane-enclosed organelle that is found in all eukaryotic organisms and that represents the entry site or origin of the secretory pathway. As such, the ER is a major site of protein folding and assembly. In some highly specialized secretory cells – such as immunoglobulin-producing plasma cells, serum-protein-producing liver cells, or insulin-producing pancreatic cells – the ER is highly developed and becomes the major site of protein biosynthesis for the cell. Proteins that are destined for the secretory pathway are synthesized in the cytosol on ER-associated ribosomes. A hydrophobic signal sequence present on the nascent polypeptide chain directs it to the translocon, which is a proteinaceous channel that traverses the ER membrane [1], allowing the protein to be translocated into the lumen of the ER, in many cases as it is being synthesized. The elongating nascent chain first passes through a channel in the ribosome and then through the translocon. This requires that 70 amino acids of the nascent chain be translated before the N-terminus can enter the ER lumen [2]. It appears that the growing polypeptide chain remains unfolded during its transit through the ribosome and translocon [3, 4]. After it enters the ER, N-terminal signal sequences are often removed by a signal peptidase that is positioned at the lumenal side of the translocon. Once the site is 14 amino acids into the lumen, N-linked glycans are added cotranslationally by oligosaccharyl transferase (OST) to asparagine residues that are followed by a second amino acid, which can be anything but a proline, and then a serine or a threonine (N-X-S/T) [4]. The OST complex is also associated with the
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Folding in the Endoplasmic Reticulum
translocon. Inside the ER, the polypeptide chain begins folding co-translationally [5, 6], and in some cases, subunit assembly can occur before the individual chains are completely translated and inside the ER [7]. The modification of secretory pathway proteins with N-linked glycans serves in part to limit the ways the nascent protein can fold. In addition, the ER environment itself causes further constraints on and benefits to protein folding and assembly. The calcium required for many signal transduction pathways is stored here. Thus, proteins that are synthesized in this organelle have evolved to fold and assemble in a high calcium environment, and changes in the ER calcium levels can dramatically affect their folding [8]. Like the extracellular space that it is contiguous with, the ER also possesses an oxidizing environment [9]. This allows disulfide bonds to form between juxtaposed cysteine residues, which can serve to stabilize folded regions of the nascent chain. However, in the crowded environment of the ER lumen, where large numbers of unfolded polypeptide chains are being synthesized, the formation of disulfide bonds between the wrong regions of a protein or even between unrelated proteins could lead to the formation of large, insoluble aggregates. Surprisingly, this rarely occurs under normal conditions. The high fidelity of protein folding in the ER is due to a stringent quality-control apparatus [10, 11]. Newly synthesized proteins are carefully scrutinized by two major families of molecular chaperones: the Hsp70 cognate BiP and the resident lectin-like proteins calnexin and calreticulin. Only when all aspects of an unfolded state have disappeared do the chaperones cease binding, and the newly synthesized protein is allowed to exit the ER for transport to the Golgi for further routing along the secretory pathway. Proteins that do not meet the stringent requirements of this quality-control system are retained in the ER, where molecular chaperones prevent them from aggregating and provide them with additional opportunities to achieve their correct conformation. Proteins that ultimately fail to mature properly are targeted for intracellular degradation by the 26S proteasome [12–14]. The regulated and carefully controlled assembly of subunit proteins into larger complexes also occurs in the ER and is monitored by the same quality-control apparatus.
2 BiP Interactions with Unfolded Proteins
Immunoglobulin heavy-chain-binding protein (BiP), the first component of the ER quality-control apparatus to be identified, was found by virtue of its association with the unassembled, non-transported heavy chains produced in pre-B cell lines [15]. BiP was subsequently shown to interact transiently with Ig assembly intermediates in mouse plasmacytoma lines, but not with completely assembled H2 L2 molecules [16], and later was shown to interact with numerous other unfolded, unassembled, and misfolded proteins [17]. The presence of a tetrapeptide sequence (KDEL) at the extreme C-terminus of BiP prevents it from being transported further along the secretory pathway [18]. A receptor for the KDEL sequence exists in the intermediate
2 BiP Interactions with Unfolded Proteins
compartment and cis Golgi and serves to redirect any KDEL-containing proteins, along with the unfolded proteins they carry with them, back to the ER [19, 20]. Like all Hsp70 proteins, BiP possesses an N-terminal ATPase domain that is crucial to its chaperoning activity [21–24] and a C-terminal protein-binding domain that recognizes unfolded polypeptides [25, 26]. BiP homologues have now been identified in the ER of all eukaryotic organisms that have been examined, and the yeast protein Kar2p is an essential gene [27]. In addition to playing a role in protein folding in the yeast ER, Kar2p is also required for translocation of proteins into the ER lumen [28–30] and for degradation [31, 32] as described below. Clues as to how BiP might recognize so many different proteins first came from studies showing that peptides of at least seven amino acids in length and composed of predominantly aliphatic residues were more likely to bind BiP and stimulate its ATPase activity than were those containing mostly hydrophilic or charged residues [33, 34]. By scanning known proteins for seven amino acid sequences that were enriched in hydrophobic residues, the authors were able to conclude that a “potential” BiP-binding site might occur about every 16 amino acids on most globular proteins! The makeup of the “BiP-binding sequence” was confirmed and extended when a peptide display library was used to elucidate the peptide sequences that could bind to BiP. A composite heptameric sequence, Hy-X-Hy-X-Hy-X-Hy, was derived from this data, where Hy is a bulky aromatic or hydrophobic residue and X is any amino acid [35]. The alternating pattern of Hy residues in the binding motif is compatible with BiP binding to proteins when they are in an extended conformation, causing the bulky aromatic/hydrophobic side chains to lie on one side of the protein and to presumably point into the protein-binding site on BiP. The results of these two studies further suggested that BiP would bind to hydrophobic stretches on nascent or unassembled polypeptides, which would become inaccessible after folding and assembly were complete. Studies that monitored the oxidation state of the vesicular stomatitis virus G-protein [36] or of Ig light chains [22, 37] as an indication of their folding status demonstrated that BiP bound to regions of the protein that were not yet oxidized and that loss of BiP binding corresponded to the protein achieving a completely oxidized state. BiP’s ability to bind and release substrate proteins is nucleotide-dependent, a feature that is conserved with all Hsp70 family members. The ATPase cycle has been described in great detail for bacterial and cytosolic family members and works as follows. Hsp70 associates best with client proteins when it is in the ATP-bound state. DnaJ proteins bind and catalyze the rapid hydrolysis of ATP to ADP [38], “locking” the Hsp70 protein onto the unfolded substrate. The next step involves the exchange of ATP back into the nucleotide binding cleft, which “reopens” the Hsp70 protein, thereby releasing the substrate protein and giving it an opportunity to fold. In bacteria and mitochondria, GrpE proteins regulate nucleotide exchange [38]. Mammalian GrpE homologues have not been identified in other organelles, but several other cytosolic proteins have been identified that play various roles in controlling nucleotide exchange [39–43]. Very recently, ER-localized nucleotidereleasing factors were identified in yeast [44, 45] and mammals [46].
3
4 Protein Folding in the Endoplasmic Reticulum
BiP binds both to normal proteins that have not completely folded or assembled [15, 16] and to mutant proteins that are unable to fold [17]. BiP release is dependent on its ability to bind ATP and undergo a conformational change, thus altering the protein-binding domain so that the unfolded protein can be released and given the opportunity to fold. The premature release of T cell receptor subunits from BiP can be achieved in vivo by depleting calcium from the ER, which results in BiP release and the inappropriate secretion of individual subunits [47]. Alternatively, depletion of ATP from the ER with the drug CCCP results in prolonged association of proteins with BiP and a block in their secretion [48]. However, it does not appear that BiP release is the only requirement for all proteins to fold properly. While the in vitro release of BiP from Ig light chains with ATP can lead to folding and disulfide bond formation [37], a similar release of BiP from Ig heavy chains results in the formation of large aggregates [24]. Thus, individual properties of proteins dictate whether they will be able to fold upon BiP release. Given estimates on molecular crowding inside the cell and the subsequent effects crowding can have on protein folding [49], it is reasonable to assume that BiP release might be a tightly regulated process in vivo. Data to support this idea came from studies to measure the rate of BiP cycling on and off unassembled Ig heavy chains. Both wild-type and ATPasedefective hamster BiP were co-expressed with Ig heavy chains under conditions where heavy chains bound equally to both wild-type and mutant BiP during a short pulse [24]. The mutant BiP was expected to act as a kinetic trap by binding to nascent heavy chains but not leasing them, whereas the wild-type BiP should release during its normal course of cycling. Each time the wild-type BiP releases, it should be replaced half the time by another wild-type BiP and the other half by a mutant. Eventually, and under steady-state conditions, all the heavy chains should be bound to mutant BiP. Instead, the study revealed that heavy chains remained bound to both wild-type and mutant BiP and that the ratio of the two types of BiP did not change over a long chase period [24]. However, introduction of light chains rapidly displaced wild-type but not mutant BiP, suggesting that the ATPase cycle of BiP is “stalled” when heavy chains are expressed alone and that assembly with light chains either releases a repressive factor or allows access of a stimulatory factor. This further suggests that the substrate itself can play a role in contributing to the release of BiP or the rate at which the ATPase cycle turns. Although peptide-binding preferences for BiP have been determined, relatively little is known about actual BiP-binding sites on proteins. Early studies to identify the BiP-binding site on unassembled Ig heavy chains revealed that for all isotypes, the CH 1 domain, which pairs with Ig light chains, possesses the “stable” BiPbinding site [50]. A more recent study revealed that, unlike other domains on the heavy chain, the CH 1 domain remains unfolded until it assembles with light chains [51], thus providing a long-term BiP-binding site. An algorithm based on the binding of BiP to peptides displayed on bacterial phages [35] was applied to Ig domains, where it was found that each Ig domain possessed multiple potential BiPbinding sites [52]. Peptides corresponding to these sequences did indeed stimulate the ATPase activity of BiP, demonstrating that they were capable of binding to the chaperone. The more rapid folding of other domains probably accounts for the
3 ER-localized DnaJ Homologues
fact that BiP associates only transiently with them [53] or, in some cases, that BiP fails to bind altogether [54]. The fact that multiple potential BiP-binding sites were identified on the CH 1 domain [52], even though BiP binds in a 1:1 stoichiometry to this domain in vivo [55], suggests either that the binding of BiP to one site precludes its binding to others or that the algorithm does not accurately identify BiP-binding sites that are used on proteins in vivo. Further studies are needed to determine which is correct. In addition to BiP, both yeast and mammals possess a second ER-localized Hsp70 family member, Lhs1p [56] and GRP170 [57], respectively, that is less well characterized than BiP. Lhs1p (also known as Cer1p and Ssi1p) is an Hsp70-related protein that appears to be important for the translocation of a subset of proteins into the ER of S. cerevisiae [45, 58]. Lhs1p/Cer1p possesses an ATP-binding site like Kar2p and other members of the Hsp70 family [59]. The CER1 RNA levels increase during UPR activation [56] and at lower temperatures, which is in keeping with the more severe defects in folding and translocation that are observed in the cer strain at low temperatures [59]. These results suggest that Lhs1p/Cer1p provides an additional chaperoning activity in processes known to require Kar2p. In mammalian cells, GRP170 has been identified as an ATP-binding Hsp70 homologue [57] that is induced by ER stress [60] and hypoxia, which has led to its alternative designation as oxygen-regulated protein ORP150 [61, 62]. GRP170 can be co-precipitated with unassembled Ig heavy chains [60], although it is not clear whether it binds to them directly or as a component of the multi-chaperone complex that was recently identified in the ER [55]. GRP170 shows sequence similarity to Lhs1p and may also play a role in the translocation of nascent polypeptides into the ER lumen [63].
3 ER-localized DnaJ Homologues
The ATPase activity of Hsp70 proteins is stimulated by interactions with DnaJ proteins. The first DnaJ was discovered in Escherichia coli, where it cooperates with DnaK to aid in lambda phage replication [64]. It is a 43-kDa protein containing four domains, beginning with the N-terminal, 73-amino-acid J-domain, which is present in all DnaJ family members and contains the hallmark tripeptide HPD motif (His-Pro-Asp) [65, 66]. A flexible Gly/Phe-rich domain links the J-domain to a cysteine-rich Zn2+ -binding domain [67, 68]. Distal to the Cys-rich domain is a poorly conserved region that accounts for nearly half of the DnaJ molecule and that may contain the substrate-binding domain [69]. Presently, more than 100 DnaJ family members have been identified, which can be found in all species and organelles. They can be divided into three subgroups based upon the degree of domain conservation with E. coli DnaJ [69]. Type I DnaJ-like proteins have the highest domain homology with E. coli DnaJ and possess all four domains. Type II DnaJ proteins have an N-terminal J-domain and the Gly/Phe-rich linker but lack the Zn2+ -binding domain. Type III proteins possess only a J-domain, which can occur anywhere in the protein.
5
6 Protein Folding in the Endoplasmic Reticulum
Organelle-specific DnaJs work as cofactors to cooperate with their specific Hsp70 partners, and, unlike Hsp70s, when multiple DnaJs are present in an organelle, each one often regulates a different function. The yeast ER contains three DnaJ-like proteins: Sec63p [70, 71], Jem1p [72], and Scj1p [73]. Sec63p and Jem1p are both type III DnaJ proteins [72, 74], whereas Scj1p is a type I protein [73]. Sec63p is an integral membrane protein possessing three transmembrane domains, with its Jdomain in the ER lumen and a large C-terminal domain exposed to the cytosol [74]. As one subunit of the ER translocon, Sec63p interacts with Sec61p, Sec62p, Sec71p, and Sec72p and recruits yeast BiP, Kar2p, to the lumenal side of the translocation apparatus [75]. Sec63p assists Kar2p in pulling nascent proteins into the ER lumen during both co-translational and posttranslational translocation [28, 76, 77]. Sec63p also appears to be a component of the retrograde translocon, which plays a role in removing terminally misfolded proteins from the ER for degradation by the 26S proteasome [78]. Unlike Sec63p, which is a membrane protein required for growth, Jem1p and Scj1p are both soluble ER lumenal proteins that are not essential for cell viability under normal growth conditions. However, a double disruption of the JEM1 and SCJ1 genes causes growth arrest at elevated temperatures. Jem1p interacts with Kar2p to mediate nuclear membrane fusion or karyogamy during mating [72], while Scj1p cooperates with Kar2p to fold and assemble proteins in the ER lumen [73]. Recent studies show that both Scj1p and Jem1p may also assist Kar2p in maintaining lumenal ERAD substrates in a retrotranslocation-competent state [32]. Recently, five mammalian ER DnaJ homologues have been identified. They appear to include two Sec63 homologues, one Scj1 homologue, and two novel eukaryotic family members (Table 1). However, no JEM homologues have been identified at this time. According to their order of discovery, we propose they be named ERdj1–5 for ER-associated DnaJ proteins. They are ERdj1/Mtj1 [79, 80], ERdj2/hSec63 [81, 82], ERdj3/HEDJ [55, 83, 84], ERdj4/Mdg1 [85, 86], and ERdj5/JPDI [87, 88]. All of the ERdjs contain a J-domain, which interacts with BiP and stimulates its ATPase activity in vitro. Since mammalian BiP has multiple functions in the ER, it is possible that these ERdjs each regulate different BiP functions in vivo. Like yeast Sec63p, ERdj1/Mtj1 (66 kDa) and ERdj2/hSec63 (97 kDa) are type III DnaJ proteins associated with the mammalian Sec61p complex in the ER membrane [82, 89, 90]. The N-terminal 190 amino acids of ERdj1/Mtj1 are oriented in the ER lumen and contain the J-domain, which could serve to recruit BiP to the translocon. A large C-terminal cytosolic domain is in close contact with SRP in the cytosol and appears to modulate translation [89]. Inspection of gene databases reveals that ERdj1/Mtj1 is highly conserved in mammals, with homologues present in Drosophila melanogaster, Ciona intestinalis, and Anopheles gambiae. ERdj2/hSec63 displays similar membrane topology and shares 44% amino acid sequence similarity (26% identity) with yeast Sec63p [81]. Like ERdj1/Mtj1, ERdj2/hSec63 is associated with Sec61 and Sec62, where it may act with BiP to translocate nascent polypeptides into the ER [82, 90]. ERdj2/hSec63 is expressed at relatively higher levels (1.98 µM) in pancreatic microsomes than is ERdj1/Mtj1 (0.36 µM) [89].
3 ER-localized DnaJ Homologues Table 1 Identification and characterization of mammalian ER DnaJ proteins
ERdj2/hSec63 homologues can also be found in organisms as diverse as Arabidopsis thaliana, Caenorhabditis elegans, Neurospora crassa, and A. gambiae. ERdj3/HEDJ is a 43-kDa type I DnaJ protein that shows 46% sequence similarity (35% identity) to yeast Scj1p. Despite the absence of a KDEL retention signal, it appears to be a resident ER protein, although there is some controversy as to its intracellular localization [55, 83, 84, 91] and whether it is a membrane-anchored or soluble protein [55, 83, 84]. ERdj3/HEDJ was identified in association with Shiga toxin [84], which is taken up by the cell at the plasma membrane and passes through the ER before being retrotranslocated to the cytosol, where it performs its pathogenic function [92, 93]. The fact that ERdj3/HEDJ interacts with long-lived, BiP-bound, unassembled Ig heavy chains in vivo [55] suggests that it is likely to play a role in keeping nascent proteins from aggregating and in aiding protein folding and assembly. Alternatively, ERdj3 could act to target ERAD substrates to the retrotranslocon for degradation. Either function would be compatible with its association with Shiga toxin [84]. ERdj3 appears to be widely expressed, with highly homologous orthologs present in D. melanogaster, C. elegans, Danio rerio, A. gambiae, A. thaliana, and Zea mays in addition to yeast. ERdj4/Mdg1 (26 kDa) is the first type II DnaJ homologue to be found in the ER of any organism [86]. It appears to be largely restricted to vertebrates, since no potential ERdj4 homologues could be found in the D. melanogaster, C. elegans, or S. cerevisiae genomic databases. ERdj4/Mdj1 is anchored in the ER membrane through
7
8 Protein Folding in the Endoplasmic Reticulum
its uncleaved signal peptide, thus orienting its J-domain and the rest of the protein inside the ER where it interacts with BiP [86]. However, another study using an N-terminal-tagged version of ERdj4/Mdg2 reported nuclear localization after heat shock and cytosolic expression before heat shock [85]. ERdj4 is expressed at the highest levels in secretory tissues and is highly induced by ER stress, suggesting that ERdj4/Mdj1 may play a role in either protein folding or ERAD [86]. Overexpression of ERdj4/Mdg1 inhibited cell death induced by ER stress [94], suggesting that it plays some protective role during the UPR. The last mammalian DnaJ homologue to be discovered, ERdj5/JPDI (87 kDa), contains a J-domain at its N-terminus, a PDI-like domain, a thioredoxin domain, and a KDEL sequence at its C-terminus [88]. This latter feature suggests that it is likely to be a soluble ER lumenal protein, although this has not been formally demonstrated. The presence of a thioredoxin-like domain suggests that ERdj5 may be involved in assisting protein folding and disulfide bond formation in the ER. Homologues of ERdj5/JPDI can also be found in C. elegans, D. melanogaster, Ciona intestinalis, and A. gambiae, but not in yeast or plants. Like BiP and other ER chaperones, some of the ERdjs are also upregulated in response to ER stress. ERdj3 and ERdj4 transcripts are dramatically upregulated in response to UPR activation [86]. Combined with the fact that both are expressed at the highest levels in secretory tissues (Figure 1), it is likely that they play roles in either the refolding of unfolded proteins or the retrotranslocation of misfolded proteins, both of which diminish the accumulation of unfolded proteins that occurs
Fig. 1 Tissue distribution and ER stress inducibility of BiP and its cofactors. (A) A human multi-tissue Northern blot was hybridized with probes corresponding to the coding regions of human ERdj3, mouse ERdj4, human BAP, hamster BiP, and human $-actin genes. Lane 1: peripheral blood leukocytes; lane 2: lung; lane 3: placenta; lane 4: small intestine; lane 5: liver; lane 6:
kidney; lane 7: spleen; lane 8: thymus; lane 9: colon; lane 10: skeletal muscle; lane 11: heart; lane 12: brain. (B) NIH3T3 fibroblasts were incubated with the ER stress-inducing agents tunicamycin or thapsigargin. RNA was isolated at the indicated times and subjected to Northern blot analyses using the indicated probes.
4 ER-localized Nucleotide-exchange/releasing Factors 9
in the ER during stress conditions. Inspection of genomic sequences upstream of these two genes reveals that ERdj3/HEDJ contains a putative ERSE element in its promoter, which could serve to regulate ERdj3 expression, but no potential ERSE can be found in the ERdj4 promoter. ERdj5/JPDI has been reported to contain one ERSE in its promoter and to respond to ER stress, albeit less dramatically than either ERdj3 or ERdj4 [87]. Unlike yeast Sec63, ERdj2/hSec63 is not upregulated during ER stress [86]. This is compatible with the fact that translation and translocation of newly synthesized peptides slow down when mammalian cells encounter ER stress and that, unlike yeast, mammalian cells do not respond to ER stress by dramatically expanding the ER membranes [95, 96]. Because BiP has multiple functions in vivo and because a second Hsp70 homologue, GRP170/ORP150 [60], also exists in the mammalian ER, we anticipate that various ER DnaJ homologues will be specific to the different functions of the mammalian ER Hsp70s and that more ERdjs are therefore likely to be discovered. At present, little functional data are available for most of the mammalian ER DnaJs, and it is not known whether any of them interact with GRP170 in vivo.
4 ER-localized Nucleotide-exchange/releasing Factors
In order for Hsp70 proteins to be efficiently released from unfolded substrate proteins, ATP must replace ADP in the nucleotide-binding cleft. In bacteria, this feat is accomplished by the nucleotide-exchange factor GrpE, which binds to the ATPase domain of DnaK (the bacterial Hsp70 ortholog) and promotes the exchange of ADP to ATP, consequently releasing the unfolded substrate [38, 97]. In the mammalian cytosol, a number of both positive and negative regulators of nucleotide exchange have been identified. Hip is an Hsc70-interacting protein that binds to the ATPase domain of Hsc70 and stabilizes it in the ADP-bound state [42, 98]. The Bcl-2-binding anti-apoptotic factor BAG-1 was found to enhance nucleotide exchange from Hsp70 [41, 99, 100], but interestingly this does not always lead to enhanced folding activity. The third cofactor for Hsp70, HspBP1, promotes nucleotide exchange from Hsp70 in vitro, which actually inhibits its chaperoning activity for some proteins [101]. A homologue of HspBP1, Fes1p, was recently identified in S. cerevisiae, where it appears to be a much less efficient nucleotide-exchange factor than HspBP1 [102]. Interestingly, none of these Hsp70 cofactors share any homology with GrpE. The first potential nuclear-exchange factor for BiP came from a genetic screen in the yeast Yarrowia lipolytica to identify genes that interacted with the signal recognition particle in co-translational translocation. This screen identified SLS1, a resident ER protein [103]. A homologue of Sls1p, ScSls1p, was identified in S. cerevisiae, which was independently identified as Per100p [96] and Sil1p [45]. Further studies revealed that Sls1p binds directly to the ATPase domain of Kar2p and, in the presence of the J-domain of Sec63p, enhances ATP hydrolysis [45, 102]. Binding studies revealed that Sls1p prefers Kar2p when it is in the ADP-bound form, suggesting that it could act as a nucleotide-exchange factor [44]. SLS1/SIL1
10 Protein Folding in the Endoplasmic Reticulum
is a nonessential gene, but its overexpression can suppress the lethal phenotype observed in yeasts that lack both Ire1p, the kinase that signals the unfolded protein response in yeast, and Lhs1p, a second resident ER Hsp70 family member [45]. Recently, a mammalian nucleotide-releasing factor for BiP was identified in a yeast two-hybrid screen that used the ATPase domain of a BiP mutant as bait [46]. BAP (BiP-associated protein) shares low sequence homology with both Sls1p and HspBP1. BAP is an ER-localized glycoprotein that is ubiquitously expressed but which shows the highest levels of expression in tissues with a well-developed secretory pathway. BAP stimulates the ATPase activity of BiP by accelerating nucleotide release from BiP. Surprisingly, both ATP and ADP can be released by BAP in vitro, but, like Sls1p [44], BAP appears to bind better or more stably to the ADP-bound state of BiP [46]. This property may be critical in allowing BAP to drive the ATPase cycle of BiP forward. In contrast to Sls1p, whose expression level is increased by ER stress [45], BAP transcripts are not induced. In fact, BAP protein levels actually appear to decline [46], suggesting that mammalian cells might be able to regulate BiP release from substrates by controlling the ratio of BiP and BAP. It is not known whether BAP can also serve as a nucleotide-releasing factor for GRP170 or if another specific release factor exists. It has been suggested that nucleotide-exchange/releasing factors are not necessary for some Hsp70s including BiP, since nucleotide hydrolysis, not exchange, appears to be the rate-limiting step in the reaction [104, 105]. However, it is important to point out that these assays are done in the presence of only ATP and not with a combination of ATP and ADP as would be expected to occur in vivo, although currently there are no measurements of the relative ratio of the two nucleotides. The rate-limiting step in the ATPase cycle of DnaK is also ATP hydrolysis [106, 107], but it is clear that the nucleotide-exchange factor GrpE plays an important role in the DnaK cycle in vivo [108].
5 Organization and Relative Levels of Chaperones in the ER
Data obtained from a number of studies have demonstrated that multiple ER chaperones can associate with a given nascent protein. Sitia and coworkers demonstrated that unoxidized Ig light chains form disulfide bonds transiently with both PDI and ERp72 and suggested that these proteins may form a kind of affinity matrix in the ER that impedes the transport of unoxidized nascent proteins [109]. Similarly, both thyroglobulin [110] and HCGβ [111] can be cross-linked to BiP, GRP94, and ERp72 during their maturation, and the influenza hemagglutinin protein binds to a number of ER proteins including BiP, GRP94, calreticulin, and calnexin when cross-linking agents are added to the cells [112]. However, it was not clear from these studies whether the chaperones were binding as a complex or whether the individual chaperones were binding to distinct unfolded regions on these proteins. Recently, membrane-permeable cross-linking studies demonstrated the existence of a large ER-localized multi-protein complex bound to unassembled
6 Regulation of ER Chaperone Levels
Ig heavy chains that is comprised of the molecular chaperones BiP, GRP94, CaBP1, PDI, ERdj3, cyclophilin B, ERp72, GRP170, UDP-glucosyltransferase (UDP-GT), and SDF2–L1 [55]. Except for ERdj3, and to a lesser extent PDI, this complex also forms in the absence of nascent protein synthesis and is found in a variety of cell types, suggesting that this subset of ER chaperones forms an ER network that can bind to unfolded protein substrates instead of existing as free pools that assemble onto substrate proteins. It is notable that most of the components of the calnexin/calreticulin system, which include some of the most abundant chaperones inside the ER, either are not detected in this complex or are only very poorly represented [55]. Further support for this type of sub-organellar organization of chaperones comes from a study in which fluorescence microscopy was used. The precursor of the human asialoglycoprotein receptor, H2a, and the free heavy chains of MHC class I molecules accumulated in a compartment containing calnexin and calreticulin, but not BiP, PDI, or UDP-GT, when proteasomal degradation was inhibited [113]. Together these studies suggest a spatial separation of the two chaperone systems that may account for the temporal interactions observed in other studies [114, 115].
6 Regulation of ER Chaperone Levels
Changes in the normal physical environment of the cell (e.g., decreases in pH, energy, oxygen, glucose, or other nutrients) can dramatically affect the normal biosynthesis of proteins in the ER and can result in the accumulation of unfolded proteins. Under these conditions, a signal transduction pathway, termed the unfolded protein response (UPR), is activated to protect the cell by preventing the formation of insoluble protein aggregates. The hallmark of the ER stress response, and perhaps the only component of the response that is truly ER-specific, is the coordinate transcriptional upregulation of most ER chaperones and folding enzymes [8]. A second characteristic of the UPR, and one unique to metazoans, is the inhibition of protein synthesis, which serves to limit the accumulation of unfolded proteins in the ER. This occurs via phosphorylation of the α-subunit of eukaryotic translation initiation factor 2 (eIF2α) at Ser51 , which reduces the frequency of translation initiation and thereby inhibits new protein synthesis [116]. The UPR pathway was first delineated in yeast. The identification of an unfolded protein response element (UPRE) in the yeast Kar2 (BiP) promoter [117] allowed investigators to use genetic approaches to characterize the signaling pathway. Ire1/Ern1, the first component to be identified, is a transmembrane, ER-localized kinase that possesses an N-terminal stress-sensing domain in the lumen of the ER and cytoplasmic kinase domain [95, 118]. In response to ER stress, Ire1 dimerizes and is phosphorylated, which serves to activate a unique endonuclease activity at its C-terminus [119]. The target of this activity is a precursor mRNA that encodes the Hac1 transcription factor. After cleavage by Ire1p and religation by Rlg1, Hac1p is synthesized and regulates the expression of UPR target genes by binding to
11
12 Protein Folding in the Endoplasmic Reticulum
the UPRE in their promoter [120]. This single signaling cascade is responsible for activating the UPR in yeast. In higher eukaryotes, the elements of this signaling pathway are conserved but greatly expanded. Two Ire1 homologues exist in mammalian cells; Ire1α, which is ubiquitously expressed [121], and Ire1β, whose expression is restricted to gut epithelium [122]. Both proteins possess a lumen stress-sensing domain, a cytosolic kinase domain, and the unique endonuclease domain found in yeast Ire1p. The target of Ire1’s endonuclease activity is XBP1 mRNA, which alters the C-terminus of this transcription factor so that it encodes a protein with both a DNA-binding domain and a strong transactivation domain [123, 124]. Although XBP1 was first identified in a screen to identify proteins that bind to ER stress-regulated elements in mammalian chaperone promoters [125], it does not appear that either Ire1 or XBP1 [126, 127] is required for chaperone upregulation during ER stress, since mouse embryonic fibroblasts that do not express these proteins are still capable of inducing the chaperones. Instead, it is possible that the ATF6 transcription factor, which is synthesized as an ER-localized transmembrane protein [128] is responsible for their induction [127]. Activation of the UPR in mammalian cells leads to transport of ATF6 to the Golgi, where it is cleaved by the S1P and S2P proteases [129], thus liberating the cytosolic transcription factor domain from the membranes. Cleaved ATF6 can then enter the nucleus, bind to conserved ER stress elements (ERSE) that are found in multiple copies in the promoters of most ER chaperones and folding enzymes, and presumably upregulate their transcription during ER stress [125]. The latter point remains to be formally demonstrated either by examining chaperone induction in cells that are null for ATF6 or by showing that endogenous ATF6 binds to the chaperone promoters in a stress-inducible manner. A third arm of the mammalian UPR is represented by PERK/PEK, an eIF–2α kinase that is responsible for the transient inhibition of protein synthesis that occurs during ER stress [130, 131]. Although PERK null cells can still upregulate ER chaperones and folding enzymes during ER stress, the magnitude of the response is not as high as in wild-type cells [132], suggesting that something downstream of PERK contributes to their transcriptional upregulation. The mechanism by which the ER stress signal is transduced has been recently determined. Earlier studies revealed that the initial signal for activating the UPR was the accumulation of unfolded proteins in the ER [133] and that all the agents that induced the UPR would be expected to dramatically affect protein folding in this organelle [134]. However, not all unfolded proteins are able to activate the response. Apparently, those that bind to BiP do [133, 135], whereas those that bind to other chaperones do not [136]. Since only BiP-binding proteins appeared to activate the response, these studies suggested that levels of free BiP might be monitored by the cell to judge changes in the folding environment of the ER. Identification of the UPR transducer proteins allowed this to be examined directly. Studies revealed that both Ire1 and PERK were associated with BiP during normal physiological conditions, which kept them in a monomeric, non-activated state. Activation of the UPR with thapsigargin or DTT led to the rapid loss of BiP from the lumenal domain of these proteins and a concomitant oligomerization and activation of the
7 Disposal of BiP-associated Proteins That Fail to Fold or Assemble 13
two signal transducers [137]. Similar results have been obtained with yeast Ire1p [138]. Another study showed that the lumenal domain of ATF6 was also associated with BiP prior to stress, which in this case served to retain ATF6 in the ER [139]. Activation of the stress response leads to BiP release and transport of ATF6 to the Golgi, where the cytosolic transcription factor domain is liberated by the S1P and S2P proteases that reside there [129]. Thus, BiP directly regulates the UPR by controlling the activation status of the three transducers. It is likely that as stress conditions are alleviated and the pool of free BiP increases, BiP also plays a role in shutting down the response. In keeping with a central role of BiP in regulating the response, a recent study found that BiP is not readily translated early in the stress response even though BiP transcripts begin to increase very early [140].
7 Disposal of BiP-associated Proteins That Fail to Fold or Assemble
Proteins that have ultimately failed ER quality control are degraded to prevent their accumulation in the ER, which might either titrate out the components of the chaperone systems or form large insoluble aggregates that would be toxic to the cell. This turnover mechanism is termed ER-associated protein degradation (ERAD), which is conserved from yeast to mammals [14, 141]. The final steps of this ERAD process have been best characterized in yeast. Both malfolded proteins and excess subunits of multimeric proteins are retrotranslocated or dislocated back into the cytosol through a structure, which appears to be similar to the translocon used by nascent polypeptide chains to enter the ER lumen [31]. This retrotranslocation process is usually coupled with ubiquitination, which occurs at the cytosolic surface of the ER membrane. Ubiquitin (Ub) is a highly conserved small protein that is universally expressed in eukaryotic cells. Ubiquitination of substrates is a multistep process that is dependent on a Ub-activating enzyme (E1), a Ub-conjugating enzyme (E2), and a Ub ligase (E3) enzyme [142, 143]. E1 binds to Ub, adenylates its C-terminus, and then binds to an E2 to transfer Ub to its catalytic subunit. E3, which is usually substrate-specific, will bring the substrate to an E2 and mediate the polyubiquitination process. E2 enzymes, such as Ubc6p and Ubc7p in yeast, are recruited by Cue1p or E3 enzymes such as Hrd1p/Der3p to the ER membrane and positioned near the translocon to directly facilitate the ERAD process [144– 146]. The ubiquitinated ERAD substrates are then degraded by the cytosolic 26S proteasome [147]. This appears to be an important process in maintaining ER homeostasis during normal physiological conditions, since interfering with this process by either expressing mutants of the ERAD pathway [96, 148, 149] or using proteasomal inhibitors [150] results in activation of the ER stress pathway. However, the upstream ERAD signals that help cells select malfolded proteins and feed them into the downstream degradation mechanism remain fairly elusive. One mechanism for identifying malfolded glycoproteins for ERAD involves the ER chaperones calnexin, calreticulin, and calmegin, which constitute a machinery called the “CNX cycle” [151, 152]. Glycoproteins with nine mannoses are allowed
14 Protein Folding in the Endoplasmic Reticulum
to bind to and are then released from calnexin by alternating actions of glucosidase II and UDP-GT. The incorrectly folded proteins are allowed multiple rounds of association and dissociation to acquire the correct conformation until the outermost unit of mannose from the middle branch of the sugar is cleaved by ER mannosidase I. Glycoproteins tagged with Man8–glycans now have a lower affinity for UDP-GT [153] but a higher affinity for EDEM [154]. Unlike CNX, EDEM is a UPR-inducible ER membrane protein with homology to α-mannosidase but lacks mannosidase activity. EDEM will then extract malfolded ERAD substrates from the CNX cycle and feed them into the downstream ERAD machinery via mechanisms that are still unclear [155, 156]. Calnexin is required for degradation of ERAD substrates in an in vitro system in which ER membranes from yeast are used [31]. Kar2p was shown to be important in keeping the ERAD substrates in a soluble and retrotranslocationcompetent state in the yeast system [32]. Recent data on ERAD in mammalian cells have suggested that calnexin and BiP play sequential roles in identifying and targeting ERAD substrates for degradation [157].
8 Other Roles of BiP in the ER
In addition to their role in folding nascent proteins, both calnexin and BiP appear to aid the translocation of nascent polypeptide chains into the ER. BiP has been shown to “plug” the translocon during early stages of protein translocation to maintain the permeability barrier between the ER and cytosol [158]. This puts BiP in an ideal place to bind nascent chains as they enter the ER, and indeed a number of studies have shown that Kar2p together with Sec63p is required for the translocation of nascent proteins into the yeast ER [29, 30, 159]. However, there are currently no data to show a similar role for mammalian BiP. The recent identification of two mammalian homologues of Sec63 [79, 81] is certainly compatible with such a role. A final function of the ER is to house the calcium stores that are essential for many intracellular signaling pathways. Along with a number of other ER chaperones, BiP is a calcium-binding protein that contributes significantly to the calcium stores of the ER [160].
9 Conclusions
In conclusion, the ER is the site of most secretory protein synthesis, where aqueous channels must be opened to allow the nascent polypeptide chains to enter the ER lumen. Care must be taken to ensure that the permeability barrier between the ER and cytosol are preserved in order to maintain the unique environment of the ER. Once inside the ER lumen, the protein must fold and assemble into its mature tertiary and quaternary form. Proteins that fail to do so must be identified and targeted for retrotranslocation into the cytosol, where they become substrates
References
for the 26S proteasome. Changes in the physiological environment of the ER that could affect protein maturation must be monitored and responded to by increasing ER chaperones and presumably proteins involved in maintaining and restoring the ER environment. BiP has been shown to be involved in all of these functions and to directly regulate activation of the UPR that maintains ER homeostasis. As such BiP can be considered a master sensor and regulator of ER function. All of the ER functions, with the exception of calcium storage, require BiP’s ATPase activity and as a result are likely to involve BiP regulators including the ERdjs and BAP. Thus, we believe that functions for the regulators in these processes will be revealed in the future and that additional family members are likely to be discovered.
Acknowledgements
We wish to thank Ms. Melissa Mann for help in preparing and editing the manuscript.
References 1 JOHNSON, A. E. & van WAES, M. A. (1999). The translocon: a dynamic gateway at the ER membrane. Annu. Rev. Cell Dev. Biol. 15, 799–842. 2 CROWLEY, K. S., LIAO, S., WORRELL, V. E., REINHART, G. D., & JOHNSON, A. E. (1994). Secretory proteins move through the endoplasmic reticulum membrane via an aqueous, gated pore. Cell 78, 461–471. 3 KOWARIK, M., KUNG, S., MARTOGLIO, B., & HELENIUS, A. (2002). Protein folding during cotranslational translocation in the endoplasmic reticulum. Mol. Cell 10, 769–778. 4 WHITLEY, P., NILSSON, I. M., & von HEIJNE, G. (1996). A nascent secretory protein may traverse the ribosome/endoplasmic reticulum translocase complex as an extended chain. J. Biol. Chem. 271, 6241–6244. 5 BERGMAN, L. W. & KUEHL, W. M. (1979). Formation of an intrachain disulfide bond on nascent immunoglobulin light chains. J. Biol. Chem. 254, 8869–8876. 6 CHEN, W., HELENIUS, J., BRAAKMAN, I., & HELENIUS, A. (1995). Cotranslational folding and calnexin binding during
7
8
9
10
11
12
13
glycoprotein synthesis. Proc. Natl. Acad. Sci. U.S.A 92, 6229–6233. BERGMAN, L. W. & KUEHL, W. M. (1979). Formation of intermolecular disulfide bonds on nascent immunoglobulin polypeptides. J. Biol. Chem. 254, 5690–5694. LEE, A. S. (1987). Coordinated regulation of a set of genes by glucose and calcium ionophores in mammalian cells. Trends. Biochem Sci. 12, 20–23. HWANG, C., SINSKEY, A. J., & LODISH, H. F. (1992). Oxidized redox state of glutathione in the endoplasmic reticulum. Science 257, 1496–1502. ELLGAARD, L., MOLINARI, M., & HELENIUS, A. (1999). Setting the standards: quality control in the secretory pathway. Science 286, 1882–1888. HAMMOND, C. & HELENIUS, A. (1995). Quality control in the secretory pathway. Curr. Opin. Cell Biol. 7, 523–529. HAMPTON, R. Y. (2002). ER-associated degradation in protein quality control and cellular regulation. Curr. Opin. Cell Biol. 14, 476–482. JAROSCH, E., LENK, U., & SOMMER, T. (2003). Endoplasmic reticulum-
15
16 Protein Folding in the Endoplasmic Reticulum
14
15
16
17
18
19
20
21
22
23
24
associated protein degradation. Int. Rev. Cytol. 223, 39–81. WERNER, E. D., BRODSKY, J. L., & MCCRACKEN, A. A. (1996). Proteasome-dependent endoplasmic reticulum-associated protein degradation: an unconventional route to a familiar fate. Proc. Natl. Acad. Sci. U.S.A. 93, 13797–13801. HAAS, I. G. & WABL, M. (1983). Immunoglobulin heavy chain binding protein. Nature 306, 387–389. BOLE, D. G., HENDERSHOT, L. M., & KEARNEY, J. F. (1986). Posttranslational association of immunoglobulin heavy chain binding protein with nascent heavy chains in nonsecreting and secreting hybridomas. J. Cell. Biol. 102, 1558–1566. GETHING, M. J. & SAMBROOK, J. (1992). Protein folding in the cell. Nature 355, 33–45. MUNRO, S. & PELHAM, H. R. (1987). A C-terminal signal prevents secretion of luminal ER proteins. Cell 48, 899–907. LEWIS, M. J. & PELHAM, H. R. (1992). Sequence of a second human KDEL receptor. J. Mol. Biol. 226, 913–916. LEWIS, M. J., SWEET, D. J., & PELHAM, H. R. (1990). The ERD2 gene determines the specificity of the luminal ER protein retention system. Cell 61, 1359–1363. GAUT, J. R. & HENDERSHOT, L. M. (1993). Mutations within the nucleotide binding site of immunoglobulin-binding protein inhibit ATPase activity and interfere with release of immunoglobulin heavy chain. J. Biol. Chem. 268, 7248–7255. HENDERSHOT, L., WEI, J., GAUT, J., MELNICK, J., AVIEL, S., & ARGON, Y. (1996). Inhibition of immunoglobulin folding and secretion by dominant negative BiP ATPase mutants. Proc. Natl. Acad. Sci. U.S.A. 93, 5269–5274. KASSENBROCK, C. K. & KELLY, R. B. (1989). Interaction of heavy chain binding protein (BiP/GRP78) with adenine nucleotides. EMBO. J. 8, 1461–1467. VANHOVE, M., USHERWOOD, Y.-K., & HENDERSHOT, L. M. (2001). Unassembled Ig heavy chains do not cycle from BiP in vivo, but require light chains to trigger their release. Immunity 15, 105–114.
25 CAO, X., ZHOU, Y., & LEE, A. S. (1995). Requirement of tyrosine- and serine/threonine kinases in the transcrip-tional activation of the mammalian grp78/BiP promoter by thapsigargin. J. Biol. Chem. 270, 494–502. 26 HENDERSHOT, L. M., WEI, J.-Y., GAUT, J. R., LAWSON, B., FREIDEN, P. J., & MURTI, K. G. (1995). In vivo expression of mammalian BiP ATPase mutants causes disruption of the endoplasmic reticulum. Mol. Biol. Cell. 6, 283–296. 27 ROSE, M. D., MISRA, L. M., & VOGEL, J. P. (1989). KAR2, a karyogamy gene, is the yeast homolog of the mammalian BiP/GRP78 gene [published erratum appears in Cell 1989 Aug 25; 58(4): following 801]. Cell 57, 1211–1221. 28 BRODSKY, J. L., GOECKELER, J., & SCHEKMAN, R. (1995). BiP and Sec63p are required for both co- and posttranslational protein translocation into the yeast endoplasmic reticulum. Proc. Natl. Acad. Sci. U.S.A. 92, 9643–9646. 29 BRODSKY, J. L. & SCHEKMAN, R. (1993). A Sec63p-BiP complex from yeast is required for protein translocation in a reconstituted proteoliposome. J. Cell Biol. 123, 1355–1363. 30 VOGEL, J. P., MISRA, L. M., & ROSE, M. D. (1990). Loss of BiP/GRP78 function blocks translocation of secretory proteins in yeast. J. Cell. Biol. 110, 1885–1895. 31 BRODSKY, J. L., WERNER, E. D., DUBAS, M. E., GOECKELER, J. L., KRUSE, K. B., & MCCRACKEN, A. A. (1999). The requirement for molecular chaperones during endoplasmic reticulum-associated protein degradation demonstrates that protein export and import are mechanistically distinct. J. Biol. Chem. 274, 3453–3460. 32 NISHIKAWA, S. I., FEWELL, S. W., KATO, Y., BRODSKY, J. L., & ENDO, T. (2001). Molecular chaperones in the yeast endoplasmic reticulum maintain the solubility of proteins for retro-translocation and degradation. J. Cell Biol. 153, 1061–1070. 33 FLYNN, G. C., CHAPPELL, T. G., & ROTHMAN, J. E. (1989). Peptide binding
References
34
35
36
37
38
39
40
41
42
and release by proteins implicated as catalysts of protein assembly. Science 245, 385–390. FLYNN, G. C., POHL, J., FLOCCO, M. T., & ROTHMAN, J. E. (1991). Peptide-binding specificity of the molecular chaperone BiP. Nature 353, 726–730. BLOND-ELGUINDI, S., CWIRLA, S. E., DOWER, W. J., LIPSHUTZ, R. J., SPRANG, S. R., SAMBROOK, J. F., & GETHING, M. J. (1993). Affinity panning of a library of peptides displayed on bacteriophages reveals the binding specificity of BiP. Cell 75, 717–728. MACHAMER, C. E., DOMS, R. W., BOLE, D. G., HELENIUS, A., & ROSE, J. K. (1990). Heavy chain binding protein recognizes incompletely disulfide-bonded forms of vesicular stomatitis virus G protein. J. Biol. Chem. 265, 6879–6883. KNITTLER, M. R., DIRKS, S., & HAAS, I. G. (1995). Molecular chaperones involved in protein degradation in the endoplasmic reticulum: Quantitative interaction of the heat shock cognate protein BiP with partially folded immunoglobulin light chains that are degraded in the endoplasmic reticulum. Proc. Natl. Acad. Sci. U.S.A. 92, 1764–1768. LIBEREK, K., MARSZALEK, J., ANG, D., GEORGOPOULOS, C., & ZYLICZ, M. (1991). Escherichia coli DnaJ and GrpE heat shock proteins jointly stimulate ATPase activity of DnaK. Proc. Natl. Acad. Sci. U.S.A. 88, 2874–2878. CHEN, S. & SMITH, D. F. (1998). Hop as an adaptor in the heat shock protein 70 (Hsp70) and hsp90 chaperone machinery. J. Biol. Chem. 273, 35194–35200. GROSS, M. & HESSEFORT, S. (1996). Purification and characterization of a 66-kDa protein from rabbit reticulocyte lysate which promotes the recycling of hsp 70. J. Biol. Chem. 271, 16833–16841. HOHFELD, J. & JENTSCH, S. (1997). GrpE-like regulation of the hsc70 chaperone by the anti-apoptotic protein BAG-1. EMBO J. 16, 6209–6216. HOHFELD, J., MINAMI, Y., & HARTL, F. U. (1995). Hip, a novel cochaperone
43
44
45
46
47
48
49
50
51
involved in the eukaryotic Hsc70/Hsp40 reaction cycle. Cell 83, 589–598. TAKAYAMA, S., BIMSTON, D. N., MATSUZAWA, S., FREEMAN, B. C., AIME-SEMPE, C., XIE, Z., MORIMOTO, R. I., & REED, J. C. (1997). BAG-1 modulates the chaperone activity of Hsp70/Hsc70. EMBO Journal. 16, 4887–4896. BOISRAME, A., KABANI, M., BECKERICH, J. M., HARTMANN, E., & GAILLARDIN, C. (1998). Interaction of Kar2p and Sls1p is required for efficient co-translational translocation of secreted proteins in the yeast Yarrowia lipolytica. J. Biol. Chem. 273, 30903–30908. TYSON, J. R. & STIRLING, C. J. (2000). LHS1 and SIL1 provide a lumenal function that is essential for protein translocation into the endoplasmic reticulum. EMBO J. 19, 6440–6452. CHUNG, K. T., SHEN, Y., & HENDERSHOT, L. M. (2002). BAP, a mammalian BiP associated protein, is a nucleotide exchange factor that regulates the ATPase activity of BiP. J. Biol. Chem. 277, 47557–47563. SUZUKI, C. K., BONIFACINO, J. S., LIN, A. Y., DAVIS, M. M., & KLAUSNER, R. D. (1991). Regulating the retention of T-cell receptor alpha chain variants within the endoplasmic reticulum: Ca(2+)-dependent association with BiP. J. Cell. Biol. 114, 189–205. DORNER, A. J., WASLEY, L. C., & KAUFMAN, R. J. (1990). Protein dissociation from GRP78 and secretion are blocked by depletion of cellular ATP levels. Proc. Natl. Acad. Sci. U.S.A. 87, 7429–7432. MARTIN, J. & HARTL, F. U. (1997). The effect of macromolecular crowding on chaperonin-mediated protein folding. Proc. Natl. Acad. Sci. U.S.A 94, 1107–1112. HENDERSHOT, L., BOLE, D., KOHLER, G., & KEARNEY, J. F. (1987). Assembly and secretion of heavy chains that do not associate posttranslationally with immunoglobulin heavy chain-binding protein. J. Cell. Biol. 104, 761–767. LEE, Y.-K., BREWER, J. W., HELLMAN, R., & HENDERSHOT, L. M. (1999). BiP and Ig light chain cooperate to control the folding of heavy chain and ensure the
17
18 Protein Folding in the Endoplasmic Reticulum
52
53
54
55
56
57
58
59
60
fidelity of immunoglobulin assembly. Mol. Biol. Cell. 10, 2209–2219. KNARR, G., GETHING, M. J., MODROW, S., & BUCHNER, J. (1995). BiP binding sequences in antibodies. J. Biol. Chem. 270, 27589–27594. KALOFF, C. R. & HAAS, I. G. (1995). Coordination of immunoglobulin chain folding and immunoglobulin chain assembly is essential for the formation of functional IgG. Immunity 2, 629–637. HELLMAN, R., VANHOVE, M., LEJEUNE, A., STEVENS, F. J., & HENDERSHOT, L. M. (1999). The in vivo association of BiP with newly synthesized proteins is dependent on their rate and stability of folding and simply on the presence of sequences that can bind to BiP. J. Cell. Biol. 144, 21–30. MEUNIER, L., USHERWOOD, Y. K., CHUNG, K. T., & HENDERSHOT, L. M. (2002). A subset of chaperones and folding enzymes form multiprotein complexes in endoplasmic reticulum to bind nascent proteins. Mol. Biol. Cell 13, 4456–4469. CRAVEN, R. A., EGERTON, M., & STIRLING, C. J. (1996). A novel Hsp70 of the yeast ER lumen is required for the efficient translocation of a number of protein precursors. EMBO J. 15, 2640–2650. EASTON, D. P., KANEKO, Y., & SUBJECK, J. R. (2000). The hsp110 and Grp170 stress proteins: newly recognized relatives of the Hsp70s. Cell Stress. Chaperones. 5, 276–290. HAMILTON, T. G. & FLYNN, G. C. (1996). Cer1p, a novel Hsp70-related protein required for posttranslational endoplasmic reticulum translocation in yeast. J. Biol. Chem. 271, 30610–30613. HAMILTON, T. G., NORRIS, T. B., TSURUDA, P. R., & FLYNN, G. C. (1999). Cer1p functions as a molecular chaperone in the endoplasmic reticulum of Saccharomyces cerevisiae. Mol. Cell Biol. 19, 5298–5307. LIN, H. Y., MASSO-WELCH, P., DI, Y. P., CAI, J. W., SHEN, J. W., & SUBJECK, J. R. (1993). The 170-kDa glucose-regulated stress protein is an endoplasmic reticulum protein that binds
61
62
63
64
65
66
67
68
69
immunoglobulin. Mol. Biol. Cell. 4, 1109–1119. KANEDA, S., YURA, T., & YANAGI, H. (2000). Production of three distinct mRNAs of 150 kDa oxygen-regulated protein (ORP150) by alternative promoters: preferential induction of one species under stress conditions. J. Biochem. (Tokyo) 128, 529–538. OZAWA, K., TSUKAMOTO, Y., HORI, O., KITAO, Y., YANAGI, H., STERN, D. M., & OGAWA, S. (2001). Regulation of tumor angiogenesis by oxygen-regulated protein 150, an inducible endoplasmic reticulum chaperone. Cancer Res. 61, 4206–4213. DIERKS, T., VOLKMER, J., SCHLENSTEDT, G., JUNG, C., SANDHOLZER, U., ZACHMANN, K., SCHLOTTERHOSE, P., NEIFER, K., SCHMIDT, B., & ZIMMERMANN, R. (1996). A microsomal ATP-binding protein involved in efficient protein transport into the mammalian endoplasmic reticulum. EMBO J. 15, 6931–6942. YOCHEM, J., UCHIDA, H., SUNSHINE, M., SAITO, H., GEORGOPOULOS, C. P., & FEISS, M. (1978). Genetic analysis of two genes, dnaJ and dnaK, necessary for Escherichia coli and bacteriophage lambda DNA replication. Mol. Gen. Genet. 164, 9–14. CHEETHAM, M. E. & CAPLAN, A. J. (1998). Structure, function and evolution of DnaJ: conservation and adaptation of chaperone function. Cell Stress & Chaperones 3, 28–36. KELLEY, W. L. (1998). The J-domain family and the recruitment of chaperone power. Trends Biochem. Sci. 23, 222–227. BANECKI, B., LIBEREK, K., WALL, D., WAWRZYNOW, A., GEORGOPOULOS, C., BERTOLI, E., TANFANI, F., & ZYLICZ, M. (1996). Structure-function analysis of the zinc finger region of the DnaJ molecular chaperone. J. Biol. Chem. 271, 14840–14848. SZABO, A., KORSZUN, R., HARTL, F. U., & FLANAGAN, J. (1996). A zinc fingerlike domain of the molecular chaperone DnaJ is involved in binding to denatured protein substrates. EMBO J. 15, 408–417. HENNESSY, F., CHEETHAM, M. E., DIRR, H. W., & BLATCH, G. L. (2000). Analysis of
References
70
71
72
73
74
75
76
77
the levels of conservation of the J-domain among the various types of DnaJ-like proteins. Cell Stress. Chaperones. 5, 347–358. ROTHBLATT, J. A., DESHAIES, R. J., SANDERS, S. L., DAUM, G., & SCHEKMAN, R. (1989). Multiple genes are required for proper insertion of secretory proteins into the endoplasmic reticulum in yeast. J. Cell Biol. 109, 2641–2652. SADLER, I., CHIANG, A., KURIHARA, T., ROTHBLATT, J., WAY, J., & SILVER, P. (1989). A yeast gene important for protein assembly into the endoplasmic reticulum and the nucleus has homology to DnaJ, an Escherichia coli heat shock protein. J. Cell Biol. 109, 2665–2675. NISHIKAWA, S. & ENDO, T. (1997). The yeast JEM1p is a DnaJ-like protein of the endoplasmic reticulum membrane required for nuclear fusion. J. Biol. Chem. 272, 12889–12892. SCHLENSTEDT, G., HARRIS, S., RISSE, B., LILL, R., & SILVER, P. A. (1995). A yeast DnaJ homologue, Scj1p, can function in the endoplasmic reticulum with BiP/Kar2p via a conserved domain that specifies interactions with Hsp70s. J. Cell Biol. 129, 979–988. FELDHEIM, D., ROTHBLATT, J., & SCHEKMAN, R. (1992). Topology and functional domains of Sec63p, an endoplasmic reticulum membrane protein required for secretory protein translocation. Mol. Cell. Biol. 12, 3288–3296. CORSI, A. K. & SCHEKMAN, R. (1997). The lumenal domain of Sec63p stimulates the ATPase activity of BiP and mediates BiP recruitment to the translocon in Saccharomyces cerevisiae. J. Cell Biol. 137, 1483–1493. MCCLELLAN, A. J., ENDRES, J. B., VOGEL, J. P., PALAZZI, D., ROSE, M. D., & BRODSKY, J. L. (1998). Specific molecular chaperone interactions and an ATP-dependent conformational change are required during post-translational protein translocation into the yeast ER. Mol. Biol. Cell 9, 3533–3545. YOUNG, B. P., CRAVEN, R. A., REID, P. J., WILLER, M., & STIRLING, C. J. (2001). Sec63p and Kar2p are required for the
78
79
80
81
82
83
84
85
86
translocation of SRP-depen-dent precursors into the yeast endoplasmic reticulum in vivo. EMBO J. 20, 262–271. PLEMPER, R. K., BOHMLER, S., BORDALLO, J., SOMMER, T., & WOLF, D. H. (1997). Mutant analysis links the translocon and BiP to retrograde protein transport for ER degradation. Nature 388, 891–895. BRIGHTMAN, S. E., BLATCH, G. L., & ZETTER, B. R. (1995). Isolation of a mouse cDNA encoding MTJ1, a new murine member of the DnaJ family of proteins. Gene 153, 249–254. CHEVALIER, M., RHEE, H., ELGUINDI, E. C., & BLOND, S. Y. (2000). Interaction of murine BiP/GRP78 with the DnaJ homologue MTJ1. J. Biol. Chem. 275, 19620–19627. SKOWRONEK, M. H., ROTTER, M., & HAAS, I. G. (1999). Molecular characterization of a novel mammalian DnaJ-like Sec63p homolog. Biol. Chem. 380, 1133–1138. TYEDMERS, J., LERNER, M., BIES, C., DUDEK, J., SKOWRONEK, M. H., HAAS, I. G., HEIM, N., NASTAINCZYK, W., VOLKMER, J., & ZIMMERMANN, R. (2000). Homologs of the yeast Sec complex subunits Sec62p and Sec63p are abundant proteins in dog pancreas microsomes. Proc. Natl. Acad. Sci. U.S.A 97, 7214–7219. LAU, P. P., VILLANUEVA, H., KOBAYASHI, K., NAKAMUTA, M., CHANG, B. H., & CHAN, L. (2001). A DnaJ protein, apobec-1-binding protein-2, modulates apolipoprotein B mRNA editing. J. Biol. Chem. 276, 46445–46452. YU, M., HASLAM, R. H., & HASLAM, D. B. (2000). HEDJ, an Hsp40 cochaperone localized to the endo-plasmic reticulum of human cells. J. Biol. Chem. 275, 24984–24992. PROLS, F., MAYER, M. P., RENNER, O., CZARNECKI, P. G., AST, M., GASSLER, C., WILTING, J., KURZ, H., & CHRIST, B. (2001). Upregulation of the cochaperone Mdg1 in endothelial cells is induced by stress and during in vitro angiogenesis. Exp. Cell Res. 269, 42–53. SHEN, Y., MEUNIER, L., & HENDERSHOT, L. M. (2002). Identification and characterization of a novel endoplasmic reticulum (ER) DnaJ homologue, which
19
20 Protein Folding in the Endoplasmic Reticulum
87
88
89
90
91
92
93
stimulates ATPase activity of BiP in vitro and is induced by ER stress. J. Biol. Chem. (2002). May, 3; 277, (18): 15947–15956. J. Biol. Chem. 277, 15947–15956. CUNNEA, P. M., MIRANDA-VIZUETE, A., BERTOLI, G., SIMMEN, T., DAMDIMOPOULOS, A. E., HERMANN, S., LEINONEN, S., HUIKKO, M. P., GUSTAFSSON, J. A., SITIA, R., & SPYROU, G. (2003). ERdj5, an endoplasmic reticulum (ER)-resident protein containing DnaJ and thioredoxin domains, is expressed in secretory cells or following ER stress. J. Biol. Chem. 278, 1059–1066. HOSODA, A., KIMATA, Y., TSURU, A., & KOHNO, K. (2003). JPDI, a novel endoplasmic reticulum-resident protein containing both a BiP-interacting J-domain and thioredoxin-like motifs. J. Biol. Chem. 278, 2669–2676. DUDEK, J., VOLKMER, J., BIES, C., GUTH, S., MULLER, A., LERNER, M., FEICK, P., SCHAFER, K. H., MORGENSTERN, E., HENNESSY, F., BLATCH, G. L., JANOSCHECK, K., HEIM, N., SCHOLTES, P., FRIEN, M., NASTAINCZYK, W., & ZIMMERMANN, R. (2002). A novel type of co-chaperone mediates transmembrane recruitment of DnaK-like chaperones to ribosomes. EMBO J. 21, 2958–2967. MEYER, H. A., GRAU, H., KRAFT, R., KOSTKA, S., PREHN, S., KALIES, K. U., & HARTMANN, E. (2000). Mammalian Sec61 is associated with Sec62 and Sec63. J. Biol. Chem. 275, 14550–14557. BIES, C., GUTH, S., JANOSCHEK, K., NASTAINCZYK, W., VOLKMER, J., & ZIMMERMANN, R. (1999). A Scj1p homolog and folding catalysts present in dog pancreas microsomes. Biol. Chem. 380, 1175–1182. JOHANNES, L. & GOUD, B. (2000). Facing inward from compartment shores: how many pathways were we looking for? Traffic. 1, 119–123. SANDVIG, K., GARRED, O., PRYDZ, K., KOZLOV, J. V., HANSEN, S. H., & van DEURS, B. (1992). Retrograde transport of endocytosed Shiga toxin to the endoplasmic reticulum. Nature 358, 510–512.
94 KURISU, J., HONMA, A., MIYAJIMA, H., KONDO, S., OKUMURA, M., & IMAIZUMI, K. (2003). MDG1/ERdj4, an ER-resident DnaJ family member, suppresses cell death induced by ER stress. Genes Cells 8, 189–202. 95 COX, J. S., SHAMU, C. E., & WALTER, P. (1993). Transcriptional induction of genes encoding endoplasmic reticulum resident proteins requires a transmembrane protein kinase. Cell 73, 1197–1206. 96 TRAVERS, K. J., PATIL, C. K., WODICKA, L., LOCKHART, D. J., WEISSMAN, J. S., & WALTER, P. (2000). Functional and genomic analyses reveal an essential coordination between the unfolded protein response and ER-associated degradation. Cell 101, 249–258. 97 SZABO, A., LANGER, T., SCHRODER, H., FLANAGAN, J., BUKAU, B., & HARTL, F. U. (1994). The ATP hydrolysis-dependent reaction cycle of the Escherichia coli Hsp70 system DnaK, DnaJ, and GrpE. Proc. Natl. Acad. Sci. U.S.A. 91, 10345–10349. 98 PRAPAPANICH, V., CHEN, S., TORAN, E. J., RIMERMAN, R. A., & SMITH, D. F. (1996). Mutational analysis of the hsp70-interacting protein Hip. Mol. Cell Biol. 16, 6200–6207. 99 BIMSTON, D., SONG, J., WINCHESTER, D., TAKAYAMA, S., REED, J. C., & MORIMOTO, R. I. (1998). BAG-1, a negative regulator of Hsp70 chaperone activity, uncouples nucleotide hydrolysis from substrate release. EMBO Journal 17, 6871–6878. 100 STUART, J. K., MYSZKA, D. G., JOSS, L., MITCHELL, R. S., MCDONALD, S. M., XIE, Z., TAKAYAMA, S., REED, J. C., & ELY, K. R. (1998). Characterization of interactions between the anti-apoptotic protein BAG-1 and Hsc70 molecular chaperones. J. Biol. Chem. 273, 22506–22514. 101 RAYNES, D. A. & GUERRIERO, V., Jr. (1998). Inhibition of Hsp70 ATPase activity and protein renaturation by a novel Hsp70-binding protein. J. Biol. Chem. 273, 32883–32888. 102 KABANI, M., BECKERICH, J. M., & GAILLARDIN, C. (2000). Sls1p stimulates Sec63p-mediated activation of Kar2p in a
References
103
104
105
106
107
108
109
110
111
conformation-dependent manner in the yeast endoplasmic reticulum. Mol. Cell Biol. 20, 6923–6934. BOISRAME, A., BECKERICH, J. M., & GAILLARDIN, C. (1996). Sls1p, an endoplasmic reticulum component, is involved in the protein translocation process in the yeast Yarrowia lipolytica. J. Biol. Chem. 271, 11668–11675. GAO, B., EMOTO, Y., GREENE, L., & EISENBERG, E. (1993). Nucleotide binding properties of bovine brain uncoating ATPase. J. Biol. Chem. 268, 8507– 8513. MAYER, M., REINSTEIN, J., & BUCHNER, J. (2003). Modulation of the ATPase cycle of BiP by peptides and proteins. J. Mol. Biol. 330, 137–144. MCCARTY, J. S., BUCHBERGER, A., REINSTEIN, J., & BUKAU, B. (1995). The role of ATP in the functional cycle of the DnaK chaperone system. J. Mol. Biol. 249, 126–137. THEYSSEN, H., SCHUSTER, H. P., PACKSCHIES, L., BUKAU, B., & REINSTEIN, J. (1996). The second step of ATP binding to DnaK induces peptide release. J. Mol. Biol. 263, 657–670. POLISSI, A., GOFFIN, L., & GEORGOPOULOS, C. (1995). The Escherichia coli heat shock response and bacterio-phage lambda development. FEMS Microbiol. Rev. 17, 159–169. REDDY, P., SPARVOLI, A., FAGIOLI, C., FASSINA, G., & SITIA, R. (1996). Formation of reversible disulfide bonds with the protein matrix of the endoplasmic reticulum correlates with the retention of unassembled Ig light chains. EMBO. J. 15, 2077–2085. KUZNETSOV, G., CHEN, L. B., & NIGAM, S. K. (1997). Multiple molecular chaperones complex with misfolded large oligomeric glycoproteins in the endoplasmic reticulum. J. Biol. Chem. 272, 3057–3063. FENG, W., MATZUK, M. M., MOUNTJOY, K., BEDOWS, E., RUDDON, R. W., & BOIME, I. (1995). The asparagine-linked oligosaccharides of the human chorionic gonadotropin beta subunit facilitate correct disulfide bond pairing. J. Biol. Chem. 270, 11851–11859.
112 TATU, U. & HELENIUS, A. (1997). Interactions between newly synthesized glycoproteins, calnexin and a network of resident chaperones in the endoplasmic reticulum. J. Cell Biol. 136, 555–565. 113 KAMHI-NESHER, S., SHENKMAN, M., TOLCHINSKY, S., FROMM, S. V., EHRLICH, R., & LEDERKREMER, G. Z. (2001). A novel quality control compartment derived from the endoplasmic reticulum. Mol. Biol. Cell 12, 1711–1723. 114 HAMMOND, C. & HELENIUS, A. (1994). Folding of VSV G protein: sequential interaction with BiP and calnexin. Science 266, 456–458. 115 KIM, P. S. & ARVAN, P. (1995). Calnexin and BiP act as sequential molecular chaperones during thyroglobulin folding in the endoplasmic reticulum. J. Cell Biol. 128, 29–38. 116 PATHAK, V. K., SCHINDLER, D., & HERSHEY, J. W. (1988). Generation of a mutant form of protein synthesis initiation factor eIF-2 lacking the site of phosphorylation by eIF-2 kinases. Mol. Cell. Biol. 8, 993–995. 117 KOHNO, K., NORMINGTON, K., SAMBROOK, J., GETHING, M. J., & MORI, K. (1993). The promoter region of the yeast KAR2 (BiP) gene contains a regulatory domain that responds to the presence of unfolded proteins in the endoplasmic reticulum. Mol. Cell. Biol. 13, 877–890. 118 MORI, K., MA, W., GETHING, M. J., & SAMBROOK, J. (1993). A transmembrane protein with a cdc2+ /CDC28-related kinase activity is required for signalling from the ER to the nucleus. Cell 74, 743–756. 119 SIDRAUSKI, C. & WALTER, P. (1997). The transmembrane kinase Ire1p is a site-specific endonuclease that initiates mRNA splicing in the unfolded protein response. Cell 90, 1031–1039. 120 COX, J. S. & WALTER, P. (1996). A novel mechanism for regulating activity of a transcription factor that controls the unfolded protein response [see comments]. Cell 87, 391–404. 121 TIRASOPHON, W., WELIHINDA, A. A., & KAUFMAN, R. J. (1998). A stress response pathway from the endo-plasmic reticulum to the nucleus requires a
21
22 Protein Folding in the Endoplasmic Reticulum
122
123
124
125
126
127
128
129
novel bifunctional protein kinase/ endoribonuclease (Ire1p) in mammalian cells. Genes. Dev. 12, 1812–1824. WANG, X.-Z., HARDING, H. P., ZHANG, Y., JOLICOEUR, E. M., KURODA, M., & RON, D. (1998). Cloning of mammalian Ire1 reveals diversity in the ER stress responses. EMBO. J. 17, 5708–5717. CALFON, M., ZENG, H., URANO, F., TILL, J. H., HUBBARD, S. R., HARDING, H. P., CLASK, S. G., & RON, D. (2002). IRE1 couples endoplasmic reticulum load to secretory capacity by processing the XBP-1 mRNA. Nature 415, 92–96. YOSHIDA, H., MATSUI, T., YAMAMOTO, A., OKADA, T., & MORI, K. (2001). XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor. Cell 107, 881–891. YOSHIDA, H., HAZE, K., YANAGI, H., YURA, T., & MORI, K. (1998). Identification of the cis-acting endoplasmic reticulum stress response element responsible for transcriptional induction of mammalian glucose-regulated proteins. Involvement of basic leucine zipper transcription factors. J. Biol. Chem. 273, 33741–33749. BERTOLOTTI, A., WANG, X., NOVOA, I., JUNGREIS, R., SCHLESSINGER, K., CHO, J. H., WEST, A. B., & RON, D. (2001). Increased sensitivity to dextran sodium sulfate colitis in IRE1beta-deficient mice. J. Clin. Invest 107, 585–593. LEE, K., TIRASOPHON, W., SHEN, X., MICHALAK, M., PRYWES, R., OKADA, T., YOSHIDA, H., MORI, K., & KAUFMAN, R. J. (2002). IRE1-mediated unconventional mRNA splicing and S2P-mediated ATF6 cleavage merge to regulate XBP1 in signaling the unfolded protein response. Genes Dev. 16, 452–466. HAZE, K., YOSHIDA, H., YANAGI, H., YURA, T., & MORI, K. (1999). Mammalian transcription factor ATF6 is synthesized as a transmembrane protein and activated by proteolysis in response to endoplasmic reticulum stress. Mol. Biol. Cell 10, 3787–3799. YE, J., RAWSON, R. B., KOMURO, R., CHEN, X., DAVE, U. P., PRYWES, R., BROWN, M. S., & GOLDSTEIN, J. L. (2000). ER stress
130
131
132
133
134
135
136
137
138
induces cleavage of membrane-bound ATF6 by the same proteases that process SREBPs. Mol. Cell 6, 1355–1364. HARDING, H. P., ZHANG, Y., & RON, D. (1999). Protein translation and folding are coupled by an endo-plasmic-reticulum-resident kinase. Nature 397, 271–274. SHI, Y., VATTEM, K. M., SOOD, R., AN, J., LIANG, J., STRAMM, L., & WEK, R. C. (1998). Identification and characterization of pancreatic eukaryotic initiation factor 2 alpha-subunit kinase, PEK, involved in translational control. Mol. Cell Biol. 18, 7499–7509. HARDING, H. P., NOVOA, I., ZHANG, Y., ZENG, H., WEK, R., SCHAPIRA, M., & RON, D. (2000). Regulated translation initiation controls stress-induced gene expression in mammalian cells. Mol. Cell 6, 1099–1108. KOZUTSUMI, Y., SEGAL, M., NORMINGTON, K., GETHING, M. J., & SAMBROOK, J. (1988). The presence of malfolded proteins in the endoplasmic reticulum signals the induction of glucose-regulated proteins. Nature 332, 462–464. LEE, A. S. (1992). Mammalian stress response: induction of the glucose-regulated protein family. Curr. Opin. Cell Biol. 4, 267–273. LENNY, N. & GREEN, M. (1991). Regulation of endoplasmic reticulum stress proteins in COS cells trans-fected with immunoglobulin mu heavy chain cDNA. J. Biol. Chem. 266, 20532–20537. GRAHAM, K. S., LE, A., & SIFERS, R. N. (1990). Accumulation of the insoluble PiZ variant of human alpha 1-antitrypsin within the hepatic endoplasmic reticulum does not elevate the steady-state level of grp78/BiP. J. Biol. Chem. 265, 20463–20468. BERTOLOTTI, A., ZHANG, Y., HENDERSHOT, L. M., HARDING, H. P., & RON, D. (2000). Dynamic interaction of BiP and ER stress transducers in the unfolded-protein response. Nat. Cell Biol. 2, 326–332. OKAMURA, K., KIMATA, Y., HIGASHIO, H., TSURU, A., & KOHNO, K. (2000).
References
139
140
141
142
143
144
145
146
147
148
Dissociation of Kar2p/BiP from an ER sensory molecule, Ire1p, triggers the unfolded protein response in yeast. Biochem. Biophys. Res. Commun. 279, 445–450. SHEN, J., CHEN, X., HENDERSHOT, L., & PRYWES, R. (2002). ER stress regulation of ATF6 localization by dissociation of BiP/GRP78 binding and unmasking of Golgi localization signals. Dev. Cell. 3, 99–111. MA, Y. & HENDERSHOT, L. M. (2003). Delineation of the negative feedback regulatory loop that controls protein translation during ER stress. J. Biol. Chem. 278, 34864–34873. HAMPTON, R. Y. (2002). ER-associated degradation in protein quality control and cellular regulation. Curr. Opin. Cell Biol. 14, 476–482. FREIMAN, R. N. & TJIAN, R. (2003). Regulating the regulators: lysine modifications make their mark. Cell 112, 11–17. HERSHKO, A. & CIECHANOVER, A. (1998). The ubiquitin system. Annu. Rev. Biochem. 67, 425–479. BIEDERER, T., VOLKWEIN, C., & SOMMER, T. (1997). Role of Cue1p in ubiquitination and degradation at the ER surface. Science 278, 1806–1809. GILON, T., CHOMSKY, O., & KULKA, R. G. (2000). Degradation signals recognized by the Ubc6p–Ubc7p ubiquitin-conjugating enzyme pair. Mol. Cell Biol. 20, 7214–7219. WILHOVSKY, S., GARDNER, R., & HAMPTON, R. (2000). HRD gene dependence of endoplasmic reticulum associated degradation. Mol. Biol. Cell 11, 1697–1708. OBERDORF, J., CARLSON, E. J., & SKACH, W. R. (2001). Redundancy of mammalian proteasome beta subunit function during endoplasmic reticulum associated degradation. Biochem. 40, 13397–13405. FRIEDLANDER, R., JAROSCH, E., URBAN, J., VOLKWEIN, C., & SOMMER, T. (2000). A regulatory link between ER-associated protein degradation and the unfolded-protein response. Nat. Cell Biol. 2, 379–384.
149 NG, D. T., SPEAR, E. D., & WALTER, P. (2000). The unfolded protein response regulates multiple aspects of secretory and membrane protein biogenesis and endoplasmic reticulum quality control. J. Cell Biol. 150, 77–88. 150 BUSH, K. T., GOLDBERG, A. L., & NIGAM, S. K. (1997). Proteasome inhibition leads to a heat-shock response, induction of endoplasmic reticulum chaperones, and thermotolerance. J. Biol. Chem. 272, 9086–9092. 151 CABRAL, C. M., LIU, Y., & SIFERS, R. N. (2001). Dissecting glycoprotein quality control in the secretory pathway. Trends Biochem. Sci. 26, 619–624. 152 HELENIUS, A. & AEBI, M. (2001). Intracellular functions of N-linked glycans. Science 291, 2364–2369. 153 PARODI, A. J. (2000). Role of N-oligosaccharide endoplasmic reticulum processing reactions in glycoprotein folding and degradation. Biochem. J. 348 Pt 1, 1–13. 154 HOSOKAWA, N., WADA, I., HASEGAWA, K., YORIHUZI, T., TREMBLAY, L. O., HERSCOVICS, A., & NAGATA, K. (2001). A novel ER alpha-mannosidase-like protein accelerates ER-associated degradation. EMBO Rep. 2, 415–422. 155 MOLINARI, M., CALANCA, V., GALLI, C., LUCCA, P., & PAGANETTI, P. (2003). Role of EDEM in the release of misfolded glycoproteins from the calnexin cycle. Science 299, 1397–1400. 156 ODA, Y., HOSOKAWA, N., WADA, I., & NAGATA, K. (2003). EDEM as an acceptor of terminally misfolded glycoproteins released from calnexin. Science 299, 1394–1397. 157 MOLINARI, M., GALLI, C., PICCALUGA, V., PIEREN, M., & PAGANETTI, P. (2002). Sequential assistance of molecular chaperones and transient formation of covalent complexes during protein degradation from the ER. J. Cell Biol. 158, 247–257. 158 HAMMAN, B. D., HENDERSHOT, L. M., & JOHNSON, A. E. (1998). BiP maintains the permeability barrier of the ER membrane by sealing the lumenal end of the translocon pore before and early in translocation. Cell 92, 747–758.
23
24 Protein Folding in the Endoplasmic Reticulum
159 SANDERS, S. L., WHITFIELD, K. M., VOGEL, J. P., ROSE, M. D., & SCHEKMAN, R. W. (1992). Sec61p and BiP directly facilitate polypeptide translocation into the ER. Cell 69, 353–365. 160 LIEVREMONT, J. P., RIZZUTO, R., HENDERSHOT, L., & MELDOLESI, J. (1997). BiP, a major chaperone protein of the endoplasmic reticulum lumen, plays a direct and important role in the storage of the rapidly exchanging pool of Ca2+ . J. Biol. Chem. 272, 30873–30879. 161 (1995). Current Protocols in Protein Science John Wiley & Sons, Inc. 162 AUSUBEL, F. M., BRENT, R., KINGSTON, R. E., MOORE, D. D., SEIDMAN, J. G., SMITH, J. A., & STRUHL, K. (1989). Current Protocols in Molecular Biology John Wiley and Sons, New York, NY. 163 BAI, C. & ELLEDGE, S. J. (1996). Gene identification using the yeast two-hybrid system. Methods Enzymol. 273, 331–347. 164 BANECKI, B. & ZYLICZ, M. (1996). Real time kinetics of the DnaK/DnaJ/GrpE molecular chaperone machine action. J. Biol. Chem. 271, 6137–6143. 165 BASKIN, L. S. & YANG, C. S. (1980). Cross-linking studies of cytochrome P-450 and reduced nicotinamide adenine dinucleotide phosphate-cytochrome P-450 reductase. Biochem. 19, 2260–2264. 166 BORDIER, C. (1981). Phase separation of integral membrane proteins in Triton X-114 solution. J. Biol. Chem. 256, 1604–1607. 167 BOWDEN, G. A. & GEORGIOU, G. (1990). Folding and aggregation of beta-lactamase in the periplasmic space of Escherichia coli. J. Biol. Chem. 265, 16760–16766. 168 BRAAKMAN, I., HELENIUS, J., & HELENIUS, A. (1992). Manipulating disulfide bond formation and protein folding in the endoplasmic reticulum. EMBO. J. 11, 1717–1722. 169 BRAAKMAN, I., HOOVER LITTY, H., WAGNER, K. R., & HELENIUS, A. (1991). Folding of influenza hemagglutinin in the endoplasmic reticulum. J. Cell. Biol. 114, 401–411. 170 BUKAU, B. & HORWICH, A. L. (1998). The
171
172
173
174
175
176
177
178
179
Hsp70 and Hsp60 chaperone machines. Cell 92, 351–366. BURGESS, A. J., MATTHEWS, I., GRIMES, E. A., MATA, A. M., MUNKONGE, F. M., LEE, A. G., & EAST, J. M. (1991). Chemical cross-linking and enzyme kinetics provide no evidence for a regulatory role for the 53 kDa glycoprotein of sarcoplasmic reticulum in calcium transport. Biochim. Biophys. Acta 1064, 139–147. BURT, H. M. & JACKSON, J. K. (1990). Role of membrane proteins in monosodium urate crystal-membrane interactions. II. Effect of pretreatments of erythrocyte membranes with membrane permeable and impermeable protein cross-linking agents. J. Rheumatol. 17, 1359–1363. CARLSSON, J., DREVIN, H., & AXEN, R. (1978). Protein thiolation and reversible protein-protein conjugation. N-Succinimidyl 3-(2-pyridyldithio)propionate, a new heterobifunctional reagent. Biochem. J. 173, 723–737. CEREGHINO, G. P., CEREGHINO, J. L., ILGEN, C., & CREGG, J. M. (2002). Production of recombinant proteins in fermenter cultures of the yeast Pichia pastoris. Curr. Opin. Biotechnol. 13, 329–332. CHAPPELL, T. G., KONFORTI, B. B., SCHMID, S. L., & ROTHMAN, J. E. (1987). The ATPase core of a clathrin uncoating protein. J. Biol. Chem. 262, 746–751. CHENGALVALA, M. V., LUBECK, M. D., SELLING, B. J., NATUK, R. J., HSU, K. H., MASON, B. B., CHANDA, P. K., BHAT, R. A., BHAT, B. M., MIZUTANI, S., et al. (1991). Adenovirus vectors for gene expression. Curr. Opin. Biotechnol. 2, 718–722. CHEVALIER, M., KING, L., & BLOND, S. (1998). Purification and properties of BiP. Methods Enzymol. 290, 384–409. CHIRICO, W. J., MARKEY, M. L., & FINK, A. L. (1998). Conformational changes of an Hsp70 molecular chaperone induced by nucleotides, polypeptides, and N-ethylmaleimide. Biochem. 37, 13862–13870. CLAROS, M. G., BRUNAK, S., & von HEIJNE, G. (1997). Prediction of
References
180
181
182
183
184
185
186
187
188
189
N-terminal protein sorting signals. Curr. Opin. Struct. Biol. 7, 394–398. CREEMERS, J. W., van de LOO, J. W., PLETS, E., HENDERSHOT, L. M., & Van De VEN, W. J. (2000). Binding of BiP to the processing enzyme lymphoma proprotein convertase prevents aggregation, but slows down maturation. J. Biol. Chem. 275, 38842–38847. CREGG, J. M., VEDVICK, T. S., & RASCHKE, W. C. (1993). Recent advances in the expression of foreign genes in Pichia pastoris. Biotechnology (N. Y.) 11, 905–910. DIGILIO, F. A., MORRA, R., PEDONE, E., BARTOLUCCI, S., & ROSSI, M. (2003). High-level expression of Aliciclobacillus acidocaldarius thioredoxin in Pichia pastoris and Bacillus subtilis. Protein Expr. Purif. 30, 179–184. FIELDS, S. & SONG, O. (1989). A novel genetic system to detect protein-protein interactions. Nature 340, 245–246. FLAHERTY, K. M., DELUCA FLAHERTY, C., & MCKAY, D. B. (1990). Three-dimensional structure of the ATPase fragment of a 70K heat-shock cognate protein. Nature 346, 623–628. FLAHERTY, K. M., WILBANKS, S. M., DELUCA FLAHERTY, C., & MCKAY, D. B. (1994). Structural basis of the 70-kilodalton heat shock cognate protein ATP hydrolytic activity. II. Structure of the active site with ADP or ATP bound to wild type and mutant ATPase fragment. J. Biol. Chem. 269, 12899–12907. FLEISCHER, S. & KERVINA, M. (1974). Subcellular fractionation of rat liver. Methods Enzymol. 31, 6–41. GAO, B., GREENE, L., & EISENBERG, E. (1994). Characterization of nucleotide-free uncoating ATPase and its binding to ATP, ADP, and ATP analogues. Biochem. 33, 2048–2054. GARNIER, A., COTE, J., NADEAU, I., KAMEN, A., & MASSIE, B. (1994). Scale-up of the adenovirus expression system for the production of recombinant protein in human 293S cells. Cytotechnology 15, 145–155. GASSLER, C. S., WIEDERKEHR, T., BREHMER, D., BUKAU, B., & MAYER, M. P.
190
191
192
193
194
195
196
197
198
199
(2001). Bag-1M accelerates nucleotide release for human Hsc70 and Hsp70 and can act concentration-dependent as positive and negative cofactor. J. Biol. Chem. 276, 32538–32544. GOSSEN, M. & BUJARD, H. (1992). Tight control of gene expression in mammalian cells by tetracycline-responsive promoters. Proc. Natl. Acad. Sci. U.S.A. 89, 5547–5551. HA, J. H. & MCKAY, D. B. (1994). ATPase kinetics of recombinant bovine 70 kDa heat shock cognate protein and its amino-terminal ATPase domain. Biochemistry. 33, 14625–14635. HART, G. W. (1997). Dynamic O-linked glycosylation of nuclear and cytoskeletal proteins. Annu. Rev. Biochem. 66, 315–335. HELLERS, M., GUNNE, H., & STEINER, H. (1991). Expression of posttranslational processing of preprocecropin A using a baculovirus vector. Eur. J. Biochem. 199, 435–439. JARVIS D. L. and SUMMERS M.. (1992). Baculovirus expression vectors. In Recombinant DNA vaccines: Rationale and Strategies, pp. 265–291, Marcel Dekker, New York. JONES, S., LITT, R. J., RICHARDSON, C. J., & SEGEV, N. (1995). Requirement of nucleotide exchange factor for Ypt1 GTPase mediated protein transport. J. Cell Biol. 130, 1051–1061. KABANI, M., MCLELLAN, C., RAYNES, D. A., GUERRIERO, V., & BRODSKY, J. L. (2002). HspBP1, a homologue of the yeast Fes1 and Sls1 proteins, is an Hsc70 nucleotide exchange factor. FEBS Lett. 531, 339–342. KOIDE, N. & MURAMATSU, T. (1975). Specific inhibition of endo-beta-N-acetylglucosaminidase D by mannose and simple mannosides. Biochem. Biophys. Res. Commun. 66, 411–416. KORNFELD, R. & KORNFELD, S. (1985). Assembly of asparagine-linked oligosaccharides. Annu. Rev. Biochem 54, 631–664. LEHLE, L. & TANNER, W. (1976). The specific site of tunicamycin inhibition in the formation of dolichol-bound
25
26 Protein Folding in the Endoplasmic Reticulum
200
201
202
203
204
205
206
207
208
N-acetylglucosamine derivatives. FEBS Lett. 72, 167–170. MALBY, R. L., CALDWELL, J. B., GRUEN, L. C., HARLEY, V. R., IVANCIC, N., KORTT, A. A., LILLEY, G. G., POWER, B. E., WEBSTER, R. G., COLMAN, P. M., et al. (1993). Recombinant antineuraminidase single chain antibody: expression, characterization, and crystallization in complex with antigen. Proteins 16, 57–63. MCKAY, D. B. (1993). Structure and mechanism of 70-kDa heat-shock-related proteins. Adv. Protein. Chem. 44, 67–98. MELNICK, J., AVIEL, S., & ARGON, Y. (1992). The endoplasmic reticulum stress protein GRP94, in addition to BiP, associates with unassembled immunoglobulin chains. J. Biol. Chem. 267, 21303–21306. MELNICK, J., DUL, J. L., & ARGON, Y. (1994). Sequential interaction of the chaperones BiP and Grp94 with immunoglobulin chains in the endoplasmic reticulum. Nature 370, 373–375. MINAMI, Y., HOHFELD, J., OHTSUKA, K., & HARTL, F. U. (1996). Regulation of the heat-shock protein 70 reaction cycle by the mammalian DnaJ homolog, Hsp40. J. Biol. Chem. 271, 19617–19624. MORRIS, J. A., DORNER, A. J., EDWARDS, C. A., HENDERSHOT, L. M., & KAUFMAN, R. J. (1997). Immunoglobulin binding protein (BiP) function is required to protect cells from endoplasmic reticulum stress but is not required for the secretion of selective proteins. J. Biol. Chem. 272, 4327–4334. NEU, H. C. & HEPPEL, L. A. (1965). The release of enzymes from Escherichia coli by osmotic shock and during the formation of spheroplasts. J. Biol. Chem. 240, 3685–3692. NICCHITTA, C. V. & BLOBEL, G. (1993). Lumenal proteins of the mammalian endoplasmic reticulum are required to complete protein translocation. Cell 73, 989–998. OKA, T., SAKAMOTO, S., MIYOSHI, K., FUWA, T., YODA, K., YAMASAKI, M., TAMURA, G., & MIYAKE, T. (1985). Synthesis and secretion of human epidermal growth factor by Escheri-chia
209
210
211
212
213
214
215
216
217
218
coli. Proc. Natl. Acad. Sci. U.S.A 82, 7212–7216. PALLEROS, D. R., REID, K. L., SHI, L., WELCH, W. J., & FINK, A. L. (1993). ATP-induced protein-Hsp70 complex dissociation requires K+ but not ATP hydrolysis. Nature 365, 664–666. PIWNICA-WORMS, H., WILLIAMS, N. G., CHENG, S. H., & ROBERTS, T. M. (1990). Regulation of pp60c-src and its interaction with polyomavirus middle T antigen in insect cells. J. Virol. 64, 61–68. PLUMMER, T. H., Jr., ELDER, J. H., ALEXANDER, S., PHELAN, A. W., & TARENTINO, A. L. (1984). Demonstration of peptide: N-glycosidase F activity in endo-beta-N-acetylgluco-saminidase F preparations. J. Biol. Chem. 259, 10700–10704. POWER, B. E., IVANCIC, N., HARLEY, V. R., WEBSTER, R. G., KORTT, A. A., IRVING, R. A., & HUDSON, P. J. (1992). High-level temperature-induced synthesis of an antibody VH-domain in Escherichia coli using the PelB secretion signal. Gene 113, 95–99. QIAN, P., LI, X., TONG, G., & CHEN, H. (2003). High-level Expression of the ORF6 Gene of Porcine Reproductive and Respiratory Syndrome Virus (PRRSV) in Pichia pastoris. Virus Genes 27, 189–196. ROTH, R. A. & PIERCE, S. B. (1987). In vivo cross-linking of protein disulfide isomerase to immunoglobulins. Biochem. 26, 4179–4182. STEBBINS, J. & DEBOUCK, C. (1994). Expression systems for retroviral proteases. Methods Enzymol. 241, 3–16. TACHIBANA, H., TAKEKOSHI, M., CHENG, X. J., MAEDA, F., AOTSUKA, S., & IHARA, S. (1999). Bacterial expression of a neutralizing mouse monoclonal antibody Fab fragment to a 150-kilodalton surface antigen of Entamoeba histolytica. Am. J. Trop. Med. Hyg. 60, 35–40. TAKAHASHI, N. (1977). Demonstration of a new amidase acting on glycopeptides. Biochem. Biophys. Res. Commun. 76, 1194–1201. TRIMBLE, R. B., TARENTINO, A. L., PLUMMER, T. H., Jr., & MALEY, F. (1978). Asparaginyl glycopeptides with a low
References
219
220
221
222
223
mannose content are hydrolyzed by endo-beta-N-acetylglucosaminidase H. J. Biol. Chem. 253, 4508–4511. WATKINS, J. D., HERMANOWSKI, A. L., & BALCH, W. E. (1993). Oligomerization of immunoglobulin G heavy and light chains in vitro. A cell-free assay to study the assembly of the endo-plasmic reticulum. J. Biol. Chem. 268, 5182–5192. WAWRZYNOW, A. & ZYLICZ, M. (1995). Divergent effects of ATP on the binding of the DnaK and DnaJ chaperones to each other, or to their various native and denatured protein substrates. Journal. of. Biological. Chemistry. 270, 19300–19306. WEI, J.-Y., GAUT, J. R., & HENDERSHOT, L. M. (1995). In vitro dissociation of BiP: peptide complexes requires a conformational change in BiP after ATP binding but does not require ATP hydrolysis. J. Biol. Chem. 270, 26677–26682. WEI, J.-Y. & HENDERSHOT, L. M. (1995). Characterization of the nucleotide binding properties and ATPase activity of recombinant hamster BiP purified from bacteria. J. Biol. Chem. 270, 26670–26677. WESTIN, G., GERSTER, T., MULLER, M. M., SCHAFFNER, G., & SCHAFFNER, W. (1987). OVEC, a versatile system to study transcription in mammalian cells and cell-free extracts. Nucleic. Acids. Res. 15, 6787–6798.
224 WILBANKS, S. M., DELUCA FLAHERTY, C., & MCKAY, D. B. (1994). Structural basis of the 70-kilodalton heat shock cognate protein ATP hydrolytic activity. I. Kinetic analyses of active site mutants. J. Biol. Chem. 269, 12893–12898. 225 WURM, F. & BERNARD, A. (1999). Large-scale transient expression in mammalian cells for recombinant protein production. Curr. Opin. Biotechnol. 10, 156–159. 226 XU, R., KUME, A., MATSUDA, K. M., UEDA, Y., KODAIRA, H., OGASAWARA, Y., URABE, M., KATO, I., HASEGAWA, M., & OZAWA, K. (1999). A selective amplifier gene for tamoxifen-inducible expansion of hematopoietic cells. J. Gene Med. 1, 236–244. 227 ZERBY, D., SAKHUJA, K., REDDY, P. S., ZIMMERMAN, H., KAYDA, D., GANESH, S., PATTISON, S., BRANN, T., KADAN, M. J., KALEKO, M., & CONNELLY, S. (2003). In vivo ligand-inducible regulation of gene expression in a gutless adenoviral vector system. Hum. Gene Ther. 14, 749–761. 228 ZYLICZ, M., LEBOWITZ, J. H., MCMACKEN, R., & GEORGOPOULOS, C. (1983). The dnaK protein of Escherichia coli possesses an ATPase and autophosphorylating activity and is essential in an in vitro DNA replication system. Proc. Natl. Acad. Sci. U.S.A. 80, 6431–6435.
27
1
Procollagen Biosynthesis in Mammalian Cells Mohammed Tasab, and Neil J. Bulleid University of Manchester, Manchester, United Kingdom
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction 1.1 Variety and Complexity of Collagen Proteins
Collagen proteins are a very heterogeneous group of structural proteins with an array of different structure-function properties [1–3]. The biosynthesis of collagen within the cell is a complex process and cannot be considered to be typical of the biosynthesis of the “average” secretory protein. This is due to the special physicochemical properties of the large triple-helix-containing molecules that comprise the intracellular form of these proteins, known as procollagen. Furthermore, the variety and complexity of the different types of collagens that are expressed by animal tissues (currently in excess of 24) indicate that the biosynthetic characteristics of each type are different due to the multiple methods by which such types are assembled and modified within the cell. Each type must be assembled in such a way that it can fulfill its final function outside the cell. These functions include a variety of roles, from interactions with cells and cellular receptors, to roles such as attachment and signaling. Because of this diversity in structures and functions, it is difficult to generalize about mechanisms of molecular assembly and transport of collagen in vivo. However, some common themes do emerge. In this chapter we have chosen to illustrate the concept of in vivo collagen biosynthesis using the fibril-forming collagens. Fibril-forming collagens (also known as fibrillar collagens) are the most abundant of the collagen family, occurring commonly in most animal tissues. For this reason they are probably the best characterized among the collagen protein family. The main features of this group are
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Procollagen Biosynthesis in Mammalian Cells
their large molecule size and large uninterrupted triple helix. The fibrillar collagen molecule itself is a flexible “rod-like” structure that has the ability to associate laterally with similar molecules, giving rise to the well-known fibril structure outside the cell. Inside the cell the non-collagenous extremities known as the propeptide domains maintain the procollagen molecule in a soluble state. 1.2 Fibrillar Procollagen
The procollagens are the larger soluble intracellular precursors of the insoluble fibrillar collagen proteins [4, 5]. Members of the fibrillar collagen subgroup show significant homology to each other, and this is also true of their procollagen precursors [6]. This group consists of three major members including types I, II, and III and two smaller members, types V and XI. These collagens form essential constituents of vital structural tissues including bone, skin, blood vessels, and cartilage. The triple-helical molecules of procollagen are formed from three constituent procollagen chains. The structure of the procollagen molecule is reflected in the amino acid sequence composition of the constituent chains. Complete cDNA and amino acid sequences for many types of procollagen chain are available and may be used in sequence homology analyses to compare and contrast sequences between chains [7–10]. In each procollagen chain, the domains responsible for the formation of the N-propeptide, the N-telopeptide, the triple helix, the C-telopeptide, and the Cpropeptide have been identified [11]. The structure of the molecule is illustrated in Figure 1. The characteristic repeating Gly-X-Y motif of the triple-helical domain is wound into the triple helix that is the core part of the trimeric procollagen molecule. While a great deal is known about the structure and properties of the triple helix, the globular propeptides are not so well characterized. The larger of the globular regions is the C-propeptide, which performs specialist functions that are discussed later. The N-propeptide domain is not as well characterized, but some specific roles have been identified. The propeptides are also involved in multiple interactions with various cellular factors that may be involved in the assembly and processing of procollagen molecules. The C-propeptides of the major procollagen types I, II, and III are cleaved by specific C-proteinases immediately upon exiting the cell, and this begins the process of fibrillogenesis.
Fig. 1 Domain structure of a procollagen molecule.
2 The Procollagen Biosynthetic Process: An Overview
1.3 Expression of Fibrillar Collagens
The complexity and diversity of collagen expression are essential to the developmental pathways that allow a higher organism to develop from its early embryonic stages to the mature adult form. Furthermore, tissue renewal and repair processes that are necessary throughout the life of the organism indicate that some expression pathways operate constitutively while other pathways are induced to repair and replace damaged and worn tissues. Thus, expression of all collagens in terms of both development and maintenance is a tightly controlled operation regulated at both the genetic and biochemical levels. Collagen expression is linked to the expression of many other proteins. Some of these proteins are directly involved in processing procollagen in vivo, such as bone morphogenetic protein (BMP-1) and proteins that interact with immature procollagen such as protein disulfide isomerase (PDI), Hsp47, prolyl hydroxylase, lysyl hydroxylase, and other proteins that are co-expressed along with fibrillar collagens. This ensures that the intracellular procollagens are synthesized, correctly modified, and efficiently transported when required. Some regulatory factors such as TGF-β and IL-6 promote procollagen expression at the gene level. Other factors such as tumor necrosis factor TNF-α and IFN-γ appear to suppress procollagen expression. Thus, there is a whole spectrum of signals that appear to induce or suppress procollagen expression and constitute the physiological feedback mechanisms by which expression is regulated.
2 The Procollagen Biosynthetic Process: An Overview
Multi-exon genes such as COL1A1 and COL1A2 that encode the chains that make up type I procollagen reside on different chromosomes. Therefore, the production of fibrillar collagens from collagen genes must necessarily be a highly regulated and coordinated process so that the correct combinations of chains are produced at a given time in order to produce procollagen. Splicing of large mRNA precursors to produce specific mRNAs occurs within the nucleus. The mature mRNAs are transported out into the cytoplasm, where membrane-bound ribosomes can begin the biosynthesis of procollagen. The biosynthesis of procollagen begins with the translation of specific mRNAs on ribosomal complexes [12–14]. The polypeptide chains are co-translationally inserted into the lumen of the endoplasmic reticulum (ER) [15–17]. An interaction between the translocon Sec61p protein and the lumenal chaperone BiP has been proposed to facilitate the completion of translocation into the ER [18, 19]. Enzymes involved in the biosynthetic process are located within the ER lumenal space [20–26]. Hydroxylation of some proline and lysine residues occurs as the triple-helical domain emerges into the lumen of the ER [27, 28]. Virtually all of the Y position proline residues, as well as some Y position lysine residues, are hydroxylated. This hydroxylation is important to the structural stability of the triple
3
4 Procollagen Biosynthesis in Mammalian Cells
helix that is formed later. Some of the hydroxylysine residues in the triple-helical domains are further modified by O-linked glycosylation. This is distinct from the N-linked glycosylation that occurs in the C-propeptide domains, the function of which is not clear. Once translocated, the C-propeptide domain is free to fold, forming its complement of intrachain disulfide bonds [29, 30]. The three polypeptide chains of procollagen must then associate at the C-propeptide domains, bringing the three chains into close proximity and thus allowing the formation of interchain disulfide bonds and nucleation of the triple-helix. The molecule is then able to form a triplehelix in a “zipper-like” fashion [31] starting at the C-terminal end and proceeding towards the N-terminal end of the chains [32]. This folding of the triple helix is rate limited by the conversion of cis-hydroxyproline residues to trans-hydroxyproline residues [33–36]. Current evidence suggests that only the non-triple-helical chains are substrates for the hydroxylase enzymes and that once the triple-helix is formed no further hydroxylation occurs. The hydroxylase enzymes lose affinity as the triplehelix is formed and this probably serves as the detachment mechanism that allows the triple-helical molecule to escape the clutches of these ER-resident proteins. The putative chaperone Hsp47 is believed to bind to the triple-helix as it is formed. Once helix formation is completed, the N-propeptides of the three constituent chains are able to associate, completing the assembly process. Assembled procollagen molecules are then transported through the secretory pathway via mechanisms that are not yet clear but that could involve vesicular transport or a process of cisternal maturation. A schematic of the biosynthetic process is shown in Figure 2.
3 Disulfide Bonding in Procollagen Assembly
Cysteine residues, which form intrachain disulfide linkages, are more highly conserved than other residues in most cysteine-containing proteins [37]. This is unsurprising given that these bonds are extremely important for formation of the native structure of most cysteine-containing proteins. This rule is also applicable to the cysteines in procollagen chains, particularly those in the C-propeptide domain, where a number of intrachain disulfide linkages are thought to occur [6, 38, 39]. The strict conservation of these cysteines between different fibrillar collagen types and between species suggests that the same intrachain links are formed in all the C-propeptide domains and therefore that the tertiary structure of the C-propeptides is similar. Most of the fibrillar procollagen C-propeptide domains contain eight cysteine residues in conserved positions relative to the C-proteinase cleavage site. These cysteines are numbered sequentially from the N-terminus of the C-propeptide domain. It is notable that both proα2(I) and proα2(V) C-propeptide domains possess only seven of these cysteine residues and that the second cysteine residue position that occurs in the other chains is occupied by a serine residue. Since both proα2(I)
3 Disulfide Bonding in Procollagen Assembly 5
Fig. 2 Schematic of procollagen biosynthesis.
and proα2(V) are heterotrimer-forming chains, it was suggested that the lack of this cysteine could account for the inability of these chains to form homotrimers [40]. However, this theory is contradicted by work showing that restoration of a full complement of cysteines in the C-propeptide domain of proα2(I) does not enable them to form homotrimers [41]. Early models suggested that disulfide bonding did not occur while the procollagen chains were nascent and still attached to the polysomes [42]. Disulfide linkages were detected only when chains had been completed and apparently released from the ribosomal complexes. This observation is consistent with the prevailing view that synthesis of the whole C-propeptide domain must be completed before folding of these domains can begin. The last cysteine residue in the procollagen C-propeptide domain is located very close to the C-terminus of the chain; it is probable that this residue is essential to the structure of the C-propeptide and that correct folding cannot occur without it. It is known that isolated C-propeptide domains can independently associate and form interchain disulfide linkages [30]. Subsequently, it was also shown that a construct with the C-telopeptide and C-propeptide domains from proα1(III) chains linked to a signal peptide folds and assembles into disulfide-linked trimers [43]. Thus, the C-propeptide can be considered as a domain that has a specific role in guiding the formation of the triple-helical procollagen molecule.
6 Procollagen Biosynthesis in Mammalian Cells
Interchain disulfide bonding between C-propeptide domains involves specific sets of cysteine residues on adjacent chains [38, 41, 44]. Early models suggested that only cysteines 5–8 form intrachain links [38] and that cysteines 1–4 were considered to be involved with interchain disulfide linkages [44]. More recent studies analyzing the effect of site-directed mutagenesis of the cysteines in the C-propeptide domain suggest that only cysteine 2 and cysteine 3 are involved in interchain disulfide bond formation [41]. This model indicates that mutations of cysteines 1 and 4 affect the folding of the C-propeptide domain and therefore are probably involved in intrachain linkages. In contrast, the C-propeptide domains in mutants without cys2 or cys3 were folded correctly, suggesting that these residues would normally form interchain disulfide bonds.
4 The Influence of Primary Amino Acid Sequence on Intracellular Procollagen Folding 4.1 Chain Recognition and Type-specific Assembly
An interesting observation is that procollagen-expressing cells may express multiple types of fibrillar procollagen at a given time. For example, skin fibroblasts may co-express procollagen types I, III, and V [45–50]. Thus, up to six highly homologous chains may be co-expressed and assembled type-specifically into the correct combinations. This discrimination process is illustrated by the fact that proα2(I) chains will associate only with proα1(I) chains to form type I molecules [51] and that no homotrimeric proα2(I) molecules have been reported. This is in contrast to proα1(III) chains, which are solely homotrimer-forming and will not associate with any other type of chain [52, 53]. The formation of heterotypic molecules containing both type V and type XI chains is a known exception to the rule that assembly of fibrillar collagens is type-specific [54]. However, some workers consider types V and XI to be a single collagen type consisting of five polypeptide chains [55–57]. In addition, a highly glycosylated alternatively spliced version of the proα1(II) chain [58, 59] appears to be involved in heterotrimer formation with type XI chains [60, 61]. Despite these complications, it is clear that procollagen chains come together in only certain combinations despite the presence of highly homologous chains being expressed at the same time in the same ER lumenal space. 4.2 Assembly of Multi-subunit Proteins
Little is known about how the assembly of multi-subunit proteins, which may involve the interactions of folded and/or partially folded subunits, is achieved. Clearly, a recognition aspect must exist in order for the subunits to recognize each other. Regardless of whether this assembly/recognition is mediated by a catalyst or is an intrinsic property of subunits, the assembly will be unlikely to succeed in the
4 The Influence of Primary Amino Acid Sequence on Intracellular Procollagen Folding
absence of the correct primary information. Complex proteins are often made up of many subunits, which may or may not serve distinct functions. Some subunits can co-assemble with other types of subunits in particular combinations in different circumstances. One example of this versatility is the protein disulfide isomerase (PDI). PDI functions as a catalyst of disulfide bond formation but is also present as a constituent of prolyl-4-hydroxylase (P4H) [62, 63] and the microsomal lipid transfer complex [64–66]. There are important questions regarding how such complex multi-subunit proteins are formed. Is the information for subunit association encoded within the constituent subunits? How are these constituent subunits able to recognize each other and associate together in the correct manner? How is the correct stoichiometric ratio determined? Is the recognition ability encoded within the primary amino acid sequence, or are other proteins in the biosynthetic machinery responsible for “manufacturing” the correct combinations of subunits? 4.3 Coordination of Type-specific Procollagen Assembly and Chain Selection
One theory of how chain selection can be achieved is to consider it as a problem of coordinated transcription and translation. These processes must be regulated spatially and temporally to bring about the correct combinations of procollagen chains within the same ER lumenal space [14, 67, 68]. While some studies have suggested that proα1(I) and proα2(I) genes are coordinately transcribed in a ratio close to 2:1 [69, 70], other studies show that the levels of mRNA for these chains deviate appreciably from this ratio [71]. Clearly, some degree of transcriptional coordination is necessary in order to produce heterotrimeric type I procollagen molecules, but this does not necessarily determine the ultimate chain selection or folding events. Veis and coworkers [72] have shown that the ribosomes reading out mRNA transcripts appear in organized arrays along the ER. This model suggests that monocistronic mRNAs are brought together within the same ER compartment by a complex set of interactions that ensure coordinated synthesis [72]. This implies that some component of the ER membrane participates in the organization of the ribosomes and coordinates the synthesis of nascent chains. Further studies by Kirk [73] and Veis and Kirk [74] support a model of molecular assembly where chain selection and folding are determined by the attachment of the ribosomes to the ER membrane. Hu and coworkers [75] have also proposed that some component of the ER membrane participates in the chain selection process. This proposal is based on the observation that cell-free translations of proα1(I) and proα2(I) are coordinately altered in the presence of added membranes. It has long been postulated that C-propeptide anchoring of the nascent procollagen chains to the ER membrane could be important in chain assembly as this would restrict the freedom of the chains and decrease the concentration of chains required for association [76]. Observations of the behavior of procollagen chains on a monomolecular film suggest that procollagen chains could assemble while associated with the ER membrane [77]. However, this behavior is not a consequence of the interaction of the
7
8 Procollagen Biosynthesis in Mammalian Cells
C-propeptide N-linked oligosaccharide chain with the membrane lectin calnexin, as mutagenesis of the site for oligosaccharide attachment does not prevent the resulting non-glycosylated C-propeptide from participating in assembly or secretion of procollagen [78]. It is possible that other membrane-associated proteins could interact with the procollagen C-propeptide and organize trimer assembly at the ER membrane. It is difficult to envisage how a C-propeptide domain can achieve its fully folded state if binding to the ER membrane restricts the C-terminal end. One explanation could be that completed procollagen chains detach from the ER membrane to allow the C-propeptide domain to fold and are subsequently indirectly associated with the membrane via interaction with a membrane-bound chaperone. However, these models fail to explain how ribosomes or other membrane factors distinguish between proα2(I) and proα1(I) chains and bring them together in the correct combinations or why proα2(I) chains cannot form homotrimeric molecules. 4.4 Hypervariable Motifs: Components of a Recognition Mechanism That Distinguishes Between Procollagen Chains?
Multi-subunit proteins are crucial to the many complex cellular processes that take place in higher organisms. Clearly, mechanisms that determine the recognition and association of subunits are a vital element in generating multi-subunit proteins. Secondary structure predictions reveal striking similarities between predicted structures for the C-propeptide domains of proα2(I) and proα1(I) chains, despite sequence dissimilarities [38, 79]. Conversely, the same type of prediction suggests a different type of structure for the C-propeptide domain of proα1(II) chains. It has been argued that this could represent the structural basis for chain discrimination during procollagen biosynthesis [38]. Other workers have suggested that the diversity between C-propeptide sequences near the C-proteinase cleavage site might account for the differential chain selection between procollagen chains [80]. More recent studies exploiting advanced in vitro translation technology and genetic engineering have led to a new perspective on how selective association of procollagen chains is achieved. In this system recombinant procollagen chains are expressed in reticulocyte lysate in the presence of semi-permeabilized cells. The utility of this methodology for the analysis of procollagen assembly cannot be understated. Procollagen assembly can be analyzed under controlled experimental conditions while presserving the in vivo environment including the chaperones that are crucial for the assembly process. An experimental strategy based upon the specific exchange of variable sequences between the C-propeptide domains of the homotrimeric proα1(III) chain and the heterotrimer-forming proα2(I) chain has been used to investigate the mechanism of selective chain association [81, 82]. The analysis of procollagen assembly using the chimeric chains expressed in semipermeabilized cells led to the identification of a specific short, discontinuous sequence of 15 amino acids (GNPELPEDVLDV . . .. . . SSR) within the proα1(III) Cpropeptide, which directs procollagen self-association. This recognition sequence
4 The Influence of Primary Amino Acid Sequence on Intracellular Procollagen Folding
appears to be necessary and sufficient to drive homotrimer formation when placed in the correct context within the proα2(I) C-propeptide. This sequence is presumably responsible for the initial recognition event between chains and is therefore necessary to ensure selective chain association. The variable recognition sequence is interrupted by a central hydrophobic motif, Q(L/M)(T/A)F(L/M)(R/K)L(L/M), that is conserved between different fibrillar procollagen chains. This sequence possibly preserves some element of structural similarity in any recognition site that may be formed at the interface between the interacting C-propeptide domains. 4.5 Modeling the C-propeptide
It is known that sequence variations between subunit interfaces determine the specificity of subunit interactions in some proteins. Studies of protein folding and protein-protein interfaces also indicate that similar principles underlie both peptide folding and protein-protein association. In the absence of hard structural data, it is difficult to define the exact interactions that take place during procollagen chain recognition and association. An alternative strategy is to look at computer models based upon information from biophysical studies. Recent low-resolution models based on biophysical studies of the recombinant type III C-propeptide envisage the soluble trimer as “cruciform-shaped” structure with three large lobes and a minor lobe all in the same plane [83]. The larger lobes apparently correspond to each individual propeptide chain, and the minor lobe is proposed to correspond to the junction that links all three C-propeptide domains to the rest of the procollagen molecule. Interestingly, such models place the previously identified recognition sequences at the interface junctions between the individual C-propeptide domains, supporting the idea that such sequences are major determinants of chain selection. 4.6 Chain Association
The above models may suggest that recognition domains promote chain selection and association; however, these two events are not necessarily connected. Experimental data indicate that the recognition event is not necessarily followed by an association event. It is possible to envisage a situation where in vivo a certain sequence is responsible for recognition but another neighboring sequence is responsible for the association. Thus, the two events may be uncoupled. There is also sufficient evidence to indicate that trimer formation can be induced in the absence of the recognition sequences. Early sequence comparisons identified a group of aromatic residues within the C-propeptide domain that could be responsible for association among the three chains [84]. Experimental work has shown that the C-propeptide itself can be efficiently replaced by other sequences such as the transmembrane domain of hemagglutinin [43] and the 29-residue foldon domain from T4 bacteriophage fibritin, which are capable of bringing the three chains
9
10 Procollagen Biosynthesis in Mammalian Cells
of procollagen into close proximity [85]. Thus, nonnative combinations of chains can be synthesized in the absence of selective regions. The alpha-helical coiled coil is a very common motif in oligomerizing proteins. These coils are believed to form oligomerization domains in fibrillar and non-fibrillar collagens and also in collagenlike proteins such as C1q. The widespread presence of these domains in collagens and other trimerizing proteins [86] suggests a general role in trimerization and perhaps a specific role in determining the association of procollagen chains in vivo.
5 Posttranslational Modifications That Affect Procollagen Folding 5.1 Hydroxylation and Triple-helix Stability
Hydroxylation of prolyl and lysyl residues in the triple-helical domain is critical for triple-helix formation and stability. Three types of enzyme are known to participate in hydroxylation of these residues: prolyl 4-hydroxylase (P4H), prolyl 3-hydroxylase (P3H), and lysyl hydroxylase (LH). These hydroxylating enzymes convert the peptidyl-lysine or peptidyl-proline to hydroxylysine or hydroxyproline. Only a small percentage of the peptidyl-prolines in the Y position of the Gly-X-Y triplet are hydroxylated by P3H, while the most important contribution to the stability of the triple-helical procollagen molecule is the hydroxylation of Y-position proline residues in the Gly-X–Y triplet by the P4H enzyme. Formation of the triple helix appears to prevent further hydroxylation [87–89], suggesting that triple-helical procollagen is not a substrate for P4H. A detailed mechanism for the P4H reaction has been described by a number of workers [90–94]. The hydroxylation reaction requires a number of cofactors, including Fe2+ , 2-oxoglutarate, O2 , and ascorbate. The fact that the same cofactors are required for all three types of hydroxylases suggests that the mechanism of action is similar for all three hydroxylating enzymes. The presence or absence of hydroxylation cofactors can be used experimentally as a tool for the analysis of procollagen production by cells. One example is the chelation of ferrous iron by inhibitors such as αα -dipyridyl. This is a popular method for inhibiting the hydroxylation reaction experimentally. Treatment of procollagen-expressing fibroblasts with this agent leads to retention of unhydroxylated procollagen within the ER lumenal space. This “block” can be reversed by the addition of excess ferrous ions when required. Another method for achieving the same result is to deplete procollagen-producing cells of ascorbate. Ascorbate is an essential cofactor that is required for repeated cycles of enzyme activity. Readdition of ascorbate to depleted cells restores procollagen secretion. Such strategies are often used in experiments where it is necessary to build up a concentration of unfolded procollagen within the ER and to follow its maturation through the secretory pathway.
6 Procollagen Chaperones
6 Procollagen Chaperones 6.1 Prolyl 4-Hydroxylase
Vertebrate P4H consists of tetramers composed of two α and two β subunits α2β2 [95–99]. The α-subunit is the catalytic subunit for hydroxylation and is prone to aggregation in the absence of the β-subunit [63, 100]. Thus, one function of the β-subunit in P4H appears to be keeping the α-subunit in the active soluble state [101, 102]. Isoforms of the α-subunit have been identified in human tissue [103], in mouse [104], and in C. elegans [105]. The β-subunit is identical to PDI and maintains its PDI activity even as part of the P4H complex [62, 63]. Another function of the β-subunit may be to retain the α-subunit within the ER lumen since the β-subunit has an ER retention motif, whereas the α-subunit does not. Interestingly, P4H has been found in stable association with procollagen chains possessing a deletion in the triple-helical domain [106], suggesting a chaperonetype role for P4H. Since this result was obtained with antibodies directed against the β-subunit of P4H, this chaperone effect may be mediated by the PDI. However, current evidence suggests that P4H (both α and β subunits) can form stable associations with non-triple helical chains and that it is the major protein that interacts with procollagen chains during early biosynthesis [107]. This binding appears to be conformation-dependent so that P4H is able to distinguish between folded and unfolded procollagen molecules. Taken together with evidence that P4H has reduced affinity for hydroxylated chains [108], it appears that P4H may be a sophisticated hydroxylation- and conformation-sensitive chaperone that is able to mediate the retention of unfolded procollagen chains. 6.2 Protein Disulfide Isomerase
PDI is a key cellular folding enzyme that is important for the maturation of several secreted and membrane-associated proteins. This is particularly true for the folding and maturation of procollagen, where PDI is involved in a number of important steps. As well as functioning as part of the P4H complex, this chaperone has multiple other roles in procollagen biosynthesis as well as a general role in biogenesis of other disulfide-bonded proteins. The function of PDI as a component of enzymes such as P4H is independent of its disulfide isomerase activity [109]. PDI has been proposed to act as a general molecular chaperone by binding to unfolded proteins and thereby preventing aggregation [110–112]. This indicates that PDI assists in the refolding of certain denatured proteins in vitro. In other cases PDI appears to have no activity or even anti-chaperone activity with some substrates [113, 114]. PDI also has been shown to interact with newly synthesized proteins [115] and to catalyze the formation of both intrachain and interchain disulfide bonds [116, 117]. Studies using a cross-linking approach in a semi-permeabilized cell system have shown that
11
12 Procollagen Biosynthesis in Mammalian Cells
PDI also appears to have a separate role in the chaperoning of monomeric chains and monomeric C-propeptide domains [118, 119]. The function of this binding appears to be to keep the monomeric chains in solution. These observations support a model where PDI plays a crucial role in binding to the C-propeptide, thereby coordinating heterotrimer assembly. Although the monomeric C-propeptide contains free thiol residues, no mixed disulfides have been detected between PDI and procollagen C-propeptides. Hence, the interaction is probably dependent on hydrophobic interactions. 6.3 Hsp47
Prolyl 4-hydroxylase is just one member of a group of chaperones that play a vital role in the biogenesis of procollagen within the cell. These chaperones carry out a number of functions in order to assist with procollagen biosynthesis, including folding, protection, retention, posttranslational modification, and degradation. This group of chaperones interacts with multiple proteins that require assistance with folding and assembly. However, it seems that the variety and complexity of proteins in the ER folding environment requires a second class of more specific protein chaperones that are limited to quality control of particular proteins. Heat shock protein 47 (Hsp47) falls into this second group and is specifically associated with procollagen biosynthesis. This stress-induced chaperone is linked with the production of procollagen in many types of cells. Hsp47 and type IV collagen have been found to decrease during the differentiation of mouse teratocarcinoma cells [120]. Upregulation of Hsp47 also occurs when expression of collagen types I and III is induced during fibrosis [121] and during mechanical and heat stress in embryonic chicken tendon cells [122]. Despite some 20 years of investigation, definitive roles for Hsp47 in procollagen biosynthesis have yet to be elucidated. This is in part due to conflicting evidence regarding the substrate preferences of Hsp47 and also to a lack of understanding of some aspects of procollagen biosynthesis such as how transport from the ER to the Golgi is achieved. The RDEL motif at the carboxy-terminus of Hps47 certainly limits the theater of its operations via the KDEL receptor retention mechanism. Any chaperone-type function must therefore necessarily occur prior to the cis-Golgi, where Hsp47 is believed to dissociate from procollagen probably as a result of a pH change that alters its conformation and its affinity for this substrate [123]. The importance of Hsp47 to the biosynthesis of procollagen is illustrated by the dramatic effect of disrupting the Hsp47 genes in a mouse model [124]. This mutation is lethal to the null mouse embryo within 11 days post coitus. The embryo itself is extremely fragile and displays defects in the basal lamina, which compromise basement membrane formation, and apparent defects in associated connective tissues. Most of these effects are likely to be due to the disruption of the extracellular matrix (ECM). Such observations indicate that any secreted collagens from cells lacking Hsp47 may have an abnormal conformation that causes the ECM disruption. The procollagens secreted by these cells were sensitive to proteolytic digestion,
6 Procollagen Chaperones
indicating that the triple helix may be malformed. Transient transfection of Hsp47 into the knockout cells appears to make the procollagen resistant to proteolysis, demonstrating a possible role in stabilization of the triple helix. Studies looking at the interaction of Hsp47 with recombinant procollagen chains expressed in semi-permeabilized cells found that only triple-helical procollagen molecules were able to interact with Hsp47 with high affinity [125]. Procollagen chains that were engineered to remain monomeric or were prevented from forming a triple helix by inhibiting hydroxylation did not appear to interact with Hsp47. This evidence together with evidence from work with collagen peptides [126] shows that Hsp47 has a much higher affinity for the triple-helical form of procollagen. It has been established that the thermal stability of procollagen within the cell may be higher than that of the isolated protein [127], suggesting that the intracellular environment protects the collagen helix from heat denaturation. It is possible that the Hsp47-procollagen interaction leads to a stabilization of the procollagen triple helix, in particular stabilizing those regions of the helix with lower thermal stability [125]. However, work using triple helices designed to be highly stable without any regions of lower thermal stability shows that Hsp47 is still able to bind to these molecules [128]. Furthermore, this work and other evidence show that a minimum of one Gly-X-Arg triplet in the triple-helical domain is necessary for the binding of Hsp47 to procollagen [129]. An alternative explanation for the purpose of binding of Hsp47 to triple-helical procollagen could be to prevent the lateral association of the chains occurring within the ER. Once the procollagen molecule reaches the Golgi apparatus, it has been suggested to form higher-order aggregates, leading to distension of this organelle [130]. This “aggregate form” could be a necessary intermediate prior to propeptide processing and formation of collagen fibrils. The formation of aggregates within the ER would be an undesirable event, perhaps preventing the transport of procollagen to the Golgi apparatus; hence, the presence of Hsp47 could be required to ensure efficient procollagen transport.
6.4 PPI and BiP
Other proteins have been identified that have specific roles during procollagen biosynthesis. One example of this is during the formation of the triple helix. The unfolded procollagen chain contains peptide bonds statistically distributed between cis and trans configurations. Only the trans configuretion is compatible with the structure of the triple helix [131]. Conversion between the cis and trans configurations is limited by the pyrrolidine ring of the imino acids proline and hydroxy-proline, which restricts rotation around the peptide bond. Since both proline and hydroxyproline occur frequently within the triple-helical domain of procollagen chains, the cis-trans isomerization reaction imposes a rate-limiting step on the propagation of the triple helix [35, 36]. This rate-limiting step is catalyzed by peptidyl-prolyl cis-trans isomerase (PPI), which is identical to the protein known as cyclophilin
13
14 Procollagen Biosynthesis in Mammalian Cells
[132, 133]. This enzyme is inhibited by cyclosporin A (CsA), which has been shown to reduce the rate of triple-helix formation in vitro. Immunoglobulin-binding protein (BiP), also known as GRP78, is a stressinduced protein implicated in the retention of misfolded proteins [134, 135]. However, it is not clear whether this protein plays a role during the normal biosynthesis of procollagen or whether its role is limited to quality control of misfolded procollagens in vivo. It is known that BiP is constitutively expressed during the production of mutant chains produced in cases of osteogenesis imperfecta [136, 137]. The mutations where BiP is involved affect the C-propeptide and presumably interfere with chain association. The mechanism by which BiP recognizes misfolded C-propeptides is unknown but the binding is clearly one part of the quality-control mechanism that processes incorrectly folded procollagen in vivo.
7 Analysis of Procollagen Folding
As stated previously, the early stages of procollagen biosynthesis can be reconstituted in a semi-permeabilized cell system whereby conditions can be manipulated and specific aspects of the procollagen assembly process investigated. This methodology has been successfully employed to look at the assembly process itself as well as the role of chaperones such as Hsp47, PDI, and P4H. The volume of physiologically meaningful data resulting from this approach illustrates the power of this combination of in vivo and in vitro techniques. The technique is also very adaptable and may be used to look at assembly of virtually any protein entering the secretory pathway. Here we illustrate the methodology by describing the techniques as applied to procollagen assembly. The procedure involves treating cells grown in culture with the detergent digitonin and isolating the cells free from their cytosolic component. The SP cell system allows protein assembly to be studied in an environment that more closely resembles that of the intact cell. As the ER remains morphologically intact, the interactions with endogenous chaperones and the spatial localization of folding are maintained. An in vitro translation system is combined in the presence of the SP cells so that individual components can be manipulated easily, providing a means by which cellular processes can be studied under a variety of conditions. Chemical cross-linking reagents can also be utilized posttranslationally in order to facilitate the study of interaction between proteins within the ER lumen [138, 139]. The basic protocol involves translation of mRNA transcripts encoding particular procollagen chains in a rabbit reticulocyte lysate supplemented with the semipermeabilized HT1080 cells prepared as outlined below. This cell line is able to carry out the complex co- and posttranslational modifications required for the assembly of procollagen molecules into thermally stable triple helices [140]. The mRNA transcript encoding for the protein of interest is translated in the presence of a radio-labeled amino acid (35 S-methionine) such that the protein synthesized can be visualized by autoradiography. As the mRNA can be synthesized in vitro
References
from cloned cDNA, the effect of manipulating the primary amino acid sequence upon folding and assembly can be examined.
References 1 KIELTY, C. M., HOPKINSON, I. & GRANT, M. E. (1993). The Collagen Family: Structure, Assembly and Organization in the Extracellular Matrix. In Connective Tissue and its Heritable Disorders, pp. 103–147. Wiley-Liss, Inc. 2 KADLER, K. (1994). Protein profile: Fibril forming collagens (SHETERLINE, P., ed.), Vol. 1.2 vols. Academic Press, Oxford. 3 PROCKOP, D. J. & KIVIRIKKO, K. I. (1995). Collagens – Molecular-Biology, Diseases, and Potentials For Therapy. Ann. Rev. Biochem. 64, 403–434. 4 SCHOFIELD, J. D. & PROCKOP, D. J. (1973). Procollagen – a precursor form of collagen. Clin Orthop 97, 175–195. 5 BORNSTEIN, P. (1974). The structure and assembly of procollagen a review. J. Supramol. Struct. 2, 108–120. 6 DION, A. S. & MYERS, J. C. (1987). COOH-terminal propeptides of the major human procollagens: structural, functional and genetic comparisons. J. Mol. Biol. 193, 127–143. 7 HOFMANN, H., FIETZEK, P. P. & KUHN, K. (1980). Comparative analysis of the sequences of the three collagen chains alpha 1(I), alpha 2 and alpha 1(III) Functional and genetic aspects. J. Mol. Biol. 141, 293–314. 8 BERNARD, M. P., MYERS, J. C., CHU, M. L., RAMIREZ, F., EIKENBERRY, E. F. & PROCKOP, D. J. (1983). Structure of a cDNA for the pro alpha 2 chain of human type I procollagen. Comparison with chick cDNA for pro alpha 2(I) identifies structurally conserved features of the protein and the gene. Biochemistry 22, 1139–1145. 9 BALDWIN, C. T., REGINATO, A. M., SMITH, C., JIMENEZ, S. A. & PROCKOP, D. J. (1989). Structure of cDNA clones coding for human type II procollagen. The alpha 1(II) chain is more similar to the alpha 1(I) chain than two other alpha
10
11
12
13
14
15
16
17
chains of fibrillar collagens. Biochem J 262, 521–528. GREENSPAN, D. S., CHENG, W. & HOFFMAN, G. G. (1991). The Pro-Alpha-1(V) Collagen Chain – Complete Primary Structure, Distribution of Expression, and Comparison With the Pro-Alpha-1(XI) Collagen Chain. J. Biol. Chem. 266, 24727–24733. KUHN, K. (1984). Structural and Functional Domains of Collagen – a Comparison of the Protein With Its Gene. Collagen and Related Research 4, 309–322. KERWAR, S. S., KOHN, L. D., LAPIERE, C. M. & WEISSBACH, H. (1972). In vitro synthesis of procollagen on polysomes. Proc. Natl. Acad. Sci. USA 69, 2727–2731. KERWAR, S. S. (1974). Studies on the nature of procollagen synthesized by chick embryo polysomes. Arch. Biochem. Biophys 163, 609–613. VEIS, A. & BROWNELL, A. G. (1977). Triple-helix formation on ribosome-bound nascent chains of procollagen: deuterium-hydrogen exchange studies. Proc. Natl. Acad. Sci. USA 74, 902–905. HARWOOD, R., GRANT, M. E. & JACKSON, D. S. (1973). The sub-cellular location of inter-chain disulfide bond formation during procollagen biosynthesis by embryonic chick tendon cells. Biochem Biophys Res Commun 55, 1188–1196. HARWOOD, R., BHALLA, A. K., GRANT, M. E. & JACKSON, D. S. (1975). The synthesis and secretion of cartilage procollagen. Biochem. J. 148, 129–138. HARWOOD, R., GRANT, M. E. & JACKSON, D. S. (1974). Secretion of procollagen: evidence for the transfer of nascent polypeptides across microsomal membranes of tendon cells. Biochem Biophys Res Commun 59, 947–954.
15
16 Procollagen Biosynthesis in Mammalian Cells
18 NICCHITTA, C. V. & G, B. (1993). Lumenal proteins of the mammalian endoplasmic reticulum are required to complete protein translocation. Cell 73, 513–521. 19 SANDERS, S. L., WHITFIELD, K. M., VOGEL, J. P., ROSE, M. D. & SCHEKMAN, R. W. (1992). Sec61p and BiP directly facilitate polypeptide translocation into the ER. Cell 69, 353–365. 20 HARWOOD, R., GRANT, M. E. & JACKSON, D. S. (1974). Collagen biosynthesis. Characterization of subcellular fractions from embyonic chick fibroblasts and the intracellular localization of protocollagen prolyl and protocollagen lysyl hydroxylases. Biochem J 144, 123–130. 21 HARWOOD, R., GRANT, M. E. & JACKSON, D. S. (1975). Studies on the glycosylation of hydroxylysine residues during collagen biosynthesis and the subcellular localization of collagen galactosyltransferase and collagen glucosyltransferase in tendon and cartilage cells. Biochem. J. 152, 291–302. 22 DIEGELMANN, R. F., BERNSTEIN, L. & PETERKOFSKY, B. (1973). Cell-free collagen synthesis on membrane-bound polysomes of chick embryo connective tissue and the localization of prolyl hydroxylase on the polysome-membrane complex. J. Biol. Chem. 248, 6514–6521. 23 CUTRONEO, K. R., GUZMAN, N. A. & SHARAWY, M. M. (1974). Evidence for a subcellular vesicular site of collagen prolyl hydroxylation. J. Biol. Chem. 249, 5989–5994. 24 GUZMAN, N. A., ROJAS, F. J. & CUTRONEO, K. R. (1976). Collagen lysyl hydroxylation occurs within the cisternae of the rough endoplasmic reticulum. Arch. Biochem Biophys 172, 449–454. 25 FESSLER, J. H. & FESSLER, L. I. (1978). Biosynthesis of procollagen. Annu. Rev. Biochem 47, 129–162. 26 PROCKOP, D., BERG, R., KIVIRIKKO, K. & UITTO, J. (1976). Biochemistry of Collagen (RAMACHANDRAN, G. & REDDI, H., Eds.), Plenum Press, New York. 27 UITTO, J. & PROCKOP, D. J. (1974). Hydroxylation of peptide-bound proline and lysine before and after chain
28
29
30
31
32
33
34
35
36
completion of the polypeptide chains of procollagen. Arch. Biochem Biophys 164, 210–217. VUUST, J. & PIEZ, K. A. (1972). A kinetic study of collagen biosynthesis. J. Biol. Chem. 247, 856–862. UITTO, V. J., UITTO, J. & PROCKOP, D. J. (1981). Synthesis of Type-I Procollagen – Formation of Interchain Disulfide Bonds Before Complete Hydroxylation of the Protein. Arch. Biochem Biophys 210, 445–454. DOEGE, K. J. & FESSLER, J. H. (1986). Folding of carboxyl domain and assembly of procollagen I. J. Biol. Chem. 261, 8924–8935. ENGEL, J. & PROCKOP, D. J. (1991). The Zipper-Like Folding of Collagen Triple Helices and the Effects of Mutations That Disrupt the Zipper. Ann Rev Biophysics and Biophysical Chem 20, 137–152. BACHINGER, H. P., FESSLER, L. I., TIMPL, R. & FESSLER, J. H. (1981). Chain assembly intermediate in the biosynthesis of type III procollagen in chick embryo blood vessels. J. Biol. Chem. 256, 13193–13199. BACHINGER, H. P., BRUCKNER, P., TIMPL, R. & ENGEL, J. (1978). The role of cis-trans isomerization of peptide bonds in the coil leads to and comes from triple helix conversion of collagen. Eur. J. Biochem 90, 605–613. BACHINGER, H. P., BRUCKNER, P., TIMPL, R., PROCKOP, D. J. & ENGEL, J. (1980). Folding mechanism of the triple-helix in type III collagen and type III pN-collagen: role of disulfide bridges and peptide bond isomerization. Eur. J. Biochem 106, 619–632. BRUCKNER, P. & EIKENBERRY, E. F. (1984). Formation of the Triple Helix of Type-I Procollagen Incellulo – Temperature-Dependent Kinetics Support a Model Based On Cis Reversible Trans Isomerization of Peptide-Bonds. Eur. J. Biochem 140, 391–395. BRUCKNER, P., EIKENBERRY, E. F. & PROCKOP, D. J. (1981). Formation of the triple helix of type I procollagen in cellulo. A kinetic model based on
References
37
38
39
40
41
42
43
44
45
46
47
cis-trans isomerization of peptide bonds. Eur. J. Biochem 118, 607–613. THORNTON, J. M. (1981). Disulfide bridges in globular proteins. J. Mol. Biol. 151, 261–287. OLSEN, B. R. (1982). The Carboxyl Propeptides of Procollagens: Structural and Functional Considerations. New Trends Basement Memb Res, 225–236. BYERS, P. H., CLICK, E. M., HARPER, E. & BORNSTEIN, P. (1975). Interchain disulfide bonds in procollagen are located in a large nontriple-helical COOH-terminal domain. Proc. Natl. Acad. Sci. USA 72, 3009–3013. KOIVU, J. (1987). Disulfide bonding as a determinant of the molecular composition of types I, II and III procollagen. FEBS Lett 217, 216–220. LEES, J. F. & BULLEID, N. J. (1994). The role of cysteine residues in the folding and association of the COOH-terminal propeptide of types I and III procollagen. J. Biol. Chem. 269, 24354–24360. LUKENS, L. N. (1976). Time of occurrence of disulfide linking between procollagen chains. J. Biol. Chem. 251, 3530–3538. BULLEID, N. J., DALLEY, J. A. & LEES, J. F. (1997). The C-propeptide domain of procollagen can be replaced with a transmembrane domain without affecting trimer formation or collagen triple helix folding during biosynthesis. EMBO J 16, 6694–6701. KOIVU, J. (1987). Identification of disulfide bonds in carboxy-terminal propeptides of human type I procollagen. FEBS Lett 212, 229–232. LENAERS, A. & LAPIERE, C. M. (1975). Type III procollagen and collagen in skin. Biochim Biophys Acta 400, 121–131. BOOTH, B. A., POLAK, K. L. & UITTO, J. (1980). Collagen biosynthesis by human skin fibroblasts. I. Optimization of the culture conditions for synthesis of type I and type III procollagens. Biochim Biophys Acta 607, 145–160. UITTO, J., BOOTH, B. A. & POLAK, K. L. (1980). Collagen biosynthesis by human skin fibroblasts. II. Isolation and further characterization of type I and type III
48
49
50
51
52
53
54
55
56
57
58
procollagens synthesized in culture. Biochim Biophys Acta 624, 545–561. NARAYANAN, A. S. & PAGE, R. C. (1983). Biosynthesis and regulation of type V collagen in diploid human fibroblasts. J. Biol. Chem. 258, 11694–11699. JIMENEZ, S. A., WILLIAMS, C. J., MYERS, J. C. & BASHEY, R. I. (1986). Increased collagen biosynthesis and increased expression of type I and type III procollagen genes in tight skin (TSK) mouse fibroblasts. J. Biol. Chem. 261, 657–662. KYPREOS, K. E. & SONENSHEIN, G. E. (1998). Basic fibroblast growth factor decreases type V/XI collagen expression in cultured bovine aortic smooth muscle cells. J Cell Biochem 68, 247–258. TRAUB, W. & PIEZ, K. A. (1971). The chemistry and structure of collagen. Adv Protein Chem 25, 243–352. MILLER, E. J. & GAY, S. (1982). Collagen: an overview. Methods Enzymol 82 Pt A, 3–32. CHUNG, E., KEELE, E. M. & MILLER, E. J. (1974). Isolation and characterization of the cyanogen bromide peptides from the alpha 1(3) chain of human collagen. Biochemistry 13, 3459–3464. KLEMAN, J. P., HARTMANN, D. J., RAMIREZ, F. & Van der REST, M. (1992). The human Rhabdomyosarcoma cell-line A204 lays down a highly insoluble matrix composed mainly of alpha-1 type XI and alpha-2 type V collagen chains. Eur J Biochem 210, 329–335. MAYNE, R., BREWTON, R. G., MAYNE, P. M. & BAKER, J. R. (1993). Isolation and characterization of the chains of type-V and type-XI collagen present in bovine vitreous. J. Biol. Chem. 268, 9381–9386. FICHARD, A., KLEMAN, J. P. & RUGGIERO, F. (1995). Another Look At Collagen-V and Collagen-XI Molecules. Matrix Biol 14, 515–531. OLSEN, B. R., WINTERHALTER, K. H. & GORDON, M. K. (1995). FACIT collagens and their biological roles. Trends Glycosci Glycotech 7, 115–127. SANDELL, L. J., MORRIS, N., ROBBINS, J. R. & GOLDRING, M. B. (1991). Alternatively spliced type II procollagen mRNAs
17
18 Procollagen Biosynthesis in Mammalian Cells
59
60
61
62
63
64
65
define distinct populations of cells during vertebral development: differential expression of the amino-propeptide. J. Cell. Biol. 114, 1307–1319. SANDELL, L. J., NALIN, A. M. & REIFE, R. A. (1994). Alternative splice form of type II procollagen mRNA (IIA) is predominant in skeletal precursors and non-cartilaginous tissues during early mouse development. Dev Dyn 199, 129–140. CREMER, M. A., YE, X. J., TERATO, K., OWENS, S. W., SEYER, J. M. & KANG, A. H. (1994). Type-XI collagen-induced arthritis in the lewis rat – characterization of cellular and humoral immune-responses to native type-XI, type-V and type-II collagen and constituent alpha-chains. Journal of Immunology 153, 824–832. WU, J. J. & EYRE, D. R. (1995). Structural analysis of cross-linking domains in cartilage type-XI collagen – insights on polymeric assembly. J. Biol. Chem. 270, 18865–18870. KOIVU, J., MYLLYLA, R., HELAAKOSKI, T., PIHLAJANIEMI, T., TASANEN, K. & KIVIRIKKO, K. I. (1987). A single polypeptide acts both as the beta subunit of prolyl 4-hydroxylase and as a protein disulfide-isomerase. J. Biol. Chem. 262, 6447–6449. PIHLAJANIEMI, T., HELAAKOSKI, T., TASANEN, K., MYLLYLA, R., HUHTALA, M. L., KOIVU, J. & KIVIRIKKO, K. I. (1987). Molecular cloning of the beta-subunit of human prolyl 4-hydroxylase. This subunit and protein disulfide isomerase are products of the same gene. EMBO J 6, 643–649. LAMBERG, A., JAUHIAINEN, M., METSO, J., EHNHOLM, C., SHOULDERS, C., SCOTT, J., PIHLAJANIEMI, T. & KIVIRIKKO, K. I. (1996). The role of protein disulfide isomerase in the microsomal triacylglycerol transfer protein does not reside in its isomerase activity. Biochem J 315, 533–536. THOMPSON, J., SACK, J., JAMIL, H., WETTERAU, J., EINSPAHR, H. & BANASZAK, L. (1998). Structure of the microsomal triglyceride transfer protein in complex
66
67
68
69
70
71
72
73
74
with protein disulfide isomerase. Biophys J 74, A138–A138. WETTERAU, J. R., COMBS, K. A., SPINNER, S. N. & JOINER, B. J. (1990). Protein disulfide isomerase is a component of the microsomal triglyceride transfer protein complex. J. Biol. Chem. 265, 9801–9807. BROWNELL, A. G. & VEIS, A. (1976). Intracellular location of triple helix formation of collagen. Enzyme probe studies. J Biol Chem 251, 7137–7143. de WET, W. J., CHU, M. L. & PROCKOP, D. J. (1983). The mRNAs for the pro-alpha 1(I) and pro-alpha 2(I) chains of type I procollagen are translated at the same rate in normal human fibroblasts and in fibroblasts from two variants of osteogenesis imperfecta with altered steady state ratios of the two mRNAs. J. Biol. Chem. 258, 14385–14389. KOSHER, R. A., KULYK, W. M. & GAY, S. W. (1986). Collagen gene expression during limb cartilage differentiation. J. Cell. Biol. 102, 1151–1156. VUUST, J., SOBEL, M. E. & MARTIN, G. R. (1985). Regulation of type I collagen synthesis. Total pro alpha 1(I) and pro alpha 2(I) mRNAs are maintained in a 2:1 ratio under varying rates of collagen synthesis. Eur J Biochem 151, 449–453. OLSEN, A. S. & PROCKOP, D. J. (1989). Transcription of Human Type-I Collagen Genes – Variation in the Relative Rates of Transcription of the Pro-Alpha-1 and Pro-Alpha-2 Genes. Matrix 9, 73–81. VEIS, A., LEIBOVICH, S. J., EVANS, J. & KIRK, T. Z. (1985). Supramolecular assemblies of mRNA direct the coordinated synthesis of type I procollagen chains. Proc Natl Acad Sci USA 82, 3693–3697. KIRK, T. Z., EVANS, J. S. & VEIS, A. (1987). Biosynthesis of type I procollagen. Characterization of the distribution of chain sizes and extent of hydroxylation of polysome-associated pro-alpha-chains. J Biol Chem 262, 5540–5545. VEIS, A. & KIRK, T. Z. (1989). The Coordinate Synthesis and Cotranslational Assembly of Type-I
References
75
76
77
78
79
80
81
82
83
Procollagen. J. Biol. Chem. 264, 3884–3889. HU, G., TYLZANOWSKI, P., INOUE, H. & VEIS, A. (1995). Relationships between translation of pro alpha1(I) and pro alpha2(I) mRNAs during synthesis of the type I procollagen heterotrimer. J Cell Biochem 59, 214–234. DOEGE, K. J. & FESSLER, J. H. (1986). Folding of Carboxyl Domain and Assembly of Procollagen-I. J. Biol. Chem. 261, 8924–8935. BECK, K., BOSWELL, B. A., RIDGWAY, C. C. & BACHINGER, H. P. (1996). Triple helix formation of procollagen type I can occur at the rough endoplasmic reticulum membrane. J. Biol. Chem. 271, 21566–21573. LAMANDE, S. R. & BATEMAN, J. F. (1995). The Type-I Collagen Pro-Alpha-1(I) COOH-Terminal Propeptide N-Linked Oligosaccharide – Functional-Analysis By Site-Directed Mutagenesis. J. Biol. Chem. 270, 17858–17865. DOERING, T. E., EIKENBERRY, E. F., OLSEN, B. R., FIETZEK, P. P. & BRODSKY, B. (1982). Secondary Structure of Collagen Propeptides and Telopeptides. Biophys J 37, A395–A395. NINOMIYA, Y. & OLSEN, B. R. (1984). Molecular-Cloning of DNA Encoding a Chondrocyte-Specific Collagen of Novel and Unusual Structure. Journal of the Tissue Culture Association 20, 255–255. LEES, J. F., TASAB, M. & BULLEID, N. J. (1996). Molecular Signals Within the C-Propeptide Which Determine Type-Specific Assembly of Procollagen Chains. Matrix Biol 15, 153–153. LEES, J. F., TASAB, M. & BULLEID, N. J. (1997). Identification of the molecular recognition sequence which determines the type-specific assembly of procollagen. EMBO J 16, 908–916. BERNOCCO, S., FINET, S., EBEL, C., EICHENBERGER, D., MAZZORANA, M., FARJANEL, J. & HULMES, D. J. (2001). Biophysical characterization of the C-propeptide trimer from human procollagen III reveals a tri-lobed structure. J. Biol. Chem. 276, 48930–48936.
84 BRASS, A., KADLER, K. E., THOMAS, J. T., GRANT, M. E. & Boot HANDFORD, R. P. (1991). The aromatic zipper: a model for the initial trimerization event in collagen folding. Biochem Soc Trans 19, 365s. 85 PAKKANEN, O., HAMALAINEN, E. R., KIVIRIKKO, K. I. & MYLLYHARJU, J. (2003). Assembly of stable human type I and III collagen molecules from hydroxylated recombinant chains in the yeast Pichia pastoris. Effect of an engineered C-terminal oligomerization domain foldon. J Biol Chem 278, 32478–32483. 86 ENGEL, J. & KAMMERER, R. A. (2000). What are oligomerization domains good for? Matrix Biol 19, 283–288. 87 KIVIRIKKO, K. I. & MYLLYLA, R. (1985). Post-translational processing of procollagens. Ann NY Acad Sci 460, 187–201. 88 KIVIRIKKO, K. I., MYLLYLA, R. & PIHLAJANIEMI, T. (1989). Protein hydroxylation: prolyl 4-hydroxylase, an enzyme with four cosubstrates and a multifunctional subunit. FASEB J 3, 1609–1617. 89 KIVIRIKKO, K. I., HELAAKOSKI, T., TASANEN, K., VUORI, K., MYLLYLA, R., PARKKONEN, T. & PIHLAJANIEMI, T. (1990). Molecular biology of prolyl 4-hydroxylase. Ann NY Acad Sci 580, 132–142. 90 MYLLYLA, R., TUDERMAN, L. & KIVIRIKKO, K. I. (1977). Mechanism of the prolyl hydroxylase reaction. 2. Kinetic analysis of the reaction sequence. Eur J Biochem 80, 349–357. 91 TUDERMAN, L., MYLLYLA, R. & KIVIRIKKO, K. I. (1977). Mechanism of the prolyl hydroxylase reaction. Role of co-substrates. Eur J Biochem 80, 341–348. 92 HANAUSKE ABEL, H. M. & GUNZLER, V. (1982). A stereochemical concept for the catalytic mechanism of prolylhydroxylase: applicability to classification and design of inhibitors. J Theor Biol 94, 421–455. 93 MAJAMAA, K., HANAUSKE ABEL, H. M., GUNZLER, V. & KIVIRIKKO, K. I. (1984). The 2-oxoglutarate binding site of prolyl 4-hydroxylase. Identification of distinct subsites and evidence for 2-oxoglutarate decarboxylation in a ligand reaction at
19
20 Procollagen Biosynthesis in Mammalian Cells
94
95
96
97
98
99
100
101
102
the enzyme-bound ferrous ion. Eur J Biochem 138, 239–245. MAJAMAA, K., TURPEENNIEMI HUJANEN, T. M., LATIPAA, P., GUNZLER, V., HANAUSKE ABEL, H. M., HASSINEN, I. E. & KIVIRIKKO, K. I. (1985). Differences between collagen hydroxylases and 2-oxoglutarate dehydrogenase in their inhibition by structural analogues of 2-oxoglutarate. Biochem J 229, 127–133. PANKALAINEN, M., ARO, H., SIMONS, K. & KIVIRIKKO, K. I. (1970). Protocollagen proline hydroxyllase: molecular weight, subunits and isoelectric point. Biochim Biophys Acta 221, 559–565. BERG, R. A. & PROCKOP, D. J. (1973). Affinity column purification of protocollagen proline hydroxylase from chick embryos and further characterization of the enzyme. J. Biol. Chem. 248, 1175–1182. TUDERMAN, L., KUUTTI, E. R. & KIVIRIKKO, K. I. (1975). Radiommunoassay for human and chick prolyl hydroxylases. Eur J Biochem 60, 399–405. KUUTTI, E. R., TUDERMAN, L. & KIVIRIKKO, K. I. (1975). Human prolyl hydroxylase. Purification, partial characterization and preparation of antiserum to the enzyme. Eur J Biochem 57, 181–188. BERG, R. A., KEDERSHA, N. L. & GUZMAN, N. A. (1979). Purification and partial characterization of the two nonidentical subunits of prolyl hydroxylase. J. Biol. Chem. 254, 3111–3118. HELAAKOSKI, T., VUORI, K., MYLLYLA, R., KIVIRIKKO, K. I. & PIHLAJANIEMI, T. (1989). Molecular cloning of the alpha-subunit of human prolyl 4-hydroxylase: the complete cDNA-derived amino acid sequence and evidence for alternative splicing of RNA transcripts. Proc Natl Acad Sci USA 86, 4392–4396. JOHN, D. C. & BULLEID, N. J. (1994). Prolyl 4-hydroxylase: defective assembly of alpha-subunit mutants indicates that assembled alpha-subunits are intramolecularly disulfide bonded. Biochemistry 33, 14018–14025. VEIJOLA, J., PIHLAJANIEMI, T. & KIVIRIKKO, K. I. (1996). Coexpression of the Alpha-Subunit of Human Prolyl
103
104
105
106
107
108
4-Hydroxylase With BiP Polypeptide in Insect Cells Leads to the Formation of Soluble and Insoluble Complexes Soluble Alpha-Subunit-BiP Complexes Have No Prolyl 4-Hydroxylase Activity. Biochemical J 315, 613–618. ANNUNEN, P., HELAAKOSKI, T., MYLLYHARJU, J., VEIJOLA, J., PIHLAJANIEMI, T. & KIVIRIKKO, K. I. (1997). Cloning of the human prolyl 4-hydroxylase alpha subunit isoform alpha(II) and characterization of the type II enzyme tetramer. The alpha(I) and alpha(II) subunits do not form a mixed alpha(I)alpha(II)beta2 tetramer. J. Biol. Chem. 272, 17342–17348. HELAAKOSKI, T., ANNUNEN, P., VUORI, K., MACNEIL, I. A., PIHLAJANIEMI, T. & KIVIRIKKO, K. I. (1995). Cloning, baculovirus expression, and characterization of a second mouse prolyl 4-hydroxylase alpha-subunit isoform: formation of an alpha 2 beta 2 tetramer with the protein disulfide-isomerase/beta subunit. Proc Natl Acad Sci USA 92, 4427–4431. VEIJOLA, J., KOIVUNEN, P., ANNUNEN, P., PIHLAJANIEMI, T. & KIVIRIKKO, K. I. (1994). Cloning, baculovirus expression, and characterization of the alpha subunit of prolyl 4-hydroxylase from the nematode Caenorhabditis elegans. This alpha subunit forms an active alpha beta dimer with the human protein disulfide isomerase/beta subunit. J Biol Chem 269, 26746–26753. CHESSLER, S. D. & BYERS, P. H. (1992). Defective Folding and Stable Association With Protein Disulfide Isomerase/Prolyl Hydroxylase of Type-I Procollagen With a Deletion in the Pro-Alpha-2(I) Chain That Preserves the Gly-X-Y Repeat Pattern. J. Biol. Chem. 267, 7751–7757. WALMSLEY, A. R., BATTEN, M. R., LAD, U. & BULLEID, N. J. (1999). Intracellular retention of procollagen within the endoplasmic reticulum is mediated by prolyl 4-hydroxylase. J. Biol. Chem. 274, 14884–14892. JUVA, K. & PROCKOP, D. J. (1969). Formation of enzyme-substrate complexes with protocollagen proline hydroxylase and large polypeptide
References
109
110
111
112
113
114
115
116
substrates. J. Biol. Chem. 244, 6486–6492. VUORI, K., PIHLAJANIEMI, T., MYLLYLA, R. & KIVIRIKKO, K. I. (1992). Site-directed mutagenesis of human protein disulfide isomerase: effect on the assembly, activity and endoplasmic reticulum retention of human prolyl 4-hydroxylase in Spodoptera frugiperda insect cells. EMBO J 11, 4213–4217. CAI, H., WANG, C. C. & TSOU, C. L. (1994). Chaperone-like activity of protein disulfide isomerase in the refolding of a protein with no disulfide bonds. J. Biol. Chem. 269, 24550–24552. QUAN, H., FAN, G. & WANG, C. C. (1995). Independence of the chaper-one activity of protein disulfide isomerase from its thioredoxin-like active site. J. Biol. Chem. 270, 17078–17080. YAO, Y., ZHOU, Y. & WANG, C. (1997). Both the isomerase and chaperone activities of protein disulfide isomerase are required for the reactivation of reduced and denatured acidic phospholipase A2. EMBO J 16, 651–658. PUIG, A., LYLES, M. M., NOIVA, R. & GILBERT, H. F. (1994). The role of the thiol/disulfide centers and peptide binding site in the chaperone and anti-chaperone activities of protein disulfide isomerase. J. Biol. Chem. 269, 19128–19135. PUIG, A. & GILBERT, H. F. (1994). Protein disulfide isomerase exhibits chaperone and anti-chaperone activity in the oxidative refolding of lysozyme. J. Biol. Chem. 269, 7764–7771. KLAPPA, P., FREEDMAN, R. B. & ZIMMERMANN, R. (1995). Protein Disulfide-Isomerase and a Lumenal Cyclophilin-Type Peptidyl-Prolyl Cis-Trans Isomerase Are in Transient Contact With Secretory Proteins During Late Stages of Translocation. Eur J Biochem 232, 755–764. BULLEID, N. J. & FREEDMAN, R. B. (1988). Defective Co-Translational Formation of Disulfide Bonds in Protein Disulfide-Isomerase-Deficient Microsomes. Nature 335, 649–651.
117 KOIVU, J. & MYLLYLA, R. (1987). Interchain disulfide bond formation in types I and II procollagen. Evidence for a protein disulfide isomerase catalyzing bond formation. J Biol Chem 262, 6159–6164. 118 WILSON, R., LEES, J. F. & BULLEID, N. J. (1998). Protein disulfide isomerase acts as a molecular chaperone during the assembly of procollagen. J. Biol. Chem. 273, 9637–9643. 119 BOTTOMLEY, M. J., BATTEN, M. R., LUMB, R. A. & BULLEID, N. J. (2001). Quality control in the endoplasmic reticulum. PDI mediates the ER retention of unassembled procollagen C-propeptides. Current Biol 11, 1114–1118. 120 TAKECHI, H., HIRAYOSHI, K., NAKAI, A., KUDO, H., SAGA, S. & NAGATA, K. (1992). Molecular cloning of a mouse 47-kDa heat-shock protein (HSP47), a collagen-binding stress protein, and its expression during the differentiation of F9 teratocarcinoma cells. Eur J Biochem 206, 323–329. 121 MASUDA, H., FUKUMOTO, M., HIRAYOSHI, K. & NAGATA, K. (1994). Coexpression of the collagen-binding stress protein HSP47 gene and the alpha 1(I) and alpha 1(III) collagen genes in carbon tetrachloride-induced rat liver fibrosis. J Clin Invest 94, 2481–2488. 122 PAN, H. & HALPER, J. (2003). Regulation of heat shock protein 47 and type I procollagen expression in avian tendon cells. Cell Tissue Res 311, 373–382. 123 EL-THAHER, T. S. H., DRAKE, A. F., YOKOTA, S., NAKAI, A., NAGATA, K. & MILLER, A. D. (1996). The Ph-Dependent, Atp-Independent Interaction of Collagen Specific Serpin Stress Protein Hsp47. Prot Pept Lett 3, 1–8. 124 NAGAI, N., HOSOKAWA, M., ITOHARA, S., ADACHI, E., MATSUSHITA, T., HOSOKAWA, N. & NAGATA, K. (2000). Embryonic lethality of molecular chaperone hsp47 knockout mice is associated with defects in collagen biosynthesis. J. Cell. Biol. 150, 1499–1506. 125 TASAB, M., BATTEN, M. R. & BULLEID, N. J. (2000). Hsp47: a molecular chaperone that interacts with and stabilizes correctly-folded procollagen. EMBO J 19, 2204–2211.
21
22 Procollagen Biosynthesis in Mammalian Cells
126 KOIDE, T., ASO, A., YORIHUZI, T. & NAGATA, K. (2000). Conformational requirements of collagenous peptides for recognition by the chaperone protein HSP47. J. Biol. Chem. 275, 27957–27963. 127 BRUCKNER, P. & EIKENBERRY, E. F. (1984). Procollagen is more stable in cellulo than in vitro. Eur J Biochem 140, 397–399. 128 TASAB, M., JENKINSON, L. & BULLEID, N. J. (2002). Sequence-specific recognition of collagen triple helices by the collagen-specific molecular chaperone HSP47. J. Biol. Chem. 277, 35007–35012. 129 KOIDE, T., TAKAHARA, Y., ASADA, S. & NAGATA, K. (2002). Xaa-Arg-Gly Triplets in the Collagen Triple Helix Are Dominant Binding Sites for the Molecular Chaperone HSP47. J. Biol. Chem. 277, 6178–6182. 130 BONFANTI, L., MIRONOV, A. A., MARTINEZ-MENARGUEZ, J. A., MARTELLA, O., FUSELLA, A., BALDASSARRE, M., BUCCIONE, R., GEUZE, H. J. & LUINI, A. (1998). Procollagen traverses the Golgi stack without leaving the lumen of cisternae: evidence for cisternal maturation. Cell 95, 993–1003. 131 RICH, A. & CRICK, F. H. C. (1961). The molecular structure of collagen. J Mol Biol 3, 483–506. 132 TAKAHASHI, N., HAYANO, T. & SUZUKI, M. (1989). Peptidyl-prolyl cis-trans isomerase is the cyclosporin A-binding protein cyclophilin. Nature 337, 473–475. 133 FISCHER, G., WITTMANN LIEBOLD, B., LANG, K., KIEFHABER, T. & SCHMID, F. X. (1989). Cyclophilin and peptidyl-prolyl cis-trans isomerase are probably identical proteins. Nature 337, 476–478. 134 DORNER, A. J., BOLE, D. G. & KAUFMAN, R. J. (1987). The relationship of N-linked glycosylation and heavy chain-binding protein association with the secretion of
135
136
137
138
139
140
glycoproteins. J. Cell. Biol. 105, 2665–2674. SUZUKI, C. K., BONIFACINO, J. S., LIN, A. Y., DAVIS, M. M. & KLAUSNER, R. D. (1991). Regulating the retention of T-cell receptor alpha chain variants within the endoplasmic reticulum: Ca(2+)-dependent association with BiP. J. Cell. Biol. 114, 189–205. CHESSLER, S. D. & BYERS, P. H. (1993). BiP binds type I procollagen pro alpha chains with mutations in the carboxyl-terminal propeptide synthesized by cells from patients with osteogenesis imperfecta. J. Biol. Chem. 268, 18226–18233. LAMANDE, S. R., CHESSLER, S. D., GOLUB, S. B., BYERS, P. H., CHAN, D., COLE, W. G., SILLENCE, D. O. & BATEMAN, J. F. (1995). Endoplasmic reticulum-mediated quality control of type I collagen production by cells from osteogenesis imperfecta patients with mutations in the pro alpha 1 (I) chain carboxyl-terminal propeptide which impair subunit assembly. J. Biol. Chem. 270, 8642–8649. WILSON, R. R. & BULLEID, N. J. (2000). Semi-permeabilized cells to study procollagen assembly. Methods Mol Biol 139, 1–9. ELLIOTT, J. G., OLIVER, J. D. & HIGH, S. (1997). The thiol-dependent reductase ERp57 interacts specifically with N-glycosylated integral membrane proteins [published erratum appears in J Biol Chem 1997 Aug 8;272(32):20312]. J. Biol. Chem. 272, 13849–13855. PIHLAJANIEMI, T., MYLLYLA, R., ALITALO, K., VAHERI, A. & KIVIRIKKO, K. I. (1981). Posttranslational modifications in the biosynthesis of type IV collagen by a human tumor cell line. Biochemistry 20, 7409–7415.
1
The E. coli GroE Chaperone Steven G. Burston University of Bristol, Bristol, United Kingdom
Stefan Walter Technische Universit¨at M¨unchen, Garching, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The Hsp60 chaperones, also known as chaperonins, are divided into two groups. Group I consists of the homologues found in eubacteria and their endosymbiotic counterparts in eukaryotes, while group II comprises the chaperone complexes from archaea and the eukaryotic cytosol. Of all Hsp60 proteins characterized so far, the GroE chaperone from Escherichia coli, a member of the first group, is the best understood on a molecular level. GroE was first identified in the early 1970s by Costa Georgopoulos and coworkers, who observed that certain temperature-sensitive mutations in the GroE operon were unable to support the growth of bacteriophage λ [1]. It was subsequently demonstrated that both of the proteins encoded by the operon, GroEL and GroES, are essential for the viability of E. coli at all temperatures. Several years later, John Ellis and coworkers identified a protein component that was associated with chloroplast Rubisco prior to the formation of the holo-enzyme [2]. This was termed the Rubisco subunit-binding protein and was later identified as a homologue of GroEL. However, a molecular analysis of the GroE chaperones really began only in the late 1980s, when George Lorimer and colleagues showed that the efficiency of refolding of bacterial Rubisco could be drastically improved using purified GroEL and GroES with Mg-ATP [3]. At the same time another Hsp60 homologue was identified in the yeast mitochondrion as being essential for the folding of newly imported polypeptides [4]. Since then an enormous amount of data has been obtained describing structural and mechanistic aspects of the GroE chaperone.
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The E. coli GroE Chaperone
2 The Structure of GroEL
GroEL is a large, cylindrical, oligomeric protein complex consisting of 14 identical 57-kDa subunits, which are arranged as two heptameric rings stacked back-to-back (Figure 1A). Each subunit is composed of three domains [5]: (1) an equatorial domain (residues 6–133 and 409–523), which contains the nucleotide-binding site as well as being responsible for the inter-ring and most of the intra-ring contacts; (2) an apical domain (residues 191–376) at the end of the cylinder; and (3) an intermediate domain (residues 134–190 and 377–408), which serves as a highly flexible linker between the two other domains. Upon nucleotide binding/hydrolysis and the binding of GroES, cooperative domain movements occur around two hinge regions on either end of the intermediate domain, leading to large changes in the GroEL structure. Each heptameric ring contains a 45-Å wide cavity that is occluded from the other by the presence of the 14 N- and C-terminal extensions at the ring-ring interface. The opening to each cavity is lined with apical domain residues (Figure 1B), which are principally hydrophobic in character, and provides the binding site for unfolded polypeptide and GroES [5, 6]. Within the crystal structure of the R13G/A26V double mutant of GroEL, the two heptameric rings are related by twofold crystallographic symmetry [5]; however, more recent 10-Å image reconstructions from cryo-electron
Fig. 1 Structure of the GroEL tetradecamer as determined by X-ray crystallography [5]. (A) In this side view, the seven subunits of the top ring are shown in color, while the bottom ring is shown in gray. Each subunit consists of three domains. The equatorial domains (red) provide the inter-ring contacts and most of the interactions between subunits of the same ring. Also, they contain the nucleotide-binding site. The apical domains (orange) are responsible for the binding of both the polypeptide sub-
strate and the co-chaperone GroES. The intermediate domain (yellow) serves as a hinge and is important for the transmission of allosteric signals during the chaperonin cycle. (B) Top view of GroEL showing the sevenfold rotational symmetry of the chaperone. The central cavity has a diameter of 45 Å. The hydrophobic residues of the apical domains that comprise the binding site for GroES and the polypeptide substrate are colored in red.
4 The Structure of GroES and its Interaction with GroEL
microscopy data (cryo-EM) of unliganded GroEL show a slight asymmetry between the two rings [7].
3 The Structure of GroEL-ATP
The X-ray structure of the R13G/A26V double mutant of GroEL complexed with ATPyS has also been determined, showing that the nucleotide-binding site is located at the top of the equatorial domain facing the central cavity. A highly conserved loop region composed of residues 87–91 (Asp-Gly-Thr-Thr-Thr) interacts with the β- and γ -phosphates [8]. However, rather puzzlingly, there appeared to be no significant conformational changes within the GroEL molecule, in contrast to what had been seen using low-resolution image reconstructions from cryo-EM [9] and which would provide a structural basis for the cooperativity observed in ATP binding [10]. Determining a structure of the GroEL-ATP7 complex proved difficult because of the chaperone’s intrinsic ATPase activity. Eventually, this problem was overcome by using the D398A-GroEL mutant, which was previously identified as having a drastically reduced rate of ATP hydrolysis [11]. This mutant GroEL complexed with ATP has been visualized using cryo-EM at a resolution of 10 Å [7]. The reconstructed images show that upon ATP binding the intermediate domains rotate downwards by ∼20◦ while the apical domains extend and twist counterclockwise by ∼25◦ in the ring occupied by ATP. As a result the salt bridge between Glu-386 and Arg-197 of the adjacent subunit, which was previously shown to be important for intra-ring allosteric communication [12–14], is broken. Glu-386 now makes contact with the top of helix C in the adjacent subunit in the region of Lys-80. The apical domain movement partially buries the hydrophobic subunit-subunit interface and radially expands the seven polypeptide-binding sites within the ring in the manner of increasing the aperture of an iris. These conformational changes and the switching of inter-subunit salt bridges provide a structural basis for the propagation of the ATP-induced allosteric communication and are consistent with molecular dynamics simulations of the process [15].
4 The Structure of GroES and its Interaction with GroEL
GroEL functions with a smaller co-chaperone, GroES, which is encoded in the same GroE operon as GroEL [16]. The GroES polypeptide is 10 kDa in size and assembles into a single heptameric structure. The X-ray crystallographic structure of GroES has been determined [17], and when viewed from above, it has an axis of sevenfold rotational symmetry similar to that of GroEL (Figure 2A). Each subunit in the heptameric ring is an eight-stranded β-barrel with an extension directed upwards and towards the center of the sevenfold axis, giving GroES a dome-shaped structure. Protruding downwards from each subunit is the highly dynamic mobile
3
4 The E. coli GroE Chaperone
Fig. 2 Structures of the co-chaperone GroES (A, B) [17] and of a GroELS bullet complex (C) [21]. (A) GroES, here shown in a top view, consists of seven identical 10-kDa subunits and shares with GroEL the same sevenfold rotational symmetry. (B) Side view of GroES in a ribbon representation. A prominent feature of the GroES structure is the mobile loops, which pro-
trude from the bottom of the heptamer and mediate the interaction with GroEL. (C) Side view of a GroELS complex (crosssection). Binding of GroES (red) to the top ring (green) of the GroEL tetradecamer converts the upper cavity into a large shielded compartment for protein folding. The cavity of the bottom ring (gray) is much smaller in volume.
loop region (Figure 2B, residues 17–34), which was originally identified as the site of interaction with GroEL by NMR studies on a peptide corresponding to the mobile loop [18]. GroES binds to one end, or in some conditions to both ends, of the GroEL cylinder in the presence of ATP or ADP [19–21] via interactions between the mobile loops of GroES and the apical domains of GroEL. Low-resolution structures of the various GroEL-nucleotide-GroES complexes have been obtained using cryo-EM and single-particle analysis [9], and the X-ray crystallographic structure of the GroEL-(ADP)7 -GroES complex has been determined [21]. In contrast to the structure of the D398A-GroEL-ATP image reconstruction, which showed that the apical domains had extended and twisted ∼25◦ counterclockwise [7], the binding of GroES apparently induces the apical domains
5 The Interaction Between GroEL and Substrate Polypeptides
to extend and twist ∼90◦ clockwise relative to its position in the unliganded GroEL structure (Figure 2C). This means that the interaction of the seven GroES mobile loops with the apical domains of a GroEL-ATP complex induces the large rotation of the apical domains through a total of 115◦ (−25◦ through +90◦ ). The opening and twisting of the apical domains upon GroES binding cause the hydrophobic residues of the GroEL peptide-binding surface to become buried within the wall of the central cavity. Thereby the physicochemical nature of the central cavity is changed from largely hydrophobic to largely hydrophilic. At the same time, the volume of the GroEL central cavity beneath GroES is drastically increased from 85 000 Å3 to 175 000 Å3 (Figure 2C). It has been estimated that this is enough to accommodate folding intermediates of proteins up to Mr ∼ 60 kDa [22]. The binding of GroES also effectively seals the top of the central cavity, trapping any polypeptide inside the cavity [23]. The GroEL-(ADP)7 -GroES complex alone is unable to promote efficient refolding of stringent GroE polypeptide substrates. However, the addition of either of the γ -phosphate analogues aluminum fluoride and beryllium fluoride to an SR1rhodanese binary complex (SR1 is the single-ring version of GroEL) before mixing with GroES and ADP was found to promote efficient refolding [24]. The possibility that there may be a difference between the GroEL-(ADP)7 -GroES and GroEL(ATP)7 -GroES structures to explain the ability of the γ -phosphate analogues to promote efficient refolding was investigated by solving the X-ray crystallographic structure of the GroEL-(ADP · AlF3 )7 -GroES complex. Surprisingly, though, there were no significant differences between this structure and the GroEL-(ATP)7 -GroES structure [24].
5 The Interaction Between GroEL and Substrate Polypeptides
In vivo experiments with the yeast mitochondrial Hsp60 showed that a chaperone deficiency resulted in the aggregation of a large number of imported and newly synthesized proteins [4]. Like its endosymbiotic counterparts, GroEL can bind to a wide variety of unfolded or partially folded polypeptides. Approximately 40% of denatured proteins from E. coli cell extract are able to associate with GroEL in vitro [25]. Shifting an E. coli strain containing a temperature-sensitive mutation of GroEL to an elevated temperature resulted in a decrease in the rate of translation and strongly affected the folding of a number of cytoplasmic proteins [26]. More recently, a combination of pulse-chase radio-labeling in vivo, immunoprecipitation, and 2D PAGE was used to screen for E. coli polypeptide substrates of GroEL [27]. The identified proteins are involved in a diverse set of cellular roles including transcription, translation, metabolism, and other functions. Unfolded proteins bind to the GroEL apical domains predominantly via hydrophobic interactions [6, 28]. Firstly, this mode of binding provides a means to distinguish unfolded or partially folded polypeptides from folded ones. Secondly, it shields the exposed, hydrophobic residues of these proteins from the environment
5
6 The E. coli GroE Chaperone
and thus prevents their aggregation. Although GroEL appears to have the highest affinity for largely unfolded polypeptides exposing extended hydrophobic surfaces, it is also able to bind highly structured folding intermediates [29–33]. One controversial aspect of the interaction between GroEL and substrate polypeptide has been the question of whether GroEL recognizes any specific structural motifs [27]. An early NMR study on the interaction between a peptide and GroEL found that the peptide bound as an amphipathic α-helix [34]; however, GroEL was also shown to assist the refolding of an all β-sheet protein [35]. More recent data suggest that the most important factor is the spatial clustering of hydrophobic residues that can interact with the hydrophobic surface of the apical domains. As there is significant plasticity in the polypeptide, and perhaps in the GroEL apical domain, the binding affinity may be optimized by structural rearrangements [36–38].
6 GroEL is a Complex Allosteric Macromolecule
GroEL binds and hydrolyses ATP in a K+ -dependent manner and exhibits positive cooperativity, with a K 1/2 of 14 µM and a Hill constant of ∼2 [10, 39, 40]. Subsequent analysis showed that while ATP binding within a heptameric ring is positively cooperative, negative cooperativity exists between the rings (Figure 3). Once the first ring of GroEL is fully saturated with ATP, binding of a further seven ATP molecules to the second ring occurs with a K 1/2 of 160 µM, corresponding to an inter-ring coupling energy of 31 kJ mol−1 [12, 41]. This behavior has been described using a nested model of cooperativity [41] in which the intra-ring allosteric changes from the T state (apo) to the R state (ATP-bound) are described by the MWC-type concerted model [42], whereas the conversion of each ring from Tto R (Figure 3) occurs sequentially as described by the KNF model [43]. ATP hydrolysis is also asymmetric, and the optimal rate of hydrolysis (4 min−1 per subunit) is achieved when only one ring is occupied with ATP [41, 44]. However if both rings are saturated with ATP, then the rate of hydrolysis declines to half of its maximal rate (compare) [41]. The rate-limiting step appears to be the cleavage of the anhydride bond between the β- and γ -phosphates [39]. The conformational transitions induced by ATP have also been investigated using fluorescence spectroscopy. A GroEL derivative in which on average one of the 14 subunits was covalently labeled with a pyrene fluorescent dye showed a large enhancement in fluorescence upon the addition of ATP [39]. When mixed with ATP in a stopped-flow apparatus, a single kinetic phase was observed whose hyperbolic ATP dependence suggested a two-step binding process: ATP initially forms a weak complex with GroEL (K D = 4 mM) before inducing a rapid conformational rearrangement (t1/2 = 5 ms). In contrast, a GroEL mutant in which a single Trp was inserted in the equatorial domain (F44W) reported three kinetic phases upon rapid mixing with ATP [45], while a similar mutant, Y485W GroEL, reported four distinct kinetic transients [46]. A very rapid (∼1 ms) kinetic phase was observed with Y485W GroEL before the main fluorescence quench phase also reported by Y44W GroEL.
7 The Reaction Cycle of the GroE Chaperone
Fig. 3 Model for the allosteric transitions within GroEL. Owing to the positive intraring cooperativity, all seven subunits of one ring adopt the same conformation. In the absence of ligands (left), both rings are preferentially in the T (tense) state. This conformation displays a high affinity for unfolded polypeptides. Binding of ATP to the
bottom ring causes this ring to adopt the R state (middle), in which the affinity for polypeptides is decreased. Because of the negative inter-ring cooperativity, binding of ATP to the second ring is disfavored, and the transition to the R/R state (right) occurs only at high concentrations of nucleotide.
This phase revealed a bi-sigmoid dependency on ATP concentration reminiscent of that seen in steady-state experiments and presumably represents conformational transitions occurring upon occupying first one heptameric ring with ATP and subsequently the second ring. However, the precise structural changes represented by these kinetic transitions are still under investigation. In the presence of GroES, ATP binds to GroEL with a greater affinity (K 1/2 = 6 µM) and displays an increased level of cooperativity. Under steady-state conditions, GroES inhibits the rate of ATP hydrolysis to ∼35% of that seen with GroEL alone [10, 39, 40, 44]. During the hydrolytic cycle, the association of GroES with GroEL is highly dynamic [47]. GroES associates rapidly with the GroEL-ATP complex (5 × 107 M−1 s−1 ) [44, 48] to form a highly stable complex. Hydrolysis of the cis ATP (i.e., that ATP bound to the same ring as GroES) weakens the interaction between GroEL and GroES before ATP binding to the trans GroEL ring forces the dissociation of GroES (and any encapsulated polypeptide) from the cis GroEL ring [11, 48–50]. 7 The Reaction Cycle of the GroE Chaperone
The GroEL reaction cycle can be divided into the three distinct stages of polypeptide binding, encapsulation, and ejection [51]. In the first instance, a protein substrate that is unfolded, misfolded, or partially folded is able to interact with the hydrophobic surface on the apical domains of the trans ring, which adopts a conformation with a high affinity towards polypeptide substrate similar to the T-state rings found in unliganded GroEL (Figure 4). The binding of ATP to this ring induces the rapid dissociation of ligands from the opposite (previously cis) ring [48] and permits the binding of GroES to the same ring as the polypeptide. The binding of ATP and GroES results in a large increase in the volume of the central cavity and buries the hydrophobic polypeptide-binding surface within the inter-subunit interface (Figure 4), thereby displacing the polypeptide substrate into the GroEL cavity where
7
8 The E. coli GroE Chaperone
Fig. 4 Reaction cycle of the GroE chaperonin from E. coli. Although GroEL is composed of two rings, the functional cycle is best described on the level of individual rings, which represent the operational units of GroE. While both rings are active at the same time, they are in different phases of the cycle. Processing of an individual substrate poly-peptide requires two revolutions of the GroE cycle, during which the polypeptide remains associated with the same GroEL ring. For graphical reasons, the orientation of the GroE complex is reversed after step 4. The cycle of GroE-assisted folding can be dissected into three steps: capture, encapsulation/folding, and release. During capture (1), a hydrophobic polypeptide is prevented from aggregation by bind-
ing to GroEL. The acceptor ring (bottom ring) is nucleotide-free and therefore has a high affinity for the polypeptide. Binding of ATP (2) and GroES (3) to this ring induces a set of structural changes in GroEL. Most importantly, the affinity for the bound polypeptide is decreased, and it is released into the closed cavity where folding begins. Subsequent hydrolysis of ATP (5) induces a second conformational change in GroEL (top ring), which allows the bottom ring to bind polypeptide and initiate a new cycle. Upon binding of ATP and GroES in the next round, GroES is displaced from the top ring, and the substrate polypeptide is released (4). The formation of the symmetric complex shown in brackets is controversial.
folding is initiated [7, 11]. There is evidence that the polypeptide may undergo some degree of forced mechanical unfolding upon the radial expansion of the proteinbinding site [52]. Since the entrance to the cavity is sealed by GroES, the protein substrate may fold without interference by other aggregation-prone polypeptides. However, it is clear that at this stage the polypeptide is still able to interact with the walls of the cavity [53]. ATP hydrolysis in the cis ring (t1/2 ∼ 10 s) dictates the lifetime of this enclosed folding compartment by weakening the interaction between
7 The Reaction Cycle of the GroE Chaperone
GroEL and GroES [11] to prime the chaperone complex for the “ejection” signal from the trans ring (Figure 4). ATP binding to the trans ring is sufficient to eject GroES and the polypeptide from the cis ring [11, 47, 49, 50], although this process is accelerated when polypeptide binds to the trans ring concomitantly. An intermediate complex with GroES bound to both ends of the GroEL tetradecamer has been observed during assisted protein-folding reactions [54–56]. The role of these so-called “football” intermediates is not clear, as some results suggest that their formation is not required for efficient refolding [57]. One speculative possibility is that they may provide a means for the efficient trapping of a “new” polypeptide substrate to the trans ring just prior to ejection of the “old” polypeptide from the cis ring (Figure 4) [58]. The time in which the ejected polypeptide can fold in this protected environment is limited by the lifetime of the cis cavity. In the case of a monomeric protein, such as bovine mitochondrial rhodanese, it may be long enough to reach the fully functional native conformation. However, many substrate polypeptides are active as oligomers and therefore individual subunits must be released from GroEL sufficiently folded such that assembly into the active multimer can take place in bulk solution. The proportion of substrate polypeptide that has folded in the cavity depends upon the intrinsic rate of folding of that particular polypeptide. In most cases, the fraction of molecules that has not yet folded sufficiently to proceed rapidly and efficiently to its native conformation within the lifetime of the cavity will, upon ejection, still be competent to rebind to GroEL and proceed through another reaction cycle, where it will have another chance to refold. In this iterative manner, high yields of fully folded protein substrates can be obtained. This kind of mechanism exploits several characteristics of the GroE complex: 1. The binding and hydrolysis of ATP serve to modulate the affinity of GroEL for unfolded polypeptide substrate. Systematic studies of the effect of the various GroEL-nucleotide complexes on the refolding rates and yields of a number of protein substrates have been performed [29, 59–62]. Unliganded GroEL has a low affinity for ATP and GroES but a high affinity for unfolded polypeptides, while the GroEL-ATP complex has precisely the reverse properties. 2. The binding of GroES and ATP to a GroEL-polypeptide binary complex ensures efficient displacement of the polypeptide substrate into the central cavity, where it starts to refold. The binding of GroES and ATP switches the chemical nature of the cavity from one lined at the entrance with hydrophobic side chains to one in which this hydrophobic surface is buried. The volume of the cavity is also significantly increased. 3. The positive intra-ring cooperativity with respect to ATP binding may ensure a concerted release of all parts of the polypeptide chain as it becomes displaced into the central cavity. This may be crucial for productive folding in cases where structure formation in one part of the polypeptide cannot proceed before other regions are released from the GroEL apical domains. An optimal level of intra-ring cooperativity appears to have evolved such that the T-to-R transition occurs at a rate that ensures a rapid release of the whole polypeptide chain into the cavity [63].
9
10 The E. coli GroE Chaperone
4. The inherent asymmetry in the GroEL macromolecule enables each heptameric ring to take part in coupled half-reactions. That is, when the central cavity of one ring contains a folding polypeptide encapsulated under GroES (i.e., cis ring), the opposite trans ring provides the signal for ejection of GroES and poly-peptide. The roles of the two rings are then exchanged such that the trans ring now binds GroES and becomes the cis ring. This communication between the two rings is essential for efficient protein folding, as indicated by the fact that the single-ring version of GroEL (SR1) cannot rescue a GroEL-depleted strain [64].
8 The Effect of GroE on Protein-folding Pathways
GroE is able to exert differing effects on the folding kinetics and efficiency of different protein substrates. In most cases GroE improves the final yield of fully folded protein substrates but does not affect the rate of refolding. There are a number of models have been proposed to account for this phenomenon. In the first model GroE prevents aggregation by reducing the concentration of aggregation-prone intermediates in bulk solution [65]. This could be done passively by GroEL alone, but at the cost of a significant reduction in the rate of refolding [29, 39]. Dynamically cycling the polypeptide on and off the chaperone using ATP binding and hydrolysis to switch GroEL between conformations with alternately high and low affinity for protein substrate would allow faster rates of refolding to be achieved. An extension of this model is the so-called Anfinsen cage model in which encapsulating the folding protein within the central cavity of GroEL underneath GroES allows the polypeptide an opportunity to fold without the possibility of aggregation. In this instance the GroEL central cavity acts as an “infinite dilution” chamber. Multiple rounds of binding, encapsulation, and release ensure that the polypeptide is bound or encapsulated most of the time, thus preventing aggregation. An alternative model suggests that the rate-limiting step during the refolding of many larger proteins involves the breaking of incorrect intramolecular interactions that arise during the rapid collapse of the unfolded protein chain. If this step is slow, then the protein chain is likely to aggregate before the isomerization to a productive conformation can occur. The chaperone can therefore improve the efficiency of protein folding by assisting in the unfolding of misfolded conformations [39, 47]. The chaperonin can assist this process in two possible ways. The first way is by annealing the polypeptide to the apical domain binding sites and then mechanically unfolding the polypeptide further upon binding ATP and GroES [52]. The application of this forced unfolding requires interaction with a number of apical domain polypeptide-binding sites on GroEL [66]. Multiple rounds of this forced unfolding as the protein binds and dissociates would ensure a high yield of folded protein. Alternatively, as long as the collapsed protein substrate can explore its available conformational space in rapid equilibrium, effective unfolding can be achieved by a thermodynamic coupling mechanism, since GroEL binds the most unfolded forms tightly [67, 68].
8 The Effect of GroE on Protein-folding Pathways
There are some notable instances in which an apparent enhancement in the rate of refolding is observed. Porcine mitochondrial malate dehydrogenase (mMDH) shows a 3.5-fold increase in its apparent rate of refolding with GroEL compared to its spontaneous rate [69, 70], and the folding of bacterial Rubisco is increased fourfold over its spontaneous rate [71–73]. Two hypotheses to explain this have been suggested. 1. Ranson et al. [70] found that the misfolding of mMDH is the result of early steps in aggregation, presumably as small, low-order aggregates form, and that although the equilibrium normally heavily favors the formation of these smaller aggregates, this process is reversible. Remarkably, GroE was able to actively reverse these early aggregation steps even at sub-stoichiometric amounts, suggesting that it acts as a catalyst. Using cycles of binding and release, the chaperonin constantly recycles material from the unproductive aggregation pathway and resupplies the productive folding pathway. As a result the rate of refolding apparently increases although the intrinsic rate has not changed. A similar model has also been suggested to explain the apparent rate increase observed during the assisted refolding of bacterial Rubisco [72]. 2. Alternatively, it has been proposed that confinement of the protein substrate within the narrow GroEL central cavity has the effect of smoothing the energy landscape in order to increase the flux of protein to its native conformation [73]. More recently, molecular simulations of the refolding process have been performed in systems with different accessible volumes, and it was noted that confinement was able to significantly stabilize the protein, leading to an enhancement in the rate of refolding [74].
Recently, the importance of the chemical properties of the central cavity has been strikingly demonstrated in an experiment in which directed evolution was used in an attempt to optimize the central cavity to maximize the efficiency of folding of the green fluorescent protein (GFP). This was achieved by altering the rate of ATP hydrolysis and the inherent allostery of the chaperonin complex and also by shifting the polarity of the central cavity in order to suit the folding pathway of GFP. However this enhanced specialization within GroEL was achieved at the expense of its ability to refold its natural protein substrates, demonstrating that GroE has had to evolve a balance of ATPase rate and central cavity properties in order to perform a very general role as a molecular chaperone [75]. Another important question concerns proteins that are too large to fit into the central cavity underneath GroES. Its volume was calculated to be in the range of 175 000 Å3 , which likely is large enough to accommodate folding intermediates of up to 60 kDa. Can GroEL also assist the folding of larger proteins? Yeast mitochondrial aconitase (82 kDa) was observed to aggregate in chaperonin-deficient mitochondria, indicating an in vivo role for the chaperonin [76]. in vitro mechanistic studies revealed that aconitase does not become encapsulated underneath GroES but interacts only with an open ring. GroES binding to the ring opposite to the protein substrate causes the release of the nonnative substrate into bulk
11
12 The E. coli GroE Chaperone
solution, where it can refold. In this way aconitase passes through many cycles of binding and release without ever being encapsulated until it is in a conformation that can form the holo-enzyme [77, 78]. This demonstrates a mechanism that is quite distinct from that found when assisting the folding of proteins less than 60 kDa but that is also efficient at refolding to the native (or near-native) conformation. 9 Conclusions
Despite the enormous quantity of data now available describing the structure and mechanism of action of the chaperonins, there remain a number of secrets that GroE has yet to give up. For example, GroE is one of the most complex allosteric systems ever studied, and a great deal of work involving a combination of mutagenetic, kinetic, and thermodynamic studies needs to be done before the intricate macromolecular communication can be mapped onto the structure and the role of allostery in the assisted protein-folding reaction is really understood. Additionally, despite the large amount of structural data that has been collected, it is still not certain precisely what conformational ensemble of a polypeptide is recognized by GroEL and whether unfolding upon ATP and GroES binding is a general feature of the chaperonin reaction [33, 52]. Also unknown is the degree to which the nature of the central cavity can play a part in ensuring efficient folding. Does the cavity play an important role in helping the protein reach a folded conformation, or does it act merely as an “infinite dilution cage”? One new approach that may be able to unlock some of these puzzles is the extension of NMR techniques to the study of large macromolecular complexes such as the GroEL-GroES chaperonin [79]. This would permit structural and dynamic information about the chaperonin and any bound substrate. It is clear that an imaginative and multidisciplinary approach will likely be needed to understand the precise molecular details of this extraordinary protein complex. Acknowledgements
We thank K. Ruth for practical assistance concerning the described protocols. Support from the Deutsche Forschungsgemeinschaft, SFB594 (S.W.) and the Wellcome Trust (S.G.B.) is gratefully acknowledged. References 1 GEORGOPOULOS, C., HENDRIX, R. W., CASJENS, S. R., and KAISER, A. D. (1973). Host participation in bacteriophage lambda head assembly. J. Mol. Biol. 76, 45–60.
2 BARRACLOUGH, R. & ELLIS, R. J. (1980). Protein synthesis in chloroplasts. IX. Assembly of newly-synthesized large subunits into ribulose bisphosphate carboxylase in isolated intact pea
References
3
4
5
6
7
8
9
10
11
chloroplasts. Biochim. Biophys. Acta 607, 19–31. GOLOUBINOFF, P., CHRISTELLER, J. T., GATENBY, A. A., and LORIMER, G. H. (1989). Reconstitution of active dimeric ribulose bisphosphate carboxylase from an unfolded state depends on two chaperonin proteins and Mg-ATP. Nature 342, 884–889. CHENG, M. Y., HARTL, F. U., MARTIN, J., POLLOCK, R. A., KALOUSEK, F., NEUPERT, W., HALLBERG, R. L., and HORWICH, A. L. (1989). Mitochondrial heat-shock protein hsp60 is essential for assembly of proteins imported into yeast mitochondria. Nature 337, 620–625. BRAIG, K., OTWINOWSKI, Z., HEGDE, R., BOISVERT, D. C., JOACHIMIAK, A., HORWICH, A., and SIGLER, P. B. (1994). The crystal structure of the bacterial chaperonin GroEL at 2.8 Å. Nature 371, 578–586. FENTON, W. A., KASHI, Y., FURTAK, K., and HORWICH, A. L. (1994). Residues in chaperonin GroEL required for polypeptide binding and release. Nature 371, 614–619. RANSON, N. A., FARR, G. W., ROSEMAN, A. M. B. G., FENTON, W. A., HORWICH, A. L., and SAIBIL, H. R. (2001). ATP-bound states of GroEL captured by cryoelectron microscopy. Cell 107, 869–879. BOISVERT, D. C., WANG, J., OTWINOWSKI, Z., HORWICH, A. L., and SIGLER, P. B. (1996). The 2.4 Å crystal structure of the bacterial chaperonin GroEL complexed with ATPgS. Nat. Struct. Biol. 3, 170–177. ROSEMAN, A. M., CHEN, S., WHITE, H., BRAIG, K., and SAIBIL, H. R. (1996). The chaperonin ATPase cycle: Mechanism of allosteric switching and movements of substrate-binding domains in GroEL. Cell 87, 241–251. GRAY, T. E. & FERSHT, A. R. (1991). Cooperativity in ATP hydrolysis by GroEL is increased by GroES. FEBS Lett. 292, 254–258. RYE, H. S., BURSTON, S. G., FENTON, W. A., BEECHEM, J. M. Xu. Z., SIGLER, P. B., and HORWICH, A. L.
12 YIFRACH, O. & HOROVITZ, A. (1994). Two Lines of Allosteric Communication in the Oligomeric Chaperonin GroEL are Revealed by the Single Mutation Arg196. Ala. J. Mol. Biol. 243, 397–401. 13 WHITE, H. E., CHEN, S., ROSEMAN, A. M., YIFRACH, O., HOROVITZ, A., and SAIBIL, H. R. (1997). Structural basis of allosteric changes in the GroEL mutant Arg197. Ala. Nat. Struct. Biol. 4, 690–693. 14 YIFRACH, O. & HOROVITZ, A. (1998). Mapping the Transition State of the Allosteric Pathway of GroEL by Protein Engineering. Journal American Chemical Society 120, 13262–13263. 15 MA, J., SIGLER, P. B., and KARPLUS, M. (2000). A dynamic model for the allosteric mechanism of GroEL. J. Mol. Biol. 302, 303–313. 16 CHANDRASEKHAR, G. N., TILLY, K., WOOLFORD, C., HENDRIX, R., and GEORGOPOULOS, C. (1986). Purification and properties of the groES morphogenetic protein of Escherichia coli. J. Biol. Chem. 261, 12414–12419. 17 HUNT, J. F., WEAVER, A. J., LANDRY, S. J., GIERASCH, L., and DEISENHOFER, J. (1996). The crystal structure of the GroES co-chaperonin at 2.8 Å resolution. Nature 379, 37–45. 18 LANDRY, S. J., ZEILSTRA-RYALLS, J., FAYET, O., GEORGOPOULOS, C., and GIERASCH, L. (1993). Characterization of a functionally important mobile domain of GroES. Nature 364, 255–258. 19 SAIBIL, H., DONG, Z., WOOD, S., and AUF DER MAUER, A. (1991). Binding of chaperonins. Nature 353, 25–26. 20 LANGER, T., PFEIFER, G., MARTIN, J., BAUMEISTER, W., and HARTL, F. U. (1992). Chaperonin-mediated protein folding: GroES binds to one end of the GroEL cylinder, which accomodates the protein substrate within its central cavity. EMBO J. 11, 4757–4765. 21 XU, Z., HORWICH, A. L., and SIGLER, P. B. (1997). The crystal structure of the asymmetric GroEL-GroES-(ADP) ∼7 chaperonin complex. Nature 388, 741–750. 22 SAKIHAWA, C., TAGUCHI, H., MAKINO, Y., and YOSHIDA, M. (1999). On the maximum size of proteins to stay and
13
14 The E. coli GroE Chaperone
23
24
25
26
27
28
29
30
31
fold in the cavity of GroEL underneath GroES. J. Biol. Chem. 274, 21251–21256. WEISSMAN, J. S., HOHL, C. M., KOVALENKO, O., KASHI, Y., CHEN, S., BRAIG, K., SAIBIL, H. R., FENTON, W. A., and HORWICH, A. L. (1995). Mechanism of GroEL action: Productive release of polypeptide from a sequestered position under GroES. Cell 83, 577–587. CHAUDHRY, C., FARR, G. W., TODD, M. J., RYE, H. S., BRUNGER, A. T., ADAMS, P. D., HORWICH, A. L., and SIGLER, P. B. (2003). Role of the g-phosphate of ATP in triggering protein folding by GroEL-GroES: function, structure and energetics. EMBO J. 22, 4877–4887. VIITANEN, P. V., GATENBY, A. A., and LORIMER, G. H. (1992). Purified chaperonin (GroEL) interacts with the nonnative states of a multitude of Escherichia coli proteins. Protein Sci. 1, 363–369. HORWICH, A. L., LOW, K. B., FENTON, W. A., and HIRSHFIELD, I. N. (1993). Folding In Vivo of Bacterial Cytoplasmic Proteins: Role of GroEL. Cell 74, 909–917. HOURY, W. A., FRISHMAN, D., ECKERSKORN, C., LOTTSPEICH, F., and HARTL, F. U. (1999). Identification of in vivo substrates of the chaperonin GroEL. Nature 402, 147–154. LIN, Z., SCHWARZ, F. P., and EISENSTEIN, E. (1995). The hydrophobic nature of GroEL-substrate binding. J. Biol. Chem. 270, 1011–1014. BADCOE, I. G., SMITH, C. J., WOOD, S., HALSALL, D. J., HOLBROOK, J. J., LUND, P., and CLARKE, A. R. (1991). Binding of a chaperonin to the folding intermediates of lactate dehydrogenase. Biochemistry 30, 9195–9200. ROBINSON, C. V., GROSS, M., EYLES, S. J., EWBANK, J. J., MAYHEW, M., HARTL, F. U., DOBSON, C. M., and RADFORD, S. E. (1994). Conformation of GroEL-bound α-lactalbumin probed by mass spectrometry. Nature 372, 646–651. ZAHN, R., PERRETT, S., STENBERG, G., and FERSHT, A. R. (1996). Catalysis of amide proton exchange by the molecular chaperones GroEL and SecB. Science 271, 642–645.
32 GOLDBERG, M. S., ZHANG, J., SONDEK, S., MATTHEWS, C. R., FOX, R. O., and HORWICH, A. L. (1997). Native-like structure of a protein-folding intermediate bound to the chaperonin GroEL. Proc. Natl. Acad. Sci. U.S.A. 94, 1080–1085. 33 CHEN, J., WALTER, S., HORWICH, A. L., and SMITH, D. L. (2001). Folding of malate dehydrogenase inside the GroEL-GroES cavity. Nat. Struct. Biol. 8, 721–728. 34 LANDRY, S. J. & GIERASCH, L. (1991). The chaperonin GroEL binds a polypeptide in an α-helical conformation. Biochemistry 30, 7359–7362. 35 SCHMIDT, M. & BUCHNER, J. (1992). Interaction of GroE with an all β-protein. J. Biol. Chem. 267, 16829–16833. 36 CHEN, L. & SIGLER, P. B. (1999). The crystal structure of a GroEL/peptide complex: plasticity as a basis for substrate diversity. Cell 99, 757–768. 37 WANG, Z., FENG, H., LANDRY, S. J., MAXWELL, J., and GIERASCH, L. (1999). Basis of substrate binding by the chaperonin GroEL. Biochemistry 38, 12537–12546. 38 FALKE, S., FISHER, M. T., and GOGOL, E. P. (2001). Structural changes in GroEL effected by binding of a denatured protein substrate. J. Mol. Biol. 308, 569–577. 39 JACKSON, G. S., STANIFORTH, R. A., HALSALL, D. J., ATKINSON, T., HOLBROOK, J. J., CLARKE, A. R., and BURSTON, S. G. (1993). Binding and hydrolysis of nucleotides in the chaperonin catalytic cycle: implications for the mechanism of assisted protein folding. Biochemistry 32, 2554–2563. 40 TODD, M. J., VIITANEN, P. V., and LORIMER, G. H. (1993). Hydrolysis of Adenosine 5 -Triphosphate by Escherichia coli GroEL: Effects of GroES and Potassium Ion. Biochemistry 32, 8560–8567. 41 YIFRACH, O. & HOROVITZ, A. (1995). Nested cooperativity in the ATPase activity of the oligomeric chaperonin GroEL. Biochemistry 34, 5303–5308. 42 MONOD, J., WYMAN, J., and CHANGEUX, J.-P. (1965). On the nature of allosteric
References
43
44
45
46
47
48
49
50
51
transitions: a plausible model. J. Mol. Biol. 12, 88–118. KOSHLAND, D. E., NEMETHY, G., and FILMER, D. (1966). Comparison of experimental binding data and theoretical models in proteins containing subunits. Biochemistry 5, 365–385. BURSTON, S. G., RANSON, N. A., and CLARKE, A. R. (1995). The Origins and Consequences of Asymmetry in the Chaperonin Reaction Cycle. J. Mol. Biol. 249, 138–152. YIFRACH, O. & HOROVITZ, A. (1998). Transient Kinetic Analysis of Adenosine 50-Triphosphate Binding-Induced Conformational Changes in the Allosteric Chaperonin GroEL. Biochemistry 37, 7083–7088. CLIFF, M. J., KAD, N. M., HAY, N., LUND, P. A., WEBB, M. R., BURSTON, S. G., and CLARKE, A. R. (1999). A Kinetic Analysis of the Nucleotide-induced Allosteric Transitions of GroEL. J. Mol. Biol. 293, 667–684. TODD, M. J., VIITANEN, P. V., and LORIMER, G. H. (1994). Dynamics of the chaperonin ATPase cycle: implications for facilitated protein folding. Science 265, 659–666. RYE, H. S., ROSEMAN, A. M., CHEN, S., FURTAK, K., FENTON, W. A., SAIBIL, H. R., and HORWICH, A. L. (1999). GroEL-GroES Cycling: ATP and Nonnative Polypeptide Direct Alternation of Folding-Active Rings. Cell 97, 325–338. WEISSMAN, J. S., KASHI, Y., FENTON, W. A., and HORWICH, A. L. (1994). GroEL-Mediated Protein Folding Proceeds by Multiple Rounds of Binding and Release of Nonnative Forms. Cell 78, 693–702. BURSTON, S. G., WEISSMAN, J. S., FARR, G. W., FENTON, W. A., and HORWICH, A. L. (1996). Release of both native and non-native proteins from a cis-only GroEL ternary complex. Nature 383, 96–98. RANSON, N. A., BURSTON, S. G., and CLARKE, A. R. (1997). Binding, Encapsulation and Ejection: Substrate Dynamics During a Chaperonin-assisted
52
53
54
55
56
57
58
59
60
61
Folding Reaction. J. Mol. Biol. 266, 656–664. SHTILERMAN, M., LORIMER, G. H., and ENGLANDER, S. W. (1999). Chaperonin function: Folding by forced unfolding. Science 284, 822–825. WEISSMAN, J. S., RYE, H. S., FENTON, W. A., BEECHEM, J. M., and HORWICH, A. L. (1996). Characterization of the Active Intermediate of a GroEL-GroESMediated Protein Folding Reaction. Cell 84, 481–490. SCHMIDT, M., RUTKAT, K., RACHEL, R., PFEIFER, G., JAENICKE, R., VIITANEN, P. V., LORIMER, G. H., and BUCHNER, J. (1994). Symmetric complexes of GroE chaperonins as part of the functional cycle. Science 265, 656–659. AZEM, A., KESSEL, M., and GOLOUBINOFF, P. (1994). Characterization of a functional GroEL14(GroES7)2 chaperonin hetero-oligomer. Science 265, 653–656. SPARRER, H., RUTKAT, K., and BUCHNER, J. (1997). Catalysis of protein folding by symmetric chaperone complexes. Proc. Natl. Acad. Sci. U.S.A. 94, 1096–1100. HAYER-HARTL, M. K., EWALT, K. L., and HARTL, F. U. (1999). On the role of symmetrical and asymmetrical chaperonin complexes in assisted protein folding. Biol. Chem. 380, 531–540. WALTER, S. (2002). Structure and function of the GroE chaperone. Cellular and Molecular Life Sciences 59, 1589–1597. FISHER, M. T. (1992). Promotion of the in vitro renaturation of dodecameric glutamine synthetase from Escherichia coli in the presence of GroEL (chaperonin-60) and ATP. Biochemistry 31, 3955–3963. GRAY, T. E. and FERSHT, A. R. (1993). Refolding of barnase in the presence of GroE. J. Mol. Biol. 232, 1197–1207. STANIFORTH, R. A., BURSTON, S. G., ATKINSON, T., and CLARKE, A. R. (1994). Affinity of chaperonin-60 for a protein substrate and its modulation by nucleotides and chaperonin-10. Biochemical J. 300, 651–658.
15
16 The E. coli GroE Chaperone
62 SPARRER, H., LILIE, H., and BUCHNER, J. (1996). Dynamics of the GroEL-protein complex: effects of nucleotides and folding mutants. J. Mol. Biol. 258, 74–87. 63 YIFRACH, O. & HOROVITZ, A. (1999). Coupling between protein folding and allostery in the GroE chaperonin system. Proc. Natl. Acad. Sci. U.S.A. 97, 1521–1524. 64 SUN, Z., SCOTT, D. J., and LUND, P. A. (2003). Isolation and characterisation of mutants of GroEL that are fully functional as single rings. J. Mol. Biol. 332, 715–728. 65 BUCHNER, J., SCHMIDT, M., FUCHS, M., JAENICKE, R., RUDOLPH, R., SCHMID, F. X., and KIEFHABER, T. (1991). GroE facilitates refolding of citrate synthase by suppressing aggregation. Biochemistry 30, 1586–1591. 66 FARR, G. W., FURTAK, K., ROWLAND, M. B., RANSON, N. A., SAIBIL, H. R., KIRCHHAUSEN, T., and HORWICH, A. L. (2000). Multivalent Binding of Nonnative Substrate Proteins by the Chaperonin GroEL. Cell 100, 561–574. 67 ZAHN, R., SPITZFADEN, C., OTTIGER, M., W¨UTHRICH, K., and PLU¨ CKTHUN, A. (1994). Destabilization of the complete protein secondary structure on binding to the chaperone GroEL. Nature 368, 261–265. 68 WALTER, S., LORIMER, G. H., and SCHMID, F. X. (1996). A thermo-dynamic coupling mechanism for GroEL-mediated unfolding. Proc. Natl. Acad. Sci. U.S.A. 93, 9425–9430. 69 STANIFORTH, R. A., CORTES, A., BURSTON, S. G., ATKINSON, T., HOLBROOK, J. J., and CLARKE, A. R. (1994). The stability and hydropho-bicity of cytosolic and mitochondrial alate dehydrogenase and their relation to chaperonin-assisted folding. FEBS Lett. 344, 129–135. 70 RANSON, N. A., DUNSTER, N. J., BURSTON, S. G., and CLARKE, A. R. (1995). Chaperonins can Catalyse the Reversal of Early Aggregation Steps when a Protein Misfolds. J. Mol. Biol. 250, 581–586.
71 TODD, M. J. & LORIMER, G. H. (1995). Stability of the asymmetric Escherichia coli chaperonin complex. Guanidine chloride causes rapid dissociation. 72 TODD, M. J., LORIMER, G. H., and THIRUMALAI, D. (1996). Chaperonin-facilitated protein folding: optimization of rate and yield by an iterative annealing mechanism. Proc. Natl. Acad. Sci. U.S.A. 93, 4030–4035. 73 BRINKER, A., PFEIFER, G., KERNER, M. J., NAYLOR, D. J., HARTL, F. U., and HAYER-HARTL, M. K. (2001). Dual function of protein confinement in chaperonin-assisted protein folding. Cell 107, 223–233. 74 TAKAGI, F., KOGA, N., and TAKADA, S. (2003). How protein thermodynamics and folding mechanisms are altered by the chaperonin cage: molecular simulations. Proc. Natl. Acad. Sci. U.S.A. 100, 11367–11372. 75 WANG, J. D., HERMAN, C., TIPTON, K. A., GROSS, C. A., and WEISSMAN, J. S. (2002). Directed evolution of substrateoptimized GroEL/S chaperonins. Cell 111, 1027–1039. 76 DUBAQUIE, Y., LOOSER, R., F¨UNFSCHIL-LING, U., JENO, P., and ROSPERT, S. (1998). Identification of in vivo substrates of the yeast mitochondrial chaperonins reveals overlapping but non-identical requirement for hsp60 and hsp10. EMBO J. 15, 5868–5876. 77 CHAUDHURI, T. K., FARR, G. W., FENTON, W. A., ROSPERT, S., and HORWICH, A. L. (2001). GroEL/GroES-Mediated Folding of a Protein Too Large to Be Encapsulated. Cell 107, 235–246. 78 FARR, G. W., FENTON, W. A., CHAUDHURI, T. K., CLARE, D. K., SAIBIL, H. R., and HORWICH, A. L. (2003). Folding with and without encapsulation by cis and trans-only GroEL-GroES complexes. EMBO J. 22, 3220–3230. 79 FIAUX, J., BERTELSON, E. B., HORWICH, A. L., and WUTHRICH, K. (2002). NMR analysis of a 900K GroEL-GroES complex. Nature 418, 207–211.
1
The Hsp90 Family of Molecular Chaperones Klaus Richter, Birgit Meinlschmidt, and Johannes Buchner Technische Universit¨at M¨unchen, Garching, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Hsp90 is a cytosolic molecular chaperone that has been found in context with many signal transduction pathways in higher eukaryotes [1]. Its participation in these processes is still enigmatic but is thought to involve specific conformational changes in its substrate proteins, many of which are protein kinases and transcription factors. Because these conformational changes are required to confer activity to the substrates, Hsp90 has become a global player in the signal transduction network of eukaryotic cells. In contrast, the prokaryotic homologue of Hsp90 (HtpG) and the compartmentalized versions of Hsp90 from mitochondria, chloroplasts, and the endoplasmatic reticulum are thought to be predominantly folding helpers for the proteins that reside in or pass through these cellular structures. The evolution of Hsp90 function may be the result of the ever more complex set of cytosolic partner proteins that are known to associate in a substrate-specific way with Hsp90. In this article specific focus is put on the ATPase cycle of Hsp90, the interplay of Hsp90 with partner proteins or substrates, and the specific ways to investigate these complex macromolecular assemblies. 2 The Hsp90 Family in vivo 2.1 Evolutionary Relationships within the Hsp90 Gene Family
About 130 genes have been unambiguously assigned to the Hsp90 gene family so far. Most of them account for cytosolic variants of Hsp90. Usually, one cytosolic Hsp90 gene is found in prokaryotes, but gene duplications have arisen in eukaryotes Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Hsp90 Family of Molecular Chaperones
that have led to two Hsp90 genes in yeast [2], two Hsp90 genes in mammals [3], and four Hsp90 genes in Arabidopsis thaliana [4]. The functional differences between these proteins, and thus the reasons for duplications, are unknown, especially as the degree of homology between them is very high (as much as 98% in yeast). However, differences in the regulation exist. Sequence alignments show that all the Hsp90 genes share a common organization consisting of conserved domains connected by flexible linkers [5, 6] (Figure 1B). Analysis of the individual domains has seen considerable progress in the past few years concerning both structure and function. The N-terminal domain is the nucleotide-binding site of the protein, whereas the Cterminal domain hosts the dimerization site of Hsp90. The middle domain seems to be involved in substrate binding and ATP hydrolysis (Figure 1B). Based on the many sequences available for Hsp90 and its homologous proteins, it is possible to trace the evolution of this protein family throughout the organismal kingdom. The phylogenetic tree of the primary sequences of Hsp90 family members reveals interesting aspects of the evolutionary relationship (Figure 1A). First, eukaryotic (including chloroplast and ER members) differ from prokaryotic (including mitochondrial) Hsp90 proteins by a massive extension of the linker region between the N-terminal and the middle domain. This linker consists predominantly (about 90%) of charged amino acids and starts around amino acid 210. It has a maximal length of 92 amino acids in human Hsp90 genes, of about 40 amino acids in yeast Hsp90, and of seven amino acids in HtpG from E. coli. Its function is unknown and its presence is not required for yeast cell growth [7]. Second, cytosolic eukaryotic Hsp90 proteins differ from other eukaryotic Hsp90s by a C-terminal extension. This region, which includes the MEEVD motif at the C-terminal end of the protein, serves as the binding site for cytosolic partner proteins containing TPR motifs [8, 9]. This region is also not required for yeast cell growth [7]. Third, Grp94, the ER-resident species, shows an N-terminal extension of about 60 amino acids following the N-terminal signal sequence for ER import. The function and origin of this region are unknown, as it lacks any homology to known protein domains. Interestingly, the evolutionary tree indicates that the Hsp90s in chloroplasts [10] and ER are descendants of cytosolic variants and thus have evolved more recently, while the mitochondrial Hsp90 gene is of ancient, bacterial origin [6]. One feature that shows this origin is the presence of the linker between the N-terminal and the middle domain in the chloroplast and ER-resident Hsp90 isoforms, which is missing in the mitochondrial Hsp90 homologue [11]. The ER-resident Grp94 is found only in multicellular organisms, suggesting either a more recent development or a secondary loss of this gene by single-celled eukaryotes. Duplications of Hsp90 appear to have happened on multiple occasions during evolution, indicating a strong correlation of organismal evolution and Hsp90 evolution. Interestingly, in archaea Hsp90 genes have not been detected yet. 2.2 In Vivo Functions of Hsp90
Investigating the importance of a gene for an organism means in particular investigating phenotypes upon deletion of this gene. Knocking out HtpG, the bacterial
2 The Hsp90 Family in vivo 3
4 The Hsp90 Family of Molecular Chaperones
Hsp90 homologue, produced a modest heat-sensitive phenotype in E. coli [12] but a lack of thermoprotective functions in cyanobacteria [13]. As no substrate of HtpG is known to date, the role of HtpG in bacteria is largely enigmatic. A slight evolutionary advantage could be detected that leads to an outgrowth of HtpG-deficient bacteria by those that carry a functional HtpG gene [12]. In lower eukaryotes, like yeast, Hsp90 is essential, as knocking out both cytosolic Hsp90 genes results in the loss of viability [2]. The reasons for this are unknown, but it has been observed that the cell cycle of yeast is arrested at both the G1 /S and the G2 /M phases upon deletion of Hsp90s [14], which highlights the involvement of Hsp90 in cell cycle control [15, 16]. This view is further strengthened by the identification of Cdk4 as a substrate of Hsp90 in yeast [17]. Several other kinases, such as dsRNA-dependent protein kinase [18] and the cell cycle kinase Wee1 [19], were found to require functional Hsp90 in yeast. Hsp90 in higher eukaryotes was found to be involved in many different pathways, as best exemplified by the study of Rutherford and Lindquist [20]. Here, genotype-specific phenotypes could be generated by the reduction of Hsp90 levels in Drosophila. Hsp90 appeared to buffer the accumulation of mutations in target genes and thus allowed for the invisible accumulation of these events. Partial reduction of Hsp90 function resulted in the manifestation of the previously silent phenotypes and, consequently, led to a diverse set of developmentally defective flies [20]. Comparable effects were later found in Arabidopsis, suggesting a similar scenario in animals and plants [21]. In mammals, knockout studies indicated that at least Hsp90β is essential and that its knockout leads to defects in placental development [22]. Hsp90α is also required for embryonic development, as inhibition of the Hsp90 protein results in defects in muscle cell development in zebra fish [23]. For Grp94, on the other hand, studies with B-cell lines indicate an involvement in innate immunity [24]. Several other reports on Grp94 function suggest interactions of the major ER chaperones BiP and Grp94 in processing, folding, and quality control during protein production in the endoplasmic reticulum [25, 26]. In addition, there are reports of an active role of Grp94 in the loading of peptides onto MHCI complexes during the immune response [27]. Much less is known about the in vivo function of the recently discovered Hsp90 homologues of mitochondria and chloroplasts. Also, a shortened Hsp90 homologue, Hsp90N, lacking the N-terminal ATP-binding site, has recently been identified in human cell lines [28]. The function of this protein is still unknown.
← Fig. 1 Overview of the Hsp90 family. (A) Evolutionary tree of the Hsp90 family. The evolutionary tree of the Hsp90 family has been calculated based on 120 Hsp90 sequences that are publicly accessible. The tree has been constructed as outlined in the Appendix. In principle, it is in agreement with previously constructed trees [5, 6]. ER: endoplasmic reticulum; Ver": verte-
brata Hsp90"; Ver$: vertebrata Hsp90$; Art: Arthropoda; Ne: Nematoda; Po: Polifera; Chl: Chloroplast; Spiro: Spirochaetes; Pro: Proteobacteria; Firm: Firmicutes; Cya: Cyanobacteria; Bac: Bacteroides. (B) Schematic representation of the domain organization of the Hsp90 family. Functional equivalent domains are labeled with the same colors.
2 The Hsp90 Family in vivo 5
2.3 Regulation of Hsp90 Expression and Posttranscriptional Activation
To understand the function of proteins, it is helpful to investigate the specific requirements for their expression. Thus, the strongly enhanced expression of many Hsp90 isoforms as a response to heat stress hints at an involvement in the protective system of the cell. Conditions that lead to the expression of the Hsp90 gene in prokaryotes include exposure to ethanol, toxic substances, and radicicol as well as other harmful conditions [29, 30]. The regulation of HtpG expression is understood in detail only for heat shock. Here, the heat shock factor sigma 32 determines transcriptional control [31]. Upon heat-induced activation of sigma 32, the heat shock element upstream of the htpG gene is used as a docking site of the transcription factor, leading to strongly induced expression of the target protein [32]. To ensure coupling of transcriptional control of chaperones to the presence of unfolded proteins, sigma 32 is negatively regulated by DnaK/Hsp70 and other chaperones. This might add an additional level of control to the regulation, as it guarantees that expression is induced only if a lack of free chaperones is observed. Interestingly, the association of HtpG with sigma 32 is also described [33]. Very similar regulation patterns have been observed for the Hsp90 genes of eukaryotes. Regulation here is performed by the heat shock factor (HSF) [34]. This protein, while not sharing homology with sigma 32, behaves in a comparable way. Here, heat induces trimerization of the protein [35], leading to an active transcription factor, which binds to heat shock elements (HSEs) found in the promoters of all major heat shock proteins, sometimes in multiple redundant sequences [36–38]. If more than one cytosolic Hsp90 isoform exists, the expression is differentially regulated. In yeast, only Hsp82 is stress-regulated, while the 98% identical Hsc82 is constitutively expressed; its levels increase only modestly following stress [2]. Similar results were obtained for mammalian Hsp90. Here, Hsp90β is expressed constitutively and is only slightly upregulated following heat shock and interleukindependent signal transduction [39, 40]. The modest upregulation upon heat shock as well as the constitutive expression require a HSE sequence, which is found within the first intron of the gene [40]. Hsp90α instead is strongly heat-inducible, as it contains several HSEs in its 5 upstream promoter sequence [41]. In addition, it is also under control of interleukin-induced signaling pathways [41, 42]. Other, less well-understood mechanisms – involving the signaling proteins PKCε [43] or STAT1 [44] – are involved in the regulation of Hsp90 expression and appear to couple the expression to an accurate control machinery. Interestingly, recent reports suggest that several chromosomal copies exist for each of the Hsp90 genes in humans [44]. The only other Hsp90 homologue for which the regulation of expression is understood in some detail is Grp94. Grp94 is controlled, among other pathways, by the unfolded protein response (UPR), which connects the appearance of unfolded proteins in the ER with the expression of the major chaperones in this compartment. The pathway starts with the recognition of unfolded proteins by the ER-resident
6 The Hsp90 Family of Molecular Chaperones
transmembrane kinase Ire1 and finally results in the activation of the transcription factor Xbp1 [45]. Here as well, molecular chaperones participate in the activation process as negative regulators of Ire1 [46, 47]. Similar pathways exist in lower eukaryotes as well [48]. Thus, throughout the whole animal kingdom, Hsp90 appears as a protein that, although highly expressed even under non-stress conditions, can be strongly induced following either heat stress or chemical stress. Especially the observed coupling of overexpression to the available chaperone activity, as is observed throughout the activation process of the required transcription factors, strongly hints to the origin of Hsp90 as a folding helper and participant in the homeostasis of protein stability in the cell. In yeast, for example, Hsp90 represents 1–2% of the cytosolic protein under permissive conditions and significantly more under stress conditions. Interestingly, only 1/20th of the amount of Hsp90 present under permissive conditions is required to ensure yeast cell growth, while significantly more Hsp90 is required under stress conditions [2]. 2.4 Chemical Inhibition of Hsp90
In addition to knockout strategies, natural Hsp90 inhibitors were used to imitate the loss of Hsp90 functions in cell culture and some eukaryotic species [49]. Those mostly used are radicicol and geldanamycin (Figure 2). Radicicol (formerly monorden) is a product of the fungus Neocosmospora tenuicristata [50]. Geldanamycin, an ansamycin, is derived from Streptomyces hygroscopicus var. geldanus [51]. Several substances that are similar to geldanamycin were also found in streptomyces strains, including macbecin [52, 53] and herbimycin A/B [54, 55]. All these inhibitors appear similar in their efficiency to inhibit Hsp90 in vitro, and strong similarities exist in their in vivo effects [56, 57]. Geldanamycin originally was thought to be an inhibitor of tyrosine kinases, but this effect later was proved to be indirect via the inhibition of Hsp90 [58]. It was observed that incubation with Hsp90 inhibitors results in the rapid degradation of many substrate proteins [59, 60]. The potent inhibition of Hsp90 function is the result of a strong affinity for the N-terminal ATP-binding domain (K D = 19 nM for radicicol, K D = 1.2 µM for geldanamycin), which makes them competitive inhibitors of ATP. ATP binds at least 300-fold weaker to Hsp90 [61, 62]. Thus, the binding of these substances to Hsp90 inhibits the ATPase-dependent functions of Hsp90. The strong effect of these substances on the tumorigenic growth of cell lines is probably the result of disruption of Hsp90-substrate complexes that are important for cell division. As many of these substrates are oncogenic or associated with pathways leading to cell growth and cell division, the antitumor potential of these substances has beem employed in clinical studies [63]. Especially one derivative of geldanamycin, 17-allylamino, 17-demethoxygeldanamycin (17AAG), shows promising results in the treatment of melanoma patients [64–66]. In particular, the ability of this inhibitor to simultaneously block multiple pathways required for tumor growth by disrupting the Hsp90 chaperone system holds great promise for
2 The Hsp90 Family in vivo 7
Fig. 2 Chemical structure of geldanamycin and radicicol. R2: Geldanamycin; R1: 17-allylamino, 17-demethoxygeldanamycin.
cancer therapy [49]. Further, these substances appear to accumulate in cancer cells to concentrations far higher than in normal cells [67]. Considerable interest has arisen in identifying further artificial Hsp90 inhibitors, leading to structure-based strategies in the development of new compounds [68]. First progress in the form of potential lead structures based on purine-scaffolds has already been reported [69, 70]. 2.5 Identification of Natural Hsp90 Substrates
The phenotypes observed after Hsp90 deletion in eukaryotes are likely the result of diminished activation of the Hsp90 substrate proteins. Many substrates were found
8 The Hsp90 Family of Molecular Chaperones
in specific, stable complexes with Hsp90 by co-purification or co-precipitation. This implies that, in sharp contrast to Hsp70, the interaction with this chaperone is long-lived. These substrates are derived from several different and seemingly unrelated protein families [1, 71]. Among them are protein kinases of the tyrosine and threonine/serine kinase family, such as Src, Raf, Mek, Cdk4. As these proteins are at the heart of signaling decisions in eukaryotes, Hsp90 is involved in many important cellular processes. Some other substrates can be grouped together into the large class of transcription factors, although no sequence homology exists between these proteins. Hsp90 targets are, e.g., p53 [72], steroid hormone receptors (SHR), and heat shock factor [73]. Several other unrelated proteins have also been found in complex with Hsp90. The known Hsp90 substrates are listed in Table 1. Strikingly, regarding the strong response of Hsp90 expression to heat shock, none of the substrates identified to date gives any hint towards the chaperoning activity of Hsp90 at elevated temperatures. All of the known substrates require Hsp90 function even at permissive temperatures. Therefore, it may well be that the substrate range changes significantly at elevated temperatures.
3 In Vitro Investigation of the Chaperone Hsp90 3.1 Hsp90: A Special Kind of ATPase
The crystal structure of the N-terminal domain of Hsp90 was solved in 1997 [61, 74] (Figure 3A). These and subsequent studies revealed a new type of binding site for ATP and for the inhibitor geldanamycin [61, 75]. The only related fold known at that time was that of GyraseB [76]. As is evident from the structures with ADP and ATP, the nucleotide is bound in a cleft formed by α-helices on top of an eight-stranded β-sheet [75]. Four αhelices participate directly in the binding of the nucleotide, while three others surround them. Binding induces a special kinked conformation of the nucleotide. The adenine and ribose moieties are hidden inside the nucleotide-binding cleft, while the phosphate groups are oriented towards the surface of this domain. The γ -phosphate is highly unordered in the structure of the ATP-protein complex, leading to the assumption that it might be completely solvent-exposed [75]. This orientation of the nucleotide within the N-terminal domain is shared by other members of this ATPase family. The family of related proteins has grown and now includes, among others, the proteins GyraseB, MutL, and Pts and, with some restrictions, the histidine kinases EnvZ and CheA. Collectively they are called GHKL-ATPases [77]. These proteins share very similar 3D structures of their ATPbinding sites, although sequence homology is rather low. With the exception of the histidine kinases, the ATP-binding domains are located in the N-terminal part of the respective protein. They show a dimeric architecture, for which a Cterminal dimerization site is responsible. The surface exposure of the γ -phosphate
3 In Vitro Investigation of the Chaperone Hsp90 9 Table 1 Hsp90 substrate proteins
Protein
References
Transcription
Androgen receptor Aryl hydrocarbon (Ah) receptor CAR, transducer protein Ecdysone receptor Estrogen receptor Glucocorticoid receptor Heme activator protein (Hap1) HSF-1 Hypoxia-inducible factor-1a Mineralocorticoid receptor MTG8 myeloid leukemia protein p53 PPARα (PPARβ); peroxisome proliferator-activated receptor Progesterone receptor Retinoid receptor Sim Stat3; Signal transducer and activator of transcription SV40 large T antigen Tumor promotor-specific binding protein v-erbA water mold Achlya steroid (antheridiol) receptor
[212, 213] [214, 215] [216] [217] [212, 218] [219] [220] [221] [222] [223] [224] [72] [225, 226] [227, 228] [229] [230] [231] [232] [233] [234] [235]
Polymerases
Telomerase Hepatitis B virus reverse transcriptase DNA-polymerase α
[236] [237] [191]
Kinases
3-Phosphoinositide-dependent kinase-1; PDK1 Akt Aurora B Bcr-Abl Calmodulin-regulated eEF-2 kinase Casein kinase II Cdc2 Cdk4 Cdk6 Cdk9 Chk1 c-Mos Epidermal growth factor receptor ErbB2 Flt3 Focal adhesion kinase GRK2 Hck Heme-regulated eIF-2a kinase IκB kinases α,β,γ ,ε Insulin receptor Insulin-like growth factor receptor Ire1
[238] [239] [240] [59] [241] [103] [242] [17, 162] [243] [244] [245] [246] [247] [248] [249] [250] [251] [252] [253, 254] [255] [256] [257] [258] (continued)
10 The Hsp90 Family of Molecular Chaperones Table 1 (continued)
Protein
References
Kinase suppressor of ras (KSR) Lkb1 Lymphoid cell kinase p56lck MAK-related kinase Male germ cell-associated kinase MAK MEK (MAP kinase kinase) MEKK1 and MEKK3 Mik1 Mitogen-activated protein kinase MOK MRK Nucleophosmin-Anaplastic Lymphoma Kinase Perk Phosphatidylinositol 4-kinase Pim-1 PKR Platelet-derived growth factor receptor Polo mitotic kinase Raf family kinases: v-Raf, c-Raf, B-Raf, Gag-Mil, Ste11
v-yes VEGFR2 Wee1
[165] [259, 260] [261] [262] [262] [247] [263] [264] [262] [240] [265] [258] [266] [267] [18] [268] [269] [7, 270– 273] [255] [274] [264] [275] [276] [277] [277] [261, 278] [163, 279, 280] [277] [250] [19]
Actin Apaf-1 Apoprotein B Atrial natriuretic peptide receptor Bid calmodulin Calponin Centrin/centrosome Cna2 (catalytic subunit of calcineurin) CFTR Ctf13/Skp1 component of CBF3 Endothelial NOS Ether-a-gogo-related cardiac potassium channel Fanconi anemia group C protein (FACC protein) G protein βγ Gα0 Gα12
[281] [282] [283] [284] [285] [286] [287] [288] [289] [290] [291] [292] [293] [294] [295] [296] [297]
Receptor-interacting protein (RIP) Sevenless PTK Swe1 Translation initiation factor kinase Gcn2 Tropomyosin related kinase B (trkB) v-fes v-fps v-fgr, c-fgr v-Src, c-Src
Others
3 In Vitro Investigation of the Chaperone Hsp90 11 Table 1 (continued)
Protein
References
GERp95 (Golgi Endoplasmic Reticulum protein 95 kDa) Glutathione S-transferase subunit 3 (KS type) Guanylate cyclase (β-subunit) HETE binding complex Histones H1, H2A, H2B, H3 and H4 Inducible NOS Lysosomal membrane Macrophage scavenger receptor Aminoacyl t-RNA synthetase Mdm2 Myosin NB-LRR proteins RPM1 and RPS2 Neuronal NOS Neuropeptide Y P2X7 purinergic receptor PB2 subunit of influenza RNA pol. Pancreatic bile salt-dependent lipase Erythrocyte membrane protein (Plasmodium falciparum) Protease-activated receptor 1 (PAR-1) Proteasome Rab-αGDI Ral-binding protein 1 Reovirus protein s1 Survivin β-Galactosidase M15 truncation mutant Thyroglobulin Tubulin Unassembled immunoglobulin chains Vaccinia core protein 4a
[298] [299] [300] [301] [302] [303] [304] [305] [306] [307] [308] [182, 309] [310] [311] [312] [313] [314] [315] [316] [317] [318] [319] [320] [321] [322] [323] [324] [325] [326]
1 See also Picard [1] and (http://www.picard.ch/DP/downloads/Hsp90interactors.pdf), Wegele et al. [210], and Pratt and Toft [71].
[75] allows complex formation with other parts of the protein. In several cases, the middle domain was found to serve this function, as is evident in the crystal structures of GyraseB (Figure 4) and MutL [78, 79]. The recent solution of the crystal structure of the middle domain of Hsp90 (Figure 3B) has given further evidence for the homology of Hsp90 with these proteins [80]. According to this structure, the ∼150 amino acids immediately following the charged linker share marked similarity to the middle domain of GyraseB and MutL and thus indicate that it may be considered a part of the ATPase site [80]. The following 150 amino acids, in contrast, are unique in their structural organization. This might reflect the different substrate specificities of GyraseB, MutL, and Hsp90. In the structures of GyraseB and MutL, in complex with the non-hydrolyzable ATP analogue AMP-PNP, the N-terminal domains of both proteins form dimers. The way this dimerization is accomplished is noticeable. In both cases, GyraseB
12 The Hsp90 Family of Molecular Chaperones
Fig. 3 Crystal structure of Hsp90 domains. (A) The crystal structure of the N-terminal domain of yeast Hsp90 as solved by Prodromou et al. [74]. The image is based on the PDB entry 1AM1. (B) The crystal structure
of the middle domain of yeast Hsp90 was solved by Meyer et al. [188]. To construct the image, the coordinates of the PDB entry 1HK7 were used.
and MutL, the first strand of the β-sheet and the first α-helix of the protein are exchanged between the two N-terminal dimerized domains [78, 79]. To facilitate this strand-swapping mechanism, apparently a loop consisting of two α-helices has to be moved within the N-terminal domain and claps over the ATP-binding site.
3 In Vitro Investigation of the Chaperone Hsp90 13
Fig. 4 Crystal structure of a dimeric GyraseB fragment. The crystal structure of GyraseB in the AMP-PNP complexed Nterminal dimerized conformation was solved by Brino et al. [78]. The coloring of the structure highlights the mechanistic features
of the N-terminal dimerization reaction. The swapped strand is colored purple, the Nterminal ATP-binding site is yellow, and the middle domain, including parts of the dimerization site, is red. The PDB accession number for this structure is 1EI1.
This “ATP lid” then forms part of the receptor region for the N-terminal strand of the other domain. It directly interacts with the intruding strand of the other N-terminal domain and is therefore considered to be involved in the hydrolysis mechanism. In addition, complex formation between the γ -phosphate of AMPPNP and the middle domain is evident in both crystal structures. In both cases, a specific lysine residue appears to serve the function of the γ -phosphate receptor. Concluding from these structures, the corresponding residue is expected to be in the region of amino acids 320-400 of Hsp90. Based on the crystal structure of the middle domain of Hsp90, residue Glu381 was suggested as the acceptor site of the γ -phosphate [80]. However, sequence alignments suggest other possible residues in that region. Further studies on the interaction between the N-terminal and the middle domain are needed for a conclusive understanding of participating segments and the order of conformational changes within this molecular machine. 3.2 The ATPase Cycle of Hsp90
Because the crystal structures present a rather static picture of one particular step during the ATPase reaction, much effort has been put into the dissection of the steps that lead to ATP hydrolysis by Hsp90. Yeast Hsp90 hydrolyzes ATP with a turnover of about one per minute [81, 82]. Early studies attempted to define the minimum requirements of Hsp90 to perform ATP hydrolysis. These studies, involving
14 The Hsp90 Family of Molecular Chaperones
different fragments starting from the N-terminal ATP-binding site, showed the critical requirement of regions outside the ATP-binding pocket for ATP hydrolysis [83, 84]. Only 2% of the wild-type ATPase activity was obtained for the isolated Nterminal domain. A slight increase in ATPase activity (about 12% of wt activity) was observed if the middle domain was present in addition to the N-terminal domain [83, 84]. Higher concentrations of this fragment or its covalent dimerization by a C-terminal cysteine bridge lead to an increase in the ATPase activity [85, 86]. The latter approach led to wild-type ATPase activity, proving the necessity of dimer formation for hydrolysis [86]. Experiments involving heterodimeric variants of Hsp90 helped to identify the regions within Hsp90 that mediate the contacts in the dimer that are important for hydrolysis. These studies proved the transient N-terminal dimerization within the Hsp90 dimer during the ATPase reaction [85] that had been proposed by earlier studies [83, 87]. Further deletion studies imply that the critical region for formation of N-terminal dimers during the ATPase reaction of Hsp90 involves the first 24 amino acids of Hsp90 [88], matching the orientation observed in the crystal structure of the homologous ATPases. Based on these studies, a cycle of coordinated conformational changes has been proposed for the ATP hydrolysis reaction of Hsp90 (Figure 5). This cycle starts with the binding of two ATP molecules to the N-terminal domains of Hsp90. Kinetic studies showed that ATP binding is a fast process, especially compared to the turnover of Hsp90 [84]. A series of conformational changes results in the
Fig. 5 ATPase cycle of Hsp90. The ATPase cycle of Hsp90 combines biochemical and structural evidence. The numbers are based on the work of Weikl et al. [84] and represent kinetic studies of yeast Hsp90 per-
formed at 25 ◦ C. ATP hydrolysis is supposed to happen within the N-terminal dimerized state as indicated by the asterisk. There is no evidence so far for the order of the conformational changes.
3 In Vitro Investigation of the Chaperone Hsp90 15
trapping of the ATP molecule as observed by Weikl et al. [84] for yeast Hsp90. Early investigations already had pointed to ATP-induced conformational changes within Hsp90 [89]. Several studies suggest that N-terminal dimerization and subsequent activation of the ATPase activity are critical steps during the ATPase reaction [83, 85, 87]. The activation is thought to be – in analogy to other GHKL-ATPases and based on biochemical data – the result of a strand-swapping reaction involving the very N-terminal amino acids [88]. Evidence is accumulating that the rate-limiting step of the hydrolysis reaction is the conformational change leading to the N-terminally dimerized state [88]. While it has not yet been possible to exactly map the order of conformational changes, mutagenesis studies allow defining of regions that help to facilitate this reaction [88]. A rational model for the N-terminal dimerization reaction can be envisioned. First, the swapping strand has to be released from its own domain, by breaking interactions of the α-helix and the accompanying ATP lid. Based on dynamics studies using NMR spectroscopy, it is assumed that ATP binding might facilitate the N-terminal dimerization reaction. The closure of the ATP lid after ATP binding might set this strand free [90]. One mutant – called 8-Hsp90 and which lacks the first eight amino acids – was found to form the N-terminal dimers much more efficiently and is thus especially useful for the investigation of the ATPase cycle. The rate-limiting step is shifted here from formation of the N-terminal dimer to the opening of the N-terminal dimer after hydrolysis [88]. These steps might facilitate the interaction with the middle domain to form the fully functional ATPase-active state. Unfortunately, it is not yet possible to determine whether the interaction with the middle domain precedes or follows the N-terminal dimerization reaction. This cycle of conformational changes is based mostly on studies with yeast Hsp90, but, judging from the high degree of homology, it is assumed to be identical for the other Hsp90-like proteins. The ATPase activities determined for HtpG from E. coli (0.4 min−1 ), yeast Hsp90 (0.5 min−1 ), chicken TRAP1 (0.4 min−1 ), and chicken Grp94 (0.4 min−1 ) [11, 81, 91, 92] are quite similar, while human Hsp90 (0.04 min−1 ) is significantly decelerated [91]. Interestingly, a second nucleotide-binding site has been suggested recently in the C-terminal domain of Hsp90 [93, 94]. The involvement of this site in the ATPase reaction and the specificity of this site have not yet been established. At present it is also possible that this site is the second part of the ATPase-active site that is thought to form contacts with the γ -phosphate. 3.3 Interaction of Hsp90 with Model Substrate Proteins
Compared to the ATPase reaction, the main functions of Hsp90 – the binding and processing of target proteins – are not understood sufficiently. Several studies attempted to identify the substrate-binding site using nonnative proteins such as thermally destabilized citrate synthase [82, 95–98], luciferase [97, 99], rhodanese [100], chemically destabilized insulin [82, 101], and thermally inactivated casein kinase [102–104]. These studies led to the identification of chaperone-active sites
16 The Hsp90 Family of Molecular Chaperones
in all major domains of Hsp90 and thus show the difficulty in understanding Hsp90 interaction with substrate proteins. Many of these assays are based on the aggregation of proteins and thus represent rather complicated systems. This excludes complete understanding of the molecular interactions that influence these processes and thus does not allow obtaining of quantitative data. Based on these studies, the chaperone activity of Hsp90 originally was proposed to reside in the N-terminal domain and in a construct comprising the middle and the C-terminal domain [82, 100]. An N-terminal peptide-binding site has been described based on interactions with reduced insulin B-chain and several peptides including the viral VSV8 peptide [82]. The C-terminal construct instead was found to be more promiscuous with respect to interaction partners. Additionally, the interaction of Hsp90 with substrates in the C-terminal region was found to be independent of ATP binding or hydrolysis. One study addressed the specific state of the substrate during the interaction process. Here, using citrate synthase as a model target protein, it was observed that the Hsp90-interacting species is native-like and thus does not represent a fully unfolded or aggregated protein [95]. This is in agreement with in vivo studies, which suggest that Hsp90 is not involved in de novo protein folding [105]. Additionally, using the Hsp90 homologue from the ER, Grp94, attempts to identify substrate-binding sites were made [106–108]. This protein is thought to bind peptides tightly, which allows their co-purification [106, 109]. The interaction of Grp94 with peptides is still not understood completely, although recent data highlight the possibility of an N-terminal peptide-binding site that is controlled by nucleotide binding and radicicol [106, 109]. 3.4 Investigating Hsp90 Substrate Interactions Using Native Substrates
Although more than 100 authentic Hsp90 substrates have been identified to date, the large-scale purification of these proteins presents great challenges. Therefore, unlike for GroE, it has not yet been possible to reconstitute a nucleotide-dependent processing of substrate proteins in the case of Hsp90. So far, in particular the protein kinases Lck and eIF2α kinase [110], the transcription factor MyoD [111, 112], the reverse transcriptase of the hepatitis B virus [113, 114], and the ligand-binding domain (LBD) of the steroid hormone receptor [91, 115, 116] have been used in interaction studies. The first three proteins were investigated in combination with truncated Hsp90 mutants. The resulting complex interaction patterns did not allow an unambiguous identification of a binding site. Studies by McLaughlin et al. regarding the effect of the LBD on the ATPase of Hsp90 showed a coupling of the ATPase cycle to substrate binding. Binding to Hsp90 appeared only weak; nevertheless, the LBD was found to stimulate the ATPase activity of human Hsp90 by a factor of 200 [91]. The mechanistic aspects of this stimulation remain unknown; however, studies like these will ultimately lead to a deeper understanding of the reactions involved in the processing of protein substrates. It can be anticipated that the conformational changes that were observed as necessary for the hydrolysis of ATP might be translated into the change of substrate
4 Partner Proteins: Does Complexity Lead to Specificity?
conformation. This would require the movement of two substrate-binding sites against each other in an ATP-dependent reaction. The unambiguous identification of these sites remains to be achieved to further understand the chaperone function of Hsp90.
4 Partner Proteins: Does Complexity Lead to Specificity?
Several partner proteins of Hsp90 were consistently identified in Hsp90-substrate complexes of eukaryotes, and most of them were later found to bind directly to Hsp90. The investigation of these complexes, in the absence of substrates, led to important information about the assembly of the Hsp90 machinery. All partner proteins described to date are found only in eukaryotic cells and thus might present the evolutionary adaptation of the Hsp90 system to an increasing number of substrates. For some of the partner proteins, pronounced substrate specificity has been observed. 4.1 Hop, p23, and PPIases: The Chaperone Cycle of Hsp90
The first partner proteins were detected in complexes with SHRs. These macromolecular assemblies were described in 1966 as 9S receptor complexes [117], implying that several proteins in addition to the SHR were assembled into these structures. Subsequently, Hsp90 and some of its partner proteins were identified as the key components [118]. Attempts to reconstitute these complexes led to the observation that the capacity of SHRs to bind hormones is strictly connected to the assembly into Hsp90-containing protein structures. It additionally became evident that the assembly process appears to be a chronological progression through several distinct complexes [119, 120]. Dependent on the involvement of different partner proteins, these complexes were termed “early complex,” “intermediate complex,” and “mature complex.” The partner proteins were identified as Hop [121], p23 [122], Cyp40 [123], FKBP51, and FKBP52 [124, 125]. Yeast proteins with considerable homology were identified for Hop (Sti1), Cyp40 (Cpr6, Cpr7), and p23 (Sba1) [126]. Detailed studies by Smith and coworkers led to a chaperone cycle (Figure 6) that describes the chronological interaction of these partner proteins with Hsp90-SHR complexes [120]. Proteins of the “early complex” were identified to be Hsp70 [127] and Hsp40 [128, 129]. In the “intermediate complex,” the proteins Hsp70, Hop, and Hsp90 were found, while proteins of the “late complex” are Hsp90, p23, and FKBP51, FKBP52, or Cyp40 [130]. Similar heterocomplexes can be found in yeast and mammals [126]. The progression time through the cycle has been estimated to be about 5 min. Recent biochemical experiments have shown that the ATP cycle is the potential driving force behind this sequence of interactions. Based on studies using the yeast system, a thermodynamically valid cycle for the exchange of partner proteins was
17
Fig. 6 The chaperone cycle of Hsp70/Hsp90. This cycle was established by Smith et al. [120] for the substrate protein glucocorticoid receptor. The substrate is transferred from Hsp70 to Hsp90 and further processed there to reach its activated state. This cycle may look different for other substrates as, especially in the case of kinases, different partner proteins can be found in Hsp90-containing chaperone complexes (see also Wegele et al. [210]).
18 The Hsp90 Family of Molecular Chaperones
4 Partner Proteins: Does Complexity Lead to Specificity?
proposed [131]. The first biochemical evidence for the chaperone cycle in yeast came from investigations using the yeast proteins homologous to Hop and Cyp40 [132]. It was shown that the protein Sti1 is a high-affinity inhibitor of Hsp90 AT-Pase and that Cpr6 could replace Sti1 from Hsp90. The binding site for both proteins was identified to be in the C-terminal region [132]. Further information about the inhibitory mechanism of Sti1 was obtained by the identification of an additional weak binding site in the N-terminal domain of Hsp90 [9, 133]. The inhibitory mechanism was found to be noncompetitive concerning ATP. Interestingly, Sti1 inhibits the N-terminal dimerization reaction required for efficient ATP hydrolysis [133]. However, ATP can still bind to this inhibited conformation. This static complex can be resolved efficiently by the combined action of ATP, Sba1, and Cpr6 [130, 131]. Sba1 was found to bind only to an N-terminal dimerized conformation of Hsp90 that can be accumulated by using the non-hydrolyzable ATP analogue AMP-PNP [83, 134, 135] or by Hsp90 mutants in which the rate-limiting step is shifted from N-terminal dimerization to ATP hydrolysis [131]. The binding constant of Sba1 to this conformation is in the nanomolar range. Binding results in a decrease of the turnover by about 60% [131, 136]. Thus, it is possible to describe the chaperone cycle as an exchange of partner proteins that is driven by the conformational changes of the Hsp90 molecular chaperone. As these rearrangements are the result of the hydrolysis reaction, the exchange of partner proteins itself is driven by ATP turnover. It can be envisioned that substrate turnover, at least in the presence of these partner proteins, is achieved by conformational changes that are driven by ATP hydrolysis (Figure 7). Of particular interest is how the progression of the substrate through the individual complexes affects the conformation of the substrate. For SHRs, the substrate enters the chaperone cycle probably by interaction with Hsp40 proteins [137]. These proteins recognize hydrophobic surfaces on nonnative proteins with relatively low specificity. The interaction of Hsp40 proteins with Hsp70 allows a transfer of the substrate to Hsp70. Hsp70, itself an ATPase, interacts with Hop. For the yeast system, it was shown that Sti1 activates the ATPase activity of yeast Hsp70 tremendously and thus might facilitate the processing of the substrate in this early complex [138]. Substrate transfer to Hsp90 occurs in the intermediate complex and might be accomplished by ATP hydrolysis of the Hsp70 component [139]. Whether the substrate is already active in this complex is not clear. The exchange of cofactors leading to the mature complex clearly results in an active SHR [71]. Hormone binding to this complex is allowed and results in the release of SHR from the chaperoning machinery [71]. Progression through the chaperone cycle may occur even in the absence of hormones, and it might be speculated that ATP is turned over in the p23-containing mature complex. This is followed by release of p23 and the release of SHR. If the SHR is hormone-free, it will again be a client for the reloading machinery of Hsp40/Hsp70. Thus, the low p23-inhibited ATPase activity of Hsp90 in the mature complex would function as a timer, controlling the active state of the SHR. In addition to the partner proteins mentioned in the context of the SHRs, many more are known in the yeast and mammalian systems. The individual functions
19
20 The Hsp90 Family of Molecular Chaperones
Fig. 7 ATPase and co-chaperone binding. Based on the conformational changes during the ATPase cycle, it is possible to integrate the co-chaperones into this cycle. Some of the co-chaperones, such as p23 and Sti1, clearly have different affinities for
specific conformations of Hsp90. Therefore, the exchange of co-chaperones as observed in the chaperone cycle may be the result of ATP-induced conformational changes of Hsp90.
of these proteins in the context of the Hsp90 chaperone activities are, with some exceptions, unknown. A striking feature is the presence of TPR domains in a number of partner proteins (Figure 8). 4.2 Hop/Sti1: Interactions Mediated by TPR Domains
The protein Hop has been identified as outlined above by the specific interaction with Hsp90-containing SHR complexes. Further experiments resulted in the observation that this protein greatly enhances the ability to assemble the SHR hetero-complexes from purified proteins [140]. This probably is due to its ability to work as an adaptor protein between Hsp90 and Hsp70. Three TPR domains have been identified in the protein Hop. Each of these domains is composed of three tandem TPR motifs. A TPR motif is formed structurally by two antiparallel
4 Partner Proteins: Does Complexity Lead to Specificity?
Fig. 8 Schematic representation of the TPR-containing Hsp90 partner proteins in yeast and the localization of the TPR domains within the protein.
α-helices connected by a turn (Figure 9). Therefore, a full TPR motif is built from six (in some cases seven) α-helices that form a binding site for a peptide [8, 141]. In the case of Hop, the specific interactors of the TPR motifs are known only partially. TPRI (the most N-terminal) binds Hsp70, and TPRIIa (the middle domain) binds Hsp90; the function of TPRIIb is not yet fully understood [8, 142, 143]. The same arrangement can be found in the yeast homologue of Hop, Sti1. In both cases, the main interacting peptide is the very C-terminal extension of the chaperones, which is formed by the amino acid sequence MEEVD. As described before, Hop/Sti1 function not only includes binding to Hsp90 and Hsp70. Sti1 is known to inhibit the ATPase activity of yeast Hsp90, and this inhibitory interaction has been found to be the result of two binding sites [132, 133]. Therefore, a more complex interaction pattern is found in this case. The same is true for the interaction of Sti1 with the yeast Hsp70 protein Ssa1. Sti1 strongly activates the ATPase activity of this protein by accelerating the hydrolysis reaction, indicating a more complex interaction as well [138]. None of these effects have been observed for the human Hsp90 system, which indicates that the regulation of the Hsp90 system has been subject to evolutionary changes.
21
22 The Hsp90 Family of Molecular Chaperones
Fig. 9 TPR binding to the C-terminal peptide of Hsp90. The TPR domain found in many of the Hsp90 co-chaperones binds with considerable affinity to the C-terminal peptide of Hsp90. The crystal structure of
the middle TPR domain of Hop in complex with the pentapeptide MEEVD was solved by Scheufler et al. [8]. Its PDB accession number is 1ELR.
4.3 p23/Sba1: Nucleotide-specific Interaction with Hsp90
The protein p23 was also identified in the context of the chaperone cycle of Hsp90 [122]. Its presence leads to a much higher yield of active SHRs [144]. The participation of p23 in the heterocomplexes remarkably stabilizes these assemblies and thus keeps SHRs in the active conformation much longer [134, 144]. This is achieved by specifically binding to a nucleotide-induced conformation of Hsp90 [130, 135]. This conformation has been shown to be N-terminal dimerized [87]. Truncation of the dimerization site resulted in a complete loss of p23-binding activity [9]. p23 is a small, acidic protein. It is composed of an antiparallel β-sandwich built from 10 strands (Figure 10). This domain has been shown to be responsible for the interaction with Hsp90 [145, 146]. The C-terminal of this domain is a flexible extension of about 70 amino acids. This is the chaperone-active site of p23,
Fig. 10 Structure of p23. The crystal structure of p23 was solved by Weaver et al. [146]. Its PDB accession number is 1EJF.
4 Partner Proteins: Does Complexity Lead to Specificity?
but it does not influence Hsp90 binding [145, 146]. Interestingly, the N-terminal structured part of p23 shares strong similarity with the structure of the small heat shock protein Hsp16 from Methanococcus jannaschii. The significance of this observation is unknown, but, based on the chaperone activity of p23, it may well be that they share a common origin. 4.4 Large PPIases: Conferring Specificity to Substrate Localization?
Analysis of SHR-containing heterocomplexes resulted in the identification of the proteins FKBP51, FKBP52, and Cyp40. These proteins have a domain structure that is composed of a PPIase domain of either the FKBP type (FKBP51, FKBP52) or the cyclophilin type (Cyp40). This peptidyl-prolyl cis/trans isomerase activity [147, 148] is known to accelerate the cis/trans isomerization of peptide bonds prior to proline residues. Due to the specific conformation of proline, these bonds have a remarkably slow rotation rate and thus need assistance to flip from the cis to the trans position and vice versa. The PPIase activity of the large PPIases is quite low compared to related enzymes. In addition to the PPIase domain, they contain a TPR domain that mediates the interaction with Hsp90 (Figure 11). All these PPIases were found to bind to the C-terminal end of Hsp90 and the accompanying sequences [9, 133]. Hsp90 binding by the TPR-containing large PPIases is governed not only by the TPR domain but also by other C-terminal sequences outside this binding site [149, 150]. In addition, the appearance of the specific PPIase in substrate complexes seems to be determined by the substrate. Especially the duplication of these proteins in higher eukaryotes and the appearance of two
Fig. 11 Structure of FKBP51. The crystal structure of FKBP51 was solved by Taylor et al. [211]. Its PDB accession number is 1IHG. Highlighted in different colors is the domain organization of this protein. The N-terminal PPIase domain (yellow) is followed by a TPRcontaining domain (purple) that serves as interaction site with Hsp90.
23
24 The Hsp90 Family of Molecular Chaperones
classes of PPIases (the FKBPs and the cyclophilins) as TPR-containing Hsp90 cochaperones raise important questions about the specific function of these proteins for specific substrates. Interestingly, the large PPIases come in different numbers in different organisms – two cyclophilins in yeast, one FKBP in C. elegans, and two FKBPs plus one cyclophilin in mammals. This may represent a further indication for the evolutionary adaptation of the Hsp90 system towards specific chaperone requirements. Analysis of the SHR-containing complexes showed the presence of all three PPIases in these complexes, although probably not simultaneously [151]. Differences regarding specific functions of the individual large PPIases have been reported. In yeast, Cpr6 is nonessential, whereas deletion of Cpr7 results in a slowgrowth phenotype. The reasons for this are unknown [152]. In mammals, special focus has been set on differences between FKBP51 and FKBP52. Both proteins show chaperoning activity [153, 154] and both bind to the C-terminus of Hsp90 with comparable affinity [148]. One specific difference might be the appearance of a nuclear localization signal in the primary sequence of FBKP52. Indeed, FKBP52 appears to be localized to the nucleus, and specific interaction with the transcription factor and Hsp90 substrate HSF1 has been reported [155]. Another interesting feature is the complex formation between FKBP52 and dynein [156, 157]. This interaction may be required for the transport of Hsp90 substrates to the nucleus [158]. This has been demonstrated recently for the transport of transcription factor and the Hsp90 substrate p53 from the cytosol to the nucleus, which occurs in a complex containing Hsp90 and FKBP52 [159]. For SHRs, which also require a shuttling system from the cytosol to the nucleus, a similar dependency on FKBP52 has been observed [160]. Therefore, evidence is accumulating that the differential interaction of Hsp90 substrate complexes with PPIases also determines the cellular localization of the active substrate. Thus, they add specificity to the Hsp90 system. 4.5 Pp5: Facilitating Dephosphorylation
Pp5/PPT is a phosphatase that associates with Hsp90 based on TPR regions and their interaction with the C-terminus of Hsp90 [132]. The specific function of Pp5 in the context of Hsp90 substrate complexes is unknown, although several reports indicate that Pp5 as well might be involved in the activation process of SHRs [161]. Binding, at least in vitro, appears to occur in competition with FKBPs and other TPR-containing proteins [132]. The structure of the TPR-domain of Pp5 was solved before any other TPR structure was known [141]. The involvement of phosphorylation on the functionality of the Hsp90 system and on the activity of many Hsp90 substrate proteins has not been investigated in detail. Thus, it is unknown what specific function the phosphatase might serve. Interestingly, many of the known substrates of Hsp90, such as the heat shock factor and several kinases, are clearly regulated by phosphorylation and dephosphorylation, and thus there are ample possibilities to include phosphatases into the regulatory functions of Hsp90-containing protein complexes.
4 Partner Proteins: Does Complexity Lead to Specificity?
4.6 Cdc37: Building Complexes with Kinases
Of particular interest in the context of protein kinase substrates is the co-chaperone Cdc37. Cdc37 was identified as a part of Hsp90 complexes with Src kinase in eukaryotes [162, 163]. In these complexes, Cdc37 is thought to bind to Hsp90 with its C-terminal part and to the kinase with its N-terminal part [164]. This interaction seems to confer specificity to the otherwise promiscuous Hsp90-substrate interaction. Cdc37-containing complexes were sometimes found to be multi-protein complexes in the megadalton range, involving several other proteins [165]. In addition, the possibility exists that Cdc37 is also able to work independently in the activation process of kinase targets [166]. Besides the many kinases that are known to associate with Hsp90 and Cdc37, it is still not clear exactly what the function of these two proteins in the context of the activation of kinases is. Src kinase appears to be shuttled by the system from the nascent state to the cell membrane, where the kinase becomes activated upon myristoylation and insertion of the N-terminus into the plasma membrane [163, 167, 168]. Disruption of the Hsp90 system leaves these substrates inactive and prone to degradation. The same has been observed for most other tyrosine kinases of the Src family, many serine/threonine kinases, and several kinases involved in cell cycle control such as Cdk4 [169]. Experiments using heterologously expressed Src kinase in yeast showed that yeast Cdc37 can substitute for human Cdc37 in the activation of the kinase. This is remarkable, as significant homology between yeast and human Cdc37 is restricted to 25 amino acids in the N-terminal part of the protein [170]. The interaction of Cdc37 with Hsp90 results in low-affinity inhibition of the ATPase activity [171]. Recently, the crystal structure of human Cdc37 with the N-terminal domain of yeast Hsp90 was solved [172]. The observed interaction between the N-terminal domain of Hsp90 and the co-crystallized Cdc37 fragment is remarkable (Figure 12). Cdc37 binds as a dimeric protein to the two N-terminal domains of Hsp90. This orientation prevents the N-terminal dimeriza-tion reaction required for ATP hydrolysis by Hsp90 and explains the inhibitory effect of Cdc37 on the ATPase activity of Hsp90 [172]. Its direct interaction site is the ATP lid, which has been implicated in the ATP hydrolysis reaction. Thus, as with p23 and Hop/Sti1, a mechanism has been found to couple the binding of a co-chaperone to a distinct step in the ATPase cycle of Hsp90. According to genetic experiments, Cdc37 shares overlapping functions with Sti1 in vivo and thus appears to function in the loading of substrates onto Hsp90 complexes [173]. This simplistic view of Cdc37 as a kinase-specific cofactor of Hsp90 is challenged by recent discoveries [174]. In addition to functions in the activation of Hsp90 substrates other than kinases, Cdc37 seems to have Hsp90-independent functions. Furthermore, it has been found to form complexes with the Hsp90 co-chaperone Sti1, even in the absence of Hsp90 [175]. In addition, Cdc37 is known to be a molecular chaperone on its own [176].
25
26 The Hsp90 Family of Molecular Chaperones
Fig. 12 Interaction between Cdc37 and Hsp90. The crystal structure of the complex between the N-terminal domain of Hsp90 and a fragment of the co-chaperone Cdc37 was reported by Roe et al. [172]. In this
structure the C-terminal part of Cdc37 (yellow) interacts with the N-terminal domain of Hsp90 (purple). The primary interaction site is the ATP lid (red). The PDB accession number of this structure is 1US7.
4.7 Tom70: Chaperoning Mitochondrial Import
Tom70 is an integral part of the mitochondrial import machinery. It serves as a receptor of preproteins in the outer mitochondrial membrane. The substrates to be imported into mitochondria are transferred from Tom70 to the mitochondrial import channel. Tom70 was found to associate with Hsp90 via its TPR domain [177]. The association allows delivering certain preproteins from the cytosol onto the Tom70 receptor as a first step of mitochondrial import. This interaction could be shown to be required for the transport of several targets into the mitochondrial matrix [177]. Further experiments showed that the ATPase activity of Hsp90 is involved. Interestingly, both Hsp70 and Hsp90 are able to interact with the single TPR domain of Tom70 in yeast and mammals [177]. The individual affinities of these chaperones for Tom70 vary in different organisms. In humans, Hsp90 appears to be the preferred cytosolic chaperone to deliver preproteins, whereas in yeast most preproteins are delivered by cytosolic Hsp70 [177]. Thus, this example indicates that besides the well-documented involvement of Hsp90 in the activation of signal transducers, some general folding functions remain to be discovered. The specific function and localization of Hsp90 in complex with its substrate protein may be governed to a large degree by its association with a specific co-chaperone [178].
4 Partner Proteins: Does Complexity Lead to Specificity?
4.8 CHIP and Sgt1: Multiple Connections to Protein Degradation
Hsp90 has long been speculated to be involved in protein degradation in addition to its role in protein folding [60]. In particular, the fast degradation observed for several substrates after Hsp90 inhibition hints towards this possibility [59, 60]. The recent identification of the co-chaperone CHIP in mammalian cells has intensified these considerations, as this protein apparently directs substrates towards degradation by the proteasome [179]. CHIP is composed of two domains: a TPR domain mediating the interaction with Hsp70 and Hsp90 and an ubiquitin-ligase domain [180]. The latter domain is involved in the attachment of ubiquitin chains to degradationprone proteins [180]. Therefore, it can be envisioned that via this partner protein, Hsp90 substrates are marked for degradation. How the decision between folding and degradation is made is the subject of intense research. Sgt1, another Hsp90 co-chaperone that was identified recently, is also involved in protein degradation. Sgt1 is part of the SCF (Skp1-Cul1-F-box) ubiquitin-ligase complex [181] and was found to bind to Hsp90 [182]. Disturbance of its interaction with Hsp90 results in defects in genomic defense mechanisms of plants [183, 184]. The interaction of Sgt1 with Hsp90 is not fully understood. Recent results indicate that the domain of Sgt1, which binds to Hsp90, is similar to p23. Binding seems to be slightly ATP-dependent [185, 186]. Interestingly, the TPR domain also present in Sgt1 appears not to be involved in Hsp90 binding. This new co-chaperone seems to localize primarily to the kinetochore, and therefore a nuclear function of Hsp90 has to be envisioned in the context of this chaperone [181].
4.9 Aha1 and Hch1: Just Stimulating the ATPase?
In the late 1990s, two temperature-sensitive Hsp90 variants were used in yeast to identify cellular suppressors of the ts-associated defects [187]. Overexpression of three proteins was found to decrease the ts effects. These proteins were identified as Cns1, Ssf1, and Hch1 [187]. Subsequent biochemical investigations led to the determination that Hch1 is a stimulator of the ATPase activity of Hsp90 [136]. Although binding was found to be weak (in the high micromolar range), the effect on the turnover number of Hsp90 is strong, leading to a 20-fold increase in ATP hydrolysis. Using bioinformatics approaches, a homologue of Hch1 (Aha1) was identified and shown to bind with greater affinity to Hsp90 [136]. It also stimulates the ATPase activity of Hsp90. Cellular roles for these proteins are unknown to date, although yeast experiments using heterologously expressed v-Src as an Hsp90 substrate indicate that the interaction between Hsp90 and Aha1 is important for the maturation of Hsp90 substrates [136]. While Hch1 is a fairly small protein composed of 140 amino acids, Aha1 consists of two domains. The N-terminal domain of Aha1 is homologous to Hch1, and the C-terminal domain is of unknown origin. The isolated N-terminal domain of Aha1 is sufficient to bind to and stimulate
27
28 The Hsp90 Family of Molecular Chaperones
Fig. 13 Interaction of Aha1 with Hsp90. The crystal structure of this complex was solved by Meyer et al. [188]. Aha1 (red) interacts with the middle domain of Hsp90. The interaction sites within the middle do-
main reside in the ATPase-interacting part (yellow) as well as in the part supposed to interact with substrates (purple). The PDB accession number of this structure is 1USU.
the ATPase activity of Hsp90. While Aha1 is present in all eukaryotes, Hch1 has so far been found only in yeast. The crystal structure of the complex between Hsp90 and Aha1 suggests a mechanism for the stimulation of the ATPase activity of Hsp90 by Aha1 [188] (Figure 13). Aha1 binds to the middle domain of Hsp90 [80, 189]. This leads to the reorientation of a loop involved in ATP hydrolysis in other GHKL-ATPases as well. The loop is thought to interact with the γ -phosphate of the bound nucleotide, and this interaction appears to be facilitated by Aha1 [188]. 4.10 Cns1, Sgt2, and Xap2: Is a TPR Enough to Become an Hsp90 Partner?
Several other proteins from S. cerevisiae contain TPR domains with homology to those of Cpr6, Hop, or Ppt1. In particular, for the TPR proteins Cns1, Sgt2, and mammalian Xap2, an interaction with either Hsp90 or Hsp70 or both has been described. Cns1, which was identified as a suppressor of temperature-sensitive Hsp90 phenotypes [187], has been detected as an Hsp90 co-chaperone in different contexts [190]. The interaction has been confirmed in Drosophila [191]. Recent biochemical investigations showed that Cns1, like the co-chaperone CHIP, is able to interact with both Hsp90 and Hsp70 with its single TPR domain [192]. In addition, it was found to strongly stimulate the ATPase activity of yeast Hsp70 [192]. In contrast to Sti1, the binding of Cns1 to Hsp90 does not influence its ATPase activity. Interestingly, Cns1 is one of the few essential co-chaperones in
6 Acknowledgements
yeast [190, 193]. In this context, a close genetic relationship between Cns1 and Cp7 was reported [193] and a direct interaction between the two proteins was observed [194]. Xap2 contains a TPR domain and interacts with Hsp90 during the chaperoning of the substrate reverse transcriptase from hepatitis B virus [195, 196] and the dioxin receptor, a cytosolic receptor with some functional similarity to the steroid receptors [197]. In both cases, the function of Xap2 is unknown and the participation of further proteins in this process is very likely. Using bioinformatics approaches, several other proteins with TPR domains have been identified. The characterization of their function in the Hsp90 chaperone cycle is just beginning. These include Sgt2, Cdc23, Cdc26, Tom34, Tpr2, and others [141, 198, 199]. Sequence homology studies so far have failed to predict a clear differentiation between Hsp90- and Hsp70-binding motifs within the TPR domains. Ultimately, an investigation of binding affinities between Hsp90/Hsp70 and these co-chaperones will have to be performed to fully establish the Hsp90 network within the cell.
5 Conclusion
The evolution of the Hsp90 system from an obviously single, nonessential protein to an essential network of specific substrates, co-chaperones, and cellular functions highlights the importance of this protein system for the eukaryotic cells. In recent years, great progress has been achieved regarding the identification of new participants in the system, the biochemistry of the ATPase reaction, the interaction with partner proteins, and the identification of new substrates. Nevertheless, despite over 2000 publications on Hsp90, a clear understanding of its cellular function is still lacking. Many of the studies performed with Hsp90 to date address the interaction of partner proteins in the absence of substrates. The functional aspects of the cochaperone-Hsp90 interactions are beginning to emerge for the best-described partner proteins. For the others, information is scarce. It thus is of utmost importance to integrate the partner proteins into the ATPase cycle and to determine which partner proteins associate to heterogenic complexes and which bind mutually exclusively to Hsp90. These investigations in combination with in vitro studies on substrates will ultimately result in the elucidation of critical functions of the Hsp90 chaperone machine.
6 Acknowledgements
The authors would like to acknowledge Didier Picard for critically reading the manuscript. We thank Stefan Walter for help with the equations and Benjamin B¨osl
29
30 The Hsp90 Family of Molecular Chaperones
for drawing of chemical structures. In addition, we would like to thank Lin Mueller and Harald Wegele for providing graphic material. This work was supported by grants of the DFG and the Fonds der chemischen Industrie.
References 1 PICARD, D. (2002). Heat-shock protein 90, a chaperone for folding and regulation. Cell Mol. Life Sci., 59, 1640–1648. 2 BORKOVICH, K. A., FARRELLY, F. W., FINKELSTEIN, D. B., TAULIEN, J., and LINDQUIST, S. (1989). hsp82 is an essential protein that is required in higher concentrations for growth of cells at higher temperatures. Mol. Cell Biol., 9, 3919–3930. 3 HICKEY, E., BRANDON, S. E., SMALE, G., LLOYD, D., and WEBER, L. A. (1989). Sequence and regulation of a gene encoding a human 89-kilodalton heat shock protein. Mol. Cell Biol., 9, 2615–2626. 4 KRISHNA, P. and GLOOR, G. (2001). The Hsp90 family of proteins in Arabidopsis thaliana. Cell Stress. Chaperones., 6, 238–246. 5 GUPTA, R. S. (1995). Phylogenetic analysis of the 90 kD heat shock family of protein sequences and an examination of the relationship among animals, plants, and fungi species. Mol. Biol. Evol., 12, 1063–1073. 6 EMELYANOV, V. V. (2002). Phylogenetic relationships of organellar Hsp90 homologs reveal fundamental differences to organellar Hsp70 and Hsp60 evolution. Gene, 299, 125–133. 7 LOUVION, J. F., WARTH, R., and PICARD, D. (1996). Two eukaryote-specific regions of Hsp82 are dispensable for its viability and signal transduction functions in yeast. Proc. Natl. Acad. Sci. U.S.A, 93, 13937–13942. 8 SCHEUFLER, C., BRINKER, A., BOURENKOV, G., PEGORARO, S., MORODER, L., BARTUNIK, H., HARTL, F. U., and MOAREFI, I. (2000). Structure of TPR domain-peptide complexes: critical elements in the assembly of the
9
10
11
12
13
14
15
16
Hsp70-Hsp90 multichaperone machine. Cell, 101, 199–210. CHEN, S., SULLIVAN, W. P., TOFT, D. O., and SMITH, D. F. (1998). Differential interactions of p23 and the TPR-containing proteins Hop, Cyp40, FKBP52 and FKBP51 with Hsp90 mutants. Cell Stress. Chaperones., 3, 118–129. SCHMITZ, G., SCHMIDT, M., and FEIERABEND, J. (1996). Characterization of a plastid-specific HSP90 homologue: identification of a cDNA sequence, phylogenetic descendence and analysis of its mRNA and protein expression. Plant Mol. Biol., 30, 479–492. FELTS, S. J., OWEN, B. A., NGUYEN, P., TREPEL, J., DONNER, D. B., and TOFT, D. O. (2000). The hsp90-related protein TRAP1 is a mitochondrial protein with distinct functional properties. J. Biol. Chem., 275, 3305–3312. BARDWELL, J. C. and CRAIG, E. A. (1988). Ancient heat shock gene is dispensable. J. Bacteriol., 170, 2977–2983. TANAKA, N. and NAKAMOTO, H. (1999). HtpG is essential for the thermal stress management in cyanobacteria. FEBS Lett., 458, 117–123. MORANO, K. A., SANTORO, N., KOCH, K. A., and THIELE, D. J. (1999). A trans-activation domain in yeast heat shock transcription factor is essential for cell cycle progression during stress. Mol. Cell Biol., 19, 402–411. ENDO, N., TASHIRO, E., UMEZAWA, K., KAWADA, M., UEHARA, Y., DOKI, Y., WEINSTEIN, I. B., and IMOTO, M. (2000). Herbimycin A induces G1 arrest through accumulation of p27(Kip1) in cyclin D1-overexpressing fibroblasts. Biochem. Biophys. Res. Commun., 267, 54–58. SHIOTSU, Y., NECKERS, L. M., WORTMAN, I., AN, W. G., SCHULTE, T. W., SOGA, S.,
References
17
18
19
20
21
22
23
24
25
26
MURAKATA, C., TAMAOKI, T., and AKINAGA, S. (2000). Novel oxime derivatives of radicicol induce erythroid differentiation associated with preferential G(1) phase accumulation against chronic myelogenous leukemia cells through destabilization of Bcr-Abl with Hsp90 complex. Blood, 96, 2284–2291. DAI, K., KOBAYASHI, R., and BEACH, D. (1996). Physical interaction of mammalian CDC37 with CDK4. J. Biol. Chem., 271, 22030–22034. DONZE, O., ABBAS-TERKI, T., and PICARD, D. (2001). The Hsp90 chaperone complex is both a facilitator and a repressor of the dsRNA-dependent kinase PKR. EMBO J., 20, 3771–3780. ALIGUE, R., AKHAVAN-NIAK, H., and RUSSELL, P. (1994). A role for Hsp90 in cell cycle control: Wee1 tyrosine kinase activity requires interaction with Hsp90. EMBO J., 13, 6099–6106. RUTHERFORD, S. L. and LINDQUIST, S. (1998). Hsp90 as a capacitor for morphological evolution. Nature, 396, 336–342. QUEITSCH, C., SANGSTER, T. A., and LINDQUIST, S. (2002). Hsp90 as a capacitor of phenotypic variation. Nature, 417, 618–624. VOSS, A. K., THOMAS, T., and GRUSS, P. (2000). Mice lacking HSP90beta fail to develop a placental labyrinth. Development, 127, 1–11. LELE, Z., HARTSON, S. D., MARTIN, C. C., WHITESELL, L., MATTS, R. L., and KRONE, P. H. (1999). Disruption of zebrafish somite development by pharmacologic inhibition of Hsp90. Dev. Biol., 210, 56–70. RANDOW, F. and SEED, B. (2001). Endoplasmic reticulum chaperone gp96 is required for innate immunity but not cell viability. Nat. Cell Biol., 3, 891–896. GASS, J. N., GIFFORD, N. M., and BREWER, J. W. (2002). Activation of an Unfolded Protein Response during Differentiation of Antibody-secreting B Cells. J. Biol. Chem., 277, 49047–49054. MEUNIER, L., USHERWOOD, Y. K., CHUNG, K. T., and HENDERSHOT, L. M. (2002). A subset of chaperones and folding
27
28
29
30
31
32
33
34
enzymes form multiprotein complexes in endoplasmic reticulum to bind nascent proteins. Mol. Biol. Cell, 13, 4456–4469. BINDER, R. J. and SRIVASTAVA, P. K. (2004). Essential role of CD91 in representation of gp96-chaperoned peptides. Proc. Natl. Acad. Sci. U.S.A. GRAMMATIKAKIS, N., VULTUR, A., RAMANA, C. V., SIGANOU, A., SCHWEINFEST, C. W., WATSON, D. K., and RAPTIS, L. (2002). The role of Hsp90N, a new member of the Hsp90 family, in signal transduction and neoplastic transformation. J. Biol. Chem., 277, 8312–8320. HEITZER, A., MASON, C. A., SNOZZI, M., and HAMER, G. (1990). Some effects of growth conditions on steady state and heat shock induced htpG gene expression in continuous cultures of Escherichia coli. Arch. Microbiol., 155, 7–12. ARANDA, A., QUEROL, A., and DEL OLMO, M. (2002). Correlation between acetaldehyde and ethanol resistance and expression of HSP genes in yeast strains isolated during the biological aging of sherry wines. Arch. Microbiol., 177, 304–312. ZHOU, Y. N., KUSUKAWA, N., ERICKSON, J. W., GROSS, C. A., and YURA, T. (1988). Isolation and characterization of Escherichia coli mutants that lack the heat shock sigma factor sigma 32. J. Bacteriol., 170, 3640–3649. COWING, D. W., BARDWELL, J. C., CRAIG, E. A., WOOLFORD, C., HENDRIX, R. W., and GROSS, C. A. (1985). Consensus sequence for Escherichia coli heat shock gene promoters. Proc. Natl. Acad. Sci. U.S.A, 82, 2679–2683. NADEAU, K., DAS, A., and WALSH, C. T. (1993). Hsp90 chaperonins possess ATPase activity and bind heat shock transcription factors and peptidyl prolyl isomerases. J. Biol. Chem., 268, 1479–1487. ZARZOV, P., BOUCHERIE, H., and MANN, C. (1997). A yeast heat shock transcription factor (Hsf1) mutant is defective in both Hsc82/Hsp82 synthesis and spindle pole body duplication. J. Cell Sci., 110 (Pt 16), 1879–1891.
31
32 The Hsp90 Family of Molecular Chaperones
35 AHN, S. G., LIU, P. C., KLYACHKO, K., MORIMOTO, R. I., and THIELE, D. J. (2001). The loop domain of heat shock transcription factor 1 dictates DNA-binding specificity and responses to heat stress. Genes Dev., 15, 2134–2145. 36 GROSS, D. S., ADAMS, C. C., ENGLISH, K. E., COLLINS, K. W., and LEE, S. (1990). Promoter function and in situ protein/DNA interactions upstream of the yeast HSP90 heat shock genes. Antonie Van Leeuwenhoek, 58, 175–186. 37 GROSS, D. S., ENGLISH, K. E., COLLINS, K. W., and LEE, S. W. (1990). Genomic footprinting of the yeast HSP82 promoter reveals marked distortion of the DNA helix and constitutive occupancy of heat shock and TATA elements. J. Mol. Biol., 216, 611–631. 38 ERKINE, A. M., ADAMS, C. C., GAO, M., and GROSS, D. S. (1995). Multiple protein-DNA interactions over the yeast HSC82 heat shock gene promoter. Nucleic Acids Res., 23, 1822–1829. 39 RIPLEY, B. J., STEPHANOU, A., ISENBERG, D. A., and LATCHMAN, D. S. (1999). Interleukin-10 activates heat-shock protein 90beta gene expression. Immunology, 97, 226–231. 40 SHEN, Y., LIU, J., WANG, X., CHENG, X., WANG, Y., and WU, N. (1997). Essential role of the first intron in the transcription of hsp90beta gene. FEBS Lett., 413, 92–98. 41 ZHANG, S. L., YU, J., CHENG, X. K., DING, L., HENG, F. Y., WU, N. H., and SHEN, Y. F. (1999). Regulation of human hsp90alpha gene expression. FEBS Lett., 444, 130–135. 42 DUGYALA, R. R., CLAGGETT, T. W., KIMMEL, G. L., and KIMMEL, C. A. (2002). HSP90alpha, HSP90beta, and p53 expression following in vitro hyperthermia exposure in gestation day 10 rat embryos. Toxicol. Sci., 69, 183–190. 43 WU, J. M., XIAO, L., CHENG, X. K., CUI, L. X., WU, N. H., and SHEN, Y. F. (2003). PKC epsilon is a unique regulator for hsp90 beta gene in heat shock response. J. Biol. Chem., 278, 51143–51149. 44 SREEDHAR, A. S., KALMAR, E., CSERMELY, P., and SHEN, Y. F. (2004). Hsp90
45
46
47
48
49
50
51
52
53
54
isoforms: functions, expression and clinical importance. FEBS Lett., 562, 11–15. RUTKOWSKI, D. T. and KAUFMAN, R. J. (2004). A trip to the ER: coping with stress. Trends Cell Biol., 14, 20–28. OKAMURA, K., KIMATA, Y., HIGASHIO, H., TSURU, A., and KOHNO, K. (2000). Dissociation of Kar2p/BiP from an ER sensory molecule, Ire1p, triggers the unfolded protein response in yeast. Biochem. Biophys. Res. Commun., 279, 445–450. KIMATA, Y., KIMATA, Y. I., SHIMIZU, Y., ABE, H., FARCASANU, I. C., TAKEUCHI, M., ROSE, M. D., and KOHNO, K. (2003). Genetic evidence for a role of BiP/Kar2 that regulates Ire1 in response to accumulation of unfolded proteins. Mol. Biol. Cell, 14, 2559–2569. PATIL, C. and WALTER, P. (2001). Intracellular signaling from the endoplasmic reticulum to the nucleus: the unfolded protein response in yeast and mammals. Curr. Opin. Cell Biol., 13, 349–355. NECKERS, L. and IVY, S. P. (2003). Heat shock protein 90. Curr. Opin. Oncol., 15, 419–424. HORAKOVA, K. and BETINA, V. (1977). Cytotoxic activity of macrocyclic metabolites from fungi. Neoplasma, 24, 21–27. JOHNSON, R. D., HABER, A., and RINEHART, K. L., Jr. (1974). Geldanamycin biosynthesis and carbon magnetic resonance. J. Am. Chem. Soc., 96, 3316–3317. MUROI, M., IZAWA, M., KOSAI, Y., and ASAI, M. (1980). Macbecins I and II, new antitumor antibiotics. II. Isolation and characterization. J. Antibiot. (Tokyo), 33, 205–212. TANIDA, S., HASEGAWA, T., and HIGASHIDE, E. (1980). Macbecins I and II, new antitumor antibiotics. I. Producing organism, fermentation and antimicrobial activities. J. Antibiot. (Tokyo), 33, 199–204. OMURA, S., MIYANO, K., NAKAGAWA, A., SANO, H., KOMIYAMA, K., UMEZAWA, I., SHIBATA, K., and SATSUMABAYASHI, S. (1984). Chemical modification and
References
55
56
57
58
59
60
61
62
63
antitumor activity of herbimycin A. 8,9-Epoxide, 7,9-cyclic carbamate, and 17 or 19-amino derivatives. J. Antibiot. (Tokyo), 37, 1264–1267. OMURA, S., IWAI, Y., TAKAHASHI, Y., SADAKANE, N., NAKAGAWA, A., OIWA, H., HASEGAWA, Y., and IKAI, T. (1979). Herbimycin, a new antibiotic produced by a strain of Streptomyces. J. Antibiot. (Tokyo), 32, 255–261. UEHARA, Y., HORI, M., TAKEUCHI, T., and UMEZAWA, H. (1986). Phenotypic change from transformed to normal induced by benzoquinonoid ansamycins accompanies inactivation of p60src in rat kidney cells infected with Rous sarcoma virus. Mol. Cell Biol., 6, 2198–2206. UEHARA, Y. (2003). Natural product origins of Hsp90 inhibitors. Curr. Cancer Drug Targets., 3, 325–330. WHITESELL, L. and COOK, P. (1996). Stable and specific binding of heat shock protein 90 by geldanamycin disrupts glucocorticoid receptor function in intact cells. Mol. Endocrinol., 10, 705–712. AN, W. G., SCHULTE, T. W., and NECKERS, L. M. (2000). The heat shock protein 90 antagonist geldanamycin alters chaperone association with p210bcr-abl and v-src proteins before their degradation by the proteasome. Cell Growth Differ., 11, 355–360. SCHNEIDER, C., SEPP-LORENZINO, L., NIMMESGERN, E., OUERFELLI, O., DANISHEFSKY, S., ROSEN, N., and HARTL, F. U. (1996). Pharmacologic shifting of a balance between protein refolding and degradation mediated by Hsp90. Proc. Natl. Acad. Sci. U.S.A, 93, 14536–14541. STEBBINS, C. E., RUSSO, A. A., SCHNEIDER, C., ROSEN, N., HARTL, F. U., and PAVLETICH, N. P. (1997). Crystal structure of an Hsp90-geldanamycin complex: targeting of a protein chaperone by an antitumor agent. Cell, 89, 239–250. ROE, S. M., PRODROMOU, C., O’BRIEN, R., LADBURY, J. E., PIPER, P. W., and PEARL, L. H. (1999). Structural basis for inhibition of the Hsp90 molecular chaperone by the antitumor antibiotics radicicol and geldanamycin. J. Med. Chem., 42, 260–266. NECKERS, L. (2002). Hsp90 inhibitors as
64
65
66
67
68
69
70
71
72
novel cancer chemotherapeutic agents. Trends Mol. Med., 8, S55–S61. WORKMAN, P. (2004). Combinatorial attack on multistep oncogenesis by inhibiting the Hsp90 molecular chaperone. Cancer Lett., 206, 149–157. KELLAND, L. R., SHARP, S. Y., ROGERS, P. M., MYERS, T. G., and WORKMAN, P. (1999). DT-Diaphorase expression and tumor cell sensitivity to 17-allylamino, 17-demethoxygeld-anamycin, an inhibitor of heat shock protein 90. J. Natl. Cancer Inst., 91, 1940–1949. SAUSVILLE, E. A., TOMASZEWSKI, J. E., and IVY, P. (2003). Clinical development of 17-allylamino, 17-demethoxygeldanamycin. Curr. Cancer Drug Targets., 3, 377–383. KAMAL, A., THAO, L., SENSINTAFFAR, J., ZHANG, L., BOEHM, M. F., FRITZ, L. C., and BURROWS, F. J. (2003). A high-affinity conformation of Hsp90 confers tumour selectivity on Hsp90 inhibitors. Nature, 425, 407–410. ROWLANDS, M. G., NEWBATT, Y. M., PRODROMOU, C., PEARL, L. H., WORKMAN, P., and AHERNE, W. (2004). High-throughput screening assay for inhibitors of heat-shock protein 90 ATPase activity. Anal. Biochem., 327, 176–183. CHIOSIS, G., TIMAUL, M. N., LUCAS, B., MUNSTER, P. N., ZHENG, F. F., SEPP-LORENZINO, L., and ROSEN, N. (2001). A small molecule designed to bind to the adenine nucleotide pocket of Hsp90 causes Her2 degradation and the growth arrest and differentiation of breast cancer cells. Chem. Biol., 8, 289–299. CHIOSIS, G., LUCAS, B., HUEZO, H., SOLIT, D., BASSO, A., and ROSEN, N. (2003). Development of purine-scaffold small molecule inhibitors of Hsp90. Curr. Cancer Drug Targets., 3, 371–376. PRATT, W. B. and TOFT, D. O. (2003). Regulation of signaling protein function and trafficking by the hsp90/hsp70-based chaperone machinery. Exp. Biol. Med. (Maywood.), 228, 111–133. BLAGOSKLONNY, M. V., TORETSKY, J., BOHEN, S., and NECKERS, L. (1996).
33
34 The Hsp90 Family of Molecular Chaperones
73
74
75
76
77
78
79
80
81
Mutant conformation of p53 translated in vitro or in vivo requires functional HSP90. Proc. Natl. Acad. Sci. U.S.A, 93, 8379–8383. ZOU, J., GUO, Y., GUETTOUCHE, T., SMITH, D. F., and VOELLMY, R. (1998). Repression of heat shock transcription factor HSF1 activation by HSP90 (HSP90 complex) that forms a stress-sensitive complex with HSF1. Cell, 94, 471–480. PRODROMOU, C., ROE, S. M., PIPER, P. W., and PEARL, L. H. (1997). A molecular clamp in the crystal structure of the N-terminal domain of the yeast Hsp90 chaperone. Nat. Struct. Biol., 4, 477–482. PRODROMOU, C., ROE, S. M., O’BRIEN, R., LADBURY, J. E., PIPER, P. W., and PEARL, L. H. (1997). Identification and structural characterization of the ATP/ADP-binding site in the Hsp90 molecular chaperone. Cell, 90, 65–75. WIGLEY, D. B., DAVIES, G. J., DODSON, E. J., MAXWELL, A., and DODSON, G. (1991). Crystal structure of an N-terminal fragment of the DNA gyrase B protein. Nature, 351, 624–629. DUTTA, R. and INOUYE, M. (2000). GHKL, an emergent ATPase/kinase superfamily. Trends Biochem. Sci., 25, 24–28. BRINO, L., URZHUMTSEV, A., MOUSLI, M., BRONNER, C., MITSCHLER, A., OUDET, P., and MORAS, D. (2000). Dimerization of Escherichia coli DNA-gyrase B provides a structural mechanism for activating the ATPase catalytic center. J. Biol. Chem., 275, 9468–9475. BAN, C., JUNOP, M., and YANG, W. (1999). Transformation of MutL by ATP binding and hydrolysis: a switch in DNA mismatch repair. Cell, 97, 85–97. MEYER, P., PRODROMOU, C., HU, B., VAUGHAN, C., ROE, S. M., PANARETOU, B., PIPER, P. W., and PEARL, L. H. (2003). Structural and functional analysis of the middle segment of hsp90: implications for ATP hydrolysis and client protein and cochaperone interactions. Mol. Cell, 11, 647–658. PANARETOU, B., PRODROMOU, C., ROE, S. M., O’BRIEN, R., LADBURY, J. E., PIPER, P. W., and PEARL, L. H. (1998). ATP
82
83
84
85
86
87
88
89
binding and hydrolysis are essential to the function of the Hsp90 molecular chaperone in vivo. EMBO J., 17, 4829–4836. SCHEIBEL, T., WEIKL, T., and BUCHNER, J. (1998). Two chaperone sites in Hsp90 differing in substrate specificity and ATP dependence. Proc. Natl. Acad. Sci. U.S.A, 95, 1495–1499. PRODROMOU, C., PANARETOU, B., CHOHAN, S., SILIGARDI, G., O’BRIEN, R., LADBURY, J. E., ROE, S. M., PIPER, P. W., and PEARL, L. H. (2000). The ATPase cycle of Hsp90 drives a molecular ‘clamp’ via transient dimerization of the N-terminal domains. EMBO J., 19, 4383–4392. WEIKL, T., MUSCHLER, P., RICHTER, K., VEIT, T., REINSTEIN, J., and BUCHNER, J. (2000). C-terminal regions of Hsp90 are important for trapping the nucleotide during the ATPase cycle. J. Mol. Biol., 303, 583–592. RICHTER, K., MUSCHLER, P., HAINZL, O., and BUCHNER, J. (2001). Coordinated ATP hydrolysis by the Hsp90 dimer. J. Biol. Chem., 276, 33689–33696. WEGELE, H., MUSCHLER, P., BUNCK, M., REINSTEIN, J., and BUCHNER, J. (2003). Dissection of the contribution of individual domains to the ATPase mechanism of Hsp90. J. Biol. Chem., 278, 39303–39310. CHADLI, A., BOUHOUCHE, I., SULLIVAN, W., STENSGARD, B., MCMAHON, N., CATELLI, M. G., and TOFT, D. O. (2000). Dimerization and N-terminal domain proximity underlie the function of the molecular chaperone heat shock protein 90. Proc. Natl. Acad. Sci. U.S.A, 97, 12524–12529. RICHTER, K., REINSTEIN, J., and BUCHNER, J. (2002). N-terminal residues regulate the catalytic efficiency of the Hsp90 ATPase cycle. J. Biol. Chem., 277, 44905–44910. GRENERT, J. P., SULLIVAN, W. P., FADDEN, P., HAYSTEAD, T. A., CLARK, J., MIMNAUGH, E., KRUTZSCH, H., OCHEL, H. J., SCHULTE, T. W., SAUSVILLE, E., NECKERS, L. M., and TOFT, D. O. (1997). The amino-terminal domain of heat shock protein 90 (hsp90) that binds
References
90
91
92
93
94
95
96
97
geldanamycin is an ATP/ADP switch domain that regulates hsp90 conformation. J. Biol. Chem., 272, 23843–23850. DEHNER, A., FURRER, J., RICHTER, K., SCHUSTER, I., BUCHNER, J., and KESSLER, H. (2003). NMR Chemical Shift Perturbation Study of the N-Terminal Domain of Hsp90 upon Binding of ADP, AMP-PNP, Geldanamycin, and Radicicol. Chembiochem., 4, 870–877. MCLAUGHLIN, S. H., SMITH, H. W., and JACKSON, S. E. (2002). Stimulation of the weak ATPase activity of human hsp90 by a client protein. J. Mol. Biol., 315, 787–798. OWEN, B. A., SULLIVAN, W. P., FELTS, S. J., and TOFT, D. O. (2002). Regulation of heat shock protein 90 ATPase activity by sequences in the carboxyl terminus. J. Biol. Chem., 277, 7086–7091. SOTI, C., RACZ, A., and CSERMELY, P. (2002). A Nucleotide-dependent molecular switch controls ATP binding at the C-terminal domain of Hsp90. N-terminal nucleotide binding unmasks a C-terminal binding pocket. J. Biol. Chem., 277, 7066–7075. GARNIER, C., LAFITTE, D., TSVETKOV, P. O., BARBIER, P., LECLERC-DEVIN, J., MILLOT, J. M., BRIAND, C., MAKAROV, A. A., CATELLI, M. G., and PEYROT, V. (2002). Binding of ATP to heat shock protein 90: evidence for an ATP-binding site in the C-terminal domain. J. Biol. Chem., 277, 12208–12214. JAKOB, U., LILIE, H., MEYER, I., and BUCHNER, J. (1995). Transient interaction of Hsp90 with early unfolding intermediates of citrate synthase. Implications for heat shock in vivo. J. Biol. Chem., 270, 7288–7294. NEMOTO, T. K., ONO, T., and TANAKA, K. (2001). Substrate-binding characteristics of proteins in the 90 kDa heat shock protein family. Biochem. J., 354, 663–670. JOHNSON, B. D., CHADLI, A., FELTS, S. J., BOUHOUCHE, I., CATELLI, M. G., and TOFT, D. O. (2000). Hsp90 chaperone activity requires the full-length protein and interaction among its multiple domains. J. Biol. Chem., 275, 32499–32507.
98 WIECH, H., BUCHNER, J., ZIMMERMANN, R., and JAKOB, U. (1992). Hsp90 chaperones protein folding in vitro. Nature, 358, 169–170. 99 THULASIRAMAN, V. and MATTS, R. L. (1996). Effect of geldanamycin on the kinetics of chaperone-mediated renaturation of firefly luciferase in rabbit reticulocyte lysate. Biochemistry, 35, 13443–13450. 100 YOUNG, J. C., SCHNEIDER, C., and HARTL, F. U. (1997). In vitro evidence that hsp90 contains two independent chaperone sites. FEBS Lett., 418, 139–143. 101 SCHEIBEL, T., SIEGMUND, H. I., JAENICKE, R., GANZ, P., LILIE, H., and BUCHNER, J. (1999). The charged region of Hsp90 modulates the function of the N-terminal domain. Proc. Natl. Acad. Sci. U.S.A, 96, 1297–1302. 102 CSERMELY, P., MIYATA, Y., SOTI, C., and YAHARA, I. (1997). Binding affinity of proteins to hsp90 correlates with both hydrophobicity and positive charges. A surface plasmon resonance study. Life Sci., 61, 411–418. 103 MIYATA, Y. and YAHARA, I. (1992). The 90-kDa heat shock protein, HSP90, binds and protects casein kinase II from self-aggregation and enhances its kinase activity. J. Biol. Chem., 267, 7042–7047. 104 MIYATA, Y. and YAHARA, I. (1995). Interaction between casein kinase II and the 90-kDa stress protein, HSP90. Biochemistry, 34, 8123–8129. 105 NATHAN, D. F., VOS, M. H., and LINDQUIST, S. (1997). In vivo functions of the Saccharomyces cerevisiae Hsp90 chaperone. Proc. Natl. Acad. Sci. U.S.A, 94, 12949–12956. 106 VOGEN, S., GIDALEVITZ, T., BISWAS, C., SIMEN, B. B., STEIN, E., GULMEN, F., and ARGON, Y. (2002). Radicicol-sensitive peptide binding to the N-terminal portion of GRP94. J. Biol. Chem., 277, 40742–40750. 107 WEARSCH, P. A. and NICCHITTA, C. V. (1997). Interaction of endoplasmic reticulum chaperone GRP94 with peptide substrates is adenine nucleotide-independent. J. Biol. Chem., 272, 5152–5156.
35
36 The Hsp90 Family of Molecular Chaperones
108 WEARSCH, P. A., VOGLINO, L., and NICCHITTA, C. V. (1998). Structural transitions accompanying the activation of peptide binding to the endoplasmic reticulum Hsp90 chaperone GRP94. Biochemistry, 37, 5709–5719. 109 GIDALEVITZ, T., BISWAS, C., DING, H., SCHNEIDMAN-DUHOVNY, D., WOLFSON, H. J., STEVENS, F., RADFORD, S., and ARGON, Y. (2004). Identification of the N-terminal Peptide Binding Site of Glucose-regulated Protein 94. J. Biol. Chem., 279, 16543–16552. 110 SCROGGINS, B. T., PRINCE, T., SHAO, J., UMA, S., HUANG, W., GUO, Y., YUN, B. G., HEDMAN, K., MATTS, R. L., and HARTSON, S. D. (2003). High affinity binding of Hsp90 is triggered by multiple discrete segments of its kinase clients. Biochemistry, 42, 12550–12561. 111 SHUE, G. and KOHTZ, D. S. (1994). Structural and functional aspects of basic helix-loop-helix protein folding by heat-shock protein 90. J. Biol. Chem., 269, 2707–2711. 112 SHAKNOVICH, R., SHUE, G., and KOHTZ, D. S. (1992). Conformational activation of a basic helix-loop-helix protein (MyoD1) by the C-terminal region of murine HSP90 (HSP84). Mol. Cell Biol., 12, 5059–5068. 113 HU, J., TOFT, D., ANSELMO, D., and WANG, X. (2002). In vitro reconstitution of functional hepadnavirus reverse transcriptase with cellular chaperone proteins. J. Virol., 76, 269–279. 114 BECK, J. and NASSAL, M. (2003). Efficient Hsp90-independent in vitro activation by Hsc70 and Hsp40 of duck hepatits B virus reverse transcriptase, an assumed Hsp90 client protein. J. Biol. Chem. 115 YOUNG, J. C. and HARTL, F. U. (2000). Polypeptide release by Hsp90 involves ATP hydrolysis and is enhanced by the co-chaperone p23. EMBO J., 19, 5930–5940. 116 SCHERRER, L. C., DALMAN, F. C., MASSA, E., MESHINCHI, S., and PRATT, W. B. (1990). Structural and functional reconstitution of the glucocorticoid receptor-hsp90 complex. J. Biol. Chem., 265, 21397–21400.
117 TOFT, D. and GORSKI, J. (1966). A receptor molecule for estrogens: isolation from the rat uterus and preliminary characterization. Proc. Natl. Acad. Sci. U.S.A, 55, 1574–1581. 118 SANCHEZ, E. R., MESHINCHI, S., SCHLESINGER, M. J., and PRATT, W. B. (1987). Demonstration that the 90-kilodalton heat shock protein is bound to the glucocorticoid receptor in its 9S nondeoxynucleic acid binding form. Mol. Endocrinol., 1, 908–912. 119 SMITH, D. F., SCHOWALTER, D. B., KOST, S. L., and TOFT, D. O. (1990). Reconstitution of progesterone receptor with heat shock proteins. Mol. Endocrinol., 4, 1704–1711. 120 SMITH, D. F. (1993). Dynamics of heat shock protein 90-progesterone receptor binding and the disactivation loop model for steroid receptor complexes. Mol. Endocrinol., 7, 1418–1429. 121 SMITH, D. F., SULLIVAN, W. P., MARION, T. N., ZAITSU, K., MADDEN, B., MCCORMICK, D. J., and TOFT, D. O. (1993). Identification of a 60-kilodalton stress-related protein, p60, which interacts with hsp90 and hsp70. Mol. Cell Biol., 13, 869–876. 122 JOHNSON, J. L., BEITO, T. G., KRCO, C. J., and TOFT, D. O. (1994). Characterization of a novel 23-kilodalton protein of unactive progesterone receptor complexes. Mol. Cell Biol., 14, 1956–1963. 123 OWENS-GRILLO, J. K., HOFFMANN, K., HUTCHISON, K. A., YEM, A. W., DEIBEL, M. R., Jr., HANDSCHUMACHER, R. E., and PRATT, W. B. (1995). The cyclosporin A-binding immunophilin CyP-40 and the FK506-binding immunophilin hsp56 bind to a common site on hsp90 and exist in independent cytosolic heterocomplexes with the untransformed glucocorticoid receptor. J. Biol. Chem., 270, 20479–20484. 124 RENOIR, J. M., PAHL, A., KELLER, U., and BAULIEU, E. E. (1993). Immunological identification of a 50 kDa Mr FK506-binding immunophilin as a component of the non-DNA binding, hsp90 and hsp70 containing, heterooligomeric form of the chick
References
125
126
127
128
129
130
131
132
133
oviduct progesterone receptor. C. R. Acad. Sci. III, 316, 1410–1416. SMITH, D. F., BAGGENSTOSS, B. A., MARION, T. N., and RIMERMAN, R. A. (1993). Two FKBP-related proteins are associated with progesterone receptor complexes. J. Biol. Chem., 268, 18365–18371. CHANG, H. C. and LINDQUIST, S. (1994). Conservation of Hsp90 macromolecular complexes in Saccharomyces cerevisiae. J. Biol. Chem., 269, 24983–24988. PERDEW, G. H. and WHITELAW, M. L. (1991). Evidence that the 90-kDa heat shock protein (HSP90) exists in cytosol in heteromeric complexes containing HSP70 and three other proteins with Mr of 63,000, 56,000, and 50,000. J. Biol. Chem., 266, 6708–6713. JOHNSON, J. L. and CRAIG, E. A. (2000). A role for the Hsp40 Ydj1 in repression of basal steroid receptor activity in yeast. Mol. Cell Biol., 20, 3027–3036. KIMURA, Y., YAHARA, I., and LINDQUIST, S. (1995). Role of the protein chaperone YDJ1 in establishing Hsp90-mediated signal transduction pathways. Science, 268, 1362–1365. JOHNSON, J., CORBISIER, R., STENSGARD, B., and TOFT, D. (1996). The involvement of p23, hsp90, and immunophilins in the assembly of progesterone receptor complexes. J. Steroid Biochem. Mol. Biol., 56, 31–37. RICHTER, K., WALTER, S., and BUCHNER, J. (2004). Sba1 connects the ATPase reaction of Hsp90 to the progression of the chaperone cycle. manuscript submitted. PRODROMOU, C., SILIGARDI, G., O’BRIEN, R., WOOLFSON, D. N., REGAN, L., PANARETOU, B., LADBURY, J. E., PIPER, P. W., and PEARL, L. H. (1999). Regulation of Hsp90 ATPase activity by tetratricopeptide repeat (TPR)-domain co-chaperones. EMBO J., 18, 754–762. RICHTER, K., MUSCHLER, P., HAINZL, O., REINSTEIN, J., and BUCHNER, J. (2003). Sti1 is a non-competitive inhibitor of the Hsp90 ATPase. Binding prevents the N-terminaldimerization reaction during the atpase cycle. J. Biol. Chem., 278, 10328–10333.
134 JOHNSON, J. L. and TOFT, D. O. (1995). Binding of p23 and hsp90 during assembly with the progesterone receptor. Mol. Endocrinol., 9, 670–678. 135 SULLIVAN, W. P., OWEN, B. A., and TOFT, D. O. (2002). The influence of ATP and p23 on the conformation of hsp90. J. Biol. Chem. 136 PANARETOU, B., SILIGARDI, G., MEYER, P., MALONEY, A., SULLIVAN, J. K., SINGH, S., MILLSON, S. H., CLARKE, P. A., NAABY-HANSEN, S., STEIN, R., CRAMER, R., MOLLAPOUR, M., WORKMAN, P., PIPER, P. W., PEARL, L. H., and PRODROMOU, C. (2002). Activation of the ATPase activity of hsp90 by the stress-regulated cochaperone aha1. Mol. Cell, 10, 1307–1318. 137 HERNANDEZ, M. P., CHADLI, A., and TOFT, D. O. (2002). HSP40 binding is the first step in the HSP90 chaperoning pathway for the progesterone receptor. J. Biol. Chem., 277, 11873–11881. 138 WEGELE, H., HASLBECK, M., REINSTEIN, J., and BUCHNER, J. (2003). Sti1 is a novel activator of the Ssa proteins. J. Biol. Chem, 278, 25970–25976. 139 MORISHIMA, Y., KANELAKIS, K. C., MURPHY, P. J., SHEWACH, D. S., and PRATT, W. B. (2001). Evidence for iterative ratcheting of receptor-bound hsp70 between its ATP and ADP conformations during assembly of glucocorticoid receptor.hsp90 heterocomplexes. Biochemistry, 40, 1109–1116. 140 MORISHIMA, Y., KANELAKIS, K. C., SILVERSTEIN, A. M., DITTMAR, K. D., ESTRADA, L., and PRATT, W. B. (2000). The Hsp organizer protein hop enhances the rate of but is not essential for glucocorticoid receptor folding by the multiprotein Hsp90-based chaperone system. J. Biol. Chem., 275, 6894–6900. 141 DAS, A. K., COHEN, P. W., and BARFORD, D. (1998). The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for TPR-mediated protein-protein interactions. EMBO J., 17, 1192–1199. 142 BRINKER, A., SCHEUFLER, C., VON DER, M. F., FLECKENSTEIN, B., HERRMANN, C., JUNG, G., MOAREFI, I., and HARTL, F. U.
37
38 The Hsp90 Family of Molecular Chaperones
143
144
145
146
147
148
149
150
(2002). Ligand discrimination by TPR domains. Relevance and selectivity of EEVD-recognition in Hsp70 × Hop × Hsp90 complexes. J. Biol. Chem., 277, 19265–19275. ODUNUGA, O. O., HORNBY, J. A., BIES, C, ZIMMERMANN, R., PUGH, D. J., and BLATCH, G. L. (2003). Tetratrico-peptide repeat motif-mediated Hsc70-mSTI1 interaction. Molecular characterization of the critical contacts for successful binding and specificity. J. Biol. Chem., 278, 6896–6904. JOHNSON, J. L. and TOFT, D. O. (1994). A novel chaperone complex for steroid receptors involving heat shock proteins, immunophilins, and p23. J. Biol. Chem., 269, 24989–24993. WEIKL, T., ABELMANN, K., and BUCHNER, J. (1999). An unstructured C-terminal region of the Hsp90 co-chaperone p23 is important for its chaperone function. J. Mol. Biol., 293, 685–691. WEAVER, A. J., SULLIVAN, W. P., FELTS, S. J., OWEN, B. A., and TOFT, D. O. (2000). Crystal structure and activity of human p23, a heat shock protein 90 co-chaperone. J. Biol. Chem., 275, 23045–23052. MAYR, C., RICHTER, K., LILIE, H., and BUCHNER, J. (2000). Cpr6 and Cpr7, two closely related Hsp90-associated immunophilins from Saccharomyces cerevisiae, differ in their functional properties. J. Biol. Chem., 275, 34140–34146. PIRKL, F. and BUCHNER, J. (2001). Functional analysis of the Hsp90-associated human peptidyl prolyl cis/trans isomerases FKBP51, FKBP52 and Cyp40. J. Mol. Biol., 308, 795–806. CARRIGAN, P. E., NELSON, G. M., ROBERTS, P. J., STOFFER, J., RIGGS, D. L., and SMITH, D. F. (2004). Multiple Domains of the Co-chaperone Hop Are Important for Hsp70 Binding. J. Biol. Chem., 279, 16185–16193. CHEUNG-FLYNN, J., ROBERTS, P. J., RIGGS, D. L., and SMITH, D. F. (2003). C-terminal sequences outside the tetratricopeptide repeat domain of FKBP51 and FKBP52 cause differential
151
152
153
154
155
156
157
158
binding to Hsp90. J. Biol. Chem., 278, 17388–17394. NAIR, S. C., RIMERMAN, R. A., TORAN, E. J., CHEN, S., PRAPAPANICH, V., BUTTS, R. N., and SMITH, D. F. (1997). Molecular cloning of human FKBP51 and comparisons of immunophilin interactions with Hsp90 and progesterone receptor. Mol. Cell Biol., 17, 594–603. DUINA, A. A., MARSH, J. A., KURTZ, R. B., CHANG, H. C., LINDQUIST, S., and GABER, R. F. (1998). The peptidyl-prolyl isomerase domain of the CyP-40 cyclophilin homolog Cpr7 is not required to support growth or glucocorticoid receptor activity in Saccharomyces cerevisiae. J. Biol. Chem., 273, 10819–10822. BOSE, S., WEIKL, T., BUGL, H., and BUCHNER, J. (1996). Chaperone function of Hsp90-associated proteins. Science, 274, 1715–1717. PIRKL, F., FISCHER, E., MODROW, S., and BUCHNER, J. (2001). Localization of the chaperone domain of FKBP52. J. Biol. Chem., 276, 37034–37041. BHARADWAJ, S., ALI, A., and OVSENEK, N. (1999). Multiple components of the HSP90 chaperone complex function in regulation of heat shock factor 1 in vivo. Mol. Cell Biol., 19, 8033–8041. SILVERSTEIN, A. M., GALIGNIANA, M. D., KANELAKIS, K. C., RADANYI, C., RENOIR, J. M., and PRATT, W. B. (1999). Different regions of the immunophilin FKBP52 determine its association with the glucocorticoid receptor, hsp90, and cytoplasmic dynein. J. Biol. Chem., 274, 36980–36986. GALIGNIANA, M. D., RADANYI, C., RENOIR, J. M., HOUSLEY, P. R., and PRATT, W. B. (2001). Evidence that the peptidylprolyl isomerase domain of the hsp90-binding immunophilin FKBP52 is involved in both dynein interaction and glucocorticoid receptor movement to the nucleus. J. Biol. Chem., 276, 14884–14889. DAVIES, T. H., NING, Y. M., and SANCHEZ, E. R. (2002). A new first step in activation of steroid receptors: hormone-induced switching of FKBP51
References
159
160
161
162
163
164
165
166
167
and FKBP52 immunophilins. J. Biol. Chem., 277, 4597–4600. GALIGNIANA, M. D., HARRELL, J. M., O’HAGEN, H. M., LJUNGMAN, M., and PRATT, W. B. (2004). Hsp90-binding immunophilins link p53 to dynein during p53 transport to the nucleus. J. Biol. Chem. RIGGS, D. L., ROBERTS, P. J., CHIRILLO, S. C., CHEUNG-FLYNN, J., PRAPAPANICH, V., RATAJCZAK, T., GABER, R., PICARD, D., and SMITH, D. F. (2003). The Hsp90-binding peptidylprolyl isomerase FKBP52 potentiates glucocorticoid signaling in vivo. EMBO J., 22, 1158–1167. SILVERSTEIN, A. M., GALIGNIANA, M. D., CHEN, M. S., OWENS-GRILLO, J. K., CHINKERS, M., and PRATT, W. B. (1997). Protein phosphatase 5 is a major component of glucocorticoid receptor. Hsp90 complexes with properties of an FK506-binding immunophilin. J. Biol. Chem., 272, 16224–16230. STEPANOVA, L., LENG, X., PARKER, S. B., and HARPER, J. W. (1996). Mammalian p50Cdc37 is a protein kinase-targeting subunit of Hsp90 that binds and stabilizes Cdk4. Genes Dev., 10, 1491–1502. BRUGGE, J. S. (1986). Interaction of the Rous sarcoma virus protein pp60src with the cellular proteins pp50 and pp90. Curr. Top. Microbiol. Immunol., 123, 1–22. GRAMMATIKAKIS, N., LIN, J. H., GRAMMATIKAKIS, A., TSICHLIS, P. N., and COCHRAN, B. H. (1999). p50(cdc37) acting in concert with Hsp90 is required for Raf-1 function. Mol. Cell Biol., 19, 1661–1672. STEWART, S., SUNDARAM, M., ZHANG, Y., LEE, J., HAN, M., and GUAN, K. L. (1999). Kinase suppressor of Ras forms a multiprotein signaling complex and modulates MEK localization. Mol. Cell Biol., 19, 5523–5534. TATEBE, H. and SHIOZAKI, K. (2003). Identification of Cdc37 as a novel regulator of the stress-responsive mitogen-activated protein kinase. Mol. Cell Biol., 23, 5132–5142. XU, Y., SINGER, M. A., and LINDQUIST, S. (1999). Maturation of the tyrosine kinase
168
169
170
171
172
173
174
175
176
c-src as a kinase and as a substrate depends on the molecular chaperone Hsp90. Proc. Natl. Acad. Sci. U.S.A, 96, 109–114. XU, Y. and LINDQUIST, S. (1993). Heat-shock protein hsp90 governs the activity of pp60v-src kinase. Proc. Natl. Acad. Sci. U.S.A, 90, 7074–7078. LAMPHERE, L., FIORE, F., XU, X., BRIZUELA, L., KEEZER, S., SARDET, C., DRAETTA, G. F., and GYURIS, J. (1997). Interaction between Cdc37 and Cdk4 in human cells. Oncogene, 14, 1999–2004. PERDEW, G. H., WIEGAND, H., VANDEN HEUVEL, J. P., MITCHELL, C., and SINGH, S. S. (1997). A 50 kilodalton protein associated with raf and pp60(v-src) protein kinases is a mammalian homolog of the cell cycle control protein cdc37. Biochemistry, 36, 3600–3607. SILIGARDI, G., PANARETOU, B., MEYER, P., SINGH, S., WOOLFSON, D. N., PIPER, P. W., PEARL, L. H., and PRODROMOU, C. (2002). Regulation of Hsp90 ATPase activity by the co-chaperone Cdc37p/p50cdc37. J. Biol. Chem, 277, 20151–20159. ROE, S. M., ALI, M. M., MEYER, P., VAUGHAN, C. K., PANARETOU, B., PIPER, P. W., PRODROMOU, C., and PEARL, L. H. (2004). The Mechanism of Hsp90 regulation by the protein kinase-specific cochaperone p50(cdc37). Cell, 116, 87–98. LEE, P., SHABBIR, A., CARDOZO, C., and CAPLAN, A. J. (2004). Sti1 and Cdc37 can stabilize Hsp90 in chaperone complexes with a protein kinase. Mol. Biol. Cell, 15, 1785–1792. MACLEAN, M. and PICARD, D. (2003). Cdc37 goes beyond Hsp90 and kinases. Cell Stress. Chaperones., 8, 114–119. ABBAS-TERKI, T., BRIAND, P. A., DONZE, O., and PICARD, D. (2002). The Hsp90 co-chaperones Cdc37 and Sti1 interact physically and genetically. Biol. Chem, 383, 1335–1342. KIMURA, Y., RUTHERFORD, S. L., MIYATA, Y., YAHARA, I., FREEMAN, B. C., YUE, L., MORIMOTO, R. I., and LINDQUIST, S. (1997). Cdc37 is a molecular chaperone with specific functions in signal transduction. Genes Dev., 11, 1775–1785.
39
40 The Hsp90 Family of Molecular Chaperones
177 YOUNG, J. C., HOOGENRAAD, N. J., and HARTL, F. U. (2003). Molecular chaperones Hsp90 and Hsp70 deliver preproteins to the mitochondrial import receptor Tom70. Cell, 112, 41–50. 178 YOUNG, J. C., BARRAL, J. M., and ULRICH, H. F. (2003). More than folding: localized functions of cytosolic chaperones. Trends Biochem. Sci., 28, 541–547. 179 BALLINGER, C. A., CONNELL, P., WU, Y., HU, Z., THOMPSON, L. J., YIN, L. Y., and PATTERSON, C. (1999). Identification of CHIP, a novel tetratricopeptide repeat-containing protein that interacts with heat shock proteins and negatively regulates chaperone functions. Mol. Cell Biol., 19, 4535–4545. 180 MURATA, S., MINAMI, Y., MINAMI, M., CHIBA, T., and TANAKA, K. (2001). CHIP is a chaperone-dependent E3 ligase that ubiquitylates unfolded protein. EMBO Rep., 2, 1133–1138. 181 KITAGAWA, K., SKOWYRA, D., ELLEDGE, S. J., HARPER, J. W., and HIETER, P. (1999). SGT1 encodes an essential component of the yeast kinetochore assembly pathway and a novel subunit of the SCF ubiquitin ligase complex. Mol. Cell, 4, 21–33. 182 TAKAHASHI, A., CASAIS, C., ICHIMURA, K., and SHIRASU, K. (2003). HSP90 interacts with RAR1 and SGT1 and is essential for RPS2-mediated disease resistance in Arabidopsis. Proc. Natl. Acad. Sci. U.S.A, 100, 11777–11782. 183 LIU, Y., BURCH-SMITH, T., SCHIFF, M., FENG, S., and DINESH-KUMAR, S. P. (2004). Molecular chaperone Hsp90 associates with resistance protein N and its signaling proteins SGT1 and Rar1 to modulate an innate immune response in plants. J. Biol. Chem., 279, 2101–2108. 184 AUSTIN, M. J., MUSKETT, P., KAHN, K., FEYS, B. J., JONES, J. D., and PARKER, J. E. (2002). Regulatory role of SGT1 in early R gene-mediated plant defenses. Science, 295, 2077–2080. 185 GARCIA-RANEA, J. A., MIREY, G., CAMONIS, J., and VALENCIA, A. (2002). p23 and HSP20/alpha-crystallin proteins define a conserved sequence domain present in
186
187
188
189
190
191
192
193
other eukaryotic protein families. FEBS Lett., 529, 162–167. LEE, Y. T., JACOB, J., MICHOWSKI, W., NOWOTNY, M., KUZNICKI, J., and CHAZIN, W. J. (2004). Human Sgt1 Binds HSP90 through the CHORD-Sgt1 Domain and Not the Tetratricopeptide Repeat Domain. J. Biol. Chem., 279, 16511–16517. NATHAN, D. F., VOS, M. H., and LINDQUIST, S. (1999). Identification of SSF1, CNS1, and HCH1 as multicopy suppressors of a Saccharomyces cerevisiae Hsp90 loss-of-function mutation. Proc. Natl. Acad. Sci. U.S.A, 96, 1409–1414. MEYER, P., PRODROMOU, C., LIAO, C., HU, B., MARK, R. S., VAUGHAN, C. K., VLASIC, I., PANARETOU, B., PIPER, P. W., and PEARL, L. H. (2004). Structural basis for recruitment of the ATPase activator Aha1 to the Hsp90 chaperone machinery. EMBO J., 23, 511–519. LOTZ, G. P., LIN, H., HARST, A., and OBERMANN, W. M. (2003). Aha1 binds to the middle domain of Hsp90, contributes to client protein activation, and stimulates the ATPase activity of the molecular chaperone. J. Biol. Chem., 278, 17228–17235. DOLINSKI, K. J., CARDENAS, M. E., and HEITMAN, J. (1998). CNS1 encodes an essential p60/Sti1 homolog in Saccharomyces cerevisiae that suppresses cyclophilin 40 mutations and interacts with Hsp90. Mol. Cell Biol., 18, 7344–7352. CREVEL, G., BATES, H., HUIKESHOVEN, H., and COTTERILL, S. (2001). The Drosophila Dpit47 protein is a nuclear Hsp90 co-chaperone that interacts with DNA polymerase alpha. J. Cell Sci., 114, 2015–2025. HAINZL, O., WEGELE, H., RICHTER, K., and BUCHNER, J. (2004). Cns1 is an activator of the Ssa1 ATPase activity. J. Biol. Chem. MARSH, J. A., KALTON, H. M., and GABER, R. F. (1998). Cns1 is an essential protein associated with the hsp90 chaperone complex in Saccharomyces cerevisiae that can restore cyclophilin 40-dependent functions in cpr7Delta
References
194
195
196
197
198
199
200
201
202
cells. Mol. Cell Biol., 18, 7353–7359. TESIC, M., MARSH, J. A., CULLINAN, S. B., and GABER, R. F. (2003). Functional Interactions between Hsp90 and the Co-chaperones Cns1 and Cpr7 in Saccharomyces cerevisiae. J. Biol. Chem, 278, 32692–32701. RAMADOSS, P., PETRULIS, J. R., HOLLINGSHEAD, B. D., KUSNADI, A., and PERDEW, G. H. (2004). Divergent roles of hepatitis B virus X-associated protein 2 (XAP2) in human versus mouse Ah receptor complexes. Biochemistry, 43, 700–709. MEYER, B. K. and PERDEW, G. H. (1999). Characterization of the AhR-hsp90-XAP2 core complex and the role of the immunophilin-related protein XAP2 in AhR stabilization. Biochemistry, 38, 8907–8917. LEES, M. J., PEET, D. J., and WHITELAW, M. L. (2003). Defining the role for XAP2 in stabilisation of the dioxin receptor. J. Biol. Chem. ANGELETTI, P. C., WALKER, D., and PANGANIBAN, A. T. (2002). Small glutamine-rich protein/viral protein U-binding protein is a novel cochaperone that affects heat shock protein 70 activity. Cell Stress. Chaperones., 7, 258–268. BRYCHZY, A., REIN, T., WINKLHOFER, K. F., HARTL, F. U., YOUNG, J. C., and OBERMANN, W. M. (2003). Cofactor Tpr2 combines two TPR domains and a J domain to regulate the Hsp70/Hsp90 chaperone system. EMBO J., 22, 3613–3623. BOEKE, J. D., TRUEHEART, J., NATSOULIS, G., and FINK, G. R. (1987). 5-Fluoroorotic acid as a selective agent in yeast molecular genetics. Methods Enzymol., 154, 164–175. SCHEIBEL, T., WEIKL, T., RIMERMAN, R., SMITH, D., LINDQUIST, S., and BUCHNER, J. (1999). Contribution of N- and C-terminal domains to the function of Hsp90 in Saccharomyces cerevisiae. Mol. Microbiol., 34, 701–713. OBERMANN, W. M., SONDERMANN, H., RUSSO, A. A., PAVLETICH, N. P., and HARTL, F. U. (1998). In vivo function of
203
204
205
206
207
208
209
210
211
212
Hsp90 is dependent on ATP binding and ATP hydrolysis. J. Cell Biol., 143, 901–910. PICARD, D., KHURSHEED, B., GARABEDIAN, M. J., FORTIN, M. G., LINDQUIST, S., and YAMAMOTO, K. R. (1990). Reduced levels of hsp90 compromise steroid receptor action in vivo. Nature, 348, 166–168. ALI, J. A., JACKSON, A. P., HOWELLS, A. J., and MAXWELL, A. (1993). The 43-kilodalton N-terminal fragment of the DNA gyrase B protein hydrolyzes ATP and binds coumarin drugs. Biochemistry, 32, 2717–2724. LANZETTA, P. A., ALVAREZ, L. J., REINACH, P. S., and CANDIA, O. A. (1979). An improved assay for nanomole amounts of inorganic phosphate. Anal. Biochem., 100, 95–97. KORNBERG, A., SCOTT, J. F., and BERTSCH, L. L. (1978). ATP utilization by rep protein in the catalytic separation of DNA strands at a replicating fork. J. Biol. Chem., 253, 3298–3304. FREEMAN, B. C., FELTS, S. J., TOFT, D. O., and YAMAMOTO, K. R. (2000). The p23 molecular chaperones act at a late step in intracellular receptor action to differentially affect ligand efficacies. Genes Dev., 14, 422–434. MALMQVIST, M. (1993). Surface plasmon resonance for detection and measurement of antibody-antigen affinity and kinetics. Curr. Opin. Immunol., 5, 282–286. SZABO, A., STOLZ, L., and GRANZOW, R. (1995). Surface plasmon resonance and its use in biomolecular interaction analysis (BIA). Curr. Opin. Struct. Biol., 5, 699–705. WEGELE, H., MULLER, L., and BUCHNER, J. (2004). Hsp70 and Hsp90-a relay team for protein folding. Rev. Physiol Biochem. Pharmacol. TAYLOR, P., DORNAN, J., CARRELLO, A., MINCHIN, R. F., RATAJCZAK, T., and WALKINSHAW, M. D. (2001). Two structures of cyclophilin 40: folding and fidelity in the TPR domains. Structure. (Camb.), 9, 431–438. JOAB, I., RADANYI, C., RENOIR, M., BUCHOU, T., CATELLI, M. G., BINART, N., MESTER, J., and BAULIEU, E. E. (1984).
41
42 The Hsp90 Family of Molecular Chaperones
213
214
215
216
217
218
219
220
221
Common non-hormone binding component in non-transformed chick oviduct receptors of four steroid hormones. Nature, 308, 850–853. VELDSCHOLTE, J., BERREVOETS, C. A., BRINKMANN, A. O., GROOTEGOED, J. A., and MULDER, E. (1992). Anti-androgens and the mutated androgen receptor of LNCaP cells: differential effects on binding affinity, heat-shock protein interaction, and transcription activation. Biochemistry, 31, 2393–2399. DENIS, M., CUTHILL, S., WIKSTROM, A. C., POELLINGER, L., and GUSTAFSSON, J. A. (1988). Association of the dioxin receptor with the Mr 90,000 heat shock protein: a structural kinship with the glucocorticoid receptor. Biochem. Biophys. Res. Commun., 155, 801–807. PERDEW, G. H. (1988). Association of the Ah receptor with the 90-kDa heat shock protein. J. Biol. Chem., 263, 13802–13805. YOSHINARI, K., KOBAYASHI, K., MOORE, R., KAWAMOTO, T., and NEGISHI, M. (2003). Identification of the nuclear receptor CAR:HSP90 complex in mouse liver and recruitment of protein phosphatase 2A in response to phenobarbital. FEBS Lett., 548, 17–20. ARBEITMAN, M. N. and HOGNESS, D. S. (2000). Molecular chaperones activate the Drosophila ecdysone receptor, an RXR heterodimer. Cell, 101, 67–77. SABBAH, M., REDEUILH, G., and BAULIEU, E. E. (1989). Subunit composition of the estrogen receptor. Involvement of the hormone-binding domain in the dimeric state. J. Biol. Chem., 264, 2397–2400. BEN OR, S. (1989). Evidence that 5 S intermediate state in glucocorticoid receptor transformation contains hsp90 in addition to the steroid-binding protein. J. Steroid Biochem., 33, 899–906. ZHANG, L., HACH, A., and WANG, C. (1998). Molecular mechanism governing heme signaling in yeast: a higher-order complex mediates heme regulation of the transcriptional activator HAP1. Mol. Cell Biol., 18, 3819–3828. ALI, A., BHARADWAJ, S., O’CARROLL, R., and OVSENEK, N. (1998). HSP90 interacts with and regulates the activity of heat
222
223
224
225
226
227
228
shock factor 1 in Xenopus oocytes. Mol. Cell Biol., 18, 4949–4960. MINET, E., MOTTET, D., MICHEL, G., ROLAND, I., RAES, M., REMACLE, J., and MICHIELS, C. (1999). Hypoxia-induced activation of HIF-1: role of HIF-1alpha-Hsp90 interaction. FEBS Lett., 460, 251–256. RAFESTIN-OBLIN, M. E., COUETTE, B., RADANYI, C., LOMBES, M., and BAULIEU, E. E. (1989). Mineralocortico-steroid receptor of the chick intestine. Oligomeric structure and transformation. J. Biol. Chem., 264, 9304–9309. KOMORI, A., SUEOKA, E., FUJIKI, H., ISHII, M., and KOZU, T. (1999). Association of MTG8 (ETO/CDR), a leukemia-related protein, with serine/threonine protein kinases and heat shock protein HSP90 in human hematopoietic cell lines. Jpn. J. Cancer Res., 90, 60–68. SUMANASEKERA, W. K., TIEN, E. S., DAVIS, J. W., TURPEY, R., PERDEW, G. H., and VANDEN HEUVEL, J. P. (2003). Heat shock protein-90 (Hsp90) acts as a repressor of peroxisome proliferator-activated receptor-alpha (PPARalpha) and PPARbeta activity. Biochemistry, 42, 10726–10735. SUMANASEKERA, W. K., TIEN, E. S., TURPEY, R., VANDEN HEUVEL, J. P., and PERDEW, G. H. (2003). Evidence that peroxisome proliferator-activated receptor alpha is complexed with the 90-kDa heat shock protein and the hepatitis virus B X-associated protein 2. J. Biol. Chem., 278, 4467–4473. CATELLI, M. G., BINART, N., JUNG-TESTAS, I., RENOIR, J. M., BAULIEU, E. E., FERAMISCO, J. R., and WELCH, W. J. (1985). The common 90-kd protein component of non-transformed ‘8S’ steroid receptors is a heat-shock protein. EMBO J., 4, 3131–3135. SCHUH, S., YONEMOTO, W., BRUGGE, J., BAUER, V. J., RIEHL, R. M., SULLIVAN, W. P., and TOFT, D. O. (1985). A 90,000-dalton binding protein common to both steroid receptors and the Rous sarcoma virus transforming protein, pp60v-src. J. Biol. Chem., 260, 14292–14296.
References 229 HOLLEY, S. J. and YAMAMOTO, K. R. (1995). A role for Hsp90 in retinoid receptor signal transduction. Mol. Biol. Cell, 6, 1833–1842. 230 MCGUIRE, J., COUMAILLEAU, P., WHITELAW, M. L., GUSTAFSSON, J. A., and POELLINGER, L. (1995). The basic helix-loop-helix/PAS factor Sim is associated with hsp90. Implications for regulation by interaction with partner factors. J. Biol. Chem., 270, 31353–31357. 231 SHAH, M., PATEL, K., FRIED, V. A., and SEHGAL, P. B. (2002). Interactions of STAT3 with caveolin-1 and heat shock protein 90 in plasma membrane raft and cytosolic complexes: preservation of cytokine signaling during fever. J. Biol. Chem. 232 MIYATA, Y. and YAHARA, I. (2000). p53-independent association between SV40 large T antigen and the major cytosolic heat shock protein, HSP90. Oncogene, 19, 1477–1484. 233 HASHIMOTO, Y. and SHUDO, K. (1991). Cytosolic-nuclear tumor promoter-specific binding protein: association with the 90 kDa heat shock protein and translocation into nuclei by treatment with 12-O-tetradecanoyl-phorbol 13-acetate. Jpn. J. Cancer Res., 82, 665–675. 234 PRIVALSKY, M. L. (1991). A subpopula-tion of the v-erb A oncogene protein, a derivative of a thyroid hormone receptor, associates with heat shock protein 90. J. Biol. Chem., 266, 1456–1462. 235 BRUNT, S. A., PERDEW, G. H., TOFT, D. O., and SILVER, J. C. (1998). Hsp90-containing multiprotein complexes in the eukaryotic microbe Achlya. Cell Stress. Chaperones., 3, 44–56. 236 HOLT, S. E., AISNER, D. L., BAUR, J., TESMER, V. M., DY, M., OUELLETTE, M., TRAGER, J. B., MORIN, G. B., TOFT, D. O., SHAY, J. W., WRIGHT, W. E., and WHITE, M. A. (1999). Functional requirement of p23 and Hsp90 in telomerase complexes. Genes Dev., 13, 817–826. 237 HU, J. and SEEGER, C. (1996). Hsp90 is required for the activity of a hepatitis B virus reverse transcriptase. Proc. Natl. Acad. Sci. U.S.A, 93, 1060–1064.
238 FUJITA, N., SATO, S., ISHIDA, A., and TSURUO, T. (2002). Involvement of Hsp90 in signaling and stability of 3-phosphoinositide-dependent kinase-1. J. Biol. Chem., 277, 10346–10353. 239 SATO, S., FUJITA, N., and TSURUO, T. (2000). Modulation of Akt kinase activity by binding to Hsp90. Proc. Natl. Acad. Sci. U.S.A, 97, 10832–10837. 240 MIYATA, Y. and NISHIDA, E. (2004). CK2 controls multiple protein kinases by phosphorylating a kinase-targeting molecular chaperone, Cdc37. Mol. Cell Biol., 24, 4065–4074. 241 PALMQUIST, K., RIIS, B., NILSSON, A., and NYGARD, O. (1994). Interaction of the calcium and calmodulin regulated eEF-2 kinase with heat shock protein 90. FEBS Lett., 349, 239–242. 242 MUNOZ, M. J. and JIMENEZ, J. (1999). Genetic interactions between Hsp90 and the Cdc2 mitotic machinery in the fission yeast Schizosaccharomyces pombe. Mol. Gen. Genet., 261, 242–250. 243 MAHONY, D., PARRY, D. A., and LEES, E. (1998). Active cdk6 complexes are predominantly nuclear and represent only a minority of the cdk6 in T cells. Oncogene, 16, 603–611. 244 O’KEEFFE, B., FONG, Y., CHEN, D., ZHOU, S., and ZHOU, Q. (2000). Requirement for a kinase-specific chaperone pathway in the production of a Cdk9/cyclin T1 heterodimer responsible for P-TEFb-mediated tat stimulation of HIV-1 transcription. J. Biol. Chem., 275, 279–287. 245 ARLANDER, S. J., EAPEN, A. K., VROMAN, B. T., MCDONALD, R. J., TOFT, D. O., and KARNITZ, L. M. (2003). Hsp90 inhibition depletes Chk1 and sensitizes tumor cells to replication stress. J. Biol. Chem., 278, 52572–52577. 246 FISHER, D. L., MANDART, E., and DOREE, M. (2000). Hsp90 is required for c-Mos activation and biphasic MAP kinase activation in Xenopus oocytes. EMBO J., 19, 1516–1524. 247 STANCATO, L. F., SILVERSTEIN, A. M., OWENS-GRILLO, J. K., CHOW, Y. H., JOVE, R., and PRATT, W. B. (1997). The hsp90-binding antibiotic geldanamycin decreases Raf levels and epidermal
43
44 The Hsp90 Family of Molecular Chaperones
248
249
250
251
252
253
254
255
growth factor signaling without disrupting formation of signaling complexes or reducing the specific enzymatic activity of Raf kinase. J. Biol. Chem., 272, 4013–4020. XU, W., MIMNAUGH, E., ROSSER, M. F., NICCHITTA, C., MARCU, M., YARDEN, Y., and NECKERS, L. (2001). Sensitivity of mature Erbb2 to geldanamycin is conferred by its kinase domain and is mediated by the chaperone protein Hsp90. J. Biol. Chem., 276, 3702–3708. MINAMI, Y., KIYOI, H., YAMAMOTO, Y., YAMAMOTO, K., UEDA, R., SAITO, H., and NAOE, T. (2002). Selective apoptosis of tandemly duplicated FLT3-transformed leukemia cells by Hsp90 inhibitors. Leukemia, 16, 1535–1540. MASSON-GADAIS, B., HOULE, F., LAFERRIERE, J., and HUOT, J. (2003). Integrin alphavbeta3, requirement for VEGFR2-mediated activation of SAPK2/p38 and for Hsp90-dependent phosphorylation of focal adhesion kinase in endothelial cells activated by VEGF. Cell Stress. Chaperones., 8, 37–52. LUO, J. and BENOVIC, J. L. (2003). G protein-coupled receptor kinase interaction with Hsp90 mediates kinase maturation. J. Biol. Chem., 278, 50908–50914. SCHOLZ, G., HARTSON, S. D., CARTLEDGE, K., HALL, N., SHAO, J., DUNN, A. R., and MATTS, R. L. (2000). p50(Cdc37) can buffer the temperature-sensitive properties of a mutant of Hck. Mol. Cell Biol., 20, 6984–6995. MATTS, R. L. and HURST, R. (1989). Evidence for the association of the heme-regulated eIF-2 alpha kinase with the 90-kDa heat shock protein in rabbit reticulocyte lysate in situ. J. Biol. Chem., 264, 15542–15547. ROSE, D. W., WETTENHALL, R. E., KUDLICKI, W., KRAMER, G., and HARDESTY, B. (1987). The 90-kilodalton peptide of the heme-regulated eIF-2 alpha kinase has sequence similarity with the 90-kilodalton heat shock protein. Biochemistry, 26, 6583–6587. LEWIS, J., DEVIN, A., MILLER, A., LIN, Y., RODRIGUEZ, Y., NECKERS, L., and LIU, Z. G. (2000). Disruption of hsp90 function
256
257
258
259
260
261
262
263
results in degradation of the death domain kinase, receptor-interacting protein (RIP), and blockage of tumor necrosis factor-induced nuclear factor-kappaB activation. J. Biol. Chem., 275, 10519–10526. TAKATA, Y., IMAMURA, T., IWATA, M., USUI, I., HARUTA, T., NANDACHI, N., ISHIKI, M., SASAOKA, T., and KOBAYASHI, M. (1997). Functional importance of heat shock protein 90 associated with insulin receptor on insulin-stimulated mitogenesis. Biochem. Biophys. Res. Commun., 237, 345–347. JEROME, V., LEGER, J., DEVIN, J., BAULIEU, E. E., and CATELLI, M. G. (1991). Growth factors acting via tyrosine kinase receptors induce HSP90 alpha gene expression. Growth Factors, 4, 317–327. MARCU, M. G., DOYLE, M., BERTOLOTTI, A., RON, D., HENDERSHOT, L., and NECKERS, L. (2002). Heat shock protein 90 modulates the unfolded protein response by stabilizing IRE1alpha. Mol. Cell Biol., 22, 8506–8513. BOUDEAU, J., DEAK, M., LAWLOR, M. A., MORRICE, N. A., and ALESSI, D. R. (2003). Heat-shock protein 90 and Cdc37 interact with LKB1 and regulate its stability. Biochem. J., 370, 849–857. NONY, P., GAUDE, H., ROSSEL, M., FOURNIER, L., ROUAULT, J. P., and BILLAUD, M. (2003). Stability of the Peutz-Jeghers syndrome kinase LKB1 requires its binding to the molecular chaperones Hsp90/Cdc37. Oncogene, 22, 9165–9175. HARTSON, S. D. and MATTS, R. L. (1994). Association of Hsp90 with cellular Src-family kinases in a cell-free system correlates with altered kinase structure and function. Biochemistry, 33, 8912–8920. MIYATA, Y., IKAWA, Y., SHIBUYA, M., and NISHIDA, E. (2001). Specific association of a set of molecular chaperones including HSP90 and Cdc37 with MOK, a member of the mitogen-activated protein kinase superfamily. J. Biol. Chem., 276, 21841–21848. CISSEL, D. S. and BEAVEN, M. A. (2000). Disruption of Raf-1/heat shock protein 90 complex and Raf signaling by
References
264
265
266
267
268
269
270
271
272
dexamethasone in mast cells. J. Biol. Chem., 275, 7066–7070. GOES, F. S. and MARTIN, J. (2001). Hsp90 chaperone complexes are required for the activity and stability of yeast protein kinases Mik1, Wee1 and Swe1. Eur. J. Biochem, 268, 2281–2289. BONVINI, P., GASTALDI, T., FALINI, B., and ROSOLEN, A. (2002). Nucleophosmin-anaplastic lymphoma kinase (NPM-ALK), a novel Hsp90-client tyrosine kinase: down-regulation of NPM-ALK expression and tyrosine phosphorylation in ALK(+) CD30(+) lymphoma cells by the Hsp90 antagonist 17-allylamino, 17-demethoxygeldana mycin. Cancer Res., 62, 1559–1566. FLANAGAN, C. A. and THORNER, J. (1992). Purification and characterization of a soluble phosphatidylinositol 4-kinase from the yeast Saccharomyces cerevisiae. J. Biol. Chem., 267, 24117–24125. MIZUNO, K., SHIROGANE, T., SHINOHARA, A., IWAMATSU, A., HIBI, M., and HIRANO, T. (2001). Regulation of Pim-1 by Hsp90. Biochem. Biophys. Res. Commun., 281, 663–669. SAKAGAMI, M., MORRISON, P., and WELCH, W. J. (1999). Benzoquinoid ansamycins (herbimycin A and geldanamycin) interfere with the maturation of growth factor receptor tyrosine kinases. Cell Stress. Chaperones., 4, 19–28. de CARCER, G., DO CARMO, A. M., LALLENA, M. J., GLOVER, D. M., and GONZALEZ, C. (2001). Requirement of Hsp90 for centrosomal function reflects its regulation of Polo kinase stability. EMBO J., 20, 2878–2884. JAISWAL, R. K., WEISSINGER, E., KOLCH, W., and LANDRETH, G. E. (1996). Nerve growth factor-mediated activation of the mitogen-activated protein (MAP) kinase cascade involves a signaling complex containing B-Raf and HSP90. J. Biol. Chem., 271, 23626–23629. LOVRIC, J., BISCHOF, O., and MOELLING, K. (1994). Cell cycle-dependent association of Gag-Mil and hsp90. FEBS Lett., 343, 15–21. STANCATO, L. F., CHOW, Y. H., HUTCHISON, K. A., PERDEW, G. H., JOVE, R., and PRATT, W. B. (1993). Raf exists in
273
274
275
276
277
278
279
280
281
a native heterocomplex with hsp90 and p50 that can be reconstituted in a cell-free system. J. Biol. Chem., 268, 21711–21716. WARTMANN, M. and DAVIS, R. J. (1994). The native structure of the activated Raf protein kinase is a membrane-bound multi-subunit complex. J. Biol. Chem., 269, 6695–6701. CUTFORTH, T. and RUBIN, G. M. (1994). Mutations in Hsp83 and cdc37 impair signaling by the sevenless receptor tyrosine kinase in Drosophila. Cell, 77, 1027–1036. DONZE, O. and PICARD, D. (1999). Hsp90 binds and regulates Gcn2, the ligand-inducible kinase of the alpha subunit of eukaryotic translation initiation factor 2 [corrected]. Mol. Cell Biol., 19, 8422–8432. BERNSTEIN, S. L., RUSSELL, P., WONG, P., FISHELEVICH, R., and SMITH, L. E. (2001). Heat shock protein 90 in retinal ganglion cells: association with axonally transported proteins. Vis. Neurosci., 18, 429–436. LIPSICH, L. A., CUTT, J. R., and BRUGGE, J. S. (1982). Association of the transforming proteins of Rous, Fujinami, and Y73 avian sarcoma viruses with the same two cellular proteins. Mol. Cell Biol., 2, 875–880. ZIEMIECKI, A., CATELLI, M. G., JOAB, I., and MONCHARMONT, B. (1986). Association of the heat shock protein hsp90 with steroid hormone receptors and tyrosine kinase oncogene products. Biochem. Biophys. Res. Commun., 138, 1298–1307. HUTCHISON, K. A., BROTT, B. K., De LEON, J. H., PERDEW, G. H., JOVE, R., and PRATT, W. B. (1992). Reconstitution of the multiprotein complex of pp60src, hsp90, and p50 in a cell-free system. J. Biol. Chem., 267, 2902–2908. OPPERMANN, H., LEVINSON, W., and BISHOP, J. M. (1981). A cellular protein that associates with the transforming protein of Rous sarcoma virus is also a heat-shock protein. Proc. Natl. Acad. Sci. U.S.A, 78, 1067–1071. KOYASU, S., NISHIDA, E., KADOWAKI, T., MATSUZAKI, F., IIDA, K., HARADA, F.,
45
46 The Hsp90 Family of Molecular Chaperones
282
283
284
285
286
287
288
289
290
KASUGA, M., SAKAI, H., and YAHARA, I. (1986). Two mammalian heat shock proteins, HSP90 and HSP100, are actin-binding proteins. Proc. Natl. Acad. Sci. U.S.A, 83, 8054–8058. SALEH, A., SRINIVASULA, S. M., BALKIR, L., ROBBINS, P. D., and ALNEMRI, E. S. (2000). Negative regulation of the Apaf-1 apoptosome by Hsp70. Nat. Cell Biol., 2, 476–483. GUSAROVA, V., CAPLAN, A. J., BRODSKY, J. L., and FISHER, E. A. (2001). Apoprotein B degradation is promoted by the molecular chaperones hsp90 and hsp70. J. Biol. Chem., 276, 24891–24900. KUMAR, R., GRAMMATIKAKIS, N., and CHINKERS, M. (2001). Regulation of the atrial natriuretic peptide receptor by heat shock protein 90 complexes. J. Biol. Chem., 276, 11371–11375. ZHAO, C. and WANG, E. (2004). Heat shock protein 90 suppresses tumor necrosis factor alpha induced apoptosis by preventing the cleavage of Bid in NIH3T3 fibroblasts. Cell Signal., 16, 313–321. NISHIDA, E., KOYASU, S., SAKAI, H., and YAHARA, I. (1986). Calmodulin-regulated binding of the 90-kDa heat shock protein to actin filaments. J. Biol. Chem., 261, 16033–16036. BOGATCHEVA, N. V., MA, Y., UROSEV, D., and GUSEV, N. B. (1999). Localization of calponin binding sites in the structure of 90 kDa heat shock protein (Hsp90). FEBS Lett., 457, 369–374. UZAWA, M., GRAMS, J., MADDEN, B., TOFT, D., and SALISBURY, J. L. (1995). Identification of a complex between centrin and heat shock proteins in CSF-arrested Xenopus oocytes and dissociation of the complex following oocyte activation. Dev. Biol., 171, 51–59. IMAI, J. and YAHARA, I. (2000). Role of HSP90 in salt stress tolerance via stabilization and regulation of calcineurin. Mol. Cell Biol., 20, 9262–9270. LOO, M. A., JENSEN, T. J., CUI, L., HOU, Y., CHANG, X. B., and RIORDAN, J. R. (1998). Perturbation of Hsp90 interaction with nascent CFTR prevents its maturation
291
292
293
294
295
296
297
298
299
and accelerates its degradation by the proteasome. EMBO J., 17, 6879–6887. STEMMANN, O., NEIDIG, A., KOCHER, T., WILM, M., and LECHNER, J. (2002). Hsp90 enables Ctf13p/Skp1p to nucleate the budding yeast kinetochore. Proc. Natl. Acad. Sci. U.S.A, 99, 8585–8590. GARCIA-CARDENA, G., FAN, R., SHAH, V., SORRENTINO, R., CIRINO, G., PAPAPETROPOULOS, A., and SESSA, W. C. (1998). Dynamic activation of endothelial nitric oxide synthase by Hsp90. Nature, 392, 821–824. FICKER, E., DENNIS, A. T., WANG, L., and BROWN, A. M. (2003). Role of the cytosolic chaperones Hsp70 and Hsp90 in maturation of the cardiac potassium channel HERG. Circ. Res., 92, e87–100. HOSHINO, T., WANG, J., DEVETTEN, M. P., IWATA, N., KAJIGAYA, S., WISE, R. J., LIU, J. M., and YOUSSOUFIAN, H. (1998). Molecular chaperone GRP94 binds to the Fanconi anemia group C protein and regulates its intracellular expression. Blood, 91, 4379–4386. INANOBE, A., TAKAHASHI, K., and KATADA, T. (1994). Association of the beta gamma subunits of trimeric GTP-binding proteins with 90-kDa heat shock protein, hsp90. J. Biochem. (Tokyo), 115, 486–492. BUSCONI, L., GUAN, J., and DENKER, B. M. (2000). Degradation of heterotrimeric Galpha(o) subunits via the proteosome pathway is induced by the hsp90-specific compound geldanamycin. J. Biol. Chem., 275, 1565–1569. VAISKUNAITE, R., KOZASA, T., and VOYNO-YASENETSKAYA, T. A. (2001). Interaction between the G alpha subunit of heterotrimeric G(12) protein and Hsp90 is required for G alpha(12) signaling. J. Biol. Chem., 276, 46088–46093. TAHBAZ, N., CARMICHAEL, J. B., and HOBMAN, T. C. (2001). GERp95 belongs to a family of signal-transducing proteins and requires Hsp90 activity for stability and Golgi localization. J. Biol. Chem., 276, 43294–43299. MAYAMA, J., KUMANO, T., HAYAKARI, M., YAMAZAKI, T., AIZAWA, S., KUDO, T., and TSUCHIDA, S. (2003). Polymorphic glutathione S-transferase subunit 3 of
References
300
301
302
303
304
305
306
307
rat liver exhibits different susceptibilities to carbon tetrachloride: differences in their interactions with heat-shock protein 90. Biochem. J., 372, 611–616. VENEMA, R. C., VENEMA, V. J., JU, H., HARRIS, M. B., SNEAD, C., JILLING, T., DIMITROPOULOU, C., MARAGOUDAKIS, M. E., and CATRAVAS, J. D. (2003). Novel complexes of guanylate cyclase with heat shock protein 90 and nitric oxide synthase. Am. J. Physiol Heart Circ. Physiol, 285, H669–H678. HERBERTSSON, H., KUHME, T., and HAMMARSTROM, S. (1999). The 650-kDa 12(S)-hydroxyeicosatetraenoic acid binding complex: occurrence in human platelets, identification of hsp90 as a constituent, and binding properties of its 50-kDa subunit. Arch. Biochem. Biophys., 367, 33–38. SCHNAIDER, T., OIKARINEN, J., ISHIWATARI-HAYASAKA, H., YAHARA, I., and CSERMELY, P. (1999). Interactions of Hsp90 with histones and related peptides. Life Sci., 65, 2417–2426. JOLY, G. A., AYRES, M., and KILBOURN, R. G. (1997). Potent inhibition of inducible nitric oxide synthase by geldanamycin, a tyrosine kinase inhibitor, in endothelial, smooth muscle cells, and in rat aorta. FEBS Lett., 403, 40–44. AGARRABERES, F. A. and DICE, J. F. (2001). A molecular chaperone complex at the lysosomal membrane is required for protein translocation. J. Cell Sci., 114, 2491–2499. NAKAMURA, T., HINAGATA, J., TANAKA, T., IMANISHI, T., WADA, Y., KODAMA, T., and DOI, T. (2002). HSP90, HSP70, and GAPDH directly interact with the cytoplasmic domain of macrophage scavenger receptors. Biochem. Biophys. Res. Commun., 290, 858–864. KANG, J., KIM, T., KO, Y. G., RHO, S. B., PARK, S. G., KIM, M. J., KWON, H. J., and KIM, S. (2000). Heat shock protein 90 mediates protein-protein interactions between human aminoacyl-tRNA synthetases. J. Biol. Chem., 275, 31682–31688. PENG, Y., CHEN, L., LI, C., LU, W., and CHEN, J. (2001). Inhibition of MDM2 by hsp90 contributes to mutant p53
308
309
310
311
312
313
314
315
stabilization. J. Biol. Chem., 276, 40583–40590. KELLERMAYER, M. S. and CSERMELY, P. (1995). ATP induces dissociation of the 90 kDa heat shock protein (hsp90) from F-actin: interference with the binding of heavy meromyosin. Biochem. Biophys. Res. Commun., 211, 166–174. HUBERT, D. A., TORNERO, P., BELKHADIR, Y., KRISHNA, P., TAKAHASHI, A., SHIRASU, K., and DANGL, J. L. (2003). Cytosolic HSP90 associates with and modulates the Arabidopsis RPM1 disease resistance protein. EMBO J., 22, 5679–5689. BENDER, A. T., SILVERSTEIN, A. M., DEMADY, D. R., KANELAKIS, K. C., NOGUCHI, S., PRATT, W. B., and OSAWA, Y. (1999). Neuronal nitric-oxide synthase is regulated by the Hsp90-based chaperone system in vivo. J. Biol. Chem., 274, 1472–1478. ISHIWATARI-HAYASAKA, H., MARUYA, M., SREEDHAR, A. S., NEMOTO, T. K., CSERMELY, P., and YAHARA, I. (2003). Interaction of neuropeptide Y and Hsp90 through a novel peptide binding region. Biochemistry, 42, 12972–12980. ADINOLFI, E., KIM, M., YOUNG, M. T., DI VIRGILIO, F., and SURPRENANT, A. (2003). Tyrosine phosphorylation of HSP90 within the P2X7 receptor complex negatively regulates P2X7 receptors. J. Biol. Chem., 278, 37344–37351. MOMOSE, F., NAITO, T., YANO, K., SUGIMOTO, S., MORIKAWA, Y., and NAGATA, K. (2002). Identification of Hsp90 as a stimulatory host factor involved in influenza virus RNA synthesis. J. Biol. Chem., 277, 45306–45314. BRUNEAU, N., LOMBARDO, D., and BENDAYAN, M. (1998). Participation of GRP94-related protein in secretion of pancreatic bile salt-dependent lipase and in its internalization by the intestinal epithelium. J. Cell Sci., 111 (Pt 17), 2665–2679. BANUMATHY, G., SINGH, V., and TATU, U. (2002). Host chaperones are recruited in membrane-bound complexes by Plasmodium falciparum. J. Biol. Chem., 277, 3902–3912.
47
48 The Hsp90 Family of Molecular Chaperones
316 PAI, K. S., MAHAJAN, V. B., LAU, A., and CUNNINGHAM, D. D. (2001). Thrombin receptor signaling to cytoskeleton requires Hsp90. J. Biol. Chem., 276, 32642–32647. 317 TSUBUKI, S., SAITO, Y., and KAWASHIMA, S. (1994). Purification and characterization of an endogenous inhibitor specific to the Z-Leu-Leu-Leu-MCA degrading activity in proteasome and its identification as heat-shock protein 90. FEBS Lett., 344, 229–233. 318 SAKISAKA, T., MEERLO, T., MATTESON, J., PLUTNER, H., and BALCH, W. E. (2002). Rab-alphaGDI activity is regulated by a Hsp90 chaperone complex. EMBO J., 21, 6125–6135. 319 HU, Y. and MIVECHI, N. F. (2003). HSF-1 interacts with Ral-binding protein 1 in a stress-responsive, multiprotein complex with HSP90 in vivo. J. Biol. Chem., 278, 17299–17306. 320 GILMORE, R., COFFEY, M. C., and LEE, P. W. (1998). Active participation of Hsp90 in the biogenesis of the trimeric reovirus cell attachment protein sigma1. J. Biol. Chem., 273, 15227–15233. 321 FORTUGNO, P., BELTRAMI, E., PLESCIA, J., FONTANA, J., PRADHAN, D., MARCHISIO, P. C., SESSA, W. C., and ALTIERI, D. C. (2003). Regulation of survivin function
322
323
324
325
326
by Hsp90. Proc. Natl. Acad. Sci. U.S.A, 100, 13791–13796. ABBAS-TERKI, T. and PICARD, D. (1999). Alpha-complemented beta-galactosidase. An in vivo model substrate for the molecular chaperone heat-shock protein 90 in yeast. Eur. J. Biochem., 266, 517–523. MURESAN, Z. and ARVAN, P. (1997). Thyroglobulin transport along the secretory pathway. Investigation of the role of molecular chaperone, GRP94, in protein export from the endoplasmic reticulum. J. Biol. Chem., 272, 26095–26102. SANCHEZ, E. R., REDMOND, T., SCHERRER, L. C., BRESNICK, E. H., WELSH, M. J., and PRATT, W. B. (1988). Evidence that the 90-kilodalton heat shock protein is associated with tubulin-containing complexes in L cell cytosol and in intact PtK cells. Mol. Endocrinol., 2, 756–760. MELNICK, J., AVIEL, S., and ARGON, Y. (1992). The endoplasmic reticulum stress protein GRP94, in addition to BiP, associates with unassembled immunoglobulin chains. J. Biol. Chem., 267, 21303–21306. HUNG, J. J., CHUNG, C. S., and CHANG, W. (2002). Molecular chaperone Hsp90 is important for vaccinia virus growth in cells. J. Virol., 76, 1379–1390.
1
Small Heat Shock Proteins Franz Narberhaus Ruhr-Universit¨at Bochum, Bochum, Germany
Martin Haslbeck Technische Universit¨at M¨unchen, Garching, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
The most poorly conserved molecular chaperone family comprises the vertebrate eye lens α-crystallins and numerous small heat shock proteins (sHsps). Although proteins belonging to this superfamily are diverse, most of them share six characteristic features: (1) a modestly conserved α-crystallin domain of about 90 amino acids, (2) a small molecular mass between 12 and 43 kDa, (3) induction by stress conditions, (4) chaperone-like activity in preventing other proteins from aggregation, (5) formation of large oligomers, and (6) a highly dynamic quaternary structure [1–9]. This article reports on the structural and functional properties of sHsps with an emphasis on their dynamic oligomeric nature.
2 α-Crystallins and the Small Heat Shock Protein Family: Diverse Yet Similar
The vertebrate eye lens α-crystallins are the founding members of a protein superfamily, the small heat shock proteins, sHsps or α-Hsps for short [2, 7, 8]. Two isoforms, αA- and αB-crystallin, which exhibit close to 60% amino acid sequence identity, usually occur in a molar ratio of three to one and makeup one-third of the eye lens [10]. As the name implies, functional α-crystallins are crucial for transparency of the eye lens, a crowded environment with an extremely high protein concentration. While αA-crystallin is mainly confined to the lenticular compartment, its close relative αB-crystallin was found to be a heat-inducible protein present in Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Small Heat Shock Proteins
many other tissues [11–13]. Ignolia and Craig were the first to notice that four small heat shock proteins from Drosophila are related to mammalian α-crystallins [14]. Meanwhile, members of the sHsp family have been reported in all kingdoms from archaea and bacteria to plants and animals [2, 3, 7, 8]. A notable exception are at least 11 pathogenic bacteria, whose completed genome sequences do not provide any hint at the presence of α-Hsp genes [2, 15, 16]. On the other hand, many organisms encode multiple sHsps. Distinct sHsp classes residing in different cellular compartments can be distinguished in plants. For example, the Arabidopsis genome revealed a total of 13 sHsps belonging to six classes defined on the basis of their intracellular localization and sequence similarity. Six additional open reading frames potentially encode further family members [17]. A list of representative sHsps from prokaryotes and eukaryotes is presented in Table 1. The molecular mass of the majority of sHsps is around 20 kDa giving rise to their designation small Hsp. However, the full range extends from 12 to 43 kDa. Unlike other molecular chaperones, sHsps are poorly conserved and often exhibit not more than 20% sequence identity in pairwise alignments. Common to all family members is a moderately conserved α-crystallin domain of about 90 amino acids flanked by a variable N-terminal region and a short C-terminal extension (Figure 1). Multiple sequence alignments including sHsps from all kingdoms revealed only very few consensus residues [2, 7, 8]. Not a single amino acid is invariant or shared by all family members. The most common denominators are two short motifs (FxRxxxL and AxxxxGVL) towards the end of the α-crystallin domain and an IxI sequence in the C-terminal extension (Figure 1). The N-terminal region varies substantially in both length and sequence. While Hsp12 proteins of Caenorhabditis elegans carry a short N-terminus of about 20 amino acids [18], Hsp42 from yeast owes its length to an extended N-terminal region [19]. Sequence similarities in the N-terminal domain are restricted to closely related proteins. Phylogenetic analyses suggest that sHsps diverged very early in evolution [7]. A complex history of repeated gene duplications and lateral gene transfer probably accounts for the presence of multiple sHsps in some organisms. Bacteria lacking these proteins might have lost the corresponding gene(s) during adaptation to specialized conditions. Since sHsps are generally less critical for survival than the major chaperones, weaker functional constraints might have allowed a much higher diversification as compared to Hsp60 and Hsp70. Many sHsps are not expressed under normal physiological conditions but rapidly accumulate during heat stress by classical prokaryotic or eukaryotic heat shock control mechanisms [2, 20, 21]. Microarray-based gene expression profiling in E. coli revealed a 300-fold induction of ibpAB, which by far exceeds the induction of all other heat shock genes [22]. Massive heat induction of sHsps was also reported in animal and plant tissues [3, 23]. Interestingly, the presence of sHsps is not restricted to eye lenses and heat-stressed cells. In bacteria, plants, and animals, they have been discovered during development processes, metabolic transitions, environmental insults, and many other conditions [2, 3, 9, 23, 24]. The broad expression pattern of sHsps is consistent with both a general protective function on one hand and specialized tasks in certain cell types on the other.
Large
αA/B-crystallin
C. elegans Wheat Pea Mouse
Vertebrates
Eukaryotes
Bacteria
24mer Large Large 9mer 24mer 24mer 12mer Monomer 12mer 12mer 16mer
Hsp16.5 Pfu-sHsp IbpA/B Hsp16.3 Hsp17 (16.6) Hsp26 Hsp42 Hsp12.6 Hsp16.9 Hsp18.1 Hsp25
M. jannaschii P. furiosus E. coli M. tuberculosis Synechocystis sp. Yeast
Complex
Archaea
Protein
Organism
Kingdom
Table 1 Representative members of the α-Hsp family
Yes
Yes Yes Yes Yes Yes Yes Yes No Yes Yes Yes
Chaperone
Concentration-dependent equilibrium among 16mer, tetramers, and dimers Variable quaternary structure, molar ratio of 3:1 in the eye lens
Crystal structure, two hexameric rings
Associates with inclusion bodies Antigen, trimer of trimers Stabilizes membranes Temperature-regulated chaperone
Crystal structure, hollow sphere
Comments
[25, 150]
[110, 143] [144, 145] [53, 146] [115, 147] [97, 122] [114] [148] [18] [113] [29] [46, 107, 149]
References
2 "-Crystallins and the Small Heat Shock Protein Family: Diverse Yet Similar 3
4 Small Heat Shock Proteins
Fig. 1 Domain structure of sHsps. The overall domain structure is depicted on top of the figure. The N-terminal region, "-crystallin domain, C-terminal extension, and conserved motifs are indicated. Representative sHsps are shown below. Numbering is based on the assignment by Kim et al. [110].
3 Cellular Functions of α-Hsps 3.1 Chaperone Activity in vitro
From in vitro studies in the early 1990s, it emerged that the primary function of sHsps might be to bind denatured proteins in order to prevent their aggregation. Bovine α-crystallin was first reported to have chaperone-like activity [25]. It prevented thermal aggregation of various model substrates. Since then, it has been demonstrated that almost all sHsps are able to prevent the formation of thermally or chemically induced light-scattering aggregates (Table 1). Some additional examples are human Hsp27 [26], frog Hsp30C [27], shrimp p26 [28], pea Hsp17.7 [29], tobacco Hsp18 [30], two classes of sHsps from the soybean symbiont Bradyrhizobium japonicum [31], Lo18 from the lactic acid bacterium Oenococcus oeni [32], and Hsp16 and sHspTm from the thermophilic bacteria Synechococcus vulcanus and Thermotoga maritima, respectively [33, 34]. Only the smallest known representatives of the sHsp family, Hsp12.2, Hsp12.3, and Hsp12.6 of C. elegans, have failed to show chaperone-like properties [18, 35].
3 Cellular Functions of "-Hsps
Since natural substrates are largely unknown, the model substrates chosen for in vitro studies usually are commercially available proteins, such as citrate synthase (CS) or malate dehydrogenase (MDH) from pig heart mitochondria, hog muscle lactate dehydrogenase, or bovine insulin. Some fairly robust chaperone assays will be described in the Appendix. The fact that such a wide spectrum of artificial substrates is protected from precipitation suggests that sHsps act as rather promiscuous chaperones, accepting a variety of client proteins. It is still unclear which regions in these substrate proteins are bound by sHsps and whether common recognition motifs exist. Only a single α-crystallin-binding site was identified in yeast alcohol dehydrogenase (ADH) by a cross-linking approach [36]. The side corresponds to amino acids 40–60 in the enzyme and is rich in beta-sheet-forming amino acids. The corresponding synthetic peptide formed a complex with α-crystallin at 48 ◦ C. Since the same peptide exhibited chaperone-like activity and prevented aggregation of ADH and other proteins [37], it is possible that it has an inherent potential to interact with other β-sheets either in unfolded proteins or in sHsps. Target sites in other substrates need to be defined before a putative recognition motif can be established by the compilation of binding sequences. Likewise, the residues in sHsps involved in substrate interaction are largely unknown. Circumstantial evidence by incorporation of hydrophobic dyes has suggested that substrates might bind to short segments in the N-terminal region or at the beginning of the α-crystallin domain [38, 39]. Results from cross-linking experiments supported this notion [40, 41]. An exchange of the highly conserved G in the AxxxGVL motif of HspH from B. japonicum specifically impaired chaperone activity without interfering with other properties of the protein [42]. This finding suggests that the C-terminal part of the α-crystallin domain might also be involved in substrate binding. Light-scattering experiments demonstrated that precipitation of most substrates can be avoided by the addition of an equimolar amount of sHsps, suggesting that each chaperone particle is able to capture one substrate molecule. However, certain size limitations and steric restrictions seem to apply, as it was shown that α-crystallin is a more potent chaperone with small substrate proteins [43]. Large substrates probably tie up two or more Hsp molecules simultaneously [44]. At any rate, the binding capacity of sHsps is much higher than that of other multi-meric chaperones such as GroEL, which accommodates only one or two substrates in its central cavity formed by 14 subunits [45]. With enzymatic substrates it could be shown that binding to sHsps did not prevent their inactivation [29, 46]. Bound proteins have already experienced some unfolding and are thought to be in a disordered and aggregation-prone molten-globule state, in which tertiary structure is lost but secondary structure remains [47–49]. The major chaperoning task of sHsps is to prevent irreversible inactivation and aggregation of such (un)folding intermediates by maintaining them in a soluble and folding-competent state [38, 46]. ATP does not play a direct role in this “holding” activity, but it might modulate the structural and functional properties of some sHsps [50–52]. Once formed, complexes between substrate and sHsp are very stable [46, 53]. Electron microscopic images have revealed large complexes of defined size and shape
5
6 Small Heat Shock Proteins
Fig. 2 Dynamic behavior and protein quality-control function of sHsps. Partially unfolded proteins that accumulate after temperature upshift (T) are prevented from aggregation by binding to dissociated sHsp particles, resulting in large chaperone-substrate complexes. Refolding of the substrate proteins requires the action of ATP-dependent chaperones.
whose morphology is strictly dependent on the substrate protein but independent of the chaperone [54]. When several different nonnative proteins were incorporated into one complex, its overall shape was determined by the first substrate bound. Substrate proteins are not loosely attached to sHsps but are incorporated into mixed aggregates whose size and solubility correlate with the availability of the chaperone [55]. In plants, the formation of large heat shock granules consisting of sHsps, unfolded proteins, and mRNA has been reported during prolonged heat stress [56]. The association of heat shock granules is a specific and ordered process [57]. A consequence of the stable interaction between unfolded proteins and sHsps is that the substrates are neither transferred between different sHsp complexes nor spontaneously released. While trapping of aggregation-prone poly-peptides might be sufficient in the static environment of the eye lens, the precious reservoir of foldingcompetent proteins is destined for reuse in other cell types. Some release of CS from Hsp25 could be achieved by the addition of oxaloacetate, a stabilizing ligand of CS [46]. However, productive release of substrate proteins requires additional cellular components in an ATP-dependent reaction [38, 58]. It has been demonstrated that substrate proteins are specifically transferred to the Hsp70/DnaK system for refolding [46, 59–61]. Disaggregation of sHsp/substrate complexes by DnaK is facilitated by ClpB, in particular when large insoluble particles have been formed [55]. The Hsp60/GroEL system does not directly participate in release of sHsp bound substrates, but it can accelerate the rate of refolding by operating downstream of Hsp70 in a cooperative multi-chaperone network [60] (Figure 2). 3.2 Chaperone Function in vivo
Many cellular functions of sHsps can be reconciled in light of their reported chaperone activity. In the center of the human eye lens, the protein concentration reaches
3 Cellular Functions of "-Hsps
up to 450 mg mL−1 [62]. The structural lens proteins in this crowded environment unfold with age and are prevented from forming light-scattering aggregates by the combined action of αA- and αB-crystallin [58, 63, 64]. The most compelling documentation that α-crystallins are essential for lens transparency stems from αA-crystallin knockout mice that developed severe cataracts at an early age [65]. An R116G exchange αA-crystallin is responsible for autosomal dominant congenital cataract in humans [66]. The mutated protein was shown to have reduced chaperone activity [67–69]. An equivalent mutation in αB-crystallin (R120G) causes cataract formation and desmin-related myopathy, an inherited neuromuscular disorder [70]. The aggregation of desmin filaments is thought to result from the reported defect in chaperone activity [67, 71, 72]. Somewhat surprisingly, a mouse deleted of the αB-crystallin gene carried normal eye lenses and developed normally except for a reduced life span [73]. Interestingly, αB-crystallin is overexpressed in many neurological disorders such as Alzheimer [74], Creutzfeld-Jakob [75], and other diseases [76, 77]. The clinical importance of sHsps is further underlined by an ever-growing number of other reported activities, among them, interaction with actin filaments and microtubules [78], protection against oxidative stress and modulation of the intracellular redox state [79], interference with apoptosis [80], translation inhibition [81], and neuroprotective effects [82]. The involvement of Hsps in disease and important cellular processes is potentially applicable to the development of diagnostic tests and novel drugs [83, 84]. Specific subsets of the multiple sHsps in plants are induced by an amazing array of conditions including germination, embryo and pollen development, and fruit maturation [3, 24]. How the complex developmental expression pattern is coordinated remains to be determined. Osmotic shock, oxidative stress, cold storage, and heavy metals also trigger the induction of certain sHsp species. Again, it is tempting to speculate that protein misfolding is a common theme in all of these events and that the sHsps carry out a principle protective function by preventing deleterious protein aggregation. In fact, it has been demonstrated by an in vivo reporter system that expression of various sHsps protects firefly luciferase in Arabidopsis cell suspension cultures [85, 86]. Interestingly, the expression of various plant sHsps has been reported to enhance thermotolerance of E. coli, again emphasizing a general rather than specific protective effect of these chaperones [87, 88]. Much to the same effect, overexpression of endogenous sHsps increased stress tolerance of E. coli and A. thaliana [89, 90], whereas disruption of the hsp30 gene in Neurospora crassa resulted in reduced thermotolerance [91]. In contrast to the Hsp60 and Hsp70 chaperones, one of which is often essential for survival in many cell types, sHsps are dispensable. Apparently, their lack can be offset by the other constituents of the multi-chaperone network. Growth defects associated with accumulation of aggregated proteins were observed only in an E. coli ibpAB mutant during extreme heat shock [92, 93]. The defect was more pronounced in combination with a dnaK mutant allele, suggesting that a functional interaction between sHsps and the Hsp70 system also exists in vivo [92].
7
8 Small Heat Shock Proteins
3.3 Other Functions
Inactivation of the hsp17 gene of Synechocystis sp. PCC6803 coding for the single sHsp member in this cyanobacterium resulted not only in decreased thermotolerance but also in reduced activity of the photosynthetic apparatus and disrupted integrity of thylakoid membranes [94, 95]. Hsp17 was found to be associated with thylakoid membranes, and transcription of hsp17 responded to changes in membrane physical order, which led to its designation as “fluidity gene” [96]. Purified Hsp17 exhibited a dual function, acting either as chaperone by binding to denatured proteins or as stabilizer of hyperfluid lipid membranes by penetrating into the membrane hydrophobic core [97]. Preservation of membrane integrity by sHsps was postulated to be a general mechanism because α-crystallin stabilized synthetic and cyanobacterial membranes much like Hsp17 [98]. The observed membrane association of α-crystallin [99] and other sHsps [100–102] supports this hypothesis. Partitioning between soluble and membrane-bound states may be the key to whether chaperone function or membrane stabilization prevails. Several reports have documented an interaction between sHsps and nucleic acids. Untranslated mRNAs have been detected in heat shock granules from plants and mammals [56, 103]. It is not clear, however, whether it is a fortuitous association mediated via other proteins that are integrated into the granules or whether the sHsps have a direct affinity towards nucleic acids. Some reports show that sHsps are indeed able to bind single-stranded or double-stranded DNA [104, 105], probably by helical structures [106]. Many interesting sHsp activities in maintaining the integrity of various cellular macromolecules may still remain to be explored.
4 The Oligomeric Structure of α-Hsps
Despite their sequence diversity, sHsps share secondary structure. Predictions that suggest a high β-sheet content in the α-crystallin domain are consistent with CDspectra dominated by β-sheets in several sHsps [107, 108]. Calculation of the β-sheet content from far UV spectra of Hsp25, Hsp26, and α-crystallin revealed β-sheet contents between 20% and 30% [107, 109], correlating directly with the content of α-crystallin domain in the overall sequence and indicating high α-helical and unstructured parts in the N-terminal domain. One of the most striking features of sHsps is their organization in large oligomeric structures, comprising nine to about 50 subunits. For three family members the structure of these complexes has been solved (Figure 3), revealing hollow, globule-like structures with outside diameters of 120 and 190Å . While the quaternary structure of Hsp16.5 from Methanococcus jannaschii is well ordered [110], the oligomers formed by α-crystallin and Hsp27 show more structural variability
4 The Oligomeric Structure of "-Hsps
Fig. 3 Three-dimensional structure of sHsps. (A) Crystal structure of wheat Hsp16.9 [113]. The dodecamer arranged as two hexameric discs is 9.5 nm wide and 5.5 nm high. (B) Cryo-electron microscopic image of "-crystallin [111]. The outer diame-
ter of the 32mer is 18 nm [111]. (C) Crystal structure of Hsp16.5 from M. jannaschii, which assembles into a 24mer with an outer diameter of 12 nm [110]. Reproduced with permission of the publisher and authors.
and the ability to acquire and release subunits [111, 112]. Interestingly, wheat Hsp16.9 assembles into a dodecameric double disk. Each disk is organized as a trimer of dimers [113]. Hydrophobic patches become exposed upon disassembly of the complex into dimers. The structures of all sHsps solved so far support the idea that a dimer is the smallest exchangeable unit. This minimal dimeric building block seems to be conserved and may represent the functional unit concerning chaperone properties [9, 109, 113, 114]. The necessary exception to this general picture might be Hsp16.3 from Mycobacterium tuberculosis, for which a trimeric substructure has been proposed. According to cryo-electron microscopy data, the triangular particle is composed of nine subunits organized as a trimer of trimers [115]. It is matter of controversy which parts of sHsps contribute to the association of the oligomeric structures and which are involved in the interaction with nonnative proteins [2, 4, 9]. A common property of all sHsp structures is the localization of the N-terminal regions in the interior of the oligomeric structures. In Hsp16.5 and α-crystallin, these disordered regions sequester inside the sphere; in the case of Hsp16.9, they are buried in the oligomeric structure. The liberated, hydrophobic N-terminal part might represent the substrate-binding site of disassembled sHsp particles [109, 113]. To asses the role of the N-terminal region, the four Hsp12 proteins from C. ele-gans have been investigated [18, 35]. They are the smallest naturally occurring representatives of the family and are reduced to the α-crystallin domain preceded by
9
10 Small Heat Shock Proteins
an exceptionally short N-terminal region of 25 or 26 residues. They form complexes from monomers to tetramers, but neither of them displays chaperone activity. Only 16.2 from C. elegans, a typical member with an N-terminal extension of 41 residues, builds a higher oligomer of 14 to 24 subunits and displays chaperone activity [116]. N-terminal deletions resulted in trimeric or tetrameric complexes lacking chaperone activity. In recent studies on yeast Hsp26, an N-terminally truncated construct turned out to be dimeric and inactive [109]. It was concluded that regions that are important for both the assembly of the 24mer and the interaction with nonnative proteins reside in the N-terminal part of the protein. This view is supported by thermal unfolding experiments that show that full-length Hsp26 exhibits two transitions, one in the heat shock temperature range and one at much higher temperatures. Only the high-temperature transition was observed in the case of Hsp26N. Furthermore, these results suggest that less energy is required for dissociation of the 24mer than for the dissociation and unfolding of the dimer. Quantitative data for the changes in energy involved in these processes were obtained by analyzing urea-induced unfolding transitions at 25 ◦ C and 43 ◦ C. These data are in good agreement with the unfolding transitions and support the hypothesis of a thermolabile Hsp26 assembly that dissociates to stable dimers at elevated temperatures [109, 114]. Many efforts to identify individual amino acid residues involved in subunit interaction or chaperone activity by point mutagenesis have had limited success [52, 117–120]. Although some point mutations resulted in significantly enlarged assemblies, chaperone activity was barely affected. Point mutations in bacterial sHsps have been more telling because they pinpointed a number of residues in the α-crystallin domain and C-terminal extension that are critical for oligomeriza-tion and chaperone activity [42, 121–123]. In summary, the emerging picture is that residues in all three domains of sHsp are required for oligomerization [2]. While the α-crystallin domain is necessary for dimer formation and thus assembles the basic building block, both flanking regions promote the formation of higher-order structures. The oligomeric state of mammalian sHsps can be altered by various posttranslational modifications, in particular by serine-specific phosphorylation [124–126]. Human Hsp27, for example, possesses three phosphorylation sites, S15, S78, and S82, whose modification via a MAP kinase cascade leads to eight possible isoforms [127]. Thr143 comprises an alternative phosphorylation site in Hsp27 that is phos-phorylated by a cGMP-dependent protein kinase [128]. The only reported example of a phosphorylated plant sHsp is Hsp22 from maize mitochondria. Covalent modification occurs at a serine residue by an unspecified kinase activity [129]. Phosphorylation of human Hsp27 and αB-crystallin decreased its oligomeric size and reduced chaperone activity [130, 131]. With the exception of αA-crystallin, phosphorylation of mammalian sHsps is regulated in response to stress, cytokines, and growth factors [133–135]. Phosphatase-mediated dephosphorylation regulates the phosphorylation state [136–138]. Apart from the ambient physical and chemical conditions, controlled covalent modification of sHsp provides an additional
References
mechanism to modulate the structural and functional properties of sHsps according to the cellular demands.
5 Dynamic Structures as Key to Chaperone Activity
While the crystal structures of Hsp16.5 and Hsp16.9 might appear rather rigid and stable, sHsp complexes in fact are very dynamic, enabling them to exchange subunits and to form hetero-oligomeric assemblies. For example, Triticum aestivum Hsp16.9 has been shown to dissociate into sub-oligomeric species at higher temperatures. As a consequence, Hsp16.9 is able to interact and form complexes with its close homologue Pisum sativum Hsp18.1 [113]. Complex formation between both sHsps involves subunit exchange, as has been demonstrated elegantly by electrospray mass spectrometry [139]. The exchange of subunits between sHsps has also been demonstrated for other sHsps from plants and bacteria [31, 140]. Exchange occurred only if the proteins were from the same class of sHsps. In mammals, various sHsps frequently exchange subunits. In the vertebrate eye lens, for example, α-crystallin forms heteroaggregates containing αA and αB subunits [141]. In addition, αA-crystallin exchanges subunits with Hsp27 [112]. Mixed polymers of αA-crystallin, αB-crystallin, and Hsp25 [107], and between HspB2 and HspB3, have been described [142]. It is not clear what the function of this intermolecular subunit exchange in sHsps might be. However, in the case of cells that express several sHsps in the same compartment, the formation of hetero-complexes most likely is relevant in vivo. It is possible that mixed complexes are the result of a fortuitous interaction of closely related proteins maintaining them in an inactive storage form. Regardless of whether homo-oligomeric or hetero-oligomeric complexes have been formed, they need to dissociate in order to gain chaperone function (Figure 2). One interpretation of this dynamic behavior is that substrate-binding sites are buried in the complex but become exposed by dissociation. Major structural rearrangements upon substrate binding lead to the formation of even larger substratesHsp complexes that need to be taken apart with the help of other chaperones [54, 55]. The spontaneous dissociation-reassociation process might be some sort of sensing process monitoring the presence of nonnative proteins in the cellular environment.
Acknowledgements
We thank Hauke Hennecke and Johannes Buchner for generous support. Work in our laboratories was financed by grants from the Swiss National Foundation for Scientific Research and the Swiss Federal Institute of Technology to F.N. and by a grant from the Deutsche Forschungsgemeinschaft (SFB 594) to J.B. and M.H.
11
12 Small Heat Shock Proteins
References 1 MACRAE, T. H. (2000) Structure and function of small heat shock/α-crystallin proteins: established concepts and emerging ideas. Cell. Mol. Life Sci. 57, 899–913. 2 NARBERHAUS, F. (2002) a-Crystallin-type heat shock proteins: Socializing minichaperones in the context of a multichaperone network. Microbiol. Mol. Biol. Rev. 66, 64–93. 3 WATERS, E. R., LEE, G. J., and VIERLING, E. (1996) Evolution, structure and function of the small heat shock proteins in plants. J. Exp. Bot. 47, 325–338. 4 HASLBECK, M., and BUCHNER, J. (2002) Chaperone function of sHsps. Prog. Mol. Subcell. Biol. 28, 37–59. 5 HASLBECK, M. (2002) sHsps and their role in the chaperone network. Cell. Mol. Life Sci. 59, 1649–1657. 6 EHRNSPERGER, M., BUCHNER, J., and GAESTEL, M. (1998) Structure and function of small heat-shock proteins. In Molecular chaperones in the life cycle of proteins: Structure, function and mode of action (FINK A. L., and GOTO, Y., eds), pp. 533–575, Marcel Dekker, Inc., New York. 7 De JONG, W. W., CASPERS, G. J., and LEUNISSEN, J. A. M. (1998) Genealogy of the α-crystallin – small heat-shock protein superfamily. Int. J. Biol. Macromol. 22, 151–162. 8 CASPERS, G. J., LEUNISSEN, J. A. M., and de JONG, W. W. (1995) The expanding small heat-shock protein family, and structure predictions of the conserved ‘α-crystallin domain’. J. Mol. Evol. 40, 238–248. 9 Van MONTFORT, R., SLINGSBY, C., and VIERLING, E. (2001) Structure and function of the small heat shock protein/α-crystallin family of molecular chaperones. Adv. Protein Chem. 59, 105–156. 10 HORWITZ, J. (2003) Alpha-crystallin. Exp. Eye Res. 76, 145–153. 11 KLEMENZ, R., FRO¨ HLI, E., STEIGER, R. H., SCHA¨ FER, R., and AOYAMA, A. (1991) αB-crystallin is a small heat shock protein. Proc. Natl. Acad. Sci. USA 88, 3652–3656.
12 DUBIN, R. A., WAWROUSEK, E. F., and PIATIGORSKY, J. (1989) Expression of the murine αB-crystallin gene is not restricted to the lens. Mol. Cell. Biol. 9, 1083–1091. 13 BHAT, S. P., and NAGINENI, C. N. (1989) αB subunit of lens-specific protein α-crystallin is present in other ocular and non-ocular tissues. Biochem. Biophys. Res. Commun. 158, 319– 325. 14 INGOLIA, T. D., and CRAIG, E. A. (1982) Four small Drosophila heat shock proteins are related to each other and to mammalian α-crystallin. Proc. Natl. Acad. Sci. USA 79, 2360–2364. 15 M¨UNCHBACH, M., NOCKER, A., and NARBERHAUS, F. (1999) Multiple small heat shock proteins in rhizobia. J. Bacteriol. 181, 83–90. 16 KAPPE G., LEUNISSEN, J. A., and de JONG, W. W. (2002) Evolution and diversity of prokaryotic small heat shock proteins. Prog. Mol. Subcell. Biol. 28, 1–17. 17 SCHARF, K. D., SIDDIQUE, M., and VIERLING, E. (2001) The expanding family of Arabidopsis thaliana small heat stress proteins and a new family of proteins containing α-crystallin domains (Acd proteins). Cell Stress Chaperones 6, 225–237. 18 LEROUX, M. R., MA, B. J., BATELIER, G., MELKI, R., and CANDIDO, E. P. M. (1997) Unique structural features of a novel class of small heat shock proteins. J. Biol. Chem. 272, 12847–12853. 19 WOTTON, D., FREEMAN, K., and SHORE, D. (1996) Multimerization of Hsp42p, a novel heat shock protein of Saccharomyces cerevisiae, is dependent on a conserved carboxyl-terminal sequence. J. Biol. Chem. 271, 2717–2723. 20 NOVER, L., and SCHARF, K. D. (1997) Heat Stress Proteins and Transcription Factors. Cell. Mol. Life Sci. 53, 80–103. 21 MORIMOTO, R. I. (1998) Regulation of the heat shock transcriptional response: cross talk between a family of heat shock factors, molecular chaperones, and negative regulators. Genes Dev. 12, 3788–3796.
References 22 RICHMOND, C. S., GLASNER, J. D., MAU, R., JIN, H., and BLATTNER, F. R. (1999) Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res. 27, 3821–3835. 23 ARRIGO, A. P., and LANDRY, J. (1994) Expression and function of the low-molecular-weight heat shock proteins. In The biology of heat shock proteins and molecular chaperones (MORIMOTO R. I., TISSIERES A., and GEORGOPOULOS C., eds), pp. 335–373, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 24 SUN, W., Van MONTAGU, M., and VERBRUGGEN, N. (2002) Small heat shock proteins and stress tolerance in plants. Biochim. Biophys. Acta 1577, 1–9. 25 HORWITZ, J. (1992) α-crystallin can function as a molecular chaperone. Proc. Natl. Acad. Sci. USA 89, 10449–10453. 26 JAKOB, U., GAESTEL, M., ENGEL, K., and BUCHNER, J. (1993) Small heat shock proteins are molecular chaperones. J. Biol. Chem. 268, 1517–1520. 27 FERNANDO, P., and HEIKKILA, J. J. (2000) Functional characterization of Xenopus small heat shock protein, Hsp30C: the carboxyl end is required for stability and chaperone activity. Cell Stress Chaperones 5, 148–159. 28 LIANG, P., AMONS, R., MACRAE, T. H., and CLEGG, J. S. (1997) Molecular characterization of a small heat shock/alpha-crystallin protein in encysted Artemia embryos. Eur. J. Biochem. 243, 225–232. 29 LEE, G. J., POKALA, N., and VIERLING, E. (1995) Structure and in vitro molecular chaperone activity of cytosolic small heat shock proteins from pea. J. Biol. Chem. 270, 10432–10438. 30 SMYKAL, P., MASIN, J., HRDY I., KONOPASEK, I., and ZARSKY V. (2000) Chaperone activity of tobacco HSP18, a small heat-shock protein, is inhibited by ATP. Plant J. 23, 703–713. 31 STUDER, S., and NARBERHAUS, F. (2000) Chaperone activity and homo-and hetero-oligomer formation of bacterial small heat shock proteins. J. Biol. Chem. 275, 37212–37218.
32 DELMAS, F., PIERRE, F., COUCHENEY, F., DIVIES, C., and GUZZO, J. (2001) Biochemical and physiological studies of the small heat shock protein Lo18 from the lactic acid bacterium Oenococcus oeni. J. Mol. Microbiol. Biotechnol. 3, 601–610. 33 MICHELINI, E. T., and FLYNN, G. C. (1999) The unique chaperone operon of Thermotoga maritima: Cloning and initial characterization of a functional Hsp70 and small heat shock protein. J. Bacteriol. 181, 4237–4244. 34 ROY, S. K., HIYAMA, T., and NAKAMOTO, H. (1999) Purification and characterization of the 16-kDa heat-shock-responsive protein from the thermophilic cyanobacterium Synechococcus vulcanus, which is an α-crystallin-related, small heat shock protein. Eur. J. Biochem. 262, 406–416. 35 KOKKE, B. P. A., LEROUX, M. R., CANDIDO, E. P. M., BOELENS, W. C., and DEJONG, W. W. (1998) Caenorhabditis elegans small heat-shock proteins Hsp12.2 and Hsp12.3 form tetramers and have no chaperone-like activity. FEBS Lett. 433, 228–232. 36 SANTHOSHKUMAR, P., and SHARMA, K. K. (2002) Caenorhabditis elegans small heat-shock proteins Hsp12.2 and Hsp12.3 form tetramers and have no chaperone-like activity. Biochim. Biophys. Acta 1598, 115–121. 37 BHATTACHARYYA, J., SANTHOSHKUMAR, P., and SHARMA, K. K. (2003) A peptide sequence-YSGVCHTDLHAWHGDWPLPVK [40–60]-in yeast alcohol dehydrogenase prevents the aggregation of denatured substrate proteins. Biochem. Biophys. Res. Commun. 307, 1–7. 38 LEE, G. J., ROSEMAN, A. M., SAIBIL, H. R., and VIERLING, E. (1997) A small heat shock protein stably binds heat denatured model substrates and can maintain a substrate in a folding competent state. EMBO J. 16, 659– 671. 39 SHARMA, K. K., KUMAR, G. S., MURPHY, A. S., and KESTER, K. (1998) Identification of 1,10-Bi(4-anilino) naphthalene-5,50-disulfonic acid
13
14 Small Heat Shock Proteins
40
41
42
43
44
45
46
47
48
49
binding sequences in α-crystallin. J. Biol. Chem. 273, 15474–15478. SHARMA, K. K., KUMAR, R. S., KUMAR, G. S., and QUINN, P. T. (2000) Synthesis and characterization of a peptide identified as a functional element in αA-crystallin. J. Biol. Chem. 275, 3767–3771. SHARMA, K. K., KAUR, H., and KESTER, K. (1997) Functional elements in molecular chaperone α-crystallin: identification of binding sites in αB-crystallin. Biochem. Biophys. Res. Commun. 239, 217–222. LENTZE, N., STUDER, S., and NARBERHAUS, F. (2003) Structural and functional defects caused by point mutations in the α-crystallin domain of a bacterail α-heat shock protein. J. Mol. Biol. 328, 927– 937. LINDNER, R. A., KAPUR, A., MARIANI, M., TITMUSS, S. J., and CARVER, J. A. (1998) Structural alterations of α-crystallin during its chaperone action. Eur. J. Biochem. 258, 170–183. BOVA, M. P., DING, L. L., HORWITZ, J., and FUNG, B. K. (1997) Subunit exchange of a α-crystallin. J. Biol. Chem. 272, 29511–29517. HARTL, F. U. (1996) Molecular chaperones in cellular protein folding. Nature 381, 571–580. EHRNSPERGER, M., GRA¨ BER, S., GAESTEL, M., and BUCHNER, J. (1997) Binding of non-native protein to Hsp25 during heat shock creates a reservoir of folding intermediates for reactivation. EMBO J. 16, 221–229. RAJARAMAN, K., RAMAN, B., and RAO, C. M. (1996) Molten-globule state of carbonic anhydrase binds to the chaperone-like α-crystallin. J. Biol. Chem. 271, 27595–27600. RAWAT, U., and RAO, M. (1998) Interactions of chaperone alpha-crystallin with the molten globule state of xylose reductase. Implications for reconstitution of the active enzyme. J. Biol. Chem. 273, 9415–9423. LINDNER, R. A., KAPUR, A., and CARVER, J. A. (1997) The interaction of the molecular chaperone, α-crystallin, with molten globule states of bovine α-lactalbumin. J. Biol. Chem. 272, 27722–27729.
50 WANG, K., and SPECTOR, A. (2001) ATP causes small heat shock proteins to release denatured protein. Eur. J. Biochem. 268, 6335–6345. 51 MUCHOWSKI, P. J., and CLARK, J. I. (1998) ATP-enhanced molecular chaperone functions of the small heat shock protein human aB crystallin. Proc. Natl. Acad. Sci. USA 95, 1004–1009. 52 MUCHOWSKI, P. J., HAYS, L. G., YATES, J. R., and CLARK, J. I. (1999) ATP and the core “α-crystallin” domain of the small heat-shock protein αB-crystallin. J. Biol. Chem. 274, 30190–30195. 53 ALLEN, S. P., POLAZZI, J. O., GIERSE, J. K., and EASTON, A. M. (1992) Two novel heat shock genes encoding proteins produced in response to heterologous protein expression in Escherichia coli. J. Bacteriol. 174, 6938–6947. 54 STROMER, T., EHRNSPERGER, M., GAESTEL, M., and BUCHNER, J. (2003) Analysis of the interaction of small heat shock proteins with unfolding proteins. J. Biol. Chem. 278, 18015–18021. 55 MOGK, A., SCHLIEKER, C., FRIEDRICH, K. L., SCHO¨ NFELD, H. J., VIERLING, E., and BUKAU, B. (2003) Refolding of substrates bound to small Hsps relies on a disaggregation reaction mediated most efficiently by ClpB/Dank. J. Biol. Chem. 278, 31033–31042. 56 NOVER, L., SCHARF, K. D., and NEUMANN, D. (1989) Cytoplasmic heat shock granules are formed from precursor particles and are associated with a specific set of mRNAs. Mol. Cell. Biol. 9, 1298–1308. 57 KIRSCHNER, M., WINKELHAUS, S., THIERFELDER, J. M., and NOVER, L. (2000) Transient expression and heat-stress-induced co-aggregation of endogenous and heterologous small heat-stress proteins in tobacco protoplasts. Plant J. 24, 397–411. 58 WANG, K., and SPECTOR, A. (1994) The chaperone activity of bovine α-crystallin. Interaction with other lens crystallins in native and denatured states. J. Biol. Chem. 269, 13601–13608. 59 SMYKAL, P., HRDY I., and PECHAN, P. M. (2000) High-molecular-mass complexes formed in vivo contain smHSPs and
References
60
61
62
63
64
65
66
67
68
HSP70 and display chaperone-like activity. Eur. J. Biochem. 267, 2195– 2207. VEINGER, L., DIAMANT, S., BUCHNER, J., and GOLOUBINOFF, P. (1998) The small heat-shock protein IbpB from Escherichia coli stabilizes stress-denatured proteins for subsequent refolding by a multichaperone network. J. Biol. Chem. 273, 11032–11037. LEE, G. J., and VIERLING, E. (2000) A small heat shock protein cooperates with heat shock protein 70 systems to reactivate a heat-denatured protein. Plant Physiol. 122, 189–197. FAGERHOLM, P. P., PHILIPSON, B. T., and LINDSTRO¨ M B. (1981) Normal human lens – the distribution of proteins. Exp. Eye Res. 33, 615–620. RAO, P. V., HUANG, Q. L., HORWITZ, J., and ZIGLER, J. S., Jr. (1995) Evidence that α-crystallin prevents non-specific protein aggregation in the intact eye lens. Biochim. Biophys. Acta 1245, 439–447. CARVER, J. A., NICHOLLS, K. A., AQUILINA, J. A., and TRUSCOTT, R. J. (1996) Age-related changes in bovine α-crystallin and high-molecular-weight protein. Exp. Eye Res. 63, 639–647. BRADY, J. P., GARLAND, D., DUGLAS-TABOR, Y., ROBISON, W. G. J., GROOME, A., and WAWROUSEK, E. F. (1997) Targeted disruption of the mouse αA-crystallin gene induces cataract and cytoplasmic inclusion bodies containing the small heat shock protein αB-crystallin. Proc. Natl. Acad. Sci. USA 94, 884–889. LITT, M., KRAMER, P., LAMORTICELLA, D. M., MURPHEY, W., LOVRIEN, E. W., and WELEBER, R. G. (1998) Autosomal dominant congenital cataract associated with a missense mutation in the human alpha crystallin gene CRYAA. Hum. Mol. Genet. 7, 471–474. KUMAR, L. V., RAMAKRISHNA, T., and RAO, C. M. (1999) Structural and functional consequences of the mutation of a conserved arginine residue in aA and aB crystallins. J. Biol. Chem. 274, 24137–24141. COBB, B. A., and PETRASH, J. M. (2000) Characterization of α-crystallin-plasma
69
70
71
72
73
74
75
76
membrane binding. Biochemistry 39, 15791–15798. SHROFF, N. P., CHERIAN-SHAW, M., BERA, S., and ABRAHAM, E. C. (2000) Mutation of R116C results in highly oligomerized αA-crystallin with modified structure and defective chaperone-like function. Biochemistry 39, 1420–1426. VICART, P., CARON, A., GUICHENEY, P., LI, Z., PREVOST, M. C., FAURE, A., CHATEAU, D., CHAPON, F., TOME, F., DUPRET, J. M., PAULIN, D., and FARDEAU, M. (1998) A missense mutation in the αB-crystallin chaperone gene causes a desmin-related myopathy. Nat. Genet. 20, 92–95. PERNG, M. D., MUCHOWSKI, P. J., van den IJSSEL, P., WU, G. J., HUTCHESON, A. M., CLARK, J. I., and QUINLAN, R. A. (1999) The cardiomyopathy and lens cataract mutation in αB-crystallin alters its protein structure, chaperone activity, and interaction with intermediate filaments in vitro. J. Biol. Chem. 274, 33235– 33243. BOVA, M. P., YARON, O., HUANG, Q. L., DING, L. L., HALEY, D. A., STEWART, P. L., and HORWITZ, J. (1999) Mutation R120G in αB-crystallin, which is linked to a desmin-related myopathy, results in an irregular structure and defective chaperone-like function. Proc. Natl. Acad. Sci. USA 96, 6137–6142. BRADY, J. P., GARLAND, D. L., GREEN, D. E., TAMM, E. R., GIBLIN, F. J., and WAWROUSEK, E. F. (2001) αB-crystallin in lens development and muscle integrity: a gene knockout approach. Invest. Ophthalmol. Vis. Sci. 42, 2924–2934. RENKAWEK, K., VOORTER, C. E., BOSMAN, G. J., van WORKUM, F. P., and de JONG, W. W. (1994) Expression of αB-crystallin in Alzheimer’s disease. Acta Neuropathol. 87, 155–160. RENKAWEK, K., de JONG, W. W., MERCK, K. B., FRENKEN, C. W., van WORKUM, F. P., and BOSMAN, G. J. (1992) αB-crystallin is present in reactive glia in Creutzfeldt-Jakob disease. Acta Neuropathol. 83, 324–327. LOWE, J., LANDON, M., PIKE, I., SPENDLOVE, I., MCDERMOTT, H., and MAYER, R. J. (1990) Dementia with beta-amyloid deposition: involvement of
15
16 Small Heat Shock Proteins
77
78
79
80
81
82
83
84
85
alpha B-crystallin supports two main diseases. Lancet 336, 515–516. HEAD, M. W., CORBIN, E., and GOLDMAN, J. E. (1993) Overexpression and abnormal modification of the stress proteins αB-crystallin and HSP27 in Alexander disease. Am. J. Pathol. 143, 1743–1153. MOUNIER, N., and ARRIGO, A. P. (2002) Actin cytoskeleton and small heat shock proteins: how do they interact? Cell Stress Chaperones 7, 167–176. ARRIGO, A. P., PAUL, C., DUCASSE, C., SAUVAGEOT, O., and KRETZ-REMY, C. (2002) Small stress proteins: modulation of intracellular redox state and protection against oxidative stress. Prog. Mol. Subcell. Biol. 28, 171–184. ARRIGO, A. P., PAUL, C., DUCASSE, C., MANERO, F., KRETZ-REMY, C., VIROT, S., JAVOUHEY, E., MOUNIER, N., and DIAZ-LATOUD, C. (2002) Small stress proteins: novel negative modulators of apoptosis induced independently of reactive oxygen species. Prog. Mol. Subcell. Biol. 28, 185–204. CUESTA, R., LAROIA, G., and SCHNEIDER, R. J. (2000) Chaperone hsp27 inhibits translation during heat shock by binding eIF4G and facilitating dissociation of cap-initiation complexes. Genes Dev. 14, 1460–1470. AKBAR, M. T., LUNDBERG, A. M., LIU, K., VIDYADARAN, S., WELLS, K. E., DOLATSHAD, H., WYNN, S., WELLS, D. J., LATCHMAN, D. S., and de BELLEROCHE, J. (2003) The neuroprotective effects of heat shock protein 27 overexpression in transgenic animals against kainate-induced seizures and hippocampal cell death. J. Biol. Chem. 278, 19956–19965. CLARK, J. I., and MUCHOWSKI, P. J. (2000) Small heat-shock proteins and their potential role in human disease. Curr. Opin. Struct. Biol. 10, 52–59. CRABBE, M. J. C., and HEPBURNE-SCOTT, H. W. (2001) Small heat shock proteins (sHsps) as potential drug targets. Curr. Pharm. Biotech. 2, 77–111. L¨OW D., BRA¨ NDLE, K., NOVER, L., and FORREITER, C. (2000) Cytosolic heat-stress proteins Hsp17.7 class I and Hsp17.3
86
87
88
89
90
91
92
93
class II of tomato act as molecular chaperones in vivo. Planta 211, 575–582. FORREITER, C., KIRSCHNER, M., and NOVER, L. (1997) Stable transformation of an Arabidopsis cell suspension culture with firefly luciferase providing a cellular system for analysis of chaperone activity in vivo. Plant Cell 9, 2171–2181. YEH, C. H., CHANG, P. F. L., YEH, K. W., LIN, W. C., CHEN, Y. M., and LIN, C. Y. (1997) Expression of a gene encoding a 16.9-kDa heat-shock protein, Oshsp16.9, in Escherichia coli enhances thermotolerance. Proc. Natl. Acad. Sci. USA 94, 10967–10972. SOTO, A., ALLONA, I., COLLADA, C., GUEVARA, M. A., CASADO, R., RODRIGUEZ-CEREZO, E., ARAGONCILLO, C., and GOMEZ, L. (1999) Heterologous expression of a plant small heat-shock protein enhances Escherichia coli viability under heat and cold stress. Plant Physiol. 120, 521–528. SUN, W., BERNARD, C., van de COTTE, B., Van MONTAGU, M., and VERBRUGGEN, N. (2001) At-HSP17.6A, encoding a small heat-shock protein in Arabidopsis, can enhance osmotolerance upon overexpression. Plant J. 27, 407–415. KITAGAWA, M., MATSUMURA, Y., and TSUCHIDO, T. (2000) Small heat shock proteins, IbpA and IbpB, are involved in resistances to heat and superoxide stresses in Escherichia coli. FEMS Microbiol. Lett. 184, 165–171. PLESOFSKY-VIG, N., and BRAMBL, R. (1995) Disruption of the gene for hsp30, an α-crystallin-related heat shock protein of Neurospora crassa, causes defects in thermotolerance. Proc. Natl. Acad. Sci. USA 92, 5032–5036. THOMAS, J. G., and BANEYX, F. (1998) Roles of the Escherichia coli small heat shock proteins IbpA and IbpB in thermal stress management: Comparison with ClpA, ClpB, and HtpG in vivo. J. Bacteriol. 180, 5165–5172. KUCZYNSKA-WISNIK, D., KEDZIERSKA, S., MATUSZEWSKA, E., LUND, P., TAYLOR, A., LIPINSKA, B., and LASKOWSKA, E. (2002) The Escherichia coli small heat-shock proteins IbpA and IbpB prevent the aggregation of endogenous proteins
References
94
95
96
97
98
99
100
101
denatured in vivo during extreme heat shock. Microbiology 148, 1757–1765. LEE, S., OWEN, H. A., PROCHASKA, D. J., and BARNUM, S. R. (2000) HSP16.6 is involved in the development of thermotolerance and thylakoid stability in the unicellular cyanobacterium, Synechocystis sp. PCC 6803. Curr. Microbiol. 40, 283–287. LEE, S. Y., PROCHASKA, D. J., FANG, F., and BARNUM, S. R. (1998) A 16.6-kilodalton protein in the cyanobacterium Synechocystis sp. PCC 6803 plays a role in the heat shock response. Curr. Microbiol. 37, 403–407. HORVATH, I., GLATZ, A., VARVASOVSZKI, V., T¨ORO¨ K Z., PALI, T., BALOGH, G., KOVACS, E., NADASDI, L., BENKO¨ S., JOO F., and VIGH, L. (1998) Membrane physical state controls the signaling mechanism of the heat shock response in Synechocystis PCC 6803: identification of hsp17 as a “fluidity gene”. Proc. Natl. Acad. Sci. USA 95, 3513–3518. T¨ORO¨ K Z., GOLOUBINOFF, P., HORVA´ TH, I., TSVETKOVA, N. M., GLATZ, A., BALOGH, G., VARVASOVSZKI, V., LOS, D. A., VIERLING, E., CROWE, J. H., and VIGH, L. (2001) Synechocystis HSP17 is an amphitropic protein that stabilizes heat-stressed membranes and binds denatured proteins for subsequent chaperone-mediated refolding. Proc. Natl. Acad. Sci. USA 98, 3098–3103. TSVETKOVA, N. M., HORVATH, I., T¨ORO¨ K Z., WOLKERS, W. F., BALOGI, Z., SHIGAPOVA, N., CROWE, L. M., TABLIN, F., VIERLING, E., CROWE, J. H., and VIGH, L. (2002) Small heat-shock proteins regulate membrane lipid polymorphism. Proc. Natl. Acad. Sci. USA 99, 13504–13509. COBB, B. A., and PETRASH, J. M. (2000) Characterization of α-crystallin-plasma membrane binding. J. Biol. Chem. 275, 6664–6672. LEE, B. Y., HEFTA, S. A., and BRENNAN, P. J. (1992) Characterization of the major membrane protein of virulent Mycobacterium tuberculosis. Infect. Immun. 60, 2066–2074. L¨UNSDORF, H., SCHAIRER, H. U., and HEIDELBACH, M. (1995) Localization of
102
103
104
105
106
107
108
109
the stress protein SP21 in indole-induced spores, fruiting bodies, and heat-shocked cells of Stigmatella aurantiaca. J. Bacteriol. 177, 7092–7099. JOBIN, M. P., DELMAS, F., GARMYN, D., DIVIE’S C., and GUZZO, J. (1997) Molecular characterization of the gene encoding an 18-kilodalton small heat shock protein associated with the membrane of Leuconostoc oenos. Appl. Environ. Microbiol. 63, 609–614. KEDERSHA, N. L., GUPTA, M., LI, W., MILLER, I., and ANDERSON, P. (1999) RNA-binding proteins TIA-1 and TIAR link the phosphorylation of eIF-2α to the assembly of mammalian stress granules. J. Cell. Biol. 147, 1431–1442. SINGH, K., GROTHVASSELLI, B., and FARNSWORTH, P. N. (1998) Interaction of DNA with bovine lens α-crystallin: its functional implications. Int. J. Biol. Macromol. 22, 315–320. PIETROWSKI, D., DURANTE, M. J., LIEBSTEIN, A., SCHMITT-JOHN, T., WERNER, T., and GRAW, J. (1994) α-Crystallins are involved in specific interactions with the murine gamma D/E/F-crystallin-encoding gene. Gene 144, 171–178. BLOEMENDAL, M., TOUMADJE, A., and JOHNSON, W. C. (1999) Bovine lens crystallins do contain helical structures: a circular dichroism study. Biochim. Biophys. Acta 1432, 234–238. MERCK, K. B., GROENEN, P. J. T. A., VOORTER, C. E. M., de HAARD-HOEKMAN, W. A., HORWITZ, J., BLOEMENDAL, H., and de JONG, W. W. (1993) Structural and functional similarities of bovine α-crystallin and mouse small heat-shock protein –A family of chaperones. J. Biol. Chem. 268, 1046–1052. DUDICH, I. V., ZAV’YALOV, V. P., PFEIL, W., GAESTEL, M., ZAV’YALOVA, G. A., DENESYUK, A. I., and KORPELA, T. (1995) Dimer structure as a minimum cooperative subunit of small heat-shock proteins. Biochim. Biophys. Acta 1253, 163–168. STROMER, T., FISCHER, E., RICHTER, K., HASLBECK, M., and BUCHNER, J. (2004) Analysis of the regulation of the molecular chaperone Hsp26 by
17
18 Small Heat Shock Proteins
110
111
112
113
114
115
116
117
temperature-induced dissociation: the N-terminal domail is important for oligomer assembly and the binding of unfolding proteins. J. Biol. Chem. 278, 25970–25976. KIM, K. K., KIM, R., and KIM, S. H. (1998) Crystal structure of a small heat-shock protein. Nature 394, 595–599. HALEY, D. A., HORWITZ, J., and STEWART, P. L. (1998) The small heat-shock protein, α-B-crystallin, has a variable quaternary structure. J. Mol. Biol. 277, 27–35. BOVA, M. P., MCHAOURAB, H. S., HAN, Y., and FUNG, B. K. K. (2000) Subunit exchange of small heat shock proteins –Analysis of oligomer formation of αA-crystallin and Hsp27 by fluorescence resonance energy transfer and site-directed truncations. J. Biol. Chem. 275, 1035–1042. van MONTFORT, R. L., BASHA, E., FRIEDRICH, K. L., SLINGSBY, C., and VIERLING, E. (2001) Crystal structure and assembly of a eukaryotic small heat shock protein. Nat. Struct. Biol. 8, 1025–1030. HASLBECK, M., WALKE, S., STROMER, T., EHRNSPERGER, M., WHITE, H. E., CHEN, S. X., SAIBIL, H. R., and BUCHNER, J. (1999) Hsp26: a temperature-regulated chaperone. EMBO J. 18, 6744–6751. CHANG, Z. Y., PRIMM, T. P., JAKANA, J., LEE, I. H., SERYSHEVA, I., CHIU, W., GILBERT, H. F., and QUIOCHO, F. A. (1996) Mycobacterium tuberculosis 16-kDa antigen (Hsp16.3) functions as an oligomeric structure in vitro to suppress thermal aggregation. J. Biol. Chem. 271, 7218–7223. LEROUX, M. R., MELKI, R., GORDON, B., BATELIER, G., and CANDIDO, E. P. M. (1997) Structure-function studies on small heat shock protein oligomeric assembly and interaction with unfolded polypeptides. J. Biol. Chem. 272, 24646–24656. DERHAM, B. K., van BOEKEL, M. A., MUCHOWSKI, P. J., CLARK, J. I., HORWITZ, J., HEPBURNE-SCOTT, H. W., de JONG, W. W., CRABBE, M. J., and HARDING, J. J. (2001) Chaperone function of mutant versions of αA-and αB-crystallin
118
119
120
121
122
123
124
125
126
127
prepared to pinpoint chaperone binding sites. Eur. J. Biochem. 268, 713–721. HORWITZ, J., BOVA, M., HUANG, Q. L., DING, L., YARON, O., and LOWMAN, S. (1998) Mutation of αB-crystallin: effects on chaperone-like activity. Int. J. Biol. Macromol. 22, 263–269. KOTEICHE, H. A., and MCHAOURAB, H. S. (1999) Folding pattern of the α-crystallin domain in αA-crystallin determined by site-directed spin labeling. J. Mol. Biol. 294, 561–577. SMULDERS, R. H. P. H., CARVER, J. A., LINDNER, R. A., van BOEKEL, M. A. M., BLOEMENDAL, H., and de JONG, W. W. (1996) Immobilization of the C-terminal extension of bovine αA-crystallin reduces chaperone-like activity. J. Biol. Chem. 271, 29060–29066. STUDER, S., OBRIST, M., LENTZE, N., and NARBERHAUS, F. (2002) A critical motif for oligomerization and chaperone activity of bacterial α-heat shock proteins. Eur. J. Biochem. 269, 3578–3586. GIESE, K. C., and VIERLING, E. (2002) Changes in oligomerization are essential for the chaperone activity of a small heat shock protein in vivo and in vitro. J. Biol. Chem. 277, 46310–46318. BERENGIAN, A. R., PARFENOVA, M., and MCHAOURAB, H. S. (1999) Site-directed spin labeling study of subunit interactions in the α-crystallin domain of small heat-shock proteins: Comparison of the oligomer symmetry in αA-crystallin, Hsp27 and Hsp16.3. J. Biol. Chem. 274, 6305–6314. KANTOROW, M., and PIATIGORSKY, J. (1998) Phosphorylations of alpha A and alpha B-crystallin. Int. J. Biol. Macromol. 22, 307–314. GAESTEL, M. (2002) sHsp-phosphorylation: enzymes, signaling pathways and functional implications. Prog. Mol. Subcell. Biol. 28, 151–169. DERHAM, B. K., and HARDING, J. J. (1999) a-crystallin as a molecular chaperone. Prog. Retin. Eye Res. 18, 463–509. KEMP, B. E., and PEARSON, R. B. (1990) Protein kinase recognition sequence motifs. Trends Biochem. Sci. 15, 342–346.
References 128 BUTT, E., IMMLER, D., MEYER, H. E., KOTLYAROV, A., LAASS, K., and GAESTEL, M. (2001) Heat shock protein 27 is a substrate of cGMP-dependent protein kinase in intact human platelets: phosphorylation-induced actin polymerization caused by HSP27 mutants. J. Biol. Chem. 276, 7108–7113. 129 LUND, A. A., RHOADS, D. M., LUND, A. L., CERNY, R. L., and ELTHON, T. E. (2001) In vivo modifications of the maize mitochondrial small heat stress protein, HSP22. J. Biol. Chem. 276, 29924–29929. 130 ITO, H., KAMEI, K., IWAMOTO, I., INAGUMA, Y., NOHARA, D., and KATO, K. (2001) Phosphorylation-induced Change of the Oligomerization State of αB-crystallin. J. Biol. Chem. 276, 5346–5352. 131 KATO, K., HASEGAWA, K., GOTO, S., and INAGUMA, Y. (1994) Dissociation as a result of phosphorylation of an aggregated form of the small stress protein, hsp27. J. Biol. Chem. 269, 11274–11278. 132 ROGALLA, T., EHRNSPERGER, M., PREVILLE, X., KOTLYAROV, A., LUTSCH, G., DUCASSE, C., PAUL, C., WIESKE, M., ARRIGO, A. P., BUCHNER, J., and GAESTEL, M. (1999) Regulation of Hsp27 oligomerization, chaperone function, and protective activity against oxidative stress/tumor necrosis factor alpha by phosphorylation. J. Biol. Chem. 274, 18947–18956. 133 ITO, H., OKAMOTO, K., NAKAYAMA, H., ISOBE, T., and KATO, K. (1997) Phosphorylation of αB-crystallin in response to various types of stress. J. Biol. Chem. 272, 29934–29941. 134 van den IJSSEL, P. R., OVERKAMP, P., BLOEMENDAL, H., and de JONG, W. W. (1998) Phosphorylation of αB-crystallin and HSP27 is induced by similar stressors in HeLa cells. Biochem. Biophys. Res. Commun. 247, 518–523. 135 VOORTER, C. E., de HAARD-HOEKMAN, W. A., ROERSMA, E. S., MEYER, H. E., BLOEMENDAL, H., and de JONG, W. W. (1989) The in vivo phosphorylation sites of bovine αB-crystallin. FEBS Lett. 259, 50–52. 136 GAESTEL, M., BENNDORF, R., HAYESS, K., PRIEMER, E., and ENGEL, K. (1992)
137
138
139
140
141
142
143
144
Dephosphorylation of the small heat shock protein hsp25 by calcium/calmodulin-dependent (type 2B) protein phosphatase. J. Biol. Chem. 267, 21607–21611. CAIRNS, J., QIN, S., PHILP, R., TAN, Y. H., and GUY, G. R. (1994) Dephosphorylation of the small heat shock protein Hsp27 in vivo by protein phosphatase 2A. J. Biol. Chem. 269, 9176–9183. MORONI, M., and GARLAND, D. (2001) In vitro dephosphorylation of α-crystallin is dependent on the state of oligomerization. Biochim. Biophys. Acta 1546, 282–290. SOBOTT, F., BENESCH, J. L., VIERLING, E., and ROBINSON, C. V. (2002) Subunit exchange of multimeric protein complexes. Real-time monitoring of subunit exchange between small heat shock proteins by using electrospray mass spectrometry. J. Biol. Chem. 277, 38921–38929. HELM, K. W., LEE, G. J., and VIERLING, E. (1997) Expression and native structure of cytosolic class II small heat shock proteins. Plant Physiol. 114, 1477– 1485. van DENOETELAAR, P. J., van SOMEREN, P. F., THOMSON, J. A., SIEZEN, R. J., and HOENDERS, H. J. (1990) A dynamic quaternary structure of bovine alpha-crystallin as indicated from intermolecular exchange of subunits. Biochemistry 10, 3488–3493. SUGIYAMA, Y., SUZUKI, A., KISHIKAWA, M., AKUTSU, R., HIROSE, T., WAYE, M. M., TSUI, S. K., YOSHIDA, S., and OHNO, S. (2000) Muscle develops a specific form of small heat shock protein complex composed of MKBP/HSPB2 and HSPB3 during myogenic differentiation. J. Biol. Chem. 275, 1095–1104. KIM, R., KIM, K. K., YOKOTA, H., and KIM, S. H. (1998) Small heat shock protein of Methanococcus jannaschii, a hyperthermophile. Proc. Natl. Acad. Sci. USA 95, 9129–9133. LAKSANALAMAI, P., JIEMJIT, A., BU, Z., MAEDER, L., and ROBB, T. (2003) Multi-subunit assembly of the Pyrococcus furiosus small heat shock protein is
19
20 Small Heat Shock Proteins
essential for cellular protection at high temperature. Extremophiles 7, 79–83. 145 LAKSANALAMAI, P., MAEDER, D. L., and ROBB, F. T. (2001) Regulation and mechanism of action of the small heat shock protein from the hyperthermophilic archaeon Pyrococcus furiosus. J. Bacteriol. 183, 5198–5202. 146 SHEARSTONE, J. R., and BANEYX, F. (1999) Biochemical characterization of the small heat shock protein IbpB from Escherichia coli. J. Biol. Chem. 274, 9937–9945. 147 YOUNG, D., LATHRIGA, R., HENDRIX, R., SWEETSER, D., and YOUNG, R. A. (1988) Stress proteins are immune targets in leprosy and tuberculosis. Proc. Natl. Acad. Sci. USA 85, 4267–4270.
148 HASLBECK, M., BRAUN, N., STROMER, T., RICHTER, B., MODEL, N., WEINKAUF, S., and BUCHNER, J. (2004) Hsp42 is the general small heat shock protein in the cytosol of Saccharomyces cerevisiae. EMBO J. 23, 638–649. 149 EHRNSPERGER, M., LILIE, H., GAESTEL, M., and BUCHNER, J. (1999) The dynamics of Hsp25 quaternary structure –Structure and function of different oligomeric species. J. Biol. Chem. 274, 14867– 14874. 150 HALEY, D. A., BOVA, M. P., HUANG, Q. L., MCHAOURAB, H. S., and STEWART, P. L. (2000) Small heat-shock protein structures reveal a continuum from symmetric to variable assemblies. J. Mol. Biol. 298, 261–272.
1
Transmemebrane Domains in Proteins Anja Ridder, and Dieter Langosch Technische Universit¨at M¨unchen, Freising, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction 1.1 Structure of Transmembrane Domains
Integral membrane proteins traverse the membrane with one or more transmembrane segments (TMSs). The basic architecture of a membrane consists of a hydrophobic region of about 30Å, formed by the lipid acyl chains, and two more polar interface regions that are both about 15 Å thick [1]. In order to fulfill the hydrogen bond-forming capacity of the polypeptide, proteins generally span the membrane either with α-helices or β-strands (Figure 1). Since in β-strands hydrogen bonds are formed between residues in adjacent segments, proteins have to form cylindrical β-barrels to satisfy all their hydrogen bonds. In α-helices, hydrogen bonds are formed between residues within the same segment. Therefore, proteins can either span the membrane with a single hydrophobic α-helical transmembrane segment (bitopic proteins) or form a bundle of helices (polytopic proteins). As an increasing number of polytopic membrane proteins are structurally characterized with high-resolution methods, and as oligomerization of more and more bitopic membrane proteins is studied with biochemical, biophysical, and genetic techniques, it is becoming increasingly clear that folding of membrane-integral domains and biological function is strongly dependent on interactions between αhelical TMSs. A number of excellent reviews have been published recently covering membrane protein folding and TMS-TMS interactions from different perspectives [1–12]. Naturally, the structure of the membrane-spanning domains of the β-barrel type, as observed with many bacterial outer-membrane proteins, is also dependent
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Transmemebrane Domains in Proteins
Fig. 1 Structures of two membrane proteins. On the left is the "helical potassium channel from Streptomyces lividans [272] (PDB access code 1BL8). On the right, the $-barrel protein PhoE from Escherichia coli is shown [273] (PDB access code 1PHO). The side chains of tryptophan and tyrosine residues in the proteins are shown in black.
on interactions between their TMSs [13, 14]. Because these interactions are less well understood than those between α-helical TMSs, they will be excluded from the present review. 1.2 The Biosynthetic Route towards Folded and Oligomeric Integral Membrane Proteins
It appears that some proteins or peptides have the ability to insert into membranes spontaneously, i.e., without the use of proteinaceous factors. In the thylakoid membrane of chloroplasts, several proteins seem to insert as helical hairpins, depending on the hydrophobicity of the TMSs [15–17]. Small proteins with a single hydrophobic segment and small, if any, flanking regions also seem to be able to insert spontaneously into lipid bilayers [18], a situation that may also apply to some toxins or antimicrobial peptides. However, most proteins become integrated into biological membranes via proteinaceous machineries. Co-translational membrane integration is a process that is similar in the eukaryotic endoplasmic reticulum (ER) membrane and bacterial inner membranes [19–22]. In the first step, a signal recognition particle (SRP) binds to the N-terminal signal sequence or to the first TMS of the protein when it emerges from the ribosome. These ribosomenascent chain complexes are targeted to the SRP receptor in the membrane (performed by FtsY in bacteria) and subsequently
1 Introduction 3
transferred to the translocon. In eukaryotic ER membranes, this is the Sec61p complex, consisting of α-, β-, and γ -subunits, while in E. coli the homologous proteins are called SecYEG. The protein is then further translated and TMSs exit the translocation channel laterally into the lipid bilayer. For some proteins this occurs during translation, while other membrane proteins are transferred into the lipids only after the complete protein is synthesized. In the latter case, interactions between TMSs can occur within the translocon. It is commonly thought that TMSs are initially oriented and integrated into the membrane as independent units. However, recent results show that intraprotein interactions, stop-transfer effector proteins, and cotranslational modifications can also play a role in determining the final topology of proteins within the membrane [21]. The membrane insertion of individual helices is the first stage in the widely accepted two-stage model of membrane protein folding, while in the second stage the helices come together through helix-helix interactions to produce the final folded conformation of the protein [7, 23]. It has been shown early on for a number of cases that membrane protein fragments produced by proteolytic cleavage of the loops or by co-expression of separate domains can be reconstituted to functional proteins. The most prominent example is bacteriorhodopsin, where a large number of different TMS fragments were produced and reassembled and where none of the connecting loops appears to be absolutely required for function [24–26]. Other examples are reviewed elsewhere [10, 12]. These results demonstrate that the information directing folding of functional membrane proteins is at least partly contained within the structure and sequence of the TMSs. 1.3 Structure and Stability of TMSs 1.3.1 Amino Acid Composition of TMSs and Flanking Regions In contrast to the wealth of structural information obtained for soluble proteins, very few detailed structures of membrane proteins have been obtained. Nevertheless, much can be deduced about the structure of membrane proteins by examining their primary amino acid sequences. The average length of a transmembrane (TM) helix is 20–25 amino acids, which is enough to span the hydrophobic part of a lipid bilayer [27–29]. Due to the need to be accommodated in the hydrophobic part of a lipid bilayer, TMSs are enriched in the hydrophobic amino acids Phe, Ile, Leu, Met, and Val [28, 30–33]. Single-spanning membrane protein TMSs tend to be more hydrophobic than TMSs from multispanning proteins and contain fewer polar amino acids [27, 32, 33]. The nucleotide bias of a particular organism affects the amino acid composition of TMSs such that organisms with a higher GC content in their genomes have more Val and Ala in their TMSs, whereas organisms with a lower GC content contain more Ile and Phe residues [34]. It is presently unknown whether the resulting difference in hydrophobicity is compensated for by the length of the TMSs. In the membrane-flanking regions of TMSs, positively charged residues are more abundant at the cytoplasmic side of the membrane than at the periplasmic side,
4 Transmemebrane Domains in Proteins
while there is no such bias for negatively charged residues [31–33]. Positive charges are more difficult to translocate across the membrane and thus control the topology of membrane proteins; this is called the “positive-inside rule” [35]. In most of the structures of membrane proteins that have been determined, a remarkable distribution of aromatic amino acids is observed. In β-barrel as well as α-helical proteins, these residues form aromatic belts at the membrane-water interface (see Figure 1). This interfacial enrichment of aromatic residues has been confirmed by statistical analysis of membrane proteins [18, 27, 31, 36]. The role of these belts is not completely clear, but it is probably related to the strong affinity of Trp and Tyr residues for the interfacial region of lipid bilayers [37–40]. This may anchor proteins within the membrane and influence their precise positioning [40–42]. Using the information described above, it can be quite reliably predicted which proteins are membrane proteins and where TMSs are located in the sequence. Most of these prediction programs are based on hydrophobicity plots and scan an amino acid sequence for hydrophobic stretches of about 20 residues. Very often used are the Kyte and Doolittle [43] or GES [44] hydrophobicity scales. If the hydrophobicity of a segment is higher than a certain threshold value, it is considered to be a TMS. Other methods are based on comparisons with TMSs of well-characterized membrane proteins [45, 46] or use evolutionary information [47] to determine which amino acids are likely to be in the membrane [48]. When the TMSs have been determined, the transmembrane topology of a protein can subsequently be predicted by considering the preference of certain amino acids for the cis or trans side of the membrane [49–51]. Until now, the nucleotide bias of a particular organism usually has not been taken into account in these predictions [34]. 1.3.2 Stability of Transmembrane Helices In general, the stability of an α-helix depends both on intrahelical N—H ··· O C hydrogen bonds between backbone atoms of successive helical turns and on intrahelical interactions between adjacent amino acid side chains [52]. As all of these interactions are essentially electrostatic in nature (see Section 2.1.1), it follows that their strength increases when a polypeptide chain is transferred from polar aqueous solution (dielectricity constant ε = 80) into the apolar part of a lipid bilayer. The magnitude of this increase depends on amino acid sequence and on the dielectric constant of the bilayer. That TM helices in apolar environments are indeed more stable than soluble helices is experimentally supported by the finding that denaturation of membrane proteins in detergent solution by heat or denaturants often causes only minimal loss of helical structure [3]. In soluble proteins, Leu and Ala residues rank among the best helix promoters, while Ile and Val destabilize α-helices and are over-represented in β-sheets; Gly strongly destabilizes soluble helices, and Pro is known to be a helix breaker [53–57]. Using a host-guest approach, different residue types were placed into the invariant framework of a model peptide and the latter’s structure in media of different
2 The Nature of Transmembrane Helix-Helix Interactions
polarity was determined. It was found that the helix-destabilizing effect of Gly, Val, and Ile was much smaller upon insertion of the peptide into the apolar membranemimetic environment of detergent or lipid micelles than in aqueous buffer [58]. In a later study, the same authors showed that Ile, Leu, and Val were the residue types with the highest propensity of α-helices in apolar media, whereas Gly and Pro had the most destabilizing effects [59, 60]. That residues with β-branched side chains are effective helix formers in membranes is supported by the observation that Ile and/or Val make up about half of the α-helical TMSs of the bacteriophage M13 major coat protein [61] and of the pulmonary surfactant-associated polypeptide SP-C [62]. As in soluble helices, Pro residues have a destabilizing effect on TM-helices for two reasons: (1) the imide within its cyclic side chain cannot hydrogen bond to the carbonyl oxygen of the peptide bond at the i-3 or i-4 position, and (2) a steric clash between the Pro ring and the backbone carbonyl group at the i-4 position is produced [63]. Despite these destabilizing effects, Pro is well tolerated in TM helices due to their greater overall stability; in addition, the missing hydrogen bond may partially be replaced by a weak hydrogen bond involving the proton linked to the δ carbon of the Pro ring and the carbonyl group at i-4 [64]. Nevertheless, Pro residues in TMSs are frequently, but not always, associated with local bends or distortions of helical structure that may be relevant for protein function [63, 65]. A similar situation may apply to Gly residues that occur with high frequency in TMSs. Gly is thought to locally disrupt intrahelical side chain packing and appears to have a greater destabilizing effect on helices in a polar environment than in apolar media [58]. Interestingly, bends in TM helices are frequently seen when Pro and Gly residues are spaced four residues apart [66].
2 The Nature of Transmembrane Helix-Helix Interactions 2.1 General Considerations
A complete thermodynamic description of TMS-TMS interactions is a complex task since partitioning of the helices into the lipid bilayer, stability of and attractive forces between the helices, entropic factors, competition between protein-protein, protein-lipid, and lipid-lipid interactions, and environmental constraints have to be taken into account. At present, the degrees by which these factors define a given interaction are not understood at a quantitative level and we therefore limit the discussion to qualitative descriptions. We will summarize first how TMS-TMS interactions result from different attractive forces and entropic factors. Second, we will discuss how these concepts evolved from the study of polytopic and bitopic membrane proteins. Finally, we discuss how TMS-TMS interactions may be influenced by protein-lipid interactions.
5
6 Transmemebrane Domains in Proteins
2.1.1 Attractive Forces within Lipid Bilayers Of the four basic attractive forces – i.e., weak and strong nuclear forces, gravitation, and electrostatic interactions – the latter account for folding of and interactions between proteins. Electrostatic interactions are commonly classified into three main categories. Ionic, or coulomb, interactions attract stable point charges such as ionized amino acid side chains. These are strong and decrease proportionally with the distance between them. Hydrogen bonds are based on protons that are shared by appropriately positioned electronegative atoms and are of primarily electrostatic nature. Dipole-dipole, or van der Waals, forces develop between permanent dipoles, between permanent and induced dipoles, or between induced and thus fluctuating dipoles (dispersion forces). In general, van der Waals forces are weaker than ionic interactions or hydrogen bonds; their strength depends on the polarity and polarizability of the binding partners and decreases with the sixth power of distance [52]. It has frequently been noted that the strength of ionic interactions and hydrogen bonds strongly increases when the polarity of the environment decreases, since electrostatic forces are inversely related to the dielectric constant ε. It is often overlooked, however, that the same applies to van der Waals interactions [67, 68]. These considerations imply that all attractive forces mediating protein-protein interaction are generally stronger in the low dielectric environment of the lipid bilayer acyl chain region than in aqueous solution. 2.1.2 Forces between Transmembrane Helices All of the forces discussed in Section 2.1.1 – that is, ionic interactions, hydrogen bonds, and van der Waals interactions – can contribute to TMS-TMS interactions. Their relative contributions to the total enthalpy of interaction largely depend on amino acid composition and sequence. Since TMSs are built mostly from apolar amino acids, van der Waals interactions between their side chains appear to be a major and universal driving force in TMS assembly. Albeit weak, unidirectional, and strongly dependent on distance, van der Waals interactions apply to any type of side chain atom and may accumulate over entire interfacial regions. This is of particular relevance where an interface forms between helix surfaces that are complementary in shape and therefore form extensive interfacial areas. Interhelical hydrogen bonds appear to come in a variety of flavors, distinguished by the electronegativity and proximity of donors and acceptors that determine their strength. Hydrogen bonds can form (1) between the side chains of polar, yet nonionizable, residues; (2) between the hydrogens connected to Cα carbon atoms and main-chain carbonyl groups or polar side chains; (3) between N H groups of the main chain and polar side chains; and (4) between isolated ionizable residues that are likely to be uncharged, yet polar, in the membrane. There is experimental evidence for all of these alternatives, as is described in Section 2.2. Since hydrogen bonds exhibit a greater degree of directionality than van der Waals interactions, they are likely to complement the latter and to enhance specificity and stability of membrane protein folding and interaction [5]. Whether interactions between ionized side chains occur between TMSs is an unsettled question. The considerable enthalpic cost of transferring a point charge into an apolar environment makes the existence
2 The Nature of Transmembrane Helix-Helix Interactions
of isolated ionized side chains in membranes quite unlikely [69]. On the other hand, this cost would drop substantially provided that two charges of opposite sign interact and thus neutralize each other. Formation of strong ionic bonds between TMSs is therefore a realistic scenario. As outlined in Section 2.1.1, electrostatic forces are strengthened in apolar environments. Whereas standard textbooks frequently depict the acyl chain region of a lipid bilayer as a slab of uniformly low polarity with dielectric constants ε ∼ 2, the situation is more complex in reality. Nitroxide radicals, whose spin density depends on the polarity of the surrounding medium, were coupled to different positions of phospholipids, fatty acids, or cholesterol in order to report the polarity of model bilayers at different depths of penetration by electron paramagnetic resonance spectroscopy. The results revealed steep polarity gradients within the acyl chain region. Specifically, the apolar character of membranes composed of monounsaturated phospholipids increased from the headgroup region towards the middle of the acyl chain region. Inclusion of cholesterol reduced polarity in the central regions of saturated membranes and broadened the apolar central region in unsaturated membranes [70]. A similar picture emerged from neutron diffraction studies, where it was found that the concentration of the strongly hydrophobic molecule hexane dissolved in a bilayer peaked at its center and decreased towards its boundaries [71]. Accordingly, TMSs may experience the lowest polarity, and therefore strongest interactions, at their central regions. The same argument would apply to TMSs in detergent micelles, since their terminal regions would be close to the charged surface of the micelle, whereas the central regions would be embedded in its strongly hydrophobic core. 2.1.3 Entropic Factors Influencing Transmembrane Helix-Helix Interactions Apart from enthalpic contributions, the free energy of TMS-TMS interactions is likely to be influenced by entropic factors in ways that are significantly different from the situation encountered with soluble proteins. First, TMSs are preoriented in the membrane upon biosynthetic insertion, with their long axes roughly perpendicular to the plane of the bilayer. Therefore, the loss of backbone entropy that is associated with any protein-protein interaction is expected to be smaller upon assembly of TMSs than of soluble helices tumbling freely in isotropic solution [72]. Second, helix-helix interactions are also influenced by the loss of side chain entropy upon association, provided that residues buried within interfaces exhibit only one out of several alternative rotameric states. On first approximation, this phenomenon would be of equal importance for soluble and membrane-embedded helices. On the other hand, the number of accessible rotameric states available to many side chains drops when a helix folds from the denatured state. Therefore, association of soluble helices that are usually in equilibrium with denatured states would be accompanied with a greater entropic penalty than assembly of preformed TM helices. Loss of side chain entropy is absent for Gly and minimal for those residues that adopt only one (Val) or three (Leu and Ile) rotamers in helices [10]; this would favor these amino acids in TMS-TMS interfaces. Furthermore, the hydrophobic effect is considered to be a major driving force in soluble protein
7
8 Transmemebrane Domains in Proteins
folding. This effect is thought to evolve when apolar protein surfaces interact and thereby release ordered networks of water molecules associated with them [52]. It may be more important for inducing hydrophobic collapse than for stabilization of the folded structure, since many enzymes remain functional in apolar organic solvents [73]. Due to the scarcity of water molecules in the hydrophobic part of the membrane, the hydrophobic effect is thought to be of little or no importance for TMS-TMS interactions [2, 10]. It has been argued, however, that other solvophobic factors based on lipids play a role in the membrane. Accordingly, the entropy of a membrane may increase as soon as lipids are removed from interacting protein surfaces [9]. 2.2 Lessons from Sequence Analyses and High-resolution Structures
Careful analyses of the amino acid sequences and crystal structures of membrane proteins can provide insight into how TMSs pack together. Transmembrane helices of polytopic proteins that are adjacent in sequence always pack against each other in an antiparallel mode in the folded structure due to the topological constraints of the connecting loops. TMSs farther apart in sequence or TMSs from interacting subunits may pack in either parallel or antiparallel mode [74–76]. Thus, loops are at least partially responsible for the sequential order and relative orientation of the TMSs in a folded structure. An antiparallel arrangement of helices seems to allow for tighter packing than a parallel one [77]. A role for the helix dipole moment in antiparallel packing of TMSs does not seem likely [78]. Similar to the situation with soluble globular proteins [79], interacting TM helices within polytopic membrane proteins rarely display exactly parallel long axes but rather adopt either positive or negative crossing angles , i.e., they form left-or right-handed pairs, respectively [74, 80] (Figure 1). The sign of the crossing angles depends on the geometry of side chain packing in the common interface. Packing of TM helices is predominantly left-handed [76, 77, 80], with crossing angles at around +20◦ [74], although this result may be biased by the limited number of available high-resolution structures. Pairs of helices with positive crossing angles have a larger average interaction surface than those with negative crossing angles [77]. TMS-TMS interfaces adopting positive crossing angles follow a repeated heptad [abcdefg] pattern of amino acids whose side chains interact via a “knobs-into-holes” packing reminiscent of soluble leucine zipper interaction domains [80]. There, residues located at the a or d positions protrude into cavities formed by a-, d-, e-, and g-type residues of the partner helix. An analysis of three membrane protein structures showed that the positions in such heptad repeat patterns are predominantly occupied by hydrophobic amino acids. Their composition is thus similar to the general composition of TMSs, although Trp seems to be over-represented. In that study, no positional specificity of particular amino acids was found [80]. The other general mode of TMS-TMS packing, generating negative crossing angles, is described more appropriately by “ridges-into-grooves” packing, where
2 The Nature of Transmembrane Helix-Helix Interactions
ridges are formed from protruding side chains and grooves correspond to the valleys between them [2, 79]. Sequence analyses showed that amino acid substitutions in general are more likely to occur at residues facing the lipid bilayer than at helix-helix interfaces, due to the need to conserve specific packing [81]. In support of this, single-spanning transmembrane proteins are more tolerant to mutation than multi-spanning proteins, indicating that they experience less sequence constraints [32, 33]. Surprisingly, the oligomerizing surfaces of polytopic transmembrane proteins are not well conserved [81]. However, this observation is probably caused by the fact that only membrane proteins with large, extended contacting surfaces located on different helices were analyzed. In contrast to soluble helices, Gly and Pro are relatively common in TMSs and by themselves do not necessarily disrupt α-helical structure [66, 75], although disruption of helices is frequently observed when two Pro or a Pro and a Gly together are present within the same helix [66]. Conserved motifs in TMSs of membrane protein families often contain Gly or Pro residues [82]. This points to special roles for these potentially helix-breaking residues in transmembrane helices. Pro residues are enriched in the middle of TM helices compared to soluble ones [32, 33] and are usually oriented towards the protein interior where they contribute to tight packing [65, 75, 77, 83, 84]. A likely reason for this is that it would be unfavorable to direct the unsatisfied hydrogen bond of the carbonyl group of the residue at i-4 towards the lipids, whereas in the protein interior it could hydrogen bond to another helix or to prosthetic groups. Another function for Pro residues in TMSs was recently proposed [85]. In the cystic fibrosis transmembrane conductance regulator, a Pro prevents aggregation of the protein in the aqueous phase or in the translocation pore by hindering the formation of β-sheet structure. By counteracting the β-sheet propensities of surrounding amino acids in the TMS, Pro may thus destabilize misfolded states of the protein [85]. Gly residues in TMSs frequently occur in pairs that are spaced four amino acids apart [27] and thus localize to the same face of the helix. Indeed, the GxxxG motif is the single most enriched pairwise motif found in membrane-spanning segments, where it frequently occurs with β-branched residues at neighboring positions [28]. In addition, pairs of small residues with spacings of four or seven amino acids are conspicuous in light of their high degree of conservation within homologous TMSs in membrane protein families [85] and are therefore thought to play essential roles in the packing of transmembrane helices. In support of this, the small residues Gly, Ala, Ser, and Thr show a large preference for being oriented towards the interior of membrane proteins, where they do not create voids or pockets but pack tightly with other residues [66, 75, 84–86]. In contrast, large hydrophobic amino acids seem to be more loosely packed in membrane proteins than these small residues and are more often oriented to the lipids [75, 86]. Nevertheless, they are important for TMS packing, since in crystallized membrane protein structures 15 of the most frequent 20 pairs of neighboring residues in TM regions contain at least one L, I, or V residue [84]. Hydrophobic residues appear to be highly mutable in TMSs, indicating that they
9
10 Transmemebrane Domains in Proteins
are relatively interchangeable. However, Leu was found to be only half as mutable as the other hydrophobic residues, which could indicate a special role for Leu in the packing of TMSs as leucine zippers [32, 33]. The more polar faces of transmembrane helices are enriched in aromatic amino acids, indicating that these residues are generally buried within the protein interior [83, 87]. Phe appears to be very versatile, since it can pack with a large variety of other residues [77, 84]. Trp and Tyr residues show a tendency to cluster together, which is probably due to their enrichment in the membrane-water interface, where they can interact with the lipid headgroups [27] (see Section 1.3.1). Polar residues tend to be fairly conserved in membrane proteins, suggesting special roles in packing and/or function [77], e.g., in the formation of hydrogen bonds, ionic interactions, and/or cofactor binding. Comparison of transmembrane sequences of mesophilic and thermophilic organisms can provide insight into the importance of certain sequence motifs in the packing and stability of membrane proteins [29]. Thermophilic organisms contain more Asp and Glu residues in their TMSs, which can form stronger hydrogen bonds than the Asn and Gln residues more often found in mesophilic organisms. A striking reduction in the occurrence of Cys residues was found in thermophiles, most likely because their reactivity could be harmful, especially at higher temperatures [29]. The fact that it can readily be replaced by other residues indicates that Cys in most cases has no specific function. Indeed, Cys residues rarely contact each other in TMSs [84], suggesting that S S bridges almost never occur within the membrane. In thermophilic organisms, more of the small residues Gly, Ala, and Ser are found in pair motifs, which could allow for tighter packing between helices and thus for higher thermal stability than in mesophilic organisms [29]. Almost all transmembrane helices form hydrogen bonds with other helices, and the corresponding interfaces are packed more closely than those without [88]. The modes of interhelical hydrogen bonding appear to be more diverse in the membrane than between α-helices from soluble proteins [84] (see Section 2.1.2). Asn and Gln show a high propensity to form strong hydrogen bonds in TMSs [84]. Consequently, these polar residues are often conserved in transmembrane helices, where they interact with other polar amino acids [75]. When located in the middle of TMSs, they tend to be buried within the TMS-TMS interfaces, while they do not show this preferential orientation in the lipid headgroup region [86]. Ser and Thr residues can form hydrogen bonds with the carbonyl atoms at positions i-4 or i-3 on the same helix [76] or can form interhelical hydrogen bonds, e.g., with backbone nitrogens [75]. Interhelical hydrogen bonds and backbone contacts occur more frequently between helix pairs crossing each other at positive packing angles [76]. Indeed, such interactions are often found where helices cross each other via GxxxG motifs, since Gly allows the helices to come into close contact [76, 84]. In addition, Gly has not one but two H atoms that can participate in Cα H···O hydrogen bonds. Multiple interhelical hydrogen bonds appear to be arranged in two kinds of spatial motifs [84]. In the “serine zipper,” the side chains of two amino acids located on opposing helices form hydrogen bonds with the peptide backbone at the opposite
2 The Nature of Transmembrane Helix-Helix Interactions
residue. This is mostly seen for Ser-Ser pairs, and the seven-residue spacing is reminiscent of the leucine-zipper motif [84]. Indeed, such a serine zipper can often be found in combination with a leucine zipper, forming a mixed serine-leucinezipper interface [89]. The second spatial arrangement of hydrogen bonds, the “polar clamp,” is formed by three amino acids on two different helices, with two hydrogen bonds between them, so that a polar residue (Glu, Lys, Asn, Gln, Arg, Ser, Thr) is clamped by hydrogen bonds to either backbone atoms or other side chains [84]. 2.3 Lessons from Bitopic Membrane Proteins
Although the atomic structure of only one bitopic dimeric membrane protein TMS is currently known [90, 91], residue patterns of a number of TMS-TMS interfaces have been mapped by mutational analysis (see protocols, Section 4). Accordingly and in analogy to polytopic proteins (see Section 2.2), bitopic membrane proteins may be grouped into two broad categories. One exhibits interfacial [abcd]n repeats, where a and b correspond to interfacial residues and the TMSs appear to form right-handed pairs, i.e., they cross each other at negative angles. The other one is based on the [abcdefg]n heptad repeat motif of left-handed pairs characterized by positive crossing angles. 2.3.1 Transmembrane Segments Forming Right-handed Pairs The homodimeric glycophorin A, an erythrocyte protein of unknown function, has long served as the paradigm for TMSs interacting at a negative angle, thus forming a right-handed pair. The interface between its TMSs corresponds to the pattern LI. .GV. .GV. .T, as originally shown by Engelman and coworkers and others by mutational analysis (see Section 4) in detergent solution [92–94] and in membranes [95–97]. The structure of this interface gives rise to a negative crossing angle as implied by molecular modeling [98, 99] and confirmed by nuclear magnetic resonance studies in detergent [90] and membranes [91]. This interaction is dominated by a GxxxG motif that is central to the TMS-TMS interface, as mutation of these Gly residues to Ala destabilizes the interaction much more than mutation of the other interfacial residues [93, 95, 100, 101]. The glycophorin A TMS-TMS interaction is apparently driven by a complex mixture of attractive forces and entropic factors. The GxxxG motif may drive assembly by formation of a flat helix surface that allows multiple van der Waals interactions to form. In addition, the entropy loss upon association is considered minimal for Gly and the neighboring Val residues [102]. Moreover, the Gly residues reduce the distance between the helix axes and thus may facilitate hydrogen bond formation between their Cahydrogens and the backbone of the partner helix [76]. That the GxxxG motif is of prime importance is supported by the observation that changing the residue spacing between both Gly residues affects dimerization [103, 104] and that a GxxxG pair induces self-interaction of oligo-methionine or oligo-valine helices in membranes [103]. Interestingly, GxxxG motifs appear also to drive homophilic interactions of TMSs that are otherwise unrelated in sequence and are derived from membrane proteins of diverse function,
11
12 Transmemebrane Domains in Proteins
including M13 major coat protein [105] and syndecan 3 [106]. Further, degenerate versions of this motif have been proposed to mediate TMS-TMS interactions of erbB receptors [107]. Apart from their occurrence in TMSs, GxxxG motifs seem to be prevalent in helix-helix interfaces of soluble proteins where they are also supported by Cα H···O hydrogen bonds [108]. A traditional interhelical hydrogen bond between the Thr residues of the glycophorin TMS is not seen in detergent [90] but is suggested by the structure determined in membranes [91]. It should also be noted that the distance between the dimerization motif and the C-terminal flanking charged residues influences glycophorin TMS-TMS packing for reasons that are not entirely clear [109]. A residue pattern similar to that of glycophorin A accounts for interactions between the TMSs of SNARE (soluble NSF-attachment protein receptor) proteins. SNAREs form an evolutionarily conserved family of proteins that are essential for all types of intracellular membrane fusion events [110]. These proteins form a stable ternary complex that appears to bridge apposed membranes prior to their actual fusion [111]. While crystallographic analysis [112] has shown that a soluble version of the complex corresponds to a coiled-coil structure, the crystal structure of the complete SNARE complex including the TMSs is not known. in vivo, a fraction of the SNARE synaptobrevin II not participating in SNARE complex formation appears to exist in association with the synaptic vesicle protein synaptophysin or as a homodimer, as revealed by cross-linking experiments performed on brain fractions [113–115] or visualization of fluorescently tagged molecules in live cells [116]. A homodimeric form also develops from recombinant full-length synaptobrevin II in detergent solution [117], liposomes [118], or bacterial membranes [119]. Alaninescanning mutagenesis indicated that synaptobrevin homodimerization depends on a specific residue pattern within its TMS: LxxICxxxLxxII. Grafting these six residues onto an inert oligo-alanine host sequence restored homodimerization in detergent solution; two additional residues were required to complete the motif when assayed in membranes: ILxxICxxILxxII [119]. These amino acids form a contiguous patch on the synaptobrevin TM helix and are thus supposed to constitute the TMS-TMS interface (Figure 2). The spacing of these interfacial residues suggests that the selfinteracting TMSs adopt a negative crossing angle [117, 119]. These experimental results are supported by a computational study revealing that the critical residues form a tightly packed interface between α-helices that approach each other most closely at Cys103 and cross each other at a packing angle of −38◦ [120]. In the model, the synaptobrevin TMS-TMS interface comprises a much smaller molecular surface area (3.96 nm2 ) than that of glycophorin A (5.50 nm2 ), is based mainly on van der Waals interactions, and is not stabilized by intermolecular hydrogen bonds. Although the loss of side chain rotamer entropy upon burying Leu and Ile side chains within interfaces is considered to be small (see Section 2.1.3), it appears to exceed that of glycophorin TMS-TMS dimerization [120]. These differences are most likely due to the fact that the synaptobrevin TMS does not harbor a GxxxG motif. Interestingly, the interfacial residues of the synaptobrevin II TMS are almost completely conserved within the TMS of syntaxin 1A, the natural binding partner of synaptobrevin II [119]. This suggested that the TMS also mediates self-assembly
2 The Nature of Transmembrane Helix-Helix Interactions
Fig. 2 Models of self-assembling TMSs from bitopic membrane proteins. The TMSs of glycophorin A (PDB code 1afo; only residues 70-90 are shown) and synaptobrevin II (PDB coordinates kindly provided by Dr. Karen Fleming) form right-handed homodimers with crossing angles < 0◦ , whereas the TMSs from the tetrameric influenza M2 protein (PDB coordinates kindly provided by Dr. William DeGrado) or from the pentameric phospholamban (PDB coordinates kindly provided by Dr. Isaiah Arkin) form left-handed pairs with > 0◦ . For easier visualization of TMS-TMS packing the interfacial residues from neighboring helices are colored green and blue respectively (the
sulfur atom of synaptobrevin Cys 103 is in yellow). In the M2 protein and phospholamban, part of the residues is shown not in CPK but in stick representation to allow for better visual access to the interface. The patterns of the dominant interfacial residues as given below the models were determined by NMR (glycophorin A [90]) and/or mutational or functional analyses (glycophorin A [93]; synaptobrevin II [117]; M2 protein [123]; phospholamban [127]1 and [128]2 ). Lowercase letters underneath the interfacial residues denote the interfacial residue positions according to the consensus patterns for negative ([ab. .]n ) or positive ([a. .d. . .]n ) helix-helix crossing angels.
13
14 Transmemebrane Domains in Proteins
of syntaxin or its heterophilic association with synaptobrevin, and this prediction was experimentally confirmed [118, 119]. A number of hypotheses concerning the functional relevance of synaptobrevin TMS-TMS interaction have been forwarded, including stabilization [117] or multimerization [119, 121] of the SNARE complex and/or propagating the “zippering up” of the cytoplasmic coiled-coil domains into the membrane at the onset of membrane fusion [118, 119]. In addition, the ability of SNARE proteins to enter the SNARE complex may be influenced by TMS-mediated homodimerization. 2.3.2 Transmembrane Segments Forming Left-handed Assemblies The M2 proton channel from influenza A virus is one example of a TMS that is likely to adopt positive crossing angles. M2 forms a tetramer that is stabilized by intermolecular disulfide bonds. Since these are not essential for tetramerization, the protein can assemble by way of noncovalent TMS-TMS interactions. Its function is to form a proton channel in the envelope of the virus that is blocked by the antiviral drug amantadine [122]. Channel function can be reconstituted in mRNA-injected Xenopus laevis oocytes and proton translocation can be determined elec-trophysiologically. DeGrado, Pinto, and coworkers have determined the degree of amantadine blockage for a range of point mutants generated by cysteine-scanning mutagenesis in comparison with the wild-type protein. A pattern of sensitive residues (V. .A. . .G. .H. . .W) emerged with the periodicity of a and d positions in a left-handed helix-helix pair, suggesting a positive interhelical crossing angle [123]. This model was supported by direct analysis of oligomeric states upon disulfide oxidation [124] and refined by molecular modeling [125] and spectroscopic analysis ([10] and references cited therein). Phospholamban TMSs also appear to cross each other at a positive packing angle upon homophilic assembly. This protein is part of the sarcoplasmatic reticulum membrane, where it functions as a regulator of Ca2+ -ATPase function [126]. Recombinant phospholamban forms a mixture of different oligomeric species in detergent solution up to a predominant pentamer, and the relevance of the individual TMS residues for assembly has been determined by saturation [127] and scanning [128] mutagenesis. Although the patterns of sensitive residues slightly deviated in both studies (L. .IC. .LL. .I [127]; L. .I. . .L. .I. . .L [128]), it became apparent that the interfacial residues follow a heptad periodicity indicative of a positive crossing angle. Molecular modeling supported this view [129–131]. The exclusive participation of Leu, Ile, and Cys residues in interface formation is reminiscent of the situation of SNARE proteins as described in the previous section. It suggests that the interaction is driven mainly by van der Waals interactions and may be supported by minimal entropy loss upon association. A leucine-zipper type of TMS-TMS interaction was also proposed by us to contribute to homodimerization of erythropoietin receptors. Whereas these cytokine receptors were originally thought to be activated by ligand-mediated dimerization, more recent evidence has revealed the existence of preformed dimers whose active conformation is stabilized by ligand binding [132, 133]. That the TMS may participate in receptor assembly was suggested by its ability to self-interact in membranes
2 The Nature of Transmembrane Helix-Helix Interactions
[134] and was confirmed by showing that the wild-type TMS sequence is important for receptor assembly [135] and function [136]. Thus, TMS-TMS assembly may support formation of a ligand-independent receptor dimer. The relevant interface was mapped by asparagine-scanning mutagenesis (see Section 4.3), which confirmed that it corresponds to a heptad repeat pattern [274]. Interestingly, the same face of the mEpoR TM helix may alternatively mediate homophilic interaction or heterophilic binding to the TMS of the viral membrane protein gp55-P in cells infected with polycythemic Friend spleen focus-forming virus. The disulfide-linked, single-span, homodimeric gp55-P binds to the erythropoietin receptor in a way that depends on the TMS sequences of both proteins and thereby induces constitutive activation [137]. Computational searching of low-energy structures and model building provided a 3-D model of this TMS-TMS interface and predicted that it corresponds to a left-handed leucine zipper stabilized by van der Waals interactions [137]. Since the residues that account for homophilic assembly of the erythropoietin receptor TMS correspond precisely to the residues contacting the gp55-P TMS in the model, homophilic interaction may compete with heterophilic interaction in the membrane of an infected erythroid cell. A TMS-TMS interaction of the leucine zipper type was also proposed for cadherins that are calcium-dependent, cell-cell adhesion molecules whose function requires lateral clustering within the cell’s plasma membrane [134]. Lateral cadherin clustering involves interactions between extracellular and cytoplasmic domains [138]. In addition, their TMSs appear to contribute to lateral assembly, since mutations reducing TMS-TMS interactions also reduced adhesiveness of E-cadherin in transfected cells [139]. Apart from these natural sequences, model systems have been developed in order to study the forces driving TMS-TMS assembly. For example, an oligo-Leu sequence that functions as an artificial TMS has been shown to self-interact in membranes and in detergent solution, where it forms oligomeric structures of mixed stoichiometry [134, 275]. As self-interaction was preserved with a heptad-repeat pattern of Leu residues, the oligo-Leu sequence is regarded as a prototypical membrane-spanning leucine-zipper interaction domain [134] whose assembly is presumably driven by van der Waals interactions. Interestingly, the affinity of the oligo-Leu sequence was dramatically enhanced when Engelman and coworkers placed an Asn residue within the oligo-Leu TMS. A nuclear magnetic resonance study revealed that an intermolecular hydrogen bond had formed between the carboxamide side chains of Asn [140]. In a different study, DeGrado and coworkers investigated the effect of Asn on self-assembly of the GCN4 leucine-zipper domain that was reengineered to a membrane-soluble molecule (termed MS1) by exchanging most hydrophilic residues to apolar ones [141]. In the original GCN4 molecule, the leucine-zipper domain mediates high-affinity homodimerization and its structural features are known in great detail [142–145]. The a and d positions of this domain are occupied by Leu and Val residues, respectively, except for one d position that corresponds to an Asn. In the water-soluble form, this Asn residue is destabilizing relative to a hydrophobic interaction but specifies the dimeric state in favor of other oligomeric forms [144, 146]. In the membrane-soluble MS1 peptide, however, Asn strongly
15
16 Transmemebrane Domains in Proteins
increased thermodynamic stability [141], whereas the Leu and Val residues at the other a and d positions contributed little to the free energy of association [10]. Thus, hydrogen bond formation between Asn side chains appears to provide a much stronger attractive force within the apolar milieu of detergent micelles or membranes than in aqueous solution. Later, it was shown that other polar amino acids such as Asp, Gln, Glu, and His also strongly enhanced self-assembly of the membrane-spanning leucine-zipper models [147, 148]. We recently used the effect exerted by Asn to probe the interface of the original oligo-Leu model. When all positions of the oligo-Leu sequence were systematically mutated in asparagine-scanning mutagenesis, the most sensitive positions correspond to a heptad repeat pattern of residues, confirming a leucine-zipper type of side chain packing. Further, asparagines had a much stronger impact on self-assembly when located at the center of the oligo-leucine sequence than at the termini, and a similar phenomenon was observed upon asparagine mutagenesis of the erythropoietin receptor TMS [274]. We believe that the dependence of the strength of the hydrogen bond between Asn residues on their depth of bilayer penetration is related to polarity gradients in lipid bilayers (see Section 2.1.2). This conclusion is supported by a recent study addressing the effect of asparagine on self-assembly of the MS1 peptide [86]. By analytical ultracentrifugation of corresponding synthetic peptides in detergent micelles, these authors showed that asparagine residues located within the apolar region of the peptide provide a significantly larger driving force on helix-helix interaction than an asparagine at a position near the apolar/polar interface of the micelle. It has been argued that the existence of polar residues forming hydrogen bonds between TMSs could actually be dangerous since they might induce unspecific assembly in the membrane [140]. That this danger is real is highlighted by a number of growth factor receptors that are constitutively activeted by TMS mutations in different hereditary diseases. For example, the neu tyrosine kinase receptor is activated by a substitution of V664 within its TMS for a glutamic acid residue [149] that appears to induce permanent receptor dimerization by interhelical hydrogen bond formation [69]. The tyrosine kinase activity of fibroblast growth factor receptor 3 is activated by a glycine-to-arginine exchange within its TMS in patients suffering from achondroplasia [150]. Further, mutating S498 of the thrombopoietin receptor TMS to asparagine rendered this receptor constitutively active [151], and mutation of T617 to asparagine within the granulocyte colony-stimulating factor receptor TMS as found in patients with acute myeloid leukemia conferred growth-factor independence in expressing cells [152]. In addition to hydrogen bonds, it is quite likely that ionic interactions between charged residues take place in the apolar environment. Conceptually, charge-charge interactions between TMSs are difficult to envision since ionizable residues are likely to be integrated into membranes in an uncharged state. The T cell receptor is a model where this issue has been examined in considerable detail. This receptor is composed of six different chains that form the αβ heterodimer responsible for ligand recognition plus the CD3γ ε, CD3δε, and ζ ζ signaling modules [153]. Three basic residues are found in the TM domains of the T cell receptor αβ heterodimer, while a pair of acidic residues is present in each of the three associated signaling
2 The Nature of Transmembrane Helix-Helix Interactions
dimers. It was shown earlier that interactions between TMS residues of opposite charge contribute to association of certain receptor subunits [154–156], while a recent study addressed the role of charged residues in formation of the complete receptor complex [153]. According to this model, each of the nine ionizable residues is essential for receptor assembly. Each assembly step is promoted by interaction between the basic residues of the αβ heterodimer and the acidic residues of the different signaling modules. Whether these residues are truly ionized prior to interaction is not clear. It was shown previously that unassembled α and β chains have a short half-life in cell membranes due to the basic residues [157], and model proteins with basic residues remain associated with the Sec61p channel of the endoplasmic reticulum rather than diffusing into the lipid [158]. Therefore, it was argued that newly synthesized subunits might stay at the vicinity of the translocon where their ionizable residues could be in the charged state before associating with cognate subunits [153]. This model of T cell receptor assembly is an interesting case, as it provides evidence that certain types of TMS-TMS interactions may depend on chaperoning by accessory proteins. The issue of ionic interactions between TMSs was also addressed recently by an in vitro study based on synthetic hydrophobic peptides with non-natural sequences. Upon reconstitution into liposomal membranes, self-interaction of these oligo-Leubased peptides was not enhanced by the presence of ionizable residues that were of opposite signs. It was thus proposed that ionic interactions would not contribute to TM helix-helix interactions [159]. The reason for the obvious discrepancy in the results obtained with the T cell receptor [153] is not known. Clearly, no chaperoning factors were present in the liposome system, and it is not certain that these peptides were inserted correctly into the plane of the bilayer. 2.4 Selection of Self-interacting TMSs from Combinatorial Libraries
Another tool for exploring the mechanisms underlying TMS-TMS interactions is the selection of transmembrane sequences capable of self-interaction from combinatorial libraries [102, 160, 161]. These libraries are created by randomization of amino acid motifs characteristic for TMS-TMS interfaces. In these systems, TMSTMS interaction results in a phenotype that can be selected for. For this purpose, the ToxR transcription activator system is used, which exists in two versions: TOX-CAT [96] and POSSYCCAT [160] (see Section 4). Randomization of the LIxxGVxxGVxxT pattern that mediates the high-affinity dimerization of glycophorin A followed by selection of self-interacting sequences using TOXCAT yielded GxxxG motifs in over 80% of all isolates [102]. As discussed previously, Gly residues reduce the distance between the helix axes and thus may facilitate hydrogen bond formation between Cα -hydrogens and the backbone of the partner helix [76]. In addition, β-branched residues (Ile, Val) were apparently preferred at positions adjacent to the glycines. Such motifs might drive TMS-TMS packing by formation of a flat helix surface. Moreover, the side chains of Ile and Val prefer a single rotameric state in an α-helix; this limits the entropy loss that is
17
18 Transmemebrane Domains in Proteins
normally associated with the freezing of side chain conformations upon proteinprotein interaction [102]. When the glycophorin motif was randomized with a set of hydrophobic residues from which Gly was excluded, clusters of Ser and Thr residues were enriched in the selected self-interacting TMSs [161]. Although single Ser or Thr residues did not support TMS-TMS interactions in previous studies [147, 148], multiples of them apparently do. This suggests that multiple weak interhelical hydrogen bonds – formed either between hydroxylated side chains or between them and main-chain atoms – may act cooperatively to drive TMS-TMS assembly. The heptad repeat motif ga. .de. .ga. .de. .ga characteristic of TMS-TMS interfaces adopting positive crossing angles was randomized with three different sets of mostly hydrophobic residues, and self-interacting sequences were selected with the POSSYCCAT system [160]. A set of only five hydrophobic residues (Leu, Ile, Val, Met, and Phe) yielded heptad patterns that were capable of self-interaction in a rather sequence-independent way, although Val was over-represented in those TMSs with the highest affinities. Randomizing with more complex amino acid mixtures drastically decreased the fraction of self-interacting TMSs and resulted in an enrichment of Ile and Leu in high-affinity sequences. It is likely that these aliphatic side chains contribute to TMS-TMS interaction by virtue of their relatively large contribution to van der Waal’s interactions in a well-packed interface. In addition, loss of side chain entropy may be minimal for Leu, Ile, and Val. In contrast to the results obtained upon randomizing the motif associated with negative crossing angles [102, 161], no enrichment of GxxxG motifs or hydroxylated amino acids was seen, and the content of Pro residues was negatively correlated to self-interaction of the heptad pattern [160]. On the other hand, Pro was abundant in high-affinity sequences assumed to adopt negative crossing angles, where the free i-4 carbonyl was proposed to engage in interhelical hydrogen bonding to a Ser or Thr residue of the partner helix [161]. These differences suggest that weak interhelical hydrogen bonding may be more relevant for right-handed TMS pairs than for left-handed ones. This is in agreement with the observation that interhelical hydrogen bonds involving Cα -hydrogens were primarily associated with TM helix pairs crossing each other at negative packing angles [76]. 2.5 Role of Lipids in Packing/Assembly of Membrane Proteins
The packing of transmembrane helices is likely to be influenced by the surrounding lipids. Excellent reviews of lipid-protein interactions in biological membranes have recently appeared, with an extensive treatment of high-resolution structures of membrane proteins that contain tightly bound lipid molecules even after detergent solubilization [162–164]. In a number of cases, these lipids are found at or between protein interaction surfaces and thus might directly support oligomerization by stabilizing TMS-TMS interactions. In the crystal structure of bacteriorho-dopsin, which normally occurs as a trimer in its native purple membrane, six glycolipids were found in the central compartment of the trimer. Mutations in the binding
2 The Nature of Transmembrane Helix-Helix Interactions
sites for these lipids disrupted the trimeric organization of the protein [165]. For a review about lipid-protein interactions in the purple membrane, see Ref. [166]. When tightly bound cardiolipin molecules were removed from bovine cytochrome c oxidase by digestion with phospholipase, two subunits dissociated from the complex [167]. The structure of cytochrome bc1 also contains several phospholipids, which are suggested to play a role in stabilization of the multi-subunit complex [168]. Moreover, a phosphatidylcholine (PC) molecule has been found at the interface between the L and M subunits of the photosynthetic reaction center [169], and three cardiolipin molecules are present at the trimer interface of formate dehydrogenase [170]. Apart from this direct role, lipids may influence TMS-TMS interactions in more indirect ways. First, interactions between proteins and lipids in a membrane are thermodynamically coupled. Thus, the free energy of helix-helix association in lipid bilayers can be described as: G a =G HH +n/2G LL −nG HL
(1)
where GHH , GLL , and GHL are the free energies of helix-helix, lipid-lipid, and helix-lipid interactions, respectively, and it is assumed that n lipid molecules are displaced from the helices upon interaction [1]. Therefore, oligomerization of transmembrane helices will be facilitated when helix-helix or lipid-lipid interactions are more favorable than helix-lipid interactions. Molecular dynamics simulations of glycophorin A in lipid bilayers suggest that these helix-helix and helix-lipid interactions can be quite similar [171]. Studies with transmembrane model peptides indicate that the free energy of helix-helix interaction increases with increasing lipid acyl chain length [172], presumably because the lipid-lipid interactions become more favorable. Other theoretical descriptions of how lipid properties can influence transmembrane helix-helix interactions have recently been reviewed [173–175]. Also, entropic factors play a role, since upon helix-helix association within the lipid bilayer, the displacement of protein-bound lipids to the bulk lipid increases overall entropy. In other words, reduction of the solvent-exposed surface would be associated with an increase in overall entropy. This is analogous to the hydrophobic effect, which is one force driving the folding of soluble proteins in water [9]. Second, oligomerization or aggregation of membrane proteins has been proposed to be one of the possible consequences of hydrophobic mismatch with the membrane. Hydrophobic mismatch occurs when the length of a hydrophobic TMS is not equal to the thickness of the apolar part of the lipid bilayer. The resulting exposure of hydrophobic surfaces to a hydrophilic environment is unfavorable, and a number of possible ways of relieving potential mismatch have been proposed [176, 177]. The area of the protein that is exposed to the lipid bilayer can be reduced by adaptation of the lipids, tilting of transmembrane helices, or oligomerization of the protein. This indeed occurs with bacteriorhodopsin, which was monomeric when reconstituted into phosphatidylcholines with chain lengths from 12 to 22 carbon atoms but aggregated in thinner or thicker bilayers [178]. This effect of hydrophobic mismatch seems to occur especially with α-helices that are too long
19
20 Transmemebrane Domains in Proteins
to span the lipid bilayer, and therefore the resulting tilted conformation has been suggested to assist in helix packing [179]. Third, oligomerization of membrane proteins naturally depends on their concentration within the membrane, as exemplified by bacteriorhodopsin, which is mono-meric at low molar ratios but trimerizes when the concentration is increased [180, 181]. In lipid bilayers containing separate domains of lipids in the fluid (liquid crystalline) and in the gel phase, transmembrane helices tend to be excluded from the gel-phase regions [41, 182]. This increases the local concentration of these proteins in the liquid-crystalline part of the bilayer and thus can lead to increased oligomer formation [41, 182, 183]. In lipid bilayers composed entirely of gelstate lipids, this can even lead to the formation of large, highly ordered peptide-containing domains [184]. The same local enrichment phenomenon might also apply to membranes containing liquid-ordered domains, also called rafts. The lipid acyl chains in such domains are tightly packed like gel-state lipids, but the lateral diffusion of lipids is nearly the same as in the fluid phase. Increasing concentrations of the raft lipids sphingomyelin and/or cholesterol lead to an increase in dimerization of transmembrane peptides [159, 172, 185]. This could be due to exclusion of the transmembrane peptides from liquid-ordered domains [186], although other membrane proteins are believed to be anchored within rafts via their TMSs [187–189]. Other effects of cholesterol, such as the ordering of the lipid acyl chains and the resulting increase in membrane thickness, could also play a role. Not only the lipids but also the choice of detergents used to study transmembrane peptides can severely influence the extent of interaction that is observed. The negatively charged SDS is much more disruptive for transmembrane helix-helix interactions than are zwitterionic detergents [97, 101, 190]. In addition, the acyl chain length of the detergent is very important, since optimal packing of transmembrane peptides of cystic fibrosis transmembrane conductance regulator was observed with detergents with acyl chain of eight or nine carbon atoms. This may be related to the hydrophobic diameter of the detergent micelles [190]. Taken together, lipids may influence TMS-TMS interactions at various levels, although the extent to which this is relevant is currently not clear due to the paucity of experimental data. On the other hand, those few examples where TMS-TMS interactions have been compared in lipids and detergents [90, 91, 101] suggest that the environment may influence the fine structure and/or the association equilibrium [97, 100, 101], but not the identity of the interface [93, 95, 119, 134].
3 Conformational Flexibility of Transmembrane Segments
TM helices are often described as rather rigid structures, a notion that is reinforced by the stabilizing influence of the apolar environment of a lipid membrane (see Section 1.3.2). There is mounting evidence, however, that TMSs can exhibit considerable conformational flexibility and that this can be of significant functional relevance. Consistent with the fact that the network of intrachain hydrogen bonding is incomplete at helix termini, the internal flexibility of model helices
3 Conformational Flexibility of Transmembrane Segments
was found to be greater at terminal positions than at the centers [191, 192]. Further, helix-destabilizing Pro and Gly residues appear to locally increase flexibilities of TM helices. Early indications for this were obtained by Deber and coworkers, who found that Pro residues are significantly over-represented in the TMSs of transporters and channels [193]. It was concluded that Pro residues facilitate the conformational changes during solute transport mediated by these proteins. As noted above (see Section 1.3.2), Pro residues in TM helices are frequently associated with local kinks, especially when a Gly residue is present in the vicinity, and it has been argued that these kinks may function as “molecular hinges,” i.e., sites of increased local flexibility [63]. Indeed, a number of experimental and computational studies support these arguments. For example, Pro hinges have been reported to be important for gating of voltage-gated potassium channels [194] and for activation of G-protein-coupled receptors [195] or solute transporters [196]. Pro and Gly are frequently found in natural peptides, such as alamethicin and mellitin, that exhibit channel function and/or disrupt lipid bilayer structure. Replacing a Pro at a central location of the alamethicin sequence with alanine or shifting it to a different position influences conductance states and channel lifetimes [197, 198]. Alamethicin forms a helix that exhibits conformational flexibility around the Gly residue at the i-3 position of the central Pro residue in organic solvent as well in lipid bilayers [199, 200]. Mutational analysis of this GxxP pair suggested that the Pro residue by itself had little influence on bending of the helix, while concurrent mutation of Gly significantly reduced its flexibility [201]. Heck and coworkers studied the conformational flexibility of analogues of a non-natural model peptide reconstituted into liposomal membranes by hydrogendeuterium exchange. They demonstrated that insertion of a Pro residue at the center of the TMS strongly enhanced the accessibility to deuterium of the amide groups of the entire central hydrophobic region that was largely inaccessible in the parental structure. This is likely to reflect an effect of Pro on helix flexibility that was, however, not seen upon insertion of Gly [202]. Conformational flexibility of TM helices has recently been proposed to play a role at the terminal stages of lipid membrane fusion [203–205]. Membrane fusion involves a restructuring of lipid bilayers brought into close proximity by single-span, membrane-anchored fusion proteins [110, 206, 207]. During the initial stages of the fusion process, these proteins mediate membrane apposition and undergo global conformational changes [208]. Late stages require the TMSs, as their replacement by lipid anchors or mutation abrogated the ability of various viral fusion proteins [110, 206, 207, 209–213] or SNAREs [214, 215] to mediate complete bilayer mixing. We and others showed that synthetic peptides corresponding to the TMSs of vesicular stomatitis virus (VSV) G-protein [204, 205] or of presynaptic SNAREs [203] induce unregulated sequence-specific membrane fusion upon reconstitution into liposomal membranes. In the case of the VSV G-protein, the pep-tide mimics partly reproduced the effects of point mutations affecting full-length VSV G-protein function in HeLa cells [204, 205]. Interestingly, these TMS mutations correspond to exchanges of Gly or of mostly β-branched residues for helix promoters. At the same time, these mutations increased the stability of the helical structures in isotropic solution [203, 205]. Gly and/or β-branched residues are generally over-represented
21
22 Transmemebrane Domains in Proteins
in TMSs of viral fusion proteins [213] and SNAREs [203]. It was proposed, therefore, that the fusogenic activities of the TMSs at the final stage of lipid merger might require their conformational flexibility [203–205]. There are a few other cases known where TMS mutations suppressed the fusogenicity of full-length viral proteins. This is exemplified by influenza hemagglutinin, whose function was compromised by a Gly-to-Leu exchange [209], and by a murine leukemia virus glycoprotein where mutations of Pro and Phe residues had the strongest effects [211]. That fusogenic TMSs may be characterized by high internal flexibility is also consistent with hydrogen/deuterium exchange experiments conducted on the membrane-embedded TM helix of influenza hemagglutinin [216]. While the conformational effects of Gly and Pro residues are likely to result from local packing defects as discussed above, the situation is more complex with β-branched amino acids. The conformation of the latter has been extensively analyzed in the case of the TMS of lung surfactant protein C, which includes two segments of seven and four consecutive valyls. This TMS exists as a stable and very well-defined α-helix in phosphocholine micelles but refolds to β-sheet aggregates in organic solvent [217]. Its amide protons were virtually non-exchangeable in the helical state but accessible upon spontaneous unfolding. Thus, consecutive valyl side chains appear to protect a helical peptide backbone by tight interlocking [217]. Similarly, the amide protons of oligo-Leu helices were virtually non-exchangeable in bilayers and exchanged very slowly in organic solvent [218]. These and other data [59, 60] show that both Leu and Val residues form stable α-helices in membranes. One may wonder, therefore, whether the observed overrepresentation of β-branched residues in TMSs of fusogenic proteins [203, 213] is primarily related to their ability to self-interact or to fusogenicity. It will be interesting to determine the conformational and functional properties of β-branched residues in mixed sequences.
4 Experimental Techniques
In the following sections, we give an overview on the most widely used techniques for the characterization of TMS-TMS interactions. Whereas most of these techniques monitor protein-protein interaction upon solubilization with a suitable detergent that prevents unspecific aggregation [219], others are suited for analysis of membrane proteins reconstituted into lipid membranes. Experimental approaches not covered here include the use of electron paramagnetic resonance, Fourier transform infrared spectroscopy, small angle X-ray scattering, and nuclear magnetic resonance spectroscopy [220]. 4.1 Biochemical and Biophysical Techniques
Most standard techniques developed for the characterization of interactions between soluble proteins can be adopted for the purpose of studying TMS-TMS
4 Experimental Techniques
interactions in the context of native or recombinant membrane proteins or of synthetic peptides. Recombinant membrane proteins are frequently produced by E. coli, which can be a difficult task due to toxic effects exerted by hydrophobic protein domains on the host cells. For high-level expression, therefore, it is often of considerable advantage to drive the recombinant protein into insoluble, and hence harmless, inclusion bodies that can later be detergent-solubilized upon cell lysis. Full-length membrane proteins or TMS sequences are frequently genetically fused to any of a variety of soluble proteins, such as glutathione-S-transferase or Staphylococcus aureus nuclease A, or to short epitope tags in order to facilitate expression and/or later detection as reviewed recently [221]. Expression in eukaryotic host systems, based on Sf9 insect cells, yeast, or mammalian cell lines, has the advantage of proper integration into eukaryotic membranes and proper posttranslational modifications but involves more cumbersome experimental procedures and frequently results in low protein yields [222, 223]. Alternatively, in vitro translation in the presence [153, 224–228] or absence [128] of microsomal membranes (derived from endoplasmatic reticulum) has been used to produce membrane protein subunits in sufficient quantity for investigation of their oligomeric structures. Translocation into microsomal membranes has the advantage that chaperoning factors and enzymes adding posttranslational modifications are present and active. Short sequences representing TMSs from a variety of membrane proteins have also been synthesized by solid-phase chemical methods. Generally, synthesis of TMS peptides is a difficult task, as the sequences tend to aggregate nonspecifically on the resin, resulting in premature termination of the growing chains. Depending on the actual sequences, the standard 9-fluorenylmethoxycarbonyl (F-moc) methodology has been used successfully [147, 229, 230]. In certain instances, using the tert-butyloxycarbonyl (Boc) strategy appears to be better suited for production of pure peptides in satisfactory yields ([231] and our unpublished results). 4.1.1 Visualization of Oligomeric States by Electrophoretic Techniques Denaturing Electrophoresis Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is generally used to determine the molecular weights of proteins based on comparison of their electrophoretic mobilities with a set of reference proteins [232]. Provided that the domains responsible for protein-protein interaction, e.g., TMSs, are not denatured by SDS, this technique is a simple and reliable tool for monitoring formation of oligomers. Stoichiometries and even subunit compositions of digomers can be investigated based on their molecular weights, and subsequent Western blotting reveals their identities. The preservation of noncovalent protein adducts in the presence of SDS is generally facilitated under mild conditions, i.e., using low SDS concentrations in sample buffer, omission of sample heating, and running of the gels in the cold room. Depending on the affinity of the TMS-TMS interaction under study and its competition by SDS binding, it is generally helpful to use protein concentrations in the micromolar range. Nevertheless, sample dilution during electrophoresis frequently results in partial dissociation of the protein complexes, thus producing smears on the gels. A major concern with
23
24 Transmemebrane Domains in Proteins
this technique – as with any other method that monitors protein-protein interaction in detergent solution – is the issue of potential unspecific adduct formation. Several criteria have been applied to exclude unspecific protein-protein interaction. First, inclusion of urea in sample buffer and gel significantly reduces unspecific interactions [233]. A critical aspect here is that the structure of the interaction domain must not be destroyed by urea, as seems to be the case with at least some TMSs [117, 119]. Second, the disruptive effect of certain point mutations within a TMS on adduct formation is a good indication of its specific nature and can reveal the identity of interfacial residues. By the same token, systematic point mutagenesis has been used to identify amino acid patterns of entire TMS-TMS interfaces (see Section 4.3). Third, competition of protein-protein interaction by added synthetic TMS peptides has been used to demonstrate the role of the TMSs in interaction and specificity [92, 147]. Natural TMSs whose interaction was previously studied by SDS-PAGE include those of glycophorin A [92, 93], Mas70p [234], phospholamban [127, 128], the major coat protein of M13 phage [235], and presynaptic SNAREs [117–119]. In cases where noncovalent protein-protein interaction cannot be detected in the presence of SDS, it has been proposed to substitute SDS with a novel detergent, perfluorooctanoic acid, which protects assembly of some membrane protein oligomers and allows their molecular weight determination [236]. Further, it is frequently helpful to stabilize weak interactions before SDS-PAGE analysis with homo- or heterobifunctional chemical cross-linking agents that are available in great variety with respect to chemical specificities, spacer lengths, and cleavability [237, 238]. Non-denaturing Electrophoresis The problem of denatured or masked TMSs upon SDS binding as encountered with SDS-PAGE is alleviated with non-denaturing or native PAGE. The membrane protein is solubilized with mild nonionic detergents and ε-aminocaproic acid. In the original version (“colorless native PAGE”), membrane protein migration in the electric field depended on the isoelectric point. Electrophoretic separation was greatly improved upon addition of the negatively charged dye Coomassie brilliant blue, which binds to membrane proteins, resulting in negatively charged dye-protein complexes at neutral pH. These complexes display reduced unspecific aggregation and travel towards the anode [239–242]. A problematic point with this technique, termed “blue native PAGE,” is that precise determination of molecular weights is difficult since the ratio of negative charge to molecular weight is far less uniform than with SDS-PAGE. Further, its success depends on the nature of the respective protein and requires careful control of the ratio of detergent, dye, and protein concentrations. 4.1.2 Hydrodynamic Methods Hydrodynamic methods such as gel-filtration chromatography and ultracentrifugation on density gradients have traditionally been used to study oligomeric complexes of soluble proteins and to calculate their molecular weights with reference to marker proteins. For membrane proteins, these techniques are carried out in
4 Experimental Techniques
the presence of non-denaturing detergents that may allow detection of TMS-TMS interactions that could be destroyed under the harsher conditions of SDS-PAGE [134]. To make the molecular weight increment of the detergent molecules that are bound to the protein invisible, D2 O is added to the solution such as to adjust its density to the density of the detergent [243]. In analytical ultracentrifugation, the free-energy change of an interaction as well as the stoichiometry of a complex can be determined [10]. The method has been applied to the TMSs of glycophorin A [100] and of the influenza M2 protein [10] and to TMS peptide models engineered to self-interact [141, 147]. 4.1.3 Fluorescence Resonance Transfer Techniques based on fluorescence resonance energy transfer (FRET) between suitable chromophores enjoy widespread use in protein-protein interaction research. Their advantage for membrane proteins is that they are capable of monitoring interactions in detergent micelles, upon reconstitution into synthetic bilayers, or even in membranes of live cells depending on the experimental approach taken. The principle of FRET is that two interacting proteins must carry different fluorescent chromophores, termed “donor” and “acceptor,” with complementary spectral characteristics. Fluorescent energy emitted by the donor can transfer to the acceptor in a radiation-less process, which is also called “F¨orster transfer,” provided that both are sufficiently close to each other. Since the efficiency of radiation-less transfer, as determined by measuring the decreasing intensity of donor fluorescence, decreases with the sixth power of the distance, FRET is used to measure intermolecular distances and is also dubbed as “molecular ruler” [10, 244–246]. The molecular distances between interaction partners that are amenable to FRET analysis are typically in the tens of angstroms and depend on the type of FRET pair used. A variety of donor/acceptor pairs have been successfully employed for analyzing TMS-TMS interactions using recombinant proteins or synthetic peptides. Many chromophores that are commercially available are conjugated to N-hydroxysuccinimide esters or maleimides that react with free primary amine groups or sulfhydryl groups, respectively. Synthetic peptides are labeled either after or during synthesis. Some examples include the pairs carboxyfluorescein/tetramethylrhodamine [247], dansyl/dabcyl [230], pyrene acetic acid/7-dimethyl-coumarin-4-acetic acid [97], and 7nitro-2-1,3-benzoxadiazol-4-yl/5(6) carboxy tetra-methylrhodamine [248]. In an alternative to synthetic chromophores, FRET was measured between synthetic peptides containing the pair tryptophan/pyrenylala-nine [159]. Natural TMSs whose interaction was studied by the FRET technique include those of the pore-forming domain of Bacillus thuringiensis delta-endotoxin [249], phospholamban [250], the major coat protein of Ff bacteriophage [230], and glycophorin A [97, 251]. The latter case is particularly interesting since FRET measurements in detergent micelles have revealed a strong dependence of the TMS-TMS affinity, but not of helicity, on the type and concentration of detergent used [97]. Apart from small molecule chromophores, derivatives of the green fluorescent protein have been developed that are suitable as donor/acceptor pairs and can be genetically fused to the protein under study. While the first-generation derivatives
25
26 Transmemebrane Domains in Proteins
suffered from low absorbances and spectral overlap [252], cyan and yellow fluorescent proteins [253–255] offer great hope for analysis of protein-protein interaction within the membranes of live cells [116, 256]. A technique based on bioluminescence resonance energy transfer (BRET) was recently described [257]. In that method, luciferase is genetically fused to the protein of interest; its luminescence is induced by addition of the membrane-permeable coelenterazine and excites yellow fluorescent protein fused to an interaction partner [258–260]. BRET has also been used to monitor membrane protein interactions in live cells and may have a number of advantages over FRET in that direct excitation of the acceptor and photobleaching of the donor are excluded; further, autofluorescence of endogenous cellular components does not interfere with the measurement [257]. 4.2 Genetic Assays
A number of genetic assay systems have been developed for the analysis of isolated TMS sequences or full-length membrane proteins in natural membranes. Their unifying hallmark is that protein-protein interaction elicits expression of reporter genes whose products can be quantitated and qualitatively represent the efficiency of the underlying interaction. 4.2.1 The ToxR System The single-span membrane protein ToxR transcription activator protein is the central regulator of virulence gene expression in Vibrio cholerae. Upon self-assembly in the inner membrane, the ToxR molecule activates the ctx promoter, thereby initiating transcription of downstream genes [261]. For a system that is suitable for studying TMS-TMS self-association, a tripartite protein consisting of the cytoplasmic ToxR domain, a TMS, and a periplasmatically located maltose-binding protein (MalE) has been engineered. Variants of this protein with TMSs of choice are expressed in Escherichia coli reporter strains, where reporter genes are under transcriptional control of the ctx promoter. While the original version of the ToxR system utilizes the lacZ gene as reporter [95, 262], the TOXCAT system is based on plasmid-borne chloramphenicol acetyltransferase [96]. Specific activities of the reporter enzymes in cell lysates reflect the degree of self-assembly of the ToxR proteins in the inner bacterial membrane. Since TMS-TMS interactions are concentration-dependent like any other protein-protein interaction, it is desirable to regulate the concentration of the ToxR proteins in the membrane. While ToxR expression in the first-generation vector is constitutive [95, 262], later versions make use of the regulatable lac [96] or arabinose [160] promoters, respectively. One advantage of regulatable promoters is that expression can be driven to levels that are high enough for protein solubilization, purification, and biochemical analysis (W. Ruan and D. L. Langosch, submitted). It should be borne in mind that the orientation of the interacting face of a TMS relative to the DNA-binding ToxR domain is a critical factor since it influences the
4 Experimental Techniques
coupling of TMS-TMS interaction to transcription activation [95, 274]. Therefore, the effect of inserting a TMS at different phases into ToxR chimeric proteins should initially be determined. Assuming α-helicity of the TMS, stepwise insertion of at least four additional residues at its N-terminus concurrent with stepwise deletion of four residues at its C-terminus rotates the potential TMS-TMS interface by up to 4 × 100 = 400◦ , i.e., more than a full helical turn, relative to the ToxR domain and thus reveals one construct whose orientation within the construct is suitable for further analysis. Control experiments are advisable to ascertain similar levels of ToxR expression and correct membrane integration. Concentrations of ToxR proteins in inner bacterial membranes can be compared by determining their ability to complement the deficiency in maltose-binding protein (MalE) of an E. coli deletion strain (PD28) [103]. This strain cannot grow in minimal medium with maltose as the only carbon source unless the C-terminal MalE domain of expressed ToxR chimeric proteins is successfully translocated to the periplasmic space [263]. Further, correct vectoriality of membrane integration can be confirmed by proteolytic digestion of the MalE domain in spheroplast preparations [96, 107]. The ToxR system has been used to study self-assembly of TMSs derived from glycophorin A [95, 96, 103], E-cadherin [139], erythropoietin receptors [136], SNARE proteins [119], receptor epidermal growth factor receptors [107], etc. [134, 140]. A variation of the ToxR system has been described where synthetic TMS peptides added to the culture medium inhibit homodimerization of endogenously expressed ToxR proteins by heterophilic association [231]. Apart from investigating defined TMS-TMS interactions, ToxR-based systems, i.e., TOXCAT [96, 102, 161] and POSSYCCAT [160], have been most useful for the construction and selection of self-assembling TMSs from combinatorial libraries, since the interacting proteins and the corresponding genetic information are part of one particle, the bacterial cell. In other words, the physical coupling of phenotype to genotype allows for selection of functional properties (see Section 2.4). Disadvantages of these systems in their present form are that heterophilic interactions cannot be determined and that no information on the stoichiometries of the ToxR complexes can be obtained. 4.2.2 Other Genetic Assays Another in vivo genetic assay system for detecting homophilic TMS-TMS interactions is based on the bacteriophage lambda cI repressor C-terminal dimerization domain. There, the native C-terminal dimerization domain of the lambda repressor is replaced by candidate TMSs. The ability of the TMSs to drive dimerization of the lambda repressor headpiece is tested by measuring the effectiveness of the hybrid proteins in preventing infection by a lambda phage missing its repressor [264]. This system has been used to identify new self-interacting TMSs in a library of natural membrane proteins from E. coli [276, 277]. In an attempt to make heterophilic TMS-TMS interactions accessible to investigation, a system, termed GALLEX, was recently developed wherein the TMSs are fused to two LexA DNA-binding domains with different DNA-binding
27
28 Transmemebrane Domains in Proteins
specificities. Heterodimeric association of the TMSs in E. coli inner membranes induces repression of a β-galactosidase reporter gene [265]. As discussed by these authors, the potential disadvantage of this system is that homophilic TMS-TMS interactions that exist in parallel may indirectly influence reporter gene expression, since the equilibria of homo- and heterophilic interactions in the membrane are coupled. To assess homo- and heterophilic membrane protein interactions in yeast membranes, a modification of the split-ubiquitin system [266] was presented. There, two potential interaction partners x and y are genetically fused to an altered N-terminal half (Nub G) or the C-terminal half (Cub ) of ubiquitin, respectively. In addition, an epitope-tagged reporter protein R is linked to the C-terminus of Cub . Upon xy interaction, Nub G and Cub -R re-associate and thus allow for proteolytic release of R by ubiquitin-specific proteases in the cell [266]. For membrane protein interactions, the reporter consists of an artificial transcription factor consisting of LexA and the herpes simplex VP16. Assembly of x and y then releases the transcription factor, thus inducing activation of a reporter gene [267]. Alternatively, the reporter consists of the yeast Ura3 protein that converts added 5-fluoroorotic acid to a toxic substrate, thus killing the cells. Once Nub G and Cub -ura3 re-associate due to xy interaction, Ura3 is proteolytically released and quickly degraded in the cell. Interaction of x and y thus prevents substrate conversion and allows for the cell’s survival [268]. Other methods for membrane interactions have been described that depend on genetic assays or complementation of enzyme function [269]. 4.3 Identification of TMS-TMS Interfaces by Mutational Analysis
In principle, any of the above methods can be used to assess the effect of individual point mutations on the degree of interaction. Thereby, the critical residues constituting TMS-TMS interfaces can be mapped. Point mutations have been introduced by random mutagenesis using degenerate codons [93] or by successive replacement of the residues in different types of scanning mutagenesis. Alanine-scanning mutagenesis is frequently used for this purpose [95, 117], as exchange of most residue types to alanine creates voids and/or replaces side chains involved in hydrogen bonding, etc., thus resulting in incremental reductions of protein-protein affinity. Alternatively, asparagine-scanning mutagenesis has been developed, which is based on the observation that asparagine residues located within TMSs drive their self-interaction by hydrogen bond formation in apolar environments such as a lipid membrane [140, 141]. Systematic replacement of TMS residues by asparagine therefore results in different enhancements of TMS-TMS affinity depending on whether the mutated position is closely juxtaposed to its counterpart within the helix-helix interface or not [274]. Covalent linkages between TMSs have been introduced upon systematically replacing the residues by cysteine. If oxidation of the cysteine leads to formation of a disulfide cross-link, the respective residue is regarded as interfacial [124, 270, 271].
References
References 1 WHITE, S. H. & WIMLEY, W. C. (1999). Membrane protein folding and stability: physical principles. Annu. Rev. Biophys. Biomol. Struct. 28, 319–365. 2 LEMMON, M. A. & ENGELMAN, D. M. (1994). Specificity and promiscuity in membrane helix interactions. Q. Rev. Biophys. 27, 157–218. 3 HALTIA, T. & FREIRE, E. (1995). Forces and factors that contribute to the structural stability of membrane proteins. Biochim. Biophys. Acta 1228, 1–27. 4 FLEMING, K. G. (2000). Riding the wave: structural and energetic principles of helical membrane proteins. Curr. Opin. Biotechnol. 11, 67–71. 5 UBARRETXENA-BELANDIA, I. & ENGELMAN, D. M. (2001). Helical membrane proteins: diversity of functions in the context of simple architecture. Curr. Opin. Struct. Biol. 11, 370–376. 6 ARKIN, I. T. (2002). Structural aspects of oligomerization taking place between the transmembrane alpha-helices of bitopic membrane proteins. Biochim. Biophys. Acta 1565, 347–363. 7 POPOT, J.-L. & ENGELMAN, D. M. (2000). Helical membrane protein folding, stability and evolution. Annu. Rev. Biochem. 69, 881–922. 8 LANGOSCH, D., LINDNER, E. & GUREZKA, R. (2002). In vitro selection of self-interacting transmembrane segments membrane proteins approached from a different perspective. IUBMB Life 54, 1–5. 9 HELMS, V. (2002). Attraction within the membrane – Forces behind transmembrane protein folding and supramolecular complex assembly. EMBO Rep. 3(12), 1133–1138. 10 DEGRADO, W. F., GRATKOWSKI, H. & LEAR, J. D. (2003). How do helix-helix interactions help determine the folds of membrane proteins?. Perspectives from the study of homo-oligomeric helical bundles. Prot. Sci. 12(4), 647–665. 11 CHAMBERLAIN, A. K., FAHAM, S., YOHANNAN, S. & BOWIE, J. U. (2003). Construction of helix-bundle membrane
12
13
14
15
16
17
18
19
20
proteins. Adv. Protein Chem. 63, 19–46. SHAI, Y. (2001). Molecular recognition within the membrane milieu: Implications for the structure and function of membrane proteins. J. Membr. Biol. 182(2), 91–104. KOEBNIK, R. (1999). Membrane assembly of the Escherichia coli outer membrane protein OmpA: exploring sequence constraints on transmembrane β-strands. J. Mol. Biol. 285, 1805–1810. KOEBNIK, R. (1996). In vivo membrane assembly of split variants of the E. coli outer membrane protein OmpA. EMBO J. 15, 3529–3537. THOMPSON, S. J., KIM, S. J. & ROBINSON, C. (1998). Sec-independent insertion of thylakoid membrane proteins – Analysis of insertion forces and identification of a loop intermediate involving the signal peptide. J. Biol. Chem. 273(30), 18979–18983. THOMPSON, S. J., ROBINSON, C. & MANT, A. (1999). Dual signal peptides mediate the signal recognition particle Sec-independent insertion of a thylakoid membrane polyprotein, psbY. J. Biol. Chem. 274(7), 4059–4066. MANT, A., WOOLHEAD, C. A., MOORE, M., HENRY, R. & ROBINSON, C. (2001). Insertion of PsaK into the thylakoid membrane in a ‘horse-shoe’ conformation occurs in the absence of signal recognition particle, nucleoside triphosphates, or functional Albino3. J. Biol. Chem. 276(39), 6200–6206. RIDDER, A., MOREIN, S., STAM, J. G., KUHN, A., de KRUIJFF, B. & KILLIAN, J. A. (2000). Analysis of the role of interfacial tryptophan residues in controlling the topology of membrane proteins. Biochemistry 39(21), 6521–6528. MULLER, M., KOCH, H. G., BECK, K. & SCHAEFER, U. (2001). Protein traffic in bacteria: Multiple routes from the ribosome to and across the membrane. Prog. Nucl. Acid Res. Mol. Biol. 66(107), 107–157. CHIN, C. N., VON HEIJNE, G. & de GIER, J. W. L. (2002). Membrane proteins:
29
30 Transmemebrane Domains in Proteins
21
22
23
24
25
26
27
28
29
30
31
shaping up. Trends Biochem. Sci. 27(5), 231–234. OTT, C. M. & LINGAPPA, V. R. (2002). Integral membrane protein biosynthesis: why topology is hard to predict. J. Cell Sci. 115(10), 2003–2009. DREW, D., FRODERBERG, L., BAARS, L. & de GIER, J. W. L. (2003). Assembly and overexpression of membrane proteins in Escherichia coli. Biochim. Biophys. Acta 1610(1), 3–10. POPOT, J.-L. & ENGELMAN, D. M. (1990). Membrane Protein Folding and Oligomerization: the two-stage model. Biochemistry 29, 4031–4037. POPOT, J. L., TREWHELLA, J. & ENGELMAN, D. M. (1986). Reformation of crystalline purple membrane from purified bacteriorhodopsin fragments. Embo 5, 3039–3044. OZAWA, S., HAYASHI, R., MASUDA, A., IIO, T. & TAKAHASHI, S. (1997). Reconstitution of bacteriorhodopsin from a mixture of a proteinase V8fragment and two synthetic peptides. Biochim. Biophys. Acta 1323, 145–153. MARTI, T. (1998). Refolding of bacteriorhodopsin from expressed polypeptide fragments. J. Biol. Chem. 273, 9312–9322. ARKIN, I. T. & BRU¨ NGER, A. T. (1998). Statistical analysis of predicted transmembrane alpha-helices. Biochim. Biophys. Acta 1429, 113–128. SENES, A., GERSTEIN, M. & ENGELMAN, D. M. (2000). Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with β-branched residues at neighboring positions. J. Mol. Biol. 296, 921–936. SCHNEIDER, D., LIU, Y., GERSTEIN, M. & ENGELMAN, D. M. (2002). Thermostability of membrane protein helix-helix interaction elucidated by statistical analysis. FEBS Lett. 532(1-2), 231–236. VON HEIJNE, G. (1986). The distribution of positively charged residues in bacterial inner membrane proteins correlates with the trans-membrane topology. EMBO J. 5, 3021–3027. LANDOLT-MARTICORENA, C., WILLIAMS, K. A., DEBER, C. M. & REITHMEIER, R. A. F.
32
33
34
35
36
37
38
39
40
41
(1993). Non-random distribution of amino acids in the transmembrane segments of human type I single span membrane proteins. J. Mol. Biol. 229, 602–608. JONES, D. T., TAYLOR, W. R. & THORNTON, J. M. (1994). A mutation data matrix for transmembrane proteins. FEBS Lett. 339(3), 269–275. JONES, D. T., TAYLOR, W. R. & THORNTON, J. M. (1994). A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 33(10), 3038–3049. STEVENS, T. J. & ARKIN, I. T. (2000). The effect of nucleotide bias upon the composition and prediction of transmembrane helices. Prot. Sci. 9, 505–511. VON HEIJNE, G. (1989). Control of topology and mode of assembly of a polytopic membrane protein by positively charged residues. Nature 341(6241), 456–458. WALLIN, E., TSUKIHARA, T., YOSHIKAWA, S., VON HEIJNE, G. & ELOFSSON, A. (1997). Architecture of helix bundle membrane proteins: an analysis of cytochrome c oxidase from bovine mitochondria. Prot. Sci. 6, 808–815. WIMLEY, W. C. & WHITE, S. H. (1996). Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat. Struct. Biol. 3(10), 842–848. YAU, W. M., WIMLEY, W. C., GAWRISCH, K. & WHITE, S. H. (1998). The preference of tryptophan for membrane interfaces. Biochemistry 37(42), 4713–4718. PERSSON, S., KILLIAN, J. A. & LINDBLOM, G. (1998). Molecular ordering of interfacially localized tryptophan analogs in ester- and ether-lipid bilayers studied by 2H-NMR. Biophys. J. 75(3), 1365– 1371. KILLIAN, J. A. & VON HEIJNE, G. (2000). How proteins adapt to a membranewater interface. Trends Biochem. Sci. 25, 429–434. MALL, S., BROADBRIDGE, R., SHARMA, R. P., LEE, A. G. & EAST, J. M. (2000). Effects of aromatic residues at the ends
References
42
43
44
45
46
47
48
49
50
of transmembrane alpha-helices on helix interactions with lipid bilayers. Biochemistry 39(8), 2071–2078. de PLANQUE, M. R., BONEV, B. B., DEMMERS, J. A., GREATHOUSE, D. V., KOEPPE, R. E., 2nd, SEPAROVIC, F., WATTS, A. & KILLIAN, J. A. (2003). Interfacial anchor properties of tryptophan residues in transmembrane peptides can dominate over hydrophobic matching effects in peptide-lipid interactions. Biochemistry 42(18), 5341–5348. KYTE, J. & DOOLITTLE, R. F. (1982). A simple method for displaying the hydropathic character of a protein. J. Mol. Biol 157, 105–132. ENGELMAN, D. M., STEITZ, T. A. & GOLDMAN, A. (1986). Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem. 15, 321–353. CSERZO, M., WALLIN, E., SIMON, I., VON HEIJNE, G. & ELOFSSON, A. (1997). Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 10(6), 673– 676. PASQUIER, C., PROMPONAS, V. J., PALAIOS, G. A., HAMODRAKAS, J. S. & HAMODRAKAS, S. J. (1999). A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm. Protein Eng. 12(5), 381–385. ROST, B., CASADIO, R., FARISELLI, P. & SANDER, C. (1995). Prediction of helical transmembrane segments at 95% accuracy. Prot. Sci. 4, 521–533. CHEN, C. P., KERNYTSKY, A. & ROST, B. (2002). Transmembrane helix predictions revisited. Protein Sci. 11(12), 2774–2791. VON HEIJNE, G. (1992). Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule. J. Mol. Biol. 225, 487–494. PERSSON, B. & ARGOS, P. (1996). Topology prediction of membrane proteins. Protein Sci. 5(2), 363–371.
51 PERSSON, B. & ARGOS, P. (1997). Prediction Of Membrane Protein Topology Utilizing Multiple Sequence Alignments. Journal Of Protein Chemistry 16(5), 453–457. 52 CREIGHTON, T. E. (1993). Proteins: Structures and Molecular Properties, Freeman, New York. 53 PADMANABHAN, S., MARQUSEE, S., RIDGEWAY, T., LAUE, T. M. & BALDWIN, R. L. (1990). Relative helix-forming tendencies of nonpolar amino acids. Nature 344, 268–270. 54 O’NEIL, K. T. & DEGRADO, W. F. (1990). A thermodynamic scale for the helix-forming tendencies of the commonly occuring amino acids. Science 250, 646–651. 55 BLABER, M., ZHANG, X. & MATTHEWS, B. W. (1993). Structural basis for amino acid alpha helix propensity. Science 260, 1637–1640. 56 SMITH, C. K., WITHKA, J. M. & REGAN, L. (1994). A thermodynamic scale for the β-sheet forming tendencies of the amino acids. Biochemistry 33, 5510–5517. 57 STREET, A. G. & MAYO, S. L. (1999). Intrinsic β-sheet propensities result from van der Waals interactions between side chains and the local backbone. Proc. Natl. Acad. Sci. USA 96, 9074–9076. 58 LI, S. C. & DEBER, C. M. (1992). Glycine and beta-branched residues support and modulate peptide helicity in membrane environments. FEBS Lett. 311, 217–220. 59 LI, S. C. & DEBER, C. M. (1994). A measure of helical propensity for amino acids in membrane environments. Nature Struct. Biol. 1, 368–373. 60 LIU, L.-P. & DEBER, C. M. (1998). Uncoupling hydrophobicity and helicity in transmembrane segments. J. Biol. Chem. 273, 3645–3648. 61 DEBER, C. M., LI, Z. M., JOENSSON, C., GLIBOWICKA, M. & XU, G. Y. (1992). Transmembrane Region of Wild-Type and Mutant M13 Coat Proteins Conformational Role of Beta-Branched Residues. J. Biol. Chem. 267(8), 5296–5300. 62 JOHANSSON, J., SZYPERSKI, T., CURSTEDT, T. & W¨UTHRICH, K. (1994). The NMR structure of pulmonary
31
32 Transmemebrane Domains in Proteins
63
64
65
66
67
68
69
70
71
72
73
surfactant-associated polypeptide SP-C in a apolar solvent contains a valyl-rich α-helix. Biochemistry 33, 6015. CORDES, F. S., BRIGHT, J. N. & SANSOM, M. S. P. (2002). Proline-induced Distortions of Transmembrane Helices. J. Mol. Biol. 323, 951–960. CHAKRABARTI, P. & CHAKRABARTI, S. (1998). C–H–O hydrogen bond involving proline residues in α-helices. J. Mol. Biol. 284(4), 867–873. VON HEIJNE, G. (1991). Proline kinks in transmembrane alpha-helices. J. Mol. Biol. 218, 499–503. JAVADPOUR, M. M., EILERS, M., GROESBEEK, M. & SMITH, S. O. (1999). Helix packing in polytopic membrane proteins: Role of glycine in trans-membrane helix association. Biophys. J. 77(3), 1609–1618. DANIEL, J. M., FRIESS, S. D., RAJAGOPALAN, S., WENDT, S. & ZENOBI, R. (2002). Quantitative determination of noncovalent binding interactions using soft ionization mass spectrometry. Int. J. Mass. Spec. 216, 1–27. ISRAELACHVILI, J. N. (1991). Inter-molecular and surface forces, Academic Press, London. SMITH, S. O., SMITH, C. S. & BORMANN, B. J. (1996). Strong hydrogen bonding interactions involving a buried glutamic acid in the transmembrane sequence of the neu/erbB-2 receptor. Nature Structural Biology 3, 252–258. SUBCZYNSKI, W. K., WISNIEWSKA, A., YIN, J.-J., HYDE, J. S. & KUSUMI, A. (1994). Hydrophobic Barriers of Lipid Bilayer Membranes Formed by Reduction of Water Penetration by Alkyl Chain Unsaturation and Cholesterol. Biochemistry 33, 7670–7681. WHITE, S. H., KING, G. I. & CAIN, J. E. (1981). Location of hexane in lipid bilayers determined by neutron diffraction. Nature 290, 161–163. GRASBERGER, B., MINTON, A. P., DELISI, C. & METZGER, H. (1986). Interaction between proteins localized in membranes. Proc. Natl. Acad. Sci. USA 83, 6258–6262. KLIBANOV, A. M. (2001). Improving enzymes by using them in
74
75
76
77
78
79
80
81
82
83
84
organic solvents. Nature 409, 241–246. BOWIE, J. U. (1997). Helix packing in membrane proteins. J. Mol. Biol. 272, 780–789. EILERS, M., SHEKAR, S. C., SHIEH, T., SMITH, S. O. & FLEMING, P. J. (2000). Internal packing of helical membrane proteins. Proc. Natl. Acad. Sci. 97, 5796–5801. SENES, A., UBARRETXENA-BELANDIA, I. & ENGELMAN, D. M. (2001). The Ca H···O hydrogen bond: A determinant of stability and specifity in transmembrane helix interactions. Proc. Natl. Acad. Sci. 98, 9056–9061. EILERS, M., PATEL, A. B., LIU, W. & SMITH, S. O. (2002). Comparison of helix interactions in membrane and soluble alpha-bundle proteins. Biophys. J. 82(5), 2720-2736. BEN-TAL, N. & HONIG, B. (1996). Helix-helix interactions in lipid bilayers. Biophys. J. 71(6), 3046-3050. CHOTHIA, C., LEVITT, M. & RICHARDSON, D. (1981). Helix to helix packing in proteins. J. Mol. Biol. 145, 215–250. LANGOSCH, D. & HERINGA, J. (1998). Interaction of transmembrane helices by a knobs-into-holes geometry characteristic of soluble coiled coils. Proteins Struct. Funct. Genet. 31, 150–160. STEVENS, T. J. & ARKIN, I. T. (2001). Substitution rates in alpha-helical transmembrane proteins. Protein Science 10(12), 2507–2517. LIU, Y., ENGELMAN, D. M. & GERSTEIN, M. (2002). Genomic analysis of membrane protein families: abundance and conserved motifs. Genome Biol. 3(10), RESEARCH0054.1-0054.12. PILPEL, Y., BEN-TAL, N. & LANCET, D. (1999). kPROT: a knowledge-based scale for the propensity of residue orientation in transmembrane segments. Application to membrane protein structure prediction. J. Mol. Biol. 294, 921–935. ADAMIAN, L. & LIANG, J. (2001). Helix-Helix Packing and Interfacial Pairwise Interactions of Residues in Membrane Proteins. J. Mol. Biol. 311, 891–907.
References 85 WIGLEY, W. C., CORBOY, M. J., CUTLER, T. D., THIBODEAU, P. H., OLDAN, J., LEE, M. G., RIZO, J., HUNT, J. F. & THOMAS, P. J. (2002). A protein sequence that can encode native structure by disfavoring alternate conformations. Nat. Struct. Biol. 9(5), 381–388. 86 LEAR, J. D., GRATKOWSKI, H., ADAMIAN, L., LIANG, J. & DEGRADO, W. F. (2003). Position-dependence of stabilizing polar interactions of asparagine in transmembrane helical bundles. Biochemistry 42(21), 6400–6407. 87 SAMATEY, F. A., XU, C. & POPOT, J.-L. (1995). On the distribution of amino acid residues in transmembrane α-helix bundles. Proc. Natl. Acad. Sci. 92, 4577–4581. 88 ADAMIAN, L. & LIANG, J. (2002). Interhelical hydrogen bonds and spatial motifs in membrane proteins: polar clamps and serine zippers. Proteins-Structure Function and Genetics 47, 209–218. 89 ADAMIAN, L., JACKUPS, R., Jr., BINKOWSKI, T. A. & LIANG, J. (2003). Higher-order interhelical spatial interactions in membrane proteins. J. Mol. Biol. 327(1), 251–272. 90 MACKENZIE, K. R., PRESTEGARD, J. H. & ENGELMAN, D. M. (1997). A transmembrane helix dimer: structure and implications. Science 276, 131–133. 91 SMITH, S. O., SONG, D., SHEKAR, S., GROESBEEK, M., ZILIOX, M. & AIMOTO, S. (2001). Structure of the Transmembrane Dimer Interface of Glycophorin A in Membrane Bilayers. Biochemistry 40(22), 6553–6558. 92 LEMMON, M. A., FLANAGAN, J. M., HUNT, J. F., ADAIR, B. D., BORMANN, B.-J., DEMPSEY, C. E. & ENGELMAN, D. M. (1992). Glycophorin A dimerization is driven by specific interactions between transmembrane alpha-helices. J. Biol. Chem. 267, 7683–7689. 93 LEMMON, M. A., FLANAGAN, J. M., TREUTLEIN, H. R., ZHANG, J. & ENGELMAN, D. M. (1992). Sequence specificity in the dimerization of transmembrane alpha-helices. Biochemistry 31, 2719–2725.
94 LEMMON, M. A., TREUTLEIN, H. R., ADAMS, P. D., BRU¨ NGER, A. T. & ENGELMAN, D. (1994). A dimerization motif for transmembrane alpha-helices. Nature Struct. Biol. 1, 157–163. 95 LANGOSCH, D. L., BROSIG, B., KOLMAR, H. & FRITZ, H.-J. (1996). Dimerisation of the glycophorin A transmembrane segment in membranes probed with the ToxR transcription activator. J. Mol. Biol. 263, 525–530. 96 RUSS, W. P. & ENGELMAN, D. M. (1999). TOXCAT: A measure of transmembrane helix association in a biological membrane. Proc. Natl. Acad. Sci. USA 96, 863–868. 97 FISHER, L. E., ENGELMAN, D. M. & STURGIS, J. N. (1999). Detergents modulate dimerization but not helicity, of the glycophorin A transmembrane domain. J. Mol. Biol. 293(3), 639–651. 98 TREUTLEIN, H. R., LEMMON, M. A., ENGELMAN, D. M. & BRU¨ NGER, A. T. (1992). The glycophorin A transmembrane domain dimer: sequence-specific propensity for a right-handed supercoil of helices. Biochemistry 31, 2726–2733. 99 ADAMS, P. D., ENGELMAN, D. M. & BRU¨ NGER, A. T. (1996). Improved prediction for the structure of the dimeric transmembrane domain of glycophorin A obtained through global searching. Proteins 26, 257–261. 100 FLEMING, K. G., ACKERMAN, A. L. & ENGELMAN, D. M. (1997). The Effect Of Point Mutations On the Free Energy Of Transmembrane Alpha Helix Dimerization. J. Mol. Biol. 272, 266–275. 101 FLEMING, K. G. & ENGELMAN, D. M. (2001). Specificity in transmembrane helix-helix interactions can define a hierarchy of stability for sequence variants. Proceedings of the National Academy of Sciences of the United States of America 98(25), 14340–14344. 102 RUSS, W. P. & ENGELMAN, D. M. (2000). The GxxxG motif: a framework for transmembrane helix-helix association. J. Mol. Biol. 296, 911–919. 103 BROSIG, B. & LANGOSCH, D. (1998). The dimerization motif of the glycophorin A transmembrane segment in
33
34 Transmemebrane Domains in Proteins
104
105
106
107
108
109
110
111
112
membranes: importance of glycine residues. Protein Sci. 7, 1052–1056. MINGARRO, I., WHITLEY, P., LEMMON, M. A. & VONHEIJNE, G. (1996). Ala-insertion scanning mutagenesis of the glycophorin A transmembrane helix: a rapid way to map helix-helix interactions in integral membrane proteins. Protein Sci. 5, 1339–1341. WILLIAMS, K. A., GLIBOWICKA, M., LI, Z., LI, H., KHAN, A. R., CHEN, Y. M. Y., WANG, J., MARVIN, D. A. & DEBER, C. M. (1995). Packing of coat protein amphipathic and transmembrane helices in filamentous bacteriophage M13: role of small residues in protein oligomerization. J. Mol. Biol. 252, 6–14. ASUNDI, V. K. & CAREY, D. J. (1995). Self-association of N-syndecan (syndecan-3) core protein is mediated by a novel structural motif in the transmembrane domain and ecto-domain flanking region. J. Biol. Chem. 270, 26404–26410. MENDROLA, J. M., BERGER, M. B., KING, M. C. & LEMMON, M. A. (2002). The single transmembrane domains of ErbB receptors self-associate in cell membranes. J. Biol. Chem. 277(7), 4704–4712. KLEIGER, G., GROTHE, R., MALLICK, P. & EISENBERG, D. (2002). GXXXG and AXXXA: Common alpha-helical interaction motifs in proteins, particularly in extremophiles. Biochemistry 41(19), 5990–5997. ORZAEZ, M., PEREZ-PAYA, E. & MINGARRO, I. (2000). Influence of the C-terminus of the glycophorin A transmembrane fragment on the dimerization process. Prot. Sci. 9, 1246–1253. JAHN, R., LANG, T. & S¨UDHOF, T. C. (2003). Membrane Fusion. Cell 112, 519–533. S¨OLLNER, T., BENNETT, M. K., WHITEHEARD, S. W., SCHELLER, R. H. & ROTHMAN, J. E. (1993). A protein assembly-disassembly pathway in vitro that may correspond to sequential steps of synaptic vesicle docking, activation, and fusion. Cell 75, 409–418. SUTTON, R. B., FASSHAUER, D., JAHN, R. & BRU¨ NGER, A. T. (1998). Crystal structure
113
114
115
116
117
118
119
120
121
of a SNARE complex involved in synaptic exocytosis at 2.4A resolution. Nature 395, 347–353. CALAKOS, N. & SCHELLER, R. H. (1994). Vesicle-associated membrane protein and synaptophysin are associated on the synaptic vesicle. J. Biol. Chem. 269, 4534–4537. WASHBOURNE, P., SCHIAVO, G. & MONTECUCCO, C. (1995). Vesicle-associated membrane protein-2 (synaptobrevin-2) forms a complex with synaptophysin. Biochem. J. 305, 721–724. EDELMAN, L., HANSON, P. I., CHAPMAN, E. R. & JAHN, R. (1995). Synaptobrevin binding to synapto-physin: a potential mechanism for controlling the exocytotic fusion machine. EMBO J. 14, 224–231. PENNUTO, M., DUNLAP, D., CONTESTABILE, A. F. B. & VALTORTA, F. (2002). Fluorescence Resonance Energy Transfer Detection of Synaptophysin I and Vesicle-associated Membrane Protein 2 Interactions during Exocytosis from Single Live Synapses. Mol. Biol. Cell 13, 2706–2717. LAAGE, R. & LANGOSCH, D. (1997). Dimerization of the synaptic vesicle protein synaptobrevin/VAMP II depends on specific residues within the transmembrane segment. Eur. J. Biochem. 249, 540–546. MARGITTAI, M., OTTO, H. & JAHN, R. (1999). A stable interaction between syntaxin 1a and synaptobrevin 2 mediated by their transmembrane domains. FEBS Lett. 446, 40–44. LAAGE, R., ROHDE, J., BROSIG, B. & LANGOSCH, D. (2000). A conserved membrane-spanning amino acid motif drives homomeric and supports heteromeric assembly of presynaptic SNARE proteins. J. Biol. Chem. 275, 17481–17487. FLEMING, K. G. & ENGELMAN, D. M. (2001). Computation and mutagenesis suggest a right-handed structure for the synaptobrevin transmembrane dimer. Proteins 45(4), 313–317. POIRIER, M. A., HAO, J. C., MALKUS, P. N., CHAN, C., MOORE, M. F., KING, D. S. & BENNETT, M. K. (1998). Protease resistance of syntaxin SNAP-25 VAMP
References
122
123
124
125
126
127
128
129
complexes Implications for assembly and structure. J. Biol. Chem. 273, 11370–11377. HOLSINGER, L. J., NICHANI, D., PINTO, L. H. & LAMB, R. A. (1994). Influenza a Virus M(2) Ion-Channel Protein a Structure-Function Analysis. J. Virol. 68(3), 1551–1563. PINTO, L. H., DIECKMANN, G. R., GANDHI, C. S., PAPWORTH, C. G., BRAMAN, J., SHAUGNESSY, M. A., LEAR, J. D., LAMB, R. A. & DEGRADO, W. F. (1997). A functionally defined model for the M2 proton channel of influenza A virus suggests a mechanism for its ion selectivity. Proc. Natl. Acad. Sci. USA 94, 11301–11306. BAUER, C. M., PINTO, L. H., CROSS, T. A. & LAMB, R. A. (1999). The Influenza Virus M2 Ion Channel Protein: Probing the Structure of the Transmembrane Domain in Intact Cells by Using Engineered Disulfide Cross-Linking. Virology 254, 196–209. DIECKMANN, G. R. & DEGRADO, W. F. (1997). Modeling transmembrane helical oligomers. Curr. Opin. Struct. Biol. 7, 486–494. ARKIN, I. T., ADAMS, P. D., BRU¨ NGER, A. T., SMITH, S. & ENGELMAN, D. M. (1997). Structural perspectives of phospolamban, a helical transmembrane pentamer. Annu. Rev. Biophys. Biomol. Struct. 26, 157–179. ARKIN, I. T., ADAMS, P. D., MACKENZIE, K. R., LEMMON, M. A., BRU¨ NGER, A. T. & ENGELMAN, D. M. (1994). Structural organization of the pentameric transmembrane alpha-helices of phospholamban, a cardiac ion channel. EMBO J. 13, 4757–4764. SIMMERMAN, H. K. B., KOBAYASHI, Y. M., AUTRY, J. M. & JONES, L. R. (1996). A leucine zipper stabilizes the pentameric membrane domain of phospholamban and forms a coiled-coil pore structure. J. Biol. Chem. 271, 5941–5946. ARKIN, I. T., ROTHMAN, M., LUDLAM, C. F. C., AIMOTO, S., ENGELMAN, D., ROTHSCHILD, K. J. & SMITH, S. O. (1995). Structural Model of the phospholamban ion channel complex in phospholipid membranes. J. Mol. Biol. 248, 824–834.
130 KARIM, C. B., STAMM, J. D., KARIM, J., JONES, L. R. & THOMAS, D. D. (1998). Cysteine reactivity and oligomeric structures of phospholamban and its mutants. Biochemistry 37(35), 2074–2081. 131 TORRES, J., KUKOL, A. & ARKIN, I. T. (2001). Mapping the energy surface of transmembrane helix-helix interactions. Biophysical Journal 81(5), 2681– 2692. 132 LIVNAH, O., STURA, E. A., MIDDLETON, S. A., JOHNSON, D. L., JOLLIFFE, L. K. & WILSON, I. A. (1999). Crystallo-graphic evidence for preformed dimers of eryhtropoietin receptor before ligand activation. Science 283, 987–990. 133 REMY, I., WILSON, I. A. & MICHNICK, S. W. (1999). Erythropoietin receptor activation by a ligand-induced conformation change. Science 283, 990–993. 134 GUREZKA, R., LAAGE, R., BROSIG, B. & LANGOSCH, D. (1999). A Heptad Motif of Leucine Residues Found in Membrane Proteins Can Drive Self-Assembly of Artificial Transmembrane Segments. J. Biol. Chem. 274, 9265–9270. 135 CONSTANTINESCU, S. N., KEREN, T., SOCOLOVSKY, M., NAM, H. S., HENIS, Y. I. & LODISH, H. F. (2001). Ligand-independent oligomerization of cell-surface erythropoietin receptor is mediated by the transmembrane domain. Proc. Natl. Acad. Sci. U.S.A. 98, 4379–4384. 136 KUBATZKY, K. F., RUAN, W., GUREZKA, R., COHEN, J., KETTELER, R., WATOWICH, S. S., NEUMANN, D., LANGOSCH, D. & KLINGMU¨ LLER, U. (2001). Self-assembly of the transmembrane domain is a crucial mediator for signalling through the erythropoietin receptor. Current Biology 11, 110–115. 137 CONSTANTINESCU, S. N., LIU, X., BEYER, W., FALLON, A., SHEKAR, S., HENIS, Y. I., SMITH, S. O. & LODISH, J. F. (1999). Activation of the erythropoietin receptor by the gp55-P viral envelope protein is determined by a single amino acid in its trans-membrane domain. EMBO J. 18, 3334–3347. 138 SHAPIRO, L. & COLMAN, D. R. (1998). Structural biology of cadherins in the
35
36 Transmemebrane Domains in Proteins
139
140
141
142
143
144
145
146
147
148
149
nervous system. Curr. Opin. Neurobiol. 8, 593–599. HUBER, O., KEMMLER, R. & LANGOSCH, D. (1999). Mutations affecting transmembrane segment interaction impair adhesiveness of E-cadherin. J. Cell Sci. 112, 4415–4423. ZHOU, F. X., COCCO, M. J., RUSS, W. P., BRUNGER, A. T. & ENGELMAN, D. M. (2000). Interhelical hydrogen bonding drives strong interactions in membrane proteins. Nature Struct. Biol. 7, 154–160. CHOMA, C., GRATKOWSKI, H., LEAR, J. D. & DEGRADO, W. F. (2000). Asparagine-mediated self-association of a model transmembrane helix. Nature Struct. Biol. 7, 161–166. O’SHEA, E. K., KLEMM, J. D., KIM, P. S. & ALBER, T. (1991). X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science 243, 539–544. ALBER, T. (1992). Structure of the leucine zipper. Corr. Op. Genet. and Developm. 2, 205–210. HARBURY, P. B., ZHANG, T., KIM, P. S. & ALBER, T. (1993). A switch between two-, three-, and four-stranded coiled coils in GCN4 leucine zipper mutants. Science 262, 1401–1406. GONZALEZ, L., PLECS, J. J. & ALBER, T. (1996). An engineered allosteric switch in leucine-zipper oligomerization. Nature Struct. Biol. 3, 510–515. HARBURY, P. B., KIM, P. S. & ALBER, T. (1994). Crystal structure of an isoleucine-zipper trimer. Nature 371, 80–83. GRATKOWSKI, H., LEAR, J. D. & DEGRADO, W. F. (2001). Polar side chains drive the association of model transmembrane peptides. Proc. Natl. Acad. Sci. USA 98, 880–885. ZHOU, F. X., MERIANOS, H. J., BRU¨ NGER, A. T. & ENGELMAN, D. M. (2001). Polar residues drive association of polyleucine transmembrane helices. Proc. Natl. Acad. Sci. USA. WEINER, D. B., LIU, J., COHEN, J. A., WILLIAMS, W. V. & GREENE, M. I. (1989). A point mutation in the neu oncogene mimics ligand induction of receptor aggregation. Nature 339, 230–231.
150 WEBSTER, M. & DONOGHUE, J. (1996). Constitutive activation of fibroblast growth factor receptor 3 by the transmembrane domain point mutation found in achondroplasia. EMBO J. 15, 520–527. 151 ONISHI, M., MUI, A. L. F., MORIKAWA, Y., CHO, L., KINOSHITA, S., NOLAN, G. P., GORMAN, D. M., MIYAJIMA, A. & KITAMURA, T. (1996). Identification of an oncogenic form of the thrombo-poietin receptor MPL using retrovirus-mediated gene transfer. Blood 88(4), 1399–1406. 152 FORBES, L. V., GALE, R. E., PIZZEY, A., POUWELS, K., NATHWANI, A. & LINCH, D. C. (2002). An activating mutation in the transmembrane domain of the granulocyte colony-stimulating factor receptor in patients with acute myeloid leukemia. Oncogene 21(39), 5981– 5989. 153 CALL, M. E., PYRDOL, J., WIEDMANN, M. & WUCHERPFENNIG, K. W. (2002). The Organizing Principle in the Formation of the T Cell Receptor-CD3 Complex. Cell 111, 967–979. 154 ALCOVER, A., MARIUZZA, R. A., ERMONVAL, M. & ACUTO, O. (1990). Lysine 271 in the Transmembrane Domain of the T-Cell Antigen Receptor Beta-Chain Is Necessary for Its Assembly with the Cd3 Complex but Not for Alpha Beta-Dimerization. J. Biol. Chem. 265(7), 4131–4135. 155 BLUMBERG, R. S., ALARCON, B., SANCHO, J., MCDERMOTT, F. V., LOPEZ, P., BREITMEYER, J. & TERHORST, C. (1990). Assembly and Function of the T-Cell Antigen Receptor Requirement of Either the Lysine or Arginine Residues in the Transmembrane Region of the Alpha-Chain. J. Biol. Chem. 265(23), 4036–4043. 156 COSSON, P., LANKFORD, S., BONIFACINO, J. S. & KLAUSNER, R. D. (1991). Membrane protein association by potential intramembrane charge pairs. Nature 351, 414–416. 157 BONIFACINO, J. S., SUZUKI, C. K. & KLAUSNER, R. D. (1990). A peptide sequence confers retention and rapid degradation in the endoplasmic reticulum. Science 247, 79–82.
References 158 HEINRICH, S. U., MOTHES, W., BRUNNER, J. & RAPOPORT, T. A. (2000). The Sec61p complex mediates the integration of a membrane protein by allowing lipid partitioning of the transmembrane domain. Cell 102(2), 233–244. 159 SHIGEMATSU, D., MATSUTANI, M., FURUYA, T., KIYOTA, T., LEE, S., SUGIHARA, G. & YAMASHITA, S. (2002). Roles of peptide-peptide charge interaction and lipid phase separation in helix-helix association in lipid bilayer. Biochim. Biophys. Acta 1564, 271–280. 160 GUREZKA, R. & LANGOSCH, D. (2001). In vitro selection of membrane-spanning leucine zipper protein-protein interaction motifs using POSSYCCAT. J. Biol. Chem. 276, 45580–45587. 161 DAWSON, J. P., WEINGER, J. S. & ENGELMAN, D. M. (2002). Motifs of serine and threonine can drive association of transmembrane helices. J. Mol. Biol. 316(3), 799–805. 162 PEBAY-PEYROULA, E. & ROSENBUSCH, J. P. (2001). High-resolution structures and dynamics of membrane protein lipid complexes: a critique. Curr. Opin. Struct. Biol. 11(4), 427–432. 163 FYFE, P. K., MCAULEY, K. E., ROSZAK, A. W., ISAACS, N. W., COGDELL, R. J. & JONES, M. R. (2001). Probing the interface between membrane proteins and membrane lipids by X-ray crystallography. Trends Biochem. Sci. 26(2), 106–112. 164 LEE, A. G. (2003). Lipid-protein interactions in biological membranes: a structural perspective. Biochim. Biophys. Acta 1612(1), 1–40. 165 ESSEN, L., SIEGERT, R., LEHMANN, W. D. & OESTERHELT, D. (1998). Lipid patches in membrane protein oligomers: crystal structure of the bacteriorhodopsin-lipid complex. Proc. Natl. Acad. Sci. USA 95(20), 1673–1678. 166 CARTAILLER, J. P. & LUECKE, H. (2003). X-ray crystallographic analysis of lipid-protein interactions in the bacteriorhodopsin purple membrane. Annu. Rev. Biophys. Biomol. Struct. 32, 285–310. 167 SEDLAK, E. & ROBINSON, N. C. (1999). Phospholipase A(2) digestion of
168
169
170
171
172
173
174
175
176
177
cardiolipin bound to bovine cyto-chrome c oxidase alters both activity and quaternary structure. Biochemistry 38(45), 14966–14972. LANGE, C., NETT, J. H., TRUMPOWER, B. L. & HUNTE, C. (2001). Specific roles of protein-phospholipid interactions in the yeast cytochrome bc(1) complex structure. EMBO J. 20(23), 6591–6600. CAMARA-ARTIGAS, A., BRUNE, D. & ALLEN, J. P. (2002). Interactions between lipids and bacterial reaction centers determined by protein crystallography. Proc. Natl. Acad. Sci. USA 99(17), 11055–11060. JORMAKKA, M., TORNROTH, S., BYRNE, B. & IWATA, S. (2002). Molecular basis of proton motive force generation: structure of formate dehydrogenase-N. Science 295(5561), 1863–1868. PETRACHE, H. I., GROSSFIELD, A., MACKENZIE, K. R., ENGELMAN, D. M. & WOOLF, T. B. (2000). Modulation of glycophorin A transmembrane helix interactions by lipid bilayers: molecular dynamics calculations. J. Mol. Biol. 302(3), 727–746. MALL, S., BROADBRIDGE, R., SHARMA, R. P., EAST, J. M. & LEE, A. G. (2001). Self-association of model transmem-brane alpha-helices is modulated by lipid structure. Biochemistry 40(41), 12379–12386. GIL, T., IPSEN, J. H., MOURITSEN, O. G., SABRA, M. C., SPEROTTO, M. M. & ZUCKERMANN, M. J. (1998). Theoretical analysis of protein organization in lipid membranes. Biochim. Biophys. Acta 1376(3), 245–266. CANTOR, R. S. (1999). Lipid composition and the lateral pressure profile in bilayers. Biophys. J. 76(5), 2625–2639. CANTOR, R. S. (2002). Size distribution of barrel-stave aggregates of membrane peptides: influence of the bilayer lateral pressure profile. Biophys. J. 82(5), 2520–2525. KILLIAN, J. A. (1998). Hydrophobic mismatch between proteins and lipids in membranes. Biochim. Biophys. Acta 1376(3), 401–415. DUMAS, F., LEBRUN, M. C. & TOCANNE, J. F. (1999). Is the protein/lipid
37
38 Transmemebrane Domains in Proteins
178
179
180
181
182
183
184
185
hydrophobic matching principle relevant to membrane organization and functions? FEBS Lett. 458(3), 271–277. LEWIS, B. A. & ENGELMAN, D. M. (1983). Bacteriorhodopsin remains dispersed in fluid phospholipid bilayers over a wide range of bilayer thicknesses. J. Mol. Biol. 166(2), 203–210. REN, J., LEW, S., WANG, J. & LONDON, E. (1999). Control of the transmem-brane orientation and interhelical interactions within membranes by hydrophobic helix length. Biochemistry 38, 5905–5912. DENCHER, N. A., KOHL, K. D. & HEYN, M. P. (1983). Photochemical cycle and light-dark adaptation of monomeric and aggregated bacteriorhodopsin in various lipid environments. Biochemistry 22(6), 1323–1334. PIKNOVA, B., PEROCHON, E. & TOCANNE, J. F. (1993). Hydrophobic mismatch and long-range protein/lipid interactions in bacteriorho-dopsin/phosphatidylcholine vesicles. Eur. J. Biochem. 218(2), 385–396. HOROWITZ, A. D. (1995). Exclusion of SP-C, but not SP-B, by gel phase palmitoyl lipids. Chem. Phys. Lipids 76(1), 27–39. ZHANG, Y. P., LEWIS, R. N., HODGES, R. S. & MCELHANEY, R. N. (2001). Peptide models of the helical hydrophobic transmembrane segments of membrane proteins: interactions of acetyl-K2-(LA)12-K2-amide with phosphatidylethanolamine bilayer membranes. Biochemistry 40(2), 474–482. RINIA, H. A., KIK, R. A., DEMEL, R. A., SNEL, M. M., KILLIAN, J. A., van der EERDEN, J. P. & de KRUIJFF, B. (2000). Visualization of highly ordered striated domains induced by transmembrane peptides in supported phosphatidylcholine bilayers. Biochemistry 39(19), 5852–5858. JONES, D. H., RIGBY, A. C., BARBER, K. R. & GRANT, C. W. M. (1997). Oligomerization Of the EGF Receptor Transmembrane Domain: a H2 NMR Study In Lipid Bilayers. Biochemistry 36, 12616–12624.
186 Van DUYL, B. Y., RIJKERS, D. T. S., KRUIJFF, B. d. & KILLIAN, J. A. (2002). Influence of hydrophobic mismatch and palmitoylation on the association of transmembrane α-helical peptides with detergent-resistant membranes. FEBS Lett., 79–84. 187 SCHEIFFELE, P., ROTH, M. G. & SIMONS, K. (1997). Interaction Of Influenza Virus Haemagglutinin With Sphingolipid Cholesterol Membrane Domains Via Its Transmembrane Domain. EMBO J. 16, 5501–5508. 188 PERSCHL, A., LESLEY, J., ENGLISH, N., HYMAN, R. & TROWBRIDGE, I. S. (1995). Transmembrane domain of CD44 is required for its detergent insolubility in fibroblasts. J. Cell Sci. 108(Pt 3), 1033–1041. 189 FIELD, K. A., HOLOWKA, D. & BAIRD, B. (1999). Structural aspects of the association of FcepsilonRI with detergent-resistant membranes. J. Biol. Chem. 274(3), 1753–1758. 190 THERIEN, A. G. & DEBER, C. M. (2002). Interhelical packing in detergent micelles Folding of a cystic fibrosis transmembrane conductance regulator construct. J. Biol. Chem. 277, 6067–6072. 191 VOGEL, H. (1992). Structure and dynamics of polypeptides and proteins in lipid membranes. Q. Rev. Biophys. 25, 433–457. 192 BELOHORCOVA, K., DAVIS, J. H., WOOLF, T. B. & ROUX, B. (1997). Structure and Dynamics of an Amphiphilic Peptide in a Lipid Bilayer: A Molecular Dynamics Study. Biophys. J. 73, 3039–3055. 193 BRANDL, C. J. & DEBER, C. M. (1986). Hypothesis About the Function of Membrane-Buried Proline Residues in Transport Proteins. Proc. Natl. Acad. Sci. U.S.A. 83(4), 917–921. 194 LABRO, A. J., RAES, A. L., OTTSCHYTSCH, N. & SNYDERS, D. J. (2001). Role of the S6 tandem proline motif in gating of Kv channels. Biophys. J. 80(1), 441A–441A. 195 SANSOM, M. S. P. & WEINSTEIN, H. (2000). Hinges, swivels and switches: the role of prolines in signalling via transmembrane α-helices. Trends. Pharmacol. Sci. 21, 445–451.
References 196 SHELDEN, M. C., LOUGHLIN, P., TIERNEY, M. L. & HOWITT, S. M. (2001). Proline residues in two tightly coupled helices of the sulphate transporter, SHST1, are important for sulphate transport. Biochem. J. 356, 589–594. 197 DUCLOHIER, H., MOLLE, G., DUGAST, J. Y. & SPACH, G. (1992). Prolines Are Not Essential Residues in the Barrel-Stave Model for Ion Channels Induced by Alamethicin Analogs. Biophys. J. 63(3), 868–873. 198 KADUK, C., DUCLOHIER, H., DATHE, M., WENSCHUH, H., BEYERMANN, M., MOLLE, G. & BIENERT, M. (1997). Influence of proline position upon the ion channel activity of alamethicin. Biophys. J. 72(5), 2151–2159. 199 GIBBS, N., SESSIONS, R. B., WILLIAMS, P. B. & DEMPSEY, C. E. (1997). Helix bending in alamethicin: molecular dynamics simulations and amide hydrogen exchange in methanol. Biophys. J. 72, 2490–2495. 200 TIELEMAN, D. P., SANSOM, M. S. P. & BERENDSEN, H. J. C. (1999). Alamethicin Helices in a Bilayer and in Solution: Molecular Dynamics Simulations. Biophys. J. 76, 40–49. 201 JACOB, J., DUCLOHIER, H. & CAFISO, D. S. (1999). The Role of Proline and Glycine in Determining the Backbone Flexibility of a Channel-Forming Peptide. Biophys. J. 76, 1367–1376. 202 DEMMERS, J. A. A., DUIJN, E. v., HAVERKAMP, J., GREATHOUSE, D. V. I. I. R. E. K., HECK, A. J. R. & KILIAN, J. A. (2001). Interfacial Positioning and Stability of Transmembrane Peptides in Lipid Bilayers Studied by Combining Hydrogen/Deuterium Exchange and Mass Spectrometry. J. Biol. Chem. 276, 4501–4508. 203 LANGOSCH, D., CRANE, J. M., BROSIG, B., HELLWIG, A., TAMM, L. K. & REED, J. (2001). Peptide mimics of SNARE transmembrane segments drive membrane fusion depending on their conformational plasticity. J. Mol. Biol. 311, 709–721. 204 LANGOSCH, D., BROSIG, B. & PIPKORN, R. (2001). Peptide mimics of the vesicular stomatitis virus G-protein
205
206
207
208
209
210
211
212
213
214
transmembrane segment drive membrane fusion in vitro. J. Biol. Chem. 276, 32016–32021. DENNISON, S. M., GREENFIELD, N., LENARD, J. & LENTZ, B. R. (2002). VSV transmembrane domain (TMD) peptide promotes PEG-mediated fusion of liposomes in a conformationally sensitive fashion. Biochemistry 41(50), 14925–14934. LI, L. & CHIN, L. S. (2003). The molecular machinery of synaptic vesicle exocytosis. Cell. Mol. Life Sci. 60(5), 942– 960. SCALES, S. J., BOCK, J. B. & SCHELLER, R. H. (2000). Cell biology The specifics of membrane fusion. Nature 407, 144–146. BLUMENTHAL, R., CLAGUE, M. J., DURELL, S. T. & EPAND, R. M. (2003). Membrane Fusion. Chem. Rev. 103, 53–69. MELIKYAN, G. B., LIN, S. S., ROTH, M. G. & COHEN, F. S. (1999). Amino acid sequence requirements of the transmembrane and cytoplasmic domains of influenza virus hemagglutinin for viable membrane fusion. Mol. Biol. Cell 6, 1821–1836. OWENS, R. J., BURKE, C. & ROSE, J. K. (1994). Mutations in the membrane-spanning domain of the human immunodeficiency virus envelope glycoprotein that affect fusion activity. J. Virol. 68, 570–574. TAYLOR, G. M. & SANDERS, D. A. (1999). The role of the membrane-spanning domain sequence in glycoproteinmediated membrane fusion. Mol. Biol. Cell 10, 2803–2815. ODELL, D., WANAS, E., YAN, J. & GHOSH, H. P. (1997). Influence of membrane anchoring and cytoplasmic domains on the fusogenic activity of vesicular stomatitis virus glycoprotein G. J. Virol. 71, 7996–8000. CLEVERLEY, D. Z. & LENARD, J. (1998). The transmembrane domain in viral fusion: essential role for a conserved glycine residue in vesicular stomatitis virus G protein. Proc. Natl. Acad. Sci. USA 95, 3425–3430. GROTE, E., BABA, M., OHSUMI, Y. & NOVICK, P. J. (2000). Geranyl-geranylated SNAREs are dominant inhibitors of
39
40 Transmemebrane Domains in Proteins
215
216
217
218
219
220
221
222
223
224
membrane fusion. J. Cell Biol. 151, 453–465. ROHDE, J., DIETRICH, L., LANGOSCH, D. & UNGERMANN, C. (2003). The transmembrane domain of Vam3 affects the composition of cis- and trans-SNARE complexes to promote homotypic vacuole fusion. J. Biol. Chem. 278(3), 1656–1662. TATULIAN, S. A. & TAMM, L. K. (2000). Secondary structure, orientation, oligomerization, and lipid interactions of the transmembrane domain of influenza hemagglutinin. Biochemistry 39, 496–507. SZYPERSKI, T., VANDENBUSSCHE, G., CURSTEDT, T., RUYSSCHAERT, J. M., W¨UTHRICH, K. & JOHANSSON, J. (1998). Pulmonary surfactant-associated polypeptide C in a mixed organic solvent transforms from a monomeric alpha-helical state into insoluble beta-sheet aggregates. Protein Sci. (12), 2533–2540. ZHANG, Y.-P., LEWIS, R. N. A.H., HODGES, R. S. & MCELHANEY, R. N. (1992). FTIR Spectroscopic studies of the conformation and amide hydrogen exchange of a peptide model of the hydrophobic transmembrane α-helices of membrane proteins. Biochemistry 31, 1572–1578. HELENIUS, A., MCCASLIN, D. R., FRIES, E. & TANFORD, C. (1979). Properties of Detergents. Meth. Enzymol. 56, 734–749. TORRES, J., STEVENS, T. J. & SAMSO, M. (2003). Membrane proteins: the ‘Wild West’ of structural biology. Trends Biochem. Sci. 28(3), 137–144. LAAGE, R. & LANGOSCH, D. (2001). Strategies for prokaryotic expression of eukaryotic membrane proteins. Traffic 2, 99–104. GRISSHAMMER, R. & TATE, C. G. (1995). Overexpression of integral membrane proteins for structural studies. Q. Rev. Biophys. 28, 315–422. TATE, C. G. & GRISSHAMMER, R. (1996). Heterologous expression of G-protein-coupled receptors. TIBTECH 14, 426–430. HACKAM, A. S., WANG, T.-L., GUGGINO, W. B. & CUTTING, G. R. (1997). The
225
226
227
228
229
230
231
232
233
234
N-terminal domain of human GABA receptor P1 subunits contains signals for homooligomeric and heterooligomeric interaction. J. Biol. Chem. 272, 3750–3757. SZCZESNA SKORUPA, E. & KEMPER, B. (1991). Cell-free analysis of targeting of cytochrome P450 to microsomal membranes. Methods Enzymol. 206, 64–75. ROSENBERG, R. L. & EAST, J. E. (1992). Cell-free expression of functional Shaker potassium channels. Nature 360, 166–169. RUSINOL, A. E., JAMIL, H. & VANCE, J. E. (1997). In vitro reconstitution of assembly of apolipoprotein B48-containing lipoproteins. J. Biol. Chem. 272, 8019–8025. HUPPA, J. B. & PLOEGH, H. L. (1997). In vitro translation and assembly of a complete T cell receptor-CD3 complex. J. Exp. Med. 186, 393–403. FISHER, L. E. & ENGELMAN, D. M. (2001). High-yield synthesis and purification of an alpha-helical transmembrane domain. Analytical Biochemistry 293(1), 102–108. MELNYK, R. A., PARTRIDGE, A. W. & DEBER, C. M. (2002). Transmembrane domain mediated self-assembly of major coat protein subunits from Ff bacteriophage. J. Mol. Biol. 315(1), 63–72. GERBER, D. & SHAI, Y. (2001). In vivo detection of hetero-association of glycophorin-A and its mutants within the membrane. J. Biol. Chem. 276(33), 1229–1232. LAEMMLI, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680–685. SOULIEE´ , S., MOLLER, J. V., FALSON, P. & MAIRE, l. M. (1996). Urea Reduces the Aggregation of Membrane Proteins on Sodium Dodecyl Sulfate-Polyacrylamide Gel Electrophoresis. Anal. Biochem. 236, 363–364. MILLAR, D. G. & SHORE, G. C. (1993). The signal anchor sequence of mitochondrial Mas70p contains an oligomerization
References
235
236
237
238
239
240
241
242
243
domain. J. Biol. Chem. 268, 18403–18406. DEBER, C. M., KHAN, A. R., ZUOMEI, L., JOENSSON, C., GLIBOWICKA, M. & WANG, J. (1993). Val-Ala mutations selectively alter helix-helix packing in the transmembrane segment of phage M13 coat protein. Proc. Natl. Acad. Sci. USA 90, 1648–1652. RAMJEESINGH, M., HUAN, L., GARAMI, E. & BEAR, C. E. (1999). Novel method for evaluation of the oligomeric structure of membrane proteins. Biochem. J. 342, 119–123. PETERS, K. & RICHARDS, M. (1977). Chemical Cross-linking: Reagents and Problems in Studies of Membrane Structure. Annu. Rev. Biochem. 46, 523–551. GAFFNEY, B. J. (1985). Chemical and biochemical crosslinking of membrane components. Biochim. Biophys. Acta 822, 289–317. SCHA¨ GGER, H., CRAMER, W. A. & VON JAGOW, G. (1994). Analysis of molecular masses and oligomeric states of protein complexes by blue native electrophoresis and isolation of membrane protein complexes by two-dimensional native electrophoresis. Anal. Biochem. 217, 220–230. SCHA¨ GGER, H. & VON JAGOW, G. (1991). Blue native electrophoresis for isolation of membrane protein complexes in enzymatically active form. Anal. Biochem. 199, 223–231. POETSCH, A., NEFF, D., SEELERT, H., SCHA¨ GGER, H. & DENCHER, N. A. (2000). Dye removal, catalytic activity and 2D crystallization of chloroplast H+ -ATP synthase purified by blue native electrophoresis. Biochim. Biophys. Acta 1466, 339–349. REXROTH, S., MEYER ZU TITTINGDORF, J. M. W., KRAUSE, F., DENCHER, N. A. & SEELERT, H. (2003). Thylakoid membrane at altered metabolic state: Challenging the forgotten realms of the proteome. Electrophoresis 24, 2814–2823. CLARKE, S. (1975). Size and Detergent Binding of Membrane Proteins. J. Biol. Chem. 250(14), 5459–5469.
244 SELVIN, P. R. (1995). Fluorescence resonance energy transfer. Meth. Enzymol. 246, 300–334. 245 CLEGG, R. M. (1995). Fluorescence resonance energy transfer. Curr. Op. Biotech. 6, 103–110. 246 WU, P. a. B. L. (1994). Resonance energy transfer: methods and applications. Anal. Biochem. 218, 1–13. 247 PELED, H. & SHAI, Y. (1993). Membrane interaction and self-assembly within phospholipid membranes of synthetic segments corresponding to the H-5 region of the shaker K+ channel. Biochemistry 32, 7879–7885. 248 YANO, Y., TAKEMOTO, T., KOBAYASHI, S., YASUI, H., SAKURAI, H., OHASHI, W., NIWA, M., FUTAKI, S., SUGIURA, Y. & MATSUZAKI, K. (2001). Topological Stability and Self-Association of a Completely Hydrophobic Model Transmembrane Helix in Lipid Bilayers. Biochemistry 41, 3073–3080. 249 GAZIT, E. & SHAI, Y. (1995). The assembly and organization of the alpha5 and alpha7 helices from the pore-forming domain of Bacillus thuringiensis delta-endotoxin. J. Biol. Chem. 270, 2571–2578. 250 LI, M., REDDY, L. G., BENNETT, R., SILVA, N. D., JONES, L. R. & THOMAS, D. D. (1999). A fluorescence energy transfer method for analyzing protein oligomeric structure: application to phospholamban. Biophys. J. 76, 2587–2599. 251 ADAIR, B. D. & ENGELMAN, D. M. (1994). Glycophorin A helical transmembrane domains dimerize in phospholipid bilayers: a resonance energy transfer study. Biochemistry 33, 5539–5544. 252 MITRA, R. D., SILVA, C. M. & YOUVAN, D. C. (1996). Fluorescence resonance energy transfer between blue-emitting and red-shifted excitation derivatives of the green fluorescent protein. Gene 173, 13–17. 253 HEIM, R. & TSIEN, R. Y. (1996). Engineering green fluorescent protein for improved brightness, longer wavelenghts and fluorescence resonance energy transfer. Curr. Biol. 6, 178–182.
41
42 Transmemebrane Domains in Proteins
254 ELLENBERG, J., LIPPINCOTT-SCHWARTZ, J. & PRESLEY, J. F. (1999). Dual-color imaging with GFP variants. Trends. Cell Biol. 9, 52–56. 255 LIPPINCOTT-SCHWARTZ, J. & PATTERSON, G. H. (2003). Development and Use of Fluorescent Protein Markers in Living Cells. Science 300, 87–91. 256 MAJOUL, I., STRAUB, M., DUDEN, R., HELL, S. W. & S¨OLING, H.-D. (2002). Fluorescence resonance energy transfer analysis of protein-protein interactions in single living cells by multifocal multiphoton microscopy. Rev. Mol. Biotech. 82, 267–277. 257 COUTURIER, C., AYOUB, M. A. & JOCKERS, R. (2002). BRET erm¨oglicht die Messung von Protein-Interaktionen in lebenden Zellen. Biospektrum 5, 612–615. 258 AYOUB, M. A., COUTURIER, C., LUCAS-MEUNIER, E., ANGERS, S., FOSSIER, P., BOUVIER, M. & JOCKERS, R. (2002). Monitoring of ligand-independent dimerization and ligand-induced conformational changes of melatonin receptors in living cells by bioluminescence resonance energy transfer. J. Biol. Chem. 277(24), 21522–21528. 259 KROEGER, K. M., HANYALOGLU, A. C. & EIDNE, K. A. (2001). Applications of BRET to study dynamic G-protein coupled receptor interactions in living cells. Letters Pep. Sci. 8(3-5), 155– 162. 260 EIDNE, K. A., KROEGER, K. M. & HANYALOGLU, A. C. (2002). Applications of novel resonance energy transfer techniques to study dynamic hormone receptor interactions in living cells. Trends Endocrin. Metabol. 13(10), 415–421. 261 DIRITA, V. (1992). Co-ordinate expression of virulence genes by ToxR in Vibrio cholerae. Mol. Microbiol. 6, 451–458. 262 KOLMAR, H., FRITSCH, C., KLEEMAN, G., G¨OTZE, K., STEVENS, F. J. & FRITZ, H. J. (1994). Dimerization of bence jones proteins: linking the rate of transcription from an Escherichia coli promoter to the association constant of REIv. Biol. Chem. Hoppe-Seyler 375, 61–69.
263 BEDOUELLE, H. & DUPLAY, P. (1988). Production in Escherichia coli and one-step purification of bifunctional hybrid proteins which bind maltose export of the Klenow polymerase into the periplasmic space. Eur. J. Biochem. 171, 541–549. 264 LEEDS, J. A. & BECKWITH, J. (1998). Lambda repressor n-terminal DNA-binding domain as an assay for protein transmembrane segment interactions in vivo. J. Mol. Biol. 280(5), 799–810. 265 SCHNEIDER, D. & ENGELMAN, D. M. (2003). GALLEX: a measurement of heterologous association of trans-membrane helices in a biological membrane. J. Biol. Chem. 278, 3105–3111. 266 JOHNSSON, N. & VARSHAVSKY, A. (1994). Split ubiquitin as a sensor of protein interactions in vivo. Proc. Natl. Acad. Sci. USA 91, 10340–10344. 267 STAGLJAR, I., KOROSTENSKY, C., JOHNSSON, N. & TEHEESEN, S. (1998). A genetic system based on split-ubiquitin for the analysis of interactions between membrane proteins in vivo. Proc. Natl. Acad. Sci. USA 95, 5187– 5192. 268 WITTKE, S., LEWKE, N., MULLER, S. & JOHNSSON, N. (1999). Probing the molecular environment of membrane proteins in vivo. Mol. Biol. Cell 10(8), 2519–2530. 269 STAGLJAR, I. & FIELDS, S. (2002). Analysis of membrane protein interactions using yeast-based technologies. Trends. Biochem. Sci. 27, 559–563. 270 LEE, G. F. & et al. (1994). Deducing the organization of a transmembrane domain by disulfide crosslinking. J. Biol. Chem. 269, 29920–29927. 271 PAKULA, A. A. & SIMON, M. I. (1992). Determination of transmembrane protein structure by disulfide cross-linking: the Escherichia coli Tar receptor. Proc. Natl. Acad. Sci. USA 89, 4144–4148. 272 DOYLE, D. A., CABRAL, J. M., PFUETZNER, R. A., KUO, A., GULBIS, J. M., COHEN, S. L., CHAIT, B. T. & MACKINNON, R. (1998). The structure of the potassium channel:
References molecular basis of K+ conduction and selectivity. Science 280, 69–77. 273 COWAN, S. W., SCHIRMER, T., RUMMEL, G., STEIERT, M., GHOSH, R., PAUPTIT, R. A., JANSONIUS, J. N. & ROSENBUSCH, J. P. (1992). Crystal structures explain functional properties of two E. coli porins. Nature 358, 727–733. 274 RUAN, W., BECKER, V., KLINGMULLER, U., LANGOSCH, D. The interface between self-assembling erythro-poietin receptor transmembrane segments corresponds to a membrane-spanning leucine zipper. J. Biol. Chem. 279(5), 3273–3279. 275 RUAN, W., LINDNER, E., and LANGOSCH, D. (2004). The interface of a membrane-spanning leucine zipper
mapped by asparagine-scanning mutagenesis. Prot. Sci. 13, 555– 559. 276 TOUTAIN, C. M., CLARKE, D. J., LEEDS, J. A., KUHN, J., BECKWITH, J., HOLLAND, I. B., JACQ, A. (2003). The transmembrane domain of the DnaJ-like protein DjlA is a dimerisation domain. Mol. Genet. Genom. 268(6), 761–770. 277 LEEDS, J. A., BOYD, D., HUBER, D. R., SONODA, G. K., LUU, H. T., ENGELMAN, D. M., BECKWITH, J. (2001). Genetic selection for and molecular dynamic modeling of a protein transmembrane domain multimerization motif from a random Escherichia coli genomic library. J. Mol. Biol. 313(1), 181–195.
43
1
The Structures of General Porins Georg E. Schulz
Universit¨at Freiburg, Freiburg, Germany
Originally published in: Bacterial and Eukaryotic Porins. Edited by Roland Benz. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30775-3
1 Bacterial Outer Membrane Proteins
A large group of bacteria possesses an outer membrane in addition to the common inner or cytoplasmic membrane. The outer membrane functions as a shield against an unfavorable environment and is permeable for small polar solutes. In contrast to the inner membrane, it contains only few proteins. As a general rule, the outer membrane proteins consist of β-barrels, whereas the inner membrane proteins have mostly α-helices. Due to the tight network of strong main-chain hydrogen bonds, the β-barrels are much more stable than the inner membrane α-helix bundles, which have only isolated strings of main-chain hydrogen bonds within each of the α-helices. The intrinsically strong structure of the β-barrels resists chemical and mechanical stress, and it corresponds to the relatively good crystallizability of these proteins resulting in numerous established structures. According to their functions and structures, the outer membrane proteins can be subdivided into the five groups of Table 1. The general porins yielded the first X-ray-grade membrane protein crystals [1], indicating that their intrinsic stability approaches that of soluble proteins [2, 3]. They constitute the largest group [4–11]. The specific porins are structurally closely related to the general porins although their β-barrels have two more strands [12–14]. All of these porins are trimers (Figure 1). A quite different group is composed of pores where the β-barrel itself is a homo-oligomer, a dimer in gramicidin A [15], a heptamer in α-hemolysin [16], and a trimer in TolC [17]. In α-hemolysin and TolC the contact within the β-barrel is obviously of a secondary nature because the major oligomer interface lies between the globular domains of the monomers. In the monomer, the polypeptide segment which forms the barrel has a very different structure or none at all [18]. The group of outer membrane transport proteins has a much larger β-barrel and additional Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Structures of General Porins Table 1 Available structures of bacterial outer membrane porins all of which consist of β-
barrels (according to Figure 2 the β-barrels are described by the number of strands n and by the shear number S). Porin type
Organism
Number of β-strands
Shear number
PDB code
Reference
General porins Rc-porin OmpF PhoE Rb-porin Pd-porin OmpK36 Omp32
Rhodobacter capsulatus Escherichia coli E. coli Rhodobacter blasticus Paracoccus denitrificans Klebsiella pneumoniae Comamonas acidovorans
16 16 16 16 16 16 16
20 20 20 20 20 20 20
2POR 2OMF 1PHO 1PRN – 1OSM 1E54
[5] [6] [6] [8] [9] [10] [11]
Specific porins LamB LamB ScrY
E. coli Salmonella typhimurium S. typhimurium
18 18 18
20 20 20
1MAL 1MPR 1OH2
[12] [13] [14]
Composed pores Gramicidin A a-hemolysin TolC
Bacillus brevis Staphylococcus aureus E. coli
(2) 14 12
(6) 14 20
1MAG 7AHL 1EK9
[15] [16] [17]
Transport Omp FhuA FhuA FepA FecA BtuB
E. coli E. coli E. coli E. coli E. coli
22 22 22 22 22
24 24 24 24 24
1BY3 2FCP 1FEP 1KMO 1NQH
[19] [20] [21] [22] [23]
Poreless Omp OmpA OmpX OmpLA OmpT OpcA NspA
E. coli E. coli E. coli E. coli Neisseria meningitidis N. meningitidis
8 8 12 10 10 8
10 8 12 12 10 10
1QJP 1QJ8 1QD5 1I78 1K24 1P4T
[25] [26] [27] [28] [29] [30]
internal domains which may work as a stopper [19–23]. They do not form a pore, but open and close for the passage of relatively large molecules such as iron-containing siderophores or cobalamins through the outer membrane. Another group of outer membrane proteins contains small β-barrels that are not likely to form a pore in their native conformation. Rather, they function as membrane anchors, and fulfill numerous purposes using their extracellular and periplasmic surfaces [24– 30]. Their small β-barrels have a non-polar exterior and a polar interior, and can therefore be considered inverse micelles. The β-barrels of the outer membrane proteins show common construction principles which are sketched in Figure 2. The descriptive parameters are tilting angle
1 Bacterial Outer Membrane Proteins
Fig. 1 Stereoviews of ribbon plots of the four distinct general porins. (a) Rb-porin as depicted in a projection along the membrane normal, showing that the membranefacing part approximates closely to an upright cylinder. The central hub is not hollow, but filled by residue side-chains; the spikes
contain the interfaces between barrels. Hub and spikes have a non-polar interior and a polar surface, like soluble proteins. The other pictures are views from the external side: (b) Rb-porin, (c) OmpF, (d) Omp32 and (e) Rc-porin.
3
4 The Structures of General Porins
Fig. 1 (Continued)
a, shear number S [31], strand number n and barrel radius R, which are related to each other according to the given formulae. The β-barrels of outer membrane proteins obey the six rules stated below. i) The N- and C-termini are at the periplasmic end, and the number of β-strands is even. ii) The tilting angle a varies between 30◦ and 60◦ . The tilt is always to the right when viewed from the outside, rendering the barrel construction chiral. The mirror image is disfavored by the steric hindrance at the Cα atoms as demonstrated by the asymmetry of the Ramachandran plot [32]. iii) All β-strands are connected locally to their immediate neighbors along the chain, resulting in an antiparallel β-sheet pattern that maximizes the neighborhood correlation [32]. Viewed from the outside, the polypeptide runs from right to left.
2 Construction of General Porins
Fig. 2 General construction principle of $barrels. The tilt angle " is assumed to be constant, although it tends to vary around the observed barrels. The shear number S [31] is the displacement, counted as number of residues, when going around the
barrel along the hydrogen bonds (dashed lines). The $-barrel radius R depends on " (or on S) and the number of $-strands n. The peptide parameters are a = 3.3 Å and b = 4.4 Å.
iv) The strand connections at the periplasmic barrel end are short turns named T1, T2, etc., whereas those at the external barrel end are mostly long loops named L1, L2, etc. v) The external loops show a very high sequence variation and are often very mobile. vi) The outer barrel surface contacting the membrane has a 20-Å wide ribbon of aliphatic residues that is lined by an upper and by a lower girdle or aromatic residues.
Here, we are concerned with the general porins that are the most abundant species of outer membrane proteins. Seven structures have been published and, due to close sequence similarity, further structures are available with less accuracy. Among the six structures deposited in the Protein Data Bank, those of porins OmpF, PhoE and OmpK36 are closely related, showing more than 60% sequence identity, and we use OmpF as a representative of the group in Figure 1. 2 Construction of General Porins
All structurally known general porins are trimers formed by monomers consisting of 16-stranded β-barrels. All barrel axes are approximately perpendicular to the membrane plane and all barrel interiors function as channels. OmpF and its homologs PhoE and OmpK36 are somewhat special as their barrel axes deviate by about 10◦ from the membrane normal. Figure 1 shows ribbon plots of the four distinct
5
6 The Structures of General Porins
porins. The barrels have ellipsoidal cross-sections with axes ratios of 1.2:1, which is rather close to circular. In all cases the external loop L3 is of a particular length and folds into the barrel interior forming the pore eyelet. In some cases, especially in Omp32, further external loops assist L3 in constricting the available space inside the barrel. The monomers are always closely associated, forming trimers which can be considered as wheels comprising a central globular hub, three spikes and a hoop. At the hub, the tilting angle α in the β-barrel is usually larger than at the hoop and an external loop folds over a neighboring subunit fortifying the oligomer. Hub and spikes contain a non-polar interior and a polar surface, part of which lines the three channels. All hubs start at the periplasmic barrel end and extend over less than the full barrel length so that the three channels merge at the external side of the eyelets to a common large channel that has been named the vestibule because the solutes gather there before entering the bacterium. The Rc-porin was the second structurally established membrane protein [4]. Therefore, its interface to the membrane caught particular attention. The expected ribbon of non-polar residues was readily detected. This ribbon is also apparent in the statistics of the general porin surfaces shown in Figure 3. Interestingly, the ribbon splits into three parts, an aliphatic central ribbon surrounding the trimer and two girdles of aromatic residues lining the ribbon on both sides [33, 34]. The width of the non-polar ribbon facing the non-polar membrane layer is only 20–25 Å, which approximates the length of a single fatty acid chain. This is only half the thickness displayed in the textbooks which commonly show two opposing extended fatty acid chains forming the membrane. Accordingly, the porin surfaces show that the fatty acids are extensively intertwined, halving the generally assumed thickness of the non-polar layer. It was further recognized that the benzene ring polarity is intermediate between that of aliphatic and polar side-chains, and most suitable for contacting the membrane layer of ester bonds between the fatty acids and the polar heads of the lipids [33–35]. A statistical analysis of the six available general porin structures in Figure 3 shows a further peculiarity of the membrane-exposed porin surface. When subdividing the distribution of Tyr and Phe residues into one for their Cβ atoms and one for their Cζ atoms it became apparent that the aromatic ring orientations are not random. As recognized earlier [33, 34], the tyrosines are located closer to the membrane center and point outward, whereas the phenylalanines are closer to the membrane surfaces and point inwards. The tryptophans are also in the aromatic girdles, but their frequency is negligible in comparison with Tyr and Phe. This distribution confirms that the aromatic residues act as floating stabilizers in the membrane [33, 34]. In a β-barrel the Cα –Cβ bonds point radially to the outside so that the aromatic side-chains can rotate freely around them. As demonstrated in Figure 4, this construction mitigates the consequences of membrane fluctuation because the aromatic rings can keep the contact with the ester bond layers between the non-polar and polar parts of the membrane so that the β-barrel will not be destroyed when its non-polar surface faces a polar environment or vice versa. Such fluctuations are likely to occur frequently due to mechanical stress on the bacterial surface.
2 Construction of General Porins
Fig. 3 Average residue locations at the membrane-facing surfaces of the six available general porin structures listed in Table 1. The residues are plotted according to their position along the membrane normal. The two aromatic girdles were used as a reference, their positions are indicated by arrows. The girdle centers are 19 Å apart. On each side the statistics extend 6 Å beyond the girdle centers. Three
prolines and 21 glycines are not shown (Table 2) because prolines disturb the barrel and glycines are neither polar nor nonpolar. The Tyr and Phe statistics are given for the C $ and C . atoms. The difference in the respective distribution indicates clearly the side-chain orientations: Tyr away from the membrane center and Phe towards the center.
7
8 The Structures of General Porins
Fig. 4 Sketch of the porin-membrane interface showing a usual (left) and a displaced (right) membrane position. The non-polar parts of the porin surface and of the membrane are striated. The aromatic girdles function as swim-stabilizers in the membrane. As the C " –C $ bonds point out ra-
dially, the aromatic rings can rotate easily and thus mitigate the effect of membrane fluctuations by preventing a polar versus non-polar contact between the barrel and membrane, which would disrupt the $-sheet structure.
Table 2 Preferences for amino acid residues at the membrane-exposed surface as derived
from the six trimeric general porins the counting is for the Cy atom positions (Cα for Gly, Cβ for Ala) over a 31-Å stretch centered around the two aromatic girdles (Figure 3) Residue
Local count
Local percentage
E. coli overall percentage [36]
Ratio
Tyr Leu Va l Phe Ala Ile Thr Gly Met Asn Trp His Pro Ser Total
62 61 52 46 43 29 23 21 7 7 5 4 3 2 365
17.0 16.7 14.2 12.6 11.8 7.9 6.3 5.8 1.9 1.9 1.4 1.1 0.8 0.6
3.2 9.0 6.4 3.9 7.8 5.2 5.8 7.4 2.2 4.5 1.3 2.2 5.2 7.1
5.3 1.9 2.2 3.2 1.5 1.5 1.1 0.8 0.9 0.4 1.1 0.5 0.2 0.8
The residue counts of the distributions depicted in Figure 3 are given in Table 2. The preference for Tyr and Phe and the dominance of the aliphatic residues becomes most obvious in comparison with the overall residue abundance in Escherichia coli [36]. The depletion of the polar residues is also very clear. These preferences can be used as propensities for the prediction of membrane interfaces from protein sequences. They seem to be characteristic for all bacterial outer membrane proteins. The aromatic girdle is also present in cytoplasmic membranes where, however, it does not seem to be equally pronounced with respect to the Ty r and Phe orientations.
3 Trimer Association and Folding
3 Trimer Association and Folding
The chain folds of the general porins depicted in Figure 1 show an almost circular high wall surrounding the trimer and, thus, facing the membrane. When describing the trimer as a wheel (see above), this wall is the hoop. The outer surface of the broad hoop consists of the non-polar ribbon lined by the two girdles of aromatic residues (Figure 3) that leave no further surface area on the periplasmic side (Figure 4). However, there is still plenty of surface on the external side where the porins face the lipopolysaccharides of the external leaflet of the bacterial outer membrane. This additional surface is covered by ionogenic residues, in particular by aspartates and glutamates. Presumably, these side-chains bind the porins tightly to the lipopolysaccharides, either through direct hydrogen bonds or through divalent ions such as Ca2+ which are abundant in the external medium. This anchoring should mitigate the mechanical stress from the environment so that the remaining fluctuations can be buffered more easily by the aromatic girdles (Figure 4). The three spikes protruding from the central hub face the polar external medium on one side and the polar periplasm on the other (Figure 1). They are constructed like soluble proteins; their surfaces are covered with polar residues, whereas their interior is non-polar. Each spike contains about half a dozen non-polar residues and the hub has an additional dozen of non-polar residues. Most prominent are phenylalanines and leucines. The side-chains leave essentially no open space at the trimer axis running through the hub. The construction of hub and spikes involves a non-polar trimer interface like those of most soluble oligomeric proteins. Such non-polar protein contacts are easily formed in a polar environment, but they are dispersed within a non-polar membrane. It was therefore suggested that the porin monomers associate in the periplasm and that the resulting trimer is inserted into the membrane [33, 34]. The association should start with the folding of hub and spikes in the manner of soluble proteins starting with the usual non-polar core covered by polar residues. For all porins such an intermediate consisting of hub and spike would contain the N- und C-termini (Figure 1). Consequently, the remaining polypeptide of around 150 residues per monomer would be fixed at both ends. Presumably this chain segment is mobile and unstructured in the periplasm and forms the second part of the β-barrel and, thus, the hoop only when inserted into the membrane. The folding of a chain that is fixed at both ends is highly restrained because neither of the chain termini can be wrapped around any other part. This constraint does not affect the folding of the porins, however, because the chain segment which forms the hoop is an all-antiparallel and all-next-neighbor β-sheet structure (Figure 1) that can be easily formed by mere contraction and meandering. Actually, this exceptional β-barrel construction of all outer membrane proteins of Table 1, which is not observed in the β-barrels of soluble proteins [37], indicates that the barrels of outer membrane proteins are only formed after the chain ends have been fixed.
9
10 The Structures of General Porins
Fig. 5 The geometry of three general porins as described by the reciprocal of the total channel cross-section plotted along the membrane normal. Each distribution along the normal is started from the first atom at the periplasmic end. The analysis shows three distinct examples Rc-porin (dotted line), OmpF (dashed line) and Omp32 (solid line). The two aromatic girdles level around the 13 and 32 Å values of the ab-
scissa. The cross-sections correspond to the solvent-accessible space outside 2.5 Å radius spheres around the non-hydrogen atoms for one of the three channels. The three channels start around 8 Å above the first atoms (A) combine to form the vestibule around 29 Å further up (B) which opens up to the external space after a further 8 Å (C).
4 Pore Geometry
Within the established general porins we observe the three types of channel geometries described in Figure 5. Rc-porin and Rb-porin have a rather wide eyelet, OmpF, PhoE and OmpK36 show intermediate cross-sections, and Omp32 forms a particularly small pore. The general construction is always the same. The sidechains at the periplasmic end of the three channels (A in Figure 5) protrude by about 8 Å into the periplasm. The channels become narrower towards the external end until they are constricted by three eyelets after about 18 Å. Approximately 10 Å further up, the three channels join to form the vestibule (B in Figure 5), which after a further 8 Å opens up to the external medium (C in Figure 5). The channel geometry should be related to the electric conductivity of these porins. When neglecting the radial components of the electric currents, the resistance of a pore should be proportional to the area under the reciprocal cross-section distribution of Figure 5, which is 0.39, 0.26 and 0.16 A−1 for Omp32, OmpF and Rc-porin, respectively. Accordingly, the conductivity should show the ratios of 0.39−1 :0.26−1 :0.16−1 = 0.66:1.00:1.68. The actually measured conductivity for 1 M KCl [38–40] is 0.5, 1.9 and 3.3 nS for Omp32, OmpF and Rc-porin, respectively,
4 Pore Geometry
which converts to the ratios 0.27:1.00:1.74. These values fit each other only roughly. The fit improves, however, if one takes into account that Omp32 (in contrast to OmpF and Rc-porin) shows saturation effects at a high salt concentration [38] so that the first ratio increases appreciably at salt concentrations closer to the physiological level. Since the geometry of the channel is known in absolute terms, its conductivity in a 1 M KCl solution can be deduced from the specific conductivity of the salt solution according to the formula: (pore conductivity) = (specific conductivity)/ Q(z)−1 d z
(1)
where Q(z) is the cross-section distribution along the membrane normal in the z-direction. The specific conductivity of a 1 M KCl solution is 0.113 S cm−1 = 1.13 nS Å−1 and the integral is the area under the respective distribution in Figure 5 as stated above. This results in a calculated porin trimer conductivity of 9, 13 and 21 nS for Omp32, OmpF and Rc-porin, respectively. The OmpF and Rc-porin values exceed the experimental values by a factor of about 6 and the Omp32 value is too high by a factor of about 18. At this point it should be remembered that the Q(z)−1 distribution of Figure 5 was deduced using a molecular surface 2.5 Å away from the non-hydrogen atoms, which is obviously not enough to ensure that the solutes are in an environment comparable to the bulk solvent. Enlarging this 2.5 Å distance to about 3.8 Å reduces the calculated conductivities to the observed values. Reversing the argument, the K + and Cl− ions can be considered as mobile as in the bulk solvent if they are 3.8 Å away from the non-hydrogen atoms of the protein. This is a rather rough estimate. Much more detailed permeation analyses were performed with the specific porins [41, 42]. Pore conductivity is determined by applying a voltage across the membrane and measuring the resulting current. Assuming bulk solvent conditions, the electric potential follows the integral over the reciprocal cross-section distribution of Figure 5 and the voltage between any two points along the membrane normal equals the area under the distribution between these two points. Accordingly, the strongest electric field lies across the eyelet. For OmpF, the area under a 10-Å slice at the eyelet amounts to 64 % of the total area so that a 100 mV voltage across the membrane translates to a 64 mV voltage across the eyelet. The displacement of a single charge over this distance corresponds to an energy of 1.6 × 10−19 A·s·64 × 10−3 V·6 × 1023 = 6 kJ mol−1 which approaches the energy of a hydrogen bond. A voltage in such a range can therefore disrupt weak hydrogen bonds and, thus, destroy the eyelet structure. These effects are known as voltage closures [43]. They are observed with all porins, although at different voltages depending on the strength of the eyelet construction. In vivo, any static voltage across a bacterial outer membrane would decay rapidly by electric currents through the porins. However, transient voltages in form of Donnan potentials in the 100 mV range may occur on rapid changes of the environment and thus may threaten the porin integrity.
11
12 The Structures of General Porins
5 Permeation
Porins are passive channels showing a transport rate proportional to the concentration gradient. The rate itself depends strongly on the construction of the particular eyelet and on the properties of the solute. General porins, for instance, have eyelet sizes that limit the size of permeating solutes to about 600 kDa. Clearly, the mass is a rather coarse and not a very characteristic property of a solute. Solutes are better described by their net charges, their dipoles and their shapes. The dependence of the permeability on the net charge can be rather easily determined for small ions. Ion selectivities of general porins are the rule. They correlate well with a predominance of the opposite charge at the eyelet. As a further rule, porins discriminate against non-polar solutes even if the eyelet is much larger than the solute so that it cannot be expelled by unfavorable contacts with the protein. An explanation for this effect was found for the Rc-porin where a wide eyelet (Figure 5) is lined by six positive charges on one side and by six negative charges on the other [33, 34]. As sketched in Figure 6, this charge distribution gives rise to a transversal electric field across the eyelet. With its water filling, this is a capacitor at a low energy level because the oriented water molecules correspond to a dielectric constant as high as 80. As soon as a non-polar molecule with its low dielectric constant of about 3 intrudes this capacitor, the energy level is raised (similar to lifting a mass) and a force develops to expel this solute (similar to the gravitational force pulling a lifted mass down again). As a result such an electric field permits a wide eyelet without allowing the passage of non-polar solutes. On the other hand, the passage of polar solutes through this transversal electric field is facilitated because they stop tumbling around and pass the eyelet in a fixed orientation (Figure 6). The diffusion becomes one-dimensional and therefore faster. Such an electric field is also present in Rb-porin, OmpK36, PhoE and OmpF where, however, it is based on fewer charges than in the Rc-porin. The small eyelet of Omp32 (Figure 5) lacks such a charge distribution. This seems reasonable because the electric field is only required for wide pores. A survey of the residues of the eyelet wall reveals in all cases three arginines at characteristic positions of the β-barrel (Arg-42, Arg-82 and Arg-132 of OmpF). In several eyelets two guanidinium groups form a sandwich that stabilizes the sidechains mechanically. In addition, there are numerous lysines with their side-chains extending into the solvent. Many of these are laterally stabilized by other side-chains. Aspartates and glutamates are abundant giving rise to numerous negative charges. Several eyelets contain tyrosines pointing with their hydroxyl dipoles towards the solvent. In general, the eyelets are highly polar and stabilized by hydrogen bonding networks. The Rc-porin shows two peculiarities. First, the negative side of the eyelet wall contains two Ca2+ ions. The charges of these Ca2+ are compensated by numerous acids giving rise to a negative net charge of 6. Obviously the eyelet size is regulated by the Ca2+ concentration in the external medium. As a second peculiarity, Rc-porin harbors a binding site for rather non-polar solutes close to the external end of the
6 Conclusion 13
Fig. 6 Sketch of the transversal electric field across the pore eyelet, which is most prominent in Rc-porin with a large pore (Figure 5) and least prominent in Omp32 with a small pore. As shown on the left-hand side, the positively and negatively charged walls of the eyelet form a capacitor that is usually filled with oriented water molecules giving rise to a low energy level. The energy level
of the capacitor increases when a non-polar molecule (C, dielectric constant around 3) enters, thus developing an expelling force. The right-hand sketch indicates that the field orients polar solutes so that their permeation is facilitated by restricting the diffusion to only one dimension. Taken together, the electric field acts as a polarity filter.
eyelet: a bound molecule of the size of adenosine would clog the eyelet. Such an arrangement gives rise to a Michaelis-Menten-type kinetics for the permeation for adenosine. The molecule binds at low concentrations to this site and then diffuses from there through the eyelet at a certain rate corresponding to kcat of an enzyme. This results in a saturation effect for adenosine permeation which resembles the saturation effects of enzymes. Consequently, Rc-porin should be considered a general porin for all the polar solutes that do not bind to this particular site, showing a linear relationship between the permeating solutes and their concentration gradient. For solutes using the binding site, however, Rc-porin acts like a specific protein. Binding to this site increases the permeation at low concentrations but shows a saturation effect at high concentrations.
6 Conclusion
The structures of the general porins resemble each other closely and they are also related to all other bacterial outer membrane proteins with their all-next-neighbor all-antiparallel β-barrels with narrowly distributed tilt angles and barrel lengths. All of them are trimeric, showing the general structure of a wheel where the hub and the three spikes have polar surfaces and non-polar cores like soluble proteins. The hoop is broad and shows a characteristic surface structure consisting of an aliphatic ribbon lined by aromatic girdles at each side. The periplasmic rim of the hoop is very narrow. In contrast, the external rim harbors numerous ionogenic side-chains which bind tightly to the lipopolysaccharide cores. The mechanical stress on the outer membrane is buffered by the tight connection of the porins to the lipopolysaccharides in the outer leaflet and also by the two aromatic girdles that prevent unfavorable contacts at the polar and the non-polar outer surfaces of
14 The Structures of General Porins
the β-barrels, which would destroy them. The porin channels have no uniform cross-section. Rather, they are wide at both ends and have a small eyelet extending over a length of about 10 Å approximately in the middle of the membrane. All general porins have a spacious vestibule at the external channel end. On the basis of the known structures, the porins are most suitable for channel engineering [44– 49]. Such endeavors should help to correlate the permeation properties with the atomic structure of the pore. Moreover, they may give rise to porins of technical interest. Acknowledgments
I thank Linda Bohm and Markus Kromer for their help with the preparation of the manuscript.
References 1 R. M. GARAVITO, J. P. ROSENBUSCH, Three-dimensional crystals of an integral membrane protein: an initial X-ray analysis. J Cell Biol 1980, 86, 327–329 2 A. KREUSCH, M. S. WEISS, W. WELTE, J. WECKESSER, G. E. SCHULZ, Crystals of an integral membrane protein diffracting to 1.8 Å resolution. J Mol Biol 1991, 217, 9–10. 3 A. PAUTSCH, J. VOGT, K. MODEL, C. SIEBOLD, G. E. SCHULZ, Strategy for membrane protein crystallization exemplified with OmpA and OmpX. Proteins: Struct Funct Genet 1999, 34, 167–172. 4 M. S. WEISS, T. WACKER, J. WECKESSER, W. WELTE, G. E. SCHULZ, The three-dimensional structure of porin from Rhodobacter capsulatus at 3 Å resolution. FEBS Lett 1990, 267, 268–272. 5 M. S. WEISS, G. E. SCHULZ, Structure of porin refined at 1.8 Å resolution. J Mol Biol 1992, 227, 493–509. 6 S. W. COWAN, T. SCHIRMER, G. RUMMEL, M. STEIERT, R. GHOSH, R. A. PAUPTIT, J. N. JANSONIUS, J. P. ROSENBUSCH, Crystal structures explain functional properties of two E. coli porins. Nature 1992, 358, 727–733. 7 A. KREUSCH, A. NEUBIISER, E. SCHILTZ, J. WECKESSER, G. E. SCHULZ, The structure of the membrane channel porin from
8
9
10
11
12
13
Rhodopseudomonas blastica at 2.0 Å resolution. Protein Sci 1994, 3, 58–63. A. KREUSCH, G. E. SCHULZ, Refined structure of the porin from Rhodopseudomonas blastica: comparison with the porin from Rhodobacter capsulatus. J Mol Biol. 1994, 243, 891–905. HIRSCH, J. BREED, K. SAXENA, O.M.H. RICHTER, B. LUDWIG, K. DIEDERICHS, W., Welte The structure of porin from Paracoccus denitrificans at 3.1 Å resolution. FEBS Lett 1997, 404, 208–210. R. DUTZLER, G. RUMMEL, S. ALBERTI, S. HERNANDEZ-ALLES, P. S. PHALE, J. P. ROSENBUSCH, V. J. BENEDI, T. SCHIRMER, Crystal structure and functional characterization of OmpK36, the osmo-porin of Klebsiella pneumoniae. Structure 1999, 7, 425–434. K. ZETH, K. DIEDERICHS, W. WELTE, H. ENGELHARDT, Crystal structures of Omp32, the anion-selective porin from Comamonas acidovorans, in complex with a periplasmic peptide at 2.1 Å resolution. Structure 2000, 8, 981–992. T. SCHIRMER, T. A. KELLER, Y. F. WANG, J. P. ROSENBUSCH, Structural basis for sugar translocation through malto-porin channels at 3.1 Å resolution. Science 1995, 267, 512–514. J. E. W. MEYER, M. HOFNUNG, G. E. SCHULZ, Structure of maltoporin from Salmonella typhimurium ligated with a
References
14
15
16
17
18
19
20
21
22
nitrophenyl-maltotrioside. J Mol Biol 1997, 266, 761–775. D. FORST, W. WELTE, T. WACKER, K. DIEDERICHS, Structure of the sucrose-specific porin ScrY from Salmonella typhimurium and its complex with sucrose. Nature Struct Biol 1998, 5, 37–46. R. R. KETCHEM, W. HU, T. A. CROSS, High-resolution conformation of gramicidin A in a lipid bilayer by solid-state NMR. Science 1993, 261, 1457–1460. L. SONG, M. R. HOBAUGH, C. SHUSTAK, S. CHELEY, H. BAYLEY, J. E. GOUAUX, Structure of staphylococcal α-hemolysin, a heptameric transmembrane pore. Science 1996, 274, 1859–1865. V. KORONAKIS, A. SCHARFF, E. KORONAKIS, B. LUISI, C. HUGHES, Crystal structure of the bacterial membrane protein TolC central to multidrug efflux and protein export. Nature 2000, 405, 914–919. R. OLSON, H. NARIYA, K. YOKOTA, Y. KAMIO, E. GOUAUX, Crystal structure of Staphylococcal LukF delineates conformational changes accompanying formation of a transmembrane channel. Nature Struct Biol 1999, 6, 134–140. K. P. LOCHER, B. REES, R. KOEBNIK, A. MITSCHLER, L. MOULINIER, J. P. ROSENBUSCH, D. MORAS, Transmembrane signaling across the ligand-gated FhuA receptor: crystal structures of free and ferrichrome-bound states reveal allosteric changes. Cell 1998, 95, 771–778. A. D. FERGUSON, E. HOFMANN, J. W. COULTON, K. DIEDERICHS, W. WELTE, Siderophore-mediated iron transport: crystal structure of FhuA with bound lipopolysaccharide. Science 1998, 282, 2215–2220. S. K. BUCHANAN, B. S. SMITH, L. VENKATRAMANI, D. XIA, L. ESSER, M. PALNITKAR, R. CHAKRABORTY, D. van der HELM, J. DEISENHOFER, Crystal structure of the outer membrane active transporter FepA from Escherichia coli. Nature Struct Biol 1999, 6, 56–63. A. D. FERGUSON, R. CHAKRABORTY, B. S. SMITH, L. ESSER, D. van der HELM, J. DEISENHOFER, Structural basis of gating by
23
24
25
26
27
28
29
30
31
32
33
the outer membrane transporter FecA. Science 2002, 295, 1715–1719. D. P. CHIMENTO, A. K. MOHANTY, R. J. KADNER, M. C. WIENER, Substrate-induced transmembrane signaling in the cobalamin transporter BtuB. Nature Struct Biol 2003, 10, 394–401. A. PAUTSCH, G. E. SCHULZ, Structure of the outer membrane protein A trans-membrane domain. Nature Struct Biol. 1998, 5, 1013–1017. A. PAUTSCH, G. E. SCHULZ, High resolution structure of the OmpA membrane domain. J Mol Biol 2000, 298, 273–282. J. VOGT, G. E. SCHULZ, The structure of the outer membrane protein OmpX from Escherichia coli reveals possible mechanisms of virulence. Structure 1999, 7, 1301–1309. H. J. SNIJDER, I. UBARRETXENA-BELANDIA, M. BLAAUW, K. H. KALK, H. M. VERHIJ, M. R. EGMOND, N. DEKKER, B. W. DIJKSTRA, Structural evidence for dimerization-regulated activation of an integral membrane phospholipase. Nature 1999, 401, 717–721. L. VANDEPUTTE-RUTTEN, R. A. KRAMER, J. KROON, N. DEKKER, M. R. EGMOND, P. GROS, Crystal structure of the outer membrane protease OmpT from Escherichia coli suggests a novel catalytic site. EMBO J. 2001, 20, 5033–5039. S. M. PRINCE, M. ACHTMAN, J. P. DERRICK, Crystal structure of the OpcA integral membrane adhesin from Neisseria meningitidis. Proc Natl Acad Sci USA 2002, 99, 3417–3421. L. VANDEPUTTE-RUTTEN, M. P. BOS, J. TOMMASSEN, P. GROS, Crystal structure of neisserial surface protein A (NspA), a conserved outer membrane protein with vaccine potential. J Biol Chem 2003, 278, 24825–24830. W. M. LIU, Shear numbers of protein β-barrels: definition, refinements and statistics. J Mol Biol 1998, 275, 541–545. G. E. SCHULZ, R. H. SCHIRMER, Principles of Protein Structure. Springer, New York, 1979. G. E. SCHULZ, Structure-function relationships in the membrane channel porin as based on a 1.8 Å resolution
15
16 The Structures of General Porins
34
35
36
37
38
39
40
41
crystal structure, In: A. PULLMAN, J. JORTNER, B. PULLMAN (Eds), Membrane Proteins: Structures, Interactions and Models. Kluwer, Dordrecht, 1992, pp. 403–412. G. E. SCHULZ, Structure-function relationships in porins as derived from a 1.8 Å resolution crystal structure, In: J. M. GHUYSEN, R. HAKENBECK (Eds), New Comprehensive Biochemistry, Vol 27: Bacterial Cell Wall. Elsevier, Amsterdam, 1994, pp. 343–352. W. C. WIMLEY, S. H. WHITE, Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nature Struct Biol 1996, 3, 842–848. A. B. ROBINSON, L. R. ROBINSON, Distribution of glutamine and asparagine residues and their neighbors in peptides and proteins. Proc Natl Acad Sci USA 1991, 88, 8880–8884. C. BRANDEN, J. TOOZE, Introduction to Protein Structure. Garland, New York, 1999, pp. 67–88. A. MATHES, H. ENGELHARDT, Nonlinear and asymmetric open channel characteristics of an ion-selective porin in planar lipid membranes. Biophys J 1998, 75, 1255–1262. R. BENZ, Porin from bacterial and mitochondrial outer membranes, In: CRC Critical Reviews in Biochemistry. CRC Press, Boca Raton, FL, 1986, pp. 145–190. R. BENZ, K. BAUER, Permeation of hydrophilic molecules through outer membrane of gram-negative bacteria. Eur J Biochem 1988, 176, 1–19. J. E. W. MEYER, G. E. SCHULZ, Energy profile of maltooligosaccharide permeation through maltoporin as derived from the structure and from a statistical analysis of saccharide-protein
42
43
44
45
46
47
48
49
interactions. Protein Sci 1997, 6, 1084–1091. T. SCHIRMER, P. S. PHALE, Brownian dynamics simulation of ion flow through porin channels. J Mol Biol 1999, 294, 1159–1167. D. J. M¨ULLER, A. ENGEL, Voltage and pH-induced channel closure of porin OmpF visualized by atomic force microscopy. J Mol Biol 1999, 285, 1347–1351. B. SCHMID, M. KRO¨ MER, G. E. SCHULZ, Expression of porin from Rhodopseudomonas blastica in Escherichia coli inclusion bodies and folding into exact native structure. FEBS Lett 1996, 381, 111–114. P. VANGELDER, N. SAINT, R. van BOXTEL, J. P. ROSENBUSCH, J. TOMMASSEN, Pore functioning of outer membrane protein PhoE of Escherichia coli: mutagenesis of the constriction loop L3. Protein Eng 1997, 10, 699–706. N. LIU, A. H. DELCOUR, The spontaneous gating activity of OmpC porin is affected by mutations of a putative hydrogen bond network or of a salt bridge between the L3 loop and the barrel. Protein Eng 1998, 11, 797–802. B. SCHMID, L. MAVEYRAUD, M. KROMER, G. E. SCHULZ, Porin mutants with new channel properties. Protein Sci 1998, 7, 1603–1611. L. Q. GU, O. BRAHA, S. CONLAN, S. CHELEY, H. BAYLEY, Stochastic sensing of organic analytes by a pore-forming protein containing a molecular adapter. Nature 1999, 398, 686–690. K. SAXENA, V. DROSOU, E. MAIER, R. BENZ, B. LUDWIG, Ion selectivity reversal and induction of voltage-gating by site-directed mutations in the Paracoccus denitrificans porin. Biochemistry 1999, 38, 2206–2212.
1
Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure Stephen H. White University of California at Irvine, Irvine, USA Tara Hessa, and Gunnar von Heijne Stockholm University, Stockholm, Sweden
Originally published in: Protein-Lipid Interactions. Edited by Lukas K. Tamm. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31151-4
1 Introduction
Constitutive membrane proteins (MPs) come to equilibrium with the lipid bi-layer and water, after transmembrane (TM) insertion, through the translocation machinery of cells. The prediction of their three-dimensional structure from the amino acid sequence should emerge from a comprehensive understanding of the physical chemistry of protein-lipid interactions. The most fundamental physical principle is that TM helices are composed predominantly of non-polar amino acids. Bacteriorhodopsin [1], comprised of seven TM helices packed neatly into a bundle, is generally taken as the archetypal MP. Its apparent simplicity has led to a simple prediction paradigm that involves first identifying hydrophobic TM segments using hydropathy plots (reviewed in [2]) and then applying helix-packing constraints [3]. This optimistic assessment has been seriously challenged by the threedimensional structure of the ClC chloride channel published in 2002 [4] (Fig. 1A). The jumble of helices buried within the membrane mocks bacteriorhodopsin’s simplicity. Not only do the 17-odd helices vary greatly in length and tilt, some form TM structures in end-to-end arrangements in the manner of the aquaporin family of transporters (reviewed in [5]). Hydropathy plots fail to identify the complex topology correctly. This failure is not limited to the ClC channel alone, as shown by the three-dimensional structure of the KvAP voltage-gated potassium channel [6]. The S1-S4 voltage sensing region is not comprised of the simple TM helices as surmised from hydropathy plot analyses. Rather, this region appears to be dominated by a helical hairpin arrangement that can move within the lipid bilayer in response to
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
Fig. 1 Examples of MPs in lipid bilayers. In the molecular images, phospholipid headgroups are red and acyl chains are white. (A) An image of the ClC chloride channel [4, 111] embedded in a lipid bilayer (red and white) surrounded by water (aquamarine). The topology of this complex protein defies predictions using hydropathy plots. The yellow arrows highlight the components of the intrinsic interactions that must be understood quantitatively in order to predict the three-dimensional structure from the amino acid sequence. Intrinsic interactions are those involving the full-length polypeptide sequence, the lipid bilayer and water. The image was produced from a MD simulation of ClC in a POPC bilayer, courtesy of Dr. Alfredo Freites at UC Irvine. (B) Schematic representation of the translocation or insertion of TM helices by a translocon receiving an elongating polypeptide chain from a ribosome. Polypeptide chains destined for translocation across the ER (center green chain) of eukaryotes or the plasma membrane of prokaryotes lack a segment of sufficient hydrophobicity and length to be identified by the translocon as a TM helix. The topology of a TM segment
[112] is determined by charge interactions [113] with the translocon complex (Sec61 in eukaryotes, SecY in bacteria). Several recent reviews discuss translocon-guided insertion of MPs [9–14, 114]. The schematic image is based upon Fig. 1 of [9]. (C and D) Structure of the SecY complex from Methanococcus jannaschii [7] that has been embedded in a POPC lipid bilayer using MD methods. A view of SecY normal to the bilayer plane looked at from the ribosome is shown in (C), while (D) shows a view along the bilayer plane looking into the so-called “gate” formed by helices TM2B and TM7. Nascent TM helices move into the bilayer through this gate. The translocon is in a closed state, because the structure was determined in the absence of an elongating polypeptide. The TM2A “plug helix” apparently seals the translocon in the absence of nascent peptide to prevent TM movement of ions. Waters within 5 Å of SecY are identified by the blue triangles. The images were prepared from a MD simulation, courtesy of Dr. Alfredo Freites at UC Irvine. All molecular graphics images were produced using Visual Molecular Dynamics (VMD) [115].
2 Membrane Proteins: Intrinsic Interactions
changes of membrane potential. These new structures force a re-evaluation of the structure-prediction problem. What is missing from the present approach? One thing may be attention to the mechanisms of biological assembly. Constitutive α-helical MPs are assembled in membranes by means of a translocation/insertion process that involves physical engagement of a ribosome (Fig. 1B) with the translocon complex [7–9] – itself a MP [9–12] (Fig. 1C and D). Polypeptide segments destined for insertion as TM segments are identified by the translocon-bilayer system and shunted into the bilayer (reviewed in [9–15]). After release into the membrane’s bilayer fabric and disassembly of the ribosome-translocon machinery, a MP resides stably in a thermodynamic free energy minimum (evidence reviewed in [16, 17]). This outline of MP assembly suggests two fundamental categories of protein-lipid interactions that require consideration in structure-prediction algorithms: intrinsic and formative. Intrinsic interactions are those responsible for the stability and structure of the full-length polypeptide chain after synthesis. These interactions, which produce the final shaping of MP structure, include interactions of the polypeptide chain with itself, water, the bilayer hydrocarbon core (HC), the bilayer interfaces (IFs) and, in some cases, cofactors (Fig. 1A). Several recent reviews [17–21] provide extensive discussions of the evolution, structure and thermodynamic stability of MPs. An overview of intrinsic interactions that stabilize α-helical MPs is provided in Section 2. The basic thermodynamic principles of α-helical MPs, except for helix-helix interactions, apply also to β-barrel MPs, but this class of MPs will not be considered here. The interested reader should consult two excellent recent reviews on β-barrel MPs [21, 22]. The second category of interactions that require consideration in structureprediction algorithms, formative interactions, involve interactions of elongating polypeptides with the translocon as well as the lipid bilayer. These interactions, which lead to the selection of a polypeptide segment for shunting into the bilayer, are the subject of Section 3. Recent experiments [23] have revealed the basic selection rules, and the recent structure of the bacterial (SecY) translocon [7, 8] (shown embedded in a lipid bilayer in Fig. 1C and D) provides a structural context for the underlying formative interactions. The basic selection rules indicate that our understanding of the intrinsic interactions is incomplete.
2 Membrane Proteins: Intrinsic Interactions 2.1 Physical Determinants of Membrane Protein Stability: The Bilayer Milieu
Two influences are paramount in shaping polypeptide structure in membranes. First, as indicated in Fig. 2, the membrane’s bilayer fabric has two chemically distinct regions: HC and IFs. IF structure and chemistry must be important because the specificity of protein signaling and targeting by membrane-binding domains
3
4 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
Fig. 2
2 Membrane Proteins: Intrinsic Interactions
could not exist otherwise [24]. Second, the high energetic cost of dehydrating the peptide bond, as when transferring it to a non-polar phase, causes it to dominate structure formation [25], as summarized in Fig. 3. The only permissible TM structural motifs of MPs are α-helices and β-barrels, because internal hydrogen bonding ameliorates this cost (see below). As membranes must be in a fluid state for normal cell function, only the structure of fluid (Lα phase) bilayers is relevant to understanding how membranes mold proteins. However, atomic-resolution images of fluid membranes are precluded due to their high thermal disorder (Fig. 2A). Nevertheless, fundamental and useful structural information can be obtained from multilamellar bilayers (liquid crystals) dispersed in water or deposited on surfaces [26–29]. Their one-dimensional crystallinity perpendicular to the bilayer plane allows the distribution of matter along the bilayer normal to be determined by combined X-ray and neutron diffraction measurements (liquid crystallography; reviewed in [30, 31]). The resulting “structure” consists of a collection of time-averaged probability distribution curves of water and lipid component groups (carbonyls, phosphates, etc.), representing projections of three-dimensional motions onto the bilayer normal. Fig. 2B shows the liquid-crystallographic structure of an Lα phase dioleoylphosphatidylcholine (DOPC) bilayer [32]. Three features of this structure are important. First, the widths of the probability densities reveal the great thermal disorder of fluid membranes. Second, the combined thermal thickness of the IFs (defined by the distribution of the waters of hydration) is approximately equal to the 30-Å thickness of the HC. The thermal thickness of a single IF (around 15 Å) can easily accommodate an α-helix parallel to the membrane plane. The common cartoons of bilayers that assign a diminutive thickness to the bilayer IFs are thus misleading. Third, the thermally disordered IFs are highly heterogeneous chemically. A polypeptide chain in an IF must ← Fig. 2 The liquid-crystalline structure of a fluid DOPC bilayer. (A) Molecular graphics image of DOPC taken from a MD simulation by Ryan Benz at UC Irvine.The color scheme for the component groups (carbonyls, phosphates, water, etc.)is given in (B). The image was prepared by S.White using VMD [115]. (B) Liquid-crystallographic structure of a fluid DOPC lipid bilayer [32]. The “structure ” of the bilayer is comprised of a collection of transbilayer Gaussian probability distribution functions representing the lipid components that account for the entire contents of the bilayer unit cell. The areas under the curves correspond to the number of constituent groups per lipid represented by the distributions (one phosphate, two carbonyls, four methyls, etc.). The widths of the Gaussians measure the
thermal motions of the lipid components and are simply related to crystallographic B factors [39, 40, 116]. The thermal motion of the bilayer is extreme: lipid-component B factors are typically around 150 Å2 , compared to around 30 Å 2 for atoms in protein crystals. (C) Polarity profile (yellow curve) of the DOPC bilayer (above) computed from the absolute values of atomic partial charges [33]. The end-on view in (B) of an "-helix with diameter ∼10 Å – typical for MP helices [87] – shows the approximate location of the helical axes of the amphipathic-helix peptides Ac-18ANH2 [40]and melittin [39], as determined by a novel, absolute-scale X-ray diffraction method (reviewed in [117]). Panels (B) and (C) have been adapted from reviews by White and Wimley [17, 33, 118].
5
6 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
Fig. 3 Energetics of peptide interactions with lipid bilayers. (A) Schematic representation of the shaping of protein structure through polypeptide – bilayer interactions.Based upon the fourstep thermodynamic cycle of Wimley and White [17] for describing the partitioning, folding, insertion and association of "-helical polypeptides. The aqueous insolubility of MPs, folded or unfolded, precludes direct determinations of interaction free energies. The only route to understanding the energetics of MP stability is through studies of small,
water-soluble peptides [62, 64, 65, 68]. The association of TM helices is probably driven by van der Waals interactions, giving rise to knob-into-hole packing [84–86, 119]. The GxxxG motif is especially important in helixhelix interactions in membranes [90, 91]. (B) Energetics of secondary structure formation by melittin at the bilayer IF [65]. Unfolded peptides are driven toward the folded state in the IF because hydrogenbond formation dramatically lowers the cost of peptide-bond partitioning, which is the dominant determinant of whole-residue
2 Membrane Proteins: Intrinsic Interactions
experience dramatic variations in environmental polarity over a short distance due to the steep changes in chemical composition, as illustrated by the yellow curve in the lower half of Fig. 2C [33]. As the regions of first contact, the IFs are especially important in the folding and insertion of non-constitutive MPs, such as diphtheria toxin [34, 35] and to the activity of surface-binding enzymes, such as phospholipases [36–38]. However, for reasons discussed below, they are also likely to be important in translocon-assisted folding of MPs. Experimentally determined bilayer structures such as the one in Fig. 2C are essential for understanding thermodynamic measurements of peptide-bilayer interactions at the molecular level. Recent extension of the liquid-crystallographic methods to bilayers containing peptides such as melittin [39] and other amphipathic peptides [40] makes this a practical possibility. However, there are numerous other X-ray and neutron diffraction approaches that provide important information about the molecular interactions of peptides with lipid bilayers [41–47]. Molecular dynamics (MD) simulations of bilayers [48–51] (Fig. 2A) are rapidly becoming an essential structural tool for examining lipid-protein interactions at the atomic scale [52–57]. The future offers the prospect of combining bilayer diffraction data with MD simulations in order to arrive at experimentally validated MD simulations of fluid lipid bilayers [58]. This approach should allow one to convert the static one-dimensional images obtained by diffraction (Fig. 2B) into dynamic, three-dimensional structures for examining peptide-lipid interactions in atomic detail. 2.2 Physical Determinants of Membrane Protein Stability: Energetics of Peptides in Bilayers
Experimental exploration of the stability of intact MPs is problematic due to their general insolubility. One approach to stability is to “divide and conquer” by studying the membrane interactions of fragments of MPs, i.e. peptides. Because MPs are equilibrium structures, one is free to describe the interactions by any convenient set of experimentally accessible thermodynamic pathways, irrespective of the ← Fig. 3 (Continued) partitioning. The free energy reduction accompanying secondary structure formation by melittin is around 0.4 kcal mol−1 per residue [64, 65], but may be as low as 0.1 kcal mol−1 for other peptides [120]. Although small, such changes in aggregate can be large.For example, the folding of 12 residues of 26-residue melittin into an "-helical conformation causes the folded state to be favored over the unfolded state by around 5 kcal mol−1 . To put this number in perspective, the ratio of folded to unfolded peptide is around 4700. (C)
The energetics of TM helix insertion based upon the work of Wimley and White [68] and Jayasinghe et al. [72]. Estimated relative free energy contributions of the sidechains (Gsc )and backbone (Gbb )to the helix-insertion energetics of glycophorin A [73]. The net side-chain contribution (relative to glycine) was computed using the n-octanol hydrophobicity scale of Wimley et al. [74]. The per-residue cost of partitioning a polyglycine "-helix is +1.15 kcal mol−1 [72]. (Adapted from reviews by White et al. [19] and White [20]).
7
8 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
biological synthetic pathway. One particularly useful set of pathways is the so-called four-step model [17] (Fig. 3A), which is a logical combination of the early three-step model of Jacobs and White [59] and the two-stage model of Popot and Engelman [60], in which TM helices are first “established” across the membrane and then assemble into functional structures (helix association; reviewed in [61]). Although these pathways do not mirror the actual biological assembly process of MPs, they are nevertheless useful for guiding biological experiments, because they provide a thermodynamic context within which biological processes must proceed. In the four-step model (Fig. 3A), the free energy reference state is taken as the unfolded protein in an IF. However, this state cannot actually be achieved with MPs because of the solubility problems. Nor can it be achieved with small non-constitutive membrane-active peptides, such as melittin, because binding usually induces secondary structure (partitioning-folding coupling). It can be defined for phosphatidylcholine (PC) IFs by means of an experiment-based interfacial free energy (hydrophobicity) scale [62] derived from partitioning into 1-palmitoyl2-oleolyl-phosphatidylcholine (POPC) bilayers of tri- and pentapeptides [59, 62] that have no secondary structure in the aqueous or interfacial phases. This scale (Fig. 4A), which includes the peptide bonds as well as the side-chains, allows calculation of the virtual free energy of transfer of an unfolded chain into an IF. For peptides that cannot form regular secondary structure, such as the antimicrobial peptide indolicidin, the scale predicts observed free energies of transfer with remarkable accuracy [63]. This validates it for computing virtual partitioning free energies of proteins into PC IFs. Similar scales are needed for other lipids and lipid mixtures. The high cost of interfacial partitioning of the peptide bond [62], 1.2 kcal mol−1 , explains the origin of partitioning-folding coupling and it also explains why the IF is a potent catalysis of secondary structure formation. Wimley et al. [64] showed for interfacial β-sheet formation that hydrogen-bond formation reduces the cost of peptide partitioning by about 0.5 kcal mol−1 per peptide bond. The folding of melittin into an amphipathic α-helix on POPC membranes involves a per-residue reduction of about 0.4 kcal mol−1 [65] (Fig. 3B). The folding of other peptides may involve smaller per-residue values [66, 67]. The cumulative effect of these relatively small per-residue free energy reductions can be very large when tens or hundreds of residues are involved. The energetics of TM helix stability also depend critically on the partitioning cost of peptide bonds (Fig. 3C). Determination of the energetics of TM α-helix insertion, which is necessary for predicting structure, is difficult because non-polar helices tend to aggregate in both aqueous and interfacial phases [68]. The broad energetic issues are clear [69], however. Computational studies [70, 71] suggest that the transfer free energy GCONH of a non-hydrogen-bonded peptide bond from water to alkane is +6.4 kcal mol−1 , compared to only +2.1 kcal mol−1 for the transfer free energy GHbond of a hydrogen-bonded peptide bond. The per-residue free energy cost of disrupting hydrogen bonds in a membrane is therefore about
2 Membrane Proteins: Intrinsic Interactions
Fig. 4
9
10 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
4 kcal mol−1 . A 20-amino-acid TM helix would thus cost 80 kcal mol−1 to unfold within a membrane, which explains why unfolded polypeptide chains cannot exist in a TM configuration. As discussed in detail elsewhere [19, 72], GHbond sets the threshold for TM stability as well as the so-called decision level in hydropathy plots [2]. The free energy of transfer of non-polar side-chains dramatically favors helix insertion, while the transfer cost of the helical backbone dramatically disfavors insertion. For example [19], the favorable (hydrophobic effect) free energy for the insertion of the single membrane-spanning helix of glycophorin A [73] is estimated to be -36 kcal mol−1 , whereas the cost Gbb of dehydrating the helix backbone is +26 kcal mol−1 (Fig. 3C). The net free energy GTM favoring insertion is thus −10 kcal mol−1 . As is common in so many biological equilibria, the free energy minimum is the small difference of two relatively large opposing energetic terms. Uncertainties in the per-residue cost of backbone insertion will have a major effect on estimates of TM helix stability, the interpretation of hydropathy plots, and the establishment of the minimum value of side-chain hydrophobicity required for stability. An uncertainty of 0.5 kcal mol−1 , for example, would cause an uncertainty of about 10 kcal mol−1 in GTM ! What is the most likely estimate of GHbond ? The practical number is the cost G helix glycyl of transferring a single glycyl unit of a polyglycine α-helix into the bi-layer HC. Electrostatic calculations [71] and the octanol partitioning study of Wimley −1 et al. [74] suggested that G helix glycyl =+1.25 k cal mol , which is the basis for the calculation of AGbb. The cost of transferring a random-coil glycyl unit into n-octanol [74] is +1.15 kcal mol−1 , which suggested that the n-octanol whole-residue hydrophobicity scale [17] (Fig. 4B) derived from partitioning data of Wimley et al. [74] might be a good measure of G helix glycyl . This hypothesis was borne out by a study [72] of known TM helices cataloged in the MPtopo database of MPs of known topology [75], accessible via http://blanco.biomol.uci.edu/mptopo. This study showed that ← Fig. 4 Summary of experiment-based hydrophobicity scales that are useful for understanding MP stability and translocon assisted folding. (A) The WW interfacial hydrophobicity scale determined from measurements of the partitioning of short peptides into phosphatidylcholine vesicles [62]. (B) The WW octanol hydrophobicity scale determined from the partitioning of short peptides into n -octanol [74] that predicts the stability of TM helices [72]. The free energy values along the abscissa are ordered in the same manner as in Fig. 6A. (C) The basis for deriving the octanol -IF scale (oct-IF =oct -IF ) from the scales shown in (A and B). Numerical values for all of the scales can be obtained at
http://blanco.biomol.uci.edu/hydrophobicity scales.html. (D) The oct -IF scale divides the natural amino acid residues into three classes based upon their relative propensities for the HC and the membrane IF. (E) A plot of the normalized turn propensity for helical hairpin formation [78] versus the octanol -IF hydrophobicity scale. There is a clear correlation between the turn propensity and oct-IF hydrophobicity. Those residues that favor the conversion of a long (about 40 amino acids), single-spanning polyleucine TM helix into a helical hairpin (two TM helices separated by a tight turn)are generally the same ones that favor the membrane IF.See text for discussion. (Adapted from a review by White [20]).
2 Membrane Proteins: Intrinsic Interactions
+1.15 kcal mol−1 is indeed the best estimate of G helix glycyl . Using this value, TM helices for MPs of known three-dimensional structure could be identified with high accuracy in the 2001 edition of MPtopo. This scale also includes free energy values for protonated and deprotonated forms of Asp, Glu and His. In addition, Wimley et al. [76] determined the free energies of partitioning salt-bridges into octanol, which are believed to be good estimates for partitioning into membranes [72]. This has led to the augmented Wimley-White (aWW) hydrophobicity scale [72] that forms the basis for a useful hydropathy-based tool, MPEx, for analyzing MP protein stability. MPEx is available as an on-line java applet at http://blanco.biomol.uci.edu/mpex. However, the scale fails miserably in the prediction of the topology of the ClC chloride channel (Fig. 1A), indicating the need to understand the translocon-assisted folding of MPs. Nevertheless, the WW experiment-based whole-residue hydrophobicity scales [62, 72, 74], Fig. 4 [A (GIF ) and B (GWW or Goct )], provide a solid starting point for understanding the physical stability of MPs. The whole-residue WW scale provides an important connection between physical chemistry and biology (see below). When the two scales are used together (Fig. 4C), one can estimate the preference of a polypeptide segment for the HC as an α-helix relative to the membrane IF as an unfolded chain. The “octanol-IF” scale, Goct-IF = Goct –GIF , divides the amino acid residues into three groups (Fig. 4D): strongly IF preferring, strongly HC preferring and those that are borderline (|Goct-IF | ≤0.25 kcal mol−1 ). The octanol-IF scale provided insights into translocon-assisted folding [77–79] and was the stimulus for undertaking a detailed examination of the recognition of TM helices by the endoplasmic reticulum (ER) translocon (see below) [23]. 2.3 Physical Determinants of Membrane Protein Stability: Helix-Helix Interactions in Bilayers
The hydrophobic effect is generally considered to be the major driving force for compacting soluble proteins [80]. However, it cannot be the force driving compaction (association) of TM α-helices. Because the hydrophobic effect arises solely from dehydration of non-polar surfaces [81], it is expended after helices are established across the membrane. Helix association is most likely driven primarily by van der Waals forces; more specifically, the London dispersion force (reviewed in [17, 18]). But why would van der Waals forces be stronger between helices than between helices and lipids? Extensive work [82–86] on dimer formation of glycophorin A in detergents reveals the answer: knob-into-hole packing that allows more efficient packing between helices than between helices and lipids. Tight, knob-into-hole packing has been found to be a general characteristic of helical-bundle MPs as well [87, 88]. For glycophorin A dimerization, knob-into-hole packing is facilitated by the GxxxG motif, in which the glycines permit close approach of the helices. The substitution of larger residues for glycine prevents the close approach and, hence, dimerization [82, 85, 86]. The so-called TOX-CAT method [89] has made it possible to sample the
11
12 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
amino acid motifs preferred in helix-helix association in biological membranes by using randomized sequence libraries [90]. The GxxxG motif is among a significant number of motifs that permit close packing. A statistical survey of MP sequences disclosed that these motifs are very common in MPs [91]. Dimerization studies of glycophorin in detergent micelles [85] do not permit the absolute free energy of association to be determined, because of the large free energy changes associated with micelle stability. However, estimates [17] suggest 1–5 kcal mol−1 as the free energy cost of separating a helix from a helix bundle within the bilayer environment. The cost of breaking hydrogen bonds within the bilayer HC (above) implies that hydrogen bonding between α-helices could provide a strong stabilizing force for helix association. This is borne out by recent studies of synthetic TM peptides designed to hydrogen bond to one another [92, 93]. Interhelical hydrogen bonds, however, are not common in MPs (reviewed in [17]). Indeed, lacking the specificity of knob-into-hole packing, they could be hazardous because of their tendency to cause promiscuous aggregation [18], although they are probably important in the association of TM signaling proteins [94].
3 Membrane Proteins: Formative Interactions 3.1 Connecting Translocon-assisted Folding to Physical Hydrophobicity Scales: The Interfacial Connection
The literature on translocon-assisted MP folding has been reviewed extensively in the past several years [9–14]. Here it is sufficient to note that the signal recognition particle (SRP) targets nascent ribosome-bound membrane and secreted proteins to the translocon complex (Sec61 in eukaryotes, SecY in bacteria), whereupon membrane integration and folding occurs, provided that the nascent protein has at least one run of amino acids with sufficient hydrophobicity to form a TM helix/stoptransfer sequence (Fig. 1B). Otherwise, the protein is secreted across the membrane. An important topic, reviewed elsewhere [9, 95, 96], is the physical basis for topology determination of the initial TM segment. There have been two points of view about translocon-assisted membrane integration, discussed extensively by Johnson [14]. The “sequential” point-of-view visualizes the translocon as having a large-diameter tunnel (around 50 Å) into which the nascent protein chain is secreted during folding, in preparation for insertion into the lipid bilayer via a passageway through the wall of the translocon. A crucial feature of this scheme is that the ribosome must make a tight seal with the translocon in order to prevent ion leakage. There is a growing body of evidence, however, that the alternate “concerted” scheme, in which the translocon complex and the lipid work together, is more likely (reviewed in [9]). Two low-resolution (around 15 Å) images of ribosome-translocon assemblies indicate significant gaps between the ribosome and translocon [97, 98], which eliminates the possibility of
3 Membrane Proteins: Formative Interactions
a tight seal. It appears that sealing must be provided in some way by the nascent peptide within the translocon itself. The structure of an archaeal SecY translocon, composed of 10 TM segments, strongly supports this view (Fig. 1C and D). The nascent TM segment apparently emerges laterally through a gate formed principally by helices TM2B and TM7. A short “plug” helix (TM2A) serves to seal the translocon in the absence of a nascent chain. Site-specific photo-cross-linking studies [99] show that the nascent chain can cross-link with lipids well before the termination of translation, implying that the growing chain interacts with both the translocon and neighboring lipids during folding. Heinrich et al. [100] concluded that the integration of TM domains occurs through a lipid-partitioning process as a result of the TM segment being in contact with the lipid as soon as it arrives in the translocon channel. However, integration into the membrane can occur only if a polypeptide segment has the right properties, such as sufficient hydrophobicity. What is the minimum hydrophobicity required for a 20-amino-acid stop-transfer segment to be integrated into the lipid bilayer? Chen and Kendall [101] examined this question for Escherichia coli by attaching artificial stop-transfer sequences to alkaline phosphatase, which is a water-soluble protein that is normally secreted across the membrane. Potential stop-transfer sequences (21 amino acids) composed of Leu and Ala in various ratios were introduced into an internal position of the enzyme by cassette mutagenesis. The threshold value of hydrophobicity for integration was found to be 16 Ala and five Leu. This is exactly the threshold predicted by the WW octanol-based hydrophobicity scale, as shown by Jayasinghe et al. [72]. This establishes a close relationship between the WW octanol scale and translocon-assisted TM helix insertion. There is also indirect evidence for a relationship between interfacial hydrophobicity and translocon-mediated folding. Nilsson and von Heijne [102] made the interesting observation that a Leu39Val hydrophobic sequence introduced into leader peptidase was incorporated into the membranes of dog pancreas microsomes as a single TM helix. The fact that this helix is twice the length of the typical TM helix strongly supports the idea of early contact of the growing chain with membrane lipids. The more striking observation, however, was that the introduction of a single proline into the center of the Leu39Val segment caused it to be inserted as a helical hairpin. That is, the proline induced the formation of two TM segments separated by a tight turn. Expanding on this observation, Monn´e et al. [77, 78] established a turn-propensity scale by introducing one or two of each of the natural amino acids into the center of a 40-residue polyleu-cine sequence. The residues with a favorable turn potential were found to be, in decreasing order, Pro, Asn, Arg, Asp, His, Gln, Lys, Glu and Gly. Except for Pro, which commonly occurs within TM helices of ordinary length [103], these are the residues in the WW Goct-IF scale (Fig. 4D) that have a strong IF preference. Another misfit is Ala, which has a low turn potential but a significant interfacial preference. The relationship between turn potential and the octanol-IF scale is shown in Fig. 4E. The correlation coefficient between the scales is 0.67, meaning that there is not a strict linear relationship. This is not surprising because turn potential is affected by the length of the long polyleucine segment and the number of residues of a given type introduced into the segment’s
13
14 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
center [78]. For example, unlike the Leu39Val, a single proline placed in the center of a Leu29Val sequence does not induce hairpin formation. A closer connection between turn potential and the WW Goct-IF scale was disclosed by studies of turn induction by runs of Ala residues placed in the center of polyleucine segments [79]. A run of around four alanines was found to induce helical hairpins efficiently in hydrophobic segments as short as 34 residues. Furthermore, glycosylation mapping revealed a slight preference of alanine for the membrane IF, consistent with the WW Goct-IF scale. These various studies support the idea that the translocon and lipid bilayer work in concert to integrate hydrophobic segments into membranes, which strengthens the lipid-partitioning model of Rapoport et al. [100]. In addition, the studies establish a direct link between physical hydrophobicity scales and translocon-assisted folding. An early study [104] of the relationship between biophysical hydrophobicity and translocon-mediated integration found that popular hydrophobicity scales of the time could not accurately predict the hydrophobic threshold for stop-transfer activity. The reason is now understood [72]. Prior to the WW experiment-based whole-residue scales, no hydrophobicity scale took into account the cost of dehydrating the helix backbone. As result, side-chain-only scales dramatically overpredict TM helices in MPs of known structure. If one thinks of the threshold for insertion as the mid-point of a Boltzmann probability curve (see below), side-chainonly scales will cause the apparent threshold to have a positive G, rather than the expected value of zero. Indeed, S¨aa¨f et al. [104] found the mean per-residue hydrophobicity threshold to be approximately +1.5 kcal mol−1 , which is about the cost of dehydrating the peptide bond. Had the partitioning cost of the peptide bond been appreciated at the time and taken into account, the threshold then would have been very close to G = 0. With the availability of experiment-based physical scales that account reasonably well for both interfacial and HC partitioning, it became possible to design more finely tuned TM helices for probing translocon-assisted folding [23], described below. 3.2 Connecting Translocon-assisted Folding to Physical Hydrophobicity Scales: Transmembrane Insertion of Helices
Important new insights into TM helix insertion have been obtained by Hessa et al. [23] using an in vitro expression system [104] that permits quantitative assessment of the membrane insertion efficiency of model TM segments. Specifically, they examined the integration into membranes of dog pancreas rough microsomes of designed polypeptide segments. These segments were engineered into the luminal P2 domain of the integral MP leader peptidase (Lep) (Fig. 5A–C). The first step in the analysis was to test the hypothesis that the WW octanol scale had correctly identified the minimum hydrophobicity required for TM helix stability. Initial measurements were thus made by testing H-segments of the design GGPG-(Ln A19-n )-GPGG with n = 0–7. As shown in Fig. 5D, the probability of
3 Membrane Proteins: Formative Interactions
Fig. 5 Integration of designed TM segments (H-segments) into the ER using dog pancreas microsomal membranes. This system was used to explore systematically the hydrophobicity requirements for TM helix integration via the Sec61 translocon [23]. (A) Wild-type leader peptidase (Lep) from E. coli has two N-terminal TM segments (TM1 and TM2) and a large luminal domain (P2). H-segments were inserted between residues 226 and 253 in the P2 domain. Glycosylation acceptor sites (G1 and G2) were placed in positions flanking the Hsegment. For H-segments that integrate into the membrane, only the G1 site is glycosylated (right), whereas both the G1 and G2 sites are glycosylated for H-segments that do not integrate into the membrane
(left). (Based upon Hessa et al. [23]). (B) An example of sodium dodecylsulfate gels used in the determination of the extent of glycosylation of Lep/H-segment constructs. Plasmids encoding the Lep/H-segment constructs were transcribed and translated in vitro in the absence (−RM) and presence (+RM) of dog pancreas rough microsomes. Data from Hessa et al. [23]. (C) Equations used by Hessa et al. [23] for the analysis of gels of the type shown in (B). (D) Mean probability of insertion, p, for H-segments with n=0–7 Leu residues in H-segments of the form GGPG-(Ln A19-n )-GPGG. The curve is the best-fit Boltzmann distribution, which suggests equilibrium between the inserted and translocated H-segments. (Data re-plotted from Hessa et al. [23]).
15
16 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
Fig. 6 Biological and biophysical hydrophobicity scales. (A) G aa app scale derived by Hessa et al. [23] from H-segments (Fig. 5)with the indicated amino acid placed in the middle of the 19-residue hydrophobic stretch. (B) Correlation between G aa app and the WW water/octanol free energy scale (G aa WW et al. [23]).
insertion, p(n), conforms accurately to a Boltzmann distribution, which shows that translocon-mediated insertion has the appearance of an equilibrium process. A “biological” hydrophobicity scale (G aa app ) could be derived from studies on H-segments in which each of the 20 naturally occurring amino acids were placed in the middle position of the segment. As seen in Fig. 6A, Ile, Leu, Phe and Val aa promote membrane insertion (G aa app <0), Cys, Met and Ala have G app ∼0, and aa all polar and charged residues have G app >0. The correlation between the G aa app scale and the WW octanol scale is shown in Fig. 6B. Considering the complexity of the biological system, the two scales correlate surprisingly well. The overall high correspondence between the two scales indicates that the recognition of TM segments by the translocon involves direct interaction between the segment and the surrounding lipid [100]. The G aa app biological scale is strictly valid only for residues placed in the middle of the H-segment. To explore the role of residue position, Hessa et al. also performed symmetric “scans” in which a pair of residues of a given kind were moved symmetrically from the center of the H-segment towards its N- and C-termini. The results are summarized in Fig. 7A – while the contributions from apolar residues do not vary much with position within the H-segment, Trp and Tyr strongly reduce membrane insertion when placed centrally, but become much less unfavorable as they are moved apart. This positional dependency is even stronger for charged residues such as Lys and Asp. The positional effects are consistent with the relative preferences of Trp, Tyr and charged residues for the bilayer IF (Fig. 4), suggesting the importance of interactions of elongating peptides with the lipid bilayer. The position dependence observed by Hessa et al. [23] had another important characteristic. Namely, the probability of helix insertion was sensitive to amphiphilicity of the elongating peptide as an α-helix (Fig. 7B). Helices with a low
3 Membrane Proteins: Formative Interactions
Fig. 7 Summary of the basic code used by the ER translocon to identify TM segments based upon the findings of Hessa et al. [23, 106]. As noted in Fig. 6, the biological G aa app scale is based upon values obtained from amino acids placed in the center of the H-segment. The results of Hessa et alreveal very strong dependences upon amino acid position, contrary to the implicit assumption of hydropathy plot analyses that the position of an amino acid within a bilayer-spanning helix does not matter. (A) The G aa app values for some amino acids such as Gly and Ala are little affected by position within the TM segment. The G aa app values for the aromatic residues Trp and Tyr, on the other hand, depend strongly on position. They are very unfavorable in the central 10-amino-acid zone, but become quite favorable toward the ends, consistent with the strong inter-
facial preference of aromatic amino acids. Interestingly, Phe does not show this effect. Its behavior is about the same as that of Leu. G aa app values for charged residues, which can be placed in the middle of a TM segment in the presence of a sufficiently large number of Leu residues, show an even stronger dependence than Trp and Tyr. The positional penalty declines almost linearly as the residue is moved toward either end of the helix. (B) TM helices with low hydrophobic moments (low amphiphilicity)are released into the bilayer interior from the translocon more readily than helices with high amphiphilicity. (C) Surprisingly, the KvAP potassium channel voltage sensor (S4 helix)can be inserted across the ER membrane with good efficiency, despite the presence of four arginines. The strong positional Arg dependence of G app makes this possible [106].
17
18 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
hydrophobic moment [105] had a higher insertion probability than those with a high hydrophobic moment, as though the polar surface had a more favorable interaction energy with the translocon than the non-polar surface. Overall, the results of Hessa et al. [23] suggest that direct protein-lipid interactions are essential for the recognition of TM helices by the translocon, and support models based on a partitioning of the TM helices between the Sec61 translocon and the surrounding lipid. The details of the partitioning process remain to be determined, but presumably the open state of the translocon is a highly dynamic one that permits rapid sampling of the translocon-bilayer IF by the translocating polypeptide. The results also provide a starting point for quantitative modeling of the membrane insertion of TM segments. However, Hessa et al. caution that the base G aa app scale alone (Fig. 6A) is not appropriate for calculating membrane insertion efficiency of natural polypeptide segments because of the strong positional dependence of G aa app . The importance of including the position dependence was especially apparent in a related study by Hessa et al. [106] of the TM insertion of the voltage sensor of the KvAP voltage-gated potassium channel [6]. The critical element in the sensor domains in virtually all voltage-gated ion channels is the S4 helix, which contains at least four regularly spaced Arg residues interspersed with hydrophobic residues. Voltage activation has been suggested to involve movement of S4 through the lipid bilayer in response to membrane depolarization [107]. This mechanism is controversial, because of the presumed cost of burying charges in the HC of the lipid bilayer [108]. To examine this issue, Hessa et al. [106] measured the insertion efficiency of an H-segment containing the arginine-rich region of the KvAP S4 helix (Fig. 7C). The measured Gapp was found to be only 0.5 kcal mol−1 rather than the value of 3.9 kcal mol−1 computed from the biological hydrophobicity scale Arg (Fig. 6A). However, when measurements of the position dependence of G app were taken into account, the computed value of Gapp agreed closely with the measured value. The position dependence of G aa app is clearly extremely important. However, it is surprising, because the HC of the bilayer has always been assumed to behave as a uniform alkyl liquid.
4 Conclusions
The lipid bilayer presents a complex environment for the folding and stability of MPs. Much progress has been made in describing and understanding this environment, and in teasing out the basic thermodynamic principles of its interactions with peptides. Yet, despite our progress with model systems, our understanding of the details of protein-lipid interactions in vivo remain woefully inadequate, as revealed by the studies of translocon-assisted insertion of TM helices [23, 106]. The dogma of the past 25 years or so has been that the HC of the lipid bilayer is simply a thin alkyl film that is strictly off-limits to charged amino acids because of the Born charging energy [109]. It has certainly dominated thinking about the energetics of ion channel voltage sensors [108].
References
The new information that has emerged from the studies of translocon-assisted protein folding tells us that the lipid bilayer has greater possibilities for lipid-protein interactions than previously thought. The dependence of the insertion energetics of polar residues on position within TM helices reveals this most clearly. The ease with which the S4 helix of the KvAP potassium channel voltage sensor can be inserted across the ER membrane seems astounding at first. However, in the context of diphtheria toxin, the result is not so surprising. The T-domain of diphtheria toxin is capable, on its own, of translocating large portions of itself (including highly charged helices) and the water-soluble catalytic domain across endosomal membranes spontaneously in response to lowered pH [110]. Just how this can be accomplished is a mystery that may, at its core, be related to the high structural integrity of the lipid bilayer, an integrity that prevails despite great thermal motion. Understanding and describing the lipid bilayer and its interactions with proteins from this perspective is one of the important challenges ahead.
Acknowledgments
This work was supported by grants from the National Institute of General Medical Sciences (GM46823 and GM68002) to S.H.W., and the Swedish Cancer Foundation, the Swedish Research Council, and the Marianne and Marcus Wallenberg Foundation to G.v.H.
References 1 J. K. LANYI, B. SCHOBERT, J. Mol. Biol. 2003, 328, 439–450. 2 S. H. WHITE, Hydropathy plots and the prediction of membrane protein topology. In Membrane Protein Structure:Experimental Approaches, WHITE, S. H. (ed.). New York: Oxford University Press, 1994, pp. 97–124. 3 J. U. BOWIE, Protein Sci. 1999, 8, 2711–2719. 4 R. DUTZLER, E. B. CAMPBELL, M. CADENE, B. T. CHAIT, R. MACKINNON, Nature 2002, 415, 287–294. 5 R. M. STROUD, L. J. W. MIERCKE, J. O’CONNELL, S. KHADEMI, J. K. LEE, J. REMIS, W. HARRIES, Y. ROBLES, D. AKHAVAN, Curr. Opin. Struct. Biol. 2003, 13, 424–431. 6 Y. X. JIANG, A. LEE, J. Y. CHEN, V. RUTA, M. CADENE, B. T. CHAIT, R. MACKINNON, Nature 2003, 423, 33–41.
7 B. van den BERG, W. M. CLEMONS, Jr, I. COLLINSON, Y. MODIS, E. HARTMANN, S. C. HARRISON, T. A. RAPOPORT, Nature 2004, 427, 36–44. 8 W. M. CLEMONS Jr, J. -F. M´ENE´ TRET, C. W. AKEY, T. A. RAPOPORT, Curr. Opin. Struc. Biol. 2004, 14, 390–396. 9 S. H. WHITE, G. von HEIJNE, Curr. Opin. Struct. Biol. 2004, 14, 397–404. 10 A. J. M. DRIESSEN, E. H. MANTING, C. van der DOES, Nat. Struct. Biol. 2001, 8, 492–498. 11 R. E. DALBEY, G. von HEIJNE (eds), Protein Targeting Transport and Translocation. New York: Academic Press, 2002. 12 S. PFEFFER, Cell 2003, 112, 507–517. 13 E. BIBI, Trends Biochem. Sci. 1998, 23, 51–55. 14 A. E. JOHNSON, M. A. van WAES, Annu. Rev. Cell Dev. Biol. 1999, 15, 799–842. 15 G. von HEIJNE, Adv. Protein Chem. 2003, 63, 1–18.
19
20 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
16 M. A. LEMMON, D. M. ENGELMAN, Q. Rev. Biophys. 1994, 27, 157–218. 17 S. H. WHITE, W. C. WIMLEY, Annu. Rev. Biophys. Biomol. Struct. 1999, 28, 319–365. 18 J.-L. POPOT, D. M. ENGELMAN, Annu. Rev. Biochem. 2000, 69, 881–922. 19 S. H. WHITE, A. S. LADOKHIN, S. JAYASINGHE, K. HRISTOVA, J. Biol. Chem. 2001, 276, 32395–32398. 20 S. H. WHITE, FEBS Lett. 2003, 555, 116–121. 21 L. K. TAMM, H. HONG, Folding of membrane proteins. In Protein Folding Handbook I, BUCHNER, J., KIEFHABER, T. (eds). Weinheim: Wiley-VCH, 2004, pp. 994–1027. 22 L. K. TAMM, A. ARORA, J. H. KLEINSCHMIDT, J. Biol. Chem. 2001, 276, 32399–32402. 23 T. HESSA, H. KIM, K. BIHLMALER, C. LUNDIN, J. BOEKEL, H. ANDERSSON, I. NILSSON, S. H. WHITE, G. von HEIJNE, Nature 2005, 433, 377–381. 24 J. H. HURLEY, S. MISRA, Annu. Rev. Biophys. Biomol. Struct. 2000, 29, 49–79. 25 Y. LIU, D. W. BOLEN, Biochemistry 1995, 34, 12884–12891. 26 S. TRISTRAM-NAGLE, H. I. PETRACHE, J. F. NAGLE, Biophys. J. 1998, 75, 917–925. 27 H. I. PETRACHE, S. TRISTRAM-NAGLE, J. F. NAGLE, Chem. Phys. Lipids 1998, 95, 83–94. 28 J. F. NAGLE, S. TRISTRAM-NAGLE, Curr. Opin. Struct. Biol. 2000, 10, 474–480. 29 J. F. NAGLE, S. TRISTRAM-NAGLE, Biochim. Biophys. Acta 2001, 1469, 159–195. 30 S. H. WHITE, M. C. WIENER, Determination of the structure of fluid lipid bilayer membranes. In Permeability and Stability of Lipid Bilayers, DISALVO, E.A., SIMON, S.A. (eds). Boca Raton: CRC Press, 1995, pp. 1–19. 31 S. H. WHITE, M. C. WIENER, The liquid-crystallographic structure of fluid lipid bilayer membranes. In Membrane Structure and Dynamics, MERZ, K.M., ROUX, B. (eds). Boston: Birkh¨auser, 1996, pp. 127–144. 32 M. C. WIENER, S. H. WHITE, Biophys. J. 1992, 61, 434–447. 33 S. H. WHITE, W. C. WIMLEY, Biochim. Biophys. Acta 1998, 1376, 339–352.
34 A. S. LADOKHIN, R. LEGMANN, R. J. COLLIER, S. H. WHITE, Biochemistry 2004, 43, 7451–7458. 35 M. P. ROSCONI, G. ZHAO, E. LONDON, Biochemistry 2004, 43, 9127–9139. 36 M. H. GELB, W. H. CHO, D. C. WILTON, Curr. Opin. Struct. Biol. 1999, 9, 428–432. 37 J. G. BOLLINGER, K. DIRAVIYAM, F. GHOMASHCHI, D. MURRAY, M. H. GELB, Biochemistry 2004, 43, 13293–13304. 38 A. A. FRAZIER, M. A. WISNER, N. J. MALMBERG, K. G. VICTOR, G. E. FANUCCI, E. A. NALEFSKI, J. J. FALKE, D. S. CAFISO, Biochemistry 2002, 41, 6282–6292. 39 K. HRISTOVA, C. E. DEMPSEY, S. H. WHITE, Biophys. J. 2001, 80, 801–811. 40 K. HRISTOVA, W. C. WIMLEY, V. K. MISHRA, G. M. ANANTHARAMAIAH, J. P. SEGREST, S. H. WHITE, J. Mol. Biol. 1999, 290, 99–117. 41 K. HE, S. J. LUDTKE, D. L. WORCESTER, H. W. HUANG, Biophys. J. 1996, 70, 2659–2666. 42 L. YANG, T. M. WEISS, R. I. LEHRER, H. W. HUANG, Biophys. J. 2000, 79, 2002–2009. 43 W. T. HELLER, A. J. WARING, R. I. LEHRER, T. A. HARROUN, T. M. WEISS, L. YANG, H. W. HUANG, Biochemistry 2000, 39, 139–145. 44 J. P. BRADSHAW, M. J. M. DARKES, T. A. HARROUN, J. KATSARAS, R. M. EPAND, Biochemistry 2000, 39, 6581–6585. 45 T. M. WEISS, P. C. A. van der WEL, J.A. KILLIAN, R. E. KOEPPE, II, H. W. HUANG, Biophys. J. 2003, 84, 379–385. 46 J. P. BRADSHAW, S. M. A. DAVIES, T. HAUSS, Biophys. J. 1998, 75, 889–895. 47 F.-Y. CHEN, M.-T. LEE, H. W. HUANG, Bio-phys. J. 2003, 84, 3751–3758. 48 R. W. PASTOR, Curr. Opin. Struct. Biol. 1994, 4, 486–492. 49 D. P. TIELEMAN, S. J. MARRINK, H. J. C. BERENDSEN, Biochim. Biophys. Acta 1997, 1331, 235–270. 50 L. R. FORREST, M. S. P. SANSOM, Curr. Opin. Struct. Biol. 2000, 10, 174–181. 51 S. E. FELLER, Curr. Opin. Colloid Interface Sci. 2000, 5, 217–223. 52 S. S. DEOL, P. J. BOND, C. DOMENE, M. S. P. SANSOM, Biophys. J. 2004, 87, 3737–3749. 53 S. E. FELLER, K. GAWRISCH, T. B. WOOLF, J. Am. Chem. Soc. 2003, 125, 4434–4435.
References 54 D. P. TIELEMAN, B. HESS, M. S. P. SANSOM, Biophys. J. 2002, 83, 2393–2407. 55 F. Q. ZHU, E. TAJKHORSHID, K. SCHULTEN, Biophys. J. 2004, 86, 50–57. 56 S. BERNE` CHE, B. ROUX, Nature 2001, 414, 73–77. 57 D. J. TOBIAS, Membrane simulations. In Computational Biochemistry and Biophysics, BECKER, O.M., MACKERELL, A.D., Jr, ROUX, B., WATANABE, M. (eds). New York: Marcel Dekker, 2001, pp. 465–496. 58 R. W. BENZ, F. CASTRO-ROMA´ N, D. J. TOBIAS, S. H. WHITE, Biophys. J. 2005, in press. 59 R. E. JACOBS, S. H. WHITE, Biochemistry 1989, 28, 3421–3437. 60 J.-L. POPOT, D. M. ENGELMAN, Biochemistry 1990, 29, 4031–4037. 61 A. R. CURRAN, D. M. ENGELMAN, Curr. Opin. Struct. Biol. 2003, 13, 412–417. 62 W. C. WIMLEY, S. H. WHITE, Nat. Struct. Biol. 1996, 3, 842–848. 63 A. S. LADOKHIN, S. H. WHITE, J. Mol. Biol. 2001, 309, 543–552. 64 W. C. WIMLEY, K. HRISTOVA, A. S. LADOKHIN, L. SILVESTRO, P. H. AXELSEN, S. H. WHITE, J. Mol. Biol. 1998, 277, 1091–1110. 65 A. S. LADOKHIN, S. H. WHITE, J. Mol. Biol. 1999, 285, 1363–1369. 66 T. WIEPRECHT, M. BEYERMANN, J. SEELIG, Biochemistry 1999, 38, 10377–10387. 67 Y. LI, X. HAN, L. K. TAMM, Biochemistry 2003, 42, 7245–7251. 68 W. C. WIMLEY, S. H. WHITE, Biochemistry 2000, 39, 4432–4442. 69 M. A. ROSEMAN, J. Mol. Biol. 1988, 201, 621–625. 70 N. BEN-TAL, D. SITKOFF, I. A. TOPOL, A.-S. YANG, S. K. BURT, B. HONIG, J. Phys. Chem. B 1997, 101, 450–457. 71 N. BEN-TAL, A. BEN-SHAUL, A. NICHOLLS, B. HONIG, Biophys. J. 1996, 70, 1803–1812. 72 S. JAYASINGHE, K. HRISTOVA, S. H. WHITE, J. Mol. Biol. 2001, 312, 927–934. 73 J. P. SEGREST, R. L. JACKSON, V. T. MARCHESI, R. B. GUYER, W. TERRY, Biochem. Biophys. Res. Commun. 1972, 49, 964–969.
74 W. C. WIMLEY, T. P. CREAMER, S. H. WHITE, Biochemistry 1996, 35, 5109–5124. 75 S. JAYASINGHE, K. HRISTOVA, S. H. WHITE, Protein Sci. 2001, 10, 455–458. 76 W. C. WIMLEY, K. GAWRISCH, T. P. CREAMER, S. H. WHITE, Proc. Natl Acad. Sci. USA 1996, 93, 2985–2990. 77 M. MONNE´ , M. HERMANSSON, G. von HEIJNE, J. Mol. Biol. 1999, 288, 141–145. 78 M. MONNE´ , I. M. NILSSON, A. ELOFSSON, G. von HEIJNE, J. Mol. Biol. 1999, 293, 807–814. 79 I. M. NILSSON, A. E. JOHNSON, G. von HEIJNE, J. Biol. Chem. 2003, 278, 29389–29393. 80 K. A. DILL, Biochemistry 1990, 29, 7133–7155. 81 C. TANFORD, The Hydrophobic Effect: Formation of Micelles and Biological Membranes. New York: Wiley, 1973. 82 M. A. LEMMON, J. M. FLANAGAN, J. F. HUNT, B. D. ADAIR, B. J. BORMANN, C. E. DEMPSEY, D. M. ENGELMAN, J. Biol. Chem. 1992, 267, 7683–7689. 83 M. A. LEMMON, H. R. TREUTLEIN, P. D. ADAMS, A.T. BRU¨ NGER, D.M. ENGELMAN, Nat. Struct. Biol. 1994, 1, 157–163. 84 K. R. MACKENZIE, J. H. PRESTEGARD, D. M. ENGELMAN, Science 1997, 276, 131–133. 85 K. G. FLEMING, A. L. ACKERMAN, D. M. ENGELMAN, J. Mol. Biol. 1997, 272, 266–275. 86 K. R. MACKENZIE, D. M. ENGELMAN, Proc. Natl Acad. Sci. USA 1998, 95, 3583–3590. 87 J. U. BOWIE, J. Mol. Biol. 1997, 272, 780–789. 88 D. LANGOSCH, J. HERINGA, Proteins 1998, 31, 150–159. 89 W. P. RUSS, D. M. ENGELMAN, Proc. Natl Acad. Sci. USA 1999, 96, 863–868. 90 W. P. RUSS, D. M. ENGELMAN, J. Mol. Biol. 2000, 296, 911–919. 91 A. SENES, M. GERSTEIN, D. M. ENGELMAN, J. Mol. Biol. 2000, 296, 921–936. 92 F. X. ZHOU, M. J. COCCO, W. P. RUSS, A. T. BRUNGER, D. M. ENGELMAN, Nat. Struct. Biol. 2000, 7, 154–160. 93 C. CHOMA, H. GRATKOWSKI, J. D. LEAR, W. F. DEGRADO, Nat. Struct. Biol. 2000, 7, 161–166.
21
22 Lipid Bilayers, Translocons and the Shaping of Polypeptide Structure
94 S. O. SMITH, C. S. SMITH, B. J. BORMANN, Nat. Struct. Biol. 1996, 3, 252–258. 95 A. KUHN, M. SPIESS, Membrane protein insertion into bacterial membranes and the endoplasmic reticulum. In Protein Targeting Transport and Translocation, DALBEY, R.E., VON HEIJNE, G. (eds). New York: Academic Press, 2002, pp. 107–130. 96 V. GODER, M. SPIESS, FEBS Lett. 2001, 504, 87–93. 97 R. BECKMANN, C. M. T. SPAHN, N. ESWAR, J. HELMERS, P. A. PENCZEK, A. SALI, J. FRANK, G. BLOBEL, Cell 2001, 107, 361–372. 98 D.G. MORGAN, J.-F. M´ENE´ TRET, A. NEUHOF, T. A. RAPOPORT, C. W. AKEY, J. Mol. Biol. 2002, 324, 871–886. 99 W. MOTHES, S. U. HEINRICH, R. GRAF, I. M. NILSSON, G. von HEIJNE, J. BRUNNER, T. A. RAPOPORT, Cell 1997, 89, 523–533. 100 S. U. HEINRICH, W. MOTHES, J. BRUNNER, T. A. RAPOPORT, Cell 2000, 102, 233–244. 101 H. CHEN, D. A. KENDALL, J. Biol. Chem. 1995, 270, 14115–14122. 102 I. M. NILSSON, G. von HEIJNE, J. Mol. Biol. 1998, 284, 1185–1189. 103 K. A. WILLIAMS, C. M. DEBER, Biochemistry 1991, 30, 8919–8923. 104 A. S¨AA¨ F, E. WALLIN, G. von HEIJNE, Eur.J. Biochem. 1998, 251, 821–829. 105 D. EISENBERG, R. M. WEISS, T. C. TERWILLIGER, Proc. Natl Acad. Sci. USA 1984, 81, 140–144. 106 T. HESSA, S. H. WHITE, G. von HEIJNE, Science 2005, 307, 1427.
107 Y. X. JIANG, V. RUTA, J. Y. CHEN, A. LEE, R. MACKINNON, Nature 2003, 42–48. 108 M. GRABE, H. LECAR, Y. N. JAN, L. Y. JAN, Proc. Natl Acad. Sci. USA 2004, 101, 17640–17645. 109 A. PARSEGIAN, Nature 1969, 221, 844–846. 110 K. J. OH, L. SENZEL, R. J. COLLIER, A. FINKELSTEIN, Proc. Natl Acad. Sci. USA 1999, 96, 8467–8470. 111 R. DUTZLER, E. B. CAMPBELL, R. MACKINNON, Science 2003, 300, 108–112. 112 C. D. SNOW, H. NGUYEN, V. S. PANDE, M. GRUEBELE, Nature 2002, 420, 102–106. 113 V. GODER, T. JUNNE, M. SPIESS, Mol. Biol. Cell 2004, 15, 1470–1478. 114 D. J. SCHNELL, D. N. HEBERT, Cell 2003, 112, 491–505. 115 W. HUMPHREY, W. DALKE, K. SCHULTEN, J. Mol. Graphics 1996, 14, 33–38. 116 M. C. WIENER, S. H. WHITE, Biophys. J. 1991, 59, 162–173. 117 S. H. WHITE, K. HRISTOVA, Peptides in lipid bilayers: determination of location by absolute-scale X-ray refinement. In Lipid Bilayers. Structure and Interactions, KATSARAS, J., GUTBERLET, T. (eds). Berlin: Springer, 2000, pp. 189–206. 118 S. H. WHITE, W. C. WIMLEY, Curr. Opin. Struct. Biol. 1994, 4, 79–86. 119 I. T. ARKIN, K. R. MACKENZIE, L. FISHER, S. AIMOTO, D. M. ENGELMAN, S. O. SMITH, Nat. Struct. Biol. 1996, 3, 240–243. 120 T. WIEPRECHT, O. APOSTOLOV, M. BEYERMANN, J. SEELIG, J. Mol. Biol. 1999, 294, 785–794.
1
Expression of Proteins in Isotopically Enriched Form Heiko Patzelt Sultan Qaboos University, Al-Khod, Sultanate of Oman
Natalie Goto The Scripps Research Institute, La Jolla, USA
Hideo Iwai University of Zurich, Zurich, Switzerland
Kenneth Lundstrom Bio-Xtal-BioPole, Epalinges (Lausanne), Switzerland
Erhard Fernholz Roche Diagnostics GmbH, Penzberg, Germany
Originally published in: BioNMR in Drug Research. Edited by Oliver Zerbe. Copyright ľ 2002 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30465-7
1 Introduction
The introduction of stable isotopes into proteins has significantly reduced the time requirements for structure elucidation of biomolecules. Moreover, structural studies of proteins with molecular weights exceeding the 10 kDa limit are usually not possible without uniform isotope labeling because of severe resonance overlap and inefficient coherence transfer along the rather small 3 J 1 H-1 H couplings. Nowadays, efficient expression of recombinant proteins is a prerequisite for many techniques used in structural biology, but the requirement for isotope labeling in particular often precludes NMR (nuclear magnetic resonance) structure determination of proteins isolated from natural sources. Specifically, proteins that have been uniformly labeled with 15 N and/or 13 C are commonly required for NMR spectroscopy, especially for backbone chemical shift assignment procedures, which are greatly facilitated by the use of a series of rather sensitive multi-dimensional triple-resonance NMR experiments [1], in a process that can also be automated with good success [2]. Moreover, the random replacement of nonexchangeable protons by deuterons Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Expression of Proteins in Isotopically Enriched Form
reduces 1 H-1 H dipolar interactions and scalar couplings, thereby reducing peak line widths considerably and allowing structure elucidation of proteins exceeding 30 kDa [3, 4]. Random fractionally deuterated protein samples also permit the use of longer mixing times in NOESY (nuclear Overhauser effect spectroscopy) experiments, since spin-diffusion pathways are largely eliminated. In addition, transverse-relaxation optimized spectroscopy TROSY [5], which has been used for molecules larger than 100 kDa [6], benefits dramatically from deuteration. This stems from the fact that the TROSY component that is narrowed by the DD-CSA compensation is broadened by dipolar interactions with nearby protons. Besides uniform labeling approaches, stable isotopes can also be introduced at specific sites in proteins in order to simplify the assignment process and to isolate spectral information from the region of interest. For example, biosyntheticallydirected fractional 13 C labeling offers the possibility of making stereospecific assignments of all isopropyl methyl groups of Val and Leu residues [7]. In another approach often used for solid-state NMR termed residue-specific labeling, isotope labels are introduced at single sites in a protein. A related scheme, called amino acid-type labeling, is accomplished by expression in an amino acid-based medium where only targeted amino acids contain isotope labels [8]. Preparing a series of samples with different isotopically enriched single amino acid types provides a useful approach for the assignment of large systems not accessible by traditional methods (for an example see Ref. [9]). However, labeling specificity and yield can suffer from isotope scrambling arising from metabolic conversion of amino acids. This problem can be circumvented through the use of specially engineered amino acid-type specific auxotrophic strains. An interesting alternative that can be used to achieve residue and/or amino acid type-specific labeling is presented by the in vitro cell-free expression systems. These systems are additionally advantageous when expression products display cell toxicity. Another labeling strategy geared toward the study of large proteins combines the favorable relaxation properties conferred by extensive deuteration with site-specific strategies for introducing protons. For example, methyl groups and/or aromatic amino acids can be targeted for protonation in otherwise fully deuterated proteins. An alternative approach for the study of large proteins features segmental labeling methods based on protein splicing methodology. Consequently, longer stretches of protein are isotopically enriched, leaving the remainder unlabeled. Isotope editing will remove all signals from unlabeled segments of the proteins, thereby largely reducing resonance overlap and facilitating assignment. Protein splicing methods can also be used to introduce non-natural amino acids or chemical modifications into the sequence. Therein, a protein segment containing an unnatural residue can be chemically synthesized and then ligated to a recombinantly produced (isotopically enriched) segment. Protein splicing can also be used to produce proteins which have high potential for cytotoxicity [10] or to stabilize proteins through cyclization. This article will mainly focus on recent developments in protein labeling methodology. For an introduction to methods involved in the generation and yield optimization of protein samples labeled with 13 C and/or 15 N by recombinant methods
2 Isotope-Labeled Proteins from Hydrolyzates of the Green Alga Scenedesmus obliquus 3
in E. coli the interested reader is referred to excellent reviews published in the literature [8, 11–18]. Here we first describe the use of labeled algal hydrolyzates for the production of labeled proteins in E. coli or other organisms. We then review methods used for the introduction of isotope labels into specific sites, and this is followed by a section on segmental labeling approaches. Subsequently, we will summarize recombinant protein expression methods in hosts other than E. coli that have proven to be especially suitable for post-translationally modified proteins and membrane proteins, before concluding with an introduction to cell-free expression systems.
2 Isotope-Labeled Proteins from Hydrolyzates of the Green Alga Scenedesmus obliquus
Although alternative expression systems have been successfully adapted for the production of isotope-labeled proteins (see Sect. 5), heterologous expression in E. coli often remains the method of choice for NMR sample preparation. There is a fundamental difference, however, with respect to the kind of medium in which the cells are cultivated. In a so-called “chemically defined” or “minimal” medium only one or a very limited number of carbon sources is provided, e.g. glucose or glycerol. All bacterial metabolites have to be biosynthesized by the cells through the various, sometimes lengthy and energy-demanding, metabolic pathways. In a “complex” or “rich” medium, the cells grow, as the name suggests, on a complex mixture of amino acids and/or carbohydrates. Amino acid interconversions, and thus the potential for isotope scrambling in selectively labeled samples, are here reduced to a minimum. Unlabeled fermentations are usually performed in complex media (i.e. yeast extract-containing LB for E. coli), since protein yield and cell density are here considerably higher than in minimal media. The same is desirable for isotope-labeled fermentations, but the limited availability and/or high price of commercial amino acid/sugar mixtures in the required isotope composition often impose fermentations on a single carbon source. Several companies are meanwhile supplying labeled amino acid/sugar mixtures of acceptable quality. However, especially if larger-scale or repeated preparations of labeled proteins are envisaged, investment of some extra time for the in-house production of labeled complex growth media for E. coli or other host cells clearly becomes advantageous, especially from the point of view of financial considerations. The most frequently employed source for complex amino acid/sugar mixtures, labeled in any combination of 2 H, 13 C and/or 15 N in E. coli continues to be a phototrophic green alga. Scenedesmus obliquus was introduced for that purpose in 1972 by Crespi and Katz [19]. In recent years, the original protocols have been modified by several groups, leading to improvements in the yield and purity of the algal amino acid mixtures, thereby enhancing protein labeling efficiency and expression levels in the hosts [20–22].
4 Expression of Proteins in Isotopically Enriched Form
Depending on the desired labeling pattern, 2 H2 O, 13 CO2 and/or 15 NH4 Cl or Na15 NO3 are used as exclusive isotope sources during the algal fermentation. All of the above have become commercially available at affordable rates. For the preparation of random partially isotope-labeled amino acid/sugar mixtures, unlabeled water, carbon dioxide or nitrogen salts are simply admixed to the labeled starting material in the appropriate proportions. Thus, the three basic steps required for the preparation of a uniformly labeled protein for NMR experiments are 1. production of isotope-labeled algal hydrolyzates, 2. adaptation of the protein overproducing organism (usually, but not always, E. coli) to growth on the algal medium, and 3. preparation and purification of the isotope-labeled protein on a preparative scale.
If specific amino acid-type labeling is required, the labeled amino acid is added to the fermentation of the expression host (topic 1 above, see Sect. 2.3). In this case, a thorough isotope analysis of the expressed protein is advisable prior to NMR spectroscopic investigations. This is preferentially achieved by GC-MS analyses of the hydrolyzed amino acids from the protein product. 2.1 Production of Isotope-Labeled Algal Hydrolyzates
Cultures of S. obliquus can easily be grown photoautotrophically in two-tier flasks in an inorganic medium. A stepwise replacement of H2 O by 2 H2 O leads to deuterated cultures, and a replacement of CO2 (the sole carbon source) by 13 CO2 and/or the replacement of the nitrogen-containing salts by their 15 N-isotopomers produces 13 C- and/or 15 N-labeled cultures. These are used as inoculi for larger fermentations in Fernbach flasks or stirred or airlifting fermenters. All fermentations are continuously illuminated with standard plant light bulbs or fluorescent tubes. In the last stage of 2 H2 O fermentations, all salts containing crystal water are repeatedly dissolved in small amounts of 2 H2 O and lyophilized before addition to the medium. If the algal cells are harvested under sterile conditions, the recovered medium can be re-inoculated for up to four further fermentations, and only phosphate is supplemented when the growth rate declines [22]. The recycling of the media cuts isotope costs by about 90%. The algal cell mass is then purified from low-molecular-weight metabolites and hydro-lyzed in HCl (2 HCl in the case of a deuterated fermentation). After neutralization and lyophilization, a white powder (typically 2–2.5 g L−1 of medium and fermentation cycle) is obtained, containing around 50% amino acids (for composition see Ref. [23]), 30% sugars (composition in Ref. [24]) and 20% NaCl. This amino acid/sugar mixture for complex microbial growth media can be produced with any combination of 2 H, 13 C and/or 15 N, including random fractional label distributions. Used as carbon source it enables the simple and quick preparation
2 Isotope-Labeled Proteins from Hydrolyzates of the Green Alga Scenedesmus obliquus 5
of isotope-labeled, complex microbial growth media for the production of labeled proteins. 2.2 Adaptation of the Protein Overproducer to the Algal Medium
The described algal hydrolyzate contains amino acids and sugars in a physiological composition, i.e. in a relative composition similar to that required by most host cells. Amino acid biosynthesis, interconversion, and thus the potential for isotope scrambling, are minimized.When all potentially inhibitory lowmolecular-weight compounds are removed by extraction prior to hydrolysis, most organisms grow well in media containing their typical salt and trace element composition, with the exception of NaCl, which is introduced as part of the algal hydrolyzate. Only the carbon sources are substituted by the algal amino acid/sugar mixture (e.g. yeast extract is replaced by algal hydrolyzate with the required isotope composition). Examples of the production of isotope-labeled proteins in Bacteria [25], Archaea [23, 26] and Eucarya [27] can be found in the recent literature. Since changing the carbon source may influence bacterial growth and expression characteristics, a series of unlabeled test experiments is recommended in order to establish the minimum hydrolyzate concentration required, as well as the reproducibility of protein expression levels. While expression in 13 C- and/or 15 N-labeled media is usually straightforward, most organisms need to be adapted in 3 to 4 steps to growth in 2 H2 O (e.g. 50, 75, 90, 100%). In addition, while formulating any deuterated medium, it is important to recall that the reading on pH meters equipped with normal glass electrodes is about 0.4 units lower in 2 H2 O than in H2 O with the same hydrogen/deuterium ion concentration [28]. Because of the different physical properties of 2 H2 O, growth may be slower than usual, and the timing for induction of protein expression may require adjustment. The extent of deuteration depends on the type of experiments that will be performed. For investigations of internal dynamics using 15 N relaxation or for backbone assignment with triple-resonance spectroscopy, 100% deuteration at the nonexchangable carbon sites maximizes signal sensitivity and resolution. On the other hand, structural information traditionally relies on distance restraints derived from 1 H,1 H NOEs. Nietlispach et al. have calculated and measured the effects of various levels of random fractional deuteration and found 50–70% deuteration most useful for larger proteins [29]. 2.3 Preparation of Homogenously Isotope-Labeled Protein by Fermentation on Algal Media
Production of isotope-labeled proteins on a larger scale from the optimized test conditions is typically routine as long as the physical growth parameters (reactor type, aeration, etc.) are not changed significantly. However, the purification of
6 Expression of Proteins in Isotopically Enriched Form
deuterated proteins may require some adjustments, depending on the techniques utilized. For example, because of their considerably higher density, centrifugation gradients must be adapted. Also, the chromatographic properties of deuterated proteins may display differences relative to their unlabeled (1 H at natural abundance) counterparts, reflecting potential shifts in isoelectric point, stronger intramolecular hydrogen bonds and weaker van der Waals interactions. A final consideration is the re-introduction of protons at the exchangeable amide sites. Since the quantitative exchange of amide protons from the protein core can be extremely slow, it is sometimes necessary to expose the sample to denaturing conditions, followed by refolding in H2 O, if possible.
2.4 Amino Acid-Type Specific Labeling
The principal difficulty associated with the preparation of amino acid-type specific labeled proteins is the suppression of metabolic scrambling of the label into other amino acid types through the common metabolic pathways in the host cell. The use of a complex amino acid/sugar mixture, such as the one present in the algal hydrolyzates, reduces this danger greatly compared to fermentations on a single carbon source. In fact, the activity of many enzymes responsible for amino acid interconversions appears to be low or absent in bacteria grown under these conditions. For example, for the production of a fully deuterium-labeled protein containing 1 H-Trp, the host cell is grown on fully 2 H-labeled algal hydrolyzate in 2 H2 O to which unlabeled Trp is admixed. The individual amount of the differentially labeled amino acid to be added to the fermentation may vary for the different residues and depends on the biosynthetic origin of the amino acid [30] as well as on its background concentration in the algal hydrolyzate [22]. Labels in the biosynthetically central amino acids Asx and Glx show a pronounced tendency for biosynthetically-directed isotope relocation, since these molecules may be used as metabolic precursors for a number of different downstream amino acids (e.g. Met, Lys, Thr, and Ile or Pro and Arg, for Asx and Glx, respectively). The metabolically peripheral amino acids can usually be labeled specifically with much higher isotope purity. For every new amino acid, a small series of test experiments is normally sufficient to establish a compromise between the minimum concentration required for high specific labeling (usually at least ten times the amount introduced with the algal hydrolyzate) and the toxicity limit for the respective amino acid. Before a large-scale fermentation is attempted, the isotope composition downstream from the labeled amino acid should be analyzed. The most probable sites of undesired isotope incorporation are found in the same biosynthetic group [30]. Labeled Tyr, for instance, may be found after media supplementation with labeled Phe, whereas Cys and Gly labeling may result from the addition of labeled Ser.
3 Selective Labeling Schemes
Fig. 1 Most common derivatization of amino acids for GC-MS analysis. The fragment without the carboisopropoxy group normally produces the highest observable peak, which is used for the determination of the isotope composition.
2.5 Mass Spectrometric Analysis of the Labeled Amino Acids
For amino acid analysis the labeled protein needs to be hydrolyzed and derivatized. Most commonly the hydrolysis is performed in 6 M HCl, and the amino acids are converted into their isopropyl ester and pentafluoropropanamide derivatives (Fig. 1) before GC/MS analysis. The molecular ion is not always visible after standard electron impact (EI) ionization, and the fragment after loss of the carboisopropoxy group is the highest observable peak. This leaves m/e=175 plus the mass of the amino acid side chain, from which the degree of labeling can be directly deduced. Fig. 2 shows illustrative mass spectra for derivatized Phe with various isotope patterns. In unlabeled Phe (Fig. 2a), the highest observable peak at m/e=266 accounts for the aforementioned fragment of 175 plus the mass of the benzyl group (C7 H7 =91). The strongest peak is due to the tropylium cation (C7 H7 ) at m/e=91. Complete deuteration (Fig. 2d) takes molecular ion to m/e=274 (175+C7 2 H7 ), and the tropylium signal (C7 2 H7 ) accordingly to m/e 98. As expected, the spectrum of 13 C,15 N-labeled Phe (Fig. 2e) shows the corresponding signals at m/e=275 (175+2+13 C7 H7 ) and 98 (13 C7 H7 ). The signals in the spectra of the partially deuterated amino acid (Figs. 2b–d and 2f) show a statistical distribution of the masses around the calculated values for all fragments and are thus an unambiguous proof of a complete random distribution of the labels.
3 Selective Labeling Schemes
While general labeling strategies relying on expression of proteins using hydrolyzates of algae are useful for uniform or amino acid-specific labeling, expression can also be performed in minimal media in a cost-effective manner. These types
7
8 Expression of Proteins in Isotopically Enriched Form
Fig. 2 EI mass spectra of derivatized phenylalanine, isolated from S. obliquus; a unlabeled, b random 50% 2 H-, c random 70% 2 H-, d 100% 2 H-, e 13 C,15 N-, and f 13 C,15 N/random 75% 2 H-labeled.
of expression media typically use glucose and ammonium salts as the sole carbon and nitrogen sources, respectively, and include D2 O as the medium base when deuteration is required. These growth conditions are easily adapted for uniform or random fractional isotope labeling. In addition, modifications to this general formula can be made to introduce isotope labels into specific sites, for example, by supplemention of the expression media with specifically labeled amino acids or amino acid precursors. In the following section we will review a number of methods used for different patterns of selective labeling in E. coli. Reagents used in these different strategies, with corresponding literature references, are summarized in Tab.1.
3 Selective Labeling Schemes Table 1 Chemical structures of metabolites involved in selective isotope labeling strategies
(from Ref. [14] with kind permission)
3.1 Reverse-Labeling Schemes 3.1.1 Selective Protonation of Methyl Groups in 2 H-Labeled Proteins As described in the introduction, deuteration is routinely used to reduce the rapid transverse relaxation rates characteristic of larger proteins, leading to improvements in peak line widths and experimental sensitivity. Deuteration of all the carbonbound protons maximizes the sensitivity gains that can be obtained from this labeling strategy, and has thus proven useful for in the assignment of backbone 1 H, 15 N, 13 C and side-chain 13 C chemical shifts of large proteins. However, the elimination of all but the exchangeable protons significantly impedes structural studies that rely on conventional NOE approaches. Although in some cases it is possible to use only backbone 1 HN-1 HN NOEs to obtain a protein global fold, the accuracy of these structures is very low because of the small proportion of distance restraints between protons from nonsequential residues. These global folds tend to be less compact than high-resolution structures, with the backbone pairwise
9
10 Expression of Proteins in Isotopically Enriched Form
root-mean-squared deviation (rmsd) to the high-resolution structure ranging from 5 to 8 Å [31, 32]. For the purpose of increasing the number of protons in the protein core while maintaining the benefits of extensive deuteration, it is possible to re-introduce protons using a “reverse isotope” labeling approach. In some of the original approaches, side chains of target amino acid types were selectively protonated in deuterated proteins by adding protonated forms of these amino acids to the D2 O growth medium (see, for example, Refs. [33, 34]). In a variation of this theme, Rosen and coworkers developed a protocol to selectively incorporate protons at the methyl positions of Ala, Val, Leu and Ile γ 2 [35]. Methyl groups are enriched in protein hydrophobic cores and hence are attractive targets for selective protonation [36]. In addition, NMR spectroscopic properties of methyl groups are favorable owing to reasonably well-resolved 13 C-1 H correlations and rapid rotation about the methyl symmetry axis that reduces peak line widths [37]. The original protocol for the production of 15 N, 13 C, 2 H, 1 Me (protonated methyl)proteins utilized 13 C, 1 H-pyruvate as the exclusive source of carbon in 100% D2 O minimal media [35]. Pyruvate can be diverted into the tricarboxylic acid (TCA) cycle to produce many of the intermediates used in the biosynthesis of amino acids [38], but can also be directly incorporated into amino acids either by transamination (Ala), or reactions with threonine (Ile), pyruvate (Val) or both pyruvate and acetyl CoA (Leu). As was observed, pyruvate that is directly incorporated into these amino acids will largely retain the methyl protons, while those amino acids synthesized indirectly via intermediates will be highly deuterated. However, the incorporation of protons at each methyl site tends to be variable, with the result that methyl isotopomers such as 13 CHD2 and 13 CH2 D are also produced [35]. The additive deuterium isotope effect on both carbon and proton chemical shifts produces upfield shifts relative to the CH3 peak by 0.02 and 0.3 ppm per 2 H atom in the 1 H and 13 C dimensions respectively. As a result, 13 C-1 H correlation spectroscopy of the methyl region shows three peaks for every methyl group labeled in this way, translating into resolution and sensitivity problems. Nonetheless, through the use of 2 H-purging pulse schemes during the acquisition of carbon chemical shift, it is possible to remove the peaks arising from methyl groups containing 2 H [32]. In addition, it should be noted that other pyruvate-based labeling schemes have also found great utility in the measurement of side-chain dynamics involving these methyl-containing side chains [39–41]. More recently, the yield and uniformity of methyl group protonation was enhanced through the use of α-ketoisovalerate in combination with α-ketobutyrate to produce 15 N, 13 C, 2 H-labeled proteins with protons introduced at the methyl positions of Leu, Val and Ile (δ1) [42, 43]. Proteins expressed in D2 O/13 C, 2 Hglucose/15 NH4 Cl minimal media can be supplemented with 13 C, [3, 2 H] αketoisovalerate for the selective protonation of the Val and Leu methyl groups and [3,3- 2 H2 ], 13 C α-ketobutyrate for Ile δ1 methyl group labeling. Using this strategy, labeling efficiency of the target methyl groups was shown to exceed 90%, while high levels of deuteration were maintained at other sites without production of methyl group isotopomers containing deuterium. Since the selectively deuterated form of
3 Selective Labeling Schemes
these amino acid precursors can be obtained by base-catalyzed proton exchange in aqueous buffer, the protonated, commercially available forms of these precursors are straightforwardly adapted to this labeling scheme. 3.1.2 Structure Determination of Selectively Methyl Protonated Proteins High levels of deuteration combined with selective methyl protonation using one of the schemes outlined above permits the measurement of 1 HN – 1 HN , 1 HN -methyl and methyl-methyl NOEs. Global folds can be determined using this subset of NOEs, where the quality of these structures is a reflection of protein topology, secondary structure content, and the location and distribution of methyl groups in the protein [32]. In the case of a 30 kDa cell adhesion fragment from intimin, for example, intradomain backbone rmsd values of the ensemble of structures ranged between 1.5 and 1.8 Å from the mean [44]. In contrast, MBP structure quality was lower, with intradomain backbone rmsds between NMR and crystal structures of 3.1–3.8 Å [45]. Although structures produced by this methodology are often of a preliminary quality, they can nonetheless be useful in the localization of ligand or protein interaction sites and the identification of homologous proteins (e.g. Ref [46]). Global folds can also be used as a structural stepping stone in the generation of high-resolution structures, since the assignment of additional NOEs from random fractionally deuterated samples or fully protonated molecules is facilitated by the use of a preliminary structure [47, 48]. Further improvements in the quality of structures can also be obtained through the incorporation of additional restraints such as dipolar couplings [45] or homology modeling (e.g. Ref. [49]). 3.1.3 Introducing 1 H,12 C Aromatic Residues into Otherwise 13 C Uniformly Labeled Proteins Alternative schemes involving selective protonation have also been developed to increase the number of side-chain distance restraints that can be obtained in highly deuterated proteins. For example, 1 H, 12 C Phe and Tyr can be directly incorporated into an otherwise uniformly 13 C-labeled protein expressed in minimal media [50]. Since these amino acids are also preferentially located in the hydrophobic cores of proteins, as well as at ligand binding interfaces, distance restraints involving these residues can be very valuable. This labeling strategy was shown to be useful for proteins under 30 kDa with relatively few Phe and Tyr such as a 24 kDa Dbl homology domain [48] and the 25 kDa antiapoptotic protein Bcl-xL [51]. In cases where overlap in the aromatic spectrum becomes problematic, a synthetic strategy has been introduced to produce Phe that is 13 C labeled only at the epsilon position [52]. An illustration of the utlility of this approach is provided by the structure determination of a 21 kDa Dbl homology domain containing seven phenylalanine residues [47]. 3.1.4 Backbone-Labeled Proteins Protocols for selective isotope labeling of protein backbone atoms are also being developed, since the prevention of 13 C incorporation at the Cβ site circumvents resolution problems associated with homonuclear 1 JCαCβ coupling. Toward this end, syntheses of backbone-labeled amino acids have been described for ten different
11
12 Expression of Proteins in Isotopically Enriched Form
amino acids starting from 15 N, 13 C2 -glycine [53, 54]. While original demonstrations of backbone labeling utilized a CHO cell expression system to prevent isotope scrambling [53], bacterial cell expression systems have also proven amenable to this strategy [54]. In this case, the expression medium must contain the full complement of amino acids, which are then replaced with those that are 13 Cα , 13 CO, 15 N, and 1 Hα (or 50% 2 Hα) labeled just prior to induction of protein expression. As was demonstrated for ubiquitin backbone-labeled with a subset of amino acids, sensitivity and resolution in HNCA-type experiments is enhanced, and couplings can be readily measured from IPAP 1 H- 13 C HSQC (heteronuclear single-quantum correlation) spectra [54, 55]. 3.2 Selective 13 C Methyl Group Labeling
To reduce the expense of producing selectively methyl-labeled proteins, it is possible to use 13 C-methyl iodide to synthesize α-ketobutyrate and α-ketoisovalerate containing 13 C only at the methyl sites [56]. A larger than 20-fold reduction in the cost of precursor molecules can be achieved using this synthetic strategy in place of the commercially available uniformly 13 C-labeled isotopomers. Although these compounds can be adapted to the selective methyl protonation scheme described above, they can also be used to produce proteins that only contain 13 C at the methyl positions of Val, Leu, and Ile δ1, with 12 C at all other sites. The reduced cost, spectral simplification and sensitivity enhancement of 13 C methyl-labeled proteins over uniformly 13 Clabeled samples facilitates the use of chemical shift mapping in the search for potential lead compounds in the drug discovery process. If, on the other hand, selective methyl protonation is required, the 1 H, 13 C methyl-labeled α-ketoisovalerate and αketobutyrate would be added to D2 O expression media containing 2 H, 13 C-labeled glucose and 15 N-labeled ammonium salt. However, compared to the uniformly 13 C-labeled selectively protonated samples described previously, structure determination is less straightforward since backbone carbon atoms for isoleucine and valine are derived from the nonmethyl portion of the α-ketobutryate and α-ketoisovalerate, respectively, and would therefore contain the 12 C isotope. Nonetheless, once assignments are made, NOE measurements involving these methyl groups benefit from elimination of one-bond 13 C-13 C couplings leading to narrow carbon line widths without the requirement for constant-time evolution periods [51].
4 Intein-Based Protein Engineering for NMR Spectroscopy
Recently, new advances in biochemistry have opened up a novel approach for protein engineering, which utilizes a protein-splicing domain. Protein splicing is a post-translational chemical modification discovered in nature, which catalyzes the excision of an intervening polypeptide (internal protein, intein) while simultaneously ligating both the N-and C-terminal flanking polypeptide chains (Fig. 3, 5A),
Fig. 3 The currently accepted chemical mechanism of protein splicing. 1 N-S(O) acyl shift, 2 transesterification, 3 cleavage by succinimide formation, 4 S(O)-N acyl shift.
4 Intein-Based Protein Engineering for NMR Spectroscopy 13
14 Expression of Proteins in Isotopically Enriched Form
analogous to RNA splicing [57, 58]. This unique enzymatic process has been used in various biochemical and biotechnological applications such as protein purification, protein ligation, backbone cyclization and C-terminal modifications. In particular, protein ligation using inteins has opened up a new way for the production of segmentally labeled proteins, thereby reducing the complexity of NMR spectra. For larger proteins or multimeric proteins, it may be necessary to combine selective labeling and segmental isotope-labeling approaches. Moreover, it will be useful in cases where information is desired only for a small part of a large protein, e. g. to characterize interactions with known binding sites or to detect conformational changes in a specific region. Another potentially useful application of inteins for NMR is backbone cyclization to enhance the stability of proteins. 4.1 Segmental Labeling of Proteins
Although more than 100 protein splicing domains have been found in nature [59], only a handful have been used for segmental labeling purposes, namely SceVMA (PI-SceI), PI-PfuI and PI-Puf II. The currently accepted reaction mechanism for protein splicing consists of the following four steps, namely, (i) N-S(O) acyl shift, (ii) transesterification, (iii) succinimide formation, and (iv) S(O)-N acyl rearrangement (Fig. 3). Either a subset of these four steps or the entire reaction process can be used for protein ligation. For example, the intein-mediated protein ligation (IPL) approach utilizes a subset of these reactions by using a modified intein as described in Sect. 4.1.2. On the other hand, the split intein approach requires all four reaction steps. In this case the success of the reaction depends on refolding properties of the split intein. 4.1.1 Intein-Mediated Protein Ligation (IPL)/Expressed Protein Ligation (EPL) using the IMPACT System The IMPACT (Intein-Mediated Purification with an Affinity Chitin-binding Tag) system was originally developed as a novel purification method by New England Biolab. It makes use of a modified intein, SceVMA, in which the active site was mutated from His-Asn-Cys to His-Ala-Cys, so that the usual cleavage due to succinimide formation involving the side-chain of Asn is no longer possible (Figs. 4A and 6A). Since a chitin-bind-ing domain (CBD) is fused to the C-terminus of the intein, this protein can be immobilized to a chitin column, providing a convenient tool for purification [60]. The desired protein segment is fused at the N-terminus of the intein, which can be liberated from the intein-CBD portion of the fusion protein by addition of nucleophiles such as dithiolthreitol (DTT), ethanethiol, 2mercaptoethane sulfonic acids (MESNA), hydroxylamine or cysteine. The IMPACT system provides a good opportunity to expand the application of native chemical ligation, which was originally developed by the Kent group, to a variety of protein targets, because the C-terminus of the N-terminal fusion polypeptide can be converted easily into a thioester group by the intein-mediated cleavage [61]. For native chemical ligation, an N-terminal peptide segment containing a C-terminal
Fig. 4 A Intein-mediated protein ligation (IPL) (also called expressed protein ligation EPL). B The chemical mechanism of protein cyclization by the IPL/EPL approach. HAC denotes the sequence of the active site of the intein, e.g. His-Ala-Cys. CBD stands for chitin-binding domain.
4 Intein-Based Protein Engineering for NMR Spectroscopy 15
16 Expression of Proteins in Isotopically Enriched Form
thioester is chemoselectively ligated to a C-terminal peptide segment that has an N-terminal cysteine in aqueous solution, without protecting any functional groups in the peptides. With the intein-based E. coli expression system, it is now possible to produce larger protein segments with a C-terminal thioester group easily, which can subsequently be used for native chemical ligation. Using this approach, Xu et al. have demonstrated the domain-selective 15 N labeling of the SH2 domain of the Abl-kinase SH domain [62]. In this experiment, the 15 N-labeled SH2 domain containing an N-terminal cysteine capped with a specific proteolytic sequence, which can be removed, was expressed and purified in 15 N-labeled media. The protective N-terminal sequence was subsequently removed by proteolysis in order to create an N-terminal thionucleophile (N-terminal cysteine). The N-terminal segment of the SH3 domain was separately produced as the intein-fusion protein in unlabeled media and eluted with ethanethiol to form the C-terminal thioester. The unlabeled SH3 domain and 15 N-labeled SH2 domain were ligated in aqueous solution at pH 7 in the presence of thiophenol and benzyl mercaptan, achieving a yield of 70%. It has also been demonstrated that multiple ligation steps can be performed with the IPL/EPL approach, thereby illustrating its potential use in central-segment labeling [63]. One of the intrinsic limits of the IPL/EPL system is the requirement for a cysteine residue at the site of protein ligation, which will replace the thioester group. Recently, it was shown that this requirement could be avoided by using a cleavable thiol-containing auxiliary group. Low et al.demonstrated protein ligation by introducing a cleavable thiol-containing auxiliary group, 1-phenyl-2-mercaptoethyl, at the alpha-amino group of a chemically synthesized peptide, which is removed upon protein ligation [64]. Unfortunately, this modification at the N-terminus could be difficult to introduce into proteins which are prepared from bacterial expression systems, and hence its use could be limited to situations where the C-terminal peptide can be chemically synthesized. Nevertheless, the removal auxiliary approach presents an opportunity for segmental isotope labeling regardless of the primary sequence. A second approach that can be adopted to overcome the intrinsic requirement for cysteine at the N-terminus of C-terminal fragment utilizes the enzyme subtiligase, a double mutant of subtilisin, which is able to join two unprotected peptides. Thioester-modified proteins were shown to present good substrates of subtiligase [65]. However, although this approach could be potentially useful for general isotope labeling, the efficiency of this process remains to be proven. It is noteworthy that there is another limiting factor in the choice of amino acid types at the junction sites which affect the enzymatic process of the intein. For example, in the case of SceVMA (also called PI-SceI) from the IMPACT system, proline, cysteine, asparagine, aspartic acid, and arginine cannot be at the C-terminus of the N-terminal target protein just before the intein sequence. The presence of these residues at this position would either slow down the N-S acyl shift dramatically or lead to immediate hydrolysis of the product from the N-S acyl shift [66]. The compatibility of amino acid types at the proximal sites depends on the specific inteins and needs to be carefully considered during the design of the required
4 Intein-Based Protein Engineering for NMR Spectroscopy
expression vectors. The specific amino acid requirements at a particular splicing site depends on the specific intein used and is thus a crucial point in this approach. 4.1.2 Reconstitution of Split Inteins It has been demonstrated that an intein can be split into two fragments and reconstituted in vitro as well as in vivo to form an active intein capable of trans splicing [67–69]. This trans-splicing activity can be directly used for protein ligation as an alternative to the native chemical ligation step, which requires additional thionucleophile groups. Yamazaki et al. have applied trans-splicing to the segmental labeling of RNA polymerase subunit α by splitting an intein from Pyrococcus furiosus, PIPfuI (Fig. 5B) [70]. Each half of the split intein fused to the N- (or C)-extein was produced separately, one in isotopically labeled and the other in unlabeled medium. The independently prepared protein fragments were expressed as inclusion bodies and purified under denaturing conditions. The two independently prepared fragments were reassembled and refolded in aqueous solution in order to form a functional intein, resulting in a ligated extein fragment and a spliced intein. The splicing reaction was found to be efficient at elevated temperature (70 ◦ C), presumably because PI-PfuI is a thermophilic enzyme. However, the general use of elevated temperatures may often be unfavorable, because many proteins denature at higher temperatures. Therefore, conditions for refolding and ligation such as temperature, pH, additives like glycerol etc. must be carefully optimized for each protein system. On the other hand, the split intein approach does not require any additional thionucleophile, in contrast to the IPL/EPL approach. Remarkably, this method can even be extended to joining three segments by using two different inteins, PI-PfuI and PI-PfuII (Fig. 5C). Otomo et al. have successfully demonstrated this approach by isotope labeling a central segment of the 370-residue maltosebinding protein [71]. Three protein fragments were constructed for the two ligation reactions. The first fragment was a fusion protein of the N-terminal domain of the split target protein and the N-terminal split intein-1. The second segment consisted of the C-terminal split intein-1 (PI-PfuI), the central part of the split target protein fragment and the N-terminal split intein-2 (PI-PfuII). The third segment contained the C-terminal split intein-2 fused to the C-terminal fragment of the split target protein. Ligation of the first and the second fragment was facilitated by intein-1, while the second and the third fragments were ligated by intein-2. These ligation reactions were highly specific because two different inteins were used. The second fragment was prepared in isotopically enriched medium, and hence the ligated protein was isotopically labeled only in the central part. The intrinsic limitation of this approach, as in the case of the IPL/EPL approach, is the requirement for specific proximal residues near or at the ligation sites. In inteins, the residues at the junction where ligation occurs are highly conserved because of the chemical mechanism and typically require either cysteine, serine, or threonine [72]. Therefore, at least the N-terminal amino acid of the C-extein must be one of these residues, depending on which intein is used. In addition, the peptide sequence preceding the intein as well as the sequence following the ligation site play an important role for the efficiency of the protein-splicing reaction. For example,
17
Fig. 5 A Natural protein splicing, B Trans splicing with a split intein. The two fragments can be prepared separately and reassembled in vitro to form an active intein domain for protein ligation. C Central segment labeling using two different split inteins. Residues involved in the protein splicing reaction are shown by a single character code of amino acids. N- and C-termini are indicated by NH2 and COOH respectively.
18 Expression of Proteins in Isotopically Enriched Form
4 Intein-Based Protein Engineering for NMR Spectroscopy
in the case of the DnaE intein, the five residues preceding the intein N-terminus and the three residues following the ligation site must seem to be native extein residues in order to achieve efficient splicing [73]. However, these sequential and structural determinants are presently not well understood for all known inteins. Otomo et al.have speculated that the flexibility of the junction region could be one of the important factors for ligation with PI-PfuI. Such requirements would restrict the position of ligation sites, probably to (flexible) linker regions. One advantage of the split intein over the IPL/EPL approach is the direct use of intein splicing activity, eliminating the requirement for additional thionucleophiles such as thiophenol. An another potential advantage is the ability to perform multiple ligations in a one-pot process, greatly simplifying the reaction procedure for the ligation of several fragments. In contrast, the IPL/EPL approach requires ligation reactions to be performed sequentially for multiple fragment ligations. Although the use of split inteins for segmental isotope labeling has great potential, the number of inteins which have been adapted to this purpose is currently limited. Additional difficulties arise from the fact that the determinants influencing the success of an intein splicing reaction are not well understood. Moreover, the refolding requirements of split inteins could hinder its use as a general method for the ligation of protein fragments. Hence, further biochemical characterization of inteins is required for the advance of intein-mediated protein ligation methods. 4.2 Stabilizing Proteins by Intein-Mediated Backbone Cyclization
Limited protein stability often hampers successful structure elucidation by X-ray crystallography and/or NMR spectroscopy. Relaxation properties are usually improved at elevated temperatures, and multidimensional NMR experiments require sample lifetimes to extend over several days to weeks in order to acquire all the necessary data. In addition, the activity of contaminating proteases that are sometimes present in purified samples can be significant at the experimental temperatures. Therefore, the stability of a target protein can be a concern, in particular for expensive isotope-labeled proteins. There have been many attempts to improve protein stability and protein properties, utilizing methods such as random mutagenesis, directed evolution, and rational protein design approaches. In general, these methods are far from straightforward and can be time-consuming. In addition, the stabilization of proteins without loss of function is not a trivial problem. One new approach to stabilizing proteins without changing the primary sequence is to introduce backbone cyclization [74]. No mutations in the primary sequence are introduced by this method, although it might be necessary to insert a flexible linker comprising several residues to join the termini [74–76]. Polymer theory by Flory predicts an improvement in protein stability upon cyclization, because the entropy of the unfolded states should be reduced [77]. Backbone cyclization has long been used for small peptides to reduce the accessible conformational space. Recent advances in intein technology have opened up a new avenue for
19
20 Expression of Proteins in Isotopically Enriched Form
the cyclization of large proteins, because these proteins can be produced with recombinant techniques in bacterial expression systems [74, 76]. Cyclized proteins can be produced either in vitro or in vivo, as discussed in the following two sections. Statistical analysis of the structure database reveals that more than 30% of all known proteins might have termini in relatively close proximity, and hence the use of backbone cyclization to stabilize proteins has a good chance of success even in cases where the structure is not yet known [78].
4.2.1 In vitro Cyclization of Proteins The IPL/EPL method described in Sect. 4.1.2 can be used for cyclization of the backbone polypeptide chains of proteins simply by introducing a nucleophilic thiol group at the N-terminus of the protein (Fig. 4B, 6A). This can be achieved either by creating an N-terminal cysteine by removing residues at the front of the cystein by specific proteolysis or by introducing a cysteine right after the methionine start codon, which is then removed enzymatically in vivo [74, 76]. Another method is to use the so-called TWIN system developed by New England Biolab, in which the target protein is fused into the middle of two different modified inteins (Fig. 6B). The N-terminal nucleophilic cysteine is produced by an intein fused to the N-terminus of the target protein. At the same time the C-terminus can be transformed into the corresponding thioester by another intein fused to the C-terminus of the target protein [79]. This system circumvents the requirement for a specific proteolytic site in order to create the N-terminal cysteine, thereby simplifying the cyclization procedure. The biggest problems associated with in vitro cyclization methods using the IPL/EPL or the TWIN system are competing intermolecular reactions such as polymerization and hydrolysis, which complicate purification as well as reduce yields [74, 79].
4.2.2 In vivo Cyclization The principle of in vivo cyclization is based on the circular permutation of precursor proteins containing an intein (Fig. 6C) [74, 75, 80, 81]. A naturally occurring split intein, DnaE from Synechocystis sp. PCC6803, was first successfully used for cyclization. However, similarly to the IPL/EPL approach, a mixture of linear and circular forms is obtained, presumably because of hydrolysis of an intermediate [73, 75]. On the other hand, artificially split inteins such as PI-PfuI, DnaB, and the RecA intein have been successfully applied for in vivo cyclization, and only circular forms were observed [80–82], suggesting that the circular permutation approach is more suitable for cyclization. Compared to the IPL/EPL or the TWIN system, in vivo cyclization does not require any external thiol group for cyclization, similarly to protein ligation with split inteins. Moreover, there are no undesired products, such as linear forms or polymers, originating from intermolecular reactions.
Fig. 6 A IPL/EPL cyclization with the IMPACT system, B backbone cyclization using the TWIN system, C backbone cyclization with split inteins. CBD stands for chitin-binding domain. N-termini are indicated by NH2 .
4 Intein-Based Protein Engineering for NMR Spectroscopy 21
22 Expression of Proteins in Isotopically Enriched Form
4.2.3 Stability Enhancement by Backbone Cyclization The effect of backbone cyclization was originally tested on BPTI, but no stabilization effects were observed, presumably because the three disulfide bridges reduce entropic gains [83]. Nevertheless, intein-mediated backbone cyclization has opened the way to a study of cyclization effects on protein stability (including membrane proteins) in more detail. Experimentally improved thermal and/or chemical stability has been shown to result from backbone cyclization of a range of proteins, including β-lactamase, DHFR, E. coli IIA Glc , a destabilized mutant of SH3 domain and the N-terminal domain of DnaB [74, 75, 81, 82, 84]. An additional advantage imparted by backbone cyclization is essentially complete resistance to exopeptidases.
5 Alternatives to E. coli Expression Systems
Structural biology and the structural understanding of the genome, popularly called structural genomics, play an increasingly important part in drug discovery today. Fast and reliable protein expression tools are therefore of prime importance. To this end, the choice of protein expression systems has become increasingly important. While in the not so distant past, only Escherichia coli-based expression was used, today a variety of expression systems have been developed ranging from Archaebacteria to mammalian expression vectors. Needless to say, there is no universal expression system, and hence it is often necessary to balance various parameters to achieve optimal expression. For instance, considering the cost of isotopically enriched media, it can be advantageous to sacrifice some native characteristics of a recombinant protein in order to benefit from the higher yields that can be achieved in a more basic expression system. In contrast, specific modifications of the target protein (e. g. glycosylation) predominantly occur in certain cell types, which therefore require the development of special expression vectors. Here, we describe the various alternatives to the use of E. coli expression hosts for heterologous gene expression. Advantages and disadvantages for the different expression systems are discussed, and practical aspects of expression technologies are also described. The feasibility of isotope labeling of recombinantly expressed proteins and their potential use for NMR is also discussed, since the costs and quantities of recombinant proteins produced depend on the system being used. 5.1 Expression Vectors
Traditionally, prokaryotic expression, especially employment of E. coli-based vectors, has been the system of choice. However, bacteria are unable to provide many vital components required for post-translational modifications including various forms of glycosylation or lipid attachment and protein processing, all of which can
5 Alternatives to E. coli Expression Systems Table 2 Features of expression systems
Vector
Advantage
Disadvantage
Rapid expression Colorimetric expression Easy scale-up
Cloning and transformation complicated Requires fusion protein strategy Lack of post-translational modifications
E. coli
Rapid cloning procedure High expression levels Easy scale-up
Toxicity of foreign membrane proteins Lack of post-translational modifications
Saccharomyces cerevisiae
Good secretion machinery Post-translational modifications Easy scale-up
Selection procedure required
Halobacterium salinarum
Schizosacharomyces pombe
Pichia pastoris Baculovirus
Genetics well understood Mammalian promoters applicable High GPCR expression levels Improved procedure
Tendency to overglycosylation Thick cell wall complicates purification Selection procedure required
Selection procedure required Relatively slow virus production
Infection of insect cells High expression yields
Different post-translational processing
Stable mammalia
High authenticity Large-scale set up
Slow procedure to generate cell lines Low recombinant protein yields Stability problems
Transient mammalian
High authenticity Relatively fast methods
Scale-up difficult Transfection methods cell line-specific
Semliki Forest virus
Rapid virus production Broad host range Extreme yields of receptors Large-scale technology established
Safety concerns
also be important for proper protein folding. For this reason, it is not surprising that much time and effort has been dedicated to the development of alternative systems, summarized in Tab. 2. 5.1.1 Halobacterium salinarum Archaea are interesting organisms in the sense that they represent a phylogenetically distinct group of Prokarya, which is as distantly related to Eubacteria as to Eukarya [85]. H. salinarum, the best characterized Archaeon harbors a purplecolored plasma membrane consisting of a complex of one protein, bacterio-opsin (Bop) and its chromophore retinal in a 1:1 ratio [86]. The complex was named bacteriorhodopsin, and it forms typically highly ordered two-dimensional structures
23
24 Expression of Proteins in Isotopically Enriched Form
in the purple membrane, which allowed its purification and the determination of a high-resolution structure [87]. Recently, a system for heterologous gene expression was constructed for H. salinarum [88]. Fusion constructs between the Bop gene and heterologous sequences have been introduced into the H. salinarum expression vectors as follows: (i) C-terminally tagged bacteriorhodopsin [88]; C-terminal fusion with (ii) the catalytic subunit of E. coli aspartate transcarbamylase [88], (iii) the human muscarinic M1 receptor [88], (iv) the human serotonin 5-HT2c receptor [88], (v) the yeast α-mating factor Ste2 receptor [88]. The Bop-transcarbamylase fusion was well expressed, generating yields of 7 mg receptor per liter of culture. However, introduction of tags at the C-terminus of the Bop gene significantly reduced its expression levels. This was partly because of a decrease in Bop-fusion protein mRNA levels compared to the wild-type Bop. More dramatically, expression studies of fusion constructs between the Bop gene and mammalian GPCRs (G protein-coupled receptors, human muscarinic M1 receptor, platelet-activating receptor and angiotensin-1 receptor) failed to detect fusion protein expression detected by Western blotting [89]. In this case, coding region swaps between Bop and GPCRs improved RNA yields and resulted in detectable levels of Ste2 receptor. These results suggest that H. salinarum can be considered as a potentially interesting alternative. The simple and rapid large-scale culture technology is attractive; however, improvements are still required concerning heterologous gene expression. In addition, questions related to codon usage and fusion construct optimization need to be properly addressed. 5.1.2 Saccharomyces cerevisiae Yeast expression vectors have been among those most commonly used since the beginning of gene technology. Vectors based on baker’s yeast, Saccharomyces cerevisiae, have been especially popular for robust expression of many types of recombinant proteins [90]. For instance, the first commercially available recombinant vaccine, the hepatitis B surface antigen vaccine, was produced from an S. cerevisiae vector [91]. Many other recombinant proteins have also been efficiently expressed in yeast including α1-Antitrypsin [92], insulin [93], Epstein-Barr virus envelope protein [94], superoxide dismutase [95] and interferon-α [90]. The genetics and fermentation technology of S. cerevisiae are well characterized. Several strong yeast promoters like alcohol dehydrogenase (ADH1), galactose (GAL1/GAL10), 3-phosphoglycerate kinase (PGK) and mating Factor-α (MFα1) have been applied as well as the selection marker genes β-isopropylmalate (LEU2) and oritidine 5 -decarboxylase (URA3) [90]. Among transmembrane proteins, the S. cerevisiae α-Factor receptor Ste2p has been expressed with C-terminal FLAG and His6 tags [96]. Ste2p belongs to the family of GPCRs with a 7-transmembrane topology. Yields of up to 1 mg of almost homologous receptor were obtained, and the purified receptor was reconstituted into artificial phospholipid vesicles. However, restoration of ligand-binding activity required the addition of solubilized membranes from an Ste2p negative yeast strain. Also, the human dopamine D1A receptor was expressed with C-terminal FLAG and His6 tags in S. cerevisiae, which allowed for purification and reconstitution of receptor [97].
5 Alternatives to E. coli Expression Systems
5.1.3 Schizosaccharomyces pombe Another yeast strain that has received much attention as an expression host is the fission yeast Schizosaccharomyces pombe. In contrast to S. cerevisiae, no budding occurs, and the yeast only reproduces by means of fission and by spores [98]. Two types of expression vectors have been developed for S. pombe. The chromosomal integration type of vector maintains the foreign gene stably in the chromosome [99], and the episomal vector replicates autonomously in yeast cells [100]. Some mammalian promoters like the human chorionic gonadotropin and CMV promoters are functional in S. pombe [101]. The fission yeasts possess many similar features to mammalian cells. S. pombe has a signal transduction system similar to the mammalian G protein-coupling system [102], and the mammalian endoplasmatic reticulum retention signal KDEL is also recognized [103]. The glycosylation pattern for S. pombe is also different from that of S. cerevisiae and other yeast species. A wide range of mammalian proteins have been expressed in S. pombe. In a successful example, the human lipocortin I comprised 50% of the total soluble proteins in yeast cells and showed high activity, indicating that the post-translational modifications were mammalian-like [104]. Membrane proteins including cytochrome P450 were expressed at ten times the levels of those in other yeast systems [105]. Also, GPCRs have been expressed in S. pombe, where the human dopamine D2 receptor was correctly inserted into the yeast cell membrane and demonstrated expression levels three times those of S. cerevisiae [106]. 5.1.4 Pichia pastoris The methylotrophic yeasts including Pichia pastoris, Hansenula polymorpha and Kluyveromyces lactis have become potentially attractive expression hosts for various recombinant proteins [107]. In addition to the relative ease with which molecular biology manipulations can be carried out, P. pastoris has demonstrated a capacity for performing many post-translational modifications such as glycosylation, disulfide bond formation and proteolytic processing [108]. P. pastoris utilizes the tightly methanol-regulated alcohol oxidase 1 (AOX1) promoter, and the vector is integrated as several copies into the yeast host genome. When human insulin was expressed in P. pastoris, the secretion was comparable to that obtained for S. cerevisiae. Peptide mapping and mass spectrometry confirmed identical processing of human insulin in yeast and mammalian cells. P. pastoris has also been used as a host for expression of GPCRs [109]. The mouse 5-HT5A receptor and the human β 2 -adrenergic receptor were fused to the prepropeptide sequence of the S. cerevisiae α-factor, which enhanced the expression levels by a factor of three. Multiple chromosomal integrations further improved the expression twofold. In the case of the β 2 -adrenergic receptor, addition of the antagonist alprenol to the culture medium increased the number of specific binding sites. A similar but weaker effect was seen for the 5-HT5A receptor after addition of yohimbine. The binding activity for the β 2 -adrenergic receptor and the 5-HT5A receptor were 25 pmol and 40 pmol, respectively, per milligram of membrane protein. The pharmacological profiles assayed by liganddisplacement analysis were similar to those obtained from receptors expressed in mammalian cells.
25
26 Expression of Proteins in Isotopically Enriched Form
5.1.5 Baculovirus Heterologous gene expression has been studied to a great extent in insect cells with the aid of baculovirus vectors. The popularity of the baculovirus system is mainly due to the high expression levels obtained for various recombinant proteins resulting from the use of strong viral promoters [110]. Generally, heterologous genes are expressed from the polyhedrin promoter of Autographa californica in several insect cell lines such as Spodoptera frugiperda (Sf9), Trichoplusia ni (Tn), Mamestra brassicae and Estigmene acrea [111]. Although baculovirus vectors have been used for expression of various mammalian recombinant proteins, a limitation has been the differences in the N-glycosylation pathway between insect and mammalian cells. However, Estigmene acrea cells can produce a similar glycosylation pattern as occurs in mammalian cells [112]. Moreover, modifications of baculovirus vectors by replacing the polyhedrin promoter with a CMV promoter made it possible to carry out expression studies in mammalian cell lines [113]. Using this so-called BacMam system, milligram quantities of a cellular adhesion protein (SAF-3) could be produced in CHO cells [114]. Baculovirus vectors have been used extensively to express GPCRs and ligand-gated ion channels [115]. Expression levels up to 60 pmol receptor per milligram have been obtained, which has led to relatively efficient purification procedures. In attempts to further enhance the expression level of the β2-adrenergic receptor, an artificial sequence was introduced, which resulted in approximately double the receptor levels in insect cells [116]. 5.1.6 Transient Mammalian Expression Several approaches have been taken to develop efficient transient mammalian expression systems. The most straightforward process has been to engineer expression vectors with strong promoters. Relatively high expression levels for cytoplasmic and even some transmembrane proteins have been obtained in adherent cells on a small scale. However, a major problem arises in the scale-up of these growth procedures, which are also relatively expensive [117]. In spite of this difficulty, transient transfection experiments using a modified calcium-phosphate coprecipitation method have been carried out in HEK293 EBNA cells adapted to suspension cultures grown on a 100 L scale [118]. More than 0.5 g of a monoclonal antibody was produced from this system, although similar methods have yet to be developed for receptor expression. 5.1.7 Stable Mammalian Expression Generation of various cell lines (BHK, CHO and HEK293) with the target gene inserted downstream of a strong promoter into the genome is a common approach to achieve overexpression in mammalian hosts. However, one drawback is the time-consuming procedure involved in the establishment of stable cell lines, which generally requires 6–8 weeks. Other problems associated with this approach are related to the relatively low expression levels and the instability of generated cell lines. These issues have been addressed by engineering inducible expression systems, which are usually based on tetracyclin-based regulation (Tet on-off systems) [119]. A highly interesting development has been the generation of a cold-inducible
5 Alternatives to E. coli Expression Systems
expression system based on the Sindbis virus replicon [120]. Because of a point mutation in one of the replicase genes, the viral replicase complex is totally inactive at 37 ◦ C, whereas a shift in temperature below 34 ◦ C results in high replication activity and high levels of heterologous gene expression. Using this approach, the serotonin transporter gene, characterized by its low expression levels in any system tested, generated reasonable yields (approximately 250,000 copies per cell) [121]. 5.1.8 Viral Vectors The two common features that have made viral vectors attractive for recombinant protein expression are their high infection rates for a broad range of mammalian cell lines and their strong promoters. Adenovirus vectors have shown high expression levels in, for instance, human embryonic kidney (HEK293) cells, but their use has been to some extent restricted by the fairly complicated virus generation procedure [122]. Another potentially useful class of viruses are the poxviruses. Recombinant gene expression of herpes simplex virus thymidine kinase (TK) has been established for vaccinia virus vectors [123]. Moreover, the engineering of a hybrid bacteriophagevaccinia virus vector by applying the T7 promoter has simplified and broadened the use of pox virus-based expression systems [124]. However, vaccinia vectors are still quite complicated to use for rapid recombinant protein expression, and they have instead found applications in the field of vaccine development. Alphavirus vectors have proven to be highly efficent for heterologous gene expression. Both Semliki Forest virus- (SFV) [125] and Sindbis virus-based [126] expression vectors have been engineered to rapidly generate high-titer recombinant particles, which are susceptible to a broad range of mammalian cell lines and primary cells in culture [127]. Typically, both GPCRs and ligand-gated ion channels have been expressed at extreme levels, i.e. up to 200 pmol receptor per milligram protein [128]. Large-scale SFV technology has been established, which has generated large quantities of, for instance, mouse serotonin 5-HT3 receptor, purified to homogeneity and subject to structural characterization [129]. Moreover, several GPCRs have been expressed at levels of 5-10 mg receptor yields per liter suspension culture of mammalian host cells [130], which has provided material for large-scale purification. 5.2 Comparison of Expression Systems
Comparison between different systems for transmembrane protein expression are always difficult to make, and they obviously reflect individual needs and are strongly influenced by personal experience. It is, however, important to define the usefulness of each system by taking into account different aspects such as ease of handling, expression levels, time and labor requirements, safety, costs, and the quality of the produced recombinant protein (Tab. 3). Obviously, prokaryotic systems are easy to use, the costs for their large-scale applications are low, and no safety risks are involved. The drawbacks are their limited capacity for post-translational modifications and generally low yields of complex mammalian transmembrane proteins. Yeast expression systems are competitive
27
28 Expression of Proteins in Isotopically Enriched Form Table 3 Application of expression systems for mammalian membrane proteins
Vector
Handling Expr./Scale-up Authenticity Time/Labour
Safety
Costs
Halobacterium salinarum E. coli Saccharomyces cerevisiae Schizosacharomyces pombe Pichia pastoris Baculovirus Stable mammalian Transient mammalian Semliki Forest virus
Easy
Short/easy
High
Low
Easy Moderate/easy Low Rel. easy High/easy Mod.
Short/easy Mod./mod.
High High
Low Mod.
Rel. easy High/easy
Mod.
Mod./mod
High
Mod.
Rel. easy High/easy Mod. High/mod. Difficult Low/difficult
Acceptable Rel. high High
Mod./mod High Mod./mod. High Long/intensive High
Mod. Rel. high High
Difficult Mod./difficult High
Mod./mod.
High
High
Easy
Short/easy
Of High concern
Low/easy
Extreme/easy
Low
High
with bacterial vectors with respect to scaleability, costs and safety. Although the time required from gene construct to expressed recombinant protein is slightly longer, the yields are significantly higher. Yeast can also provide some post-translational and protein-processing capacity, although not identical to mammalian cells. The thick yeast cell wall is of some concern, because it makes the purification of intracellular and transmembrane proteins more complicated. Baculovirus vectors have the advantages of high expression levels in insect cells and the fairly simple though more expensive scale-up procedure. Needless to say, the optimal host for expression of mammalian transmembrane proteins from the viewpoint of a molecular biologist must be mammalian cell lines. Obviously, the drawbacks with the conventional transient or stable expression approaches have been the labor intensiveness and time-consuming procedures. The yields have also been disappointingly low and the costs for large-scale production high. Viral vectors are therefore potentially very attractive because of their high capacity for gene delivery and extreme expression levels of heterologous genes. Naturally, viral vectors always pose a higher safety risk of possible infection of laboratory personnel. Not only have the rapidly generated replication-deficient SFV vectors been demonstrated to be free from any contaminating replication-proficient particles, but also the amounts of residual infectious particles associated with cells or even in the medium are negligible [131]. Today the SFV system is classified as BL1 level in several European countries (Germany, Finland, Sweden, Swizerland, UK) but is currently BL2 in the United States. Moreover, the broad host range provides an additional opportunity for the study of gene expression and protein processing in several mammalian host cell lines in parallel. Evidently, large-scale cultivation of mammalian cells is more expensive than bacterial or yeast cell equivalents, but these costs are significantly reduced by using a serum-free medium for suspension cultures.
5 Alternatives to E. coli Expression Systems
5.3 Isotope Labeling and NMR
Recent developments in technologies within structural biology should also play an important part for transmembrane proteins. The potential to incorporate stable isotopes would facilitate structure determination by NMR techniques. Although NMR technologies were long considered to be applicable only to smaller proteins, the development of transverse relaxation-optimized spectroscopy (TROSY) has made it possible to use NMR for larger proteins also [5], even integral membrane proteins. For example, the OmpX and OmpA integral membrane proteins of E. coli were labeled with 13 C/15 N/2 H isotopes and overexpressed as inclusion bodies in bacterial cells. After solubilization in 6 M Gdn-HCl and reconstitution in detergent micelles, solution NMR techniques could be used to identify regular secondary structural elements [132]. Proteins that require non-E.coli expression systems are generally too large in size to be used for NMR studies without uniform 15 N, 13 C and and sometimes 2 H labeling. Hence, when using the expression systems described in this article, it is important to ensure that cells can be grown on defined, isotopically enriched media. This fact at the moment excludes, for example, the use of fetal calf serum. However, special isotope-enriched defined media are available from commercial suppliers which present fully rich, serum-free (protein-free) media at reasonable costs containing labeled amino acids and carbohydrates and which can be used to effectively express heterologous proteins in insect cells applying the baculovirus vector system [133]. Similarly, rich media are also available for expression in S. cerevisiae [134]. Conversely, the methylotropic yeast P. pastoris can be grown on minimal media, facilitating its use as a potential host for isotope labeling. In fact, there have been a number of successful examples where P. pastoris was used to produce isotopically enriched samples for solution NMR studies, including a cysteine-rich glycosylated domain of thromobomodulin [135], a glycosylated EGF module [136], domains from rat calretinin [137], and tick anticoagulant peptide [138]. Even more importantly, it was shown that expression in yeast enables the production of heterologous proteins in deuterated form [139], which would be very difficult or even impossible to achieve in mammalian expression systems because of the cell toxicity of deuterated water. Similarly, H. salinarum can easily be grown in D2 O and on the above-described algae hydrolyzates. Moreover, residue type-specific labeling is possible in this host [26]. Examples of proteins expressed in this host in isotopically labeled form can be found in the literature, e.g. Refs. [25] and [140]. In order to obtain proteins with more native-like gly-cosylation patterns, CHO mammalian cell expression systems have also been developed for NMR sample generation [141–143]. However, the requirement for rich media in mammalian cell-based expression systems combined with difficulties associated with generating high expression levels have to date prevented a more widespread utilization for NMR. Nonetheless, considering the range of proteins that may be inaccessible to E. coli-based expression systems as well as the potential information that can be gained by studying the native-like post-translationally modified forms of
29
30 Expression of Proteins in Isotopically Enriched Form
proteins, the adaptation of heterologous protein expression systems to the purpose of isotope labeling for solution NMR clearly requires further development. Reconstitution of GPCRs in appropriate membrane mimetics or membrane preparations is still very difficult, and success requires efficient expression systems to yield enough protein material. As large quantities of an increasing number of recombinant proteins become available, it will be possible to develop techniques for solubilization, purification and reconstitution in a high-throughput format. The first global structural geno-mics project for membrane proteins, MePNet (Membrane Protein Network) was recently initiated with the aim of comparing the overexpression of 100 GPCRs in three systems based on E. coli, P. pastoris and SFV vectors [144]. The goal of this three year program is to verify the expression levels for the 100 targets and establish platforms for solubilization, purification and crystallization technologies which should form a solid base for obtaining novel high-resolution structures of GPCRs. Technological developments arising from this initiative should also benefit the field of solution and solid-state NMR. 5.4 Target Proteins
The choice of expression system is to a large extent dictated by the type of target protein. In general, many soluble proteins, which are fairly easy to express and have a low molecular weight, are efficiently expressed from bacterial vectors. As a rule of thumb one can suggest that E. coli-based expression should be used whenever possible. However, E. coli vectors are not suitable for expression of many authentic mammalian transmembrane proteins. Additionally, the near-completion of the human genome sequence has revealed a multitude of genes as potential targets for structural analyses. Many of these include transmembrane proteins, whose properties quite often make them more difficult to express. Many of these transmembrane proteins, including receptors and channels, are important targets for drug discovery. GPCRs alone stand for approximately 50% of drug targets today. Moreover, a quarter of the top 200 drugs in the United States are based on GPCRs, and the annual sales in 2000 reached more than 18.5 billion US$ [145]. It is therefore understandable that so much interest has been focused on transmembrane proteins today. At present, it is possible to express complicated transmembrane receptors in several vector systems at reasonably high levels, cost-effectively in a near-to-native state. As more recombinant proteins become available, the solubilization, purification and crystallization for NMR technology can also be developed, which will contribute to the achievement of rates of structure determination similar to those common for soluble proteins today. However, the determination of highresolution structures of membrane proteins, and particularly mammalian ones, has been modest compared to soluble proteins (Tab. 4). Recently, methodological advances in NMR spectroscopy have significantly raised the size limit amenable to NMR investigations, and hence membrane protein structures determined by NMR have also appeared in the literature. The outer membrane protein A from E. coli was reconstituted in DPC micelles and the structure determined by heteronuclear
6 The Use of Cell-Free Protein Expression for NMR Analysis
multidimensional NMR [146]. Similarly, the structure of the outer membrane protein OmpX was elucidated in DHPC micelles [147]. Both proteins were expressed in E. coli. However, in the case of OmpA, a series of selectively 15 N-labeled samples was prepared, whereas in the case of OmpX, spectroscopy was performed on a single triply 2 H,13 C,15 N-labeled sample. The best successes have been achieved when it was possible to isolate proteins from their natural sources instead of using recombi-nantly expressed forms. The results have been particularly poor for GPCRs, and the only example of a mammalian 7-transmembrane protein for which a high-resolution structure has been obtained is bovine rhodopsin [148].
6 The Use of Cell-Free Protein Expression for NMR Analysis
The expression of proteins usually requires optimization by trial and error, since conditions leading to production of high yields of active protein are difficult to predict. Therefore, additional time-consuming steps are often involved that may require changes in the choice of expression system, of reaction conditions, and of tests for the expression of protein sub-domains. In structural genomics programs this problem was rapidly identified as a major bottleneck, since high-throughput production of soluble proteins in milligram amounts is essential for success. Hence, systems that allow rapid, productive expression, are easy to manipulate, and can be run in a parallel format are highly desired in order to rapidly screen for the best conditions. Until recently, cell-free protein expression (also sometimes erroneously named in vitro protein expression) did not exhibit the productivity required for preparation of NMR samples, especially considering the high cost of using isotopically labeled starting material. Rather, it was exclusively used as an analytical tool that served to verify correct cloning or to study promotor sites. Because of the very low yields, detection of the expressed product usually required incorporation of a radioactive label (usually via 35 S-methionine). This situation changed fundamentally when Spirin published his substantial improvements in 1988 that resulted in much higher product yields [149]. He developed a set-up in which the coupled transcription/translation reaction could be continuously supplied with all the essential low-molecular-weight components (nucleotides, amino acids, energy components) while maintaining a constant reaction volume by continuously removing the product through a membrane (continuousflow cell-free system, CFCF). Under these conditions, the system would remain active for more than 1 day, in contrast to the usual upper limit of 2 h which was observed in the conventional batch system. The prolonged expression period indicated that sufficient amounts of all factors necessary for translation were still available in the system, although the membrane clearly allowed the leakage of at least some of them. Soon, these findings were used to run the reaction in a more robust dialysis mode, keeping the reaction volume constant while feeding the system with all necessary low-molecular-weight consumables.
31
32 Expression of Proteins in Isotopically Enriched Form
In the following years the productivity of the method was tremendously improved, so that milligram amounts of protein per milliliter of reaction mixture could be obtained [150–154]. This finally opened the door to cell-free protein expression to be used for the production of isotopically enriched proteins suitable for NMR analysis. However, although widely considered to be a promising method for labeling proteins at specific positions and therefore facilitating the process of chemical shift assignment as well as reducing spectral overlap [155, 156], in fact it was used in very few laboratories worldwide. This was mainly due to the fact that the preparation of the lyzate was tedious and the expression levels of each batch varied from lot to lot. Recently, a system for cell-free protein production has became commercially available (the Rapid Translation System, RTS) [157]. In the following sections this system will be described, and advantages as well as limitations will be discussed. 6.1 The Cell-Free Protein Expression System RTS
The RTS system includes two different technology platforms for cell-free protein expression as well as a number of tools for finding optimal conditions (Scheme 1). All expression systems use the T7-polymerase for transcription and an E. coli lyzate with reduced nuclease and protease activity for translation. The conditions are optimized for a coupled transcription/translation reaction so that the DNA can be directly used as the template. The first platform (RTS 100 HY) is designed as a screening tool. It uses the batch format, so that the reaction time does not exceed 3 h. In particular, in the RTS 100 HY system the exonuclease activity is reduced, so that the direct use of
Scheme 1 From gene to protein structure via cell-free protein expression.
6 The Use of Cell-Free Protein Expression for NMR Analysis
Fig. 7 Device of the RTS 500 format using the Continuous Exchange Cell-Free principle.
PCR-generated DNA templates is possible. To facilitate the generation of the PCR templates there is a special product available (linear template kit), which introduces all regulatory elements [T7-promotor, gene10 enhancer sequence and the Shine-Dalgarno (RBS) sequence]. Consequently, RTS 100 HY can be used for the rapid evaluation of the best template, without spending time with cloning, and for optimization of the reaction conditions (e.g. temperature, choice of additives like detergents, chaperones etc.). In addition, a bioinformatic tool (the program ProteoExpert) facilitates the process of designing the optimal template by analyzing and improving the secondary structure of the corresponding mRNA (without changing the amino acid sequence of the protein). The second platform (RTS 500 HY) is designed for the production of proteins on a preparative scale. It is based on the CECF principle and utilizes a device with two chambers (Fig. 7). This design, with a proper choice of reaction conditions, gives a reaction time of 24 h, yielding up to 6 mg of protein per ml. A scaled-up version is also available (RTS 9000 HY), providing up to 50 mg protein per run. In all formats, the amino acids are supplied separately, so they can be conveniently exchanged for labeled ones. 6.2 From PCR Product to 15 N-Labeled Protein
As an example of how cell-free protein expression can be used to rapidly generate a protein sample suitable for NMR analysis, an SH3-domain (8 kDa) was expressed using the RTS system [158]. Since it was initially found that the yield using a template carrying the wild-type sequence was too low, the sequence of the template was analyzed using the ProteoExpert program. The suggested sequences were subsequently evaluated by running expressions in the batch mode (RTS 100) using PCR-generated templates (linear template kit). As a result, all ten test sequences showed significantly higher yields than those with the original wild-type RNA template (data not shown). One of these was selected and ligated into the TOPO Cloning vector (Invitrogen Corp.) for expression in RTS 500 HY. Comparison with expression levels obtained with the wild-type template showed that also by
33
34 Expression of Proteins in Isotopically Enriched Form
Fig. 8. Western blot analysis (via His6 -tag) of the expression of the SH3 domain using the wild-type DNA-sequence (lanes 1 and 2) or the optimized DNA-sequence (lanes 3 and 4). On lanes 1 and 3 0.25 :L DNA were loaded on the gel, while on lanes 2 and 4 0.125 :L were applied.
using a circular template the yield could be improved more than fivefold (Fig. 8), resulting in approximately 3 mg product/mL reaction mix. In the next step, uniformly 15 N-labeled protein was produced by using a mixture of 15 N-labeled amino acids. Expression levels of the 15 N-labeled reaction were identical to the first reaction performed with unlabeled amino acids. The product was purified to homogeneity, and a [15 N,1 H]-HSQC spectrum was obtained (Fig. 9) to confirm the overall integrity of the protein fold. For the residue-type assignment of the cross peaks, the SH3 domain was expressed with specifically labeled glycine or arginine residues by using amino acid mixtures where only Gly and Arg were 15 N-labeled. Again, yields for the labeled proteins were identical to those of the unlabeled product. Analysis of the corresponding [15 N,1 H]-HSQC spectra (Fig. 9) verified that the numbers of cross peaks for the 15 N-Gly as well as for the 15 N-Arg labeled protein were identical to the predicted ones (10 for the 10 Gly and 6 for the 3 Arg, respectively). Importantly, no scrambling of the 15 N-labels could be detected. All cross peaks were contained in the [15 N,1 H]-HSQC spectrum of the uniformly labeled protein (Fig. 9) and no additional signals occurred. Therefore, by simply overlaying the HSQC spectra, the cross peaks belonging to the glycine and the arginine residues could be readily assigned.
Fig. 9 [15 N,1 H]-HSQC spectrum of 15 N-uniformly labeled SH3 domain (A) and of a sample selectively labeled with only 15 N-Gly (B) or 15 N-Arg (C).
References
6.3 Conclusion
This example demonstrates how cell-free protein expression can be used to rapidly optimize the reaction conditions as well as to shorten the time for spectral assignments by producing individually labeled proteins. To discuss in general the applicabilities of cell-free protein expression technology [159], two features are most valuable. First, only the protein of interest is produced, because the highly active T7-polymerase is used for translation. Consequently, labeled amino acids are almost exclusively incorporated into the newly produced protein. Since the labeled amino acids can be supplied as a mixture or added individually, the time for assigning the cross peaks to the particular amino acid(s) can be significantly reduced. This approach will enable partial assignments to be made in molecules that are far too large to allow spectral assignments from uniformly labeled protein by classical methods. The second important feature is that cell-free protein expression can be considered as an “open system”, meaning that no lipid membrane barriers are present. Consequently, chemicals, proteins (e.g. chaperones) as well as PCR-generated templates can be added directly to the reaction solution. Even major changes of the reaction conditions are possible (e.g. using a redox system to produce active proteins containing correctly formed disulfide bonds [160]). These features, as well as the ease of sample handling, dramatically reduce the time for optimizing the expression conditions (in fact, pipetting robots can be designed to run the reactions). Moreover, proteins can be synthesized which display cell toxicity and which therefore can hardly be expressed in classical systems. However, certain limitations do exist that need to be considered. Although enzymes necessary for post-translational modifications can be added, in principle there is currently no productive system available for the preparation of glycosylated proteins, although some interesting results have already been obtained [161]. Also, the expression of functional membrane proteins in quantities necessary for structural analysis will be a challenging task for the future. Nonetheless, the speed and flexibility of this emerging technology could provide the key to meeting the demands of high-throughput structure determinations.
References 1 BAX, A.; GRZESIEK, S. Acc. Chem. Res. 1993, 26, 131–138. 2 ZIMMERMAN, D. E.; KULIKOWSKI, C. A.; HUANG, Y. P.; FENG, W. Q.; TASHIRO, M.; SHIMOTAKAHARA, S.; CHIEN, C. Y.; POWERS, R.; MONTELIONE, G. T. J. Mol. Biol. 1997, 269, 592–610. 3 YAMAZAKI, T.; LEE, W.; ARROWSMITH, C. H.; MUHANDIRAM, D. R.; KAY, L. E. J. Am. Chem. Soc. 1994, 116, 11655– 11666.
4 LEMASTER, D. M.; RICHARDS, F. M. Biochemistry 1988, 27, 142–150. 5 PERVUSHIN, K.; RIEK, R.; WIDER, G.; W¨UTHRICH, K. Proc. Natl. Acad. Sci. USA 1997, 94, 12366–12371. 6 SALZMANN, M.; PERVUSHIN, K.; WIDER, G.; SENN, H.; W¨UTHRICH, K. J. Am. Chem. Soc. 2000, 122, 7543–7548. 7 NERI, D.; SZYPERSKI, T.; OTTING, G.; SENN, H.; W¨UTHRICH, K. Biochemistry 1989, 28, 7510–7516.
35
36 Expression of Proteins in Isotopically Enriched Form
8 MUCHMORE, D. C.; MCINTOSH, L. P.; RUSSEL, C. B.; ANDERSON, D. E.; DAHLQUIST, F. W. In Methods Enzymol. OPPENHEIMER, N. J., JAMES, T. L., Eds.; Academic Press: San Diego, 1989; Vol. 177, pp 44–73. 9 PATZELT, H.; SIMON, B.; FERLAAK, A.; KESSLER, B.; K¨UHNE, R.; SCHMIEDER, P.; OESTERHELT, D.; OSCHKINAT, H. Proc. Natl. Acad. Sci. USA 2002, 99, 9765–9770. 10 EVANS, T. C., Jr.; BENNER, J.; XU, M. Q. Protein Sci. 1998, 7, 2256–2264. 11 PARDI, A. Curr. Opin. Struct. Biol. 1992, 2, 832–835. 12 MARKLEY, J. L.; KAINOSHO, M. In NMR of Macromolecules: A Practical Approach; ROBERTS, G. C. K., Ed.; Oxford University Press: Oxford, 1993; pp 101–152. 13 LEMASTER, D. M. Progr. NMR Spectrosc. 1994, 26, 371–419. 14 GOTO, N. K.; KAY, L. E. Curr. Opin. Struct. Biol. 2000, 10, 585–592. 15 BOLTON, P. H. Progr. NMR Spectrosc. 1990, 22, 423–452. 16 KAINOSHO, M. Nature Struct. Biol. 1997, 4 Suppl., 858–861. 17 LIAN, L.-Y.; MIDDLETON, D. A. Prog. NMR Spectrosc. 2001, 39, 171–190. 18 MAKRIDES, S. C. Microbiol. Rev. 1996, 60, 512–538. 19 CRESPI, H. L.; KATZ, J. J. Methods Enzymol. 1972, 26, 627–637. 20 BRODIN, P.; DRAKENBERG, T.; THULIN, E.; FORSEN, S.; GRUNDSTROEM, T. Protein Eng. 1989, 2, 353–357. 21 SORENSEN, P.; POULSEN, F. M. J. Biomol. NMR 1992, 2, 99–101. 22 PATZELT, H.; KESSLER, B.; OSCHKINAT, H.; OESTERHELT, D. Phytochemistry 1999, 50, 215–217. 23 PATZELT, H.; ULRICH, A. S.; EGBRINGHOFF, H.; DUX, P.; ASHURST, J.; SIMON, B.; OSCHKINAT, H.; OESTERHELT, D. J. Biomol. NMR 1997, 10, 95–106. 24 TAKEDA, H. Phytochemistry 1996, 42, 673–675. 25 SCHWEIMER, K.; MARG, B.-L.; OESTERHELT, D.; R¨OSCH, P.; STICHT, H. J. Biomol. NMR 2000, 16, 347–348. 26 REAT, V.; PATZELT, H.; FERRAND, M.; PFISTER, C.; OESTERHELT, D.; ZACCAI, G.
27
28 29
30 31
32 33 34 35
36 37
38 39 40 41
42 43
44
45
Proc. Natl. Acad. Sci. U. S. A. 1998, 95, 4970–4975. CELESTE, S. G.; JAY, G. D.; BRENDA, G. S.; ENGIN, S. H. Arch. Biochem. Biophys. 1995, 319, 203–210. GLASOE, P. K.; LONG, F. A. J. Phys. Chem. 1960, 64, 188–190. NIETLISPACH, D.; CLOWES, R. T.; BROADHURST, R. W.; ITO, Y.; KEELER, J.; KELLY, M.; ASHURST, J.; OSCHKINAT, H.; DOMAILLE, P. J.; LAUE, E. D. J. Am. Chem. Soc. 1996, 118, 407–415. For more information see: http:// www.genome.ad.jp/kegg/ kegg2.html VENTERS, R. A.; METZLER, W. J.; SPICER, L. D.; MUELLER, L.; FARMER, B. T. J. Am. Chem. Soc. 1995, 117, 9592–9593. GARDNER, K. H.; ROSEN, M. K.; KAY, L. E. Biochemistry 1997, 36, 1389–1401. CRESPI, H. L.; ROSENBERG, R. M.; KATZ, J. J. Science 1968, 161, 795–796. CRESPI, H. L.; KATZ, J. J. Nature 1969, 224, 560–562. ROSEN, M. K.; GARDNER, K. H.; WILLIS, R. C.; PARRIS, W. E.; PAWSON, T.; KAY, L. E. J. Mol. Biol. 1996, 263, 627–636. JANIN, J.; MILLER, S.; CHOTHIA, C. J. Mol. Biol. 1988, 204, 155–164. KAY, L. E.; BULL, T. E.; NICHOLSON, L. K.; GRIESINGER, C.; SCHWALBE, H.; BAX, A.; TORCHIA, D. A. J. Magn. Reson. 1992, 100, 538–558. GOTTSCHALK, G. Bacterial Metabolism; 2 ed.; Springer Verlag: New York, 1986. LEE, A. L.; URBAUER, J. L.; WAND, A. J. J. Biomol. NMR 1997, 9, 437–440. ISHIMA, R.; LOUIS, J. M.; TORCHIA, D. A. J. Biomol. NMR 2001, 21, 167–171. MULDER, F. A.; HON, B.; MITTERMAIER, A.; DAHLQUIST, F. W.; KAY, L. E. J. Am. Chem. Soc. 2002, 124, 1443–1451. GARDNER, K. H.; KAY, L. E. J. Am. Chem. Soc. 1997, 119, 7599–7600. GOTO, N. K.; GARDNER, K. H.; MUELLER, G. A.; WILLIS, R. C.; KAY, L. E. J. Biomol. NMR 1999, 13, 369–374. KELLY, G.; PRASANNAN, S.; DANIELL, S.; FLEMING, K.; FRANKEL, G.; DOUGAN, G.; CONNERTON, I.; MATTHEWS, S. Nat. Struct. Biol. 1999, 6, 313–318. MUELLER, G. A.; CHOY, W. Y.; YANG, D.; FORMAN-KAY, J. D.; VENTERS, R. A.; KAY, L. E. J. Mol. Biol. 2000, 300, 197–212.
References 46 BERARDI, M. J.; SUN, C. H.; ZEHR, M.; ABILDGAARD, F.; PENG, J.; SPECK, N. A.; BUSHWELLER, J. H. Struct. Fold. Des. 1999, 7, 1247–1256. 47 LIU, X.; WANG, H.; EBERSTADT, M.; SCHNUCHEL, A.; OLEJNICZAK, E. T.; MEADOWS, R. P.; SCHKERYANTZ, J. M.; JANOWICK, D. A.; HARLAN, J. E.; HARRIS, E. A.; STAUNTON, D. E.; FESIK, S. W. Cell 1998, 95, 269–277. 48 AGHAZADEH, B.; ZHU, K.; KUBISESKI, T. J.; LIU, G. A.; PAWSON, T.; ZHENG, Y.; ROSEN, M. K. Nat. Struct. Biol. 1998, 5, 1098–1107. 49 DEROSE, E. F.; LI, D. W.; DARDEN, T.; HARVEY, S.; PERRINO, F. W.; SCHAAPER, R. M.; LONDON, R. E. Biochemistry 2002, 41, 94–110. 50 VUISTER, G. W.; KIM, S. J.; WU, C.; BAX, A. J. Am. Chem. Soc. 1994, 116, 9206–9210. 51 MEDEK, A.; OLEJNICZAK, E. T.; MEADOWS, R. P.; FESIK, S. W. J. Biomol. NMR 2000, 18, 229–238. 52 WANG, H.; JANOWICK, D. A.; SCHKERYANTZ, J. M.; LIU, X. H.; FESIK, S. W. J. Am. Chem. Soc. 1999, 121, 1611–1612. 53 COUGHLIN, P. E.; ANDERSON, F. E.; OLIVER, E. J.; BROWN, J. M.; HOMANS, S. W.; POLLAK, S.; LUSTBADER, J. W. J. Am. Chem. Soc. 1999, 121, 11871–11874. 54 GIESEN, A. W.; BAE, L. C.; BARRETT, C. L.; CHYBA, J. A.; CHAYKOVSKY, M. M.; CHENG, M. C.; MURRAY, J. H.; OLIVER, E. J.; SULLIVAN, S. M.; BROWN, J. M.; DAHLQUIST, F. W.; HOMANS, S. W. J. Biomol. NMR 2001, 19, 255–260. 55 GIESEN, A. W.; BAE, L. C.; BARRETT, C. L.; CHYBA, J. A.; CHAYKOVSKY, M. M.; CHENG, M. C.; MURRAY, J. H.; OLIVER, E. J.; SULLIVAN, S. M.; BROWN, J. M.; HOMANS, S. W. J. Biomol. NMR 2002, 22, 21–26. 56 HAJDUK, P. J.; AUGERI, D. J.; MACK, J.; MENDOZA, R.; YANG, J. G.; BETZ, S. F.; FESIK, S. W. J. Am. Chem. Soc. 2000, 122, 7898–7904. 57 HIRATA, R.; OHSUMK, Y.; NAKANO, A.; KAWASAKI, H.; SUZUKI, K.; ANRAKU, Y. J. Biol. Chem. 1990, 265, 6726–6733. 58 KANE, P. M.; YAMASHIRO, C. T.; WOLCZYK, D. F.; NEFF, N.; GOEBL, M.; STEVENS, T. H. Science 1990, 250, 651–657.
59 PERLER, F. B. Nucleic Acids Res 2002, 30, 383–384. 60 CHONG, S.; MONTELLO, G. E.; ZHANG, A.; CANTOR, E. J.; LIAO, W.; XU, M. Q.; BENNER, J. Nucleic Acids Res 1998, 26, 5109–5115. 61 DAWSON, P. E.; KENT, S. B. Annu. Rev. Biochem. 2000, 69, 923–960. 62 XU, R.; AYERS, B.; COWBURN, D.; MUIR, T. W. Proc. Natl. Acad. Sci. USA 1999, 96, 388–393. 63 COTTON, G. J.; AYERS, B.; XU, R.; MUIR, T. W. J. Am. Chem. Soc. 1999, 121, 1100–1101. 64 LOW, D. W.; HILL, M. G.; CARRASCO, M. R.; KENT, S. B.; BOTTI, P. Proc. Natl. Acad. Sci. USA 2001, 98, 6554–6559. 65 WELKER, E.; SCHERAGA, H. A. Biochem. Biophys. Res. Commun. 1999, 254, 147–151. 66 CHONG, S.; MERSHA, F. B.; COMB, D. G.; SCOTT, M. E.; LANDRY, D.; VENCE, L. M.; PERLER, F. B.; BENNER, J.; KUCERA, R. B.; HIRVONEN, C. A.; PELLETIER, J. J.; PAULUS, H.; XU, M. Q. Gene 1997, 192, 271–281. 67 MILLS, K. V.; LEW, B. M.; JIANG, S.; PAULUS, H. Proc. Natl. Acad. Sci. USA 1998, 95, 3543–3548. 68 WU, H.; XU, M. Q.; LIU, X. Q. Biochim. Biophys. Acta 1998, 1387, 422–432. 69 SOUTHWORTH, M. W.; ADAM, E.; PANNE, D.; BYER, R.; KAUTZ, R.; PERLER, F. B. EMBO J. 1998, 17, 918–926. 70 YAMAZAKI, T.; OTOMO, T.; ODA, N.; KYOGOKU, Y.; UEGAKI, K.; ITO, N.; ISHINO, Y.; NAKAMURA, H. J. Am. Chem. Soc. 1998, 120, 5591–5592. 71 OTOMO, T.; TERUYA, K.; UEGAKI, K.; YAMAZAKI, T.; KYOGOKU, Y. J. Biomol. NMR 1999, 14, 105–114. 72 PAULUS, H. Annu. Rev. Biochem. 2000, 69, 447–496. 73 EVANS, T. C., Jr.; MARTIN, D.; KOLLY, R.; PANNE, D.; SUN, L.; GHOSH, I.; CHEN, L.; BENNER, J.; LIU, X. Q.; XU, M. Q. J. Biol. Chem. 2000, 275, 9091–9094. 74 IWAI, H.; PLU¨ CKTHUN, A. FEBS Lett. 1999, 459, 166–172. 75 SCOTT, C. P.; ABEL-SANTOS, E.; WALL, M.; WAHNON, D. C.; BENKOVIC, S. J. Proc. Natl. Acad. Sc. USA 1999, 96, 13638–13643.
37
38 Expression of Proteins in Isotopically Enriched Form
76 CAMARERO, J. A.; MUIR, T. W. J. Am. Chem. Soc. 1999, 121, 5597–5598. 77 FLORY, P. J. J. Am. Chem. Soc. 1956, 78, 5222–5235. 78 TRABI, M.; CRAIK, D. J. Trends Biochem. Sci. 2002, 27, 132–138. 79 EVANS, T. C., Jr.; BENNER, J.; XU, M. Q. J. Biol. Chem. 1999, 274, 3923–3926. 80 IWAI, H.; LINGEL, A.; PLU¨ CKTHUN, A. J. Biol. Chem. 2001, 276, 16548–16554. 81 WILLIAMS, N. K.; PROSSELKOV, P.; LIEPINSH, E.; LINE, I.; SHAPIRO, A.; LITTLER, D. R.; CURMI, P. M.; OTTING, G.; DIXON, N. E. J. Biol. Chem. 2002, 277, 7790–7798. 82 SIEBOLD, C.; FLUKIGER, K.; BEUTLER, R.; ERNI, B. FEBS Lett. 2001, 504, 104–111. 83 GOLDENBERG, D. P.; CREIGHTON, T. E. J. Mol. Biol. 1983, 165, 407–413. 84 CAMARERO, J. A.; FUSHMAN, D.; SATO, S.; GIRIAT, I.; COWBURN, D.; RALEIGH, D. P.; MUIR, T. W. J. Mol. Biol. 2001, 308, 1045–1062. 85 WOESE, C. R.; FOX, G. E. Proc. Natl. Acad. Sci. USA 1977, 74, 5088–5090. 86 MURATA, K.; MITSUOKA, K.; HIRAI, T.; WALZ, T.; AGRE, P.; HEYMANN, J. B.; ENGEL, A.; FUJIYOSHI, Y. Nature 2000, 407, 599–605. 87 LUECKE, H.; SCHOBERT, B.; RICHTER, H. T.; CARTAILLER, J. P.; LANYI, J. K. J. Mol. Biol. 1999, 291, 899–911. 88 TURNER, G. J.; REUSCH, R.; WINTER-VANN, A. M.; MARTINEZ, L.; BETLACH, M. C. Protein Expr. Purif. 1999, 17, 312–323. 89 WINTER-VANN, A. M.; MARTINEZ, L.; BARTUS, C.; LEVAY, A.; TURNER, G. J. In Perspectives on Solid State NMR in Biology; K¨UHNE, S. R., de GROOT, H. J. M., Eds.; Kluwer: Netherlands, 2001; pp 141–159. 90 HINNEN, A.; BUXTON, F.; CHAUDHURI, B.; HEIM, J.; HOTTIGER, T.; MEYHACK, B.; POHLIG, G. In Gene expression in recombinant Microorganisms; SMITH, A., Ed.; Marcel Dekker: New York, 1994; pp 121–193. 91 VALENZUELA, P.; MEDINA, A.; RUTTER, W. J.; AMMERER, G.; HALL, B. D. Nature 1982, 298, 347–350. 92 CABEZON, T.; De WILDE, M.; HERION, P.; LORIAU, R.; BOLLEN, A. Proc. Natl. Acad. Sci. USA 1984, 81, 6594–6598.
93 EMINI, E. A.; ELLIS, R. W.; MILLER, W. J.; MCALEER, W. J.; SCOLNICK, E. M.; GERETY, R. J. J. Infect. 1986, 13 Suppl. A, 3–9. 94 SCHULTZ, L. D.; TANNER, J.; HOFMANN, K. J.; EMINI, E. A.; CONDRA, J. H.; JONES, R. E.; KIEFF, E.; ELLIS, R. W. Gene 1987, 54, 113–123. 95 HALLEWELL, R. A.; MILLS, R.; TEKAMP-OLSON, P.; BLACHER, R.; ROSENBERG, S.; OTTING, F.; MASIARZ, F. R.; SCANDELLA, C. J. Biotechnol. 1987, 5, 363–366. 96 DAVID, N. E.; GEE, M.; ANDERSEN, B.; NAIDER, F.; THORNER, J.; STEVENS, R. C. J. Biol. Chem. 1997, 272, 15553– 15561. 97 ANDERSEN, B.; STEVENS, R. C. Protein Expr. Purif. 1998, 13, 111–119. 98 GIGA-HAMA, Y.; KUMAGAI, H. Biotechnol. Appl. Biochem. 1999, 30, 235–244. 99 GRALLERT, B.; NURSE, P.; PATTERSON, T. E. Mol. Gen. Genet. 1993, 238, 26–32. 100 MAUNDRELL, K.; HUTCHISON, A.; SHALL, S. EMBO J. 1988, 7, 2203–2209. 101 TOYAMA, R.; OKAYAMA, H. FEBS Lett. 1990, 268, 217–221. 102 XU, H. P.; WHITE, M.; MARCUS, S.; WIGLER, M. Mol. Cell Biol. 1994, 14, 50–58. 103 HILDEBRANDT, V. In Foreign Gene Expression in Fission Yeast Schizosaccharomyces pombe; GIGA-HAMA, Y., KUMAGAI, H., Eds.; Springer Verlag: Berlin, 1997; pp 97–110. 104 GIGA-HAMA, Y.; TOHDA, H.; OKADA, H.; OWADA, M. K.; OKAYAMA, H.; KUMAGAI, H. Biotechnology 1994, 12, 400–404. 105 YASUMORI, T. In Foreign Gene expression in Fission Yeast Schizosaccharomyces pombe; GIGA-HAMA, Y., KUMAGAI, H., Eds.; Springer Verlag: Berlin, 1997; pp 111–121. 106 SANDER, P.; GRUNEWALD, S.; REILANDER, H.; MICHEL, H. FEBS Lett. 1994, 344, 41–46. 107 KJELDSEN, T.; PETTERSSON, A. F.; HACH, M. Biotechnol. Appl. Biochem. 1999, 29, 79–86. 108 CEREGHINO, J. L.; CREGG, J. M. FEMS Microbiol. Rev. 2000, 24, 45–66. 109 WEISS, H. M.; HAASE, W.; MICHEL, H.; REILANDER, H. Biochem. J. 1998, 330, 1137–1147.
References 110 LUQUE, T.; O’REILLY, D. R. Mol. Biotechnol. 1999, 13, 153–163. 111 POSSEE, R. D. Curr. Opin. Biotechnol. 1997, 8, 569–572. 112 OGONAH, O. W.; FREEDMAN, R. B.; JENKINS, N.; PATEL, K.; ROONEY, B. C. Nat. Biotechnol. 1996, 14, 197–202. 113 CONDREAY, J. P.; WITHERSPOON, S. M.; CLAY, W. C.; KOST, T. A. Proc. Natl. Acad. Sci. USA 1999, 96, 127–132. 114 PAYNE, L. A.; FORNWALD, J. A.; KANE, J. F.; MCNULTY, D. E.; TRILL, J. J.; RAMOS, L. In Animal Cell Technology: From Target to Market; LINDNER-OLSSON, E., CHATISSAVIDOU, N., LILLAU, E., Eds.; Kluwer: Netherlands, 2001; pp 94–100. 115 MAZINA, K. E.; STRADER, C. D.; FONG, T. M. J. Recept. Res. 1994, 14, 63–73. 116 GUAN, X. M.; KOBILKA, T. S.; KOBILKA, B. K. J. Biol. Chem. 1992, 267, 21995–21998. 117 GU, H.; WALL, S. C.; RUDNICK, G. J. Biol. Chem. 1994, 269, 7124–7130. 118 GIRARD, P.; DEROUAZI, M.; BAUMGARTNER, G.; BOURGEOIS, M.; JORDAN, M.; WURM, F. M. In Animal Cell Technology: From Target to Market; LINDNER-OLSSON, E., CHATZISSAVIDOU, N., LILLAU, E., Eds.; Kluwer Academic Publishers: Netherlands, 2001; pp 37–42. 119 FUX, C.; MOSER, S.; SCHLATTER, S.; RIMANN, M.; BAILEY, J. E.; FUSSENEGGER, M. Nucleic Acids Res. 2001, 29, E19. 120 BOORSMA, M.; NIEBA, L.; KOLLER, D.; BACHMANN, M. F.; BAILEY, J. E.; RENNER, W. A. Nat. Biotechnol. 2000, 18, 429– 432. 121 TATE, C. G. FEBS Lett. 2001, 504, 94–98. 122 GRAHAM, F. L. Immunol. Today 2000, 21, 426–428. 123 PANICALI, D.; PAOLETTI, E. Proc. Natl. Acad. Sci. USA 1982, 79, 4927–4931. 124 WARD, G. A.; STOVER, C. K.; MOSS, B.; FUERST, T. R. Proc. Natl. Acad. Sci. USA 1995, 92, 6773–6777. 125 LILJESTROM, P.; GAROFF, H. Biotechnol. 1991, 9, 1356–1361. 126 XIONG, C.; LEVIS, R.; SHEN, P.; SCHLESINGER, S.; RICE, C. M.; HUANG, H. V. Science 1989, 243, 1188–1191. 127 LUNDSTROM, K. J. Recept. Signal. Transduct. Res. 1999, 19, 673–686. 128 LUNDSTROM, K. In Perspectives on Solid-State NMR in Biology; K¨UHNE, S. R.,
129
130
131
132
133 134 135 136
137 138
139 140
141
142
143
144 145
de GROOT, J. J. M., Eds.; Kluwer: The Netherlands, 2001; pp 131–139. HOVIUS, R.; TAIRI, A.-P.; BLASEY, H.; BERNARD, A. R.; LUNDSTROM, K.; VOGEL, H. J. Neurochem. 1998, 70, 824–834. LUNDSTROM, K.; SCHWEITZER, C.; ROTMANN, D.; HERMANN, D.; SCHNEIDER, E. M.; EHRENGRUBER, M. U. FEBS Lett. 2001, 504, 99–103. LUNDSTROM, K.; ROTMANN, D.; HERRMANN, D.; SCHLAEGER, E.-J. Cytotechnology 2001, 35, 213–221. FERNANDEZ, C.; HILTY, C.; BONJOUR, S.; ADEISHVILI, K.; PERVUSHIN, K.; W¨UTHRICH, K. FEBS Lett. 2001, 504, 173–178. BELLIZZI, J. J.; WIDOM, J.; KEMP, C. W.; CLARDY, J. Structure 1999, 7, R263–267. BUSHNELL, D. A.; CRAMER, P.; KORNBERG, R. D. Structure 2001, 9, R11–R14. WOOD, M. J.; KOMIVES, E. A. J. Biomol. NMR 1999, 13, 149–159. MORGAN, W. D.; BIRDSALL, B.; FRENKIEL, T. A.; GRADWELL, M. G.; BURGHAUS, P. A.; SYED, S. E. H.; UTHAIPIBULL, C.; HOLDER, A. A.; FEENEY, J. J. Molec. Biol. 1999, 289, 113–122. PALCZEWSKA, M.; GROVES, P.; KUZNICKI, J. Protein Expr. Purif. 1999, 17, 465–476. LAROCHE, Y.; STORME, V.; DEMEUTTER, J.; MESSENS, J.; LAUWEREYS, M. Bio-Technology 1994, 12, 1119–1124. MORGAN, W. D.; KRAGT, A.; FEENEY, J. J. Biomol. NMR 2000, 17, 337–347. HELMLE, M.; PATZELT, H.; OCKENFELS, A.; GARTNER, W.; OESTERHELT, D.; BECHINGER, B. Biochemistry 2000, 39, 10066–10071. WYSS, D. F.; WITHKA, J. M.; KNOPPERS, M. H.; STERNE, K. A.; RECNY, M. A.; WAGNER, G. Biochemistry 1993, 32, 10995–11006. HANSEN, A. P.; PETROS, A. M.; MAZAR, A. P.; PEDERSON, T. M.; RUETER, A.; FESIK, S. W. Biochemistry 1992, 31, 12713–12718. LUSTBADER, J. W.; BIRKEN, S.; POLLAK, S.; POUND, A.; CHAIT, B. T.; MIRZA, U. A.; RAMNARAIN, S.; CANFIELD, R. E.; BROWN, J. M. J. Biomol. NMR 1996, 7, 295–304. See http://www.mepnet.org for further information BAILEY, W. J.; VANTI, W. B.; GEORGE, S. R.; LEVINS, R.; SWARMINATHAN, S.; BONINI, J. A.; SMITH, K. E.; WEINSHANK,
39
40 Expression of Proteins in Isotopically Enriched Form
146
147
148
149
150
151
152
153 154 155 156
157 158
159 160
161 162
R. L.; O’DOWD, B. F. Expert Opin. Ther. Patents 2001, 11, 1861–1887. ARORA, A.; ABILDGAARD, F.; BUSHWELLER, J. H.; TAMM, L. K. Nat. Struct. Biol. 2001, 8, 334–338. FERNANDEZ, C.; ADEISHVILI, K.; W¨UTHRICH, K. Proc. Natl. Acad. Sci. USA 2001, 98, 2358–2363. PALCZEWSKI, K.; KUMASAKA, T.; HORI, T.; BEHNKE, C. A.; MOTOSHIMA, H.; FOX, B. A.; LETRONG, I.; TELLER, D. C.; OKADA, T.; STENKAMP, R. E.; YAMAMOTO, M.; MIYANO, M. Science 2000, 289, 739–745. SPIRIN, A. S.; BARANOV, V. I.; RYABOVA, L. A.; OVODOV, S. Y.; ALAKHOV, Y. B. Science 1988, 242, 1162–1164. KIM, D. M.; KIGAWA, T.; CHOI, C. Y.; YOKOYAMA, S. Eur. J. Biochem. 1996, 239, 881–886. KIGAWA, T.; YABUKI, T.; YOSHIDA, Y.; TSUTSUI, M.; ITO, Y.; SHIBATA, T.; YOKOYAMA, S. FEBS Lett. 1999, 442, 15–19. MADIN, K.; SAWASAKI, T.; OGASAWARA, T.; ENDO, Y. Proc. Natl. Acad. Sci. USA 2000, 97, 559–564. KIM, D. M.; SWARTZ, J. R. Biotechnol. Prog. 2000, 16, 385–390. KIM, D. M.; SWARTZ, J. R. Biotechnol. Bioeng. 2001, 74, 309–316. KIGAWA, T.; MUTO, Y.; YOKOYAMA, S. J. Biomol. NMR 1995, 6, 129–134. YABUKI, T.; KIGAWA, T.; DOHMAE, N.; TAKIO, K.; TERADA, T.; ITO, Y.; LAUE, E. D.; COOPER, J. A.; KAINOSHO, M.; YOKOYAMA, S. J. Biomol. NMR 1998, 11, 295–306. See www.proteinexpression.com for more information. FERNHOLZ, E.; BESIR, H.; K¨UHLEWEIN, A.; MAYR, D.; SCHMITT, R.; SCHWAIGER, M. unpublished results 2002. SPIRIN, A. S. Cell-Free-Translation Systems; Springer Verlag: Berlin, 2002. FERNHOLZ, E.; ZAISS, K.; BESIR, H.; MUTTER, W. In Cell Free Translation Systems; SPIRIN, A. S., Ed.; Springer Verlag: Berlin, 2002; pp 175–179. TARUI, H.; IMANISHI, S.; HARA, T. J. Biosci. Bioeng. 2000, 90, 508–514. LEMASTER, D. M.; KUSHLAN, D. M. J. Am. Chem. Soc. 1996, 118, 9255–9264.
163 ISHIMA, R.; LOUIS, J. M.; TORCHIA, D. A. J. Am. Chem. Soc. 1999, 121, 11589–11590. 164 TOYOSHIMA, C.; NAKASAKO, M.; NOMURA, H.; OGAWA, H. Nature 2000, 405, 647–655. 165 HUNTE, C.; KOEPKE, J.; LANGE, C.; ROSSMANITH, T.; MICHEL, H. Struct. Fold. Des. 2000, 8, 669–684. 166 YOSHIKAWA, S.; SHINZAWA-ITOH, K.; NAKASHIMA, R.; YAONO, R.; YAMASHITA, E.; INOUE, N.; YAO, M.; FEI, M. J.; LIBEU, C. P.; MIZUSHIMA, T.; YAMAGUCHI, H.; TOMIZAKI, T.; TSUKIHARA, T. Science 1998, 280, 1723–1729. 167 STOCK, D.; LESLIE, A. G.; WALKER, J. E. Science 1999, 286, 1700–1705. 168 LANCASTER, C. R.; KROGER, A.; AUER, M.; MICHEL, H. Nature 1999, 402, 377–385. 169 UNGER, V. M.; KUMAR, N. M.; GILULA, N. B.; YEAGER, M. Science 1999, 283, 1176–1180. 170 FU, D.; LIBSON, A.; MIERCKE, L. J.; WEITZMAN, C.; NOLLERT, P.; KRUCINSKI, J.; STROUD, R. M. Science 2000, 290, 481–486. 171 KOLBE, M.; BESIR, H.; ESSEN, L. O.; OESTERHELT, D. Science 2000, 288, 1390–1396. 172 AUER, M.; SCARBOROUGH, G. A.; K¨UHLBRANDT, W. Nature 1998, 392, 840–843. 173 CHANG, G.; SPENCER, R. H.; LEE, A. T.; BARCLAY, M. T.; REES, D. C. Science 1998, 282, 2220–2226. 174 WILLIAMS, K. A. Nature 2000, 403, 112–115. 175 MIYAZAWA, A.; FUJIYOSHI, Y.; STOWELL, M.; UNWIN, N. J. Mol. Biol. 1999, 288, 765–786. 176 LANCASTER, C. R.; MICHEL, H. J. Mol. Biol. 1999, 286, 883–898. 177 ZOUNI, A.; WITT, H. T.; KERN, J.; FROMME, P.; KRAUSS, N.; SAENGER, W.; ORTH, P. Nature 2001, 409, 739–743. 178 MEYER, J. E.; HOFNUNG, M.; SCHULZ, G. E. J. Mol. Biol. 1997, 66, 761–765. 179 DOYLE, D. A.; MORAIS CABRAL, J.; PFUETZNER, R. A.; KUO, A.; GULBIS, J. M.; COHEN, S. L.; CHAIT, B. T.; MACKINNON, R. Science 1998, 280, 69–77.
1
Monocot Expression Systems for Molecular Farming Paul Christou Fraunhofer-Institute for Molecular Biology and Applied Ecology (IME), Schmallenberg, Germany
Eva Stoger RWTH Aachen, Aachen, Germany
Richard M. Twyman University of York, York, United Kingdom
Originally published in: Molecular Farming. Edited by Rainer Fischer and Stefan Schillberg. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30786-9
1 Introduction
Among the many different agricultural expression systems that can be used for the large-scale production of recombinant proteins, field-grown monocot crops, and more specifically cereals, represent one of the most attractive options [1, 2]. The major advantage of cereals is that recombinant proteins can be targeted to accumulate specifically in the seed, e.g. in the embryo, the aleurone layer or the endosperm. The endosperm is particularly suitable because this tissue has evolved for the accretion of storage products, including proteins [3]. Therefore, it contains a well-developed endomembrane system, within which molecular chaperones and disulfide isomerases help to fold proteins correctly, while the desiccated environment in the mature seed protects the stored proteins from degradation [3, 4]. In the best cases, recombinant proteins expressed in seeds have remained stable and active after storage at room temperature for more than two years [1, 2, 4]. The recombinant protein reaches high concentrations in a small volume of tissue, which facilitates extraction and downstream processing. In terms of processing, other advantages of cereal seeds include the lack of phenolic compounds and the relatively simple proteome. Phenolic compounds, such as the alkaloids present in tobacco and the oxalic acid found in alfalfa, can interfere with processing steps, while the presence of many competing proteins can result in the co-purification of non-target Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Monocot Expression Systems for Molecular Farming
endogenous polypeptides with similar properties to the target recombinant protein. A final advantage of seeds is that proteins restricted to seeds do not interfere with the growth of vegetative organs, so even if they could adversely affect the plant, then normal growth is possible. In terms of biosafety seed expression limits the exposure of non-target organisms, such as pollinating insects, microbes in the rhizosphere or herbivores feeding on leaves, to biologically active recombinant proteins. Furthermore, three of the cereal crops considered below are self-pollinating, reducing the likelihood of outcrossing to non-transgenic crops and wild relatives [5]. Four cereal crops have thus far been utilized for the production of recombinant proteins: maize, rice, wheat and barley. It is notable that, despite the attention given to tobacco, oilseed rape and potatoes as major expression systems, the only cultivated crop currently used to produce commercial recombinant proteins is maize [6, 7]. In this article, we discuss the relative merits of the four cereal expression systems, consider some of the technical issues surrounding recombinant protein expression in cereal grains, and finally discuss examples of recombinant proteins that have been produced in cereals. Several factors need to be weighed up when choosing the most appropriate cereal expression host for any given protein. These include geographical considerations for crop growth and harvest, the ease of transformation and regeneration, the annual yield of seed per hectare, the yield of recombinant protein per kilogram of seed, the producer price of the crop, the percentage of the seed that is made up of protein and, inevitably, intellectual property issues. Together, these determine the overall cost of production.
2 Cereal Production Crops
All four of the cereal production crops have advantages for molecular farming, but maize was the first to be developed into a platform expression system thanks largely to the efforts of scientists at Prodigene Inc., College Station, TX. Maize was chosen over the other cereals because it has the largest annual grain yield (approximately 8300 kg ha−1 ) but also a relatively high seed protein content (10%), and these properties together offer the highest potential recombinant protein yields per hectare. Maize is also relatively easy to transform and manipulate in the laboratory, while its short generation interval (normally about 17 weeks) facilitates rapid production scale-up in the field. Maize is the most widely cultivated crop in North America, so the complex infrastructure for growing, harvesting, processing, storing and transporting large volumes of corn seed is already in place. Prodigene currently has two commercial products that are expressed in maize [6, 7] and is developing further lines for the production of a range of pharmaceutical and technical proteins, including recombinant antibodies, vaccine candidates and enzymes [8–10]. Rice has many advantages in common with maize including the high grain yields, the ease of transformation and manipulation in the laboratory, and the capacity for rapid scale up [11]. At 6600 kg ha−1 , the annual grain yield of rice is slightly lower than that of maize, and the seed also has a slightly lower pro-
3 Technical Aspects of Molecular Farming in Cereals
tein content (8%). Importantly, however, rice has emerged as the model cereal species, and is one of the few terrestrial plants to benefit from a completed genome sequence [12, 13]. Therefore, many useful expression cassettes have been developed based on endogenous rice genes, including constitutive actin promoters, glutelin promoters for seed-specific expression and the α-amylase inducible promoter system which is often employed in rice suspension cell cultures [14]. The major disadvantage of rice compared to maize is that the producer price is significantly higher, so it may be excluded as a mainstream expression host in the West simply on economic grounds. However, the story may be different in Asia and Africa, where rice is traditionally grown. Rice, along with barley (see below) has been adopted as a platform production technology by Ventria Bioscience, Sacramento, CA. Wheat has been used only rarely for molecular farming, probably because the technology for gene transfer and regeneration is not so well advanced as in other cereals [4]. The major advantage of wheat for the commercial production of recombinant proteins is its very low producer price compared to maize and rice. The wheat grain also has a higher protein content than these other cereals (>12%). Unfortunately, wheat also has a much lower grain yield than maize and rice (2800 kg ha−1 yr−1 ) which means that more land is required to produce the same amount of protein. While some recombinant proteins, including pea legumin, have been produced at high levels in wheat endosperm, only low yields have been achieved with antibodies, although this situation may change in the future if better expression cassettes can be developed. Barley has much in common with wheat as a host for molecular farming. It has a low producer price, a high seed protein content (about 13%) and the technology for gene transfer and regeneration is not widely disseminated. Unlike wheat, however, there have been some very encouraging reports of high recombinant protein yields in transgenic barley, including a diagnostic antibody that accumulated to 150 µg g−1 seed weight [15] and a recombinant cellulase that accumulated to 1.5% total seed protein [16]. Further recombinant proteins will need to be expressed in barley before its performance as an expression platform can be judged against maize and rice. 3 Technical Aspects of Molecular Farming in Cereals 3.1 Cereal Transformation
Until the 1990s, cereals were thought to be outside the host range of Agrobacterium tumefaciens, so researchers concentrated on alternative transformation methods. Initially, attempts were made to regenerate cereal plants from transformed protoplasts following PEG-mediated DNA transfer or electroporation. After successful experiments using model dicots, protoplast transformation was achieved in wheat [17] and the Italian ryegrass Lolium multiflorum [18]. In each case, transgenic callus
3
4 Monocot Expression Systems for Molecular Farming
obtained but it was not possible to recover transgenic plants, probably because monocot protoplasts loose their competence to respond to tissue culture conditions as the cells differentiate. In cereals and grasses, this has been addressed to a certain extent by using embryogenic suspension cultures as a source of protoplasts. Additionally, since many monocot species are naturally tolerant towards kanamycin, the nptII marker used in the initial experiments was replaced with alternative markers conferring resistance to hygromycin or phosphinothricin. With these modifications, it has been possible to regenerate transgenic rice and maize of certain varieties with reasonable efficiency [19–21]. However, the extended tissue culture step is unfavorable, often resulting in sterility and other phenotypic abnormalities in the regenerated plants. An alternative procedure for plant transformation was introduced in 1987, involving the use of a modified shotgun to accelerate small (1–4 µm) metal particles into plant cells at a velocity sufficient to penetrate the cell wall (∼250 m s−1 ) [22]. The original device was gunpowder-driven and rather inaccurate. This has been replaced by a commercially-available pressurized helium gun, resulting in greater control over particle velocity and hence greater reproducibility of transformation conditions. In addition, an apparatus based on electric discharge [23] has been useful for the development of variety-independent gene transfer methods for the more recalcitrant cereals. There appears to be no intrinsic limitation to the scope of particle bombardment since DNA-delivery is governed entirely by physical parameters. Many different types of plant material have been used as transformation targets, including callus pieces, cell suspension cultures and organized tissues such as immature embryos and meristems. Almost all of the commercially-important cereals have been transformed by particle bombardment, including maize [24], rice [25], wheat [26] and barley [27]. During the 1980s, it was shown that asparagus [28] and yam [29] could be infected with A. tumefaciens and that tumors could be induced. In the latter case, an important factor in the success of the experiment was pre-treatment of the A. tumefaciens suspension with wound exudate from potato tubers (A. tumefaciens infection of monocots is inefficient because wounded monocot tissues do not produce phenolic compounds such as acetosyringone at sufficient levels to induce vir gene expression). Attention then turned towards the cereals. The first species to be transformed was rice. The nptII gene was used as a selectable marker, and successful transformation was demonstrated both by the resistance of transgenic callus to kanamycin or G418, and the presence of T-DNA in the genome [30]. However, these antibiotics interfere with the regeneration of rice plants, so only four transgenic plants were produced. The use of an alternative marker conferring resistance to hygromycin allowed the regeneration of large numbers of transgenic japonica rice plants [31], and the same selection strategy has been used to produce transgenic rice plants representing the remaining subspecies, indica and javanica [32, 33]. More recently, efficient Agrobacterium-mediated transformation has become possible for other cereals discussed in this article: maize [34], wheat [35] and barley [36]. The breakthrough in cereal transformation using A. tumefaciens reflected the recognition of several key factors required for efficient infection and gene transfer.
3 Technical Aspects of Molecular Farming in Cereals
The use of explants containing a high proportion of actively dividing cells, such as embryos or apical meristems, was found to increase transformation efficiency greatly, probably because DNA synthesis and cell division favor the integration of exogenous DNA. In dicots, cell division is induced by wounding, whereas wound sites in monocots tend to become lignified. Hiei et al. [31] showed that the co-cultivation of A. tumefaciens and rice embryos in the presence of 100 mM acetosyringone was a critical factor for successful transformation. Transformation efficiency can be increased further by the use of vectors with enhanced virulence functions. The modification of A. tumefaciens to boost virulence has been achieved by increasing the expression of virG (which in turn upregulates the expression of the other vir genes) and/or the expression of virE1, which is a major limiting factor in T-DNA transfer [37]. Komari et al. [38] used a different strategy, in which a portion of the virulence region from the Ti-plasmid of supervirulent strain A281 was transferred to the T-DNA-carrying plasmid to generate a so-called superbinary vector. The advantage of the latter technique is that the superbinary vector can be used in any Agrobacterium strain. 3.2 Expression Construct Design
The most important aspect of construct design for molecular farming in cereals is the promoter used to drive transgene expression. In dicots, the cauliflower mosaic virus (CaMV) 35S promoter is the most popular choice because it is strong and constitutive, and therefore drives high-level transgene expression in any relevant organ, including seeds [39, 40]. The promoter can be made even more active by various modifications, such as duplicating the enhancer region [41]. Even so, this promoter shows a very low activity in cereals. Although some plant promoters appear to be just as active in both dicots and monocots [42–44], most promoters that are active in dicots need to be modified in some way before they work efficiently in cereals. One general type of modification that appears to work well with a range of promoters is the addition of an intron, usually in the untranslated region between the promoter and the initiation codon of the transgene open reading frame. Several different introns have been used to modify the CaMV 35S promoter and this has led to improved promoter activity in all four of the cereal host species discussed above, as well as in some grasses. For example, four monocot introns were tested with the CaMV 35S promoter in transgenic maize and bluegrass by Vain et al. [45], specifically those from the Adh1, Sh1, Ubi1 and Act1 genes, as well as a dicot intron (chsA). In this comparison, the Ubi1 intron provided the highest level of enhancement in both species on average, but the dicot chsA intron perhaps surprisingly achieved a nearly 100-fold enhancement. Other monocot and dicot introns that have been tested with the CaMV 35S promoter include those from the Bz1, cat1 and Wx genes [46–49]. The activity of another dicot promoter, the potato pin2 promoter, was greatly enhanced in transgenic rice by inserting the first intron from the rice Act1 gene [50]. More recently, Waterhouse and colleagues [51] adapted their novel pPLEX series of
5
6 Monocot Expression Systems for Molecular Farming
expression constructs, which are based on regulatory elements from subterranean clover stunt virus (SCSV), to be used in monocots. This was achieved by adding either the Ubi1 or Act1 introns, as well as GC-rich enhancer sequences from banana bunchy top virus (BBTV) or maize streak virus (MSV). An alternative to dicot promoters enhanced with introns is to use constitutive monocot promoters, of which the maize Ubi1 promoter (with first intron) is the most widely utilized [52]. Various seed-specific promoters, many derived from seed storage protein genes, have been employed to restrict recombinant protein expression to different parts of the seed. Early examples include the maize zmZ27 zein promoter, a maize Waxy promoter and the rice small subunit ADP-glucose pyrophosphorylase promoter, all of which are endosperm-specific in maize, and the rice glutelin 1 (Gt1) promoter, which is endosperm-specific in rice and maize [53–55]. More recently, recombinant proteins have been expressed in maize using the embryo-preferred globulin-1 promoter [56], in rice using an endosperm-specific globulin promoter [57] and two aleurone-specific promoters from barley [58], and in barley using the wheat Bx17 high-molecular-weight glutenin promoter [59]. Care must be taken, however, because although these promoters are often described as ‘seed-specific’, a low level of activity may be present in other tissues. One pertinent example is the maize Waxy promoter, which is endosperm-preferred but shows a low level of activity in pollen [53]. In addition to an active promoter, a strong polyadenylation signal is required for transcript stability [1, 2]. For molecular farming in cereals, terminators derived from the CaMV 35S transcript, the A. tumefaciens nos gene, and the potato PinII (protease inhibitor II) gene have been popular choices. The structure of the 5 and 3 untranslated regions should be inspected for AU-rich sequences, and these should be removed where possible [60]. Such sequences can act as cryptic splice sites, cryptic polyadenylation sites and mRNA instability elements. Some sequences, such as the 5 leader of the petunia chalcone synthase gene and the 5 leader of tobacco mosaic virus RNA (also known as the omega sequence) have been identified as translational enhancers although they may not always work effectively in cereals. Further important factors that influence translation include the presence of a single AUG codon within a consensus Kozak sequence, since multiple AUG codons, where present, often result in pausing and inefficient translational initiation. Different species also have very different codon preferences when specifying degenerate amino acids. Taking the amino acid arginine as an example, the codon CGU is preferred in alfalfa, and is 50 times more likely to occur than the rarest codon, CGG. In contrast, both of these codons are equally prevalent in maize, but the preferred choice is CGC. The expression of foreign transgenes in cerelas can therefore be very inefficient if infrequently-used codons predominate [60]. One critical consideration for the improvement of protein yields is subcellular protein targeting, because the compartment in which a recombinant protein accumulates strongly influences the interrelated processes of folding, assembly and post-translational modification [61]. For protein accumulation in cereal endosperm, the most important destinations are the protein storage organelles, i.e. the protein
3 Technical Aspects of Molecular Farming in Cereals
bodies and protein storage vacuoles, since these have developed to facilitate stable protein accumulation [3]. In rice, it has been shown that the two major classes of storage proteins, prolamins and glutelins, accumulate in different storage compartments, and are sorted in distinct ways [62]. Prolamins accumulate in protein bodies inside the rough endoplasmic reticulum (ER), and these eventually bud off to form separate organelles, whereas glutelins accumulate in protein storage vacuoles, which are derived from the smooth ER, and are conveyed to these organelles by transport vesicles budding from the Golgi apparatus [3]. There have been few studies of recombinant protein localization within the endosperm, but in one interesting report a recombinant antibody targeted to the secretory pathway in rice endosperm, and tagged for retrieval to the ER with a KDEL tetrapeptide tag, was found mainly in protein bodies and to some minor extent also in protein storage vacuoles [63]. The lack of a KDEL sequence in endosperm-expressed proteins generally results in secretion to the apoplast, where in most cases the recombinant protein accumulates in the space under the cell wall. A variety of signal peptides has been used to target proteins to the secretory pathway, including endogenous signal peptides from the transgene (e.g. immunoglobulin signal peptides for recombinant antibodies) [4, 63] and heterologous plant signal peptides such as the barley α-amylase signal sequence (BAASS) [56]. 3.3 Production Considerations for Cereals
Most of the useful information concerning the practical and commercial aspects of molecular farming in cereals has come from ProdiGene Inc., which has conducted detailed studies into the economic aspects of molecular farming in maize [64, 65]. The development of a product line takes about 30–35 weeks, including 4–6 weeks for vector construction, 5–7 weeks to identify transformants, 4–6 weeks for the regeneration of plants and transfer to greenhouse, and 12–17 weeks for the T0 plants to reach maturity and set seed. At this point, T1 seeds are removed for molecular analysis of the integrated transgene, in order to select transgenic events for further characterization. After a further 17 weeks, the T1 plants are mature and seeds can be removed for protein extraction and scale-up. At this point, about 10 mg of protein can be obtained for preliminary analysis. After another 17 weeks, the T2 plants are mature and T3 seeds can be removed for analysis and scaleup. It is routine to obtain about 1 g of pure recombinant protein by this time, but theoretical yields can be up to 1 kg. Also at this time, molecular, genetic and biochemical analysis of samples makes it possible to establish a master seed line for future production. After 12–18 months, the T3 generation is mature, and it should be possible to obtain 100 g of recombinant protein on a routine basis. Production lines can then be optimized and established in the field. Another practical issue that needs to be addressed is the compliance of cerealbased production with regulatory guidelines established by bodies such as APHIS and the USDA. Again, ProdiGene Inc. has taken a leading role, developing a procedure in which the producer crop is isolated from other crops, and identity
7
8 Monocot Expression Systems for Molecular Farming
preservation is used from planting through to product extraction, to prevent mingling and contamination. For pharmaceutical products, further strict regulations govern quality assurance and quality control [8].
4 Examples of Recombinant Proteins Produced in Cereals
Molecular farming is often subdivided into pharmaceutical and industrial components, the former dealing with medically relevant proteins (e.g. human blood products, antibodies, vaccine candidates) and the later with bulk enzymes, technical proteins and biopolymers. However, there is no strict boundary between these two areas, while other products do not fit clearly into either category. What is clear is that a large and increasingly diverse spectrum of recombinant proteins is being produced in cereals, mostly as laboratory or greenhouse experiments, but also in field trials and large-scale commercial enterprises. 4.1 ProdiGene and Maize
The first proteins from transgenic plants to reach commercial status were avidin and β-glucuronidase (GUS) both of which are used as diagnostic agents in molecular biology. An important principle demonstrated by these case studies is that molecular farming in cereals can be an economical alternative even when the natural source of a protein is abundant (i.e. egg whites for avidin, and Escherichia coli for GUS) and where a market is already established. Avidin is an abundant glycoprotein that is routinely purified from hens’ eggs. It is widely used in molecular biology due to its very high affinity for biotin. Recombinant avidin was expressed in transgenic maize to establish whether the molecular farming approach could compete with the established source [6]. In the best-expressing transgenic line, avidin represented over 2 % of the aqueous protein extracted from dry maize seeds, equivalent to approximately 230 mg per kg of transgenic seed. The Ubi1 promoter was used to drive transgene expression and the protein was targeted to the secretory pathway using the BAASS. Consequently, the mature protein accumulated in the intercellular spaces. The recombinant protein was shown to be nearly identical to native avidin in terms of molecular weight and biotin binding properties. The protein remained stable in seeds stored at 10 ◦ C and was not affected by commercial processing practices (dry milling, fractionation and hexane extraction). Interestingly, avidin expression in transgenic maize plants correlated with partial or complete male sterility. The cost of producing avidin by molecular farming in maize is estimated to be only 10% of the cost of extraction from egg whites. Recombinant avidin produced in maize is now available from Sigma-Aldrich. The β-glucuronidase enzyme (GUS) is widely used as a screenable marker in transformation and promoter analysis, so unlike avidin, its expression and function
4 Examples of Recombinant Proteins Produced in Cereals
has been confirmed in virtually all plant species that have been transformed. The use of plants as a commercial source of the enzyme was first demonstrated by Witcher et al. [7]. The Ubi1 promoter was used to drive transgene expression and the protein, lacking a targeting signal, accumulated in the cytosol. In the best maize lines, the recombinant enzyme accounted for up to 0.7% of water-soluble protein extracted from dry seed, equivalent to about 80 mg per kg dry seeds. Purified recombinant GUS was similar in molecular mass, physical and kinetic properties to native GUS isolated from E. coli. Transgenic seed containing recombinant GUS could be stored at an ambient temperature for up to two weeks and for at least three months at 10 ◦ C without significant loss of enzyme activity, and could be subject to normal maize processing practices as discussed above without loss of GUS activity [66]. Transgenic maize lines have also been used to investigate the large-scale production of a variety of other products, although none has yet reached commercial status. Perhaps the closest to commercial production are the protease trypsin, the protease-inhibitor aprotinin and fungal laccase (which is used to modify lignin). The aprotinin gene, driven by the Ubi1 promoter, was introduced into maize embryos by particle bombardment [67]. The protein was targeted to the extracellular matrix, and in the best-expressing lines, seeds accumulated recombinant aprotinin at levels up to 0.7% soluble protein. Biochemical analysis of the purified recombinant protein revealed similar properties in terms of molecular weight, Nterminal amino acid sequence, isoelectric point, and trypsin inhibition activity as native aprotinin. The cost of aprotinin production in transgenic plants is comparable to the cost of extracting the protein from its natural source. Trypsin and laccase have been expressed in maize using constitutive and seed-specific promoters [56, 68]. Trypsin expression presents a difficulty because the enzyme is a protease, so it could degrade endogenous proteins and itself (autoproteolysis) thus reducing yields. Therefore, the enzyme was expressed as an inactive precursor, trypsinogen, using the seed-specific promoter, a strategy which is patented by ProdiGene [69]. The laccase 1 isozyme from Trametes versicolor reached the highest expression levels (about 1% total soluble protein) when driven by the embryo-preferred globulin promoter, combined with the BAASS to target the protein for secretion to the cell wall matrix [56]. The production of a secretory antibody in maize has also been reported [8]. Immature maize embryos were transformed with A. tumefaciens containing five transgenes, encoding the four components of the antibody (heavy chain, light chain, joining chain, secretory component) and a selectable marker. The four antibody transgenes were driven by the Ubi1 promoter in combination with the BAASS, and the transgenic seeds were analyzed by enzyme-linked immunosorbent assay to detect assembled antibodies, showing that the antibody accumulated to up to 0.3% total soluble protein in T1 seeds. Based on ProdiGene’s success with other products, the selection of high-performance lines and backcrossing should allow this yield to be increased as much as 70-fold over six generations. Maize has also been investigated as a potential source of oral recombinant vaccines, by expressing the Lt-B protein of enterotoxigenic E. coli [70, 71].
9
10 Monocot Expression Systems for Molecular Farming
4.2 Recombinant Proteins Expressed in Rice
Rice grains were used only occasionally for the expression of recombinant proteins until this system was adopted as a production platform by Ventria Bioscience. In early studies, rice plants were used for the expression of α-interferon [72] and recombinant antibodies [4], while more recently a cedar pollen allergen has been expressed in grains [73]. The PI promoter was used to drive α-interferon expression [72], the antibodies were expressed using either the maize Ubi1 or enhanced CaMV 35S promoters (with the maize promoter showing five times the activity of the viral promoter) [4], while the cedar pollen antigen was expressed using the seed-specific GluB-1 promoter, leader and signal peptide [73]. This construct was expected to direct the recombinant protein to type II protein bodies, but instead it accumulated in type I protein bodies, suggesting some competition between the signal sequence and the structure of the recombinant protein in determining its final destination. In 2002, Ventria Bioscience published two papers describing the expression of human proteins in rice grains [74, 75]. Lysozyme was expressed under the control of the glutelin 1 promoter, leader and signal peptide, and an expression level of 0.6% dry weight (equivalent to 45% of soluble proteins) was achieved [74]. As was the case for maize production lines, the superior transgenic events were selected and maintained over several generations, and showed no loss of expression. Furthermore, the product itself was similar to the native enzyme under all tests and was shown to be active against a laboratory strain of E. coli. Lactoferrin was expressed using the same regulatory elements as above [75] and accumulated to a level of 0.5 % dry weight of dehusked grains. As for lysozyme, all biochemcial and functional tests showed that the protein was similar in its properties to the native enzyme, and receptor-binding activity in human Caco-2 cell cultures was conserved. In a further interesting development, Yang et al. [76] describe the expression, in rice grains, of a REB transcriotion factor (rice endosperm bZIP), which acts upon the globulin (Glb) promoter. When the Glb promoter was placed upstream of the gusA gene for β-glucuronidase, GUS activity was 2–2.5 times higher in the presence of REB in a cell-based transient assay. This enhancement was abolished when the upstream promoter motif GCCACGT(A/C)AG was deleted from the Glb promoter, and a gain of activity was observed when this sequence was added to the normally unresponsive Gt1 promoter. In transgenic rice grains expressing lysozyme under the control of the Glb promoter, the presence of REB resulted in a 3.7-fold increase in lysozyme accumulation. 4.3 Recombinant Proteins Produced in Wheat
As discussed above, wheat has been used only rarely for molecular farming. Thus far, the only example of a pharmaceutical protein produced in wheat is a single chain Fv antibody, which was expressed using the Ubi1 promoter and achieved
References
a maximum expression level of 1.5 µg g−1 dry weight [77]. Transgenic wheat producing Aspergillus phytase has also been reported [78]. 4.4 Recombinant Proteins Produced in Barley
There are few reports of barley used for the production of recombinant proteins but some of those reports show high expression levels which, due to the low producer price of barley, could make this a viable production crop. In early reports, recombinant glucanase and xylanase were expressed at very low levels (0.004%) [79, 80] but as stated above, a recombinant diagnostic antibody (SimpliRED, for the detection of HIV) has been expressed at levels exceeding 150 µg g−1 [15], and a recombinant cellulase enzyme was expressed at levels exceeding 1.5% total seed protein [16].
5 Conclusions
Cereal expression systems are among the most advantageous for field-based recombinant protein production, since they combine intrinsic biosafety features (self-pollination in rice, barley and wheat, seed-specific protein expression) with practical benefits, particularly the tendency for recombinant proteins expressed in seeds to remain stable and active for prolonged periods at ambient temperatures. The commercial success of the first maize-derived recombinant proteins produced by ProdiGene has demonstrated that plants, and cereals in particular, represent an economically viable production system which provides a real alternative to mammalian cells, microbial cultures and indeed other crop systems such as tobacco, oilseed rape and potato. It is likely that rice and barley will be the next crops to emerge as commercial production platforms, particularly through the efforts of Ventria Bioscience.
References 1 J.K.-C. MA, P. M. W. DRAKE, P. CHRISTOU, Nature Rev. Genet. 2003, 4 (10), 794–805. 2 R. M. TWYMAN, E. STOGER, S. SCHILLBERG et al., Trends Biotechnol. 2003, 21 (12), 570–578. 3 K. M¨UNTZ, Plant Mol. Biol. 1998, 38 (1-2) 77–99. 4 E. STOGER, M. SACK, Y. PERRIN et al., Mol. Breeding 2000, 9 (3), 149–158. 5 U. COMMANDEUR, R. M. TWYMAN, R. FISCHER, AgBiotechNet 2003, 5 (ABN 110), 1–9.
6 E. E. HOOD, D. R. WITCHER, S. MADDOCK et al., Mol. Breeding 1997, 3 (4), 291–306. 7 D. R. WITCHER, E. E. HOOD, D. PETERSON et al., Mol. Breeding 1998, 4 (4), 301–312. 8 E. E. HOOD, S. L. WOODARD, M. E. HORN et al., Curr. Opin. Biotechnol. 2002, 13 (6), 630–635. 9 E. E. HOOD, Enzyme & Microbial Technol. 2002, 30 (3), 279–283. 10 E. E. HOOD, in: P. CHRISTOU, H. KLEE (eds) Handbook of Plant Biotechnology, John Wiley & Sons, Chichester, 2004,
11
12 Monocot Expression Systems for Molecular Farming
11 P. CHRISTOU, Rice Biotechnology and Genetic Engineering, Technomic Publishing Co. Inc., Lancaster, PA 12 J. YU, S. N. HU, J. WANG et al., Science 2002, 296 (5565), 79–92. 13 S. A. GOFF, D. RICKE, T. H. LAN et al., Science 2002, 296 (5565), 92–100. 14 M. TERASHIMA, Y. MURAI, M. KAWAMURA et al., Appl. Microbiol. Biotechnol. 1999, 52 (4), 516–523. 15 P. H. D. SCHUNMANN, G. COIA, P. M. WATERHOUSE, Mol. Breeding 2002, 9 (2), 113–121. 16 G. P. XUE, M. PATEL, J. S. JOHNSON et al., Plant Cell Rep. 2003, 21 (11), 1088–1094. 17 H. LORZ, B. BAKER, J. SCHELL, Mol. Gen. Genet. 1985, 199 (2), 178–182. 18 I. POTRYKUS, M. W. SAUL, J. PETRUSKA et al., Mol. Gen. Genet. 1985, 199 (2), 183–188. 19 K. SHIMAMOTO, R. TERADA, T. IZAWA et al., Nature 1989, 338 (6212), 274–276. 20 S. K. DATTA, A. PETERHANS, K. DATTA et al., Bio/Technol. 1990, 8 (8), 736–740. 21 S. OMIRULLEH, M. ABRAHAM, M. GOLOVKIN et al., Plant Mol. Biol. 1993, 21 (3), 415–428. 22 T. M. KLEIN, E. D. WOLF, R. WU et al., Nature 1987, 327 (6117), 70–73. 23 D. MCCABE, P. CHRISTOU, Plant Cell Tiss. Org. Cult. 1993, 33 (3), 227–236. 24 T. M. KLEIN, M. FROMM, A. WEISSINGER et al., Proc. Natl Acad. Sci. USA 1988, 85 (12), 4305–4309. 25 P. CHRISTOU, T. L. FORD, M. KOFRON, Bio/Technol. 1991, 9 (10), 957–962. 26 V. VASIL, A.M. CASTILLO, M. E. FROMM et al., Bio/Technol. 1992, 10 (6), 667–674. 27 Y. C. WAN, P. G. LEMAUX, Plant Physiol. 1994, 104 (1), 37–48. 28 J. P. HERNALSTEENS, L. THIATOONG, J. SCHELL et al., EMBO J. 1984, 3 (13), 3039–3041. 29 W. SCHAFER, A. GORZ, G. KAHL, Nature 1987, 327 (6122), 529–532. 30 D. M. RAINERI, P. BOTTINO, M. P. GORDON et al., Bio/Technol. 1990, 8 (1), 33–38. 31 Y. HIEI, S. OHTA, T. KOMARI et al., Plant J. 1994, 6 (2), 271–282. 32 H. RASHID, S. YOKOI, K. TORIYAMA et al., Plant Cell Rep. 1996, 15 (10), 727–730. 33 J. J. DONG, W. M. TENG, W. G. BUCHHOLZ et al., Mol. Breeding 1996, 2 (3), 267–276.
34 Y. ISHIDA, H. SAITO, S. OHTA et al., Nature Biotechnol. 1996, 14 (6), 745–750. 35 M. CHENG, J. E. FRY, S. Z. PANG et al., Plant Physiol. 1997, 115 (3), 971–980. 36 S. TINGAY, D. MCELROY, R. KALLA et al., Plant J. 1997, 11 (6), 1369–1376. 37 S. B. GELVIN, Annu. Rev. Plant. Physiol. 2000, 51, 223–256. 38 T. KOMARI, Y. HIEI, Y. SAITO et al., Plant J. 1996, 10 (1), 165–174. 39 J. T. O’DELL, F. NAGY, N. H. CHUA, Nature 1985, 313 (6005), 810–812. 40 M. A. LAWTON, M. A. TIERNEY, I. NAKAMURA et al., Plant Mol. Biol. 1987, 9 (4), 315–324. 41 R. KAY, A. CHAN, M. DALY et al., Science 1987, 236 (4806), 1299–1302. 42 B. VERDAGUER, A. DEKOCHKO, R. N. BEACHY et al., Plant Mol. Biol. 1996, 31 (6), 1129–1139. 43 P. M. SCHENK, T. REMANS, L. SAGI et al., Plant Mol. Biol. 2001, 47 (3), 399–412. 44 P. M. SCHENK, L. SAGI, T. REMANS et al., Plant Mol. Biol. 1999, 39 (6), 1221–1230. 45 P. VAIN, K. R. FINER, D. E. ENGLER et al., Plant Cell Rep. 1996, 15 (7), 489–494. 46 J. CALLIS, M. FROMM, V. WALBOT, Genes & Dev. 1987, 1 (10), 1183–1200. 47 A. TANAKA, S. MITA, S. OHTA et al., Nucleic Acids Res. 1990, 18 (23), 6767–6770. 48 S. TAKUMI, M. OTANI, T. SHIMADA, Plant Sci. 1994, 103 (2), 161–166. 49 Y. Z. LI, H. M. MA, J. L. ZHANG et al., Plant Sci. 1995, 108 (2), 181–190. 50 D. P. XU, D. MCELROY, R.W. THORNBURG et al., Plant Mol. Biol. 1993, 22 (4), 573–588. 51 P. H. D. SCHUNMANN, B. SURIN, P. M. WATERHOUSE, Funct. Plant Biol. 2003, 30 (4), 453–460. 52 A. H. CHRISTENSEN, P. H. QUAIL, Transgenic Res. 1996, 5 (3), 213–218. 53 D. A. RUSSELL, M. E. FROMM, Transgenic Res. 1997, 6 (2), 157–168. 54 Z. W. ZHENG, N. MURAI, Plant Sci. 1997, 128 (1), 59–65. 55 Z. W. ZHENG, Y. KAWAGOE, S. H. XIAO et al., Plant J. 1993, 4 (2), 357–366. 56 E. E. HOOD, M. R. BAILEY, K. BEIFUSS et al., Plant Biotechnol. J. 2003, 1 (2), 129–140. 57 Y.-S. HWANG, D. YANG, C. MCCULLAR et al., Plant Cell Rep. 2002, 20 (9), 842–847.
References 58 Y.-S. HWANG, S. NICHOL, S. NANDI et al., Plant Cell Rep. 2001, 20 (7), 647–654. 59 P. H. D. SCHUNMANN, G. COIA, P. M. WATERHOUSE, Mol. Breeding 2002, 9 (2), 113–121. 60 R. FISCHER, N. J. EMANS, R. M. TWYMAN et al., in: M. FINGERMAN, R. NAGABHUSHANAM (eds) Recent Advances in Marine Biotechnology, Volume 9 (Biomaterials and Bioprocessing), Science Publishers Inc., Enfield NH, pp 279–313, 2003. 61 S. SCHILLBERG, N. EMANS, R. FISCHER, Phytochem. Rev. 2002, 1 (1), 45–54. 62 T. OKITA, S. S. CHOI, H. ITO et al., J. Exp. Bot. 1998, 49 (324), 1081–1090. 63 E. TORRES, P. GONZALEZ-MELENDI, E. STOGER et al. Plant Physiol. 2001, 127 (3), 1212–1223. 64 R. L. EVANGELISTA, A. R. KUSNADI, J. A. HOWARD et al., Biotechnol. Prog. 1998, 14 (4), 607–614 65 A. R. KUSNADI, Z. L. NIKOLOV, J. A. HOWARD, Biotechnol. Bioeng. 1997, 56 (5), 473–484. 66 A. R. KUSNADI, R. L. EVANGELISTA, E. E. HOOD et al., Biotech. Bioeng. 1998, 60 (1), 44–52. 67 G. Y. ZHONG, D. PETERSON, D. E. DELANEY et al., Mol. Breeding 1999, 5 (4), 345–356.
68 S. L. WOODARD, J. M. MAYOR, M. R. BAILEY et al., Biotechnol. Appl. Biochem. 2003, 38 (2), 123–130. 69 J. A. HOWARD, E. E. HOOD, US Patent 6, 087, 558, 1998. 70 S. J. STREATFIELD, J. R. LANE, C. A. BROOKS et al., Vaccine 2003, 21 (7–8), 812–815. 71 B. J. LAMPHEAR, S. J. STREATFIELD, J. A. JILKA et al., J. Control. Release 2002, 85 (1–3), 169–180. 72 Z. ZHU, K. HUGHES, L. HUANG et al., Plant Cell Tiss. Org. Cult. 1994, 36 (2), 197–204. 73 A. OKADA, T. OKADA, T. IDE et al., Mol. Breeding 2003, 12 (1), 61–70. 74 J. HUANG, S. NANDI, L. WU et al., Mol. Breeding 2002, 10 (1-2), 83–94. 75 S. NANDI, Y. A. SUZUKI, J. HUANG et al., Plant Sci. 2002, 163 (4), 713–722. 76 D. YANG, L. WU. Y. S. HUANG et al., Proc. Natl Acad. Sci USA 2001, 98 (20), 11438–11443. 77 E. STOGER, C. VAQUERO, E. TORRES et al., Plant Mol. Biol. 2000, 42 (4), 583–590. 78 H. BRINCH-PEDERSEN, A. OLESEN, S. K. RASMUSSEN, Mol. Breeding 2000, 6 (2), 195–206. 79 L. G. JENSEN, O. OLSEN, O. KOPS et al., Proc. Natl Acad. Sci. USA 1996, 93 (8), 3487–3491. 80 M. PATEL, J. S. JOHNSON, R. I. S. BRETTELL et al., Mol. Breeding 2000, 6 (1), 113–123.
13
1
Large-scale Protein Production in Plants: Host Plants, Systems and Expression Richard M. Twyman University of York, York, United Kingdom
Originally published in: Molecular Farming. Edited by Rainer Fischer and Stefan Schillberg. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30786-9
1 Introduction
Plants have numerous advantages over traditional expression technologies for the large-scale production of recombinant proteins. The major benefits of whole plants include the comparatively low cost of large-scale production, the inherent scalability of agricultural systems and the convenience of existing infrastructure for harvesting, processing and distribution [1–4]. The start-up and running costs for molecular farming in plants are significantly lower than those of cell-based production systems because there is no need for fermenters or the skilled operators to run them. It has been estimated that proteins can be produced in plants at 2–10% of the cost of microbial fermentation systems and at 0.1% of the cost of mammalian cell cultures or transgenic animals, as long as adequate yields can be achieved [5]. All plant expression platforms, including whole plants and cell/tissue culture systems, also have safety benefits over microbial and animal cells. This is because they lack the endotoxins produced by bacterial cells and they do not harbor human pathogens [1]. Proteins produced in mammalian cells and transgenic animals must be screened for viruses, oncogenic DNA sequences and prions, which adds significantly to processing costs. Indeed, regardless of the production system, over 85% of the costs associated with recombinant protein production reflect extraction and downstream processing steps rather than the production phase itself [6]. For many proteins, however, these steps can be circumvented or eliminated if plants are used as production hosts. For example, recombinant subunit vaccines produced in plants can be administered orally as raw or partially processed fruits and vegetables [7], therapeutic proteins designed for topical application can be applied as crude extracts or pastes, and industrial enzymes used in food and feed processing can be Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
expressed in the plant that needs to be processed [8–11]. Even for those proteins that must be extracted and purified using conventional methods, plant systems can provide practical advantages to facilitate processing and make it more economical. One example is the oleosin fusion method, which is being developed as a commercial platform technology by SemBioSys Genetics Inc., Calgary, Canada [12]. This allows recombinant proteins to be concentrated in the oil bodies of oilseed crops and extracted using a simple and economical method. Another example is the rhizosecretion of recombinant proteins from the roots of tobacco plants, which allows the protein to be collected continually from root exudates [13, 14]. This approach is being developed as a commercial platform technology by Phytomedics Inc., Dayton, NJ. Given the multitude of benefits listed above, it is not surprising that many of our crop species have been investigated as potential hosts for molecular farming (Table 1). More than 30 species of plants have now been transformed for the sole purpose of expressing and exploiting recombinant proteins, and it is becoming increasingly difficult to choose which expression host is the most suitable for particular products. Further complications are added by the diversity of expression systems available for each crop species. These may include transgenic plants, transplastomic plants, virus-infected plants, transiently transformed leaves, hairy roots and suspension cell cultures. There is also a wide choice of expression strategies, including leaf expression, seed expression, fruit expression, inducible expression, targeting to different subcellular compartments and secretion. The choice of host species, expression system and expression strategy must be evaluated carefully on a case-by-case basis, depending on many inter-related factors [15]. Strategic choices may be made for geographical reasons, such as the site intended for production and the local availability of labor, processing infrastructure, storage facilities, transport and distribution networks. The value of the recombinant protein is also important, since it would make economic sense to produce a high-value protein such as a recombinant antibody in an expensive expression system if other benefits were gained, while bulk products with a low market value would better be produced in a less expensive expression system. The widest choice is generally available where the recombinant protein needs to be purified to homogeneity, while there are greater constraints for proteins intended to be delivered in unprocessed or partially processed plant material. For example, proteins expressed for the purpose of oral vaccination in humans need to be expressed in the edible parts of food crops (e.g. tomato or banana fruits) while those intended for the oral vaccination of animals need to be expressed in fodder crops (e.g. alfalfa or clover leaves). Additional factors that may need to be taken into consideration include the degree of containment afforded by the crop and the extent to which the structure and homogeneity of N-linked glycans can be controlled. Different host species also vary in the expediency and convenience of transformation and regeneration, the availability of useful regulatory elements to control transgene expression, the extent to which endogenous compounds interfere with downstream processing, and in the absolute yields of recombinant proteins that can be achieved.
1 Introduction 3 Table 1 Plant expression hosts used for molecular farming – advantages and disadvantages
Species Model plants Arabidopsis thaliana Simple plants Physcomitrella patens Chlamydomonas reinhardtii Lemna minor
Leafy crops Tobacco
Alfalfa, clover
Lettuce, spinach Cereals Maize, rice
Wheat, barley
Legumes Soybean
Pea, pigeon pea, peanut Fruits and vegetables Potato, carrot
Tomato Banana
Oilcrops Oilseed rape, safflower, Camelina sativa
Advantages
Disadvantages
Range of available mutants, accessible genetics, ease of transformation
Not useful for commercial production (low biomass)
Containment, clonal propagation, batch consistency, secretion of proteins into medium, regulatory compliance, homologous recombination in Physcomitrella
Scalability
High yield, established transformation and expression technology, rapid scale-up, non-food/feed High yield, useful for animal vaccines, clonal propagation, homogenous N-glycans (alfalfa) Edible, useful for human vaccines
Low protein stability in harvested material, presence of alkaloids
Protein stability during storage, high yield, easy to transform and manipulate Protein stability during storage, low producer price
Food crops
High seed protein content, seed coat expression, low producer price High protein content
Lower expression levels
Edible, proteins stable in storage tissues Edible, containment in greenhouses Edible, staple in developing countries, eaten by adults and children
Oil-body purification, sprouting system
Low protein stability in harvested material, presence of oxalic acid Low protein stability in harvested material
Food crops, lower yields, more difficult to transform and manipulate
Low expression levels Must be cooked before consumption (potato), high starch content (potato) More expensive to grow, must be chilled after harvest Transformation and regeneration are currently difficult and time-consuming Lower biomass yields, oil bodies incompatible with glycosylation
4 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
This chapter provides an overview of the different expression hosts, systems and strategies available in molecular farming, and discusses their advantages and disadvantages for the production of different types of recombinant proteins. 2 Host Species for Molecular Farming 2.1 Leafy Crops
The two major leafy crops used for the production of recombinant proteins are tobacco and alfalfa, both of which have high leaf biomass yields in part because they can be cropped several times every year. The main limitation of such crops is that the harvested leaves tend to have a restricted shelf life. The recombinant proteins exist in an aqueous environment and are therefore relatively unstable, which can reduce product yields [16]. For proteins that must be extracted and purified, the leaves need to be dried or frozen for transport, or processed immediately after harvest at the production site. This adds considerably to the processing costs. 2.1.1 Tobacco (Nicotiana tabacum) Cultivated tobacco has a long and successful history in molecular farming. The popularity of this species reflects its dual status as a model plant and a cultivated crop, providing many practical advantages for the large-scale production of recombinant proteins. As a model plant, tobacco benefits from well-established gene transfer and regeneration methodologies, and the availability of many robust expression cassettes for the control of transgene expression. The practical advantages of tobacco include its high biomass yield (up to 100 tonnes of leaf biomass per hectare each year), the wide range of available expression systems (transgenic plants, transplastomic plants, virus-infected plants, transient expression in leaves, transformed suspension cell cultures, hairy roots and shooty teratomas; see Sect. 3), and the fact that tobacco is neither a food nor a feed crop, thus reducing the likelihood of transgenic material contaminating the food or feed chains. The first recombinant human therapeutic protein to be produced in transgenic plants was expressed in tobacco leaves [17, 18]1) , as was the first plant-derived recombinant antibody [19], the first plant-derived vaccine candidate [20], the first plant-derived industrial enzyme [21] and the first plant-derived synthetic biopolymer [22]. However, there are several drawbacks to molecular farming in tobacco leaves, particularly for pharmaceutical proteins. Many tobacco cultivars have high contents of nicotine and
1) Human serum albumin was the first fulllength human protein to be expressed in the leaves of a transgenic plant (both tobacco and potato leaves were shown to express the protein). This report was published in 1990 [18]. Four years earlier, Barta and colleagues had
demonstrated the expression of human growth hormone in tobacco and sunflower callus, and the protein was expressed as a fusion with the Agrobacterium tumefaciens nopaline synthase enzyme [17].
2 Host Species for Molecular Farming
other alkaloids, which must be removed during the downstream processing steps. Even where low-alkaloid cultivars are used, processing is still necessary to remove these toxic components [23, 24]. It has also been shown that glycoproteins produced in tobacco are very heterogeneous in terms of their N-glycan structures, which could make tobacco unsuitable for the bulk production of certain pharmaceutical proteins [25, 26]. 2.1.2 Tobacco (Nicotiana benthamiana) N. benthamiana is a non-cultivated tobacco species whose main advantage as a host plant for molecular farming is that it supports the systemic replication of many different viruses. This has two major applications. First, N. benthamiana is a suitable host species when the aim is to produce recombinant proteins using viral vectors, such as tobacco mosaic virus (TMV) and potato virus X (PVX) (Sect. 3.3). Second, transgenic N. benthamiana plants are often used to produce antibodies that are active against plant viruses, allowing the transgenic plant to be used as a viral assay system. While this is not strictly molecular farming (in that the aim is to confer disease resistance on the plant rather than to exploit the recombinant protein as a pharmaceutical or industrial product), the expression of antibodies in plants is of general interest because strategies to improve antibody expression levels or target them to specific compartments are also applicable to pharmaceutical proteins. N. benthamiana is occasionally used for molecular farming without the involvement of viruses, as in the production of a VH domain recognizing the neuropeptide substance P [27]. 2.1.3 Alfalfa (Medicago sativa) The leaf biomass produced by alfalfa is somewhat lower than that of tobacco (12 tonnes per hectare per year), but it has several advantageous agronomic characteristics compared to tobacco including the fact that it is a perennial plant (vegetative growth can be maintained for many years), it can be clonally propagated by stem cutting, and it fixes its own nitrogen thus eliminating the need for fertilizer input. The technology for gene transfer and transgene control in alfalfa is not so well established as it is in tobacco, but due mainly to the efforts of researchers at Medicago Inc., Qu´ebec City, Canada, much progress has been made in the development of constitutive and inducible expression cassettes for molecular farming, and alternative expression systems such as agroinfiltrated leaves and cell suspension cultures. While alfalfa lacks toxic alkaloids, the leaves do contain high levels of oxalic acid, which can in some cases interfere with downstream processing. The major advantage of alfalfa for the production of pharmaceutical proteins is that glycoproteins expressed in alfalfa leaves have homogeneous glycan structures [28, 29]. This, together with the ease of clonal propagation, provides enormous production benefits in terms of batch-to-batch reproducibility. Since alfalfa is a fodder crop, the other major application of this species in molecular farming is the delivery of vaccines to domestic animals, as has been demonstrated in the case of a foot-and-mouthdisease vaccine [30]. Alfalfa has also been used for the production of three industrial enzymes: 1,4-β-D-endoglucanase and cellobiohydrolase [11], and phytase [31].
5
6 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
2.1.4 White clover (Trifolium repens) Like alfalfa, the clover family (Trifolium spp.) consists of perennial legume plants that fix their own nitrogen. However, they suffer several limitations as general production crops for molecular farming, such as restricted perenniality, low protein content and the presence of high levels of condensed tannins which interfere with protein extraction. Despite these disadvantages, clovers are forage crops and are therefore useful for the production of vaccines against animal diseases. Thus far, white clover has been used to produce a Mannheimia haemolytica A1 leukotoxin fusion protein antigen to protect cattle against pneumonic pasteurellosis, otherwise known as shipping fever. The antigen remained stable for over one year in dried clover leaves that had been baked in the oven at 50 ◦ C immediately after harvest [32]. 2.1.5 Lettuce (Lactuca sativa) Lettuce has been used for the production of a hepatitis B surface antigen and was chosen essentially because it is an edible salad crop that can be used in human clinical trials [33, 34]. It does have a higher biomass yield than alfalfa (30 tonnes per hectare per year) but it has a much higher producer price and a very high water content (98%) which reduces protein yields and stability. Human volunteers, fed with transgenic lettuce plants expressing hepatitis B virus surface antigen, developed specific serum-IgG responses to the plant-derived vaccine. The investigators who published this original report are now producing further pharmaceutical proteins in lettuce, including anthrax protective protein and antibodies against rabies and colorectal cancer [35]. Other investigators are considering the use of lettuce for the production of further antigens, including most recently the SARS virus spike glycoprotein. 2.1.6 Spinach (Spinacia oleracea) Spinach, like lettuce, has been used for the production of edible vaccines. In the first report, Yusibov and colleagues used alflafla mosaic virus (AlMV) to produce rabies fusion epitopes on the virus surface in infected spinach leaves. This resulted in the development of anti-rabies (as well as anti-AlMV) antibodies when the spinach leaves were fed to mice [36]. Vaccines against HIV gp120 and Tat have been produced in spinach, and a construct of gp120 with the CD4 receptor is now being adapted for this plant [35]. Spinach is also being used to make an anthrax vaccine [37]. 2.1.7 Lupin (Lupinus spp.) Lupin (Lupinus luteus) has been used to express the same hepatitis B surface antigen as produced in lettuce, and the transgenic leaves were used in pre-clinical trials in an attempt to promote an immune response in mice following oral administration [34]. Narrow-leaf lupin (Lupinus angustifolius) has been used to express a gene encoding a plant allergen (sunflower seed albumin), in a successful attempt to suppress experimentally-induced asthma [38]. This was the first study involving
2 Host Species for Molecular Farming
the production of a heterologous vaccine in plants to provide protection against allergic diseases. 2.2 Dry Seed Crops
The dry seed crops that have been used as host plants for molecular farming include the cereals maize, rice, wheat and barley, and the grain legumes soybean, pea, pigeon pea and peanut. Maize, rice, wheat, barley, soybean and pea have been investigated as general production platforms, while pigeon pea and peanut have been used solely for the expression of animal vaccine candidates. The major advantage of all seed crops is that recombinant proteins can be directed to accumulate specifically in the desiccated seeds [15]. Therefore, although seed biomass yields are smaller than the leaf biomass yields of tobacco and alfalfa, this is offset by the increased stability of the proteins. Seeds are natural storage organs, with the optimal biochemical environment for the accumulation of large amounts of protein. In the best cases, recombinant proteins expressed in seeds have been shown to remain stable and active after storage at room temperature for over three years. The accumulation of proteins in the seed rather than vegetative organs also prevents any toxic effects on the host plant. Finally, the extraction of proteins from seeds is facilitated because the target protein is concentrated in a small volume, most cereal seeds lack the phenolic compounds that are often found in leaves and which interfere with processing, and the seed proteome is fairly simple, which reduces the likelihood that contaminating proteins will co-purify with the recombinant protein during downstream processing [39]. Several factors need to be weighed up when choosing an appropriate dry seed expression host, including geographical considerations, the ease of transformation and regeneration, the annual yield of seed per hectare, the yield of recombinant protein per kilogram of seed, the producer price of the crop, the percentage of the seed that is made up of protein and, inevitably, intellectual property issues [15]. Together, these determine the overall cost of producing the recombinant protein in the chosen seed crop. 2.2.1 Maize (Zea mays) Maize was chosen as a platform expression host by ProdiGene Inc., College Station, TX, and has the honor of being the only crop thus far to be used commercially for the production of plant-derived recombinant proteins (avidin and β-glucuronidase were first produced in maize seeds on a commercial basis in 1997 [40, 41]). Maize was chosen over the other cereals because it has the highest annual grain yield (8300 kg ha−1 ), and the seeds have a relatively high seed protein content (10%) resulting in potentially the highest recombinant protein yields per hectare. Maize is also relatively easy to transform and manipulate in the laboratory, while in the field it has the greatest capacity of all the cereals for rapid scale up. Maize is the most widely cultivated crop in North America, so the complex infrastructure for growing, harvesting, processing, storing and transporting large volumes of corn is
7
8 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
already in place. Prodigene is developing maize for the production of a range of pharmaceutical and technical proteins, including recombinant antibodies, vaccine candidates and enzymes [5, 42, 43]. 2.2.2 Rice (Oryza sativa) Rice is the most important staple food crop in the world and has also emerged as the model cereal species (it is the only terrestrial plant other than Arabidopsis thaliana to benefit from a completed genome sequence, and extensive EST resources are also available). Rice has a lower annual grain yield than maize (6600 kg ha−1 ) and the grain has a lower protein content (8%), but like maize it is easy to transform and manipulate in the laboratory, a range of useful expression cassettes have been developed, and field populations can be scaled up rapidly. The major disadvantage of rice compared to maize is that the producer price is significantly higher, so it may be excluded as a major expression host in the West simply on economic grounds. However, the story may be different in Asia and Africa, where rice is traditionally grown. Several pharmaceutical proteins have been expressed in rice grains, including α-interferon [44], recombinant antibodies [39] and most recently a cedar pollen allergen [45]. Rice suspension cell cultures have also been used for the production of pharmaceutical proteins (Sect. 3.8).
2.2.3 Wheat (Triticum aestivium) The main advantages of wheat for molecular farming are the low producer price and the high protein content of the grain (>12%) [15]. The adoption of this species as a production crop will depend, however, on the development of better transformation procedures and stronger expression cassettes, since at the current time only low expression levels have been achieved in transgenic wheat grains [15]. The annual grain yield per hectare (2800 kg) is also much lower in wheat than in maize or rice, so larger areas of land would need to be cultivated to produce the same amounts of protein.
2.2.4 Barley (Hordeum vulgare) Barley, like wheat, has a low producer price and a high protein content in the grain (about 13%). Also like wheat, gene transfer and regeneration techniques are not so well developed as they are in maize and rice. A significant advantage of barley, however, is that while few molecular farming experiments have been carried out using this species, the results have generally been very encouraging in terms of recombinant protein yields. In early reports, recombinant glucanase and xylanase were expressed at very low levels in barley [46, 47]. More recently, however, a recombinant diagnostic antibody (SimpliRED, for the detection of HIV) was expressed in transgenic grains at levels exceeding 150 µg g−1 [48], and a recombinant cellulase enzyme was expressed at levels exceeding 1.5% total seed protein [49]. Further recombinant proteins will need to be expressed in barley before its performance can be compared meaningfully with maize and rice.
2 Host Species for Molecular Farming
2.2.5 Soybean (Glycine max) The advantage of soybean as a production crop is that while it has a relatively low annual grain yield compared to maize and rice (about 2500 kg ha−1 ), the protein content of the seed is very high (> 40%) which means that it has the highest potential recombinant protein yields per hectare of any seed crop. This, combined with its relatively low producer price, makes it one of the least expensive crops for recombinant protein production. However, these advantages are balanced by the more difficult transformation and regeneration procedures. Therefore, only one report of molecular farming in soybean has been published, that of a humanized antibody against herpes simplex virus, and this was produced constitutively in the plant rather than in the seeds alone [50]. Although the use of soybean as a production host has been limited, the anti-HSV2 antibody produced in soybean is one of the few plant-derived antibodies in clinical development [51]. Another potential advantage of soybeans is the ability to express proteins in the seed coat [52]. This structure contains very few complex soluble proteins, which makes isolation of the target protein simple and inexpensive. Furthermore, the seed coats are easily removed and separated in the milling process, which makes them readily attainable and free of contamination from other components of the seed. 2.2.6 Pea (Pisum sativum) Pea has a similar annual grain yield and seed protein content to soybean, and therefore has the same potential in terms of high recombinant protein yields per hectare. However, the producer price is about 50% higher than that of soybean, so proportionately higher yields would be necessary to make this a more competitive crop. To date, only a single pharmaceutical protein has been expressed in pea, a recombinant scFv antibody recognizing a cancer antigen. This was expressed under the control of the seed-specific legumin A promoter, and the maximum expression level achieved was 9 µg g−1 of seed [53]. Trials have also been carried out with field pea varieties producing the enzyme α-amylase from Bacillus licheniformis under the control of the bean USP (unknown seed protein) promoter. In this study, a yield of 7000 enzyme units per kg of seed was achieved. 2.2.7 Pigeon pea (Cajanus cajan) Pigeon pea is often used in Africa, the Middle East and South Asia as a fodder crop, and is therefore suitable as an expression host for animal vaccines against diseases endemic in those areas. In one report thus far, pigeon pea plants have been generated expressing Rinderpest virus hemagglutinin at a level just below 0.5 % TSP, with the aim of protecting ruminants against the disease caused by this virus [54]. As with many crops envisaged as delivery vehicles for vaccines, this is a very specialized application and the crop is unlikely to be used as a general production system for recombinant proteins. 2.2.8 Peanut (Arachis hypogaea) Peanuts are used as fodder in much of Africa and Asia, making this species also a suitable delivery vehicle for vaccines against animal diseases. As is the case for
9
10 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
pigeon pea, transgenic peanut plants expressing Rinderpest virus hemagglutinin have been generated, but more general applications are unlikely [55, 56]. 2.3 Fruit and vegetable crops
Most fruit and vegetable crops that have been considered as hosts for molecular farming have been chosen not because they are particularly advantageous for bulk production, but because they might serve as vehicles for the delivery of edible vaccines. Therefore, the choice of a fruit or vegetable production crop usually reflects either geographical or cultural preferences that suit the production and distribution of a given vaccine, rather than any intrinsic advantages in terms of yields or stability that each species may exhibit. The exceptions to this general rule are potato and tomato, both of which have merits as general production platforms. 2.3.1 Potato (Solanum tuberosum) The potato is the fourth most important food crop (after rice, wheat and maize) and is therefore widely grown throughout the world. It also has a very high tuber biomass yield (about 125 tonnes per hectare annually) which makes it eminently suitable for bulk protein production. Like cereals and grain legumes, potato plants have specialized storage organs (tubers) that are adapted for the accumulation of large amounts of protein. Targeting recombinant proteins to these organs therefore enhances protein stability in a manner similar to seed endosperm-specific expression in cereals. The first example of molecular farming in potato was the expression of human serum albumin in 1990, although in this case the protein was expressed in leaves [18]. Potato leaves have also been used to express an industrial cellulase [9]. Pharmaceutical proteins that have been expressed in potato tubers include human glutamic acid decarboxylase [57, 58], human interferons [59], human interleukins [60, 61] as well as various diagnostic and therapeutic antibodies [62, 63]. Potato has also been used to express nutrition-enhancing proteins, such as human milk casein, and antibacterial lysozyme as potential additives to baby food [64, 65]. The most widespread use of this species, however, has been in the production of vaccine candidates. At least 10 different vaccine subunits have been expressed in potato [66], and in three cases thus far these have been used in clinical trials [67–69]. Recently, potato was used simultaneously to express three vaccine antigens: cholera toxin B and A2 subunits, rotavirus enterotoxin and enterotoxigenic Escherichia coli fimbrial antigen for protection against several enteric diseases [70]. The only disadvantage of using potatoes for the delivery of vaccines is that potatoes are cooked before eating in normal domestic settings, and heating may denature the recombinant proteins and render them inactive in terms of eliciting an appropriate immune response. This limitation does not apply to the other fruit and vegetable crops discussed below. A disadvantage of potatoes for the production of other pharmaceutical proteins, which need to be isolated and purified, is that the large tuber starch content may interfere with downstream processing.
2 Host Species for Molecular Farming
2.3.2 Tomato (Lycopersicon esculentum) Tomato plants have a high fruit biomass yield (about 60 tonnes per hectare per year) and offer other advantages in terms of containment, because they are grown in greenhouses. Tomato fruits have therefore been investigated as a general production system in molecular farming, and have been used to express one recombinant antibody (an scFv recognizing carcinoembryonic antigen) [15] and several potentially pharmaceutical proteins (e.g. angiotensin-converting enzyme). As with potatoes, however, the most widespread use of tomato fruits thus far has been in the expression of vaccine candidates. The first such report involved the expression of rabies surface glycoprotein, which achieved the relatively high expression level of 1% TSP [71]. Other vaccines that have been expressed in tomato include cholera toxin B sub-unit [72] respiratory syncytial virus-F protein [73] and, most recently, hepatitis E virus partial OFR2 [74] and the B subunit of E. coli heat-labile enterotoxin [75]. On the negative side, tomatoes must be chilled after harvest to keep the recombinant proteins stable, and the yields of recombinant proteins in tomatoes are generally relatively low at the present time. However, this could be addressed in some cases by transforming the fruit chromoplasts rather than the nuclear genome (Sect. 3.2). 2.3.3 Carrot (Daucus carota) The carrot taproot is a useful site for protein accumulation since this is both a natural storage organ and the edible portion of the plant. Unlike potato, carrots do not need to be cooked prior to consumption, so vaccines are more likely to remain in their native conformation. Several antigens have already been expressed in carrot, including the Mycobacterium tuberculosis MPT64 protein [76], a diabetes-associated autoantigen (an isoform of glutamic acid decarboxylase) [77] and various derivatives of measles hemagglutinin [78, 79]. 2.3.4 Banana (Musa spp.) Vaccine production in transgenic plants began with the idea that transgenic bananas could be used to administer oral vaccines to adults (consuming whole fruits) and children (fed with banana paste or puree). Banana plants are cheap to grow and the fruits are a staple food source in many developing countries where vaccination campaigns are needed the most. One limitation of bananas, however, is that the transgenic plants take nearly two years to produce. Transgenic banana plants have been produced expressing vaccines against measles virus and hepatitis B virus. In order to prevent vaccine-containing bananas entering the food chain, attempts are in progress to introduce a marker gene that turns the banana flesh blue, enabling them to be distinguished from wild type fruits. 2.4 Oilcrops
Oilcrops are potentially advantageous for molecular farming because the oil bodies in developing seeds can trap the recombinant proteins and can be used to facilitate
11
12 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
extraction and processing. Oil bodies are seed-specific organelles whose function is to accumulate triacylglycerides. Each oil body comprises a triacylglyceride core surrounded by a phospholipid membrane, which is peppered with oil-body-specific proteins termed oleosins. Recombinant protein can be trapped in the oil bodies by expressing them as oleosin-fusion proteins. A proprietary system developed by the biotechnology company SemBioSys Inc., Calgary, Canada, can then be used to extract the oil bodies and isolate the recombinant protein by endoproteolytic cleavage. It may also be possible to exploit the natural oils normally extracted from such crops as byproducts, which can be used to offset production costs.
2.4.1 Rapeseed/Canola (Brassica napus) Brassica napus is a widely grown crop used primarily for the production of oil, which is classed as either rapeseed oil or canola oil depending on its quality and content. The seeds of this plant also have a high protein content (25%), which makes them suitable for molecular farming. Transgenic rapeseed/canola plants can be produced relatively easily, and scale-up is rapid due to the large number of seeds produced by each plant. Although the overall biomass yields from this crop are comparatively low (about 1 tonne per hectare) significant cost savings can be achieved during downstream processing steps by targeting recombinant proteins to the oil bodies and using these organelles to facilitate protein extraction and purification. This was first achieved in the case of leech hirudin by fusing the Hirudo medicinalis hirudin cDNA to the plant’s oleosin gene. The recombinant hirudin accumulated to 1% total seed protein, and following extraction of the oil bodies and purification of the fusion protein, the leech protein could be removed from its fusion partner by endoproteolytic cleavage in vitro [80]. As well as providing a simple extraction and purification process, the expression of pharmaceutical proteins as fusions renders them inactive, and thus poses less risk to both the developing plant and any other organisms that come into adventitious contact with it. Enzymes have also been produced in rape-seeds, including a bacterial xylanase and Aspergillus phytase [81]. Seeds expressing recombinant phytase are commercially available as a feed additive. One disadvantage of the oil body system is that proteins directed to oil bodies do not pass through the secretory pathway and therefore are not glycosylated. For this reason, the oil body system is not useful for the production of glycoproteins in cases where the N-glycans are necessary for function or activity. Another useful feature of B. napus is the rapid sprouting of the seeds, which is the basis of a distinct platform technology being developed by UniCrop Ltd, Helsinki, Finland. In this method, proteins are expressed in the developing sprouts using cotyledon-specific Rubisco small subunit promoters, and proteins are extracted from the sprouts which are grown in an airlift tank. Although lower yields are produced in this method compared to agricultural scale production, the added advantage of containment may be useful for the production of pharmaceutical proteins under defined conditions.
2 Host Species for Molecular Farming
2.4.2 Falseflax (Camelina sativa) Falseflax is a self-pollinating oilseed crop, rarely used for food, which originates from the Fertile Crescent. This is also being developed as a sprout-based production platform by UniCrop Ltd, Helsinki, Finland. 2.4.3 Safflower (Carthamus tinctorius) Safflower has many qualities in common with rapeseed including the suitability for oleosin fusion technology, and has been chosen as a platform crop by SemBioSys Inc., Calgary, Canada, for the following reasons: Safflower plants are readily transformed, they grow counter-seasonally, and they can be contained easily (there are no weedy relatives in the West, and seeds show minimal dormancy). The system is easily scalable and can produce clinical quantities of pure protein with an unprecedented manufacturing capacity. In addition, the seed-based system offers seasonally-independent availability of raw materials and improved inventory management since the recombinant proteins are stable in transgenic seeds for extended periods. 2.5 Unicellular Plants and Aquatic Plants Maintained in Bioreactors
Single-celled plants and aquatic plants can be maintained in bioreactors, which offers two critical advantages over molecular farming in terrestrial plants. First, the growth conditions can be controlled precisely, which means that optimal growth conditions can be maintained, batch-to-batch product consistency is improved and the growth cycle can conform to good manufacturing practice (GMP) procedures. Second, growth in bioreactors offers complete containment, thus sidestepping the environmental biosafety issues associated with all transgenic terrestrial plants, whether or not they are used for molecular farming. Although more expensive than agricultural molecular farming, the use of simple plants in bioreactors is not as expensive as cultured animal cells because the media requirements are generally very simple. Added to this, the proteins can be secreted into the medium, which reduces the downstream processing costs and allows the product to be collected in a non-destructive manner. A final, major advantage is the speed of production. The time from transformation to first product recovery is on the scale of days to weeks because no regeneration is required, and stable producer lines can be established in weeks rather than months to years because there is no need for crossing, seedcollection and the testing of several filial generations to check transgene stability. Three major bioreactor-based systems are currently under commercial development: algae, moss and duckweed. 2.5.1 Chlamydomonas reinhardtii Although the alga Chlamydomonas reinhardtii is a model organism which has been instrumental in the study of photosynthesis and light-regulated gene expression, it has only recently been explored as a potential host for molecular farming. A single report discusses the production of monoclonal antibodies in algae [82], and
13
14 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
shows that the production costs are similar to those of recombinant proteins produced in terrestrial plants, mainly due to the inexpensive media requirements (the medium does not cost very much to start with, and in any case can be recycled for algal cultures grown in continuous cycles). Aside from the economy of producing recombinant proteins in algae, there are further attributes that make alga ideal candidates for recombinant protein production. First, transgenic algae can be generated quickly, requiring only a few weeks between the generation of initial transformants and their scale up to production volumes. Second, both the chloroplast and nuclear genome of algae can be genetically transformed, providing scope for the production of several different proteins simultaneously. In addition, algae have the ability to be grown on various scales, ranging from a few milliliters to 500,000 liters in a cost-effective manner. These attributes, and the fact that green algae fall into the GRAS (generally regarded as safe) category, make C. reinhardtii a particularly attractive alternative to other plants for the expression of recombinant proteins. The production technology has been reviewed recently [83]. 2.5.2 Physcomitrella patens The moss Physcomitrella patens is a haploid bryophyte which can be grown in bioreactors in the same way as algae, suspension cells and aquatic plants. Like these other systems, it has the advantages of controlled growth conditions, synthetic growth media, and the ability to secrete recombinant proteins into the medium [84]. The unique feature of this organism, relative to all other plants, is that it is amenable to homologous recombination [85]. This means that not only can it be transformed stably with new genetic information, but that endogenous genes can be disrupted by gene targeting. The major application of gene targeting in molecular farming is the modification of the glycosylation pathway (by knocking out enzymes that add non-human glycan chains to proteins) thus allowing the production of humanized glycoproteins [84, 86]. The P. patens system is being developed by the German biotechnology company Greenovation Biotech GmbH, which is based in Freiburg. The company has developed transient expression systems that allow feasibility studies, and stable production strains that can be scaled up to several thousand liters. 2.5.3 Duckweed (Lemna minor) The Lemna System developed by the US biotechnology company Biolex Inc. has a number of significant advantages for the production of recombinant pharmaceutical proteins [87]. Unlike transgenic terrestrial plants, this aquatic plant is cultured in sealed, aseptic vessels under constant growth conditions (temperature, pH and artificial light). Only very simple nutrients are required (water, air and completely synthetic inorganic salts) and under these conditions, the plant proliferates vegetatively and doubles its biomass every 36 hours. This provides the optimal production environment for batch-to-batch consistency. Duckweed constitutes about 30% dry weight of protein, and recombinant proteins can either be extracted from wet plant biomass or secreted into the growth medium. Biolex Inc. has reported the successful expression of 12 proteins in this system, including growth hormone, interferon-α, and several recombinant antibodies and enzymes [87].
3 Expression systems for molecular farming
2.6 Non-cultivated Model Plants
Rather than study every different species in the world, researchers focus on model organisms to learn general principles that can be applied to a wider range of life forms. Model organisms are often chosen for historical reasons, or because they are particularly easy to handle and manipulate in the laboratory. Model organisms may be chosen because they have short generation intervals or other features that make them amenable to genetic analysis. They may be chosen because they are particularly suitable for genetic manipulation or surgical procedures, or they may have a small, compact genome compared to related species. Often, a combination of the above features is present. Model organisms are rarely, if ever, chosen because of their commercial value. In terms of molecular farming, the advantage of model plants is that they are very well-characterized, which means that there is a disproportionately large amount of genetic, biochemical and physiological data available about them. This makes them suitable test systems for the design of novel expression systems. Three model plants, rice, tobacco and C. reinhardtii, have already been discussed in this section. The most important model plant of all, however, is Arabidopsis thaliana. 2.6.1 Arabidopsis thaliana Arabidopsis thaliana is a small dicot plant of the mustard family. It has a number of features that make it ideal as a model organism: e.g. its size, short life cycle, the fact that it produces large numbers of seeds, and its relatively small genome of 125 Mb. The genome sequence of A. thaliana was completed in 2000, extensive EST resources are available and in terms of functional annotation, more is known about this higher plant than any other. The availability of such large amounts of information, together with the ease of transformation by methods such as floral dipping, make A. thaliana a convenient host species to use as a test system for molecular farming. Various pharmaceutical and industrial proteins have been expressed in the leaves, seeds and undifferentiated callus of this plant, including an Acidothermus en-doglucanse which accumulated to 26% of total soluble protein (TSP) [10]. Despite the encouraging results in terms of protein expression levels, however, A. thaliana is not suitable for cultivation on an agricultural scale. In this setting, its small size, weediness and low biomass yield are disadvantages and the plant has no commercial value. Therefore, while useful as a test system, it is unlikely this species will ever be used for commercial molecular farming.
3 Expression systems for molecular farming
While the range of available expression hosts for molecular farming is impressive, the choice becomes even more varied when the different expression systems are considered (Table 2). In this chapter, an expression host and an expression system are considered as separate entities: an expression host is defined as a
15
16 Large-scale Protein Production in Plants: Host Plants, Systems and Expression Table 2 Comparison of different plant-based production systems
System
Advantages
Disadvantages
Transgenic plants, accumulation within plant Transgenic plants, secretion from roots or leaves Transplastomic plants
Economy, biomass yields, scalability, establishment of permanent lines Containment, purification
Production timescale, regulation, biosafety, product yields Scale, cost of production facilities
Yield, multiple gene expression, low toxicity, containment Yield, timescale, mixed infections, epitope presentation systems Timescale Timescale, containment, secretion into medium (purification), regulatory compliance
Absence of glycosylation
Virus infected plants
Agroinfiltration Cell or tissue culture
Construct size limitations
Cost, scalability Cost, scalability
particular species while an expression system is defined as a transformation, propagation and expression strategy. The combination of host and system results in an expression platform, e.g. stably transformed rice seeds, transiently transformed alfalfa leaves, stably transformed tobacco suspension cells, virus-infected spinach leaves etc. Note that this is not a universal nomenclature, and in the literature the terms host, system, platform, strategy etc. appear to be used interchangeably. By far the predominant system used in molecular farming is the nuclear transgenic plant with the recombinant protein accumulating within the plant tissues [4]. However, biosafety concerns and the disadvantage of long development phases have driven researchers to look at alternative systems based either on transient expression, extra-nuclear transgenesis, or the use of cultured plants, organs or cells. 3.1 Transgenic plants
Transgenic plants usually contain foreign DNA incorporated into the nuclear genome. Such plants have a combination of valuable attributes for molecular farming, but in terms of commercialization the most important of these are the low overall cost of production, the inherent scalability of agricultural systems, the fact that stable transgenic lines are a permanent resource, and the variety of production hosts suitable for different applications. Scalability is probably the most important advantage because the cost of recombinant proteins produced in field plants is inversely proportional to the production scale. In a market which can see demand rapidly increase and decrease, fermenter-based production systems are often unable to cope or left with surplus capacity. In contrast, the scale of plant-based production can be modulated rapidly simply by using more or less land as required.
3 Expression systems for molecular farming
It can take several years to achieve e.g. a tenfold scale-up or scale-down in fermenter systems or in transgenic animals, but a field of transgenic plants can be scaled up or down more than 1000fold in a single generation by planting greater or smaller numbers of seeds [88]. The two major disadvantages of transgenic plants are the development timescales and the increasing importance of biosafety and regulatory compliance. The ‘geneto-protein’ time for transgenic plants encompasses the preparation of expression constructs, transformation, regeneration and the production and testing of several generations of plants. The testing phase is necessary to ensure transgene and expression stability and the biochemical activity of the product, as well as the absence of adverse phenotypic changes in the host plant. These processes take up to two years depending on the plant species although milligram amounts of protein might be available after several months for initial testing [5]. Biosafety concerns include the risk of transgenic plants becoming naturalized outside their intended production sites due to human or animal activities, the risk of transgene spread by outcrossing or horizontal gene transfer, and the risk to human health and the environment caused by the presence of potentially toxic recombinant proteins. These issues are discussed at more length. One further limitation is the low yields that are obtained in many transgenic plants. In some cases, this reflects the poor performance of the expression construct and can be addressed by improved construct design (see Sect. 3.4). However, in other cases the problem is intrinsic to the transgenic plant, and reflects rate limiting processesof transgene expression, protein synthesis or protein turnover. 3.2 Transplastomic plants
Transplastomic plants are transgenic plants generated by introducing DNA into the chloroplast genome, usually by particle bombardment [89, 90]. The plants are grown in the same way as nuclear transgenic plants and therefore suffer the same disadvantages in terms of production timelines. However, the advantages of chloroplast transformation are many: the transgene copy number is high because of the many chloroplasts in a typical photosynthetic cell, there is no gene silencing, multiple genes can be expressed in operons, the recombinant proteins accumulate within the chloroplast thus limiting toxicity to the host plant, and the absence of functional chloroplast DNA in the pollen of most crops provides natural transgene containment. The high transgene copy numbers and the absence of silencing have resulted in extraordinary expression levels, e.g. 25% TSP for a tetanus toxin fragment [91], 11% TSP for human serum albumin [92] and 6% TSP for a thermostable xylanase [93]. The ability to express multiple genes in operons means that chloroplast expression will be particularly amenable to the production of multisubunit proteins such as antibodies. However, while proteins expressed in the chloroplast have been shown to fold properly and form appropriate disulfide bonds, glycosylation does not occur, so this system has limited use for the production of therapeutic glycoproteins. The biosafety advantages of the chloroplast
17
18 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
system are particularly noteworthy since the inability of functional chloroplast DNA to reach the egg in most commercial crop species means that transgene spread by outcrossing is strongly inhibited. However, other biosafety concerns such as seed spread by human and animal activities are just as prevalent in transplastomic crops as they are for standard transgenic crops. At the current time, chloroplast transformation is a routine procedure only in tobacco and C. reinhardtii (see Sect. 2.5.1). However, plastid transformation has been achieved in a growing number of other species, including carrot and tomato [94]. The ability to transform the chromoplasts of fruit and vegetable crops has obvious advantages for the expression of subunit vaccines [94]. 3.3 Virus-infected plants
Recombinant plant viruses have been used as expression vectors because the infections are rapid and systemic, leading to high levels of recombinant protein production soon after inoculation [95]. Compared to transgenic and transplastomic plants, the development phase is significantly reduced, but the scalability of virusmediated expression is equivalent to that of other field plant expression systems. No known plant viruses integrate into the genome, so the genetic modification of plants is entirely avoided. Two major strategies have been developed with viral vectors: the expression of full-length recombinant proteins and the presentation of foreign epi-topes on the surface of viral particles. Both TMV and PVX have been used in the context of the first strategy to produce antibodies, vaccine candidates and some other pharmaceutical proteins. Cowpea mosaic virus (CPMV) and AlMV are the most popular epitope presentation systems, but TMV and PVX have also been used for this purpose. Perhaps the most significant use of viral expression systems for molecular farming was described by McCormick and colleagues [96]. These investigators used TMV vectors in N. benthamiana to produce a scFv antibody based on the idiotype of malignant B-cells from the murine 38C13 B-lymphoma cell line. When administered to mice, the recombinant protein stimulated the production of anti-idiotype antibodies capable of recognizing 38C13 cells, providing immunity against lethal challenge with the lymphoma. This strategy could be used to develop personalized therapies for diseases such as non-Hodgkin’s lymphoma. Antibodies capable of recognizing unique markers on the surface of any malignant B-cells could be produced for each patient. The necessity for speed in the derivation of such prophylactic antibodies is recognized by the use of viral vectors rather than transgenic plants. These antibodies are now undergoing Phase I clinical trials. Another important development in the use of viral vectors was the production of full-size immunoglobulins in N. benthamiana plants using two TMV vectors [97]. One of the vectors expressed the heavy chain and one the light chain. This study showed that viral coexpression was compatible with the correct assembly and processing of multimeric recombinant proteins. Many epitopes from human and animal pathogenic viruses and bacteria have been expressed as coat protein fusions in CPMV and AlMV, and in most cases
3 Expression systems for molecular farming
mice either injected with plant extracts or administered the extracts intranasally developed suitable immune responses. 3.4 Transiently transformed leaves
Transient expression assays are often used to evaluate expression constructs or test the functionality of a recombinant protein before committing to the long term goal of transgenic plants. However, transient expression can also be used as a routine molecular farming method if enough protein can be produced to make the system economically viable. An example of a transient expression system is the agroinfiltration method, where recombinant Agrobacterium tumefaciens are infiltrated into leaf tissue and genes carried on the T-DNA are expressed for 2–5 days without integration. Agroinfiltration was developed in tobacco [98], but there appears to be no intrinsic limitation to the range of species that can be used, since preliminary data obtained at the RWTH Aachen demonstrates the successful infiltration of over 20 different plant species (Markus Sack, personal communication). Although originally considered difficult to scale up, agroinfiltration is now known to be suitable for the routine production of milligram amounts of protein in a timescale of weeks. Scientists at Medicago Inc., Qu´ebec, Canada, regularly process up to 7500 infiltrated alfalfa leaves per week and similarly we have shown that up to 100 kg of wild type tobacco leaves can be processed by agroinfiltration, resulting in the production of 50-150 mg of protein per kg (Stefan Schillberg, personal communication). A number of different antibodies and their derivatives have been produced by agroinfiltration, including the full-size IgG T84.66 (along with its scFv and diabody derivatives) [99], and a chimeric full-size IgG known as PIPP which recognizes human chorionic gonadotropin [100]. Recently, several reports have described how agroinfiltration can be scaled-up more efficiently. Baulcombe and colleagues have shown that the loss of protein expression seen a few days after agroinfiltration is predominantly caused by gene silencing. They managed to increase the expression levels of several proteins at least 50-fold by co-expressing the p19 protein from tomato bushy stunt virus, a known inhibitor of gene silencing [101]. 3.5 Hydroponic cultures
In most cases where nuclear transgenic plants have been used for the production of recombinant protein, the proteins have been extracted from plant tissues. An alternative is to attach a signal peptide to the recombinant protein thus directing it to the secretory pathway. In this way, the protein can be recovered from the root exudates or leaf guttation fluid, processes known respectively as rhizosecretion and phyllosecretion [13, 14]. Although not widely used, the secretion of recombinant proteins into hydroponic culture medium is advantageous because no cropping or harvesting is necessary. The technology is being developed by the US biotechnology company Phytomedics Inc. In an exciting recent development, a monoclonal
19
20 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
antibody was shown to be secreted into hydroponic culture medium resulting in a yield of 11.7 µg antibody per gram of dry root mass per day [102]. 3.6 Hairy roots
Hairy roots are neoplastic structures that arise following transformation of a suitable plant host with Agrobacterium rhizogenes. If the plant is already transgenic, or if the transforming A. rhizogenes strain is transgenic and transfers the foreign gene to the host plant during the process of transformation, then hairy root cultures can be initiated which will produce recombinant proteins and secrete them into the growth medium [103]. Hairy roots grow rapidly and can be propagated indefinitely in liquid medium. Thus far, hairy root cultures have been used to produce a relatively small number of antibodies [104–106] mainly because of the relative ease with which multi-subunit proteins can be produced. The cultures can be initiated from transgenic plants already carrying multiple transgenes, wild type plants can be infected with multiple A. rhizogenes strains, or established hairy root cultures can be super-transformed with A. tumefaciens. 3.7 Shooty teratomas
Shooty teratomas are differentiated cell cultures produced by transformation with certain strains of A. tumefaciens [107]. Thus far, there has been only one report of pharmaceutical protein production in teratoma cultures, and the levels of antibody were very low [106]. 3.8 Suspension cell cultures
Suspension cell cultures are usually derived from callus tissue by the disaggregation of friable callus pieces in shake bottles or fermenters of liquid medium. Recombinant protein production is achieved by using transgenic explants to derive the cultures, or transforming the cells after disaggregation, usually by co-cultivation with A. tumefaciens. Suspension cultures have the same advantages as the simple plants discussed in Sect. 2.5, i.e. controlled growth conditions, batch-to-batch reproduci-bility, containment and production under GMP procedures. The main disadvantage is the scale of production, although tobacco suspension cells have been cultivated at volumes of up to 100,000 liters [108]. Many foreign proteins have been expressed successfully in suspension cells, including antibodies, enzymes, cytokines and hormones [108–112]. Tobacco cultivar Bright Yellow 2 (BY-2) is the most popular source of suspension cells for molecular farming, since these proliferate rapidly and are easy to transform. However, rice suspension cells have been used to produce several biopharmaceutical proteins [113–117] and soybean suspension cells have been used to produce a hepatitis B vaccine candidate [118].
4 Expression strategies and protein yields
Proteins expressed in suspension cells can either be extracted from the wet biomass or secreted into the culture medium for continuous, non-destructive recovery.
4 Expression strategies and protein yields
Finally in this chapter, we consider some general strategies for the control of gene expression and protein accumulation in plants. These strategies play an important role in defining the overall yields of recombinant proteins, but they also have wider impact, e.g. on biosafety and product authenticity. To achieve high yields, all stages of gene expression must be optimized, including transcription, mRNA stability, mRNA processing, protein synthesis, protein modification, protein accumulation and protein stability. Since expression constructs are chimeric structures in which the transgene is enclosed by various regulatory elements, the considered choice of these regulatory elements is an essential component of the development phase in molecular farming. For high-level transcription, the two most important elements are the promoter and the polyadenylation site. In dicot plants, the cauliflower mosaic virus (CaMV) 35S promoter is the most popular choice because it is strong and constitutive, and therefore drives high-level transgene expression in leaves, fruits, tubers, roots and any other relevant organs [119, 120]. The promoter can be made even more active by various modifications, such as duplicating the enhancer region [121]. In monocots, where seed expression is the normal strategy, the CaMV 35S promoter has lower activity and is generally replaced either with the maize ubiquitin-1 promoter (which is constitutive, but drives high level transgene expression in seeds [122]), or with a seed-specific promoter from a seed storage protein gene (e. g. maize zein, rice glute-lin, bean unknown seed protein (USP), pea legumin). The seed-specific arc5-I promoter from the common bean (Phaseolus vulgaris) has been used to express a single chain antibody in Arabidopsis thaliana, and this resulted in the accumulation of 36 times more recombinant protein than when the transgene was driven by the CaMV 35S promoter [123]. Other tissue-specific promoters that are useful in molecular farming include fruit specific promoters for tomato and tuber-specific promoters for potato. In each case, restriction of transgene expression to the target tissue prevents expression in vegetative organs, therefore reducing any negative impact of the recombinant protein on normal plant growth and development, and limiting the exposure of non-target organisms such as pollinating insects or microbes in the rhizo-sphere. Inducible promoter systems are also valuable assets in molecular farming because transgene expression can be controlled externally [124]. One example is the mechanical gene activation (MeGA) system that was developed at CropTech Corp., VA. This utilizes a tomato hydroxy-3-methylglutaryl CoA reductase 2 (HMGR2) promoter, which is inducible by mechanical stress. Transgene expression is activated when harvested tobacco leaves are sheared during processing, which leads to the rapid induction of protein expression, usually within 24 hours. Another potentially
21
22 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
useful inducible promoter that has been described recently is the peroxidase gene promoter from sweet potato (Ipomoea batatas). When linked to the gusA reporter gene, this promoter produced 30 times more GUS activity than the CaMV 35S promoter following exposure to hydrogen peroxide, wounding or ultraviolet light [125]. Other parts of the expression construct are also important. A strong polyadenylation signal is required for transcript stability and those from the CaMV 35S transcript, the Agrobacterium tumefaciens nos gene and the pea ssu gene are popular choices. In monocots, the presence of an intron in the 5 untranslated region of the expression construct has been shown to improve transgene expression [126]. The structure of the 5 and 3 untranslated regions should also be inspected for AU-rich sequences, which can act as cryptic splice sites and instability elements. Some sequences, such as the 5 leader of the petunia chalcone synthase gene, have been identified as translational enhancers and these can be incorporated into the expression construct to boost protein synthesis. Other important factors that influence translation include the presence of a consensus Kozak sequence, the absence of multiple AUG codons, and the disparity of codon bias between the transgene donor and the host species [127]. One of the most important considerations for the improvement of protein yields is subcellular protein targeting, because the compartment in which a recombinant protein accumulates strongly influences the interrelated processes of folding, assembly and post-translational modification. All of these contribute to protein stability and hence help to determine the final yield [88]. Comparative targeting experiments with full size immunoglobulins and single chain Fv fragments have shown that the secretory pathway is often a more suitable compartment for folding and assembly than the cytosol, and is therefore advantageous for high-level protein accumulation [128, 129]. Because many plant-derived recombinant proteins under development are human proteins that normally pass through the endomembrane system, this principle can be applied not only to antibodies but also more generally. However, it is not a universal rule, since there are examples of proteins that are more abundant when directed to the cytosol than the secretory pathway, e.g. α-galactosidase A. Antibodies targeted to the secretory pathway using either plant or animal N-terminal signal peptides usually accumulate to levels that are several orders of magnitude greater than those of antibodies expressed in the cytosol. Even with antibodies, however, there are occasional exceptions, and this suggests that intrinsic features of each antibody might also contribute to overall stability [130, 131]. The endoplasmic reticulum provides an oxidizing environment and an abundance of molecular chaperones, while there are few proteases. These are likely to be the most important factors affecting protein folding and assembly. It has been shown recently that antibodies targeted to the secretory pathway in transgenic plants interact specifically with the molecular chaperone BiP [132]. In the absence of further targeting information, the expressed protein is secreted to the apoplast. The stability of antibodies in the apoplast is lower than in the lumen of the ER. Therefore, antibody expression levels can be increased up to ten times
References
higher if the protein is retrieved to the ER lumen using an H/KDEL C-terminal tetrapeptide tag [133]. Again, although the principles of ER-retention in molecular farming have been established using antibodies, it is likely that they will also apply to many other proteins.
5 Conclusions
In this chapter, we have looked at the properties of different expression hosts and expression systems, and considered some of the available strategies to control transgene expression and protein accumulation. When all these variations are combined, there exists a very diverse range of potential expression platforms that can be used to produce recombinant proteins. The choice depends on many factors, some intrinsic to the plant species or expression system, some dependent on the recombinant protein and its intended use, and some determined by external factors such as regional, economic and regulatory constraints. There is no ideal production platform for molecular farming, and each of the host plants and systems described in this chapter has its merits and drawbacks. As ever, the choice of production platform therefore should be determined empirically, and on a case-by-case basis.
References 1 R. FISCHER, N. EMANS, Transgenic Res. 2000, 9 (3-4), 279–299. 2 G. GIDDINGS, G. ALLISON, D. BROOKS et al., Nature Biotechnol. 2000, 18 (11), 1151–1155. 3 J.K.-C. MA, P. M. W. DRAKE, P. CHRISTOU, Nature Rev. Genet. 2003, 4 (10), 794–805. 4 R. M. TWYMAN, E. STOGER, S. SCHILLBERG et al., Trends Biotechnol. 2003, 21 (12), 570–578. 5 E. E. HOOD, S. L. WOODARD, M. E. HORN et al., Curr. Opin. Biotechnol. 2002, 13 (6), 630–635 6 R. L. EVANGELISTA, A. R. KUSNADI, J. A. HOWARD et al., Biotechnol. Prog. 1998, 14 (4), 607–614 7 H. S. MASON, H. WARZECHA, T. MOR et al., Trends Mol. Med. 2002, 8 (7), 324–329. 8 D. M. DENBOW, E. A. GRABAU, G. H. LACY et al., Poultry Sci. 1998, 77 (6), 878–881. 9 Z. Y. DAI, B. S. HOOKER, D. B. ANDERSON et al., Mol. Breeding 2000, 6 (3), 277–285.
10 M. T. ZIEGLER, S. R. THOMAS, K. J. DANNA, Mol. Breeding 2000, 6 (1), 37–46. 11 T. ZIEGELHOFFER, J. A. RAASCH, S. AUSTINPHILLIPS, Mol. Breeding 2001, 8 (2), 147–158. 12 M. MOLONEY, J. BOOTHE, G. Van ROOIJEN, US Patent 6, 509, 453, 2003. 13 N. V. BORISJUK, L. G. BORISJUK, S. LOGENDRA et al., Nature Biotechnol. 1999, 17 (5), 466–469. 14 S. KOMARNYTSKY, N. V. BORISJUK, L. G. BORISJUK et al., Plant Physiol. 2000, 124 (3), 927–933. 15 E. STOGER, M. SACK, Y. PERRIN et al., Mol. Breeding 2000, 9 (3), 149–158. 16 U. CONRAD, U. FIEDLER, Plant Mol. Biol. 1998, 38 (1-2), 101–109. 17 A. BARTA, K. SOMMERGRUBER, D. THOMPSON et al., Plant Mol. Biol. 1986, 6 (5), 347–357. 18 P. C. SIJMONS, B. M. M. DEKKER, B. SCHRAMMEIJER et al., Bio/Technology 1990, 8 (3), 217–221. 19 A. HIATT, R. CAFFERKEY, K. BOWDISH, Nature 1989, 342 (6245), 76–78.
23
24 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
20 H. S. MASON, D. M. K. LAM, C. J. ARNTZEN, Proc. Natl Acad. Sci. USA 1992, 89 (24), 11745–11749. 21 J. PEN, L. MOLENDIJK, W. J. QUAX et al., Bio/Technology 1992, 10 (3), 292–296. 22 X. ZHANG, D. W. URRY, H. DANIELL, Plant Cell Rep. 1996, 16 (3-4), 174–179. 23 S. W. MA, D. L. ZHAO, Z. Q. YIN et al., Nature Med. 1997, 3 (7), 793–796. 24 R. MENASSA, V. NGUYEN, A. JEVNIKAR et al., Mol. Breeding 2001, 8 (2), 177–185. 25 M. CABANES-MACHETEAU, A. C. FITCHETTELAINE, C. LOUTELIERBOURHIS et al., Glycobiology 1999, 9 (4), 365–372. 26 H. BAKKER, M. BARDOR, J. W. MOLTHOFF et al., Proc. Natl Acad. Sci. USA 2001, 98 (5), 2899–2904. 27 E. BENVENUTO, R. J. ORDAS, R. TAVAZZA et al., Plant Mol. Biol. 1991, 17 (4), 865–874. 28 H. KHOUDI, S. LABERGE, J. M. FERULLO et al., Biotechnol. Bioeng. 1999, 64 (2), 135–143. 29 M. BARDOR, C. LOUTELIERBOURHIS, T. PACCALET et al., Plant Biotech. J. 2003, 1 (6), 451–462. 30 A. WIGDOROVITZ, C. CARRILLO, M. J. DUS SANTOS et al., Virology 1999, 255 (2), 347–353. 31 S. AUSTIN-PHILLIPS, R. G. KOEGEL, R. J. STRAUB et al., US patent application 5900525, 1999. 32 R. W. H. LEE, A. N. POOL, A. ZIAUDDIN et al., Mol. Breeding 2003, 11 (4), 259–266. 33 J. KAPUSTA, A. MODELSKA, M. FIGLEROWICZ et al., FASEB J. 1999, 13 (13), 1796–1799. 34 J. KAPUSTA, A. MODELSKA, T. PNIEWSKI et al., Adv. Exp. Med. Biol. 2001, 495, 299–303. 35 H. KOPROWSKI, Arch. Immunol. Ther. Exp. 2002, 50 (6), 365–369. 36 V. YUSIBOV, D. C. HOOPER, S. SPITSIN et al., Vaccine 2002, 20 (25-26), 3155–3164, 37 H. E. SUSSMAN, Drug Discov. Today 2003, 8 (10), 428–430. 38 V. SMART, P. S. FOSTER, M. E. ROTHENBERG et al., J. Immunol. 2003, 171 (4), 2116–2126. 39 E. STOGER, C. VAQUERO, E. TORRES et al., Plant Mol. Biol. 2000, 42 (4), 583–590. 40 E. E. HOOD, D. R. WITCHER, S. MADDOCK et al., Mol. Breeding 1997, 3 (4), 291–306.
41 D. R. WITCHER, E. E. HOOD, D. PETERSON et al., Mol. Breeding 1998, 4 (4), 301–312. 42 E. E. HOOD, Enzyme & Microbial Technol. 2002, 30 (3), 279–283. 43 E. E. HOOD, in: P. CHRISTOU, H. KLEE (eds) Handbook of Plant Biotechnology 2004, Wiley-VCH, NY, 791–800. 44 Z. ZHU, K. HUGHES, L. HUANG et al., Plant Cell Tiss. Org. Cult. 1994, 36 (2), 197–204. 45 A. OKADA, T. OKADA, T. IDE et al., Mol. Breeding 2003, 12 (1), 61–70. 46 L. G. JENSEN, O. OLSEN, O. KOPS et al., Proc. Natl Acad. Sci. USA 1996, 93 (8), 3487–3491. 47 M. PATEL, J. S. JOHNSON, R. I. S. BRETTELL et al., Mol. Breeding 2000, 6 (1), 113–123. 48 P. H. D. SCHUNMANN, G. COIA, P. M. WATERHOUSE, Mol. Breeding 2002, 9 (2), 113–121. 49 G. P. XUE, M. PATEL, J. S. JOHNSON et al., Plant Cell Rep. 2003, 21 (11), 1088–1094. 50 L. ZEITLIN, S. S. OLMSTED, T. R. MOENC et al., Nature Biotechnol. 1998, 16 (13), 1361–1364. 51 R. FISCHER, R. M. TWYMAN, S. SCHILLBERG, Vaccine 2003, 21 (7-8), 820–825. 52 R. A. VIERLING, J. R. WILCOX, Seed Sci. Technol. 1996, 24 (3), 485–494. 53 Y. PERRIN, C. VAQUERO, I. GERRARD et al., Mol. Breeding 2000, 6 (4), 345–352. 54 V. V. SATYAVATHI, V. PRASAD, A. KHANDELWAL et al., Plant Cell Rep. 2003, 21 (7), 651–658. 55 A. KHANDELWAL, G. L. SITA, M. S. SHAILA, Vaccine 2003, 21 (23), 3282–3289. 56 A. KHANDELWAL, K. J. M. VALLY, N. GEETHA et al., Plant Sci. 2003, 165 (1), 77–84. 57 T. ARAKAWA, D. K. X. CHONG, J. YU et al., Transgenics 1999, 3 (1), 51–60. 58 S. W. MA, A. M. JEVNIKAR, Adv. Exp. Med. Biol. 1999, 464, 179–194. 59 K. OHYA, T. MATSUMURA, K. OHASHI et al., J. Interf. Cytok. Res. 2001, 21 (8), 595–602. 60 B. ZHANG, Y. H. YANG, Y. M. LIN et al., Biotechnol. Lett. 2003, 25 (19), 1629–1635. 61 Y. PARK, H. CHEONG, Protein Express. Purif. 2002, 25 (1), 160–165. 62 O. ARTSAENKO, B. KETTIG, U. FIEDLER et al., Mol. Breeding 1998, 4 (4), 313–319.
References 63 C. De WILDE, K. PEETERS, A. JACOBS et al., Mol. Breeding 2002, 9 (4), 271–282. 64 D. K. X. CHONG, W. H. R. LANGRIDGE, Transgenic Res. 2000, 9 (1), 71–78. 65 D. K. X. CHONG, W. ROBERTS, T. ARAKAWA et al., Transgenic Res. 1997, 6 (4), 289–296. 66 A. M. WALMSLEY, C. J. ARNTZEN, Curr. Opin. Biotech. 2003, 14 (2), 145–150. 67 C. O. TACKET, H. S. MASON, G. LOSONSKY et al., Nature Med. 1998, 4 (5), 607–609. 68 C. O. TACKET, H. S. MASON, G. LOSONSKY et al., J. Infect. Dis. 2000, 182 (1), 302–305. 69 L. J. RICHTER, Y. THANAVALA, C. J. ARNTZEN et al., Nature Biotechnol. 2000, 18 (11), 1167–1171. 70 J. YU, W. LANGRIDGE, Transgenic Res. 2003, 12 (2), 163–169. 71 P. B. MCGARVEY, J. HAMMOND, M. M. DIENELT et al., Bio/Technology 1995, 13 (13), 1484–1487. 72 D. JANI, L. S. MEENA, Q. M. RIZWANULHAW et al., Transgenic Res. 2002, 11 (5), 447–454. 73 J. S. SANDHU, S. F. KRASNYANSKI, L. L. DOMIER et al., Transgenic Res. 2000, 9 (2): 127-135. 74 Y. MA, S. Q. LIN, Y. GAO et al., World J. Gastroenterol. 2003, 9 (10), 2211–2215. 75 A. M. WALMSLEY, M. L. ALVAREZ, Y. JIN et al., Plant Cell Rep. 2003, 21 (10), 1020–1026. 76 L. J. WANG, D. A. NI, Y. N. CHEN et al., Acta Bot. Sin. 2001, 43 (2), 132–137. 77 A. PORCEDDU, A. FALORNI, N. FERRADINI et al., Mol. Breeding 1999, 5 (6), 553–560. 78 F. B. BOUCHE, E. MARQUET-BLOUIN, Y. YANAGI et al., Vaccine 2003, 21 (17-18), 2065–2072. 79 E. MARQUETBLOUIN, F. B. BOUCHE, A. STEINMETZ et al., Plant Mol. Biol. 2003, 51 (4), 459–469. 80 D. L. PARMENTER, J. G. BOOTHE, G. J. H. VANROOIJEN et al., Plant Mol. Biol. 1995, 29 (6), 1167–1180. 81 J. H. LIU, L. B. SELINGER, K. J. CHENG et al., Mol. Breeding 1997, 3 (6), 463–470. 82 S. P. MAYFIELD, S. E. FRANKLIN, R. A. LERNER, Proc. Natl Acad. Sci. USA 2003, 100 (2), 438–442. 83 S. E. FRANKLIN, S. P. MAYFIELD, Curr. Opin. Plant Biol. 2004, 7 (2), 159–165.
84 E. L. DECKER, R. RESKI, Curr. Opin. Plant Biol. 2004, 7 (2), 166–170. 85 D. G. SCHAEFER, Annu. Rev. Plant Biol. 2002, 53, 477–501. 86 V. GOMORD, L. FAYE, Curr. Opin. Plant Biol. 2004, 7 (2), 171–181. 87 J. R. GASDASKA, D. SPENCER, L. DICKEY, Bioprocess J. 2003, 2, 49–56. 88 S. SCHILLBERG, N. EMANS, R. FISCHER, Phytochem. Rev. 2002, 1 (1), 45–54. 89 P. MALIGA, Curr. Opin. Plant Biol. 2002, 5 (2), 164–172. 90 H. DANIELL, M. S. KHAN, L. ALLISON, Trends Plant Sci. 2002, 7 (2), 84–91. 91 J. S. TREGONING, P. NIXON, H. KURODA et al., Nucleic Acids Res. 2003, 31 (4), 1174–1179. 92 A. FERNANDEZ-SAN MILAN, A. MINGOCASTEL, M. MILLER et al., Plant Biotechnol. J. 2003, 1 (2), 77–79. 93 S. LEELAVATHI, N. GUPTA, S. MAITI et al., Mol. Breeding 2003, 11 (1), 59–67. 94 S. RUF, M. HERMANN, I. J. BERGER et al., Nature Biotechnol. 2001, 19 (9), 870–875. 95 C. PORTA, G. P. LOMONOSSOFF, Biotechnol. & Genet. Eng. Rev. 2002, 19, 245–291. 96 A. A. MCCORMICK, M. H. KUMAGAI, K. HANLEY et al., Proc. Natl Acad. Sci. USA 1999, 96 (2), 703–708. 97 T. VERCH, V. YUSIBOV, H. KOPROWSKI, J. Immunol. Methods 1998, 220 (1-2), 69–75. 98 J. KAPILA, R. De RYCKE, M. Van MONTAGU et al., Plant Sci. 1997, 122 (1), 101–108. 99 C. VAQUERO, M. SACK, F. SCHUSTER et al., FASEB J. 2002, 16 (1), 408–410. 100 S. KATHURIA, R. SRIRAMAN, R. NATH et al., Hum. Reprod. 2002, 17 (8), 2054–2061. 101 O. VOINNET, S. RIVAS, P. MESTRE et al., Plant J. 2003, 33 (5), 949–956. 102 P. M. W. DRAKE, D. M. CHARGELEGUE, N. D. VINE et al., Plant Mol. Biol. 2003, 52 (1), 233–241. 103 J. M. SHARP, P. M. DORAN, Biotechnol. Prog. 2001, 17 (6), 979–992. 104 R. WONGSAMUTH, P. M. DORAN, Biotechnol. Bioeng. 1997, 54 (5), 401–415. 105 J. M. SHARP, P. M. DORAN, Biotechnol. Bioprocess Eng. 1999, 4 (4), 253–258. 106 J. M. SHARP, P. M. DORAN, Biotechnol. Bioeng. 2001, 73 (5), 338–346. 107 M. A. SUBROTO, J. D. HAMILL, P. M. DORAN, J. Biotechnol. 1996, 45 (1), 45–57.
25
26 Large-scale Protein Production in Plants: Host Plants, Systems and Expression
108 R. FISCHER, N. EMANS, F. SCHUSTER et al., Biotechnol. Appl. Biochem. 1999, 30 (2), 109–112. 109 S. Y. HONG, T. H. KWON, J. H. LEE et al., Enzyme & Microbial Tech. 2002, 30 (6), 763–767. 110 E. JAMES, J. M. LEE, Adv. Biochem. Eng. Biotechnol. 2001, 72, 127–156. 111 T. H. KWON, J. E. SEO, J. KIM et al., Biotechnol. Bioeng. 2003, 81 (7), 870–875. 112 N. S. MAGNUSON, P. M. LINZMAIER, R. REEVES et al., Protein Express. Purif. 1998, 13 (1), 45–52. 113 M. TERASHIMA, Y. EJIRI, N. HASHIKAWA et al., Biochem. Eng. J. 1999, 4 (1), 31–36. 114 J. HUANG, T. D. SUTLIFF, L. WU et al., Biotechnol. Prog. 2001, 17 (1), 126–133. 115 M. M. TREXLER, K. A. MCDONALD, A. P. JACKMAN, Biotechnol. Prog. 2002, 18 (3), 501–508. 116 E. TORRES, C. VAQUERO, L. NICHOLSON et al., Transgenic Res. 1999, 8 (6), 441–449. 117 J. M. HUANG, L. WU, D. YALDA et al., Transgenic Res. 2002, 11 (3), 229–239. 118 M. L. SMITH, M. E. KEEGAN, H. S. MASON et al., Biotechnol. Prog. 2002, 18 (3), 538–550. 119 J. T. O’DELL, F. NAGY, N. H. CHUA, Nature 1985, 313 (6005), 810–812. 120 M. A. LAWTON, M. A. TIERNEY, I. NAKAMURA et al., Plant Mol. Biol. 1987, 9 (4) 315-324. 121 R. KAY, A. CHAN, M. DALY et al., Science 1987, 236 (4806), 1299–1302.
122 A. H. CHRISTENSEN, P. H. QUAIL, Transgenic Res. 1996, 5 (3), 213–218. 123 G. De JAEGER, S. SCHEFFER, A. JACOBS et al., Nature Biotechnol. 2002, 20 (12), 1265–1268. 124 M. PADIDAM, Curr. Opin. Plant Biol. 2003, 6 (2), 169–177. 125 K. Y. KIM, S. Y. KWON, H. S. LEE et al., Plant Mol. Biol. 2003, 51 (6), 831–838. 126 P. VAIN, K. R. FINER, D. E. ENGLER et al., Plant Cell Rep. 1996, 15 (7), 489–494. 127 R. FISCHER, N. J. EMANS, R. M. TWYMAN et al., in: M. FINGERMAN, R. NAGABHUSHANAM (eds) Recent Advances in Marine Biotechnology, Volume 9 (Biomaterials and Bioprocessing), Science Publishers Inc., Enfield NH, pp 279–313, 2003. 128 S. ZIMMERMANN, S. SCHILLBERG, Y-.C. LIAO et al., Mol. Breeding 1998, 4 (4), 369–379. 129 S. SCHILLBERG, S. ZIMMERMANN, A. VOSS et al., Transgenic Res. 1999, 8 (4), 255–263. 130 G. De JAEGER, E. BUYS, D. EECKHOUT et al., Eur. J. Biochem. 1998, 259 (1-2), 1–10. 131 A. SCHOUTEN, J. ROSSIEN, J. BAKKER et al., J. Biol. Chem. 2002, 277 (22), 19339–19345. 132 J. NUTTALL, N. VINE, J. L. HADLINGTON et al., Eur. J. Biochem. 2002, 269 (24), 6042–6051. 133 U. CONRAD, U. FIEDLER, Plant Mol. Biol. 1998, 38 (1-2), 101–109.
1
Plant Viral Expression Vectors Vidadi Yusibov and Shailaja Rabindran Fraunhofer USA Center for Molecular Biotechnology, Newark, USA
Originally published in: Molecular Farming. Edited by Rainer Fischer and Stefan Schillberg. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30786-9
1 Introduction
Proteins such as antibodies, enzymes, hormones and vaccine antigens can be used to prevent, diagnose and treat a range of diseases. Such molecules are therefore of paramount importance in health and medicine. Historically, many of these proteins have been isolated from human or animal sources. However, the low quantities present in such source material coupled with safety risks and high purification costs have limited the availability of protein therapeutics and vaccines for many types of disease. In the mid-1970s, recombinant DNA technology revolutionized research involving the above molecules by making it possible to produce recombinant proteins in bacterial expression systems. Although, until recently, prokaryotic expression systems have been the most common method for recombinant protein production, they have limitations because of the absence of eukaryotic posttranslational modification and the improper folding of many complex human proteins. During the last three decades, many research laboratories have therefore focused on developing alternative platforms for recombinant protein expression, which can overcome the shortcomings of bacteria. The first systems to emerge from these studies were those based on animal and insect cell cultures [1, 2]. Several products, including monoclonal antibodies, vaccines and therapeutics, have been produced using these systems [3, 4], but the high production costs combined with the requirement for highly sophisticated manufacturing facilities for each target protein has encouraged the development of different production systems. Transgenic animals [5] have been considered as an alternative because they can produce complex human proteins in large amounts. However, potential
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Plant Viral Expression Vectors
disadvantages of transgenic animals include economic and time constraints as well as risks that the products could be contaminated with human pathogens. In recent years, plants have emerged as a promising new system for the production of recombinant proteins. Plants are widely described as “green factories” because they provide possible solutions to several of the safety and economic concerns raised by animal systems. The advantages of plants include economical large-scale production, the absence of contaminating mammalian pathogens, and the ability to produce complex, biologically-active proteins. There are several different strategies for the production of recombinant proteins in plants. The traditional approach is to use transgenic plants, in which the gene of interest is incorporated into the plant genome and is passed from generation to generation as a stable trait (reviewed in [6–10]). A number of therapeutic proteins and vaccine antigens have been produced in transgenic plants and have been successfully tested in animal and human trials. Nevertheless, the transgenic approach has some shortcomings, including the length of time required to obtain the transgenic producer lines, low levels of expression, and inherent difficulties in the modification of an existing product. In some expression hosts, scaling up production also takes a long time. One alternative strategy that has emerged in the last decade is the use of plant virus expression vectors to synthesize recombinant proteins in plants. The foreign gene is inserted into the viral genome so that, upon infection of the host plant cell, the transgene is replicated and expressed along with native viral genes. This method of transient expression adds several further advantages to plant-based expression, including improved time efficiency, higher levels of target protein expression, flexibility and convenience in the modification of existing products (or the development of new ones), ease of scale-up, flexibility in the selection of a production host and the potential for protein manufacture in contained facilities [11]. Furthermore, target proteins (in particular antigenic peptides) can be genetically fused to viral structural proteins, such as coat proteins, so that the plant virus is used not only for expression but also for the delivery of the vaccine antigens. To date, the coat proteins from a number of plant RNA viruses have been successfully used as carriers for antigenic peptides derived from various pathogens (for reviews see [11–13]). This approach is economically attractive, and benefits from higher levels of safety and efficacy in generating pathogen-specific immune responses. This chapter evaluates some of the approaches used to produce vaccines and heterologous proteins in plants using plant virus-based expression systems, and discusses novel strategies that are being considered for the development of better vectors.
2 Plant RNA Viruses as Expression Vectors
The majority of viruses that infect plants have single-stranded, positive-sense RNA genomes. It has therefore been necessary to use infectious cDNA clones for the in
2 Plant RNA Viruses as Expression Vectors
vitro manipulation of RNA viruses, allowing them to be developed as effective tools for the commercial production of target proteins in plants. This approach has also been used to study the genetic and metabolic profiles of both viruses and their host plants. Siegel [14] conceptualized the potential use of RNA viruses as expression vectors. Brome mosaic virus (BMV) and Tobacco mosaic virus (TMV) were the first two RNA viruses to be converted into expression vectors. These vectors have since been promoted from elegant laboratory tools (for studying virus movement or replication) to the category of expression vectors for producing biopharmaceuticals in plants. After more than a decade of work by many research groups, the genomes of a number of plant RNA viruses have been engineered to express target sequences (Table 1). Several extensive reviews have been published on this subject [5, 10–12]. Many of these articles have demonstrated the feasibility of plant viruses as expression tools for the production of foreign proteins in plants. Based on genome structure and identification of virus gene functions, several approaches have been employed for the expression of foreign sequences using plant viruses as expression
Table 1 Examples of plant viruses used in the development of expression vectors
Virus
Strategies used
Pathogen or protein/ epitope
References
Tobacco mosaic virus
CP replacement, Second subgenomic promoter (sgp), CP fusion, Readthrough fusion with CP, fusion with cleavage site, fusion with CtxB
11, 16, 17, 52, 59–61
Potato virus X
second sgp, CP fusion, CP fusion with FMDV 2A element, IRES elements Fusion with CP with protease cleavage Fusion with CP with protease cleavage Fusion with CP, fusion to CP with protease cleavage site, complementation Fusion with CP, second sgp CP fusion
malaria, rabies virus, MHV, FMDV, HCV, BHV-1, Ps. aeruginosa, neuropeptide, TMOF, allergens, MABs, α-trichosanthin, cytokines α-galactosidase-A, scFvs GFP, rotavirus, scFv,
Zucchini yellow mosaic virus Plum pox virus Cowpea mosaic virus
Alfalfa mosaic virus Tomato bushy stunt virus
25, 26, 62, 63
anti-HIV proteins
37
CPV
40
GFP, HIV, MEV, HRV, MV, CPV, S. aureus, Ps. aeruginosa
11, 12, 13, 31, 43, 58
HIV, rabies, measles, RSV
32, 54, 64
HIV
65, 66
CP: coat protein; CtxB: cholera toxin B subunit; scFv: single chain Fv antibody fragment; TMOF: trypsin modulating oostatic factor; MAB: monoclonal antibody; GFP: green fluorescent protein; CPV: Canine parvovirus; BHV: Bovine herpes virus; FMDV: Foot and mouth disease virus; HCV: Hepatitis C virus; HRV: Human rhino Virus; MEV: Mink enteritis virus; MHV: Murine hepatitis virus; MV: Measles virus; RSV:Respiratory syncytial virus
3
4 Plant Viral Expression Vectors
systems. These include: i) replacing nonessential viral genes with target sequences, ii) inserting target sequences into the viral genome as an additional gene with an additional promoter, iii) fusing target sequences with viral genes encoding structural proteins, iv) fusing the target sequences and viral structural gene with a cleavage site or read-through sequence, iv) functional complementation of defective viral components, and v) trans-complementation of viral genes through transgene expression in the host plant. 2.1 Tobacco mosaic virus (TMV)
Several groups have demonstrated the feasibility of TMV as an expression vector in plants (for reviews see [11, 16, 17]). The most widely used TMV-based vectors have an additional heterologous subgenomic promoter that directs expression of the foreign gene [9, 18]. One of the first active proteins produced in plants using this vector was α-trichosanthin, an inhibitor of HIV [20]. An improvement on the original TB2 vector resulted in the development of 30B [19], which is widely used both as a laboratory tool to study gene function and for the production of therapeutics and vaccines. Due to size limitations, there is unstable long distance movement of the chimeric virus in tobacco. Using DNA shuffling, Toth et al. [21] created improved 30B-based vectors that had better movement characteristics in tobacco. For further manipulations to ensure the production of functional proteins, signal peptide sequences were incorporated to target the protein either to the endoplasmic reticulum (ER) or the apoplast [17]. TMV particles have also been used as an epitope-presentation system. Detailed X-ray crystallographic analysis of the coat protein identified suitable sites for the insertion of foreign peptides. The helical arrangement of 2130 copies of coat protein around the viral RNA allows the presentation of multiple copies of the foreign epitope on the virus surface. At the same time, this can be detrimental to virion stability because the extra peptide can destabilize the virion structure. To circumvent this problem, a more stable system was developed exploiting the readthrough sequence from the replicase gene, so that both wild type and fusion peptide-containing coat proteins were produced [22, 23]. Recently, Bendahmane et al. [24] showed that by modifying the pI:charge ratio of hybrid coat proteins so that it resembled that of the wild type TMV coat protein, a more stable hybrid virus was produced. 2.2 Potato virus X (PVX)
PVX, a member of the potexvirus group, is another single stranded RNA virus that is widely used as an expression vector. Santa Cruz et al. [25] created a PVX-based vector in which the target molecule was genetically fused to the amino terminus of the PVX coat protein via the foot and mouth disease virus (FMDV) 2A peptide, which facilitates cotranslational processing of the polyprotein. Although this is an elegant way to produce foreign proteins, the extra C-terminal sequences would be
2 Plant RNA Viruses as Expression Vectors
undesirable, particularly for proteins used in therapeutic applications. Recently, Toth et al. [26] created another PVX-based vector in which an internal ribosome entry site (IRES) was inserted between the foreign and coat protein genes. On infection, this virus produced a bicistronic mRNA, from which both coat protein and the foreign protein were translated at detectable levels. PVX vectors have been used to produce single-chain Fv antibody fragments (scFvs) in N. benthamiana at up to 0.2 µg g−1 in leaf tissue [27, 28]. 2.3 Cowpea mosaic virus (CPMV)
Cowpea mosaic virus (CPMV), a bipartite RNA virus, was one of the first viruses used to express foreign peptides as viral coat protein fusions [29]. Since then, CPMV vectors have been improved and chimeric virus particles (CVPs) containing different target peptides inserted at various loop structures within the coat protein have been produced in plants. These CVPs have elicited strong immune responses in model animals and have achieved protection against disease challenge [12, 13]. CPMV has also been used to express full-length proteins. In one version of the vector, the foreign genes were inserted between the movement protein and large coat protein genes with artificial proteolytic cleavage sites engineered on either side of the foreign gene to facilitate the release of soluble protein [30]. In another version, the foreign gene was expressed as a fusion to the C-terminus of the CPMV S protein, using the FMDV 2A peptide as a bridge [31]. 2.4 Alfalfa mosaic virus (AlMV)
Vectors based on Alfalfa mosaic virus (AlMV), a tripartite RNA virus, are particularly useful for producing target molecules genetically fused to the viral coat protein. The AlMV capsid comprises multiple 24-kDa coat protein units, which form particles of different sizes and shapes based on the size of the encapsidated RNA. The Nterminus of the coat protein is located at the surface of the virion and is a useful site for the insertion of peptides without interfering with virion assembly. There have been several studies using AlMV viral particles for the production and delivery of anti-genic epitopes [22, 32]. More recently, we have used AlMV as a screening tool to map antigenic determinants for subunit vaccine development (Munz et al., unpublished data). Both B cell and T cell epitopes are being identified as potential vaccine candidates from a variety of pathogens to formulate multivalent vaccines. The ability to insert peptides of up to 50 amino acids in length as coat protein fusions, together with the ease of particle recovery, makes this system very amenable for the development of vaccines. The availability of P12 transgenic tobacco plants [34] that contain AlMV RNAs 1 and 2 integrated into the plant genome allows for trans-complementation so that infections can be initiated by the delivery of RNA3 alone. The virus particles thus produced are not infectious to non-transgenic plants, therefore improving the containment of the hybrid virus. AlMV is undoubtedly a
5
6 Plant Viral Expression Vectors
very efficient tool for the production and presentation of target sequences in the form of coat protein fusions.
3 Biological Activity of Target Molecules
Much progress has been made in the development of plant virus expression vectors over the last decade and they can now be regarded as commercially viable systems sufficient for the production of large quantities of target protein. During this period, a number of key issues in commercial product development have been improved, including the yield, biological activity and immunogenicity of the products, the genetic stability of the expression system, and the efficiency of downstream processing to recover the product from virus-infected plants. Studies have shown that plants can make biologically active recombinant proteins through both transgenic and transient expression approaches. Although the plant post-translational machinery is similar to that of mammalian cells, there are some notable differences, e. g. differences in glycosylation, particularly the absence of sialation, which may impact the activity of certain proteins. The absence of mammalian enzymes may prevent complex maturation processes that are critical for the biological activity of proteins such as insulin. Fortunately these shortcomings affect the activity of only a limited number of proteins. Most therapeutic proteins expressed in plants have full biological activity. For example, α-galactosidase A (Gal A), a human lysosomal enzyme deficient in patients with Fabry’s disease, was expressed using a TMV vector. Treatment of the disease involves enzyme replacement therapy, which is expensive. Correct disulfide bond formation, glycosylation and dimerization are essential for the activity of this protein. When Gal A protein was targeted to the endoplasmic reticulum, enzyme activity was nominal [16], but when portions of a putative C-terminal propeptide were deleted, and the protein was targeted to the apoplast, high levels of enzyme activity were observed. Targeting to the apoplast also facilitates Gal A purification because leaf cells do not normally secrete proteins. The specific activity of Gal A was similar to that of the native enzyme purified from human tissue, and the predicted glycosylation sites were properly modified. The pituitary glycoprotein follicle-stimulating hormone (FSH) was produced in plants as a single-chain molecule (sc-bFSH) using a TMV vector [35]. Native bFSH comprises two polypeptides expressed from genes at different loci. The cDNAs encoding the α and β subunits of the protein were cloned so that the subunits were fused in the configuration β-α to produce sc-bFSH, a 30-kDa protein. The sc-bFSH protein accumulated to 3% TSP (total soluble protein) in the intercellular wash fluid (IF) and was biologically active, indicating that it was correctly folded. Deglycosylation patterns of the IF extract with N-glycosidase F, which digests all oligosaccharide species except those containing core α(1,3)-fucose, indicated that the N-linked glycans present in the recombinant protein had two types of cores, with and without α(1,3)-fucose. When immunoaffinity-purified sc-bFSH was analyzed by MALDI-MS, the presence of two complex N-glycan structures was demonstrated.
4 Efficacy of Plant Virus-produced Antigens
Plant N-glycan structures include β(1,2)-xylose and core α(1,3)-fucose. Because these glycans are not native to mammalian systems, it is expected that they may be allergenic in mammals. It is also expected that mammalian proteins with native glycosylation patterns will be more stable when administered for therapy. Recently, ‘mammalianized’ plants were developed that expressed mammalian β-1,4-galactosyltransferase [36]. This exciting development provides a possible solution to the production of correctly glycosylated mammalian proteins, including FSH and antibodies, in plants. Arazi et al. [37] used a Zucchini yellow mosaic virus (ZYMV) vector to produce two anti HIV proteins – MAP30 (Momordica anti-HIV protein, 30 kDa) and GAP31 (Gelonium anti-HIV protein, 31 kDa) that are naturally present in other plant species. These two proteins have antiviral and antitumor activities and are effective against viruses, tumor cells and microbes. Purified preparations from ZYMV-infected squash had specific activities comparable to the native proteins, and demonstrated anti-HIV activity, which was measured by inhibition of syncytial formation and production of the viral core protein p24. Each recombinant protein was also able to inhibit the growth of both Gram-positive and Gram-negative bacteria as well as the yeast Candida albicans, and was not toxic to human cells at a wide range of concentrations.
4 Efficacy of Plant Virus-produced Antigens
As indicated above, target molecules and antigens can be engineered into plant virus vectors and produced: i) as free soluble proteins such as vaccine antigens, or ii) as fusions with viral coat protein subunits so that they are incorporated into the virus particles. Soluble antigens could consist of pathogen sequences only or fusions of pathogen sequences with molecules that provide adjuvant or other activities, such as the pentameric cholera toxin B subunit (CTB), which stimulates a mucosal immune response. A significant amount of data has been accumulated using both approaches. 4.1 Vaccine Antigens
Several soluble antigens that have been produced in plants using plant viruses as expression systems have shown immunogenic and protective properties in test animals [11]. The plant-derived proteins, which include known immunogens from pathogens or allergens from plants [38], have been tested either as crude plant extracts containing recombinant protein or as purified material from the infected plants. Wigdorovitz et al. [39] used a TMV expression vector to produce VP1, the 26-kDa structural protein from FMDV, and tested it in mice. Mice injected intraperitoneally with leaf extracts prepared from infected plants mounted an antibody response against the plant-derived protein. All immunized mice were protected when
7
8 Plant Viral Expression Vectors
challenged with virulent FMDV. One-year old calves immunized with plant extracts containing VP1 also developed FMDV-specific antibody responses. In another study, Fernandez-Fernandez et al. [40] developed a vaccine against VP60 of rabbit hemorrhagic disease virus (RHDV) using plum pox virus (PPV) as the expression system. VP60 was produced as a polyprotein that was processed to release the heterologous protein. Rabbits immunized with extracts of Nicotiana clevelandii containing VP60 mounted an efficient immune response that protected them against a lethal challenge with RHDV. Nine of ten rabbits that received plant extracts containing VP60 (two 1-ml doses of extract containing ∼2 g of fresh leaves) produced VP60-specific antibodies while animals vaccinated with extracts of plants infected with wild type PPV did not produce antibodies against RHDV. When the animals vaccinated with PPV-V60 were challenged, the antibody response increased and no clinical symptoms of the disease were observed, whereas all but one of the control animals died after challenge. No RHDV was detected in the livers of the surviving animals two weeks after challenge. The serological responses of animals vaccinated with plant extracts containing VP60 were almost as high as those of animals immunized with a commercial vaccine. In exciting developments, McCormick et al. [41, 42] produced individualized vaccines for the treatment of cancer using a TMV-based expression vector. Bcell tumors (such as non-Hodgkin’s lymphoma) express a unique cell surface immunoglobulin (Ig) that acts as a tumor-specific marker. When such Igs are conjugated to carriers such as KLH and administered to patients with an adjuvant, an immune response is triggered in the patient with a favorable clinical outcome. It is difficult to produce Igs as vaccines, so single-chain variable region (scFv) vaccines that have the hyper-variable domains from the tumor-specific Ig have been developed as alternatives. These scFv vaccines have the ability to elicit an anti-idiotypic response in animals and can block tumor progression in mouse lymphoma models. Using a TMV expression vector, 38C13 scFv (from a mouse Bcell lymphoma) was recovered from the IF of infected plants. The major fraction of protein produced in the plants was soluble and properly folded. When administered to mice, it induced specific anti-38C13 responses. A strong IgG2a isotype response was observed, which is often correlated with increased tumor protection. A 90% survival rate was reported in the group of mice that received the plant-produced vaccine plus adjuvant and were then challenged with 38C13 tumor cells two weeks after the third vaccination. There was also a 70% survival rate in mice receiving the scFv alone and an 80% survival rate in those receiving the standard vaccine. This approach has successfully been tested in a group of non-Hodgkin’s lymphoma patients in an FDA approved phase I human clinical trial (Barry Holtz, Conference on Plant-Made Pharmaceuticals, Quebec City, Canada, 2003). 4.2 Particle-based Vaccine Antigen Delivery
Several plant virus coat proteins, including those of TMV, CPMV, AlMV and Tomato bushy stunt virus (TBSV), have been used to produce and deliver
4 Efficacy of Plant Virus-produced Antigens
antigenic determinants from a variety of viral and bacterial pathogens. These data have been summarized in numerous publications and several reviews [12, 13]. The ease of virus purification coupled with enhanced peptide immunogenicity when fused to carrier molecules makes this approach very attractive for vaccine development. CPMV particles that contained a 17-mer neutralizing epitope, 3L17, from the VP2 capsid of Mink enteritis virus (MEV) fused to the S protein were generated. When mixed with adjuvant, these particles protected all the test animals from clinical disease when challenged with virulent MEV. A modified construct, which presented the peptide on the surface of both L and S coat protein subunits, induced an antibody response that was higher than that of a peptide-KLH conjugate [43]. The predominance of IgG2a indicated early activation of TH1 cells. These results were validated by cell proliferation and IFN-γ release from mice cells exposed to CVPs in vitro. Intranasal immunization resulted in better mucosal responses than serum antibody responses. The significant outcome of these studies was that when peptides are presented on viral particles it is possible to shift the bias towards a TH1 response (which mediates macrophage and cytotoxic T cell activation), and that CVPs can protect against both systemic and mucosal infections. CPMV particles have also been used to present epitopes from bacterial pathogens such as Pseudomonas aeruginosa and Staphylococcus aureus. When fused to CPMV, the D2 peptide from the S. aureus fibronectin-binding protein (FnBP) induced high titers of FnBP-specific antibodies in mice and rats immunized subcutaneously [44]. The sera inhibited binding of fibronectin to immobilized recombinant FnBP, and rat serum was able to block the adherence of S. aureus to fibronectin. The response had a strong TH1 bias probably because CPMV CP elicits TH1/IgG2a type responses. The isotype of anti-D2 IgG in induced mice was predominantly IgG2a and IgG2b . These studies highlight the potential of plant virus-based vaccines to protect against S. aureus infections that include invasive endocarditis, septicaemia, peritonitis and bovine mastitis. When presented on CPMV particles, a linear B-cell epitope from the outer membrane protein F of P. aeruginosa induced peptide-specific antibodies in C57BL/6 mice that bound complement and increased phagocytosis of P. aeruginosa by human neutrophils in vitro [45]. In a mouse model of chronic pulmonary infection, the particles afforded protection when challenged with two different immunotypes of the pathogen. The levels of protection were similar to those observed when the peptide was coupled to KLH. In another study, CPMV particles that displayed a peptide derived from the epidermal growth factor receptor variant III (EGFRvIII) elicited specific antibody responses in mice against the peptide and protected mice from tumor challenge [45]. Most infectious diseases involve colonization or invasion through mucosal surfaces by a pathogen. As a first line of defense it is important to develop a strong mucosal response against the pathogen. Such responses can be achieved when the oral or nasal route is used for immunization. Due to the acidic nature of the stomach and the presence of proteolytic enzymes in the gastrointestinal tract, special formulation of vaccine antigens will be required for efficient delivery and the generation of a suitable immune response, if the oral route is employed. The alternate
9
10 Plant Viral Expression Vectors
method of intranasal immunization would be more effective because it requires less of the immunogen and conditions are more favorable to maintain stability of the viral particles. It has been found that when recombinant virus particles are administered intranasally, antibodies can be detected at distal sites, such as the bronchial, intestinal and vaginal lavages [43]. Chimeric TMV particles containing the 5B19 epitope from the spike protein of murine hepatitis virus (MHV) fused near the C-terminus of TMV coat protein subunits were used to immunize mice intranasally (three doses per week for ten weeks) [46]. High IgG titers and moderate IgA titers specific to the peptide could be detected. When mice were challenged intranasally with MHV, five of six mice that were immunized intranasally with chimeric TMV particles for ten weeks survived the challenge. Two further mice that received immunization for six weeks and one that was immunized for four weeks survived the challenge, indicating that longer periods of immunization with the chimeric TMV particles resulted in higher antibody titers that protected the animals from disease challenge. Nemchinov et al. [47] used a TMV expression vector to develop a subunit vaccine against hepatitis C virus (HCV). A consensus sequence matching hypervariable region 1 (HVR1) of HCV, encoding a potential neutralizing epitope of 27 amino acids, was fused to the C-terminus of CTB. Mice immunized intranasally with plant extract containing ∼0.5-1 µg CTB/HVR1 developed anti-HVR1 antibodies. These experiments were carried out without adjuvant and the amounts of immunogen used per dose were less than 0.1 µg of HVR1 epitope. The same epitope has been engineered as a fusion with other plant viruses such as AlMV and is undergoing testing. The sera of mice fed with fresh spinach leaves infected with AlMV particles presenting a rabies virus epitope contained IgG and IgA. Mucosal IgA was also detected [48]. Human volunteers (in FDA approved trials) fed with spinach containing recombinant particles generated both IgG and IgA responses specific to the pathogen [49]. The trials also suggested that plant virus particle-based vaccines could be effectively used in prime-boost regimens. In more recent work, recombinant AlMV particles containing an epitope from the G protein of human respiratory syncytial virus (RSV) induced protective immunity in mice [33]. Mice immunized intranasally with chimeric PVX particles expressing a sixamino-acid neutralizing epitope from gp 41 of HIV-1 produced high levels of HIV-1-specific IgG and IgA antibodies [50]. The anti-H66 IgG titers ranged from 2000 to > 30,000. Mice immunized intranasally produced IgA in the serum and in fecal extracts. Excellent progress has been made in a relatively short period of time in demonstrating the potential of plant virus vectors not only as expression tools but also as elegant and efficient means for the administration of vaccine antigens by different routes. The targets seem to be unlimited. 4.3 Other Uses of Plant Virus Particles
In a unique study Khor et al. [51] demonstrated that CVPs could also be used as antiviral agents. The cellular receptor for measles virus (MV) has been identified
6 New Approaches to the Development of Viral Vectors
as CD46. Two different peptides from CD46, when presented on CPMV particles, inhibited the infection of HeLa cells by MV in vitro in a dose-dependent manner. The extent of inhibition was 18–180-fold more effective than soluble CD46 peptide, probably due to increased stability. The CVPs also protected mice models of human MV infection when challenged intracranially with MV. The results showed that the chimeric CPMV particles could block MV entry into neurons in the brains of the treated animals. This technology has paved the way for the creation of antiviral agents against several important viruses, particularly those for which the cellular receptors have been characterized. Virus particles that can present multiple peptides on their surfaces could simultaneously evoke humoral responses, cell-mediated responses, and inhibit viral entry, thus acting as a vaccine as well as an antiviral agent. In yet another application of plant virus peptide presentation systems, Borovsky [52] used TMV to present a peptide, trypsin modulating oostatic factor (TMOF), that terminates trypsin biosynthesis in the mosquito gut and causes larval mortality. This unique study uses plant virus particles for the biological control of insect pests.
5 Plant Viruses as Gene Function Discovery Tools
Plant virus vectors have been used extensively as laboratory tools to identify the functions of unknown genes, to localize viral proteins by fusing them to green fluorescent protein (GFP), to manipulate biosynthetic pathways in plants, to screen genomic libraries rapidly, and to study gene silencing. Gene function can be elucidated using plant viral vectors either by overexpressing the gene or by silencing the gene. This has been an important feature in functional genomics strategies to identify plant genes, and plant viral vectors have proven to be invaluable tools for this purpose. The genomes of several viruses have been engineered to study post-transcriptional gene silencing, termed virus-induced gene silencing (VIGS). Kumagai et al. [53] used a TMV expression vector to overexpress tomato phytoene synthase (PSY). When Nicotiana benthamiana plants were inoculated with in vitro transcripts of these constructs, infected leaves developed a bright orange phenotype, and accumulated high levels of phytoene. Plants that were inoculated with a construct that expressed a partial segment of phytoene desaturase cDNA in the antisense orientation developed a white phenotype, reflecting the inhibition of carotenoid biosynthesis due to pds gene silencing. TMV, PVX, TRV and AlMV are some of the plant viruses being used as expression vectors to identify and manipulate plant gene function (reviewed in [11]).
6 New Approaches to the Development of Viral Vectors
During the last decade there has been significant progress in the development of plant virus expression systems for the production of biopharmaceuticals. These
11
12 Plant Viral Expression Vectors
studies have revealed the tremendous potential of plant viruses but have also raised some obvious and inherent shortcomings of the technology. A major limitation is the size of the foreign protein that can be produced. Often, large genes are not stably maintained in the plant viral genome. New approaches are being developed in an attempt to overcome some of these shortcomings, e.g. vector stability, host range and the efficiency of target molecule expression. The design of chimeric viruses to create a functional vector system combining components of different viruses and thus expanding the virus host range was one of the early approaches employed by Yusibov et al. [32]. Antigenic determinants from rabies virus and HIV-1 engineered as fusions with the AlMV coat protein were expressed using a TMV vector. Thus the system had two coat proteins, one serving as a carrier molecule for target peptides and other for long distance movement. This extra coat protein system allows the presentation of larger peptides on the surface of AlMV particles but also provides greater genetic stability. A chimera of TMV that expressed the AlMV coat protein instead of the native coat protein was also created [54]. This hybrid virus had a broader host range and infected plant species such as spinach and soybean. The development of two-component functional complementation-based expression vectors using either homologous or heterologous virus components is an ongoing attempt to circumvent the size limitations of a single component vectors and also to facilitate the production of multi-subunit proteins [55–57]. Liu and Lomonossoff [58] described the agrodelivery of two subgenomic components of CPMV from a mixed suspension of bacteria, each harboring different subgenomic complements. Further improvements to virus expression systems include transcomplementation of some of the virus functions from transgenic host plants (P12 plants for AlMV). By integrating parts of the viral vector into the plant chromosome, this system has the potential for multiple technical solutions that could overcome limitations of classical viral vectors [33]. Viral vectors can be used as molecular switches for tightly controlled, high-level transgene expression (Hull et al. unpublished data). A novel transient expression technology has been developed at Icon Genetics AG, Munich, Germany (Prof. Gleba, Icon Genetics, personal communication). This technology uses T-DNAs that encode parts of a plant virus genome transiently co-delivered by two or more Agrobacterium recombinants. Because more than one bacterium can infect a cell, the viral vector is assembled in planta. This technology is also versatile because multiple genes can be co-expressed and functional proteins assembled in the plants. In another version, two viral sequences encoded on separate T-DNAs, one containing the coat protein gene but lacking the origin of assembly sequence (OAS) and the other containing the foreign gene and OAS but lacking the coat protein gene, are co-infiltrated into plants. Functional complementation between the two viruses results in long distance movement and expression of the foreign gene in upper leaves. In yet another version of the vector, a functional amplicon is produced only after transient delivery of a T-DNA encoding viral
References
integrase by agroinfiltration. This approach has been utilized to express several proteins and has clearly produced greater quantities of the target protein. For example, up to 5 mg g−1 fresh weight of GFP could be obtained in leaves. 7 Conclusion
There is no doubt that plants represent one of the most productive and yet inexpensive sources of biomass. The absence of contaminating animal pathogens, the eukaryotic translational machinery and the ease of plant virus manipulation make plants and plant viruses very useful for the economical production of large amounts of commercial products. A whole new area in agriculture called molecular farming (pharming), i.e. the use of plants to produce pharmaceuticals, could soon revolutionize the pharmaceutical industry. Plant virus-based vectors are powerful tools for the production of biopharmaceuticals, and some products are now close to commercialization. It is clear that as the industry moves forward, the focus of research should turn towards more practical problems such as the development of efficient extraction and purification systems. The glycosylation patterns of plant-derived proteins, allergy testing, and safety and containment issues need to be addressed before the plant virus approach becomes commercially viable. This would require the collaborative efforts of many researchers, including plant virologists, immunologists, chemical engineers and protein chemists. References 1 T. A. KOST, J. P. CONDREAY, Curr. Opin. Biotechnol. 1999, 10 (5), 428–433. 2 F. HESSE, R. WAGNER, Trends Biotechnol. 2000, 18 (4), 173–180. 3 K. KOTHS, Curr. Opin. Biotechnol. 1995, 6 (6), 681–687. 4 B. D. KELLEY, Curr. Opin. Biotechnol. 2001, 12 (2), 173–174. 5 M. A. DALRYMPLE, I. GARNER, Biotechnol. Genet. Eng. Rev. 1998, 15, 33–49. 6 C. J. ARNTZEN, Nature Biotechnol. 1997, 5 (3), 221–222. 7 A. M. WALMSLEY, C. J. ARNTZEN, Curr. Opin. Biotechnol. 2000, 11 (2), 126–129. 8 A. M. WALMSLEY, C. J. ARNTZEN, Curr. Opin. Biotechnol. 2003, 14 (2), 145–150. 9 H. S. MASON, H. WARZECHA, T. MOR et al., Trends Mol. Med. 2002, 8 (7), 324–329. 10 P. AWRAM, R. C. GARDNER, R. L. FORSTER et al., Adv. Virus Res. 2002, 58, 81–124.
11 G. P. POGUE, J. A. LINDBO, S. J. GARGER et al., Annu. Rev. Phytopathol. 2002, 40, 45–74. 12 C. PORTA, G. P. LOMONOSSOFF, Rev. Med. Virol. 1998, 8 (1), 25–41. 13 G. P. LOMONOSSOFF, W. D.O. HAMILTON, in: J. HAMMOND, P. MCGARVEY, V. YUSIBOV (eds) Plant Biotechnology, Springer Verlag, Berlin, pp. 177–189, 1999. 14 A. SIEGEL, Phytopathology 1983, 73 (5), 775–775. 15 H. B. SCHOLTHOF, K-B. G. SCHOLTHOF, A.O. JACKSON, Annu. Rev. Phytopathol. 1996, 34, 299–323. 16 G. P. POGUE, J. A. LINDBO, W. O. DAWSON et al., in: Plant Molecular Biology Manual, Kluwer Academic Publishers, Dordrecht, The Netherlands, Section L4, pp 1–27, 1998. 17 V. YUSIBOV, S. SHIVPRASAD, T. H. TURPEN et al., in: J. HAMMOND, P. MCGARVEY, V. YUSIBOV (eds) Plant Biotechnology,
13
14 Plant Viral Expression Vectors
18
19
20
21 22
23
24 25
26 27
28
29 30 31 32
33 34 35
36 37
Springer Verlag, Berlin, pp. 81–94, 1999. J. DONSON, C. M. KEARNEY, M. E. HILF et al., Proc. Natl Acad. Sci. USA 1991, 88 (16), 7204–7208. S. SHIVPRASAD, G. P. POGUE, D. J. LEWANDOWSKI et al., Virology 1999, 255 (2), 312–323. M. H. KUMAGAI, T. H. TURPEN, N. WEINZETTL et al., Proc. Natl Acad. Sci. USA 1993, 90 (2), 427–430. R. L. TOTH, G. P. POGUE, S. CHAPMAN, Plant J. 2002, 30 (5), 593–600. H. HAMAMOTO, Y. SUGIYAMA, N. NAKAGAWA et al., Bio/Technology 1993, 11 (8), 930–932. T. H. TURPEN, S. J. REINL, Y. CHAROENVIT et al., Bio/Technology 1995, 13 (1), 53–57. M. BENDAHMANE, M. KOO, E. KARRER et al., J. Mol. Biol. 1999, 290 (1), 9–20. S. SANTA CRUZ, S. CHAPMAN, A. G. ROBERTS et al., Proc. Natl Acad. Sci. USA 1996, 93 (13), 6286–6290 R. L. TOTH, S. CHAPMAN, F. CARR et al., FEBS Lett. 2001, 489 (2-3), 215–219. S. HENDY, Z. C. CHEN, H. BARKER et al., J. Immunol. Methods 1999, 231 (1-2), 137–146. P. ROGGERO, M. CIUFFO, E. BENVENUTO et al., Prot. Express. Purif. 2001, 22 (1), 70–74. R. USHA, J. B. ROHL, V. E. SPALL et al., Virology 1993, 197 (1), 366–374. J. VERVER, J. WELLINK, J. Van LENT et al., Virology 1998, 242 (1), 22–27. K. GOPINATH, J. WELLINK, J. PORTA et al., Virology 2000, 267 (2), 159–173. V. YUSIBOV, A. MODELSKA, K. STEPLEWSKI et al., Proc. Natl Acad. Sci. USA 1997, 94 (11), 5784–5788. H. BELANGER, N. FLEYSH, S. COX et al., FASEB J. 2000, 14 (14), 2323–2328. J. F. BOL, J. Gen. Virol. 1999, 80 (5), 1089–1102. D. DIRNBERGER, H. STEINKELLNER, L. ABDENNEBI et al., Eur. J. Biochem. 2001, 268 (16), 4570–4579. M. BARDOR, L. FAYE, P. LEROUGE, Trends Plant Sci. 1999, 4 (9), 376–380. T. ARAZI, P. LEE-HUANG, P.L. HUANG et al., Biochem. Biophys. Res. Commun. 2002, 292 (2), 441–448.
38 M. KREBITZ, U. WIEDERMANN, D. ESSL et al., FASEB J. 2000, 14 (10), 1279–1288. 39 A. WIGDOROVITZ, D. M. PEREZ FILGUEIRA, N. ROBERTSON et al., Virology 1999, 264 (1), 85–91. 40 M. R. FERNANDEZ-FERNANDEZ, M. MOURINO, J. RIVERA et al., Virology 2001, 280 (2), 283–291. 41 A. A. MCCORMICK, M. H. KUMAGAI, K. HANLEY et al., Proc. Natl Acad. Sci. USA 1999, 96 (2), 703–708. 42 A. A. MCCORMICK, S. J. REINL, T. I. CAMERON et al., J. Immunol. Methods 2003, 278 (1-2), 95–104. 43 B. L. NICHOLAS, F. R. BRENNAN, J. L. MARTINEZ-TORRECUADRADA et al., Vaccine 2002, 20 (21-22), 2727–2734. 44 F. R. BRENNAN, L. B. GILLELAND, J. STACZEK et al., Microbiology 1999, 145 (8), 2061–2067. 45 F. R. BRENNAN, T. D. JONES, W. D. HAMILTON, Mol. Biotechnol. 2001, 17 (1), 15–26. 46 M. KOO, M. BENDAHMANE, G. A. LETTIERI et al., Proc. Natl Acad. Sci. USA 1999, 96 (14), 7774–7779. 47 L. G. NEMCHINOV, T. J. LIANG, M. M. RIFAAT, Arch. Virol. 2000, 145 (12), 2557–2573. 48 A. MODELSKA, B. DIETZSCHOLD, N. FLEYSH et al., Proc. Natl Acad. Sci. USA 1998, 95 (5), 2481–2485. 49 V. YUSIBOV, D. C. HOOPER, S. V. SPITSIN et al., Vaccine 2002, 20 (25-26), 3155–3164. 50 C. MARUSIC, P. RIZZA, L. LATTANZI et al., J. Virol. 2001, 75 (18), 8434–8439. 51 I. W. KHOR, T. LIN, J. P. LANGEDIJK et al., J. Virol. 2002, 76 (9), 4412–4419. 52 D. BOROVSKY, J. Exp. Biol. 2003, 206 (21), 3869–3875. 53 M. H. KUMAGAI, J. DONSON, G. DELLA-CIOPPA, Proc. Natl Acad. Sci. USA 1995, 92 (5), 1679–1683. 54 S. SPITSIN, K. STEPLEWSKI, N. FLEYSH et al., Proc. Natl Acad. Sci. USA 1999, 96 (5), 2549–2553. 55 D. J. LEWANDOWSKI, W. O. DAWSON, Virology 1998, 251 (2), 427–437. 56 D. J. LEWANDOWSKI, W. O. DAWSON, Virology 2000, 271 (1), 90–98. 57 V. YUSIBOV, N. FLEYSH, S. SPITSIN et al., in: J. T. ROMER, J. A. SAUNDERS, B. F. MATTHEWS (eds), Regulation of
References
58 59
60
61
Phytochemicals by Molecular Techniques, Elsevier Science Ltd, London, pp. 59–78, 2001. L. LIU, G. P. LOMONOSSOFF, J. Virol. Methods 2002, 105 (2), 343–348. A. A. LIM, S. TACHIBANA, Y. WATANABE et al., Gene 2002, 289 (1-2), 69–79. F. PEREZ FILGUEIRA, D. ZAMORANO, P. DOMINGUEZ, et al., Vaccine 2003, 21(27-30), 4201–4209. L. WU, L. JIANG, Z. ZHOU et al., Vaccine 2003, 21 (27-30), 4390–4398.
62 S. CHAPMAN, T. KAVANAGH, D. BAULCOMBE, Plant. J. 1992, 2 (4), 549–557. 63 L. SMOLENSKA, I. M. ROBERTS, D. LEARMONTH et al., FEBS Lett. 1998, 441 (3), 379–382. 64 J. SANCHEZ-NAVARRO, R. MIGLINO, A. RAGOZZINO et al., Arch. Virol. 2001, 146 (5), 923–939. 65 T. JOELSON, L. AKERBLOM, P. OXELFELT et al., J. Gen. Virol. 1997, 78 (6), 1213–1217. 66 G. ZHANG, C. LEUNG, L. MURDIN et al., Mol. Biotechnol. 2000, 14 (2), 99–107.
15
1
Foreign Protein Expression Using Plant Cells Fiona S. Shadwick, and Pauline M. Doran University of New South Wales, Sydney, NSW 2052, Australia
Originally published in: Molecular Farming. Edited by Rainer Fischer and Stefan Schillberg. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30786-9
1 Foreign Protein Production Systems
The demand of the pharmaceutical industry for large quantities of mammalian proteins has led to the development of heterologous expression systems for the production of proteins and peptides of varying complexity. The host organisms used range from bacteria to eukaryotic systems such as yeast, insect, mammalian and plant cell cultures and transgenic animals and plants. Bacteria can produce relatively high levels of foreign protein but develop insoluble inclusion bodies and offer only limited post-translational modification. In contrast, eukaryotic expression systems are able to glycosylate proteins and carry out post-translational processing, although different post-translational effects can result in the formation of products that are not identical in all respects to the native protein. The cost of protein production using different host organisms and expression systems varies widely. The complexity of the protein, its end use, the scale of production and the degree of similarity required between the transgenic and native proteins are important factors to consider when determining the type of production system to apply. Plant-based production systems are now being used commercially for the synthesis of foreign proteins [1–3]. Post-translational modification in plant cells is similar to that carried out by animal cells; plant cells are also able to fold multimeric proteins correctly. The sites of glycosylation on plant-produced mammalian proteins are the same as on the native protein; however, processing of N-linked glycans in the secretory pathway of plant cells results in a more diverse array of glycoforms than is produced in animal expression systems [4]. Glycoprotein activity is retained in plant-derived mammalian proteins. Agricultural production of foreign proteins in crop plants can deliver large quantities of product at low cost [5]. This is possible even if protein expression levels are Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Foreign Protein Expression Using Plant Cells
low relative to other heterologous systems, as agriculture is a cheap technology and production targets can be achieved using additional plantings or cropping larger areas of land [6]. The ability to scale-up economically and to utilize production and extraction technologies already developed for the food industry significantly reduces the cost of high-volume protein production using agriculture compared with alternative methods such as cell culture. The cost of agricultural production is also reduced substantially if the foreign protein does not require purification from the plant biomass, as is the case for edible vaccines. Despite the advantages of agriculture, and even if the product can be shown to have appropriate biological activity, agricultural production of foreign proteins may not provide adequate assurance of product safety and quality. For example, foreign proteins produced in the field are subject to contamination with pesticides, herbicides and mycotoxins; field-grown plants also experience variable weather conditions, non-uniform soil compositions and infestation by pests and diseases. Any or all of these factors may result in unpredictable product yield and quality. The inability to control production conditions could mean that whole-plant systems fail to comply with good manufacturing practice in many countries around the world, particularly if the transgenic protein is to be used as a therapeutic. Issues of environmental crop safety also arise, especially if the foreign protein is toxic, e.g. to soil microorganisms or to wild-life capable of consuming the plants [7]. In situations where the plant-derived product has suitable activity but where contamination, inconsistent quality and/or regulatory issues prohibit the use of agricultural methods, large-scale plant tissue culture offers an alternative route for foreign protein production [8]. This is particularly so if the volume of protein required is low. As well as overcoming many of the inherent problems associated with agriculture, plant tissue culture also offers a number of advantages over conventional animal cell culture methods currently being applied to produce biopharmaceutical proteins commercially [8]. As plant culture media are relatively simple in composition and do not contain proteins, the cost of the process raw materials is reduced and protein recovery from the medium is easier and cheaper compared with animal cell culture. In addition, as most plant pathogens are unable to infect humans, the risk of pathogenic infections being transferred from the cell culture via the product is also substantially reduced.
2 Production of Foreign Proteins Using Plant Tissue Culture
In this review, we focus on the use of plant tissue culture to produce foreign proteins that have direct commercial or medical applications. The development of largescale plant tissue culture systems for the production of biopharmaceutical proteins requires efficient, high-level expression of stable, biologically active products. To minimize the cost of protein recovery and purification, it is preferable that the expression system releases the product in a form that can be harvested from the culture medium. In addition, the relevant bioprocessing issues associated with bioreactor culture of plant cells and tissues must be addressed.
2 Production of Foreign Proteins Using Plant Tissue Culture
Extensive research has been carried out into the molecular aspects of foreign protein production in whole plants to enhance the yield, quality and stability of the product and to facilitate protein separation and purification from the biomass [3, 6, 9]. In contrast, comparatively little research has been undertaken to investigate the specific issues associated with producing foreign proteins in plant tissue culture. Many of the problems that need to be addressed, such as low protein yields, are similar in vivo and in vitro. However, the differences between these systems means that different solutions may be required. Some developments that have proven useful for enhancing foreign protein yields in whole plants, such as chloroplast and organ-specific expression systems, have little or no practical application in tissue culture. On the other hand, although not yet demonstrated in vitro, the use of viral vectors to increase protein production, and novel approaches for simplifying product purification such as targeting of proteins to oil bodies, may have significant implications for the production of foreign proteins in tissue culture systems. Foreign proteins that have been produced using plant tissue culture are listed in Table 1. Several of these proteins, e.g. antibodies, interleukins, erythropoietin, human granulocyte-macrophage colony stimulating factor (hGM-CSF) and hepatitis B antigen, have pharmaceutical or therapeutic uses and would be suitable for further commercial development if the production levels could be increased. However, to date, few plant cell or organ cultures have been shown to accumulate or secrete foreign proteins at concentrations sufficient for commercial viability. As indicated in Table 1, tobacco (Nicotiana tabacum) has been the host species in most studies of foreign protein expression in plant tissue culture. Rice (Oryza sativa) has also been used by several groups. Accumulation of hGM-CSF was found to be significantly higher in rice suspensions with an inducible promoter than in tobacco suspensions with constitutive transgene expression [10]. The predominance of tobacco-based expression systems in tissue culture studies differs from the situation with whole plants, where advantages associated with producing foreign proteins in edible species or in storage organs such as seeds have resulted in a variety of plant species being transformed. Most research into in vitro foreign protein production has been undertaken using cell suspensions. However, other forms of plant tissue culture such as hairy roots and shooty teratomas have also been tested in a number of studies (Table 1). The characteristics of different types of plant tissue culture and their utility for largescale foreign protein production are outlined in the following sections. 2.1 Suspended Cell Cultures
Plant cell suspensions comprise small clumps of dedifferentiated plant cells in liquid nutrient medium. Dedifferentiation of the cells occurs under the influence of plant growth regulators, which must be provided in the medium to promote rapid growth and maintain the culture morphology. Transgenic cell suspensions can be developed from callus initiated using explants from transformed plants; alternatively, wild-type suspensions may be transformed directly using
3
Nicotiana tabacum (tobacco) Nicotiana tabacum (tobacco)
Oryza sativa (rice)
Nicotiana tabacum (tobacco)
Nicotiana tabacum (tobacco) Nicotiana tabacum (tobacco)
Whole plant, hydroponic
Suspension
Callus
Suspension
Suspension
Suspension
Alkaline phosphatase, human placental Antibody, scFv, against phytochrome
Antibody, murine IgG-2b/κ, against tobacco mosaic virus
Antibody, scFv, against carcinoembryonic antigen Antibody, heavy chain, against pazophenylarsonate
Plant species
Culture type
Foreign protein
A. tumefaciens transformation of leaf explant, regenerated plants sexually crossed
A. tumefaciens transformation of leaf explant
A. tumefaciens transformation of suspension
Microparticle bombardment of callus
A. tumefaciens transformation of leaf explant A. tumefaciens transformation of leaf explant
Transformation method
Table 1 Expression of foreign proteins in plant tissue culture
Murine
Murine Ig
CaMV 35S
Native murine
Native murine
Native murine
CaMV 35S
CaMV 35S
Maize ubiquitin-1
Not reported
Mannopine synthase (mas2 ) CaMV 35S Tobacco pathogenesis related protein (PR1a) Murine heavy-and light-chain IgG
Leader sequence
Promoter
1.2 mg g−1 dry weight (t) 7.5 mg L−1 (t) 3.6 mg L−1 (e) 6.5% TSP (t) 12% TSP (t) with PVP
170 µg L−1 (i) 10 µg L−1 (e) 360 µg L−1 (e) with PVP 80 µg mL−1 (i) 300 µg mL−1 (e) 15 µg g−1 wet weight (i) 45 µg g−1 wet weight (i) with amino acids
0.45 µg g−1 fresh weight without KDEL (i) 3.8 µg g−1 fresh weight with KDEL (i) 150 µg L−1 (t) 430 µg L−1 (t) with DMSO
20 µg day−1 g−1 root dry weight (e) 3% of total medium protein (e) 0.5 mg L−1 (e) 5.0% of total medium protein (e)
Production level (maximum)
17
62
67
11
64
31
73
72
Reference
4 Foreign Protein Expression Using Plant Cells
Suspension
Hairy root
Suspension
Bryodin 1
Cytochrome P450 2E1, rabbit
Erythropoietin, human
Nicotiana tabacum (tobacco)
Atropa belladonna
Nicotiana tabacum (tobacco)
Oryza sativa (rice)
Nicotiana tabacum (tobacco)
Shooty teratoma
Suspension
Nicotiana tabacum (tobacco)
Hairy root
Antibody, murine IgG1 , against Streptococcus mutans surface antigen
α 1 -antitrypsin, human
Plant species
Culture type
Foreign protein
Table 1 (continued)
A. tumefaciens transformation of leaf explant, regenerated plants sexually crossed Microparticle bombardment of callus Microparticle bombardment of suspension A. rhizogenes transformation of leaf explant A. tumefaciens transformation of suspension
A. tumefaciens transformation of leaf explant, regenerated plants sexually crossed
Transformation method Murine Ig
Murine Ig
CaMV 35S
CaMV 35S
Extensin
Not reported
Native human erythropoietin
Rice α-amylase, inducible CaMV 35S
CaMV 35S
CaMV 35S
Rice α-amylase
Leader sequence
Promoter
0.8 µg L−1 (t) 0.0026% TSP
Not reported
85 mg L−1 (e) 5.7 mg g−1 dry weight (e) 51 mg L−1 (e) 30 mg L−1 (e)
18 mg L−1 (t) 1.8% TSP (i) 3.2 mg L−1 (e) 10.8 mg L−1 (e) with PVP 1.1 mg g−1 dry weight (t) 7.0 mg L−1 (t) 1.4 mg L−1 (e) 3.0% TSP (t) 4.0% TSP (t) with PVP 0.28 mg g−1 dry weight (t) 3.2 mg L−1 (t)
Production level (maximum)
(continued)
61
22
74
29,39
17
19,17
Reference
2 Production of Foreign Proteins Using Plant Tissue Culture 5
Hepatitis B surface antigen
Hairy root
Green fluorescent protein
Suspension
Whole plant, hydroponic
Nicotiana tabacum (tobacco)
Suspension
Granulocytemacrophage colony stimulating factor, human (hGM-CSF)
Nicotiana tabacum (tobacco)
Glycine max (soybean)
Nicotiana tabacum (tobacco)
Catharanthus roseus
Hyoscyamus muticus
Oryza sativa (rice)
Plant species
Culture type
Foreign protein
Table 1 (continued)
Chimeric ocs-mas
Not reported
N. plumbaginifolia calreticulin Not reported
Mannopine synthase (mas2 ) Chimeric ocs-mas
–
–
Not reported
Rice α-amylase
Rice α-amylase, inducible CaMV 35S
Glucocorticoidinducible
129 mg L−1 (e) 25% total medium protein (e)
Natural mammalian
CaMV 35S
A. tumefaciens transformation of explant Microparticle bombardment of callus A. tumefaciens transformation of hairy root A. rhizogenes transformation of plant A. tumefaciens transformation of plant Microparticle bombardment of suspension A. tumefaciens transformation of suspension
0.31 mg g−1 dry weight (i)
1.7 mg g−1 dry weight (i) 20–22 mg L−1 (i)
296–923 ng day−1 g−1 root dry weight (e)
–
180 µg L−1 (e) 783 µg L−1 (e) with gelatin
150 µg L−1 (i) 240 µg L−1 (e)
Tobacco etch virus
CaMV 35S
A. tumefaciens transformation of callus
Production level (maximum)
Leader sequence
Promoter
Transformation method
14
14
72
34
23
10
65
40
Reference
6 Foreign Protein Expression Using Plant Cells
Suspension
Suspension
Suspension
Suspension
Suspension
Interleukin-2, human
Interleukin-4, human
Interleukin-12, human
Invertase, carrot
Lysozyme, human
Oryza sativa (rice)
Nicotiana tabacum (tobacco)
Nicotiana tabacum (tobacco)
Nicotiana tabacum (tobacco)
Nicotiana tabacum (tobacco)
Plant species
A. tumefaciens transformation of suspension A. tumefaciens transformation of suspension A. tumefaciens transformation of leaf explant, regenerated plants sexually crossed A. tumefaciens transformation of suspension Microparticle bombardment of callus
Transformation method
Leader sequence
Natural mammalian Natural mammalian Native human interleukin-12 subunit
Native carrot invertase Rice α-amylase
Promoter
CaMV 35S
CaMV 35S
CaMV 35S
CaMV 35S
Rice α-amylase, inducible
1400 U L−1 (i) 40 U g−1 dry weight (i) 150 U L−1 (e) 4% TSP
60 µg L−1 (i) 175 µg L−1 (e) 700 µg L−1 (e) with gelatin
275 µg L−1 (i) 180 µg L−1 (e)
75 µg L−1 (i) 10 µg L−1 (e)
Production level (maximum)
30
15
60
75
75
Reference
i = protein in the biomass; e = protein in the medium; t = total protein CaMV = cauliflower mosaic virus; DMSO = dimethylsulfoxide; PVP = polyvinylpyrrolidone; TSP = total soluble protein; U = unit
Culture type
Foreign protein
Table 1 (continued)
2 Production of Foreign Proteins Using Plant Tissue Culture 7
8 Foreign Protein Expression Using Plant Cells
Agrobacterium tumefaciens-mediated transfection or biolistic delivery of plasmid DNA into the cells (Table 1). Plant suspensions are being used to produce an increasing number of foreign proteins. These include complete antibodies, antibody fragments, hGM-CSF, interleukin-2, interleukin-4 and interleukin-12, erythropoietin, hepatitis B surface antigen, α 1 -antitrypsin, human lysozyme and carrot invertase. Although, in some instances, stable production of foreign proteins has been found to occur over extended periods [11–15], suspension cultures are subject to various types of genetic instability through the effects of somaclonal variation [16]. Significant reductions in the yield of foreign proteins over time, possibly caused by genetic instability, have been reported in plant cell suspensions [12–14, 17]. In some cases, cell lines with stable production characteristics were isolated by screening and selection from a large number of cultures [12–14]. 2.2 Hairy Root Cultures
Hairy roots are neoplastic roots produced by transformation of plant cells with Agrobacterium rhizogenes. When cultured in liquid medium, hairy roots often exhibit rapid growth relative to untransformed roots. Hairy roots can be propagated indefinitely in liquid medium and retain their morphological integrity and stability in the absence of exogenous plant growth regulators. Hairy root cultures have been found to have significantly greater long-term stability than suspended plant cells for the production of foreign proteins [17]. Transgenic hairy root cultures can be initiated by infecting transgene-containing plants or explants with A. rhizogenes (Table 1). Using this approach, it is relatively easy to generate hairy roots expressing multiple foreign genes, as plants containing multiple transgenes (produced by sexually crossing transgenic plants carrying single transgenes) may be used for hairy root initiation. For example, transgenic tobacco plants developed by crossing antibody-heavy-chain-expressing plants with antibody-light-chain-expressing plants [18] were used subsequently to generate hairy roots capable of synthesising complete IgG1 antibody [19–21]. Transgene-expressing hairy roots can also be obtained by performing root initiation and transformation at the same time using genetically-modified A. rhizogenes with the transgene inserted into plasmid constructs [22]. Alternatively, established hairy root cultures can be induced to produce foreign proteins by direct A. tumefaciens-mediated transformation [23]. 2.3 Shooty Teratoma Cultures
Shooty teratomas are a form of differentiated organ culture produced by transformation of plants with particular strains of Agrobacterium tumefaciens [24]. Foreign protein production in transgenic shooty teratomas has been reported by only one group [17, 21]. In this system, shooty teratomas of tobacco were used to produce
3 Strategies for Improving Foreign Protein Accumulation and Product Recovery in Plant Tissue Culture 9
an IgG1 antibody. Antibody yields in the teratoma cultures were lower than in suspended cell and hairy root cultures [17]. The growth characteristics of shooty teratomas were also not conducive to liquid culture as the shoots tended to callus and were very susceptible to hyperhydricity (vitrification). 2.4 Scale-up Considerations for Different Forms of Plant Tissue Culture
As indicated in Table 2, several studies of foreign protein production have been carried out using plant cell suspensions or hairy roots in bioreactors. Bioprocess development for the large-scale culture of suspended plant cells is relatively well established, as this type of culture has been examined extensively for the production of plant secondary metabolites such as paclitaxel (taxol), ginseng and shikonin [25– 27]. Accordingly, if suspension cultures suitable for the commercial production of foreign proteins were developed, the basic technology for large-scale operations is already available. Research into bioreactor systems for more complex forms of tissue culture such as roots and shoots is not as well developed. The principal reactor types trialed for large-scale hairy root culture have been reviewed by Giri and Lakshmi Narasu [28]. Difficulties associated with providing a low-shear environment while maintaining adequate mixing and oxygen transfer present significant problems for the scale-up of root reactors. However, despite these engineering challenges, if advantages such as enhanced culture stability are associated with hairy roots compared with suspended cells [17], further technical development of root cultures for foreign protein production would be worthwhile.
3 Strategies for Improving Foreign Protein Accumulation and Product Recovery in Plant Tissue Culture
As indicated by the protein accumulation levels in Table 1, it is currently possible using plant tissue culture to achieve moderate levels of foreign protein expression in some systems. To take advantage of the cost benefits associated with recovering products from the culture medium rather than from homogenized biomass, expression systems for protein secretion have also been developed. Yet, relative to the production levels attained in animal cell cultures, foreign protein concentrations in plant cultures are typically very low. This is a major hurdle preventing plant systems being utilized more widely for commercial protein production. Therefore, a key research objective has been to increase the accumulation of active foreign proteins in plant tissue culture. Although the reasons for the low yields in plant systems are not yet fully understood, two different approaches have been taken to increase product levels in vitro. These are: (i) to increase the level of gene expression in the cells by altering the transgene constructs and methods of expression, and (ii) to increase the retention and stability of foreign protein in the cultures after the protein is produced. Efforts have also been made to enhance the
Nicotiana tabacum (tobacco)
Nicotiana tabacum (tobacco)
Oryza sativa (rice)
Hairy roots
Suspension
Suspension
Suspension
Antibodies, antibody fragments, antibody fusion proteins (unspecified) α 1 -antitrypsin, human
Invertase, carrot
i = intracellular protein; e = protein in the medium
Nicotiana tabacum (tobacco)
Suspension
Antibody, heavy chain, against p-azophenylarsonate Antibody, murine IgG1 , against Streptococcus mutans surface antigen
Nicotiana tabacum (tobacco)
Plant species
Culture type
Foreign protein
Table 2 Production of foreign proteins using plant tissue culture in bioreactors
10 L, stirred, dual 6-blade turbines, continuous operation, 75 days
5 L working volume, stirred, single pitched-blade impeller, 2-stage batch operation, 6–8 days 2 L, stirred, 2-stage batch operation, 13 days
3.5 L working volume, stirred, 6-blade disc impeller, batch operation, 8 days 2 L, magnetic stirrer, vertical cylindrical wire-mesh cage for biomass support, batch operation, 30 days 40 L working volume, stirred, 3-blade impeller, batch operation, 150 h
Bioreactor system
25.6 mg L−1 (e) 4.6 mg g−1 dry weight (e) 20.8 U h−1 L−1 (i) 15 0.8–1.0 U mg−1 protein (i)
51 mg L−1 (e) 7.3 mg day−1 L−1 (e)
Not reported
1.9 mg L−1 (e) 0.45 mg g−1 dry weight (i)
110 µg L−1 (i) 190 µg L−1 (e)
Production level (maximum)
29
39
76
19
67
References
10 Foreign Protein Expression Using Plant Cells
3 Strategies for Improving Foreign Protein Accumulation and Product Recovery in Plant Tissue Culture 11
availability of product in the culture medium to facilitate subsequent recovery and purification. 3.1 Expression Systems
Compared with whole plants, there has been limited development of foreign protein expression systems specifically for use in tissue culture. Some modifications of expression constructs have resulted in improved protein accumulation or have allowed simplified protein recovery. However, in general, modified expression systems have been tested only in a restricted number of cases and have not resulted in the large increases in product yield required for plant cultures to compete with other foreign protein production vehicles. Transient expression techniques, for example using viral vectors, that have been developed for use in whole plants have not yet been applied in plant tissue culture. 3.1.1 Modifications to Existing Expression Constructs Several molecular strategies have been successful in increasing foreign protein production in cultured plant cells. These include using promoters for inducible expression [10, 12, 29], optimizing codon usage [30] and adding the KDEL sequence to ensure protein retention in the endoplasmic reticulum [31]. Application of different promoters has been the most common approach to the modification of expression constructs. As indicated in Table 1, most of the promoters used in plant tissue culture have been based on the constitutive cauliflower mosaic virus (CaMV) 35S promoter. In contrast, inducible promoters have the advantage of allowing foreign proteins to be expressed at a time that is most conducive to protein accumulation and stability. Although a considerable number of inducible promoters has been developed and used in plant culture applications, e.g. [32–37], the only one to be applied thus far for the production of biopharmaceutical proteins is the rice α-amylase promoter. This promoter controls the production of an α-amylase isozyme that is one of the most abundant proteins secreted from cultured rice cells after sucrose starvation. The rice α-amylase promoter has been used for expression of hGM-CSF [10], α 1 -antitrypsin [12, 29, 38, 39] and human lysozyme [30]. Alterations to the proteins and pre-proteins expressed by cultured plant cells have been used to facilitate product recovery. A leader sequence is required for foreign protein secretion from plant cells into the apoplast and then into the culture medium. As indicated in Table 1, plant, mammalian and viral sequences have been employed to achieve the entry of transgenic proteins into the bulk-flow pathway in plant cultures. To facilitate product recovery and purification, molecular tags may be added to foreign proteins. Attachment of a functional His6 tag to a secreted therapeutic protein, hGM-CSF, has been examined in tobacco suspension cultures [40]. The His6 tag consisted of six histidine residues attached to the protein terminus and
12 Foreign Protein Expression Using Plant Cells
gave the protein the ability to bind strongly to metal ions. The presence of the His6 tag allowed the specific removal of product from the culture medium using iminodiacetic acid metal affinity resin [41]. 3.1.2 Transient Expression Using Viral Vectors Genetically modified viral vectors have been applied in many whole-plant systems for the production of therapeutic proteins and epitope vaccines, e. g. [42–46]. Foreign proteins produced using viral vectors can be in the form of free cytosolic proteins or fusions to viral proteins. Viral expression systems exploit the ability of viruses to propagate rapidly and achieve high concentrations in plant tissues. For example, tobacco mosaic virus (TMV) can accumulate in infected tobacco leaves to levels greater than 60 mg g−1 dry weight [47] and produce amounts of TMV coat protein accounting for 10–40% of the total protein content of the leaves [42]. Provided the movement proteins on recombinant viruses remain functional, viral vectors are able to spread throughout the entire plant from a single infection point via the plasmodesmata between individual cells and the vascular system. Therefore, in principle, when foreign protein is co-expressed with the virus, large amounts of product can be formed. To date, application of transgenic viruses in whole plants has not resulted in the production of foreign proteins to the same high levels as viral proteins from nontransgenic virus infections. This is probably because the genetic construct carried by the virus interferes to some extent with the normal folding, packaging, transmission or replication processes [48]. Nevertheless, foreign protein yields achieved using viral vectors can be substantial. For example, transgenic viruses with coat protein fusions have been reported to accumulate to levels of 1–3 mg g−1 of plant tissue [49, 50]. A vector that facilitates high-level protein expression in plant tissue culture, particularly a transient expression system that could be applied to existing wild-type cultures, would be advantageous for in vitro foreign protein production. However, such a system has not yet been developed. The success of this approach depends in part on whether appropriate levels of viral infection, replication and transmission can be established within tissue culture systems. Previous work has shown that mechanical inoculation techniques, for example, rubbing cells with abrasive powder [51] or vibrating cell suspensions in a vortex mixer [52], can be used to infect plant cell suspensions with viruses. Other methods that have been tested include microinjection of viruses into plant cells [53] and inoculation of callus by pricking the tissues with needles dipped in a virus suspension [51]. Viral infection has been reported to occur to some extent even without special mechanical treatment of cultured plant cells [52]. It is thought that viral agents are able to enter cells via the plasmodesmata observed to be present in dedifferentiated cultures [54]. Infection of suspended cells was found to be most successful when friable cell clumps were freshly dispersed from callus into liquid medium containing the virus [55]. The reason given for this was that the protoplasmic connections between the cells were broken in this procedure so that the plasmodesmata were
3 Strategies for Improving Foreign Protein Accumulation and Product Recovery in Plant Tissue Culture 13
exposed allowing viral entry into the cells [52]. Additional non-intentional injury may also occur to cells in agitated culture, thus providing other routes for virus infection. In previous work, levels of viral accumulation in plant cell suspensions have been significantly lower than those achieved in whole plants. For example, suspended tobacco cells have been reported to accumulate only one-thirtieth to one-fortieth the concentration of TMV attainable in tobacco leaves [56]. In other experiments, maximum TMV coat-protein levels of only about 250 µg g−1 fresh weight were measured in tobacco cell suspensions [57]. Even though plant cells in suspension tend to aggregate so that individual cells in clumps are connected by plasmodesmata, the spread of virus between infected and uninfected cells that are not in direct contact is likely to be very limited. Therefore, compared with the recombinant systems already available for tissue culture applications, suspended plant cells may offer no significant advantage for improving foreign protein yields using virus-based expression. However, other forms of plant tissue in which the cells are in close and constant contact with each other, such as differentiated organs, may prove feasible hosts for high-level viral expression of foreign proteins in vitro. We have examined the characteristics of virus infection of hairy roots to determine if root cultures would be suitable for foreign protein production using viral vectors. Extensive cell-to-cell contact occurs in hairy roots and some viral transport may also be possible through the vascular tissue. As shown in Figure 1, TMV was produced in significant quantities in Nicotiana benthamiana hairy roots, mostly during the period of active root growth. The concentration of virus in replicate
Fig. 1 Root growth (◦) and accumulation of tobacco mosaic virus (TMV) (•) in hairy roots of N. benthamiana. TMV concentrations were measured by ELISA. The error bars indicate standard errors from four replicate shake-flask cultures.
14 Foreign Protein Expression Using Plant Cells
cultures was found to vary considerably. The average concentration of virus between days 21 and 36 was approximately 2 mg g−1 dry weight; however, levels of virus in individual cultures were as high as 5 mg g−1 . These results demonstrate the potential of hairy roots for the propagation of plant viruses. Further work is underway to test the production of co-expressed foreign proteins in hairy root cultures using a modified TMV vector.
3.2 Secretion of Foreign Proteins
Proteins produced in plant cells can remain within the cell or are secreted into the apoplast via the bulk transport (secretory) pathway. In whole plants, because levels of protein accumulated intracellularly, e.g. using the KDEL sequence to ensure retention in the endoplasmic reticulum, are often higher than when the product is secreted [58], foreign proteins are generally not directed for secretion. However, as protein purification from plant biomass is potentially much more difficult and expensive than protein recovery from culture medium, protein secretion is considered an advantage in tissue culture systems. For economic harvesting from the medium, the protein should be stable once secreted and should accumulate to high levels in the extracellular environment. Secretion of foreign proteins into the medium requires that the protein molecules move through the cell walls. The pores in plant cell walls are thought to allow passage of globular proteins of maximum size around 20 kDa; however, a small number of wider pores may serve as channels for relatively slow permeation of larger molecules [59]. Foreign proteins with molecular weights significantly greater than 20 kDa have been recovered in substantial quantities from plant culture media [17, 29, 60]. In other cases, despite having signal sequences that allow the protein to reach and traverse the plasma membrane, recombinant proteins such as erythropoietin [61] and IgG-2b/κ antibody [62] remain associated with the plant cell wall and fail to be released from the biomass. These results suggest that protein composition and structure may affect the extracellular availability of secreted foreign proteins. The presence of foreign protein in the medium of plant cultures does not necessarily mean that all or even most of the product can be recovered from the medium. In many expression systems where an appropriate signal sequence has been used, considerable amounts of foreign protein remain within the plant cells and/or tissues. For example, in a comparison of IgG1 antibody production in tobacco cell suspension and hairy root cultures, a maximum of 72% of the total antibody was found in the medium of the suspension cultures whereas only 26% was found in the medium of the hairy root cultures [17]. This result could indicate that secretion and/or transport across the cell wall was slower in the hairy roots; alternatively, it could indicate poorer stability of the secreted protein in the hairy root medium. If foreign proteins are to be purified from the medium, improved secretion and extracellular product stability are desirable.
3 Strategies for Improving Foreign Protein Accumulation and Product Recovery in Plant Tissue Culture 15
3.3 Foreign Protein Stability
There is considerable evidence that foreign proteins are subject to a significant degree of degradation and instability in plant expression systems, both inside and outside of the cells. 3.3.1 Stability Inside the Cells Foreign protein fragments often appear in addition to the intact protein in western blots of extracts from transgenic plants and plant cells [21]. This phenomenon is not confined to plant tissue cultures or particular host species, and occurs in seeds as well as vegetative tissues. A detailed investigation of IgG1 antibody fragments in tobacco cell suspension and hairy root cultures has been carried out by Sharp and Doran [21]. Although various explanations have been offered for the presence of foreign protein fragments, such as protease release during sample homogenization, the presence of assembly intermediates, and variations in the extent of protein glycosylation, in the case of IgG1 antibody these explanations could not adequately account for all the observed molecular properties of the fragments. Instead, with the aid of a range of affinity probes, glycosylation and secretion inhibitors and glycan-reactive agents, proteolytic degradation in the apoplasm of tissues such as hairy roots, and between the endoplasmic reticulum and Golgi apparatus in both hairy roots and suspended cells, was identified as the most likely mechanism of fragment formation. 3.3.2 Stability Outside the Cells The simplicity of plant culture media is considered an advantage for foreign protein production in tissue culture systems. However, as a mixture of salts and sugar containing several heavy metals but negligible protein (except for any protein secreted from the plant cells), plant culture medium provides an environment for foreign proteins that is very different from the physiological conditions inside the cells. The effect of medium composition on the concentration of IgG1 antibody in solution is illustrated in Figure 2. In this experiment, antibody was added at a concentration of 1.0 mg L−1 to fresh, sterile media in shake flasks. The flasks were then incubated on an orbital shaker at 25◦ C and the antibody concentration was measured as a function of time using an enzyme-linked immunosorbent assay (ELISA) [63]. As shown in Figure 2, there is a significant difference between the extent of antibody retention in Murashige and Skoog (MS) plant culture medium and in media designed to support the growth of animal cells. After 7 hours, about 80% of the added antibody was retained in Dulbecco’s minimal essential medium (DMEM) containing 10% fetal bovine serum and about 70% was present in serumfree Ex-cell 302 medium. In contrast, in MS medium, less than 10% of the added antibody could be detected after only 1.5 h. These results indicate that fresh, sterile plant culture medium does not support the retention and stability of proteins in solution.
16 Foreign Protein Expression Using Plant Cells
Fig. 2 Stability of IgG1 monoclonal antibody added to sterile plant and animal cell culture media. (•) Murashige and Skoog (MS) medium; () Dulbecco’s minimal essential medium (DMEM) with 10% serum; and () serum-free Ex-cell 302 medium.
The error bars indicate standard errors from triplicate flasks. (Reproduced with permission, from B. M. -Y. Tsoi and P. M. Doran, Biotechnol. Appl. Biochem. 2002, 35, 171–180. ľ Portland Press on behalf of the IUBMB.)
There have been many reports from several groups that plant culture medium is not conducive to protein stability, and that the retention of secreted proteins in culture media can be very poor [10, 11, 17, 40, 60, 63–66]. The mechanisms responsible for protein loss from plant culture media are not completely understood; however, current indications are that multiple factors may be involved. Processes that have been proposed to affect foreign proteins in plant media include protein degradation due to protease activity [10, 17, 20, 38, 60, 65], protein instability due to defined or undefined conditions or components in the medium [40, 63, 64, 66], surface adsorption of proteins onto the culture vessel [11, 17] and protein aggregation or insolubility [17]. As declining levels of foreign protein in plant tissue culture have been associated in a number of studies with an increase in the concentration of extracellular proteases [10, 60, 65], minimizing protease levels in the medium and/or reducing the susceptibility of heterologous proteins to protease degradation have been investigated as methods for improving foreign protein accumulation. Several approaches have been tested to achieve this objective with varying levels of success. These include adding the broad-spectrum protease inhibitor bacitracin to the medium of cell suspension and hairy root cultures [20], adjusting the osmolarity of the medium to minimize cell disruption and protease release [38], adding gelatin as a possible alternative substrate for protease activity [60], reducing protease accumulation by using inducible promoters to allow separation of the growth and production phases [10] and using host species such as rice that are considered to secrete lower levels of proteases than the more commonly applied tobacco [10].
3 Strategies for Improving Foreign Protein Accumulation and Product Recovery in Plant Tissue Culture 17
3.3.3 Medium Additives Protein stabilizing agents such as polyvinylpyrrolidone (PVP) [11, 17, 19, 60, 65, 66], gelatin [19, 60, 65], bovine serum albumin (BSA) [40] and salt (NaCl) [40] have been demonstrated to improve the retention of several types of foreign protein in plant tissue culture media. The precise mode of action of these additives in protecting proteins is unclear; however, they may prevent protein aggregation, conformational change and/or adsorption onto the internal surfaces of the holding vessel. The appropriate stabilizing polymer must be identified for each foreign protein production system, as their effects appear to vary depending on the specific culture and its protein product. The addition of biopolymers, particularly proteins such as BSA, to tissue culture media has the potential to complicate downstream processing operations for product recovery. However, when the resulting increase in protein yield is large, the additional cost of product purification may be acceptable. Several medium additives other than the protein stabilizing agents mentioned above have also been tested to improve foreign protein accumulation in plant tissue cultures. These include dimethylsulfoxide (DMSO) [64], polyethylene glycol [60, 65], nitrate [19], amino acids [62], heamin [63], gibberellic acid [63] and glutamine [67]. The mechanisms by which these components might affect intra- and extracellular foreign protein levels include improving protein expression and synthesis, increasing protein secretion, reducing the extent of intracellular protein degradation and improving protein stability in the medium. The results of empirical studies carried out to test the effects of medium additives on foreign protein accumulation in plant tissue culture are summarized below. Polyvinylpyrrolidone (PVP) PVP is a metabolically inert, water-soluble polymer with excellent protein stabilizing properties. As an example of the beneficial effect of PVP on foreign protein accumulation in plant tissue culture, data for growth and IgG1 antibody levels in transgenic tobacco hairy root cultures with and without PVP are shown in Figure 3. In these experiments, PVP with a relative molecular mass of 360,000 was added to the medium at a concentration of 1.5 g L−1 . PVP had no significant effect on root growth (Fig. 3a); in a number of studies, PVP at concentrations up to 3 g L−1 has been found to have little effect on growth of plant cell and organ cultures [11, 17, 19, 66]. The primary effect of PVP was a substantial increase in the amount of foreign protein in the culture medium (Fig. 3b); the maximum level of antibody in the medium with PVP was about four-fold greater than that without PVP. As indicated in Figure 3c, on average throughout the culture period the effect of PVP on antibody levels in the root biomass was relatively small. Similar results have also been reported for PVP-treated plant cell suspensions producing foreign protein [66]. The addition of PVP 360,000 at a concentration of 0.75 g L−1 has been reported to yield a 35-fold increase in the level of extracellular foreign protein in suspended plant cell cultures [66]. The effectiveness of PVP in stabilizing secreted proteins depends on both the polymer molecular weight and its concentration. Low-molecularweight (10,000 and 40,000) PVP was found to be less effective than PVP 360,000 [66]. Increasing concentrations of PVP 360,000 up to 1.0 g L−1 improved antibody
18 Foreign Protein Expression Using Plant Cells
Fig. 3 Effect of PVP on (a) growth; (b) antibody in the medium; and (c) antibody in the biomass, for N. tabacum hairy roots expressing IgG1 antibody. () Cultures with added PVP; () cultures without PVP. The error bars indicate standard errors from triplicate cultures; the initial culture volume
was 50 mL. Much higher amounts of antibody were retained in the medium with PVP. (Reproduced with permission, from J. M. Sharp and P. M. Doran, Biotechnol. Prog. 2001, 17, 979–992. Copyright 2001 Am. Chem. Soc.)
accumulation in hairy root culture medium; however, above this concentration there was no further increase in antibody levels [19]. Addition of PVP after extracellular foreign protein levels had decreased during plant suspension culture did not result in a recovery of the protein [66]. Although PVP has proven successful as a foreign protein stabilizer in several systems and has yielded substantial increases in product concentrations as discussed above, it has been found relatively ineffective in other plant tissue cultures producing foreign proteins [60, 65].
3 Strategies for Improving Foreign Protein Accumulation and Product Recovery in Plant Tissue Culture 19
Gelatin Gelatin has been shown to enhance foreign protein levels in the medium of transgenic plant tissue cultures [19, 60, 65]. However, growth in both suspended cell and hairy root cultures was reduced, with the negative effect increasing with gelatin concentration [19]. It has been suggested that gelatin could act either as a protein stabilizing agent or as an alternative substrate for protease activity [60]. The addition of gelatin has been associated with significant increases in extracellular foreign protein levels in plant cultures. The concentration of interleukin-12 in suspension culture medium increased four-fold in the presence of 2% gelatin [60], while 0.1–0.9% gelatin increased the concentration of IgG1 antibody in hairy root medium by factors of between four and eight [19]. The addition of 5% gelatin to cell suspensions expressing hGM-CSF resulted in a 4.6-fold improvement in the yield of extracellular product [65]. As gelatin is a common food additive with applications in the pharmaceutical industry, its introduction into foreign protein production systems may generate fewer regulatory concerns than other biopolymers. Bovine Serum Albumin (BSA) The addition of BSA to tobacco suspensions producing hGM-CSF increased the maximum protein concentration in the medium by a factor of two without affecting intracellular antibody levels or cell growth [40]. However, as an animal-derived protein, BSA has the potential to introduce mammalian pathogens into plant tissue cultures, thus negating an important advantage associated with using plant systems for the synthesis of pharmaceutical and therapeutic proteins. The introduction of exogenous proteins into plant cultures may also complicate the downstream processing of foreign proteins. Salt (NaCl) The addition of 50–100 mM NaCl to tobacco cell suspensions reduced cell growth but resulted in a 50% increase in the maximum level of hGM-CSF secreted into the culture medium [40]. Intracellular hGM-CSF was relatively unaffected by this treatment. Dimethylsulfoxide (DMSO) When added to plant suspension cultures, DMSO functions as a cell permeabilizing agent facilitating the release of intracellular products into the culture medium [68]. Use of DMSO as an effector for antibody secretion and/or a protein-stabilizing agent has been tested using tobacco cell suspensions and hairy roots [64, 69]. The addition of DMSO to suspended cells expressing an antibody heavy chain resulted in a significant increase in both intracellular and extracellular product levels [64]. Whereas untreated cultures accumulated a maximum of 150 µg L− of total antibody, treated cultures accumulated 210 µg L−1 in the presence of 2.8% DMSO and 430 µg L− in the presence of 4% DMSO. In contrast, when DMSO at a concentration of 2.8% was added to transgenic hairy roots at various times during the culture, there was no significant effect on IgG1 antibody accumulation in either the biomass or medium [69].
20 Foreign Protein Expression Using Plant Cells
Nitrate Supplementation of Gamborg’s B5 medium with 0.1% KNO3 (in addition to the 0.25% KNO3 usually present in B5 medium) resulted in a significant increase in antibody levels in tobacco hairy root cultures [19]. The antibody concentration in the medium increased 2.8-fold and the total antibody accumulation increased by 90% relative to cultures without added nitrate. Amino Acids Transgenic tobacco suspensions supplemented with a cocktail of essential and nonessential amino acids produced three-fold more IgG-2b/κ antibody than untreated cultures [62]. The amino acids were added 10 h prior to harvesting of the cultures in the presence of 1 mM CaCl2 to facilitate amino acid uptake. Other Additives Further additives have been tested for their influence on foreign protein accumulation in plant tissue cultures. These include heamin (C34 H32 O4 N4 FeCl), an effective inhibitor of ubiquitin-dependent proteolysis, the plant hormone gibberellic acid, which has been shown to enhance the activity of the endoplasmic reticulum and secretory pathway in plant cells, and the amino acid glutamine [63, 67]. However, these compounds had relatively little effect on foreign protein levels. Addition of polyethylene glycol to plant culture medium also did not significantly improve the accumulation of foreign protein in the systems tested [60, 65]. 3.3.4 Medium Properties Modifying the properties of plant culture media, including increasing the osmolarity, reducing the effective concentration of selected heavy metals and altering the pH, has resulted in enhanced foreign protein accumulation or stability in some systems. Osmolarity Hyperosmolar medium is known to be advantageous for the production of proteins such as antibodies in animal cell cultures [70, 71]. Likewise, raising the osmolarity of MS medium using mannitol was found to improve the accumulation of foreign protein in both the biomass and medium in plant cell suspensions [63]. When IgG1 -secreting tobacco cell suspensions were grown in hyperosmolar medium, maximum antibody levels in the medium increased by up to 2.4-fold compared with standard MS medium. This result was possibly due to greater amounts of antibody being produced or secreted from the cells, as IgG1 antibody added to fresh, sterile hyperosmolar medium did not show increased stability compared with unmodified medium [63]. The influence of medium osmolarity on foreign protein accumulation and activity was also demonstrated by Terashima et al. [38] using rice suspensions. In this system, α 1 -antitrypsin was expressed using an inducible promoter that allowed protein production to occur when the sucrose content of the medium was low. Sucrose starvation was achieved by replacing the plant growth medium with sugar-free production medium. However, the removal of sucrose decreased the osmolarity of the medium and the activity of the α 1 -antitrypsin produced was relatively low. Increasing the osmolarity of the production medium using mannitol or
3 Strategies for Improving Foreign Protein Accumulation and Product Recovery in Plant Tissue Culture 21
N-(2-hydroxyethyl) piperazine-N -2-ethanesulfonic acid) (HEPES)/NaCl resulted in a significant improvement in α 1 -antitrypsin activity [38]. Heavy Metal Concentration The heavy metals copper, manganese, cobalt and zinc were omitted individually and in combination from MS and B5 media to determine the effect on antibody stability in solution [63]. When IgG1 antibody was added to these modified media in experiments similar to the one represented in Figure 2, only the B5 medium without Mn showed a significant improvement in antibody retention relative to normal culture media. Nevertheless, protein losses were considerable as only about 30% of the added antibody could be detected in the Mn-free medium after about 5 h. The beneficial effect of removing Mn was lost when all four heavy metals, Cu, Mn, Co and Zn, were omitted simultaneously. The reason for these results is unclear. Addition of the metal chelating agent ethylenediaminetetraacetate (EDTA) had a negligible effect on antibody retention in both MS and B5 media [63]. pH Solution pH has a strong influence on the structure, conformation and solubility of globular proteins. However, when IgG1 antibody was added to fresh, sterile B5 medium adjusted to pH values between 4.0 and 8.0, there was no significant difference in the rate at which the antibody disappeared from the different solutions [63]. In other work, antibody-expressing tobacco hairy roots were cultured in B5 medium with initial pH between 3.0 and 11.0 [69]. Root growth was affected severely at the lowest and highest pH values and total antibody levels declined as the initial pH was increased above 5.0–6.0. 3.4 Bioprocess Developments
The development of novel bioreactors and bioreactor operating strategies has the potential to improve the performance of cell culture systems. Some progress has been made in this area to enhance foreign protein production in plant tissue culture. 3.4.1 Product Recovery from the Medium Continuous or periodic harvesting of secreted foreign proteins from plant culture media could be used to increase product yields by removing active protein before it can be degraded or otherwise lost from the culture. Shake-flask-based affinity-chromatography bioreactors have been used for simultaneous production and purification of foreign proteins in vitro [41]. Heavy chain antibody secreted by suspended plant cells was recovered by continuously recycling the culture medium through a column containing Protein G resin. The harvested antibody was eluted from the column at the end of the culture period. The extent to which the antibody bound to the resin depended on the recirculation flow rate and the column pH. Although extracellular antibody concentrations were similar in the affinity-chromatography and control flasks, protein harvesting increased the total production of secreted protein more than eight-fold [41]. In similar experiments,
22 Foreign Protein Expression Using Plant Cells
hGM-CSF with a His6 tag [40] was harvested semi-continuously by recycling the medium through a metal affinity column [41]. Cell growth was reduced in this system; however, the total amount of hGM-CSF accumulated was more than twice that obtained without product removal. In other work, periodic foreign protein recovery from cell suspension and hairy root cultures was achieved using hydroxyapatite resin [17]. Beginning 13 days after inoculation of the hairy roots and 7 days after inoculation of the suspensions, medium was periodically separated from the biomass, exposed to the resin and then returned to the cultures. Recovered IgG1 antibody was eluted from the resin and the total amount of antibody produced by the cultures determined by ELISA. Even though growth was unaffected by the hydroxyapatite, the overall effect of this process on antibody accumulation was relatively small, with maximum total antibody levels only 20–21% higher with periodic harvesting than without. Considering the rapidity with which antibodies may be lost from plant culture media (Figure 2), continuous rather than periodic removal of product may be required to achieve more substantial improvements in yield [17]. 3.4.2 Oxygen Transfer and Dissolved Oxygen Concentration Adequate aeration of plant tissue cultures is crucial for achieving maximum production of foreign proteins [17, 67]. Poor oxygen transfer has been found to limit cell growth and reduce antibody heavy chain production in genetically modified tobacco cell suspensions in shake flasks [67]. However, although raising the air flow rate in a stirred 5-L bioreactor increased foreign protein levels to a certain extent, excessive aeration resulted in foaming and reduced both growth and antibody production [67]. In transgenic hairy root cultures, the concentration of IgG1 antibody in the biomass increased by 52% relative to the levels obtained under air when the roots were cultured at a dissolved oxygen tension of 150% air saturation [17]. Compared with root cultures grown at 50% air saturation, which is a realistic operating dissolved oxygen tension in poorly mixed, large-scale root reactors, oxygen enrichment to 150% air saturation improved total antibody accumulation 2.9-fold.
4 Conclusions
Plant tissue culture offers an alternative to agriculture as a plant-based system for producing low-to-medium volumes of foreign proteins. Secretion of foreign proteins into the culture medium has significant potential benefits for reducing the cost and complexity of product recovery and purification. A range of molecular, culture and bioprocessing techniques has been employed to enhance the accumulation and stability of foreign proteins in plant cell and organ cultures. Further improvements in product yield are required to raise the economic competitiveness of plant tissue culture as a viable commercial method for the controlled, safe and reliable manufacture of pharmaceutical and therapeutic proteins.
References
References 1 J. W. LARRICK, L. YU, J. CHEN et al., Res. Immunol. 1998, 149 (6), 603–608. 2 E. E. HOOD, J. M. JILKA, Curr. Opin. Biotechnol. 1999, 10 (4), 382–386. 3 C. L. CRAMER, J. G. BOOTHE, K. K. OISHI, in Plant Biotechnology: New Pro ducts and Applications, J. HAMMOND, P. MCGARVEY, V. YUSIBOV (eds), Springer, Berlin, 1999, 95–118. 4 M. CABANES-MACHETEAU, A.-C. FITCHETTELAINE´ , C. LOUTELIER-BOURHIS et al., Glycobiology 1999, 9 (4), 365–372. 5 R. L. EVANGELISTA, A. R. KUSNADI, J. A. HOWARD et al., Biotechnol. Prog. 1998, 14 (4), 607–614. 6 A. R. KUSNADI, Z. L. NIKOLOV, J. A. HOWARD, Biotechnol. Bioeng. 1997, 56 (5), 473–484. 7 L. MIELE, Trends Biotechnol. 1997, 15 (2), 45–50. 8 P. M. DORAN, Curr. Opin. Biotechnol. 2000, 11 (2), 199–204. 9 H. DANIELL, S. J. STREATFIELD, K. WYCOFF, Trends Plant Sci. 2001, 6 (5), 219– 226. 10 Y.-J. SHIN, S.-Y. HONG, T.-H. KWON et al., Biotechnol. Bioeng. 2003, 82 (7), 778–783. 11 N. S. MAGNUSON, P. M. LINZMAIER, J.-W. GAO et al., Protein Express. Purif. 1996, 7 (2), 220–228. 12 J. HUANG, T. D. SUTLIFF, L. WU et al., Biotechnol. Prog. 2001, 17 (1), 126–133. 13 J. GAO, J. M. LEE, G. AN, Plant Cell Rep. 1991, 10 (10), 533–536. 14 M. L. SMITH, H. S. MASON, M. L. SHULER, Biotechnol. Bioeng. 2002, 80 (7), 812– 822. 15 D. VERDELHAN DES MOLLES, V. GOMORD, M. BASTIN et al., J. Biosci. Bioeng. 1999, 87 (3), 302–306. 16 P. J. LARKIN, W. R. SCOWCROFT, Theor. Appl. Genet. 1981, 60 (4), 197–214. 17 J. M. SHARP, P. M. DORAN, Biotechnol. Prog. 2001, 17 (6), 979–992. 18 J. K.-C. MA, T. LEHNER, P. STABILA et al., Eur. J. Immunol. 1994, 24 (1), 131–138. 19 R. WONGSAMUTH, P. M. DORAN, Biotechnol. Bioeng. 1997, 54 (5), 401– 415. 20 J. M. SHARP, P. M. DORAN, Biotechnol. Bioprocess Eng. 1999, 4 (4), 253–258.
21 J. M. SHARP, P. M. DORAN, Biotechnol. Bioeng. 2001, 73 (5), 338–346. 22 S. BANERJEE, T. Q. SHANG, A. M. WILSON et al., Biotechnol. Bioeng. 2002, 77 (4), 462–466. 23 C. D. MERRITT, S. RAINA, N. FEDOROFF et al., Biotechnol Prog. 1999, 15 (2), 278–282. 24 M. A. SUBROTO, J. D. HAMILL, P. M. DORAN, J. Biotechnol. 1996, 45 (1), 45–57. 25 J.-J. ZHONG, J. Biosci. Bioeng. 2002, 94 (6), 591–599. 26 F. BOURGAUD, A. GRAVOT, S. MILESI et al., Plant Sci. 2001, 161 (5), 839–851. 27 A. H. SCRAGG, Curr. Opin. Biotechnol. 1992, 3 (2) 105–109. 28 A. GIRI, M. LAKSHMI NARASU, Biotechnol. Adv. 2000, 18 (1), 1–22. 29 M. TERASHIMA, Y. MURAI, M. KAWAMURA et al., Appl. Microbiol. Biotechnol. 1999, 52 (4), 516–523. 30 J. HUANG, L. WU, D. YALDA et al., Transgenic Res. 2002, 11 (3), 229–239. 31 E. TORRES, C. VAQUERO, L. NICHOLSON et al., Transgenic Res. 1999, 8 (6), 441–449. 32 K.-I. SUEHARA, S. TAKAO, K. NAKAMURA et al., J. Ferment. Bioeng. 1996, 82 (1), 51–55. 33 S. SOMMER, M. SIEBERT, A. BECHTHOLD et al., Plant Cell Rep. 1998, 17 (11), 891–896. 34 E. H. HUGHES, S.-B. HONG, J. V. SHANKS et al., Biotechnol. Prog. 2002, 18 (6), 1183–1186. 35 K. YOSHIDA, T. KASAI, M. R. C. GARCIA et al., Appl. Microbiol. Biotechnol. 1995, 44 (3–4), 466–472. 36 C. XIANG, Z.-H. MIAO, E. LAM, Plant Mol. Biol. 1996, 32 (3), 415–426. 37 H. KURATA, T. TAKEMURA, S. FURUSAKI et al., J. Ferment. Bioeng. 1998, 86 (3), 317–323. 38 M. TERASHIMA, Y. EJIRI, N. HASHIKAWA et al., Biochem. Eng. J. 1999, 4 (1), 31–36. 39 M. M. TREXLER, K. A. MCDONALD, A. P. JACKMAN, Biotechnol. Prog. 2002, 18 (3), 501–508. 40 E. A. JAMES, C. WANG, Z. WANG et al., Protein Express. Purif. 2000, 19 (1), 131–138. 41 E. JAMES, D. R. MILLS, J. M. LEE, Biochem. Eng. J. 2002, 12 (3), 205–213. 42 R. N. BEACHY, J. H. FITCHEN, M. B. HEIN, Ann. N. Y. Acad. Sci. 1996, 792, 43–49.
23
24 Foreign Protein Expression Using Plant Cells
43 C. PORTA, V. E. SPALL, T. LIN et al., Intervirology 1996, 39 (1–2), 79–84. 44 H. B. SCHOLTHOF, K.-B. G. SCHOLTHOF, A. O. JACKSON, Annu. Rev. Phytopathol. 1996, 34, 299–323. 45 T. VERCH, V. YUSIBOV, H. KOPROWSKI, J. Immunol. Meth. 1998, 220 (1–2), 69–75. 46 A. WIGDOROVITZ, D. M. P´EREZ FILGUEIRA, N. ROBERTSON et al., Virology 1999, 264 (1), 85–91. 47 R. J. COPEMAN, J. R. HARTMAN, J. C. WATTERSON, Phytopathology 1969, 59 (7), 1012–1013. 48 S. RABINDRAN, W. O. DAWSON, Virology 2001, 284 (2), 182–189. 49 R. FISCHER, C. VAQUERO-MARTIN, M. SACK et al., Biotechnol. Appl. Biochem. 1999, 30 (2), 113–116. 50 K. DALSGAARD, Å. UTTENTHAL, T. D. JONES et al., Nature Biotechnol. 1997, 15 (3), 248–252. 51 B. KASSANIS, T. W. TINSLEY, F. QUAK, Ann. Appl. Biol. 1958, 46 (1), 11–19. 52 H. H. MURAKISHI, J. N. HARTMANN, R. N. BEACHY et al., Virology 1971, 43 (1), 62–68. 53 T. E. RUSSELL, R. S. HALLIWELL, Phytopathology 1974, 64 (12), 1520–1526. 54 D. F. SPENCER, W. C. KIMMINS, Can. J. Bot. 1969, 47 (12), 2049–2050. 55 H. H. MURAKISHI, Phytopathology 1968, 58 (7), 993–996. 56 B. KASSANIS, Methods Virol. 1967, 1, 537–566. 57 P. THOMAS, G. S. WARREN, J. Exp. Bot. 1994, 45 (276), 987–994. 58 U. FIEDLER, J. PHILLIPS, O. ARTSAENKO et al., Immunotechnology 1997, 3 (3), 205–216. 59 N. CARPITA, D. SABULARSE, D. MONTEZINOS et al., Science 1979, 205 (4411), 1144– 1147.
60 T. H. KWON, J. E. SEO, J. KIM et al., Biotechnol. Bioeng. 2003, 81 (7), 870–875. 61 S. MATSUMOTO, K. IKURA, M. UEDA et al., Plant Mol. Biol. 1995, 27 (6), 1163–1172. 62 R. FISCHER, Y.-C. LIAO, J. DROSSARD, J. Immunol. Methods 1999, 226 (1-2), 1–10. 63 B. M.-Y. TSOI, P. M. DORAN, Biotechnol. Appl. Biochem. 2002, 35 (3), 171–180. 64 M. F. WAHL, G. AN, J. M. LEE, Biotechnol. Lett. 1995, 17 (5), 463–468. 65 J.-H. LEE, N.-S. KIM, T.-H. KWON et al., J. Biotechnol. 2002, 96 (3), 205–211. 66 W. LACOUNT, G. AN, J. M. LEE, Biotechnol. Lett. 1997, 19 (1), 93–96. 67 F. LIU, J. M. LEE, Biotechnol. Bioprocess Eng. 1999, 4 (4), 259–263. 68 D. P. DELMER, Plant Physiol. 1979, 64 (4), 623–629. 69 R. WONGSAMUTH, P. M. DORAN, Hairy roots as an expression system for production of antibodies, in Hairy Roots: Culture and Applications, ed. P. M. DORAN, Harwood Academic, Amsterdam, 1997, 89–97. 70 S. K. W. OH, P. VIG, F. CHUA et al., Biotechnol. Bioeng. 1993, 42 (5) 601–610. 71 M. S. LEE, G. M. LEE, Biotechnol. Bioeng. 2000, 68 (3), 260–268. 72 N. V. BORISJUK, L. G. BORISJUK, S. LOGENDRA et al., Nature Biotechnol. 1999, 17 (5), 466–469. 73 S. FIREK, J. DRAPER, M. R. L. OWEN et al., Plant Mol. Biol. 1993, 23 (4), 861–870. 74 J. A. FRANCISCO, S. L. GAWLAK, M. MILLER et al., Bioconjugate Chem. 1997, 8 (5), 708–713. 75 N. S. MAGNUSON, P. M. LINZMAIER, R. REEVES et al., Protein Express. Purif. 1998, 13 (1), 45–52. 76 R. FISCHER, N. EMANS, F. SCHUSTER et al., Biotechnol. Appl. Biochem. 1999, 30 (2), 109–112.
1
Functional Expression of Ion Channels in Mammalian Systems Jeffrey J. Clare
GalaxoSmithKline, Stevenage, United Kingdom
Originally published in: Expression and Analysis of Recombinant Ions Channels. Edited by Jeffrey J. Clare and Derek J. Trezise. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31209-2
1 Introduction
The expression of recombinant ion channels in heterologous systems is an invaluable tool for detailed analysis of their molecular and biophysical properties. This powerful approach compliments the analysis of endogenous channels in their native tissues by providing an opportunity to study and manipulate them in a defined and electrically inactive cell environment. This is advantageous for a wide range of studies, particularly for detailed definition of biophysical and pharmacological properties, for analysing structure–function relationships, and for characterising the role of individual subtypes and/or component subunits. An expression system that has been extensively used in this way is based on oocytes of the South African clawed frog, Xenopus laevis. This system has the advantage of being easy to manipulate while giving robust expression and stable recordings for a broad range of recombinant ion channels in an environment that is largely free of endogenous electrical activity. However, despite the utility of Xenopus oocytes for ion channel functional analysis, certain disadvantages are inherent. The maintenance of frogs and preparation of oocytes requires additional procedures and equipment beyond that usually found in a standard research laboratory. Oocyte quality is critical for success and this can be subject to wide variation due to seasonal and other factors. More importantly, mammalian channels can exhibit biophysical properties that are somewhat unexpected [1, 2], possibly due to an absence of regulatory, accessory and other factors that normally interact with the channels in their native environment. Similarly, pharmacological properties can be altered in oocytes. Typically, lower drug potencies are observed and this has been ascribed to a reduction of the free intracellular concentration of drug due to absorption by the large amounts of membrane and yolk particles found inside the oocyte [3]. Many of these drawbacks Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Functional Expression of Ion Channels in Mammalian Systems
can be avoided, at least partially, by the use of mammalian cells as an expression host for studying recombinant mammalian ion channels. The aim of this article is therefore to review approaches that are currently available for heterologous ion channel expression in mammalian systems. For research applications, expression of the channel in a functional form is normally a prerequisite. Similarly, although alternative formats exist (e.g. radioligand binding), the most reliable read-outs for ion channel drug screening are those based, either directly or indirectly, on functional activity of the channel in a cell-based assay. However, this can often present significant challenges given the huge diversity and organisational complexity found within the ion channel superfamily. Ion channel subunits generally contain multiple transmembrane spanning segments and functional channels are usually complex multi-subunit proteins. Correct membrane localisation and orientation, as well as appropriate subunit composition and stoichiometry, are often vital for faithful reproduction of the pharmacological and biophysical properties of the native channel. In addition, ion channel cDNAs can be difficult to manipulate due to high instability and frequently cause cytotoxicity to the host cell when over-expressed. Finally, even for well expressed channels, the generation of reagents that are sufficiently robust for use in a high throughput screening environment can be extremely problematic. In this chapter a number of strategies that can be used to address these challenges will be described. A number of case studies will be presented that highlight some of the theoretical and practical considerations for obtaining robust expression of functionally active channels.
2 cDNA Cloning and Manipulation
cDNA instability can be a fairly common phenomenon with certain ion channels, which can make cloning and manipulation using standard techniques difficult. This is presumably due to the presence of sequences that are either toxic or recombinogenic in E.coli (e.g. direct repeats, Z DNA, etc.) causing negative selection pressure and resulting in a strong orientation bias and/or few viable clones. Clones that do arise are frequently found to contain mutations ranging from large rearrangements and deletions to single base changes that cause mis-sense or frame shift mutations. Certain channel subfamilies, e.g. voltage-gated sodium (NaV ) channels and ATP-binding cassette (ABC) transporters such as CFTR, sulfonylurea receptors, seem to be particularly susceptible to this, but the problem can also occur more sporadically in other subfamilies, e.g. voltage-gated calcium channels (CaV ). These problems can be minimised using a variety of approaches, either separately or, if necessary, in combination. The use of “directional” cloning strategies (i.e. where insertion of the DNA fragment in the required orientation is forced by the use of incompatible restriction sites) is recommended wherever possible. Vectors that are “transcriptionally silent” (i.e. containing no E.coli promoter sequences, cyptic or otherwise, in the vicinity of the cloning site) and/or that replicate at low copy number in E.coli (e.g. pBR322-based) are also useful. In addition, E.coli host
3 Choice of Host Cell Background 3
strains are commercially available that allow propagation of standard ColE1-based vectors at reduced copy number (ABLE, Stratagene). The use of commercially available recombination-deficient E.coli host strains (SURE, Stratagene, STABL2, Invitrogen) is also recommended. Finally, propagation of E.coli clones at reduced temperature (e.g. 30 ◦ C) is often found essential for problematic channels – the colonies may take several days to appear and will generally be much smaller than normal. Despite taking some or all of the above precautions, for channels that are known to be unstable, it is essential to rigorously validate final constructs by complete DNA sequencing and it frequently proves necessary to correct mutations by one or more rounds of site-directed mutagenesis. In extreme cases, it may also be necessary to resequence the channel-encoding region each time a new batch of DNA is prepared. Finally, for optimal expression, it is common to engineer the region upstream of the start codon to contain sequences compatible with efficient initiation of translation (Fig. 1). The consensus sequence associated with genes that are efficiently expressed in mammalian cells, identified by Kozak [4], is often used, and this can normally be inserted during construction of the expression vector.
3 Choice of Host Cell Background
As indicated in the introduction to this chapter, the choice of host cell background is often critical for efficient expression as well as for faithful reproduction of the properties of the native channel. Two major concerns are the existence of endogenous channels in the host cell that may be similar to the channel of interest and mask its expression, and the presence or absence of accessory factors that may normally modify the properties or expression of the channel in its native environment. Clearly, these parameters will vary between different cell types and are dependent on the nature of the channel of interest. Thus, it is advisable to screen a range of potential cell lines for their suitability as hosts before embarking on an expression project. This can be done by functional screening, preferably using the assay format that will ultimately be used for analysing the channel of interest, or alternatively by molecular screening, for example by quantitative mRNA analysis (e.g. Taqman). The latter approach is often more convenient and is useful as a primary screen in that the endogenous expression of wide range of channels and accessory proteins can be screened simultaneously (e.g. using reverse Taqman). However, this requires a reasonable level of background knowledge of the channel family and its likely modulatory or interacting partners, which may not always be available, especially for more novel channels. In addition, the detection of mRNA for a particular channel or accessory factor does not necessarily mean that the active protein is expressed, and functional assays are usually necessary as a secondary screen to establish this. Although pre-screening of potential host cell lines may indicate a lack of confounding background activities, it is worth remembering that it is formally possible
4 Functional Expression of Ion Channels in Mammalian Systems
Fig. 1 Quantitative comparison of different expression constructs and screening of stable clones using automated high throughput electrophysiology. (A) Comparison of representative hERG K+ channel expressing CHO cell lines that were generated using different types of expression construct. A and D = non-optimised mRNA leader sequence, B = optimised leader, C = optimised leader in IRES expression vector. Each point represents peak current (left panel) or percentage of cells expressing (right panel) averaged from ∼300 cells mea-
sured by IonWorks high throughput electrophysiology. (B) Screening of stable clones for functional hERG channel expression. 24 different clonal CHO cell lines generated with construct C (above), were analysed by Ionworks electrophysiology. The graphs show a comparison, for each clone, of the percentage of cells giving usable seals (>80 M, bottom), percentage cells expressing currents above background (middle) and mean current amplitude (top). Data are averaged from 64 cells for each clone.
3 Choice of Host Cell Background 5
for normally silent endogenous channel proteins to become up-regulated during the process of generating an expression reagent, perhaps in response to over expression of the target channel. Thus, it is prudent to validate expression of the target protein in the final reagent by a sequence dependent method, for example by immunological validation, if selective antibodies are available, or by a quantitative PCR (Taqman). In a drug discovery setting another factor that should be considered when choosing a host background is the physical characteristics of the cells, which must be compatible with the intended assay format. Thus, electrophysiological assays depend on the formation of high resistance electrical seals with the plasma membrane, and microtitre plate-based assays (see chapters 7 & 8 ) often require good adherence of cells to the plastic substrate. It is worth stating that these qualities can deteriorate in cells over-expressing ion channels compared to untransfected parent cells, especially over prolonged passage. A variant of the HEK293 cell line that has been engineered to be more adherent to plastic surfaces by expressing the macrophage scavenger receptor is available (Invitrogen). A variety of mammalian cells lines have been successfully used as hosts for recombinant ion channel expression. Standard hosts that are in general use for heterologous expression have also been successfully used for ion channels, including CHO, COS7, CV1, HEK293 etc. As indicated above, variants of these lines which are designed for specific applications have also been engineered and their use will be highlighted throughout this article. These commonly used hosts are relatively (though not completely) free of background conductances and are suitable for many channel types (e.g. see Refs. [5, 6]). However, it is worth noting that different isolates can vary in this respect and may behave differently in different laboratories, so it is still advisable to functionally screen for background currents of interest. In situations where there is relatively little background knowledge of the channel and its potential accessory factors, or where expression proves problematic in “standard hosts”, it is often worth searching for less common immortalised lines that are derived from tissues that natively express the channel of interest. For example, various neuroblastoma cell lines have proved useful as hosts for neuronal ion channel expression, such as SH-SY5Y, ND7–32, NG108 [7–9]. In practise, the precise reason(s) for this have not been fully explored but it is likely that the intracellular environment and processing machinery are more favorable in a host cell derived from the tissue where the channel is normally expressed. Other factors that can be important probably include the presence of cognate accessory proteins and better tolerance to a wider range of resting membrane potential than nonexcitable cells. An example of this is provided by the sensory neuron specific NaV 1.8 Na+ channel. This is only poorly expressed in commonly used host cells (e.g. HEK293) but is more efficiently expressed in an immortalised dorsal root ganglion-derived cell line (ND7–23, Fig. 2a). One possible explanation is that this relates to the accessory β3 subunit since this is endo-genously expressed in ND7–23 cells but, not in HEK293 (Fig. 2c). In support of this, it was found that the recombinant β3 subunit enhances the level of NaV 1.8 specific currents when co-expressed in HEK293 cells (Fig. 2b). Furthermore, this effect is also seen in ND7–23 cells, suggesting that the
6 Functional Expression of Ion Channels in Mammalian Systems
Fig. 2 Effect of cell background and auxiliary subunits on efficiency of ion channel expression. (A) Comparison showing representative current traces recorded from HEK293 and ND7–23 cell lines stably expressing the NaV 1.8 channel. Peak Na+ currents in HEK293 stable clones were smaller compared to ND7–23 and declined during passage or following freeze-thawing of cells. (B) Transient co-expression of the NaV P3 subunit enhances NaV 1.8 expression in HEK293 cells. Bars represent the
mean peak currents obtained. (C) NaV $3 subunit mRNA is endogenously expressed in ND7–23 but not in HEK293 cells. RT-PCR analysis of cell extracts using subtype selective primers is shown. Lanes labelled “-” indicate −ve controls where reverse transcriptase was omitted; +ve control reactions, using the cloned $ subunit cDNAs as template, are shown in the right hand lanes.
Reproduced with permission from Ref. [89].
4 Post-translational Processing of Heterologous Expressed Ion Channels
effect of endogenous β3 in these cells can be enhanced by over-expression of the recombinant subunit. As illustrated above, it is possible to engineer host cell lines to make them more favorable for ion channel expression by over-expressing accessory factors that may be required for efficient expression. It may also be desirable to generate cell lines that express reporter genes (e.g. Aequorin [10], halide-sensitive YFP [11]) for use as generic hosts for ion channel assays. An additional approach to engineering cell lines to make them more favorable as hosts is to manipulate their electrical properties by introducing additional channels. This is particularly useful when generating reagents for microtitre plate-based functional assays in which the readout is dependent on voltage-sensitive dyes, but can also apply to assays based on Ca2+ binding fluorescent dyes. A good example of this is shown in Fig. ??, which illustrates the expression of the R-type voltage-gated calcium channel in non-neuronal cells (HEK293). In these cells only small Ca2+ responses can be detected in fluorescence imaging plate reader (FLIPR, Molecular Devices) assays when they are depolarised, even though good sized currents can be detected electrophysiologically. This is apparently due to the resting potential of HEK293 cells which is more positive than that typically found in neurons (e.g. −10 to −20 mV compared with −60 to −80 mV) resulting in the majority of channels being inactivated. However, it is possible to increase the magnitude of depolarisation-induced Ca2+ responses by manipulating the resting potential by co-expressing a constitutively active K+ channel, KCNK2 (TREK-1). This leads to a more negative resting potential, resulting in more Ca2+ channels being available to open when the cells are depolarised. This dual expression system can also be used in reverse to measure activity of KCNK2 in FLIPR assays using the Ca2+ channel as a reporter.
4 Post-translational Processing of Heterologous Expressed Ion Channels
Since, by nature, they are complex multimeric transmembrane proteins, ion channels present particular challenges for efficient heterologous expression. In order to obtain functional activity a number of post-translational steps are required to ensure that channel proteins are correctly folded, processed, assembled and transported to the appropriate membrane compartment (for review see Ref. [12]). Since, each of these steps requires interaction of the heterologous channel protein with homologous host factors, there is potential for incorrect processing at every stage. In addition, abnormally high levels of expression can potentially overwhelm the capacity of the endogenous processing machinery, again leading to unprocessed or aberrant channel protein. Similarly, for any given channel, a particular processing step may require specific factors that might not be endogenous to the cell type being used (see below for examples), again leading to aberrant, nonfunctional protein. Processing “bottlenecks” such as these can result in the aberrant channel protein either accumulating and then aggregating within the cell [13], or alternatively being targeted for degradation [14, 15]. In extreme cases both of these outcomes can be
7
8 Functional Expression of Ion Channels in Mammalian Systems
4 Post-translational Processing of Heterologous Expressed Ion Channels
difficult to diagnose, since the former can lead to cytotoxicity and low viability (see discussion below), and the latter can appear as a total failure of expression unless this is being monitored by measuring mRNA levels. In less extreme cases a low level of the mature functional protein may be detected, but the amount produced is likely to be sensitive to both the level and rate at which expression is being driven. Paradoxically, attempting to improve yields in this situation by increasing expression, for example with the use of stronger promoters, may actually be counterproductive [16] since this could exacerbate blockage of the processing pathway. Instead, the use of titratable (e.g. viral) and/or inducible systems is recommended in order to try to optimise the balance between the total protein expressed and the proportion that is correctly processed and functional (see Sections 6.1 and 7.3). An alternative approach for increasing the proportion of mature channel protein is to augment the level of processing factors that may be present only in limiting amounts or absent altogether. As discussed previously, an empirical way of achieving this is to identify and use an alternative host cell type that is more related to the tissue in which the native channel is expressed. However, another approach is to overproduce factors that are likely to be, or are known to be, involved in channel maturation by co-expression of their cloned cDNA. Proteins with this activity broadly fall into two categories: those that are thought to play a general role in quality control and promoting protein folding and transport within most cells (e.g. chaperones), and those with more specialised trafficking roles for particular types of ion channel. Chaperones act by binding to nascent proteins in order to promote correct folding and targeting. It is known that proteins with large regions of exposed hydrophilic regions are toxic to cells, possibly by promoting membrane damage [17], and indeed this is believed to contribute to the pathogenic mechanism of certain “viral” proteins (e.g. prions [14, 15]). It is likely that the production of large amounts of misfolded recombinant membrane proteins can also cause cytotoxicity by similar mechanisms. Thus, co-expression of factors with chaperone-like activity should be beneficial, not only by increasing the proportion of mature channel protein produced, but also by counteracting this toxic effect. Examples of chaperones that have been shown to boost the surface expression of various recombinant ion channels (e.g. Kv, nAChR, HERG, CFTR, NaV ) include calnexin, heat shock ← Fig. 3 Manipulation of the host cell background to optimise functional expression of ion channels. (A) Co-expression of the TREK1 K+ channel is required in order to hyperpolarise the resting potential of HEK293 cells so that functional R-type CaV channels can be detected in a FLIPRbased fluorescence assay. Only when the TREK1 K+ channel is co-expressed, in this case using a BacMam viral vector, can robust Ca2+ influx (measured by an increase in fluorescence of the Ca2+ binding dye Fluo4) be induced by depolarisation (addi-
tion of KCl) of cells stably expressing the Rtype CaV channel (CaV 2.3 and $3 subunits). B) Similar functional responses can be observed when all three subunits (CaV 2.2, $3 and TREK1) are delivered using BacMam. Little or no Ca2+ influx is seen in the absence of TREK1 virus. (C) The magnitude of Ca2+ influx (change in fluorescence) can be manipulated by varying the amount of TREK1 virus added. (X, no TREK1 virus added; , 5 × 106 plaque forming units (pfu) ml−1 of TREK1 virus added, , 107 pfu ml−1 ; , 2 × 107 pfu ml−1 .)
9
10 Functional Expression of Ion Channels in Mammalian Systems
proteins (e.g. Hsp70, Hsp90) and fibroblast growth factor homologous factor 2B (FHF2) [18–23]. However, as well as their beneficial effects, it must be remembered that, in some circumstances, binding of chaper-ones can actually target proteins for degradation [22]. In addition to chaperones, certain cytoskeletal scaffold proteins which interact with channel proteins at the cell surface can also promote surface localisation of the channel [24]. As well as proteins that have general chaperone-like activity within the cell, a number of channel-specific maturation factors have been identified. Examples include the PACS proteins, annexin II light chain, and KChap which promote surface localisation of Trp, Kv and NaV 1.8 channels respectively [25–27]. In addition, numerous other channel-associated proteins are known that have dual functions, both acting as channel-specific modulatory subunits as well as enhancing channel trafficking. Examples include the β subunits of Kv, NaV (see Fig. 2B) and CaV channels [28–30], α2δ subunits of CaV channels [31], calmodulin [32, 33] and stargazin [34] (which promote trafficking of IK/SK and AMPA channels respectively). Some of these are thought to act by masking specific ER retention motifs present in the channel protein. There is evidence for these retention signals in a wide range of channels, including Kir, SK/IK, Kv, Eag, HERG, NMDA, kainate, nAChR [35–44], and it is thought that these motifs are exposed in the monomeric subunit and become inaccessible upon assembly of the mature multimeric channel (see for example Ref. [44]). Thus, for optimal expression of any given channel, it is worth co-expressing any known channel-specific accessory subunits or interacting factors, particularly in cases where known ER retention signals are present. Another strategy that can be used to increase the amount of functional channel protein found at the cell surface is to lower the growth temperature at which the cells are maintained. This has been shown to give higher levels of surface expression for a number of different channel types. For several of these examples (e.g. CFTR, P2X7, HERG, NaV 1.5 [45–49]) the effect was first noted with variants of the channel that are defective for trafficking. Incubation of cells expressing these mutants at reduced temperature (e.g. 25–30 ◦ C) was found to rescue the defect. This appears to be a general phenomenon since it has also been observed with nonmutated channels (e.g. ROMK1, nAChR’s, KAT1 [50–55], and HERG – see Fig. 4). There are several possible explanations for this effect, for instance, lower temperatures could result in increased steady-state levels of transcript or synthesis of channel protein, reduced protein turnover, improved protein folding, assembly or transport. The best studied example is probably the CFTR508 mutant that is the most prevalent cause of cystic fibrosis. Recent studies indicate that the deleted residue normally has a major contribution to the proper folding of the channel [56, 57]. Thus, at physiological temperature (37 ◦ C) folding of the mutant is impaired and the protein is targeted for degradation, probably due to binding of the Hsp70/CHIP chaperone complex [22]. However, at lower temperature (30 ◦ C) folding kinetics are more favorable, protein degradation is greatly reduced, and the mutant protein can then be detected at the cell surface. A similar mechanism probably accounts for the increased surface expression of nonmutant channels at lower temperature. Consistent with this, reduced protein degradation has been
4 Post-translational Processing of Heterologous Expressed Ion Channels
Fig. 4 Effect of reduced cell culture temperature on ion channel expression. (A) Confocal images of stable HERG-expressing CHO cells grown at different temperatures, following immunocytochemical staining with a HERG-specific antibody. The level of HERG immunoreactivity is significantly increased in cells grown at 27 ◦ C and 30 ◦ C compared to 37 ◦ C. No immunoreactivity is seen in untransfected CHO cells (CHO-wt). The increase in total HERG protein at 30 ◦ C seen here is also mirrored by an increase in surface-localised functional protein as
measured by IonWorks electrophysiology. Note that, in the HERG-expressing cells, at 37 ◦ C and 30 ◦ C immunoreactivity is uniform and granularthroughoutthe cytoplasm whereas at 27 ◦ C there is an accumulation in giant lysosome-like bodies. (B) Quantitative analysis of HERG immunoreactivity by flow cytometry confirms the increases seen at lower growth temperatures, as indicated by the rightward shift in peak fluorescence (X-axis) observed within the populations of cells grown at 27 ◦ C and 30 ◦ C compared to 37 ◦ C.
11
12 Functional Expression of Ion Channels in Mammalian Systems
reported in several cases (ROMK1, α4β2nAChR [50, 53] and HERG – see Fig. 4b). Evidence suggests that degradation can occur via mechanisms that are both proximal (proteosomes) and distal (endocytic recycling) to the secretory pathway ([58], Fig. 4A). Thus, maintenance of cell cultures at lower temperatures would be expected to improve functional expression of any channel that is prone to misfolding when over-expressed.
5 Cytotoxicity
Many ion channels are poorly tolerated when over-expressed in mammalian cells, leading to poor growth and viability, and low levels of expression which tend to decline during cell passage. Usually this becomes evident during stable cell line generation when only a low number of viable clones can be selected, with few of those that do arise giving measurable levels of expression. Among these clones only a low proportion of cells actually express and there are large cell-to-cell variations. This is a hallmark of toxicity induced by heterologous protein expression, but often the situation can be at least partially rescued by the use of appropriate expression vectors (e.g. IRES vectors, see Section 7.1). Other strategies to address this problem include the use of transient or inducible expression systems (see Sections 6 and 7.3). However, it may also be possible to improve the situation by attempting to tackle the root cause(s) of the problem. As discussed above, one possible reason for toxicity is saturation of the host processing machinery – this may lead to deficiencies in the processing of essential host proteins, with consequent toxic effects, as well as the accumulation of large amounts of misfolded protein which may have membrane damaging effects. In this case co-expression of chaperones or channel accessory subunits may be beneficial. Another possible mechanism for toxicity is that functional activity of the channel itself has cytotoxic consequences for the host cell. Channels that are permeable to calcium may be particularly susceptible to this (e.g. NMDA-NR2a and NR2b [59]), since calcium signalling regulates many fundamental processes within the cell and is normally tightly regulated. Clearly, over-expression of large conductance calcium-permeable channels, or those that normally have a role in calcium homeostasis, is more likely to cause this type of problem. A potential solution to this problem is to use known channel blockers during reagent generation (e.g. during selection of stable clones) and also during routine maintenance of the recombinant cells [60]. In the light of this it is worth bearing in mind that aminoglycoside antibiotics, which are often used for selection during stable cell line generation (e.g. neomycin, geneticin), have been found to have inhibitory activity at several types of ion channel (e.g. TRPV1, CaV , NMDA [61–64]). This may be beneficial during cell line generation for toxic channels. On the other hand, it has also been found the aminoglycoside antibiotics can also potentiate channel activity (e.g. NMDA-NR2b [63]), which may actually exacerbate toxicity.
6 Transient Expression Systems 13
6 Transient Expression Systems
Transient expression systems are commonly used in ion channel studies for a wide variety of applications. They are particularly valuable where rapid and/or high throughput analysis is required, for example in alanine scanning, cysteine scanning, site-directed or random mutagenesis studies and for the evaluation of expression constructs, channel variants and subunit combinations. Transient expression provides a flexible way of titrating the level of expression, controlling stoichiometry of multi-subunit channels and introducing reporters or interacting factors that may be required. In a drug screening environment, where many different ion channel assays may be run in parallel, transient expression can be more convenient than stable expression since this reduces cell culture demands by removing the need to continuously maintain multiple cell lines in culture. As discussed already, transient systems are also useful where toxicity is known to be a problem. Several alternative approaches for transient expression are available and each has different advantages and disadvantages. 6.1 “Standard” Transient Expression
“Standard” transient expression approaches usually involve the treatment of cells with chemical agents that promote the uptake and internalisation of naked DNA. Different types of agent are available, and these work with varying efficiencies on different cell types [64]. Among the most commonly used for ion channels are calcium phosphate [65] and various cationic lipid reagents which are available from commercial suppliers (e.g. Lipofectamine, Invitrogen). Although the latter come with detailed protocols, there are particular considerations for ion channels. For example, in electrophysiology experiments an appropriate balance is needed between the transfection rate (i.e. proportion of cells expressing) and the health of the cells following transfection. Thus, in order to achieve the best results for any given channel it may be worth optimising the transfection conditions (amount and time of reagent exposure, recovery time etc.). An alternative methodology to chemical reagents sometimes used for transient expression of ion channels is electroporation [66], though this tends to be less efficient. This is because the magnitude of the electrical pulse that can be given to promote DNA uptake is limited by deleterious effects on cell viability and once again a balance between transfection efficiency and cell health has to be struck. The efficiency of transient transfection and gene expression is highly dependent on the cell type and expression vector used. HEK293 and COS cells are among the more competent for DNA uptake and are commonly used for ion channel expression, though other cell types can also be used (e.g. CHO). Variants of HEK293 are available which enable plasmid replication when using vectors containing the origin of replication from SV40 due to the expression of SV40 Tantigen (HEK293T, ATCC) [67, 68]. This “episomal” vector system should give enhanced expression
14 Functional Expression of Ion Channels in Mammalian Systems
due to the increased copy number of the channel cDNA and possibly due to other activities of T-antigen. Another episomal vector system uses an analogous transacting DNA replication factor (EBNA1) from Epstein Barr virus (EBV) which enables replication of vectors containing the replication origin from EBV (oriP) [69]. Expression may also be boosted via a reported EBNA1-dependent enhancer like activity of oriP [70]. Different variations of this system have been developed where EBNA1 is encoded either by the expression vector (e.g. pCEP4, Invitrogen) or by the host cells (HEK293E, Invitrogen). The former has the advantage of being compatible with any host cell type, whereas the latter may be more convenient since a smaller vector can be used which may aid insertion of the channel encoding cDNA. Streamlined oriPcontaining episomal vectors of this type have been developed that contain optimised promoters and that have been shown to give very high levels of expression [71]. 6.2 Viral Expression Systems
Standard transient transfection is the most rapid means of mammalian cell expression. However, the efficiency is highly dependent on the host cell background and many cell types that are of interest for ion channel expression can be relatively resistant to transfection methodologies (e.g. neuroblastoma cell lines). A potential solution to this problem is to make use of virus-based systems for gene delivery, and two examples that have been successfully applied to the heterologous expression of ion channels and membrane-bound receptors in mammalian cells will be discussed here. The first system is derived from the single-stranded RNA virus, Semliki Forest Virus (SFV), and the second is based on the insect cell baculovirus, Autographa California nuclear polyhedrosis virus (AcMNPV). The SFV expression system comprises two plasmid vectors which together contain a cDNA copy of the viral genome [72]. The expression vector contains the non-structural genes, the strong SFV 26S promoter for expression of the heterologous gene and a viral packaging signal. The helper vector encodes the structural proteins, including the capsid and envelope genes, but no packaging signal. A stock of recombinant viral particles is produced by first generating single-stranded helper and expression vector RNAs using in vitro transcription and then co-transfecting these into baby hamster kidney (BHK) cells. These particles can then be used to efficiently infect a broad range of mammalian cell lines (including BHK, COS, HEK293), as well as primary cells, and are capable of giving rapid, high-level expression. However, no virus progeny is produced because the recombinant particles contain no helper RNA and are therefore replication deficient. This system has been used extensively to express G-protein coupled receptors (GPCRs) [73] often giving good expression levels and correctly folded protein, as determined by ligand binding assays and in some cases by coupling to G-protein signalling pathways. The potential for ion channel expression has also been explored and several channels have been expressed, including 5HT3, P2X1, 2 and 4 [74], GABAa [75] and Kv [76]. Functional activity was demonstrated for 5HT3, P2X receptors and GABAa. Recombinant baculoviruses have been routinely used for efficient protein expression in insect cells for many years. However, although baculovirus virions are able
6 Transient Expression Systems 15
to enter mammalian cells, no viral expression can normally be detected. Recently baculovirus-based “BacMam” vectors have been engineered to allow expression in mammalian cells by insertion of promoters that are active in mammalian hosts (for a review see Ref. [77]). The gene of interest is inserted into a transfer vector downstream of the mammalian promoter (e.g. CMV) in which, similar to the baculovirus system, it is flanked by regions that promote insertion into the baculovirus genome, either by homologous recombination in insect cells or by transposonmediated recombination in E.coli cells (see Fig. 5). A stock of virus particles is generated by propagation of the recombinant virus genome in insect cells and this can then be used to deliver the gene for transient expression in a range of different mammalian cell types, including many of those commonly used for recombinant protein production. The BacMam transfer vector commonly used also contains a selectable marker conferring resistance to the antibiotic G418, thus allowing the additional option of selecting stable cell lines from virus-transduced cells [78]. The BacMam system works particularly well with certain cell types including HEK293, and the human osteosarcoma derived cell line, U2OS in which transduction and expression rates are high [79, 80]. It has been used with good success for membrane protein expression, including a large range of GPCRs and ion channels [81, 82]. In common with other transient systems it offers advantages for expression of toxic channels and for multi-subunit and multi-channel expression [81]. For example, the NMDA-NR2b glutamate receptor, which requires expression of both the NR2b and NR1 subunits for functional activity, is extremely toxic when expressed in non-neuronal mammalian cells [59]. In the author’s laboratory, repeated attempts to generate NR2b-expressing stable cell lines were unsuccessful, even using IRES vectors (see Section 7.1) and even in the presence of a channel blocker (ketamine). However, by generating separate BacMam viruses for each subunit, and transducing a mixture of these into HEK293 cells, robust glutamate-induced calcium influx can be observed that is not present in mock-transduced cells (Fig. 6). The magnitude of the response is proportional to the total amount of virus added and the stoichiometry of the two subunits can be readily investigated and optimised by varying the dose of virus for each subunit. An example of multi-channel expression using BacMam viruses is shown in Fig. ??. In addition to viruses for two different subunits of the R-type calcium channel (CaV 2.3, α2δ), a virus encoding the TREK-1 K+ channel was also introduced, in order to hyperpolarise the resting potential of the host HEK293 cells and thus prevent inactivation of the R-type channel. In this way, robust calcium influx can be measured when the TREK-1 virus is added, but not in its absence, and the magnitude of the response can be varied by altering the amount of TREK-1 virus added. In summary, the BacMam system is a versatile tool for the robust analysis of many ion channels and is particularly suitable for toxic or multi-subunit channels. It allows flexible control over subunit or channel stoichiometry simply by varying the dose of the corresponding viral constructs used for transduction. It is also a convenient means of introducing additional factors into existing ion channel stable cell lines, for example chaperones or reporter genes that might be required for use in alternative assay formats (e.g. aequorin, halide sensitive YFP). Little or no cytopathic effects of virus transduction are observed and cells can normally be used
16 Functional Expression of Ion Channels in Mammalian Systems
Fig. 5 The BacMam expression system. (A) Map of the pFastBacMam1 shuttle vector which used to generate recombinant Bac-Mam viruses. The gene of interest is inserted downstream of the CMV promoter where it is flanked by Tn7 inverted repeats that direct site-specific transposition into
the viral genome when transfected into the appropriate E.coli host. (B) Workflow for the generation of recombinant BacMam virus stock and transduction into mammalian cells for expression. Reproduced with permission from Ref. [77].
7 Stable Expression of Ion Channels 17
Fig. 6 Expression of a cytotoxic channel using the BacMam system. (A) Functional expression of the NMDA-NR2b receptor using BacMam. Glutamate-induced Ca2+ influx, as measured by increased fluorescence of the Ca2+ binding dye Fluo4, can be observed in HEK293 cells transduced with a 1:5 mixture of BacMam viruses expressing
the NMDA-NR1A and NR2b subunits, respectively. The arrow indicates the point of glutamate addition. (B) The magnitude of the peak response is dependent on the total amount of virus added, as measured by MOI (multiplicity of infection – the number of virus particles added per cell).
for electrophysiology assays the day after transduction. BacMam viruses are also safe since they are unable to replicate in mammalian cells and the budded form of the virus is noninfectious for the natural insect host [77].
7 Stable Expression of Ion Channels
For many applications stable expression of the target ion channel may be desirable, for example for detailed biophysical studies or extended compound screening campaigns. Unlike transient systems, this avoids the need to continually carry out transfections or transductions, and there is no requirement for repeated
18 Functional Expression of Ion Channels in Mammalian Systems
re-synthesis of vector DNA or virus reagents. In theory, stable expression would also be expected to be more reproducible than transient expression. Nevertheless, a certain degree of day-to-day and cell-to-cell variation of expression in stable cell lines is liable to occur and this can be considerable for some ion channels. In practise, the worst stable cell lines can be more variable than some transient systems, e.g. BacMam viruses, which are capable of giving perfectly acceptable reproducibility. 7.1 Bicistronic Expression Systems
The generation of robust ion channel expressing stable cell lines can be problematic. As discussed already, since many ion channels are poorly tolerated when over-expressed, difficulties can arise due to poor growth and loss of expression during passage, even when selection is maintained. Often this is accompanied by large cell-to-cell variability in expression level even within clonally isolated lines, which can be a particular problem for single-cell format electrophysiology assays. A related problem during cell line generation is that, following transfection and selection, only a low proportion of clones actually express measurable levels of channel activity. This situation can be improved by using vectors where expression of the channel is closely coupled to that of the selectable marker. The best way of achieving this is to use a viral internal ribosome entry site (IRES) which promotes translation of both proteins from a single bicistronic mRNA [83]. The IRES element is placed between the two coding regions, with the channel upstream and the selection marker immediately downstream (Fig. 7). Translation of the upstream channel is initiated, as normal, by signals in the 5 untranslated region, whereas translation of the downstream selection marker is initiated by binding of ribosomes to the IRES element. Thus, both proteins are produced from a single transcript with the advantage that expression of the two is very tightly linked. This is unlike “standard” vectors where the selection marker and gene of interest are in separate expression cassettes and it is not uncommon for expression of the two to become separated, either during clonal selection or subsequent maintenance of the cell line, giving rise to the problems described above. This is especially true with toxic genes where the growth disadvantage conferred on expressing cells is an additional selection pressure for this to occur. Thus, an additional benefit with IRES vectors is that expression should remain relatively stable over extended periods as long as selection is maintained (Fig. 8). The advantages of IRES vectors for stable cell line generation become particularly apparent when using high-throughput electrophysiology assays. Automated patchclamp instruments are planar array instruments, and place great demands on the quality of the cell line being used since they are single cell assays in which the cells are randomly selected (though this problem is now being addressed by the latest generation instruments, e.g. IonWorks Quattro). Thus, a very high proportion of cells in the population that express currents of usable size is essential in order to avoid low success rates and unacceptably high costs resulting from compound wastage and reduced throughput. On the other hand, these instruments provide
7 Stable Expression of Ion Channels 19
Fig. 7 Bicistronic expression vectors for stable ion channel expression. (A) Using an IRES element the expression of the ion channel of interest is co-translationally linked to the selection marker (neoR), thus avoiding loss of expression and instability problems that can occur with conventional expression vectors due to independent seg-
regation during generation and propagation of the cells. (B) Plasmid map of a typical bicistronic expression vector, pCIN5L, that can be used to generate stable cell lines. The ion channel cDNA is inserted upstream of the IRES element between the Not I and Nhe I restriction sites. (IVS, intervening sequence or intron).
invaluable tools for functional screening during cell line generation. They enable a vast increase in the number of clones that can be screened compared to that previously possible using conventional patch-clamp (Fig. 2B). Using a combination of IRES vectors and screening by high throughput automated electrophysiological screening, very high quality stable cell lines can readily be obtained with a variety of different ion channels, which typically express peak currents in the nA range in greater than 90–95% of the cell population (Fig. 9). High throughput patch-clamp instruments also add further precision to the screening and functional validation of ion channel reagents by providing the capability for statistically analysing expression in large numbers of individual cells. This is highly advantageous for monitoring the often very subtle effects of auxiliary subunits or for carrying out detailed comparative studies, for example, of different expression constructs; vectors, host cells etc. (see Figs. 1, 8 and 9). IRES-containing vectors carrying the CMV promoter to drive transcription of the bicistronic mRNA and the IRES element derived from Encephalomyocarditis
20 Functional Expression of Ion Channels in Mammalian Systems
Fig. 8 Quantitative analysis of multiple subunit expression using automated high throughput electrophysiology. (A) Population histogram of current amplitudes in individual cells (n = 3180) coexpressing KCNQ1 and the accessory subunit minK. The fitted line is a 2 peak gaussian fit indicating the presence of two populations–nonexpressors (currents <0.5 A, mean = 0.33 nA), and expressors (cur-
rents >0.5 A, mean = 1.05 nA). The expressors comprise >80% of the cell population. (B) Stability of expression over passage showing percentage of cells expressing (triangles) and mean peak current in expressing cells (circles). Each point represents the mean of >250 cells. Kinetic analysis of the currents indicates that virtually every cell expresses the minK subunit as well as KCNQ1.
7 Stable Expression of Ion Channels 21
Fig. 9 Detailed evaluation of ion channel stable cells lines using automated high throughput electrophysiology. HEK293 cells were transfected with an expression vector for human NaV 1.7 and multiple stable clones were selected, expanded and then screened for expression of Na+ currents using Ion-Works. The best expressing clone was identified and then characterised fur-
ther. (A) Representative inward Na+ current evoked by depolarisation to −10mV, showing blockade by tetrodotoxin (TTX, 300 nM). Mean peak currents (B) and distribution of peak current (C) measured from 270 cells are shown. (D) Greater than 80% of cells gave usable seals (>80 M) and expressed currents above background.
virus (EMCV) are commercially available (BD Biosciences Clontech). Versions with different variants of the EMCV IRES are also available. In the original pCIN vector (and similar vectors, e.g. pCIN3 [84]) the efficiency of expression of the downstream neomycin-resistant selection marker is relatively low since the distance between the neomycin phosphotransferase initiation codon and the IRES element is nonoptimal (i.e. 35bp less than is found between the IRES element and the major site of initiation of the viral polyprotein in the native EMCV virus) [85]. This is likely to be advantageous for many applications because a relatively low efficiency of translation of neomycin phosphotransferase should favor the selection of clones expressing
22 Functional Expression of Ion Channels in Mammalian Systems
high levels of the bicistronic mRNA, therefore giving relatively high levels of the upstream ion channel gene also. However, for reasons already discussed, this is not necessarily optimal for functional expression of ion channels and an alternative vector, pCIN5, that gives lower expression is also available [[85], BD Biosciences Clontech]. In this vector the initiation codon of neomycin phosphotransferase has been engineered to be in precisely the same position with respect to the IRES as in the downstream ORFof the native EMCV virus. This is likely to give more efficient expression of the selectable marker resulting in lower levels of bicistronic mRNA and correspondingly lower levels of channel expression from the upstream cistron. The pCIN5 vector has been used successfully for a variety of different channels (see Figs. 2, 4, 8, and 9 and also Refs. [7, 83–88]). 7.2 Stable Expression of Multiple Subunits
As well as those with different IRES variants, vectors with different selection markers are available [85, BD Biosciences Clontech] and have been used successfully for a variety of multi-subunit and multi-channel examples [87, Fig. 8]. It is possible to successfully co-express up to three subunits or channels using these different vectors with different selection markers for each. However, most cell types grow relatively poorly under such extreme selection pressure in the presence of three antibiotics. Thus, for channels containing more than three subunits, and also where channel expression itself has deleterious effects on host cells, a different strategy is sometimes required. One possibility is to link two of the required subunits on the same expression vector, either as separate expression cassettes or via an IRES element. As previously discussed, a concern with using two separate expression cassettes on the same vector is that expression of the two subunits could become segregated during cell line selection or maintenance. Similarly, a concern with linking the two subunits via an IRES element is that, unless arranged in a tricistronic configuration (see below), the selection marker must be expressed as a separate expression cassette, again with the risk that it can become separated from expression of the channel subunits. Another potential problem with the latter approach is that, because expression of cDNAs downstream of the IRES element is considerably lower than that of cDNAs positioned upstream, the stoichiometry of subunits expressed may be inappropriate. Nevertheless, this system has been used successfully for functional expression of the P2X2/3 channel [89]. Another application where this arrangement has been successful is a variation where the downstream cistron is a reporter that can be used for fluorescence activated cell sorting (FACS), e.g. green fluorescent protein (GFP). In this case, GFP fluorescence can be used as a more convenient surrogate for monitoring expression of the upstream gene of interest and for selecting a subpopulation of transfected cells that are enriched for expression [90]. Obtaining an acceptable stoichiometry of subunits is also a consideration when using tricistronic expression constructs. Since this arrangement involves the expression of three different cDNAs (e.g. two channel subunits plus selection marker)
7 Stable Expression of Ion Channels 23
linked using two IRES elements, the expression of the cDNA at the 3 end will be very low indeed. Stability of expression with tricistronic constructs is also a potential issue due to intrinsic instability of such lengthy transcripts, especially for larger channel subunits e.g. NaV or CaV α subunits. In addition, unless IRES elements from different sources (i.e. having different sequences) are used, there is also the potential for deletions and rearrangements to occur by homologous recombination. Despite these caveats there are reported examples of successful expression using tricistronic constructs [85, 89, 91]. Another interesting strategy for multisubunit channels that may be applicable to some channel types is the expression of tandemly concatenated subunits. This approach has been shown to be feasible in a number of instances [92] and is particularly useful for investigating or fixing the stoichiometry and assembly of complex channels, since covalently linking subunits predetermines their composition and arrangement in the mature protein. 7.3 Inducible Expression
While the use of IRES vectors has proven effective for the selection of stable cell lines for a wide variety of ion channel types, sometimes expression of the channel can be so cytotoxic that the growth rate of antibiotic resistant clones is prohibitively slow or, in extreme cases, no clones actually survive antibiotic selection. As reviewed above, one option is to use transient expression to address this problem but, if stable expression is required, an alternative strategy is to use inducible expression. As with transient expression, inducible systems offer the possibility of regulating the level of channel expression, in this case by titration of the inducer. Several inducible different systems have been described, but probably those most commonly used for ion channels make use of the bacterial tetracycline resistance operon. Alternative versions are available depending on whether regulation occurs via a derepression or transactivation mechanism. In the original system, expression of the gene of interest is driven by a mini CMV promoter containing a tetre-pressor binding site [93, BD Biosciences Clontech]. This is activated by a fusion protein which is a hybrid between the tet repressor and the activation domain of the VP16 protein from Herpes Simplex virus. When tetracycline is added this hybrid tet/VP16 repressor protein no longer binds the promoter and no expression of the gene of interest can occur. This “Tet-off” system has been used for several ion channels including NMDA-NR2a [94], NaV 1.7 and NaV 1.8 [95]. A “Tet-on” variation of this has been developed which uses a mutant tet repressor/VP16 hybrid activator that binds to the promoter only when tetracycline is present [96]. An ion channel example using this system is NR2b [97]. The other tetracycline regulated system (“T-REx”, Invitrogen) uses a wild-type tet repressor protein (i.e. not fused to VP16) which normally binds to tet operator sequences within the CMV promoter silencing expression of the downstream gene of interest. When tetracycline is added the repressor protein no longer binds and expression of the gene of interest is derepressed [98]. A potential disadvantage of this derepression approach, compared to transactivation, is that production of the tet repressor needs to be very efficient in order to effectively silence
24 Functional Expression of Ion Channels in Mammalian Systems
the powerful CMV promoter. Thus, basal levels of expression can be relatively high, which may be a problem for highly toxic channels. Nevertheless, this system has been successfully used for several ion channels including TrpM2, CaV 3.2 and Kv2.1 [99–101]. On the other hand the level of expression when fully induced is higher with the derepression system since this uses a stronger CMV promoter which does not rely on the activity of a hybrid viral transactivator. Other types of inducible system that have been used for ion channels involve the use of transactivating steroid hormone receptors. One of these is based on the insect hormone, ecdysone, and uses a heteromeric receptor that is functional in mammalian cells [102]. This receptor is a heteromer between the mammalian retinoid X receptor (RXR) and the insect ecdysone receptor (EcR) that binds to an engineered promoter containing specific recognition sequences and tightly represses expression of the downstream gene. In the presence of synthetic ecdysone analogue inducers (muristerone A or ponasterone A) the conformation of the receptor is altered and transactivation of the downstream gene then occurs. A similar system, “GeneSwitch” (Invitrogen), uses a receptor that is a hybrid between the DNA binding domain from the yeast Gal4 transactivator, the ligand binding domain from the human progesterone receptor and the activation domain from the NFkb transcription factor (p65AD). This binds to an engineered GAL4-adenovirus E1b promoter
Fig. 10 Generation and analysis of stable clones with inducible channel expression. Expression screening of 16 HERGGeneSwitch clones by flow cytometry using a HERG-specific antibody. Clones were generated by transfection of a pGENE-HERG vector into HEK293 cells expressing the GeneSwitch regulatory protein (pSwitch) and selection with zeocin. For each clone expression in both non-inducing (filled
bars) and inducing conditions (mifepristone added, open bars) is shown. Clones 9 and 16 showed substantial increases in HERG immunoreactivity following induction, though the induced level of expression was lower than that seen in 4 representative clones generated with an expression vector that uses the CMV promoter (pCIN5, lanes shown at far left).
References
upstream of the gene of interest. In the absence of the inducer, mifepristone, the hybrid GeneSwitch receptor represses the promoter but when mife-pristone is added it binds to the receptor and induces a conformational change that causes activation of expression. An example of an ion channel expressed using this system is shown in Fig. 10. Both the ecdysone and the GeneSwitch systems should give relatively low basal expression levels since the receptors are able to repress promoter activity in the absence of inducer, so may be suitable for the expression of highly toxic channels. For example, the ecdysone system has been used to express the cytotoxic NMDANR2a and NR2b receptors [103, 104]. However, the fully induced level of expression may not be as high as with vectors that use the CMV promoter (see Fig. 10).
8 Conclusions
In conclusion, this chapter has attempted to highlight some of the many potential difficulties that functional expression of heterologous ion channels in mammalian cells can present. These challenges can be magnified even further within a drug discovery setting, due to the stringent demands for robust, highly reproducible expression over a long term and on a large scale. Nevertheless, as has been outlined, a variety of approaches to address these problems are available, each with their own set of advantages and disadvantages. The choice of expression system will ultimately be governed by the nature of the channel of interest and by the final assay format that is required. However, it is often not possible to predict with any certainty how successful a particular system will be and in many cases, particularly for more novel channels, a combination of approaches may be necessary. Thus, it is often worth exploring several different strategies in parallel in order to develop a final expression reagent that is most appropriate for the particular channel-assay combination(s) required.
Acknowledgements
The author would like to acknowledge numerous colleagues at GSK who have given support and input, particularly Mark Chen, Yuhua Chen, Pat Condreay, Dave Grose, Bruce Hamilton, Andy Powell, Andy Randall, Steve Rees, Mike Romanos, Derek Trezise and Simon Tate.
References 1 G. SEEBOHM, C. LERCHE, A. E. BUSCH, A. BACHMANN, Dependence of I(Ks) biophysical properties on the expression system, Pflugers Arch. 2001, 442, 891–895.
2 A. SHCHERBATKO, F. ONO, G. MANDEL, P. BREHM, Voltage-dependent sodium channel function is regulated through membrane mechanics, Biophys.J. 1999, 77, 1945–1959.
25
26 Functional Expression of Ion Channels in Mammalian Systems
3 H. WITCHEL, J. T. MILNES, J. S. MITCHESON, J. C. HANCOX, Troubleshooting problems with in vitro screening of drugs for QT interval prolongation usung HERG K+ channels expressed in mammalian cell lines and Xenopus oocytes, J. Pharmacol. Toxicol. Methods 2002, 48, 65–80. 4 M. KOZAK, The scanning model for translation: an update, J. Cell. Biol. 1989, 108, 229–241. 5 P. H. LALIK, D. S. KRAFTE, W. A. VOLBERG, R. B. CICCARELLI, Characterisation of endogenous sodium channel gene expressed in Chinese hamster ovary cells, Am. J. Physiol. 1993, 264, C803-C809. 6 G. J. STEPHENS, K. M. PAGE, J. R. BURLEY, N. S. BERROW, A. C. DOLPHIN, Functional expression of rat brain cloned α1e calcium channels in COS7 cells, Pflugers Arch. 1997, 433, 523–532. 7 V. H. JOHN et al., Heterologus expression and functional analysis of rat NaV 1.8 (SNS) Voltage-gated sodium channels in the dorsal root ganglion neuroblastoma cell line ND7–23, Neuropharmacol 2004, 46, 425–438. 8 J. CHEMIN, A. MONTEIL, S. DUBEL, J. NARGEOT, P. LORY, The alpha1I T-type calcium channel exhibits faster gating properties when overexpressed in neuroblastoma/glioma NG 108-15 cells, Eur. J. Neurosci. 2001, 14, 1678–1686. 9 T. LJUNGSTROM, M. GRUNNET, B. S. JENSEN, S. P. OLESEN, Functional coupling between heterologously expressed dopamine D(2) receptors and KCNQ channels, Pflugers Arch. 2003, 446, 684–694. 10 V. J. DUPRIEZ, K. MAES, E. LE POUL, E. BURGEON, M. DETHEUX, Aequorin-based functional assays for G-protein-coupled receptors, ion channels, and tyrosine kinase receptors, Receptors Channels 2002, 8, 319–330. 11 L. J. V. GALIETTA, M. F. SPRINGSTEEL, E. MASAHIRO, E. J. NIEDZINSKI, K. BY, M. J. HADDADIN, M. J. KURTH, M. H. NANTZ, A. S. VERKMAN, Novel CFTR Chloride Channel Activators Identified by Screening of Combinatorial Libraries
12 13
14 15
16
17
18
19
20
21
22 23
Based on Flavone and Benzoquino. lizinium Lead Compounds, J. Biol. Chem. 2001, 276, 1972–1978. C. DEUTSCH, The birth of a channel, Neuron 2003, 40, 265–276. J. JOHNSTON, C. L. WARD, R. R. KOPITO, Aggresomes: A cellular response to misfolded proteins, J. Cell Biol. 1999, 143, 1883–1898. C. M. DOBSON, Protein folding and mis-folding, Nature 2003, 426, 884–890 A. L. GOLDBERG, Protein degradation and protection against misfolded or damaged proteins, Nature 2003, 426, 895–899. M. K. HIGGINS, M. DEMIR, C. G. TATE, Calnexin co-expression and the use of weaker promoters increase expression of correctly assembled Shaker potassium channel in insect cells, Biochim. Biophys. Acta 2003, 1610, 124–132. J. I. KOURIE, C. L. HENRY, Ion channel formation and membrane linked pathologies of misfolded hydrophobic proteins: the role of dangerous unchaperoned molecules, Clin. Exp. Pharmacol. Physiol. 2002, 29, 741–753. L. N. MANGANAS, J. S Trimmer, Calnexin regulates mammalian Kv1 channel trafficking, Biochem. Biophys. Res. Commun. 2004, 322, 577–584. W. CHANG, M. S. GELMAN, J. M. PRIVES, Calnexin-dependent Enhancement of Nicotinic Acetylcholine Receptor Assembly and Surface Expression, J. Biol. Chem. 1997, 46, 28925–28932. S. H. KELLER, J. LINDSTROM, P. TAYLOR, Involvement of the chaperone protein calnexin and the acetylocholine receptor beta-subunit in the assembly and cell surface expression of the receptor, J. Biol. Chem. 1996, 37, 22871–22877. E. FICKER, A. T. DENNIS, L. WANG, A. M Brown, Role of the Cytosolic Chaperones Hsp70 and Hsp90 in Maturation of the Cardiac Potassium Channel hERG, Circ. Res. AHA 2003, 92, e87–e100. M. D. AMARAL, CFTR and Chaperones, J. Mol. Neurosci. 2003, 23, 0895–8696. E. K. WITTMACK, A. M. RUSH, M. J. CRANER, M. GOLDFARB, S. G. WAXMAN, S. D. DIB-HAJJ, Fibroblast growth factor homologous factor 2B: association with NaV 1.6 and selective colocalization at
References
24
25
26
27
28
29
30
31
32
33
34
nodes of Ranvier of dorsal root axons, J. Neurosci. 2004, 30, 6765–6775. C. ALTIER et al., Trafficking of L-type Calcium Channels Mediated by the Post-synaptic Scaffolding Protein AKAP79, J. Biol. Chem. 2002, 277(37), 33598–33603. M. K¨OTTGEN et al., Trafficking of TRPP2 by PACS proteins represents a novel mechanism of ion channel regulation, EMBO J. 2005, 24(4), 705–716. K. OKUSE, M. MALIK-HALL, M. D. BAKER, W. Y. POON, H. KONG, M. V. CHAO, J. N. WOOD, Annexin II light chain regulates sensory neuron-specific sodium channel expression, Nature 2002, 6, 417(6889), 653–656. Y. A. KURYSHEV, T. I. GUDZ, A. M. BROWN, B. A. WIBLE, KChAP as a chaperone for specific K (+) channels, Am. J. Physiol. Cell. Physiol. 2000, 278(5), C863–C864. O. T. PONGS et al., Functional and molecular aspects of voltage-gated K + channel beta subunits, Ann. N.Y. Acad. Sci. 1999, 868, 344–355. L. L. ISOM, Sodium channel beta subunits: anything but auxiliary, Neuroscientist 2001, 1, 42–54. M. W. RICHARDS, A. J. BUTCHER, A. C. DOLPHIN, Ca2 + channel β-subunits: structural insights AID our understanding. Trends Pharm. Sci. 2005, 12, 0165–6147. N. KLUGBAUER, E. MARAIS, F. HOFMANN, Calcium channel alpha2delta subinuts: differential expression, function, and drug binding, J. Bioenerg. Biomembr. 2003, 35(6), 639–647. W. J. JOINER, R. KHANNA, L. C. SCHLICHTER, L. K. KACZMAREK, Calmodulin Regulates Assembly and Trafficking of SK4/IK1 Ca2 +-activated K+ Channels, J. Biol. Chem. 2001, 276(41), 37980–37985. W-S. LEE, T. J. NGO-ANH, A. BRUENING-WRIGHT, A. MAYLIE, J. P. ADELMAN, Small Conductance Ca2 -activated K+ Channels and Calmodulin, J. Biol. Chem. 2003, 278(28), 25940–25946. W. VANDANBERGHE, R. A. NICOLL, D. S. BREDT, Stargazin is an AMPA receptor
35
36
37
38
39
40
41
42
43
auxiliary subunit, PNAS. 2005, 102(2), 485–490. D. MA, N. ZERANGUE, Y-F. LIN, A. COLLINS, M. YU, Y. N. JAN, L. Y. JAN, Role of ER Export Signals in Controlling Surface Potassium Channel Numbers, Science 2001, 291. M. M. ZAREI, M. EGHBALI, A. ALIOUA, M. SONG, H. G. KNAUS, E. STEFANI, L. TORO, An endoplasmic reticulum trafficking signal prevents surface expression of a Voltage- and Ca2 (+)- activated K+ channel splice variant, Proc. Natl. Acad. Sci. USA 2004, 101(27), 10072–10077. R. RONCARTI, I. DECIMO, G Fumagalli, Assembly and trafficking of human small conductance Ca2 (+)- activated K+ channel SK3 are governed by different molecular domains, Mol. Cell. Neurosci. 2005, 28(2), 314–325. H. M. JONES, K. L. HAMILTON, G. D. PAPWORTH, C. A. SYME, S. C. WATKINS, N. A. BRADBURY, D. C. DEVOR, Role of the NH2 terminus in the assembly and trafficking of the intermediate conductance Ca2 (+)-activated K+ channel hIK1, J. Biol. Chem. 2004, 279(15),15531–15540. D. LI, K. TAKIMOTO, E. S. LEVITAN, Surface expression of Kv1 channels is governed by a C-terminal motif, J. Biol. Chem. 2000, 275(16), 11597–11602. M. JENKE, A. S´ANCHEZ, F. MONJE, W. STU¨ HMER, R. M. WESELOH, L. A. PARDO, C-terminal domains implicated in the functional surface expression of potassium channels, EMBO 2003, 22(3), 395–403. A. AKHAVAN, R. ATANASIU, A. SHRIER, Identification of a COOH-terminal segment involved maturation and stability of human ether-a-go-go-related gene potassium channels, J. Biol. Chem. 2003, 278(41), 40105–40112. D. B. SCOTT, I. MICHAILIDIS, Y. MU, D. LOGOTHETIS, M. D. EHLERS, Endocytosis and Degradative Sorting of NDMA Receptors by Conserved MembraneProximal Signals, J. Neurosci. 2004, 24(32), 706–7109. Z. REN, N. J. RILEY, E. P. GARCIA, J. M. SANDERS, G. T. SWANSON, J. MARSHALL, Multiple Trafficking Signals Regulate
27
28 Functional Expression of Ion Channels in Mammalian Systems
44
45
46
47
48
49
50
51
52
Kainate Receptor KA2 Subunit Surface Expression, J. Neurosci. 2003, 23(16), 6608–6616. J-M. WANG, L. ZHANG, Y. YAO, N. VIROONCHATAPAN, E. ROTHE, Z-Z. WANG, A transmembrane motif governs the surface trafficking of nicotinic acetylcholine receptors, Nature 2002, 5, No. 10 10.1038/nn198. G. M. DENNING, M. P. ANDERSON, J. F. AMARA, J. MARSHALL, A. E. SMITH, M. J. WELSH, Processing of mutant cystic fibrosis transmembrane conductance regulator is temperature-sensitive, Nature 1992, 27,358(6389), 761–764. L. C. DENLINGER et al., Mutation of a Dibasic Amino Acid Motif Within the CTer-minus if the P2X7 Nucleotide Receptor Results in Trafficking Defects and Impaired Function, J. Immunol. 2003, 171, 1304–1311. Z. ZHOU, Q. GONG, C. T. JANUARY, Correction of Defective Protein Trafficking of a Mutant hERG Potassium Channel in Human Long QT Syndrome, J. Biol. Chem. 1999, 274(44), 31123–31126. A. PAULUSSEN, A. RAES, G. MATTHIJS, D. J. SNYDERS, N. COHEN, J. AERSSENS, A Novel Mutation (T65P) in the PAS Domain of the Human Potassium Channel hERG Results in the Long QT Syndrome by Trafficking Deficiency, J. Biol. Chem. 2002, 277(50), 48610–48616. C. R. VALDIVIA, M. J. ACKERMAN, D. J. TESTER, T. WADA, J. MCCORMACK, B. YE, J. C. MAKIELSKI, A novel SCN5A arrhythmia mutation, M1766L, with expression defect rescued by mexiletine, Cardiovasc. Res. 2002, 55(2), 279–289. M. BREJON, S. LE MAOUT, P. A. WELLING, J. MEROT, Processing and Transport of ROMK1 Channel is Temperature-Sensitive, Biochem. Biophy. Res. Commun. 1999, 261, 364–371. K. SCHROEDER, J. WU, L. ZHAO, R. J. LUKAS, Regulation by cycloheximide and lowered temperature of cell-surface α7-nicotinic acetylcholine receptor expression on transfected SH-EP1 cells, J. Neurochem. 2003, 85, 581–591. M. E. F. NELSON, F. WANG, A. KURYATOV, C. H. CHOI, V. GERZANICH, J. LINDSTROM,
53
54
55
56
57
58
59
60
Functional Properties of Human Nicotinic AChRs Expressed by IMR-32 Neuroblastoma Cells Resemble Thise of α3ˆa4 AChRs Expressed in Permanently Transfected HEK Cells, J. Gen. Physiol. 2001, 118, 563–582. S. T. COOPER, P. C. HARKNESS, E. R. BAKER, N. S. MILLAR, Up-regulation of Cell-surface α4P2 Neuronal Nicotinic Receptors by Lower Temperature and Expression of Chimeric Subunits, J. Biol. Chem. 1999, 17, 27145–27152. M. E. NELSON, A. KURYATOV, C. H. CHOI, Y. ZHOU, J. LINDSTROM, Alternate Stoichiometries of α4β2 Nicotinic Acetylcholine Receptors, Mol. Pharm. 2003, 63(2), 2035–1038020. I. SZABO` , A. NEGRO, P. M. DOWNEY, M. ZORATTI, F. I. SCHIAVO, G. M. GIACOMETTI, Temperature-Dependent Functional Expression of a Plant K+ Channel in Mammalian Cells, Biochem. Biophys. Res. Commun, 2000, 274, 130–135. K. DU, M. SHARMA, G. L Lukas,The DeltaF508 cystic fibrosis mutation impairs domain-domain interactions and arrests post-translational folding of CFTR, Nat. Struct. Mol. Biol. 2005, 12(1), 17–25. P. H. THIBODEAU, C. A. BRAUTIGAM, M. MACHIUS, P. J. THOMAS, Side chain and backbone contributions of Phe508 to CFTR folding, Nat. Struct. Mol. Biol. 2005, 12(1), 10–16. M. GENTZSCH et al., EndocyticTrafficking ¨ Routes of Wild Type and AF508 Cystic Fibrosis Transmembrane Conductance Regulator, Mol. Biol. Cell 2004, 15, 2684–2696. N. J. ANEGAWA, D. R. LYNCH, T. A. VERDOORN, D. B. PRITCHETT. Transfection of N-methyl-D-aspartate receptors in a non-neuronal cell line leads to cell death, J. Neurochem. 1995, 64(5), 2004–2012. N. J. ANEGAWA, R. P. GUTTMAN, E. R. GRANT, R. ANAND, J. LINDSTROM, D. R. LYNCH, N-Methyl-D-aspartate receptor mediated toxicity in non-neuronal cell lines: characterization using fluorescent measures of cell viability and reactive oxygen species production,
References
61
62
63
64
65
66
67
68
69
70
Brain Res. Mol. Brain. Res. 2000, 77(2), 163–175. M. RAISINGHANI, L. S. PREMKUMAR, Block of native and cloned vanilloid receptor (TRPVI) by aminoglycoside antibiotics, J. Pain 2004, 0304–3959. D. DOBREV, U. RAVENS, Therapeutically relevant concentrations of neomycin selectively inhibit P-type Ca2+ channels in rat striatum, Eur. J. Pharmacol. 2003, 461(2–3), 105–111. T. MASUKO, T. KUNO, K. KASHIWAGI, T. KUSAMA, K. WILLIAMS, K. IGARASHI, Stimulatory and Inhibitory Properties of Aminoglycoside Antibiotics at N-Methyl-D-Asparate Receptors, J. Pharm. Exp. Ther. 1999, 290(3),1026–1033. A. COLOSIMO et al., Transfer and expression of foreign genes in mammalian cells, Biotechniques 2000, 2, 314–318, 320–322, 324 passim. C. CHEN, H. OKAYAMA, High-efficiency transformation of mammalian cells by plasmid DNA, Mol. Cell Biol. 1987, 7, 2745–2752. J. GEHL, Electroporation: theory and methods, perspectives for drug delivery, gene therapy and research, Acta. Physiol. Scand. 2003, 177(4), 437–447. R. E. B. DUBRIDGE, P. TANG, H. C. HSIA, P. M. LEONG, J. H. MILLER, M. P. CALOS, Analysis of mutation in human cells by using an Epstein-Barr virus shuttle system, Mol. Cell. Biol. 1987, 1, 379–387. P. MELLON, V. PARKER, Y. GLUZMAN, T. MANIATIS, Identification of DNA sequences required for transcription of the human alpha 1-globin gene in a new SV40 host-vector system, Cell. 1981, 2(1), 279–288. C. R. SCLIMENTI, M. P. CALOS, Epstein-Barr virus vectors for gene expression and transfer, Curr. Opin. Biotechnol. 1998, 9, 476–479. B. SUGDEN, N. WARREN, A promoter of Epstein-Barr virus that can function during latent infection can be transactivated by EBNA-1, a viral protein required for viral DNA replication during latent infection, J. Virol. 1989, 63(6), 2644–2649.
71 Y. DUROCHER, S. PERRET, A. KAMEN, High-level and high-throughput recombinant protein production by transient transfection of suspension-growing human 293-EBNA1 cells, Nucleic Acids Res. 2001, 30(2), e9. 72 P. LILJESTRO¨ M, H. GAROFF, A New Generation of Animal Cell Expression Vectors Based on the Forest Vorus Replicon, Bio/Tech. 1991, 9. 73 K. LUNDSTROM, Semliki Forest virus vectors for rapid and high-level expression of integral membrane proteins, Biochim. Biophys. Acta. 2003, 90–96. 74 K. LUNDSTROM, A. MICHEL, H. BLASEY, A. R. BERNARD, R. HOVIUS, H. VOGEL, Surprenant A. Expression of ligand-gated ion channels with the Semliki Forest virus expression system, J. Recept. Signal Transduct Res. 1997, 17(1–3), 115–126. 75 G. H. GORRIE, Y. VALLIS, A. STEPHENSON, J. WHITFIELD, B. BROWNING, T. G. SMART, S. J. MOSS, Assembly of GABBA receptors composed of alpha1 and beta2 subunits in both cultured neurons and fibroblasts, J. Neurosci. 1997, 17(17), 6587–6596. 76 O. SHAMOTIENKO, S. AKHTAR, C. SIDERA, F. A. MEUNIER, B. INK, M. WEIR, J. O. DOLLY, Recreation of Neuronal Kv1 Channel Oligomers by Expression in Mammalian Cells using Semliki Forest Virus, Biochemistry. 1999, 38, 16766–16776. 77 T. A. KOST, J. P. CONDREAY, Recombinant baculoviruses as mammalian cell gene-delivery vectors, Trends Biotechnol. 2002, 20(4). 78 J. P. CONDREAY, S. M. WITHERSPOON, W. C. CLAY, T. A. KOST, Transient and stable gene expression in mammalian cells transduced with a recombinant baculovirus vector, Proc. Natl. Acad. Sci. USA. 1999, 96, 127–132. 79 W. C. CLAY, J. P. CONDREAY, L. B. MOORE, S. L. WEAVER, M. A. WATSON, T. A. KOST, J. J. LORENZ, Recombinant baculoviruses used to study estrogen receptor function in human osteosarcoma cells, Assay Drug Dev. Technol. 2003, 1(6), 801–810.
29
30 Functional Expression of Ion Channels in Mammalian Systems
80 R. AMES, P. NUTHULAGANTI, J. FORNWALD, U. SHABON, H. van der KEYL, N. ELSHOURBAGY, Heterologous expression of G protein-coupled receptors in U-2 OS osteosarcoma cells, Receptors Channels. 2004, 10(3–4), 17–124. 81 J. L. PFOHL, J. F. WORLEY3rd, J. P. CONDREAY, G. AN, C. J. APOLITO, T. A. KOST, J. F. TRUAX, Titration of KATP channel expression in mammalian cells utilizing recombinant baculovirus transduction, Receptors Channels. 2000, 8(2), 99–111. 82 R. AMES et al. BacMam recombinant baculoviruses in G protein-coupled receptor drug discovery, Receptors Channels. 2004, 10(3–4), 99–107. 83 S. K. JANG, H. G. KRAUSSLICH, M. J. NICKLIN, G. M. DUKE, A. C. PALMENBERG, E. WIMMER, A segment of the 5 nontranslated region of encephalomyocarditis virus RNA directs internal entry of ribosomes during in vitro translation, J. Virol. 1988, 62(8), 2636–2643. 84 S. REES, J. COOTE, J. STABLES, S. GOODSON, S. HARRIS, M. G. LEE, Bicistronic Vector for the Creation of Stable Mammalian Cell Lines that Predisposes all Antibiotic-Resistant Cells to Express Recombinant Protein, Biotechniques 1996, 20(1), 102–110. 85 S. REES. 86 Y-H. CHEN, T. J. DALE, M. A. ROMANOS, W. R. J. WHITAKER, X. M. XIE, J. J. CLARE, Cloning, distribution and functional analysis of the type III sodium channel from human brain, Eur. J. Neurosci. 2000, 12, 4281–4289. 87 M. J. MAIN, J. E. CRYAN, J. R. B. DUPERE, B. COX, J. J. CLARE, S. A. BURBIDGE, Modulation of KCNQ2/3 Potassium Channels by the Novel Anticonvulsant Retigabine, Mol. Pharm. 2000, 58, 253–262. 88 S. A. BURBIDGE, T. J. DALE, A. J. POWELL, W. R. J. WHITTAKER, X. M. XIE, M. A. ROMANOS, J. J. CLARE, Molecular cloning, distribution and functional analysis of the NaV 1.6. Voltage-gated sodium channel from human brain, Mol. Brain. Res. 2002, 103, 80–90.
89 V. H. JOHN et al., Heterologous expression and functional analysis of rat NaV 1.8 (SNS) voltage-gated sodium channels in the dorsal root ganglion neurobastoma cell line ND7–23, NeuroPharm. 2004, 46, 425–438. 90 E. KAWASHIMA et al., A novel and efficient method for the stable expression of heteromeric ion channels in mammalian cells, Receptors Channels 1998, 5(2), 53–60. 91 D. A. PERSONS et al., Enforced expression of the GATA-2 transcription factor blocks normal hematopoiesis, Blood 1999, 93(2), 488–499. 92 J. ZHU, M. L. MUSCO, M. J. GRACE, Three-color flow cytomentry analysis of tricistronic expression of eBFP, eGFP, and eYFP using EMCV-IRES linkages, Cytometry 1999, 37(1), 51–59. 93 F. MINIER, E. SIGEL, Techniques: Use of concatenated subunits for the study of ligand-gated ion channels, Trends Pharm. Sci. 2004, 25(9). 94 M. GOSSEN, H. BUJARD, Tight control of gene expression in mammalian cells by tetracycline-responsive promoters, Proc. Natl. Acad. Sci. USA 1992, 89, 5547–5551. 95 S. RENARD, C. DROUET-P´ETRE´ , M. PARTISETI, S. Z. LANGER, D. GRAHAM, F. BESNARD, Development of an inducible NMDA receptor stable cell line with an intracellular Ca2+ reporter, Eur. J. Pharm. 1999, 366, 319–328. 96 I. AKIBU et al., Stable Expression and Characterization of Human PN1 and PN3 Sodium Channels, Receptors Channels 2003, 9, 291–299. 97 M. GOSSEN, S. FREUNDLIEB, G. BENDER, G. MULLER, W. HILLEN, H. BUJARD, Transcriptional activation by tetracyclines in mammalian cells, Science 1995, 268(5218), 1766– 1769. 98 A. ROSSIGNOLI, M. G. GIRIBALDI, S. CORAZZA, Establishment of a CHO cell line expressing the NMDA NR1a/NR2b subunits for functional analysis of receptor ligands, presented at The Society for Biomolecular Screening Conference, 2002.
References 99 F. YAO, T. SVENSJO, T. WINKLER, M. LU, C. ERIKSSON, E. ERIKSSON, Tetracycline repressor, tetR, rather than the tetR-mammalian cell transcription factor fusion derivatives, regulates inducible gene expression in mammalian cells, Hum. Gene Ther. 1998, 9(13), 1939–1950. 100 A-L. PERRAUD et al., ADP-ribose gating of the calcium-permeable LTRPC2 channel revealed by Nudix motif homology, Nature 2001, 411. 101 M. XIA et al., Generation and Characterization of a Cell Line with Inducible Expression of CaV 3.2 (T-Type) Channels, Assay Drug. Dev. Tech. 2003, 1(5). 102 J. G. TRAPANI, S. J. KORN, Control of ion channel expression for patch clamp recordings using an inducible
expression system in mammalian cell lines, BMC Neurosci. 2003, 4, 15. 103 D. NO, T. P. YAO, R. M. EVANS, Ecdysone-inducible gene expression in mamma lian cells and transgenic mice, Proc. Natl. Acad. Sci. USA. 1996, 93(8), 3346–3351. 104 D. KURKO´ , P. DEZSO ˜ , A. BOROS, S. KOLOK, L. FODOR, J. NAGY, Z. Szombathelyi, Inducible expression and pharmacological characterization of recombinant rat NR1a/NR2A NMDA receptors, Neurochem. Int. 2005, 46, 369–379. 105 J. NAGY, A. BOROS, P. DEZSO ˜ , S. KOLOK, L. FODOR, Inducible expresi´on and pharmacology of recombinant NMDA receptors, composed of rat NR1a/NR2B subunits, Neurochem. Int. 2003, 43, 19–29.
31
1
Production of Recombinant Proteins by In Vitro Folding Christian Lange, and Rainer Rudolph Martin-Luther-Universit¨at Halle/Wittenberg, Halle/Saale, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Twenty years ago protein folding was regarded as a highly fascinating topic of pure basic research. Although tremendous progress has been made in our understanding of this enigmatic process, protein folding is still a very attractive field for academic research, as is documented in this handbook. However, at that time nobody would have anticipated that protein folding would today be a well-established downstream process in industrial production. Researchers in industry working on recombinant protein production are frequently confronted with misfolded, inactive material. They either can try to find a way to produce correctly folded native protein by using alternative expression systems or they can take up the challenge of in vitro protein folding. Since efficient processes are now available that allow medium- to largescale production of recombinant proteins by in vitro folding, many recombinant proteins for therapy, diagnostics, and industrial applications are successfully and profitably produced via this process. 1.1 The Inclusion Body Problem
Before the advent of recombinant DNA technology, proteins for medical, diagnostic, or industrial applications were difficult to obtain from human body fluids, from animal or plant tissue, or from microorganisms. Potentially low availability and the risk of viral contamination or high antigenicity put severe limitations on the use of natural proteins in therapy. The extraction of proteins from natural sources was often very cost-intensive, requiring large amounts of raw material, water, and
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Production of Recombinant Proteins by In Vitro Folding
energy. Recombinant DNA technology has dramatically changed this situation. It has opened the way for the large-scale, low-cost production of almost any imaginable protein [1, 2]. Proteins that are available in only minute amounts from natural sources can now be produced in huge quantities in host cells such as E. coli. Expression levels up to 50% of the total cell protein can be obtained with standard expression systems (reviewed, e.g., in Ref. [3]). Using high-density fermentation techniques, gram amounts of the desired proteins can be obtained per liter of fermentation broth [4–6]. There is, however, one major problem associated with overexpression in E. coli. Often the desired polypeptide is synthesized in these host cells, but not in the correctly folded, native form. Instead of the biologically active product, aggregates – inclusion bodies – are formed [7]. These dense particles, which may span the whole diameter of the host cell, are mostly biologically inactive and generally are insoluble in non-denaturing buffer systems (Figure 1). Inclusion bodies predominantly contain the overexpressed recombinant protein and only small amounts of host cell material such as inclusion body binding proteins (IBPs) [8], outer-membrane
Fig. 1 E. coli cell filled with inclusion bodies (lower third of cell volume).
1 Introduction 3
proteins, RNA polymerase, and ribosome components including ribosomal RNA as well as circular and nicked forms of plasmid DNA [9]. Some of these impurities may originate from co-precipitation of these cellular components together with the recombinant protein during inclusion body formation. However, most of the impurities found in inclusion body preparations may originate from incomplete removal of particulate host cell material during their isolation [10]. Inclusion body formation is not restricted to the overexpression of heterologous gene products in bacteria. Overexpression of autologous E. coli proteins in the cytosol of this organism has been observed to lead to inclusion body formation. For example, the overexpression of endogenous β-galactosidase in E. coli led to the formation of inclusion bodies that showed only partial enzymatic activity [11]. Inclusion body formation may also occur upon overexpression of proteins in eukaryotic host cells [12]. Apparently, high-level expression of any gene product beyond certain limits is sufficient to drive the recombinant protein into aggregation and inclusion body formation. Based on this observation, a simple kinetic model for inclusion body formation was proposed [13]. This model, which takes only the zero-order rate of protein synthesis and a subsequent competition between first-order folding and second-order aggregation into account, provides a fairly accurate description of inclusion body formation. In agreement with this model, inclusion body formation can be reduced or even prevented by decreasing the rate of protein synthesis. This can be achieved, for example, by cultivation at low temperature or by limited induction [14, 15]. Except for the presence of disulfide bonds, other molecular characteristics of proteins such as size, hydrophobicity, secondary structure content, etc., do not correlate well with the observed propensity for inclusion body formation. Small proteins containing few disulfide bonds are often expressed in soluble form, while inclusion body formation predominates upon cytosolic production of large proteins containing multiple disulfide bonds. Correct oxidative folding of more complex proteins is apparently not possible in the reducing cytosolic environment, in line with the finding that naturally secreted proteins mostly contain disulfide bonds, while the cysteine residues of intracellular proteins usually stay reduced [16]. The oxidative folding of secreted proteins occurs in the periplasm and the endoplasmic reticulum of prokaryotes and eukaryotes, respectively. These compartments offer optimum oxidizing conditions as well as thiol-disulfide isomerases for the formation of correct disulfide bonds [17–20]. Upon overexpression of disulfide-bonded proteins in the reducing bacterial cytosol, oxidative folding is massively impaired, and, as a consequence, inclusion body formation predominates even upon low-level expression. By selecting for mutations in E. coli, which allow for disulfide bond formation in the cytoplasm, Beckwith and coworkers could isolate mutant strains with reduced or eliminated thioredoxin reductase activity [21–23]. These and other, further-improved, mutant strains showed increased cytosolic oxidative folding of disulfide-bonded proteins as compared to wild-type strains [24]. However, in spite of this recent progress in the soluble overexpression of heterologous proteins in E. coli, in many cases the formation of inclusion bodies still prevails.
4 Production of Recombinant Proteins by In Vitro Folding
The exact mechanism by which inclusion body formation is induced and the reason that the recombinant protein is deposited predominantly in the inclusion bodies are still unclear. Apart from the changes in the kinetic competition between folding and aggregation caused by changing the rate of protein synthesis, essential components of the chaperone machinery may be simply oversaturated upon massive overproduction of recombinant proteins. It is, however, not yet clear which chaperones are involved in this process. In some cases, inclusion body formation can be suppressed by co-overexpression of molecular chaperones such as GroE or DnaK and its cohort chaperones [25, 26]. Although this approach was quite efficient upon overexpression of some proteins, co-expression of chaperones had no effect whatsoever in other cases. Because of the frequent, and sometimes frustrating, observation of inclusion bodies upon overexpression of recombinant proteins in E. coli, other host cell systems have been studied in detail for industrial protein production. Although very efficient in some cases, none of these expression systems offers the ultimate solution for recombinant protein production. Incorrect glycosylation in yeast and insect cells as well as long development times and high production costs in mammalian cell culture systems limit the general applicability of these alternatives. Therefore, enormous efforts have been made to develop efficient procedures for the reactivation of inactive bacterial inclusion body material by in vitro folding. These efforts relied heavily on the body of information available on in vitro protein folding that had been gathered over the years by basic research. Many protein folders from academia were approached by industry to help in setting up industrial folding processes. Since the main goal of academic research on protein folding was the identification of folding intermediates in order to understand the folding pathway of proteins in minute detail, the final goal of the industrial effort, which was to develop an efficient, reproducible, and cost-effective production process, sometimes came out of focus. In most previous academic in vitro folding studies, model proteins that showed more or less quantitative refolding under standard buffer conditions had been chosen. Many proteins of industrial relevance, on the other hand, form inactive aggregates under these conditions. In order to develop efficient processes, the “dark side” of in vitro folding, namely misfolding and aggregation, had to be taken into consideration. These unproductive side reactions had to be analyzed in detail in order to be overcome in maximizing the yield of the desired, correctly folded, native protein [10, 27–30]. Although subject to its specific problems and limitations, the industrial production of recombinant proteins for therapy, diagnostics, and research by overexpression into inclusion bodies and subsequent refolding has by now turned out to be an attractive and viable option. For example, Ernst et al. carefully compared the production cost for soluble expression of the recombinant heparinase in E. coli and for its production via inclusion bodies [31]. Mainly due to the low level of soluble expression, the isolation of the soluble protein was estimated to be 50% more expensive than the production of the enzyme by refolding of inclusion body material.
1 Introduction 5
1.2 Cost and Scale Limitations in Industrial Protein Folding
Protein refolding is a relatively new unit operation in industrial processes for recombinant protein production. To date, there are no generally applicable protocols available for the optimization of a refolding process. There are, however, some guidelines. Protein-dependent variables make it difficult to estimate the final production cost before an extensive optimization of a given process has been carried out. It is impossible to predict the yield of refolding without analysis of a large number of parameters such as the maximum concentration at which refolding of a particular protein can be successfully carried out, optimum pH, temperature, time, and ionic strength, and the effect of one or more of many possible additives. While some proteins, like most of those described in this volume, refold quantitatively in simple buffer solutions, others may need sophisticated process design and cost-intensive additives to achieve the successful refolding of at least some of the starting material. The protein concentration at which refolding can be performed best is a specific property of any particular protein. Because of the kinetic competition between first-order folding and higher-order unproductive aggregation, the yield of correct refolding decreases upon increasing the initial concentration of unfolded protein [10, 32]. While some proteins can be refolded to a high yield at high protein concentrations in the milligram per milliliter range, other proteins refold effectively only at concentrations on the order of micrograms per milliliter. For industrial processes, where comparatively huge amounts of proteins must be produced, the cost is extremely high if both the yield and the concentration of refolding are low. In an interesting case study, the process economics for the production of human tissue-type plasminogen activator (t-PA) by E. coli fermentation and in vitro folding were analyzed in detail. Human t-PA is a relatively complex protein, consisting of 527 amino acid residues that are organized in five structural domains. The native protein contains 17 disulfide bonds and one additional free cysteine and has a rather low solubility even in its native state. Because of the astronomical number of statistically possible disulfide bond pairings (2.2 × 1020 ) and the low solubility of folding intermediates, t-PA poses an enormous problem for in vitro folding. This therapeutic protein activates plasmin at the surface of blood clots and is therefore ideally suited for treating patients suffering from acute myocardial infarction by inducing fibrinolysis. Its market potential justified the enormous efforts that were directed towards the optimization of its in vitro folding. Based on a low yield of refolding, to be obtained only at an extremely low protein concentration, Datar et al. calculated the process economics for the production of recombinant human t-PA from E. coli inclusion bodies [33]. The authors estimated that enormous refolding reactors (1500 m3 ) would be necessary for production of the required amounts of protein. These reactors would account for 75% of the total equipment costs [34]. A simple comparison of the economics of the process involving the refolding of inclusion body protein from E. coli with an alternative process involving protein
6 Production of Recombinant Proteins by In Vitro Folding
expression in Chinese hamster ovary cells led to the conclusion that the latter, despite the considerable costs of mammalian cell culture systems, was economically far more advantageous. These studies seemed to demonstrate that it was clearly not feasible to set up an economic process for the production of recombinant human t-PA by bacterial overexpression and in vitro folding. However, a deletion variant of the plasminogen activator, Rapilysin, which was approved by the authorities in 1996 and has by now attained a considerable market share, is in fact produced by in vitro folding. A process was developed in which the protein is refolded in high yield at relatively high concentrations [35–37]. The clue to this success lay in careful process design and the inclusion of the low-molecular-weight additive L-arginine (L-Arg), which has proved effective for t-PA and many other proteins (cf. Section 3.4), in the refolding buffer. The presence of this additive led to a tremendous increase in the yield of refolding.
2 Treatment of Inclusion Bodies 2.1 Isolation of Inclusion Bodies
After cell lysis, inclusion bodies are usually harvested by solid-liquid separation. With this simple isolation procedure, the recombinant protein can be obtained in a relatively pure form, provided that all particulate matter of the E. coli host cells is disintegrated by a rigorous lysis procedure. Rather homogenous inclusion body isolates can be obtained by lysozyme treatment, subsequent high-pressure dispersion, and incubation with detergent at high salt concentrations [10]. This combination of cell disruption techniques provides for a nearly complete removal of cell wall debris and membrane fragments. As a consequence, highly enriched inclusion body material should be present in the pellet fraction of the subsequent sedimentation step. After sedimentation, inclusion bodies form a very dense pellet. Impurities are difficult to extract from this glutinous material by later washing steps. Therefore, sedimentation should be performed after cell lysis and removal of membrane proteins by detergent and high salt concentrations (cf. Protocol 1). After the inclusion bodies have been collected by centrifugation, detergents and other buffer components that might interfere with the following solubilization and renaturation steps can be removed by repeated resuspension and sedimentation. This inclusion body isolation procedure, which combines different methods for cell lysis and disintegration, is rather robust and yields relatively pure inclusion body isolates (Figure 2), even in those cases where the lysis of E. coli is difficult due to changes in the cell wall morphology, which may occur as a response to the physiological stress caused by protein overexpression [38, 39].
2 Treatment of Inclusion Bodies
Fig. 2 Expression in E. coli, inclusion body preparation, and purification of the N-terminal domain of human glucagon-like peptide 1 receptor. Reproduced from Ref. [54].
2.2 Solubilization of Inclusion Bodies
Inclusion bodies are usually solubilized in high concentrations of denaturing agents such as 6 M guanidinium chloride (GdmCl) or 8 M urea. Guanidinium chloride, which is the stronger denaturant, may allow solubilization of inclusion body isolates that are resistant to urea. Since urea rapidly decomposes into cyanate under certain solvent conditions (e.g., high pH values), care must be taken to avoid side chain modification of lysine and cysteine residues if this denaturant is used [40–42]. Although disulfide bonds are hardly formed in the reducing environment of the cytosol in vivo, random formation of these bonds, either within the same protein molecule or between different polypeptide chains, may occur by air oxidation during cell lysis and inclusion body isolation. This is of special importance in an industrial setting, where the incubation times may change upon scaling up the process. In order to guarantee the complete reduction of any accidentally formed disulfide bonds, reducing agents such as dithiothreitol (DTT), dithioerythritol (DTE), or β-mercaptoethanol should be present during solubilization. (cf. Protocol 2). Provided that the level of overexpression is sufficiently high, the recombinant protein is usually found highly enriched in the inclusion body isolate (Figure 2). In this case, further purification of the recombinant protein in the denatured state prior to renaturation is not necessary. Contaminating proteins and other components of the host cells found in inclusion body isolates have no significant effect on protein refolding in most cases. Since the chromatographic resolution during the purification of unfolded proteins is sometimes poor and chromatographic purification represents a relatively cost-intensive unit operation, it should be avoided
7
8 Production of Recombinant Proteins by In Vitro Folding
at this stage in industrial processes. In certain cases, however, contaminating proteins may co-precipitate during refolding of a given protein [43, 44]. The solubilized protein then has to be purified by, e.g., reversed-phase HPLC [45] or immobilized metal-affinity chromatography [46]. During research and process development, purification of the solubilized recombinant protein is essential if spectroscopic techniques such as fluorescence, circular dichroism, or light scattering are used to optimize refolding conditions.
3 Refolding in Solution
The following section aims to give an overview of the factors that have to be taken into account when dealing with the in vitro folding of a recombinant protein from inclusion bodies. Aggregation of unfolded protein and misfolded intermediates is in most cases the major complication during protein refolding. The ideas and methods that have been explored and used in order to overcome the problem posed by this unproductive side reaction will be discussed in some detail. 3.1 Protein Design Considerations
As already mentioned, the behavior of a protein upon refolding is a specific property of its given amino acid sequence. This property may be influenced by protein engineering, e.g., with respect to the pI value and the hydrophobicity. The refolding yield of a number of proteins could indeed be improved by protein design. The in vitro folding of human insulin-like growth factor was greatly enhanced by expressing it as a fusion protein with two Z domains derived from staphylococcal protein A [47]. This effect was mainly due to a more than 100-fold increase in the solubility of the reduced denatured polypeptide. In a similar fashion, the renaturation yield of granulocyte colony-stimulating factor could be systematically improved by fusion to various hydrophilic peptide sequences between 2 and 20 amino acids long [48]. Many important proteins naturally contain pro-sequences, which are later removed by specific proteolytic processing. These pro-sequences may also affect in vitro structure formation and act as natural folding helpers. The potentially beneficial “intramolecular chaperone” effect of pro-sequences was analyzed in detail for the folding of subtilisin [49] as well as for α-lytic protease [50]. Interestingly, the pro-region of α-lytic protease also promoted the correct folding of the enzyme when co-expressed in trans. In the case of human nerve growth factor, the pro-sequence was found to have a tremendously beneficial effect on both the rate and the yield of correct refolding and disulfide bond formation [51, 52]. Upon in vitro folding of proteins whose pro-sequences have been truncated or, more generally, of isolated protein domains, the refolding pathway may be corrupted by the absence of those parts of the protein that are essential for guiding
3 Refolding in Solution
structure formation towards the native state. This is especially true if any of the deleted parts strongly interact with the isolated domain in the native state. The energetic contribution of such an interaction may be essential for correct folding. In spite of these considerations, isolated domains have been successfully produced by in vitro folding. For example, the N-terminal extracellular domains of class B G protein-coupled receptors could be refolded in functional form [53, 54]. Finally, folding of polypeptides that naturally form part of a hetero-oligomeric protein may require the presence of all subunits of the complex for the formation of a thermodynamically stable, native state. The expression of fully functional tetrameric human hemoglobin in E. coli, for example, required the co-expression of the α and β globin genes [55]. Separate expression of the individual subunits resulted in a marked decrease in protein production as well as in inclusion body formation. In agreement with this observation, in vitro folding of β globin could proceed well only in the presence of α globin and the heme cofactor required for the formation of a functional tetramer [56]. 3.2 Oxidative Refolding With Disulfide Bond Formation
Before designing a refolding strategy, it is essential to establish whether the target protein contains disulfide bonds in its native structure. While most cytosolic proteins contain only free cysteine residues, secreted proteins or those localized in cell organelles often form intramolecular cystine bridges [16]. Highly conserved cysteines in homologous sequences as well as the presence of export signals usually indicate the presence of disulfide bonds. In such a case, folding conditions must be chosen that facilitate their fast, efficient, and, above all, correct formation during oxidative refolding, since otherwise inactive, incorrectly disulfide-bonded products may accumulate, with potentially dramatic negative effects on the renaturation yield. The number of possible combinations of disulfide bonds increases exponentially with the number of cysteines that are available for pairing (see Table 1). However, the problem is not as ill-behaved as it may look at first glance since pure statistics are usually overcome by effects that energetically favor the correct refolding pathway. Disulfide bond formation is directed towards the correct pairings by the conformational energy gained upon formation of the native structure [1, 57–59]. Furthermore, during the in vivo folding of proteins, disulfide isomerases promote the reshuffling of disulfide bonds, avoiding kinetic trapping of misfolded conformations with incorrectly formed disulfide bridges (for recent reviews, see, e.g., Refs. [20, 61–63]). The first experiments on in vitro folding with concomitant disulfide bond formation were performed using air as the oxidizing agent [64]. The rate and yield of reactivation that may be obtained by air oxidation have been found to be quite low. The reproducibility is usually low due to variations in surface-to-volume ratio, additional oxidation processes (e.g., methionine oxidation), and the presence of trace amounts of transition metal ions (especially Cu2+ ) in variable concentrations
9
10 Production of Recombinant Proteins by In Vitro Folding Table 1 Number of possible combinations of 2n cysteines to form n disulfide bonds
n
Number of combinations1
1
1
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
3 15 105 945 10395 135135 2027025 34459425 654729075 1374931058 × 1010 3162341432 × 1011 7905853581 × 1012 2134580467 × 1014 6190283354 × 1015 1918987840 × 1017 6332659871 × 1018 2216430955 × 1020 ...
1
(2n−1)·(2n−3)·(2n−5)·. . .·3·1
in the buffer solutions. These metal ions act as efficient catalysts for air oxidation in a concentration range (low micromolar) that is experimentally difficult to control [65]. Because of these inherent drawbacks of air oxidation, more defined and efficient methods for disulfide bond formation in vitro have been developed. As mentioned above, the reshuffling of incorrectly formed disulfide bridges is an essential step on the pathway towards the formation of native structure. In some cases, catalytic amounts of disulfide isomerase enzymes have been used for the efficient refolding of proteins (e.g., proinsulin) [66]. In general, however, thiol-disulfide exchange reactions between reduced polypeptide and low-molecular-weight thiols are used to facilitate disulfide bond shuffling [67, 68]. Reduced/oxidized glutathione (GSH/GSSG) are most commonly used as oxido-shuffling reagents, although other low-molecular-weight thiols such as cystine/cysteine, cystamine/cysteamine, or bisβ-hydroxyethyl disulfide/2-mercaptoethanol have been found to be equally effective in most cases. Since the thiolate anion is the active species in thiol-disulfide exchange, the pH of the refolding buffer must lie in the range of the pKa of the thiol group (i.e., between pH 8.5 and pH 9.5). Over the past few years, several low-molecular-weight thiol compounds have been designed with the aim of improving thiol-disulfide exchange during the oxidative refolding of proteins. For example, the synthetic dithiol (±)-trans-1,2-bis(2mercaptoacetamido)cyclohexane (Vectrase P), which has a pKa value of 8.3 and an E0 value of −0.24 V, was designed to mimic the properties of the active site of
3 Refolding in Solution
protein disulfide isomerases [69]. This compound improved the activation of scrambled RNAse A more efficiently than comparable monothiols. The synthetic dithiol compound also improved the refolding yield of human proinsulin [70]. Recently, various aromatic monothiols, characterized by low thiol pKa values, were tested for their effect on rate and yield of the oxidative refolding of scrambled RNAse A [71, 72]. At slightly acidic pH values (pH 6.0), the aromatic thiols increased the refolding rate by a factor of 10 to 23 over that observed with glutathione. Since the nucleophilicity of the redox buffer thiolate determines the rate of thiol-disulfide exchange, the rate of RNAse A refolding was increased with decreasing thiol pKa value. Oxidative refolding is sometimes complicated by the low solubility of the aggregation-prone reduced-denatured polypeptide chains. This problem may be circumvented by the reversible redox modification of cysteine residues of a protein in its solubilized state prior to refolding. The formation of the mixed disulfide with glutathione introduces additional negatively charged residues on the polypeptide chain and may thereby reduce precipitation of unfolded protein during the initial states of oxidative refolding [73]. This idea has been demonstrated for the in vitro folding of human tissue-type plasminogen activator [37]. As an alternative to the complete removal of the reducing agent (DTT or DTE), which is present during solubilization (cf. Section 2.2), before oxidative refolding is carried out in a defined redox buffer, the solubilized protein solution (still containing DTT or DTE) may be directly diluted into a refolding buffer containing an oxidized monothiol (e.g., oxidized glutathione) [74]. Redox equilibration between the reduced dithiol and the oxidized monothiol will then result in predominantly oxidized dithiol and a redox buffer consisting of the reduced and oxidized monothiol. An elegant and simple approach towards oxidative refolding was demonstrated by Honda et al. [75]. They succeeded in refolding recombinant human growth/differentiation factor 5 (rhGDF-5) with a yield of 63% at high protein concentrations (2.4 mg mL−1 ). The inclusion bodies were first solubilized in the presence of 32 mM cysteine/HCl for disulfide bond reduction. The solubilized protein solution was then diluted into a refolding buffer without any additional thiol reagent. Apparently, the reducing potential of the residual cysteine gradually decreased with time due to air oxidation, thus providing an oxidative redox-shuffling system composed of reduced and oxidized cysteine that was appropriate for rhGDF5 refolding. It remains to be tested whether this simple and cost-effective protocol can be used for the refolding of other recombinant proteins at high concentrations. 3.3 Transfer of the Unfolded Proteins Into Refolding Buffer
The question of whether the buffer exchange from unfolding to folding conditions should be performed by dialysis or dilution is still controversial. On some occasions, higher folding yields might be obtained by dialysis as compared to rapid dilution of the denaturant. For a number of proteins, protocols comprising gradual removal of the denaturant have been reported. For example, mouse interleukin
11
12 Production of Recombinant Proteins by In Vitro Folding
4 [76], a recombinant fragment of bovine conglutinin [77], the fungal protease rhizopuspepsin [78], recombinant bovine guanylate cyclase-activating protein 1 [79], and human interleukin-21 [80] were obtained using stepwise dialysis for the removal of the denaturant. Recently, Umetsu et al. performed a detailed analysis of the refolding pathway of a single-chain antibody fragment during a complex dialysis protocol for the stepwise reduction of the GdmCl concentration [81]. For industrial processes, however, buffer exchange by dialysis presents considerable disadvantages with respect to reproducibility, handling, and ease of scale-up. Furthermore, many proteins form highly aggregation-prone folding intermediates at intermediate denaturant concentrations. Slow removal of the denaturant by dialysis then inevitably leads to quantitative and irreversible precipitation of these intermediates. In these cases, rapid removal of the denaturant by dilution reduces protein losses by aggregate formation, especially if carried out under properly controlled process conditions and in the presence of low-molecular-weight folding enhancers (cf. Section 3.4). Diluting the solubilized protein into refolding buffer is in most cases much more convenient and efficient than the removal of the denaturing agent by dialysis. The rate of aggregation processes, which are of second or higher order, increases dramatically upon increasing the initial concentration of unfolded protein, while the first-order rate of productive folding processes does not change. Therefore, at high concentrations of denatured protein, aggregation processes predominate in the refolding reaction mixture, and this may represent a major difficulty for the design of an efficient refolding process. This problem can be at least partially overcome by process engineering (cf. Protocol 3). Since correctly folded protein does not participate in aggregation side reactions, high concentrations of native protein can be obtained by stepwise addition of the denatured protein to the refolding vessel [35]. By stepwise addition of the solubilized denatured protein at time intervals that are sufficiently large for the polypeptides to fold past the aggregation-prone early stages on the folding pathway, the concentration of unfolded protein and folding intermediates is kept sufficiently low at any given time during the refolding process (Figure 3). For example, this method has made possible the development of an economically feasible refolding process for Rapilysin, a variant of tissuetype plasminogen activator in refolding reactors that are about three orders of magnitude smaller than the huge and prohibitively expensive 1500-m3 tanks that were considered necessary in the initial analysis [34]. 3.4 Refolding Additives
One would be hard-pressed to give an exhaustive account of all low-molecularweight compounds that have been found to improve in vitro folding of one or another model protein under some conditions. This section aims at giving a, necessarily incomplete, overview of additives that show no, in a biochemical sense, specific interaction with the refolded proteins but rather act as co-solvents that modify the solvation properties of the buffer for dissolved polypeptides and thereby
3 Refolding in Solution
Fig. 3 Renaturation yield of proinsulin upon stepwise dilution of denatured protein into the refolding buffer. Reproduced from Ref.[82].
influence the pathways of refolding and of its unproductive side reactions, potentially to great effect. Systematic studies by Timasheff and coworkers (see, e.g., Refs. [83, 84]) have greatly contributed to our understanding of the interplay among polypeptides, water, and co-solvents. Generally speaking, when using lowmolecular-weight compounds as co-solvents in refolding reactions, the task is to strike a balance between, on the one hand, preserving the relative stability of the native state and, on the other hand, stabilizing denatured polypeptides and intermediates in solution in order to prevent them from following the path down to aggregation. The earliest and most simple substances used to enhance refolding were the denaturing agents GdmCl and urea themselves, at non-denaturing concentrations [85]. Among other denaturing substances, alkyl ureas, carbonic acid amides [37], and alcohols [86] have been used to improve in vitro folding. The positive effect of theses substances on refolding is most probably caused by their very properties as denaturing agents, as they thermodynamically stabilize unfolded and misfolded forms of polypeptides in solution. As a consequence, the solubility of denatured protein and folding intermediates is increased and aggregation is reduced. For a number of proteins, improved refolding has been observed upon addition of polyethylene glycol to the refolding buffer [87, 88]. Although polyethylene glycols do not act as denaturants, they have been found to decrease the thermostability of proteins with increasing average hydrophobicity of the amino acid sequence. An analysis of solvent-protein interactions indicated that the polyethylene glycols interact favorably with the hydrophobic side chains exposed upon unfolding at elevated temperatures [89]. Again, these hydrophobic interactions contribute to the stability of denatured polypeptides, folding intermediates, and misfolded forms in solution. Detergents represent another class of substances that may be assumed to stabilize aggregation-prone states of polypeptides in solution by hydrophobic interactions.
13
14 Production of Recombinant Proteins by In Vitro Folding
Indeed, detergents or phospholipid-detergent mixed micelles have been found to promote the correct refolding of various proteins [86, 90, 91]. Interestingly, the nonionic detergent Brij 58P was recently reported to be highly effective in preventing aggregation and promoting folding at very low, near-stoichiometric concentrations [92]. In other cases, osmolytes that generally stabilize the native states of proteins, such as sugars, polyalcohols, and trimethylamine oxide (TMAO), were found to be beneficial for the in vitro folding of various proteins [93, 94]. For example, a destabilized variant of hen egg white lysozyme, containing only three disulfide bonds, could be efficiently refolded in the presence of 20% glycerol [95]. In the case of human placental alkaline phosphatase, refolding could be achieved only in the presence of stabilizing sugars or Hofmeister salts [96]. In these cases efficient folding apparently depends on the relative stabilization of the native state and productive intermediates with respect to nonnative conformational states. However, the balance between the beneficial effect of stabilizing the desired native state of a protein and destabilizing the intermediates it has to pass on its folding pathway is apparently delicate. TMAO, for example, was recently found to hinder the productive refolding of lactate dehydrogenase [97]. Stabilizing osmolytes are therefore generally not the first choice as co-solvents for in vitro folding, unless the stability of the native state is an issue. Several amino acids have been found to have a beneficial effect on the in vitro folding of certain proteins. For example, the presence of 5 M proline could prevent aggregation and, consequently, improve the refolding yield of bovine carbonic an-hydrase II [98]. The refolding of lactate dehydrogenase, however, was negatively influenced by the presence of proline [97]. Recently, all naturally occurring amino acids (with the exception of cysteine, histidine, and the aromatic amino acids) were screened for their efficiency in suppressing the aggregation of lysozyme during thermal denaturation and during dilution of the reduced-carboxymethylated denatured protein from a urea solution into buffer [99]. In this study, the most basic amino acid, L-Arg, was found to be the most effective suppressor of aggregation. This is in line with the fact that L-Arg has probably been the most widely used low-molecularweight refolding enhancer since it was first described in a patent application almost 20 years ago (cf Ref [35]). The presence of L-Arg as a co-solvent in concentrations above 0.4 M (up to 1.2 M) was found to promote the refolding of many different proteins. It was successfully used in the refolding of human tissue-type plasminogen activator [35], recombinant Fab antibody fragments [74], single-chain antibody fragments [100], engineered recombinant immunotoxins [101], human gamma interferon [102], human matrix metalloproteinase-7 [103], human interleukin-21 [80], and many others (Figure 4A). Like GdmCl, L-Arg increases the solubility of unfolded polypeptides and of folding intermediates and prevents their aggregation. Although it contains a guanidino group, however, L-Arg, in contrast to GdmCl, was found to have no effect or only a minor effect on the thermodynamic stability of natively folded proteins [104, 105] (Figure 4B). Exactly how L-Arg exerts its effect as a suppressor of aggregation is still subject to discussion. As already mentioned, no dramatic relative stabilization of
3 Refolding in Solution
Fig. 4 (A) Refolding yield of human tissuetype plasminogen activator as a function of the concentration of l-Arg. Reproduced from Ref [30]. (B) Influence of l-Arg on the GdmCl-induced folding/unfolding of hen egg white lysozyme. A final concentration of 20 µg mL−1 hen egg white lysozyme
was incubated for 12 h at 20 ◦ C in 50 mM HEPES/NaOH, pH 7, in the presence of increasing concentrations of GdmCl and 0 M, 0.5 M, and 1.0 M l-Arg, respectively, as indicated. The folding/unfolding transition was monitored by tryptophan fluorescence. Data are courtesy of Mr. Ravi Charan Reddy
denatured polypeptides with respect to the native state by L-Arg could be observed. At molal concentrations above 0.1 mol kg−1 , L-Arg is preferentially excluded from the surface hydration sphere of proteins, which means it is expected to act as a weakly stabilizing osmolyte, while at lower concentrations it is preferentially bound and therefore should exert a slightly destabilizing effect [83, 106]. Because no straightforward classification as either a stabilizing or a destabilizing co-solvent is possible, more or less specific direct interactions of its functional groups with the functional groups of dissolved polypeptides will have to be involved to explain the effects of L-Arg as a refolding enhancer. Recently, organic solvent systems with low water contents have received some attention as potential media for the in vitro folding of proteins. Rariy and Klibanov
15
16 Production of Recombinant Proteins by In Vitro Folding
were able to show that it is possible to refold hen egg white lysozyme to its native state in 99.8% dry glycerol, albeit at low yields [107]. The same group later reported that in buffered organic media (1,ω-alkanediols, oligoethylene glycols, and N-methyl formamide), the presence of high concentrations of a stabilizing salt, namely, LiCl up to 2 M, resulted in markedly enhanced refolding yields as compared to the organic buffer system alone [108]. However, at water contents below 30% (w/w), the observed refolding yields for lysozyme were still extremely low. The renaturation of lysozyme was also reported in buffer systems containing the liquid organic salts ethyl ammonium nitrate and butyl ammonium nitrate [109]. These salts were found to have an effect similar to other denaturing agents (see above) by suppressing aggregation during refolding in concentrations up to approx. 0.5 M, while dissolving lysozyme in the pure liquid salts resulted in unfolded protein. With different liquid organic salts, however, efficient refolding of lysozyme and of human tissue-type plasminogen activator was observed up to salt concentrations above 90% (w/w) (Hauke Lilie, personal communication). Liquid organic salts represent a rather novel, and very versatile, class of polar organic solvents (for recent reviews, see, e.g., Refs. [110–112]). As the organic ions that form these salts may be easily derivatized, they can potentially be optimized as solvents for any given biochemical reaction. The potential of these “tailor-made” media for in vitro folding should be kept in view. 3.5 Cofactors in Protein Folding
It is a straightforward assumption that for proteins, whose structural integrity depends on the presence of strongly bound prosthetic groups or metal ions, these factors must be present during refolding in order to guide structure formation towards the stable native state. Cofactors that only weakly interact with the native state may also have a beneficial effect on in vitro folding by guiding folding intermediates towards native structure formation. Price et al. found that Ca2+ , although not strongly bound to bovine pancreatic DNAse, was necessary for correct disulfide bond formation [113]. Upon refolding of recombinant horseradish peroxidase C from E. coli inclusion bodies, the presence of Ca2+ and heme during the refolding process was essential for the formation of native, active enzyme [114]. In this case, protein folding was performed at a concentration of 220 µg mL−1 in redox buffer containing an additional 5 mM of CaCl2 and 5 µM hemin. The refolding procedure was later successfully applied to other heme-containing proteins [115, 116]. Horse liver alcohol dehydrogenase requires Zn2+ for both enzymatic activity and stability. Upon refolding of this dimeric enzyme in the absence of Zn2+ , an inactive intermediate is formed. Addition of the metal ion initiates dimerization and recovery of native enzymatic activity [117]. Reactivation was observed only at moderate concentrations of Zn2+ , while higher concentrations of the metal ion seemed to sequestrate inactive intermediates in an inactive conformation. Other cofactors such as NAD(P)H, ATP, or FADH can be essential for structure formation. Upon refolding of tetrameric yeast glyceraldehyde phosphate
3 Refolding in Solution
dehydrogenase, NAD+ led to a significant increase in the final yield of reactivation [118]. In the case of proteins interacting with NAD(P)H, ATP, or DNA, the effect of these molecules during in vitro folding can be mimicked to a certain extent by including pyrophosphate in the refolding buffer. For example, human p53 tumor suppressor protein could be successfully refolded in the presence of 50 mM pyrophosphate [119]. 3.6 Chaperones and Folding-helper Proteins
An elegant method to prevent aggregation during protein folding is to make use of the machinery designed by nature to promote correct folding in living cells, i.e., molecular chaperones and other folding-helper proteins such as protein disulfide isomerases and peptidyl-prolyl cis/trans isomerases (for recent reviews, see, e.g., Refs. [120, 121]). Both the yield and the rate of in vitro folding processes can be increased tremendously by chaperones and foldases [122]. Both classes of proteins are, however, not yet used in large-scale industrial production processes. The reason lies once again in cost considerations. Using molecular chaperones or foldases for in vitro protein folding would require the prior, cost-intensive production of these proteins. Since some chaperones and folding catalysts must be present in stoichiometric amounts to improve in vitro folding, large amounts of these compounds would be needed in production processes. ATP, which is essential for the function of many chaperones, would also add to the production costs. Buchner et al. first reported the successful use of the chaperones GroEL/ES and DnaK, together with protein disulfide isomerase, as folding enhancers in the renaturation of a single-chain, antibody-based immunotoxin [123]. In this work, DnaK was found to retain its folding-helper activity when immobilized on a matrix. Immobilization of foldases allows for repeated use, thus reducing the necessary amount of chaperone and improving the process economy. This methodology was taken up by Fersht and coworkers, who co-immobilized a functional fragment of the E. coli chaperone GroEL, which was no longer ATP-dependent, together with DsbA, a bacterial disulfide isomerase, and peptidyl-prolyl isomerase [124, 125]. The disulfide-containing scorpion protein toxin Cn5 could be successfully refolded on the obtained multifunctionalized matrix. The use of chaperones and foldases will, however, need considerable further optimization before it can be implemented in industrial production processes. 3.7 An Artificial Chaperone System
It has been known for some time that cyclodextrins have a destabilizing effect on the native state of proteins [126]. This can be explained by a shift in the equilibrium in favor of the unfolded state, which is caused by the direct interaction of the cyclodextrins with buried hydrophobic side chains of the protein. During protein folding, this interaction may be exploited for transient shielding of hydrophobic
17
18 Production of Recombinant Proteins by In Vitro Folding
regions of the unfolded polypeptide chain and thereby reducing unproductive aggregation [127]. This use of cyclodextrins in protein-folding processes was further improved in the artificial chaperone-assisted refolding process described by Gellman and coworkers [128, 129]. In this refolding protocol, chemically denatured protein (e.g., inclusion bodies solubilized by GdmCl) is first diluted in a “capture step” into a detergent solution. During this step, the denaturant is diluted to a non-denaturing concentration. However, protein folding, as well as aggregation, is still prevented by the formation of a protein-detergent complex. In a second step, a cyclodextrin is added to the solution to strip the protein of detergent, thereby inducing refolding. This procedure has by now been optimized with respect to the choice of detergent and the respective cyclodextrin. Efficient stripping of the detergent is achieved by its intercalation into the cyclodextrin ring [130, 131]. The artificial chaperone system was further improved by using β-cyclodextrin coupled to epichlorohydrin copolymers. This eliminated the unit operation of removal of the cyclodextrindetergent complex in the liquid-phase artificial chaperone system [132]. Using the β-cyclodextrin beads in an expanded-bed mode allows in vitro folding on a large process scale. The only drawback of this elegant technique is the limited commercial availability of cyclodextrins coupled to hydrophilic resins. 3.8 Pressure-induced Folding
High hydrostatic pressure leads to reversible dissociation of oligomeric proteins [133], while pressures above 4 kbar lead to protein unfolding. Although folded proteins are highly incompressible, high hydrostatic pressures force conformational changes that lead to a reduction in the overall volume of the system. This effect can be described by taking into account the decrease in volume resulting from the exposure of hydrophobic groups from the interior of a protein, as well as the effect of pressure on the dynamic structure of water [134]. Lately, various attempts have been made to make use of high pressure for applied protein refolding processes. The basic idea is to shift the competition between correct folding and aggregate formation to the direction of the productive pathway. Since oligomeric proteins can be reversibly dissociated using high hydrostatic pressure, it could be assumed that the propensity for aggregate formation during in vitro folding should also be reduced upon increasing pressure. This hypothesis was confirmed by Gorovits and Horowitz [135]. They were able to show that at 2 kbar pressure the refolding yield of urea-denatured rhodanese increased to approx. 25% in contrast to a recovery of approx. 5% observed at ambient pressure. Upon further addition of 4 M glycerol during pressure treatment, the yield of refolding could be further increased up to 56%. Using P22 tail-spike protein as a model system, Foguel et al. showed that aggregates could be dissociated when subjected to hydrostatic pressures of 2.4 kbar [136]. Finally, St. John et al. demonstrated that native β-lactamase could be obtained by simply subjecting inclusion
3 Refolding in Solution
bodies to high pressure [137]. The effect of high pressure on the refolding of covalently cross-linked aggregates of lysozyme was analyzed concerning the effect of non-denaturing concentrations of GdmCl and the ratio of oxidized to reduced glutathione [138]. Under optimal conditions, refolding yields of up to 80% could be obtained. The general applicability of high-pressure techniques for inclusion body solubilization and refolding remains to be demonstrated. Amorphous aggregates are formed upon decompression of RNAse A and of bovine pancreatic trypsin inhibitor after high-pressure treatment in the presence of reducing agents [139], indicating that, at least for some proteins, misfolded, nonnative states may be favored by high hydrostatic pressures. 3.9 Temperature-leap Techniques
The optimum temperature for in vitro folding is often significantly lower than the temperature at which a given protein functions under physiological conditions. For example, the optimum temperature for refolding of several mammalian proteins lies in the range of 5–15 ◦ C. During in vivo folding, molecular chaperones that promote correct structure formation by preventing aggregation processes allow folding at higher temperatures than under in vitro conditions. Increasing temperatures during in vitro folding may lead to a stabilization of unproductive hydrophobic interactions and thus accelerate the rate of aggregate formation. Xie and Wetlaufer made use of the temperature dependency of the yield of refolding in designing an elegant and efficient folding protocol: the “temperature-leap tactic” [140]. Using bovine carbonic anhydrase II as a model system, refolding was optimal when first induced at 4 ◦ C for 120 min, followed by a final folding phase after warming the solution to 36 ◦ C. During the refolding of carbonic anhydrase II, an early intermediate is formed that is still susceptible to aggregation. If refolding is performed at elevated temperatures, this intermediate is largely converted to aggregates. If, however, the initial stage of refolding is allowed to proceed at lower temperatures, the early intermediate rearranges to a late intermediate that is more resistant to aggregation and can then be refolded at a higher rate at higher temperatures. The efficacy of this method, however, depends on the reaction pathway-dependent presence or absence of aggregation-prone refolding intermediates. Therefore, its applicability has to be checked and folding temperatures have to be optimized on a case-by-case basis. In another case, a short hydrophobic protein could be refolded after extensive purification of the solubilized inclusion bodies using a temperature-leap protocol [141]. In yet another example, oxidative renaturation of ovotransferrin from the fully reduced form was achieved by raising the temperature during folding [142]. Here the reduced protein was first diluted into refolding buffer containing 1 mM GSH and pre-incubated at 0 ◦ C for varying times. To initiate the final oxidative folding of the preformed, still reduced, intermediates, 1 mM GSSG was added and the temperature was increased to 22 ◦ C. The pre-incubation
19
20 Production of Recombinant Proteins by In Vitro Folding
at 0 ◦ C significantly decreased the aggregation, as measured by turbidity, during refolding. 3.10 Recycling of Aggregates
Upon in vitro folding, the complete conversion of the unfolded protein into its native biologically active form is rarely achieved. Instead, a significant amount of material is sequestered into inactive aggregates. These aggregates may comprise more than 90% of the initial solubilized protein material, depending on the particular protein and the refolding conditions. It should be possible to recover this material by recycling through a second round of solubilization and refolding. This was demonstrated in principle quite some time ago, using porcine lactate dehydrogenase as a model protein [143]. However, mistranslated forms or protein molecules containing nonnative covalent modifications may accumulate in the aggregated material and corrupt the correct refolding pathway during recycling. By using electron spray mass spectroscopy (ES-MS) for the analysis of the recycled aggregates after in vitro folding of the N-terminal domain of tissue inhibitor of metalloproteinase-2, it could be shown that even after partial purification the material was extremely heterogeneous in mass [144]. Thus, the need for the elaborate purification of the recycled protein, together with the relative ease of obtaining fresh inclusion bodies from the overexpression of a given target protein as starting material for in vitro folding, renders the recycling of aggregates unattractive.
4 Alternative Refolding Techniques 4.1 Matrix-assisted Refolding
The basic idea in matrix-assisted protein refolding is to make use of the specific interaction of the protein that is to be refolded with a solid support in order to prevent unspecific aggregation. To achieve this goal, three prerequisites must be fulfilled. First, protein binding to the matrix must be possible under denaturing conditions; second, the interaction between the protein and the matrix during the refolding step must allow for the flexibility of the polypeptide chain that is required for proper structure formation; and third, the interaction must be tight enough to prevent interchain interaction, which might result in aggregate formation. The technology of matrix-assisted protein refolding was pioneered by Creighton [145, 146], using the transient interaction of the unfolded protein with an ionexchange resin and the gradual removal of the denaturing agent to allow refolding. As an alternative, hydrophobic matrices were tested for the same purpose [147]. In another, slightly exotic, example, the MalK subunit of the Salmonella typhimurium maltoside transport complex was recovered in native form after solubilizing E. coli
4 Alternative Refolding Techniques
inclusion bodies in urea and subsequent binding of the material to a red agarose column, followed by refolding through buffer exchange [148]. Specific binding of solubilized protein to matrices can be achieved by using sequence tags, which must be chosen to allow for binding under denaturing conditions. For example, N- or C-terminal hexa-arginine tags were used to bind urea-unfolded α-glucosidase from Saccharomyces cerevisiae to polyanionic supports [149]. Careful optimization of the renaturation conditions was necessary, as ionic interactions of the refolding polypeptide with the matrix caused a drastic decrease in renaturation yields at low salt concentrations, while at high salt concentrations, renaturation was prevented by hydrophobic interactions with the matrix. Under optimum conditions, however, immobilized α-glucosidase could be renatured with a high yield at protein concentrations up to 5 mg mL−1 . Folding of the enzyme in solution had previously been feasible only at concentrations below 15 µg mL−1 . Recently, binding of His-tagged proteins to Ni-chelate affinity material has been increasingly used for matrix-assisted refolding. In the case of cysteine-rich proteins or proteins containing multiple disulfide bonds, incorrect disulfide bond formation due to metal-catalyzed air oxidation may hamper proper renaturation. Bassuk et al. described a method for isolation and partial refolding of a reduced His-tagged protein on a Ni-nitrilotriacetic acid (NTA) column, followed by disulfide shuffling using a redox buffer system composed of reduced and oxidized glutathione [150]. In another case, an overnight linear gradient, starting with 8 M urea, 100 mM NaH2 PO4 , 10 mM Tris/HCl, pH 8.0, and 13 mM β-mercaptoethanol and finishing with refolding buffer (100 mM NaH2 PO4 , 10 mM Tris/HCl, pH 8.0, 2 mM GSH, and 0.2 mM GSSG), was successfully used for the refolding of the Histagged recombinant human and mouse myelin oligodendrocyte glycoproteins on a Ni-NTA column [151]. Franken et al. recently reported a protocol for the refolding of matrix-bound His-tagged proteins, which includes a washing step with a buffer containing 60% isopropanol prior to refolding for the efficient removal of contaminants, namely detergents and bacterial endotoxins [152]. Apart from soluble proteins, membrane proteins also could be refolded after immobilizing the solubilized inclusion body material on a Ni-chelating matrix [153]. Here, the inclusion bodies were solubilized by urea in the presence of sodium N-lauryl sarcosinate and glycerol. After the binding to a Ni-chelate affinity matrix, folding was initiated by omitting urea from the elution buffer. With the protein still bound to the column, N-lauryl sarcosinate was exchanged against Triton X-100, and the refolded protein was finally eluted with imidazole. Another interesting approach to matrix-assisted protein refolding was described by Berdichevsky et al. [154]. In this case, the folding of a single-chain Fv antibody fragment fused to a cellulose-binding domain from Clostridium thermocellum (CBD) was analyzed in the matrix-bound form and compared to standard refolding in solution. Interestingly, the CBD was found to retain its specific cellulose-binding capacity even in the presence of up to 6 M urea. Following solubilization of the inclusion bodies of the fusion protein in urea, the CBD recovered its native structure, while the Fv part was still unfolded. After binding to crystalline cellulose, the
21
22 Production of Recombinant Proteins by In Vitro Folding
antibody fragment could then be refolded while bound to the matrix via its fusion partner, the already native CBD. Protein refolding by transient interaction of folding intermediates with solid supports can also be achieved by using disulfide-carrying polymeric microspheres during oxidative refolding of disulfide bond-containing proteins [155]. Here, cystaminefunctionalized microspheres were present during the oxidative refolding of reduced lysozyme. Due to transient binding of folding intermediates to the matrix by thiol-disulfide exchange, an increase in the yield of productive refolding could be achieved. 4.2 Folding by Gel Filtration
Surprisingly, unfolded proteins can in some cases be refolded simply by passing the solubilized, unfolded protein solution through a size-exclusion column equilibrated with refolding buffer [156–158]. Even more surprisingly, this refolding procedure often functions at relatively high protein concentrations. Several reasons may contribute to the feasibility of protein refolding by gel filtration. First, the unfolded polypeptide, characterized by a higher hydrodynamic radius than the chemical denaturant (GdmCl or urea) will run ahead of the denaturant and thus gradually move into conditions that promote correct refolding. Now either the protein can complete correct refolding and outrun the denaturant front, or higher aggregates may form, which then precipitate within the gel. The following denaturant front will re-dissolve these aggregates, giving them a renewed chance for correct folding. Furthermore, limited diffusion as well as physical separation of the refolding polypeptide chains within the pores of the gel may reduce the propensity for aggregate formation. Finally, transient interaction of the intermediates with the gel matrix may also contribute to a reduction in aggregate formation. The use of size-exclusion chromatography for refolding was further improved by Gu et al., who used urea gradients to guarantee the gradual transfer of the unfolded protein to refolding conditions [159]. In a very thorough analysis, chromatographic refolding methods were compared to more traditional refolding techniques using E. coli inclusion bodies of recombinant single-chain antibody fragments [160]. Refolding by dilution and dialysis was compared to protein refolding using an immobilized metal affinity matrix and gel filtration. The highest yield was obtained using gel filtration with decreasing urea and, in parallel, increasing pH gradient. In the presence of the appropriate redox buffer, composed of reduced and oxidized glutathione, decreasing urea concentrations allowed structure formation, while increasing pH values promoted formation of disulfide bonds by thiol-disulfide exchange. This method allowed high-yield recovery of correctly refolded protein. Since the oxidative refolding of antibody fragments is a relatively slow process, low flow rates had to be used during gel filtration. In another study Gu et al. compared the efficacy of different gel-filtration media for size-exclusion chromato-graphic refolding of denatured bovine carbonic anhydrase B [161].
4 Alternative Refolding Techniques
Although folding by gel filtration seems to be an elegant and promising approach to the renaturation of inclusion body material, several inherent difficulties may limit its practical applicability. Clogging of the gel-filtration column by aggregates can be a severe problem for many proteins, and the refolding conditions have to be carefully optimized with respect to buffer conditions, flow rate, and gradient volume in each case. Low maximum flow rates and the cost of large size-exclusion columns represent barriers to the scale-up of a process for industrial production. 4.3 Direct Refolding of Inclusion Body Material
As the properties of proteins vary so widely depending on their individual amino acid sequence, it is obviously to be expected that the inclusion bodies formed by different recombinant proteins also differ in their morphology. Using the same procedure for isolation may in one case yield inclusion body preparations that are extremely resistant to solubilization, while isolates of other proteins may be re-dissolved even under non-denaturing conditions. In general, inclusion body preparations of secreted proteins containing multiple disulfide bonds are quite stable, so that strong denaturants are a prerequisite for solubilization. Inclusion body isolates of intracellular proteins, on the other hand, are often found to be quite labile, so that the folded or partially folded protein contained in these inclusion bodies can be solubilized by a simple incubation in physiological buffer systems. For example, native protein could be extracted from inclusion body isolates obtained after massive overexpression of the endogenous E. coli β-galactosidase (R. Rudolph, unpublished observation). In another study, Tsumoto et al. reported the extraction of active green fluorescent protein (GFP) from inclusion bodies using non-denaturing solvent conditions [162]. In this case, the inclusion bodies already contained some native GFP, as indicated by their fluorescence. This native material, together with partially folded intermediates, could be solubilized using non-denaturing concentrations of GdmCl or, even more effectively, using L-Arg in concentrations ranging from 0.5 M to 2 M. In other cases, the mild detergent N-lauryl sarcosine was used to extract active protein from inclusion body material. Native E. coli RNA polymerase σ factor could be recovered by dissolving the inclusion bodies in buffer containing 0.4% (w/v) sarcosyl, followed by 10-fold dilution at 4 ◦ C [163]. Actin could be recovered from inclusion body material by using 1.5% (w/v) sarcosyl in the presence of 1 mM ethylenediamine tetraacetic acid (EDTA) [164]. In the absence of divalent cations, only actin was extracted from the inclusion body material, while contaminating E. coli outer-membrane proteins were not solubilized. Recombinant proteins that form more resistant inclusion bodies may still be directly solubilized and refolded by applying strongly alkaline buffer conditions. Although this procedure may appear relatively harsh, it has indeed been used in industrial processes. For example, prochymosin was extracted by suspending the inclusion body material in 0.05 M K2 HPO4 /K2 NaPO4 , 1 mM EDTA, 0.1 M
23
24 Production of Recombinant Proteins by In Vitro Folding
NaCl, pH 10.7. After incubation for a least 1 h at 4 ◦ C, the pH of the supernatant was adjusted to pH 8.0 by addition of concentrated HCl to induce final refolding [165]. In another case, bovine somatotropin was recovered by incubating inclusion bodies in dilute NaOH, pH 12.5. After 20 min at 25 ◦ C, the inclusion bodies were completely dissolved. After adjusting the pH to 11.5, oxidative refolding proceeded, presumably by air oxidation, during a further incubation for 8 h [166]. 5 Conclusions
Protein folding has lost its virgin status and become the subject of industrial exploitation for many years. By now a large number of proteins of industrial relevance are produced by processes involving refolding. Among the most prominent examples are recombinant human proteins of high market potential produced by in vitro folding from E. coli inclusion body material. These examples include human insulin, human granulocyte colony-stimulating factor, human growth hormone, and Rapilysin, an engineered variant of human tissue-type plasminogen activator. Other proteins used for drug discovery by screening or rational, structure-based design are also produced by in vitro folding. In this article only a selection of the techniques that have been used for protein refolding could be mentioned. Unfortunately, many of the refolding protocols described in the literature work well for only one particular target protein. Folding protocols have to be optimized with respect to protein concentration, temperature, pH, redox conditions, additives, and specific methodology on a case-by-case basis. However, “standard procedures” have been developed over the last couple of years, which may serve as fairly general guidelines (cf. Protocol 3). Chances are high that native, biologically active protein can be obtained from inclusion body material by following these protocols. These procedures, however, may not straightforwardly lead to the most cost-efficient process for the large-scale production of a given protein. In designing an industrial process, other alternatives that are mentioned in this article should be kept in view as possibilities for the optimization of process economics. 6 Experimental Protocols 6.1 Protocol 1: Isolation of Inclusion Bodies Equipment and Reagents Ĺ Ultraturrax Ĺ French press Ĺ Centrifuge
6 Experimental Protocols
Ĺ Ĺ Ĺ Ĺ Ĺ Ĺ
0.1 M Tris/HCl pH 7, 1 mM EDTA Lysozyme MgCl2 DNAse 60 mM EDTA, 6% Triton X-100, 1.5 M NaCl, pH 7 0.1 M Tris/HCl pH 7, 20 mM EDTA
Method 1. Homogenize cells (5 g wet weight) in 25 mL 0.1 M Tris/HCl pH 7, 1 mM EDTA, at 4 ◦ C, using an Ultraturrax (10 000 rpm). 2. Add 1.5 mg lysozyme per gram cells, mix briefly with the Ultraturrax, and incubate at 4 ◦ C for 30 min. 3. Use sonication or high-pressure homogenization (French press) to disrupt the cells. 4. In order to digest DNA, add MgCl2 to a final concentration of 3 mM and DNAse to a final concentration of 10 µg mL−1 , and incubate for 30 min at 4 ◦ C. 5. Add 0.5 volume of 60 mM EDTA, 6% Triton X-100, 1.5 M NaCl, pH 7, and incubate for 30 min at 4 ◦ C. 6. Pellet inclusion bodies (IB) by centrifugation at 31 000 g for 10 min at 4 ◦ C. 7. Resuspend the pellet in 40 mL 0.1 M Tris/HCl pH 7, 20 mM EDTA, using the Ultraturrax. 8. Repeat centrifugation. 9. The IB pellet may be stored frozen at −20 ◦ C for at least two weeks. 6.2 Protocol 2: Solubilization of Inclusion Bodies Reagents Ĺ 6 M GdmCl (or 8 M urea), 0.1 M Tris/HCl pH 8, 100 mM DTE (or DTT), 1 mM EDTA Ĺ 1 M HCl Ĺ 4 M GdmCl (or 6 M urea), 10 mM HCl Method 1. Resuspend 50 mg IB pellet from Protocol 1 in 5 mL 6 M GdmCl (or 8 M urea), 0.1 M Tris/HCl pH 8, 100 mM DTE (or DTT), 1 mM EDTA, and incubate for 2 h at 25 ◦ C. 2. Lower the pH of the solution to a value between pH 3 and pH 4 by drop-wise addition of HCl. 3. Remove debris by centrifugation at 10 000 g. 4. Completely remove the DTE (or DTT) by exhaustive dialysis (twice) against 500 mL of 4 M GdmCl (or 6 M urea), 10 mM HCl, for 2 h at 25 ◦ C. Dialyze again at
25
26 Production of Recombinant Proteins by In Vitro Folding
4 ◦ C overnight against 1 L of 4 M GdmCl (or 6 M urea). (In the case of proteins that do not contain disulfide bonds, the dialysis step may be omitted.) 5. Determine the protein concentration by a Bradford assay [167] using bovine serum albumin (BSA) in 4 M GdmCl (or 6 M urea), 10 mM HCl as standard protein. 6. Aliquots of the protein solution may be stored at −70 ◦ C for at least two weeks.
6.3 Protocol 3: Refolding of Proteins Refolding by Rapid Dilution of Denatured Protein 1. The optimal choice of the refolding buffer depends on the particular protein. Table 2 represents a series of buffer conditions that should be tested in an initial screening and may serve as starting points for further optimization. 2. Dilute the solubilized proteins obtained according to Protocol 2 into the precooled refolding buffer (10 ◦ C) under rapid mixing. The final protein concentration should be within the range of 10–30 µg mL−1 . This concentration should be obtained by diluting the solubilized inclusion bodies in a ratio of approx. 1:100 (v/v). 3. After 12 h refolding at 10 ◦ C, the sample should be dialyzed against a dilute buffer system appropriate for the given protein, which must not contain L-Arg. Apart from improving refolding, L-Arg may also keep misfolded forms of the protein in solution and should therefore be removed. A precipitate may form during this dialysis step.
Table 2 Buffer conditions for initial refolding screening
Buffer
pH1
Additive
GSH/GSSG2
EDTA3
0.1 M Tris/HCl " " " " "" " "
8.5 " " 7.5 9.5 8.5 " " "
− 0.5 M L-Arg/HCl 1.0 M L-Arg/HCl 0.5 M L-Arg/HCl " "" 0.5 M Na2 SO4 20% (w/v) Sorbitol
5 mM/0.5 mM "/" "/" "/" "/" 0.5 mM/5 mM 5 mM/0.5 mM "/"
1 mM " " " " "" " "
1
Care should be taken to adjust the pH value only after addition of all buffer components. Presence of the redox buffer system composed of oxidized and reduced glutathione is necessary only if the protein contains disulfide bonds; if the protein does not contain disulfide bonds, GSH/GSSG may be replaced by 5 mM DTT (or DTE). Stock solutions of GSH, DTT, and DTE should not be stored for longer than 24 h at 4 ◦ C. 3 EDTA should be added only if the native protein does not contain bound metal ions; if the protein does contain bound metal ions, low concentrations of the respective metal should be included (e.g., in fivefold molar excess over the protein). 2
References
4. After removal of the precipitate by centrifugation, the biological activity of the refolded protein should be assessed using an appropriate assay. If no activity as say is available, solubility may be taken as a first indication for correct refolding. Upon quantification of the refolding yield, take into account volume changes that may have occurred during dialysis due to osmosis. Refolding by Stepwise Addition of Denatured Protein To achieve refolding at higher final concentrations of renatured protein, the dilution of the solubilized inclusion body material may be carried out in a stepwise manner, according to the following rules: 1. Determine the upper limit of the concentration of denatured protein (c) that can be refolded at high yield upon a single dilution into renaturation buffer. 2. Determine the time t that is required for renaturation to 90% of the final value. 3. Determine the maximum residual denaturant concentration that is compatible with native structure formation. 4. Refolding is performed by stepwise addition of concentration increments c of the denatured protein to the refolding buffer at time intervals t. The addition of denatured protein solution can be repeated until the maximum residual denaturant concentration is reached. 5. After refolding is complete, proceed with dialysis according to step 3 of the standard protocol (see above).
Acknowledgements
The group’s ongoing work on the in vitro folding of proteins is supported by the German Federal Ministry of Education and Research (BMBF grant no. 03WKB01 A) and the Federal State (Land) of Saxony-Anhalt (grant no. 3537 A/0903 L).
References 1 ITAKURA, K., HIROSE, T., CREA, R., RIGGS, A. D., HEYNECKER, H. L., BOLIVAR, F., and BOYER, H. W. (1977). Expression in Escherichia coli of a chemically synthesized gene for the hormone somatostatin. Science 198, 1056– 1063. 2 GOEDDEL, D. V., KLEID, D. G., BOLIVAR, F., HEYNEKER, H. L., YANSURA, D. G., CREA, R., HIROSE, T., KRASZEWSKI, A., ITAKURA, K., and RIGGS, A. D. (1979). Expression in Escherichia coli of chemically synthesized genes for human insulin. Proc. Natl. Acad. Sci. USA 76, 106–110.
3 MAKRIDES, S. C. (1996). Strategies for Achieving High-Level Expression of Genes in Escherichia coli. Microbiol. Rev. 60, 512–538. 4 YEE, L., and BLANCH, H. W. (1992). Recombinant protein expression in high cell density fed-batch cultures of Escherichia coli. Bio/Technology 10, 1550–1556. 5 KLEMAN, G. L., and STROHL, W. R. (1994). Developments in high cell density and high productivity microbial fermentation. Curr. Opin. Biotechnol. 5, 180–186.
27
28 Production of Recombinant Proteins by In Vitro Folding
6 LEE, S. Y. (1996). High cell-density culture of Escherichia coli. Trends Biotechnol. 14, 98–105. 7 MARSTON, F. A. O. (1986). The purification of eukaryotic polypeptides synthesized in Escherichia coli. Biochem. J. 240, 1–12. 8 ALLEN, S. P., POLAZZI, J. O., GIERSE, J. K., and EASTON, A. M. (1992). Two Novel Heat Shock Genes Encoding Proteins Produced in Response to Heterologous Protein Expression in Escherichia coli. J. Bacteriol. 174, 6938–6947. 9 HARTLEY, D. L., and KANE, J. F. (1988). Properties of inclusion bodies from recombinant Escherichia coli. Biochem. Soc. Trans. 16, 101–102. 10 De BERNARDEZ CLARK, E., SCHWARZ, E., and RUDOLPH, R. (1999). Inhibition of Aggregation Side Reactions during in vitro Protein Folding. Meth. Enzymol. 309, 217–236. 11 WORRALL, D. M., and GOSS, N. H. (1989). The Formation of Biologically Active β-Galactosidase Inclusion Bodies in Escherichia coli. Austral. J. Biotechnol. 3, 28–32. 12 FLAMAND, M., CHEVALIER, M., HENCHAL, E., GIRARD, M., and DEUBEL, V. (1995). Purification and Renaturation of Japanese Encephalitis Virus Nonstructural Glycoprotein NS1 Overproduced by Insect Cells. Protein Expr. Purif. 6, 519–527. 13 KIEFHABER, T., RUDOLPH, R., KOHLER, H.-H., and BUCHNER, J. (1991). Protein aggregation in vitro and in vivo: A quantitative Model of the kinetic competition between folding and aggregation. Bio/Technology 9, 825– 829. 14 SCHEIN, C. H., and NOTEBORN, M. H. M. (1988). Formation of soluble recombinant proteins in Escherichia coli is favored by lower growth temperature. Bio/Technology 6, 291–294. 15 KOPETZKI, E., SCHUMACHER, G., and BUCKEL, P. (1989). Control of formation of active soluble or inactive insoluble baker’s yeast a-glucosidase PI in Escherichia coli by induction and growth conditions. Mol. Gen. Genet. 216, 149–155.
16 FAHEY, R. C., HUNT, J. S., and WINDHAM, G. C. (1977). On the cysteine and cystine content of proteins. Differences between intra-cellular and extracellular proteins. J. Mol. Evol. 10, 155–163. 17 FREEDMAN, R. B. (1995). The formation of protein disulphide bonds. Curr. Opin. Struct. Biol. 5, 85–91. 18 RAINA, S., and MISSIAKAS, D. (1997). Making and Breaking of Disulfide Bonds. Annu. Rev. Microbiol. 51, 179–202. 19 FABIANEK, R. A., HENNECKE, H., and THO¨ NY-MEYER, L. (2000). Periplasmic protein thiol:disulfide oxidoreductases of Escherichia coli. FEMS Microbiol. Rev. 24, 303–316. 20 KADOKURA, H., KATZEN, F., and BECKWITH, J. (2003). Protein Disulfide Bond Formation in Prokaryotes. Annu. Rev. Biochem. 72, 111–135. 21 DERMAN, A. I., PRINZ, W. A., BELIN, D., and BECKWITH, J. (1993). Mutations That Allow Disulfide Bond Formation in the Cytoplasm of Escherichia coli. Science 262, 1744–1747. 22 PRINZ, W. A., ASLUND, F., HOLMGREN, A., and BECKWITH, J. (1997). The role of the thioredoxin and glutaredoxin pathways in reducing protein disulfide bonds in the Escherichia coli cytoplasm. J. Biol. Chem. 272, 15661–15667. 23 STEWART, E. J., ASLUND, F., and BECKWITH, J. (1998). Disulfide bond formation in the Escherichia coli cytoplasm: an in vivo role reversal for the thioredoxins. EMBO J. 17, 5543–5550. 24 BESSETTE, P. H., ASLUND, F., BECKWITH, J., and GEORGIOU, G. (1999). Efficient folding of proteins with multiple disulfide bonds in the Escherichia coli cytoplasm. Proc. Natl. Acad. Sci. USA 96, 13703–13708. 25 GEORGIOU, G., and VALAX, P. (1996). Expression of correctly folded proteins in Escherichia coli. Curr. Opin. Biotechnol. 7, 190–197. 26 VONRHEIN, C., SCHMIDT, U., ZIEGLER, G. A., SCHWEIGER, S., HANUKOGLU, I., and SCHULZ, G. E. (1999). Chaperone-assisted expression of authentic bovine adrenodoxin reductase
References
27
28
29
30
31
32
33
34
35
36
in Escherichia coli. FEBS Lett. 443, 167–169. RUDOLPH, R. (1996). Successful Protein Folding on an Industrial Scale. in: Protein Engineering: Principles and Practice. CLELAND, J. L., and CRAIK, C. S. (eds.). Wiley-Liss Inc., New York, USA. pp. 283–298. RUDOLPH, R., and LILIE, H. (1996). In vitro folding of inclusion body proteins. FASEB J. 10, 49–56. RUDOLPH, R., B¨oHM, G., LILIE, H., and JAENICKE, R. (1997). Folding proteins. in: Protein Function: A Practical Approach. CREIGHTON, T. E. (ed.). IRL Press, Oxford, UK. pp. 57–99. RUDOLPH, R., LILIE, H., and SCHWARZ, E. (1999). In vitro Folding of Inclusion Body Proteins on an Industrial Scale. in: Biotechnology (2nd ed.), Vol. 5a: Recombinant Proteins, Monoclonal Antibodies and Therapeutic Genes. MOUNTAIN A., NEY U., *and SCHOMBURG D. (eds.). Wiley-VCH, Weinheim, Germany. pp. 112–123. ERNST, S., GARRO, O. A., WINKLER, S., VENKATARAMAN, G., LANGER, R., COONEY, C. L., and SASISEKHARAN, R. (1997). Biotechnol. Bioeng. 53, 575–582. ZETTLMEISSL, G., RUDOLPH, R., and JAENICKE, R. (1979). Reconstitution of Lactic Dehydrogenase. Noncovalent Aggregation vs. Reactivation. 1. Physical Properties and Kinetics of Aggregation. Biochemistry 18, 5567–5571. DATAR, R. V., CARTWRIGHT, T., and ROSEN, C.-G. (1993). Process Economics of Animal Call and Bacterial Fermentations: A Case Study Analysis of Tissue Plasminogen Activator. Bio/Technology 11, 349–357. MIDDELBERG, A. P. J. (1996). The influence of protein refolding strategy on cost for competing reactions. Chem. Eng. J. 61, 41–52. RUDOLPH, R., and FISCHER, S. (1990). Process for obtaining renatured proteins. U.S. Patent 4,933,434. KOHNERT, U., RUDOLPH, R., VERHEIJEN, J. H., JACOLINE, E., WEENING-VERHOEFF, D., STERN, A., OPITZ, U., MARTIN, U., LILL, H., PRINZ, H., LECHNER, M., KRESSE,
37
38
39
40
41
42
43
44
45
G.-B., BUCKEL, P., and FISCHER, S. (1992). Protein Eng. 5, 93–100. RUDOLPH, R., FISCHER, S., and MATTES, R. (1995). Process for the activation of t-PA or Ing after genetic expression in prokaryotes. U.S. Patent 5,453,363. J¨URGEN, B., LIN, H. Y., RIEMSCHNEIDER, S., SCHARF, C., NEUBAUER, P., SCHMID, R., HECKER, M., and SCHWEDER, T. (2000). Monitoring of Genes that Respond to Overproduction of an Insoluble Recombinant Protein in Escherichia coli Glucose-Limited Fed-Batch Fermentations. Biotechnol. Bioeng. 70, 217–224. HUNKE, S., and BETTON, J.-M. (2003). Temperature effect on inclusion body formation and stress response in the periplasm of Escherichia coli. Mol. Microbiol. 50, 1579–1589. STARK, G. R., STEIN, W. H., and MOORE, S. (1960). Reactions of the Cyanate Present in Aqueous Urea with Amino Acids and Proteins. J. Biol. Chem. 235, 3177–3181. STARK, G. R. (1965). Reactions of Cyanate with Functional Groups of Proteins. III. Reactions with Amino Carboxyl Groups. Biochemistry 4, 1030–1036. HAGEL, P., GERDING, J. J. T., FIEGGEN, W., and BLOEMENDAL, H. (1971). Cyanate Formation in Solutions of Urea. I. Calculation of Cyanate Concentrations at Different Temperature and pH. Biochim. Biophys. Acta 243, 366–373. GOLDBERG, M. E., RUDOLPH, R., and JAENICKE, R. (1991). A Kinetic Study of the Competition between Renaturation and Aggregation during the Refolding of Denatured-Reduced Egg White Lysozyme. Biochemistry 30, 2790– 2797. TRIVEDI, V. D., RAMAN, B., MOHAN RAO Ch., and RAMAKRISHNA, T. (1997). Co-refolding denatured-reduced hen egg white lysozyme with acidic and basic proteins. FEBS Lett. 418, 363–366. RAMAGE, P., CHENEVAL, D., CHVEI, M., GRAFF, P., HEMMIG, R., HENG, R., KOCHER, H. P., MACKENZIE, A., MEMMERT, K., REVESZ, L., and WISHART, W. (1995). Expression, Refolding, and Autocatalytic Proteolytic Processing of
29
30 Production of Recombinant Proteins by In Vitro Folding
46
47
48
49
50
51
52
53
54
the Interleukin-1β-converting Enzyme Precursor. J. Biol. Chem. 270, 9378–9383. HOCHULI, E., BANNWARTH, W., D¨OBELI, H., GENTZ, R., and STU¨ BER, D. (1988). Genetic Approach to Facilitate Purification of Recombinant Proteins with a Novel Metal Chelate Adsorbent. Bio/Technology 6, 1321–1325. SAMUELSSON, E., and UHLEN, M. (1996). Chaperone-like effect during in vitro refolding of insulin-like growth factor I using a solubilizing fusion partner. Ann. N.Y. Acad. Sci. 782, 486–494. AMBROSIUS, D., DONY, C., and RUDOLPH, R. (1996). Activation of recombinant proteins. U.S. Patent 5,578,710. ZHU, X. L., OHTA, Y., JORDAN, F., and INOUYE, M. (1989). Pro-sequence of subtilisin can guide the refolding of denatured subtilisin in an intramolecular process. Nature 339, 483–484. SILEN, J. L., and AGARD, D. A. (1989). The α-lytic protease pro-region does not require a physical linkage to activate the protease domain in vivo. Nature 341, 462–464. RATTENHOLL, A., RUOPPOLO, M., FLAGIELLO, A., MONTI, M., VINCI, F., MARINO, G., LILIE, H., SCHWARZ, E., and RUDOLPH, R. (2001). Pro-Sequence Assisted Folding and Disulfide Bond Formation of Human Nerve Growth Factor. J. Mol. Biol. 305, 523–533. RATTENHOLL, A., LILIE, H., GROSSMANN, A., STERN, A., SCHWARZ, E., and RUDOLPH, R. (2001). The pro-sequence facilitates folding of human nerve growth factor from Escherichia coli inclusion bodies. Eur. J. Biochem. 268, 3296–3303. GRAUSCHOPF, U., LILIE, H., HONOLD, K., WOZNY, M., REUSCH, D., ESSWEIN, A., SCHAEFER, W., R¨UCKNAGEL, K. P., and RUDOLPH, R. (2000). The N-Terminal Fragment of Human Parathyroid Hormone Receptor 1 Constitutes a Hormone Binding Domain and Reveals a Distinct Disulfide Pattern. Biochemistry 30, 8878–8887. BAZARSUREN, A., GRAUSCHOPF, U., WOZNY, M., REUSCH, D., HOFFMANN, E., SCHAEFER, W., PANZNER, S., and
55
56
57
58
59
60
61
62
63
64
65
RUDOLPH, R. (2002). In vitro folding, functional characterization, and disulfide pattern of the extracellular domain of human GLP-1 receptor. Biophys. Chem. 96, 305–318. HOFFMAN, S. J., LOOKER, D. L., ROEHRICH, J. M., COZART, P. E., DURFEE, S. L., TEDESCO, J. L., and STETLER, G. L. (1990). Expression of fully functional tetrameric human hemoglobin in Escherichia coli. Proc. Natl. Acad. Sci. USA 87, 8521–8525. NAGAI, K., PERUTZ, M., and POYART, C. (1985). Oxygen binding properties of human mutant hemoglobins synthesized in Escherichia coli. Proc. Natl. Acad. Sci. USA 82, 7252–7255. CREIGHTON, T. E. (1978). Experimental Studies of Protein Folding and Unfolding. Prog. Biophys. Molec. Biol. 33, 231–297. PACE, C. N., and CREIGHTON, T. E. (1986). The Disulphide Folding Pathway of Ribonuclease T1 . J. Mol. Biol. 188, 477–486. WEISSMAN, J. S., and KIM, P. S. (1991). Reexamination of the folding of BPTI: predominance of native intermediates. Science 253, 1386–1393. WEISSMAN, J. S., and KIM, P. S. (1992). Kinetic role of nonnative species in the folding of bovine pancreatic trypsin inhibitor. Proc. Natl. Acad. Sci. USA 89, 9900–9904. FRAND, A. R., CUOZZO, J. W., and KAISER, C. A. (2000). Pathways for protein disulphide bond formation. Trends Cell Biol. 10, 203–210. WOYCECHOWSKY, K. J., and RAINES, R. T. (2000). Native disulfide bond formation in proteins. Curr. Opin. Chem. Biol. 4, 533–539. HINIKER, A., and BARDWELL, J. C. A. (2003). Disulfide Bond Isomerization in Prokaryotes. Biochemistry 42, 1179–1185. SELA, M., WHITE, F. H., and ANFINSEN, C. B. (1957) Reductive cleavage of disulfide bridges in ribonuclease. Science 125, 691–692. AHMED, A. K., SCHAFFER, S. W., and WETLAUFER, D. B. (1975). Nonenzymatic reactivation of reduced bovine pancreatic ribonuclease by air oxidation and by
References
66
67
68
69
70
71
72
73
74
75
glutathione oxidoreduction buffers. J. Biol. Chem. 250, 8477–8482. WINTER, J., KLAPPA, P., FREEDMAN, R., LILIE, H., and RUDOLPH, R. (2002). Catalytic Activity and Chaperone Function of Human Protein-disulfide Isomerase are Required for the Efficient Refolding of Proinsulin. J. Biol. Chem. 277, 310–317. SAXENA, V. P., and WETLAUFER, D. B. (1970). Formation of three-dimensional structure in proteins. I. Rapid nonenzymic reactivation of reduced lysozyme. Biochemistry 9, 5015–5023. WETLAUFER, D. B. (1984). Nonenzymatic formation and isomerization of protein disulfides. Meth. Enzymol. 107, 301– 304. WOYCECHOWSKY, K. J., WITTRUP, K. D., and RAINES, R. T. (1999). A small-molecule catalyst of protein folding in vitro and in vivo. Chem. Biol. 6, 871–879. WINTER, J., LILIE, H., and RUDOLPH, R. (2002). Recombinant expression and in vitro folding of proinsulin are stimulated by the synthetic dithiol Vectrase-P. FEMS Microbiol. Lett. 213, 225–230. GOUGH, J. D., WILLIAMS R. H. Jr., DONOFRIO, A. E., and LEES, W. J. (2002). Folding Disulfide-Containing Proteins Faster with an Aromatic Thiol. J. Am. Chem. Soc. 124, 3885–3892. GOUGH, J. D., GARGANO, J. M., DONOFRIO, A. E., and LEES, W. J. (2003) Aromatic Thiol pKa Effects on the Folding Rate of a Disulfide Containing Protein. Biochemistry 42, 11787–11797. LIGHT, A., and HIGAKI, J. N. (1987). Detection of Intermediate Species in the Refolding of Bovine Trypsinogen. Biochemistry 26, 5556–5564. BUCHNER, J., and RUDOLPH, R. (1991). Renaturation, Purification, and Characterization of Recombinant Fab -Fragments Produced in Escherichia coli. Bio/Technology 9, 157–162. HONDA, J., ANDOU, H., MANNEN, T., and SUGIMOTO, S. (2000). Direct Refolding of Recombinant Human Growth Differentiation Factor 5 for Large-Scale Production Process. J. Biosci. Bioeng. 89, 582–589.
76 LEVINE, A. D., RANGWALA, S. H., HORN, N. A., PEEL, M. A., MATTHEWS, B. K., LEIMGRUBER, R. M., MANNING, J. A., BISHOP, B. F., and OLINS, P. O. (1995). High Level Expression and Refolding of Mouse Interleukin 4 Synthesized in Escherichia coli. J. Biol. Chem. 270, 7445–7452. 77 WANG, J.-Y., KISHORE, U., and REID, K. B. M. (1995). A recombinant polypeptide, composed of the α-helical neck region and the carbohydrate recognition domain of conglutinin, self-associates to give a functionally intact homotrimer. FEBS Lett. 376, 6–10. 78 LOWTHER, W. T., MAJER, P., and DUNN, B. M. (1995). Engineering the substrate specificity of rhizopuspepsin: The role of Asp 77 of fungal aspartic proteinases in facilitating the cleavage of oligopeptide substrates with lysine in P1 . Protein Sci. 4, 689–702. 79 SCHREM, A., LANGE, C., and KOCH, K.-W. (1999). Identification of a Domain in Guanylyl Cyclase-activating Protein 1 that Interacts with a Complex of Guanylyl Cyclase and Tubulin in Photoreceptors. J. Biol. Chem. 274, 6244–6249. 80 ASANO, R., KUDO, T., MAKABE, K., TSUMOTO, K., and KUMAGAI, I. (2002). Antitumor activity of interleukin-21 prepared by novel refolding procedure from inclusion bodies expressed in Escherichia coli. FEBS Lett. 528, 70– 76. 81 UMETSU, M., TSUMOTO, K., HARA, M., ASHISH, K., GODA, S., ADSCHIRI, T., and KUMAGAI, I. (2003). How Additives Influence the Refolding of Immunoglobulin-folded Proteins in a Stepwise Dialysis System. J. Biol. Chem. 278, 8979–8978. 82 WINTER, J., LILIE, H., and RUDOLPH, R. (2002). Renaturation of human proinsulin – a study on refolding and conversion to insulin. Anal. Biochem. 310, 148–155. 83 TIMASHEFF, S. N., and ARAKAWA, T. (1989). Stabilization of protein structure by solvents. in: Protein structure. A practical approach. CREIGHTON T. E. (ed.). IRL Press, Oxford, UK. pp. 331ff.
31
32 Production of Recombinant Proteins by In Vitro Folding
84 TIMASHEFF, S. N. (2002). Protein Hydration, Thermodynamic Binding, and Preferential Hydration. Biochemistry 41, 13473–13482. 85 ORSINI, G., and GOLDBERG, M. E. (1978). The Renaturation of Reduced Chymotrypsinogen A in Guanidine HCl. Refolding versus Aggregation. J. Biol. Chem. 253, 3453–3458. 86 WETLAUFER, D. B., and XIE, Y. (1995) Control of aggregation in protein refolding: A variety of surfactants promote renaturation of carbonic anhydrase II. Protein Sci. 4, 1536–1543. 87 CLELAND, J. L., HEDGEPETH, C., and WANG, D. I. C. (1992). Polyethylene Glycol Enhanced Refolding of Bovine Carbonic Anhydrase B. J. Biol. Chem. 267, 13327–13334. 88 CLELAND, J. L., BUILDER, S. E., SWARTZ, J. R., WINKLER, M., CHANG, J. Y., and WANG, D. I. C. (1992). Polyethylene Glycol Enhanced Protein Refolding. Bio/Technology 10, 1013–1019. 89 LEE, L. L.-Y., and LEE, J. C. (1987). Thermal Stability of Proteins in the Presence of Poly(ethylene glycols). Biochemistry 26, 7813–7819. 90 TANDON, S., and HOROWITZ, P. (1988). The effects of lauryl maltoside on the reactivation of several enzymes after treatment with guanidinium chloride. Biochim. Biophys. Acta 955, 19–25. 91 ZARDENATA, G., HOROWITZ, P. M. (1994). Protein Refolding at High Concentrations Using Detergent/Phospholipid Mixtures. Anal. Biochem. 218, 392–398. 92 KRAUSE, M., RUDOLPH, R., and SCHWARZ, E. (2002). The non-ionic detergent Brij 58P mimics chaperone effects. FEBS Lett. 532, 253–255. 93 YANCEY, P. H., CLARK, M. E., HAND, S. C., BOWLUS, R. D., and SOMERO, G. N. (1982). Living with water stress: evolution of osmolyte systems. Science 24, 1214–1222. 94 BOLEN, D. W., and BASKAKOV, I. V. (2001). The osmophobic effect: natural selection of a thermodynamic force in protein folding. J. Mol. Biol. 310, 955–963. 95 SAWANO, H., KOUMOTO, Y., OHTA, K., SASAKI, Y., SEGAWA, S., and TACHIBANA,
96
97
98
99
100
101
102
103
104
H. (1992). Efficient in vitro folding of the three-disulfide derivatives of hen lysozyme in the presence of glycerol. FEBS Lett. 303, 11–14. MICHAELIS, U., RUDOLPH, R., JARSCH, M., KOPETZKI, E., BURTSCHER, H., and SCHUMACHER, G. (1995). Process for the production and renaturation of recombinant, biologically active, eukaryotic alkaline phosphatase. U.S. Patent 5,434,067. CHILSON, O. P., and CHILSON, A. E. (2003). Perturbation of folding and reassociation of lactate dehydrogenase by proline and trimethylamine oxide. Eur. J. Biochem. 270, 4823–4834. KUMAR, T. K. S., SAMUEL, D., JAYARAMAN, G., SRIMATHI, T., and YU, C. (1998). The role of proline in the prevention of aggregation during protein folding in vitro. Biochem. Mol. Biol. Int. 46, 509–517. SHIRAKI, K., KUDOU, M., FUJIWARA, S., IMANAKA, T., and TAKAGI, M. (2002). Biophysical Effect of Amino Acids on the Prevention of Protein Aggregation. J. Biochem. (Tokyo) 132, 591–595. BUCHNER, J., PASTAN, I., and BRINKMANN, U. (1992). A Method for Increasing the Yield of Properly Folded Recombinant Fusion Proteins: Single-Chain Immunotoxins from Renaturation of Bacterial Inclusion Bodies. Anal. Biochem. 205, 263–270. BRINKMANN, U., BUCHNER, J., PASTAN, I. (1992). Independent domain folding of Pseudomonas exotoxin and single-chain immuno-toxins: influence of interdomain connections. Proc. Natl. Acad. Sci. USA 89, 3075–3079. ARORA, D., and KHANNA, N. (1996). Method for increasing the yield of properly folded recombinant human gamma interferon from inclusion bodies. J. Biotechnol. 52, 127–133. ONEDA, H., and INOUYE, K. (1999). Refolding and recovery of recombinant human matrix metalloproteinase 7 (matrilysin) from inclusion bodies expressed by Escherichia coli. J. Biochem. 126, 905–911. TANEJA, S., and AHMAD, F. (1994). Increased thermal stability of proteins in
References
105
106
107
108
109
110
111
112
113
114
115
the presence of amino acids. Biochem. J. 303, 147–153. ARAKAWA, T., and TSUMOTO, K. (2003). The effects of arginine on refolding of aggregated proteins: not facilitate refolding, but suppress aggregation. Biochem. Biophys. Res. Comm. 304, 148–152. KITA, Y., ARAKAWA, T., LIN, T.-Y., and TIMASHEFF, S. N. (1994). Contribution of the Surface Free Energy Perturbation to Protein-Solvent Interactions. Biochemistry 33, 15178–15189. RARIY, R. V., and KLIBANOV, A. M. (1997). Correct protein folding in glycerol. Proc. Natl. Acad. Sci. USA 94, 13520–13523. RARIY, R. V., and KLIBANOV, A. M. (1999). Protein Refolding in Predominantly Organic Media Markedly Enhanced by Common Salts. Biotechnol. Bioeng. 62, 704–710. SUMMERS, C. A., and FLOWERS R. A. II (2000). Protein renaturation by the liquid organic salt ethylammonium nitrate. Protein Sci. 9, 2001–2008. WELTON, T. (1999). Room-temperature ionic liquids. Solvents for synthesis and catalysis. Chem. Rev. 99, 2071–2083. KRAGL, U., ECKSTEIN, M., and KAFTZIK, N. (2002). Enzyme catalysis in ionic liquids. Curr. Opin. Biotechnol. 13, 565–571. van RANTWIJK, F., MADEIRA LAU, R., and SHELDON, R. A. (2003). Biocatalytic transformations in ionic liquids. Trends Biotechnol. 21, 131–138. PRICE, P. A., STEIN, W. H., and MOORE, S. (1969). Effect of Divalent Cations on the Reduction and Reformation of the Disulfide Bonds of Deoxyribonuclease. J. Biol. Chem. 244, 929–932. SMITH, A. T., SANTAMA, N., DACEY, S., EDWARDS, M., BRAY, R. C., THORNELEY, R. N. F., and BURKE, J. F. (1990). Expression of a Synthetic Gene for Horseradish Peroxidase C in Escherichia coli and Folding and Activation of the Recombinant Enzyme with Ca2+ and Heme. J. Biol. Chem. 265, 13335–13343. WHITWAM, R. E., GAZARIAN, I. G., and TIEN, M. (1995). Expression of Fungal Mn Peroxidase in E. coli and Refolding to Yield Active Enzyme. Biochem. Biophys. Res. Comm. 216, 1013–1017.
116 DOYLE, W. A., and SMITH, A. T. (1996). Expression of lignin peroxidase H8 in Escherichia coli: folding and activation of the recombinant enzyme with Ca2+ and Haem. Biochem. J. 315, 15–19. 117 RUDOLPH, R., GERSCHITZ, J., and JAENICKE, R. (1978). Effect of Zinc(II) on the Refolding and Reactivation of Liver Alcohol Dehydrogenase. Eur. J. Biochem. 87, 601–606. 118 RUDOLPH, R., HEIDER, I., and JAENICKE, R. (1977). Mechanism of Reactivation and Refolding of Glyceraldehyde-3-Phosphate Dehydrogenase form Yeast after Denaturation and Dissociation. Eur. J. Biochem. 81, 563–570. 119 BELL, S., HANSEN, S., and BUCHNER, J. (2002). Refolding and structural characterization of the human p53 tumor suppressor protein. Biophys. Chem. 96, 243–257. 120 WANG, C.-C., and TSOU, C.-L. (1998). Enzymes as chaperones and chaperones as enzymes. FEBS Lett. 425, 382–384. 121 SCHIENE, C., and FISCHER, G. (2000). Enzymes that catalyse the restructuring of proteins. Curr. Opin. Struct. Biol. 10, 40–45. 122 WALTER, S., and BUCHNER, J. (2002). Molecular Chaperones – Cellular Machines for Protein Folding. Angew. Chem. Int. Ed. 41, 1098–1113. 123 BUCHNER, J., BRINKMANN, U., and PASTAN, I. (1992). Renaturation of a Single-Chain Immunotoxin Facilitated by Chaperones and Protein Disulfide Isomerase. Bio/Technology 10, 682– 685. 124 ALTAMIRANO, M. M., GOLBIK, R., ZAHN, R., BUCKLE, A. M., and FERSHT, A. R. (1997) Refolding chromatography with immobilized mini-chaperones. Proc. Natl. Acad. Sci. USA 94, 3576–3578. 125 ALTAMIRANO, M. M., GARC´IA C., POSSANI, L. D., and FERSHT, A. R. (1999). Oxidative refolding chromatography: folding of the scorpion toxin Cn5. Nature Biotechnol. 17, 187–191. 126 COOPER, A. (1992). Effect of Cyclodextrins on the Thermal Stability of Globular Proteins. J. Am. Chem. Soc. 114, 9208–9209.
33
34 Production of Recombinant Proteins by In Vitro Folding
127 SHARMA, L., and SHARMA, A. (2001). Influence of cyclodextrin ring substituents on folding-related aggregation of bovine carbonic anhydrase. Eur. J. Biochem. 268, 2456–2463. 128 ROZEMA, D., and GELLMAN, S. H. (1996). Artificial Chaperone-assisted Refolding of Carbonic Anhydrase B. J. Biol. Chem. 271, 3478–3487. 129 ROZEMA, D., and GELLMAN, S. H. (1996). Artificial Chaperone-Assisted Refolding of Lysozyme: Competition between Renaturation and Aggregation. Biochemistry 35, 15760–15771. 130 DAUGHERTY, D. L., ROZEMA, D., HANSON, P. E., and GELLMAN, S. H. (1998) Artificial Chaperone-assisted Refolding of Citrate Synthase. J. Biol. Chem. 273, 33961–33971. 131 HANSON, P. E., and GELLMAN, S. E. (1998). Mechanistic comparison of artificial-chaperone-assisted and unassisted refolding of urea-denatured carbonic anhydrase B. Folding & Design 3, 457–468. 132 MANNEN, T., YAMAGUCHI, S., HONDA, J., SUGIMOTO, S., and NAGAMUNE, T. (2001). Expanded-Bed Protein Refolding Using a Solid-Phase Artificial Chaperone. J. Biosci. Bioeng. 91, 403–408. 133 SCHADE, B. C., RUDOLPH, R., L¨UDEMANN, H.-D., and JAENICKE, R. (1980). Reversible High-Pressure Dissociation of Lactic Dehydrogenase from Pig Muscle. Biochemistry 19, 1121–1126. 134 HILLSON, N., ONUCHIC, J. N., and GARC´IA A. E. (1999). Pressure-induced protein-folding/unfolding kinetics. Proc. Natl. Acad. Sci. USA 96, 14848–14853. 135 GOROVITS, B. M., and HOROWITZ, P. M. (1998). High Hydrostatic Pressure Can Reverse Aggregation of Protein Folding Intermediates and Facilitate Acquisition of Native Structure. Biochemistry 37, 6132–6135. 136 FOGUEL, D., ROBINSON, C. R., CAETANO DE SOUSA, P. Jr., SILVA, J. L., and ROBINSON, A. S. (1999). Hydrostatic Pressure Rescues Native Protein from Aggregates. Biotechnol. Bioeng. 63, 552–558. 137 ST. JOHN, R. J., CARPENTER, J. F., and RANDOLPH, T. W. (1998) High pressure
138
139
140
141
142
143
144
145
fosters protein refolding from aggregates at high concentrations. Proc. Natl. Acad. Sci. USA 23, 13029–13033. ST. JOHN, R. J., CARPENTER, J. F., and RANDOLPH, T. W. (2002) High-Pressure Refolding of Disulfide-Cross-Linked Lysozyme Aggregates: Thermodynamics and Optimization. Biotechnol. Prog. 18, 565–571. MEERSMAN, F., and HEREMANS, K. (2003). High pressure induces the formation of aggregation-prone states of proteins under reducing conditions. Biophys. Chem. 104, 297–304. XIE, Y., and WETLAUFER, D. B. (1996) Control of aggregation in protein refolding: The temperature-leap tactic. Protein Sci. 5, 517–523. LAJMI, A. R., WALLACE, T. R., and SHIN, J. A. (2000). Short, Hydrophobic, Alanine-Based Proteins Based on the Basic Region/Leucine Zipper Protein Motif: Overcoming Inclusion Body Formation and Protein Aggregation during Overexpression, Purification, and Renaturation. Protein Expr. Purif. 18, 394–403. HIROSE, M., AKUTA, T., and TAKAHASHI, N. (1989). Renaturation of Ovotransferrin Under Two-step Conditions Allowing Primary Folding of the Fully Reduced Form and the Subsequent Regeneration of the Intramolecular Disulfides. J. Biol. Chem. 264, 16867–16872. RUDOLPH, R., ZETTLMEISSL, G., and JAENICKE, R. (1979). Reconstitution of Lactic Dehydrogenase. Noncovalent Aggregation vs. Reactivation. 2. Reactivation of Irreversibly Denatured Aggregates. Biochemistry 18, 5572–5575. WILLIAMSON, R. A., NATALIA, D., GEE, C. K., MURPHY, G., CARR, M. D., and FREEDMAN, R. B. (1996). Chemically and conformationally authentic active domain of human tissue inhibitor of metalloproteinases-2 refolded from bacterial inclusion bodies. Eur. J. Biochem. 241, 476–483. CREIGHTON, T. E. (1985). Folding of proteins adsorbed reversibly to ion-exchange resins. UCLA Symp. Mol. Cell. Biol. 39, 249–258.
References 146 CREIGHTON, T. E. (1990). Process for the production of a protein. U.S. Patent 4,977,248. 147 GENG, X., and CHANG, X. (1992). High-performance hydrophobic interaction chromatography as a tool for protein refolding. J. Chromatogr. 599, 185–194. 148 WALTER, C., H¨ONER ZU BENTRUP, K., and SCHNEIDER, E. (1992). Large Scale Purification, Nucleotide Binding Properties, and ATPase Activity of the MalK Subunit of Salmonella typhimurium Maltose Transport Complex. J. Biol. Chem. 267, 8863–8869. 149 STEMPFER, G., H¨OLL-NEUGEBAUER, B., and RUDOLPH, R. (1996). Improved Refolding of an Immobilized Fusion Protein. Nature Biotechnol. 14, 329–334. 150 BASSUK, J. A., BRAUN, L. P., MOTAMED, K., BANEYX, F., and SAGE, H. E. (1996). Renaturation of SPARC Expressed in Escherichia coli Requires Isomerization of Disulfide Bonds for Recovery of Biological Activity. Int. J. Cell Biol. 28, 1031–1043. 151 LINA˜RES, D., ECHEVARRIA, I., and MANA˜ P. (2004). Single-step purification and refolding of recombinant mouse and human myelin oligoden-drocyte glycoprotein and induction of EAE in mice. Protein Expr. Purif. 34, 249–256. 152 FRANKEN, K. L. M. C., HIEMSTRA, H. S., van MEIJGAARDEN, K. E., SUBRONTO, Y., DEN HARTIGH, J., OTTENHOFF, T. H. M., and DRIJFHOUT, J. W. (2000). Purification of His-Tagged Proteins by Immobilized Chelate Affinity Chromatography: The Benefits from the Use of Organic Solvents. Protein Expr. Purif. 18, 95–99. 153 ROGL, H., KOSEMUND, K., K¨UHLBRANDT, W., and COLLINSON, I. (1998) Refolding of Escherichia coli produced membrane protein inclusion bodies immobilised by nickel chelating chromatography. FEBS Lett. 432, 21–26. 154 BERDICHEVSKY, Y., LAMED, R., FRENKEL, D., GOPHNA, U., BAYER, E. A., YARON, S., SHOHAM, Y., and BENHAR, I. (1999). Matrix-Assisted Refolding of Single-Chain Fv-Cellulose Binding Domain Fusion Proteins. Protein Expr. Purif. 17, 249–259.
155 SHIMIZU, H., FUJIMOTO, K., and KAWAGUCHI, H. (2000). Improved refolding of denatured/reduced lysozyme using disulfide-carrying polymeric microspheres. Colloids Surfaces B: Biointerfaces 18, 137– 144. 156 SHALONGO, W., LEDGER, R., JAGANNADHAM, M. V., and STELLWAGEN, E. (1987). Refolding of denatured thioredoxin observed by size-exclusion chromatography. Biochemistry 26, 3135–3141. 157 SHALONGO, W., JAGANNADHAM, M. V., FLYNN, C., and STELLWAGEN, E. (1989). Refolding of denatured ribonuclease observed by size-exclusion chromatography. Biochemistry 28, 4820–4825. 158 WERNER, M. H., CLORE, G. M., GRONENBORN, A. M., KONDOH, A., and FISHER, R. J. (1994). Refolding proteins by gel filtration chromatography. FEBS Lett. 345, 125–130. 159 GU, Z., SU, Z., and JANSON, J.-C. (2001). Urea gradient size-exclusion chromatography enhanced the yield of lysozyme refolding. J. Chromatogr. A 918, 311–318. 160 GU, Z., WEIDENHAUPT, M., IVANOVA, N., PAVLOV, M., XU, B., SU, Z.-G., and JANSON, J.-C. (2002). Chromatographic Methods for the Isolation of, and Refolding of Proteins from, Escherichia coli inclusion Bodies. Protein Expr. Purif. 25, 174–179. 161 GU, Z., ZHU, X., ZHOU, H., and SU, Z. (2003) Inhibition of aggregation by media selection, sample loading and elution in size exclusion chromatographic refolding of denatured bovine carbonic anhydrase B. J. Biochem. Biophys. Methods 56, 165–175. 162 TSUMOTO, K., UMETSU, M., KUMAGAI, I., EJIMA, D., and ARAKAWA, T. (2003). Solubilization of active green fluorescent protein from insoluble particles by guanidine and arginine. Biochem. Biophys. Res. Comm. 312, 1381–1386. 163 BURGESS, R. R. (1996). Purification of Overproduced Escherichia coli RNA Polymerase s Factors by Solubilizing Inclusion Bodies and Refolding from
35
36 Production of Recombinant Proteins by In Vitro Folding
Sarkosyl. Methods Enzymol. 273, 145–149. 164 FRANKEL, S., SOHN, R., and LEINWAND, L. (1991). The use of sarkosyl in generating soluble protein after bacterial expression. Proc. Natl. Acad. Sci. USA 88, 1192–1196. 165 LOWE, P. A., MARSTON, F. A. O., SAROJANI, A., and SCHOEMAKER, J. A. (1994). Process for the recovery of recombinantly produced protein from
insoluble aggregate. U.S. Patent 5,340,926. 166 MCCOY, K. M., and FROST, R. A. (1991). Method for solubilization and naturation of somatotropin. U.S. Patent 5,064,943. 167 BRADFORD, M. M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248–254.
1
Protein Production in Escherichia coli Josef Altenbuchner, and Ralf Mattes University Stuttgart, Stuttgart, Germany
Originally published in: Production of Recombinant Proteins. Edited by Gerd Gellissen. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31036-4
1 Introduction
The Gram-negative bacterium Escherichia coli was not only the first microorganism to be subjected to detailed genetic and molecular biological analysis, but also the first to be employed for genetic engineering and recombinant protein production. Our knowledge of its genetics, molecular biology, growth, evolution and genome structure has grown enormously since the first compilation of a linkage map in 1964. Its current status is reviewed in the standard reference “Escherichia coli and Salmonella” [1]), now available as EcoSal online (www.asmpress.org http://www.ecosal.org/). From a model organism for laboratory-based basic research, E. coli has evolved into an industrial microorganism, and is now the most frequently used prokaryotic expression system. It has become the standard organism for the production of enzymes for diagnostic use and for analytical purposes, and is even used for the synthesis of proteins of pharmaceutical interest, provided that the desired product does not consist of different multiple subunits or require substantial post-translational modification. A huge body of knowledge and experience in fermentation and high-level production of proteins has grown up during the past 40 years. Many strains are available which are adapted for the production of proteins in the cytoplasm or periplasm, and hundreds of expression vectors with differently regulated promoters and tags for efficient protein purification have been constructed. Nevertheless, high-level gene expression and fermentation of recombinant strains cannot be regarded as a routine task. Due to the unique structural features of individual genes and their products, optimization of E. coli-based processes is quite often tedious, time-consuming and costly. Major drawbacks include the instability of vectors (especially during largescale fermentation), inefficient translation initiation and elongation, instability of Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Production in Escherichia coli
mRNA, toxicity of gene products, and instability, heterogeneity, inappropriate folding and consequent inactivity of protein products. This article describes some of the main features of E. coli as a host cell, and focuses on some of the problems mentioned above and recent advances in our attempts to overcome them.
2 Strains, Genome, and Cultivation
Following the first description of E. coli by T. Escherich in 1885, a line of Escherichia coli K12 was isolated in 1922 and deposited as “K-12” at Stanford University in 1925. In the early 1940s, E.L. Tatum began his work on bacteria, and isolated the first auxotrophic mutants. The nomenclature used to designate loci, mutation sites, plasmids and episomes, sex factors, phenotypic traits and bacterial strains developed over time, and was codified by Demerec et al. [2]. A useful reference for the terminology, with a compilation of (older) alternate symbols, is given by Berlyn [3]. This last compilation of the traditional linkage map was complemented by the appearance of the corresponding physical map [4] A pedigree and description of various standard strains, including the E. coli B strain, commonly used in the laboratories all over the world, was first published in 1972 [5]. Salient features of the most relevant and popular strains used today are summarized in the Appendix of this article. The first complete genome sequence of an E. coli strain was established by [6] for the K-12 strain MG1655 (4,639,221 bp). Sequencing of the closely related strain W3110 has not yet been completed, but about 2 Mb are available for comparison. Based on this comparison, the rate of nucleotide changes was estimated to be less than 10−7 per site per year [7]. Thus, the degree of difference between the two strains is remarkably low in light of their differing histories. According to laboratory records, these sub-strains originated from W1485 approximately 40 years ago [5]. To elucidate the differences between the reference genome MG1655 and currently employed strains, an alternative method for whole-genome sequencing has been applied. Using whole-genome arrays as a tool to identify deletions, this study revealed the exact nature of three previously unresolved deletions in a particular selected strain, and a fourth deletion which was completely unknown [8]. Ongoing sequencing efforts have also permitted genomic sequence comparisons with various other species and relatives of E. coli. This work has resulted in an initial view of the evolutionary forces that shape bacterial genomes. The comparisons revealed a startling pattern in which hundreds of strain-specific “islands” are found inserted in a common “backbone” that is highly conserved (98% sequence identity) between strains. Gene loss and horizontal gene transfer have been the major genetic processes that shaped the ancestral E. coli genome, resulting in a spectrum of divergent present-day strains, which possess very different arrays of genes [9]. Clearly, E. coli today possesses many dispensable functions and genes that are not required for laboratory or technical purposes. Transposable elements and cryptic
2 Strains, Genome, and Cultivation
Fig. 1 Localization of deletions MD1 through MD12 on the circular map of the MG1655 genome. The replication origin (oriC) and terminus (ter) are indicated. Redrawn after Kolisnychenko et al. [11]).
prophages may even compromise genome stability in industrial strains. Consequently, our knowledge of the genome has led to attempts to create new strains using precise genome engineering tools [10]. The first result of such approaches was the generation of the E. coli strain MDS12, which was the outcome of 12 rounds of deletion formation. This resulted in a genome of 4.263 Mb, equivalent to an 8.1% reduction in size, a 9.3% reduction in gene count, and the elimination of 24 of the 44 transposable elements (Figure 1) [11]). Partial characterization of MDS12 revealed only a few phenotypic changes compared to the parental strain. Growth characteristics and transformability were essentially identical to those of MG1655. This strategy opens up new opportunities for the design of strains with favorable characteristics. Cultivation of E. coli was first established on a laboratory scale in academic laboratories. Differences in behavior observed in various strains and mutants were exploited to develop special genetic approaches to create strains adapted to certain conditions. The more recent evolution of E. coli into an industrial microorganism has entailed much work in basic research and development to establish today’s standards of high-cell-density (HCD) cultivation (for reviews, see [12–14]). As various currently popular expression plasmids (see below) require special mutant strains as hosts, fermentation protocols have been developed for strains of E. coli, which differ considerably from each other. Detailed descriptions are available in particular for E. coli K-12 W3110, HB101 and E. coli B BL21 (and its mutants) [1, 15, 16]. One of the major obstacles during the development of HCD cultivation has been the propensity of E. coli for acetate formation and the resulting inhibition of growth [18]. The basis for this tendency has been elucidated in greater detail [19, 20], and several approaches are now available to combat it. For example, the use of
3
4 Protein Production in Escherichia coli
methyl-α-glucoside as an alternative carbon source [21] allows one to bypass this problem. Finally, metabolic engineering using the acetolactate synthetase from Bacillus subtilis [5] or feed-back control of glucose feeding have recently become standard tools [1].
3 Expression Vectors
An expression vector usually contains an origin of replication (ori), an antibiotic resistance marker, and an expression cassette for regulated transcription and translation of a target gene. Additional features might include plasmid stability functions, and genes or DNA structures for mobilization and transfer to other strains. The most frequently used vector system is derived from the ColE1-like plasmid pMB1 [23]. 3.1 Replication of pMB1-derived Vectors
The plasmid pMB1 requires RNA polymerase and DNA polymerase I for replication. Replication is controlled by an antisense RNA (RNA I) that binds to the precursor of the primer RNA (RNA II), thus inhibiting RNase H-mediated primer maturation. A small protein called Rop encoded by the plasmid binds to the complex of RNA I and RNA II and stabilizes it [24]. Deletion of the rop gene, as in the case of the pUC series of plasmids [25], increases the plasmid copy number in cells from about 50 copies to 150 copies. Increases in copy number have also been observed during overproduction of recombinant proteins. In such cases, excessive consumption of amino acids obviously leads to the accumulation of uncharged tRNAs. Due to sequence homologies between these tRNAs and RNA I [26], the interaction between RNA I and RNA II is impaired, and this results in the amplification of plasmid DNA. The resulting increase in copy number (up to 250 plasmid copies per cell and more) further raises the dosage of the recombinant gene. This may eventually exhaust the cell’s metabolic capacity and in turn cause a breakdown of protein synthesis. Strong expression systems, such as the T7 system, are known to be susceptible to this phenomenon. Recently, a new ColE1-type vector has been constructed in which the region of RNA I that most probably interacts with uncharged tRNAs have been altered (Figure 2). Indeed, with this new plasmid, the copy number remained constant during high-level protein synthesis [27]. 3.2 Plasmid Partitioning
Many E. coli strains that have been selected for high transformation rates are characterized by low biomass formation during HCD fermentations [13]. In more robust strains like E. coli W3110, very high cell densities can be obtained; however,
3 Expression Vectors
Fig. 2 Stem–loop structure of RNA I, the antisense repressor of the primer RNA II. RNA I half-life is determined by the indicated cleavage site for RNase E. Uncharged tRNAs are believed to interact with RNA I
or II, leading to deterioration of the controlling complex and resulting in hyperproliferation of plasmid DNA. Changes in loop 2 (pEARL1) result in stabilizing the copy number of ColE1-derived vectors. [27]
ColE1-derived expression vectors tend to be unstable [28]. Most of this instability can be attributed to the recA+ status of the host strain. Plasmids present in high copy numbers are generally subject to homologous recombination in rec-proficient strains. This converts them into head-to-tail dimeric plasmids. The dimers are either resolved into monomers again, or become substrates for further multimerization. Since the probability of replication of ColE1-type plasmids is assumed to increase in proportion to the number of ori sequences per molecule, dimers will replicate twice as often as monomers and finally dominate the plasmid population. The plasmid multimers disrupt control circuits, eventually resulting in copy number depression (the dimer catastrophe) [29]. In systems which do not have their own partition mechanism and are randomly distributed to the daughter cells, like the ColE1 plasmids, a lower copy number results in higher plasmid instability. This means that multimers accumulate clonally and create a sub-population of cells that show higher rates of plasmid loss. To avoid plasmid loss, ColE1 carries the cer recognition sequence for the chromosomally encoded, site-specific recombinase XerCD which, together with some accessory proteins, efficiently resolves chromosomal or plasmid dimers into monomers [30]. In addition, a promoter located inside the 240-bp cer region directs the synthesis of a 95-nt transcript, Rcd, which is assumed to delay the division of multimer-containing cells [31]. Most of the ColE1-type cloning and expression vectors lack a functional cer sequence. This does not affect recA-deficient strains, in which multimerization is not possible. However, in Rec+ strains like W3110, this can have dramatic effects–as demonstrated in the following example. In case of a recombinant W3110 strain with an L-rhamnose-inducible expression vector, more than 50% of the cells were found to have lost the plasmid at the end of a fed-batch fermentation without antibiotic selection pressure when a construct without the cer sequence was used. In contrast, more than 90% of the cells retained the corresponding cer-bearing plasmid. All the plasmids without cer isolated from
5
6 Protein Production in Escherichia coli
this biomass were multimeric, whereas more than 90% of the plasmids with cer were monomers [28]. Another means of stabilizing plasmids lies in the use of post-segregational killing systems or addiction modules, which are frequently found on low-copy number plasmids, and even on chromosomes. This system of plasmid maintenance, exemplified by the parB stability locus of plasmid R1, consists of two genes, hok (host killing) and sok (suppressor of host killing). The hok gene encodes a highly toxic protein, while sok specifies an antisense RNA which is complementary to the hok mRNA and prevents its translation. The hok mRNA is stable, the sok mRNA unstable. In case of plasmid loss, the sok product is degraded more rapidly than the hok mRNA, leading to production of the toxin which eventually kills the plasmid-free cell [32]. Overall, this system ensures that no plasmid-free cells arise, but it does not stabilize the plasmid. The use of such addiction modules in unstable expression systems might therefore lead to slow growth and even impose an additional metabolic burden on the cells. If a promoter is not tightly regulated and/or gene products are detrimental to the cell, plasmids that are maintained in lower copy numbers may help to minimize the metabolic stress, especially when strong promoters are used. Frequently used ColE1-type plasmids of lower copy number are derived from p15A–for example, the plasmids pACYC177 and pACYC184 [31]. For gene expression in different species, broad-host-range vectors based on RSF1010 with moderate copy numbers and a different type of replication control can be used [34]. Other vectors with even lower copy numbers include derivatives of pSC101 [35], RK2 [36] or the F-plasmid [37]. These plasmids are also mutually compatible, and may also be useful for the expression of several genes in a single cell. If two or more genes have to be expressed in a single cell, the genes can be inserted into the plasmid in a tandem arrangement downstream of the promoter. If the translation efficiency of the selected genes or the activities of the encoded enzymes are unbalanced, the easiest way to ensure optimally balanced production is to use vectors with different copy numbers carrying similar expression modules. For selection, the plasmids must harbor different antibiotic resistance genes. For example, for enantioselective production of amino acids from racemic hydantoins, a hydantoinase, a carbamoylase and a racemase which differed by up to tenfold in specific activities had to be produced in the same cell [38]. The various genes in question were introduced into an L-rhamnose-inducible expression cassette present in derivatives of pACYC184, pSC101, and pBR322. Various combinations of these plasmids were introduced into recipient strains, and whole-cell reactors with an optimal reaction cascade were eventually developed. 3.3 Genome Engineering
In many cases, chromosomal integration of target genes may be preferable to the use of plasmids, especially when very strong promoters are used to compensate for low gene dosage. For E. coli, several integration systems are now available [39]. For
3 Expression Vectors
example, the vector pKO3 may be used for integration via homologous recombination in rec-proficient strains. The vector is temperature-sensitive in its replication, and contains an antibiotic-resistance gene and a sacB gene encoding levan sucrase. An expression cassette and the gene of interest, flanked by chromosomal targeting DNA, are integrated into the pKO3 plasmid and introduced into the E. coli strain. Selection for the antibiotic resistance during growth at a nonpermissive temperature leads to cells with integrated plasmids. In a second step, the cells are selected for loss of the vector sequences by growth on sucrose, which is lethal in the presence of the levan sucrase. Half of the plasmid-free colonies should have the gene of interest stably integrated together with the expression cassette, but with no further vector sequences [40]. Another rec-independent integration system is based on λ site-specific recombination. The expression cassette together with the target gene is inserted into a plasmid containing the λ attachment site attP. The replication region of this plasmid is removed by restriction digestion, and after re-ligation, the fragment is introduced into an E. coli strain carrying the λ integrase gene on a temperature-sensitive plasmid. The DNA circle is integrated into the chromosome attB site, and then the helper plasmid is removed by growth at a nonpermissive temperature [41]. Finally, a novel way to engineer DNA in E. coli and to integrate DNA into the chromosome or into plasmids independently of restriction sites and recA employs the recET system. E. coli strains that express recET, due to a mutation in sbcA or because recET is placed under the control of another promoter, are able to take up linear PCR fragments and integrate them into the chromosome if the fragments are flanked by short sequences (40–60 bp) homologous to a chromosomal target region. This means that genes of interest can be stably integrated into any region of the chromosome, or into low- or high-copy number vectors, and brought under the control of the regulatory system present at the integration site without having to use any restriction sites [10]. The antibiotic resistance genes which have to be used for selection of integration can subsequently be removed, for example by site-specific integrases. Furthermore, this method may also be very useful for engineering host strains for increased production of recombinant proteins; for example, by targeted inactivation of protease or RNase genes (compare the construction of MDS12 in Figure 1; Kolisnychenko et al., [11]). 3.4 E. coli Promoters
Promoters are DNA sequences which direct RNA polymerase binding and transcription initiation. They usually consist of the two −10 and −35 hexameric sequences, separated by a spacer of 16–19 bp. The sigma subunit confers promoter specificity on RNA polymerase [38]. E. coli has seven different sigma factors and, accordingly, seven different types of promoter. The most widely used promoter type is recognized by the sigma 70 factor. The initiation of transcription can be divided into four major steps [43]: (i) recognition of the promoter sequences by the RNA polymerase holoenzyme; (ii) isomerization of the initial complex into a conformation capable of initiation; (iii) initiation of RNA synthesis; and (iv) transition
7
8 Protein Production in Escherichia coli
to an elongation complex and promoter clearance. The initial contact between RNA polymerase and promoter results in an open complex in which the DNA strands are separated in the region flanking the start site of RNA synthesis, referred as the +1 position. In the open complex the RNA polymerase covers the region from −50 and +20 [44]. Strong promoters of the sigma-70 type have motifs in the −35 and −10 regions that are most similar to the consensus sequences TTGACA and TAATAT, respectively. Another important feature is the spacing between the −10 and −35 regions; 17 bp is the optimal length. The nucleotide sequence of the spacer itself is of minor importance. Other regions that influence promoter strength are an AT-rich region upstream of the −35 sequence around position −43, and the region +1 to +20, which seem to participate in promoter recognition and promoter clearance, respectively. Each of the four steps in transcription initiation can be rate-limiting, which means that promoters of similar strength can have quite different sequences depending upon which steps are optimized. Actually, most promoters found in E. coli differ considerably from the hexameric consensus sequences. One obvious reason for these differences is that gene products are needed in quite different amounts. Even more importantly, promoters often overlap with regulatory sequences. Two or more promoters, which may be recognized by either the same or different sigma factors, may be arranged in tandem to allow the cell to respond to specific signals, as well as to the physiological condition of the whole cell. Many efforts have been made in the past to adapt natural promoters for use in expression vectors, with the aim of generating optimal elements that combine high efficiency and tight regulation, thereby promoting maximal protein production and avoiding plasmid instability. A completely different type of promoter architecture has been found in some lytic phages, such as phages T3, T5 or T7, and SP6. These phages encode their own RNA polymerases, which are much simpler in structure than the host enzyme. They are highly processive and recognize conserved sequences covering a region between positions −17 and +6 bp relative to the mRNA start site. These are the strongest promoters described for microorganisms so far. They have become very popular for use in expression vectors and in-vitro transcription when coupled with regulatory sequences from natural E. coli promoters [45, 46].
4 Regulation of Gene Expression
Constitutive heterologous gene expression that results in product yields equivalent to about 30% of total cell protein will obviously lead to high genetic instability. Therefore, promoters employed in expression vectors must be very tightly regulated during bacterial growth, and be switched on only when the cells have reached a high cell density. Many different transcriptional regulatory mechanisms are found in nature. Binding of a regulatory protein to a promoter is probably the most common principle, but there are other control mechanisms, such as transcriptional
4 Regulation of Gene Expression
attenuation, anti-sense RNA, anti-termination, changes in sigma factors and antisigma-factors. The conditions which may lead to changes in promoter activity are countless. Arbitrary examples are changes in the availability of carbon, nitrogen, phosphate and other mineral sources, growth temperature, pH, oxygen supply, osmolarity and mutagenic conditions [47]. Many of these options have been assessed for use in expression systems. In practice, however, regulation of promoter activity by regulatory proteins in response to carbon sources or to growth temperature is most often used. In principle, DNA-binding proteins can regulate promoter activity in two different ways. In negatively controlled systems a repressor protein binds in or just downstream to the promoter region and directly inhibits transcription. Positively regulated promoters either exhibit sub-optimal spacing of the −10 and −35 hexameric sequences, or the −35 sequence is quite different from the ideal consensus sequence of strong promoters. In these cases, activator proteins are necessary to bind the RNA polymerase. Both negatively and positively regulated promoters can be controlled either by induction or repression. In negatively controlled inducible systems, an effector molecule binds to the repressor and inhibits its binding to the operator sequence, the binding site of the repressor. In positively controlled inducible systems, activators only bind to their target in the presence of effector molecules. Thus, in the case of mercury resistance genes, the activator MerR is already bound to the operator in the absence of mercury ions, but is only rendered active when Hg2+ binds to it [48]. In systems with underlying repression, the situation is exactly the opposite: the inactive repressor becomes active in the presence of the effector and binds to its operator, and the activator is inactivated by the effector. 4.1 Negative Control
Many promoters—especially those of operons involved in carbohydrate catabolism–are both negatively and positively controlled. The best known example is the promoter of the E. coli lac-operon for consumption of lactose. The lactose repressor LacI binds downstream of the lac promoter (position +1 to +21) in the absence of lactose, and inhibits transcription of the genes lacZYA. In the presence of allolactose, which is synthesized from lactose by the β-galactosidase LacZ, or following addition of the synthetic inducer isopropyl-thiogalactopyranoside (IPTG), the repressor loses its affinity for the operator. In addition, transcription is positively controlled by the catabolite activator protein. Efficient transcription is only possible in the presence of the activator complex CAP-cAMP, which binds upstream of the −35 region (around position −65). Only in the absence of glucose is the cAMP level in the cell high enough to allow lac transcription, leading to a preference for glucose and a diauxic growth pattern when both carbohydrates are added simultaneously to the cells (an additional effect is exclusion of the inducer lactose). Derivatives of the lac promoter are still among the most frequently used promoters in E. coli expression vectors. A marked improvement in the lac promoter was achieved by fusing the −35 region of the promoter of the tryptophan operon (trp) with the −10 region
9
10 Protein Production in Escherichia coli
of the lac promoter (actually the already improved lacUV5 promoter was used). In the new tac promoter, the spacer between the −10 and −35 regions was 16 bp long, and in the trc promoter 17 bp [49]. These two promoters are about tenfold more efficient in transcription initiation compared to the wild-type promoter, especially on multicopy plasmids. They also enable production of recombinant proteins in large quantities, independently of catabolite activation. These promoters serve as good examples for the problems that one may encounter when engineering negatively regulated promoters. The chromosomal lacI gene gives rise to only a very few lac repressor molecules (on average about 10). If the lac promoter–operator sequences are inserted into ColE1-type multicopy plasmids with 40 or more copies per chromosome, most of the lac operators will not be occupied by repressor molecules. This results in constitutive transcription from these promoters. Furthermore, there are two additional lac operators (also called pseudo-operators), one at the 3 -end of lacI upstream of the lac promoter and another downstream, inside the lacZ coding sequence. Only the presence of all three operators with the correct spacing provides for full repression of the promoter [50]. Many attempts have been made to increase the repression of lac promoter derivatives. The first step was the isolation of the lacIq mutation, a promoter-up mutation of the lacI gene which increased production of LacI by tenfold [51]. This gene is either provided on an F’ plasmid or by a derivative of phage 80. This approach restricts the use of lac promoter-based expression vectors to particular E. coli strains. Other vectors carry the lacI gene or even the lacI q gene, and are less dependent on host genes [52]. These strains with high repressor content provided by the plasmid are no longer fully inducible with the cheap but weak inducer lactose, but are still fully inducible with the nonhydrolyzable IPTG. On the other hand, this compound is not recommended for production of therapeutic proteins due to its toxicity and cost. An alternative strategy involves the use of a thermo-sensitive lac repressor. Here, the system is inducible by a shift in the growth temperature from 30◦ C to 42 ◦ C. Using the lacIts gene on the vector, there is still a high basal level of expression, whereas with a lacIqts gene on the vector protein production is fivefold less [53, 54]. Furthermore, an increase in growth temperature favors inclusion body formation and induces heat-shock proteases. Finally, it has been demonstrated that the level of basal expression can be altered by changing the position of the operator within the promoter region [55]. Repression was strongly increased when the lac operator was positioned between the −10 and −35 hexameric sequences instead of its original position downstream of the −10 sequence or upstream of the −35 sequence. The lac regulatory system is also used in another very efficient expression system. The pET vectors contain the very strong T7 late promoter, which is transcribed by the highly processive T7 RNA polymerase. The RNA polymerase is supplied in trans, either by infecting the host with a T7 phage–a procedure which is not practicable for large-scale fermentation–or by using the prophage λDE3 in which the RNA polymerase gene is under the control of the lacUV5 promoter. Again, leakiness is a major problem in this system. This can be counteracted to some extent by adding a lacI gene to the expression vector, or a lac operator downstream of the T7
4 Regulation of Gene Expression
promoter or by using a plasmid encoding a T7 lysozyme, which degrades the T7 RNA polymerase [56]. Another frequently used negatively regulated system is based on the very strong leftwardly oriented pL promoter of phage λ. This promoter is very tightly regulated by the λ cI repressor. One limitation of this promoter is that it can only be induced using a thermo-labile repressor (λ cI857 ), with all the resulting disadvantages when one wishes to increase the growth temperature [57]. Another mode of down-regulation has been described by Hasan and Szybalski [53]. This employs an invertible tac promoter. The promoter is oriented away from the target gene during cell growth. For gene expression the promoter is inverted by the λ integrase acting on the attB and attP sites, which flank the invertible promoter. The int gene is placed on a temperature-inducible defective λ phage, and expression is induced by a brief heat shock. The inversion is rapid and over 95% efficient. Another frequently used and negatively regulated (this time by repression) strong promoter is the trp promoter derived from the E. coli tryptophan operon. The tryptophan repressor binds to the trp operator in the presence of excess tryptophan. Induction is achieved by depletion of tryptophan. Drawbacks of this system are, again, leakiness of the promoter, a limited choice of growth media, and the fact that tryptophan limitation is needed at a time when protein synthesis should be maximal. Such conditions are difficult to define in large-scale fermentations. Alternatively, the promoter can be induced by the addition of the inducer β-indoleacrylic acid, but this compound is expensive [59]. In general, negatively controlled promoters are difficult to handle in downregulation, and require a balanced repressor to operator ratio. Induction, especially with strong promoters on multicopy vectors and nondegradable inducers, leads to high mRNA levels which might be toxic, as well to rapid synthesis of proteins which often results in the formation of inclusion bodies. 4.2 Positive Control 4.2.1 L-Arabinose Operon Positively regulated systems are characterized by a slower, but more reliable, response, with a very low basal activity. Here, the most popular system is the Larabinose system of E. coli. L-arabinose can be used by E. coli as its sole carbon source. It is taken up by two different transport systems (araE and araFGH) and metabolized to xylulose-5-phosphate by the enzymes encoded by araB, araA, and araD. The genes araBAD are organized as a single operon, as are araFGH and araE. These genes form a regulon that is regulated by AraC. The araC gene is located upstream of, and in opposite orientation to, the araBAD operon [60]. AraC belongs to the AraC/XylS family, one of the most common types of positive regulators [47]. The noncoding region between araBAD and araC is highly complex. There are three operators, araI, araO1, and araO2, and two binding sites (promoters), pc and pBAD, for RNA polymerase and the CAP-cAMP complex, since L-arabinose utilization is dependent on catabolite activation. The AraC binding sites are unusual in
11
12 Protein Production in Escherichia coli
that they consist of two direct repeats of 17 bp, instead of shorter, inverted repeated DNA sequences. According to the current model, the homodimeric AraC binds in the absence of L-arabinose to one half-site of araI and of araO2, leading to the formation of a double-stranded DNA loop of 210 bp. In this loop conformation, transcription of the pc promoter is inhibited and the pBAD promoter is not active as the second half-site of araI is not occupied. The binding of arabinose to AraC together with CAP-cAMP opens the loop. AraC loses its ability to bind to distant sites. Instead, it binds to the adjacent half-sites of araI and activates transcription of pBAD. In addition, transcription of pc increases about tenfold until sufficient AraC is synthesized (within about 10 min) and occupies araO1, thereby inhibiting further transcription. In expression vectors, the ara pBAD promoter from E. coli or Salmonella typhimurium is normally used. Most vectors also carry the araC gene. Obviously, AraC becomes limiting when produced from a single chromosomal copy, taking into account the many operator elements present on multicopy plasmids. The ratio of induction to repression is between 250- and 1300-fold in such vectors depending on the medium used, and the basal level is very low, especially in the presence of glucose [50]. The promoter is about 2.5- to 4.5-fold weaker than the tac promoter. Nevertheless, depending on translation efficiency, proteins can be produced in amounts constituting up to 30% of total cell protein. The induction can be modulated by supplementation with sub-optimal concentrations of arabinose. Only at very low arabinose concentrations is a mixed population of induced and noninduced cells found [63]. 4.2.2 l-Rhamnose Operon The catabolism of L-rhamnose in E. coli is another example for positive regulation. Lrhamnose is taken up via the product of the rhaT gene, isomerized to L-rhamnulose (rhaB, phosphorylated to rhamnulose-5-phosphate (rhaA), and hydrolyzed by an aldolase (rhaD). The genes rhaBAD form an operon, whereas rhaT is located close by and has its own promoter. Upstream of rhaBAD are the two genes rhaR and rhaS, which are arranged in opposite orientations. The rhaR and rhaS gene products also belong to the AraC/XylS family of activators (Figure 3). Both RhaR and RhaS bind in the noncoding region between rhaR and rhaS in the presence of rhamnose. RhaR activates transcription of rhaR and rhaS. RhaS activates transcription of the structural genes rhaBAD by binding to the rhaBAD promoter. In addition, L-rhamnose consumption requires the catabolite activation complex CAP-cAMP, which binds upstream of the rhaS promoter [10, 65]. The rhaBAD promoter is even more tightly regulated than the araBAD promoter. The basal level of transcription (determined using a transcriptional fusion with lacZ) is about tenfold lower, whereas the induced levels of rhaBAD and araBAD expression are similar [66]. Expression modules have been constructed containing just the rhaBAD promoter together with the CAP-cAMP binding site, a ribosomal binding site, several restriction sites for the insertion of target genes, and a transcription terminator (Figure 4). This module was inserted into various plasmids and used for expression of several different genes. Generally, proteins could be produced in levels up to 30% of total
4 Regulation of Gene Expression
Fig. 3 l-Rhamnose pathway and involved genes and pathways. Transcription of rhaBAD genes is controlled by RhaS activator (see text for details).
cell protein with this regulatory system. In contrast to the case with the ara system, the regulatory genes rhaRS are dispensable; the chromosomal copy seems to produce enough regulatory proteins even to saturate the binding sites on multicopy plasmids. The induction times for maximal protein yields varied between 4 h and 10 h. These very slow response rates were found to be beneficial for proteins that tend to form inclusion bodies, as well as for enzymes which have to be transferred into the periplasm. In comparison to the tac promoter, up to fourfold higher activities were obtained in the case of the E. coli pencillin amidase, and up to 20-fold higher activities in the case of a sucrose isomerase from Protaminobacter rubrum (unpublished observations). In addition, due to its low basal expression level, this regulatory system can be used successfully in HCD fermentation. Proteins were produced in amounts comparable to those obtainable in shake-flask experiments, and without having to employ selective conditions with antibiotics [28].
Fig. 4 Expression modules for vectors contain just the rhaBAD promoter, together with the CAP-cAMP binding site (rhaP), a ribosomal binding site (RBS), several restriction sites for insertion of target genes, and a transcription terminator (ter). RhaS and RhaR are provided in trans by the host strain.
13
14 Protein Production in Escherichia coli
5 Transcription and Translation 5.1 Translation Initiation
For efficient expression of foreign genes in E. coli, not only transcription but also the translation efficiency of the cloned gene must be considered, with translation initiation being the most important step. Initiation is mainly governed by the ribosomal binding site or Shine–Dalgarno (SD) sequence, a region of four to nine bases which is complementary to the 3 -end of the 16S RNA and is located 5–11 nt from the start codon. Most E. coli genes begin with a AUG start codon, but some use a GUG or a UUG codon (Table 1) [6]. The latter two are frequently found in genes that are weakly expressed, and translation initiation at these triplets is indeed twoto threefold less efficient. For heterologous gene expression it is therefore advisable to replace a GUG or UUG codon by a AUG codon. In some mRNAs the adenine of the start codon is the first 5 nucleotide. It is therefore not surprising that the region immediately downstream – called the downstream box – also influences translation initiation [45]. The secondary structure formed during translation initiation is of crucial importance. Pairing of the SD sequence and/or start codon with other parts of the mRNA makes the mRNA inaccessible to the ribosome. It has been shown that single nucleotide exchanges which affect such stem – loop structures that “hide” the SD sequence and AUG can result in up to a 500-fold difference in translation efficiency [37]. The most straightforward way to tackle such problems is the use of the translation initiation region of a gene which is highly expressed in E. coli, such as the T7 gene 10, T4 gene 32 or the atpE gene from E. coli, and to insert the target gene about five to ten codons downstream of the start codon of such a gene. If the N-terminal region of the target gene must be left unchanged, it is inserted immediately downstream of the start codon. Many expression vectors
Table 1 Frequency of codons (DNA sequence) used for start and stop
of translation in Escherichia coli [6] Codon
aa
Count
%
ATG GTG TTG AT T CTG Total
met val leu ile leu
3542 612 130 1 1 4286
82.6 14.3 3.0
TAA TGA TAG Total
ochre opal amber
2705 1257 326 4288
63.1 29.3 7.6
5 Transcription and Translation
already contain a well-functioning SD sequence and allow insertion of the gene exactly at the start codon. There are several restriction enzymes, the recognition sites of which overlap ATG codons. These are SphI (GCATG C), NcoI (C CATGG), and NdeI (CA TATG). Only the NdeI site allows the insertion of any possible codon after the start codon and is therefore preferred for use in expression vectors. The NdeI site can be introduced into the target sequence by the oligonucleotide used for PCR amplification of the gene. To avoid the formation of stem–loop structures and optimize the downstream sequence without changing the amino acid sequence, an oligonucleotide may be used which has a variable sequence at the third position of the first five or more codons, in accordance with the wobble theory. After insertion of the gene into the expression vector, the most efficient sequence is selected by screening the clones. A different strategy for enhancing translation efficiency is based on the translational coupling of genes. In this case, two successive genes share overlapping start/stop codons (TGATG or ATGA), and the SD sequence of the downstream gene is located within the upstream gene. In some expression vectors employing this strategy, the translational initiation region of an efficiently translated gene is inserted downstream of a promoter. After 20–30 codons, the open reading frame (ORF) is interrupted in frame by a translational coupling sequence – that is, by a SD sequence and the TGATG sequence. The target sequence is finally integrated immediately downstream of the start codon. Following this construction scheme it is expected that the translating ribosome will disrupt any secondary structure at the second start site, thus allowing uniform translational re-initiation [69]. There are no functional SD sequences in eukaryotic mRNAs. However, cryptic translational start sites may be found within eukaryotic genes, as has been observed for example in the sequence encoding the human parathyroid hormone. After expression of the gene in E. coli two gene products were obtained – a fulllength species, and a truncated protein which was seven amino acids shorter. Clearly, in this case there were two competing translational start sites, one at the authentic AUG start codon and one at the methionine codon at position 8. This internal ribosomal binding site could be removed by site-directed mutagenesis without changing the amino acid sequence [70], thus resulting in a homogeneous product. 5.2 Codon Usage
Once translation has started, efficient elongation is required. It has been shown that so-called “rare codons” can slow down the translation process. It emerged that expression of heterologous genes in E. coli can result in the immediate cessation of cell division and in the accumulation of modified or truncated forms of the desired protein products [25, 72, 73]. This effect was found to be mediated by a remarkable RNA molecule – alternatively designated as SsrA RNA, tmRNA, or 10Sa RNA - that acts both as a tRNA and as an mRNA to direct the modification of
15
16 Protein Production in Escherichia coli
proteins of which biosynthesis has stalled or has been interrupted [72, 74, 75]. The ssrA-encoded peptide-tagging system marks nascent polypeptide products originating from damaged mRNAs. A specific (AANDENYALAA) peptide tag is added by co-translational template switching of the ribosome from the damaged mRNA to the ssrA tmRNA. The tagged polypeptide is then rapidly degraded. This mechanism constitutes a novel post-translational proof-reading system and has now been described for various bacteria. Coding sequences in pro- and eukaryotes display species-specific usage of synonymous codons; each species uses what may be termed a “codon dialect”. Thus, the codons AGG/AGA (Arg), AUA (Ile), CUA (Leu), and CCC (Pro) are seldom used in E. coli, in contrast to the situation in Saccharomyces cerevisiae (Figure 5), for example. Often, the frequency of synonymous codons reflects the relative abundance of their cognate tRNAs. Thus, for E. coli a rather stringent correlation between the codon usage and the numbers of genes encoding the cognate tRNAs can be observed (Figure 6). The examination of fractionated tRNAs [42] has established for E. coli that the frequency of rare codons often correlates with low levels of the cognate tRNAs. For the codon AGG, in particular, it was observed early on that translation efficiency could be decreased by newly introduced codons in test genes [21, 77]. It has been shown that the provision of additional copies of the tRNAarg gene argU (formerly called dnaY) results in relief of growth inhibition, and concomitantly provides enhanced product formation, when genes with multiple AGG/AGA codons are expressed in E. coli [25, 32, 79]. It has also been observed that these rare AGG/AGA codons are prone to preferential misincorporation of lysine instead of arginine by misreading. Supplemental tRNAarg ensured the production of the correct and homogeneous protein product here also [28, 46, 83]. Numerous reports have described successful product development using additional argU genes, while others have indicated that the same strategy can also be used successfully for other codons that are rare in E. coli. Del Tito and colleagues [39] first successfully used an multi-augmentation strategy combining argU with ileX. For the expression of genes from AT-rich organisms such as Plasmodium and other protozoans, or helminths [8] and clostridia [86], or GC-rich organisms such as archeal bacteria [87], the so-called Codon+ strains have been used successfully. These combine argU on a helper plasmid with various other tRNA genes, ileY for AUA, leuW for CUA, proL for CCC and sometimes glyT for GGA, or lysT for AAG. Reports on the use of this multi-augmentation approach are still rare and, as some of these rare tRNAs require additional steps for their maturation (e.g., specific base modifications), it is at present not clear if the addition of tRNA genes alone will always be sufficient, as in the case of argU. HCD fermentation may be a problem with such strains, because it is known that the use of rich media may be indispensable for the success of this strategy [39]. Thus – especially for large-scale use – synthetic re-design of the coding sequence remains a potentially useful, but tedious, alternative [88]. The use of fusion genes
5 Transcription and Translation
Fig. 5 Comparison of codon usage frequencies of Escherichia coli and Saccharomyces cerevisiae. The arrow points to the codon of lowest frequency for arginine.
17
18 Protein Production in Escherichia coli
Fig. 6 Comparison of codon usage and number of genes known for the respective cognate tRNAs of Escherichia coli. Seven tRNA genes are duplicates or triplicates, whereas five are quadruplicates.
5 Transcription and Translation
(if possible) opens an attractive means of bypassing the problem, as described recently [89]. 5.3 Translation Termination
Product heterogeneity can also be caused by incorrect translation termination. Frequently, the stop codon UGA is used for translation termination (see Table 1); this can lead to translational read-through even in strains without suppressor tRNAs. Two natural UGA-decoding tRNAs insert selenocysteine into proteins. These require a special RNA stem–loop structure downstream of the UGA codon [20]. During expression of chorismate mutase genes containing UGA stop codons from E. coli and Bacillus subtilis, translational read-through and tryptophan insertion at the UGA codon was observed in 5 % of the translation products [80]. Translational +1 frameshifting at UGA codons has also been described by Poole et al., [92]. These authors tested the strength of stop signals and found that the next base following the stop signal has a very strong impact on termination efficiency. The strongest termination was found with the tetranucleotide UAAU, and the weakest with UGAC. Therefore, use of the UAAU stop sequence is strongly recommended. 5.4 Transcription Termination and mRNA Stability
As well as a strong stop codon, the use of a strong transcription terminator is recommended. The transcription terminator inhibits read-through by the RNA polymerase into other genes on the vector, or into the replication region. For example, deleting the T terminator in the pET vectors leads to read-through by the T7 RNA polymerase into the β-lactamase gene located downstream in the same orientation, thereby resulting in marked instability of the expression system (unpublished observations). Usually, Rho-independent transcription terminators which are able to form stem–loop structures are used, such as the two from the rnnB operon, from phage fd or T7 (T). For some transcripts, higher mRNA stability was observed when these terminators were used. Presumably they protect the RNA from the major 3 → 5 exonu-cleases RNase II and polynucleotide phosphorylase. Degradation of most mRNAs is probably initiated by an endonucleolytic cleavage catalyzed by RNase E [70]. The products of this initial cleavage are further degraded by additional RNase E cleavages and by 3 → 5 -exonucleases. RNase E is essential for the maturation of 5S RNA. It is a very large enzyme of 1061 residues, and is active as a homodimer. The catalytic activity is located in the N-terminal half of each subunit. The C-terminal half can be deleted without affecting the maturation of 5S RNA, whereas the half-life of individual mRNAs and the bulk mRNA in the cell is considerably increased by this mutation. This is due to the fact that the C-terminal part of RNase E contains binding sites for polynucleotide phosphorylase, RNase II, enolase and the RNA helicase RhlB. These proteins together form a multi-protein complex called the “degradasome”, with RNase E as a scaffold [34]. The degradasome
19
20 Protein Production in Escherichia coli
binds preferentially to single-stranded mono-and triphosphorylated 5 -ends of mRNAs, and it is suggested that after looping of the mRNA-RNase E complex a sequence downstream is recognized and cleaved. The cleavage occurs upstream of the dinucleotide AU. However, other structures are required for efficient cleavage in addition to the AU dinucleotide, the nature of which is not fully understood. When the 5 -end of the mRNA is buried within a stem–loop structure, the end cannot be recognized by the degradasome [11]. This explains the high stability of some transcripts, like ompA, which have a stem–loop structure at their 5 -end. There is no sequence specificity of the stem–loop structure, and the addition of such secondary RNA structures to other mRNAs such as lacZ also stabilizes these. The half-life of stabilized mRNA can be increased by up to sixfold [6]. Nevertheless, this mRNA is also degraded in an RNase E-dependent fashion, and it is postulated that RNase E has a second, internal entry site which is less efficient. Factors other than the 5 -end also influence the stabilization of mRNA. The most important factor seems to be the translational efficiency of the mRNA. For example, changing the start codon UUG to the more efficient AUG codon or improving the ribosomal binding site of the rpsTgene has a positive effect on mRNA stability: higher translation rates resulted in higher mRNA stability. It is assumed that the translating ribosomes mask the RNase E cleavage site and therefore stabilize the mRNA. In addition to RNase E there is an RNase G, which shows sequence similarity to RNase E at its N-terminus. This enzyme is essential for the processing of 16S RNA. In only a single case – the adhE gene for alcohol dehydrogenase – has it been shown that RNase G is responsible for a short transcript life time: an RNase G mutant produced about fivefold higher amounts of the adhE gene product than the corresponding wild-type strain [97].
6 Protein Production 6.1 Inclusion Body Formation
One of the main problems encountered in heterologous gene expression using strong promoters is the misfolding and aggregation of proteins in inclusion bodies. For many pharmaceutical proteins, inclusion body formation and subsequent refolding into an active conformation can be employed as an important first purification step, incidentally providing protection of the protein against proteolytic degradation by host proteases [98]. For proteins of lower commercial value, such as technical enzymes, purification and re-folding is too expensive and is not possible when the enzymes are used without purification in whole-cell biocatalysts. The key to solving this problem is an understanding of protein folding during protein synthesis and of the effects of misfolded proteins on the cell. In contrast to the initial assumption that inclusion bodies resulted from unspecific coagulation of incompletely folded proteins, later investigations suggested specific interactions
6 Protein Production
between defined domains of partially folded proteins, for example in the case of P22 tail-spike and P22 coat protein [99]. Thus, two proteins which both show a tendency to form inclusion bodies mainly aggregate in separate inclusion bodies when co-produced in a single cell. High-level expression of genes leads to a substantial change in the overall expression profile of cells. On average, about 6% of genes show a more than threefold change in expression, regardless of the solubility of the recombinant proteins. The misfolding of proteins leads to an additional cellular stress reaction similar to a heat-shock response. Comparison of expression patterns identified 52 genes with a more than threefold change in response to inclusion body formation [76]. Some of the most strongly induced genes encode chaperones such as DnaK, DnaJ, GrpE, GroEL, GroES, IbpA, IbpB, trigger factor and ClpB, ClpP and La, which facilitate denovo protein folding, inhibit aggregation of denatured proteins, bind to inclusion bodies, dissolve aggregated proteins, or act as proteases. Probably the most efficient way to avoid inclusion body formation is to decrease the growth temperature during the production phase to 20◦ C and even lower, depending on the protein. The lower temperature, with the resulting reduction in the rate of protein synthesis, may simply provide more time for proper folding of the nascent chain, or may induce cold shock proteins which assist in folding.
6.1.1 Chaperones as Facilitators of Folding The strategy just described has two obvious disadvantages: low growth rate; and additional energy costs incurred in cooling the fermentor. Therefore, many alternative approaches have been tested. One of the more successful solutions involves the reduction of gene dosage by employing expression vectors with lower copy numbers, or promoters of lower strength such as the positively regulated araBAD and rhaBAD promoters, or a combination of both. A fundamentally different approach is the co-production of chaperones. Plasmids that are compatible with expression vectors have been constructed, which for example contain dnaK, dnaJ, grpE, or mopA and mopB (GroEL/ES) genes in tandem, under the control of a different promoter. However, these plasmids are not always effective, and sometimes a combination of different chaperones is necessary to enhance the production of recombinant proteins in a soluble form. For example, co-expression of trigger factor (a cytoplasmic peptidylprolyl cis/trans isomerase which associates with the ribosome) enhanced the solubility of mouse endostatin (which was otherwise deposited in inclusion bodies), whereas no effect was observed upon co-expression of GroEL-GroES. Only when GroEL-GroES were produced together with DnaK-DnaJ-GrpE was a significant amount of endostatin rendered soluble. Solubilization was only achieved in case of human lysozyme upon co-production of trigger factor and GroEL-GroES. Activity testing of the lysozyme showed that only a small fraction of the soluble protein was enzymatically active. This emphasizes that solubility per se does not necessarily imply correct folding [101]. Another disadvantage of chaperone co-production is that these proteins have to be made in large quantities, thereby reducing the fraction of the cell’s metabolic capacity that can be devoted to production of the target protein.
21
22 Protein Production in Escherichia coli
6.1.2 Fusion Protein Technology Another way of reducing inclusion body formation is to fuse the target gene to the 3 end of a gene which is highly expressed in E. coli and encodes a soluble gene product. Examples of established fusion partners are the E. coli maltose-binding protein MalE (without the signal peptide for export), Schistosoma glutathione transferase, thioredoxin, or Staphylococcus A protein. For example, interleukins IL-2, IL-3, IL-4 and IL-6 could only be produced in inclusion bodies even at 15 ◦ C. However, after fusion of the ORF to the thioredoxin gene trxA, each of the fusion proteins could be recovered in soluble form, even when it accounted for up to 20% of the total cell protein [72]. Besides their higher solubility, such fusions can be advantageous for productivity and yield in providing improved efficiency of translation initiation and in allowing one to use the fusion partner as the basis for purification by affinity chromatography. However, the success of these fusions is not predictable. The degree of misfolding of the fusion protein may be even higher than that of the native protein, or the fusion protein may be soluble but inactive. Finally, in the case of therapeutic products, such as factor Xa or enteropeptidase, the fusion partner must be cleaved off, and very often this is a quite inefficient process. 6.2 Methionine Processing
Recombinant proteins produced in the cytoplasm of E. coli can retain the initiation methionine at their N-terminal end, and this may affect the immunogenicity of the product. E. coli possesses an enzyme activity in the cytoplasm which has the capacity to remove N-terminal methionine. However, in-vivo and in-vitro experiments have demonstrated that the enzyme has some limitations, and that efficient processing depends on the amino acid adjacent to the methionine (Table 2). N-terminal product
Table 2 Cleavage of N-terminal methionine from intracellular
prokaryotic proteins. The effect of the penultimate amino-acid on methionine removal is shown [13] Met cleaved
Variable
Ala Gly Pro Ser
Ile Val Cys Thr
Retained Lys Arg Leu Asp Asn Phe Met Trp Tyr Glu Gln His
6 Protein Production
heterogeneity due to partial processing has also been observed [13, 58]. The gene for the methionine aminopeptidase (map) has been cloned and analyzed [14], and enhancement of processing by concurrent expression of map has also been demonstrated [104].
6.3 Secretion into the Periplasm
Proteins destined for the periplasmic space and outer membrane must pass through the cytoplasmic membrane. Most of these proteins are exported by the general secretory pathway (Sec) consisting of a membrane-embedded complex of the proteins SecY, SecE, SecG, and SecA. The proteins to be exported usually need an extended, loosely folded conformation and an N-terminal extension – an 18- to 30-amino acid signal peptide which is cleaved by the signal peptidases during translocation of the polypeptide. Folding of the polypeptide in the cytoplasm is inhibited by binding of SecB and other cytoplasmic chaperones. The periplasmic space also has its own chaperones to assist in protein folding and to generate disulfide bonds. There are several practical reasons for targeting recombinant proteins to the periplasm. Some proteins need disulfide bonds, which are only formed in the periplasm and not under the reducing conditions found in the cytoplasm. Other proteins, such as the penicillin amidase which must be processed auto-catalytically into a heterodimer, do not fold into the active conformation in the cytoplasm even after removal of the signal peptide. Secretion can also serve to protect the desired product from cytoplasmic proteases; the periplasm itself has a different set of proteases which are present in lower abundance. Therefore, isolation of intact proteins from the periplasm by osmotic shock may be easier than recovery from crude extracts of whole cells. Finally, some proteins produced in the cytoplasm still contain the methionine encoded by the start codon. Processing by the signal peptidase may be used to generate the mature N terminus. On the other hand, translocation of a protein into the periplasm is a bottleneck for high-level production of recombinant proteins. Usually, large cytoplasmic proteins cannot be exported when they are fused to a signal peptide. Export of recombinant proteins seems to be most promising route if the protein is exported in its original host. For efficient secretion, signal sequences are used from E. coli proteins like the outer membrane proteins OmpT or OmpA, from (β-lactamases or from the pectate lyase PelB of Erwinia carotovora. In many cases, however, pre-proteins are not readily exported, and form inclusion bodies in the cytoplasm, or are degraded. In other cases the proteins seem to be transported to the Sec complex but are trapped during translocation, leading to lethality and cell lysis. It is not yet clear why some proteins are exported and others not, or which signal peptides might be optimal. Presumably several factors are crucial for export: the recognition of the nascent polypeptide by SecB or the signal recognition particle and other chaperones, maintenance of its unfolded conformation, the time needed
23
24 Protein Production in Escherichia coli
for transport to the translocation complex, and the overall hydrophobicity of the protein. Even when a protein is exported into the periplasm there are still potential pitfalls to be overcome before an active protein is obtained. These obstacles include the formation of incorrect disulfide bridges, misfolding and inclusion body formation, and proteolytic degradation. As in the cytoplasm, where misfolded proteins induce the σ H -dependent heat-shock genes, the periplasm also possesses a heat-shock system regulated by σ E and the Cpx two-component system, which respond to extracytoplasmic stress by synthesizing proteases and chaperones [94]. 6.4 Disulfide Bond Formation and Folding
Some of the problems encountered with periplasmic proteins can be solved by coproduction of periplasmic chaperones. Thus, it has been shown that overproduction of the thiol/disulfide oxidoreductase DsbA, which acts as a strong oxidizer of thiol groups, and DsbC, a protein disulfide isomerase, can enhance the production of active tPA, a protein with 17 disulfide bridges [108]. In another report, co-expression of DsbCD or DsbABCD improved production of the exported human nerve growth factor (NGF) by 80% [69]. Other chaperones, such as Skp/OmpH, which are involved in the folding of outer membrane proteins seem to have a broad substrate spectrum. Thus, overproduction of Skp improved the solubility of single-chain antibodies secreted into the periplasm [55]. Some chaperones exhibit peptidylprolyl cis/trans isomerase activity (SurA, PpiD, FkpA, RotA/PipA). The co-expression of FkpA considerably improved the periplasmic solubility of a MalE mutant protein [4]. There are about ten proteases in the periplasm, but probably only one of them –DegP–has a chaperone activity in addition, and is induced by the presence of misfolded proteins in the periplasm. Overproduction of DegP has been found to improve the production of active penicillin amidase. High-level production of penicillin amidase leads to inclusion body formation in the periplasm, and leakage of active enzyme into the medium. Coexpression of DegP resulted in a significant decrease in the incidence of inclusion bodies, and the penicillin amidase was slightly more active (about 15%). The main effect was the improvement of cell physiology. Whereas about 50% of the enzyme was lost to the medium in the original strain, in the DegP-overproducing strain the outer membrane remained intact. It could be shown that this effect was due to the proteolytic activity of DegP. Overall, 80% more active enzyme could be recovered from the cells [90]. 6.5 Twin Arginine Translocation (TAT) of Folded Proteins
Several years ago it became clear that most bacteria have a second general protein export pathway, which is independent of Sec. This pathway was called TAT (twin arginine translocation), because its substrates are characterized by two consecutive
6 Protein Production
arginine residues in the signal peptide sequence [15]. The most remarkable characteristic of the TAT system is that it transports into the periplasm those proteins which have already been folded in the cytoplasm, often together with a bound cofactor such as FeS centers or molybdopterin. Even heterodimeric proteins with a TAT signal in only one of the subunits are exported by this system. Signal peptides recognized by the Sec system are divided into three sections: a positively charged N-terminal sequence; a hydrophobic α-helical region; and a cleavage site. The TAT signal sequences are similar but exhibit some distinct features, most notably the conserved sequence (S/T-R-R-X-F-L-K) of the positively charged N-terminal region. Replacement of TAT by Sec signal sequences directs the TAT-exported protein exclusively to the Sec translocator, and vice versa, as has been shown for two E. coli proteins. Thus, it seems that sorting of the precursor proteins to the transport complex is determined only by the signal peptide. When the green fluorescent protein (GFP) was fused to a Sec signal peptide, it remained in the cytoplasm, whereas at least part of it was translocated into the periplasm when a TAT signal was used [40]. As with the Sec system, the addition of a TAT signal peptide to a large cytoplasmic protein such as β-galactosidase is not sufficient to ensure TAT-mediated translocation. Interestingly, penicillin amidase does not show the signal sequence typical of the TAT proteins (it has two arginine residues in the hydrophilic region, but these are separated by an asparagine), but it is nevertheless transported by the TAT translocator. When the original signal sequence was replaced by an OmpT signal (Sec), penicillin amidase was equally well exported [59]. Since penicillin amidase folds predominantly in the periplasm, it seems that some proteins can be transported in a loosely folded state by the TAT system. Further investigations are necessary to understand what exactly is recognized by the TAT translocation system, before it can be routinely exploited for production of recombinant proteins.
6.6 Disulfide Bond Formation in the Cytoplasm
Disulfide bonds usually do not form in the cytoplasm, due to the presence of reducing components such as glutathione and thioredoxins and glutaredoxins. In addition, the cytoplasm lacks enzymes like the DsbA and DsbC found in the periplasm, which oxidize pairs of cysteines in proteins to form disulfide bonds, and function as a disulfide isomerase to reduce incorrectly formed disulfide bonds [62]. The oxidized forms of the two thioredoxins and three glutaredoxins are able to oxidize proteins for disulfide formation, but they are kept in a reduced state by thioredoxin reductase (TrxB) and glutathione, which in turn is reduced by glutathione oxidoreductase (Gor). Double mutants for trxB and gor have been isolated, which are viable but grow very slowly and only in the presence of exogenous reducing dithiothreitol (DTT). A fast-growing derivative has, however, been isolated which grows in the absence
25
26 Protein Production in Escherichia coli
of DTT. Many proteins, such as E. coli alkaline phosphatase and the maltase MalS, as well as mouse urokinase and tissue plasminogen activator (t-PA), are only active when the correct disulfide bonds have been formed. Therefore, they must be exported into the periplasm to ensure activity. In the new mutant all of these proteins could be expressed intracellularly in an active conformation, and in higher amounts than in the periplasm. The amount of active protein could be further increased by intracellular production of the disulfide isomerase DsbC [17]. Hence, this strain seems to offer a suitable alternative to the secretion of proteins into the periplasm. Strains with the double mutation trxB gor are now being marketed commercially under the trade name OrigamiTM . 6.7 Cell Surface Display and Secretion across the Outer Membrane
Display of peptides and proteins at the cell surface has a wide range of applications, including the development of live vaccines, screening of peptide libraries, adsorption and removal of hazardous chemicals, or construction of whole-cell biocatalysts and biosensors [75]. For this purpose, vehicles are needed which are targeted to the outer membrane, such as the E. coli Omp proteins OmpA and OmpC, the peptidoglycan-associated lipoprotein PAL, adhesin AIDA-I, TraT or LamB, as well as foreign proteins like the IgA1 protease from Neisseria gonorrhea or the ice nucleation protein (INP) of Pseudomonas syringae. Alternatively, proteins of cell appendages like pilins, fimbriae and flagella can be used. These so-called carrier proteins are fused at the N-terminal end, the C-terminal end or somewhere within the protein (sandwich technique) to a passenger peptide or protein. The site of fusion depends on where the carrier protein is anchored to the cell. Many carrier proteins allow the integration of small peptides only, otherwise the structure of the outer membrane or the appendages is disturbed and the protein no longer folds properly. The Pseudomonas syringae INP, which has been employed for the generation of a whole-cell biocatalyst, is the most promising example for displaying large proteins. INP is an outer membrane protein which can be easily produced in E. coli. It nucleates ice formation and is responsible for frost injuries in plants. Levansucrase, organophosphorus hydrolase, and carboxymethyl cellulase, as well as hepatitis B virus surface antigen [66] and single-chain antibodies, have been produced as fusions to the C-terminal end of the INP, and displayed in a biologically active form on the surface of the E. coli cell. Sometimes it may be useful to secrete protein products through the cytoplasmic and outer membrane into the medium. In general, laboratory strains of E. coli do not seem to have the appropriate genetic equipment for this, in contrast to pathogenic E. coli and many other Gram-negative bacteria which segregate toxins, proteases, pullulanase and filamentous phages. There are two principal ways to direct proteins into the medium [112]. The first is to direct the target proteins into the periplasm, followed by subsequent passive passage through the outer membrane. A second approach involves the use of an active transport system from another bacterial strain or species. If the first option is chosen, methods can be applied to make the outer membrane leaky, such as osmotic
7 Examples of Products and Processes
shock, incubation with polyethylene glycol, EDTA or Triton X-100. Such procedures can only be carried out after the cells have been harvested, and there are no defined methods for particular proteins of interest. An alternative technique involves co-production of colicin lysis proteins, which permeabilize the outer membrane to deliver the toxin to the medium. An example is the colicin E1 lysis protein (Kil), which has been used to promote the passage of penicillin amidase into the medium. However, the damage caused to the outer membrane can rapidly lead to cell lysis and death. E. coli mutants are known which are naturally more leaky than the wild-type, for example strains mutated in the Omp proteins. Finally mutants, so-called L-forms, which obviously have no outer membrane, have long been known [73] but have largely been neglected. Although L-forms have been used successfully for singlechain antibody production on a laboratory scale [98], the very complex medium that is needed and the sensitivity of the cells prevents scale-up of this process for industrial purposes. Many different systems have been described for the active transport of proteins, but all of them are very specific for a particular protein. An example is the transport of the hemolysin HlyA. This protein is translocated via a protein channel consisting of HlyB, HylD and TolC directly from the cytoplasm into the medium. The hemolysin is recognized by a signal sequence at its C-terminal end, which is not cleaved off. A two-stage process is observed in case of a Klebsiella oxytoca-derived pullulanase. First, the protein is transported into the periplasm, during which step the signal sequence at the N-terminus is cut off. Fourteen different gene products are then necessary to translocate the protein into the medium! In summary, these and a series of other strategies have not yet led to efficient segregation of proteins; they are difficult to adapt to the transport of foreign proteins, and they are still not well understood.
7 Examples of Products and Processes
There are numerous examples of E. coli-based processes for the industrial production of proteins, including a range of pharmaceutical and therapeutic compounds. A selection of approved pharmaceutical products is listed in Table 3. In addition to the examples described here and in the previous sections, even very complex proteins such as new variants of tPA have been produced in high quality and large quantities [82]. These thrombolytic (fibrinolytic) agents are used for the treatment of acute myocardial infarction. They are produced currently as inclusion bodies, and are purified with subsequent renaturation in vitro [102]. This example is not unique, as several systems for the production of the original tPA and newly developed derivatives are available. Does E. coli perform as well as, or better than comparable animal cell systems? A unique comparative study performed some years ago provided some evidence for the superiority of the E. coli-based process in this particular development [36]. In the meantime, there have been improvements which make the process even more efficient [106].
27
28 Protein Production in Escherichia coli Table 3 Pharmaceutical products produced by Escherichia coli strains, approved in Germany (selection)
Class
Product
Human insulin
insulin insulin Insulin lispro (variant)
Pituitary hormones
somatotropin somatotropin
Cytokines
Aldesleukin, IL2 Interferon alfa-2a Interferon alfa-2b Interferon beta-1b Interferon gamma-1b Molgramostim, rhGM-CSF Filgrastim, r-metHuG-CSF
Fibrinolytics
Reteplase (t-PA variant)
Brand name R Berlinsulin R Huminsulin R 100 Humalog R Genotropin R Humatrope R Norditropin R Zomacton R Proleukin R Roferon -A R Intron A R Betaferon R Imukin R Leucomax R Neupogen R Rapilysin
8 Conclusions
Many expression systems, ranging from microorganisms to animals and plants, have been developed or are under construction. Nevertheless, it is felt that E. coli is still the first choice as the host for heterologous gene expression, except in cases where special requirements such as protein modifications, glycosylation or unusual cofactors, must be met. However, even this may change in the near future, as by expanding the genetic code the incorporation of modified amino acids has been achieved [43, 125]. The “last” hurdle for prokaryotes – the lack of a glycosylation capability – may be solved by a new strategy and genome engineering, as depicted recently [128]. New methods of genome analysis [48] and efficient genomic engineering should therefore expand the potential of E. coli as an industrial producer of proteins and other valuable products [30]. Improvements are continuously being made – in the stability of expression vectors, in the tight regulation of promoters, the construction of new host strains for improved mRNA and protein stability, hosts which allow disulfide bond formation in the cytoplasm, or in the co-production of chaperones to facilitate correct protein folding. Other important developments are dealing with the economic production of proteins by improved feeding strategies during fed-batch fermentation and downstream processing. Presumably, there will never be a single unique expression system which is applicable to all genes. However, we are confident that, with increasing knowledge and experience, the expression of even “difficult” genes can be successfully accomplished, and appropriate platforms and strategies will become available to overcome the problems discussed in this article.
8 Conclusions
Appendix
Table 4 Useful strains of Escherichia coli and their characteristics
Strain
Relevant features/Remarks
Reference
An E. coli B strain for expression vectors containing bacteriophage T7 promoter. Bacteriophage T7 RNA polymerase is carried on the bacteriophage λ DE3, which is integrated into the chromosome of BL21.
Studier and Moffatt [56]
supE44 lacU169 (φ lacZM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1 mcrA mcrBC mrr supE44 hsdS20 (r− B m− B ) recA13 ara-14 proA2 lacY1 galK2 rpsL20 xyl-5 mtl-1 mcrBC mrr F− mcrA recA1 endA1 gyrA96 thi + hsdR17 (r− K mK ) relA1 supE44, (lac-proAB) mcrA recA1 supE44 endA1 hsdR17 ( r− K m+ K ) gyrA96 relA1 thi (lac-proAB) F [traD36 proAB+ lacIq lacZM15]
A recombination deficient and suppressing strain used for plasmids and cosmids. The φ80 lacZM15 permits a complementation with the amino terminus of β-galactosidase encoded in pUC vectors.
Hanahan [52]
A suppressing strain commonly used for large scale production of plasmids. It is an E. coli K12 x E. coli B hybrid that is highly transformable.
Boyer and RoullandDussoix [23]
A recombination deficient F− strain that will support growth of vectors carrying amber mutations and will modify but not restrict transfected DNA.
YanischPerron et al., [25]
A recombination deficient strain that will support growth of vectors carrying amber mutations and will modify but not restrict transfected DNA and contains F . The F7prime; allows blue/white screening on X-gal and permits bacteriophage M13 superinfection.
YanischPerron et al., [25]
ED8654
supE supF hsdR metB lacY gal trpR mcrA
A suppressing strain commonly used to propagate bacteriophage λ vectors and their recombinants.
Borck et al., [22]
W3110
inversion between rrnD and rrnE W3110 lacIq L8
Wild-type strain.
Hill and Harnish [57] Brent and Ptashne [24]
supE44 hsdR17 recA1 endA1 gyrA46 thi relA1 lac− F [proAB+ lacIq lacZM15 Tn10(tetr )]
A recombination deficient strain that will support the growth of vectors carrying some amber mutations, but not those with the Sam100 mutation (e.g., λZAP). Transfected DNA is modified but not restricted. The F allows blue/white screening on X-gal and permits bacteriophage M13 superinfection.
BL21(DE3)
DH5α
HB101
JM108
JM109
RB791
XL1 Blue
Genotype − hsdS(r− B mB gal ompT (λcI ts857 indI Sam7 nin5 lacUV5-T7 gene 1
A strain that makes high levels of lac repressor and is used for inducible expression of genes under the control of the lac and tac promoters.
Bullock et al., [27]
29
30 Protein Production in Escherichia coli Table 5 Useful vectors for Escherichia coli and their characteristics
Accession no./ Reference
Plasmid
Resist marker
Relevant features
pBR322
bla (Apr ), tet (Tcr )
Standard plasmid with 50–200 copies per cell
J01749
pKK223–3
bla (Apr )
expression vector, pBR322 derivative, tacPO (trp/lacUV5 hybrid promoter with lac operator), rrnB terminator
VB0070
pUC19
bla (Apr )
Cloning and expression vector (translational fusion to lacZamino-terminal sequence), pBR322 derivative, lacUV5 promoter and lacZα coding region (blue-white screening with X-Gal); suitable with lacZM15 providing host cells, higher copy number due to loss of rop, contains 54-bp multiple cloning site polylinker within lacZα
M77789
pET11
bla (Apr )
Expression vector, pBR322 derivative, suitable for expression by T7 RNA polymerase provided by e.g. BL21(DE3), the T7 derived promoter is under control of the lac operator, lacI encodes the lac repressor
Dubendorff et al., [44]
pJOE2702
bla (Apr )
Expression vector, pBR322 derivative, rhaP (promoter of the rhaBAD operon), rrnB terminator
Wilms et al., [28]
pACYC184 tet (Tcr ), cat (Cmr )
p15A derivative, compatible to pBR322, lower copy number than pBR322
X06403
pACYC177 bla (Apr ), aphA (Kmr )
p15A derivative, compatible to pBR322, lower copy number than pBR322
X06402
pUBS520
aphA (Kmr )
argU augmentation plasmid, pACYC177 derivative, providing also LacI from the lacIq allele
Brinkmann et al., [25]
pSC101
tet (Tcr )
Low copy vector, compatible to pBR322 and p15A derivatives
NC 002056
8 Conclusions Table 6 List of Genes
Gene
Encoded gene product or function
adhE araA,B,D araC,I araE argU (dnaY) atpE cer dnaK,J glyT grpE
alcohol dehydrogenase metabolism, kinase, isomerase, epimerase L-arabinose-dependent regulators L-arabinose-specific transport [AGA/AGG] arginine tRNA5 membrane-bound ATP synthase, subunit c recognition sequence for the site-specific recombinase XerCD HSP-70-type molecular chaperone, with DnaJ chaperone glycine tRNA2 , UGA suppression GrpE heat shock protein; stimulates DnaK ATPase; nucleotide exchange function post-replicational killing by the gene product of the parB system of plasmid R1 Isoleucine tRNA2 and variant integrase lactose-specific β-galactosidase, permease and regulator (repressor) leucine tRNA3 lysine tRNA (multiple loci, lysQTVWYZ) outer membrane protein 3a origin of DNA replication (E. coli chromosome origin of replication) stability locus of plasmid R1 consisting of hok and sok genes pectate lyase of Erwinia carotovora proline tRNA2 enzyme for general recombination and DNA repair; pairing and strand exchange L-rhamnose-specific metabolism, isomerase, kinase, aldolase L-rhamnose-dependent regulators L-rhamnose-specific transport repressor of primer (RNA organizing protein) of ColE1-type plasmids operons encoding ribosomal RNA and tRNAs levan sucrase of Bacillus subtilis suppressor of post-replicational killing by hok gene product of plasmid R1 tmRNA or 10Sa RNA (amber suppression); glutamine tRNA2 (glnV); tandemly duplicated tyrosine tRNA1 (tyrTV) terminus of DNA replication regulator of trp operon and aroH thioredoxin and thioredoxin reductase
hok ileX,Y int lacZ,Y,I leuW lysT ompA ori (oriC) parB pelB proL recA rhaA,B,D rhaR,S rhaT rop (rom) rrnB,D,E sacB sok ssrA supE,F ter trpR trxA,B
L-arabinose-specific
31
32 Protein Production in Escherichia coli
References 1 NEIDHARDT F. C., CURTISS R., INGRAHAM J. L., LIN E. C. C., LOW K. B., MAGASANIK B., REZNIKOFF W. S., RILEY M., SCHAECHTER M., UMBARGER H. E. (1996) Escherichia coli and Salmonella: Cellular and molecular biology, 2nd edition. ASM Press, Washington, DC 2 DEMEREC M., ADELBERG E. A., CLARK A. J., HARTMAN P. E. (1966) A proposal for a uniform nomenclature in bacterial genetics. Genetics 54: 61–76 3 BERLYN M. K. B. (1998) Linkage map of Escherichia coli K-12, edition 10: the traditional map. Microbiol Mol Biol Rev 62: 814–984 4 RUDD K. E. (1998) Linkage map of Escherichia coli K-12, edition 10: the physical map. Microbiol Mol Biol Rev 62: 985–1019 5 BACHMANN B. J. (1972) Pedigrees of some mutant strains of Escherichia coli K-12. Bacteriol Rev 36: 525–557 6 BLATTNER F. R., PLUNKETT G., III, BLOCH C. A., PERNA N. T., BURLAND V., RILEY M., COLLADO-VIDES J., GLASNER J. D., RODE C. K., MAYHEW G. F., GREGOR J., DAVIS N. W., KIRKPATRICK H. A., GOEDEN M. A., ROSE D. J., MAU B., SHAO Y. (1997) The complete genome sequence of Escherichia coli K-12. Science 277: 1453–1474 7 ITOH T., OKAYAMA T., HASHIMOTO H., TAKEDA J., DAVIS R. W., MORI H., GOJOBORI T. (1999) A low rate of nucleotide changes in Escherichia coli K-12 estimated from a comparison of the genome sequences between two different sub-strains. FEBS Lett 450: 72–76 8 PETERS J. E., THATE T. E., CRAIG N. L. (2003) Definition of the Escherichia coli MC4100 genome by use of a DNA array. J Bacteriol 185: 2017–2021 9 RILEY M., SERRES M. H. (2000) Interim report on genomics of Escherichia coli. Annu Rev Microbiol 54: 341–411 10 ZHANG Y., BUCHHOLZ F., MUYRERS J. P., STEWART A. F. (1998) A new logic for DNA engineering using recombination in Escherichia coli. Nat Genet 20: 123–128
11 KOLISNYCHENKO V., PLUNKETT G., III, HERRING C. D., FEHER T., POSFAI J., BLATTNER F. R., POSFAI G. (2002) Engineering a reduced Escherichia coli genome. Genome Res 12: 640–647 12 YEE L., BLANCH H. W. (1992) Recombinant protein expression in high cell density fed-batch cultures of Escherichia coli. Biotechnology (NY) 10: 1550–1556 13 LEE S. Y. (1996) High cell-density culture of Escherichia coli. Trends Biotechnol 14: 98–105 14 RIESENBERG D., GUTHKE R. (1999) High-cell-density cultivation of microorganisms. Appl Microbiol Biotechnol 51: 422–430 15 HEWITT C. J., NEBE-VON CARON G., NIENOW A. W., MCFARLANE C. M. (1999) Use of multi-staining flow cytometry to characterise the physiological state of Escherichia coli W3110 in high cell density fed-batch cultures. Biotechnol Bioeng 63: 705–711 16 ROTHEN S. A., SAUER M., SONNLEITNER B., WITHOLT B. (1998) Growth characteristics of Escherichia coli HB101[pGEc47] on defined medium. Biotechnol Bioeng 58: 92–100 17 ÅKESSON M., HAGANDER P., AXELSSON J. P. (2001) Avoiding acetate accumulation in Escherichia coli cultures using feedback control of glucose feeding. Biotechnol Bioeng 73: 223–230 18 LULI G. W., STROHL W. R. (1990) Comparison of growth, acetate production, and acetate inhibition of Escherichia coli strains in batch and fed-batch fermentations. Appl Environ Microbiol 56: 1004–1011 19 KLEMAN G. L., STROHL W. R. (1994) Acetate metabolism by Escherichia coli in high-cell-density fermentation. Appl Environ Microbiol 60: 3952–3958 20 van de WALLE M., SHILOACH J. (1998) Proposed mechanism of acetate accumulation in two recombinant Escherichia coli strains during high density fermentation. Biotechnol Bioeng 57: 71–78
References 21 CHOU C. H., BENNETT G. N., SAN K. Y. (1994) Effect of modulated glucose uptake on high-level recombinant protein production in a dense Escherichia coli culture. Biotechnol Prog 10: 644–647 22 ARISTIDOU A. A., SAN K. Y., BENNETT G. N. (1999) Metabolic flux analysis of Escherichia coli expressing the Bacillus subtilis acetolactate synthase in batch and continuous cultures. Biotechnol Bioeng 63: 737–749 23 BETLACH M., HERSHFIELD V., CHOW L., BROWN W., GOODMAN H., BOYER H. W. (1976) A restriction endonuclease analysis of the bacterial plasmid controlling the EcoRI restriction and modification of DNA. Fed Proc 35: 2037–2043 24 TOMIZAWA J. (1990) Control of ColE1 plasmid replication. Interaction of Rom protein with an unstable complex formed by RNA I and RNA II. J Mol Biol 212: 695–708 25 YANISCH-PERRON C., VIEIRA J., MESSING J. (1985) Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33: 103–119 26 YAVACHEV L., IVANOV I. (1988) What does the homology between E. coli tRNAs and RNAs controlling ColE1 plasmid replication mean?. J Theor Biol 131: 235–241 27 GRABHERR R., NILSSON E., STRIEDNER G., BAYER K. (2002) Stabilizing plasmid copy number to improve recombinant protein production. Biotechnol Bioeng 77: 142–147 28 WILMS B., HAUCK A., REUSS M., SYLDATK C., MATTES R., SIEMANN M., ALTENBUCHNER J. (2001a) High-cell-density fermentation for production of L-N-carbamoylase using an expression system based on the Escherichia coli rhaBAD promoter. Biotechnol Bioeng 73: 95–103 29 SUMMERS D. (1998) Timing, self-control and a sense of direction are the secrets of multicopy plasmid stability. Mol Microbiol 29: 1137–1145 30 COLLOMS S. D., MCCULLOCH R., GRANT K., NEILSON L., SHERRATT D. J. (1996) Xer-mediated site-specific
31
32
33
34
35
36
37
38
39
40
41
recombination in vitro. EMBO J 15: 1172–1181 SHARPE M. E., CHATWIN H. M., MACPHERSON C., WITHERS H. L., SUMMERS D. K. (1999) Analysis of the CoIE1 stability determinant Rcd. Microbiology 145: 2135–2144 NAGEL J. H., GULTYAEV A. P., GERDES K., PLEIJ C. W. (1999) Metastable structures and refolding kinetics in hok mRNA of plasmid R1. RNA 5: 1408–1418 CHANG A. C., COHEN S. N. (1978) Construction and characterization of amplifiable multicopy DNA cloning vehicles derived from the P15A cryptic miniplasmid. J Bacteriol 134: 1141–1156 SCHOLZ P., HARING V., WITTMANN-LIEBOLD B., ASHMAN K., BAGDASARIAN M., SCHERZINGER E. (1989) Complete nucleotide sequence and gene organization of the broad-host-range plasmid RSF1010. Gene 75: 271–288 TAIT R. C., BOYER H. W. (1978) Restriction endonuclease mapping of pSC101 and pMB9. Mol Gen Genet 164: 285–288 SCOTT H. N., LAIBLE P. D., HANSON D. K. (2003) Sequences of versatile broad-host-range vectors of the RK2 family. Plasmid 50: 74–79 JONES K. L., KEASLING J. D. (1998) Construction and characterization of F plasmid-based expression vectors. Biotechnol Bioeng 59: 659–665 WILMS B., WIESE A., SYLDATK C., MATTES R., ALTENBUCHNER J. (2001b) Development of an Escherichia coli whole cell biocatalyst for the production of L-amino acids. J Biotechnol 86: 19–30 MARTIN V. J., SMOLKE C. D., KEASLING J. D. (2002) Redesigning cells for production of complex organic molecules. ASM News 68: 336–343 LINK A. J., PHILLIPS D., CHURCH G. M. (1997) Methods for generating precise deletions and insertions in the genome of wild-type Escherichia coli: application to open reading frame characterization. J Bacteriol 179: 6228–6237 ATLUNG T., NIELSEN A., RASMUSSEN L. J., NELLEMANN L. J., HOLM F. (1991) A versatile method for integration of genes and gene fusions into the lambda
33
34 Protein Production in Escherichia coli
42
43
44
45
46
47
48
49
50
51
52
attachment site of Escherichia coli. Gene 107: 11–17 DEHASETH P. L., ZUPANCIC M. L., RECORD M. T., Jr. (1998) RNA polymerase-promoter interactions: the comings and goings of RNA polymerase. J Bacteriol 180: 3019–3025 KAMMERER W., DEUSCHLE U., GENTZ R., BUJARD H. (1986) Functional dissection of Escherichia coli promoters: information in the transcribed region is involved in late steps of the overall process. EMBO J 5: 2995–3000 MOONEY R. A., ARTSIMOVITCH I., LANDICK R. (1998) Information processing by RNA polymerase: recognition of regulatory signals during RNA chain elongation. J Bacteriol 180: 3265–3275 DUBENDORFF J. W., STUDIER F. W. (1991) Controlling basal expression in an inducible T7 expression system by blocking the target T7 promoter with lac repressor. J Mol Biol 219: 45–59 SAGAWA H., OHSHIMA A., KATO I. (1996) A tightly regulated expression system in Escherichia coli with SP6 RNA polymerase. Gene 168: 37–41 SAWERS G., JARSCH M. (1996) Alternative regulation principles for the production of recombinant proteins in Escherichia coli. Appl Microbiol Biotechnol 46: 1–9 O’HALLORAN T., WALSH C. (1987) Metalloregulatory DNA-binding protein encoded by the merR gene: isolation and characterization. Science 235: 211–214 BROSIUS J., ERFLE M., STORELLA J. (1985) Spacing of the −10 and −35 regions in the tac promoter. Effect on its in vivo activity. J Biol Chem 260: 3539–3541 OEHLER S., AMOUYAL M., KOLKHOF P., WILCKEN-BERGMANN B., M¨ULLER-HILL B. (1994) Quality and position of the three lac operators of E. coli define efficiency of repression. EMBO J 13: 3348–3355 CALOS M. P. (1978) DNA sequence for a low-level promoter of the lac repressor gene and an ‘up’ promoter mutation. Nature 274: 762–765 AMANN E., OCHS B., ABEL K. J. (1988) Tightly regulated tac promoter vectors useful for the expression of unfused and fused proteins in Escherichia coli. Gene 69: 301–315
53 HASAN N., SZYBALSKI W. (1995) Construction of lacIts and lacIqts expression plasmids and evaluation of the thermosensitive lac repressor. Gene 163: 35–40 54 ANDREWS B., ADARI H., HANNIG G., LAHUE E., GOSSELIN M., MARTIN S., AHMED A., FORD P. J., HAYMAN E. G., MAKRIDES S. C. (1996) A tightly regulated high level expression vector that utilizes a thermosensitive lac repressor: production of the human Tcell receptor V beta 5.3 in Escherichia coli. Gene 182: 101–109 55 LANZER M., BUJARD H. (1988) Promoters largely determine the efficiency of repressor action. Proc Natl Acad Sci USA 85: 8973–8977 56 STUDIER F. W., MOFFATT B. A. (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol 189: 113–130 57 REMAUT E., STANSSENS P., FIERS W. (1981) Plasmid vectors for high-efficiency expression controlled by the PL promoter of coliphage lambda. Gene 15: 81–93 58 HASAN N., SZYBALSKI W. (1987) Control of cloned gene expression by promoter inversion in vivo: construction of improved vectors with a multiple cloning site and the ptac promoter. Gene 56: 145–151 59 BASS S. H., YANSURA D. G. (2000) Application of the E. coli trp promoter. Mol Biotechnol 16: 253–260 60 SCHLEIF R. (1996) Two positively regulated systems, ara and mal. In: Escherichia coli and Salmonella: Cellular and molecular biology (NEIDHARDT F. C., et al., Eds). ASM Press, Washington, DC, Vol 1, pp 1300–1309 61 GALLEGOS M. T., SCHLEIF R., BAIROCH A., HOFMANN K., RAMOS J. L. (1997) AraC/XylS family of transcriptional regulators. Microbiol Mol Biol Rev 61: 393–410 62 GUZMAN L. M., BELIN D., CARSON M. J., BECKWITH J. (1995) Tight regulation, modulation, and high-level expression by vectors containing the arabinose pBAD promoter. J Bacteriol 177: 4121–4130
References 63 SIEGELE D. A., HU J. C. (1997) Gene expression from plasmids containing the araBAD promoter at subsaturating inducer concentrations represents mixed populations. Proc Natl Acad Sci USA 94: 8168–8172 64 BADIA J., BALDOMA L., AGUILAR J., BORONAT A. (1989) Identification of the rhaA, rhaB and rhaD gene products from Escherichia coli K-12. FEMS Microbiol Lett 53: 253–257 65 VIA P., BADIA J., BALDOMA L., OBRADORS N., AGUILAR J. (1996) Transcriptional regulation of the Escherichia coli rhaTgene. Microbiology 142: 1833–1840 66 HALDIMANN A., DANIELS L. L., WANNER B. L. (1998) Use of new methods for construction of tightly regulated arabinose and rhamnose promoter fusions in studies of the Escherichia coli phosphate regulon. J Bacteriol 180: 1277–1286 67 ETCHEGARAY J. P., INOUYE M. (1999) Translational enhancement by an element downstream of the initiation codon in Escherichia coli. J Biol Chem 274: 10079–10085 68 de SMIT M. H., van DUIN J. (1990) Secondary structure of the ribosome binding site determines translational efficiency: a quantitative analysis. Proc Natl Acad Sci USA 87: 7668–7672 69 SCHONER B. E., BELAGAJE R. M., SCHONER R. G. (1990) Enhanced translational efficiency with two-cistron expression system. Methods Enzymol 185: 94–103 70 SUNG W. L., LUK C. K., ZAHAB D. M., BARBIER J. R., LAFONTAINE M., WILLICK G. E. (1991) Internal ribosome-binding site directs expression of parathyroid hormone analogue (8–84) in Escherichia coli. Biochem Biophys Res Commun 181: 481–485 71 BRINKMANN U., MATTES R. E., BUCKEL P. (1989) High-level expression of recombinant genes in Escherichia coli is dependent on the availability of the dnaYgene product. Gene 85: 109–114 72 TU G. F., REID G. E., ZHANG J. G., MORITZ R. L., SIMPSON R. J. (1995) C-terminal extension of truncated recombinant proteins in Escherichia coli
73
74
75
76
77
78
79
80
81
with a 10Sa RNA decapeptide. J Biol Chem 270: 9322–9326 ZAHN K. (1996) Overexpression of an mRNA dependent on rare codons inhibits protein synthesis and cell growth. J Bacteriol 178: 2926–2933 KEILER K. C., WALLER P. R., SAUER R. T. (1996) Role of a peptide tagging system in degradation of proteins synthesized from damaged messenger RNA. Science 271: 990–993 MUTO A., USHIDA C., HIMENO H. (1998) A bacterial RNA that functions as both a tRNA and an mRNA. Trends Biochem Sci 23: 25–29 DONG H., NILSSON L., KURLAND C. G. (1996) Covariation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J Mol Biol 260: 649–663 ROBINSON M., LILLEY R., LITTLE S., EMTAGE J. S., YARRANTON G., STEPHENS P., MILLICAN A., EATON M., HUMPHREYS G. (1984) Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic Acids Res 12: 6663–6671 BONEKAMP F., ANDERSEN H. D., CHRISTENSEN T., JENSEN K. F. (1985) Codon-defined ribosomal pausing in Escherichia coli detected by using the pyrE attenuator to probe the coupling between transcription and translation. Nucleic Acids Res 13: 4113–4123 SPANJAARD R. A., CHEN K., WALKER J. R., van DUIN J. (1990) Frameshift suppression at tandem AGA and AGG codons by cloned tRNA genes: assigning a codon to argU tRNA and T4 tRNA(Arg). Nucleic Acids Res 18: 5031–5036 CHEN G. F., INOUYE M. (1990) Suppression of the negative effect of minor arginine codons on gene expression; preferential usage of minor codons within the first 25 codons of the Escherichia coli genes. Nucleic Acids Res 18: 1465–1473 CALDERONE T. L., STEVENS R. D., OAS T. G. (1996) High-level misincorporation of lysine for argi-nine at AGA codons in a fusion protein expressed in Escherichia coli. J Mol Biol 262: 407–412
35
36 Protein Production in Escherichia coli
82 FORMAN M. D., STACK R. F., MASTERS P. S., HAUER C. R., BAXTER S. M. (1998) High level, context dependent misincorporation of lysine for arginine in Saccharomyces cerevisiae a1 homeodomain expressed in Escherichia coli. Protein Sci 7: 500–503 83 YOU J., COHEN R. E., PICKART C. M. (1999) Construct for high-level expression and low misin-corporation of lysine for arginine during expression of pET-encoded eukaryotic proteins in Escherichia coli. Biotechniques 27: 950–954 84 DEL TITO B. J., Jr., WARD J. M., HODGSON J., GERSHATER C. J., EDWARDS H., WYSOCKI L. A., WATSON F. A., SATHE G., KANE J. F. (1995) Effects of a minor isoleucyl tRNA on heterologous protein translation in Escherichia coli. J Bacteriol 177: 7086–7091 85 BACA A. M., HOL W. G. (2000) Overcoming codon bias: a method for high-level overexpression of Plasmodium and other AT-rich parasite genes in Escherichia coli. Int J Parasitol 30: 113–118 86 ZDANOVSKY A. G., ZDANOVSKAIA M. V. (2000) Simple and efficient method for heterologous expression of clostridial proteins. Appl Environ Microbiol 66: 3166–3173 87 SORENSEN H. P., SPERLING-PETERSEN H. U., MORTENSEN K. K. (2003) Production of recombinant thermostable proteins expressed in Escherichia coli: completion of protein synthesis is the bottleneck. J Chromatogr B Analyt Technol Biomed Life Sci 786: 207–214 88 PAN W., RAVOT E., TOLLE R., FRANK R., MOSBACH R., TURBACHOVA I., BUJARD H. (1999) Vaccine candidate MSP-1 from Plasmodium falciparum: a redesigned 4917 bp polynucleotide enables synthesis and isolation of full-length protein from Escherichia coli and mammalian cells. Nucleic Acids Res 27: 1094–1103 89 WU X., JORNVALL H., BERNDT K. D., OPPERMANN U. (2004) Codon optimization reveals critical factors for high level expression of two rare codon genes in Escherichia coli: RNA stability
90
91
92
93
94
95
96
97
98
99
100
and secondary structure but not tRNA abundance. Biochem Biophys Res Commun 313: 89–96 B¨OCK A., FORCHHAMMER K., HEIDER J., LEINFELDER W., SAWERS G., VEPREK B., ZINONI F. (1991) Selenocysteine: the 21st amino acid. Mol Microbiol 5: 515–520 MACBEATH G., KAST P. (1998) UGA read-through artifacts – when popular gene expression systems need a pATCH. BioTechniques 24: 789–794 POOLE E. S., BROWN C. M., TATE W. P. (1995) The identity of the base following the stop codon determines the efficiency of in vivo translational termination in Escherichia coli. EMBO J 14: 151–158 KUSHNER S. R. (2002) mRNA decay in Escherichia coli comes of age. J Bacteriol 184: 4658–4665 COHEN S. N., MCDOWALL K. J. (1997) RNase E: still a wonderfully mysterious enzyme. Mol Microbiol 23: 1099–1106 BAKER K. E., MACKIE G. A. (2003) Ectopic RNase E sites promote bypass of 5 -end-dependent mRNA decay in Escherichia coli. Mol Microbiol 47: 75–88 ARNOLD T. E., YU J., BELASCO J. G. (1998) mRNA stabilization by the ompA 5 untranslated region: two protective elements hinder distinct pathways for mRNA degradation. RNA 4: 319–330 KAGA N., UMITSUKI G., CLARK D. P., NAGAI K., WACHI M. (2002) Extensive overproduction of the AdhE protein by rng mutations depends on mutations in the cra gene or in the Crabox of the adhE promoter. Biochem Biophys Res Commun 295: 92–97 LILIE H., SCHWARZ E., RUDOLPH R. (1998) Advances in refolding of proteins produced in E. coli. Curr Opin Biotechnol 9: 497–501 SPEED M. A., WANG D. I. C., KING J. (1996) Specific aggregation of partially folded polypeptide chains: The molecular basis of inclusion body composition. Nat Biotechnol 14: 1283–1287 LESLEY S. A., GRAZIANO J., CHO C. Y., KNUTH M. W., KLOCK H. E. (2002) Gene expression response to misfolded protein as a screen for soluble
References
101
102
103
104
105
106
107
108
109
110
re-combinant protein. Protein Eng 15: 153–160 NISHIHARA K., KANEMORI M., YANAGI H., YURA T. (2000) Overexpression of trigger factor prevents aggregation of recombinant proteins in Escherichia coli. Appl Environ Microbiol 66: 884–889 LAVALLIE E. R., MCCOY J. M. (1995) Gene fusion expression systems in Escherichia coli. Curr Opin Biotechnol 6: 501–506 HIREL P. H., SCHMITTER M. J., DESSEN P., FAYAT G., BLANQUET S. (1989) Extent of N-terminal methionine excision from Escherichia coli proteins is governed by the side-chain length of the penultimate amino acid. Proc Natl Acad Sci USA 86: 8247–8251 BEN-BASSAT A. (1991) Methods for removing N-terminal methionine from recombinant proteins. Bioprocess Technol 12: 147–159 BEN-BASSAT A., BAUER K., CHANG S. Y., MYAMBO K., BOOSMAN A., CHANG S. (1987) Processing of the initiation methionine from proteins: properties of the Escherichia coli methionine aminopeptidase and its gene structure. J Bacteriol 169: 751–757 SANDMAN K., GRAYLING R. A., REEVE J. N. (1995) Improved N-terminal processing of recombinant proteins synthesized in Escherichia coli. Biotechnology (NY) 13: 504–506 RAIVIO T. L., SILHAVY T. J. (2001) Periplasmic stress and ECF sigma factors. Annu Rev Microbiol 55: 591–624 BESSETTE P. H., ASLUND F., BECKWITH J., GEORGIOU G. (1999) Efficient folding of proteins with multiple disulfide bonds in the Escherichia coli cytoplasm. Proc Natl Acad Sci USA 96: 13703–13708 KUROKAWA Y., YANAGI H., YURA T. (2001) Overproduction of bacterial protein disulfide isomerase (DsbC) and its modulator (DsbD) markedly enhances periplasmic production of human nerve growth factor in Escherichia coli. J Biol Chem 276: 14393–14399 HAYHURST A., HARRIS W. J. (1999) Escherichia coli skp chaperone coexpression improves solubility and phage display of single-chain antibody
111
112
113
114
115
116
117
118
119
120
121
fragments. Protein Expr Purif 15: 336–343 ARIE J. P., SASSOON N., BETTON J. M. (2001) Chaper-one function of FkpA, a heat shock prolyl isomerase, in the periplasm of Escherichia coli. Mol Microbiol 39: 199–210 PAN K. L., HSIAO H. C., WENG C. L., WU M. S., CHOU C. P. (2003) Roles of DegP in prevention of protein misfolding in the periplasm upon overexpression of penicillin acylase in Escherichia coli. J Bacteriol 185: 3020–3030 BERKS B. C., SARGENT F., PALMER T. (2000) The Tat protein export pathway. Mol Microbiol 35: 260–274 DELISA M. P., SAMUELSON P., PALMER T., GEORGIOU G. (2002) Genetic analysis of the twin arginine translocator secretion pathway in bacteria. J Biol Chem 277: 29825–29831 IGNATOVA Z., HORNLE C., NURK A., KASCHE V. (2002) Unusual signal peptide directs penicillin amidase from Escherichia coli to the Tat translocation machinery. Biochem Biophys Res Commun 291: 146–149 KADOKURA H., KATZEN F., BECKWITH J. (2003) Protein disulfide bond formation in prokaryotes. Annu Rev Biochem 72: 111–135 LEE S. Y., CHOI J. H., XU Z. (2003) Microbial cell-surface display. Trends Biotechnol 21: 45–52 KIM E. J., YOO S. K. (1999) Cell surface display of hepatitis B virus surface antigen by using Pseudomonas syringae ice nucleation protein. Lett Appl Microbiol 29: 292–297 SHOKRI A., SANDEN A. M., LARSSON G. (2003) Cell and process design for targeting of recombinant protein into the culture medium of Escherichia coli. Appl Microbiol Biotechnol 60: 654–664 LEDERBERG J., ST. CLAIR J. (1958) Protoplasts and L-type growth of Escherichia coli. J Bacteriol 75: 143–160 RIPPMANN J. F., KLEIN M., HOISCHEN C., BROCKS B., RETTIG W. J., GUMPERT J., PFIZENMAIER K., MATTES R., MOOSMAYER D. (1998) Procaryotic expression of single-chain variable-fragment (scFv) antibodies: secretion in L-form cells of
37
38 Protein Production in Escherichia coli
122
123
124
125
126
127
128
Proteus mirabilis leads to active product and overcomes the limitations of periplasmic expression in Escherichia coli. Appl Environ Microbiol 64: 4862–4869 MATTES R. (2001) The production of improved tissue-type plasminogen activator in Escheri-chia coli. Semin Thromb Hemost 27: 325–336 RUDOLPH R., LILIE H. (1996) In vitro folding of inclusion body proteins. FASEB J 10: 49–56 DATAR R. V., CARTWRIGHT T., ROSEN C. G. (1993) Process economics of animal cell and bacterial fermentations: a case study analysis of tissue plasminogen activator. Biotechnology (NY) 11: 349–357 SCHAFFNER J., WINTER J., RUDOLPH R., SCHWARZ E. (2001) Cosecretion of chaperones and low-molecular-size medium additives increases the yield of recombinant disulfide-bridged proteins. Appl Environ Microbiol 67: 3994–4000 WANG L., BROCK A., HERBERICH B., SCHULTZ P. G. (2001) Expanding the genetic code of Escherichia coli. Science 292: 498–500 D¨ORING V., MOOTZ H. D., NANGLE L. A., HENDRICKSON T. L., CRECY-LAGARD V., SCHIMMEL P., MARLIERE P. (2001) Enlarging the amino acid set of Escherichia coli by infiltration of the valine coding pathway. Science 292: 501–504 ZHANG Z., GILDERSLEEVE J., YANG Y. Y., XU R., LOO J. A., URYU S., WONG C. H., SCHULTZ P. G. (2004) A new strategy for
129
130
131
132
133
134
135
136
the synthesis of glycoproteins. Science 303: 371–373 GILL R. T., DELISA M. P., VALDES J. J., BENTLEY W. E. (2001) Genomic analysis of high-cell-density recombinant Escherichia coli fermentation and ‘cell conditioning’ for improved recombinant protein yield. Biotechnol Bioeng 72: 85–95 CAUSEY T. B., SHANMUGAM K. T., YOMANO L. P., INGRAM L. O. (2004) Engineering Escherichia coli for efficient conversion of glucose to pyru-vate. Proc Natl Acad Sci USA 101: 2235–2240 HANAHAN D. (1983) Studies on transformation of Escherichia coli with plasmids. J Mol Biol 166: 557–580 BOYER H. W., ROULLAND-DUSSOIX D. (1969) A complementation analysis of the restriction and modification of DNA in Escherichia coli. J Mol Biol 41: 459–472 BORCK K., BEGGS J. D., BRAMMAR W. J., HOPKINS A. S., MURRAY N. E. (1976) The construction in vitro of transducing derivatives of phage lambda. Mol Gen Genet 146: 199–207 HILL C. W., HARNISH B. W. (1981) Inversions between ribosomal RNA genes of Escherichia coli. Proc Natl Acad Sci USA 78: 7069–7072 BRENT R., PTASHNE M. (1981) Mechanism of action of the lexA gene product. Proc Natl Acad Sci USA 78: 4204–4208 BULLOCK W. O., FERNANDEZ J. M., SHORT J. M. (1987) XL1-Blue: A high efficiency plasmid transforming recA Escherichia coli strain with beta-galactosidase selection. Biotechniques 5: 376
1
Protein Production in Pseudomonas fluorescens Lawrence C. Chew, Tom M. Ramseier, Diane M. Retallack, Jane C. Schneider, Charles H. Squires, and Henry W. Talbot The Dow Chemical Company, San Diego, USA
Originally published in: Production of Recombinant Proteins. Edited by Gerd Gellissen. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31036-4
1 Introduction
The rapid expansion of the biotechnology industry in recent years has necessitated the expression of a wide spectrum of recombinant proteins in different host systems for a wide variety of purposes. In some applications, a large number of proteins are needed in small quantities for screening applications or structural determinations. In other cases, quantities approaching the metric tonne scale are needed for specific therapeutic applications. The majority of therapeutic proteins have been produced in either mammalian cell-culture systems, with Chinese hamster ovary (CHO) cells representing the most common mammalian cell system in use, or in microbial systems, Escherichia coli being the most common. A variety of alternative expression systems are also being used or developed and evaluated. These include other mammalian (human) cell lines, fungi, yeast, insect, and bacterial host cell systems. While each of these systems is being developed with very specific system advantages in mind, it is not clear which of them will ultimately be the most useful for therapeutic protein production. It is, in fact, likely that several of these systems will reach broad acceptance in particular niches in the protein production industry. The efficient discovery of efficacious new protein pharmaceuticals is essential if medicine is to meet its challenge of treating diseases. However, the ability to manufacture pharmaceuticals economically has rapidly become as important as overall drug development costs continue to escalate. Production technologies that yield high titers of protein in active form are key. Also, proteins must often be posttranslationally modified to achieve their intended performance as therapeutics. Many of the proteins in development today are monoclonal antibodies that present
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Production in Pseudomonas fluorescens
special production challenges in order to maximize recovery of active material for high dosage demands. E. coli was the first heterologous host of any kind used to produce a recombinant DNA-based pharmaceutical, this being carried out in 1982 by the pharmaceutical company Eli Lilly to produce human insulin. Although E. coli has been the dominant host (in terms of economic value) for the production of protein pharmaceuticals [1], in recent years it has been overtaken by mammalian cell production. This has been driven by the need to produce more complex proteins, in particular, antibodies. In many cases (e. g., insulin), E. coli is unable to fold these products into their correct conformation, instead producing unfolded, inactive, intracellular protein masses called inclusion bodies. Although the recombinant insulin example suggests potential economic feasibility for protein production in E. coli, many producers are unwilling to implement an in-vitro protein folding process for these complex products. In addition, E. coli can neither synthesize nor attach mammalian glycosylation chains which may be required for functionality or pharmacokinetic properties, although for many products and applications glycosylation is not required, such as for antibodies in which only binding function is required. Another bottleneck to protein pharmaceutical production can be the high-level expression of the target in a cell line or host organism. A Pseudomonas fluorescensbased manufacturing platform for high-yield production of protein pharmaceuticals has been developed based on P. fluorescens biovar I strain MB101 [2]. The system’s performance is due to the combination of a robust host strain and the availability of extensive molecular biology and bioinformatics tools. These tools include a range of stable plasmid vectors of various copy numbers, non-antibioticdependent plasmid maintenance, engineered host strains for stringent control of gene expression, as well as the capability of exporting proteins to the periplasmic space. A well-optimized, high-cell-density (HCD) fermentation process completes the utility of the system. Soluble and active protein yields in excess of 25 g L−1 with a variety of protein types ranging from enzymes for industrial use to proteins for pharmaceutical applications have been achieved. Production conditions are routinely optimized in 20-L fermentors, and have been successfully scaled up for the commercial production of a number of heterologous proteins.
2 Biology of Pseudomonas fluorescens
The species P. fluorescens is an abundant and natural component of the microbial flora of soil, water and plants, inhabiting plant rhizosphere and phyllosphere environments [3]. Members of the species P. fluorescens are aerobic, saprophytic, Gram-negative organisms that are not pathogenic (they do not cause a disease state) in plants [4], mammals or immunocompetent humans [5]. They grow at neutral pH, and in the mesophilic temperature range (optimally between 25 and 30 ◦ C), and do not grow above 40◦ C or under acid conditions (pH <4.5) [3, 6]. Although
3 History and Taxonomy of Pseudomonas fluorescens Strain Biovar I MB101 3
growth of P. fluorescens is restricted at higher temperatures, this organism has the ability to grow at 4◦ C [7]. Strains of P. fluorescens are commonly found on plant surfaces, as well as in decaying vegetation, soil, and water [4]. They can also grow on a wide range of very simple organic substrates and remain viable for long periods of time in a wide variety of habitats. The ubiquitous nature of P. fluorescens on the surface of plants typically grown for human consumption suggests that this bacterium has been widely consumed by humans for many years. The American Type Culture Collection (ATCC) has designated strains of P. fluorescens under Biosafety Level 1 (BSL-1), which is defined as “. . . having no known potential to cause disease in humans or animals.” (see Appendix Table A1). P. fluorescens has been used in recent years in a variety of industrial applications, including for ice nucleation (for snow making and minimization of frost damage to plants) [8–10], to produce biological pesticides [11, 12], and for the control of diseases in the phyllosphere of plants [10]. The ability of P. fluorescens to catabolize a wide variety of natural and synthetic compounds (e.g., chlorinated aliphatic hydrocarbons) and to produce a variety of enzymes has led to its wide use in bioremediation and biocatalysis [13].
3 History and Taxonomy of Pseudomonas fluorescens Strain Biovar I MB101
Pseudomonas fluorescens strain MB101 was isolated in 1984 from the surface of a lettuce leaf located on a farm in San Diego County, California. Morphologic characteristics for MB101 are consistent with those identified in Bergey’s Manual [6] for P. fluorescens: Ĺ Ĺ Ĺ Ĺ
Straight or slightly curved rods, but not helical Motile by one or several polar flagella 0.5–1.0 µm in diameter by 1.5–5.0 µm in length The cells do not produce prosthecae, and are not surrounded by sheaths
The phylogeny of strain MB101 was established by thorough characterization of its morphological, biochemical, and genotypic properties. The results from a series of phenotypic tests confirmed that strain MB101 belongs to P. fluorescens biotype A – that is, biovar I. The tests included API rapid NFT (bio Merieux-Vitek, Hazelwood, MD USA), production of levan and phenazine, ability to denitrify and liquefy gelatin, utilization of trehalose and numerous other carbon sources, and analysis of fatty acid methyl esters [14, 15]. Genotypic analyses were performed using the 16S rRNA genes of strain MB101 and other selected strains from the genus Pseudomonas. Analyses of the three hypervariable regions within the 16S rRNA genes [16] corroborated the findings of the phenotypic analyses that identify strain MB101 as P. fluorescens biovar I.
4 Protein Production in Pseudomonas fluorescens
The taxonomic position of strain MB101 is therefore as follows:
P. fluorescens MB101 has been used as a host for production of several EPAregistered bioinsecticides [17–19] and a generally-recognized-as-safe (GRAS) αamylase preparation [2, 20]. Therefore, the safety of the industrial application of P. fluorescens biovar I strain MB101 is well established. 4 Cultivation
While P. fluorescens can grow well in common laboratory media, such as Luria Broth and M9 minimal medium, its nutrient, growth temperature and oxygen requirements have to be taken into careful consideration in order to achieve high recombinant expression levels at high cell densities. As with other pseudomonads [21, 22], P. fluorescens can be cultivated to high cell densities in bioreactors with simple but balanced defined mineral salts media supplemented with an inorganic nitrogen source such as ammonia, and a carbon source such as glucose. The optimal temperature for P. fluorescens growth is 32 ◦ C. Being a strict aerobe, only adequate oxygen transfer to the culture will ensure better growth. Nevertheless, from our experience, maintaining dissolved oxygen above a certain level in the bioreactor (e.g., 20%) does not appear to be as critical in P. fluorescens as in E. coli. P. fluorescens MB101, being a prototrophic strain, obviates the need of any organic nitrogen supplementation, which decreases the risk of performance variability that is accompanied by dependence on crude hydrolysates. Unlike E. coli, P. fluorescens does not accumulate acetate during fermentation. Common to other pseudomonads, glucose uptake is preferentially through the oxidative rather than the phosphorylative pathway [23, 24]. Using optimal fermentation conditions and carbon feeding, biomass levels of greater than 100 g L−1 , accompanied by recombinant protein expression levels of more than 30% total cell protein, can be achieved by P. fluorescens at production scales without the need for oxygen supplementation. 5 Genomics and Functional Genomics of P. fluorescens Strain MB101
The genome sequence of P. fluorescens strain MB214 (a derivative of MB101 with a chromosomal insertion of the E. coli lac operon) was determined to aide the
5 Genomics and Functional Genomics of P. fluorescens Strain MB101 5
development of MB214 into a production platform strain for the manufacture of proteins, peptides, and metabolites. Knowledge of the entire genome allows directed changes to individual genes or groups of genes to be performed quickly, thus enabling sophisticated pathway engineering and enhancement of specific gene expression in P. fluorescens (see below). P. fluorescens MB214 is also easy to manipulate genetically, thereby enhancing this capability. The genome sequence was determined by random shotgun sequencing of three different DNA libraries, varying the type of vector and DNA insert size. The DNA sequences were assembled into overlapping reads using the Phred software [25, 26]. The genome of strain MB214 is 6.5 megabases, similar in size to other pseudomonads for which the genome sequences have already been established [27–29]. No indigenous plasmids were identified in strain MB214. The G+C content was calculated as 60%, which is slightly lower than the G+C content reported for P. aeruginosa PAO1 [27]. Over 6200 open reading frames (ORFs) were identified using the gene finder tools Generation (Genomix Corporation, Oak Ridge, TN, USA), Critica [30], and Glimmer [31] and the Glimmer post-processor, RBSfinder [32]. Gene functions were assigned based on BLAST [33] and FASTA [34] homology, as well as hits against COGs [35] and InterPro [36]. Reconstruction of the metabolism based on the annotated genes reveals that metabolically, P. fluorescens is very versatile. Over 700 pathways were identified containing numerous, sometimes-redundant pathways for the same metabolic reaction (Table 1). Glucose uptake is achieved through two different routes, a phosphorylative and an oxidative pathway that are employed at different glucose concentrations. Unlike E. coli, catabolism of this inexpensive carbon source utilizes the Entner–Doudoroff pathway and not the Embden–Meyerhof–Parnas glycolytic pathway. This is due to an absence of the gene encoding phosphofructokinase. All the genes necessary for a fully functional Krebs cycle – including the glyoxylate shunt pathway – are present for efficient glucose catabolism. P. fluorescens has the “genomic power” to synthesize and degrade all 20 amino acids. Amino acid synthesis can occur via multiple routes for homoserine, methionine, glutamate, and homocysteine. P. fluorescens also contains two glutamate tRNA synthetases, in contrast to E. coli and Bacillus subtilis, both of which only have one corresponding enzyme. Although there is a high level of genome conservation between P. fluorescens and the pathogenic P. aeruginosa PAO1, key virulence factors such as exotoxin A and PrpL proteinase are absent from the established genome sequence of P. fluorescens MB214. Other members of the species P. fluorescens whose genomes have been sequenced – for example, strains Pf0–1 [37] and SBW25 [38] – are also missing these factors. Bioinformatics analyses of the MB214 genome to uncover protein secretion systems indicated that all but one known microbial protein export system (the type IV secretion system) are present in MB214. The absent secretion system is commonly plasmid-borne, and no plasmids have been found in MB214. For most of the present export systems, multiple paralogous protein exporters exist, which may have arisen because of the saprophytic lifestyle of this microbe. A comparison of the genomes of P. aeruginosa PAO1 and the two P. fluorescens strains MB214
6 Protein Production in Pseudomonas fluorescens Table 1 Overview of number of pathways for each metabolic category
Category
Pathways(n)
Amino acid metabolism Aromatic hydrocarbons Carbohydrate metabolism Coenzymes and vitamins Electron transport Halide metabolism Hydrocarbon metabolism Hydrogen metabolism Intracellular transport Lipid metabolism Membrane transport Metabolism of oxygen/radicals Nitrogen metabolism Nucleic acids metabolism One-carbon metabolism Phosphate metabolism Protein metabolism Purine metabolism Pyrimidine metabolism Signal transduction Sulfur metabolism
164 27 139 38 21 1 1 1 1 29 124 6 3 28 9 7 50 42 36 6 5
and Pf0–1 revealed that the same protein secretion systems are present in all three pseudomonads, differing only in the number of paralogous exporters [39]. The presence of multiple protein secretion systems opens avenues to genetically engineered production strains capable of secreting the desired protein products outside of the bacterial cells into the growth medium for easy and cost-effective protein recovery. Knowledge of the genome sequence also allows establishment of functional genomics capabilities such as DNA microarray and proteomics for gene and protein expression analysis, respectively. These capabilities are useful to monitor cellular metabolic conditions during the growth and protein production phases of fermentation. For the gene expression capability, over 6200 50-mer oligodeoxyribonucleotides (oligos) were manufactured that represent each ORF of the P. fluorescens MB214 genome. The oligos were designed to exhibit consistent hybridization temperature, to lack strong secondary structure, and to show minimal crosshybridization with other genes in the MB214 genome. All oligos are printed twice on epoxy-coated glass slides to increase the confidence of the obtained gene expression data. The printed microarrays are then hybridized with control and experimental labeled RNA samples, and their hybridization pattern is analyzed using software tools from BioDiscovery (El Segundo, CA, USA) and SpotFire Inc. (Somerville, MA, USA). The complementary proteomics capability involves separation of proteins using one- or two-dimensional SDS–polyacrylamide gel electrophoresis (SDSPAGE), subsequent robotic protein spot cutting, automated proteolysis of the
6 Core Expression Platform for Heterologous Proteins
proteins to yield peptides, and matrix-assisted laser desorption/ionization time-offlight (MALDI-TOF) mass spectrometric analysis of the obtained peptide to identify P. fluorescens proteins. Both the transcriptional profiling and proteomics technologies are used as analytical tools to provide information about the metabolic state of the host cells, as well as the temporal profile of the gene transcripts and protein production during fermentation runs. The proteomics capability furthermore supplies information about the yield and stability of the overproduced protein as well as the processing sites of signal sequences that target proteins into the periplasm. Additionally, proteomics can be used to direct protein purification processes by identifying co-purifying proteins. This may lead to altering current purification processes or steering genetic engineering to eliminate interfering proteins. Together with other tools, these capabilities allow refined analysis and modification of both fermentation and protein expression in P. fluorescens. Genome mining has also been used to discover new, useful genetic functions to enhance the performance of this organism. Novel promoter elements inducible by benzoate or anthranilate have been cloned and further engineered (see below). Furthermore, identification of essential metabolic genes within the genome led to the cloning of these genes into plasmids (coupled with the deletions of the same genes from the chromosome) replacing genes encoding antibiotic resistance, thus resulting in antibiotic-free stable plasmid maintenance during HCD production fermentation runs (see below). Computational analysis of the codon usage of strain MB214 also allows the construction of synthetic P. fluorescens expression-specific genes with improved protein production based on the preferred codons used in this organism.
6 Core Expression Platform for Heterologous Proteins
The current platform for recombinant protein production in P. fluorescens comprises a family of host strains derived from MB101 (A1) combined with stable, but nonconjugative expression plasmids of varying copy number (A2). The host strains are stable, amenable to genetic or molecular manipulations, and can be cultivated to high cell densities in fully defined mineral salts media in standard fermentors, without oxygen enrichment. The expression plasmid vectors are either derived from the medium copy number IncQ plasmid RSF1010 [40] or the lower copy number P. savastanoi plasmid pPS10 [41]. These plasmids are compatible and, if needed, can be stably maintained concomitantly. Although broad host range IncP plasmids, such as RK2 derivatives, can replicate in P. fluorescens, they have been found to be unstable and not suitable for high-level heterologous gene expression. Expression of heterologous genes is driven by promoters of varying strengths, such as the ben, ant, tac, and lacUV5 promoters optimal translation initiation signals and strong transcription terminators. Derivatives of the lac promoter can be regulated by the introduction of a lacI gene in the host strain, permitting induction by
7
8 Protein Production in Pseudomonas fluorescens
isopropyl-thiogalactopyranoside (IPTG). Induction by lactose is possible in strains bearing the lac operon from E. coli, such as P. fluorescens MB214. 6.1 Antibiotic-free Plasmids using pyrF and proC
In traditional microbial heterologous protein production systems, antibiotic resistance genes are essential to maintain plasmids, but they cause a problem if they cannot be completely removed from the product. Regulatory agencies discourage residual antibiotic resistance-coding gene DNA in products due to the perceived risk of its transfer to the intestinal flora of humans. Additional, expensive processing steps are often needed to degrade or remove the DNA so that it does not contaminate the final product. Removing the antibiotic resistance genes from the strain would save the cost of these steps. To achieve an antibiotic-resistance-gene-free system, antibiotic resistance genes on the plasmid were replaced by genes encoding essential proteins for a step in intermediary metabolism in P. fluorescens. Two genes were tested: the pyrF gene, which encodes orotidine 5 -decarboxylase, an essential step in the biosynthesis of uracil; and proC, a gene encoding pyrroline-5-carboxylate reductase, the last step in proline biosynthesis. Cells that lack pyrF can grow by supplementation with uracil, which is converted to uridine 5 -monophosphate (UMP) through a salvage pathway. Mutants of pyrF are no longer sensitive to 5 -fluoroorotic acid (FOA), which facilitated creation of further genomic changes (discussed further below). The proC gene was chosen because the ORF did not appear to be transcriptionally linked to adjacent ORFs; therefore, it was reasonable to assume that the endogenous promoter sequences were near the ORF. The ORFs for pyrF and proC, and a few hundred base pairs of flanking sequences, in order to include endogenous promoters and terminators, were PCR-amplified and cloned in place of the antibiotic-resistance genes on the two expression plasmids. The corresponding auxotrophic Pseudomonas strains were made by deleting the corresponding genes in the chromosome. The productivity of the strains carrying the auxotrophic markers in place of the antibiotic-resistance genes was identical (see below). 6.2 Gene Deletion Strategy and Re-usable Markers
A strain with a deletion of the pyrF ORF was created by PCR-amplifying the regions flanking the pyrF gene and joining them together on a plasmid that cannot replicate in P. fluorescens in order to carry out allele exchange [42]. Upon electroporation [43], this plasmid crossed into the genomic region upstream or downstream of pyrF. Subsequent selection for growth in the presence of FOA identified strains in which the integrated plasmid had crossed out, resulting in a strain with a precise deletion of the pyrF ORF and no plasmid DNA. This pyrF strain is a platform for other precise genomic changes in the following manner. Any desired genomic change can be effected by PCR-amplifying about 500 bp of genomic DNA on each side of the change, then cloning it into a suicide
6 Core Expression Platform for Heterologous Proteins
Fig. 1 Engineering of the proC deletion in the P. fluorescens genome. Regions flanking proC were PCR-amplified and cloned into the SrfI site of pDOW1261. Selection on tetracycline resulted in homologous recombination into one of the regions flanking proC. Subsequent growth in the absence of
tetracycline and presence of proline allowed survival of strains that had recombined within the other flanking region. These strains were identified by selection on 5 fluoroorotic acid, which is toxic to cells that are pyrF+.
vector carrying the pyrF gene (e.g., pDOW1261) (Figure 1). Allelic exchange is carried out as above, using the cloned pyrF as a counterselection marker, forming a strain with the desired change and restoring the pyrF marker in the genome. In this manner, the proC gene was deleted from the production strain, and subsequently the E. coli lacIQ1 gene was inserted into a locus encoding levan sucrase. 6.3 Periplasmic Secretion and Use of Transposomes
In order to find signal sequences capable of efficiently secreting a heterologous protein to the periplasm, we designed a transposome [44] carrying near the insertion junction the E. coli-derived gene for alkaline phosphatase, phoA, from which the endogenous signal sequence had been removed. Since the phoA gene product is active only when localized to the periplasm [45], genomic P. fluorescens signal sequences were detected by electroporation of the transposome into cells, after which the transposome randomly integrated and created a signal sequence-phoA fusion. Of several signal sequences identified in this way, the signal from phosphate binding protein, in particular, was effective at efficiently transporting single-chain antibodies (see Section 7). 6.4 Alternative Expression Systems: Anthranilate and Benzoate-inducible Promoters
Native P. fluorescens promoters that provide an alternative to the traditional IPTGinducible systems such as lac and tac promoters have been identified. Sodium benzoate and anthranilate are inexpensive chemicals that can be utilized as a carbon source by P. fluorescens, as well as by other Pseudomonas sp., Acinetobacter
9
10 Protein Production in Pseudomonas fluorescens
Fig. 2 Relative activity of benzoate and anthranilate-inducible promoters. Shown above are the arrangements of the benzoate (benABCD) and anthranilate (antABC) catabolic operons of P. fluorescens MB214. Promoter constructs tested are shown
above each operon diagram. The relative strength of each promoter containing fragment, as measured by $-galactosidase expressed from lacZ fusion constructs, is depicted on the right with relative strength increasing from + to ++++.
sp., and others [46–50]. The promoters responsible for expression of the P. fluorescens benzoate (benABCD)-degradative genes (encoding catabolic pathway enzymes), which responds to sodium benzoate (Pben), and the promoter of the P. fluorescens antABC-degradative genes (Pant), which responds to anthranilate, were isolated and characterized. Transcriptional activators benR and antR have been identified directly upstream of the benABCD and antABC operons, respectively. Using the E. coli-derived lacZ gene as a reporter of promoter activity, a minimal promoter region that includes the activator binding site has been identified for both (Figure 2), which is active both at small (200 mL) and large (20 L) scales. Anthranilate-inducible activity is improved by adding the transcriptional activator antR in multi-copy. Inactivation of the benzoate and/or anthranilate metabolic pathways allows for use of these compounds as inducers without degradation of the inducer itself. By deleting gene(s) encoding the large and/or small subunits of the dioxygenases, we are able to mimic a gratuitous inducer system, such as IPTG induction of lac-based promoters, in that the inducer is added only once. Having a variety of promoters allows for greater flexibility in titrating gene expression, and allows for co-expression of genes using different induction profiles. 7 Production of Heterologous Proteins in P. fluorescens 7.1 Pharmaceutical Proteins
The production of several therapeutic proteins in the P. fluorescens system was compared side-by-side with the E. coli T7 expression system [51]. In each case, P.
7 Production of Heterologous Proteins in P. fluorescens
Fig. 3 Analysis of recombinant human growth hormone (rhGH) activity. Activity of rhGH was measured essentially as described by Patra et al. [52], except that proliferation of the rat lymphoma cell Nb2-11 was determined by BrdU incorporation, de-
picted on the Y-axis as A370. Proliferation induced by varying amounts of rhGH purified from P. fluorescens was compared to commercially available rhGH (Antigenix) purified from E. coli. Cells do not respond to bovine serum albumin (BSA).
fluorescens was equivalent to, or had some advantages over the commonly used T7 system in E. coli. Commercial recombinant human growth hormone (rhGH) is currently produced in E. coli cytoplasmically as inclusion bodies that can be readily refolded to an active form [52]. Secretion to the periplasm results in the production of soluble protein, but the yield is greatly decreased [53, 54]. The cDNA encoding rhGH, amplified from a human pituitary cDNA library with the native secretion signal sequence removed, was cloned and expressed cytoplasmically in both the classic P. fluorescens expression system and the E. coli T7 expression system, then evaluated at the 20L fermentation scale. P. fluorescens produced 1.6 fold more rhGH per gram dry biomass than E. coli. Refolded rhGH isolated from P. fluorescens extracts was as active as commercially available rhGH, as determined by a cell proliferation assay using the rat lymphoma cell line Nb2–11 (Figure 3). Human gamma interferon (γ -IFN) represents another example of a therapeutic protein currently produced in E. coli as inclusion bodies [55–57]. Moreover, studies have found that γ -IFN is difficult to refold [57–60]. Mutants of γ -IFN that are produced as soluble protein, yet retain function, have been isolated and appear to overcome these problems [61]. The comparative expression of γ -IFN in P. fluorescens versus E. coli revealed a significant advantage for recovery of the active cytokine in the P. fluorescens expression system. A human spleen cDNA library was used as a template to clone the γ -IFN cDNA, with the native signal sequence removed, into the E. coli T7 and P. fluorescens expression systems. The E. coli construct produced 2–4 g L−1 of insoluble protein at the 20-L scale (Figure 4). The P. fluorescens construct produced ∼4 g L−1 of the cytokine during a typical 20-L fermentation. SDS-PAGE analysis of soluble and insoluble fractions shows that the majority of the protein (95 %) is present in the soluble fraction when produced in P. fluorescens. This soluble protein was readily purified in a single ion-exchange chromatographic step
11
12 Protein Production in Pseudomonas fluorescens
Fig. 4 Production of(-IFN in E. coli versus P. fluorescens. (-IFN produced by E coli is shown in panel A. Lanes 1 and 2 show pre-induced samples, soluble and insoluble fractions, respectively; lanes 3 and 4 show 3-h-induced samples, soluble and insoluble fractions, respectively. Lane 5, See Blue Plus2 Marker. (-IFN produced in P. fluorescens is shown in panel B. Lane 1, see Blue Plus2 Marker. Lanes 2 and 3 show pre-induced samples, soluble and insoluble fractions, respectively; lanes 4 and 5 show
24-h-induced samples, soluble and insoluble fractions, respectively. Lanes 6 and 7 show 48-h-induced samples, soluble and insoluble fractions, respectively. 5 :L samples normalized to OD575 20 were loaded onto a 10% Bis-Tris NuPAGE run in 1 x MES. The arrow indicates the position of the recombinant protein. Western analyses for E. coli and P. fluorescens samples are depicted in the lower panels, showing 100% insoluble protein in E. coli and 95% soluble protein in P. fluorescens.
to >95% purity, and was found in a viral inhibition assay to be as active as the commercially available compound [62]. Monoclonal antibodies and antibody fragments represent a large percentage of therapeutic proteins currently in the development pipeline. Two anti-βgalactosidase single chain antibodies, gal13 and gal2 [63, 64] were produced in the cytoplasm or secreted to the periplasm of P. fluorescens. The Gal13 antibody, produced cytoplasmically, demonstrated an eightfold greater volumetric yield at the 20-L scale when produced in P. fluorescens as compared to expression in E. coli. Gal13 was produced in P. fluorescens primarily as soluble protein (96%), whereas only 48% of the gal13 was soluble in E. coli (Figure 5). Protein purified from both
7 Production of Heterologous Proteins in P. fluorescens
Fig. 5 Cytoplasmic production of Gal13. SDS-PAGE analysis of E. coli and P. fluorescens soluble and insoluble fractions containing Gal13. Lanes 1 and 2 show E. coli pre-induced samples, soluble and insoluble fractions, respectively; lanes 3 and 4 show 3-h-induced E. coli samples, soluble and insoluble fractions, respectively. Lane 5, See Blue Plus2 Marker. Lanes 6 and 7 show P. fluorescens pre-induced samples, soluble and insoluble fractions, respectively; lanes 8 and
9 show P. fluorescens 24-h-induced samples, soluble and insoluble fractions, respectively. 5 :L samples normalized to OD575 30 were loaded in each lane and run on a 10% NuPAGE gel (Invitrogen, San Diego, CA, USA) in 1X MOPS buffer. Western analyses of these samples are shown in the panels below. E. coli was found to have 48% soluble protein while P. fluorescens had 96% soluble protein.
expression systems by Ni+ affinity chromatography was found to be active in an ELISA assay, as was previously shown for E. coli-derived material [63]. The pelB secretion signal for E. coli expression/secretion system [65], or the pbp (phosphate-binding protein signal sequence) secretion signal for the P. fluorescens expression/secretion system was fused to the N-terminus of the gal2 ORF to test for Gal2 secretion to the periplasm and/or to the culture medium. The pelB-gal2 fusion expressed in E. coli resulted in 1.6 g L−1 of protein of which about 54% was processed by the cell, resulting in removal of the secretion signal and indicating that the protein was secreted to the periplasm. Of that 54%, 11% was found in the soluble fraction. The pbp-gal2 fusion expressed in P. fluorescens resulted in 9–10 g L−1 of fully processed protein. The majority (96%) of the Gal2 protein produced in P. fluorescens was found in the insoluble fraction. Protein purified
13
14 Protein Production in Pseudomonas fluorescens
from both expression systems was found to be active in an anti-β-galactosidase ELISA. Although a greater percentage of the total protein was found in the soluble fraction for the E. coli construct, as compared to the P. fluorescens pbp:gal2 construct, the overall yield in P. fluorescens of processed protein was significantly higher. 7.2 Industrial Enzymes
A wide range of industrial enzymes belonging to the glycosidase (EC 3.2.1.x), nitrilase (EC 3.5.5.x), and phosphatase (EC 3.1.3.x) families, both of mesophilic and hyperthermophilic origins, have been produced in P. fluorescens at high levels. Several native and laboratory-evolved α-amylases derived from environmental Thermococcus strains [66] accumulated to amounts greater than 25% total cell protein in HCD cultures at pilot production scales of thousands of liters. Similarly, native and optimized nitrilase genes [67] have been expressed in P. fluorescens leading to yields greater than 50% total cell proteins in a soluble and active form (Figure 6).
Fig. 6 Cytoplasmic production of soluble and active nitrilase. A) Phase-contrast micrographs of P. fluorescens cells: (i) before induction and (ii) producing nitrilase as nonrefractile inclusion bodies after induction. B) SDS-PAGE analysis of P. fluorescens producing nitrilase at 0, 24, 32, and 41 h post-IPTG induction time points.
8 Conclusions
This represents a titer of >25 g L−1 when combined with a biomass of 100 g L−1 in a HCD fermentation process. 7.3 Agricultural Proteins
The P. fluorescens system has been employed for many years to express a variety of genes encoding insecticidal proteins, ranging from 20 to 140 kDa in size, derived from Bacillus thuringiensis. P. fluorescens has proven to be a reliable tool to generate sufficient amounts of lead proteins for testing of specific insecticidal activities. The novel active proteins were then candidates for introduction into transgenic plants, or to be manufactured as spray-on bioinsecticides by fermentation. The Cry1 and Cry3 classes of B. thuringiensis δ-endotoxin insecticidal proteins of molecular weights 135 and 70 kDa, respectively, have been produced at the commercial R R R scale of thousands of liters in the manufacture of M-Trak , MVP , MVPII , R and Mattch biopesticides [17–19]. Expression of these insecticidal proteins was invariably greater than 20% of total cell protein in HCD fermentations of 100 g L−1 biomass. Fixation of the P. fluorescens during formulation afforded better stability over native B. thuringiensis products, permitting stable liquid concentrate formulations [68].
8 Conclusions
Although P. fluorescens has long been considered to be a metabolically diverse, versatile, and nonpathogenic organism, and has a history of successful and safe use for production of agricultural proteins and enzymes, until now it has not been seriously considered as a host for the production of therapeutic proteins. In this chapter, we describe results with a selected strain of P. fluorescens which shows great promise as an expression system for improved and lower-cost production of biotherapeutics. This strain, designated P. fluorescens MB101, has an excellent safety profile and is very well characterized. Extensive genomic and genetic information is now available for further strain development. At this point, a number of strong promoters are in place for the construction of expression vectors, the plasmids employed contain selection markers based on auxotrophic genes, and no antibiotic resistance markers are required. These plasmids are also stable, and are not mobilizable. A large and expanding set of molecular biology tools, and a functional genomics capability has been developed to enable rapid expression of a variety of therapeutic proteins. As described in this chapter, several therapeutic proteins were produced in P. fluorescens MB101, and these results were compared to similar expression studies in E. coli. In most cases the P. fluorescens strain produced up to fivefold more protein (on a g L−1 basis) than the E. coli strain. An unexpected finding was the production of soluble and active proteins in P. fluorescens, whereas in E. coli the same proteins
15
16 Protein Production in Pseudomonas fluorescens
formed inclusion bodies. If a protein can be produced in soluble and active form, significant savings in purification costs can be achieved. P. fluorescens has an impressive capability for producing heterologous proteins at high levels. The nitrilase described above was produced in a soluble and active form, amounting to yields >25 g L−1 , though this productivity level cannot be expected for all protein examples. MB101 was also shown to be an efficient producer of secreted proteins; yields in excess of 15 g L−1 have been achieved with single-chain antibody fragments secreted to the periplasm. Additional developmental work is likely to further improve these yields. A rapid and efficient expression of therapeutics is desired, both now and in the future, in order to manage production costs and provide effective, affordable products, especially for higher volume applications. The P. fluorescens strain described in this chapter has many favorable properties in this regard. The organism is grown in a completely defined mineral salts medium with no added animal components or organic nitrogen of any kind. The fed-batch fermentation process is well-characterized, and scale up is predictable and rapid, up to thousands of liters. Cell densities in excess of 100 g L−1 dry cell weight are routinely obtained in standard fermentation vessels, without oxygen supplementation. The organism is unusually well suited for high level expression, and will tolerate a wide range of conditions. In addition, recovery and downstream purification procedures with P. fluorescens are standard, and consistent with those employed with E. coli. A knowledge of the genome sequence has enabled strain engineering to improve gene expression in P. fluorescens and this has resulted in higher recombinant protein yields. Protein expression during fermentation runs can be monitored by functional genomic tools to gauge the metabolic state of host cells and to provide temporal production profiles of the desired gene transcript and protein. These data may then be used to further improve strain performance through directed genetic changes. Extensive pathogenicity and toxicology studies with P. fluorescens have shown the organism to be a safe strain for the production of therapeutics. In this respect, investigations are continuing in order to add data which are relevant to regulatory submissions. The combination of high volumetric and specific expression of a broad range of therapeutic proteins, coupled with a potential for soluble, active, and secreted products, makes P. fluorescens a compelling alternative for the microbial expression of biologicals for human health.
Appendix Table A1 Selection of P. fluorescens strains
Strain
Genotype
Phenotype
Reference
MB101 MD214 DC206
Wild-type Lac operton insertion MB101 pyrF lacQ1
Biovar I Lac+ URA-
This report [2] This report [69] This report
Appendix Table A2 Selection of P. fluorescens plasmids
Expression vectors
Expression cassette
Gene
Selection marker
Reference
pDOW2400
tac promoter and rrnBT1T2 terminator, RSF1010 origin
tetR
This report
pDOW1129
tac promoter and rrnBT1T2 terminator, RSF1010 origin
tetR
This report
pDOW1117
tac promoter and rrnBT1T2 terminator, RSF1010 origin tac promoter and rrnBT1T2 terminator, RSF1010 origin, pbp secretion signal tac promoter and rrnBT1T2 terminator, RSF1010 origin
Human growth hormone without signal sequence (rhGH) gamma interferon (γ -IFN) without native signal sequence Single chain antibody Scfv gall 3 pbp secretion signal fused to single chain antibody gal2 nitrilase
tetR
This report
tetR
This report
pyrF
This report [67]
kanR pyrF tetR
[41] This report
pDOW1123
pDOW2415
Other plasmids pCN51lacl pDOW1261
pPS10 origin ColEl origin does not replicate in P. fluorescens
lacl
Table A3 List of Genes
Gene
Encoded gene product
rhGH γ -IFN gal2 gal13 pelB pyrF proC phoA pbp Pben benABCD Pant antABC antR tetR Ptac lacI
Recombinant human growth hormone (lacking the native secretion signal) Human gamma interferon (lacking the native secretion signal) Single chain antibody Gal2 Single chain antibody Gal13 Pectate lyase Orotidine-5 -phosphate decarboxylase Pyrroline-5-carboxylate reductase Alkaline phosphatase Phosphate-binding protein benABCD Promoter Benzoate degradation operon antABC Promoter Anthranilate degradation operon Transcriptional activator of the antA promoter tetracycline resistance tac promoter Repressor of lac promoter
17
18 Protein Production in Pseudomonas fluorescens
References 1 SWARTZ J. R. (2001) Advances in Escherichia coli production of therapeutic proteins. Curr Opin Biotechnol 12: 195–201 2 LANDRY T. D., CHEW L., DAVIS J. W., FRAWLEY N., FOLEY H. H., STELMAN S. J., THOMAS J., WOLT J., HANSELMAN D. S. (2003) Safety evaluation of an alpha-amylase enzyme preparation derived from the archaeal order Thermococcales as expressed in Pseudomonas fluorescens biovar I. Regul Toxicol Pharmacol 37: 149–168 3 OECD (1997) Consensus Document on Information Used in the Assessment of Environmental Applications Involving Pseudomonas. Organisation for Economic Cooperation and Development, Paris, France 4 BRADBURY J. (1986) Guide to Plant Pathogenic Bacteria, C. A. B. International Mycological Institute. Kew, Surrey, UK. 5 PALLERONI N. (1992) Human and animal pathogenic Pseudomonas. In: The Prokarytoes, Vol. III 2nd edition (BALOWS A., TRUPER H. G., DWORKIN M., HARDER W., SCHLEIFER K. H., Eds) Springer-Verlag, Berlin, New York, pp 3086–3103 6 HOLT J. G., KRIEG N. R., SNEATH P. H. A., STALEY J. T., WILLIAMS S. T. (1994) Bergey’s Manual of Determinative Bacteriology, 9th edition. Lippincott, Williams & Wilkins, New York, pp 71–156 7 GILARDI G. L. (1991) Pseudomonas and related genera. In: Manual of Clinical Microbiology, 5th edition (BALOWS A., HAUSLER W. J.. HERRMANN K. L., ISENBERG, H. D., SHADOMY H. D., Eds). American Society for Microbiology, Washington, DC 8 WARREN G. J. (1987) Bacterial ice nucleation: molecular biology and applications. Biotechnol Genet Eng Rev 5: 107–135 9 LINDOW S. E., PANOPOULOS N. J. (1988) Field tests of recombinant Ice-Pseudomonas syringae for biological frost control in potato. In: Proceeding for the First International Conference on Release of Genetically Engineered Microorganisms (SUSSMAN M., COLLINS C.
10
11
12
13
14
15
16
17
18
19
H., SKINNER F. A., Eds). Academic Press, London, pp 121–138 WILSON M., LINDOW S. E. (1993) Release of re-combinant microorganisms. Annu Rev Microbiol 47: 913–944 Mycogen (1991) Biotechnology PreManufacturing Notice submitted to The Office of Toxic Substances of USEPA to manufacture encapsulated Bacillus thuringiensis deltaendotoxin in Pseudomonas fluorescens. HERRERA G. S., SNYMAN J., THOMSON J. A. (1994) Construction of a bioinsecticidal strain of Pseudomonas fluorescens active against the sugarcane borer, Eldana saccharina. Appl Environ Microbiol 60: 682–690 WUBBOLTS M. G. W., WITHOLT B. (1998) Selected industrial biotransformations. In: Pseudomonas, Vol. 10 (MONTIE T. C., Ed). Plenum Press, New York, pp 271–329 ROY M. H. (1988) Use of fatty acids for the identification of phytopathogenic bacteria. Plant Dis 72: 460 SASSER M. (1990) Identification of bacteria through fatty acid analysis. In: Methods in Phytobacteriology (KLEMENT K. R. Z., RUDOLF K., SANDS D. C., Eds). Academiai Kiado, Budapest MOORE E. R. B. M., ARNSCHEIDT A., BOETTGER E. C., HUTSON R. A., COLLINS M. D., van de PEER Y., De WACHTER R., TIMMIS K. N. (1996) The determination and comparison of the 16S rRNA gene sequences of species of the genus Pseudomonas (sensu stricto) and estimation of the natural intrageneric relationships. System Appl Microbiol 19: 478–492 EPA (1991) M-One Plus Bioinsecticide; tolerance exemption. Federal Register 56: 28325–28356 EPA (1991) MVP Bioinsecticide; tolerance exemption. Federal Register 56: 28326–28328 EPA (1995) CryIA(c) and CryIC derived delta endotoxins of Bacillus thuringiensis encapsulated in killed Pseudomonas fluorescens; exemption from the requirement of a tolerance. Federal Register 60: 47487–47489
References 20 FDA (2003) Agency Response Letter GRAS Notice No. GRN 000126 http://www.cfsan.fda.gov/∼rdb/opag126.html. CFSAN/Office of Food Additive Safety. 21 KIM G. J., LEE I. Y., YOON S. C., SHIN Y. C., PARK Y. H. (1997) Enhanced yield and a high production of medium-chain-length poly(3-hydroxyalkanoates) in a two-step fed-batch cultivation of Pseudomonas putida by combined use of glucose and octanoate. Enzyme MicrobiolTechnol 20: 500–505 22 LEE S. Y., WONG H. H., CHOI J. I., LEE S. H., LEE S. C., HAN C. S. (2000) Production of medium-chain-length polyhydroxyalkanoates by high-cell-density cultivation of Pseudomonas putida under phosphorus limitation. Biotechnol Bioeng 68: 466–470 23 DAWES E. A., MIDGLEY M., WHITING P. H. (1976) Control of transport systems for glucose, gluconate and 2-oxo-gluconate, and of glucose metabolism in Pseudomonas aeruginosa. Presented at the Continuous Culture: Application of New Fields, [Plenary Lecture], 6th International Symposium of Continuous Culture of Micro-organisms 24 TEMPLE L. M., SAGE A. E., SCHWEIZER H. P., PHIBBS P. V. (1998) Carbohydrate catabolism in Pseudomonas aeruginosa. In: Pseudomonas, Vol. 10 (MONTIE T. C., Ed). Plenum Press, New York, pp 35–72 25 EWING B., GREEN P. (1998) Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res 8: 186–194 26 EWING B., HILLIER L., WENDL M. C., GREEN P. (1998) Base-calling automated sequencer traces using Phred. I. Accuracy assessment. Genome Res 8: 175–185 27 STOVER C. K., PHAM X. Q., ERWIN A. L., MIZOGUCHI S. D., WARRENER P., HICKEY M. J., BRINKMAN F. S., HUFNAGLE W. O., KOWALIK D. J., LAGROU M., GARBER R. L., GOLTRY L., TOLENTINO E., WESTBROCK-WADMAN S., YUAN Y., BRODY L. L., COULTER S. N., FOLGER K. R., KAS A., LARBIG K., LIM R., SMITH K., SPENCER D., WONG G. K., WU Z., PAULSEN I. T., REIZER J., SAIER M. H., HANCOCK R. E., LORY S., OLSON M. V. (2000) Complete genome
28
29
30
31
32
sequence of Pseudomonas aeruginosa PA01, an opportunistic pathogen. Nature 406: 959–964 NELSON K. E., WEINEL C., PAULSEN I. T., DODSON R. J., HILBERT H., MARTINS DOS SANTOS V. A., FOUTS D. E., GILL S. R., POP M., HOLMES M., BRINKAC L., BEANAN M., DEBOY R. T., DAUGHERTY S., KOLONAY J., MADUPU R., NELSON W., WHITE O., PETERSON J., KHOURI H., HANCE I., CHRIS LEE P., HOLTZAPPLE E., SCANLAN D., TRAN K., MOAZZEZ A., UTTERBACK T., RIZZO M., LEE K., KOSACK D., MOESTL D., WEDLER H., LAUBER J., STJEPANDIC D., HOHEISEL J., STRAETZ M., HEIM S., KIEWITZ C., EISEN J. A., TIMMIS K. N., DUSTERHOFT A., TUMMLER B., FRASER C. M. (2002) Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ Microbiol 4: 799–808 BUELL C. R., JOARDAR V., LINDEBERG M., SELENGUT J., PAULSEN I. T., GWINN M. L., DODSON R. J., DEBOY R. T., DURKIN A. S., KOLONAY J. F., MADUPU R., DAUGHERTY S., BRINKAC L., BEANAN M. J., HAFT D. H., NELSON W. C., DAVIDSEN T., ZAFAR N., ZHOU L., LIU J., YUAN Q., KHOURI H., FEDOROVA N., TRAN B., RUSSELL D., BERRY K., UTTERBACK T., Van AKEN S. E., FELDBLYUM T. V., D’ASCENZO M., DENG W. L., RAMOS A. R., ALFANO J. R., CARTINHOUR S., CHATTERJEE A. K., DELANEY T. P., LAZAROWITZ S. G., MARTIN G. B., SCHNEIDER D. J., TANG X., BENDER C. L., WHITE O., FRASER C. M., COLLMER A. (2003) The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc Natl Acad Sci USA 100: 10181–10186 BADGER J. H., OLSEN G. J. (1999) CRITICA: coding region identification tool invoking comparative analysis. Mol Biol Evol 16: 512–524 SALZBERG S. L., DELCHER A. L., KASIF S., WHITE O. (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26: 544–548 SUZEK, B. E., ERMOLAEVA M. D., SCHREIBER M., SALZBERG S. L. (2001) A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics 17: 1123–1130.
19
20 Protein Production in Pseudomonas fluorescens
33
34
35
36
37
38
39
40
41
(http:://ftp.tigr.org/pub/software/ RBSfinder/. [Online.]) ALTSCHUL S. F., MADDEN T. L., SCHAFFER A. A., ZHANG J., ZHANG Z., MILLER W., LIPMAN D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402 PEARSON W. R. (2000) Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol 132: 185–219 TATUSOV R. L., GALPERIN M. Y., NATALE D. A., KOONIN E. V. (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28: 33–36 MULDER N. J., APWEILER R. T., ATTWOOD T. K., BAIROCH A., BARRELL D., BATEMAN A., BINNS D., BISWAS M., BRADLEY P., BORK P., BUCHER P., COPLEY R. R., COURCELLE E., DAS U., DURBIN R., FALQUET L., FLEISCHMANN W., GRIFFITHS-JONES S., HAFT D., HARTE N., HULO N., KAHN D., KANAPIN A., KRESTYANINOVA M., LOPEZ R., LETUNIC I., LONSDALE D., SILVENTOINEN V., ORCHARD S. E., PAGNI M., PEYRUC D., PONTING C. P., SELENGUT J. D., SERVANT F., SIGRIST C. J., VAUGHAN R., ZDOBNOV E. M. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res 31: 315–318 JOINT GENOME Institute (2004) Pf0–1 genome sequence http://genome.jgipsf.org/draft microbes/ psefl/psefl.home.html SANGER Centre (2004) SBW25: www.sanger.ac.uk/Projects/ P fluorescens/. MA Q., ZHAI Y., SCHNEIDER J. C., RAMSEIER, T. M., SAIER M. H.. (2003) Protein secretion systems in Pseudomonas aeruginosa and P. fluorescens. Biochim Biophys Acta –Biomembranes 1611: 223–233 SCHOLZ P., HARING V., WITTMANN-LIEBOLD B., ASHMAN K., BAGDASARIAN M., SCHERZINGER E. (1989) Complete nucleotide sequence and gene organization of the broadhost-range plasmid RSF1010. Gene 75: 271–288 NIETO C., FERNANDEZ-TRESGUERRES E., SANCHEZ N., VICENTE M., DIAZ R. (1990)
42
43
44
45
46
47
48
49
50
Cloning vectors, derived from a naturally occurring plasmid of Pseudomonas savastanoi, specifically tailored for genetic manipulations in Pseudomonas. Gene 87: 145–149 STIBITZ S. (1994) Use of conditionally counterselectable suicide vectors for allelic exchange. Methods Enzymol, Vol 235, pp 458–465 ARTIGUENAVE F., VILAGINES R., DANGLOT C. (1997) High-efficiency transposon mutagenesis by electroporation of a Pseudomonas fluorescens strain. FEMS Microbiol Lett 153: 363–369 GORYSHIN I. Y., REZNIKOFF W. S. (1998) Tn5 in vitro transposition. J Biol Chem 273: 7367–7374 MANOIL C., BECKWITH J. (1985) TnphoA: a transposon probe for protein export signals. Proc Natl Acad Sci USA 82: 8129–8133 NEIDLE E. L., HARTNETT C., ORNSTON L. N., BAIROCH A., REKIK M., HARAYAMA S. (1991) Nucleotide sequences of the Acinetobacter calcoaceticus benABC genes for benzoate 1,2-dioxygenase reveal evolutionary relationships among multicomponent oxygenases. J Bacteriol 173: 5385–5395 JEFFREY W. H., CUSKEY S. M., CHAPMAN P. J., RESNICK S., OLSEN R. H. (1992) Characterization of Pseudomonas putida mutants unable to catabolize benzoate: cloning and characterization of Pseudomonas genes involved in benzoate catabolism and isolation of a chromosomal DNA fragment able to substitute for xylS in activation of the TOL lower-pathway promoter. J Bacteriol 174: 4986–4996 BUNDY B. M., CAMPBELL A. L., NEIDLE E. L. (1998) Similarities between the antABC-encoded anthranilate dioxygenase and the benABC-encoded benzoate dioxygenase of Acinetobacter sp. strain ADP1. J Bacteriol 180: 4466–4474 COWLES C. E., NICHOLS N. N., HARWOOD C. S. (2000) BenR, a XylS homologue, regulates three different pathways of aromatic acid degradation in Pseudomonas putida. J Bacteriol 182: 6339–6346 EBY D. M., BEHARRY Z. M., COULTER E. D., KURTZ D. M., NEIDLE E. L. (2001)
References
51
52
53
54
55
56
57
58
59
Characterization and evolution of anthranilate 1,2-dioxygenase fromAcinetobacter sp. strain ADP1. J Bacteriol 183: 109–118 STUDIER F. W. (1991) Use of bacteriophage T7 lysozyme to improve an inducible T7 expression system. J Mol Biol 219: 37–44 PATRA A. K., MUKHOPADHYAY R., MUKHIJA R., KRISHNAN A., GARG L. C., PANDA A. K. (2000) Optimization of inclusion body solubilization and renaturation of recombinant human growth hormone from Escherichia coli. Protein Expr Purif 18: 182–192 BECKER G. W., HSIUNG H. M. (1986) Expression, secretion and folding of human growth hormone in Escherichia coli. Purification and characterization. FEBS Lett 204: 145–150 UCHIDA H., NAITO N., ASADA N., WADA M., IKEDA M., KOBAYASHI H., ASANAGI M., MORI K., FUJITA Y., KONDA K., KUSUHARA N., KAMIOKA T., NAKASHIMA K., HONJO M. (1997) Secretion of authentic 20-kDa human growth hormone (20K hGH) in Escherichia coli and properties of the purified product. J Biotechnol 55: 101–112 SIMONS G., REMAUT E., ALLET B., DEVOS R., FIERS W. (1984) High-level expression of human interferon gamma in Escherichia coli under control of the pL promoter of bacteriophage lambda. Gene 28: 55– 64 PEREZ L., VEGA J., CHUAY C., MENENDEZ A., UBIETA R., MONTERO M., PADRON G., SILVA A., SANTIZO C., BESADA V., HERRERA L. (1990) Production and characterization of human gamma interferon from Escherichia coli. Appl Microbiol Biotechnol 33: 429–434 HAELEWYN J., De LEY M. (1995) A rapid single-step purification method for human interferon-gamma from isolated Escherichia coli inclusion bodies. Biochem Mol Biol Int 37: 1163–1171 YIP Y. K., PANG R. H., URBAN C., VILCEK J. (1981) Partial purification and characterization of human gamma (immune) interferon. Proc Natl Acad Sci USA 78: 1601–1605 BRAUDE I. A. (1983) A simple and efficient method for the purification of human
60
61
62
63
64
65
66
67
68
gamma interferon. Prep Biochem 13: 177–190 BRAUDE I. A. (1984) Purification of human gamma-interferon to essential homogeneity and its biochemical characterization. Biochemistry 23: 5603–5609 WETZEL R., PERRY L. J., VEILLEUX C. (1991) Mutations in human interferon gamma affecting inclusion body formation identified by a general immunochemical screen. Biotechnology (NY) 9: 731–737 MEAGER A. (1987) Quantification of interferons by antiviral assays and their standardization. In: Lymphokines and Interferons, A Practical Approach (CLEMENS M. J., MORRIS A. G., GEARING A. J. H., Eds). IRL Press, Oxford, pp 129–147 MARTINEAU P., JONES P., WINTER G. (1998) Expression of an antibody fragment at high levels in the bacterial cytoplasm. J Mol Biol 280: 117–127 MARTINEAU P., BETTON J. M. (1999) In vitro folding and thermodynamic stability of an antibody fragment selected in vivo for high expression levels in Escherichia coli cytoplasm. J Mol Biol 292: 921–929 ROBERT R. L. (2002) Periplasmic expression and purification of recombinant Fabs. Methods Mol Biol 178: 343–348 RICHARDSON T. H., TAN X., FREY G., CALLEN W., CABELL M., LAM D., MACOMBER J., SHORT J. M., ROBERTSON D. E., MILLER C. (2002) A novel, high performance enzyme for starch liquefaction. Discovery and optimization of a low pH, thermostable alpha-amylase. J Biol Chem 277: 26501–26507 DESANTIS G., WONG K., FARWELL B., CHATMAN K., ZHU Z., TOMLINSON G., HUANG H., TAN X., BIBBS L., CHEN P., KRETZ K., BURK M. J. (2003) Creation of a productive, highly enantioselective nitrilase through gene site saturation mutagenesis (GSSM). J Am Chem Soc 125: 11476–11477 GAERTNER F. H., QUICK T. C., THOMPSON M. A. (1993) CellCap: An encapsulation system for insecticidal biotoxin proteins. In: Advanced Engineered Pesticides (KIM L., Ed),. Marcel Dekker, New York, pp 73–83
21
22 Protein Production in Pseudomonas fluorescens
69 WILCOX E. R. (1992) Methods, vectors, and host cells for the control of expression of heterologous genes from lac operated promoters. US Patent 5,169,760
70 HADDAD S., EBY D. M., NEIDLE E. L. (2001) Cloning and expression of the benzoate dioxygenase genes from Rhodococcus sp. strain 19070. Appl Environ Microbiol 67: 2507–2514
1
Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria Roland Freudl
Forschungszentrum J¨ulich GmbH, J¨ulich, Germany
Originally published in: Production of Recombinant Proteins. Edited by Gerd Gellissen. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31036-4
1 Introduction
Gram-positive bacteria are structurally very simple organisms. The cytosol is surrounded by the cytoplasmic membrane, which is covered by a thick cell wall composed mainly of peptidoglycan and negatively charged polymers such as teichoic or teichuronic acid. Since, in contrast to Gram-negative bacteria, this class of microorganisms does not possess an additional outer membrane, proteins which are exported across the cytoplasmic membrane can be released directly into the surrounding growth medium. Due to this fact, Gram-positive bacteria are considered especially interesting as host organisms for the secretory production of proteins, because the secretion of proteins offers considerable process advantages over intracellular deposition as insoluble aggregates (inclusion bodies). In fact, the enormous secretion potential of certain Gram-positive bacteria (e.g., Bacillus species) is one of the reasons for their extensive use in industry for the production of secretory proteins. A range of such microorganisms has been exploited as efficient sources for technical enzymes, such as proteases, lipases, amylases and other, mostly hydrolytic enzymes, with yields of several grams per liter of culture medium [1–3]. Many of the bacteria that are presently used in industry for the production of technical enzymes have not been generated by molecular biological techniques, but have been identified by selection after classical mutagenesis. In most cases, the reasons for the improved productivities are therefore unknown. Attempts to use Gram-positive bacteria for the secretory production of heterologous proteins have often failed or have led to disappointing results. The yields obtained for heterologous proteins were, in many cases, significantly lower than those observed for homologous enzymes [4]. Several bottlenecks in the secretory pathway have been identified which, alone or in combination, can dramatically Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
Fig. 1 Frequently observed problems during heterologous protein secretion in Grampositive bacteria. (1) Inefficient translocation across the cytoplasmic membrane; (2) inefficient release into the supernatant;
(3) degradation by membrane- and/or cell wall-associated proteases; (4) inefficient or wrong folding; (5) degradation by secreted proteases. SP: signal peptide.
decrease the amount of the desired product in the culture supernatant (Figure 1). A detailed understanding of these bottlenecks at a molecular level will ease approaches for the selective improvement of Gram-positive bacteria as hosts for the secretory production of biotechnologically or pharmaceutically relevant heterologous proteins.
2 Major Protein Export Routes in Gram-positive Bacteria
In eubacteria, the export of the vast majority of extra-cytosolic proteins is mediated by the general (Sec) secretion system. In addition, many bacteria possess a second protein export system (Tat) for the translocation of a certain subset of proteins. One of the most remarkable differences between these two protein export pathways is the folding status of their respective substrate proteins during the actual translocation step. Sec-dependent proteins are translocated in more or less unfolded state. Subsequent folding takes place on the trans-side of the membrane after membrane translocation (Figure 2). In contrast, the Tat system translocates its substrates in a fully folded or even oligomeric form across the membrane (Figure 2). 2.1 The General Secretion (Sec) Pathway
The identification and characterization of genes in various Gram-positive bacteria, encoding homologues of the well-characterized Sec-proteins of the Gramnegative bacterium Escherichia coli, clearly has established that the mechanism of Sec-dependent protein translocation across the plasma membrane is highly
2 Major Protein Export Routes in Gram-positive Bacteria
Fig. 2 Major protein export pathways in Gram-positive bacteria. The general Secand the alternative twin-arginine (Tat) protein export pathways differ dramatically with respect to the folding status of their substrates. Translocation via the Sec system
requires that the export proteins are kept in an unfolded state. In contrast, fully folded proteins are exported by the Tat system. SPSec: Sec signal peptide, SP-Tat: Tat signal peptide.
conserved also in Gram-positive bacteria [5]: Proteins supposed to exit the cytosol are synthesized as larger precursors containing an N-terminal signal peptide. During or shortly after their synthesis, these precursor proteins are recognized by specific targeting factors (e.g., a bacterial signal recognition particle, b-SRP, consisting of Ffh and the Ffs-RNA, and its receptor FtsY) and subsequently delivered to the so-called translocase holoenzyme in the membrane, which consists of the subunits SecA, SecY, SecE, SecG, SecDF, and YajC (Figure 3). The translocation ATPase component SecA is the key component within the translocation scenario, and couples the energy of ATP-binding and ATP-hydrolysis to the movement of the translocating polypeptide chain across the membrane. This coupling is achieved by cycling of SecA between a membrane-peripheral and a membrane-integral state and, comparable to the needle of a sewing machine, SecA concomitantly transfers 20–30 amino acid residues of the translocating protein across the membrane during each cycle. In addition to ATP, the electrochemical membrane potential is required as an energy source for efficient protein translocation. The actual translocation across the membrane takes place at the SecY/SecE core translocase which forms the protein-conducting channel. SecG and the SecDF/YajC complex are thought to increase the efficiency of translocation at the SecY/SecE core translocase by facilitating the SecA cycle. During or shortly after translocation, the signal peptide is cleaved from the precursor proteins by specific signal peptidases (Bacillus subtilis has five of them) and the resulting mature protein is released from the
3
4 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
Fig. 3 General (Sec) protein export apparatus of Gram-positive bacteria. Newly synthesized secretory precursor proteins are recognized by specific targeting factors (such as the bacterial signal recognition particle, b-SRP) and subsequently delivered to the translocase via the b-SRP receptor, FtsY. The translocation ATPase SecA pushes
the secretory protein through a proteinconducting channel consisting of SecY, SecE and SecG. SecG and the SecDF-YajC complex increase the efficiency of translocation at the SecY-SecE core translocase. During translocation, the signal peptide (SP) is cleaved from the precursor by signal peptidase (SPase).
translocase on the trans-side of the membrane. Assisted by specific folding factors, such as the PrsA lipoprotein (see also Section 3), the mature protein resumes its final folded structure and, after transport across the cell wall, is released into the supernatant. Due to the fixed number of protein export sites, it is understandable that the capacity for heterologous protein secretion is limited. Overloading of the protein export machinery results in the intracellular accumulation of the corresponding heterologous precursor protein, which subsequently might become export-incompetent due to tight folding or due to aggregation. Furthermore, since ongoing functional protein translocation is essential for viability, blocking of the export sites by heterologous proteins is often toxic to the cells. The expression level required to overload the export machinery is dependent on the efficiency, by which the respective protein is translocated – that is, efficiently exported proteins require higher expression levels for saturating the translocase than those which are inefficiently translocated. Therefore, the application of moderate or low-level expression conditions can result in a significant improvement of the exportability and yield in the supernatant of the desired product [6]. Likewise, the co-overproduction of limiting components of the Sec-translocase with the protein of interest can be beneficial with respect to its secretion [7].
2 Major Protein Export Routes in Gram-positive Bacteria
Besides these quantitative aspects of substrate/translocase interactions, qualitative aspects of such interactions also play an important role. Successful secretion of a heterologous protein requires that the corresponding precursor protein can productively interact with all components of the secretion apparatus. Studies conducted in our laboratory have indicated that the formation of a productive translocation initiation complex, consisting of the foreign protein, SecA and the SecY/SecE core translocase, is a decisive step which determines whether or not the protein is channeled into the secretion pathway (M. Klein, D. Tippe, O. K¨oberling, R. Freudl, unpublished). The isolation of mutants with mutations in the translocase components that result in improved interactions between the heterologous protein and the translocase [8] can be considered as a promising strategy for enlarging the spectrum of heterologous proteins which can be accepted by the Sec protein export apparatus. 2.2 The Twin-Arginine Translocation (Tat) Pathway
In addition to Sec-mediated protein translocation, many bacteria possess a second protein export pathway for the translocation of a subset of precursor proteins. This Sec-independent mechanism has been designated Tat (for twin-arginine translocation) due to the fact that a characteristic amino acid motif including two consecutive arginine residues can be identified in the signal peptide of the respective precursor proteins. In marked contrast to the Sec system, which requires that its substrates are maintained in a more or less unfolded state, the Tat pathway translocates its protein substrates in a fully folded or even oligomeric form across the membrane. Many of the Tat substrates are proteins that have to recruit a cofactor in the cytosol and as a prerequisite have to acquire a folded status prior to export. In addition, some cofactor-less proteins are exported via this route, presumably because their rapid folding kinetics precludes transport via the Sec pathway [9, 10]. The Tat export machinery consists of a surprisingly low number of components. In the ABC-type systems (found for example in E. coli and plant thylakoids), the three integral membrane proteins TatA, TatB, and TatC seem to be the minimal components of the Tat translocation pathway. These proteins are assumed to merge in the cytoplasmic membrane to form the Tat translocase. Although TatA and TatB show some weak sequence identity, they are thought to perform different functions. However, Tat systems of the AC type (found for example in Bacillus subtilis or Staphylococcus carnosus) even lack TatB, suggesting that in these systems TatA additionally takes over the TatB function. So far, very little is known about the mechanism of Tat-mediated protein transport. Experimental evidence from E. coli [11] and plant thylakoids [12] suggests that Tat precursor proteins first bind to a receptor consisting of Tat(B)C in the cytoplasmic membrane. Subsequently, multiple copies of TatA are recruited to the Tat(B)C-precursor complex to form an active translocase, a step which is dependent on the presence of the transmembrane H+ gradient (pH). One attractive hypothesis for the formation of the actual translocation channel postulates that TatA molecules undergo a dynamic polymerization
5
6 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
around the substrate molecule, such that the number of TatA proteins present in the pore matches the diameter of the respective substrate protein (which varies considerably between different known Tat substrates) within the pore [10]. After transport of the substrate across the membrane plane, the substrate protein is released on the trans-side of the membrane and the pore component TatA re-dissociates from the Tat(B)C receptor complex, allowing the recycled components to catalyze further rounds of substrate binding and translocation. Export of heterologous proteins via the Tat pathway can provide an attractive alternative to Sec-mediated export, especially in such cases where the heterologous protein does not fold properly in an extra-cytosolic environment. In E. coli, it has been shown that the fusion of the green fluorescent protein (GFP) from jellyfish to a Sec signal peptide results in an efficient Sec-dependent translocation of GFP into the periplasm. However, the exported GFP was completely inactive, most likely due to folding problems caused by the lack of chaperones in the periplasmic space [13]. In contrast, GFP could be transported into the periplasm in an active form, when the Tat transport route was used [14]. These results clearly demonstrate that prefolding of a heterologous protein in the cytosol, assisted by the cytoplasmic Hsp60 and Hsp70 chaperone systems, and its subsequent translocation via the Tat pathway can be the superior to Sec-dependent secretion. The extent of Tat pathway-mediated secretion is also limited by the number of available export sites as described above for the Sec pathway. For E. coli, it has been shown that overproduction of GFP fused to a Tat signal peptide results in a rapid saturation of the Tat translocase and that the efficiency of export can significantly be increased by co-overexpression of the tatABC genes [15]. To date, very few reports exist in the literature describing the use of the Tat pathway for the secretion of heterologous proteins in Gram-positive bacteria. Scherlaekens et al. [16] demonstrated, that a tyrosinase from Streptomyces antibioticus (a copper-containing monooxygenase that catalyzes the formation of melanin pigment from tyrosine) could successfully be secreted in a Tat-dependent manner by Streptomyces lividans. Furthermore, it was shown by Pop et al. [17] that the B. subtilis Tat system is capable of secreting the cytosolic E. coli-derived β-galactosidase when fused to a B. subtilis Tat signal peptide. Although further experimental evidence has to be provided, the available results strongly suggest that secretion of heterologous proteins by the Tat pathway of Gram-positive bacteria promises to be an attractive alternative to conventional approaches relying on Sec-dependent secretion. 2.3 Secretion Signals
In general, signal peptides for Sec-dependent protein export in bacteria and for import into the endoplasmic reticulum (ER) of eukaryotic cells as well as the membrane core of the respective translocases are the central features of protein export conserved during evolution. In many cases, it has been found that these signal peptides are functional across species boundaries. For example, the rat insulin signal peptide has been shown to mediate insulin export into the periplasm of E. coli [18].
3 Extracytosolic Protein Folding 7
Likewise, it has been demonstrated that signal peptides of Gram-negative bacteria efficiently function in Gram-positive hosts [6] and vice versa [19]. Therefore, secretion of heterologous secretory proteins by Gram-positive bacteria can sometimes be achieved by maintaining their authentic signal peptide [6, 20]. However, it has also been observed in other cases that the natural signal peptide of a heterologous secretory protein functions only inefficiently in the chosen Gram-positive host and that its replacement by a host-derived signal peptide can significantly improve the secretion efficiency [21]. Secretion of a normally not-exported heterologous protein requires that the respective protein is attached to a signal peptide. The joint between the signal peptide and the foreign protein can be designed such that the protein is linked in an exact fusion located immediately to the signal peptide or a few amino acid residues downstream of it. The exact fusions are used to obtain an authentic NH2 -terminus for the secreted protein, the latter ones are used to preserve the steric environment for efficient removal of the signal peptide by the signal peptidase. However, it is still not predictable whether a certain combination of a signal peptide and a foreign protein will result in membrane translocation and processing of the respective hybrid precursor. For example, numerous attempts to secrete foreign proteins by B. subtilis have shown that certain signal peptides can support efficient export for a particular protein, but not for another one [4]. When attempts to export a desired heterologous protein fail, use of alternative signal peptides derived from the chosen Gram-positive host might therefore be a promising strategy to succeed. Tat signal peptides seem to function optimally only with their cognate translocases [22, 23]. For this reason, the fusion of heterologous proteins to a Tat signal peptide originating from the chosen Gram-positive host bacterium or a closely related species is highly recommended if the Tat-dependent secretion option is envisaged.
3 Extracytosolic Protein Folding
Folding of proteins that reside in the cytosol is assisted by various well-characterized general chaperones (Hsp60, Hsp70 proteins). These chaperones prevent the aggregation of newly synthesized proteins and assist their folding into the correct, biologically active conformation. In contrast, proteins exported via the Sec pathway have to resume their folded conformation at the trans-side of the membrane. Very little is known about the post-translocational mechanisms of folding of secreted proteins in Gram-positive bacteria; that is, it is still unclear to what extent cellular factors of the host bacterium (Figure 4) are involved in the respective folding events. In B. subtilis, the plasma membrane-anchored lipoprotein PrsA – which is homologous to the parvulin class of peptidyl prolyl cis/trans-isomerases – seems to be involved in the folding of various newly translocated proteins. Limiting the amount or the activity of PrsA results in the decreased folding of exported proteins and their subsequent degradation (see also Section 5) [24]. Especially under high-level protein-secretion conditions, the amount of available PrsA is a serious bottleneck
8 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
Fig. 4 Late stages of Sec-dependent protein secretion in Gram-positive bacteria. The lipoprotein PrsA and the thiol disulfide oxido-reductases BdbABCD are factors assisting protein folding on the trans-side of the plasma membrane. An inverse corre-
lation exists between the efficiency of folding of newly translocated proteins into a protease-resistant conformation and their degradation by membrane- and/or cell wallassociated proteases (P).
[25]. Consequently, the overproduction of the PrsA folding factor is a promising tool for improving the yields of correctly folded homologous and heterologous proteins. In fact, increasing the number of PrsA molecules resulted in an improved secretion of a Bacillus amyloliquefaciens-derived α-amylase (AmyQ) in B. subtilis [25]. Likewise, the amount of a fibrin-specific single-chain antibody fragment that was secreted into the culture supernatant of B. subtilis was significantly increased when the prsA gene was co-overexpressed [26]. Disulfide bonds are crucial for the activity and stability of many proteins of biotechnological or pharmaceutical interest. In principle, such bonds can form spontaneously by air oxidation. However this process is very slow and much less effective than the formation of disulfide bonds in vivo, catalyzed by specific enzymes, the so-called thiol disulfide oxido-reductases. In bacteria, disulfide bonds cannot be formed in the cytosol, due to its reducing conditions. Disulfide bond formation requires that the respective protein is exported into an extracytosolic compartment, such as the periplasm of Gram-negative bacteria. In E. coli, six proteins residing in the periplasm and the cytoplasmic membrane have been identified which are involved in disulfide bond formation and isomerization: DsbA, DsbB, DsbC, DsbD, DsbE, and DsbG (for details, see [27]). Very few extra-cytoplasmic proteins with disulfide bonds have been identified in Gram-positive bacteria; this raised the question whether this class of bacteria has catalytic systems for disulfide bond formation. However, recent genetic studies suggest the existence of such systems in B. subtilis. Here, genes encoding four proteins with similarities to thiol disulfide oxido-reductases (BdbA, BdbB, BdbC, and BdbD) have been identified and it has been shown that the natural substrates for these enzymes are a membrane protein involved in competence development (ComGC) and the lantibiotic sublancin 168, a peptidic antibiotic which is transported across the cytoplasmic membrane by an ABC transport system [28, 29]. Moreover, it has been shown that B. subtilis can secrete active E. coli-derived alkaline phosphatase PhoA, which requires the correct formation of two disulfide bonds for activity. Mutations in the bdbB and
4 The Cell Wall as a Barrier for the Secretion of Heterologous Proteins
bdbC genes led to a severe reduction in the production of the secreted PhoA with correct disulfide bonds [30]. Secretion of active E. coli-derived TEM β-lactamase, which requires the presence of one disulfide bond for stability, only depends on BdbC [30]. Active human interleukin-3, which again requires one disulfide bond for its biological activity, could be secreted by a recombinant Bacillus licheniformis strain, as shown in a previous study [31]. These results showed that disulfide bonds of heterologous proteins can be formed properly by Gram-positive bacteria and that thiol disulfide oxido-reductases are crucial for this process. On the other hand, secreted proteins of eukaryotes often contain several (i. e., more than one or two) disulfide bonds. Proteins containing multiple disulfide bridges such as the human serum albumin or the human pancreatic alpha-amylase were secreted only very poorly by B. subtilis, most likely due to impairments in disulfide bond formation [32, 33]. Furthermore, not all Gram-positive bacteria seem to be equally capable in catalyzing the formation of disulfide bridges. In contrast to the situation in B. subtilis, the E. coli-derived alkaline phosphatase PhoA described before was found to be completely inactive when secreted from a S. carnosus host. This suggests either that this microorganism does not possess thiol disulfide oxido-reductases, or that these enzymes differ from their B. subtilis homologues with respect to their substrate specificity (M. Klein, R. Freudl, unpublished). Therefore, correct choice of the Gram-positive host organism is crucial if the desired heterologous protein requires disulfide bonds for activity.
4 The Cell Wall as a Barrier for the Secretion of Heterologous Proteins
Gram-positive bacteria possess a thick cell wall, a highly cross-linked semi-porous structure composed of peptidoglycan and negatively charged polymers such as teichoic or teichuroic acid. Proteins exiting the external surface of the cytoplasmic membrane first encounter the cell wall before being released into the culture medium. The cell wall therefore represents the final stage of protein secretion. Although some exo-proteins may be able to pass the cell wall by diffusion, the passage of larger proteins – or those extensively exposing positively charged amino acid residues at their surface – is likely to be affected by wall characteristics such as molecular-sieving and ion-exchanging properties. Accordingly, it has been found frequently that the wall can indeed represent a severe bottleneck in the secretion of heterologous proteins, but this is not observed in the export of autologous proteins. For example, no release of human serum albumin is observed unless the peptidoglycan layer of the cell wall is destroyed, despite the fact that the protein is translocated across the B. subtilis plasma membrane [32]. In such cases, testing different Gram-positive bacteria or mutant strains with slightly altered cell wall compositions [34] might be considered as a possible means of improving the cell wall passage of trapped or inefficiently released heterologous proteins. Moreover, the use of cell wall-less L-forms of Gram-positive bacteria [35] might be an alternative approach for the secretory production of heterologous proteins with an otherwise impaired cell wall passage.
9
10 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
5 Degradation of Exported Proteins by Cell-associated and Secreted Proteases
Many Gram-positive bacteria secrete proteases into the culture supernatant. For example, at least eight different proteases efficiently degrade secreted, proteasesensitive heterologous proteins in B. subtilis. Mutants of B. subtilis have been generated which lack the corresponding protease genes and show less than 0.5% proteolytic activity compared to wild-type strains. In these mutants, an improved stability of secreted heterologous proteins can be observed [36]. In addition, strains of species secreting only low levels of proteases or no proteases at all (such as S. carnosus; see the following sections) can be considered as alternative host organisms. In addition to soluble proteases in the supernatant, proteases localized on the outer surface of the plasma membrane and/or in the cell wall area of Grampositive bacteria represent another serious bottleneck [37]. An inverse correlation exists between the efficiency of folding of newly translocated proteins into a conformation resistant towards these cell-associated proteases and their degradation (see Figure 4). If the intrinsic folding of the target protein is inefficient, or if the folding is rendered inefficient by genetic manipulations (e.g., by depletion of the PrsA folding factor in B. subtilis), a rapid degradation of newly secreted proteins is observed, caused by the cell-associated proteases [24, 37]). So far, the respective proteases involved in the cell-associated degradation of exported heterologous proteins have not been characterized. The identification and subsequent inactivation of these proteases is a major challenge for the future which, in case of success, might result in substantial improvements of the performance of Gram-positive bacteria to secrete heterologous proteins.
6 Staphylococcus carnosus
The presence of secreted proteases in the culture supernatant of most Gram-positive bacteria is a major drawback that precludes their use for the secretory production of heterologous proteins. In contrast, the culture supernatant of S. carnosus TM300 does not contain any significant proteolytic activity. Mainly for this reason, this microorganism is considered to be a promising expression host for a variety of scientific and biotechnological applications [38]. 6.1 General Description
Gram-positive and catalase-positive cocci play an important role in the ripening process of dry sausages and meat. One of the predominant microorganisms in fermented meat is S. carnosus. In earlier reports, this bacterium has been named Micrococcus but, based on several criteria such as DNA sequence homology, peptidoglycan composition and biochemical properties, this microorganism was
6 Staphylococcus carnosus
re-classified as a member of the genus Staphylococcus. The species name “carnosus” was chosen due to the fact that the bacterium occurs in, and actually was isolated from, meat [39]. S. carnosus has now been in use for more than 50 years, either alone or in combination with other microorganisms (Lactobacillus sp., Pediococcus sp.), as a starter culture for the production of raw meat products. The use of starter cultures in meat fermentation processes offers a number of advantages over uncontrolled processes: (1) accelerated and controlled color formation, drop in pH and acquisition of the desired texture and flavor; and (2) control of food pathogens and spoilage organisms. The main characteristics of S. carnosus in this process is the reduction of nitrate, the development of a characteristic flavor, a moderate decrease in the pH value and the reduction of hydrogen peroxide that is produced by the catalase-negative lactobacilli [40]. To date, no S. carnosus strain is known to produce toxins, hemolysins, protein A, coagulase or clumping factors, all of which are typical markers for many pathogenic Staphylococcus aureus strains. Furthermore, S. carnosus has a long tradition of use in starter cultures, and no adverse effects on human health have ever been observed. Therefore, S. carnosus has obtained the GRAS status (generally recognized as safe), for its safe use in the food industry [38]. 6.2 Microbiological and Molecular Biological Tools
Among the various S. carnosus strains, the S. carnosus strain TM300 was identified as the most promising candidate as host for the expression, secretion and surface display of heterologous proteins, as this strain possesses hardly any extracellular proteolytic activity (A1). Due to the pioneering work of Friedrich G¨otz and his coworkers, methods allowing the introduction of DNA into S. carnosus by protoplast transformation [41, 42] or electroporation [43] are available. Furthermore, a variety of vectors suitable for cloning and the expression of genes [44–47], for gene replacement [48], for promotor probing [46], and for secretion [49] or surface display [50] of heterologous proteins have been developed. In addition, a food-grade vector system has been described for S. carnosus, that is based on the genes that mediate sucrose uptake and hydrolysis in Staphylococcus xylosus [51] (Table A2). 6.3 S. carnosus as Host Organism for the Analysis of Staphylococcal-related Pathogenicity Aspects
Due to its noninvasive, nonpathogenic characteristics, S. carnosus has been used successfully as a clean expression background to investigate the contribution of pathogenicity factors in the virulence of pathogenic staphylococci. S. aureus invasion of mammalian cells, including epithelial, endothelial, and fibroblastic cells, critically depends on fibronectin bridging between S. aureus fibronectin-binding proteins (FnBPs) and the host fibronectin receptor integrin a5b1. To investigate whether this mechanism is sufficient for S. aureus invasion, the FnBPs of S. aureus
11
12 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
were expressed in S. carnosus, which intrinsically lacks such FnBPs. Characterization of the corresponding recombinant strain revealed that the S. aureus FnBPs were expressed in a functional form at the cell surface. They conferred an invasiveness to S. carnosus comparable to that of the strongly invasive S. aureus strain Cowan 1, showing that the S. aureus FnBPs are per se sufficient to mediate fibronectin binding and invasion of host cells [52]. In another study, the role of the intercellular adhesion (ica) gene products in the formation of biofilms was investigated using S. carnosus as an expression platform. Nosocomial infections that result in the formation of biofilms on the surfaces of biomedical implants are a leading cause of sepsis and are often associated with colonization of the implants by Staphylococcus epidermidis. Biofilm formation is thought to be a two-step process that requires the adhesion of bacteria to a substrate surface followed by cell-cell adhesion, forming the multiple layers of the biofilm. This latter process is triggered by the polysaccharide intercellular adhesin (PIA), which is composed of linear β-1,6-linked glucosaminylglycans. The intercellular adhesion (ica) locus was identified and shown to mediate cell-cell adhesion and PIA production in S. epidermidis. Transfer of the ica genes of S. epidermidis into S. carnosus, which intrinsically does not possess an intercellular adhesion phenotype, led to PIA expression and to the formation of a biofilm on a glass surface and of large cell aggregates, directly demonstrating the involvement of the ica gene products in biofilm formation [53]. 6.4 Secretory Production of Heterologous Proteins by S. carnosus 6.4.1 The Staphylococcus hyicus Lipase: Secretory Signals and Heterologous Expression in S. carnosus The secretory signals derived from a lipase of another staphylococcal species, Staphylococcus hyicus, have been successfully used in a variety of cases for the secretion of heterologous proteins by S. carnosus. They are among the most powerful signals that are presently available for this host system. In addition, analysis of the secretion mechanism of the S. hyicus lipase has provided some new insights into the secretion pathway of Gram-positive bacteria. These newly discovered features are of significance also with respect to the application of this class of microorganisms to the secretion of heterologous proteins. The S. hyicus lipase gene was one of the first heterologous genes that was cloned and expressed in S. carnosus [54]. According to the gene sequence, the S. hyicus lipase consists of 641 amino acid residues (predicted molecular mass of 71 kDa) that contains a typical Gram-positive Sec signal peptide, 38 amino acid residues in length, at its amino-terminal end. Transformation of S. carnosus with a plasmid containing the lipase gene from S. hyicus resulted in the release of the active enzyme into the culture supernatant. The lipase is secreted in amounts of 40 mg protein L−1 , and accounts for about 15% of the total extracellular proteins in the recombinant S. carnosus strain. Comparison of the lipase proteins by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) revealed apparent masses of 46 kDa in the donor strain and
6 Staphylococcus carnosus
Fig. 5 The Staphylococcus hyicus pre-prolipase. The pre-pro part of the S. hyicus lipase is not only an efficient signal for the secretion of the cognate mature lipase, but also for heterologous proteins or peptides into the culture supernatant of S. carnosus.
Attachment of a cell wall sorting signal (CWS) to the C-terminal end of a passenger protein or peptide by genetic engineering leads to covalent anchoring of the passenger to the peptidoglycan and to its display on the S. carnosus cell surface.
86 kDa in the recombinant S. carnosus strain, respectively [54]. This remarkable size difference suggested that the lipase is processed proteolytically in S. hyicus, but not in S.carnosus. Amino-terminal sequencing of the two lipase variants showed that the 86-kDa form secreted by S. carnosus started with Asp39 , whereas the 46kDa form found in the S. hyicus culture supernatant started with Val246 . These results confirmed that both forms were derived from the same precursor protein, and furthermore indicated that the 641-amino acid precursor is organized as a so-called pre-pro-protein possessing a 38-amino acid signal peptide (pre-peptide), a 207-amino acid long pro-peptide followed by the 396-amino acid residues of the mature lipase [55] (Figure 5). Further characterization of the processing events showed that, in S. hyicus the pre-pro–lipase is first processed by signal peptidase to pro-lipase during or shortly after Sec-dependent membrane translocation. Then, the resulting 86-kDa pro-lipase is further processed to the 46-kDa form by a metalloprotease that is present in the culture supernatant of S. hyicus but not in S. carnosus [56] (Figure 5). The presence of a pro-peptide is not uncommon in proteins secreted from Grampositive bacteria. In many examples, as in representatives of the well-characterized subtilisin protease family from various Bacillus species, the pro-peptide functions as an intramolecular chaperone that is strictly required for the folding of the attached mature protein. After the mature protease has adopted its active conformation, the pro-peptide is subsequently removed by autocatalytic proteolytic degradation [57]. In contrast, the pro-peptide of the S. hyicus lipase is obviously not essential for the folding of the mature part of the lipase, but it rather fulfils an additional important function in relation to its secretion. Deletion of parts of the pro-peptide or removal of the entire pro-peptide from the pre-pro-lipase precursor resulted in a dramatic reduction of the amount of lipase that is released into the supernatant of S. carnosus
13
14 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
cultures. However, this secreted lipase was enzymatically active, showing that the pro-peptide per se is not absolutely required for adopting the correct conformation of the mature lipase. In contrast, the strongly reduced amounts of secreted lipase suggested that the presence of the pro-peptide is required for the protection of the mature lipase against proteolytic degradation at a certain step in the secretion pathway [58]. Furthermore, the secretion of the lipase pro-peptide in trans with a pro-peptide-deficient lipase precursor did not result in a stabilization of the mature lipase, showing that the protection against proteolytic degradation requires that the mature part of the lipase is covalently attached to the pro-peptide [59]. 6.4.2 Use of the Pre-pro-part of the S. hyicus Lipase for the Secretion of Heterologous Proteins in S. carnosus Subsequent studies showed that the pro-peptide promoted not only the efficient secretion of the cognate lipase, but also of totally unrelated heterologous proteins and peptides. The first study which demonstrated the beneficial effect of the pro-peptide on the secretion of foreign passenger proteins was performed by Liebl and G¨otz [49] using the periplasmic β-lactamase of E. coli as a heterologous model protein. Fusion of the mature part of the E. coli β-lactamase to the pre-pro-part of the S. hyicus lipase resulted in the release of active pro-peptide-β-lactamase hybrid protein into the culture supernatant. The amounts obtained were close to those obtained with the native lipase. When the pro-peptide was omitted, hardly any secreted β-lactamase could be detected [49]. More detailed information on the action of the pro-peptide was gained in studies aiming at the secretion of an E. coli outer membrane protein (OmpA) by S. carnosus [37]. In its native environment, OmpA is a two-domain protein, consisting of an amino-terminal anti-parallel β-barrel outer membrane domain and a carboxy-terminal periplasmic domain. In contrast to the periplasmic domain, the folding properties of which can be considered similar to those of a soluble protein, the folding of the membrane domain into its native conformation requires the presence of lipopolysaccharide (LPS). Since Gram-positive bacteria do not contain LPS, the outer membrane domain of OmpA is not expected to fold into a stable conformation after secretion across the plasma membrane in S. carnosus. Therefore, it can be used as an excellent sensor for the detection of protease-related bottlenecks in the secretory pathway. Expression of native OmpA precursor protein containing its own signal peptide in S. carnosus resulted in the efficient secretion of two degradation products, 16 and 18 kDa in size. The amino-terminal sequence analysis revealed that the two fragments overlapped, and were both derived from the periplasmic domain of OmpA. Clearly, the membrane domain of OmpA was degraded subsequent to membrane translocation. This observation was surprising, since S. carnosus is known to secrete very small amounts of proteases, if any at all. The finding that purified OmpA, which was added externally to the culture medium of growing S. carnosus cells, remained intact indicated that newly synthesized and exported OmpA is degraded by one or more cell-associated proteases rather than by soluble proteases in the culture supernatant. Attachment of the mature OmpA to the pre-pro-part of the S. hyicus lipase led to an efficient secretion of the corresponding pro-peptide-OmpA hybrid protein into the supernatant of S. carnosus, and
6 Staphylococcus carnosus
completely prevented the cell-associated proteolytic degradation of OmpA, despite the fact that the protease-sensitive membrane domain is present also in the fusion protein. Although the underlying molecular mechanism is still not understood, these results showed that a main function of the lipase pro-peptide is to prevent the proteolytic degradation of the attached passenger protein by proteases residing on the trans-side of the cytoplasmic membrane and/or in the cell wall area [37]. The beneficial effect of the lipase pro-peptide has also been used successfully for the secretion of mammalian proteins by S. carnosus. For example, pro-insulin [59], different antibody fragments [60, 61] and a precursor of the human peptide hormone calcitonin (see below; Section 6.4.3) could be secreted by employing vectors that had incorporated the pre-pro part of the S. hyicus lipase for secretion targeting. In all cases, it became evident that the pro-peptide is crucial for efficient secretion and proteolytic stability. 6.4.3 Process Development for the Secretory Production of a Human Calcitonin Precursor Fusion Protein by S. carnosus Human calcitonin (hCT) is a peptide hormone of 32 amino acids which is effectively used for the treatment of diseases such as osteoporosis imperfect, menopausal osteoporosis and Paget’s disease. To date, therapeutic hCT has been produced industrially by total chemical synthesis. Biologically active hCT requires the amidation of its C-terminal proline residue, a modification which is introduced enzymatically to the unmodified synthesized precursor peptide [62]. Due to the high costs of the peptide synthesis and the increasing therapeutic demands for hCT, a biotechnological process for the large-scale (kg) production of hCT precursor peptide would be desirable. Previous attempts to establish such a process mostly used E. coli as an expression system. As small peptides such as hCT are rapidly degraded by intracellular proteases, hCT was produced as a fusion protein. However, the intracellular production of such fusion proteins resulted in the formation of inclusion bodies [63]. Therefore, secretion of the hCT precursor peptide fused to the pre-pro part of the S. hyicus lipase by S. carnosus was considered as a promising approach to overcome problems of aggregation and proteolytic instability. Multiple repeats (2, 4, 6, 8, or 10 copies) of the hCT precursor peptide were fused to the pre-pro-part of the lipase. The hCT peptides were separated from each other and from the pro-peptide by linker sequences containing cleavage sites for the endopeptidase clostripain, allowing the enzymatic release of monomeric hCT precursor peptides subsequent to the isolation of the respective fusion proteins. The corresponding hybrid sequences were cloned into an expression vector fusing them to a xylose-inducible promoter. The respective plasmids were used to transform S. carnosus. In all cases, the lipase secretion signals allowed an efficient release of the respective pro-peptide-hCT fusion proteins. The yields obtained in laboratory shake flasks were close to those obtained with the native lipase under the same conditions (approx. 40–50 mg L−1 ) [47]. In the next step, a fermentation process was established for the production of the pro-peptide-hCT fusion containing two hCT repeats. First, the recombinant strain was cultured in a pH-auxostatic fed-batch process in a 3-L stirred tank reactor using glycerol as carbon source, and
15
16 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
with a pH-controlled addition of yeast extract. Following this process design, the production of 2000 mg L−1 of the pro-peptide-hCT fusion protein (equivalent to 420 mg L−1 hCT precursor peptide) was achieved within 14 hours. The process was then scaled up to 150 L, resulting in yields of 200 g of the pro-peptide-hCT fusion protein [64, 65]. Finally, a downstream procedure was developed based on an expanded bed adsorption (EBA) to the cation exchanger STREAMLINE SP matrix at pH 5. In this way, the pro-peptide-hCT fusion protein was recovered in a single step from the unclarified culture broth [47]. To summarize: the results obtained with hCT precursor peptide example clearly demonstrate the high potential of S. carnosus as an alternative expression system for the secretory production of heterologous proteins. 6.5 Surface Display on S. carnosus
The anchoring of heterologous proteins on the cell surface of food-grade bacteria, such as S. carnosus, has become an attractive strategy for various potential applications in biotechnology, microbiology, and biomedicine. Many of the major steps involved in the surface anchoring of proteins in Gram-positive bacteria have been elucidated in S. aureus, using protein A as the model protein [66]. For membrane translocation and subsequent anchoring to the cell wall, the proteins to be displayed on the surface must be equipped with an N-terminal secretion signal and a C-terminal cell wall sorting signal. The latter signal consists of: (1) a conserved penta-peptide motif, LPXTG; (2) a hydrophobic stretch of 15–22 amino acid residues; and (3) a short charged tail of six to seven amino acid residues. During secretion of proteins containing such a cell wall sorting signal, the hydrophobic region and the charged tail impede membrane translocation by the Sec system, allowing the recognition of the LPXTG motif by a membrane-associated sortase enzyme. In a following two-step transpeptidation reaction, sortase cleaves the LPXTG motif between the threonine and glycine residues and covalently attaches the threonine to the amino group of the pentaglycine cell wall cross-bridge, resulting in a cell wall-attached protein [67]. Using the pre-pro part of the S. hyicus lipase as secretion signals and the cell wall sorting signal of either protein A or fibronectin-binding protein from S. aureus (see Figure 5), various heterologous proteins have successfully been displayed on the S. carnosus cell surface. The S. hyicus-derived lipase and the E. coli-derived β-lactamase are two examples of enzymes which have been tethered to the surface of S. carnosus in an active form. Approximately 10 000 enzyme molecules were found to be present on each cell, and it was suggested that, due to their rigid structure, Gram-positive bacteria would be particularly appropriate for applications such as whole-cell catalysts or biofilters [68]. The surface display of chimeric proteins containing poly-histidyl peptides resulted in S. carnosus strains which have gained a significant Ni2+ - and Cd2+ -binding capacity, suggesting that such bacteria could find use as metal-binding bioadsorbents [69]. However, most surface display approaches in S. carnosus are devoted to the exposition of an antigenic determinant on the cell surface. Use of the respective recombinant bacteria as
Appendix 17
live vaccine delivery vectors is considered to be an attractive vaccination alternative. For example, epitopes derived from streptococcal protein G [70] or the G glycoprotein of human respiratory syncytial virus (RSV) [71] were successfully displayed on the S. carnosus cell surface. Furthermore, intranasal or oral immunization of mice with the corresponding live recombinant staphylococci elicited significant serum IgG responses to the respective antigens. In addition, the challenge of mice, immunized with S. carnosus displaying RSV peptides on the cell surface, with 105 RSV tissue culture infectious doses50 showed protection of the lung in approximately half of the mice in the two immunization groups. These results clearly demonstrate that immune protection can be elicited using the food-grade bacterium S. carnosus as a vaccine-delivery vehicle [71]. Appendix Table A1 Selection of S. carnosus host strains
Strain
Genotype
Reference
TM300 TM300 recA
wild-type recA
[39] [72]
Table A2 Selection of S. carnosus vectors
Plasmid marker
Type
Resistance
Relevant features
Reference
pCA43 pCT20 pCE10 pSCRBA
Cloning vector Cloning vector Cloning vector Cloning vector
cml cml, tet cml, ery cml
[44] [45] [45] [51]
pPS11
Promoter-probe vector Expression vector Expression vector
cml
— — — food-grade vector system based on sucrose utilization promotor-less lipase gene of S. hyicus as reporter xylose-inducible promoter
cml (Sc) amp (Ec)
[47]
pLipPS1
Secretion vector
cml
pSPP-mABPXM
Surface display vector
cml (Sc) amp (Ec)
xylose-inducible promoter (E. coli/S. carnosus shuttle vector) contains promoter and structural gene of S. hyicus pre-prolipase contains lipase promoter and gene fragments encoding the pre-pro-part of S. hyicus lipase and the cell wall anchor of S. aureus protein A (E. coli/S. carnosus shuttle vector)
pCX26 pXR100
cml
[46] [46]
[49]
[50]
18 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
References 1 AUNSTRUP K. (1979) Production, isolation, and economics of extracellular enzymes. In: Applied Biochemistry and Bioengineering, Vol. 2 (WINGARD L. B., KATCHALSKI-KAZIR E., GOLDSTEIN L., Eds), Academic Press, New York, pp 27–69 2 DEBABOV V. S. (1982) The industrial use of Bacilli. In: The Molecular Biology of the Bacilli (DUBNAU D., Ed), Academic Press, New York, pp 331–370 3 HARWOOD C. R. (1992) Bacillus subtilis and its relatives: molecular biological and industrial workhorses. Trends Biotechnol 10: 247–256 4 SIMONEN M., PALVA I. (1993) Protein secretion in Bacillus species. Microbiol Rev 57: 109–137 5 van WELY K. H. M., SWAVING J., FREUDL R., DRIESSEN A. J. M. (2001) Translocation of proteins across the cell envelope of Gram-positive bacteria. FEMS Microbiol Rev 25: 437–454 6 MEENS J., FRINGS E., KLOSE M., FREUDL R. (1993) An outer membrane protein (OmpA) of Escherichia coli can be translocated across the cytoplasmic membrane of Bacillus subtilis. Mol Microbiol 9: 847–855 7 LELOUP L., DRIESSEN A. J. M., FREUDL R., CHAMBERT R., PETIT-GLATRON M. F. (1999) Differential dependence of levansucrase and – amylase secretion on SecA (Div) during the exponential phase of growth of Bacillus subtilis. J Bacteriol 181: 1820–1826 8 PEREZ-PEREZ J., BARBERO J. L., MARQUEZ G., GUTIERREZ J. (1996) Different PrlA proteins increase the efficiency of periplasmic production of human interleukin-6 in Escherichia coli. J Biotechnol 49: 245–247 9 M¨ULLER M., KOCH H. G., BECK K., SCHA¨ FER U. (2001) Protein traffic in bacteria: multiple routes from the ribosome to and across the membrane. Prog Nucleic Acid Res Mol Biol 66: 197–157 10 BERKS B. C., PALMER T., SARGENT F. (2003) The Tat protein translocation pathway and its role in microbial physiology. Adv Microbial Physiol 47: 187–254 11 ALAMI M., L¨UKE I., DEITERMANN S., EISNER G KOCH H. G., BRUNNER J., M¨ULLER M.
12
13
14
15
16
17
18
19
20
(2003) Differential interaction between a twin-arginine signal peptide and its translocase in Escherichia coli. Mol Cell 12: 937–946 MORI H., CLINE K. (2002) A twin arginine signal peptide and the pH gradient trigger reversible assembly of the thylakoid pH/Tat translocase. J Cell Biol 157: 205–210 FEILMEIER B. J., ISEMINGER G., SCHROEDER D., WEBBER H., PHILLIPS G. J. (2000) Green fluorescent protein functions as a reporter for protein localisation in Escherichia coli. J Bacteriol 182: 4068–4076 THOMAS J. D., DANIEL R. A., ERRINGTON J., ROBINSON C. (2001) Export of active green fluorescent protein to the periplasm by the twin-arginine translocase (Tat) pathway in Escherichia coli. Mol Microbiol 39: 47–53 BARRETT C. M. L., RAY N., THOMAS J. D., ROBINSON C., BOLHUIS, A. (2003) Quantitative export of a reporter protein, GFP, by the twin-arginine translocation pathway in Escherichia coli. Biochem Biophys Res Commun 304: 279–284 SCHAERLAEKENS K., SCHIEROVA M., LAMMERTYN E., GEUKENS N., ANNE´ J., van MELLAERT L. (2001) Twin-arginine translocation pathway in Streptomyces lividans. J Bacteriol 183: 6727–6732 POP O., MARTIN U., ABEL C., M¨ULLER J. P. (2002) The twin-arginine signal peptide of PhoD and the TatAd/Cd proteins of Bacillus subtilis form an autonomous Tat translocation system. J Biol Chem 277: 3268–3273 TALMADGE K., KAUFMAN J., GILBERT W. (1980) Bacteria mature preproinsulin to proinsulin. Proc Natl Acad Sci USA 77: 3988–3992 MALKE H., FERRETTI J. J. (1984) Streptokinase: cloning, expression and excretion by Escherichia coli. Proc Natl Acad Sci USA 81: 3557–3561 LAO G. F., WILSON D. B. (1996) Cloning, sequencing, and expression of a Thermomonospora fusca protease gene in Streptomyces lividans. Appl Environ Microbiol 62: 4256–4259
References 21 MILLER J. R., KOVACEVIC S., VEAL L. E. (1987) Secretion and processing of staphylococcal nuclease by Bacillus subtilis. J Bacteriol 169: 3508–3514 22 JONGBLOED J. D. H., MARTIN U., ANTELMANN H., HECKER M., TJALSMA H., VENEMA G., BRON, S., van DIJL J. M., M¨ULLER J. (2000) TatC is a specificity determinant for protein secretion via the twin-arginine translocation pathway. J Biol Chem 275: 41350–41357 23 BLAUDECK N., SPRENGER G. A., FREUDL R., WIEGERT T. (2001) Specificity of signal peptide recognition in Tat-dependent bacterial protein translocation. J Bacteriol 183: 604–610 24 JACOBS M., ANDERSEN J. B., KONTINEN V., SARVAS M. (1993) Bacillus subtilis PrsA is required in vivo as an extracytoplasmic chaperone for secretion of active enzymes synthesized either with or without pro-sequences. Mol Microbiol 8: 957–966 25 VITIKAINEN M., PUMMI T., AIRAKSINEN U., WAHLSTRO¨ M E., WU H., SARVAS M., KONTINEN V. P. (2001) Quantitation of the capacity of the secretion apparatus and requirement for PrsA in growth and secretion of α-amylase in Bacillus subtilis. J Bacteriol 183: 1881–1890 26 WU S. C., YEUNG J. C., DUAN Y., YE R., SZARKA S. J., HABIBI H. R., WONG S. L. (2002) Functional production and characterization of a fibrin-specific single-chain antibody fragment from Bacillus subtilis: Effects of molecular chaperones and a wall-bound protease on antibody fragment production. Appl Environ Microbiol 68: 3261–3269 27 KADOKURA H., KATZEN F., BECKWITH J. (2003) Protein disulfide bond formation in prokaryotes. Annu Rev Biochem 72: 111–135 28 DORENBOS R., STEIN T., KABEL J., BRUAND C., BOLHUIS A., BRON S., QUAX W. J., van DIJL J. M. (2002) Thiol-disulfide oxidoreductases are essential for the production of the lantibiotic sublancin 168. J Biol Chem 277: 16682–16688 29 MEIMA R., ESCHEVINS C., FILLINGER S., BOLHUIS A., HAMOEN L. W., DORENBOS R., QUAX W. J., van DIJL J. M., PROVVEDI R., CHEN I., DUBNAU D., BRON S. (2002) The bdbDC operon of Bacillus subtilis encodes
30
31
32
33
34
35
36
37
thiol-disulfide oxidoreductases required for competence development. J Biol Chem 277: 6994–7001 BOLHUIS A., VENEMA G., QUAX W. J., BRON S., van DIJL J. M. (1999) Functional analysis of paralogous thiol-disulfide oxidoreductases in Bacillus subtilis. J Biol Chem 274: 24531–24538 van LEEN R. W., BAKHUIS J. G., van BECKHOVEN R. R. W. C., BURGER H., DORSSERS L. C. J., HOMMES R. W. J., LEMSON P. J., NOORDAM B., PERSOON N. L. M., WAGEMAKER G. (1991) Production of human interleukin-3 using industrial microorganisms. Bio/Technology 9: 47–52 SAUNDERS C. W., SCHMIDT B. J., MALLONEE R. L., GUYER M. S. (1987) Secretion of human serum albumin from Bacillus subtilis. J Bacteriol 169: 2917–2925 BOLHUIS A., TJALSMA H., SMITH H. E., de JONG A., MEIMA R., VENEMA G., BRON S., van DIJL J. M. (1999) Evaluation of bottlenecks in the late stages of protein secretion in Bacillus subtilis. Appl Environ Microbiol 65: 2934–2941 THWAITE J. E., BAILLIE L. W. J., CARTER N. M., STEPHENSON K., REES M., HARWOOD C. R., EMMERSON P. T. (2002) Optimization of the cell wall microenvironment allows increased production of recombinant Bacillus anthracis protective antigen from B. subtilis. Appl Environ Microbiol 68: 227–234 GUMPERT J., HOISCHEN C. (1998) Use of cell wall-less bacteria (L-forms) for efficient expression and secretion of heterologous gene products. Curr Opin Biotechnol 9: 506–509 MURASHIMA K., CHEN C. L., KOSUGI A., TAMARU Y., DOI R. H., WONG S. L. (2002) Heterologous production of Clostridium cellulovorans engB, using protease-deficient Bacillus subtilis, and preparation of active recombinant cellulosomes. J Bacteriol 184: 76–81 MEENS J., HERBORT M., KLEIN M., FREUDL R. (1997) Use of the pre-pro-part of Staphylococcus hyicus lipase as a carrier for secretion of Escherichia coli outer membrane protein A (OmpA) prevents proteolytic degradation of OmpA by cell-associated protease(s) in two different
19
20 Protein Production in Staphylococcus carnosus and other Gram-Positive Bacteria
38
39
40 41
42
43
44
45
46
47
48
Gram-positive bacteria. Appl Environ Microbiol 63: 2814–2820 G¨OTZ F. (1986) Ein neues Wirt-Vektor-System bei Staphylococcus carnosus. Umschau 10: 530–537 SCHLEIFER K. H., FISCHER U. (1982) Description of a new species of the genus Staphylococcus: Staphylococcus carnosus. Int J Syst Bacteriol 32: 153–156 LIEPE H. U. (1982). Bakterienkulturen und Rohwurst. Forum Mikrobiol 5: 10–15 G¨OTZ F., KREUTZ B., SCHLEIFER K. H. (1983) Protoplast transformation of Staphylococcus carnosus by plasmid DNA. Mol Gen Genet 189: 340–342 G¨OTZ F., SCHUMACHER B. (1987) Improvements of protoplast transformation in Staphylococcus carnosus. FEMS Microbiol Lett 40: 285–288 AUGUSTIN J., G¨OTZ F. (1990) Transformation of Staphylococcus epidermidis and other staphylococcal species with plasmid DNA by electroporation. FEMS Microbiol Lett 66: 203–208 KREUTZ B., G¨OTZ F. (1984) Construction of Staphylococcus plasmid vector pCA43 conferring resistance to chloramphenicol, arsenate, arsenite and antimony. Gene 31: 301–304 KELLER G., SCHLEIFER K. H., G¨OTZ F. (1983) Construction and characterization of plasmid vectors for cloning in Staphylococcus aureus and Staphylococcus carnosus. Plasmid 10: 270–278 WIELAND K. P., WIELAND B., G¨OTZ F. (1995) A promoter-screening plasmid and xylose-inducible, glucose-repressible expression vectors for Staphylococcus carnosus. Gene 158: 91–96 SANDGATHE A., TIPPE D., DILSEN S., MEENS J., HALFAR M., WEUSTER-BOTZ D., FREUDL R., THO¨ MMES J., KULA M. R. (2003) Production of a human calcitonin precursor with Staphylococcus carnosus: secretory expression and single-step recovery by expanded bed adsorption. Process Biochem 38: 1351–1363 MADSEN S. M., BECK H. C., RAVN P., VRANG A., HANSEN A. M., ISRAELSEN H. (2002) Cloning and inactivation of a branched-chain-amino-acid aminotransferase gene from
49
50
51
52
53
54
55
56
57
58
Stapyhlococcus carnosus and characterization of the enzyme. Appl Environ Microbiol 68: 4007–4014 LIEBL W., G¨OTZ F. (1986) Studies on lipase directed export of Escherichia coli β-lactamase in Staphylococcus carnosus. Mol Gen Genet 204: 166–173 SAMUELSON P., HANSSON M., AHLBORG N., ANDREONI C., G¨OTZ F., B¨ACHI T., NGUYEN T. N., BINZ H., UHLEN M., STAHL S. (1995) Cell surface display of recombinant proteins on Staphylococcus carnosus. J Bacteriol 177: 1470–1476 BRU¨ CKNER R., G¨OTZ F. (1996) Development of a food-grade vector system for Staphylococcus carnosus. Syst Appl Microbiol 18: 510–516 SINHA B., FRANCOIS P., QUE Y. A., HUSSAIN M., HEILMANN C., MOREILLON P., LEW D., KRAUSE K. H., PETERS G., HERRMANN M. (2000) Heterologously expressed Staphylococcus aureus fibronectin-binding proteins are sufficient for invasion of host cells. Infect Immun 68: 6871–6878 HEILMANN C., SCHWEITZER O., GERKE C., VANITTANAKOM N., MACK D., G¨OTZ F. (1996) Molecular basis of intercellular adhesion in the biofilm-forming Staphylococcus epidermidis. Mol Microbiol 20: 1083–1091 G¨OTZ F., POPP F., KORN E., SCHLEIFER K. H. (1985) Complete nucleotide sequence of the lipase from Staphylococcus hyicus cloned in Staphylococcus carnosus. Nucleic Acids Res 13: 5895–5906 WENZIG E., LOTTSPEICH F., VERHEIJ B., De HAAS G. H., G¨OTZ F. (1990) Extracellular processing of the Staphylococcus hyicuslipase. Biochemistry (Life Sci Adv) 9: 47–55 AYORA S., LINDGREN P. E., G¨OTZ F. (1994) Biochemical properties of a novel metalloprotease from Staphylococcus hyicus subsp. hyicus involved in extracellular lipase processing. J Bacteriol 176: 3218–3223 SHINDE U., INOUYE M. (2000) Intramolecular chaperones: polypeptide extensions that modulate protein folding. Semin Cell Dev Biol 11: 35–44 DEMLEITNER G., G¨OTZ F. (1994) Evidence for importance of the Staphylococcus hyicus lipase pro-peptide in lipase
References
59
60
61
62
63
64
65
secretion, stability and activity. FEMS Microbiol Lett 121: 189–198 G¨OTZ F., VERHEIJ H. M., ROSENSTEIN R. (1998) Staphylococcal lipases: molecular characterization, secretion, and processing. Chem Phys Lipids 93: 15–25 PSCHORR J., BIESELER B., FRITZ H. J. (1994) Production of the immunoglobulin variable domain REIv via fusion protein synthesized and secreted by Staphylococcus carnosus. Biol Chem Hoppe-Seyler 375: 271–280 SCHNAPPINGER D., GEISSDO¨ RFER W., SIZEMORE C., HILLEN W. (1995) Extracellular expression of native antilysozyme fragments in Staphylococcus carnosus. FEMS Microbiol Lett 129: 121–128 ISHIKAWA H., TAMAOKI H. (1996) Production of human calcitonin in Escherichia coli from multimeric fusion protein. J Ferm Bioeng 82: 140–144 YABUTA M., SUZUKI Y., OHSUYE K. (1995) High expression of recombinant human calcitonin precursor peptide in Escherichia coli. Appl Microbiol Biotechnol 42: 703–708 DILSEN S., PAUL W., SANDGATHE A., TIPPE D., FREUDL R., THO¨ MMES J., KULA M. R., TAKORS R., WANDREY C., WEUSTER-BOTZ D. (2000) Fed-batch production of recombinant human calcitonin precursor fusion protein using Staphylococcus carnosus as an expression-secretion system. Appl Microbiol Biotechnol 554: 361–369 DILSEN S., PAUL W., HERFORTH D., SANDGATHE A., ALTENBACH-REHM J., FREUDL R., WANDREY C., WEUSTER-BOTZ D. (2001) Evaluation of parallel operated small-scale bubble columns for microbial
66
67
68
69
70
71
72
process development using Staphylococcus carnosus. J Biotechnol 88: 77–84 MAZMANIAN S. K., TON-THAT H., SCHNEEWIND O. (2001) Sortase-catalyzed anchoring of surface proteins to the cell wall of Staphylococcus aureus. Mol Microbiol 40: 1049–1057 PATERSON G. K., MITCHELL T. J. (2004) The biology of Gram-positive sortase enzymes. Trends Microbiol 12: 89–95 STRAUSS A., G¨OTZ F. (1996) In vivo immobilization of enzymatically active polypeptides on the cell surface of Staphylococcus carnosus. Mol Microbiol 21: 491–500 SAMUELSON P., WERNERUS H., SVEDBERG M., STAHL S. (2000) Staphylococcus surface display of metal-binding polyhistidyl peptides. Appl Environ Microbiol 66: 1243–1248 CANO F., LILJEQVIST S., NGUYEN T. N., SAMUELSON P., BONNEFOY J. Y., STAHL S., ROBERT A. (1999) A surface-displayed cholera toxin B peptide improves antibody responses using food-grade staphylococci for mucosal subunit vaccine delivery. FEMS Immunol Med Microbiol 25: 289–298 CANO F., PLOTNICKY-GILQUIN H., NGUYEN T. N., LILJEQVIST S., SAMUELSON P., BONNEFOY J., STAHL S., ROBERT A. (2000) Partial protection to respiratory syncytial virus (RSV) elicited in mice by intranasal immunization using live staphylococci with surface-displayed RSV-peptides. Vaccine 18: 2743–2752 SANDGATHE A. (2000) Herstellung eines humanen Calcitonin-Vorl¨aufers mit Staphylococcus carnosus: Sekretorische Expression und Aufarbeitung mittels Fließbettadsorption. Dissertation, Heinrich-Heine-Universit¨at D¨usseldorf
21
1
Protein Production in Hansenula polymorpha Hyun Ah Kang Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, Korea
Gerd Gellissen W¨ulfrath, Germany
Originally published in: Production of Recombinant Proteins. Edited by Gerd Gellissen. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31036-4
1 History, Phylogenetic Position, Basic Genetics and Biochemistry of H. polymorpha
A limited number of yeast species are able to utilize methanol as a sole energy and carbon source. The range of methylotrophic yeasts includes Candida boidinii, Pichia methanolica, Pichia pastoris and Hansenula polymorpha [1]. The latter two are distinguished by a growing track record of application in heterologous gene expression. In particular, H. polymorpha has found successful application in the industrial production of heterologous proteins, as detailed in later sections of this chapter [3, 4]. Since H. polymorpha is the more thermotolerant of the two yeasts, it might also be better suited as source and for the production of proteins considered for crystallographic studies. In basic research it is used as model organism for research in peroxisomal function and biogenesis, as well as nitrate assimilation [4–7]. Again, the presence of a nitrate assimilation pathway is a feature not shared by P. pastoris. The first methylotrophic yeast described was Kloeckera sp. No 2201, later re-identified as Candida boidinii [8]. Subsequently, other species, including H. polymorpha, were identified as having methanol-assimilating capabilities [9]. Three basic strains of this species with unclear relationships, different features, and independent origins are used in basic research and biotechnological applications: strain CBS4732 (CCY38–22-2; ATCC34438, NRRL-Y-5445) was initially isolated by Morais and Maia [10] from soil irrigated with waste water from a distillery in Pernambuco, Brazil. Strain DL-1 (NRRL-Y-7560; ATCC26012) was isolated from soil by Levine and Cooney [11]. The strain named NCYC495 (CBS1976; ATAA14754,
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Production in Hansenula polymorpha
Fig. 1 Taxonomy of Hansenula polymorpha [19].
NRLL-Y-1798) is identical to a strain first isolated by Wickerham [12] from spoiled concentrated orange juice in Florida and initially designated Hansenula angusta. Strains CBS4732 and NCYY495 can be mated, whereas strain DL-1 cannot be mated with the other two (K. Lahtchev, personal communication). The genus Hansenula H. et P. Sydow includes ascosporogenic yeast species exhibiting spherical, spheroidal, ellipsoidal, oblong, cylindrical, or elongated cells. One to four ascospores are formed. Ascigenic cells are diploid arising from conjugation of haploid cells. The genus is predominantly heterothallic. H. polymorpha is probably homothallic, exhibiting an easy interconversion between the haploid and diploid states [13, 14]. Since the morphological characteristics of the Hansenula species are shared by species of the genus Pichia Hansen, Kurtzman [15], after performing DNA/DNA reassociation studies, proposed to merge both genera and transfer Hansenula species with hat-shaped ascospores to Pichia Hansen emend Kurtzman, although Hansenula spp. can grow on nitrate and Pichia spp. cannot. Kurtzman and Robnett [16] provided a phylogenetic tree in which nitrate-positive and nitrate-negative Pichia are clustered, demonstrating the unreliability of nitrate assimilation for prediction of kinship. The leading taxonomy monographs follow this proposal, renaming H. polymorpha as Pichia angusta [17, 18]. However, the merging of the genera is still criticized by some taxonomists, and there are arguments for maintaining the established and popular name Hansenula polymorpha [14, 20]. Among a wealth of biochemical and physiological characteristics, some selected features are presented in the following sections; for a more comprehensive view, the reader is referred to the chapters of a recent monograph [4]. Some strains of H. polymorpha can tolerate temperatures of 49 ◦ C and higher [13, 21]. It was shown that trehalose synthesis is not required for growth at elevated temperatures, but that it is necessary for normal acquisition of thermotolerance [21]. The thermoprotective compound trehalose accumulates in large amounts in cells grown at high temperatures. The synthetic steps for trehalose synthesis have been detailed for H. polymorpha. The TPS1 gene encoding trehalose 6-phosphate synthase is the key enzyme gene of this pathway [21, 22]. Transcripts of this gene were found to be present in high quantities in cells grown at normal temperatures, and to be
1 History, Phylogenetic Position, Basic Genetics and Biochemistry of H. polymorpha 3
Fig. 2 Hansenula polymorpha cell. The cells grown in a methanollimited chemostat are crowded with peroxisomes (Courtesy of M. Veenhuis).
especially abundant when grown at elevated temperatures [21]. Accordingly, the TPS1-derived promoter provides an attractive element to drive constitutive heterologous gene expression which can be further boosted at temperatures above 42 ◦ C ([23, 24]; see also the following sections). The capability of H. polymorpha to grow on methanol as a sole energy and carbon source is enabled by a methanol utilization pathway that is shared by all known methylotrophic yeasts [25, 26]. Growth on methanol is accompanied by a massive proliferation of peroxisomes in which the initial enzymatic steps of this pathway take place (Figure 2) [5, 6, 26]. The enzymatic steps of the compartmentalized methanol metabolism pathway are detailed in Figure 3. For more comprehensive information, the reader is referred to [26]. During growth on methanol, key enzymes of the methanol metabolism are present in high amounts. An especially high abundance can be observed for methanol oxidase (MOX), formiate dehydrogenase (FMD), and dihydroxyacetone synthase (DHAS) [32]. The presence of these enzymes is regulated at the transcriptional level of the respective genes. Gene expression is subject to a carbon sourcedependent repression/derepression/induction mechanism conferred by inherent properties of their promoters. Promoters are repressed by glucose, derepressed by glycerol, and induced by methanol. Again, these promoter elements – and in particular the elements derived from the MOX and the FMD genes – constitute attractive components for the control of heterologous gene expression that can be regulated by carbon source addition (see forthcoming sections). The possibility of eliciting high promoter activity with glycerol as sole carbon source and even with limited addition of glucose (glucose starvation) is unique among the methylotrophic yeasts. In the related species C. boidinii, P. methanolica, and P. pastoris, the active status of the promoter is strictly dependent on the presence of methanol or methanol derivatives [1]. However, this does not seem to be an inherent promoter characteristic; rather, it rather depends on the cellular environment of the specific host as upon
4 Protein Production in Hansenula polymorpha
Fig. 3 The methanol utilization pathway and its compartmentalization in methylotrophic yeasts. (Modified after [1, 26].) 1 Methanol is oxidized by alcohol oxidase to generate formaldehyde and hydrogen peroxide. 2 The toxic hydrogen peroxide is decomposed by catalase to water and oxygen. 3, 4 Within a dissimilatory pathway the formaldehyde is oxidized by two subsequent dehydrogenase reactions to carbon dioxide, catalyzed by a formaldehyde dehydrogenase (FLD) and a formate dehydrogenase (FMD or FDH). 5 For assimilation, the formaldehyde reacts with xylulose 5-phosphate (Xu5 P) by the action of dihydroxya-cetone synthase (DHAS) to
generate the C3 compounds glyceraldehyde 3-phosphate (GAP) and dihydroxyacetone (DHA). 6 DHA is phosphorylated by dihydroxyacetone kinase (DHAK) to dihydroxyacetone phosphate (DHAP). 7 GAP and DHAP yield in an aldolase reaction fructose 1,6-biphosphate (FBP). 8 In further steps of the pentose phosphate cycle, fructose 5-phosphate and xylulose 5-phosphate are finally generated. Identified and characterized genes of the H. polymorpha methanol utilization pathway are boxed and shown in the pathway position of the encoded enzymes. The genes are MOX [27], DAS [28], CAT [29], DAK [104], FLD1 [30]), and FMD [31].
transfer into H. polymorpha the P. pastoris-derived AOX1 promoter is active under glycerol conditions [33, 34].
2 Characteristics of the H. polymorpha Genome
As pointed out before, there are several H. polymorpha strains available. In the following section we focus on strains CBS4732 (CCY38–22-2; ATCC34438; NRRLY-5445) and DL-1 (NRRL-Y-7560; ATCC26012) which are the two ancestor strains of the preferred H. polymorpha hosts employed in heterologous gene expression. Data on karyotyping are available for both strains (Figure 4; Table 1).
2 Characteristics of the H. polymorpha Genome
Fig. 4 Electrophoretic karyotype of H. polymorpha strains CBS4732 and DL-1. A) Chromosome pattern of H. polymorpha strain CBS4732 separated by pulsed-field gel electrophoresis (Schwarz and Cantor 1984) using the Pulsaphor apparatus (Pharmacia). Details of the separation conditions are provided elsewhere [35]. B) Chromo Blot. The separated chromosomes were transferred
to a nylon membrane and hybridized to a labeled URA3 probe. A signal was obtained exclusively with chromosome I. (Modified and supplemented according to Waschk et al., [35].) C) Chromosome pattern of strain DL-1 (MJ Ok, HA Kang, unpublished results). Details of the separation conditions will be published elsewhere.
Table 1 Chromosomal localization of several cloned genes in strains CBS4732
Cloned gene/sequence used as specific probe
Function
Reference
Chromosome no.
URA 3
Orotidine-5 -phosphate decarboxylase
Merckelbach et al., (1993)
I
CPY
Carboxypeptidase Y
Unpublished1)
I
GAP
Glycerinaldehyde 3-phosphate dehydrogenase
Unpublished2)
I
rDNA (5.8S, 18S, 26S)
Ribosomal DNA
Klabunde3)
II
HARS1
Autonomously replicating sequence 1 Trehalose 6-phosphate synthase
Ledeboer et al., (1986)
III
[21]
IV
β-Isopropylmalate dehydrogenase
Agaphonov et al., (1994)
IV
MOX
Methanol oxidase
[27]
V
FMD
Formiate dehydrogenase
Hollenberg and Janowicz (1989)
VI
TPS1 HLEU2
1)
Bae JH, Kim HY, Sohn JH, Choi ES, Rhee, SK. Accession number U67174. Sohn JH, Choi ES, Rhee SK. Accession number U95625. 3) J Klabunde, personal communication. 2)
5
6 Protein Production in Hansenula polymorpha Table 2 Comparison of selected gene sequences from H. polymorpha strains CBS4732 (RB11)
and DL-1 Gene name
Amino acid identity (%)
Nucleic acid identity (%)
Accession No. in GenBank*
Reference
CST13 CPY GSH2 MNN9 PMI40 PMR1 YPT1 Average
96.7 98.0 96.0 96.3 97.9 98.5 99.5 97.6
95.8 95.9 94.5 95.5 94.9 95.2 97.2 96.6
AF454544 U67174 AF435121 AF264786 AF454544 U92083 AF454544
[38, 39] KRIBB KRIBB [37] [38, 39] Kang et al., (1998) [38, 39]
∗
The sequences of genes isolated from the DL-1 strain were obtained from GenBank and compared with those from the RB11 strain [36].
Pulsed-field gel electrophoresis of H. polymorpha chromosomes revealed that both strains have six chromosomes, ranging from 0.9 to 1.9 Mbp, but the electrophoretic patterns of their chromosomes were quite different. The scientific and industrial significance of strain CBS4732 is now met by the recent characterization of its entire genome [36]. A few gene sequences are elucidated and can be compared for both strains (Table 2). The sequence identity of the open reading frame (ORF) for the selected genes ranges between 94.5 and 97.2%, with an average value of 96.6%. The sequence differences are observed to be much more apparent at the 5 - and 3 -untranslated regions, which might be involved in controlling gene expression. This implies that two strains are closely related, but have distinct genetic and physiological characteristics. Several groups worldwide initiated studies on the CBS4732 genome several years ago. Included in the comparative genome analysis on 13 hemiascomycetous yeasts, part of the H. polymorpha genome sequence was established using a partial random sequencing strategy with a coverage of 0.3 genome equivalents. Using this approach, about 3 Mbp of sequencing raw data of the H. polymorpha genome was yielded [41]. The recently terminated genome analysis aimed at a higher coverage using a BAC-to-BAC approach and resulting in the comprehensive genome analysis of this organism [36]. For sequencing of H. polymorpha strain RB11, an odc1-derivative of strain CBS4732, a BAC library with approximately 17 × coverage was constructed in a pBACe3.6 vector according to [42, 43]. Details on base calling, assembly and editing are provided by Ramezani-Rad et al., [36]. Sequencing resulted in the characterization of 8.733 million base pairs assembled into 48 contigs. The derived sequence covers over 90% of the estimated total genome content of 9.5 Mbp located on six chromosomes which range in size from 0.9 to 2.2 Mbp (see Figure 4A). From the sequenced 8.73 Mb, a total of 5848 ORFs have been extracted for proteins longer than 80 amino acids (aa), and 389 ORFs smaller than 100 aa were identified. Likewise, 4771 ORFs have homologues to known proteins (81.6%). Calculation of the gene density and protein length, taking into account the gene numbers, showed an average gene density of one gene per 1.5 kb, and an
3 N-linked Glycosylation in H. polymorpha 7 Table 3 Hansenula polymorpha genome statistics
Contigs: Total length of contigs: Average contig length: No. of extracted ORFs: No. of ORFs <100 aa: Average gene density: Average gene size (start-stop): Average protein length:
48 8 733 442 bp 182 kb 5848 389 1 gene/1.5 kb 1.3 kb (1320 nt) 440 aa
average protein length of 440 amino acids. Ninety-one introns have been identified by homology to known proteins and confirmed by using GeneWise [44]. Eighty tRNAs were identified, corresponding to all 20 amino acids. From approximately 50 rRNA clusters [35, 44, 45], seven clusters have been fully sequenced. All clusters are completely identical and have a precise length of 5033 bp. The main functional categories and their distribution in the gene set are manually predicted for energy, 4%; cellular communication, signal transduction mechanism, 3%; protein synthesis, 6%; cell rescue, defense and virulence, 4%; cellular transport and transport mechanisms, 9%; cell cycle and DNA processing, 9%; protein fate (folding, modification, destination), 17%; transcription, 13% and metabolism, 19%. A selection of the data obtained from the annotated sequence is provided in Tables 3 and 4.
3 N-linked Glycosylation in H. polymorpha
The similarities between yeast and animal cell secretion pathways have made yeasts in general preferred microbial host systems for the production of human secretory proteins. A majority of human proteins with therapeutic potential are glycoproteins, Table 4 Functional categorization of genes
Functional category
No. of ORFs
%
Metabolism Energy Cell growth, cell division and DNA synthesis Transcription Protein synthesis Protein destination Transport facilitation Cellular transport and transport mechanisms Control of cellular organization Cellular communication/signal transduction Cell rescue, defense, and virulence Cell fate Regulation of/interaction with cellular environment
1114 231 518 767 323 1014 423 518 417 170 260 282 184
19 4 9 13 6 17 7 9 7 3 4 5 3
8 Protein Production in Hansenula polymorpha
and increasing evidence shows that oligosaccharides assembled on glycoproteins have profound effects on the properties of glycoproteins, such as antigenicity, specific activity, solubility, and stability. The initial processing of N-linked glycans on glycoproteins, which occurs in the endoplasmic reticulum (ER), is well conserved among eukaryotes and results in the core oligosaccharide, Man8 GlcNAc2 . However, further maturation of oligosaccharides in the Golgi apparatus is quite variable among organisms, even among yeast species [47]. Yeasts elongate the core oligosaccharide mostly by addition of mannose, leading to the formation of core-sized structures (Man<15 GlcNAc2 ) as well as hypermannose structures (Man50–200 GlcNAc2 ) with extended poly-α-1,6 outer mannose chains, which are decorated with various carbohydrate side chains in a species-specific manner. In Saccharomyces cerevisiae, the linear backbone of the outer chain is often composed of 50 or more mannoses, highly branched by addition of α-1,2- linked mannoses and terminally capped with α-1,3-linked mannoses, generating heavily hypermannosylated glycoproteins. The outer chain also contains several mannosylphosphate residues, conferring the oligosaccharide with a negative net charge [48, 49]. Compared to S. cerevisiae, the mannose outer chains of N-linked oligosaccharides generally appear to be much shorter in the methylotrophic yeasts P. pastoris and H. polymorpha, although extensive hyperglycosylation has also been reported in a few cases of recombinant glycoproteins produced in these yeasts [50, 51]. Analyses of N-linked oligosaccharides added to native and recombinant glycoproteins from P. pastoris have indicated that the major oligosaccharide species in P. pastoris are Man8–14 GlcNAc2 with short α-1,6 extensions. More significantly, P. pastoris oligosaccharides are reported to have no hyperimmunogenic terminal α-1,3 glycan linkages [52, 53]. Phosphomannose has been detected in both elongated and core oligosaccharides on some recombinant proteins of P. pastoris [54, 55]. At present, very little information exists on the structural characteristics of N-linked oligosaccharides of H. polymorpha-derived glycoproteins. A comparative study on the glycosylation pattern of recombinant human α 1 -antitrypsin produced in S. cerevisiae, H. polymorpha, and P. pastoris has suggested that the length of outer mannose chains attached to the recombinant protein in H. polymorpha was much shorter than in S. cerevisiae, but slightly longer than in P. pastoris [56]. A recent study on the structure of the oligosaccharides derived from the recombinant Aspergillus niger glucose oxidase (GOD) and the cell wall mannoproteins derived from H. polymorpha has revealed that most oligosaccharide species attached to the recombinant GOD have core-sized structures (Man8–12 GlcNAc2 ) without terminal α-1,3-linked mannose residues [49]. Therefore, the outer chain processing in the N-linked glycosylation pathway in H. polymorpha appears to be similar to that in P. pastoris, with the addition of short outer chains to the core and no terminal α-1,3-linked mannose addition (Figure 5). In contrast to the yeast oligosaccharides composed solely of mannose, a variety of sugars including galactose, N-acetylgalactosamine and sialic acid, are added to oligosaccharides in mammals. The differences in glycan processing between yeasts and humans are a major limitation for yeasts to be used in the production of recombinant glycoproteins for therapeutic use. Glycoproteins derived from yeast
3 N-linked Glycosylation in H. polymorpha 9
Fig. 5 The representative N-linked oligosaccharide structures assembled on the S. cerevisiae, P. pastoris, or H. polymorpha-derived glycoproteins. Information on the S. cerevisiae and P. pastoris oligosaccharides is from Gemmill and Trimble [47]).
expression systems contain nonhuman N-glycans of the high-mannose type, which are immunogenic in humans. Attempts have been made genetically to modify glycosylation processes in S. cerevisiae [57] and P. pastoris [58], in order to trim the yeast N-glycans of the high-mannose type to the human glycans of the (Man5 GlcNAc2 ) intermediate type. A more advanced achievement has been recently made genetically to re-engineer the glycosylation pathway of P. pastoris to produce the complex human N-glycan N-acetylglucosamine2 -mannose3 -N-acetylglucosamine2 (GlcNAc2 Man3 GlcNAc2 ) [59]. Potentially, the development of H. polymorpha expression systems for proper glycosylation can be achieved as further understanding is gained of H. polymorpha-specific carbohydrate structure and processing sugar transferases. To date, only a few H. polymorpha genes and mutants related to protein glycosylation have been reported [37–39, 60, 61]. Information from the H. polymorpha genome sequence [36, 62] will expedite the identification of genes that are predicted to be involved in the biosynthesis of sugar chains. The functional characterization of these genes should facilitate delineation of the H. polymorphaspecific N-glycosylation pathway, and this would provide valuable information for the development of glyco-engineering strategies in H. polymorpha to achieve the optimal glycosylation of recombinant proteins.
10 Protein Production in Hansenula polymorpha
4 The H. polymorpha-based Expression Platform 4.1 Transformation
Recombinant H. polymorpha strain generation requires special tools. Application of the commonly used S. cerevisiae 2 µm sequence with its predominantly episomal fate is restricted to a limited set of host strains [63]. A variety of gene replacement approaches have been described, but these require DNA fragments with termini comprising much longer target gene overlaps than those used in S. cerevisiae [64, 65]. Typically, these overlaps must be in the magnitude of hundreds to thousands of base pairs. In addition, since the background caused by nonhomologous recombination is high, screening of more transformants than in S. cerevisiae-based systems is necessary to identify a strain with the desired integration/replacement [66]. Plasmids harboring one of a set of several cloned sub-telomeric ARS sequences derived from the DL-1 strain have been described, which homologously integrate into a genomic counterpart, resulting in the recombinant strains harboring single or multiple tandemly repeated copies at the respective sub-telomeric genomic locus [67, 68]. A set of vectors has been described to target the heterologous DNA to the rDNA locus of H. polymorpha [24, 45, 46]. Most commonly, plasmids harboring HARS1 (Hansenula ARS1) as a replication signal are used to generate recombinant H. polymorpha strains. The fate of these plasmids in H. polymorpha has been extensively analyzed for both RB11 (derived from CBS4732; see the following section) and DL-1 strains [24, 66]. The strain generation by use of HARS1 plasmids will therefore be described in more detail. After transformation, cells are plated on selective media according to the selection marker gene present on the HARS1 plasmid. Macroscopic colonies typically appear after 4–5 days of incubation at 37 ◦ C. In this early phase, all cells of a colony harbor the plasmid as an episome at a low copy number; plasmid loss can be induced by cultivation on a nonselective medium. Colonies are then transferred to liquid selective medium and incubated under vigorous shaking for 24–48 h at 37 ◦ C. This procedure is called the “passaging step”. An aliquot of the dense culture is then used to inoculate fresh selective medium, and the incubation is repeated. After three to eight subsequently applied passaging steps, cells grow with the initially episomal plasmid integrated into the genome. In a particular single strain developed for the production of a hepatitis B vaccine, a recombination within the FMD locus was observed (U. Dahlems, unpublished results). In order to separate these cells from those still harboring the plasmid as an episome, one or two “stabilization steps” must be performed in nonselective liquid media. If aliquots of these cultures are plated onto selective media, the resulting single colonies will exclusively represent strains harboring one to multiple tandemly repeated copies of the HARS1 plasmid. The various individual strains can vary significantly in the expression rates of the foreign gene, present on the plasmid. However, typically only a few strains display very high levels of target protein. For a high probability
4 The H. polymorpha-based Expression Platform
of obtaining a “high producer” in the first approach, parallel generation of up to 150 strains is recommended. Once a suitable strain is identified, a so-called “supertransformation” can be performed using a second HARS1 plasmid with a different selection marker gene. This second plasmid may contain either the same heterologous expression cassette as the first plasmid, or a different one. In the first case, strains would result which might display higher production rates of the target protein than the basic strain; in the second case, strains would be obtained co-expressing two different heterologous genes at variable but individually fixed relative expression rates [1]. To summarize this procedure, the generation of recombinant strains in H. polymorpha is clearly more laborious than in other yeasts. However, these additional difficulties are balanced by two positive features which are highly favorable in biotechnological applications. First, heterologous gene expression in H. polymorpha can be controlled by homologous promoters of extraordinary strength. While the carbon source-regulated MOX and FMD promoters are derived from genes of the methanol degradation pathway, the TPS1 promoter, derived from the trehalose 6-phosphate synthase gene of H. polymorpha, is constitutive with regard to different carbon sources and can be influenced by different temperatures [23]. In combination with the obtained high copy numbers of the integrated HARS1 plasmid (up to 100 copies per haploid genome may result from supertransformation), these strong promoters can provide very high expression rates of the heterologous gene in selected strains; indeed, for secreted phytase, product levels of up to 13.5 g L−1 have been obtained [69]. The second favorable feature of recombinant H. polymorpha is the unambiguous mitotic stability of the individual strains with regard to the copy numbers of the HARS1 plasmids integrated, even upon cultivation on nonselective media for a long period of time. This stability has been well documented in several examples over periods of 800 generations. For plasmid examples, see Section 4.3 and Figures 7 and 8; for some product and process examples, see Section 5. 4.2 Strains
Starting with the three H. polymorpha parental strains, NCYC495, CBS4732, and DL-1 (see Section 6.1), some forty to fifty other strains have been derived. The DL-1 strain is not employed in classical genetic analyses, and no data are available regarding its ability to mate and to sporulate [70]. The DL-1 strain has certain advantages in that it has a higher growth rate and adapts more quickly to culture media than the other parental strains; additionally, DL-1 strains have a higher frequency of homologous recombination than other strains [66, 70]. The inability of the DL-1 strain to copulate makes this strain inconvenient for classical genetic manipulation exploiting meiotic segregation. However, the relatively high frequency of homologous recombination in the DL-1 strain enables the application of several molecular genetic techniques developed in S. cerevisiae to be used in the organism. Several host strains suitable for heterologous protein expression, including
11
12 Protein Production in Hansenula polymorpha
Fig. 6 Strategy of gene disruption using fusion PCR and in-vivo recombination in H. polymorpha. Step 1) Fusion-PCR. The N- and C-terminal fragments of “Gene X” are amplified and fused with the N- and Cfragments of the HpURA3 popout cassette containing the overlapped internal sequence of HpURA3 (100 bp) by PCR. Step 2) In-vivo DNA recombination. The two fusion-PCR products obtained are introduced into H. polymorpha cells and converted into one linear gene disruption cassette via in-vivo
homologous recombination at the overlapped sequence. The double homologous crossover of the disruption cassette at the “Gene X” locus results in the disruption of “Gene X”. Step 3) The HpURA3 pop-out cassette can be removed to recover the auxotrophic marker for subsequent gene disruption or for subsequent transformation. A detailed procedure will be described elsewhere (M.W. Kim, H.A. Kang, unpublished results).
auxotrophic mutants, protease-deficient strains, and mox-negative strains, have been constructed in the DL-1 strain, mostly using gene disruption techniques (see the list of strains in Appendix Table A1). A pop-out cassette using HpURA3 as a selection marker has been constructed to recover the auxotrophic marker for the subsequent gene disruption or for subsequent transformation with expression vectors [66]. Combined with fusion PCR and in-vivo recombination, the use of the HpURA3 pop-out cassette becomes more simplified in constructing null mutant strains with disruption of the gene of interest (see Figure 6). This approach eliminates the time-consuming steps of ligation and sub-cloning, which are otherwise required for the construction of a gene deletion cassette.
4 The H. polymorpha-based Expression Platform
Fig. 7 General design of an expression vector for H. polymorpha RB11. A standard H. polymorpha expression vector contains the ori (origin of replication), a strong promoter, and a terminator (connected by a multiple cloning site for insertion of the foreign gene to be expressed), a selection
marker from H. polymorpha or another yeast, and/or a selection marker for antibiotic resistance, a replication sequence (HARS), or alternatively a sequence for targeted integration into the genome sequence. For a list of elements available for insertion into a plasmid, see Table A2.
Most classical genetic techniques have been performed using NCYC495, which shows both mating and sporulation [70]. Unlike the other two parental strains, NCYC495 does not grow well on methanol-containing media and therefore does not have the strong methanol-induced promoters available to the other strains for gene expression. Instead, NCYC495 has other interesting applications, including its employment for the study of nitrate assimilation mentioned previously [7]. Cells from CBS4732 grow well on methanol, and show strong mating and sporulation [70]. Both CBS4732 (strain RB11 and its derivatives in particular) and DL-1 are employed in the production of recombinant products [1, 4, 71–73]; see also the forthcoming section of this chapter). In contrast to DL-1 strains, some substrains of CBS4732 are not easily applied in recombinatory methods, perhaps due to their high mitotic stability [24]. For a selection of H. polymorpha strains and for protocols specific to parental strains, see Degelmann [74, 75] and the list in Appendix Table A1. 4.3 Plasmids and Available Elements
Expression and integration vectors in H. polymorpha are composed of prokaryotic and yeast DNA [63]. Vectors are either supplied as circular plasmid or linearized and
13
14 Protein Production in Hansenula polymorpha
Fig. 8 Vectors designed for copy number-controlled gene integration in H. polymorpha DL-1 using auxotrophic selection markers (A) or antibiotic resistance markers (B and C). Refer to Kang et al. [66] for more detailed information on these multiple integration vectors.
targeted to a specific genomic locus. Possible targets for homologous integration include the MOX/TRP locus [76], an ARS sequence [77, 78]), the URA3 gene [79], the LEU2 gene [77], the GAP promoter region [80], or the rDNA cluster [45, 46]. Clearly, the circular plasmids are not randomly integrated, but recombine with genomic sequences represented on the vector. This was recently shown with a particular vector harboring a FMD promoter/HBsAg fusion where recombination within the FMD gene was observed (U. Dahlems, personal communication). It remains to be shown whether homologous recombination also takes place with vectors equipped with MOX, TPS1 and other promoter elements. Standard expression vectors and elements available for insertion into H. polymorpha is illustrated in Figures 7 and 8. [For detailed information on the various H. polymorpha expression platforms, see [3, 24, 66]. Plasmids that have been successfully developed for industrial use of RB11-based strains include pFPMT121 (for production of phytase) and a derivative of strain pMPT121 (for production of the anticoagulant hirudin) [24] (Figure 7].
4 The H. polymorpha-based Expression Platform
Plasmids AMpL1, AMIpLD1, and AMIpSU1 have been used as multiple integration systems based on the complementation of auxotrophic mutations, to elicit desired plasmid copy numbers in DL-1-derived recombinant strains (Figure 8A). When an appropriate mutant strain is transformed with one of these plasmids under selective conditions, transformants with plasmid integrated in low (1–2), moderate (6–9), or high (up to 100) copy numbers can be rapidly selected [77]. Another rapid and copy number-controlled selection system has been developed using antibiotic resistance markers (Figure 8B and C). The G418 and hygromycin B resistance cassettes were used as dominant selection markers, which allow selection of transformants on plates containing different concentrations of G418 [81] or hygromycin B [82]. Due to the strong correlation between antibiotic resistance and integration copy number, the selection of transformants with different copy numbers ranging from 1 to 50 can be easily achieved. For a selection of H. polymorpha expression/integration vectors, refer to Degelmann [74, 75] and to the list provided in Appendix Table A1. Plasmids harboring CEN/ARS (automously replicating sequences) elements typically support autonomous replication of plasmids within host cells. In H. polymorpha, centromeres have not yet been isolated. The established plasmids contain HARS elements, but not a CEN region. These plasmids may integrate into the host DNA over a number of generations, resulting in mitotically stable strains with as many as 100 plasmids in tandem repeats [1, 66]. Notably, a high number of integrated copies is not always a pre-requisite for high-level protein production, especially in the case of secretory production. In one case, four copies of a HARS vector were sufficient to obtain efficient expression of Schwanniomyces occidentalis glucoamylase in H. polymorpha [83]. Other good examples are the secretory expression of human urinary-type plasminogen activator (u-PA) and human serum albumin (HSA) in H. polymorpha. In these cases, a single- or two-copy integration of the expression vector resulted in the maximum levels of recombinant u-PA or HSA secreted into culture supernatants [66]. Signal sequences may be fused to the target ORF for direct release of synthesized proteins into the media, or into a preselected cell compartment, such as the peroxisome, the vacuole, the ER, the mitochondrion, or the cell surface. Available signal sequences include the peroxisomal targeting signals PTS1 and PTS2 [84], the repressible acid phosphatase (PHO1) secretion leader sequence [85], a S. occidentalis-derived GAM1 [84, 86], and the S. cerevisiae-derived MFα1 [1]. Glycosylphosphatidylinositol (GPI)-anchoring motifs derived from the GPI-anchored cell surface proteins, such as HpSED1, HpGAS1, HpTIP1, and HpCWP1, have been recently exploited to develop a cell surface display system in H. polymorpha. When the recombinant glucose oxidase (GOD) was expressed as a fusion protein to these anchoring motifs, most enzyme activity was detected at the cell surface [38]. One of the main advantages of heterologous gene expression in H. polymorpha is that this yeast has unusually strong promoters, the most widely employed of which are derived from genes of the methanol utilization pathway (see Figure 3). These promoters include elements derived from the methanol oxidase (MOX), formate dehydrogenase (FMD), and dihydroxyacetone synthase (DHAS) gene [1, 84].
15
16 Protein Production in Hansenula polymorpha
Other available (but less frequently applied) regulative promoters are derived from inducible genes encoding enzymes involved in nitrate assimilation (e. g., YNT1, YNI1, YNR1, which can be induced by nitrate and repressed by ammonium) [87], or the enzyme acid phosphatase (the PHO promoter) [30, 85]. Examples of constitutive promoters are ACT [82], GAP [80], PMA1 [88], and TPS1 [23]. The PMA1 promoter even competes with the outstanding MOX promoter in terms of high expression levels; PMA1 is of interest in the co-expression of genes on industrial scale [88]. The performance of the TPS1 promoter is not linked to the use of a particular carbon source. In contrast to the constitutive promoters listed above, it can be applied at elevated temperatures, where its activity may be boosted even further [23]. The H. polymorpha FLD1 gene encoding formaldehyde dehydrogenase has been characterized recently [30]. FLD1p is essential for the catabolism of methanol, and shows 82% sequence identity with the Fld1p protein from P. pastoris and 76% identity with Fld1p from C. boidinii. The FLD1 promoter promises to be advantageous in that expression can be controlled at two levels: it is strongly induced under methylotrophic growth conditions, but shows moderate activity using primary amines as a nitrogen source. With these promising characteristics, the FLD1 promoter is expected to augment the existing H. polymorpha promoters [30]. The GAP promoter also showed a higher specific production rate and required a much simpler fermentation process than the MOX promoter-based HSA production system, implying that the GAP promoter can be a practical alternative of the MOX promoter in the large-scale production of some recombinant proteins [80].
5 Product and Process Examples
We now provide a short summary of H. polymorpha-based processes. Here, a few industrially relevant examples are only briefly summarized; for a more detailed description of fermentation and purification procedures, the reader is referred to Jenzelewski [89]. Once stable recombinant integrants have been generated, production strain candidates are identified from a background of nonproducers or strains of low or impaired productivity. The subsequent design of a fermentation procedure greatly depends on the characteristics of the host cell, the intended routing of the recombinant gene product and, most importantly, on the promoter elements used. The commonly used culture media are based on simple synthetic components. They contain trace metal ions and adequate nitrogen sources, which are required for efficient gene expression and cell yield, but no proteins. The total fermentation time varies between 60 and 150 hours. Due to the inherent versatile characteristics of the two methanol-inducible promoters, fermentation modes vary for the most part in the supplemented carbon source: glycerol, methanol, glucose, and combinations thereof may be selected. The ability to achieve high yields of a recombinant product expressed from a methanol pathway promoter without the addition of methanol is a unique feature of the H. polymorpha system (see Section 6.1; [1, 24]. In contrast,
5 Product and Process Examples
activation of these promoters in the related yeast P. pastoris is strictly dependent on the presence of methanol [90]. In processes for secretory heterologous proteins, a “one-carbon source” mode is usually employed, supplementing the culture medium only with glycerol. A hirudin production process may serve as an example for this fermentation mode. In this process, a strain was employed that harbors 40 copies of an expression cassette for an MFα1 pre-pro-sequence/hirudin fusion gene under control of the MOX promoter [86, 91, 92]. Hirudin production was promoted by reducing the initial glycerol concentration and maintaining it on a suitable level by a pO2 -controlled addition of the carbon source. The fermentation is started with 3% (w/v) glycerol. After consumption of the carbon source (after 25 h), the pO2 -controlled feeding mode is initiated and this results in a glycerol concentration of between 0.05 and 0.3% (w/v) (derepression of the MOX promoter). The fermentation run is terminated after 36 h of derepression (total fermentation time 72 h). The broth is then harvested and the secreted product purified from the supernatant by a sequence of ultrafiltration, ion exchange, and gel filtration steps. In the case of HBsAg production, a “two carbon source” fermentation mode was used. The producer strain harbors high copy numbers of an expression cassette, with the coding sequence for the small surface antigen (S-antigen) under control of methanol pathway promoters. The selected strain is fermented on a 50-L scale. The product-containing cells are generated via a two-fermentor cascade, consisting of a 5-L seed inoculating the 50-L main fermentor. The initial steps of fermentation closely follow those described for the production of hirudin. Initially, cultivation is performed with a glycerol feeding in a fed-batch mode, to be followed by subsequent semi-continuous glycerol feeding controlled by the dissolved oxygen level in the culture broth. This derepression phase is then followed by batchwise feeding with methanol in the final fermentation mode. The product concentration increases to amounts in the multigram range. The product consists of a lipoprotein particle in which the recombinant HBsAg is inserted into host-derived membranes. As noted in Section 6.1, the addition of methanol also serves for the proliferation of organelles and consequently for the synthesis and proliferation of membranes. Methanol is thus needed in this case to provide a high yield and balanced co-production of both components of the particle [93, 94]. For downstream processing, the harvested cells are disrupted and the particles purified in a multi-step procedure that includes adsorption of a debris-free extract to a matrix and the subsequent application of a sequence of ion exchange, ultra-filtration, gel filtration, and ultra-centrifugation steps as detailed by Schaefer et al. [93, 94]. For the production of phytase, H. polymorpha has been used in a particularly efficient process [69, 95], a pre-requisite for an economically competitive production of a technical enzyme. In this development all steps and components of strain generation, fermentation, and purification are dictated by a rationale of efficiency and cost-effectiveness. This also applies to the definition of fermentation process using glucose as the main carbon source. A strain was generated in which the phytase sequence is under control of the FMD promoter. Subsequent supertransformation yielded strains with up to 120
17
18 Protein Production in Hansenula polymorpha
copies of the heterologous DNA, thus enabling a gene dosage-dependent high productivity. A fermentation procedure was then developed to achieve high levels of enzyme production. Significantly, it was found that the use of glycerol as the main carbon source was not required in this case, but could be substituted by lowcost glucose. The active status of the FMD-promoter was maintained by glucose starvation (fermentation with minimal levels of continuously fed glucose). At a 2000-L scale, fermentation with glucose as the sole carbon source led to high product yields and an 80% reduction in raw material costs compared to glycerol-based fermentations [69, 95]. Strains were found to produce the recombinant phytase at levels ranging up to 13.5 g L−1 [69]. The secreted product is purified through a series of steps, including flocculation centrifugation, dead-end filtration, and a final ultrafiltration yielding a high-quality, highly concentrated product at a recovery rate of up to 92%. A short outline of product recovery and downstream processing was provided for the previous examples of two secreted and one intracellular product. An individual procedure must be defined for each process developed. Especially in the case of secreted compounds, the fermentation and primary product recovery are intimately linked, and this interface of upstream and downstream processing is often the objective of successful integrated bioprocess development [96, 97]. A typical example of this is the production of aprotinin variants in H. polymorpha [98]. A selection of H. polymorpha-derived products is listed in Table 5.
6 Conclusion 6.1 Limitations of the H. polymorpha-based Expression Platform
Despite the most favorable characteristics of the H. polymorpha-based platform for application in heterologous gene expression, problems and limitations may be encountered in particular strain and product developments, as is similarly (and more Table 5 H. polymorpha-based products (selection)
Category
Product
Status
Brand name
Reference
Pharmaceuticals
HBsAg (adr) HbsAg (adw) Insulin IFNα-2a HSA EGF hexose oxidase phytase Levansucrase
Launched Launched Launched Process transfer Pilot scale completed Lab scale completed Launched Registration Lab scale completed
HepaVax Gene AgB Wosulin
[94] [94]
Food additive Feed additive Enzymes
Grindamyl-Surebake
[72] [80] [40] [105] [69] [73]
6 Conclusion Table 6 Improvement of the expression performance of H. polymorpha production strains by
the co-expression or deletion of other genes Gene expressed
Problem encountered
Co-expressed (+) or deleted (−) gene
IFNα-2a Enzyme, IFNγ EGF
Incorrect pre-procleavage Overglycosylation impaired secretion C-terminal truncation
KEX2 (+) CMK2; CNE1 (+) KEX1 (−)
frequently) seen in other yeast systems. These limitations include overglycosylation [61], retention within the ER [99], poor secretion, impaired processing [2, 72], and proteolytic degradation [24]. A possible strategy to overcome these limitations is to identify genes and gene products that may, upon disruption or co-expression, positively influence the performance of respective strains. This has been applied successfully in several cases. For example, the deletion of the KEX1 gene, coding for carboxypeptidase α, led to a significant improvement in the quality of recombinant human epidermal growth factor (hEGF) secreted from H. polymorpha by decreasing the generation of C-terminally truncated hEGF form [80]. Among others, co-expression of the S. cerevisiae-derived KEX2 gene provided a greatly improved processing of a IFNα-2a pre-pro-sequence in H. polymorpha in which production of predominantly N-terminally extended molecules had been observed previously [2, 72]. In other examples, the co-expression of the S. cerevisiae-derived CMK2 or the H. polymorpha CNE1 (calnexin) gene led to an improved secretion and a reduction in overglycosylation of a secreted enzyme and a cytokine (Table 6). 6.2 Impact of Functional Genomics on Development of the H. polymorpha RB11-based Expression Platform
Several approaches have been initiated to identify H. polymorpha genes that may be manipulated to effect a positive influence on the performance of particular production strains. Examples include identification of the PMR1 gene [56] and glycosylation genes [3, 38]. With the completion of genome sequencing, transcriptome, proteome analysis and other related technologies are all now feasible and enable more systematic approaches to be introduced. In a first approach, genes will be identified that are involved in methanol metabolism, peroxisome homeostasis, protein glycosylation, secretion, and cell wall integrity. These tasks are executed within a cooperative effort with partners in Russia, Ukraine, The Netherlands and Germany, and funded by INTAS (INTAS 2001-0583). For the identification of these genes, linear DNA fragments harboring reporter genes are used for random integration, thereby generating mutants. By using this random integration (RALF) approach [65], certain genes of potential impact for relevant gene expression functions may be disrupted and identified by sequencing the region adjoining the integration site and comparing the deduced sequence with the genome data. By applying a selection of suitable reporter proteins
19
20 Protein Production in Hansenula polymorpha
and a range of certain growth conditions, the generated strains can be screened for genes which might have an impact on the functions mentioned above. For transcriptome analysis, H. polymorpha cDNA microarrays are being generated in cooperation with KRIBB (Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea), funded by the Korean Ministry of Science and Technology (Microbial Genomics and Applications R& D Program). As an initial trial, a partial genome cDNA chip spotted with 382 ORFs of H. polymorpha was constructed. Each ORF was PCR-amplified using gene-specific primer sets, of which the forward primers have a 5 -aminolink. The PCR products were printed in duplicate onto the aldehyde-coated slide glasses to link only the coding strands to the surface of the slide via covalent coupling between amine and aldehyde groups. The partial DNA chip was used to analyze differential expression profiling of H. polymorpha cells cultivated under defined environmental conditions [100]. At present, the whole-genome cDNA microarrays of H. polymorpha are available, and these have been constructed using the same strategy as applied to fabrication of the partial cDNA chip. It is expected that these whole-genome DNA microarrays will provide a powerful tool for a genome-wide transcriptional profiling of H. polymorpha under defined genetic and physiological conditions. This will generate invaluable information for pathway engineering and process optimization in exploiting H. polymorpha as a cell factory. A third project which is to be started soon is a comprehensive analysis of the proteome of recombinant H. polymorpha production strains in correlation to specific products, secretion efficiency, and other characteristics. The extraction of defined proteins from two-dimensional SDS gels and mass spectrometric analysis of proteolytic fragments will lead to the identification of proteins and their respective genes, with a potential impact on the performance of the H. polymorpha expression platform. The availability of the complete genome and various ‘omic’s approaches surely facilitates extensive exploration of interesting genes and strong promoters, and will thus further the development of expression systems to supplement the strong platform that already exists for H. polymorpha. Yeasts have only come under intense molecular study during the past few decades, and yeast-derived recombinant products have been already developed, ranging from technical enzymes and anticoagulants (hirudin and saratin) [91, 92, 101, 102] to vaccines such as hepatitis B [93, 94]. S. cerevisiae is well characterized, being the first organism in which recombinant vaccines were developed, and the first eukaryotic organism to have its genome entirely sequenced. However, H. polymorpha has several advantages over S. cerevisiae, including strong and tightly regulated promoters, the lack of hyperallergenic structures on target proteins, capabilities of dense growth on simple media, stability of expression plasmids, and high frequency of nonhomologous recombination. Moreover, H. polymorpha has the endogenous capacity for prolyl 4-hydroxylation, which is essential for the function and folding of certain recombinant proteins, such as gelatin [103]. This posttranslational modification is generally known to be absent in microbial eukaryotic systems. Consequently, H. polymorpha holds a secure place in representing the methylotrophic yeasts as an alternative system for heterologous gene expression.
6 Conclusion
Appendix
Table A1. Selection of H. polymorpha host strains
Strain
Genotype
Phenotype
Parental strain DL-1 wild-type (NRRL-Y-7560, ATCC26012)
Source
[11]
Auxotrophic strains DL-1-L leu2
Leu−
[78]
uDL10 DL-LdU DL1-A DL1-L DL1-U
Leu− Ura− Leu− Ura− Leu− Ade− Leu− Leu− Ura−
KRIBB KRIBB CRC CRC CRC
Protease-deficient strains uDLB11 leu2 ura3 pep4::lacZ uDLB12 leu2 ura3 prc1::lacZ uDLB13 leu2 ura3 kex1::lacZ uDLB14 leu2 ura3 pep4::lacZ prc1::lacZ uDLB15 leu2 ura3 pep4::lacZ kex1::lacZ uDLB16 leu2 ura3 prc1::lacZ kex1::lacZ uDLB17 leu2 ura3 pep4::lacZ prc1::lacZ kex1::lacZ
Leu− Ura− Pep4− Leu− Ura− Prc1− Leu− Ura− Kex1− Leu− Ura− Pep4− Prc1− Leu− Ura− Pep4− Kex1− Leu− Ura− Prc− Kex1− Leu− Ura− Pep4− Prc1− Kex1−
KRIBB KRIBB KRIBB KRIBB KRIBB KRIBB KRIBB
Methanol utilization-deficient strains DLT2 leu2 mox-trp3::ScLEU2 DL1-LM leu2 mox
Mox− Tr p − Leu− Mox−
CRC CRC
leu2 ura3 leu2 ura3::lacZ leu2ade2 ade2 leu2::ADE2 leu2 ade2 ura3::ADE2
Parental strain NCYC495 wild-type (CBS1976, ATa14754, NRRL-Y-1798,VKM-Y-1397) L1 leu1–1* A11 ade11–1 M6 met6–1 NCYC495 leu1–1*
Leu− Ade− Met− Leu−
[106] [107] [107] [79]
Nitrate assimilation-related strains NAG1995 ynr1::URA3, leu1—1* NAG1996 yni1::URA3, leu1—1* NAG997 ynt1::URA3, leu1—1* NAG998 yna1::URA3, leu1—1* NAG2001 yna2::URA3, leu1–1*
Ynr1− Leu− Yni1− Leu− Ynt1− Leu− Yna1− Leu− Yna2−
[108] [109] P´erez et al. (1997) [87] Avila and Siverio (unpublished)
[12]
Parental strain CBS4732 wild-type (CCY38–22-2, ATCC34438, NRRL-Y-5445) LR9 ura3–1 (odc1)
Ura−
[110]
RB11
ura3–1
Ura−
[86]
RB12
ura3 leu1–1*
Ura− Leu−
Rhein Biotech, unpublished
[10]
21
22 Protein Production in Hansenula polymorpha Table A1. (Continued)
Strain
Genotype
Phenotype
Source
RB13
ura3 leu1–1* Mox
Ura− Leu− Mox
RB14
ura3 mox
Ura− Mox−
RB15
leu1–1* mox
Leu− Mox−
RB17 RC296
haro7 ade
Ty r− Ade−
A16
leu2 trp3 mox ade2–88 leu2–2 ade2–88 ura2–1 met 4–220 leu2–2 cat1–14 ade2–88 leu2
Leu− Tr p − Mox − Ade− Leu− Ade− Leu− Met− Leu− Cat− Ade− Leu−
Rhein Biotech, unpublished Rhein Biotech, unpublished Rhein Biotech, unpublished [111] Rhein Biotech, unpublished
1B 1-HP065 14C 5C-HP156 8V ∗
[112] [113] [114] [70] [70] [76]
leu1-1 and leu2 correspond to the same gene.
Table A2. Selection of H. polymorpha expression/integration vectors
Plasmid
Expression cassette Replication Selection marker Integrated References/ comments sequence copy numer
DL-1 based plasmids HARS36 AMIpL1 No promoter; Terminator from an unknown gene
HpLEU2
1–2
[77] Multiple cloning sites for insertion of expression cassettes
HARS36 AMIpLD1 No promoter; Terminator from an unknown gene
HpLEU2-d
1–100
[77] Multiple cloning sites for insertion of expression cassettes
HARS36 AMIpSL1 No promoter; Terminator from an unknown gene
ScLEU2
6–9
[77] Multiple cloning sites for insertion of expression cassettes
HARS36 AMIpSU1 No promoter; Terminator from an unknown gene
ScURA3
30–50
[77] Multiple cloning sites for insertion of expression cassettes
pGLG61
TEL188
HpLEU2/G418r 1–50
TEL188
HpLEU2/Hygr
No promoter; No terminator
pHACT90- No promoter; No HyL terminator
1–25
[81] NotI and a BamHI sites for insertion of expression cassettes [71, 82] A NotI site for insertion of expression cassettes
6 Conclusion Table A2. (Continued)
Plasmid
Expression cassette
Replication Selection sequence marker
RB11-based plasmids ScURA3) pMPT121 MOX-promoter; HARS1 MOX-terminator (oppositely oriented than in pFPMT121 and pTPSPMT pFPMT121 FMD-promoter; HARS1 ScURA3 MOX-terminator
Integrated References/ comments copy numer
30–60
[63] EcoRI, BamHI, BglII, sites for insertion of ORFs
30–60
[98] EcoRI, BamHI, BglII, sites for insertion of ORFs Rhein Biotech, unpublished; EcoRI, BamHI, BglII, sites for insertion of ORFs Rhein Biotech, unpublished; EcoRI, BamHI, BglII, sites for insertion of ORFs; no ampR gene
pTPS1MT121 TPS1-promoter HARS1 MOX-terminator
ScURA3
30–60
pB14
FMD-promoter; HARS1 MOX-terminator
ScURA3
30–60
pB14TPS1
TPS1-promoter; HARS1 MOX-terminator
ScURA3
30–60
pB14-LEU2
FMD-promoter; HARS1 MOX-terminator
ScLEU2
30–60
pM1
No promoter HARS1 MOX-terminator
ScURA3
n.d.
pSK92
FMD-promoter; HARS1 MOX-terminator
HARO7
1–5
Rhein Biotech, unpublished; EcoRI, BamHI, BglII, sites for insertion of ORFs; no ampR Rhein Biotech, unpublished; EcoRI, BamHI, BglII sites for insertion of ORFs; no ampR [23] Multiple cloning sites for insertion of expression cassettes [111] Rhein Biotech, unpublished; EcoRI, BamHI, BglII, sites for insertion of ORFs
23
24 Protein Production in Hansenula polymorpha Table A2. (Continued)
Plasmid
Expression cassette
NCYC495-based plasmids pHIPA4 AOX-promoter AMO-terminator pHIPX4 AOX-promoter AMO-terminator pHIPX5 AMO-promoter AMO-terminator pHIPX6 PEX3-promoter AMO-terminator pHIPX7 TEF1-promoter AMO-terminator pHIPX8 TEF2-promoter AMO-terminator pHIPZ4 AOX-promoter AMO-terminator pREMI-Z ScTEF1promoter/EM7 synthetic promoter/ScCYCterminator pYT3 No promoter No terminator pHS5 No promoter No terminator LacZα pHS6 No promoter No terminator LacZα pBSK-LEU2-Ca No promoter No terminator pHI1 No promoter No terminator LacZα
Integrated References/ comments copy numer
Replication sequence
Selection marker
No replication sequence No replication sequence* No replication sequence* No replication sequence*, * No replication sequence* No replication sequence* No replication sequence No replication sequence
HpPUR7/Amp n.d.
[115]
ScLEU2/Kan
n.d.
[116]
ScLEU2/Kan
n.d.
[117]
ScLEU2/Kan
n.d.
[117]
ScLEU2/Kan
n.d.
[118]
ScLEU2/Kan
n.d.
Zeocin/Amp
n.d.
M. Veenhuis, unpublished [119]
Zeocin
n.d.
[65]
ScLEU2/Amp
n.d.
[120]
No replication ScLEU2/Amp sequence*
n.d.
M. Veenhuis, unpublished
HARS1*
n.d.
M. Veenhuis, unpublished
No replication CaLEU2/Amp n.d. sequence No replication HpURA3/Amp n.d. sequence
M. Veenhuis, unpublished [121]
C-ARS*
ScLEU2/Amp
Key: Ca, Candida albicans; Hp, Hansenula polymorpha; Sc, Saccharomyces cerevisiae. These plasmids contain the S. cerevisiae LEU2 gene, which harbors a cryptic HARS. As a consequence, pHIPX- and pHS-type plasmids replicate – albeit rather poorly – in H. polymorpha NCYC495. Addition of the C-ARS or HARS1 regions results in good replicating plasmids. ∗∗ pHIPX6 contains the H. polymorpha PEX3 promoter. This fragment contains a HARS, and allows use as a replicating plasmid to express genes at rather low levels. ∗
References List of Genes Gene
Encoded gene product
CAT CNE1 CTT1 CMK2 CWP1 DAK1 DAS FLD1 FMD GAM1 GAP GAS1 LEU2 MFα1 MOX PHO1 PMA1 PMR1 SED1 TIP1 TPS1 TRP3 URA3 (ODC1) YNT1 YNI1 YNR1
catalase calnexin catalase T (S. cerevisiae) calmodulin-dependent kinase Cell wall mannoprotein dihydroxyacetone kinase dihydroxyacetone synthase formaldehyde dehydrogenase formate dehydrogenase glucoamylase (Schwanniomyces occidentalis) glyceraldehyde-3-phosphate dehydrogenase GPI-anchored surface glycoprotein β-isopropyl malate dehydrogenase α mating factor (S. cerevisiae) methanol oxidase acid phosphatase plasma membrane ATPase P-type Ca2+ transport ATPase Cell surface glycoprotein (Suppressor of Erd2 Deletion) Cell wall mannoprotein (Temperature shock-inducible protein) trehalose-6-phosphate synthase indole-3-glycerol-phosphate synthase ornithidine decarboxylase nitrate transporter nitrite reductase nitrate reductase
References 1 GELLISSEN G. (2000) Heterologous protein production in methylotrophic yeasts. Appl Microbiol Biotechnol 54: 741–750 2 GELLISSEN G., M¨ULLER F., SIEBER H., TIEKE A., JENZELEWSKI V., DEGELMANN A., STRASSER A. W. M. (2002) Production of cytokines in Hansenula polymorpha. In: GELLISSEN G. (Ed) Hansenula polymorpha –biology and applications. Wiley-VCH, Weinheim, pp 229–254. 3 GUENGERICH L., KANG H. A., BEHLE B., GELLISSEN G., SUCKOW M. (2004) A platform for heterologous gene expression based on the methylotrophic yeast Hansenula polymorpha. In: K¨UCK U. (Ed) The mycota II. Genetics and
Biotechnology, 2nd edition, Springer-Verlag, Heidelberg, pp. 273–287. 4 GELLISSEN G. (2002) Hansenula polymorpha – biology and applications. Wiley-VCH, Weinheim 5 GELLISSEN G., VEENHUIS M. (2001) The methylotrophic yeast Hansenula polymorpha: its use in fundamental research and as a cell factory. Yeast 18: i–iii. 6 van der KLEI I. J., VEENHUIS M. (2002) Hansenula polymorpha: a versatile model organism in peroxisome research. In: GELLISSEN G. (Ed) Hansenula polymorpha – biology and applications. Wiley-VCH, Weinheim, pp. 76–94.
25
26 Protein Production in Hansenula polymorpha
7 SIVERIO J. M. (2002) Biochemistry and genetics of nitrate assimilation. In: GELLISSEN G. (Ed) Hansenula polymorpha – biology and applications. Wiley-VCH, Weinheim, pp. 21–40. 8 OGATA K., NISHIKAWA H., OHSUGI M. (1969) Yeast capable of utilizing methanol. Agric Biol Chem 33: 1519–1520. 9 HAZEU W., de BRUYN J. C., BOS P. (1972) Methanol assimilation by yeasts. Arch Mikrobiol 87: 185–188. 10 MORAIS J. O. F. de, MAIA M. H. D. (1959) Estudos de microorganismos encontrados em leitos de despejos de caldas de destilarias de Pernambuco. II. Una nova especie de Hansenula: H. polymorpha. Anais de Escola Superior de Quimica de Universidade do Recife 1: 15–20. 11 LEVINE D. W., COONEY C. L. (1973) Isolation and characterization of a thermotolerant methanol-utilizing yeast. Appl Microbiol 26: 982–989. 12 WICKERHAM L. J. (1951) Taxonomy of yeasts.Technical bulletin No 1029, US Dept Agric, Washington DC, pp. 1–56. 13 TEUNISSON D. J., HALL H. H., WICKERHAM L. J. (1960) Hansenula angusta, an excellent species for demonstration of the coexistence of haploid and diploid cells in a homothallic yeast. Mycologia 52: 184–188. 14 MIDDELHOVEN W. (2002) History, habitat, variability, nomenclature and phylogenetic position of Hansenula polymorpha. In: GELLISSEN G. (Ed) Hansenula polymorpha – biology and applications. Wiley-VCH, Weinheim, pp. 1–7. 15 KURTZMAN C. P. (1984) Synonymy of the yeast genera Hansenula and Pichia demonstrated through comparison of deoxyribonucleic acid relatedness. Antonie van Leeuwenhoek 50: 209– 217. 16 KURTZMAN C. P., ROBNETT C. J. (1998) Identification and phylogeny of ascomycetous yeasts from analysis of nuclear large subunit (26S) ribosomal DNA partial sequences. Antonie van Leeuwenhoek 73: 331–371. 17 KURTZMAN C. P., FELL J. W. (1998) The
18
19
20
21
22
23
24
25
26
27
yeasts, a taxonomic study, 4th edition. Elsevier, Amsterdam BARNETT J. A., PAYNE R. W., YARROW D. (2000) Yeasts: Characteristics and Identification, 3rd edition. Cambridge University Press, Cambridge, UK KREGER-VAN RIJ N. J. W. N. (1984) The yeasts. A taxonomic study, 3rd edition. Amsterdam: Elsevier Science Publisher, Amsterdam SUDBERY P. (2003) Book review: Gellissen G (Ed.) Hansenula polymorpha biology and Application. Yeast 20: 107–1308. REINDERS A., ROMANO I., WIEMKEN A., de VIRGILIO C. (1999) The thermophilic yeast Hansenula polymorpha does not require trehalose synthesis for growth at high temperatures but does for normal acquisition of thermotolerance. J Bacteriol 181: 4665–4668. ROMANO I. (1998) Untersuchungen zurTrehalose-6-phosphat Synthase und Klonierung und Deletion des TPS1 Gens in der methylotrophen Hefe Hansenula polymorpha. PhD Thesis, University of Basel, Switzerland AMUEL C., GELLISSEN G., HOLLENBERG C. P., SUCKOW M. (2000) Analysis of heat shock promoters in Hansenula polymorpha: TPS1, a novel element for heterologous gene expression. Biotechnol Bioprocess Eng 5: 247–252. SUCKOW M., GELLISSEN G. (2002) The expression platform based on Hansenula polymorpha strain RB11 and its derivatives-history, status and perspectives. In: GELLISSEN G. (Ed) Hansenula polymorphabiology and applications. Wiley-VCH, Weinheim, pp. 105–123. TANI Y. (1984) Microbiology and biochemistry of the methylotrophic yeasts. In: HOU C. T. (ed) Methylotrophs: microbiology, biochemistry, and genetics. CRC Press, Boca Raton, FL, pp. 55–86. YURIMOTO H., SAKAI Y., KATO N. (2002) Methanol metabolism. In: GELLISSEN G. (Ed) Hansenula polymorpha – biology and applications. Wiley-VCH, Weinheim, pp. 61–75. LEDEBOER A. M., EDENS L., MAAT J., VISSER C., BOS J. W., VERRIPS C. T.,
References
28
29
30
31
32
33
34
35
JANOWICZ Z. A., ECKART M., ROGGENKAMP R. O., HOLLENBERG C. P. (1985) Molecular cloning and characterization of a gene coding for methanol oxidase in Hansenula polymorpha. Nucleic Acids Res 13: 3063–3082. JANOWICZ Z. A., ECKART M. R., DREWKE C., ROGGENKAMP R. O., HOLLENBERG C. P., MAAT J., LEDEBOER A. M., VISSER C., VERRIPS C. T. (1985) Cloning and characterization of the DAS gene encoding the major methanol assimilatory enzyme from the methylotrophic yeast Hansenula polymorpha. Nucleic Acids Res 13: 3043–3306. DIDION T., ROGGENKAMP R. (1992) Targeting signal of the peroxisomal catalase in the methylotrophic yeast Hansenula polymorpha. FEBS Lett 303: 113–116. BAERENDS R. J. S., SULTER G. J., JEFFRIES T. W., CREGG J. M., VEENHUIS M. (2002) Molecular characterization of the Hansenula polymorpha FLD1 gene encoding formaldehyde dehydrogenase. Yeast 19: 37–42. HOLLENBERG C. P., JANOWICZ Z. A. (1988) DNA molecules coding for FMDH control regions and structured gene having FMDH activity and their uses. European Patent Application EPA 0299108 GELLISSEN G., JANOWICZ Z. A., WEYDEMANN U., MELBER K., STRASSER A. W. M., HOLLENBERG C. P. (1992) High-level expression of foreign genes in Hansenula polymorpha. Biotechnol Adv 10: 179–189. RASCHKE W. C., NEIDITCH B. R., HENDRICKS M., CREGG J. M. (1996) Inducible expression of a heterologous protein in Hansenula polymorpha using the alcohol oxidase 1 promoter of Pichia pastoris. Gene 177: 163–187. RODRIGUEZ L., NARCIANDI R. E., ROCA H., CREMATA J., MONTESINOS R., RODRIGUEZ E., GRILLO J. M., MUZIO V., HERRERA L. S., DELGADO J. M. (1996) Invertase secretion in Hansenula polymorpha under the AOX1 promoter from Pichia pastoris. Yeast 12: 815–822. WASCHK D., KLABUNDE J., SUCKOW M.,
36
37
38
39
40
41
42
43
44
45
HOLLENBERG C. P. (2002) Characteristics of the Hansenula polymorpha genome. In: GELLISSEN G. (Ed) Hansenula polymorpha – biology and applications. Wiley-VCH, Weinheim, pp. 95–104. RAMEZANI-RAD M., HOLLENBERG C. P., LAUBER J., WEDLER H., GRIESS E., WAGNER C., ALBERMANN K., HANI J., PIONTEK M., DAHLEMS U., GELLISSEN G. (2003) The Hansenula polymorpha (strain CBS4732) genome sequencing and analysis. FEMS Yeast Res 4: 207–215. KIM S. Y., SOHN S.-H., KANG H. A., CHOI E.-S. (2001) Cloning and characterization of the Hansenula polymorpha homologue of the Saccharomyces cerevisiae MNN9 gene. Yeast 18: 455–461. KIM M. W., AGAPHONOV M. O., KIM J. Y., RHEE S. K., KANG H. A. (2002) Sequencing and functional analysis of the Hansenula polymorpha genomic fragment containing the YPT1 and PMI40 genes. Yeast 19: 863–871. KIM S. Y., SOHN J. H., PYUN Y. R., CHOI E. S. (2002) A cell surface display system using novel GPI-anchored proteins in Hansenula polymorpha. Yeast 19: 1153–1163. HEO J. H., WON H. S., KANG H. A., RHEE S. K., CHUNG B. H. (2002) Purification of recombinant human epidermal growth factor secreted from the methylotrophic yeast Hansenula polymorpha. Protein Expr Purif 24: 117–122. Feldmann, H (Ed) (2000) G´enolevures. Genomic exploration of the hemiascomycetous yeasts. FEBS Lett 487: 1–150. OSOEGAWA K., WOON P. Y., ZHAO B., FRENGEN E., TATENO M., CATANESE J. J., de JONG P. J. (1998) An improved approach for construction of bacterial artificial chromosome libraries. Genomics 52: 1–8. OSOEGAWA K., de JONG P. J., FRENGEN E., IOANNOU P. A. (1999) Construction of bacterial artificial chromosome (BAC/PAC) libraries. Curr Protocol Hum Genet 5.15.1–5.15.33 BIRNEY E., DURBIN R. (2000) Using GeneWise in the Drosophila annotation experiment. Genome Res 10: 547–548. KLABUNDE J., DIESEL A., WASCHK D., GELLISSEN G., HOLLENBERG C. P., SUCKOW
27
28 Protein Production in Hansenula polymorpha
46
47
48
49
50
51
52
53
M. (2002) Single step co-integration of multiple expressible heterologous genes into the ribosomal DNA of the methylotrophic yeast Hansenula polymorpha. Appl Microbiol Biotechnol 58: 797–805. KLABUNDE J., KUNZE G., GELLISSEN G., HOLLENBERG C. P. (2003) Integration of heterologous genes in several yeast species using vectors containing a Hansenula polymorpha-derived rDNA targeting element. FEMS Yeast Res 4: 185–193. GEMMILL T. R., TRIMBLE R. B. (1999) Overview of N-linked and O-linked oligosaccharide structures found in various yeast species. Biochim Biophys Acta 1426: 227–237. JIGAMI Y., ODANI T. (1999) Mannosylphosphate transfer to yeast mannan. Biochim Biophys Acta 1426: 335–345. KIM M. W., RHEE S. K., KIM J. Y., SHIMMA Y. I., CHIBA Y., JIGAMI Y., KANG H. A. (2004) Characterization of N-linked oligosaccharides assembled on secretory recombinant glucose oxidase and cell wall mannoproteins from the methylotrophic yeast Hansenula polymorpha. Glycobiology 14: 243–251. SCORER C. A., BUCKHOLZ R. G., CLARE J. J., ROMANOS M. A. (1993) The intracellular production and secretion of HIV-1 envelop protein in the methylotrophic yeast Pichia pastoris. Gene 136: 111–119. M¨ULLER S., SANDAL T., KAMP-HANSEN P., DALBØGE H. (1998) Comparison of expression systems in the yeasts Saccharomyces cerevisiae, Hansenula polymorpha, Kluyveromyces lactis, Schizosaccharomyces pombe and Yarrowia lipolytica. Cloning of two novel promoters from Yarrowia lipolytica. Yeast 14: 1267–1283. MONTESINO R., GARC´IA R., QUINTERO O., CREMATA J. A. (1998) Variation in N-linked oligosac-charide structures on heterologous proteins secreted by the methylotrophic yeast Pichia pastoris. Protein Exp Purif 14: 197–207. BRETTHAUER R. K., CASTELLANO F. J. (1999) Glycosylation of Pichia
54
55
56
57
58
59
60
pastoris-derived proteins. Biotechnol Appl Biochem 30: 193–200. MIELE R. G., CASTELLINO F. J., BRETTHAUER R. K. (1997) Characterization of the acidic oligosaccharides assembled on the Pichia pastoris-expressed recombinant kringle 2 domain of tissue-type plasminogen activator. Biotechnol Appl Biochem 26: 79–83. MONTESINO R., NIMTZ M., QUINTERO O., GARC´IA R., FALCON ´ V., CREMATA J. A. (1999) Characterization of the oligosaccharides assembled on the Pichia pastoris-expressed recombinant aspartic protease. Glycobiology 9: 1037–1043. KANG H. A., SOHN J.-H., CHOI E.-S., CHUNG B.-H., YU M.-H., RHEE S.-K. (1998) Glycosylation of human α 1 -antitrypsin in Saccharomyces cerevisiae and methylotrophic yeasts. Yeast 14: 371–381. CHIBA Y., SUZUKI M., YOSHIDA S., YOSHIDA A., IKENAGA H., TAKEUCHI M., JIGAMI Y., ICHISHIMA E. (1998) Production of human compatible high mannose-type (Man5 GlcNAc2 ) sugar chains in Saccharomyces cerevisiae. J Biol Chem 273: 26298– 26304. CALLEWAERT N., LAROY W., CADIRGI H., GEYSENS S., SAELNES X., JOU W. M., CONTRERAS R. (2001) Use of HDEL-tagged Trichoderma reesei mannosyl oligosaccharide 1,2-a-D-mannosidase for N-glycan engineering in Pichia pastoris. FEBS Lett 503: 173–178. HAMILTON S. R., BOBROWICZ P., BOBROWICZ B., DAVIDSON R. C., LI H., MITCHELL T., NETT J. H., RAUSCH S., STADHEIM T. A., WISCHENWSK H., WILDT S., GERNGROSS T. U. (2003) Production of complex human glycoproteins in yeast. Science 301: 1244–1246. KANG H. A., KIM J.-Y., KO S.-M., PARK C. S., RYU D. Y., SOHN J.-H., CHOI E.-S., RHEE S.-K. (1998) Cloning and characterization of the Hansenula polymorpha homologue of the Saccharomyces cere-visiae PMR1 gene. Yeast 14: 1233–1240.
References 61 AGAPHONOV M. O., PACKEISER A. N., CHECHENOVA M. B., CHOI E.-S., TER-AVANESYAN M. D. (2001) Mutation of the homologue of GDP-mannose pyrophosphorylase alters cell wall structure, protein glycosylation and secretion in Hansenula polymorpha. Yeast 18: 391–402. 62 BLANDIN G., LLORENTE B., MALPERTUY A., WINCKER P., ARTIGUENAVE F., DUJON B. (2000) Genomic exploration of the hemiascomycetous yeasts: 13. Pichia angusta. FEBS Lett 487: 76–81. 63 GELLISSEN G., HOLLENBERG C. P. (1997) Application of yeasts in gene expression studies: a comparison of Saccharomyces cerevisiae, Hansenula polymorpha and Kluyveromyces lactis – a review. Gene 190: 87–97. 64 GONZALEZ C., PERDOMO G., TEJERA P., BRITO N., SIVERIO J. M. (1999) One-step, PCR-mediated, gene disruption in the yeast Hansenula polymorpha. Yeast 15: 1323–1329. 65 van DIJK R., FABER K. N., HAMMOND A. T., GLICK B. S., VEENHUIS M., KIEL J. A. K. W. (2001) Tagging Hansenula polymorpha genes by random integration of linear DNA fragments (RALF). Mol Genet Genomics 266: 646– 656. 66 KANG H. A., SOHN J.-H., AGAPHONOV M. O., CHOI E.-S., TER-AVANESYAN M. D., RHEE S. K. (2002) Development of expression systems for the production of recombinant proteins in Hansenula polymorpha DL-1. In: GELLISSEN G. (Ed) Hansenula polymorpha-biology and applications. Wiley-VCH, Weinheim, pp. 124–146. 67 SOHN J.-H., CHOI E.-S., KANG H. A., RHEE J.-S., RHEE S.-K. (1999) A family of telomere-associated autonomously replicating sequences and their functions in targeted recombination in Hansenula polymorpha DL-1. J Bacteriol 181: 1005–1013. 68 KIM S. Y., SOHN J. H., BAE J. H., PYUN Y. R., AGAPHONOV M. O., TER-AVANESYAN M. D., CHOI E. S. (2003) Efficient library construction by in vivo recombination with a telomere-originated autonomously replicating sequence of
69
70
71
72
73
74
75
76
Hansenula polymorpha. Appl Environ Microbiol 69: 4448–4454. MAYER A. F., HELLMUTH K., SCHLIEKER H., LOPEZ-ULIBARRI R., OERTEL S., DAHLEMS U., STRASSER A. W. M., van LOON A. P. G. M. (1999) An expression system matures: a highly efficient and cost-effective process for phytase production by recombinant strains of Hansenula polymorpha. Biotechnol Bioeng 63: 373–381. LAHTCHEV K. (2002) Basic genetics of Hansenula polymorpha. In: GELLISSEN G. (Ed) Hansenula polymorpha –biology and applications. Wiley-VCH, Weinheim, pp. 8–20. KANG H. A., KANG W., HONG W.-K., KIM M. W., KIM J.-Y., SOHN J.-H., CHOI E.-S., CHOE K.-B., RHEE S. K. (2001) Development of expression systems for the production of recombinant human serum albumin using the MOX promoter in Hansenula polymorpha DL-1. Biotechnol Bioeng 76: 175–185. M¨ULLER F. II, TIEKE A., WASCHK D., M¨UHLE C., M¨ULLER F. I, SEIGELCHIFER M., PESCE A., JENZELEWSKI V., GELLISSEN G. (2002) Production of IFNα-2a in Hansenula polymorpha. Proc Biochem 38: 15–25. PARK B. S., ANANINE V., KIM C. H., RHEE S. K., KANG H. A. (2004) Secretory production of Zymomonas mobilis levansucrase by the methylotropic yeast Hansenula polymorpha Enzym Microbiol Tech 34: 132–138. DEGELMANN A. (2002) Methods. In: GELLISSEN G. (Ed) Hansenula polymorpha-biology and applications. Wiley-VCH, Weinheim, pp. 285–335. DEGELMANN A., M¨ULLER F., SIEBER H., JENZELEWSKI V., SUCKOW M., STRASSER A. W. M., GELLISSEN G. (2002) Strain and process development for the production of human cytokines in Hansenula polymorpha. FEMS Yeast Res 2: 349–361. AGAPHONOV M. O., BEBUROV M. Y., TER-AVANESYAN M. D., SMIRNOV V. N. (1995) Disruption-displacement approach for the targeted integration of foreign genes in Hansenula polymorpha. Yeast 11: 1241–1247.
29
30 Protein Production in Hansenula polymorpha
77 AGAPHONOV M., TRUSHKINA P. M., SOHN J. S., CHOI E. S., RHEE S. K., TER-AVANESYAN M. D. (1999) Vectors for rapid selection of integrants with different plasmid copy numbers in the yeast Hansenula polymorpha DL-1. Yeast 15: 541–551. 78 SOHN J.-H., CHOI E.-S., KIM C.-H., AGAPHONOV, M. O. TER-AVANESYAN M. D., RHEE J.-S., RHEE S.-K. (1996) A novel autonomously replicating sequence (ARS) for multiple integration in the yeast Hansenula polymorpha DL-1. J Bacteriol 178: 4420–4428. 79 BRITO N., P´EREZ M. D., PERDOMO G., GONZA´ LEZ C., GARC´IA-LUGO P., SIVERIO J. M. (1999) A set of Hansenula polymorpha integrative vectors to construct lacZ fusions. Appl Microbiol Biotechnol 53: 23–29. 80 HEO J. H., HONG W. K., CHO E. Y., KIM M. W., KIM J. Y., KIM C. H., RHEE S. K., KANG H. A. (2003) Properties of the Hansenula polymorpha-derived constitutive GAP promoter, assessed using an HSA reporter gene. FEMS Yeast Res 4: 175–184. 81 SOHN J.-H., CHOI E.-S., KANG H. A., RHEE J.-S., AGAPHONOV M. O., TER-AVANESYAN M. D., RHEE S.-K. (1999) A dominant selection system designed for copy number-controlled gene integration in Hansenula polymorpha DL-1. Appl Microbiol Biotechnol 51: 800–807. 82 KANG H. A., HONG W.-K., SOHN J.-H., CHOI E.-S., RHEE S. K. (2001) Molecular characterization of the actin-encoding gene and the use of its promoter for a dominant selection system in the methylotrophic yeast Hansenula polymorpha. Appl Microbiol Biotechnol 55: 734–741. 83 GELLISSEN G., MELBER K., JANOWICZ S. A., DAHLEMS U., WEYDEMANN U., PIONTEK M., STRASSER A. W. M., HOLLENBERG C. P. (1992) Heterologous protein expression in yeast. Antonie van Leeuwenhoek 62: 79–93. 84 van DJIK R., FABER K. N., KIEL J. A. K. W., VEENHUIS M., van der KLEI I. J. (2000) The methylotrophic yeast Hansenula polymorpha: a versatile cell factory. Enzyme Microbiol Technol 26: 793–800.
85 PHONGDARA A., MERCKELBACH A., KEUP P., GELLISSEN G., HOLLENBERG C. P. (1998) Cloning and characterization of the gene encoding a repressible acid phosphatase (PHO1) from the methylotrophic yeast Hansenula polymorpha. Appl Microbiol Biotechnol 50: 77–84. 86 WEYDEMANN U., KEUP P., PIONTEK M., STRASSER A. W. M., SCHWEDEN J., GELLISSEN G., JANOWICZ S. A. (1995) High-level secretion of hirudin by Hansenula polymorpha-authentic processing of three different preprohirudins. Appl Microbiol Biotechnol 44: 377–385. 87 AVILA J., GONZA´ LEZ C., BRITO N., SIVERIO J. M. (1998) Clustering of the YNA1 gene encoding a ZN(II)2 Cys6 transcriptional factor in the yeast Hansenula polymorpha with the nitrate assimilation genes YNT1,YNI1 and YNR1, and its involvement in their transcriptional activation. Biochem J 335: 647–652. 88 COX H., MEAD D., SUDBERY ELAND M., EVANS L. (2000) Constitutive expression of recombinant proteins in the methylotrophic yeast Hansenula polymorpha using the PMA1 promoter. Yeast 16: 1191–1203. 89 JENZELEWSKI V. (2002) Fermentation and primary product recovery. In: GELLISSEN G. (Ed.) Hansenula polymorpha-biology and applications. Wiley-VCH, Weinheim, pp. 156–174. 90 CREGG J. M. (1999) Expression in the methylotrophic yeast Pichia pastoris. In: FERNANDEZ J. M., HOEFFLER J. P. (Eds.) Gene expression systems: using nature for the art of expression. Academic Press, San Diego, pp. 157–191. 91 AVGERINOS G. C., TURNER B. G., GORELICK M. D., PAPENDIECK A., WEYDEMANN U., GELLISSEN G. (2001) Production and clinical development of a Hansenula polymorpha-derived PEGylated hirudin. Semin Thromb Hemost 27: 357–371. 92 BARTELSEN O., BARNES C. S., GELLISSEN G. (2002) Production of anticoagulants in Hansenula polymorpha. In: GELLISSEN G. (Ed.) Hansenula polymorpha –biology and applications. Wiley-VCH, Weinheim, pp. 211–228.
References 93 SCHAEFER S., PIONTEK M., AHN S.-J., PAPENDIECK A., JANOWICZ S. A., GELLISSEN G. (2001) Recombinant hepatitis B vaccines –characterization of the viral disease and vaccine production in the methylotrophic yeast Hansenula polymorpha. In: DEMBOWSKY K., STADLER P. (Eds.), Novel therapeutic proteins: selected case studies. Wiley-VCH, Weinheim, pp. 245–274. 94 SCHAEFER S., PIONTEK M., AHN S.-J., PAPENDIECK A., JANOWICZ Z. A., TIMMERMANS I., GELLISSEN G. (2002) Recombinant hepatitis B vaccines–disease characterization and vaccine production. In: GELLISSEN G. (Ed) Hansenula polymorpha – biology and applications. Wiley-VCH, Weinheim, pp. 175–210. 95 PAPENDIECK A., DAHLEMS U., GELLISSEN G. (2002) Technical enzyme production and whole-cell biocatalysis: application of Hansenula polymorpha. In: GELLISSEN G. (Ed) Hansenula polymorpha – biology and applications. Wiley-VCH, Weinheim, pp. 255–271. 96 CURVERS S., BRIXIUS P., KLAUSER T., WEUSTER-BOTZ D., TAKORS R., WANDREY C. (2001) Human chymotrypsinogen B production with Pichia pastoris by integrated development of fermentation and downstream processing Part I. Fermentation. Biotechnol Progr 17: 495–502. 97 THO¨ MMES J., HALFAR M., GIEREN H., CURVERS S., TAKORS R., BRUNSCHIER R., KULA M.-R. (2001) Human chymotrypsinogen B production from Pichia pastoris by integrated development of fermentation and downstream processing Part 2. Protein recovery. Biotechnol Progr 17: 503–512. 98 ZUREK C., KUBIS E., KEUP P., H¨ORLEIN D., BEUNINK J., THO¨ MMES J., KULA M.-R., HOLLENBERG C. P., GELLISSEN G. (1996) Production of two aprotinin variants in Hansenula polymorpha. Process Biochem 31: 679–689. 99 AGAPHONOV M. O., ROMANOS N. V., TRUSHINA P. M., SMIRNOV V. N., TER-AVANESYAN M. D. (2002) Aggregation and retention of human urokinase type plasminogen activator in
100
101
102
103
104
105
106
107
the yeast endoplasmic reticulum. BMC Mol Biol 3: 15–22. OH K. S., KWON O., OH Y. W., SOHN M. J., JUNG S., KIM Y. K., KIM M. G., RHEE S. K., GELLISSEN G., KANG H. A. (2004) Fabrication of a partial genome microarray of the methylotrophic Yeast Hansenula polymorpha: optimization and evaluation for transcript profiling. J Microbiol Biotechnol 14: (in press) SOHN J.-H., KANG H. A., RAO K. J., KIM C. H., CHOI E.-S., CHUNG B. H., RHEE S.-K. (2001) Current status of the anticoagulant hirudin: its biotechnological production and clinical practice. Appl Microbiol Biotechnol 57: 606–613. BARNES C. S., KRAFFT B., FRECH M., HOFMANN U. R., PAPENDIECK A., DAHLEMS U., GELLISSEN G., HOYLAERTS M. F. (2001) Production and characterization of saratin, an inhibitor of von Willebrand factor-dependent platelet adhesion to collagen. Semin Thromb Hemost 27: 337–347. De BRUIN E. C., WERTEN M. W., LAANE C., de WOLF F. A. (2002) Endogenous propyl 4-hydroxylation in Hansenula polymorpha and its use for the production of hydroxylated recombinant gelatin. FEMS Yeast 1: 291–298. TIKHOMIROVA L. P., IKONOMOVA R. N., KUZNETSOVA E. N., FODOR I. I., BYSTRYKH L. V., AMINOVA L. R., TROTSENKO Y. A. (1988) Transformation of methylotrophic yeast Hansenula polymorpha: cloning and expression of genes. J Basic Microbiol 28: 343–351. COOK M. W., THYGESEN H. V. (2003) Safety evaluation of a hexose oxidase expressed in Hansenula polymorpha. Food Chem Toxicol 41: 523–529. GLEESON M. A., ORTORI G. S., SUDBERY P. E. (1986) Transformation of the methylotrophic yeast Hansenula polymorpha. J Gen Microbiol 132: 3459–3465. PARPINELLO G., BERARDI F., STRABBIOLI R. (1998) A regulatory mutant of Hansenula polymorpha exhibiting methanol utilization metabolism and peroxisome proliferation in glucose. J Bacteriol 180: 2958–2967.
31
32 Protein Production in Hansenula polymorpha
108 AVILA J., P´EREZ M. D., BRITO N., GONZA´ LEZ C., SIVERIO J. M. (1995) Cloning and disruption of the YNR1 gene encoding the nitrate reductase apoenzyme of the yeast Hansenula polymorpha FEBS Lett 366: 137– 142. 109 BRITO N., AVILA J., P´EREZ M. D., GONZA´ LEZ C., SIVERIO J. M. (1996) The genes YNI1 and YNR1 encoding nitrite reductase and nitrate reductase respectively in the yeast Hansenula polymorpha, are clustered and co-ordinately regulated. Biochem J 317: 89–95. 110 ROGGENKAMP R., HANSEN H., ECKART M., JANOWICZ Z., HOLLENBERG, C. P. (1986) Transformation of the methylotrophic yeast Hansenula polymorpha by autonomous replication and integration vectors. Mol Gen Genet 202: 302– 308. 111 KRAPPMANN S., PRIES R., GELLISSEN G., HILLER M., BRAUS G. H. (2000) HARO7 encodes chorismate mutase of the methylotrophic yeast Hansenula polymorpha and is derepressed upon methanol utilization. J Bacteriol 182: 4188–4197. 112 VEALE R. A., GUISEPPIN M. L., van EIJK H. M., SUDBERY P. E., VERRIPS C. T. (1992) Development of a strain of Hansenula polymorpha for the efficient expression of guar alpha-galactosidase. Yeast 8: 361–372. 113 BOGDANOVA A. I., KUSTIKOVA O. S., AGAPHONOV M. O., TER-AVANESYAN M. D. (1998) Sequences of Saccharomyces cerevisiae 2 µm DNA improving plasmid partitioning in Hansenula polymorpha. Yeast 14: 1–9. 114 MANNAZZU I., GUERRA E., STRABBIOLI R., MASIA A., MAESTRALE G. B., ZORODDU M. A., FATICHENTI F. (1997) Vanadium affects vacuolation and phosphate metabolism in Hansenula polymorpha. FEMS Microbiol Lett. 147: 23–28. 115 HAAN G. J., van DIJK R., KIEL J. A. K. W., VEENHUIS M. (2002) Characterization of the Hansenula polymorpha PUR7 gene and its use as selectable marker for targeted chromosomal integration. FEMS Yeast Res 2: 17–24.
116 GIETL C., FABER K. N., van der KLEI I. J., VEENHUIS M. (1994) Mutational analysis of the N-terminal topogenic signal of watermelon glyoxysomal malate dehydrogenase using the heterologous host Hansenula polymorpha. Proc Natl Acad Sci USA 91: 3151–3155. 117 KIEL J. A. K. W., KEIZER-GUNNINK I. K., KRAUSE T., KOMORI M., VEENHUIS M. (1995) Heterologous complementation of peroxisome function in yeast: the Saccharomyces cerevisiae PAS3 gene restores peroxisome biogenesis in a Hansenula polymorphaper disruption mutant. FEBS Lett 377: 434–438. 118 BAERENDS R. J. S., SALOMONS F. A., KIEL J. A. K. W., van der KLEI I. J., VEENHUIS M. (1997) Deviant Pex3p levels affect normal peroxisome formation in Hansenula polymorpha: a sharp increase of the protein level induces the proliferation of numerous, small protein import-competent peroxisomes. Yeast 13: 1449–1463. 119 SALOMONS F. A., KIEL J. A. K. W., FANER K. N., VEENHUIS M., van der KLEI I. J. (2000) Overproduction of pex5p stimulates import of alcohol oxidase and dihydroxyacetone synthase in a Hansenula polymorpha pex14 null mutant. J Biol Chem 276: 44570–44574. 120 TAN X., TITORENKO V. I., van der KLEI I. J., SULTER G. J., HAIMA P., WATERHAM H. R., EVERS M., HARDER W., VEENHUIS M., CREGG J. M. (1995) Characterization of peroxisome-deficient mutants of Hansenula polymorpha. Curr Genet 28: 248–257. 121 KIEL J. A. K. W., HILBRANDS R. E., van der KLEI I. J., RASMUSSEN S. W., SALOMONS F. A., van der HEIDE M., FABER K. N., CREGG J. M., VEENHUIS M. (1999) Hansenula polymorpha Pex1p and Pex6p are peroxisome-associated AAA proteins that functionally and physically interact. Yeast 15: 1059–1078. 122 AGAPHONOV M. O., DEEV A. V., KIM S.-Y., SOHN J.-H., CHOI E.-S., TER-AVANESYAN M. D. (2003) A novel approach to isolation and functional characterization of genomic DNA sequences from the methylotrophic yeast Hansenula polymorpha. Mol Biol 37: 74–80.
References 123 BARR P. J., BRAKE A. J., VALENZUELA P. (Eds.) Yeast genetic engineering. Butterworth, Boston, MA 124 GELLISSEN G., PIONTEK M., DAHLEMS U., JENZELEWSKI V., GAVAGAN J. E., DICOSIMO R., ANTON D. L., JANOWICZ S. A. (1996) Recombinant Hansenula polymorpha as a biocatalyst: coexpression of the spinach glycolate oxidase (GO) and the S. cerevisiae catalase T (CTT1) gene. Appl Microbiol Biotechnol 46: 46–54. 125 GOFFEAU A. (1996) 1996: A vintage year forYeast and yeast. Yeast 2: 1603–1606. 126 JANOWICZ S. A., MERCKELBACH A., JACOBS E., HARFORD N., COMERBACH M., HOLLENBERG C. P. (1991) Simultaneous expression of the S and L surface antigens of hepatitis B, and formation of mixed particles in the methylotrophic yeast Hansenula polymorpha. Yeast 7: 431–443.
127 MARRI L., ROSSOLINI G. M., SATTA G. (1993) Chromosome polymorphism among strains of Hansenula polymorpha (syn. Pichia angusta). Appl Environ Microbiol 59: 939–941. 128 MEWES H. W., FRISHMAN D., G¨ULDENER U., MANNHAUPT G., MAYER K., MOKREJS M., MORGENSTERN B., M¨UNSTERKOETTER M., RUDD S., WEIL B. (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res 30: 31–34. 129 REHM H.-J., REED G. (Eds.) (1993) Biotechnology, Vol 2, Genetic Fundamentals and genetic engineering. VCH, Weinheim 130 SPIER R. E. (2000) Yeast as a cell factory. Enzyme Microbiol Technol 26: 639 131 STRAHL-BOLSINGER S., GENTZSCH M., TANNER W. (1999) Protein O-mannosylation. Biochim Biophys Acta 1426: 297–307.
33
1
Protein Production in Pichia pastoris Christine Ilgen and James M. Cregg Keck Graduate Institute of Applied Sciences, Claremont, USA
Joan Lin-Cereghino University of the Pacific, Stockton, USA
Originally published in: Production of Recombinant Proteins. Edited by Gerd Gellissen. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31036-4
1 Introduction
The methylotrophic yeast Pichia pastoris has developed into a highly successful system for the production of a variety of heterologous proteins. At present, over 500 reports describing its use have been published (http://faculty.kgi.edu/ cregg/index.htm). The popularity of the system can be attributed to several factors, most importantly: (1) the presence, in expression vectors, of a strong, tightly regulated, and easily manipulated promoter derived from the P. pastoris alcohol oxidase I gene (AOX1) [1, 2, 2] (2) a strong preference for respiratory versus fermentative growth; (3) the simplicity of techniques needed for the molecular genetic manipulation of P. pastoris [3]; (4) the capability of performing many eukaryotic post-translational modifications, such as glycosylation, disulfide bond formation and proteolytic processing; and (5) the availability of the expression system as a commercially available kit (www.invitrogen.com). This chapter is intended as a general overview of the system and how it works, plus a review of recent advances in modifying the structures of N-linked carbohydrates on glycoproteins secreted from the yeast. The conceptual basis for the P. pastoris expression system stems from the ability of this yeast to grow on methanol as sole carbon and energy source [4–6] and the observation that some of the enzymes required for methanol metabolism are present at substantial levels only when cells are grown on methanol [5, 7]. At least two of the methanol pathway enzymes, AOX and dihydroxyacetone synthase (DHAS), are present at high levels in cells grown on methanol, but are undetectable in cells
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Production in Pichia pastoris
grown on most other carbon sources. In cells fed methanol at growth-limiting rates in fermentor cultures, AOX levels are dramatically induced, constituting >30% of total soluble protein [8]. The most commonly utilized promoter for controlling foreign gene expression in the system is derived from the P. pastoris AOX1 gene [9]. In fact, there are two genes that code for AOX in P. pastoris: AOX1 and AOX2. Although both genes encode equally functional alcohol oxidase enzymes, AOX1 is responsible for a vast majority of alcohol oxidase activity in the cell, due to the relative strength of its promoter [1, 10]. In methanol-grown cells, ∼5% of polyA+ RNA is from AOX1; however, in cells grown on most other carbon sources, AOX1 message is undetectable [11]. The regulation of the AOX1 gene appears to involve two mechanisms: a repression/derepression mechanism; plus an induction mechanism, similar to the regulation of several of the Saccharomyces cerevisiae GAL genes. However, unlike GAL-pathway regulation, the absence of a repressing carbon source, such as glucose in the medium, does not result in substantial transcription of AOX1. The presence of methanol appears to be essential to induce high levels of AOX1 transcription [9].
2 Construction of Expression Strains
Like most expression systems, the P. pastoris system is an Escherichia coli-based shuttle vector system. As a result of the efforts of numerous groups, a variety of expression vectors and P. pastoris host strains are available. Tables 1 and 2 provide lists of promoters and selectable marker genes that have been incorporated into P. pastoris expression vectors. Additional information on vectors and strains can be found in A1 and A2, and elsewhere (faculty.kgi.edu/cregg/index.htm; [2, 3, 12–14]) 2.1 Expression Vector Components
Most P. pastoris expression vectors have an expression cassette composed of a 0.9kb fragment from AOX1 composed of the 5 promoter sequences, and a second Table 1 P. pastoris promoters and their properties
Namea
Relative efficiency
Inducing compound
Reference
PAOX1 PGAP Pfld1 Ppex8 Pypt1
High High High Moderate Low
Methanol Constitutive Methanol or methylamine Methanol or oleate Constitutive
Ellis et al., [1]; Tschopp et al., [9] [15] [16] [17] [18]
a
See Section 2.2 for definitions of abbreviations.
2 Construction of Expression Strains Table 2 Selectable marker genes for transformation of P. pastoris
Biosynthetic markers
Useful property
Reference
HIS4
P. pastoris and S. cerevisiae genes complement P. pastoris his4 strains P. pastoris and S. cerevisiae genes complement P. pastoris arg4 strains Pink color of ade1 strains 5-Fluoro-orotic acid selection for Ura− strains 5-Fluoro-orotic acid selection for Ura− strains
[11]
ARG4 ADE1 URA3 URA5
Cregg and Madden [11] [6] [6] Nett and Gerngross [21]
Drug-resistance markers Selective drug
Reference
KanR ZeoR BsdR FLD1
[22] [23] www.invitrogen.com Sunga and Cregg [25]
G418/Geneticin Zeocin Blasticidin Formaldehyde
short AOX1-derived fragment with sequences required for transcription termination [9, 14]. Between the promoter and terminator sequences is a site or multiple cloning site (MCS) for insertion of the foreign coding sequence. Generally, the best expression results are obtained when the first ATG of the heterologous coding sequence is inserted as close as possible to the position of the corresponding AOX1 ATG. This position coincides with the 5 -most restriction site in MCSs. In addition, for the secretion of foreign proteins, vectors are available where in-frame fusions of foreign proteins and the secretion signals of P. pastoris acid phosphatase (PHO1) or S. cerevisiae α-mating factor (α-MF) can be generated. 2.2 Alternative Promoters
Although the AOX1 promoter has been successfully used to express numerous foreign genes, there are circumstances in which this promoter may not be suitable. For example, the use of methanol to induce gene expression may not be appropriate for the production of food products since methane – a petroleum-related compound – is one source of methanol. Also, methanol is a potential fire hazard, especially in quantities needed for large-scale fermentations. Therefore, promoters that are not induced by methanol are attractive for expression of certain genes. Alternative promoters to the AOX1 promoter are the P. pastoris GAP, FLD1, PEX8, and YPT1 promoters (see Table 1). The P. pastoris glyceraldehyde 3-phosphate dehydrogenase (GAP) gene promoter provides strong constitutive expression on glucose at a level comparable to that seen with the AOX1 promoter [15]. The advantage of using the GAP promoter is that methanol is not required for induction, nor is it necessary to shift cultures from one
3
4 Protein Production in Pichia pastoris
carbon source to another, making strain growth more straightforward. However, since the GAP promoter is constitutively expressed, it is not a good choice for the production of proteins that are toxic to the yeast. The FLD1 gene encodes glutathione-dependent formaldehyde dehydrogenase, a key enzyme required for the metabolism of certain methylated amines as nitrogen sources and methanol as a carbon source [16]. The FLD1 promoter can be induced with either methanol as a sole carbon source (and ammonium sulfate as a nitrogen source) or methylamine as a sole nitrogen source (and glucose as a carbon source). Thus, the FLD1 promoter offers the flexibility to induce high levels of expression using either methanol or methylamine, an inexpensive nontoxic nitrogen source. For some applications, the AOX1, GAP, and FLD1 promoters may be too strong, expressing genes at too high a level. For these and other applications, moderately expressing promoters are desirable. Toward this end, the P. pastoris PEX8 and YPT1 promoters may be of use. The PEX8 gene (formerly PER3) encodes a peroxisomal matrix protein that is essential for peroxisome biogenesis [17]. It is expressed at a low but significant level on glucose and is induced modestly (three- to fivefold) when cells are shifted to methanol. The YPT1 gene encodes a GTPase involved in secretion, and its promoter provides a low but constitutive level of expression in medium containing either glucose, methanol, or mannitol as carbon source [18]. 2.3 Selectable Markers
The number of available vector marker genes for transformation of P. pastoris has increased significantly in recent years (see Table 2). The stable expression of human type III collagen illustrates the need for multiple selectable markers [19]. The production of collagen required the co-expression of prolyl-4-hydroxylase, a central enzyme in the synthesis and assembly of trimeric collagen. Since prolyl4-hydroxylase is an α 2 β 2 tetramer, the β subunit of which is protein disulfide isomerase (PDI), three markers – Arg, His, and Zeocin resistance – were used to co-express all three polypeptides in the same P. pastoris strain. Marker genes fall into two basic groups: the biosynthetic gene/auxotrophic mutant group; and the dominant drug resistance gene group. Biosynthetic gene markers require a vector with such a gene along with a mutant host strain that is defective in that gene. For P. pastoris expression vectors, the P. pastoris HIS4 (histidinol dehydrogenase), ADE1 (PR-amidoimidazole- succinocarboxamide synthase), ARG4 (argininosuccinate lyase), URA3 (orotidine 5 -phosphate decarboxylase), and URA5 (orotate-phosphoribosyl transferase) genes have been developed [20, 21]. Maps and sequences of vectors containing the first four marker genes are available (http://faculty.kgi.edu/cregg/index.htm). In addition, a series of host strains containing all possible combinations of ade1, arg4, his4, and ura3 auxotrophies has been generated and made available (http://faculty.kgi.edu/cregg/index.htm; [20]). Drug-resistant selectable markers are often more convenient to use as they usually do not require a specific mutation in the host strain for selection. Three such
2 Construction of Expression Strains
Fig. 1 Map of P. pastoris expression vector pPICZ.
marker systems have been developed for P. pastoris. As shown in Table 2, these marker genes are the E. coli Tn903kanR kanamycin/geneticin/G418 resistance gene (KanR ) [22], the Streptoalloteichus hindustanus She ble gene conferring resistance to the bleomycin-related drug Zeocin (ZeoR ) [23], and the Streptomyces griseochromogens gene, conferring resistance to the nucleoside antibiotic Blasticidin S (BsdR ) (www.invitrogen.com). At present, vectors containing the ZeoR gene are most popular due to their small size, a consequence of the ZeoR gene being the selectable marker for transformations of both E. coli and P. pastoris in these vectors (Figure ??). The BsdR vectors appear to be comparable to the ZeoR vectors but are relatively new, with few reports of their use in the literature as yet. As described in Section 2.6, all three systems offer the potential to enrich for strains with multiple integrated copies of a vector by selection for transformants with resistance to high levels of the appropriate drug. In addition, with each of these systems, it is only necessary to add drug during the initial selection of transformants (and perhaps once again to single colony purify strains). Once isolated, the strains are stable – provided one follows standard microbial storage and manipulation procedures – and do not need further drug selection, for example during growth of expression strains in shake flask or fermentor cultures. A counterselective marker based on a corn mitochondrial gene T-urf13 has been described [24]. Expression of this gene in P. pastoris renders the cells sensitive to the insecticide methomyl, making this gene a useful counter marker for pop-in/pop-out gene replacement strategies in the yeast. Finally, a marker system utilizing the P. pastoris FLD1 gene as the selectable marker was recently reported [25]. Like biosynthetic gene systems, FLD1-based vectors require a mutant host (fld1) for initial selection of transformants for growth on methylamine as nitrogen source. However, like the drug resistance vectors, once the methylamine utilizing colonies are selected, one can enrich populations of transformants with multiple vector copies by selection for strains that are resistant to high levels of formaldehyde. Vectors containing the FLD1 marker gene were
5
6 Protein Production in Pichia pastoris
constructed so that all non-yeast sequences (except the foreign gene itself) can be removed prior to transformation in P. pastoris. In particular, bacterial antibiotic resistance genes and origins of replication are not present in the final strains, substantially improving the biological safety of such strains over typical bacterial shuttle expression vectors. 2.4 Host Strains
All P. pastoris expression strains are derived from NRRL-Y 11430 (Northern Regional Research Laboratories, Peoria, IL, USA). Some have one or more auxotrophic mutations, which allow for selection of expression vectors containing the appropriate biosynthetic marker and of diploid products resulting from genetic crosses (A1). Prior to transformation, all of these strains grow on complex media but require supplementation with the appropriate nutrient(s) for growth on minimal media. 2.4.1 Methanol Utilization Phenotype Most P. pastoris host strains grow on methanol at the wild-type rate (Mut+ , methanol utilization plus phenotype). However, two other types of host strains are available that vary with regard to their ability to utilize methanol because of deletions in one or both AOX genes. Strains with AOX mutations are sometimes better producers of foreign proteins than wild-type strains [9, 14, 26]. Additionally, these strains require only small amounts of methanol for induction and, therefore, do not have the potential methanol feeding problems that large-scale fermentations of Mut+ strains can experience (see Section 2.7). KM71 (his4 arg4 aox1::SARG4) is a strain in which AOX1 has been partially deleted and replaced with the S. cerevisiae ARG4 gene [11]. Since the strain must rely on the weaker AOX2 for methanol metabolism, it grows slowly on this carbon source (Muts , methanol utilization slow phenotype). Another strain, MC100-3 (his4arg4 aox1::SARG4 aox2::Phis4), is deleted for both AOX genes and is totally unable to grow on methanol (Mut− , methanol utilization minus phenotype) [10, 26]. All of these strains, including the Mut− strain, retain the ability to induce expression at high levels from the AOX1 promoter. 2.4.2 Protease-deficient Host Strains Several protease-deficient strains – SMD1163 (his4 pep4 prb1), SMD1165 (his4 prb1), and SMD1168 (his4 pep4) – have been shown to be effective in reducing the degradation of some foreign proteins [27, 28]. This is especially noticeable in fermentor cultures of secreted recombinant proteins, because the combination of high cell density and lysis of a small percentage of cells results in a relatively high concentration of these vacuolar proteases. Unfortunately, these protease-deficient cells are not as vigorous as wild-type strains with respect to PEP4. In addition to lower viability, they possess a slower growth rate and are more difficult to transform. Therefore, the use of protease-deficient strains is only recommended in situations where other measures to reduce proteolysis have yielded unsatisfactory results.
2 Construction of Expression Strains
An additional protease-deficient strain, SMD1168 kex1::SUC2 (pep4::URA3 kex1::SUC2 his4 ura3), was developed to inhibit proteolysis of murine and human endostatin [29]. The Kex1 protease cleaves carboxy-terminal lysines and arginines. Therefore, the KEX1-deleted strain was generated to inhibit carboxy-terminal proteolysis and proved to substantially reduce this type of degradation. 2.5 Construction of Expression Strains
Although autonomous episomal multicopy vectors exist for P. pastoris [30], it was discovered that strains expressing a recombinant gene from a single integrated copy of an expression vector made more product than strains containing the identical expression cassette on a multicopy autonomous vector. Furthermore, integrated vectors are considerably more stable than autonomous vectors, requiring little or no continued selection for the vector after transformation and initial colony formation. Thus, virtually all P. pastoris expression vectors are designed to integrate into the genome of the yeast. P. pastoris expression vectors can be integrated into the genome in one of two ways. The simplest way is to restrict the vector at a unique site in either the marker gene (e.g., HIS4) or the AOX1 promoter fragment and then transform it into the appropriate auxotrophic mutant. The free DNA termini stimulate homologous recombination events that result in single crossover-type integration events into these loci at high requencies (50–80% of His+ transformants). The remaining transformants have undergone gene conversion events in which only the marker gene from the vector has integrated into the mutant host locus without other vector sequences [30]. Alternatively, certain P. pastoris expression vectors can be digested in such a way that the expression cassette and marker gene are released, flanked by 5 and 3 AOX1 sequences (see for example Cregg et al. [14]; or [25]. Approximately 10–20% of transformation events are the result of a gene replacement event in which the AOX1 gene is deleted and replaced by the expression cassette and marker gene. This disruption of the AOX1 gene forces these strains to rely on the transcriptionally weaker AOX2 gene for growth on methanol [10, 14] and, as a result, these strains have a Muts phenotype. These gene replacement strains are easily identified among transformed colonies by replica-plating them to methanol and selecting those with reduced ability to grow on methanol (Muts phenotype). As mentioned in Section 2.4.1, the advantage of Muts strains is that they utilize less methanol, contain no bacterial antibiotic resistance genes or replication origins, and sometimes express higher levels of foreign protein than wild-type (Mut+ ) strains, especially in shakeflask cultures. 2.6 Multicopy Strains
Most P. pastoris transformants contain a single copy of an expression vector. However, experience has shown that, despite the strength of the AOX1 promoter,
7
8 Protein Production in Pichia pastoris
foreign gene expression is often transcriptionally limited. Thus, strains that contain multiple integrated copies of an expression cassette often, but not always (see for example Thill et al., [31]), yield more recombinant protein than single-copy strains (see for example [28]). Two general approaches have been developed to construct multicopy expression strains in P. pastoris. The first approach involves constructing a vector with multiple head-to-tail copies of an expression cassette [28]. The key to generating this construction is a vector that has an expression cassette flanked by restriction sites that have complementary termini (e.g., BamHI–BglII, SalI–XhoI combinations). The process of repeated cleavage and reinsertion results in the generation of a series of vectors that contain increasing numbers of expression cassettes. A particular advantage to this approach, especially in the production of human pharmaceuticals, is that the precise number of expression cassettes is known and can be recovered for direct verification by DNA sequencing. The second approach utilizes expression vectors that contain a drug resistance gene as the selectable marker. As described in Section 2.3 and Table 2, these genes include the bacterial KanR , ZeoR , BsdR genes and the P. pastoris FLD1 gene. With each of these genes, the level of drug resistance roughly correlates to vector copy number. However, with any of these four marker genes, vector copy number varies greatly, and most transformants still contain only a single vector copy, even at high levels of drug. Thus, a significant number (50–100) of transformants must be subjected to further analysis of copy number and expression level to identify one or two with high copy number. By this approach, strains carrying up to 30 copies of an expression cassette have been isolated [22]. Importantly, the multicopy strains, once isolated, are stable with standard microbial handling procedures (e.g., stock of strain stored frozen at −80 ◦ C, working stock kept on plate on non-inducing medium) and do not require continued drug selection on plates or in liquid medium. 2.7 Growth in Fermentor Cultures
P. pastoris is a poor fermentor, and this is a major advantage relative to S. cerevisiae. In high cell density (HCD) fermentor cultures, ethanol (the product of S. cerevisiae fermentation) rapidly builds to toxic levels, which limits further growth and foreign protein production. With its preference for respiratory growth, P. pastoris can be cultured at extremely high densities (150 g L−1 dry cell weight; 500 OD600 U mL−1 ) in the controlled environment of the fermentor, with little risk of “pickling” itself. HCD growth is especially important for secreted proteins, as the concentration of product in the medium is roughly proportional to the concentration of cells in culture. For all cultures, high cell densities reduce the total volume of culture needed. Another positive aspect of growing P. pastoris in fermentor cultures is that the level of transcription initiated from the AOX1 promoter can be threeto five-fold greater in cells fed methanol at growth-limiting rates compared to cells grown in excess methanol. Thus, even for intracellularly expressed proteins, product yields can be significantly higher from fermentor-cultured cells. Finally,
3 Post-translational Modification of Secreted Proteins
methanol metabolism utilizes oxygen at a high rate, and expression of foreign genes is negatively affected by oxygen limitation. Only in the controlled environment of a fermentor is it feasible to monitor and adjust oxygen levels in the culture medium. A hallmark of the P. pastoris system is the ease with which expression strains scale-up from shake-flask to high-density fermentor cultures. Although some foreign proteins have expressed well in shake-flask cultures, expression levels are typically low compared to fermentor cultures. Considerable effort has gone into the optimization of heterologous protein expression techniques, and detailed fedbatch and continuous culture protocols are available [28, 32, 33]. In general, PAOX1 controlled expression strains are grown initially in a defined medium containing glycerol as its carbon source. During this time, biomass accumulates, but heterologous gene expression is fully repressed. Upon depletion of glycerol, a transition phase is initiated in which additional glycerol is fed to the culture at a growthlimiting rate. Finally, methanol or a mixture of glycerol and methanol is fed to the culture to induce expression. The concentration of foreign protein is monitored in the culture to determine time of harvest. The growth medium for P. pastoris recombinant protein production fermentations is ideal for large-scale production because it is inexpensive and defined, consisting of pure carbon sources (glycerol and methanol), biotin, salts, trace elements, and water. This medium is free of undefined ingredients that can be sources of pyrogens or toxins and is, therefore, compatible with the production of human pharmaceuticals. Also, since P. pastoris is cultured in medium with a relatively low pH and methanol, it is less likely to become contaminated by most other microorganisms.
3 Post-translational Modification of Secreted Proteins
A major advantage of a fungal expression system over bacterial systems is that fungi are eukaryotes and, therefore, can perform many of the post-translational modifications typically associated with eukaryotes, such as processing of signal sequences, folding, disulfide bridge formation, certain types of lipid addition, and O- and N-linked glycosylation. 3.1 Secretion Signals
Foreign proteins expressed in P. pastoris can be produced either intracellularly or extracellularly. Because this yeast secretes only low levels of endogenous proteins, the secreted recombinant protein often constitutes the vast majority of total protein in the culture medium (Figure ??). Therefore, directing a heterologous protein to the culture medium can serve as a substantial first step in purification. However, due to protein stability and folding requirements, the option of secretion is usually reserved for foreign proteins that are normally secreted by their native hosts.
9
10 Protein Production in Pichia pastoris
Fig. 2. Secreted expression of human serum albumin (HSA). 7.5% SDS-PAGE of 25-µL samples of culture supernatant from a P. pastoris strain expressing HSA. Cells were induced in BMMY (buffered methanolcomplex medium) for 0, 12, 24, 48, and 72 h. Lane M contains molecular mass markers (kDa). (Reprinted with permission from Lin Cereghino and Cregg, [13]).
Using the appropriate P. pastoris expression vector, researchers can clone a foreign gene with sequences encoding either the native signal of the foreign gene, the S. cerevisiae α-factor pre-pro peptide, or the P. pastoris acid phosphatase (PHO1) signal. Although several different secretion signal sequences – including the native secretion signal present on heterologous proteins – have been used successfully, results have been variable. The S. cerevisiae α-factor pre-pro peptide has been used with the most success. This signal sequence consists of a 19-amino acid signal (pre) sequence followed by a 66-residue (pro) sequence containing three consensus N-linked glycosylation sites and a dibasic Kex2 endopeptidase processing site [34]. The processing of this signal sequence involves three steps. The first is the removal of the pre signal by signal peptidase in the endoplasmic reticulum. Second, Kex2 endopeptidase cleaves between Arg–Lys of the pro leader sequence. This is rapidly followed by cleavage of Glu–Ala repeats by the Ste13 protein [35]. The efficiency of this process can be affected by the surrounding amino acid sequence. For instance, the cleavage efficiencies of both Kex2 and Ste13 proteins can be affected by the close proximity of proline residues. In addition, the tertiary structure formed by a foreign protein may protect cleavage sites from their respective proteases. The S. cerevisiae MFα1 pre-pro signal sequence is the most widely used signal for secretion of recombinant proteins from P. pastoris. In some cases, it is a better secretion signal for expression in P. pastoris than the leader sequence of the native heterologous protein. In a study concerning the expression of the industrial lipase Lip1 from Candida rugosa, the effect of heterologous leader sequences on expression and secretion was investigated [36]. It was found that the native Lip1p leader sequence allowed for secretion but somehow hampered expression. Either the α-factor pre or pre-pro signal was adequate for both secretion and expression, but the highest level of lipase secretion was from a clone with the full pre-pro sequence. This clone produced two species of secreted protein. A small percentage was correctly processed to the mature protein. However, a majority of the product contained four additional N-terminal amino acids. Variability in the amino terminus is commonly seen with heterologous proteins secreted by P. pastoris using the α-factor pre-pro leader. In some cases, MFα1 or PHO1 secretion signals have not worked, so synthetic leaders have been created. Martinez-Ruiz et al. [37] made mutations in the native
3 Post-translational Modification of Secreted Proteins
leader to reconstruct a more efficient Kex2p recognition motif (Lys-Arg). This aided in secretion of the ribosome-inactivation protein α-sarcin from the mold Aspergillus giganteus. Another more drastic solution was to create an entirely synthetic pre-pro leader. For the expression of human insulin, a synthetic leader and spacer sequence was found to improve secretion and protein yield [38]. Recently, yet another signal peptide – PHA-E from the plant lectin Phaseolus vulgaris agglutinin – was found to be effective for the secreted expression of two plant lectins and green fluorescent protein [39]. Additionally, it was found that proteins fused to the PHA-E signal peptide were correctly processed at the amino termini, whereas the same proteins secreted using the S. cerevisiae-MFα1 signal had heterogeneous amino-terminal extensions, indicating improper processing of the MFα1 signal. It remains to be seen whether the PHA-E signal sequence works as well in the secretion and processing of other foreign proteins.
3.2 O-linked Glycosylation
P. pastoris is capable of adding both O- and N-linked carbohydrate moieties to secreted proteins [40–43]. Eukaroytic cells assemble O-linked saccharides onto the hydroxyl groups of serine and threonine. In mammals, O-linked oligosaccharides are composed of a variety of sugars, including N-acetylgalactosamine, galactose (Gal), and sialic acid (NeuAc). In contrast, lower eukaryotes such as P. pastoris add O-oligosaccharides composed solely of mannose (Man) residues. No consensus primary amino acid sequence for O-glycosylation appears to exist. Additionally, different hosts may add O-linked sugars on different residues in the same protein. Consequently, it should not be assumed that P. pastoris will not glycosylate a recombinant protein, even if that protein is not glycosylated by its native host. For instance, although insulin-like growth factor 1 (IGF-1) is not glycosylated in humans, P. pastoris was found to add O-linked mannose to 15% of expressed IGF-I product [28]. It should also not be assumed that the specific Ser and Thr residues selected for O-glycosylation by P. pastoris will be the same as the original host. Although there is little information concerning the mechanism and specificity of O-glycosylation in P. pastoris, the presence of O-glycosylation has been reported in some heterologous proteins, such as the Aspergillus awamori glucoamylase catalytic domain [44], human IGF-1 [28], Barley α-amylases 1 and 2 [45], and human singlechain urokinase-type plasminogen activator [46]. Duman et al. [47] used a variety of enzymatic and chromatographic procedures to study endogenous cellular proteins and recombinant human plasminogen produced in P. pastoris. The study revealed the presence of O-linked α-1,2 mannans containing dimeric, trimeric, tetrameric, and pentameric oligosaccharides. No α1,3 linkages were detected. The majority of oligosaccharide was equally distributed between α-1,2-linked dimers and trimers.
11
12 Protein Production in Pichia pastoris
3.3 N-linked Glycosylation
In all eukaryotes, N-glycosylation begins in the endoplasmic reticulum with the transfer of a lipid-linked oligosaccharide unit, Glc3 Man9 GlcNAc2 (Glc = glucose; GlcNAc = N-acetylglucosamine), to asparagine at the recognition sequence Asn-XSer/Thr [41]. This oligosaccharide core is then trimmed to Man8 GlcNAc2 . At this point, glycosylation patterns of lower and higher eukaryotes begin to differ. The mammalian Golgi apparatus performs a series of trimming and addition reactions that generate oligosaccharides composed of Man5–6 GlcNAc2 (high-mannose type), a mixture of several different sugars (complex type), or a combination of both (hybrid type). In S. cerevisiae, N-linked core units are elongated in the Golgi through the addition of mannose outer chains. Since these outer chains vary in length, endogenous and heterologous secreted proteins from S. cerevisiae are heterogeneous in size. These chains are typically 50–150 mannose residues in length – a condition referred to as hyperglycosylation. Most foreign proteins, as well as native proteins, secreted in P. pastoris appear to be Man8 GlcNAc2 or Man9 GlcNAc2 with smaller amounts of carbohydrates with more Man units (Figure ??) [42, 48]. These fungal N-linked high-mannose oligosaccharides represent a significant problem in the use of foreign-secreted proteins by the pharmaceutical industry. They can be exceedingly antigenic when introduced intravenously into mammals, and are rapidly cleared from the blood by the liver. An additional problem caused by the differences between yeast and mammalian Nlinked glycosylation patterns is that the long outer chains may potentially interfere with the folding or function of a mammalian foreign protein. Relative to the oligosaccharide structures on S. cerevisiae-secreted proteins, at least three differences are apparent in P. pastoris-produced proteins. First – and perhaps most importantly – is the frequent absence of hyperglycosylation. Another difference is the presence of α-1,6-linked mannose on corerelated structures reported in P. pastoris-secreted invertase [42], and the kringle-2 domain of tissue-type plasminogen activator [43] and other proteins. Finally, P. pastoris oligosaccharides appear not to have any terminal α-1,3-linked mannosylation [48, 49]. These linkages are unique to fungal glycoproteins and render recombinant proteins produced by fungi unsuitable for human pharmaceutical uses [48]. 3.4 “Humanization” of N-linked Carbohydrate
Recently, exciting results on the modification (“humanization”) of N-linked carbohydrate structures on recombinant proteins secreted from P. pastoris have been described. First, Callewaert et al. [50] showed that modification of carbohydrates on recombinant proteins secreted from P. pastoris was possible by co-expressing an endoplasmic reticulum-targeted Trichoderma reesei 1,2-α-D-mannosidase along with either of two foreign secreted glycoproteins. The result was recombinant glycoproteins that displayed a dramatic decrease in the number of α-1,2-linked
3 Post-translational Modification of Secreted Proteins
Fig. 3 N-linked glycosylation pathway in humans and P. pastoris. Mns, "-1,2-mannosidase; MnsII, mannosidase II; GnT1, $-1,2-Nacetylglucosaminyltransferase I; GnTII, $-1,2-N-acetylglucosaminyltransferase II; MnT, mannosyltransferase. (Reprinted with permission from Hamilton et al., [52].)
13
14 Protein Production in Pichia pastoris
mannose residues. For one of the two foreign proteins, a Trypanosome transsialidase, the major glycan was a near-human Man5 GlcNAc2 structure. A second research group then pushed these results further by constructing a P. pastoris strain with the following modifications: (1) deletion of the P. pastoris OCH1 gene (encoding an α-1,6-manno-syltransferase); (2) addition and expression of two different α-1,2-mannosidase genes, and two different N-acetylglucosaminyl transferase genes; and (3) addition of a uridine 5 -diphosphate (UDP)-N-acetylglucosamine transporter gene [51, 52]. A novel method was utilized to search for mannosidase and transferase genes that performed the desired modifications and that were correctly localized in the secretory pathway: by expressing multiple genes for each of four enzyme types as fusions with a library of fungal type II membrane leader sequences. Each of the four combinatorial libraries (one for each enzymatic step) was sequentially introduced into P. pastoris and collections of transformed strains were screened for ones with the proper modifying activity. The result was a strain that predominantly added the human “high-mannose” N-glycan GlcNAc2 Man3 GlcNAc2 to a model recombinant protein. Further studies are needed to determine whether the humanized glycosylation strain will add this same structure to all recombinant proteins and to all sites on a recombinant protein. Furthermore, fully humanized glycans should also contain some amount of complex and/or hybrid structures, in particular, terminal sialic acid residues. It is not clear that P. pastoris will be able to perform these additions. Still, it is evident that great progress has been made in ameliorating this critical yeast post-translational problem. These carbohydrate modification results should also be of interest to investigators studying protein structure/function. The eukaryotic secretory pathway typically produces glycoproteins with a mixture of glycan structures. This has been a problem for investigators since the resulting variations in carbohydrate structures influence the functional and structural properties of the protein. With these and other glycan-modifying strains, researchers will – for the first time – be able to produce recombinant glycoproteins with uniform carbohydrates of known structure.
4 Conclusions
The P. pastoris expression system has gained acceptance as an important host organism for the production of recombinant proteins, as illustrated by the number of investigations in which this system has been successfully used (http://faculty.kgi.edu/cregg/index.htm). Several proteins synthesized in P. pastoris are in use, or are being tested for use, as human pharmaceuticals in clinical trials. Human serum albumin (HSA), a serum replacement product marketed under the name ALBREC by Mitsubishi Pharmaceuticals, has passed clinical trials, and has received NDA approval in Japan [53]; http://www.mpharma.co.jp/e/ir/ar2002/pdfs/ar2002.pdf). The company is currently awaiting approval of its HSA manufacturing faculty. Another protein, hepatitis B surface antigen, is on the market as a subunit vaccine against the hepatitis B virus in South
5 Appendix
America. The angiogenesis inhibitors endostatin and angiostatin are in clinical trials (BKL Sim, unpublished results). 5 Acknowledgments
The preparation of this chapter was supported in part by a grant from US Department of Energy (DE-FG02-03ER15407). Appendix Table A.1 Common P. pastoris expression host strains
Strain name Y-11430 X-33 GS115 KM71
Genotype
GS241 MS105 SMD1168
his4 aox1::SARG4 his4 arg4 aox1::SARG4 aox2::Phis4 his4 arg4 fld1 f ld1 his4 pep4 his4
SMD1165
prb1 his4
SMD1163
pep4 prb1 his4
JC220-JC308b
ade1 arg4 his4 ura3
MC100–3
a b
Phenotype
Reference
Wild-type Wild-type Mut+ His− Muts His−
NRRLa Invitrogen [11] Cregg and Madden [11]
Mut− His−
[10]
Mut− Mta− Mut− Mta− His− Mut+ His− Protease- deficient Mut+ His− Protease deficient Mut+ His− Protease deficient Ade− Arg− His− Ura−
Sunga and Cregg [25] Sunga and Cregg [25] Invitrogen Higgins and Cregg [3] Higgins and Cregg [3] Higgins and Cregg [3] Cereghino et al. [20]
Northern Regional Research Laboratories, Peoria, IL, USA. Strain set with all possible combinations of these four auxotrophic mutations.
Table A.2 Common P. pastoris expression vectors
Vector name
Selectable markers
Intracellular pHIL-D2
HIS4
pA08015
HIS4
Features
Reference
NotI sites for AOX1 gene replacement Expression cassette bounded by BamHI and BglII sites for generation of multicopy expression vector
Invitrogen Invitrogen Thill et al., [31]
15
16 Protein Production in Pichia pastoris Table A.2 Common P. pastoris expression vectors (Continued)
Vector name
Selectable markers
pPIC3.5K
HIS4 and kar r
pPICZ
bler
pGAPZ
bler
pJL-IX
FLD1
pBLHIS-IX pBLAR3-IX pBLADE-IX pBLURA-IX Secretion pHIL-S1
HIS4 ARG4 ADE1 URA3 HIS4
pPIC9K
HIS4 and kanr
pPICZα
bler
Features
Reference
Multiple cloning sites for insertion of foreign genes; G418 selection Multiple cloning sites for insertion of foreign genes; Zeocin selection for multicopy strains; potential for fusion of foreign protein to His6 and myc epitope tags Expression controlled by constitutive GAPp; multiple cloning site or insertion of foreign genes; Zeocin selection for multicopy strains; potential for fusion of foreign protein HIS6 and myc epitope tags PAOX1 with FLD1 gene as selectable marker PAOX1 vector series with one of four bio-synthetic gene markers (HIS4, ARG4, ADE1 or URA3) PAOX1 fused to PHO1 secretion signal; XhoI, EcoRI, and BamHI sites available for insertion of foreign genes PAOX1 fused to α-MF pre-pro signal sequence; XhoI (not unique), EcoRI, NotI, SnaBI and AvrII sites available for insertion of foreign genes; G418 selection for multicopy strains PAOX1 fused to α-MF pre-pro signal sequence; multiple cloning site for insertion of foreign genes; Zeocin selection for multicopy strains; potential for fusion of foreign protein to HIS6 and myc epitope tags
Invitrogen
Invitrogen Miles et al. (1998)
Invitrogen Waterham et al., [15]
Sunga and Cregg [25] Cereghino et al. [20] http://faculty.kgi .edu/cr Invitrogen
Invitrogen Scorer et al. [22]
Invitrogen
5 Appendix Table A.2 Common P. pastoris expression vectors (Continued)
Vector name
Selectable markers
pGAPZα
bler
pJL-SX
FLD1
pBLHIS-SX pBLARG-SX pBLADE-SX pBLURA-SX
HIS4 ARG4 ADE1 URA3
Features
Reference
Expression controlled by constitutive PGAP ; PGAP fused to α-MF pre-pro signal sequence; multiple cloning site for insertion of foreign genes; Zeocin selection for multicopy strains; potential for fusion of foreign protein to HIS6 and myc epitope tags PAOXI fused to α-MF pre-pro signal sequence; FLD1 gene as selectable marker Series of vectors with PAOX1 fused to α-MF pre-pro signal sequence and H1S4, ARG4, ADE1, or URA3 genes as selectable marker
Invitrogen Waterham et al. [15]
Table A.3 List of Genes
Gene
Encoded gene product
ADE1 ARG4 AOX1 AOX2 FLD1 GAP HIS4 KEX1 MFα1 OCH1 PEP4 PEX8 PHO1 URA3 URA5 YPT1
amidoimidazole succinocarboxamide synthase argininosuccinate lyase alcohol oxidase I alcohol oxidase II formaldehyde dehydrogenase glyceraldehyde-3-phosphate dehydrogenase histidinol dehydrogenase carboxy-terminal protease α mating factor (S. cerevisiae) α-1,6-mannosyltransferase vacuolar protease peroxisomal biogenesis protein acid phosphatase orotidine-5 -phosphate decarboxylase orotate-phosphoribosyl transferase GTPase
Sunga and Gregg [25] Cereghino et al. [20]; httpa/faculty.kqi .edu/ci
17
18 Protein Production in Pichia pastoris
References 1 ELLIS S. B., BRUST P. F., KOUTZ P. J., WATERS A. F., HARPOLD M. M., GINGERAS T. R. (1985) Isolation of alcohol oxidase and two other methanol regulatable genes from the yeast Pichia pastoris. Mol Cell Biol 5: 1111–1121 2 CREGG J. M., LIN CEREGHINO J., SHI J., HIGGINS D. R. (2000) Recombinant protein expression in Pichia pastoris. Mol Biotechnol 16: 23–52 3 HIGGINS D. R., CREGG J. M. (1998) Methods in Molecular Biology: Pichia Protocols, Volume 103, Humana Press, Totowa, NJ 4 OGATA K., NISHIKAWA H., OHSUGI M. (1969) A yeast capable of utilizing methanol. Agric Biol Chem 33: 1519–1520 5 VEENHUIS M., VAN DIJKEN J. P., HARDER W. (1983) The significance of peroxisomes in the metabolism of one-carbon compounds in yeast. Adv Microb Physiol 24: 1–82 6 WEGNER G. (1990) Emerging applications of the methylotrophic yeasts. FEMS Microbiol Rev 7: 279–283 7 EGLI T., van DIJKEN J. P., VEENHUIS M., HARDER W., FIECHTER A. (1980) Methanol metabolism in yeasts: regulation of the synthesis of catabolic enzymes. Arch Microbiol 124: 115–121 8 COUDERC R. and BARATTI J. (1980) Oxidation of methanol by the yeast Pichia pastoris: purification and properties of alcohol oxidase. Agric Biol Chem 44: 2279–2289 9 TSCHOPP J. F., BRUST P. F., CREGG J. M., STILLMAN C. A., GINGERAS T. R. (1987) Expression of the LacZ gene from two methanol-regulated promoters in Pichia pastoris. Nucleic Acids Res 15: 3859– 3876 10 CREGG J. M., MADDEN K. R., BARRINGER K. J., THILL G. P., STILLMAN C. A. (1989) Functional characterization of the two alcohol oxidase genes from the yeast Pichia pastoris. Mol Cell Biol 9: 1316–1323 11 CREGG J. M., MADDEN K. R. (1987) Development of yeast transformation systems and construction of methanol-utilization-defective mutants of Pichia pastoris by gene disruption. In: Biological Research on Industrial Yeasts
12
13
14
15
16
17
18
19
(STEWART G. G., RUSSELL I., KLEIN R. D. and HIEBSCH R. R., Eds), Vol. 2, CRC Press, Boca Raton, FL, pp 1–18 CREGG J. M. (1999) Expression in the methylotrophic yeast Pichia pastoris. In: Gene Expression Systems: Using Nature for the Art of Expression (FERNANDEZ J. M. and HOEFFLER J. P., Eds), Academic Press, San Diego, CA, pp 157–191 LIN CEREGHINO J., CREGG J. M. (2000) Heterologous protein expression in the methylotrophic yeast Pichia pastoris. FEMS Microbiol Rev 24: 45–66 CREGG J. M., TSCHOPP J. F., STILLMAN C., SIEGEL R., AKONG M., CRAIG W. S., BUCKHOLZ R. G., MADDEN K. R., KELLARIS P. A., DAVIS G. R., SMILEY B. L., CRUZE J., TORREGROSSA R., VELICELEBI G., THILL G. P. (1987) High level expression and efficient assembly of hepatitis B surface antigen in the methylotrophic yeast, Pichia pastoris. Bio/Technology 5: 479–485 WATERHAM H. R., DIGAN M. E., KOUTZ P. J., LAIR S. V., CREGG J. M. (1997) Isolation of the Pichia pastoris glyceraldehyde-3phosphate dehydrogenase gene and regulation and use of its promoter. Gene 186: 37–44 SHEN S., SULTER G., JEFFRIES T. W., CREGG J. M. (1998) A strong nitrogen source-regulated promoter for controlled expression of foreign genes in the yeast Pichia pastoris. Gene 216: 93–102 LIU H., TAN X., RUSSELL K. A., VEENHUIS M., CREGG J. M. (1995) PER3, a gene required for peroxisome biogenesis in Pichia pastoris, encodes a peroxisomal membrane protein involved in protein import. J Biol Chem 270: 10940–10951 SEARS I. B., O’CONNOR J., ROSSANESE O. W., GLICK B. S. (1998) A versatile set of vectors for constitutive and regulated gene expression in Pichia pastoris. Yeast 14: 783–790 VUORELA A., MYLLYHARJU J., NISSI R., PIHLAJANIEMI T., KIVIRIKKO K. I. (1997) Assembly of human prolyl 4-hydroxylase and type III collagen in the yeast Pichia pastoris: formation of a stable enzyme tetramer requires coexpression with collagen and assembly of a stable collagen
References
20
21
22
23
24
25
26
27
28
requires coexpression with prolyl 4-hydroxylase. EMBO J 16: 6702–6712 CEREGHINO G. L., LIM M., JOHNSON M. A., CEREGHINO J. L., SUNGA A. J., RAGHAVAN D., GLEESON M., CREGG J. M. (2001) New selectable marker/auxotrophic host strain combinations for molecular genetic manipulation of Pichia pastoris. Gene 263: 159–169 NETT J. H., GERNGROSS T. U. (2003) Cloning and disruption of the PpURA5 gene and construction of a set of integration vectors for the stable genetic modification of Pichia pastoris. Yeast 20: 1279–1290 SCORER C. A., CLARE J. J., MCCOMBIE W. R., ROMANOS M. A., SREEKRISHNA K. (1994) Rapid selection using G418 of high copy number transformants of Pichia pastoris for high-level foreign gene expression. Bio/Technology 12: 181–184 HIGGINS D. R., BUSSER K., COMISKEY J., WHITTIER P. S., PURCELL T. J., HOEFFLER J. P. (1998) Small vectors for expression based on dominant drug resistance with direct multicopy selection. Methods Mol Biol 103: 41–53 SODERHOLM J., BEVIS B. J., GLICK B. S. (2001) Vector for pop-in/pop-out gene replacement in Pichia pastoris. Biotechniques 31: 306–310, 312 SUNGA A. J., CREGG J. M. (2004) The Pichia pastoris formaldehyde dehydrogenase gene (FLD1) as a marker for selection of multicopy expression strains of P. pastoris. Gene 330: 39–47 CHIRUVOLU V., CREGG J. M., MEAGHER M. M. (1997) Recombinant protein production in an alcohol oxidase-defective strain of Pichia pastoris in fed-batch fermentations. Enzyme Microb Technol 21: 277–283 WHITE C. E., HUNTER M. J., MEININGER D. P., WHITE L. R., KOMIVES E. A. (1995) Large-scale expression, purification and characterization of small fragments of thrombomodulin: the roles of the sixth domain and of methionine 388. Protein Eng 8: 1177–1187 BRIERLEY R. A. (1998) Secretion of recombinant human insulin-like growth factor I (IGF-1). Methods Mol Biol 103: 149–177
29 BOEHM T., PIRIE-SHEPARD S., TRINH L. B., SHILOACH J., FOLKMAN, J. (1999) Disruption of the KEX1 gene in Pichia pastoris allows expression of full-length murine and human endostatin. Yeast 15: 563–567 30 CREGG J. M., BARRINGER K. J., HESSLER A. Y., MADDEN K. R. (1985) Pichia pastoris as a host system for transformations. Mol Cell Biol 5: 3376–3385 31 THILL G. P., DAVIS G. R., STILLMAN C., HOLTZ G., BRIERLEY R., ENGEL M., BUCKHOLTZ R., KENNEY J., PROVOW S., VEDVICK T., SIEGEL R. S. (1990) Positive and negative effects of multicopy integrated expression vectors on protein expression in Pichia pastoris. In: Proceedings of the Sixth International Symposium on the Genetics of Microorganisms (HESLOT H., DAVIES J., FLORENT J., BOBICHON L., DURAND G. and PENASSE L., Eds), Vol. 2, Soci´et´e Franc¸aise de Microbiologie, Paris, pp 477–490 32 CLARE J., SREEKRISHNA K., ROMANOS M. (1998) Expression of tetanus toxin fragment C. Methods Mol Biol 103: 193–208 33 STRATTON J., CHIRUVOLU V., MEAGHER M. (1998) High cell-density fermentation. Methods Mol Biol 103: 107–120 34 KURJAN J., HERSKOWITZ I. (1982) Structure of a yeast pheromone gene (MF alpha): a putative alpha-factor precursor contains four tandem copies of mature alpha-factor. Cell 30: 933–943 35 BRAKE A. J., MERRYWEATHER J. P., COIT D. G., HEBERLEIN U. A., MASIARZ F. R., MULLENBACH G. T., URDEA M. S., VALENZUELA P., BARR P. J. (1984) Alpha-factor-directed synthesis and secretion of mature foreign proteins in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 81: 4642–4646 36 BROCCA S., SCHMIDT-DANNERT C., LOTTI M., ALBERGHINA L., SCHMID R. D. (1998) Design, total synthesis, and functional overexpression of the Candida rugosa lip1 gene coding for a major industrial lipase. Protein Sci 7: 1415–1422 37 MART´INEZ-RUIZ A., MART´INEZ DEL POZO A., ˜ O J. M., O˜ LACADENA J., MANCHEN NADERRA M., LOPEZ ´ -OTIN C., GAVILANES J. G. (1998) Secretion of recombinant pro- and
19
20 Protein Production in Pichia pastoris
38
39
40
41
42
43
44
45
mature fungal α-sarcin ribotoxin by the methylotrophic yeast Pichia pastoris: the Lys–Arg motif is required for maturation. Protein Expr Purif 12: 315–322 KJELDSEN T., PETTERSSON A. F., HACH M. (1999) Secretory expression and characterization of insulin in Pichia pastoris. Biotechnol Appl Biochem 29: 79–86 RAEMAEKERS R. J. M., de MURO L., GATEHOUSE J. A., FORDHAM-SKELTON A. P. (1999) Functional phytohaemagglutinin (PHA) and Galanthus nivalis agglutinin (GNA) expressed in Pichia pastoris: correct N-terminal processing and secretion of heterologous proteins expressed using the PHA-E signal peptide. Eur J Biochem 265: 394–403 TSCHOPP J. F., SVERLOW G., KOSSON R., CRAIG W., GRINNA L. (1987b) High level secretion of glycosylated invertase in the methylotrophic yeast, Pichia pastoris. Bio/Technology 5: 1305–1308 GOOCHEE C. F., GRAMER M. J., ANDERSEN D. C., BAHR J. B., RASMUSSEN J. R. (1991) The oligosaccharides of glycoproteins: bioprocess factors affecting oligosaccharide structure and their effect on glycoprotein properties. Bio/Technology 9: 1347–1355 TRIMBLE R. B., ATKINSON P. H., TSCHOPP J. F., TOWNSEND R. R., MALEY F. (1991) Structure of oligosaccharides on Saccharomyces SUC2 invertase secreted by the methylotrophic yeast Pichia pastoris. J Biol Chem 266: 22807–22817 MIELE R. G., CASTELLINO F. J., BRETTHAUER R. K. (1997) Characterization of the acidic oligosaccharides assembled on the Pichia pastoris-expressed recombinant kringle 2 domain of human tissue-type plasminogen activator. Biotechnol Appl Biochem 26: 79–83 HEIMO H., PALMU K., SUOMINEN I. (1997) Expression in Pichia pastoris and purification of Aspergillus awamori glucoamylase catalytic domain. Protein Expr Purif 10: 70–79 JUGE N., ANDERSEN J. S., TULL D., ROEPSTORFF P., SVENSSON B. (1996) Overexpression, purification, and characterization of recombinant barley alpha-amylases 1 and 2 secreted by the
46
47
48
49
50
51
52
53
methylotrophic yeast Pichia pastoris. Protein Expr Purif 8: 204–214 TSUJIKAWA M., OKABAYASHI K., MORITA M., TANABE T. (1996) Secretion of a variant of human single-chain urokinase-type plasminogen activator without an N-glycosylation site in the methylotrophic yeast, Pichia pastoris and characterization of the secreted product. Yeast 12: 541–553 DUMAN J. G., MIELE R. G., LIANG H., GRELLA D. K., SIM K. L., CASTELLINO F. J., BRETTHAUER R. K. (1998) O-Mannosylation of Pichia pastoris cellular and recombinant proteins. Biotechnol Appl Biochem 28: 39–45 MONTESINO R., GARCIA R., QUINTERO O. and CREMATA J. A. (1998) Variation in N-linked oligosaccharide structures on heterologous proteins secreted by the methylotrophic yeast Pichia pastoris. Protein Expr Purif 14: 197–207 ROMANOS M. A., SCORER C. A., CLARE J. J. (1992) Foreign gene expression in yeast: a review. Yeast 8: 423–488 CALLEWAERT N., LAROY W., CADIRIGI H., GEYSENS S., SAELENS X., MIN JOU W., CONTRERAS R. (2001) Use of HDEL-tagged Trichoderma reesei mannosyl oligosaccharide 1,2-α-D-mannosidase for N-glycan engineering in Pichia pastoris. FEBS Lett 503: 173–178 CHOI B.-K., BOBROWICZ P., DAVIDSON R. C., HAMILTON S. R., KUNG D. H., LI H., MIELE R. G., NETT J. H., WILDT S., GERNGROSS T. U. (2003) Use of combinatorial genetic libraries to humanize N-linked glycosylation in the yeast Pichia pastoris. Proc Natl Acad Sci USA 100: 5022–5027 HAMILTON S. R., BOBROWICZ P., BOBROWICZ B., DAVIDSON R. C., LI H., MITCHELL T., NETT J. H., RAUSCH S., STADHEIM T. A., WISCHNEWSKI H., WILDT S., GERNGROSS T. U. (2003) Production of complex human glycoproteins in yeast. Science 301: 1244–1246 OHTANI W., NAWA Y., TAKESHIMA K., KAMURO H., KOBAYASHI K., OHMURA T. (1998) Physico-chemical and immunochemical properties of recombinant human serum albumin from Pichia pastoris. Anal Biochem 256: 56–62
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
1
Protein Production in Mammalian Cells Volker Sandig, Thomas Rose, Karsten Winkler, and Rene Brecht ProBiogen AG, Berlin, Germany
Originally published in: Production of Recombinant Proteins. Edited by Gerd Gellissen. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31036-4
1 Why Use Mammalian Cells for Heterologous Gene Expression?
Mammalian cells constitute a demanding system for the production of heterologous proteins. The need for specialized media and sufficient oxygen supply, low cell densities and slow growth kinetics, and a high sensitivity of the cells to mechanical stress are obstacles which must be overcome in routine fermentation. Furthermore, mammalian cells are potential targets for adventitious viral agents, and processes based on such cells must be rigorously monitored. Despite these difficulties, mammalian cells are the preferred production system for the synthesis of glycoproteins intended for administration to humans. In 2004, mammalian cellbased therapeutic proteins are expected to reach a market share of 59%, followed by 27% for E. coli-based products (Source: Datamonitor). The ease of production in bacterial systems must be counterbalanced against the need to dissolve and renature misfolded, aggregated, insoluble proteins. In contrast, the chaperone system in mammalian cells ensures that proteins are secreted in correctly folded form. Whereas eukaryotic microbial systems such as yeasts are also capable of modifying recombinant translation products by proteolytic processing of precursor proteins, formation of disulfide bridges and phosphorylation, only mammalian cells are able to glycosylate proteins in the patterns characteristic of higher eukaryotes, yielding products that are identical to their natural human counterparts. Although other modifications such as the attachment of lipids may have an impact on heterologous protein production in the near future, we focus in this chapter on authentic N- and O-linked glycosylation as the most important capability of mammalian cells that distinguishes them from other expression systems. The significance of correct glycosylation is illustrated by the following examples: All antibodies are glycosylated at conserved positions in their constant regions, and Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
2 Protein Production in Mammalian Cells
the presence of these carbohydrate moieties is essential for pharmaceutical efficacy. The structure of the carbohydrate affects complement activation and binding to the Fc receptor [1]. In addition, glycosylation at or near the antigen binding site may have a dramatic influence on affinity. Human plasma proteins may be rendered antigenic upon exposure of epitopes that are normally masked by a carbohydrate chain. O-linked oligosaccharide chains (attached to serine and threonine hydroxyl groups) can present multivalent antigenic or functional determinants for antibody recognition and cell adhesion [2]. Glycosylation patterns are not identical in all mammalian cells, but depend on the tissue type and species of origin, and on cell culture conditions [3]. Again, these differences can be of functional significance. For example the alpha 1 → 3 gal epitope is absent from normal human cells but is present in human tumor cells, as well as in proteins derived from mice and yeast. This epitope is recognized by pre-existing antibodies, which mediate the rapid clearance of proteins that display it [4].
2 Mammalian Cell Lines for Protein Production
One of the reasons why one might wish to produce proteins via heterologous gene expression is that a valuable protein of interest is normally secreted from its natural cellular source in very small amounts. Therefore, a recombinant approach is taken. A cDNA of the respective coding sequence is inserted between a strong promoter and a polyadenylation sequence contained on an expression vector, and transferred into a suitable cell line via transfection. Although almost any human or animal cell line would be able to generate the protein – provided that its growth properties meet fermentation requirements – therapeutic proteins are preferentially produced in CHO (Chinese hamster ovary), BHK (baby hamster kidney) and NSO cells (mouse myeloma) and Sp2/0 (myeloma) cells (Table 1). More recently, special human designer cell lines made in vitro from primary cells by immortalization with defined oncogens, have become attractive candidates for glycoprotein production [5]. HEK 293 derived from human embryonic kidney cells by transformation with E1 genes of Adenovirus 5 are used extensively for transient protein production due to their highly efficient DNA uptake system. The PER.C6 cell line – a better characterized line derived from human retinoblasts using the same transforming gene – is also intended for pharmaceutical application [6]. Extremely high cell densities can be achieved in batch processes (well over 107 cells mL−1 ), allowing accumulation of a recombinant product to high titers. A second potential advantage of these cells is that they glycosylate target proteins in a precisely determined manner, which is characteristic of human glycoproteins. Other cell lines are currently under development that promise to be particularly suited for efficient protein secretion. The vector and the cell line form an integrated system designed to produce maximal yields of a given protein. However, even when applying an optimal combination of vector and cell line, the productivity can vary greatly between individual clones. Furthermore, productivity depends crucially on the precise design of the
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
3 Mammalian Expression Systems Table 1 Mammalian cell lines used for glycoprotein production
Cell line
Genotype
CHO K1
parental/no defect
Selection
Source
Reference
GS
DSMZ (ACC110)
Puck (1965)
Antibiotic resistance markers (no dhfr)
ATCC
CHO DXB11 dhfr-
dhfr
DSMZ (ACC126)
[28]
CHO DG44
dhfr-
dhfr
Chasin, LA
[29]
NS0
no defect; very low GS GS
[30]
SP2/0
BHK
hybrid BALB/c spleen + P3X63AG8 no defect
with GS system: Lonza DSMZ (ACC146)
HEK 293
no defect
293T
with SV40T
PER.C6
no defect
Antibiotic resistance markers Antibiotic resistance markers Antibiotic resistance markers Transient expression of SV40ori plasmid Antibiotic resistance markers (neo)
[43]
ATCC (CLL10) ATCC) (CRL1773) [44] ATCC (CRL11268) [45]
Crucell
[6]
fermentation process. Therefore, special emphasis must be placed on the careful screening of potential production lines and on optimization of the medium and growth conditions for the subsequent fermentation process.
3 Mammalian Expression Systems 3.1 Design of the Basic Expression Unit
To allow for transcription, the cDNA encoding the protein of interest must be linked to a promoter that controls its expression. Such an element is taken from a highly expressed viral or cellular gene. The “promoter” is a stretch of DNA immediately upstream of a transcription initiation site. The promoter binds factors of the transcription machinery and factors that recruit co-activators for the removal of core histones, thus giving the RNA polymerase access to the template and allowing it to bind. This process is supported by more distantly located enhancers which act in a distance- and orientation-independent manner [7]. From the wide range of promoter elements available, only a very few are routinely used: The SV40 early promoter, the Rous sarcoma virus (RSV) long terminal repeat, and the human cytomegalovirus (CMV) immediate early promoters are the most prominent virus-derived examples. Mammalian cell-derived promoters are often
3
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
4 Protein Production in Mammalian Cells
intrinsically less efficient, but on the other hand they may also be less susceptible to inactivation by DNA methylation, thus permitting stable long-term expression [8]. In the majority of expression vectors currently in use, the immediate early CMV promoter/enhancer is employed, and several versions that differ in strength are available. Some versions included in commercial vectors lack upstream inhibitory elements such as the YY1 binding site [9], but also the strong activating downstream elements and even the natural transcription start site (pcDNA3; Invitrogen). Surprisingly, an extremely short version extending from −299 to 72 has exceptionally high activity (unpublished observation by the authors). Alternative promoters such as the mouse CMV promoter and the EF1 alpha promoter have been shown to be active in CHO cells when used alone [10], or even more active when combined [11]. The human ubiquitin C promoter, the human initiation factor 4A1 promoter, and the chicken beta actin promoter as a non-mammalian example provide additional alternatives [12]. Promoter strength reflects the level of mRNA synthesis, but this not the only factor that determines the amount of protein obtained. Further determinants include the steady-state level of the mRNA, its transport to the cytoplasma, and its translatability. In cases where production is sub-optimal, the effects of re-introduction of introns into the cDNA can be assessed. In some cases this resulted in truncation of transcripts, since splicing commences from the 3 -end of a transcripts and a cryptic splice sequence within the cDNA could be used for processing instead of the expected donor. This problem can be eliminated by placing such introns 5 to the cDNA. The definition of an appropriate polyadenylation signal is of similar importance for efficient gene expression. Sources for suitable elements are the SV40 late transcript, and the transcripts for bovine growth hormone and HSV tk genes [13]. In contrast, a large fraction of transcripts remains without a polyadenylated tail when the SV40 early poly(A) signal (frequently included in commercially available expression vectors) is used. For efficient translation, the start codon of the gene to be expressed should be embedded within a nucleotide sequence that conforms to the Kozak rules [14]. Although in a large number of genes the main start site is preceded by AUG codons [15], upstream AUGs – as well as hairpins close to the AUG – should be avoided by careful vector design. Their presence can result in a 100-fold reduction in protein yield. 3.2 Transient Expression and Episomal Vectors: Alternatives to Stable Integration
Typically, it takes between several months and a full year to establish a recombinant mammalian production cell line, and such developments require a large number of tedious and labor-intensive steps. However, it is desirable to be able to produce a limited amount of a protein of choice in a much shorter time frame when new product candidates are to be assessed, or a screening system has to be established. For required amounts of 10 mg to 1 g, mammalian cell systems offer
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
3 Mammalian Expression Systems
the option of a shortcut, namely transient expression. Using especially efficient transfection techniques, almost 100% of the recipient cells of some cell lines (e.g., HEK293 or CHO) can be induced to take up vector DNA. Some techniques, such as a modified version of the traditional calcium phosphate co-precipitation technique [16], or more recent approaches using polyethylenimine [17] or dendrimers [18], are suitable for suspension cells grown in bioreactors. The transfected DNA is transported to the nucleus where it is maintained for a few days. Unless it has any replication and nuclear retention signals, the foreign DNA is rapidly lost and expression consequently declines. The useful lifetime of such systems can be extended by using virus-based vectors (SV40, BPV, EBV). A large number of DNA copies per cell ensures maximal transcription, and protein processing then becomes the rate-limiting step in production. In the case of SV40, extrachromosomal replication can result in more than 100 000 DNA copies per cell. Due to their extremely high productivity, these cells eventually die because an appropriate energy balance cannot be maintained. In the case of BPV, attainable copy numbers are lower and expression persists for a longer time. All of the systems mentioned require the presence of a viral protein (SV40 large T antigen, BPV E1/E2, EBNA1, respectively) and a viral origin of replication. In the case of SV40 and BPV, replication origins are activated by the helicase activity of the respective viral protein, and replication is uncoupled from that of the chromosomal DNA. In contrast, EBNA1 appears to lack the enzymatic activities characteristic of ori-binding proteins. EBV-based plasmids replicate once per cell cycle, and expression is maintained for even longer. These proteins of all three viruses have one feature in common: they attach vector DNA to the nuclear matrix. This attachment is required for retention in the nucleus; the centromere functions that are lacking in a vector are substituted by association with host nuclear matrix [19]. For chromosomal DNA, attachment is mediated by MAR elements (AT-rich sequences of bent DNA). It is not surprising therefore that MAR elements can also elicit binding of viral sequences. Accordingly, it has been shown that vectors equipped with a specific assembly of replication origin, MAR element and properly terminated transcription units have all functional elements required for episomal replication over 100 passages [20]. This is not observed for EBV-based vectors, such as REP 4 and REP 8 (Invitrogen), which require permanent selection pressure for long-term stability. The recently described MAR element located adjacent to oriP in EBV may permit us to overcome these limitations [21]. Such episomal vectors may in future represent the preferred option, not only for short-term transient expression but also for the generation of stable cell lines, provided that some open issues can be successfully addressed. Such vectors offer several obvious advantages. Almost all recipient cells will stably maintain the vector. There is no requirement for (inefficient) chromosomal integration relying on cellular repair mechanisms. Moreover, episomal DNA is assumed to be not affected by cellular defense mechanisms that would silence expression, as it is often observed with integrated DNA. For the time being, traditional episomal vectors grown in HEK293 cells carrying the respective genes (293EBNA, 293T) will retain their importance as short- and medium-term mammalian expression systems.
5
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
6 Protein Production in Mammalian Cells
Quite different DNA-RNA replication systems for intermediate and long-term expression have been defined which do not require DNA integration. These systems are based on RNA viruses such as alphaviruses. DNA vectors are introduced into the cell and transcribed as described for the previous vectors. However, in addition to the open reading frame (ORE) for the protein of interest, such a vector harbors the gene for an RNA-dependent RNA polymerase. This polymerase first allows for replication of the transcript via a minus strand and a sub-genomic plus strand RNA. The plus strand serves as a template for translation of the gene of interest. This system provides a stable level of RNA that can also be regulated via temperature shifts and is compatible with the physiology of the host cell [22]. This contrasts with the previous systems, in which the newly introduced virus can completely take over the host cell’s metabolism and eventually kill it. 3.3 “Stable” Integration into the Host Genome
The most typical approach for protein production in mammalian cells is the generation of so-called stable cell lines. This term is used to describe an individual cell clone which carries the expression unit embedded at a single or at multiple sites in host chromosome(s). The substrate for integration is a linear DNA molecule with free ends, irrespective of whether the vector is provided as a supercoiled or linearized molecule. These free ends are recognized by the double-strand break repair system and joined to cellular DNA. This process represents a survival strategy that is vital to the maintenance of permanent cell lines because it prevents the loss of essential genes telomeric to a breakpoint. Since integration usually occurs in nonessential DNA – which represents the majority of the mammalian genome – this is unlikely to be detrimental to the cell. Depending on the plasmid load introduced into a cell, integration occurs as a single copy or in the form of concatamers. Calcium phosphate transfection is known to favor the integration of multimeric molecules. The resulting increase in gene dosage may lead to higher expression levels. Hence, the method has traditionally been preferred for the generation of high-level producer cell lines. Alternatively, multiple expression cassettes can be placed on a single vector, and plasmids with up to 12 copies of a complete gene and a single selection marker gene can easily be constructed. The insertion of multiple cassettes at a single position may even be enforced, if the expression cassettes are integrated into a single plasmid using restriction endonucleases (such as SfiI) that recognize interrupted palindromes, and cosmid-based vectors are employed [23]. In contrast to the products of random concatamerization, the expression units on such plasmids are present in head-totail array after transfection, thereby precluding the formation of antisense RNA. However, the increase in expression is never proportional to the copy number, and expression levels can be further reduced by epigenetic modifications. Thus, after extended passaging the expression level may actually be inferior to that attainable with individual inserts. Epigenetic inactivation is induced as a defense mechanism against retroviruses, and is based on recognition of identical DNA repeats of several
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
3 Mammalian Expression Systems
kilobases in length [23, 24]. Therefore, transfection techniques that result in the preferential integration of a single copy, such as electroporation and lipofection, are now preferred. 3.4 Selection Strategies for Mammalian Cells
Although illegitimate recombination (end joining without homology among the participating sequences) is strongly enhanced in permanent cell lines, it still occurs in only a minor fraction of transfected cells. This is not surprising because preexisting double-strand breaks in the genome are required for this process. The identification of the few cells with an insertion requires the co-integration of marker genes and drug selection of the transfected cell pool. Initially, the gene of interest and the selection marker were introduced on separate plasmids provided in a molar ratio of at least 10:1 in favor of the plasmid bearing the product gene. In more recent approaches, both genes are delivered on the same plasmid (Figure 1). This strategy still does not ensure co-expression of both genes because the gene of interest or its expression signals may be inactivated by random double-strand breaks introduced before integration. This is rendered much less likely when both genes are provided as a single expression unit. In this case, both are transcribed from a single promoter and constitute a single RNA species with only one polyadenylation signal. Whereas translation of the upstream gene in this bi-cistronic transcript follows conventional rules (e.g., dependence on a m7G-cap structure at the 5 -end of the RNA and ribosome scanning to the first ATG [25]), the second ORF is inactive unless an IRES element (internal ribosome entry site) is incorporated (Figure 1). IRES elements have been isolated mainly from picornaviruses, wherein the positive-strand RNA genome lacks a cap structure, and translation is dependent upon a non-coding 450-bp fragment of complex secondary structure. However, class II IRES elements, such as those derived from the encephalomycarditis virus (EMCV) and foot and mouth disease virus (FMDV), are generally more efficient and easier to use compared to those of class I (e.g., poliovirus), because they do not require exact positioning of the ATG relative to the element. More recently, a number of elements of similar function have been identified in cellular genes [26, 27] that are normally expressed under conditions when capdependent translation is inhibited. Even when the more efficient elements are used, the expression level of the downstream gene is approximately one order of magnitude lower than that of the upstream gene. Therefore, the gene encoding the selection marker is placed to the downstream position. The tight linkage of the two genes provides a tool for selecting high-level producer clones by increasing the selection pressure. However, the selection conditions must be re-established for each individual construct, because the efficiency of translation of the marker sequence strongly depends on the structure of the upstream gene. Inactivation of the gene of interest is highly unlikely when an IRES is used. However, under more stringent conditions, we have frequently observed cases in which the marker alone
7
Alw44I
ClaI StuI SfiI
SexAI
SVearly promoter
2000
DraIII
SVearly PA
1000
BsrGI NotI XbaI MunI HpaI AflII
Fig. 1 Representative vectors for stable expression. Left: plasmid pEGFPNI (Clonetech): the plasmid contains separate expression units for GFP (with hCMV promoter and SV40 early Poly(A) signal) and neomycin phosphotransferase as selection marker (with Sv40 early promoter and HSVtk Poly(A) signal). The neomycin gene serves as prokaryotic selection marker. Right: pCMVseaphyg (ProBioGen) The plasmid contains a
Tth111I FspI MscI NarI KasI EheI BbeI
BsrDI
neo
3000
4733 bps
pEGFPN1
egfp
4000
PmlI Acc65I KpnI
EMCV IRES SexAI HpaI XbaI SpeI Psp1406I ApaI PspOMI
seap
2000
BstEII
NdeI
AflII HpaI BglII Bsu36I
SapI AhdI
bi-cistronic expression unit controlled by the human CMV promoter. CMV intron A is introduced to facilitate RNA transport and to include downstream promoter elements. The EMCV IRES is positioned between seap and hyg genes to allow expression of hygR. Kan R serves as a prokaryotic marker only.
NdeI RsrII
NdeI SnaBI
hCMV IntronA
8514 bps
tkPoly A hyg
6000
8000
hCMV promoter
NdeI BsrGI SpeI
pCMVseaphyg
Kan R plasmid ori
NotI SalI SanDI PpuMI
SfiI
BbvCI
EcoNI SspI
XhoI NruI
21:36
RsrII
tk polyA
4000
NdeI SnaBI
hCMV promoter
VspI
January 22, 2008
BsaI
EcoO109I
plasmid ori
PciI AflIII
NheI Eco47III BglII XhoI Ecl136II SacI HindIII EcoRI PstI AccI SalI Acc65I KpnI SacII ApaI PspOMI SmaI XmaI BamHI AgeI
c11 (JWBK148-Bommarius) Char Count=
8 10 Protein Production in Mammalian Cells
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
3 Mammalian Expression Systems
is expressed. Indeed, such events can occur as often with such coupled expression plasmids as when selection and product units are located on the same plasmid. Even more important than the linkage between the two genes is the definition of an appropriate selection marker. The npt gene conferring resistance against G418 is an established selection marker for a wide range of eukaryotic cells. However, even very few molecules of its product neomycin phosphotransferase render the cell resistant to G418. Therefore selection of high-level producers is impaired since more stringent conditions cannot be imposed. Indeed, selection approaches based on factors that confer resistance to toxic antibiotics generally suffer from a number of disadvantages: 1. The resistance gene usually acts via an indirect mechanism: it inactivates an antibiotic that typically blocks protein synthesis in treated cells. 2. Inactivated antibiotics must be replenished to maintain the selective pressure, thereby leading to a variable selective concentration. 3. In some cases, temporary selection pressure results in permanent changes as a harmful effect of partial inactivation of antibiotics. 4. Often, a subtraction of untransfected cells can survive under selective conditions in acquiring resistance via an alternative mechanism such as activation of the multidrug resistance gene. The latter phenomenon is very pronounced when puromycin is used for selection of recombinant HEK 293 cells.
These disadvantages have therefore led to the use of auxotrophic markers as a more promising alternative. 3.5 Auxotrophic Selection Markers and Gene Amplification
Two auxotrophic markers have emerged as preferred options for the selection of high-level producers, namely the genes for dihydrofolate reductase (dhfr) and glutamine synthetase (gs). The use of the dhfr gene is restricted to CHO cells, as cell variants have been engineered that lack the corresponding enzyme. These cells depend on low levels of exogenous enzyme for nucleotide synthesis if cultured in the absence of thymidine and hypoxanthine. Two such cell lines were constructed by Chasin and colleagues during the 1980s. Whereas in CHO DXB11 cells [28] one allele is still present but inactivated by point mutation, thus allowing reversion to the dhfr+ phenotype by homologous recombination, in the DG44 mutant [29] dhfr sequences have been completely deleted. Both cell lines are frequently used to generate high-level producer cell lines. In contrast to dhfr, the gene encoding glutamine synthetase [30] acts as a dominant selection marker, because the enzyme content in most mammalian cell lines is too low to ensure an adequate supply of glutamine. Accordingly, the system was initially developed for myeloma cells, such as the NS0, in which endogenous enzyme levels are negligible.
9
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
10 Protein Production in Mammalian Cells
In addition to efficient selection, both systems offer an even more important advantage: selective pressure can be sequentially increased using the specific enzyme inhibitors methotrexate (MTX) for DHFR and methionine sulphoximine (MSX) for GS. The single-copy dhfr or gs gene cannot provide sufficient amounts of the enzyme to counteract the effects of high inhibitor concentrations. This situation selects for mutants that increase gene dosage by intrachromosomal amplification of the selection marker and adjacent sequences. This process is well known from tumor cells, which can acquire resistance to chemotherapeutics by a similar mechanism. According to the breakage-fusion-bridge model [31], this requires the exposure of fragile sites close to the marker gene – double-strand breaks which must be repaired by sister chromatids, and secondary breakage of covalently linked chromatids. In agreement with this model, MTX selection strongly reduces the size of cellular nucleotide pool, resulting in DNA breakage. MTX selection thus results in the accumulation of several hundred copies of the gene, in contrast to MSX selection where lower copy numbers are observed. This extended amplification promotes higher expression levels, but is also a potential cause of instability. Depending on the site of the first breakage, a telomeric fragment may be amplified that is unequally distributed to the daughter cells. As a consequence, the amplified region is rapidly lost in the absence of selection. Therefore, amplification is considered highly unpredictable, and cell lines generated without amplification are currently to be preferred. 3.6 The Integration Locus: a Major Determinant of Expression Level
Despite the presence of identical plasmids in comparable copy numbers, expression levels can vary by two to three orders of magnitude between individual clones. This phenomenon can be attributed to variations in the accessibility of the integrated gene to the transcriptional machinery. Only if the gene sequence is in the open chromatin conformation does it becomes available for transcription. This conformation, in turn, is determined by types of sequence elements, such as matrix attachment sites or chromatin opening elements, and mediated via histone acetylation and the lack of DNA methylation. Loci that display high expression efficiency are extremely rare in the genome (Figure 2). Hence, intensive screening (coupled with stringent selection and tight linkage between the marker and the gene of interest) is required to identify clones that show high productivity. Furthermore, the selection must be repeated for every individual cell line development. A high frequency of illegitimate recombination in permanent mammalian cell lines further impedes the identification of lines resulting from these rare homologous recombination events, and makes targeting as labor-intensive as random screening. The isolation of site-specific recombinases from heterologous sources, and their introduction into mammalian cells, can potentially overcome this problem. Recombinases from bacteriophage P1 (cre) and Saccharomyces cerevisiae (flp) elicit intermolecular strand exchange between the short target sequences (lox and FRT, respectively) in their natural host. For site-specific integration an
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
3 Mammalian Expression Systems
Generation of CHO-Receptor Cell Lines
Yield<60% of best performing clone
medium Producer 1,4% high Producer 0,2% Yield >60% of best performing clone
low Producer 98,4%
Yield <5% of best performing clone
Fig. 2 Distribution between low producers, medium producers and high producers within a pool of resistant cell clones after transfection and selection.
Site-specific recombination
with incompatible Rec. sites
frt
F3
..TTCTTCAAATAGT..
F5
..TTCTTCAAAAGGT..
Fig. 3 Site-specific recombination using a single site versus two heterospecific sites. Different core sequences within the fit recognition sequences (bold) of fit F5 and F3 [34] prevent recombination within the chromosome.
intermolecular reaction is required between a circular plasmid and a chromosomal target site. However, the thermodynamically preferred excision of a fragment flanked by recombinase sites is more often applied for the generation of recombinant mammalian cells and transgenic animals (Figure 3) [32, 33]. In order to achieve efficient integration, three options are available: 1. The use of heterospecific recombinase sites, in which the core sequences are modified to prevent a reaction between these sites (see Figure 3).
11
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
12 Protein Production in Mammalian Cells
2. Restriction of recombinase activity to a short time interval immediately after introduction of the template DNA, when the template concentration is still high (regulated flp expression, protein transfer, RNA transfection). 3. The introduction of an incomplete promoter-less selection marker into the target site, and activation of this marker via the subsequent assembly of a functional promoter as a direct result of the recombination event (promoter trapping).
We have successfully applied techniques (1) and (3) to achieve highly efficient targeting, using, respectively, heterospecific FLP sites [34] and repair of a defective neo gene lacking a start codon by a promoter/ATG combination (Figure 4). In a large-scale screening effort, we identified recipient CHO cell clones that show stable high-level expression of a model gene (Figure 5). While the model gene could be reproducibly replaced by the gene of interest in these clones, we nevertheless observed variations in expression level among clones originating from a single recipient cell. All of these clones harbored a single copy of the gene of interest at the pre-determined locus. The heterogeneity of expression can possibly be attributed to perturbance of the architecture of a previously stable locus caused by epigenetic phenomena. After re-cloning, we were able to identify clones that secreted up to 40 pg of the desired of a recombinant glycoprotein per cell per day. In addition to the integration loci identified by random screening, we also have used the human IgH locus of a human-mouse heterohybridoma for targeted integration. The locus was made accessible by insertion of FRT sites via homologous recombination and targeted by Flp-mediated exchange between heterospecific FRT sites. The best clones secreted 45 pg per cell per day of an IgM antibody. Insertion of multiple genes into this locus typically resulted in cell lines yielding typically between 10 and 25 pg per cell per day.
functionalized cell dhfr
Frt F3
Frt inactive F5 neoR
Testgene
Target gene
producer cell
dhfr
Fig. 4 Position-mediated expression enhancement (PMEE). The functionalized cell is a pre-selected high producer for a test protein. This construct contains a conventional selection marker (white box) and the dhfr gene allowing for gene amplification at a later stage. In addition, it contains an incomplete neo gene lacking a promoter. The cassette to be exchanged is flanked by het-
best locus
prom. ATG
reconstituted neoR
erospecific frt sites (hexagon). To generate the producer cell the test gene is exchanged for the gene of interest via recombination at the frt sites mediated by co-transfection of an tip-expressing plasmid. Recombination reconstitutes a functional neo gene. The procedure ensures rapid and reliable insertion into an optimized locus.
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
4 Mammalian Cell-based Fermentation Processes Start: CHO dhfr-DXB11
Electroporation (test gene-Leptin-Fc) Selection (hyg) Visual screening 1600 clones
Target gene insertion PMEE Selection
N.[µg/mL]
Identification of highest producers
40 30 20 10
Light microscopy image
0
Subclonesof Subclones of SKA9 SKA9
G3
B4
C4
C9
G5
H2
H11
Recloning & Recloning& optimisation
Leptin-Fc clone
Top producer >40pg/cell*day Immune staining: anti-human-IgG-Fc -TexRed
from a single copy gene
Fig. 5 Generation of high producers for multiple genes using a pre-selected clone expressing the gene for a test protein suitable for targeting.
Loci with superior expression characteristics can also be created by specific vector design: thus, specific genomic sequences that locate several kilobases away from enhancers and display features of MAR elements can be inserted into a vector. The inclusion of such elements can significantly reduce the screening effort required for the development of particular lines [35]. Finally, it should be stressed that optimal molecular design and selection of the producer cell line are not the only factors that determine whether an economic cell-based process can be established. The subsequent definition and fine-tuning of media composition and of fermentation parameters can have a comparable impact on the economics of a manufacturing process.
4 Mammalian Cell-based Fermentation Processes
Mammalian cells are the preferred option to produce therapeutics that require authentic glycosylation for activity or for improved distribution and elimination properties in vivo, as pointed out before. The lower productivity per volume of culture space in comparison to bacterial and fungal strains is the major challenge for process development and process design [36].
13
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
14 Protein Production in Mammalian Cells
In addition, reproducibility, long-term stability and compliance with the regulatory requirements are of concern when initiating the development for a mammalian cell-based production process.
4.1 Batch and Fed-batch Fermentation
Fermentation processes in stirred tank reactors (STR) carried out in a batch mode [37] can be scaled up to a 15 000-L scale. Due to the limitations in nutrient supply inherent in this mode of fermentation, cells grow logarithmically and die shortly after reaching maximal density (up to 5 × 106 cells mL−1 ). At a defined end-point, the content of the bioreactor is harvested and the total harvest is subjected to a purification (downstream) process. For the manufacturing of approved biopharmaceuticals, extensive cleaning in place (CIP) and sterilization in place (SIP) procedures are required to prepare the bioreactor for the next production run. As the total protein yield is proportional to the number of viable cells over the time (see area below the gray line in Figure 6), great efforts have been made to extend the effective production time by supplementing the medium with nutrient solution during fermentation. Cell line-specific feeding strategies were developed resulting in highly efficient “fed-batch” processes. In the fed-batch mode, nutrients such as glucose or amino acids are added to the bioreactor at defined culture times after initiating the culture (Figure 7) [37]. Higher cell densities of up to 10 × 107 cells mL−1 are now feasible, and the productivity of the substrate is much higher (up to fivefold) in comparison to the batch fermentation mode.
Fig. 6 Production train of stirred tank reactors; the smaller tanks are required to inoculate the production reactor. Only the reactor with the highest volume is used for production. The schematic diagram on the right illustrates the typical profile of viable cell number (black line) and the amount of protein produced (gray line) over the culture time (x-axis).
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
4 Mammalian Cell-based Fermentation Processes
Feeding
Mess &
Mess &
Fig. 7 Schematic illustration of fed-batch production process with increasing culture volume.
4.2 Continuous Perfusion Fermentation
More recently, variants of continuous production processes have been assessed. These processes require a continuous flow of culture medium through the bioreactor, in addition to cell retention (Figure 8). Cell retention enables higher cell densities (up to 5 × 107 cells mL−1 ) and, when combined with extended culture periods, a substantial increase of volumetric productivity per bioreactor run can be observed. Cell viability can be stably maintained during the complete fermentation run, thereby achieving consistent product quality during the entire fermentation process [38, 39]. The product is also removed from the culture via the continuous flow of medium. This reduces the potential problems are caused by product inhibition, or by limited stability of the product. It furthermore allows a down-sizing of the downstream process and an overall reduction of investments (Figure 9). Despite these advantages, continuous fermentation processes are rarely used in biopharmaceutical production at present. The equipment is more complex and
Mess &
Fig. 8 Schematic illustration of perfusion fermentation (media tank on the left side, harvest tank on the right side).
15
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
Viable Cells
16 Protein Production in Mammalian Cells
Individual harvest bulks
30
Batch API
Individual purified bulks 45 60
Time (d)
Fig. 9 Scheme of batch production with flexible downstream processes in continuous fermentation. API = active pharmaceutical ingredient. The individual purified bulks can be pooled together to get one batch per reactor run.
more difficult to maintain in comparison to batch or fed-batch fermentation. The need for long-term sterility and additional reservoirs and equipment for the media and harvest must also be considered. Likewise, the process development and process validation required are more time-consuming. Several different cell-retention systems are available which exploit the following methods: filtration, acoustic cell retention, sedimentation, centrifugation, and fluidized bed retention [40]. 4.3 Continuous Production with Hollow-fiber Bioreactors
Hollow-fiber bioreactors represent special continuous perfusion systems, characterized by independent medium and harvest streams. The system allows retention of both cells and product in the culture chamber. The separation is realized by using hollow-fiber cartridges with intra- and extracapillary spaces. The system has a cycling pathway that provides a continuous flow of medium through the hollow fibers and the oxygenator (medium circulation), and an inoculation and harvest pathway, that runs through the extracapillary space, also termed the culture space. The mass transfer through the membrane is diffusion-controlled. The use of older types of these bioreactors is associated with problems involving oxygen supply, membrane fouling, homogeneous product harvest, and removal of dead cells and debris. In attempt to avoid these problems, a specific secondgeneration type of bioreactor was developed. Reactors of this type are equipped with an expansion chamber for each pathway, in which gas pressure can be selectively increased (up to 100 mmHg) (Figure 10). This leads to a controlled transmembrane pressure, and provides a much larger mass transfer than that obtained via diffusion. An advantageous mixing effect occurs when this trans-membrane pressure is applied and, as a result, the fluid flow is reversed. The trans-membrane flow can be adjusted as high as sixfold that of the cell culture volume per hour. Because harvesting is independent of the feeding flow, the product concentration is also adjustable.
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
4 Mammalian Cell-based Fermentation Processes Harvest Medium
Medium Harvest
Outflow
Outflow GEX
GEX
Fig. 10 Schematic flow chart of traditional (left) and recent (right) hollow-fiber bioreactor systems. Gas pressure is marked with arrows in the more recent system. (GEX = gas exchange).
Another advantage is the use of disposable culture devices (equipment that is in contact with cells and medium/harvest flow), and therefore, no CIP and SIP procedures need to be applied. This in turn minimizes the risk of process variation and of cross-contamination between different reactor runs. Very high cell densities (>108 cells mL−1 ) are reached in the culture space. As shown in Figure 11, the cells are arranged in a single hollow fiber, and densities are comparable to those in the tissues. Because it is impossible to determine the count of viable cells or to isolate large cell amounts from the bioreactor, the provision of proof of batch-to-batch consistency is a major challenge. As an example, three different production runs for a monoclonal antibody are shown in Figure 12. The production rate was shown to be constant for up to three months, and to be independent of the culture time [41]. Hollow-fiber bioreactors have many advantages over traditional bioreactors. Their only limitation is that the capacity of commercially available bioreactors is so far limited to a culture space of 2000 mL. Therefore, their use is restricted to the production of low-volume biopharmaceuticals and in-vivo diagnostics [42].
Nuclear Stain (H&E) 100µm
Hollow Fibre 10 kDa Cut Off
Fig. 11 Histological preparation of a hollow-fiber cartridge. A homogeneous cell distribution was found throughout the vast majority of the extracapillary space in the modules.
17
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
18 Protein Production in Mammalian Cells
MAB amount [g]
20 15 10 5 0 0
20
40
60
80
100
Culturetime [d] Run 1
Run 2
Run 3
Fig. 12 Three different production runs for a monoclonal antibody with varying process times (♦, 1 month; , 2 months; , 3 months). The runs were performed by using an ASM-Maximizer 500 (100 ml of culture space).
5 Conclusions
Mammalian cells are the system of choice for the production of therapeutic glycoproteins, and in particular of full-length therapeutic antibodies. To make the system efficient, careful screening and selection of the producer cell line and a careful process design are required. Highly versatile vector systems have been defined that allow the production of single or multiple candidate proteins in stable producer lines on small and medium scales. Despite continuous advances being made in basic research, the mechanisms of expression and secretion remain poorly understood. Therefore, any success in identifying a line with superior characteristics remains unpredictable. Currently, new approaches are being taken for standardized cell lines and process development, thereby reducing the time and effort required to establish mammalian cell-based production processes.
Acknowledgments
The authors thank Gerd Gellissen and Michele Großst¨uck for carefully reading the manuscript, and for their suggestions and modifications.
References 1 JEFFERIS R., LUND J. (1997) Glycosylation of antibody molecules: structural and functional significance. Chem Immunol 65: 111–128
2 HOUNSELL E. F., DAVIES M. J., RENOUF D. V. (1996) O-linked protein glycosylation structure and function. Glycoconj J 13: 19–26
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
References 3 WRIGHT A., MORRISON S. L. (1997) Effect of glycosylation on antibody function: implications for genetic engineering. Trends Biotechnol 15: 26–32 4 GOLLOGLY L., CASTRONOVO V. (1996) A possible role for the alpha 1 3 galactosyl epitope and the natural anti-gal antibody in oncogenesis. Neoplasma 43: 285–289 5 JONES D., KROOS N., ANEMA R., VAN MONTFORT B., VOOYS A., VAN DER KRAATS S., VAN DER HELM E., SMITS S., SCHOUTEN J., BROUWER K., LAGERWERF F., VAN BERKEL P., OPSTELTEN D. J., LOGTENBERG T., BOUT A. (2003) High-level expression of recombinant IgG in the human cell line PerC6. Biotechnol Prog 19: 163– 168 6 FALLAUX F. J., BOUT A., VAN DER VELDE I., VAN DEN WOLLENBERG D. J., HEHIR K. M., KEEGAN J., AUGER C., CRAMER S. J., VAN ORMONDT H., VAN DER EB A. J., VALERIO D., HOEBEN R. C. (1998) New helper cells and matched early region 1-deleted adenovirus vectors prevent generation of replication-competent adenoviruses. Hum GeneTher 9: 1909–1917 7 BERK A. J. (1999) Activation of RNA polymerase II transcription. Curr Opin Cell Biol 11: 330–335 8 PROSCH S., STEIN J., STAAK K., LIEBENTHAL C., VOLK H. D., KRUGER D. H. (1996) Inactivation of the very strong HCMV immediate early promoter by DNA CpG methylation in vitro. Biol Chem Hoppe Seyler 377: 195–201 9 LIU R., BAILLIE J., SISSONS J. G., SINCLAIR J. H. (1994) The transcription factor YY1 binds to negative regulatory elements in the human cytomegalovirus major immediate early enhancer/promoter and mediates repression in non-permissive cells. Nucleic Acids Res 22: 2453– 2459 10 ROTONDARO L., MELE A., ROVERA G. (1996) Efficiency of different viral promoters in directing gene expression in mammalian cells: effect of 3 -untranslated sequences. Gene 168: 195–198 11 KIM S. Y., LEE J. H., SHIN H. S., RANG H. J., KIM Y. S. (2002) The human elongation factor 1 alpha (EF-1 alpha) first intron highly enhances expression of foreign genes from the murine
12
13
14
15
16
17
18
19
20
21
22
cytomegalovirus promoter. J Biotechnol 93: 183–187 QUINN C. M., WILES A. P., EL-SHANAWANY T., CATCHPOLE I., ALNADAF T., FORD M. J., GORDON S., GREAVES D. R. (1999) The human eukaryotic initiation factor 4AI gene (EIF4A1) contains multiple regulatory elements that direct high-level reporter gene expression in mammalian cell lines. Genomics 62: 468–476 DENOME R. M., COLE C. N. (1988) Patterns of polyadenylation site selection in gene constructs containing multiple polyadenylation signals. Mol Cell Biol 8: 4829–4839 KOZAK M. (1984) Compilation and analysis of sequences upstream from the translational start site in eukaryotic mRNAs. Nucleic Acids Res 12: 857–872 PERI S., PANDEY A. (2001) A reassessment of the translation initiation codon in vertebrates. Trends Genet 17: 685–687 LINDELL J., GLRARD P., MULLER N., JORDAN M., WURM F. (2004) Calfection: a novel gene transfer method for suspension cells. Biochim Biophys Acta 1676: 155–161 DUROCHER Y., PERRET S., KAMEN A. (2002) High-level and high-throughput recombinant protein production by transient transfection of suspension-growing human 293-EBNA1 cells. Nucleic Acids Res 30: E9 HAENSLER J. and SZOKA F. C., Jr (1993) Polyami-doamine cascade polymers mediate efficient transfection of cells in culture. Bioconjug Chem 4: 372–379 CALOS M. P. (1998) Stability without a centromere. Proc Natl Acad Sci USA 95: 4084–4085 PIECHACZEK C., FETZER C., BAIKER A., BODE J., LIPPS H. J. (1999) A vector based on the SV40 origin of replication and chromosomal S/ MARs replicates episomally in CHO cells. Nucleic Acids Res 27: 426–428 WHITE R. E., WADE-MARTINS R., JAMES M. R. (2001) Sequences adjacent to oriP improve the persistence of Epstein-Barr virus-based episomes in B cells. J Virol 75: 11249–11252 BOORSMA M., NIEBA L., KOLLER D., BACHMANN M. F., BAILEY J. E., RENNER W. A. (2000) A temperature-regulated
19
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
20 Protein Production in Mammalian Cells
23
24
25
26
27
28
29
30
31
replicon-based DNA expression system. Nat Biotechnol 18: 429–432 MONACO L., TAGLIABUE R., SORIA M. R., UHLEN M. (1994) An in vitro amplification approach for the expression of recombinant proteins in mammalian cells. Biotechnol Appl Biochem 20: 157–171 MCBURNEY M. W., MAI T., YANG X., JARDINE K. (2002) Evidence for repeat-induced gene silencing in cultured Mammalian cells: inactivation of tandem repeats of transfected genes. Exp Cell Res 274: 1–8 KOZAK M. (2002) Pushing the limits of the scanning mechanism for initiation of translation. Gene 299:1–34 NANBRU C., LAFON I., AUDIGIER S., GENSAC M. C., VAGNER S., HUEZ G., PRATS A. C. (1997) Alternative translation of the proto-oncogene c-myc by an internal ribosome entry site. J Biol Chem 272: 32061–32066. MILLER D. L., DIBBENS J. A., DAMERT A., RISAU W., VADAS M. A., GOODALL G. J. (1998) The vascular endothelial growth factor mRNA contains an internal ribosome entry site. FEBS Lett 434: 417–420 URLAUB G., CHASIN L. A. (1980) Isolation of Chinese hamster cell mutants deficient in dihydrofolate reductase activity. Proc Natl Acad Sci USA 77: 4216–4220 URLAUB G., MITCHELL P. J., KAS E., CHASIN L. A., FUNANAGE V. L., MYODA T. T., HAMLIN J. (1986) Effect of gamma rays at the dihydrofolate reductase locus: deletions and inversions. Somat Cell Mol Genet 12: 555–566 BEBBINGTON C. R., RENNER G., THOMSON S., KING D., ABRAMS DYARRANTON G. T. (1992) High-level expression of a recombinant antibody from myeloma cells using a glutamine synthetase gene as an amplifiable selectable marker. Biotechnology (NY) 10: 169–175 COQUELLE A., PIPIRAS E., TOLEDO F., BUTTIN G., DEBATISSE M. (1997) Expression of fragile sites triggers intrachromosomal mammalian gene amplification and sets boundaries to early amplicons. Cell 89: 215–225
32 RAY M. K., FAGAN S. P., BRUNICARDI F. C. (2000) The Cre-loxP system: a versatile tool for targeting genes in a cell- and stage-specific manner. Cell Transplant 9: 805–815 33 RODRIGUEZ C. I., BUCHHOLZ F., GALLOWAY J., SEQUERRA R., KASPER J., AYALA R., STEWART A. F., DYMECKI S. M. (2000) High-efficiency deleter mice show that FLPe is an alternative to CreloxP. Nat Genet 25: 139–140 34 SCHLAKE T., BODE J. (1994) Use of mutated FLP recognition target (FRT) sites for the exchange of expression cassettes at defined chromosomal loci. Biochemistry 33: 12746–12751 35 ZAHN-ZABAL M., KOBR M., GIROD P. A., IMHOF M., CHATELLARD P., DE JESUS M. WURM F., MERMOD N. (2001) Development of stable cell lines for production or regulated expression using matrix attachment regions. J Biotechnol 87: 29–42 36 MOLOWA D. T., MAZANET R. (2003) The state of bio-pharmaceutical manufacturing. Biotechnol Annu Rev 9: 285–302 37 NELSON K. L., GEYER S. (1991) Bioreactor and process design for large-scale mammalian cell culture manufacturing. Bioprocess Technol 13: 112–143 38 LULLAU E., KANTTINEN A., HASSEL J., BERG M., HAAG-ALVARSSON A., CEDERBRANT K., GREEN-BERG B., FENCE C., SCHWEIKART F. (2003) Comparison of batch and perfusion culture in combination with pilot-scale expanded bed purification for the production of soluble recombinant beta-secretase. Biotechnol Prog 19: 37–44 39 VOISARD D., MEUWLY F., RUFFIEUX P. A., BAER G., KADOURI A. (2003) Potential of cell retention techniques for large-scale high-density perfusion culture of suspended mammalian cells. Biotechnol Bioeng 82: 751–765 40 CASTILHO L. R., MEDRONHO R. A. (2002) Cell retention devices for suspended-cell perfusion cultures. Adv Biochem Eng Biotechnol 74: 129–169 41 BRECHT R., BUSHNAQ-JOSTING H., ZIETZE S., NUCK R., KLOTH C., SCHWADTKE A., OBERMAYER N., MARX U., KOCH S. (2003) Continuous GMP production in hollow
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
References fibre bioreactors – a viable alternative for the production of low volume biopharmaceuticals. 18th ESACT meeting, Spain 42 CHU L., ROBINSON D. K. (2001) Industrial choices for protein production by large-scale cell culture. Curr Opin Biotechnol 12: 180–187 43 SHULMAN M., WILDE C. D., KOHLER G. (1978) A better cell line for making hybridomas secreting specific antibodies. Nature 276: 269–270
44 GRAHAM F. L., SMILEY J., RUSSELL W. C., NAIRN R. (1977) Characteristics of a human cell line transformed by DNA from human adenovirus type 5. J Gen Virol 36: 59–74 45 SENA-ESTEVES M., SAEKI Y., CAMP S. M., CHIOCCA E. A., BREAKEFIELD X. O. (1999) Single-step conversion of cells to retrovirus vector producers with herpes simplex virus-Epstein–Barr virus hybrid amplicons. J Virol 73: 10426– 10439
21
c11 (JWBK148-Bommarius)
January 22, 2008
21:36
Char Count=
22
1
Protein Production in Aspergillus sojae Margreet Heerikhuisen, Cees Hondel van den, and Peter Punt TNO Nutrition and Food Research, Zeist, The Netherlands
Originally published in: Production of Recombinant Proteins. Edited by Gerd Gellissen. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31036-4
1 Introduction
Among microbial production organisms, filamentous fungi hold a strong position in particular for the production of antimicrobial metabolites such as penicillins and cephalosporins [1] and for the production of hydrolytic enzymes [2]. Bacterial hosts are frequently used for the production of primary and secondary metabolites such as amino acids and vitamins [3]. The use of filamentous fungi for protein production originates from a variety of traditional food processes involving these organisms. Already more than a century ago, fungi were identified as relevant microorganisms in cheese-making and a number of oriental food processes (e.g., soy sauce, tempeh, rice brewing). Based on this early biotechnological knowledge, fungi have been exploited as sources for enzymes beyond these original applications. Nowadays, a multitude of fungal enzymes are available commercially for numerous applications in the food and the feed industry, as well as being used in chemical processes related to detergent, paper and pulp generation and usage (see for example [4]; [5]). With the availability of molecular tools for filamentous fungi, new developments emerged to improve yields in the production of established fungal proteins and to identify new protein products. The new protein products included both, modified proteins with altered or improved characteristics and hitherto poorly or nonaccessible proteins [6]. Moreover, these molecular tools enabled the expression of heterologous proteins from completely unrelated species including mammals. The first reports on fungi-derived mammalian proteins in commercial yields date from more than a decade ago (for a review, see Gouka et al. [6]). Only a relatively small number of fungal species is presently used as protein production hosts [7]). Among these fungi, Aspergillus species play a dominant role. An important reason for this focus stems from the fact that Aspergillus species are Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Protein Production in Aspergillus sojae
highly represented among the strains used for traditional production processes. Consequently, one of the first fungal expression platforms is based on an Aspergillus species [8, 9], thereby taking a clear lead to further developments based on fungal organisms in general. This position is further sustained by the fact that several Aspergillus-derived food additive products had already obtained a GRAS (generally recognized as safe) status from the regulatory authorities, thereby easing its recognition as a safe and reliable expression platform. As with other industrial environments, an extensive intellectual property position was built up by a number of Biotech companies (e.g., DSM, NOVOzymes, Genencor) for particular Aspergillus species. As a consequence – and due to the obvious commercial attraction of fungal expression hosts – several parties set out to identify alternative species and to assess the possibilities of employing the newly identified fungal expression hosts for heterologous gene expression [7]). As a result of this search for new hosts, Aspergillus sojae was identified. This species is related to other Aspergillus species (in particular A. oryzae), and possibly allows the use of existing molecular tools. The products from this species have a history of safe use. In this chapter, the status of the protein production platform based on A. sojae will be described.
2 Taxonomy
Fungal taxonomy is a complex issue, which continues to be subject to changes. When deduced in particular from molecular data, the genus Aspergillus is classified into the Ascomycetes, now combining species with ascomycetous teleomorphs as well as species with no known teleomorphs which were previously classified into the Deuteromycetes. By comparing complete 18S rDNA sequences, Tamura et al. found that the genus Aspergillus consists of three different evolutionary lines [10]. Based on monophyletic taxonomy, three subgenera were suggested that consist of 15 sections in total: Aspergillus, Fumigati, and Nidulantes [11]. A phylogenetic tree of selected Aspergillus species is shown in Figure 1; these include species belonging to the section Flavi, which in turn can be divided into non-aflatoxigenic species such as Aspergillus oryzae and A. sojae, and aflatoxigenic species such as Aspergillus flavus and A. parasiticus. The non-aflatoxigenic species have been widely used in industry for food fermentation or for the production of enzymes. Discrimination between the nonaflatoxigenic species and the aflatoxigenic species is a very important characteristic with respect to their use in food applications, and represents a key criterion for their acceptance by regulatory authorities. A. sojae strains can be distinguished from taxonomically closely related species, such as A. oryzae and A. parasiticus, in a number of ways. Traditionally, the identification of species belonging to the section Flavi was based on several morphological characteristics. However, these species are morphologically very similar, which impedes any clear distinction between them. Based on the high degree of DNA
2 Taxonomy
Fig. 1 Molecular phylogeny of selected Aspergillus species, including species belonging to the section Flavi, based on ITS sequence data.
complementarity, Kurtzman et al. proposed that A. flavus, A. oryzae, A. parasiticus, and A. sojae are varieties of a single species. These authors found that the homology in total DNA hybridizations between A. flavus and A. oryzae is 100%, between A. parasiticus and A. sojae is 91%, and between these groups is 72% [12]. Others state that Aspergillus sojae and A. oryzae are domesticated variants of A. parasiticus and A. flavus, respectively. During the past decade, molecular genetic techniques have been used for the classification of the species in the section Flavi. Reference is here made in particular to the PCR fragments derived from ver-1, aflR and rDNA sequences [13–16, 42]). For example, Yuan et al. [17] have shown that the very closely related A. parasiticus and A. sojae species can be distinguished from each other by random amplification of polymorphic DNA (RAPD) analysis, or by the difference in resistance to bleomycin. In addition, it has been found that Aspergillus oryzae can be distinguished from A. sojae by comparison of the alpA sequences [17]). It has also been found that the A. sojae genome harbors an XmnI restriction site at a specific location within the alpA
3
4 Protein Production in Aspergillus sojae
gene, which is not present in A. oryzae. This provides an additional discrimination tool between the two fungal strains. In our initial host screening, we have analyzed various A. sojae isolates available from the ATCC strain collection for the above-mentioned criteria.
3 The Expression Platform 3.1 Strain Selection
From a collection of 10 ATCC strains deposited as A. sojae (see Appendix Table A1), an initial selection was made for the most appropriate strain to be used as a host for protein production. We screened the strains for the previously described taxonomic criteria. A literature research resulted in the classification for nine out of the 10 strains to be Aspergillus sojae. On the basis of morphological parameters, A. sojae ATCC20235 strain did not meet the requirements to be classified as A. sojae [42]. Based on the alpA restriction fragment length polymorphism (RFLP) which distinguishes A. sojae from A. oryzae, two strains (ATCC20235 and ATCC46250) were re-classified as A. oryzae strains, leaving eight A. sojae strains. Apparently, the deposition of fungal strains does not meet stringent taxonomic classification criteria. An important property of a potentially useful fungal expression systems is the lack of (or a low level of) proteolytic degradation by the expression host. In order to analyze the protease activity of culture fluids of the various A. sojae strains, a milk-clearing assay was performed. The formation of a milk-clearing zone at the periphery of growing colonies – and thus the degradation of milk proteins – is a generally accepted criterion for protease activity [18]. In addition, shake flask culture samples were supplemented with different proteins (e.g., bovine serum albumin) and incubated. The extent of protein degradation was monitored to provide a valid acceptance criterion for suitability of the tested strains as expression hosts for a range of extracellular proteins (Figure 2). Two out of the eight A. sojae strains (ATCC11906 and ATCC20387) exhibited a low proteolytic profile in shake flask cultures. Under preliminary fed-batch fermentation conditions, the A. sojae ATCC11906 strain showed the lowest proteolytic activity and was thus selected for further research. 3.2 Transformation 3.2.1 Protoplasting Two currently used protoplasting protocols – the modified OM-method [19] and the NaCl method [20] – were tested for application to A. sojae. Transformation experiments with the selected A. sojae strains revealed that protoplasting
3 The Expression Platform
Fig. 2 Bovine serum albumin (BSA) degradation after overnight incubation with shake flask culture samples from various A. sojae strains grown in Mineral Medium [30] containing 1% Trusoy (Soya flour; Spillers Premier Products Limited).
efficiencies for A. sojae strains ATCC11906 (and also ATCC20387) were better using the NaCl method. Successful protoplasting was achieved using various commercially available protoplasting enzyme preparations such as Novozym234 (NOVOzymes), Caylase C4 (Cayla), Glucanex (NOVOzymes). 3.2.2 Dominant Selection Markers In general, fungi can be transformed with vectors either containing nutritional markers which complement an auxotrophy or dominant selection (mostly antibiotic resistance) markers. In particular, dominant selection markers can be very helpful as they do not require the availability of specific auxotrophic strains. Therefore, the sensitivity of A. sojae to various selective agents was tested on agar plates with the respective drug additions. The range of agents was restricted to those for which a corresponding fungal selection marker is available. From this analysis it was concluded that hygromycin B, phleomycin, bialaphos and acetamide selection cannot be used for A. sojae, whereas oligomycin C and acrylamide turned out to be useful as selective agents. Other known selective agents such as sulfonylurea and benomyl were not tested. Although A. sojae is able to grow on agar plates containing acetamide as a sole nitrogen source, the amdS gene of A. nidulans can also be used as a dominant selection marker for A. sojae on the basis of acrylamide utilization [9]. Unlike the other dominant markers, the amdS gene is not an antibiotic resistance marker but a dominant nutritional marker, encoding acetamidase, which is an enzyme that confers the ability to use acetamide and acrylamide as nitrogen and carbon sources. For transformation of A. sojae, vector p3SR2 (carrying the amdS marker; Kelly and Hynes [9]) was used in combination with a derivative of the autonomously replicating Aspergillus vector Arp1 [21]. In all Aspergillus species tested so far, this vector resulted in highly increased numbers of (instable) transformants when used as a co-transforming vector. Initial experiments revealed, that even on selective
5
6 Protein Production in Aspergillus sojae
acrylamide plates, a considerable background from non-transformed protoplasts was observed. Selection of primary AmdS+ transformants required around three weeks, and many of the initially selected putative transformants turned out to be false positives, only showing background growth after transfer to fresh selective acrylamide plates. Further optimization of the acrylamide selection was obtained reducing background growth by omitting glucose from the acrylamide selective plates [17]. 3.2.3 Auxotrophic Selection Markers As the transformation frequency was found to be too low for routine transformation when using the amdS marker for selection, an auxotrophic marker system was developed for A. sojae. In general, such a system provides a higher transformation frequency. A commonly used auxotrophic selection method for fungal transformants is based on the usage of orotidine 5-monophosphate decarboxylase (pyrG) mutants. Mattern et al. [22] published the transformation of A. oryzae using the A. niger pyrG gene. The isolation of pyrG mutants is based on direct selection for resistance to fluoro-orotic acid (FOA). This approach has resulted in the isolation of numerous pyrG mutants for a variety of fungi to date. From our experience with a number of different filamentous fungi, the auxotrophic pyrG-based system has many favorable characteristics. Experiments were carried out to obtain A. sojae pyrG mutant strains, using a standard procedure based on direct selection for resistance to FOA on plates containing uridine to support growth of the mutant strain [23]. However, use of this method did not result in A. sojae pyrG mutant strains, as all obtained FOA-resistant strains were able to grow without uridine. However, when using a modified selective FOA medium which contained uracil in addition to uridine, several FOA-resistant mutants were obtained, which were uracil-requiring [17]. Retesting of these strains showed that they were unable to grow on uridinesupplemented minimal medium. Subsequent transformation experiments with some of the uracil-requiring strains showed that these mutants could indeed be complemented with a fungal pyrG gene (vector pAB4.1; see Figure 3). The inability of pyrG mutants to grow in minimal medium supplemented with uridine alone was not observed for the related species A. nidulans, A. niger, A. oryzae, and for various other fungal species. 3.2.4 Re-usable Selection Marker Versatile genetic modification of A. sojae requires the possibility to modify, disrupt, and express a number of different genes in a single fungal strain. Therefore, the availability of a series of different selection markers is essential. However, the possibility of repeated use of the same marker in subsequent experiments can circumvent the need for multiple selection markers. Use of the pyrG gene is very suitable for this purpose, because it allows selection of the pyrG mutant (FOA selection) and the PyrG+ transformant (uracil-less medium). For such an approach, a pyrG marker gene was designed in which the complementing sequence was flanked by a direct repeat sequence originating from the 3 flanking end of the pyrG gene. The construction of a vector containing this type of pyrG marker,
3 The Expression Platform
Fig. 3 The construction of pAB4-1rep.
pAB4-1rep, is detailed in Figure 3. Transformation of A. sojae pyrG mutants with this vector results in a similar number of PyrG+ transformants as with the vector pAB4-1. In gene disruption experiments, the repeat-flanked pyrG fragment has been used for disruption of the gene of interest and subsequent removal of the pyrG selection marker (described in Section 5). 3.3 Promoter Elements
Promoter elements are essential components for the establishment of an expression platform. Therefore, we set out experiments for the identification and isolation of suitable elements for the A. sojae platform. As a result of gene expression studies, highly expressed fungal genes were identified and exploited as sources for efficient promoters. Currently, the most frequently used elements are derived from: (i) the constitutive A. nidulans gpdA gene, encoding glyceraldehyde 3-phosphate dehydrogenase; (ii) the A. niger glucoamylase-encoding gene glaA; and (iii) the A. nidulans alcohol dehydrogenase gene alcA and the Trichoderma reesei cellobiohydrolase (cbh1) gene [2, 24, 25]. In contrast to the constitutive expression imposed by the gpdA promoter, a carbon source-dependent gene expression profile is elicited by the others. Commonly used inducers for the latter three are starch/maltodextrin, ethanol, and cellulose/lactose, respectively. The selected elements were assessed for heterologous gene expression in A. sojae by generating expression vectors using an E. coli glucuronidase (GUS) reporter gene [26]. Based on initial results, we focused on the A. nidulans gpdA and the
7
8 Protein Production in Aspergillus sojae
A. niger glaA promoters [26, 27]. In addition, a third Aspergillus promoter was evaluated for use in A. sojae. This promoter was obtained from the A. niger-derived bipA gene which exhibits a high and constitutive expression profile and encodes an endoplasmic reticulum (ER)-chaperone from the HSP70 gene family [28]. Interestingly, this promoter was induced through a stress-response mediated by the overproduction of homologous and heterologous proteins [28, 29]. As a result of this, the bipA promoter is of potential interest for high-level induced gene expression. For assessment, the bipA promoter was included into a GUS expression vector designed identically to that described for the vectors harboring gpdA and glaA promoters (P.J. Punt et al., unpublished results). The various expression vectors described above were applied to generate a number of A. sojae ATCC11906 transformants using the amdS gene for selection. GUS+ transformants were identified using X-glucuronide plate assays [26]. For each of the promoters, representative transformants were purified and analyzed for GUS expression in shake-flask cultures using nitrate-based mineral medium [30] supplemented with a variety of carbon sources (Table 1). For comparison, results obtained with A. niger transformants were also included in this analysis ([27]; P.J. Punt et al., unpublished results). The A. niduans gpdA promoter elicits the highest level of gene expression for each of the carbon sources, as shown in Table 1. In contrast, the A. niger glaA promoter is hardly active in A. sojae and shows no regulation by carbon source under the conditions tested. A previous analysis of the regulation of the A. niger glaA promoter suggested that this promoter is not only carbon source-, but also pH-regulated. Experiments carried out in various A. niger strains with different acidification characteristics revealed that the glaA promoter is not active at pH >6 ([31]; P.J. Punt et al., unpublished results). Since shake-flask culturing of A. sojae without pH adjustments results in the formation of pH values higher than 7, a lack of activity of the glaA promoter in this host is easily explained. In contrast, the bipA promoter is fully active in A. sojae, but of a strength lower than that of the Table 1 Promoter strength in various Aspergillus transformants
Transformant
Aspergillus sojae ATCC 11906 wt ATCC11906[pGUS54] ATCC11906[pGUS64] ATCC11906[pBIPGUS] Aspergillus niger * AB4.1 [pGUS54] AB4.1[pGUS64] AB4.1 [pBIPGUS] ∗
Promoter
Gus activity (U mg−1 ) in Minimal Medium 5% xylose 5% glucose 5% maltodexatrin
gpdA glaA bipA
0 9141 33 2914
0 6291 50 2849
0 6667 25 1642
gpdA glaA bipA
ND ND ND
1992 1966 785
2365 4384 1237
A. niger transformants carrying a single copy of the expression vector at the pyrG locus were analyzed for GUS expression. For A. sojae, representative transformants were selected, based on a GUS plate assay.
4 Aspergillus sojae as a Cell Factory for Foreign Proteins
gpdA promoter, as was also observed in A. niger. Furthermore, the bipA promoter is of decreased activity in A. sojae when culturing in maltodextrin-based media. This divergent gene expression profile was not further analyzed in detail. Based on the results obtained with the GUS expression vectors, the A. nidulansderived gpdA promoter was chosen as a preferred promoter for gene expression in A. sojae. Current research in our laboratory is focusing on the isolation of alternative promoters for similar high-level gene expression in A. sojae. 4 Aspergillus sojae as a Cell Factory for Foreign Proteins
Fungal cells are known to be efficient natural protein factories, as indicated above. This capability made them preferred target organisms for the production of recombinant proteins. Initial developments were aimed at increasing the productivity and the yields of fungal proteins by introducing additional gene copies [2]. Indeed, this approach resulted in largely increased levels of recombinant proteins [2]. An additional encouragement was the recent substantial synthesis of fungal heme-peroxidases, the production of which in microbial hosts was reported to be extremely difficult [32]. Based on these promising results, the production of non-fungal heterologous proteins was attempted, albeit with different levels of success. Based on a compilation of data from expression studies performed in other fungal hosts, the gene-fusion approach developed initially by researchers at Genencor has been particularly successful [6]. In this method, the gene encoding the non-fungal protein of interest is fused to a gene encoding a well-secreted fungal protein, such as glucoamylase or cellobiohydrolase. This approach was applied successfully to the new system. For recombinant protein production studies we have selected a number of divergent model proteins ranging from fungal hydrolases (glucoamylase, phytase), a fungal oxidase (laccase), and a mammalian protein (interleukin 6). As pointed out earlier, PgpdA-based expression vectors were employed and introduced into selected A. sojae host strains. 4.1 Production of Fungal Proteins
The various expression vectors used in this study are shown in the Appendix (Table A2). As shown in Figure 4, high levels of the fungal phytase and glucoamylase proteins were secreted from selected transformants in shake-flask cultures. In these transformants, the fraction of secreted recombinant proteins consisted of at least 80% of the total. The use of a glucoamylase-phytase gene-fusion did not result in higher levels of phytase. Apparently, in this case the gene fusion approach could not further improve the expression level of the fungal phytase. Since the copy number of the introduced expression cassette in the A. sojae transformants was low compared to that reported for A. niger, a cosmid approach was used to increase the copy number [33]. This approach is based on the construction of a
9
10 Protein Production in Aspergillus sojae
Fig. 4 Coomassie brilliant blue-stained SDS-PAGE gels of medium samples from: A) phytase-secreting A. sojae transformants; and B) glucoamylase (GLA)-secreting A. sojae transformants. The small arrows indicate endogenous A. sojae proteins.
cosmid vector in which multiple copies of the expression cassette are cloned. Using a phytase expression vector carrying four copies of the expression cassette, the level of secreted phytase was increased at least twofold. We were also successful in generating A. sojae transformants which secreted an active fungal blue copper-containing laccase [34] at significant levels (Figure 5). It should be noted that in all these cases only a limited set of transformants was screened. Screening of larger numbers of transformants using a platform for 96well-based fungal cultures may result in the isolation of transformants with even higher levels of recombinant protein.
Fig. 5 A. sojae pLAC1-B transformants in an ABTS plate assay [34]. The colored halo around the colonies indicate the secretion of active laccase. A. sojae strain ATCC11906 was included as a negative control.
4 Aspergillus sojae as a Cell Factory for Foreign Proteins
4.2 Production of Non-fungal Proteins
As we are particularly interested in the production of non-fungal proteins, we have also expressed several genes for animal/human proteins in A. sojae. One of the proteins produced in A. sojae is human interleukin 6 (IL-6). For the production of IL-6, a glucoamylase-IL-6 fusion construct was used (pAN56–4; Broekhuijsen et al. [35]). In shake-flask cultures of the respective A. sojae transformants, reasonable levels of IL-6 could be detected by Western blot analysis using an IL-6 antiserum. The levels obtained in these cultures were clearly higher than those obtained initially in A. niger [35]. Attempts to produce IL-6 using vector constructs harboring a nonfused IL-6 gene failed. From these results it can be concluded that fusion to a secreted fungal protein is a prerequisite to secrete non-fungal proteins in A. sojae, similar to the situation observed in other Aspergillus sp. For the secretion of the intracellular green fluorescent protein (GFP), a glucoamylase-GFP fusion was constructed. Again, high-level secretion was achieved, and the recombinant A. sojae strains were found to be superior to A. niger reference strains. In contrast to the situation in A. niger [36], significant amounts of GFP could be detected extracellularly, whereas in A. niger the secreted GFP was rapidly inactivated and/or degraded. For both IL-6 and GFP the acidification of culture medium in A. niger is suggested to be the cause for proteolytic degradation, and hence the low levels of active secreted protein. In shake-flask cultures with A. sojae no significant acidification occurs. Based on these results it was concluded that the A. sojae expression system has good characteristics for the secretion of recombinant proteins. However, the level of protein secretion – and in particular of non-fungal proteins – had to be improved further in order to meet industrial requirements. To reach this goal, two approaches have been pursued, namely the development of improved production process conditions using controlled fermentation, and further strain improvement. 4.3 Controlled Fermentation
The transformant of the highest IL-6 productivity was selected for controlled batch fermentation experiments in fully controlled Bioflow fermentor systems (New Brunswick Scientific). Using a statistical experimental design, the effects of different process and culture parameters on IL-6 production were analyzed. An initial set of 16 batch fermentations was performed using Mineral Medium with two conditions for each of the four parameters (pH, temperature, C-source, and N-source). It transpired that pH was a critical parameter for culturing, and based on this finding a new series of fermentation runs was carried out at pH values ranging from 5 to 7.5. As shown in Table 2, the highest levels of IL-6 were obtained at pH 6, at which point the relative levels of IL-6 were comparable to the glucoamylase levels, suggesting that no degradation occurred. In contrast, IL-6 levels were lower than glucoamylase levels at pH 5 and pH 7.5, most likely due to the degradation of IL-6
11
12 Protein Production in Aspergillus sojae Table 2 Production levels of glucoamylase and interleukin 6 (IL-6) in batch fermentations of
an A. sojae pAN56-4 transformant at different pH values in Mineral Medium, 3% glucose, 10 mM NH4 Cl, 1% soya peptone, 33 ◦ C. Glucoamylase levels were determined as described by Withers et al. [31]. IL-6 levels were determined using a BIACORE biosensor pH
Glucoamylase (mg L−1 )
IL-6 (mg L−1 )
5.0 5.5 6.0 6.5 7.0 7.5
180 350 500 370 440 290
0 173 181 106 54 54
under these conditions. It was also observed that in most fermentation runs the IL-6 levels decreased at the end of fermentation (after 60 h). 5 Strain Development 5.1 Protease-deficient Strains
As described in Section 2, A. sojae strain ATCC11906 was selected for further development as an expression platform. Based on our previous studies with A. niger and the findings described earlier, it became clear that protease-deficient strains are required for the successful production of secreted recombinant proteins [18, 35]. Two approaches were pursued to obtain such protease-deficient strains. In a first approach, we carried out classical (ultra-violet) mutagenesis of spores from ATCC11906, followed by screening for a reduced milk-clearing zone and thus reduced proteolytic activity. In a second approach, the gene encoding alkaline protease was disrupted in strain ATCC11906. 5.1.1 Ultra-violet (UV) Mutagenesis Freshly harvested spores from ATCC11906 were UV-mutagenized in a BioRad UVchamber with a dose that resulted in 20–50% survival. Serial dilutions were plated onto skim-milk-containing plates [18]. Among 5000 surviving strains, four mutant strains with a considerably reduced milk-clearing zone were selected(Figure 6). 5.1.2 Gene Disruption Protein analysis of A. sojae ATCC11906 fermentation media revealed a 35-kDa protein band on SDS-PAGE which was considered to represent an alkaline protease. Even though the proteolytic activity of this strain is relatively low, the putative 35-kDa protease band was relatively abundant. N-terminal sequence analysis confirmed that, indeed, the alkaline protease was present in this band. Primers were designed to clone the promoter region and the major part of the gene by PCR. The resulting
5 Strain Development
Fig. 6 Milk-clearing zones of the A. sojae protease-deficient strains compared to the wild-type strain on skim-milk-containing plates.
PCR fragment was used as a probe for screening of an ATCC11906 cosmid library using the pAOpyrGcosarp1 vector [37]. A genomic clone containing the alkaline protease gene (alpA) was picked up. Two fragments (a 4-kb EcoRI fragment and a 2.5-kb HindIII fragment) from this clone were sub-cloned into a pUC19 vector and characterized by restriction enzyme digestion and sequence analysis. These sub-clones were used to construct a gene replacement vector replacing a 700bp fragment of the coding region by the re-usable pyrG selection marker from pAB4–1rep (Figure 7). An 8.7-kb EcoRI fragment from pAS1-alp, carrying the alpA disruption fragment, was used for transformation of the ATCC11906 pyrG− strain. A number of transformants with a reduced milk halo were obtained. Southern analysis of these transformants revealed successful deletion of the alkaline protease gene. To allow subsequent use of the pyrG marker for transformation, 106 spores were plated on FOA-containing medium, selective for pyrG mutants. From strains with the correctly integrated disruption cassette a large number of FOA-resistant colonies were identified, whereas no spontaneous FOA-resistant mutants were obtained after plating 106 spores of a wild-type control strain. The FOA-resistant colonies derived from the disruption mutants were virtually all uracil- and uridine-requiring, and were shown again to be pyrG− . Southern analysis confirmed the removal of the pyrG gene at the alpA locus, leaving only a single copy of the 400-bp repeat at the site of disruption. 5.1.3 Controlled Batch Fermentations Batch fermentations were carried out with the A. sojae ATCC11906 strain in comparison to the disrupted strain ATCC11906 alpA and the UV-mutagenized strain ATCC11906 UV3.20.1. All fermentations were performed in a fully controlled Bioflow III fermentor (New Brunswick Scientific) using the following conditions: Mineral Medium at pH 6.5; 3% glucose; 10 mM NH4 Cl; 2% yeast extract; 30 ◦ C. When monitoring the proteolytic activity of the various culture samples, the disruptant strain showed, on completion of fermentation, about 40% reduction in proteolytic activity compared to that of the wild-type (Figure 8). The proteolytic
13
14 Protein Production in Aspergillus sojae
¨ Fig. 7 The construction of the alpA disruption vector pAS1-Aalp.
Fig. 8 Protease activity was measured at pH 7.8. The A. sojae protease-deficient strains were compared to the ATCC11906 strain. Universal Protease Substrate (Roche Applied Science, Cat. no. 1080733) was used to determine the activity.
5 Strain Development
activity of the UV-mutagenized strain was even further reduced (up to 70%) in comparison to the wild-type strain. Thus, we are confident that the newly generated protease-deficient strains will provide a considerable improvement for the use of A. sojae for protein production under controlled fermentation conditions. Nevertheless, further improvement of the fermentation process is required. In particular, the culture viscosity was found to be very high due to the filamentous growth of A. sojae. Consequently, oxygen transfer in the culture broth was insufficient, causing poor growth and poor biomass yields. In order to improve oxygen transfer, the culture viscosity needs to be lowered. This can be achieved either by changing the process parameters, which could cause a difference in the growth characteristics of the fungus [38], or by changing the morphology of the fungus itself. Consequently, the latter option – to isolate morphological A. sojae mutants of reduced viscosity – was chosen. 5.2 Low-viscosity Mutants
As described previously, viscosity is of major concern during fermentations with A. sojae and many other fungal organisms [31, 39]. The results of previous studies in our laboratory have shown that the disruption of a gene encoding the pro-protein convertase resulted in hyperbranching, low-viscosity mutants in A. niger. Based on this finding, the decision was taken to disrupt the corresponding A. sojae gene. 5.2.1 Gene Disruption The pro-protein convertase-encoding gene (pclA) of A. sojae was isolated by functional complementation of an A. niger pclA disruption mutant with the ATCC11906 cosmid library. A genomic clone was identified to complement the hyperbranching phenotype of this mutant, resulting in wild-type growth characteristics. PCR and partial sequence analysis confirmed the presence of the pclA gene on this genomic clone. A 7.2-kb XbaI fragment contained on this clone was characterized by restriction analysis and sub-cloned into the pMTL24 cloning vector [40]. Subsequently, a 5.2-kb XbaI/BamHI fragment was further sub-cloned, resulting in the vector pAS24 (Figure 9). The construction of a pclA disruption vector was achieved by cloning the repeat-flanked pyrG gene from pAB4-1rep as a SmaI fragment into the unique EcoRV site of the pclA gene (Figure 9). The 8.2-kb NotI/AscI fragment from pAS2-4pcl, containing the pclA disruption fragment, was used to transform strain ATCC11906 pyrG− and its proteasedeficient derivatives. A number of transformants were obtained of which 5–10% showed a compact morphology. After purification of the compactly growing transformants, two types of compact morphology were observed; a poorly sporulating type, and another of normal sporulation (Figure 10). Microscopic analysis also revealed differences in morphology. The poorly sporulating transformants were hyperbranching, while the other transformants showed very short hyphae (Figure 11). Southern analysis showed that the pclA gene was disrupted in the poorly sporulating, hyperbranching transformants. In the other type of transformants, the pclA
15
16 Protein Production in Aspergillus sojae
Fig. 9 The construction of the pclA disruption vector pAS2-4pcl.
Fig. 10 Morphology of A. sojae wild-type and low-viscosity mutant strains grown on a culture agar plate.
5 Strain Development
Fig. 11 Morphology of A. sojae wild-type and low-viscosity mutant strains grown in culture medium.
disruption fragment was integrated in the genome elsewhere. Nevertheless, it was found that this morphologically aberrant type of transformants showed lowviscosity characteristics under shake-flask conditions, as did the pclA disruption strains. As the genotype of the non-pclA disrupted strains is still unknown, we refer to these strains as LFV (Low Fermentor Viscosity) strains. 5.2.2 Controlled Fermentations The pclA disruption strains were grown in batch fermentations under the following conditions: Mineral Medium, 3% glucose; 10 mM NH4 Cl; 2% yeast extract; 33 ◦ C; pH 5.5. The compact morphology of these strains in shake-flask cultures can be described as micropellets (pellet diameter <1 mm; pellet diameter of wild-type strains are >3 mm). Unexpectedly, in the batch fermentations rigid pellets were observed which tended to enlarge in time. From A. niger research it is known that pellet size decreases when the pH of the fermentation is increased (N. van Biezen, unpublished results). Therefore, the batch fermentations were repeated at pH 7.0, under which conditions micropellets of the A. sojae pclA disruption strains were observed, resulting in a lower viscosity. However biomass yields were not higher than those obtained with the wild-type strain. As the A. sojae pclA disruption strains poorly sporulate, further research is required on these to evaluate their potential as expression hosts. Controlled fed-batch fermentations were carried out with the A. sojae ATCC11906 strain and two LFV strains. The viscosity of whole fermentation broth (medium + cells) was determined in a Haake Viscotester (type VT500) using sensor system MV DIN (vessel number 7), operated at room temperature. A fresh fermentation sample (40 mL) was equilibrated for 1 min at a set spindle speed (5 r.p.m.). After this equilibration period, the viscosity was measured at 10 different spindle speeds. Viscosity ranges were determined for the A. sojae ATCC11906 and the two LFV strains and the viscosity of the whole fermentation broth of both LFV strains was significantly lower than that of the wild-type strain, while the biomass yields were comparable (Table 3). The LFV strain with the lowest viscosity (LFV#2) was selected for the development of high-density biomass fermentations.
17
18 Protein Production in Aspergillus sojae Table 3 Viscosity ranges of the various A. sojae strains determined on a Haake Viscotester (type VT500), using sensor system MV DIN (vessel number 7)
Strain
Viscosity (cp) at shear rate ∼6.5 s−1 83.2 s−1
644.4 s−1
A. sojae ATCC11906 A. sojae LFV#2 A. sojae LFV#4
2000 1565 2000
155 18 76
1505 94 751
Biomass (g L−1 )
8.8 and 16.6 6.9 and 17.2 7.6 and 17.2
High-density biomass fermentations were accomplished with the LFV#2 strain by optimizing the fed-batch fermentation conditions. Modifying the feed/batch ratios and higher amounts of yeast extract, phosphate and ammonia resulted in an up to fourfold higher biomass yield. The viscosity of these high-density biomass fermentations was still very low, and the macroscopic morphology of the LFV#2 strain could be described as “yeast-like”. Initial protein production experiments with this strain showed good fermentation characteristics. Research investigations are currently under way to determine the nature of the mutation in this strain. We are confident that the combination of protease-deficient and fermentor-adapted strains will further improve the A. sojae expression platform for recombinant protein production. An overview of strains is given in Appendix Table A1.
6 Conclusion
A. sojae has been developed as a platform for recombinant protein production, in a similar manner as described previously for several other Aspergillus species. One aspect that makes A. sojae attractive is that it has a long history of (safe) use in the food industry. Several industrial processes are already in place applying both submerged and solid-state fermentations. However, additional experimental investigations are needed to define the conditions of controlled fungal fermentation for the production of non-fungal proteins when employing this organism, as well as using established fungal hosts such as A. niger. The established culture conditions defined for the production of endogenous proteins are often not suited to the production of heterologous non-fungal proteins. This aspect is addressed within the department of Microbiology at TNO Nutrition and Food Research (www.voeding.tno.nl), as well as in a research program which was initiated in 2003 and funded by the Dutch Government: Kluyver Centre for Genomics of Industrial Fermentation (www.kluyvercentre.nl). In this program, integrative data-driven approaches such as transcriptomics, proteomics, and metabolomics will be used to address bottlenecks in the complex processes of fungal fermentation and protein production. This research will lead to the identification of new targets for process and strain optimization, which may subsequently be addressed with the molecular and process engineering approaches, as described in this chapter.
Appendix
Acknowledgments
The authors thank Kurt Vogel and Markus Wyss for their contributions to the studies described in this chapter, and for their critical reading of the manuscript. DSM Nutritional Products is gratefully acknowledged for the financial support of these investigations. Appendix
Table 4 Selection of Aspergillus sojae strains
Strain Parental strains ATCC9362 ATCC11906 ATCC20235 ATCC20245 ATCC20387 ATCC20388 ATCC42249 ATCC42250 ATCC42251 ATCC46250 Protease-deficient strains ATCC11906 alpA UV1.19.1 UV3.18.3 UV3.20.1 Low-viscosity mutants ATCC11906 pclA ATCC11906 alpA pclA UV3.20.1 pclA
Genotype
Phenotype
Wild-type Wild-type Wild-type Wild-type Wild-type Wild-type Wild-type Wild-type Wild-type Wild-type alpA niaD, unknown unknown niaD, unknown
Reference
[14] [14] [14] [14] [14] [14] [14] [14] [14] [14] AlpA− NiaD− , Protease deficient Protease deficient NiaD− , Protease deficient
[14] [14] [14] [14]
pclA alpA, pclA
PclA− AlpA− , PclA−
[14] [14]
niaD, pclA, unknown
[14] [14]
ATCC11906 LFV Auxotrophic strains ATCC11906 pyrG ATCC11906 alpA pyrG ATCC11906 pclA pyrG ATCC11906 alpA pclApyrG UV1.19.1
unknown
NiaD− , PclA− , Protease deficient Low viscosity
pyrG alpA, pyrG pclA, pyrG alpA, pclA, pyrG
PyrG− AlpA− , PyrG− PclA− , PyrG− AlpA− , PclA− , PyrG−
[14] [14] [14] [14]
niaD, unknown
[14]
UV3.18.3 pyrG
pyrG, unknown
UV3.20.1 pyrG
niaD, pyrG, unknown
UV3.20.1 pclA pyrG
niaD, pclA, pyrG, unknown
NiaD− , Protease deficient PyrG− , Protease deficient NiaD− , PyrG− , Protease deficient NiaD− , PclA− , PyrG− , Protease deficient
[14] [14] [14]
19
Expression cassette
A. nidulans gpdA promoter and trpC terminator
A. nidulans gpdA promoter and trpC terminator
A. nidulans gpdA promoter and trpC terminator A. nidulans gpdA promoter and trpC terminator A. nidulans gpdA promoter and trpC terminator
A. nidulans gpdA promoter and trpC terminator A. nidulans gpdA promoter and trpC terminator A. nidulans gpdA promoter and trpC terminator
A. nidulans gpdA promoter and trpC terminator
A. nidulans A. nidulans A. niger A. niger A. oryzae
Expression vectors
pAN52-1Not
pAN56-1
pFytF3 pFytF3cos4 pGLAFytF
pGLA6S pLAC1-B pAN56-4
Pgpd-GLA-GFP
p3SR2 pAN4-cos1 pAB4-1 pAB4-1rep pAOpyrGcosarp1
Table 5 Selection of vectors used in Aspergillus sojae
A. fumigatus phytase A. fumigatus phytase A. niger glaA (G2)–A. fumigatus phytase fusion A. niger glaA (G1) P. cinnabarius laccase A. niger glaA (G2)–human interleukin 6 fusion A. niger glaA (G2)–green fluorescent protein fusion amdS amdS pyrG pyrG pyrG
A. niger glaA (G2)
Gene
Dominant Dominant Auxotrophic Auxotrophic Auxotrophic
amdS
Selection marker
[9] [33] [23] [14] [15]
[12]
[24] [29] [5]
Punt et al. [24] (Accession #Z32524) Punt et al. [24] (Accession #Z32700) [22] unpublished unpublished
Reference
20 Protein Production in Aspergillus sojae
References Table 6 List of Genes
Gene
Encoded gene product
ver-1 aflR alpA amdS pyrG gpdA glaA alcA cbh1 GUS IL-6 GFP bipA pclA niaD trpC
versicolorin A dehydrogenase (Aspergillus species in the section Flavi) aflatoxin pathway-specific regulator (Aspergillus species in the section Flavi) alkaline protease (Aspergillus sojae) acetamidase (Aspergillus nidulans) orotidine-5-monophosphate decarboxylase (Aspergillus niger) glyceraldehyde-3-phosphate dehydrogenase (Aspergillus nidulans) glucoamylase (Aspergillus niger) alcohol dehydrogenase (Aspergillus nidulans) cellobiohydrolase (Trichoderma reesei) glucuronidase (Escherichia coli) interleukin 6 (Homo sapiens) green fluorescent protein (Aequorea victoria) ER-chaperone from the HSP70 gene family (Aspergillus niger) proprotein convertase (Aspergillus niger, Aspergillus sojae) nitrate reductase (Aspergillus sojae) trifunctional polypeptide; glutamine amido transferase, indole-3-glycerol phosphate synthase, and N-(5 -phosphoribosyl) anthranilate isomerase (Aspergillus nidulans)
References 1 ELANDER R. P. (2003) Industrial production of beta-lactam antibiotics. Appl Microbiol Biotechnol 61: 385–392 2 VERDOES J. C. (1994) Molecular genetic studies of the overproduction of glucoamylase in Aspergillus niger. PhD thesis, Vrije Universiteit of Amsterdam 3 DEMAIN A. L. (2000) Small bugs, big business: the economic power of the microbe. Biotechnol Adv 18: 499–514 4 SIGOILLOT C., RECORD E., BELLE V., ROBERT J. L., LEVASSEUR A., PUNT P. J., van den HONDEL C. A., FOURNEL A., SIGOILLOT J. C., ASTHER M. (2004) Natural and recombinant fungal laccases for paper pulp bleaching. Appl Microbiol Biotechnol 64: 346–352 5 RECORD E., ASTHER M., SIGOILLOT C., PAGES S., PUNT P. J., DELATTRE M., HAON M., van den HONDEL C. A., SIGOILLOT J. C., LESAGE-MEESSEN L., ASTHER M. (2003) Overproduction of the Aspergillus niger feruloyl esterase for pulp bleaching application. Appl Microbiol Biotechnol 62: 349–355
6 GOUKA R. J., PUNT P. J., van den HONDEL C. A. (1997) Efficient production of secreted proteins by Aspergillus: progress, limitations and prospects. Appl Microbiol Biotechnol 47: 1–11 7 PUNT P. J., van BIEZEN N., CONESA A., ALBERS A., MANGNUS J., van DEN HondelCA (2002) Filamentous fungi as cell factories for heterologous protein production. Trends Biotechnol 20: 200–206 8 BALLANCE D. J., BUXTON F. P., TURNER G. (1983) Transformation of Aspergillus nidulans by the orotidine-5 -phosphate decarboxylase gene of Neurospora crassa. Biochem Biophys Res Commun 112: 284–289 9 KELLY J. M., HYNES M. J. (1985) Transformation of Aspergillus niger by the amdS gene of Aspergillus nidulans. EMBO J 4: 475–479 10 TAMURA M., KAWAHARA K., SUGIYAMA J. (2000) Molecular phylogeny of Aspergillus and associated teleomorphs in the trichocomaceae (Eurotiales). In: Integration of modern taxonomic methods for Penicillium and Aspergillus
21
22 Protein Production in Aspergillus sojae
11
12
13
14
15
16
17
18
19
classification (SAMSON R. A., PITT J. I., Eds). Singapore: Harwood Academic Publishers, pp 357–372 PETERSON S. W. (2000) Phylogenetic relationships in Aspergillus based on rDNA sequence analysis. In: Integration of modern taxonomic methods for Penicillium and Aspergillus classification (SAMSON R. A., PITT J. I., Eds). Singapore: Harwood Academic Publishers, pp 323–355 KURTZMAN C. P., SMILEY M. J., ROBERT C. J., WICKLOW D. T. (1986) DNA relatedness among wild and domesticated species in the Aspergillus flavus group. Mycologia 78: 955–959 CHANG P. K., BHATNAGAR D., CLEVELAND T. E., BENNETT J. W. (1995) Sequence variability in homologs of the aflatoxin pathway gene aflR distinguishes species in Aspergillus section Flavi. Appl Environm Microbiol 61: 40–43 YUAN G. F., LIU C. S. and CHEN C. C. (1995) Differentiation of Aspergillus parasiticus from Aspergillus sojae by Random Amplification of Polymorphic DNA. Appl Environm Microbiol 61: 2384–2387 KUSOMOTO K. I., YABE K., NOGATA Y., OHTA H. (1998) Aspergillus oryzae with and without a homolog of aflatoxin biosynthetic gene ver-1. Appl Microbiol Biotechnol 50: 98–104 WATSON A. J., FULLER L. J., JEENES D. J., ARCHER D. B. (1999) Homologs of aflatoxin biosynthesis genes and sequence of aflR in Aspergillus oryzae and Aspergillus sojae. Appl Environ Microbiol 65: 307–310 HEERIKHUISEN M., van den HONDEL C. A., PUNT P. J., van BIEZEN N., ALBERS A., VOGEL K. (2001) Novel means of transformation of fungi and their use for heterologous protein production. World Patent Application WO 01/09352 MATTERN I. E., van NOORT J. M., van den BERG P., ARCHER D. B., ROBERTS I. N., van den HONDEL C. A. (1992) Isolation and characterization of mutants of Aspergillus niger deficient in extracellular proteases. Mol Gen Genet 234: 332–336 YELTON M. M., HAMER J. E., TIMBERLAKE W. E. (1984) Transformation of Aspergillus nidulans by using a trpC plasmid. Proc Natl Acad Sci USA 81: 1470–1474
20 PUNT P. J., van den HONDEL C. A. (1992) Transformation of filamentous fungi based on hygromycin B and phleomycin resistance markers. Methods Enzymol 216: 447–457 21 GEMS D., JOHNSTONE I. L., CLUTTERBUCK A. J. (1991) An autonomously replicating plasmid transforms Aspergillus nidulans at high frequency. Gene 98: 61–67 22 MATTERN I. E., UNKLES S., KINGHORN J. R., POUWELS P. H., van den HONDEL C. A. (1987) Transformation of Aspergillus oryzae using the A. niger pyrG gene. Mol Gen Genet 210: 460–461 23 van HARTINGSVELDT W., MATTERN I. E., van ZEIJL C. M., POUWELS P. H., van den HONDEL C. A. (1987) Developmentofa homologous transformation system for Aspergillus niger based on the pyrG gene. Mol Gen Genet 206: 71–75 24 PUNT P. J., ZEGERS N. D., BUSSCHER M., POUWELS P. H., van den HONDEL C. A. (1991) Intracellular and extracellular production of proteins in Aspergillusunder the control of expression signals of the highly expressed Aspergillus nidulans gpdA gene. J Biotechnol 17: 19–33 25 KERANEN S., PENTTILA M. (1995) Production of recombinant proteins in the filamentous fungus Trichoderma reesei. Curr Opin Biotechnol 6: 534–537 26 ROBERTS I. N., OLIVER R. P., PUNT P. J., van den HONDEL C. A. (1989) Expression of the Escherichia coli beta-glucuronidase gene in industrial and phytopathogenic filamentous fungi. Curr Genet 15: 177–180 27 VERDOES J. C., PUNT P. J., STOUTHAMER A. H., van den HONDEL C. A. (1994) The effect of multiple copies of the upstream region on expression of the Aspergillus niger glucoamylase-encoding gene. Gene 145: 179–187 28 van GEMEREN I. A., PUNT P. J., DRINT-KUYVENHOVEN A., BROEKHUIJSEN M. P., VAN’T HOOG A., BEIJERSBERGEN A., VERRIPS C. T., van den HONDEL C. A. (1997) The ER chaperone encoding bipA gene of black Aspergilli is induced by heat shock and unfolded proteins. Gene 198: 43–52 29 PUNT P. J., van GEMEREN I. A., DRINT-KUIJVENHOVEN J., HESSING J. G., van MUIJLWIJK-HARTEVELD G. M.,
References
30
31
32
33
34
35
BEIJERSBERGEN A., VERRIPS C. T., van den HONDEL C. A. (1998) Analysis of the role of the gene bipA, encoding the major endoplasmic reticulum chaperone protein in the secretion of homologous and heterologous proteins in black Aspergillus. Appl Microbiol Biotechnol 50: 447–454 BENNETT J. W., LASURE L. L. (1991) Growth media. In: More Gene Manipulations in Fungi (BENNETT J. W., LASURE L. L., Eds). Academic Press, San Diego, pp 441–457 WITHERS J. M., SWIFT R. J., WIEBE M. G., ROBSON G. D., PUNT P. J., van den HONDEL C. A., TRINCI A. P. (1998) Optimization and stability of glucoamylase production by recombinant strains ofAspergillus niger in chemostat culture. Biotechnol Bioeng 59: 407–418 CONESA A., PUNT P. J., van den HONDEL C. A. (2002) Fungal peroxidases: molecular aspects and applications. J Biotechnol 93: 143–158 VERDOES J. C., PUNT P. J., SCHRICKX J. M., van VERSEVELD H. W., STOUTHAMER A. H., van den HONDEL C. A. (1993) Glucoamylase overexpression in Aspergillus niger: molecular genetic analysis of strains containing multiple copies of the glaA gene. Transgenic Res 2: 84–92 RECORD E., PUNT P. J., CHAMKHA M., LABAT M., van den HONDEL C. A., ASTHER M. (2002) Expression of the Pycnoporus cinnabarinus laccase gene in Aspergillus niger and characterization of the recombinant enzyme. EurJ Biochem 269: 602–609 BROEKHUIJSEN M. P., MATTERN I. E., CONTRERAS R., KINGHORN J. R., van DEN HONDElCA (1993) Secretion of heterologous proteins by Aspergillus niger: production of active human interleukin-6 in a protease-deficient mutant by KEX2-like processing of a
36
37
38
39
40
41
42
glucoamylase-hIL6 fusion protein. J Biotechnol 31: 135–145 GORDON C. L., ARCHER D. B., JEENES D. J., DOONAN J. H., WELLS B., TRINCI A. P., ROBSON G. D. (2000) A glucoamylase:: GFP gene fusion to study protein secretion by individual hyphae of Aspergillus niger. J Microbiol Methods 42: 39–48 HJORT C., van den HONDEL C. A., PUNT P. J., SCHUREN F. H. (2000) Fungal transcriptional activator useful in methods for producing polypeptides. World Patent Application WO 00/20596 BHARGAVA S., NANDAKUMAR M. P., ROY A., WENGER K. S., MARTEN M. R. (2003) Pulsed feeding during fed-batch fungal fermentation leads to reduced viscosity without detrimentally affecting protein expression. Biotechnol Bioeng 81: 341–347 BOCKING S. P., WIEBE M. G., ROBSON G. D., HANSEN K., CHRISTIANSEN L. H., TRINCI A. P. (1999) Effect of branch frequency in Aspergillus oryzae on protein secretion and culture viscosity. Biotechnol Bioeng 65: 638–648 CHAMBERS S. P., PRIOR S. E., BARSTOW D. A., MINTON N. P. (1988) The pMTL nic-cloning vectors. I. Improved pUC polylinker regions to facilitate the use of sonicated DNA for nucleotide sequencing. Gene 68: 139–149 PASAMONTES L., HAIKER M., WYSS M., TESSIER M., van LOON A. P. (1997) Gene cloning, purification, and characterization of a heat-stable phytase from the fungus Aspergillus fumigatus. Appl Environ Microbiol 63: 1696–1700 USHIJIMA S., HAYASHI K., MURAKAMI H. (1982) The current taxonomic status of Aspergillus sojae used in Shoyu fermentation. Agric Biol Chem 46: 2365–2367
23
1
Antibiotics and the Ribosome Jeffrey L. Hansen Yale University, New Haven, USA
Originally published in: Protein Crystallography in Drug Discovery. Edited by Robert E. Babine and Sherin S. Abdel-Meguid. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30678-7
1 Introduction
Many antibiotics kill bacteria by binding to the ribosome and thereby inhibiting protein synthesis. After decades of antibiotic use, many pathogens have become resistant to antibiotics, and thus new antibiotics are needed to treat bacterial diseases. Rational drug design is expected to play an important role in meeting this need. Rational efforts to design new antibiotics that target the ribosome became feasible when the crystal structures of both ribosomal subunits were solved at atomic resolution in 2000 [1–3]. Since then, structures of about 20 different antibiotics bound to ribosomes have been published (Tab. 1) [4–12]. These structures of antibiotic/ribosomal complexes provide insights into the mechanisms by which antibiotics inhibit protein synthesis and by which mutations confer resistance. Thus, a sound basis now exists for designing new antibiotics that may circumvent resistance. 2 The Ribosome 2.1 Introduction
The ribosome is the enzyme that catalyzes peptide bond formation. The bacterial ribosome is a large 2500 kDa ribonucleic acid/protein complex comprised of a large subunit (LSU or 50S subunit) and a small subunit (SSU or 30S subunit) (Fig. 1). The small ribosomal subunit binds to messenger RNA (mRNA) and reads the genetic code by aligning its base triplet codons with anticodons of transfer RNA Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Antibiotics and the Ribosome Table 1 Antibiotic/ribosome crystal structures
Classification of antibiotic
Antibiotic
Ribosome Resolution FreeR PDB code Reference
50S 50S MLSB Macrolide-16
None (native) None (native)
Hm Dr
2.4 3.1
26.1 27.4
1JJ2 1LNR
[9] [10]
Carbomycin Spiramycin Tylosin Azithromycin Clarithromycin Erythromycin A Roxithromycin Clindamycin Virginiamycin M Not available Anisomycin Chloramphenicol Chloramphenicol Puromycin
Hm Dr Hm Hm Dr Dr Dr Dr Hm
3.0 3.0 3.0 3.2 3.5 3.5 3.8 3.1 3.0
26.5 26.9 26.2 25.0 32.3 30.1 27.4 30.3 24.2
1K8A 1KD1 1K9M 1M1K 1J5A 1JZY 1JZZ 1JZX 1N8R
[8] [8] [8] [8] [4] [4] [4] [4] [7]
Hm Dr Hm Hm
3.0 3.5 3.0
24.6 32.1 20.9
1K73 IKOl 1NJ1 1FGO
[7] [4] [7] [11]
Sparsomycin Blasticidin S None (native) None (native) None (native) Pactamycin Hygromycin B Paromomycin Paromomycin Paromomycin Paromomycin Paromomycin Spectinomycin Streptomycin Edeine Edeine Tetracycline Tetracycline Tetracycline
Hm Hm Tt Tt Tt Tt Tt Tt Tt Tt Tt Tt Tt Tt Tt Tt Tt Tt Tt
2.8 3.0 3.0 3.2 3.3 3.4 3.3 3.0 3.0 3.1 3.3 3.3 3.0 3.0 3.2 4.5 3.2 3.4 4.5
22.2 23.5 25.2 26.1 30.5 28.0 26.1 25.5 27.0 27.5 28.2 28.4 25.5 25.5 24.5 24.4 24.5 26.4 25.4
1M190 1KC8 1J5E 1HRO 1FKA 1HNX 1HNZ 1FJG 1N32 1IBL 1IBK 1N33 1FJG 1FJG 1194 1195 1194 1HNW 1197
[7] [7] [2] [2]
Macrolide-15 Macrolide-14
Lincosamide Streptogramin-A Streptogramin-B A-site
Nucleotide analog P-site 30S
[5] [5] [6] [12] [12] [12] [12] [6] [6] [52] [52] [52] [5] [52]
molecules (tRNA). The large ribosomal subunit binds to opposite ends of tRNA molecules and catalyzes peptide bond formation. 2.2 Binding of tRNA
Several sites on the ribosome interact with tRNA (Fig. 1) and also are targeted by antibiotics. The peptidyl-tRNA binding site (P-site) of the large subunit binds to
2 The Ribosome
Fig. 1 Overview of ribosome. Center: These space filled representations are based on docking of the large ribosomal subunit [1] (grey RNA and blue protein) onto the small subunit [2] (yellow RNA and green protein) based on a protein alpha carbon and rRNA phosphate backbone trace of the 70S ribosome bound with tRNA [53]. Functional sites are circled and labeled according to the corresponding bound tRNA molecules; A-site (red), P-site (orange), E-site (purple) and factor binding site (FBS). Peptide bond formation is catalyzed at the peptidyl transferase center (PTC) that is located on the large subunit between the A-site and P-site.
Left: The small subunit with bound tRNA was rotated to the left to also show its interface surface. Right: The large subunit with bound tRNA was rotated to the right, and its interface surface was clipped, to show a model of a peptide attached to the P-site tRNA (orange) and extending down the exit tunnel. Messenger RNA (mRNA), which binds to the 30S subunit, is schematically represented (straight white line) on the 50S subunit with its codons (black circles) interacting with anticodons of tRNA. With respect to the large subunit (right), tRNA moves from right to left during translation.
the 3 end of peptidyl-tRNA. The aminoacyl-tRNA binding site (A-site) of the large subunit binds to the 3 end of aminoacyl-tRNA. The tRNA exit site (E-site) of the large subunit binds to the 3 end of deacylated tRNA. Meanwhile, corresponding sites on the small subunit interact with the opposite ends of tRNA molecules that contain the anticodon stems.
2.3 Peptidyl Transferase Activity
In the first step of the peptidyl transferase reaction, a peptidyl tRNA molecule is bound in the P-site with its nascent peptide extending down the peptide exit tunnel (Fig. 1). An elongation factor binds to a factor binding site (FBS) and positions an aminoacyl-tRNA in the A-site. The α amino group of the aminoacyltRNA nucleophilically attacks the ester bond which connects the peptide to the tRNA bound in the P-site (Fig. 2). The ester bond is broken as an amide bond forms, and the peptide becomes one amino acid longer, and is now attached to the tRNA that in the A-site. Translocation of the products follows peptide bond formation, as the newly formed deacylated- tRNA of the P-site moves into the
3
4 Antibiotics and the Ribosome
Fig. 2 Peptidyl transferase reaction. Left: The " amino group of an A-site substrate, attacks (arrow) the ester bond that links a P-site substrate tRNA to its nascent peptide chain. The first 73 nucleotides of tRNA are represented by ribbons, C74 and C75 are represented by the letter C, and A76 and the peptide are represented by chemical drawings. Center: During the nucleophilic attack,
a tetrahedral carbon intermediate with an oxyanion is formed. The entire tRNA portion of the substrates except for A76 is represented by (tRNA). Right: The intermediate resolves forming two products; a deacylated tRNA bound to the P-site, and a peptidyltRNA that is one amino acid longer and is still bound to the A-site.
E-site, and as the newly elongated peptidyl-tRNA moves from the A-site into the P-site. 2.4 Structure of the Ribosome
Prior to the availability of atomic resolution crystal structures of ribosomal subunits, the secondary structure of ribosomal RNA (rRNA) was determined [13, 14] (Fig. 3). When the X-ray structures of the 50S [1] and 30S [2, 3] subunits were solved, the predicted secondary structure and many predicted tertiary interactions [15] were found to be quite reliable. Within the secondary structural diagrams, regions containing highly conserved sequence motifs were identified and were found to be implicated in various functions of the ribosomes (Figs. 3 and 4). Nucleotides implicated in the peptidyl transferase reaction were located in the central loop of Domain V, which was therefore designated as the peptidyl transferase center (PTC) Other nucleotides that were implicated in factor binding and in an associated GTPase activity were located in a loop of Domain II Nucleotides in these same regions of secondary structure were also implicated in antibiotic binding because they either were protected by antibiotics from chemical modification or upon mutation-conferred resistance to antibiotics
3 Antibiotics
Fig. 3 Secondary structure of large ribosomal subunit. The secondary structure of rRNA has been divided into six domains (Dom I to Dom VI) [14]. Various functional sights have been located within these domains. The peptidyl transferase center (PTC) was located in domain V. The factor binding site was located in domain II. Fig. modified from [1].
This chapter will focus on antibiotics that bind to the peptidyl transferase center because no X-ray structures are available of antibiotics bound to the factor binding site.
3 Antibiotics 3.1 Introduction
Anti-ribosomal antibiotics are enzyme inhibitors, and the ones that are clinically useful inhibit bacterial ribosomes far more effectively than they inhibit eukaryotic ribosomes. Many enzyme inhibitors exert their effects by binding to the active site of enzymes and thereby prevent the binding of substrates. Others block enzyme function by inhibiting conformational changes essential for enzyme activity. Antiribosomal antibiotics are unexceptional in this regard. They bind to the sites on both subunits to which tRNA also binds. Some block the interactions of substrates
5
6 Antibiotics and the Ribosome
Fig. 4 Secondary structure of the PTC and antibiotic information. Nucleotide numbering is from H. marismortui (Hm) and is followed by the standard corresponding E. coli (Ec) numbering in parentheses. Nucleotides with superscripts have been implicated in antibiotic interactions by nucleotide protection studies (green) [54–57] or by mutations (orange) that confer resistance to: carbomycin A (M); linco-samides (L); streptogramin A [58] (S); chloramphenicol [59–
64] (c); blasticidin S [48] (b); anisomycin [51, 65] (a); or sparsomycin [65, 66] (y). Two hydrophobic crevices (gray triangles) are located at the ends of two RNA helices (center right and center left of figure). Nucleotides that form canonical or noncanonical base pairs or base triples that are involved in the structure of either crevice are connected with gray lines. Fig. modified from [7].
with the ribosome. Others block the tunnel and thereby prevent the nascent peptide from extending.
3.2 Antibiotics that Bind to the 50S Subunit
Most antibiotics that inhibit the function of the 50S subunit bind near its peptidyl transferase center (Fig. 4) and block peptide bond formation. Crystal structures are available for several such antibiotics bound to the ribosome (Fig. 5). They appear to inhibit the peptidyl transferase reaction either by competing directly with its substrates for binding, or indirectly by blocking the exit tunnel.
3 Antibiotics
Fig. 5 Overview of antibiotics bound at the peptidyl transferase center. A surface representation of the large subunit of H. marismortui includes the P-site, A-site and entrance to the peptide exit tunnel. Most of these antibiotics contact either the active site hydrophobic crevice (green surface, upper center) or the hydrophobic crevice at the entrance to the exit tunnel (green
surface, lower right). In addition, many of these antibiotics occupy an elongated pocket (dark surface, center) in the wall of the exit tunnel between these two crevices. The antibiotics shown are all from complexes with H. marismortui ribosomes and overlap the binding site of A-site substrates (red sticks) or of a P-site substrate (orange sticks). Fig. modified from [7].
Two hydrophobic crevices near the peptidyl transferase center appear to be particularly important for antibiotic binding (Figs. 4 and 5). The first hydrophobic crevice, which is at the peptidyl transferase center, normally functions as the binding site for amino acid side chains of A-site substrates [11, 16]. Antibiotics that bind to this site will block the binding with A-site substrates and thereby directly prevent peptide bond formation. The second hydrophobic crevice is located at the entrance to the peptide exit tunnel, and its normal function in translation is not yet known. Nevertheless, binding of antibiotics to the second crevice appears to inhibit peptide bond formation by interfering with the passage of an elongating polypeptide down the peptide exit tunnel. A few of the antibiotics that target the 50S subunit bind near the factor binding site. 3.3 MLSB Antibiotics
The largest class of antibiotics that inhibit the function of the 50S subunit is the MLSB group (Macrolide, Lincosamide, and Streptogramin B) (Fig. 6) many of which are used to treat human diseases [17]. Macrolides, Lincosamides, and Streptogramin B antibiotics are considered as a group, not based on any chemical similarity, but because their effects on the ribosome are similar. For example, when ribosomes are exposed to any MLSB antibiotic, a specific group of nucleotides becomes protected from chemical modification (Fig. 4). Mutation of many of these same nucleotides
7
8 Antibiotics and the Ribosome
Fig. 6 Chemical structures of MLSB antibiotics. Macrolides are comprised of a central lactone ring of 14, 15, or 16 atoms from which extend various sugar groups and functional groups. Lincosamides are comprised of two sugars linked by an amide
bond. Like macrolides streptogramins contain a large central ring streptogramin A is included because it binds the ribosome cooperatively with streptogramin B. Fig. combined from [7, 8].
confers resistance to antibiotics from all three groups (Fig. 4). Prominent in this group of implicated nucleotides are Ec1) A2058 and A2059 which are strongly protected by most MLSB antibiotics. Finally, mutation or modification of Ec A2058
1) Throughout the text, nucleotide numbers will correspond to the organism being described and will be preceded by its initials: Hm = Haloarcula marismortui, Ec = Escherichia coli,
Tt = Thermus thermophilus, Dr = Deinococcus radiodurans. The coresponding standard nucleotide numbering from E. coli will normally follow in parentheses.
3 Antibiotics
confers strong resistance to MLSB antibiotics. These observations suggest that Ec A2058 and A2059 may be near the center of the region to which all MLSB antibiotics bind. X-ray structures of several macrolides [4, 8] and of one lincosamide [4] have been solved in complexes with ribosomes. Although no structure of a streptogramin B in a complex with the ribosome is yet available there is a structure of a streptogramin A and it provides considerable insight into the biochemistry of streptogramins. Consistent with the biochemical data the antibiotics bind near the entrance to the peptide exit tunnel and interact with the hydrophobic crevice formed by Ec A2058 and A2059 (Fig. 5) 3.4 Macrolides
Macrolide antibiotics consist of a central lactone ring from which extend various functional groups and sugar substituents (Fig. 6). Macrolides are divided into three subgroups depending on the number of atoms in the lactone ring: 14-membered, 15-membered and 16-membered. Inhibition of protein translation by macrolides has two distinct characteristics. Firstly, macrolides will neither bind to [18] nor inhibit [19] a ribosome that is already translating mRNA, that is, a ribosome that already has a nascent peptide in its exit tunnel. Secondly, macrolides do not directly inhibit formation of a peptide bond. Instead, when macrolides are added to a translational system, di-peptides, tri-peptides and even tetra-peptides accumulate [20]. However, formation of longer peptides is inhibited. Structures for three 16-membered macrolides and one 15-memberered macrolide bound to the archaeal ribosome of Haloarcula marismortui (Hm) are available [8], as are structures for four 14-membered macrolides bound to the bacterial ribosome of Ddnococcus radiodurans (Dr) [4]. Interestingly, it appears that the 14-membered macrolides do not bind to the bacterial ribosomes in the same way that the 15- or 16-membered macrolides bind to the archaeal ribosomes. Nevertheless, in both systems the macrolides bind to the same site, at the entrance to the peptide exit tunnel, consistent with biochemical inferences about the way that macrolides interfere with elongating peptides. The mode of binding observed for 15- and 16-membered macrolides will be discussed first. 3.4.1 Macrolides, 15- and 16-Membered In H. marismortui, the 15-membered and 16-membered macrolides studied so far bind to the ribosome almost identically. When rRNA portions of these structures are superimposed, the lactone rings of the macrolides become superimposed on an almost atom by atom basis (Fig. 7). At the center of the macrolide binding site is the hydrophobic crevice at the entrance to the peptide exit tunnel between Hm G2099 and A2100 (Ec A2058 and A2059) (Fig. 8). The bound macrolides almost completely occlude the peptide exit tunnel, which explains the two distinctive characteristics of macrolide inhibition. Firstly, the reason that macrolides do not inhibit ribosomes that are already actively making protein is that the nascent peptide in the exit tunnel blocks access to the macrolide
9
10 Antibiotics and the Ribosome
Fig. 7 Superposition of macrolides. A cutaway view of a space-filled representation of rRNA (gray) and protein (light blue) show the peptide exit tunnel (left) and the peptidyl transferase center (upper right) of H. maris-mortui. The lactone rings of tylosin (orange sticks), carbomycin A (red sticks), spiramycin (yellow sticks) and azithromycin (light blue sticks) become superimposed when rRNA is superimposed among these structures. The lactone rings bind to the hydrophobic crevice between Hm G2099 and
A2100 (green spheres) at the entrance to the peptide exit tunnel. Hm A2103 (dark green sticks) changes conformation (white arrow and light green sticks) and forms a covalent bond to 16-membered macrolides. One distinct sugar group (in parentheses) extends from the lactone ring of each antibiotic. The isobutyrate group of carbomycin extends into the active site crevice (purple and blue spheres) between Hm A2486 and C2487. Fig. from [8].
binding site. Secondly, the reason macrolides do not directly inhibit the peptidyl transferase reaction is that they bind near to, but not directly at, the active site. Only after a few peptide bonds are formed will the peptide contact the bound macrolide and be blocked from further elongation. As a result, short di-, tri- and tetra-peptides will accumulate. Interestingly, these crystal structures also make it possible to understand more subtle differences among the inhibitory effects of different macrolides. As a peptide elongates, the portion of the macrolide it first encounters will be the sugar branch that extends from C5 of the lactone ring (Fig. 6) towards the peptidyl transferase active site (Fig. 7). Thus, those macrolides that have only a monosaccharide branch extending from C5 allow the synthesis of a longer oligo-peptide than those with longer branches extending from C5. For example, in the presence of erythromycin, which has only a monosaccharide at C5, tetra-peptides accumulate [20]. In the presence of tylosin and spiramycin that have longer disaccharide branches at C5, only di-peptides accumulate. Finally, carbomycin A has an isobutyrate group that extends from its disaccharide branch at C5 all the way into the active site hydrophobic crevice, and strongly inhibits formation of even di-peptides. 3.4.2 Binding Interactions Between the Lactone Ring and the Ribosome Hydrophobic interactions dominate the binding of the 15- and 16-membered lactone rings of macrolide antibiotics to the ribosome. One face of the lactone ring is hydrophobic, because it is comprised entirely of carbon atoms, with the exception of
3 Antibiotics
Fig. 8 Macrolide binding interactions and antibiotic resistance. The lactone ring of a macrolide (orange), illustrated by tylosin, interacts with the hydrophobic crevice at the entrance to the exit tunnel between Hm G2099 and A2100 (Ec A2058 and 2059) and with the helix terminating base pair Hm C2098-G2646 (Ec G2057-C2611). The sugars form hydrogen bonds (dotted lines) to ribosomal RNA. (A) The exocyclic amine (N2) of Hm G2099 is positioned at the center of these hydrophobic interactions, and forms unfavorable contacts(red arrows) with C4
and C7 of the lactone ring. (B) The mycinose sugar of tylosin interacts with domain II (gray sticks) of rRNA and directly contacts large ribosomal subunit protein L22 (light blue). A model of an Nl methyl group (N1-CH3) has been added to the crystal structure. (C) Alternative view of the hydrophobic face of tylosin in binding site. (D) Model building by changing G2099 to dimethyl-adenine (N6-di-CH3), the base common to macrolide producing organisms.
a single hydrogen bond acceptor group, which is the linking oxygen of the ester bond (Figs. 6 and 8). In contrast, the opposite face contains many hydrophilic groups. Crystal structures of the 16-membered macrolide antibiotics tylosin, carbomycin M and spiramycin, and of the 15-membered macrolide antibiotic azithromycin show that the hydrophobic face of the lactone ring lies almost flat against the wall of the peptide exit tunnel at a site where it is able to form van der Waals contacts with hydrophobic portions of 23S rRNA (Fig. 8). For example, one edge of the lactone ring (atoms C5 to C8) follows a hydrophobic crevice between Hm G2099 and A2100 (Ec 2058 and 2059), and the other edge of the lactone ring forms van der Waals contacts with the hydrophobic face of Hm G2646 (Ec C2611), which is exposed
11
12 Antibiotics and the Ribosome
because it forms a helix-terminating base-pair with Hm C2098 (Ec G2057). The hydrophilic groups on the opposite face of the lactone ring are exposed to solvent and do not interact with the ribosome. The two nucleotides that form the hydrophobic crevice at the entrance to the exit tunnel, Hm G2099 and A2100 (Ec A2058 and A2059), are also the two nucleotides that are biochemically most implicated in MLSB binding. Not only are they strongly protected by MLSB antibiotics from chemical modification, but mutations of these nucleotides confer MLSB resistance (Fig. 4). Furthermore, bacterial ribosomes that have a highly conserved adenine at Ec 2058 are far more sensitive to MLSB antibiotics than are archaeal ribosomes that in contrast have a highly conserved guanine in that position. Mutation of Ec A2058 to guanine in bacteria confers strong resistance to macrolides, especially 14-membered macrolides. Even using crystal structures of macrolides bound to the less sensitive archaeal ribosomes, the mechanism of the resistance conferred by the Ec A2058 to G2058 mutation can be understood. Adenine differs chemically from guanine in only two ways. In guanine, the obligate hydrogen bond donor, N6 of adenine, is replaced by the obligate hydrogen bond acceptor, O6, and adenine lacks the exo-cyclic amine N2 of guanine. It appears that resistance is caused specifically by the exocyclic N2 of guanine, not by the replacement of N6 by O6. The hydrophilic N2 group of Hm G2099 (Ec A2058 to G) is placed in the middle of the hydrophobic lactone ring binding site (Fig. 8A, B). As a result, the hydrophilic N2 group is unable to form hydrogen bonds with solvent, and the binding site is reduced both in size and in hydrophobicity. The N2 group of Hm G2099 contacts both C4 and C7 of the lactone ring. Removal of the hydrophilic N2 of Hm G2099 (Ec A2058 to G) would enable the lactone ring to fit more tightly against its binding site and to form additional van der Waals contacts. Despite the presence of the N2 group, the 16- and 15membered macrolides do bind to these less sensitive archaeal ribosomes. This may be due in part to the larger size of their lactone rings which allows more room for the intrusive N2 of Hm G2099, but in the case of 16-membered macrolides, other binding features may also come into play (see below). 3.4.3 Sugar Interactions with the Ribosome The sugar branch that extends from C5 of the lactone ring of all macrolides and extends towards the peptidyl transferase center, also contributes to binding affinity (Figs. 7 and 8). In the case of 16-membered macrolides, the first sugar of this branch is mycaminose, and a hydrogen bond forms between its O2A and the Nl of Hm G2099 (Ec A2058), which also interacts hydrophobically with the lactone ring of macrolides. It is not surprising that G2099 is the nucleotide that is most strongly protected by macrolide binding from chemical modification. Interestingly, resistance to macrolides can be conferred not only by mutation of Ec A2058 to guanine, but also by di-methylation of Ec A2058 at N6. This modification is observed in most organisms that produce macrolides. It is easy to examine the effects of this modification by a model building exercise in which Hm G2099 is replaced by dimethyl A2099 in the crystal structure (Fig. 8C,D). This reveals that one of the added methyl groups will interfere with the atom O2A of the mycaminose
3 Antibiotics
sugar (Fig. 8C, D), and by pushing the sugar away from its normal position will prevent formation of the hydrogen bond between Nl of Hm G2099 and O2A of the macrolide. When tylosin binds to the ribosome, the mycinose extension from C14 of the lactone ring extends down the exit tunnel and interacts with domain II of rRNA (Fig. 8B). Nucleotide modification by Nl methylation of Ec G748 (Hm A841) in domain II, which confers tylosin resistance, is at the bottom of the pocket in which the mycinose sugar binds. Model building suggests that the added methyl group will clash with O4C of the tylosin mycinose, and prevent it from forming hydrogen bonds to the ribosome (Fig. 8B). Not surprisingly, this modification does not alter the affinity of other macrolides for the ribosome, because they do not have a mycinose at C14. In a rational drug design experiment, removal of both O4C and its methyl group might reduce the level of resistance conferred by the methyl modification of Ec G748, by reducing the steric clash, even though one hydrogen bond would still be lost. 3.4.4 A Covalent Bond The crystal structures of the three 16-membered macrolides reveals that each forms a covalent bond with 23S rRNA (Figs. 7 and 8 A). In the absence of substrates and of antibiotics, the base of Hm A2103 (Ec 2062) lies flat against the wall of the exit tunnel of archaeal ribosomes, and forms a non-canonical base pair with Hm A2538 (Ec 2503). However, upon binding of tylosin, carbomycin M, or spiramycin the adenine base of Hm A2103 (Ec 2062) rotates approximately 90◦ . Hm A2103 extends out from the wall of the exit tunnel and forms a carbinolamine bond between its N6 and the aldehyde group extending from C6 of the macrolide (Fig. 9). Formation of this carbinolamine bond is revealed by the continuous electron density that connects the aldehyde group of the antibiotic to Hm A2103 in the unbiased electron density maps obtained for each of the 16-membered macrolides studied to date, each of which has an aldehyde group extending from C6. Not
Fig. 9 Carbinolamine bond. An amine can react with an aldehyde (left) to form a carbinolamine intermediate (center) that normally dehydrates to form a Schiff base (right). However, the exocyclic amine of a nucleoside base can form only the carbinolamine bond and not a Schiff base.
13
14 Antibiotics and the Ribosome
surprisingly, that continuous electron density feature is absent in the unbiased electron density map calculated for the structure of the 15-membered macrolide azithromycin which has no aldehyde group at C6. The electron density feature is consistent with formation of a carbinolamine bond and not with a Schiff base, consistent with the expected chemical reactivity between exocyclic amines and aldehydes (Fig. 9) [21]. The carbinolamine bond observed in these crystal structures has not been detected biochemically. Nevertheless, its formation may explain the results of a number of previous biochemical experiments. Modification or removal of the aldehyde group from a macrolide results in 100-fold increases in minimum inhibitory concentrations (MIC) [22–28]. Furthermore, mutation of Hm A2103 (Ec 2062) to guanine (which has an O6 instead of N6) confers resistance specifically to macrolides that have an aldehyde group at C6, but not to others [29]. It seems likely that the carbinolamine bonds observed in these crystal structures also form in vivo, and are physiologically relevant to the inhibitory effects of macrolides. This novel covalent interaction between antibiotics and the ribosome might be exploited in rational drug design experiments. The ribosome is full of exocyclic amines many of which will be found in crystal structures to be close to bound antibiotics. Perhaps an aldehyde group could be added to one of these antibiotics and be designed to form a carbinolamine bond with a nearby exocyclic amine and to thereby increase binding affinity. Of course, the aldehyde group itself might undergo unanticipated reactions. 3.4.5 Macrolides, 14-Membered X-ray crystal structures of the three 14-membered macrolides, erythromycin, clarithromycin, and roxithromycin have been solved bound to the Deinococcus radiodurans (Dr) large subunit [4]. They bind at about the same location as the 15- and 16-membered macrolides, near Ec A2058 and A2059 (Hm G2099 and 2100) at the entrance to the exit tunnel. The desosamine sugar of the 14-membered macrolides is positioned very much like the corresponding desosamine sugar of azithromycin and the mycaminose sugar of the 16-membered macrolides in complexes with H. marismortui ribosomes. The difference between these two classes of structures is in the way their lactone rings interact with the ribosome. Hydrophobic interactions dominate the binding of 15- or 16-membered lactone rings (Fig. 8) which bind with their hydrophobic faces flat against the wall of the exit tunnel. In contrast, the lactone ring of each of the three 14-member macrolides is oriented approximately perpendicular to that of 15and 16-membered lactone rings, which exposes their hydrophobic faces to solution. Instead, of hydrophobic interactions, it is proposed that hydrophilic interactions mediate the binding of 14-membered macrolides to the ribosome. Only one of the six hydrogen bonds proposed to stabilize binding of 14-membered lactone rings is observed to mediate binding of 15- and 16-membered macrolides to the ribosome. That hydrogen bond, which connects O2A of the mycaminose sugar to Nl of Hm G2099, may be important to macrolide binding, because it is observed in all of these structures regardless of the other differences in binding. Finally, the binding
3 Antibiotics
of the 14-membered lactone rings differs from that of the 15- and 16-membered rings because the former [4] assume a “folded-in” conformation, and the latter [8] assume a “folded-out” conformation [30].
3.4.6 Rational Drug Design of Macrolides Understanding the mechanism of resistance conferred by a specific mutation is expected to assist in rational drug design experiments. As described above, crystal structures reveal the mechanisms by which the Ec A2058 to G2058 mutation confers resistance to macrolide antibiotics. However, knowledge of that specific resistance mechanism might not be useful in rational drug design experiments. Guanine at Ec 2058 is highly conserved in eukaryotes, including humans, and so a macrolide redesigned to overcome that resistance mutation might also bind more tightly to human ribosomes. The structures of macrolide antibiotics bound to the ribosomes seem to provide numerous possibilities for rational drug design experiments. However, a few of these experiments may require an understanding of what causes the 14membered macrolides to bind to the ribosome differently than do the 15- and 16-membered macrolides. With respect to the ribosome, some atoms of the 14membered macrolides are located over 6 Å away from the corresponding atoms of the larger macrolides. Knowledge of which mode of binding a given antibiotic assumes would be crucial to a rational drug design experiment that involves those atoms. One possible cause of the different macrolide binding modes is that the smaller size of the 14-membered lactone ring might cause them to bind differently than do the larger ones. If so, rational drug design experiments on 14-membered macrolides may need to preserve and supplement the relevant hydrophilic interactions, such as the six hydrogen bonds that mediate their binding. In contrast, rational drug design experiments on 15- or 16-membered macrolides may need to focus instead on the hydrophobic interactions that mediate the binding of their larger lactone rings. A second possibility is that the two distinct modes of macrolide binding are caused by the conserved differences between archaeal and bacterial ribosomes. A good candidate for such a conserved difference is the nucleotide Hm G2099 (Ec A2058). Hm G2099 is located at the center of the macrolide binding site and is highly conserved as guanine in archaea and as adenine in bacterial ribosomes. Furthermore, mutation of Ec 2058 to G2058 affects macrolide binding and confers resistance to macrolides. If the G2099 to A change does cause the observed differences in binding between these two systems, then the structure of a macrolide bound to the bacterial ribosome might be the most relevant for rational drug design purposes. An archaeal pathogen has not yet been described. A third possibility is that the differences in macrolide binding between these sets of experiments result from inaccuracies in the models. Relevant to this possibility, the 14-membered macrolide structures are based on lower resolution data and have higher free R values than do the other macrolide structures.
15
16 Antibiotics and the Ribosome
These alternate modes of macrolide binding not only pose a challenge to rational drug design but potentially provide new opportunities in this area. Perhaps features from both modes of binding could be exploited by a single antibiotic. For example, the binding affinity of a 15- or 16-membered macrolide antibiotic might be increased by attaching a hydrogen bonding group that is designed to extend from its hydrophilic face towards one of the hydrogen bonding partners that was observed to mediate binding of a 14-membered macrolide. Many rational drug design experiments may be possible even prior to understanding the cause of these differences in binding. Since the desosamine sugar of the 14-membered macrolides binds similarly to the corresponding mycaminose sugars of the larger macrolides, rational drug design experiments involving that sugar might be independent of the mode of binding. Accordingly, the differences in binding may be irrelevant to experiments that involve any of the atoms of the C5 sugar branch. 3.5 Lincosamides
Currently, one structure of a lincosamide antibiotic bound to the ribosome is available for analysis [4]. Like the macrolide antibiotics, clindamycin binds near the hydrophobic crevice at the entrance to the peptide exit tunnel. As with the macrolide carbomycin A, clindamycin interacts not only with the hydrophobic crevice at the entrance to the peptide exit tunnel, but also with the active site hydrophobic crevice. The nucleotides that surround the clindamycin binding site were previously implicated in binding of lincosamides based on nucleotide protection studies and on the analysis of mutations conferred by resistance (Fig. 4). Since lincosamide and macrolide antibiotics share binding sites that only partially overlap, it might be possible to design a combination antibiotic. As this experiment would involve a macrolide, uncertainty over which of the two binding modes should be used for the macrolide might complicate the experiment. Nevertheless, it may be possible to apply this approach of combining the binding features of two antibiotics into one among the various antibiotics that share overlapped binding sites near the peptidyl transferase center (Fig. 5). 3.6 Streptogramins
The chemical structures of streptogramin antibiotics are unlike those of either macrolides or lincosamides (Fig. 6) and are unique in that organisms that synthesize them invariably make two different compounds, one being a streptogramin A and the other a streptogramin B. Streptogramin A and streptogramin B antibiotics are independently effective as antibiotics, but are much more effective when used in combination, because they bind cooperatively to the ribosome [18, 31]. Apparently, streptogramin A causes a conformational change of the ribosome near the peptidyl
3 Antibiotics
Fig. 10 Streptogramin A. Streptogramin A (light blue) binds to rRNA (gray) among nucleotides that are implicated in its binding by chemical modification (green) or resistance mutation (orange) studies. In response to streptogramin A binding, the Hm A2103 (Ec A2062) changes conformation (black arrow) from its native position
against the wall of the exit tunnel (thin green sticks) and a position that extends into the exit tunnel (thick green sticks). In this extended conformation, a face of the base of Hm A2103 stacks against the conjugated amide group of streptogramin A, and the ribose sugar forms a hydrogen bond (dotted line) with the same amide group.
transferase center that increases the affinity of the ribosome for streptogramin B [32]. Although no structure of a streptogramin B bound to the ribosome is available, there is a structure for streptogramin A bound to the ribosome [7]. Streptogramin A is a large antibiotic that binds in a location that overlaps both the A-site and the P-site (Fig. 5), consistent with reports that streptogramin A inhibits binding of substrates to both sites [32, 33]. When streptogramin A binds to the ribosome, the base of Hm A2103 (Ec A2062) rotates about 90◦ away from the wall of the exit tunnel (Fig. 10) much as it does when 16-membered macrolides bind. This conformational change allows the base of Hm A2103 (Ec 2062) to interact hydrophobically with the flat side of the conjugated amide bond of streptogramin A, and the sugar of Hm A2103 to hydrogen bond to the oxygen of the same amide bond. In addition, the nucleotides of the active site crevice undergo smaller conformational changes [7]. It appears that the far side of the base of Hm A2103 forms part of the binding site for streptogramin B. 3.7 Chloramphenicol
Chloramphenicol is widely used in research and in the clinic, although it is slightly toxic to human mitochondria [34]. Its chemical structure (Fig. 11) has been compared to nucleoside analogues such as puromycin, but biochemically the inhibition
17
18 Antibiotics and the Ribosome
Fig. 11 Chemical structures of antibiotics. Chemical structures of nucleosides are provided in the top row for comparison with the antibiotics shown in the second row. The third row shows chemical structures of antibiotics that only weakly resemble the nucleosides to which they have been compared. Fig. modified from [7].
of protein translation by chloramphenicol is more like that of MLSB antibiotics, a group with which it shares no chemical similarity. For example, many of the nucleotides that are protected from chemical modification by MLSB antibiotics (e.g., Ec A2058 and A2059) are also protected by chloramphenicol, and the same mutations confer resistance to both types of antibiotics (Fig. 4). Furthermore, oligopeptides are reported to accumulate when protein translation is inhibited by chloramphenicol or by macrolides [35]. However, chloramphenicol binding also protects Ec A2451 and C2452 at the active site hydrophobic crevice, and the only macrolide that protects Ec A2451 is Carbomycin A, evidently because of its isobutyrate extension. Furthermore, there are nucleotides protected by MLSB antibiotics that are not protected by chloramphenicol. The interactions of chloramphenicol with the ribosome have been studied for decades, and the some of the results are confusing. For example, the binding of chloramphenicol to the ribosome has been reported variously as competitive [36– 40], independent [41], and cooperative [40] with respect to macrolide binding. These inconsistencies may reflect the binding of chloramphenicol to two sites on the large ribosomal subunit that differ in affinity [42].
3 Antibiotics
Fig. 12 Chloramphenicol. A cutaway view of a space filled representation of the H. marismortui ribosome chloramphenicol bound at two sites, the active site hydrophobic crevice [4] and at the hydrophobic crevice at the entrance to the exit tunnel [7]. Both
of these binding sites are surrounded by nucleotides that upon mutation confer resistance to chloramphenicol (orange spheres) or that are protected by chloramphenicol from chemical modification (green spheres). Fig. from [7].
Two crystal structures of chloramphenicol bound to the ribosome are available. In one structure, chloramphenicol is observed to bind only at the active site hydrophobic crevice of the bacterial (D. radiodurans) ribosome [4]. In the other structure, chloramphenicol binds only at the hydrophobic crevice at the entrance to the exit runnel of an archaeal (H. marismortui) ribosome [7]. Both of these sites are surrounded by nucleotides implicated in chloramphenicol binding either by nucleotide protection studies or by mutational studies (Fig. 12). They probably correspond to the two sites inferred from biochemical experiments. Currently, it seems likely that the high affinity binding site of chloramphenicol in bacterial ribosomes is the active site hydrophobic crevice and that the low affinity site corresponds to the crevice at the entrance to the peptide exit tunnel. In addition, it seems reasonable that binding of chloramphenicol at the active site crevice might have a greater inhibitory effect on protein translation than binding at the more distant site. 3.8 Nucleoside Analogue Antibiotics
Many antibiotics have been described as nucleoside analogues, because they contain chemical groups that resemble portions of an aminoacylated nucleoside (Fig. 11). However, few of those antibiotics actually bind to the ribosome in a manner analogous with nucleosides. For example, chloramphenicol has been compared with the aminoacyl-nucleoside substrate puromycin. Its aromatic group has been compared both with the base of a nucleoside [43] and with an aromatic amino acid side chain [44]. Although, the aromatic group of chloramphenicol approaches the active site crevice from the same direction as the tyrosyl side chain of puromycin (Fig. 12), the binding of chloramphenicol to the ribosome is neither like that of an
19
20 Antibiotics and the Ribosome
amino acid side chain nor a nucleoside base (Fig. 12). Nevertheless, a few antibiotics contain chemical groups that are similar to portions of an aminoacylated nucleoside in chemical structure and in their binding interactions with the ribosome.
3.8.1 Puromycin The prototypical aminoacylated nucleoside analogue antibiotic is puromycin which inhibits the protein translation in all three domains of life. The chemical structure of puromycin is the same as that of tyrosylated adenosine, except for the presence of three added methyl groups and the replacement of an ester bond with an amide bond (Fig. 11). Puromycin mimics tyrosyl-tRNA so well that it binds to the A-site and gets incorporated into an elongating peptide. This leads to termination of translation because puromycin terminated peptides fall off the ribosome. Puromycin derivatives have been used crystallographically as peptidyl transferase substrates and have contributed to our understanding of the structure of the peptidyl transferase site (Fig. 5) [11, 16, 45].
3.8.2 Aminoacyl-4-aminohexosyl-cytosine Antibiotics The aminoacyl-4-aminohexosyl-cytosine group of antibiotics is classified based on chemical structures (Fig. 11) each of which contains a cytosine base. Included in this group are blasticidin S, gougerotin, amicetin and bamicetin [46]. These antibiotics are not used therapeutically. However, blasticidin S is an important antifungal used on rice crops. Only the crystal structure of blasticidin S bound to the ribosome is available. The chemical structure of blasticidin S resembles that of peptidyl-cytidine, although not as closely as puromycin resembles a tyrosylated adenosine (Fig. 11). In the crystal structure, blasticidin S binds to two sites on the ribosome (Fig. 13) and forms base-pairs with Hm G2284 and G2285 (Ec 2251 and 2252). These two guanines are almost totally conserved residues of the P-loop, and are known to base pair with the CCA end of tRNA in order to position the P-site substrate in the peptidyl transferase center [47]. The blasticidin S molecule that base pairs with Hm G2284 is the better ordered of the two, and its structure can be fit completely into unbiased electron density maps. The other blasticidin S molecule is represented by less complete electron density, and only its base and sugar groups can be modeled unambiguously. When the phosphates of rRNA are superimposed between one structure containing blasticidin S and another structure containing a P-site substrate bound to the ribosome, both the base and sugar groups of the well ordered blasticidin S molecule superimpose on C75 of a P-site substrate bound to the ribosome (Fig. 13). For the less ordered blasticidin S molecule that base pairs with Hm G2285, the superimposition with C74 of the P-site substrate is limited to only the nucleoside base. Therefore, blasticidin S, like puromycin, acts in the same manner as a nucleoside analogue in its interaction with the ribosome.
3 Antibiotics
Fig. 13 Blasticidin S. Blasticidin S (purple) base pairs to the P-loop of rRNA (grey). Its guanidinium tail forms hydrogen bonds (dotted lines) to the phosphate and stacks hydrophobically onto the base of Hm A2474, a nucleotide that is protected from
chemical modification (green) by blasticidin S. Superimposition of rRNA between crystal structures causes blasticidin S to become superimposed on C74 and C75 (yellow) of the analogue of a CCA end of tRNA. Fig. from [7].
The guanidinium tail of the well ordered blasticidin S binds hydrophobically to Hm A2474, which is protected by blasticidin S from chemical modification (Fig. 13). In addition, the nitrogens of its guanidinium tail hydrogen bond with the phosphate that connects Hm A2474 to U2473, a nucleotide that upon mutation confers resistance to blasticidin S [48]. These biochemical observations suggest that the blasticidin S molecule that is best ordered in the structure corresponds to the blasticidin S molecule that is also the most active physiologically. Whether or not binding of blasticidin S at the second site is also physiological relevant warrants further investigation. 3.9 Other Antibiotics that Bind to the 50S Subunit 3.9.1 Sparsomycin Sparsomycin inhibits protein synthesis in all organisms and consequently is not used as an antibiotic. However, it has been tested extensively for its antitumor activity [49]. Sparsomycin contains a pseudouracil base with a conjugated link to a sulfur containing tail (Fig. 11). This chemical structure has been compared with that of puromycin, and it has been suggested that sparsomycin binds to the A-site much like puromycin [50]. Sparsomycin does not bind to the ribosome unless a P-site substrate is also present. The structure of sparsomycin and a P-site substrate analogue simultaneously bound to the 50S ribosomal subunit is available. The binding of sparsomycin to the ribosome bears no similarity to the binding of puromycin. The pseudouracil group of sparsomycin binds on top of a P-site substrate analogue and is sandwiched
21
22 Antibiotics and the Ribosome
Fig. 14 Sparsomycin. Sparsomycin (green sticks) is sandwiched between a tRNA analogue (gray spheres) and Hm A2637 (white sticks) and forms numerous hydrogen or ionic bonds (dotted lines) with tRNA or rRNA. Fig. from [7].
between it and the mobile nucleotide Hm A2637 (Fig. 14). Sparsomycin’s sulfur containing tail extends from above the P-site into the A-site hydrophobic crevice, where it should compete for binding with an A-site substrate. These binding characteristics account for the failure of sparsomycin to bind the ribosome in the absence of a P-site substrate, its cooperative binding with a P-site substrate and for its inhibition of peptide bond formation. Each of the functional groups on the pseudouracil group forms favorable interactions with the P-site substrate or with the ribosome.
3.9.2 Anisomycin Like sparsomycin, anisomycin inhibits eukaryotic ribosomes and has been tested as an antitumor drug. Also like sparsomycin, the chemical structure of anisomycin has been compared with that of puromycin [44]. The crystal structure of anisomycin bound to the ribosome shows that anisomycin binds to the active site hydrophobic crevice (Fig. 15). The methoxyphenyl group of anisomycin, like that of puromycin, inserts into the active site hydrophobic crevice. However, the binding of anisomycin is unlike that of puromycin, because it approaches the active hydrophobic crevice from the opposite direction (Fig. 5). As the aromatic group of anisomycin stacks on Hm C2487 (Ec 2452), its sugar group occupies the oblong pocket between the two hydrophobic crevices (Figs. 6 and 15). The nitrogen of the sugar group forms a hydrogen bond with N3 of Hm C2487. When Hm C2487 is mutated to uridine, halophiles become resistant to anisomycin [51]. The simplest explanation is that this mutation eliminates the hydrogen bond by replacing the N4 hydrogen bond acceptor of cytidine with an N4 hydrogen bond donor of uridine. This hypothesis could be tested to provide insight into rational drug design principles by replacing N3 of anisomycin with an oxygen atom. This alternate form of anisomycin should preferentially bind the mutant ribosome and, at the same time, should be ineffective against the wild type ribosome.
4 Prospects for Rational Drug Design of Antibiotics that Bind to the Ribosome
Fig. 15 Anisomycin. Anisomycin (yellow) binds to the active site hydrophobic crevice forming numerous hydrogen or ionic bonds (dotted lines) with rRNA Fig. from [7].
4 Prospects for Rational Drug Design of Antibiotics that Bind to the Ribosome
The time for rational design of antibiotics that bind the ribosome has arrived. Currently, structures of 20 antibiotic/ribosome complexes are readily available in the Protein Data Bank (http://www.rcsb.org/pdb/). This chapter has discussed only half of them, the ones that bind to the 50S subunit. It is anticipated that rational drug design experiments will succeed in the near future, assuming the necessary precautions are taken. First, neither any X-ray nor any NMR structure should be accepted without question. The reliability of any given structure must be verified prior to being used as the basis for a rational drug design experiment. Analysis of X-ray crystallography statistics will reveal the extent of precision that can be expected of a structare. Evaluating the stereochemistry of a model by checking bond angles, bond distances, dihedral angles and van der Waals distances will reveal what level of confidence can be placed in a given structure. Ideally, such analysis will also include a complete set of X-ray diffraction amplitudes, although such data are not always released. The possibility of errors in deposited structures must be considered. Errors in chirality may be prevalent in high resolution small molecule structures. For example, the small molecule structure of streptogramin A is represented by its enantiomer in the Cambridge Structural Data Base. The effort to solve the structure of streptogramin A bound to the ribosome was initially hindered by relying on that incorrect small molecule structure. Likewise, the structure of anisomycin is incorrectly diagrammed as its enantiomer in most of the ribosomal literature. Fortunately, these chirality errors were identified when solving structures of these antibiotics bound to macromolecules. In fact, the identification of these errors may increase confidence in the reliability of these structures of complexes between ribosomes and antibiotics.
23
24 Antibiotics and the Ribosome
Knowledge of the precision of a particular X-ray structure may also be important in deciding whether or not a particular rational drug design experiment is feasible. For example, at a resolution of 3.0 Å, the rotamers for the links between the 16membered macrolides and sugars are reliable, with the possible exception of the forosamine sugar of spiramycin. However, the rotamers of smaller groups, such as the dimethyl amine of mycaminose (Fig. 9), are ambiguous. For most rational drug design experiments, this ambiguity will probably be irrelevant. However, for a rational drug design experiment involving that dimethylamino group, awareness that its conformation is ambiguous could be crucial. Another factor that may require analysis is the observation that some antibiotics bind to multiple sites. Tetracycline provides the most extreme example and binds to six separate sites on the small ribosomal subunit [52]. The existence of multiple binding sites raises the same challenges and opportunities that are raised by antibiotics that have multiple modes of binding to a single site, such as is reported for macrolides. Knowledge of which of these sites or which of these modes of binding is most physiologically relevant to the inhibitory effects of an antibiotic would be essential for certain rational drug design experiments. At the same time, identifying the less relevant sites to which an antibiotic binds at lower affinity may provide for an alternative strategy. A site which is physiologically irrelevant because its affinity for an antibiotic is low, may become physiologically relevant if an antibiotic can be redesigned to bind with higher affinity. Many structures of antibiotics bound to the ribosome are available and are of sufficient quality (Tab. 1) to warrant rational drug design experiments. If the proper precautions are taken, it is expected that some of these experiments will succeed in the near future. Acknowledgments
I thank Thomas A. Steitz and Peter B. Moore for directing my research on antibiotics bound to the ribosome and for reading this manuscript, Nenad Ban and Poul Nissen for earlier work on the structure of the 50S subunit, Betty Freeborn for technical help and useful scientific discussions, Jimin Wang for help with crystallographic software, Joe Ippolito and Martin Schmeing for help with some antibiotic experiments, Nukri Sanishvili and Andres Joachimiak for help with data collection at ID19 at APS. Much of the research that this chapter is based on was supported by National Institutes of Health Grant (GM22778) to Thomas A. Steitz, and by a grant from Agouron Institute to T.A.S. and Peter B. Moore.
References 1 BAN, N., NISSEN, P., HANSEN, J., MOORE, P., STEITZ, T., The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science 2000, 289,
905–920. 2 WIMBERLY, B. T., BRODERSEN, D. E., CLEMONS, W. M., MORGAN-WARREN, R. J., CARTER A. P., VONRHEIN, C., HARTSCH,
References
3
4
5
6
7
8
9
10
11
T., RAMAKRISHNAN, V., Structure of the 30S ribosomal subunit. Nature 2000, 407, 327–339. SCHLUENZEN, F., TOCILJ, A., ZARIVACH, R., HARMS, J., GLUEHMANN, M., JANELL, D., BASHAN A. BARTELS, H., AGMON, I., FRANCESCHI, F., YONATH, A., Structure of functionally activated small ribosomal subunit at 3.3 angstroms resolution. Cell 2000, 102, 615–623. SCHLUENZEN, F., ZARIVACH, R., HARMS, J., BASHAN, A., TOCILJ A. ALBRECHT, R., YONATH, A., FRANCESCHI, F., Structural basis for the interaction of antibiotics with the peptidyl transferase centre in eubacteria. Nature 2001, 413, 814–821. BRODERSEN, D. E., CLEMONS, W. M., CARTER A. P., MORGAN-WARREN, R. J., WIMBERLY, B. T., RAMAKRISHNAN, V., The structural basis for the action of the antibiotics tetracycline, pactamycin, and hygromycin B on the 30S ribosomal subunit. Cell 2000, 103, 1143–1154. CARTER A. P., CLEMONS W. M.J., BRODERSEN, D. E., MORGAN-WARREN, R. J., WIMBERLY, B. T., RAMAKRISHNAN, V., Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics. Nature 2000, 407, 306–307. HANSEN, J. L., MOORE, P., Steitz, T, Co-crystal structures of five antibiotics bound at the peptidyl transferase center of the 50S ribosomal subunit. J. Mol. Biol. 2003, 330, 1061–1075. HANSEN, J., IPPOLITO, J., BAN, N., NISSEN, P., MOORE, P., STEITZ, T., The structures of four macrolide antibiotics bound to the large ribosomal subunit. Mol. Cell 2002, 10, 1–20. KLEIN, D. J., SCHMEING, T. M., MOORE, P. B., STEITZ, T. A., The kinkturn; a new RNA motif. EMBOJ. 2001, 20, 4214–4221. HARMS, J., SCHLUENZEN, F., ZARIVACH, R., BASHAN A. GAT, S., AGMON, I., BARTELS, H., FRANCESCHI, F., YONATH, A., High resolution structure of the large ribosomal subunit from a mesophilic eubacterium. Cell 2001, 107, 679–688. NISSEN, P., HANSEN, J., BAN, N., MOORE, P., STEITZ, T., The structural basis of
12
13
14
15
16
17
18
19
20
ribosome activity in peptide bond synthesis. Science 2000, 289, 920–930. OGLE, J. M., BRODERSEN, D. E., CLEMONS, W. M., Jr., TARRY, M. J., CARTER A. P., RAMAKRISHNAN, V., Recognition of cognatetransfer RNA by the 30S ribosomal subunit. Science 2001, 292, 897–902. NOLLER, H. F., WOESE, C. R., Secondary structure of 16S ribosomal RNA. Science 1981, 212, 403–411. NOLLER, H. F., KOP, J., WHEATON, V., BROSIUS, J., GUTELL, R. R., KOPYLOV, A. M., DOHME, F., HERR W., STAHL, D. A., GUPTA, R., WAESE, C. R., Secondary structure model for 23 S ribosomal RNA. Nucleic Acids Res. 1981, 9, 6167–6189. CANNONE, J. J., SUBRAMANIAN, S., SCHNARE, M. N., COLLETT, J. R., D’SOUZA, L. M., DU Y. FENG, B., LIN, N., MADABUSI, L. V., MULLER, K. M., PANDE, N., SHANG, Z., YU, N., GUTELL, R. R., The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for ribosomal intron, and other RNAs. BMC Bioinformatics 2002, 3, 2–31. SCHMEING, T. M., SELLA, A.C., HANSEN, F. L., FREEBORN, B., SOUKUP, J. K., SCARINGE, S. A., STROBEL, S. A., MOORE, P. B., STEITZ, T. A., A pre-translocation intermediate in protein synthesis observed in crystals of enzymatically active 50S subunits. Nat. Struct. Biol. 2002, 9, 225–230. SCHONEELD, W., KIRST, H. A., (eds.) Macrolide Antibiotics. Milestones in Drug Therapy. 2002, Birkhauser, Basel. CONTRERAS, A., VAZQUEZ, D., Cooperative and antagonistic interactions of peptidyl-tRNA and antibiotics with bacterial ribosomes. Eur. J. Biochem. 1977, 74, 539–547. PESTKA, S., Studies on transfer ribonucleic acid-ribosome complexes. Effect of antibiotics on peptidyl puromycin synthesis on polyribosomes from Escherichia coli. J. Biol. Chem. 1972, 247, 4669–4678. MAO, J. C. H., ROBISHAW, E. E., Effects of macrolides on peptide-bond formation and translocation. Biochemistry 1971, 10, 2054–2061.
25
26 Antibiotics and the Ribosome
21 MCGHEE, J., VON HIPPEL, P., Formaldehyde as a probe of DNA structure. I. Reaction with exocyclic amino groups of DNA bases. Biochemistry 1977, 14, 1281–1303. 22 KIRST, H., TOTH, J. E., DEBONO, M., WILLARD, K. E., TRUEDELL, B. A., OTT, J. L., COUNTER, F. T., FELTY-DUCKWORTH, A. M., PEKAREK, R. S., Synthesis and evaluation of tylosin-related macrolides modified at the aldehyde function: a new series of orally effective antibiotics. J. Med. Chem. 1988, 31, 1631–1641. 23 KIRST, H. A., WILLARD, K. E., DEBONO, M., TOTH, J. E., TRUEDELL, B. A., LEEDS, J. P., OTT, J. L., FELTY-DUCKWORTH, A. M., COUNTER, F. T., Structure-activity studies of 20-deoxo-20-amino derivatives of tylosin-related macrolides. J. Antibiot. 1989, 42, 1673–1683. 24 NARANDJA, A., SUSKOVIC, B., KELNERIC, Z., DJOKIC, S., Structure-activity relationship among polyhydro derivatives of tylosis J. Antibiot. 1993, 47, 581–587. 25 OMURA, S., KATAGIRI, M., UMEZA, I., KOMIYAMA, K., MAEKAWA, T., SEKIKAWA, K., MATSUMAE, A., HATA, T., Structure-biological activities relationships among leucomycins and their derivatives. J. Antibiot. 1968, 21, 532–538. 26 OMURA, S., MIYANO, K., MATSUBARA, H., NAKAGAWA, A., Novel dimeric derivatives of leucomycfns and tylosin, sixteen-membered macrolides. J. Med. Chem. 1982, 25, 271–275. 27 OMURA, S., TISHLER, M., NAKAGAWA, Y., HIRONAKA, Y., HATA, T., Relationship of structures and microbiological activities of the 16-membered macrolides. J. Med. Chem. 1972, 15, 1011–1015. 28 WHALEY, H. A., PATTERSON, E. L., DORNBUSH, A. C., BACKUS, E. J., BOHONOS, N., Isolation and characterization of relomycin a new antibiotic. Antimicrob. Agents Chemother. 1964, 1963, 45. 29 DEPARDIEU, F., COURVALIN, P., Mutation in rRNA responsible for resistance to 16-membered macrolides and streptogramins in Streptococcus pneumoniae. Antimicrob. Agents Chemother. 2001, 45, 319–323.
30 BERTHO, G., LADAM, P., GHARBI-BENAROUS, J., DELAEORGE, M., GIRAULT, J., Solution conformation of methylated macrolide antibiotics roxithromycin and erythromycin using NMR and molecular modelling. Ribosome-bound conformation determined by TRNOE and formation of cytochrome P450-metabolite complex. Int. J. Biolog. Macromol. 1998, 22, 103–127. 31 DI GLAMBATTISTA, M., HUMMEI, H., BOCK, A., COCITO, C., Action of synergimycins and macrolides on in vivo and in vitro protein synthesis in archaebacteria. Mol. General Genetics 1985, 299, 323–329. 32 CHINALI, G., MOUREAU, P., COCITO, C. G., The action of virginiamycin M on the acceptor, donor, and catalytic sites of peptidyltransferase. J. Biol. Chem. 1984, 259, 9563–9568. 33 PESTKA, S., Studies on the formation of transfer ribonucleic acid-ribosome complexes. XI. Antibiotic effects on phenyl-alanyloligonucleotide binding to ribosomes. Proc. Natl. Acad. Sci. USA 1969, 64, 709–714. 34 LAM, R. F., LAI, J. S., NG, J. S., RAO, S. K., LAW, R. W., LAM, D. S., Topical chloramphenicol for eye infections. Hong Kong Med. J. 2002, 8, 44–47. 35 RHEINBERGER, H. J., NIERHAUS, K. H., Partial release of AcPhe-Phe-tRNA from ribosomes during poly(U)-dependent poly(Phe) synthesis and the effects of chloramphenicol. Eur. J. Biochem. 1990, 193, 643–650. 36 WOLEE, A. D., HAHN, F. E., Mode of action of chloramphenicol: IX. Effects of chloramphenicol upon a ribosomal amino acid polymerization system and its binding to bacterial ribosome. Biochim. Biophys. Acta 1965, 95, 146–155. 37 GRIVELL, L. A., NETTER, P., BORST, P., SLONIMSKI, P., Mitochondrial antibiotic resistance in yeast: ribosomal mutants resistant to chloramphenicol, erythromycin and spiramycin. Biochim. Biophys. Acta 1973, 312, 358–367. 38 VAZQUES, D., Antibiotics which affect protein synthesis: the uptake of 14 C-chloramphenicol by bacteria.
References
39
40
41
42
43
44
45
46
47
Biochem. Biophys. Res. Commun. 1963, 12, 409–413. VAZQUES, D., Antibiotics affecting chloramphenicol uptake by bacteria, their affect on amino acid incorporation in a cell-free system. Biochim. Biophys. Acta 1966, 2 24, 289–295. LANGLOIS, R., CANTOR, C. R., VINCE, R., PESTKA, S., Interaction between erythromycin and chloramphenicol binding sites on the Escherichia coli ribosome. Biochemistry 1977, 26, 2349–2355. OLEINICK, N. L., WILHELM, J. M., CORCORAN, J. W., Nonidentity of the site of action of erythromycin A and chloramphenicol on Bacillus subtilis ribosomes. Biochim. Biophys. Acta 1967, 255, 290–292. YUKIOKA, M., HATAYAMA, T., MORISAWA, S., Affinity labeling of the ribonucleic acid component adjacent to the peptidyl recognition center of peptidyl transferase in Escherichia coli ribosomes. Biochim. Biophys. Acta 1975, 390, 192–208. COUTSOGEORGOPOULOS, C. Amino acylaminonucleoside inhibitors of protein synthesis. The effects of amino acyl ribonucleic acid on the inhibition. Biochemistry 1967, 6, 1704–1711. KIRILLOV, S., PORSE, B. T., VESTER, B., WOOLLEY, P., GARRETT, R. A., Movement of the 3 -end of tRNA through the peptidyl transferase centre and its inhibition by antibiotics. FEES Lett. 1997, 406, 223–233. HANSEN, J. L., SCHMEING, T.M., MOORE, P. B., STEITZ, T. A., Structural insights into peptide bond formation. Proc. Natl. Acad. Sci. USA 2002, 99, 11670–11765. LICHTENTHALER, F. W., TRUMMLITZ, G., Structural basis for inhibition of protein synthesis by the aminoacyl-aminohexosyl-cytosine group of antibiotics. FEBS Lett. 1974, 38, 237–242. SAMAHA, R. R., GREEN, R., NOLLER, H. F., A base pair between tRNA and 23S rRNA in the peptidyl transferase centre of the ribosome. Nature 1995, 377, 309–314.
48 PORSE, B., RODRIGUEZ-FONSECA, C., LEVIEV, I., GARRETT, R. A., Antibiotic inhibition of the movement of tRNA substrates through a peptidyl transferase cavity. Biochem. Cell Biol. 1995, 73, 877–885. 49 HOFS, H. P., WAGENER, D. J., de VALK-BAKKER, V., van RENNES, H., de VOS, D., DOESBURG, W. H., OTTENHEIJM, H. C., de GRIP, W. J., Schedule-dependent enhancement of antitumor activity of ethyldeshy-droxysparsomycin in combination with classical antineoplastic agents. Anticancer Drugs 1995, 6, 277–284. 50 FLYNN, G. A., ASH, R. J., Necessity of the sulfoxide moiety for the biochemical and biological properties of an analog of sparsomycin. Biochem. Biophys. Res. Commun. 1983, 114, 1–7. 51 HUMMEL, H., BOCK, A., 23 S ribosomal RNA mutations in halobacteria conferring resistance to the anti-80S ribosome targeted antibiotic anisomycin. Nucleic Acids Res. 1987, 15, 2431–2443. 52 PLOLETTI, M., SCHLUENZEN, F., HARMS, J., ZARIVACH, R., GLUHMANN, M., AVILA, H., BASHAN A. BARTELS, H., AUERBACK, T., JACOB, C., HARTSCH, T., YONATH, A. Crystal structures of complexes of the small ribosomal subunit with tetracycline, edeine and IF3. EMBO J. 2001, 20, 1829–1839. 53 YUSUPOV, M. M., YUSUPOVA, G. Z., BAUCOM, A., LIEBERMAN, K., EARNEST, T., CATE, J., NOLLER, H. F., Crystal structure of the ribosome at 5.5 Å resolution. Science 2001, 292, 883–896. 54 DOUTHWAITE, S., Interaction of the antibiotics clindamycin and lincomycin with Escherichia coli 23S ribosomal RNA. Nucleic Acids Res. 1992, 20, 4717–4720. 55 MOAZED, D., NOLLER, H. F., Chloramphenicol, erythromycin, carbomycin, and vernamycin B protect overlapping sites in the peptidyl transferase region of 23S ribosomal RNA. Biochimie 1987, 69, 879–884. 56 RODRIGUEZ-FONSECA, C., AMILS, R., GARRETT, R. A., Fine structure of the peptidyl transferase centre on 23S-like
27
28 Antibiotics and the Ribosome
57
58
59
60
61
rRNAs deduced from chemical probing of antibiotic-ribosome complexes. J. Mol. Biol. 1995, 247, 224–235. MOAZED, D., NOLLER, H. F., Sites of interaction of the -CCA end of peptidyl-tRNA with 23S rRNA. Proc. Natl. Acad. Sci. USA 1991, 88, 3725–3728. PORSE, B. T., GARRETT, R. A., Sites of interaction of streptogramin A and B antibiotics in the peptidyl transferase loop of 23S rRNA and the synergism of their inhibitory mechanisms. J. Mol. Biol. 1998, 286, 375–387. KEARSEY, S. E., CRAIG, I. W., Altered ribosomal RNA genes in mitochondria from mammalian cells with chloramphenicol resistance. Nature 1981, 290, 607–608. BLANC, H., WRIGHT, C. T., BIBB, M. J., WALLACE, D. C., CLAYTON, D. A., Mitochondrial DNA of chloramphenicol-resistant mouse cells contains a single nucleotide change in the region encoding the 3 end of the large ribosomal RNA. Proc. Natl. Acad. Sci. USA 1981, 78, 3789–3793. ETTAYEBI, M. S., PRASAD, S. M., MORGAN, E. A., Chloramphenicol-erythromycin resistance mutations in a 23 S rRNA gene of Escherichia coli. J. Bacterial. 1985, 162, 551–557.
62 VESTER, B., GARRETT, R. A., The importance of highly conserved nucleotides in the binding region of chloramphenicol at the peptidyl transferase centre of Escherichia coli 23S ribosomal RNA. EMBO J. 1988, 7, 3577–3587. 63 DUJON, B., Sequence of the intron and flanking exons of the mitochondrial 21S rRNA gene of yeast strains having different alleles at the rib-1 loci. Cell 1980, 20, 185–197. 64 BLANC, H., ADAMS, C. W., WALLACE, D. C., Different nucleotide changes in the large rRNA gene of the mitochondrial DNA confer chloramphenicol resistance on two human cell lines. Nucleic Acids Res. 1981, 9, 5785–5795. 65 TAN, G. T., DEBLASIO A. MANKIN, A. S., Mutations in the peptidyl transferase center of 23 S rRNA reveal the site of action of sparsomycin, a universal inhibitor of translation. J. Mol. Biol. 1996, 261, 222–230. 66 LAZARO, E., RODRIGUEZ-FONSECA, C., PORSE, B., URENA, D., GARRETT, R. A., BALLESTA, J. P. G., A sparsomycin-resistant mutant of Halobacterium salinarium lacks a modification at nucleotide U2603 in the peptidyl transferase centre of 23 S rRNA J. Mol. Biol. 1996, 261, 231–238.
1
Ribosome-associated Proteins Sabine Rospert
Universit¨at Freiburg, Freiburg, Germany
Matthias Gautschi ETH Z¨urich-Swiss Federal Institute of Technology, Z¨urich, Switzerland
Magdalena Rakwalska and Uta Raue Universit¨at Freiburg, Freiburg, Germany
Originally published in: Protein Folding Handbook. Part II. Edited by Johannes Buchner and Thomas Kiefhaber. Copyright ľ 2005 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30784-2
1 Introduction
Translation of the genetic message into a polypeptide is carried out by ribosomes. Eukaryotic ribosomes are macromolecular structures of about 4000 kDa consisting of two-thirds RNA and one-third protein [1–3]. The structures of the 30S subunit of the eubacterium Thermus thermophilus [4] and the 50S subunit of the archaebacterium Haloarcula marismortui [5] and of the eubacterium Deinococcus radiodurans [6] have been solved by X-ray crystallography. Although the core of the ribosome is conserved in all organisms, eukaryotes contain an additional rRNA molecule plus 20–30 more ribosomal proteins than do prokaryotes. So far neither the 60S nor the 40S eukaryotic ribosomal subunits are available in atomic detail. Our current understanding of the eukaryotic ribosome comes from a combination of cryo-electron microscopy, crystallography data on individual components, and homology modeling based on the prokaryotic ribosome [7–10]. The key event in polypeptide synthesis is the catalysis of peptide bond formation. The peptidyl transferase (PT) center is located on the large ribosomal subunit and is composed of RNA (23S rRNA) [5, 11]. A mechanism of catalysis was proposed based on the arrangement of RNA elements in the active site and their interactions with an intermediate-state analogue [11]. The newly formed polypeptide leaves the ribosome through a tunnel. First observed in the mid-1980s, the tunnel was assumed to be a
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Ribosome-associated Proteins
passive path to exit the ribosome [12, 13]. The crystal structure of the H. marismortui 50S subunit revealed that the tunnel from the site of peptide synthesis to its exit is about 10 nm in length and between 1 and 2 nm in width. The tunnel can accommodate a polypeptide chain of approximately 40 amino acids [5, 11] and is formed by RNA and protein (23S rRNA, L4, L22, L39e: H. marismortui nomenclature; compare Table 1). Its surface is largely hydrophilic and includes exposed hydrogen-bonding groups from bases, phosphates, and polar amino acids. It has been suggested that this results in a Teflon-like tunnel surface, similar to the chaperonin GroEL [14] in its non-binding conformation [11]. However, recent findings suggest that the tunnel may very well interact with the nascent polypeptide chain and play a rather active role in regulating translation events downstream of polypeptide bond formation. An example for such an active regulatory role of the ribosomal tunnel came from mechanistic studies on co-translational translocation into the mammalian endoplasmic reticulum (ER). Specific nascent polypeptides harbored within the ribosomal tunnel have been shown to induce structural changes in the ER translocon upon ribosome
Table 1 Yeast ribosomal proteins with homology to ribosomal proteins of Haloarcula maris-
mortui localized in close proximity to the polypeptide tunnel exit [5, 6, 11]. The name of the yeast homologue is given according to the Rp nomenclature [156]
Family of Phenotype of ribosomal H. S. the deletion in proteins marismortui E. coli cerevisiae S. cerevisiae
Attachment site for ribosome-associated proteins
Ribosomal proteins close to the exit of the polypeptide tunnel L24
L24
L24
L22
L22
L22
L23
L23
L23
Rpl26a/b rpl26a: viable rpl26b: viable [157] Rpl17a/b rpl17a: lethal rpl17b: viable [157] Rpl25 rpl25: lethal [158]
Rpl35a/b rpl35a: viable rpl35b: viable [70, 157] L19e L19 – Rpl19a/b rpl19a: viable rpl19b: viable rpl19a rpl19b: lethal [157, 159] L39e L39e – Rpl39 rpl39: viable [160] L31e L31e – Rpl31a/b rpl31a: viable rpl31b: viable [157] Ribosomal proteins lining the exit tunnel L29
L29
L29
L4e/L4p
L4
L4
L22 L39e
L22 L39e
L22 –
rpl4a: viable rpl4b: viable [161] Rpl17a/b Compare above Rpl39 Compare above Rpl4a/b
Binding site for TF [46] and SRP [51] in E. coli. Binding site for eukaryotic SRP [36] Binding site for eukaryotic SRP [36]
1 Introduction 3
binding [15, 16]. This finding suggests that a polypeptide is able to affect the process of membrane translocation from within the tunnel. Recently, a number of studies have suggested sequence-specific interactions between the exit tunnel and nascent peptides [17]. Thus, the original idea of a passive tunnel seems to be no longer valid. Significant interaction of the tunnel surface with newly synthesized polypeptides raises the question of what drives translocation of the polypeptide. Besides the main tunnel, there might be several additional routes for a polypeptide to leave the ribosome [18]. It has been suggested that the translation state of the ribosome influences the tunnel (system) via communication between the small and large ribosomal subunits [18]. The basis of this idea is that it might be important to hold the polypeptide chain in a defined position during peptide bond synthesis. Subsequently, opening of the tunnel would allow movement of the polypeptide and provide space for the next amino acid to be added. Indeed, the diameter of the tunnel entrance of a translating ribosome has been found to be significantly larger than that of a non-translating ribosome. The tunnel exit is surrounded by a number of ribosomal proteins that are thought to provide the binding sites for additional proteins affecting the newly synthesized polypeptide chain upon exit from the tunnel (Figure 1 and Table 1). These ribosome-
Fig. 1 Potential arrangement of yeast ribosomal proteins in close proximity to the polypeptide exit tunnel. The model is based on the crystal structure of the large ribosomal subunit of the archaebacterium Haloarcula marismortui [5, 11]. Archaebacterial ribosomes share a high degree of similarity with eukaryotic ribosomes. The mammalian
homologues of Rpl25 and Rpl35 have been identified as the binding sites for SRP [36]. The eubacterial L23 (yeast homologue of Rpl25) was identified as the binding site for trigger factor and bacterial SRP [46, 51]. Proteins present in only archaebacteria and eukaryotes are lightly shaded (compare also Table 1).
4 Ribosome-associated Proteins Table 2 Major ribosome-associated proteins and complexes in yeast (in alphabetical order)
Subunits (protein family) NAC (Nascent polypeptideassociated complex) NatA (N α -acetyltransferase A) RAC (Ribosome-associated complex) Ssb1p Ssb2p SRP (Signal recognition particle)
Direct binding to the Interaction with nascent polypeptide Selected the ribosome via: chain references
α NAC = Egd2p β NAC β 1 NAC = Egd1p β 2 NAC = Btt1p Nat1p Nat1p Ard1p Nat5p
α NAC β NAC
[21, 29], [154, 162]
Nat1p
[31, 132], [138, 139]
Zuotin (Hsp40) Ssz1p (Hsp70)
Zuotin
–
[63, 71], [87, 107]
− (Hsp70)
Ssb1/2p
Ssb1/2p
[62, 66, 87]
Srp72p Srp68p Sec65p Srp54p Srp21p Srp14p scR1 (RNA)
Srp54p
Srp54p
[23, 24, 36], [163–165]
Srp14p + part of scR1 (Alu-domain)
associated proteins are thought to serve a variety of functions; however, their exact cellular roles are only poorly understood. One subgroup of proteins is localized so closely to the nascent polypeptide chain that they can be covalently linked by cross-linking reagents. Some belong to the classical chaperone families of Hsp70 and Hsp40 homologues [19], while others have enzymatic functions and covalently modify nascent polypeptides [20]. Still others lack homologues in other compartments of the cell, and their role in protein biogenesis is currently unclear [21]. This review focuses on our current knowledge of ribosome-associated proteins affecting nascent polypeptides in the eukaryote Saccharomyces cerevisiae (Table 2) and the eubacterium Escherichia coli.
2 Signal Recognition Particle, Nascent Polypeptide–associated Complex, and Trigger Factor 2.1 Signal Recognition Particle
The function of signal recognition particle (SRP) is specific for a subgroup of newly synthesized proteins and is conserved throughout all kingdoms of life [22]. In eukaryotes, SRP acts on nascent polypeptide chains, which are either secreted or integrated into the membrane of the endoplasmic reticulum, and connects their
2 Signal Recognition Particle, Nascent Polypeptide–associated Complex, and Trigger Factor
translation with translocation. Eukaryotic SRP is a multi-protein complex containing an RNA molecule. In yeast, SRP is not essential; however, cells lacking functional SRP grow only very poorly and show severe translocation defects [23]. In prokaryotes, SRP-dependent export is responsible for the translocation of the majority of proteins targeted to the membrane and periplasmic space. Its structure is less complex than that of its eukaryotic counterpart. E. coli SRP consists of only the protein Ffh and a 4.5S RNA molecule [22, 24, 25]. In both eukaryotes and prokaryotes, SRP recognizes N-terminally localized signal sequences co-translationally. Signal sequences share no specific consensus on the amino acid level but display common structural properties: they possess the capability to adopt an α-helical conformation, plus they contain a central hydrophobic core [24, 25]. In eukaryotes, binding of SRP to nascent polypeptides leads to a translational arrest (Figure 2B) and is required for the productive interaction of the complex, consisting of ribosome and nascent chain (RNC), with the translocation machinery [22]. 2.2 An Interplay between Eukaryotic SRP and Nascent Polypeptide–associated Complex?
Nascent polypeptide–associated complex (NAC) is a heterodimeric complex present in all eukaryotic cells [21]. In eubacteria, NAC homologues have not been detected thus far. Several studies based on cross-linking approaches indicate that both subunits of NAC are located in close proximity to ribosome-bound nascent poly-peptide chains [26–31]. These cross-link experiments suggest that NAC does not display pronounced sequence specificity but ubiquitously binds to nascent polypeptide chains. In the mammalian system, NAC cross-links have been detected to nascent polypeptide chains as short as 17 amino acids in length [26, 32]. In yeast, the minimal length of the nascent polypeptide chain forming a cross-link to NAC has been determined to be 40–50 amino acids [31]. Based on the structure of the ribosomal tunnel (compare Section 1); [11], the nascent polypeptide chain is predicted to exit the ribosome at a length of approximately 40 amino acids. In yeast, NAC would thus bind just after the polypeptide emerges from the ribosome. After termination of translation, NAC does not stably interact with newly synthesized polypeptides. Binding to proteins after release from the ribosome has not been detected so far [26, 28]. It has been suggested that SRP and NAC compete for the same binding site on the ribosome [33, 34]. This competition might facilitate SRP’s sampling of nascent polypeptide chains for signal sequences, which is achieved by binding and release of SRP from the ribosome during the elongation process. NAC might increase the rate at which SRP dissociates from RNCs lacking signals. However, it is presently unknown exactly how NAC influences SRP’s function [33, 35]. Recently it was shown that the Srp54p subunit of SRP is positioned close to ribosomal proteins L23a (corresponding to Rpl25 in yeast) and L35 (corresponding to Rpl35a/b in yeast) at the exit site of the mammalian ribosome [36] (Figure 1). Upon binding of SRP to its receptor in the ER membrane, Srp54p is repositioned and loses contact with L23a [36].
5
6 Ribosome-associated Proteins
Fig. 2 Hypothetical arrangement of ribosome-associated proteins at the yeast polypeptide tunnel exit. (A) Ribosomebound factors that bind to nascent polypeptide chains in a sequence-independent manner. NAC (nascent polypeptide–associated complex), the Hsp70 homologue Ssb1/2p, and the N" -acetyltransferase NatA interact with nascent polypeptide chains independent of their amino acid sequence. NAC and Ssb1/2p require a minimal polypeptide length of 40–50 amino acids [31]. Binding of NAC and Ssb1/2p precedes binding of Nat1p, which is in close proximity only to nascent polypeptide chains longer than 70
amino acids [31]. Whether NAC, Ssb1/2p, and NatA simultaneously bind to one and the same ribosome (as suggested here) or independently to different molecules is currently unknown. The chaperone complex RAC and the methionine aminopeptidase Map1p bind to translating ribosomes; however, direct binding to the nascent polypeptide has so far not been observed. RAC functionally interacts with Ssb1/2p and potentially binds in its close proximity. The binding site for Map1p should enable the protein to act prior to NatA, which N" acetylates residues generated by Map1p (for details, compare text).
2.3 Interplay between Bacterial SRP and Trigger Factor?
Trigger factor (TF) is a protein found in eubacteria; homologues in eukaryotes have so far not been detected. TF was initially described as a ribosome-bound protein involved in protein export [37, 38]. Later TF was shown to possess peptidylprolyl isomerase activity and to localize in close proximity to a variety of nascent
2 Signal Recognition Particle, Nascent Polypeptide–associated Complex, and Trigger Factor
Fig. 2 (B) Signal recognition particle (SRP) binds to secretory nascent polypeptides. SRP binds through the M-domain of Srp54p to signal peptides of membrane or secretory proteins. This causes a transient arrest of translation, possibly through the interaction of the Alu-domain with the A-site of the peptidyl-transferase center (discussed in Ref. [155]). The numbers indicate the posi-
tion of the mammalian SRP subunits on the RNA molecule (for recent reviews on structural details, see Refs. [25, 155]). Whether secretory proteins interact simultaneously with SRP plus one or more of the sequenceindependent nascent-chain-binding factors shown in (A) is currently unknown (compare also Table 2).
polypeptide chains [39–41]. TF is dispensable under normal growth conditions, and its deletion causes only a mild phenotype [42–44]. TF binds to the large ribosomal subunit [38, 45]. Recently it was shown that L23 (homologue of yeast Rpl25, Table 1), a ribosomal protein in close proximity to the exit tunnel, serves as the attachment site [46]. TF binds to both the ribosome and unfolded substrate proteins with an affinity in the sub- to low-micromolar range [38, 47–50]. However, the dynamics of interaction differ significantly. While the TFribosome complex is stable for about 30 s (2 µM concentration of TF and ribosome, 20 ◦ C), binding to unfolded proteins is very transient (100 ms at 15 ◦ C) [47]. As the binding sites for the ribosome and the nascent polypeptide chain reside in distinct domains, their differences in kinetic properties might enable TF to scan a growing polypeptide chain while at the same time remaining bound to the ribosome. Like TF, SRP was localized in close proximity to ribosomal protein L23 with a cross-linking approach. In contrast to eukaryotic SRP, no cross-link to L29
7
8 Ribosome-associated Proteins
(homologue of yeast Rpl35a/b, Table 1) was detected [51] (Table 1). The crosslinking pattern between L23 and SRP was unchanged in the presence of the SRP receptor FtsY; however, it was modified by the presence of a nascent signal peptide. No contacts to other ribosomal proteins were found. Given the fact that L23 exposes only a small surface at the exit of the tunnel [6], it seems unlikely that TF and SRP can bind simultaneously. Rather, they might alternate in transient binding to the ribosome until a nascent polypeptide chain emerges [52]. Upon recognition of the nascent polypeptide, binding of either SRP or TF might be stabilized. Alternatively, the ribosome may sense the nature of the nascent peptide already within its exit tunnel [15, 53–55]. In this scenario, the ribosome might actively recruit SRP or TF to polypeptides. L23 clearly plays a central role in the fate of the newly synthesized polypeptide chain. 2.4 Functional Redundancy: TF and the Bacterial Hsp70 Homologue DnaK
By transient association with short peptide segments, Hsp70 chaperones assist a large variety of processes, such as protein translocation, folding, and degradation (compare also below) [56]. The bacterial Hsp70 homologue DnaK is not itself directly bound to the ribosome; however, a fraction of DnaK in the cell is associated with nascent polypeptides. In E. coli the simultaneous deletion of both TF and DnaK is lethal and leads to the aggregation of a number of newly synthesized proteins [43, 44]. This synthetic effect suggests that TF and DnaK have overlapping functions and assist in the folding of newly synthesized polypeptides. How these structurally very different proteins cope with the newly synthesized polypeptide chain is currently only poorly understood. It is clear, however, that DnaK does not directly become associated with the ribosome in the absence of TF [57]. Recently it was shown that Bacillus subtilis can survive in the absence of DnaK and TF, suggesting that yet another factor can functionally complement for Dnak and TF [58].
3 Chaperones Bound to the Eukaryotic Ribosome: Hsp70 and Hsp40 Systems
The family of weak ATPases termed Hsp70s (70-kDa heat shock proteins) have three domains: a highly conserved N-terminal ATPase domain, a less well-conserved peptide-binding domain, and a C-terminal variable domain [56, 59, 60]. Hsp70s assist in a variety of processes in all kingdoms of life. Most importantly, they are involved in protein folding, translocation of proteins across cellular membranes, cooperation with the protein degradation machinery, and in the assembly of oligomeric complexes [56, 61]. The common principle that enables Hsp70s such broad functionality is their ability to interact with a variety of short, hydrophobic peptide segments exposed in partly folded or unfolded proteins. The interaction of Hsp70s with their substrates is regulated by an ATPase cycle controlling its affinity for peptide segments. Yeast contains a number of different Hsp70 subfamilies in the
3 Chaperones Bound to the Eukaryotic Ribosome: Hsp70 and Hsp40 Systems 9
cytosol. Two subfamilies, Ssb and Ssz, are ribosome-associated [62, 63] (Figure 2). Published data are inconsistent regarding ribosome association of the essential Ssa subfamily (consisting of Ssa1, Ssa2, Ssa3, and Ssa4) ([64–66]; compare Section 3.1). A number of co-chaperones modulate Hsp70s’ function and tailor them for specific cellular processes [56, 61]. The best-studied group of Hsp70 co-chaperones are the Hsp40 homologues [56, 67–69]. Yeast contains more than 20 different Hsp40s, all containing a common domain of approximately 80 amino acids, the so-called J-domain. Two of the yeast Hsp40s have been localized on the ribosome: Sis1p and zuotin [70, 71]. The primary function of Hsp40s is to mediate ATP hydrolysis– dependent interaction of peptide segments with the peptide-binding domain of Hsp70. The J-domain is essential for this regulation and stimulates the rate of ATP hydrolysis by interacting with the ATPase domain of Hsp70s. In addition, some Hsp40s can associate with substrate proteins and transfer them to the Hsp70 partner. The mechanism of this transfer reaction is not very well understood [56, 68]. 3.1 Sis1p and Ssa1p: an Hsp70/Hsp40 System Involved in Translation Initiation?
Sis1p is an essential Hsp40 homologue localized in the cytosol of yeast [72]. Initial experiments suggested that Sis1p was mainly bound to the small ribosomal subunit. This binding was less stable than that of other ribosome-bound chaperones and was strongly decreased at salt concentrations higher than 50 mM KCl [70]. Later it was shown that a small fraction of Sis1p is associated with polysomes even at concentrations as high as 500 mM KCl [64]. It is currently unclear how these different pools of Sis1p might relate to each other. Using a temperature-sensitive variant of Sis1p (sis1-85), it was shown that loss of function leads to a decrease in polysomes and to the accumulation of 80S ribosomes. Consistent with these data, it was suggested that Sis1p function is required for the initiation of translation [70, 73]. Interestingly, temperature sensitivity of sis1-85 can be suppressed by deletion of either RPL39 or RPL35, two proteins of the large ribosomal subunit localized close to the exit of the polypeptide tunnel [70] (Table 1). The unusual genetic interaction between SIS1 and RPL39 connects the proteins to PAB1, encoding poly(A)-binding protein. Deletion of RPL39 suppresses not only temperature sensitivity of sis1-85 but also the otherwise lethal deletion of PAB1 [74]. Pab1p binds to the poly(A)-tail of mRNA, plays a role in mRNA stabilization, and is also essential for efficient translation initiation [75, 76]. Therefore, Pab1p and Sis1p may functionally interact in translation initiation. This cooperation also seems to involve Ssa1p, a chaperone of the Hsp70 class. Ssa1p is present mainly in the soluble fraction of the cytosol and is involved in a multitude of different processes [77–81]. Like Sis1p, and also Pab1p, a small fraction of Ssa1p might be bound to polysomes in a salt-resistant manner [64]. However, in another study Ssa1p did not behave like a polysome-associated protein [65]. Two additional findings support the notion that Sis1p may act as a co-chaperone for Ssa1p. Firstly, Sis1p can stimulate the ATPase activity of Ssa1p and facilitate the refolding of denatured protein together with Ssa1p in vitro [82].
10 Ribosome-associated Proteins
Secondly, Sis1p interacts with the C-terminal 15-amino-acid residues of Ssa1p and might be involved in the transfer of substrate proteins to Ssa1p [64, 83]. However, cross-link products indicating direct interaction of Ssa1p with nascent polypeptide chains have not been observed so far [66]. Future studies will have to clarify the role of Sis1p and Ssa1p during translation. 3.2 Ssb1/2p, an Hsp70 Homologue Distributed Between Ribosomes and Cytosol
The two members of the Ssb family of Hsp70 homologues, Ssb1p and Ssb2p, differ by only four amino acids and localize to the cytosol of yeast [84]. Single deletion of the respective genes does not result in a phenotype. However, cells expressing neither Ssb1p nor Ssb2p (ssb1ssb2) display strong growth defects. The optimal growth temperature is shifted from 30 ◦ C to 37 ◦ C, and ssb1ssb2 strains are cold-sensitive [85]. In addition, ssb1ssb2 strains are hypersensitive towards a number of antibiotics affecting the translational apparatus, among them, the aminoglycoside antibiotics paromomycin, hygromycin B, and G418, which increase the error rate during translation [62, 86]. Approximately 50% of Ssb1p and Ssb2p (Ssb1/2p) is associated with ribosomes [62, 63, 66]. Ssb1/2p binds both translating and non-translating ribosomes, but the salt sensitivity of the Ssb1/2p-ribosome interaction suggests two different binding modes (Table 3). Ssb1/2p binding to non-translating ribosomes is salt-sensitive. When the ribosome is involved in translation, binding of Ssb1/2p becomes saltresistant [62, 66]. Ssb1/2p interacts directly with ribosome-associated polypeptides, suggesting that it is localized close to the exit of the ribosomal tunnel [31, 66, 87] (Figure 2). The ability to bind the nascent polypeptide chain is a prerequisite for the formation of the salt-resistant complex between Ssb1/2p and the ribosome [88]. Salt-resistant binding, however, is most likely not achieved exclusively via interaction of Ssb1/2p with the nascent polypeptide. Without the context of the ribosome, binding of Ssb1/2p to polypeptide chains has not been observed so far [88], suggesting that this interaction is not stable enough to resist at high salt concentrations. Conformational changes between empty and translating ribosomes, possibly induced by Ssb1/2p interaction with the nascent polypeptide chain, might contribute to the different binding modes [66]. Expression of Ssb1/2p is regulated like a core component of the ribosome [89], which is in agreement with its close relation to the translational process. Whether Ssb1/2p cycles between a ribosomal-bound and a soluble pool and whether this dual localization reflects distinct functions of this chaperone remains to be determined. 3.3 Function of Ssb1/2p in Degradation and Protein Folding
In eukaryotic cells, the vast majority of proteins in the cytosol and nucleus are degraded via the proteasome-ubiquitin pathway [90]. SSB1 was isolated as a multicopy suppressor of a temperature-sensitive growth phenotype observed in a yeast strain
3 Chaperones Bound to the Eukaryotic Ribosome: Hsp70 and Hsp40 Systems 11 Table 3 Salt sensitivity of the binding of major ribosome-bound proteins. Polysome binding
was determined using sucrose-density gradients. Binding to translating ribosomes in vitro was demonstrated by cross-linking. The experimental conditions used in the different studies vary considerably. However, salt-resistant binding is always defined to be stable at concentrations of at least 500 mM salt. A question mark indicates either that salt sensitivity has not been determined or that results are inconsistent Eukaryotic ribosome-bound factors
Polysome (translating ribosomes)
80S ribosomes (empty ribosomes)
Ssb1/2p
+ Salt resistant + Salt resistant + Salt sensitive + Salt sensitive + Salt sensitive + ? + ? Polysome (translating ribosomes)
+ Salt sensitive + Salt sensitive + Salt sensitive + Salt sensitive + Salt sensitive + ? (+) ? 70S ribosomes (empty ribosomes)
+ Salt resistant + ?
+ Salt sensitive + ?
SRP RAC (zuotin) NAC NatA Map1p Sis1p Eukaryotic ribosome-bound factors Trigger factor SRP
References [66] [63, 65, 87, 88] [33, 166, 167] [63, 71] [29, 31, 63, 168] [31] [137] [64, 70]
References [41] [51, 169]
carrying a mutation in one of the proteasome subunits. Later it was found that high levels of Sis1p lead to suppression of temperature sensitivity in the pro-teasome mutant and that co-overexpression of Ssb1/2p and Sis1p showed an even stronger effect [91, 92]. The findings suggest that Ssb1/2p and Sis1p together facilitate the degradation of proteins, possibly mediating the transfer of damaged proteins to the proteasome. in vitro, however, there is so far no evidence for a typical Hsp70-Hsp40type interaction between Ssb1/2p and Sis1p [82, 93]. High levels of Ssb1/2p have also been reported to suppress a temperature-sensitive mutant displaying defects in vesicular transport [94] and a mutant with a defect in the mito-chondrial import machinery [95]. Possibly, Ssb1/2p also aids the degradation of precursor proteins that fail to reach the correct destination in these mutant strains. Alternatively, high levels of Ssb1/2p might indirectly affect protein trafficking by enhancing folding or stability of factors required for the respective transport processes. Evidence for a classical chaperone function of Ssb1/2p in protein folding comes from a study on the chaperone TRiC [96]. TRiC cooperates with GimC and assists in the folding of a subset of aggregation-sensitive polypeptides [96–98]. The function of TRiC is essential, whereas GimC function is dispensable for the life of yeast
12 Ribosome-associated Proteins
[97, 99, 100]. However, simultaneous loss of GimC and Ssb1/2p results in synthetic lethality, suggesting that GimC and Ssb1/2p are able to functionally replace each other in assisting in the folding of a subset of TRiC substrate proteins [96]. So far, no mammalian Hsp70 family member has been found to be in direct association with the ribosome. In higher eukaryotes, a number of cytosolic proteins are assisted by the cytosolic Hsp70 homologues and TRiC in co-translational folding (reviewed in Refs. [101, 102]). While TRiC might interact with the ribosome directly [103], the mammalian Hsp70 homologues studied so far seem to be attached to the ribosome–nascent chain complex via the nascent polypeptide chain only. 3.4 Zuotin and Ssz1p: a Stable Chaperone Complex Bound to the Yeast Ribosome
The Hsp40 homologue zuotin was found in 1998 to be a ribosome-associated protein [71]. Later it was discovered that the bulk of zuotin forms a stable complex with the Hsp70 homologue Ssz1p. The complex, termed RAC (ribosome-associated complex), is a stable dimer and is almost entirely associated with ribosomes [63]. A stable complex between an Hsp70 and an Hsp40 homologue is unusual but is not without precedence. Thermus thermophilus DnaK (Hsp70) and DnaJ (Hsp40) form a stable 3:3 complex that contains an additional three subunits of the 8-kDa protein Daf [104, 105]. How Ssz1p and zuotin stably interact is currently unknown. Deletion analysis of zuotin revealed that the very N-terminus up to the J-domain results in a nonfunctional protein that is still able to associate with the ribosome [71]. This N-terminus, which bears no homology to any other known J-protein, is a good candidate domain for binding to Ssz1p. Stable binding of RAC to the ribosome is mediated via the zuotin subunit: in the absence of zuotin, Ssz1p is not associated with the ribosome, while zuotin remains ribosome-associated in the absence of Ssz1p [63]. 3.5 A Functional Chaperone Triad Consisting of Ssb1/2p, Ssz1p, and Zuotin
Deletion of either zuo1 or ssb1/2 results in slow growth, cold sensitivity, sensitivity towards aminoglycosides, and high osmolarity. This genetic interaction, in combination with the finding that both chaperones bind to the ribosome, strongly suggests that Ssb1/2p and zuotin have a common function [71]. Later it turned out that Ssz1p and zuotin not only form the stable RAC complex but also that deletion of ssz1 as well as simultaneous deletion of ssz1zuo1ssb1/2 leads to the same set of growth defects [63, 87, 106]. Overexpression of zuotin or Ssb1/2p can partially suppress the growth defect caused by deletion of ssz1, while overexpression of Ssz1p cannot support growth of zuo1 or ssb1/2 strains. The combination of data suggests that Ssz1p, while not obligatory for Ssb1/2p and zuotin’s common function, is involved in the same cellular process [87, 106]. How RAC and Ssb1/2p work together is so far only poorly understood. in vitro studies of the purified proteins will be required to better understand their interaction on a molecular level. The first insights on how they might cooperate have
3 Chaperones Bound to the Eukaryotic Ribosome: Hsp70 and Hsp40 Systems 13
come from cross-linking experiments. So far, cross-link products to neither of the RAC subunits have been detected. In contrast, Ssb1/2p can be cross-linked to a variety of nascent polypeptide chains in vitro, suggesting that it interacts with many, possibly all, newly synthesized polypeptide chains on their way through the ribosomal tunnel [66, 87]. As outlined above, high-salt treatment of ribosome–nascent chain complexes does not release Ssb1/2p but rather other factors such as RAC. Unexpectedly, on such high-salt-treated complexes, nascent polypeptide chains and Ssb1/2p no longer form cross-link products. However, cross-linking of the nascent polypeptide chain to Ssb1/2p can be restored by re-addition of purified RAC [87]. This finding suggests that RAC is required for the efficient binding of Ssb1/2p to the nascent polypeptide chain (Figure 3). This function of RAC depends on an intact J-domain, suggesting that zuotin acts as a J-partner for the Hsp70 Ssb1/2p while being simultaneously bound to another Hsp70, namely, Ssz1p. Interestingly, the presence of Ssz1p is required for efficient cross-linking to Ssb1/2p. Possibly
Fig. 3 Efficient cross-linking of Ssb1/2p depends on functional RAC. Ssb1/2p and RAC (zuotin + Ssz1p) are associated with translating ribosomes. (A) When the nascent polypeptide chain reaches a length of approximately 45 amino acids, it can be cross-linked to the Hsp70 homologue Ssb1/2p. (B) If RAC is absent from the
ribosome–nascent chain complex, zuotin carries an inactivating mutation in its Jdomain, or, if Ssz1p is missing, the crosslink is no longer formed, suggesting that RAC influences nascent polypeptide binding of Ssb1/2p [87] (for details, compare text).
14 Ribosome-associated Proteins
the nascent polypeptide chain initially binds to Ssz1p and is subsequently transferred to Ssb1/2p. Fast transfer of the nascent polypeptide chain from Ssz1p to the more tightly binding Ssb1/2p might account for the lack of a direct cross-link to Ssz1p. However, sequential binding of the nascent polypeptide chain to Ssz1p and Ssb1/2p seems to be unessential for functionality of the chaperone triad. Lack of the unusually short peptide-binding domain of Ssz1p does not result in the same phenotype as observed for the deletion [106]. It is possible that Ssz1p modulates zuotin’s ability to act as a partner chaperone for Ssb1/2p. Alternatively, Ssz1p might interact only with a specific, possibly small, subset of nascent polypeptide chains. Ssz1p, originally named Pdr13p, was identified as a posttranslational regulator of the transcription factor Pdr1p [107]. Whether Pdr1p or a subset of newly synthesized polypeptides directly interacts with Ssz1p and whether Ssz1p’s function is influenced by zuotin’s J-domain await further investigation. Various speculations have been made on the in vivo role of ribosome-bound Ssb1/2p: folding of ribosome-bound nascent polypeptide chains, maintaining nascent polypeptide chains in an unfolded yet folding-competent state, preventing the nascent polypeptide chain from slipping backwards into the ribosome, facilitating movement of the chain through the exit tunnel or away from the ribosome, clearance of aggregated proteins from the ribosome. However, experimental evidence is scarce. Ssb1/2p cooperates with TRiC in posttranslational completion of protein folding and has a partly overlapping function with the TRiC co-chaperone GimC ([96] and compare above). It should be pointed out that this is most likely not a function of the Ssb1/2p/RAC system. While the lack of Ssb1/2p is lethal in the absence of GimC, the lack of zuotin is not [96]. This strongly suggests that soluble, and not ribosome-associated, Ssb1/2p assists in protein folding by TriC, possibly in combination with an as yet unidentified cytosolic Hsp40 homologue. 3.6 Effects of Ribosome-bound Chaperones on the Yeast Prion [PSI+ ]
The prion-forming proteins of yeast are members of a larger class of Gln/Asnrich proteins. Yeast prions are replicated by a protein-only mechanism resulting in a dominant, non-Mendelian mode of inheritance. The best-characterized yeast prion is [PSI+ ], the polymerized form of the translation termination factor Sup35p with properties characteristic of prion proteins. The formation of [PSI+ ] results in depletion of functional Sup35p and leads to a defect in translation termination that manifests itself by an increase in stop codon read-through [108–111]. Chaperones are thought to play an important role in prion biogenesis. In particular, the cytosolic chaperone Hsp104 has been implicated in yeast prion propagation [109, 112]. [PSI+ ], for example, can be cured by either overproduction or inactivation of Hsp104 [113]. Hsp104, a homohexameric ATPase of the AAA protein family, functions in solubilization and refolding of aggregated proteins [114–117]. This ability provides the explanation for [PSI+ ] being cured by excess Hsp104: Hsp104 overproduction leads to the release of Sup35p from [PSI+ ] aggregates [118, 119]. The reason for the failure to inherit [PSI+ ] in the absence of Hsp104 is most likely connected
4 Enzymes Acting on Nascent Polypeptide Chains 15
to the lack of smaller prion seeds, required for the transmission of [PSI+ ] to daughter cells. However, other models have been proposed ([120] and references therein). In addition to Hsp104, a number of cytosolic chaperones have been implicated in yeast prion propagation, among them, the partially ribosome-associated chaperones Ssb1/2p, Ssa1p, and Sis1p. The reported effects are variable, suggesting the action of complex chaperone networks and variations between different prion strains [121, 122]. The effects are only poorly understood on a mechanistic level. High levels of Ssa have been reported to protect [PSI+ ] from being cured by an excess of Hsp104 [123]. In contrast, other studies have reported curing of [PSI+ ] due to overexpression of Ssa1p [124, 125]. Ssb1/2p increases, while lack of Ssb1/2p prevents [PSI+ ] curing by excess of Hsp104 [126, 127]. High levels of Ssa or Ssb1/2p might either prevent formation of misfolded Sup35p intermediates, which serve as the seeds for [PSI+ ], or stimulate degradation of such seeds. High levels of the Hsp40 homologues Ydj1p, Sis1p, Sti1p, and Apj1p have also been shown to influence prion propagation [124]. Sis1p might exert its function in combination with Ssa or Ssb (compare above). Sis1p is also necessary for the propagation of [RNQ + ], another prion of yeast. As Ydj1p cannot substitute for this function of Sis1p, Hsp40s might provide specificity to the prion curing effects of Hsp70 homologues [128, 129]. Sti1p functionally interacts with Ssa and might cooperate with Ssa in prion curing [130, 131]. So far, Apj1p has been only poorly characterized, and its Hsp70 partner has not been found.
4 Enzymes Acting on Nascent Polypeptide Chains
Two co-translational modifications, cleavage of N-terminal methionine residues and Nα -terminal acetylation, are by far the most common protein modifications in eukaryotes [20, 132]. N-terminal methionine is cleaved from nascent polypeptide chains of prokaryotic and eukaryotic proteins when the residue following the initial methionine is small and uncharged. Nα -terminal acetylation predominantly occurs in eukaryotic cells. Most frequently modified are serine and alanine Nα -termini, and these residues along with methionine, glycine, and threonine account for more than 95% of the N-terminally acetylated residues [20]. 4.1 Methionine Aminopeptidases
Methionine aminopeptidase (MetAP) catalyzes the removal of the initiator methionine from nascent polypeptides. In yeast, there are two homologous MetAPs, Map1p and Map2p. Their combined activities are essential, but their relative intracellular roles are unclear [133, 134]. The efficiency with which specific N-terminal amino acids become exposed in nascent polypeptides by the action of Map1p and Map2p correlates very well with the stability of the generated proteins according to the N-end rule [135]. As an example, methionine in front of alanine, serine, and threonine is cleaved very efficiently, and the resulting polypeptides display high
16 Ribosome-associated Proteins
stability. On the other hand, methionine is not efficiently cleaved in front of, e.g., lysine, arginine, or glutamate. These N-terminal residues, which would destabilize the generated protein, are not commonly generated co-translationally [135, 136]. Consistent with its co-translational mode of action, it was recently demonstrated that Map1p is associated with polysomes [137] (Figure 2). Map2p has so far not been localized. 4.2 Nα -acetyltransferases
Three Nα -terminal acetyltransferase complexes termed NatA, NatB, and NatC have been identified in yeast [31, 138–143]. They differ in substrate specificity and subunit composition, but all contain at least one catalytic subunit homologous to the GNAT family of acetyltransferases [132, 144] (Table 4). All three NATs contain a large subunit that does not possess catalytic activity but is required for activity of the catalytic subunits in vivo. Analogous to the large, multiple tetratrico peptide repeats (TPR) containing subunits of NatA, they might anchor the catalytic subunits to the ribosome and facilitate interaction with the nascent polypeptide substrate [31, 145, 146]. Recently Nat4, a novel catalytic subunit, has been found to specifically N α -acetylate histone H4 and H2A [147]. No potential “anchor” for Nat4 has been found so far. NatA is the major N α -terminal acetyltransferase in yeast, responsible for the ace-tylation of serine, alanine, threonine, and glycine [140, 148, 149]. NatA contains three subunits: Ard1p, the catalytically active GNAT subunit; Nat5p, a GNAT homologue for which catalytic activity has not yet been demonstrated; and the TPR domain protein Nat1p [31, 146] (Table 4). In accordance with its co-translational mode of action, NatA is quantitatively bound to ribosomes [31] (Figure 2). Binding is mediated by Nat1p, which not only anchors Ard1p and Nat5p to the ribosome but also is in close proximity to a variety of nascent polypeptide chains [31]. The lack of Nα -terminal acetylation does not cause a general defect in the stability or folding of the affected proteins [20]. In agreement with this observation, Table 4 Yeast Nα -acetyltransferases. NatA, NatB, NatC, and Nat4p differ in substrate specificity and subunit composition but all contain at least one catalytic subunit homologous to the GNAT family of acetyltransferases [132, 144]. For details, compare text
Large non-catalytic subunit
Catalytic subunit (GNAT)
Auxiliary subunit
References
NatA
Nat1p (98 kDa)
Ard1p (27 kDa)
Nat5p (20 kDa) (GNAT homologue, possibly catalytically active)
[31, 138]
NatB NatC
Mdm20p (92 kDa) Mak10p (84 kDa)
Nat3p (23 kDa) Mak3p (20 kDa) Nat4p (32 kDa)
Mak31p (10 kDa)
[142, 143] [170] [147]
5 A Complex Arrangement at the Yeast Ribosomal Tunnel Exit
N α -acetyltransferases are not essential for the life of yeast. However, nat1 and ard1 strains display temperature sensitivity plus some specific phenotypes, such as reduced sporulation efficiency, failure to enter G0 under specific conditions, and defects in silencing of the silent mating-type loci [31, 139, 140, 150, 151]. The pleio-tropic phenotypes are thought to reflect functional defects of various target proteins lacking proper acetylation at their N-terminus. So far there are surprisingly few examples demonstrating the biological importance of Nα -terminal protein acetylation [20]. For NatA no clear-cut example has been described. One of the few well-documented examples has been reported for Mak3p, the catalytic subunit of NatC. Nα -acetylation by Mak3p is essential for the coat protein assembly of the LA double-stranded RNA virus in yeast [141, 152]. Ssb1/2p can partially suppress temperature sensitivity and the silencing defect caused by the lack of NatA [31]. It seems unlikely that high levels of Ssb1/2p would restore N α -acetylation of NatA substrates. Suppression of both temperature sensitivity and the silencing defect suggests that Ssb1/2p is able to complement a general function of NatA. Whether Ssb1/2p can partially substitute for Nat1p function on the ribosome or whether it suppresses defects caused by specific proteins that lack Nα -terminal acetylation awaits further investigation. 5 A Complex Arrangement at the Yeast Ribosomal Tunnel Exit
Of all the nascent polypeptide chain-binding proteins discussed in this chapter, SRP is exceptional in that it interacts with nascent polypeptides bearing a signal sequence and its in vivo role precisely correlates with this specificity: SRP functions in the co-translational translocation of signal sequence bearing pre-proteins (compare above). The role of the other factors is less well defined; most likely they are multi-functional, as reflected by their ability to bind to a variety of nascent polypeptide chains. However, yeast Ssb1/2p and E. coli TF share an important characteristic with SRP. Consistent with a functional interaction rather than coincidental proximity, their mode of binding to empty ribosomes and ribosome–nascent polypeptide chain complexes differs. SRP, Ssb1/2p, and TF bind much more stably to ribosomes when simultaneously interacting with a nascent polypeptide chain [33, 41, 54, 66, 87] (Table 3). NAC and NatA also interact with the ribosome and have been found in close proximity to a variety of nascent polypeptide chains by cross-linking approaches [29, 31]. However, significant stabilization of ribosome association by nascent polypeptide chain binding has not been observed for NAC or NatA (Table 3). Whether proximity of NAC and NatA to nascent polypeptide chains indicates binding or rather reflects co-localization has not yet firmly been established [21, 31]. The presence of various factors at the yeast ribosomal tunnel exit implies a highly ordered arrangement and sequence of binding and release events. To date nothing is known about how proteins are arranged at the yeast exit site and to which ribosomal proteins or portions of the rRNA they bind. The first evidence for an ordered sequence of events comes from a comparison of the nascent polypeptide
17
18 Ribosome-associated Proteins
length required for the in vitro generation of cross-link products. Srp54p interacts with nascent secretory polypeptides of 60–70 amino acids in length [153]. Recently it has been found that longer nascent polypeptide chains also can interact with SRP [54]. Ssb1/2p and NAC can be cross-linked to nascent polypeptides – including secretory nascent polypeptides – from a minimal length of 45 amino acids, and NatA ultimately attaches to nascent polypeptides longer than 70 amino acids [31] (Figure 2). These results suggest that the emerging polypeptide first passes Ssb1/2p and NAC, possibly interacting with one or both. Whether NatA (and possibly other enzymes) acts on the very N-terminus while Ssb1/2p and/or NAC are still in close proximity to the nascent polypeptide is not known. Also, the interplay between SRP and the other factors on secretory nascent chains is far from being fully understood. Future studies should establish the arrangement of these factors and how they dynamically interact with the ribosome and the nascent polypeptide to perform their function. References 1 GREEN, R. & NOLLER, H. F. (1997). Ribosomes and translation. Annu. Rev. Biochem. 66, 679–716. 2 MAGUIRE, B. A. & ZIMMERMANN, R. A. (2001). The ribosome in focus. Cell 104, 813–816. 3 RAMAKRISHNAN, V. (2002). Ribosome structure and the mechanism of translation. Cell 108, 557–572. 4 WIMBERLY, B. T., BRODERSEN, D. E., CLEMONS, W. M., Jr., MORGAN-WARREN, R. J., CARTER, A. P., VONRHEIN, C., HARTSCH, T. & RAMAKRISHNAN, V. (2000). Structure of the 30S ribosomal subunit. Nature 407, 327–339. 5 BAN, N., NISSEN, P., HANSEN, J., MOORE, P. B. & STEITZ, T. A. (2000). The complete atomic structure of the large ribosomal subunit at 2.4A resolution. Science 289, 905–920. 6 HARMS, J., SCHLUENZEN, F., ZARIVACH, R., BASHAN, A., GAT, S., AGMON, I., BARTELS, H., FRANCESCHI, F. & YONATH, A. (2001). High resolution structure of the large ribosomal subunit from a mesophilic eubacterium. Cell 107, 679–688. 7 MORGAN, D. G., MENETRET, J. F., RADERMACHER, M., NEUHOF, A., AKEY, I. V., RAPOPORT, T. A. & AKEY, C. W. (2000). A comparison of the yeast and rabbit 80S ribosome reveals the topology of the nascent chain exit tunnel, intersubunit bridges and mammalian
8
9
10
11
12
13
14
15
rRNA expansion segments. J. Mol. Biol. 301, 301–321. AGRAWAL, R. K. & FRANK, J. (1999). Structural studies of the translational apparatus. Curr. Opin. Struct. Biol. 9, 215–221. STARK, H. (2002). Three-dimensional electron cryomicroscopy of ribosomes. Curr. Protein Pept. Sci. 3, 79–91. DOUDNA, J. A. & RATH, V. L. (2002). Structure and function of the eukaryotic ribosome: the next frontier. Cell 109, 153–156. NISSEN, P., HANSEN, J., BAN, N., MOORE, P. B. & STEITZ, T. A. (2000). The structural basis of ribosome activity in peptide bond synthesis. Science 289, 920–930. MILLIGAN, R. A. & UNWIN, P. N. (1986). Location of exit channel for nascent protein in 80S ribosome. Nature 319, 693–695. YONATH, A., LEONARD, K. R. & WITTMANN, H. G. (1987). A tunnel in the large ribosomal subunit revealed by three-dimensional image reconstruction. Science 236, 813–816. XU, Z., HORWICH, A. L. & SIGLER, P. B. (1997). The crystal structure of the chaperonin complex. Nature 388, 741–750. LIAO, S., LIN, J., DO, H. & JOHNSON, A. E. (1997). Both lumenal and cytosolic
References
16
17
18
19
20
21
22
23
24
25
26
gating of the aqueous ER translocon pore are regulated from inside the ribosome during membrane protein integration. Cell 90, 31–41. SIEGEL, V. (1997). Recognition of a transmembrane domain: another role for the ribosome?. Cell 90, 5–8. TENSON, T. & EHRENBERG, M. (2002). Regulatory nascent peptides in the ribosomal tunnel. Cell 108, 591–594. GABASHVILI, I. S., GREGORY, S. T., VALLE, M., GRASSUCCI, R., WORBS, M., WAHL, M. C., DAHLBERG, A. E. & FRANK, J. (2001). The polypeptide tunnel system in the ribosome and its gating in erythromycin resistance mutants of L4 and L22. Mol. Cell 8, 181–188. CRAIG, E. A., EISENMAN, H. C. & HUNDLEY, H. A. (2003). Ribosometethered molecular chaperones: the first line of defense against protein misfolding?. Curr. Opin. Microbiol. 6, 157–162. POLEVODA, B. & SHERMAN, F. (2000). Nalpha-terminal acetylation of eukaryotic proteins. J. Biol. Chem. 275, 36479–36482. ROSPERT, S., DUBAQUIE Y. & GAUTSCHI, M. (2002). Nascent-polypeptideassociated complex. Cell. Mol. Life Sci. 59, 1632–1639. KOCH, H. G., MOSER, M. & M¨ULLER, M. (2003). Signal recognition particle-dependent protein targeting, universal to all kingdoms of life. Rev. Physiol. Biochem. Pharmacol. 146, 55–94. HANN, B. C. & WALTER, P. (1991). The signal recognition particle in S. cerevisiae. Cell 67, 131–144. KEENAN, R. J., FREYMANN, D. M., STROUD, R. M. & WALTER, P. (2001). The signal recognition particle. Annu. Rev. Biochem. 70, 755–775. NAGAI, K., OUBRIDGE, C., KUGLSTATTER, A., MENICHELLI, E., ISEL, C. & JOVINE, L. (2003). Structure, function and evolution of the signal recognition particle. Embo J. 22, 3479–3485. WIEDMANN, B., SAKAI, H., DAVIS, T. A. & WIEDMANN, M. (1994). A protein complex required for signal-sequencespecific sorting and translocation. Nature 370, 434–440.
27 RADEN, D. & GILMORE, R. (1998). Signal recognition particle-dependent targeting of ribosomes to the rough endoplasmic reticulum in the absence and presence of the nascent polypeptide-associated complex. Mol. Biol. Cell 9, 117–130. 28 PLATH, K. & RAPOPORT, T. A. (2000). Spontaneous release of cytosolic proteins from posttranslational substrates before their transport into the endoplasmic reticulum. J. Cell Biol. 151, 167–178. 29 REIMANN, B., BRADSHER, J., FRANKE, J., HARTMANN, E., WIEDMANN, M., PREHN, S. & WIEDMANN, B. (1999). Initial characterization of the nascent polypeptide-associated complex in yeast. Yeast 15, 397–407. 30 BEATRIX, B., SAKAI, H. & WIEDMANN, M. (2000). The alpha and beta subunit of the nascent polypeptide-associated complex have distinct functions. J. Biol. Chem. 275, 37838–37845. 31 GAUTSCHI, M., JUST, S., MUN, A., ROSS, S., R¨UCKNAGEL, P., DUBAQUIE Y., EHRENHOFER-MURRAY, A. & ROSPERT, S. (2003). The yeast Na -acetyltransferase NatA is quantitatively anchored to the ribosome and interacts with nascent polypeptides. Mol. Cell. Biol. 23, 7403–7414. 32 WANG, S., SAKAI, H. & WIEDMANN, M. (1995). NAC covers ribosome-associated nascent chains thereby forming a protective environment for regions of nascent chains just emerging from the peptidyl transferase center. J. Cell. Biol. 130, 519–528. 33 POWERS, T. & WALTER, P. (1996). The nascent polypeptide-associated complex modulates interactions between the signal recognition particle and the ribosome. Curr. Biol. 6, 331–338. 34 M¨OLLER, I., JUNG, M., BEATRIX, B., LEVY, R., KREIBICH, G., ZIMMERMANN, R., WIEDMANN, M. & LAURING, B. (1998). A general mechanism for regulation of access to the translocon: competition for a membrane attachment site on ribosomes. Proc. Natl. Acad. Sci. USA 95, 13425–13430. 35 OGG, S. C. & WALTER, P. (1995). SRP samples nascent chains for the presence
19
20 Ribosome-associated Proteins
36
37
38
39
40
41
42
43
of signal sequences by interacting with ribosomes at a discrete step during translation elongation. Cell 81, 1075–1084. POOL, M. R., STUMM, J., FULGA, T. A., SINNING, I. & DOBBERSTEIN, B. (2002). Distinct modes of signal recognition particle interaction with the ribosome. Science 297, 1345–1348. CROOKE, E. & WICKNER, W. (1987). Trigger factor: a soluble protein that folds pro-OmpA into a membraneassembly-competent form. Proc. Natl. Acad. Sci. USA 84, 5216–5220. LILL, R., CROOKE, E., GUTHRIE, B. & WICKNER, W. (1988). The “trigger factor cycle” includes ribosomes, presecretory proteins, and the plasma membrane. Cell 54, 1013–1018. STOLLER, G., R¨UCKNAGEL, K. P., NIERHAUS, K. H., SCHMID, F. X., FISCHER, G. & RAHFELD, J. U. (1995). A ribosome-associated peptidyl-prolyl cis/trans isomerase identified as the trigger factor. Embo J. 14, 4939–4948. VALENT, Q. A., KENDALL, D. A., HIGH, S., KUSTERS, R., OUDEGA, B. & LUIRINK, J. (1995). Early events in preprotein recognition in E. coli: interaction of SRP and trigger factor with nascent polypeptides. Embo J. 14, 5494–5505. HESTERKAMP, T., HAUSER, S., LUTCKE, H. & BUKAU, B. (1996). Escherichia coli trigger factor is a prolyl isomerase that associates with nascent poly-peptide chains. Proc. Natl. Acad. Sci. USA 93, 4437–4441. GOTHEL, S. F., SCHOLZ, C., SCHMID, F. X. & MARAHIEL, M. A. (1998). Cyclophilin and trigger factor from Bacillus subtilis catalyze in vitro protein folding and are necessary for viability under starvation conditions. Biochemistry 37, 13392–13399. TETER, S. A., HOURY, W. A., ANG, D., TRADLER, T., ROCKABRAND, D., FISCHER, G., BLUM, P., GEORGOPOULOS, C. & HARTL, F. U. (1999). Poly-peptide flux through bacterial Hsp70: DnaK cooperates with trigger factor in chaperoning nascent chains. Cell 97, 755–765.
44 DEUERLING, E., SCHULZE-SPECKING, A., TOMOYASU, T., MOGK, A. & BUKAU, B. (1999). Trigger factor and DnaK cooperate in folding of newly synthesized proteins. Nature 400, 693–696. 45 KRISTENSEN, O. & GAJHEDE, M. (2003). Chaperone Binding at the Ribosomal Exit Tunnel. Structure 11, 1547–1556. 46 KRAMER, G., RAUCH, T., RIST, W., VORDERWULBECKE, S., PATZELT, H., SCHULZE-SPECKING, A., BAN, N., DEUERLING, E. & BUKAU, B. (2002). L23 protein functions as a chaperone docking site on the ribosome. Nature 419, 171–174. 47 MAIER, R., ECKERT, B., SCHOLZ, C., LILIE, H. & SCHMID, F. X. (2003). Interaction of trigger factor with the ribosome. J. Mol. Biol. 326, 585–592. 48 SCHOLZ, C., STOLLER, G., ZARNT, T., FISCHER, G. & SCHMID, F. X. (1997). Cooperation of enzymatic and chaperone functions of trigger factor in the catalysis of protein folding. Embo J. 16, 54–58. 49 MAIER, R., SCHOLZ, C. & SCHMID, F. X. (2001). Dynamic association of trigger factor with protein substrates. J. Mol. Biol. 314, 1181–1190. 50 SCHOLZ, C., M¨UCKE, M., RAPE, M., PECHT, A., PAHL, A., BANG, H. & SCHMID, F. X. (1998). Recognition of protein substrates by the prolyl iso-merase trigger factor is independent of proline residues. J. Mol. Biol. 277, 723–732. 51 GU, S. Q., PESKE, F., WIEDEN, H. J., RODNINA, M. V. & WINTERMEYER, W. (2003). The signal recognition particle binds to protein L23 at the peptide exit of the Escherichia coli ribosome. Rna 9, 566–573. 52 ULLERS, R. S., HOUBEN, E. N., RAINE, A., ten HAGEN-JONGMAN, C. M., EHRENBERG, M., BRUNNER, J., OUDEGA, B., HARMS, N. & LUIRINK, J. (2003). Interplay of signal recognition particle and trigger factor at L23 near the nascent chain exit site on the Escherichia coli ribosome. J. Cell Biol. 161, 679–684. 53 NAKATOGAWA, H. & ITO, K. (2002). The ribosomal exit tunnel functions as a discriminating gate. Cell 108, 629–636.
References 54 FLANAGAN, J. J., CHEN, J. C., MIAO, Y., SHAO, Y., LIN, J., BOCK, P. E. & JOHNSON, A. E. (2003). Signal recognition particle binds to ribosome-bound signal sequences with fluorescence-detected subnanomolar affinity that does not diminish as the nascent chain lengthens. J. Biol. Chem. 278, 18628–18637. 55 GONG, F. & YANOFSKY, C. (2002). Instruction of translating ribosome by nascent peptide. Science 297, 1864–1867. 56 MAYER, M. P., BREHMER, D., G¨ASSLER, C. S. & BUKAU, B. (2002). Hsp70 chaperone machines. Adv. Protein Chem. 59, 1–44. 57 KRAMER, G., RAMACHANDIRAN, V., HOROWITZ, P. M. & HARDESTY, B. (2002). The molecular chaperone DnaK is not recruited to translating ribosomes that lack trigger factor. Arch. Biochem. Biophys. 403, 63–70. 58 REYES, D. Y. & YOSHIKAWA, H. (2002). DnaK chaperone machine and trigger factor are only partially required for normal growth of Bacillus subtilis. Biosci. Biotechnol. Biochem. 66, 1583–1586. 59 FLAHERTY, K. M., DELUCA-FLAHERTY, C. & MCKAY, D. B. (1990). Three-dimensional structure of the ATPase fragment of a 70K heat-shock cognate protein. Nature 346, 623–628. 60 WANG, T.-F., CHANG, J.-H. & WANG, C. (1993). Identification of the peptide binding domain of Hsc70 18-kilodalton fragment located immediately after ATPase domain is sufficient for high affinity binding. J. Biol. Chem. 268, 26049–26051. 61 WALTER, S. & BUCHNER, J. (2002). Molecular chaperones–cellular machines for protein folding. Angew. Chem. Int. Ed. Engl. 41, 1098–1113. 62 NELSON, R. J., ZIEGELHOFFER, T., NICOLET, C., WERNER-WASHBURNE, M. & CRAIG, E. A. (1992). The translation machinery and 70 kd heat shock protein cooperate in protein synthesis. Cell 71, 97–105. 63 GAUTSCHI, M., LILIE, H., F¨UNFSCHILLING, U., MUN, A., ROSS, S., LITHGOW, T., R¨UCKNAGEL, P. & ROSPERT, S. (2001). RAC, a stable ribosome-associated complex in yeast formed by the DnaK-DnaJ homologs Ssz1p and zuotin.
64
65
66
67
68
69
70
71
72
73
74
Proc. Natl. Acad. Sci. USA 98, 3762–3767. HORTON, L. E., JAMES, P., CRAIG, E. A. & HENSOLD, J. O. (2001). The yeast hsp70 homologue Ssa is required for translation and interacts with Sis1 and Pab1 on translating ribosomes. J. Biol. Chem. 276, 14426–14433. JAMES, P., PFUND, C. & CRAIG, E. A. (1997). Functional specificity among Hsp70 molecular chaperones. Science 275, 387–389. PFUND, C., LOPEZ-HOYO, N., ZIEGELHOFFER, T., SCHILKE, B. A., LOPEZ-BUESA, P., WALTER, W. A., WIEDMANN, M. & CRAIG, E. A. (1998). The molecular chaperone Ssb from Saccharomyces cerevisiae is a component of the ribosome-nascent chain complex. Embo J. 17, 3981–3989. RYAN, M. T. & PFANNER, N. (2001). Hsp70 proteins in protein translocation. Adv. Protein Chem. 59, 223–242. BUKAU, B. & HORWICH, A. L. (1998). The Hsp70 and Hsp60 chaperone machines. Cell 92, 351–366. KELLEY, W. L. (1998). The J-domain family and the recruitment of chaperone power. Trends Biochem. Sci. 23, 222–227. ZHONG, T. & ARNDT, K. T. (1993). The yeast SIS1 protein, a DnaJ homolog, is required for the initiation of translation. Cell 73, 1175–1186. YAN, W., SCHILKE, B., PFUND, C., WALTER, W., KIM, S. & CRAIG, E. A. (1998). Zuotin, a ribosome-associated DnaJ molecular chaperone. Embo J. 17, 4809–4817. LUKE, M. M., SUTTON, A. & ARNDT, K. T. (1991). Characterization of SIS1, a Saccharomyces cerevisiae homologue of bacterial dnaJ proteins. J. Cell Biol. 114, 623–638. SALMON, D., MONTERO-LOMELI, M. & GOLDENBERG, S. (2001). A DnaJ-like protein homologous to the yeast co-chaperone Sis1 (TcJ6p) is involved in initiation of translation in Trypanosoma cruzi. J. Biol. Chem. 276, 43970–43979. SACHS, A. B. & DAVIS, R. W. (1989). The poly(A) binding protein is required for poly(A) shortening and 60S ribosomal subunit-dependent translation initiation. Cell 58, 857–867.
21
22 Ribosome-associated Proteins
75 WILUSZ, C. J., WORMINGTON, M. & PELTZ, S. W. (2001). The cap-to-tail guide to mRNA turnover. Nat. Rev. Mol. Cell. Biol. 2, 237–246. 76 PREISS, T. & HENTZE, M. W. (1999). From factors to mechanisms: translation and translational control in eukaryotes. Curr. Opin. Genet. Dev. 9, 515–521. 77 BUSH, G. L. & MEYER, D. I. (1996). The refolding activity of the yeast heat shock proteins Ssa1 and Ssa2 defines their role in protein translocation. J. Cell Biol. 135, 1229–1237. 78 DUTTAGUPTA, R., VASUDEVAN, S., WILUSZ, C. J. & PELTZ, S. W. (2003). A yeast homologue of Hsp70, Ssa1p, regulates turnover of the MFA2 transcript through its AU-rich 3 untranslated region. Mol. Cell. Biol. 23, 2623–2632. 79 GEYMONAT, M., WANG, L., GARREAU, H. & JACQUET, M. (1998). Ssa1p chaperone interacts with the guanine nucleotide exchange factor of ras Cdc25p and controls the cAMP pathway in Saccharomyces cerevisiae. Mol. Microbiol. 30, 855–864. 80 KIM, S., SCHILKE, B., CRAIG, E. A. & HORWICH, A. L. (1998). Folding in vivo of a newly translated yeast cytosolic enzyme is mediated by the SSA class of cytosolic yeast Hsp70 proteins. Proc. Natl. Acad. Sci. USA 95, 12860–12865. 81 LIU, Y., LIANG, S. & TARTAKOFF, A. M. (1996). Heat shock disassembles the nucleolus and inhibits nuclear protein import and poly(A)+ RNA export. Embo J. 15, 6750–6757. 82 LU, Z. & CYR, D. M. (1998). Protein folding activity of Hsp70 is modified differentially by the hsp40 co-chaperones Sis1 and Ydj1. J. Biol. Chem. 273, 27824–27830. 83 QIAN, X., HOU, W., ZHENGANG, L. & SHA, B. (2002). Direct interactions between molecular chaperones heat-shock protein (Hsp)70 and Hsp40: yeast Hsp70 Ssa1 binds the extreme C-terminal region of yeast Hsp40 Sis1. Biochem. J. 361, 27–34. 84 SHULGA, N., JAMES, P., CRAIG, E. A. & GOLDFARB, D. S. (1999). A nuclear export signal prevents Saccharomyces
85
86
87
88
89
90
91
92
93
cerevisiae Hsp70 Ssb1p from stimulating nuclear localization signal-directed nuclear transport. J. Biol. Chem. 274, 16501–16507. CRAIG, E. A. & JACOBSEN, K. (1985). Mutations in cognate genes of Saccharomyces cerevisiae hsp70 result in reduced growth rates at low temperatures. Mol. Cell. Biol. 5, 3517–3524. OGLE, J. M., CARTER, A. P. & RAMAKRISHNAN, V. (2003). Insights into the decoding mechanism from recent ribosome structures. Trends Biochem. Sci. 28, 259–266. GAUTSCHI, M., MUN, A., ROSS, S. & ROSPERT, S. (2002). A functional chaperone triad on the yeast ribosome. Proc. Natl. Acad. Sci. USA 99, 4209–4214. PFUND, C., HUANG, P., LOPEZ-HOYO, N. & CRAIG, E. A. (2001). Divergent functional properties of the ribosome-associated molecular chaperone Ssb compared with other Hsp70s. Mol. Biol. Cell 12, 3773–3782. LOPEZ, N., HALLADAY, J., WALTER, W. & CRAIG, E. A. (1999). SSB, encoding a ribosome-associated chaperone, is coordinately regulated with ribosomal protein genes. J. Bacteriol. 181, 3136–3143. HOCHSTRASSER, M., JOHNSON, P. R., ARENDT, C. S., AMERIK, A., SWAMINATHAN, S., SWANSON, R., LI, S. J., LANEY, J., PALS-RYLAARSDAM, R., NOWAK, J. & CONNERLY, P. L. (1999). The Saccharomyces cerevisiae ubiquitin-proteasome system. Philos. Trans. R. Soc. Lond. B Biol. Sci. 354, 1513–1522. OHBA, M. (1997). Modulation of intracellular protein degradation by SSB1-SIS1 chaperon system in yeast S. cerevisiae. FEBS Lett. 409, 307–311. OHBA, M. (1994). A 70-kDa heat shock cognate protein suppresses the defects caused by a proteasome mutation in Saccharomyces cerevisiae. FEBS Lett. 351, 263–266. LOPEZ-BUESA, P., PFUND, C. & CRAIG, E. A. (1998). The biochemical properties of the ATPase activity of a 70-kDa heat
References
94
95
96
97
98
99
100
101
102
shock protein (Hsp70) are governed by the C-terminal domains. Proc. Natl. Acad. Sci. USA 95, 15253–15258. KOSODO, Y., IMAI, K., HIRATA, A., NODA, Y., TAKATSUKI, A., ADACHI, H. & YODA, K. (2001). Multicopy suppressors of the sly1 temperature-sensitive mutation in the ER-Golgi vesicular transport in Saccharomyces cerevisiae. Yeast 18, 1003–1014. DUNN, C. D. & JENSEN, R. E. (2003). Suppression of a defect in mitochondrial protein import identifies cytosolic proteins required for viability of yeast cells lacking mito-chondrial DNA. Genetics 165, 35–45. SIEGERS, K., BOLTER, B., SCHWARZ, J. P., BOTTCHER, U. M., GUHA, S. & HARTL, F. U. (2003). TRiC/CCT cooperates with different upstream chaperones in the folding of distinct protein classes. Embo J. 22, 5230–5240. SIEGERS, K., WALDMANN, T., LEROUX, M. R., GREIN, K., SHEVCHENKO, A., SCHIEBEL, E. & HARTL, F. U. (1999). Compartmentation of protein folding in vivo: sequestration of non-native polypeptide by the chaperonin-GimC system. EMBO Journal 18, 75–84. HANSEN, W. J., COWAN, N. J. & WELCH, W. J. (1999). Prefoldin-nascent chain complexes in the folding of cytoskeletal proteins. J. Cell Biol. 145, 265–277. STOLDT, V., RADEMACHER, F., KEHREN, V., ERNST, J. F., PEARCE, D. A. & SHERMAN, F. (1996). Review: the Cct eukaryotic chaperonin subunits of Saccharomyces cerevisiae and other yeasts. Yeast 12, 523–529. GEISSLER, S., SIEGERS, K. & SCHIEBEL, E. (1998). A novel protein complex promoting formation of functional alpha- and gamma-tubulin. Embo J. 17, 952–966. HARTL, F. U. & HAYER-HARTL, M. (2002). Molecular chaperones in the cytosol: from nascent chain to folded protein. Science 295, 1852–1858. FRYDMAN, J. (2001). Folding of newly translated proteins in vivo: The Role of molecular chaperones. Annu. Rev. Biochem. 70, 603–647.
103 MCCALLUM, C. D., DO, H., JOHNSON, A. E. & FRYDMAN, J. (2000). The interaction of the chaperonin tailless complex polypeptide 1 (TCP1) ring complex (TRiC) with ribosome-bound nascent chains examined using photo-cross-linking. J. Cell Biol. 149, 591–602. 104 MOTOHASHI, K., YOHDA, M., ENDO, I. & YOSHIDA, M. (1996). A novel factor required for the assembly of the DnaK and DnaJ chaperones of Thermus thermophilus. J. Biol. Chem. 271, 17343–17348. 105 SCHLEE, S. & REINSTEIN, J. (2002). The DnaK/ClpB chaperone system from Thermus thermophilus. Cell. Mol. Life Sci. 59, 1598–1606. 106 HUNDLEY, H., EISENMAN, H., WALTER, W., EVANS, T., HOTOKEZAKA, Y., WIEDMANN, M. & CRAIG, E. (2002). The in vivo function of the ribosome-associated Hsp70, Ssz1, does not require its putative peptide-binding domain. Proc. Natl. Acad. Sci. USA 99, 4203–4208. 107 HALLSTROM, T. C., KATZMANN, D. J., TORRES, R. J., SHARP, W. J. & MOYE-ROWLEY, W. S. (1998). Regulation of transcription factor Pdr1p function by an Hsp70 protein in Saccharomyces cerevisiae. Mol. Cell. Biol. 18, 1147–1155. 108 LINDQUIST, S., KROBITSCH, S., LI, L. & SONDHEIMER, N. (2001). Investigating protein conformation-based inheritance and disease in yeast. Philos. Trans. R. Soc. Lond. B Biol. Sci. 356, 169–176. 109 TUITE, M. F. & LINDQUIST, S. L. (1996). Maintenance and inheritance of yeast prions. Trends Genet. 12, 467–471. 110 OSHEROVICH, L. Z. & WEISSMAN, J. S. (2002). The utility of prions. Dev. Cell 2, 143–151. 111 TUITE, M. F. (2000). Yeast prions and their prion-forming domain. Cell 100, 289–292. 112 WICKNER, R. B., TAYLOR, K. L., EDSKES, H. K., MADDELEIN, M. L., MORIYAMA, H. & ROBERTS, B. T. (2000). Prions of yeast as heritable amyloidoses. J. Struct. Biol. 130, 310–322. 113 CHERNOFF, Y. O., LINDQUIST, S. L., ONO, B., INGE-VECHTOMOV, S. G. & LIEBMAN, S. W. (1995). Role of the chaperone protein
23
24 Ribosome-associated Proteins
114
115
116
117
118
119
120
121
122
123
Hsp104 in propagation of the yeast prion-like factor [psi+]. Science 268, 880–884. SCHIRMER, E. C., GLOVER, J. R., SINGER, M. A. & LINDQUIST, S. (1996). HSP100/Clp proteins: a common mechanism explains diverse functions. Trends Biochem. Sci. 21, 289–296. PARSELL, D. A., SANCHEZ, Y., STITZEL, J. D. & LINDQUIST, S. (1991). Hsp104 is a highly conserved protein with two essential nucleotide-binding sites. Nature 353, 270–273. SANCHEZ, Y., TAULIEN, J., BORKOVICH, K. A. & LINDQUIST, S. (1992). Hsp104 is required for tolerance to many forms of stress. Embo J. 11, 2357–2364. GLOVER, J. R. & LINDQUIST, S. (1998). Hsp104, Hsp70, and Hsp40: a novel chaperone system that rescues previously aggregated proteins. Cell 94, 73–82. PATINO, M. M., LIU, J. J., GLOVER, J. R. & LINDQUIST, S. (1996). Support for the prion hypothesis for inheritance of a phenotypic trait in yeast. Science 273, 622–626. PAUSHKIN, S. V., KUSHNIROV, V. V., SMIRNOV, V. N. & TER-AVANESYAN, M. D. (1996). Propagation of the yeast prion-like [psi+] determinant is mediated by oligomerization of the SUP35-encoded polypeptide chain release factor. Embo J. 15, 3127–3134. WEGRZYN, R. D., BAPAT, K., NEWNAM, G. P., ZINK, A. D. & CHERNOFF, Y. O. (2001). Mechanism of prion loss after Hsp104 inactivation in yeast. Mol. Cell. Biol. 21, 4656–4669. UPTAIN, S. M., SAWICKI, G. J., CAUGHEY, B., LINDQUIST, S., CHIEN, P. & WEISSMAN, J. S. (2001). Strains of [PSI(+)] are distinguished by their efficiencies of prion-mediated conformational conversion conformational diversity in a yeast prion dictates its seeding specificity. Embo J. 20, 6236–6245. TRUE, H. L. & LINDQUIST, S. L. (2000). A yeast prion provides a mechanism for genetic variation and phenotypic diversity. Nature 407, 477–483. NEWNAM, G. P., WEGRZYN, R. D., LINDQUIST, S. L. & CHERNOFF, Y. O.
124
125
126
127
128
129
130
131
132
133
(1999). Antagonistic interactions between yeast chaperones Hsp104 and Hsp70 in prion curing. Mol. Cell. Biol. 19, 1325–1333. KRYNDUSHKIN, D. S., SMIRNOV, V. N., TER-AVANESYAN, M. D. & KUSHNIROV, V. V. (2002). Increased expression of Hsp40 chaperones, transcriptional factors, and ribosomal protein Rpp0 can cure yeast prions. J. Biol. Chem. 277, 23702–23708. KUSHNIROV, V. V., KRYNDUSHKIN, D. S., BOGUTA, M., SMIRNOV, V. N. & TER-AVANESYAN, M. D. (2000). Chaperones that cure yeast artificial [PSI+] and their prion-specific effects. Curr. Biol. 10, 1443–1446. CHERNOFF, Y. O., NEWNAM, G. P., KUMAR, J., ALLEN, K. & ZINK, A. D. (1999). Evidence for a protein mutator in yeast: role of the Hsp70-related chaperone ssb in formation, stability, and toxicity of the [PSI] prion. Mol. Cell. Biol. 19, 8103–8112. CHACINSKA, A., SZCZESNIAK, B., KOCHNEVA-PERVUKHOVA, N. V., KUSHNIROV, V. V., TER-AVANESYAN, M. D. & BOGUTA, M. (2001). Ssb1 chaperone is a [PSI+] prion-curing factor. Curr. Genet. 39, 62–67. SONDHEIMER, N., LOPEZ, N., CRAIG, E. A. & LINDQUIST, S. (2001). The role of Sis1 in the maintenance of the [RNQ+] prion. Embo J. 20, 2435–2442. LOPEZ, N., ARON, R. & CRAIG, E. A. (2003). Specificity of Class II Hsp40 Sis1 in Maintenance of Yeast Prion [RNQ(+)]. Mol. Biol. Cell 14, 1172–1181. CHANG, H. C., NATHAN, D. F. & LINDQUIST, S. (1997). In vivo analysis of the Hsp90 cochaperone Sti1 (p60). Mol. Cell. Biol. 17, 318–3125. WEGELE, H., HASLBECK, M., REINSTEIN, J. & BUCHNER, J. (2003). Sti1 is a novel activator of the Ssa proteins. J. Biol. Chem. 278, 25970–25976. POLEVODA, B. & SHERMAN, F. (2003). N-terminal acetyltransferases and sequence requirements for N-terminal acetylation of eukaryotic proteins. J. Mol. Biol. 325, 595–622. LI, X. & CHANG, Y. H. (1995). Amino-terminal protein processing in Saccharomyces cerevisiae is an essential
References
134
135
136
137
138
139
140
141
function that requires two distinct methionine aminopeptidases. Proc. Natl. Acad. Sci. USA 92, 12357–12361. DUMMITT, B., MICKA, W. S. & CHANG, Y. H. (2003). N-terminal methionine removal and methionine metabolism in Saccharomyces cerevisiae. J. Cell Biochem. 89, 964–974. GONDA, D. K., BACHMAIR, A., WUNNING, I., TOBIAS, J. W., LANE, W. S. & VARSHAVSKY, A. (1989). Universality and structure of the N-end rule. J. Biol. Chem. 264, 16700–16712. MOERSCHELL, R. P., HOSOKAWA, Y., TSUNASAWA, S. & SHERMAN, F. (1990). The specificities of yeast methionine aminopeptidase and acetylation of amino-terminal methionine in vivo. Processing of altered iso-1-cytochromes c created by oligonucleotide transformation. J. Biol. Chem. 265, 19638–19643. VETRO, J. A. & CHANG, Y. H. (2002). Yeast methionine aminopeptidase type 1 is ribosome-associated and requires its N-terminal zinc finger domain for normal function in vivo. J. Cell Biochem. 85, 678–688. PARK, E. C. & SZOSTAK, J. W. (1992). ARD1 and NAT1 proteins form a complex that has N-terminal acetyltransferase activity. Embo J. 11, 2087–2093. MULLEN, J. R., KAYNE, P. S., MOERSCHELL, R. P., TSUNASAWA, S., GRIBSKOV, M., COLAVITO-SHEPANSKI, M., GRUNSTEIN, M., SHERMAN, F. & STERNGLANZ, R. (1989). Identification and characterization of genes and mutants for an N-terminal acetyltransferase from yeast. Embo J. 8, 2067–2075. POLEVODA, B., NORBECK, J., TAKAKURA, H., BLOMBERG, A. & SHERMAN, F. (1999). Identification and specificities of N-terminal acetyltransferases from Saccharomyces cerevisiae. Embo J. 18, 6155–6168. TERCERO, J. C. & WICKNER, R. B. (1992). MAK3 encodes an N-acetyltransferase whose modification of the L-A gag NH2 terminus is necessary for virus particle assembly. J. Biol. Chem. 267, 20277–20281.
142 POLEVODA, B., CARDILLO, T. S., DOYLE, T. C., BEDI, G. S. & SHERMAN, F. (2003). Nat3p and Mdm20p are required for function of yeast NatB Nalpha-terminal acetyltransferase and of actin and tropomyosin. J. Biol. Chem. 3, 3. 143 SINGER, J. M. & SHAW, J. M. (2003). Mdm20 protein functions with Nat3 protein to acetylate Tpm1 protein and regulate tropomyosin-actin interactions in budding yeast. Proc. Natl. Acad. Sci. USA 100, 7644–7649. 144 NEUWALD, A. F. & LANDSMAN, D. (1997). GCN5-related histone N-acetyltransferases belong to a diverse superfamily that includes the yeast SPT10 protein. Trends Biochem. Sci. 22, 154–155. 145 BLATCH, G. L. & LASSLE, M. (1999). The tetratricopeptide repeat: a structural motif mediating protein-protein interactions. Bioessays 21, 932–939. 146 POLEVODA, B. & SHERMAN, F. (2003). Composition and function of the eukaryotic N-terminal acetyltrans-ferase subunits. Biochem. Biophys. Res. Commun. 308, 1–11. 147 SONG, O. K., WANG, X., WATERBORG, J. H. & STERNGLANZ, R. (2003). An Nalpha-acetyltransferase responsible for acetylation of the N-terminal residues of histones H4 and H2A. J. Biol. Chem. 278, 38109–38112. 148 TAKAKURA, H., TSUNASAWA, S., MIYAGI, M. & WARNER, J. R. (1992). NH2-terminal acetylation of ribosomal proteins of Saccharomyces cerevisiae. J. Biol. Chem. 267, 5442–5445. 149 ARNOLD, R. J., POLEVODA, B., REILLY, J. P. & SHERMAN, F. (1999). The action of N-terminal acetyltransferases on yeast ribosomal proteins. J. Biol. Chem. 274, 37035–37040. 150 WHITEWAY, M., FREEDMAN, R., Van ARSDELL, S., SZOSTAK, J. W. & THORNER, J. (1987). The yeast ARD1 gene product is required for repression of cryptic mating-type information at the HML locus. Mol. Cell. Biol. 7, 3713–3722. 151 WHITEWAY, M. & SZOSTAK, J. W. (1985). The ARD1 gene of yeast functions in the switch between the mitotic cell cycle and
25
26 Ribosome-associated Proteins
152
153
154
155
156
157
158
159
alternative developmental pathways. Cell 43, 483–492. TERCERO, J. C., DINMAN, J. D. & WICKNER, R. B. (1993). Yeast MAK3 N-acetyltransferase recognizes the N-terminal four amino acids of the major coat protein (gag) of the L-A double-stranded RNA virus. J. Bacteriol. 175, 3192–3194. KURZCHALIA, T. V., WIEDMANN, M., GIRSHOVICH, A. S., BOCHKAREVA, E. S., BIELKA, H. & RAPOPORT, T. A. (1986). The signal sequence of nascent preprolactin interacts with the 54K polypeptide of the signal recognition particle. Nature 320, 634–636. F¨UNFSCHILLING, U. & ROSPERT, S. (1999). Nascent Polypeptide-associated Complex Stimulates Protein Import into Yeast Mitochondria. Mol. Biol. Cell. 10, 3289–3299. WILD, K., WEICHENRIEDER, O., STRUB, K., SINNING, I. & CUSACK, S. (2002). Towards the structure of the mammalian signal recognition particle. Curr. Opin. Struct. Biol. 12, 72–81. PLANTA, R. J. & MAGER, W. H. (1998). The list of cytoplasmic ribosomal proteins of Saccharomyces cerevisiae. Yeast 14, 471–477. WINZELER, E. A., SHOEMAKER, D. D., ASTROMOFF, A., LIANG, H., ANDERSON, K., ANDRE, B., BANGHAM, R., BENITO, R., BOEKE, J. D., BUSSEY, H., CHU, A. M., CONNELLY, C., DAVIS, K., DIETRICH, F., DOW, S. W., EL BAKKOURY, M., FOURY, F., FRIEND, S. H., GENTALEN, E., GIAEVER, G., HEGEMANN, J. H., JONES, T., LAUB, M., LIAO, H., DAVIS, R. W. & et al. (1999). Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901–906. RUTGERS, C. A., SCHAAP, P. J., VAN’T RIET, J., WOLDRINGH, C. L. & RAUE, H. A. (1990). In vivo and in vitro analysis of structure-function relationships in ribosomal protein L25 from Saccharomyces cerevisiae. Biochim. Biophys. Acta 1050, 74–79. SONG, J. M., CHEUNG, E. & RABINOWITZ, J. C. (1996). Organization and characterization of the two yeast
160
161
162
163
164
165
166
167
168
ribosomal protein YL19 genes. Curr. Genet. 30, 273–278. SACHS, A. B. & DAVIS, R. W. (1990). Translation initiation and ribosomal biogenesis: involvement of a putative rRNA helicase and RPL46. Science 247, 1077–1079. LUCIOLI, A., PRESUTTI, C., CIAFRE, S., CAFFARELLI, E., FRAGAPANE, P. & BOZZONI, I. (1988). Gene dosage alteration of L2 ribosomal protein genes in Saccharomyces cerevisiae: effects on ribosome synthesis. Mol. Cell. Biol. 8, 4792–4798. GEORGE, R., BEDDOE, T., LANDL, K. & LITHGOW, T. (1998). The yeast nascent polypeptide-associated complex initiates protein targeting to mitochondria in vivo. Proc. Natl. Acad. Sci. USA 95, 2296–2301. BROWN, J. D., HANN, B. C., MEDZIHRADSZKY, K. F., NIWA, M., BURLINGAME, A. L. & WALTER, P. (1994). Subunits of the Saccharomyces cerevisiae signal recognition particle required for its functional expression. Embo J. 13, 4390–4400. MASON, N., CIUFO, L. F. & BROWN, J. D. (2000). Elongation arrest is a physiologically important function of signal recognition particle. Embo J. 19, 4164–4174. WILLER, M., JERMY, A. J., STEEL, G. J., GARSIDE, H. J., CARTER, S. & STIRLING, C. J. (2003). An in vitro assay using overexpressed yeast SRP demonstrates that cotranslational translocation is dependent upon the J-domain of Sec63p. Biochemistry 42, 7171–7177. ZOPF, D., BERNSTEIN, H. D., JOHNSON, A. E. & WALTER, P. (1990). The methionine-rich domain of the 54 kd protein subunit of the signal recognition particle contains an RNA binding site and can be crosslinked to a signal sequence. Embo J. 9, 4511–4517. HIGH, S. & DOBBERSTEIN, B. (1991). The signal sequence interacts with the methionine-rich domain of the 54-kD protein of signal recognition particle. J. Cell Biol. 113, 229–233. GEORGE, R., WALSH, P., BEDDOE, T. & LITHGOW, T. (2002). The nascent polypeptide-associated complex (NAC)
References promotes interaction of ribosomes with the mitochondrial surface in vivo. FEBS Lett. 516, 213–216. 169 VALENT, Q. A., de GIER, J. W., VON HEIJNE, G., KENDALL, D. A., TEN HAGEN-JONGMAN, C. M., OUDEGA, B. & LUIRINK, J. (1997). Nascent membrane and presecretory proteins synthesized in
Escherichia coli associate with signal recognition particle and trigger factor. Mol. Microbiol. 25, 53–64. 170 POLEVODA, B. & SHERMAN, F. (2001). NatC Nalpha-terminal acetyltransfer-ase of yeast contains three subunits, Mak3p, Mak10p, and Mak31p. J. Biol. Chem. 276, 20154–20159.
27
1
The Mechanism of Recording in Pro- and Eukaryotes Elizabeth S. Poole, Andrew G. Cridge, and Warren P. Tate University of Otago, Dunedin, New Zealand
Louise L. Major The University St. Andrews, Scotland, UK
Originally published in: Protein Synthesis and Ribosome Structure. Edited by Knud H. Nierhaus and Daniel N. Wilson. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30638-1
1 Introduction
During protein synthesis, the ribosome uses mechanisms that maintain the translational frame and the nature of the interactions between the RNA participants are critical to this process. Of the RNA visitors to the ribosome, the tRNA occupies two of three possible sites [1] at the active center depending on whether the ribosome is in the pre-translocational or post-translocational state of the polypeptidechainelongation cycle [2]. In the pre-translocation state, the tRNAs are at the A-site (as the reaction substrate) and the P-site (as the reaction intermediate). In the post-translocation state, the tRNAs occupy the P-site and the E-site (as the reaction product). These pairs of tRNAs make unique interactions with the host structural rRNAs in their particular environments, and the tRNAs themselves undergo some conformational flexing during these interactions which are important for maintaining canonical events. The other key RNA visitor to the ribosome during protein synthesis is the mRNA that occupies a specific channel in the neck of the small ribosomal subunit as it threads through the decoding site during triplet decoding [3]. This threading does not impose a stress on the triplet code reading frame unless there is a ‘tangle’ of some kind in the downstream region of the mRNA. Such ordered tangles, commonly in the form of stem-loops or pseudoknots, can impose sufficient pressure to facilitate a non-canonical or ‘recoding’ event in the form of a change in frame in the mRNA or a change in the interpretation of a particular codon. Generally, this will occur only when additional particular sequence motifs occur in the mRNA itself. Under these conditions, there is a recoding event, since
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Mechanism of Recording in Pro- and Eukaryotes
the expected translational event does not occur because of the new meaning of the codon or because of a subtle or more significant reading frame change [4]. Genetic recoding initially only seemed to happen in the test tube under special conditions, or, at most, to be the domain of viral RNAs that use extraordinary means to subvert the host ribosomes’ fidelity for their own purposes. Clearly, viruses are major users of recoding mechanisms on host-cell ribosomes, but, in addition, it is now well established that cells themselves use recoding strategies as another layer of regulating the expression of a small subset of their genes. Interest in genetic recoding has grown as its importance as a mainstream mechanism of regulating gene expression has become apparent and as its absolute importance in the biology of some pathogenic viruses, like HIV-1, has become obvious. Speculation is now occurring as to whether recoding is widespread as a mechanism to produce minor products in addition to standard proteins from a particular mRNA. This could contribute to the proteome in ways that are beyond comprehension at present [5]. However, genetic recoding is not just restricted to changes in the translational frame. It is a term used to encompass other non-canonical events that give unexpected results from the translation of a signal in the mRNA. For example, included in this definition are the following: 1. Selenocysteine incorporation at a stop codon in a small subset of UGA stop codons in pro- and eukaryotic genes [6]. 2. Incorporation of amino acids by the decoding of stop codons by near-cognate tRNAs (commonly referred to as readthrough) [7]. 3. Disengaging and slipping through a section of the mRNA (bypassing) [8]. 4. Frameshifting on the mRNA, either + or – (slipping forward or slipping backward) [9].
2 Maintaining Decoding Accuracy and the Reading Frame
Ribosomes are not absolute in their avoidance of error during translation of an mRNA; they incorporate an incorrect amino acid only occasionally, perhaps 1 in 103 –104 times and they lose the reading frame of the mRNA maybe 1 in 104 –105 times. Fortunately, this level of error does not compromise the ability of the ribosome to make a protein of 1000 amino acids. This means that if recoding is to occur, then there must be active mechanisms to promote recoding rather than relying on natural error, especially if it is to occur at a specific site. We know now from the work of Ramakrishnan and co-workers [10] that the rRNA uses a sensing mechanism of the codon–anticodon interaction in each of the three nucleotide positions when the substrate tRNA is in the A-site. For example, interaction of a 16S rRNA base (A1493) with the 2 -hydroxyl in each ribose of the nucleotides in the first base pair precludes non-Watson–Crick interaction at this position. In contrast, the wobble base pair is not as constrained as the other two base pairs and allows for
3 The Use of a Stop Signal for both Elongation and Termination of Protein Synthesis
more variety in the kinds of tRNA:mRNA interactions at this position. Similarly, the structures of the 70S ribosome with a P-site tRNA and the 30S subunit, where part of a second subunit molecule was found to mimic the decoding stem of the P-site tRNA, indicate that there are constraints on interactions at the P-site. At this site, the mRNA is forced to adopt a kinked formation allowing the anticodon stems of the A- and P-site tRNAs to be relatively far apart. The ribosome structures have revealed much detail and allowed insight into how decoding accuracy can occur, resulting in speculation of constraints that prevent a shift in reading frame. Clearly, there are mechanisms and constraints through structural interactions of the tRNAs with the rRNA and the mRNA that are strong determinants for the canonical events of protein synthesis and ensure that non-canonical events are an exception rather than the rule. For a non-canonical event to occur there have to be extraordinary circumstances that overcome the normal restraints on such events. These can be primary sequence cis signals and/or specific secondary structures in the mRNA spaced at particular distances from a primary signal. These mRNA elements somehow perturb the normal kinetics of protein synthesis, often causing pauses at decoding sites that allow competition between canonical and non-canonical events. The non-canonical result will occur in a proportion of the ribosomal passages through the signal with the frequency dependent on competition strength between the canonical and non-canonical events.
3 The Use of a Stop Signal for both Elongation and Termination of Protein Synthesis
Although the stop codons (UAA, UAG, and UGA) were once thought to be used universally as stop signals in protein synthesis, there are now many specific examples where they have been captured to encode amino acids. However, in most of these instances they only signal stop or sense. For example, in mitochondria and in mycobacterium species, UGA is frequently used to code for tryptophan but in these cases does not signal stop. Also, unicellular eukaryotic organisms such as Tetrahymena use UGA for stop whereas UAA and UAG code for glutamine. These are all examples where there has been codon ‘takeover’ or, perhaps, ‘reassignment’ in different organisms although the events that they signal are still canonical processes of protein synthesis. An exception to this kind of promiscuity for stop codons is when a UGA signals selenocysteine (Sec) in a small number of genes but still signals stop in the same organism in the vast majority of occurrences. This implies that elongation and termination must be in competition at the signal. This competition occurs in a wide range of organisms in the eubacteria, archaea and eukarya kingdoms. The mechanisms for incorporation of Sec at UGA sites are distinct in prokaryotic and eukaryotic organisms, although there are some similarities. For example, there is a secondary structural element that is critical for the signal to function in both but
3
4 The Mechanism of Recording in Pro- and Eukaryotes
in prokaryotes it is within the coding region and close to the primary signal UGA, whereas in eukaryotes it is quite distant and found in the 3 -hydroxyl untranslated region. In both cases, there is a special tRNA for Sec that is a minor isoacceptor of a serine tRNA where the serine has been modified by specific proteins.
4 The Mechanism for Sec Incorporation at UGA Sites in Bacterial mRNAs
Elegant studies from B¨ock and co-workers [11, 12] through the 1980s and 1990s defined the genes that were responsible for incorporation of Sec into proteins and largely defined the mechanism of how this occurred at the ribosome. Four genes controlling this mechanism were defined selA–D. 4.1 The Gene Products 1. The selD gene product is a 37 kDa monomeric protein, selenophosphate synthetase [13]. Although the identity of the physiological selenium substrate that is phosphorylated by ATP is still uncertain, there is evidence for an enzyme-bound phosphoryl intermediate [14], which is then attacked in vitro by selenide at the active site releasing the selenophosphate product. Selenide is a highly reactive molecule and may not be the physiological substrate or, if so, may be sequestered by another molecule for this reaction. 2. SelC is the gene encoding a minor serine tRNA that is first aminoacylated with serine by the normal synthetase [15] and after conversion of Ser to Sec is subsequently used during translation to deliver Sec into the elongating polypeptide. Interestingly, the tRNA is not well recognized by the typical elongation factor, EF-Tu, that delivers aminoacyl-tRNAs to the ribosome. This implies that the Ser-tRNASec would have structural features unlike all other tRNAs and, indeed, it does have subtle differences in its structure. At a length of 95 nucleotides, it is one of the longest tRNAs known and this is largely because it has a large variable arm. Moreover, several invariant residues in other tRNAs are different in the selC gene product and, significantly, the amino acceptor arm has eight rather than the seven base pairs of other tRNAs [16]. 3. SelA encodes a 50 kDa subunit of an oligomeric protein comprising 10 subunits. Each subunit has a pyridoxyl phosphate moiety. Conversion of Ser-tRNASec to the Sec derivative is catalyzed by this enzyme using the selenophosphate as a substrate donor of selenium. The whole conversion takes place on this enzyme [17]. The serine is converted into amino acrylyl derivative by elimination of water from the seryl moiety first and then the activated selenium derivative is added to this intermediate to complete the conversion. The enzyme has a high degree of specificity for the Ser-tRNA, with one tRNA bound per two subunits [18]. Using electron microscopy, it was determined that the enzyme comprises a double ring
4 The Mechanism for Sec Incorporation at UGA Sites in Bacterial mRNAs 5
of five subunits each, consistent with the stoichiometry of tRNA binding (five per enzyme) [19]. The extra long amino acceptor stem of the tRNA and its large variable loop are both important for this binding. 4. SelB encodes the specific elongation factor that recognizes the Sec-tRNASec and is clearly important for the delivery of Sec to the elongating polypeptide. It is a protein of 69 kDa and exhibits a high degree of sequence similarity to both EF-Tu and the initiation factor, IF-2, within its N-terminal region (244 amino acids). SELB is much bigger than EF-Tu (69 kDa versus 43 kDa) and the C-terminal extension on SELB not shared with EF-Tu may have some other function such as recognizing the mRNA context of the UGA recoding site [20]. The protein cannot bind Ser-tRNASec in contrast with the Sec derivative and this explains why Ser is not incorporated at a UGA recoding site. The major determinant on the tRNA for binding to SELB is the eight base amino acceptor stem and conversion to the typical seven base pair stem abolishes binding [21].
4.2 The Mechanism of Sec Incorporation
Sec incorporation involves an intriguing mechanism in which Sec-tRNASec and SELB are major players. As well, SECIS (selenocysteine insertion sequence) elements are critical in a small number of specific mRNAs such as that for formate dehydrogenase F (FDHF) [22]. These mRNAs contain UGA and a stem-loop (the SECIS element). In fdhF, it was originally predicted that the stem-loop closely followed the UGA codon but further studies have suggested that the UGA is within the stem-loop structure that forms the SECIS element before it approaches the ribosomal decoding site [23]. SELB binds to the stem at a specific site in the apical loop and upper helical region. The structure of the loop rather than its primary sequence seems to be the important determinant. A bulged region on the upper 5 -arm of the stem and nucleotides in the apical loop are protected by SELB from hydroxyl radical cleavage in footprint experiments. Over-expression of SELB and SELC does not lead to a misincorporation of Sec at typical UGA stop codons. This indicates that delivery of SELB.GTP.Sec-tRNASec to the ribosome is different than that for other tRNAs in ternary complexes. H¨uttenhofer and B¨ock [24] have obtained evidence that suggests the ternary complex may be in a ‘pre-competent state’ before binding to the mRNA stem-loop. A variety of approaches indicate that SELB must be complexed with the SECIS element for a productive interaction with the ribosome to occur. These studies suggest that binding to the SECIS element induces a conformational switch in SELB that facilitates the formation of an anticodon:codon interaction between the Sec-tRNASec and the UGA codon as it reaches the ribosomal A-site. The SECIS element would then act like a safety switch, preventing normal UGA termination codons being decoded as Sec by the SELB.GTP.Sec-tRNASec ternary complexes [24, 25]. Once the switch converts SELB into a ‘competent state’, it would give SELB a strong selective
6 The Mechanism of Recording in Pro- and Eukaryotes
advantage when it reaches the ribosomal A-site and is in competition with the decoding release factor, RF2, to decode the now A-site UGA. In this way, the SECIS element acts not only as a functional switch for protection, but also as a facilitator to send the ternary complex along a kinetic path whereby Sec is incorporated into the polypeptide chain. 4.3 The Competition between Sec Incorporation and Canonical Decoding of UGA by RF2
Factors affecting the competition between the RF2 and Sec incorporation in vivo during translation of the fdhF Sec (UGA) recoding site have been defined with wildtype and modified fdhF sequences [26]. Altering sequences surrounding the UGA codon to create more or less efficient UGA-containing stop signals without affecting the secondary structure of the SECIS element, have indicated that the kinetics of stop signal decoding have a significant influence on Sec incorporation efficiency. The UGA codon in the specific fdhF sequence remains ‘visible’ to the decoding RF2 that in vitro can form a site-directed ‘zero-length’ crosslink to it when the secondary structure of the mRNA created by the stem-loop is absent [27]. Increasing the cellular concentration of either the RF2 decoding molecule for termination, or the tRNASec decoding molecule for elongation (for Sec incorporation), showed that these molecules are able to compete for the UGA by a kinetic competition that is dynamic and dependent on the growth rate of Escherichia coli. The tRNASec mediated decoding can compete more effectively for the recoding site UGA at lower growth rates, consistent with the well-established anaerobic induction of fdhF expression, when, presumably, Sec-containing enzymes are in an environment protected from oxidation. How is the competition between the RF2 and tRNASec -decoding molecules mediated? There is a reciprocal relationship between termination and Sec incorporation efficiencies at the fdhF-recoding site UGA, providing compelling evidence for competition between the canonical and non-canonical decoding events. Mansell et al. [27] have proposed a ‘helical approach’ mechanism for how this competition might be mediated. The competitiveness of either decoding molecule at the UGA can change according to the relative concentrations of the participating molecules. The SELB complex carrying tRNASec is bound to the apical loop of the fdhF stem-loop as the sequence approaches the ribosomal decoding site. If the complex is to remain bound, it must rotate about the axis of the helical stem as the secondary-structure unwinds. There is a likelihood of a ribosomal pause or translational slowing because of the increased torsional load imposed by the unwinding hairpin. The SELB complex is ideally positioned to deliver the tRNASec just as the UGA reaches the A-site and this would apparently give the tRNASec a significant advantage over the decoding RF2 to reach the inner cavern of the ribosomal active center. Indeed, this may be why the relatively efficient termination context of the UGA performs poorly against this competition. However, the creation of a translational slowing [28] by the ‘helical approach’ of the SELB complex may also provide the window of opportunity for the RF2 to remain relatively competitive for decoding the UGA (Fig. 1A).
5 Mechanism for Sec Incorporation at UGA Sites in Eukaryotic and Archaeal mRNAs 7
Fig. 1 The cis elements and trans factors critical for Sec insertion during selenoprotein synthesis. The mechanism for Sec incorporation is shown for prokaryotes in (A) and for eukaryotes in (B).
5 Mechanism for Sec Incorporation at UGA Sites in Eukaryotic and Archaeal mRNAs
Less is known about the mechanism of Sec incorporation into eukaryotes and archaea. The SECIS elements in eukaryotes are also stem-loops but differ in structure and location from their bacterial counterparts. They are distant from the UGA site where the incorporation of Sec is to occur. Indeed, in eukaryotes, there is a minimal spacing of 60 nucleotides between the two for the SECIS element to function, but it can be as far away as several thousand bases. Although these elements are downstream of the UGA in most cases, there is one report of a SECIS element in the archaeon, Methanococcus jannaschii that is upstream of the UGA recoding site [29]. 5.1 The Gene Products
Are there equivalent genes to those found in bacteria that mediate Sec incorporation into archaea and eukaryotes? 1. SELA the oligomeric selenocysteine synthase has been found in archaea but not, as yet, in eukaryotes although putative homologs have been suggested [6]. 2. SELB The search for this specific elongation factor protein has been protracted and has resulted in several false leads. Initially, a protein, selenocysteine-binding protein 2 (SBP2), essential for Sec incorporation into rabbit reticulocyte lysate was thought to be the eukaryotic equivalent of SELB but it lacked elongation factor function [30]. Eventually, after database searches of increasing complexity to look for Sec-specific elongation factor homology from the archaea through to the eukaryotes, specific candidates for the SELB protein were identified [31]. One of these candidates, eEFSec, expressed as a recombinant protein, exhibited all the necessary and expected characteristics of the required factor being highly specific
8 The Mechanism of Recording in Pro- and Eukaryotes
in its binding of Sec-tRNASec and associating with SPB2 to form a complex with the SECIS element during selenoprotein synthesis. 3. SELC was identified first as a minor serine-specific tRNA in mammals [32] and later was shown to be the tRNASec in eukaryotes [33]. It is present in all eukaryotic species examined and, almost exclusively, the gene is present as a single copy. The human tRNA can substitute for the bacterial SELC in Ser to Sec conversion. An important recognition determinant is a 13 base-pair coaxial helix involving an extended acceptor stem of 9 base pairs and a shortened T stem of 4 base pairs probably present in both bacterial and human tRNAs. Although the 9/4 arrangement rather than a 7/5 structure is somewhat controversial, the archaeal SELC can fold only into a 9/4 structure and provides an evolutionary reason for the presence of this 9/4 coaxial helix. 4. SELD was identified in humans by Berry and co-workers [34] and, although having a low similarity to the bacterial SELD, could complement a bacterial selD mutation. The bacterial protein is also functional in mammalian cells suggesting a strong commonality of mechanism. The mouse and human enzymes themselves are selenoproteins with selenium at the active site and, therefore, may be involved in autoregulation of selenocysteine metabolism. 5.2 The Mechanism of Sec Incorporation at Specific UGA Stop Codons
Incorporation of Sec into a mammalian or archaeal protein during UGA decoding is dependent on the presence of specific structures in the mRNAs encoding these proteins. They can have different structures and certainly different primary sequences. Two consensus classes have been defined [35]. The first consists of a 9–11 base-pair stem separating a conserved SECIS element core at the base of the stem, from a 10–14 nucleotide loop with three adenosines at the 5 -side of it. In the second class, there are three adenosines comprising an internal bulge in the stem before it continues to a smaller 3–6 nucleotide loop at the tip. How does the location of this SECIS element affect the mechanism, given that the decoding complex in bacteria is placed immediately following the UGA codon at the decoding site and the same mechanism is not feasible in mammals or archaea? Unlike the situation in prokaryotes, there are two proteins involved rather than simply a specific elongation factor to carry the Sec-tRNASec . The first to be discovered was a SECIS element binding protein, SBP2 [30]. This protein binds selenoprotein mRNAs specifically, and Sec incorporation depends on its presence. It is speculated that the protein may play a role in excluding the eukaryotic release factor from the UGA site since it has homology to a yeast omnipotent termination suppressor of protein biosynthesis, SUP1. The eEFSec binds both isoforms of SectRNASec but not its serylated precursor or other tRNAs and, as well, binds GTP to show the classic characteristics of an elongation factor. Indeed, this protein also interacts with SBP2 and the two proteins function together for Sec incorporation into selenoproteins. This implies that the delivery complex consists of the SECIS element, SBP2, and the Sec-tRNASec bound to eEFSec (Fig. 1B). How this spans
6 Why does Recoding Occur at Stop Signals?
the distance to the upstream UGA is not clear, but if there were a kinetic exclusion mechanism to prevent eRF1 from decoding the UGA as stop, the complex could position itself optimally for the decoding event. For example, if SBP2 were to make an association with the UGA before it entered the ribosomal A-site, then the complex already would be positioned for decoding and the eRF1 would be compromised. As both eEFSec and SBP2 have nuclear localization sites, it is speculated that the complex might be assembled on the mRNA in the nucleus ensuring the first round of translation is primed for Sec incorporation.
6 Why does Recoding Occur at Stop Signals?
Incorporation of selenocysteine at UGA stop codons could be explained simply by the presence of specific SECIS elements. However, the immediate context of the UGA stop signal at most Sec incorporation sites influences competition by the Sec incorporation machinery. For example, the nucleotide immediately following the UGA recoding site in eukaryotic selenoprotein mRNAs is usually either pyrimidine U or C. Changing this nucleotide from C to a purine in the type I iodothyronine 5 -deiodinase mRNA decreases the amount of complete product and significantly increases the premature chain-termination product. This suggests that there is kinetic competition between termination and Sec incorporation at the site [36]. The context of the nucleotides surrounding stop codons clearly has a major influence on whether a particular stop codon is efficient and whether competing non-canonical events can occur (Fig. 2). If stop codons were decoded at different kinetic rates according to the context of the surrounding nucleotides, then there is opportunity for near-cognate tRNAs to be more competitive in some circumstances. The stop codon could be decoded as sense, or even facilitate a frameshift event to occur during a translational pause when the kinetics of stop signal decoding are particularly slow. What is the evidence that upstream and downstream contexts can affect stop codon efficiency? As early as 1981, Kohli and Grosjean [37] highlighted an apparent bias in both the codon immediately prior to stop codons and in the nucleotide immediately following the stop codon in the very limited data set of gene sequences available at the time. Analysis of nucleotide bias surrounding stop codons became possible as more sequencing data emerged and, more recently, as the sequences
Fig. 2 Sequence elements that influence recoding sites.
9
10 The Mechanism of Recording in Pro- and Eukaryotes
of whole genomes have been completed. An algorithm was created to extract sequences around stop codons [38] and a TransTerm database constructed as a resource for translational signal analysis [39]. It was concluded from a study of nearly 1000 E. coli genes and lesser numbers of genes from other bacteria that the stop signal was actually a tetranucleotide rather than a triplet codon and, as well, there were clear upstream and downstream contextual biases [40]. This was most apparent for stop signals used in the most highly expressed genes (the top 10%) where, in addition to a limited subset of sense codons known to be used in these mRNAs, the stop signals had U following the stop codon almost exclusively. More recent analyses of larger data sets of both prokaryotic and eukaryotic genes indicate that there is a clear ‘signature’ comprising a sequence element that starts two codons before the stop codon itself and extends beyond for another six nucleotides or so [41]. This strongly hinted that there may be a hierarchy of stop signals of varying decoding efficiencies and that this may exert a significant influence on whether a recoding event could occur. 6.1 The Stop Signal of Prokaryotic Genomes – Engineered for High Efficiency Decoding?
Important to study this question, has been the availability of the complete genome sequence of two E. coli strains that had been separated by approximately 4.5 million years of evolution. The 0157:H7 pathogenic strain has acquired or retained 25% more genes than strain K12 and each has acquired or retained unique genes not found in the other strain, with 1387 genes unique to 0157:H7 and 528 unique to K12 [42]. Nevertheless, the bias of the molecular signature at the stop codons in the genes from these two strains was highly similar for TAA and TGA. In addition, the signatures were somewhat different for each of the three codons. Moreover, when this bias was analyzed to determine which individual nucleotides contributed, there was a high degree of concordance between the genes from the two strains despite there being greater then 75 000 polymorphisms in the ‘homologous backbone’ of the DNA sequences between the two strains. These data provide further evidence that there is a preferred sequence element for high efficiency stop signal decoding at least for UAA and UGA, and, by inference, the occurrence of non-canonical error events is minimized. Sequence biases upstream and downstream from stop codons must reflect different mechanisms. The upstream sequences in the mRNA (the last two codons) are already involved in decoding events through ribosomal P- and E-site interactions with tRNAs and, therefore, their influence on RF stop-codon decoding must be indirect. On the other hand, the downstream sequences are not involved in mRNA–rRNA interactions according to current understanding and would be available to make direct interactions with the decoding RF protein itself. The fact that the stop signal is decoded by a protein rather than a tRNA as for sense codons, means that there is no intrinsic reason why more nucleotides than just the triplet codon might make contact with the RF. Indeed, with this in mind, zero-length site-directed crosslinking from specific positions in the mRNA to the E. coli RF
6 Why does Recoding Occur at Stop Signals?
protein using4 thio-U instead of U in the RNA sequence has been carried out. Crosslinks were obtained from the first position of the stop codon (+1) and the three positions following the stop codon (+4 to +6) but not beyond, suggesting close physical contact between these nucleotide positions and the protein factor [43]. These data support the concept of direct interaction between the RF and stop signal that extends beyond just the three nucleotides from the stop codon. What might be occurring upstream of the stop codon? Here, there are two features that are created by the tRNA-decoding events. First, the two tRNAs in the Pand E-site positions have specified the ultimate and penultimate amino acids of the completed polypeptide positioned at the peptidyltransferase center and at the beginning of the ribosomal exit tunnel, respectively. Secondly, the tRNAs themselves have a common three-dimensional shape but have micro stereochemical detail in the bases and modifications in each position of their sequence. Therefore, a particular tRNA in the P-site can create a three-dimensional environment against which the decoding RF, spanning between the decoding center and the peptidyltransferase center in the ribosomal A-site, has the potential to make contact. In this way, both the amino acids specified and the tRNAs themselves may influence the stop signal decoding rate by the RF. This protein is occupying the binding site (A-site) that has been carefully crafted for an aminoacyl-tRNA and, by analogy, is like a cuckoo in the nest of another bird. Just as the cuckoo can be too big for the nest, so the RF may be constrained in the ribosomal A-site. If this were the case, then the upstream sequences coding for specific tRNAs and amino acids may contribute to a three-dimensional binding site for the RF that results in altered kinetics during stop codon decoding. To determine whether the C-terminal amino acid of a protein might be having an effect on the efficiency of stop signal decoding, it is possible to examine whether a particular amino acid is abundant and whether its occurrence is highly biased. It is interesting that the two E. coli strains have similar proportions of each amino acid at the C-terminal positions of their proteins. The abundance and bias in the use of amino acids at this position was highly similar over all genes within the two strains and particularly within the TAA and TGA terminating genes. There were global trends in bias both for and against amino acids with certain characteristics that were still evident when the abundance of the amino acids at the C-terminal position were analyzed. What was significant from this study was that the biases and abundances of the amino acids were not identical for each of the specific stop codons, suggesting that the biases were related to the termination phase of protein synthesis and not simply some other unrelated translational process. The conclusion from these analyses is that the last amino acid of the protein may have some stereochemical or charge-related effect on the efficiency of RFmediated stop signal decoding when this amino acid is positioned at the ribosomal peptidyltransferase center through the P-site tRNA. Although the stop codon is decoded by the RF positioned at the A-site in the decoding centre, for successful termination to occur a signal must be transmitted through the RF structure to the peptidyltransferase center so that release of the polypeptide can occur [44]. Further studies are needed to investigate this possibility.
11
12 The Mechanism of Recording in Pro- and Eukaryotes
What is the situation with the penultimate amino acid that was brought to the ribosome by the tRNA now positioned at the E-site? It is difficult to understand a direct effect of this amino acid on termination efficiency as it is likely to be entering the exit tunnel and less likely, therefore, to affect the RF stereochemically. However, there may be an indirect influence on the stereochemical position of the C-terminal residue by the preceding amino acid through its interactions with the surrounding ribosomal architecture and this could explain an influence on stop signal decoding efficiency. The analysis has shown that fewer residues show bias at this position than at the ultimate position but there were still preferences for amino acids with certain characteristics [42]. The picture that has emerged from these studies is that there is a much weaker influence on termination efficiency from the penultimate amino acid than the last amino acid, but still with a suggestion of some indirect influence yet to be understood. Bias in the two codons upstream of the stop codon may not necessarily reflect the amino acid at all or solely, but may be a more direct effect of the tRNAs that are bound into the two ribosomal sites and their interactions with other RNAs. As a particular amino acid can be carried by different tRNA isoacceptors often recognizing different codons (e.g., Leu1 CUG; Leu2 CUC/U; Leu3 CUA/G; Leu4 UUG; and Leu5 UUA), the codon abundance in the last position before the stop codon has been analyzed in detail. This has allowed a determination of whether there is a subset of tRNAs in the termination complex during stop signal decoding. As for analysis of amino acid frequency, both abundance and bias of codons are important criteria and may not necessarily correlate. There were both specific codons preferred and specific codons that were rare in the last position. These were different from the biases found in the penultimate codon but, as with the amino acid analysis, effects in this position were much less marked and gave a relatively weak signature. What do these trends in codon use at the last amino acid position mean for the selection of specific tRNAs? There were several striking consequences. All but one of the codons common in this position were decoded by only one isoacceptor tRNA species, whereas codons selected against were often decoded by several isoacceptor tRNAs. There were specific sequence characteristics of the abundant tRNAs and some of these related to particular stop codons. For example, before UAA, the most common four codons were decoded by only two tRNAs that were two (of only three) tRNAs with a modified mnm5 s2 U at the anticodon wobble position 34. These modified tRNAs were also used abundantly before the other stop codons as well. This suggests that the ultimate tRNA is contributing to the efficiency of stop signal decoding and may reflect, as indicated above, its contribution to the binding site architecture for the decoding RF in the A-site or the maintenance of stable peptidyl– tRNA interactions. Structurally, the tRNA would line one side of the space that the RF occupies during decoding and there is a potential for interaction between these two macromolecules. Indeed, site-directed crosslinks from the elbow of the tRNA (position 8) and the anticodon loop (position 32) to the RF has been achieved with zero-length crosslink moieties in the tRNA suggesting that there is very close contact at several positions [45].
6 Why does Recoding Occur at Stop Signals?
Is there any evidence that bioinformatic analyses have revealed important features of the stop signal that are physiologically important for both non-canonical recoding and canonical decoding in protein synthesis? First, classic recoding sites where stop signals are involved do have contexts that are rarely found at natural termination sites of genes. The best example is the frameshift site in the bacterial RF2 gene where the downstream context UGACUA is found in only three other genes. This context is the weakest of all 64 UGANNN sequences tested at this site in vivo allowing the non-canonical recoding event to occur in approximately 90% of ribosomal passages. In contrast, the strongest context of the 64 possibilities was UGAUUA with substitution of U for C in the +4 position and consistent with the predictions from bioinformatic analysis. Similarly, the downstream context UGACAC at the selenocysteine incorporation site in the fdhF gene was also shown to contribute to a relatively weak termination signal. At each of the three stop codons, the data from the 64 contexts of the +4 to +6 bases gave a hierarchy of signal efficiencies with the +4 base highly influential. This supports the original suggestion that the termination ‘codon’ should be thought of as comprising four and not three bases [39]. Indeed, in eukaryotic in vitro experiments, the RF has been shown to respond to a minimum of a four-base stop signal [46]. Does the termination signal efficiency of particular downstream contexts determined experimentally correlate with signal abundance in E. coli genes? For TGA contexts, signal strength correlated well with signal abundance and six-base bias, implying the most efficient signals are those that are most frequently used and that use of inefficient signals is avoided except at recoding sites. However, the minor set of TAG signals in E. coli continue to be an enigma as there was a range of decoding efficiencies (Fig. 3) but with a negative correlation between abundance and bias with efficiency. Apart from suggestions from the bioinformatic analyses described above, earlier experimental work had shown effects of the last two sense codons on termination signal strength at selected contexts, as measured by the failure of the signal to specify stop but instead allow incorporation of an amino acid through near-cognate decoding [47–49]. More recently, termination signals spanning 12 nucleotides comprising the last two sense codons, the stop codon, and the three nucleotides following, were constructed to reflect predicted weak, strong and hybrid (strong upstream and weak downstream, and vice versa) signals (Fig. 4) [42]. Again, for UGA signals it was possible to predict correctly which would be strong and which would be weak (those that allow significant stop codon readthrough) with upstream and downstream sequences acting co-operatively. This became more obvious when tested in E. coli strains carrying suppressor tRNAs where competition was stronger with a cognate tRNA present. With UAA signals, competition from suppressor tRNAs was sufficient to reveal relative strengths of the stop signals only in the suppressor strain. With the enigmatic UAG signals, the 5 contexts behaved according to prediction, but the effects of 3 contexts did not correlate with bias or abundance. These studies clearly indicate that context both upstream and downstream of the stop codon in bacteria has effects on the efficiency of stop codon decoding. Certain
13
14 The Mechanism of Recording in Pro- and Eukaryotes
Fig. 3 Termination efficiencies of UAGNNN signals in competition with frameshifting. The termination efficiencies were measured as described in Ref. [78]. Efficiencies are shown relative to a point of equal competition (50% termination midline) with signals divided into graphs according to the identity of the +4 base immediately following the
UAG codon. The majority of signals with a +4 G support > 50% termination, signals with a +4 C all gave < 50% termination and signals with a +4 U or A fall on either side of this midline. The error bars are the S.E.M. for at least six determinations of each signal.
contexts not frequently found with UAA and UGA stop codons are assumed to increase the translational pause at the A-site stop and allow for competing events such as near-cognate decoding or translational frameshifting to occur. They provide the capacity for a recoding site to evolve at a stop codon allowing non-canonical events to occur during protein synthesis, thus recruiting additional complexity to the regulation of gene expression. This is clearly the case at the frameshift site of the prfB gene encoding RF2 [50–52] and at the Sec incorporation site of the fdhF gene [27]. In both cases, the balance of the canonical and non-canonical events can be significantly altered in vivo by changing the stop codon sequence context despite not
6 Why does Recoding Occur at Stop Signals?
Fig. 4 The strategy for testing the strength of termination signals spanning 12 nucleotides. Constructs were designed that contained TAA, or TAG with a predicted strong or weak sequence element 5 or 3 to the stop codon to give ‘strong’, ‘weak’ and hybrid signals.
altering the specific cis elements that favor the non-canonical event. Physiologically, it is events in trans that alter competition such as the concentration of the decoding RF in the case of the frameshift event [52] and the relative concentrations of the two decoding molecules RF and Sec-tRNASec in the case of Sec incorporation [27]. On the other hand, such competitions are precluded at most stop signals found at the ends of the coding regions of mRNAs because the context makes the stop signal so competitive for termination against non-canonical events that recoding is insignificant. 6.2 The Stop Signal of Eukaryotic Genomes – Diversity Contributes to Recoding
When nucleotide bias is examined in the non-redundent cDNA sequences from a number of eukaryotic genomes such as Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, or Homo sapiens, a characteristic pattern is seen [53]. A scan towards the stop signal reveals a characteristic increase in bias a few nucleotides before the codon as is seen for prokaryotic genomes and following the stop codon there is a gradual decrease of bias for up to nine nucleotides downstream (Figure 5A). This is the classic signature of a sequence element, with nucleotides upstream and downstream from the stop codon having the potential to contribute to the strength of the signal. When the nucleotides contributing to the bias are analysed, the pattern is similar amongst the various eukaryotic genomes but different from those of the E. coli strains. For example, where U (especially) and G in the +4 position are highly abundant and contribute to highly efficient signals in bacteria, in eukaryotes, the purines G (predominately) and A are favored (Figure 5B). Highly expressed genes in these eukaryotic genomes have been classified by analysis of two-dimensional protein gels and through the use of the codon adaption index (CAI; a measure of the codon subset used within the gene and strongly correlated with level of expression) together with a number of other criteria. The genes identified as highly expressed have a more significant bias in the nucleotides around the stop codon compared with those genes that are expressed at lower levels. This indicates that there is some translational advantage to the bias in the sequence element specifying termination. The bias and abundance of the amino acid found
15
16 The Mechanism of Recording in Pro- and Eukaryotes
Fig. 5 A statistical analysis of the nucleotide bias surrounding stop codons in the Saccharomyces cerevisiae genome. The bias (nonrandomness) was determined (A, B) by calculating the P 2 value [(observedexpected)2 /expected]. Bias is shown for nucleotide position (A) and in the individual bases contributing to bias at each position (B).
in the C-terminal position and the particular codon used is less marked than in prokaryotic genomes although there are residues that are still over-represented (His, Lys) and under-represented (Gly, Pro). There are also codon biases, for example, the lysine codon AAA is over-represented with respect to AAG in the yeast genome. This may relate to the fact that the tRNA for AAA is the hypermodified tRNA2 Lys , which may improve stability of peptidyl–tRNA interaction at the P-site during stop signal decoding with a corresponding decrease in the rate of readthrough at the following stop codon. Various features of a tRNA contribute to an ability to suppress a stop codon and read it as sense. Suppressor tRNAs, apart from tRNASec , are normal cellular tRNAs with a primary role to decode cognate sense codons but also they can decode stop codons that are near-cognate. There must be some enhanced codon:anticodon stability in the near-cognate interaction perhaps through unconventional base-pairings. Supporting this, is the extent of tRNA modification in or 3 to the anticodon important to enhance or depress readthrough rate [54]. Clearly, an unmodified tRNA coupled with a weak termination context would most probably enhance stop-codon suppression and allow readthrough to compete with the termination event. Bioinformatic analysis of the eukaryotic genomes has enabled prediction of strong and weak stop signal elements. Strong elements favor penultimate and ultimate codons that use tRNAs modified in their anticodons, coupled with G in the +1 position immediately 3 to the stop codon. In contrast, weak elements favor penultimate and ultimate tRNAs that are unmodified in their anticodons and a stop signal with T or C in the +1 position. These predicted weak signals are similar to the sequences found in viral recoding sites [53]. Experiments in mammalian in vitro and in vivo systems have confirmed that sequences over-represented upstream and downstream from the stop codon support only low levels of readthrough
7 Readthrough of a Stop Signal: Decoding Stop as Sense
compared with that supported by under-represented sequences. Sequences found at viral recoding sites supported elevated levels of readthrough in the test system and the patterns were similar for all three stop codons and in contrast with that found at prokaryotic stop codons. This can be explained by the fact that the eukaryotic RF, eRF1, recognizes all three stop codons in contrast with the two bacterial factors that each recognize UAA but are then specific for either UAG or UGA. Another recent study using mammalian cells in vivo showed that the −1 base (the base immediately prior to the stop codon) of a UAG stop signal element influenced recoding at the site mediated either through an effect from the characteristics of the P-site tRNA or interactions of the anticodon with the nucleotide [55]. In yeast, termination efficiency can vary significantly according to context and specific sequence motifs supporting readthrough have been identified [56]. Rousset and coworkers [57] looking for ‘weak termination contexts’, found readthrough motifs in eight yeast open reading frames within the genome. Studies with one of these, PDE2 encoding a cAMP phosphodiesterase, found a 20-fold difference in readthrough depending on whether the strain was [PSI− ] (an epigenetic element resulting in decreased accuracy of translation termination) or [psi− ] (normal strain) and the extended protein had lower stability. Just as is the case with prokaryotes, in eukaryotes, the sequences both upstream and downstream from the stop codon can have a profound influence as to how efficiently the codon signals stop. The nature of the sequence element opens up opportunities for recoding to occur at the stop codons and provides a subtle layer of gene regulation in specific circumstances where defined amounts of a protein are required.
7 Readthrough of a Stop Signal: Decoding Stop as Sense
The fact that the sequence context of the stop codon can produce a hierarchy of stop signals of varying efficiencies gives the potential for subtle redefinition of the signal so as to regulate amounts of a protein dependent on a readthrough event or for provision of a balanced ratio of two proteins from the same mRNA sequence. The extreme case is where the codon itself is defined for two purposes as with Sec incorporation at specific UGA codons discussed above. In this case, there is a cognate species for both events rather than a near-cognate competitor. However, it requires extra sequence elements for one of the cognate decoding species, the SectRNASec , to be competitive at the site. Indeed, Sec incorporation could be regarded as a canonical event and not as a recoding event where the UGA codon is reclassified in the same genetic system for another purpose. Rather than modifying one of 64 codons to provide a specific codon for Sec with a modified base to provide unique structure, one of the existing codons is utilized. Given that the origin of Sec may have been ancient and its presence particularly relevant as a catalytic residue when oxygen was not such a dominant physiological molecule, then UGA may have originally encoded Sec but subsequently has been captured as a stop codon. It is
17
18 The Mechanism of Recording in Pro- and Eukaryotes
interesting that in other genetic systems where a stop codon is used to specify an amino acid, for example, UGA for Trp in mammalian mitochondria, there is no duality of meaning for the codon. In mitochondria the definition of the codon as stop has most probably been lost during evolution. Readthrough of the UGA stop codon at the Sec insertion site in the fdhF mRNA is prevented even in the absence of selenium [58] and this depends on sequences upstream and downstream from the UGA. This could be interpreted either as a special protective element or as the sequence element of the stop signal now simply out-competing near-cognate events. The natural sequences were better at preventing readthrough than alternatives tested. This context may have evolved so that competition for decoding was only between Sec incorporation and termination, excluding the third possible event of another amino acid being incorporated at the site. This would be potentially possible if the decoding rate by the competing RF was slowed. However, such a protein would be non-functional without the key Sec residue at the active site thereby providing a rationale to exclude this possibility. Most cases of stop codon readthrough would involve near-cognate tRNAs that become competitive either because of the sequence of the stop signal or because there are other cis elements that favor the near-cognate over the cognate event. Given the potential for readthrough to be used as a mechanism of regulating gene expression and creating more diversity in the proteome, it is surprising that more examples have not been found. To date, there are a small number of examples where readthrough seems to be important and these span viruses, bacteria, and eukaryotes. Readthrough occurs at UGA and UAG stop codons with the most common amino acids incorporated, Trp and Gln, respectively. Although the efficiency of the event is relatively low (1–10%) in competition with termination, this is still 100- to 1000-fold above the error rate. The cis elements that influence readthrough can be well beyond the boundaries of the stop signal. For example, similar to the SECIS element in eukaryotes for Sec incorporation, an element several hundred nucleotides downstream from the stop codon influences readthrough at a specific stop codon in barley yellow dwarf virus [59]. In addition, secondary-structural elements are also important. A classic example is found in the synthesis of the murine leukemia virus gag-pol precursor protein, where a pseudoknot is an important mediator of the recoding event [60]. On the other hand, other viral examples do not seem to have cis elements beyond the immediate environment of the stop codon that is under recoding pressure. For UGA recoding sites in Sindbis virus [61], and in an E. coli bacteriophage RNA Qβ [62] only the +4 base of the stop signal seems critical. In both cases, the nucleotide following the stop codon is C contributing significantly to a poor context for termination. These situations would be classic candidates for readthrough based on what we know now about the stop signal. Atkins and co-workers [63] have examined 91 unique viral sequences where readthrough of stop signals is known to occur. It is of interest that 90% had one of six trinucleotide sequences downstream from the site (out of the possible 64). The authors make the point that the identity restriction of six nucleotides following the stop codon is remarkable given they come from RNA viruses where mutation rates are high. In other words, there has been strong pressure to retain the contexts. While
8 Bypassing of a Stop Codon: ‘Free-wheeling’ on the mRNA
readthrough may reflect the strength of the stop signal that these contexts create and the rate of decoding by the RF, in the case of RNA viruses where evolution of optimum sequences is likely to occur quite rapidly, other equally or more important features may also have evolved. These may allow for an enhanced rate of aminoacyltRNA binding during near-cognate decoding (perhaps mediated by non-canonical interactions between particular context sequences of the stop-codon mRNA and the rRNA in the environment of the A-site) apart from any secondary structural elements that may be enhancing these effects. Recoding in these circumstances utilizes a stop codon as the marker for the site and in most cases a stop signal that is decoded more slowly by the RF so that there is a favorable site for building a recoding signal with the addition of other sophisticated and specific elements in cis or in trans. The examples in prokaryotic and eukaryotic genomes where readthrough appears to be important are generally less well studied but are potentially very interesting. For example, in D. melanogaster, at least three genes seem to be regulated via stop codon readthrough. Readthrough at a UGA in the kel gene is regulated both in a tissue and developmentally specific manner with maximal readthrough during metamorphosis [64]. A topoisomerase gene in Bacillus firmus is the only documented bacterial gene not derived from bacteriophages that is supposed to use readthrough as a means of regulation [65].
8 Bypassing of a Stop Codon: ‘Free-wheeling’ on the mRNA
In the last decade or so, the more RNA is studied the more remarkable mechanisms are discovered associated with its biology. Many of these could not have been anticipated or dreamt by scientists as possibilities. One such recoding event is stopcodon bypassing, where a section within the mRNA coding region is missed out and recoding starts again further down the mRNA. The classic example is for the bacteriophage T4 gene 60, a topoisomerase subunit gene, where 50 nucleotides are omitted from the decoding process for the correct full-length protein to be produced [66]. In this case, the flanking codons are matching GGAs decoded by tRNAGly2 and the next codon to be decoded after the first GGA is a stop signal with a weak downstream context (UAGCCU). In this case, the choice of events is for the peptidyl-tRNA in the P-site to detach from the mRNA before the UAG in the A-site is decoded and synthesis terminated. This detachment (called ‘take-off’ by Atkins and colleagues) occurs with very high efficiency and under physiological conditions it appears that termination is out-competed. There are additional cis and trans elements that drive the event that includes in addition to detachment of the peptidyl-tRNA, scanning of the mRNA and re-attachment (or ‘landing’ according to the Atkins’ nomenclature) where canonical protein synthesis resumes. As has become the familiar pattern at recoding sites, the stop codon forms the basic platform for the event and other sophisticated elements have been put in place to ensure its efficiency. The detachment site and the stop codon are within
19
20 The Mechanism of Recording in Pro- and Eukaryotes
the stem of a stem-loop secondary structure but, intriguingly, a sequence of charged and hydrophobic amino acids in the nascent peptide synthesized up to this point also acts as mediator of the event [67]. Clearly, to initiate the event there must be competition between detachment and termination. The elements favoring detachment overwhelm stop codon decoding by RF1, the cognate decoder of UAG, as most ribosomes initiate bypassing. A detailed study has been undertaken to try to assign the importance of the various cis and trans elements in the three stages of the event; detachment, scanning and landing [68]. Structures of the bacterial ribosome suggest the peptidyl-tRNA is held at the P-site with a number of interactions between it and rRNA in the vicinity of the decoding site where codon–anticodon interaction is occurring [69]. This is in contrast with the A-site tRNA, where there is a paucity of apparent interactions near the site of codon–anticodon interaction. Slippage of the peptidyl-tRNA on the mRNA involving both detachment and movement to prevent re-attachment must require some significant disruption to these normal contacts between tRNA and rRNA. It is assumed that the decoding rate of UAG by RF1 is sufficiently slowed so that the force for detachment can predominate. Indeed, even with a temperature-sensitive RF1 strain it was not possible to demonstrate competition between termination and detachment because of the overwhelming advantage of the recoding event and it was not until functional RF1 was over-expressed that a decrease in detachment of the peptidyl-tRNA was observed [68]. Presumably, this was because the local concentration of RF1 at the recoding site was higher, the rate of decoding the stop codon was enhanced and, therefore, the kinetic pause was shortened. This study supports a simple pause model at the termination codon for providing the right conditions for ‘take off’ and modest stability of the peptidyl-tRNA interaction at the P-site. The current model for RF-mediated release of the completed polypeptide from the P-site tRNA invokes a conformational change in the RF with altered ribosomal interactions and positioning in the A-site after successful cognate decoding of the stop codon. This is proposed to trigger a signal to the peptidyltransferase center that initiates hydrolysis of the ester bond between the ultimate tRNA and growing polypeptide [44, 70]. If such a conformational change is prevented or altered following successful decoding at the recoding site, this might also favor detachment of the peptidyl-tRNA before hydrolysis can occur. However, as high concentrations of RF1 can compete successfully, the bypassing elements can, at most, decrease the likelihood of a successful signal being triggered to the peptidyltransferase center by the RF. On the other hand, the initial binding rate of RF1 to the A-site might be decreased significantly leaving the A-site empty and an empty A-site might be required for detachment. The role of the cis-acting stem-loop can be compensated for by the removal of ribosomal protein L9. This suggests L9 may have a role in preventing slippage of the peptidyl-tRNA at the P-site, or a role in A-site decoding. However, recent data suggest defects in L9 may enhance mRNA movement through the ribosome [71]. Therefore, the ability of ribosomes lacking L9 to complement mutations in the stem-loop might be through this mechanism. On the other hand, the loss in bypass efficiency by mutant tRNAGly2 can be compensated only by mutations in the
9 Frameshifting Around Stop or Sense Codons 21
nascent gene 60 protein before the site. This suggests the two trans elements, the tRNA and the nascent peptide, are operating through different mechanisms. The data support the nascent peptide either enhancing the peptidyl-tRNA dissociation and the stem-loop occupancy of the A-site, or indirectly enhancing movement of a dissociated peptidyl-tRNA. Presumably, all of the elements will be impinging on the ribosomal architecture and interactions in different ways to loosen structurally mediated tight controls on frame maintenance that are important for canonical decoding events during translation. The mechanism of bypassing seems bizarre to contradict all reason as a logical mechanism of making a viable protein but this is characteristic of RNA. That these highly unusual mechanisms are present, is an important example of how a complicated RNA machine involving three types of RNA can evolve even more sophisticated processes beyond the extremely intricate procedures required for normal canonical events.
9 Frameshifting Around Stop or Sense Codons
Frameshifting involves a disruption and slippage of the interactions between mRNA and tRNA. There is forward (+) or backward (−) movement of the mRNA with respect to tRNA anticodon:codon binding. However, with frameshifting, the movement is of only one or two bases before re-engagement of the tRNA and disruption of the original reading frame is a consequence of this slippage. This is in contrast with ‘bypassing’ mRNA, where a greater length of the mRNA is avoided with respect to the tRNA. Whereas disruption of the reading frame can occur here as well, translation still has the potential to resume in the same frame as the prior synthesis. Gallant and Lindsay [72] have shown that ribosomes can slide over ‘hungry codons’ (described as such where an aminoacyl-tRNA is limiting) and then resume translation at a cognate codon many nucleotides downstream similar to the classical bypass event shown with bacteriophage T4 gene 60. Slippage over ‘hungry codons’ is not ‘programed’ but occurs in the mRNA under a special set of physiological circumstances [72]. In both frameshifting and bypassing, the existing interactions between tRNA and mRNA and possibly also rRNA are perturbed in the initial event. The difference is in the re-engagement process. It occurs almost immediately in the case of frameshifting and is mostly dictated by the particular mRNA sequence and the opportunity for the tRNA to re-engage in the new frame through anticodon:codon interactions. In this way, frameshifting is an example of a programed translational event. Thus far, discussion of recoding in this chapter has focussed on sites that have stop codons as an essential framework (readthrough, Sec incorporation, bypassing). In the case of frameshifting, + or – slippage events are found not only at stop codons but also at regions of the mRNA where a stop codon is not present. Clearly, at these sites the advantage of having a slowly decoded stop codon to allow for kinetic competition at the translational pause is not needed because there are
22 The Mechanism of Recording in Pro- and Eukaryotes
alternative ways in which the pause is created at a sense codon or kinetic competition sufficiently favors the non-canonical event. Non-programed frameshifting can occur naturally at sense codons but at a very low frequency (less than 5 × 105 per codon) [73]. Farabaugh and co-workers [74] have described how frameshifting can disrupt the reading frame by a number of possible mechanisms such as: 1. Expansion or contraction of codon size to four or two bases instead of three (this would not require disengagement of existing interactions). 2. Orientation of the incoming tRNA to allow recognition of three bases but not the next three that are in-frame. 3. Translocation of the mRNA after peptide-bond formation to move the mRNA two or four bases forward instead of three. 4. Disruption of existing RNA–RNA interactions to facilitate slippage after translocation.
Debate on which of these mechanisms operate physiologically has been lively, but there is no necessity to explain all frameshift events by a universal mechanism. For example, a mutant tRNAGly is able to correct frameshift mutations [75] and a possible mechanism is the recognition of GGGN rather than GGN. However, it is hard to accommodate this model with the recent structural information on how the rRNA elegantly ‘senses’ each of the three bases of the correct codon in the ribosomal A-site [10]. 9.1 Forward Frameshifting: the +1 Event
The first discovery of frameshifting in a cellular gene initially appeared to be a ‘one-off’ discovery because it was an exquisite example of how a specific gene could regulate its own synthesis by a unique mechanism. The gene was prfB encoding the bacterial RF2 that recognizes the UGA stop codon. The frameshift site within the RF2 mRNA contained a UGA stop codon, the very codon recognized and decoded by RF2 as stop. This immediately suggested that here was a unique mechanism by which RF2 could control its own synthesis. The stop codon is in-frame at position 26 and it was discovered from sequencing of the first 44 amino acids of the protein that there had to be a +1 frameshift event at the stop codon during translation to obtain the derived sequence of the full-length protein [50]. This was a programed event in that while the stop codon provided the framework of the site, there were other cis elements that facilitated the event. These elements included the codon immediately before the stop codon, CUU, that can detach from the anticodon of the peptidyl-tRNA in the P-site allowing it to re-pair in the +1 frame with UUU, comprising the last two bases and the first base of the stop codon UGA. When this occurs, the codon in the new +1 frame A-site is GAC allowing for Asp to be incorporated as the next amino acid.
9 Frameshifting Around Stop or Sense Codons
Not only is translational frameshifting conserved in the prfB genes from a wide range of bacteria (approximately 70% of those sequenced so far) but the CUUUGA motif is also retained implying a conservation of mechanism [76]. Just 5 of the CUUUGA motif, an internal Shine–Dalgarno sequence normally found 5–6 bases in front of start codons in bacterial genes, base-pairs with a complementary region of the 3 -terminus of the 16S rRNA to facilitate the frameshift event [77]. As well, the stop signal contains the least efficient downstream sequence, UGACUA, of any found in bacteria [42, 78]. This implies that stress is placed on interactions between the mRNA, tRNA and rRNA during decoding of the CUU. At the same time as the ‘misplaced’ Shine–Dalgarno interaction is occurring, a weak stop signal is present in the A-site giving a decoding pause and allowing time for the tRNA to disengage from the peptidyl-tRNA and re-engage with the next base in the mRNA. The trans element for frameshifting in the RF2 mRNA is, of course, the RF2 protein itself (Fig. 6). Modulation of its concentration can shift frameshift efficiency from 0–100% [51, 52, 79], illustrating how effective the mechanism can be to regulate the cellular concentration of RF2. Under normal physiological conditions of bacteria growing in rich media in log phase, the frameshift efficiency is approximately 30–50%. Frameshifting will also occur if the stop codon is replaced by a sense codon but then the efficiency is inversely related to the rate of amino acyltRNA selection and supports the importance of the translational pause [80]. Perhaps a U:G wobble base pair in the pre-shift codon:anticodon that has been associated with high-frequency frameshifting facilitates the event at the RF2 site [81]. A similar +1 programed frameshifting event with a novel autoregulatory mechanism was discovered some 10 years later in the mammalian ornithine decarboxylase antizyme gene [82]. Antizyme synthesis is induced by polyamines (see Fig. 6). Antizyme binds to and induces a conformational change in the key enzyme in polyamine synthesis, ornithine decarboxylase, targeting it for turnover. This was another example of a remarkable autoregulatory circuit where the key molecule was not a protein this time but the polyamine metabolites, spermidine, spermine and putrescine. Polyamines are known to interact with RNA and probably influence
Fig. 6 The cis elements and trans-acting effectors critical for +1 frameshifting at the RF2 and antizyme recoding sites.
23
24 The Mechanism of Recording in Pro- and Eukaryotes
key interactions at the decoding site that facilitate the recoding event. The recoding site in the antizyme mRNA is also at the codon preceding a stop codon (UCCUGAU). Key cis elements are responsible for enhancing frameshift efficiency; at least 50 nucleotides in modules just 5 of the frameshift site are responsible for 2- to 3-fold stimulation of frameshift efficiency, the stop codon increases efficiency 15- to 20-fold and there is a downstream pseudoknot just three nucleotides distant from the motif that contributes 2.5- to 5-fold. The upstream sequences may be equivalent in some way to the Shine–Dalgarno element of the prfB site in function. In contrast with this site, whereas findings from site-directed mutagenesis studies suggested that re-pairing between the tRNA and the +1 shift CCU was unlikely, phylogenetic analysis and observations of –2 frameshifting at the site in yeast are supportive of a re-pairing mechanism [83]. Two mammalian paralogues of what is now called antizyme 1 have been found [84] each with a frameshift site, with antizyme 3 being tissue and cell-type specific. Atkins and co-workers [83] have studied these antizyme sites in a wide range of organisms and this has enabled them to make several conclusions about recoding events. First, recoding sites can be selected over very long evolutionary periods. Secondly, some plasticity is characteristic of the site and, thirdly, once an initial site is established a range of stimulators acting in cis can evolve to give specific characteristics and optimize recoding efficiencies. An example of a mechanism for + 1 frameshifting, which is different from that of tRNA slippage coupled with slow decoding of the codon in the A-site, seems to occur with the yeast retrotransposon Ty3 [73]. In this case, frameshifting occurs at the sequence GCGAGUU as a result of the incoming tRNA pairing with the out-offrame GUU without slippage of the peptidyl-tRNA. Interestingly, over-expressed near-cognate P-site tRNAs were able to induce frameshifting generally at the Ty3 site and is in contrast with cognate tRNAs that decreased frameshift efficiency [84]. This suggests that near-cognate tRNAs may be decoding GCG in the Ty3 site and, when coupled with a slowly decoded codon like AGU, allow take-over by the tRNA recognizing the +1 codon, GUU. Yet another example of +1 frameshifting has been found in the yeast telomerase gene, EST3. Yeast use telomerase to maintain the ends of their chromosomes and the +1 translational frameshift event required to produce fully functional telomerase occurs at a motif, CUUAGUUGAG [85]. This motif is similar to the Ty1, Ty2 and Ty4 frameshift site, CUU AGG C, in the underlined portion but, as well, contains a stop codon. These additional examples indicate there may be more sites yet to be found where +1 frameshifting plays an important part in gene regulation. 9.2 Programed −1 Frameshifting: A Common Mechanism used by Many Viruses During Gene Expression
Diverse virus groups have evolved a recoding site to express coat proteins and enzymes in a carefully balanced ratio. The mechanism involves switching frame
9 Frameshifting Around Stop or Sense Codons
during the synthesis of a polyprotein (gag) so that the extended product (gag:pol) from the new frame contains the enzyme sequences that can be excised from the protein. For most passages (approximately 90%), the ribosome does not frameshift with the result that the coat protein subunits are produced in 10-fold higher amounts than the enzymes. This subunit : enzyme ratio is important for productive infection of the virus. For such viruses, the recoding site has clear motifs and stimulatory elements consisting of two cis acting sequences, a heptanucleotide motif XXXYYYZ (a slippery sequence) and a secondary structural element, usually a pseudoknot. It was a seminal paper in 1985 that indicated quite clearly that frameshifting was not going to be restricted to the example crafted so elegantly for RF2 gene expression [86]. The Rous Sarcoma Virus used a recoding site that included a stop codon (AAAUUUAUAG) similar to that used in expression of RF2 but here the shift was backwards rather than forwards. The new codon in the A-site became AUA rather than UAG and the stop codon was avoided. Jacks and Varmus [86] noticed that the AAAUUUA motif allowed slippage of the two tRNAs (Asn and Leu) simultaneously to re-pair in a near-cognate manner with AAA UUU. They proposed a simultaneous slippage model from the A- and P-sites meaning that slippage would occur before peptide transfer and translocation. This mechanism became the generally accepted model for −1 frameshifting. In hindsight, there were two aspects of this mechanism that required further investigation. First, the proposed mechanism would not involve the stop codon in the recoding event within the decoding center and yet the importance of the stop codon as a part of recoding sites as discussed here is clearly established. While the codon following the slippery motif in viruses is often not a stop codon, for example, it is GGG in HIV-1, there is still a restricted subset of codons used in this position. This is highly suggestive that the UAG codon in the Rous Sarcoma Virus frameshift site (or the GGG codon in the HIV-1 site) might be involved in the recoding event and may have a role at the decoding site (Figure 7). How can this occur? A simultaneous-slippage model where slippage occurs at the P- and E-sites rather than the A-and P-sites, would allow the next codon to be occupying the A-site. Then
Fig. 7 Sequence motifs and cis elements at viral −1 frameshift sites.
25
26 The Mechanism of Recording in Pro- and Eukaryotes
the stop or sense codon could contribute to a translational pause and facilitate the slippage event. Secondly, the unusual simultaneous-slippage mechanism proposed had slippage occurring in competition with peptide-bond formation that is normally a kinetically rapid event in protein synthesis. It is more likely that frameshifting is in competition with a kinetically slower event. While others have favored this to be a step after peptide-bond formation but prior to translocation so that slippage could still occur from the A-and P-sites, we believed an equally similar step occurred after translocation, i.e., slippage from the P- and E-sites in competition with the new event in the now empty A-site. This could be the decoding of the stop or sense codon depending on the sequence composition of the particular viral site. For this reason, we attempted to determine whether the codon after the slippery site was being decoded in the A-site before slippage had occurred. Initially, because the genes for the bacterial RFs were available within plasmids we used the HIV-1 recoding site with the GGG codon replaced with each of the stop codons and bacterial ribosomes. We then tested whether over-expression in vivo of the bacterial RFs would affect frameshifting at the recoding site that occurred readily on these bacterial ribosomes. There were two key observations. First, changing the codon from the natural GGG to each of the stop codons, UAG, UGA and UAA, markedly lowered frameshifting, indicating that the codon was having a major effect on the mechanism. More significantly, over-expression of the factors eliminated frameshifting in a codon-specific manner (RF1 at UAG and UAA, RF2 at UGA and UAA). This was compelling evidence that the codon following the recoding site was being decoded before slippage had occurred and supported a P–E site simultaneous-slippage model [87]. Now that the genes for the eukaryotic RFs are available we have repeated this study in vivo using mammalian cells. The results from these studies using mammalian ribosomes concur with our in vivo studies that used bacterial ribosomes. First, the stop codon depressed frameshift efficiency. Secondly, over-expression of eRF1 caused a further reduction of efficiency as it did also at the antizyme recoding site used as a control [88]. This is provocative evidence that this −1 frameshift event is occurring in a similar manner to a +1 frameshift where recoding is in competition with a canonical decoding event at the ribosomal A-site. Recently, Dinman and co-workers [89] have proposed an elegant integrated model to explain programed frameshifting that defines why in some circumstances +1 and in others −1 frameshifting occurs. A key element of the model is that the occupancy states of the ribosome are different for each event. For +1 frameshifting, the A-site is empty and the shift occurs in the post-translocational state (P- and Esites occupied with tRNAs), but for −1 frameshifting, the A-site is proposed to be already occupied and therefore the ribosome is in the pre-translocational state (A- and P-sites occupied with tRNAs). They have proposed this model as a result of studies that used antibiotics with normal and mutant yeast strains and how under these circumstances frameshifting efficiency was affected. A consistent set of conclusions was drawn from the differential antibiotic sensitivities of the +1 and −1 frameshift events to support the model. For example, translocation inhibitors interfere with +1 but not −1 frameshifting, although neither do they enhance it. As predicted by the model, a peptidyltransferase inhibitor, sparsomysin, that would
9 Frameshifting Around Stop or Sense Codons
increase a ribosomal pause when the A- and P-sites are occupied enhances −1 but not + 1 frameshifting. These conclusions are consistent with what is known about +1 frameshifting from the examples described in this chapter. However, it conflicts with conclusions drawn from our recent evidence that frameshifting at the classic viral −1 site also occurs at the post-translocational state of the ribosome at which time the A-site is empty. In our model, +1 and −1 frameshift events would generally start from the same post-translocational state. It is the specific sequence of the site, the nature of the constraining cis-acting elements and how they impinge on interactions between the tRNAs, mRNA and rRNA and, in particular, the stability of the peptidyl-tRNA:mRNA interaction that will determine whether the shift is forwards or backwards. It invokes the common element of a translational pause at the decoding A-site, dislocation in most cases of the codon:anticodon pairing interactions at the P- and E-sites and re-pairing close by where possible (+1 and −1 events) or bypassing larger tracts of sequence if immediate re-pairing is not favorable (Fig. 8). How can these apparently conflicting positions be accommodated? It could be that if there were a weakening of the already compromised interactions at recoding sites by any effector such as an antibiotic or a mutation creating a new pause in a particular ribosomal state, then this would have the potential to facilitate a shift in frame. Frameshifting may be able to occur in different states of the ribosome under these circumstances. If the partially inhibited process still does not become the rate-limiting step, then it may have no effect on frameshifting. However, if it now makes that process the slowest step, then frameshifting could be enhanced (e.g., inhibiting −1 but not +1 frameshifting by the peptide-bond formation inhibitor, sparsomycin). Moreover, as the detailed new structural information on the ribosome has become available it is revolutionizing our understanding on how antibiotics act and our understanding on the various steps of protein synthesis deduced from biochemical studies both in vitro and in vivo. Further experiments will be needed to resolve some of these apparent paradoxes and it may not be possible to invoke a completely integrated model to explain all programed frameshift events.
Fig. 8 Alternative recoding events that could result from destabilization of the peptidyl-tRNA interaction.
27
28 The Mechanism of Recording in Pro- and Eukaryotes
In addition to the classic viral −1 frameshift events of the retroviruses and other viral groups including bacteriophages, there are also examples identified in cellular genes. In the prokaryotic examples, such as that found in the gene dnaX encoding a subunit of DNA polymerase III [90–92], there are different features to the recoding site that reflect how elements can be added to a basic recoding framework. The Shine–Dalgarno sequence documented for the prfB gene (+1 frameshifting) is found in the dnaX gene as an upstream element, but at a different and precise spacing for acting as a stimulator. The slippery heptamer sequence, AAAAAAG, allows two Lys– tRNAs paired with AAAAAG to slip backwards and re-pair with AAAAAA so that the following G becomes the first base of the next codon. Before slippage, this G would be paired with a modified U (mnm5 S2 U) in the anticodon of the tRNA, the same modification found often in the P-site tRNA at termination sites. However, the modification prevents stable base pairing with G and thereby weakens the codon:anti-codon interaction. Downstream from the slippery site there is a hairpin loop that can also act as a stimulator independent of the upstream element. Does −1 frameshifting occur in cellular genes in eukaryotes? As the features of a −1 viral recoding site are quite clear-cut, it has been possible to create an algorithm to search genomes for genes that might use this strategy. While a number of possible genes have been flagged, there is, as yet, neither any evidence that they use this strategy nor a rationale as to why they might do so [93]. Nevertheless, recently, an EST was discovered that represented a gene, edr, that appeared to have a retrovirus as its ancestor. The recoding site was GGGAAAC with a pseudoknot located downstream. It was expressed temporally and spatially during development although it is not known whether expression is essential for development. However, it seems to use a −1 frameshift mechanism for expression [94]. As programed −1 frameshifting is used in the viral biology of pathogenic viruses like HIV-1 and is a potential target for an antiviral agent, it is important to identify putative human genes that might use this mechanism as this might preclude the viral recoding site as a place of attack in an antiviral strategy. So far, there is no definitive data to preclude this approach although further study is required on the importance of expression of the edr gene.
10 Conclusion
When RNA molecules interact there can be amazing consequences. We have known for some time that protein synthesis is likely to be the consequence of a highly accurate and fast RNA machine that incorporates specific features to ensure that speed and accuracy can occur together. Recent developments have heightened this appreciation now that the RNA and protein components can be seen in atomic detail. For the first time, it is possible to visualize how some of these special features function. The sensing by the rRNA that the correct codon:anticodon interactions between the tRNA and the mRNA are occurring at the decoding A-site is an exquisite
10 Conclusion
example. The ‘enzyme’ (the rRNA) holds the ‘substrates’ (the tRNAs) with unique interactions to maintain the canonical three-base reading frame to ensure they are read with an acceptable level of accuracy. It is not surprising then that this high degree of precision can be overridden. Sometimes, this is a result of a low-frequency error. Sometimes, it is as a result of an atypical physiological perturbation such as starvation for an amino acid but as we know now, sometimes, it is a result of a non-canonical recoding event that has evolved for highly specialized physiological reasons. As each interaction between the mRNA, rRNA and the tRNAs is so critical for maintenance of precision, there is potential for elements acting in cis or trans to perturb one or more of these specific pairings. Sequences in the mRNA, or structures that can form within it, have the capacity to form new interactions with the rRNA and put strain on normal interactions. As a consequence, the highly ordered co-ordination between structure and function can be disrupted and kinetic parameters can be slowed giving a chance for alternative events that normally would not be competitive to be significant. Trans activators of non-canonical recoding can be of a diverse nature such as the protein RF2 acting at the recoding site within its own mRNA, or metabolites like polyamines in the antizyme frameshift site or a unique tRNA like tRNASec in the Sec incorporation site. Sites where recoding events have evolved often have a stop codon as a central core with upstream and downstream elements that help to ensure that stop codon is more slowly decoded. These sequence elements exert influence in two ways. First, sequences contribute to the decoding efficiency of the stop signal itself. Upstream elements likely to be contributing to the three-dimensional architecture of the RF binding site and downstream elements are probably facilitating stable interactions between the factor and mRNA. Secondly, the upstream and downstream elements can independently perturb other parts of the decoding center and affect parameters independent of the stop codon. While having a stop codon at the recoding site is not universal, it does provide an ideal environment for the creation of translational pauses and provides a vulnerable ‘Achilles heel’ of canonical protein synthesis where recoding events can evolve. It is not surprising that most examples of recoding occur in the highly specialized genetic system of the virus. Both bacteriophages and eukaryotic viruses have evolved and maintained highly successful recoding mechanisms. The classic −1 frameshift site found in many eukaryotic viruses has allowed multiple proteins to be synthesized from a single RNA with economy and in balanced amounts. It is surprising, now that we know recoding can be an important translational control mechanism for gene expression, that more examples have not been found in prokaryotic and eukaryotic cellular genes. Those that are known and relatively well characterized such as +1 frameshifting in both the prfB gene for RF2 and in the gene for ornithine decarboxylase antizyme and Sec incorporation at selected UGA stop codons, are highly elegant mechanisms for gene expression regulation. These events add a layer of fine control and subtlety to the more typical control mechanisms. Now that new genome and proteome information is appearing rapidly, will the opportunity to examine gene function in detail reveal more examples of the
29
30 The Mechanism of Recording in Pro- and Eukaryotes
recoding types that are already known and, perhaps, even new events? It is likely that new events will be discovered but these probably will represent rare exclusive examples where some niche advantage is gained. RNA provides ‘flexible moulding clay’ from which new mechanisms and functions can emerge [95]. There are sure to be more surprises.
References 1 H. J. RHEINBERGER, K. H. NIERHAUS, Proc. Natl. Acad. Sci. USA 1981, 78, 5310–5304. 2 K. H. NIERHAUS, Mol. Microbiol. 1993, 9, 661–669. 3 J. FRANK, J. ZHU, P. PENCZEK et al., Nature 1995, 376, 441–444. 4 R. F. GESTELAND, J. F. ATKINS, Annu. Rev. Biochem. 1996, 65, 741–768. 5 S. L. ALAM, J. F. ATKINS, R. F. GESTELAND, Proc. Natl. Acad. Sci. USA 1999, 96, 14177–14179. 6 M. J. BERRY: in Translational Control of Gene Expression, eds N. SONENBERG, J. W. B. HERSHEY, M. B. MATHEWS, Cold Spring Harbor Laboratory Press, New York 2000. 7 P. V. BARANOV, R. F. GESTELAND, J. F. ATKINS, Gene 2002, 286, 187–201. 8 A. J. HERR, J. F. ATKINS, R. F. GESTELAND, Annu. Rev. Biochem. 2000, 69, 343–372. 9 P. J. FARABAUGH, Microbiol. Rev. 1996, 60, 103–134. 10 J. M. OGLE, D. E. BRODERSEN, W. M. Jr. CLEMONS et al., Science 2001, 292, 897–902. 11 A. B¨OCK, K. FORCHHAMMER, J. HEIDER et al., Mol. Microbiol. 1991, 5, 515–520. 12 A. B¨OCK, K. FORCHHAMMER, J. HEIDER et al., TIBS 1991, 16, 463–467. 13 W. LEINFELDER, K. FORCHHAMMER, B. VEPREK et al., Proc. Natl. Acad. Sci. USA 1990, 87 543–547. 14 L. S. MULLINS, S-B. HONG, G. GIBSON et al., J. Am. Chem. Soc. 1997, 119, 6684–6685. 15 W. LEINFELDER, E. ZEHELEIN, M. A. MANDRAND-BERTHELOT et al., Nature 1988, 331, 723–725. 16 C. BARON, E. WESTHOF, A. B¨OCK et al., J. Mol. Biol. 1993, 231, 274–292. 17 K. FORCHHAMMER, W. LEINFELDER, K. BOESMILLER et al., J. Mol. Biol. 1991, 266, 6318–6323. 18 K. FORCHHAMMER, A. B¨OCK, J. Biol. Chem. 1991, 266, 6324–6328.
19 H. ENGELHARDT, K. FORCHHAMMER, S. MULLER et al., Mol. Microbiol. 1992, 6, 3461–3467. 20 K. FORCHHAMMER, W. LEINFELDER, A. B¨OCK, Nature 1989, 342, 453–456. 21 C. BARON, A. B¨OCK, J. Biol. Chem. 1991, 266, 20375–20379. 22 F. ZINONI, J. HEIDER, A. B¨OCK, Proc. Natl. Acad. Sci. USA 1990, 87, 4660–4664. 23 A. H¨UTTENHOFER, E. WESTHOF, A. B¨OCK, RNA 1996, 2, 354–366. 24 A. H¨UTTENHOFER, A. B¨OCK, Biochemistry 1998, 37, 885–890. 25 S. J. KLUG, A. HUTTENHOFER, M. KROMAYER et al., Proc. Natl. Acad. Sci. USA 1997, 94, 6676–6681. 26 J. B. MANSELL, PhD Thesis, University of Otago, New Zealand, 1999. 27 J. B. MANSELL, D. GUE´ VREMONT, E. S. POOLE et al., EMBO J. 2001, 20, 7284–7293. 28 S. SUPPMANN, B. C. PERSSON, A. B¨OCK, EMBO J. 1999, 18, 2284–2293. 29 R. WILTING, S. SCHORLING, B. C. PERSSON et al., J. Mol. Biol. 1997, 266, 637–641. 30 P. R. COPELAND, J. E. FLETCHER, B. A. CARLSON et al., EMBO J. 2000, 19, 306–314. 31 R. M. TUJEBAJEVA, P. R. COPELAND, X.-M. XU et al., EMBO Rep. 2000, 1, 158–163. 32 D. HATFIELD, A. DIAMOND, B. DUDOCK, Proc. Natl. Acad. Sci. USA 1982, 79, 6215–6219. 33 B. J. LEE, M. RAJAGOPALAN, Y. S. KIM et al., Mol. Cell Biol. 1990, 10, 1940–1949. 34 S. C. LOW, J. W. HARNEY, M. J. BERRY, J. Biol. Chem. 1995, 270, 21659–21674. 35 E. GRUNDNER-CULEMANN, G. W. MARTIN III, J. W. HARNEY et al., RNA 1999, 5, 625–635. 36 K. K. MCCAUGHAN, C. M. BROWN, M. E. DALPHIN et al., Proc. Natl. Acad. Sci. USA 1995, 92, 5431–5435.
References 37 J. KOHLI, H. GROSJEAN, Mol. Gen. Genet. 1981, 182, 430–439. 38 C. M. BROWN, P. A. STOCKWELL, C. N. A. TROTMAN et al., Nucleic Acids Res. 1990, 18, 2079–2086. 39 C. M. BROWN, M. E. DALPHIN, P. A. STOCKWELL et al., Nucleic Acids Res. 1993, 21, 3119–3123. 40 C. M. BROWN, P. A. STOCKWELL, C. N. A. TROTMAN et al., Nucleic Acids Res. 1990, 18, 6339–6345. 41 W. P. TATE, E. S. POOLE, J. A. HORSFIELD et al., Biochem. Cell Biol. 1995, 73, 1095–1103. 42 L. L. MAJOR, PhD Thesis, University of Otago, New Zealand, 2001. 43 E. S. POOLE, L. L. MAJOR, S. A. MANNERING et al., Nucleic Acids Res. 1998, 26, 954– 960. 44 D-J. G. SCARLETT, K. K. MCCAUGHAN, D. N. WILSON et al., J. Biol. Chem. 2003, 278, 15095–15104. 45 L. L. MAJOR, E. S. POOLE, W. P. TATE, in preparation. 46 D. S. KONECKI, K. C. AUNE, W. TATE et al., J. Biol. Chem. 1977, 252, 4514–4520. 47 S. MOTTAGUI-TABAR, A. BJO¨ RNSSON, L. A. ISAKSSON, EMBO J. 1994, 13, 249–257. 48 S. ZHANG, M. RYDE´ NAULIN, L. A. ISAKSSON, J. Mol. Biol. 1996, 261, 98–107. 49 A. BJO¨ RNSSON, S. MOTTAGUI-TABAR, L. A. ISAKSSON, EMBO J. 1996, 15, 1696–1704. 50 W. J. CRAIGEN, R. G. COOK, W. P. TATE et al., Proc. Natl. Acad. Sci. USA 1985, 82, 3616–3620. 51 W. J. CRAIGEN, C. T. CASKEY, Nature 1986, 322, 273–275. 52 B. C. DONLY, C. D. EDGAR, F. M. ADAMSKI et al., Nucleic Acids Res. 1990, 18, 6517–6522. 53 A. G. CRIDGE, E. S. POOLE, W. P. TATE, unpublished results. 54 H. BEIER, M. GRIMM, Nucleic Acids Res. 2001, 29, 4767–4782. 55 M. CASSAN, J.-P. ROUSSET, BMC Mol. Biol. 2001, 2, 3. 56 B. BONETTI, L. W. FU, J. MOON et al., J. Mol. Biol. 1995, 251, 334–345. 57 O. NAMY, G. DUCHATEAU-NGUYEN, J.-P. ROUSSET, Mol. Microbiol. 2002, 43, 641–652. 58 Z. LIU, M. RECHES, H. ENGELBERG-KULKA, J. Mol. Biol. 1999, 294, 1073–1086.
59 C. M. BROWN, S. P. DINESH-KUMAR, W. A. MILLER, J. Virol. 1996, 70, 5884–5892. 60 N. M. WILLS, R. F. GESTELAND, J. F. ATKINS, Proc. Natl. Acad. Sci. USA 1991, 88, 6991–6995. 61 G. LI, C. M. RICE, J. Virol. 1993, 67, 5062–5067. 62 A. M. WEINER, K. WEBER, J. Mol. Biol. 1973, 80, 837–855. 63 L. HARRELL, U. MELCHER, J. F. ATKINS, Nucleic Acids Res. 2002, 30, 2011–2017. 64 D. N. ROBINSON, L. COOLEY, Development 1997, 124, 1405–1417. 65 D. M. IVEY, J. CHENG, T. A. KRULWICH, Nucleic Acids Res. 1992, 20, 4928. 66 W. M. HUANG, S. Z. AO, S. CASJENS et al., Science 1988, 239, 1005–1012. 67 R. B. WEISS, W. M. HUANG, D. M. DUNN, Cell 1990, 62, 117–126. 68 A. J. HERR, R. F. GESTELAND, J. F. ATKINS, EMBO J. 2000, 19, 2671–2680. 69 J. H. CATE, M. M. YUSUPOV, G. Zh. YUSUPOVA et al., Science 1999, 285, 2095–2104. 70 D. N. WILSON, D. GUE´ VREMONT, W. P. TATE, RNA 2000, 6, 1704–1713. 71 A. J. HERR, C. C. NELSON, N. M. WILLS et al., J. Mol. Biol. 2001, 309, 1029–1048. 72 J. GALLANT, D. LINDSEY, Proc. Natl. Acad. Sci. USA 1998, 95, 13771–13776. 73 P. J. FARABAUGH, Annu. Rev. Genet. 1996, 30, 507–528. 74 G. STAHL, G. P. MCCARTY, P. J. FARABAUGH, Trends Biochem. Sci. 2002, 27, 178–183. 75 D. RIDDLE, J. CARBON, Nature New Biol. 1973, 242, 230–234. 76 P. V. BARANOV, R. F. GESTELAND, J. F. ATKINS, EMBO Rep. 2002, 3, 373–377. 77 R. B. WEISS, D. M. DUNN, A. E. DAHLBERG et al., EMBO J. 1988, 7, 1503–1507. 78 E. S. POOLE, C. M. BROWN, W. P. TATE, EMBO J. 1995, 14, 151–158. 79 F. M. ADAMSKI, C. DONLY, W. P. TATE, Nucleic Acids Res. 1993, 21, 5074–5078. 80 J. F. CURRAN, M. YARUS, J. Mol. Biol. 1989, 209, 65–77. 81 J. F. CURRAN, Nucleic Acids Res. 1993, 21, 1837–1843. 82 S. MATSUFUJI, T. MATSUFUJI, Y. MIYAZAKI et al., Cell 1995, 80, 51–60. 83 I. P. IVANOV, R. F. GESTELAND, J. F. ATKINS, Nucleic Acids Res. 2000, 28, 3185–3196.
31
32 The Mechanism of Recording in Pro- and Eukaryotes
84 A. SUNDARARAJAN, W. A. MICHAUD, Q. QIAN et al., Mol. Cell 1999, 4, 1005– 1015. 85 D. K. MORRIS, V. LUNDBLAD, Current Biol. 1997, 7, 969–976. 86 T. JACKS, H. E. VARMUS, Science 1985, 30, 1237–1242. 87 J. A. HORSFIELD, D. N. WILSON, S. A. MANNERING et al., Nucleic Acids Res. 1995, 23, 1487–1494. 88 R. S. GRAVES, C. Z. M. MCKINNEY, E. S. POOLE et al., unpublished results. 89 J. W. HARGER, A. MESKAUSKAS, J. D.
90 91 92 93 94 95
DINMAN, Trends Biochem. Sci. 2002, 27, 448–453. A. L. BLINKOWA, J. R. WALKER, Nucleic Acids Res. 1990, 18, 1725–1729. A. M. FLOWER, C. S. MCHENRY, Proc. Natl. Acad. Sci. USA 1990, 87, 3713–3717. Z. TSUCHIHASHI, A. KORNBERG, Proc. Natl. Acad. Sci. USA 1990, 87, 2516–2520. A. B. HAMMELL, R. C. TAYLOR, S. W. PELTZ et al., Genome Res. 1999, 9, 417–427. K. SHIGEMOTO, J. BRENNAN, S. E. WALLES et al., Nucleic Acids Res. 2001, 29, 4079–4088. D. KENNEDY, Science 2002, 298, 2283.
1
Structure of the Ribosome Gregor Blaha Yale University, New Haven, USA
Pavel Ivanov Max-Planck-Institut f¨ur Molekulare Genetik, Berlin, Germany
Originally published in: Protein Synthesis and Ribosome Structure. Edited by Knud H. Nierhaus and Daniel N. Wilson. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30638-1
1 Introduction
Protein synthesis is a complex process with the ribosome as a central player. Its task is to decode the mRNA into the corresponding sequence of amino acids with the aid of amino-acylated tRNAs. We will follow the history of ribosome structure starting with the low-resolution images of the ribosome to arrive at the recent high-resolution, atomic structures for each of the ribosomal subunits, where we will focus on the structural elements that shape them. We will not include detailed structural discussion of 5.5 Å resolution structure of Thermus thermophilus 70S [1], since, as Ramakrishnan and Moore [2] emphasized, it is not possible to build a structure de novo from a 5.5 Å electron density map. Thus, for example, the structure of the 30S ribosomal proteins in the 70S structure was built by placing those from the 30S subunit structure [3] as rigid bodies into the electron density maps of 70S ribosome [4]. Also the 50S ribosomal subunit of the 70S ribosome structure seems to include some misinterpreted electron density regions. The trace of protein L1 C alpha atoms is partly overlapping with the trace of 23S rRNA phosphorus atoms (PDB code 1 giy) [5, 6] and the orientation of the two domains of L1 deviates from the one observed in L1 [7, 8] or in its complex with rRNA [6]. 2 General Features of the Ribosome and Ribosomal Subunits
With an approximate mass of 2.6–2.8 MDa the bacterial ribosome has a diameter of 200–250 Å and a sedimentation coefficient of 70S. The 70S ribosome consists of Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Structure of the Ribosome
Fig. 1 Features of the ribosomes. Comparison of ribosomal 30S (A) and 50S subunits (B) and the 70S ribosome (C) from early EM pictures (top row; [104] with the corresponding views obtained by recent cryo-EM reconstructions (bottom row; according to Frank and Agarwal [105]. (D) A cut through the 50S subunit which bisects the central protuberance and the tunnel along the entire length. All ribosome atoms are shown
in spacefilling representation, with all RNA atoms that do not contact solvent shown in white and all protein atoms that do not contact solvent shown in green. Surface atoms of both protein and RNA are colorcoded: yellow, carbon; red, oxygen; and blue, nitrogen. A possible trajectory for a polypeptide passing through the tunnel is shown as a white ribbon. PT, peptidyltransferase site [20]. (E) Both the mammalian
3 A Special Feature of the 50S Subunit: The Tunnel
two unequal subunits: a large 50S subunit and a small 30S subunit. Each subunit is a ribonucleoprotein particle with one-third of the mass consisting of protein and the other two-thirds of RNA: a single 16S rRNA (1500 nt) in the 30S subunit and a 5S (120 nt) and 23S rRNA (2900 nt) in the large subunit. The protein fraction consists of approximately 20 different proteins in the small and 33 proteins in the large subunit. The general outline of the 70S and its component subunits was characterized by a variety of electron microscopic techniques during the 1980s. The 30S was described anthropomorphically with a head, connected by a neck to a body with a shoulder and a so-called platform (Fig. 1A). A more compact structure for the 50S was defined, consisting of a rounded base with three almost cylindrical extensions. The three protuberances seen from the 50S side are called from left to right, the L1 protuberance, the central protuberance, and the L7/L12 stalk (Fig. 1B). Both subunits form a 70S ribosome as shown in Fig. 1C. A leap in resolution was achieved by the introduction of single-particle reconstruction of cryo-electron microscopic images [9, 10]. As the resolution improved, the general structural features of the ribosome remained, but more detailed structural features appeared, such as the beak and toe or spur on the 30S (Fig. 1A) and a tunnel through the 50S (Fig. 1D).
3 A Special Feature of the 50S Subunit: The Tunnel
A tunnel transverses the 50S subunit, running from the peptidyl-transferase (PTF) center at the foot of the central protuberance up to the base at the cytoplasmic side of the large subunit with a length of about 100 Å and a width of 10–20 Å (Fig. 1D; [10, 11]). The first hint that this tunnel existed was provided by electron microscopy (EM) of two-dimensional (2D) crystals of 80S isolated from chicken embryos [12] ← Fig. 1 (continued) and the bacterial large subunits have been superimposed. Wherever the bacterial contour lies outside the mammalian one, the resulting surface is gold; wherever the mammalian contour lies outside the bacterial one, the resulting surface is blue. The views are from the 30S subunit side (left) and from the tunnel exit (right). From Ref. [15] with permission. (F) Three-dimensional reconstruction of the ribosome–Sec61 complex with Sec61 oligomer shown in red. Right panel: a cut along a plane that cross sections the pore of the Sec61 oligomer and the ribosome tunnel is shown. The arrow indicates the stem connecting the ribosome with the Sec61 oligomer; the ribosomal tunnel and
its alignment with the Sec61 pore is indicated by a broken yellow line. From Ref. [16], with permission. (G) Distribution of r-proteins in 30S: left seen form the 50S and right the cytosolic side of the 30S. Grey, RNA; blue, proteins. From Ref. [3] with permission. (H) Proteins that appear on the surface of the large ribosomal subunit. The RNA of the subunit is shown in gray and protein backbones are shown in gold. Left panel: the crown view facing the small subunit; right panel, back side of the subunit (solvent side) in the 180◦ rotated crown view orientation; bottom right, view from the bottom of the 50S subunit; bottom left, key for the ribosomal proteins. From Ref. [11] with permission.
3
4 Structure of the Ribosome
and 50S subunits from Bacillus stearothermophilus [13]. By that time it had already been shown by immuno-EM that the relative orientation of the exit site of the nascent chain in prokaryotic and eukaryotic ribosomes was identical, located at the lower back (cytoplasmic side) of the large subunit [14]. Moreover, the alignment of cryo-EM structures from rat liver ribosomes with those of Escherichia coli proved that not only the central structural features of the ribosome, i.e., L1, L7/L12 and central protuberance, can be superimposed but also the tunnel, suggesting that the tunnel is another universally conserved feature of the ribosome and probably of high functional importance (Fig. 1E; [15]). This was subsequently confirmed by the cryoEM investigations of ribosome–Sec61 complexes from yeast, where the trimeric Sec61 complex, the major component of the endoplasmic pores which conducts the growing nascent peptide chain into the endoplasmic reticulum (ER), was positioned over the exit of the tunnel (Fig. 1F; [16]). Recent studies demonstrated that proteins, which are translocated through the ER membrane, indeed exit the ribosome from the ribosomal tunnel [17, 18]. The 50S crystal structure of Deinococcus radiodurans [19] and Haloarcula marismortui [11] has now revealed the tunnel at high resolution and confirmed the previously determined dimensions and, furthermore, show that the wall of the tunnel is composed of nucleotides from domains I through V of the 23S rRNA, as well as of the non-globular parts of ribosomal proteins L4 and L22. The narrowest part of the tunnel is formed mainly by ribosomal proteins L4 and L22, where the, β-hairpin of L22 intercalates between rRNA segments of the 23S rRNA. The tunnel surface should minimize unfavorable interactions with growing nascent chains and accordingly no large hydrophobic patches have been observed lining the wall; instead the lining of the tunnel wall is made up of large hydrophilic non-charged groups, thereby facilitating the passage of all kinds of peptide sequences [20]. There seems to exist a system of tunnels, the main tunnel of which represents the shortest route from the entrance to the exterior surface of the ribosome that binds to the bacterial membrane (exit 1), whereas three additional tunnels might communicate with the solvent. One of these additional routes, which branches off the main tunnel in the segment formed by domains I and III of 23S RNA close to the main exit, was originally discovered at the 25 Å resolution [21]; the branching point is 70 Å away from the tunnel entrance. H. marismortui proteins L15 and L29 are closest to the end of this second branch (exit 2; [11]), whose length is 50 Å. Two other branches that start approximately in the same region, which is in fact the widest (20 Å) segment of the main tunnel, partially follow the extensions of proteins L4 (exit 3) and L22 (exit 4). The lengths of these branches from the common branching point to the corresponding exits are approximately 100 and 40 Å, respectively [22, 23]. A similar network of tunnels was also found in E. coli ribosomes [22, 23], within the 50S crystal structure of D. radiodurans (eubacteria, [19]) and H. marismortui (archea, [11]), and in cryo-EM reconstruction of yeast 80S ribosome (eukaryotes, [22, 24]) suggesting that the nascent peptide could in theory emerge into the cytoplasm via one these sub-braches, and thus alternative routes for the nascent peptide chain might be a universal feature of ribosomes.
3 A Special Feature of the 50S Subunit: The Tunnel
If the main tunnel is the shortest route and most simple way out of the ribosome, why then does the ribosome need these additional routes branching off the main pathway near the main exit? One possibility is that these openings could maintain the necessary chemical equilibrium in the tunnel system, providing access for water and ion molecules. Another is that they could be used for a more complex regulation of peptide translocation and modification, i.e., the idea being that different polypeptides would utilize different pathways depending on (i) their subcellular destination, (ii) co-factors, e.g., chaperones, which they need for folding or (iii) whether they require post-translational modifications such as methylations, acetylations, or phosphorolations. The implication of this latter view is that the ribosome tunnel would need to play some active role in directing the native peptides in the correct direction. The corollary of this is that there should in fact be specific interactions between the polypeptide and ribosomal components. This is exactly what a number of recent experiments are clearly indicating: specific peptide sequences have been shown to interact with the interior of the tunnel and thereby affect protein synthesis on the ribosome. Such sequence-specific interactions between the exit tunnel and nascent peptides suggest that the ribosome, similar to the RNA polymerases [25], can recognize cis-acting signals in the synthesized heteropolymeric chain and use them in possibly important intracellular control systems. Nascent peptides in prokaryotes and eukaryotes contain special sequence motifs, and when these effector sequences are situated in the exit tunnel of translating ribosomes, they can significantly affect both protein elongation and peptide termination (Table 1) [26, 27]. In all known cases, the peptides with effector motifs act only in cis and thus only affect the ribosome on which they are synthesized. Secondly, most effector sequences give rise to ribosomal complexes that are stalled either in the elongation or termination phase of protein synthesis. Thirdly, several of the active peptides have a co-effector and the interplay between an effector motif and a co-effector is key to several intracellular control systems. The co-effector can, for example, be an antibiotic (leading to expression of resistance genes), an amino acid (leading to induction of an amino acid degradation operon), or a polyamine (leading to repression of polyamine synthesis; summarized in Ref. [23]). An interesting and surprising involvement for the tunnel is in the regulation of a tryptophan catabolite pathway is seen in the expression of the tryptophanase (tna) operon in E. coli [28]. Tna is a catabolic enzyme that degrades tryptophan to indole, pyruvate, and ammonia, allowing tryptophan to serve as a carbon or nitrogen source [29]. The tna operon begins with a tnaC gene coding for a 24-amino-acidlong oligopeptide (leader peptide) followed by the structural genes tnaA and tnaB coding for a the tryptophanase and tryptophan permease, respectively [30]. The regulation is orchestrated by the following elements: The 12th and the 24th (last) position of the leader peptide are tryptophan and proline, respectively, followed by the RF2-dependent UGA stop codon. Another set of essential elements include transcriptional pause sites located immediately after the tnaC gene, where Rho-factor-mediated transcription termination can occur and thus prevent synthesis of tryptophanase. First analyses have demonstrated that high tryptophan
5
ErmC TnaC
secM
CPA1
Eubacteria Eubacteria
Eubacteria
Yeast
arg AdoMet DC β 2 -AdRec Rar-β 2 Mammalian CMV UL4 virus
cat, cmlA
Eubacteria
Fungi Mammals
Gene
Organism
97 98–101
96
35
95 32, 28
26
Reference
MKLPGVRPRPAAPRRRCTR MIRGWEKDQQPTCQKRGRV MQPLVLSAKKLSSLLTCKY 102 IPP
PSXFTSQDYXSDHLWXAX MAGDIS
NSQYTCQDYISDHIWKTS
FXXXXWIXXXXGIRAGP
SIFVI KWFNID
MVKTDMSTSKNAD
Active sequence
Arginine Spermidine, spermine
Arginine
Membrane translocation (secA)
Erythromycin Tryptophan
Chloramphenicol
Co-effector
Table 1 Nascent peptides causing ribosome stalling (data taken from Ref. [23])
Small uORFs in the eucaryotic mRNAs cause stalling of the ribosome. This inhibits scanning of the initiation complexes to the main ORF downstream. The Arg-responsible nascent chain CAP1 and arg can act on either terminating or elongating ribosomes. AdoMetDC and CMV UL4 can inhibit termination
Trp-induced ribosome stalling at the end of a small uORF is essential for attenuation of transcription; ribosome is inhibited at the stage of termination Sensor of protein translocation across the membranes; inhibits translating ribosome; translocation of the nascent chain relieves inhibition
Present in the middle of a small uORF; low concentration of antibiotics cause stalling of the ribosome and therefore rearrangment of mRNA secondary structure releasing the otherwise trapped translation-initiation region of a downstream ORF coding for an enzyme responsible for antibiotic resistance
Notes
6 Structure of the Ribosome
3 A Special Feature of the 50S Subunit: The Tunnel
concentration prevents Rho action [31] and interferes with RF2-dependent hydrolysis [32]. The following mechanism has emerged: The ribosome pursuing the transcriptase during translation of the leader peptide will carry an aa23 -Pro-tRNAPro at its P-site with the 12th Trp residue in the tunnel just where L4/L22 form the kink of the tunnel (see Fig. 1D). This constellation with the prolyl residue at the P-site next to the UGA and the tryptophanyl residue at the tunnel kink provokes a stalling of the ribosome and a retardation of the RF2-dependent hydrolysis. At this moment, obviously, the amino acid Trp (not Trp-tRNA!) at high Trp concentration binds to ribosome (possibly to the A-site region of the PTC) thus preventing the hydrolysis of the pp-tRNA and blocking the ribosome on a transcript site of the mRNA required for Rho factor binding. The result is a continuation of transcription into the structural genes tnaA and tnaB fostering the degradation of tryptophan [28]. Another example of a peptide with an effector sequence is the secM (secretion monitor) gene of E. coli encoding a unique secretory protein that monitors cellular activity for protein export and accordingly regulates translation of the downstream secA gene [33]. SecM is exported to the periplasm, where it is rapidly degraded by a tail-specific protease [34]. The regulation works again via a translational arrest, and, interestingly, with a similar sequence signature as in the previous example: the motif critical for the arrest is FXXXXWIXXXXGIRAGP with a polypeptide ProtRNAPro at the P-site and a Trp residue located 12 aa residues (towards the Nterminal; [35]). This arrest will be only relieved if the ribosomal complex can contact SRP·SecA, thus triggering the export of nascent SecM. Only the stalled ribosome allows the display of the ribosomal binding site for the translation of SecA, whereas relief of the blocked SecM translation allows folding of the secM-secA mRNA, which hides the translational-initiation site of SecA-mRNA region. It follows that a lack of SecA induces synthesis of SecA. This arrest can be suppressed by each of three amino acid mutations in L22, namely Gly91 to Ser, Ala93 to Thr, and Ala93 to Val. The two residues, 91 and 93, are located on the segment of L22 that protrudes into the exit tunnel at the constricted region. We also see in this example that the regions of L4 and L22 at the tunnel kink (and probably influencing the tunnel shape at this point) might sense the nascent chain in an unknown way, thus influencing essential ribosomal functions occurring not in the adjacent neighborhood such as peptide-bond formation and tRNA translocation. Modeling of this polypeptide in the tunnel revealed that the conserved Trp-Ile (WI) of the motif would be placed within the most constricted region of the tunnel in close proximity to the tip of the β-hairpin of L22 [36]. Furthermore, the binding of the macrolide troleandomycin with the D. radiodurans 50S subunit coincided with this hairpin such that the hairpin was pushed across the tunnel lumen to contact the wall on the other side of the tunnel [36]. This led Yonath and co-workers to suggest that this swung conformation is related to the gating mechanism that is involved in secM-induced translational stalling, i.e., the interaction of the Trp-Ile (both relatively bulky residues) may also induce similar structural rearrangements in L22 such that the tunnel is temporarily closed and therefore translation blocked [36].
7
8 Structure of the Ribosome
Further speculations are that the known ribosomal arrest suppression mutations of L22 (G91S, A93T, and A93V) may stabilize the swung conformation [36]. However, confirmation of this mechanism will require structures of nascent chain-ribosome complexes.
4 Features of the Ribosomal Subunits at Atomic Resolution
The first attempts to crystallize the ribosome were undertaken in the 1980s with the first 3D crystals obtained from B. stearothermophilus [37]. Owing to continual improvements in the quality of the crystals and in sampling techniques of diffraction patterns (discussed in Ref. [38]), the structure of both subunits at atomic resolution was revealed in the past few years [3, 11, 19, 39]. Although it may not be obvious at first glance, but both large and small subunits have structural features in common at a global as well as at an atomic level. The overall shapes of the atomic resolution structures are in good agreement with those derived from cryo-EM (see Refs. [40, 41] for comparison). The interfaces of both subunits, with the exception of S12 in the small subunit, are essentially protein-free [2], which had been predicted by neutron scattering [42]. This was later on confirmed by cryo-EM not only for ribosomes from bacteria but also for yeast ribosomes ([43]; see also Fig. 1G). In the 50S subunit, the proteins are evenly scattered over the cytosolic surface (Fig. 1H), whereas in the 30S they are concentrated mainly in the head and shoulder and platform regions of the body (Fig. 1G). The ribosomal proteins are often bound to junctions between helices, thereby often connecting separate domains, for example S17, which simultaneously contacts helix 7 (h7), h11 of 5 domain and h21 in the central domain, and L18 which links the helical regions 1 and 2/3 of the 5S rRNA with H87 (note the helices of the rRNA of small ribosomal subunit are denoted with a lower-case letter “h”, those of the large subunit with upper-case letter “H” and are counted in the phylogenetically derived secondary structure from the 5 to 3 as they occur; see also Ref. [44]) of 23S rRNA (for helix numbering see below Figs. 4A and C; [3, 10, 39]). Many proteins in both subunits have globular domains, generally found on the surface of the subunits, with long extensions that reach far into the RNA core, where they make intimate contacts with the rRNA. These extensions lack tertiary structure and in many regions, even secondary structure, as exemplified by the proteins that neighbor the PTF center (PTC) of the large subunit (Fig. 2, [20]). These long extensions from a globular domain represent a new and typical feature of ribosomal proteins and explain the numerous painful and unsuccessful attempts to crystallize many of the ribosomal proteins. A classic example being L2, where only fragments could be crystallized [45], and L4, where crystals were obtained only from a thermophilic bacterium and a halophilic archeon [46]. Almost half of the 30S proteins belong to the category of “globular domain plus long extensions” (such as S2, S6, S9, S11, S12, S14, S16, S17, and protein Thx specifically found in T. thermophilus) as well as many of the large subunit proteins (in H. marismortui: L2, L3, L4, L15, L18, L19,
4 Features of the Ribosomal Subunits at Atomic Resolution
Fig. 2 A view of the active PTF site with the RNA removed. The proteins with closest extensions to the entrance of the tunnel (pink) through the 50S subunit are shown as ribbons with their closest side chains in all-atom representation. From Ref.[20] with permission.
L22, L24, L37e, L44e, L15e, L37ae and in D. radiodurans: L3, L4, L5, L13, L24, L31 (counterpart L15e), L35 (no counterpart in H. marismortui)). In the 30S at least, the feature of protein extensions is found only with lateassembly proteins. The large extensions that are rich in basic amino acids (to mask the negative charges of the phosphates in the rRNA backbone) obviously fix the fold of the rRNAs at a late-assembly stage and stabilize the 3D fold of both proteins and rRNAs. In the 50S subunit, the proteins L3, L4, L22, and L25 belong to the proteins that determine a fold of the 23S rRNA essential for the early assembly [47]; in this case they could act initially to connect two distant domains and facilitate their coming together [4]. Even though the general microscopic features, such as the RNA fold or the protein distribution of the different 50S structures, are similar there are some significant differences [19]. For example, the entire L1 stalk in the unbound D. radiodurans 50S is tilted by about 30∞ away from its position in the T. thermophilus 70S ribosome yielding a maximum distance of over 30 Å of the outermost points (Fig. 3A) [19, 48]. Also the L7/L12 stalk and the GTPase-associated center consisting of H42–H44 and proteins L7/L12 and L10 are shifted (by 3–4 Å) between D. radiodurans 50S and T. thermophilus 70S ribosome [1]. The helices H42–H44 show a rotation of about 12∞ in the case of the H. marismortui 50S (PDB 1JJ2) from its position in D. radiodurans 50S [19, 48]. These observed flexibilities of the stalks are in line with cryo-EM studies of ribosome complexes in different functional states [24, 24, 49–51] H. marismortui 50S crystals derived according to Ban et al. [11] were used in kinetic and crystallographic studies of the PTF reaction. The appearance of peptide product, bound to the PTF ring in the electron density maps of 50S crystal soaked with substrate and the strict dependence of the peptide formation on the presence of 50S crystals clearly demonstrates the catalytic activity of 50S in the disputed crystal form (Refs. [52, 53]).
9
10 Structure of the Ribosome
Fig. 3 Specific features and differences of D. radiodurans 50S compared with 50S from T. thermophilus 70S and H. marsimortui 50S. (A) Movement of L1 stalk. The D. radiodurans 50S structure is displayed as gray ribbons with the L1-arm highlighted in gold. The over-laid L1-stalk of T70S is displayed in green. (B) Comparison of the nucleotides within the peptidyl-transferase center of D. radiodurans 50S with the corresponding
ones from H. marismortui 50S. Inset: the overall fold of the peptidyl-transferase center to emphasize the backbone similarity within D. radiodurans and H. marismortui 50S.(C) Overlay of H25 in D. radiodurans and H. marismortui (for details see text). Inset displays proteins L21 and L23e, which are related by an approximate 2-fold. From Ref.[19] with permission.
L27, which is located at the base of the central protuberance of D. radiodurans and has no homolog in H. marsimortui, is proposed to be involved in the proper placement of the 3 -end of the A-site tRNA at the PTC during the PTF reaction [54]. In H. marismortui, the non-homologous L21e replaces L27; however, the tail of L21e folds back towards the interior of the subunit and therefore cannot make contact with the P-site tRNA [19]. Other D. radiodurans-specific large ribosomal proteins are the L25 analog CTC, which fills the gap between the central protuberance and L7/L12 stalk; the extended α-helical protein L20, which is replaced by 47 n extension in H25 of domain II in H. marismortui (see Fig. 3C); and the two Zn-finger proteins L32 and L36 [19].
5 The Domain Structure of the Ribosomal Subunits
5 The Domain Structure of the Ribosomal Subunits
The shear complexity of protein synthesis forces any participating component to maintain their structure and function through evolution. This principle justifies the assumption that not only all tRNAs, but also all 16S (and 16S-like) and 23S (and 23S-like) rRNAs have the same general secondary and tertiary structures [55]. Therefore, the secondary structures of 16S, 23S, and 5S rRNA could be derived by analyzing the pattern of variation within aligned rRNA sequences from different species (see Figs. 4A and C; [56]). The resulting secondary-structure diagrams consist of a complex arrangement of A-form helices and non-helical regions (loops or bulges) [55]. In the 16S rRNA the different domains branch from a central pseudo-knot and, beginning from the 5 -end, are termed the 5 , central, 3 -major and 3 -minor domains (see Fig. 4A). In striking contrast to the 50S subunit (see following), the domains are not interwoven in the tertiary fold and can be assigned easily to the structural landmarks of the 30S subunit (Fig. 4B). The 5 -domain forms the 30S body, starting from the neck of 30S subunit it goes down to the toe and finally turns back to form the shoulder. The central domain constitutes the platform, the 3 -major domain the head. The 3 -minor domain consists of h44 and h45; h44 runs down the 30S along the inter-subunit surface and returns back to the neck, followed by h45 and a single-stranded 3 -end containing the anti-Shine–Dalgarno sequence. In the 23S rRNA secondary structure the 5 - and 3 -terminal ends are brought together to form a helix (H1 in Fig. 4D). Radiating from the loop of this helix are 11 stem-loop structures of differing degrees of complexity. These stem-loop structures are bundled into six different domains and, in analogous fashion to the helices, are numbered from the 5 - to 3 -end. The last change in the assignment of the secondary structure to the various domains was contributed by crystallography. Helix 25, which was originally considered as being part of domain I, was reassigned to domain II, because it exhibits stronger interactions with this domain than with the elements of domain I [11]. The six domains of 23S and 5S rRNA all have compact shapes, which are intertwined (Fig. 4D). The domains form structural units, as the vast majority of interactions involving two or more hydrogen bonds occur within domains, rather than between them. This is also the reason for that the interchange of the domain V from E. coli against Staphylococcus aureus in the 23S rRNA of E. coli was possible, even though it introduces 132 changes into the E. coli rRNA sequence, only one additional mutation of U1782C for was necessary for viability [57]. Nearly all of the secondary-structure base pairings and a few of the tertiary base pairs observed in the crystal structure had already been predicted by comparative structure models. Specifically, more than 1250 base pairs predicted were indeed present in the 16S and 23S rRNA crystal structures. The 35 predicted base pairs, which were not found in the crystal structures, could simply not occur at all or possibly only at certain stages of protein synthesis, for example, in the 30S subunit, the A•A base pair between positions 1408 and 1493 is broken upon binding of tRNA
11
12 Structure of the Ribosome
Fig. 4 Secondary structures of 16S, 23S, and 5S rRNAs. (A) Secondary structure of T. thermophilus 16S rRNA, with its 5 , central, 3 -major, and 3 -minor domains shaded in blue, magenta, red, and yellow, respectively. (B) Three-dimensional fold of 16S rRNA in 70S ribosomes, with its domains colored as in (A). (C) Secondary structures of T. thermophilus 23S and 5S rRNAs, indicating domains I (blue), II (cyan), III (green), IV (yellow), V (red), and VI (magenta) of 23S rRNA. The rRNAs are numbered according to E. coli. (D) Three-dimensional folds of 23S and 5S rRNAs, with their domains colored as in (C). From Ref. [1] with permission. (E, F) Comparison of the current comparative structure models for the 16S
and 23S rRNAs with the corresponding ribosomal subunit crystal structures. (E) 16S rRNA versus the T. thermophilus structure (GenBank accession no. M26923; PDB code 1FJF). (F) 23S rRNA, 5 -half and 3 -half versus the H. marismortui structure (GenBank accession no. AF034620; PDB code 1JJ2). Nucleotides are replaced with colored dots that show the sources of the interactions: red, present in both the covariation-based structure model and the crystal structure; green, present in the comparative structure and not present in the crystal structure; blue, not present in the comparative structure and present in the1 crystal structure; purple, positions that are unresolved in the crystal structure.
5 The Domain Structure of the Ribosomal Subunits
Fig. 4 Continued
and mRNA (cf. 1FJF and 1IBM) [58, 59]. The crystal structures of small and large subunits enriched the secondary-structure diagrams by 170 base pairs in 16S and 415 in 23S rRNA, i.e., these were not predicted by comparative methods. Essentially, all the “mis-assigned” base pairs have no significant amount of variation (Ref. [55], see Figs. 4E and F). The fact that the domain secondary structures form well-defined structural domains of quaternary structure in the small subunit, but not in the large subunit, may result from two reasons that need not be mutually exclusive: (i) the small subunit might require larger flexibility for ribosomal functions [2], and the difference in the organization of the secondary structures might indicate that (ii) the 50S subunit is older in evolutionary terms, because more time would be required to evolve such a complicated interwoven structure similar to that of the 50S subunit. As Schimmel and Henderson [60] noted, three elements of protein synthesis, viz. tRNA, synthetases and the ribosome, separate into two domains of different evolutionary ages that have probably co-evolved. The “old” domain of the tRNA is the aminoacyl stem (the short arm of the L-shaped tRNAs or mini-helix), which corresponds to the catalytic domain of synthetases in charging the tRNAs and to the large ribosomal subunit involved in peptide-bond formation. The “young” domains are
13
14 Structure of the Ribosome
Fig. 4 Continued
represented by the long arm of tRNAs bearing the anticodon loop, the recognition domain of the synthetases, and the small ribosomal subunit that interacts with the anticodon loop and some of the stem base-pairs [1, 61].
6 Interactions of RNA with RNA or Struts and Bolts in the Three-dimensional Fold of rRNA: Coaxial Stacking and A-minor Motifs
Upon the publication of the structures of the two ribosomal subunits the amount of RNA structure known at atomic resolution increased about 8-fold [2, 62]. However,
Fig. 4 Continued
6 Interactions of RNA with RNA or Struts and Bolts in the Three-dimensional Fold of rRNA 15
16 Structure of the Ribosome
most of the structure motifs had been seen before, suggesting that the possible number of RNA structure motifs is limited [2]. In this and the following section ??, we will restrict our analysis to a specific subsection of structural elements, since tetraloops, tetraloop receptors, adenosine platforms, U-turns, E-loops, sarcin/ricin motif, and cross-strand purine stacks have been extensively discussed elsewhere [63–66]. In the crystal structure, many of the single-stranded loop regions in the secondary structure turned out to be slightly irregular double-stranded extensions of neighboring regular helices. Thus, most rRNA may be described as helical or approximately helical [3]. These helical elements are organized via vertical co-axial stacking of helices, by A minor interactions and ribose zippers. 6.1 Coaxial Stacking
Coaxial stacking is the end-to-end stacking of separate helical RNA parts to form long quasi-continuous helical structures (see as example h16/h17 in 30S or H34/H35 in domain II of 50S in H. marismortui [64, 67]). Stacking of nucleic acids is driven by the highly energetically favored stacking interactions between the π-electron system of the nucleic acid bases [68]. The free energy from coaxial stacking of helices ending with Watson–Crick base pairs yields a G◦ of −1.0 to −4.5 kcal mol−1 , depending on the context. This is in the range of the contribution of next-neighbor interaction to the free energy of an intact helix (−2.0 and −3.4 kcal mol−1 ) [69, 70]. Also in the same range of free energy is the contribution from coaxial stacking with a single G•A base pair at the interface of the stacking helices (about −2 kcal mol−1 , [71]). Therefore, 92% (11/12) of the potential coaxial stacking in the 16S rRNA and 50% (11/22) in the 23S rRNA are observed in the crystal structure. Potential coaxial helix stacking is defined by two helices with an A•G or A·A base-pair at their interface and no unpaired nucleotides in the strand connecting them [58]. Often the A•A and A•G, with the G 3 to the helix, at the end of helices are inter-convertible. A good example of a coaxially stacked helix is seen in a classic pseudoknot. This motif consists of a hairpin loop, which base pairs with a complementary single-stranded sequence adjacent to the hairpin stem, to form a contiguous helical structure. Similar to junctions, the coaxial stacking observed in the pseudoknot requires either Mg2+ ions or a high concentration of Na+ ions for stabilization [63]. Sequence-independent packing of ordinary helices is very seldom seen. It consists of an insertion of a phosphate ridge of one helix into a minor groove of the other at a fixed interhelical angle of about 80◦ (e.g., in the 30S subunit for helices h7 and h21 with an angle of 93◦ ). This kind of interaction is already known from the crystal structure of the 59-GGCGCUUGCGUC-39 RNA duplex. In this structure, the duplex forms quasi-continuous helices that pack against each other, with the backbone of one helix in the minor groove of a perpendicular helix [72]. Owing to the limitation to four principal nucleotides, the primary RNA sequence is by itself not enough to define a motif. Moore [66] suggested classifying only structural elements as motifs if they have a defined sequence length or specific
6 Interactions of RNA with RNA or Struts and Bolts in the Three-dimensional Fold of rRNA 17
Fig. 5 Superposition of crystal structures of composite sarcin motif, 23S G911 (red),and internal loop sarcin motif, 23S G225 (blue). From Ref.[65] with permission.
loop sizes. This definition would exclude pseudoknots as motifs, as the sequence length and the loop size are unrestricted. Westhof tried to define an RNA motif as an ordered array of stacked non-Watson–Crick base-pairs and thereby focusing plainly on the 3D aspect of a motif. This is well exemplified by the complex topology around G911 in 23S rRNA. The 3D structure of this G911 topology (consisting of nucleotides 1068–1071 and 1292–1295 in one strand and nucleotides 910–914 in the base-pairing strand, shown in red in Fig. 5) superimposes well onto the sarcin/ricin motif, as seen in the internal-loop motif G225 in 23S rRNA (shown in blue in Fig. 4). The only parts of the motifs that do not superimpose well belong to the backbone, which in the composite motif connects to other parts of 23S RNA [65]. Although this definition of an RNA motif is quite successful, it does not encompasses the K-turn motif, where the array of stacked base pairs is interrupted by a triple loop to introduce a ∼120◦ kink (see below). As seen above, both secondary- and tertiary-structural aspects must be considered to provide an accurate definition of an RNA motif. In spite of these complications, evidence is accumulating that the modular structure of RNA motifs, which mediate RNA–RNA (e.g., sarcin/ricin motif [65]) and RNA–protein interactions (e.g., K-turn, see later), are recurrent and are also found in the high-resolution structures of the ribosomal subunits. 6.2 A-minor Motifs
From the crystal structure of the H. marismortui 50S, the importance of the socalled A-minor motifs for stabilization of the 3D fold of large RNA became evident
18 Structure of the Ribosome Table 2 Frequency and distribution % of single nucleotides in bacterial 16S and 23S rRNAs
comparative structure models [103] Nucleotides
G
C
A
U
Overall In helices In unpaired regions Unpaired/total number of positon of nt
31.4 36.6 23.6 30.1
22.4 28.8 12.5 22.3
25.7 14.8 42.6 66.2
20.5 19.8 21.3 41.5
[73]. Phylogenetic co-variation analysis had already revealed a very strong bias for adenine (A) in single-stranded regions as compared with helical ones, hinting at the important structural or functional role of these adenines (see Table 2). One reason for this bias is the crucial importance of A-minor motifs in helix–helix packing, in helix–loop interactions and at helix junctions [73]. In the type-I A-minor motif, the N1-C2-N3 edge of a purine, preferentially a highly conserved A, interacts with the minor groove face of another helix (see Fig. 5). The A-minor motif was originally observed in the P4-P6 domain of group I ribozymes, where it was part of the ribose zipper [74], and was subsequently seen in the L11-protein-binding domain of 23S rRNA [75, 76]. Nissen et al. [73] distinguish between four variations in the A-minor motif (types O, I, II and III); however, we will focus primarily on types I and II, as they seem to be the most essential for ribosome function. Examples include the fixation of the CCA end of the tRNA at the PTC [20, 73], A-site recognition of the correct codon–anticodon interaction during the selection of the correct aminoacyl-tRNAs (Fig. 6B [59, 77]) and subunit association (A702 of 16S rRNA with the minor groove of H68 of 23S rRNA; [1]). In all A-minor motifs, the ribose-phosphate backbone of the interacting adenosine is closer to one of the strands of the receptor helix. The orientation of the receptor helix strand closest to the adenosine is antiparallel to the orientation of the adenosine of the donor strand. Type I and II A-minor motifs are different with respect to the position of the 2 -OH and N3 atoms of the adenosine residue relative to the closer strand of the receptor helix (Fig. 6; [73]). In type I, both the 2 -OH and the N3 of the adenosine residue are within the minor groove of the receptor helix (Fig. 6A), thereby maximizing the number of possible hydrogen bonds to the inserting adenosine. In type II, the N1, C2, N3, and 2 -OH of the adenosine contact approximately half of the minor groove. The 2 -OH of the inserting adenosine is positioned outside of the minor groove with respect to the 2 -OH of the closer strand of the receptor helix, whereas the N3 of the A is inside the minor groove (Fig. 6A; [73]). Both types of A-minor motifs are highly specific for adenine bases and show a strong preference for C-G receptor base pairs (Table 2; note the preference of adenosine in unpaired regions and the preference of G: C pairs in helices; Fig. 6; [73]).
6 Interactions of RNA with RNA or Struts and Bolts in the Three-dimensional Fold of rRNA 19
Fig. 6 A-minor types I and II.(A) Ribosomal examples of A-minor type I and II. Each type is defined by the position of the 2 -OH group of the interacting adenosine relative to the positions of the two 2 -OH groups of
the receptor base pair (see text for details). From Ref. [73] with permission. (B) Principles of decoding according to Ref. [59], with permission. For details see text.
6.3 Ribose Zippers and Patches of A-minor Motifs
A ribose zipper is defined as two consecutive hydrogen-bonding interactions between ribose 2 -OH from two different RNA segments. The orientation of the two chains linked by ribose zipper is always antiparallel. A total of 97 ribose zippers are present in the ribosomal subunit crystal structures: 20 in the T. thermophilus 16S rRNA, 44 in H. marismortui 23S rRNA (plus two ribose zippers bridging 5S rRNA loop E with H38 and the internal loop of H38 and the 5S rRNA helix 4) and 30 in D. radiodurans 23S rRNA (plus one bridging 5S and 23S rRNAs) [78]. Out of the 11 possible types of ribose zipper, seven are found in 23S and 16S rRNAs. From these only the canonical and single-base ribose zipper occur more than once in 23S and 16S rRNA [78]. 6.3.1 Canonical Ribose Zipper In this type of zipper, the 2 -OH hydrogen bonds are supported by additional hydrogen-bond interactions between the base at the 5 -end and the ribose 2 -OH of the 3 -end on the opposite zipper strand (see Fig. 7A). On the base side, the purine N3 or the pyrimidine O2 functions as a hydrogen acceptor. The ribose zipper formed by C376, C377 and A243, A242 in 23S rRNA of H. marismortui, has a prototypical topology for the 40 canonical ribose zippers found in the 16S and 23S rRNAs. A243 is inserted into the minor groove of the C376:G274 base-pair forming a type I A-minor motif. A242 interacts via a type II A-minor motif with C377 :G273 base pair and is stacked onto A243 (Fig. 7B). This tandem A-minor
20 Structure of the Ribosome
Fig. 7 Ribose zipper.(A) Schematic representation of canonical and single-base ribose zippers type A. Light blue colored broken lines represent hydrogen bonds. (B) Stick diagrams of the canonical ribose zipper. Top panels: canonical ribose zipper where C377 and A242 belong to the upper layer (blue green) and C376 and A243
belong to the bottom layer (slate)(top left panel) viewed perpendicular to the backbone, showing the ribose-ribose interactions and (top right panel) from above the upper layer. Lower panels: hydrogen-bond network in the lower layer (to the right) and the upper layer (to the left). From Ref. [78] with permission.
motif seen in 31 instances of the 40 canonical ribose zipper in T. Thermophilus 30S and H. marismortui 50S had been noticed earlier by Nissen et al. [73], where it was referred to as an “A patch”. The adenosine in the type I A-minor motif, which exhibits stronger conservation than the type II, shows a CG>GC>UA=AU order of preference with regard to the Watson–Crick pairs it interacts with. In contrast, no base-pair preference was detected for the type II A-minor motif [78]. This is in line with experimental and phylogenetic covariation analysis on group I introns [79]. Moreover, with the Aminor patches found in group I introns, the average contribution of the ribose hydrogen bond to the tertiary fold was determined to a G◦ of −0.4 to −0.5 [80] and −6.6 kcal/mol−1 for type I A-minor motif [79]. 6.3.2 Single-base Ribose Zipper The single-base ribose zipper is a canonical ribose zipper with one obstructed base 2 -OH interaction. Two possible types of single-base ribose zippers can be distinguished: types A and B; the definition depending on which base ribose interaction is interrupted, i.e., in either the type I or type II A-minor position, respectively. Con-
7 Progress and New Developments in Understanding rRNA Structures 21
sequently, the 15 single-base ribose zippers identified within the T. thermophilus 30S and H. marismortui 50S can be separated into 10 type A and 5 type B. Almost all single-base ribose zippers have adenine in the type I A-minor position, which is, in the case of type A, rotated away from the “acceptor base pair”. In the type B single-base ribose zipper the type II A-minor position is a G in all instances [78]. Comparison of the two available 50S structures revealed that 10 of the canonical ribose zippers that are present in H. marismortui are also found in D. radiodurans, although two canonical ribose zippers in H. marismortui are single-base ribose zippers in D. radiodurans. All ribose zippers of H. marismortui 50S structure are either conserved or their composing bases are in close proximity to each other in D. radiodurans structure. For example, 10 canonical ribose zippers found specifically in H. marismortui have similar positions in D. radiodurans; however, the bases are too far apart from each other to form hydrogen bonds.
7 Progress and New Developments in Understanding rRNA Structures
To finalize the overview of motifs recognized in the ribosomal subunit, the K-turn and lonepair triloop will be reviewed as well as the attempts that have been made to categorize non-Watson–Crick base interactions and RNA motifs in databases. 7.1 K-turn
After careful analysis of crystal structure of the large subunit, Klein et al. [62] recognized a new motif, consisting of a helix–internal loop–helix. A kink in the phosphodiester backbone of the internal loop bends the RNA helix axis by 120◦ and also gives the motif its name ‘Kink-turn’ or ‘K-turn’. The first helical stem termed ‘canonical stem’ (C-stem) ends at the internal loop with two Watson–Crick base pairs, typically C:Gs. The second helical stem termed ‘non-canonical stem’ (NC-stem) follows the internal loop and starts with two non-Watson–Crick base pairs, typically sheared G•A base pairs. The internal loop between the helical stems is always asymmetrical and usually contains three unpaired nucleotides in one strand but none in the other (Fig. 8A). The close helical packing between the helical stems is stabilized by a type I A-minor interaction [73] between the last C:G of the C-stem and the A of the G•A base pair in the NCstem (Fig. 8B). The requirement for a type I A-minor interaction may account for the conservation of C:G in the C-stem and A•G in the NC-stem and also the high conservation with the consensus secondary structure for the K-turn, which includes 10 consensus nucleotides out of the possible 15 [62]. Although the eight K-turns identified in the H. marismortui 23S and T. thermophilus 16S rRNA vary somewhat in sequence, each has essentially the same distinctive 3D structure. The six Kturns in H. marismortui 23S rRNA superimpose with an r.m.s.d. of 1.7 Å (r.m.s.d.:
22 Structure of the Ribosome
Fig. 8 K-turn motif.(A) Secondary-structure diagrams of the eight K-turns found in the H. marismortui 50S and T. thermophilus 30S subunit structures and a derived consensus sequence. (B) Three-dimensional representation of KT-7 with the phosphate backbone of the kinked strand in orange and the unkinked strand in yellow (upper panel). Lower panels displays atomic details of indi-
vidual base-base and base-backbone hydrogen bonds. (C) Location of the K-turns in the H. marismortui 50S structure. K-turns in blue are shown in the 3D structure: crown view from the interface (top left panel), cytosolic face (top right panel) and in the secondary structure (lower panel). From Ref. [62] with permission.
7 Progress and New Developments in Understanding rRNA Structures 23
Fig. 8 Continued
root-mean-square deviation). All six of these K-turn motifs appear at or near the surface of the 50S particle, in regions that are less well conserved among the three kingdoms (Fig. 8C). The two K-turns identified in the structure of T. thermophilus 30S are localized in the intersubunit surface and probably play a structural role in the association of the subunits. 7.2 Lonepair Triloop
As implied in the name, the lonepair triloop (LPTL) consists of a lone base pair capped with a loop of three nucleotides. The nucleotide sequence within LPTLs can be described as 5 -FXYZL-3 , where the underlined nucleotides F and L form the lonepair and the three nucleotides, X, Y and Z, the triloop [81]. Twenty-three LPTLs occur in the ribosomal RNAs, of which seven are in 16S rRNA of T. thermophilus 30S, 15 in the 23S and one in 5S rRNA of the H. marismortui 50S. The LPTLs in D. radiodurans 50S are at the corresponding positions of 23S RNA of H. marismortui and are structurally equivalent. In addition to the LPTLs recognized in the ribosomal RNA an additional one was found in the T-loop of tRNAs [81]. Nearly all of the LPTL
24 Structure of the Ribosome
sites in rRNAs are conserved and most of them contribute to rRNA packing via tertiary interactions with RNA segments that are distant in terms of the secondary structure [81]. 7.2.1 Classification of Lonepair Triloops All but one of the LPTLs are coaxially stacked on the nearest 5 -helix. The LPTLs can be classified as directly (class I) or indirectly (mediated by a short helical region (class II)), coaxially stacking LPTLs. In both classes, the 5 -base of the lonepair (F) is stacked onto the first nucleotide of the triloop (X), which is connected to the second nucleotide of the triloop (Y) by a 220–230◦ U-turn. The second and third nucleotides are facing into the minor groove, with the third base forming hydrogen bonds to the first nucleotide of the triloop (see Figs. 9A and B). In a subpopulation of type I and II LPTLs, a tertiary base is recruited by the triloop by forming a base pair to the first base of the triloop (X). This recruited tertiary base stacks between the third base (Z) and the 3 -base in the lonepair (L) and yields a structural conformation resembling that of the tetraloop motif. In contrast to the above-described type A LPTLs, no tertiary base is recruited by type B LPTLs [81]. A third class of LPTLs includes all LPTLs within a helical region. This category of LPTLs is structurally distinct from the first two classes in having at least two of the triloop bases base-pairing to form part of a regular helical stem and in missing the U-turn in the triloop [81]. The LPTLs reviewed here include the earlier reported T-loop RNA fold as a type I LPTL [82].
Fig. 9 Lonepair triloop. (A) Schematic representation of IA, IB, IIA, IIB, and III LPTLs. The lonepair F:L is appended either directly (class I) or indirectly (class II) through the intervening base-pair(s) M:N to its 5 helix, which is shown with lines (see text for details). (B) Three-dimensional structure
of representatives of type IA, IB, and IIA LPTLs. Nucleotides are numbered in black give T. thermophilus numbers for 16 S rRNA and H. marismortui numbers for 23 S rRNA and in red E. coli numbers. From Ref. [81] with permission.
7 Progress and New Developments in Understanding rRNA Structures 25
Fig. 9 Continued
26 Structure of the Ribosome
7.3 Systemizing Base Pairs
To facilitate the understanding of non-canonical base interactions, which have proven to stabilize secondary and tertiary structures, different groups have put together compilations of non-Watson–Crick interactions. Westhof and colleagues have sorted the base–base interaction by the C1 –C1 distance and the orientation of the glycosidic bonds. Two base pairs with nearly the same C1 –C1 distance and same glycosidic bond orientation can replace each other without drastically changing the 3D path and relative geometric orientations of the phosphate-sugar backbones. These base pairs are called isosteric, although they will not always occupy the same volume of space. This collection of isosteric basepair interactions will be useful in the development of more accurate 3D RNA models. In the course of this work, Leontis and Westhof [83] systemized the nomenclature of base pairs, based on the three potential hydrogen bond forming edges of a base. These are the Watson–Crick, the Hoogsteen edge for purines and the CH edge for pyrimidines, and the sugar or the shallow groove edge (Fig. 10).
Fig. 10 Proposed nomenclature for base pairs by Leontis and Westhof. Left panel: purine (A or G, indicated by “R”) and pyrimidine (C or U, indicated by “Y”) bases provide three edges for interaction, as shown for adenosine and cytosine. Right panel: the cis and trans orientations are de-
fined relative to a line drawn parallel to and between the base-to-base hydrogen bonds in the case of two hydrogen bonds or, in the case of three hydrogen bonds, along the middle hydrogen bonds. From Ref. [83] with permission.
8 RNA–protein Interactions
In the NCIR (non-canonical interactions in known RNA structures) database, Fox and colleagues [84] compiled all the RNA structures that contain non-Watson–Crick base–base interactions. In addition, this database provides a summary of the known properties of the base–base interaction (http://prion.bchs.uh.edu/bp type/). 7.4 Systemizing RNA Structural Elements
In their structural classification of RNA (SCOR) database (http://scor.lbl.gov), Fox and colleagues categorized RNA motifs either as an external or internal loop. To define loops, they applied a strict definition for each loop; external loops being a covalently connected series of residues non-Watson–Crick-paired to each other, which are closed on one side by a Watson–Crick base pair, whereas internal loops consist of one (for bulge loops) or two non-Watson–Crick paired residues, closed on both sides by Watson–Crick base pairs. Using their definitions, bulged loops or G•U base pairs within a standard helix are considered internal loops [85]. Two hundred and twenty-three internal and 203 external loops extracted from the 259 NMR and X-ray structures are complied in the SCOR database. The internal loops are further divided into nine subclasses. The external loops are categorized by size and stacking pattern within the loop. Examples of RNA tertiary interactions are also included in the SCOR database, such as coaxial helical stacking, ribose zippers, A-minor interactions or pseudoknots [85].
8 RNA–protein Interactions
The ribosome consists of approximate 53 ribosomal proteins, which are crucial for the smooth functioning of protein synthesis. Therefore, we will turn our attention now to principles governing RNA–protein interaction. We will start by explaining the differences between RNA and DNA and the implication this has for binding protein, which will lead us to the principal modes of interaction between protein and RNA, and ending in the specifics of ribosomal protein interaction with rRNA. 8.1 Problem of RNA Recognition
The A-form helix accounts for as much as 50% of the residues in an average nonmessenger RNA, which includes the bases of Watson–Crick base pairs as well as G•U wobble base pairs in the runs of Watson–Crick base pairs [66]. In the A-form, the minor groove is shallow and broad (10–11 Å in width and 2.8 Å in depth) whereas the major groove is deep and narrow (4–5 Å in width and 13.5 Å in depth), when compared with B-form helix of DNA (major groove: 12 Å in width and 8.5 Å in depth, minor groove: 5.8 Å in width, 7.5 Å in depth). This allows functional groups of the ribose sugar, in particular the 2 -OH, to participate in interactions.
27
28 Structure of the Ribosome
The 2 -OH group can act as a hydrogen bond donor and/or acceptor, making it versatile when it comes to interaction with both RNA and protein. The functional importance of 2 -OH group for protein synthesis was experimentally reinforced in many instances, such as decoding [86], for which the involvement of 2 -OH of the mRNA was structurally rationalized [59], translocation of tRNA(Met) from P-site to the E-site, which is dependent on 2 -OH groups at positions 71 and 76 in the 3 -acceptor arm of the tRNA [87], or tRNA binding to EF-Tu or to aminoacyl-tRNA synthetases [88, 89]. Sequence-specific interactions with regular A-form RNA helices via direct Hbonding and van der Waals interactions cannot distinguish, in the minor groove, a G:C from C:G or an A:U from a U:A base pair, but can distinguish G:C from A:U types [90]. This fact degenerates the recognition of base pairs from a quaternary mode (four base pairs) to a binary mode (two kinds of base pairs). This contrasts with the recognition, in the major groove, of B-form DNA where discrimination of all four Watson–Crick base pairs (GC, CG, AT, and TA) is possible (Fig. 11; [90]). This difference in discrimination is reflected in the different mode of sequencespecific recognition of DNA and RNA. DNA recognition is typically accomplished through the recognition of a particular nucleotide sequence in double-stranded DNA (dsDNA). However, the sequence-specific RNA recognition results largely through single-stranded regions, bulges, or internal and terminal loops [91, 92]. The complex structure of the RNA moiety within the protein–RNA complex hampers tight packing at the protein–RNA interface. Therefore, the protein–RNA interface is less well packed in comparison with protein–dsDNA or protein–ssDNA
Fig. 11 A schematic representation of the different patterns of hydrogen-bond donors (two triangles connected on one tip) and acceptors (diamonds) presented by Watson–Crick pairs to the major and the minor grooves. A varied pattern of hydrogen donors and acceptors in the major
groove allows easy discrimination of AT, TA, GC, and CG, whereas in the minor groove only the discrimination between AT and GC base pairs is possible due to the symmetric distribution of the donors and acceptors. Adapted from Ref. [90].
8 RNA–protein Interactions
(single-stranded DNA) complexes. Within the protein–RNA complexes, the sequence-specific complexes achieve the best packing with, surprisingly, the least polar protein interface [92]. The difference in sequence-specific recognition of dsDNA and ssRNA also resonates in the hydrogen-bond pattern between amino acids and bases. In dsDNA–protein complexes the functional groups of amino acid side-chains interact with the nucleotide bases in the accessible major groove. In RNA–protein complexes the amide backbone is frequently used for specific base interactions, especially the carbonyl group of the amide group as a proton acceptor. A duplex helix hampers close contact of a peptide backbone with a nucleotide base to a larger extent than ssRNA does. Interestingly, the extent to which an amide group is used for phosphodiester backbone contacts is approximately the same in RNA and DNA complexes [91]. 8.2 Chemistry of RNA–protein Interactions
In most of the recently published studies on protein–RNA interaction, the structure of H. marsimortui 50S is excluded because the constrains of protein–RNA interaction is compounded by the problem of counteracting the high-salt conditions found in H. marsimortui cells. A statistical analysis of protein–RNA interactions of the 30S subunit, synthetase–tRNA complexes and other RNA–protein complexes revealed that 22% of the amino acids in the RNA–protein interface are hydrophobic, 40% charged (positive 32%, negative 8%), 30% polar and 8% glycine. The preferred amino acids at the RNA–protein interface are Arg, Ser, His, Tyr, Lys, with the underrepresented ones being Ala, Val, Glu, Met, Leu, Ile [93]. Statistical analysis has also established that some amino acids have preferences for interacting with either the phosphate group, the ribose moiety or the nucleotide base of the RNA; Arg prefers to interact with the phosphate backbone; Met, Phe and Tyr favor the ribose; Pro and Asn display stronger affinity for bases over ribose or phosphate groups. Additionally, the four bases have individual preferences, such that adenosine favors Ile, Pro, Ser; cytosine, Leu; guanosine, Asp and Gly, and uracil, Asn [93]. 8.3 rRNA–protein Interaction
Ribosomal proteins make more extensive use of contacts to the sugar-phosphate backbone, which is reflected in the preference for hydrogen bonds to phosphate groups, over ribose, followed by bases. Whereas proteins with sequence-specific binding to ssRNA-like structures make a surprisingly small number of contacts to the RNA backbone and prefer, by far, hydrogen bonds to the base. In-between these two extremes is the tRNA synthetase family, using phosphate, ribose and bases equally for hydrogen-bond contacts [91, 93]. Based on the hypothesis that the ribosome is evolutionary older than tRNA synthetases, which are older than ssRNA–protein complexes, Allers and Shamoo
29
30 Structure of the Ribosome
[91] put forward an intriguing idea: They proposed that in the earliest interaction of an “RNA world” RNA-like molecules had only the minimalist chemical interactions with amino acids and therefore would have only take advantage of the abundant peptide backbone amide and carbonyl groups. The preference of ribosomal proteins to interact with the RNA backbone led to the hypothesis that shape and charge complementarity, rather than specific sequences, are responsible for the specificity observed in ribosomal protein–RNA interactions [75, 76]. This hypothesis is in agreement with co-variation sequence analysis and can also explain the latter’s inefficiency in recognizing r-protein binding sites. As explained above, some of the forces within an RNA–protein complex are understood; however, the understanding of RNA–protein recognition and the mutual interplay between protein and RNA, as seen in the ribosome assembly, is still a long way off [91, 94].
References 1 M. M. YUSUPOV, G. Z. YUSUPOVA, A. BAUCOM et al., Science 2001, 292, 883–996. 2 V. RAMAKRISHNAN, P. B. MOORE, Curr. Opin. Struct. Biol. 2001, 11, 144–154. 3 B. T. WIMBERLY, D. E. BRODERSEN, W. M. CLEMONS et al., Nature 2000, 407, 327–339. 4 D. E. BRODERSEN, W. M. CLEMONS, Jr, A. P. CARTER et al., J. Mol. Biol. 2002, 316, 725–768. 5 N. NEVSKAYA, S. TISHCHENKO, M. PAVELIEV et al., Acta Crystallogr. 2002, 58, 1023–1029. 6 A. NIKULIN, I. ELISEIKINA, S. TISHCHENKO et al., Nat. Struct. Biol. 2003, 10, 104–108. 7 N. NEVSKAYA, S. TISCHENKO, R. FEDOROV et al., Struct. Fold. Des. 2000, 8, 363–371. 8 S. NIKONOV, N. NEVSKAYA, I. ELISEIKINA et al., EMBO J. 1996, 15, 1350–1359. 9 J. FRANK, Electron Microsc. Rev. 1989, 2, 53–74. 10 H. STARK, F. MUELLER, E. V. ORLOVA et al., Structure 1995, 3, 815–821. 11 N. BAN, P. NISSEN, J. HANSEN et al., Science 2000, 289, 905–920. 12 R. A. MILLIGAN, P. N. UNWIN, Nature 1986, 319, 693–695. 13 A. YONATH, K. R. LEONARD, H. G. WITTMANN, Science 1987, 236, 813–816. 14 C. BERNABEU, E. M. TOBIN, A. FOWLER et al., J. Cell. Biol. 1983, 96, 1471–1474.
15 P. DUBE, M. WIESKE, H. STARK et al., Structure 1998, 6, 389–399. 16 R. BECKMANN, D. BUBECK, R. GRASSUCCI et al., Science 1997, 278, 2123–2126. 17 R. BECKMANN, C. M. SPAHN, N. ESWAR et al., Cell 2001, 107, 361–372. 18 J. F. MENETRET, A. NEUHOF, D. G. MORGAN et al., Mol. Cell 2000, 6, 1219–1232. 19 J. HARMS, F. SCHLUENZEN, R. ZARIVACH et al., Cell 2001, 107, 679–688. 20 P. NISSEN, J. HANSEN, N. BAN et al., Science 2000, 289, 920–930. 21 J. FRANK, J. ZHU, P. PENCZEK et al., Nature 1995, 376, 441–444. 22 I. S. GABASHVILI, S. T. GREGORY, M. VALLE et al., Mol. Cell 2001, 8, 181–188. 23 T. TENSON, M. EHRENBERG, Cell 2002, 108, 591–594. 24 C. M. SPAHN, R. BECKMANN, N. ESWAR et al., Cell 2001, 107, 373–386. 25 J. GREENBLATT, Curr. Opin. Cell Biol. 1997, 9, 310–319. 26 P. S. LOVETT, E. J. ROGERS, Microbiol. Rev. 1996, 60, 366–385. 27 D. R. MORRIS, A. P. GEBALLE, Mol. Cell Biol. 2001, 20, 8635–8642. 28 F. GONG, C. YANOFSKY, Science 2002, 297, 1864–1867. 29 M. N. KAZARINOFF, E. E. SNELL, J. Biol. Chem. 1977, 252, 7598–7602.
References 30 M. C. DEELEY, C. YANOFSKY, J. Bacteriol. 1981, 147, 787–796. 31 V. STEWART, C. YANOFSKY, J. Bacteriol. 1986, 167, 383–386. 32 F. GONG, C. YANOFSKY, J. Biol. Chem. 2001, 276, 1974–1983. 33 D. OLIVER, J. NORMAN, S. SARKER, J. Bacteriol. 1998, 180, 5240–5242. 34 H. NAKATOGAWA, K. ITO, Mol. Cell 2001, 7, 185–192. 35 H. NAKATOGAWA, K. ITO, Cell 2002, 108, 629–636. 36 R. BERISIO, F. SCHLUENZEN, J. HARMS et al., Nat. Struct. Biol. 2003, 10, 366–370. 37 A. E. YONATH, J. MUSSIG, B. TESCHE et al., Biochem. Int. 1980, 1, 428–435. 38 W. M. CLEMONS, Jr, D. E. BRODERSEN, J. P. MCCUTCHEON et al., J. Mol. Biol. 2001, 310, 827–843. 39 F. SCHLUENZEN, A. TOCILJ, R. ZARIVACH et al., Cell 2000, 102, 615–623. 40 N. BAN, B. FREEBORN, P. NISSEN et al., Cell 1998, 93, 1105–1115. 41 I. S. GABASHVILI, R. K. AGRAWAL, C. M. SPAHN et al., Cell 2000, 100, 537–549. 42 D. I. SVERGUN, M. H. KOCH, J. S. PEDERSEN et al., J. Mol.Biol. 1994, 240, 78–86. 43 C. M. SPAHN, P. A. PENCZEK, A. LEITH et al. Struct. Fold Des. 2000, 8, 937–948. 44 C. GLOTZ, R. BRIMACOMBE, Nucleic Acids Res. 1980, 8, 2377–2395. 45 A. NAKAGAWA, T. NAKASHIMA, M. TANIGUCHI et al., EMBO J. 1999, 18, 1459–1467. 46 M. WORBS, R. HUBER, M. C. WAHL, EMBO J. 2000, 19, 807–818. 47 S. SPILLMANN, F. DOHME, K. H. NIERHAUS, J. Mol. Biol. 1977, 115, 513–523. 48 A. YONATH, Curr. Protein Pept. Sci. 2002, 3, 67–78. 49 R. K. AGRAWAL, J. LINDE, J. SENGUPTA et al., J. Mol. Biol. 2001, 311, 777–787. 50 M. G. GOMEZ-LORENZO, C. M. SPAHN, R. K. AGRAWAL et al., EMBO J. 2000, 19, 2710–2718. 51 M. VALLE, A. ZAVIALOV, J. SENGUPTA et al., Cell 2003, 114, 123–134. 52 J. L. HANSEN, T. M. SCHMEING, P. B. MOORE et al., Proc. Natl. Acad. Sci. USA 2002, 99, 11670–11675.
53 T. M. SCHMEING, A. C. SEILA, J. L. HANSEN et al., Nat. Struct. Biol. 2002, 9, 225–230. 54 S. V. KIRILLOV, J. WOWER, S. S. HIXSON et al., FEBS Lett. 2002, 514, 60–66. 55 R. R. GUTELL, J. C. LEE, J. J. CANNONE, Curr. Opin. Struct. Biol. 2002, 12, 301–310. 56 R. R. GUTELL: in Ribosomal RNA: Structure, Evolution, Processing, and Function in Protein Biosynthesis, eds R. A. ZIMMERMANN and A. E. DAHLBERG, CRC Press, Boca Raton, FL 1996, 111–128. 57 J. THOMPSON, W. E. TAPPRICH, C. MUNGER et al., RNA 2001, 7, 1076–1083. 58 T. ELGAVISH, J. J. CANNONE, J. C. LEE et al., J. Mol. Biol. 2001, 310, 735–753. 59 J. M. OGLE, D. E. BRODERSEN, Jr. W. M. CLEMONS et al., Science 2001, 292, 897–902. 60 P. SCHIMMEL, B. HENDERSON, Proc. Natl. Acad. Sci. USA 1994, 91, 11283–11286. 61 M. A. SCHAFER, A. O. TASTAN, S. PATZKE et al., J. Biol. Chem. 2002, 277, 19095–19105. 62 D. J. KLEIN, T. M. SCHMEING, P. B. MOORE et al., EMBO J. 2001, 20, 4214–4221. 63 R. T. BATEY, R. P. RAMBO, J. A. DOUDNA, Angew. Chem.-Int. Ed. 1999, 38, 2327–2343. 64 T. HERMANN, D. J. PATEL, J. Mol. Biol. 1999, 294, 829–849. 65 N. B. LEONTIS, J. STOMBAUGH, E. WESTHOF, Biochimie 2002, 84, 961–973. 66 P. B. MOORE, Annu. Rev. Biochem. 1999, 68, 287–300. 67 S. R. HOLBROOK, S. H. KIM, Biopolymers 1997, 44, 3–21. 68 W. SAENGER: Principles of Nucleic Acid Structure, ed. C. R. CANTOR, Springer, New York 1984. 69 A. E. WALTER, D. H. TURNER, Biochemistry 1994, 33, 12715–12719. 70 A. E. WALTER, D. H. TURNER, J. KIM et al., Proc. Natl. Acad. Sci. USA 1994, 91, 9218–9222. 71 J. KIM, A. E. WALTER, D. H. TURNER, Biochemistry 1996, 35, 13753–13761. 72 S. E. LIETZKE, C. L. BARNES, J. A. BERGLUND et al., Structure 1996, 4, 917–930.
31
32 Structure of the Ribosome
73 P. NISSEN, J. A. IPPOLITO, N. BAN et al., Proc. Natl. Acad. Sci. USA 2001, 98, 4899–4903. 74 J. H. CATE, A. R. GOODING, E. PODELL et al., Science 1996, 273, 1678–1685. 75 G. L. CONN, D. E. DRAPER, E. E. LATTMAN et al., Science 1999, 284, 1171–1174. 76 B. T. WIMBERLY, R. GUYMON, J. P. MCCUTCHEON et al., Cell 1999, 97, 491–502. 77 D. N. WILSON, G. BLAHA, S. R. CONNELL et al., Curr. Protein Pept. Sci. 2002, 3, 1–53. 78 M. TAMURA, S. R. HOLBROOK, J. Mol. Biol. 2002, 320, 455–474. 79 E. A. DOHERTY, R. T. BATEY, B. MASQUIDA et al., Nat. Struct. Biol. 2001, 8, 339–343. 80 S. K. SILVERMAN, T. R. CECH, Biochemistry 1999, 38, 8691–8702. 81 J. C. LEE, J. J. CANNONE, R. R. GUTELL, J. Mol. Biol. 2003, 325, 65–83. 82 U. NAGASWAMY, G. E. FOX, RNA 2002, 8, 1112–1119. 83 N. B. LEONTIS, E. WESTHOF, RNA 2001, 7, 499–512. 84 U. NAGASWAMY, M. LARIOS-SANZ, J. HURY et al., Nucleic Acids Res. 2002, 30, 395–397. 85 P. S. KLOSTERMAN, M. TAMURA, S. R. HOLBROOK et al., Nucleic Acids Res. 2002, 30, 392–394. 86 A. P. POTAPOV, F. J. TRIANA-ALONSO, K. H. NIERHAUS, J. Biol. Chem. 1995, 270, 17680–17684. 87 J. S. FEINBERG, S. JOSEPH, Proc. Natl. Acad. Sci. USA 2001, 98, 11120–11125. 88 J. A. PLEISS, O. C. UHLENBECK, J. Mol. Biol. 2001, 308, 895–905. 89 J. A. PLEISS, A. D. WOLFSON, O. C. UHLENBECK, Biochemistry 2000, 39, 8250–8258. 90 T. A. STEITZ: in The RNA World : the Nature of Modern RNA Suggests a
91 92
93 94 95 96
97 98
99 100 101 102 103
104
105
Prebiotic RNA, eds R. GESTELAND, T. R. CECH and J. ATKINS, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York 1999, 427–450. J. ALLERS, Y. SHAMOO, J. Mol. Biol. 2001, 311, 75–86. S. JONES, D. T. DALEY, N. M. LUSCOMBE et al., Nucleic Acids Res. 2001, 29, 943–954. M. TREGER, E. WESTHOF, J. Mol. Recog. 2001, 14, 199–214. J. R. WILLIAMSON, Nat. Struct. Biol. 2000, 7, 834–837. B. WEISBLUM, Antimicrob. Agents Chemother. 1995, 39, 797–805. P. DELBECQ, O. CALVO, R. K. FILIPKOWSKI et al., Curr. Genet. 2000, 38, 105–112. P. FANG, Z. WANG, M. S. SACHS, J. Biol. Chem. 2000, 275, 26710–26719. G. L. LAW, A. RANEY, C. HEUSNER et al., J. Biol. Chem. 2001, 276, 38036– 38043. A. L. PAROLA, B. K. KOBILKA, J. Biol. Chem. 1994, 269, 4s497–4505. A. RANEY, G. L. LAW, G. J. MIZE et al., J. Biol. Chem. 2002, 277, 5988–5994. K. REYNOLDS, A. M. ZIMMER, A. ZIMMER, J. Cell Biol. 1996, 134, 827–835. J. P. ALDERETE, S. JARRAHIAN, A. P. GEBALLE, J. Virol. 1999, 73, 8330–8337. R. R. GUTELL, J. J. CANNONE, Z. SHANG et al., J. Mol. Biol. 2000, 304, 335–354. M. STO¨ FFLER-MEILICKE, G. STO¨ FFLER: in The Ribosome. Structure, Function and Evolution, eds W. E. HILL, A. DAHLBERG, R. A. GARRETT et al., American Society for Microbiology, Washington, DC 1990 123–133. J. FRANK, R. AGRAWAL: in Embryonic Encyclopedia of Life Sciences Nature Publishing Group, London 1999, www.els.net.
1
Assembly of The Prokaryotic Ribosome Knud H. Nierhaus Max Planck Institute for Molecular Genetics, Berlin, Germany
Originally published in: Protein Synthesis and Ribosome Structure. Edited by Knud H. Nierhaus and Daniel N. Wilson. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30638-1
1 Introduction
A ribosome consists of a large number of different components (e.g., Escherichia coli: 54 r-proteins and three rRNAs), all of which are present in one copy per ribosome, the only exception being that of the ribosomal protein L7/L12, which is present in four copies. This fact has two important consequences for the ribosomal biogenesis: both the synthesis and the assembly of the ribosomal components must occur in a highly coordinated fashion. The principles of ribosomal assembly in prokaryotes, such as the assembly maps, rate-limiting steps and that the assembly gradient follows the transcription of the ribosomal RNA were uncovered primarily by in vitro techniques. The requirement for a highly coordinated synthesis is particularly demanding in the cases in which ribosomes contribute significantly to the dry mass of the cell. In bacteria, the ribosomes can amount to more than 50% of the dry mass [1], whereas in eukaryotes they represent not more than 5% [2]. In fact, cells of E. coli appear as sacs filled with ribosomes in images of transmission electron microscopy. Correspondingly, more than 50% of the total energy production of bacteria is consumed by ribosomal biogenesis. Therefore, it is understandable that a coordinated synthesis is not only a prerequisite for a successful and effective assembly, but is also a necessity for economic consumption of the energy available to the cell. We thus find an intricate network of regulatory mechanisms for the synthesis of ribosomal components in bacteria, which do not exist to the same degree in eukaryotes (e.g., translational control of mRNAs carrying cistrons for ribosomal proteins and the stringent response).
Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Assembly of The Prokaryotic Ribosome
Third-order reactions practically do not exist, since the probability of three substrates reacting simultaneously is negligibly low. By extension, the assembly of more than 20 components (in the case of the small ribosomal subunit) to a defined and relatively compact particle is a series of reactions. This construction is a self-assembly process, i.e. the total information for the pathway as well as for the quaternary structure of the active ribosomes resides completely in the primary sequences of the ribosomal proteins and rRNAs. The fact that fully active ribosomes can be reconstituted from the isolated components with the remarkably high efficiency of 50–100% of the input material substantiates this assumption. The self-assembly character in vitro does not preclude the involvement of additional factors in vivo to facilitate and accelerate the whole process, for example, by reducing activation energies of distinct or otherwise rate-limiting reactions. One of these factors is probably the “assembly gradient” that marks the coupling of rRNA synthesis and ribosomal assembly in prokaryotes and eukaryotes, which means that the state of the rRNA synthesis dictates the progress of assembly [3–6]. It is further possible that proteins exist which maintain an unfolded state of the de novo synthesized ribosomal proteins thus favoring the integration into ribosomal particles (“chaperonins”). A recent report suggested that the Hsp70 DnaK system is involved in facilitating the ribosomal assembly by reducing the activation energy barrier [7]; however, these assertions could not be confirmed [8]. Circumstantial evidence for a corresponding activity for the rRNA (“helicases”), possibly facilitating the attainment of distinct RNA conformations that favor the assembly process, has been reported previously [9].
2 Processing of rRNAs
There are two main aspects to the processing of rRNAs: (1) trimming of the rRNAs to yield the mature molecules found in native, active ribosomes and (2) modification (mostly methylations and pseudouridylation) of the rRNAs. The common order of rRNA genes in the seven rRNA operons in E. coli is 16S – internal transcribed spacer I (ITS-1) – (either tRNAAla and tRNAIle or tRNAGlu ) – ITS2 – 23S – ITS-3 – 5S (see Fig. 1). The complete, intact transcript, the “30S precursor rRNA”, is found at low levels in wild-type cells (1–2% of rRNA). Endonucleases are the primary processing enzymes, among which RNase III plays a major role in the maturation of 23S rRNA. RNase III cleaves within the spacer sequences bordering 16S and 23S rRNA. The spacer sequences can form impressive secondary structures flanking both 16S and 23S rRNA; however, these intramolecular interactions are not a prerequisite for RNase III activity, since RNase III can cleave at the 5 end before the 3 end of the same molecule is fully transcribed. A general feature of processing is that it begins before transcription of a ribosomal (rrn) operon is finished. This sequential processing in the 5 → 3 direction is compatible with the
2 Processing of rRNAs
Fig. 1 Operon structure of the rRNA genes in bacteria. UAS, upstream activating sequence, binding region of Fis, a DNAbinding protein that bends the DNA and increases rRNA transcription about 10-fold. P1 and P2, the two promoters, the first of
which can efficiently be regulated by the stringent response; T1 and T2, terminators of transcription. Below the operon is shown with the symbolized secondary structure of the respective RNAs and the cleavage sites of some of the processing RNases.
hypothesis that at least some processing steps are coupled with ribosomal assembly. The final maturation steps of pre-16S, pre-23S and pre-5S are performed by exonucleases (maturases, secondary processing enzymes) that are not yet characterized and are termed M16, M23, and M5, respectively (see Ref. [10] for review). The final processing steps in the 50S subunit occur even after ribosomes are formed probably during early steps of protein biosynthesis, whereas mature 16S rRNA is required to obtain functional competence (see below). RNase III cleavage yields precursor species of rRNA. The pre-16S species retains additional stretches of 115 and 33 nucleotides at the 5 - and 3 - ends, respectively, whereas the pre-23S has stretches of only 7 and 7–9 nucleotides, respectively [10, 11]. RNase III cleavages are not essential processing steps since mutants lacking RNase III are viable. In the absence of RNase III, 50S with pre-23S are found, whereas mature 16S rRNA molecules are formed at the same rate as in the wild-type strain. Interestingly, 30S subunits containing pre-16S, where the sequence flanking the mature 16S rRNA are base-paired and form a long helix, seem to be inactive (i.e., mutants deficient in M16 are not viable) in contrast with 50S subunits containing pre-23S. Note that in mature 30S subunits the 5 - and 3 - ends are far apart from each other, whereas in 50S the 5 - and 3 - ends are base-paired as seen clearly in the crystal structure of bacterial 30S and 50S subunits (Figs. 2 and B, respectively [12, 13]). Consequently, the maturation from pre-16S to mature 16S rRNA within
3
4 Assembly of The Prokaryotic Ribosome
Fig. 2 5 - and 3 -ends of 16S rRNA within the 30S subunitand of 23S rRNA within the 50S subunit (B) viewed from the interface. (A) 30S subunit; Thermus thermophilus 30S (pdb 1fka): ribbon representation of rRNA with the 5 - and 3 -ends of the 16S rRNA colored green and red, respectively. The residues U6 (green) and C1512 (red) are in
spacefill and are approximately 80 Å apart. (B) 50S subunit; Deinococcus radiodurans 50S (1kpj): ribbon representation of rRNA with the 5 - and 3 -ends of the 23S rRNA colored green and red, respectively. The residues G1 (green) and A2877 (red) are shown in spacefill and are in contact with one another.
30S particles (removing the secondary structure flanking the 16S rRNA) triggers the activation of 30S subunits [14]. When the 16S rRNA processing is coupled with and depends on a correct 30S assembly, this final processing step guarantees that only active 30S subunits can initiate protein synthesis. Processing of 5S rRNA requires RNase E. RNase E-deficient mutants accumulate a 9S species that has not been detected in wild-type cells. RNase E forms pre-5S with three extra nucleotides at both its 5 - and 3 - ends. The final processing of 5S rRNA might also occur during protein synthesis or at least in active 70S ribosomes, since pre-5S was found in polysomes. In vitro reconstitution studies have revealed that 5S rRNA can be incorporated into the large subunit at any stage of the 50S assembly [15] reflecting its exposed location in the central protuberance of the 50S subunit. In E. coli, 11 and 23 nucleosides are modified in 16S and 23S rRNA, respectively (Table 1; [16, 17]). The modifications of the 16S rRNA are late events during in vivo assembly and are not essential for assembly per se, whereas most of the 23S modifications, some of which are essential, occur early during assembly. Most of the modifications are base-methylations, and nine are pseudo-uridylations () in the 23S rRNA. The methylations are not required for the trimming processes described above. In fact, a few methyl groups are found at the 2 -ribose position (e.g., Cm2498 of 23S rRNA, see Table 1) and might protect sensitive rRNA regions against RNase attack. The presence of a modified adenine at position 2071 (A* in Table 1) is uncertain. Most of the methyl groups are modifications of bases exposed at the ribosomal surface and are clustered at functionally active
2 Processing of rRNAs Table 1 Modified nucleosides in E. coli rRNA. (see Ref. [44] for review and references)
16S rRNA
Location (nucleotide)
23S rRNA
Location (nucleotide)
m7 G m7 G m5 C m7 G m4 Cm m5 C m3 U m2 G m6 2 A m6 2 A Total
516 527 966 967 1207 1402 1407 1498 1516 1518 1519 11
m1 G m5 U m6 A m2 G m3 m5 U m5 C m6 A m7 G A* Gm m2 G D Cm m2 A Um Total
745 746 747 955 1618 1835 1911 1915 1917 1939 1962 2030 2069 2071 2251 2445 2449 2457 2498 2503 2504 2552 2580 23
sites of the ribosome, e.g., 20 of the 23 modifications occur at or near the peptidyl transferase ring of domain V of the 23S rRNA, none of which are universally conserved. Therefore, they are probably more important for fine-tuning and stability of structural motifs rather than being directly involved in ribosome function. The two adjacent m62 A residues at positions A1518/A1519 in 16S rRNA are the only universally conserved modifications of rRNA. They improve the formation of initiation complexes, in particular the binding of IF-3, which has an anti-association activity during the initiation of translation. The absence of these four methyl groups confers resistance to the drug kasugamycin. Recently, evidence was reported that occasionally defined rRNA fragments occur during the synthesis of 16S and 23S rRNA. These fragments are so efficiently degraded by polynucleotide phosphorylase and RNase R that they escaped the attention in former analyses of rRNA synthesis. Cells are not viable when both enzymes are inactivated. The fragments, if accumulated, might bind some ribosomal proteins, thus compromising the assembly of the mature rRNAs and eventually leading to cell death [18].
5
6 Assembly of The Prokaryotic Ribosome
3 Precursor Particles and Reconstitution Intermediates
Usually, the assembly process is described from the point of view of the largest component, i.e., the 16S or 23S type of rRNA and the sequence of addition of these components is considered. This process requires a concatenation of reactions that differ in their respective activation energies. The highest activation energies function as energy barriers allowing precursor particles to accumulate. In fact, precursor particles have been found in vivo. The assembly of the small subunit (30S, E. coli) passes through at least two different intermediate particles termed p1 30S and p2 30S (p for precursor). The p1 30S particle sediments with 21S. The p2 30S particle contains the full complement of S-proteins (S for proteins from the small subunit), but still an immature “17S” rRNA, which is longer at both its 5 - and 3 -ends with respect to the mature 16S rRNA (see the preceding section; [10]). Only one reconstitution intermediate, RI30 , is found during the assembly in vitro of the 30S subunit [19]. The total reconstitution (“total” marks a reconstitution from completely separated ribosomal proteins and rRNA) is a one-step procedure according to the formula
16S rRNA+TP30
20 mM Mg2+ ,40 C, 20 min → 30S,
(1)
where TP30 stands for total proteins derived from 30S subunits. An incubation of both 16S rRNA and TP30 at 0◦ C leads to the RI30 particle. This particle has to undergo a conformational change (“activation”) according to the equation
RI30
40◦ C, 20 min * → RI30
(2)
where RI30 * is a particle with an unchanged composition but a tighter packing. Only the RI30 * particle can bind at 0◦ C, the lacking S-proteins to form an active 30S particle. Interestingly, the protein content of the RI30 particle is very similar to that of the p1 30S precursor. Equation (2) describes the rate-limiting step of the 30S assembly, the activation energy of which is 63 kcal mol−1 [19]. The assembly in vivo of the large subunit (50S, E. coli) occurs via three precursor particles p1 50S, p2 50S, and p3 50S sedimenting with 34S, 43S and “near 50S”, respectively [20]. The final precursor (p3 50S) contains again a full complement of L-proteins (L for proteins derived from the large subunit). The p2 50S particles can be converted to active 50S subunits in the presence of TP50 under methylating conditions (S-adenosyl methionine, postribosomal supernatant), whereas the p1 50S particles could not, suggesting that this initial process might require additional processing steps other than simple methylations [21].
4 Assembly-initiator Proteins
A two-step procedure is required for the total reconstitution of active 50S subunits [22]: ◦
(23S+5S) rRNA+TP50
4mM Mg2+ , 44 C,20min → ◦ 20mM Mg2+ , 50 C,90min → 50S.
(3)
The two-step procedure is a consequence of the fact that the rate-limiting steps of early and late assembly involve conformational changes that differ in their ionic optima in vitro ([23]; see also below). The two-step procedure is therefore a convenient way to separate early- and late-assembly events, and indeed allowed for a detailed analysis of the possible reconstitution intermediates. Three reconstitution particles have been identified; protein analysis revealed that the first and the second particles contained the same complement of rRNAs and L-proteins in spite of the drastic difference in their respective S values (33S and 41S, respectively, see Table 2), whereas the third particle contained all the components of the active 50S subunit but was totally inactive. Accordingly, the three reconstitution intermediates were termed RI50 (1), RI*50 (1) and RI50 (2). It appears that the rate limiting step of the first incubation is the conformational change * (1), and that of the second incubation the conformational change RI50 (1) → RI50 RI50 (2) → 50S (Table 2). The corresponding activation energies have been determined as 45 and 55 kcal mol−1 , respectively. The precursor particles and the corresponding reconstitution intermediates have similar protein compositions as well as similar S values, indicating that assembly in vivo proceeds via rate-limiting steps that are very similar, if not identical, to the corresponding ones of the assembly in vitro [23].
4 Assembly-initiator Proteins
Ribosomal proteins (r-proteins) that bind in vitro specifically to naked rRNA are members of the “RNA-binding” family of proteins. About two-thirds of all r-proteins are RNA-binding proteins (see Fig. 3). The intriguing question was whether all these RNA-binding proteins, for example, about 20 L-proteins in the large subunit and 7 S-proteins in the small one, also bind directly to rRNAs in vivo without the help of other r-proteins (without cooperativity), i.e., whether these proteins are independent assembly-initiation events. There were indications from in vivo studies that only a small number of r-proteins were able to initiate the assembly process. Under unfavorable growth conditions, when the doubling time for growth was about 10 h, i.e., 30 times longer than the optimal doubling time, the balanced synthesis of rRNA and r-proteins was lost and rRNA was produced in a three molar excess over r-proteins [24]. If, under such conditions, all 20 RNA-binding L-proteins could initiate the assembly process independently then they would be distributed evenly over the excess of rRNA,
7
8 Assembly of The Prokaryotic Ribosome Table 2 Sequential addition of proteins in the course of total reconstitution. Proteins in bold
indicate proteins essential and sufficient for the RI* formation. For further details see Refs. [19] (30S subunit) and [23] (50S subunit)
and therefore the yield of active particles with a full complement of r-proteins would be negligibly small. Since E. coli cells produce significant amounts of active ribosomes even under unfavorable growth conditions, the number of assemblyinitiator proteins must be significantly smaller than that of the total number of RNA-binding r-proteins. An assembly-initiator protein is defined as an r-protein, which binds without cooperativity to an rRNA molecule and is essential for the formation of an active ribosomal subunit. Only those rRNA molecules with a complete set of initiator proteins are able to assemble correctly to form fully active ribosomal particles.
4 Assembly-initiator Proteins
Fig. 3 Translational regulation of the ribosomal proteins. Usually the second or third cistrons of a polycistronic mRNA code for an rRNA-binding protein, which can also bind to the region of the ribosomal binding site of the first cistron of its own mRNA,
thus competing with initiating 30S subunits. Therefore, this regulatory protein will inhibit the translation of its own polycistronic mRNA, when a significant free pool of this protein is in the cell.
The dependence of the amount of active particles on the number of assemblyinitiator proteins and the excess of rRNA is governed by the formula [25] A = E 1−n ,
(4)
where A is the fraction of total proteins that appears in active particles (A for activity), E the molar ratio of rRNA to r-proteins (E for rRNA excess), and n represents the number of assembly-initiator proteins. Let us assume that only one initiator protein is present. If the molar ratio rRNA : TP is E, then the probability of finding the initiator protein on one distinct rRNA molecule is E −1 , whereas for n initiator proteins the probability is E −n (independent events). Since the probability is the same for each rRNA molecule, the overall probability of obtaining a complete initiation complex (i.e., a complex of rRNA and all n initiator proteins) is E × (E −n ) = E 1−n . This is identical (A) with the fraction of TP, which appears in active particles, since only complete initiation complexes will form active particles. Hence, A = E 1−n and lnA = (1−n)ln E .
(5)
Equation (5) provides us with direct access to the experimental strategy for the elucidation of the number of initiator proteins, for example, one keeps the level
9
10 Assembly of The Prokaryotic Ribosome
of TP50 constant and increases the input E of (23S+5S) rRNA. The reconstitution is then performed and the yield of active particles, A, is determined. This kind of analysis revealed that only two L-proteins actually function as assembly-initiator proteins, and in an additional reconstitution analysis with purified proteins, L3 and L24 were identified as the two assembly-initiator proteins of the large subunit by applying a variation of this strategy [25]. Two proteins were also identified as assembly-initiator proteins for the small ribosomal subunit, namely the proteins S4 and S7 [26]. We understand from Eq. (4) that a high yield of active particles, under conditions where rRNAs are synthesized in excess over r-proteins, correlates with a small number of assembly-initiator proteins. Why then are there two initiator proteins rather than one? As already mentioned, the mechanisms regulating the mutual adaptation of the synthesis of rRNA and that of r-proteins uncouple during extremely unfavorable growth conditions leading to an excess synthesis of rRNA. At this point, feedback mechanisms and autoregulatory circuits become increasingly important, such as the translational regulation of the r-proteins. The principle of the translational regulation is demonstrated in Fig. 3. Usually, the second or third cistrons of a polycistronic mRNA of r-proteins code for an RNA-binding protein that can also bind to the Shine–Dalgarno region of the first cistron from its own mRNA, thus competing with initiating 30S subunit and reducing the frequency of translation of the mRNA. Ten ribosomal proteins involved in translational regulation have been identified; these are S4, S7, S8, S15, S20, L1, L4, L10, L12 and L20 [27]. If only one assembly-initiator protein existed, then in the presence of excess rRNA all the r-proteins would flow into the formation of active particles leaving no free pool of r-proteins for regulatory tasks. It is the existence of two initiator proteins that is responsible for assembly-dead ends. These assembly-dead ends are loose protein– rRNA complexes from which r-proteins are provided for the translational control. Therefore, two initiator proteins seem to represent an optimum, the number must be small to allow the formation of significant amounts of active ribosomes in the presence of rRNA excess; on the other hand, the number must be larger than one to enable translational control under unfavorable growth conditions. Protein L24 is essential only for the early-assembly event, being dispensable during later assembly events and not necessary for ribosomal function. The existence of a mutant lacking L24 is a surprise. The mutant strain produces assembly-defective 50S subunits in agreement with the findings described above. The mutation is conditionally lethal, exhibiting temperature sensitivity, i.e., it does not grow at temperatures above 36◦ C. Even at permissive temperatures, severe growth defects are observed (the doubling time is 5–7 times longer than that of wild-type E. coli strain), and the molar ratio of 30S : 50S ribosomal subunits is only 1 : 0.5. An analyses in vitro of the assembly of the mutant ribosomes demonstrated that L24 is absolutely required for the total reconstitution of active 50S subunits when the incubation temperature of the first step (normally 44◦ C, see Eq. (3)) was above 40◦ C. Below 40◦ C, active particles could be formed in the absence of L24; however, the maximal output of active 50S subunits was only 50% at permissive temperatures as
4 Assembly-initiator Proteins
compared with optimal conditions, i.e., half that in the presence of functional L24, and the activation energy of the rate-limiting reaction during the first step incubation was twice as large as that in the presence of L24. Obviously, another protein replaces L24 at “permissive temperatures” (below 40◦ C) with reduced efficiency, and a systematic study with isolated L-proteins revealed that this protein is L20 [28].
4.1 Proteins Essential for the Early Assembly: The Assembly Gradient
* (1) is marked by The transition of the reconstitution intermediate RI50 (1) → RI50 * a drastic S-value shift from 33S to 41S (Table 2). RI50 (1) is the essential product of the early assembly during the first-step incubation that cannot be formed during the second step. Both intermediate particles consist of 23S and 5S rRNA and 20 different proteins. Are all these multiple components needed for the critical * (1)? transition from RI50 (1) → RI50 The results of a systematic analysis with purified ribosomal proteins were surprising. Only 23S rRNA and five proteins (L4, L13, L20, L22, and L24) were necessary * (1) conformation, and sufficient for establishing the functionally important RI50 although L3 stimulated the formation, the 5S rRNA and all the other ribosomal proteins were not required [6]. Comparison of the known binding sites of the early assembly proteins revealed that all these essential proteins have a binding site located towards the 5 -end of the 23S rRNA and only the stimulatory L3 binds near the 3 -end. Since all polymerasedependent synthesis of nucleic acids starts at the 5 -end, this observation has an important consequence. Those proteins that determine the early-assembly reactions bind in vivo immediately after the onset of transcription of the 23S rRNA and before the completion of its synthesis. This coupling of rRNA synthesis and ribosomal assembly was termed the “assembly gradient” and states that the progress of rRNA synthesis dictates the progress of assembly [4]. Therefore, the entropic situation of in vivo assembly is much simpler than that of the total reconstitution in vitro: in vivo, the assembly starts with a relatively short 5 -sequence of the rRNA and the five proteins essential for the early assembly, whereas in vitro the mature 23S rRNA is exposed to all 33 L-proteins. The entropic advantage of the initiation phase of assembly in vivo is seen in the following: the disorder of assembly initiation in vivo is drastically lower than under in vitro conditions, since in vivo the assembly commences before the transcription of the rRNA is finished and thus assembly initiation deals only with a limited number of components. These are the freshly transcribed 5 -region of the rRNA and the five proteins that can bind to the 5 -terminal sequence and determine the early assembly events. In striking contrast, when the mature 23S rRNA is present, all L-proteins compete for the assembly process in vitro, thus defining a much larger disorder under reconstitution conditions. This entropic advantage of the in vivo assembly over the in vitro reconstitution cannot be overestimated. It might at least partially explain why in vivo the 50S assembly of E. coli ribosomes
11
12 Assembly of The Prokaryotic Ribosome
lasts a couple of minutes at 37◦ C, whereas in vitro 90 min at 50◦ C is needed. In vivo studies have confirmed the existence of an “assembly gradient” [3]. The longer the rRNA, the more importance the assembly gradient assumes for an energetically favorable assembly process. The “assembly gradient” is presently the only explanation for the extremely complicated process of ribosomal assembly in eukaryotes. Ribosomal proteins are imported to the nucleoli, where rRNA transcription occurs (with the exception of the 5S rRNA). The near mature ribosomal subunits are exported from the nucleoli to the cytoplasm. It is assumed that this mechanism is required by the necessity to couple rRNA transcription and ribosomal assembly and thus to retain the entropic advantage of the assembly gradient. The early-assembly proteins responsible for the RI30 → RI*30 transition on the pathway to active 30S subunits were identified in the 1960s long before the 50S analysis described above [29]. The proteins essential for the transition were S4, S7, S8, S16 and S19, but the corresponding binding sites do not show such a clear preference for 16S rRNA regions near the 5 -end, as the early-assembly proteins do for the 23S rRNA. However, kinetic studies also revealed a sequential assembly with a 5 - to 3 -polarity along the 16S rRNA chain [5].
4.2 Late-assembly Components
One can define “late-assembly components” as those components of the 50S subunit that can be added to the second incubation step of the two-step incubation to yield active particles. In this classification, 5S rRNA and all L-proteins (except those * (1) particles) would be included. five proteins essential for the formation of RI50 However, here we define “late-assembly components” in the narrow sense as those components which play a decisive role in the late-assembly process, regardless of whether or not they are in addition important for stabilization of the structure and/or participate directly in ribosomal functions. Until now, only two L-proteins have been identified, namely L15 and L16, whose main task is the organization of the late assembly step [30]. A mutant that lacks L15 exists (see also the next section), whereas a mutant without L16 has not yet been described. However, ribosomes lacking either L15 or L16 or both proteins can be reconstituted and are active in poly(U)-dependent poly(Phe) synthesis. Both proteins have been shown to accelerate the assembly process independently by a factor of 2–4; the proteins act synergistically since together they increase the assembly rate by a factor of 20. The stimulatory effects of both proteins are also observed when they were added after the two-step reconstitution during a short third incubation under the conditions of the second step. This means that particles could be reconstituted during the standard two-step procedure in the absence of L15 and L16, and the late addition of these proteins in a short third incubation was sufficient to form active particles. The latter feature underlines their involvement in the late assembly.
4 Assembly-initiator Proteins
5S rRNA also fulfill’s the criteria of a late-assembly component, viz., it can also be added after the two-step reconstitution to a third incubation. The activation effect of 5S rRNA is heat-dependent, i.e., it induces a conformational change. However, in contrast with L15 and L16, fully active particles without 5S rRNA cannot be reconstituted. Therefore, a direct or indirect involvement of 5S rRNA in ribosomal functions is probable, in addition to its assembly activities [15]. 4.3 Proteins Solely Involved in Assembly
A comparison of the secondary structures of rRNAs from organisms of the various kingdoms reveals that about two-thirds of the E. coli 16S and 23S rRNAs is universally conserved (Fig. 4). The regions of the remaining one-third are randomly scattered over the rRNAs and can be shorter, longer or even absent in other
Fig. 4 The core structure (green lines) common to all 16S-type rRNAs from ribosomes of various organisms: (A) E. coli; (B) Halobacterium volcanii; (C) yeast, cytoplasmic ribosomes; (D) mitochondria of plants (maize). According to Ref. [46], modified.
13
14 Assembly of The Prokaryotic Ribosome Table 3 Mutants from E. coli lacking r-proteins. (taken from Ref. [45])
Subunit
Missing protein
30S
S1 S6 S9 S13 S17 S20 L1 L11 L15 L19 L24 L27 L28 L29 L30 L33
50S
Phenotype
Cold-sensitive
Temperature-sensitive
Cold-sensitive Temperature-sensitive, Very slow growth Cold-sensitive Cold-sensitive
Cold-sensitive
organisms. Similarly, one-third of the r-proteins are dispensable in E. coli, since mutants have been described lacking one or other of these r-proteins (Table 3). A recent comparison of sequenced genomes from organisms of all three evolutionary domains, viz., bacteria, archaea and eukarya, found that about one-third of the E. coli proteins are universally conserved ([31]; Fig 5). This suggests that these rRNA regions and r-proteins are dispensable for ribosomal assembly, structure or
Fig. 5 Conservation of ribosomal proteins from bacteria (B), eukarya (E) and archaea (A). 34 proteins are universally conserved, no proteins are common between bacteria and archaea or between bacteria and eukarya, whereas 33 proteins are present
in archaea and eukarya. This is an impressive example that the common ancestor of archaea and eukarya separated from bacteria before the domain separation between eukarya and archaea. From Ref. [31] with permission.
4 Assembly-initiator Proteins
function, although they might accelerate assembly, stabilize structures or fine-tune functions. Proteins that accelerate assembly represent one class of proteins solely involved in assembly. S16 and L15 belong to this class. Fully active 30S subunits can be assembled in the absence of S16, but the protein accelerates the assembly [32]. L15 is absolutely required for the formation of active particles under standard reconstitution conditions, but this requirement is relieved after decreasing the NH4 Cl concentration from 400 to 240 mM [30]. Under these conditions, L15 accelerates the assembly process by a factor 2 –4, which correlates well with the prolonged generation time (2–3 fold) of the mutant lacking L15. This observation suggests that the production of the large ribosomal subunits is the rate-limiting factor of the generation time of E. coli cells under optimal growth conditions. A second class of “assembly-only proteins” consists of a group of proteins essential for achieving a distinct assembly stage, which is a necessary intermediate in the path towards an active subunit. If mutants lacking such a protein exist at all, they are very sick. L20, L24, and probably L16, belong to this class. L16 is an assembly protein; it accelerates the late assembly, and particles lacking L16 can be reconstituted and show a good activity in poly(Phe) synthesis, although reconstituted 50S subunit with a full complement of proteins are four times more active [30]. Crystalstructure analysis of 50S subunits suggests that L16 might help to position tRNAs at P-site. A mutant lacking L16 has not yet been identified, possibly because of this role in positioning tRNAs. A mutant lacking L20 is also yet to be described. As mentioned previously, the mutant lacking L24 is severely handicapped as expected. Both L20 and L24 are essential for the formation of the obligatory, early intermediate RI*50 (1), but they are not involved in late assembly nor in ribosomal functions. How can one test that a protein has an essential role in the assembly but no role in function? The intriguing observation was that L20, L24, and other proteins * (1) conformation, but (L4, L14, and L22) are essential for the formation of the RI50 once this conformation has been achieved, at least for L20 and L24, they can be again removed by high-salt washes without losing the RI*50 (1) conformation. If the resulting core particle is reconstituted with TP50 lacking either L20 or L24, fully active particles are obtained. Therefore, both proteins are essential for the early assembly but play no role in either late assembly or ribosomal functions [33, 34].
4.4 Assembly Maps
In addition to the formal assembly pathway described in the preceding sections, the precise sequence of binding reactions starting with the 16S or 23S rRNA has been unravelled. The experimental results of such binding analyses are summarized in “assembly maps”. Primary binding proteins can individually form a stable complex with the rRNA and are connected to the rRNA by a thick arrow in Fig. 6. Secondary binding proteins require the help of other proteins. The assembly map reflects the interdependencies of the proteins for their incorporation into the ribosomal
15
16 Assembly of The Prokaryotic Ribosome
Fig. 6 Assembly maps of the ribosome from E. coli. (A) The small 30S subunit ([47]; modified according to Ref. [48]). Assembly proteins in the green, yellow, and purple field drive the assembly of the 5 domain, the central domain and the 3 domain. Proteins above the dotted line are those either required for the forma* particles or found in the isotion of RI30 lated 21 RI30 particles. (B) The large 50S subunit. The three main fragments of 23S
rRNA (13S, 8S, and 12S) are indicated, and proteins are arranged according to their binding regions on 23S rRNA. The proteins boxed with a blue line are required for the transition RI50 (1) → RI50 (1) of the early assembly. Proteins above the brown line are * (1) particles. L5, those found in the RI50 L15 and L18, circled by the green line, are the proteins important for mediating the binding of 5S rRNA to 23S rRNA (according to Ref. [49], modified).
4 Assembly-initiator Proteins
particle. Consistently, the sequence of stripping-off r-proteins with increasing LiCl concentrations is the exact reverse of the assembly order [35]. An interdependence of binding of two proteins does not necessarily reflect physical proximity. It is conceivable that a protein induces a conformational change of the ribonucleoprotein particle (RNP) upon integration, thus generating new binding sites for other components. Crystal structures of the small [13, 36] and the large subunits [12, 37] at atomic resolution have enabled a quantum leap in our understanding of the assembly process. We can infer the precise protein–rRNA interactions for each protein as demonstrated with the 30S subunit [38], and thus it is now clear that the primary binding proteins (those proteins in Fig. 6 directly connected with the rRNA) help to nucleate the folding of the rRNA domains. An assembly feature seen for the three major secondary structure domains of the 30S subunit – namely that each of these domains (Figure 7) represent an independent assembly and folding domain [39–42] is strictly true only for the 3 major domain forming the head of the 30S subunit. The proteins involved form a separate S7dependent branch of the 30S assembly map. With regard to the other domains, the above-mentioned study of protein–16S rRNA interactions revealed that most of the proteins contact more than one domain [38]. This is particularly true for the 5 (30S body) and central domains (platform), which are tightly associated with each other. This observation can probably be extrapolated to the assembly of these domains and thus explains why the 30S subunit has only two assembly-initiator proteins [26], S4 initiating the assembly of the 5 -domain and the central domain (body and platform, respectively), and S7 the 3 major domain (head). Since assembly follows transcription of the rRNA from 5 - to the 3 - end, the assembly proceeds in three main steps, the 5 domain forms the body of the 30S subunit, followed by the formation of the platform (central domain) and eventually by that of the head (3 major domain). The 3 minor domain is formed from the long helix (h44) that runs down and back up the interface of the 30S subunit and comprises elements of the decoding center, as well as the short h45 and the anti-Shine-Dalgarno stretch at the 3 - end of the 16S rRNA. The assembly of the domains of the 30S subunit was the topic of an interesting experiment in silico [43]. The 30S crystal structure was simplified by considering each nucleotide of the 16S rRNA as a pseudo-atom P and each aminoacyl residue of the ribosomal proteins as a pseudo-atom C. Interactions between proteins and 16S rRNA were assumed when Ps and Cs were closer than 3 Å. The resulting interaction diagram (Fig. 8) shows again the exclusive interaction of the proteins from the S7-dependent assembly branch with the head domain. In the next step the secondary structure map of 16S rRNA was taken, i.e., the tertiary folding was removed and the secondary structure alone considered, the proteins were added according to the sequence of the assembly map and the 16S rRNA secondary structure folded following the derived contact points of Fig. 8. The result was that the domains obtained were strikingly similar to those of the crystal structure, whereas the interdomain arrangements deviated significantly. The conclusion was that the sequential addition of the proteins during the assembly gradient dictates the folding pathway for the 16S rRNA, whereas the mutual arrangement of the
17
18 Assembly of The Prokaryotic Ribosome
Fig. 7 Ribosomal secondary and tertiary structure within the ribosome; the domains are marked with the same color in the secondary and the tertiary structures. (A) Structures of the 16S rRNA; the 5 , central and 3 major domain are in blue, red and green; the 3 minor domain in blue (3 MIN); Hd,
head; Bd, body; Pt, platform; the proteins are in gray. From Ref. [43], with permission. (B) Structures of the 23S rRNA. Note that the colors of the domains are different in the maps of the secondary structure and in the tertiary structure. From Ref. [50] with permission.
4 Assembly-initiator Proteins
Fig. 8 The proteins of the 30S subunit are listed according to their contacts with the 16S rRNA. From Ref. [43] with permission.
domains might be governed by rRNA–rRNA interactions. The implicit assumption of this study, namely that the structure of the ribosomal proteins before and after assembly is practically the same, seems to be justified (see Ref. [38]). Despite our rapid increase in understanding of the assembly processes through the ribosome subunit crystal structures, we have but scant knowledge of the features responsible for the extremely efficient assembly of ribosomes in vivo. This is emphasized, for example, by the requirement of an incubation step at 50◦ C for 90 min during reconstitution of a 50S subunit in vitro [22] with the in vivo assembly of the large subunit in 20 min at 37◦ C. Although the assembly gradient as men-
19
20 Assembly of The Prokaryotic Ribosome
tioned contributes to this enhancement, there are undoubtedly multiple helicases and chaperones that facilitate and accelerate the assembly process. The elucidation of the concerted interplay of elements supporting the assembly in vivo is a challenge for future assembly studies.
References 1 A. TISSIE` RES, J. WATSON, D. SCHLESSINGER et al., J. Mol. Biol. 1959, 1, 221–233. 2 G. BLOBEL, V. R. POTTER, J. Mol. Biol. 1967, 26, 279–292. 3 B. T. U. LEWICKI, T. MARGUS, J. REMME et al., J. Mol. Biol. 1993, 231, 581–593. 4 K. H. NIERHAUS: in Analysis of the Assembly and Function of the 50S Subunit from Escherichia coli Ribosomes by Reconstitution, eds G. CHAMBLISS, G. R. CRAVEN, J. DAVIES et al., University Park Press Baltimore, 1980. 5 T. POWERS, G. DAUBRESSE, H. F. NOLLER, J. Mol. Biol. 1993, 232, 362–374. 6 S. SPILLMANN, F. DOHME, K. H. NIERHAUS, J. Mol. Biol. 1977, 115, 513–523. 7 J. A. MAKI, D. J. SCHNOBRICH, G. M. CULVER, Mol. Cell 2002, 10, 129–138. 8 J.-H. ALIX, K. H. NIERHAUS, RNA 2003, 9, 787–793. 9 K. NISHI, J. SCHNIER, EMBO J. 1986, 5, 1373–1376. 10 A. K. SRIVASTAVA, D. SCHLESSINGER: in rRNA Processing in Escherichia coli, eds N. HILL, A. DAHLBERG, R. GARRETT et al., American Society for Microbiology, Washington, DC 1990. 11 D. SCHLESSINGER, R. I. BOLLA, R. SIRDESHMUKH et al., Bioessays 1985, 3, 14–18. 12 J. HARMS, F. SCHLUENZEN, R. ZARIVACH et al., Cell 2001, 107, 679–688. 13 B. T. WIMBERLY, D. E. BRODERSEN, W. M. CLEMONS et al., Nature 2000, 407, 327– 339. 14 J. W. WIREMAN, P. S. SYPHERD, Nature 1974, 247, 552–554. 15 F. DOHME, K. H. NIERHAUS, Proc. Natl. Acad. Sci. USA 1976, 73, 2221–2225. 16 J. OFENGAND, K. E. RUDD: in Bacterial, Achaeal, and Organellar rRNA: Pseudouridines and Methylated Nucleosides and their Enzymes, eds W. HILL, A. DAHLBERG, R. GARRETT et al., American
17 18 19 20 21
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
Society for Microbiology, Washington, DC 1990. J. ROZENSKI, P. F. CRAIN, J. A. MCCLOSKEY, Nucleic Acids Res. 1999, 27, 196–197. Z. F. CHENG, M. P. DEUTSCHER, Proc. Natl. Acad. Sci. USA 2003, 100, 6388–6393. P. TRAUB, M. NOMURA, J. Mol. Biol. 1969, 40, 391–413. L. LINDAHL, J. Mol. Biol. 1975, 92, 15–37. K. NIERHAUS, K. BORDASCH, H. E. HOMANN, J. Mol. Biol. 1973, 74, 587– 597. K. H. NIERHAUS, F. DOHME, Proc. Natl. Acad. Sci. USA 1974, 71, 4713–4717. F. DOHME, K. H. NIERHAUS, J. Mol. Biol. 1976, 107, 585–599. K. GAUSING, J. Mol. Biol. 1977, 115, 335–354. V. NOWOTNY, K. H. NIERHAUS, Proc. Natl. Acad. Sci. USA 1982, 79, 7238–7242. V. NOWOTNY, K. H. NIERHAUS, Biochemistry 1988, 27, 7051–7055. M. NOMURA, R. GOURSE, G. BAUGHMAN, Ann. Rev. Biochem. 1984, 53, 75–117. F. J. FRANCESCHI, K. H. NIERHAUS, Biochemistry 1988, 27, 7056–7059. P. TRAUB, M. NOMURA, J. Mol. Biol. 1969, 40, 391–413. F. FRANCESCHI, K. H. NIERHAUS, J. Biol. Chem., 1990, 265, 16676–16682. O. LECOMPTE, R. RIPP, J. C. THIERRY et al., Nucleic Acids Res. 2002, 30, 5382–5390. W. A. HELD, M. NOMURA, J. Biol. Chem. 1975, 250, 3179–3184. V. NOWOTNY, K. H. NIERHAUS, J. Mol. Biol. 1980, 137, 391–399. S. SPILLMANN, K. H. NIERHAUS, J. Biol. Chem. 1978, 253, 7047–7050. H. E. HOMANN, K. H. NIERHAUS, Eur. J. Biochem. 1971, 20, 249–257. F. SCHLUENZEN, A. TOCILJ, R. ZARIVACH et al., Cell 2000, 102, 615–623. N. BAN, B. FREEBORN, P. NISSEN et al., Cell 1998, 93, 1105–1115.
References 38 D. E. BRODERSEN, W. M. CLEMONS, Jr., A. P. CARTER et al., J. Mol. Biol. 2002, 316, 725–768. 39 S. C. AGALAROV, E. N. ZHELEZNYAKOVA, O. M. SELIVANOVA et al., Proc. Natl. Acad. Sci. USA 1998, 95, 999–1003. 40 V. MANDIYAN, S. J. TUMMINIA, J. S. WALL et al., Proc. Natl. Acad. Sci. USA 1991, 88, 8174–8178. 41 R. R. SAMAHA, B. OBRIEN, T. W. OBRIEN et al., Proc. Natl. Acad. Sci. USA 1994, 91, 7884–7888. 42 C. J. WEITZMANN, P. R. CUNNINGHAM, K. NURSE et al., FASEB J. 1993, 7, 177–180. 43 S. M. STAGG, J. A. MEARS, S. C. HARVEY, J. Mol. Biol. 2003, 328, 49–61. 44 G. R. BJO¨ RK: in Escherichia coli and Salmonella, ed. F. C. NEIDHARDT,
45
46
47
48
49 50
American Society for Microbiology, Washington DC 1996, 861–886. E. R. DABBS: in Structure, Function and Genetics of Ribosomes, eds B. HARDESTY and G. KRAMER, Springer-Verlag, New York 1986, 733–748. R. R. GUTELL, B. WEISER, C. R. WOESE et al., Prog. Nucleic Acid Res. Mol. Biol. 1985. W. A. HELD, B. BALLOU, S. MIZUSHIMA et al., J. Biol. Chem. 1974, 249, 3103– 3111. M. I. RECHT, J. R. WILLIAMSON, Cold Spring Harb. Symp. Quant. Biol. 2001, 66, 591–598. M. HEROLD, K. H. NIERHAUS, J. Biol. Chem. 1987, 262, 8826–8833. N. BAN, P. NISSEN, J. HANSEN et al., Science 2000, 289, 905–920.
21
1
Assembly of the Eukaryotic Ribosome Denis L. Lafontaine
Universit´e Libre de Bruxelles, Charleroi-Gosselies, Belgium
Originally published in: Protein Synthesis and Ribosome Structure. Edited by Knud H. Nierhaus and Daniel N. Wilson. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30638-1
1 Introduction
Recent proteomic developments are providing the first eukaryotic ribosomal assembly maps. In these, pre-ribosomal assembly appears to be asymmetric, at least biphasic, with the small ribosomal subunit synthesis factors binding first to the pre-rRNAs to be replaced, after the first few pre-rRNA cleavages, by proteins involved in the synthesis of the large ribosomal subunit. Pre-rRNA processing is fairly well characterized with several key-processing enzymes remaining to be identified, including most endoribonucleases. rRNA modification is also well understood and relies extensively on small nucleolar RNAs (snoRNAs) for the recognition of the sites of modification. Nucleolar routing of box C+D snoRNAs (required for sugar 2 –O methylation) involves transit through a specific nuclear locale, the human coiled/cajal body (CB) and yeast nucleolar body (NB); these are conserved sites of small ribonucleoprotein particles (RNP) biogenesis. The first proteins involved in ribosome export are being identified; however, most of these are also required for pre-rRNA processing, and presumably pre-rRNP assembly. Their precise function in RNP transport therefore awaits these effects to be uncoupled. Key factors active in ribosome synthesis are also required for the processing of many other classes of cellular RNAs, suggesting that maturation factors are recruited from a ‘common pool’ of proteins to specific pathways. Much remains to be done to understand how rRNP processing, modification, assembly and transport are integrated with respect to ribosome synthesis and other cellular biosynthetic pathways. Ribosome synthesis starts in the nucleolus, the site of rDNA transcription. rRNA synthesis occurs at the interface between the fibrillar center(s) (FCs) and the dense fibrillar component (DFC) with the nascent transcripts reaching out in the body of the DFC ([1]; reviewed in Ref. [2]). A dedicated polymerase, RNA Pol I (Pol I), drives Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Assembly of the Eukaryotic Ribosome
the transcription of a large precursor encoding three of the four mature ribosomal RNAs (rRNAs). The fourth rRNA, 5S, is produced from a Pol III promoter. The Pol I primary transcript is modified (specific residues are selected for ribose or base modification and pseudouridines formation), processed (mature sequences are released from the precursors and the non-coding sequences discarded) and assembled with the ribosomal proteins (RPs) in pre-ribosomes (reviewed in Refs. [3–8]). As these processes occur, the granular component (GC) of the nucleolus emerges. FC, DFC, and GC are morphologically distinct compartments present in most eukaryotes; interestingly, although controversial, recent analysis indicate that the yeast Saccharomyces cerevisiae has no FC (D.L.J. Lafontaine and M. Thiry, unpublished results). The relationship between the subnucleolar structures and the different steps of ribosome synthesis is not clear at present. The nucleolus is a highly dynamic structure, and RNA and protein components are known to exchange with the surrounding nucleoplasm with high kinetics [9, 10]. The average nucleolar residency time for human fibrillarin was estimated to be of only ∼40 s, indicating that the remarkably stable organization of the nucleolus may in fact reflect the extremely rapid dynamic equilibrium of its constituents. It is presently unclear whether there are resident, structural, nucleolar proteins or whether the structure simply ‘holds together’ through multiple, weak, interactions occurring between the nascent pre-rRNAs and the numerous trans-acting factors recruited to the sites of transcription [11]. The recent proteomic characterization of this cellular compartment will probably help to address these issues ([12, 13]; reviewed in Ref. [14]). Pre-ribosomes are released from the nucleolar structure, reach the nuclear pore complexes (NPC), presumably by diffusion, and are translocated to the cytoplasm. Both the small (40S) and large (60S) ribosomal subunits undergo final cytoplasmic maturation steps. A large number of trans-acting factors follow the pre-ribosomes to the cytoplasm and are recycled to the nucleus. Recent data suggest that the final steps of maturation may be coupled to cytoplasmic translation [15, 16]. RP genes, most often intron-containing, duplicated and expressed at distinct levels (yeast), are transcribed by Pol II. RP pre-mRNAs follow a canonical Pol II synthesis pathway (including capping, splicing, poly-adenylation, etc.; reviewed in Ref. [17]) and are routed to the cytoplasm to be translated. RPs are addressed to the nucleus and the nucleolus. Nuclear targeting operates on the NLS mode (reviewed in Refs. [18–20]); redundant importins are involved [21, 22]. Nucleolar targeting is less- well defined. Ribosome synthesis is an extremely demanding process requiring both tremendous amounts of energy and high levels of co-regulation and integration with other cellular pathways (reviewed in Refs. [23–25]). The production of the resident ribosomal components (4 rRNAs and about 80 RPs), as well as several hundreds of RNAs and protein trans-acting factors (see below) depends on the concerted action of the three RNA polymerases, extensive RNA processing and modification reactions, RNP assembly and transport and the function of several RNPs, including the ribosome itself. With about 2000 ribosomes to be produced per minute in an actively dividing yeast cell, transcription of pre-rRNAs and RP pre-mRNAs alone
2 Why so many RRPs?
represent not less than 60 and 40% of the Pol I and Pol II cellular transcription, respectively. With about 150 pores per nucleus, each pore must import close to 1000 RPs and export close to 25 ribosomal subunits per minute. The nucleolus does not only serve the purpose of ‘making of a ribosome’. In fact, it appears that most classes of cellular RNAs, including mRNAs [26, 27], tRNAs [28], snRNAs [29, 30], the SRP [31–33], RNAse P [34] and the TEL RNP [35–37] all transit through this organelle on their way to their final destinations, which can either be the nucleoplasm or the cytoplasm. Although the reason for this trafficking is in most cases unclear at present, this presumably reflects a need to benefit from the preribosomes maturation machinery. In the following, I will try to emphasize instances where common trans-acting factors are used on distinct classes of RNAs. The concept of a ‘multifunctional nucleolus’ has recently been elegantly reviewed [38]. Most of our current understanding of ribosome synthesis is based on research work in S. cerevisiae; this will be reviewed here. Other eukaryotic systems have been used successfully, including trypanosomes, Xenope, mouse and humans. Comparison between these various levels of organization is most useful and often highlights a high degree of conservation throughout the eukaryotic kingdom, e.g., most transacting factors identified in yeast have human counterparts. This chapter will present an overview of eukaryotic ribosome synthesis for the non-specialists, with an emphasis on the latest developments and unresolved issues.
2 Why so many RRPs?
An excess of 200 proteins, here referred to as RRPs (ribosomal RNA processing factors) are required for ribosome synthesis and transiently associate with the preribosomes. RRPs are not found in mature, cytoplasmic, particles but are recruited at various stages in the ribosomal assembly process. Recruitment presumably follows a strict temporal order. A similar number of small, stable RNAs, which localize at steady state in the nucleolus, the snoRNAs, are also involved. Most RRPs have no known function and, in fact, apart from those few with catalytic activities or well-characterized protein domains, we clearly have no idea of what they do. Best-characterized RRPs include several endo- and exoribonucleases (Table 1), snoRNA-associated proteins, modification enzymes (ribose and base methyl-transferases, pseudouridine synthase), RNA helicases [39, 40], GTPases [15, 41, 42], AAA-ATPases [43, 44], protein kinases [45, 46], RNA binding or protein–protein interaction domain-containing proteins and proteins with striking homology to RPs [43, 47–49]. Protein–protein interaction domain include coil–coil domains, WD and HEAT repeats, crooked-neck-like tetratrico peptide repeat, etc.; distinctive RRP motifs include the Brix, GAR, G-patch, and KKD/Edomains [50–55]. The actual catalytic activity of most RRPs remains to be demonstrated experimentally and the precise substrate of these proteins is, in most cases, not known.
3
4 Assembly of the Eukaryotic Ribosome Table 1 Endo- and exoribonucleolytic activities involved in pre-rRNA processing
Cleavage site
Cleavage activity
References
B0 B0 →B2 A0 , A1 , A2 A3 A3 →B1S B1L C2 C2 →C1 and C1 →C1 C2 →E
Rnt1p/yeast Rnase III Rex1p ?, ?, ? MRP Rat1p, Xrn1p ? ? Rat1p, Xrn1p Exosome Rex1p, Rex2p ?Ngl2p
[113] [224] [163] [90]
[93] [96] [84] [92]
Comprehensive lists of RRPs have recently been compiled by various authors with a short description of protein domains and known or presumed functions; these are freely available on-line (see useful WWW links at the end of this chapter).
3 (Pre-)ribosome Assembly, the Proteomic Era
In the early 1970s, the joint efforts of the Warner and Planta Labs defined the basics of eukaryotic ribosome assembly; this remained the core of our understanding for the next 30 years [56–62]. Metabolic labeling experiments and sucrose-gradient analysis revealed that following transcription, an early 90S pre-ribosome is formed and subsequently partitioned into a 43S and a 66S particle, precursors to the 40S and 60S subunits, respectively (see Fig. 1). The RNA content of these particles was established as 35S, 27S, and 20S pre-rRNAs for the 90S, 66S, and 43S particles, respectively. The conversion of the 43S particles to 40S subunits is closely linked to small subunit export. Few RRPs were known at that time and the protein composition of these RNP complexes was not determined. In the absence of appropriate tools, most of the research focused on other aspects of ribosome synthesis with most of the progress being made on pre-rRNA processing and modification (see below). There was no reason to believe a priori that there would be a strong bias for the association of RRPs involved in small subunit synthesis with the primary transcript. In fact, since many mutations affecting primarily 25S rRNA synthesis have negative feedback effects on early pre-rRNA cleavages (see Sect. 4 and Ref. [6]), as part of what we think is a ‘quality control’ mechanism (see below), the suggestion was made that early and late RRPs interact functionally; such interactions could have occurred in a single, large, ‘processome’. Functional interactions between early and late RRPs are most probably prevalent but the simple view of a unique ‘processome’ has however been recently challenged.
3 (Pre-)ribosome Assembly, the Proteomic Era
Fig. 1 Ribosomal assembly pathways. See main text for a full description. Cleavage sites, processing activities and RNA content of pre-ribosomal particles are indicated, as well as the TAP-targets used for the purifications (see Table 2). Pre-ribosomes have tentatively been ordered, based on their protein and RNA content, and assigned and to
the early (E), middle (M), or late (L) class. At the time of writing (Christmas 2002), several novel pre-ribosomal species are being isolated (in particular in the 40S subunit branch) and the pathway is expected to be much refined in the next few months. Largely inspired by Fatica and Tollervey [69]). Nu, nucleus; Cy, cytoplasm.
The advent of efficient copurification schemes and mass-spectrometry analysis [63, 64] led several labs to isolate distinct pre-ribosomal species (currently about 12, see Table 2). Typically, these were purified from one or several epitopetagged protein components of the rRNPs ([43, 49, 52, 55, 65–68]; reviewed in Refs. [69, 70]). These preparations have achieved a much better definition in their pre-rRNA content (which parallels our current understanding of pre-rRNA processing, see Figs. 1 and 3) and the protein composition of the particles has been established accurately. In combination with high-throughput copurification and two-hybrid schemes ([71–75] and useful WWW links), these data provide the basis to draw the first eukaryotic (pre-)ribosomal assembly maps (see Figs. 1 and 2). It transpires that ribosomal assembly is strongly asymmetric and at least biphasic ([65, 66]; reviewed in Ref. [69]). Early RRPs interact with nascent pre-rRNAs, mostly in association with the U3 snoRNP, now also referred to as the small subunit processome (‘SSU processome’; [65], see below). Following the first three pre-rRNA cleavages, at sites A0 , A1 , and A2 (see Figs. 1–3 and pre-rRNA processing section), this first set of factors essentially cycles-off the pre-ribosomes and is replaced by the large ribosomal subunit RRPs (Fig. 2). Pre-40S subunits are then left associated with very few factors, about a dozen of them, referred to as the SSU RRP complex [76, 77];
5
6 Assembly of the Eukaryotic Ribosome Table 2 TAP-tagged purified pre-ribosomes, as of Christmas 2002
pre-ribosomes
90S and U3/SSU processome U3/SSU processome Pre-60S E1 Pre-60S E2 Pre-60S M Pre-60S L Seven species of early (E), medium (M) and late (L) pre-60S
TAP targets
References
Pw2p, Rrp9p, Nop58p, YDR449c, Krr1p, Noc4p, Kre31p, Bud21p, YHR196w, YGR090w, Enp1p, YJL109c, Nop14p Mpp10p and Nop58p Ssf1p Nop7p Nug1p Nug2p/Nog2p
[66] [65] [52] [67] [43] [49]
Nsa3p, Nop7p, Sda1p, Rix1p, Arx1p, Kre35p, Nug1p
[68]
The TAP technology (Tandem Affinity Purification, 228) has been used to isolate most pre-ribosomes described to date.
Fig. 2 The ‘biphasic model’ for ribosomal assembly. The SSU RRPs (including the U3 snoRNP/SSU ‘processome’) associate with the primary Pol I transcript, generating the 90S pre-ribosomes. This first set of RRPs is replaced after the first three pre-rRNA processing reactions (A0 –A2 ) by the LSU RRPs.
3 (Pre-)ribosome Assembly, the Proteomic Era
Fig. 3 rDNA and pre-rRNA processing pathway. (A) Structure of the yeast rDNA. The 18S, 5.8S and 25S rRNAs are encoded in a single, large, Pol I transcript (35S) Mature sequences are separated by noncoding spacers, the 5 - and 3 -external transcribed spacers (ETS) and the internal transcribed spacers 1 and 2 (ITS). Processing sites (A0 to E) are indicated. 5S is transcribed independently, in the opposite direction, by Pol III as 3 -extended precursors. The production of 5S mature 3 -ends is a multi-step process that requires Rex1p. (B) Pre-rRNA processing pathway in wildtype strains. See main text for a description of our current understanding of the processing pathway. Processing at sites C2 →E is detailed in Fig. 4. Note that C2 (referred to as C2 in
Ref. [93]) was recently mapped precisely by primer extension at a position located 94 nucleotides upstream of site C1 . Previous mapping, by RNase protection, located this site slightly upstream (at position +101 with respect to C1 ). Although both sites may be used in vivo, it is more probable that this difference reflects limitations inherent to the RNase mapping strategy used. In Ref. [93], the C2 -B2 RNA is referred to as 25.5S (C2 -B2 is 26S here). Note that a cryptic processing site (A4 ) has recently been identified in ITS1 between A2 and A3 in rrp5 mutants [334]. (C) Aberrant RNA precursors commonly detected in RRP mutants. Delays in early pre-rRNA processing often results in the accumulation of the 23S, 22S or 21S RNA. These are generally not further matured.
7
8 Assembly of the Eukaryotic Ribosome
pre-60S subunits acquire several dozens of novel RRPs. As anticipated, there is a steady decrease in the number of these pre-60S-associated RRPs as the preribosomes undergo the complex 5.8S–25S pre-rRNA processing pathway and reach the NPC. 90S and 66S particles were long known to have a higher ratio of protein to RNA content than the mature 60S subunits, as judged from buoyant densities in CsCl gradients (see, e.g., Ref. [57]). This is in contrast with 43S pre-ribosomes and 40S subunits that have about the same protein content. Late nuclear pre-60S ribosomes show the reassuring presence of known transport factors, such as the well-characterized Nmd3p/Rpl10p couple (see Sect. 7). Most striking from the currently described pre-rRNPs is the conspicuous absence of most known cleavage enzymes; this could either reflect the low abundance or transient associations of these activities with the rRNP complexes. X-ray crystallographic analysis of large ribosomal subunits revealed that while many RPs are located on the exterior of the rRNA core, several RPs show idiosyncratic extensions deeply buried into the body of the subunits in a configuration that is only compatible with concomitant foldings of the RPs and the rRNAs ([78]; reviewed in Refs. [79–82]). This presumably underlies the need for close to 30 distinct remodeling activities (helicases, GTPases, and AAA-ATPases). It is remarkable that several RRPs are strikingly homologous to RPs (e.g., Imp3p/Rps9p, Rlp7p(Rix9p)/Rpl7p, Rlp24p/Rpl24p, Yh052p/Rpl1p [43, 47–49]), suggesting that they possibly ‘hold in place’ pre-rRNP structures during the assembly process, preventing premature, irreversible, folding steps to occur before being swapped for their homologous RPs. This strategy may even couple late pre-rRNA processing reactions to translation as eIF3j/Hcr1p is required for both 20S pre-rRNA processing and translation initiation and the RRP Efl1p is homologous to the ribosomal translocase EF-2 [15, 16]. Pre-rRNP particles currently described were isolated from tagged RRPs and although clearly distinct in composition, as expected from the substantial remodeling of the pre-ribosomes that take place along the pathway, represent mixed prerRNP populations. It is also important to note that it is in fact mostly preribosomal assembly rather than ribosomal assembly per se that has been addressed so far. Indeed, RPs present a particular challenge; there are usually small, highly basic and coprecipitate at high degrees with targets that are not related to ribosome synthesis. Despite these limitations, a major step has however been achieved with the isolation of particles which have a lifetime of presumably less than a minute. Much remains to be done to understand what the RRPs exactly do, how and when they interact with the pre-ribosomes and how they ‘talk’ to each other.
4 Ribosomal RNA Processing, Getting there. . .
Pol I transcription drives the synthesis of a large pre-rRNA, 35S in yeast, containing the mature sequences for the small subunit rRNA (the 18S rRNA) and
4 Ribosomal RNA Processing, Getting there. . . 9
two of the large subunit rRNAs (5.8S and 25S rRNAs). Completion of transcription requires about 5 min. Mature sequences are flanked with non-coding spacers (Fig. 3A). A Pol III precursor, 7S, is processed in 3 by the Rex1p/Rna82p exoribonuclease into 5S; the third large ribosomal subunit rRNA [83, 84]. In most eukaryotes but S. cerevisiae, 5S rDNA is located in extranucleolar loci as individual or repeated copies. In yeast, 35S and 7S are encoded on opposite strands within 150–200 repeated 9 kb rDNA arrays located on chromosome XII (Fig. 3A). Mature sequences are generated from the 35S pre-rRNA following a complex multi-step processing pathway requiring both endo- and exoribonucleolytic digestions (Fig. 3B). It is thought that most cleavage sites are known and occur within about 2 min following a precise temporal order. There is a strong bias for cleavages from the 5 - to the 3 -end of the primary transcript and the synthesis of the small and large ribosomal subunits is relatively independent. The 35S pre-rRNA is successively cleaved in the 5 external-transcribed spacer (5 -ETS) at sites A0 and A1 and in the internal-transcribed spacer 1 (ITS1) at site A2 (Fig. 3B). Endonucleolytic digestions at sites A0 and A1 produce the 33S and 32S pre-rRNAs, respectively. Precursors to the small and large subunit rRNAs (the 20S and 27SA2 pre-rRNAs, respectively) are generated by endonucleolytic cleavage of the 32S pre-rRNA at site A2 . The precise mechanism of cleavage at sites A0 –A2 is not known; however, these reactions are tightly coupled and involve the box C+D snoRNA U3/the ‘SSU processome’ (see below). The 20S pre-rRNA is then exported to the cytoplasm where endonucleolytic digestion, by an unknown RRP, at site D provides the 18S rRNA [58, 85]. A complex of late small subunit RRPs has recently been described in association with the dimethyltransferase Dim1p ([45] and see below); the endonucleolytic activity may lie in one of these. The 27SA2 pre-rRNA is cleaved at site A3 to generate the 27SA3 RNA. This cleavage is carried out by the endoribonucleolytic RNP complex RNase MRP. RNase MRP is highly reminiscent to another snoRNP, the ubiquitous RNase P that is involved in the 5 -end formation of tRNAs (reviewed in Refs. [86, 87]). The homology extends both to the structure of their respective RNA as well as to their protein composition (eight of the nine protein subunits are shared between the two enzymes). Snm1p is specific to RNAse MRP; Rpr2p is unique to RNase P [88, 89]. In the absence of cleavage at site A2 , pre-rRNA processing can proceed through the next ITS1 cleavage at site A3 . This can be seen as a ‘rescue’ pathway for such an essential activity as ribosome synthesis [86]. There are two alternative pathways of synthesis of 5.8S–25S rRNAs [90]. In the major pathway, which represents ∼80% of the total processing, the 27SA3 pre-rRNA is trimmed to site B1S (the 5 -end of the most abundant form of 5.8S, the 5.8SS rRNA) by the combined action of two 5 –3 exoribonuclases, Rat1p and Xrn1p. Rat1p is encoded by an essential gene and mostly located to the nucleus; Xrn1p is not essential and mostly localizes to the cytoplasm [91]. These two exoribonucleases often show partially overlapping functions (see, e.g., Refs. [90, 92–94]).
10 Assembly of the Eukaryotic Ribosome
27SBS is cleaved by an unknown endonuclease, roughly in the middle of ITS2, at site C2 . Cleavage at C2 provides the 7SS and 26S pre-rRNAs. Processing of the 3 -end of 5.8S and the 5 -end of 25S requires a complex succession of, mostly, exoribonucleolytic digestions. During these, consecutive substrates are literally ‘handed over’ from one ribonucleolytic activity to the next. The 7S is digested to site E by the successive action of the exosome complex [95–97], the Rex1p exoribonuclease and Ngl2p, a putative endonuclease [84, 92]. The exosome is a remarkable complex of 11 3 –5 exoribonucleolytic activities involved in the synthesis and degradation of most classes of cellular RNAs ([96, 98– 100]; reviewed in Refs. [101, 102]). A nuclear form of the exosome is specialized in the synthesis and turnover of large RNAs, including rRNAs and pre-mRNAs as well as most classes of small stable RNAs (snoRNAs, snRNAs, tRNAs, pre-mRNAs, SRP, RNase P, etc.); a cytoplasmic form is devoted to mRNA degradation. Rrp6p (E. coli RNase D), a non-essential subunit of the exosome, is specific to the nuclear form of the complex [98, 103]. Nuclear and cytoplasmic exosomes also differ by their use of specific cofactors (see, e.g., Refs. [104, 105]). The related DExH putative RNA helicase Dob1p/Mtr4p (nuclear) and Ski2p (cytoplasmic) is an example [100, 106]. 7S precursors are first trimmed from site C2 , located at position +134 with respect to the 3 -end of 5.8S, to position +30 [95] (Fig. 4). This requires all the subunits of the exosome and the nuclear cofactor Dob1p/Mtr4p. 5.8S+30 pre-rRNA is then digested to position +8 by Rrp6p. 5.8S+8, also referred to as 6S, is consequently trimmed to 5.8S+5 by the multiple exoribonuclease activities of Rex1p, Rex2p and the exosome complex (notably the Rrp40p and Rrp45p subunits) [84, 95]. 5.8S+5 is finally matured to 5.8S by Ngl2p [92] (Fig. 4). While the relationship between the subnucleolar compartments and the various ribosome synthesis steps is far from being clear, it is probable that the DFC is the site of early pre-rRNA processing, modification and assembly reactions with later processing cleavages and assembly steps occurring in the GC. SnoRNP core proteins involved in 2 –O methylation, pseudouridines formation and early prerRNA processing cleavage at sites A0 –A2 (see below) localize to the DFC [107–109]. The MRP, involved in cleavage at site A3 , is detected in the GC [110]; this is also where Rlp7p, which is required for cleavage at site C2 , has been localized [48]. Following cleavage at site A2 , the maturation of the small and large rRNAs is relatively independent. However, mutations affecting primarily the synthesis of 5.8S and 25S rRNAs frequently have negative feedback effects on early cleavages at sites A0 –A2 . The mechanism underlying these observations is not known but believed to be part of a ‘quality control’ mechanism (it would not appear very useful to further initiate the production of pre-ribosomes that will fail to mature properly), which presumably reflects the existence of functional interactions between early and late RRPs. 3 -end formation of other classes of RNAs, such as the snoRNAs and snRNAs, seem to follow a similar strategy of ‘exoribonucleolytic hand over’ [95, 111]. It is unclear, at present, whether so many distinct nucleolytic activities, with partially
4 Ribosomal RNA Processing, Getting there. . . 11
Fig. 4 Multiple steps of ribonucleolytic ‘hand-over’ are required to synthesize the 5.8S rRNA. Successive pre-rRNA species and transacting factors involved are indicated. See main text for a complete description.
overlapping specificity, are required to achieve what would appear to be a fairly straightforward processing. This presumably provides potential for further ‘rescue pathways’ and quality controls. The 26S pre-rRNA is trimmed to site C1 by Rat1p and Xrn1p. This is also probably a multi-step process. Consistently, primer extension through ITS2 from an oligonucleotide specific to the 5 -end of 25S rRNA reveals strong stops at positions +9 and +18 (respective to 25S rRNA 5 -end). The species extending to site +9 (25S ) is lost in some RRP mutants [48]. In the mature subunits, 5.8S and 25S rRNAs are base-paired but the precise timing of this association in the pre-ribosomes is not known. The major site of Pol I transcription termination (site T2 ) is located at position +210 (respective to the 3 -end of 25S rRNA). Precursors extending to this site are however not detected in wild-type cells as primary transcripts are cleaved cotranscriptionally at sites +14/+49 (B0 ) on both sides of an AAGN-closed stem-loop structure by the endonuclease Rnt1p [112–114]. Rnt1p is homologous to bacterial RNase III which similarly cleaves its substrates on both sides of extended stem-loop structures (reviewed in Ref. [115]). Final trimming to site B2 (the 3 -end of 25S) is carried out by Rex1p/Rna82p [84]. An oligonucleotide specific to sequences located downstream to B2 detects 27SA2 but not 27SB on Northern blots, demonstrating that processing at sites B1 and B2 is tightly coupled and presumably concurrent [113]. The minor pathway (used in ∼20% of the cases) produce pre-rRNAs and 5.8S rRNA that are extended in 5 by 7–8 nucleotides. This starts with cleavage of the
12 Assembly of the Eukaryotic Ribosome
27SA2 pre-rRNA at site B1L by an unknown enzyme, a presumed endoribonuclease. The resulting 27SBL is then processed into 25S and 5.8SL rRNAs following a pathway that is, as far as we know, essentially identical to the one described above for 27SBS . It is not precisely known when the 5S RNP (5S rRNA associated with RPL5, see [116]) joins pre-60S ribosomes but its recruitment is required for efficient 27SB processing and is therefore presumably concomitant with processing at site C, thus ensuring that all newly formed 60S subunits contain stoichiometric amounts of the three rRNAs [117, 118]. Alterations in the kinetics of cleavage are seen in many RRP mutants. These usually lead to the accumulation of aberrant precursors that are not faithfully processed to mature rRNAs but rather degraded, notably by the action of the exosome complex [95, 119]. The most often encountered abnormal species, the 23S (extending from sites +1 to A3 ), 22S (from sites A0 –A3 ) and 21S (A1 –A3 ) RNAs, result from alterations in the kinetics of early pre-rRNA processing reactions (Fig. 3c). Analysis of these species has allowed the description of the processing in the ITS1 and led to the identification of the cleavage site A3 [90, 120–124]. Alterations in the order of cleavage at later processing sites are now also known to occur and give rise to the accumulation of a full range of abnormal RNAs; e.g., A2 –C2 , A2 –E, etc. [52, 125]. Over the years, extensive mutagenesis experiments have been performed on rDNA to isolate sequences relevant in cis to pre-rRNA processing reactions. While it is far beyond the scope of this chapter to review this body of data (see Ref. [6]), it should be noted that these experiments have often highlighted how processing reactions distant in the primary rRNA sequence are in fact tightly linked; indeed, mutations in the 5 -ETS, ITS2, or 3 -ETS regions can each inhibit processing in ITS1 (see, e.g., Refs. [126–129]). While we now have a fairly complete picture of pre-rRNA processing, much remains to be done to understand the precise kinetics of the processing as well as the extensive connections between early and late cleavage events. Many processing enzymes also remain to be identified, in particular most endoribonucleolytic activities. It is possible that some endoribonucleases have already been assigned to the RRPs and await further attention; the absence of specific motifs in their sequence complicates their identification. The development of in vitro reconstitution assays should be most useful in this respect. It is notable that most known cleavage factors (the exosome, the exoribonucleases Rat1p and Xrn1p, the endoribonuclease Rnt1p) involved in prerRNA processing are required for the synthesis and/or degradation of other classes of cellular RNAs (mRNAs, snRNAs, snoRNAs, tRNAs, SRP, RNase P, etc.). All seem to indicate that general maturation factors are recruited from a ‘common pool’ of proteins to specific cellular pathways. This is also illustrated by the over-increasing sets of proteomic data supporting the existence of extensive integration between ribosome synthesis and other biosynthetic pathways.
5 Ribosomal RNA Modification: A Solved Issue?
5 Ribosomal RNA Modification: A Solved Issue?
Ribosomal RNAs are extensively modified with a large majority of the modifications clustering at the most functionally relevant and conserved sites of the ribosome (tRNA- and mRNA-binding sites, peptidyl transferase center, intersubunit bridges, entry of the exit tunnel, etc.). This has recently been highlighted on three-dimensional maps based on crystallographic analysis of archaeal and bacterial ribosomal subunits (see Ref. [130] and useful WWW links). The atomic resolution structure of the ribosome established it as a ribozyme; the peptidyl transferase center is surrounded by an RNA cage leaving little, if any, chance to the RPs to be involved in the peptidyl-transfer reaction per se (reviewed in Refs. [79, 80, 131]). rRNA spacers are consistently devoid of modification. The most frequent RNA modifications are 2 –O methylation of ribose moieties (Nm) and uridine isomerization (pseudouridines, ) (∼50 of each in yeast; twice this amount in humans) (Figs. 5b and d). The sites of these modifications are virtually all selected by base pairing with the snoRNAs. Less abundant are the base modifications. These are also essentially modified by methylation (mN) and rely, as far as we know, on protein-specific enzymes rather than the snoRNPs.
Fig. 5 snoRNA in pre-rRNA modification. SnoRNA/pre-rRNA hybrids at sites of 2 –O methylation (a) and pseudouridine formation (c). Sugar methylation (b) and pseudouridines (d). See main text. Adapted from Ref. [133].
13
14 Assembly of the Eukaryotic Ribosome
5.1 Ribose Methylation, Pseudouridines Formation and the snoRNAs
There are essentially two families of snoRNAs, the box C+D (involved in sugar 2 –O methylation) and the box H+ACA (required for pseudouridines formation) (Fig. 5). A third class is defined by the related RNAse P/RNAse MRP RNAs. Yeast snoRNAs range in size from about 60 to about 600 nucleotides. Box C+D snoRNAs consist of a stem-loop structure with boxes C (UGAUGA) and D (CUGA) flanking a terminal helix; duplicated boxes C and D are also observed (reviewed in Refs. [132–134]) (Fig. 5a). Box H+ACA RNAs show two consecutives hairpin structures, bridged by a conserved H-box (ANANNA, where N is any residue) or hinge motif (hence its name); the triplet ACA is always located 3 nucleotides upstream to the 3 -end of the RNA (Fig. 5c). At each site of modification, a duplex is formed by Watson–Crick base-pair interactions between a specific snoRNA and the RNA substrate. This results in the formation of a snoRNA/pre-rRNA hybrid that precisely position the residue to be modified on the substrate with respect to conserved boxes on the snoRNA. For the box C+D snoRNAs, the guide or ‘anti-sense’ elements are located upstream of boxes D or D and provide the potential to form between 10 and 21 consecutive basepairs with the pre-rRNAs; including the site of 2 –O methylation invariably located five nucleotides upstream of boxes D or D (reviewed in Refs. [135, 136]). For the box H+ACA snoR-NAs, the ‘anti-sense’ motifs are within internal bulges (also known as ‘pseudouridylation pockets’) in the hairpin stems, and target the formation of two short helices of 3–10 base-pairs with the substrate; these are interrupted by the uridine to be altered by rotation into (reviewed in Refs. [137, 138]). This uridine is usually located at about 14 residues from boxes H or ACA. SnoRNAs show a high degree of divergence outside the conserved boxes; including, obviously, the ‘anti-sense’ elements. SnoRNAs are associated with a limited set of specific core proteins. Snu13p (15.5K in humans), Nop1p (yeast Fibrillarin) and the related Nop56p and Nop58pKKD/E containing proteins are associated with all box C+D snoRNAs [53, 139– 142]. The human 15.5K is expected to nucleate box C+D snoRNP assembly through direct binding to a conserved RNA-fold (K-turn, see below) generated by interactions between boxes C and D [142]. Remarkably, Snu13p is also a component of the spliceosomal U4/U6•U5 tri snRNP [142]. Cbf5p (NAP57 in rodents, Dyskerin in humans), Gar1p, Nhp2p and Nop10p are all associated with the H+ACA snoRNAs [107, 143–145]. Concurring evidences support that Nop1p and Cbf5p are the methyltransferase and the pseudouridine synthase, respectively [144, 146–150]. The localization of the snoRNAs and their associated core proteins in the DFC of the nucleolus suggest that this is the site of rRNA modification [107–109]. Telomerase is an RNP reverse transcriptase that maintains telomere length by adding telomeric DNA repeats onto the ends of eukaryotic chromosomes [151–153]. In humans, the telomerase RNA (hTERT) has a canonical H+ACA motif at its 3 end that is bound by the core H+ACA proteins [154–157]. In yeast, the TEL RNA
5 Ribosomal RNA Modification: A Solved Issue?
is bound by the spliceosomal Sm proteins [158]; another interesting evolutionary crosstalk. Human telomerase also interacts with La and SMN (see below and Refs. [159, 160]). Strikingly, several self-immune and genetic human diseases map to key nucleolar RNA-processing factors and snoRNP proteins, such as the exosome, fibrillarin, dyskerin and the RNAse MRP [98, 161–168]. 5.2 The Emergence of the snoRNAs
Eukaryotes have about 10 times (in the range of the hundred) more modifications in their ribosomal RNAs than prokaryotes do. An observation that, a posteriori, seems to fully justify the emergence of the snoRNAs. Why would a cell evolve and produce several dozens of protein enzymes with distinct substrate specificities when it can rely on a single snoRNA-associated protein? In addition, the snoRNAbased system of RNA modification is very flexible as the guide sequences are not conserved (they have little, if any, functions in snoRNA synthesis, stability, and nucleolar targeting) and are therefore prone to rapid evolution. The accumulation of point mutations in snoRNAs generates new ‘anti-sense’ elements that, eventually, will find new RNA targets. In fact, there is a steady increase in the number of modification across evolution with bacteria and eukaryotes on both ends of the range and the archaea showing intermediate distributions. This raised the possibility that these too may rely on a ‘snoRNA-like mechanism’ to select their sites of RNA modification (discussed in Ref. [133]). An assumption that turned out to be correct as a large family of archaeal box C+D and H+ACA sRNAs (archaea lacking a clear nucleolar structure) and a full set of core sRNPs proteins has now been described ([147, 169–171]; reviewed in Refs. [172, 173]). Remarkably, in archaea, the snRNAs not only target the modification of rRNAs but also of tRNAs [174]. A model, based on the assumption that 2 –O methylation confers extra thermostability, has been proposed that correlates the distribution of archaeal rRNA modification with the temperature of their ecological niches [175]. Archaeal box C+D sRNAs, active in methylation, have been reconstituted in vitro from individually produced components [147]. In these experiments, assembly appeared to follow a strict order with the aL7a (archaeal Snu13p) binding first to the RNAs, followed by aNOP56 binding (archaeal Nop56/58p) and then finally association with aFib (archaeal Nop1p). These analysis led further support to the predictions that Snu13p may nucleate the step-wise assembly of box C+D snoRNPs and that Nop1p carries the methyltransferase activity (mutations in aFib catalytic motifs were inactive in methylation). In yeast, Nop56p was dependent on Nop1p for binding to the snoRNAs whereas Nop58p was found to bind independently [140]. Recent studies have revealed that archaea assemble symmetric sRNPs with a complete set of core proteins (L7a, the single Nop56/58p homolog and fibrillarin) at both box C+D and C +D motifs [176]. In contrast, eukaryotes snoRNPs appeared
15
16 Assembly of the Eukaryotic Ribosome
asymmetric with a distinct set of core proteins bound to each motifs; 15.5K, Nop58p, and fibrillarin were all detected at the terminal C+D motif, whereas Nop56p, fibrillarin, but no apparent 15.5K, were present at the internal C +D position [177, 178]. A rationale to this key difference in protein composition is provided by the observation that during evolution, the 15.5K seemed to have lost its ability to recognize internal C +D motif [176]. The box C +D motif is degenerated and suboptimal for tight association with the core proteins. Significantly, the recent resolution of the 3D structure of an archaeal Nop58p–fibrillarin complex bound to Sadenosyl methionine (SAdoMet), the universal methyl donor, strongly suggests that the C-terminal coiled-coil domain of Nop58p may promote its homodimerization and allow the assembly of a core complex at the suboptimal C +D motif ([179]; reviewed in Ref. [180]). In eukaryotic snoRNPs, this interaction would take place between the C-terminal tails of Nop58p and Nop56p at the C+D and C +D motifs, respectively, and possibly compensate for the absence, at this site, of the nucleation activity carried out by 15.5K. Snu13p belongs to a family of related RNA-binding proteins including several RPs of both subunits: yeast L30 (which binds to its own mRNA for autoregulation, see Refs. [181, 182]) and human L7a and S12, the box H+ACA snoRNP protein Nhp2p [107], SBP2 (which binds to the stem-loop SECIS element in the 3 -UTR of selenocysteine protein-encoding mRNAs, see Refs. [183, 184]) and eRF1 (a subunit of the translation termination release factor). These proteins have been shown, or predicted to, bind to a ubiquitous RNA structural motif, known as ‘kink-turn’ (Kturn, [185]) or ‘GA motif’ suggesting that they share a similar strategy for binding to their substrates. Interestingly, archaeal Snu13p not only binds to the box C+D sRNAs but also to the LSU (23S) rRNA contacting a K-turn and suggesting that ancestors to small stable RNAs may have evolved from rRNA segments; an assumption further supported by the identification of an archaeal box C+D sRNA within a non-coding rRNA spacer region [186]. In the widely accepted concept of the ‘prebiotic RNA world’, RNAs preexisted proteins and most essential functions were carried out by ‘RNA-based machines’. In contrast, the model proposed for the emergence of the snoRNAs is a case where a function initially performed by individual proteins has slowly been taken over by RNPs to achieve greater efficiency (see Ref. [133] for further discussions). 5.3 Non-ribosomal RNA Substrates for the snoRNAs
Although originally described in pre-rRNA modification, snoRNAs and alike (archaeal sRNAs, human scaRNAs, see below) have now been demonstrated to work on other RNA substrates, including spliceosomal U RNAs U1, U2, U4 and U5 (Pol II transcripts) and U6 (Pol III), tRNAs and possibly mRNAs [29, 174, 187, 201]. An interesting case of putative mRNA guide is a tissue-specific (brain) snoRNA, expressed from an imprinted region of the genome that is linked to the neurodegenerative genetic disease Prader-Willi syndrome [187–189]. Remarkably, this snoRNA
5 Ribosomal RNA Modification: A Solved Issue?
is expected to target a site of RNA 2 –O methylation on a serotonin receptor mRNA at a position that is also subjected to A to I editing. Orphan snoRNAs are waiting for their RNA target to be identified and many more classes of RNAs are expected to use a similar strategy for their modification. Viral RNAs are particularly interesting to consider in this respect, as these would require additional co-evolution with their hosts. 5.4 Possible Function(s) of RNA Modifications
Several structural and thermodynamic effects have been proposed for RNA modifications, including altered steric properties, different hydrogen-bonding potential and increased local base stacking (), increased structural rigidity ( and Nm) and protection from hydrolysis of inter-nucleotides bonds (Nm) [190, 191]. However, the precise function of these modifications is not known and we have failed to identify a single modification that is essential for ribosome synthesis or function, although the selective loss of the ’s surrounding the peptidyl transferase center significantly reduce translation efficiency [192]. It is therefore probable that each modification contributes a little benefit and that it is the overall modification pattern that significantly improves ribosome synthesis and/or function. It is quite remarkable that three sites of and three sites of 2 –O methylation are common to bacteria and eukaryotes; these have been selected independently twice during evolution and are made by distinct mechanisms (snoRNPs versus protein-specific enzymes; see Ref. [133]). In addition, most known modification enzymes carry additional, presumably indirect, essential functions in ribosome synthesis, notably in pre-rRNA cleavage (e.g., Nop1p, Cbf5p, Dim1p; reviewed in Ref. [193]). An attractive hypothesis certainly remains that RNA modifications are simply ‘byproducts’ reflecting the involvements of the snoRNAs in pre-rRNA processing and pre-rRNP assembly. Through extensive base pairing with the rRNA precursors, snoRNAs dictate specific pre-rRNA structures and fold them into conformations that are competent for processing and assembly. Modifications could then be seen as mere triggers to unleash the snoRNAs from the pre-ribosomes following the precise kinetics of ribosome assembly. In yeast, methylation of the rRNA occurs immediately after the completion of transcription [194, 195], implying that the snoRNAs are associated with the growing chain as it is being transcribed and potentially circumvent early unwanted folding. 5.5 Base Methylation
Several putative base methyl-transferases have been described and, as far as we know, do not involve the snoRNAs for their function [196–200]. A well-characterized example of base methylation is the 18S rRNA dimethylation carried out by Dim1p (KsgAp in E. coli). Both the site of modification (the 3 terminal SSU hairpin located at the subunit interface where interactions important
17
18 Assembly of the Eukaryotic Ribosome
for ribosome function occur) and the modification itself (a twin methylation at position 6 on two adjacent adenosine residues) are highly conserved in evolution [193, 202]. Methylation of the pre-rRNAs by Dim1p is a fairly late event in the SSU assembly pathway, possibly linked to 40S subunit export and occurring in the cytoplasm. However, Dim1p binds to the pre-rRNAs in the nucleolus and is required for early cleavages at sites A1 and A2 [203–205]. This is further evidence for the existence of ‘quality control’ mechanisms in ribosome synthesis. Processing does not occur on pre-rRNAs that have failed to bind Dim1p and will consequently not be methylated. Consistently, the Dim1p methylation is essential for ribosome function in vitro and is favorable to translation in vivo (reduced rates of frameshifting and misreading; D. Demont´e and D.L.J. Lafontaine, unpublished results). A thermosensitive conditional mutation in Dim1p is suppressed on overexpression of RPL23, a primary binding protein of the large ribosomal subunit (D. Demont´e and D.L.J. Lafontaine, unpublished results). This indicates that alteration in the kinetics of LSU assembly (the process is presumably prematurely triggered on RPL23 over-expression) overcomes the need for the ‘quality control’ exerted by Dim1p in early pre-rRNA processing and small subunit synthesis. In bacteria, the methylation is conserved but KsgAp is not essential, indicating that eukaryotic Dim1p evolved an additional function in ribosome synthesis. 5.6 U3 snoRNP, the ‘SSU Processome’, and the Central Pseudoknot
Several snoRNAs are involved in pre-rRNA cleavage rather than pre-rRNA modification. In yeast, these include the box C+D snoRNAs U3 and U14 and the box H+ACA snoRNAs snR10 and snR30. U3, U8, U14 and U22 are also involved in pre-rRNA cleavage in metazoans (reviewed in Ref. [206]). In yeast, U3, U14, snR10, and snR30 are required for the first three pre-rRNA cleavages at sites A0 –A2 ; these are either delayed (snR10) or inhibited (U3, U14, and snR30) [122, 207–209]. U14 and snR10 are also required for pre-rRNA modification. For snR10, a point mutation in the guide sequence could efficiently uncouple its requirement for pre-rRNA processing and modification [192]. Metazoans have an additional member (U8) involved in ITS2 processing [210–213]; no equivalent has thus been found in yeast. U3 is undoubtedly the best-characterized member of this class of snoRNAs both in structure, function, and synthesis (see below). U3 is larger than most box C+D snoR-NAs (333 nucleotides in yeast) and carry, in addition to the conserved core motifs, sequences (including a protruding 5 -extension, largely unfolded, and ending with a stem-loop) that are known, or presumed, binding sites for about a dozen of U3-spe-cific proteins: Mpp10p, Imp3p, Imp4p, Sof1p, Dhr1p, Lcp5p, Rrp9p/h55K, Rcl1p, and Bms1p [42, 214–221]. Recently, U3 has been isolated in association with 28 proteins [65, 222]; 10 of which were known U3-specific RRPs, another was a known RRP involved in early and late pre-rRNA processing (Rrp5p), the remaining 17 (Utp1-17p) were all nucleolar and required for 18S rRNA synthesis. This complex is now referred to as the ‘SSU processome’ and on the basis of its calculated mass (>2 200 000 kDa)
6 SnoRNA Synthesis and Intranuclear Trafficking 19
and large size (∼80S; roughly the size of a mature ribosome or the spliceosome) has been proposed to correspond to the terminal balls visualized at the 5 -ends of nascent transcripts in chromatin spreads [223, 224]; depletion of several ‘SSU processome’ components led to the disappearance of these structures [65]. The function of U3 in pre-rRNA processing is mediated through at least two Watson–Crick base-pair interactions between U3-specific motifs and the prerRNAs. An interaction between an essentially unstructured region of the 5 extension of U3 and the 5 -ETS (at site +470) is required for cleavages at sites A0 –A2 [127, 225, 226]. A second interaction between a conserved motif (box A) in the 5 -stem-loop of U3 and the pre-rRNA at the 5 -end of the mature 18S rRNA is necessary for cleavage at sites A1 and A2 [227]. The interaction between box A and the 18S rRNA 5 -end is mutually exclusive with the formation of the central pseudoknot, a conserved long-range interaction, which brings together, in the mature particles, sequences that are located more than a kb apart. The formation of the central pseudoknot is a major structural rearrangement in the SSU rRNA and as such is most probably an irreversible step in ribosomal assembly. Dhr1p, a U3-specific DEAH putative RNA helicase required for pre-rRNA processing at sites A1 and A2 , has been proposed to be involved in this RNA isomerization [215]. One possibility is that the action of Dhr1p is regulated such as to leave sufficient time for early pre-rRNP assembly to occur prior to the formation of the central pseudoknot. Growing yeast cells have about enough copies of the U3 snoRNP to support ribosome synthesis for only ∼1 min in the absence of recycling (considering a production rate of ∼2000 ribosomes/min). A function of Dhr1p in recycling the U3 snoRNP and in SSU-processome assembly is therefore also probable. This is currently under investigation.
6 SnoRNA Synthesis and Intranuclear Trafficking 6.1 SnoRNAs Synthesis
SnoRNAs have adopted a large range of strategies for their expression. Their synthesis, in the nucleoplasm, can either proceed from individual Pol II (most snoRNAs) or Pol III (U3 in plants) promoters and produce mono- (most yeast snoRNAs) or polycistronic units (many plants snoRNAs; several yeast snoRNAs) or be expressed from introns of house-keeping genes (most vertebrates snoRNAs; several yeast snoRNAs) (reviewed in Refs. [132, 134, 228]). Host genes are often somehow related to ribosomal synthesis or function and, in extreme cases, do not seem to have any additional function than to carry the snoRNAs, i.e., no proteins are expressed from the spliced mRNAs [187, 229–232]. SnoRNA maturation is complex. Processing of independently encoded or polycistronic units is initiated by endonucleolytic, possibly co-trancriptional, cleavage in the 3 -portion of the primary transcript and requires Nrd1p, the Sen1p helicase and
20 Assembly of the Eukaryotic Ribosome
the cleavage factor IA activity of the mRNA polyadenylation machinery [233–235]. SnoRNAs encoded in polycistronic units are separated by the endonucleolytic activity of Rnt1p/yeast RNase III [236–238]; precursors transcripts containing a single snoRNA may also be cleaved at their 5 -ends by Rnt1p [236]. Intron-encoded snoRNAs are usually synthesized from the excised intron lariat following splicing and debranching by Dbr1p, and exonucleolytic trimming on both ends [94, 239]. In a minor, splicing-independent pathway, the pre-mRNA is directly cleaved endonucleotically to provide entry sites for exoribonucleases. In all cases, final pre-snoRNA maturation steps require exonucleolytic digestions to the mature ends. This involves 3 to 5 exonucleolytic digestion (exosome) [95, 111] and, at least in the case of intronic or polycistronic snoRNAs, 5 to 3 exonuclease digestion (Rat1p, Xrn1p) [94, 238]. The best-characterized pre-snoRNA processing pathway is for the box C+D snoRNA U3 (Fig. 6). As for many other snoRNAs, U3 is synthesized with 3 extensions; these require endonucleolytic cleavage (Rnt1p) to provide an entry site for a processing ‘hand over’ by the exosome subunits [240]. This processing is literally ‘timed’ by the binding of yeast Lhp1p (human La) to poly(U)-rich tracks located close to the RNA 3 -ends [240]. Displacement of La is concomitant with snoRNP assembly (the core snoRNP proteins bind to the RNA, presumably conferring 3 ends protection) and allows final trimming by the exosome to produce the mature 3 -ends. The binding of La to the pre-snoRNAs presumably provides sufficient time for snoRNP assembly to occur prior to the final action of the exosome complex. U3 additionally requires the concomitant splicing of an intron. Individually expressed Pol II snoRNA precursors are produced with a 5 -terminal 7-monomethylguanosine (m7 G) cap that is retained in many snoRNAs and hypermethylated to 2,2,7-trimethylguanosine (m2,2,7 G or TMG) by Tgs1p [241]; the timing of this modification is not known. Tgs1p is also active on snRNAs. For U3, cap trimethylation is dependent on boxes C and D and is concomitant with 3 -end formation and snoRNP assembly as 3 -extended forms of U3 are not bound by the core proteins and are not precipitated by anti-TMG antibodies [240, 242–244]. In plants, U3 is transcribed by Pol III and carries a γ -monomethyl phosphate cap [245]. 6.2 Non-core snoRNP Proteins Required for snoRNA Accumulation
Besides the core components, several proteins have been linked physically or functionally to the snoRNPs but are not found in mature snoRNPs. Such proteins are the Rvb2p(p50)/p55 putative NTPases [246], the putative DEAD-box helicase Sen1p [247], the Naf1p/Shq1p complex [248–250] and Nopp140 [251]. These are required for snoRNA accumulation, through presumed transient interactions, and are potentially involved in snoRNA synthesis, snoRNP assembly, and/or nucle(ol)ar trafficking. The nucleoplasmic p50/p55 complex is required for the stability of both box C+D and box H+ACA snoRNAs as well as for proper nucleolar localization of the core proteins Nop1p and Gar1p. Mammalian orthologs have DNA unwinding activity
6 SnoRNA Synthesis and Intranuclear Trafficking 21
Fig. 6 U3 synthesis pathway. The box C+D snoRNA U3 is synthesized with 3 -extensions; these are cleaved cotranscriptionally by Rnt1p/yeast RNase III. Yeast La (Lhp1p) binds to 3 -terminal poly(U) tracks. Lhp1p-bound precursors are monomethylated and are not assembled with the core proteins. SnoRNP assembly
is concomitant with the displacement of La and the production of mature 3 -ends by the exosome; the cap is trimethylated by Tgs1p. The yeast U3 genes are unusual in that they contain an intron; this is spliced out from the 3 -extended precursors. Adapted from Ref. [240].
22 Assembly of the Eukaryotic Ribosome
in vitro and have been linked to chromatin remodeling and transcription (see Ref. [246] and references therein). Sen1p is required for snoRNA accumulation of both families as well as several other classes of RNAs (including rRNAs, tRNAs, and snRNAs) [247, 252]. Nop1p is mislocalized on Sen1p inactivation [253]. The Naf1p/Shq1p complex is specific to box C+D snoRNAs accumulation. Naf1p is mostly localized to the nucleoplasm and can be co-precipitated at low levels with several snoRNP components [248–250]. Naf1p interacts directly with the RNA in vitro, and most interestingly, is found in association with the phosphorylated form of the C-terminal domain (CTD) of Pol II. This provides a further link to RNA synthesis [249]. Nopp140 (yeast Srp40p) [254, 255], a highly phosphorylated nucleolar- and CBspecific protein, is found in association with both box C+D and box H+ACA snoRNPs [251, 256]; association with the box H+ACA is more avid. The interaction with the snoRNPs is dependent on Nopp140 phosphorylation [257]. The expression of a dominant-negative allele of Nopp140 depletes core snoRNP proteins (NAP57, Gar1p, and fibrillarin) from nucleoli and inactivates Pol I transcription [251]. Nopp140 was also coimmunoprecipitated with the largest subunit of Pol I; strengthening a link between snoRNP metabolism and transcription [258]. NAP57/Cbf5p was originally isolated as a Nop140-associated protein [259]; Nopp140 is however not required for in vitro pseudouridine formation [257]. Box H+ACA snoRNAs are lost on srp 40 deletion in a yeast synthetic lethal background [251]. 6.3 Interactions between Cleavage Factors and Core snoRNP Proteins
Interaction between Rnt1p and Gar1p is required for optimal Rnt1p activity in prerRNA processing, nucleolar localization of the core H+ACA proteins and pseudouridylation. This provides a link between snoRNP synthesis and transport and between RRPs involved in 3 -ETS co-transcriptional cleavage (Rnt1p) and 5 -ETS pre-rRNA processing (Gar1p) [260]. This possibly ensures proper pre-rRNA kinetics and coordinated cleavages on both ends of the primary transcript and prevents processing of incomplete molecules. In addition, Rnt1p accurately cleaves most of the snoRNA substrates in vitro in the absence of other cofactors, with the exception of the U18 intron-encoded snoRNA, which requires the additional presence of the box C+D snoRNP protein Nop1p; Rnt1p and Nop1p interact with each other in pull-down experiments [261]. 6.4 SnoRNAs Trafficking
The synthesis of the snoRNAs in the nucleoplasm but their function, in pre-rRNA processing and/or modification, in the DFC of the nucleolus raise interesting questions as to their localization pathway. Nucleolar targeting and localization of the snoRNAs is probably achieved by diffusion through the nucleoplasm followed by retention through multiple interactions with nucleolar components.
6 SnoRNA Synthesis and Intranuclear Trafficking 23
Fig. 7 Intra-nuclear trafficking of box C+D snoRNAs. A comparison between the yeast and vertebrate systems is provided. Box C+D snoRNA nucleolar targeting involves transit through a conserved nuclear locale, the NB and CB, in yeast and vertebrates, respectively. The cap trimethyl-transferase (Tgs1p/hTgs1) is a specific antigen of this cellular compartment. SnoRNA nucleolar
routing is a multi-step process. In mammals, PHAX the phosphorylated adaptor for snRNA export, drives the snoRNAs from their transcription sites (TS) to the CB (E. Bertrand, pers. comm.). SnoRNP assembly and captrimethylation presumably occur in the NB/CB. Np, nucleoplasm; No, nucleolus.
The cis-acting elements involved in this nucleolar targeting have been identified and, unsurprisingly, precisely map to the conserved boxes C and D and H and ACA [36, 262–265]. These are the only sequences conserved in the snoRNAs and are, known or presumed, protein-binding sites. The ACA element in the telomerase RNA is also required for its nucleolar trafficking [36, 266]. Trans-acting factors involved in this process have only started to be addressed in yeast, with most attention being paid to the box C+D snoRNAs. All core proteins are required as well as several nucleolar proteins of previously ill-defined or unknown functions such as Srp40p (Nopp140 in rodents) and Nsr1p (human nucleolin). The Ran cycle is not involved [267]. Nucleolar routing involves transit through the CB in plants and vertebrates and their recently identified homolog in yeast, the NB [264, 268–270] (Fig. 7). Overexpression of artificial box C+D snoRNAs in yeast led to their accumulation in a single, roughly spherical structure of ∼200 nm in diameter always contiguous to the fibrillar component of the nucleolus [270]. This structure was highly reminiscent to the CBs, also often found in close association with the nucleolus and functionally linked to this nuclear locale ([271–273]; reviewed in Ref. [274]). Expression of a human GFP-SMN reporter construct (a CB-specific antigen) specifically co-localized with the NB strongly supporting this assumption. In addition, the cap trimethyl-transferase Tgs1p, specifically localized to the CB and NB in yeast and humans, respectively, providing further supporting evidences [269]. Most importantly, NBs were later detected with endogenous snoRNAs, in the absence of snoRNA overexpression, supporting the physiological importance of this novel nuclear compartment [269]. Survival of motor neurons (SMN) is the causative agent for spinal muscular atrophy, a neurodegenerative disease and most frequent genetic cause of infant
24 Assembly of the Eukaryotic Ribosome
mortality ([275]; reviewed in Refs. [276, 277]). SMN is present in multiple RNP complexes and notably interacts with core snoRNP proteins of both families (Nop1p and Gar1p), the human telomerase RNP and the human cap trimethyl-transferase, hTgs1 [159, 278–280]. In the best-described complex, SMN is associated with Gemins 2–6 and is involved in snRNP metabolism and pre-mRNA splicing ([281]; reviewed in Refs. [282–284]). In a Gemin 3 (a putative DEAD-box helicase), gemin 4 and eiF2C-specific complex, SMN has also recently been linked to the metabolism of the micro RNPs (miRNPs) [285]. The accumulation of snoRNAs in NBs on RNA overexpression suggested that nucleolar targeting is a saturable, multi-step process (Fig. 7); snoRNAs would first transit from transcription sites (TS) to NB/CB before being redistributed to the entire nucleolus. Both Nsr1p and Srp40p were involved in the emergence of the NB [270]. The first step in the nucleolar routing pathway has recently been successfully uncoupled from the subsequent nucleolar distribution and imaged in time-lapsed microscopy [286]. Transcription sites and CBs were relatively static as to their locations (at least within the time frame used, ∼1 h); snoRNPs appeared to transit from TS to the vicinity of CBs within minutes but strikingly lagged for up to 60 min before being incorporated into this compartment [286]. PHAX (phosphorylated adaptor for snRNA export; [287]) is localized in the nucleoplasm and the CBs, binds specifically to box C+D 3 -extended precursors, and is able to target artificial RNA substrates from their transcription sites to CB, supporting a direct role for this protein in the first step of nucleolar routing (Fig. 7). PHAX interacts with the 15.5K (human Snu13p) in vitro and contact the snoRNPs, at least in part in an hSnu13p-dependent fashion (E. Bertand, pers. comm.). 15.5K is also present in the spliceosomal U4 snRNP (see Sect. 5.4), raising interesting questions as to the discrimination of snRNAs and snoRNAs for their trafficking. Studies on the U3 box B+C motif, which is also bound by 15.5K, indicate that specific flanking sequences and/or structure, surrounding a conserved 15.5 K-binding site, probably provide the specificity for the recruitment of additional complex-specific proteins [288]. The recent identification of box C+D and/or box H+ACA containing small RNAs localized at steady-state in the CB [201], hence their name scaRNAs (small cajal bodies specific RNAs) and active in snRNAs modifications raise additional questions as to the presence of specific cis- or trans-acting determinants in these RNAs for CB retention. 6.5 CB/NB are Conserved Sites of Small RNP Synthesis
Our current view is that NBs/CBs are conserved sites of small RNPs biogenesis; maturation steps occurring in NBs/CBs include snoRNA cap trimethylation (presence of Tgs1p), snRNA internal modification (identification of the scaRNAs) and snoRNA 3 -end formation and snoRNP assembly (occurrence of unassembled 3 -end-extended snoRNA precursors and core snoRNP proteins).
7 Ribosome Intranuclear Movements and Ribosome Export 25
7 Ribosome Intranuclear Movements and Ribosome Export
Once released from the nucleolus, pre-ribosomes transit through the nucleoplasm to reach the NPC. The precise mechanisms of ribosomes intranucle(ol)ar movements are unknown. This presumably occurs by diffusion and may involve unleashing the pre-ribosomes from successive nucle(ol)ar retention sites. Interestingly, three related couples of proteins, originally identified in a large Pol I transcription-related nucleolar complex [289], have recently been involved in this process. In these, Noc1p (Mak21p), Noc2p (Rix3p), and Noc4p share a 45amino-acid-long domain (Noc domain) [290, 291]. Noc2p organizes two distinct nucleolar complexes, Noc1p/Noc2p and Noc2p/Noc3p (a related nucleolar protein which does not show a Noc motif). The Noc complexes differ both in their intranuclear localization and association with the pre-ribosomes. Noc2p/Noc3p is mainly nucleoplasmic and interacts with 66S particles; Noc1p/Noc2p is nucleolar-enriched and associates with the 90S and 66S pre-ribosomes [290]. The Noc1p/Noc2p and Noc2p/Noc3p complexes are required for pre-60S export. The Noc1p homolog, Noc4p, is associated with Nop14p (another unrelated nucleolar protein) [292]; the Noc4p/Nop14p complex is nucleolar, associated with 90S and presumably 43S pre-ribosomes and is involved in pre-40S export [291]. The dynamic intranuclear distribution of the Noc proteins (potential to shuttle between the nucleolus and the nucleoplasm) and their association with distinct species of pre-ribosomes supports a role in intranuclear movements. A problem faced with many RRP mutants defective in ribosome export (also referred to as Rix, for ribosome export) is that they are, in addition, impaired in prerRNA processing. Typically, strains defective for pre-60S export show inhibitions in early pre-rRNA processing reactions (sites A0 –A2 ). This suggests that efficient prerRNA processing is dependent on ongoing ribosome export. Most importantly, in this respect, overexpression of the Noc domain results in a dominant-negative phenotype for growth and nuclear accumulation of the pre-ribosomes in the absence of pre-rRNA processing defects [290]. In this case, pre-rRNA processing and transport defects were efficiently uncoupled, strongly supporting a direct involvement of the Noc proteins in intranuclear movement and nuclear exit of the ribosomes. Another RRPs, the ribosomal-like protein Rlp7p, has also been recently involved in pre-60S subunits release from the nucleolus [48]. Export assays based on microinjections in Xenopus oocytes and the use of isolated Tetrahymena nuclei concluded that ribosome nuclear exit is a unidirectional, saturable (involvement of trans-acting factors, including components of the NPC), energy- and temperature-dependent process [293–297]; subunits are believed to transit to the nucleoplasm independently. In yeast, the intranucle(ol)ar accumulation of pre-ribosomes is either monitored in vivo by the use of fluorescent reporter RPs (e.g. Rps2p-eGFP, Rpl11p-GFP, and Rpl25p-eGFP) [65, 291, 298, 299] or on fixed samples by FISH (e.g., a probe specific to the 5 -portion of ITS1 has been used to follow pre-40S export) [300, 301].
26 Assembly of the Eukaryotic Ribosome
Although none of these strategies is entirely satisfactory (the RPs assay relies on proper incorporation of the reporter constructs in strains that are also potentially defective for assembly; the FISH assay largely used a xrn1 strain that accumulates high levels of cytoplasmic 20S and/or ITS1 D-A2 fragment but with a plethora of associated phenotypes in unrelated processes as diverse as mRNA turnover, microtubule function, DNA replication, telomere length, karyogamy, etc.; see discussion in Ref. [300]), they nevertheless succeeded in identifying a role in ribosome export for a subset of RPs, several nucleoporins, the Ran-system, as well as a, very large number of known or novel RRPs. A well-characterized set of Rix proteins is the Rpl10p/Nmd3p/Xpo1p complex. Rpl10p binds late to the pre-60S ribosomes and interacts with Nmd3p, a nucleo-cytoplasmic shuttling protein which serves as a transport adaptor providing a leucine-rich nuclear export signal (NES) to the exportin Xpo1p/Crm1p ([298, 302, 303]; reviewed in Refs. [304, 305]); the RRP Rsa1p was involved in facilitating the loading of Rpl10p onto pre-ribosomes [306]. The Nmd3p-mediated pathway of LSU export is conserved in metazoans [307, 308]. It is most probable that additional such NES are provided, either directly or not, by the RPs. Consistently, Yrb2p, a Ran-GTP-binding protein required for the efficient export of NEScontaining protein has recently been involved in 40S subunit export [301, 309]. In addition, a specific conditional inactivation of Mtr2p, which is required for mRNA export [310], led to the nuclear accumulation of pre-60S ribosomes and was synthetic lethal with Nmd3p [43]; the mechanism underlying these observation is not known at present. Proteomic analyses of late nuclear pre-60S complexes revealed the presence of the Rpl10p complex as well as several RRPs that were also isolated in NPC purifications [43, 311]. Since the size of the NPC is just about enough (∼20–25 nm in diameter) to accommodate that of individual ribosomal subunits (25–30 nm), it is anticipated that extensive remodeling is needed prior to, during passage through the pore, and following nuclear exit of such large RNPs. The recently identified AAA-ATPase Rix7p is a good candidate to be involved in such structural rearrangements [44]. How late pre-rRNP cleavage, modification and assembly are coupled to intranuclear movements and translocation of pre-ribosomes through the NPC is the subject of ongoing research.
8 The Cytoplasmic Phase of Ribosome Maturation
Following nuclear exit, both the small and large ribosomal subunits undergo final cytoplasmic maturation steps; these include structural rearrangements, the addition of late RPs, and possibly, late pre-rRNA processing and modification reactions. These steps underlie the long-standing observation that ribosomal subunits undergo a significant cytoplasmic lag before their incorporation into polysomes
8 The Cytoplasmic Phase of Ribosome Maturation 27
[56, 58, 85, 312]. The recent identification of the first trans-acting factors involved in these reactions led to an important novel concept in the field, several RRPs follow the pre-ribosomes to the cytoplasm and, at least for some of them, are recycled to the nucle(ol)ar pre-rRNA processing machinery [15, 68, 298, 302, 313]. Such nucleocytoplasmic shuttling was observed more than a decade ago for nucleolin/C23 and No38/B2, two important vertebrate nucleolar antigens [314]. However, the interpretation of these data was not clear at that time. The 20S pre-rRNA is exported to the cytoplasm where cleavage at site D, by an unknown RRP, generates the 18S rRNA. A conclusion largely based on cell fractionation experiments [58, 85] and indirectly supported by the following concurring evidences: (i) strains deleted for the major cytoplasmic 5 –3 exoribonucleolytic activity (Xrn1p) accumulates high levels of the D-A2 fragment in the cytoplasm [300, 315]; (ii) strains genetically depleted for Rio1p or Rio2p, two putative protein kinases, accumulate increased amounts of cytoplasmic 20S pre-rRNAs [45, 46]; and (iii) deletion of the translation initiation factor eIF3j (Hcr1p) slightly impairs 20S pre-rRNA processing [16]. Although cleavage at site D is certainly very closely linked to small subunit export, it should be kept in mind that (i) Xrn1p works cooperatively with Rat1p in multiple nuclear reactions (see above) and, consistently, Xrn1p has recently been copurified with nucle(ol)ar pre-60S subunits [68], and that (ii) the D-A2 fragment, or the 20S pre-rRNA (none of which has ever been directly detected in the cytoplasm in a wild-type strain, see Refs. [300, 301]) could be leaking through the NPC in xrn1 backgrounds or rio mutants. Furthermore, although mostly located in the cytoplasm, eIF3j is also detected within the nucleus. The formal possibility that cleavage at site D occurs shortly prior to nuclear exit or during passage through the NPC prevails. The Dim1p dimethylation was also reported to be a late, cytoplasmic event based on crude cell fractionation and fingerprint analysis [316–319]. Nucle(ol)ar pre-rRNA precursors are dimethylated when pre-rRNA processing kinetics is altered [90, 204] and dimethylation too could still formally be a late nucleoplasmic event closely linked to export in wild-type strains. Elongation factor-like 1 (Efl1p), a cytoplasmic GTPase homologous to the ribosomal translocases EF-G and EF-2, has recently been involved in nucleolar pre-rRNA processing at sites A0 –A2 [15]. It turned out that in strains deficient for Efl1p, Tif6p (a nuclear protein involved in early pre-rRNA processing [320]) is mislocalized to the cytoplasm. We proposed that the pre-ribosomes exit the nucleus in association with Tif6p and that the latter is unleashed from the particles and allowed to recycle to the nucle(ol)us following a structural rearrangement mediated by the GTPase activity of Efl1p [15] (Fig. 8). The homology between Efl1p and ribosomal translocases further suggests that Efl1p may check on the pre-ribosomes that the binding sites for the elongation factors have the proper configuration for interaction before ribosomes engage in translation. Furthermore, the nuclear exit of Tif6p has recently been shown to be dependent on phosphorylation [321]. Finally, Lsg1p/Kre35p, another cytoplasmic GTPase, may also be involved in recycling RRPs to the nucle(ol)us [322].
28 Assembly of the Eukaryotic Ribosome
Fig. 8 The cytoplasmic phase of ribosome maturation. Several RRPs follow the preribosomes to the cytoplasm during the assembly process. A case is provided here for Tif6p. A structural rearrangement in cy-
toplasmic pre-ribosomes, mediated by the GTPase activity of Efl1p, is proposed to facilitate the release of Tif6p and its recycling to the nucle(ol)ar pre-rRNA processing machinery.
9 Regulatory Mechanisms, all along
Many examples of what we currently interpret as ‘quality control’ mechanisms have been provided here (coupling between early and late cleavages, the Dim1p dimethylation, involvement of Dhr1p in pseudoknot formation, Efl1p and late LSU structural rearrangement, Rlp’s versus RPs binding, Noc’s in intranuclear movements, Rix’s in nuclear exit, etc.). In most cases, ‘quality control’ steps potentially circumvent premature, irreversible events to occur such as to drive properly the pre-rRNPs from one assembly step to the next. To put it simply, cells have evolved complex strategies to keep the ‘assembly line’ in good order. In other instances, checkpoints possibly signal upstream processing events to abort the production of what would be unfaithful and non-productive synthesis. In wild-type cells, synthesis is presumably only delayed until the proper event occurs (i.e., RRP or RP binding, a specific structural rearrangement, a cleavage, modification, or transport reaction).
10 Outlook
The next few years will undoubtedly refine the ribosomal assembly pathway. Much attention needs to be paid to the RPs; and as mass-spectrometry techniques develop, to quantitation of the various components in distinct pre-rRNP particles.
11 Conclusions 29
It is probable that several dozens of novel RRPs will be identified adding to the over increasing list of such factors and that, eventually, the endoribonucleases will uncover. Their identification may, however, await the availability of in vitro reconstitution assays. It is quite surprising, considering the amount of work put into the functional characterization of the snoRNAs, that we still barely have a clue to what they do in pre-rRNA processing and ribosome assembly. A major challenge will be to try to understand what the known RRPs are doing and, as further connections between ribosome synthesis and other biosynthetic pathways unfold, it will become essential to distinguish properly the primary versus secondary effects of these trans-acting factors. It will also become necessary to better define the relationships between the various morphological subnucle(ol)ar compartments and the biochemical reactions that occur during ribosome synthesis. Ribosome turnover has not been properly addressed yet. (Pre)-ribosomal assembly studies indicate that RRPs probably recycled but it is presently unclear whether this also applies to some mature ribosomal components. Recent observations are suggesting the existence, in higher eukaryotes, of a nuclear translation-like mechanism [323]. Are the particles involved fully matured, considering the essential cytoplasmic synthesis steps described in yeast – these steps are not known to occur in humans? What is the relationship, if any, between this currently ill-defined process and ribosome synthesis? Does this add a further level of complexity in the assembly process through a connection with pre-mRNA metabolism?
11 Conclusions
It is becoming more and more evident that ribosome synthesis is fully integrated with respect to most other essential cellular pathways. The importance of these connections is only starting to emerge and so far evidences have been provided for a link to transcription, pre-mRNA splicing, mRNA turnover, translation and telomere function (see above), as well as to the secretory pathway [324–326] and the cell cycle (see, e.g., Refs. [327–331]). This is a promising and exciting area of research for the future. Remarkably, two proteins encoded within rDNA or rDNA-like sequences (Tar1p and Ribin, respectively) have recently been identified; these are transcribed in the antisense direction with respect to 25S or 25S-like sequences [332, 333]. Yeast Tar1p is a mitochondrial protein that is capable of rescuing respiration-deficient strains. Mouse Ribin is linked to rDNA transcription; its expression is regulated by physiological changes. These are fascinating observations suggesting stringent coevolution between these short proteins (14 and 32 kDa for Tar1p and Ribin, respectively) and rDNA sequences and providing compelling evidences for a high level of integration between ribosome synthesis and other biosynthetic pathways.
30 Assembly of the Eukaryotic Ribosome
12 Useful WWW links
>http://www.expasy.org/linder/proteins.html • A comprehensive list of the yeast RRPs with a short description of their known or putative functions. >http://www.pre-ribosome.de/; http://yeast.cellzome.com/; http://genomewww.stanford.edu; http://biodata.mshri.on.ca/yeast grid/Servlet/Search Page • A list of physical and functional interactions between RRPs and between RRPs and proteins involved in unrelated biosynthetic pathways. These mostly rely on data sets from extensive co-immunoprecipitation and two-hybrid schemes. >http://www.umass.edu/molvis/pipe/ribosome/opinion/index.htm • 3D maps of rRNA modifications. >http://www.bio.umass.edu/biochem/rna-sequence/Yeast snoRNA Database/snoRNA DataBase.html • A most useful database of the yeast snoRNAs.
Acknowledgements
I am indebted to David Tollervey (Wellcome Trust Center for Cell Biology, Edinburgh) for expert training, constant support and countless advices. I thank R. Bordonn´e and E. Bertrand (CNRS, Montpellier) for allowing to quote unpublished material. The research carried out at my Lab is currently supported by Banque Nationale de Belgique, EMBO, Fonds National de la Recherche Scientifique (FNRS), Fonds pour la Recherche dans l’Industrie et l’Agriculture (FRIA), Fonds Emile Defay, Fonds Brachet, Fonds Alice and David van Buuren and Universit´e Libre de Bruxelles.
References 1 K. KOBERNA, J. MALINSKY, A. PLISS et al., J. Cell. Biol. 2002, 157, 743–748. 2 S. HUANG, J. Cell. Biol. 2002, 157, 739–741. 3 D. KRESSLER, P. LINDER, J. DE la Cruz, Mol. Cell. Biol. 1999, 19, 7897– 7912. 4 H. A. RAUE, R. J. PLANTA, Prog. Nucleic Acid Res. Mol. Biol. 1991, 41, 89–129. 5 J. VENEMA, D. TOLLERVEY, Yeast, 1995, 11, 1629–1650. 6 J. VENEMA, D. TOLLERVEY, Annu. Rev. Gen. 1999, 33, 261–311.
7 J. R. WARNER, Microbiol. Rev. 1989, 53, 256–271. 8 J. L. J. WOOLFORD, J. R. WARNER, The Ribosome and its Synthesis, in The Molecular and Cellular Biology of the Yeast Saccharomyces: Genome Dynamics, Protein Synthesis, and Energetics, vol. I., eds J. R. BROACH, J. R. PRINGLE, E. W. JONES, Cold Spring Harbor Laboratory Press, Cold Spring Harbor 1991, 587–626. 9 D. CHEN, S. HUANG, J. Cell. Biol. 2001, 153, 169–176.
References 10 R. D. PHAIR, T. MISTELI, Nature 2000, 404, 604–609. 11 T. MISTELI, J. Cell. Biol. 2001, 155, 181–185. 12 J. S. ANDERSEN, C. E. LYON, A. H. FOX et al., Curr. Biol. 2002, 12, 1–11. 13 A. SCHERL, Y. COUTE, C. DEON et al., Mol. Biol. Cell. 2002, 13, 4100–4109. 14 M. DUNDR, T. MISTELI, Mol. Cell. 2002, 9, 5–7. 15 B. SENGER, D. L. J. LAFONTAINE, J. S. GRAINDORGE et al., Mol. Cell. 2001, 8, 1363–1373. 16 L. VALASEK, J. HASEK, K. H. NIELSEN et al., J. Biol. Chem. 2001, 276, 43351–43360. 17 N. J. PROUDFOOT, A. FURGER, M. J. DYE, Cell 2002, 108, 501–512. 18 I. W. MATTAJ, L. ENGLMEIER, Annu. Rev. Biochem. 1998, 67, 265–306. 19 K. WEIS, Trends Biochem. Sci. 1998, 23, 185–189. 20 K. WEIS, Curr. Opin. Cell. Biol. 2002, 14, 328–335. 21 S. JAKEL, D. GORLICH, EMBO J. 1998, 17, 4491–4502. 22 M. P. ROUT, G. BLOBEL, J. D. AITCHISON, Cell 1997, 89, 715–725. 23 D. J. LEARY, S. HUANG, FEBS Lett. 2001, 509, 145–150. 24 R. J. PLANTA, Yeast 1997, 13, 1505–1518. 25 J. R. WARNER, Trends Biochem. Sci. 1999, 24, 437–440. 26 T. KADOWAKI, R. SCHNEITER, M. HITOMI et al., Mol. Biol. Cell. 1995, 6, 1103–1110. 27 R. SCHNEITER, T. KADOWAKI, A. M. TARTAKOFF, Mol. Biol. Cell. 1995, 6, 357–370. 28 E. BERTRAND, F. HOUSER-SCOTT, A. KENDALL et al., Genes Dev. 1998, 12, 2463–2468. 29 P. GANOT, B. E. JADY, M. L. BORTOLIN et al., Mol. Cell. Biol. 1999, 19, 6906– 6917. 30 S. A. GERBI, T. S. LANGE, Mol. Biol. Cell. 2002, 13, 3123–3137. 31 L. F. CIUFO, J. D. BROWN, Curr. Biol. 2000, 10, 1256–1264. 32 H. GROSSHANS, K. DEINERT, E. HURT et al., J. Cell. Biol. 2001, 153, 745–762. 33 M. R. JACOBSON, T. PEDERSON, Proc. Natl. Acad. Sci. USA 1998, 95, 7981–7986.
34 N. JARROUS, J. S. WOLENSKI, D. WESOLOWSKI et al., J. Cell. Biol. 1999, 146, 559–572. 35 K. T. ETHERIDGE, S. S. BANIK, B. N. ARMBRUSTER et al., J. Biol. Chem. 2002, 277, 24764–24770. 36 A. NARAYANAN, A. LUKOWIAK, B. E. JADY et al., EMBO J. 1999, 18, 5120–5130. 37 C. SRISAWAT, F. HOUSER-SCOTT, E. BERTRAND et al., RNA 2002, 8, 1348–1360. 38 T. PEDERSON, Nucleic Acids Res. 1998, 26, 3871–3876. 39 J. DE la Cruz, D. KRESSLER, P. LINDER, Trends Biochem. Sci. 1999, 24, 192–198. 40 N. K. TANNER, P. LINDER, Mol. Cell. 2001, 8, 251–262. 41 D. GELPERIN, L. HORTON, J. BECKMAN et al., RNA 2001, 7, 1268–1283. 42 T. WEGIERSKI, E. BILLY, F. NASR et al., RNA 2001, 7, 1254–1267. 43 J. BASSLER, P. GRANDI, O. GADAL et al., Mol. Cell 2001, 8, 517–529. 44 O. GADAL, D. STRAUSS, J. BRASPENNING et al., EMBO J. 2001, 20, 3695–3704. 45 E. VANROBAYS, J. P. GELUGNE, P. E. GLEIZES et al., Mol. Cell. Biol. 2003, 23, 2083–2095. 46 E. VANROBAYS, P. E. GLEIZES, C. BOUSQUET-ANTONELLI et al., EMBO J. 2001, 20, 4204–4213. 47 D. A. DUNBAR, F. DRAGON, S. J. LEE et al., Proc. Natl. Acad. Sci. USA 2000, 97, 13027–13032. 48 O. GADAL, D. STRAUSS, E. PETFALSKI et al., J. Cell. Biol. 2002, 157, 941–951. 49 C. SAVEANU, D. BIENVENU, A. NAMANE et al., EMBO J. 2001, 20, 6475–6484. 50 L. ARAVIND, E. V. KOONIN, Trends Biochem. Sci. 1999, 24, 342–344. 51 F. EISENHABER, C. WECHSELBERGER, G. KREIL, Trends Biochem. Sci. 2001, 26, 345–347. 52 A. FATICA, A. D. CRONSHAW, M. DLAKIC et al., Mol. Cell. 2002, 9, 341–351. 53 T. GAUTIER, T. BERGES, D. TOLLERVEY et al., Mol. Cell. Biol. 1997, 17, 7088–7098. 54 B. GUGLIELMI, M. WERNER, J. Biol. Chem. 2002, 277, 35712–35719. 55 K. A. WEHNER, S. J. BASERGA, Mol. Cell. 2002, 9, 329–339.
31
32 Assembly of the Eukaryotic Ribosome
56 T. KRUISWIJK, R. J. PLANTA, J. M. KROP, Biochim. Biophys. Acta 1978, 517, 378–389. 57 J. TRAPMAN, J. RETEL, R. J. PLANTA, Exp. Cell. Res. 1975, 90, 95–104. 58 S. A. UDEM, J. R. WARNER, J. Biol. Chem. 1973, 248, 1412–1416. 59 M. H. VAUGHAN, J. R. WARNER, J. E. DARNELL, J. Mol. Biol. 1967, 25, 235–251. 60 J. R. WARNER, J. Mol. Biol. 1966, 19, 383–398. 61 J. R. WARNER, A. KUMAR, S. A. UDEM et al., Biochem. Soc. Symp. 1973, 37, 3–22. 62 J. R. WARNER, A. KUMAR, S. A. UDEM et al., Biochem. J. 1972, 129, 29P–30P. 63 M. MANN, R. C. HENDRICKSON, A. PANDEY, Annu. Rev. Biochem. 2001, 70, 437–473. 64 G. RIGAUT, A. SHEVCHENKO, B. RUTZ et al., Nat. Biotechnol. 1999, 17, 1030–1032. 65 F. DRAGON, J. E. GALLAGHER, P. A. COMPAGNONE-POST et al., Nature 2002, 417, 967–970. 66 P. GRANDI, V. RYBIN, J. BASSLER et al., Mol. Cell. 2002, 10, 105–115. 67 P. HARNPICHARNCHAI, J. JAKOVLJEVIC, E. HORSEY et al., Mol. Cell. 2001, 8, 505–515. 68 T. A. NISSAN, J. BASSLER, E. PETFALSKI et al., EMBO J. 2002, 21, 5539–5547. 69 A. FATICA, D. TOLLERVEY, Curr. Opin. Cell. Biol. 2002, 14, 313–318. 70 J. R. WARNER, Cell 2001, 107, 133–136. 71 M. FROMONT-RACINE, J. C. RAIN, P. LEGRAIN, Meth. Enzymol. 2002, 350, 513–524. 72 M. FROMONT-RACINE, J. C. RAIN, P. LEGRAIN, Nat. Genet. 1997, 16, 277–282. 73 A. C. GAVIN, M. BOSCHE, R. KRAUSE et al., Nature 2002, 415, 141–147. 74 Y. HO, A. GRUHLER, A. HEILBUT et al., Nature 2002, 415, 180–183. 75 A. SHEVCHENKO, D. SCHAFT, A. ROGUEV et al., Mol. Cell. Prot. 2002, 1, 204–212. 76 T. SCHAFER, D. STRAUSS, E. PETFALSKI et al., EMBO J. 2003, 22, 1370–1380. 77 E. VANROBAYS, J. P. GELUGNE, M. CAIZERGNES-FERRER et al., RNA 2004, 10, 645–656. 78 N. BAN, P. NISSEN, J. HANSEN et al., Science 2000, 289, 905–920. 79 J. A. DOUDNA, V. L. RATH, Cell 2002, 109, 153–156.
80 D. L. J. LAFONTAINE, D. TOLLERVEY, Nat. Rev. Mol. Cell. Biol. 2001, 2, 514–520. 81 J. D. PUGLISI, S. C. BLANCHARD, R. GREEN, Nat. Struct. Biol. 2000, 7, 855–861. 82 V. RAMAKRISHNAN, P. B. MOORE, Curr. Opin. Struct. Biol. 2001, 11, 144–154. 83 P. W. PIPER, J. A. BELLATIN, A. LOCKHEART, EMBO J. 1983, 2, 353–359. 84 A. van HOOF, P. LENNERTZ, R. PARKER, EMBO J. 2000, 19, 1357–1365. 85 J. TRAPMAN, R. J. PLANTA, Biochim. Biophys. Acta. 1976, 442, 265–274. 86 J. P. MORRISSEY, D. TOLLERVEY, Trends Biochem. Sci. 1995, 20, 78–82. 87 S. XIAO, F. SCOTT, C. A. FIERKE et al., Annu. Rev. Biochem. 2002, 71, 165–189. 88 J. R. CHAMBERLAIN, Y. LEE, W. S. LANE et al., Genes Dev. 1998, 12, 1678–1690. 89 M. E. SCHMITT, D. A. CLAYTON, Genes Dev. 1994, 8, 2617–2628. 90 Y. HENRY, H. WOOD, J. P. MORRISSEY et al., EMBO J. 1994, 13, 2452–2463. 91 A. W. JOHNSON, Mol. Cell. Biol. 1997, 17, 6122–6130. 92 A. W. FABER, M. Van DIJK, H. A. RAUE et al., RNA 2002, 8, 1095–1101. 93 T. H. GEERLINGS, J. C. VOS, H. A. RAUE, RNA 2000, 6, 1698–1703. 94 E. PETFALSKI, T. DANDEKAR, Y. HENRY et al., Mol. Cell. Biol. 1998, 18, 1181– 1189. 95 C. ALLMANG, J. KUFEL, G. CHANFREAU et al., EMBO J. 1999, 18, 5399–5410. 96 P. MITCHELL, E. PETFALSKI, A. SHEVCHENKO et al., Cell 1997, 91, 457–466. 97 P. MITCHELL, E. PETFALSKI, D. TOLLERVEY, Genes Dev. 1996, 10, 502–513. 98 C. ALLMANG, E. PETFALSKI, A. PODTELEJNIKOV et al., Genes Dev. 1999, 13, 2148–2158. 99 P. HILLEREN, T. MCCARTHY, M. ROSBASH et al., Nature 2001, 413, 538–542. 100 J. S. JACOBS, A. R. ANDERSON, R. P. PARKER, EMBO J. 1998, 17, 1497–1506. 101 P. MITCHELL, D. TOLLERVEY, Nat. Struct. Biol. 2000, 7, 843–846. 102 A. van HOOF, R. PARKER, Cell 1999, 99, 347–350. 103 M. W. BRIGGS, K. T. BURKARD, J. S. BUTLER, J. Biol. Chem. 1998, 273, 13255–13263.
References 104 N. SUZUKI, E. NOGUCHI, N. NAKASHIMA et al., Genetics, 2001, 158, 613–625. 105 A. van HOOF, R. R. STAPLES, R. E. BAKER et al., Mol. Cell. Biol. 2000, 20, 8230–8243. 106 J. DE la Cruz, D. KRESSLER, D. TOLLERVEY et al., EMBO J. 1998, 17, 1128–1140. 107 A. HENRAS, Y. HENRY, C. BOUSQUET-ANTONELLI et al., EMBO J. 1998, 17, 7078–7090. 108 M. A. LISCHWE, R. L. OCHS, R. REDDY et al., J. Biol. Chem. 1985, 260, 14304–14310. 109 R. L. OCHS, M. A. LISCHWE, W. H. SPOHN et al., Biol. Cell. 1985, 54, 123–133. 110 G. REIMER, I. RASKA, U. SCHEER et al., Exp. Cell. Res. 1988, 176, 117–128. 111 A. van HOOF, P. LENNERTZ, R. PARKER, Mol. Cell. Biol. 2000, 20, 441–452. 112 G. CHANFREAU, M. BUCKLE, A. JACQUIER, Proc. Natl. Acad. Sci. USA 2000, 97, 3142–3147. 113 J. KUFEL, B. DICHTL, D. TOLLERVEY, RNA 1999, 5, 909–917. 114 H. WU, P. K. YANG, S. E. BUTCHER et al., EMBO J. 2001, 20, 7240–7249. 115 T. C. KING, R. SIRDESKMUKH, D. SCHLESSINGER, Microbiol. Rev. 1986, 50, 428–451. 116 M. DESHMUKH, Y. F. TSAY, A. G. PAULOVICH et al., Mol. Cell. Biol 1993, 13, 2835–2845. 117 A. M. DECHAMPESME, O. KOROLEVA, I. LEGER-SILVESTRE et al., J. Cell. Biol. 1999, 145, 1369–1380. 118 D. I. Van RYK, Y. LEE, R. N. NAZAR, J. Biol. Chem. 1992, 267, 16177–16181. 119 C. ALLMANG, P. MITCHELL, E. PETFALSKI et al., Nucleic Acids Res. 2000, 28, 1684–1691. 120 L. LINDAHL, R. H. ARCHER, J. M. ZENGEL, Nucleic Acids Res. 1994, 22, 5399–5407. 121 L. LINDAHL, R. H. ARCHER, J. M. ZENGEL, Nucleic Acids Res. 1992, 20, 295–301. 122 J. P. MORRISSEY, D. TOLLERVEY, Mol. Cell. Biol. 1993, 13, 2469–2477. 123 K. SHUAI, J. R. WARNER, Nucleic Acids Res. 1991, 19, 5059–5064. 124 D. TOLLERVEY, EMBO J. 1987, 6, 4169–4175. 125 J. KUFEL, C. ALLMANG, E. PETFALSKI et al., J. Biol. Chem. 2002, 278, 2147–2156.
126 C. ALLMANG, D. TOLLERVEY, J. Mol. Biol. 1998, 278, 67–78. 127 M. BELTRAME, D. TOLLERVEY, EMBO J. 1992, 11, 1531–1542. 128 R. W. van NUES, J. M. RIENTJES, S. A. MORRE et al., J. Mol. Biol. 1995, 250, 24–36. 129 R. W. van NUES, J. VENEMA, J. M. RIENTJES et al., Biochem. Cell. Biol. 1995, 73, 789–801. 130 W. A. DECATUR, M. J. FOURNIER, Trends Biochem. Sci. 2002, 27, 344–351. 131 P. B. MOORE, T. A. STEITZ, Nature 2002, 418, 229–235. 132 W. FILIPOWICZ, V. POGACIC, Curr. Opin. Cell. Biol. 2002, 14, 319–327. 133 D. L. LAFONTAINE, D. TOLLERVEY, Trends Biochem. Sci. 1998, 23, 383–388. 134 L. B. WEINSTEIN, J. A. STEITZ, Curr. Opin. Cell. Biol. 1999, 11, 378–384. 135 J. P. BACHELLERIE, J. CAVAILLE, A. HUTTENHOFER, Biochimie 2002, 84, 775–790. 136 T. KISS, EMBO J. 2001, 20, 3617–3622. 137 J. P. BACHELLERIE, J. CAVAILLE´ , L.-H. QU: Nucleotide Modifications of Eukaryotic rRNAs: the World of Small Nucleolar RNA Guides Revisited, in eds R. A. GARRETT, S. R. DOUTHWAITE, A. LILJAS et al., The Ribosome: Structure, Function, Antibiotics, and Cellular Interactions, ASM, Washington, DC 2000. 138 T. KISS, Cell 2002, 109, 145–148. 139 D. L. J. LAFONTAINE, D. TOLLERVEY, RNA 1999, 5, 455–467. 140 D. L. J. LAFONTAINE, D. TOLLERVEY, Mol. Cell. Biol. 2000, 20, 2650–2659. 141 T. SCHIMMANG, D. TOLLERVEY, H. KERN, et al., EMBO J. 1989, 8, 4015–4024. 142 N. J. WATKINS, V. SEGAULT, B. CHARPENTIER et al., Cell 2000, 103, 457–466. 143 J. P. GIRARD, H. LEHTONEN, M. CAIZERGUES-FERRER et al., EMBO J. 1992, 11, 673–682. 144 D. L. J. LAFONTAINE, C. BOUSQUET-ANTONELLI, Y. HENRY et al., Genes Dev. 1998, 12, 527–537. 145 N. J. WATKINS, A. GOTTSCHALK, G. NEUBAUER et al., RNA 1998, 4, 1549–1568. 146 C. HOANG, A. R. FERRE-D’AMARE, Cell 2001, 107, 929–939.
33
34 Assembly of the Eukaryotic Ribosome
147 A. D. OMER, S. ZIESCHE, H. EBHARDT et al., Proc. Natl. Acad. Sci. USA 2002, 99, 5289–5294. 148 D. TOLLERVEY, H. LEHTONEN, R. JANSEN et al., Cell 1993, 72, 443–457. 149 H. WANG, D. BOISVERT, K. K. KIM et al., EMBO J. 2000, 19, 317–323. 150 Y. ZEBARJADIAN, T. KING, M. J. FOURNIER, et al., Mol. Cell. Biol. 1999, 19, 7461–7472. 151 E. H. BLACKBURN, Cell 2001, 106, 661–673. 152 S. W. CHAN, E. H. BLACKBURN, Oncogene 2002, 21, 553–563. 153 M. J. MCEACHERN, A. KRAUSKOPF, E. H. BLACKBURN, Annu. Rev. Genet. 2000, 34, 331–358. 154 C. DEZ, A. HENRAS, B. FAUCON et al., Nucleic Acids Res. 2001, 29, 598–603. 155 F. DRAGON, V. POGACIC, W. FILIPOWICZ, Mol. Cell. Biol. 2000, 20, 3037–3048. 156 J. R. MITCHELL, J. CHENG, K. COLLINS, Mol. Cell. Biol. 1999, 19, 567–576. 157 V. POGACIC, F. DRAGON, W. FILIPOWICZ, Mol. Cell. Biol. 2000, 20, 9028–9040. 158 A. G. SETO, A. J. ZAUG, S. G. SOBEL et al., Nature 1999, 401, 177–180. 159 F. BACHAND, F. M. BOISVERT, J. COTE et al., Mol. Biol. Cell 2002, 13, 3192–; 3202. 160 L. P. FORD, J. W. SHAY, W. E. WRIGHT, RNA 2001, 7, 1068–1075. 161 R. BROUWER, G. J. PRUIJN, W. J. van VENROOIJ, Arth. Res. 2001, 3, 102–106. 162 N. S. HEISS, S. W. KNIGHT, T. J. VULLIAMY et al., Nat. Genet. 1998, 19, 32–38. 163 Z. LYGEROU, H. PLUK, W. J. van VENROOIJ et al., EMBO J. 1996, 15, 5936–5948. 164 J. R. MITCHELL, E. WOOD, K. COLLINS, Nature 1999, 402, 551–555. 165 M. RIDANPAA, H. VAN EENENNAAM, K. Pelin et al., Cell 2001, 104, 195–203. 166 D. RUGGERO, S. GRISENDI, F. PIAZZA et al., Science 2003, 299, 259–262. 167 V. J. TORMEY, C. C. BUNN, C. P. DENTON et al., Rheum. (Oxford) 2001, 40, 1157–1162. 168 X. ZHOU, F. K. TAN, M. XIONG et al., J. Immunol. 2001, 167, 7126–7133. 169 C. GASPIN, J. CAVAILLE, G. ERAUSO et al., J. Mol. Biol. 2000, 297, 895–906. 170 J. F. KUHN, E. J. TRAN, E. S. MAXWELL, Nucleic Acids Res. 2002, 30, 931–941.
171 A. D. OMER, T. M. LOWE, A. G. RUSSELL et al., Science 2000, 288, 517–522. 172 P. P. DENNIS, A. OMER, T. LOWE, Mol. Microbiol. 2001, 40, 509–519. 173 M. P. TERNS, R. M. TERNS, Gene Expr. 2002, 10, 17–39. 174 B.C. D’ORVAL, M. L. BORTOLIN, C. GASPIN et al., Nucleic Acids Res. 2001, 29, 4518–4529. 175 K. R. NOON, E. BRUENGER, J. A. MCCLOSKEY, J. Bacteriol. 1998, 180, 2883–2888. 176 E. J. TRAN, X. ZHANG, E. S. MAXWELL, EMBO J. 2003, 22, 3930–3940. 177 N. M. CAHILL, K. FRIEND, W. SPECKMANN et al., EMBO J. 2002, 21, 3816–3828. 178 L. B. WEINSTEIN SZEWCZAK, S. J. DEGREGORIO, S. A. STROBEL et al., Chem. Biol. 2002, 9, 1095–1107. 179 M. AITTALEB, R. RASHID, Q. CHEN, et al. 2003, 10, 256–263. 180 A. FATICA, D. TOLLERVEY, Nat. Struct. Biol. 2003, 10, 237–239. 181 J. VILARDELL, J. R. WARNER, Mol. Cell. Biol. 1997, 17, 1959–1965. 182 J. VILARDELL, S. J. YU, J. R. WARNER, Mol. Cell. 2000, 5, 761–766. 183 C. ALLMANG, P. CARBON, A. KROL, RNA 2002, 8, 1308–1318. 184 A. KROL, Biochimie, 2002, 84, 765–774. 185 D. J. KLEIN, T. M. SCHMEING, P. B. MOORE et al., EMBO J. 2001, 20, 4214–4221. 186 T. H. TANG, T. S. ROZHDESTVENSKY, B. C. D’ORVAL et al., Nucleic Acids Res. 2002, 30, 921–930. 187 J. CAVAILLE, K. BUITING, M. KIEFMANN et al., Proc. Natl. Acad. Sci. USA 2000, 97, 14311–14316. 188 J. CAVAILLE, H. SEITZ, M. PAULSEN et al., Hum. Mol. Genet. 2002, 11, 1527–1538. 189 J. CAVAILLE, P. VITALI, E. BASYUK et al., J. Biol. Chem. 2001, 276, 26374–26383. 190 B. G. LANE, J. OFENGAND, M. W. GRAY, Biochimie 1995, 77, 7–15. 191 J. OFENGAND, M. J. FOURNIER: The Pseudouridine Residues of rRNA: Number, Location, Biosynthesis, and Function. in Modification and Editing of RNA, eds H. GROSJEAN and R. BENNE, ASM Press, Washington, DC 1998, 229–253. 192 T. H. KING, B. LIU, R. R. MCCULLY et al., Mol. Cell. 2003, 11, 425–435.
References 193 D. L. J. LAFONTAINE, D. TOLLERVEY: Regulatory Aspects of rRNA Modifications and Pre-rRNA Processing, in Modification and Editing of RNA, eds H. GROSJEAN and R. BENNE, ASM Press, Washington, DC 1998, 281–288. 194 J. RETEL, R. C. VAN den Bos, R. J. PLANTA, Biochim. Biophys. Acta. 1969, 195, 370–380. 195 S. A. UDEM, J. R. WARNER, J. Mol. Biol. 1972, 65, 227–242. 196 B. HONG, J. S. BROCKENBROUGH, P. WU et al., Mol. Cell. Biol. 1997, 17, 378–388. 197 D. KRESSLER, M. ROJO, P. LINDER et al., Nucleic Acids Res. 1999, 27, 4598–4608. 198 L. PINTARD, D. KRESSLER, B. LAPEYRE, Mol. Cell. Biol. 2000, 20, 1370–1381. 199 K. SIRUM-CONNOLLY, T. L. MASON, Science 1993, 262, 1886–1889. 200 P. WU, J. S. BROCKENBROUGH, M. R. PADDY et al., Gene 1998, 220, 109–117. 201 X. DARZACQ, B. E. JADY, C. VERHEGGEN et al., EMBO J. 2002, 21, 2746–2756. 202 P. H. van KNIPPENBERG: Structural and Functional Aspects of the N6, N6 Dimethyladenosines in 16S Ribosomal RNA, in Structure, Function, and Genetics of Ribosomes, eds B. HARDESTY and G. KRAMER, Springer, Berlin 1986, 412–424. 203 D. LAFONTAINE, J. DELCOUR, A. L. GLASSER et al., J. Mol. Biol. 1994, 241, 492–497. 204 D. LAFONTAINE, J. VANDENHAUTE, D. TOLLERVEY, Genes Dev. 1995, 9, 2470–2481. 205 D. L. J. LAFONTAINE, T. PREISS, D. TOLLERVEY, Mol. Cell. Biol. 1998, 18, 2360–2370. 206 D. TOLLERVEY, T. KISS, Curr. Opin. Cell. Biol. 1997, 9, 337–342. 207 J. M. HUGHES Jr., M. ARES, EMBO J. 1991, 10, 4231–4239. 208 H. D. LI, J. ZAGORSKI, M. J. FOURNIER, Mol. Cell. Biol. 1990, 10, 1145–1152. 209 D. TOLLERVEY, C. GUTHRIE, EMBO J. 1985, 4, 3873–3878. 210 B. A. PECULIS, J. A. STEITZ, Cell 1993, 73, 1233–1245. 211 B. A. PECULIS, J. A. STEITZ, Genes Dev. 1994, 8, 2241–2255. 212 N. TOMASEVIC, B. PECULIS, J. Biol. Chem. 1999, 274, 35914–35920. 213 N. TOMASEVIC, B. A. PECULIS, Mol. Cell. Biol. 2002, 22, 4101–4112.
214 E. BILLY, T. WEGIERSKI, F. NASR et al., EMBO J. 2000, 19, 2115–2126. 215 A. COLLEY, J. BEGGS, D. TOLLERVEY et al., Mol. Cell. Biol. 2000, 20, 7238–7246. 216 D. A. DUNBAR, S. WORMSLEY, T. M. AGENTIS et al., Mol. Cell. Biol. 1997, 17, 5803–5812. 217 R. JANSEN, D. TOLLERVEY, E. C. HURT, EMBO J. 1993, 12, 2549–2558. 218 S. J. LEE, S. J. BASERGA, Mol. Cell. Biol. 1999, 19, 5441–5452. 219 H. PLUK, J. SOFFNER, R. LUHRMANN et al., Mol. Cell. Biol. 1998, 18, 488–498. 220 J. VENEMA, H. R. VOS, A. W. FABER et al., RNA 2000, 6, 1660–1671. 221 T. WIEDERKEHR, R. F. PRETOT, L. MINVIELLE-SEBASTIA, RNA 1998, 4, 1357–1372. 222 K. A. WEHNER, J. E. GALLAGHER, S. J. BASERGA, Mol. Cell. Biol. 2002, 22, 7258–7267. 223 Jr. O. L. MILLER, B. R. BEATTY, Science 1969, 164, 955–957. 224 E. B. MOUGEY, M. O’REILLY, Y. OSHEIM, et al., Genes Dev. 1993, 7, 1609–1619. 225 M. BELTRAME, Y. HENRY, D. TOLLERVEY, Nucleic Acids Res. 1994, 22, 5139–5147. 226 M. BELTRAME, D. TOLLERVEY, EMBO J. 1995, 14, 4350–4356. 227 K. SHARMA, D. TOLLERVEY, Mol. Cell. Biol. 1999, 19, 6012–6019. 228 E. S. MAXWELL, M. J. FOURNIER, Annu. Rev. Biochem. 1995, 64, 897–934. 229 M. L. BORTOLIN, T. KISS, RNA 1998, 4, 445–454. 230 P. PELCZAR, W. FILIPOWICZ, Mol. Cell. Biol. 1998, 18, 4509–4518. 231 C. M. SMITH, J. A. STEITZ, Mol. Cell. Biol. 1998, 18, 6897–6909. 232 K. T. TYCOWSKI, M. D. SHU, J. A. STEITZ, Nature 1996, 379, 464–466. 233 A. FATICA, M. MORLANDO, I. BOZZONI, EMBO J. 2000, 19, 6218–6229. 234 M. MORLANDO, P. GRECO, B. DICHTL et al., Mol. Cell. Biol. 2002, 22, 1379–1389. 235 E. J. STEINMETZ, N. K. CONRAD, D. A. BROW et al., Nature 2001, 413, 327–331. 236 G. CHANFREAU, P. LEGRAIN, A. JACQUIER, J. Mol. Biol. 1998, 284, 975–988. 237 G. CHANFREAU, G. ROTONDO, P. LEGRAIN et al., EMBO J. 1998, 17, 3726–3737. 238 L. H. QU, A. HENRAS, Y. J. LU et al., Mol. Cell. Biol. 1999, 19, 1144–1158.
35
36 Assembly of the Eukaryotic Ribosome
239 S. L. OOI, D. A. SAMARSKY, M. J. FOURNIER et al., RNA 1998, 4, 1096–1110. 240 J. KUFEL, C. ALLMANG, G. CHANFREAU et al., Mol. Cell. Biol. 2000, 20, 5415–5424. 241 J. MOUAIKEL, C. VERHEGGEN, E. BERTRAND et al., Mol. Cell. 2002, 9, 891–901. 242 W. SPECKMANN, A. NARAYANAN, R. TERNS et al., Mol. Cell. Biol. 1999, 19, 8412–8421. 243 W. A. SPECKMANN, R. M. TERNS, M. P. TERNS, Nucleic Acids Res. 2000, 28, 4467–4473. 244 M. P. TERNS, C. GRIMM, E. LUND et al., EMBO J. 1995, 14, 4860–4871. 245 S. SHIMBA, B. BUCKLEY, R. REDDY et al., J. Biol. Chem. 1992, 267, 13772–13777. 246 T. H. KING, W. A. DECATUR, E. BERTRAND et al., Mol. Cell. Biol. 2001, 21, 7731–7746. 247 D. URSIC, K. L. HIMMEL, K. A. GURLEY et al., Nucleic Acids Res. 1997, 25, 4778–4785. 248 C. DEZ, J. NOAILLAC-DEPEYRE, M. CAIZERGUES-FERRER et al., Mol. Cell. Biol. 2002, 22, 7053–7065. 249 A. FATICA, M. DLAKIC, D. TOLLERVEY, RNA 2002, 8, 1502–1514. 250 P. K. YANG, G. ROTONDO, T. PORRAS et al., J. Biol. Chem. 2002, 277, 45235–45242. 251 Y. YANG, C. ISAAC, C. WANG et al., Mol. Biol. Cell. 2000, 11, 567–577. 252 T. P. RASMUSSEN, M. R. CULBERTSON, Mol. Cell. Biol. 1998, 18, 6885–6896. 253 D. URSIC, D. J. DEMARINI, M. R. CULBERTSON, Mol. Gen. Genet. 1995, 249, 571–584. 254 U. T. MEIER, J. Biol. Chem. 1996, 271, 19376–19384. 255 U. T. MEIER, G. BLOBEL, Cell 1992, 70, 127–138. 256 C. ISAAC, Y. YANG, U. T. MEIER, J. Cell. Biol. 1998, 142, 319–329. 257 C. WANG, C. C. QUERY, U. T. MEIER, Mol. Cell. Biol. 2002, 22, 8457–8466. 258 H. K. CHEN, C. Y. PAI, J. Y. HUANG et al., Mol. Cell. Biol. 1999, 19, 8536–8546. 259 U. T. MEIER, G. BLOBEL, J. Cell. Biol. 1994, 127, 1505–1514. 260 A. TREMBLAY, B. LAMONTAGNE, M. CATALA et al., Mol. Cell. Biol. 2002, 22, 4792–4802.
261 C. GIORGI, A. FATICA, R. NAGEL et al., EMBO J. 2001, 20, 6856–6865. 262 T. S. LANGE, A. BOROVJAGIN, E. S. MAXWELL et al., EMBO J. 1998, 17, 3176–3187. 263 T. S. LANGE, M. EZROKHI, F. AMALDI et al., Mol. Biol. Cell 1999, 10, 3877–; 3890. 264 A. NARAYANAN, W. SPECKMANN, R. TERNS, et al., Mol. Biol. Cell. 1999, 10, 2131–2147. 265 D. A. SAMARSKY, M. J. FOURNIER, R. H. SINGER et al., EMBO J. 1998, 17, 3747–3757. 266 A. A. LUKOWIAK, A. NARAYANAN, Z. H. LI et al., RNA 2001, 7, 1833–1844. 267 A. NARAYANAN, J. EIFERT, K. A. MARFATIA et al., J. Cell. Sci. 2003, 116, 177–186. 268 P. J. SHAW, A. F. BEVEN, D. J. LEADER et al., J. Cell. Sci. 1998, 111, 2121–2128. 269 C. VERHEGGEN, D. L. J. LAFONTAINE, D. SAMARSKY et al., EMBO J. 2002, 21, 2736–2745. 270 C. VERHEGGEN, J. MOUAIKEL, M. THIRY et al., EMBO J. 2001, 20, 5480–5490. 271 K. BOHMANN, J. A. FERREIRA, A. I. LAMOND, J. Cell. Biol. 1995, 131, 817–; 831. 272 M. PLATANI, I. GOLDBERG, J. R. SWEDLOW et al., J. Cell. Biol. 2000, 151, 1561–1574. 273 J. SLEEMAN, C. E. LYON, M. PLATANI et al., Exp. Cell. Res. 1998, 243, 290–304. 274 J. G. GALL, Annu. Rev. Cell. Dev. Biol. 2000, 16, 273–300. 275 S. LEFEBVRE, P. BURLET, Q. LIU et al., Nat. Genet. 1997, 16, 265–269. 276 T. FRUGIER, S. NICOLE, C. CIFUENTES-DIAZ et al., Curr. Opin. Genet. Dev. 2002, 12, 294–298. 277 S. NICOLE, C. C. DIAZ, T. FRUGIER et al., Muscle Nerve 2002, 26, 4–13. 278 K. W. JONES, K. GORZYNSKI, C. M. HALES et al., J. Biol. Chem. 2001, 276, 38645–38651. 279 J. MOUAIKEL, U. NARAYANAN, C. VERHEGGEN et al., EMBO Rep. 2003, 4, 616–622. 280 L. PELLIZZONI, J. BACCON, B. CHARROUX et al., Curr. Biol. 2001, 11, 1079–1088. 281 L. PELLIZZONI, J. YONG, G. DREYFUSS, Science 2002, 298, 1775–1779. 282 G. MEISTER, C. EGGERT, U. FISCHER, Trends Cell. Biol. 2002, 12, 472–478.
References 283 S. PAUSHKIN, A. K. GUBITZ, S. MASSENET et al., Curr. Opin. Cell. Biol. 2002, 14, 305–312. 284 M. P. TERNS, R. M. TERNS, Curr. Biol. 2001, 11, R862–R864. 285 Z. MOURELATOS, J. DOSTIE, S. PAUSHKIN et al., Genes Dev. 2002, 16, 720–728. 286 S. BOULON, E. BASYUK, J. M. BLANCHARD et al., Biochimie 2002, 84, 805–813. 287 M. OHNO, A. SEGREF, A. BACHI et al., Cell 2000, 101, 187–198. 288 S. GRANNEMAN, G. J. PRUIJN, W. HORSTMAN et al., J. Biol. Chem. 2002, 277, 48490–48500. 289 S. FATH, P. MILKEREIT, A. V. PODTELEJNIKOV et al., J. Cell. Biol. 2000, 149, 575–590. 290 P. MILKEREIT, O. GADAL, A. PODTELEJNIKOV et al., Cell 2001, 105, 499–509. 291 P. MILKEREIT, D. STRAUSS, J. BASSLER et al., J. Biol. Chem. 2002, 278, 4072–4081. 292 P. C. LIU, D. J. THIELE, Mol. Biol. Cell 2001, 12, 3644–3657. 293 N. BATAILLE, T. HELSER, H. M. FRIED, J. Cell. Biol. 1990, 111, 1571–1582. 294 G. GIESE, F. WUNDERLICH, J. Biol. Chem. 1983, 258, 131–135. 295 A. KHANNA-GUPTA, V. C. WARE, Proc. Natl. Acad. Sci. USA 1989, 86, 1791–1795. 296 N. J. POKRYWKA, D. S. GOLDFARB, J. Biol. Chem. 1995, 270, 3619–3624. 297 F. WUNDERLICH, G. GIESE, H. FALK, Mol. Cell. Biol. 1983, 3, 693–698. 298 O. GADAL, D. STRAUSS, J. KESSL et al., Mol. Cell. Biol. 2001, 21, 3405–3415. 299 E. HURT, S. HANNUS, B. SCHMELZL et al., J. Cell. Biol. 1999, 144, 389–401. 300 T. I. MOY, P. A. SILVER, Genes Dev. 1999, 13, 2118–2133. 301 T. I. MOY, P. A. SILVER, J. Cell. Sci. 2002, 115, 2985–2995. 302 J. H. HO, G. KALLSTROM, A. W. JOHNSON, J. Cell. Biol. 2000, 151, 1057–1066. 303 K. STADE, C. S. FORD, C. GUTHRIE et al., Cell 1997, 90, 1041–1050. 304 J. D. AITCHISON, M. P. ROUT, J. Cell. Biol. 2000, 151, F23–6. 305 A. W. JOHNSON, E. LUND, J. DAHLBERG, Trends Biochem. Sci. 2002, 27, 580–585.
306 D. KRESSLER, M. DOERE, M. ROJO et al., Mol. Cell. Biol. 1999, 19, 8633–8645. 307 F. THOMAS, U. KUTAY, J. Cell. Sci. 2003, 116, 2409–2419. 308 C. R. TROTTA, E. LUND, L. KAHAN et al., EMBO J. 2003, 22, 2841–2851. 309 T. TAURA, G. SCHLENSTEDT, P. A. SILVER, J. Biol. Chem. 1997, 272, 31877–31884. 310 H. SANTOS-ROSA, H. MORENO, G. SIMOS et al., Mol. Cell. Biol. 1998, 18, 6826–6838. 311 M. P. ROUT, J. D. AITCHISON, A. SUPRAPTO et al., J. Cell. Biol. 2000, 148, 635–651. 312 J. R. WARNER, S. A. UDEM, J. Mol. Biol. 1972, 65, 243–257. 313 N. I. ZANCHIN, P. ROBERTS, A. DESILVA et al., Mol. Cell. Biol. 1997, 17, 5001–; 5015. 314 R. A. BORER, C. F. LEHNER, H. M. EPPENBERGER et al., Cell, 1989, 56, 379–390. 315 A. STEVENS, C. L. HSU, K. R. ISHAM et al., J. Bacteriol. 1991, 173, 7024–7028. 316 R. C. BRAND, J. KLOOTWIJK, T. J. VAN Steenbergen et al., Eur. J. Biochem. 1977, 75, 311–318. 317 J. KLOOTWIJK, R. C. VAN den Bos, R. J. PLANTA, FEBS Lett. 1972, 27, 102–106. 318 B. E. MADEN, M. SALIM, J. Mol. Biol. 1974, 88, 133–152. 319 B. E. MADEN, M. SALIM, D. F. SUMMERS, Nat. New Biol. 1972, 237, 5–9. 320 U. BASU, K. SI, J. R. WARNER et al., Mol. Cell. Biol. 2001, 21, 1453–1462. 321 U. BASU, KSI, D. H., U. MAITRA et al., Mol. Cell. Biol. 2003, 23, 6187–6199. 322 G. KALLSTROM, J. HEDGES, A. JOHNSON, Mol. Cell. Biol. 2003, 23, 4344–4355. 323 F. J. IBORRA, D. A. JACKSON, P. R. COOK, Science 2001, 293, 1139–1142. 324 K. MIYOSHI, R. TSUJII, H. YOSHIDA et al., J. Biol. Chem. 2002, 277, 18334–18339. 325 K. MIZUTA, J. R. WARNER, Mol. Cell. Biol. 1994, 14, 2493–2502. 326 A. TSUNO, K. MIYOSHI, R. TSUJII et al., Mol. Cell. Biol. 2000, 20, 2066–2074. 327 Y. C. DU, B. STILLMAN, Cell 2002, 109, 835–848. 328 W. SHOU, K. M. SAKAMOTO, J. KEENER et al., Mol. Cell. 2001, 8, 45–55. 329 W. SHOU, J. H. SEOL, A. SHEVCHENKO et al., Cell 1999, 97, 233–244.
37
38 Assembly of the Eukaryotic Ribosome
330 A. F. STRAIGHT, W. SHOU, G. J. DOWD et al., Cell 1999, 97, 245–256. 331 R. VISINTIN, E. S. HWANG, A. AMON, Nature 1999, 398, 818–823. 332 P. S. COELHO, A. C. BRYAN, A. KUMAR et al., Genes Dev. 2002, 16, 2755–2760. 333 M. KERMEKCHIEV, L. IVANOVA, Mol. Cell. Biol. 2001, 21, 8255–8263.
334 N. A. EPPENS, A. W. FABER, M. RONDAIJ et al., Nucleic Acids Res. 2002, 30, 4222–4231. 335 T. STAGE-ZIMMERMANN, U. SCHMIDT, P. A. SILVER, Mol. Biol. Cell. 2000, 11, 3777–3789.
1
Genetic Code Organization and Protein Structure Nediljko Budisa Max-Planck-Institute of Biochemistry, Martinsried, Germany
Originally published in: Engineering the Genetic Code. By Nediljko Budisa. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31243-6
1 Basic Features and Adaptive Nature of the Universal Genetic Code
The genetic code arrangement as presented in Fig. 1 is also known as the “universal code” and has at least three basic features: (i) it has a strict codon synonymy, (ii) it specifies a definite number of chemical species that serve as protein building blocks and (iii) it is redundant in the values of physicochemical properties (e.g. the extent of encoded hydrophobicity and hydrophilicity) [3]. These features can be elaborated as follows. First, since most of the amino acids are represented by more than one codon, the genetic code is said to be degenerate or even “redundant” in its structure. Coding triplets that encode the same amino acid are termed synonymous codons. Second, the chemical diversity encoded in this way includes a limited number of side-chain functional groups: carboxylic acids (Glu and Asp), amides (Gln and Asn), thiol (Cys) and thiol ester (Met), alcohols (Ser and Thr), basic amines (Lys), guanidine (Arg), aliphatic (Ala, Val, Leu and Ile) and aromatic side-chains (Phe, Tyr, His and Trp), and even the cyclic imino acid Pro. While Met (AUG) and Trp (UGG) are assigned only with single coding triplets, the rest of the 18 amino acids have at least two codons. On average, an amino acid is encoded by three different codons. As a consequence, any gene can be expressed using different codons, i.e. each gene has a specific codon usage. Third, the increased redundancy in coding of strictly polar and strictly apolar amino acids reflects highly synonymous quotas for Arg, Ser (XGX group of codons), Leu and other hydrophobic amino acids (XUX group of codons). Such an arrangement is directly related to protein folding and building as well as translational error minimization (Section 5). Despite numerous reports about variations in codon assignments, mainly in mitochondria and ciliate protozoa (Section 7), the genetic code can be regarded as conserved and universal. Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Genetic Code Organization and Protein Structure
Fig. 1 Codons (mRNA format) and related amino acids as basic building blocks for protein synthesis. Although there are 1084 possible codes containing at least one codon for each of the 20 amino acids and termination signal, the structure of the genetic code is such that it assigns 20 amino
acids to 61 codons and remaining three are stop codons: UAG (amber), UGA (opal) and UAA (ochre). (R denotes an amino carboxyl group). More details about the history of genetic code deciphering and its general features are available in standard textbooks [1, 2].
This book will follow the line of reasoning that explains patterns within the genetic code as a result of optimization processes during evolution. Such a code is adaptive, i.e. it results from adaptation that optimizes its function such that the number of errors during translation is minimized and the effects of mutations reduced. For code engineering as a research field this is indeed plausible since it would be highly disadvantageous for living cells to accumulate a lot of errors in proteins. Such reasoning can be traced back to the pioneering work of Woese [4], who was among first to suggest that patterns within the code reflect the physicochemical properties of amino acids (Section 5). However, some recent theoretical considerations indicate that the genetic code itself is not optimal. For example, Di Giulio and Medugno [5] calculated that the code has achieved 68% minimization of polarity distance (which is a normalized chemical scale derived from waterto-ethanol distribution interactions). This is not surprising since the code itself is not the result of any intelligent design, but rather the result of evolutionary
2 Metabolism and Intracellular Uptake of Canonical Amino Acids
interplay between chance and necessity. In other words, the genetic code as a product of evolutionary optimization cannot considered as optimal. One might therefore expect that a major challenge for code engineering as a new research field might be the design of target-engineered genetic codes with a much higher degree of optimization than natural ones.
2 Metabolism and Intracellular Uptake of Canonical Amino Acids
Amino acid metabolism in all living beings is characterized by a balance between the rates of protein synthesis and degradation. As a result of this balance, amino acids are found in living organisms in both their free (modified or unmodified) forms, covalently integrated into peptides and proteins or conjugated into other structures. Since life is based on protein chemistry, through its evolution on Earth, a large variety of organisms have developed idiosyncratic ways of handling amino acids. Therefore, many findings regarding (i) the mechanism of the formation of amino acids, (ii) their chemical nature, and (iii) their intracellular transport, turnover and discharge properties in Escherichia coli apply equally well to mammalian and other cell types [6]. In the context of cellular metabolism, canonical amino acids are “multitasking” i.e. they act as substrates perfectly integrated into the fine network of cellular biosynthetic, metabolic and energetic functions. Amino acids may undergo different metabolic pathways, serving as catabolic substrates for energy production by deamination that results in keto-acid residues capable of entering the glycolytic pathway at the level of pyruvate or acetyl CoA or else enter the Krebs cycle [7]. In addition, they are basic building blocks for protein synthesis, and important components in the biosynthesis of numerous biological molecules like nucleic acids, glutathione, nicotinamide sphingosins, porphyrin derivatives, various metabolic cofactors, neurotransmitters and pigments. For example, the majority of Tyr unused in proteins synthesis is catabolized for energy production or enters a pathway for conversion to catecholamines (neurotransmitters: dopamine, norepinephrine and epinephrine), while Trp serves as the precursor for the synthesis of serotonin (5-hydroxytryptamine) [8]. Most prokaryotes have functional biochemical pathways for the synthesis of all 20 amino acids. In contrast, mammalian cells generate only nonessential amino acids (Ala, Arg, Asp, Asn, Cys, Glu, Gln, Gly, Pro, Ser and Tyr), while the essential ones are provided via an exogenous supply [9]. However, even nonessential amino acids are imported from the surrounding medium when available. Even E. coli cells that have the ability to synthesize all amino acids from inorganic salts and glucose maintain the capacity to actively accumulate external amino acids [10]. There are three main pathways by which amino acids enter cells: simple or free diffusion, facilitated diffusion and active transport [11]. In many cases, simple diffusion is probably too slow for essential life processes. Like passive diffusion, facilitated transport does not require energy, but is brought about by protein molecules that bind
3
4 Genetic Code Organization and Protein Structure
to a particular amino acid or its derivative (e.g. dopamine) and facilitate its movement down a concentration gradient into the cell. Systems for active transport or cotransport deliver amino acids into the cells against steep concentration gradients. Examples include the re-uptake of Glu from the synapse back into the presynaptic neuron by sodium-driven pumps [12] or coupled amino acid/H+ cotransporters for the cellular import of neutral and charged amino acids that can exhibit low or higher affinity [13]. Metazoan Na+ -dependent and-independent carrier systems transporting amino acids comprise B0+ , LAT1, LAT2 and TAT secondary transporters and ATP-binding cassette (ABC) pumps [14]. Over the past decades, numerous proton-coupled amino acid and peptide transporters have been discovered from different plant and animal species. Compared with single-cell organisms, higher plants have evolved a larger proportion of genes involved in energydependent transport. This apparent redundancy could be explained by tissue- and development-specific requirements in these organisms. For example, larvae or embryos cannot tolerate dietary deprivation of any one of the basic (Arg, Lys and His) and neutral l-amino acids (Leu, Met, Ile, Thr and Val) [15]. Aromatic amino acids (Phe and Trp) are exceptionally critical in mammalian development since they are essential precursors for the synthesis of neurotransmitters [12]. At present, the molecular identities of essential amino acid transporters as well as mechanisms of accumulation and redistribution of essential amino acids in pools are poorly understood in metazoans. However, one general feature of these transport systems is that they exhibit no strict substrate specificity, i.e. in addition to their “own” group of amino acids, they can also carry amino acids from other groups, although with lower affinity [8]. For example, a Na+ -dependent system ASC preferentially transports Ala, Ser and Cys, but also transfers other aliphatic amino acids [8, 16].
3 Physicochemical Properties of Canonical Amino Acids
Canonical amino acids can be divided according to their physicochemical properties (e.g. positively and negatively charged and uncharged, acidic, basic, etc.), and metabolic and physiological roles. Regarding their physicochemical properties, it is indeed quite difficult to put all amino acids of the same type into an invariant group. For example, the same amino acid can behave differently in the context of different solvents and protein structure microenvironments. Nevertheless, amino acids have been arranged into several groups by using various criteria such as size and volume of their side-chains, isoelectric point, solubility in water, free energy of transfer between different phases (vapor/aqueous) and solvent systems (polar/apolar), optical properties such as refractive index, optical rotation or chromatographic migration, effects on surface tension, and their solvent accessibility in known protein structures. Some of these features are presented in Table 1. Such indexing might be useful in predicting favorable substitution patterns in proteins or in an attempt to understand the solvation effects in molecular recognition. Moreover, the property
582 7.10 48 6.06 −0.4 24.99 0.94
205 7.25 73 5.68 −0.8 5.023 −3.40
Ser 87 2.44 86 5.05 8.37 2.5 FS 1.28
210 5.67 90 6.30 −1.6 162.3 -
Cys Pro 488 6.99 67 6.11 1.8 16.7 1.81
402 6.35 105 6.00 4.2 8.85 4.04
428 9.56 124 6.01 3.8 2.426 4.92
Ala Val Leu 176 3.84 135 5.49 2.8 2.965 2.98
Phe 276 4.50 124 6.05 4.5 4.117 4.92
Ile 146 2.23 124 5.74 1.9 3.381 2.35
Met 241 5.68 93 5.60 −0.7 FS −2.57
Thr 54 1.38 163 5.89 −0.9 1.136 2.33
Trp 131 3.13 141 5.64 8.37 −1.3 0.0453 −0.14
Tyr 90 2.35 118 7.60 6.04 −3.2 4.19 −4.66
His 229 5.07 91 2.85 3.90 −3.5 0.778 −8.72
Asp 229 3.92 96 5.41 −3.5 3.53 −6.64
Asn
250 6.82 109 3.15 4.07 −3.5 0.864 −6.81
Glu
250 4.47 114 5.65 −3.5 2.5 −5.54
Gln
326 5.71 135 9.60 10.54 −3.9 66.6 −5.55
Lys
281 5.28 148 10.76 12.48 −4.5 15.0 −14.92
Arg
b
a
Determined as described in [30] and expressed as µmol g−1 of dried cells. Data taken from [31], based on an analysis of 1490 human genes. c The isoelectric point (pI) is de.ned as the pH at which a molecule carries no net electrical charge. d These values are derived from the hydropathy scale of Kyte and Doolittle [22], which assigns a hydropathy index to each amino acid, based on its relative hydrophobicity (positive value) or hydrophilicity (negative value). It is an index of solubility characteristics in water which combines hydrophobic and hydrophilic tendencies, and can be used to predict protein structure (i.e. propensities to occur in di.erent secondary structures). 1 Water-solubility is expressed as g 100 g−1 at 25◦ C (data from [32]) (FS = fully water soluble). 2 Nonpolar (chx; cyclohexane) → polar (water) distributions of amino acid side-chains expressed in free energies (kcal mol−1 ) from Radzicka and Wolfenden [20].
Abundance in E. colia Occurrence in proteins van der Waals volume (Å3 ) Isoelectric point (pI) pK a (sidechain (R)) Hydropathy index Solubility in water G(chx→water) (kJ mol−1 )
Gly
the rows define the biological, chemical and physical properties)
Table 1 Some properties of canonical amino acids (such tables can be also regarded as an index where the columns refer to a particular amino acid, and
3 Physicochemical Properties of Canonical Amino Acids 5
6 Genetic Code Organization and Protein Structure
which has appeared to be the major driving force in protein folding is hydrophobicity [17, 18]. Not surprisingly, the various methods employed over the past decades have yielded numerous hydrophobicity scales in order to quantify this property at the level of a single amino acid [19]. One class of hydrophobicity scale which is based on the measurement of the distribution coefficients between water and apolar solvents such as 1-octanol or cyclohexane revealed that this parameter is correlated with solvent accessibility in globular proteins [20]. In an ideal case a solvent that would be a good representative of (or at least mimic) the interior of a protein is desirable. Conversely, no single reference phase capable of serving as a general model for the interior of proteins has yet been found. Another class of hydrophobic scale is constructed by examining proteins with known three-dimensional structures and defined hydrophobic character (i.e. the tendency for a residue to be found inside the protein, rather than on its surface) [21]. Finally, correlations from the various types of data presented by the scales can also be modeled mathematically [22–24]. By analyzing 43 different hydrophilic/hydrophobic scales, Trinquier and Sanejounad [19] found a fairly good consensus in assigning strictly hydrophilic characters to Asp, Asn, Glu and Gln. They are seldom buried in the interior of a folded protein, and are normally found on the surface of the protein where they interact with water and other important biological molecules. Curiously, Arg and Lys are in some scales regarded to be the most hydrophilic, whereas in others only as “moderately” hydrophilic. This is not surprising since both Arg and Lys can be considered hydrophobic by virtue of the long hydrocarbon side-chain, whereas the end of the side-chain is positively charged. Lys and Arg are almost exclusively distributed on protein surfaces and have part of the side-chains buried, and only the charged portion is solvent exposed. Similarly, Trp occupies the most hydrophobic rank in some scales, whereas it is considered “moderately” hydrophilic in others (Fig. 2). Indeed, Trp and Met are relatively rare amino acids, not strictly polar or apolar, but combine both properties – as reflected in their “promiscuous” topological distribution in proteins [28, 29]. Residues such as Ala, Thr and Ser are located at the mid-scale, which is not surprising since they might be both buried as well as on the surface of the protein (such amino acids are sometimes called “ambivalent” or “amphipathic” in the literature). They are capable either of forming hydrogen bonds with other polar residues in the protein interior or with water on the surface. The strict hydrophobic character of Phe, Leu, Ile, Val and Met is plausible since the interior of most proteins is composed almost exclusively of them, stabilized by numerous van der Waals interactions. The most extreme case is Cys, which in some scales can be found at the most hydrophobic positions (probably due to its overwhelming tendency to create disulfide bonds in protein interiors), while in other scales it occupies the most hydrophilic positions. The difficulties in such positioning are understandable by taking into account the fundamentally different methods used for constructing such scales (Fig. 2). These difficulties are further enhanced thought the complex nature of hydrophobicity
3 Physicochemical Properties of Canonical Amino Acids 7
Fig. 2 Left: cartoon illustrating the scientific dispute (Fauchere and Pliska [25] versus Wolfenden and Radzicka [20]) over the question “How hydrophilic is tryptophan?” [26]. There is indeed a wide diversity of positions for many amino acids in different hydrophobic scales, especially for Arg, Trp, Cys and the imino acid Pro. (Right) Venn diagram presenting some basic physicochemical features (also shown in Table 1) and relations between 20 canonical amino acids. This is especially useful in showing the multiple properties of an amino acid, e.g. positive–negative charge, neutral–hydrophilic, aromatic–hydrophobic,
aliphatic–hydrophobic, etc. The amino acids that possess the dominant hydrophobic and hydrophilic properties are defined by their set boundaries. Subsets contain amino acids with the properties aliphatic, aromatic, charged, positive, negative and small. There are certainly areas which should define sets of properties possessed by none of the cannonical amino acids – these areas can be “occupied” by noncanonical amino acids delivered to the code by its engineering. (
The cartoon is reproduced from [26] with permission from Elsevier, copyright 1986; the Venn diagram was drawn according to [27]).
itself, which can be defined in several ways. In fact, Charton and Charton [33] wrote: . . . no single hydrophobicity parameter . . . can represent the complete range of amino acid behavior. There is no special phenomenon denoted by hydrophobicity in amino acids. It is the natural and predictable result of differences in the intermolecular forces between water and the amino acid and those between the amino acid and some other medium.
For example, the aromatic amino acids Phe, Tyr and Trp harbor a combination of two properties, i.e. hydrophobicity and polarity (ability to bind ions through cation–π interactions), that are often considered to be mutually exclusive [34]. This combination of properties makes them suitable candidates for both the interior of a protein or for interactions with cell membranes. For that reason, aromatic side-chains are often part of the rigid protein interior, where they participate in networks of three or more interacting side-chains that are supposed to serve as nucleation sites in protein folding and as a main stabilizing force of tertiary structures. On the other hand, these amino acids are crucial in mediating processes such as receptor–ligand interactions, enzyme–substrate binding and antigen–antibody recognition [29].
8 Genetic Code Organization and Protein Structure
4 Reasons for the Occurrence of Only 20 Amino Acids in the Genetic Code
Evolution is based on a changing chemical environment. These changes drive the cells to optimize their chemistry for survival purposes. Only those cells whose chemistry fits the newly emerged environmental conditions will survive such a natural selection process and transmit their genes to the following generations. From that point of view, entry in the code repertoire would be reserved only for those novel amino acids that bring superior cellular functionality (i.e. increase survival fitness). However, it is reasonable to suppose that in most cases the integration of novel amino acids into the proteome or smaller numbers of proteins takes place without any substantial advantage for host cells and might even be harmful. Such cells are usually subjected to negative selection, i.e. elimination from the life tree. According to the early proposal of Woese [4], primitive cells or proto-cells, nowadays usually described in the framework of the “last universal common ancestor” (LUCA) [35] or as “progenote” local populations [36], started with a completely random, highly ambiguous set of codon assignments with very inaccurate translation. Such early systems were capable of adopting a novel amino acid much easier due to the relatively minor deleterious effects on the proto-cell. This is indeed plausible in the context of smaller genomes and proteomes, simple metabolism, rather modest biosynthetic capacity, and their low level of integration. Once such an entry of a novel amino acid brings about a great advantage in a given chemical environment, it would be “cemented” in the code by selection processes. Such code enrichment with novel amino acid building blocks would lead to more advanced and sophisticated life forms to the point when any further repertoire expansion would not be possible without causing large harmful effects and even lethality. At that point “freezing” or conservation of the code took place [37], as already elaborated in Crick’s “frozen accident” concept [38]. According to this hypothesis, the current code repertoire reached an evolutionary dead-end, since cells were no longer capable of assimilating new amino acids in their genetic code. When life had evolved to a certain level of complexity, the proteome of the cell was further functionally diversified by other means: context-dependent preprogrammed recoding and posttranslational modifications. Therefore, the answer to the question as to why only 20 amino acids represent the standard repertoire of the universal genetic code is an evolutionary one. This is the point when the standard repertoire of the 20 amino acids was established either in the context of the LUCA [39], “progenote” local populations [36] or something else. This establishment was a historical process, a sort of “bottleneck” event to which all coding properties could be traced back. Another line of reasoning was followed by Miller and Weber [40]. They were considering the hypothetical scenario of independent life origin on another planet in which proteins were chosen as catalysts. Their main question was: which basic building blocks would then be used for their synthesis? Applying physicochemical, genetic and historical argumentation for such a scenario, Miller and Weber concluded that for an independent origin of life on another planet using protein catalysis, about
5 What Properties of Amino Acids are Best Preserved by the Genetic Code?
two-thirds of the basic building blocks would have to be of the same physicochemical nature as those on Earth. They speculated that certain classes of amino acids were excluded from the code through selection against adverse effects that they might cause on protein structure, stability and function. The explanation for the exclusion of other amino acid types that do not violate these rules is most probably historical, such as their unavailability or insufficient evolutionary time to produce them metabolically [3]. In fact, these rules and determinants are at least in part responsible for the fixation of 20 canonical amino acids in the standard repertoire of the universal genetic code.
5 What Properties of Amino Acids are Best Preserved by the Genetic Code?
In its very essence, known life is based on self-organized polymeric matter, i.e. proteins, nucleic acids and carbohydrates. Such polymeric structures in presentday organisms are the result of the selection process during evolution. The basic feature of these functional biopolymers is their nonlinear (all-or-nothing) response to external stimuli. In other words, small changes happen in response to a varying parameter until a critical point is reached – when a large change occurs over a narrow range of the varying parameter. After such a transition is completed, there is no further significant response of the system [41]. These nonlinear responses of biopolymers are caused by highly cooperative interactions between separate monomeric subunits. In the case of proteins, this is indeed plausible by taking into account their prominent feature – foldability [42]. This means that functional protein sequences fold into conformations that possess a pronounced energy minimum separated by a large energy gap from the bulk of structurally unrelated misfolded conformations [43]. Sequences that satisfy these requirements are able to fold fast and to exhibit a cooperative folding transition [44]. They are stable against mutations as well as against variations in solvent conditions and temperature. Indeed, characteristic folds and stable conformations in proteins are largely preserved across the various phylogenetic trees despite their great sequence variations [18]. A simple system that operates on a few basic principles should be devised to build up such structures. The simplest rule to be obeyed for protein building is the general “apolar in–polar out” principle, allowing them to fold into tight particles that have an internal hydrophobic core shielded from the surrounding solvent [45]. This binary partitioning of polar and nonpolar amino acids is indeed an intrinsic part of the genetic code (Fig.3). In its structure, these relations are visibly regular: all codons with a central U are cognate to amino acids with chemically relatively uniform, hydrophobic side-chains (“convergent types”). On the other hand, coding triplets with a central A are cognate to amino acids with chemically variable, polar sidechains (“divergent types”) [46]. Not only is such a chemical order apparent by careful inspection, it can also be quantitatively measured in a simple chromatographic
9
10 Genetic Code Organization and Protein Structure
experiment. Woese and coworkers [4] measured the chromatographic mobility of amino acids on paper in various solvents and interpreted the observed mobility as the number of water molecules required for solvation. On this basis, the “polar requirement” index was developed. This index reflects remarkably well an order in the coding table, clearly indicating that the genetic code assigns amino acids to related triplets on the basis of polarity/hydrophobicity. Later studies by Wolfenden
5 What Properties of Amino Acids are Best Preserved by the Genetic Code?
and coworkers [47] confirmed that the relative distribution of amino acids between the surface and interior of a native globular proteins is indeed shaped with a sharp bias imposed by the genetic code. As a consequence of its degeneracy, there is an increased redundancy in the code structure, which allows for the existence of larger synonymous quotas for some amino acids. Interestingly, they are either strictly polar or strictly apolar. For example, 15 codons are allocated to apolar amino acids Ile, Val and Leu, which shows the extent of encoded hydrophobicity. On the other hand, the number of codons allocated to strictly polar amino acids is even higher (Fig. ??), which means that hydrophilicity is equally well represented in the genetic code structure. Obviously, such allocation of codons in the genetic code has consequently rather forced the distinction of hydrophobic versus hydrophilic residues in the protein structures. Protein interiors are dominantly populated with hydrophobic residues (XUX group of codons), which are rarely found at the surfaces of proteins. Conversely, hydrophilic residues (mainly XAX and AGX groups of coding triplets) are seldom in the interior since their spontaneous appearance would be seriously disruptive to the overall stability of a protein. Such distributions are associated with large differences between free energies of solvation between polar/apolar residues [47]. ←-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Fig. 3 Genetic code organization, amino acid properties and principles for protein structure building. (Upper panel) The hydrophobic scale of Fouchere and Pliska [25] with canonical amino acids and the second letter of their codons. Most of the other hydrophobic scales (reviewed by Trinquier and Sanejounad [19]) would reveal principally the same relationships. The obvious link between the physicochemical properties of amino acids and the second letter of the corresponding codons suggests that the organization of the genetic code (middle panel) reflects these properties (U is cognate for apolar, while A/G is mostly for polar side-chains). In general, the third position of codons is the most variable; the first position is the next most variable, while changes in the second position always result in amino acid replacements. Indeed, the concept of a “code within the codon” predicts that the first and second bases indicate the biosynthetic relationship as well as amino acid polarity [48]. In addition, there is increased redundancy in the coding of strictly polar (blue) and strictly apolar (green) amino acids, i.e. highly synonymous quotas for Arg, Ser (XGX group of codons), Leu and other hydrophobic amino
acids (XUX group of codons). On the other hand, amino acids such as Met (AUG) or Trp (UGG) are assigned with only single coding triplets. These amino acids together with Cys, Thr, Tyr, Ala and the imino acid Pro are termed ambivalent or amphipathic. (Lower part) The genetic code structure also reflects the relative distributions of amino acids residues between the surfaces (hydrophilic, polar or external residues) and the interiors (internal, apolar or hydrophobic residues) of native globular proteins [45]. Represented in this way, the genetic code reflects binary partitioning of polar and nonpolar amino acids, which ensures basic protein architecture pattern maintenance, by following the general “apolar in–polar out” principle [37]. The existence of higher synonymous quotas for some amino acids such as Leu, Arg and Ser (each of them has six assigned codons) is also a function to enhance such binary partitioning of polar and nonpolar residues in protein structures as well as to ensure translation error minimization [49]. The observation that the genetic code is optimal with respect to single base mutations was also confirmed by Freeland and Hurst [50].
11
12 Genetic Code Organization and Protein Structure
This general globular protein organization (polar out–apolar in) indicates first additions to the code during its evolution that were either strictly apolar or strictly polar. This rigid mechanism for maintaining protein structure can also be recognized in the frame of an “operational RNA code” [51] that is believed to be a relic from primitive proto-cells [52]. This means that the discriminatory base N73 in the acceptor stem of tRNA reflects well such binary relationships: A73 is the acceptor for hydrophobic amino acids, while those with G73 are acceptors with hydrophilic amino acids. Therefore, primitive cells with relatively simple organization started code expansion first through additions that were strictly apolar (in the core or more probably as the integral transmembrane part) and strictly polar (at the surface) [37]. Later additions were more “promiscuous”, of which Trp and Met are excellent examples – both are almost equally distributed in the protein core, at surfaces and minicores, and have increased chemical variability [53]. They are also characterized by smaller synonymous quotas and, correspondingly, a greater probability that they may be substituted, but without significant disturbance of the protein structure. In the evolution of proteins, amino acid substitutions are found to occur more frequently between similar amino acids than between dissimilar ones [38]. Such substitutions obey a simple “similar replaces similar” rule, because disruption of the resulting protein structure is minimized by such modest alterations in the amino acid side-chain. Sonneborn [49] conceptualized these observations about coding similarities between amino acids with similar physical properties in the form of a “translation error minimization” theory about the nature of the genetic code. It postulates that the main function of such an arrangement of the genetic code is to maintain the structural stability of globular proteins. From this point of view, the structure of the genetic code is a direct product of natural selection for a system that minimizes the phenotypic impact of the genetic error [54].
6 Evolutionary Legacy: Dual Nature of Conserved Code and Finite Number of Protein Folds
The subject of this article is not the origin of the genetic code or the “chicken-oregg” situation over the primacy of appearance of nucleic acid and proteins in evolution, but rather general principles that can be deduced from protein–code relationships. The present genetic code was shaped in an optimal (or near optimal) functional state by certain selective pressures that resulted in structurally and functionally meaningful proteins. The interplay of such selective pressures in evolution is the main driving force for convergent processes (e.g. preservation of structural types, such as protein folds or body plans in nature) as well as divergence that increase variety in functionality or performance. This structure can be seen as an equilibrium between two opposing extremes, i.e. invariance and mutability. Such a dual nature of the genetic code consequently has the possibilities that genes can simultaneously be fragile (mutable) and robust (invariant) to mutations.
7 Natural Variations in Assignment of Codons of the Universal Genetic Code
This concept was introduced for the first time by Maeshiro and Kimura [55], and has recently been elaborated in great detail in an excellent study by Judson and Hydon [56]. While mutability promotes diversity in the functionality of proteins,in-variance is the barrier against which such mutations would be restricted, especially those that would compromise the active structure of proteins. The single coding triplets of the rare amino acids Met (AUG) and Trp (UGG) are fragile, and any point mutation or translation error leads to nonsynonymous (amino acid altering) substitution in gene sequences like AUG (Met) and AUA (Ile). Mutability is not only the main source of protein sequence diversification, but also of evolution of living beings in general. It is a basic prerequisite for the development of more complex life forms and provides enough flexibility to allow adaptation to a changing environment. Conversely, invariance is responsible for preventing amino acid substitutions in proteins (synonymous or silent codon substitutions), making them robust to mutations [e.g. AUU (Ile) and AUA (Ile)]. In other words, invariance is evolutionarily neutral, because many synonymous codon changes and conservative amino acid changes occur without deleterious effects. This property is the main source of the characteristic protein folding and structure conservation in all living beings. It has been argued that nonsynonymous versus synonymous substitution ratios might be used as a measure of the selective pressure at the level of the protein [57]. Conservatism in protein structure is related to structural stability and folding kinetics rather that function. Clearly, function does not require a particular sequence, but rather particular amino acids at defined sites, while there are other possibilities of variation at different sites. A comparison of protein homologs (i.e. proteins that are functionally and evolutionary related) with protein analogs (those that are structurally similar proteins, but evolutionary unrelated) shows that such analogous proteins in most cases share folding, but not function [58]. On a more general level, such balanced genetic code structure allows for convergence in the structural types, since protein folds found in nature represent a finite set of buildin forms. Other levels of complexity of the living beings obey these principles as well, e.g. concept of the finite number of body plans in nature as proposed more than two centuries ago by Goethe and Cuvier [59]. In fact, all the properties of living beings rest on the fundamental mechanism of molecular invariance. Extant cells are characterized by their invariant basic chemical organization. Therefore, evolution is not a property of living beings; it stems from imperfections in the conservation of the physiological/molecular mechanisms. Such an “imperfection” is indeed a unique privilege of living beings [60].
7 Natural Variations in Assignment of Codons of the Universal Genetic Code
Deciphering of the triplet genetic code by the end of the 1960s revealed a nearly identical assignment of amino acids to coding triplets in “standard” organisms (e.g. E. coli, Bacillus subtilis, Saccharomyces cerevisiae, Arabidopsis, Drosophila, etc.),
13
14 Genetic Code Organization and Protein Structure
indicating its universality as well as the possibility of a common origin for all life forms. A decade later, the sequencing of some genes from the mitochondrial genome and their comparison with nuclear counterparts revealed that there are deviations in codon assignments between nuclear and mitochondrial genes [61]. Changes in codon meaning can be classified into two general categories. The first category includes reassignment of standard termination codons (UGA, AAA and UAG) to Trp or Gln in some prokaryotes, Archaea, mitochondria of many organisms and even nuclear genes in some protozoa (e.g. ciliated protozoan Tetrahymena and Paramecium). The second category of variant codon assignments includes altered meanings in mitochondrial sense codons such as Met → Ile, Lys → Asn, Arg → Ser and Leu → Thr [62], while the only documented case of sense codon reassignment among nuclear genes is (CUG) Leu → Ser in Candida cylindrica [63]. It is believed that more than 10 million species belonging to Archaea, prokaryotic and eukaryotic kingdoms inhabit the Earth at present [64]. With the sequencing of whole genomes or at least part of them, interesting strategies for alternative assignments in the universal code table as well as various undiscovered recoding “tricks” in translation are expected to come to light (see Fig. 4). Various aspects and detailed accounts of alternative codon assignments can be found in the contemporary literature [62], to which the interested reader is directed for references and details which do not appear here. Extensive studies on this subject were also used as “recent evidence for the evolution of the genetic code” [65], which is hardly reconcilable with the previous considerations of code with an invariant structure preserved in all life taxa. Such reassignments might be seen as flexibility in the “frozen” structure of the code, since they do not affect the amino acid repertoire (i.e. resulting only in canonical amino acid “shifting” between different codons). The universality of the genetic code can only be questioned seriously in the case of the existence of such a species that regularly, in a codondependent manner (in the absence of any special context, i.e. without recoding “tricks”), builds proteins with amino acids such as homoserine, homocysteine, αaminobutyric acid, norleucine, phosphoserine, norvaline, etc. There are indeed no such examples in the known life kingdoms [37]. There are no codon reassignments that would introduce new amino acids, but only codon reassignments in different organisms resulting in the changing of one canonical amino acid to another, but always in the frame of the same, standard amino acid repertoire. In addition, the appearance of such codon reassignments is more species specific than universal and most of them are not disruptive at the level of the protein structure. In the case of C. cylindrica, the (CUG)Leu → Ser change should have a negative impact on the organism’s fitness [63]. This reassignment is “reserved” for only for a tiny portion of genes involved in the stress response and pathogenicity; the CUG triplet does not appear in most of the cellular mRNA. Although co-translational incorporation of amino acids such as fMet, Sec and Pyl is a result of local changes in codon meaning or preprogrammed context-dependent re-coding phenomena, in contemporary literature they are often incorrectly assigned as natural variations in the code structure [64].
7 Natural Variations in Assignment of Codons of the Universal Genetic Code
Fig. 4 The existence and cohabitation of mitochondrial and nuclear genetic codes in mammalian cells was discovered during human mitochondrial genome sequencing. Most proteins in animal mitochondria, including some translational components, are imported from the cytosol (i.e. encoded by nuclear genes), whereas there are only 10–20 “homemade” proteins [2]. The existence of these two arrangements results in the following differences in translation of mitochondrial relative to nuclear genes: (i) UGA is not a termination signal, but codes for Trp (therefore the anticodon
of mitochondrial tRNATrp recognizes both UGG and UGA, as if obeying Crick’s “wobble rules”), (ii) internal Met is encoded by both AUG and AUA, while the initiator Met is cognate to AUG, AUA, AUU and AUC, and (iii) AGA and AGG are not assigned to Arg, but specify chain termination [1]. Thus, there are four termination and four Argcoding triplets in the mitochondrial genetic code. Note that the amino acid repertoire is the same in both code variations. Plant mitochondria use an arrangement of the universal code [65].
15
16 Genetic Code Organization and Protein Structure
7.1 Nucleoside Modifications and Codon Reassignments
The study of nucleoside modification in the context of a natural departure from the universal code, in addition to serving as an excellent model for codon reassignment theories (Section 8.2), might also be a gold mine of useful information that can be applied in code engineering. It is well known that nucleoside modifications as a universal feature of tRNAs are not only necessary for fine tuning the tRNA structure, but also for the accuracy, strength and efficiency of the codon-anticodon interactions. Therefore, ribosomal decoding might be influenced not only by tRNA structure and abundance, but also by modified nucleosides [66]. For example, AUA → Met reassignment in eubacteria might be achieved by control of anticodon base modification as follows. Bacteria use tRNA with the *CAU anticodon [*C stands for (2-lysyl)cytidine] for translating AUA codon as cognate for Ile. By inhibition of the cytidine modification (i.e. with the CAU anticodon) tRNA becomesIle charged with Met instead with Ile [67, 68]. This means that, by elimination of the C-modifying enzyme from bacteria, a strain having a novel codon reassignment (equal to those that already exists in mitochondria) can be designed. Another example includes the presence of pseuduridine ψ in the anticodon of tRNAAsn of echinoderm mitochondria; tRNAAsn with the GψU anticodon recognizes the AAA (Lys) coding triplet. These findings highlight the fact decoding flexibility (wobble rules) exists in the third codon position (first anticodon position); this feature might also be responsible for ambiguous decoding where two or more different tRNAs compete for the same coding triplet (e.g. “polysemous codon” [69]). Finally, modifications of bases that are direct neighbors of “wobble” nucleotides (e.g. N37) might have an influence on decoding since they might affect the interaction between the first anticodon and third codon base [70]. More details about the role of modified nucleosides in codon-anticodon interactions are available in [71].
8 Codon Reassignment Concepts Possibly Relevant to Code Engineering 8.1 Genome Size, Composition, Complexity and Codon Reassignments
The amino acid composition of proteins is correlated with the characteristic GC content of the genomic DNA of an organism, which is related to its phylogeny. Indeed, the neutral theory of protein evolution [72] expects that a significant proportion of the amino acids present in a protein should change by random genetic drift and therefore be influenced by the number of codons assigned to the different amino acids. The differential level of GC content in the different components of the genome in a given organism can be understood as the consequence of selective constraints that have been exerted to eliminate functionally deleterious mutants [73]. Correspondingly, each organism has it own set of biases for use of 61 codons for
8 Codon Reassignment Concepts Possibly Relevant to Code Engineering
the amino acid and tRNA population, which is in direct correlation with the codon composition of total mRNA. The frequency of appearance of particular synonymous codons (i.e. codon bias or codon usage which is correlated to the abundance of particular tRNA species in the cytosol) can differ significantly between genes, cell compartments, cells and organisms. Such a bias is also a measure of tRNA isoacceptor distributions for synonymous codons in particular genes and species [65]. Potential errors in tRNA recognition by AARS are avoided by positive and negative identity elements, whose repertoire obviously has limits. This is evident by the fact that endless variations in tRNA recognition elements are not possible; they would, sooner or later, lead to unacceptable levels of tRNA misacylations. For that reason, Schimmel and Ribas de Poplana proposed a concept that attributes the fixed state of the extant genetic code to the limited amount of productive intrinsic combinations among identity elements that are responsible for specific tRNA:AARS interactions [74]. Mitochondria which encode few proteins, importing most of what they need from the nucleus, are thus are not under strong selection for translational accuracy; this resulted in genome economization. This furthermore inevitably led to constrained codon usage acting as an efficient selective pressure for codon re-assignments [70]. For example, most of the codon variations found in various animal mitochondria (11 different codons) are due to the dramatic reduction of the genome size and, in particular, the number of tRNA genes [74]. This reduction in the tRNA gene pool led to the loss of many of the conserved identity elements, and to a “relaxation” of the recognition constraints between AARS and tRNA. Indeed, the changed identity for tRNAArg , tRNAIle , tRNAMet , tRNALys and tRNASer correlates remarkably with the variations (i.e. codon reassignments) in the mitochondrial genetic code [65]. Such “melting” of the genetic code is also conceivable in Mycoplasma, which is regarded as a degenerate form of Gram-positive bacteria. In fact, the “genome minimization” hypothesis proposes that both mitochondria and Mycoplasma are under extreme selection to reduce their genome size, which should confer an advantage in their replication [75]. For example, the genome of M. capricolum contains only 30 tRNA genes for 29 tRNA species with reduced extents of nucleoside modifications. In this manner, the genomes of Mycoplasma and mitochondria economize (simplify) their content by discarding redundant tRNAs and enzymes for nucleoside modifications (most probably as a result of AT pressure), leaving enough space for the appearance of new tRNA species capable of acquiring novel codon meanings [76]. Such codon reassignments within large, complex, relatively modern genomes would be much more difficult. For example, Jukes and Osawa [77] considered the possibility for tRNA mutation in E. coli that would make Gln cognate for the AAA codon together with Lys. Reading of this codon by both corresponding tRNAs would result in a mixture containing both amino acids at AAA sites of protein products. If 1% of such Lys ↔ Gln replacements were disruptive (E. coli has 1562 genes with at least one AAA codon per copy) there would be 15 defective cellular proteins, enough to eliminate such an allele by negative selection. Similarly, there are 417 UGA termination signals distributed among the genes in the E. coli genome. In a hypothetical scenario of their general reassignment to a particular amino acid,
17
18 Genetic Code Organization and Protein Structure
proteome-wide disturbance accompanied with dramatic losses in cellular viability is the only possible outcome [78]. In this context, the emergence of special, sophisticated, context-dependent and species-specific mechanisms that keep internal stop codons separated from terminal ones during, for example, translation of Sec is easy to understand. However, there are E. coli suppressor strains that are capable of suppressing particular stop codons with efficiencies of even 50% [79]. In some strains the error frequency during translation can be increased by a few orders of magnitude without any effects on exponential growth. Such ambiguity is not only limited to prokaryotes; eukaryotes have basal levels of ambiguity similar to prokaryotes. Obviously there are tolerable levels of nonsense suppressions without any apparent phenotypic feature [67]. The ambiguity is also used by parasitic species, e.g. when animal and plant viruses use ambiguous stop codons to adjust the level of stop-read-through for the translation of their essential gene products. This strategy can be “borrowed” from nature into the laboratory as well. For example, routine use of amber suppressor strains like E. coli DH5α is widespread since they enable (through restricted translational read-through) control of the number of plasmid copies per cell without any negative interference to its viability.
8.2 Stop Codon Takeover, Codon Capture and Codon Ambiguity
The departures from the universal code are often thought as a “relics of a primitive genetic code that possibly preceded the emergence of the so-called universal code” [80], whereas other thinkers in the field regard them to be “of recent origin and derived from the universal code” [65]. Based on the observation that the majority of the described variations in the universal code are in fact reassignments of standard termination codons, Lehman and Jukes [81] proposed code evolution via the takeover of stop codons (“stop codon takeover” hypothesis). Their crucial statement is that “all codons are essentially chain termination or stop codons until tRNA adaptors evolve having the ability to bond tightly to them”. According to their hypothesis, new sense coding triplets have been derived from codons serving previously as exclusive nonsense signals (i.e. they did not have their “own” adaptor or RF). In particular, mutation in different tRNA adaptors (or in their isoacceptors) leads to recognition of a stop codon, while an additional change in the identity set leads to its acylation with a novel amino acid which is subsequently translated into proteins. The tRNA with new function should emerge by duplication of genetic material, which is also one of the fundamental mechanisms in protein evolution [64]. When parental sequences duplicate, the copies are confronted with three possible outcomes: (i) both copies retain native activity, (ii) one of the copies is inactivated by accumulation of a mutation and (ii) one copy might be further functionally diversified. The emergence of a new function with an associated novel codon assignment should have little effect on the phenotype since such an encoding should result in chain lengthening to a tolerable extent [81]. Evolutionarily advanced proteins tend to be longer and more complex in their amino acid composition, e.g. globins or
8 Codon Reassignment Concepts Possibly Relevant to Code Engineering
highly specialized enzymes and functional enzymatic machinery like a ribosome or proteasome. The codon capture (this term refers to a change in the meaning of a codon) theory of Osawa and Jukes [78] (Fig. 5) is in fact an extension and further sophistication of the Lehman–Jukes concept. Two crucial steps have been postulated. (i) An amino acid (or termination) codon temporally vanishes from the coding frames of translated sequences as a consequence of GC or AT mutational pressure. This loss of codons makes corresponding tRNAs functionless and they are eliminated. (ii) When future mutational pressure reverses and sequence composition (e.g. AT content) changes again, the emerging codons that lack cognate tRNA are in fact termination signals (i.e. inhibit translation). Their capture is thus advantageous; the emergence of tRNA that translates them with a specific assignment occurs via (i) a change in the anticodon of particular adapter, (ii) a change in identity sets that leads to charging with novel amino acid (without anticodon change) or (iii) a change in codon–anticodon pairing (e.g. by introduction of nucleotide modification). Such mutated tRNA adaptor originates most probably via gene duplication. This process of simultaneously evolving gene sequences, codons, anticodons and amino acids is accompanied by a series of nondisruptive, i.e. neutral, changes. According to codon capture theory, it would be deleterious for an organism if a codon was assigned to two amino acids simultaneously [77]. Thus, this theory requires a series of strict contingencies to prevent ambiguous codons from arising (to avoid disruptive effects
Fig. 5 The Osawa–Jukes “codons capture” [65] versus the Schultz–Yarus “ambiguous intermediate” theory [82] of codon reassignment. It is generally accepted that codon reassignments occur over relatively long evolutionary periods. In the “codon capture” theory the first step in the change of the genetic code is assumed to be the complete disappearance of a codon from a genome. The deleted codon might reappear with a new function upon, for example, specific mutation on tRNA. The “ambiguous inter-
mediate” theory is similar, but does not envisage the full disappearance of a codon from genes before its reassignment. This theory proposes that the reassignment of a codon takes place via an intermediate stage during which the codon is recognized by two tRNAs assigned to different amino acids (or a tRNA and a release factor). Such a novel meaning might be fixed in the code only if there is a positive benefit arising from the reassignment.
19
20 Genetic Code Organization and Protein Structure
on proteins) and these requisites must be fulfilled in order to add novel amino acids (or to reassign one from the existing repertoire) to the code [65]. In contrast, the Schultz–Yarus “ambiguous intermediate” [82] hypothesis (Fig. 5) does not require the complete disappearance of a codon from the genome before reassignment take place. This theory postulates that a mutant tRNA with expanded decoding properties starts decoding a codon belonging to a noncognate amino acid codon family, making it ambiguous. In other words, the codon undergoes a transitional stage in which it is ambiguously decoded. Such ambiguity may be one of aminoacylation identity, anticodon misreading or competition between two nonambiguous tRNAs. Thereby, an ambiguous intermediate is the preferred route by which a primordial code evolved to the present “universal” form. This leads to heterogeneity in the encoded proteins that, according to this theory and experimental evidence, can be tolerated by an organism. At the core of this proposal is the possibility that mutated tRNA could recognize a normally noncognate codon and insert its amino acid in competition with the cognate amino acid. If this new specificity confers an advantage, propagation of an organism with ambiguous codons can be favored by natural selection, resulting in an organism having an established genetic code with a new arrangement. This theory was recently reformulated by Santos and coworkers [83] by introducing a novel mechanism for codon reassignment through codon ambiguity, as follows: Mutant tRNAs with expanded decoding properties, or mutations in the translational machinery, which create decoding ambiguity, in conjunction with biased GC pressure, impose a negative selective pressure on particular codons, decreasing their usage to very low levels or forcing them to disappear from the genome. Some overall positive advantage must be present in order for this process to occur. These codons can be gradually reassigned through structural change of the translational machinery and loss of the ancestral cognate tRNAs.
There are also recent attempts to reconcile both theories in a single unifying model, which consider them as “opposite faces of the same coin” [62].
References 1 WATSON, J., GANN, A., BAKER, T., LEVINE, M., LOSICK, R. and BELL, S. (2004). Molecular Biology of the Gene. Benjamin Cummings, Amsterdam. 2 STRYER, L. (2001). Biochemistry. Freeman, New York. 3 B. ARDELL, D. H. AND SELLA, G. (2001). On the evolution of redundancy in genetic codes. Journal of Molecular Evolution 53, 269–281. 4 WOESE, C. R., DUGRE, D. H., DUGRE, S. A., KONDO, M. and SAXINGER, W. C. (1966). On fundamental nature and evolution of
of egenetic code. Cold Spring Harbor Symposia on Quantitative Biology 31, 723–736. 5 DI GIULIO, M. and MEDUGNO, M. (1999). Physicochemical optimization in the genetic code origin as the number of codified amino acid increases, Journal of Molecular Evolution 49, 1–10. 6 WHEATLEY, D. N., INGLIS M. S. and MALONE, P. C. (1986). The concept of the intracellular amino acid pool and its relevance in the regulation of protein metabolism, with particular reference to
References
7
8
9
10
11
12
13
14
15
16
mammalian cells, Current Topics in Cellular Regulation 28, 107–182. MICHAL, G. (1999). Biochemical Pathways: An Atlas of biochemistry and Molecular Biology. Wiley, New York. ZEMPLENI, J. and DANIEL, H. (2003). Molecular Nutrition. CABI, Cambridge, MA. MEISENBERG, G. and SIMMONIS, W. H. (1998). Principles of Medical Biochemistry. Mosby, St Louis, MO. NEIDHARDT, F. C. and UMBARGER, E. H. (1996). Chemical composition of Escherichia coli. In Escherichia coli and Salmonella: Cellular and Molecular Biology (NEIDHARDT, F. C. et al. eds), pp. 13–16. 2nd Ed. ASM Press, Washington D.C. LERNER, J. A. (1978). Review of Amino Acid Transport Processes in Animal Cells and Tissues. University of Maine Press, Orono, ME. FILLENZ, M. (1995). Physiological release of excitatory amino acids, Behavioral Brain Research 71, 51–67. FISCHER, W. N., LOO, D. D., KOCH, W., LUDEWIG, U., BOORER, K. J., TEGEDER, M., RENTSCH, D., WRIGHT E. M. and FROMMER, W. B. (2002). Low and high affinity amino acid H+ -cotransporters for cellular import of neutral and charged amino acids, Plant Journal 29, 717–731. BOUDKO, D. Y., KOHN, A. B., MELESHKEVITCH, E. A., DASHER, M. K., SERON, T. J., STEVENS B. R. and HARVEY, W. R. (2005). Ancestry and progeny of nutrient amino acid transporters, Proceedings of the National Academy of Sciences of the USA 102, 1360–1365. RITAR, A. J., DUNSTAN, G. A., CREAR, B. J. and BROWN, M. R. (2003). Biochemical composition during growth and starvation of early larval stages of cultured spiny lobster (Jasus edwardsii) phyllosoma, Comparative Biochemistry and Physiology A 136, 353–370. REIMER, R. J., CHAUDHRY, F. A., GRAY, A. T. and EDWARDS, R. H. (2000). Amino acid transport System A resembles System N in sequence but differs in mechanism, Proceedings of the National Academy of Sciences of the USA 97, 7715–7720.
17 KAUZMANN, W. (1957). Physical chemistry of proteins, Annual Review of Physical Chemistry 8, 413–438. 18 TAVERNA, D. M. and GOLDSTEIN, R. A. (2002), Why are proteins marginally stable? Proteins—Structure Function and Genetics 46, 105–109. 19 TRINQUIER, G. and SANEJOUAND, Y. H. (1998), Which effective property of amino acids is best preserved by the genetic code? Protein Engineering 11, 153–169. 20 RADZICKA, A. and WOLFENDEN, R. (1988). Comparing the polarities of the amino acids—side-chain distribution coefficients between the vapor-phase, cyclohexane, 1-octanol, and neutral aqueous solution, Biochemistry 27, 1664–1670. 21 ROSE, G., GESELOWITZ, A., LESSER, G., LEE, R. and ZEHFUS, M. (1985). Hydrophobicity of amino acid residues in globular proteins, Science 229, 834–838. 22 KYTE, J. and DOOLITE, R. (1982). A simple method for displaying the hydropathic character of a protein, Journal of Molecular Biology 157, 105–132. 23 CORNETTE, J., CEASE, K. B., MARGALIT, H., SPOUGE, J. L., BERZOFSKY, J. A. and DELISI, C. (1987). Hydrophobicity scales and computational techniques for detecting amphipathic structures in proteins, Journal of Molecular Biology 195, 659–685. 24 HENIKOFF, S. and HENIKOFF, J. G. (1992). Amino acid substitution matrices from protein blocks, Proceedings of the National Academy of Sciences of the USA 89, 10915–10919. 25 FAUCHERE, J. L., and PLISKA, V. (1983). Hydrophobic parameters π of amino acid side chains from the partitioning of N-acetyl amino acid amides, European Journal of Medicinal Chemistry 18, 369–375. 26 WOLFENDEN, R. and RADZICKA, A. (1986). How hydrophilic is tryptophan, Trends in Biochemical Sciences 11, 69–70. 27 TAYLOR, W. R. (1986). The classification of amino acid conservation, Journal of Theoretical Biology 119, 205–218. 28 BUDISA, N., HUBER, R., GOLBIK, R., MINKS, C., WEYHER, E. and MORODER, L. (1998). Atomic mutations in annexin
21
22 Genetic Code Organization and Protein Structure
29
30
31
32
33
34
35
36
37
38
39
V—thermodynamic studies of isomorphous protein variants, European Journal of Biochemistry 253, 1–9. PAL, P. P. and BUDISA, N. (2004). Designing novel spectral classes of proteins with tryptophan-expanded genetic code, Biological Chemistry 385, 893–904. BREMER, H. and DENNIS, P. P. (1996). Modulation of chemical composition and other parameters of the all by growth rate. In Escherichia coli and Salmonella: Cellular and Molecular Biology (NEIDHARDT, F. C., et al. eds), pp. 1553–1569. 2nd Ed. ASM Press, Washington D.C. WADA, K., WADA, Y., ISHIBASHI, F., GOJOBORI, T. and IKEMURA, T. (1992). Codon usage tabulated from the GenBank genetic sequence data, Nucleic Acid Research 20, 2111S–2118S. CHAMBERS, J. A. A. (1993). Buffers, chelating agents and denaturants. In Biochemistry Labfax (CHAMBERS, J. A. A., and RICKWOOD, D., eds), pp. 1–36. Blackwell Scientific, Oxford. CHARTON, M. and CHARTON, B. I. (1982). The structural dependence of amino-acid hydrophobicity parameters, Journal of Theoretical Biology 99, 629–644. DOUGHERTY, D. A. (1996). Cation-interactions in chemistry and biology: A new view of benzene, Phe, Tyr, and Trp, Science 271, 163–168. FORTERRE, P. (1997), Archaea: what can we learn from their sequences? Current Opinion in Genetics and Development 7, 764–770. WOESE, C. (1998). The universal ancestor, Proceedings of the National Academy of Sciences of the USA 95, 6854–6859. BUDISA, N., MORODER, L. and HUBER, R. (1999). Structure and evolution of the genetic code viewed from the perspective of the experimentally expanded amino acid repertoire in vivo, Cellular and Molecular Life Sciences 55, 1626–1635. CRICK, F. H. C. (1968). Origin of genetic code, Journal of Molecular Biology 38, 367–379. DELAYE, L., BECERRA, A. and LAZCANO, A. (2004). The nature of the last common ancestor. In The Genetic Code and the Origin of Life (RIBAS DE POUPLANA, L., ed),
40
41
42
43
44
45
46
47
48
49
50
51
52
pp. 221–249. Landes Bioscience, Georgetown, TX. WEBER, A. L. and MILLER, S. L. (1981). Reasons for the occurrence of the 20 coded protein amino acids, Journal of Molecular Evolution 17, 273–284. GALAEV, I. Y. and MATTIASSON, B. (1999). “Smart” polymers and what they could do in biotechnology and medicine, Trends in Biotechnology 17, 335–340. GOVINDARAJAN, S. and GOLDSTEIN, R. A. (1995). Optimal local propensities for model proteins, Proteins—Structure Function and Genetics 22, 413–418. KLIMOV, D. K. and THIRUMALAI, D. (1996). Factors governing the foldability of proteins, Proteins—Structure Function and Genetics 26, 411–441. TAVERNA, D. M. and GOLDSTEIN, R. A. (2002), Why are proteins so robust to site mutations? Journal of Molecular Biology 315, 479–484. JANIN, J. (1979). Surface and inside volumes in globular proteins, Nature 277, 491–492. BUDISA, N. (2004). Prolegomena to future efforts on genetic code engineering by expanding its amino acid repertoire, Angewandte Chemie International Edition 43, 3387–3428. ROSE, G. D. and WOLFENDEN, R. (1993). Hydrogen-bonding, hydrophobicity, packing, and protein-folding, Annual Review of Biophysics and Biomolecular Structure 22, 381–415. TAYLOR, F. J. R. and COATES, D. (1989). The code within codes, BioSystems 22, 177–187. SONNEBORN, T. M. (1965). Degeneracy of the Genetic Code: Extent, Nature and Genetic Implications. Academic Press, New York. FREELAND, S. J. and HURST, L. D. (1998). The genetic code is one in a million, Journal of Molecular Evolution 47, 238–248. SCHIMMEL, P., GIEGE, R., MORAS, D. and YOKOYAMA, S. (1993). An operational RNA code for amino acids and possible relationship to genetic code, Proceedings of the National Academy of Sciences of the USA 90, 8763–8768. RIBAS DE POUPLANA, L. R. and SCHIMMEL, P. (2001). Operational RNA code for
References
53
54
55
56
57
58
59
60 61
62
63
amino acids in relation to genetic code in evolution, Journal of Biological Chemistry 276, 6881–6884. BAE, J. H., ALEFELDER, S., KAISER, J. T., FRIEDRICH, R., MORODER, L., HUBER, R. and BUDISA, N. (2001). Incorporation of beta-selenolo[3,2-b]pyrrolylalanine into proteins for phase determination in protein X-ray crystallography, Journal of Molecular Biology 309, 925–936. KNIGHT, R. D., FREELAND, S. J., and LANDWEBER, L. F. (2004). Adaptive evolution of the genetic code. In The Genetic Code and the Origin of Life (RIBAS DE POUPLANA, L., ed), pp. 75–91. Landes Bioscience, Georgetown, TX. MAESHIRO, T. and KIMURA, M. (1998). The role of robustness and changeability on the origin and evolution of genetic codes, Proceedings of the National Academy of Sciences of the USA 95, 5088–5093. JUDSON, O. P. and HAYDON, D. (1999). The genetic code: what is it good for? An analysis of the effects of selection pressures on genetic codes, Journal of Molecular Evolution 49, 539–550. DUFTON, M. J. (1997), Genetic code synonym quotas and amino acid complexity: Cutting the cost of proteins? Journal of Theoretical Biology 187, 165–173. MIRNY, L. A. and SHAKHNOVICH, E. I. (1999). Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function, Journal of Molecular Biology 291, 177–196. GOETHE, W. (1954). Die Schriften zur Naturwissenschaft 9. Herman B¨ohlaus, Weimar. MONOD, J. (1971). Chance and Necessity. Vintage Books, New York. BARRELL, B. G., BANKIER, A. T. and DROUIN, J. A. (1979). A different code in human mitochondria, Nature 282, 189–194. SANTOS, M. A. S. and TUITE, M. F. (2004). Extant variations in the genetic code. In The Genetic Code and the Origin of Life (RIBAS DE POUPLANA, L., ed), pp. 183–220. Landes Biosciences, Georgetown, TX. TUITE, M. F. and SANTOS, M. A. S. (1996). Codon reassignment in Candida species:
64 65
66
67
68
69
70
71
72
73
74
an evolutionary conundrum, Biochimie 78, 993–999. OSAWA, S. (1995). Evolution of the Genetic Code. Oxford University Press, Oxford. OSAWA, S., JUKES, T. H., WATANABE, K. and MUTO, A. (1992). Recent evidence for evolution of the genetic code, Microbiological Reviews 56, 229–264. MURAMATSU, T., NISHIKAWA, K., NEMOTO, F., KUCHINO, Y., NISHIMURA, S., MIYAZAWA, T. and YOKOYAMA, S. (1998). Codon and amino-acid specificities of a transfer RNA are both converted by a single post-transcriptional modification, Nature 336, 179–181. SCHULTZ, D. W. and YARUS, M. (1996). On malleability in the genetic code, Journal of Molecular Evolution 42, 597–601. MURAMATSU, T., YOKOYAMA, S., HORIE, N., MATSUDA, A., UEDA, T., YAMAIZUMI, Z., KUCHINO, Y., NISHIMURA, S. and MIYAZAWA, T. (1988). A novel lysine substituted nucleoside in the first position of the anticodon of minor isoleucine tRNA from Escherichia coli, Journal of Biological Chemistry 263, 9261–9267. SUZUKI, T., UEDA, T. and WATANABE, K. (1997). The “polysemous” codon—a codon with multiple amino acid assignment caused by dual specificity of tRNA identity, EMBO Journal 16, 1122–1134. SANTOS, M. A. S., MOURA, G., MASSEY S. E. and TUITE, M. F. (2004). Driving change: the evolution of alternative genetic codes, Trends in Genetics 20, 95–102. YOKOYAMA, S. and NISHIMURA, S. (1995). Modified nucleosides and codon recognition. In tRNA: Structure, Biosynthesis, and Function (S¨OLL, D. and RAJBHANDARY, U., eds), pp. 207–223. American Society of Microbiology, Washington, DC. KIMURA, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge. JUKES, T. H. and KING, J. L. (1969). Non-Darwinian evolution, Science 164, 788–798. RIBAS DE POUPLANA, L. R. and SCHIMMEL, P. (2004). Aminoacylations of tRNAs: Record-keepers for the genetic code. In
23
24 Genetic Code Organization and Protein Structure
75
76
77
78
Protein Synthesis and Ribosome Structure (NIERHAUS, K., and WILSON, D. N., eds), pp. 169–184. Wiley-VCH, Weinheim. KOONIN, E. V. (2000). How many genes can make a cell: the minimalgene-set concept, Annual Review of Genomics and Human Genetics 1, 99–116. JOSE CASTRESANA, J., FELDMAIER-FUCHS, G. and P¨AA¨ BO, S. (1998). Codon reassignment and amino acid composition in hemichordate mitochondria, Proceedings of the National Academy of Sciences of the USA 95, 3703–3707. JUKES, T. H. and OSAWA, S. (1997). Further comments on codon reassignment, Journal of Molecular Evolution 45, 1–3. JUKES, T. H. and OSAWA, S. (1989). Codon reassignment (codon capture) in evolution, Journal of Molecular Evolution 28, 271–278.
79 WANG, L. and SCHULTZ, P. G. (2002). Expanding the genetic code. Chemical Communications, 1–11. 80 GRIVELL, L. A. (1986). Deciphering divergent codes, Nature 324, 109–110. 81 LEHMAN, N. and JUKES, T. H. (1998). Genetic code development by stop codon takeover, Journal of Theoretical Biology 135, 203–214. 82 SCHULTZ, D. W. and YARUS, M. (1994). Transfer-RNA mutation and the malleability of the genetic code, Journal of Molecular Biology 235, 1377–1380. 83 MASSEY, S. E., MOURA, G., BELTRA, P., ALMEIDA, R., GAREY, J. R., TUITE M. F. and SANTOS, M. A. S. (2003). Comparative evolutionary genomics unveils the molecular mechanism of reassignment of the CTG codon in Candida spp, Genome Research 13, 544–557.
1
The Cellular Translation Apparatus Nediljko Budisa Max-Planck-Institute of Biochemistry, Martinsried, Germany
Originally published in: Engineering the Genetic Code. By Nediljko Budisa. Copyright ľ 2006 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-31243-6
1 Natural Laws, Genetic Information and the “Central Dogma” of Molecular Biology
The main feature of living systems as autonomous and self-replicating entities is the high level of organization and integration of their basic physicochemical processes. These interactions are emergent features of life as a process and distinguish it from anything else in the physical world. I will follow the traditional biology (and philosophy of biology) viewpoint that such processes are purely physical and casual [1]. Thus, life processes are shaped by (i) deterministic laws of chemistry and physics (“natural laws”), and (ii) the genetic program which determines all biological activities and phenomena. This program operates such that nucleic acids encode information and proteins execute it. At first glance there is almost nothing in common between the chemistry of nucleotides and that of amino acids. It is the genetic code itself that brings together the two realms – that of protein chemistry with that of nucleic acids chemistry – through a set of optimized and finely tuned interactions [2]. For a long time it was believed that proteins were the hereditary material. The reasons are probably historical, since by the end of the last century protein chemists had already identified most of the amino acids, which were known to be the main building blocks of all proteins. At the same time, Fisher and Hofmeister conceptually solved the problem of amino acid linkage in proteins with the “peptide hypothesis” (i.e. the concept of the peptide bond) [3]. The development of a variety of analytical techniques during the following almost half a century was required until this hypothesis was fully confirmed with the successful sequencing of insulin by Sanger [4]. However, that DNA might be key genetic molecule emerged unexpectedly from the studies on nonvirulent → virulent transformation of pneumonia-causing bacteria by Griffith [5]. After DNA was unambiguously identified as the hereditary Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Cellular Translation Apparatus
molecule (“transforming principle”) by Avery, Hershey and Chase [6, 7], it captured the attention of biologists, chemists and physicists alike. The mechanisms of synthesis of polypeptide chains from amino acid sequences were fully elucidated only upon experimental demonstration that the genes are nucleic acids and that a nucleic acid template is required to direct the synthesis of each protein [8]. The basic problem was how the synthesis of a precisely ordered sequence of amino acids in polypeptide chains occurs. It was fully understood after experimental demonstration that the RNA template is required to direct the synthesis of each protein. At this point, the inevitable connection between intracellular protein synthesis of living beings and their genetic program became clear. Tatum and Beadle’s “one gene → one enzyme” concept [9] certainly paved the way for Jacob and Monod’s discoveries [10] that led to the first description of gene expression regulation. Although the “one gene → one enzyme” concept does not fit into the recent picture of the multidimensionality of a proteome, is still a common practical guide in everyday laboratory routine. The syntax of the genetic information is reflected in a precise sequence, either of bases in nucleic acids or amino acids in proteins. However, the semantic (meaning) of the genetic information stored in the DNA molecule is realized via two processes: transcription and translation. During transcription the sequence of nucleotide bases along one of DNA strand is copied onto a complementary strand of various RNA forms [messenger (mRNA), transfer (tRNA), ribosomal (rRNA), small nuclear (snRNA), transfer messenger (tmRNA), etc.]. In some viruses reverse transcription from RNA back to DNA (by reverse transcriptase) is also possible as well. Translation can be defined as the conversion of information stored in the nucleic acid language into the protein language. In particular, mRNA is used as a template for protein synthesis on ribosomes, i.e. the genetic program of a cell is “read” from mRNA into a complementary amino acid sequence which, once assembled, spontaneously assumes a particular three-dimensional form and begins to perform its specific biological function. The connection between these processes was recognized very early and unified in the concept termed the “central dogma of molecular biology” [11]. The information flow between nucleic acids and protein is unidirectional, i.e. a polynucleotide is never determined by a protein template. This early working hypothesis predicts that DNA functions as the template for RNA molecules, which subsequently move to the cytoplasm where they determine the arrangement of amino acids within polypeptides. This transfer is governed by the genetic code as a set of rules which precisely relates the sequence of base triplets in nucleic acid (polynucleotide) with a sequence of amino acid residues in protein (polypeptide). In order to get a full picture of the genetic information flow, the “central dogma” should be extended as follows (Fig. 1). After transcription and before translation, the RNA transcripts are processed (e.g. through editing, splicing, cap addition, modifications, polyAAA addition) to produce mature mRNA. Polypeptides as translational products must be processed in many cases (see Section 11), producing the mature and active proteins capable of acting as structural components of the cell or serving as functional biochemical machines.
2 Cellular Investments in Ribosome-mediated Protein Synthesis 3
Fig. 1 An extended “central dogma” – the informational flow from genotype to phenotype in living beings. The arrow encircling DNA indicates that this molecule is the template for its own replication. The arrow between DNA and RNA indicates that all
RNAs are made on DNA templates (“transcription”), whereas all proteins are determined by RNA templates (“translation”). hnRNA = homonuclear RNA, mRNA = messenger RNA.
2 Cellular Investments in Ribosome-mediated Protein Synthesis
Protein synthesis, a key step in realizing templated genetic information, is performed by ribosomes in all living organisms. The genes involved in replication, transcription, aminoacylation and translation are almost universally conserved among the different kingdoms of life [12]. Protein synthesis requires two steps: transcription and translation. The high interdependence, synchronization, integration and efficiency of these processes are hallmarks of living cells. This is obvious when, for example, the timescale of these processes is considered: the rate of DNA replication is 50–100 times faster than the rate of transcription, which is thought in bacteria to roughly match the rate of translation. Three nucleotides are transcribed during the time one canonical amino acid is incorporated into the elongating polypeptide chain. Immediately after mRNA is copied from a DNA strand it is captured by the ribosome which scans for initiation codons, and is further followed by mRNA uncoiling and translation. In this way, each mRNA is translated 10–20 times [12]. Not surprisingly, the protein translation process consumes approximately 5% of the human caloric intake or about 30–50% of the energy generated by rapidly growing Escherichia coli cells (in the form of energy-rich amino acid-tRNA bonds and GTP hydrolysis in the elongation process) [13, 14]. It was also argued that the minimum amount of the genome assigned to the protein synthesis apparatus might be as high as 35–45%, which is not surprising in light of the high proportion of a cell’s resources and energy budget that is allotted to translation [15]. Finally, the extent of cellular investment in protein synthesis can be measured by the amount of RNA in living cells. All RNA types together contribute about 21% of the total biomass in an average cell population of E. coli during balanced growth at 37 ◦ C (Table 1). The protein translation process, by which the correct sequence of amino acids is assembled to form a protein, encompasses two distinct molecular recognition events: codon-anticodon interaction between tRNAs and mRNA on the ribosome A site (translation of the genetic information in sensu strictu), and a specific selection of the amino acid substrates by enzyme actions of the aminoacyltRNA synthetases (AARS) (interpretation of the genetic code). It is this pairing of amino acids and tRNAs which defines the genetic code. While the translation of the genetic message relies on nucleic acid-nucleic acid interactions (i.e. base
4 The Cellular Translation Apparatus Table 1 Basic molecular building blocks of E. coli
Components
Percent total dry weight
Amount (g, 10−15 ) per cell
Molecular weight
Molecular per cell
No. of different kinds of molecules
Protein RNA 23S rRNA 16S rRNA 5S rRNA tRNA mRNA DNA Lipid Lipopolysaccharide Peptidoglycan Glycogen Polyamines putrescine spermidine Metabolites, ions, cofactors
55.0 20.5
156 58 31.0 15.5 1.2 8.2 2.3 8.8 25.9 9.7 7.1 7.1 1.1 0.83 0.27 9.9
4.0 × 104
23500 000
1850
1.0 × 106 5.0 × 105 3.9 × 104 2.5 × 104 1.0 × 106 2.5 × 109 705 4070 (904)n 1.0 × 106
18700 18700 18700 198000 1380 2.1 22 000 000 1430 000 1 4300
3.1 9.1 3.4 2.5 2.5 0.4
3.5
88 145
5600 000 1100 000
1 1 1 60 600 1 1 1 1 1 800+
Chemical composition of an average E. coli strain B/r was determined in aerobic glucose minimal medium with a mass doubling time of 40 min during balanced growth at 37 ◦ C. (Reproduced from [15] with permission.)
pairing), the interpretation of the genetic message relies on the enzyme catalytic actions (i.e. enzyme-substrate interactions) of the AARS (Fig. 2) and, exceptionally, on other enzymes as well. In the other words, the accuracy of the basic translational process depends largely on the precision of aminoacylation (i.e. covalent linkage of cognate amino acids with cognate tRNAs) and subsequent noncovalent interactions between charged tRNA with ribosome-bound mRNA (translation in sensu strictu) [16].
3 Molecular Architecture of AARS
The tRNA aminoacylation is catalyzed by corresponding AARS which are responsible for specific esterification of tRNAs with their cognate amino acids (Fig. 3). They constitute a family of 20 cellular enzymes, and operate at the interface between nucleic acids and proteins. They are also essential in maintaining the fidelity of the protein biosynthetic process. Therefore, AARS are at the heart of the translation process. It is believed that they belong to the oldest class of enzymes, i.e. they appeared very early in evolution [17]. The first enzymes from these groups are discovered 1950s, while the first partially purified AARS was TrpRS from
3 Molecular Architecture of AARS
Fig. 2 Protein translation and genetic code interpretation in the context of genetic message transmission. The basic coding is achieved via conservative mRNA–tRNA codon – anticodon specific base pairing; the meaning of each codon is interpreted mainly through the stringent substrate specificity of AARS in the aminoacylation reaction. This reaction is an interpretation level in the context of the flow of genetic
information transmission. Preprogrammed re-coding (see Section 10) and posttranslation modifications (see Section 11) are the means to diversify protein functionality beyond its basic coding. DNA replication, RNA transcription and protein translation are mutually interdependent and synchronized processes followed by protein folding and posttranslation modifications.
Fig. 3 The central role of AARS in protein translation. In well-studied model organisms such as E. coli, yeast or various mammals, 20 AARS catalyze via the aminoacylation reaction the amino acid activation and accurate biosynthesis of aminoacyl-tRNA, the immediate precursors for encoded proteins. Note that each AARS should select its “own” (i.e. cognate) amino acid (AA) which is subsequently covalently linked to
the cognate tRNA isoacceptor “fished” from the cellular pool. Immediately after their dissociation from AARS, the aminoacyltRNAs are shuttled or channeled to the ribosome where the anticodon is matched to the mRNA codon and the tRNA is deacylated, with the amino acid being added as the next residue of a nascent protein chain [AARS(AA-tRNA) – enzyme – aminoacyladenylate complex].
5
6 The Cellular Translation Apparatus
pancreas [18]. Prokaryotic AARS appear as free cytosolic enzymes seldom bound with other proteins to form complexes. The eukaryotic AARS enzymes are usually larger than their prokaryotic counterparts, due to the presence of the C- and N-terminal extensions that are unnecessary for aminoacylation, but their full function is still unclear [19]. It is now also well documented that eukaryotic AARS participate as part of multienzyme complexes in a broad repertoire of cellular activities not just restricted to protein biosynthesis, e.g. tRNA processing, RNA splicing, RNA trafficking, apoptosis, and transcriptional and translational regulation [20]. On the other hand, in some eukaryotes (e.g. yeast) particular AARS are shared between the mitochondria and cytoplasm [21]. Finally, some AARS such as ProRS or GlyRS, according to sequence analysis, occur in different organisms with one of two quite distinct structural architectures: prokaryote like and eukaryote/archaeon like [22]. Both types might even occur in the same organism, most probably as a result of horizontal gene transfer [23]. Although AARS catalyze the same basic reaction and share a common substrate (ATP) and Mg2+ as cofactor, they have long been known to differ in their size, amino acid sequence and subunit structure. For example, their molecular weight is in the range between 51 (CysRS) and 384 kDa (AlaRS), their primary polypeptide structures can be composed from “only” 334 (TrpRS) to 1112 amino acids (PheRS), while their atomic crystal structures are mainly monomeric (α: Arg, Cys, Glu, Gln, Ile, Leu, Val) or dimeric (α 2 : Met, Tyr, Trp, Asn, Asp, His, Lys, Pro, Ser, Thr). PheRS and GlyRS are functional heterotetramers (α 2 β 2 ), while AlaRS is believed to be a functional homotetramer (α 4 ) (Fig. 4). Prokaryotic MetRS are dimers, whereas yeast cytoplasmic and mitochondrial enzymes behave as monomers. In addition the Archaea and eukaryotes possess “eukaryote-like” synthetases (the prototype being the yeast or human cytoplasmic enzymes), whereas eubacteria and mitochondria possess more or less distinct “prokaryote-like” synthetases (the prototype being the E. coli enzyme) [24]. In spite of the fact that the two classes of AARS catalyze the same reaction, earlier sequence analyses have suggested the existence of two distinct classes of AARS (class I and II) of 10 members each on the basis of exclusive sets of sequence motifs [25]. Subsequent elucidation of their three-dimensional structures in free and complex forms with various substrates and substrate analogs revealed a welldefined structural basis of the two more or less independent classes. These studies provided mechanistic insights into the specificity of substrate recognition and catalysis. Both AARS classes also differ as to where the terminal adenosine of the tRNA the amino acid is placed: class I enzymes link cognate amino acids to the 2 -hydroxyl group, whereas the class II enzymes prefer the 3 -hydroxyl group [26]. It is generally assumed that an AARS of a given specificity will always belong to the same class – this assignment is conserved very early in evolution [17]. In other words, regardless of its origin, a particular AARS will always belong to the same class. The only known exceptions to this rule are lysyl-tRNA synthetases which are composed of two unrelated families, class I enzymes in certain Archaea and bacteria, and class II enzymes in all other organisms [27] (class I: MetRS, ValRS, LeuRS, IleRS, CysRS, ArgRS, GluRS, GlnRS, LysRS(1), TyrRS and TrpRS;
3 Molecular Architecture of AARS
Fig. 4 Structural diversity of AARS. The AARS form a large and diverse family whose detailed biochemical and genetic characterizations together with high-resolution structures are now available for nearly every enzyme type (many as complexes with various substrates and substrate analogues). The characteristic AARS X-ray structures fromclass I and class II reveal monomers, homodimers, homotetramers and even heterotetramers. The three dimensional struc-
ture of class II enzyme PylRS (pyrrolysyltRNA synthetases from the Archaea family Methanosarcinidae) is not yet elucidated. Structures were generated with the generous help of Dr M. Seifert using PDB entries available at the AARS Databank (http://rose.man.poznan.pl/aars/). Detailed accounts of each particular enzyme are available in recently published books on AARS [37] and translation mechanisms [38].
7
8 The Cellular Translation Apparatus
class II: SerRS, ThrRS, AlaRS, GlyRS, ProRS, HisRS, AspRS, AsnRS, LysRS(2) and PheRS). The overwhelmingly monomeric enzymes from class I contain highly conserved HIGH and KMSKS sequence motifs, which are critical elements in the structure of the active site for aminoacyl-adenylate synthesis. X-ray studies on the first three class I AARS (MetRS [28], TyrRS [29] and GlnRS [30]) revealed a classic nucleotide binding fold called the Rossmann fold (five-handed parallel β-sheet), already found in NADH-binding dehydrogenases and ATP-binding kinases. The X-ray structure of TyrRS from Bacillus stearothermophillus [31] and subsequent mutagenesis studies provided an insight into the mechanisms of amino acid recognition and activation in class I enzymes [32]. For example, the mutagenesis of the KMSKS motif in TyrRS, MetRS and TrpRS has shown that these residues are part of a mobile loop that stabilizes the transition state of the activation reaction [33]. A similar role was assigned for the HIGH motif in MetRS and TrpRS [34]. The amino acid and ATP binding and activation are accompanied by distinct conformational changes via movement of small domains that contain these motifs. In addition, the topology of the binding site is further influenced by the presence of insertion segments (CP1 and CP2) that contribute to the intrinsic diversification in the frame of class I (see Fig. 5). The first X-ray structure of one AARS from class II (SerRS from E. coli) [35] revealed a rather unique topology completely different from those of class I enzymes. The dominant structural feature of overwhelmingly dimeric class II enzymes is a seven-stranded β-sheet fold with three sequence motifs marked with numbers 1 (18 amino acids), 2 (23–31 amino acids) and 3 (29–34 amino acids). Motif 1 contains an invariant Pro residue involved in dimmer formation, while motives 2 and 3 are involved in aminoacyl-adenylate formation and binding of the 3 -end of tRNA. The specificity of amino acid recognition in these enzymes is further enhanced by an induced fit mechanism (i.e. substrate-induced ordering of the active site) [36]. In some class II AARS (e.g. AspRS and SerRS) this process is mainly local, while in others such as HisRS, LysRS and ProRS more global changes including a relative movement of other domains are observed [33]. The general molecular architecture of these enzymes is based on the existence of (i) a catalytic domain (group I: Rossmann fold; group II: unique fold, built around a six-stranded antiparallel β-sheet), (ii) tRNA-binding domains which can be represented by full motifs or by insertions scattered though the whole structure and (iii) oligomerization domains (Fig. 5) [37–40]. Several AARS contain “insertion peptides” relative to the canonical sequence for that class. In the class I AARS the active site contains a Rosmann dinucleotide binding fold, whereas this fold is absent from class II enzymes, in which the active site has a seven-stranded antiparallel β-sheet feature. For that reasons, class I enzymes bind ATP in an extended conformation, while class II bind ATP in a bent conformation [21]. There is even greater diversity inside the class II than in the class I. Many idiosyncratic domains are attached to and/or inserted in the “class-defining” catalytic core, and are responsible for binding and recognition of cognate RNAs. Such vast structural diversity in topologies within the active site and two different structural frameworks reflects the facts that different AARS evolved independently to catalyze the aminoacylation reaction [23].
3 Molecular Architecture of AARS
Fig. 5 Principal organization into functional domains for catalyses; tRNA binding and oligomerization is a constant feature of AARS. In spite of their conserved mechanisms of catalysis, the AARS are divided into two unrelated classes based on mutually exclusive sequence motifs that reflect distinct active site morphologies. The recently solved crystal structure of the 453amino-acid catalytic fragment of Aquifex aeolicus AlaRS has shown that this enzyme
contains a catalytic domain characteristic of class II synthetases, a helical domain with a hairpin motif critical for acceptor-stem recognition and a C-terminal domain of a mixed "/$ fold [40]. Other principal differences are reflected in the binding of tRNA. CP1 and CP2 = insertion elements. Detailed accounts on particular systems are available in numerous reviews and recently published books [37, 38].
There are other division criteria as well, e.g. AARS can be distinguished according to their ability to bind Zn2+ ions [39]. Recent mapping of entire genomes revealed a surprise in that some organisms do not have genes for all 20 AARS, but they use all 20 amino acids to construct their proteins. It is now well documented that many Archaea possess only 16 out of the 20 canonical AARS [21]. The absence of a particular AARS (like AsnRS and GlnRS) in many taxa of the Archaea and bacteria is often compensated for by indirect pathways, i.e. by the recruitment of the enzymes of intermediary metabolism [23]. For instance, some bacteria do not have an enzyme for charging glutamine onto its tRNAGln . Instead, a single enzyme adds glutamic acid to all tRNAGlu and tRNAGln from the cellular tRNA pool; a second enzyme then converts the Glu-tRNAGln into Gln-tRNAGln , which is than ready for translation at the ribosome [41]. Another impressive example includes the absence of CysRS, which is compensated for
9
10 The Cellular Translation Apparatus
by the dual function of ProRS (which obviously recognizes both substrates), i.e. ProRS is an example of a bifunctional AARS [42]. The reasons for the variations in the number of AARS might also be due to the doubling of some AARS genes, postaminoacylation enzymatic modification of aminoacylated tRNA (AsnRS and GlnRS) as well an alternative decoding mechanism (e.g. Sec and Pyl). In contrast, a full set of AARS genes has been found in all eukaryotes [43]. These surprising findings will most probably be further extended by future genome and proteome mapping among different life kingdoms.
4 Structure and Function of the tRNA Molecule
It is now well established that tRNA plays the role of an adaptor between amino acids and mRNA molecules because it links, in a colinear manner, the genetic information coded in the nucleotide sequence with the amino acid sequence of the corresponding protein. In 1955, Zamecnik and coworkers found that amino acids were first attached to a low-molecular-weight RNA in the soluble fraction of a cell-free tube and were subsequently transferred to proteins in ribosomes [44]. They named these intermediary molecules “soluble” RNA (sRNA) [45]. Soon afterwards, Watson recognized that sRNA meets the requirements of Crick’s adaptor hypothesis, i.e. each of these sRNA pairs with a specific amino acid and ferries it to the ribosome for protein synthesis. After that, the name “transfer” RNA (tRNA) rapidly displaced the original descriptive term sRNA [46]. Once the amino acid has been added to the tRNA, it will be used for protein synthesis according to the specificity dictated by the anticodon sequence in the tRNA (i.e. tRNA decodes the template). This was demonstrated by the classical Raney nickel experiment [47]. The successful sequencing of yeast tRNAAla reported by Holley was the first time that a nucleic acid had had its sequence determined [48]. tRNAs are small molecules (MW ∼ 25 000–30 000 Da) composed of a single polynucleotide chain with 72–93 ribonucleotides and constitute 10–15% of the total RNA in E. coli (Tab. 1). In most organisms there are more codons than tRNAs and more distinct tRNAs than AARS. The genetic code redundancy or degeneracy (i.e. the existence of more than one codon for most amino acids) is the main reason for the presence of more tRNAs for charging of only one amino acid. Different tRNAs chargeable with the same amino acid are called isoaccepting tRNAs. The number of isoaccepting tRNA species varies with the organism – while cells with minimal genomes have only one tRNA for each amino acid, eubacteria have about 40–55 tRNA species in cells. They all have a CCA nucleotide sequence at the 3 -end with terminal adenine 3 -OH or 2 -OH groups as amino acid-binding sites, whereas the 5 -end is phosphorylated (usually pG). On the ribosome, the aminoacyl residues are positioned only at 3 -OH groups since transesterification between the 2 - and 3 -position at the ribose is possible. The 2 -OH group plays an essential role in tRNA translocation from the A to the P site in the ribosome (Fig. 6) [49].
4 Structure and Function of the tRNA Molecule 11
Fig. 6 (A) General representations for tRNA secondary (cloverleaf) and tertiary (L-shaped) structures. The basic tRNA architecture consists of two main parts: the acceptor and anticodon arms enclosing mutually an angle of about 90◦ . The acceptor stem is capable of being loaded with one amino acid, while the anticodon loop provides complementarity for coding triplets
at mRNA. Such a design predisposes the tRNA molecule to participate in all recognition events crucial for correct translation of particular mRNA into the target protein sequence. (B) Schematic representation of tRNA placement in the ribosome in the context of its A and P sites during protein synthesis [50].
The structures and shapes of all known tRNAs are similar since they have to be recognized by different components of the polypeptide formation machinery. However, they should also have rather strong individual differences that predispose them toward specific interactions with cognate AARS in order to be charged with specific amino acids (see Section 6). The tRNA sequences contain, besides canonical A, U, G and C, more that 90 different “unusual” nucleotides such as inozine, pseudouridine, dihydrouridine, ribothymidine, methylated derivatives of guanosine and inosine, etc. Almost 25% of tRNA nucleotides are modified enzymatically during maturation [51]. These modifications include ribose/base methylations, base isomerization, base reduction and thiolation as well as base deamination. The physiological functions of such modifications are related to the diverse physiological roles of tRNAs in living cells. For example, methylations prevent pairing with some nucleotides, but enable other interactions, and contribute additionally to hydrophobicity which certainly influences tRNA interactions with AARS and ribosomal proteins. There are also numerous tertiary interactions that are not based in complementary principles (e.g. A–A, G–G and C–C) as well as hydrogen bonds between bases and riboses, bases and phosphates or riboses and phosphates [49]. In general, such interactions are attributed to tertiary structure modifications, specificity in interactions with other translational components and tRNA decoding on the ribosome. For example, deamination of the adenine in the anticodon loop results in inosine, which is capable of pairing with any of the three bases A, C
12 The Cellular Translation Apparatus
and U (wobbling rules). In fact, Crick’s wobble hypothesis explained much of the degeneracy of the genetic code by taking into account such base modifications – a single tRNA anticodon can recognize multiple codons by nonstandard base pairing. Interestingly, for in vitro experiments tRNAs are usually generated by run-off transcription from a DNA template using T7 RNA polymerase. Transcripts have generally been found to be efficient substrates for cognate AARS despite the fact that they lack base modifications [51]. It was found that a planar cloverleaf structure portrays the best representation of the tRNA secondary structure (Fig. 6). In this two-dimensional model approximately half the nucleotides in the tRNA are paired in four helices with three loops. The acceptor helix (or acceptor stem) with a conserved N73 CAA-3 , contains a specific nucleotide at position 73 functioning as a “discriminatory” base (Fig. 7) that is important for aminoacylation specificity [52]. There is a correlation between the nucleotide type at position 73 and the chemical nature of the cognate amino acid (tRNA with A73 is an acceptor with hydrophobic amino acids, while those with G73 are acceptors with hydrophilic amino acids). The CCA end (which docks at the A and P sites of the ribosome) is encoded by a tRNA gene sequence in eubacteria, whereas in higher organisms the CCA end is added posttranscriptionally. The D-stem-loop (so-called due to the presence of unusual dihydrouridine bases) is a stem-loop structure whose helical region consists of 3–4 bp and a loop length of 8–11 nucleotides. The anticodon stem-loop with a universal length of 7 nucleotides contains an anticodon in the middle of its loop. Similarly, the “T-loop” or “TψCloop” (T stands for ribozyl-thymidine and ψ for pseudouridine) also has a conserved length of 7 nucleotides. Between them there is a variable loop of length between 4 and 24 nucleotides. Based on the structures of their variable loops, there are two tRNAs types. A vast majority of tRNAs belongs to type I with 4–5 nucleotides in the variable loops, whereas type II tRNAs have more that 5 nucleotides that form a stem-loop structure (e.g. eubacterial tRNALeu , tRNASer and tRNATyr ) [51, 52]. The U33 in the anticodon loop is universally conserved since it is instrumental for what is known as a “U-turn” – a sharp 180◦ turn which provides a proper geometry for tRNA interactions to the P and A sites as well as translocation in the ribosome cycles of mRNA decoding [49]. The first three-dimensional structures of tRNAPhe from yeast by Klug [53], Rich [54] and their associates revealed a rigid and stable structure that assumes an L-shape with the anticodon loop and the CCA-3 -OH at opposite ends of the L (Fig. 7). There are two helical mutually perpendicular segments, with numerous non-canonical base pairings and interactions as described above. The crucial role of Mg2+ ions and polyamines is reflected in their interaction with the tRNA sugarphosphate backbone in order to stabilize the rather unusual folding of the tRNA molecule. Although an adaptor function is assigned as a central role to tRNA, it also dictates the functional and conformational states of the ribosome during the elongation cycle of protein synthesis. Finally, in addition to their central role in protein synthesis, tRNAs are involved in a series of other cellular processes ranging from viral infection to metabolism [51].
Fig. 7 Secondary (cloverleaf) and tertiary (L-shaped) structures of tRNAPhe . (A) Cloverleaf model of secondary structure of tRNAPhe from yeast with marked nucleotides that are present in nearly all tRNA species. For example, U33 is essential for translocation in the ribosome, while N73 is the so-call “discriminatory base”. (B) Three-dimensional structure of tRNAPhe – coaxially stacked helices of L-shaped form bear the acceptor arm and the anticodon loop are separated from each other by a distance of 75–80 Å. This value perfectly correlates with the distance from the coding center on the 30S ribosomal subunit to the peptidyl transferase center on the 50S ribosomal subunit.
4 Structure and Function of the tRNA Molecule 13
14 The Cellular Translation Apparatus
5 Aminoacylation Reaction
In neutral aqueous solution with free amino acids together with cognate tRNA, Mg2+ ions and ATP, the AARS catalyze an aminoacylation reaction which results in the generation of aminoacyl-tRNA (AA-tRNA). The energy required to form the aminoacyl bond comes from a high-energy pyrophosphate linkage in ATP. In the cellular milieu, these enzymes have to recognize the correct amino acid and tRNA from a large cellular pool of similar molecules (Fig. 3). Therefore, this reaction is very precisely performed via preferential binding of the enzyme to amino acids and ATP with tRNA charging as the rate-limiting step. Aminoacylation of the tRNA proceeds through a two-step reaction mechanism: both steps occur in the active site of the enzyme. The recognition of the correct amino acid occurs in a manner analogous to that by which all enzymes recognize their substrates. Each amino acid will fit into an active-site pocket in the AARS where it will bind through a network of hydrogen bonds, and electrostatic and hydrophobic interactions. Only amino acids with a sufficient number of favorable interactions will bind. The amino acid is activated by attacking a molecule of ATP at the α-phosphate, giving rise to a mixed anhydride intermediate, aminoacyladenylate, and an inorganic pyrophosphate leaving group. The ATP as substrate is complexed with Mg2+ ions which act as an electrophilic catalyst by binding to γ -and β-phosphates. An amino acid-activated carboxylic group bound to AMP (as a good leaving group) is an easy target for nucleophilic attack by the ribose 2 or 3 -OH group in the terminal adenosine of tRNA (Fig. 8). The first step takes place in the absence of tRNA; however, in some cases (GlnRS, GluRS and ArgRS) amino acid activation is tRNA dependent [55]. There is no dissociation of the aminoacyl-adenylate intermediate from the active site during the reaction and in some cases enzyme – aminoacyladenylate complex is stable enough to be isolated chromatographically. The activation reaction is usually examined by pyrophosphate exchange where radioactively labeled pyrophosphate (usually [32 P]PPi) is distributed among total PPi in the reaction mixture and γ - and β-phosphates in ATP upon equilibration [18, 56]. In this way, the determination of dissociation and velocity constants is possible, and reaction profiles accompanied with free energy changes can be derived. In the second step, the activated amino acid moiety is transferred to the 2 - or 3 -OH of the terminal ribose in the cognate tRNA, yielding the specific aminoacyltRNA and AMP (Fig. 8). The mechanisms by which the AARS recognizes the correct tRNA involve processes that proceed through an initial broad specificity interaction followed by more precise recognition and conformational changes of both AARS and cognate tRNA. Selection of the correct tRNA is assumed to occur as a result of preferential reaction kinetics for cognate protein – RNA complexes [55]. The tRNA first binds the AARS (either as a free enzyme or as an enzyme – aminoacyl-adenylate complex) with recognition depending on sequence-specific protein – RNA interactions. Then, this transient protein – RNA complex catalyzes attachment of the amino acid to the 2 - or 3 -OH-terminal adenosine of the tRNA.
6 AARS:tRNA Interactions – Identity Sets 15
Fig. 8 Chemical mechanism of the aminoacylation reaction. The majority of aminoacyltRNAs are generated in a two-step reaction, whereby the amino acid (AA) is first recognized, activated and subsequently uncharged tRNA aminoacylated with the appropriate amino acid. The aminoacyl-tRNA
then interacts with a translation elongation factor which allows its delivery to the ribosomal A site. P = phosphate, PPi = pyrophosphate, AMP = adenosine monophosphate, AA-AMP = aminoacyl-adenylate. A detailed description of the aminoacylation reaction is available in [33].
The mode of tRNA binding, especially in interactions with acceptor parts, differs between two AARS classes [51]. The two classes of AARS catalyze the same global reaction of attachment of cognate amino acid to the tRNA, but differ as to where the terminal adenosine of the tRNA of the amino acid is placed: class I enzymes act on the 2 -hydroxyl group, whereas the class II enzymes prefer the 3 -hydroxyl group. However, the aminoacyl group attached to the 2 -hydroxyl group of the terminal ribose cannot be used during protein synthesis: it is subsequently transferred by a transesterification reaction from the 2 -hydroxyl to the 3 -hydroxyl of the tRNA terminal adenosine [49].
6 AARS:tRNA Interactions – Identity Sets
tRNA is a central player in the process of protein translation. In this process, the meaning of each particular codon can easily be altered by mutation of the tRNA. This is because the ribosome matches codons with anticodons, but never checks whether tRNAs are charged with cognate, noncognate, canonical or noncanonical amino acids. The responsibility for linking specific amino acids with specific codons (via anticodon-bearing tRNAs) lies with AARS which can employ sophisticated
16 The Cellular Translation Apparatus
proofreading mechanisms, but do not always recognize the anticodon specifically [57]. In this context, the universal genetic code is malleable [58, 59]. The DNA-binding proteins can recognize and form hydrogen bonds with suitable donor or acceptor groups in the major and minor grooves of a double helix. However, the tRNA has no such uniformity in structure – its conformation is a combination of single- and double-strand regions forming stems and loops. For these reasons the AARS:tRNA interactions are very complex and appear to be different for different AARS:tRNA pairs. To a limited extent, it is possible to distinguish between class I and class II AARS based on the way in which they recognize tRNA: most of the class I enzymes (with the exception of LeuRS) require the anticodon for proper recognition, whereas some class II enzymes (SerRS and AlaRS) recognize only features in the acceptor stem [60]. In the investigations on tRNA recognition by AARS it was first postulated that there should be a general set of rules for these interactions. They are defined as the “tRNA identity” – a sum of properties that enable recognition through cognate AARS and prevent binding of other enzymes [61, 62]. Since tRNA molecules are involved in multiple interactions with other diverse components of the translational apparatus, they must have very similar secondary and tertiary structures. On the other hand, all variations in each individual tRNA that enable interactions with specific AARS have to be kept in this frame [60]. It was first believed that the tRNA anticodon is the main recognition element for AARS [63]. Many excellent examples of changes of the anticodon leading to tRNA identity changes in vitro are provided in the contemporary literature [62]. For example, the transplantation of valine anticodon nucleotides into fMet-tRNAMet changed it such that tRNAMet was a substrate for ValRS. Similarly, the “identity” of this fMet-tRNAMet was just as successfully changed to the “identity” of Tyr, Trp, Cys, Phe, Met, Ile, Gly and Gln as well [61]. In all these cases, noncognate amino acids were incorporated at the N-terminal position of a reporter protein. This in vitro system provided an invaluable model to study contributions of anticodon nucleotides to overall the identity of the tRNA. Later, the crystal structures of E. coli tRNAGln :GlnRS [64] and Saccharomyces cerevisiae tRNAAsp :AspRS [65] fully confirmed the role of anticodon nucleotides in tRNA recognition. For most of the tRNA species (17 of 20), anticodon nucleotides mainly dictate the identity of the amino acid to be charged onto an acceptor stem. However, full tRNA recognition does not solely rely to anticodons. There are other important recognition elements, especially bases at position 73 (“discriminatory base”) [52] (Fig. 7) and, in some systems, at the variable arm of tRNAs. At least three well-documented cases exist where the anticodon is not important in aminoacylation: Ser, Leu and Ala acceptor tRNAs [55]. Therefore, it should be possible to change in vitro the tRNA identity without exchange of anticodon loops. An example is the identity switches in E. coli of the amber suppressor tRNALeu though the transplantation of only 2 nucleotides from tRNASer into the variable stem [66]. Suppressor tRNALeu changed in this way served as a serine isoacceptor tRNA. Even in systems where an anticodon is the dominating identity element, the acceptor stem still plays an important role: in the first 4 bp the acceptor stems in
6 AARS:tRNA Interactions – Identity Sets 17
tRNAAla , tRNAGln , tRNAGly , tRNAMet , tRNAPro , tRNASer and tRNAHis from E. coli are markers of their own identity [55]. In addition, the “discriminatory” base 73 proved to be important in tRNACys , tRNAHis and tRNALeu from E. coli and yeast [51]. For these reasons, the concept of an “operational RNA code” for the relationship between the structures and sequences of tRNA acceptor stems and amino acids was proposed [67]. These interactions are a sort of “precursor” to the present genetic code. It was assumed that primitive AARS:tRNA recognition was primarily based on interactions with the acceptor stem or variable loop nucleotides, which is today remains rudimentary only in some AARS:tRNA systems (Ser, Leu and Ala). During the evolution of the genetic code, these amino acid recognition functions were most probably gradually shifted to codon-anticodon interactions. In fact, the “operational RNA code” might have limited the number of variations in tRNA recognition by AARS and these limitations are believed to shape the interpretation of the universal genetic code [17]. In living cells, possible competition reactions between particular AARS and non-cognate tRNAs are prevented through the presence of positive and negative recognition elements. Together, they build up an “identity set” for each tRNA isoacceptor [52]. A considerable body of experimental data has accumulated on the identity elements essential for many tRNA aminoacylation reactions over the past few decades [51, 55]. In this light, attempts to change the “identity” in most cases would require an exchange of the whole set of elements (either positive or negative) rather than exchange of single nucleotides [68]. The fidelity of aminoacylation is also controlled by positive (identity) and negative regulatory elements in tRNAs and AARS which permit both recognition and productive binding of the cognate pairs as well as discrimination against nonproductive binding of noncognate pairs [55]. The accuracy of this process is considerably high, i.e. the error rate in tRNA selection is of order of 10−6 or even higher, and constitutes one of the fundamental phenomena in protein-nucleic acid molecular recognition [69]. Identity residues, properly located in several regions of the tRNAs, trigger specific recognition and charging by the cognate AARS. Numerous currently known identity elements in tRNA charged by class I and class II AARS have already been systematically investigated, and even their tables have been established [51]. Studies on conservation of aminoacylation among different species from all kingdoms revealed paradoxical results – although the genetic code is nearly universal, there are many examples of species-specific aminoacylation reactions [70]. Crossaminoacylation experiments reported by Soma and Himeno have shown that SerRS and LeuRS from E. coli are not able to aminoacylate cognate yeast tRNAs, whereas yeast enzymes recognize E. coli cognate tRNAs [71]. In addition, prokaryotic and eukaryotic TyrRS, IleRS and GlyRS are not able to recognize the corresponding tRNA species from other organisms [72]. In several cases, cytoplasmic AARS cannot aminoacylate the corresponding mitochondrial or chloroplastic tRNA in vitro, which at least is in part due to the natural variation in the genetic code of these organelles [17]. This is also a consequence of the large functional diversity around the catalytic sites of the AARS.
18 The Cellular Translation Apparatus
7 Translational Proofreading
There are remarkable similarities between the members of the encoded amino acids. They can have the same isosteric shape (Val–Thr and Cys–Ser), the same molecular volume (Ile–Val and Pro–Cys) or very similar chemistry (Ser–Thr). Since these amino acids have very similar side-chains, a proofreading mechanism must exist in many cases to make sure that the correct amino acid is chosen. To select the correct amino acid, AARS must be able to discriminate or distinguish between all intracellular canonical and noncanonical amino acids (Fig. 3). In the simplest case, the activation of the larger amino acids might be kept down to an insignificant level by steric hindrance of their binding [73]. Such discrimination is primarily achieved by preferential binding of a cognate amino acid to the correct active site, and by exclusion of larger and chemically different substrates. If the enzyme side-chain can be distinguished adequately from any other similar sidechains, then proofreading is not necessary. For example, among the 20 canonical amino acids TyrRS is almost absolutely specific for tyrosine. Discrimination against Phe is achieved exclusively through the differences in initial binding energies [74]. Therefore, not all AARS have, or need, proofreading mechanisms. However, when differences in binding energies of a particular amino acid to AARS are inadequate, editing is used as a major determinant of enzyme selectivity. Aminoacyl-adenylate formation is the least accurate step in the tRNA aminoacylation pathway and in many cases AARS misactivate similar amino acids (e.g. Leu/Ile, Ser/Thr, etc.) at such high frequencies that formation of AARS-bound noncognate aminoacyl-adenylates is inevitable. Such noncognate adenylates are destroyed via two alternative pathways: (i) pretransfer, by hydrolysis of the noncognate aminoacyl-adenylates, or (ii) posttransfer, by the hydrolysis of the mischarged tRNA. Overall, by including an editing step in aminoacylation, considerably increased AARS selectivity (by a factor over 100) for amino acids can be obtained [75]. AARS for branched-chain amino acids like ThrRS, ValRS, LeuRS and IleRS have evolved additional domains or insertions in their structures, where each amino acid (cognate, noncognate, noncanonical, biogenic) is checked though a “double sieve” [21]. This means that a single (catalytic) active site is supplemented with an additional editing (hydrolytic) site that dissociates noncognate amino acids (Fig. 9). There is solid structural evidence for the presence of a second binding site, as found in the crystal structure of IleRS [76]. Recently, structure elucidation of class II ThrRS enzymes from E. coli and S. aureus revealed the recruitment of Zn2+ ions in the process of cognate amino acid recognition [77]. Zinc ions interact with hydroxyl and amino groups of the cognate amino acid Thr; due to presence of Zn2+ in the active center, the isosteric, but noncognate, amino acid valine is rejected. However, this reaction is not sufficient to discriminate the sterically different, but chemically similar, noncognate amino acid serine. Serine is indeed charged onto tRNAThr ; this misacylated Ser-tRNAThr is then translocated from the catalytic to the editing site through a shuttle of the acceptor (CCA) part and non-
7 Translational Proofreading
Fig. 9 The general concept of doublesieve (two-step) substrate selection in isome AARS classes (left), and the threedimensional structure of class II ThrRS with marked catalytic (aminoacylation), editing and anticodon-binding domains (right). The editing mechanism in both classes is based on the dynamics of acceptor end bind-
ing: the aminoacylated 3 -end of cognate tRNA should be faster in movement towards the editing site than aminoacyl-tRNA dissociation from the enzyme. For more detailed account, see [76, 77]. The ThrRS structure was generated by MolMol using PDB entries available at the AARS Databank (http://rose.man.poznan.pl/aars/).
cognate serine is hydrolyzed on the basis of steric size. In summary, the chemistry of discrimination is achieved by inclusion of Zn2+ in the catalytic site; discrimination based on stereochemistry is gained by addition of the editing domain to an enzyme. The generally accepted model of functional editing is that either the tRNA accepting end or the noncognate adenylate shuttles from the catalytic to the editing site (Fig. 9). Such unimolecular substrate translocations are general mechanisms involved in editing steps at other levels of genetic message transfer as well, i.e. in DNA replication as well as in translation on the ribosome [78]. The most recent finding of tRNA-dependent editing by AlaRS indicates that editing of misactivated glycine or serine in this enzyme requires a tRNA cofactor [40]. This class-specific editing domain was found to be essential for cell growth in the presence of elevated concentrations of glycine or serine. In general, there are three error-correcting levels in protein translation: (i) amino acids selection (pretransfer editing), (ii) tRNA charging (posttransfer editing) and (iii) ribosome synthesis. These processes together ensure that the error frequency in translation does not exceed on average around 0.03% [79]. The need for discrimination goes beyond the canonical 20 amino acids, since the cellular milieu is also full of metabolic intermediates and other biogenic amino acids. For example, homocysteine, a metabolic precursor of Met, is the substrate for activation via MetRS, but is efficiently edited in vitro and in vivo in a way that enzyme-bound homocysteinyl adenylate is cyclized to yield homocysteine lactone [80]. Others Met analogs, norleucine and selenomethionine, are recognized in aminoacylation reactions and subsequently loaded onto tRNAMet [81]. Obviously, E. coli and other organisms were not under the selective pressures to evolve such a mechanism to prevent their activation, tRNA loading and translation into protein sequences. In other words, editing has not evolved against specific features of chemically or sterically similar synthetic amino acids which living cells have never encountered in
19
20 The Cellular Translation Apparatus
their evolutionary paths [82]. Such a lack of absolute substrate specificity of AARS has been well documented in many systems for decades (for more details, see the excellent reviews of Jakubowski [75], Fersht [73] and Cramer [83], and references cited therein) and was very early recognized as a route for the expansion of the amino acid repertoire in vivo.
8 Ribosomal Decoding – A Brief Overview
Subsequent to their aminoacylation, aminoacyl-tRNAs, although they might be subject to further modifications, are usually ready for translation. Aminoacyl-tRNAs are channeled to the ribosomal A and P sites where elongation takes place. The events between the initial binding of aminoacyl-tRNA and amino acid translation into the polypeptide sequence can be summarized as the ribosomal elongation cycle. This cycle is divided into three basic reactions: (i) occupation of the A site by aminoacyl-tRNA, (ii) peptide bond formation where the nascent polypeptide chain is transferred from the P site tRNA onto the A site tRNA (this leaves deacylated tRNA at the P site and a peptidyl-tRNA at the A site) and (iii) translocation involving movement of mRNA:tRNA on the ribosome by one codon length, thus freeing the A site for the next incoming aminoacyl-tRNA (Fig. 6). An aminoacyl-tRNA is delivered to the ribosomal A site as the ternary complex AA-tRNA:EF-Tu:GTP where the displayed codon is specific for the cognate tRNA. Cognate codon – anticodon interactions result in the dissociation of EF-Tu from the ribosome accompanied by GTP hydrolysis and accommodation of the acceptor end of tRNA into the A site of the 50S subunit. Since many other tRNA competitors exist (in E. coli there are 41 anticodon species alone; in eukaryotic cells even more) the ribosome must discriminate between relatively large ternary complexes (selection pathway) on the basis of an anticodon (which represents a relatively small discrimination area) [50]. During the initial decoding reaction the A site is in a low-affinity state which restricts reaction of the ternary complex to mainly codon – anticodon reactions, thus enabling discrimination between cognate and noncognate interactions. Such decoding is able to sense the stereochemical correctness of the base pairing in the anticodon – codon duplex. Simple mistakes in the decoding of a codon are thought to be prevented by ribosomal proofreading, also known as “kinetic” proofreading (for a more detailed discussion, see recent extensive books and reviews [50, 84– 86]). It was observed that the second letter of the codon is most important due to its lowest detectable frequency in misreading, whereas the third codon position (“wobble” position) allows some tRNAs to recognize several different codons (Fig. 10). The pairing at the wobble positions is also influenced by the nature of tRNA nucleoside modifications. Loss of the correct reading frame (frameshift) usually means loss of genetic information due to the appearance of a premature stop codon in the A site and will result in the release of the peptide fragment from the ribosome. In other cases, such as the release of prematurely short peptidyl-tRNA from the ribosome, processing errors take place. For example, proteins whose synthesis on a ribosome has stalled
9 Codon Bias and the Fidelity of Protein Synthesis
Fig. 10 Codon – anticodon base pairing in the ribosomal 30S A site (“decoding center”). In addition to the cognate aminoacyltRNA there are numerous noncognate as well as a few “near-cognate” aminoacyltRNA species that must be efficiently discriminated. The anticodon must correctly
interact with first two nucleotides of the codon following the Chargraff pairing rules, whereas Crick’s “wobbling” is allowed at the third position. Near-cognate aminoacyltRNA is rejected initially or during proofreading after accommodation in the A site of 50S subunit. (Modified from [89].)
(e.g. due to the frequent appearance of rare codons) are marked for degradation by the addition of peptide tags to their C-termini in a reaction-mediated ssrA (for small stable RNA) RNA (also known as tmRNA or 10Sa RNA), which functions as both a tRNA and mRNA [87]. Aminoacylation of tmRNA by AlaRS is an emergency mechanism to recycle stalled ribosomes in E. coli; it results in tagging of unfinished peptide and release of the ribosome from the unproductive complex. The control mechanisms of ribosomal translation, initiation and termination are discussed in detail elsewhere [88, 89], and are still the subject of intensive research.
9 Codon Bias and the Fidelity of Protein Synthesis
Protein sequences are reproduced many times over in the cell ontogenetic and phylogenetic history by a highly accurate mechanism that guarantees the invariance of their basic structure. The fidelity at all levels of protein synthesis is guaranteed by the already discussed proofreading (i.e. error-correcting) mechanisms [69]. During replication of DNA, which is a crucial process for the preservation of species, error rates are in the range 10−6 to 10−10 , while translation errors are usually about 2 × 10−3 to 2 × 10−4 in normal E. coli cells [75]. The most common errors during protein translation are frameshifts, premature truncations, low expression rates and misincorporations [90]. The rate of these errors is, among others, dependent on the nature and the context of the particular codons in a target sequence. For example, the error rate increases by about 100 times when the supply of charged tRNAs is reduced [91]. Indeed, it is possible to demonstrate in vitro that the fidelity of the translation is directly dependent on relative concentrations of isoacceptor tRNAs, i.e. each living cell has its own set of biases for the use of the 61 codons for the amino acid and tRNA population which is in direct correlation with the
21
22 The Cellular Translation Apparatus
codon composition of total mRNA [92]. The frequency of appearance of particular synonymous codons (i.e. codon bias or codon usage) can be significantly different between genes, cell compartments, cells and organisms. Codon usage as a direct result of the degeneration of the genetic code is correlated to the abundance of a particular tRNA species in the cytosol. It is well known that evolutionary advanced, i.e. specialized, eukaryotic cells, like erythrocytes, translate hemoglobin mRNA sequences with high fidelity. Indeed during ontogenic development of highly specialized erythrocytes, its amino acid pools and supply routes are regulated and properly balanced. However, the imbalances in amino acid pools lead to errors in tRNA charging. These considerations should be particularly taken into account in the case of heterologous gene expression. For example, E. coli that produces high levels of recombinant proteins which constitute up to 30% of total cell protein mass [93] can have problems with the intracellular balance and amino acid supply. It is not surprising since such E. coli is not an evolutionary specialized cell like the erythrocyte. Thus, efficient routine heterologous gene expression in E. coli might produce target proteins with considerable higher error rates [94]. A relatively high expression level of a cloned heterologous gene might impose a request for charged tRNAs on a host cells far above the capacity of their normal tRNA population (“stringent response”) [15]. For example, the Arg codon AGA contributes on average 40% of the total tRNAArg pool in yeast, while in E. coli it is the rarest codon (contributing less than 4% to total tRNAArg population) [92]. There are well-documented cases of the expression of yeast genes in E. coli as host cells, where the higher rate of Arg → Lys side-chain substitutions in target proteins was observed [95]. This is possibly due to pronounced differences in codon usage between yeast and E. coli, and could even be enhanced through proper growth media manipulation. Thus, the presence of rare codons in heterologous mRNA resulted in the rapid clearance of specific charged isoacceptor tRNA, thus limiting higher rates of expression. Rare codons in heterologous mRNA might be responsible not only for errors in aminoacyl-tRNA selection on ribosomes and decreased rates of protein expression, but also might lead to mRNA stalling at ribosomes, and induce tagging and subsequent degradation of the partial translation product [87]. Finally, codon usage also dictates the strength of expression of a particular gene in a particular organism and serves as a signal for the control of gene expression [96]. That means that the relative abundance of intracellular tRNA species correlates directly with gene expression. The most abundant tRNA species are mainly a function of expression of those genes that are important for cell growth and metabolism [97].
10 Preprogrammed Context-dependent Recoding: fMet, Sec, Pyl, etc
The proteome of the living cell is multidimensional, i.e. a single gene can specify a number of different protein products. Indeed, organisms of all taxa are provided with a whole repertoire of pathways for molecular variations between a gene and
10 Preprogrammed Context-dependent Recoding: fMet, Sec, Pyl, etc 23
its corresponding active protein (or RNA) product, like suppression, promotion, splicing, cotranslational and posttranslational modifications. In this way, the stem – loops, pseudoknots or specific 5 or 3 sequence elements in the mRNA structure can facilitate in-frame changes of decoding (frameshift, bypass) or even induce re-coding (read-through, suppression) or reassignment of a particular codon. For example, viruses such as retroviruses have evolved strategies to bend host machinery to favor their replication by various mechanisms that result in “hijacking” normal translation. These include programmed ribosome slipping (backward or forward frameshifting), ribosome hopping (bypassing a long sequence of a message) and suppression (read-through, reassignments) in specific sequence contexts. These phenomena are not restricted to viruses; in a minority of mRNA of most organisms diverse sequence-specific instructions are preprogrammed: (i) for altering the linear mechanism of readout and (ii) for sense or nonsense codon reassignments [98]. The AUG codon is the principal signal for initiation of protein synthesis in both eukaryotes and prokaryotes. However, this initiation event is context dependent. For example, in addition to AUG, there is a requirement for the Shine – Dalgarno sequence for proper initiation in prokaryotes; in eukaryotes it is the feature of the 5 -end of mRNA that is important for initiation of protein synthesis. The context-dependent alternative reading or re-coding of the same sense AUG codon should be principally possible only if two adaptors exist. Indeed, two distinct methionyl-tRNAMet exist as substrates for the same cellular MetRS: one exclusively for initiation codons and the other for internal codons. In addition, prokaryotic Met-tRNAMet is formylated [99] (Met-tRNAMet → fMet-tRNAMet ) by intracellular enzymes in a similar manner to asparagines and glutamines (i.e. Asp-tRNAAsn → Asn-tRNAAsn ; Glu-tRNAGln → Gln-tRNAGln ) [41]. The extension of this strategy for in-frame stop codon suppression is also possible only in a given context. For this, special signals are required; these should allow the translation of machinery to distinguish between internal (those that should be reassigned) and termination coding triplets. In a limited number of genes, the in-frame UGA and UAG codons, normally termination triplets, specify the special noncanonical amino acid selenocysteine (Sec) [100] and pyrrolysine (Pyl) [101, 102]. For example, several enzymes (e.g. glutathione peroxidase, deiodinase and formate dehydrogenase) contain UGA codons almost in the middle of their coding regions and correspondingly Sec in their polypeptide sequences. In the deiodinase mRNA, a 250-bp region in the 3 -untranslated region (about 1 kbp removed from the UGA) is also required for selenocysteine incorporation. This local mRNA structure is known as the “Sec insertion element” (SECIS) that allows Sec-tRNASec to find and translate the target UGA codon (Fig. 11). Sec has it own tRNASec which is first charged with Ser by SerRS. Subsequently, the Ser-tRNASec is enzymatically modified to form Sec-tRNASec . Upon binding on a special EF-Tu-like elongation factor (SelB) Sec-tRNASec competes directly with release factor 2 (RF2) for binding to the P site of the ribosome. Briefly, the cotranslational incorporation requires concerted interactions between specialized mRNA structures (SECIS), elongation factors (SelB) and accessory proteins (selenophosphate synthase, Sec synthase) [103].
24 The Cellular Translation Apparatus
Fig. 11 Preprogrammed context-dependent re-coding of AUG, UGA and UAG codons. Structures of special canonical amino acids are presented on the left side. On the right side are RNA structures that serve as context signals for in-frame UGA (Sec) and UAG (Pyl) codon reassignment (modified from [107] with permission). The initial and AUG internal codons are distinguished by specific features of the N-terminal mRNA sequence (e.g. Shine – Dalgarno sequence element). The internal and terminal UGA (Sec) and UAG (Pyl) coding triplets are differentiated by specific (both upstream and downstream) local arrangements of the mRNA structure around the stop codon pre-
programmed for suppression. In bacteria, SelB binds to Sec-tRNASec and SECIS, resulting in suppression of UGA which is in this context cognate to Sec (in Archaea and eukaryotes this process is slightly different) [103]. In other words, the immediate context of the UGA stop signal facilitates, for example, Pyl insertion at greater that 75% efficiency [105]. This is much more than for engineered suppression of orthogonal AARS:tRNA pairs, which are usually less than 20% efficient in incorporation. The existence of the Pyl insertion element (PYLIS), which acts in a manner similar to that of SECIS, is also postulated [107].
The evolutionary rationale behind Sec context-dependent translational integration into proteins might be explained by chemical argumentation – there are significant physicochemical differences between Cys and its analog Sec [e.g. pK a (Sec) = 5.2; pK a (Cys) = 8.3]. At physiological pH 7.0 thiol function is not deprotonated, while selenols exist almost exclusively as selenolates and these are better nucleophiles than the thiolates [82]. Sec occurs as the active center of a few enzymes such as formate dehydrogenase; the Sec → Cys replacement in this enzyme lowers its turnover number by two orders of magnitude [103]. The complex chemistry of selenium requires Sec to be protected from the bulk solvent, i.e. water exclusion. Indeed, in the crystal structure of the Sec enzyme glutathione peroxidase the single active site Sec is located in a flat depression caged with aromatic and hydrophobic amino acids that sequesters the residue from the bulk solvent [104]. In the other words, an efficient cotranslational system that allows Sec protection during its translational integration into proteins evolved. While Sec is metabolically generated by a pretranslational modification of SertRNASec , i.e. via indirect tRNA-dependent amino acid biosynthesis, Pyl is directly Pyl attached to tRNACUA by pyrrolysyl-tRNA synthetase (PylRS) as a response to a single in-frame UAG codon in Methanosarcina barkeri monomethylamine methyltransferase. The PylRS is truly the 21st AARS, belonging to an archaeal class II
11 Beyond Basic Coding – Posttranslational Modifications 25 Pyl
family which charge specifically Pyl to pyrrolysyl-tRNA (tRNACUA ); lysine itself and its cognate tRNALys are not substrates of this enzyme. Krzycki and coworkers have Pyl recently demonstrated that PylRS activates Pyl with ATP and charges it to tRNACUA in vitro and in E. coli cells [105]. The biochemical necessity of Pyl in the versatile methane-producing metabolism in Methanosarcinidae is easy to understand – the crystal structure of monomethylamine methyltransferase indicates a vital role of the Pyl side-chain in methylamine activation [101]. The questions of Pyl abundance and integration into metabolism and protein structure of other species remain to be clarified in future experiments. Although versatile and multifaceted mechanisms for Sec, Pyl or fMet insertions in protein sequences exist, context dependency is the ubiquitous feature of these rather exceptional phenomena. They include recruitment of intermediary metabolism enzymes (fMet and Sec) and even an increase in the number of AARS:tRNA pairs (Pyl). They can be taxonomically assigned as species-specific local changes in the codon meaning or preprogrammed modifications of canonical decoding rules or simply context-dependent re-coding (Fig. 11). In this context, translated fMet, Sec and Pyl can be assigned as special canonical amino acids, although contemporary literature almost unanimously treats them as the 21st (Sec) [100] and 22nd (Pyl) [106, 107] canonical amino acids. Future mapping of genomes and proteomes might yield yet other unpredictable mainly species-specific surprises, which (predictably) will be baptized as the 23rd, 24th, 25th, etc., amino acids. The processes of the local changes in codon meaning are not strictly separated from the coding process; thus, they are probably as old as the basic coding itself. The extent of these phenomena is still the subject of speculation, i.e. whether such preprogrammed re-coding events are widespread mechanisms to produce minor products in addition to standard proteins from mRNA [98]. On the other hand, it is also unclear which intrinsic limitations in such variations are imposed. For example, which factors limit the variability of specific recognition mechanisms in AARS:tRNA interactions? Maybe these phenomena could be better understood in the light of growing experimental evidence which shows that aminoacylation and editing reactions play an important role in the control of diverse cellular processes [87]. However, it seems that during the evolution of advanced cellular forms, these progressed a step further and bypassed those ancestral machineries in order to replace them with more efficient enzymatic mechanisms which are strictly separated from basic coding (posttranslational modifications) as better strategies to compensate for restrictions in the expansion of the coding repertoire. Indeed, the chemical diversity of the proteins achieved by local changes in codon meaning could not rival those gained via posttranslational modifications.
11 Beyond Basic Coding – Posttranslational Modifications
Translation from the language of nucleotides to the language of amino acids takes place when the appropriate codon – anticodon pairs are linked together on the
26 The Cellular Translation Apparatus
ribosome, while all reactions beyond this point are in fact posttranslational. It is well known that only a few proteins have a final covalent structure which is a simple accurate translation of mRNA. Indeed, there are strong reasons why the release of a completed polypeptide chain from a ribosome is usually not the last chemical step in the formation of a protein. For example, the regulation of protein activity is usually accomplished at the level of its gene expression, which determines the amount to be synthesized by the cell. However, other levels of regulation also exist; they are achieved by interactions with other proteins and small molecules or by enzymatically controlled posttranslational modifications. Various covalent modifications (e.g. disulfide bridge formation, carboxypeptidase processing of basic amino acids, γ -carboxylation of Glu residues, phosphorylation, glycosylation, C-terminal amidation, etc.) often occur, either during or after assembly of the polypeptide chains. In fact, protein maturation is a result of processing that is inevitably connected with transport, noncovalent folding and covalent modification of peptide bonds and amino acid side-chains. With the delivery of new inorganic or organic groups for modifications, living cells provide protein structures with an additional versatility able to profoundly influence/modulate/change their properties and functions. Such modifications are normally enzyme-catalyzed, reversible (e.g. phosphorylation, nucleotidylation, methylation, acetylation or ADP rybosylation) or irreversible processes (e.g. limited proteolyses of zymogens like chymotrypsinogen, proelastase, procarboxypeptidase and prophospholipase) [108]. The spontaneous formation of mixed disulfides or cross-linking reactions in proteins such as collagen and elastin are examples of nonenzymatic posttranslational reactions. Protein maturation also includes cleavage of signal sequences in protein trafficking or polypeptide targeting for degradation. Other mature forms of proteins are generated by precursor cleavage (e.g. polyprotein processing, proteolytic cleavage, intein splicing), by specific modifications (e.g. L → D amino acid conversions) or by addition of various chemical groups and molecules (e.g. phosphoryl, methyl, hydroxyl, carboxyl, myristyl, prenyl, glycophospholipid, sugars, ADP-ribose, dipthamide and ubiquitin). The most versatile posttranslational chemistry is achieved at the functional groups of Tyr, Ser, Lys and Cys (Fig. 12) [109]. Although the “Anfinsen dogma” [110] predicts signals for the proper folding of a polypeptide to be contained within its amino acid sequence (i.e. principally no other proteins are needed for proper folding), some proteins do indeed require other proteins (i.e. chaperones) that assist in their folding in order to prevent aggregations or fibril formation, or cofactors such as flavins, pyridoxal and metal ions. However, disulfide bond formation, glycosylation, hydroxylation, proteolytic processing and cis/trans prolylpeptidyl isomerization can also influence the attainment of the native-folded state. In addition, sulfation, phosphorylation – dephosphorylation and a variety of other reactions might alter the conformation of a whole protein. Not surprisingly, reversible protein phosphorylation, principally on serine, threonine or tyrosine residues, plays critical roles in the regulation of many cellular processes, including the cell cycle, growth, apoptosis and signal transduction pathways [111].
11 Beyond Basic Coding – Posttranslational Modifications 27
Fig. 12 Common posttranslational modifications. Aliphatic amino acids, Lys, Arg, Thr, Cys and Ser together with Pro and His frequently serve as targets for various posttranslational modifications (upper).
Aromatic (Trp, Tyr and Phe) and Glx/Asx amino acids with their usual posttranslational modifications (lower). These modifications are listed using [108, 109, 111, 112].
A greatly expanded functional diversity achieved through posttranslational modifications of proteins can be found, especially in the evolutionarily advanced metazoans. The chemistry of these modifications is strictly separated from basic coding processes and includes selective attachment of different classes of chemical groups on defined residues in specific proteins. Protein glycosylation with carbohydrates like pentoses, hexosamines, N-acetylhexosamines, deoxyhexoses, hexoses and sialic acid, that are usually attached at residues like serine, threonine or asparagines, is one of the major classes of posttranslational modifications. Carbohydrates in the
28 The Cellular Translation Apparatus
Fig. 12 (continued).
form of asparagine-linked (N-linked) or serine/threonine (O-linked) oligosaccharides are crucial structural components of many cell-surface and secreted proteins. Glycoproteins on cell surfaces are important for communication between cells, for maintaining cell structure and for self-recognition by the immune system. Therefore, alteration of cell-surface glycoproteins can generate significant effects on cellular physiology and activity [112]. Primary amines from the protein N-terminus or lysine residues are targets for the following posttranslational chemical reactions: methylation, acetylation, farnesylation, biotinylation, stearoylation, formylation, myristoylation, palmitoylation, geranylgeranylation and lipoic acid additions [109]. Sulfhydryl group of cysteine which participates in a rather large number of posttranslational reactions (Fig. 12) is also a target for glutationylation and oxidation. The appearance of Metsulfoxide in proteins is the result of an undesirable, spontaneous oxidation. This process (that might lead to inactivation of the protein) is reversed in vivo by Met(O) reductase, an enzyme with a repair function [113]. Asparagines and glutamines are often deamidated, especially after longer protein storage in various buffers [114]. Amino acids modified in this way are assigned as “biogenic”. This class of amino acids includes metabolic products like urea cycle intermediates (ornithine, citrulline and arginosuccinate), homocysteine, β-alanine or amino acids with a special physiologic function like γ -butyric acid, creatine or thyroxine, or they can
References
arise as secondary metabolites often found in large quantities in plants and microorganisms [115]. Enzyme-mediated production of novel biogenic amino acids, accompanied by large protein functional diversification, is mainly a “privilege” of eukaryotes, especially metazoans. This is expected, since these advanced and mostly specialized cell forms with networks of membrane structures and many compartments require additional chemistries to perform their sophisticated functions. Although these operations are energetically expensive, eukaryotes can afford them by switching-off many of their basic synthetic activities. Prokaryotes indeed have a simple synthetic machinery that uses raw chemical building blocks and are thus independent of any other life form [116]. Many eukaryotes can do away with the task of completing some of the basic biological syntheses and, by retaining the basic cytoplasmic chemistry of prokaryotes, can use them as a source of essential chemicals like fats, essential amino acids, co-enzymes and minerals [117].
References 1 MAYR, E. (2001). What Evolution Is. Perseus Books, New York. 2 BUDISA, N. (2004). Prolegomena to future efforts on genetic code engineering by expanding its amino acid repertoire, Angewandte Chemie International Edition 43, 3387–3428. 3 FISCHER, E. (1906). Untersuchungen u¨ ber Aminos¨auren, Polypeptide und Proteine, Berichte der deutschen chemischen Gesellschaft 39, 530–610. 4 SANGER, F. and THOMPSON, E. O. P. (1952). The amino-acid sequence in the glycyl chain of insulin, Biochemical Journal 52, R3–R3. 5 GRIFFITH, F. (1928). The significance of pneumococcal types, Journal of Hygiene 27, 113–159. 6 AVERY, O. T., MACLEOD C. M. and MCCARTY, M. (1944). Studies on the chemical nature of the substance inducing transformation of pneumococcal types, Journal of Experimental Medicine 79, 137–157. 7 HERSHEY, A. D. and CHASE, M. (1952), Independent functions of viral protein and nucleic acid in growth of bacterio-phage, Journal of General Physiology 36, 39–56. 8 BRENNER, S., JACOB, F. and MESELSON, M. (1961). An unstable intermediate carrying information from genes to
9
10
11
12
13
14
15
ribosomes for protein synthesis, Nature 190, 576–581. BEADLE, G. W. (1957). Genetic basis of biological specificity, Journal of Allergy 28, 392–400. JACOB, F. and MONOD, J. (1961). Genetic regulatory mechanisms in the synthesis of proteins, Journal of Molecular Biology 3, 318–356. CRICK, F. H. C. (1957). On protein synthesis, Symposium of the Society for Experimental Biology 12, 138–163. DANCHIN, A., GUERDOUX-JAMET, P., MOSZER, I. and NITSCHKE, P. (2000). Mapping the bacterial cell architecture into the chromosome, Philosophical Transactions of the Royal Society of London Series B Biological Sciences 355, 179–190. DUPONT, C. (2003). Protein requirements during the first year of life, American Journal of Clinical Nutrition 77, 1544S–1549S. MEISENBERG, G. and SIMMONIS, W. H. (1998). Principles of Medical Biochemistry. Mosby, St Louis, MO. NEIDHARDT, F. C. and UMBARGER, E. N. (1996). Chemical composition of Escherichia coli. In Escherichia coli and Salmonella: Cellular and Molecular Biology (NEIDHARDT, F. C. et al. eds), pp. 13–16. 2nd Ed. ASM Press, Washington, DC.
29
30 The Cellular Translation Apparatus
16 IBBA, M. and S¨OLL, D. (1999). Quality control mechanisms during translation, Science 286, 1893–1897. 17 de POUPLANA, L. R. and SCHIMMEL, P. (2004). Aminoacylations of tRNAs: record-keepers for the genetic code. In Protein Synthesis and Ribosome Structure (NIERHAUS, K. and WILSON, D. N., eds), pp. 169–184. Wiley-VCH, Weinheim. 18 VAUGHAN, M. and STEINBERG, D. (1959). The specificity of protein synthesis, Advances in Protein Chemistry XIV, 116–173. 19 DOOLITTLE, R. F. and HANDY, J. (1998). Evolutionary anomalies among the aminoacyl-tRNA synthetases, Current Opinion in Genetics and Development 8, 630–636. 20 SZYMANSKI, M., DENIZIAK, M. and BARCISZEWSKI, J. (2000). The new aspects of aminoacyl-tRNA synthetases, Acta Biochimica Polonica 47, 821–834. 21 IBBA, M. and S¨OLL, D. (2000). Aminoacyl-tRNA synthesis, Annual Review of Biochemistry 69, 617–650. 22 YAREMCHUK, A., CUSACK, S. and TUKALO, M. (2000). Crystal structure of a eukaryote/archaeon-like prolyl-tRNA synthetase and its complex with tRNAPro(CGG), EMBO Journal 19, 4745–4758. 23 IBBA, M. and S¨OLL, D. (2003). Aminoacyl-tRNA synthetase and evolution. In Translation Mechanisms (LAPOINTE, J. and BRAKIER-GINGRAS, L., eds), pp. 25–37. Landes Biosciences, Georgetown, TX. 24 MIRANDE, M. (2003). Multi-AARS complexes. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S. eds), pp. 298–309. Landes Bioscience, Georgetown, TX. 25 ERIANI, G., DELARUE, M., POCH, O., GANGLOFF, J. and MORAS, D. (1990). Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs, Nature 347, 203–306. 26 HENKIN, T.M. (2003). Regulation of Aminoacyl-tRNA Synthetase gene expression in bacteria. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S., eds), pp.
27
28
29
30
31
32
33
34
35
309–313. Landes Bioscience, Georgetown, TX. IBBA, M., MORGAN, S., CURNOW, A. W., PRIDMORE, D. R., VOTHKNECHT, U. C., GARDNER, W., LIN, W., WOESE, C. R. and S¨OLL, D. (1997). A euryarchaeal Lysyl-tRNA synthetase: resemblance to class I synthetases, Science 278, 1119–1122. BLANQUET, S., CREPIN, T. MECHULAM, Y. and SCHMITT, E. (2003). Methionyl-tRNA synthetases. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S., eds), pp. 47–58. Landes Biosciences, Georgetown, TX. BEDOUELLE, H. (2003). Tyrosyl-tRNA synthetases. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S., eds), pp. 111–125. Landes Biosciences, Georgetown, TX. DUBOIS, D. Y., LAPOINTE, J. and SHUNICHI, S. (2003). Glutamyl-tRNA synthetases. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S., eds), pp. 89–94. Landes Biosciences, Georgetown, TX. BRICK, P. and BLOW, D. M. (1987). Crystal-structure of a deletion mutant of a tyrosyl-transfer RNA-synthetase complexed with tyrosine, Journal of Molecular Biology 194, 287–297. GAY, G. D., DUCKWORTH, H. W. and FERSHT, R. A. (1993). Modification of the amino-acid specificity of tyrosyl-tRNA synthetase by protein engineering, FEBS Letters 318, 167–171. FIRST, E. A. (2003). Catalysis of the tRNA aminoacylation reaction. In Aminoacyl-tRNA Synthetases (IBBA, M. FRANCKLYN, C. and CUSACK, S., eds), pp. 328–347. Landes Bioscience, Georgetown, TX. CARTER, C. W. (2003). Trytophanyl-tRNA synthetases. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S., eds) pp. 99–108. Landes Bioscience, Georgetown, TX. CUSACK, S., COLOMINAS, C., H¨ARTLEIN, M., NASSAR, N. and LEBERMAN, R. (1990). A second class of synthetase structure revealed by X-ray-analysis of Escherichia coli seryl-transfer RNA synthetase at 2.5Å, Nature 347, 249–255.
References 36 WEYGAND-DURASEVIC, I. and CUSACK, S. (2003). Seryl-tRNA synthetases. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S., eds), pp. 177–190. Landes Bioscience, Georgetown, TX. 37 IBBA, M., FRANCKLYN, C. and CUSACK, S. (eds) (2003). Aminoacyl-tRNA Synthetases. Landes Bioscience, Georgetown, TX. 38 LAPOINTE, J. and BRAKIER-GINGRAS, L. (2003). Translation Mechanisms, Landes Bioscience, Georgetown, TX. 39 NUREKI, O., KOHNO, T., SAKAMOTO, K., MIYAZAWA, T. and YOKOYAMA, S. (1993). Chemical modification and mutagenesis studies on zinc-binding of aminoacyl-transfer RNA-synthetases, Journal of Biological Chemistry 268, 15368–15373. 40 SWAIRJO, M. A. and SCHIMMEL, P. R. (2005). Breaking sieve for steric exclusion of a noncognate amino acid from active site of a tRNA synthetase, Proceeding of the National Academy of Sciences of the USA 102, 988–993. 41 FENG, L., TUMBULAHANSEN, D., MIN, B., NAMGOONG, S., SALAZAR, J., ORELLANA, O. and S¨OLL, D. (2003). Transfer RNA-dependent amidotransferases: key enzymes for Gln-tRNA and Asn-tRNA synthesis in nature. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S., eds), pp. 314–320. Landes Bioscience, Georgetown, TX. 42 STATHOPOULOS, C., LI, T., LONGMAN, R., VOTHKNECHT, U. C., BECKER, H. D., IBBA, M. and S¨OLL, D. (2000). One polypeptide with two aminoacyl-tRNA synthetase activities, Science 287, 479–482. 43 IBBA, M., BECKER, H. D., STATHOPOULOS, C., TUMBULA D. L. and S¨OLL, D. (2000). The Adaptor hypothesis revisited, Trends in Biochemical Sciences 25, 311–316. 44 ZAMECNIK, P. C. (1960). Historical and current aspects of the problem of protein synthesis, Harvey Lectures 54, 256–279. 45 HOAGLAND, M. B., and ZAMECNIK, P. C. (1957). Intermediate reactions in protein biosynthesis, Federation Proceedings 16, 197–197.
46 WATSON, J. D., HOPKINS, N. H., ROBERTS, J. W., STEITZ, J. A. and WEINER, A. M. (1988). Molecular Biology of the Gene. Benjamin Cummings, Amsterdam. 47 CHAPEVILLE, F., EHRENSTEIN, G. V., BENZER, S., WEISBLUM, B., RAY W. J. and LIPMANN, F. (1962). On the role of soluble ribonucleic acid in coding for amino acids, Proceedings of the National Academy of Sciences of the USA 48, 1086–1092. 48 HOLLEY, R. W., APGAR, J. EVERETT, G. A., MADISON, J. T., MERRILL, S. H., PENSWICK, J. R. and ZAMIR, A. (1965). Structure of an alanine transfer RNA. Federation Proceedings 24, 216. 49 MARQUEZ, V. and NIERHAUS, K. (2003). tRNA and synthesis. In Protein Synthesis and Ribosome Structure (NIERHAUS, K. and WILSON, D. N., eds), pp. 145–183. Wiley-VCH, Weinheim. 50 NIERHAUS, K. and WILSON, D. N. (eds) (2004). Protein Synthesis and Ribosome Structure. Wiley-VCH, Weinheim. 51 GIEGE, R. and FRUGIER, M. (2003). Transfer RNA structure and identity. In Translation Mechanisms (LAPOINTE, J. and BRAKIER-GINGRAS, L., eds), pp. 1–24. Landes Biosciences, Georgetown, TX. 52 GIEGE, R., SISSLER, M. and FLORENTZ, C. (1998). Universal rules and idiosyncratic features in tRNA identity, Nucleic Acids Research 26, 5017–5035. 53 ROBERTUS, J. D., LADNER, J. E., FINCH, J. T., RHODES, D., BROWN, R. S., CLARK, B. F. and KLUG, A. (1974). Structure of yeast phenylalanine tRNA at 3 Å resolution, Nature 250, 546–551. 54 KIM, S. H., SUDDATH, F. L., QUIGLEY, G. J., MCPHERSON, A., SUSSMAN, J. L., WANG, A. H. J., SEEMAN N. C. and RICH, A. (1974). 3-Dimensional tertiary structure of yeast phenylalanine transfer-RNA, Science 185, 435–440. 55 BEUNING, P. J. and MUSIER-FORSYTH, K. (1999). Transfer RNA recognition by aminoacyl-tRNA synthetases, Biopolymers 52, 1–28. 56 CALENDAR, R. and BERG, P. (1966). Catalytic properties of tyrosyl ribonucleic acid synthetases from Escherichia coli and Bacillus subtilis, Biochemistry 5, 1690–1695.
31
32 The Cellular Translation Apparatus
57 SCHIMMEL, P. R. and S¨OLL, D. (1979). Aminoacyl transfer RNA-synthetases – general features and recognition of transfer-RNAs, Annual Review of Biochemistry 48, 601–648. 58 SCHULTZ, D. W., and YARUS, M. (1994). Transfer-RNA mutation and the malleability of the genetic code, Journal of Molecular Biology 235, 1377–1380. 59 SCHULTZ, D. W., and YARUS, M. (1996). On malleability in the genetic code, Journal of Molecular Evolution 42, 597–601. 60 HENDRICKSON, T. L. and SCHIMMEL, P. (2003). Transfer RNA-dependent amino acid discrimination by aminoacyl-tRNA synthetase. In Translation Mechanisms (LAPOINTE, J. and BRAKIER-GINGRAS, L., eds), pp. 35–69. Landes Bioscience, Georgetown, TX. 61 SCHULMAN, L. H. A. J. (1988). Recent excitement in understanding transfer RNA identity, Science 240, 1591–1592. 62 MCCLAIN, W. H. (1993). Transfer-RNA identity, FASEB Journal 7, 72–78. 63 SHIMIZU, M., ASAHARA, H., TAMURA, K., HASEGAWA, T. and HIMENO, H. (1992). The role of anticodon bases and the discriminator nucleotide in the recognition of some Escherichia coli transfer-RNAs by their aminoacyl-transfer RNA-synthetases, Journal of Molecular Evolution 35, 436–443. 64 ROULD, M. A., PERONA, J. J., S¨OLL, D. and STEITZ, T. A. (1989). Structure of E. coli glutaminyl-tRNA synthetase complexed with tRNA(Gln) and ATP at 2.8Å resolution, Science 246, 1135–1142. 65 CAVARELLI, J., REES, B., RUFF, M., THIERRY J. C. and MORAS, D. (1993). Yeast transfer RNA (Asp) recognition by its cognate class-II aminoacyl-transfer RNA synthetase, Nature 362, 181–184. 66 SAKS, M. E., SAMPSON J. R. and ABELSON, J. N. (1994). The transfer-RNA identity problem – a search for rules, Science 263, 191–197. 67 de POUPLANA, L. R., and SCHIMMEL, P. (2001). Operational RNA code for amino acids in relation to genetic code in evolution, Journal of Biological Chemistry 276, 6881–6884.
68 SCHIMMEL, P. and S¨OLL, D. (1997). When protein engineering confronts the tRNA world, Proceedings of the National Academy of Sciences of the USA 94, 10007–10009. 69 JAKUBOWSKI, H. (2003). Accuracy of aminoacyl-RNA synthetases: proofreading of amino acids. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S., eds), pp. 384–396. Landes Bioscience, Georgetown, TX. 70 WAKASUGI, K., QUINN, C. L., TAO N. J. and SCHIMMEL, P. (1998). Genetic code in evolution: switching species-specific aminoacylation with a peptide transplant, EMBO Journal 17, 297–305. 71 SOMA, A. and HIMENO, H. (1998). Cross-species aminoacylation of tRNA with a long variable arm between Escherichia coli and Saccharomyces cerevisiae, Nucleic Acids Research 26, 4374–4381. 72 NAIR, S., de POUPLANA, L. R., HOUMAN, F., AVRUCH, A., SHEN, X. Y. and SCHIMMEL, P. (1997). Species-specific tRNA recognition in relation to tRNA synthetase contact residues, Journal of Molecular Biology 269, 1–9. 73 FERSHT, A. R. (1981). Enzymic editing mechanisms and the genetic code, Proceedings of the Royal Society of London Series B Biological Sciences 212, 351–379. 74 FERSHT, R. A., SHINDLER J. S. and TSUI, W. C. (1980). Probing the limits of protein-amino acid side chain recognition with the AARS: discrimination against Phe by tyrosyl-tRNA synthetases, Biochemistry 19, 5520–5524. 75 JAKUBOWSKI, H. and GOLDMAN, E. (1992). Editing of errors in selection of amino acids for protein synthesis, Microbiological Reviews 56, 412–429. 76 FUKAI, S., NUREKI, O., SEKINE, S., SHIMADA, A., TAO, J. S., VASSYLYEV D. G. and YOKOYAMA, S. (2002). Structural basis for double-sieve discrimination of L-valine from L-isoleucine and L-threonine by the complex of tRNA(Val) and valyl-tRNA synthetase, Cell 103, 352–362.
References 77 DOCK-BREGEON, A. C., REES, B., TORRES-LARIOS, A., BEY, G., CAILLET, J. (2004). Achieving error-free translation: the mechanism of proofreading of threonyl-tRNA synthetase at atomic resolution, Molecular Cell 16, 375–386. 78 FERSHT, A. R. (1980). Enzymic editing mechanisms in protein synthesis and DNA-replication, Trends in Biochemical Sciences 5, 262–265. 79 FRANCKLYN, C. (2003). tRNA synthetase paralogs: evolutionary links in the transition from tRNA-dependent amino acid biosynthesis to de novo biosynthesis, Proceedings of the National Academy of Sciences of the USA 100, 9650–9652. 80 JAKUBOWSKI, H. (2000). Translational incorporation of S-nitrosohomocysteine into protein, Journal of Biological Chemistry 275, 21813–21816. 81 FERSHT, A. R. and DINGWALL, C. (1979). Editing mechanism for the methionyl-tRNA synthetase in the selection of amino acids in protein synthesis, Biochemistry 18, 1250–1256. 82 BUDISA, N., MINKS, C., ALEFELDER, S., WENGER, W., DONG, F. M., MORODER, L. and HUBER, R. (1999). Toward the experimental codon reassignment in vivo: protein building with an expanded amino acid repertoire, FASEB Journal 13, 41–51. 83 IGLOI, G. L., VON DER HAAR, F. and CRAMER, F. (1977). Hydrolytic action of aminoacyl-tRNA synthetases from baker’s yeast – chemical proofreading of Thr-tRNA-Val by valyl-tRNA synthetase studied with modified tRNA-Val and amino acid analogs, Biochemistry 16, 1696–1702. 84 WILSON, D. N., and NIERHAUS, K. H. (2003). The ribosome through the looking glass, Angewandte Chemie International Edition 42, 3464–3486. 85 RAMAKRISHNAN, V. (2002). Ribosome structure and the mechanism of translation, Cell 108, 557–572. 86 MOORE, P. B. and STEITZ, T. A. (2003). The structural basis of large ribosomal subunit function, Annual Review of Biochemistry 72, 813–850.
87 KEILER, K. C., WALLER P. R. H. and SAUER, R. T. (1996). Role of a peptide tagging system in degradation of proteins synthesized from damaged messenger RNA, Science 271, 990–993. 88 RODNINA, M. V. and WINTERMEYER, W. (2001). Ribosome fidelity: tRNA discrimination, proofreading and induced fit, Trends in Biochemical Sciences 26, 124–130. 89 OGLE, J. M., CARTER, A. P. and RAMAKRISHNAN, V. (2003). Insights into the decoding mechanism from recent ribosome structures. Trends in Biochemical Sciences 128. 90 EMILSSON, V. and KURLAND, C. G. (1990). Growth rate dependence of transfer-RNA abundance in Escherichia coli, EMBO Journal 9, 4359–4366. 91 SORENSEN, M. A., KURLAND C. G. and PEDERSEN, S. (1989). Codon usage determines translation rate in Escherichia coli, Journal of Molecular Biology 207, 365–377. 92 OSAWA, S., JUKES, T. H., WATANABE, K. and MUTO, A. (1992). Recent evidence for evolution of the genetic code, Microbiological Reviews 56, 229–264. 93 STUDIER, F. W., ROSENBERG, A. H., DUNN, J. J. and DUBENDORFF, J. W. (1990). Use of T7 RNA-polymerase to direct expression of cloned genes, Methods in Enzymology 185, 60–89. 94 KANE, J. F. (1995). Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli, Current Opinion in Biotechnology 6, 494–500. 95 CALDERONE, T. L., STEVENS R. D. and OAS, T. G. (1996). High-level mis-incorporation of lysine for arginine at AGA codons in a fusion protein expressed in Escherichia coli, Journal of Molecular Biology 262, 407–412. 96 KURLAND, C. G. (1991). Codon bias and gene expression, FEBS Letters 285, 165–169. 97 KURLAND, C. G. (1992). Translational accuracy and the fitness of bacteria, Annual Review of Genetics 26, 29–50. 98 POOLE, E. S., MAJOR, L. L., CRIDGE, A. G. and TATE, W. P. (2003). The mechanism
33
34 The Cellular Translation Apparatus
99
100
101
102
103
104
105
106
of recoding in pro- and eukaryotes. In Protein Synthesis and Ribosome Structure (NIERHAUS, K. and WILSON, D. N., eds), pp. 397–426. Wiley-VCH, Weinheim. DURY, A. (1968). Formyl-methionine tRNA and valine tRNA available, Science 162, 621. BOCK, A., FORCHHAMMER, K., HEIDER, J., LEINFELDER, W., SAWERS, G., VEPREK, B. and ZINONI, F. (1991). Selenocysteine – the 21st amino-acid, Molecular Microbiology 5, 515–520. SRINIVASAN, G., JAMES C. M. and KRZYCKI, J. A. (2002). Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA, Science 296, 1459–1462. HAO, B., GONG, W. M., FERGUSON, T. K., JAMES, C. M., KRZYCKI J. A. and CHAN, M. K. (2002). A new UAG-encoded residue in the structure of a methanogen methyltransferase, Science 296, 1462–1466. BOCK, A., THANBICHLER, M., ROTHER, M. and RESCH, A. (2003). Selenocysteine. In Aminoacyl-tRNA Synthetases (IBBA, M., FRANCKLYN, C. and CUSACK, S., eds), pp. 320–328. Landes Bioscience, Georgetown, TX. LADENSTEIN, R., EPP, O., BARTELS, K., JONES, A., HUBER, R. and WENDEL, A. (1979). Structure analysis and molecular model of the selenoenzyme glutathione peroxidase at 2.8 Å resolution, Journal of Molecular Biology 134, 199–218. BLIGHT, S. K., LARUE, R. C., MAHAPATRA, A., LONGSTAFF, D. G., CHANG, E., ZHAO, G., KANG, P. T., GREEN-CHURCH, K. B., CHAN, M. K. and KRZYCKI, J. A. (2004). Direct charging of tRNA(CUA) with pyrrolysine in vitro and in vivo, Nature 431, 333–335. ATKINS J. F. and GESTELAND, R. (2002). The 22nd amino acid, Science 296, 1409–1410.
107 SCHIMMEL, P. and BEEBE, K. (2004). Genetic code sizes pyrrolysine, Nature 431, 257–258. 108 GRAVES, D. J. MARTIN, B. L. and WANG, J. H. (1994). Co- and Post-Translational Modification of Proteins-Chemical Principles and Biological Effects. Oxford University Press, Oxford. 109 KANNICHT, C. (2002). Posttranslational Modification of Proteins Tools for Functional Proteomics. Humana Press, Totowa, NJ. 110 BALDWIN, R. L. (1996). Protein folding from 1961 to 1982, Nature Structural and Molecular Biology 6, 814–817. 111 CLEMENS, M. J. (1997). Protein Phosphorylation in Cell Growth Regulation. Gordon & Breach, London. 112 BROCKHAUSEN, I. and KUHNS, W. (1997). Glycoproteins and Human Disease. Landes Bioscience, Georgetown, TX. 113 VOGT, W. (1995). Oxidation of methionyl residues in proteins – tools, targets, and reversal, Free Radical Biology and Medicine 18, 93–105. 114 WRIGHT, H. T. (1991). Nonenzymatic deamidation of asparaginyl and glutaminyl residues in proteins, Critical Reviews in Biochemistry and Molecular Biology 26, 1–52. 115 ROSENTHAL, G. (1982). Plant Non-protein Amino and Imino Acids. Biological, Biochemical and Toxicological Properties. Academic Press, New York. 116 BUDISA, N., MORODER, L. and HUBER, R. (1999). Structure and evolution of the genetic code viewed from the perspective of the experimentally expanded amino acid repertoire in vivo, Cellular and Molecular Life Sciences 55, 1626–1635. 117 WILLIAMS, R. J. P. and FRAU’STO DA SILVA, J. J. R. (1996). The Natural Selection of the Chemical Elements: The Environment and Life’s Chemistry. Oxford University Press, New York.
1
The Ribosomal Elongation Cycle Knud H. Nierhaus Max Planck Institute for Molecular Genetics, Berlin, Germany
Originally published in: Protein Synthesis and Ribosome Structure. Edited by Knud H. Nierhaus and Daniel N. Wilson. Copyright ľ 2004 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30638-1
1 Introduction
The elongation cycle (parts of previous reviews of the Nierhaus group [1–6] have been integrated in this chapter) is the heart of protein synthesis. Each “heart beat” of the ribosome prolongs the nascent polypeptide chain by a single amino acid. The elongation cycle can be divided into the following three basic reactions: (i) Occupation of the A-site by the incoming tRNA. This step can be further subdivided into (a) the decoding reaction, which mainly restricts the aminoacyl-tRNA and ribosome interactions to codon–anticodon interactions (low-affinity interaction) and (b) the accommodation reaction, where high-affinity binding of the whole tRNA to the A-site results in the docking of the aminoacyl residue into the peptidyl transferase (PTF) center of the large subunit. (ii) Peptide-bond formation, where the nascent polypeptide chain is transferred from the P-site tRNA onto the aminoacyl moiety of the A-site tRNA. This leaves a deacylated (or uncharged) tRNA at the P-site and a peptidyl-tRNA at the A-site (with the nascent chain extended by one amino acid). (iii) Translocation involves the movement of mRNA•tRNA2 on the ribosome by one codon length so as to place the deacylated tRNA into the Esite and the peptidyl-tRNA into the P-site, thus freeing the A-site for the next incoming aminoacyl-tRNA. Progression of the ribosome through these various stages of the elongation cycle is catalyzed by protein factors called elongation factors, specifically elongation factor G (EF-G) and elongation factor Tu (EF-Tu) in bacteria. These factors transiently interact with the ribosome at specific points during the elongation cycle to facilitate movement onto the following stage of the cycle, e.g., EFTu facilitates delivery of the aa-tRNA to the A-site (step 1), whereas EF-G mediates movement or translocation of the tRNAs from the A- and P-sites to the P-and E-sites, respectively (step 3). Translocation shifts the ribosome from a pre-translocational Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 The Ribosomal Elongation Cycle
(PRE) state to a post-translocational state (POST). Both factors have been visualized by cryo-EM in functional complexes with the ribosome at various stages during the elongation cycle [7–10]. By combining these reconstructions with those of PRE and POST tRNA•70S ribosome images, a comprehensive overview of the elongation cycle has been constructed from cryo-EM images [11] (Fig. 1). The ribosomal state before translocation (PRE in Fig. 1c) is characterized by tRNAs at the A- and P-sites and, following translocation (POST as seen in Fig. 1f), by tRNAs at the P- and E-sites. The PRE and POST states represent the main states of the active ribosome and are separated by high activation-energy barriers (about 80–90 kJ mol−1 ; [12]). The POST state probably represents a lower energy level of the ribosome than the PRE state. This is indicated by the fact that after peptide-bond formation an incubation at 37◦ C for 2 min is sufficient to promote translocation from the PRE to the POST state in the absence of the translocation factor EF-G and GTP (“spontaneous translocation“, [13–15]). In contrast, the reverse translocation has never been observed with an isolated POST state. Therefore, during the elongation cycle, the ribosome can be thought of as oscillating between the two main states, namely the PRE and POST states. Even ribosomes carrying only a P-site tRNA (e.g., AcPhe-tRNA), a state that is referred to as the Pi state, seems to exist in two different coformations, as evident from the differential effects of puromycin, tetracycline, viomycin, and thiostrepton on the two subpopulations of ribosomes [16]. Finally, even the empty ribosome can adopt two different conformations, one resembling the PRE, the other the POST conformation, as judged by synergistic effects of EF-Tu and EF-G on their respective uncoupled GTPases [17]. By reducing the activation energy barrier that exists between the PRE and POST states, the elongation factors significantly accelerate protein synthesis by more than four orders of magnitude (104-fold; a spontaneous translocation in the absence of EF-G lasts about 2 min [13], and an enzymatic translocation in the presence of EF-G about 30 µs [18]). Therefore, they resemble enzymes which also lower the activation energy of a reaction and, due to the enormous acceleration factor of 106 –1012 achieved, just enable a reaction to occur [19]. But enzymes accelerate a reaction only until the equilibrium is reached, i.e., enzymes can usually catalyze both the forward and reverse reactions. In this respect, the elongation factors are more specific since they not only accelerate the reaction rate but also determine the direction of the reaction: EF-Tu catalyzes the POST→PRE transition and EF-G the reverse PRE→POST reaction. This unidirectional mechanism of action of the elongation factors explains why the presence of two elongation factors is universally conserved. Only the higher fungi such as Saccharomyces cerevisiae or Candida albicans require a third factor, EF3, which is essential for protein synthesis playing a role in removal of deacylated tRNA from the E-site of the ribosome [20]. Both the universal elongation factors are prototypes of the large superfamily of G-proteins. Like all G-proteins, these elongation factors are GTPases and follow a cycle of activation in the presence of GTP (the “ON” conformation) and deactivation when GDP is bound (“OFF” conformation) [21]. In the “ON” or GTP conformation,
1 Introduction
Fig. 1 Overview of the translation cycle. Multiple cryoelectron microscopic studies have determined the tRNA and elongation factor-binding positions on the 70S ribosome during different stages of the elongation cycle (see Ref. [11] and references therein). These positions of the ribosomal ligands have been overlaid on to an 11.5
Å resolution three-dimensional (3D) map of the ribosome to generate a schematic overview of the elongation cycle, the details of which are provided in the text. The small 30S subunit is in yellow, the 50S large subunit in blue. Adapted from Agrawal et al. [11].
3
4 The Ribosomal Elongation Cycle
G-pro-teins bind to their target substrate, be it a protein or complex, to trigger their specific reaction. Subsequently, the GTPase center of the G-protein is activated and the terminal phosphate residue of the bound GTP molecule is cleaved off to yield GDP. The G-protein now adopts the deactivated “OFF” conformation, which has a lower affinity for the target substrate and therefore dissociates from it. For the cycle to be repeated, GDP must be exchanged for GTP on the G-protein. This means for the elongation factors, the energy liberated by the elongation-factor-dependent GTP hydrolysis is used for the release of the elongation factors after they have done their job rather than for the reaction triggered by the elongation factors in their “on” state (a different view for EF-G was proposed by the “motor protein” hypothesis; see Sect. 5.1).
2 Models of the Elongation Cycle
Before we describe the molecular details of the individual reactions of the elongation cycle, we will briefly and critically consider the two prevailing models of the elongation cycle, namely the hybrid site and α–ε model. 2.1 The Hybrid-site Model for Elongation
This model is based on the observation that bases of rRNA were protected against chemical modification when tRNAs were bound specifically to the ribosomal A-, P- and E-sites. Each tRNA position on the ribosome was correlated with a specific protection pattern, e.g., binding of AcPhe-tRNA (a simple mimic of a peptidyl-tRNA) to the P-site defined the “P-site pattern”, whereas binding of a ternary complex PhetRNA-EF-Tu-GTP to the A-site (after pre-filling the P-site with a deacylated-tRNA) produced an “A-site pattern“. The “E-site pattern” was derived simply from the binding of deacylated tRNA to the E-site, but was not very well defined in the poly(U)-dependent system used [22]. When AcPhe-tRNA was bound to the P-site and the protection patterns were assessed both before and after “peptide-bond formation” (or more accurately, the transfer of the AcPhe to puromycin in the A-site), the P-site pattern shifted to an E-site pattern on the 23S rRNA after peptide-bond formation, whereas an unaltered P-site pattern remained on the 16S rRNA. The conclusion was that the tRNA was in a P/P state before and in a P/E state after the puromycin reaction (where the letter before and after the slash indicate the site bound by the tRNA on the 30S and 50S subunit, respectively). In another experiment, ternary complex was added to ribosomes carrying an AcPhe-tRNA at the P-site, which following peptide-bond formation would put an AcPhe-Phe-tRNA at the A-site and a deacylated tRNA at the P-site. An “A plus P” pattern was observed on the 16S rRNA, whereas the 23S rRNA exhibited a “P plus E” pattern. These results were interpreted such that the analog of the peptidyl-tRNA was in an A/P-site and the deacylated tRNA in a P/E
2 Models of the Elongation Cycle 5
site. Following translocation, the protection patterns suggested that the tRNAs were in the P/P and E-sites, where the E-site is located solely on the 50S subunit rather than on the 30S subunit. The hybrid-site model [22] in its original form is depicted in Fig. 2(A) with its diagnostic feature of hybrid sites after peptide-bond formation: the tRNAs pass through the ribosome with a “creeping” movement. On the 30S the picture follows the classical scheme outlined in the preceding section; however on the 50S, the movement is proposed to start after peptide-bond formation but before EF-G-dependent translocation. The model has been modified in light of the crystal structure [23], where the E-site tRNA shows intensive contacts with the small subunit, such that now the tRNA is found in the E/E-site after translocation [24]. A further change was introduced so that after peptide-bond formation the tRNAs initially remain at the classical A/A- and P/P-sites, before they shift, after an undefined time period, into the hybrid A/P- and P/E-sites, respectively [24]. These changes were necessary to account for the observation that no hybrid sites were observed in a systematic study of the elongation cycle ([11]; also see below) and the CCA ends did not move after peptide-bond formation in a 50S crystal [25]. In functional studies, the P-site is operationally defined as the site where an acylated-tRNA can react with puromycin, in contrast with an A-site location where it is for all intense and purposes puromycin unreactive. The observation that an acylated tRNA can in fact undergo the puromycin reaction [26] does not seriously challenge this definition, since the latter reaction differs qualitatively from that of the P-site reaction in that it is extremely slow. For example, under conditions of 6 mM Mg2+ and in the presence of polyamines, the A-site puromycin reaction proceeds 200 times slower than that in the P-site (A. Potapov, C. Spahn and K. H. Nierhaus, unpublished results). Note that the hybrid-site model uncouples the functional definition of the P-site from the structural one, because this model locates the peptidyl residue at the P-site of the PTF center (on the 50S subunit) after the peptide-bond formation (because the peptidyl-tRNA is at the A/P-site), yet the peptidyl residue is not puromycin-reactive. Consequently, this model is forced to distinguish between two P-site positions on the 50S subunit, one that is puromycinreactive and one that is not. Finally, the problem of moving the (tRNA)2 •mRNA complex has not become significantly simpler, even though the tRNA movement from one site to the other now occurs in two steps. There are two major criticisms that should be considered when applying the hybrid-site model: 1. The concept of hybrid sites rests solely on the protection patterns of the 23S rRNA. However, the protections of 16 out of 17 bases of 23S rRNA were dependent on the ultimate A76 residue or CA76 -3 residues of the universal CCA-3 terminus of the tRNAs [27]. In contrast with the hybrid-site perception, the crystal structure of the 50S subunit after peptide-bond formation demonstrates that the CCA ends at A- and P-sites do not move [25]. 2. The experiments from which the model is derived employs a vast range of Mg2+ concentrations ranging from 5 to 25 mM [27, 28]. However, it is well
6 The Ribosomal Elongation Cycle
Fig. 2 Models of the elongation cycle: (A) The hybrid-site model according to Ref. [22]. For explanation see text. The essence of this model is a creeping movement of the tRNAs through the ribosome. (B) The "–g model of the elongation cycle according to Ref. [49]. The essential feature is a movable ribosomal "–g domain that connects both subunits through the intersubunit space, binds both tRNAs of an elongating ribosome and carries them from the A- and Psites to the P- and E-sites, respectively, dur-
ing translocation. The model keeps all the features of the allosteric-three-site model (see text), but explains the reciprocal linkage between A- and E-sites by the fact that the "–g domain moves out of the A-site during translocation leaving the decoding center alone at the A-site rather than by an allosteric coupling. In (B) Yellow and green represent the two binding regions of the "–g domain; blue, the decoding center at the A-site.
2 Models of the Elongation Cycle 7
known that the binding properties and the interdependencies of the various sites are extremely sensitive to changes in Mg2+ concentration [29]. This sensitivity probably reflects an increasing distortion of the ribosome with increasing Mg2+ concentration, which one would expect affects a fine-structure analysis such as the chemical probing of the rRNA bases. In fact, a deacylated tRNA bound to programed ribosome under conventional ionic conditions was found exclusively at the P/E hybrid site, whereas under more in vivo ionic conditions, the deacylated tRNA was found at the classical P-site, suggesting that the hybrid-site concept may simply be a buffer artefact [30]. A systematic analysis of tRNA-binding sites during elongation did not provide any evidence for hybrid states of a tRNA [11]. Recently, cryo-EM of PRE state ribosomes prepared in a different laboratory also showed no evidence for the presence of hybrid states [31]. In this study, the post-peptide-bond formation of PRE state ribosome complexes had a dipeptidyl tRNA at the classical A-site and not at the A/P hybrid site as would be predicted by the hybrid-site model. From these independent studies, it is at least clear that a ribosome containing a hybrid site does not represent a significant proportion of the elongating ribosome population. Furthermore, the crystal structure of a programed ribosome containing three deacylated tRNAs at 5.5 Å resolution identified a deacylated tRNA at the classical P-site [23], rather than at the P/E hybrid site as might be expected from the hybrid-site model.
Contrary to the belief of some, the ratchet model, where the subunits twist relative to each other in a forward-and-back movement by about 4.5◦ during translocation [32], does not in fact support the hybrid-site model. This is because the forwardand-back ratchet movements occur during one translocation step, i.e., the tRNAs are at A- and P-sites before translocation and at P- and E-sites after translocation (see Sect. 5.2). However, it could be that during one translocation reaction the two tRNAs do transiently pass through a hybrid position when they go from the A- and P-sites to the P- and E-sites, respectively. In fact, this interpretation is supported by a cryo-EM analysis of the translocation movement [31], suggesting that hybrid states are indeed translocation intermediates. 2.2 The Allosteric Three-site Model (α–e Model; Reciprocal Coupling between the A- and E-sites)
Under unfavorable buffer conditions the dissociation rate of a tRNA at the E-site was reported to be 0.3 s−1 [33], whereas the elongation rate, determined under comparable conditions, was reported to be 10 times faster (3 s−1 ) [34]. This alone suggests that an active mechanism must exist to eject the deacylated tRNA from the E-site. Under near in vivo conditions, the situation is even more significant: the dissociation rate of an E-site tRNA from an isolated POST state is much lower and can be measured in hours rather than seconds. POST states can be isolated via overnight centrifugation through sucrose cushions without loss of any deacylated tRNA from the E-site (see, e.g., Ref. [35]), and native polysomes isolated using a
8 The Ribosomal Elongation Cycle
procedure lasting longer than 24 h contain an occupied E-site almost quantitatively [36]. It is clear that the release of a deacylated tRNA from the E-site must be an active process and cannot occur via a simple diffusion process as considered by some authors (see, e.g., Ref. [37]). In fact, crystal structures of 70S•tRNA complexes [23] and cryo-EM studies [31] have demonstrated stable and tight E-site binding, although a mechanism for the E-tRNA release was not deduced. A reciprocal linkage between A- and E-sites has been identified as being the active mechanism for E-site ejection: the E-site affinity drops markedly upon occupation of the A-site and vice versa, i.e., the occupation of the A-site is coupled with the tRNA release from the E-site [38, 39]. In the years following the identification of the reciprocal linkage between A- and E-sites, much data accumulated providing support for this model: (i) A direct and unequivocal demonstration of the reciprocal linkage was achieved using a heteropolymeric mRNA displaying three different codons at the three sites together with three respective cognate tRNAs, each labeled with a different isotope [40]. (ii) The activation energy for occupation of the A-site depends on the charging state of the E-site: when a tRNA is present at the E-site the activation energy is twice as large as that observed when the E-site is free [12]. (iii) Thiostrepton, viomycin and all types of aminoglycosides severely impair A-site binding only if the E-site is occupied [41]. (iv) The reciprocal coupling of A- and E-sites has also been observed with the ribosomes of organisms from other evolutionary domains, viz. with ribosomes of Halobacterium halobium (archea; [42]) and yeast (eukarya; [20]), suggesting that this relationship is universally conserved. Indeed, in yeast, the reciprocal coupling was observed when the functions of the third elongation factor (EF-3) were studied [20]. In the POST state, yeast 80S ribosomes bound the E-tRNA so tightly that a ternary complex aa-tRNA•EF-1•GTP could not bind to the A-site. First, the binding of EF-3 to the ribosome and ATP cleavage was necessary to free the E-site tRNA, perhaps by “opening” up the E-site by movement of the L1 protuberance of the 50S subunit. Only then was it possible for A-site occupation and the concomitant release of the E-tRNA to occur. These results illustrate the bi-directionality of the reciprocal linkage: if the E-tRNA cannot be released, the A-site remains in its low-affinity state and cannot be occupied; if EF-3 “opens” the E-site, A-site occupation triggers release of the E-tRNA. The reciprocal linkage between A- and E-sites led to the “allosteric three-site model”, which is characterized by three basic features (the last version is the α–ε model, see Fig. 2B; see Ref. [43] for review): 1. Ribosomes contain three tRNA-binding sites [44, 45]. 2. The first and the third sites, A- and E-sites, respectively, are coupled in a reciprocal fashion: occupation of the A-site decreases the affinity at the E-site and vice versa [20, 39, 40]. 3. Both tRNAs that are present at the A- and P-sites before translocation and at the P- and E-sites after translocation are linked to the mRNA via codon–anticodon interaction [20, 46]. Codon–anticodon interaction at the E-site seems to be essential for establishing the POST state containing the P- and E-tRNAs [47]. Only the POST state is the proper substrate for the ternary complex aa-tRNA-EF-Tu-GTP.
2 Models of the Elongation Cycle 9
The reciprocal linkage model was extended to the so-called α–ε model when the accessibility of the phosphate groups of a tRNA at the various ribosomal sites was tested in PRE and POST states. These experiments demonstrated that a tRNA in the A-site of a PRE state ribosome had a strikingly different pattern when compared with the corresponding tRNA in the P-site [48]. However, after translocation to the P- and E-sites, the protection patterns of both tRNAs hardly changed [49]. The conclusion was that the tRNAs bound to a structural domain of the ribosome, and that this structural domain moved during translocation from the A- and P-sites to the Pand E-sites while maintaining contact with both tRNAs. Therefore, the structural domain was considered to contain two binding regions, which were termed α and ε (Fig. 2B). A tRNA bound to the α binding region is located at the A-site before translocation and at the P-site after translocation. Similarly, a tRNA bound to ε binding region is located at P-site before translocation and at the E-site after translocation. Thus, only α can appear at the A-site and only ε at the E-site (hence the nomenclature), whereas at the P-site either α or ε can be present depending on the translocation state. Support for the model was recently provided by results obtained using a completely different approach: site-specific Pb2+ cleavage was applied to trace tertiary alterations of tRNAs and rRNAs in PRE and POST state ribosomes [50]. Deacylated tRNAs and AcPhe-tRNA produced the same cleavage pattern in solution but very different ones when bound to the ribosome. Consistent with phosphorothiote experiments, the specific and distinct patterns for the bound tRNAs did not change during translocation. This again led to the conclusion that while the tertiary structure of the adjacent tRNAs at A- and P-sites are different, the fact that they do not change during translocation argues for a ribosomal “conveyor” that binds both tRNAs and moves them during translocation. Comparing contact patterns of tRNAs obtained with isolated subunits and 70S ribosomes using the phosphorothioate method indicated that the two parts of the “conveyor”, both of which bind a tRNA, probably move in a concerted fashion but not strictly side-byside [51]. Therefore, the recent observation that the deacylated tRNA seems to move from the P- to the E-site via a hybrid position [31] is not in conflict with the α–ε model. The α–ε model integrates the well-documented fact that the post-translocational ribosome with a low-affinity A-site is capable of selecting the aminoacyl-tRNA cognate to the codon at the A-site, i.e., the decoding process occurs before the α–ε domain of the ribosome flips back from the P-E-site to the A-P-site. According to the α–ε model, the decoding site, being exclusively on the 30S subunit, is stably located at the A-site where it is called δ. It follows that α is superimposed on δ in the PRE state and separated from the δ domain in the POST state (Fig. 2B). This feature of the α–ε model predicts that the ribosome has two tRNA-binding sites in the PRE state and three in the POST state (the two high-affinity sites α–ε at the Pand E-sites and the low-affinity site δ at the A-site). All three features of the allosteric three-site model are also valid for the α–ε model, although they are extended or re-interpreted, such that (i) the three tRNAbinding sites still exist but only in the POST state. Saturation of 70S ribosomes with deacylated tRNAs levels off at three tRNAs per ribosome. First, the P- and
10 The Ribosomal Elongation Cycle
E-sites are filled, thus establishing a POST state, thereby creating a low-affinity A-site (δ-site at the A-site). This is illustrated by the requirement of excess (> 6fold) of deacylated tRNAs over ribosomes to fill the third site [45]. (ii) The A- and E-sites have an inverse relationship in that an occupied E-site is accompanied by a low-affinity δ-site at the A-site and an occupied A-site with no affinity at the Esite. The new interpretation of this relationship is that allostery is not involved: during translocation the α region moves from the A-site to the P-site leaving the decoding center δ in the A-site (“low-affinity” A-site), and during aminoacyl-tRNA binding to the A-site the α–ε domain jumps from the P-E to the A-P, leaving the E-site without a tRNA-binding capacity. (iii) The two tRNAs have a similar mutual arrangement relative to each other. In fact, the angles between the tRNAs in the PRE and POST states are almost identical, as shown by cryo-EM analysis (39◦ and 35◦ in the PRE and POST states, respectively [11]). Also the arrangement relative to the mRNA, i.e., the codon–anticodon interactions, is maintained before, during and after translocation with regard to the α–ε binding regions. Thus, the movement of the tRNAs occurs simultaneously on the two subunits in a coordinated fashion, and during translocation the tRNAs might transiently move through hybrid positions as has been made evident for the deacylated tRNA moving from the P-site over the hybrid position P/E to the E-site [31], whereas the predictions of the hybrid-site model, namely that the tRNAs swing into a hybrid site after peptide-bond formation could not be confirmed [11, 31]. However, the CCA ends of the two tRNAs present at A- and P-sites are directly adjacent at the PTF center at the PRE state (required for peptide-bond formation), whereas they are separated substantially in the POST state. After the peptide bond has been formed, there is no need to keep them together. This fact indicates that the α and ε regions do not strictly move side-byside during translocation. Ribosomal candidates for a movable domain have been identified as bridge B2a in 70S ribosomes via cryo-EM [52] and X-ray crystallography [23], and probable ribosomal components such as the upper region of the h44 of the 16S rRNA [53], H69 of the 23S rRNA and parts of the ribosomal protein L2 [54, 55]. The α–ε model provides a dynamic picture of the translating ribosome. The translocation reaction is explained in a (possibly too) simple fashion: the α–ε domain moves together with both tRNAs and the corresponding codons. The required movement of about 10 Å, the length of a codon, is not unusual for molecular movements of enzyme substructures. For example, the distance between the first and the second domains of EF-Tu is enlarged by up to 40 Å upon GTP cleavage [56]. According to the α–ε model, the critical step during the elongation cycle is not the translocation reaction but rather the binding of the new aminoacyl tRNA in the α-site at the A-site, since during the latter the α–ε domain has to release the tRNAs in order to switch from the E-A-sites to the P-A sites (see Fig. 2B). This is in agreement with the finding that occupation of the A-site is the rate-limiting step of elongation, not the translocation reaction [12, 57]. To date, the most comprehensive analysis of the translocation reaction by cryoEM [31] challenges an essential feature of the α–ε model, namely, PRE state ribosomes carrying an fMet-Phe-tRNA at the A-site and a deacylated tRNAPhe at
3 Decoding and A-site Occupation
the P-site, also carried a tRNA at the E-site. This was particularly surprising since deacylated tRNA was not actually included in the reaction; however, the authors assumed that the source of the E-tRNA was the free pools of deacylated tRNA in solution, which enabled direct binding to the “high-affinity” E-site. Regardless of the source, the presence of simultaneously occupied A- and E-sites is in direct contradiction with the α–ε model. However, to contradict the α–ε model directly the stoichiometry of the A- and E-sites needs to be assessed. The authors made no attempts to resolve this conflict between their conclusions and the findings that an E-site release upon A-site occupation has been observed with both prokaryotic and eukaryotic ribosomes (see Ref. [58] for review). In any case, it is clear that we are far from having a detailed picture of the translocation reaction and the presence of high-resolution structures of PRE and POST states would certainly go some way to resolving one of the central enigmas of ribosome research. 3 Decoding and A-site Occupation 3.1 Some General Remarks about Proofreading
The term “proofreading” stems from the glossary of the printing arts. The first outprint of a newspaper, e.g., was checked by the print master, and if he found a misprint, the letter was exchanged and the mistake corrected for the second print that was delivered to the clients. In molecular biology, this term has a similar meaning, in that the last amino acid building block that is to be added to the growing polypeptide chain is checked for correctness before it is permanently incorporated into the synthesized protein. Proofreading is a common phenomenon in polymerases that synthesize DNA [59] and RNA [60]. In fact, polymerization of nucleic acids is well suited for proofreading, since its basic mechanism is that of “tail growth” (Fig. 3). This means the new building block that is to be added to the nascent chain is providing the energy-rich bond (the phosphoric acid anhydride bond between the α and the β phosphate residues of an NTP) for the link (the 3 ester bond) to the nascent chain. If the addition was wrong, the proofreading center hydrolyzes the last four nucleotides, thus enabling another chance for the correct addition. In contrast, the situation is very different in the case of protein synthesis since it follows the principle of “head growth” (Fig. 3): the nascent chain provides the energy-rich ester bond for the formation of the peptide bond that links the newly added amino acid. The peptidyl residue is added to the newly arrived amino acid, and the ester bond of the latter is used for the next round of elongation. Therefore, in strictu sensu, proofreading cannot exist in protein synthesis, since such a process would sacrifice the already synthesized chain via removal of the last incorporated amino acid. Not surprisingly, therefore, a “proofreading center” was not detected in the atomic structures of the ribosome. Nevertheless, the term proofreading is widely used in the field of protein synthesis, but here in a broader sense, such as
11
12 The Ribosomal Elongation Cycle
Fig. 3 Principles of tail- and head-growth: (A) Definitions of head (H) and tail (T) in the synthesis of nucleic acids and proteins. Tail growth means that the energy-rich bond of the new building unit is used for incorporation, whereas during head growth the energy-rich bond of the nascent chain is
used for incorporation of a new building block. (B) Tail growth is exemplifed by the synthesis of nucleic acids (activities of replicases and transcriptases), and head growth to protein synthesis on the ribosome. See text for details.
“kinetic proofreading” that will be discussed in Sect. 3.3. We should also note here that synthetases of both class I and class II have “proofreading centers” that play an important role for achieving the accuracy of charging tRNAs [61–63]. 3.2 Discrimination against Noncognate aa-tRNAs
A new elongation cycle begins when an aa-tRNA enters the ribosomal A-site as the ternary complex aa-tRNA-EF-Tu-GTP. The codon displayed in the A-site is specific for a single species of tRNA, termed the cognate tRNA, which has an anticodon that perfectly complements the A-site codon. However, there are many other tRNA competitors that can interfere with this selection process: 41 in E. coli and even more in the eukaryotic cell. To make matters worse, 4–6 of these tRNAs, termed nearcognate tRNAs, will have an anticodon similar to the cognate tRNA. The remaining
3 Decoding and A-site Occupation
90% have a dissimilar anticodon and are termed noncognate tRNAs. The problem is compounded further when one considers that the aa-tRNAs are delivered in the form of a ternary complex, i.e., in complex with the elongation factor EF-Tu and GTP. The ribosome must therefore discriminate between relatively large ternary complexes (72 kDa), which present multiple potential interaction sites with the ribosome, on the basis of a small discrimination area, the anticodon (1 kDa). The discrimination potential of the discrimination energy can only be reached under equilibrium conditions and, in this case, the free energy of binding is relatively large, with only a tiny fraction being discrimination energy. This means that equilibrium can only be reached after long time periods, i.e., this process must be slow to be accurate. Since we know that protein synthesis is a relatively fast and accurate process, the ribosome must overcome this hurdle. The question is how? A model has been proposed which overcomes this problem by simply dividing A-site occupation into two distinct events; a decoding step followed by an accommodation step (reviewed in Ref. [64]). During the initial decoding step, the A-site is in a low-affinity state, which reduces interaction of the ternary complex to mainly codon– anticodon interactions, thus excluding general contacts of the tRNA and elongation factor. By restricting the binding surface of the ternary complex to the discriminating feature, i.e., the anticodon, the binding energy is both small and equivalent to the discrimination energy. In addition, since the binding energy is small, equilibrium can be rapidly attained, thus ensuring the efficiency of the reaction is retained. The second step, accommodation of the A-site, requires release of the aa-tRNA from the ternary complex into the A-site. This step utilizes the nondiscriminatory binding energy to dock the tRNA precisely into the A-site and the attached amino-acyl residue into the PTF center on the 50S subunit in preparation for peptide-bond formation. As we have already seen in the previous section, accommodation of the aa-tRNA in the A-site is accompanied by release of the EtRNA. Evidently, this second step of A-site binding involves gross conformational changes within the ribosome [12] and thus can be thought of as a relatively slow process in comparison with the first or decoding step. It follows that A-site binding occurs via a coupled reaction system, consisting of a fast initial decoding and a slow second accommodation reaction. This has the important consequence that the initial reaction operates at equilibrium even when the whole system runs under steady-state conditions. It is this feature that enables the discriminatory potential of codon–anticodon interaction to be efficiently exploited. Recently, the first step of A-site binding (low-affinity A-site) was viewed using cryo-EM by analyzing ternary complexes stalled at the A-site using the antibiotic kirromycin [10, 65]. Although kirromycin allows GTP hydrolysis, it inhibits the associated conformational changes in EF-Tu that are necessary for dissociation from the ribosome. The cryo-EM reconstructions suggest that the anticodon-stem loop of the tRNA is kinked to allow codon–anticodon interaction and thus overcomes the unfavorable incoming angle of the tRNA to the A-site dictated by the ternary complex (see Fig. 4; [10]). As accommodation of an aa-tRNA into the A-site involves the dissociation of EFTu-GDP from the ribosome, which is in turn coupled with the hydrolysis of GTP, it
13
14 The Ribosomal Elongation Cycle
Fig. 4 The ternary complex on the ribosome during the decoding process. (A) Binding of the ternary complex aa-tRNA•EFTu•GTP (red or pink) during the first step of A-site occupation (T position), with tRNAs present at P- and E-sites. Ribosomal subunits are in blue (large subunit) and yellow (small subunit). (B-D) Fitting of the crystal structure of the ternary complex into
the difference mass corresponding to the ternary complex on the ribosome. To fit satisfactorily the crystal structure of a tRNA into the corresponding cryo-EM density requires the introduction of a kink in the anticodon stem of the amino-acyl-tRNA of about 40◦ . From Ref. [10]; for details see text.
is interesting to note that in E. coli up to 2 GTPs are hydrolyzed during the incorporation of cognate-tRNAs and up to 6 for near-cognate-tRNAs, whereas noncognatetRNAs do not trigger EF-Tu-dependent GTP hydrolysis at all [66]. The acceptance of a near-cognate aminoacyl-tRNA consumes three times more GTP than that of a cognate one, thus improving the accuracy by a factor of 3 only. Since the total accuracy is characterized by a factor of about 3000 (one mis-incorporation in 3000 incorporation events), it is clear that this observation adds further weight to the argument that the tRNA discrimination is governed predominantly by anticodon–codon recognition during the initial binding step (see also the next section). Why is the low-affinity A-site during the decoding step important for preventing the interference of noncognate tRNAs with the decoding reaction? Preventing this interference by noncognate tRNAs is not trivial since they represent the majority of about 90% of the ternary complexes. If the decoding step reduces the interactions of the ternary complex with the ribosome mainly to codon–anticodon interaction, then this will prevent noncognate ternary complexes interfering with decoding, since interaction of the ternary complex outside of the anticodon is not possible and the anticodon of noncognate tRNAs cannot form efficient base-pairs with the codon displayed at the A-site. In other words, the A-site codon does not really exist for the non-cognate complexes. This fact reduces the selection problem by an order of magnitude: instead of selecting one out of 41 tRNA species (cognate versus near- plus noncognate tRNAs) only 1 out of 4–6 tRNAs have to be selected (cognate versus near-cognate tRNAs). The selection problem is comparable with that of a transcriptase selecting the correct nucleotides out of 4 possible ones during
3 Decoding and A-site Occupation
RNA synthesis. This process occurs with a precision of better than one mistake in 60 000 incorporations without proofreading [60]. The fact that the noncognate ternary complexes do not interfere with the selection process has been demonstrated by a number of different approaches: (i) When an E-tRNA induces a lowaffinity A-site, a noncognate ternary complex is not incorporated into the nascent peptide chain [47]. (ii) The addition of an excess of noncognate ternary complexes does not slow down the rate of poly(Phe) synthesis in fast systems ([67] and A. Bartetzko and K.H. Nierhaus, unpublished observations). (iii) As mentioned previously, noncognate ternary complexes also show no traces of GTP turnover, whereas near-cognate complexes have a turnover about three times higher than that of cognate ternary complexes [66]. The next obvious question is how the cognate and near-cognate tRNAs are discriminated. This is a question that can now be answered at the molecular level, as discussed in the next section. 3.3 Decoding of an aa-tRNA (Cognate versus Near-cognate aa-tRNAs)
A model for the discrimination between cognate and near-cognate aa-tRNAs was proposed by Potapov about 20 years ago [68]. According to this model, the decoding center of the ribosome recognizes the anticodon–codon duplex, in particular sensing the stereochemical correctness of the partial Watson–Crick base-pairing and the positioning of the phosphate-sugar backbone within this structure. A test of this hypothesis was performed with an mRNA that carried a DNA codon (deoxycodon) at one of the three ribosomal sites. If the stability of the base pairs, i.e., the hydrogen bonding between codon–anticodon bases of the Watson–Crick pairs, is the sole requirement for the recognition step, then a 2 -deoxy base in the codon should not affect the decoding process. If, however, the stereochemical correctness of the base pairing is tested, i.e., including the positioning of the sugar pucker, then a 2 -deoxy base should impair the decoding process. It was found that a deoxycodon at the A-site was disastrous for tRNA binding at this site, whereas a deoxycodon at the P-site had no effect on tRNA binding to the P-site. This observation also explains previous results according to which a DNA cannot take over the function of an mRNA (see Ref. [69] and references therein). The components of the ribosome directly involved in decoding were identified to 3.1 Å by crystallography [70]. Crystal packing of the Thermus thermophilus 30S subunit fortuitously placed the spur (h6) of one subunit into the P-site of another, thus mimicking the anticodon stem loop (ASL) of a P-tRNA. Another surprise was that the base-pairing partner to the P-tRNA mimic was the 3 -end of the 16S rRNA, which folding back on itself extended into the decoding center. This situation, with the P-site filled, enabled Ramakrishnan and co-workers [70] to soak an ASL fragment (ASL-tRNA) and a complementary mRNA fragment into these crystals to study aa-tRNA decoding at the A-site. The results can be summarized as follows: the binding of mRNA and cognate aatRNA induces two major rearrangements within the ribosomal decoding center: the
15
16 The Ribosomal Elongation Cycle
Fig. 5 The principles of decoding in the Asite of the ribosome. (A) The first base pair of codon–anticodon interaction (position 1) exemplifies a type I A-minor motif: A1493 binds to the minor groove of the A36–U1 base pair via H-bonds. (B) The middle position illustrates a type II A-minor motif: A1492 and G530 acting in tandem to recognize the stereochemical correctness of
the A35–U2 base pair using H-bonds. (C) The third (or wobble) base pair (G34–U3) is less rigorously monitored. C1054 stacks against G34, whereas U3 interacts directly with G530 and indirectly with C518 and proline 48 of S12 through a magnesium ion (magenta). All nucleotides involved in monitoring positions 1 and 2 are universally conserved. Adapted from Ogle et al. [70].
universally conserved residues A1492 and A1493 flip out of the internal loop of h44, whereas the universally conserved base G530 switches from syn to anti conformation. A1493 recognizes the minor groove of the first base pair (ASL-tRNA position A36–U1 of the mRNA) via a type I A-minor motif (Fig. 5A). A1493 establishes three hydrogen bonds (H-bonds) with the first position of the codon–anticodon duplex: two with 2 -OH groups from both A36 and U1 and another with the O2 of U1. It is noteworthy that the latter H-bond is not sequence-specific as might be expected, since the O2 position of pyrimidines and the N3 of purines occupy equivalent positions in the minor groove of a double helix and both are H-bond acceptors. The second base pair (A35–U2) is also monitored via 2 -OH interactions, but the job is split between two bases, namely A1492 and G530 (this type II interaction of an A-minor motif is seen in Fig. 5B). A1492 and G530 are locked in position
3 Decoding and A-site Occupation
by secondary interactions with S12 (serine 50) and another universally conserved residue, C518. Monitoring the second base pair seems to occur more rigidly than that of the first pair in accordance with the fact that the second base pair is of utmost importance for decoding followed by the first pair, whereas the third pair of the codon– anticodon duplex is of least importance [71]. In other words, correct positioning of the 2 -OH groups of the first and second base pairs is critical in forming A-minor interactions and thus efficient duplex sensing. In contrast, the third position is less rigorously monitored, allowing latitude for wobble interactions (Fig. 5C). This is evident in the third base pair (G34-U3), where the minor groove remains exposed despite direct interactions with C1054, G530 and indirect metal-mediated interactions with C518 and proline 48 of S12. Taken together, these results confirm the Potapov hypothesis and explain how decoding operates through the recognition of the correct stereochemistry of the A-form codon–anticodon duplex. Furthermore, the fact that the components involved are universally conserved suggests that the mechanism of decoding is probably similar for ribosomes from all kingdoms. Important extensions to this picture could be made when the known crystal data of T. thermophilus 30S subunit showing cognate codon–anticodon interactions between an UUU codon and an ASL structure of the tRNAPhe [70] was complemented with a data set obtained after soaking the programed 30S suband tRNASer . The cognate tRNAPhe units with near-cognate ASLs from tRNALeu 2 has the anticodon 3 -AAG-5 , the near-cognate tRNALeu and tRNASer contain the 2 anticodons 3 -GAG-5 and 3 -AGG-5 with a G:U mismatch at the first or second position, respectively, whereas such a mismatch is allowed at the wobble position [72]. The most important conclusion from this work was that only the cognate codon–anticodon interaction could induce a “closed” form of the 30S subunit involving a movement of the shoulder and head towards each other, whereas near-cognate ASLs could only induce this conformation in presence of the misreading-inducing aminoglycoside antibiotic pactamycin. The complete movement can be seen under http://www.cell.com/cgi/content/full/111/5/721/DC1, and has a number of important consequences. Various nonpolar interactions between the ribosomal proteins S4 and S5 are broken upon transition from the closed to the open form, whereas h44, h27 and S12 come together and can form additional salt bridges (between residues K57 and the phosphate of C1412 of h44 or K46 and the phosphate of A913 of h27). It has already been noted previously that mutations which would be predicted to break the nonpolar interactions between S4 and S5 induce a “ram” phenotype (ram, ribosomal ambiguity mutations: ribosomes with a defect that is characterized by a high-level amino acid mis-incorporation [73, 74]). In other words, these mutations facilitate the transition into the open form and thus reduce translation fidelity by inducing mis-incorporation. On the other hand, mutations of S12 that destabilize the closed form would impair the transition to the closed form and thus increase the fidelity of aa-tRNA selection at the A-site. In fact, mutations of K57 of S12 result in the mostaccurate (“hyperaccurate”) phenotype known. Therefore, we have gained for the first time a molecular understanding of mutants that increase or decrease the level of mis-incorporations of the ribosome. Since an
17
18 The Ribosomal Elongation Cycle
occupied E-site that has been shown to induce a low affinity of the A-site and improve the accuracy (see Ref. [75] for review), also makes an important contribution to the ribosomal power for A-site tRNA discrimination, this suggests that in the frame of the open-closed model of Ramakrishnan and coworkers [72], the extensive contacts the E-site tRNA with both the 30S and 50S subunit might increase the energetic costs for the transition from the open to the closed form. Prior to the Potapov hypothesis, it had been proposed that the ribosome utilized a “proofreading mechanism” to improve the accuracy of translation [76, 77]. This mechanism was suggested to operate by re-selection of the correct substrate during a so-called “discarding step”, after the initial binding of the A-tRNA. Because reselection is dependent upon release of the tRNA from EF-Tu and is accompanied by GTP cleavage, the GTP consumption for the incorporation of a cognate and near-cognate amino acid provides a measure of the power of proofreading. Insofar as the crystal structure of EF-Tu and the ribosome are concerned, the ribosomal proofreading mechanism lacks its own active center (see Sect. 3.1 for a principal comparison of the synthesis of nucleic acids and proteins concerning proofreading). Instead, the term “proofreading” has been broadened by introducing “kinetic proofreading” that occurs after the release of the binary complex EF-Tu·GDP [78]. A simple model for kinetic proofreading is the following: the binding energy during the decoding step (first step of A-site binding, see 3.2) is for the near-cognate aa-tRNA lower than for the cognate one. Therefore, the probability of triggering the gross-conformational change required for the accommodation of the aa-tRNA into the A-site (second step of A-site binding) is lower than for the near-cognate. This in turn prolongs the resting time of the near-cognate aa-tRNA at the lowaffinity A-site and provides an additional chance for the near-cognate aa-tRNA to fall off the low-affinity A-site [79] thus increasing the accuracy. Re-binding of this near-cognate aa-tRNA is unlikely in the presence of competing ternary complexes that have a 2–3 orders of magnitude higher affinity for the A-site than the naked aa-tRNA [80]. The importance of the kinetic proofreading step can be quantitatively determined by taking advantage of the fact that the kinetic proofreading mechanism requires EF-Tu-dependent GTP hydrolysis. Accuracy of aa-tRNA selection in the presence of EF-Tu and a noncleavable GTP analog was determined to be 1:1000 [81], an accuracy only three times lower than that seen in vivo (1:3000). Exactly a threefold difference was also determined for the GTP consumption per incorporation of cognate versus near-cognate amino acids [66]. Thus it is clear that the significant contribution to the accuracy of translation (1000-fold) lies within the stereo-chemical monitoring of the codon–anticodon duplex by the ribosome as predicted by Potapov and that the “kinetic proofreading mechanism” plays only a minor role, conferring a 3-fold improvement in the accuracy. This view was qualitatively confirmed by a recent direct measurement of the discrimination power of the initial binding without proofreading, where the binding of cognate and near-cognate ASL-tRNA fragments to the A-site of 70S ribosomes were compared. The accuracy was found to be between 1 : 350 and 1 : 500, thus also demonstrating that the lion’s share of the ribosomal accuracy is carried by the initial binding [72].
3 Decoding and A-site Occupation
3.4 Roles of EF-Tu
The following functions of EF-Tu can be distinguished: (i) EF-Tu within the ternary complex aa-tRNA•EF-Tu•GTP reduces the activation energy barrier between the POST and the PRE states by about 120 kJ mol−1 [12] and thus allows the transition from the POST to the PRE state with a high rate. (ii) EF-Tu binds an aa-tRNA at the amino acid acceptor stem thus shielding the labile ester bond between the aminoacyl residue and the tRNA. (iii) A third function (related to function (i) but not identical) is the carrier role of EF-Tu, namely, to deliver the aa-tRNA to the A-site: the ternary complex has an affinity for the A-site, 2–3 orders of magnitude higher than that of the corresponding aminoacyl-tRNA [80]. (iv) Another role for EF-Tu was identified by Uhlenbeck and co-workers [82]. Measuring the affinities of various cognate aa-tRNA (e.g., Val-tRNAVal ) and some mis-pairs (e.g., Ala-tRNAVal ) they recognized that either the amino acid or the tRNA binds to EFTu with high affinity to form stable ternary complexes aa-tRNA·EF-Tu·GTP. For example, EF-Tu·GTP easily forms a ternary complex with Asp-tRNAAsp or AsntRNAAsn , but not with the mis-charged Asp-tRNAAsn , since in the latter case both moieties bind with low affinities. This observation explains an important scenario that was an enigma hitherto: in most organisms there are only 18 or 19 synthetases, i.e., not the 20 different synthetases corresponding to the 20 natural amino acids. For example, many organisms do not contain a synthetase specific for asparagine (AsnRS). In this case, AspRS is also charging tRNAAsn with aspartic acid, yielding a mis-charged Asp-tRNAAsn , which is recognized by enzymes that amidate Asp to Asn on the tRNA. The mis-charged AsptRNAAsn does not form a stable ternary complex with EF-Tu·GTP and thus Asp is not incorporated at codons specifying Asn. This discrimination process via EF-Tu was termed thermodynamic compensation and adds to the accuracy of the translational process [82]. 3.5 Mimicry at the Ribosomal A-site
The A-site is not restricted to binding tRNAs exclusively. During the various stages of the elongation cycle, a number of translational factors interact at the A-site. The first structures determined for these translational factors were those of EF-G [83, 84] and EF-Tu [83, 85]. Interestingly, the structure of the latter, in the form of a ternary complex EF-Tu-GTP·tRNA [86], had a striking similarity to that of EF-G·GDP, such that domains 3–5 of EF-G closely mimic the tRNA in the ternary complex (Figs. 6A and B; reviewed by Nissen et al. [87]). This suggested that the binding pocket of the A-site constrains the translational factors binding there to conform to a tRNA-like shape. In the last few years, solution structures for various termination factors, such as RF2, eRF1 and especially RRF (Fig. 6C) that also interact at the ribosomal A-site, have generally supported this concept. However, recent studies on the conformation and orientation of these factors on the ribosome suggest that the translation mimicry hypothesis has been over-extrapolated and that an overall
19
20 The Ribosomal Elongation Cycle
Fig. 6 Molecular mimicry of tRNAs by translation factors. Comparison of the crystal structures for (A) EF-G GDP with domains 3–5 in gold (pdb1fmn) [148], (B) EF-Tu GTP tRNA (pdb1ttt) [86], (C) RRF (pdb1eh1) [149], figures of crystal structures were generated with Swisspdb viewer [150] and rendered with POVRAY.
tRNA shape does not necessarily suggest a binding orientation analogous to that of the tRNA. Mimicry of RNA by protein may be a more common feature in ribosomes than first realized. Organellar ribosomes generally have shorter rRNA components when compared with E. coli. Recent analyses of the chloroplast and mitochondrial ribosome components suggest that these rRNA losses are compensated for by both increases in size of the ribosomal protein homologs and the presence of additional organelle-specific ribosomal proteins [88–92]. Mitochondria represent an extreme example in that the protein component of the ribosomes represents two-thirds of the mass instead of one-third as in E. coli ribosomes. It is worth mentioning that the rRNA does consist predominantly of universally conserved residues that locate to the active centers of the ribosome, i.e., the decoding center on the 30S subunit and the PTF center on the large subunit [93], thus reinforcing the importance of these regions. 3.6 Translational Errors
Three types of ribosomal errors can be distinguished, the underlying mechanisms of which, not only partially overlap but are also intimately related: (i) a simple mistake in the decoding of a codon, (ii) a processivity error, and (iii) a loss of the correct reading frame (frameshift). A decoding error can lead to incorporation of an incorrect aminoacyl residue into the nascent peptide chain. A processivity error is defined as the release of a prematurely short peptidyl-tRNA from the ribosome. A shift in the reading frame usually means the immediate loss of the genetic information, and will lead to the release of the synthesized peptide from the ribosome due to the appearance of a premature stop codon in the A-site. From the frameshift, peptide release will occur statistically within about 20 decoding steps in the incorrect reading frame since three of the 64 codons are stop codons.
3 Decoding and A-site Occupation
Incorrect incorporation of an amino acid at a sense codon is termed a missense substitution and occurs with a global average of ∼3 × 10−4 [94]. The third nucleotide of a codon is misread the most often, followed by the first one; for the decoding process the middle nucleotide of a codon seems to be the most important and is misread with the lowest detectable frequency. The codon lexicon is arranged in a way that an error in the reading of the third nucleotide of a codon results in the incorporation of either the same or a similar amino acid into the nascent chain, i.e., one with chemical properties similar to the correct amino acid such as charge or hydrophilicity. In this way, the effects of the missense substitution are buffered and usually do not lead to disastrous malfunctions of the corresponding protein. According to a rough estimate, only 1 in 400 missense events will completely inactivate the product [95]. Mis-incorporations are not only due to ribosomal decoding errors, but can be caused at the synthetase level via mis-charging. However, the charging mistakes of synthetases are usually below the level of the ribosomal mis-reading and can reach a precision of 1 mis-charging in 100 000 charging events [96]. Owing to the generally high accuracy of the synthetases, the mis-charging effects are negligible for protein synthesis. EF-Tu also participates in preventing mis-incorporations in the frame of the thermodynamic compensation mechanism described in Sect. 3.4, whereas the discrimination cognate versus noncognate is discussed in Sect. 3.2 and that between cognate versus near-cognate in Sect. 3.3. In the rare cases of processivity errors, a stop codon can be translated (termed readthrough) by a ternary complex aminoacyl-tRNA·EF-Tu·GTP leading to an extension on the protein product. However, usually processivity errors are premature drop-offs of the peptidyl-tRNAs, the frequency of which has been estimated to be around 4 × 10−4 (i.e., four drop-offs in 10 000 amino acid incorporations; [97]). The ester bond linking the peptidyl residue to the tRNA is more stable than the corresponding bond of an aminoacyl-tRNA. This means that peptidyl-tRNAs would accumulate in the cell over time, thereby sequestering the tRNAs and prohibitively restricting protein synthesis. Therefore, the existence of the enzyme peptidyl-tRNA hydrolase is essential for cell viability, since it cleaves the relevant ester bond (with the exceptions of fMet-tRNA and aminoacyl-tRNA), thus recycling the tRNAs. In studies with a protein of more than 1000 amino acids (β-galactosidase), the fraction of initiating ribosomes that did not complete the synthesis was estimated to > 20%, and up to half of the effect was caused by a premature drop-off, the other half by truncated mRNAs resulting from an abortive transcription of the lacZ gene [98]. The probability of a premature drop-off is probably not identical for every codon, but is rather sensitive to context effects and occurs more often with short peptidyl-tRNAs. Nevertheless, the energy impact of processivity errors seems to be more severe than that of missense errors, the latter being the “standard” mistake during the decoding process. An estimation of the energy loss caused by processivity errors amounts to 3% of the total energy turnover of rapidly dividing cells. Truncated mRNA may trap a synthesizing ribosome since a stop codon necessary to provide an organized termination event is absent. Bacteria contain a
21
22 The Ribosomal Elongation Cycle
stable RNA of about 350 nt that rescues these trapped ribosomes. This RNA (10Sa RNA or tmRNA) can be charged with alanine by the corresponding synthetase, occupy the A-site, and after the nascent peptide has been transferred to the alanyl residue (tRNA function) can function as a mRNA and by doing so add a 10-amino-acid peptide tag to the nascent chain, allowing an ordered termination event via a programed stop codon. Owing to dual tRNA and mRNA functions of this RNA, it has acquired the name, tmRNA. The functions of the tmRNA are (i) to tag the abortive peptides with an additional sequence at the C-terminus that destines the peptide for efficient degradation and (ii) to recycle trapped ribosomes [99]. Although truncated mRNAs are not rare, the rescue of trapped ribosomes does – at least in some organisms – not depend solely on the presence of tmRNA, since null mutants are viable in E. coli although not in Baccillus subtilis at higher temperatures [100]. The precise mechanism of tmRNA action is not known. What causes processivity errors to occur? Several mechanisms can cause a processivity error but the predominant cause is probably an event shortly after the onset of protein synthesis, i.e., the insertion of the growing peptide chain into the ribosomal tunnel. This tunnel can harbor a sequence of about 30 amino acids before the growing peptide chain emerges from the back of the 50S subunit into the cytosol surrounding the ribosome. The macrolide antibiotics, such as erythromycin, cause accumulation of short oligo-peptidyl-tRNAs ranging in size from 1 to 8 amino acids long depending on the macrolide. This occurs because these antibiotics, by binding within the tunnel, prevent egress of the nascent polypeptide chain and therefore induce drop-off. Another mechanism is a false stop, i.e., a sense codon is incorrectly recognized by a release factor leading to termination of protein synthesis. A false stop is a rare event with a probability of about 10−6 per codon [101]. Finally, frameshifts can lead to protein fragments as mentioned previously. Since loss of the reading frame would lead to an immediate loss of the genetic information, maintaining the reading frame is an essential task of the ribosome. Usually, a loss in the reading frame occurs only once in 30 000 elongation cycles [98]; however, at the recoding site of the RF2 mRNA, where a + 1 frameshift at the 26th codon is essential for production of the full-length and active RF2 protein, loss of reading frame occurs with an efficiency of between 30 and 50%. Under certain conditions the frameshifting frequency can reach almost 100% efficiency, i.e., four orders of magnitude more often than that observed with other mRNAs. Obviously, there must be a ribosomal mechanism for maintaining the reading frame that is switched-off during RF2 synthesis. A detailed analysis has revealed that it is the presence of a cognate tRNA at the E-site that is essential for maintaining the reading frame. In vitro experiments show that the absence of an E-tRNA allows frameshift events to occur with a frequency of up to 20%, whereas in the presence of an E-site tRNA no frameshifting was observed (V. Marquez, D.N. Wilson, W.P. Tate, F. Triana-Alonso and K.H. Nierhaus, in press). This does not mean that all recoding events are triggered by a pre-mature release of the E-tRNA.
4 The PTF Reaction
4 The PTF Reaction
The PTF reaction is the central enzymatic activity of the large subunit. It occurs when a peptidyl-tRNA is located in the P-site and an aa-tRNA is in the A-site, termed a PRE state. Both L-shaped tRNAs at P- and A-sites form an angle of about 40◦ [11, 23, 102], whereas the acceptor stems are related by a translational movement, i.e. the CCA ends of both tRNAs at the PTF are related by an angle of approximate 180◦ and are, thus, in effect, mirror images of one another. The twist to accomplish this reflection occurs almost entirely between nucleotides 72 and 74 [103]. During PTF the α-amino group of the A-tRNA attacks the carbonyl group of the peptidyl residue of the P-tRNA, which is linked by an ester bond to the tRNA moiety (Fig. 7). This forms a tetrahedral intermediate, which resolves to yield a peptidyl bond. As a result, the aa-tRNA becomes a peptidyl-tRNA prolonged by one aminoacyl residue, and the former peptidyl-tRNA is stripped of its peptidyl residue to become a deacylated tRNA without a significant change of the place of the tRNA moieties [11, 104]. A long-standing debate within the translation field concerned whether or not the PTF reaction is catalyzed by proteins or rRNA. The PTF center was identified by using a putative transition state analog of the PTF reaction, which was soaked into crystals of the 50S subunit from Haloarcula marismortui [105]. This analog, which has been introduced by the Yarus group and hence termed the Yarus inhibitor, is a mimic of the CCA end of a P-tRNA attached to puromycin in the A-site (inset in Fig. 7) and is a strong competitive inhibitor of the A site substrate [106]. The region moulding the binding site of the inhibitor is densely packed with highly conserved bases of the 23S rRNA, mainly derived from the so-called PTF ring of domain V. The PTF ring structure with 41 nucleotides (Fig. 8) is one of the most highly conserved in rRNA, and its PTF involvement is supported by crosslinking studies from the acyl residues of tRNAs at A- and P-sites (see, e.g., Ref. [107]) as well as by mutations that render cells resistance against many antibiotics blocking peptide-bond formation (see Ref. [108] for review). Although there are 15 proteins that interact with domain V of the 23S rRNA, only the extensions of proteins L2, L3, L4, and L10e come within 20 Å of the active site (Fig. 9). That the active center of the ribosome is made exclusively from RNA, implies that the ribosome is a true ribozyme. Each of the four proteins L2, L3, L4, and L10e (a homologue of bacterial L16) has a globular domain connected to a long extension that penetrates deeply into domain V and approaches the active site. Such a long extension is quite common to ribosomal proteins and it is thought to play a role as a “glue” for the quaternary structure of the ribosome. Interestingly, three out of the four proteins in the vicinity of the PTF center are also present in eubacterial ribosomes, viz. L2, L3, and L4. These proteins have been identified previously together with 23S rRNA as major candidates for the PTF activity by single-omission tests in a total reconstitution system of the large subunit[109, 110].
23
24 The Ribosomal Elongation Cycle
Fig. 7 The PTF reaction. The figure shows the four possible steps of peptide-bond formation according to recent crystallographic and biochemical data [103, 105, 122, 152]. The essential features are (A) C74 and C75 of the P-site tRNA (green) are Watson–Crick paired with G2252 and G2251, respectively, of the P loop (blue). Similarly, C75 from the A-site substrate (red) forms a Watson–Crick base pair with G2553 (A-loop). The "-NH2 function of the A-site aminoacyltRNA is an ammonium ion at pH 7 [153]. (B) Deprotonation of the ammonium ion triggers the nucleophilic attack of the "-amino function on the carbonyl group of the P-site substrate, which results in the tetrahedral inter-
mediate T±. The secondary "-NH2 group forms a hydrogen bond with N3 of A2451 and a second with either the 2 -OH of the A76 ribose at the P-site (shown here) or alternatively with the 2 -OH group of A2451. The oxyanion of the tetrahedral intermediate points away from the N3-A2451 [103] and thus cannot, in contrast with the previous proposal [105], form a H-bridge. (C) Further deprotonation of the secondary "-NH2 group leads to the tetrahedral intermediate T- and the PTF reaction is completed by an elimination step. (D) The peptidyl residue is linked to the ami-noacyl-tRNA at the A-site via a peptide bond.
4 The PTF Reaction
One significant difference between the Deinococcus radiodurans 50S structure and that from H. marismortui is the presence of protein L27, a protein that has no homolog counterpart in the latter archaea organism [111]. L27 is one of the few proteins that are present in the interface region of the 50S subunit. It has been proposed that L27 plays a role in placement of the CCA ends of the A-and/or P-site tRNAs, and based on docking of the tRNAs from the T. thermophilus 70S:tRNA3 structure into the D. radiodurans 50S structure, contact of the CCA ends with L27 were predicted [111]. Indeed, photoreactive derivatives of yeast NAc-tRNAPhe containing 2-azidoadenosine at their 3 -termini could be crosslinked to L27 when bound at the ribosomal P-site [112]. Recently, Zimmermann and co-workers [113] have shown that it is the very N-terminal of L27 which is crosslinked, since deletion of the 3–6 amino acids severely reduced the crosslinking efficiency and deletions of more than nine amino acids totally abolished crosslinking altogether. Therefore, L27 is more probably involved in tRNA positioning rather than in the PTF reaction itself and, furthermore, seems to be specific for only eubacteria. The debate has now turned to whether the PTF reaction follows a physical or in addition also a chemical principle. 4.1 A Short Intermission: Two Enzymatic Principles of PTF Activity
There are two main principles associated with enzymatic reactions: a chemical and a physical. A number of examples exist where one or other principle is predominant but they need not be mutually exclusive (see Ref. [1] for review). 4.1.1 Chemical Concept: A Transient Covalent Bond between Active Center and Substrate(s) Although the serine proteases subtilisin in bacteria and chymotrypsin in mammals have arisen independently during evolution, both have an identical activation center for clearing the peptide bond containing a triad of Asp, His and Ser (Fig. 10). The three amino acid residues participate directly in the catalysis through a transient covalent event. The Asp–His module executes a general acid–base catalysis consisting of a proton donation (the acidic step), and a proton-accepting step (the basic step). The nucleophilic serine residue attacks the first substrate (which can be a peptide or an ester) forming a covalent acylated enzyme intermediate, i.e., after cleavage of the peptide bond the Ser residue binds the carbonyl residue transiently forming a seryl ester. The serine residue is then displaced via a nucleophilic attack
← Fig. 7 The inset is the Yarus inhibitor CCdAp-puro-mycin (CCdApPmn),which was used to identify the PTF center of the ribosome. The interactions of the Yarus inhibitor with the rRNA were deduced from 50S crystals of H. marismortui ribosomes after soaking the inhibitor into the crystals.
Note that it was concluded that the protonated N 3 of A 2451 makes a H bridge to O2, which was thought to mark the position of the oxyanion of the tetrahedral intermediate (transition state) formed during peptidebond formation [105] (cf. with the probably correct representation in step (B).
25
26 The Ribosomal Elongation Cycle
Fig. 8 Secondary structure of the domain V of the E. coli 23S rRNA. Left: the Asite (blue) and P-site (green) regions that are related by 2-fold symmetry, where the symmetry-related residues within these regions are highlighted with the same color. Right, The 2-fold symmetry is illustrated
from two different views using ribbon representations of the PTF center from D. radiodurans 50S subunit, with the A- and Psite CCA-end ligands indicated in the corresponding colors. This figure was taken from Ref. [146] with permission.
of the second substrate (which can be an amine, a water molecule, or an alcohol). Importantly, the His residue has a pK a of about 7, making it excellently suited to function in this type of catalysis which requires steps of both proton donation and acceptance at pH 7. In the case of the PTF reaction, we can replace the intermediate seryl ester by a peptidyl-tRNA, where the peptidyl residue is also linked to the tRNA body via an ester bond. After nucleophilic attack by the α-amino group, a covalent intermediate is formed between the peptidyl and aa-tRNA. This intermediate complex with a tetrahedral carbon and a negatively charged oxygen is unstable and decomposes to give a peptidyl moeity (the nascent chain) linked to the aminoacyl-tRNA via a peptide bond in the A-site and a deacylated tRNA at the P-site. Both reactions occur equally well with an alcohol [114] or, with a water molecule instead of an aminoacyl-tRNA, in the case of termination reaction. The mechanism requires (i) the activation (deprotonation) of the nucleophilic α-amino group of the aa-tRNA by a general base catalyst, e.g., a His–Asp system as in the serine proteases, (ii) the stabilization of the tetrahedral intermediate resulting from the nucleophilic attack of the aa-tRNA on the ester linkage of peptidyl-tRNA, and (iii) the activation of the tetrahedral intermediate, which then breaks down due to proton donation from the general acid catalyst. To avoid a side reaction with a water molecule, i.e.,
4 The PTF Reaction
Fig. 9 The A- and P-site products in red and green, respectively, bound at the PTF center of the 50S subunit. The proteins that reach within ∼20 Å of the PTF center include proteins L2 (purple), L3 (blue), L4 (cyan) and L10e (brown). This figure was generated from pdb file 1KQS [25] using Swisspdb viewer [150] and rendered with POVRAY.
hydrolysis of the peptidyl-tRNA during PTF reaction, the PTF center must be in a hydrophobic pocket. However, in the crystal structure of 50S (1JJ2.pdb released upon the publication of Ref. [115]) the residue A2451, which is in proximity of the PTF catalytic center (see the next section), is surrounded by 56 water molecules within a distance of 10 Å. This finding questions the assumption that the identified region for peptide-bond formation is in its fully active state, although it is competent to form a peptide bond [25]. The essential involvement of general acid–base catalysis in peptide-bond formation is consistent with the following experimental data: (i) If an enzymatic reaction is retarded in the presence of heavy hydrogen D (D2 O) instead of H (H2 O), the obvious conclusion is that a general acid–base catalysis is involved, since the migration of D versus H is slower. Precisely, such an effect has been observed for peptide-bond formation [1]. (ii) The pH dependence of peptide-bond formation peaks at pH 7, similar to the pK a curve for a His residue [114, 116]. It was shown that the PTF activity can be blocked by His-modification reagents and that inactivation follows a one-hit kinetics of modification of His residues on 50S subunits [117], indicating that one His residue is essential for peptidebond formation. Furthermore, phenyl-boric acid which reacts specifically with His residues, blocks peptide-bond formation [118]. Taken together, the interpretation was that a His residue might mediate a general acid–base catalysis as is known from the case of serine proteases. These suggestions of a catalytic mechanism, based on general acid–base catalysis and a possible involvement of a His residue, lead to the idea that the mechanism of serine proteases is exploited by the ribosome as well. A candidate for this His residue is His229 of L2 (see Ref. [54] for discussion and
27
28 The Ribosomal Elongation Cycle
Fig. 10 The mechanism of peptide-bond hydrolysis of serine proteases. Three amino acids Ser, His and Asp participate in this reaction. (I) The nucleophilic hydroxyl anion of the serine residue attacks the carbonyl group of the substrate (peptide bond)
forming (II) a covalent acyl-enzyme intermediate. (III) The serine molecule is then displaced via a nucleophilic attack by a water molecule. (IV) Both acyl formation and breakdown proceed via a normally highenergy tetrahedral intermediate.
references therein). The recent X-ray analysis of 50S crystals demonstrates that a His–Asp module as in the case of serine proteases does not seem to apply, since no proteins are within 18 Å of the active site (see the next section; [105]). The role of a His residue as being seemingly critical for peptide-bond formation is therefore still unclear. 4.1.2 Physical Concept: The Template Model The essence of this concept is that an enzyme organizes a defined stereochemical arrangement of the two substrates that are to be covalently bonded. The stereochemical arrangement is sufficient to allow for a dramatic acceleration of the reaction rate by 106 –109 −fold. This concept does not require any direct chemical involvement in the catalysis of the reaction such as a transient covalent binding of the substrate(s) to the enzyme. Let us consider the acceleration factor provided by the ribosome. The upper limit of the rate for a ribosome-free environment can be estimated from a reaction between NH2 OH and AcPhe-tRNA (see Ref. [1]). The rate for the nucleophilic attack to form an ester or a peptide bond is 6 × 10−5 M−1 s−1 , which means one peptide bond is formed in 30 h. On the ribosome, one peptide bond is made in 50 ms (15–20 peptide bonds per ribosome per second [119]). From these numbers (15–20) s−1 /6 × 105 M−1 s−1 , an acceleration factor of 3 × 105 M can be calculated; in other words, the ribosome accelerates the reaction by a factor of 3 × 105 M. Kinetic data from organic chemistry demonstrate that a rate factor of this magnitude can be obtained from simple model compounds where the reactants are
4 The PTF Reaction Fig. 11. Peptide-bond formation in model compounds with appropriate juxtaposed nucleophile. See text for explanations.
appropriately juxtaposed. Bruice and Benkovic [120] have shown that the rate of intramolecular amine attack in phenyl-4(dimethylamino) butyrate (I in Fig. 11) is 1.3 × 103 M faster than bimolecular trimethylamine attack on phenyl acetate; a similar enhancement of the rate has been observed for intramolecular reaction in succinate half ester anion (II; Fig. 11) as compared with bimolecular reaction of acetate ion with phenyl acetate. A further rate enhancement is seen with an intramolecular reaction in the rigid ester anion (III) that proceeds 2.3 × 102 times more rapidly than that in (II), because in (III) the oxyanion nucleophile and ester are more rigidly fixed and thus better suited for the reaction, whereas (II) has rotational freedom to adopt nonproductive conformations. These results allow the prediction that if the ribosome were simply to hold the α-amino nitrogen of an aminoacyltRNA in the same position relative to the ester linkage of a peptidyl-tRNA as the carboxylate anion is held relative to the ester linkage in (III), peptide-bond formation would occur spontaneously with a rate comparable to the in vivo rate of ribosomes. The more rigid the reactants are fixed in a favorable stereochemistry, the faster the reaction proceeds. In other words, the rate of peptide-bond performance of the ribosome can be explained exclusively using the template model. 4.2 Data from the Crystal Structures
The universally conserved residue A2451 of domain V is the nearest base to the Yarus-transition analog. It was thought to be a good candidate for a general acid– base catalyst, since its N3 is about 3 Å from the oxygen and 4 Å from the nitrogen of the phosphoamide in the Yarus inhibitor (Fig. 12A). This proposal was strength-
29
30 The Ribosomal Elongation Cycle
Fig. 12 Tight fixation of the CCA ends of the P- and A-tRNAs observed in 50S subunit from H. marismortui in complex with (A) the Yarus inhibitor, and (B) the products following peptide-bond formation. The CCA ends of the tRNAs in the A- and Psites are colored red and green, respectively. The N3 of A2451 (dark blue) is 3.4 Å from the O2 of the Yarus inhibitor (see also inset in Fig. 7), whereas the same O2 is only 2.8 Å from the 2 -deoxy of A76 (arrowed). Selected rRNA residues of domain V of the
23S rRNA are colored light blue, including the A- and P-loop bases that participate in A- and P-site CCA end fixation (E. coli numbering). In (B) the P-site C74 and C75 have been omitted for clarity. Dashes indicate H-bonding and rRNA nucleotides use the following color scheme: oxygen, red; phosphorus, yellow; nitrogen, blue; carbon, dark blue. (A) and (B) were generated from pdb files 1FFZ [105] and 1KQS [25], respectively, using Swisspdb viewer [150] and rendered with POVRAY.
4 The PTF Reaction
ened when the pK value of A2451 was found to be abnormally high at neutral pH (> 7, which is 6 pH units higher than expected) [121, 122], a property essential for acid–base catalysis as it allows for easy donation and withdrawal of a proton from the α-amino group of the aa-tRNA at the A-site. According to the same model [105], protonation of A2451 would also allow formation of a hydrogen bond with the carbonyl oxyanion of the tetrahedral transition state analog (inset of Fig. 7). However, the pH dependence of DMS modification at position 2451, with which the abnormally high pK of this residue was demonstrated, was subsequently shown to be displayed only by inactive ribosomes [123]. Shortly following, several groups reported that A2451 was not essential for peptide-bond formation, since ribosomes bearing mutations at position 2451 exhibited only modest (2–14-fold) decreases in the rate of peptidyl transfer [124] and were instead shown to be defective in substrate binding [125]. The next glimpse of a stage in the PTF reaction was that of the final products, obtained by soaking of A- and P-site substrates into enzymatically active H. marismortui 50S crystals [25]. The activity of the crystals was demonstrated by addition of the native 50S crystals to a solution containing A- and P-site substrate analogs which resulted in product formation, and ceased upon removal of the crystals. The structure determined to 2.4–3.0 Å was post-peptide-bond formation but pretranslocation and showed that the deacylated CCA bound to the P-site had its 3 -OH in close proximity to the N3 of A2451 (Fig. 12B) [25]. Counterevidence against the involvement of A2451 in stabilizing the transition state analog followed: If the oxyanion of the tetrahedral intermediate is hydrogenbonded to the N3 of A2451, then this N3 must be protonated at around pH 7 and therefore should lose its proton at pH > 7.3. In this case, one would expect a strong pH dependence of the affinity of the Yarus inhibitor, since the hydrogen bond would contribute significantly to the affinity. To test this hypothesis, Strobel and co-workers [122] determined the affinity of the Yarus inhibitor for the 50S subunit at all pH values between 5 and 8.5 and found that it remained unchanged, a result inconsistent with the idea that the oxyanion is stabilized via a H-bond to the N3 of A2451. The same conclusion was drawn by subsequent crystallographic studies showing that the oxyanion of the tetrahedral intermediate points away from N3 of A2451, thus excluding a possible hydrogen bonding between these two atoms [103]. Furthermore, the Yarus inhibitor is not an honest mimic of the transition state: the distance between the O2 of the Yarus inhibitor and the 2 -C of the deoxy-A76 (dA76) ribose at the P-site is only 2.8 Å (arrowed in Fig. 12A). A physiological P-site substrate contains a 2 -OH at this 2 -C atom, which is essential for peptidebond formation [126], and would sterically clash with the O2 in the position of the Yarus inhibitor observed in the crystal. The essential nature of this 2 -OH might be explained by the observation that the α-NH2 possibly forms a hydrogen bond with this OH (illustrated in Fig. 7B). The general base catalysis debate flared up again, when Rodnina and co-workers [127] presented evidence that peptide-bond formation depends on two ionizable groups, one with a pK a of 6.9 and the other with a pK a of 7.5. The former was shown to be associated with the α-NH2 group of puromycin used in the kinetic
31
32 The Ribosomal Elongation Cycle
experiments, whereas the latter seemed to be ribosome-associated. The ionizable group evidently belongs to A2451 since a ribosome bearing an A2451U mutation catalyzed peptide-bond formation ∼130 times slower than normal and had lost the pH dependence associated with the titratable group at a pK a of ∼7.5. However, an alternative explanation suggested by the authors was that the protonated group is part of the A2450:C2063 base pair lying directly behind A2451 (seen in Fig. 12B) – the candidate-ionizable group being the N1 of A2450. Although a distance of 7 Å from the N1 to the α-NH2 group is too long for hydrogen transfer, a postulated conformational change of the PTF center might bring A2450 within the range [127]. The assumption of conformational changes broadens again the number of possible candidates that might play a role in the kind of chemical catalysis that is advocated here. We note that evidence has been presented implicating His229 of protein L2 in this catalysis [54], although current maps place this residue more than 20 Å from the tetrahedral intermediate of the transition state. The best that can be said at the moment is that a direct role of A2451 in a general acid–base catalysis is hard to be reconciled with the observation that A2451 in active ribosomes does not contain a titratable group at this pK a in contrast with inactive ribosomes [123]. In fact, the ribosome need not directly involve chemically in the catalysis of the PTF reaction, such as the formation of a transient covalent interaction between the substrate (tRNAs) and the enzyme (the ribosome or, more specifically, in this case the rRNA). The template model predicts that tight stereochemical arrangement of substrates relative to one another would be sufficient to provide the dramatic acceleration of the reaction rate needed for peptide-bond formation (see the previous section). In this case, the role of A2451 would be to withdraw a proton from free nucleophilic α-NH2 group of the A-site substrate or form a hydrogen bond with the α-NH2 group, thus promoting peptide-bond formation via proper positioning of the NH2 group. The reaction scheme would follow similar to that presented in Fig. 7 and described in more detail in the corresponding legend. Tight fixation of the CCA ends of the P- and A-tRNAs is exactly what is observed both in the analog soaked 50S crystal structure (Fig. 12A) and also with the 50S structure containing the products of the PTF reaction following soaking of the A- and P-site substrates (Fig. 12B). In the P-site, the CCA end is locked into position by two Watson–Crick base pairs, viz. C74 and C75 with G2252 and G2251. A76 stacks on the ribose of A2451 (seen clearly in Fig. 12B). In the A-site, the CCA end of the aa-tRNA is fixed by (i) Watson–Crick base-pairing between C75 and G2553, (ii) a type-I A-minor motif between A76 and the G2583–U2506 base pair, and (iii) an additional H-bond interaction between the 2 -OH of A76 with U2585. Tight fixation of the CCA ends of both A- and P-tRNAs at the PTF center underlines the importance of the template model in peptide-bond formation, viz. that precise stereochemical fixation is predominantly responsible for the enormous acceleration of the reaction. The rate of peptide-bond formation on the ribosome of ∼50 s−1 was estimated to be ∼105 faster than the uncatalyzed reaction (the rate in the absence of ribosomes) [1]. The PTF reaction without chemical catalysis
5 The Translocation Reaction
(i.e., ribosomal ionizing group is protonated at pH < 7) occurs with a rate of ∼0.5 s−1 [127], which is still >1000 faster than the uncatalyzed reaction. If this estimation is correct then the physical mode of peptide-bond formation represents approximately 90% of the reaction rate, with the chemical mode making up the remaining 10%. 4.3 Why both the Physical and Chemical Concepts for Peptide-bond Formation?
If the physical mode sufficiently explains the overall rate of protein synthesis, why does the chemical mode exist in addition? A possible explanation is that the template model requires a maximal rigid surface for the ligands, whereas the ribosome has to be flexible to allow the movement of the mRNA•tRNA2 complex during translocation. To achieve both a fast rate and a flexible structure of the ribosome, both physical and chemical catalysis might need to be involved for an optimal rate of protein synthesis, although it is already clear that the template (physical) mode probably plays the dominant role in peptide-bond formation. 5 The Translocation Reaction
Following peptide-bond formation, the positions of the tRNAs remain unchanged. This has been demonstrated by cryo-EM analyses of E. coli ribosome complexes [11, 104] and – concerning the CCA ends at the PTF center – by soaking Aand P-site substrates into active 50S H. marismortui crystals and solving the structure of the reaction products after peptide-bond formation [25]. Now the ribosome must transfer the products, the peptidyl-tRNA in the A-site and deacylated tRNA in the P-site, to the P- and E-sites respectively, i.e., shifting the ribosome from PRE to POST state (Figs. 1(c) and (f), respectively). This process is termed translocation. It must be extremely accurate at both ends of the tRNA molecule: the anticodon–codon complex must be moved exactly 10 Å (the length of one codon), longer or shorter movements will change the reading frame. At the other end of the A-site peptidyl-tRNA, the CCA end must also be precisely moved into the P-site so as to set up the next peptidyl-transferase reaction with the incoming A-tRNA. Incorrect placement of the peptidyl-tRNA at the P-site could be disastrous for peptide-bond formation and lead to abortion of translation. Ribosomes have an innate translocase activity [14], but it is more than one order of magnitude slower than that of the EF-G catalyzed reaction [13]. This implies that the structures necessary to move the tRNAs reside in the ribosome and that the role of EF-G (and EF2 in eukaryotes) is to reduce the activation energy barrier that separates the two sets of tRNA positions. Important questions remaining unanswered are: How does EF-G mediate translocation and what ribosomal components are involved in transfer of the tRNAs?
33
34 The Ribosomal Elongation Cycle
5.1 Conservation in the Elongation Factor-G Binding Site
The crystal structure of EF-G has been solved in the nucleotide-free form [83] and in complex with GDP [84]. The GTP form is the active one, which binds to the ribosome and triggers translocation. Hydrolysis of GTP inactivates EF-G and dissociates it from the ribosome (reviewed in Ref. [128]). EF-G belongs to the same subfamily of G proteins as IF2, RF3, and EF-Tu, the latter of which has been crystallized in both GTP (active) and GDP (inactive) forms. Comparison of these EF-Tu structures reveals that they exhibit large domain shifts relative to one another [85]. Cryo-EM reconstructions of EF-G bound to bacterial 70S ribosomes at 17.5–20 Å [7, 9, 129] and EF2 to eukaryotic 80S ribosomes at 17.5 Å [130] show similar binding sites for both factors (Figs. 13A and B). In these complexes, antibiotics were used to trap the elongation factor on the ribosome. EF-G was trapped on the 70S ribosome using the antibiotic fusidic acid, which allows translocation and GTP hydrolysis, but blocks the switch into the GDP conformer of the factor, preventing dissociation from the ribosome. The eukaryotic eEF2 was locked on the yeast ribosome using the antifungal sordarin, which is thought to function analogously to fusidic acid [130]. The EF-G complex was formed with a typical PRE state, i.e., A- and P-tRNAs were present. As expected, the tRNAs were translocated to the P- and E-sites, but of special interest is that the tip, domain IV, of EF-G
Fig. 13 Comparison of cryo-electron microscopic analyses of EF-G 70S complex from E. coli and EF2 80S complex from S.cerevisiae. Side view of (A) EF-G 70S complex and (B) EF2 80S complex with small subunit on left and large subunit on right. The same orientation is seen, in the small inset on the left, of an empty 80S ribosome, where the 40S and 60S subunits are colored yellow and blue, respectively. Relative arrangements of (C) EF-G and P-tRNA and
(D) EF2 and P-tRNA (where the P-tRNA was placed into the 80S density map on the basis of the position observed in a PtRNA-bound 70S ribosome reconstruction). Landmark abbreviations are for the small subunit: b, body; bk, beak; h, head; sh, shoulder and sp, spur. Landmarks for the large subunit: CP, central protuberance; St, L7/L12 stalk. The roman numerals on EFG/EF2 refer to the domains. Figures modified from Gomez-Lorenzo et al. [130].
5 The Translocation Reaction
was shown to occupy the position of the A-site. Similarly, the position for EF2 seen in the EF2:ribosome complex also occupied the A-site, and came very close to the position of the P-tRNA (Figs. 13C and D). EF-G-mediated translocation is also possible in the presence of nonhydrolyzable GTP analogs, such as GDPNP, thus suggesting that binding of EFG alone is sufficient for translocation and that hydrolysis is necessary for conformational change and release of EF-G·GDP [128]. We repeat that the fact that translocation is an intrinsic activity of the ribosome has been shown by the factor free “spontaneous” translocation [131]. Furthermore, a dipeptidyl-tRNA that has been formed at the A-site via peptide-bond formation shows a much faster spontaneous translocation than a chemically made dipeptidyltRNA that has been bound to the A-site [13]. The energy for translocation therefore might arise by the peptide-bond formation, at least in part. The surprising observation that the antibiotic sparsomycin, an efficient blocker of peptide-bond formation, is able to effectively trigger a single translocation has corroborated the view that it is not the energy from the EF-G-dependent GTP hydrolysis that drives the translocation [132]. Precisely the opposite view, namely that EF-G-dependent GTP hydrolysis limits the translocation reaction thus accelerating the translocation, has been put forward by one group [37]. In this case, EF-G would be considered a “motor protein”, although this view has not yet been accepted by the wider scientific community. One reason might be that the same group has published just the opposite results about 15 years ago: Also applying stop-flow measurements, they showed that the translocation reaction limits the EF-G-dependent GTP hydrolysis and not vice versa, namely that EF-G-dependent GTP cleavage occurs after translocation and thus the correspondingly released energy cannot directly flow into the translocation reaction [34]. For classical G-proteins, a GTPase-activating protein (GAP) stimulates the Gprotein-mediated hydrolysis of GTP. In the case of EF-G/EF2, the GAP is provided by components of the ribosome. There are certainly gross conformational changes visible upon binding of each elongation factor to the ribosome. One of the most striking changes is seen within the stalk region; density for this region is absent in the empty 70S and 80S ribosomes but becomes more ordered upon EF-G/EF2 binding [7, 129, 130], supporting its universal role in factor binding. The best candidates for the GAP role include the pentameric stalk complex, which consists of the ribosomal proteins L10 (L7/L12)4 , as well as a region of the 23S rRNA termed the sarcin-ricin loop (SRL). The SRL is so named because cleavage after G2661 of the bacterial 23S rRNA within this region by the highly specific RNase α-sarcin inhibits all elongation factor-dependent activities [133] and similar effects are seen after removing the neighboring base A2660 (E. coli nomenclature) in the 26S rRNA of yeast by the N-glycosidase activity of the ricin A-chain [134]. Furthermore, this region contains the longest (12 nucleotides) universally conserved stretch of rRNA underlining its functional importance. Recently, hybrid ribosomes were constructed where the proteins at the GTPase center from E. coli, L7/L12 and L10, were replaced with their eukaryotic counterparts from rat P1/P2 and P0, respectively [135]. Both the in vitro translation and GTPase activity of the resultant
35
36 The Ribosomal Elongation Cycle
hybrid ribosomes were strictly dependent on the presence of the eukaryotic elongation factors, EF2 and EF1a. This reflects not only the specificity of the interaction between the stalk proteins and the elongation factors from each kingdom, but also the importance of the stalk proteins in mediating elongation factor GTPase activity. The ribosomal protein L11 (and associated L11-binding site of the 23S rRNA) is often considered as a candidate for a GAP role and is often referred to as the GTPase-associated center (GAC) alone and collectively with the SRL (however to avoid confusion we will refrain from using this term, especially since L11 is unlikely to be a true GAC of the ribosome, see following). The reason that L11 has been assigned a GAP role is because mutations in both L11 and its binding site on the 23S rRNA can confer resistance against the antibiotic thiostrepton, a potent inhibitor of EF-G- and EF-Tu-dependent GTPase activities [136]. However, the direct involvement of L11 in the factor-dependent GTPase is not very probable, since (i) mutants lacking L11 are viable, although extremely compromised as indicated by about 6-fold growth retardation [137], and (ii) the IF-2-dependent GTPase is stimulated rather than blocked by thiostrepton [138]. Furthermore, replacement of either the L10 (L7/L12)4 complex or L11 with the equivalent rat protein showed that the P0 (P1 P2)2 complex, but not the eukaryotic counterpart to L11 (RL12), was responsible for factor specificity and associated GTPase dependence, although addition of L11 or RL12 did stimulate protein synthesis significantly [135]. Consistently, L11 is certainly in close proximity to the elongation factors: cryo-EM analyses of EF-G bound to 70S ribosomes revealed that upon binding of EF-G, the N-terminal domain of L11 is shifted so as to form an arc-like connection with the G-domain of EF-G [139]. This arc-like connection is also observed in the EF2–80S complex although it is broader and more fused [130]. However, from cryo-EM reconstructions of EF-Tu ternary complex ribosome complexes stalled with the antibiotic kirromycin, it is the α–sarcin/ricin loop that makes direct contact with the G domain of the EF-Tu [104] and not the L11 region. Although residue 1067 within H43 of the L11-binding site rRNA makes definite contact with the highly conserved region of the T arm of the tRNA, no contact between L11 and the tRNA is observed in the recent 9 Å reconstruction [104], in contrast with earlier proposals that were based on lower resolution reconstructions [140]. The most striking result to emerge from the improved resolution was an observed movement of 7 Å in the L11 region upon ternary complex binding to bring this region into contact with the tRNA [104]. This leads the authors to propose a model, whereby the conformation change in the L11 region perturbs the orientation of the tRNA such that codon–anticodon interaction can be established, which in turn provides the trigger for GTPase activation of EF-Tu. In this model, the L11 region would not act as a GAP directly, but would stimulate the activity consistent with the available biochemical data. Lastly, it is noteworthy that L11 also stimulates the stringent response factor RelA in the presence of a deacylated tRNA at the ribosomal A-site, which leads to the RelA-dependent synthesis of (p)ppGpp, an essential effector during the stringent response.
5 The Translocation Reaction
5.2 Dynamics within the Ribosome
The α–ε model for translocation hypothesizes a moveable domain within the ribosome that carries the A- and P-tRNAs during translocation (reviewed in Ref. [43]). Evidence for this model comes from testing the accessibility of phosphate groups on the tRNAs in the PRE and POST states. The essential observation was that the protection patterns of A- and P-tRNAs differ from one another, but the corresponding tRNAs exhibit the same protection patterns in the PRE state as they do in the POST. This suggests that distinct ribosomal components are involved in carrying the tRNAs from the PRE to the POST state. In contrast with the intensive contact patterns observed with tRNAs bound to programed ribosomes, the 40 nucleotides of mRNA covered by the ribosome [141] have almost no specific contacts with the ribosome, except two positions upstream of the two decoding codons (i.e., those codons involved in codon–anticodon interactions; [142]). This immediately suggests that the tightly bound tRNAs are the handle to move the tRNA2 •mRNA complex during translocation, whereas the mRNA follows because of the direct connection with the tRNAs via the two adjacent codon–anticodon interactions. This is one important reason why the two adjacent tRNAs maintain their simultaneous codon–anticodon interaction before (A- and P-sites) and after translocation (P- and E-sites). There are a number of candidates that may play a role in translocation of the tRNAs or even constitute portions of the moveable domains. Distinct regions within the crystal structures are disordered, most probably reflecting flexibility in these components. A classic example is the stalk region, which, as already mentioned, only becomes ordered upon factor binding. Another is the L1 region, the flexibility of which may regulate E-tRNA release. There are also certain structures that become either ordered or rearranged upon subunit association. Most of these elements are constituents of intersubunit bridges. One striking example is the universally conserved H69 in domain IV of the large subunit rRNA. H69 is the major element of bridge B2a, the largest intersubunit bridge (see Fig. 14), and is disordered in the H. marismortui 50S subunit structure but ordered in the D. radiodurans 50S subunit. Comparing the latter structure with the T. thermophilus 70S structure suggests that upon association H69 swings out towards h44, another very flexible element. In this extended conformation, H69 would be predicted to make contact with both Aand P-tRNAs [111]. Another element that is not fully resolved in either of the 50S structures is H38 of domain II, a constituent of bridge B1a, often called the “A-site finger” because it contacts the A-tRNA. As previously mentioned, both B1a and B2a bridge elements have corresponding counterparts within the 80S yeast ribosome, strengthening their candidacy for a role in translocation. The most significant progress of our understanding of the translocation reaction comes from a detailed cryo-EM study of this reaction [31], together with a biochemical analysis [143]. EF-G induces a ratchet-like movement of the 30S subunits as first shown with empty ribosomes [144], but here the movement could be coupled with translocation of tRNAs at higher resolution, yielding further insight into this
37
38 The Ribosomal Elongation Cycle
Fig. 14 Comparison of intersubunit bridge positions between bacteria and yeast ribosomes. (A, B): The 30S (blue) and 50S (gray) ribosomal subunits of T. thermophilus are shown from their intersubunit sides. Intersubunit bridges are marked in red and are annotated according to the nomenclature from Gabashvili et al. [52], where the lettering of the bridges B1, B2 and B3–B5 are labeled in blue, green and orange, respectively. Figure adapted from Cate et al.
[154]. (C, D): The 40S (yellow) and 60S (blue) ribosomal subunits of S. cerevisiae are also shown from the interface side with the intersubunit bridges in red. The lettering common to T. thermophilus are labeled in blue (B1), green (B2) and orange (B3–B5), whereas those of additional intersubunit connections in the yeast ribosome are labeled eB8–eB11 in red. Note that bridges B6 and B7 are not included for simplicity. Adapted from Spahn et al. [155].
reaction: EF-G•GTP does not bind to ribosomes in a POST state or to ribosomes that carry a peptidyl-tRNA at the P-site, but does so with high affinity if the ribosome carries only a deacylated tRNA at the P-site as it is the case in the PRE state after peptide-bond formation (see Fig. 2B). However, it difficult to imagine how EF-G binding is influenced by a deacylated tRNA at the P-site, when the A-site is occupied with a peptidyl-tRNA. The former state is denoted “locked” for binding of EF-G•GTP, the latter “unlocked”, although no structural differences could be detected at the resolution of 10–13 Å. The following picture emerged: (i) After occupying the A-site with an aa-tRNA and peptide-bond formation, the peptidyl-tRNA is in the A-site and a deacylated tRNA at the P-site, no hybrid site is visible. (ii) EF-G•GTP binds and induces the first forth-movement of the ratchet motion of the 30S subunit, a turn of about 20∞. During this movement, the deacylated tRNA is seen in a hybrid position
5 The Translocation Reaction
P/E and it is believed (although not yet observed) that the peptidyl-tRNA would also occupy a hybrid position A/P. (iii) The ribosome triggers the EF-G-dependent GTP hydrolysis and the authors postulate that the resulting EF-G conformational change into the GDP conformer completes the translocation reaction in that (i) the 30S subunit moves back (second part of the ratchet movement), (ii) the tRNAs continue their movement to E- and P-sites, respectively, and (iii) EF-G•GDP dissociates from the ribosome. Until now, EF-G•GTP has not been successfully crystallized, whereas EF-G•GDP has. A comparison of the structure of EF-G•GDPNP (a GTP analog) on the ribosome with that of the crystallized EF-G•GDP revealed a striking shift of domains 3–5 resulting in a movement of about 35 Å of the tip of domain 4. The authors postulated that this dramatic movement, similar in extent to that observed between the GTP and GDP conformers of EF-Tu (40 Å), promotes the back movement of the ratchet motion and might be also essential for the release of EF-G•GDP from the ribosome. If the striking conformational change is causing the release of EF-G rather than the second half of the ratchet movement, then the EF-G-dependent GTP hydrolysis has nothing to do with a motor protein function of this elongation factor. If, however, the GTP hydrolysis is in fact promoting the back-swing of the 30S subunit, then this could be interpreted as the molecularcorrelate to a motor-protein role of EF-G. The authors [31] also detect a substantial movement of the L1 protuberance during the translocation and postulate an active participation of this structure in the translocation of the deacylated tRNA from the P-site to the E-site. During the translocation reaction a movement of a peptidyl-tRNA from the Asite to the P-site via a P/E hybrid position can be easily reconciled with the α–ε model as has been suggested previously [51]. Note the difference to the hybridsite model. This model postulates that after peptide-bond formation and before translocation the tRNAs move on the large subunit but stay on the 30S subunit, i.e., the peptidyl-tRNA at the A-site moves from A/A-site to the A/P-site, and only the translocation movement brings the tRNA into the P/P-site. In contrast, the cryo-EM study suggests that during translocation the peptidyl-tRNA moves from the A- to the P-site via a transient A/P position. Despite the asymmetry of the ribosome, an internal symmetry within the large ribosomal subunit has been identified ([145]; reviewed in Refs. [146, 147]). The fact that the CCA ends of tRNAs at the A- and P-sites are related by 180◦ rotation rather than by a translational movement like the rest of the tRNA [25, 103, 105], suggested that at least a minimum of symmetry might exist within the PTF center of the ribosome. However, Yonath and co-workers [145] realized that this symmetry extended beyond the A and P loops, and identified two regions of approximately 90 nucleotides that are related by rotational symmetry in the D. radiodurans 50S subunit (Fig. 8). The closest residues to the center of rotation include a number of residues of the PTF ring, strands of H89 and H93 and the stem loops of H92 and H80, which are the A and P loops, respectively (symmetry-related residues are colored with corresponding colors in Fig. 8). A similar symmetry was also discovered within the large subunit from archaea H. marismortui and thermophilic
39
40 The Ribosomal Elongation Cycle
eubacteria T. thermophilus suggesting that it is a conserved feature of ribosomes. This is in itself not surprising since this region of the ribosome is highly conserved, but does add an extra dimension. But what is the significance of the symmetry? Following peptide-bond formation, the peptidyl-ACC of the A-site tRNA must move into the P-site of the PTF center, i.e., a transition from making interactions with the A loop to making interactions with the P loop. Since the symmetry-related residues within the PTF center effectively provide a lining for the walls of the cavity, the idea arose that the CCA ends of the tRNA should follow a rotation movement during translocation. By computational modelling of such a transition, it was possible to accomplish a rotation from the A- to the P-site without any steric clashes. Interestingly, the symmetry-related residues within the P-site are located slightly deeper in the tunnel than the corresponding A-site residues, leading to the proposal that the rotational movement during translocation drives the peptide into the tunnel [147]. One of the most exciting discoveries related to the rotational symmetry was that the motion is centered on residue A2602. This residue is known to be highly flexible, a fact emphasized by the different orientation of the adenine base of this residue in almost all the different crystal structure complexes solved to date. Indeed, upon binding of many ligands, such as antibiotics and CCA-end mimics, A2602 assumed a different position [146]. This flexibility and the location of A2602 at the center of rotation may suggest that A2602 has a role in guiding the CCA end of the tRNA from the A- to the P-site. Another conserved residue predicted to maintain interaction throughout the rotational motion was U2585, which has also been identified as being very flexible. The ribosome research is in exciting upheaval phase where the structure begins to explain the function. For the first time, we can imagine how the ribosome is using its complicated structure to perform its complicated function.
References 1 K. H. NIERHAUS, H. SCHULZE, B. S. COOPERMAN, Biochem. Int. 1980, 1, 185–192. 2 K. H. NIERHAUS, Angew. Chem. Int. Ed. 1996, 35, 2198–2201. 3 K. H. NIERHAUS: in Ribosomal RNA and Group I Int, eds R. GREEN, R. SCHROEDER, R. G. LANDES, Georgetown, TX 1996, 69–81. 4 K. H. NIERHAUS: in Nature Encyclopedia of Life Sciences, Nature Publishing Group, London 1999, www.els.net. 5 D. N. WILSON, G. BLAHA, S. R. CONNELL et al., Curr. Protein Pept. Sci. 2002, 3, 1–53. 6 D. N. WILSON, K. H. NIERHAUS, Angew. Chem. Int. Ed. Engl. 2003, 42, 3464–3486.
7 R. AGRAWAL, P. PENCZEK, R. GRASSUCCI et al., Proc. Natl. Acad. Sci. USA 1998, 95, 6134–6138. 8 H. STARK, E. V. ORLOVA, J. RINKE-APPEL et al., Cell 1997, 88, 19–28. 9 H. STARK, M. V. RODNINA, H. J. WIEDEN et al., Cell 2000, 100, 301–309. 10 M. VALLE, J. SENGUPTA, N. K. SWAMI et al., EMBO J. 2002, 21, 3557–3567. 11 R. K. AGRAWAL, C. M. T. SPAHN, P. PENCZEK et al., J. Cell. Biol. 2000, 150, 447–459. 12 S. SCHILLING-BARTETZKO, A. BARTETZKO, K. H. NIERHAUS, J. Biol. Chem. 1992, 267, 4703–4712.
References 13 K. BERGEMANN, K. H. NIERHAUS, J. Biol. Chem. 1983, 258, 15105–15113. 14 L. P. GAVRILOVA, A. S. SPIRIN, FEBS Lett. 1971, 17, 324–326. 15 A. R. CUKRAS, D. R. SOUTHWORTH, J. L. BRUNELLE et al., Mol. Cell 2003, 12, 321–328. 16 U. R. KUTAY, C. M. T. SPAHN, K. H. NIERHAUS, Biochim. Biophys. Acta 1990, 1050, 193–196. 17 J. R. MESTERS, A. P. POTAPOV, J. M. de GRAAF et al., J. Mol. Biol. 1994, 242, 644–654. 18 A. SAVELSBERGH, V. I. KATUNIN, D. MOHR et al., Mol. Cell 2003, 11, 1517–1523. 19 J. M. BERG, J. L. TYMOCZKO, L. STRYER: Biochemistry, 5th edition, Freeman, New York 2002. 20 F. J. TRIANA-ALONSO, K. CHAKRABURTTY, K. H. NIERHAUS, J. Biol. Chem. 1995, 270, 20473–20478. 21 H. R. BOURNE, D. A. SANDERS, F. MCCORMICK, Nature 1990, 348, 125–132. 22 D. MOAZED, H. F. NOLLER, Nature 1989, 342, 142–148. 23 M. M. YUSUPOV, G. Z. YUSUPOVA, A. BAUCOM et al., Science 2001, 292, 883–896. 24 H. F. NOLLER, M. M. YUSUPOV, G. Z. YUSUPOVA et al., FEBS Lett. 2002, 514, 11–16. 25 T. M. SCHMEING, A. C. SEILA, J. L. HANSEN et al., Nat. Struct. Biol. 2002, 9, 225–230. 26 Y. SEMENKOV, T. SHAPKINA, V. MAKHNO et al., FEBS Lett. 1992, 296, 207–210. 27 D. MOAZED, H. F. NOLLER, Cell 1989, 57, 585–597. 28 A. GNIRKE, K. H. NIERHAUS, J. Biol. Chem. 1986, 261, 14506–14514. 29 H. J. RHEINBERGER, K. H. NIERHAUS, J. Biomol. Struct. Dyn. 1987, 5, 435–446. 30 R. K. AGRAWAL, P. PENCZEK, R. A. GRASSUCCI et al., J. Biol. Chem. 1999, 274, 8723–8729. 31 M. VALLE, A. ZAVIALOV, J. SENGUPTA et al., Cell 2003, 114, 123–134. 32 C. M. SPAHN, G. BLAHA, R. K. AGRAWAL et al., J. Mol. Cell. 2001, 7, 1037–1045. 33 J. M. ROBERTSON, H. PAULSEN, W. WINTERMEYER, J. Mol. Biol. 1986, 192, 351–360.
34 J. M. ROBERTSON, C. URBANKE, G. CHINALI et al., J. Mol. Biol. 1986, 189, 653–662. 35 J. WADZACK, N. BURKHARDT, R. J¨UNEMANN et al., J. Mol. Biol. 1997, 266, 343–356. 36 J. REMME, T. MARGUS, R. VILLEMS et al., Eur. J. Biochem. 1989, 183, 281– 284. 37 M. V. RODNINA, A. SAVELSBERGH, V. I. KATUNIN et al., Nature 1997, 385, 37–41. 38 H.-J. RHEINBERGER, K. H. NIERHAUS, Proc. Natl. Acad. Sci. USA 1983, 80, 4213–4217. 39 H.-J. RHEINBERGER, K. H. NIERHAUS, J. Biol. Chem. 1986, 261, 9133–9139. 40 A. GNIRKE, U. GEIGENMU¨ LLER, H.-J. RHEINBERGER et al., J. Biol. Chem. 1989, 264, 7291–7301. 41 T. P. HAUSNER, U. GEIGENMU¨ LLER, K. H. NIERHAUS, J. Biol. Chem. 1988, 263, 13103–13111. 42 H. SARUYAMA, K. H. NIERHAUS, Mol. Gen. Genet. 1986, 204, 221–228. 43 K. H. NIERHAUS, C. M. T. SPAHN, N. BURKHARDT: et al., in The Ribosome. Structure, Function, Antibiotics, and Cellular Interactions, eds R. A. GARRETT, S. R. DOUTHWAITE, A. LILJAS et al., ASM Press, Washington, DC 2000, 319–335. 44 R. A. GRAJEVSKAJA, Y. V. IVANOV, E. M. SAMINSKY, Eur. J. Biochem. 1982, 128, 47–52. 45 H.-J. RHEINBERGER, H. STERNBACH, K. H. NIERHAUS, Proc. Natl. Acad. Sci. USA 1981, 78, 5310–5314. 46 H.-J. RHEINBERGER, H. STERNBACH, K. H. NIERHAUS, J. Biol. Chem. 1986, 261, 9140–9143. 47 U. GEIGENMU¨ LLER, K. H. NIERHAUS, EMBO J. 1990, 9, 4527–4533. 48 M. DABROWSKI, R. JUNEMANN, M. A. SCHAFER et al., Biochemistry (Moscow) 1996, 61, 1402–1412. 49 M. DABROWSKI, C. M. T. SPAHN, M. A. SCHa¨fer et al., J. Biol. Chem. 1998, 273, 32793–32800. 50 N. POLACEK, S. PATZKE, K. H. NIERHAUS et al., Mol. Cell 2000, 6, 159–171. 51 M. A. SCHa¨fer, A. O. TASTAN, S. PATZKE et al., J. Biol. Chem. 2002, 277, 19095–19105. 52 I. S. GABASHVILI, R. K. AGRAWAL, C. M. T. SPAHN et al., Cell 2000, 100, 537–549.
41
42 The Ribosomal Elongation Cycle
53 M. S. VANLOOCK, R. K. AGRAWAL, I. S. GABASHVILI et al., J. Mol. Biol. 2000, 304, 507–515. 54 G. DIEDRICH, C. M. T. SPAHN, U. STELZL et al., EMBO J. 2000, 19, 5241–5250. 55 R. WILLUMEIT, S. FORTHMANN, J. BECKMANN et al., J. Mol. Biol. 2001, 305, 167–177. 56 H. BERCHTOLD, L. RESHETNIKOVA, C. O. A. REISER et al., Nature 1993, 365, 368. 57 N. BILGIN, L. A. KIRSEBOM, M. EHRENBERG et al., Biochimie 1988, 70, 611–618. 58 G. BLAHA, K. H. NIERHAUS, Cold Spring Harbor Symposia on Quantitative Biology, 2001, 135–145 Vol. 65. 59 H. ECHOLS, M. F. GOODMAN, Annu. Rev. Biochem. 1991, 60, 477–511. 60 R. T. LIBBY, J. L. NELSON, J. M. CALVO et al., EMBO J. 1989, 8, 3153–3158. 61 K. BEEBE, L. RIBAS De Pouplana, P. SCHIMMEL, EMBO J. 2003, 22, 668–675. 62 A. C. BISHOP, K. BEEBE, P. R. SCHIMMEL, Proc. Natl. Acad. Sci. USA 2003, 100, 490–494. 63 T. L. HENDRICKSON, T. K. NOMANBHOY, V. de CRECY-LAGARD et al., Mol. Cell 2002, 9, 353–362. 64 K. H. NIERHAUS, Mol. Microbiol. 1993, 9, 661–669. 65 H. STARK, M. V. RODNINA, H. J. WIEDEN et al., Nat. Struct. Biol. 2002, 15, 15–20. 66 A. WEIJLAND, A. PARMEGGIANI, Science 1993, 259, 1311–1314. 67 N. BILGIN, M. EHRENBERG, C. KURLAND, FEBS Lett. 1988, 233, 95–99. 68 A. P. POTAPOV, FEBS Lett. 1982, 146, 28–33. 69 A. P. POTAPOV, F. J. TRIANA-ALONSO, K. H. NIERHAUS, J. Biol. Chem. 1995, 270, 17680–17684. 70 J. M. OGLE, D. E. BRODERSEN, W. M. CLEMONS Jr et al., Science 2001, 292, 897–902. 71 S. OSAWA, T. H. JUKES, K. WATANABE et al., Microbiol. Rev. 1992, 56, 229–264. 72 J. M. OGLE, F. V. MURPHY, M. J. TARRY et al., Cell 2002, 111, 721–732. 73 A. P. CARTER, W. M. CLEMONS, D. E. BRODERSEN et al., Nature 2000, 407, 340–348. 74 W. M. CLEMONS, J. L. C. MAY, B. T. WIMBERLY et al., Nature 1999, 400, 833–840.
75 K. H. NIERHAUS, Biochemistry 1990, 29, 4997–5008. 76 J. J. HOPFIELD, Proc. Natl. Acad. Sci. USA 1974, 71, 4135–4139. 77 J. NINIO, Biochimie 1975, 57, 587–595. 78 M. EHRENBERG, D. ANDERSSON, K. MOHMAN et al.: in Structure, Function and Genetics of Ribosomes, eds B. HARDESTY, G. KRAMER, Springer, New York 1986, 573–585. 79 M. V. RODNINA, T. DAVITER, K. GROMADSKI et al., Biochimie 2002, 84, 745–754. 80 S. SCHILLING-BARTETZKO, F. FRANCESCHI, H. STERNBACH et al., J. Biol. Chem. 1992, 267, 4693–4702. 81 A. M. KARIM, R. C. THOMPSON, J. Biol. Chem. 1986, 261, 3238–3243. 82 F. J. LARIVIERE, A. D. WOLFSON, O. C. UHLENBECK, Science 2001, 294, 165–168. 83 A. AEVARSSON, E. BRAZHNIKOV, M. GARBER et al., EMBO J. 1994, 13, 3669–3677. 84 J. CZWORKOWSKI, J. WANG, T. A. SEITZ et al., EMBO J. 1994, 13, 3661–3668. 85 H. BERCHTOLD, L. RESHETNIKOVA, C. O. A. REISER et al., Nature 1993, 365, 126–132. 86 P. NISSEN, M. KJELDGAARD, S. THIRUP et al., Science 1995, 270, 1464–1472. 87 P. NISSEN, M. KJELDGAARD, J. NYBORG, EMBO J. 2000, 19, 489–495. 88 K. YAMAGUCHI, K. VON KNOBLAUCH, A. R. SUBRAMANIAN, J. Biol. Chem. 2000, 275, 28455–28465. 89 E. C. KOC, W. BURKHART, K. BLACKBURN et al., J. Biol. Chem. 2001, 276, 19363–19374. 90 E. C. KOC, W. BURKHART, K. BLACKBURN et al., J. Biol. Chem. 2001, 276, 43958–43969. 91 T. SUZUKI, M. TERASAKI, C. TAKEMOTO-HORI et al., J. Biol. Chem. 2001, 276, 33181–33195. 92 K. YAMAGUCHI, A. R. SUBRAMANIAN, J. Biol. Chem. 2000, 275, 28466–28482. 93 M. R. SHARMA, E. C. KOC, P. P. DATTA et al., Cell 2003, 115, 97–108. 94 F. BOUADLOUN, D. DONNER, C. G. KURLAND, EMBO J. 1983, 2, 1351–1356. 95 J. LANGRIDGE, J. H. CAMPBELL, Mol. Gen. Genet. 1969, 103, 339–347. 96 F. CRAMER, U. ENGLISCH, W. FREIST et al., Biochimie 1991, 73, 1027–1035.
References 97 J. R. MENNINGER, J. Biol. Chem. 1976, 251, 3392–3398. 98 F. JORGENSEN, C. G. KURLAND, J. Mol. Biol. 1990, 215, 511–521. 99 K. C. KEILER, P. R. H. WALLER, R. T. SAUER, Science 1996, 271, 990–993. 100 A. FUJIHARA, H. TOMATSU, S. INAGAKI et al., Genes Cells 2002, 7, 343–350. 101 F. JORGENSEN, F. M. ADAMSKI, W. P. TATE et al., J. Mol. Biol. 1993, 230, 41–50. 102 K. H. NIERHAUS, J. WADZACK, N. BURKHARDT et al., Proc. Natl. Acad. Sci. USA 1998, 95, 945–950. 103 J. L. HANSEN, T. M. SCHMEING, P. B. MOORE et al., Proc. Natl. Acad. Sci. USA 2002, 99, 11670–11675. 104 M. VALLE, A. ZAVIALOV, W. LI et al., Nat. Struct. Biol. 2003, 10, 899–906. 105 P. NISSEN, J. HANSEN, N. BAN et al., Science 2000, 289, 920–930. 106 M. WELCH, J. CHASTANG, M. YARUS, Biochemistry 1995, 34, 385–390. 107 G. STEINER, E. KUECHLER, A. BARTA, EMBO J. 1988, 7, 3949–3955. 108 C. M. T. SPAHN, C. D. PRESSCOTT, J. Mol. Med. 1996, 74, 423–439. 109 F. FRANCESCHI, K. H. NIERHAUS, J. Biol. Chem. 1990, 265, 16676–16682. 110 H. SCHULZE, K. H. NIERHAUS, EMBO J. 1982, 1, 609–613. 111 J. HARMS, F. SCHLUENZEN, R. ZARIVACH et al., Cell 2001, 107, 679–688. 112 S. KIRILLOV, J. WOWER, S. HIXSON et al., FEBS Lett. 2002, 514, 60–66. 113 B. MAGUIRE, A. BENIAMINOV, P. RAMU et al.: in 8th Annual Meeting of the RNA Society, RNA Society, Vienna 2003, 199. 114 S. R. FAHNESTOCK, H. NEUMANN, V. SHASHUA et al., Biochemistry 1970, 9, 2477–2483. 115 D. J. KLEIN, T. M. SCHMEING, P. B. MOORE et al., EMBO J. 2001, 20, 4214–4221. 116 B. E. MADEN, R. E. MONRO, Eur. J. Biochem. 1968, 6, 309–316. 117 K. K. WAN, N. D. ZAHID, R. M. BAXTER, Eur. J. Biochem. 1975, 58, 397–402. 118 I. RYCHLIK, J. CERNA, Biochem. Int. 1980, 1, 193–200. 119 H. BREMER, P. P. DENNIS: in Escherichia coli and Salmonella, eds F. C. NEIDHARDT, R. C. III J. L. INGRAHAM, et al., ASM Press, Washington, DC 1996, 1553–1569.
120 T. BRUICE, S. BENKOVIC, J. Am. Chem. Soc. 1963, 85. 121 G. W. MUTH, L. ORTOLEVA-DONNELLY, S. A. STROBEL, Science 2000, 289, 947–950. 122 K. M. PARNELL, A. SEILA, S. A. STROBEL, Proc. Natl. Acad. Sci. USA 2002, 99, 11658–11663. 123 M. A. BAYFIELD, A. E. DAHLBERG, U. SCHULMEISTER et al., Proc. Natl. Acad. Sci. USA 2001, 98, 10096–10101. 124 J. THOMPSON, D. F. KIM, M. O’CONNOR et al., Proc. Natl. Acad. Sci. USA 2001, 98, 9002–9007. 125 N. POLACEK, M. GAYNOR, A. YASSIN et al., Nature 2001, 411, 498–501. 126 K. QUIGGLE, G. KUMAR, T. W. OTT et al., Biochemistry 1981, 20, 3480–3485. 127 V. I. KATUNIN, G. W. MUTH, S. A. STROBEL et al., Mol. Cell 2002, 10, 339–346. 128 Y. KAZIRO, Biochim. Biophys. Acta 1978, 505, 95–127. 129 R. K. AGRAWAL, A. B. HEAGLE, P. PENCZEK et al., Nat. Struct. Biol. 1999, 6, 643–647. 130 M. G. GOMEZ-LORENZO, C. M. T. SPAHN, R. K. AGRAWAL et al., EMBO J. 2000, 19, 2710–2718. 131 L. P. GAVRILOVA, A. S. SPIRIN: in Methods in Enzymology, eds K. Moldave and L. Grossmann, vol. 30 1974, 452–462. 132 K. FREDRICK, H. F. NOLLER, Science 2003, 300, 1159–1162. 133 T. P. HAUSNER, J. ATMADJA, K. H. NIERHAUS, Biochimie 1987, 69, 911–923. 134 Y. ENDO, K. TSURUGI, J. Biol. Chem. 1987, 262, 8128–8130. 135 T. UCHIUMI, S. HONMA, T. NOMURA et al., J. Biol. Chem. 2002, 277, 3857–3862. 136 E. CUNDLIFFE: in The Ribosome: Structure, Function and Evolution, eds W. E. HILL, A. DAHLBERG, R. A. GARRETT et al., ASM Press, Washington, DC 1990, 479–490. 137 G. STO¨ FFLER, E. CUNDLIFFE, M. STO¨ FFLE-MEILICKE et al., J. Biol. Chem. 1980, 255, 10517–10522. 138 D. M. CAMERON, J. THOMPSON, P. E. MARCH et al., J. Mol. Biol. 2002, 319, 27–35. 139 R. K. AGRAWAL, J. LINDE, J. SENGUPTA et al., J. Mol. Biol. 2001, 311, 777–787. 140 H. STARK, M. V. RODNINA, J. RINKEAPPEL et al., Nature 1997, 389, 403–406.
43
44 The Ribosomal Elongation Cycle
141 D. BEYER, E. SKRIPKIN, J. WADZACK et al., J. Biol. Chem. 1994, 269, 30713–30717. 142 E. V. ALEXEEVA, O. V. SHPANCHENKO, O. A. DONTSOVA et al., Nucleic Acids Res. 1996, 24, 2228–2235. 143 A. V. ZAVIALOV, M. EHRENBERG, Cell 2003, 114, 113–122. 144 J. FRANK, R. K. AGRAWAL, Nature 2000, 406, 318–322. 145 A. BASHAN, I. AGMON, R. ZARIVACH et al., Mol. Cell 2003, 11, 91–102. 146 I. AGMON, T. AUERBACH, D. BARAM et al., Eur. J. Biochem. 2003, 270, 2543–2556. 147 A. BASHAN, R. ZARIVACH, F. SCHLUENZEN et al., Biopolymers 2003, 79, 19–41.
148 M. LAURBERG, O. KRISTENSEN, K. MARTEMYANOV et al., J. Mol. Biol. 2000, 303, 593–603. 149 T. TOYODA, O. F. TIN, K. ITO et al., RNA 2000, 6, 1432–1444. 150 N. GUEX, M. C. PEITSCH, Electrophoresis 1997, 18, 2714–2723. 151 U. B. RAWAT, A. V. ZAVIALOV, J. SENGUPTA et al., Nature 2003, 421, 87–90. 152 R. GREEN, J. R. LORSCH, Cell 2002, 110, 665–668. 153 J. M. BERG, J. R. LORSCH, Science 2001, 291, 203. 154 J. H. CATE, M. M. YUSUPOV, G. Z. YUSUPOVA et al., Science 1999, 285, 2095–2104. 155 C. M. SPAHN, R. BECKMANN, N. ESWAR et al., Cell 2001, 107, 373–386.
1
Gene Expression: Basic Concepts Paul Cullen and Michel Werner Service de Biochimie et G´en´etique Mol´eculaire, Gif-sur-Yvette, France
Stefan Lorkowski, Mario Kratz and Claudia Wenner University of M¨unster, M¨unster, Germany
Christoph Marschall and Hanns-Georg Klein IMGM Laboratoties GmbH, Martinsried, Germany
David Goodlett and Ruedi Aebersold The Institute for Systems Biology, Seattle, USA
Originally published in: Analysing Gene Expression. Edited by Stefan Lorkowski, Paul Cullen, Copyright ľ 2003 Wiley-VCH Verlag GmbH & Co. KGaA Weinheim. Print ISBN: 3-527-30448-2
1 Introduction
Following the publication of the draft sequence of the human genome in February 2001 [1, 2], the focus of many groups in academia and industry has shifted to the complex and demanding study of gene expression. The following pages include a description of how the information contained within the genes is expressed, i. e. an overview of the machinery and the regulation of gene transcription and protein translation in the eukaryotic cell. At present, there is a lively debate between the proponents of mRNA expression analysis and the champions of protein analysis as to which method provides the truest picture of functional gene expression. The mRNA camp points to the very high degree of sensitivity, the speed and the completeness of their methodology. Aficionados of protein expression, by contrast, point to the obvious discrepancies between the gene complement of an organism and its ultimate phenotype. Thus, the butterfly and the caterpillar have identical genomes but radically different proteomes. We believe that this debate has produced much heat but little light. The reality is that the study of mRNA expression and protein expression are complementary techniques, each of which has an irreplaceable contribution to make to the understanding of gene expression. Indeed, this logic also applies to the invidious Protein Science Encyclopedia - Online C 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Copyright DOI: 10.1002/9783527610754
2 Gene Expression: Basic Concepts
comparisons which are sometimes drawn between individual methods. Here too, the truth is that each method has its own particular usefulness and that the choice of method depends on the question being asked and the financial and temporal resources available to the investigator. 2 Basics of Transcription and Translation in the cell 2.1 Introduction
Proteins play important roles in nearly all biological processes. They catalyse chemical reactions, are crucial for the structure and stability of cells, regulate cellular uptake and storage of other essential molecules, and are responsible for coordinated growth and development. Thus, the synthesis of proteins is of prime importance for all cells. The structure of proteins is defined in the sequence of bases in the DNA. However, the DNA is not translated into proteins directly (Figure 1). Instead,
Fig. 1 Schematic overview of the flow of the genetic information. The genetic code, which is stored within the sequence of bases within the DNA, is the blueprint for the assembly of proteins. However, DNA is not immediately translated into proteins. First, a copy of one DNA strand is made in a process called transcription, which leads to a primary RNA transcript (preliminary RNA, preRNA). This RNA is identical to the coding strand of the DNA, except that
thymine is exchanged for uracil. Then, the non-coding parts of the RNA (introns) are resected in a process called splicing, which results in the formation of messenger RNA (mRNA), which is modified by the addition of a cap at the 5 -end and a poly(A) tail at the 3 -end. The mRNA is translocated into the cytoplasm, where it is translated into protein by special organelles called ribosomes. ‘R’ indicates the remaining amino acids of the protein.
2 Basics of Transcription and Translation in the cell
in a process called transcription a molecule of messenger RNA (mRNA) is copied from the genomic DNA. A single strand of mRNA then leaves the nucleus and serves as a blueprint for the assembly of a protein. This process, called translation, is mediated by ribosomes, small organelles consisting of several subunits of ribosomal RNA (rRNA), which are preloaded with auxiliary protein factors. By the action of ribosomes, the sequence of nucleotides in the mRNA molecule is translated into a corresponding sequence of amino acids to produce a protein chain. Each of the possible pool of 20 different amino acids from which the protein is to be built is brought to the ribosome by its specific transfer RNA (tRNA) molecule. This article outlines the basic concepts of transcription and translation, with a focus on the mechanisms in eukaryotes. 2.2 Transcription
Transcription generates both the mRNAs, which are the blueprint for the assembly of proteins, and the different types of RNA that are needed for RNA processing and protein synthesis (small nuclear RNA (snRNA), tRNA, rRNA). These RNA molecules are synthesised by an enzyme called RNA polymerase, which copies the sequence of one strand of DNA, the so-called coding strand. The enzyme binds tightly to specific DNA sequences called promoters, which indicate the starting point for RNA synthesis. Promoters are characterised by consensus sequences, i. e. typical sequences of bases within the DNA, which are in special positions relative to the coding region (Figure 2). By means of these promoters, RNA polymerase recognises the starting point for transcription. After binding to the promoter, the enzyme starts transcription by first unwinding a region of the double helix and subsequent coupling of two ribonucleotides, which must be complementary to the template DNA strand. The RNA polymerase molecule then assembles the mRNA strand by moving along the DNA, unwinding the helix, and adding one complementary nucleotide at a time (Figure 3). This results in the transient formation of a RNA/DNA hybrid helix. This helix, however, is relatively short – about twelve bases – because the assembled RNA chain is immediately displaced, allowing the DNA to rewind at the rear of the RNA polymerase. The elongation process continues until the enzyme encounters a termination signal in the DNA, where the polymerase stops and releases both the DNA and the newly made RNA chain, which is typically between 70 and 10,000 nucleotides long. Little is known about termination signals in eukaryotes. Eukaryotic preliminary RNA is cleaved by a specific endonuclease, which recognises a cleavage signal containing the consensus sequence AAUAAA. The endonuclease cleaves behind the uracil residue, and a polymerase adds about 250 adenylate residues, resulting in a polyadenylate tail at the 3 -end. This poly(A) tail is thought to protect the mRNA from digestion by nucleases and also appears to play a role in promoting translation. The RNA resulting from transcription is an identical copy of the coding strand of the DNA, except that thymine is replaced by uracil. Besides smaller differences, for example in the exact position and consensus sequences of the promoters, there are also some more fundamental differences
3
4 Gene Expression: Basic Concepts
Fig. 2 Prokaryotic A) and eukaryotic B) promoters. In bacteria, two sequences on the 5 -side of the first nucleotide to be transcribed into mRNA, which is defined as +1, are important. One of these is positioned around the −10 position and has the consensus sequence TATAAT. The other is found at position −35 with the consensus sequence TTGACA. Eukaryotic genes have
promoter sites with a TATAAA consensus sequence centred at about position −25, and often also a CAAT box with a GGXCAATCT consensus sequence positioned at about position −75. Constitutive genes often also show GC-rich sequences in their promoters. The CAATand GC boxes can affect promoter activity when positioned on either the coding or the template strand.
Fig. 3 Basic mechanism of transcription. After binding to the promoter, RNA polymerase starts transcription by first unwinding a small piece of DNA. Then, the RNA strand is assembled by adding one complementary nu-cleotide at a time as the RNA polymerase moves along the DNA. This results in the formation of an RNA/DNA hybrid helix, which is relatively short due
to the fact that the RNA chain is immediately displaced, allowing the DNA to rewind again at the rear of the enzyme. This process continues until the enzyme encounters a termination signal in the DNA, where the polymerase stops and releases both the DNA and the newly synthesised RNA. This figure is adapated from Stryer [3].
2 Basics of Transcription and Translation in the cell
in transcription between eukaryotes and prokaryotes. In prokaryotes, all mRNA is produced by a single RNA polymerase, while in eukaryotes three distinct RNA polymerase enzymes exist. Furthermore, transcription and translation take place simultaneously in prokaryotes, i. e. translation begins before the whole mRNA molecule is transcribed. By contrast, transcription and translation are spatially and temporally separated in eukaryotes, which enables more subtle regulation of gene expression. In contrast to prokaryotes, where mRNA molecules are produced directly by DNA transcription, most primary RNA transcripts in eukaryotic cells are altered extensively. These alterations include the addition of a 7-methylguanosine cap at the 5 -end, the attachment of the poly(A) tail at the 3 -end, and most importantly a process called splicing, which is the resection of non-coding parts of the RNA (introns) and the joining of the coding parts (exons) to form a continuous sequence that codes for a specific polypeptide chain. Splicing is also of interest with regard to the regulation of gene expression, because many primary RNA transcripts in eukaryotes can be processed in more than one way to produce different mRNAs and thus different polypeptides. The primary transcript contains molecular signals for alternative processing pathways, and the pathway favoured in a given cell is determined by special RNA-binding proteins. 2.3 Translation
Cells translate the sequence of the bases into an amino acid sequence by means of codons, which are triplets of nucleotides on the mRNA that specify a single amino acid (Table 1). Four different nucleotides (adenine, cytosine, guanine and uracil) exist, thus, any series of the three nucleotides of a codon can result in 43 (= 64) possible combinations. Since there are only 20 different amino acids, it is clear that some or all amino acids must be specified by several codons. In fact, most amino acids are coded by between two and six different codons. Only tryptophan and methionine are coded by a single codon. Three of the 64 codons do not code for amino acids but act as termination signals (stop codons). How do cells translate the sequence of these codons into proteins? This is done by the use of a set of tRNAs, which are small RNA molecules characterised by two major binding domains: one, the anticodon, is specific for a codon in the mRNA, the other for the amino acid specified by that codon. Each tRNA is designed to carry only one of the 20 amino acids used for protein synthesis. Before an amino acid is incorporated into a polypeptide chain, it is attached by its carboxyl end to the 3 -end of an appropriate tRNA molecule (Figure 4). This attachment covalently links the amino acid to a tRNA containing the correct anticodon. Furthermore, this amino acid attachment generates a high-energy linkage at the carboxyl end, thereby facilitating its reaction with the amino group of the next amino acid in the protein sequence to form a peptide bond. As the sequence of proteins within the synthesised protein is solely determined by the tRNA molecule, and not by its attached amino acid, the accuracy of protein synthesis is crucially dependent on the accuracy of the mechanism that links the amino acid specifically to its corresponding tRNA. This accuracy is guaranteed by 20 different enzymes called aminoacyl tRNA synthetases.
5
6 Gene Expression: Basic Concepts Table 1 The genetic code
1st position
2nd position A
3rd position
U
C
G
U
Phenylalanine Phenylalanine Leucine Leucine
Serine Serine Serine Serine
Tyrosine Tyrosine STOP STOP
Cysteine Cysteine STOP Tryptophan
U C A G
C
Leucine Leucine Leucine Leucine
Proline Proline Proline Proline
Histidine Histidine Glutamine Glutamine
Arginine Arginine Arginine Arginine
U C A G
A
Isoleucine Isoleucine Isoleucine Methionine
Threonine Threonine Threonine Threonine
Asparagine Asparagine Lysine Lysine
Serine Serine Arginine Arginine
U C A G
G
Valine Valine Valine Valine
Alanine Alanine Alanine Alanine
Aspartic acid Aspartic acid Glutamic acid Glutamic acid
Glycine Glycine Glycine Glycine
U C A G
Protein synthesis requires a complex catalytic machinery. Above all, it must be ensured that each successive codon in the mRNA engages precisely with the anticodon of the tRNA molecule and does not slip by even one nucleotide. This precise movement is catalysed by the ribosomes, which are large complexes composed of one large and one small rRNA subunit as well as of several proteins. A ribosome contains three binding sites for RNA molecules: one for mRNA and two for tRNAs. One site, called the peptidyl tRNA-binding site, or P-site, holds the tRNA molecule
Fig. 4. Structure of an aminoacyl tRNA. Before being incorporated into the polypeptide chain, an amino acid is linked with its carboxyl end to the 3 -end of its specific tRNA. The letter R indicates any possible residue of the specific amino acid.
2 Basics of Transcription and Translation in the cell
that is linked to the growing end of the polypeptide chain. Another site, called the aminoacyl tRNA-binding site, or A-site, holds the incoming tRNA molecule charged with an amino acid (Figure 5). The A and P-sites are adjacent to each other so that the anticodons of two tRNAs bound here form base pairs with neighbouring codons in the mRNA molecule. During the initiation phase of protein synthesis, the large and small subunits of the ribosome are brought together at the exact spot on the mRNA where the polypeptide chain is to begin. For this purpose, an initiator tRNA molecule bound to methionine, which is the amino acid encoded by the start codon (AUG), is required. This tRNA binds to the P-site of the small ribosomal subunit, and then binds to the first AUG start codon on the mRNA that is in a particular context – the highly conserved Kozak sequence with the consensus CCACCaugG – [4–7], thereby bringing the ribosome into exactly the correct position. As soon as the mRNA and the first tRNA are bound, the assembly of the ribosome is completed by the binding of the large ribosomal subunit. The elongation of the polypeptide chain is a reiterating three-step cycle. After translation has been started by the fixation of the initiator tRNA in the P-site of the ribosome, the aminoacyl tRNA molecule with the fitting anticodon becomes bound to the ribosomal A-site by forming base pairs with the codon exposed at the A-site (step 1). A tRNA molecule is held tightly at either site only if its anticodon forms base pairs with a complementary codon on the mRNA molecule that is bound to the ribosome. Then, a peptide bond is formed by a reaction between the amino group of the amino acid linked to the tRNA in the A-site and the carboxyl group of the polypeptide chain that is bound to the tRNA molecule in the Psite (step 2). This reaction is catalysed by an enzyme called peptidyl transferase, which is an enzymatic function of the large ribosomal subunit. The free tRNA bound to the P-site is then released, and the new peptidyl tRNA in the A-site is translocated to the P-site as the ribosome moves exactly three nucleotides further along the mRNA molecule in the 3 -direction (step 3). After completion of this last step, the A-site is free to accept a new tRNA molecule, which starts the cycle again. Protein synthesis is terminated when so-called release factors bind to a stop codon in the A-site. This binding alters the activity of the peptidyl transferase, causing it to catalyse the addition of a water molecule instead of an amino acid to the peptidyl tRNA. This reaction releases the carboxyl end of the growing polypeptide chain from its tRNA molecule, thus discarding the complete protein into the cytoplasm. The ribosome then also releases the mRNA and dissociates into its two separate subunits. The major difference between prokaryotic and eukaryotic translation is that pro-karyotic mRNA can have multiple start sites and can serve as the template for the synthesis of several proteins, while each eukaryotic mRNA is the template for only one protein. In addition, slight differences exist between eukaryotes and prokaryotes in the size of the ribosomes, the nature of the initiator tRNA, and the number and structure of initiation, elongation and termination factors.
7
8 Gene Expression: Basic Concepts
Fig. 5 Schematic overview of translation. An mRNA is bound to a ribosome, in this example with the codon UUC exposed to the P-site and the codon UGG to the Asite. The peptidyl tRNA is already bound to the P-site (the last amino acid added to the chain is phenylalanine). In the first step, the tRNA with the anticodon to UGG, which is specific for tryptophan, binds to the A-site. The free amino group of tryptophan then reacts with the carboxyl group of the pep-
tide chain bound to the tRNA in the P-site (step 2). The polypeptide chain is elongated by the formation of a new peptide bond, and the polypeptide chain is now attached to the tRNA at the A-site. This leads to the release of the free tRNA at the P-site (step 3), and translocation of the peptidyl tRNA from the A-site to the P-site together with movement of the ribosome exactly three nucleotides further along the mRNA.
3 Regulation of transcription
2.4 Summary
Genetic information is made up of the sequence of bases in the DNA. This information determines the composition, structure, and above all the function of the cellular proteins. In both prokaryotes and eukaryotes a large and complex cellular machinery is responsible for the synthesis of proteins, which involves in a first step the transcription of DNA into mRNA, and in a second step the translation of mRNA into proteins. These processes have been briefly outlined in this article. However, the regulation of transcription and also translation, the mechanisms, which increase or decrease gene expression, allow the differentiation of cells in a multicellular body as well as the adjustment of a creature to changing environmental conditions. Also, these mechanisms provide insight into physiological and pathophysiological metabolic processes and may also serve as drug targets for drugs in the treatment of disease.
3 Regulation of transcription 3.1 Introduction
Gene expression in eukaryotic cells is be regulated at various stages on the path from DNA to the active protein. At least eight key points of regulation of gene expression have been identified up to now [8, 9]. These are (i) transcription initiation, (ii) transcript elongation, (iii) RNA processing (e.g., splicing, polyadenylation), (iv) RNA transport from the nucleus to the cytoplasm, (v) RNA stability (e.g., by additional polyadenylation), (vi) translation of mRNA into protein, (vii) degradation of proteins, and (viii) post-translational modification of proteins (Figure 6). The following sections describe the principles of the regulation of gene expression at each of these levels. 3.2 mRNA expression profiles – the transcriptome
Transcription is the first and main level of regulation of gene expression. Protein synthesis rates generally correlate with transcription rates except for a small number of genes for which translation is regulated at a post-transcriptional level. Variations in the ensemble of the mRNA molecules of a cell, the transcriptome, is largely reflected in equivalent variations in the population of the cell proteins, the proteome (the term ‘proteome’ was invented by Marc Wilkins and Keith Williams at the Macquarie University Center for Analytical Biotechnology in Australia in 1995). However, the amount of a specific mRNA is usually not directly related to the amount of the corresponding protein in the cell because the translational efficiency
9
10 Gene Expression: Basic Concepts
Fig. 6 Levels of modulation and control of gene expression and protein activity in eukaryotic cells. The figure is adapted and modified from [8].
3 Regulation of transcription
of a given mRNA is not a constant and because the half-life of mRNAs varies widely from one mRNA species to another. Transcriptome analysis is now possible thanks to the knowledge of gene sequences and the development of different technologies such as serial analysis of gene expression (SAGE), DNA microarray technology, and real-time RT-PCR [10–13]. Transcriptome analysis may be used to assign a function to genes. The transcriptional state of the cell in a given condition provides a signature that can be compared to that of cells in other growth or physiological conditions [14–16]. These signatures can be used to group genes on the basis of their response to a stimulus. Genes that belong to a common pathway are often regulated in a similar way. Thus similarities in expression patterns may be used to assign function to unknown genes. Many studies using DNA microarrays were performed in the Saccharomyces cerevisiae model system. This organism was used since it was the first eukaryotic genome to be sequenced, it is highly amenable to genetic analysis. Two typical examples of the use of transcriptome analysis to investigate cell function were the analysis of the cell cycle-regulated genes and of the transcriptional response of the yeast to various stresses. In the former example, yeast cells were synchronised at different points along the cell cycle and then released and allowed to resume progression within the cycle [16, 17]. About 15 percent of the yeast genes showed waves of transcription that peaked at particular times during the cell cycle. In this way numerous new genes were implicated in the cell cycle. The response of the yeast transcriptome to such stressors as heat or cold, or to chemical challenges such as diamide or hydrogen peroxide, was also investigated [18, 19]. Strikingly, all these stressors were found to repress strongly and co-ordinately the transcription of around three hundred genes which encode proteins necessary for the constitution of the translation machinery. These included ribosomal genes, genes encoding the subunits of RNA polymerase I and III, proteins involved in ribosomal RNA (rRNA) and transfer RNA (tRNA) maturation, as well as factors involved in the initiation and elongation of translation. A number of genes of unknown function were also repressed by these stressors, at least some of which may encode proteins required for the constitution of the translation apparatus. Compendia of transcription profiles may be accumulated for analysis of transcription profiles [15]. These profiles observed in various growth conditions, under the influence of compounds or drugs, and in different mutants can be arranged or clustered according to their transcriptional signature. Transcriptome analysis provides a powerful method to evaluate the side effects of a drug. Under ideal circumstances, a drug should not perturb pathways other than the one containing its target. Thus, in principle, a drug might be concluded to be free of side-effects if deletion of the drug target abolishes the transcriptional response to the drug [20]. A future application of transcriptome analysis might be to tailor treatments to obtain the best response and minimise the adverse effects of drugs in individual patients. Transcriptome analysis may also provide an important contribution to medicine, particularly in understanding biology of cancer and improving the diagnosis and
11
12 Gene Expression: Basic Concepts
treatment of this condition. Current classification methods based on morphology often fail to distinguish between tumors that may have a very different prognosis. Microarray experiments of particular lymphoid leukaemia, for example, have been able to distinguish subtypes with widely different prognosis where classical morphology failed to do so [21]. Moreover, the analysis indicated that the two kinds of leukemia originated from different cell types to which they had a closer resemblance than they had to each other. Being able to more finely classify different types of cancers and assess more precisely their sensitivity to drugs will help design more powerful treatments with fewer side effects. An area of great interest concerns the use of mathematical tools to analyse patterns of gene expression and thus to identify the regulatory networks linking together the transcription of groups of genes. This technique has been proved in principle in the study of mutants of transcription factors in microorganisms. However, it is difficult to ascertain if the effect of a mutation is direct or indirect when the only data to hand is the pattern of gene transcription. One computational solution to this dilemma is to search for potential DNA-binding sites for transcription factors in the promoter regions of co-regulated genes [16]. Alternatively, one can use the genes that respond to a particular factor as a training set in order to define the sequence of the binding site for the transcription factor and then look for the presence of this newly defined sequence in the promoter regions of the regulated genes. A more powerful approach has been devised recently that combines the chromatin immunoprecipitation approach (ChIP) with microarray technology [22–24]. Proteins bound to DNA are cross-linked with formaldehyde in vivo and the factor of interest is immunoprecipitated after random shearing of DNA by sonication. The cross-link is disrupted and the DNA that has been enriched by immunoprecipitation is amplified, labelled and hybridised to microarrays bearing the complete genome of interest in competition with chromatin which has been immunoprecipitated from a mutant in which the factor of interest is unable to bind. This procedure will reveal the binding site for the factor along the genome. In addition, the ‘ChIP on chip’ data can now be compared with transcriptome data in order to reveal which of the targets of the factor are directly responsive to its presence and which genes are regulated indirectly. A striking example of the possibilities of this approach was provided by a study of the regulators of the different phases of the yeast cell cycle [25]. In this study, it was found that the regulators of a given phase of the cell cycle not only regulated transcription of the genes expressed specifically in that phase but also regulated the regulator of the next phase in the cycle. The organisation of the cell cycle was thus mirrored in a cycle of regulators. In 2001, two versions of the draft of the human genome were published [1, 2]. The two projects used different algorithms to predict the number of genes and arrived at roughly similar figures in the 30,000 range. However, comparison of the two sets of putative genes revealed that only half of the predicted genes were common to both projects and even in those cases, intron predictions varied [26]. This fact shows how important it is to verify by experiment the presence and structure of the predicted genes. The gene organisation together with the sites of initiation and termination
3 Regulation of transcription
of transcription can be obtained using tiled oligonucleotide probe microarrays as has been done for a 150 kilobase pairs segment of human chromosome 22 [27]. Microarrays can thus also be used as a genome annotation tool. 3.3 Protein expression profiles – the proteome
As stated previously, most of the cellular regulation of gene expression is exerted at the transcriptional level. However, in some cases, as with the yeast GCN4 gene, a transcriptional activator, or CPA1 gene, which encodes the glutaminase subunit of the arginine pathway carbamoyl-phosphate synthetase, protein amounts do not reflect mRNA levels [28, 29]. Regulation can be effected at several steps after transcription. For example, the stability of the mRNA, its translation or even the stability of the protein may be regulated. Proteins may also be post-transcriptionally modified by, for example, glycosylation, phosphorylation, methylation, sulphation or ubiquitination. Such modifications cannot be inferred from the sequence of the proteins but are very important because they can greatly affect the activity of the protein. Again, transcriptome analysis cannot reveal this aspect of cell biology. Knowing where a protein is located in the cell can help tremendously in determining what it does. It is possible to fractionate organelles such as mitochondria, chloroplasts, the Golgi, vacuoles or the nucleus, and to describe the complete set of protein contained within them. Proteome analysis coupled with subcellular fractionation can therefore provide answers to important biological questions. Traditionally, proteome analysis has relied on two-dimensional separation of proteins. In a first dimension, the proteins are focused according to their isoelectric point. They are then separated in the second dimension by their apparent molecular mass. With the recent developments of mass spectrometry together with the advent of the sequence of genomes, it has been possible to devise methods that allow identification of hundreds of proteins on two-dimensional gels [30]. These methods are also able to detect some protein modifications. Although proteome analysis using two-dimensional gels can at best reveal the 500 to 1,000 most abundant proteins present within a sample, there are many situations where it cannot be replaced by transcriptome analysis. This is true, for example, of samples which are devoid of mRNAs, for example body fluids such as cerebrospinal fluid, lymph or plasma. In addition, if one is interested only in the most abundant proteins, two-dimensional electrophoresis may be a simpler and more reliable means to measure expression ratios than microarray analysis. This is particularly true of the study of protein expression in microbes where spot identifications is simpler due to the lower complexity of the protein samples [31]. The knowledge of the genome sequence of eukaryotes is an important resource. However, as stated above, particularly in multicellular organisms the gene structure is difficult to predict from pure sequence information. Moreover, alternative transcription initiation or splice sites can be used in different tissues and during development. Hence, the automatic annotation of genomic sequences cannot be considered as valid unless experimental data support it. Automated proteomic analysis,
13
14 Gene Expression: Basic Concepts
coupling two-dimensional separation of proteins with the mass spectrometric identification of the spots, can provide a reliable genome annotation method [30].
3.4 Interaction between genes and proteins – the interactome
Interactions between proteins play important roles in the cell biology. Numerous proteins participate in complexes and the knowledge of the interaction in which they are engaged is essential to understand how these machines function. Some of the interactions are transient, as is the case between a protein kinase and its substrate. Another good example of the importance of protein/protein interactions is in the transcription of messenger RNA in eukaryotes and its precise regulation. This process requires more than 70 polypeptides that belong to a dozen molecular machines also called transcription factors [32] (see below). The arrival and departure of these machines from transcribed genes is the result of a precise choreography in which ordered protein contacts culminate in the initiation of transcription by RNA polymerase II. The structure of this twelve-subunit enzyme has recently been solved [33, 34]. Crystallographic analysis cannot however be used as a routine method to solve the structure of complexes. Other techniques are thus required that will ideally give a precise description of the interactome, cataloguing the identity of the interacting polypeptide, the extension of the interaction domains, and the timing and sub-cellular context in which the interactions take place. The yeast two-hybrid method and its variants is to date the method of choice to screen for protein/protein interactions at the proteome level. This situation may change in the future with the advent of so-called protein chips. It is based on the observation that the DNA-binding domain and the activation domain of a eukaryotic transcription activator when fused to two interacting proteins will activate the transcription of a reporter gene in yeast. Thus, one can ask if two proteins interact by fusing them to the activation domain and the DNA-binding domain of a transcription factor respectively and by then following the expression of a reporter gene when they are co-expressed in yeast [35]. The method can be used to test individual proteins in a pairwise manner or to screen DNA libraries [36, 37]. These libraries are made of full-length complementary DNAs (cDNAs) or of random DNA fragments fused to the activation domain of the transcription activator. Screening the whole set of proteins of an organism with such libraries is extremely labour intensive. Nevertheless, the procedure can be applied to small proteomes such as those of viruses [38, 39], to illuminate specific parts of complex proteomes [40] or to explore proteomes of intermediate complexity like those of bacteria at a lower resolution [41]. In addition, the use of DNA fragments libraries provides a map of the interaction domains. A systematic study of the yeast RNA polymerase III complex disclosed a number of contacts that were latter described in much more details by the crystallographic analysis of the yeast RNA polymerase II indicating that the method can also be successfully applied to investigate the structure of complexes [42].
3 Regulation of transcription
The yeast two-hybrid system has been used to explore the yeast proteome by testing individual fusions of almost all the proteins fused to the activation domain or the DNA-binding domain of the Gal4p activator (Gal4p is one of the best described yeast transcription activators; it is required for the activation of transcription of the genes involved in the utilisation of galactose in that organism) [43–45]. Three different medium or high-throughput studies showed few overlaps in their results. The disconcerting result may indicate that the screening procedures were not performed at saturation level. A more disturbing implication is that a substantial proportion of the reported interaction may in fact be false-positive. This situation may, for example, result from the fact that, in the two-hybrid system, proteins that are normally segregated in different sub-cellular compartments are co-expressed in the nucleus. In support of the general value of the yeast two-hybrid system, however, it has been found through bioinformatic analysis of the screening data that there is a general correlation between co-expression of genes and interaction between proteins [46]. Moreover, as expected, proteins interact in the yeast twohybrid system more often with partners belonging to the same functional category than with partners from other categories. Despite its drawbacks, the yeast twohybrid system is currently the only procedure able to provide even a glimpse at the interactome in an in vivo context for any organism. Knowledge of the interaction partner of a given protein is valuable information. However, the two-hybrid system does not allow the identification of members of a protein complex since it usually probes for direct interactions. Recently, it was shown that coupling affinity purification of proteins followed by mass spectrometry after gel electrophoresis separation can be used at a proteomic scale for complex identification in yeast [47, 48]. The purification procedures used either a single immunopurification of in vivo FLAG-tagged protein (FLAG is an eight amino acid marker peptide) or a tandem affinity purification method to identify hundreds of protein complexes. Though very powerful, this method is not applicable to any organism since it requires the insertion of a DNA sequence at the 3 -end of the gene coding the tagged protein in order to express the gene under its own promoter. Indeed, plasmid borne expression might bias the results. Nevertheless, the method can be used to query the existence of complexes containing conserved proteins or more punctually to ask to which complex a particular protein belongs. Advances in the production of recombinant proteins have allowed the production of extensive protein arrays [49]. These arrays will certainly be used to investigate protein/protein interactions. However, the specificity of the in vitro tests that will use protein arrays and how it compares with the yeast two-hybrid system is not known presently and awaits further experiments. 3.5 The transcription machinery and core promoters
In eukaryotes, three distinct RNA polymerases are responsible for the transcription of the genome (see the following citations for reviews; [32, 50–53]). RNA polymerase I transcribes a single essential RNA, the ribosomal rRNA precursor (35S in yeast,
15
16 Gene Expression: Basic Concepts
45S in higher eukaryotes). This RNA is transcribed from large clusters of tandem arrays of the genes that code for these RNA (rDNA) and that are located the nucleolus (or nucleoli). After transcription, the rRNA precursor is matured by the addition of methyl groups and pseudo-uridylations and processed into the two large ribosomal RNAs (18S and 28S in yeast) and the 5.8S rRNA. rRNA precursor transcription is extremely active since it represents 65 to 85 percent of all RNA synthesis in the cell. rRNA maturation and processing take place in the nucleolus and some of the steps of the pathway occur concurrently with the assembly of the ribosome. RNA polymerase II not only transcribes messenger RNAs, which are the subject of most of the studies on gene expression, but also a number of small stable RNAs like small nuclear RNAs (snRNAs) and most spliceosomal RNAs. These RNAs, on the contrary to mRNAs that are translated, are assembled in ribonucleoproteins and are responsible for the maturation of other RNAs. Finally, RNA polymerase III transcribes transfer RNA (tRNA) genes, the 5S ribosomal RNA and a few other stable RNAs like U6 spliceosomal RNA and some viral RNAs like VA from the adenovirus. Transcription can be divided into steps. If one considers a gene that has never been transcribed, the first step is the recognition of the promoter by the transcription factors. While, for polymerase I and for most of polymerase III-transcribed genes, a set of basal factors is all that is required, polymerase II needs a very large number of gene-specific factors in addition to the basal transcription factors. The gene-specific transcription factors do not always communicate directly with the basal transcription machinery. Sometimes, they do so through the intermediate of co-activators that can be different complexes like chromatin remodelling machines or chromatin acetylation complexes [54, 55]. The complex of transcription factors with promoter DNA constitutes the pre-initiation complex. Each of the three RNA polymerases recognises its cognate pre-initiation complex and associates with it to form the initiation complex. The first step towards transcription initiation is the formation of the open complex that results in the separation of the two DNA strands around the transcription initiation site. The RNA polymerase then transcribes, without leaving the promoter, short RNAs (three to ten nucleotides long) that are released. It then isomerises to enter the processive phase of transcription elongation. During the early phases of transcription, some of the basal transcription factors break loose from the promoter. On the other hand, the elongation complexes, formed of the DNA template, the transcribing RNA polymerase and the RNA are extremely stable and can be biochemically purified. Transcription elongation is not a monotonous process. Portions of a given gene are transcribed faster than others. RNA polymerases may even halt at specific locations along the DNA. Transcription elongation factors are sometimes required to promote transcription through the pause sites. At the end of the transcription unit the RNA polymerase will recognise the termination signal, release the RNA and separate from the template. After RNA release, the enzyme is recycled. Interestingly, a recycled RNA polymerase initiates preferentially on the template it has just transcribed and does so, at least in vitro, much faster than the first initiation [56]. It should be stressed that each of the steps during the transcription cycle can be regulated and that each
3 Regulation of transcription
of the numerous proteins participating in the process can be the target of activators or repressors. RNA polymerases are the complex enzymes that synthesise RNA. In eukaryotes they are comprised of twelve subunits for polymerase II, 14 for polymerase I and 17 for polymerase III [57–59]. These subunits are conserved through evolution to the point that yeast subunits can, in many cases, be substituted by human subunits in vivo [60]. The subunits can be subdivided into three categories: (i) Five small subunits are shared by the three RNA polymerases. (ii) Five subunits are homologous in the three enzymes. Among these subunits, one finds the two large subunits that are the hallmark of all multi-subunit RNA polymerases, including the eubacterial and archaebacterial ones, and which form the catalytic site of the protein. (iii) The other subunits are specific to each RNA polymerase class and may or may not be conserved through evolution. Some of these subunits interact with the basal transcription factors during initiation complex formation and are thus responsible for the determination of class specificity of transcription. As mentioned above, most of the transcriptional regulation occurring in the cell pertains to the synthesis of mRNAs by polymerase II. Nevertheless, global regulation of transcription by polymerases I and III results in the adaptation of the cell translational capacity to the growth rate. The basal polymerases I and III transcription factors will therefore be described briefly before introducing those factors that are needed for mRNA transcription and regulation. Polymerase I pre-initiation complex formation in eukaryotes requires two multisubunit factors that bind to the two promoter elements, the upstream control element (UCE; also termed upstream activator sequence, UAS) and the control element (CE). Though the subunit composition of the polymerase I factors and the sequences they bind to are not conserved from yeast to higher eukaryotes. In all cases where it has been investigated, the TATA box-binding protein (TBP) is present in the pre-initiation complex. Polymerase I recognises the pre-initiation complex only when it is associated with the conserved Rrn3p transcription factor through an interaction with the polymerase I A43 subunit. It has been documented that most of polymerase I transcription regulation is mediated through the association of polymerase I with Rrn3p which happens when the cell is actively growing and which is prevented when the cell is quiescent. Polymerase III RNA synthesis is epitomised by tRNA gene transcription that requires two transcription factors (TF), TFIIIC and TFIIIBs (TF), TFIIIC [50, 61]. While the number of subunits varies between organisms in TFIIIC, TFIIIB consists of three conserved polypeptides, one of which is TATA box-binding protein (TBPs (TF), TFIIIC). TFIIIC is first recruited to the tRNA gene through interactions with two intragenic sequences termed the A and B-box. Then, TFIIIB joins in, binding upstream of the transcription start site. Polymerase III recognises the TFIIB-related factor (BRF) subunit of TFIIIB and forms the initiation complex. 5S RNA requires TFIIIA in addition to TFIIIC and TFIIIB for transcription that can be thus considered as a specific transcription factor. U6 snRNA transcription details varies from organism to organism and will not be considered here. Biochemical studies suggested that the initiation complex is assembled sequentially. However,
17
18 Gene Expression: Basic Concepts
recent studies in human suggested that polymerase III, like polymerase II (see below) can be purified in association with its transcription factors and that the initiation complex can assemble in one step. In vivo, the arrival of the different components on the class III promoters has not yet been investigated. Polymerase II requires a large set of basal transcription factors and co-activators comprising more than 50 polypeptides for the transcription of class II genes [61– 63]. This paragraph will not describe of the co-activators that are required to alter the chromatin structure. The basal transcription machinery comprises six general transcription factors (TFIIA, B, D, E, F and H) made of one to fourteen polypeptides. Together, they are sufficient to assemble a polymerase II pre-initiation complex but not to support transcription of chromatin templates. Order of addition studies have indicated that in vitro on naked DNA templates, the first basal factor to bind the promoter DNA is TFIID, which thus has a central role in pre-initiation complex formation. TFIID is composed of TBP together with 13 TBP-associated factors (TAFII ). TBP therefore plays a central role since it is present in pre-initiation complexes of all three RNA polymerases. However, recently, it has been found in multicellular eukaryotes that homologous proteins in specific situations can replace TBP [64]. After the formation of the TFIID/DNA or D complex, TFIIA and TFIIB join in to form the DAB complex which is an intermediate complex in the formation of the polymerase II initiation complex composed of DNA/TFIID/TFIIA/TFIIB. The crystallographic structure of TBP, TFIIA and TFIIB bound to DNA has been solved. The DNA is severely kinked by TBP binding to the minor groove. The DAB complex then recruits a polymerase II/TFIIF complex to form the DABF/polymerase II complex (the DABF complex comprises DAB plus TFIIF). The DABF/polymerase II complex is then recognised by TFIIE and then TFIIH, which has a dual role in transcription initiation and DNA mismatch repair [65], to complete the formation of the polymerase II pre-initiation complex. The largest subunit of polymerase II has, compared to those of polymerase I and III, is a carboxy-terminal extension consisting of more or less perfect repetitions of the YSPTSPT amino acid motif. The number of repetitions of the motif, which are organised in tandem repeats, varies from 25 in yeast to 52 in man and its sequence is conserved in eukaryotes. This carboxy-terminal domain (CTD) is extensively phosphorylated by three different cyclin-dependent kinases activated by divergent cyclins. One CTD phosphatase has been described. During the transcription cycle the phosphorylation state of the CTD varies with hypo-phosphorylation during transcription initiation, hyper-phosphorylation being the hallmark of the elongating enzyme. This modification is effected by the general transcription factor TFIIH. Though some of the specific transcription factors may act by recruitment of TFIID, it has been observed that a CTD-binding complex, the mediator of transcription activation, is the main target of gene specific activators [66]. This complex, comprising about 24 subunits, also interacts with polymerase II via other parts of the enzyme as documented by electron microscopic studies. The mediator is viewed as a co-activator since it links the gene-specific factors to the basal transcription machinery and relieves repression of transcription by chromatin. Though individual mediator subunit sequences are not very well conserved, the general structure of
3 Regulation of transcription
mediators is. Moreover, in multicellular eukaryotes, different mediators, responding to different gene specific activators, have been purified biochemically. These mediator forms have at least a core of common subunits. It is not clear yet if these complexes coexist in the cell or are the result of different purification protocols. Once assembled, the initiation complex will undergo a number of changes that will allow transcription elongation to begin. The first step is the separation of the two DNA strands around the transcription initiation site to form the open complex. This step requires TFIIE and the ATP-dependent helicase activity of TFIIH. Initially, polymerase II repeatedly synthesises short RNAs, three to eight nucleotides long, without moving along the DNA template (as is also the case during transcription initiation by the other RNA polymerases). Then, the CTD is phosphorylated in an ATP-dependent manner by TFIIH and enters elongation. Of all the factors, the only one that remains associated with the promoter is TFIID. The other factors dissociate from the promoter during the transition from initiation to elongation, TFIIB is released after transcription of the first or second nucleotide, TFIIE separates from the promoter when polymerase II reaches nucleotide ten and TFIIH is released within a window of 30 and 40 nucleotides after the initiating nucleotide. TFIIF remains bound to polymerase II [67]. Transcription elongation is not a monotonous process [68]. The RNA polymerase can pause or be halted at certain DNA sequences or the progression of the enzyme can be hindered by nucleosomes. Factors interact with the enzyme during elongation to help its progression. These factors can be divided into different classes. Elongation factors increase nucleotide incorporation speed and suppress polymerase II pauses. Cleavage factors increase the intrinsic RNA cleavage activity of polymerase II ternary complexes and allow the complex to back up, giving it another chance to go through the pause or arrest site. A last class of elongation factors is made of complexes that help the RNA polymerase to transcribe through nucleosomes by increasing their movement. Transcription is thus a complex process involving a huge number of proteins that intervene during the transcription cycle. It should be stressed again that each of these steps, be it initiation, elongation or termination, can be regulated. Therefore, transcription abundance variations can reflect changes in the rates of any of the transcription steps. 3.6 Regulatory promoters
Transcription of the great majority of eukaryotic genes is controlled by sequencespecific DNA-binding proteins. Eukaryotic DNA is usually packed into chromatin, blocking the access of RNA polymerases to the initiation sites of gene transcription. Therefore, binding of transcriptional activators to specific elements in regulatory promoters and enhancers is an essential prerequisite for transcription. Most genes are expressed only in certain tissues or during short developmental stages. The activation of most promoters depends on the co-operation of several regulatory proteins. This obligatory formation of activator complexes guarantees high specificity
19
20 Gene Expression: Basic Concepts
Fig. 7 The regulatory promoter of eukaryotes. RNA polymerase II (translucent blue) and the basic transcription factors build a pre-initiation complex at the TATA box and the eukaryotic initiator sequence (Inr) of each promoter. Complex formation is supported by trans activators (green translucent) which have an effect on TFIID (green) and the mediator (blue). Unspecific bound high-mobility group proteins (HMG) are es-
sential for chromatin remodelling (orange). The carboxy terminal domain (CTD) of the RNA polymerase II is important for the interactions with protein complexes (arrows) such as, for example, the mediator. Histone acetylating proteins are not shown; upstream activator sequence is abbreviated as UAS and TBP is the abbreviation for TATA box-binding protein.
of transcription, even in large genomes such as that of human beings. A region of a few hundred base pairs around the core promoter generally contains most of the activator binding sites and is termed the regulatory promoter. Enhancers, by contrast, are regulatory elements that are located further upstream or, in some cases, downstream of the gene. Enhancers may also be present in introns [9]. Detailed information on the mapping of eukaryotic promoters is available from the Eukaryotic Promoter Database (EPD) at www.epd.isb-sib.ch/ (Swiss Institute for Experimental Cancer Research, ISREC; Epalinges sur Lausanne, Switzerland) [69–72]. Several proteins are required to allow RNA polymerase II to bind efficiently to promoters. These proteins may be classified as follows (see Figure 7). (i) Basic transcription factors which are needed at every promoter. (ii) Trans activators, proteins that bind to enhancers and upstream activator sequences. Some trans activators react to signal molecules and thereby activate or inhibit transcription in response to changes within the intracellular environment. DNA loop formation is supported by unspecific binding of non-histone proteins. These so called ‘high mobility group’ proteins play an essential role in chromatin remodelling and transcriptional activation. (iii) Co-activators such as transcription factor II D (TFIID), which do not bind DNA, but indirectly support the interaction of trans activators and the RNA polymerase II complex. TFIID, the best studied co-activator, is a huge complex of at least ten proteins (TATA box-binding protein, TBP, and TATA box-binding protein-associated factors). Some TATA box-binding protein-associated factors resemble histones and are probably involved in displacement of nucleosomes from DNA [9].
3 Regulation of transcription
3.7 Enhancers
Studies of viral enhancers have revealed some general properties of enhancer mechanisms. Different activators cooperate during promoter activation. Utilisation of cell-type specific regulators guarantees the general functionality of enhancers. Since enhancers are distant from the promoter, activation is independent of the exact orientation and positioning of the enhancer. Enhancers are composed of modules with multiple binding sites. The multimerisation of such modules augments the activation capacity of the enhancer. The exact mechanisms of enhancer-dependent transcriptional activation is not known. One model of enhancer-dependent transcriptional activation was that enhancers transmit changes in the DNA structure to the core promoter. Another hypothesis postulated that enhancers act as binding sites for RNA polymerase II which then moves along the DNA towards the core promoter. Both models were disapproved by experiments that demonstrated sustained activation or transcription after protein bridging of the enhancer and the core promoter. The binding of activators to the enhancers probably lead to chromatin remodelling and loop formation. This enables the enhancer complex to interact with the core promoter. Stabilisation of the transcription complex may be the key feature of enhancermediated transcriptional activation. The looping model is also supported from an energetic point of view. Michael R. Botchan and his colleagues have produced visual evidence of this model of enhancer action. They created an artificial DNA molecule with several promoter sites for the transcription factor SP1 and several enhancer sites, the latter being 800 base pairs from one end of the construct. The enhancers were bound by the enhancer-binding protein E2. The SP1 sites and the enhancers were separated by 1,860 base pairs. When a mixture of SP1 and E2 proteins was added to the DNA molecules, the electron microscope picture showed the DNA drawn into loops with ‘tails’ of approximately 300 and 800 base pairs. At the neck of each loop were two distinguishable accretions of material, representing SP1, and E2 molecules respectively. Artificial DNA molecules lacking the promoter or enhancer sites, or containing a mutant promoter or enhancer, failed to form loops when mixed with the two proteins. Some enhancers are activated by cooperative binding of a combination of regulators. The impact of cooperative binding on gene expression patterns was studied in several experiments (for review see [73–75]. It is not clear if enhancer-mediated activation produces a general increase in the transcription rate (rate model) or if the transcription is stabilised by the enhancer linked to a particular gene (binary model) [76–78]. Most experimental data are consistent with both models. 3.8 Locus control regions
A number of eukaryotic gene loci are regulated by locus control regions (LCRs). These elements are defined by their ability to confer high-level tissue-specific
21
22 Gene Expression: Basic Concepts
transcription to linked genes, as well as by their capacity to overcome position effects in transgenic assays. Locus control regions modulate transcription by a chromatin opening activity that acts through an extended DNA region (e. g., over 100 kilobases; [76, 79], distinguishing them functionally from classical enhancers. Locus control region-mediated changes in chromatin structure renders the DNA hypersensitive to nuclease activity. Each locus control region can include several hypersensitive sites, each representing a module with a distinct function. It is not known how locus control regions work. They seem to maintain accessibility of a region to transcription factors [80]. It has been suggested that locus control regions loop out and interact with regulatory promoters and enhancers. Locus control regions characteristically hyperacetylate and open chromatin over long distances. The acetylation of histones starts at the locus control region and moves along the chromatin [81]. The expression level is proportional to the copy number of the locus control region. The duration of contact of the locus control region with the regulatory promoter or enhancer of the linked gene controls the period of transcription. Locus control regions function even in the presence of generally inactivating centromeric heterochromatin. Insights into the mechanisms of locus control region-mediated transcription activation came from studies of the well characterised multigene clusters such as the human and mouse (β-globin loci. Due to its compact organisation, the (β-globin locus has become a model system for locus control regions. In transgenic mice experiments, high levels of expression were only observed when the constructs included the locus control region-containing region from the distal end of the globin locus [82]. The expression of (β-globin is independent of the integration site. Although deletions of the human or mouse (β-globin locus control regions significantly reduced transcription, this does not result in a closed chromatin structure of the endogenous (β-globin locus. These results suggest a sequential activation model in which promoter remodelling and gene hyperacetylation lead to low-level basal transcription and precede locus control region-mediated high-level activated transcription [83]. In many systems, histone acetylation has been associated with chromatin opening and gene regulation. Local hyperacetylation is generally linked to transcriptional activation [84]. In some cases, histone-acetyl transferase (HAT) recruitment for a promoter has been shown to be the critical and rate limiting step for transcription and to occur before polymerase initiation [85, 86]. The locus control region may be necessary for the modification or the recruitment of new components for the remodelled and hyperacetylated promoter or for additional chromatin modifications. 3.9 Matrix attachment regions
Since locus control regions can activate gene expression over long distances, negative regulating elements must exist in order to repress locus control region function on adjacent loci. Two different classes of negative control elements have been described: Matrix attachment regions (MARs) and insulators. Matrix attachment
3 Regulation of transcription
regions are DNA fragments capable of associating with nuclear structures that remain associated even after stringent extraction and washing under high salt conditions or stripping with detergents. Chromosomes are thought to be incorporated into looped structures with the loops fastened to the intranuclear matrix [87, 88]. It has been suggested that these loops may separate each chromosome into regulatory domains. Matrix attachment regions are AT-rich sequences often located in the vicinity of transcription units or enhancers. Several unique proteins have been found in association with matrix attachment regions [89]. 3.10 Insulators
Most insulators or boundary elements have been found in the vicinity of junctions between decondensed and condensed chromatin, which presumably correspond to active and inactive loci, respectively. The insulators appear to provide a boundary between accessible and inaccessible chromatin. For repression, the insulator must be located between the enhancer and the gene. Several models of insulator function have been described (Figure 8). One model is based on the idea that the insulators trap enhancer activity and eventually disturb the loop forming proteins [90]. A second model proposes the insulator as a molecular barrier for a propagating enhancer signal. Alternatively, an insulator may block factors needed for the loop formation of chromatin between enhancer and promoter. It is still unclear, if the
Fig. 8 Model of enhancer blockade by insulators. A) The enhancer (green) interacts by looping out to interact with the promoter complex (brown). The insulator (red) traps the enhancer via mimicry of the promoter. B) The insulator acts as a molecular barrier blocking the propagation of the enhancer
signal to the promoter (black arrows). C) The insulator blocks the looping necessary for the interaction of the enhancer with the promoter. D) The interaction of two insulator complexes results in DNA looping. Activation of the promoter is possible only by an enhancer located within the same loop.
23
24 Gene Expression: Basic Concepts
interaction of two insulators could induce a chromatin loop, which would hold some genes in an inactive core compartment. The highly conserved ubiquitous CCCTC-binding factor (CTCF) activates gene expression. CTCF binds at an insulator and blocks enhancers. CTCF is the first factor that has been isolated as being essential for gene separation in vertebrates [91]. All insulators discovered to date in vertebrates function only when bound by CTCF. CTCF has eleven zinc finger domains. The domains for DNA binding, recruitment of co-repressors and repression of transcription have been identified [92]. CTCF binds to different DNA elements – a common consensus sequence has not been identified. Locus control regions, enhancer induced acetylation and opening of chromatin proceeds until an insulator has been reached. Insulatorbound CTCF recruits histone deacetylases, their activity counteracts the acetylation and stops extension of the enhancer signal. 3.11 RIDGEs – Regions of increased gene expression
Most chromosomes contain regions of increased gene expression or RIDGEs, in which transcription per unit length is seven times that of the genomic average and 20 to 200 times that of weakly expressed regions [93]. On chromosome 6, for example, the regions of increased gene expression correspond to the major histocompatibility locus (MHC). Regions of increased gene expression are not randomly distributed within the genome, but most likely represent a higher order structure. Many of the regions of increased gene expression have a high gene density. For 50 to 60 percent of the regions of increased gene expression, a correlation between gene expression and density of the genes is found. The remaining 40 to 50 percent are not gene dense. These regions of increased gene expression preferentially map to telomeres, which is remarkable in view of the phenomenon of telomeric silencing that has been observed in yeast [94]. 3.12 Enhanceosomes
The term enhanceosome has been assigned to functionally related groups of DNAbinding proteins that act in concert to enhance transcriptional activation upon the presence of the transcription machinery. Enhanceosomes therefore represent another potential mechanism for fine tuning of transcriptional activation. Experimental evidence for the enhanceosome theory has first been provided by Thanos & Maniatis [95]. Subsequent studies suggested that the proteins involved in enhanceosome formation presumably form a stereospecific assembly of surface structures enabling a more efficient binding of the transcription apparatus [96]. One of the best characterised mechanisms of enhanceosome action is the activation of the interferon-β gene in response to virus infection [97]. The first step in interferonβ enhanceosome assembly is recruitment and cooperative binding of activating factors (nuclear factor κB, NF-κB; interferon regulatory factors, IRFs; activating
3 Regulation of transcription
transcription factor, ATF2, homodimers and ATF2/c-Jun heterodimers) orchestrated by the architectural protein HMGIY (high mobility group protein isoforms I and Y). Subsequent changes in the three-dimensional structure form an ‘enhanceosome pocket’ that facilitates recruitment and binding of the basal transcription machinery. This in turn leads to an increased rate of pre-initiation complex formation. The DNA-binding affinity of the factors is also regulated by the acetylation of histones, nucleosomes and the proteins of the enhanceosome complex, and by the cAMP response element-binding protein (CREB)-binding protein (CREBBP). The current view is that the enhanceosome complex is also involved in regulating chromatin-modifying factors and that specific gene activation is dependent on the presence of all enhanceosome components [98]. 3.13 Chromatin
The eukaryotic genome is organised into chromosomes consisting of nucleosomes, which are independent functional domains. Nucleosomes are the basic units of chromosome architecture. The nucleosomal structure is regulated by the temporal and spatial status of the cell, by the ATP-dependent chromatin-remodelling SWI/SNF-like protein complex and by histone acetyltransferases (for review, see [99, 100]. Recent experimental data suggest an important role in tumourigenesis for chromatin-remodelling proteins BRM (brahma) and BRG1 (brahma-related gene 1), which are two ATPase subunits of the SWI/SNF-like complex. In addition to different chromatin and nucleosomal structures, the DNA molecule itself may have different conformations. A dramatic structural transition is made when the right-handed B-DNA double helix is converted to a left-handed Z-DNA double helix [101]. The energetically unfavourable Z-DNA conformation is stabilised by negative supercoiling, which occurs in DNA wrapped around nucleosomes. Negative supercoiling may also occur during transcription just behind the moving RNA polymerase [102]. Z-DNA structures have also been detected in the promoter region of transcribed mammalian genes, suggesting that DNA conformation is an important feature of transcriptional regulation [103]. 3.14 Silencer elements
Since each cell expresses only a small fraction of the genes it contains, most genes must be kept silent. Some genes must also be rapidly switched on and off. Transcriptional repression in eukaryotes is achieved by silencers. Two types of silencers have been described, classical silencer elements and negative regulatory elements, NREs (for review, see [104]. Classical silencer elements are DNA motifs which interfere with transcription factor assembly of the general machinery (active repression mechanism), whereas negative regulatory elements physically inhibit the interaction of transcription factors with their specific DNA-binding sites, or interfere with other factors regulating transcriptional events (passive repression mechanism).
25
26 Gene Expression: Basic Concepts
Silencers have also been described as orientation and promoter-dependent and have been found in introns, in the 3 -untranslated regions and even in exons. Although the research on silencers and the identification of new silencer elements have been rapidly increasing over the past ten years, knowledge of the silencing mechanisms is still limited. 3.15 Transcription factors, repressors and co-repressors
Transcription factors, repressors and co-repressors are proteins that bind to promoter, enhancer or silencer regions in highly specific fashion in order to up or down-regulate gene expression. These elements are predominantly located in the cell nucleus, exist in a great number and variety (more than 2,000 are known today) and are essential factors of transcriptional regulation (for recent review, see [105]). In vitro, RNA polymerase II alone fails to find the correct transcriptional start site and to synthesise efficient amounts of RNA unless nuclear extracts are also present. The proteins involved in transcriptional regulation can roughly be classified into general, constitutively active nuclear factors and regulatory transcription factors. The general factors are involved in each transcriptional process, the regulatory factors are responsible for example for the timing of expression of particular genes needed for embryonic development. Since transcriptional regulation not only reflects the basic processes of a cell (e. g., metabolism, osmoregulation, cell division), but also tissue specific tasks and challenges (e. g., production of clotting factors, immunoglobulins, repair of skin injuries), transcription factors are in continuous communication with the cell environment via cell surface sensors and receptors and the intracellular signalling system. One of the most critical general transcription factors is TFIID. The association of TFIID with the promoter is one of the initial steps for the assembly of the RNA polymerase II holoenzyme. TFIID contains the TATA box-binding protein. The interaction of the TATA box-binding protein with the DNA leads to bending of the DNA double helix and to exposure of the minor groove followed by a cascade of interactions with TATA box-binding protein-associated factors and various regulatory and stabilising proteins. The assembly of the initiation complex requires at least 30 different transcription factors, which have been alphabetically designated as TFIIA, TFIIB, etc. The transcription factors themselves are activated and deactivated by phosphorylation. Another important general transcription factor is SP1, which binds to the major groove of GC boxes in the promoter region via a zinc finger structure. GC boxes are usually found in the promoters of housekeeping genes which frequently lack a TATA box. SP1 promotes the formation of the initiation complex by interaction with TATA box-binding protein-associated factor proteins and assists TFIID in finding the correct transcriptional start site. The CCAAT box transcription factor (CTF) is another example for a general transcription factor, which participates in the assembly of the initiation complex. A tissue-specific variant of the CCAAT box transcription factor is C/EBP (CCAAT/enhancer-binding protein), which is predominantly found in liver cells.
3 Regulation of transcription
RNA polymerase I, responsible for the synthesis of rRNAs, and RNA polymerase III, responsible for the transcription of tRNAs, 5S rRNAs and a few other stable RNAs, are regulated by a distinct set of general transcription factors including upstream binding factor (UBF), TAF1B (SL1), TFIIIA, TFIIIB, TFIIIC, TATA boxbinding protein and a number of TATA box-binding protein-associated factors and other factors, which promote the formation of the polymerase-specific initiation complexes [106]. The regulatory (specific or conditional) transcription factors can be divided into developmental (embryonic development) and signal-dependent factors. Cellular signals may be classified into receptor-ligand (cell surface), internal and steroid receptor signals [105]. Steroid and thyroid hormones are transcriptionally active molecules that circulate in the body and gain access to regulatory DNA elements not by signal transduction, but also by uptake into the cell and diffusion to the nucleus. The most predominant signal-dependent transcription factors are involved in cell proliferation and reside in the nucleus. Well-established resident nuclear factors are the oncogenes Fos and Jun, which form heterodimers with the transcription factor AP1. The genes coding for these factors are characterised by additional promoter elements that respond to growth factors and the downstream signalling pathways. The activation and synthesis of Fos seems also to be determined by an auto-regulation process. Because serum is particularly rich in growth factors, a number of transactivating functions are initiated in vitro upon incubation of a cell culture with serum-containing media. These events include transactivation of serum-responsive elements (SREs) contained in certain gene promoters by serumresponsive factors (SRFs) and other cooperating factors such as the oncogene ELK1 and the p62 subunit of TFIIH. Another important transcription factor involved in cell proliferation is E2F which interacts with the product of the retinoblastoma gene in controlling the cell cycle. Another class of receptor-mediated, signal-dependent transcription factors has been designated as latent cytoplasmic factors that include the REL/NF-κB family and the proteins of the Sma (a protein that is suggested to be involved in the transforming growth factor β [TGF-β], signalling pathway), SMAD (Mad [mothers against decapentaplegic]-related protein), and STAT (signal transducer and activator of transcription) pathways, responding to stress, immunological challenges or growth factors. Since most of the genes of eukaryotic cells are silent, mechanisms of negative transcriptional regulation are also required. The cell must be able to rapidly switch on and off the transcription of genes. This is achieved by activation and deactivation of transcription factors via phosphorylation and dephosphorylation, by the interaction of repressors with silencer motifs of the DNA and inhibitors. Similar to the activating transcription factors, there are global (e.g., B-TFIID transcription factor-associated factor, BTAF1) and specific (e.g., neuron restrictive silencer factor, NRSF, also called repressor element-1, RE1, silencing transcription factor, REST) repressors and co-repressors which interact with silencer elements. Well established examples of inhibitory elements are IκB proteins that block NFκB, the MAX/MAX (oncogene MYC-associated factor X) dimer that blocks the MYC/MAX heterodimer, the cAMP response element modulator (CREM), which is an
27
28 Gene Expression: Basic Concepts
Fig. 9 Schematic model of signalling pathways between extracellular signal receptors and intracellular signal response. The organisation of signal modification on several levels of signal transduction resembles data management in neuronal networks.
alternative splicing product of the cAMP response element (CRE) gene lacking the transactivating domain, or the CCAAT displacement protein (CDP). In summary, the number of newly identified classes and families of transcription factors and other regulatory elements has been rapidly increasing over the past ten years. The Human Genome Project has contributed a great number of new targets and potential functions. A complete list and review of all factors would greatly exceed the limits of this article. The functional interrelationships between the different classes and families of transcription factors is far more complicated than was originally thought [63]. Today, we understand the communication between single cells or tissues and the whole organism on the one hand side and the cross talk between the cell surface and its protein machinery on the other side as complex interactions on several levels resembling a neuronal network (Figure 9). 3.16 Epigenetics
Recent collected data from cloning experiments in mammalian cells have shown that the gene expression profile of a nucleus is not determined solely by its exposure to a particular set of transcription factors. In other words, heritable changes in gene function must exist that cannot be explained by changes in the DNA sequence [107]. This field on non-sequence-related heritable genetic regulation has been termed ‘epigenetics’. An important mechanism of epigenetic regulation is methylation of GC-rich DNA regions. GC-rich regions are primarily located upstream of genes
References
(called CpG islands). Methylation of the CpG islands upstream of a gene silences that gene, a mechanism which is essential for embryonic development, for so-called imprinting and for chromosome X inactivation. In most cases, the methylation pattern of a cell is transmitted to the daughter cells during cell division. The mechanisms governing DNA methylation patterns remain to be elucidated (for recent review, see [108]. Another example of epigenetic regulation is the protein complexes of the polycomb/trithorax group (Pc-G/trx-G) that have been identified in Drosophila. Pc-G/trx-G is a multiprotein complex that interacts with specific regions of the DNA leading to an arrested state of gene expression in early embryogenesis that is stable throughout development. The current understanding is that both CpG methylation and the Pc-G/trx-G complex serve as an epigenetic memory of transcriptional activity during development. The mechanisms by which these patterns are generated are not yet understood. 3.17 Conclusions
Up to recently, there was an understandable emphasis in the biological scientific community on determining the sequence of the genetic code. Now, in the postgenomic era, the focus has shifted to understanding how the blueprint contained in the genes is read, first to generate the organism during embryogenesis, and then to control cell fate and function in the finished product. This article has attempted to provide a short review of the processes involved in regulating gene transcription. We have seen that control of transcription is the main, but, as we will see in the next sections, by no means the only way in which gene expression is regulated. We will also see that the population of protein molecules in a cell generally, but not always reflects the population of mRNA molecules. For this reason, a comprehensive understanding of cellular function requires knowledge not only of the transcriptome, but also of the proteome and of how proteome and transcriptome interact. We have learned that the expression of a particular gene is regulated not only by the interaction of its direct upstream promoter with a dizzying array of transcription factors and repressors, but also by a complex interaction of often distant regulatory regions including enhancers, locus control regions, matrix attachment regions, insulators and regions of increased gene expressions. The packing of the DNA in chromatin, interaction with a range of proteins including histones and enhanceosome complexes, the conformation of the DNA and methylation of DNA add a further layer of complexity to regulating when, where and with what intensity a particular gene is expressed. In the past, many breakthroughs in biology have been achieved by studying simple model systems. The classic studies of the Lac operon by Franc¸ois Jacob, Jacques Monod and their co-workers are a good example of this [109, 110]. Now, however, the paradigm may be changing. It is clear that answering at least some of the questions in the new biology will require a willingness to analyse systems of great complexity and to look for patterns rather than single data points. Increasingly, the whole is being seen to be much more than the sum of the parts.
29
30 Gene Expression: Basic Concepts
References 1 LANDER, E. S., LINTON, L. M., BIRREN, B., NUSBAUM, C., ZODY, M. C., BALDWIN, J., DEVON, K. DEWAR, K., DOYLE, M., FITZHUGH, W., FUNKE, R., GAGE, D., HARRIS, K., HEAFORD, A., HOWLAND, J., KANN, L., LEHOCZKY, J., LE VINE, R., MCEWAN, P., MCKERNAN, K., MELDRIM, J., MESIROV, J. P., MIRANDA, C., MORRIS, W., NAYLOR, J., RAYMOND, C., ROSETTI, M., SANTOS, R., SHERIDAN, A., SOUGNEZ, C., STANGE-THOMANN, N., STOJANOVIC, N., SUBRAMANIAN, A., WYMAN, D., ROGERS, J., SULSTON, J., AINSCOUGH, R., BECK, S., BENTLEY, D., BURTON, J., CLEE, C., CARTER, N., COULSON, A., DEADMAN, R., DELOUKAS, P., DUNHAM, A., DUNHAM, I., DURBIN, R., FRENCH, L., GRAFHAM, D., GREGORY, S., HUBBARD, T., HUMPHRAY, S., HUNT, A., JONES, M., LLOYD, C., MCMURRAY, A., MATTHEWS, L., MERCER, S., MILNE, S., MULLIKIN, J. C., MUNGALL, A., PLUMB, R., ROSS, M., SHOWNKEEN, R., SIMS, S., WATERSTON, R. H., WILSON, R. K., HILLIER, L. W., MCPHERSON, J. D., MARRA, M. A., MARDIS, E. R., FULTON, L. A., CHINWALLA, A. T., PEPIN, K. H., GISH, W. R., CHISSOE, S. L., WENDL, M. C., DELEHAUNTY, K. D., MINER, T. L., DELEHAUNTY, A., KRAMER, J. B., COOK, L. L., FULTON, R. S., JOHNSON, D. L., MINX, P. J., CLIFTON, S. W., HAWKINS, T., BRANSCOMB, E., PREDKI, P., RICHARDSON, P., WENNING, S., SLEZAK, T., DOGGETT, N., CHENG, J. F., OLSEN, A., LUCAS, S., ELKIN, C., UBERBACHER, E., FRAZIER, M., GIBBS, R. A., MUZNY, D. M., SCHERER, S. E., BOUCK, J. B., SODERGREN, E. J., WORLEY, K. C., RIVES, C. M., GORRELL, J. H., METZKER, M. L., NAYLOR, S. L., KUCHERLAPATI, R. S., NELSON, D. L., WEINSTOCK, G. M., SAKAKI, Y., FUJIYAMA, A., HATTORI, M., YADA, T., TOYODA, A., ITOH, T., KAWAGOE, C., WATANABE, H., TOTOKI, Y., TAYLOR, T., WEISSENBACH, J., HEILIG, R., SAURIN, W., ARTIGUENAVE, F., BROTTIER, P., BRULS, T., PELLETIER, E., ROBERT, C., WINCKER, P., SMITH, D. R., DOUCETTE-STAMM, L., RUBENFIELD, M., WEINSTOCK, K., LEE, H. M., DUBOIS, J., ROSENTHAL, A., PLATZER, M., NYAKATURA, G., TAUDIEN, S., RUMP, A., YANG, H., YU,
J., WANG, J., HUANG, G., GU, J., HOOD, L., ROWEN, L., MADAN, A., QIN, S., DAVIS, R. W., FEDERSPIEL, N. A., ABOLA, A. P., PROCTOR, M. J., MYERS, R. M., SCHMUTZ, J., DICKSON, M., GRIMWOOD, J., COX, D. R., OLSON, M. V., KAUL, R., RAYMOND, C., SHIMIZU, N., KAWASAKI, K., MINOSHIMA, S., EVANS, G. A., ATHANASIOU, M., SCHULTZ, R., ROE, B. A., CHEN, F., PAN, H., RAMSER, J., LEHRACH, H., REINHARDT, R., MCCOMBIE, W. R., de la BASTIDE, M., DEDHIA, N., BLOCKER, H., HORNISCHER, K., NORDSIEK, G., AGARWALA, R., ARAVIND, L., BAILEY, J. A., BATEMAN, A., BATZOGLOU, S., BIRNEY, E., BORK, P., BROWN, D. G., BURGE, C. B., CERUTTI, L., CHEN, H. C., CHURCH, D., CLAMP, M., COPLEY, R. R., DOERKS, T., EDDY, S. R., EICHLER, E. E., FUREY, T. S., GALAGAN, J., GILBERT, J. G., HARMON, C., HAYASHIZAKI, Y., HAUSSLER, D., HERMJAKOB, H., HOKAMP, K., JANG, W., JOHNSON, L. S., JONES, T. A., KASIF, S., KASPRYZK, A., KENNEDY, S., KENT, W. J., KITTS, P., KOONIN, E. V., KORF, I., KULP, D., LANCET, D., LOWE, T. M., MCLYSAGHT, A., MIKKELSEN, T., MORAN, J. V., MULDER, N., POLLARA, V. J., PONTING, C. P., SCHULER, G., SCHULTZ, J., SLATER, G., SMIT, AF., STUPKA, E., SZUSTAKOWSKI, J., THIERRY-MIEG, D., THIERRY-MIEG, J., WAGNER, L., WALLIS, J., WHEELER, R., WILLIAMS, A., WOLF, Y. I., WOLFE, K. H., YANG, S. P., YEH, R. F., COLLINS, F., GUYER, M. S., PETERSON, J., FELSENFELD, A., WETTERSTRAND, K. A., PATRINOS, A., MORGAN, M. J., and SZUSTAKOWKI, J.; International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409(6822): 860–921. 2 VENTER, J. C., ADAMS, M. D., MYERS, E. W., LI, P. W., MURAL, R. J., SUTTON, G. G., SMITH, H. O., YANDELL, M., EVANS, C. A., HOLT, R. A., GOCAYNE, J. D., AMANATIDES, P., BALLEW, R. M., HUSON, D. H., WORTMAN, J. R., ZHANG, Q., KODIRA, C. D., ZHENG, X. H., CHEN, L., SKUPSKI, M., SUBRAMANIAN, G., THOMAS, P. D., ZHANG, J., GABOR MIKLOS, G. L.,
References NELSON, C., BRODER, S., CLARK, A. G., NADEAU, J., MCKUSICK, V. A., ZINDER, N., LEVINE, A. J., ROBERTS, R. J., SIMON, M., SLAYMAN, C., HUNKAPILLER, M., BOLANOS, R., DELCHER, A., DEW, I., FASULO, D., FLANIGAN, M., FLOREA, L., HALPERN, A., HANNENHALLI, S., KRAVITZ, S., LEVY, S., MOBARRY, C., REINERT, K., REMINGTON, K., ABU-THREIDEH, J., BEASLEY, E., BIDDICK, K., BONAZZI, V., BRANDON, R., CARGILL, M., CHANDRAMOULISWARAN, I., CHARLAB, R., CHATURVEDI, K., DENG, Z., DI FRANCESCO, V., DUNN, P., EILBECK, K., EVANGELISTA, C., GABRIELIAN, A. E., GAN, W., GE, W., GONG, F., GU, Z., GUAN, P., HEIMAN, T. J., HIGGINS, M. E., JI, R. R., KE, Z., KETCHUM, K. A., LAI, Z., LEI, Y., LI, Z., LI, J., LIANG, Y., LIN, X., LU, F., MERKULOV, G. V., MILSHINA, N., MOORE, H. M., NAIK, A. K., NARAYAN, V. A., NEELAM, B., NUSSKERN, D., RUSCH, D. B., SALZBERG, S., SHAO, W., SHUE, B., SUN, J., WANG, Z., WANG, A., WANG, X., WANG, J., WEI, M., WIDES, R., XIAO, C., YAN, C., YAO, A., YE, J., ZHAN, M., ZHANG, W., ZHANG, H., ZHAO, Q., ZHENG, L., ZHONG, F., ZHONG, W., ZHU, S., ZHAO, S., GILBERT, D., BAUMHUETER, S., SPIER, G., CARTER, C., CRAVCHIK, A., WOODAGE, T., ALI, F., AN, H., AWE, A., BALDWIN, D., BADEN, H., BARNSTEAD, M., BARROW, I., BEESON, K., BUSAM, D., CARVER, A., CENTER, A., CHENG, M. L., CURRY, L., DANAHER, S., DAVENPORT, L., DESILETS, R., DIETZ, S., DODSON, K., DOUP, L., FERRIERA, S., GARG, N., GLUECKSMANN, A., HART, B., HAYNES, J., HAYNES, C., HEINER, C., HLADUN, S., HOSTIN, D., HOUCK, J., HOWLAND, T., IBEGWAM, C., JOHNSON, J., KALUSH, F., KLINE, L., KODURU, S., LOVE, A., MANN, F., MAY, D., MCCAWLEY, S., MCINTOSH, T., MCMULLEN, I., MOY, M., MOY, L., MURPHY, B., NELSON, K., PFANNKOCH, C., PRATTS, E., PURI, V., QURESHI, H., REARDON, M., RODRIGUEZ, R., ROGERS, Y. H., ROMBLAD, D., RUHFEL, B., SCOTT, R., SITTER, C., SMALLWOOD, M., STEWART, E., STRONG, R., SUH, E., THOMAS, R., TINT, N. N., TSE, S., VECH, C., WANG, G., WETTER, J., WILLIAMS, S., WILLIAMS, M., WINDSOR, S., WINN-DEEN, E., WOLFE, K., ZAVERI, J., ZAVERI, K., ABRIL, J. F., GUIGO,
3
4
5
6
7
8
R., CAMPBELL, M. J., SJOLANDER, K. V., KARLAK, B., KEJARIWAL, A., MI, H., LAZAREVA, B., HATTON, T., NARECHANIA, A., DIEMER, K., MURUGANUJAN, A., GUO, N., SATO, S., BAFNA, V., ISTRAIL, S., LIPPERT, R., SCHWARTZ, R., WALENZ, B., YOOSEPH, S., ALLEN, D., BASU, A., BAXENDALE, J., BLICK, L., CAMINHA, M., CARNES-STINE, J., CAULK, P., CHIANG, Y. H., COYNE, M., DAHLKE, C., MAYS, A., DOMBROSKI, M., DONNELLY, M., ELY, D., ESPARHAM, S., FOSLER, C., GIRE, H., GLANOWSKI, S., GLASSER, K., GLODEK, A., GOROKHOV, M., GRAHAM, K., GROPMAN, B., HARRIS, M., HEIL, J., HENDERSON, S., HOOVER, J., JENNINGS, D., JORDAN, C., JORDAN, J., KASHA, J., KAGAN, L., KRAFT, C., LEVITSKY, A., LEWIS, M., LIU, X., LOPEZ, J., MA, D., MAJOROS, W., MCDANIEL J., MURPHY, S., NEWMAN, M., NGUYEN, T., NGUYEN, N., NODELL, M., PAN, S., PECK, J., PETERSON, M., ROWE, W., SANDERS, R., SCOTT, J., SIMPSON, M., SMITH, T., SPRAGUE, A., STOCKWELL, T., TURNER, R., VENTER, E., WANG, M., WEN, M., WU, D., WU, M., XIA, A., ZANDIEH, A., and ZHU X The sequence of the human genome. Science 2001; 291(5507): 1304–1351. STRYER, L.. Biochemistry, 4th edition, W. H. Freeman and Company, New York 1995. PERI, S., and PANDEY, A. A reassessment of the translation initiation codon in vertebrates. Trends in Genetics 2001; 17(12): 685–687. KOZAK, M. An analysis of 5 -noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Research 1987; 15(20): 8125–8148. KOZAK, M. Structural features in eukaryotic mRNAs that modulate the initiation of translation. Journal of Biological Chemistry 1991; 266(30): 19867–19870. KOZAK, M. Do the 5 untranslated domains of human cDNAs challenge the rules for initiation of translation (or is it vice versa)? Genomics 2000; 70(3): 396–406. ALBERTS, B., BRAY, D., LEWIS, J., RAFF, M., ROBERTS, K., and WATSON, J. D. Molecular biology of the cell. Garland Publishing,
31
32 Gene Expression: Basic Concepts
9
10
11
12
13
14
15
16
New York, New York, USA 1998: 528. CAREY, M. F., and SMALE, S. T. Transcriptional regulation in eukaryotes: concepts, strategies, and techniques. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, New York, USA 2000. BUSTIN, S. A. Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. Journal of Molecular Endocrinology 2000; 25(2): 169–193. CHEE, M., YANG, R., HUBELL, E., BERNO, A., HUANG, X. C., STERN, D., WINKLER, J., LOCKHART, D. J., MORRIS, M. S., and FODOR, S. P. Accessing genetic information with high-density DNA arrays. Science 1996; 274(5287): 610–614. DERISI, J. L., IYER, V. R., and BROWN, P. O. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997; 278(5338): 680–686. VELCULESCU, V. E., ZHANG, L., VOGELSTEIN, B., and KINZLER, K. W. Serial analysis of gene expression. Science 1995; 270(5235): 484–487. EISEN, M. B., SPELLMAN, P. T., BROWN, P. O., and BOTSTEIN, D. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the USA 1998; 95(25): 14863–14868. HUGHES, T. R., MARTON, M. J., JONES, A. R., ROBERTS, C. J., STOUGHTON, R., ARMOUR, C. D., BENNETT, H. A., COFFEY, E., DAI, H., HE, Y. D., KIDD, M. J., KING, A. M., MEYER, M. R., SLADE, D., LUM, P. Y., STEPANIANTS, S. B., SHOEMAKER, D. D., GACHOTTE, D., CHAKRABURTTY, K., SIMON, J., BARD, M., and FRIEND, S. H. Functional discovery via a compendium of expression profiles. Cell 2000; 102(1): 109–126. SPELLMANN, P. T., SHERLOCK, G., ZHANG, M. Q., IYER, V R., ANDERS, K., EISEN, M. B., BROWN, P. O., BOTSTEIN, D., and FUTCHER, B. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 1998; 9(12) 3273–3297.
17 CHO, R. J., CAMPBELL, M. J., WINZELER, E. A., STEINMETZ, L., CONWAY, A., WODICKA, L., WOLFSBERG, T. G., GABRIELIAN, A. E., LANDSMAN, D., LOCKHART, D. J., and DAVIS, R. W. A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell 1998; 2(1): 65–73. 18 CAUSTON, H. C., REN, B., KOH, S. S., HARBISON, C. T., KANIN, E., JENNINGS, E. G., LEE, T. I., TRUE, H. L., LANDER, E. S., and YOUNG, R. A. Remodeling of yeast genome expression in response to environmental changes. Molecular Biology of the Cell 2001; 12(2): 323–337. 19 GASCH, A. P., SPELLMAN, P. T., KAO, C. M., CARMEL-HAREL, O., EISEN, M. B., STORZ, G., BOTSTEIN, D., and BROWN, P. O. Genomic expression programs in the response of yeast cells to environmental changes. Molecular Biology of the Cell 2000; 11(12): 4241–4257. 20 MARTON, M. J., DERISI, J. L., BENNETT, H. A., IYER, V. R., MEYER, M. R., ROBERTS, C. J., STOUGHTON, R., BURCHARD, J., SLADE, D., DAI, H., BASSETT, D. E., Jr., HARTWELL, L. H., BROWN, P. O., and FRIEND, S. H. Drug target validation and identification of secondary drug target effects using DNA microarrays. Nature Medicine 1998; 4(11): 1293–1301. 21 ALIZADEH, A. A., EISEN, M. B., DAVIS, R. E., MA, C., LOSSOS, I. S., ROSENWALD, A., BOLDRICK, J. C., SABET, H., TRAN, T., YU, X., POWELL, J. I., YANG, L., MARTI, G. E., MOORE, T., HUDSON, J., Jr., LU, L., LEWIS, D. B., TIBSHIRANI, R., SHERLOCK, G., CHAN, W. C., GREINER, T. C., WEISENBURGER, D. D., ARMITAGE, J. O., WARNKE, R., LEVY, R., WILSON, W., GREVER, M. R., BYRD, J. C., BOTSTEIN, D., BROWN, P. O., and STAUDT, L. M. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000; 403(6769): 503–511. 22 IYER, V. R., HORAK, C. E., SCAFE, C. S., BOTSTEIN, D., SNYDER, M., and BROWN, P. O. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 2001; 409(6819): 533–538. 23 LIEB, J. D., LIU, X., BOTSTEIN, D., and BROWN, P. O. Promoter-specific binding of Rap1 revealed by genome-wide maps
References
24
25
26
27
28
29
30
of protein-DNA association. Nature Genetics 2001; 28(4): 327–334. REN, B., ROBERT, F., WYRICK, J. J., APARICIO, O., JENNINGS, E. G., SIMON, I., ZEITLINGER, J., SCHREIBER, J., HANNETT, N., KANIN, E., VOLKERT, T. L., WILSON, C. J., BELL, S. P., and YOUNG, R. A. Genome-wide location and function of DNA binding proteins. Science 2000; 290(5500): 2306–2309. SIMON, I., BARNETT, J., HANNETT, N., HARBISON, C. T., RINALDI, N. J., VOLKERT, T. L., WYRICK, J. J., ZEITLINGER, J., GIFFORD, D. K., JAAKKOLA, T. S., and YOUNG, R. A. Serial regulation of transcriptional regulators in the yeast cell cycle. Cell 2001; 106(6): 697–708. HOGENESCH, J. B., CHING, K. A., BATALOV, S., SU, A. I., WALKER, J. R., ZHOU, Y., KAY, S. A., SCHULTZ, P. G., and COOKE, M. P. A comparison of the Celera and Ensembl predicted gene sets reveals little overlap in novel genes. Cell 2001; 106(4): 413–415. SHOEMAKER, D. D., SCHADT, E. E., ARMOUR, C. D., HE, Y. D., GARRETT-ENGELE, P., MCDONAGH, P. D., LOERCH, P. M., LEONARDSON, A., LUM, P.Y, CAVET, G., WU, L. F., ALTSCHULER, S. J., EDWARDS, S., KING, J., TSANG, J. S., SCHIMMACK, G., SCHELTER, J. M., KOCH, J., ZIMAN, M., MARTON, M. J., LI, B., CUNDIFF P., WARD, T, CASTLE, J., KROLEWSKI, M., MEYER, M. R., MAO, M., BURCHARD, J., KIDD, M. J., DAI, H., PHILLIPS, J. W., LINSLEY, P. S., STOUGHTON, R., SCHERER, S., and BOGUSKI, M. S. Experimental annotation of the human genome using microarray technology. Nature 2001; 409(6822): 922–927. MCCARTHY, J. E. Posttranscriptional control of gene expression in yeast. Microbiology and Molecular Biology Reviews 1998; 62(4): 1492–1553. SACHS, A. B., and BURATOWSKI, S. Common themes in translational and transcriptional regulation. Trends in Biochemical Sciences 1997; 22(6): 189–192. PANDEY, A., and MANN, M. Proteomics to study genes and genomes. Nature 2000; 405(6788): 837–846.
31 MAILLET, I., LAGNIEL, G., PERROT, M., BOUCHERIE, H., and LABARRE, J. Rapid identification of yeast proteins on two-dimensional gels. Journal of Biological Chemistry 1996; 271(17): 10263–10270. 32 LEE, T. I., and YOUNG, R. A. Transcription of eukaryotic protein-coding genes. Annual Review of Genetics 2000; 34: 77–137. 33 CRAMER, P., BUSHNELL, D. A., FU, J., GNATT, A. L., MAIER-DAVIS, B., THOMPSON, N. E., BURGESS, R. R., EDWARDS, A. M., DAVID, P. R., and KORNBERG, R. D. Architecture of RNA polymerase II and implications for the transcription mechanism. Science 2000; 288(5466): 640–649. 34 GNATT, A. L., CRAMER, P., FU, J., BUSHNELL, D. A., and KORNBERG, R. D. Structural basis of transcription: an RNA polymerase II elongation complex at 3.3 Å resolution. Science 2001; 292(5523): 1876–1882. 35 FIELDS, S., and SONG, O. A novel genetic system to detect protein-protein interactions. Nature 1989; 340(6230): 245–246. 36 LEGRAIN, P., WOJCIK, J., and GAUTHIER, J. M. Protein-protein interaction maps: a lead towards cellular functions. Trends in Genetics 2001; 17(6): 346–352. 37 VIDAL, M., and LEGRAIN, P. Yeast forward and reverse ‘n’-hybrid systems. Nucleic Acids Research 1999; 27(4): 919–929. 38 BARTEL, P. L., ROECKLEIN, J. A., SENGUPTA, D., and FIELDS, S. A protein linkage map of Escherichia coli bacteriophage T7. Nature Genetics 1996; 12(1): 72–77. 39 FLAJOLET, M., ROTONDO, G., DAVIET, L., BERGAMETTI, F., INCHAUSPE, G., TIOLLAIS, P., TRANSY C., and LEGRAIN, P. A genomic approach of the hepatitis C virus generates a protein interaction map. Gene 2000; 242(1-2): 369–379. 40 WALHOUT, A. J., SORDELLA, R., LU, X., HARTLEY, J. L., TEMPLE, G. F., BRASCH, M. A., THIERRY-MIEG, N., and VIDAL, M. Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 2000; 287(5450): 116–122.
33
34 Gene Expression: Basic Concepts
41 RAIN, J.-C, SELIG, L, De REUSE, H., BATTAGLIA, V, REVERDY C, SIMON, S., LENZEN, G., PETEL, F., WOJCIK, J., SCHACHTER, V, CHEMAMA, Y., LABIGNE, A., and LEGRAIN, P. The protein-protein interaction map of Helicobacter pylori. Nature 2001; 409(6817): 211–215. 42 FLORES, A., BRIAND, J.-F., GADAL, O., ANDRAU, J. C., RUBBI, L, Van MULLEM, V, BOSCHIERO, C., GOUSSOT, M., MARCK, C., CARLES, C., THURIAUX, P., SENTENAC, A., and WERNER, M. A protein-protein interaction map of yeast RNA polymerase III. Proceedings of the National Academy of Sciences of the USA 1999; 96(14): 7815–7820. 43 ITO, T., TASHIRO, K., MUTA, S., OZAWA, R., CHIBA, T., NISHIZAWA, M., YAMAMOTO, K., KUHARA, S., and SAKAKI, Y. Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proceedings of the National Academy of Sciences of the USA 2000; 97(34): 1143–1147. 44 ITO, T., CHIBA, T., OZAWA, R., YOSHIDA, M., HATTORI, M., and SAKAKI, Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proceedings of the National Academy of Sciences of the USA 2001; 98(3): 4569–4574. 45 UETZ, P., GIOT, L., CAGNEY, G., MANSFIELD, T. A., JUDSON, R. S., KNIGHT, J. R., LOCKSHON, D., NARAYAN, V., SRINIVASAN, M., POCHART, P., QURESHI-EMILI, A., LI, Y., GODWIN, B., CONOVER, D., KALBFLEISCH, T., VIJAYADAMODAR, G., YANG, M., JOHNSTON, M., FIELDS, S., and ROTHBERG, J. M. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000; 403(6770): 623–627. 46 GE, H., LIU, Z., CHURCH, G. M., and VIDAL, M. Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nature Genetics 2001; 29(4): 482–486. 47 GAVIN, A.-C., B¨OSCHE, M., KRAUSE, R., GRANDI, P., MARZIOCH, M., BAUER, A.,
48
49
50
51
SCHULTZ, J., RICK, J. M., MICHON, A. M., CRUCIAT, C. M., REMOR, M., HOFERT, C., SCHELDER, M., BRAJENOVIC, M., RUFFNER, H., MERINO, A., KLEIN, K., HUDAK, M., DICKSON, D., RUDI, T., GNAU, V., BAUCH, A., BASTUCK, S., HUHSE, B., LEUTWEIN, C., HEURTIER, M. A., COPLEY, R. R., EDELMANN, A., QUERFURTH, E., RYBIN, V., DREWES, G., RAIDA, M., BOUWMEESTER, T., BORK, P., SERAPHIN, B., KUSTER, B., NEUBAUER, G., and SUPERTI-FURGA, G. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002; 415(6868): 141–147. HO, Y., GRUHLER, A., HEILBUT, A., BADER, G. D., MOORE, L., ADAMS, S. L., MILLAR, A., TAYLOR, P., BENNETT, K., BOUTILIER, K., YANG, L., WOLTING, C., DONALDSON, I., SCHANDORFF, S., SHEWNARANE, J., VO, M., TAGGART, J., GOUDREAULT, M., MUSKAT, B., ALFARANO, C., DEWAR, D., LIN, Z., MICHALICKOVA, K., WILLEMS, A. R., SASSI, H., NIELSEN, P. A., RASMUSSEN, K. J., ANDERSEN, J. R., JOHANSEN, L. E., HANSEN, L. H., JESPERSEN, H., PODTELEJNIKOV, A., NIELSEN, E., CRAWFORD, J., POULSEN, V., SORENSEN, B. D., MATTHIESEN, J., HENDRICKSON, R. C., GLEESON, F., PAWSON, T., MORAN, M. F., DUROCHER, D., MANN, M., HOGUE, C. W., FIGEYS, D., and TYERS, M. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002; 415(6868): 180–183. ZHU, H., BILGIN, M., BANGHAM, R., HALL, D., CASAMAYOR, A., BERTONE, P., LAN, N., JANSEN, R., BIDLINGMAIER, S., HOUFEK, T., MITCHELL, T., MILLER, P., DEAN, R. A., GERSTEIN, M., and SNYDER, M. Global analysis of protein activities using proteome chips. Science 2001; 293(5537): 2101–2105. GEIDUSCHEK, E. P., and KASSAVETIS, G. A. The RNA polymerase III transcription apparatus. Journal of Molecular Biology 2001; 310(1): 1–26. PAULE, M. R. Transcription of ribosomal RNA genes by eukaryotic RNA polymerase I. Springer-Verlag, Berlin, Germany 1998.
References 52 WEINZIERL, R. O. J. Mechanisms of gene expression: structure, function and evolution of the base transcription machinery. Imperial College Press, London, United Kingdom 1999. 53 WHITE, R. J. RNA polymerase III transcription. Springer-Verlag, Berlin, Germany 1998. 54 JONES, K. A., and KADONAGA, J. T. Exploring the transcription-chromatin interface. Genes and Development 2000; 14(16): 1992–1996. 55 WOLFFE, A. P., and HAYES, J. J. Chromatin disruption and modification. Nucleic Acids Research 1999; 27(3): 711–720. 56 DIECI, G., and SENTENAC, A. Facilitated recycling pathway for RNA polymerase III. Cell 1996; 84(2): 245–252. 57 CARLES, C., and RIVA, M. Yeast RNA polymerase I subunits and genes. In: Transcription of ribosomal RNA genes by eukaryotic RNA polymerase I. PAULE, M. R., editor. Springer-Verlag, Berlin, Germany 1998: 9–38. 58 CHE´ DIN, S., FERRI, L., PEYROCHE, G., ANDRAU, J. C., JOURDAIN, S., LEFEVRE, O., WERNER, M., CARLES, C., and SENTENAC, A. The yeast RNA polymerase III transcription machinery. A paradigm for eukaryotic gene activation. Cold Spring Harbor Symposia on Quantitative Biology 1998; 63: 381–389. 59 WOYCHIK, N. A. Fractions to functions: RNA polymerase II thirty years later. Cold Spring Harbor Symposia on Quantitative Biology 1998; 63: 311–317. 60 SHPAKOVSKI, G. V., ACKER, J., WINTZERITH, M., LACROIX, J.-F., THURIAUX, P., and VIGNERON, M. Four subunits that are shared by the three classes of RNA polymerase are functionally interchangeable between Homo sapiens and Saccharomyces cerevisiae. Molecular and Cellular Biology 1995; 15(9): 4702–4710. 61 HAMPSEY, M. Molecular genetics of the RNA polymerase II general transcriptional machinery. Microbiology and Molecular Biology Reviews 1998; 62(2): 465–503. 62 LEE, T. I., and YOUNG, R. A. Regulation of gene expression by TBP-associated
63
64
65
66
67
68
69
70
71
72
73
proteins. Genes and Development 1998; 12(10): 1398–1408. LEMON, B., and TJIAN, R. Orchestrated response: a symphony of transcription factors for gene control. Genes and Development 2000; 14(10): 2551–2569. DANTONEL, J. C., WURTZ, J. M., POCH, O., MORAS, D., and TORA L. The TBP-like factor: an alternative transcription factor in metazoa? Trends in Biochemical Sciences 1999; 24(9): 335–339. DE LAAT, W. L., JASPERS, N. G., and HOEIJMAKERS, J. H. Molecular mechanism of nucleotide excision repair. Genes and Development 1999; 13(7): 768–785. MYERS, L. C., and KORNBERG, R. D. Mediator of transcriptional regulation. Annual Review of Biochemistry 2000; 69: 729–749. YUDKOVSKY N., RANISH, J. A., and HAHN, S. A transcription reinitiation intermediate that is stabilized by activator. Nature 2000; 408(6809): 225–229. CONAWAY, J. W., SHILATIFARD, A., DVIR, A., and CONAWAY, R. C. Control of elongation by RNA polymerase II. Trends in Biochemical Sciences 2000; 25(8): 375–380. PERIER, R. C., JUNIER, T., and BUCHER, P. The Eukaryotic Promoter Database EPD. Nucleic Acids Research 1998; 26(1): 353–357. PERIER, R. C., JUNIER, T., BONNARD, C., and BUCHER, P. The Eukaryotic Promoter Database (EPD): recent developments. Nucleic Acids Research 1999; 27(1): 307–309. PERIER, R. C., PRAZ, V., JUNIER, T., BONNARD, C., and BUCHER, P. The eukaryotic promoter database (EPD). Nucleic Acids Research 2000; 28(1): 302–303. PRAZ, V., PERIER, R., BONNARD, C., and BUCHER, P. The Eukaryotic Promoter Database, EPD: new entry types and links to gene expression data. Nucleic Acids Research 2002; 30(1): 322–324. CAREY, M. F. Transcriptional activation – A holistic view of the complex. Current Biology 1995; 5(9): 1003–1005.
35
36 Gene Expression: Basic Concepts
74 ERNST, P. and SMALE, S. T. Combinatorial regulation of transcription II: The immunoglobulin nheavy chain gene. Immunity 1995; 2(5): 427–438. 75 GROSSCHEDL, R., GIESE, K., and PAGEL, J. HMG domain proteins: Architectural elements in the assembly of nucleoprotein structures. Trends in Genetics 1994; 10(3): 94–100. 76 BULGER, M., and GROUDINE, M. Looping versus linking: toward a model for long-distance gene activation. Genes and Development 1999; 13(19): 2465–2477. 77 MOLETE, J. M., PETRYKOWSKA, H., BOUHASSIRA, E. E., FENG, Y. Q., MILLER, W, and HARDISON, R. C. Sequences flanking hypersensitive sites of the beta-globin locus control region are required for synergistic enhancement. Molecular and Cellular Biology 2001; 21(9): 2969–2980. 78 FIERING, S., EPNER, E., ROBINSON, K., ZHUANG, Y., TELLING, A., HU, M., MARTIN, D. I., ENVER, T, LEY, T. J., and GROUDINE, M. Targeted deletion of 5 HS2 of the murine beta-globin LCR reveals that it is not essential for proper regulation of the beta-globin locus. Genes and Development 1995; 9(18): 2203–2213. 79 ENGEL, J. D., and TANIMOTO, K. Looping, linking, and chromatin activity: new insights into beta-globin locus regulation. Cell 2000; 100(5): 499–502. 80 GROSVELD, F. Activation by locus control regions? Current Opinion in Genetics and Development 1999; 9(2): 152–157. 81 ELEFANT, F., SU, Y., LIEBHABER, S. A., and COOKE, N. E. Patterns of histone acetylation suggest dual pathways for gene activation by a bifunctional locus control region. EMBO Journal 2000; 19(24): 6814–6822. 82 BENDER, M. A., REIK, A., CLOSE, J., TELLING, A., EPNER, E., FIERING, S., HARDISON, R., and GROUDINE, M. Description and targeted deletion of 5 hypersensitive site 5 and 6 of the mouse beta-globin locus control region. Blood 1998; 92(11): 4394–4403. 83 SCHU¨ BELER, D., GROUDINE, M., and BENDER, M. A. The murine beta-globin locus control region regulates the rate of transcription but not the
84
85
86
87
88
89
90
91
92
hyperacetylation of histones at the active genes. Proceedings of the National Academy of Sciences of the USA 2001; 98(20): 11432–11437. CHEN, H., LIN, R. J., XIE, W., WILPITZ, D., and EVANS, R. M. Regulation of hormone-induced histone hyperacetylation and gene activation via acetylation of an acetylase. Cell 1999; 98(5): 675–686. AGALIOTI, T., LOMVARDAS, S., PAREKH, B., YIE, J., MANIATIS, T., and THANOS, D. Ordered recruitment of chromatin modifying and general transcription factors to the IFN-beta promoter. Cell 2000; 103(4): 667–678. SHANG, Y., HU, X., DIRENZO, J., LAZAR, M. A., and BROWN, M. Cofactor dynamics and sufficiency in estrogen receptor-regulated transcription. Cell 2000; 103(6): 843–852. GASSER, S. M., and LAEMMLI, U. K. Improved methods for the isolation of individual and clustered mitotic chromosomes. Experimental Cell Research 1987; 173(1): 85–98. PHI-VAN, L., and STRA¨ TLING, W. H. Association of DNA with nuclear matrix. Progress in Molecular and Subcellular Biology 1990; 11: 1–11. HART, CM., and LAEMMLI, U. K. Facilitation of chromatin dynamics by SARs. Current Opinion in Genetics and Development 1998; (8) 5: 519–525. BELL, A. C., and FELSENFELD, G. Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 2000; 405(6785): 482–485. FILIPPOVA, G. N., FAGERLIE, S., KLENOVA, E. M., MYERS, C, DEHNER, Y, GOODWIN, G., NEIMAN, P. E., COLLINS, S. J., and LOBANENKOV, V. V An exceptionally conserved transcriptional repressor, CTCF, employs different combinations of zinc fingers to bind diverged promoter sequences of avian and mammalian c-myc oncogenes. Molecular and Cellular Biology 1996; 16(6): 2802–2813. LUTZ, M., BURKE, L. J., BARRETO, G., GOEMAN, F., GREB, H., ARNOLD, R., SCHULTHEISS, H., BREHM, A., KOUZARIDES, T., LOBANENKOV, V., and
References
93
94
95
96
97
98
99
100
101
RENKAWITZ, R. Transcriptional repression by the insulator protein CTCF involves histone deacetylases. Nucleic Acids Research 2000; 28(8): 1707–1713. CARON, H., van SCHAIK, B., van der MEE, M., BAAS, F., RIGGINS, G., van SLUIS, P., HERMUS, M. C., van ASPEREN, R., BOON, K., VOUTE, P. A., HEISTERKAMP, S., van KAMPEN, A., and VERSTEEG, R. The human transcriptome map: clustering of highly expressed genes in chromosomal domains. Science 2001; 291(5507): 1289–1292. VELCULESCU, V. E., ZHANG, L., ZHOU, W., VOGELSTEIN, J., BASRAI, M. A., BASSETT, D. E., Jr., HIETER, P., VOGELSTEIN, B., and KINZLER, K. W. Characterization of the yeast transcriptome. Cell 1997; 88(2): 243–251. THANOS, D., and MANIATIS, T. Virus induction of human IFN β gene expression requires the assembly of an enhanceosome. Cell 1995; 83(7): 1091–1100. KIM, T. K., and MANIATIS, T. The mechanism of transcriptional synergy of an in vitro assembled interferon-β enhanceosome. Molecular Cell 1997; 1(1): 119–129. YIE, J., SENGER, K., and THANOS, D. Mechanism by which the IFN-β enhanceosome activates transcription. Proceedings of the National Academy of Sciences of the USA 1999; 96(23): 13108–13113. MERIKA, M., and THANOS, D. Enhanceosomes. Current Opinion in Genetics and Development 2001; 11(2): 205–208. FRY, C. J., and PETERSON, C. L. Chromatin remodeling enzymes: who’s on first? Current Biology 2001; 11(5): R185-197. KINGSTON, R. E., BUNKER, C. A., and IMBALZANO, A., N. REPRESSION and activation by multiprotein complexes that alter chromatin structure. Genes and Development 1996; 10(8): 905–920. WANG, A. H., QUIGLEY, G. J., KOLPAK, F. J., CRAWFORD, J. L., van BOOM, J. H., van
102
103
104
105
106
107
108
109
110
111
der MAREL, G., and RICH, A. Molecular structure of a left-handed double helical DNA fragment at atomic resolution. Nature 1979; 282(5740): 680–686. LIU, L. F., and WANG, J. C. Supercoiling of the DNA template during transcription. Proceedings of the National Academy of Sciences of the USA 1987; 84(20): 7024–7027. HERBERT, A., and RICH, A. Left-handed Z-DNA: structure and function. Genetica 1999; 106(1-2): 37–47. OGBOURNE, S., and ANTALIS, T. M. Transcriptional control and the role of silencers in transcriptional regulation in eukaryotes. Biochemical Journal 1998; 331(Part 1): 1–14. BRIVANLOU, A. H., and DARNELL, J. E., Jr. Signal transduction and the control of gene expression. Science 2002; 295(5556): 813–818. WEINZIERL, R. O. J., DYNLACHT, B. D., TIJAN, R. Largest subunit of Drosophila transcription factor II-D directs assembly of a complex containing TBP and a coactivator. Nature 1993; 362(6420): 511–517. RUSSO, V. E. A., MARTIENSSEN, R. A., and RIGGS, A. D. Epigenetic mechanisms of gene regulation. Cold Spring Harbour Laboratory Press, Cold Spring Harbour, New York, USA 1996. BIRD, A. DNA methylation patterns and epigenetic memory. Genes & Development 2002; 16(1): 6–21. JACOB, F., PERRIN, D., SANCHEZ, C., and MONOD, J. L’op´eron groupes de genes a` ` expression coordonn´ee par un op´erateur. Comptes Rendus des Seances de L’academie des Sciences 1960; 250: 17271729. JACOB, F., and MONOD, J. Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology 1961; 3: 318–356. CRAMER, P., BUSHNELL, D. A., and KORNBERG, R. D. Structural basis of transcription: RNA polymerase II at 2.8 Ångstrom resolution. Science 2001; 292(5523): 1863–1876.
37